lmxopcua/docs/v2/implementation/exit-gate-phase-2-final.md
Joseph Doherty a3d16a28f1 Phase 2 Stream D Option B — archive v1 surface + new Driver.Galaxy.E2E parity suite. Non-destructive intermediate state: the v1 OtOpcUa.Host + Historian.Aveva + Tests + IntegrationTests projects all still build (494 v1 unit + 6 v1 integration tests still pass when run explicitly), but solution-level dotnet test ZB.MOM.WW.OtOpcUa.slnx now skips them via IsTestProject=false on the test projects + archive-status PropertyGroup comments on the src projects. The destructive deletion is reserved for Phase 2 PR 3 with explicit operator review per CLAUDE.md "only use destructive operations when truly the best approach". tests/ZB.MOM.WW.OtOpcUa.Tests/ renamed via git mv to tests/ZB.MOM.WW.OtOpcUa.Tests.v1Archive/; csproj <AssemblyName> kept as the original ZB.MOM.WW.OtOpcUa.Tests so v1 OtOpcUa.Host's [InternalsVisibleTo("ZB.MOM.WW.OtOpcUa.Tests")] still matches and the project rebuilds clean. tests/ZB.MOM.WW.OtOpcUa.IntegrationTests gets <IsTestProject>false</IsTestProject>. src/ZB.MOM.WW.OtOpcUa.Host + src/ZB.MOM.WW.OtOpcUa.Historian.Aveva get PropertyGroup archive-status comments documenting they're functionally superseded but kept in-build because cascading dependencies (Historian.Aveva → Host; IntegrationTests → Host) make a single-PR deletion high blast-radius. New tests/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.E2E/ project (.NET 10) with ParityFixture that spawns OtOpcUa.Driver.Galaxy.Host.exe (net48 x86) as a Process.Start subprocess with OTOPCUA_GALAXY_BACKEND=db env vars, awaits 2s for the PipeServer to bind, then exposes a connected GalaxyProxyDriver; skips on non-Windows / Administrator shells (PipeAcl denies admins per decision #76) / ZB unreachable / Host EXE not built — each skip carries a SkipReason string the test method reads via Assert.Skip(SkipReason). RecordingAddressSpaceBuilder captures every Folder/Variable/AddProperty registration so parity tests can assert on the same shape v1 LmxNodeManager produced. 
HierarchyParityTests (3) — Discover returns gobjects with attributes; attribute full references match the tag.attribute Galaxy reference grammar; HistoryExtension flag flows through correctly. StabilityFindingsRegressionTests (4) — one test per 2026-04-13 stability finding from commits c76ab8f and 7310925: phantom probe subscription doesn't corrupt unrelated host status; HostStatusChangedEventArgs structurally carries a specific HostName + OldState + NewState (event signature mathematically prevents the v1 cross-host quality-clear bug); all GalaxyProxyDriver capability methods return Task or Task<T> (sync-over-async would deadlock OPC UA stack thread); AcknowledgeAsync completes before returning (no fire-and-forget background work that could race shutdown). Solution test count: 470 pass / 7 skip (E2E on admin shell) / 1 pre-existing Phase 0 baseline. Run archived suites explicitly: dotnet test tests/ZB.MOM.WW.OtOpcUa.Tests.v1Archive (494 pass) + dotnet test tests/ZB.MOM.WW.OtOpcUa.IntegrationTests (6 pass). docs/v2/V1_ARCHIVE_STATUS.md inventories every archived surface with run-it-explicitly instructions + a 10-step deletion plan for PR 3 + rollback procedure (git revert restores all four projects). docs/v2/implementation/exit-gate-phase-2-final.md supersedes the two partial-exit docs with the per-stream status table (A/B/C/D/E all addressed, D split across PR 2/3 per safety protocol), the test count breakdown, fresh adversarial review of PR 2 deltas (4 new findings: medium IsTestProject=false safety net loss, medium structural-vs-behavioral stability tests, low backend=db default, low Process.Start env inheritance), the 8 carried-forward findings from exit-gate-phase-2.md, the recommended PR order (1 → 2 → 3 → 4). docs/v2/implementation/pr-2-body.md is the Gitea web-UI paste-in for opening PR 2 once pushed.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 00:56:21 -04:00


Phase 2 Final Exit Gate (2026-04-18)

Supersedes phase-2-partial-exit-evidence.md and exit-gate-phase-2.md. Captures the as-built state at the close of Phase 2 work delivered across two PRs.

Status: All five Phase 2 streams addressed. Stream D split across PR 2 (archive) + PR 3 (delete) per safety protocol.

Stream-by-stream status

| Stream | Plan §reference | Status | PR |
|---|---|---|---|
| A: Driver.Galaxy.Shared | §A.1–A.3 | Complete | PR 1 (merged or pending) |
| B: Driver.Galaxy.Host | §B.1–B.10 | Real Win32 pump, all Tier C protections, all 3 IGalaxyBackend impls (Stub / DbBacked / MxAccess with live COM) | PR 1 |
| C: Driver.Galaxy.Proxy | §C.1–C.4 | All 9 capability interfaces + supervisor (Backoff + CircuitBreaker + HeartbeatMonitor) | PR 1 |
| D: Retire legacy Host | §D.1–D.3 | Migration script, installer scripts, Stream D procedure doc, archive markings on all v1 surface (this PR 2), deletion deferred to PR 3 | PR 2 (this) + PR 3 (next) |
| E: Parity validation | §E.1–E.4 | E2E test scaffold + 4 stability-finding regression tests + HostSubprocessParityTests cross-FX integration | PR 2 (this) |

What changed in PR 2 (this branch phase-2-stream-d)

  1. tests/ZB.MOM.WW.OtOpcUa.Tests/ renamed to tests/ZB.MOM.WW.OtOpcUa.Tests.v1Archive/. <AssemblyName> is kept as ZB.MOM.WW.OtOpcUa.Tests so the v1 Host's InternalsVisibleTo still matches, and <IsTestProject>false</IsTestProject> is set so the solution-level dotnet test run on the slnx excludes it.
  2. Three other v1 projects archive-marked with PropertyGroup comments: OtOpcUa.Host, Historian.Aveva, IntegrationTests. IntegrationTests also gets <IsTestProject>false</IsTestProject>.
  3. New tests/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.E2E/ project (.NET 10):
    • ParityFixture spawns OtOpcUa.Driver.Galaxy.Host.exe (net48 x86) as subprocess via Process.Start, connects via real named pipe, exposes a connected GalaxyProxyDriver. Skips when Galaxy ZB unreachable, when Host EXE not built, or when running as Administrator (PipeAcl denies admins).
    • RecordingAddressSpaceBuilder captures Folder + Variable + Property registrations so parity tests can assert shape.
    • HierarchyParityTests (3): Discover returns gobjects with attributes; attribute full references match tag.attribute shape; HistoryExtension flag flows through.
    • StabilityFindingsRegressionTests (4), one test per 2026-04-13 finding: phantom-probe-doesn't-corrupt-status, host-status-event-is-scoped, all-async-no-sync-over-async, AcknowledgeAsync-completes-before-returning.
  4. docs/v2/V1_ARCHIVE_STATUS.md — inventory + deletion plan for PR 3.
  5. docs/v2/implementation/exit-gate-phase-2-final.md (this doc) — supersedes the two partial-exit docs.
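
The skip conditions listed for ParityFixture can be sketched as a single function that returns a reason string or null. This is a minimal illustration, not the fixture's actual code: the class name `ParitySkipCheck`, the method `GetSkipReason`, the check order, the message wording, and the `isGalaxyDbReachable` callback are all assumptions made for the example.

```csharp
using System;
using System.IO;
using System.Runtime.Versioning;
using System.Security.Principal;

// Hypothetical helper: the real ParityFixture may order or word its checks differently.
public static class ParitySkipCheck
{
    // Returns null when the E2E fixture can run, otherwise a human-readable skip reason.
    public static string? GetSkipReason(string hostExePath, Func<bool> isGalaxyDbReachable)
    {
        if (!OperatingSystem.IsWindows())
            return "Galaxy Host is a net48 x86 process; Windows is required.";

        if (IsAdministrator())
            return "PipeAcl denies Administrators; run from a non-elevated shell.";

        // Cheap file check before the (assumed) database probe.
        if (!File.Exists(hostExePath))
            return $"Host EXE not built: {hostExePath}";

        if (!isGalaxyDbReachable())
            return "Galaxy ZB database is unreachable.";

        return null;
    }

    [SupportedOSPlatform("windows")]
    private static bool IsAdministrator()
    {
        // WindowsIdentity is Windows-only; the caller has already checked the platform.
        using var identity = WindowsIdentity.GetCurrent();
        return new WindowsPrincipal(identity).IsInRole(WindowsBuiltInRole.Administrator);
    }
}
```

A test method would then read the stored reason and call Assert.Skip with it when it is non-null, so every skipped E2E test reports why it did not run.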

Test counts

Solution-level dotnet test ZB.MOM.WW.OtOpcUa.slnx: 470 pass / 7 skip / 1 baseline failure.

| Project | Pass | Skip |
|---|---|---|
| Core.Abstractions.Tests | 24 | 0 |
| Configuration.Tests | 42 | 0 |
| Core.Tests | 4 | 0 |
| Server.Tests | 2 | 0 |
| Admin.Tests | 21 | 0 |
| Driver.Galaxy.Shared.Tests | 6 | 0 |
| Driver.Galaxy.Host.Tests | 30 | 0 |
| Driver.Galaxy.Proxy.Tests | 10 | 0 |
| Driver.Galaxy.E2E (NEW) | 0 | 7 (all skip with documented reason: admin shell) |
| Client.Shared.Tests | 131 | 0 |
| Client.UI.Tests | 98 | 0 |
| Client.CLI.Tests | 51 / 1 fail | 0 |
| Historian.Aveva.Tests | 41 | 0 |

Excluded from solution run (run explicitly when needed):

  • OtOpcUa.Tests.v1Archive — 494 pass (v1 unit tests, kept as parity reference)
  • OtOpcUa.IntegrationTests — 6 pass (v1 integration tests, kept as parity reference)

Adversarial review of the PR 2 diff

Independent pass over the PR 2 deltas. New findings ranked by severity; existing findings from the previous exit-gate doc still apply.

New findings

Medium 1 — IsTestProject=false on OtOpcUa.IntegrationTests removes the safety net. The 6 v1 integration tests no longer run on solution test. Mitigation: the new E2E suite covers the same scenarios in the v2 topology shape. Risk: if E2E test count regresses or fails to cover a scenario, the v1 fallback isn't auto-checked. Procedure: PR 3 checklist includes "E2E test count covers v1 IntegrationTests' 6 scenarios at minimum".

Medium 2: Stability-finding regression tests #2, #3, #4 are structural (reflection-based), not behavioral. Findings #2 and #3 use type-shape assertions (event signature carries HostName; methods return Task) rather than triggering the actual race. Mitigation: the v1 defects were structural, and fixing them required interface changes that the type-shape assertions catch. Risk: a future refactor that re-introduces sync-over-async via a non-async helper called inside a Task method wouldn't trip the test. Filed as v2.1: add a runtime async-call-stack analyzer (Roslyn or post-build).
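
To make the structural-versus-behavioral gap concrete, here is a rough sketch of a reflection-based return-type check. It is not the test in the E2E project; `AsyncShapeCheck`, `GoodDriver`, `SneakyDriver`, and `BadDriver` are hypothetical names invented for the illustration.

```csharp
using System;
using System.Linq;
using System.Reflection;
using System.Threading.Tasks;

public static class AsyncShapeCheck
{
    // Names of public instance methods declared on the type that do not return Task or Task<T>.
    public static string[] FindNonTaskMethods(Type type) =>
        type.GetMethods(BindingFlags.Public | BindingFlags.Instance | BindingFlags.DeclaredOnly)
            .Where(m => !m.IsSpecialName)                              // skip property/event accessors
            .Where(m => !typeof(Task).IsAssignableFrom(m.ReturnType))  // Task<T> derives from Task
            .Select(m => m.Name)
            .ToArray();
}

// Hypothetical stand-ins for a driver's capability surface.
public sealed class GoodDriver
{
    public Task<int> ReadAsync() => Task.FromResult(1);
}

public sealed class SneakyDriver
{
    // Returns Task<int>, yet blocks on .Result inside: invisible to a type-shape check.
    public Task<int> ReadAsync()
    {
        int value = Task.Run(() => 1).Result;
        return Task.FromResult(value);
    }
}

public sealed class BadDriver
{
    public int Read() => 1;
}
```

The check flags `BadDriver.Read` but reports nothing for `SneakyDriver`, which is exactly the risk described above: the signature is asynchronous while the body still blocks.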

Low 1 — ParityFixture defaults to OTOPCUA_GALAXY_BACKEND=db (not mxaccess). Discover works against ZB without needing live MXAccess. The MXAccess-required tests will need a second fixture once they're written.

Low 2 — Process.Start(EnvironmentVariables) doesn't always inherit clean state. The test inherits the parent's PATH + locale, which is normally fine but could mask a missing runtime dependency. Mitigation: in CI, pin a clean environment block.
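
One way the "pin a clean environment block" mitigation could look is sketched below. It assumes a hypothetical helper named `CleanProcessStart.Build`; only `ProcessStartInfo` and its `Environment` dictionary are real .NET APIs, and the fixture's actual launch code may differ.

```csharp
using System.Collections.Generic;
using System.Diagnostics;

// Hypothetical helper: builds start info whose environment holds only the pinned variables.
public static class CleanProcessStart
{
    public static ProcessStartInfo Build(string exePath, IReadOnlyDictionary<string, string> pinnedEnv)
    {
        var psi = new ProcessStartInfo(exePath)
        {
            UseShellExecute = false, // required for the Environment dictionary to be applied
            CreateNoWindow = true,
        };

        // Environment starts as a copy of the parent's variables; drop them all.
        psi.Environment.Clear();

        foreach (var pair in pinnedEnv)
            psi.Environment[pair.Key] = pair.Value;

        return psi;
    }
}
```

A caller would pass OTOPCUA_GALAXY_BACKEND=db plus whatever the Host genuinely needs. On Windows a fully empty block can break child processes, so variables such as SystemRoot usually have to be pinned explicitly as well.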

Existing findings (carried forward from exit-gate-phase-2.md)

All 8 still apply unchanged. Particularly:

  • High 1 (MxAccess Read subscription-leak on cancellation) — open
  • High 2 (no MXAccess reconnect loop, only supervisor-driven recycle) — open
  • Medium 3 (SubscribeAsync doesn't push OnDataChange frames yet) — open
  • Medium 4 (WriteValuesAsync doesn't await OnWriteComplete) — open

Cross-cutting deferrals (out of Phase 2)

  • Deletion of v1 archive — PR 3, gated on operator review + E2E coverage parity check
  • Wonderware Historian SDK plugin port (Historian.Aveva → Driver.Galaxy.Host/Backend/Historian/): Task B.1.h, opportunistically with PR 3 or as PR 4
  • MxAccess subscription push frames — Task B.1.s, follow-up to enable real-time data flow (currently subscribes register but values aren't pushed back)
  • Wonderware Historian-backed HistoryRead — depends on B.1.h
  • Alarm subsystem wire-up: MxAccessGalaxyBackend.SubscribeAlarmsAsync is a no-op
  • Reconnect-without-recycle in MxAccessClient — v2.1 refinement
  • Real downstream-consumer cutover (ScadaBridge / Ignition / SystemPlatform IO) — outside this repo

Recommended PR order

  1. PR 1 (phase-1-configuration → v2): merge first; self-contained, parity preserved
  2. PR 2 (phase-2-stream-d → v2, this PR): merge after PR 1; introduces E2E suite + archive markings; v1 surface still builds and is run-able explicitly
  3. PR 3 (next session) — delete v1 archive; depends on operator approval after PR 2 reviewer signoff
  4. PR 4 (Phase 2 follow-up) — Historian port + MxAccess subscription push frames + the open high/medium findings