lmxopcua/docs/v2/implementation/exit-gate-phase-2-final.md
Joseph Doherty 33b87a3aa4 Phase 2 official close-out. Closes task #209. The 2026-04-18 exit-gate-phase-2-final.md captured Phase 2 state at PR 2 merge — four High/Medium adversarial findings still OPEN, Historian port + alarm subsystem + v1 archive deletion all deferred. Since then: PR 4 closed all four findings end-to-end (High 1 Read subscription-leak, High 2 no reconnect loop, Medium 3 SubscribeAsync doesn't push frames, Medium 4 WriteValuesAsync doesn't await OnWriteComplete — mapped + resolved inline in the new doc), PR 12 landed the richer historian quality mapper, PR 13 shipped GalaxyRuntimeProbeManager with per-Platform/AppEngine ScanState subscriptions + StateChanged events forwarded through the existing OnHostStatusChanged IPC frame, PR 14 wired the alarm subsystem (GalaxyAlarmTracker advising the four alarm-state attributes per IsAlarm=true attribute, raising AlarmTransition events forwarded through OnAlarmEvent IPC frames), Phase 3 PR 18 deleted the v1 source trees, and PR 61 closed V1_ARCHIVE_STATUS.md. Phase 2 is functionally done; this commit is the bookkeeping pass. New exit-gate-phase-2-closed.md at docs/v2/implementation/ — five-stream status table (A/B/C/D/E all complete with the specific close commits named), full resolution table for every 2026-04-18 adversarial finding mapped to the PR 4 resolution, cross-cutting deferrals table marking every one resolved (Historian SDK plugin port → done, subscription push frames → done under Medium 3, Historian-backed HistoryRead → done, alarm subsystem wire-up → done, reconnect-without-recycle → done under High 2, v1 archive deletion → done). Fresh 2026-04-20 test baseline captured from the current v2 tip: 1844 passing + 29 infra-gated skips across 21 test projects, including the net48 x86 Galaxy.Host.Tests suite (107 pass) that exercises the MXAccess COM path on the dev box. 
Flake observed — Configuration.Tests 70/71 on first full-solution run, 71/71 on retry; logged as a known non-stable flake rather than chased because it did not reproduce. The prior exit-gate-phase-2-final.md is kept in place (historical record of the 2026-04-18 snapshot) but gets a superseded-by banner at the top pointing at the new close-out doc so future readers land on current status first. docs/v2/plan.md Phase 2 section header gains the CLOSED 2026-04-20 marker + a link to the close-out doc so the top-level plan index reflects reality. "What Phase 2 closed means for Phase 3 and later" section in the new doc captures the downstream contract: Galaxy now runs as a first-class v2 driver with the same capability-interface shape as Modbus / S7 / AbCip / AbLegacy / TwinCAT / FOCAS / OpcUaClient; no v1 code path remains; the 2026-04-13 stability findings persist as named regression tests under tests/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.E2E/StabilityFindingsRegressionTests.cs so any future refactor reintroducing them trips the test. "Outstanding — not Phase 2 blockers" section lists the four pending non-Phase-2 tasks (#177, #194, #195, #199) so nobody mistakes them for Phase 2 tail work.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 02:00:35 -04:00


Phase 2 Final Exit Gate (2026-04-18)

⚠️ Superseded by exit-gate-phase-2-closed.md (2026-04-20). This doc captures the snapshot at PR 2 merge — when the four High + Medium findings in the adversarial review were still OPEN and Historian port + alarm subsystem were still deferred. All of those closed subsequently (PR 4 + PR 12 + PR 13 + PR 14 + PR 61). Kept as historical evidence; consult the close-out doc for current Phase 2 status.

Supersedes phase-2-partial-exit-evidence.md and exit-gate-phase-2.md. Captures the as-built state at the close of Phase 2 work delivered across two PRs.

Status: All five Phase 2 streams addressed. Stream D split across PR 2 (archive) + PR 3 (delete) per safety protocol.

Stream-by-stream status

| Stream | Plan §reference | Status | PR |
|---|---|---|---|
| A — Driver.Galaxy.Shared | §A.1-A.3 | Complete | PR 1 (merged or pending) |
| B — Driver.Galaxy.Host | §B.1-B.10 | Real Win32 pump, all Tier C protections, all 3 IGalaxyBackend impls (Stub / DbBacked / MxAccess with live COM) | PR 1 |
| C — Driver.Galaxy.Proxy | §C.1-C.4 | All 9 capability interfaces + supervisor (Backoff + CircuitBreaker + HeartbeatMonitor) | PR 1 |
| D — Retire legacy Host | §D.1-D.3 | Migration script, installer scripts, Stream D procedure doc, archive markings on all v1 surface (this PR 2), deletion deferred to PR 3 | PR 2 (this) + PR 3 (next) |
| E — Parity validation | §E.1-E.4 | E2E test scaffold + 4 stability-finding regression tests + HostSubprocessParityTests cross-FX integration | PR 2 (this) |

What changed in PR 2 (this branch phase-2-stream-d)

  1. tests/ZB.MOM.WW.OtOpcUa.Tests/ renamed to tests/ZB.MOM.WW.OtOpcUa.Tests.v1Archive/. <AssemblyName> is kept as ZB.MOM.WW.OtOpcUa.Tests so the v1 Host's InternalsVisibleTo still matches, and <IsTestProject>false</IsTestProject> is set so the solution-level dotnet test on the .slnx excludes it.
  2. Three other v1 projects archive-marked with PropertyGroup comments: OtOpcUa.Host, Historian.Aveva, IntegrationTests. IntegrationTests also gets <IsTestProject>false</IsTestProject>.
  3. New tests/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.E2E/ project (.NET 10):
    • ParityFixture spawns OtOpcUa.Driver.Galaxy.Host.exe (net48 x86) as a subprocess via Process.Start, connects over a real named pipe, and exposes a connected GalaxyProxyDriver. Skips when Galaxy ZB is unreachable, when the Host EXE is not built, or when running as Administrator (PipeAcl denies admins).
    • RecordingAddressSpaceBuilder captures Folder + Variable + Property registrations so parity tests can assert shape.
    • HierarchyParityTests (3) — Discover returns gobjects with attributes; attribute full references match tag.attribute shape; HistoryExtension flag flows through.
    • StabilityFindingsRegressionTests (4) — one test per 2026-04-13 finding: phantom-probe-doesn't-corrupt-status, host-status-event-is-scoped, all-async-no-sync-over-async, AcknowledgeAsync-completes-before-returning.
  4. docs/v2/V1_ARCHIVE_STATUS.md — inventory + deletion plan for PR 3.
  5. docs/v2/implementation/exit-gate-phase-2-final.md (this doc) — supersedes the two partial-exit docs.
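
The three ParityFixture skip conditions listed in item 3 can be sketched as a pure decision function. This is a Python illustration only, not the fixture's C# code: `skip_reason` and its parameter names are hypothetical, and the order in which the real fixture evaluates the conditions is an assumption.

```python
from typing import Optional


def skip_reason(zb_reachable: bool, host_exe_built: bool, is_admin: bool) -> Optional[str]:
    """Return why the parity tests should skip, or None when they can run.

    Hypothetical helper; the check order here is assumed, not taken from the repo.
    """
    if not zb_reachable:
        return "Galaxy ZB unreachable"
    if not host_exe_built:
        return "Host EXE not built"
    if is_admin:
        # The doc states that PipeAcl denies administrators.
        return "running as Administrator (PipeAcl denies admins)"
    return None
```

Returning a reason string rather than a bare boolean keeps each skip documented in the test output, which matches the "all skip with documented reason" note in the test counts.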

Test counts

Solution-level dotnet test ZB.MOM.WW.OtOpcUa.slnx: 470 pass / 7 skip / 1 baseline failure.

| Project | Pass | Skip |
|---|---|---|
| Core.Abstractions.Tests | 24 | 0 |
| Configuration.Tests | 42 | 0 |
| Core.Tests | 4 | 0 |
| Server.Tests | 2 | 0 |
| Admin.Tests | 21 | 0 |
| Driver.Galaxy.Shared.Tests | 6 | 0 |
| Driver.Galaxy.Host.Tests | 30 | 0 |
| Driver.Galaxy.Proxy.Tests | 10 | 0 |
| Driver.Galaxy.E2E (NEW) | 0 | 7 (all skip with documented reason — admin shell) |
| Client.Shared.Tests | 131 | 0 |
| Client.UI.Tests | 98 | 0 |
| Client.CLI.Tests | 51 / 1 fail | 0 |
| Historian.Aveva.Tests | 41 | 0 |
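
The per-project rows can be cross-checked against the solution-level headline with a short tally. This is a Python sketch; the dictionary is transcribed from the table above, and `tally` is a hypothetical helper. As listed, the rows add up to 460 passes, 7 skips, and 1 failure, whereas the headline reports 470 passes; the source does not say which figure is stale.

```python
# (pass, skip, fail) per project, transcribed from the table above.
RESULTS = {
    "Core.Abstractions.Tests": (24, 0, 0),
    "Configuration.Tests": (42, 0, 0),
    "Core.Tests": (4, 0, 0),
    "Server.Tests": (2, 0, 0),
    "Admin.Tests": (21, 0, 0),
    "Driver.Galaxy.Shared.Tests": (6, 0, 0),
    "Driver.Galaxy.Host.Tests": (30, 0, 0),
    "Driver.Galaxy.Proxy.Tests": (10, 0, 0),
    "Driver.Galaxy.E2E": (0, 7, 0),
    "Client.Shared.Tests": (131, 0, 0),
    "Client.UI.Tests": (98, 0, 0),
    "Client.CLI.Tests": (51, 0, 1),
    "Historian.Aveva.Tests": (41, 0, 0),
}


def tally(results):
    """Sum pass/skip/fail counts across all projects."""
    passed = sum(p for p, _, _ in results.values())
    skipped = sum(s for _, s, _ in results.values())
    failed = sum(f for _, _, f in results.values())
    return passed, skipped, failed
```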

Excluded from solution run (run explicitly when needed):

  • OtOpcUa.Tests.v1Archive — 494 pass (v1 unit tests, kept as parity reference)
  • OtOpcUa.IntegrationTests — 6 pass (v1 integration tests, kept as parity reference)

Adversarial review of the PR 2 diff

Independent pass over the PR 2 deltas. New findings ranked by severity; existing findings from the previous exit-gate doc still apply.

New findings

Medium 1 — IsTestProject=false on OtOpcUa.IntegrationTests removes the safety net. The 6 v1 integration tests no longer run on solution test. Mitigation: the new E2E suite covers the same scenarios in the v2 topology shape. Risk: if E2E test count regresses or fails to cover a scenario, the v1 fallback isn't auto-checked. Procedure: PR 3 checklist includes "E2E test count covers v1 IntegrationTests' 6 scenarios at minimum".

Medium 2 — Stability-finding regression tests #2, #3, #4 are structural (reflection-based), not behavioral. Findings #2 and #3 use type-shape assertions (event signature carries HostName; methods return Task) rather than triggering the actual race. Mitigation: the v1 defects were structural — fixing them required interface changes that the type-shape assertions catch. Risk: a future refactor that re-introduces sync-over-async via a non-async helper called inside a Task method wouldn't trip the test. Filed as v2.1: add a runtime async-call-stack analyzer (Roslyn or post-build).
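
The gap described in Medium 2 can be illustrated with a loose Python analog of a type-shape assertion. None of this is the repo's C# test code: `blocking_helper`, `read_value`, and `has_async_shape` are hypothetical names, and `inspect.iscoroutinefunction` stands in for the reflection check that a method returns Task.

```python
import inspect
import time


def blocking_helper() -> int:
    """Non-async helper that blocks the calling thread."""
    time.sleep(0.01)
    return 42


async def read_value() -> int:
    """Async by signature, but it blocks inside: the shape the doc warns about."""
    return blocking_helper()


def has_async_shape(func) -> bool:
    """Structural check: inspects only the declared signature, never the body."""
    return inspect.iscoroutinefunction(func)
```

Here `has_async_shape(read_value)` is true even though the body blocks, which is why a structural assertion alone cannot catch this kind of regression and a behavioral or call-stack check is needed.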

Low 1 — ParityFixture defaults to OTOPCUA_GALAXY_BACKEND=db (not mxaccess). Discover works against ZB without needing live MXAccess. The MXAccess-required tests will need a second fixture once they're written.

Low 2 — Process.Start(EnvironmentVariables) doesn't always inherit clean state. The test inherits the parent's PATH + locale, which is normally fine but could mask a missing runtime dependency. Mitigation: in CI, pin a clean environment block.
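
One way to pin a clean environment block, as the Low 2 mitigation suggests, is to build the child's environment from an explicit allow-list instead of inheriting everything. The sketch below is Python rather than the fixture's C#; `ALLOWED`, `pinned_env`, and `spawn_with_pinned_env` are hypothetical, and the allow-list contents are an assumption about what a Windows child process typically needs.

```python
import os
import subprocess
from typing import Dict, Iterable, Mapping, Optional, Sequence

# Hypothetical allow-list; a real CI job would pin whatever the Host actually needs.
ALLOWED = ("PATH", "SYSTEMROOT", "TEMP", "TMP")


def pinned_env(base: Mapping[str, str],
               allowed: Iterable[str] = ALLOWED,
               extra: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Copy only allow-listed variables from base, then apply explicit overrides."""
    env = {name: base[name] for name in allowed if name in base}
    env.update(extra or {})
    return env


def spawn_with_pinned_env(args: Sequence[str],
                          extra: Optional[Mapping[str, str]] = None):
    """Run a child process that sees only the pinned environment."""
    return subprocess.run(args, env=pinned_env(os.environ, extra=extra),
                          capture_output=True, text=True, check=True)
```

For example, `pinned_env(os.environ, extra={"OTOPCUA_GALAXY_BACKEND": "db"})` carries the backend selection from Low 1 while dropping unrelated parent variables, so a missing runtime dependency surfaces instead of being masked.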

Existing findings (carried forward from exit-gate-phase-2.md)

All 8 still apply unchanged. Particularly:

  • High 1 (MxAccess Read subscription-leak on cancellation) — open
  • High 2 (no MXAccess reconnect loop, only supervisor-driven recycle) — open
  • Medium 3 (SubscribeAsync doesn't push OnDataChange frames yet) — open
  • Medium 4 (WriteValuesAsync doesn't await OnWriteComplete) — open
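
High 1 concerns a read that leaves its subscription behind when cancelled. A common way to avoid that class of leak is to release the subscription in a finally block, sketched here in Python with asyncio. This is not the repo's fix: `FakeBackend`, `read_once`, and the tag name are hypothetical stand-ins, and the real MXAccess client API is not shown in this doc.

```python
import asyncio


class FakeBackend:
    """Hypothetical stand-in for a subscription-based client."""

    def __init__(self):
        self.active = {}   # handle -> callback
        self._next = 0

    def subscribe(self, tag, on_value):
        self._next += 1
        self.active[self._next] = on_value
        return self._next

    def unsubscribe(self, handle):
        self.active.pop(handle, None)


async def read_once(backend, tag):
    """Subscribe, wait for the first value, and always release the subscription."""
    first_value = asyncio.get_running_loop().create_future()

    def on_value(value):
        if not first_value.done():
            first_value.set_result(value)

    handle = backend.subscribe(tag, on_value)
    try:
        return await first_value
    finally:
        # Runs on success, on error, and on cancellation.
        backend.unsubscribe(handle)


async def demo():
    """Cancel a pending read and report how many subscriptions remain."""
    backend = FakeBackend()
    task = asyncio.create_task(read_once(backend, "Tank1.Level"))
    await asyncio.sleep(0)   # let the read reach its await
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass
    return len(backend.active)
```

Running `asyncio.run(demo())` leaves no active subscriptions, because cancellation unwinds through the finally block.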

Cross-cutting deferrals (out of Phase 2)

  • Deletion of v1 archive — PR 3, gated on operator review + E2E coverage parity check
  • Wonderware Historian SDK plugin port (Historian.Aveva → Driver.Galaxy.Host/Backend/Historian/) — Task B.1.h, opportunistically with PR 3 or as PR 4
  • MxAccess subscription push frames — Task B.1.s, follow-up to enable real-time data flow (currently subscriptions register but values aren't pushed back)
  • Wonderware Historian-backed HistoryRead — depends on B.1.h
  • Alarm subsystem wire-up: MxAccessGalaxyBackend.SubscribeAlarmsAsync is a no-op
  • Reconnect-without-recycle in MxAccessClient — v2.1 refinement
  • Real downstream-consumer cutover (ScadaBridge / Ignition / SystemPlatform IO) — outside this repo

Recommended merge order

  1. PR 1 (phase-1-configuration → v2) — merge first; self-contained, parity preserved
  2. PR 2 (phase-2-stream-d → v2, this PR) — merge after PR 1; introduces E2E suite + archive markings; v1 surface still builds and can be run explicitly
  3. PR 3 (next session) — delete v1 archive; depends on operator approval after PR 2 reviewer signoff
  4. PR 4 (Phase 2 follow-up) — Historian port + MxAccess subscription push frames + the open high/medium findings