lmxopcua/docs/v2/implementation/exit-gate-phase-2-closed.md
Joseph Doherty 33b87a3aa4 Phase 2 official close-out. Closes task #209. The 2026-04-18 exit-gate-phase-2-final.md captured Phase 2 state at PR 2 merge — four High/Medium adversarial findings still OPEN, Historian port + alarm subsystem + v1 archive deletion all deferred. Since then: PR 4 closed all four findings end-to-end (High 1 Read subscription-leak, High 2 no reconnect loop, Medium 3 SubscribeAsync doesn't push frames, Medium 4 WriteValuesAsync doesn't await OnWriteComplete — mapped + resolved inline in the new doc), PR 12 landed the richer historian quality mapper, PR 13 shipped GalaxyRuntimeProbeManager with per-Platform/AppEngine ScanState subscriptions + StateChanged events forwarded through the existing OnHostStatusChanged IPC frame, PR 14 wired the alarm subsystem (GalaxyAlarmTracker advising the four alarm-state attributes per IsAlarm=true attribute, raising AlarmTransition events forwarded through OnAlarmEvent IPC frames), Phase 3 PR 18 deleted the v1 source trees, and PR 61 closed V1_ARCHIVE_STATUS.md. Phase 2 is functionally done; this commit is the bookkeeping pass. New exit-gate-phase-2-closed.md at docs/v2/implementation/ — five-stream status table (A/B/C/D/E all complete with the specific close commits named), full resolution table for every 2026-04-18 adversarial finding mapped to the PR 4 resolution, cross-cutting deferrals table marking every one resolved (Historian SDK plugin port → done, subscription push frames → done under Medium 3, Historian-backed HistoryRead → done, alarm subsystem wire-up → done, reconnect-without-recycle → done under High 2, v1 archive deletion → done). Fresh 2026-04-20 test baseline captured from the current v2 tip: 1844 passing + 29 infra-gated skips across 21 test projects, including the net48 x86 Galaxy.Host.Tests suite (107 pass) that exercises the MXAccess COM path on the dev box. 
Flake observed — Configuration.Tests 70/71 on first full-solution run, 71/71 on retry; logged as a known non-stable flake rather than chased because it did not reproduce. The prior exit-gate-phase-2-final.md is kept in place (historical record of the 2026-04-18 snapshot) but gets a superseded-by banner at the top pointing at the new close-out doc so future readers land on current status first. docs/v2/plan.md Phase 2 section header gains the CLOSED 2026-04-20 marker + a link to the close-out doc so the top-level plan index reflects reality. "What Phase 2 closed means for Phase 3 and later" section in the new doc captures the downstream contract: Galaxy now runs as a first-class v2 driver with the same capability-interface shape as Modbus / S7 / AbCip / AbLegacy / TwinCAT / FOCAS / OpcUaClient; no v1 code path remains; the 2026-04-13 stability findings persist as named regression tests under tests/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.E2E/StabilityFindingsRegressionTests.cs so any future refactor reintroducing them trips the test. "Outstanding — not Phase 2 blockers" section lists the four pending non-Phase-2 tasks (#177, #194, #195, #199) so nobody mistakes them for Phase 2 tail work.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 02:00:35 -04:00


Phase 2 Close-Out (2026-04-20)

Supersedes exit-gate-phase-2-final.md (2026-04-18), which captured the state at PR 2 merge. Since then, PR 4 closed all four open High and Medium findings, PR 12 landed the richer historian quality mapper, PR 13 shipped the probe manager, PR 14 shipped the alarm subsystem, Phase 3 PR 18 deleted the v1 source trees, and PR 61 closed V1_ARCHIVE_STATUS.md. Phase 2 is closed.

Status: CLOSED

Every stream in Phase 2 is complete. Every finding from the 2026-04-18 adversarial review is resolved. The v1 archive is deleted. The Galaxy driver runs the full Shared / Host / Proxy topology against live MXAccess on the dev box with all 9 capability interfaces wired end-to-end.

Stream-by-stream

| Stream | Plan §reference | Status | Close commit |
|---|---|---|---|
| A: Driver.Galaxy.Shared | §A.1–A.3 | Complete | PR 1 |
| B: Driver.Galaxy.Host | §B.1–B.10 | Complete: real Win32 pump, Tier C protections, all 3 IGalaxyBackend impls (Stub / DbBacked / MxAccess), probe manager, alarm tracker, Historian wire-up | PR 1 + PR 4 + PR 12 + PR 13 + PR 14 |
| C: Driver.Galaxy.Proxy | §C.1–C.4 | Complete: all 9 capability interfaces, supervisor (Backoff + CircuitBreaker + HeartbeatMonitor), subscription push frames | PR 1 + PR 4 |
| D: Retire legacy Host | §D.1–D.3 | Complete: archive markings landed in PR 2, source tree deletion in Phase 3 PR 18, status doc closed in PR 61 | PR 2 → Phase 3 PR 18 → PR 61 |
| E: Parity validation | §E.1–E.4 | Complete: E2E suite + 4 stability-finding regression tests + HostSubprocessParityTests cross-FX integration | PR 2 |

2026-04-18 adversarial findings — resolved

All four High + Medium items flagged as OPEN at the 2026-04-18 exit gate closed in PR 4 (caa9cb8 Phase 2 PR 4 — close the 4 open high/medium MXAccess findings from exit-gate-phase-2-final.md):

| ID | Finding | Resolution |
|---|---|---|
| High 1 | MxAccess Read subscription-leak on cancellation | One-shot read now wraps subscribe → first OnDataChange → unsubscribe in try/finally. Per-tag callback always detached. If the read installed the underlying subscription (prior _addressToHandle key was absent) it tears it down on the way out: no leaked probe item handles on caller cancel or timeout. |
| High 2 | No MXAccess reconnect loop, only supervisor-driven recycle | MxAccessClient gains MxAccessClientOptions { AutoReconnect, MonitorInterval=5s, StaleThreshold=60s } + a background MonitorLoopAsync started on first ConnectAsync. Checks _lastObservedActivityUtc each interval (bumped by every OnDataChange callback); if stale, probes the proxy with a no-op COM AddItem("$Heartbeat") on the StaPump; on probe failure does reconnect-with-replay: Unregister (best-effort), Register, snapshot _addressToHandle.Keys, clear, re-AddItem every previously-active subscription. ConnectionStateChanged fires on the false→true transition; ReconnectCount bumps. |
| Medium 3 | SubscribeAsync doesn't push OnDataChange frames yet | IGalaxyBackend gains OnDataChange / OnAlarmEvent / OnHostStatusChanged events. New IFrameHandler.AttachConnection(FrameWriter) called per-connection by PipeServer after Hello. GalaxyFrameHandler.ConnectionSink subscribes the events for the connection lifetime, fire-and-forgets pushes as MessageKind.OnDataChangeNotification / AlarmEvent / RuntimeStatusChange frames through the writer, swallows ObjectDisposedException for dispose race, unsubscribes on Dispose. MxAccessGalaxyBackend.SubscribeAsync wires OnTagValueChanged that fans values out per-tag to every subscription listening (one MXAccess subscription, multi-fan-out via _refToSubs reverse map). UnsubscribeAsync only calls mx.UnsubscribeAsync when the last sub for a tag drops. |
| Medium 4 | WriteValuesAsync doesn't await OnWriteComplete | MxAccessClient.WriteAsync rewritten to return Task<bool> via the v1-style TCS-keyed-by-item-handle pattern in _pendingWrites. TCS added before the Write call, awaited with configurable timeout (default 5s), removed in finally. Returns true only when OnWriteComplete reported success. MxAccessGalaxyBackend.WriteValuesAsync reports per-tag Bad_InternalError ("MXAccess runtime reported write failure") when the bool returns false. |
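
The Medium 4 resolution (a completion source keyed by item handle, registered before the write, awaited with a timeout, removed in finally) can be illustrated with a small language-neutral sketch. This is Python with asyncio, not the project's C# code; the `PendingWrites` class, its method names, and the choice to report a timeout as `False` are all assumptions made for illustration, since the source does not say how a timeout is surfaced.

```python
import asyncio


class PendingWrites:
    """Hypothetical helper: one in-flight write per item handle."""

    def __init__(self, timeout_s: float = 5.0):
        self._pending = {}          # item handle -> asyncio.Future[bool]
        self._timeout_s = timeout_s

    async def write(self, handle: int, send) -> bool:
        # Register the future BEFORE issuing the write, so a completion
        # callback that fires immediately still finds its waiter.
        fut = asyncio.get_running_loop().create_future()
        self._pending[handle] = fut
        try:
            send(handle)  # stands in for the real write call
            return await asyncio.wait_for(fut, self._timeout_s)
        except asyncio.TimeoutError:
            # Assumption: a write that never completes counts as a failure.
            return False
        finally:
            # Always drop the entry, whether the write succeeded,
            # failed, timed out, or was cancelled.
            self._pending.pop(handle, None)

    def on_write_complete(self, handle: int, success: bool) -> None:
        # Completion callback: resolve the matching waiter, if any.
        fut = self._pending.get(handle)
        if fut is not None and not fut.done():
            fut.set_result(success)
```

Registering the entry before the write is the point of the pattern: if the order were reversed, a fast completion could arrive while no waiter exists and the caller would then wait out the full timeout.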

Cross-cutting deferrals — resolved

| Deferral | Resolution |
|---|---|
| Deletion of v1 archive | Phase 3 PR 18 deleted the source trees; PR 61 closed V1_ARCHIVE_STATUS.md |
| Wonderware Historian SDK plugin port | Driver.Galaxy.Host/Backend/Historian/ ports the 10 source files (HistorianDataSource, HistorianClusterEndpointPicker, HistorianHealthSnapshot, etc.). MxAccessGalaxyBackend implements HistoryReadAsync / HistoryReadProcessedAsync / HistoryReadAtTimeAsync / HistoryReadEventsAsync. GalaxyProxyDriver.MapAggregateToColumn translates HistoryAggregateType → AnalogSummaryQuery column names on the proxy side so Host stays OPC-UA-free. |
| MxAccess subscription push frames | Closed under Medium 3 above |
| Wonderware Historian-backed HistoryRead | Closed under the Historian port row |
| Alarm subsystem wire-up | PR 14. GalaxyAlarmTracker in Backend/Alarms/ advises the four Galaxy alarm-state attributes per IsAlarm=true attribute (.InAlarm, .Priority, .DescAttrName, .Acked), runs the OPC UA Part 9 lifecycle simplified for the Galaxy AlarmExtension model, raises AlarmTransition events (Active / Acknowledged / Inactive) forwarded through the existing OnAlarmEvent IPC frame. AcknowledgeAlarmAsync writes operator comment to <tag>.AckMsg through the PR 4 TCS-by-handle write path. |
| Reconnect-without-recycle in MxAccessClient | Closed under High 2 (the reconnect-with-replay loop is the "without-recycle" path; supervisor recycle remains the fallback). |
| Real downstream-consumer cutover | Out of scope for this repo; phased Year-3 rollout per docs/v2/plan.md §Rollout, not a Phase 2 deliverable. |
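
The reconnect-with-replay sequence referenced in the High 2 rows (best-effort Unregister, Register, snapshot the address keys, clear, re-add every previously active item) can be sketched as follows. This is an illustrative Python outline, not the MxAccessClient implementation: the function name and the `unregister` / `register` / `add_item` callables are hypothetical stand-ins for the COM calls, and the sketch omits the staleness check and heartbeat probe that decide when to reconnect.

```python
from typing import Callable, Dict, List


def reconnect_with_replay(
    unregister: Callable[[], None],
    register: Callable[[], None],
    add_item: Callable[[str], int],
    address_to_handle: Dict[str, int],
) -> List[str]:
    """Re-establish the session and re-add every previously active address."""
    try:
        unregister()  # best-effort: the old session may already be dead
    except Exception:
        pass
    register()  # fresh session; a failure here propagates to the caller

    # Snapshot the addresses first: the old handles are meaningless
    # after re-registering, so the map is cleared and rebuilt.
    addresses = list(address_to_handle)
    address_to_handle.clear()
    for address in addresses:
        address_to_handle[address] = add_item(address)
    return addresses
```

Letting a `register` failure propagate matches the stated fallback: if the in-process reconnect cannot succeed, the supervisor recycle is still available.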

2026-04-20 test baseline

Full-solution dotnet test ZB.MOM.WW.OtOpcUa.slnx on v2 tip:

| Project | Pass | Skip | Target |
|---|---|---|---|
| Core.Abstractions.Tests | 37 | 0 | net10 |
| Client.Shared.Tests | 136 | 0 | net10 |
| Client.CLI.Tests | 52 | 0 | net10 |
| Client.UI.Tests | 98 | 0 | net10 |
| Driver.S7.Tests | 58 | 0 | net10 |
| Driver.Modbus.Tests | 182 | 0 | net10 |
| Driver.Modbus.IntegrationTests | 2 | 21 | net10 (Docker-gated) |
| Driver.AbLegacy.Tests | 96 | 0 | net10 |
| Driver.AbCip.Tests | 211 | 0 | net10 |
| Driver.AbCip.IntegrationTests | 11 | 1 | net10 (ab_server-gated) |
| Driver.TwinCAT.Tests | 110 | 0 | net10 |
| Driver.OpcUaClient.Tests | 78 | 0 | net10 |
| Driver.FOCAS.Tests | 119 | 0 | net10 |
| Driver.Galaxy.Shared.Tests | 6 | 0 | net10 |
| Driver.Galaxy.Proxy.Tests | 18 | 7 | net10 (live-Galaxy-gated) |
| Driver.Galaxy.Host.Tests | 107 | 0 | net48 x86 |
| Analyzers.Tests | 5 | 0 | net10 |
| Core.Tests | 182 | 0 | net10 |
| Configuration.Tests | 71 | 0 | net10 |
| Admin.Tests | 92 | 0 | net10 |
| Server.Tests | 173 | 0 | net10 |
| Total | 1844 | 29 | |

Observed flake: Configuration.Tests went 70/71 on the first full-solution run and 71/71 on re-run (the table shows the re-run result). The failure did not reproduce, so it is logged as a known flake rather than a regression until it recurs.

Skips are all infra-gated:

  • Modbus 21 skips — oitc/modbus-server Docker container not started.
  • AbCip 1 skip — libplctag ab_server binary not on PATH.
  • Galaxy.Proxy 7 skips — live Galaxy stack not reachable from the current shell (admin-token pipe ACL).

What "Phase 2 closed" means for Phase 3 and later

  • Galaxy runs as a first-class v2 driver, with the same capability-interface contract as Modbus / S7 / AbCip / AbLegacy / TwinCAT / FOCAS / OpcUaClient.
  • No v1 code path remains. Anything invoking the ZB.MOM.WW.LmxOpcUa.* namespaces is historical; any future work routes through Driver.Galaxy.Proxy + the named-pipe IPC.
  • The 2026-04-13 stability findings live on as named regression tests under tests/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.E2E/StabilityFindingsRegressionTests.cs — a future refactor that reintroduces any of those four defects trips the test.
  • Aveva Historian integration is wired end-to-end; new driver families don't need Historian-specific plumbing in the IPC — they just implement IHistoryProvider.

Outstanding — not Phase 2 blockers

  • AB CIP whole-UDT read optimization (task #194) — niche performance win for large UDT reads; current per-member fan-out works correctly.
  • AB CIP IAlarmSource via tag-projected ALMA/ALMD (task #177) — AB CIP driver doesn't currently expose alarms; feature-flagged follow-up.
  • IdentificationFolderBuilder wire-in (task #195) — blocked on Equipment node walker.
  • UnsTab Playwright E2E (task #199) — infra setup PR.

None of these are Phase 2 scope; all are tracked independently.