Phase 2 Close-Out (2026-04-20)
Supersedes exit-gate-phase-2-final.md (2026-04-18), which captured the state at PR 2 merge. Between that doc and today, PR 4 closed all open high + medium findings, PR 13 shipped the probe manager, PR 14 shipped the alarm subsystem, and PR 61 closed the v1 archive deletion. Phase 2 is closed.
Status: CLOSED
Every stream in Phase 2 is complete. Every finding from the 2026-04-18 adversarial review
is resolved. The v1 archive is deleted. The Galaxy driver runs the full
Shared / Host / Proxy topology against live MXAccess on the dev box with all 9
capability interfaces wired end-to-end.
Stream-by-stream
| Stream | Plan §reference | Status | Close commit |
|---|---|---|---|
| A — Driver.Galaxy.Shared | §A.1–A.3 | ✅ Complete | PR 1 |
| B — Driver.Galaxy.Host | §B.1–B.10 | ✅ Complete — real Win32 pump, Tier C protections, all 3 IGalaxyBackend impls (Stub / DbBacked / MxAccess), probe manager, alarm tracker, Historian wire-up | PR 1 + PR 4 + PR 12 + PR 13 + PR 14 |
| C — Driver.Galaxy.Proxy | §C.1–C.4 | ✅ Complete — all 9 capability interfaces, supervisor (Backoff + CircuitBreaker + HeartbeatMonitor), subscription push frames | PR 1 + PR 4 |
| D — Retire legacy Host | §D.1–D.3 | ✅ Complete — archive markings landed in PR 2, source tree deletion in Phase 3 PR 18, status doc closed in PR 61 | PR 2 → Phase 3 PR 18 → PR 61 |
| E — Parity validation | §E.1–E.4 | ✅ Complete — E2E suite + 4 stability-finding regression tests + HostSubprocessParityTests cross-FX integration | PR 2 |
2026-04-18 adversarial findings — resolved
All four High + Medium items flagged as OPEN at the 2026-04-18 exit gate were closed in PR 4
(caa9cb8, "Phase 2 PR 4 — close the 4 open high/medium MXAccess findings from exit-gate-phase-2-final.md"):
| ID | Finding | Resolution |
|---|---|---|
| High 1 | MxAccess Read subscription-leak on cancellation | One-shot read now wraps subscribe → first OnDataChange → unsubscribe in try/finally. Per-tag callback always detached. If the read installed the underlying subscription (prior _addressToHandle key was absent) it tears it down on the way out — no leaked probe item handles on caller cancel or timeout. |
| High 2 | No MXAccess reconnect loop, only supervisor-driven recycle | MxAccessClient gains MxAccessClientOptions { AutoReconnect, MonitorInterval=5s, StaleThreshold=60s } + a background MonitorLoopAsync started on first ConnectAsync. Checks _lastObservedActivityUtc each interval (bumped by every OnDataChange callback); if stale, probes the proxy with a no-op COM AddItem("$Heartbeat") on the StaPump; on probe failure does reconnect-with-replay — Unregister (best-effort), Register, snapshot _addressToHandle.Keys, clear, re-AddItem every previously-active subscription. ConnectionStateChanged fires on the false→true transition; ReconnectCount bumps. |
| Medium 3 | SubscribeAsync doesn't push OnDataChange frames yet | IGalaxyBackend gains OnDataChange / OnAlarmEvent / OnHostStatusChanged events. New IFrameHandler.AttachConnection(FrameWriter) called per-connection by PipeServer after Hello. GalaxyFrameHandler.ConnectionSink subscribes the events for the connection lifetime, fire-and-forgets pushes as MessageKind.OnDataChangeNotification / AlarmEvent / RuntimeStatusChange frames through the writer, swallows ObjectDisposedException for dispose race, unsubscribes on Dispose. MxAccessGalaxyBackend.SubscribeAsync wires OnTagValueChanged that fans values out per-tag to every subscription listening (one MXAccess subscription, multi-fan-out via _refToSubs reverse map). UnsubscribeAsync only calls mx.UnsubscribeAsync when the last sub for a tag drops. |
| Medium 4 | WriteValuesAsync doesn't await OnWriteComplete | MxAccessClient.WriteAsync rewritten to return Task<bool> via the v1-style TCS-keyed-by-item-handle pattern in _pendingWrites. TCS added before the Write call, awaited with configurable timeout (default 5s), removed in finally. Returns true only when OnWriteComplete reported success. MxAccessGalaxyBackend.WriteValuesAsync reports per-tag Bad_InternalError ("MXAccess runtime reported write failure") when the bool returns false. |
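
The Medium 4 resolution above can be illustrated with a minimal sketch of the TCS-keyed-by-item-handle pattern. This is not the actual MxAccessClient code: the class name `PendingWriteTable`, the `startWrite` delegate standing in for the COM Write call, and the rule that only one write may be pending per handle are all assumptions made for illustration. It shows the ordering the row describes: register the TaskCompletionSource before issuing the write, await it with a timeout, and remove it in `finally`.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

// Hypothetical stand-in for the write-completion bookkeeping (not the real MxAccessClient).
public sealed class PendingWriteTable
{
    // One TaskCompletionSource per in-flight write, keyed by item handle.
    private readonly ConcurrentDictionary<int, TaskCompletionSource<bool>> _pendingWrites =
        new ConcurrentDictionary<int, TaskCompletionSource<bool>>();

    // startWrite is an assumed placeholder for the underlying Write call.
    public async Task<bool> WriteAsync(int itemHandle, Action startWrite, TimeSpan timeout)
    {
        var tcs = new TaskCompletionSource<bool>(TaskCreationOptions.RunContinuationsAsynchronously);

        // Register BEFORE issuing the write so a fast completion callback is never missed.
        // Assumption: at most one pending write per handle.
        if (!_pendingWrites.TryAdd(itemHandle, tcs))
            throw new InvalidOperationException($"Write already pending for handle {itemHandle}.");

        try
        {
            startWrite();
            var finished = await Task.WhenAny(tcs.Task, Task.Delay(timeout)).ConfigureAwait(false);

            // A timeout counts as failure; otherwise return what the callback reported.
            return finished == tcs.Task && await tcs.Task.ConfigureAwait(false);
        }
        finally
        {
            // Always clean up, whether the write succeeded, failed, timed out, or threw.
            _pendingWrites.TryRemove(itemHandle, out _);
        }
    }

    // Called from the write-complete callback path with the reported outcome.
    public void OnWriteComplete(int itemHandle, bool success)
    {
        if (_pendingWrites.TryGetValue(itemHandle, out var tcs))
            tcs.TrySetResult(success);
    }
}
```

Under these assumptions, a caller would map a `false` result to a per-tag bad status, as the row describes for WriteValuesAsync. A completion callback that arrives after the timeout finds no entry and is ignored.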
Cross-cutting deferrals — resolved
| Deferral | Resolution |
|---|---|
| Deletion of v1 archive | Phase 3 PR 18 deleted the source trees; PR 61 closed V1_ARCHIVE_STATUS.md |
| Wonderware Historian SDK plugin port | Driver.Galaxy.Host/Backend/Historian/ ports the 10 source files (HistorianDataSource, HistorianClusterEndpointPicker, HistorianHealthSnapshot, etc.). MxAccessGalaxyBackend implements HistoryReadAsync / HistoryReadProcessedAsync / HistoryReadAtTimeAsync / HistoryReadEventsAsync. GalaxyProxyDriver.MapAggregateToColumn translates HistoryAggregateType → AnalogSummaryQuery column names on the proxy side so Host stays OPC-UA-free. |
| MxAccess subscription push frames | Closed under Medium 3 above |
| Wonderware Historian-backed HistoryRead | Closed under the Historian port row |
| Alarm subsystem wire-up | PR 14. GalaxyAlarmTracker in Backend/Alarms/ advises the four Galaxy alarm-state attributes per IsAlarm=true attribute (.InAlarm, .Priority, .DescAttrName, .Acked), runs the OPC UA Part 9 lifecycle simplified for the Galaxy AlarmExtension model, raises AlarmTransition events (Active / Acknowledged / Inactive) forwarded through the existing OnAlarmEvent IPC frame. AcknowledgeAlarmAsync writes operator comment to <tag>.AckMsg through the PR 4 TCS-by-handle write path. |
| Reconnect-without-recycle in MxAccessClient | Closed under High 2 (reconnect-with-replay loop is the "without-recycle" path — supervisor recycle remains the fallback). |
| Real downstream-consumer cutover | Out of scope for this repo; phased Year-3 rollout per docs/v2/plan.md §Rollout — not a Phase 2 deliverable. |
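
The alarm wire-up row above names three transitions (Active / Acknowledged / Inactive) driven by the advised .InAlarm and .Acked attributes. A rough sketch of how such a simplified lifecycle might be derived from those two booleans follows. The type names `AlarmTransitionKind` and `AlarmStateSketch` and the exact transition rules are assumptions for illustration; the real GalaxyAlarmTracker is not shown in this document and may differ.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical enum mirroring the three transitions named in the table.
public enum AlarmTransitionKind { Active, Acknowledged, Inactive }

// Hypothetical per-alarm state holder; the rules below are an assumed simplification.
public sealed class AlarmStateSketch
{
    private bool _inAlarm;
    private bool _acked;

    // Feed the latest .InAlarm / .Acked values; returns the transitions they imply.
    public IReadOnlyList<AlarmTransitionKind> Update(bool inAlarm, bool acked)
    {
        var transitions = new List<AlarmTransitionKind>();

        // Rising edge of InAlarm: the alarm became active.
        if (inAlarm && !_inAlarm)
            transitions.Add(AlarmTransitionKind.Active);

        // Rising edge of Acked while the alarm is active: operator acknowledged it.
        if (inAlarm && acked && !_acked)
            transitions.Add(AlarmTransitionKind.Acknowledged);

        // Falling edge of InAlarm: the alarm cleared.
        if (!inAlarm && _inAlarm)
            transitions.Add(AlarmTransitionKind.Inactive);

        _inAlarm = inAlarm;
        _acked = acked;
        return transitions;
    }
}
```

In this sketch, repeated updates with unchanged values produce no transitions, so a forwarding layer would only emit an IPC frame on an actual state change.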
2026-04-20 test baseline
Full-solution dotnet test ZB.MOM.WW.OtOpcUa.slnx on v2 tip:
| Project | Pass | Skip | Target |
|---|---|---|---|
| Core.Abstractions.Tests | 37 | 0 | net10 |
| Client.Shared.Tests | 136 | 0 | net10 |
| Client.CLI.Tests | 52 | 0 | net10 |
| Client.UI.Tests | 98 | 0 | net10 |
| Driver.S7.Tests | 58 | 0 | net10 |
| Driver.Modbus.Tests | 182 | 0 | net10 |
| Driver.Modbus.IntegrationTests | 2 | 21 | net10 (Docker-gated) |
| Driver.AbLegacy.Tests | 96 | 0 | net10 |
| Driver.AbCip.Tests | 211 | 0 | net10 |
| Driver.AbCip.IntegrationTests | 11 | 1 | net10 (ab_server-gated) |
| Driver.TwinCAT.Tests | 110 | 0 | net10 |
| Driver.OpcUaClient.Tests | 78 | 0 | net10 |
| Driver.FOCAS.Tests | 119 | 0 | net10 |
| Driver.Galaxy.Shared.Tests | 6 | 0 | net10 |
| Driver.Galaxy.Proxy.Tests | 18 | 7 | net10 (live-Galaxy-gated) |
| Driver.Galaxy.Host.Tests | 107 | 0 | net48 x86 |
| Analyzers.Tests | 5 | 0 | net10 |
| Core.Tests | 182 | 0 | net10 |
| Configuration.Tests | 71 | 0 | net10 |
| Admin.Tests | 92 | 0 | net10 |
| Server.Tests | 173 | 0 | net10 |
| Total | 1844 | 29 | |
Observed flake: one Configuration.Tests failure on the first full-solution run turned green on re-run. Not a stable regression; logged as a known flake until it reproduces.
Skips are all infra-gated:
- Modbus 21 skips — oitc/modbus-server Docker container not started.
- AbCip 1 skip — libplctag ab_server binary not on PATH.
- Galaxy.Proxy 7 skips — live Galaxy stack not reachable from the current shell (admin-token pipe ACL).
What "Phase 2 closed" means for Phase 3 and later
- Galaxy runs as a first-class v2 driver, with the same capability-interface contract as Modbus / S7 / AbCip / AbLegacy / TwinCAT / FOCAS / OpcUaClient.
- No v1 code path remains. Anything invoking the ZB.MOM.WW.LmxOpcUa.* namespaces is historical; any future work routes through Driver.Galaxy.Proxy + the named-pipe IPC.
- The 2026-04-13 stability findings live on as named regression tests under tests/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.E2E/StabilityFindingsRegressionTests.cs — a future refactor that reintroduces any of those four defects trips the test.
- Aveva Historian integration is wired end-to-end; new driver families don't need Historian-specific plumbing in the IPC — they just implement IHistoryProvider.
Outstanding — not Phase 2 blockers
- AB CIP whole-UDT read optimization (task #194) — niche performance win for large UDT reads; current per-member fan-out works correctly.
- AB CIP IAlarmSource via tag-projected ALMA/ALMD (task #177) — AB CIP driver doesn't currently expose alarms; feature-flagged follow-up.
- IdentificationFolderBuilder wire-in (task #195) — blocked on Equipment node walker.
- UnsTab Playwright E2E (task #199) — infra setup PR.
None of these are Phase 2 scope; all are tracked independently.