lmxopcua/docs/v2/implementation/exit-gate-phase-2-closed.md
Joseph Doherty 33b87a3aa4 Phase 2 official close-out. Closes task #209. The 2026-04-18 exit-gate-phase-2-final.md captured Phase 2 state at PR 2 merge — four High/Medium adversarial findings still OPEN, Historian port + alarm subsystem + v1 archive deletion all deferred. Since then: PR 4 closed all four findings end-to-end (High 1 Read subscription-leak, High 2 no reconnect loop, Medium 3 SubscribeAsync doesn't push frames, Medium 4 WriteValuesAsync doesn't await OnWriteComplete — mapped + resolved inline in the new doc), PR 12 landed the richer historian quality mapper, PR 13 shipped GalaxyRuntimeProbeManager with per-Platform/AppEngine ScanState subscriptions + StateChanged events forwarded through the existing OnHostStatusChanged IPC frame, PR 14 wired the alarm subsystem (GalaxyAlarmTracker advising the four alarm-state attributes per IsAlarm=true attribute, raising AlarmTransition events forwarded through OnAlarmEvent IPC frames), Phase 3 PR 18 deleted the v1 source trees, and PR 61 closed V1_ARCHIVE_STATUS.md. Phase 2 is functionally done; this commit is the bookkeeping pass. New exit-gate-phase-2-closed.md at docs/v2/implementation/ — five-stream status table (A/B/C/D/E all complete with the specific close commits named), full resolution table for every 2026-04-18 adversarial finding mapped to the PR 4 resolution, cross-cutting deferrals table marking every one resolved (Historian SDK plugin port → done, subscription push frames → done under Medium 3, Historian-backed HistoryRead → done, alarm subsystem wire-up → done, reconnect-without-recycle → done under High 2, v1 archive deletion → done). Fresh 2026-04-20 test baseline captured from the current v2 tip: 1844 passing + 29 infra-gated skips across 21 test projects, including the net48 x86 Galaxy.Host.Tests suite (107 pass) that exercises the MXAccess COM path on the dev box. 
Flake observed — Configuration.Tests 70/71 on first full-solution run, 71/71 on retry; logged as a known non-stable flake rather than chased because it did not reproduce. The prior exit-gate-phase-2-final.md is kept in place (historical record of the 2026-04-18 snapshot) but gets a superseded-by banner at the top pointing at the new close-out doc so future readers land on current status first. docs/v2/plan.md Phase 2 section header gains the CLOSED 2026-04-20 marker + a link to the close-out doc so the top-level plan index reflects reality. "What Phase 2 closed means for Phase 3 and later" section in the new doc captures the downstream contract: Galaxy now runs as a first-class v2 driver with the same capability-interface shape as Modbus / S7 / AbCip / AbLegacy / TwinCAT / FOCAS / OpcUaClient; no v1 code path remains; the 2026-04-13 stability findings persist as named regression tests under tests/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.E2E/StabilityFindingsRegressionTests.cs so any future refactor reintroducing them trips the test. "Outstanding — not Phase 2 blockers" section lists the four pending non-Phase-2 tasks (#177, #194, #195, #199) so nobody mistakes them for Phase 2 tail work.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 02:00:35 -04:00

# Phase 2 Close-Out (2026-04-20)
> Supersedes `exit-gate-phase-2-final.md` (2026-04-18), which captured the state at PR 2
> merge. Between that doc and today, PR 4 closed all open high + medium findings, PR 13
> shipped the probe manager, PR 14 shipped the alarm subsystem, Phase 3 PR 18 deleted the
> v1 source trees, and PR 61 closed `V1_ARCHIVE_STATUS.md`. Phase 2 is closed.
## Status: **CLOSED**
Every stream in Phase 2 is complete. Every finding from the 2026-04-18 adversarial review
is resolved. The v1 archive is deleted. The Galaxy driver runs the full
`Shared` / `Host` / `Proxy` topology against live MXAccess on the dev box with all 9
capability interfaces wired end-to-end.
## Stream-by-stream
| Stream | Plan §reference | Status | Close commit |
|---|---|---|---|
| A: Driver.Galaxy.Shared | §A.1–A.3 | ✅ Complete | PR 1 |
| B: Driver.Galaxy.Host | §B.1–B.10 | ✅ Complete: real Win32 pump, Tier C protections, all 3 IGalaxyBackend impls (Stub / DbBacked / MxAccess), probe manager, alarm tracker, Historian wire-up | PR 1 + PR 4 + PR 12 + PR 13 + PR 14 |
| C: Driver.Galaxy.Proxy | §C.1–C.4 | ✅ Complete: all 9 capability interfaces, supervisor (Backoff + CircuitBreaker + HeartbeatMonitor), subscription push frames | PR 1 + PR 4 |
| D: Retire legacy Host | §D.1–D.3 | ✅ Complete: archive markings landed in PR 2, source tree deletion in Phase 3 PR 18, status doc closed in PR 61 | PR 2 → Phase 3 PR 18 → PR 61 |
| E: Parity validation | §E.1–E.4 | ✅ Complete: E2E suite + 4 stability-finding regression tests + `HostSubprocessParityTests` cross-FX integration | PR 2 |
## 2026-04-18 adversarial findings — resolved
All four `High` + `Medium` items flagged as OPEN at the 2026-04-18 exit gate closed in PR 4
(`caa9cb8 Phase 2 PR 4 — close the 4 open high/medium MXAccess findings from
exit-gate-phase-2-final.md`):
| ID | Finding | Resolution |
|----|---------|------------|
| High 1 | MxAccess Read subscription-leak on cancellation | One-shot read now wraps subscribe → first `OnDataChange` → unsubscribe in try/finally. Per-tag callback always detached. If the read installed the underlying subscription (prior `_addressToHandle` key was absent) it tears it down on the way out — no leaked probe item handles on caller cancel or timeout. |
| High 2 | No MXAccess reconnect loop, only supervisor-driven recycle | `MxAccessClient` gains `MxAccessClientOptions { AutoReconnect, MonitorInterval=5s, StaleThreshold=60s }` + a background `MonitorLoopAsync` started on first `ConnectAsync`. Checks `_lastObservedActivityUtc` each interval (bumped by every `OnDataChange` callback); if stale, probes the proxy with a no-op COM `AddItem("$Heartbeat")` on the StaPump; on probe failure does reconnect-with-replay — Unregister (best-effort), Register, snapshot `_addressToHandle.Keys`, clear, re-AddItem every previously-active subscription. `ConnectionStateChanged` fires on the false→true transition; `ReconnectCount` bumps. |
| Medium 3 | `SubscribeAsync` doesn't push `OnDataChange` frames yet | `IGalaxyBackend` gains `OnDataChange` / `OnAlarmEvent` / `OnHostStatusChanged` events. New `IFrameHandler.AttachConnection(FrameWriter)` called per-connection by `PipeServer` after Hello. `GalaxyFrameHandler.ConnectionSink` subscribes the events for the connection lifetime, fire-and-forgets pushes as `MessageKind.OnDataChangeNotification` / `AlarmEvent` / `RuntimeStatusChange` frames through the writer, swallows `ObjectDisposedException` for dispose race, unsubscribes on Dispose. `MxAccessGalaxyBackend.SubscribeAsync` wires `OnTagValueChanged` that fans values out per-tag to every subscription listening (one MXAccess subscription, multi-fan-out via `_refToSubs` reverse map). `UnsubscribeAsync` only calls `mx.UnsubscribeAsync` when the last sub for a tag drops. |
| Medium 4 | `WriteValuesAsync` doesn't await `OnWriteComplete` | `MxAccessClient.WriteAsync` rewritten to return `Task<bool>` via the v1-style TCS-keyed-by-item-handle pattern in `_pendingWrites`. TCS added before the `Write` call, awaited with configurable timeout (default 5s), removed in finally. Returns true only when `OnWriteComplete` reported success. `MxAccessGalaxyBackend.WriteValuesAsync` reports per-tag `Bad_InternalError` ("MXAccess runtime reported write failure") when the bool returns false. |
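
The Medium 4 resolution describes a TCS-keyed-by-item-handle write path. A minimal sketch of that pattern is shown here for orientation only; `PendingWriteTracker`, the `startWrite` delegate, and the choice to report a timeout as `false` are assumptions made for this example and are not taken from the repo's `MxAccessClient`.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for the write path; not the repo's MxAccessClient.
public sealed class PendingWriteTracker
{
    // One TaskCompletionSource per in-flight write, keyed by item handle.
    private readonly ConcurrentDictionary<int, TaskCompletionSource<bool>> _pendingWrites =
        new ConcurrentDictionary<int, TaskCompletionSource<bool>>();

    // startWrite stands in for the COM Write call (assumption for this sketch).
    public async Task<bool> WriteAsync(int itemHandle, Action<int> startWrite, TimeSpan timeout)
    {
        var tcs = new TaskCompletionSource<bool>(TaskCreationOptions.RunContinuationsAsynchronously);

        // Register before issuing the write so a fast completion callback is not missed.
        if (!_pendingWrites.TryAdd(itemHandle, tcs))
            throw new InvalidOperationException("Write already pending for handle " + itemHandle);

        try
        {
            startWrite(itemHandle);
            using (var cts = new CancellationTokenSource())
            {
                var delay = Task.Delay(timeout, cts.Token);
                var winner = await Task.WhenAny(tcs.Task, delay).ConfigureAwait(false);
                cts.Cancel(); // stop the timer once a winner is known

                // Assumption: a timeout is reported the same way as a failed write.
                return winner == tcs.Task && tcs.Task.Result;
            }
        }
        finally
        {
            // Always drop the entry, whether the write succeeded, failed, or timed out.
            _pendingWrites.TryRemove(itemHandle, out _);
        }
    }

    // Called from the (simulated) OnWriteComplete callback.
    public void OnWriteComplete(int itemHandle, bool success)
    {
        if (_pendingWrites.TryGetValue(itemHandle, out var tcs))
            tcs.TrySetResult(success);
    }
}

public static class Demo
{
    public static async Task Main()
    {
        var tracker = new PendingWriteTracker();

        // Simulate the runtime acknowledging the write from another thread.
        bool ok = await tracker.WriteAsync(
            42,
            h => { _ = Task.Run(() => tracker.OnWriteComplete(h, true)); },
            TimeSpan.FromSeconds(5));

        Console.WriteLine(ok); // prints "True"
    }
}
```

Registering the TCS before the write call is the point of the pattern: the completion callback can arrive on another thread before the write call returns, and it must find an entry to complete.
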
## Cross-cutting deferrals — resolved
| Deferral | Resolution |
|----------|------------|
| Deletion of v1 archive | Phase 3 PR 18 deleted the source trees; PR 61 closed `V1_ARCHIVE_STATUS.md` |
| Wonderware Historian SDK plugin port | `Driver.Galaxy.Host/Backend/Historian/` ports the 10 source files (`HistorianDataSource`, `HistorianClusterEndpointPicker`, `HistorianHealthSnapshot`, etc.). `MxAccessGalaxyBackend` implements `HistoryReadAsync` / `HistoryReadProcessedAsync` / `HistoryReadAtTimeAsync` / `HistoryReadEventsAsync`. `GalaxyProxyDriver.MapAggregateToColumn` translates `HistoryAggregateType` → `AnalogSummaryQuery` column names on the proxy side so Host stays OPC-UA-free. |
| MxAccess subscription push frames | Closed under Medium 3 above |
| Wonderware Historian-backed HistoryRead | Closed under the Historian port row |
| Alarm subsystem wire-up | PR 14. `GalaxyAlarmTracker` in `Backend/Alarms/` advises the four Galaxy alarm-state attributes per `IsAlarm=true` attribute (`.InAlarm`, `.Priority`, `.DescAttrName`, `.Acked`), runs the OPC UA Part 9 lifecycle simplified for the Galaxy AlarmExtension model, raises `AlarmTransition` events (Active / Acknowledged / Inactive) forwarded through the existing `OnAlarmEvent` IPC frame. `AcknowledgeAlarmAsync` writes operator comment to `<tag>.AckMsg` through the PR 4 TCS-by-handle write path. |
| Reconnect-without-recycle in MxAccessClient | Closed under High 2 (reconnect-with-replay loop is the "without-recycle" path — supervisor recycle remains the fallback). |
| Real downstream-consumer cutover | Out of scope for this repo; phased Year-3 rollout per `docs/v2/plan.md` §Rollout — not a Phase 2 deliverable. |
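
The alarm row above names three `AlarmTransition` values (Active / Acknowledged / Inactive) derived from the advised `.InAlarm` and `.Acked` attributes. The following is a rough sketch of how such edge detection could look; the `AlarmState` class, its method names, and the rule that a new activation resets the ack flag are assumptions for illustration and may not match `GalaxyAlarmTracker`.

```csharp
using System;
using System.Collections.Generic;

public enum AlarmTransition { Active, Acknowledged, Inactive }

// Hypothetical per-alarm state holder; the real GalaxyAlarmTracker may differ.
public sealed class AlarmState
{
    private bool _inAlarm;
    private bool _acked;

    public event Action<AlarmTransition> Transition;

    // Feed the latest .InAlarm value; raises Active or Inactive on an edge only.
    public void OnInAlarmChanged(bool inAlarm)
    {
        if (inAlarm == _inAlarm) return; // no edge, no event
        _inAlarm = inAlarm;

        if (inAlarm)
        {
            _acked = false; // assumption: a new activation needs a fresh acknowledgement
            Transition?.Invoke(AlarmTransition.Active);
        }
        else
        {
            Transition?.Invoke(AlarmTransition.Inactive);
        }
    }

    // Feed the latest .Acked value; raises Acknowledged only while the alarm is active.
    public void OnAckedChanged(bool acked)
    {
        if (acked == _acked) return;
        _acked = acked;
        if (acked && _inAlarm)
            Transition?.Invoke(AlarmTransition.Acknowledged);
    }
}

public static class AlarmDemo
{
    public static void Main()
    {
        var seen = new List<AlarmTransition>();
        var state = new AlarmState();
        state.Transition += seen.Add;

        state.OnInAlarmChanged(true);   // Active
        state.OnInAlarmChanged(true);   // duplicate value, ignored
        state.OnAckedChanged(true);     // Acknowledged
        state.OnInAlarmChanged(false);  // Inactive

        Console.WriteLine(string.Join(", ", seen)); // prints "Active, Acknowledged, Inactive"
    }
}
```

Raising events only on value edges keeps repeated attribute callbacks carrying the same value from producing duplicate alarm frames.
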
## 2026-04-20 test baseline
Full-solution `dotnet test ZB.MOM.WW.OtOpcUa.slnx` on `v2` tip:
| Project | Pass | Skip | Target |
|---|---:|---:|---|
| Core.Abstractions.Tests | 37 | 0 | net10 |
| Client.Shared.Tests | 136 | 0 | net10 |
| Client.CLI.Tests | 52 | 0 | net10 |
| Client.UI.Tests | 98 | 0 | net10 |
| Driver.S7.Tests | 58 | 0 | net10 |
| Driver.Modbus.Tests | 182 | 0 | net10 |
| Driver.Modbus.IntegrationTests | 2 | 21 | net10 (Docker-gated) |
| Driver.AbLegacy.Tests | 96 | 0 | net10 |
| Driver.AbCip.Tests | 211 | 0 | net10 |
| Driver.AbCip.IntegrationTests | 11 | 1 | net10 (ab_server-gated) |
| Driver.TwinCAT.Tests | 110 | 0 | net10 |
| Driver.OpcUaClient.Tests | 78 | 0 | net10 |
| Driver.FOCAS.Tests | 119 | 0 | net10 |
| Driver.Galaxy.Shared.Tests | 6 | 0 | net10 |
| Driver.Galaxy.Proxy.Tests | 18 | 7 | net10 (live-Galaxy-gated) |
| **Driver.Galaxy.Host.Tests** | **107** | **0** | **net48 x86** |
| Analyzers.Tests | 5 | 0 | net10 |
| Core.Tests | 182 | 0 | net10 |
| Configuration.Tests | 71 | 0 | net10 |
| Admin.Tests | 92 | 0 | net10 |
| Server.Tests | 173 | 0 | net10 |
| **Total** | **1844** | **29** | |
**Observed flake**: Configuration.Tests went 70/71 on the first full-solution run and
71/71 on retry. Not a stable regression; logged as a known flake until it reproduces.
**Skips are all infra-gated**:
- Modbus 21 skips — oitc/modbus-server Docker container not started.
- AbCip 1 skip — libplctag `ab_server` binary not on PATH.
- Galaxy.Proxy 7 skips — live Galaxy stack not reachable from the current shell (admin-token pipe ACL).
## What "Phase 2 closed" means for Phase 3 and later
- Galaxy runs as a first-class v2 driver, with the same capability-interface contract as
Modbus / S7 / AbCip / AbLegacy / TwinCAT / FOCAS / OpcUaClient.
- No v1 code path remains. Anything invoking the `ZB.MOM.WW.LmxOpcUa.*` namespaces is
historical; any future work routes through `Driver.Galaxy.Proxy` + the named-pipe IPC.
- The 2026-04-13 stability findings live on as named regression tests under
`tests/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.E2E/StabilityFindingsRegressionTests.cs` — a
future refactor that reintroduces any of those four defects trips the test.
- Aveva Historian integration is wired end-to-end; new driver families don't need
Historian-specific plumbing in the IPC — they just implement `IHistoryProvider`.
## Outstanding — not Phase 2 blockers
- **AB CIP whole-UDT read optimization** (task #194) — niche performance win for large UDT
reads; current per-member fan-out works correctly.
- **AB CIP `IAlarmSource` via tag-projected ALMA/ALMD** (task #177) — AB CIP driver doesn't
currently expose alarms; feature-flagged follow-up.
- **IdentificationFolderBuilder wire-in** (task #195) — blocked on Equipment node walker.
- **UnsTab Playwright E2E** (task #199) — infra setup PR.
None of these are Phase 2 scope; all are tracked independently.