Phase 2 official close-out. Closes task #209.

The 2026-04-18 exit-gate-phase-2-final.md captured Phase 2 state at PR 2 merge — four High/Medium adversarial findings still OPEN, Historian port + alarm subsystem + v1 archive deletion all deferred. Since then: PR 4 closed all four findings end-to-end (High 1 Read subscription-leak, High 2 no reconnect loop, Medium 3 SubscribeAsync doesn't push frames, Medium 4 WriteValuesAsync doesn't await OnWriteComplete — mapped + resolved inline in the new doc), PR 12 landed the richer historian quality mapper, PR 13 shipped GalaxyRuntimeProbeManager with per-Platform/AppEngine ScanState subscriptions + StateChanged events forwarded through the existing OnHostStatusChanged IPC frame, PR 14 wired the alarm subsystem (GalaxyAlarmTracker advising the four alarm-state attributes per IsAlarm=true attribute, raising AlarmTransition events forwarded through OnAlarmEvent IPC frames), Phase 3 PR 18 deleted the v1 source trees, and PR 61 closed V1_ARCHIVE_STATUS.md. Phase 2 is functionally done; this commit is the bookkeeping pass.

New exit-gate-phase-2-closed.md at docs/v2/implementation/ — five-stream status table (A/B/C/D/E all complete with the specific close commits named), full resolution table for every 2026-04-18 adversarial finding mapped to the PR 4 resolution, cross-cutting deferrals table marking every one resolved (Historian SDK plugin port → done, subscription push frames → done under Medium 3, Historian-backed HistoryRead → done, alarm subsystem wire-up → done, reconnect-without-recycle → done under High 2, v1 archive deletion → done).

Fresh 2026-04-20 test baseline captured from the current v2 tip: 1844 passing + 29 infra-gated skips across 21 test projects, including the net48 x86 Galaxy.Host.Tests suite (107 pass) that exercises the MXAccess COM path on the dev box. Flake observed — Configuration.Tests 70/71 on first full-solution run, 71/71 on retry; logged as a known non-stable flake rather than chased because it did not reproduce.

The prior exit-gate-phase-2-final.md is kept in place (historical record of the 2026-04-18 snapshot) but gets a superseded-by banner at the top pointing at the new close-out doc so future readers land on current status first. docs/v2/plan.md Phase 2 section header gains the ✅ CLOSED 2026-04-20 marker + a link to the close-out doc so the top-level plan index reflects reality.

"What Phase 2 closed means for Phase 3 and later" section in the new doc captures the downstream contract: Galaxy now runs as a first-class v2 driver with the same capability-interface shape as Modbus / S7 / AbCip / AbLegacy / TwinCAT / FOCAS / OpcUaClient; no v1 code path remains; the 2026-04-13 stability findings persist as named regression tests under tests/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.E2E/StabilityFindingsRegressionTests.cs so any future refactor reintroducing them trips the test. "Outstanding — not Phase 2 blockers" section lists the four pending non-Phase-2 tasks (#177, #194, #195, #199) so nobody mistakes them for Phase 2 tail work.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
108
docs/v2/implementation/exit-gate-phase-2-closed.md
Normal file
@@ -0,0 +1,108 @@
# Phase 2 Close-Out (2026-04-20)

> Supersedes `exit-gate-phase-2-final.md` (2026-04-18) which captured the state at PR 2
> merge. Between that doc and today, PR 4 closed all open high + medium findings, PR 13
> shipped the probe manager, PR 14 shipped the alarm subsystem, and PR 61 closed the v1
> archive deletion. Phase 2 is closed.

## Status: **CLOSED**

Every stream in Phase 2 is complete. Every finding from the 2026-04-18 adversarial review
is resolved. The v1 archive is deleted. The Galaxy driver runs the full
`Shared` / `Host` / `Proxy` topology against live MXAccess on the dev box with all 9
capability interfaces wired end-to-end.

## Stream-by-stream

| Stream | Plan §reference | Status | Close commit |
|---|---|---|---|
| A — Driver.Galaxy.Shared | §A.1–A.3 | ✅ Complete | PR 1 |
| B — Driver.Galaxy.Host | §B.1–B.10 | ✅ Complete — real Win32 pump, Tier C protections, all 3 IGalaxyBackend impls (Stub / DbBacked / MxAccess), probe manager, alarm tracker, Historian wire-up | PR 1 + PR 4 + PR 12 + PR 13 + PR 14 |
| C — Driver.Galaxy.Proxy | §C.1–C.4 | ✅ Complete — all 9 capability interfaces, supervisor (Backoff + CircuitBreaker + HeartbeatMonitor), subscription push frames | PR 1 + PR 4 |
| D — Retire legacy Host | §D.1–D.3 | ✅ Complete — archive markings landed in PR 2, source tree deletion in Phase 3 PR 18, status doc closed in PR 61 | PR 2 → Phase 3 PR 18 → PR 61 |
| E — Parity validation | §E.1–E.4 | ✅ Complete — E2E suite + 4 stability-finding regression tests + `HostSubprocessParityTests` cross-FX integration | PR 2 |

## 2026-04-18 adversarial findings — resolved

All four `High` + `Medium` items flagged as OPEN at the 2026-04-18 exit gate closed in PR 4
(`caa9cb8 Phase 2 PR 4 — close the 4 open high/medium MXAccess findings from
exit-gate-phase-2-final.md`):

| ID | Finding | Resolution |
|----|---------|------------|
| High 1 | MxAccess Read subscription-leak on cancellation | One-shot read now wraps subscribe → first `OnDataChange` → unsubscribe in try/finally. Per-tag callback always detached. If the read installed the underlying subscription (prior `_addressToHandle` key was absent) it tears it down on the way out — no leaked probe item handles on caller cancel or timeout. |
| High 2 | No MXAccess reconnect loop, only supervisor-driven recycle | `MxAccessClient` gains `MxAccessClientOptions { AutoReconnect, MonitorInterval=5s, StaleThreshold=60s }` + a background `MonitorLoopAsync` started on first `ConnectAsync`. Checks `_lastObservedActivityUtc` each interval (bumped by every `OnDataChange` callback); if stale, probes the proxy with a no-op COM `AddItem("$Heartbeat")` on the StaPump; on probe failure does reconnect-with-replay — Unregister (best-effort), Register, snapshot `_addressToHandle.Keys`, clear, re-AddItem every previously-active subscription. `ConnectionStateChanged` fires on the false→true transition; `ReconnectCount` bumps. |
| Medium 3 | `SubscribeAsync` doesn't push `OnDataChange` frames yet | `IGalaxyBackend` gains `OnDataChange` / `OnAlarmEvent` / `OnHostStatusChanged` events. New `IFrameHandler.AttachConnection(FrameWriter)` called per-connection by `PipeServer` after Hello. `GalaxyFrameHandler.ConnectionSink` subscribes the events for the connection lifetime, fire-and-forgets pushes as `MessageKind.OnDataChangeNotification` / `AlarmEvent` / `RuntimeStatusChange` frames through the writer, swallows `ObjectDisposedException` for dispose race, unsubscribes on Dispose. `MxAccessGalaxyBackend.SubscribeAsync` wires `OnTagValueChanged` that fans values out per-tag to every subscription listening (one MXAccess subscription, multi-fan-out via `_refToSubs` reverse map). `UnsubscribeAsync` only calls `mx.UnsubscribeAsync` when the last sub for a tag drops. |
| Medium 4 | `WriteValuesAsync` doesn't await `OnWriteComplete` | `MxAccessClient.WriteAsync` rewritten to return `Task<bool>` via the v1-style TCS-keyed-by-item-handle pattern in `_pendingWrites`. TCS added before the `Write` call, awaited with configurable timeout (default 5s), removed in finally. Returns true only when `OnWriteComplete` reported success. `MxAccessGalaxyBackend.WriteValuesAsync` reports per-tag `Bad_InternalError` ("MXAccess runtime reported write failure") when the bool returns false. |

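
The Medium 4 resolution (a completion source keyed by item handle, registered before the write, awaited with a timeout, removed in `finally`) can be sketched roughly as below. This is an illustrative Python `asyncio` analogue, not the C# code from PR 4: `PendingWrites`, `send`, and `on_write_complete` are hypothetical names, and treating a timeout as a failed write is an assumption, since the table only states the 5s default.

```python
import asyncio
from typing import Callable


class PendingWrites:
    """Tracks one future per in-flight write, keyed by item handle (hypothetical sketch)."""

    def __init__(self, timeout_s: float = 5.0) -> None:
        self._pending: dict[int, asyncio.Future] = {}
        self._timeout_s = timeout_s

    async def write(self, handle: int, send: Callable[[int], None]) -> bool:
        # Register the future BEFORE issuing the write so a fast completion
        # callback always finds something to resolve.
        fut = asyncio.get_running_loop().create_future()
        self._pending[handle] = fut
        try:
            send(handle)
            return await asyncio.wait_for(fut, self._timeout_s)
        except asyncio.TimeoutError:
            # Assumption: a write that never completes is reported as a failure.
            return False
        finally:
            # Always drop the entry, whether the write succeeded, failed, or timed out.
            self._pending.pop(handle, None)

    def on_write_complete(self, handle: int, success: bool) -> None:
        # Invoked from the runtime's completion callback for the given handle.
        fut = self._pending.get(handle)
        if fut is not None and not fut.done():
            fut.set_result(success)
```

A caller would map a `False` result to a per-tag bad status, which is the role `Bad_InternalError` plays in the table above.
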
## Cross-cutting deferrals — resolved

| Deferral | Resolution |
|----------|------------|
| Deletion of v1 archive | Phase 3 PR 18 deleted the source trees; PR 61 closed `V1_ARCHIVE_STATUS.md` |
| Wonderware Historian SDK plugin port | `Driver.Galaxy.Host/Backend/Historian/` ports the 10 source files (`HistorianDataSource`, `HistorianClusterEndpointPicker`, `HistorianHealthSnapshot`, etc.). `MxAccessGalaxyBackend` implements `HistoryReadAsync` / `HistoryReadProcessedAsync` / `HistoryReadAtTimeAsync` / `HistoryReadEventsAsync`. `GalaxyProxyDriver.MapAggregateToColumn` translates `HistoryAggregateType` → `AnalogSummaryQuery` column names on the proxy side so Host stays OPC-UA-free. |
| MxAccess subscription push frames | Closed under Medium 3 above |
| Wonderware Historian-backed HistoryRead | Closed under the Historian port row |
| Alarm subsystem wire-up | PR 14. `GalaxyAlarmTracker` in `Backend/Alarms/` advises the four Galaxy alarm-state attributes per `IsAlarm=true` attribute (`.InAlarm`, `.Priority`, `.DescAttrName`, `.Acked`), runs the OPC UA Part 9 lifecycle simplified for the Galaxy AlarmExtension model, raises `AlarmTransition` events (Active / Acknowledged / Inactive) forwarded through the existing `OnAlarmEvent` IPC frame. `AcknowledgeAlarmAsync` writes operator comment to `<tag>.AckMsg` through the PR 4 TCS-by-handle write path. |
| Reconnect-without-recycle in MxAccessClient | Closed under High 2 (reconnect-with-replay loop is the "without-recycle" path — supervisor recycle remains the fallback). |
| Real downstream-consumer cutover | Out of scope for this repo; phased Year-3 rollout per `docs/v2/plan.md` §Rollout — not a Phase 2 deliverable. |

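
The subscription push path closed under Medium 3 relies on one upstream subscription per tag, fanned out to every listener through a reverse map. A minimal sketch of that bookkeeping follows, again in illustrative Python and not the C# implementation: `FanOutSubscriptions` and its callback parameters are hypothetical, and only the idea of a tag-to-subscribers reverse map (`_refToSubs` in the doc) comes from the source.

```python
from typing import Any, Callable

Callback = Callable[[str, Any], None]


class FanOutSubscriptions:
    """One upstream subscription per tag, fanned out to many listeners (hypothetical sketch)."""

    def __init__(self, upstream_subscribe: Callable[[str], None],
                 upstream_unsubscribe: Callable[[str], None]) -> None:
        self._upstream_subscribe = upstream_subscribe
        self._upstream_unsubscribe = upstream_unsubscribe
        # Reverse map: tag -> {subscription id -> listener callback}.
        self._ref_to_subs: dict[str, dict[str, Callback]] = {}

    def subscribe(self, sub_id: str, tag: str, callback: Callback) -> None:
        first_listener = tag not in self._ref_to_subs
        self._ref_to_subs.setdefault(tag, {})[sub_id] = callback
        if first_listener:
            # Only the first listener for a tag creates the upstream subscription.
            self._upstream_subscribe(tag)

    def unsubscribe(self, sub_id: str, tag: str) -> None:
        subs = self._ref_to_subs.get(tag)
        if subs is None or subs.pop(sub_id, None) is None:
            return  # unknown tag or subscription: nothing to do
        if not subs:
            # The last listener dropped, so release the upstream subscription.
            del self._ref_to_subs[tag]
            self._upstream_unsubscribe(tag)

    def on_tag_value_changed(self, tag: str, value: Any) -> None:
        # Copy the listeners so a callback may unsubscribe while we iterate.
        for callback in list(self._ref_to_subs.get(tag, {}).values()):
            callback(tag, value)
```

Keying the upstream call on "first listener in, last listener out" is what keeps the number of runtime subscriptions at one per tag regardless of how many client subscriptions reference it.
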
## 2026-04-20 test baseline

Full-solution `dotnet test ZB.MOM.WW.OtOpcUa.slnx` on `v2` tip:

| Project | Pass | Skip | Target |
|---|---:|---:|---|
| Core.Abstractions.Tests | 37 | 0 | net10 |
| Client.Shared.Tests | 136 | 0 | net10 |
| Client.CLI.Tests | 52 | 0 | net10 |
| Client.UI.Tests | 98 | 0 | net10 |
| Driver.S7.Tests | 58 | 0 | net10 |
| Driver.Modbus.Tests | 182 | 0 | net10 |
| Driver.Modbus.IntegrationTests | 2 | 21 | net10 (Docker-gated) |
| Driver.AbLegacy.Tests | 96 | 0 | net10 |
| Driver.AbCip.Tests | 211 | 0 | net10 |
| Driver.AbCip.IntegrationTests | 11 | 1 | net10 (ab_server-gated) |
| Driver.TwinCAT.Tests | 110 | 0 | net10 |
| Driver.OpcUaClient.Tests | 78 | 0 | net10 |
| Driver.FOCAS.Tests | 119 | 0 | net10 |
| Driver.Galaxy.Shared.Tests | 6 | 0 | net10 |
| Driver.Galaxy.Proxy.Tests | 18 | 7 | net10 (live-Galaxy-gated) |
| **Driver.Galaxy.Host.Tests** | **107** | **0** | **net48 x86** |
| Analyzers.Tests | 5 | 0 | net10 |
| Core.Tests | 182 | 0 | net10 |
| Configuration.Tests | 71 | 0 | net10 |
| Admin.Tests | 92 | 0 | net10 |
| Server.Tests | 173 | 0 | net10 |
| **Total** | **1844** | **29** | |

**Observed flake**: one Configuration.Tests failure on the first full-solution run turned
green on re-run. Not a stable regression; logged as a known flake until it reproduces.

**Skips are all infra-gated**:
- Modbus 21 skips — oitc/modbus-server Docker container not started.
- AbCip 1 skip — libplctag `ab_server` binary not on PATH.
- Galaxy.Proxy 7 skips — live Galaxy stack not reachable from the current shell (admin-token pipe ACL).

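
As a sanity check on the totals row, the per-project counts from the baseline table can be re-added in a few lines. This is only a verification sketch in Python; the `BASELINE` list is transcribed by hand from the table and is not produced by any tool in the repo.

```python
# (project, passed, skipped) rows transcribed from the 2026-04-20 baseline table.
BASELINE = [
    ("Core.Abstractions.Tests", 37, 0),
    ("Client.Shared.Tests", 136, 0),
    ("Client.CLI.Tests", 52, 0),
    ("Client.UI.Tests", 98, 0),
    ("Driver.S7.Tests", 58, 0),
    ("Driver.Modbus.Tests", 182, 0),
    ("Driver.Modbus.IntegrationTests", 2, 21),
    ("Driver.AbLegacy.Tests", 96, 0),
    ("Driver.AbCip.Tests", 211, 0),
    ("Driver.AbCip.IntegrationTests", 11, 1),
    ("Driver.TwinCAT.Tests", 110, 0),
    ("Driver.OpcUaClient.Tests", 78, 0),
    ("Driver.FOCAS.Tests", 119, 0),
    ("Driver.Galaxy.Shared.Tests", 6, 0),
    ("Driver.Galaxy.Proxy.Tests", 18, 7),
    ("Driver.Galaxy.Host.Tests", 107, 0),
    ("Analyzers.Tests", 5, 0),
    ("Core.Tests", 182, 0),
    ("Configuration.Tests", 71, 0),
    ("Admin.Tests", 92, 0),
    ("Server.Tests", 173, 0),
]

total_pass = sum(passed for _, passed, _ in BASELINE)
total_skip = sum(skipped for _, _, skipped in BASELINE)
# Projects with at least one skip; these should match the infra-gated list.
gated = {name: skipped for name, _, skipped in BASELINE if skipped}

print(len(BASELINE), total_pass, total_skip)  # 21 1844 29
print(gated)
```

The three projects with skips line up with the Modbus, AbCip, and Galaxy.Proxy entries in the infra-gated list, and the sums agree with the 1844 / 29 totals row.
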
## What "Phase 2 closed" means for Phase 3 and later

- Galaxy runs as a first-class v2 driver, same capability-interface contract as Modbus / S7 /
  AbCip / AbLegacy / TwinCAT / FOCAS / OpcUaClient.
- No v1 code path remains. Anything invoking the `ZB.MOM.WW.LmxOpcUa.*` namespaces is
  historical; any future work routes through `Driver.Galaxy.Proxy` + the named-pipe IPC.
- The 2026-04-13 stability findings live on as named regression tests under
  `tests/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.E2E/StabilityFindingsRegressionTests.cs` — a
  future refactor that reintroduces any of those four defects trips the test.
- Aveva Historian integration is wired end-to-end; new driver families don't need
  Historian-specific plumbing in the IPC — they just implement `IHistoryProvider`.

## Outstanding — not Phase 2 blockers

- **AB CIP whole-UDT read optimization** (task #194) — niche performance win for large UDT
  reads; current per-member fan-out works correctly.
- **AB CIP `IAlarmSource` via tag-projected ALMA/ALMD** (task #177) — AB CIP driver doesn't
  currently expose alarms; feature-flagged follow-up.
- **IdentificationFolderBuilder wire-in** (task #195) — blocked on Equipment node walker.
- **UnsTab Playwright E2E** (task #199) — infra setup PR.

None of these are in Phase 2 scope; all are tracked independently.