diff --git a/docs/v2/implementation/exit-gate-phase-2-closed.md b/docs/v2/implementation/exit-gate-phase-2-closed.md
new file mode 100644
index 0000000..8deb0bd
--- /dev/null
+++ b/docs/v2/implementation/exit-gate-phase-2-closed.md
@@ -0,0 +1,108 @@
+# Phase 2 Close-Out (2026-04-20)
+
+> Supersedes `exit-gate-phase-2-final.md` (2026-04-18) which captured the state at PR 2
+> merge. Between that doc and today, PR 4 closed all open high + medium findings, PR 13
+> shipped the probe manager, PR 14 shipped the alarm subsystem, and PR 61 closed the v1
+> archive deletion. Phase 2 is closed.
+
+## Status: **CLOSED**
+
+Every stream in Phase 2 is complete. Every finding from the 2026-04-18 adversarial review
+is resolved. The v1 archive is deleted. The Galaxy driver runs the full
+`Shared` / `Host` / `Proxy` topology against live MXAccess on the dev box with all 9
+capability interfaces wired end-to-end.
+
+## Stream-by-stream
+
+| Stream | Plan §reference | Status | Close commit |
+|---|---|---|---|
+| A — Driver.Galaxy.Shared | §A.1–A.3 | ✅ Complete | PR 1 |
+| B — Driver.Galaxy.Host | §B.1–B.10 | ✅ Complete — real Win32 pump, Tier C protections, all 3 IGalaxyBackend impls (Stub / DbBacked / MxAccess), probe manager, alarm tracker, Historian wire-up | PR 1 + PR 4 + PR 12 + PR 13 + PR 14 |
+| C — Driver.Galaxy.Proxy | §C.1–C.4 | ✅ Complete — all 9 capability interfaces, supervisor (Backoff + CircuitBreaker + HeartbeatMonitor), subscription push frames | PR 1 + PR 4 |
+| D — Retire legacy Host | §D.1–D.3 | ✅ Complete — archive markings landed in PR 2, source tree deletion in Phase 3 PR 18, status doc closed in PR 61 | PR 2 → Phase 3 PR 18 → PR 61 |
+| E — Parity validation | §E.1–E.4 | ✅ Complete — E2E suite + 4 stability-finding regression tests + `HostSubprocessParityTests` cross-FX integration | PR 2 |
+
+## 2026-04-18 adversarial findings — resolved
+
+All four `High` + `Medium` items flagged as OPEN at the 2026-04-18 exit gate closed in PR 4
+(`caa9cb8 Phase 2 PR 4 —
close the 4 open high/medium MXAccess findings from
+exit-gate-phase-2-final.md`):
+
+| ID | Finding | Resolution |
+|----|---------|------------|
+| High 1 | MxAccess Read subscription-leak on cancellation | One-shot read now wraps subscribe → first `OnDataChange` → unsubscribe in try/finally. Per-tag callback always detached. If the read installed the underlying subscription (prior `_addressToHandle` key was absent) it tears it down on the way out — no leaked probe item handles on caller cancel or timeout. |
+| High 2 | No MXAccess reconnect loop, only supervisor-driven recycle | `MxAccessClient` gains `MxAccessClientOptions { AutoReconnect, MonitorInterval=5s, StaleThreshold=60s }` + a background `MonitorLoopAsync` started on first `ConnectAsync`. Checks `_lastObservedActivityUtc` each interval (bumped by every `OnDataChange` callback); if stale, probes the proxy with a no-op COM `AddItem("$Heartbeat")` on the StaPump; on probe failure does reconnect-with-replay — Unregister (best-effort), Register, snapshot `_addressToHandle.Keys`, clear, re-AddItem every previously-active subscription. `ConnectionStateChanged` fires on the false→true transition; `ReconnectCount` bumps. |
+| Medium 3 | `SubscribeAsync` doesn't push `OnDataChange` frames yet | `IGalaxyBackend` gains `OnDataChange` / `OnAlarmEvent` / `OnHostStatusChanged` events. New `IFrameHandler.AttachConnection(FrameWriter)` called per-connection by `PipeServer` after Hello. `GalaxyFrameHandler.ConnectionSink` subscribes the events for the connection lifetime, fire-and-forgets pushes as `MessageKind.OnDataChangeNotification` / `AlarmEvent` / `RuntimeStatusChange` frames through the writer, swallows `ObjectDisposedException` for dispose race, unsubscribes on Dispose. `MxAccessGalaxyBackend.SubscribeAsync` wires `OnTagValueChanged` that fans values out per-tag to every subscription listening (one MXAccess subscription, multi-fan-out via `_refToSubs` reverse map).
`UnsubscribeAsync` only calls `mx.UnsubscribeAsync` when the last sub for a tag drops. |
+| Medium 4 | `WriteValuesAsync` doesn't await `OnWriteComplete` | `MxAccessClient.WriteAsync` rewritten to return `Task<bool>` via the v1-style TCS-keyed-by-item-handle pattern in `_pendingWrites`. TCS added before the `Write` call, awaited with configurable timeout (default 5s), removed in finally. Returns true only when `OnWriteComplete` reported success. `MxAccessGalaxyBackend.WriteValuesAsync` reports per-tag `Bad_InternalError` ("MXAccess runtime reported write failure") when the bool returns false. |
+
+## Cross-cutting deferrals — resolved
+
+| Deferral | Resolution |
+|----------|------------|
+| Deletion of v1 archive | Phase 3 PR 18 deleted the source trees; PR 61 closed `V1_ARCHIVE_STATUS.md` |
+| Wonderware Historian SDK plugin port | `Driver.Galaxy.Host/Backend/Historian/` ports the 10 source files (`HistorianDataSource`, `HistorianClusterEndpointPicker`, `HistorianHealthSnapshot`, etc.). `MxAccessGalaxyBackend` implements `HistoryReadAsync` / `HistoryReadProcessedAsync` / `HistoryReadAtTimeAsync` / `HistoryReadEventsAsync`. `GalaxyProxyDriver.MapAggregateToColumn` translates `HistoryAggregateType` → `AnalogSummaryQuery` column names on the proxy side so Host stays OPC-UA-free. |
+| MxAccess subscription push frames | Closed under Medium 3 above |
+| Wonderware Historian-backed HistoryRead | Closed under the Historian port row |
+| Alarm subsystem wire-up | PR 14. `GalaxyAlarmTracker` in `Backend/Alarms/` advises the four Galaxy alarm-state attributes per `IsAlarm=true` attribute (`.InAlarm`, `.Priority`, `.DescAttrName`, `.Acked`), runs the OPC UA Part 9 lifecycle simplified for the Galaxy AlarmExtension model, raises `AlarmTransition` events (Active / Acknowledged / Inactive) forwarded through the existing `OnAlarmEvent` IPC frame. `AcknowledgeAlarmAsync` writes operator comment to `.AckMsg` through the PR 4 TCS-by-handle write path.
|
+| Reconnect-without-recycle in MxAccessClient | Closed under High 2 (reconnect-with-replay loop is the "without-recycle" path — supervisor recycle remains the fallback). |
+| Real downstream-consumer cutover | Out of scope for this repo; phased Year-3 rollout per `docs/v2/plan.md` §Rollout — not a Phase 2 deliverable. |
+
+## 2026-04-20 test baseline
+
+Full-solution `dotnet test ZB.MOM.WW.OtOpcUa.slnx` on `v2` tip:
+
+| Project | Pass | Skip | Target |
+|---|---:|---:|---|
+| Core.Abstractions.Tests | 37 | 0 | net10 |
+| Client.Shared.Tests | 136 | 0 | net10 |
+| Client.CLI.Tests | 52 | 0 | net10 |
+| Client.UI.Tests | 98 | 0 | net10 |
+| Driver.S7.Tests | 58 | 0 | net10 |
+| Driver.Modbus.Tests | 182 | 0 | net10 |
+| Driver.Modbus.IntegrationTests | 2 | 21 | net10 (Docker-gated) |
+| Driver.AbLegacy.Tests | 96 | 0 | net10 |
+| Driver.AbCip.Tests | 211 | 0 | net10 |
+| Driver.AbCip.IntegrationTests | 11 | 1 | net10 (ab_server-gated) |
+| Driver.TwinCAT.Tests | 110 | 0 | net10 |
+| Driver.OpcUaClient.Tests | 78 | 0 | net10 |
+| Driver.FOCAS.Tests | 119 | 0 | net10 |
+| Driver.Galaxy.Shared.Tests | 6 | 0 | net10 |
+| Driver.Galaxy.Proxy.Tests | 18 | 7 | net10 (live-Galaxy-gated) |
+| **Driver.Galaxy.Host.Tests** | **107** | **0** | **net48 x86** |
+| Analyzers.Tests | 5 | 0 | net10 |
+| Core.Tests | 182 | 0 | net10 |
+| Configuration.Tests | 71 | 0 | net10 |
+| Admin.Tests | 92 | 0 | net10 |
+| Server.Tests | 173 | 0 | net10 |
+| **Total** | **1844** | **29** | |
+
+**Observed flake**: one Configuration.Tests failure on the first full-solution run turned
+green on re-run. Not a stable regression; logged as a known flake until it reproduces.
+
+**Skips are all infra-gated**:
+- Modbus 21 skips — oitc/modbus-server Docker container not started.
+- AbCip 1 skip — libplctag `ab_server` binary not on PATH.
+- Galaxy.Proxy 7 skips — live Galaxy stack not reachable from the current shell (admin-token pipe ACL).
+
+## What "Phase 2 closed" means for Phase 3 and later
+
+- Galaxy runs as first-class v2 driver, same capability-interface contract as Modbus / S7 /
+  AbCip / AbLegacy / TwinCAT / FOCAS / OpcUaClient.
+- No v1 code path remains. Anything invoking the `ZB.MOM.WW.LmxOpcUa.*` namespaces is
+  historical; any future work routes through `Driver.Galaxy.Proxy` + the named-pipe IPC.
+- The 2026-04-13 stability findings live on as named regression tests under
+  `tests/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.E2E/StabilityFindingsRegressionTests.cs` — a
+  future refactor that reintroduces any of those four defects trips the test.
+- Aveva Historian integration is wired end-to-end; new driver families don't need
+  Historian-specific plumbing in the IPC — they just implement `IHistoryProvider`.
+
+## Outstanding — not Phase 2 blockers
+
+- **AB CIP whole-UDT read optimization** (task #194) — niche performance win for large UDT
+  reads; current per-member fan-out works correctly.
+- **AB CIP `IAlarmSource` via tag-projected ALMA/ALMD** (task #177) — AB CIP driver doesn't
+  currently expose alarms; feature-flagged follow-up.
+- **IdentificationFolderBuilder wire-in** (task #195) — blocked on Equipment node walker.
+- **UnsTab Playwright E2E** (task #199) — infra setup PR.
+
+None of these are Phase 2 scope; all are tracked independently.
diff --git a/docs/v2/implementation/exit-gate-phase-2-final.md b/docs/v2/implementation/exit-gate-phase-2-final.md
index 17725e0..1a66004 100644
--- a/docs/v2/implementation/exit-gate-phase-2-final.md
+++ b/docs/v2/implementation/exit-gate-phase-2-final.md
@@ -1,5 +1,11 @@
 # Phase 2 Final Exit Gate (2026-04-18)
 
+> **⚠️ Superseded by [`exit-gate-phase-2-closed.md`](exit-gate-phase-2-closed.md) (2026-04-20).**
+> This doc captures the snapshot at PR 2 merge — when the four `High` + `Medium` findings
+> in the adversarial review were still OPEN and Historian port + alarm subsystem were still
+> deferred.
All of those closed subsequently (PR 4 + PR 12 + PR 13 + PR 14 + PR 61). Kept
+> as historical evidence; consult the close-out doc for current Phase 2 status.
+
 > Supersedes `phase-2-partial-exit-evidence.md` and `exit-gate-phase-2.md`. Captures the
 > as-built state at the close of Phase 2 work delivered across two PRs.
 
diff --git a/docs/v2/plan.md b/docs/v2/plan.md
index 0df8ad6..cf21ca9 100644
--- a/docs/v2/plan.md
+++ b/docs/v2/plan.md
@@ -736,7 +736,7 @@ Each step leaves the system runnable. The generic extraction is effectively free
 6. **Wire `Server`** — bootstrap from Configuration using an instance-bound credential (cert/gMSA/SQL login), fail fast if the credential is rejected, register drivers, start Core.
 7. **Scaffold `Admin`** — Blazor Server app with: instance + credential management, draft/publish/rollback generation workflow (diff viewer, "publish to fleet", per-instance override), and core CRUD for drivers/devices/tags. Driver-specific config screens deferred to later phases.
 
-**Phase 2 — Galaxy driver (prove the refactor)**
+**Phase 2 — Galaxy driver (prove the refactor) — ✅ CLOSED 2026-04-20** (see [`implementation/exit-gate-phase-2-closed.md`](implementation/exit-gate-phase-2-closed.md))
 8. **Build `Galaxy.Shared`** — .NET Standard 2.0 IPC message contracts
 9. **Build `Galaxy.Host`** — .NET 4.8 x86 process hosting MxAccessBridge, GalaxyRepository, alarms, HDA with IPC server
 10. **Build `Galaxy.Proxy`** — .NET 10 in-process proxy implementing IDriver interfaces, forwarding over IPC