From 33b87a3aa41f55f2126a9b17750d81abff29a92c Mon Sep 17 00:00:00 2001
From: Joseph Doherty
Date: Mon, 20 Apr 2026 02:00:35 -0400
Subject: [PATCH] Phase 2 official close-out. Closes task #209. The 2026-04-18
 exit-gate-phase-2-final.md captured Phase 2 state at PR 2 merge — four High/Medium adversarial
 findings still OPEN, Historian port + alarm subsystem + v1 archive deletion all deferred. Since
 then: PR 4 closed all four findings end-to-end (High 1 Read subscription-leak, High 2 no
 reconnect loop, Medium 3 SubscribeAsync doesn't push frames, Medium 4 WriteValuesAsync doesn't
 await OnWriteComplete — mapped + resolved inline in the new doc), PR 12 landed the richer
 historian quality mapper, PR 13 shipped GalaxyRuntimeProbeManager with per-Platform/AppEngine
 ScanState subscriptions + StateChanged events forwarded through the existing
 OnHostStatusChanged IPC frame, PR 14 wired the alarm subsystem (GalaxyAlarmTracker advising the
 four alarm-state attributes per IsAlarm=true attribute, raising AlarmTransition events
 forwarded through OnAlarmEvent IPC frames), Phase 3 PR 18 deleted the v1 source trees, and
 PR 61 closed V1_ARCHIVE_STATUS.md. Phase 2 is functionally done; this commit is the bookkeeping
 pass. New exit-gate-phase-2-closed.md at docs/v2/implementation/ — five-stream status table
 (A/B/C/D/E all complete with the specific close commits named), full resolution table for every
 2026-04-18 adversarial finding mapped to the PR 4 resolution, cross-cutting deferrals table
 marking every one resolved (Historian SDK plugin port → done, subscription push frames → done
 under Medium 3, Historian-backed HistoryRead → done, alarm subsystem wire-up → done,
 reconnect-without-recycle → done under High 2, v1 archive deletion → done). Fresh 2026-04-20
 test baseline captured from the current v2 tip: 1844 passing + 29 infra-gated skips across 21
 test projects, including the net48 x86 Galaxy.Host.Tests suite (107 pass) that exercises the
 MXAccess COM path on the dev box. Flake observed — Configuration.Tests 70/71 on first
 full-solution run, 71/71 on retry; logged as a known non-stable flake rather than chased
 because it did not reproduce. The prior exit-gate-phase-2-final.md is kept in place (historical
 record of the 2026-04-18 snapshot) but gets a superseded-by banner at the top pointing at the
 new close-out doc so future readers land on current status first. docs/v2/plan.md Phase 2
 section header gains the ✅ CLOSED 2026-04-20 marker + a link to the close-out doc so the
 top-level plan index reflects reality. "What Phase 2 closed means for Phase 3 and later"
 section in the new doc captures the downstream contract: Galaxy now runs as a first-class v2
 driver with the same capability-interface shape as Modbus / S7 / AbCip / AbLegacy / TwinCAT /
 FOCAS / OpcUaClient; no v1 code path remains; the 2026-04-13 stability findings persist as
 named regression tests under
 tests/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.E2E/StabilityFindingsRegressionTests.cs so any future
 refactor reintroducing them trips the test. "Outstanding — not Phase 2 blockers" section lists
 the four pending non-Phase-2 tasks (#177, #194, #195, #199) so nobody mistakes them for Phase 2
 tail work.
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Co-Authored-By: Claude Opus 4.7 (1M context)
---
 .../exit-gate-phase-2-closed.md               | 108 ++++++++++++++++++
 .../implementation/exit-gate-phase-2-final.md |   6 +
 docs/v2/plan.md                               |   2 +-
 3 files changed, 115 insertions(+), 1 deletion(-)
 create mode 100644 docs/v2/implementation/exit-gate-phase-2-closed.md

diff --git a/docs/v2/implementation/exit-gate-phase-2-closed.md b/docs/v2/implementation/exit-gate-phase-2-closed.md
new file mode 100644
index 0000000..8deb0bd
--- /dev/null
+++ b/docs/v2/implementation/exit-gate-phase-2-closed.md
@@ -0,0 +1,108 @@
+# Phase 2 Close-Out (2026-04-20)
+
+> Supersedes `exit-gate-phase-2-final.md` (2026-04-18) which captured the state at PR 2
+> merge. Between that doc and today, PR 4 closed all open high + medium findings, PR 13
+> shipped the probe manager, PR 14 shipped the alarm subsystem, and PR 61 closed the v1
+> archive deletion. Phase 2 is closed.
+
+## Status: **CLOSED**
+
+Every stream in Phase 2 is complete. Every finding from the 2026-04-18 adversarial review
+is resolved. The v1 archive is deleted. The Galaxy driver runs the full
+`Shared` / `Host` / `Proxy` topology against live MXAccess on the dev box with all 9
+capability interfaces wired end-to-end.
+
+## Stream-by-stream
+
+| Stream | Plan §reference | Status | Close commit |
+|---|---|---|---|
+| A — Driver.Galaxy.Shared | §A.1–A.3 | ✅ Complete | PR 1 |
+| B — Driver.Galaxy.Host | §B.1–B.10 | ✅ Complete — real Win32 pump, Tier C protections, all 3 IGalaxyBackend impls (Stub / DbBacked / MxAccess), probe manager, alarm tracker, Historian wire-up | PR 1 + PR 4 + PR 12 + PR 13 + PR 14 |
+| C — Driver.Galaxy.Proxy | §C.1–C.4 | ✅ Complete — all 9 capability interfaces, supervisor (Backoff + CircuitBreaker + HeartbeatMonitor), subscription push frames | PR 1 + PR 4 |
+| D — Retire legacy Host | §D.1–D.3 | ✅ Complete — archive markings landed in PR 2, source tree deletion in Phase 3 PR 18, status doc closed in PR 61 | PR 2 → Phase 3 PR 18 → PR 61 |
+| E — Parity validation | §E.1–E.4 | ✅ Complete — E2E suite + 4 stability-finding regression tests + `HostSubprocessParityTests` cross-FX integration | PR 2 |
+
+## 2026-04-18 adversarial findings — resolved
+
+All four `High` + `Medium` items flagged as OPEN at the 2026-04-18 exit gate closed in PR 4
+(`caa9cb8 Phase 2 PR 4 — close the 4 open high/medium MXAccess findings from
+exit-gate-phase-2-final.md`):
+
+| ID | Finding | Resolution |
+|----|---------|------------|
+| High 1 | MxAccess Read subscription-leak on cancellation | One-shot read now wraps subscribe → first `OnDataChange` → unsubscribe in try/finally. Per-tag callback always detached. If the read installed the underlying subscription (prior `_addressToHandle` key was absent) it tears it down on the way out — no leaked probe item handles on caller cancel or timeout. |
+| High 2 | No MXAccess reconnect loop, only supervisor-driven recycle | `MxAccessClient` gains `MxAccessClientOptions { AutoReconnect, MonitorInterval=5s, StaleThreshold=60s }` + a background `MonitorLoopAsync` started on first `ConnectAsync`. Checks `_lastObservedActivityUtc` each interval (bumped by every `OnDataChange` callback); if stale, probes the proxy with a no-op COM `AddItem("$Heartbeat")` on the StaPump; on probe failure does reconnect-with-replay — Unregister (best-effort), Register, snapshot `_addressToHandle.Keys`, clear, re-AddItem every previously-active subscription. `ConnectionStateChanged` fires on the false→true transition; `ReconnectCount` bumps. |
+| Medium 3 | `SubscribeAsync` doesn't push `OnDataChange` frames yet | `IGalaxyBackend` gains `OnDataChange` / `OnAlarmEvent` / `OnHostStatusChanged` events. New `IFrameHandler.AttachConnection(FrameWriter)` called per-connection by `PipeServer` after Hello. `GalaxyFrameHandler.ConnectionSink` subscribes the events for the connection lifetime, fire-and-forgets pushes as `MessageKind.OnDataChangeNotification` / `AlarmEvent` / `RuntimeStatusChange` frames through the writer, swallows `ObjectDisposedException` for dispose race, unsubscribes on Dispose. `MxAccessGalaxyBackend.SubscribeAsync` wires `OnTagValueChanged` that fans values out per-tag to every subscription listening (one MXAccess subscription, multi-fan-out via `_refToSubs` reverse map). `UnsubscribeAsync` only calls `mx.UnsubscribeAsync` when the last sub for a tag drops. |
+| Medium 4 | `WriteValuesAsync` doesn't await `OnWriteComplete` | `MxAccessClient.WriteAsync` rewritten to return `Task<bool>` via the v1-style TCS-keyed-by-item-handle pattern in `_pendingWrites`. TCS added before the `Write` call, awaited with configurable timeout (default 5s), removed in finally. Returns true only when `OnWriteComplete` reported success. `MxAccessGalaxyBackend.WriteValuesAsync` reports per-tag `Bad_InternalError` ("MXAccess runtime reported write failure") when the bool returns false. |
+
+## Cross-cutting deferrals — resolved
+
+| Deferral | Resolution |
+|----------|------------|
+| Deletion of v1 archive | Phase 3 PR 18 deleted the source trees; PR 61 closed `V1_ARCHIVE_STATUS.md` |
+| Wonderware Historian SDK plugin port | `Driver.Galaxy.Host/Backend/Historian/` ports the 10 source files (`HistorianDataSource`, `HistorianClusterEndpointPicker`, `HistorianHealthSnapshot`, etc.). `MxAccessGalaxyBackend` implements `HistoryReadAsync` / `HistoryReadProcessedAsync` / `HistoryReadAtTimeAsync` / `HistoryReadEventsAsync`. `GalaxyProxyDriver.MapAggregateToColumn` translates `HistoryAggregateType` → `AnalogSummaryQuery` column names on the proxy side so Host stays OPC-UA-free. |
+| MxAccess subscription push frames | Closed under Medium 3 above |
+| Wonderware Historian-backed HistoryRead | Closed under the Historian port row |
+| Alarm subsystem wire-up | PR 14. `GalaxyAlarmTracker` in `Backend/Alarms/` advises the four Galaxy alarm-state attributes per `IsAlarm=true` attribute (`.InAlarm`, `.Priority`, `.DescAttrName`, `.Acked`), runs the OPC UA Part 9 lifecycle simplified for the Galaxy AlarmExtension model, raises `AlarmTransition` events (Active / Acknowledged / Inactive) forwarded through the existing `OnAlarmEvent` IPC frame. `AcknowledgeAlarmAsync` writes operator comment to `.AckMsg` through the PR 4 TCS-by-handle write path. |
+| Reconnect-without-recycle in MxAccessClient | Closed under High 2 (reconnect-with-replay loop is the "without-recycle" path — supervisor recycle remains the fallback). |
+| Real downstream-consumer cutover | Out of scope for this repo; phased Year-3 rollout per `docs/v2/plan.md` §Rollout — not a Phase 2 deliverable. |
+
+## 2026-04-20 test baseline
+
+Full-solution `dotnet test ZB.MOM.WW.OtOpcUa.slnx` on `v2` tip:
+
+| Project | Pass | Skip | Target |
+|---|---:|---:|---|
+| Core.Abstractions.Tests | 37 | 0 | net10 |
+| Client.Shared.Tests | 136 | 0 | net10 |
+| Client.CLI.Tests | 52 | 0 | net10 |
+| Client.UI.Tests | 98 | 0 | net10 |
+| Driver.S7.Tests | 58 | 0 | net10 |
+| Driver.Modbus.Tests | 182 | 0 | net10 |
+| Driver.Modbus.IntegrationTests | 2 | 21 | net10 (Docker-gated) |
+| Driver.AbLegacy.Tests | 96 | 0 | net10 |
+| Driver.AbCip.Tests | 211 | 0 | net10 |
+| Driver.AbCip.IntegrationTests | 11 | 1 | net10 (ab_server-gated) |
+| Driver.TwinCAT.Tests | 110 | 0 | net10 |
+| Driver.OpcUaClient.Tests | 78 | 0 | net10 |
+| Driver.FOCAS.Tests | 119 | 0 | net10 |
+| Driver.Galaxy.Shared.Tests | 6 | 0 | net10 |
+| Driver.Galaxy.Proxy.Tests | 18 | 7 | net10 (live-Galaxy-gated) |
+| **Driver.Galaxy.Host.Tests** | **107** | **0** | **net48 x86** |
+| Analyzers.Tests | 5 | 0 | net10 |
+| Core.Tests | 182 | 0 | net10 |
+| Configuration.Tests | 71 | 0 | net10 |
+| Admin.Tests | 92 | 0 | net10 |
+| Server.Tests | 173 | 0 | net10 |
+| **Total** | **1844** | **29** | |
+
+**Observed flake**: one Configuration.Tests failure on the first full-solution run turned
+green on re-run. Not a stable regression; logged as a known flake until it reproduces.
+
+**Skips are all infra-gated**:
+- Modbus 21 skips — oitc/modbus-server Docker container not started.
+- AbCip 1 skip — libplctag `ab_server` binary not on PATH.
+- Galaxy.Proxy 7 skips — live Galaxy stack not reachable from the current shell (admin-token pipe ACL).
+
+## What "Phase 2 closed" means for Phase 3 and later
+
+- Galaxy runs as first-class v2 driver, same capability-interface contract as Modbus / S7 /
+  AbCip / AbLegacy / TwinCAT / FOCAS / OpcUaClient.
+- No v1 code path remains. Anything invoking the `ZB.MOM.WW.LmxOpcUa.*` namespaces is
+  historical; any future work routes through `Driver.Galaxy.Proxy` + the named-pipe IPC.
+- The 2026-04-13 stability findings live on as named regression tests under
+  `tests/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.E2E/StabilityFindingsRegressionTests.cs` — a
+  future refactor that reintroduces any of those four defects trips the test.
+- Aveva Historian integration is wired end-to-end; new driver families don't need
+  Historian-specific plumbing in the IPC — they just implement `IHistoryProvider`.
+
+## Outstanding — not Phase 2 blockers
+
+- **AB CIP whole-UDT read optimization** (task #194) — niche performance win for large UDT
+  reads; current per-member fan-out works correctly.
+- **AB CIP `IAlarmSource` via tag-projected ALMA/ALMD** (task #177) — AB CIP driver doesn't
+  currently expose alarms; feature-flagged follow-up.
+- **IdentificationFolderBuilder wire-in** (task #195) — blocked on Equipment node walker.
+- **UnsTab Playwright E2E** (task #199) — infra setup PR.
+
+None of these are Phase 2 scope; all are tracked independently.
diff --git a/docs/v2/implementation/exit-gate-phase-2-final.md b/docs/v2/implementation/exit-gate-phase-2-final.md
index 17725e0..1a66004 100644
--- a/docs/v2/implementation/exit-gate-phase-2-final.md
+++ b/docs/v2/implementation/exit-gate-phase-2-final.md
@@ -1,5 +1,11 @@
 # Phase 2 Final Exit Gate (2026-04-18)
 
+> **⚠️ Superseded by [`exit-gate-phase-2-closed.md`](exit-gate-phase-2-closed.md) (2026-04-20).**
+> This doc captures the snapshot at PR 2 merge — when the four `High` + `Medium` findings
+> in the adversarial review were still OPEN and Historian port + alarm subsystem were still
+> deferred. All of those closed subsequently (PR 4 + PR 12 + PR 13 + PR 14 + PR 61). Kept
+> as historical evidence; consult the close-out doc for current Phase 2 status.
+
 > Supersedes `phase-2-partial-exit-evidence.md` and `exit-gate-phase-2.md`. Captures the
 > as-built state at the close of Phase 2 work delivered across two PRs.
 
diff --git a/docs/v2/plan.md b/docs/v2/plan.md
index 0df8ad6..cf21ca9 100644
--- a/docs/v2/plan.md
+++ b/docs/v2/plan.md
@@ -736,7 +736,7 @@ Each step leaves the system runnable. The generic extraction is effectively free
 6. **Wire `Server`** — bootstrap from Configuration using an instance-bound credential (cert/gMSA/SQL login), fail fast if the credential is rejected, register drivers, start Core.
 7. **Scaffold `Admin`** — Blazor Server app with: instance + credential management, draft/publish/rollback generation workflow (diff viewer, "publish to fleet", per-instance override), and core CRUD for drivers/devices/tags. Driver-specific config screens deferred to later phases.
 
-**Phase 2 — Galaxy driver (prove the refactor)**
+**Phase 2 — Galaxy driver (prove the refactor) — ✅ CLOSED 2026-04-20** (see [`implementation/exit-gate-phase-2-closed.md`](implementation/exit-gate-phase-2-closed.md))
 8. **Build `Galaxy.Shared`** — .NET Standard 2.0 IPC message contracts
 9. **Build `Galaxy.Host`** — .NET 4.8 x86 process hosting MxAccessBridge, GalaxyRepository, alarms, HDA with IPC server
 10. **Build `Galaxy.Proxy`** — .NET 10 in-process proxy implementing IDriver interfaces, forwarding over IPC