# Phase 2 Exit Gate Record (2026-04-18)

> Supersedes `phase-2-partial-exit-evidence.md`. Captures the as-built state of Phase 2 after
> the MXAccess COM client port + DB-backed and MXAccess-backed Galaxy backends + adversarial
> review.

## Status: **Streams A, B, C complete. Stream D + E gated only on legacy-Host removal + parity-test rewrite.**

The Phase 2 plan exit criterion ("v1 IntegrationTests pass against v2 Galaxy.Proxy + Galaxy.Host topology byte-for-byte") still cannot be auto-validated in a single session. The blocker is no longer "the Galaxy code lift" — that's done in this session — but the structural fact that the 494 v1 IntegrationTests instantiate v1 `OtOpcUa.Host` classes directly. They have to be rewritten to use the IPC-fronted Proxy topology before legacy `OtOpcUa.Host` can be deleted, and the plan budgets that work as a multi-day debug-cycle (Task E.1).

What changed today: the MXAccess COM client now exists in Galaxy.Host with a real `ArchestrA.MxAccess.dll` reference, runs end-to-end against live `LMXProxyServer`, and 3 live COM smoke tests pass on this dev box. `MxAccessGalaxyBackend` (the third `IGalaxyBackend` implementation, alongside `StubGalaxyBackend` and `DbBackedGalaxyBackend`) combines the ported `GalaxyRepository` with the ported `MxAccessClient` so Discover / Read / Write / Subscribe all flow through one production-shape backend. `Program.cs` selects between the three backends via the `OTOPCUA_GALAXY_BACKEND` env var (default = `mxaccess`).

## Delivered in Phase 2 (full scope, not just scaffolds)

### Stream A — Driver.Galaxy.Shared (✅ complete)

- 9 contract files: Hello/HelloAck (version negotiation), OpenSession/CloseSession/Heartbeat, Discover + GalaxyObjectInfo + GalaxyAttributeInfo, Read/Write + GalaxyDataValue, Subscribe/Unsubscribe/OnDataChange, AlarmSubscribe/Event/Ack, HistoryRead, HostConnectivityStatus, Recycle.
- Length-prefixed framing (4-byte BE length + 1-byte kind + MessagePack body) with a 16 MiB cap.
- Thread-safe `FrameWriter` (semaphore-gated) and single-consumer `FrameReader`.
- 6 round-trip tests + reflection-scan that asserts contracts only reference BCL + MessagePack.

### Stream B — Driver.Galaxy.Host (✅ complete, exceeded original scope)

- Real Win32 message pump in `StaPump` — `GetMessage`/`PostThreadMessage`/`PeekMessage`/`PostQuitMessage` P/Invoke, dedicated STA thread, `WM_APP=0x8000` work dispatch, `WM_APP+1` graceful-drain → `PostQuitMessage`, 5s join-on-dispose, responsiveness probe.
- Strict `PipeAcl` (allow configured server SID only, deny LocalSystem + Administrators), `PipeServer` with caller-SID verification + per-process shared-secret `Hello` handshake.
- Galaxy-specific `MemoryWatchdog` (warn `max(1.5×baseline, +200 MB)`, soft-recycle `max(2×baseline, +200 MB)`, hard ceiling 1.5 GB, slope ≥5 MB/min over 30-min window).
- `RecyclePolicy` (1/hr cap + 03:00 daily scheduled), `PostMortemMmf` (1000-entry ring buffer, hard-crash survivable, cross-process readable), `MxAccessHandle : SafeHandle`.
- `IGalaxyBackend` interface + 3 implementations:
  - **`StubGalaxyBackend`** — keeps IPC end-to-end testable without Galaxy.
  - **`DbBackedGalaxyBackend`** — real Discover via the ported `GalaxyRepository` against ZB.
  - **`MxAccessGalaxyBackend`** — Discover via DB + Read/Write/Subscribe via the ported `MxAccessClient` over the StaPump.
- `GalaxyRepository` ported from v1 (HierarchySql + AttributesSql byte-for-byte identical).
- `MxAccessClient` ported from v1 (Connect/Read/Write/Subscribe/Unsubscribe + ConcurrentDict handle tracking + OnDataChange / OnWriteComplete event marshalling). The reconnect loop + Historian plugin loader + extended-attribute query are explicit follow-ups.
- `MxProxyAdapter` + `IMxProxy` for COM-isolation testability.
- `Program.cs` env-driven backend selection (`OTOPCUA_GALAXY_BACKEND=stub|db|mxaccess`, `OTOPCUA_GALAXY_ZB_CONN`, `OTOPCUA_GALAXY_CLIENT_NAME`, plus the Phase 2 baseline `OTOPCUA_GALAXY_PIPE` / `OTOPCUA_ALLOWED_SID` / `OTOPCUA_GALAXY_SECRET`).
- ArchestrA.MxAccess.dll referenced via HintPath at `lib/ArchestrA.MxAccess.dll`. Project flipped to **x86 platform target** (the COM interop requires it).

### Stream C — Driver.Galaxy.Proxy (✅ complete)

- `GalaxyProxyDriver` implements **all 9** capability interfaces — `IDriver`, `ITagDiscovery`, `IReadable`, `IWritable`, `ISubscribable`, `IAlarmSource`, `IHistoryProvider`, `IRediscoverable`, `IHostConnectivityProbe` — each forwarding through the matching IPC contract.
- `GalaxyIpcClient` with `CallAsync` (request/response gated through a semaphore so concurrent callers don't interleave frames) + `SendOneWayAsync` for fire-and-forget calls (Unsubscribe / AlarmAck / CloseSession).
- `Backoff` (5s → 15s → 60s, capped, reset-on-stable-run), `CircuitBreaker` (3 crashes per 5 min opens; 1h → 4h → manual escalation; sticky alert), `HeartbeatMonitor` (2s cadence, 3 misses = host dead).

### Tests

- **963 pass / 1 pre-existing baseline** across the full solution.
- New in this session:
  - `StaPumpTests` — pump still passes 3/3 against the real Win32 implementation
  - `EndToEndIpcTests` (5) — every IPC operation through Pipe + dispatcher + StubBackend
  - `IpcHandshakeIntegrationTests` (2) — Hello + heartbeat + secret rejection
  - `GalaxyRepositoryLiveSmokeTests` (5) — live SQL against ZB, skip when ZB unreachable
  - `MxAccessLiveSmokeTests` (3) — live COM against running `aaBootstrap` + `LMXProxyServer`
  - All net48 x86 to match Galaxy.Host

## Adversarial review findings

Independent pass over the Phase 2 deltas. Findings ranked by severity; **all open items are explicitly deferred to Stream D/E or v2.1 with rationale.**

### Critical — none.

### High

1. **MxAccess `ReadAsync` has a subscription-leak window on cancellation.** The one-shot read uses subscribe → first-OnDataChange → unsubscribe. If the caller cancels between the `SubscribeOnPumpAsync` await and the `tcs.Task` await, the subscription stays installed. *Mitigation:* the StaPump's idempotent unsubscribe path drops orphan subs at disconnect, but a long-running session leaks them. **Fix scoped to Phase 2 follow-up** alongside the proper subscription registry that v1 had.

2. **No reconnect loop on the MXAccess COM connection.** v1's `MxAccessClient.Monitor` polled a probe tag and triggered reconnect-with-replay on disconnection. The ported client's `ConnectAsync` is one-shot and there's no health monitor. *Mitigation:* the Tier C supervisor on the Proxy side (CircuitBreaker + HeartbeatMonitor) restarts the whole Host process on liveness failure, so connection loss surfaces as a process recycle rather than silent data loss. **Reconnect-without-recycle is a v2.1 refinement** per `driver-stability.md`.

### Medium

3. **`MxAccessGalaxyBackend.SubscribeAsync` doesn't push OnDataChange frames back to the Proxy.** The wire frame `MessageKind.OnDataChangeNotification` is defined and `GalaxyProxyDriver` has the `RaiseDataChange` internal entry point, but the Host-side push pipeline isn't wired — the subscribe registers on the COM side but the value just gets discarded. *Mitigation:* the SubscribeAsync handle is still useful for the ack flow, and one-shot reads work. **Push plumbing is the next-session item.**

4. **`WriteValuesAsync` doesn't await the OnWriteComplete callback.** v1's implementation awaited a TCS keyed on the item handle; the port fires the write and returns success without confirming the runtime accepted it. *Mitigation:* the StatusCode in the response will be 0 (Good) for a fire-and-forget — false positive if the runtime rejects post-callback. **Fix needs the same TCS-by-handle pattern as v1; queued.**

5. **`MxAccessGalaxyBackend.Discover` re-queries SQL on every call.** v1 cached the tree and only refreshed on the deploy-watermark change. *Mitigation:* AttributesSql is the slow one (~30s for a large Galaxy); first-call latency is the symptom, not data loss. **Caching + `IRediscoverable` push is a v2.1 follow-up.**

### Low

6. **Live MXAccess test `Backend_ReadValues_against_discovered_attribute_returns_a_response_shape` silently passes if no readable attribute is found.** Documented; the test asserts the *shape* not the *value* because some Galaxy installs are configuration-only.

7. **`FrameWriter` allocates the length-prefix as a 4-byte heap array per call.** Could be stackalloc. Microbenchmark not done — currently irrelevant.

8. **`MxProxyAdapter.Unregister` swallows exceptions during `Unregister(handle)`.** v1 did the same; documented as best-effort during teardown. Consider logging the swallow.

### Out of scope (correctly deferred)

- Stream D.1 — delete legacy `OtOpcUa.Host`. **Cannot be done in any single session** because the 494 v1 IntegrationTests reference Host classes directly. Requires the test rewrite cycle in Stream E.
- Stream E.1 — run v1 IntegrationTests against v2 topology. Requires (a) test rewrite to use Proxy/Host instead of in-process Host classes, then (b) the parity-debug iteration that the plan budgets 3-4 weeks for.
- Stream E.2 — Client.CLI walkthrough diff. Requires the v1 baseline capture.
- Stream E.3 — four 2026-04-13 stability findings regression tests. Requires the parity test harness from Stream E.1.
- Wonderware Historian SDK plugin loader (Task B.1.h). HistoryRead returns a recognisable error until the plugin loader is wired.
- Alarm subsystem wire-up (`MxAccessGalaxyBackend.SubscribeAlarmsAsync` is a no-op today). v1's alarm tracking is its own subtree; queued as Phase 2 follow-up.

## Stream-D removal checklist (next session)

1. Decide policy on the 494 v1 tests:
   - **Option A**: rewrite to use `Driver.Galaxy.Proxy` + `Driver.Galaxy.Host` topology (multi-day; full parity validation as a side effect)
   - **Option B**: archive them as `OtOpcUa.Tests.v1Archive` and write a smaller v2 parity suite against the new topology (faster; less coverage initially)
2. Execute the chosen option.
3. Delete `src/ZB.MOM.WW.OtOpcUa.Host/`, remove from `.slnx`.
4. Update Windows service installer to register two services (`OtOpcUa` + `OtOpcUaGalaxyHost`) with the correct service-account SIDs.
5. Migration script for `appsettings.json` Galaxy sections → `DriverInstance.DriverConfig` JSON.
6. PR + adversarial review + `exit-gate-phase-2-final.md`.

## What ships from this session

Eight commits on `phase-1-configuration` since the previous push:

- `01fd90c` Phase 1 finish + Phase 2 scaffold
- `7a5b535` Admin UI core
- `18f93d7` LDAP + SignalR
- `a1e9ed4` AVEVA-stack inventory doc
- `32eeeb9` Phase 2 A+B+C feature-complete
- `549cd36` GalaxyRepository ported + DbBackedBackend + live ZB smoke
- `(this commit)` MXAccess COM port + MxAccessGalaxyBackend + live MXAccess smoke + adversarial review

`494/494` v1 tests still pass. No regressions.
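
For reviewers who want a concrete picture of the Stream A wire framing (4-byte big-endian length + 1-byte kind + MessagePack body, 16 MiB cap), the following is a minimal, language-neutral sketch in Python rather than the project's own code. It rests on assumptions this record does not settle: the length field is taken to count the body only (not the kind byte), the body is treated as already-serialized opaque bytes, and the names `encode_frame`, `decode_frame`, and `MAX_FRAME_BYTES` are hypothetical.

```python
import struct
from typing import Tuple

# 16 MiB cap, as stated in the Stream A notes.
MAX_FRAME_BYTES = 16 * 1024 * 1024
# ">IB" = big-endian unsigned 32-bit length followed by one unsigned byte (kind).
HEADER = struct.Struct(">IB")


def encode_frame(kind: int, body: bytes) -> bytes:
    """Prefix an already-serialized body with its length and kind."""
    if not 0 <= kind <= 0xFF:
        raise ValueError("kind must fit in one byte")
    if len(body) > MAX_FRAME_BYTES:
        raise ValueError("body exceeds the 16 MiB cap")
    # Assumption: the length counts the body only, not the kind byte.
    return HEADER.pack(len(body), kind) + body


def decode_frame(buffer: bytes) -> Tuple[int, bytes, bytes]:
    """Return (kind, body, remaining bytes) for the first frame in buffer."""
    if len(buffer) < HEADER.size:
        raise ValueError("incomplete header")
    length, kind = HEADER.unpack_from(buffer)
    # Reject oversized frames before trying to read the body.
    if length > MAX_FRAME_BYTES:
        raise ValueError("declared length exceeds the 16 MiB cap")
    end = HEADER.size + length
    if len(buffer) < end:
        raise ValueError("incomplete body")
    return kind, buffer[HEADER.size:end], buffer[end:]
```

Checking the declared length before reading the body is what lets a reader refuse an oversized frame without allocating for it; whether the real `FrameReader` does this in the same order is not stated here.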