# Phase 2 — Partial Exit Evidence (2026-04-17)

> This records what Phase 2 of v2 completed in the current session and what was explicitly
> deferred. See `phase-2-galaxy-out-of-process.md` for the full task plan; this is the as-built
> delta.

## Status: **Streams A + B + C scaffolded and test-green. Streams D + E deferred — runtime now in place.**

The goal per the plan is "parity, not regression" — the phase exit gate requires v1
IntegrationTests to pass against the v2 Galaxy.Proxy + Galaxy.Host topology byte-for-byte.
Achieving that requires live MXAccess runtime plus the Galaxy code lift out of the legacy
`OtOpcUa.Host`. Without that cycle, deleting the legacy Host would break the 494 passing v1
tests that are the parity baseline.

> **Update 2026-04-17 — runtime confirmed local.** The dev box has the full AVEVA stack required
> for the LmxOpcUa breakout: 27 ArchestrA / Wonderware / AVEVA services running including
> `aaBootstrap`, `aaGR` (Galaxy Repository), `aaLogger`, `aaUserValidator`, `aaPim`,
> `ArchestrADataStore`, `AsbServiceManager`; the full Historian set
> (`aahClientAccessPoint`, `aahGateway`, `aahInSight`, `aahSearchIndexer`, `InSQLStorage`,
> `InSQLConfiguration`, `InSQLEventSystem`, `InSQLIndexing`, `InSQLIOServer`,
> `HistorianSearch-x64`); SuiteLink (`slssvc`); MXAccess COM at
> `C:\Program Files (x86)\ArchestrA\Framework\bin\ArchestrA.MXAccess.dll`; and the OI-Gateway
> install at `C:\Program Files (x86)\Wonderware\OI-Server\OI-Gateway\` (so the
> AppServer-via-OI-Gateway smoke test from decision #142 is *also* runnable here, not blocked
> on a dedicated AVEVA test box).
>
> The "needs a dev Galaxy" prerequisite is therefore satisfied. Stream D + E can start whenever
> the team is ready to take the parity-cycle hit on the 494 v1 tests; no environmental blocker
> remains.

What *is* done: all scaffolding, IPC contracts, supervisor logic, and stability protections
needed to hang the real MXAccess code onto.
Every piece has unit-level or IPC-level test coverage.

## Delivered

### Stream A — `Driver.Galaxy.Shared` (1 week estimate, **complete**)

- `src/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Shared/` (.NET Standard 2.0, MessagePack-only
  dependency)
- **Contracts**: `Hello`/`HelloAck` (version negotiation per Task A.3),
  `OpenSessionRequest`/`OpenSessionResponse`/`CloseSessionRequest`,
  `Heartbeat`/`HeartbeatAck`, `ErrorResponse`, `DiscoverHierarchyRequest`/`Response` +
  `GalaxyObjectInfo` + `GalaxyAttributeInfo`, `ReadValuesRequest`/`Response`,
  `WriteValuesRequest`/`Response`,
  `SubscribeRequest`/`Response`/`UnsubscribeRequest`/`OnDataChangeNotification`,
  `AlarmSubscribeRequest`/`GalaxyAlarmEvent`/`AlarmAckRequest`,
  `HistoryReadRequest`/`Response`+`HistoryTagValues`,
  `HostConnectivityStatus`+`RuntimeStatusChangeNotification`,
  `RecycleHostRequest`/`RecycleStatusResponse`
- **Framing**: length-prefixed (decision #28) + 1-byte kind tag + MessagePack body. 16 MiB
  body cap. `FrameWriter`/`FrameReader` with thread-safe write gate.
- **Tests (6)**: reflection-scan round-trip for every `[MessagePackObject]`,
  referenced-assemblies guard (only MessagePack allowed outside BCL), Hello version defaults,
  `FrameWriter`↔`FrameReader` interop, oversize-frame rejection.

### Stream B — `Driver.Galaxy.Host` (3–4 week estimate, **scaffold complete; MXAccess lift deferred**)

- `src/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Host/` (.NET Framework 4.8 AnyCPU — flips to x86 when
  the Galaxy code lift happens per Task B.1 scope)
- **`Ipc/PipeAcl`**: builds the strict `PipeSecurity` — allow configured server-principal SID,
  explicit deny on LocalSystem + Administrators, owner = allowed SID (decision #76).
- **`Ipc/PipeServer`**: named-pipe server that (1) enforces the ACL, (2) verifies caller SID
  via `pipe.RunAsClient` + `WindowsIdentity.GetCurrent`, (3) requires the per-process shared
  secret in the Hello frame before any other RPC, (4) rejects major-version mismatches.
- **`Stability/MemoryWatchdog`**: Galaxy thresholds — warn at `max(1.5×baseline, +200 MB)`,
  soft-recycle at `max(2×baseline, +200 MB)`, hard ceiling 1.5 GB, slope ≥5 MB/min over
  30 min. Pluggable RSS source for unit testability.
- **`Stability/RecyclePolicy`**: 1-recycle/hr cap; 03:00 local daily scheduled recycle.
- **`Stability/PostMortemMmf`**: ring buffer of 1000 × 256-byte entries in
  `%ProgramData%\OtOpcUa\driver-postmortem\galaxy.mmf`. Single-writer / multi-reader.
  Survives hard crash; supervisor reads the MMF via a second process.
- **`Sta/MxAccessHandle`**: `SafeHandle` subclass — `ReleaseHandle` calls
  `Marshal.ReleaseComObject` in a loop until refcount = 0 then invokes the optional
  `unregister` callback. Finalizer-safe. Wraps any RCW via `object` so we can unit-test
  against a mock; the real wiring to `ArchestrA.MxAccess.LMXProxyServer` lands with the
  deferred code move.
- **`Sta/StaPump`**: dedicated STA thread with `BlockingCollection` work queue +
  `InvokeAsync` dispatch. Responsiveness probe (`IsResponsiveAsync`) returns false on wedge.
  The real Win32 `GetMessage/DispatchMessage` pump from v1 `LmxProxy.Host` slots in here with
  the same dispatch semantics.
- **`IsExternalInit` shim**: required for `init` setters on .NET 4.8.
- **`Program.cs`**: reads `OTOPCUA_GALAXY_PIPE`, `OTOPCUA_ALLOWED_SID`,
  `OTOPCUA_GALAXY_SECRET` from env (supervisor sets at spawn), runs the pipe server, logs via
  Serilog to `%ProgramData%\OtOpcUa\galaxy-host-YYYY-MM-DD.log`.
- **`Ipc/StubFrameHandler`**: placeholder that heartbeat-acks and returns `not-implemented`
  errors. Swapped for the real Galaxy-backed handler when the MXAccess code move completes.
- **Tests (15)**: `MemoryWatchdog` thresholds + slope detection; `RecyclePolicy` cap + daily
  schedule; `PostMortemMmf` round-trip + ring-wrap + truncation-safety; `StaPump`
  apartment-state + responsiveness-probe wedge detection.
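
The `MemoryWatchdog` thresholds listed above can be worked through numerically. The sketch below is a language-neutral illustration in Python, not the project's .NET code, and every name in it is hypothetical. It rests on three assumptions the text does not confirm: `+200 MB` is read as "baseline + 200 MB", 1.5 GB is taken as 1536 MiB, and the slope is measured between the newest sample and the latest sample at least 30 minutes older. The action taken when the slope trips is not stated here, so the sketch only reports it.

```python
from dataclasses import dataclass

MB = 1024 * 1024


@dataclass(frozen=True)
class WatchdogThresholds:
    """Hypothetical container for the Galaxy thresholds quoted in the text."""
    baseline_bytes: int
    floor_delta_bytes: int = 200 * MB        # assumption: "+200 MB" means baseline + 200 MB
    hard_ceiling_bytes: int = 1536 * MB      # assumption: 1.5 GB in binary units
    slope_limit_bytes_per_min: float = 5 * MB
    slope_window_min: float = 30.0

    @property
    def warn_bytes(self) -> int:
        return max(int(1.5 * self.baseline_bytes),
                   self.baseline_bytes + self.floor_delta_bytes)

    @property
    def soft_recycle_bytes(self) -> int:
        return max(2 * self.baseline_bytes,
                   self.baseline_bytes + self.floor_delta_bytes)


def classify(t: WatchdogThresholds, rss_bytes: int) -> str:
    """Map a resident-set-size reading to the most severe level it reaches."""
    if rss_bytes >= t.hard_ceiling_bytes:
        return "hard-ceiling"
    if rss_bytes >= t.soft_recycle_bytes:
        return "soft-recycle"
    if rss_bytes >= t.warn_bytes:
        return "warn"
    return "ok"


def slope_exceeded(t: WatchdogThresholds, samples: list[tuple[float, int]]) -> bool:
    """samples: (minute, rss_bytes) pairs, oldest first.

    Returns False until the history spans the full window.
    """
    if len(samples) < 2:
        return False
    end_min, end_rss = samples[-1]
    older = [(m, r) for m, r in samples if m <= end_min - t.slope_window_min]
    if not older:
        return False
    start_min, start_rss = older[-1]
    return (end_rss - start_rss) / (end_min - start_min) >= t.slope_limit_bytes_per_min
```

With a 400 MB baseline this reading gives a warn threshold of 600 MB and a soft-recycle threshold of 800 MB. For baselines at or below 200 MB the two thresholds coincide at baseline + 200 MB, which is worth checking against the real implementation before relying on this interpretation.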

### Stream C — `Driver.Galaxy.Proxy` (1.5 week estimate, **complete as IPC-forwarder**)

- `src/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Proxy/` (.NET 10)
- **`Ipc/GalaxyIpcClient`**: Hello handshake + shared-secret authentication + single-call
  request/response over the data-plane pipe. Serializes concurrent callers via
  `SemaphoreSlim`. Lifts `ErrorResponse` to `GalaxyIpcException` with the error code.
- **`GalaxyProxyDriver`**: implements `IDriver` + `ITagDiscovery`. Forwards lifecycle and
  discovery over IPC; maps Galaxy MX data types → `DriverDataType` and security
  classifications → `SecurityClassification`. Stream C-plan capability interfaces for
  `IReadable`, `IWritable`, `ISubscribable`, `IAlarmSource`, `IHistoryProvider`,
  `IHostConnectivityProbe`, `IRediscoverable` are structured identically — wire them in when
  the Host's MXAccess backend exists so the round-trips can actually serve data.
- **`Supervisor/Backoff`**: 5s → 15s → 60s capped; `RecordStableRun` resets after 2-min
  successful run.
- **`Supervisor/CircuitBreaker`**: 3 crashes per 5 min opens; cooldown escalates
  1h → 4h → manual (`TimeSpan.MaxValue`). Sticky alert doesn't auto-clear when cooldown
  elapses; `ManualReset` only.
- **`Supervisor/HeartbeatMonitor`**: 2s cadence, 3 consecutive misses = host dead.
- **Tests (11)**: `Backoff` sequence + reset; `CircuitBreaker` full 1h/4h/manual escalation
  path; `HeartbeatMonitor` miss-count + ack-reset; full IPC handshake round-trip (Host + Proxy
  over a real named pipe, heartbeat ack verified; shared-secret mismatch rejected with
  `UnauthorizedAccessException`).

## Deferred (explicitly noted as TODO)

### Stream D — Retire legacy `OtOpcUa.Host`

**Not executable until Stream E parity passes.** Deleting the legacy project now would break
the 494 v1 IntegrationTests that are the parity baseline. Recovery requires:

1. Host MXAccess code lift (Task B.1 "move Galaxy code") from `OtOpcUa.Host/` into
   `OtOpcUa.Driver.Galaxy.Host/` — STA pump wiring, `MxAccessHandle` backing the real
   `LMXProxyServer`, `GalaxyRepository` and its SQL queries, `GalaxyRuntimeProbeManager`,
   Historian loader, the Ipc stub handler replaced with a real `IFrameHandler` that invokes
   the handle.
2. Address-space build via `IAddressSpaceBuilder` produces byte-equivalent OPC UA browse
   output to v1 (Task C.4).
3. Windows service installer registers two services (`OtOpcUa` + `OtOpcUaGalaxyHost`) with
   the correct service-account SIDs and per-process secret provisioning. Galaxy.Host starts
   before OtOpcUa.
4. `appsettings.json` Galaxy config (MxAccess / Galaxy / Historian sections) migrated into
   `DriverInstance.DriverConfig` JSON in the Configuration DB via an idempotent migration
   script. Post-migration, the local `appsettings.json` keeps only `Cluster.NodeId`,
   `ClusterId`, and the DB conn string per decision #18.

### Stream E — Parity validation

Requires live MXAccess + Galaxy runtime and the above lift complete. Work items:

- Run v1 IntegrationTests against the v2 Galaxy.Proxy + Galaxy.Host topology. Pass count = v1
  baseline; failures = 0. Per-test duration regression report flags any test >2× baseline.
- Scripted Client.CLI walkthrough recorded at Phase 2 entry gate against v1, replayed against
  v2; diff must show only timestamp/latency differences.
- Regression tests for the four 2026-04-13 stability findings (phantom probe, cross-host
  quality clear, sync-over-async guard, fire-and-forget alarm drain).
- `/codex:adversarial-review --base v2` on the merged Phase 2 diff — findings closed or
  deferred with rationale.

## Also deferred from Stream B

- **Task B.10 FaultShim** (test-only `ArchestrA.MxAccess` substitute for fault injection).
  Needs the production `ArchestrA.MxAccess` reference in place first; flagged as part of the
  plan's "mid-gate review" fallback (Risk row 7).
- **Task B.8 WM_QUIT hard-exit escalation** — wired in when the real Win32 pump replaces the
  `BlockingCollection` dispatcher. The `StaPump.IsResponsiveAsync` probe already exists; the
  supervisor escalation-to-`Environment.Exit(2)` belongs to the Program main loop after the
  pump integration.

## Cross-session impact on the build

- **Full solution**: 926 tests pass, 1 fails (pre-existing Phase 0 baseline
  `Client.CLI.Tests.SubscribeCommandTests.Execute_PrintsSubscriptionMessage` — not a Phase 2
  regression; was red before Phase 1 and stays red through Phase 2).
- **New projects added to `.slnx`**: `Driver.Galaxy.Shared`, `Driver.Galaxy.Host`,
  `Driver.Galaxy.Proxy`, plus the three matching test projects.
- **No existing tests broke.** The 494 v1 `OtOpcUa.Tests` (net48) and 6 `IntegrationTests`
  (net48) still pass because the legacy `OtOpcUa.Host` is untouched.

## Next-session checklist for Stream D + E

1. Verify the local AVEVA stack is still green (`Get-Service aaGR, aaBootstrap, slssvc` →
   Running) and the Galaxy `ZB` repository is reachable from `sqlcmd -S localhost -d ZB -E`.
   The runtime is already on this machine — no install step needed.
2. Capture Client.CLI walkthrough baseline against v1 (the parity reference).
3. Move Galaxy-specific files from `OtOpcUa.Host` into `Driver.Galaxy.Host`, renaming
   namespaces. Replace `StubFrameHandler` with the real one.
4. Wire up the real Win32 pump inside `StaPump` (lift from scadalink-design's
   `LmxProxy.Host` reference per CLAUDE.md).
5. Run v1 IntegrationTests against the v2 topology — iterate on parity defects until green.
6. Run Client.CLI walkthrough and diff.
7. Regression tests for the four 2026-04-13 stability findings.
8. Delete legacy `OtOpcUa.Host`; update `.slnx`; update installer scripts.
9. Optional but valuable now that the runtime is local: AppServer-via-OI-Gateway smoke test
   (decision #142 / Phase 1 Task E.10) — the OI-Gateway install at
   `C:\Program Files (x86)\Wonderware\OI-Server\OI-Gateway\` is in place; the test was
   deferred for "needs live AVEVA runtime" reasons that no longer apply on this dev box.
10. Adversarial review; `exit-gate-phase-2.md` recorded; PR merged.
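
The parity criteria behind checklist step 5 (pass count equal to the v1 baseline, zero failures, and a duration report that flags any test slower than 2× baseline) can be expressed as a small comparison. The following is a minimal sketch in Python under stated assumptions: the record shape, the function name, and the report keys are all hypothetical, and it treats slow tests as flagged rather than gate-failing, which matches the wording "flags" but is an interpretation. How the real harness exports per-test results is not described here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunResult:
    """Hypothetical per-test record exported from a test run."""
    name: str
    passed: bool
    duration_s: float


def parity_report(baseline: list[RunResult],
                  candidate: list[RunResult],
                  slow_factor: float = 2.0) -> dict:
    """Compare a v2 run (candidate) against the v1 baseline run."""
    base = {r.name: r for r in baseline}
    cand = {r.name: r for r in candidate}

    # Tests present in the baseline but absent from the candidate run.
    missing = sorted(base.keys() - cand.keys())
    # Any failing test in the candidate run breaks "failures = 0".
    failures = sorted(n for n, r in cand.items() if not r.passed)
    # Duration regressions are reported, not treated as gate failures.
    slow = sorted(
        n for n, r in cand.items()
        if n in base and r.passed
        and r.duration_s > slow_factor * base[n].duration_s
    )

    base_pass = sum(1 for r in baseline if r.passed)
    cand_pass = sum(1 for r in candidate if r.passed)
    gate_ok = not missing and not failures and cand_pass == base_pass

    return {
        "gate_ok": gate_ok,
        "missing": missing,
        "failures": failures,
        "slow": slow,
        "baseline_pass": base_pass,
        "candidate_pass": cand_pass,
    }
```

Keeping the slow list separate from `gate_ok` lets the duration report feed review without blocking the exit gate on timing noise; tightening that into a hard failure is a one-line change if the plan requires it.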