# Galaxy parity rig — runbook Brings up both Galaxy backends side-by-side against a single live Galaxy so the parity matrix in `docs/v2/Galaxy.ParityMatrix.md` and the soak scenario in `tests/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.ParityTests/SoakScenarioTests.cs` can run for real. Closing the parity matrix is the gate for PR 7.2 (retire legacy Galaxy projects). ## Conceptual layout ``` Galaxy ZB SQL ──┬── OtOpcUaGalaxyHost (NSSM service, net48 x86) │ └── MxAccess COM, ClientName "OtOpcUa-Galaxy.Host" │ └── named pipe "OtOpcUaGalaxy" │ ▲ │ │ pipe IPC │ │ │ GalaxyProxyDriver ◄── parity test (legacy half) │ └── mxaccessgw service └── MxAccess COM, ClientName "OtOpcUa-Parity" └── gRPC on http://localhost:5120 ▲ │ gRPC │ GalaxyDriver (in-process) ◄── parity test (mxgw half) ``` Both halves talk to the **same Galaxy** through **two distinct MxAccess sessions** (different ClientNames so they don't evict each other). ## What's already on this dev box Per `~/.claude/projects/.../memory/`: - **AVEVA System Platform + Galaxy + MXAccess runtime** — `project_aveva_platform_installed.md`. - **`OtOpcUaGalaxyHost`** Windows service running as `dohertj2`, NSSM-wrapped, binary at `C:\publish\OtOpcUaGalaxyHost\OtOpcUa.Driver.Galaxy.Host.exe`, shared secret at `.local/galaxy-host-secret.txt`, ZB SQL on `localhost:1433` — `project_galaxy_host_installed.md`. - **Parity test project** (`Driver.Galaxy.ParityTests`) committed and skip-clean — runs as soon as the mxgw half resolves. ## Setup steps (one-time) ### 1. Build + run mxaccessgw The gateway source is at `c:\Users\dohertj2\Desktop\mxaccessgw\`. From that repo: ```powershell cd C:\Users\dohertj2\Desktop\mxaccessgw dotnet publish src\MxGateway.Server -c Release -o C:\publish\MxAccessGw ``` Configure: - An API key. Pick anything stable (e.g. `parity-suite-key`) and put it in whichever config file `MxGateway.Server` reads — see `mxaccessgw/gateway.md` for the current shape. - ClientName for the worker's MxAccess registration — set to `OtOpcUa-Parity` so it doesn't collide with `OtOpcUa-Galaxy.Host`. - Bind to `http://localhost:5120` (default in `launchSettings.json`). Run it as a console app for the first session — easier to inspect logs. NSSM-wrap it later if the rig becomes long-lived: ```powershell C:\publish\MxAccessGw\MxGateway.Server.exe ``` The worker should log a successful `Register` against MxAccess after a few seconds. If it loops on `Register` failures, that's an MxAccess-side problem — the legacy `OtOpcUaGalaxyHost` going through the same COM stack is a known-good reference point. ### 2. Set the parity env vars In the test-runner shell: ```powershell $env:OTOPCUA_PARITY_GW_ENDPOINT = "http://localhost:5120" $env:OTOPCUA_PARITY_GW_API_KEY = "parity-suite-key" # match the gw config $env:OTOPCUA_PARITY_CLIENT_NAME = "OtOpcUa-Parity" ``` Elevation status doesn't matter — the legacy Galaxy.Host pipe ACL accepts elevated and non-elevated `dohertj2` shells alike (the Administrators deny ACE was removed 2026-04-24; see `project_galaxy_host_installed.md`). ### 3. Verify both halves resolve ```powershell cd C:\Users\dohertj2\Desktop\lmxopcua dotnet test tests\ZB.MOM.WW.OtOpcUa.Driver.Galaxy.ParityTests\ ` --filter "FullyQualifiedName~HarnessShapeTests" ``` `Harness_records_a_skip_reason_for_each_unavailable_backend` is the two-line truth-teller: - Both `LegacyDriver` non-null + both `MxGatewayDriver` non-null → rig is up. - One side null → read its `LegacySkipReason` / `MxGatewaySkipReason` and fix. ## Running the matrix Once both halves resolve: ```powershell dotnet test tests\ZB.MOM.WW.OtOpcUa.Driver.Galaxy.ParityTests\ ` --filter "Category=ParityE2E" ``` This runs all 17 scenario tests across the seven scenario classes (BrowseAndRead / Subscribe / Write / Alarm / History / Reconnect / ScanState). Each scenario class is independent — failures in one don't block the rest. Track the result against `docs/v2/Galaxy.ParityMatrix.md`. Update each row to: - **green** if the scenario passes - **yellow** if it skipped because the dev Galaxy doesn't have the right shape (see coverage matrix below) - **red** if it asserted a real delta — those are the deltas that block PR 7.2; chase each before retiring the legacy backend ## Galaxy shape needed for full coverage Skip-on-empty-shape scenarios fail-soft today. To turn a skip into a real result, the dev Galaxy needs the shape in the right column: | Scenario | Needs | |---|---| | `BrowseAndReadParityTests` (3 tests) | Any deployed objects with attributes | | `SubscribeAndEventRateParityTests` event-rate | ≥5 attributes whose values *change* in 3s | | `WriteByClassificationParityTests` (FreeAccess/Operate) | A FreeAccess/Operate numeric attribute | | `WriteByClassificationParityTests` (Configure/Tune) | A Configure/Tune attribute | | `AlarmTransitionParityTests` (2 tests) | Attributes with the `$Alarm*` extension | | `HistoryReadParityTests` (historized set) | Attributes with the History extension | | `ScanStateProbeParityTests` (2 tests) | Multiple `$WinPlatform` / `$AppEngine` objects | The dev Galaxy from the existing E2E smoke (`gr/seed-phase-7-smoke.sql`) covers most of these; the multi-platform scenario probably needs hand-deploying a second `$WinPlatform` instance. ## Soak run The 24h × 50k soak gates the production confidence half of PR 7.2. ```powershell $env:OTOPCUA_SOAK_RUN = "1" $env:OTOPCUA_SOAK_TAGS = "" $env:OTOPCUA_SOAK_MINUTES = "1440" # default 24h; compress for first runs $env:OTOPCUA_SOAK_DROP_PCT = "0.5" dotnet test tests\ZB.MOM.WW.OtOpcUa.Driver.Galaxy.ParityTests\ ` --filter "Category=Soak" ``` The test logs a per-minute CSV-style line to stdout: ``` soak,1.0,received=51234,dispatched=51234,dropped=0,ws_mb=412 soak,2.0,received=102468,dispatched=102468,dropped=0,ws_mb=415 ... ``` Capture stdout to a file for post-run analysis. The three guards (`received` growing, `dropped/received` ratio, working-set delta) all fire mid-run rather than at end-of-test, so a failure surfaces within the first few minutes if the architecture is wrong. ## Compressed-tag soak (when Galaxy isn't 50k tags) A first-pass validation is fine with the override: ```powershell $env:OTOPCUA_SOAK_RUN = "1" $env:OTOPCUA_SOAK_TAGS = "500" # whatever the dev Galaxy has $env:OTOPCUA_SOAK_MINUTES = "60" # one hour is enough to surface plumbing bugs $env:OTOPCUA_SOAK_DROP_PCT = "1.0" ``` This validates the *plumbing* (bounded channel, pump invariants, leak guard) but doesn't pin the 50k-tag scaling assertion. Defer the full 50k validation to a customer rig with that scale, or build a synthetic Galaxy with a script that imports 50k attributes onto a generated UDO (~2 hours of one-off work). ## Troubleshooting - **`MxGatewaySkipReason` says "mxaccessgw not reachable"** — the gw isn't listening, or it's on a different port. `Test-NetConnection localhost -Port 5120` is the quick check. - **`MxGatewaySkipReason` says "mxgateway backend boot failed: RpcException: Unauthenticated"** — API key mismatch. Verify the `OTOPCUA_PARITY_GW_API_KEY` env var matches the gw's configured key. - **`LegacySkipReason` says "Galaxy ZB SQL not reachable on localhost:1433"** — SQL Server isn't running, or its TCP listener is off. Check `services.msc` for the SQL Server (default) instance. - **`LegacySkipReason` says "Galaxy.Host EXE not built"** — the parity harness looks under `src/.../bin/Debug/net48/`. Build it once: `dotnet build src\ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Host`. Note the separately-published copy at `C:\publish\OtOpcUaGalaxyHost\` is for the Windows service; the parity harness spawns its own subprocess. - **Both halves resolve but parity scenarios assert deltas** — that's the expected outcome the rig exists to surface. Review each delta against `docs/v2/Galaxy.ParityMatrix.md`'s "Accepted deltas" section to decide whether it's a real bug or a pre-accepted divergence. ## After the rig is green Three further steps before PR 7.2 lands: 1. Promote any newly-discovered accepted-delta to the matrix doc with the why. 2. Run the full 24h × 50k soak (or compressed tag count if Galaxy isn't that big) and link the stdout log in the PR description. 3. Pilot the default flip (PR 7.1's `Galaxy.DefaultBackend = "GalaxyMxGateway"`) on a single production node for ~2 weeks. Watch the OTel/metrics surface (`docs/v2/Galaxy.Performance.md`) for regressions. Then 7.2 has the rollback-risk evidence its precondition asks for.