Walks through standing up both Galaxy backends side-by-side against a single live Galaxy:
- Conceptual layout (two MxAccess sessions on distinct ClientNames so they don't evict each other)
- What's already on the dev box (AVEVA + OtOpcUaGalaxyHost service)
- mxaccessgw build + run + config (API key, ClientName)
- The three OTOPCUA_PARITY_* env vars the harness reads
- HarnessShapeTests as the two-line truth-teller for "did both halves resolve"
- Galaxy-shape coverage matrix mapping each scenario to what's needed for it to assert (rather than skip)
- Soak run recipes, including the compressed-tag fallback when the dev Galaxy doesn't have 50k attributes
- Troubleshooting for the four common SkipReasons
- Three further gates before PR 7.2 lands (matrix green, soak data, pilot flip)

Explicitly drops the stale "use a non-elevated shell" precondition: the legacy Galaxy.Host pipe ACL accepts elevated and non-elevated dohertj2 alike (resolved 2026-04-24).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Galaxy parity rig — runbook
Brings up both Galaxy backends side-by-side against a single live Galaxy
so the parity matrix in docs/v2/Galaxy.ParityMatrix.md and the soak
scenario in tests/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.ParityTests/SoakScenarioTests.cs
can run for real. Closing the parity matrix is the gate for PR 7.2
(retire legacy Galaxy projects).
Conceptual layout
Galaxy ZB SQL ──┬── OtOpcUaGalaxyHost (NSSM service, net48 x86)
                │     └── MxAccess COM, ClientName "OtOpcUa-Galaxy.Host"
                │           └── named pipe "OtOpcUaGalaxy"
                │                 ▲
                │                 │ pipe IPC
                │                 │
                │           GalaxyProxyDriver ◄── parity test (legacy half)
                │
                └── mxaccessgw service
                      └── MxAccess COM, ClientName "OtOpcUa-Parity"
                            └── gRPC on http://localhost:5120
                                  ▲
                                  │ gRPC
                                  │
                            GalaxyDriver (in-process) ◄── parity test (mxgw half)
Both halves talk to the same Galaxy through two distinct MxAccess sessions (different ClientNames so they don't evict each other).
What's already on this dev box
Per ~/.claude/projects/.../memory/:
- AVEVA System Platform + Galaxy + MXAccess runtime (project_aveva_platform_installed.md).
- OtOpcUaGalaxyHost Windows service running as dohertj2, NSSM-wrapped, binary at C:\publish\OtOpcUaGalaxyHost\OtOpcUa.Driver.Galaxy.Host.exe, shared secret at .local/galaxy-host-secret.txt, ZB SQL on localhost:1433 (project_galaxy_host_installed.md).
- Parity test project (Driver.Galaxy.ParityTests) committed and skip-clean; it runs as soon as the mxgw half resolves.
Setup steps (one-time)
1. Build + run mxaccessgw
The gateway source is at C:\Users\dohertj2\Desktop\mxaccessgw\. From
that repo:
cd C:\Users\dohertj2\Desktop\mxaccessgw
dotnet publish src\MxGateway.Server -c Release -o C:\publish\MxAccessGw
Configure:
- An API key. Pick anything stable (e.g. parity-suite-key) and put it in whichever config file MxGateway.Server reads; see mxaccessgw/gateway.md for the current shape.
- ClientName for the worker's MxAccess registration. Set it to OtOpcUa-Parity so it doesn't collide with OtOpcUa-Galaxy.Host.
- Bind to http://localhost:5120 (default in launchSettings.json).
Run it as a console app for the first session — easier to inspect logs. NSSM-wrap it later if the rig becomes long-lived:
C:\publish\MxAccessGw\MxGateway.Server.exe
The worker should log a successful Register against MxAccess after a
few seconds. If it loops on Register failures, that's an MxAccess-side
problem — the legacy OtOpcUaGalaxyHost going through the same COM
stack is a known-good reference point.
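Before moving on, it can help to confirm that something is actually listening on the gateway port. The following is a minimal sketch in Python, not part of the repo: `is_listening` is a hypothetical helper name, and a successful TCP connect only shows that the port is open, not that the gRPC service is healthy or that the worker's Register succeeded.

```python
import socket


def is_listening(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connect to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution errors.
        return False


if __name__ == "__main__":
    # 5120 is the gateway port used throughout this runbook.
    print("gateway listening:", is_listening("localhost", 5120))
```

This covers roughly the same ground as `Test-NetConnection localhost -Port 5120`; either is fine for a quick check.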
2. Set the parity env vars
In the test-runner shell:
$env:OTOPCUA_PARITY_GW_ENDPOINT = "http://localhost:5120"
$env:OTOPCUA_PARITY_GW_API_KEY = "parity-suite-key" # match the gw config
$env:OTOPCUA_PARITY_CLIENT_NAME = "OtOpcUa-Parity"
Elevation status doesn't matter — the legacy Galaxy.Host pipe ACL accepts
elevated and non-elevated dohertj2 shells alike (the Administrators deny
ACE was removed 2026-04-24; see project_galaxy_host_installed.md).
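A small preflight can catch a forgotten variable before the harness reports a skip. This is a sketch under assumptions: the helper names are hypothetical, the harness itself may validate these differently, and treating an exact string match as a ClientName collision is a guess about how MxAccess compares names.

```python
import os

# The three variables this runbook sets for the harness.
REQUIRED_VARS = (
    "OTOPCUA_PARITY_GW_ENDPOINT",
    "OTOPCUA_PARITY_GW_API_KEY",
    "OTOPCUA_PARITY_CLIENT_NAME",
)

# ClientName already used by the legacy OtOpcUaGalaxyHost service.
LEGACY_CLIENT_NAME = "OtOpcUa-Galaxy.Host"


def missing_parity_vars(env=None):
    """Return the required variable names that are unset or blank."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]


def client_name_collides(env=None):
    """True if the parity ClientName equals the legacy one (exact match; an assumption)."""
    env = os.environ if env is None else env
    return env.get("OTOPCUA_PARITY_CLIENT_NAME", "") == LEGACY_CLIENT_NAME


if __name__ == "__main__":
    missing = missing_parity_vars()
    print("missing:", ", ".join(missing) if missing else "none")
    print("client name collides:", client_name_collides())
```

Run it from the same shell that will run `dotnet test`, since the variables are set per session.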
3. Verify both halves resolve
cd C:\Users\dohertj2\Desktop\lmxopcua
dotnet test tests\ZB.MOM.WW.OtOpcUa.Driver.Galaxy.ParityTests\ `
--filter "FullyQualifiedName~HarnessShapeTests"
Harness_records_a_skip_reason_for_each_unavailable_backend is the
two-line truth-teller:
- LegacyDriver and MxGatewayDriver both non-null → rig is up.
- One side null → read its LegacySkipReason / MxGatewaySkipReason and fix.
Running the matrix
Once both halves resolve:
dotnet test tests\ZB.MOM.WW.OtOpcUa.Driver.Galaxy.ParityTests\ `
--filter "Category=ParityE2E"
This runs all 17 scenario tests across the seven scenario classes (BrowseAndRead / Subscribe / Write / Alarm / History / Reconnect / ScanState). Each scenario class is independent — failures in one don't block the rest.
Track the result against docs/v2/Galaxy.ParityMatrix.md. Update each
row to:
- green if the scenario passes
- yellow if it skipped because the dev Galaxy doesn't have the right shape (see coverage matrix below)
- red if it asserted a real delta — those are the deltas that block PR 7.2; chase each before retiring the legacy backend
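The colour rules above can be sketched as a small mapping. This is illustrative only: the outcome strings ("passed", "skipped", "failed") are assumptions about how results are recorded, and a failure should still be read by hand, since a test can fail for reasons other than a genuine backend delta.

```python
from collections import Counter

# Outcome strings are illustrative; map them from whatever the test runner reports.
STATUS_BY_OUTCOME = {"passed": "green", "skipped": "yellow", "failed": "red"}


def classify(outcomes):
    """Map {scenario: outcome} to {scenario: colour} for the matrix doc."""
    return {name: STATUS_BY_OUTCOME[outcome] for name, outcome in outcomes.items()}


def blocks_pr_7_2(statuses):
    """Any red row is treated as a real delta that blocks retiring the legacy backend."""
    return any(colour == "red" for colour in statuses.values())


if __name__ == "__main__":
    # Hypothetical run results, keyed by scenario class name.
    statuses = classify({
        "BrowseAndReadParityTests": "passed",
        "HistoryReadParityTests": "skipped",
        "AlarmTransitionParityTests": "failed",
    })
    print(dict(Counter(statuses.values())), "blocked:", blocks_pr_7_2(statuses))
```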
Galaxy shape needed for full coverage
Skip-on-empty-shape scenarios fail-soft today. To turn a skip into a real result, the dev Galaxy needs the shape in the right column:
| Scenario | Needs |
|---|---|
| BrowseAndReadParityTests (3 tests) | Any deployed objects with attributes |
| SubscribeAndEventRateParityTests event-rate | ≥5 attributes whose values change in 3s |
| WriteByClassificationParityTests (FreeAccess/Operate) | A FreeAccess/Operate numeric attribute |
| WriteByClassificationParityTests (Configure/Tune) | A Configure/Tune attribute |
| AlarmTransitionParityTests (2 tests) | Attributes with the $Alarm* extension |
| HistoryReadParityTests (historized set) | Attributes with the History extension |
| ScanStateProbeParityTests (2 tests) | Multiple $WinPlatform / $AppEngine objects |
The dev Galaxy from the existing E2E smoke (gr/seed-phase-7-smoke.sql)
covers most of these; the multi-platform scenario probably needs
hand-deploying a second $WinPlatform instance.
Soak run
The 24h × 50k-tag soak gates the production-confidence half of PR 7.2.
$env:OTOPCUA_SOAK_RUN = "1"
$env:OTOPCUA_SOAK_TAGS = "<actual tag count if Galaxy < 50k>"
$env:OTOPCUA_SOAK_MINUTES = "1440" # default 24h; compress for first runs
$env:OTOPCUA_SOAK_DROP_PCT = "0.5"
dotnet test tests\ZB.MOM.WW.OtOpcUa.Driver.Galaxy.ParityTests\ `
--filter "Category=Soak"
The test logs a per-minute CSV-style line to stdout:
soak,1.0,received=51234,dispatched=51234,dropped=0,ws_mb=412
soak,2.0,received=102468,dispatched=102468,dropped=0,ws_mb=415
...
Capture stdout to a file for post-run analysis. The three guards
(received growing, dropped/received ratio, working-set delta) all
fire mid-run rather than at end-of-test, so a failure surfaces within
the first few minutes if the architecture is wrong.
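For post-run analysis of the captured stdout, a parser along these lines may be enough. It is a sketch built on assumptions read off the sample lines: the second field is taken to be elapsed minutes, the counters are taken to be cumulative, and the drop limit is treated as a percentage to mirror OTOPCUA_SOAK_DROP_PCT. The actual guard thresholds live in the test and are not reproduced here.

```python
from dataclasses import dataclass


@dataclass
class SoakSample:
    minute: float
    received: int
    dispatched: int
    dropped: int
    ws_mb: int


def parse_soak_line(line):
    """Parse 'soak,<minute>,key=value,...' into a SoakSample; None for other lines."""
    parts = line.strip().split(",")
    if len(parts) < 3 or parts[0] != "soak":
        return None
    try:
        fields = dict(p.split("=", 1) for p in parts[2:])
        return SoakSample(
            minute=float(parts[1]),
            received=int(fields["received"]),
            dispatched=int(fields["dispatched"]),
            dropped=int(fields["dropped"]),
            ws_mb=int(fields["ws_mb"]),
        )
    except (KeyError, ValueError):
        # Malformed or partial line: skip it rather than abort the analysis.
        return None


def summarize(samples, drop_pct_limit=0.5):
    """Post-run view of the three guards; the limit here is an assumption."""
    if not samples:
        raise ValueError("no soak samples parsed")
    first, last = samples[0], samples[-1]
    growing = all(b.received > a.received for a, b in zip(samples, samples[1:]))
    drop_pct = 100.0 * last.dropped / last.received if last.received else 0.0
    return {
        "received_growing": growing,
        "drop_pct": drop_pct,
        "drop_ok": drop_pct <= drop_pct_limit,
        "ws_delta_mb": last.ws_mb - first.ws_mb,
    }
```

To use it, read the captured log line by line, keep the non-None results of `parse_soak_line`, and pass the list to `summarize`.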
Compressed-tag soak (when Galaxy isn't 50k tags)
A first-pass validation is fine with the override:
$env:OTOPCUA_SOAK_RUN = "1"
$env:OTOPCUA_SOAK_TAGS = "500" # whatever the dev Galaxy has
$env:OTOPCUA_SOAK_MINUTES = "60" # one hour is enough to surface plumbing bugs
$env:OTOPCUA_SOAK_DROP_PCT = "1.0"
This validates the plumbing (bounded channel, pump invariants, leak guard) but doesn't pin the 50k-tag scaling assertion. Defer the full 50k validation to a customer rig with that scale, or build a synthetic Galaxy with a script that imports 50k attributes onto a generated UDO (~2 hours of one-off work).
Troubleshooting
- MxGatewaySkipReason says "mxaccessgw not reachable": the gw isn't listening, or it's on a different port. Test-NetConnection localhost -Port 5120 is the quick check.
- MxGatewaySkipReason says "mxgateway backend boot failed: RpcException: Unauthenticated": API key mismatch. Verify the OTOPCUA_PARITY_GW_API_KEY env var matches the gw's configured key.
- LegacySkipReason says "Galaxy ZB SQL not reachable on localhost:1433": SQL Server isn't running, or its TCP listener is off. Check services.msc for the SQL Server (default) instance.
- LegacySkipReason says "Galaxy.Host EXE not built": the parity harness looks under src/.../bin/Debug/net48/. Build it once: dotnet build src\ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Host. Note the separately-published copy at C:\publish\OtOpcUaGalaxyHost\ is for the Windows service; the parity harness spawns its own subprocess.
- Both halves resolve but parity scenarios assert deltas: that's the expected outcome the rig exists to surface. Review each delta against docs/v2/Galaxy.ParityMatrix.md's "Accepted deltas" section to decide whether it's a real bug or a pre-accepted divergence.
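The four skip reasons listed in this section lend themselves to a lookup table when triaging many runs. The sketch below is hypothetical tooling, not something the harness ships: the substrings are copied from the messages quoted in this section, and the hints paraphrase the fixes given there.

```python
# Substrings come from the skip reasons quoted in this section; hints paraphrase the fixes.
HINTS = [
    ("mxaccessgw not reachable",
     "Gateway not listening or on another port; check port 5120."),
    ("Unauthenticated",
     "API key mismatch; compare OTOPCUA_PARITY_GW_API_KEY with the gateway config."),
    ("ZB SQL not reachable",
     "SQL Server stopped or TCP listener off; check services.msc."),
    ("Galaxy.Host EXE not built",
     "Run: dotnet build src\\ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Host"),
]


def hint_for(skip_reason):
    """Return the first matching hint, or None when the reason is not a known one."""
    if not skip_reason:
        return None
    lowered = skip_reason.lower()
    for needle, hint in HINTS:
        if needle.lower() in lowered:
            return hint
    return None


if __name__ == "__main__":
    print(hint_for("mxgateway backend boot failed: RpcException: Unauthenticated"))
```

An unmatched reason returns None on purpose, so new failure modes are noticed rather than silently mapped to a stale hint.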
After the rig is green
Three further steps before PR 7.2 lands:
- Promote any newly-discovered accepted-delta to the matrix doc with the why.
- Run the full 24h × 50k soak (or compressed tag count if Galaxy isn't that big) and link the stdout log in the PR description.
- Pilot the default flip (PR 7.1's Galaxy.DefaultBackend = "GalaxyMxGateway") on a single production node for ~2 weeks. Watch the OTel/metrics surface (docs/v2/Galaxy.Performance.md) for regressions.
Then 7.2 has the rollback-risk evidence its precondition asks for.