The parity matrix gate is the precondition for retiring the legacy Galaxy projects. The 24h × 50k soak run and 2-week production pilot were sketched in early planning as additional safety nets but aren't operationally applicable for this deployment: there's no separate production fleet to pilot against, and the soak harness's value is as ongoing diagnostic infrastructure (still shipped in PR 6.4) rather than a one-shot release gate. PR 7.2's only remaining precondition is the matrix being fully green or carrying documented accepted-deltas, verified 2026-04-30 on the dev rig: 14 passed / 1 skipped / 0 failed.

Affected:
- docs/v2/Galaxy.ParityMatrix.md "Outstanding deltas": flips to "PR 7.2 is unblocked"
- docs/v2/Galaxy.ParityRig.md "After the rig is green": drops the three-step soak+pilot flow, keeps only the matrix-doc bookkeeping follow-up
- lmx_mxgw_impl.md PR 7.2 "Depends on": replaces "fully soaked" with the matrix-green precondition + the verification date

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Galaxy parity rig — runbook
Brings up both Galaxy backends side-by-side against a single live Galaxy
so the parity matrix in docs/v2/Galaxy.ParityMatrix.md and the soak
scenario in tests/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.ParityTests/SoakScenarioTests.cs
can run for real. Closing the parity matrix is the gate for PR 7.2
(retire legacy Galaxy projects).
Conceptual layout
Galaxy ZB SQL ──┬── OtOpcUaGalaxyHost (NSSM service, net48 x86)
                │      └── MxAccess COM, ClientName "OtOpcUa-Galaxy.Host"
                │      └── named pipe "OtOpcUaGalaxy"
                │             ▲
                │             │ pipe IPC
                │             │
                │      GalaxyProxyDriver ◄── parity test (legacy half)
                │
                └── mxaccessgw service
                       └── MxAccess COM, ClientName "OtOpcUa-Parity"
                       └── gRPC on http://localhost:5120
                              ▲
                              │ gRPC
                              │
                       GalaxyDriver (in-process) ◄── parity test (mxgw half)
Both halves talk to the same Galaxy through two distinct MxAccess sessions (different ClientNames so they don't evict each other).
What's already on this dev box
Per ~/.claude/projects/.../memory/:
- AVEVA System Platform + Galaxy + MXAccess runtime (see project_aveva_platform_installed.md).
- OtOpcUaGalaxyHost Windows service running as dohertj2, NSSM-wrapped, binary at C:\publish\OtOpcUaGalaxyHost\OtOpcUa.Driver.Galaxy.Host.exe, shared secret at .local/galaxy-host-secret.txt, ZB SQL on localhost:1433 (see project_galaxy_host_installed.md).
- Parity test project (Driver.Galaxy.ParityTests) committed and skip-clean; it runs as soon as the mxgw half resolves.
Setup steps (one-time)
1. Build + run mxaccessgw
The gateway source is at c:\Users\dohertj2\Desktop\mxaccessgw\.
Build both halves — the worker has to be x86 net48 (MxAccess COM
bitness), the server is .NET 10:
cd C:\Users\dohertj2\Desktop\mxaccessgw
dotnet build src\MxGateway.Worker -c Release # produces bin\x86\Release\net48\MxGateway.Worker.exe
dotnet build src\MxGateway.Server -c Release # produces bin\Release\net10.0\MxGateway.Server.dll
Initialize the auth database and mint an API key. The CLI mode is
gated by an apikey first-arg prefix:
$env:MxGateway__ApiKeyPepper = "parity-rig-dev-pepper" # any stable string for dev
$srv = "C:\Users\dohertj2\Desktop\mxaccessgw\src\MxGateway.Server\bin\Release\net10.0\MxGateway.Server.dll"
dotnet $srv apikey init-db # → "init-db: initialized"
dotnet $srv apikey create-key `
--key-id parity-rig `
--display-name "OtOpcUa-Parity" `
--scopes "session:open,session:close,invoke:read,invoke:write,invoke:secure,events:read,metadata:read"
# → "API key: mxgw_parity-rig_<base64suffix>" ← capture this; you can't list secrets later
Save that exact key string for OTOPCUA_PARITY_GW_API_KEY in step 2.
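Because the secret can't be listed again, a quick shape check on the captured string can catch a truncated paste or a leftover placeholder. The following is a minimal Python sketch, assuming the key always follows the shape of the sample output (mxgw_<key-id>_<suffix>); that format is inferred from the sample line only, and the function name looks_like_gateway_key is hypothetical.

```python
def looks_like_gateway_key(key: str, key_id: str = "parity-rig") -> bool:
    """Return True if key has the shape mxgw_<key_id>_<non-empty suffix>.

    Assumption: the format is inferred from the sample create-key output,
    not from the gateway's source.
    """
    prefix = f"mxgw_{key_id}_"
    # Reject stray whitespace and un-substituted <placeholder> text.
    if key != key.strip() or "<" in key or ">" in key:
        return False
    # Require the expected prefix followed by at least one suffix character.
    return key.startswith(prefix) and len(key) > len(prefix)
```

A False result only means the string doesn't look like the sample; the gateway itself remains the authority on whether a key is valid.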
Run the server with the same pepper plus three env-var overrides; the defaults don't quite match what gRPC + the parity test need:
$env:MxGateway__ApiKeyPepper = "parity-rig-dev-pepper" # MUST match the create-key invocation
$env:Kestrel__Endpoints__Http__Url = "http://localhost:5120"
$env:Kestrel__Endpoints__Http__Protocols = "Http2" # gRPC needs h2c on plain HTTP
$env:MxGateway__Worker__ExecutablePath = `
"C:\Users\dohertj2\Desktop\mxaccessgw\src\MxGateway.Worker\bin\x86\Release\net48\MxGateway.Worker.exe"
# appsettings.json's relative path is missing the \net48 segment; absolute path sidesteps that
dotnet $srv
# → "Now listening on: http://localhost:5120"
The worker spawns lazily on the first OpenSession RPC — there's no
worker process visible in Task Manager until the first session. If
the worker can't spawn, the server returns Failed to open session session-… with a WorkerProcessLaunchException in the server log.
NSSM-wrap it later if the rig becomes long-lived; for first-pass provisioning a console window is easier to inspect.
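Before moving on, it can help to confirm that something is actually accepting connections on the gateway port. The sketch below is a plain TCP probe in Python; it assumes the endpoint from the steps above (localhost:5120), and the function name gateway_is_listening is hypothetical. It says nothing about HTTP/2, authentication, or whether the worker can spawn.

```python
import socket


def gateway_is_listening(host: str = "localhost", port: int = 5120,
                         timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False
```

Calling gateway_is_listening() with no arguments probes the default endpoint; a True result means only that the port is open.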
2. Set the parity env vars
In the test-runner shell:
$env:OTOPCUA_PARITY_GW_ENDPOINT = "http://localhost:5120"
$env:OTOPCUA_PARITY_GW_API_KEY = "mxgw_parity-rig_<base64suffix>" # the exact key printed by create-key in step 1
$env:OTOPCUA_PARITY_CLIENT_NAME = "OtOpcUa-Parity"
Elevation status doesn't matter — the legacy Galaxy.Host pipe ACL accepts
elevated and non-elevated dohertj2 shells alike (the Administrators deny
ACE was removed 2026-04-24; see project_galaxy_host_installed.md).
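A forgotten variable shows up later as a skip reason, so a small pre-flight check can save a test run. This is a minimal Python sketch that reports which of the three variables above are unset or blank; the function name missing_parity_vars is hypothetical, and the check does not validate the values themselves.

```python
import os

# The three variables set in this step.
REQUIRED_VARS = (
    "OTOPCUA_PARITY_GW_ENDPOINT",
    "OTOPCUA_PARITY_GW_API_KEY",
    "OTOPCUA_PARITY_CLIENT_NAME",
)


def missing_parity_vars(env=None):
    """Return the names of required parity variables that are unset or blank."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]
```

Run from the same shell that set the variables, an empty list means all three are present in the process environment.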
3. Verify both halves resolve
cd C:\Users\dohertj2\Desktop\lmxopcua
dotnet test tests\ZB.MOM.WW.OtOpcUa.Driver.Galaxy.ParityTests\ `
--filter "FullyQualifiedName~HarnessShapeTests"
Harness_records_a_skip_reason_for_each_unavailable_backend is the
two-line truth-teller:
- LegacyDriver and MxGatewayDriver both non-null → rig is up.
- One side null → read its LegacySkipReason / MxGatewaySkipReason and fix.
Running the matrix
Once both halves resolve:
dotnet test tests\ZB.MOM.WW.OtOpcUa.Driver.Galaxy.ParityTests\ `
--filter "Category=ParityE2E"
This runs all 17 scenario tests across the seven scenario classes (BrowseAndRead / Subscribe / Write / Alarm / History / Reconnect / ScanState). Each scenario class is independent — failures in one don't block the rest.
Track the result against docs/v2/Galaxy.ParityMatrix.md. Update each
row to:
- green if the scenario passes
- yellow if it skipped because the dev Galaxy doesn't have the right shape (see coverage matrix below)
- red if it asserted a real delta — those are the deltas that block PR 7.2; chase each before retiring the legacy backend
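The colour mapping above can be applied mechanically to a test results file. The sketch below is Python and assumes the run was made with the TRX logger (dotnet test ... --logger trx) and that the file uses the usual TRX namespace and outcome strings (Passed, NotExecuted for skips, Failed); the function name matrix_colors is hypothetical. Anything with an unrecognised outcome is reported as red so it gets looked at.

```python
import xml.etree.ElementTree as ET

# Assumption: the standard TRX namespace and outcome strings.
TRX_NS = "{http://microsoft.com/schemas/VisualStudio/TeamTest/2010}"
COLOR_BY_OUTCOME = {"Passed": "green", "NotExecuted": "yellow", "Failed": "red"}


def matrix_colors(trx_xml: str) -> dict:
    """Map each test name in a TRX document to green / yellow / red."""
    root = ET.fromstring(trx_xml)
    colors = {}
    for result in root.iter(f"{TRX_NS}UnitTestResult"):
        outcome = result.get("outcome", "")
        # Unknown outcomes fall through to red rather than being hidden.
        colors[result.get("testName", "?")] = COLOR_BY_OUTCOME.get(outcome, "red")
    return colors
```

A yellow here still needs a human read of the skip message, since the matrix distinguishes a shape-related skip from a deferred row.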
Galaxy shape needed for full coverage
Skip-on-empty-shape scenarios fail-soft today. To turn a skip into a real result, the dev Galaxy needs the shape in the Needs column:
| Scenario | Needs | Local rig |
|---|---|---|
| BrowseAndReadParityTests (3 tests) | Any deployed objects with attributes | ✅ existing seed |
| SubscribeAndEventRateParityTests event-rate | ≥5 attributes whose values change in 3s | ⚙ scriptable via graccess-cli |
| WriteByClassificationParityTests (FreeAccess/Operate) | A FreeAccess/Operate numeric attribute | ⚙ scriptable via graccess-cli |
| WriteByClassificationParityTests (Configure/Tune) | A Configure/Tune attribute | ⚙ scriptable via graccess-cli |
| AlarmTransitionParityTests (2 tests) | Attributes with the $Alarm* extension | ⚙ scriptable via graccess-cli |
| HistoryReadParityTests (historized set) | Attributes with the History extension | ⚙ scriptable via graccess-cli |
| ScanStateProbeParityTests (2 tests) | Multiple $WinPlatform / $AppEngine objects | ❌ deferred to customer rig; this dev box is provisioned for one platform only |
The single-platform constraint
The dev box at DESKTOP-6JL3KKO is licensed / configured for a single
deployed $WinPlatform. Adding a second platform isn't feasible here,
so ScanStateProbeParityTests will skip in a "no overlap" branch on
this rig. Both of its scenarios already handle that case gracefully
(Assert.Skip("no overlapping platform hosts between backends — likely the transport names differ but no $WinPlatform was discovered")), so
the matrix reports them as n/a (deferred) rather than red.
Plan: defer the two ScanState scenarios to a customer rig with multiple
platforms. The PR 7.2 gate accepts "n/a, deferred" on these rows
provided the legacy GalaxyRuntimeProbeManager and the in-process
PerPlatformProbeWatcher have matching unit-test coverage of the
state-decoder + member-tracking logic — which they do (PR 4.7's tests).
Treat the runtime parity check as a customer-rig acceptance gate before
that customer goes live, not a precondition for retiring the legacy
projects on this dev box.
Provisioning the rest via graccess-cli
C:\Users\dohertj2\Desktop\graccess\graccess_cli\ is a .NET Framework
4.8 console app over the ArchestrA GRAccess COM API. It can configure
templates, instances, attributes, UDAs, extensions, and attribute
security — i.e. every row above marked ⚙ scriptable. Full surface in
graccess/graccess_cli/docs/usage.md and per-area workflow guides
(attribute-editing.md, template-editing.md,
template-instance-editing.md).
Reserve a sandbox UDO (e.g. OtOpcUaParityTest) to avoid mutating
attributes on plant-relevant objects. Concrete commands per requirement:
A FreeAccess/Operate numeric attribute (covers WriteByClassification FreeAccess/Operate scenario):
graccess object uda add `
--galaxy ZB --name OtOpcUaParityTest --type template `
--uda OperateValue --data-type MxFloat `
--category MxCategoryWriteable_C --security MxSecurityOperate `
--confirm --confirm-target OtOpcUaParityTest
A Configure / Tune attribute (covers WriteByClassification Configure/Tune scenario):
# Tune
graccess object uda add `
--galaxy ZB --name OtOpcUaParityTest --type template `
--uda TuneValue --data-type MxFloat `
--category MxCategoryWriteable_T --security MxSecurityTune `
--confirm --confirm-target OtOpcUaParityTest
# Configure
graccess object uda add `
--galaxy ZB --name OtOpcUaParityTest --type template `
--uda ConfigValue --data-type MxFloat `
--category MxCategoryWriteable_C --security MxSecurityConfigure `
--confirm --confirm-target OtOpcUaParityTest
A changing-value attribute (covers Subscribe event-rate scenario). Two ways:
1. On-scan increment: bind a script extension that bumps a counter each scan. Simplest to author with object extension add against ScriptExtension plus object attribute set for the script body (see attribute-editing.md §"Edit Extensions" for the pattern).
2. External writer loop: leave the attribute as plain Float and run a one-liner that writes incrementing values from the parity-test shell. Uses the legacy backend path so it's available before the mxgw subscriber is up. This keeps the Galaxy template clean.
For first-pass validation pick #2 — no template surgery needed, and the
write loop runs only during dotnet test.
Attributes with the $Alarm* extension (covers AlarmTransition
scenario). Per attribute-editing.md §"Edit Alarm Settings" the
likely-named attributes vary by extension type
(Limit, RateOfChange, etc.). Add the extension via:
graccess object extension add `
--galaxy ZB --name OtOpcUaParityTest --type template `
--extension-type AnalogLimitAlarm --primitive AlarmInput `
--object-extension `
--confirm --confirm-target OtOpcUaParityTest
Then set HiHi/Hi/Lo/LoLo limit values + priority on the resulting
attributes via object attribute set. Inspect first via
object attributes to see the names the extension introduces — they
differ across AVEVA versions.
Attributes with the History extension (covers HistoryRead routing
scenario). History settings are usually attribute or extension
attributes; attribute-editing.md §"Edit History Settings" covers the
discovery flow. Quick start:
graccess object extension add `
--galaxy ZB --name OtOpcUaParityTest --type template `
--extension-type HistoryExtension --primitive HistoryRecord `
--object-extension `
--confirm --confirm-target OtOpcUaParityTest
# Then enable history on whichever attribute the extension points at
graccess object attribute set `
--galaxy ZB --name OtOpcUaParityTest --type template `
--attribute HistoryEnabled --value true --data-type bool `
--confirm --confirm-target OtOpcUaParityTest
Deploy + restart Galaxy.Host after any of the above so MxAccess sees the change:
graccess object deploy --galaxy ZB --name OtOpcUaParityTest_001 `
--confirm --confirm-target OtOpcUaParityTest_001
Restart-Service OtOpcUaGalaxyHost
Then re-run the parity matrix. The previously-skipped scenarios should now find a sandbox attribute matching their selector and assert.
Soak run
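The 24h × 50k soak was sketched in early planning as a release gate for PR 7.2 but no longer is one; the harness stays as ongoing diagnostic infrastructure. To run it: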
$env:OTOPCUA_SOAK_RUN = "1"
$env:OTOPCUA_SOAK_TAGS = "<actual tag count if Galaxy < 50k>"
$env:OTOPCUA_SOAK_MINUTES = "1440" # default 24h; compress for first runs
$env:OTOPCUA_SOAK_DROP_PCT = "0.5"
dotnet test tests\ZB.MOM.WW.OtOpcUa.Driver.Galaxy.ParityTests\ `
--filter "Category=Soak"
The test logs a per-minute CSV-style line to stdout:
soak,1.0,received=51234,dispatched=51234,dropped=0,ws_mb=412
soak,2.0,received=102468,dispatched=102468,dropped=0,ws_mb=415
...
Capture stdout to a file for post-run analysis. The three guards
(received growing, dropped/received ratio, working-set delta) all
fire mid-run rather than at end-of-test, so a failure surfaces within
the first few minutes if the architecture is wrong.
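For post-run analysis of the captured file, the three guards can be re-checked offline. The following Python sketch parses lines in the format shown above and reports violations. It assumes that received is cumulative, that OTOPCUA_SOAK_DROP_PCT is a percentage, and that a working-set growth limit of 100 MB is reasonable; that limit and the names parse_soak_line and check_soak are hypothetical, not taken from the test source.

```python
def parse_soak_line(line: str) -> dict:
    """Parse 'soak,<minute>,key=value,...' into a dict of numbers."""
    parts = line.strip().split(",")
    if len(parts) < 3 or parts[0] != "soak":
        raise ValueError(f"not a soak line: {line!r}")
    row = {"minute": float(parts[1])}
    for field in parts[2:]:
        key, _, value = field.partition("=")
        row[key] = int(value)
    return row


def check_soak(lines, drop_pct=0.5, max_ws_growth_mb=100):
    """Return a list of guard violations found in captured soak output."""
    # Ignore anything that is not a soak line (test runner noise, blank lines).
    rows = [parse_soak_line(l) for l in lines if l.strip().startswith("soak,")]
    problems = []
    # Guard 1: the cumulative received count must keep growing.
    for prev, cur in zip(rows, rows[1:]):
        if cur["received"] <= prev["received"]:
            problems.append(f"minute {cur['minute']}: received did not grow")
    # Guard 2: dropped/received must stay at or below drop_pct percent.
    for row in rows:
        if row["received"] and 100.0 * row["dropped"] / row["received"] > drop_pct:
            problems.append(f"minute {row['minute']}: drop ratio above {drop_pct}%")
    # Guard 3: working set must not grow past the (assumed) limit.
    if rows and rows[-1]["ws_mb"] - rows[0]["ws_mb"] > max_ws_growth_mb:
        problems.append("working set grew more than the allowed delta")
    return problems
```

An empty list means the captured lines satisfy all three checks under these assumptions; the in-test guards remain the source of truth for the actual thresholds.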
Compressed-tag soak (when Galaxy isn't 50k tags)
A first-pass validation is fine with the override:
$env:OTOPCUA_SOAK_RUN = "1"
$env:OTOPCUA_SOAK_TAGS = "500" # whatever the dev Galaxy has
$env:OTOPCUA_SOAK_MINUTES = "60" # one hour is enough to surface plumbing bugs
$env:OTOPCUA_SOAK_DROP_PCT = "1.0"
This validates the plumbing (bounded channel, pump invariants, leak guard) but doesn't pin the 50k-tag scaling assertion. Defer the full 50k validation to a customer rig with that scale, or build a synthetic Galaxy with a script that imports 50k attributes onto a generated UDO (~2 hours of one-off work).
Troubleshooting
- MxGatewaySkipReason says "mxaccessgw not reachable": the gw isn't listening, or it's on a different port. Test-NetConnection localhost -Port 5120 is the quick check.
- MxGatewaySkipReason says "mxgateway backend boot failed: RpcException: Unauthenticated": API key mismatch. Verify the OTOPCUA_PARITY_GW_API_KEY env var matches the gw's configured key.
- LegacySkipReason says "Galaxy ZB SQL not reachable on localhost:1433": SQL Server isn't running, or its TCP listener is off. Check services.msc for the SQL Server (default) instance.
- LegacySkipReason says "Galaxy.Host EXE not built": the parity harness looks under src/.../bin/Debug/net48/. Build it once: dotnet build src\ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Host. Note the separately-published copy at C:\publish\OtOpcUaGalaxyHost\ is for the Windows service; the parity harness spawns its own subprocess.
- Both halves resolve but parity scenarios assert deltas: that's the expected outcome the rig exists to surface. Review each delta against docs/v2/Galaxy.ParityMatrix.md's "Accepted deltas" section to decide whether it's a real bug or a pre-accepted divergence.
After the rig is green
When the matrix is fully green or carries documented accepted-deltas, PR 7.2 (legacy project deletion) is unblocked. The only follow-up is to promote any newly-discovered accepted-delta to the matrix doc with the why so the matrix history stays auditable.
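The unblock rule can be written down as a tiny check over the matrix rows. This Python sketch treats a row as blocking only when it is red and has no documented accepted-delta. It assumes that yellow and n/a (deferred) rows do not block, which matches the 2026-04-30 verification (14 passed / 1 skipped / 0 failed) but is an interpretation rather than a documented rule; the function name pr_7_2_blockers and the status strings are hypothetical.

```python
def pr_7_2_blockers(rows, accepted_deltas=frozenset()):
    """Return the sorted names of matrix rows that still block PR 7.2.

    rows maps scenario name -> 'green' | 'yellow' | 'red' | 'n/a'.
    accepted_deltas holds scenario names whose delta is documented as accepted.
    """
    return sorted(
        name for name, status in rows.items()
        if status == "red" and name not in accepted_deltas
    )
```

An empty result corresponds to "fully green or carrying documented accepted-deltas".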