The parity matrix gate is the precondition for retiring the legacy Galaxy projects. The 24h × 50k soak run and 2-week production pilot were sketched in early planning as additional safety nets but aren't operationally applicable for this deployment: there's no separate production fleet to pilot against, and the soak harness's value is as ongoing diagnostic infrastructure (still shipped in PR 6.4) rather than a one-shot release gate. PR 7.2's only remaining precondition is the matrix being fully green or carrying documented accepted-deltas, verified 2026-04-30 on the dev rig: 14 passed / 1 skipped / 0 failed.

Affected:
- docs/v2/Galaxy.ParityMatrix.md "Outstanding deltas": flips to "PR 7.2 is unblocked"
- docs/v2/Galaxy.ParityRig.md "After the rig is green": drops the three-step soak+pilot flow, keeps only the matrix-doc bookkeeping follow-up
- lmx_mxgw_impl.md PR 7.2 "Depends on": replaces "fully soaked" with the matrix-green precondition + the verification date

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Galaxy backend parity matrix
This document tracks the scenario × result matrix that the
Driver.Galaxy.ParityTests suite drives against both Galaxy backends —
the legacy out-of-process Galaxy.Host (.NET 4.8 x86 + MXAccess COM,
fronted by GalaxyProxyDriver) and the new in-process mxgateway
backend (GalaxyDriver, .NET 10 + gRPC against mxaccessgw).
Maintained alongside Phase 5 (PR 5.W). The Phase 7 default flip (PR 7.1) consumes this matrix as its go/no-go gate — every row must be either green or carry an explicit accepted-delta justification.
Reading the matrix
- Status: green — the scenario asserts strict parity and passes (or skips cleanly when the rig isn't up).
- Status: yellow — soft pin only (count or shape parity, not value parity) — acceptable when the underlying COM/gRPC stacks have known divergences in raw payloads but the surface presented to the DriverNodeManager is equivalent.
- Status: red — divergence detected. Row carries a fix or a follow-up task ID.
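The go/no-go rule this matrix feeds (a row passes when it is green or carries an explicit accepted-delta justification) can be sketched in a few lines. The suite itself is .NET; Python is used here only for brevity, and `MatrixRow`, `gate_blockers`, and `gate_is_open` are hypothetical names that do not exist in the repository.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class MatrixRow:
    """One scenario row; a hypothetical shape for illustration only."""
    scenario: str
    status: str                           # "green", "yellow", or "red"
    accepted_delta: Optional[str] = None  # justification text, if the row has one


def gate_blockers(rows: List[MatrixRow]) -> List[str]:
    """Scenarios that are neither green nor covered by an accepted delta."""
    return [r.scenario for r in rows
            if r.status != "green" and not r.accepted_delta]


def gate_is_open(rows: List[MatrixRow]) -> bool:
    """The gate opens only when no row blocks it."""
    return not gate_blockers(rows)


if __name__ == "__main__":
    rows = [
        MatrixRow("Same variable set", "green"),
        MatrixRow("Event rate parity", "yellow",
                  accepted_delta="event-rate variance"),
        MatrixRow("Hypothetical regression", "red"),
    ]
    print(gate_blockers(rows))     # ['Hypothetical regression']
    print(gate_is_open(rows[:2]))  # True
```

Under this reading, a yellow row without a written justification blocks the gate just as a red row does.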
Scenarios
Last verified end-to-end on the dev parity rig: 2026-04-30
(legacy OtOpcUaGalaxyHost mxaccess backend; mxaccessgw v1.x at
http://localhost:5120; sandbox OtOpcUaParityTest_001 deployed in
the ZB galaxy; 13 passed / 1 skipped / 0 failed in 19 minutes).
| PR | Test class | Scenario | Status | Notes |
|---|---|---|---|---|
| 5.2 | BrowseAndReadParityTests | Same variable set | green | symmetric set diff on full-reference set, after [] array-suffix workaround in GalaxyDiscoverer |
| 5.2 | BrowseAndReadParityTests | Same DataType / SecurityClass / IsHistorized | green | per-attribute meta triple parity |
| 5.2 | BrowseAndReadParityTests | Same StatusCode-class on a sampled read | yellow | pins status class (Bad/Uncertain/Good); CLR type intentionally not asserted; see "Accepted deltas" #6 |
| 5.3 | SubscribeAndEventRateParityTests | Subscribe returns a handle on each backend | green | symmetric Unsubscribe cleanup |
| 5.3 | SubscribeAndEventRateParityTests | Event rate within ±50% over 3s | yellow | both backends fed by the same upstream MXAccess subscriptions; tolerance absorbs scheduler jitter |
| 5.4 | WriteByClassificationParityTests | FreeAccess / Operate write status-class parity | yellow | pins status class only; legacy flat-maps every failure to BadInternalError, mxgw distinguishes (BadCommunicationError, BadDeviceFailure, etc.); see "Accepted deltas" #7 |
| 5.4 | WriteByClassificationParityTests | Configure / Tune routes via secured-write | yellow | same status-class pin |
| 5.5 | AlarmTransitionParityTests | Same alarm-condition source-node-id set | green | one-way invariant on sub-attribute refs (legacy populated → mxgw matches; legacy null → mxgw free to populate per AlarmRefBuilder) |
| 5.5 | AlarmTransitionParityTests | IsAlarm-marked variable count parity | green | soft pin: count must match, doesn't have to be non-zero |
| 5.6 | HistoryReadParityTests | Same historized attribute set | green | what HistoryRouter consumes when routing to the Wonderware sidecar |
| 5.6 | HistoryReadParityTests | New mxgw GalaxyDriver does not implement IHistoryProvider | green | architectural pin from Phase 1 (PR 1.3) on the new path; legacy GalaxyProxyDriver keeps the interface for back-compat until PR 7.2; see "Accepted deltas" #5 |
| 5.7 | ReconnectParityTests | Reinitialize → both Healthy + reads succeed | green | recovery latency is not pinned (legacy: pipe + COM client; mxgw: re-Register gw session) |
| 5.7 | ReconnectParityTests | Health diverges only when one side recovers | yellow | soft pin until a toxiproxy-style fault injector lands |
| 5.8 | ScanStateProbeParityTests | Same per-platform host set | n/a (deferred) | dev rig is licensed for one $WinPlatform only; multi-platform parity deferred to a customer rig (PR 4.7's unit tests pin the state-decoder + member-tracking logic) |
| 5.8 | ScanStateProbeParityTests | Same HostState per overlapping platform | n/a (deferred) | same single-platform constraint |
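The yellow rows above rest on two soft pins: status-class parity and an event-rate tolerance. The sketch below shows one way to express both. It is an illustration in Python, not the suite's C# code; the function names are hypothetical, and reading "±50%" as "the mxgw count lies within 50% of the legacy count" is an assumption, as is treating two empty windows as parity. The severity-bit layout (top two bits of the 32-bit StatusCode) follows the OPC UA specification.

```python
def status_class(status_code: int) -> str:
    """Classify a 32-bit OPC UA StatusCode by its two severity bits (31..30)."""
    severity = (status_code >> 30) & 0b11
    return {0b00: "Good", 0b01: "Uncertain", 0b10: "Bad"}.get(severity, "Reserved")


def same_status_class(legacy_code: int, mxgw_code: int) -> bool:
    """Soft pin: the exact codes may differ as long as the class matches."""
    return status_class(legacy_code) == status_class(mxgw_code)


def event_rate_within_tolerance(legacy_count: int, mxgw_count: int,
                                tolerance: float = 0.5) -> bool:
    """Soft pin: mxgw count within +/- tolerance of the legacy count (assumed reading)."""
    if legacy_count == 0 and mxgw_count == 0:
        return True   # assumption: two silent windows count as parity
    if legacy_count == 0 or mxgw_count == 0:
        return False  # one side saw events, the other saw none
    ratio = mxgw_count / legacy_count
    return (1 - tolerance) <= ratio <= (1 + tolerance)


if __name__ == "__main__":
    # BadInternalError (0x80020000) vs BadCommunicationError (0x80050000)
    print(same_status_class(0x80020000, 0x80050000))  # True
    print(event_rate_within_tolerance(120, 95))       # True
    print(event_rate_within_tolerance(120, 40))       # False
```

Comparing classes rather than codes is what lets the write rows stay yellow instead of red while the two backends map failures differently.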
Accepted deltas
These are intentional differences between the two backends — the parity suite skips or tolerates them by design.
1. Transport-entry host name. The legacy backend's IHostConnectivityProbe surface includes a host entry named after the Galaxy.Host process identity; the mxgw backend uses the configured MxAccess.ClientName. The names differ, but both are correct for their respective sessions; the parity test compares only the platform-host subset.
2. Reconnect latency cadence. Legacy reconnect roundtrips an OS named pipe + an MxAccess COM client + a Galaxy.Host process restart if the host died. The mxgw reconnect re-Registers the gateway session over an existing gRPC channel. Sub-second vs multi-second recoveries are both correct for their own paths; only the eventual Healthy convergence is pinned.
3. Read-value drift. A read sampled twice on a live Galaxy can return different values legitimately. We pin StatusCode-class parity (Bad/Uncertain/Good); value equality is not pinned.
4. Event-rate variance. Both backends consume the same upstream MXAccess publish events but route them through different deserializers (LMXProxyServer COM events vs gRPC MxEvent protos). Scheduler jitter on either side can shift counts within a 3s window; we pin a ±50% ratio, not strict equality.
5. IHistoryProvider on the new path only. Phase 1 (PR 1.3) lifted history off the per-driver path onto the server-owned HistoryRouter for the new in-process GalaxyDriver. The legacy GalaxyProxyDriver still surfaces IHistoryProvider for back-compat with the legacy server bootstrap path; it's an accepted delta retired in PR 7.2 alongside the rest of the legacy projects. The pin we want to enforce is "the new path doesn't regress to per-driver history."
6. Read value-CLR-type. Legacy returns the raw VARIANT (e.g. Byte[]) for an attribute that hasn't received its first value cycle from MxAccess yet, while mxgw returns the typed value (Single, Int32, etc.). Once a real value is written or scanned, both converge. Pinning CLR-type equality across the uninitialized window adds noise without a real parity invariant; the StatusCode-class assertion already covers the "did the read succeed" question.
7. Write-failure StatusCode mapping. Legacy MxAccessGalaxyBackend.WriteValuesAsync flat-maps every failure to BadInternalError (0x80020000); mxgw GatewayGalaxyDataWriter.TranslateReply uses MxStatusProxy.RawDetectedBy to distinguish gw-layer faults (BadCommunicationError, 0x80050000) from MxAccess HRESULT faults (BadDeviceFailure, BadNotConnected, etc.). Both yield Bad-status; the parity invariant is the status class, not the exact code. Tighter mapping parity isn't worth investing in: the legacy mapping retires alongside GalaxyProxyDriver in PR 7.2.
8. Single-platform scope on the dev rig. Two ScanStateProbeParityTests scenarios are deferred to a customer rig with multiple deployed $WinPlatform instances; this dev box is licensed for one. PR 4.7's unit tests (PerPlatformProbeWatcherTests) pin the state-decoder + member-tracking logic at the seam level, so the runtime parity check becomes a customer-rig acceptance gate before that customer goes live, not a precondition for retiring the legacy projects on this dev box.
9. Workaround for the gw [] array-suffix bug. mxaccessgw/src/MxGateway.Server/Galaxy/GalaxyRepository.cs:173-175 appends [] to the full_tag_reference of array-typed attributes, which MxAccess COM IInstance.AddItem doesn't accept. The lmxopcua discoverer (GalaxyDiscoverer.StripArraySuffix) defensively strips the suffix. Tracked in mxaccessgw/requirements-array-suffix-fix.md; the workaround is removed when that gw fix lands.
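The array-suffix workaround in the last item amounts to dropping a trailing `[]` from a tag reference before it is handed to MxAccess. A minimal Python sketch of that idea follows; it is not the C# source of `GalaxyDiscoverer.StripArraySuffix`, the function name and the sample tag names are made up for illustration, and leaving indexed references such as `[3]` untouched is an assumption.

```python
ARRAY_SUFFIX = "[]"


def strip_array_suffix(full_tag_reference: str) -> str:
    """Remove one trailing '[]' if present; leave every other reference untouched."""
    if full_tag_reference.endswith(ARRAY_SUFFIX):
        return full_tag_reference[: -len(ARRAY_SUFFIX)]
    return full_tag_reference


if __name__ == "__main__":
    # Hypothetical tag names, for illustration only.
    print(strip_array_suffix("Tank_001.Levels[]"))   # Tank_001.Levels
    print(strip_array_suffix("Tank_001.Level"))      # Tank_001.Level
    print(strip_array_suffix("Tank_001.Levels[3]"))  # Tank_001.Levels[3]
```

Because the function is a no-op on references without the suffix, it stays harmless if it outlives the gateway fix for a while.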
Outstanding deltas
None as of 2026-04-30. Phase 7 (PR 7.1) flipped the default to
mxgw; PR 7.2 (legacy project deletion) is unblocked — the matrix
gate is satisfied and no further soak/pilot precondition applies.
Running the matrix
# Both backends must be reachable for any row to run; rows skip
# cleanly when their backend is unavailable.
dotnet test tests/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.ParityTests/
Environment overrides for the mxgw backend:
| Variable | Default | Purpose |
|---|---|---|
| OTOPCUA_PARITY_GW_ENDPOINT | http://localhost:5120 | mxaccessgw gRPC endpoint |
| OTOPCUA_PARITY_GW_API_KEY | parity-suite-key | API key handed to MxGatewayClient |
| OTOPCUA_PARITY_CLIENT_NAME | OtOpcUa-Parity | MxAccess.ClientName for the session |
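The override-with-default behaviour in the table can be pictured as a lookup that falls back to the documented default when a variable is unset. The Python sketch below only illustrates that pattern; the suite resolves these values in C#, and `resolve_parity_settings` is a hypothetical helper. Only the variable names and defaults come from the table.

```python
import os
from typing import Dict, Mapping, Optional

# Variable names and defaults as listed in the table above.
DEFAULTS: Dict[str, str] = {
    "OTOPCUA_PARITY_GW_ENDPOINT": "http://localhost:5120",
    "OTOPCUA_PARITY_GW_API_KEY": "parity-suite-key",
    "OTOPCUA_PARITY_CLIENT_NAME": "OtOpcUa-Parity",
}


def resolve_parity_settings(environ: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return each setting from the environment, or its default when unset."""
    env = os.environ if environ is None else environ
    return {name: env.get(name, default) for name, default in DEFAULTS.items()}


if __name__ == "__main__":
    # A hypothetical override of the endpoint only; the other two keep their defaults.
    settings = resolve_parity_settings({"OTOPCUA_PARITY_GW_ENDPOINT": "http://gw-host:5120"})
    for name, value in settings.items():
        print(f"{name}={value}")
```

In practice the variables are exported in the shell that launches `dotnet test`, so a run against a remote gateway needs only the endpoint (and usually the API key) changed.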
The legacy backend reads ZB SQL on localhost:1433 and spawns
OtOpcUa.Driver.Galaxy.Host.exe from
src/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Host/bin/Debug/net48/ — both
must exist for the legacy half to resolve.
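Before a long run it can help to confirm those two legacy prerequisites up front. The following is a rough preflight sketch in Python under stated assumptions: `legacy_prerequisites` is a hypothetical helper that is not part of the suite, and a successful TCP connect only shows that something is listening on the port, not that the ZB database is present.

```python
import socket
from pathlib import Path
from typing import List


def legacy_prerequisites(host_exe: Path, sql_host: str = "localhost",
                         sql_port: int = 1433, timeout: float = 1.0) -> List[str]:
    """Return human-readable problems; an empty list means both checks passed."""
    problems = []
    if not host_exe.is_file():
        problems.append(f"missing Galaxy.Host binary: {host_exe}")
    try:
        # Open and immediately close a TCP connection to the SQL port.
        with socket.create_connection((sql_host, sql_port), timeout=timeout):
            pass
    except OSError as exc:
        problems.append(f"cannot reach SQL at {sql_host}:{sql_port}: {exc}")
    return problems


if __name__ == "__main__":
    exe = Path("src/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Host/bin/Debug/net48/"
               "OtOpcUa.Driver.Galaxy.Host.exe")
    for problem in legacy_prerequisites(exe):
        print(problem)
```

An empty result is a necessary condition only; the legacy rows can still skip if the Galaxy itself is not deployed.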