PR 5.W — Galaxy.ParityMatrix.md

Tabular scenario × result map for the seven Phase 5 parity scenarios
(BrowseAndRead, Subscribe, Write, Alarm, History, Reconnect, ScanState).
Each row records the assertion strength (green strict, yellow soft) and
flags accepted-delta cases:

- Transport-entry host name divergence (legacy = Galaxy.Host process,
  mxgw = MxAccess.ClientName)
- Reconnect latency cadence — different paths, both correct for their
  own session shape
- Sampled-read value drift (we pin StatusCode + type, not value)
- Event-rate ±50% tolerance over a 3s window
- Per-driver IHistoryProvider absence (architectural pin from PR 1.3)

Phase 7 (PR 7.1) consumes this matrix as the default-flip gate.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Commit 78fe3e8a45 (parent 837172ab39) by Joseph Doherty, 2026-04-29 16:32:20 -04:00

# Galaxy backend parity matrix

This document tracks the scenario × result matrix that the
`Driver.Galaxy.ParityTests` suite drives against both Galaxy backends:
the legacy out-of-process **Galaxy.Host** (.NET 4.8 x86 + MXAccess COM,
fronted by `GalaxyProxyDriver`) and the new in-process **mxgateway**
backend (`GalaxyDriver`, .NET 10 + gRPC against `mxaccessgw`).

The matrix is maintained alongside Phase 5 (PR 5.W). The Phase 7
default flip (PR 7.1) consumes it as its go/no-go gate: every row must
be either green or carry an explicit *accepted-delta* justification.

## Reading the matrix
- **Status: green** — the scenario asserts strict parity and passes
(or skips cleanly when the rig isn't up).
- **Status: yellow** — soft pin only (count or shape parity, not value
parity). Acceptable when the underlying COM/gRPC stacks have known
divergences in raw payloads but the surface presented to the
DriverNodeManager is equivalent.
- **Status: red** — divergence detected. Row carries a fix or a
follow-up task ID.
## Scenarios
| PR | Test class | Scenario | Status | Notes |
|----|-----------|----------|--------|-------|
| 5.2 | `BrowseAndReadParityTests` | Same variable set | green | symmetric set diff on full-reference set |
| 5.2 | `BrowseAndReadParityTests` | Same DataType / SecurityClass / IsHistorized | green | per-attribute meta triple parity |
| 5.2 | `BrowseAndReadParityTests` | Same StatusCode + value-CLR-type on a sampled read | yellow | raw values legitimately drift between two reads on a live Galaxy; we pin StatusCode + type, not value equality |
| 5.3 | `SubscribeAndEventRateParityTests` | Subscribe returns a handle on each backend | green | symmetric Unsubscribe cleanup |
| 5.3 | `SubscribeAndEventRateParityTests` | Event rate within ±50% over 3s | yellow | both backends fed by the same upstream MXAccess subscriptions; tolerance absorbs scheduler jitter |
| 5.4 | `WriteByClassificationParityTests` | FreeAccess / Operate write StatusCode parity | green | both backends use plain Write |
| 5.4 | `WriteByClassificationParityTests` | Configure / Tune routes via secured-write | green | both backends pick up SecurityClassification from DiscoverAsync |
| 5.5 | `AlarmTransitionParityTests` | Same alarm-condition source-node-id set | green | + per-condition SourceName / InitialSeverity / InAlarmRef / DescAttrNameRef |
| 5.5 | `AlarmTransitionParityTests` | IsAlarm-marked variable count parity | green | the two counts must match exactly; the shared count is not required to be non-zero |
| 5.6 | `HistoryReadParityTests` | Same historized attribute set | green | what HistoryRouter consumes when routing to the Wonderware sidecar |
| 5.6 | `HistoryReadParityTests` | Neither backend implements `IHistoryProvider` | green | architectural pin from Phase 1 (PR 1.3) |
| 5.7 | `ReconnectParityTests` | Reinitialize → both Healthy + reads succeed | green | recovery latency is *not* pinned (legacy: pipe + COM client; mxgw: re-Register gw session) |
| 5.7 | `ReconnectParityTests` | Health diverges only when one side recovers | yellow | soft pin until a toxiproxy-style fault injector lands |
| 5.8 | `ScanStateProbeParityTests` | Same per-platform host set | green | transport-entry names differ by design (legacy = Galaxy.Host process; mxgw = `MxAccess.ClientName`) and are excluded |
| 5.8 | `ScanStateProbeParityTests` | Same `HostState` per overlapping platform | green | drives Discover, waits 1.5s for the probe-watcher push, then snapshots both |
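Several of the green rows above reduce to an empty symmetric set difference over the identifiers each backend exposes. A minimal sketch of that assertion shape (Python purely for illustration; the actual suite is C#/xUnit, and the node ids below are hypothetical):

```python
# Illustration of the "same variable set" pin from the table above: parity
# holds only when the symmetric difference between the two browse results is
# empty, and the failure message names exactly what each side is missing.
def assert_same_variable_set(legacy_ids, mxgw_ids):
    legacy, mxgw = set(legacy_ids), set(mxgw_ids)
    only_legacy = sorted(legacy - mxgw)
    only_mxgw = sorted(mxgw - legacy)
    if only_legacy or only_mxgw:
        raise AssertionError(
            f"browse parity broken: only-legacy={only_legacy}, "
            f"only-mxgw={only_mxgw}"
        )

# Hypothetical node ids; ordering doesn't matter, only set membership does.
assert_same_variable_set(
    ["Pump01.PV", "Pump01.SP", "Tank02.Level"],
    ["Tank02.Level", "Pump01.PV", "Pump01.SP"],
)
```

The same shape covers the platform-host-set and alarm-source-node-id rows: compare sets, not ordered lists, and report both directions of the difference on failure.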
## Accepted deltas
These are intentional differences between the two backends — the parity
suite skips or tolerates them by design.
1. **Transport-entry host name.** The legacy backend's
`IHostConnectivityProbe` surface includes a host entry named after
the Galaxy.Host process identity; the mxgw backend uses the
configured `MxAccess.ClientName`. The names differ, but both are
correct for their respective sessions — the parity test compares
only the platform-host subset.
2. **Reconnect latency cadence.** Legacy reconnect roundtrips an OS
named pipe + an MxAccess COM client + a Galaxy.Host process restart
if the host died. The mxgw reconnect re-Registers the gateway session
over an existing gRPC channel. Sub-second vs multi-second recoveries
are both correct for their own paths; only the eventual `Healthy`
convergence is pinned.
3. **Read-value drift.** A read sampled twice on a live Galaxy can
return different values legitimately. We pin `StatusCode` and
value-CLR-type equality, not value equality. Driving an explicit
write-then-read pin requires the parity rig to own a writable
sandbox attribute — out of scope for the current suite.
4. **Event-rate variance.** Both backends consume the same upstream
MXAccess publish events but route them through different deserializers
(LMXProxyServer COM events vs gRPC `MxEvent` protos). Scheduler
jitter on either side can shift counts within a 3s window; we pin a
±50% ratio, not strict equality.
5. **Per-driver `IHistoryProvider` is gone.** Phase 1 (PR 1.3) lifted
history off the per-driver path onto the server-owned
`HistoryRouter`. Both Galaxy backends correctly *do not* surface
`IHistoryProvider` — the absence is itself a parity assertion.
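
Deltas 3 and 4 pin tolerances rather than strict equality. A minimal sketch of both assertion shapes, in Python purely for illustration (the real suite is C#; the status codes, values, and counts below are made up):

```python
# Delta 3: pin the StatusCode and the runtime type of the value, never the
# value itself -- two live reads may legitimately return different values.
def assert_read_parity(legacy_read, mxgw_read):
    # each read is a (status_code, value) pair
    assert legacy_read[0] == mxgw_read[0], "StatusCode diverged"
    assert type(legacy_read[1]) is type(mxgw_read[1]), "value type diverged"

# Delta 4: event counts over the same window must agree within a +/-50% ratio,
# absorbing scheduler jitter on either deserialization path.
def assert_event_rate_parity(legacy_count, mxgw_count, tolerance=0.5):
    assert legacy_count > 0 and mxgw_count > 0, "no events observed"
    ratio = mxgw_count / legacy_count
    assert 1 - tolerance <= ratio <= 1 + tolerance, \
        f"event-rate ratio {ratio:.2f} outside +/-50%"

# Made-up samples: same Good status with drifted float values, 20 vs 26 events.
assert_read_parity((0x00, 41.8), (0x00, 42.3))
assert_event_rate_parity(20, 26)
```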
## Outstanding deltas
None as of PR 5.W. Phase 7 (PR 7.1) flips the default to `mxgw` once
this matrix is fully green on the dev parity rig.
## Running the matrix
```bash
# Both backends must be reachable for any row to run; rows skip
# cleanly when their backend is unavailable.
dotnet test tests/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.ParityTests/
```
Environment overrides for the mxgw backend:
| Variable | Default | Purpose |
|----------|---------|---------|
| `OTOPCUA_PARITY_GW_ENDPOINT` | `http://localhost:5120` | mxaccessgw gRPC endpoint |
| `OTOPCUA_PARITY_GW_API_KEY` | `parity-suite-key` | API key handed to `MxGatewayClient` |
| `OTOPCUA_PARITY_CLIENT_NAME` | `OtOpcUa-Parity` | `MxAccess.ClientName` for the session |
The legacy backend reads the ZB SQL database on `localhost:1433` and spawns
`OtOpcUa.Driver.Galaxy.Host.exe` from
`src/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Host/bin/Debug/net48/` — both
must exist for the legacy half to resolve.