diff --git a/docs/v2/Galaxy.ParityMatrix.md b/docs/v2/Galaxy.ParityMatrix.md
new file mode 100644
index 0000000..c6d6de6
--- /dev/null
+++ b/docs/v2/Galaxy.ParityMatrix.md
@@ -0,0 +1,104 @@
+# Galaxy backend parity matrix
+
+This document tracks the scenario × result matrix that the
+`Driver.Galaxy.ParityTests` suite drives against both Galaxy backends:
+the legacy out-of-process **Galaxy.Host** (.NET 4.8 x86 + MXAccess COM,
+fronted by `GalaxyProxyDriver`) and the new in-process **mxgateway**
+backend (`GalaxyDriver`, .NET 10 + gRPC against `mxaccessgw`).
+
+Maintained alongside Phase 5 (PR 5.W). The Phase 7 default flip
+(PR 7.1) consumes this matrix as its go/no-go gate: every row must
+either be green or carry an explicit *accepted-delta* justification.
+
+## Reading the matrix
+
+- **Status: green.** The scenario asserts strict parity and passes
+  (or skips cleanly when the rig isn't up).
+- **Status: yellow.** Soft pin only (count or shape parity, not value
+  parity). Acceptable when the underlying COM/gRPC stacks have known
+  divergences in raw payloads but the surface presented to the
+  DriverNodeManager is equivalent.
+- **Status: red.** Divergence detected. The row carries a fix or a
+  follow-up task ID.
+
+## Scenarios
+
+| PR | Test class | Scenario | Status | Notes |
+|----|-----------|----------|--------|-------|
+| 5.2 | `BrowseAndReadParityTests` | Same variable set | green | symmetric set diff on full-reference set |
+| 5.2 | `BrowseAndReadParityTests` | Same DataType / SecurityClass / IsHistorized | green | per-attribute meta triple parity |
+| 5.2 | `BrowseAndReadParityTests` | Same StatusCode + value-CLR-type on a sampled read | yellow | raw values legitimately drift between two reads on a live Galaxy; we pin StatusCode + type, not value equality |
+| 5.3 | `SubscribeAndEventRateParityTests` | Subscribe returns a handle on each backend | green | symmetric Unsubscribe cleanup |
+| 5.3 | `SubscribeAndEventRateParityTests` | Event rate within ±50% over 3s | yellow | both backends fed by the same upstream MXAccess subscriptions; tolerance absorbs scheduler jitter |
+| 5.4 | `WriteByClassificationParityTests` | FreeAccess / Operate write StatusCode parity | green | both backends use plain Write |
+| 5.4 | `WriteByClassificationParityTests` | Configure / Tune routes via secured-write | green | both backends pick up SecurityClassification from DiscoverAsync |
+| 5.5 | `AlarmTransitionParityTests` | Same alarm-condition source-node-id set | green | plus per-condition SourceName / InitialSeverity / InAlarmRef / DescAttrNameRef |
+| 5.5 | `AlarmTransitionParityTests` | IsAlarm-marked variable count parity | green | soft pin: count must match but need not be non-zero |
+| 5.6 | `HistoryReadParityTests` | Same historized attribute set | green | what HistoryRouter consumes when routing to the Wonderware sidecar |
+| 5.6 | `HistoryReadParityTests` | Neither backend implements `IHistoryProvider` | green | architectural pin from Phase 1 (PR 1.3) |
+| 5.7 | `ReconnectParityTests` | Reinitialize → both Healthy + reads succeed | green | recovery latency is *not* pinned (legacy: pipe + COM client; mxgw: re-Register gw session) |
+| 5.7 | `ReconnectParityTests` | Health diverges only when one side recovers | yellow | soft pin until a toxiproxy-style fault injector lands |
+| 5.8 | `ScanStateProbeParityTests` | Same per-platform host set | green | transport-entry names differ by design (legacy = Galaxy.Host process; mxgw = `MxAccess.ClientName`) and are excluded |
+| 5.8 | `ScanStateProbeParityTests` | Same `HostState` per overlapping platform | green | drives Discover, waits 1.5s for the probe-watcher push, then snapshots both |
+
+## Accepted deltas
+
+These are intentional differences between the two backends; the parity
+suite skips or tolerates them by design.
+
+1. **Transport-entry host name.** The legacy backend's
+   `IHostConnectivityProbe` surface includes a host entry named after
+   the Galaxy.Host process identity; the mxgw backend uses the
+   configured `MxAccess.ClientName`. The names differ, but both are
+   correct for their respective sessions, so the parity test compares
+   only the platform-host subset.
+
+2. **Reconnect latency cadence.** Legacy reconnect round-trips an OS
+   named pipe + an MxAccess COM client + a Galaxy.Host process restart
+   if the host died. The mxgw reconnect re-Registers the gateway session
+   over an existing gRPC channel. Sub-second vs multi-second recoveries
+   are both correct for their own paths; only the eventual `Healthy`
+   convergence is pinned.
+
+3. **Read-value drift.** A read sampled twice on a live Galaxy can
+   return different values legitimately. We pin `StatusCode` and
+   value-CLR-type equality, not value equality. Driving an explicit
+   write-then-read pin requires the parity rig to own a writable
+   sandbox attribute, which is out of scope for the current suite.
+
+4. **Event-rate variance.** Both backends consume the same upstream
+   MXAccess publish events but route them through different deserializers
+   (LMXProxyServer COM events vs gRPC `MxEvent` protos). Scheduler
+   jitter on either side can shift counts within a 3s window; we pin a
+   ±50% ratio, not strict equality.
+
+5. **Per-driver `IHistoryProvider` is gone.** Phase 1 (PR 1.3) lifted
+   history off the per-driver path onto the server-owned
+   `HistoryRouter`. Both Galaxy backends correctly *do not* surface
+   `IHistoryProvider`; the absence is itself a parity assertion.
+
+## Outstanding deltas
+
+None as of PR 5.W. Phase 7 (PR 7.1) flips the default to `mxgw` once
+this matrix is fully green on the dev parity rig.
+
+## Running the matrix
+
+```bash
+# Both backends must be reachable for any row to run; rows skip
+# cleanly when their backend is unavailable.
+dotnet test tests/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.ParityTests/
+```
+
+Environment overrides for the mxgw backend:
+
+| Variable | Default | Purpose |
+|----------|---------|---------|
+| `OTOPCUA_PARITY_GW_ENDPOINT` | `http://localhost:5120` | mxaccessgw gRPC endpoint |
+| `OTOPCUA_PARITY_GW_API_KEY` | `parity-suite-key` | API key handed to `MxGatewayClient` |
+| `OTOPCUA_PARITY_CLIENT_NAME` | `OtOpcUa-Parity` | `MxAccess.ClientName` for the session |
+
+The legacy backend reads ZB SQL on `localhost:1433` and spawns
+`OtOpcUa.Driver.Galaxy.Host.exe` from
+`src/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Host/bin/Debug/net48/`; both
+must exist for the legacy half to resolve.
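
The environment overrides in the table above could be checked before a run with something like the sketch below. It assumes the suite falls back to the documented default whenever a variable is unset. `parity_env` is a hypothetical helper written for illustration, not part of the suite, and the host name in the commented-out override is a placeholder.

```bash
#!/usr/bin/env bash
# Sketch only: parity_env is a hypothetical helper, not part of the suite.
# It prints the values the mxgw half of the suite is expected to see,
# substituting the documented default for any variable that is unset
# or empty.
parity_env() {
  printf 'endpoint=%s\n' "${OTOPCUA_PARITY_GW_ENDPOINT:-http://localhost:5120}"
  printf 'api_key=%s\n' "${OTOPCUA_PARITY_GW_API_KEY:-parity-suite-key}"
  printf 'client_name=%s\n' "${OTOPCUA_PARITY_CLIENT_NAME:-OtOpcUa-Parity}"
}

parity_env

# Override for a single invocation only (the host name is a placeholder):
# OTOPCUA_PARITY_GW_ENDPOINT=http://gw-dev.example:5120 \
#   dotnet test tests/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.ParityTests/
```

Prefixing the assignment to the command, rather than using `export`, keeps the override scoped to that one `dotnet test` run, so later runs in the same shell fall back to the defaults.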