lmxopcua/docs/v2/Galaxy.ParityMatrix.md
Joseph Doherty 006af51768 docs: post-PR-7.2 cleanup — audit + three-track scrub
Audit (three parallel agent passes) found 43 markdown files carrying
stale references to the deleted Galaxy.Host/Proxy/Shared projects
after the v2-mxgw merge. This commit lands the prioritized fixes.

Track 1 — high-traffic in-place rewrites (3 files, ~454 lines deleted)
- README.md (202 → 91 lines): drops .NET 4.8 / x86 / TopShelf install
  text; leads with the multi-driver .NET 10 server identity and points
  at scripts/install/Install-Services.ps1 and the parity rig.
- docs/v2/driver-specs.md §1 Galaxy (~289 → ~66 lines): replaces the
  Tier-C out-of-process spec with a Tier-A in-process description
  matching the current GalaxyDriver code, with the four-section
  GalaxyDriverOptions JSON shape pulled verbatim from
  Config/GalaxyDriverOptions.cs.
- docs/drivers/Galaxy.md (211 → 92 lines): full rewrite around the
  current Browse/Runtime/Health/Config sub-folders.

Track 2 — historical banners (5 files)
- lmx_mxgw.md, lmx_mxgw_impl.md, lmx_backend.md,
  docs/v2/Galaxy.ParityMatrix.md,
  docs/v2/implementation/phase-2-galaxy-out-of-process.md each get a
  " Completed 2026-04-30 — historical record" banner block. lmx_mxgw.md
  also fixes two dead links (`docs/Galaxy.Driver.md` and
  `docs/v2/Galaxy.Driver.md`) → `docs/drivers/Galaxy.md`.

Track 3 — v1 archive sweep (10 git mv + 1 new index + 2 in-place scrubs)
- Moved 10 v1 docs under docs/v1/ preserving subpath structure:
  AlarmTracking, Configuration, DataTypeMapping, HistoricalDataAccess,
  Subscriptions (top-level); drivers/Galaxy-Repository,
  drivers/Galaxy-Test-Fixture; reqs/GalaxyRepositoryReqs,
  reqs/MxAccessClientReqs, reqs/ServiceHostReqs.
- New docs/v1/README.md is the shared archive banner + per-file table.
- docs/README.md repointed to the v1 paths and updated to reflect the
  v2 two-process deploy shape (Server + Admin + optional
  OtOpcUaWonderwareHistorian).
- docs/v2/Galaxy.ParityRig.md got a historical banner + four inline
  scrubs marking the OtOpcUaGalaxyHost service / Driver.Galaxy.Host
  EXE / Driver.Galaxy.ParityTests project as deleted-in-PR-7.2.

The repo's live-reading surface (README + CLAUDE.md + docs/v2/) now
describes only the post-PR-7.2 architecture. v1 docs are preserved as
a labelled archive under docs/v1/.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 08:59:59 -04:00


Completed 2026-04-30 — historical record of the parity-rig validation gate for PR 7.2.

The matrix below was the go/no-go gate for retiring the legacy Galaxy.Host backend (PR 7.2). Final run on the dev rig 2026-04-30 returned 14 passed / 1 skipped / 0 failed; PR 7.2 (commit fe91d42) deleted the legacy projects + service the next day. The "Running the matrix" section is preserved for historical reproducibility, but the test project it references (Driver.Galaxy.ParityTests) was deleted alongside the legacy backend, so this matrix is no longer runnable. Current Galaxy testing flows through the gateway's own test suite (sibling mxaccessgw repo).

Galaxy backend parity matrix

This document tracks the scenario × result matrix that the Driver.Galaxy.ParityTests suite drives against both Galaxy backends — the legacy out-of-process Galaxy.Host (.NET 4.8 x86 + MXAccess COM, fronted by GalaxyProxyDriver) and the new in-process mxgateway backend (GalaxyDriver, .NET 10 + gRPC against mxaccessgw).

Maintained alongside Phase 5 (PR 5.W). The Phase 7 default flip (PR 7.1) consumes this matrix as its go/no-go gate — every row must be either green or carry an explicit accepted-delta justification.

Reading the matrix

  • Status: green — the scenario asserts strict parity and passes (or skips cleanly when the rig isn't up).
  • Status: yellow — soft pin only (count or shape parity, not value parity) — acceptable when the underlying COM/gRPC stacks have known divergences in raw payloads but the surface presented to the DriverNodeManager is equivalent.
  • Status: red — divergence detected. Row carries a fix or a follow-up task ID.
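The gate rule this matrix feeds (every row green, or carrying an explicit accepted-delta justification) can be sketched in a few lines. This is a minimal illustration only: the `Row` type, its field names, and the sample rows are hypothetical and are not taken from the repository.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Row:
    """One matrix row (hypothetical shape, for illustration only)."""
    scenario: str
    status: str                           # "green", "yellow", "red", or "n/a"
    accepted_delta: Optional[int] = None  # number of the justifying accepted delta


def gate_passes(rows: Iterable[Row]) -> bool:
    """Go only if every row is green or cites an accepted delta."""
    return all(r.status == "green" or r.accepted_delta is not None for r in rows)


rows = [
    Row("Same variable set", "green"),
    Row("Same StatusCode-class on a sampled read", "yellow", accepted_delta=6),
]
print(gate_passes(rows))  # go/no-go decision for the sample rows
```

Under this reading, a yellow or deferred row blocks the gate unless it points at a documented delta, which is why the yellow rows in the table cite entries from the "Accepted deltas" list.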

Scenarios

Last verified end-to-end on the dev parity rig: 2026-04-30 (legacy OtOpcUaGalaxyHost mxaccess backend; mxaccessgw v1.x at http://localhost:5120; sandbox OtOpcUaParityTest_001 deployed in the ZB galaxy; 13 passed / 1 skipped / 0 failed in 19 minutes).

| PR | Test class | Scenario | Status | Notes |
|---|---|---|---|---|
| 5.2 | BrowseAndReadParityTests | Same variable set | green | symmetric set diff on full-reference set, after [] array-suffix workaround in GalaxyDiscoverer |
| 5.2 | BrowseAndReadParityTests | Same DataType / SecurityClass / IsHistorized | green | per-attribute meta triple parity |
| 5.2 | BrowseAndReadParityTests | Same StatusCode-class on a sampled read | yellow | pins status class (Bad/Uncertain/Good); CLR type intentionally not asserted; see "Accepted deltas" #6 |
| 5.3 | SubscribeAndEventRateParityTests | Subscribe returns a handle on each backend | green | symmetric Unsubscribe cleanup |
| 5.3 | SubscribeAndEventRateParityTests | Event rate within ±50% over 3s | yellow | both backends fed by the same upstream MXAccess subscriptions; tolerance absorbs scheduler jitter |
| 5.4 | WriteByClassificationParityTests | FreeAccess / Operate write status-class parity | yellow | pins status class only; legacy flat-maps every failure to BadInternalError, mxgw distinguishes (BadCommunicationError, BadDeviceFailure, etc.); see "Accepted deltas" #7 |
| 5.4 | WriteByClassificationParityTests | Configure / Tune routes via secured-write | yellow | same status-class pin |
| 5.5 | AlarmTransitionParityTests | Same alarm-condition source-node-id set | green | one-way invariant on sub-attribute refs (legacy populated → mxgw matches; legacy null → mxgw free to populate per AlarmRefBuilder) |
| 5.5 | AlarmTransitionParityTests | IsAlarm-marked variable count parity | green | soft pin: count must match, doesn't have to be non-zero |
| 5.6 | HistoryReadParityTests | Same historized attribute set | green | what HistoryRouter consumes when routing to the Wonderware sidecar |
| 5.6 | HistoryReadParityTests | New mxgw GalaxyDriver does not implement IHistoryProvider | green | architectural pin from Phase 1 (PR 1.3) on the new path; legacy GalaxyProxyDriver keeps the interface for back-compat until PR 7.2; see "Accepted deltas" #5 |
| 5.7 | ReconnectParityTests | Reinitialize → both Healthy + reads succeed | green | recovery latency is not pinned (legacy: pipe + COM client; mxgw: re-Register gw session) |
| 5.7 | ReconnectParityTests | Health diverges only when one side recovers | yellow | soft pin until a toxiproxy-style fault injector lands |
| 5.8 | ScanStateProbeParityTests | Same per-platform host set | n/a (deferred) | dev rig is licensed for one $WinPlatform only; multi-platform parity deferred to a customer rig (PR 4.7's unit tests pin the state-decoder + member-tracking logic) |
| 5.8 | ScanStateProbeParityTests | Same HostState per overlapping platform | n/a (deferred) | same single-platform constraint |
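
Two of the yellow pins in the scenario rows, status-class parity and the ±50% event-rate tolerance, can be sketched as below. This is an illustration, not the suite's code: the function names are hypothetical, and treating the legacy count as the baseline for the tolerance is an assumption (the original suite may compare the two counts symmetrically). The severity-bit layout is the standard OPC UA StatusCode encoding.

```python
def status_class(code: int) -> str:
    """Classify an OPC UA StatusCode by its severity bits (bits 31-30)."""
    severity = (code >> 30) & 0b11
    return {0b00: "Good", 0b01: "Uncertain", 0b10: "Bad"}.get(severity, "Reserved")


def rates_within_tolerance(legacy_count: int, mxgw_count: int,
                           tolerance: float = 0.5) -> bool:
    """True when mxgw_count is within +/- tolerance of legacy_count.

    Assumption: legacy is the baseline. Written without division so a
    zero baseline only matches a zero count.
    """
    return abs(mxgw_count - legacy_count) <= tolerance * legacy_count


# Different Bad codes still agree at the class level, which is all the pin checks.
bad_internal_error = 0x80020000
bad_communication_error = 0x80050000
print(status_class(bad_internal_error) == status_class(bad_communication_error))
print(rates_within_tolerance(legacy_count=100, mxgw_count=130))
```

Pinning the class rather than the exact code is what lets the write scenarios pass even though the two backends map failures to different Bad codes.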

Accepted deltas

These are intentional differences between the two backends — the parity suite skips or tolerates them by design.

  1. Transport-entry host name. The legacy backend's IHostConnectivityProbe surface includes a host entry named after the Galaxy.Host process identity; the mxgw backend uses the configured MxAccess.ClientName. The names differ, but both are correct for their respective sessions — the parity test compares only the platform-host subset.

  2. Reconnect latency cadence. Legacy reconnect round-trips an OS named pipe + an MxAccess COM client + a Galaxy.Host process restart if the host died. The mxgw reconnect re-Registers the gateway session over an existing gRPC channel. Sub-second vs multi-second recoveries are both correct for their own paths; only the eventual Healthy convergence is pinned.

  3. Read-value drift. A read sampled twice on a live Galaxy can return different values legitimately. We pin StatusCode-class parity (Bad/Uncertain/Good); value equality is not pinned.

  4. Event-rate variance. Both backends consume the same upstream MXAccess publish events but route them through different deserializers (LMXProxyServer COM events vs gRPC MxEvent protos). Scheduler jitter on either side can shift counts within a 3s window; we pin a ±50% ratio, not strict equality.

  5. IHistoryProvider on the legacy path only. Phase 1 (PR 1.3) lifted history off the per-driver path onto the server-owned HistoryRouter for the new in-process GalaxyDriver. The legacy GalaxyProxyDriver still surfaces IHistoryProvider for back-compat with the legacy server bootstrap path; it's an accepted delta retired in PR 7.2 alongside the rest of the legacy projects. The pin we want to enforce is "the new path doesn't regress to per-driver history."

  6. Read value-CLR-type. Legacy returns the raw VARIANT (e.g. Byte[]) for an attribute that hasn't received its first value cycle from MxAccess yet, while mxgw returns the typed value (Single, Int32, etc.). Once a real value is written or scanned, both converge. Pinning CLR-type equality across the uninitialized window adds noise without a real parity invariant — the StatusCode-class assertion already covers the "did the read succeed" question.

  7. Write-failure StatusCode mapping. Legacy MxAccessGalaxyBackend.WriteValuesAsync flat-maps every failure to BadInternalError (0x80020000); mxgw GatewayGalaxyDataWriter.TranslateReply uses MxStatusProxy.RawDetectedBy to distinguish gw-layer faults (BadCommunicationError, 0x80050000) from MxAccess HRESULT faults (BadDeviceFailure, BadNotConnected, etc.). Both yield Bad-status — the parity invariant is the status class, not the exact code. Tighter mapping parity isn't worth investing in: the legacy mapping retires alongside GalaxyProxyDriver in PR 7.2.

  8. Single-platform scope on the dev rig. Two ScanStateProbeParityTests scenarios are deferred to a customer rig with multiple deployed $WinPlatform instances; this dev box is licensed for one. PR 4.7's unit tests (PerPlatformProbeWatcherTests) pin the state-decoder + member-tracking logic at the seam level, so the runtime parity check becomes a customer-rig acceptance gate before that customer goes live, not a precondition for retiring the legacy projects on this dev box.

  9. Workaround for the gw [] array-suffix bug. mxaccessgw/src/MxGateway.Server/Galaxy/GalaxyRepository.cs:173-175 appends [] to the full_tag_reference of array-typed attributes, which MxAccess COM IInstance.AddItem doesn't accept. The lmxopcua discoverer (GalaxyDiscoverer.StripArraySuffix) defensively strips the suffix. Tracked in mxaccessgw/requirements-array-suffix-fix.md; the workaround is removed when that gw fix lands.
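
The array-suffix workaround in item 9, combined with the symmetric set diff used by the "Same variable set" scenario, can be sketched as follows. This is not the C# `GalaxyDiscoverer.StripArraySuffix` implementation; the Python function names and the sample tag references are hypothetical, and the sketch assumes the legacy side reports references without the suffix.

```python
from typing import Iterable, Set


def strip_array_suffix(full_tag_reference: str) -> str:
    """Drop one trailing "[]" from a reference, leaving other names untouched."""
    if full_tag_reference.endswith("[]"):
        return full_tag_reference[:-2]
    return full_tag_reference


def reference_set_diff(legacy_refs: Iterable[str], mxgw_refs: Iterable[str]) -> Set[str]:
    """Symmetric difference of the two reference sets after normalising mxgw refs."""
    legacy = set(legacy_refs)
    mxgw = {strip_array_suffix(ref) for ref in mxgw_refs}
    return legacy ^ mxgw


# Hypothetical references: the gateway appends "[]" to the array-typed attribute.
legacy = ["Tank_001.Level", "Tank_001.Samples"]
mxgw = ["Tank_001.Level", "Tank_001.Samples[]"]
print(reference_set_diff(legacy, mxgw))  # empty set means the variable sets match
```

An empty result is the parity condition; any reference left in the set exists on only one backend.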

Outstanding deltas

None as of 2026-04-30. Phase 7 (PR 7.1) flipped the default to mxgw; PR 7.2 (legacy project deletion) is unblocked — the matrix gate is satisfied and no further soak/pilot precondition applies.

Running the matrix

```shell
# Both backends must be reachable for any row to run; rows skip
# cleanly when their backend is unavailable.
dotnet test tests/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.ParityTests/
```

Environment overrides for the mxgw backend:

| Variable | Default | Purpose |
|---|---|---|
| OTOPCUA_PARITY_GW_ENDPOINT | http://localhost:5120 | mxaccessgw gRPC endpoint |
| OTOPCUA_PARITY_GW_API_KEY | parity-suite-key | API key handed to MxGatewayClient |
| OTOPCUA_PARITY_CLIENT_NAME | OtOpcUa-Parity | MxAccess.ClientName for the session |
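
The override behaviour listed above amounts to "use the environment variable if set, otherwise the default". A minimal sketch follows, using the variable names and defaults from the list; the `resolve_parity_settings` helper is hypothetical, and treating an empty string as a valid override is an assumption rather than documented behaviour.

```python
import os
from typing import Dict, Mapping, Optional

# Names and defaults as listed for the mxgw backend.
DEFAULTS: Dict[str, str] = {
    "OTOPCUA_PARITY_GW_ENDPOINT": "http://localhost:5120",
    "OTOPCUA_PARITY_GW_API_KEY": "parity-suite-key",
    "OTOPCUA_PARITY_CLIENT_NAME": "OtOpcUa-Parity",
}


def resolve_parity_settings(environ: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return each setting from the environment, falling back to its default."""
    env = os.environ if environ is None else environ
    return {name: env.get(name, default) for name, default in DEFAULTS.items()}


# Example: override only the endpoint; the other two keep their defaults.
settings = resolve_parity_settings({"OTOPCUA_PARITY_GW_ENDPOINT": "http://gw-host:5120"})
print(settings["OTOPCUA_PARITY_GW_ENDPOINT"])
print(settings["OTOPCUA_PARITY_CLIENT_NAME"])
```

Passing a mapping instead of reading `os.environ` directly keeps the helper easy to exercise without touching the real process environment.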

The legacy backend reads ZB SQL on localhost:1433 and spawns OtOpcUa.Driver.Galaxy.Host.exe from src/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Host/bin/Debug/net48/ — both must exist for the legacy half to resolve.