lmxopcua/docs/v2/Galaxy.ParityRig.md
Joseph Doherty 006af51768 docs: post-PR-7.2 cleanup — audit + three-track scrub
Audit (three parallel agent passes) found 43 markdown files carrying
stale references to the deleted Galaxy.Host/Proxy/Shared projects
after the v2-mxgw merge. This commit lands the prioritized fixes.

Track 1 — high-traffic in-place rewrites (3 files, ~454 lines deleted)
- README.md (202 → 91 lines): drops .NET 4.8 / x86 / TopShelf install
  text; leads with the multi-driver .NET 10 server identity and points
  at scripts/install/Install-Services.ps1 and the parity rig.
- docs/v2/driver-specs.md §1 Galaxy (~289 → ~66 lines): replaces the
  Tier-C out-of-process spec with a Tier-A in-process description
  matching the current GalaxyDriver code, with the four-section
  GalaxyDriverOptions JSON shape pulled verbatim from
  Config/GalaxyDriverOptions.cs.
- docs/drivers/Galaxy.md (211 → 92 lines): full rewrite around the
  current Browse/Runtime/Health/Config sub-folders.

Track 2 — historical banners (5 files)
- lmx_mxgw.md, lmx_mxgw_impl.md, lmx_backend.md,
  docs/v2/Galaxy.ParityMatrix.md,
  docs/v2/implementation/phase-2-galaxy-out-of-process.md each get a
  "Completed 2026-04-30 — historical record" banner block. lmx_mxgw.md
  also fixes two dead links (`docs/Galaxy.Driver.md` and
  `docs/v2/Galaxy.Driver.md`) → `docs/drivers/Galaxy.md`.

Track 3 — v1 archive sweep (10 git mv + 1 new index + 2 in-place scrubs)
- Moved 10 v1 docs under docs/v1/ preserving subpath structure:
  AlarmTracking, Configuration, DataTypeMapping, HistoricalDataAccess,
  Subscriptions (top-level); drivers/Galaxy-Repository,
  drivers/Galaxy-Test-Fixture; reqs/GalaxyRepositoryReqs,
  reqs/MxAccessClientReqs, reqs/ServiceHostReqs.
- New docs/v1/README.md is the shared archive banner + per-file table.
- docs/README.md repointed to the v1 paths and updated to reflect the
  v2 two-process deploy shape (Server + Admin + optional
  OtOpcUaWonderwareHistorian).
- docs/v2/Galaxy.ParityRig.md got a historical banner + four inline
  scrubs marking the OtOpcUaGalaxyHost service / Driver.Galaxy.Host
  EXE / Driver.Galaxy.ParityTests project as deleted-in-PR-7.2.

The repo's live-reading surface (README + CLAUDE.md + docs/v2/) now
describes only the post-PR-7.2 architecture. v1 docs are preserved as
a labelled archive under docs/v1/.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 08:59:59 -04:00

Galaxy parity rig — runbook

Completed 2026-04-30 — historical record. This runbook is the recipe that produced the green parity matrix that gated PR 7.2 (retire legacy Galaxy projects, merged at commit ae7106d). The matrix it produced is captured in Galaxy.ParityMatrix.md, also marked historical. The test project this doc drove (Driver.Galaxy.ParityTests) was deleted in PR 7.2, along with Driver.Galaxy.{Host,Proxy,Shared} and the OtOpcUaGalaxyHost Windows service. You cannot re-run this rig today. Current Galaxy testing flows through the gateway's own test suite in the sibling mxaccessgw repo.

The text below is preserved as-written so the migration trail (what was tested, against what shape, with what env vars) stays auditable.

Brings up both Galaxy backends side-by-side against a single live Galaxy so the parity matrix in docs/v2/Galaxy.ParityMatrix.md and the soak scenario in tests/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.ParityTests/SoakScenarioTests.cs can run for real. Closing the parity matrix was the gate for PR 7.2 (retire legacy Galaxy projects).

Conceptual layout

Galaxy ZB SQL  ──┬──  OtOpcUaGalaxyHost (NSSM service, net48 x86)   [DELETED in PR 7.2]
                 │       └── MxAccess COM, ClientName "OtOpcUa-Galaxy.Host"
                 │       └── named pipe "OtOpcUaGalaxy"
                 │              ▲
                 │              │ pipe IPC
                 │              │
                 │       GalaxyProxyDriver  ◄── parity test (legacy half)
                 │
                 └──  mxaccessgw service
                         └── MxAccess COM, ClientName "OtOpcUa-Parity"
                         └── gRPC on http://localhost:5120
                                ▲
                                │ gRPC
                                │
                         GalaxyDriver (in-process)  ◄── parity test (mxgw half)

Both halves talk to the same Galaxy through two distinct MxAccess sessions (different ClientNames so they don't evict each other).

What was on the dev box at the time

Per ~/.claude/projects/.../memory/ as of the rig run:

  • AVEVA System Platform + Galaxy + MXAccess runtime; see project_aveva_platform_installed.md.
  • OtOpcUaGalaxyHost Windows service running as dohertj2, NSSM-wrapped, binary at C:\publish\OtOpcUaGalaxyHost\OtOpcUa.Driver.Galaxy.Host.exe, shared secret at .local/galaxy-host-secret.txt, ZB SQL on localhost:1433; see project_galaxy_host_installed.md. (Service uninstalled and binary retired as part of PR 7.2; the host source project no longer exists in this repo.)
  • Parity test project (Driver.Galaxy.ParityTests) — committed and skip-clean at the time of the rig run. Deleted in PR 7.2.

Setup steps (one-time)

1. Build + run mxaccessgw

The gateway source is at C:\Users\dohertj2\Desktop\mxaccessgw\. Build both halves: the worker must be x86 net48 to match MxAccess COM bitness, and the server is .NET 10:

cd C:\Users\dohertj2\Desktop\mxaccessgw
dotnet build src\MxGateway.Worker -c Release   # produces bin\x86\Release\net48\MxGateway.Worker.exe
dotnet build src\MxGateway.Server -c Release   # produces bin\Release\net10.0\MxGateway.Server.dll

Initialize the auth database and mint an API key. The CLI mode is selected by passing apikey as the first argument:

$env:MxGateway__ApiKeyPepper = "parity-rig-dev-pepper"   # any stable string for dev
$srv = "C:\Users\dohertj2\Desktop\mxaccessgw\src\MxGateway.Server\bin\Release\net10.0\MxGateway.Server.dll"

dotnet $srv apikey init-db                                # → "init-db: initialized"

dotnet $srv apikey create-key `
  --key-id parity-rig `
  --display-name "OtOpcUa-Parity" `
  --scopes "session:open,session:close,invoke:read,invoke:write,invoke:secure,events:read,metadata:read"
# → "API key: mxgw_parity-rig_<base64suffix>"   ← capture this; you can't list secrets later

Save that exact key string for OTOPCUA_PARITY_GW_API_KEY in step 2.
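The create-key output above prints the secret exactly once, in the form mxgw_parity-rig_<base64suffix>. A minimal sketch for sanity-checking a pasted key before exporting it. The pattern is inferred only from that sample line (prefix, key id, suffix) and is an assumption, not the gateway's actual validation rule:

```python
import re

# Assumed shape, inferred from the sample create-key output only:
#   mxgw_<key-id>_<suffix>
# The suffix alphabet (base64 / base64url characters) is a guess.
_KEY_RE = re.compile(r"^mxgw_(?P<key_id>[A-Za-z0-9-]+)_(?P<suffix>[A-Za-z0-9+/=_-]+)$")


def check_api_key(key: str, expected_key_id: str = "parity-rig") -> str:
    """Return "" if the key looks plausible, otherwise a short reason."""
    match = _KEY_RE.match(key.strip())
    if match is None:
        # Catches placeholders such as "<base64suffix>" or unrelated strings.
        return "does not look like mxgw_<key-id>_<suffix>"
    if match.group("key_id") != expected_key_id:
        return f"key id is {match.group('key_id')!r}, expected {expected_key_id!r}"
    return ""
```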

Run the server with the same pepper plus three env-var overrides (endpoint URL, HTTP/2-only protocols, absolute worker path); the defaults don't quite match what gRPC and the parity test need:

$env:MxGateway__ApiKeyPepper = "parity-rig-dev-pepper"   # MUST match the create-key invocation
$env:Kestrel__Endpoints__Http__Url = "http://localhost:5120"
$env:Kestrel__Endpoints__Http__Protocols = "Http2"        # gRPC needs h2c on plain HTTP
$env:MxGateway__Worker__ExecutablePath = `
  "C:\Users\dohertj2\Desktop\mxaccessgw\src\MxGateway.Worker\bin\x86\Release\net48\MxGateway.Worker.exe"
  # appsettings.json's relative path is missing the \net48 segment; absolute path sidesteps that

dotnet $srv
# → "Now listening on: http://localhost:5120"
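The overrides above rely on the standard .NET configuration convention that a double underscore (__) in an environment-variable name stands for the ":" section separator, so Kestrel__Endpoints__Http__Protocols lands on the same key as Kestrel:Endpoints:Http:Protocols in appsettings.json. A small sketch of that mapping, handy when cross-checking an override against the JSON file (it ignores env-var prefixes and other provider features):

```python
def env_to_config_key(name: str) -> str:
    """Map an environment-variable name to a .NET configuration key.

    "__" is the hierarchy separator; .NET configuration keys are
    case-insensitive, so the original casing is simply kept.
    """
    return name.replace("__", ":")


overrides = {
    "MxGateway__ApiKeyPepper": "parity-rig-dev-pepper",
    "Kestrel__Endpoints__Http__Url": "http://localhost:5120",
    "Kestrel__Endpoints__Http__Protocols": "Http2",
}

for env_name, value in overrides.items():
    print(f"{env_to_config_key(env_name)} = {value}")
```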

The worker spawns lazily on the first OpenSession RPC, so there's no worker process visible in Task Manager until the first session. If the worker can't spawn, the server returns "Failed to open session session-…" and logs a WorkerProcessLaunchException.

NSSM-wrap it later if the rig becomes long-lived; for first-pass provisioning a console window is easier to inspect.

2. Set the parity env vars

In the test-runner shell:

$env:OTOPCUA_PARITY_GW_ENDPOINT  = "http://localhost:5120"
$env:OTOPCUA_PARITY_GW_API_KEY   = "mxgw_parity-rig_<base64suffix>"   # exact key printed by apikey create-key in step 1
$env:OTOPCUA_PARITY_CLIENT_NAME  = "OtOpcUa-Parity"

Elevation status doesn't matter — the legacy Galaxy.Host pipe ACL accepts elevated and non-elevated dohertj2 shells alike (the Administrators deny ACE was removed 2026-04-24; see project_galaxy_host_installed.md).
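Before running dotnet test, a quick pre-flight can catch an unset or placeholder value in the three variables above. A minimal sketch; the checks (plain http with an explicit port, an mxgw_ key prefix, a client name distinct from the legacy host's) are assumptions drawn from the values in this runbook, not the harness's actual skip logic, which lives in the C# test project:

```python
import os
from urllib.parse import urlparse

REQUIRED = (
    "OTOPCUA_PARITY_GW_ENDPOINT",
    "OTOPCUA_PARITY_GW_API_KEY",
    "OTOPCUA_PARITY_CLIENT_NAME",
)
# ClientName of the legacy Galaxy.Host MxAccess session; the parity session
# must use a different name so the two sessions don't evict each other.
LEGACY_CLIENT_NAME = "OtOpcUa-Galaxy.Host"


def parity_env_problems(env=None):
    """Return human-readable problems; an empty list means 'looks sane'."""
    env = os.environ if env is None else env
    problems = [f"{name} is not set" for name in REQUIRED if not env.get(name)]

    endpoint = env.get("OTOPCUA_PARITY_GW_ENDPOINT", "")
    if endpoint:
        url = urlparse(endpoint)
        if url.scheme != "http" or url.port is None:
            problems.append(f"endpoint {endpoint!r} should be plain http with an explicit port")

    key = env.get("OTOPCUA_PARITY_GW_API_KEY", "")
    if key and not key.startswith("mxgw_"):
        problems.append("API key does not start with 'mxgw_'; copy it from the create-key output")

    if env.get("OTOPCUA_PARITY_CLIENT_NAME") == LEGACY_CLIENT_NAME:
        problems.append("client name collides with the legacy host's MxAccess session")
    return problems


if __name__ == "__main__":
    for problem in parity_env_problems():
        print("parity env:", problem)
```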

3. Verify both halves resolve

cd C:\Users\dohertj2\Desktop\lmxopcua
dotnet test tests\ZB.MOM.WW.OtOpcUa.Driver.Galaxy.ParityTests\ `
  --filter "FullyQualifiedName~HarnessShapeTests"

Harness_records_a_skip_reason_for_each_unavailable_backend is the two-line truth-teller:

  • LegacyDriver and MxGatewayDriver both non-null → rig is up.
  • One side null → read its LegacySkipReason / MxGatewaySkipReason and fix.
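The check above boils down to a two-input decision. A sketch of it, using a hypothetical Backend record as a stand-in for what the C# harness exposes (LegacyDriver / MxGatewayDriver plus LegacySkipReason / MxGatewaySkipReason); the record and function are illustrative, not part of the test project:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Backend:
    """Hypothetical stand-in for one half of the parity harness."""
    name: str
    driver_resolved: bool        # True when the driver property is non-null
    skip_reason: Optional[str]   # populated only when the driver is null


def rig_status(legacy: Backend, mxgw: Backend) -> str:
    """'rig is up' when both drivers resolved, else each missing side's reason."""
    down = [b for b in (legacy, mxgw) if not b.driver_resolved]
    if not down:
        return "rig is up"
    return "; ".join(f"{b.name}: {b.skip_reason or 'no skip reason recorded'}" for b in down)
```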

Running the matrix

Once both halves resolve:

dotnet test tests\ZB.MOM.WW.OtOpcUa.Driver.Galaxy.ParityTests\ `
  --filter "Category=ParityE2E"

This runs all 17 scenario tests across the seven scenario classes (BrowseAndRead / Subscribe / Write / Alarm / History / Reconnect / ScanState). Each scenario class is independent — failures in one don't block the rest.

Track the result against docs/v2/Galaxy.ParityMatrix.md. Update each row to:

  • green if the scenario passes
  • yellow if it skipped because the dev Galaxy doesn't have the right shape (see coverage matrix below)
  • red if it asserted a real delta — those are the deltas that block PR 7.2; chase each before retiring the legacy backend
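Updating the matrix is a mechanical mapping from test outcome to row color, with two refinements from this runbook: skips on rows explicitly deferred (the two ScanState scenarios on a single-platform box) are reported as n/a (deferred), and only red rows that are not pre-accepted deltas block PR 7.2. A sketch; the input shape (a scenario-to-outcome mapping transcribed from the test run) is an assumption:

```python
COLOR_BY_OUTCOME = {"passed": "green", "skipped": "yellow", "failed": "red"}


def matrix_rows(results, deferred=frozenset()):
    """results: mapping of scenario name -> "passed" | "skipped" | "failed"."""
    rows = []
    for scenario in sorted(results):
        outcome = results[scenario].lower()
        if outcome not in COLOR_BY_OUTCOME:
            raise ValueError(f"{scenario}: unknown outcome {results[scenario]!r}")
        if outcome == "skipped" and scenario in deferred:
            rows.append(f"{scenario}: n/a (deferred)")
        else:
            rows.append(f"{scenario}: {COLOR_BY_OUTCOME[outcome]}")
    return rows


def blocking_scenarios(results, accepted=frozenset()):
    """Red rows that are not documented accepted deltas block the PR."""
    return sorted(s for s, o in results.items() if o.lower() == "failed" and s not in accepted)
```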

Galaxy shape needed for full coverage

Skip-on-empty-shape scenarios fail-soft today. To turn a skip into a real result, the dev Galaxy needs the shape listed in the Needs column:

Scenario | Needs | Local rig
--- | --- | ---
BrowseAndReadParityTests (3 tests) | Any deployed objects with attributes | existing seed
SubscribeAndEventRateParityTests (event-rate) | ≥5 attributes whose values change within 3 s | ⚙ scriptable via graccess-cli
WriteByClassificationParityTests (FreeAccess/Operate) | A FreeAccess/Operate numeric attribute | ⚙ scriptable via graccess-cli
WriteByClassificationParityTests (Configure/Tune) | A Configure/Tune attribute | ⚙ scriptable via graccess-cli
AlarmTransitionParityTests (2 tests) | Attributes with the $Alarm* extension | ⚙ scriptable via graccess-cli
HistoryReadParityTests (historized set) | Attributes with the History extension | ⚙ scriptable via graccess-cli
ScanStateProbeParityTests (2 tests) | Multiple $WinPlatform / $AppEngine objects | deferred to customer rig (this dev box is provisioned for one platform only)

The single-platform constraint

The dev box at DESKTOP-6JL3KKO is licensed / configured for a single deployed $WinPlatform. Adding a second platform isn't feasible here, so ScanStateProbeParityTests will skip in a "no overlap" branch on this rig. Both of its scenarios already handle that case gracefully (Assert.Skip("no overlapping platform hosts between backends — likely the transport names differ but no $WinPlatform was discovered")), so the matrix reports them as n/a (deferred) rather than red.

Plan: defer the two ScanState scenarios to a customer rig with multiple platforms. The PR 7.2 gate accepts "n/a, deferred" on these rows provided the legacy GalaxyRuntimeProbeManager and the in-process PerPlatformProbeWatcher have matching unit-test coverage of the state-decoder + member-tracking logic — which they do (PR 4.7's tests). Treat the runtime parity check as a customer-rig acceptance gate before that customer goes live, not a precondition for retiring the legacy projects on this dev box.

Provisioning the rest via graccess-cli

C:\Users\dohertj2\Desktop\graccess\graccess_cli\ is a .NET Framework 4.8 console app over the ArchestrA GRAccess COM API. It can configure templates, instances, attributes, UDAs, extensions, and attribute security — i.e. every row above marked ⚙ scriptable. Full surface in graccess/graccess_cli/docs/usage.md and per-area workflow guides (attribute-editing.md, template-editing.md, template-instance-editing.md).

Reserve a sandbox UDO (e.g. OtOpcUaParityTest) to avoid mutating attributes on plant-relevant objects. Concrete commands per requirement:

A FreeAccess/Operate numeric attribute (covers WriteByClassification FreeAccess/Operate scenario):

graccess object uda add `
  --galaxy ZB --name OtOpcUaParityTest --type template `
  --uda OperateValue --data-type MxFloat `
  --category MxCategoryWriteable_C --security MxSecurityOperate `
  --confirm --confirm-target OtOpcUaParityTest

A Configure / Tune attribute (covers WriteByClassification Configure/Tune scenario):

# Tune
graccess object uda add `
  --galaxy ZB --name OtOpcUaParityTest --type template `
  --uda TuneValue --data-type MxFloat `
  --category MxCategoryWriteable_T --security MxSecurityTune `
  --confirm --confirm-target OtOpcUaParityTest

# Configure
graccess object uda add `
  --galaxy ZB --name OtOpcUaParityTest --type template `
  --uda ConfigValue --data-type MxFloat `
  --category MxCategoryWriteable_C --security MxSecurityConfigure `
  --confirm --confirm-target OtOpcUaParityTest
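The three UDA invocations above differ only in UDA name, category and security. A small wrapper that builds them keeps --confirm-target in lock-step with --name and makes a dry run easy to review. The flag set is copied verbatim from the commands above; the wrapper itself, the UDAS table and the dry-run default are hypothetical conveniences, not part of graccess-cli:

```python
import shlex
import subprocess

SANDBOX = "OtOpcUaParityTest"

# (uda, category, security) exactly as used in the commands above
UDAS = [
    ("OperateValue", "MxCategoryWriteable_C", "MxSecurityOperate"),
    ("TuneValue", "MxCategoryWriteable_T", "MxSecurityTune"),
    ("ConfigValue", "MxCategoryWriteable_C", "MxSecurityConfigure"),
]


def uda_add_argv(template, uda, category, security, data_type="MxFloat", galaxy="ZB"):
    """Build one 'graccess object uda add' argv; confirm-target always equals name."""
    return [
        "graccess", "object", "uda", "add",
        "--galaxy", galaxy, "--name", template, "--type", "template",
        "--uda", uda, "--data-type", data_type,
        "--category", category, "--security", security,
        "--confirm", "--confirm-target", template,
    ]


def provision(dry_run=True):
    for uda, category, security in UDAS:
        argv = uda_add_argv(SANDBOX, uda, category, security)
        if dry_run:
            print(shlex.join(argv))          # review before mutating the Galaxy
        else:
            subprocess.run(argv, check=True)  # stop on the first failing call
```

Run provision() first to print the commands, then provision(dry_run=False) once they look right.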

A changing-value attribute (covers Subscribe event-rate scenario). Two ways:

  1. On-scan increment — bind a script extension that bumps a counter each scan. Simplest to author with object extension add against ScriptExtension plus object attribute set for the script body (see attribute-editing.md §"Edit Extensions" for the pattern).
  2. External writer loop — leave the attribute as plain Float and run a one-liner that writes incrementing values from the parity-test shell. Uses the legacy backend path so it's available before the mxgw subscriber is up. This keeps the Galaxy template clean.

For first-pass validation pick #2 — no template surgery needed, and the write loop runs only during dotnet test.
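Option 2 can be sketched as a loop that bumps every attribute once per period. The write callback is hypothetical (in the rig it would wrap a write through the legacy backend); the guards encode the Needs column above, at least 5 attributes and a change within every 3 s window:

```python
import time
from typing import Callable, Sequence

MIN_ATTRIBUTES = 5   # Needs column: ">=5 attributes whose values change within 3 s"
WINDOW_S = 3.0


def write_ramp(write: Callable[[str, float], None], attributes: Sequence[str],
               cycles: int, period_s: float = 1.0, step: float = 1.0,
               sleep: Callable[[float], None] = time.sleep) -> None:
    """Write a rising value to every attribute once per period.

    Each attribute's value grows by `step` per cycle, so every write is a
    real change that a subscriber should see.
    """
    if len(attributes) < MIN_ATTRIBUTES:
        raise ValueError(f"need at least {MIN_ATTRIBUTES} attributes, got {len(attributes)}")
    if period_s >= WINDOW_S:
        raise ValueError(f"period {period_s}s would not change every attribute within {WINDOW_S}s")
    for cycle in range(cycles):
        for index, attribute in enumerate(attributes):
            write(attribute, cycle * step + index)
        sleep(period_s)
```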

Attributes with the $Alarm* extension (covers AlarmTransition scenario). Per attribute-editing.md §"Edit Alarm Settings", the names of the attributes an alarm extension introduces vary by extension type (Limit, RateOfChange, etc.). Add the extension via:

graccess object extension add `
  --galaxy ZB --name OtOpcUaParityTest --type template `
  --extension-type AnalogLimitAlarm --primitive AlarmInput `
  --object-extension `
  --confirm --confirm-target OtOpcUaParityTest

Then set HiHi/Hi/Lo/LoLo limit values + priority on the resulting attributes via object attribute set. Inspect first via object attributes to see the names the extension introduces; they differ across AVEVA versions.
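Before writing the four limits, it helps to confirm they are ordered and, if you want to drive transitions by hand, to have a short list of input values that walk the alarm through each band. A sketch under two assumptions: the limits must satisfy LoLo < Lo < Hi < HiHi, and a value sitting exactly on a limit counts as inside that band (AVEVA's exact boundary and deadband semantics are not modelled). Writing the values to the alarm's input attribute is left to whatever write path you use:

```python
def check_limits(lolo: float, lo: float, hi: float, hihi: float) -> None:
    if not (lolo < lo < hi < hihi):
        raise ValueError(f"expected LoLo < Lo < Hi < HiHi, got {lolo}, {lo}, {hi}, {hihi}")


def band(value, lolo, lo, hi, hihi) -> str:
    """Classify a value; inclusive limits are an assumption."""
    if value >= hihi:
        return "HiHi"
    if value >= hi:
        return "Hi"
    if value <= lolo:
        return "LoLo"
    if value <= lo:
        return "Lo"
    return "Normal"


def crossing_values(lolo, lo, hi, hihi):
    """Inputs that walk Normal, Hi, HiHi, Normal, Lo, LoLo, Normal."""
    check_limits(lolo, lo, hi, hihi)
    mid = (lo + hi) / 2
    return [mid, (hi + hihi) / 2, hihi + (hihi - hi), mid,
            (lolo + lo) / 2, lolo - (lo - lolo), mid]
```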

Attributes with the History extension (covers HistoryRead routing scenario). History settings usually live either on the attribute itself or on extension attributes; attribute-editing.md §"Edit History Settings" covers the discovery flow. Quick start:

graccess object extension add `
  --galaxy ZB --name OtOpcUaParityTest --type template `
  --extension-type HistoryExtension --primitive HistoryRecord `
  --object-extension `
  --confirm --confirm-target OtOpcUaParityTest

# Then enable history on whichever attribute the extension points at
graccess object attribute set `
  --galaxy ZB --name OtOpcUaParityTest --type template `
  --attribute HistoryEnabled --value true --data-type bool `
  --confirm --confirm-target OtOpcUaParityTest

Deploy the instance derived from the sandbox template (OtOpcUaParityTest_001 below) and restart Galaxy.Host after any of the above so MxAccess sees the change:

graccess object deploy --galaxy ZB --name OtOpcUaParityTest_001 `
  --confirm --confirm-target OtOpcUaParityTest_001
sc.exe restart OtOpcUaGalaxyHost   # service no longer exists post-PR-7.2; in the modern shape, restart mxaccessgw instead

Then re-run the parity matrix. The previously-skipped scenarios should now find a sandbox attribute matching their selector and assert.

Soak run

The 24 h × 50k-tag soak gates the production-confidence half of PR 7.2.

$env:OTOPCUA_SOAK_RUN      = "1"
$env:OTOPCUA_SOAK_TAGS     = "<actual tag count if Galaxy < 50k>"
$env:OTOPCUA_SOAK_MINUTES  = "1440"   # default 24h; compress for first runs
$env:OTOPCUA_SOAK_DROP_PCT = "0.5"

dotnet test tests\ZB.MOM.WW.OtOpcUa.Driver.Galaxy.ParityTests\ `
  --filter "Category=Soak"

The test logs a per-minute CSV-style line to stdout:

soak,1.0,received=51234,dispatched=51234,dropped=0,ws_mb=412
soak,2.0,received=102468,dispatched=102468,dropped=0,ws_mb=415
...

Capture stdout to a file for post-run analysis. The three guards (received growing, dropped/received ratio, working-set delta) all fire mid-run rather than at end-of-test, so a failure surfaces within the first few minutes if the architecture is wrong.
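For post-run analysis of the captured file, a sketch that parses lines in the format shown above and re-applies the three guards offline. Assumptions: the counters are cumulative (as the two sample lines suggest), OTOPCUA_SOAK_DROP_PCT is a percentage of received events, and the 256 MB working-set growth limit is a hypothetical placeholder; the test's own thresholds live in SoakScenarioTests.cs:

```python
import re
from dataclasses import dataclass

LINE_RE = re.compile(
    r"^soak,(?P<minute>[\d.]+),received=(?P<received>\d+),dispatched=(?P<dispatched>\d+),"
    r"dropped=(?P<dropped>\d+),ws_mb=(?P<ws_mb>\d+)$"
)


@dataclass
class Sample:
    minute: float
    received: int
    dispatched: int
    dropped: int
    ws_mb: int


def parse_soak(lines):
    """Keep only well-formed soak lines; anything else in stdout is ignored."""
    samples = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m:
            samples.append(Sample(float(m["minute"]), int(m["received"]),
                                  int(m["dispatched"]), int(m["dropped"]), int(m["ws_mb"])))
    return samples


def check_soak(samples, drop_pct=0.5, max_ws_growth_mb=256):
    problems = []
    # Guard 1: cumulative received count must keep growing minute over minute.
    for prev, cur in zip(samples, samples[1:]):
        if cur.received <= prev.received:
            problems.append(f"received stalled at minute {cur.minute}")
    if samples:
        last = samples[-1]
        # Guard 2: dropped/received ratio, expressed as a percentage.
        if last.received and 100.0 * last.dropped / last.received > drop_pct:
            pct = 100.0 * last.dropped / last.received
            problems.append(f"drop rate {pct:.3f}% exceeds {drop_pct}%")
        # Guard 3: working-set growth since the first sample.
        growth = last.ws_mb - samples[0].ws_mb
        if growth > max_ws_growth_mb:
            problems.append(f"working set grew {growth} MB (limit {max_ws_growth_mb} MB)")
    return problems
```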

Compressed-tag soak (when Galaxy isn't 50k tags)

A first-pass validation is fine with these overrides:

$env:OTOPCUA_SOAK_RUN      = "1"
$env:OTOPCUA_SOAK_TAGS     = "500"      # whatever the dev Galaxy has
$env:OTOPCUA_SOAK_MINUTES  = "60"       # one hour is enough to surface plumbing bugs
$env:OTOPCUA_SOAK_DROP_PCT = "1.0"

This validates the plumbing (bounded channel, pump invariants, leak guard) but doesn't pin the 50k-tag scaling assertion. Defer the full 50k validation to a customer rig with that scale, or build a synthetic Galaxy with a script that imports 50k attributes onto a generated UDO (~2 hours of one-off work).

Troubleshooting

  • MxGatewaySkipReason says "mxaccessgw not reachable" — the gw isn't listening, or it's on a different port. Test-NetConnection localhost -Port 5120 is the quick check.
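  • MxGatewaySkipReason says "mxgateway backend boot failed: RpcException: Unauthenticated": API key mismatch. Verify that OTOPCUA_PARITY_GW_API_KEY holds the exact key printed by apikey create-key, and that the server is running with the same MxGateway__ApiKeyPepper that was set when the key was minted.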
  • LegacySkipReason says "Galaxy ZB SQL not reachable on localhost:1433" — SQL Server isn't running, or its TCP listener is off. Check services.msc for the SQL Server (default) instance.
  • LegacySkipReason says "Galaxy.Host EXE not built" — at rig time the parity harness looked under src/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Host/bin/Debug/net48/ for the EXE it spawned as a subprocess, separate from the published copy at C:\publish\OtOpcUaGalaxyHost\ used by the Windows service. Both the source project and the published binary were removed in PR 7.2, so this troubleshooting branch no longer applies — the legacy half cannot be brought up at all.
  • Both halves resolve but parity scenarios assert deltas — that's the expected outcome the rig exists to surface. Review each delta against docs/v2/Galaxy.ParityMatrix.md's "Accepted deltas" section to decide whether it's a real bug or a pre-accepted divergence.
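Test-NetConnection is the PowerShell quick check for the first and third bullets; the same probe is easy to script when you want one command covering both the gateway port and the ZB SQL listener. A minimal sketch: it is a plain TCP connect, so it proves something is listening, not that gRPC auth or SQL logins work:

```python
import socket


def tcp_reachable(host: str, port: int, timeout_s: float = 2.0) -> bool:
    """True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False


if __name__ == "__main__":
    for name, port in (("mxaccessgw gRPC", 5120), ("Galaxy ZB SQL", 1433)):
        state = "listening" if tcp_reachable("localhost", port) else "NOT reachable"
        print(f"{name} on localhost:{port}: {state}")
```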

After the rig is green

When the matrix is fully green or carries documented accepted deltas, PR 7.2 (legacy project deletion) is unblocked. The only follow-up is to add any newly discovered accepted delta to the matrix doc, together with its rationale, so the matrix history stays auditable.