lmxopcua/docs/v2/Galaxy.ParityRig.md
Joseph Doherty 006af51768 docs: post-PR-7.2 cleanup — audit + three-track scrub
Audit (three parallel agent passes) found 43 markdown files carrying
stale references to the deleted Galaxy.Host/Proxy/Shared projects
after the v2-mxgw merge. This commit lands the prioritized fixes.

Track 1 — high-traffic in-place rewrites (3 files, ~454 lines deleted)
- README.md (202 → 91 lines): drops .NET 4.8 / x86 / TopShelf install
  text; leads with the multi-driver .NET 10 server identity and points
  at scripts/install/Install-Services.ps1 and the parity rig.
- docs/v2/driver-specs.md §1 Galaxy (~289 → ~66 lines): replaces the
  Tier-C out-of-process spec with a Tier-A in-process description
  matching the current GalaxyDriver code, with the four-section
  GalaxyDriverOptions JSON shape pulled verbatim from
  Config/GalaxyDriverOptions.cs.
- docs/drivers/Galaxy.md (211 → 92 lines): full rewrite around the
  current Browse/Runtime/Health/Config sub-folders.

Track 2 — historical banners (5 files)
- lmx_mxgw.md, lmx_mxgw_impl.md, lmx_backend.md,
  docs/v2/Galaxy.ParityMatrix.md,
  docs/v2/implementation/phase-2-galaxy-out-of-process.md each get a
  " Completed 2026-04-30 — historical record" banner block. lmx_mxgw.md
  also fixes two dead links (`docs/Galaxy.Driver.md` and
  `docs/v2/Galaxy.Driver.md`) → `docs/drivers/Galaxy.md`.

Track 3 — v1 archive sweep (10 git mv + 1 new index + 2 in-place scrubs)
- Moved 10 v1 docs under docs/v1/ preserving subpath structure:
  AlarmTracking, Configuration, DataTypeMapping, HistoricalDataAccess,
  Subscriptions (top-level); drivers/Galaxy-Repository,
  drivers/Galaxy-Test-Fixture; reqs/GalaxyRepositoryReqs,
  reqs/MxAccessClientReqs, reqs/ServiceHostReqs.
- New docs/v1/README.md is the shared archive banner + per-file table.
- docs/README.md repointed to the v1 paths and updated to reflect the
  v2 two-process deploy shape (Server + Admin + optional
  OtOpcUaWonderwareHistorian).
- docs/v2/Galaxy.ParityRig.md got a historical banner + four inline
  scrubs marking the OtOpcUaGalaxyHost service / Driver.Galaxy.Host
  EXE / Driver.Galaxy.ParityTests project as deleted-in-PR-7.2.

The repo's live-reading surface (README + CLAUDE.md + docs/v2/) now
describes only the post-PR-7.2 architecture. v1 docs are preserved as
a labelled archive under docs/v1/.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 08:59:59 -04:00

# Galaxy parity rig — runbook
> ✅ **Completed 2026-04-30 — historical record.** This runbook is the
> recipe that produced the green parity matrix that gated PR 7.2
> (retire legacy Galaxy projects, merged at commit `ae7106d`). The
> matrix it produced is captured in
> [`Galaxy.ParityMatrix.md`](Galaxy.ParityMatrix.md), also marked
> historical. The test project this doc drove
> (`Driver.Galaxy.ParityTests`) was deleted in PR 7.2, along with
> `Driver.Galaxy.{Host,Proxy,Shared}` and the `OtOpcUaGalaxyHost`
> Windows service. **You cannot re-run this rig today.** Current
> Galaxy testing flows through the gateway's own test suite in the
> sibling `mxaccessgw` repo.
>
> The text below is preserved as-written so the migration trail (what
> was tested, against what shape, with what env vars) stays auditable.

Brings up both Galaxy backends side-by-side against a single live Galaxy
so the parity matrix in `docs/v2/Galaxy.ParityMatrix.md` and the soak
scenario in `tests/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.ParityTests/SoakScenarioTests.cs`
can run for real. Closing the parity matrix was the gate for PR 7.2
(retire legacy Galaxy projects).
## Conceptual layout
```
Galaxy ZB SQL ──┬── OtOpcUaGalaxyHost (NSSM service, net48 x86) [DELETED in PR 7.2]
                │     └── MxAccess COM, ClientName "OtOpcUa-Galaxy.Host"
                │           └── named pipe "OtOpcUaGalaxy"
                │                 ▲
                │                 │ pipe IPC
                │                 │
                │           GalaxyProxyDriver ◄── parity test (legacy half)
                └── mxaccessgw service
                      └── MxAccess COM, ClientName "OtOpcUa-Parity"
                            └── gRPC on http://localhost:5120
                                  │ gRPC
                            GalaxyDriver (in-process) ◄── parity test (mxgw half)
```
Both halves talk to the **same Galaxy** through **two distinct MxAccess
sessions** (different ClientNames so they don't evict each other).
## What was on the dev box at the time
Per `~/.claude/projects/.../memory/` *as of the rig run*:
- **AVEVA System Platform + Galaxy + MXAccess runtime** — `project_aveva_platform_installed.md`.
- **`OtOpcUaGalaxyHost`** Windows service running as `dohertj2`, NSSM-wrapped,
binary at `C:\publish\OtOpcUaGalaxyHost\OtOpcUa.Driver.Galaxy.Host.exe`,
shared secret at `.local/galaxy-host-secret.txt`, ZB SQL on `localhost:1433`
(per `project_galaxy_host_installed.md`). **(Service uninstalled and binary
retired as part of PR 7.2; the host source project no longer exists in
this repo.)**
- **Parity test project** (`Driver.Galaxy.ParityTests`) — committed and
skip-clean at the time of the rig run. **Deleted in PR 7.2.**
## Setup steps (one-time)
### 1. Build + run mxaccessgw
The gateway source is at `c:\Users\dohertj2\Desktop\mxaccessgw\`.
Build both halves — the worker has to be x86 net48 (MxAccess COM
bitness), the server is .NET 10:
```powershell
cd C:\Users\dohertj2\Desktop\mxaccessgw
dotnet build src\MxGateway.Worker -c Release # produces bin\x86\Release\net48\MxGateway.Worker.exe
dotnet build src\MxGateway.Server -c Release # produces bin\Release\net10.0\MxGateway.Server.dll
```
Initialize the auth database and mint an API key. The server binary
switches to CLI mode when its first argument is `apikey`:
```powershell
$env:MxGateway__ApiKeyPepper = "parity-rig-dev-pepper" # any stable string for dev
$srv = "C:\Users\dohertj2\Desktop\mxaccessgw\src\MxGateway.Server\bin\Release\net10.0\MxGateway.Server.dll"
dotnet $srv apikey init-db # → "init-db: initialized"
dotnet $srv apikey create-key `
--key-id parity-rig `
--display-name "OtOpcUa-Parity" `
--scopes "session:open,session:close,invoke:read,invoke:write,invoke:secure,events:read,metadata:read"
# → "API key: mxgw_parity-rig_<base64suffix>" ← capture this; you can't list secrets later
```
Save that exact key string for `OTOPCUA_PARITY_GW_API_KEY` in step 2.
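Before exporting the key, it can help to confirm that the string you captured has the shape shown in the sample output above (`mxgw_<key-id>_<suffix>`). The helper below is a minimal sketch: `Test-ParityApiKeyShape` is a hypothetical name, and the suffix character set is an assumption based on the `<base64suffix>` placeholder, not on the gateway's source.

```powershell
# Hypothetical helper, not part of mxaccessgw or the parity harness.
function Test-ParityApiKeyShape {
    param(
        [Parameter(Mandatory)][string]$ApiKey,
        [string]$KeyId = 'parity-rig'   # the --key-id passed to create-key
    )
    # Assumed shape: mxgw_<key-id>_<suffix>, where the suffix uses
    # base64 / base64url characters only (assumption).
    $pattern = '^mxgw_' + [regex]::Escape($KeyId) + '_[A-Za-z0-9+/=_-]+$'
    return $ApiKey -match $pattern
}
```

A literal placeholder such as `mxgw_parity-rig_<base64suffix>` or an unrelated string such as `parity-suite-key` fails this check, which flags a copy-paste slip before it shows up as an `Unauthenticated` RPC error.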
Run the server with three env-var overrides — the defaults don't
quite match what gRPC + the parity test need:
```powershell
$env:MxGateway__ApiKeyPepper = "parity-rig-dev-pepper" # MUST match the create-key invocation
$env:Kestrel__Endpoints__Http__Url = "http://localhost:5120"
$env:Kestrel__Endpoints__Http__Protocols = "Http2" # gRPC needs h2c on plain HTTP
$env:MxGateway__Worker__ExecutablePath = `
"C:\Users\dohertj2\Desktop\mxaccessgw\src\MxGateway.Worker\bin\x86\Release\net48\MxGateway.Worker.exe"
# appsettings.json's relative path is missing the \net48 segment; absolute path sidesteps that
dotnet $srv
# → "Now listening on: http://localhost:5120"
```
The worker spawns lazily on the first OpenSession RPC — there's no
worker process visible in Task Manager until the first session. If
the worker can't spawn, the server returns `Failed to open session
session-…` with a `WorkerProcessLaunchException` in the server log.
NSSM-wrap it later if the rig becomes long-lived; for first-pass
provisioning a console window is easier to inspect.
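Because the worker only appears after the first OpenSession RPC, a scripted check is a quick way to tell "not spawned yet" from "failed to spawn". This is a sketch under one assumption: the process name is `MxGateway.Worker`, inferred from the executable name in the build step. `Test-ProcessRunning` is a hypothetical helper name.

```powershell
# Hypothetical helper: returns $true when at least one process with the
# given name (no .exe suffix) is running, $false otherwise.
function Test-ProcessRunning {
    param([Parameter(Mandatory)][string]$Name)
    [bool](Get-Process -Name $Name -ErrorAction SilentlyContinue)
}

# Assumed process name, derived from MxGateway.Worker.exe.
Test-ProcessRunning -Name 'MxGateway.Worker'
```

A `False` result before any session has been opened is expected. A `False` result after a failed session points at the `WorkerProcessLaunchException` in the server log.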
### 2. Set the parity env vars
In the test-runner shell:
```powershell
$env:OTOPCUA_PARITY_GW_ENDPOINT = "http://localhost:5120"
$env:OTOPCUA_PARITY_GW_API_KEY = "mxgw_parity-rig_<base64suffix>" # paste the exact key printed by create-key in step 1
$env:OTOPCUA_PARITY_CLIENT_NAME = "OtOpcUa-Parity"
```
Elevation status doesn't matter — the legacy Galaxy.Host pipe ACL accepts
elevated and non-elevated `dohertj2` shells alike (the Administrators deny
ACE was removed 2026-04-24; see `project_galaxy_host_installed.md`).
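A small preflight can confirm that the three variables are visible in the current shell before running the harness. This is a sketch only: `Get-MissingParityEnvVar` is a hypothetical helper, and it checks presence, not correctness of the values.

```powershell
# Hypothetical helper: emits the name of each parity variable that is
# unset or whitespace-only in the current process environment.
function Get-MissingParityEnvVar {
    $required = 'OTOPCUA_PARITY_GW_ENDPOINT',
                'OTOPCUA_PARITY_GW_API_KEY',
                'OTOPCUA_PARITY_CLIENT_NAME'
    $required | Where-Object {
        [string]::IsNullOrWhiteSpace([Environment]::GetEnvironmentVariable($_))
    }
}

$missing = @(Get-MissingParityEnvVar)
if ($missing.Count -gt 0) {
    Write-Warning "Unset parity variables: $($missing -join ', ')"
}
```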
### 3. Verify both halves resolve
```powershell
cd C:\Users\dohertj2\Desktop\lmxopcua
dotnet test tests\ZB.MOM.WW.OtOpcUa.Driver.Galaxy.ParityTests\ `
--filter "FullyQualifiedName~HarnessShapeTests"
```
`Harness_records_a_skip_reason_for_each_unavailable_backend` is the
two-line truth-teller:
- Both `LegacyDriver` and `MxGatewayDriver` non-null → rig is up.
- One side null → read its `LegacySkipReason` / `MxGatewaySkipReason` and fix.
## Running the matrix
Once both halves resolve:
```powershell
dotnet test tests\ZB.MOM.WW.OtOpcUa.Driver.Galaxy.ParityTests\ `
--filter "Category=ParityE2E"
```
This runs all 17 scenario tests across the seven scenario classes
(BrowseAndRead / Subscribe / Write / Alarm / History / Reconnect /
ScanState). Each scenario class is independent — failures in one don't
block the rest.
Track the result against `docs/v2/Galaxy.ParityMatrix.md`. Update each
row to:
- **green** if the scenario passes
- **yellow** if it skipped because the dev Galaxy doesn't have the right
shape (see coverage matrix below)
- **red** if it asserted a real delta — those are the deltas that block
PR 7.2; chase each before retiring the legacy backend
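The colour rule above is mechanical enough to script when transcribing many rows. The sketch below assumes the test outcome has already been reduced to one of three labels (`Passed`, `Skipped`, `Failed`). Those labels and the helper name `Get-ParityRowColor` are illustrative assumptions, not something the harness emits in this form.

```powershell
# Hypothetical helper: maps a per-scenario outcome to the matrix colour.
function Get-ParityRowColor {
    param(
        [Parameter(Mandatory)]
        [ValidateSet('Passed', 'Skipped', 'Failed')]
        [string]$Outcome
    )
    switch ($Outcome) {
        'Passed'  { 'green'  }   # scenario passed
        'Skipped' { 'yellow' }   # dev Galaxy lacks the required shape
        'Failed'  { 'red'    }   # real delta; blocks PR 7.2
    }
}
```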
## Galaxy shape needed for full coverage
Skip-on-empty-shape scenarios fail-soft today. To turn a skip into a
real result, the dev Galaxy needs the shape listed in the **Needs** column:
| Scenario | Needs | Local rig |
|---|---|---|
| `BrowseAndReadParityTests` (3 tests) | Any deployed objects with attributes | ✅ existing seed |
| `SubscribeAndEventRateParityTests` event-rate | ≥5 attributes whose values *change* in 3s | ⚙ scriptable via graccess-cli |
| `WriteByClassificationParityTests` (FreeAccess/Operate) | A FreeAccess/Operate numeric attribute | ⚙ scriptable via graccess-cli |
| `WriteByClassificationParityTests` (Configure/Tune) | A Configure/Tune attribute | ⚙ scriptable via graccess-cli |
| `AlarmTransitionParityTests` (2 tests) | Attributes with the `$Alarm*` extension | ⚙ scriptable via graccess-cli |
| `HistoryReadParityTests` (historized set) | Attributes with the History extension | ⚙ scriptable via graccess-cli |
| `ScanStateProbeParityTests` (2 tests) | Multiple `$WinPlatform` / `$AppEngine` objects | ❌ **deferred to customer rig** — this dev box is provisioned for one platform only |
### The single-platform constraint
The dev box at `DESKTOP-6JL3KKO` is licensed / configured for a single
deployed `$WinPlatform`. Adding a second platform isn't feasible here,
so `ScanStateProbeParityTests` will skip in a "no overlap" branch on
this rig. Both of its scenarios already handle that case gracefully
(`Assert.Skip("no overlapping platform hosts between backends — likely
the transport names differ but no $WinPlatform was discovered")`), so
the matrix reports them as **n/a (deferred)** rather than red.
Plan: defer the two ScanState scenarios to a customer rig with multiple
platforms. The PR 7.2 gate accepts "n/a, deferred" on these rows
provided the legacy `GalaxyRuntimeProbeManager` and the in-process
`PerPlatformProbeWatcher` have matching unit-test coverage of the
state-decoder + member-tracking logic — which they do (PR 4.7's tests).
Treat the runtime parity check as a customer-rig acceptance gate before
that customer goes live, not a precondition for retiring the legacy
projects on this dev box.
### Provisioning the rest via graccess-cli
`C:\Users\dohertj2\Desktop\graccess\graccess_cli\` is a .NET Framework
4.8 console app over the ArchestrA GRAccess COM API. It can configure
templates, instances, attributes, UDAs, extensions, and attribute
security — i.e. every row above marked ⚙ scriptable. Full surface in
`graccess/graccess_cli/docs/usage.md` and per-area workflow guides
(`attribute-editing.md`, `template-editing.md`,
`template-instance-editing.md`).
Reserve a sandbox UDO (e.g. `OtOpcUaParityTest`) to avoid mutating
attributes on plant-relevant objects. Concrete commands per requirement:
**A FreeAccess/Operate numeric attribute** (covers WriteByClassification
FreeAccess/Operate scenario):
```powershell
graccess object uda add `
--galaxy ZB --name OtOpcUaParityTest --type template `
--uda OperateValue --data-type MxFloat `
--category MxCategoryWriteable_C --security MxSecurityOperate `
--confirm --confirm-target OtOpcUaParityTest
```
**A Configure / Tune attribute** (covers WriteByClassification
Configure/Tune scenario):
```powershell
# Tune
graccess object uda add `
--galaxy ZB --name OtOpcUaParityTest --type template `
--uda TuneValue --data-type MxFloat `
--category MxCategoryWriteable_T --security MxSecurityTune `
--confirm --confirm-target OtOpcUaParityTest
# Configure
graccess object uda add `
--galaxy ZB --name OtOpcUaParityTest --type template `
--uda ConfigValue --data-type MxFloat `
--category MxCategoryWriteable_C --security MxSecurityConfigure `
--confirm --confirm-target OtOpcUaParityTest
```
**A changing-value attribute** (covers Subscribe event-rate scenario).
Two ways:
1. *On-scan increment* — bind a script extension that bumps a counter
each scan. Simplest to author with `object extension add` against
`ScriptExtension` plus `object attribute set` for the script body
(see `attribute-editing.md` §"Edit Extensions" for the pattern).
2. *External writer loop* — leave the attribute as plain Float and run
a one-liner that writes incrementing values from the parity-test
shell. Uses the legacy backend path so it's available before the
mxgw subscriber is up. This keeps the Galaxy template clean.

For first-pass validation pick #2 — no template surgery needed, and the
write loop runs only during `dotnet test`.
**Attributes with the `$Alarm*` extension** (covers AlarmTransition
scenario). Per `attribute-editing.md` §"Edit Alarm Settings" the
likely-named attributes vary by extension type
(`Limit`, `RateOfChange`, etc.). Add the extension via:
```powershell
graccess object extension add `
--galaxy ZB --name OtOpcUaParityTest --type template `
--extension-type AnalogLimitAlarm --primitive AlarmInput `
--object-extension `
--confirm --confirm-target OtOpcUaParityTest
```
Then set HiHi/Hi/Lo/LoLo limit values + priority on the resulting
attributes via `object attribute set`. Inspect first via
`object attributes` to see the names the extension introduces, since
they differ across AVEVA versions.
**Attributes with the History extension** (covers HistoryRead routing
scenario). History settings are usually exposed either as plain
attributes or as extension attributes; `attribute-editing.md`
§"Edit History Settings" covers the discovery flow. Quick start:
```powershell
graccess object extension add `
--galaxy ZB --name OtOpcUaParityTest --type template `
--extension-type HistoryExtension --primitive HistoryRecord `
--object-extension `
--confirm --confirm-target OtOpcUaParityTest
# Then enable history on whichever attribute the extension points at
graccess object attribute set `
--galaxy ZB --name OtOpcUaParityTest --type template `
--attribute HistoryEnabled --value true --data-type bool `
--confirm --confirm-target OtOpcUaParityTest
```
**Deploy + restart Galaxy.Host after any of the above** so MxAccess
sees the change:
```powershell
graccess object deploy --galaxy ZB --name OtOpcUaParityTest_001 `
--confirm --confirm-target OtOpcUaParityTest_001
Restart-Service OtOpcUaGalaxyHost # service no longer exists post-PR-7.2; in the modern shape, restart mxaccessgw instead
```
Then re-run the parity matrix. The previously-skipped scenarios should
now find a sandbox attribute matching their selector and assert.
## Soak run
The 24h × 50k soak gates the production confidence half of PR 7.2.
```powershell
$env:OTOPCUA_SOAK_RUN = "1"
$env:OTOPCUA_SOAK_TAGS = "<actual tag count if Galaxy < 50k>"
$env:OTOPCUA_SOAK_MINUTES = "1440" # default 24h; compress for first runs
$env:OTOPCUA_SOAK_DROP_PCT = "0.5"
dotnet test tests\ZB.MOM.WW.OtOpcUa.Driver.Galaxy.ParityTests\ `
--filter "Category=Soak"
```
The test logs a per-minute CSV-style line to stdout:
```
soak,1.0,received=51234,dispatched=51234,dropped=0,ws_mb=412
soak,2.0,received=102468,dispatched=102468,dropped=0,ws_mb=415
...
```
Capture stdout to a file for post-run analysis. The three guards
(`received` growing, `dropped/received` ratio, working-set delta) all
fire mid-run rather than at end-of-test, so a failure surfaces within
the first few minutes if the architecture is wrong.
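For post-run analysis of the captured stdout, each line can be split into named fields and the drop ratio recomputed. This is a sketch that assumes the line format is exactly as in the sample above and that `OTOPCUA_SOAK_DROP_PCT` is a percentage of `received`. `ConvertFrom-SoakLine` is a hypothetical helper, not part of the test project.

```powershell
# Hypothetical helper: parses one "soak,<minute>,key=value,..." line into
# an object and adds a computed drop_pct field (percent of received).
function ConvertFrom-SoakLine {
    param([Parameter(Mandatory)][string]$Line)
    $parts = $Line -split ','
    if ($parts.Count -lt 3 -or $parts[0] -ne 'soak') { return $null }
    $row = [ordered]@{
        minute = [double]::Parse($parts[1], [cultureinfo]::InvariantCulture)
    }
    foreach ($pair in $parts[2..($parts.Count - 1)]) {
        $name, $value = $pair -split '=', 2
        $row[$name] = [long]$value
    }
    $row['drop_pct'] = if ($row['received'] -gt 0) {
        100.0 * $row['dropped'] / $row['received']
    } else { 0.0 }
    [pscustomobject]$row
}
```

Piping a captured log through it, for example `Get-Content soak.log | Where-Object { $_ -like 'soak,*' } | ForEach-Object { ConvertFrom-SoakLine $_ }`, gives a table that can be sorted on `drop_pct` or `ws_mb`. The file name `soak.log` is illustrative.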
## Compressed-tag soak (when Galaxy isn't 50k tags)
A first-pass validation is fine with the override:
```powershell
$env:OTOPCUA_SOAK_RUN = "1"
$env:OTOPCUA_SOAK_TAGS = "500" # whatever the dev Galaxy has
$env:OTOPCUA_SOAK_MINUTES = "60" # one hour is enough to surface plumbing bugs
$env:OTOPCUA_SOAK_DROP_PCT = "1.0"
```
This validates the *plumbing* (bounded channel, pump invariants, leak
guard) but doesn't pin the 50k-tag scaling assertion. Defer the full
50k validation to a customer rig with that scale, or build a synthetic
Galaxy with a script that imports 50k attributes onto a generated UDO
(~2 hours of one-off work).
## Troubleshooting
- **`MxGatewaySkipReason` says "mxaccessgw not reachable"** — the gw
isn't listening, or it's on a different port. `Test-NetConnection
localhost -Port 5120` is the quick check.
- **`MxGatewaySkipReason` says "mxgateway backend boot failed:
RpcException: Unauthenticated"** — API key mismatch. Verify the
`OTOPCUA_PARITY_GW_API_KEY` env var matches the gw's configured key.
- **`LegacySkipReason` says "Galaxy ZB SQL not reachable on
localhost:1433"** — SQL Server isn't running, or its TCP listener is
off. Check `services.msc` for the SQL Server (default) instance.
- **`LegacySkipReason` says "Galaxy.Host EXE not built"** — at rig time
the parity harness looked under
`src/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Host/bin/Debug/net48/` for the
EXE it spawned as a subprocess, separate from the published copy at
`C:\publish\OtOpcUaGalaxyHost\` used by the Windows service. **Both
the source project and the published binary were removed in PR 7.2,
so this troubleshooting branch no longer applies — the legacy half
cannot be brought up at all.**
- **Both halves resolve but parity scenarios assert deltas** — that's
the expected outcome the rig exists to surface. Review each delta
against `docs/v2/Galaxy.ParityMatrix.md`'s "Accepted deltas" section
to decide whether it's a real bug or a pre-accepted divergence.
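The two reachability branches above (gateway on 5120, ZB SQL on 1433) can be checked in one pass. The sketch below uses a plain `TcpClient` connect with a timeout. `Test-TcpPort` is a hypothetical helper, and it only proves that something is listening, not that the listener is the expected service.

```powershell
# Hypothetical helper: $true if a TCP connect succeeds within the timeout.
function Test-TcpPort {
    param(
        [string]$HostName = 'localhost',
        [Parameter(Mandatory)][int]$Port,
        [int]$TimeoutMs = 1000
    )
    $client = [System.Net.Sockets.TcpClient]::new()
    try {
        $task = $client.ConnectAsync($HostName, $Port)
        # Wait returns $false on timeout and throws on refusal.
        return ($task.Wait($TimeoutMs) -and $client.Connected)
    } catch {
        return $false
    } finally {
        $client.Dispose()
    }
}

# Ports taken from the runbook: 5120 = mxaccessgw gRPC, 1433 = ZB SQL.
foreach ($port in 5120, 1433) {
    '{0}: {1}' -f $port, (Test-TcpPort -Port $port)
}
```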
## After the rig is green
When the matrix is fully green or carries documented accepted-deltas,
PR 7.2 (legacy project deletion) is unblocked. The only follow-up is
to promote any newly-discovered accepted-delta to the matrix doc, with
its rationale, so the matrix history stays auditable.