Audit (three parallel agent passes) found 43 markdown files carrying stale references to the deleted Galaxy.Host/Proxy/Shared projects after the v2-mxgw merge. This commit lands the prioritized fixes.

Track 1 — high-traffic in-place rewrites (3 files, ~454 lines deleted)
- README.md (202 → 91 lines): drops .NET 4.8 / x86 / TopShelf install text; leads with the multi-driver .NET 10 server identity and points at scripts/install/Install-Services.ps1 and the parity rig.
- docs/v2/driver-specs.md §1 Galaxy (~289 → ~66 lines): replaces the Tier-C out-of-process spec with a Tier-A in-process description matching the current GalaxyDriver code, with the four-section GalaxyDriverOptions JSON shape pulled verbatim from Config/GalaxyDriverOptions.cs.
- docs/drivers/Galaxy.md (211 → 92 lines): full rewrite around the current Browse/Runtime/Health/Config sub-folders.

Track 2 — historical banners (5 files)
- lmx_mxgw.md, lmx_mxgw_impl.md, lmx_backend.md, docs/v2/Galaxy.ParityMatrix.md, docs/v2/implementation/phase-2-galaxy-out-of-process.md each get a "✅ Completed 2026-04-30 — historical record" banner block. lmx_mxgw.md also fixes two dead links (`docs/Galaxy.Driver.md` and `docs/v2/Galaxy.Driver.md`) → `docs/drivers/Galaxy.md`.

Track 3 — v1 archive sweep (10 git mv + 1 new index + 2 in-place scrubs)
- Moved 10 v1 docs under docs/v1/ preserving subpath structure: AlarmTracking, Configuration, DataTypeMapping, HistoricalDataAccess, Subscriptions (top-level); drivers/Galaxy-Repository, drivers/Galaxy-Test-Fixture; reqs/GalaxyRepositoryReqs, reqs/MxAccessClientReqs, reqs/ServiceHostReqs.
- New docs/v1/README.md is the shared archive banner + per-file table.
- docs/README.md repointed to the v1 paths and updated to reflect the v2 two-process deploy shape (Server + Admin + optional OtOpcUaWonderwareHistorian).
- docs/v2/Galaxy.ParityRig.md got a historical banner + four inline scrubs marking the OtOpcUaGalaxyHost service / Driver.Galaxy.Host EXE / Driver.Galaxy.ParityTests project as deleted-in-PR-7.2.

The repo's live-reading surface (README + CLAUDE.md + docs/v2/) now describes only the post-PR-7.2 architecture. v1 docs are preserved as a labelled archive under docs/v1/.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

382 lines
16 KiB
Markdown
# Galaxy parity rig — runbook

> ✅ **Completed 2026-04-30 — historical record.** This runbook is the
> recipe that produced the green parity matrix that gated PR 7.2
> (retire legacy Galaxy projects, merged at commit `ae7106d`). The
> matrix it produced is captured in
> [`Galaxy.ParityMatrix.md`](Galaxy.ParityMatrix.md), also marked
> historical. The test project this doc drove
> (`Driver.Galaxy.ParityTests`) was deleted in PR 7.2, along with
> `Driver.Galaxy.{Host,Proxy,Shared}` and the `OtOpcUaGalaxyHost`
> Windows service. **You cannot re-run this rig today.** Current
> Galaxy testing flows through the gateway's own test suite in the
> sibling `mxaccessgw` repo.
>
> The text below is preserved as-written so the migration trail (what
> was tested, against what shape, with what env vars) stays auditable.

Brings up both Galaxy backends side-by-side against a single live Galaxy
so the parity matrix in `docs/v2/Galaxy.ParityMatrix.md` and the soak
scenario in `tests/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.ParityTests/SoakScenarioTests.cs`
can run for real. Closing the parity matrix was the gate for PR 7.2
(retire legacy Galaxy projects).

## Conceptual layout

```
Galaxy ZB SQL ──┬── OtOpcUaGalaxyHost (NSSM service, net48 x86)   [DELETED in PR 7.2]
                │     └── MxAccess COM, ClientName "OtOpcUa-Galaxy.Host"
                │           └── named pipe "OtOpcUaGalaxy"
                │                       ▲
                │                       │ pipe IPC
                │                       │
                │              GalaxyProxyDriver ◄── parity test (legacy half)
                │
                └── mxaccessgw service
                      └── MxAccess COM, ClientName "OtOpcUa-Parity"
                            └── gRPC on http://localhost:5120
                                        ▲
                                        │ gRPC
                                        │
                               GalaxyDriver (in-process) ◄── parity test (mxgw half)
```

Both halves talk to the **same Galaxy** through **two distinct MxAccess
sessions** (different ClientNames so they don't evict each other).

## What was on the dev box at the time

Per `~/.claude/projects/.../memory/` *as of the rig run*:

- **AVEVA System Platform + Galaxy + MXAccess runtime** — `project_aveva_platform_installed.md`.
- **`OtOpcUaGalaxyHost`** Windows service running as `dohertj2`, NSSM-wrapped,
  binary at `C:\publish\OtOpcUaGalaxyHost\OtOpcUa.Driver.Galaxy.Host.exe`,
  shared secret at `.local/galaxy-host-secret.txt`, ZB SQL on `localhost:1433`
  — `project_galaxy_host_installed.md`. **(Service uninstalled and binary
  retired as part of PR 7.2; the host source project no longer exists in
  this repo.)**
- **Parity test project** (`Driver.Galaxy.ParityTests`) — committed and
  skip-clean at the time of the rig run. **Deleted in PR 7.2.**

## Setup steps (one-time)

### 1. Build + run mxaccessgw

The gateway source is at `c:\Users\dohertj2\Desktop\mxaccessgw\`.
Build both halves — the worker has to be x86 net48 (MxAccess COM
bitness), the server is .NET 10:

```powershell
cd C:\Users\dohertj2\Desktop\mxaccessgw
dotnet build src\MxGateway.Worker -c Release   # produces bin\x86\Release\net48\MxGateway.Worker.exe
dotnet build src\MxGateway.Server -c Release   # produces bin\Release\net10.0\MxGateway.Server.dll
```

Initialize the auth database and mint an API key. The CLI mode is
gated by an `apikey` first-arg prefix:

```powershell
$env:MxGateway__ApiKeyPepper = "parity-rig-dev-pepper"   # any stable string for dev
$srv = "C:\Users\dohertj2\Desktop\mxaccessgw\src\MxGateway.Server\bin\Release\net10.0\MxGateway.Server.dll"

dotnet $srv apikey init-db        # → "init-db: initialized"

dotnet $srv apikey create-key `
    --key-id parity-rig `
    --display-name "OtOpcUa-Parity" `
    --scopes "session:open,session:close,invoke:read,invoke:write,invoke:secure,events:read,metadata:read"
# → "API key: mxgw_parity-rig_<base64suffix>"   ← capture this; you can't list secrets later
```

Save that exact key string for `OTOPCUA_PARITY_GW_API_KEY` in step 2.
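
Because the secret cannot be listed later, it can help to capture it from the CLI output rather than copying it by hand. The following is a minimal Python sketch, not part of the rig. It assumes the CLI prints a line of the form `API key: mxgw_<key-id>_<suffix>` as quoted in the comment above; `extract_api_key` is a hypothetical helper name.

```python
import re

# Assumed output shape, taken from the create-key comment above:
#   API key: mxgw_parity-rig_<base64suffix>
_KEY_LINE = re.compile(r"API key:\s*(mxgw_\S+)")


def extract_api_key(cli_output: str) -> str:
    """Return the first mxgw_ key found in the CLI output, or raise ValueError."""
    match = _KEY_LINE.search(cli_output)
    if match is None:
        raise ValueError("no 'API key: mxgw_...' line found in CLI output")
    return match.group(1)
```

Piping the `create-key` output through a helper like this and assigning the result to `OTOPCUA_PARITY_GW_API_KEY` keeps the key out of the clipboard and guards against trailing-whitespace mistakes.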

Run the server with the same pepper plus three env-var overrides — the
defaults don't quite match what gRPC + the parity test need:

```powershell
$env:MxGateway__ApiKeyPepper = "parity-rig-dev-pepper"   # MUST match the create-key invocation
$env:Kestrel__Endpoints__Http__Url = "http://localhost:5120"
$env:Kestrel__Endpoints__Http__Protocols = "Http2"        # gRPC needs h2c on plain HTTP
$env:MxGateway__Worker__ExecutablePath = `
    "C:\Users\dohertj2\Desktop\mxaccessgw\src\MxGateway.Worker\bin\x86\Release\net48\MxGateway.Worker.exe"
# appsettings.json's relative path is missing the \net48 segment; absolute path sidesteps that

dotnet $srv
# → "Now listening on: http://localhost:5120"
```

The worker spawns lazily on the first OpenSession RPC — there's no
worker process visible in Task Manager until the first session. If
the worker can't spawn, the server returns `Failed to open session
session-…` with a `WorkerProcessLaunchException` in the server log.

NSSM-wrap it later if the rig becomes long-lived; for first-pass
provisioning a console window is easier to inspect.
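
The double-underscore names used for the server overrides follow the standard .NET configuration convention: `__` in an environment variable name stands for the `:` section separator, so `MxGateway__Worker__ExecutablePath` addresses the `MxGateway:Worker:ExecutablePath` setting. The Python sketch below only illustrates that folding. It is not how the gateway loads its configuration, and `env_to_config` is a hypothetical helper name.

```python
def env_to_config(env: dict) -> dict:
    """Fold Section__Sub__Key style names into a nested dict, splitting on '__'."""
    root: dict = {}
    for name, value in env.items():
        *parents, leaf = name.split("__")
        node = root
        for part in parents:
            # Walk (or create) one nested section per path segment.
            node = node.setdefault(part, {})
        node[leaf] = value
    return root
```

Reading the overrides this way makes it easier to see which `appsettings.json` section each variable shadows.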

### 2. Set the parity env vars

In the test-runner shell:

```powershell
$env:OTOPCUA_PARITY_GW_ENDPOINT = "http://localhost:5120"
$env:OTOPCUA_PARITY_GW_API_KEY  = "mxgw_parity-rig_<base64suffix>"   # the exact key minted in step 1
$env:OTOPCUA_PARITY_CLIENT_NAME = "OtOpcUa-Parity"
```

Elevation status doesn't matter — the legacy Galaxy.Host pipe ACL accepts
elevated and non-elevated `dohertj2` shells alike (the Administrators deny
ACE was removed 2026-04-24; see `project_galaxy_host_installed.md`).
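
A quick preflight can catch an unset variable or a gateway that is not listening before `dotnet test` reports a skip reason. The Python sketch below is an illustration under assumptions: the three variable names come from the block above, while the helper names and the plain TCP probe are hypothetical and were never part of the parity harness.

```python
import socket
from urllib.parse import urlparse

REQUIRED_VARS = (
    "OTOPCUA_PARITY_GW_ENDPOINT",
    "OTOPCUA_PARITY_GW_API_KEY",
    "OTOPCUA_PARITY_CLIENT_NAME",
)


def missing_vars(env: dict) -> list:
    """Return the required variable names that are unset or empty in env."""
    return [name for name in REQUIRED_VARS if not env.get(name)]


def endpoint_host_port(endpoint: str) -> tuple:
    """Split an http(s) endpoint URL into (host, port), defaulting the port by scheme."""
    url = urlparse(endpoint)
    default_port = 443 if url.scheme == "https" else 80
    return url.hostname, url.port or default_port


def endpoint_reachable(endpoint: str, timeout: float = 2.0) -> bool:
    """True if a TCP connection to the endpoint's host and port succeeds."""
    try:
        with socket.create_connection(endpoint_host_port(endpoint), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling `missing_vars(dict(os.environ))` and then `endpoint_reachable(os.environ["OTOPCUA_PARITY_GW_ENDPOINT"])` from the test-runner shell covers the same ground as the `Test-NetConnection` check in the troubleshooting section. A successful TCP connect says nothing about whether the API key is accepted.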

### 3. Verify both halves resolve

```powershell
cd C:\Users\dohertj2\Desktop\lmxopcua
dotnet test tests\ZB.MOM.WW.OtOpcUa.Driver.Galaxy.ParityTests\ `
    --filter "FullyQualifiedName~HarnessShapeTests"
```

`Harness_records_a_skip_reason_for_each_unavailable_backend` is the
two-line truth-teller:

- `LegacyDriver` and `MxGatewayDriver` both non-null → rig is up.
- One side null → read its `LegacySkipReason` / `MxGatewaySkipReason` and fix.

## Running the matrix

Once both halves resolve:

```powershell
dotnet test tests\ZB.MOM.WW.OtOpcUa.Driver.Galaxy.ParityTests\ `
    --filter "Category=ParityE2E"
```

This runs all 17 scenario tests across the seven scenario classes
(BrowseAndRead / Subscribe / Write / Alarm / History / Reconnect /
ScanState). Each scenario class is independent — failures in one don't
block the rest.

Track the result against `docs/v2/Galaxy.ParityMatrix.md`. Update each
row to:

- **green** if the scenario passes
- **yellow** if it skipped because the dev Galaxy doesn't have the right
  shape (see coverage matrix below)
- **red** if it asserted a real delta — those are the deltas that block
  PR 7.2; chase each before retiring the legacy backend
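
The colour rules above reduce to a small lookup. The Python sketch below is illustrative only: the outcome strings (`passed`, `skipped`, `failed`) and the `deferred` flag are assumptions made for the example, not names taken from the test project. The `n/a (deferred)` value mirrors how this runbook reports the two ScanState rows on the single-platform dev box.

```python
def matrix_status(outcome: str, deferred: bool = False) -> str:
    """Map one scenario outcome to the status written into the parity matrix row."""
    if outcome == "passed":
        return "green"
    if outcome == "skipped":
        # A skip caused by missing Galaxy shape is yellow, unless the row was
        # explicitly deferred to a customer rig.
        return "n/a (deferred)" if deferred else "yellow"
    if outcome == "failed":
        return "red"  # a real delta between the two backends
    raise ValueError(f"unknown outcome: {outcome!r}")
```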

## Galaxy shape needed for full coverage

Skip-on-empty-shape scenarios fail-soft today. To turn a skip into a
real result, the dev Galaxy needs the shape in the **Needs** column:

| Scenario | Needs | Local rig |
|---|---|---|
| `BrowseAndReadParityTests` (3 tests) | Any deployed objects with attributes | ✅ existing seed |
| `SubscribeAndEventRateParityTests` event-rate | ≥5 attributes whose values *change* in 3s | ⚙ scriptable via graccess-cli |
| `WriteByClassificationParityTests` (FreeAccess/Operate) | A FreeAccess/Operate numeric attribute | ⚙ scriptable via graccess-cli |
| `WriteByClassificationParityTests` (Configure/Tune) | A Configure/Tune attribute | ⚙ scriptable via graccess-cli |
| `AlarmTransitionParityTests` (2 tests) | Attributes with the `$Alarm*` extension | ⚙ scriptable via graccess-cli |
| `HistoryReadParityTests` (historized set) | Attributes with the History extension | ⚙ scriptable via graccess-cli |
| `ScanStateProbeParityTests` (2 tests) | Multiple `$WinPlatform` / `$AppEngine` objects | ❌ **deferred to customer rig** — this dev box is provisioned for one platform only |

### The single-platform constraint

The dev box at `DESKTOP-6JL3KKO` is licensed / configured for a single
deployed `$WinPlatform`. Adding a second platform isn't feasible here,
so `ScanStateProbeParityTests` will skip in a "no overlap" branch on
this rig. Both of its scenarios already handle that case gracefully
(`Assert.Skip("no overlapping platform hosts between backends — likely
the transport names differ but no $WinPlatform was discovered")`), so
the matrix reports them as **n/a (deferred)** rather than red.

Plan: defer the two ScanState scenarios to a customer rig with multiple
platforms. The PR 7.2 gate accepts "n/a, deferred" on these rows
provided the legacy `GalaxyRuntimeProbeManager` and the in-process
`PerPlatformProbeWatcher` have matching unit-test coverage of the
state-decoder + member-tracking logic — which they do (PR 4.7's tests).
Treat the runtime parity check as a customer-rig acceptance gate before
that customer goes live, not a precondition for retiring the legacy
projects on this dev box.

### Provisioning the rest via graccess-cli

`C:\Users\dohertj2\Desktop\graccess\graccess_cli\` is a .NET Framework
4.8 console app over the ArchestrA GRAccess COM API. It can configure
templates, instances, attributes, UDAs, extensions, and attribute
security — i.e. every row above marked ⚙ scriptable. Full surface in
`graccess/graccess_cli/docs/usage.md` and per-area workflow guides
(`attribute-editing.md`, `template-editing.md`,
`template-instance-editing.md`).

Reserve a sandbox UDO (e.g. `OtOpcUaParityTest`) to avoid mutating
attributes on plant-relevant objects. Concrete commands per requirement:

**A FreeAccess/Operate numeric attribute** (covers WriteByClassification
FreeAccess/Operate scenario):

```powershell
graccess object uda add `
    --galaxy ZB --name OtOpcUaParityTest --type template `
    --uda OperateValue --data-type MxFloat `
    --category MxCategoryWriteable_C --security MxSecurityOperate `
    --confirm --confirm-target OtOpcUaParityTest
```

**A Configure / Tune attribute** (covers WriteByClassification
Configure/Tune scenario):

```powershell
# Tune
graccess object uda add `
    --galaxy ZB --name OtOpcUaParityTest --type template `
    --uda TuneValue --data-type MxFloat `
    --category MxCategoryWriteable_T --security MxSecurityTune `
    --confirm --confirm-target OtOpcUaParityTest

# Configure
graccess object uda add `
    --galaxy ZB --name OtOpcUaParityTest --type template `
    --uda ConfigValue --data-type MxFloat `
    --category MxCategoryWriteable_C --security MxSecurityConfigure `
    --confirm --confirm-target OtOpcUaParityTest
```

**A changing-value attribute** (covers Subscribe event-rate scenario).
Two ways:

1. *On-scan increment* — bind a script extension that bumps a counter
   each scan. Simplest to author with `object extension add` against
   `ScriptExtension` plus `object attribute set` for the script body
   (see `attribute-editing.md` §"Edit Extensions" for the pattern).
2. *External writer loop* — leave the attribute as plain Float and run
   a one-liner that writes incrementing values from the parity-test
   shell. Uses the legacy backend path so it's available before the
   mxgw subscriber is up. This keeps the Galaxy template clean.

For first-pass validation pick #2 — no template surgery needed, and the
write loop runs only during `dotnet test`.

**Attributes with the `$Alarm*` extension** (covers AlarmTransition
scenario). Per `attribute-editing.md` §"Edit Alarm Settings" the
likely-named attributes vary by extension type
(`Limit`, `RateOfChange`, etc.). Add the extension via:

```powershell
graccess object extension add `
    --galaxy ZB --name OtOpcUaParityTest --type template `
    --extension-type AnalogLimitAlarm --primitive AlarmInput `
    --object-extension `
    --confirm --confirm-target OtOpcUaParityTest
```

Then set HiHi/Hi/Lo/LoLo limit values + priority on the resulting
attributes via `object attribute set`. Inspect first via
`object attributes` to see the names the extension introduces — they
differ across AVEVA versions.

**Attributes with the History extension** (covers HistoryRead routing
scenario). History settings are usually attribute or extension
attributes; `attribute-editing.md` §"Edit History Settings" covers the
discovery flow. Quick start:

```powershell
graccess object extension add `
    --galaxy ZB --name OtOpcUaParityTest --type template `
    --extension-type HistoryExtension --primitive HistoryRecord `
    --object-extension `
    --confirm --confirm-target OtOpcUaParityTest

# Then enable history on whichever attribute the extension points at
graccess object attribute set `
    --galaxy ZB --name OtOpcUaParityTest --type template `
    --attribute HistoryEnabled --value true --data-type bool `
    --confirm --confirm-target OtOpcUaParityTest
```

**Deploy + restart Galaxy.Host after any of the above** so MxAccess
sees the change:

```powershell
graccess object deploy --galaxy ZB --name OtOpcUaParityTest_001 `
    --confirm --confirm-target OtOpcUaParityTest_001
# sc.exe has no "restart" verb, so use the PowerShell cmdlet.
Restart-Service OtOpcUaGalaxyHost   # service no longer exists post-PR-7.2; in the modern shape, restart mxaccessgw instead
```

Then re-run the parity matrix. The previously-skipped scenarios should
now find a sandbox attribute matching their selector and assert.

## Soak run

The 24h × 50k soak gates the production confidence half of PR 7.2.

```powershell
$env:OTOPCUA_SOAK_RUN = "1"
$env:OTOPCUA_SOAK_TAGS = "<actual tag count if Galaxy < 50k>"
$env:OTOPCUA_SOAK_MINUTES = "1440"   # default 24h; compress for first runs
$env:OTOPCUA_SOAK_DROP_PCT = "0.5"

dotnet test tests\ZB.MOM.WW.OtOpcUa.Driver.Galaxy.ParityTests\ `
    --filter "Category=Soak"
```

The test logs a per-minute CSV-style line to stdout:

```
soak,1.0,received=51234,dispatched=51234,dropped=0,ws_mb=412
soak,2.0,received=102468,dispatched=102468,dropped=0,ws_mb=415
...
```

Capture stdout to a file for post-run analysis. The three guards
(`received` growing, `dropped/received` ratio, working-set delta) all
fire mid-run rather than at end-of-test, so a failure surfaces within
the first few minutes if the architecture is wrong.
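
For the post-run analysis, the captured lines can be re-checked offline. The Python sketch below assumes the line format shown above, treats `received` and `dropped` as cumulative counters, and reads `OTOPCUA_SOAK_DROP_PCT` as a percentage. The 256 MB working-set budget and all function names are hypothetical; the real thresholds lived in the deleted `SoakScenarioTests.cs` and are not reproduced here.

```python
import re
from typing import NamedTuple

_SOAK_LINE = re.compile(
    r"^soak,(?P<minute>\d+(?:\.\d+)?),received=(?P<received>\d+),"
    r"dispatched=(?P<dispatched>\d+),dropped=(?P<dropped>\d+),ws_mb=(?P<ws_mb>\d+)$"
)


class Sample(NamedTuple):
    minute: float
    received: int
    dispatched: int
    dropped: int
    ws_mb: int


def parse_soak_log(lines):
    """Parse per-minute soak lines, ignoring anything that does not match."""
    samples = []
    for line in lines:
        m = _SOAK_LINE.match(line.strip())
        if m:
            samples.append(Sample(float(m["minute"]), int(m["received"]),
                                  int(m["dispatched"]), int(m["dropped"]),
                                  int(m["ws_mb"])))
    return samples


def check_guards(samples, drop_pct=0.5, max_ws_growth_mb=256):
    """Return a list of human-readable guard violations (empty means clean)."""
    problems = []
    # Guard 1: the cumulative received count must keep growing.
    for prev, cur in zip(samples, samples[1:]):
        if cur.received <= prev.received:
            problems.append(f"minute {cur.minute}: received did not grow")
    for s in samples:
        # Guard 2: dropped/received ratio, expressed as a percentage.
        if s.received and 100.0 * s.dropped / s.received > drop_pct:
            problems.append(f"minute {s.minute}: drop ratio above {drop_pct}%")
        # Guard 3: working-set growth relative to the first sample.
        if s.ws_mb - samples[0].ws_mb > max_ws_growth_mb:
            problems.append(f"minute {s.minute}: working set grew past budget")
    return problems
```

Feeding the saved stdout file through `parse_soak_log` and `check_guards` gives a compact summary of any minute where a guard would have tripped.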

## Compressed-tag soak (when Galaxy isn't 50k tags)

A first-pass validation is fine with the override:

```powershell
$env:OTOPCUA_SOAK_RUN = "1"
$env:OTOPCUA_SOAK_TAGS = "500"        # whatever the dev Galaxy has
$env:OTOPCUA_SOAK_MINUTES = "60"      # one hour is enough to surface plumbing bugs
$env:OTOPCUA_SOAK_DROP_PCT = "1.0"
```

This validates the *plumbing* (bounded channel, pump invariants, leak
guard) but doesn't pin the 50k-tag scaling assertion. Defer the full
50k validation to a customer rig with that scale, or build a synthetic
Galaxy with a script that imports 50k attributes onto a generated UDO
(~2 hours of one-off work).

## Troubleshooting

- **`MxGatewaySkipReason` says "mxaccessgw not reachable"** — the gw
  isn't listening, or it's on a different port. `Test-NetConnection
  localhost -Port 5120` is the quick check.
- **`MxGatewaySkipReason` says "mxgateway backend boot failed:
  RpcException: Unauthenticated"** — API key mismatch. Verify the
  `OTOPCUA_PARITY_GW_API_KEY` env var matches the gw's configured key.
- **`LegacySkipReason` says "Galaxy ZB SQL not reachable on
  localhost:1433"** — SQL Server isn't running, or its TCP listener is
  off. Check `services.msc` for the SQL Server (default) instance.
- **`LegacySkipReason` says "Galaxy.Host EXE not built"** — at rig time
  the parity harness looked under
  `src/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Host/bin/Debug/net48/` for the
  EXE it spawned as a subprocess, separate from the published copy at
  `C:\publish\OtOpcUaGalaxyHost\` used by the Windows service. **Both
  the source project and the published binary were removed in PR 7.2,
  so this troubleshooting branch no longer applies — the legacy half
  cannot be brought up at all.**
- **Both halves resolve but parity scenarios assert deltas** — that's
  the expected outcome the rig exists to surface. Review each delta
  against `docs/v2/Galaxy.ParityMatrix.md`'s "Accepted deltas" section
  to decide whether it's a real bug or a pre-accepted divergence.

## After the rig is green

When the matrix is fully green or carries documented accepted-deltas,
PR 7.2 (legacy project deletion) is unblocked. The only follow-up is
to promote any newly-discovered accepted-delta to the matrix doc with
the why so the matrix history stays auditable.