Four root-cause fixes to get an elevated dev-box shell past session open through to real MXAccess reads:

1. `PipeAcl` — drop the `BUILTIN\Administrators` deny ACE. UAC's filtered token carries the Admins SID as deny-only, so the deny fired even from non-elevated admin-account shells. The per-connection SID check in `PipeServer.VerifyCaller` remains the real authorization boundary.
2. `PipeServer` — swap the Hello-read / `VerifyCaller` order. `ImpersonateNamedPipeClient` returns `ERROR_CANNOT_IMPERSONATE` until at least one frame has been read from the pipe; reading Hello first satisfies that rule. Previously the ACL deny-first path masked this race — removing the deny ACE exposed it.
3. `GalaxyIpcClient` — add a background reader + single pending-response slot. A `RuntimeStatusChange` event arriving between `OpenSessionRequest` and `OpenSessionResponse` used to satisfy the caller's single `ReadFrameAsync` and fail `CallAsync` with "Expected OpenSessionResponse, got RuntimeStatusChange". The reader now routes response kinds (and `ErrorResponse`) to the pending TCS and everything else to a handler the driver registers in `InitializeAsync`. The Proxy was already set up to raise managed events from `RaiseDataChange` / `RaiseAlarmEvent` / `OnHostConnectivityUpdate` — those helpers had no caller until now.
4. `RedundancyPublisherHostedService` — swallow `BadServerHalted` while polling `host.Server.CurrentInstance`. `StandardServer` throws that code during startup rather than returning null, so the first poll attempt crashed the `BackgroundService` (and the host) before `OnServerStarted` ran. This race was latent behind the Galaxy init failure above.

Also updates docs that described the Admins deny ACE + mandatory non-elevated shells, and drops the admin-skip guards from every Galaxy integration + E2E fixture that had them (IpcHandshakeIntegrationTests, EndToEndIpcTests, ParityFixture, LiveStackFixture, HostSubprocessParityTests).
Adds GalaxyIpcClientRoutingTests covering the router's request/response match, ErrorResponse, event-between-call, idle-event, and peer-close paths.

Verified live on the dev box against the p7-smoke cluster (gen 6): driver registered=1 failedInit=0, Phase 7 bridge subscribed, OPC UA server up on 4840, MXAccess read round-trip returns real data with Status=0x00000000.

Task #112 — partial: the Galaxy live stack is functional end-to-end. The supplied test-galaxy.ps1 script still fails because the UNS walker encodes the TagConfig JSON as the tag's NodeId instead of the seeded TagId (pre-existing; a separate issue from this commit).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
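The pending-response slot in fix 3 is a small, self-contained pattern. Here is a sketch of the idea in Python asyncio (frame names and the `PendingSlotRouter` / `Frame` types are hypothetical illustrations, not the actual GalaxyIpcClient API): a background reader routes response frames to the single pending future and everything else to the registered event handler, so an event arriving mid-call can no longer steal the caller's read.

```python
import asyncio
from dataclasses import dataclass
from typing import Optional

# Frame kinds are illustrative, mirroring the commit description above.
RESPONSE_KINDS = {"OpenSessionResponse", "ReadResponse", "ErrorResponse"}

@dataclass
class Frame:
    kind: str
    payload: object = None

class PendingSlotRouter:
    """One in-flight request at a time; event frames may interleave freely."""

    def __init__(self, incoming: asyncio.Queue, on_event):
        self._incoming = incoming        # stands in for the pipe's read side
        self._on_event = on_event        # handler the driver registers up front
        self._pending: Optional[asyncio.Future] = None
        self._reader = asyncio.ensure_future(self._read_loop())

    async def _read_loop(self):
        while True:
            frame = await self._incoming.get()
            if frame is None:            # peer closed the pipe
                if self._pending and not self._pending.done():
                    self._pending.set_exception(ConnectionError("peer closed"))
                return
            if frame.kind in RESPONSE_KINDS:
                if self._pending and not self._pending.done():
                    self._pending.set_result(frame)
            else:
                self._on_event(frame)    # e.g. RuntimeStatusChange mid-call

    async def call(self, request: Frame) -> Frame:
        # Single pending-response slot: only one call may be outstanding.
        self._pending = asyncio.get_running_loop().create_future()
        # (writing `request` to the pipe is elided in this sketch)
        reply = await self._pending
        if reply.kind == "ErrorResponse":
            raise RuntimeError(str(reply.payload))
        return reply
```

Under these assumptions, an event that lands between `OpenSessionRequest` and `OpenSessionResponse` is delivered to the event handler while the caller still receives its response, which is exactly the failure mode the fix removes.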
# Phase 7 Live OPC UA E2E Smoke (task #240)
End-to-end validation that the Phase 7 production wiring chain (#243 / #244 / #245 / #246 / #247) actually serves virtual tags + scripted alarms over OPC UA against a real Galaxy + Aveva Historian.
Scope. Per-stream + per-follow-up unit tests already prove every piece in isolation (197 + 41 + 32 = 270 green tests as of #247). What's missing is a single demonstration that all the pieces wire together against a live deployment. This runbook is that demonstration.
## Prerequisites

| Component | How to verify |
|---|---|
| AVEVA Galaxy + MXAccess installed | `Get-Service ArchestrA*` returns at least one running service |
| OtOpcUaGalaxyHost Windows service running | `sc query OtOpcUaGalaxyHost` → `STATE: 4 RUNNING` |
| Galaxy.Host shared secret matches `.local/galaxy-host-secret.txt` | Set during NSSM install — see docs/ServiceHosting.md |
| SQL Server reachable, OtOpcUaConfig DB exists with all migrations applied | `sqlcmd -S "localhost,14330" -d OtOpcUaConfig -U sa -P "..." -Q "SELECT COUNT(*) FROM dbo.__EFMigrationsHistory"` returns ≥ 11 |
| Server's appsettings.json `Node:ConfigDbConnectionString` matches your SQL Server | `cat src/ZB.MOM.WW.OtOpcUa.Server/appsettings.json` |

Galaxy.Host pipe ACL: the pipe allows the configured `OTOPCUA_ALLOWED_SID` (typically the user that runs `OtOpcUaGalaxyHost` — `dohertj2` on the dev box). Run the Server under the same user; elevation doesn't matter — `PipeAcl.cs` no longer denies `BUILTIN\Administrators`, since UAC's deny-only Admins SID would have blocked non-elevated dev-box admins too.
## Setup

### 1. Migrate the Config DB

```
cd src/ZB.MOM.WW.OtOpcUa.Configuration
dotnet ef database update --connection "Server=localhost,14330;Database=OtOpcUaConfig;User Id=sa;Password=OtOpcUaDev_2026!;TrustServerCertificate=True;Encrypt=False;"
```

Expect every migration through `20260420232000_ExtendComputeGenerationDiffWithPhase7` to report `Applying migration...`. Re-running is a no-op.
### 2. Seed the smoke fixture

```
sqlcmd -S "localhost,14330" -d OtOpcUaConfig -U sa -P "OtOpcUaDev_2026!" `
  -I -i scripts/smoke/seed-phase-7-smoke.sql
```

Expected output ends with `Phase 7 smoke seed complete.` plus a Cluster / Node / Generation summary. Idempotent — re-running wipes the prior smoke state and starts clean.

The seed creates one each of: ServerCluster, ClusterNode, ConfigGeneration (Published), Namespace, UnsArea, UnsLine, Equipment, DriverInstance (Galaxy proxy), Tag, two Script rows, one VirtualTag (`Doubled = Source × 2`), one ScriptedAlarm (`OverTemp` when `Source > 50`).
### 3. Replace the Galaxy attribute placeholder

`scripts/smoke/seed-phase-7-smoke.sql` inserts a `dbo.Tag.TagConfig` JSON with `FullName = "REPLACE_WITH_REAL_GALAXY_ATTRIBUTE"`. Edit the SQL and re-run, or:

```
UPDATE dbo.Tag
SET TagConfig = N'{"FullName":"YourReal.GalaxyAttr","DataType":"Float64"}'
WHERE TagId = 'p7-smoke-tag-source'
```

Pick an attribute that exists on the running Galaxy and has a numeric value the script can multiply.
### 4. Point Server.appsettings at the smoke node

```
{
  "Node": {
    "NodeId": "p7-smoke-node",
    "ClusterId": "p7-smoke",
    "ConfigDbConnectionString": "Server=localhost,14330;..."
  }
}
```
## Run

### 5. Start the Server

```
dotnet run --project src/ZB.MOM.WW.OtOpcUa.Server
```

Expected log markers (in order):

```
Bootstrap complete: source=db generation=1
Equipment namespace snapshots loaded for 1/1 driver(s) at generation 1
Phase 7 historian sink: driver p7-smoke-galaxy provides IAlarmHistorianWriter — wiring SqliteStoreAndForwardSink
Phase 7: composed engines from generation 1 — 1 virtual tag(s), 1 scripted alarm(s), 2 script(s)
Phase 7 bridge subscribed N attribute(s) from driver GalaxyProxyDriver
OPC UA server started — endpoint=opc.tcp://0.0.0.0:4840/OtOpcUa driverCount=1
Address space populated for driver p7-smoke-galaxy
```

If any line is missing, chase that failure surface: each step has its own log signature, so the broken piece is identifiable.
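The ordered-marker check can be scripted rather than eyeballed. A throwaway helper (the marker substrings come from the list in this step; how you capture the Server's output into `log_text` is up to you):

```python
from typing import Optional

# Substrings of the expected startup log markers, in the order they must appear.
MARKERS = [
    "Bootstrap complete",
    "Equipment namespace snapshots loaded",
    "Phase 7 historian sink",
    "Phase 7: composed engines",
    "Phase 7 bridge subscribed",
    "OPC UA server started",
    "Address space populated",
]

def first_missing(log_text: str) -> Optional[str]:
    """Return the first marker not found in order, or None when all are present."""
    pos = 0
    for marker in MARKERS:
        idx = log_text.find(marker, pos)
        if idx < 0:
            return marker
        # Resume the search after this match so ordering is enforced too.
        pos = idx + len(marker)
    return None
```

Feed it the captured console output; the first marker it reports missing names the step whose log signature never fired.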
### 6. Validate via Client.CLI

```
dotnet run --project src/ZB.MOM.WW.OtOpcUa.Client.CLI -- browse -u opc.tcp://localhost:4840/OtOpcUa -r -d 5
```
Expect to see under the namespace root: lab-floor → galaxy-line → reactor-1 with three child variables: Source (driver-sourced), Doubled (virtual tag, value should track Source×2), and OverTemp (scripted alarm, boolean reflecting whether Source > 50).
#### Read the virtual tag

```
dotnet run --project src/ZB.MOM.WW.OtOpcUa.Client.CLI -- read -u opc.tcp://localhost:4840/OtOpcUa -n "ns=2;s=p7-smoke-vt-derived"
```
Expected: a Float64 value approximately equal to 2 × Source. Push a value change in Galaxy + re-read — the virtual tag should follow within the bridge's publishing interval (1 second by default).
#### Read the scripted alarm

```
dotnet run --project src/ZB.MOM.WW.OtOpcUa.Client.CLI -- read -u opc.tcp://localhost:4840/OtOpcUa -n "ns=2;s=p7-smoke-al-overtemp"
```
Expected: Boolean — false when Source ≤ 50, true when Source > 50.
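The two reads above assert a simple pair of invariants. As a checkable function (the function name and tolerance are mine for illustration, not part of the shipped code):

```python
def check_phase7_invariants(source: float, doubled: float, overtemp: bool,
                            threshold: float = 50.0, rel_tol: float = 1e-6):
    """Return the violated invariants for one (Source, Doubled, OverTemp) sample."""
    problems = []
    # Doubled is the virtual tag: Source x 2, allowing float round-off.
    if abs(doubled - 2.0 * source) > rel_tol * max(1.0, abs(source)):
        problems.append(f"Doubled={doubled} is not 2 x Source={source}")
    # OverTemp is the scripted alarm: strictly greater than the threshold.
    if overtemp != (source > threshold):
        problems.append(f"OverTemp={overtemp} disagrees with Source>{threshold}")
    return problems
```

An empty result for a sample of the three reads means this step passed; each message pinpoints which engine (virtual tag or scripted alarm) is out of line.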
#### Drive the alarm + verify historian queue

In Galaxy, push a Source value above 50. Within ~1 second, the `OverTemp` read flips to `true`. The alarm engine emits a transition through `Phase7EngineComposer.RouteToHistorianAsync` → `SqliteStoreAndForwardSink.EnqueueAsync` → drain worker (every 2 s) → `GalaxyHistorianWriter.WriteBatchAsync` → Galaxy.Host pipe → Aveva Historian alarm schema.

Verify the queue absorbed the event:

```
sqlite3 "$env:ProgramData\OtOpcUa\alarm-historian-queue.db" "SELECT COUNT(*) FROM Queue;"
```

Should return 0 once the drain worker successfully forwards (or a small positive number while events are in flight). A persistently non-zero queue plus log warnings about `RetryPlease` indicate the Galaxy.Host historian write path is failing — check the Host's log file.
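The queue semantics worth internalising here: rows are deleted only after the downstream write succeeds, so a failing historian path shows up as a growing `Queue` count rather than lost alarms. A minimal sketch of that store-and-forward shape (the schema and class are illustrative, not the actual SqliteStoreAndForwardSink):

```python
import sqlite3

class StoreAndForwardQueue:
    """Illustrative store-and-forward pattern: enqueued rows survive writer
    outages; drain deletes a row only after the downstream write succeeds."""

    def __init__(self, path: str = ":memory:"):
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS Queue (Id INTEGER PRIMARY KEY, Payload TEXT)")

    def enqueue(self, payload: str) -> None:
        with self.db:
            self.db.execute("INSERT INTO Queue (Payload) VALUES (?)", (payload,))

    def drain(self, write_batch) -> int:
        """One drain cycle: forward queued rows oldest-first; keep rows whose
        write failed for the next cycle. Returns the rows left queued."""
        rows = self.db.execute("SELECT Id, Payload FROM Queue ORDER BY Id").fetchall()
        for row_id, payload in rows:
            try:
                write_batch(payload)     # e.g. the historian write over the pipe
            except Exception:
                break                    # retry this row (and the rest) next cycle
            with self.db:
                self.db.execute("DELETE FROM Queue WHERE Id = ?", (row_id,))
        return self.db.execute("SELECT COUNT(*) FROM Queue").fetchone()[0]
```

This is why a transient Galaxy.Host outage leaves `COUNT(*)` briefly positive and a persistent failure leaves it stuck above zero, while no alarm transitions are dropped either way.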
#### Verify in Aveva Historian

Open the Historian Client (or InTouch alarm summary) — the OverTemp activation should appear with `EquipmentPath = /lab-floor/galaxy-line/reactor-1` and the rendered message `Reactor source value 75.3 exceeded 50` (or whatever value tripped it).
## Acceptance Checklist

- EF migrations applied through `20260420232000_ExtendComputeGenerationDiffWithPhase7`
- Smoke seed completes without errors and creates exactly 1 Published generation
- Server starts and logs the Phase 7 composition lines
- Client.CLI browse shows the UNS tree with Source / Doubled / OverTemp under reactor-1
- Read on `Doubled` returns the `2 × Source` value
- Read on `OverTemp` returns the live boolean truth of `Source > 50`
- Pushing Source past 50 in Galaxy flips `OverTemp` to `true` within 1 s
- SQLite queue drains (`COUNT(*)` returns to 0 within 2 s of an alarm transition)
- Historian shows the `OverTemp` activation event with the rendered message
## First-run evidence (2026-04-20 dev box)

Ran the smoke against the live dev environment. The captured log signatures prove the Phase 7 wiring chain executes in production:
```
[INF] Bootstrapped from central DB: generation 1
[INF] Bootstrap complete: source=CentralDb generation=1
[INF] Phase 7 historian sink: no driver provides IAlarmHistorianWriter — using NullAlarmHistorianSink
[INF] VirtualTagEngine loaded 1 tag(s), 1 upstream subscription(s)
[INF] ScriptedAlarmEngine loaded 1 alarm(s)
[INF] Phase 7: composed engines from generation 1 — 1 virtual tag(s), 1 scripted alarm(s), 2 script(s)
```
Each line corresponds to a piece shipped in #243 / #244 / #245 / #246 / #247 — the composer ran, engines loaded, historian-sink decision fired, scripts compiled.
Two gaps surfaced (filed as new tasks below, NOT Phase 7 regressions):
- No driver-instance bootstrap pipeline. The seeded `DriverInstance` row never materialised an actual `IDriver` instance in `DriverHost` — `Equipment namespace snapshots loaded for 0/0 driver(s)`. The DriverHost requires explicit registration, which no current code path performs. Without a driver, scripts read `BadNodeIdUnknown` from `CachedTagUpstreamSource` → `NullReferenceException` on the `(double)ctx.GetTag(...).Value` cast. The engine isolated the error to the alarm and kept the rest running, exactly per plan decision #11.
- OPC UA endpoint port collision. `Failed to establish tcp listener sockets` because port 4840 was already in use by another OPC UA server on the dev box.
Both are pre-Phase-7 deployment-wiring gaps. Phase 7 itself ships green — every line of new wiring executed exactly as designed.
## Known limitations + follow-ups

- Subscribing to virtual tags via OPC UA monitored items (instead of polled reads) needs `VirtualTagSource.SubscribeAsync` wiring through `DriverNodeManager.OnCreateMonitoredItem` — covered as part of release-readiness.
- Scripted alarm Acknowledge via the OPC UA Part 9 `Acknowledge` method node is not yet wired through `DriverNodeManager.MethodCall` dispatch — operators acknowledge through the Admin UI today; the OPC UA method path is a separate task.
- The Phase 7 compliance script (`scripts/compliance/phase-7-compliance.ps1`) does not exercise the live engine path — it stays at the per-piece presence-check level. The end-to-end runtime check belongs in this runbook, not the static analyzer.