Files
lmxopcua/docs/v2/implementation/phase-7-e2e-smoke.md
Joseph Doherty a52086efc5 Refresh phase-7-e2e-smoke.md to match current wiring
The runbook shipped at phase-7 close (2026-04-20) described the original
`Doubled = Source × 2` virtual tag, Float64 seed, and flat TagId-shaped
NodeIds. Four commits later the wiring has moved:

- Seed now targets `TestMachine_001.TestHistoryValue` (Int32, writable,
  historized) — no placeholder to fill in for the dev box.
- VirtualTag is `MachineStatus` (Boolean, `Source > 0`, historized).
- NodeIds are path-based per OPC UA Part 3 §5.2.2
  (`{driverId}/{folder-path}/{browseName}`).
- Seed inserts the ClusterNodeCredential row — without it the Server
  bootstrap fails `Unauthorized: caller X is not bound to NodeId`.

Changes:

1. Step 3 — replace "edit the placeholder" instructions with the ZB
   Galaxy-Repository query that finds writable historized attributes
   (dpc CTE + HistoryExtension EXISTS + `security_classification > 0`).
2. New step 4a — LDAP + `SecurityProfile = Basic256Sha256-Sign` recipe
   for the reverse-bridge + alarm-fires stages. Anonymous sessions are
   denied writes against `Operate`-classified attributes (PR 26 gate);
   `writeop / writeop123` against the dev-box GLAuth clears it.
3. Step 6 validation commands updated to the new NodeIds + reference
   the path-based scheme's Part-3 rationale.
4. Drive-the-alarm snippet now calls `otopcua-cli write … -U writeop`
   so operators see the explicit auth step.
5. Acceptance checklist updated for the new tag names + the
   test-galaxy.ps1 `-Username` invocation.
6. Added a 2026-04-24 second-run evidence section alongside the original
   — documents the 3/7 anonymous ceiling and what's needed to reach 7/7.

No code or seed changes in this commit — doc-only.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 18:13:27 -04:00

14 KiB

Phase 7 Live OPC UA E2E Smoke (task #240)

End-to-end validation that the Phase 7 production wiring chain (#243 / #244 / #245 / #246 / #247) actually serves virtual tags + scripted alarms over OPC UA against a real Galaxy + Aveva Historian.

Scope. Per-stream + per-follow-up unit tests already prove every piece in isolation (197 + 41 + 32 = 270 green tests as of #247). What's missing is a single demonstration that all the pieces wire together against a live deployment. This runbook is that demonstration.

Prerequisites

Component How to verify
AVEVA Galaxy + MXAccess installed Get-Service ArchestrA* returns at least one running service
OtOpcUaGalaxyHost Windows service running sc query OtOpcUaGalaxyHostSTATE: 4 RUNNING
Galaxy.Host shared secret matches .local/galaxy-host-secret.txt Set during NSSM install — see docs/ServiceHosting.md
SQL Server reachable, OtOpcUaConfig DB exists with all migrations applied sqlcmd -S "localhost,14330" -d OtOpcUaConfig -U sa -P "..." -Q "SELECT COUNT(*) FROM dbo.__EFMigrationsHistory" returns ≥ 11
Server's appsettings.json Node:ConfigDbConnectionString matches your SQL Server cat src/ZB.MOM.WW.OtOpcUa.Server/appsettings.json

Galaxy.Host pipe ACL. The pipe allows the configured OTOPCUA_ALLOWED_SID (typically the user that runs OtOpcUaGalaxyHostdohertj2 on the dev box). Run the Server under the same user; elevation doesn't matter — PipeAcl.cs no longer denies BUILTIN\Administrators since UAC's deny-only Admins SID would have blocked non-elevated dev-box admins too.

Setup

1. Migrate the Config DB

cd src/ZB.MOM.WW.OtOpcUa.Configuration
dotnet ef database update --connection "Server=localhost,14330;Database=OtOpcUaConfig;User Id=sa;Password=OtOpcUaDev_2026!;TrustServerCertificate=True;Encrypt=False;"

Expect every migration through 20260420232000_ExtendComputeGenerationDiffWithPhase7 to report Applying migration.... Re-running is a no-op.

2. Seed the smoke fixture

sqlcmd -S "localhost,14330" -d OtOpcUaConfig -U sa -P "OtOpcUaDev_2026!" `
       -I -i scripts/smoke/seed-phase-7-smoke.sql

Expected output ends with Phase 7 smoke seed complete. plus a Cluster / Node / Generation summary. Idempotent — re-running wipes the prior smoke state and starts clean.

The seed creates one each of: ServerCluster, ClusterNode, ClusterNodeCredential (binds the SQL login to the node — without this sp_GetCurrentGenerationForCluster returns Unauthorized: caller X is not bound to NodeId p7-smoke-node), ConfigGeneration (Published), Namespace, UnsArea, UnsLine, Equipment, DriverInstance (Galaxy proxy), Tag, two Script rows, one VirtualTag (MachineStatus = Source > 0, Boolean, historized), one ScriptedAlarm (OverTemp when Source > 50).

3. (Optional) Swap the Galaxy attribute

The shipped seed points dbo.Tag.TagConfig at TestMachine_001.TestHistoryValue — the dev-box Galaxy ships it as Int32, writable (security_classification = Operate), and historized (HistoryExtension primitive), so every E2E stage has a real live target. To swap to another attribute on a different Galaxy, pick a candidate via the same shape:

-- Run against the Galaxy Repository DB (ZB).
;WITH dpc AS (
    SELECT g.gobject_id, p.package_id, p.derived_from_package_id, 0 AS depth
    FROM gobject g INNER JOIN package p ON p.package_id = g.deployed_package_id
    WHERE g.is_template = 0 AND g.deployed_package_id <> 0
    UNION ALL
    SELECT c.gobject_id, p.package_id, p.derived_from_package_id, c.depth + 1
    FROM dpc c INNER JOIN package p ON p.package_id = c.derived_from_package_id
    WHERE c.derived_from_package_id <> 0 AND c.depth < 10
)
SELECT DISTINCT g.tag_name + '.' + da.attribute_name AS full_ref,
       dt.description AS dtype, da.security_classification
FROM dpc
INNER JOIN dynamic_attribute da ON da.package_id = dpc.package_id
INNER JOIN gobject g ON g.gobject_id = dpc.gobject_id
LEFT JOIN data_type dt ON dt.mx_data_type = da.mx_data_type
WHERE da.attribute_name NOT LIKE '[_]%'
  AND da.attribute_name NOT LIKE '%.Description'
  AND da.mx_data_type IN (1, 2, 3, 4)
  AND da.security_classification > 0  -- writable
  AND EXISTS (
      SELECT 1 FROM primitive_instance pi
      INNER JOIN primitive_definition pd
          ON pd.primitive_definition_id = pi.primitive_definition_id
          AND pd.primitive_name = 'HistoryExtension'
      WHERE pi.package_id = dpc.package_id AND pi.primitive_name = da.attribute_name)
ORDER BY full_ref;

Then update the seed:

UPDATE dbo.Tag
   SET TagConfig = N'{"FullName":"YourReal.GalaxyAttr","DataType":"Int32"}'
 WHERE TagId = 'p7-smoke-tag-source';

4. Point Server.appsettings at the smoke node

{
  "Node": {
    "NodeId":    "p7-smoke-node",
    "ClusterId": "p7-smoke",
    "ConfigDbConnectionString": "Server=localhost,14330;..."
  }
}

4a. (Optional) Enable LDAP + SecurityProfile for the write stage

Anonymous OPC UA sessions are denied writes against Operate-classified tags by the PR 26 server-layer classification gate. To exercise the reverse-bridge + alarm-fires stages fully, the Server has to advertise a UserName UserTokenPolicy (any profile other than None) and authenticate against LDAP.

{
  "OpcUa": {
    "SecurityProfile": "Basic256Sha256-Sign",
    "Ldap": {
      "Enabled": true,
      "Server":  "localhost",
      "Port":    3893,
      "SearchBase": "dc=lmxopcua,dc=local",
      "ServiceAccountDn": "cn=serviceaccount,dc=lmxopcua,dc=local",
      "ServiceAccountPassword": "serviceaccount123",
      "GroupToRole": {
        "ReadOnly":       "ReadOnly",
        "WriteOperate":   "WriteOperate",
        "WriteTune":      "WriteTune",
        "WriteConfigure": "WriteConfigure",
        "AlarmAck":       "AlarmAck"
      }
    }
  }
}

Dev-box GLAuth ships writeop / writeop123 in the WriteOperate group, admin / admin123 across all write groups. See C:\publish\glauth\auth.md.

Run

5. Start the Server

dotnet run --project src/ZB.MOM.WW.OtOpcUa.Server

Expected log markers (in order):

Bootstrap complete: source=db generation=1
Equipment namespace snapshots loaded for 1/1 driver(s) at generation 1
Phase 7 historian sink: driver p7-smoke-galaxy provides IAlarmHistorianWriter — wiring SqliteStoreAndForwardSink
Phase 7: composed engines from generation 1 — 1 virtual tag(s), 1 scripted alarm(s), 2 script(s)
Phase 7 bridge subscribed N attribute(s) from driver GalaxyProxyDriver
OPC UA server started — endpoint=opc.tcp://0.0.0.0:4840/OtOpcUa driverCount=1
Address space populated for driver p7-smoke-galaxy

Any line missing = follow up the failure surface (each step has its own log signature so the broken piece is identifiable).

6. Validate via Client.CLI

dotnet run --project src/ZB.MOM.WW.OtOpcUa.Client.CLI -- browse -u opc.tcp://localhost:4840/OtOpcUa -r -d 5

Expect to see under the namespace root: lab-floor → galaxy-line → reactor-1 with three child variables: Source (driver-sourced Int32), MachineStatus (virtual tag Boolean, Source > 0), and OverTemp (scripted alarm Boolean, Source > 50). NodeIds are path-based per OPC UA Part 3 §5.2.2 — the walker mints them from {driverId}/{folder-path}/{browseName} and stores the driver-side FullReference in an internal NodeId→FullRef map, so client subscriptions survive backend address renames.

Read the virtual tag

dotnet run --project src/ZB.MOM.WW.OtOpcUa.Client.CLI -- read `
    -u opc.tcp://localhost:4840/OtOpcUa `
    -n "ns=2;s=p7-smoke-galaxy/lab-floor/galaxy-line/reactor-1/MachineStatus"

Expected: Boolean. Push a value change into the Source Galaxy attribute and re-read — MachineStatus should follow within the bridge's publishing interval (1 second by default).

Read the scripted alarm

dotnet run --project src/ZB.MOM.WW.OtOpcUa.Client.CLI -- read `
    -u opc.tcp://localhost:4840/OtOpcUa `
    -n "ns=2;s=p7-smoke-galaxy/lab-floor/galaxy-line/reactor-1/OverTemp"

Expected: Booleanfalse when Source ≤ 50, true when Source > 50.

Drive the alarm + verify historian queue

Push a Source value above 50 — either from Galaxy itself, or via the Server's OPC UA write path using LDAP credentials (step 4a). Within ~1 second, OverTemp.Read flips to true. The alarm engine emits a transition to Phase7EngineComposer.RouteToHistorianAsyncSqliteStoreAndForwardSink.EnqueueAsync → drain worker (every 2s) → GalaxyHistorianWriter.WriteBatchAsync → Galaxy.Host pipe → Aveva Historian alarm schema.

# OPC UA write path — requires LDAP from step 4a + a writeop-class user.
dotnet run --project src/ZB.MOM.WW.OtOpcUa.Client.CLI -- write `
    -u opc.tcp://localhost:4840/OtOpcUa -S sign `
    -n "ns=2;s=p7-smoke-galaxy/lab-floor/galaxy-line/reactor-1/Source" `
    -v 75 -U writeop -P writeop123

Verify the queue absorbed the event:

sqlite3 "$env:ProgramData\OtOpcUa\alarm-historian-queue.db" "SELECT COUNT(*) FROM Queue;"

Should return 0 once the drain worker successfully forwards (or a small positive number while in-flight). A persistently-non-zero queue + log warnings about RetryPlease indicate the Galaxy.Host historian write path is failing — check the Host's log file.

Verify in Aveva Historian

Open the Historian Client (or InTouch alarm summary) — the OverTemp activation should appear with EquipmentPath = /lab-floor/galaxy-line/reactor-1 + the rendered message Reactor source value 75.3 exceeded 50 (or whatever value tripped it).

Acceptance Checklist

  • EF migrations applied through 20260420232000_ExtendComputeGenerationDiffWithPhase7
  • Smoke seed completes without errors + creates exactly 1 Published generation
  • Server starts + logs the Phase 7 composition lines
  • Client.CLI browse shows the UNS tree with Source / MachineStatus / OverTemp under reactor-1
  • Read on Source returns a Good-quality Int32 value (proves MXAccess round-trip)
  • Read on MachineStatus returns the live boolean truth of Source > 0
  • Read on OverTemp returns the live boolean truth of Source > 50
  • test-galaxy.ps1 -Username writeop -Password writeop123 drives Source past 50 and flips OverTemp to true within 1 s
  • SQLite queue drains (COUNT(*) returns to 0 within 2 s of an alarm transition)
  • Historian shows the OverTemp activation event with the rendered message

Second-run evidence (2026-04-24 dev box)

Full live stack ran end-to-end once the IPC unblocks (commit d11dd05), path-based NodeIds (commit 8be82e0), cold-start engine guards (commit 69e1d32), and seed retarget to TestMachine_001.TestHistoryValue (commit ec1a590) landed. Anonymous scripts/e2e/test-galaxy.ps1 run reaches 3/7:

[PASS] source NodeId readable (Galaxy pipe → proxy → server → client chain up)
[PASS] source value = System.Byte[]
[INFO] BadUserAccessDenied — attribute's Galaxy-side ACL blocks writes for this session.

The INFO stage is correct behaviour — Source is Operate-classified and the anonymous session carries no LDAP roles. The Virtual-tag / Subscribe / Alarm / History stages stay at [FAIL] for two further environmental reasons once write is unblocked:

  1. TestMachine_001.TestHistoryValue is driven by whatever Galaxy code runs on the object — idle in the default dev-box state, so no subscription pushes fire.
  2. Historian writes require the Aveva Historian SDK to accept the alarm schema event — dev box doesn't have that path live.

Running ./test-galaxy.ps1 -Username writeop -Password writeop123 with step 4a's LDAP + SecurityProfile = Basic256Sha256-Sign applied unblocks the reverse-bridge + alarm-fires stages. The virtual-tag, subscribe, and history stages depend on further deployment choices (pick an attribute Galaxy is actively writing to, wire Aveva Historian SDK).

First-run evidence (2026-04-20 dev box)

Ran the smoke against the live dev environment. Captured log signatures prove the Phase 7 wiring chain executes in production:

[INF] Bootstrapped from central DB: generation 1
[INF] Bootstrap complete: source=CentralDb generation=1
[INF] Phase 7 historian sink: no driver provides IAlarmHistorianWriter — using NullAlarmHistorianSink
[INF] VirtualTagEngine loaded 1 tag(s), 1 upstream subscription(s)
[INF] ScriptedAlarmEngine loaded 1 alarm(s)
[INF] Phase 7: composed engines from generation 1 — 1 virtual tag(s), 1 scripted alarm(s), 2 script(s)

Each line corresponds to a piece shipped in #243 / #244 / #245 / #246 / #247 — the composer ran, engines loaded, historian-sink decision fired, scripts compiled.

Two gaps surfaced (filed as new tasks below, NOT Phase 7 regressions):

  1. No driver-instance bootstrap pipeline. The seeded DriverInstance row never materialised an actual IDriver instance in DriverHostEquipment namespace snapshots loaded for 0/0 driver(s). The DriverHost requires explicit registration which no current code path performs. Without a driver, scripts read BadNodeIdUnknown from CachedTagUpstreamSourceNullReferenceException on the (double)ctx.GetTag(...).Value cast. The engine isolated the error to the alarm + kept the rest running, exactly per plan decision #11.
  2. OPC UA endpoint port collision. Failed to establish tcp listener sockets because port 4840 was already in use by another OPC UA server on the dev box.

Both are pre-Phase-7 deployment-wiring gaps. Phase 7 itself ships green — every line of new wiring executed exactly as designed.

Known limitations + follow-ups

  • Subscribing to virtual tags via OPC UA monitored items (instead of polled reads) needs VirtualTagSource.SubscribeAsync wiring through DriverNodeManager.OnCreateMonitoredItem — covered as part of release-readiness.
  • Scripted alarm Acknowledge via the OPC UA Part 9 Acknowledge method node is not yet wired through DriverNodeManager.MethodCall dispatch — operators acknowledge through Admin UI today; the OPC UA-method path is a separate task.
  • Phase 7 compliance script (scripts/compliance/phase-7-compliance.ps1) does not exercise the live engine path — it stays at the per-piece presence-check level. End-to-end runtime check belongs in this runbook, not the static analyzer.