Files
lmxopcua/docs/v2/implementation/phase-7-e2e-smoke.md
Joseph Doherty a52086efc5 Refresh phase-7-e2e-smoke.md to match current wiring
The runbook shipped at phase-7 close (2026-04-20) described the original
`Doubled = Source × 2` virtual tag, Float64 seed, and flat TagId-shaped
NodeIds. Four commits later the wiring has moved:

- Seed now targets `TestMachine_001.TestHistoryValue` (Int32, writable,
  historized) — no placeholder to fill in for the dev box.
- VirtualTag is `MachineStatus` (Boolean, `Source > 0`, historized).
- NodeIds are path-based per OPC UA Part 3 §5.2.2
  (`{driverId}/{folder-path}/{browseName}`).
- Seed inserts the ClusterNodeCredential row — without it the Server
  bootstrap fails `Unauthorized: caller X is not bound to NodeId`.

Changes:

1. Step 3 — replace "edit the placeholder" instructions with the ZB
   Galaxy-Repository query that finds writable historized attributes
   (dpc CTE + HistoryExtension EXISTS + `security_classification > 0`).
2. New step 4a — LDAP + `SecurityProfile = Basic256Sha256-Sign` recipe
   for the reverse-bridge + alarm-fires stages. Anonymous sessions are
   denied writes against `Operate`-classified attributes (PR 26 gate);
   `writeop / writeop123` against the dev-box GLAuth clears it.
3. Step 6 validation commands updated to the new NodeIds + reference
   the path-based scheme's Part-3 rationale.
4. Drive-the-alarm snippet now calls `otopcua-cli write … -U writeop`
   so operators see the explicit auth step.
5. Acceptance checklist updated for the new tag names + the
   test-galaxy.ps1 `-Username` invocation.
6. Added a 2026-04-24 second-run evidence section alongside the original
   — documents the 3/7 anonymous ceiling and what's needed to reach 7/7.

No code or seed changes in this commit — doc-only.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 18:13:27 -04:00

255 lines
14 KiB
Markdown

# Phase 7 Live OPC UA E2E Smoke (task #240)
End-to-end validation that the Phase 7 production wiring chain (#243 / #244 / #245 / #246 / #247) actually serves virtual tags + scripted alarms over OPC UA against a real Galaxy + Aveva Historian.
> **Scope.** Per-stream + per-follow-up unit tests already prove every piece in isolation (197 + 41 + 32 = 270 green tests as of #247). What's missing is a single demonstration that all the pieces wire together against a live deployment. This runbook is that demonstration.
## Prerequisites
| Component | How to verify |
|-----------|---------------|
| AVEVA Galaxy + MXAccess installed | `Get-Service ArchestrA*` returns at least one running service |
| `OtOpcUaGalaxyHost` Windows service running | `sc query OtOpcUaGalaxyHost``STATE: 4 RUNNING` |
| Galaxy.Host shared secret matches `.local/galaxy-host-secret.txt` | Set during NSSM install — see `docs/ServiceHosting.md` |
| SQL Server reachable, `OtOpcUaConfig` DB exists with all migrations applied | `sqlcmd -S "localhost,14330" -d OtOpcUaConfig -U sa -P "..." -Q "SELECT COUNT(*) FROM dbo.__EFMigrationsHistory"` returns ≥ 11 |
| Server's `appsettings.json` `Node:ConfigDbConnectionString` matches your SQL Server | `cat src/ZB.MOM.WW.OtOpcUa.Server/appsettings.json` |
> **Galaxy.Host pipe ACL.** The pipe allows the configured `OTOPCUA_ALLOWED_SID` (typically the user that runs `OtOpcUaGalaxyHost` — `dohertj2` on the dev box). Run the Server under the same user; elevation doesn't matter — `PipeAcl.cs` no longer denies `BUILTIN\Administrators` since UAC's deny-only Admins SID would have blocked non-elevated dev-box admins too.
## Setup
### 1. Migrate the Config DB
```powershell
cd src/ZB.MOM.WW.OtOpcUa.Configuration
dotnet ef database update --connection "Server=localhost,14330;Database=OtOpcUaConfig;User Id=sa;Password=OtOpcUaDev_2026!;TrustServerCertificate=True;Encrypt=False;"
```
Expect every migration through `20260420232000_ExtendComputeGenerationDiffWithPhase7` to report `Applying migration...`. Re-running is a no-op.
### 2. Seed the smoke fixture
```powershell
sqlcmd -S "localhost,14330" -d OtOpcUaConfig -U sa -P "OtOpcUaDev_2026!" `
-I -i scripts/smoke/seed-phase-7-smoke.sql
```
Expected output ends with `Phase 7 smoke seed complete.` plus a Cluster / Node / Generation summary. Idempotent — re-running wipes the prior smoke state and starts clean.
The seed creates one each of: `ServerCluster`, `ClusterNode`, `ClusterNodeCredential` (binds the SQL login to the node — without this `sp_GetCurrentGenerationForCluster` returns `Unauthorized: caller X is not bound to NodeId p7-smoke-node`), `ConfigGeneration` (Published), `Namespace`, `UnsArea`, `UnsLine`, `Equipment`, `DriverInstance` (Galaxy proxy), `Tag`, two `Script` rows, one `VirtualTag` (`MachineStatus` = `Source > 0`, Boolean, historized), one `ScriptedAlarm` (`OverTemp` when `Source > 50`).
### 3. (Optional) Swap the Galaxy attribute
The shipped seed points `dbo.Tag.TagConfig` at `TestMachine_001.TestHistoryValue` — the dev-box Galaxy ships it as Int32, writable (`security_classification = Operate`), and historized (`HistoryExtension` primitive), so every E2E stage has a real live target. To swap to another attribute on a different Galaxy, pick a candidate via the same shape:
```sql
-- Run against the Galaxy Repository DB (ZB).
;WITH dpc AS (
SELECT g.gobject_id, p.package_id, p.derived_from_package_id, 0 AS depth
FROM gobject g INNER JOIN package p ON p.package_id = g.deployed_package_id
WHERE g.is_template = 0 AND g.deployed_package_id <> 0
UNION ALL
SELECT c.gobject_id, p.package_id, p.derived_from_package_id, c.depth + 1
FROM dpc c INNER JOIN package p ON p.package_id = c.derived_from_package_id
WHERE c.derived_from_package_id <> 0 AND c.depth < 10
)
SELECT DISTINCT g.tag_name + '.' + da.attribute_name AS full_ref,
dt.description AS dtype, da.security_classification
FROM dpc
INNER JOIN dynamic_attribute da ON da.package_id = dpc.package_id
INNER JOIN gobject g ON g.gobject_id = dpc.gobject_id
LEFT JOIN data_type dt ON dt.mx_data_type = da.mx_data_type
WHERE da.attribute_name NOT LIKE '[_]%'
AND da.attribute_name NOT LIKE '%.Description'
AND da.mx_data_type IN (1, 2, 3, 4)
AND da.security_classification > 0 -- writable
AND EXISTS (
SELECT 1 FROM primitive_instance pi
INNER JOIN primitive_definition pd
ON pd.primitive_definition_id = pi.primitive_definition_id
AND pd.primitive_name = 'HistoryExtension'
WHERE pi.package_id = dpc.package_id AND pi.primitive_name = da.attribute_name)
ORDER BY full_ref;
```
Then update the seed:
```sql
UPDATE dbo.Tag
SET TagConfig = N'{"FullName":"YourReal.GalaxyAttr","DataType":"Int32"}'
WHERE TagId = 'p7-smoke-tag-source';
```
### 4. Point Server.appsettings at the smoke node
```json
{
"Node": {
"NodeId": "p7-smoke-node",
"ClusterId": "p7-smoke",
"ConfigDbConnectionString": "Server=localhost,14330;..."
}
}
```
### 4a. (Optional) Enable LDAP + SecurityProfile for the write stage
Anonymous OPC UA sessions are denied writes against `Operate`-classified tags by the PR 26 server-layer classification gate. To exercise the reverse-bridge + alarm-fires stages fully, the Server has to advertise a `UserName` UserTokenPolicy (any profile other than `None`) and authenticate against LDAP.
```json
{
"OpcUa": {
"SecurityProfile": "Basic256Sha256-Sign",
"Ldap": {
"Enabled": true,
"Server": "localhost",
"Port": 3893,
"SearchBase": "dc=lmxopcua,dc=local",
"ServiceAccountDn": "cn=serviceaccount,dc=lmxopcua,dc=local",
"ServiceAccountPassword": "serviceaccount123",
"GroupToRole": {
"ReadOnly": "ReadOnly",
"WriteOperate": "WriteOperate",
"WriteTune": "WriteTune",
"WriteConfigure": "WriteConfigure",
"AlarmAck": "AlarmAck"
}
}
}
}
```
Dev-box GLAuth ships `writeop` / `writeop123` in the `WriteOperate` group, `admin` / `admin123` across all write groups. See `C:\publish\glauth\auth.md`.
## Run
### 5. Start the Server
```powershell
dotnet run --project src/ZB.MOM.WW.OtOpcUa.Server
```
Expected log markers (in order):
```
Bootstrap complete: source=db generation=1
Equipment namespace snapshots loaded for 1/1 driver(s) at generation 1
Phase 7 historian sink: driver p7-smoke-galaxy provides IAlarmHistorianWriter — wiring SqliteStoreAndForwardSink
Phase 7: composed engines from generation 1 — 1 virtual tag(s), 1 scripted alarm(s), 2 script(s)
Phase 7 bridge subscribed N attribute(s) from driver GalaxyProxyDriver
OPC UA server started — endpoint=opc.tcp://0.0.0.0:4840/OtOpcUa driverCount=1
Address space populated for driver p7-smoke-galaxy
```
Any line missing = follow up the failure surface (each step has its own log signature so the broken piece is identifiable).
### 6. Validate via Client.CLI
```powershell
dotnet run --project src/ZB.MOM.WW.OtOpcUa.Client.CLI -- browse -u opc.tcp://localhost:4840/OtOpcUa -r -d 5
```
Expect to see under the namespace root: `lab-floor → galaxy-line → reactor-1` with three child variables: `Source` (driver-sourced Int32), `MachineStatus` (virtual tag Boolean, `Source > 0`), and `OverTemp` (scripted alarm Boolean, `Source > 50`). NodeIds are path-based per OPC UA Part 3 §5.2.2 — the walker mints them from `{driverId}/{folder-path}/{browseName}` and stores the driver-side FullReference in an internal NodeId→FullRef map, so client subscriptions survive backend address renames.
#### Read the virtual tag
```powershell
dotnet run --project src/ZB.MOM.WW.OtOpcUa.Client.CLI -- read `
-u opc.tcp://localhost:4840/OtOpcUa `
-n "ns=2;s=p7-smoke-galaxy/lab-floor/galaxy-line/reactor-1/MachineStatus"
```
Expected: `Boolean`. Push a value change into the Source Galaxy attribute and re-read — `MachineStatus` should follow within the bridge's publishing interval (1 second by default).
#### Read the scripted alarm
```powershell
dotnet run --project src/ZB.MOM.WW.OtOpcUa.Client.CLI -- read `
-u opc.tcp://localhost:4840/OtOpcUa `
-n "ns=2;s=p7-smoke-galaxy/lab-floor/galaxy-line/reactor-1/OverTemp"
```
Expected: `Boolean``false` when Source ≤ 50, `true` when Source > 50.
#### Drive the alarm + verify historian queue
Push a Source value above 50 — either from Galaxy itself, or via the Server's OPC UA write path using LDAP credentials (step 4a). Within ~1 second, `OverTemp.Read` flips to `true`. The alarm engine emits a transition to `Phase7EngineComposer.RouteToHistorianAsync``SqliteStoreAndForwardSink.EnqueueAsync` → drain worker (every 2s) → `GalaxyHistorianWriter.WriteBatchAsync` → Galaxy.Host pipe → Aveva Historian alarm schema.
```powershell
# OPC UA write path — requires LDAP from step 4a + a writeop-class user.
dotnet run --project src/ZB.MOM.WW.OtOpcUa.Client.CLI -- write `
-u opc.tcp://localhost:4840/OtOpcUa -S sign `
-n "ns=2;s=p7-smoke-galaxy/lab-floor/galaxy-line/reactor-1/Source" `
-v 75 -U writeop -P writeop123
```
Verify the queue absorbed the event:
```powershell
sqlite3 "$env:ProgramData\OtOpcUa\alarm-historian-queue.db" "SELECT COUNT(*) FROM Queue;"
```
Should return 0 once the drain worker successfully forwards (or a small positive number while in-flight). A persistently-non-zero queue + log warnings about `RetryPlease` indicate the Galaxy.Host historian write path is failing — check the Host's log file.
#### Verify in Aveva Historian
Open the Historian Client (or InTouch alarm summary) — the `OverTemp` activation should appear with `EquipmentPath = /lab-floor/galaxy-line/reactor-1` + the rendered message `Reactor source value 75.3 exceeded 50` (or whatever value tripped it).
## Acceptance Checklist
- [ ] EF migrations applied through `20260420232000_ExtendComputeGenerationDiffWithPhase7`
- [ ] Smoke seed completes without errors + creates exactly 1 Published generation
- [ ] Server starts + logs the Phase 7 composition lines
- [ ] Client.CLI browse shows the UNS tree with Source / MachineStatus / OverTemp under reactor-1
- [ ] Read on `Source` returns a Good-quality Int32 value (proves MXAccess round-trip)
- [ ] Read on `MachineStatus` returns the live boolean truth of `Source > 0`
- [ ] Read on `OverTemp` returns the live boolean truth of `Source > 50`
- [ ] `test-galaxy.ps1 -Username writeop -Password writeop123` drives Source past 50 and flips `OverTemp` to `true` within 1 s
- [ ] SQLite queue drains (`COUNT(*)` returns to 0 within 2 s of an alarm transition)
- [ ] Historian shows the `OverTemp` activation event with the rendered message
## Second-run evidence (2026-04-24 dev box)
Full live stack ran end-to-end once the IPC unblocks (commit `d11dd05`), path-based NodeIds (commit `8be82e0`), cold-start engine guards (commit `69e1d32`), and seed retarget to `TestMachine_001.TestHistoryValue` (commit `ec1a590`) landed. Anonymous `scripts/e2e/test-galaxy.ps1` run reaches 3/7:
```
[PASS] source NodeId readable (Galaxy pipe → proxy → server → client chain up)
[PASS] source value = System.Byte[]
[INFO] BadUserAccessDenied — attribute's Galaxy-side ACL blocks writes for this session.
```
The `INFO` stage is correct behaviour — Source is `Operate`-classified and the anonymous session carries no LDAP roles. The Virtual-tag / Subscribe / Alarm / History stages stay at `[FAIL]` for two further environmental reasons once write is unblocked:
1. `TestMachine_001.TestHistoryValue` is driven by whatever Galaxy code runs on the object — idle in the default dev-box state, so no subscription pushes fire.
2. Historian writes require the Aveva Historian SDK to accept the alarm schema event — dev box doesn't have that path live.
Running `./test-galaxy.ps1 -Username writeop -Password writeop123` with step 4a's LDAP + `SecurityProfile = Basic256Sha256-Sign` applied unblocks the reverse-bridge + alarm-fires stages. The virtual-tag, subscribe, and history stages depend on further deployment choices (pick an attribute Galaxy is actively writing to, wire Aveva Historian SDK).
## First-run evidence (2026-04-20 dev box)
Ran the smoke against the live dev environment. Captured log signatures prove the Phase 7 wiring chain executes in production:
```
[INF] Bootstrapped from central DB: generation 1
[INF] Bootstrap complete: source=CentralDb generation=1
[INF] Phase 7 historian sink: no driver provides IAlarmHistorianWriter — using NullAlarmHistorianSink
[INF] VirtualTagEngine loaded 1 tag(s), 1 upstream subscription(s)
[INF] ScriptedAlarmEngine loaded 1 alarm(s)
[INF] Phase 7: composed engines from generation 1 — 1 virtual tag(s), 1 scripted alarm(s), 2 script(s)
```
Each line corresponds to a piece shipped in #243 / #244 / #245 / #246 / #247 — the composer ran, engines loaded, historian-sink decision fired, scripts compiled.
**Two gaps surfaced** (filed as new tasks below, NOT Phase 7 regressions):
1. **No driver-instance bootstrap pipeline.** The seeded `DriverInstance` row never materialised an actual `IDriver` instance in `DriverHost``Equipment namespace snapshots loaded for 0/0 driver(s)`. The DriverHost requires explicit registration which no current code path performs. Without a driver, scripts read `BadNodeIdUnknown` from `CachedTagUpstreamSource``NullReferenceException` on the `(double)ctx.GetTag(...).Value` cast. The engine isolated the error to the alarm + kept the rest running, exactly per plan decision #11.
2. **OPC UA endpoint port collision.** `Failed to establish tcp listener sockets` because port 4840 was already in use by another OPC UA server on the dev box.
Both are pre-Phase-7 deployment-wiring gaps. Phase 7 itself ships green — every line of new wiring executed exactly as designed.
## Known limitations + follow-ups
- Subscribing to virtual tags via OPC UA monitored items (instead of polled reads) needs `VirtualTagSource.SubscribeAsync` wiring through `DriverNodeManager.OnCreateMonitoredItem` — covered as part of release-readiness.
- Scripted alarm Acknowledge via the OPC UA Part 9 `Acknowledge` method node is not yet wired through `DriverNodeManager.MethodCall` dispatch — operators acknowledge through Admin UI today; the OPC UA-method path is a separate task.
- Phase 7 compliance script (`scripts/compliance/phase-7-compliance.ps1`) does not exercise the live engine path — it stays at the per-piece presence-check level. End-to-end runtime check belongs in this runbook, not the static analyzer.