lmxopcua/docs/plans/2026-06-10-script-log-and-scripted-alarm-runtime.md
Joseph Doherty 2b890fa716 docs(plan): re-scope script-log Layer 2 inbound-ack into T17-T24
The original single T17 (inbound method dispatch + ack plumbing) proved on a
2026-06-11 deep dive to be four hard problems: roles on the session identity
(T17), node-manager command router + AlarmAck veto + alarm-commands DPS topic
(T18), host-actor inbound handler (T19), and delta-gate double-emit (T20). Old
T18->T21 (AdminUI), old T19 split into T22 (Client.CLI feature) + T23 (verify),
old T20->T24. Adds the Layer 2 design-decisions preamble.
2026-06-11 05:37:54 -04:00

# Script-log Engine Emit + Scripted-Alarm Runtime — Implementation Plan
> **For Claude:** REQUIRED SUB-SKILL: use superpowers-extended-cc:subagent-driven-development
> (or executing-plans) to implement this plan task-by-task.
**Goal:** Make the Script-log page tail real script output, and stand up scripted
alarms end-to-end including real OPC UA Part 9 condition nodes + client ack.
**Architecture:** Three sequenced layers off one shared seam (a root script logger
fanning to file + companion + a new DPS topic sink). Layer 0 = emit (F8 live). Layer 1
= F9 engine runtime on the Akka equipment-namespace runtime. Layer 2 = F14b real Part 9
nodes + events + inbound ack. Design: `docs/plans/2026-06-10-script-log-and-scripted-alarm-runtime-design.md`.
Verified gap analysis: `pending.md`.
**Tech:** .NET 10, Akka.NET, EF Core (SQL prod / InMemory tests), Serilog, OPC
Foundation UA .NET Standard, xUnit + Shouldly, Akka TestKit. No bUnit.
**Hard rules (every task):** stage by explicit path — never `git add .`; never stage
`sql_login.txt` or `src/Server/ZB.MOM.WW.OtOpcUa.Host/pki/`; never echo the gateway API
key into a new tracked file; no force-push, no `--no-verify`. **No Configuration entity
/ EF migration change** (`ScriptedAlarmState` table already exists). Agent does **not**
sign in to the AdminUI — the user drives live `/run`.
**Branch:** `feat/scriptlog-alarm-runtime` off `master @ df4c2657` (design committed there).
**Reference patterns to mirror:** `VirtualTagHostActor` (host actor shape),
`EfAlarmActorStateStore` (EF store shape), the `{{equip}}` two-seam parity work
(`Phase7Composer` ↔ `DeploymentArtifact`), `RoslynVirtualTagEvaluator` (evaluator).
---
# LAYER 0 — Shared script-log emit + F8 live
### Task 0: Branch + test-project check
**Classification:** small · **~2 min** · **Parallelizable with:** none
**Files:** none created (branch + verification only)
**Steps:**
1. `git switch -c feat/scriptlog-alarm-runtime` (off `master @ df4c2657`).
2. Confirm `tests/Core/ZB.MOM.WW.OtOpcUa.Core.Scripting.Tests/` exists and is in the
`.slnx` (it does — `ScriptLoggerFactoryTests.cs` lives there). New Layer-0 tests land
here. Confirm `tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests/` for Layer-1 tests.
3. `dotnet build ZB.MOM.WW.OtOpcUa.slnx` — green baseline. Commit nothing.
---
### Task 1: `IScriptLogPublisher` + `ScriptLogTopicSink`
**Classification:** standard · **~4 min** · **Parallelizable with:** none
**Files:**
- Create: `src/Core/ZB.MOM.WW.OtOpcUa.Core.Scripting/IScriptLogPublisher.cs`
- Create: `src/Core/ZB.MOM.WW.OtOpcUa.Core.Scripting/ScriptLogTopicSink.cs`
- Test: `tests/Core/ZB.MOM.WW.OtOpcUa.Core.Scripting.Tests/ScriptLogTopicSinkTests.cs`
- Maybe modify: `Core.Scripting.csproj` (add ProjectReference to `Commons` for
`ScriptLogEntry` if not already referenced — verify first).
**Step 1 — failing tests** (`ScriptLogTopicSinkTests`):
- A `LogEvent` (Information) with properties `ScriptId="S1"`, `VirtualTagId="V1"`,
`EquipmentId="EQ1"`, message `"hello"` → publisher receives one `ScriptLogEntry` with
those fields, `Level=="Information"`, `Message=="hello"`.
- `AlarmId` property maps to `ScriptLogEntry.AlarmId`; absent properties → null fields.
- A `Debug` event with default `minLevel=Information` → publisher receives **nothing**.
- Template message renders (`"v={V}"` + prop V=3 → `"v=3"`).
Use a fake `IScriptLogPublisher` capturing entries.
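The fake publisher for these tests can be very small. The sketch below is one possible shape, not the project's actual test double: `CapturingScriptLogPublisher` is a hypothetical name, and the `ScriptLogEntry` record here is a stand-in whose fields are inferred from the constructor call in Step 3 (the real record lives in `Commons` and may differ).

```csharp
using System;
using System.Collections.Generic;

// Stand-in for the project's ScriptLogEntry; field list inferred from Step 3, may differ.
public sealed record ScriptLogEntry(
    string ScriptId, string Level, string Message, DateTime TimestampUtc,
    string? VirtualTagId = null, string? AlarmId = null, string? EquipmentId = null);

public interface IScriptLogPublisher { void Publish(ScriptLogEntry entry); }

// Hypothetical test double: records every published entry in arrival order.
public sealed class CapturingScriptLogPublisher : IScriptLogPublisher
{
    private readonly List<ScriptLogEntry> _entries = new();
    private readonly object _gate = new();

    public void Publish(ScriptLogEntry entry)
    {
        lock (_gate) _entries.Add(entry);
    }

    // Returns a copy so assertions never race with a late Publish.
    public IReadOnlyList<ScriptLogEntry> Entries
    {
        get { lock (_gate) return _entries.ToArray(); }
    }
}
```

A test would pass an instance to the sink, emit one event, and assert on `Entries.Count` and the captured fields.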
**Step 2 — run, expect fail** (types don't exist).
**Step 3 — implement:**
```csharp
using Serilog.Core;
using Serilog.Events;
// ScriptLogEntry comes from Commons (see the ProjectReference note under Files).

public interface IScriptLogPublisher { void Publish(ScriptLogEntry entry); }

public sealed class ScriptLogTopicSink : ILogEventSink
{
    private readonly IScriptLogPublisher _publisher;
    private readonly LogEventLevel _min;

    public ScriptLogTopicSink(IScriptLogPublisher publisher,
        LogEventLevel min = LogEventLevel.Information)
    {
        _publisher = publisher;
        _min = min;
    }

    public void Emit(LogEvent e)
    {
        // Drop events below the topic threshold (Debug/Verbose by default).
        if (e is null || e.Level < _min) return;

        // Read a string-valued scalar property; absent or non-string values map to null.
        string? P(string k) => e.Properties.TryGetValue(k, out var v)
            && v is ScalarValue { Value: string s } ? s : null;

        _publisher.Publish(new ScriptLogEntry(
            ScriptId: P("ScriptId") ?? P("ScriptName") ?? "unknown",
            Level: e.Level.ToString(),
            Message: e.RenderMessage(),   // renders the template, e.g. "v={V}" -> "v=3"
            TimestampUtc: e.Timestamp.UtcDateTime,
            VirtualTagId: P("VirtualTagId"), AlarmId: P("AlarmId"), EquipmentId: P("EquipmentId")));
    }
}
```
(Property-name constants — reuse/extend `ScriptLoggerFactory`'s `ScriptNameProperty`;
add `ScriptIdProperty`/`VirtualTagIdProperty`/`AlarmIdProperty`/`EquipmentIdProperty`.)
**Step 4 — run tests, expect pass. Step 5 — commit** (`git add` the 3 files by path).
---
### Task 2: Root script logger + `DpsScriptLogPublisher` + Host wiring
**Classification:** standard · **~5 min** · **Parallelizable with:** none (depends T1)
**Files:**
- Create: `src/Server/ZB.MOM.WW.OtOpcUa.Runtime/Scripting/DpsScriptLogPublisher.cs`
(or Host — wherever the `ActorSystem`/`Mediator` is reachable at construction).
- Create: `src/Server/ZB.MOM.WW.OtOpcUa.Host/Logging/ScriptRootLoggerFactory.cs`
(builds the root `ILogger`: rolling `scripts-*.log` + `ScriptLogCompanionSink` +
`ScriptLogTopicSink`).
- Modify: `src/Server/ZB.MOM.WW.OtOpcUa.Host/Program.cs` (build + register root logger;
register `IScriptLogPublisher`).
- Test: `tests/.../Core.Scripting.Tests/ScriptRootLoggerFanoutTests.cs` (or Host.Tests).
**Steps (TDD):**
1. Failing test: a logger built by `ScriptRootLoggerFactory` with a fake publisher +
in-memory companion → an `Error` event reaches the companion mirror AND the topic
publisher; a `Debug` event reaches neither topic nor companion (file only). (Assert
via fakes; don't assert the physical file.)
2. Implement `DpsScriptLogPublisher`: the ctor takes the DPS mediator `IActorRef` (or
`ActorSystem`); `Publish` → `mediator.Tell(new Publish("script-logs", entry))`
(topic constant `VirtualTagActor.ScriptLogsTopic`).
3. Implement `ScriptRootLoggerFactory.Build(IScriptLogPublisher, config)` →
`new LoggerConfiguration().WriteTo.File(...).WriteTo.Sink(new ScriptLogCompanionSink(Log.Logger)).WriteTo.Sink(new ScriptLogTopicSink(publisher, minLevel)).CreateLogger()`.
4. `Program.cs`: resolve the mediator after the ActorSystem is up; register
`IScriptLogPublisher` (singleton) + the root `ILogger` (keyed/named for scripts).
Min-level from config (`Scripting:LogTopicMinLevel`, default `Information`).
5. Run + commit by path.
---
### Task 3: Rewire evaluators to the root script logger
**Classification:** standard · **~5 min** · **Parallelizable with:** none (depends T1, T2)
**Files:**
- Modify: `src/Server/ZB.MOM.WW.OtOpcUa.Host/Engines/RoslynVirtualTagEvaluator.cs`
- Modify: `src/Server/ZB.MOM.WW.OtOpcUa.Host/Engines/RoslynScriptedAlarmEvaluator.cs`
- Modify: `src/Core/ZB.MOM.WW.OtOpcUa.Core.Scripting/ScriptLoggerFactory.cs` (bind the
full property set, not just `ScriptName`).
- Modify: `src/Server/ZB.MOM.WW.OtOpcUa.Host/Program.cs` (inject root logger into the
evaluators).
- Test: `tests/.../Core.Scripting.Tests/` — evaluator emits via fake publisher.
**Steps:**
1. Failing test: a `RoslynVirtualTagEvaluator` built with a root logger wired to a fake
publisher; evaluate a script `ctx.Logger.Information("hi"); return 1;` → publisher
gets one entry with `ScriptId`/`VirtualTagId` bound and `Message=="hi"`.
2. Replace the static `ScriptLogger` field with a ctor-injected root `ILogger`. Per
evaluation, `var log = _root.ForContext("ScriptId", id).ForContext("VirtualTagId", virtualTagId)`
(+ `EquipmentId` when available) and pass into the `VirtualTagContext`. Same for the
alarm evaluator (binds `AlarmId`).
3. `ScriptLoggerFactory`: add a `Create(scriptId, virtualTagId?, alarmId?, equipmentId?)`
overload binding the standard properties (keep the old `Create(scriptName)` for
compatibility).
4. `Program.cs`: pass the root logger to both evaluator registrations.
5. Run + commit by path.
> Note: `IVirtualTagEvaluator.Evaluate` carries `virtualTagId`; in the live path
> `scriptId == virtualTagId`, so Layer 0 binds both from it. Threading a distinct
> `EquipmentId` (nice-to-have on the page) is optional here — if it requires an
> interface change, defer it to a Layer-1 follow-up rather than expanding T3.
---
### Task 4: Live-verify Layer 0
**Classification:** verification · **Parallelizable with:** none (depends T2, T3)
**Steps:**
1. Rebuild docker-dev central nodes (user-driven `/run`). Author a virtual tag whose
script calls `ctx.Logger.Information(...)`.
2. Open `/script-log`; drive the dependency so the script evaluates; confirm the line
appears live with the right ScriptId/level. Confirm Debug stays off the page,
Information+ shows.
3. **Agent does not sign in** — user signs in and drives. Record outcome. No code unless
a defect surfaces (→ new fix task).
---
# LAYER 1 — F9 engine runtime
### Task 5: `EquipmentScriptedAlarmPlan` + Phase7Composer enrichment
**Classification:** standard · **~5 min** · **Parallelizable with:** Task 7, Task 8
**Files:**
- Modify: `src/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer/Phase7Composer.cs` (new record +
build the enriched list from `ScriptedAlarm` + `Script` rows).
- Test: `tests/Server/ZB.MOM.WW.OtOpcUa.Server.Tests/Phase7/` (or wherever Phase7Composer
is tested) — new `Phase7ComposerScriptedAlarmTests.cs`.
**Steps:**
1. Failing test: compose two equipments each with a scripted alarm referencing a script;
assert each `EquipmentScriptedAlarmPlan` carries the resolved `PredicateSource`,
extracted `DependencyRefs` (via `DependencyExtractor`), `AlarmType`, `Severity`,
`MessageTemplate`, `HistorizeToAveva`, `Retain`, `Enabled`, `Name`.
2. Add `public sealed record EquipmentScriptedAlarmPlan(string ScriptedAlarmId, string
EquipmentId, string Name, string AlarmType, int Severity, string MessageTemplate,
string PredicateScriptId, string PredicateSource, IReadOnlyList<string> DependencyRefs,
bool HistorizeToAveva, bool Retain, bool Enabled);`
3. In `Compose`: join `ScriptedAlarm.PredicateScriptId → Script.SourceCode`; run
`DependencyExtractor.Extract(source).Reads` (plus `MessageTemplate` token paths) for
`DependencyRefs`; project into the new list on the composition result. Skip
`Enabled=false` alarms (or carry the flag — carry it; host decides). Drop alarms whose
script is missing with a structured warning (don't throw the whole compose).
4. Run + commit by path.
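The join-and-skip behaviour in step 3 can be sketched as follows. This is a minimal illustration, not the real `Phase7Composer`: the row and plan records are trimmed, hypothetical shapes, `extractReads` stands in for `DependencyExtractor.Extract(source).Reads`, and `warn` stands in for the structured warning.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical, trimmed-down shapes; the real entities and plan carry more fields.
public sealed record ScriptedAlarmRow(string ScriptedAlarmId, string EquipmentId, string PredicateScriptId, bool Enabled);
public sealed record ScriptRow(string ScriptId, string SourceCode);
public sealed record AlarmPlanSketch(string ScriptedAlarmId, string EquipmentId,
    string PredicateSource, IReadOnlyList<string> DependencyRefs, bool Enabled);

public static class AlarmPlanComposer
{
    public static List<AlarmPlanSketch> Compose(
        IEnumerable<ScriptedAlarmRow> alarms,
        IEnumerable<ScriptRow> scripts,
        Func<string, IReadOnlyList<string>> extractReads,   // assumed extractor seam
        Action<string> warn)                                 // assumed warning seam
    {
        var sourceById = scripts.ToDictionary(s => s.ScriptId, s => s.SourceCode);
        var plans = new List<AlarmPlanSketch>();
        foreach (var alarm in alarms)
        {
            if (!sourceById.TryGetValue(alarm.PredicateScriptId, out var source))
            {
                // Missing script: warn and drop this alarm only; the compose continues.
                warn($"Scripted alarm {alarm.ScriptedAlarmId} references missing script {alarm.PredicateScriptId}");
                continue;
            }
            // Disabled alarms are carried with their flag; the host decides what to do.
            plans.Add(new AlarmPlanSketch(alarm.ScriptedAlarmId, alarm.EquipmentId,
                source, extractReads(source), alarm.Enabled));
        }
        return plans;
    }
}
```

Carrying `Enabled` rather than filtering keeps the composer free of runtime policy, which matches the "carry it; host decides" note above.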
---
### Task 6: DeploymentArtifact parity for the alarm plan
**Classification:** standard · **~5 min** · **Parallelizable with:** Task 7, Task 8 (depends T5)
**Files:**
- Modify: `src/Server/ZB.MOM.WW.OtOpcUa.Runtime/Drivers/DeploymentArtifact.cs`
(encode/decode `EquipmentScriptedAlarmPlan`; add
`Phase7CompositionResult.EquipmentScriptedAlarms`; filter-by-equipment like
`EquipmentVirtualTags` at :263).
- Test: `tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests/` — artifact round-trip + parity
with the Composer for the same input.
**Steps:**
1. Failing test: build a composition via `Phase7Composer`, serialize to artifact, parse
back → `EquipmentScriptedAlarms` is byte-identical (same discipline as the `{{equip}}`
parity tests). Equipment-filter test (only alarms for resident equipment survive).
2. Add the field to `Phase7CompositionResult`; mirror the `EquipmentVirtualTags`
encode/decode/filter exactly (`:202`, `:263`).
3. Run + commit by path.
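The round-trip and equipment-filter assertions from step 1 might look like the sketch below. It assumes a JSON encoding via `System.Text.Json` purely for illustration; the real `DeploymentArtifact` encoding is not shown in this plan, and `PlanSketch` and `ArtifactParitySketch` are hypothetical names.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.Json;

// Hypothetical, trimmed plan shape.
public sealed record PlanSketch(string ScriptedAlarmId, string EquipmentId, IReadOnlyList<string> DependencyRefs);

public static class ArtifactParitySketch
{
    // Assumed encoding: UTF-8 JSON array of plans.
    public static byte[] Encode(IEnumerable<PlanSketch> plans) =>
        JsonSerializer.SerializeToUtf8Bytes(plans.ToList());

    // Decode, then keep only plans whose equipment is resident on this node.
    public static List<PlanSketch> Decode(byte[] artifact, ISet<string> residentEquipment) =>
        (JsonSerializer.Deserialize<List<PlanSketch>>(artifact) ?? new List<PlanSketch>())
            .Where(p => residentEquipment.Contains(p.EquipmentId))
            .ToList();

    // Compare encoded bytes rather than records: record equality treats the
    // DependencyRefs list by reference, so two equal plans would compare unequal.
    public static bool SameBytes(IEnumerable<PlanSketch> a, IEnumerable<PlanSketch> b) =>
        Enumerable.SequenceEqual(Encode(a), Encode(b));
}
```

Comparing bytes is what makes the test a parity check on the encoding itself rather than on object identity.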
---
### Task 7: `DependencyMuxTagUpstreamSource`
**Classification:** standard · **~4 min** · **Parallelizable with:** Task 5, Task 6, Task 8
**Files:**
- Create: `src/Server/ZB.MOM.WW.OtOpcUa.Runtime/ScriptedAlarms/DependencyMuxTagUpstreamSource.cs`
(implements `Core.ScriptedAlarms`/`Core.VirtualTags` `ITagUpstreamSource`).
- Test: `tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests/ScriptedAlarms/DependencyMuxTagUpstreamSourceTests.cs`
**Steps:**
1. Failing tests: `Push(path, snapshot)` updates cache so `ReadTag(path)` returns it;
`SubscribeTag(path, obs)` → `obs` fires on the next `Push`; `ReadTag` for an unknown
path returns a Bad-quality snapshot; dispose removes the observer.
2. Implement: a thread-safe cache (`ConcurrentDictionary<string, DataValueSnapshot>`) +
per-path observer list; `Push` updates cache then invokes observers; `ReadTag` reads
cache (Bad if absent); `SubscribeTag` returns an `IDisposable` that deregisters. The
host actor calls `Push` from its `DependencyValueChanged` handler. Value wrap:
`new DataValueSnapshot(value, StatusCode:0, ts, ts)`.
3. Run + commit by path.
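A minimal sketch of the cache-plus-observers shape described in step 2. It does not implement the real `ITagUpstreamSource`, whose members are not shown in this plan: `SnapshotSketch` is a stand-in for `DataValueSnapshot`, and the observer is modelled as a plain delegate. `0x80000000` is the OPC UA `Bad` status code.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Immutable;
using System.Threading;

// Stand-in for DataValueSnapshot; only the fields this sketch needs.
public sealed record SnapshotSketch(object? Value, uint StatusCode, DateTime SourceTimestampUtc);

public sealed class MuxUpstreamSketch
{
    private const uint Bad = 0x80000000;   // OPC UA generic Bad status

    private readonly ConcurrentDictionary<string, SnapshotSketch> _cache = new();
    private readonly ConcurrentDictionary<string, ImmutableList<Action<string, SnapshotSketch>>> _observers = new();

    public void Push(string path, SnapshotSketch snapshot)
    {
        _cache[path] = snapshot;                                  // update the cache first
        if (_observers.TryGetValue(path, out var list))
            foreach (var observer in list) observer(path, snapshot);   // then notify
    }

    // Unknown paths read as a Bad-quality snapshot rather than throwing.
    public SnapshotSketch ReadTag(string path) =>
        _cache.TryGetValue(path, out var hit) ? hit : new SnapshotSketch(null, Bad, DateTime.UtcNow);

    public IDisposable SubscribeTag(string path, Action<string, SnapshotSketch> observer)
    {
        _observers.AddOrUpdate(path,
            _ => ImmutableList.Create(observer),
            (_, list) => list.Add(observer));
        return new Unsubscriber(() =>
            _observers.AddOrUpdate(path,
                _ => ImmutableList<Action<string, SnapshotSketch>>.Empty,
                (_, list) => list.Remove(observer)));
    }

    private sealed class Unsubscriber : IDisposable
    {
        private Action? _onDispose;
        public Unsubscriber(Action onDispose) => _onDispose = onDispose;
        // Exchange-to-null makes a second Dispose a no-op.
        public void Dispose() => Interlocked.Exchange(ref _onDispose, null)?.Invoke();
    }
}
```

Immutable observer lists mean `Push` iterates a stable snapshot, so an observer that unsubscribes during notification cannot corrupt the enumeration.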
---
### Task 8: `EfAlarmConditionStateStore : IAlarmStateStore`
**Classification:** standard · **~5 min** · **Parallelizable with:** Task 5, Task 6, Task 7
**Files:**
- Create: `src/Server/ZB.MOM.WW.OtOpcUa.Runtime/ScriptedAlarms/EfAlarmConditionStateStore.cs`
- Test: `tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests/ScriptedAlarms/EfAlarmConditionStateStoreTests.cs`
(in-memory EF).
**Steps:**
1. Failing tests (in-memory `OtOpcUaConfigDbContext`): `SaveAsync(state)` then
`LoadAsync(alarmId)` round-trips Enabled/Acked/Confirmed/Shelving(+UnshelveAtUtc)/
LastAck*/LastConfirm*/Comments; `LoadAsync` of an unknown id → null; `ActiveState`
is **not** persisted (a saved state's Active is ignored on load — load returns the
stored operator state, Active defaults). Comments JSON round-trips.
2. Implement mapping `AlarmConditionState` ↔ `ScriptedAlarmState` entity (mirror
`EfAlarmActorStateStore`'s `IDbContextFactory` upsert pattern; serialize
`ImmutableList<AlarmComment>` ↔ `CommentsJson`). Map enum states ↔ the entity's string
columns.
3. Run + commit by path.
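The mapping rules in steps 1 and 2 (enum ↔ string column, comments ↔ JSON, `Active` never persisted) can be sketched without EF. All types below are hypothetical, trimmed stand-ins for `AlarmConditionState` and the `ScriptedAlarmState` entity, and the use of `System.Text.Json` for `CommentsJson` is an assumption.

```csharp
using System;
using System.Collections.Immutable;
using System.Text.Json;

// Hypothetical, trimmed shapes; the real types also carry Confirmed, Shelving, LastAck*, etc.
public enum AckedSketch { Unacknowledged, Acknowledged }
public sealed record AlarmCommentSketch(string User, string Text);
public sealed record ConditionStateSketch(
    string AlarmId, bool Active, AckedSketch Acked, ImmutableList<AlarmCommentSketch> Comments);

public sealed class StateEntitySketch
{
    public string AlarmId { get; set; } = "";
    public string AckedState { get; set; } = "";      // enum stored by name
    public string CommentsJson { get; set; } = "[]";
}

public static class StateMapperSketch
{
    public static StateEntitySketch ToEntity(ConditionStateSketch s) => new()
    {
        AlarmId = s.AlarmId,
        AckedState = s.Acked.ToString(),
        CommentsJson = JsonSerializer.Serialize(s.Comments),
        // Active is deliberately not written: it is re-derived from the predicate.
    };

    public static ConditionStateSketch FromEntity(StateEntitySketch e) => new(
        e.AlarmId,
        Active: false,   // default; never restored from storage
        Acked: Enum.Parse<AckedSketch>(e.AckedState),
        Comments: JsonSerializer.Deserialize<ImmutableList<AlarmCommentSketch>>(e.CommentsJson)
                  ?? ImmutableList<AlarmCommentSketch>.Empty);
}
```

Saving a state with `Active: true` and loading it back yields `Active == false` with the operator fields intact, which is the behaviour the failing test in step 1 pins down.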
---
### Task 9: `ScriptedAlarmHostActor`
**Classification:** high-risk · **~5 min** · **Parallelizable with:** none (depends T6, T7, T8; needs Layer 0 T2/T3 root logger)
**Files:**
- Create: `src/Server/ZB.MOM.WW.OtOpcUa.Runtime/ScriptedAlarms/ScriptedAlarmHostActor.cs`
- Test: `tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests/ScriptedAlarms/ScriptedAlarmHostActorTests.cs`
(Akka TestKit + a real engine with the fake upstream, or a fake engine seam).
**Design:** mirrors `VirtualTagHostActor`. Owns one `ScriptedAlarmEngine` (built with the
`DependencyMuxTagUpstreamSource`, the `EfAlarmConditionStateStore`, a `ScriptLoggerFactory`
wrapping the **Layer 0 root logger**, and the engine's root logger). Message
`ApplyScriptedAlarms(IReadOnlyList<EquipmentScriptedAlarmPlan> Plans)`.
**Steps:**
1. Failing TestKit tests:
- `ApplyScriptedAlarms` with one alarm → engine loaded (assert via a probe/seam);
registers interest with the (probe) mux for the alarm's dep refs.
- A `DependencyValueChanged` that makes the predicate true → the host tells the
(probe) `OpcUaPublishActor` a `WriteAlarmState(alarmId, active:true, …)`, tells the
(probe) historian an `AlarmHistorianEvent` (when `HistorizeToAveva`), and publishes
an `AlarmTransitionEvent` on `alerts`.
- Re-`ApplyScriptedAlarms` with a different set reloads the engine (LoadAsync replace).
2. Implement: on `ApplyScriptedAlarms`, build `ScriptedAlarmDefinition`s from the plans
(map `AlarmType`→`AlarmKind`, `Severity`→`AlarmSeverity`, `EquipmentId`→`EquipmentPath`),
`engine.LoadAsync`; register mux interest for the union of the plans' `DependencyRefs`; on
`DependencyValueChanged` → `_upstream.Push(...)`. Subscribe `engine.OnEvent` once →
map `ScriptedAlarmEvent.Condition` to `(active, acknowledged)` →
`OpcUaPublishActor.WriteAlarmState`; map → `AlarmHistorianEvent` → historian (if
Historize); publish `AlarmTransitionEvent` on `alerts`. Dispose engine in `PostStop`.
3. Run targeted tests (`dotnet test --filter ScriptedAlarmHostActor`). Commit by path.
---
### Task 10: Spawn + apply in DriverHostActor
**Classification:** standard · **~4 min** · **Parallelizable with:** none (depends T9)
**Files:**
- Modify: `src/Server/ZB.MOM.WW.OtOpcUa.Runtime/Drivers/DriverHostActor.cs`
(spawn `ScriptedAlarmHostActor` next to `VirtualTagHostActor` ~:197; tell
`ApplyScriptedAlarms(composition.EquipmentScriptedAlarms)` next to the vtag apply ~:532;
add an override field for tests like `_virtualTagHostOverride`).
- Modify: `src/Server/ZB.MOM.WW.OtOpcUa.Runtime/ServiceCollectionExtensions.cs` if the
host needs new injected deps (EF store factory, root logger, historian ref).
- Test: extend `DriverHostActorTests` — apply pushes `ApplyScriptedAlarms`.
**Steps:** mirror the VirtualTag spawn/apply exactly; thread `_opcUaPublishActor`,
`_dependencyMux`, the EF store, the root logger, the historian actor ref. Run + commit.
---
### Task 11: Retire the orphaned actor + F9b evaluator
**Classification:** small · **~3 min** · **Parallelizable with:** none (depends T9, T10)
**Files:**
- Delete: `src/Server/ZB.MOM.WW.OtOpcUa.Runtime/ScriptedAlarms/ScriptedAlarmActor.cs`
(+ its tests) and `src/Server/ZB.MOM.WW.OtOpcUa.Host/Engines/RoslynScriptedAlarmEvaluator.cs`.
- Modify: `src/Server/ZB.MOM.WW.OtOpcUa.Host/Program.cs` (remove F9b DI registration,
lines ~110-114) and any `IScriptedAlarmEvaluator` references. Keep
`EfAlarmActorStateStore` only if nothing else uses it — otherwise delete with the actor.
**Steps:** delete, fix build, run full `dotnet test` for the touched projects, commit by
path. (If something unexpectedly depends on these, stop and surface — don't expand scope.)
---
### Task 12: Live-verify Layer 1
**Classification:** verification · **Parallelizable with:** none (depends T10, T11)
**Steps:** rebuild docker-dev; author a scripted alarm whose predicate references a live
tag; drive the tag; confirm the alarm node flips active/clear, the historian queue
advances (`/alarms/historian`), the `alerts`/Alerts page shows it, and predicate
`ctx.Logger` output appears on `/script-log`. User drives sign-in. Defects → new tasks.
---
# LAYER 2 — F14b real Part 9 + client ack
### Task 13: SDK research spike (DeepWiki)
**Classification:** small (research) · **~5 min** · **Parallelizable with:** Layer-1 tasks
**Steps:** Use the DeepWiki MCP (`OPCFoundation/UA-.NETStandard`) to confirm: how to
create + add an `AlarmConditionState` (and Limit/OffNormal/Discrete subtypes) under a
parent in a `CustomNodeManager2`; how to set ActiveState/AckedState/ConfirmedState/
ShelvingState/Severity/Retain; how transitions fire events (`ReportEvent`); how inbound
`Acknowledge`/`Shelve`/`Confirm` method calls are dispatched + where to hook them. Write
findings to `docs/v2/f14b-part9-sdk-notes.md` (committed). This de-risks T14-T17.
---
### Task 14: Real condition-node materialisation
**Classification:** high-risk · **~5 min** · **Parallelizable with:** none (depends T13)
**Files:** `src/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer/OtOpcUaNodeManager.cs` (replace the
placeholder `[active, ack]` variable in `WriteAlarmState` / add a `MaterialiseAlarmCondition`
path per `AlarmType`); `Phase7Applier.cs` (call the new materialiser); tests where the
SDK allows (node existence/type assertions).
**Steps:** create real condition nodes on materialise; keep `WriteAlarmState` as a thin
shim during transition or replace its callers. Run + commit. (SDK threading: all via the
pinned `OpcUaPublishActor` dispatcher.)
---
### Task 15: Richer alarm-state bridge
**Classification:** standard · **~4 min** · **Parallelizable with:** Task 17 (depends T14, T9)
**Files:** `src/Server/ZB.MOM.WW.OtOpcUa.Runtime/OpcUa/OpcUaPublishActor.cs` (new message
carrying the full `AlarmConditionState`, not 2 bools); `ScriptedAlarmHostActor` bridge
(send the richer message); `OtOpcUaNodeManager` (apply full state to the condition).
Tests: message mapping.
---
### Task 16: Event firing on transition
**Classification:** high-risk · **~5 min** · **Parallelizable with:** none (depends T14, T15)
**Files:** `OtOpcUaNodeManager.cs` (`condition.ReportEvent(...)` on state change). Tests:
mapping/coverage where feasible; behaviour proven in T23 (old T19).
---
# LAYER 2 — inbound client ack/shelve (re-scoped 2026-06-11)
> **Status:** T0-T16 are **merged to `master`** (Layers 0+1 live-verified; Layer 2 Part-9
> nodes/state/events done). The original single **T17 "Inbound method dispatch + ack plumbing"**
> (`high-risk · ~5 min`) proved to be **four separate hard problems**, each its own task. After a
> 2026-06-11 deep dive into the real code, T17 is split into **T17-T20** and the old T18-T20 shift
> to **T21-T24** (old T19's Client.CLI work also grew a feature half, T22). This is the deferred
> "fresh piece": branch off the **current** `master` (`git switch -c feat/scriptlog-alarm-ack`),
> not the old `feat/scriptlog-alarm-runtime` base.
### Layer 2 design decisions (resolved in the re-scope deep dive)
These are the load-bearing findings the new tasks rest on — verified against the current code, not
the original recon's assumptions:
1. **Topology — same-node co-location, multi-node ownership.** The OPC UA SDK server (+
`OtOpcUaNodeManager`) and the `ScriptedAlarmHostActor` are **both spawned on every
driver-role node** in the same `ActorSystem` (`OtOpcUaServerHostedService` +
`DriverHostActor.SpawnScriptedAlarmHost`). So per node they're co-located and an in-process
`Tell` would reach the local host. **But** in a multi-driver-node cluster each node owns a
**disjoint** subset of alarms (its resident equipment, via the T6 artifact equipment-filter),
and a client connects to **one** node's server. Whether that node owns the alarm the client
acks is **not guaranteed**. **Decision:** route inbound commands over a new DPS topic
`alarm-commands` (mirrors `alerts`/`script-logs`), and have each `ScriptedAlarmHostActor`
**ignore commands for alarmIds its engine doesn't own**. This works same-node and cross-node
with one mechanism. (Open item to confirm in T18: whether each node's address space is
partitioned to its own equipment or replicated — if partitioned, a client can only ever ack
local alarms and the DPS broadcast is still correct, just always locally satisfied.)
2. **The node manager has no Akka handle and must stay that way.** `OtOpcUaNodeManager(server,
configuration)` (`OtOpcUaSdkServer.CreateMasterNodeManager`) holds **no** `IActorRef` /
`ActorSystem` / DI. The existing **forward** seam is `OpcUaPublishActor → IOpcUaAddressSpaceSink
(DeferredAddressSpaceSink → SdkAddressSpaceSink → node manager)`. For the **reverse** path the
node manager gets a **settable command-router delegate** (`Action<AlarmCommand>`), wired at
boot by `OtOpcUaServerHostedService` (which *does* have the DPS mediator) to publish onto
`alarm-commands`. The node manager itself never touches Akka.
3. **No explicit re-projection after an engine op.** Every `ScriptedAlarmEngine` op
(`AcknowledgeAsync`/`ConfirmAsync`/`OneShotShelveAsync`/`TimedShelveAsync`/`UnshelveAsync`/
`EnableAsync`/`DisableAsync`/`AddCommentAsync` — all exist, signatures verified) raises the
engine's `OnEvent`, which the host's **existing** `OnEngineEmission` already projects to the
node. So the inbound handler just calls the op and awaits — the ack visibly updates the node
for free. This makes T19 small.
4. **Roles are dropped at the impersonation seam.** `OpcUaApplicationHost.cs:292` does
`args.Identity = new UserIdentity(token)` and **discards** `result.Roles` (only logs them at
:293). `OpcUaUserAuthResult.Roles` is `IReadOnlyList<string>` (`ReadOnly`/`WriteOperate`/
`WriteTune`/`WriteConfigure`/`AlarmAck`); there is an `OpcUaOperation` enum
(`Core.Abstractions`) with `AlarmAcknowledge`/`AlarmConfirm`/`AlarmShelve`, but **no role is
consulted anywhere post-auth today** (writes aren't gated either — this is greenfield, not a
pattern to copy). **Risk (drives T17 being its own task):** it is **unconfirmed** that a custom
`UserIdentity` subclass survives the SDK round-trip back to `context.UserIdentity` inside a
method handler. T17 must *prove* the round-trip (integration assertion); fallback is populating
`GrantedRoleIds` (`NodeIdCollection`) by mapping role strings → role NodeIds, which is more work.
5. **Double-emit is real, and delta-gating resolves it.** `WriteAlarmCondition` calls
`ReportConditionEvent` **unconditionally** (`OtOpcUaNodeManager.cs:156`); the node manager keeps
**no** previous snapshot. Once inbound acks route through the engine, the SDK's own
`OnAcknowledgeCalled` auto-fires event E2 (applying acked state to the node) **and** the engine
round-trip re-projects → would fire E3. Because the SDK applies the acked state *before* the
async engine round-trip completes, **delta-gating `WriteAlarmCondition` against the node's
current state** suppresses E3 (no delta) while still firing on genuine engine-driven
transitions. That's T20. (Fallback if it proves racy: the correlation-suppression option already
sketched in the `:190-198` in-code note — skip engine re-projection for inbound-originated
transitions.)
---
### Task 17: Carry LDAP roles onto the OPC UA session identity
**Classification:** high-risk · **~5 min** · **Parallelizable with:** Task 22
**Files:**
- Create: `src/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer/Security/RoleCarryingUserIdentity.cs`
(`: UserIdentity`, adds `IReadOnlyList<string> Roles`).
- Modify: `src/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer/OpcUaApplicationHost.cs:292`
(`args.Identity = new RoleCarryingUserIdentity(token, result.Roles)`).
- Test: `tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.Tests/OpcUaApplicationHostImpersonationTests.cs`
(existing home for `HandleImpersonation`).
**Steps:**
1. **Round-trip spike FIRST (de-risk the whole task).** Before building anything, confirm the SDK
preserves a custom `IUserIdentity` instance: in a booted in-process server test (mirror
`SdkAddressSpaceSinkTests`' server fixture), set `args.Identity` to a sentinel subclass during
impersonation and assert a method handler reads it back via
`(context as ISessionOperationContext)?.UserIdentity` **as that subclass**. If it does NOT
survive (SDK wraps/strips it), STOP and switch to the `GrantedRoleIds` approach — surface this
as a scope change, don't silently expand.
2. Failing unit test: `HandleImpersonation` on a successful auth sets `args.Identity` to a
`RoleCarryingUserIdentity` whose `Roles` equals `result.Roles` (and the existing
identity/denial/anonymous tests still pass).
3. Implement `RoleCarryingUserIdentity` + the one-line `:292` swap.
4. Run (`OpcUaServer.Tests` + the impersonation tests) + commit by path.
> Security-path change → high-risk. Touches only the identity construction; no auth-decision logic
> changes (roles were already resolved, just discarded). Do **not** change `IOpcUaUserAuthenticator`
> or the LDAP bind.
---
### Task 18: Node-manager command router + `AlarmAck` veto gate + `alarm-commands` topic
**Classification:** high-risk · **~5 min** · **Parallelizable with:** none (depends T17)
**Files:**
- Create: `src/Core/ZB.MOM.WW.OtOpcUa.Commons/OpcUa/AlarmCommand.cs`
(`record AlarmCommand(string AlarmId, string Operation, string User, string? Comment, DateTime? UnshelveAtUtc)`;
`Operation` ∈ Acknowledge/Confirm/OneShotShelve/TimedShelve/Unshelve/Enable/Disable/AddComment).
- Modify: `src/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer/OtOpcUaNodeManager.cs` — add a settable
`Action<AlarmCommand>? AlarmCommandRouter`; in `MaterialiseAlarmCondition` (after `Create` +
initial state, before `AddChild`) wire `alarm.OnAcknowledge`/`OnConfirm`/`OnAddComment`/
`OnShelve`/`OnTimedUnshelve`. Each delegate: (a) read principal via
`(context as ISessionOperationContext)?.UserIdentity as RoleCarryingUserIdentity`, **gate on
`AlarmAck`** → return `StatusCodes.BadUserAccessDenied` if absent; (b) invoke `AlarmCommandRouter`
with the mapped `AlarmCommand` (so the engine updates the **domain** store + audit + alerts
historization); (c) return `ServiceResult.Good` so the SDK applies node state + auto-fires (the
engine re-projection is de-duped in T20).
- Modify: `src/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer/OtOpcUaSdkServer.cs` (pass-through to expose
the router setter on the node manager).
- Modify: `src/Server/ZB.MOM.WW.OtOpcUa.Host/OpcUa/OtOpcUaServerHostedService.cs` (after the server
starts + node manager exists: resolve the DPS mediator, set the router to
`mediator.Tell(new Publish(ScriptedAlarmHostActor.AlarmCommandsTopic, cmd))`).
- Add the topic const `AlarmCommandsTopic = "alarm-commands"` on `ScriptedAlarmHostActor` (used here
and in T19).
- Test: `OpcUaServer.Tests` — veto gate allows with `AlarmAck` / denies without (drive a wired
condition's `OnAcknowledge` with a `RoleCarryingUserIdentity` context); router invoked with the
correctly-mapped `AlarmCommand` (fake `Action`).
**Steps:** TDD the gate + router-mapping in the node manager; then the SDK-server pass-through; then
the hosted-service wiring (no unit test for the boot wiring — exercised by T23 live-verify). Commit
by path. **Serialize with T20** (both touch `OtOpcUaNodeManager.cs`).
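The gate-then-route order in the delegate (steps a to c above) can be sketched independently of the SDK. In this illustration `GateResult` stands in for the SDK's `ServiceResult`/`StatusCodes` values and `AlarmCommandGate` is a hypothetical holder for the settable router; the real delegates are wired onto the condition node and read the roles from the session identity.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Record shape as listed under Files for this task.
public sealed record AlarmCommand(string AlarmId, string Operation, string User,
    string? Comment, DateTime? UnshelveAtUtc);

// Stand-in for the SDK status values returned by the real delegates.
public enum GateResult { Good, BadUserAccessDenied }

public sealed class AlarmCommandGate
{
    // Settable so the hosted service can wire it after boot; null means "not wired yet".
    public Action<AlarmCommand>? AlarmCommandRouter { get; set; }

    public GateResult Handle(IReadOnlyList<string>? roles, string user, string alarmId,
        string operation, string? comment = null, DateTime? unshelveAtUtc = null)
    {
        // (a) Veto: no identity roles, or no AlarmAck role, means denied and nothing is routed.
        if (roles is null || !roles.Contains("AlarmAck"))
            return GateResult.BadUserAccessDenied;

        // (b) Route the mapped command; the node manager never touches Akka itself.
        AlarmCommandRouter?.Invoke(new AlarmCommand(alarmId, operation, user, comment, unshelveAtUtc));

        // (c) Good lets the SDK apply the node state and fire its own event.
        return GateResult.Good;
    }
}
```

Keeping the router a plain `Action<AlarmCommand>` is what lets the unit test substitute a fake and assert on the mapped command without a mediator.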
---
### Task 19: `ScriptedAlarmHostActor` inbound command handler
**Classification:** standard · **~4 min** · **Parallelizable with:** Task 20 (depends T18)
**Files:**
- Modify: `src/Server/ZB.MOM.WW.OtOpcUa.Runtime/ScriptedAlarms/ScriptedAlarmHostActor.cs` —
subscribe to `AlarmCommandsTopic` in `PreStart` (alongside the existing `_mediator` use);
`Receive<AlarmCommand>(OnAlarmCommand)`; `OnAlarmCommand` is `async void`, switches on
`Operation` → the matching `engine.<Op>Async(AlarmId, User, …, CancellationToken.None)`.
**Ownership filter:** if the engine doesn't own `AlarmId`, no-op (multi-node broadcast). Catch +
log op failures (mirror `OnLoadFailed`). **No explicit re-projection** — the engine's `OnEvent`
drives the existing `OnEngineEmission` → node update.
- Test: `tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests/ScriptedAlarms/ScriptedAlarmHostActorTests.cs`
(extend) — TestKit: an `AlarmCommand{Operation="Acknowledge"}` for a loaded alarm calls the
engine's `AcknowledgeAsync` (fake/probe engine seam); an unknown `AlarmId` is ignored;
`TimedShelve` without `UnshelveAtUtc` is rejected/logged, not thrown.
**Steps:** TDD via the existing host-actor test seam; run `dotnet test --filter ScriptedAlarmHostActor`;
commit by path.
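The ownership filter and operation switch can be sketched against a narrow engine seam. Everything here is illustrative: `IAlarmEngineSeam` is a hypothetical seam whose method names follow the ops listed in the design decisions but whose parameter lists are assumed, only two operations are shown, and the command record is named `AlarmCommandMsg` to keep the sketch standalone.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public sealed record AlarmCommandMsg(string AlarmId, string Operation, string User,
    string? Comment, DateTime? UnshelveAtUtc);

// Hypothetical seam over the engine; parameter lists are assumptions.
public interface IAlarmEngineSeam
{
    bool Owns(string alarmId);
    Task AcknowledgeAsync(string alarmId, string user, string? comment, CancellationToken ct);
    Task TimedShelveAsync(string alarmId, string user, DateTime unshelveAtUtc, CancellationToken ct);
}

public static class InboundCommandDispatcher
{
    // Returns true when an engine op ran; false when ignored, rejected, or failed.
    public static async Task<bool> DispatchAsync(IAlarmEngineSeam engine, AlarmCommandMsg cmd, Action<string> log)
    {
        if (!engine.Owns(cmd.AlarmId)) return false;   // another node owns it: silent no-op
        try
        {
            switch (cmd.Operation)
            {
                case "Acknowledge":
                    await engine.AcknowledgeAsync(cmd.AlarmId, cmd.User, cmd.Comment, CancellationToken.None);
                    return true;
                case "TimedShelve" when cmd.UnshelveAtUtc is DateTime until:
                    await engine.TimedShelveAsync(cmd.AlarmId, cmd.User, until, CancellationToken.None);
                    return true;
                default:
                    // Includes TimedShelve without UnshelveAtUtc: logged, not thrown.
                    log($"Rejected {cmd.Operation} for {cmd.AlarmId}");
                    return false;
            }
        }
        catch (Exception ex)
        {
            log($"{cmd.Operation} failed for {cmd.AlarmId}: {ex.Message}");
            return false;
        }
    }
}

// Hypothetical fake engine for tests: records which ops were called.
public sealed class RecordingEngine : IAlarmEngineSeam
{
    private readonly HashSet<string> _owned;
    public List<string> Calls { get; } = new();
    public RecordingEngine(params string[] owned) => _owned = new HashSet<string>(owned);
    public bool Owns(string alarmId) => _owned.Contains(alarmId);
    public Task AcknowledgeAsync(string alarmId, string user, string? comment, CancellationToken ct)
    { Calls.Add($"ack:{alarmId}"); return Task.CompletedTask; }
    public Task TimedShelveAsync(string alarmId, string user, DateTime unshelveAtUtc, CancellationToken ct)
    { Calls.Add($"shelve:{alarmId}"); return Task.CompletedTask; }
}
```

No re-projection appears in the dispatcher, consistent with design decision 3: the engine's own event drives the node update.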
---
### Task 20: Delta-gate event firing (kill the inbound double-emit)
**Classification:** high-risk · **~4 min** · **Parallelizable with:** Task 19 (depends T18)
**Files:**
- Modify: `src/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer/OtOpcUaNodeManager.cs` — keep a
`ConcurrentDictionary<string, AlarmConditionSnapshot> _lastAlarmState`; in `WriteAlarmCondition`,
after projecting, compare the new `state` to the stored snapshot and **only call
`ReportConditionEvent` when it differs** (then store it). Replace the now-stale `:151-156`
"fire exactly one event" comment + tighten the `:190-198` double-emit note to "resolved by
delta-gate".
- Test: `tests/.../OpcUaServer.Tests/SdkAddressSpaceSinkTests.cs` (extend) — two identical
`WriteAlarmCondition` calls fire the condition event **once**; a changed call fires again.
(Assert via an event-count probe / monitored-item on the booted server fixture.)
**Steps:** TDD the delta-gate; run `OpcUaServer.Tests`; commit by path. If the booted-server test
can't cleanly count events, fall back to asserting the gate's decision via a seam and prove
end-to-end in T23. **Serialize after T18** (same file).
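The delta-gate itself reduces to "compare with the last snapshot, fire only on change". The sketch below assumes the snapshot is a record (so `==` is value equality) and that writes are serialized on the pinned dispatcher, so the read-then-store is not made atomic; `AlarmSnapshotSketch` and `DeltaGatedReporter` are hypothetical names and the callback stands in for `ReportConditionEvent`.

```csharp
using System;
using System.Collections.Concurrent;

// Hypothetical snapshot; record value-equality performs the "differs?" comparison.
public sealed record AlarmSnapshotSketch(bool Active, bool Acked, bool Confirmed, string Shelving, int Severity);

public sealed class DeltaGatedReporter
{
    private readonly ConcurrentDictionary<string, AlarmSnapshotSketch> _last = new();
    private readonly Action<string, AlarmSnapshotSketch> _reportEvent;   // stand-in for ReportConditionEvent

    public DeltaGatedReporter(Action<string, AlarmSnapshotSketch> reportEvent) => _reportEvent = reportEvent;

    // Returns true when an event was fired.
    public bool Write(string alarmId, AlarmSnapshotSketch state)
    {
        if (_last.TryGetValue(alarmId, out var previous) && previous == state)
            return false;              // no delta: suppress the duplicate event

        _last[alarmId] = state;        // remember what was last reported
        _reportEvent(alarmId, state);
        return true;
    }
}
```

For the gate to suppress the engine's re-projection after an SDK-applied ack, the stored snapshot has to reflect the state the SDK already put on the node; this sketch leaves that update out.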
---
### Task 21: AdminUI ack/shelve control
**Classification:** standard · **~5 min** · **Parallelizable with:** none (depends T19)
**Files:**
- Create: `src/Core/ZB.MOM.WW.OtOpcUa.Commons/Messages/Admin/AcknowledgeAlarmCommand.cs` +
`ShelveAlarmCommand.cs` (control-plane messages, mirror `StartDeployment`'s shape with a
`CorrelationId`).
- Modify: `src/Server/ZB.MOM.WW.OtOpcUa.ControlPlane/AdminOperations/AdminOperationsActor.cs`
(the existing admin-pinned cluster singleton) — `ReceiveAsync` handlers that publish onto
`alarm-commands` (reusing T18's topic + the host's ownership filter → the singleton solves
cross-node routing for the AdminUI path too).
- Modify: `src/Server/ZB.MOM.WW.OtOpcUa.AdminUI/Clients/AdminOperationsClient.cs`
(+ its interface) — `AcknowledgeAlarmAsync` / `ShelveAlarmAsync` (mirror `StartDeploymentAsync`).
- Modify: `src/Server/ZB.MOM.WW.OtOpcUa.AdminUI/Components/Pages/Alerts.razor` — per-row
Acknowledge / Shelve buttons → `IAdminOperationsClient`; operator from
`AuthState … User.Identity?.Name`.
- Test: the control-plane command service + the new `AdminOperationsActor` handlers (TestKit / Ask).
**No bUnit** — the razor is proven in T23.
**Steps:** TDD the messages + actor handlers + client; wire the razor; run + commit by path.
---
### Task 22: Client.CLI ack / confirm / shelve commands
**Classification:** standard · **~5 min** · **Parallelizable with:** Tasks 17–21 (disjoint `Client.*`)
**Files:**
- Modify: `src/Client/ZB.MOM.WW.OtOpcUa.Client.Shared/IOpcUaClientService.cs` +
`OpcUaClientService.cs` — `AcknowledgeAlarmAsync` already declared (no command wires it yet); add
`ConfirmAlarmAsync` + `ShelveAlarmAsync` (call the SDK `Confirm`/`OneShotShelve`/`TimedShelve`/
`Unshelve` methods on the condition).
- Create: `src/Client/ZB.MOM.WW.OtOpcUa.Client.CLI/Commands/{Acknowledge,Confirm,Shelve}Command.cs`
(`--node`, `--event-id`, `--comment`; shelve adds `--kind OneShot|Timed --unshelve-at`).
- Test: unit-test what's pure (arg→request mapping); the live round-trip is T23.
**Steps:** add the service methods + CLI commands; build; commit by path. This is **net-new client
feature work** (the reason old "T19 verification" couldn't just be a verify pass).
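
The "pure" arg→request mapping that the unit test targets might be sketched as follows. Everything here is illustrative: `ShelveRequest`, `ShelveKind`, and `ShelveArgs.Map` are hypothetical names, and how the real CLI framework binds `--node`, `--kind`, `--unshelve-at`, and `--comment` is not assumed.

```csharp
#nullable enable
using System;
using System.Globalization;

public enum ShelveKind { OneShot, Timed }

// Hypothetical request type handed to the client service.
public sealed record ShelveRequest(string NodeId, ShelveKind Kind, DateTimeOffset? UnshelveAt, string? Comment);

public static class ShelveArgs
{
    public static ShelveRequest Map(string node, string kind, string? unshelveAt, string? comment)
    {
        // Reject unknown names and numeric values that are not defined members.
        if (!Enum.TryParse<ShelveKind>(kind, ignoreCase: true, out var parsedKind)
            || !Enum.IsDefined(typeof(ShelveKind), parsedKind))
            throw new ArgumentException("--kind must be OneShot or Timed.");

        DateTimeOffset? until = null;
        if (parsedKind == ShelveKind.Timed)
        {
            // A timed shelve without a parseable expiry is a usage error.
            if (!DateTimeOffset.TryParse(unshelveAt, CultureInfo.InvariantCulture,
                    DateTimeStyles.AssumeUniversal, out var parsed))
                throw new ArgumentException("--unshelve-at is required for --kind Timed.");
            until = parsed.ToUniversalTime();
        }

        return new ShelveRequest(node, parsedKind, until, comment);
    }
}
```

Validating here keeps the failure on the client side, before any call reaches the server.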
---
### Task 23: Live-verify Layer 2 end-to-end
**Classification:** verification · **Parallelizable with:** none (depends T18, T19, T20, T21, T22)
**Steps:** docker-dev up. Use the **already-deployed `t12-overheat`** alarm (rig state, below) as the
live condition. With `Client.CLI`: `alarms --refresh` shows the real condition; drive
`TestMachine_002.TestChangingInt` past the predicate so it fires an event on transition; call the new
`acknowledge` command → confirm the ack **round-trips** (node `AckedState` flips, exactly **one** ack
event fires — no double-emit, T20 — state **persists** across a node restart). Repeat the ack from
the AdminUI `/alerts` buttons (T21) and confirm parity. Verify the `AlarmAck` gate: a user **without**
`AlarmAck` is denied (`BadUserAccessDenied`). **Agent does not sign in** — user drives. Defects → new
fix tasks.
---
### Task 24: Docs + cleanup + finish branch
**Classification:** small · **~5 min** · **Parallelizable with:** none (depends all)
**Files:** update `docs/ScriptedAlarms.md`, `docs/VirtualTags.md`, `docs/v2/Runtime.md`,
`docs/AlarmTracking.md` (the inbound-ack + `AlarmAck`-gate flow is now real); **correct the stale
`docs/v2/phase-7-status.md` alarm-runtime status**; CLAUDE.md note. **Clean up the deliberately-left
rig artifacts** (`t12-overheat`, script `SC-ba675b168a85`, the `layer0-logcheck` vtag, and revert
filler-02's inert `cycle-time-s` logger line — redeploy). Delete/condense `resume.md` + `pending.md`.
Then run superpowers-extended-cc:finishing-a-development-branch (full `dotnet test`, merge to master).
---
## Execution notes
- **Parallel dispatch (Layers 0+1, done):** Layer 0 serial (T1→T2→T3→T4). Layer 1: **T5→T6** serial;
**T7, T8 parallel** with T5/T6; T9 waits on T6/T7/T8; T10→T11→T12 serial.
- **Parallel dispatch (Layer 2 remainder, T17–T24):**
- **T17** first (roles) — its **step-1 round-trip spike is a go/no-go gate** for the gate design.
- **T18** after T17 (the veto gate needs the roles). **T19 ∥ T20** after T18 (disjoint files:
`ScriptedAlarmHostActor.cs` vs `OtOpcUaNodeManager.cs`).
  - **T22 runs in parallel with the whole T17–T21 server chain** (only `Client.*` files).
- **T21** after T19. **T23** after T18/T19/T20/T21/T22. **T24** last.
- **One writer at a time** within a shared file: `OtOpcUaNodeManager.cs` is touched by **T18 and
T20 — serialize T18 → T20**. (Layers 0/1 already merged, so Program.cs/T14-T16 contention is moot.)
- **Layer boundaries are natural checkpoints** — Layers 0+1 shipped; the T17 round-trip spike is the
next gate before committing to the rest of the Layer 2 inbound epic.