# Script-log Engine Emit + Scripted-Alarm Runtime — Implementation Plan

> **For Claude:** REQUIRED SUB-SKILL: use superpowers-extended-cc:subagent-driven-development
> (or executing-plans) to implement this plan task-by-task.

**Goal:** Make the Script-log page tail real script output, and stand up scripted
alarms end-to-end, including real OPC UA Part 9 condition nodes and client ack.

**Architecture:** Three sequenced layers off one shared seam (a root script logger
fanning out to file + companion + a new DPS topic sink). Layer 0 = emit (F8 live).
Layer 1 = F9 engine runtime on the Akka equipment-namespace runtime. Layer 2 = F14b
real Part 9 nodes + events + inbound ack. Design:
`docs/plans/2026-06-10-script-log-and-scripted-alarm-runtime-design.md`.
Verified gap analysis: `pending.md`.

**Tech:** .NET 10, Akka.NET, EF Core (SQL prod / InMemory tests), Serilog, OPC
Foundation UA .NET Standard, xUnit + Shouldly, Akka TestKit. No bUnit.

**Hard rules (every task):** stage by explicit path — never `git add .`; never stage
`sql_login.txt` or `src/Server/ZB.MOM.WW.OtOpcUa.Host/pki/`; never echo the gateway API
key into a new tracked file; no force-push, no `--no-verify`. **No Configuration entity /
EF migration change** (`ScriptedAlarmState` table already exists). Agent does **not**
sign in to the AdminUI — the user drives live `/run`.

**Branch:** `feat/scriptlog-alarm-runtime` off `master @ df4c2657` (design committed there).

**Reference patterns to mirror:** `VirtualTagHostActor` (host actor shape),
`EfAlarmActorStateStore` (EF store shape), the `{{equip}}` two-seam parity work
(`Phase7Composer` ↔ `DeploymentArtifact`), `RoslynVirtualTagEvaluator` (evaluator).

---

# LAYER 0 — Shared script-log emit + F8 live

### Task 0: Branch + test-project check

**Classification:** small · **~2 min** · **Parallelizable with:** none

**Files:** none created (branch + verification only)

**Steps:**
1. `git switch -c feat/scriptlog-alarm-runtime` (off `master @ df4c2657`).
2. Confirm `tests/Core/ZB.MOM.WW.OtOpcUa.Core.Scripting.Tests/` exists and is in the
   `.slnx` (it does — `ScriptLoggerFactoryTests.cs` lives there). New Layer-0 tests land
   here. Confirm `tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests/` for Layer-1 tests.
3. `dotnet build ZB.MOM.WW.OtOpcUa.slnx` — green baseline. Commit nothing.

---

### Task 1: `IScriptLogPublisher` + `ScriptLogTopicSink`

**Classification:** standard · **~4 min** · **Parallelizable with:** none

**Files:**
- Create: `src/Core/ZB.MOM.WW.OtOpcUa.Core.Scripting/IScriptLogPublisher.cs`
- Create: `src/Core/ZB.MOM.WW.OtOpcUa.Core.Scripting/ScriptLogTopicSink.cs`
- Test: `tests/Core/ZB.MOM.WW.OtOpcUa.Core.Scripting.Tests/ScriptLogTopicSinkTests.cs`
- Maybe modify: `Core.Scripting.csproj` (add a ProjectReference to `Commons` for
  `ScriptLogEntry` if not already referenced — verify first).

**Step 1 — failing tests** (`ScriptLogTopicSinkTests`):
- A `LogEvent` (Information) with properties `ScriptId="S1"`, `VirtualTagId="V1"`,
  `EquipmentId="EQ1"`, message `"hello"` → the publisher receives one `ScriptLogEntry` with
  those fields, `Level=="Information"`, `Message=="hello"`.
- The `AlarmId` property maps to `ScriptLogEntry.AlarmId`; absent properties → null fields.
- A `Debug` event with the default `minLevel=Information` → the publisher receives **nothing**.
- The template message renders (`"v={V}"` + prop V=3 → `"v=3"`).

Use a fake `IScriptLogPublisher` that captures entries.

**Step 2 — run, expect fail** (types don't exist).

**Step 3 — implement:**

```csharp
using System;
using Serilog.Core;
using Serilog.Events;

// ScriptLogEntry comes from Commons (verify the ProjectReference first).
public interface IScriptLogPublisher { void Publish(ScriptLogEntry entry); }

// Serilog sink that forwards events at or above a minimum level to the publisher
// as ScriptLogEntry records.
public sealed class ScriptLogTopicSink : ILogEventSink
{
    private readonly IScriptLogPublisher _publisher;
    private readonly LogEventLevel _min;

    public ScriptLogTopicSink(IScriptLogPublisher publisher,
        LogEventLevel min = LogEventLevel.Information)
    {
        _publisher = publisher ?? throw new ArgumentNullException(nameof(publisher));
        _min = min;
    }

    public void Emit(LogEvent e)
    {
        // Below-threshold events (e.g. Debug at the default Information) never reach the topic.
        if (e is null || e.Level < _min) return;
        // String-valued scalar property, or null when absent / not a string.
        string? P(string k) => e.Properties.TryGetValue(k, out var v)
            && v is ScalarValue { Value: string s } ? s : null;

        _publisher.Publish(new ScriptLogEntry(
            ScriptId: P("ScriptId") ?? P("ScriptName") ?? "unknown",
            Level: e.Level.ToString(),
            Message: e.RenderMessage(),
            TimestampUtc: e.Timestamp.UtcDateTime,
            VirtualTagId: P("VirtualTagId"),
            AlarmId: P("AlarmId"),
            EquipmentId: P("EquipmentId")));
    }
}
```

(Property-name constants — reuse/extend `ScriptLoggerFactory`'s `ScriptNameProperty`;
add `ScriptIdProperty`/`VirtualTagIdProperty`/`AlarmIdProperty`/`EquipmentIdProperty`.)

**Step 4 — run tests, expect pass. Step 5 — commit** (`git add` the 3 files by path, plus
`Core.Scripting.csproj` if it was modified).

---

### Task 2: Root script logger + `DpsScriptLogPublisher` + Host wiring

**Classification:** standard · **~5 min** · **Parallelizable with:** none (depends T1)

**Files:**
- Create: `src/Server/ZB.MOM.WW.OtOpcUa.Runtime/Scripting/DpsScriptLogPublisher.cs`
  (or Host — wherever the `ActorSystem`/`Mediator` is reachable at construction).
- Create: `src/Server/ZB.MOM.WW.OtOpcUa.Host/Logging/ScriptRootLoggerFactory.cs`
  (builds the root `ILogger`: rolling `scripts-*.log` + `ScriptLogCompanionSink` +
  `ScriptLogTopicSink`).
- Modify: `src/Server/ZB.MOM.WW.OtOpcUa.Host/Program.cs` (build + register the root logger;
  register `IScriptLogPublisher`).
- Test: `tests/.../Core.Scripting.Tests/ScriptRootLoggerFanoutTests.cs` (or Host.Tests).

**Steps (TDD):**
1. Failing test: a logger built by `ScriptRootLoggerFactory` with a fake publisher +
   in-memory companion → an `Error` event reaches both the companion mirror AND the topic
   publisher; a `Debug` event reaches neither topic nor companion (file only). (Assert
   via fakes; don't assert the physical file.)
2. Implement `DpsScriptLogPublisher` — its ctor takes the DPS mediator `IActorRef` (or
   `ActorSystem`); `Publish` → `mediator.Tell(new Publish(VirtualTagActor.ScriptLogsTopic, entry))`
   (the topic constant, value `"script-logs"`).
3. Implement `ScriptRootLoggerFactory.Build(IScriptLogPublisher, config)` →
   `new LoggerConfiguration().WriteTo.File(...).WriteTo.Sink(new ScriptLogCompanionSink(Log.Logger))
   .WriteTo.Sink(new ScriptLogTopicSink(publisher, minLevel)).CreateLogger()`.
4. `Program.cs`: resolve the mediator after the ActorSystem is up; register
   `IScriptLogPublisher` (singleton) + the root `ILogger` (keyed/named for scripts).
   Min-level from config (`Scripting:LogTopicMinLevel`, default `Information`).
5. Run + commit by path.

---

### Task 3: Rewire evaluators to the root script logger

**Classification:** standard · **~5 min** · **Parallelizable with:** none (depends T1, T2)

**Files:**
- Modify: `src/Server/ZB.MOM.WW.OtOpcUa.Host/Engines/RoslynVirtualTagEvaluator.cs`
- Modify: `src/Server/ZB.MOM.WW.OtOpcUa.Host/Engines/RoslynScriptedAlarmEvaluator.cs`
- Modify: `src/Core/ZB.MOM.WW.OtOpcUa.Core.Scripting/ScriptLoggerFactory.cs` (bind the
  full property set, not just `ScriptName`).
- Modify: `src/Server/ZB.MOM.WW.OtOpcUa.Host/Program.cs` (inject the root logger into the
  evaluators).
- Test: `tests/.../Core.Scripting.Tests/` — evaluator emits via a fake publisher.

**Steps:**
1. Failing test: a `RoslynVirtualTagEvaluator` built with a root logger wired to a fake
   publisher; evaluating the script `ctx.Logger.Information("hi"); return 1;` → the publisher
   gets one entry with `ScriptId`/`VirtualTagId` bound and `Message=="hi"`.
2. Replace the static `ScriptLogger` field with a ctor-injected root `ILogger`. Per
   evaluation, `var log = _root.ForContext("ScriptId", virtualTagId).ForContext("VirtualTagId", virtualTagId)`
   (+ `EquipmentId` when available) and pass it into the `VirtualTagContext`. Do the same for
   the alarm evaluator (binds `AlarmId`).
3. `ScriptLoggerFactory`: add a `Create(scriptId, virtualTagId?, alarmId?, equipmentId?)`
   overload binding the standard properties (keep the old `Create(scriptName)` for
   compatibility).
4. `Program.cs`: pass the root logger to both evaluator registrations.
5. Run + commit by path.

> Note: `IVirtualTagEvaluator.Evaluate` carries `virtualTagId`; in the live path
> `scriptId == virtualTagId`, so Layer 0 binds both from it. Threading a distinct
> `EquipmentId` (nice-to-have on the page) is optional here — if it requires an
> interface change, defer it to a Layer-1 follow-up rather than expanding T3.

---

### Task 4: Live-verify Layer 0

**Classification:** verification · **Parallelizable with:** none (depends T2, T3)

**Steps:**
1. Rebuild docker-dev central nodes (user-driven `/run`). Author a virtual tag whose
   script calls `ctx.Logger.Information(...)`.
2. Open `/script-log`; drive the dependency so the script evaluates; confirm the line
   appears live with the right ScriptId/level. Confirm Debug stays off the page,
   Information+ shows.
3. **Agent does not sign in** — user signs in and drives. Record outcome. No code unless
   a defect surfaces (→ new fix task).

---

# LAYER 1 — F9 engine runtime

### Task 5: `EquipmentScriptedAlarmPlan` + Phase7Composer enrichment

**Classification:** standard · **~5 min** · **Parallelizable with:** Task 7, Task 8

**Files:**
- Modify: `src/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer/Phase7Composer.cs` (new record +
  build the enriched list from `ScriptedAlarm` + `Script` rows).
- Test: `tests/Server/ZB.MOM.WW.OtOpcUa.Server.Tests/Phase7/` (or wherever Phase7Composer
  is tested) — new `Phase7ComposerScriptedAlarmTests.cs`.

**Steps:**
1. Failing test: compose two equipments, each with a scripted alarm referencing a script;
   assert each `EquipmentScriptedAlarmPlan` carries the resolved `PredicateSource`, the
   extracted `DependencyRefs` (via `DependencyExtractor`), `AlarmType`, `Severity`,
   `MessageTemplate`, `HistorizeToAveva`, `Retain`, `Enabled`, `Name`.
2. Add `public sealed record EquipmentScriptedAlarmPlan(string ScriptedAlarmId, string
   EquipmentId, string Name, string AlarmType, int Severity, string MessageTemplate,
   string PredicateScriptId, string PredicateSource, IReadOnlyList<string> DependencyRefs,
   bool HistorizeToAveva, bool Retain, bool Enabled);`
3. In `Compose`: join `ScriptedAlarm.PredicateScriptId → Script.SourceCode`; run
   `DependencyExtractor.Extract(source).Reads` (∪ `MessageTemplate` token paths) for
   `DependencyRefs`; project into the new list on the composition result. Carry
   `Enabled=false` alarms with their flag rather than skipping them (the host decides).
   Drop alarms whose script is missing, with a structured warning (don't throw the whole
   compose).
4. Run + commit by path.

---

### Task 6: DeploymentArtifact parity for the alarm plan

**Classification:** standard · **~5 min** · **Parallelizable with:** Task 7, Task 8 (depends T5)

**Files:**
- Modify: `src/Server/ZB.MOM.WW.OtOpcUa.Runtime/Drivers/DeploymentArtifact.cs`
  (encode/decode `EquipmentScriptedAlarmPlan`; add
  `Phase7CompositionResult.EquipmentScriptedAlarms`; filter-by-equipment like
  `EquipmentVirtualTags` at :263).
- Test: `tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests/` — artifact round-trip + parity
  with the Composer for the same input.

**Steps:**
1. Failing test: build a composition via `Phase7Composer`, serialize to artifact, parse
   back → `EquipmentScriptedAlarms` is byte-identical (same discipline as the `{{equip}}`
   parity tests). Equipment-filter test (only alarms for resident equipment survive).
2. Add the field to `Phase7CompositionResult`; mirror the `EquipmentVirtualTags`
   encode/decode/filter exactly (`:202`, `:263`).
3. Run + commit by path.
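A minimal sketch of the resident-equipment filter plus a parity comparison, assuming the Task 5 record shape; `AlarmPlanParitySketch`, `FilterToResident` and `SamePlan` are hypothetical names, and the real code should mirror the existing `EquipmentVirtualTags` handling rather than this. The detail worth carrying into the parity test is that C# record equality compares the `IReadOnlyList<string> DependencyRefs` field by reference, so a decoded plan will not compare equal to the composed one with `==` even when every value matches.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Record shape copied from Task 5.
public sealed record EquipmentScriptedAlarmPlan(string ScriptedAlarmId, string EquipmentId,
    string Name, string AlarmType, int Severity, string MessageTemplate,
    string PredicateScriptId, string PredicateSource, IReadOnlyList<string> DependencyRefs,
    bool HistorizeToAveva, bool Retain, bool Enabled);

public static class AlarmPlanParitySketch
{
    // Hypothetical helper: keep only alarms whose equipment is resident on this node,
    // preserving the composer's order.
    public static IReadOnlyList<EquipmentScriptedAlarmPlan> FilterToResident(
        IEnumerable<EquipmentScriptedAlarmPlan> plans, ISet<string> residentEquipmentIds)
        => plans.Where(p => residentEquipmentIds.Contains(p.EquipmentId)).ToList();

    // Records compare the list field by reference, so compare the scalar fields with the
    // list neutralised (same cached empty array on both sides) and the list by content.
    public static bool SamePlan(EquipmentScriptedAlarmPlan x, EquipmentScriptedAlarmPlan y)
        => (x with { DependencyRefs = Array.Empty<string>() })
               == (y with { DependencyRefs = Array.Empty<string>() })
           && x.DependencyRefs.SequenceEqual(y.DependencyRefs);
}
```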

---

### Task 7: `DependencyMuxTagUpstreamSource`

**Classification:** standard · **~4 min** · **Parallelizable with:** Task 5, Task 6, Task 8

**Files:**
- Create: `src/Server/ZB.MOM.WW.OtOpcUa.Runtime/ScriptedAlarms/DependencyMuxTagUpstreamSource.cs`
  (implements `Core.ScriptedAlarms`/`Core.VirtualTags` `ITagUpstreamSource`).
- Test: `tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests/ScriptedAlarms/DependencyMuxTagUpstreamSourceTests.cs`

**Steps:**
1. Failing tests: `Push(path, snapshot)` updates the cache so `ReadTag(path)` returns it;
   `SubscribeTag(path, obs)` → `obs` fires on the next `Push`; `ReadTag` for an unknown
   path returns a Bad-quality snapshot; disposing the subscription removes the observer.
2. Implement: a thread-safe cache (`ConcurrentDictionary<string, DataValueSnapshot>`) +
   per-path observer list; `Push` updates the cache, then invokes observers; `ReadTag` reads
   the cache (Bad if absent); `SubscribeTag` returns an `IDisposable` that deregisters. The
   host actor calls `Push` from its `DependencyValueChanged` handler. Value wrap:
   `new DataValueSnapshot(value, StatusCode:0, ts, ts)`.
3. Run + commit by path.
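A minimal, BCL-only sketch of the cache + observer mux from step 2. `DataValueSnapshot` here is a stand-in whose field names are assumed from the `new DataValueSnapshot(value, StatusCode:0, ts, ts)` wrap, the `Action<DataValueSnapshot>` observer type is an assumption about `SubscribeTag`, and `DependencyMuxSketch` deliberately does not claim to match the real `ITagUpstreamSource` signatures. Observers are held in immutable lists so `Push` can iterate without a lock while other threads subscribe or dispose.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Immutable;
using System.Threading;

// Stand-in for the Core snapshot type; field names are assumptions.
public sealed record DataValueSnapshot(object? Value, uint StatusCode,
    DateTime SourceTimestampUtc, DateTime ServerTimestampUtc);

public sealed class DependencyMuxSketch
{
    // Generic OPC UA Bad severity bit; the real source may use a more specific Bad code.
    public const uint BadStatus = 0x80000000;

    private readonly ConcurrentDictionary<string, DataValueSnapshot> _cache = new();
    private readonly ConcurrentDictionary<string, ImmutableList<Action<DataValueSnapshot>>> _observers = new();

    // The "value wrap" from step 2: StatusCode 0 is Good.
    public static DataValueSnapshot Wrap(object? value, DateTime tsUtc) =>
        new(value, StatusCode: 0, tsUtc, tsUtc);

    // Called by the host actor from its DependencyValueChanged handler.
    public void Push(string path, DataValueSnapshot snapshot)
    {
        _cache[path] = snapshot; // cache first, so an observer that re-reads sees the new value
        if (_observers.TryGetValue(path, out var observers))
            foreach (var observer in observers) // immutable list: safe during concurrent (un)subscribe
                observer(snapshot);
    }

    public DataValueSnapshot ReadTag(string path) =>
        _cache.TryGetValue(path, out var snapshot)
            ? snapshot
            : new DataValueSnapshot(null, BadStatus, DateTime.MinValue, DateTime.UtcNow);

    public IDisposable SubscribeTag(string path, Action<DataValueSnapshot> observer)
    {
        _observers.AddOrUpdate(path,
            _ => ImmutableList.Create(observer),
            (_, current) => current.Add(observer));
        return new Unsubscriber(() => _observers.AddOrUpdate(path,
            _ => ImmutableList<Action<DataValueSnapshot>>.Empty,
            (_, current) => current.Remove(observer)));
    }

    private sealed class Unsubscriber(Action unsubscribe) : IDisposable
    {
        private Action? _unsubscribe = unsubscribe;
        // Interlocked.Exchange makes a second Dispose a no-op.
        public void Dispose() => Interlocked.Exchange(ref _unsubscribe, null)?.Invoke();
    }
}
```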

---

### Task 8: `EfAlarmConditionStateStore : IAlarmStateStore`

**Classification:** standard · **~5 min** · **Parallelizable with:** Task 5, Task 6, Task 7

**Files:**
- Create: `src/Server/ZB.MOM.WW.OtOpcUa.Runtime/ScriptedAlarms/EfAlarmConditionStateStore.cs`
- Test: `tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests/ScriptedAlarms/EfAlarmConditionStateStoreTests.cs`
  (in-memory EF).

**Steps:**
1. Failing tests (in-memory `OtOpcUaConfigDbContext`): `SaveAsync(state)` then
   `LoadAsync(alarmId)` round-trips Enabled/Acked/Confirmed/Shelving(+UnshelveAtUtc)/
   `LastAck*`/`LastConfirm*`/Comments; `LoadAsync` of an unknown id → null; `ActiveState`
   is **not** persisted (a saved state's Active is ignored on load — load returns the
   stored operator state with Active at its default). Comments JSON round-trips.
2. Implement the mapping `AlarmConditionState` ↔ `ScriptedAlarmState` entity (mirror
   `EfAlarmActorStateStore`'s `IDbContextFactory` upsert pattern; serialize
   `ImmutableList<AlarmComment>` ↔ `CommentsJson`). Map enum states ↔ the entity's string
   columns.
3. Run + commit by path.
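A trimmed sketch of the mapping rules the tests pin down: enum state stored as a string column, comments stored as JSON, and Active never persisted. Every type here is a hypothetical stand-in (the real state and entity carry more fields), and the EF upsert itself is left to the `EfAlarmActorStateStore` pattern.

```csharp
using System;
using System.Collections.Immutable;
using System.Text.Json;

// Hypothetical, trimmed stand-ins: the real AlarmConditionState / ScriptedAlarmState
// also carry Enabled, Confirmed, UnshelveAtUtc, LastAck*, LastConfirm*, ...
public enum ShelvingKind { Unshelved, OneShot, Timed }
public sealed record AlarmComment(string User, string Text, DateTime AtUtc);
public sealed record ConditionStateSketch(string AlarmId, bool Active, bool Acked,
    ShelvingKind Shelving, ImmutableList<AlarmComment> Comments);

public sealed class StateRowSketch // entity-row shape; note there is no Active column
{
    public string AlarmId { get; set; } = "";
    public bool Acked { get; set; }
    public string Shelving { get; set; } = nameof(ShelvingKind.Unshelved);
    public string CommentsJson { get; set; } = "[]";
}

public static class StateMappingSketch
{
    public static StateRowSketch ToRow(ConditionStateSketch s) => new()
    {
        AlarmId = s.AlarmId,
        Acked = s.Acked,
        Shelving = s.Shelving.ToString(),                    // enum -> string column
        CommentsJson = JsonSerializer.Serialize(s.Comments), // ImmutableList -> JSON array
        // Active is deliberately not written (Task 8: ActiveState is not persisted).
    };

    public static ConditionStateSketch FromRow(StateRowSketch r) => new(
        r.AlarmId,
        Active: false, // default on load; presumably re-derived once the predicate re-evaluates
        r.Acked,
        Enum.Parse<ShelvingKind>(r.Shelving),
        JsonSerializer.Deserialize<ImmutableList<AlarmComment>>(r.CommentsJson)
            ?? ImmutableList<AlarmComment>.Empty);
}
```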

---

### Task 9: `ScriptedAlarmHostActor`

**Classification:** high-risk · **~5 min** · **Parallelizable with:** none (depends T6, T7, T8; needs Layer 0 T2/T3 root logger)

**Files:**
- Create: `src/Server/ZB.MOM.WW.OtOpcUa.Runtime/ScriptedAlarms/ScriptedAlarmHostActor.cs`
- Test: `tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests/ScriptedAlarms/ScriptedAlarmHostActorTests.cs`
  (Akka TestKit + a real engine with the fake upstream, or a fake engine seam).

**Design:** mirrors `VirtualTagHostActor`. Owns one `ScriptedAlarmEngine` (built with the
`DependencyMuxTagUpstreamSource`, the `EfAlarmConditionStateStore`, a `ScriptLoggerFactory`
wrapping the **Layer 0 root logger**, and the engine's root logger). Message:
`ApplyScriptedAlarms(IReadOnlyList<EquipmentScriptedAlarmPlan> Plans)`.

**Steps:**
1. Failing TestKit tests:
   - `ApplyScriptedAlarms` with one alarm → engine loaded (assert via a probe/seam);
     registers interest with the (probe) mux for the alarm's dep refs.
   - A `DependencyValueChanged` that makes the predicate true → the host tells the
     (probe) `OpcUaPublishActor` a `WriteAlarmState(alarmId, active:true, …)`, tells the
     (probe) historian an `AlarmHistorianEvent` (when `HistorizeToAveva`), and publishes
     an `AlarmTransitionEvent` on `alerts`.
   - Re-`ApplyScriptedAlarms` with a different set reloads the engine (LoadAsync replace).
2. Implement: on `ApplyScriptedAlarms`, build `ScriptedAlarmDefinition`s from the plans
   (map `AlarmType`→`AlarmKind`, `Severity`→`AlarmSeverity`, `EquipmentId`→`EquipmentPath`),
   call `engine.LoadAsync`, and register mux interest for `⋃ DependencyRefs`; on
   `DependencyValueChanged` → `_upstream.Push(...)`. Subscribe to `engine.OnEvent` once →
   map `ScriptedAlarmEvent.Condition` to `(active, acknowledged)` →
   `OpcUaPublishActor.WriteAlarmState`; map → `AlarmHistorianEvent` → historian (if
   `HistorizeToAveva`); publish `AlarmTransitionEvent` on `alerts`. Dispose the engine in
   `PostStop`.
3. Run targeted tests (`dotnet test --filter ScriptedAlarmHostActor`). Commit by path.

---

### Task 10: Spawn + apply in DriverHostActor

**Classification:** standard · **~4 min** · **Parallelizable with:** none (depends T9)

**Files:**
- Modify: `src/Server/ZB.MOM.WW.OtOpcUa.Runtime/Drivers/DriverHostActor.cs`
  (spawn `ScriptedAlarmHostActor` next to `VirtualTagHostActor` ~:197; tell
  `ApplyScriptedAlarms(composition.EquipmentScriptedAlarms)` next to the vtag apply ~:532;
  add an override field for tests like `_virtualTagHostOverride`).
- Modify: `src/Server/ZB.MOM.WW.OtOpcUa.Runtime/ServiceCollectionExtensions.cs` if the
  host needs new injected deps (EF store factory, root logger, historian ref).
- Test: extend `DriverHostActorTests` — apply pushes `ApplyScriptedAlarms`.

**Steps:** mirror the VirtualTag spawn/apply exactly; thread `_opcUaPublishActor`,
`_dependencyMux`, the EF store, the root logger, the historian actor ref. Run + commit.

---

### Task 11: Retire the orphaned actor + F9b evaluator

**Classification:** small · **~3 min** · **Parallelizable with:** none (depends T9, T10)

**Files:**
- Delete: `src/Server/ZB.MOM.WW.OtOpcUa.Runtime/ScriptedAlarms/ScriptedAlarmActor.cs`
  (+ its tests) and `src/Server/ZB.MOM.WW.OtOpcUa.Host/Engines/RoslynScriptedAlarmEvaluator.cs`.
- Modify: `src/Server/ZB.MOM.WW.OtOpcUa.Host/Program.cs` (remove the F9b DI registration,
  lines ~110-114) and any `IScriptedAlarmEvaluator` references. Keep
  `EfAlarmActorStateStore` only if something else still uses it; otherwise delete it with
  the actor.

**Steps:** delete, fix the build, run full `dotnet test` for the touched projects, commit by
path. (If something unexpectedly depends on these, stop and surface it — don't expand scope.)

---

### Task 12: Live-verify Layer 1

**Classification:** verification · **Parallelizable with:** none (depends T10, T11)

**Steps:** rebuild docker-dev; author a scripted alarm whose predicate references a live
tag; drive the tag; confirm the alarm node flips active/clear, the historian queue
advances (`/alarms/historian`), the `alerts`/Alerts page shows it, and predicate
`ctx.Logger` output appears on `/script-log`. User drives sign-in. Defects → new tasks.

---

# LAYER 2 — F14b real Part 9 + client ack

### Task 13: SDK research spike (DeepWiki)

**Classification:** small (research) · **~5 min** · **Parallelizable with:** Layer-1 tasks

**Steps:** Use the DeepWiki MCP (`OPCFoundation/UA-.NETStandard`) to confirm: how to
create + add an `AlarmConditionState` (and Limit/OffNormal/Discrete subtypes) under a
parent in a `CustomNodeManager2`; how to set ActiveState/AckedState/ConfirmedState/
ShelvingState/Severity/Retain; how transitions fire events (`ReportEvent`); and how inbound
`Acknowledge`/`Shelve`/`Confirm` method calls are dispatched and where to hook them. Write
findings to `docs/v2/f14b-part9-sdk-notes.md` (committed). This de-risks T14-T20 (the
original T14-T17, before the T17 split).

---

### Task 14: Real condition-node materialisation

**Classification:** high-risk · **~5 min** · **Parallelizable with:** none (depends T13)

**Files:** `src/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer/OtOpcUaNodeManager.cs` (replace the
placeholder `[active, ack]` variable in `WriteAlarmState` / add a `MaterialiseAlarmCondition`
path per `AlarmType`); `Phase7Applier.cs` (call the new materialiser); tests where the
SDK allows (node existence/type assertions).

**Steps:** create real condition nodes on materialise; keep `WriteAlarmState` as a thin
shim during transition or replace its callers. Run + commit. (SDK threading: all via the
pinned `OpcUaPublishActor` dispatcher.)

---

### Task 15: Richer alarm-state bridge

**Classification:** standard · **~4 min** · **Parallelizable with:** Task 17 (depends T14, T9)

**Files:** `src/Server/ZB.MOM.WW.OtOpcUa.Runtime/OpcUa/OpcUaPublishActor.cs` (new message
carrying the full `AlarmConditionState`, not 2 bools); `ScriptedAlarmHostActor` bridge
(send the richer message); `OtOpcUaNodeManager` (apply full state to the condition).
Tests: message mapping.

---

### Task 16: Event firing on transition

**Classification:** high-risk · **~5 min** · **Parallelizable with:** none (depends T14, T15)

**Files:** `OtOpcUaNodeManager.cs` (`condition.ReportEvent(...)` on state change). Tests:
mapping/coverage where feasible; behaviour proven in the T23 live-verify (old T19).

---

# LAYER 2 — inbound client ack/shelve (re-scoped 2026-06-11)

> **Status:** T0–T16 are **merged to `master`** (Layers 0+1 live-verified; Layer 2 Part-9
> nodes/state/events done). The original single **T17 "Inbound method dispatch + ack plumbing"**
> (`high-risk · ~5 min`) proved to be **four separate hard problems**, each its own task. After a
> 2026-06-11 deep dive into the real code, T17 is split into **T17–T20** and the old T18–T20 shift
> to **T21–T24** (old T19's Client.CLI work also grew a feature half, T22). This is the deferred
> "fresh piece": branch off the **current** `master` (`git switch -c feat/scriptlog-alarm-ack`),
> not the old `feat/scriptlog-alarm-runtime` base.

### Layer 2 design decisions (resolved in the re-scope deep dive)

These are the load-bearing findings the new tasks rest on — verified against the current code, not
the original recon's assumptions:

1. **Topology — same-node co-location, multi-node ownership.** The OPC UA SDK server (+ `OtOpcUaNodeManager`)
   and the `ScriptedAlarmHostActor` are **both spawned on every driver-role node** in the same
   `ActorSystem` (`OtOpcUaServerHostedService` + `DriverHostActor.SpawnScriptedAlarmHost`). So per
   node they're co-located and an in-process `Tell` would reach the local host. **But** in a
   multi-driver-node cluster each node owns a **disjoint** subset of alarms (its resident equipment,
   via the T6 artifact equipment-filter), and a client connects to **one** node's server. Whether
   that node owns the alarm the client acks is **not guaranteed**. **Decision:** route inbound
   commands over a new DPS topic `alarm-commands` (mirrors `alerts`/`script-logs`), and have each
   `ScriptedAlarmHostActor` **ignore commands for alarmIds its engine doesn't own**. This works
   same-node and cross-node with one mechanism. (Open item to confirm in T18: whether each node's
   address space is partitioned to its own equipment or replicated — if partitioned, a client can
   only ever ack local alarms and the DPS broadcast is still correct, just always locally satisfied.)

2. **The node manager has no Akka handle and must stay that way.** `OtOpcUaNodeManager(server,
   configuration)` (`OtOpcUaSdkServer.CreateMasterNodeManager`) holds **no** `IActorRef` /
   `ActorSystem` / DI. The existing **forward** seam is `OpcUaPublishActor → IOpcUaAddressSpaceSink
   (DeferredAddressSpaceSink → SdkAddressSpaceSink → node manager)`. For the **reverse** path the
   node manager gets a **settable command-router delegate** (`Action<AlarmCommand>`), wired at
   boot by `OtOpcUaServerHostedService` (which *does* have the DPS mediator) to publish onto
   `alarm-commands`. The node manager itself never touches Akka.

3. **No explicit re-projection after an engine op.** Every `ScriptedAlarmEngine` op
   (`AcknowledgeAsync`/`ConfirmAsync`/`OneShotShelveAsync`/`TimedShelveAsync`/`UnshelveAsync`/
   `EnableAsync`/`DisableAsync`/`AddCommentAsync` — all exist, signatures verified) raises the
   engine's `OnEvent`, which the host's **existing** `OnEngineEmission` already projects to the
   node. So the inbound handler just calls the op and awaits — the ack visibly updates the node
   for free. This makes T19 small.

4. **Roles are dropped at the impersonation seam.** `OpcUaApplicationHost.cs:292` does
   `args.Identity = new UserIdentity(token)` and **discards** `result.Roles` (only logs them at
   :293). `OpcUaUserAuthResult.Roles` is `IReadOnlyList<string>` (`ReadOnly`/`WriteOperate`/
   `WriteTune`/`WriteConfigure`/`AlarmAck`); there is an `OpcUaOperation` enum
   (`Core.Abstractions`) with `AlarmAcknowledge`/`AlarmConfirm`/`AlarmShelve`, but **no role is
   consulted anywhere post-auth today** (writes aren't gated either — this is greenfield, not a
   pattern to copy). **Risk (drives T17 being its own task):** it is **unconfirmed** that a custom
   `UserIdentity` subclass survives the SDK round-trip back to `context.UserIdentity` inside a
   method handler. T17 must *prove* the round-trip (integration assertion); fallback is populating
   `GrantedRoleIds` (`NodeIdCollection`) by mapping role strings → role NodeIds, which is more work.

5. **Double-emit is real, and delta-gating resolves it.** `WriteAlarmCondition` calls
   `ReportConditionEvent` **unconditionally** (`OtOpcUaNodeManager.cs:156`); the node manager keeps
   **no** previous snapshot. Once inbound acks route through the engine, the SDK's own
   `OnAcknowledgeCalled` auto-fires event E2 (applying acked state to the node) **and** the engine
   round-trip re-projects → would fire E3. Because the SDK applies the acked state *before* the
   async engine round-trip completes, **delta-gating `WriteAlarmCondition` against the node's
   current state** suppresses E3 (no delta) while still firing on genuine engine-driven
   transitions. That's T20. (Fallback if it proves racy: the correlation-suppression option already
   sketched in the `:190-198` in-code note — skip engine re-projection for inbound-originated
   transitions.)

---

### Task 17: Carry LDAP roles onto the OPC UA session identity

**Classification:** high-risk · **~5 min** · **Parallelizable with:** Task 22

**Files:**
- Create: `src/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer/Security/RoleCarryingUserIdentity.cs`
  (`: UserIdentity`, adds `IReadOnlyList<string> Roles`).
- Modify: `src/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer/OpcUaApplicationHost.cs:292`
  (`args.Identity = new RoleCarryingUserIdentity(token, result.Roles)`).
- Test: `tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.Tests/OpcUaApplicationHostImpersonationTests.cs`
  (existing home for `HandleImpersonation`).

**Steps:**
1. **Round-trip spike FIRST (de-risk the whole task).** Before building anything, confirm the SDK
   preserves a custom `IUserIdentity` instance: in a booted in-process server test (mirror
   `SdkAddressSpaceSinkTests`' server fixture), set `args.Identity` to a sentinel subclass during
   impersonation and assert a method handler reads it back via
   `(context as ISessionOperationContext)?.UserIdentity` **as that subclass**. If it does NOT
   survive (SDK wraps/strips it), STOP and switch to the `GrantedRoleIds` approach — surface this
   as a scope change, don't silently expand.
2. Failing unit test: `HandleImpersonation` on a successful auth sets `args.Identity` to a
   `RoleCarryingUserIdentity` whose `Roles` equals `result.Roles` (and the existing
   identity/denial/anonymous tests still pass).
3. Implement `RoleCarryingUserIdentity` + the one-line `:292` swap.
4. Run (`OpcUaServer.Tests` + the impersonation tests) + commit by path.

> Security-path change → high-risk. Touches only the identity construction; no auth-decision logic
> changes (roles were already resolved, just discarded). Do **not** change `IOpcUaUserAuthenticator`
> or the LDAP bind.

---

### Task 18: Node-manager command router + `AlarmAck` veto gate + `alarm-commands` topic

**Classification:** high-risk · **~5 min** · **Parallelizable with:** none (depends T17)

**Files:**
- Create: `src/Core/ZB.MOM.WW.OtOpcUa.Commons/OpcUa/AlarmCommand.cs`
  (`record AlarmCommand(string AlarmId, string Operation, string User, string? Comment, DateTime? UnshelveAtUtc)`;
  `Operation` ∈ Acknowledge/Confirm/OneShotShelve/TimedShelve/Unshelve/Enable/Disable/AddComment).
- Modify: `src/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer/OtOpcUaNodeManager.cs` — add a settable
  `Action<AlarmCommand>? AlarmCommandRouter`; in `MaterialiseAlarmCondition` (after `Create` +
  initial state, before `AddChild`) wire `alarm.OnAcknowledge`/`OnConfirm`/`OnAddComment`/
  `OnShelve`/`OnTimedUnshelve`. Each delegate: (a) reads the principal via
  `(context as ISessionOperationContext)?.UserIdentity as RoleCarryingUserIdentity` and **gates
  on `AlarmAck`**, returning `StatusCodes.BadUserAccessDenied` if the role is absent; (b) invokes
  `AlarmCommandRouter` with the mapped `AlarmCommand` (so the engine updates the **domain** store +
  audit + alerts historization); (c) returns `ServiceResult.Good` so the SDK applies the node state
  and auto-fires (the engine re-projection is de-duped in T20).
- Modify: `src/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer/OtOpcUaSdkServer.cs` (pass-through to expose
  the router setter on the node manager).
- Modify: `src/Server/ZB.MOM.WW.OtOpcUa.Host/OpcUa/OtOpcUaServerHostedService.cs` (after the server
  starts and the node manager exists: resolve the DPS mediator, set the router to
  `mediator.Tell(new Publish(ScriptedAlarmHostActor.AlarmCommandsTopic, cmd))`).
- Add the topic const `AlarmCommandsTopic = "alarm-commands"` on `ScriptedAlarmHostActor` (used here
  and in T19).
- Test: `OpcUaServer.Tests` — the veto gate allows with `AlarmAck` / denies without (drive a wired
  condition's `OnAcknowledge` with a `RoleCarryingUserIdentity` context); the router is invoked with
  the correctly mapped `AlarmCommand` (fake `Action`).

**Steps:** TDD the gate + router mapping in the node manager; then the SDK-server pass-through; then
the hosted-service wiring (no unit test for the boot wiring — exercised by T23 live-verify). Commit
by path. **Serialize with T20** (both touch `OtOpcUaNodeManager.cs`).
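A minimal sketch of one wired delegate's three steps, (a) veto, (b) route, (c) return Good, with the SDK types replaced by stand-ins: `roles` represents `RoleCarryingUserIdentity.Roles` (null when the session identity is some other type) and `GateStatus` represents the SDK status codes. `AlarmCommandGateSketch` and `GateStatus` are hypothetical names; only `AlarmCommand` and `AlarmCommandRouter` come from the task above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Record as specified in this task's Files list.
public sealed record AlarmCommand(string AlarmId, string Operation, string User,
    string? Comment, DateTime? UnshelveAtUtc);

// Stand-in for the SDK status codes the real delegates return.
public enum GateStatus { Good, BadUserAccessDenied }

public sealed class AlarmCommandGateSketch
{
    public const string AlarmAckRole = "AlarmAck";

    // Settable router (design decision 2): wired at boot to publish on `alarm-commands`,
    // so the gate itself never touches Akka.
    public Action<AlarmCommand>? AlarmCommandRouter { get; set; }

    // `roles` stands in for RoleCarryingUserIdentity.Roles; null = some other identity type.
    public GateStatus OnAcknowledge(string alarmId, string user,
        IReadOnlyList<string>? roles, string? comment)
    {
        // (a) veto: without AlarmAck nothing reaches the engine or the node.
        if (roles is null || !roles.Contains(AlarmAckRole, StringComparer.Ordinal))
            return GateStatus.BadUserAccessDenied;

        // (b) route the mapped command so the engine updates the domain store.
        AlarmCommandRouter?.Invoke(new AlarmCommand(alarmId, "Acknowledge", user, comment, null));

        // (c) Good lets the SDK apply the node state and fire its own event.
        return GateStatus.Good;
    }
}
```

Role matching is ordinal and case-sensitive here, which is an assumption about how the LDAP role strings arrive. One choice the sketch leaves open: if `AlarmCommandRouter` is still unset early in boot, returning `Good` lets the SDK apply an ack the engine never records, so a Bad status in that window may be the safer answer.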

---

### Task 19: `ScriptedAlarmHostActor` inbound command handler

**Classification:** standard · **~4 min** · **Parallelizable with:** Task 20 (depends T18)

**Files:**
- Modify: `src/Server/ZB.MOM.WW.OtOpcUa.Runtime/ScriptedAlarms/ScriptedAlarmHostActor.cs` —
  subscribe to `AlarmCommandsTopic` in `PreStart` (alongside the existing `_mediator` use);
  `Receive<AlarmCommand>(OnAlarmCommand)`; `OnAlarmCommand` is `async void`, switches on
  `Operation` → the matching `engine.<Op>Async(AlarmId, User, …, CancellationToken.None)`.
  **Ownership filter:** if the engine doesn't own `AlarmId`, no-op (multi-node broadcast). Catch +
  log op failures (mirror `OnLoadFailed`). **No explicit re-projection** — the engine's `OnEvent`
  drives the existing `OnEngineEmission` → node update.
- Test: `tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests/ScriptedAlarms/ScriptedAlarmHostActorTests.cs`
  (extend) — TestKit: an `AlarmCommand{Operation="Acknowledge"}` for a loaded alarm calls the
  engine's `AcknowledgeAsync` (fake/probe engine seam); an unknown `AlarmId` is ignored;
  `TimedShelve` without `UnshelveAtUtc` is rejected/logged, not thrown.

**Steps:** TDD via the existing host-actor test seam; run `dotnet test --filter ScriptedAlarmHostActor`;
commit by path.
|
||
|
||
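The T19 handler's control flow (ownership filter, switch on `Operation`, catch-and-log) can be sketched in Python as below. This is a hypothetical, synchronous stand-in for the `async void` C# handler: the engine method names, the operation strings other than `Acknowledge` and `TimedShelve`, and the `RecordingEngine` fake are assumptions made for illustration. The catch-and-log matters even more in C# than here, because an exception escaping an `async void` method cannot be awaited or caught by its caller.

```python
import logging
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

log = logging.getLogger("scripted-alarm-host")


@dataclass(frozen=True)
class AlarmCommand:
    alarm_id: str
    user: str
    operation: str
    comment: Optional[str] = None
    unshelve_at_utc: Optional[datetime] = None


class RecordingEngine:
    """Hypothetical engine fake: owns a set of alarms and records every op call."""
    def __init__(self, owned):
        self.owned = set(owned)
        self.calls = []

    def owns(self, alarm_id: str) -> bool:
        return alarm_id in self.owned

    def __getattr__(self, op_name):
        # Any other attribute (acknowledge, confirm, ...) becomes a call recorder.
        return lambda *args: self.calls.append((op_name, args))


def on_alarm_command(engine, cmd: AlarmCommand) -> None:
    # Ownership filter: every node receives the broadcast; only the owner acts.
    if not engine.owns(cmd.alarm_id):
        return
    try:
        if cmd.operation == "Acknowledge":
            engine.acknowledge(cmd.alarm_id, cmd.user, cmd.comment)
        elif cmd.operation == "Confirm":
            engine.confirm(cmd.alarm_id, cmd.user, cmd.comment)
        elif cmd.operation == "OneShotShelve":
            engine.one_shot_shelve(cmd.alarm_id, cmd.user)
        elif cmd.operation == "TimedShelve":
            if cmd.unshelve_at_utc is None:
                raise ValueError("TimedShelve requires UnshelveAtUtc")
            engine.timed_shelve(cmd.alarm_id, cmd.user, cmd.unshelve_at_utc)
        elif cmd.operation == "Unshelve":
            engine.unshelve(cmd.alarm_id, cmd.user)
        else:
            raise ValueError(f"unknown operation {cmd.operation!r}")
    except Exception:
        # Like OnLoadFailed: log and keep the actor alive instead of rethrowing.
        log.exception("alarm command %s for %s failed", cmd.operation, cmd.alarm_id)
```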
---

### Task 20: Delta-gate event firing (kill the inbound double-emit)

**Classification:** high-risk · **~4 min** · **Parallelizable with:** Task 19 (depends T18)

**Files:**
- Modify: `src/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer/OtOpcUaNodeManager.cs` — keep a
`ConcurrentDictionary<string, AlarmConditionSnapshot> _lastAlarmState`; in `WriteAlarmCondition`,
after projecting, compare the new `state` with the stored snapshot by value and **only call
`ReportConditionEvent` when it differs** (then store it). Replace the now-stale "fire exactly one
event" comment at `:151-156`, and tighten the double-emit note at `:190-198` to "resolved by
delta-gate".
- Test: `tests/.../OpcUaServer.Tests/SdkAddressSpaceSinkTests.cs` (extend) — two identical
`WriteAlarmCondition` calls fire the condition event **once**; a changed call fires again.
(Assert via an event-count probe / monitored item on the booted server fixture.)

**Steps:** TDD the delta-gate; run `OpcUaServer.Tests`; commit by path. If the booted-server test
can't cleanly count events, fall back to asserting the gate's decision via a seam and prove it
end-to-end in T23. **Serialize after T18** (same file).

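A minimal Python sketch of the T20 delta-gate, assuming a hypothetical `AlarmConditionSnapshot` with value equality (a frozen dataclass here) and a `report_condition_event` callback standing in for the SDK's event firing; the snapshot fields are illustrative only. Two choices differ from a bare compare-then-store. First, the compare and the store happen under one lock, so two concurrent identical writes cannot both pass the gate (in C# that needs a lock or a `TryAdd`/`TryUpdate` compare-and-swap loop). Second, the `note_applied_by_sdk` hook is an assumption of this sketch, not part of the plan: because T18 lets the SDK apply and report an inbound ack itself, the stored snapshot probably has to be refreshed on that path too, otherwise the engine's echo still differs from the last state written through the gate and fires a second event.

```python
import threading
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class AlarmConditionSnapshot:
    # Hypothetical projected fields; frozen dataclasses compare by value.
    active: bool
    acked: bool
    confirmed: bool
    shelving: str
    severity: int


class DeltaGate:
    def __init__(self, report_condition_event: Callable[[str, AlarmConditionSnapshot], None]):
        self._last: Dict[str, AlarmConditionSnapshot] = {}
        self._lock = threading.Lock()
        self._report = report_condition_event

    def write_alarm_condition(self, alarm_id: str, state: AlarmConditionSnapshot) -> bool:
        """Fire a condition event only when `state` differs from the last one seen.

        Returns True if an event was fired.
        """
        with self._lock:
            if self._last.get(alarm_id) == state:
                return False  # identical re-projection: swallow it
            self._last[alarm_id] = state
        self._report(alarm_id, state)
        return True

    def note_applied_by_sdk(self, alarm_id: str, state: AlarmConditionSnapshot) -> None:
        """Record a state the SDK already applied and reported, without firing again."""
        with self._lock:
            self._last[alarm_id] = state
```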
---

### Task 21: AdminUI ack/shelve control

**Classification:** standard · **~5 min** · **Parallelizable with:** none (depends T19)

**Files:**
- Create: `src/Core/ZB.MOM.WW.OtOpcUa.Commons/Messages/Admin/AcknowledgeAlarmCommand.cs` +
`ShelveAlarmCommand.cs` (control-plane messages; mirror `StartDeployment`'s shape with a
`CorrelationId`).
- Modify: `src/Server/ZB.MOM.WW.OtOpcUa.ControlPlane/AdminOperations/AdminOperationsActor.cs`
(the existing admin-pinned cluster singleton) — `ReceiveAsync` handlers that publish onto
`alarm-commands`. Because this reuses T18's topic and T19's ownership filter, publishing from the
singleton also solves cross-node routing for the AdminUI path.
- Modify: `src/Server/ZB.MOM.WW.OtOpcUa.AdminUI/Clients/AdminOperationsClient.cs`
(+ its interface) — `AcknowledgeAlarmAsync` / `ShelveAlarmAsync` (mirror `StartDeploymentAsync`).
- Modify: `src/Server/ZB.MOM.WW.OtOpcUa.AdminUI/Components/Pages/Alerts.razor` — per-row
Acknowledge / Shelve buttons → `IAdminOperationsClient`; take the operator name from
`AuthState … User.Identity?.Name`.
- Test: the control-plane command service + the new `AdminOperationsActor` handlers (TestKit / Ask).
**No bUnit** — the razor is proven in T23.

**Steps:** TDD the messages + actor handlers + client; wire the razor; run the tests and commit by path.

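The T21 control-plane path can be sketched in Python as plain messages plus one handler that maps them onto the shared `AlarmCommand` and publishes to `alarm-commands`. The message fields, the `AdminReply` type and the `handle_admin_command` name are assumptions; the real messages mirror `StartDeployment`, which this sketch does not reproduce. Because the publish is fire-and-forget, the reply to the AdminUI's Ask can at most confirm that the command was validated and published, not that the owning node applied it; the applied state reaches `/alerts` through the normal alarm projection.

```python
import uuid
from dataclasses import dataclass, field
from datetime import datetime
from typing import Callable, Optional, Union


def _new_correlation_id() -> str:
    return str(uuid.uuid4())


@dataclass(frozen=True)
class AcknowledgeAlarmCommand:
    alarm_id: str
    operator: str
    comment: str = ""
    correlation_id: str = field(default_factory=_new_correlation_id)


@dataclass(frozen=True)
class ShelveAlarmCommand:
    alarm_id: str
    operator: str
    kind: str                                  # "OneShot" or "Timed"
    unshelve_at_utc: Optional[datetime] = None
    correlation_id: str = field(default_factory=_new_correlation_id)


@dataclass(frozen=True)
class AlarmCommand:
    alarm_id: str
    user: str
    operation: str
    comment: Optional[str] = None
    unshelve_at_utc: Optional[datetime] = None


@dataclass(frozen=True)
class AdminReply:
    correlation_id: str
    accepted: bool
    error: Optional[str] = None


def handle_admin_command(msg: Union[AcknowledgeAlarmCommand, ShelveAlarmCommand],
                         publish: Callable[[str, AlarmCommand], None]) -> AdminReply:
    if isinstance(msg, AcknowledgeAlarmCommand):
        cmd = AlarmCommand(msg.alarm_id, msg.operator, "Acknowledge", msg.comment)
    elif msg.kind == "Timed":
        if msg.unshelve_at_utc is None:
            return AdminReply(msg.correlation_id, False, "Timed shelve needs unshelve_at_utc")
        cmd = AlarmCommand(msg.alarm_id, msg.operator, "TimedShelve", None, msg.unshelve_at_utc)
    elif msg.kind == "OneShot":
        cmd = AlarmCommand(msg.alarm_id, msg.operator, "OneShotShelve")
    else:
        return AdminReply(msg.correlation_id, False, f"unknown shelve kind {msg.kind!r}")
    # Fire-and-forget: the owning node's ownership filter picks the command up.
    publish("alarm-commands", cmd)
    return AdminReply(msg.correlation_id, True)
```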
---

### Task 22: Client.CLI ack / confirm / shelve commands

**Classification:** standard · **~5 min** · **Parallelizable with:** T17–T21 (disjoint `Client.*` files)

**Files:**
- Modify: `src/Client/ZB.MOM.WW.OtOpcUa.Client.Shared/IOpcUaClientService.cs` +
`OpcUaClientService.cs` — `AcknowledgeAlarmAsync` is already declared (no command wires it yet); add
`ConfirmAlarmAsync` + `ShelveAlarmAsync` (call the Part 9 `Confirm` method on the condition and the
`OneShotShelve`/`TimedShelve`/`Unshelve` methods on its `ShelvingState`).
- Create: `src/Client/ZB.MOM.WW.OtOpcUa.Client.CLI/Commands/{Acknowledge,Confirm,Shelve}Command.cs`
(`--node`, `--event-id`, `--comment`; shelve adds `--kind OneShot|Timed --unshelve-at`).
- Test: unit-test what's pure (arg → request mapping); the live round-trip is T23.

**Steps:** add the service methods + CLI commands; build; commit by path. This is **net-new client
feature work** (the reason the old "T19 verification" couldn't just be a verify pass).

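The pure part of the T22 shelve command, the argument → request mapping, is what the plan proposes to unit-test. A minimal Python/`argparse` sketch of that mapping follows; `ShelveRequest`, `to_shelve_request`, `parse_utc` and the treatment of offset-less timestamps as UTC are hypothetical. One detail it makes explicit: Part 9's `TimedShelve` method takes a relative `ShelvingTime` duration in milliseconds, not an absolute time, so `--unshelve-at` has to be converted against the current clock. Only the shelve-specific flags are covered; `--event-id` matters for `Acknowledge`/`Confirm`, whose Part 9 methods take an `EventId`, while the shelving methods do not.

```python
import argparse
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List, Optional


@dataclass(frozen=True)
class ShelveRequest:
    node: str
    kind: str                          # "OneShot" or "Timed"
    shelving_time_ms: Optional[float]  # TimedShelve's ShelvingTime (a duration in ms)
    comment: str = ""


def parse_utc(text: str) -> datetime:
    at = datetime.fromisoformat(text)
    # Assumption: a timestamp without an offset is taken to be UTC.
    return at if at.tzinfo is not None else at.replace(tzinfo=timezone.utc)


def build_shelve_parser() -> argparse.ArgumentParser:
    p = argparse.ArgumentParser(prog="shelve")
    p.add_argument("--node", required=True)
    p.add_argument("--comment", default="")
    p.add_argument("--kind", choices=["OneShot", "Timed"], required=True)
    p.add_argument("--unshelve-at", dest="unshelve_at", type=parse_utc)
    return p


def to_shelve_request(argv: List[str], now: datetime) -> ShelveRequest:
    args = build_shelve_parser().parse_args(argv)
    if args.kind == "OneShot":
        if args.unshelve_at is not None:
            raise ValueError("--unshelve-at is only valid with --kind Timed")
        return ShelveRequest(args.node, "OneShot", None, args.comment)
    if args.unshelve_at is None:
        raise ValueError("--kind Timed requires --unshelve-at")
    # TimedShelve wants a relative duration, so convert the absolute time against `now`.
    remaining_ms = (args.unshelve_at - now).total_seconds() * 1000
    if remaining_ms <= 0:
        raise ValueError("--unshelve-at must be in the future")
    return ShelveRequest(args.node, "Timed", remaining_ms, args.comment)
```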
---

### Task 23: Live-verify Layer 2 end-to-end

**Classification:** verification · **Parallelizable with:** none (depends T18, T19, T20, T21, T22)

**Steps:** bring up docker-dev. Use the **already-deployed `t12-overheat`** alarm (rig state, below) as the
live condition. With `Client.CLI`: `alarms --refresh` shows the real condition; drive
`TestMachine_002.TestChangingInt` past the predicate so it fires an event on transition; call the new
`acknowledge` command and confirm the ack **round-trips**: the node's `AckedState` flips, exactly **one**
ack event fires (no double-emit, per T20), and the state **persists** across a node restart. Repeat the
ack from the AdminUI `/alerts` buttons (T21) and confirm parity. Verify the `AlarmAck` gate: a user
**without** `AlarmAck` is denied (`BadUserAccessDenied`). **The agent does not sign in** — the user
drives. File any defects as new fix tasks.

---

### Task 24: Docs + cleanup + finish branch

**Classification:** small · **~5 min** · **Parallelizable with:** none (depends all)

**Files:** update `docs/ScriptedAlarms.md`, `docs/VirtualTags.md`, `docs/v2/Runtime.md`,
`docs/AlarmTracking.md` (the inbound-ack + `AlarmAck`-gate flow is now real); **correct the stale
`docs/v2/phase-7-status.md` alarm-runtime status**; add a CLAUDE.md note. **Clean up the deliberately-left
rig artifacts** (`t12-overheat`, script `SC-ba675b168a85`, the `layer0-logcheck` vtag) and revert
filler-02's inert `cycle-time-s` logger line, then redeploy. Delete or condense `resume.md` + `pending.md`.
Then run superpowers-extended-cc:finishing-a-development-branch (full `dotnet test`, merge to master).

---

## Execution notes

- **Parallel dispatch (Layers 0+1, done):** Layer 0 serial (T1→T2→T3→T4). Layer 1: **T5→T6** serial;
**T7, T8 parallel** with T5/T6; T9 waits on T6/T7/T8; T10→T11→T12 serial.
- **Parallel dispatch (Layer 2 remainder, T17–T24):**
  - **T17** first (roles) — its **step-1 round-trip spike is a go/no-go gate** for the gate design.
  - **T18** after T17 (the veto gate needs the roles). **T19 ∥ T20** after T18 (disjoint files:
    `ScriptedAlarmHostActor.cs` vs `OtOpcUaNodeManager.cs`).
  - **T22 runs in parallel with the whole T17–T21 server chain** (only `Client.*` files).
  - **T21** after T19. **T23** after T18/T19/T20/T21/T22. **T24** last.
- **One writer at a time** within a shared file: `OtOpcUaNodeManager.cs` is touched by **T18 and
T20 — serialize T18 → T20**. (Layers 0/1 already merged, so Program.cs/T14-T16 contention is moot.)
- **Layer boundaries are natural checkpoints** — Layers 0+1 shipped; the T17 round-trip spike is the
next gate before committing to the rest of the Layer 2 inbound epic.

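The Layer 2 ordering rules above can be checked mechanically. The Python sketch below (standard-library `graphlib`, Python 3.9+) encodes the dependencies as stated in these notes, derives the parallel dispatch waves, and reports any wave in which two tasks write the same file. The `DEPENDS_ON` and `TOUCHES` tables are hand-transcribed from this plan (T17's files are not listed in this section, so it touches nothing here), and the helper names are illustrative.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set, Tuple

# Dependencies as stated in the Layer 2 execution notes.
DEPENDS_ON: Dict[str, Set[str]] = {
    "T17": set(),
    "T18": {"T17"},
    "T19": {"T18"},
    "T20": {"T18"},
    "T21": {"T19"},
    "T22": set(),
    "T23": {"T18", "T19", "T20", "T21", "T22"},
    "T24": {"T17", "T18", "T19", "T20", "T21", "T22", "T23"},
}

# Files each task modifies, per the task bodies (file names only).
TOUCHES: Dict[str, Set[str]] = {
    "T18": {"OtOpcUaNodeManager.cs", "OtOpcUaSdkServer.cs",
            "OtOpcUaServerHostedService.cs", "ScriptedAlarmHostActor.cs"},
    "T19": {"ScriptedAlarmHostActor.cs"},
    "T20": {"OtOpcUaNodeManager.cs"},
    "T21": {"AdminOperationsActor.cs", "AdminOperationsClient.cs", "Alerts.razor"},
    "T22": {"IOpcUaClientService.cs", "OpcUaClientService.cs"},
}


def dispatch_waves(depends_on: Dict[str, Set[str]]) -> List[List[str]]:
    """Group tasks into waves; every task in a wave has all its dependencies done."""
    sorter = TopologicalSorter(depends_on)
    sorter.prepare()  # raises graphlib.CycleError if the plan has a cycle
    waves = []
    while sorter.is_active():
        wave = sorted(sorter.get_ready())
        waves.append(wave)
        sorter.done(*wave)
    return waves


def shared_file_conflicts(waves: List[List[str]],
                          touches: Dict[str, Set[str]]) -> List[Tuple[str, str, str]]:
    """Return (first_task, second_task, file) for two tasks in one wave sharing a file."""
    conflicts = []
    for wave in waves:
        owner: Dict[str, str] = {}
        for task in wave:
            for path in sorted(touches.get(task, ())):
                if path in owner:
                    conflicts.append((owner[path], task, path))
                else:
                    owner[path] = task
    return conflicts


print(dispatch_waves(DEPENDS_ON))
# → [['T17', 'T22'], ['T18'], ['T19', 'T20'], ['T21'], ['T23'], ['T24']]
```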