lmxopcua/docs/plans/2026-06-10-script-log-and-scripted-alarm-runtime.md
Joseph Doherty 2b890fa716 docs(plan): re-scope script-log Layer 2 inbound-ack into T17-T24
The original single T17 (inbound method dispatch + ack plumbing) proved on a
2026-06-11 deep dive to be four hard problems: roles on the session identity
(T17), node-manager command router + AlarmAck veto + alarm-commands DPS topic
(T18), host-actor inbound handler (T19), and delta-gate double-emit (T20). Old
T18->T21 (AdminUI), old T19 split into T22 (Client.CLI feature) + T23 (verify),
old T20->T24. Adds the Layer 2 design-decisions preamble.
2026-06-11 05:37:54 -04:00


Script-log Engine Emit + Scripted-Alarm Runtime — Implementation Plan

For Claude: REQUIRED SUB-SKILL: use superpowers-extended-cc:subagent-driven-development (or executing-plans) to implement this plan task-by-task.

Goal: Make the Script-log page tail real script output, and stand up scripted alarms end-to-end including real OPC UA Part 9 condition nodes + client ack.

Architecture: Three sequenced layers off one shared seam (a root script logger fanning to file + companion + a new DPS topic sink). Layer 0 = emit (F8 live). Layer 1 = F9 engine runtime on the Akka equipment-namespace runtime. Layer 2 = F14b real Part 9 nodes + events + inbound ack. Design: docs/plans/2026-06-10-script-log-and-scripted-alarm-runtime-design.md. Verified gap analysis: pending.md.

Tech: .NET 10, Akka.NET, EF Core (SQL prod / InMemory tests), Serilog, OPC Foundation UA .NET Standard, xUnit + Shouldly, Akka TestKit. No bUnit.

Hard rules (every task): stage by explicit path — never git add .; never stage sql_login.txt or src/Server/ZB.MOM.WW.OtOpcUa.Host/pki/; never echo the gateway API key into a new tracked file; no force-push, no --no-verify. No Configuration entity / EF migration change (ScriptedAlarmState table already exists). Agent does not sign in to the AdminUI — the user drives live /run.

Branch: feat/scriptlog-alarm-runtime off master @ df4c2657 (design committed there).

Reference patterns to mirror: VirtualTagHostActor (host actor shape), EfAlarmActorStateStore (EF store shape), the {{equip}} two-seam parity work (Phase7ComposerDeploymentArtifact), RoslynVirtualTagEvaluator (evaluator).


LAYER 0 — Shared script-log emit + F8 live

Task 0: Branch + test-project check

Classification: small · ~2 min · Parallelizable with: none

Files: none created (branch + verification only)

Steps:

  1. git switch -c feat/scriptlog-alarm-runtime (off master @ df4c2657).
  2. Confirm tests/Core/ZB.MOM.WW.OtOpcUa.Core.Scripting.Tests/ exists and is in the .slnx (it does — ScriptLoggerFactoryTests.cs lives there). New Layer-0 tests land here. Confirm tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests/ for Layer-1 tests.
  3. dotnet build ZB.MOM.WW.OtOpcUa.slnx — green baseline. Commit nothing.

Task 1: IScriptLogPublisher + ScriptLogTopicSink

Classification: standard · ~4 min · Parallelizable with: none

Files:

  • Create: src/Core/ZB.MOM.WW.OtOpcUa.Core.Scripting/IScriptLogPublisher.cs
  • Create: src/Core/ZB.MOM.WW.OtOpcUa.Core.Scripting/ScriptLogTopicSink.cs
  • Test: tests/Core/ZB.MOM.WW.OtOpcUa.Core.Scripting.Tests/ScriptLogTopicSinkTests.cs
  • Maybe modify: Core.Scripting.csproj (add ProjectReference to Commons for ScriptLogEntry if not already referenced — verify first).

Step 1 — failing tests (ScriptLogTopicSinkTests):

  • A LogEvent (Information) with properties ScriptId="S1", VirtualTagId="V1", EquipmentId="EQ1", message "hello" → publisher receives one ScriptLogEntry with those fields, Level=="Information", Message=="hello".
  • AlarmId property maps to ScriptLogEntry.AlarmId; absent properties → null fields.
  • A Debug event with default minLevel=Information → publisher receives nothing.
  • Template message renders ("v={V}" + prop V=3 → "v=3"). Use a fake IScriptLogPublisher capturing entries.

Step 2 — run, expect fail (types don't exist).

Step 3 — implement:

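```csharp
using Serilog.Core;
using Serilog.Events;
// Also import the Commons namespace that declares ScriptLogEntry.

public interface IScriptLogPublisher { void Publish(ScriptLogEntry entry); }

// Serilog sink that forwards script log events at or above a minimum level to the publisher.
public sealed class ScriptLogTopicSink : ILogEventSink
{
    private readonly IScriptLogPublisher _publisher;
    private readonly LogEventLevel _min;

    public ScriptLogTopicSink(IScriptLogPublisher publisher,
        LogEventLevel min = LogEventLevel.Information)
    {
        _publisher = publisher;
        _min = min;
    }

    public void Emit(LogEvent e)
    {
        // Drop events below the configured minimum (Debug is dropped by default).
        if (e is null || e.Level < _min) return;

        // Reads a string-valued scalar property; absent or non-string properties yield null.
        string? P(string k) => e.Properties.TryGetValue(k, out var v)
            && v is ScalarValue { Value: string s } ? s : null;

        _publisher.Publish(new ScriptLogEntry(
            ScriptId: P("ScriptId") ?? P("ScriptName") ?? "unknown",
            Level: e.Level.ToString(),
            Message: e.RenderMessage(), // rendered template, e.g. "v={V}" with V=3 gives "v=3"
            TimestampUtc: e.Timestamp.UtcDateTime,
            VirtualTagId: P("VirtualTagId"), AlarmId: P("AlarmId"), EquipmentId: P("EquipmentId")));
    }
}
```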

(Property-name constants — reuse/extend ScriptLoggerFactory's ScriptNameProperty; add ScriptIdProperty/VirtualTagIdProperty/AlarmIdProperty/EquipmentIdProperty.)

Step 4 — run tests, expect pass. Step 5 — commit (git add the 3 files by path).


Task 2: Root script logger + DpsScriptLogPublisher + Host wiring

Classification: standard · ~5 min · Parallelizable with: none (depends T1)

Files:

  • Create: src/Server/ZB.MOM.WW.OtOpcUa.Runtime/Scripting/DpsScriptLogPublisher.cs (or Host — wherever the ActorSystem/Mediator is reachable at construction).
  • Create: src/Server/ZB.MOM.WW.OtOpcUa.Host/Logging/ScriptRootLoggerFactory.cs (builds the root ILogger: rolling scripts-*.log + ScriptLogCompanionSink + ScriptLogTopicSink).
  • Modify: src/Server/ZB.MOM.WW.OtOpcUa.Host/Program.cs (build + register root logger; register IScriptLogPublisher).
  • Test: tests/.../Core.Scripting.Tests/ScriptRootLoggerFanoutTests.cs (or Host.Tests).

Steps (TDD):

  1. Failing test: a logger built by ScriptRootLoggerFactory with a fake publisher + in-memory companion → an Error event reaches the companion mirror AND the topic publisher; a Debug event reaches neither topic nor companion (file only). (Assert via fakes; don't assert the physical file.)
  2. Implement DpsScriptLogPublisher: ctor takes the DPS mediator IActorRef (or ActorSystem); Publish → mediator.Tell(new Publish("script-logs", entry)) (topic constant VirtualTagActor.ScriptLogsTopic).
  3. Implement ScriptRootLoggerFactory.Build(IScriptLogPublisher, config) → new LoggerConfiguration().WriteTo.File(...).WriteTo.Sink(new ScriptLogCompanionSink(Log.Logger)).WriteTo.Sink(new ScriptLogTopicSink(publisher, minLevel)).CreateLogger().
  4. Program.cs: resolve the mediator after the ActorSystem is up; register IScriptLogPublisher (singleton) + the root ILogger (keyed/named for scripts). Min-level from config (Scripting:LogTopicMinLevel, default Information).
  5. Run + commit by path.

Task 3: Rewire evaluators to the root script logger

Classification: standard · ~5 min · Parallelizable with: none (depends T1, T2)

Files:

  • Modify: src/Server/ZB.MOM.WW.OtOpcUa.Host/Engines/RoslynVirtualTagEvaluator.cs
  • Modify: src/Server/ZB.MOM.WW.OtOpcUa.Host/Engines/RoslynScriptedAlarmEvaluator.cs
  • Modify: src/Core/ZB.MOM.WW.OtOpcUa.Core.Scripting/ScriptLoggerFactory.cs (bind the full property set, not just ScriptName).
  • Modify: src/Server/ZB.MOM.WW.OtOpcUa.Host/Program.cs (inject root logger into the evaluators).
  • Test: tests/.../Core.Scripting.Tests/ — evaluator emits via fake publisher.

Steps:

  1. Failing test: a RoslynVirtualTagEvaluator built with a root logger wired to a fake publisher; evaluate a script ctx.Logger.Information("hi"); return 1; → publisher gets one entry with ScriptId/VirtualTagId bound and Message=="hi".
  2. Replace the static ScriptLogger field with a ctor-injected root ILogger. Per evaluation, var log = _root.ForContext("ScriptId", id).ForContext("VirtualTagId", virtualTagId) (+ EquipmentId when available) and pass into the VirtualTagContext. Same for the alarm evaluator (binds AlarmId).
  3. ScriptLoggerFactory: add a Create(scriptId, virtualTagId?, alarmId?, equipmentId?) overload binding the standard properties (keep the old Create(scriptName) for compatibility).
  4. Program.cs: pass the root logger to both evaluator registrations.
  5. Run + commit by path.

Note: IVirtualTagEvaluator.Evaluate carries virtualTagId; in the live path scriptId == virtualTagId, so Layer 0 binds both from it. Threading a distinct EquipmentId (nice-to-have on the page) is optional here — if it requires an interface change, defer it to a Layer-1 follow-up rather than expanding T3.


Task 4: Live-verify Layer 0

Classification: verification · Parallelizable with: none (depends T2, T3)

Steps:

  1. Rebuild docker-dev central nodes (user-driven /run). Author a virtual tag whose script calls ctx.Logger.Information(...).
  2. Open /script-log; drive the dependency so the script evaluates; confirm the line appears live with the right ScriptId/level. Confirm Debug stays off the page, Information+ shows.
  3. Agent does not sign in — user signs in and drives. Record outcome. No code unless a defect surfaces (→ new fix task).

LAYER 1 — F9 engine runtime

Task 5: EquipmentScriptedAlarmPlan + Phase7Composer enrichment

Classification: standard · ~5 min · Parallelizable with: Task 7, Task 8

Files:

  • Modify: src/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer/Phase7Composer.cs (new record + build the enriched list from ScriptedAlarm + Script rows).
  • Test: tests/Server/ZB.MOM.WW.OtOpcUa.Server.Tests/Phase7/ (or wherever Phase7Composer is tested) — new Phase7ComposerScriptedAlarmTests.cs.

Steps:

  1. Failing test: compose two equipments each with a scripted alarm referencing a script; assert each EquipmentScriptedAlarmPlan carries the resolved PredicateSource, extracted DependencyRefs (via DependencyExtractor), AlarmType, Severity, MessageTemplate, HistorizeToAveva, Retain, Enabled, Name.
  2. Add public sealed record EquipmentScriptedAlarmPlan(string ScriptedAlarmId, string EquipmentId, string Name, string AlarmType, int Severity, string MessageTemplate, string PredicateScriptId, string PredicateSource, IReadOnlyList<string> DependencyRefs, bool HistorizeToAveva, bool Retain, bool Enabled);
  3. In Compose: join ScriptedAlarm.PredicateScriptId → Script.SourceCode; run DependencyExtractor.Extract(source).Reads (plus MessageTemplate token paths) for DependencyRefs; project into the new list on the composition result. Do not skip Enabled=false alarms: carry the flag and let the host decide. Drop alarms whose script is missing with a structured warning (don't fail the whole compose).
  4. Run + commit by path.
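
The join-and-drop behaviour in step 3 can be sketched as follows. This is a minimal illustration, not the Phase7Composer code: AlarmRow, ScriptRow, AlarmPlanSketch, and the warn callback are hypothetical stand-ins, and dependency extraction is left out.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for the ScriptedAlarm and Script configuration rows.
public sealed record AlarmRow(string Id, string EquipmentId, string PredicateScriptId, bool Enabled);
public sealed record ScriptRow(string Id, string SourceCode);
public sealed record AlarmPlanSketch(string AlarmId, string EquipmentId, string PredicateSource, bool Enabled);

public static class AlarmPlanComposer
{
    // Joins each alarm to its predicate script. An alarm whose script is missing is
    // reported through `warn` and dropped; the rest of the compose still succeeds.
    public static IReadOnlyList<AlarmPlanSketch> Compose(
        IEnumerable<AlarmRow> alarms, IEnumerable<ScriptRow> scripts, Action<string> warn)
    {
        var sourceById = scripts.ToDictionary(s => s.Id, s => s.SourceCode);
        var plans = new List<AlarmPlanSketch>();
        foreach (var alarm in alarms)
        {
            if (!sourceById.TryGetValue(alarm.PredicateScriptId, out var source))
            {
                warn($"Scripted alarm {alarm.Id}: script {alarm.PredicateScriptId} not found; dropped");
                continue;
            }
            // Enabled is carried, not filtered: the host decides what to do with it.
            plans.Add(new AlarmPlanSketch(alarm.Id, alarm.EquipmentId, source, alarm.Enabled));
        }
        return plans;
    }
}
```

A disabled alarm with a resolvable script still yields a plan, while an enabled alarm with a missing script yields only a warning.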

Task 6: DeploymentArtifact parity for the alarm plan

Classification: standard · ~5 min · Parallelizable with: Task 7, Task 8 (depends T5)

Files:

  • Modify: src/Server/ZB.MOM.WW.OtOpcUa.Runtime/Drivers/DeploymentArtifact.cs (encode/decode EquipmentScriptedAlarmPlan; add Phase7CompositionResult.EquipmentScriptedAlarms; filter-by-equipment like EquipmentVirtualTags at :263).
  • Test: tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests/ — artifact round-trip + parity with the Composer for the same input.

Steps:

  1. Failing test: build a composition via Phase7Composer, serialize to artifact, parse back → EquipmentScriptedAlarms is byte-identical (same discipline as the {{equip}} parity tests). Equipment-filter test (only alarms for resident equipment survive).
  2. Add the field to Phase7CompositionResult; mirror the EquipmentVirtualTags encode/decode/filter exactly (:202, :263).
  3. Run + commit by path.

Task 7: DependencyMuxTagUpstreamSource

Classification: standard · ~4 min · Parallelizable with: Task 5, Task 6, Task 8

Files:

  • Create: src/Server/ZB.MOM.WW.OtOpcUa.Runtime/ScriptedAlarms/DependencyMuxTagUpstreamSource.cs (implements Core.ScriptedAlarms/Core.VirtualTags ITagUpstreamSource).
  • Test: tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests/ScriptedAlarms/DependencyMuxTagUpstreamSourceTests.cs

Steps:

  1. Failing tests: Push(path, snapshot) updates cache so ReadTag(path) returns it; SubscribeTag(path, obs) → obs fires on the next Push; ReadTag for an unknown path returns a Bad-quality snapshot; dispose removes the observer.
  2. Implement: a thread-safe cache (ConcurrentDictionary<string, DataValueSnapshot>) + per-path observer list; Push updates cache then invokes observers; ReadTag reads cache (Bad if absent); SubscribeTag returns an IDisposable that deregisters. The host actor calls Push from its DependencyValueChanged handler. Value wrap: new DataValueSnapshot(value, StatusCode:0, ts, ts).
  3. Run + commit by path.
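
The cache-plus-observer shape described in step 2 might look roughly like this. It is a sketch under assumptions: SnapshotSketch stands in for DataValueSnapshot, observers are plain Action delegates rather than the real ITagUpstreamSource observer type, and the class name is invented for illustration.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;

// Hypothetical stand-in for DataValueSnapshot; StatusCode 0 means Good.
public sealed record SnapshotSketch(object? Value, uint StatusCode, DateTime TimestampUtc);

public sealed class UpstreamCacheSketch
{
    // 0x80000000 is the generic OPC UA "Bad" status code.
    private const uint Bad = 0x80000000;
    private readonly ConcurrentDictionary<string, SnapshotSketch> _cache = new();
    private readonly ConcurrentDictionary<string, List<Action<SnapshotSketch>>> _observers = new();

    // Unknown paths return a Bad-quality snapshot instead of throwing.
    public SnapshotSketch ReadTag(string path) =>
        _cache.TryGetValue(path, out var s) ? s : new SnapshotSketch(null, Bad, DateTime.UtcNow);

    public void Push(string path, SnapshotSketch snapshot)
    {
        _cache[path] = snapshot;                        // update the cache first, then notify
        if (!_observers.TryGetValue(path, out var list)) return;
        Action<SnapshotSketch>[] copy;
        lock (list) copy = list.ToArray();              // copy so observers run outside the lock
        foreach (var observer in copy) observer(snapshot);
    }

    public IDisposable SubscribeTag(string path, Action<SnapshotSketch> observer)
    {
        var list = _observers.GetOrAdd(path, _ => new List<Action<SnapshotSketch>>());
        lock (list) list.Add(observer);
        return new Unsubscriber(() => { lock (list) list.Remove(observer); });
    }

    private sealed class Unsubscriber : IDisposable
    {
        private readonly Action _dispose;
        public Unsubscriber(Action dispose) => _dispose = dispose;
        public void Dispose() => _dispose();
    }
}
```

Copying the observer list before invoking keeps a slow or re-entrant observer from blocking concurrent subscribe and dispose calls.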

Task 8: EfAlarmConditionStateStore : IAlarmStateStore

Classification: standard · ~5 min · Parallelizable with: Task 5, Task 6, Task 7

Files:

  • Create: src/Server/ZB.MOM.WW.OtOpcUa.Runtime/ScriptedAlarms/EfAlarmConditionStateStore.cs
  • Test: tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests/ScriptedAlarms/EfAlarmConditionStateStoreTests.cs (in-memory EF).

Steps:

  1. Failing tests (in-memory OtOpcUaConfigDbContext): SaveAsync(state) then LoadAsync(alarmId) round-trips Enabled/Acked/Confirmed/Shelving(+UnshelveAtUtc)/LastAck*/LastConfirm*/Comments; LoadAsync of an unknown id → null; ActiveState is not persisted (a saved state's Active is ignored on load: load returns the stored operator state, and Active defaults). Comments JSON round-trips.
  2. Implement mapping AlarmConditionState ↔ ScriptedAlarmState entity (mirror EfAlarmActorStateStore's IDbContextFactory upsert pattern; serialize ImmutableList<AlarmComment> ↔ CommentsJson). Map enum states ↔ the entity's string columns.
  3. Run + commit by path.
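
The two mapping pieces in step 2 (enum ↔ string column, comment list ↔ JSON) could be sketched as below. ShelvingKindSketch and CommentSketch are hypothetical stand-ins for the real enum and AlarmComment types, and falling back to Unshelved for an unrecognised column value is an assumption of this sketch, not a documented rule.

```csharp
using System;
using System.Collections.Immutable;
using System.Linq;
using System.Text.Json;

// Hypothetical stand-ins for the engine's shelving enum and AlarmComment record.
public enum ShelvingKindSketch { Unshelved, OneShot, Timed }
public sealed record CommentSketch(DateTime TimestampUtc, string User, string Text);

public static class AlarmStateMappingSketch
{
    // Enum state -> string column.
    public static string ToColumn(ShelvingKindSketch kind) => kind.ToString();

    // String column -> enum state; unknown values fall back to Unshelved (assumption).
    public static ShelvingKindSketch FromColumn(string column) =>
        Enum.TryParse<ShelvingKindSketch>(column, out var kind) ? kind : ShelvingKindSketch.Unshelved;

    // Comment list -> JSON text for a CommentsJson-style column.
    public static string ToJson(ImmutableList<CommentSketch> comments) =>
        JsonSerializer.Serialize(comments.ToArray());

    // JSON text -> comment list; null or empty text maps to an empty list.
    public static ImmutableList<CommentSketch> FromJson(string? json) =>
        string.IsNullOrEmpty(json)
            ? ImmutableList<CommentSketch>.Empty
            : (JsonSerializer.Deserialize<CommentSketch[]>(json) ?? Array.Empty<CommentSketch>())
                .ToImmutableList();
}
```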

Task 9: ScriptedAlarmHostActor

Classification: high-risk · ~5 min · Parallelizable with: none (depends T6, T7, T8; needs Layer 0 T2/T3 root logger)

Files:

  • Create: src/Server/ZB.MOM.WW.OtOpcUa.Runtime/ScriptedAlarms/ScriptedAlarmHostActor.cs
  • Test: tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests/ScriptedAlarms/ScriptedAlarmHostActorTests.cs (Akka TestKit + a real engine with the fake upstream, or a fake engine seam).

Design: mirrors VirtualTagHostActor. Owns one ScriptedAlarmEngine (built with the DependencyMuxTagUpstreamSource, the EfAlarmConditionStateStore, a ScriptLoggerFactory wrapping the Layer 0 root logger, and the engine's root logger). Message ApplyScriptedAlarms(IReadOnlyList<EquipmentScriptedAlarmPlan> Plans).

Steps:

  1. Failing TestKit tests:
    • ApplyScriptedAlarms with one alarm → engine loaded (assert via a probe/seam); registers interest with the (probe) mux for the alarm's dep refs.
    • A DependencyValueChanged that makes the predicate true → the host tells the (probe) OpcUaPublishActor a WriteAlarmState(alarmId, active:true, …), tells the (probe) historian an AlarmHistorianEvent (when HistorizeToAveva), and publishes an AlarmTransitionEvent on alerts.
    • Re-ApplyScriptedAlarms with a different set reloads the engine (LoadAsync replace).
  2. Implement: on ApplyScriptedAlarms, build ScriptedAlarmDefinitions from the plans (map AlarmType → AlarmKind, Severity → AlarmSeverity, EquipmentId → EquipmentPath), engine.LoadAsync; register mux interest for DependencyRefs; on DependencyValueChanged → _upstream.Push(...). Subscribe engine.OnEvent once → map ScriptedAlarmEvent.Condition to (active, acknowledged) → OpcUaPublishActor.WriteAlarmState; map → AlarmHistorianEvent → historian (if Historize); publish AlarmTransitionEvent on alerts. Dispose engine in PostStop.
  3. Run targeted tests (dotnet test --filter ScriptedAlarmHostActor). Commit by path.

Task 10: Spawn + apply in DriverHostActor

Classification: standard · ~4 min · Parallelizable with: none (depends T9)

Files:

  • Modify: src/Server/ZB.MOM.WW.OtOpcUa.Runtime/Drivers/DriverHostActor.cs (spawn ScriptedAlarmHostActor next to VirtualTagHostActor ~:197; tell ApplyScriptedAlarms(composition.EquipmentScriptedAlarms) next to the vtag apply ~:532; add an override field for tests like _virtualTagHostOverride).
  • Modify: src/Server/ZB.MOM.WW.OtOpcUa.Runtime/ServiceCollectionExtensions.cs if the host needs new injected deps (EF store factory, root logger, historian ref).
  • Test: extend DriverHostActorTests — apply pushes ApplyScriptedAlarms.

Steps: mirror the VirtualTag spawn/apply exactly; thread _opcUaPublishActor, _dependencyMux, the EF store, the root logger, the historian actor ref. Run + commit.


Task 11: Retire the orphaned actor + F9b evaluator

Classification: small · ~3 min · Parallelizable with: none (depends T9, T10)

Files:

  • Delete: src/Server/ZB.MOM.WW.OtOpcUa.Runtime/ScriptedAlarms/ScriptedAlarmActor.cs (+ its tests) and src/Server/ZB.MOM.WW.OtOpcUa.Host/Engines/RoslynScriptedAlarmEvaluator.cs.
  • Modify: src/Server/ZB.MOM.WW.OtOpcUa.Host/Program.cs (remove F9b DI registration, lines ~110-114) and any IScriptedAlarmEvaluator references. Keep EfAlarmActorStateStore only if nothing else uses it — otherwise delete with the actor.

Steps: delete, fix build, run full dotnet test for the touched projects, commit by path. (If something unexpectedly depends on these, stop and surface — don't expand scope.)


Task 12: Live-verify Layer 1

Classification: verification · Parallelizable with: none (depends T10, T11)

Steps: rebuild docker-dev; author a scripted alarm whose predicate references a live tag; drive the tag; confirm the alarm node flips active/clear, the historian queue advances (/alarms/historian), the alerts/Alerts page shows it, and predicate ctx.Logger output appears on /script-log. User drives sign-in. Defects → new tasks.


LAYER 2 — F14b real Part 9 + client ack

Task 13: SDK research spike (DeepWiki)

Classification: small (research) · ~5 min · Parallelizable with: Layer-1 tasks

Steps: Use the DeepWiki MCP (OPCFoundation/UA-.NETStandard) to confirm: how to create + add an AlarmConditionState (and Limit/OffNormal/Discrete subtypes) under a parent in a CustomNodeManager2; how to set ActiveState/AckedState/ConfirmedState/ShelvingState/Severity/Retain; how transitions fire events (ReportEvent); how inbound Acknowledge/Shelve/Confirm method calls are dispatched + where to hook them. Write findings to docs/v2/f14b-part9-sdk-notes.md (committed). This de-risks T14-T17.


Task 14: Real condition-node materialisation

Classification: high-risk · ~5 min · Parallelizable with: none (depends T13)

Files: src/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer/OtOpcUaNodeManager.cs (replace the placeholder [active, ack] variable in WriteAlarmState / add a MaterialiseAlarmCondition path per AlarmType); Phase7Applier.cs (call the new materialiser); tests where the SDK allows (node existence/type assertions).

Steps: create real condition nodes on materialise; keep WriteAlarmState as a thin shim during transition or replace its callers. Run + commit. (SDK threading: all via the pinned OpcUaPublishActor dispatcher.)


Task 15: Richer alarm-state bridge

Classification: standard · ~4 min · Parallelizable with: Task 17 (depends T14, T9)

Files: src/Server/ZB.MOM.WW.OtOpcUa.Runtime/OpcUa/OpcUaPublishActor.cs (new message carrying the full AlarmConditionState, not 2 bools); ScriptedAlarmHostActor bridge (send the richer message); OtOpcUaNodeManager (apply full state to the condition). Tests: message mapping.


Task 16: Event firing on transition

Classification: high-risk · ~5 min · Parallelizable with: none (depends T14, T15)

Files: OtOpcUaNodeManager.cs (condition.ReportEvent(...) on state change). Tests: mapping/coverage where feasible; behaviour proven in the Layer 2 live-verify (old T19, now T23).


LAYER 2 — inbound client ack/shelve (re-scoped 2026-06-11)

Status: T0-T16 are merged to master (Layers 0+1 live-verified; Layer 2 Part-9 nodes/state/events done). The original single T17 "Inbound method dispatch + ack plumbing" (high-risk · ~5 min) proved to be four separate hard problems, each its own task. After a 2026-06-11 deep dive into the real code, T17 is split into T17-T20 and the old T18-T20 shift to T21-T24 (old T19's Client.CLI work also grew a feature half, T22). This is the deferred "fresh piece": branch off the current master (git switch -c feat/scriptlog-alarm-ack), not the old feat/scriptlog-alarm-runtime base.

Layer 2 design decisions (resolved in the re-scope deep dive)

These are the load-bearing findings the new tasks rest on — verified against the current code, not the original recon's assumptions:

  1. Topology — same-node co-location, multi-node ownership. The OPC UA SDK server (+ OtOpcUaNodeManager) and the ScriptedAlarmHostActor are both spawned on every driver-role node in the same ActorSystem (OtOpcUaServerHostedService + DriverHostActor.SpawnScriptedAlarmHost). So per node they're co-located and an in-process Tell would reach the local host. But in a multi-driver-node cluster each node owns a disjoint subset of alarms (its resident equipment, via the T6 artifact equipment-filter), and a client connects to one node's server. Whether that node owns the alarm the client acks is not guaranteed. Decision: route inbound commands over a new DPS topic alarm-commands (mirrors alerts/script-logs), and have each ScriptedAlarmHostActor ignore commands for alarmIds its engine doesn't own. This works same-node and cross-node with one mechanism. (Open item to confirm in T18: whether each node's address space is partitioned to its own equipment or replicated — if partitioned, a client can only ever ack local alarms and the DPS broadcast is still correct, just always locally satisfied.)

  2. The node manager has no Akka handle and must stay that way. OtOpcUaNodeManager(server, configuration) (OtOpcUaSdkServer.CreateMasterNodeManager) holds no IActorRef / ActorSystem / DI. The existing forward seam is OpcUaPublishActor → IOpcUaAddressSpaceSink (DeferredAddressSpaceSink → SdkAddressSpaceSink → node manager). For the reverse path the node manager gets a settable command-router delegate (Action<AlarmCommand>), wired at boot by OtOpcUaServerHostedService (which does have the DPS mediator) to publish onto alarm-commands. The node manager itself never touches Akka.

  3. No explicit re-projection after an engine op. Every ScriptedAlarmEngine op (AcknowledgeAsync/ConfirmAsync/OneShotShelveAsync/TimedShelveAsync/UnshelveAsync/EnableAsync/DisableAsync/AddCommentAsync; all exist, signatures verified) raises the engine's OnEvent, which the host's existing OnEngineEmission already projects to the node. So the inbound handler just calls the op and awaits it, and the ack visibly updates the node for free. This makes T19 small.

  4. Roles are dropped at the impersonation seam. OpcUaApplicationHost.cs:292 does args.Identity = new UserIdentity(token) and discards result.Roles (only logs them at :293). OpcUaUserAuthResult.Roles is IReadOnlyList<string> (ReadOnly/WriteOperate/WriteTune/WriteConfigure/AlarmAck); there is an OpcUaOperation enum (Core.Abstractions) with AlarmAcknowledge/AlarmConfirm/AlarmShelve, but no role is consulted anywhere post-auth today (writes aren't gated either; this is greenfield, not a pattern to copy). Risk (drives T17 being its own task): it is unconfirmed that a custom UserIdentity subclass survives the SDK round-trip back to context.UserIdentity inside a method handler. T17 must prove the round-trip (integration assertion); fallback is populating GrantedRoleIds (NodeIdCollection) by mapping role strings → role NodeIds, which is more work.

  5. Double-emit is real, and delta-gating resolves it. WriteAlarmCondition calls ReportConditionEvent unconditionally (OtOpcUaNodeManager.cs:156); the node manager keeps no previous snapshot. Once inbound acks route through the engine, the SDK's own OnAcknowledgeCalled auto-fires event E2 (applying acked state to the node) and the engine round-trip re-projects → would fire E3. Because the SDK applies the acked state before the async engine round-trip completes, delta-gating WriteAlarmCondition against the node's current state suppresses E3 (no delta) while still firing on genuine engine-driven transitions. That's T20. (Fallback if it proves racy: the correlation-suppression option already sketched in the :190-198 in-code note — skip engine re-projection for inbound-originated transitions.)


Task 17: Carry LDAP roles onto the OPC UA session identity

Classification: high-risk · ~5 min · Parallelizable with: Task 22

Files:

  • Create: src/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer/Security/RoleCarryingUserIdentity.cs (: UserIdentity, adds IReadOnlyList<string> Roles).
  • Modify: src/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer/OpcUaApplicationHost.cs:292 (args.Identity = new RoleCarryingUserIdentity(token, result.Roles)).
  • Test: tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.Tests/OpcUaApplicationHostImpersonationTests.cs (existing home for HandleImpersonation).

Steps:

  1. Round-trip spike FIRST (de-risk the whole task). Before building anything, confirm the SDK preserves a custom IUserIdentity instance: in a booted in-process server test (mirror SdkAddressSpaceSinkTests' server fixture), set args.Identity to a sentinel subclass during impersonation and assert a method handler reads it back via (context as ISessionOperationContext)?.UserIdentity as that subclass. If it does NOT survive (SDK wraps/strips it), STOP and switch to the GrantedRoleIds approach — surface this as a scope change, don't silently expand.
  2. Failing unit test: HandleImpersonation on a successful auth sets args.Identity to a RoleCarryingUserIdentity whose Roles equals result.Roles (and the existing identity/denial/anonymous tests still pass).
  3. Implement RoleCarryingUserIdentity + the one-line :292 swap.
  4. Run (OpcUaServer.Tests + the impersonation tests) + commit by path.

Security-path change → high-risk. Touches only the identity construction; no auth-decision logic changes (roles were already resolved, just discarded). Do not change IOpcUaUserAuthenticator or the LDAP bind.


Task 18: Node-manager command router + AlarmAck veto gate + alarm-commands topic

Classification: high-risk · ~5 min · Parallelizable with: none (depends T17)

Files:

  • Create: src/Core/ZB.MOM.WW.OtOpcUa.Commons/OpcUa/AlarmCommand.cs (record AlarmCommand(string AlarmId, string Operation, string User, string? Comment, DateTime? UnshelveAtUtc); Operation ∈ Acknowledge/Confirm/OneShotShelve/TimedShelve/Unshelve/Enable/Disable/AddComment).
  • Modify: src/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer/OtOpcUaNodeManager.cs: add a settable Action<AlarmCommand>? AlarmCommandRouter; in MaterialiseAlarmCondition (after Create + initial state, before AddChild) wire alarm.OnAcknowledge/OnConfirm/OnAddComment/OnShelve/OnTimedUnshelve. Each delegate: (a) read principal via (context as ISessionOperationContext)?.UserIdentity as RoleCarryingUserIdentity, gate on AlarmAck → return StatusCodes.BadUserAccessDenied if absent; (b) invoke AlarmCommandRouter with the mapped AlarmCommand (so the engine updates the domain store + audit + alerts historization); (c) return ServiceResult.Good so the SDK applies node state + auto-fires (the engine re-projection is de-duped in T20).
  • Modify: src/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer/OtOpcUaSdkServer.cs (pass-through to expose the router setter on the node manager).
  • Modify: src/Server/ZB.MOM.WW.OtOpcUa.Host/OpcUa/OtOpcUaServerHostedService.cs (after the server starts + node manager exists: resolve the DPS mediator, set the router to mediator.Tell(new Publish(ScriptedAlarmHostActor.AlarmCommandsTopic, cmd))).
  • Add the topic const AlarmCommandsTopic = "alarm-commands" on ScriptedAlarmHostActor (used here and in T19).
  • Test: OpcUaServer.Tests — veto gate allows with AlarmAck / denies without (drive a wired condition's OnAcknowledge with a RoleCarryingUserIdentity context); router invoked with the correctly-mapped AlarmCommand (fake Action).

Steps: TDD the gate + router-mapping in the node manager; then the SDK-server pass-through; then the hosted-service wiring (no unit test for the boot wiring — exercised by T23 live-verify). Commit by path. Serialize with T20 (both touch OtOpcUaNodeManager.cs).
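
The gate-then-route order in the node-manager delegates can be sketched without the SDK. This is an illustration under assumptions: IdentitySketch stands in for RoleCarryingUserIdentity, GateResult stands in for the SDK's ServiceResult / StatusCodes values, and AlarmCommandGateSketch is a hypothetical name. Only the AlarmCommand record shape comes from the Files list above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Same shape as the AlarmCommand record named in the Files list.
public sealed record AlarmCommand(string AlarmId, string Operation, string User,
    string? Comment, DateTime? UnshelveAtUtc);

// Hypothetical stand-in for the role-carrying session identity from T17.
public sealed record IdentitySketch(string DisplayName, IReadOnlyList<string> Roles);

// Hypothetical stand-in for the SDK status values returned by the delegates.
public enum GateResult { Good, BadUserAccessDenied }

public sealed class AlarmCommandGateSketch
{
    // Set at boot by whoever holds the DPS mediator; this class never references Akka.
    public Action<AlarmCommand>? AlarmCommandRouter { get; set; }

    public GateResult OnAcknowledge(IdentitySketch? identity, string alarmId, string? comment)
    {
        // (a) Veto: without an identity carrying AlarmAck, nothing is routed.
        if (identity is null || !identity.Roles.Contains("AlarmAck"))
            return GateResult.BadUserAccessDenied;

        // (b) Route the mapped command; an unset router is a no-op in this sketch.
        AlarmCommandRouter?.Invoke(
            new AlarmCommand(alarmId, "Acknowledge", identity.DisplayName, comment, null));

        // (c) Report success so the caller applies the node state.
        return GateResult.Good;
    }
}
```

Keeping the router as a plain delegate is what lets the unit test pass a fake Action and assert on the mapped command without booting an ActorSystem.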


Task 19: ScriptedAlarmHostActor inbound command handler

Classification: standard · ~4 min · Parallelizable with: Task 20 (depends T18)

Files:

  • Modify: src/Server/ZB.MOM.WW.OtOpcUa.Runtime/ScriptedAlarms/ScriptedAlarmHostActor.cs — subscribe to AlarmCommandsTopic in PreStart (alongside the existing _mediator use); Receive<AlarmCommand>(OnAlarmCommand); OnAlarmCommand is async void, switches on Operation → the matching engine.<Op>Async(AlarmId, User, …, CancellationToken.None). Ownership filter: if the engine doesn't own AlarmId, no-op (multi-node broadcast). Catch + log op failures (mirror OnLoadFailed). No explicit re-projection — the engine's OnEvent drives the existing OnEngineEmission → node update.
  • Test: tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests/ScriptedAlarms/ScriptedAlarmHostActorTests.cs (extend) — TestKit: an AlarmCommand{Operation="Acknowledge"} for a loaded alarm calls the engine's AcknowledgeAsync (fake/probe engine seam); an unknown AlarmId is ignored; TimedShelve without UnshelveAtUtc is rejected/logged, not thrown.

Steps: TDD via the existing host-actor test seam; run dotnet test --filter ScriptedAlarmHostActor; commit by path.
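
The handler logic (ownership filter, operation switch, reject-and-log for a TimedShelve with no UnshelveAtUtc) can be sketched outside the actor. IAlarmEngineSketch is a hypothetical seam: the real ScriptedAlarmEngine method signatures may differ, and only two of the eight operations are shown.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public sealed record AlarmCommand(string AlarmId, string Operation, string User,
    string? Comment, DateTime? UnshelveAtUtc);

// Hypothetical engine seam; the real ScriptedAlarmEngine signatures may differ.
public interface IAlarmEngineSketch
{
    bool Owns(string alarmId);
    Task AcknowledgeAsync(string alarmId, string user, string? comment, CancellationToken ct);
    Task TimedShelveAsync(string alarmId, string user, DateTime unshelveAtUtc, CancellationToken ct);
}

public static class AlarmCommandDispatchSketch
{
    // Returns true when an engine op ran; false when the command was ignored or rejected.
    public static async Task<bool> DispatchAsync(
        IAlarmEngineSketch engine, AlarmCommand cmd, Action<string> log)
    {
        if (!engine.Owns(cmd.AlarmId)) return false;   // another node owns it: silent no-op
        try
        {
            switch (cmd.Operation)
            {
                case "Acknowledge":
                    await engine.AcknowledgeAsync(cmd.AlarmId, cmd.User, cmd.Comment, CancellationToken.None);
                    return true;
                case "TimedShelve" when cmd.UnshelveAtUtc is DateTime until:
                    await engine.TimedShelveAsync(cmd.AlarmId, cmd.User, until, CancellationToken.None);
                    return true;
                case "TimedShelve":
                    log($"TimedShelve for {cmd.AlarmId} rejected: UnshelveAtUtc missing");
                    return false;
                default:
                    log($"Unsupported operation {cmd.Operation} for {cmd.AlarmId}");
                    return false;
            }
        }
        catch (Exception ex)
        {
            // Op failures are logged, never rethrown into the caller.
            log($"{cmd.Operation} failed for {cmd.AlarmId}: {ex.Message}");
            return false;
        }
    }
}
```

No node write appears here on purpose: per design decision 3, the engine's own OnEvent drives the projection.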


Task 20: Delta-gate event firing (kill the inbound double-emit)

Classification: high-risk · ~4 min · Parallelizable with: Task 19 (depends T18)

Files:

  • Modify: src/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer/OtOpcUaNodeManager.cs — keep a ConcurrentDictionary<string, AlarmConditionSnapshot> _lastAlarmState; in WriteAlarmCondition, after projecting, compare the new state to the stored snapshot and only call ReportConditionEvent when it differs (then store it). Replace the now-stale :151-156 "fire exactly one event" comment + tighten the :190-198 double-emit note to "resolved by delta-gate".
  • Test: tests/.../OpcUaServer.Tests/SdkAddressSpaceSinkTests.cs (extend) — two identical WriteAlarmCondition calls fire the condition event once; a changed call fires again. (Assert via an event-count probe / monitored-item on the booted server fixture.)

Steps: TDD the delta-gate; run OpcUaServer.Tests; commit by path. If the booted-server test can't cleanly count events, fall back to asserting the gate's decision via a seam and prove end-to-end in T23. Serialize after T18 (same file).
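
The delta-gate decision can be isolated as a small seam, which is also what the fallback test strategy above would assert against. This is a sketch under assumptions: the snapshot fields are illustrative, the class name is hypothetical, and treating the first write for an alarm as a change is a choice made here, not something the plan specifies.

```csharp
using System;
using System.Collections.Concurrent;

// Illustrative snapshot fields; record equality gives a value comparison for free.
public sealed record AlarmConditionSnapshot(bool Active, bool Acked, bool Confirmed, string Shelving);

public sealed class DeltaGateSketch
{
    private readonly ConcurrentDictionary<string, AlarmConditionSnapshot> _lastAlarmState = new();

    // Returns true when the caller should fire a condition event for this write.
    // The first write for an alarm counts as a change (assumption of this sketch).
    public bool ShouldReport(string alarmId, AlarmConditionSnapshot next)
    {
        var changed = false;
        _lastAlarmState.AddOrUpdate(alarmId,
            _ => { changed = true; return next; },
            (_, previous) => { changed = !previous.Equals(next); return next; });
        return changed;
    }
}
```

In WriteAlarmCondition terms, the event call would be wrapped in a check on this decision, so two identical projections fire once and a genuine transition fires again.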


Task 21: AdminUI ack/shelve control

Classification: standard · ~5 min · Parallelizable with: none (depends T19)

Files:

  • Create: src/Core/ZB.MOM.WW.OtOpcUa.Commons/Messages/Admin/AcknowledgeAlarmCommand.cs + ShelveAlarmCommand.cs (control-plane messages, mirror StartDeployment's shape with a CorrelationId).
  • Modify: src/Server/ZB.MOM.WW.OtOpcUa.ControlPlane/AdminOperations/AdminOperationsActor.cs (the existing admin-pinned cluster singleton) — ReceiveAsync handlers that publish onto alarm-commands (reusing T18's topic + the host's ownership filter → the singleton solves cross-node routing for the AdminUI path too).
  • Modify: src/Server/ZB.MOM.WW.OtOpcUa.AdminUI/Clients/AdminOperationsClient.cs (+ its interface) — AcknowledgeAlarmAsync / ShelveAlarmAsync (mirror StartDeploymentAsync).
  • Modify: src/Server/ZB.MOM.WW.OtOpcUa.AdminUI/Components/Pages/Alerts.razor — per-row Acknowledge / Shelve buttons → IAdminOperationsClient; operator from AuthState … User.Identity?.Name.
  • Test: the control-plane command service + the new AdminOperationsActor handlers (TestKit / Ask). No bUnit — the razor is proven in T23.

Steps: TDD the messages + actor handlers + client; wire the razor; run + commit by path.


Task 22: Client.CLI ack / confirm / shelve commands

Classification: standard · ~5 min · Parallelizable with: Tasks 17-21 (disjoint Client.*)

Files:

  • Modify: src/Client/ZB.MOM.WW.OtOpcUa.Client.Shared/IOpcUaClientService.cs + OpcUaClientService.cs: AcknowledgeAlarmAsync is already declared (no command wires it yet); add ConfirmAlarmAsync + ShelveAlarmAsync (call the SDK Confirm/OneShotShelve/TimedShelve/Unshelve methods on the condition).
  • Create: src/Client/ZB.MOM.WW.OtOpcUa.Client.CLI/Commands/{Acknowledge,Confirm,Shelve}Command.cs (--node, --event-id, --comment; shelve adds --kind OneShot|Timed --unshelve-at).
  • Test: unit-test what's pure (arg→request mapping); the live round-trip is T23.

Steps: add the service methods + CLI commands; build; commit by path. This is net-new client feature work (the reason old "T19 verification" couldn't just be a verify pass).


Task 23: Live-verify Layer 2 end-to-end

Classification: verification · Parallelizable with: none (depends T18, T19, T20, T21, T22)

Steps: docker-dev up. Use the already-deployed t12-overheat alarm (rig state, below) as the live condition. With Client.CLI: alarms --refresh shows the real condition; drive TestMachine_002.TestChangingInt past the predicate so it fires an event on transition; call the new acknowledge command → confirm the ack round-trips (node AckedState flips, exactly one ack event fires — no double-emit, T20 — state persists across a node restart). Repeat the ack from the AdminUI /alerts buttons (T21) and confirm parity. Verify the AlarmAck gate: a user without AlarmAck is denied (BadUserAccessDenied). Agent does not sign in — user drives. Defects → new fix tasks.


Task 24: Docs + cleanup + finish branch

Classification: small · ~5 min · Parallelizable with: none (depends all)

Files: update docs/ScriptedAlarms.md, docs/VirtualTags.md, docs/v2/Runtime.md, docs/AlarmTracking.md (the inbound-ack + AlarmAck-gate flow is now real); correct the stale docs/v2/phase-7-status.md alarm-runtime status; CLAUDE.md note. Clean up the deliberately-left rig artifacts (t12-overheat, script SC-ba675b168a85, the layer0-logcheck vtag, and revert filler-02's inert cycle-time-s logger line — redeploy). Delete/condense resume.md + pending.md. Then run superpowers-extended-cc:finishing-a-development-branch (full dotnet test, merge to master).


Execution notes

  • Parallel dispatch (Layers 0+1, done): Layer 0 serial (T1→T2→T3→T4). Layer 1: T5→T6 serial; T7, T8 parallel with T5/T6; T9 waits on T6/T7/T8; T10→T11→T12 serial.
  • Parallel dispatch (Layer 2 remainder, T17-T24):
    • T17 first (roles) — its step-1 round-trip spike is a go/no-go gate for the gate design.
    • T18 after T17 (the veto gate needs the roles). T19 ∥ T20 after T18 (disjoint files: ScriptedAlarmHostActor.cs vs OtOpcUaNodeManager.cs).
    • T22 runs in parallel with the whole T17-T21 server chain (only Client.* files).
    • T21 after T19. T23 after T18/T19/T20/T21/T22. T24 last.
  • One writer at a time within a shared file: OtOpcUaNodeManager.cs is touched by T18 and T20 — serialize T18 → T20. (Layers 0/1 already merged, so Program.cs/T14-T16 contention is moot.)
  • Layer boundaries are natural checkpoints — Layers 0+1 shipped; the T17 round-trip spike is the next gate before committing to the rest of the Layer 2 inbound epic.