From 87e433871eec0c576e9a2ea0a96988cc7032d965 Mon Sep 17 00:00:00 2001 From: Joseph Doherty Date: Thu, 11 Jun 2026 07:49:41 -0400 Subject: [PATCH] docs: document inbound alarm ack/shelve (AlarmAck gate, alarm-commands, AdminUI/CLI) + remove scratch files Records T17-T22 as shipped: RoleCarryingUserIdentity, Part 9 method handlers gated on AlarmAck role, alarm-commands DPS topic, ScriptedAlarmHostActor dispatch, WriteAlarmCondition delta-gate, AdminUI /alerts Acknowledge/Shelve/Unshelve buttons via AdminOperationsActor singleton, and Client.CLI ack/confirm/shelve commands. Corrects stale "Not started" / "Partial" entries in phase-7-status.md (Stream G OPC UA method binding row and C.6 row and Gap 1 body) and adds the alarm-commands topic to Runtime.md. Removes untracked scratch files resume.md and pending.md. --- CLAUDE.md | 4 +++ docs/AlarmTracking.md | 56 ++++++++++++++++++++++++++++++++++++++- docs/ScriptedAlarms.md | 38 +++++++++++++++++++++++++- docs/v2/Runtime.md | 13 ++++++--- docs/v2/phase-7-status.md | 15 ++++++++--- 5 files changed, 116 insertions(+), 10 deletions(-) diff --git a/CLAUDE.md b/CLAUDE.md index f8da42f0..596ce4a6 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -171,3 +171,7 @@ exactly. The backend lives in `src/Server/ZB.MOM.WW.OtOpcUa.AdminUI/ScriptAnalysis/` (six minimal-API endpoints under `/api/script-analysis/*`, gated by the `FleetAdmin` policy). See `docs/ScriptEditor.md` for the full guide. Scripts may use the `{{equip}}` token for equipment-relative tag paths that resolve per-equipment at deploy time — see the "Equipment-relative tag paths" section in `docs/ScriptEditor.md`. + +## Scripted Alarm Ack/Shelve + +Inbound operator acknowledge/shelve for scripted alarms is fully implemented. 
Two surfaces converge on the `alarm-commands` DPS topic: (1) OPC UA Part 9 condition methods (Acknowledge / Confirm / AddComment / OneShotShelve / TimedShelve / Unshelve) wired in `OtOpcUaNodeManager`, gated on the `AlarmAck` LDAP role via `RoleCarryingUserIdentity`; (2) AdminUI `/alerts` per-row buttons routed through the `AdminOperationsActor` singleton (gated by `DriverOperator` policy). `ScriptedAlarmHostActor` dispatches from the topic to the engine. Client.CLI supports `ack`, `confirm`, `shelve` commands. See `docs/ScriptedAlarms.md` §"Inbound operator ack/shelve" and `docs/AlarmTracking.md`. diff --git a/docs/AlarmTracking.md b/docs/AlarmTracking.md index 54bfa527..46ad17a0 100644 --- a/docs/AlarmTracking.md +++ b/docs/AlarmTracking.md @@ -77,7 +77,7 @@ comment + original raise time) and arrive lower-latency (no publishing-interval delay on the sub-attribute reads), so they win the dedup. -## Acknowledge routing +## Acknowledge routing — Galaxy / driver alarms `DriverNodeManager` picks the acknowledger when registering each condition (PR B.3 logic): @@ -99,6 +99,60 @@ already validates the session's `AlarmAck` role before dispatching, so the gateway-side ack RPC only sees authenticated, authorised calls. +## Inbound operator ack/shelve — scripted alarms + +Scripted alarms use a separate inbound path that converges on the +`alarm-commands` DPS topic. Two surfaces route onto this topic: + +### OPC UA Part 9 method path (external OPC UA clients) + +`OtOpcUaNodeManager` wires the Part 9 condition methods (Acknowledge / +Confirm / AddComment / OneShotShelve / TimedShelve / Unshelve) on each +scripted-alarm `AlarmConditionState` node. Every call is **gated on the +`AlarmAck` LDAP role** — fail-closed: sessions with no role or without +`AlarmAck` group membership receive `BadUserAccessDenied` immediately. 
+The LDAP-resolved role set is carried past `OpcUaApplicationHost` by +`RoleCarryingUserIdentity` (a `UserIdentity` subclass), making it +readable inside the method handler at dispatch time. + +On allow, the handler publishes a `Commons.OpcUa.AlarmCommand` onto the +`alarm-commands` DPS topic. The node manager is Akka-free; the dispatch +action is a settable `Action` injected at boot by the +hosted service. + +`OnTimedUnshelve` (the SDK's automatic unshelve timer) bypasses the +operator gate — it is system-initiated. + +`WriteAlarmCondition` fires the Part 9 condition event only when the +incoming state differs from the node's current live state (delta-gate), +preventing the double-emit that would otherwise occur when the SDK +auto-applies the acked state and the engine re-projection fires a +duplicate event immediately after. + +### AdminUI path + +The `/alerts` page shows per-row **Acknowledge / Shelve / Unshelve** +buttons gated by the `DriverOperator` AdminUI policy. These route +through the `AdminOperationsActor` cluster singleton +(`AcknowledgeAlarmCommand` / `ShelveAlarmCommand`), which publishes onto +the same `alarm-commands` topic. The singleton handles cross-node +routing — the command always reaches the driver-role node owning the +engine regardless of which AdminUI instance the operator is on. + +### ScriptedAlarmHostActor dispatch + +`ScriptedAlarmHostActor` subscribes to the `alarm-commands` topic, +ownership-filters each command (each node only acts on its own alarms), +and dispatches to the matching `ScriptedAlarmEngine` operation +(`AcknowledgeAsync` / `ConfirmAsync` / `OneShotShelveAsync` / +`TimedShelveAsync` / `UnshelveAsync` / `EnableAsync` / `DisableAsync` / +`AddCommentAsync`). The engine's existing `OnEvent` callback handles +the OPC UA node update — no explicit re-projection is required. 
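The ownership filter and dispatch step described above can be sketched as follows. This is a minimal, language-neutral illustration in Python, not the actor's actual code: `AlarmCommand` here is a hypothetical stand-in for `Commons.OpcUa.AlarmCommand`, `RecordingEngine` is a fake that only records calls, and the command-kind strings are assumed to match the Part 9 method names. Only the engine operation names come from the text above.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class AlarmCommand:
    """Hypothetical stand-in for Commons.OpcUa.AlarmCommand."""
    kind: str                      # assumed values, e.g. "Acknowledge"
    condition_id: str              # alarm the command targets
    user: str                      # operator principal, kept for audit
    comment: Optional[str] = None


class RecordingEngine:
    """Fake engine: records (operation, condition id, user) per call."""

    def __init__(self) -> None:
        self.calls: List[Tuple[str, str, str]] = []

    def invoke(self, operation: str, cmd: AlarmCommand) -> None:
        self.calls.append((operation, cmd.condition_id, cmd.user))


# Command kind -> engine operation name (operation names from the docs;
# the kind strings on the left are an assumption).
OPERATIONS = {
    "Acknowledge": "AcknowledgeAsync",
    "Confirm": "ConfirmAsync",
    "OneShotShelve": "OneShotShelveAsync",
    "TimedShelve": "TimedShelveAsync",
    "Unshelve": "UnshelveAsync",
    "Enable": "EnableAsync",
    "Disable": "DisableAsync",
    "AddComment": "AddCommentAsync",
}


class AlarmCommandHost:
    """Acts only on commands for alarms this node owns."""

    def __init__(self, owned_alarm_ids: Iterable[str], engine: RecordingEngine) -> None:
        self._owned = set(owned_alarm_ids)
        self._engine = engine

    def on_command(self, cmd: AlarmCommand) -> bool:
        # Ownership filter: ignore alarms hosted on another node.
        if cmd.condition_id not in self._owned:
            return False
        operation = OPERATIONS.get(cmd.kind)
        if operation is None:
            return False  # unknown command kind: drop it
        self._engine.invoke(operation, cmd)
        return True
```

Because every node sees every command on the topic, the filter is what keeps a command from being applied more than once across the cluster. The sketch drops non-owned commands silently, which is one plausible reading of "ownership-filters".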
+ +The AdminUI `/alerts` Shelve flow was live-verified on docker-dev +2026-06-11: singleton → topic → host actor → engine → "Shelved" status +reflected on `/alerts` with the operator identity threaded through. + ## Historian write-back (non-Galaxy alarms) Scripted alarms (and any future non-Galaxy `IAlarmSource` like diff --git a/docs/ScriptedAlarms.md b/docs/ScriptedAlarms.md index 2818dee9..91b973b6 100644 --- a/docs/ScriptedAlarms.md +++ b/docs/ScriptedAlarms.md @@ -105,7 +105,43 @@ Every mutation the state machine produces is immediately persisted inside the en Two mapping notes specific to this adapter: - `SubscribeAlarmsAsync` accepts a list of source-node-id filters, interpreted as Equipment-path prefixes. Empty list matches every alarm. Each emission is matched against every live subscription — the adapter keeps no per-subscription cursor. -- `IAlarmSource.AcknowledgeAsync` does not carry a user identity. The adapter defaults the audit user to `"opcua-client"` so callers using the base interface still produce an audit entry. The server's Part 9 method handlers (Stream G) call the engine's richer `AcknowledgeAsync` / `ConfirmAsync` / `OneShotShelveAsync` / `TimedShelveAsync` / `UnshelveAsync` / `AddCommentAsync` directly with the authenticated principal instead. +- `IAlarmSource.AcknowledgeAsync` does not carry a user identity. The adapter defaults the audit user to `"opcua-client"` so callers using the base interface still produce an audit entry. The server's Part 9 method handlers call the engine's richer `AcknowledgeAsync` / `ConfirmAsync` / `OneShotShelveAsync` / `TimedShelveAsync` / `UnshelveAsync` / `AddCommentAsync` directly with the authenticated principal instead. + +## Inbound operator ack/shelve + +Operators interact with active scripted alarms through two surfaces — both converge on the same `alarm-commands` DPS topic consumed by `ScriptedAlarmHostActor`. 
+ +### AlarmAck gate (OPC UA method path) + +`OtOpcUaNodeManager` wires the OPC UA Part 9 condition methods (Acknowledge / Confirm / AddComment / OneShotShelve / TimedShelve / Unshelve) on each `AlarmConditionState` node. Every method call is gated on the `AlarmAck` LDAP role — fail-closed: a session with no resolved roles or no `AlarmAck` group membership receives `BadUserAccessDenied` immediately without reaching the engine. The role is carried on the session by `RoleCarryingUserIdentity` (a `UserIdentity` subclass that preserves the LDAP-resolved role set past `OpcUaApplicationHost`). + +On allow, the handler publishes a `Commons.OpcUa.AlarmCommand` (containing command kind, condition id, comment, and operator principal) onto the `alarm-commands` DPS topic. The node manager itself stays Akka-free: the dispatch action is a settable `Action` injected at boot by the hosted service. + +`OnTimedUnshelve` (the SDK's internal auto-unshelve timer) bypasses the client gate — it is system-initiated and not subject to operator role checks. + +### Delta-gate de-duplication + +`WriteAlarmCondition` fires a Part 9 condition event only when the incoming state differs from the node's current live state. This suppresses the double-emit that would otherwise occur when the OPC UA SDK auto-applies the acked state on the node and the engine's re-projection then attempts to fire a duplicate event. + +### AdminUI path + +The AdminUI `/alerts` page (`Alerts.razor`) shows per-row **Acknowledge / Shelve / Unshelve** buttons. These are gated by the `DriverOperator` AdminUI policy and routed through the `AdminOperationsActor` cluster singleton (`AcknowledgeAlarmCommand` / `ShelveAlarmCommand`), which publishes onto the same `alarm-commands` topic. Cross-node routing is handled by the cluster singleton — the command always reaches the driver-role node hosting the engine that owns the alarm regardless of which AdminUI instance the operator is connected to. 
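Two of the mechanisms in this section, the fail-closed `AlarmAck` gate and the `WriteAlarmCondition` delta-gate, can be sketched as follows. This is a minimal, language-neutral illustration in Python under stated assumptions: `check_alarm_ack`, `ConditionState`, and `ConditionNode` are hypothetical names, the role comparison is assumed to be case-sensitive, and the condition state is reduced to three flags. It is not the node manager's actual code.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional

GOOD = "Good"
BAD_USER_ACCESS_DENIED = "BadUserAccessDenied"


def check_alarm_ack(roles: Optional[Iterable[str]]) -> str:
    """Fail-closed gate: no resolved roles, or no AlarmAck, means denied."""
    if roles is None or "AlarmAck" not in set(roles):
        return BAD_USER_ACCESS_DENIED
    return GOOD


@dataclass(frozen=True)
class ConditionState:
    """Simplified condition state (hypothetical; the real node has more)."""
    acked: bool
    confirmed: bool
    shelved: bool


class ConditionNode:
    """Holds the live state and the list of events it has emitted."""

    def __init__(self, state: ConditionState) -> None:
        self.state = state
        self.events: List[ConditionState] = []

    def write(self, incoming: ConditionState) -> bool:
        # Delta-gate: emit an event only when the incoming state differs
        # from the live state, so a re-projection of an already-applied
        # ack does not produce a second event.
        if incoming == self.state:
            return False
        self.state = incoming
        self.events.append(incoming)
        return True
```

In this sketch the first `write` with the acked state stands in for the state already being applied on the node, and a second `write` with the same state stands in for the engine's re-projection, which the gate suppresses.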
+ +### Client.CLI path + +The Client.CLI supports `ack`, `confirm`, and `shelve` commands that call the Part 9 condition methods directly on the OPC UA server: + +```bash +dotnet run --project src/Client/ZB.MOM.WW.OtOpcUa.Client.CLI -- ack -u opc.tcp://localhost:4840 -c "ns=2;s=Plant/Line1/Oven::OverTemp" -m "Acknowledged by ops" +dotnet run --project src/Client/ZB.MOM.WW.OtOpcUa.Client.CLI -- confirm -u opc.tcp://localhost:4840 -c "ns=2;s=Plant/Line1/Oven::OverTemp" -m "Confirmed" +dotnet run --project src/Client/ZB.MOM.WW.OtOpcUa.Client.CLI -- shelve -u opc.tcp://localhost:4840 -c "ns=2;s=Plant/Line1/Oven::OverTemp" --timed 300000 +``` + +TimedShelve duration is sent as OPC UA `Duration` (milliseconds). The CLI uses `IOpcUaClientService.ConfirmAlarmAsync` / `ShelveAlarmAsync`. + +### ScriptedAlarmHostActor dispatch + +`ScriptedAlarmHostActor` subscribes to the `alarm-commands` DPS topic, ownership-filters each command (each node only acts on the alarms it owns), and dispatches to the matching `ScriptedAlarmEngine` operation (`AcknowledgeAsync` / `ConfirmAsync` / `OneShotShelveAsync` / `TimedShelveAsync` / `UnshelveAsync` / `EnableAsync` / `DisableAsync` / `AddCommentAsync`). No explicit re-projection is needed — the engine's existing `OnEvent` callback updates the OPC UA node after the transition. Emissions map into `AlarmEventArgs` as `AlarmType = Kind.ToString()`, `SourceNodeId = EquipmentPath`, `ConditionId = AlarmId`, `Message = resolved template string`, `Severity` carried verbatim, `SourceTimestampUtc = emission time`. diff --git a/docs/v2/Runtime.md b/docs/v2/Runtime.md index 40bf820d..49c8d133 100644 --- a/docs/v2/Runtime.md +++ b/docs/v2/Runtime.md @@ -82,11 +82,16 @@ Engine wiring (subscription publishing, ApplyDelta diff, bad-quality-on-disconne ## VirtualTagActor / ScriptedAlarmActor -Skeleton state machines + message handlers. Engine work: +Both are fully wired in production (F8 + F9 shipped). 
`VirtualTagActor` compiles and evaluates expressions; `ScriptedAlarmActor` owns the per-alarm Part 9 state machine and persists `ScriptedAlarmState` to the config DB. -- `VirtualTagEngine.Evaluate()` not yet called from `VirtualTagActor.DependencyValueChanged` (F8). -- `AlarmConditionService` not yet called from `ScriptedAlarmActor` (F9). -- `ScriptedAlarmState` DB persistence on `PreRestart` not wired (F9). +### alarm-commands topic (inbound operator ack/shelve) + +`ScriptedAlarmHostActor` subscribes to the `alarm-commands` DPS topic. Two surfaces publish onto this topic: + +- **OPC UA Part 9 method path** — `OtOpcUaNodeManager` handles Acknowledge / Confirm / AddComment / OneShotShelve / TimedShelve / Unshelve calls from external OPC UA clients. Each call is gated on the `AlarmAck` LDAP role (fail-closed); on allow, a `Commons.OpcUa.AlarmCommand` is published onto the topic. +- **AdminUI `/alerts` path** — `AdminOperationsActor` (cluster singleton) publishes `AcknowledgeAlarmCommand` / `ShelveAlarmCommand` from the AdminUI operator buttons. + +`ScriptedAlarmHostActor` ownership-filters incoming commands (each node acts only on its own alarms) and dispatches to the matching `ScriptedAlarmEngine` operation. The engine's `OnEvent` callback handles the resulting OPC UA condition-node update. ## OpcUaPublishActor diff --git a/docs/v2/phase-7-status.md b/docs/v2/phase-7-status.md index b7c2cb02..4a51a780 100644 --- a/docs/v2/phase-7-status.md +++ b/docs/v2/phase-7-status.md @@ -54,7 +54,7 @@ Shipped as PR #180 (36 tests). 
| C.3 — Predicate evaluation on input change; activate/clear transitions | **Done** | `ScriptedAlarmEngine.ReevaluateAsync`; `_alarmsReferencing` inverse index | | C.4 — Startup recovery (`ActiveState` re-derived; Enabled/Ack/Confirm/Shelve loaded from store) | **Done** | `ScriptedAlarmEngine.LoadAsync`; `IAlarmStateStore.LoadAsync` | | C.5 — Template substitution (`{TagPath}` tokens resolved at emission time) | **Done** | `MessageTemplate.cs`; `MessageTemplateTests.cs` | -| C.6 — OPC UA method binding (Acknowledge / Confirm / AddComment / OneShotShelve / TimedShelve / Unshelve) | **Partial** | Engine methods exist and are tested. `ScriptedAlarmSource.AcknowledgeAsync` defaults the user to `"opcua-client"`. The plan's Stream G wiring of these methods to OPC UA `MethodCall` dispatch on the condition nodes (so OPC UA client method calls reach the engine with the authenticated principal) is noted in the e2e smoke doc as "not yet wired through `DriverNodeManager.MethodCall` dispatch." Operators acknowledge through Admin UI today; the Part 9 method-call path is a follow-up. | +| C.6 — OPC UA method binding (Acknowledge / Confirm / AddComment / OneShotShelve / TimedShelve / Unshelve) | **Done** | Engine methods exist and are tested. `OtOpcUaNodeManager` wires all six Part 9 condition methods on each `AlarmConditionState` node, gated on the `AlarmAck` LDAP role. On allow, a `Commons.OpcUa.AlarmCommand` is published onto the `alarm-commands` DPS topic; `ScriptedAlarmHostActor` dispatches to the engine with the authenticated principal. (T18–T19, `feat/scriptlog-alarm-ack`.) 
| | C.7 — `IAlarmSource` implementation / fan-out registration | **Done** | `ScriptedAlarmSource.cs` | | C.8 — Tests: all state transitions, startup recovery, template substitution, shelving timer expiry | **Done** | `Part9StateMachineTests.cs`, `ScriptedAlarmEngineTests.cs`, `ScriptedAlarmSourceTests.cs`, `MessageTemplateTests.cs` — 5 test files | @@ -108,9 +108,9 @@ Shipped as PR #185 (13 Admin service tests; UI completeness is partial — see g | G.2 — `DriverNodeManager` dispatch routes reads by source; writes to non-Driver rejected with `BadUserAccessDenied` | **Done** | PR #186 follow-up; `OpcUaApplicationHost.SetPhase7Sources` threads `_virtualReadable` + `_scriptedAlarmReadable` into the node manager | | G.3 — `AlarmTracker` composition (`ScriptedAlarmEngine` registers as additional `IAlarmSource`) | **Done** | `ScriptedAlarmSource` adapts engine to `IAlarmSource`; `Phase7EngineComposer.Compose` wires it | | G.4 — Tests: mixed equipment folder browsable via Client.CLI; read/subscribe round-trip; alarm transitions in event stream | **Done** | `Phase7ComposerMappingTests.cs`, `Phase7EngineComposerTests.cs`, `ScriptedAlarmReadableTests.cs`, `CachedTagUpstreamSourceTests.cs`, `DriverSubscriptionBridgeTests.cs` — 6 test files in `Server.Tests/Phase7/` | -| OPC UA method binding for alarm Ack/Confirm/Shelve | **Not started** | Noted explicitly in `phase-7-e2e-smoke.md` §"Known limitations": `DriverNodeManager.MethodCall` dispatch for scripted alarm methods is not wired. Engine has the methods; the OPC UA call path does not reach them. | +| OPC UA method binding for alarm Ack/Confirm/Shelve | **Done** | `OtOpcUaNodeManager` wires Acknowledge / Confirm / AddComment / OneShotShelve / TimedShelve / Unshelve on each `AlarmConditionState` node; all gated on `AlarmAck` LDAP role (T18). `ScriptedAlarmHostActor` subscribes to the `alarm-commands` DPS topic and dispatches to the engine (T19). 
AdminUI `/alerts` Acknowledge/Shelve/Unshelve buttons route through `AdminOperationsActor` onto the same topic (T21). Client.CLI `ack`/`confirm`/`shelve` commands call the Part 9 methods directly (T22). Live-verified on docker-dev 2026-06-11. | -Shipped across PRs #184 + #186 (5 + 7 tests). +Shipped across PRs #184 + #186 (5 + 7 tests). Inbound ack/shelve path shipped as T17–T22 on branch `feat/scriptlog-alarm-ack`. ### Stream H — Exit gate @@ -140,7 +140,14 @@ These are real open items, not issues with the plan reconciliation. ### Gap 1 — OPC UA method-call dispatch for scripted alarm methods (Stream G / C.6) — CLOSED -All Part 9 alarm methods now route to the `ScriptedAlarmEngine`. `Acknowledge` / `Confirm` / `AddComment` route via `DriverNodeManager.RouteScriptedAlarmMethodCalls` (task #24 + follow-up); `AddComment` gates at the `AlarmAcknowledge` tier. `OneShotShelve` / `TimedShelve` / `Unshelve` route via the native `AlarmConditionState.OnShelve` / `OnTimedUnshelve` hooks wired in `MarkAsAlarmCondition`, with the per-instance shelve method NodeIds indexed so the Call gate resolves them to `OpcUaOperation.AlarmShelve`. +All Part 9 alarm methods now route to the `ScriptedAlarmEngine` (shipped T17–T22, branch `feat/scriptlog-alarm-ack`): + +- `OtOpcUaNodeManager` wires Acknowledge / Confirm / AddComment / OneShotShelve / TimedShelve / Unshelve on each `AlarmConditionState` node, gated on the `AlarmAck` LDAP role (`RoleCarryingUserIdentity` carries roles onto the session identity). On allow, a `Commons.OpcUa.AlarmCommand` is published onto the `alarm-commands` DPS topic. +- `ScriptedAlarmHostActor` subscribes to `alarm-commands`, ownership-filters, and dispatches to the matching engine operation. `OnTimedUnshelve` (SDK auto-unshelve) bypasses the client gate. +- `WriteAlarmCondition` delta-gates condition events to suppress the inbound-ack double-emit. 
+- AdminUI `/alerts` Acknowledge / Shelve / Unshelve buttons route through `AdminOperationsActor` singleton onto the same `alarm-commands` topic (gated by `DriverOperator` AdminUI policy). +- Client.CLI `ack` / `confirm` / `shelve` commands call the Part 9 methods directly. +- Live-verified on docker-dev 2026-06-11. ### Gap 2 — Admin UI: no `/virtual-tags` tab or form (Stream F.2)