Alarm tracking — v2 final architecture
This document describes how OtOpcUa surfaces alarms to OPC UA Part 9
clients after the alarms-over-gateway epic
(docs/plans/alarms-over-gateway.md)
landed. The v1 architecture (Galaxy.Host's COM-side GalaxyAlarmTracker)
is preserved at docs/v1/AlarmTracking.md for
historical reference.
Three alarm sources, one OPC UA Part 9 surface
| Source | Driver capability | Path |
|---|---|---|
| Galaxy MxAccess (driver-native) | GalaxyDriver : IAlarmSource | gateway → worker → MxAccess alarm sink → MX_EVENT_FAMILY_ON_ALARM_TRANSITION → EventPump → driver OnAlarmEvent → AlarmConditionService |
| Galaxy sub-attribute fallback | IWritable writes to $Alarm* sub-attributes | gateway data subscription → driver OnDataChange → DriverNodeManager ConditionSink → AlarmConditionService |
| Scripted alarms | Phase7Composer | server-side script evaluator → ScriptedAlarmActor transitions → HistorianAdapterActor → IAlarmHistorianSink |
All three converge on the alarm-state actor — in v2 the OPC UA Part 9 state
machine lives inside ScriptedAlarmActor
(src/Server/ZB.MOM.WW.OtOpcUa.Runtime/ScriptedAlarms/ScriptedAlarmActor.cs),
which dispatches transitions to the OPC UA condition node managers. Driver-native transitions take
precedence over sub-attribute synthesis when both arrive for the same
condition — the dedup logic prefers the richer driver-native record
because it carries the full operator + raise-time + category metadata
that the value-driven path collapses.
Galaxy driver path (driver-native)
Restored in PR B.2 of the epic. GalaxyDriver implements
IAlarmSource with these surfaces:
- SubscribeAlarmsAsync(sourceNodeIds) → returns a sentinel handle. The driver doesn't multiplex per source-node-id today; every active handle observes the gateway's alarm-event stream. The server-side AlarmConditionService filters by source-node before raising the OPC UA condition.
- UnsubscribeAlarmsAsync(handle) → symmetric handle removal.
- AcknowledgeAsync(requests) → routes one gateway RPC per acknowledgement through IGalaxyAlarmAcknowledger. Production uses GatewayGalaxyAlarmAcknowledger calling MxGatewayClient.AcknowledgeAlarmAsync (PR E.2 SDK method).
- OnAlarmEvent → bridges EventPump.OnAlarmTransition (PR B.1) onto AlarmEventArgs. Suppressed when no alarm subscription is active so untracked transitions don't leak through.
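The handle and suppression behaviour in the list above can be pictured with a small Python sketch. It is an illustration only, not the C# driver code: AlarmEventBridge and its method names are hypothetical stand-ins for the GalaxyDriver surfaces, and transitions are plain dictionaries.

```python
import itertools
from typing import Callable, Iterable, Set


class AlarmEventBridge:
    """Hypothetical stand-in for the driver's alarm-source surfaces."""

    def __init__(self, on_alarm_event: Callable[[dict], None]) -> None:
        self._on_alarm_event = on_alarm_event
        self._handles: Set[int] = set()
        self._next_handle = itertools.count(1)

    def subscribe_alarms(self, source_node_ids: Iterable[str]) -> int:
        # The node ids are accepted but not used for filtering here: every
        # active handle observes the same stream, and filtering by source
        # node is left to the server-side consumer.
        handle = next(self._next_handle)
        self._handles.add(handle)
        return handle

    def unsubscribe_alarms(self, handle: int) -> None:
        self._handles.discard(handle)

    def on_alarm_transition(self, transition: dict) -> None:
        # Suppress transitions while nobody is subscribed.
        if not self._handles:
            return
        self._on_alarm_event(transition)


received = []
bridge = AlarmEventBridge(received.append)
bridge.on_alarm_transition({"alarm": "Tank1.HiHi", "kind": "Raise"})  # dropped
handle = bridge.subscribe_alarms(["Tank1"])
bridge.on_alarm_transition({"alarm": "Tank1.HiHi", "kind": "Raise"})  # forwarded
bridge.unsubscribe_alarms(handle)
bridge.on_alarm_transition({"alarm": "Tank1.HiHi", "kind": "Clear"})  # dropped
```

Only the transition that arrives while a handle is active reaches the callback; the other two are dropped before they become alarm events.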
The proto contract carries the rich payload — alarm full reference,
source-object reference, alarm-type-name, transition kind (Raise /
Acknowledge / Clear / Retrigger), severity (raw MxAccess scale),
original raise timestamp, transition timestamp, operator user,
operator comment, alarm category, description. MxAccessSeverityMapper
(PR B.1) translates the raw severity onto the four-bucket
AlarmSeverity ladder — boundaries match v1's GalaxyAlarmTracker
so customers see no surprise re-classification.
The richer fields surface on Core.Abstractions.AlarmEventArgs via
the optional properties added in PR E.7 (OperatorComment,
OriginalRaiseTimestampUtc, AlarmCategory). Consumers that don't
need them are unaffected; consumers that do (Client.UI, Client.CLI
verbose mode) read the new fields when present.
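As a rough illustration of how optional fields keep existing consumers unaffected, here is a Python sketch. The three optional names mirror the properties listed above; condition_id, message, and format_verbose are hypothetical, and the field types are assumptions.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class AlarmEventArgs:
    # Fields every consumer relies on (names here are illustrative).
    condition_id: str
    message: str
    # Optional enrichments; None when the source cannot supply them.
    operator_comment: Optional[str] = None
    original_raise_timestamp_utc: Optional[datetime] = None
    alarm_category: Optional[str] = None


def format_verbose(args: AlarmEventArgs) -> str:
    """Render an event, appending only the rich fields that are present."""
    parts = [f"{args.condition_id}: {args.message}"]
    if args.operator_comment is not None:
        parts.append(f"comment={args.operator_comment}")
    if args.original_raise_timestamp_utc is not None:
        parts.append(f"raised={args.original_raise_timestamp_utc.isoformat()}")
    if args.alarm_category is not None:
        parts.append(f"category={args.alarm_category}")
    return " | ".join(parts)
```

A consumer that constructs or reads only the first two fields never touches the optional ones, which is the compatibility property the paragraph describes.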
Galaxy sub-attribute fallback
For Galaxy templates without $Alarm* extensions, the value-driven
path stays in place: DriverNodeManager registers an
AlarmConditionState for each Galaxy variable that exposes alarm-bearing
sub-attributes (InAlarm, Acked, Priority, Description),
subscribes to those sub-attributes, and synthesizes Part 9 transitions
when the values change. This path operated as the only Galaxy alarm
path between PR 7.2 and the alarms-over-gateway epic; it remains the
fallback today.
When both paths report the same condition,
AlarmConditionService.AlarmConditionState keeps the
driver-native record and discards the duplicate sub-attribute
synthesis. Driver-native transitions are richer (they carry the operator
comment and original raise time) and arrive with lower latency (no
publishing-interval delay on the sub-attribute reads), so they win
the dedup.
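The preference rule can be sketched in Python as follows. This is not the AlarmConditionService code: the Transition and ConditionDedup types are hypothetical, and the sketch assumes a duplicate is the same transition kind arriving again for the same condition, which the document does not spell out.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Transition:
    condition_id: str
    kind: str                      # Raise / Acknowledge / Clear / Retrigger
    driver_native: bool            # False for sub-attribute synthesis
    operator_comment: Optional[str] = None


class ConditionDedup:
    """Keeps one record per condition and prefers driver-native data."""

    def __init__(self) -> None:
        self._latest: Dict[str, Transition] = {}

    def offer(self, t: Transition) -> bool:
        """Return True when the transition should be raised to clients."""
        current = self._latest.get(t.condition_id)
        if current is None or current.kind != t.kind:
            self._latest[t.condition_id] = t   # genuinely new transition
            return True
        # Same transition seen again: it is a duplicate. Keep whichever
        # record is driver-native, since that one carries the richer fields.
        if t.driver_native and not current.driver_native:
            self._latest[t.condition_id] = t
        return False

    def record(self, condition_id: str) -> Optional[Transition]:
        return self._latest.get(condition_id)
```

Whichever path reports first, the stored record ends up being the driver-native one, and the second report does not raise a second event.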
Acknowledge routing — Galaxy / driver alarms
DriverNodeManager picks the acknowledger when registering each
condition (PR B.3 logic):
- Driver implements IAlarmSource → DriverAlarmSourceAcknowledger routes the operator comment through IAlarmSource.AcknowledgeAsync via the existing AlarmSurfaceInvoker (Phase 6.1 resilience pipeline; no-retry per decision #143). End-to-end operator-comment fidelity is preserved.
- Driver doesn't implement IAlarmSource → DriverWritableAcknowledger writes the comment into the AckMsgWriteRef sub-attribute via IWritable.WriteAsync. Same resilience pipeline; collapses the comment into a single string write at the wire level.
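The selection rule above amounts to a capability check at registration time. A minimal Python sketch, assuming abstract base classes as stand-ins for the C# interfaces and leaving out the resilience pipeline entirely (pick_acknowledger and the method signatures are hypothetical):

```python
from abc import ABC, abstractmethod


class AlarmSource(ABC):            # stands in for IAlarmSource
    @abstractmethod
    def acknowledge(self, condition_id: str, comment: str) -> None: ...


class Writable(ABC):               # stands in for IWritable
    @abstractmethod
    def write(self, reference: str, value: str) -> None: ...


class DriverAlarmSourceAcknowledger:
    def __init__(self, driver: AlarmSource) -> None:
        self._driver = driver

    def acknowledge(self, condition_id: str, comment: str) -> None:
        # Comment travels as a first-class field of the ack request.
        self._driver.acknowledge(condition_id, comment)


class DriverWritableAcknowledger:
    def __init__(self, driver: Writable, ack_msg_write_ref: str) -> None:
        self._driver = driver
        self._ref = ack_msg_write_ref

    def acknowledge(self, condition_id: str, comment: str) -> None:
        # Comment collapses into one string write to the sub-attribute.
        self._driver.write(self._ref, comment)


def pick_acknowledger(driver, ack_msg_write_ref: str):
    """Prefer the alarm-source route; fall back to the writable route."""
    if isinstance(driver, AlarmSource):
        return DriverAlarmSourceAcknowledger(driver)
    return DriverWritableAcknowledger(driver, ack_msg_write_ref)
```

Both acknowledgers expose the same acknowledge call, so the condition node does not need to know which route was chosen.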
The OPC UA Part 9 AlarmConditionState.OnAcknowledge delegate
already validates the session's AlarmAck role before dispatching,
so the gateway-side ack RPC only sees authenticated, authorised
calls.
Inbound operator ack/shelve — scripted alarms
Scripted alarms use a separate inbound path that converges on the
alarm-commands DPS topic. Two surfaces route onto this topic:
OPC UA Part 9 method path (external OPC UA clients)
OtOpcUaNodeManager wires the Part 9 condition methods (Acknowledge /
Confirm / AddComment / OneShotShelve / TimedShelve / Unshelve) on each
scripted-alarm AlarmConditionState node. Every call is gated on the
AlarmAck LDAP role — fail-closed: sessions with no role or without
AlarmAck group membership receive BadUserAccessDenied immediately.
The LDAP-resolved role set is carried past OpcUaApplicationHost by
RoleCarryingUserIdentity (a UserIdentity subclass), making it
readable inside the method handler at dispatch time.
On allow, the handler publishes a Commons.OpcUa.AlarmCommand onto the
alarm-commands DPS topic. The node manager is Akka-free; the dispatch
action is a settable Action<AlarmCommand> injected at boot by the
hosted service.
OnTimedUnshelve (the SDK's automatic unshelve timer) bypasses the
operator gate — it is system-initiated.
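The gate-then-publish flow described in this section can be sketched in Python. This is an illustration under stated assumptions, not the OtOpcUaNodeManager code: ConditionMethodGate, the status strings, and the AlarmCommand fields are hypothetical, and publishing an Unshelve command with a "system" user from the timed path is this sketch's own choice.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, Optional

GOOD = "Good"
BAD_USER_ACCESS_DENIED = "BadUserAccessDenied"
REQUIRED_ROLE = "AlarmAck"


@dataclass(frozen=True)
class AlarmCommand:
    alarm_id: str
    operation: str          # e.g. "Acknowledge", "TimedShelve"
    user: str
    comment: str = ""


class ConditionMethodGate:
    def __init__(self) -> None:
        # Settable dispatch hook, injected at boot; no-op until then.
        self.dispatch: Callable[[AlarmCommand], None] = lambda cmd: None

    def invoke(self, roles: Optional[FrozenSet[str]], user: str,
               alarm_id: str, operation: str, comment: str = "") -> str:
        # Fail closed: a missing role set is treated like a missing role.
        if not roles or REQUIRED_ROLE not in roles:
            return BAD_USER_ACCESS_DENIED
        self.dispatch(AlarmCommand(alarm_id, operation, user, comment))
        return GOOD

    def on_timed_unshelve(self, alarm_id: str) -> str:
        # System-initiated, so there is no session and no role check.
        # Publishing a command with a "system" user is this sketch's choice.
        self.dispatch(AlarmCommand(alarm_id, "Unshelve", "system"))
        return GOOD
```

Keeping dispatch as a plain callable is what lets the gate stay free of any messaging dependency: the host decides at boot where commands go.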
WriteAlarmCondition fires the Part 9 condition event only when the
incoming state differs from the node's current live state (delta-gate),
preventing the double-emit that would otherwise occur when the SDK
auto-applies the acked state and the engine re-projection fires a
duplicate event immediately after.
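A delta-gate of this kind reduces to an equality check before emitting. The Python sketch below assumes a condition's live state can be compared as a small immutable snapshot; ConditionSnapshot, ConditionNode, and the chosen fields are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class ConditionSnapshot:
    active: bool
    acked: bool
    shelved: bool = False


class ConditionNode:
    def __init__(self, state: ConditionSnapshot,
                 emit: Callable[[ConditionSnapshot], None]) -> None:
        self.state = state
        self._emit = emit

    def write_alarm_condition(self, incoming: ConditionSnapshot) -> bool:
        """Apply a projected state; emit only if it changes the node."""
        if incoming == self.state:
            return False            # no delta, so no duplicate event
        self.state = incoming
        self._emit(incoming)
        return True
```

If the node already holds the acknowledged state when the engine re-projects it, the write is a no-op and no second event is emitted.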
AdminUI path
The /alerts page shows per-row Acknowledge / Shelve / Unshelve
buttons gated by the DriverOperator AdminUI policy. These route
through the AdminOperationsActor cluster singleton
(AcknowledgeAlarmCommand / ShelveAlarmCommand), which publishes onto
the same alarm-commands topic. The singleton handles cross-node
routing — the command always reaches the driver-role node owning the
engine regardless of which AdminUI instance the operator is on.
ScriptedAlarmHostActor dispatch
ScriptedAlarmHostActor subscribes to the alarm-commands topic,
ownership-filters each command (each node only acts on its own alarms),
and dispatches to the matching ScriptedAlarmEngine operation
(AcknowledgeAsync / ConfirmAsync / OneShotShelveAsync /
TimedShelveAsync / UnshelveAsync / EnableAsync / DisableAsync /
AddCommentAsync). The engine's existing OnEvent callback handles
the OPC UA node update — no explicit re-projection is required.
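The ownership filter and operation dispatch can be sketched as a lookup table in front of the engine. This Python illustration is not the actor code: ScriptedAlarmHost, RecordingEngine, the snake_case method names, and the command operation strings are all hypothetical stand-ins.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple

# Command operation -> engine method name (snake_case stand-ins).
OPERATIONS = {
    "Acknowledge": "acknowledge",
    "Confirm": "confirm",
    "OneShotShelve": "one_shot_shelve",
    "TimedShelve": "timed_shelve",
    "Unshelve": "unshelve",
    "Enable": "enable",
    "Disable": "disable",
    "AddComment": "add_comment",
}


@dataclass(frozen=True)
class AlarmCommand:
    alarm_id: str
    operation: str
    user: str


class RecordingEngine:
    """Test double: records which operation was invoked for which alarm."""

    def __init__(self) -> None:
        self.calls: List[Tuple[str, str, str]] = []

    def __getattr__(self, name: str):
        # Only reached for names not found normally, i.e. the operations.
        if name not in OPERATIONS.values():
            raise AttributeError(name)
        return lambda alarm_id, user: self.calls.append((name, alarm_id, user))


class ScriptedAlarmHost:
    def __init__(self, owned_alarm_ids: Set[str], engine) -> None:
        self._owned = owned_alarm_ids
        self._engine = engine

    def on_command(self, cmd: AlarmCommand) -> bool:
        """Return True if this node handled the command."""
        if cmd.alarm_id not in self._owned:
            return False                      # another node owns this alarm
        method_name = OPERATIONS.get(cmd.operation)
        if method_name is None:
            return False                      # unknown operation: ignore
        getattr(self._engine, method_name)(cmd.alarm_id, cmd.user)
        return True
```

Because every node receives every command on the topic, the ownership check is what guarantees that exactly one engine acts on it.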
The AdminUI /alerts Shelve flow was live-verified on docker-dev
2026-06-11: singleton → topic → host actor → engine → "Shelved" status
reflected on /alerts with the operator identity threaded through.
Historian write-back (non-Galaxy alarms)
Scripted alarms (and any future non-Galaxy IAlarmSource like
AB CIP ALMD) route to AVEVA Historian via the Wonderware sidecar:
- IAlarmHistorianSink is the DI-registered intake contract. The default binding is NullAlarmHistorianSink (registered in ServiceCollectionExtensions.AddOtOpcUaRuntime). Production deployments override it with SqliteStoreAndForwardSink wrapping WonderwareHistorianClient (the AVEVA Historian sidecar IPC client); see ServiceHosting.md for the sidecar setup.
- SqliteStoreAndForwardSink queues each transition to a local SQLite database and drains in the background via an IAlarmHistorianWriter. The durability guarantee is bounded: the queue capacity defaults to 1,000,000 rows; under a sustained historian outage, older non-dead-lettered rows are evicted (oldest first) to make room for new events. The HistorianSinkStatus.EvictedCount counter surfaces lifetime eviction events so operators can detect silent data loss without log scraping.
- HistorianAdapterActor (src/Server/ZB.MOM.WW.OtOpcUa.Runtime/Historian/HistorianAdapterActor.cs) bridges Akka cluster messages from ScriptedAlarmActor into the sink's EnqueueAsync; fire-and-forget so the actor loop is never blocked on historian reachability.
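The bounded-durability behaviour can be sketched with Python's built-in sqlite3 module. This is a simplified illustration, not the SqliteStoreAndForwardSink implementation: the table layout, the class name, and the choice to count only non-dead-lettered rows against capacity are assumptions, and dead-lettering itself is not modelled beyond the column.

```python
import sqlite3
from typing import Callable


class StoreAndForwardQueue:
    def __init__(self, capacity: int = 1_000_000, path: str = ":memory:") -> None:
        self._db = sqlite3.connect(path)
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS queue ("
            " id INTEGER PRIMARY KEY AUTOINCREMENT,"
            " payload TEXT NOT NULL,"
            " dead_lettered INTEGER NOT NULL DEFAULT 0)"
        )
        self._capacity = capacity
        self.evicted_count = 0          # lifetime evictions, for monitoring

    def enqueue(self, payload: str) -> None:
        with self._db:                  # one transaction per enqueue
            self._db.execute("INSERT INTO queue (payload) VALUES (?)", (payload,))
            (pending,) = self._db.execute(
                "SELECT COUNT(*) FROM queue WHERE dead_lettered = 0"
            ).fetchone()
            overflow = pending - self._capacity
            if overflow > 0:
                # Evict the oldest pending rows to make room for new events.
                cur = self._db.execute(
                    "DELETE FROM queue WHERE id IN ("
                    " SELECT id FROM queue WHERE dead_lettered = 0"
                    " ORDER BY id LIMIT ?)",
                    (overflow,),
                )
                self.evicted_count += cur.rowcount

    def drain(self, write: Callable[[str], bool]) -> int:
        """Forward pending rows oldest first; stop at the first failure."""
        rows = self._db.execute(
            "SELECT id, payload FROM queue WHERE dead_lettered = 0 ORDER BY id"
        ).fetchall()
        sent = 0
        for row_id, payload in rows:
            if not write(payload):
                break                   # historian unreachable: retry later
            with self._db:
                self._db.execute("DELETE FROM queue WHERE id = ?", (row_id,))
            sent += 1
        return sent
```

With a capacity of 3 and five enqueued events during an outage, the two oldest are evicted and counted, and a later drain forwards the remaining three in order. The counter is the only trace of the lost events, which is why surfacing it matters.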
Galaxy-native alarms with $Alarm* extensions reach AVEVA Historian
directly via System Platform's HistorizeToAveva toggle on the
alarm primitive — no involvement from OtOpcUa. This sidecar path is
exclusively for non-Galaxy alarm producers.
Cross-references
- Plan: docs/plans/alarms-over-gateway.md
- v1 archive: docs/v1/AlarmTracking.md
- Galaxy driver: docs/drivers/Galaxy.md
- Phase 7 scripting + alarming: docs/v2/implementation/phase-7-scripting-and-alarming.md
- Security + ACL: docs/security.md