87e433871e
Records T17-T22 as shipped: RoleCarryingUserIdentity, Part 9 method handlers gated on AlarmAck role, alarm-commands DPS topic, ScriptedAlarmHostActor dispatch, WriteAlarmCondition delta-gate, AdminUI /alerts Acknowledge/Shelve/Unshelve buttons via AdminOperationsActor singleton, and Client.CLI ack/confirm/shelve commands. Corrects stale "Not started" / "Partial" entries in phase-7-status.md (Stream G OPC UA method binding row and C.6 row and Gap 1 body) and adds the alarm-commands topic to Runtime.md. Removes untracked scratch files resume.md and pending.md.
193 lines
9.7 KiB
Markdown
# Alarm tracking — v2 final architecture

This document describes how OtOpcUa surfaces alarms to OPC UA Part 9
clients after the **alarms-over-gateway** epic
([docs/plans/alarms-over-gateway.md](plans/alarms-over-gateway.md))
landed. The v1 architecture (Galaxy.Host's COM-side `GalaxyAlarmTracker`)
is preserved at [docs/v1/AlarmTracking.md](v1/AlarmTracking.md) for
historical reference.

## Three alarm sources, one OPC UA Part 9 surface

| Source | Driver capability | Path |
|----------------------------------|--------------------------|------|
| **Galaxy MxAccess (driver-native)** | `GalaxyDriver : IAlarmSource` | gateway → worker → MxAccess alarm sink → `MX_EVENT_FAMILY_ON_ALARM_TRANSITION` → `EventPump` → driver `OnAlarmEvent` → `AlarmConditionService` |
| **Galaxy sub-attribute fallback** | `IWritable` writes to `$Alarm*` sub-attributes | gateway data subscription → driver `OnDataChange` → `DriverNodeManager` ConditionSink → `AlarmConditionService` |
| **Scripted alarms** | `Phase7Composer` | server-side script evaluator → `ScriptedAlarmActor` transitions → `HistorianAdapterActor` → `IAlarmHistorianSink` |

All three converge on the alarm-state actor: in v2 the OPC UA Part 9 state
machine lives inside `ScriptedAlarmActor`
(`src/Server/ZB.MOM.WW.OtOpcUa.Runtime/ScriptedAlarms/ScriptedAlarmActor.cs`),
which dispatches transitions to the OPC UA condition node managers. Driver-native transitions take
precedence over sub-attribute synthesis when both arrive for the same
condition: the dedup logic prefers the richer driver-native record
because it carries the full operator + raise-time + category metadata
that the value-driven path collapses.

## Galaxy driver path (driver-native)

Restored in PR B.2 of the epic. `GalaxyDriver` implements
`IAlarmSource` with these surfaces:

- `SubscribeAlarmsAsync(sourceNodeIds)` → returns a sentinel handle.
  The driver doesn't multiplex per source-node-id today; every
  active handle observes the gateway's alarm-event stream. The
  server-side `AlarmConditionService` filters by source-node before
  raising the OPC UA condition.
- `UnsubscribeAlarmsAsync(handle)` → symmetric handle removal.
- `AcknowledgeAsync(requests)` → routes one gateway RPC per
  acknowledgement through `IGalaxyAlarmAcknowledger`. Production
  uses `GatewayGalaxyAlarmAcknowledger` calling
  `MxGatewayClient.AcknowledgeAlarmAsync` (PR E.2 SDK method).
- `OnAlarmEvent` → bridges `EventPump.OnAlarmTransition` (PR B.1)
  onto `AlarmEventArgs`. Suppressed when no alarm subscription is
  active so untracked transitions don't leak through.

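The subscription behaviour in the list above can be sketched in a few lines. This is a language-neutral illustration in Python, not the driver's C# code: `AlarmSourceSketch`, `AlarmEvent`, `pump`, and `make_source_filter` are hypothetical names, and the only behaviours modelled are the ones the list states (sentinel handles, no per-node multiplexing, suppression with no active subscription, server-side filtering by source node).

```python
import itertools
from dataclasses import dataclass
from typing import Callable, Iterable, Set


@dataclass(frozen=True)
class AlarmEvent:
    source_node_id: str
    transition: str  # e.g. "Raise", "Acknowledge", "Clear", "Retrigger"


class AlarmSourceSketch:
    """Hypothetical stand-in for a driver that hands out sentinel handles."""

    def __init__(self, on_alarm_event: Callable[[AlarmEvent], None]) -> None:
        self._handles: Set[int] = set()
        self._next_handle = itertools.count(1)
        self._on_alarm_event = on_alarm_event

    def subscribe(self, source_node_ids: Iterable[str]) -> int:
        # The node ids are accepted but not used for routing: every handle
        # observes the whole alarm-event stream.
        handle = next(self._next_handle)
        self._handles.add(handle)
        return handle

    def unsubscribe(self, handle: int) -> None:
        self._handles.discard(handle)

    def pump(self, event: AlarmEvent) -> bool:
        # Suppress transitions while no subscription is active.
        if not self._handles:
            return False
        self._on_alarm_event(event)
        return True


def make_source_filter(
    wanted: Set[str], raise_condition: Callable[[AlarmEvent], None]
) -> Callable[[AlarmEvent], None]:
    """Server-side filter: only events for wanted source nodes get through."""

    def handler(event: AlarmEvent) -> None:
        if event.source_node_id in wanted:
            raise_condition(event)

    return handler
```

Because the driver side does not filter, every subscriber pays for the full stream; the filter on the consuming side is what keeps unrelated conditions from being raised.
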

The proto contract carries the rich payload: alarm full reference,
source-object reference, alarm-type-name, transition kind (Raise /
Acknowledge / Clear / Retrigger), severity (raw MxAccess scale),
original raise timestamp, transition timestamp, operator user,
operator comment, alarm category, description. `MxAccessSeverityMapper`
(PR B.1) translates the raw severity onto the four-bucket
`AlarmSeverity` ladder; boundaries match v1's `GalaxyAlarmTracker`
so customers see no surprise re-classification.

The richer fields surface on `Core.Abstractions.AlarmEventArgs` via
the optional properties added in PR E.7 (`OperatorComment`,
`OriginalRaiseTimestampUtc`, `AlarmCategory`). Consumers that don't
need them are unaffected; consumers that do (Client.UI, Client.CLI
verbose mode) read the new fields when present.

## Galaxy sub-attribute fallback

For Galaxy templates without `$Alarm*` extensions, the value-driven
path stays in place: `DriverNodeManager` registers an
`AlarmConditionState` for each Galaxy variable that carries alarm-bearing
sub-attributes (`InAlarm`, `Acked`, `Priority`, `Description`),
subscribes to those sub-attributes, and synthesizes Part 9 transitions
when the values change. This was the only Galaxy alarm
path between PR 7.2 and the alarms-over-gateway epic; it remains the
fallback today.

When both paths report the same condition,
`AlarmConditionService.AlarmConditionState` keeps the
driver-native record and discards the duplicate sub-attribute
synthesis. Driver-native transitions are richer (they carry the operator
comment and original raise time) and arrive with lower latency (no
publishing-interval delay on the sub-attribute reads), so they win
the dedup.

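A minimal sketch of the precedence rule, in Python for illustration only. How the real service matches duplicates is not stated here; keying on `(condition_id, kind)` is an assumption, and `DedupSketch`, `Transition`, and `offer` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

DRIVER_NATIVE = "driver-native"
SUB_ATTRIBUTE = "sub-attribute"


@dataclass(frozen=True)
class Transition:
    condition_id: str
    kind: str  # "Raise", "Acknowledge", "Clear", ...
    origin: str  # DRIVER_NATIVE or SUB_ATTRIBUTE
    operator_comment: Optional[str] = None


class DedupSketch:
    """Keeps one record per (condition, transition kind); assumed key."""

    def __init__(self) -> None:
        self._kept: Dict[Tuple[str, str], Transition] = {}

    def offer(self, t: Transition) -> bool:
        """Return True if the record is kept, False if discarded."""
        key = (t.condition_id, t.kind)
        current = self._kept.get(key)
        if current is None:
            self._kept[key] = t
            return True
        if current.origin == SUB_ATTRIBUTE and t.origin == DRIVER_NATIVE:
            # The richer driver-native record replaces the synthesized one.
            self._kept[key] = t
            return True
        # Same-or-lower-priority duplicate: discard.
        return False

    def current(self, condition_id: str, kind: str) -> Optional[Transition]:
        return self._kept.get((condition_id, kind))
```

A production version would also need to expire keys once a condition moves on to its next transition; that bookkeeping is left out of the sketch.
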
## Acknowledge routing — Galaxy / driver alarms

`DriverNodeManager` picks the acknowledger when registering each
condition (PR B.3 logic):

- Driver implements `IAlarmSource` →
  `DriverAlarmSourceAcknowledger` routes the operator comment
  through `IAlarmSource.AcknowledgeAsync` via the existing
  `AlarmSurfaceInvoker` (Phase 6.1 resilience pipeline; no-retry
  per decision #143). End-to-end operator-comment fidelity is
  preserved.
- Driver doesn't implement `IAlarmSource` →
  `DriverWritableAcknowledger` writes the comment into the
  `AckMsgWriteRef` sub-attribute via `IWritable.WriteAsync`. Same
  resilience pipeline; collapses the comment into a single string
  write at the wire level.

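The capability check above amounts to a two-way selection made once, at registration time. The following Python sketch illustrates the shape of that decision; `AlarmSource`, `Writable`, and `pick_acknowledger` are hypothetical stand-ins for the C# interfaces, and the resilience pipeline is not modelled.

```python
from typing import Callable, Protocol, runtime_checkable


@runtime_checkable
class AlarmSource(Protocol):
    def acknowledge(self, condition_id: str, comment: str) -> None: ...


@runtime_checkable
class Writable(Protocol):
    def write(self, reference: str, value: str) -> None: ...


def pick_acknowledger(
    driver: object, ack_msg_write_ref: str
) -> Callable[[str, str], None]:
    """Choose the acknowledge route for one condition at registration time."""
    if isinstance(driver, AlarmSource):
        # Native route: the comment travels as its own field.
        return lambda condition_id, comment: driver.acknowledge(
            condition_id, comment
        )
    if isinstance(driver, Writable):
        # Fallback route: the comment collapses into a single string write.
        return lambda condition_id, comment: driver.write(
            ack_msg_write_ref, comment
        )
    raise TypeError("driver supports neither acknowledge nor write")
```

Checking the native capability first matters for drivers that expose both: the native route is the one that preserves the operator comment end to end.
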

The OPC UA Part 9 `AlarmConditionState.OnAcknowledge` delegate
already validates the session's `AlarmAck` role before dispatching,
so the gateway-side ack RPC only sees authenticated, authorised
calls.

## Inbound operator ack/shelve — scripted alarms

Scripted alarms use a separate inbound path that converges on the
`alarm-commands` DPS topic. Two surfaces route onto this topic:

### OPC UA Part 9 method path (external OPC UA clients)

`OtOpcUaNodeManager` wires the Part 9 condition methods (Acknowledge /
Confirm / AddComment / OneShotShelve / TimedShelve / Unshelve) on each
scripted-alarm `AlarmConditionState` node. Every call is **gated on the
`AlarmAck` LDAP role** and fails closed: sessions with no role or without
`AlarmAck` group membership receive `BadUserAccessDenied` immediately.
The LDAP-resolved role set is carried past `OpcUaApplicationHost` by
`RoleCarryingUserIdentity` (a `UserIdentity` subclass), making it
readable inside the method handler at dispatch time.

On allow, the handler publishes a `Commons.OpcUa.AlarmCommand` onto the
`alarm-commands` DPS topic. The node manager is Akka-free; the dispatch
action is a settable `Action<AlarmCommand>` injected at boot by the
hosted service.

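The gate-then-publish flow can be sketched as follows. This is a Python illustration of the pattern, not the node manager's code: `ConditionMethodGate`, its `handle` method, and the `AlarmCommand` fields are hypothetical, and the status codes are plain strings. What the real handler does if the dispatch action has not been injected yet is not covered by the text; the sketch simply drops the command in that case.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, Optional

GOOD = "Good"
BAD_USER_ACCESS_DENIED = "BadUserAccessDenied"


@dataclass(frozen=True)
class AlarmCommand:
    alarm_id: str
    operation: str  # "Acknowledge", "Confirm", "TimedShelve", ...
    user: str
    comment: str = ""


class ConditionMethodGate:
    """Fail-closed role gate in front of a settable dispatch action."""

    def __init__(self) -> None:
        # Injected after construction, so this class needs no messaging
        # dependency of its own.
        self.dispatch: Optional[Callable[[AlarmCommand], None]] = None

    def handle(
        self, roles: Optional[FrozenSet[str]], command: AlarmCommand
    ) -> str:
        # Missing role set, empty role set, or no AlarmAck: deny.
        if not roles or "AlarmAck" not in roles:
            return BAD_USER_ACCESS_DENIED
        if self.dispatch is not None:  # assumption: unset dispatch is a no-op
            self.dispatch(command)
        return GOOD
```

Keeping the dispatch as a plain callable is what lets the node manager stay free of actor-system references: the hosted service decides at boot where the command goes.
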

`OnTimedUnshelve` (the SDK's automatic unshelve timer) bypasses the
operator gate because it is system-initiated.

`WriteAlarmCondition` fires the Part 9 condition event only when the
incoming state differs from the node's current live state (delta-gate),
preventing the double-emit that would otherwise occur when the SDK
auto-applies the acked state and the engine re-projection fires a
duplicate event immediately after.

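A delta-gate of this kind compares the incoming snapshot with the live one and emits only on a difference. The Python sketch below models that comparison; `ConditionNodeSketch`, `ConditionSnapshot`, and `apply_locally` are hypothetical names, and `apply_locally` stands in for the SDK applying a state change on its own (assumed here to emit its own event).

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class ConditionSnapshot:
    active: bool
    acked: bool
    shelved: bool = False


class ConditionNodeSketch:
    def __init__(
        self, emit_event: Callable[[str, ConditionSnapshot], None]
    ) -> None:
        self._live: Dict[str, ConditionSnapshot] = {}
        self._emit = emit_event

    def apply_locally(self, condition_id: str, state: ConditionSnapshot) -> None:
        # Models the SDK auto-applying a change and emitting its own event.
        self._live[condition_id] = state
        self._emit(condition_id, state)

    def write_alarm_condition(
        self, condition_id: str, state: ConditionSnapshot
    ) -> bool:
        """Emit only when the incoming state differs from the live state."""
        if self._live.get(condition_id) == state:
            return False  # re-projection of an already-applied state
        self._live[condition_id] = state
        self._emit(condition_id, state)
        return True
```

The frozen dataclass gives value equality for free, so the gate is a single comparison; any field that should not trigger an event would have to be excluded from the snapshot.
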
### AdminUI path

The `/alerts` page shows per-row **Acknowledge / Shelve / Unshelve**
buttons gated by the `DriverOperator` AdminUI policy. These route
through the `AdminOperationsActor` cluster singleton
(`AcknowledgeAlarmCommand` / `ShelveAlarmCommand`), which publishes onto
the same `alarm-commands` topic. The singleton handles cross-node
routing: the command always reaches the driver-role node owning the
engine regardless of which AdminUI instance the operator is on.

### ScriptedAlarmHostActor dispatch

`ScriptedAlarmHostActor` subscribes to the `alarm-commands` topic,
ownership-filters each command (each node only acts on its own alarms),
and dispatches to the matching `ScriptedAlarmEngine` operation
(`AcknowledgeAsync` / `ConfirmAsync` / `OneShotShelveAsync` /
`TimedShelveAsync` / `UnshelveAsync` / `EnableAsync` / `DisableAsync` /
`AddCommentAsync`). The engine's existing `OnEvent` callback handles
the OPC UA node update; no explicit re-projection is required.

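The ownership filter plus dispatch can be pictured as a lookup table guarded by a membership test. The Python sketch below is illustrative only: `HostSketch`, the string result values, and the operation keys are hypothetical, and no actor system or topic is modelled. Since every node receives every command on the topic, the filter is what ensures exactly one node acts.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Set


@dataclass(frozen=True)
class AlarmCommand:
    alarm_id: str
    operation: str  # "Acknowledge", "Confirm", "Unshelve", ...
    user: str
    comment: str = ""


class HostSketch:
    """One instance per node; each sees every command on the topic."""

    def __init__(
        self,
        owned_alarm_ids: Set[str],
        operations: Dict[str, Callable[[AlarmCommand], None]],
    ) -> None:
        self._owned = owned_alarm_ids
        self._operations = operations

    def on_command(self, command: AlarmCommand) -> str:
        # Ownership filter: ignore alarms hosted on another node.
        if command.alarm_id not in self._owned:
            return "ignored"
        operation = self._operations.get(command.operation)
        if operation is None:
            return "unknown-operation"
        operation(command)
        return "dispatched"
```
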

The AdminUI `/alerts` Shelve flow was live-verified on docker-dev on
2026-06-11: singleton → topic → host actor → engine → "Shelved" status
reflected on `/alerts` with the operator identity threaded through.

## Historian write-back (non-Galaxy alarms)

Scripted alarms (and any future non-Galaxy `IAlarmSource` like
AB CIP ALMD) route to AVEVA Historian via the Wonderware sidecar:

- `IAlarmHistorianSink` is the DI-registered intake contract. The
  default binding is `NullAlarmHistorianSink` (registered in
  `ServiceCollectionExtensions.AddOtOpcUaRuntime`). Production
  deployments override it with `SqliteStoreAndForwardSink` wrapping
  `WonderwareHistorianClient` (the AVEVA Historian sidecar IPC client);
  see [ServiceHosting.md](ServiceHosting.md) for the sidecar setup.
- `SqliteStoreAndForwardSink` queues each transition to a local
  SQLite database and drains in the background via an
  `IAlarmHistorianWriter`. **The durability guarantee is bounded**: the
  queue capacity defaults to 1,000,000 rows; under a sustained
  historian outage, older non-dead-lettered rows are evicted (oldest
  first) to make room for new events. The `HistorianSinkStatus.EvictedCount`
  counter surfaces lifetime eviction events so operators can detect
  silent data loss without log scraping.
- `HistorianAdapterActor`
  (`src/Server/ZB.MOM.WW.OtOpcUa.Runtime/Historian/HistorianAdapterActor.cs`)
  bridges Akka cluster messages from `ScriptedAlarmActor` into the
  sink's `EnqueueAsync`; fire-and-forget so the actor loop is never
  blocked on historian reachability.

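The bounded store-and-forward behaviour can be sketched with Python's standard `sqlite3` module. This is an illustration of the technique, not the sink's implementation: `BoundedQueueSketch`, its table layout, and the `write` callback contract (return `False` when the historian is unreachable) are assumptions, and dead-lettering is not modelled at all.

```python
import sqlite3
from typing import Callable, List


class BoundedQueueSketch:
    """SQLite-backed FIFO with oldest-first eviction at capacity."""

    def __init__(self, capacity: int) -> None:
        self._capacity = capacity
        self.evicted_count = 0  # lifetime number of evicted rows
        self._db = sqlite3.connect(":memory:")
        self._db.execute(
            "CREATE TABLE queue ("
            " id INTEGER PRIMARY KEY AUTOINCREMENT,"
            " payload TEXT NOT NULL)"
        )

    def enqueue(self, payload: str) -> None:
        (count,) = self._db.execute("SELECT COUNT(*) FROM queue").fetchone()
        overflow = count + 1 - self._capacity
        if overflow > 0:
            # Evict the oldest rows to make room for the new event.
            cur = self._db.execute(
                "DELETE FROM queue WHERE id IN ("
                " SELECT id FROM queue ORDER BY id LIMIT ?)",
                (overflow,),
            )
            self.evicted_count += cur.rowcount
        self._db.execute("INSERT INTO queue (payload) VALUES (?)", (payload,))
        self._db.commit()

    def drain(self, write: Callable[[str], bool]) -> int:
        """Forward rows in order; stop at the first failed write."""
        rows = self._db.execute(
            "SELECT id, payload FROM queue ORDER BY id"
        ).fetchall()
        sent = 0
        for row_id, payload in rows:
            if not write(payload):
                break  # leave this row and the rest for the next drain
            self._db.execute("DELETE FROM queue WHERE id = ?", (row_id,))
            sent += 1
        self._db.commit()
        return sent

    def pending(self) -> List[str]:
        rows = self._db.execute("SELECT payload FROM queue ORDER BY id")
        return [payload for (payload,) in rows]
```

With a capacity of 3, enqueueing five events leaves the three newest in the queue and an eviction counter of 2, which is the signal an operator would watch to notice loss during a long outage.
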

Galaxy-native alarms with `$Alarm*` extensions reach AVEVA Historian
directly via System Platform's `HistorizeToAveva` toggle on the
alarm primitive, with no involvement from OtOpcUa. This sidecar path is
exclusively for non-Galaxy alarm producers.

## Cross-references

- Plan: [docs/plans/alarms-over-gateway.md](plans/alarms-over-gateway.md)
- v1 archive: [docs/v1/AlarmTracking.md](v1/AlarmTracking.md)
- Galaxy driver: [docs/drivers/Galaxy.md](drivers/Galaxy.md)
- Phase 7 scripting + alarming: [docs/v2/implementation/phase-7-scripting-and-alarming.md](v2/implementation/phase-7-scripting-and-alarming.md)
- Security + ACL: [docs/security.md](security.md)