ORPHAN DECISION: Keep as live doc (path: keep-and-fix). Rationale: the file carries unique v2 current content describing the alarms-over-gateway epic architecture; docs/ScriptedAlarms.md cross-references it explicitly. The orphan symptom is that docs/README.md still indexes docs/v1/AlarmTracking.md; wiring this top-level file into README.md is a follow-up task.
STRUCTURAL (dimension 2):
- docs/AlarmTracking.md line 138: Security.md → security.md (CASE-MISMATCH from links-report.md rows 1–2). Verified: docs/security.md exists (inode 77517627); docs/Security.md is the same file on APFS case-insensitive FS, but the checker requires exact on-disk casing. check_links.py: zero rows for docs/AlarmTracking.md after fix.
CODE-REALITY (dimension 4):
- line 16 table: `Phase7EngineComposer` / `Phase7EngineComposer.RouteToHistorianAsync` → no such class exists. Real class is `Phase7Composer` (src/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer/Phase7Composer.cs). Scripted-alarm historian routing goes through ScriptedAlarmActor → HistorianAdapterActor → IAlarmHistorianSink, not a RouteToHistorianAsync method. Fixed to: Phase7Composer / ScriptedAlarmActor transitions → HistorianAdapterActor → IAlarmHistorianSink.
- lines 107–123 "Historian write-back" section: referenced `Phase7Composer.ResolveHistorianSink` (method doesn't exist in current Phase7Composer.cs), `GalaxyProxyDriver` / `GalaxyHistorianWriter` (retired in PR 7.2; no such class in codebase), and `aahClientManaged` as a direct call (now mediated through WonderwareHistorianClient). Current architecture: NullAlarmHistorianSink default registered in ServiceCollectionExtensions.AddOtOpcUaRuntime(); production override is SqliteStoreAndForwardSink wrapping WonderwareHistorianClient; bridge is HistorianAdapterActor (src/Server/ZB.MOM.WW.OtOpcUa.Runtime/Historian/HistorianAdapterActor.cs). Section rewritten to match code reality.
- line 108: "Program.cs" as NullAlarmHistorianSink registration site → actual site is ServiceCollectionExtensions.cs, not Program.cs.
STALE-STATUS (dimension 3): no blocked/pending/not-yet banners found in the top-level file; it was already written as current-state fact. Galaxy native alarms work end-to-end (verified 2026-05-31) and the doc correctly describes that as delivered.
CODE-BUG-FLAGS: none. All stale references were doc-side errors; the production code is correct.
UNVERIFIABLE CLAIMS: AlarmConditionService, DriverNodeManager, ConditionSink, DriverAlarmSourceAcknowledger, DriverWritableAcknowledger. These are mentioned by name in the doc but their .cs files were not found in the search. They may live under a path not searched, or may be internal implementation details within existing files. These claims are plausible given the architecture and were not changed.
Alarm tracking — v2 final architecture
This document describes how OtOpcUa surfaces alarms to OPC UA Part 9
clients after the alarms-over-gateway epic
(docs/plans/alarms-over-gateway.md)
landed. The v1 architecture (Galaxy.Host's COM-side GalaxyAlarmTracker)
is preserved at docs/v1/AlarmTracking.md for
historical reference.
Three alarm sources, one OPC UA Part 9 surface
| Source | Driver capability | Path |
|---|---|---|
| Galaxy MxAccess (driver-native) | GalaxyDriver : IAlarmSource | gateway → worker → MxAccess alarm sink → MX_EVENT_FAMILY_ON_ALARM_TRANSITION → EventPump → driver OnAlarmEvent → AlarmConditionService |
| Galaxy sub-attribute fallback | IWritable writes to $Alarm* sub-attributes | gateway data subscription → driver OnDataChange → DriverNodeManager ConditionSink → AlarmConditionService |
| Scripted alarms | Phase7Composer | server-side script evaluator → ScriptedAlarmActor transitions → HistorianAdapterActor → IAlarmHistorianSink |
All three converge on the alarm-state actor — in v2 the OPC UA Part 9 state
machine lives inside ScriptedAlarmActor
(src/Server/ZB.MOM.WW.OtOpcUa.Runtime/ScriptedAlarms/ScriptedAlarmActor.cs),
which dispatches transitions to the OPC UA condition node managers. Driver-native transitions take
precedence over sub-attribute synthesis when both arrive for the same
condition — the dedup logic prefers the richer driver-native record
because it carries the full operator + raise-time + category metadata
that the value-driven path collapses.
Galaxy driver path (driver-native)
Restored in PR B.2 of the epic. GalaxyDriver implements
IAlarmSource with these surfaces:
- SubscribeAlarmsAsync(sourceNodeIds) → returns a sentinel handle. The driver doesn't multiplex per source-node-id today; every active handle observes the gateway's alarm-event stream. The server-side AlarmConditionService filters by source-node before raising the OPC UA condition.
- UnsubscribeAlarmsAsync(handle) → symmetric handle removal.
- AcknowledgeAsync(requests) → routes one gateway RPC per acknowledgement through IGalaxyAlarmAcknowledger. Production uses GatewayGalaxyAlarmAcknowledger calling MxGatewayClient.AcknowledgeAlarmAsync (PR E.2 SDK method).
- OnAlarmEvent → bridges EventPump.OnAlarmTransition (PR B.1) onto AlarmEventArgs. Suppressed when no alarm subscription is active so untracked transitions don't leak through.
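The subscription behaviour in the list above can be sketched as follows. This is a minimal, language-neutral illustration in Python, not the driver's C# code: the class name `AlarmSourceSketch`, its method names, and the dictionary-shaped transition are all hypothetical. It shows only the two properties the text describes: handles are sentinels that do not route by source node, and transitions are dropped while no handle is active.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import Callable, List, Set


@dataclass
class AlarmSourceSketch:
    """Hypothetical stand-in for the driver's alarm-subscription bookkeeping."""

    on_alarm_event: Callable[[dict], None]
    _handles: Set[int] = field(default_factory=set)
    _ids: count = field(default_factory=lambda: count(1))

    def subscribe_alarms(self, source_node_ids: List[str]) -> int:
        # The node ids are accepted but not used for routing: every active
        # handle observes the same stream (the "sentinel handle" behaviour).
        handle = next(self._ids)
        self._handles.add(handle)
        return handle

    def unsubscribe_alarms(self, handle: int) -> None:
        # Symmetric removal; unknown handles are ignored.
        self._handles.discard(handle)

    def pump_transition(self, transition: dict) -> None:
        # Suppress transitions while no alarm subscription is active.
        if not self._handles:
            return
        self.on_alarm_event(transition)


received: List[dict] = []
source = AlarmSourceSketch(on_alarm_event=received.append)

source.pump_transition({"alarm": "Tank1.HiHi", "kind": "Raise"})  # dropped
handle = source.subscribe_alarms(["Tank1"])
source.pump_transition({"alarm": "Tank1.HiHi", "kind": "Raise"})  # delivered
source.unsubscribe_alarms(handle)
source.pump_transition({"alarm": "Tank1.HiHi", "kind": "Clear"})  # dropped
```

Filtering by source node is deliberately absent here, since the text places that responsibility on the server side rather than in the driver.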
The proto contract carries the rich payload — alarm full reference,
source-object reference, alarm-type-name, transition kind (Raise /
Acknowledge / Clear / Retrigger), severity (raw MxAccess scale),
original raise timestamp, transition timestamp, operator user,
operator comment, alarm category, description. MxAccessSeverityMapper
(PR B.1) translates the raw severity onto the four-bucket
AlarmSeverity ladder — boundaries match v1's GalaxyAlarmTracker
so customers see no surprise re-classification.
The richer fields surface on Core.Abstractions.AlarmEventArgs via
the optional properties added in PR E.7 (OperatorComment,
OriginalRaiseTimestampUtc, AlarmCategory). Consumers that don't
need them are unaffected; consumers that do (Client.UI, Client.CLI
verbose mode) read the new fields when present.
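A minimal sketch of why optional properties leave existing consumers unaffected, written in Python for illustration only. The three optional field names mirror the properties named above in snake case; `AlarmEventArgsSketch`, `source_node_id`, `message`, and `format_verbose` are hypothetical and do not describe the real Core.Abstractions type.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class AlarmEventArgsSketch:
    # Hypothetical required fields.
    source_node_id: str
    message: str
    # Optional richer fields; they default to None so older producers and
    # consumers keep working unchanged.
    operator_comment: Optional[str] = None
    original_raise_timestamp_utc: Optional[datetime] = None
    alarm_category: Optional[str] = None


def format_verbose(evt: AlarmEventArgsSketch) -> str:
    """Render an event, appending each richer field only when it is present."""
    parts = [f"{evt.source_node_id}: {evt.message}"]
    if evt.operator_comment is not None:
        parts.append(f"comment={evt.operator_comment!r}")
    if evt.original_raise_timestamp_utc is not None:
        parts.append(f"raised={evt.original_raise_timestamp_utc.isoformat()}")
    if evt.alarm_category is not None:
        parts.append(f"category={evt.alarm_category}")
    return " | ".join(parts)


plain = AlarmEventArgsSketch("Tank1", "Level high")
rich = AlarmEventArgsSketch(
    "Tank1",
    "Level high",
    operator_comment="checked valve",
    original_raise_timestamp_utc=datetime(2026, 1, 1, 12, 0, tzinfo=timezone.utc),
    alarm_category="Process",
)
```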
Galaxy sub-attribute fallback
For Galaxy templates without $Alarm* extensions, the value-driven
path stays in place: DriverNodeManager registers an
AlarmConditionState per Galaxy variable that carries alarm
sub-attributes (InAlarm, Acked, Priority, Description),
subscribes to those sub-attributes, and synthesizes Part 9 transitions
when the values change. This path operated as the only Galaxy alarm
path between PR 7.2 and the alarms-over-gateway epic; it remains the
fallback today.
When both paths report the same condition,
AlarmConditionService.AlarmConditionState keeps the
driver-native record and discards the duplicate sub-attribute
synthesis. Driver-native transitions are richer (they carry the operator
comment and the original raise time) and arrive with lower latency (no
publishing-interval delay on the sub-attribute reads), so they win
the dedup.
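One way to picture the dedup rule is the sketch below. It is an assumption-laden illustration in Python, not the service's actual logic: `Transition`, `ConditionDedup`, and the choice to key on the condition id and compare the transition kind are all hypothetical. It only demonstrates the stated preference: when both paths report the same transition, the driver-native record is the one that is kept.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Transition:
    condition_id: str
    kind: str                      # e.g. "Raise", "Acknowledge", "Clear"
    driver_native: bool            # False means synthesized from sub-attributes
    operator_comment: Optional[str] = None


class ConditionDedup:
    """Hypothetical per-condition dedup that prefers driver-native records."""

    def __init__(self) -> None:
        self._latest: Dict[str, Transition] = {}

    def offer(self, t: Transition) -> bool:
        """Return True if the transition is kept, False if it is discarded."""
        current = self._latest.get(t.condition_id)
        if current is None or current.kind != t.kind:
            # First report of this transition for the condition.
            self._latest[t.condition_id] = t
            return True
        if t.driver_native and not current.driver_native:
            # Same transition, but the richer driver-native record replaces
            # the earlier sub-attribute synthesis.
            self._latest[t.condition_id] = t
            return True
        # Duplicate: same transition and nothing richer to offer.
        return False

    def current(self, condition_id: str) -> Optional[Transition]:
        return self._latest.get(condition_id)


native = Transition("Tank1.HiHi", "Raise", True, "checked valve")
synth = Transition("Tank1.HiHi", "Raise", False)

native_first = ConditionDedup()
kept_a = native_first.offer(native)   # kept
kept_b = native_first.offer(synth)    # discarded as a duplicate

synth_first = ConditionDedup()
kept_c = synth_first.offer(synth)     # kept until something richer arrives
kept_d = synth_first.offer(native)    # replaces the synthesized record
```

In both arrival orders the stored record ends up being the driver-native one, which is the outcome the paragraph above describes.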
Acknowledge routing
DriverNodeManager picks the acknowledger when registering each
condition (PR B.3 logic):
- Driver implements IAlarmSource → DriverAlarmSourceAcknowledger routes the operator comment through IAlarmSource.AcknowledgeAsync via the existing AlarmSurfaceInvoker (Phase 6.1 resilience pipeline; no-retry per decision #143). End-to-end operator-comment fidelity is preserved.
- Driver doesn't implement IAlarmSource → DriverWritableAcknowledger writes the comment into the AckMsgWriteRef sub-attribute via IWritable.WriteAsync. Same resilience pipeline; collapses the comment into a single string write at the wire level.
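The capability-based choice above can be sketched as a small selection function. This is a Python illustration under stated assumptions, not the DriverNodeManager code: the protocol and class names are simplified stand-ins, the method signatures are invented, and the resilience pipeline is omitted entirely.

```python
from typing import List, Protocol, Tuple, runtime_checkable


@runtime_checkable
class AlarmSource(Protocol):
    def acknowledge(self, condition_id: str, comment: str) -> None: ...


@runtime_checkable
class Writable(Protocol):
    def write(self, ref: str, value: str) -> None: ...


class AlarmSourceAcknowledger:
    """Routes the comment through the driver's native acknowledge call."""

    def __init__(self, driver: AlarmSource) -> None:
        self._driver = driver

    def acknowledge(self, condition_id: str, comment: str) -> None:
        self._driver.acknowledge(condition_id, comment)


class WritableAcknowledger:
    """Collapses the acknowledgement into a single string write."""

    def __init__(self, driver: Writable, ack_msg_write_ref: str) -> None:
        self._driver = driver
        self._ref = ack_msg_write_ref

    def acknowledge(self, condition_id: str, comment: str) -> None:
        self._driver.write(self._ref, comment)


def pick_acknowledger(driver: object, ack_msg_write_ref: str):
    # Prefer the native alarm capability; fall back to a plain write.
    if isinstance(driver, AlarmSource):
        return AlarmSourceAcknowledger(driver)
    if isinstance(driver, Writable):
        return WritableAcknowledger(driver, ack_msg_write_ref)
    raise TypeError("driver can neither acknowledge alarms nor write values")


class NativeDriver:
    """Hypothetical driver with both capabilities."""

    def __init__(self) -> None:
        self.acks: List[Tuple[str, str]] = []
        self.writes: List[Tuple[str, str]] = []

    def acknowledge(self, condition_id: str, comment: str) -> None:
        self.acks.append((condition_id, comment))

    def write(self, ref: str, value: str) -> None:
        self.writes.append((ref, value))


class WriteOnlyDriver:
    """Hypothetical driver that can only write values."""

    def __init__(self) -> None:
        self.writes: List[Tuple[str, str]] = []

    def write(self, ref: str, value: str) -> None:
        self.writes.append((ref, value))


native_driver, plain_driver = NativeDriver(), WriteOnlyDriver()
pick_acknowledger(native_driver, "Tank1.AckMsg").acknowledge("Tank1.HiHi", "seen")
pick_acknowledger(plain_driver, "Tank1.AckMsg").acknowledge("Tank1.HiHi", "seen")
```

The selection happens once per condition at registration time in the text above; the sketch calls it inline only to keep the example short.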
The OPC UA Part 9 AlarmConditionState.OnAcknowledge delegate
already validates the session's AlarmAck role before dispatching,
so the gateway-side ack RPC only sees authenticated, authorised
calls.
Historian write-back (non-Galaxy alarms)
Scripted alarms (and any future non-Galaxy IAlarmSource like
AB CIP ALMD) route to AVEVA Historian via the Wonderware sidecar:
- IAlarmHistorianSink is the DI-registered intake contract. The default binding is NullAlarmHistorianSink (registered in ServiceCollectionExtensions.AddOtOpcUaRuntime). Production deployments override it with SqliteStoreAndForwardSink wrapping WonderwareHistorianClient (the AVEVA Historian sidecar IPC client); see ServiceHosting.md for the sidecar setup.
- SqliteStoreAndForwardSink queues each transition to a local SQLite database and drains in the background via an IAlarmHistorianWriter. The durability guarantee is bounded: the queue capacity defaults to 1,000,000 rows; under a sustained historian outage, older non-dead-lettered rows are evicted (oldest first) to make room for new events. The HistorianSinkStatus.EvictedCount counter surfaces lifetime eviction events so operators can detect silent data loss without log scraping.
- HistorianAdapterActor (src/Server/ZB.MOM.WW.OtOpcUa.Runtime/Historian/HistorianAdapterActor.cs) bridges Akka cluster messages from ScriptedAlarmActor into the sink's EnqueueAsync; fire-and-forget so the actor loop is never blocked on historian reachability.
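The bounded-durability behaviour of the store-and-forward queue can be illustrated with a small sketch. This is a Python approximation using an in-memory SQLite database, not the SqliteStoreAndForwardSink implementation: the table layout, the `StoreAndForwardSketch` class, and its method names are assumptions, and dead-lettering is modelled only as a column that eviction and draining skip.

```python
import sqlite3
from typing import Callable, List


class StoreAndForwardSketch:
    """Hypothetical bounded queue with oldest-first eviction."""

    def __init__(self, capacity: int) -> None:
        self._capacity = capacity
        self.evicted_count = 0  # lifetime evictions, for operator visibility
        self._db = sqlite3.connect(":memory:")
        self._db.execute(
            "CREATE TABLE queue ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, "
            "payload TEXT NOT NULL, "
            "dead_lettered INTEGER NOT NULL DEFAULT 0)"
        )

    def enqueue(self, payload: str) -> None:
        (depth,) = self._db.execute("SELECT COUNT(*) FROM queue").fetchone()
        if depth >= self._capacity:
            # Evict the oldest row that is not dead-lettered. If every row is
            # dead-lettered nothing is deleted and the queue grows by one.
            cur = self._db.execute(
                "DELETE FROM queue WHERE id = "
                "(SELECT MIN(id) FROM queue WHERE dead_lettered = 0)"
            )
            self.evicted_count += cur.rowcount
        self._db.execute("INSERT INTO queue (payload) VALUES (?)", (payload,))
        self._db.commit()

    def drain(self, write: Callable[[str], bool]) -> int:
        """Forward rows oldest-first; stop at the first failed write."""
        rows = self._db.execute(
            "SELECT id, payload FROM queue WHERE dead_lettered = 0 ORDER BY id"
        ).fetchall()
        sent = 0
        for row_id, payload in rows:
            if not write(payload):
                break  # historian unreachable; remaining rows stay queued
            self._db.execute("DELETE FROM queue WHERE id = ?", (row_id,))
            sent += 1
        self._db.commit()
        return sent

    def pending(self) -> List[str]:
        return [p for (p,) in self._db.execute(
            "SELECT payload FROM queue ORDER BY id")]


sink = StoreAndForwardSketch(capacity=3)
for name in ["t1", "t2", "t3", "t4", "t5"]:
    sink.enqueue(name)
queued_during_outage = sink.pending()

delivered: List[str] = []

def historian_write(payload: str) -> bool:
    delivered.append(payload)
    return True

sent = sink.drain(historian_write)
```

With a capacity of three and five enqueued transitions, the two oldest are evicted and counted, which is the kind of loss the eviction counter is meant to surface.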
Galaxy-native alarms with $Alarm* extensions reach AVEVA Historian
directly via System Platform's HistorizeToAveva toggle on the
alarm primitive — no involvement from OtOpcUa. This sidecar path is
exclusively for non-Galaxy alarm producers.
Cross-references
- Plan: docs/plans/alarms-over-gateway.md
- v1 archive: docs/v1/AlarmTracking.md
- Galaxy driver: docs/drivers/Galaxy.md
- Phase 7 scripting + alarming: docs/v2/implementation/phase-7-scripting-and-alarming.md
- Security + ACL: docs/security.md