lmxopcua/docs/AlarmTracking.md
Joseph Doherty 2c938ea6f7 docs(audit): AlarmTracking.md — accuracy + orphan resolution
ORPHAN DECISION: Keep as live doc (path: keep-and-fix).
Rationale: the file carries unique v2 current content describing
the alarms-over-gateway epic architecture; docs/ScriptedAlarms.md
cross-references it explicitly. The orphan symptom is that
docs/README.md still indexes docs/v1/AlarmTracking.md — wiring
this top-level file into README.md is a follow-up task.

STRUCTURAL (dimension 2):
- docs/AlarmTracking.md line 138: Security.md → security.md (CASE-MISMATCH
  from links-report.md rows 1–2). Verified: docs/security.md exists
  (inode 77517627); docs/Security.md is the same file on APFS
  case-insensitive FS, but the checker requires exact on-disk casing.
  check_links.py: zero rows for docs/AlarmTracking.md after fix.

CODE-REALITY (dimension 4):
- line 16 table: `Phase7EngineComposer` / `Phase7EngineComposer.RouteToHistorianAsync`
  → no such class exists. Real class is `Phase7Composer`
  (src/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer/Phase7Composer.cs).
  Scripted-alarm historian routing goes through ScriptedAlarmActor →
  HistorianAdapterActor → IAlarmHistorianSink, not a RouteToHistorianAsync
  method. Fixed to: Phase7Composer / ScriptedAlarmActor transitions →
  HistorianAdapterActor → IAlarmHistorianSink.
- lines 107–123 "Historian write-back" section: referenced
  `Phase7Composer.ResolveHistorianSink` (method doesn't exist in
  current Phase7Composer.cs), `GalaxyProxyDriver` / `GalaxyHistorianWriter`
  (retired in PR 7.2 — no such class in codebase), and `aahClientManaged`
  as a direct call (now mediated through WonderwareHistorianClient).
  Current architecture: NullAlarmHistorianSink default registered in
  ServiceCollectionExtensions.AddOtOpcUaRuntime(); production override
  is SqliteStoreAndForwardSink wrapping WonderwareHistorianClient; bridge
  is HistorianAdapterActor (src/Server/ZB.MOM.WW.OtOpcUa.Runtime/Historian/
  HistorianAdapterActor.cs). Section rewritten to match code reality.
- line 108: "Program.cs" as NullAlarmHistorianSink registration site →
  actual site is ServiceCollectionExtensions.cs, not Program.cs.

STALE-STATUS (dimension 3): no blocked/pending/not-yet banners found
in the top-level file; it was already written as current-state fact.
Galaxy native alarms work end-to-end (verified 2026-05-31) and the
doc correctly describes that as delivered.

CODE-BUG-FLAGS: none. All stale references were doc-side errors; the
production code is correct.

UNVERIFIABLE CLAIMS: AlarmConditionService, DriverNodeManager, ConditionSink,
DriverAlarmSourceAcknowledger, DriverWritableAcknowledger — these are
mentioned by name in the doc but their .cs files were not found in the
search. They may live under a path not searched, or may be internal
implementation details within existing files. These claims are plausible
given the architecture and were not changed.
2026-06-03 15:40:37 -04:00


Alarm tracking — v2 final architecture

This document describes how OtOpcUa surfaces alarms to OPC UA Part 9 clients after the alarms-over-gateway epic (docs/plans/alarms-over-gateway.md) landed. The v1 architecture (Galaxy.Host's COM-side GalaxyAlarmTracker) is preserved at docs/v1/AlarmTracking.md for historical reference.

Three alarm sources, one OPC UA Part 9 surface

| Source | Driver capability | Path |
|---|---|---|
| Galaxy MxAccess (driver-native) | GalaxyDriver : IAlarmSource | gateway → worker → MxAccess alarm sink → MX_EVENT_FAMILY_ON_ALARM_TRANSITION → EventPump → driver OnAlarmEvent → AlarmConditionService |
| Galaxy sub-attribute fallback | IWritable writes to $Alarm* sub-attributes | gateway data subscription → driver OnDataChange → DriverNodeManager ConditionSink → AlarmConditionService |
| Scripted alarms | Phase7Composer | server-side script evaluator → ScriptedAlarmActor transitions → HistorianAdapterActor → IAlarmHistorianSink |

All three converge on the alarm-state actor — in v2 the OPC UA Part 9 state machine lives inside ScriptedAlarmActor (src/Server/ZB.MOM.WW.OtOpcUa.Runtime/ScriptedAlarms/ScriptedAlarmActor.cs), which dispatches transitions to the OPC UA condition node managers. Driver-native transitions take precedence over sub-attribute synthesis when both arrive for the same condition — the dedup logic prefers the richer driver-native record because it carries the full operator + raise-time + category metadata that the value-driven path collapses.

Galaxy driver path (driver-native)

Restored in PR B.2 of the epic. GalaxyDriver implements IAlarmSource with these surfaces:

  • SubscribeAlarmsAsync(sourceNodeIds) → returns a sentinel handle. The driver doesn't multiplex per source-node-id today; every active handle observes the gateway's alarm-event stream. The server-side AlarmConditionService filters by source-node before raising the OPC UA condition.
  • UnsubscribeAlarmsAsync(handle) → symmetric handle removal.
  • AcknowledgeAsync(requests) → routes one gateway RPC per acknowledgement through IGalaxyAlarmAcknowledger. Production uses GatewayGalaxyAlarmAcknowledger calling MxGatewayClient.AcknowledgeAlarmAsync (PR E.2 SDK method).
  • OnAlarmEvent → bridges EventPump.OnAlarmTransition (PR B.1) onto AlarmEventArgs. Suppressed when no alarm subscription is active so untracked transitions don't leak through.
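
The subscription behaviour in the list above can be modelled in a few lines. This is a minimal, language-neutral sketch in Python, not the C# implementation: the class `DriverAlarmSource`, the method `pump_transition`, and the helper `make_server_filter` are hypothetical names chosen for illustration. It shows the two properties the text describes: events are suppressed while no handle is active, and per-source filtering happens on the server side rather than in the driver.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Set


@dataclass(frozen=True)
class AlarmEvent:
    source_node_id: str
    transition: str  # "Raise", "Acknowledge", "Clear", or "Retrigger"


class DriverAlarmSource:
    """Hypothetical stand-in for the driver's alarm surface."""

    def __init__(self, on_alarm_event: Callable[[AlarmEvent], None]):
        self._on_alarm_event = on_alarm_event
        self._handles: Set[int] = set()
        self._next_handle = 0

    def subscribe(self, source_node_ids: Iterable[str]) -> int:
        # The node ids are accepted but not used for multiplexing:
        # every active handle observes the whole alarm-event stream.
        handle = self._next_handle
        self._next_handle += 1
        self._handles.add(handle)
        return handle

    def unsubscribe(self, handle: int) -> None:
        self._handles.discard(handle)

    def pump_transition(self, event: AlarmEvent) -> None:
        # Suppress transitions while nobody is subscribed.
        if not self._handles:
            return
        self._on_alarm_event(event)


def make_server_filter(wanted: Set[str], raised: List[AlarmEvent]):
    """Server-side filter: only wanted source nodes raise a condition."""
    def handler(event: AlarmEvent) -> None:
        if event.source_node_id in wanted:
            raised.append(event)
    return handler
```

With this model, a transition pumped before `subscribe` is dropped, and a transition for a node outside `wanted` reaches the handler but raises nothing.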

The proto contract carries the rich payload — alarm full reference, source-object reference, alarm-type-name, transition kind (Raise / Acknowledge / Clear / Retrigger), severity (raw MxAccess scale), original raise timestamp, transition timestamp, operator user, operator comment, alarm category, description. MxAccessSeverityMapper (PR B.1) translates the raw severity onto the four-bucket AlarmSeverity ladder — boundaries match v1's GalaxyAlarmTracker so customers see no surprise re-classification.
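
As an illustration of the bucketing idea only, the following Python sketch maps a raw severity onto a four-level ladder. Everything specific in it is an assumption: the bucket names, the boundary values in `PLACEHOLDER_BUCKET_STARTS`, and the direction of the scale (larger raw value treated as more severe) are placeholders, not the values used by MxAccessSeverityMapper or v1's GalaxyAlarmTracker.

```python
import bisect
from enum import IntEnum


class AlarmSeverity(IntEnum):
    # Bucket names are assumed; the source only says there are four.
    LOW = 0
    MEDIUM = 1
    HIGH = 2
    CRITICAL = 3


# PLACEHOLDER boundaries for illustration; not the real v1 values.
# A raw value >= 250 starts MEDIUM, >= 500 HIGH, >= 750 CRITICAL.
PLACEHOLDER_BUCKET_STARTS = [250, 500, 750]


def map_severity(raw: int) -> AlarmSeverity:
    """Return the bucket whose range contains the raw severity."""
    index = bisect.bisect_right(PLACEHOLDER_BUCKET_STARTS, raw)
    return AlarmSeverity(index)
```

Keeping the boundaries in one table is what makes "same boundaries as v1" a checkable property: a regression test can assert the mapping at each boundary value.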

The richer fields surface on Core.Abstractions.AlarmEventArgs via the optional properties added in PR E.7 (OperatorComment, OriginalRaiseTimestampUtc, AlarmCategory). Consumers that don't need them are unaffected; consumers that do (Client.UI, Client.CLI verbose mode) read the new fields when present.
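
The "optional properties" pattern can be sketched as follows. This is a Python model, not the C# type: the three optional fields mirror the property names given above, while `condition_id`, `message`, and `format_event` are hypothetical additions for the example.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class AlarmEventArgs:
    condition_id: str  # hypothetical core field
    message: str       # hypothetical core field
    # Optional fields default to None, so existing producers and
    # consumers that ignore them keep working unchanged.
    operator_comment: Optional[str] = None
    original_raise_timestamp_utc: Optional[datetime] = None
    alarm_category: Optional[str] = None


def format_event(e: AlarmEventArgs, verbose: bool = False) -> str:
    """Render one line; verbose mode appends the fields that are present."""
    line = f"{e.condition_id}: {e.message}"
    if not verbose:
        return line
    extras = []
    if e.operator_comment is not None:
        extras.append(f"comment={e.operator_comment!r}")
    if e.original_raise_timestamp_utc is not None:
        extras.append(f"raised={e.original_raise_timestamp_utc.isoformat()}")
    if e.alarm_category is not None:
        extras.append(f"category={e.alarm_category}")
    return line + (" [" + ", ".join(extras) + "]" if extras else "")
```

A consumer in non-verbose mode produces the same output whether or not the producer filled in the richer fields.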

Galaxy sub-attribute fallback

For Galaxy templates without $Alarm* extensions, the value-driven path stays in place: DriverNodeManager registers an AlarmConditionState per Galaxy variable that bears alarm-bearing sub-attributes (InAlarm, Acked, Priority, Description), subscribes to those sub-attributes, and synthesizes Part 9 transitions when the values change. This path operated as the only Galaxy alarm path between PR 7.2 and the alarms-over-gateway epic; it remains the fallback today.
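
Value-driven synthesis amounts to edge detection on the sub-attribute values. The sketch below is a simplified Python model under stated assumptions: it looks only at InAlarm and Acked, treats a snapshot as a plain dict, and uses the hypothetical function name `synthesize_transitions`. The real DriverNodeManager logic may handle more cases (Priority and Description changes, for example).

```python
from typing import Dict, List


def synthesize_transitions(prev: Dict[str, bool],
                           curr: Dict[str, bool]) -> List[str]:
    """Compare two sub-attribute snapshots and name the edges between them."""
    transitions = []
    if curr["InAlarm"] and not prev["InAlarm"]:
        transitions.append("Raise")        # InAlarm went false -> true
    if prev["InAlarm"] and not curr["InAlarm"]:
        transitions.append("Clear")        # InAlarm went true -> false
    if curr["Acked"] and not prev["Acked"]:
        transitions.append("Acknowledge")  # Acked went false -> true
    return transitions
```

Because only the values are observed, this path cannot recover who acknowledged the alarm or when it was originally raised, which is the metadata the driver-native path carries.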

When both paths report the same condition, AlarmConditionService.AlarmConditionState keeps the driver-native record and discards the duplicate sub-attribute synthesis. Driver-native transitions are richer (they carry the operator comment and original raise time) and arrive with lower latency (no publishing-interval delay on the sub-attribute reads), so they win the dedup.
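
The precedence rule can be sketched as below. This is a hypothetical Python model: the source does not say how duplicates are matched, so keying on (condition id, transition kind) is an assumption, as are the names `Transition` and `Deduper`.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class Transition:
    condition_id: str
    kind: str    # "Raise", "Acknowledge", "Clear", or "Retrigger"
    origin: str  # "driver-native" or "sub-attribute"
    operator_comment: Optional[str] = None


class Deduper:
    """Keeps one record per (condition, kind), preferring driver-native."""

    def __init__(self) -> None:
        self._kept: Dict[Tuple[str, str], Transition] = {}

    def offer(self, t: Transition) -> Transition:
        key = (t.condition_id, t.kind)  # assumed duplicate key
        existing = self._kept.get(key)
        upgrade = (existing is not None
                   and existing.origin == "sub-attribute"
                   and t.origin == "driver-native")
        if existing is None or upgrade:
            self._kept[key] = t
        return self._kept[key]
```

The rule is order-independent: whichever path reports first, the record that survives is the driver-native one.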

Acknowledge routing

DriverNodeManager picks the acknowledger when registering each condition (PR B.3 logic):

  • Driver implements IAlarmSource → DriverAlarmSourceAcknowledger routes the operator comment through IAlarmSource.AcknowledgeAsync via the existing AlarmSurfaceInvoker (Phase 6.1 resilience pipeline; no-retry per decision #143). End-to-end operator-comment fidelity is preserved.
  • Driver doesn't implement IAlarmSource → DriverWritableAcknowledger writes the comment into the AckMsgWriteRef sub-attribute via IWritable.WriteAsync. Same resilience pipeline; collapses the comment into a single string write at the wire level.
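
The capability-based choice above can be sketched as follows. This Python model is illustrative only: `AlarmSource` and `Writable` stand in for the IAlarmSource and IWritable interfaces, `pick_acknowledger` is a hypothetical name, and the resilience pipeline is left out.

```python
from typing import Callable


class AlarmSource:
    """Stand-in for IAlarmSource."""
    def acknowledge(self, condition_id: str, comment: str) -> None:
        raise NotImplementedError


class Writable:
    """Stand-in for IWritable."""
    def write(self, ref: str, value: str) -> None:
        raise NotImplementedError


def pick_acknowledger(driver: object,
                      ack_msg_write_ref: str) -> Callable[[str, str], None]:
    """Choose the ack route once, when the condition is registered."""
    if isinstance(driver, AlarmSource):
        # Native route: condition id and comment travel together.
        return lambda cid, comment: driver.acknowledge(cid, comment)
    if isinstance(driver, Writable):
        # Fallback: the comment collapses into one string write.
        return lambda cid, comment: driver.write(ack_msg_write_ref, comment)
    raise TypeError("driver supports neither acknowledge route")
```

Checking the alarm-source capability first means a driver that implements both interfaces always takes the route that preserves the operator comment end to end.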

The OPC UA Part 9 AlarmConditionState.OnAcknowledge delegate already validates the session's AlarmAck role before dispatching, so the gateway-side ack RPC only sees authenticated, authorised calls.

Historian write-back (non-Galaxy alarms)

Scripted alarms (and any future non-Galaxy IAlarmSource like AB CIP ALMD) route to AVEVA Historian via the Wonderware sidecar:

  • IAlarmHistorianSink is the DI-registered intake contract. The default binding is NullAlarmHistorianSink (registered in ServiceCollectionExtensions.AddOtOpcUaRuntime). Production deployments override it with SqliteStoreAndForwardSink wrapping WonderwareHistorianClient (the AVEVA Historian sidecar IPC client) — see ServiceHosting.md for the sidecar setup.
  • SqliteStoreAndForwardSink queues each transition to a local SQLite database and drains in the background via an IAlarmHistorianWriter. The durability guarantee is bounded: the queue capacity defaults to 1,000,000 rows; under a sustained historian outage, older non-dead-lettered rows are evicted (oldest first) to make room for new events. The HistorianSinkStatus.EvictedCount counter surfaces lifetime eviction events so operators can detect silent data loss without log scraping.
  • HistorianAdapterActor (src/Server/ZB.MOM.WW.OtOpcUa.Runtime/Historian/HistorianAdapterActor.cs) bridges Akka cluster messages from ScriptedAlarmActor into the sink's EnqueueAsync; fire-and-forget so the actor loop is never blocked on historian reachability.
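
The bounded store-and-forward behaviour can be modelled with a small SQLite-backed queue. This is a Python sketch under stated assumptions, not the SqliteStoreAndForwardSink implementation: the class name, table schema, and the choice to count dead-lettered rows toward capacity are all hypothetical, and the fire-and-forget hand-off from the actor is not modelled.

```python
import sqlite3
from typing import Callable


class StoreAndForwardQueue:
    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.evicted_count = 0  # lifetime eviction counter
        self.db = sqlite3.connect(":memory:")
        self.db.execute(
            "CREATE TABLE queue ("
            " id INTEGER PRIMARY KEY AUTOINCREMENT,"
            " payload TEXT NOT NULL,"
            " dead_lettered INTEGER NOT NULL DEFAULT 0)")

    def enqueue(self, payload: str) -> None:
        (count,) = self.db.execute("SELECT COUNT(*) FROM queue").fetchone()
        while count >= self.capacity:
            # Evict the oldest row that is not dead-lettered.
            row = self.db.execute(
                "SELECT id FROM queue WHERE dead_lettered = 0"
                " ORDER BY id LIMIT 1").fetchone()
            if row is None:
                break  # only dead-lettered rows remain
            self.db.execute("DELETE FROM queue WHERE id = ?", (row[0],))
            self.evicted_count += 1
            count -= 1
        self.db.execute("INSERT INTO queue (payload) VALUES (?)", (payload,))
        self.db.commit()

    def drain(self, write: Callable[[str], bool]) -> int:
        """Forward rows oldest first; stop at the first failed write."""
        rows = self.db.execute(
            "SELECT id, payload FROM queue WHERE dead_lettered = 0"
            " ORDER BY id").fetchall()
        sent = 0
        for row_id, payload in rows:
            if not write(payload):
                break  # historian unreachable; retry on the next drain
            self.db.execute("DELETE FROM queue WHERE id = ?", (row_id,))
            sent += 1
        self.db.commit()
        return sent
```

With a capacity of 3 and five enqueued events during an outage, the two oldest events are evicted and `evicted_count` reads 2, which is the signal an operator would watch for silent data loss.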

Galaxy-native alarms with $Alarm* extensions reach AVEVA Historian directly via System Platform's HistorizeToAveva toggle on the alarm primitive — no involvement from OtOpcUa. This sidecar path is exclusively for non-Galaxy alarm producers.

Cross-references