lmxopcua/docs/AlarmTracking.md
Joseph Doherty 2c938ea6f7 docs(audit): AlarmTracking.md — accuracy + orphan resolution
ORPHAN DECISION: Keep as live doc (path: keep-and-fix).
Rationale: the file carries unique v2 current content describing
the alarms-over-gateway epic architecture; docs/ScriptedAlarms.md
cross-references it explicitly. The orphan symptom is that
docs/README.md still indexes docs/v1/AlarmTracking.md — wiring
this top-level file into README.md is a follow-up task.

STRUCTURAL (dimension 2):
- docs/AlarmTracking.md line 138: Security.md → security.md (CASE-MISMATCH
  from links-report.md rows 1–2). Verified: docs/security.md exists
  (inode 77517627); docs/Security.md is the same file on APFS
  case-insensitive FS, but the checker requires exact on-disk casing.
  check_links.py: zero rows for docs/AlarmTracking.md after fix.

CODE-REALITY (dimension 4):
- line 16 table: `Phase7EngineComposer` / `Phase7EngineComposer.RouteToHistorianAsync`
  → no such class exists. Real class is `Phase7Composer`
  (src/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer/Phase7Composer.cs).
  Scripted-alarm historian routing goes through ScriptedAlarmActor →
  HistorianAdapterActor → IAlarmHistorianSink, not a RouteToHistorianAsync
  method. Fixed to: Phase7Composer / ScriptedAlarmActor transitions →
  HistorianAdapterActor → IAlarmHistorianSink.
- lines 107–123 "Historian write-back" section: referenced
  `Phase7Composer.ResolveHistorianSink` (method doesn't exist in
  current Phase7Composer.cs), `GalaxyProxyDriver` / `GalaxyHistorianWriter`
  (retired in PR 7.2 — no such class in codebase), and `aahClientManaged`
  as a direct call (now mediated through WonderwareHistorianClient).
  Current architecture: NullAlarmHistorianSink default registered in
  ServiceCollectionExtensions.AddOtOpcUaRuntime(); production override
  is SqliteStoreAndForwardSink wrapping WonderwareHistorianClient; bridge
  is HistorianAdapterActor (src/Server/ZB.MOM.WW.OtOpcUa.Runtime/Historian/
  HistorianAdapterActor.cs). Section rewritten to match code reality.
- line 108: "Program.cs" as NullAlarmHistorianSink registration site →
  actual site is ServiceCollectionExtensions.cs, not Program.cs.

STALE-STATUS (dimension 3): no blocked/pending/not-yet banners found
in the top-level file; it was already written as current-state fact.
Galaxy native alarms work end-to-end (verified 2026-05-31) and the
doc correctly describes that as delivered.

CODE-BUG-FLAGS: none. All stale references were doc-side errors; the
production code is correct.

UNVERIFIABLE CLAIMS: AlarmConditionService, DriverNodeManager, ConditionSink,
DriverAlarmSourceAcknowledger, DriverWritableAcknowledger — these are
mentioned by name in the doc but their .cs files were not found in the
search. They may live under a path not searched, or may be internal
implementation details within existing files. These claims are plausible
given the architecture and were not changed.
2026-06-03 15:40:37 -04:00


# Alarm tracking — v2 final architecture
This document describes how OtOpcUa surfaces alarms to OPC UA Part 9
clients after the **alarms-over-gateway** epic
([docs/plans/alarms-over-gateway.md](plans/alarms-over-gateway.md))
landed. The v1 architecture (Galaxy.Host's COM-side `GalaxyAlarmTracker`)
is preserved at [docs/v1/AlarmTracking.md](v1/AlarmTracking.md) for
historical reference.
## Three alarm sources, one OPC UA Part 9 surface
| Source | Driver capability | Path |
|----------------------------------|--------------------------|------|
| **Galaxy MxAccess (driver-native)** | `GalaxyDriver : IAlarmSource` | gateway → worker → MxAccess alarm sink → `MX_EVENT_FAMILY_ON_ALARM_TRANSITION` → `EventPump` → driver `OnAlarmEvent` → `AlarmConditionService` |
| **Galaxy sub-attribute fallback** | `IWritable` writes to `$Alarm*` sub-attributes | gateway data subscription → driver `OnDataChange` → `DriverNodeManager` ConditionSink → `AlarmConditionService` |
| **Scripted alarms** | `Phase7Composer` | server-side script evaluator → `ScriptedAlarmActor` transitions → `HistorianAdapterActor` → `IAlarmHistorianSink` |

All three converge on the alarm-state actor: in v2 the OPC UA Part 9 state
machine lives inside `ScriptedAlarmActor`
(`src/Server/ZB.MOM.WW.OtOpcUa.Runtime/ScriptedAlarms/ScriptedAlarmActor.cs`),
which dispatches transitions to the OPC UA condition node managers. Driver-native transitions take
precedence over sub-attribute synthesis when both arrive for the same
condition — the dedup logic prefers the richer driver-native record
because it carries the full operator + raise-time + category metadata
that the value-driven path collapses.
## Galaxy driver path (driver-native)
Restored in PR B.2 of the epic. `GalaxyDriver` implements
`IAlarmSource` with these surfaces:
- `SubscribeAlarmsAsync(sourceNodeIds)` → returns a sentinel handle.
The driver doesn't multiplex per source-node-id today; every
active handle observes the gateway's alarm-event stream. The
server-side `AlarmConditionService` filters by source-node before
raising the OPC UA condition.
- `UnsubscribeAlarmsAsync(handle)` → symmetric handle removal.
- `AcknowledgeAsync(requests)` → routes one gateway RPC per
acknowledgement through `IGalaxyAlarmAcknowledger`. Production
uses `GatewayGalaxyAlarmAcknowledger` calling
`MxGatewayClient.AcknowledgeAlarmAsync` (PR E.2 SDK method).
- `OnAlarmEvent` → bridges `EventPump.OnAlarmTransition` (PR B.1)
onto `AlarmEventArgs`. Suppressed when no alarm subscription is
active so untracked transitions don't leak through.
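
The subscription behaviour above can be sketched in a few lines. This is a language-neutral illustration in Python, not the C# implementation: `AlarmSourceSketch`, `AlarmEvent`, `on_gateway_transition`, and `filter_by_source` are hypothetical names, and the integer handle stands in for whatever sentinel the driver really returns. It shows three things the list describes: handles are not multiplexed per source node, transitions are suppressed while no handle is active, and source-node filtering happens downstream.

```python
import itertools
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional, Set


@dataclass(frozen=True)
class AlarmEvent:
    """Hypothetical, trimmed-down alarm payload."""
    source_node_id: str
    transition: str  # "Raise", "Acknowledge", "Clear" or "Retrigger"
    operator_comment: Optional[str] = None


class AlarmSourceSketch:
    """Hypothetical stand-in for the driver's alarm-source surface."""

    def __init__(self, on_alarm_event: Callable[[AlarmEvent], None]) -> None:
        self._on_alarm_event = on_alarm_event
        self._handles: Set[int] = set()
        self._next_handle = itertools.count(1)

    def subscribe_alarms(self, source_node_ids: List[str]) -> int:
        # The node ids are accepted but not used for multiplexing:
        # every active handle observes the same event stream.
        handle = next(self._next_handle)
        self._handles.add(handle)
        return handle

    def unsubscribe_alarms(self, handle: int) -> None:
        self._handles.discard(handle)

    def on_gateway_transition(self, event: AlarmEvent) -> None:
        # Suppress transitions while nobody is subscribed.
        if not self._handles:
            return
        self._on_alarm_event(event)


def filter_by_source(events: Iterable[AlarmEvent], wanted: Set[str]) -> List[AlarmEvent]:
    """Server-side filtering step, applied after the driver forwards events."""
    return [e for e in events if e.source_node_id in wanted]
```

Keeping the filter outside the driver mirrors the split described above: the driver forwards the whole stream and the server decides which conditions to raise.
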
The proto contract carries the rich payload — alarm full reference,
source-object reference, alarm-type-name, transition kind (Raise /
Acknowledge / Clear / Retrigger), severity (raw MxAccess scale),
original raise timestamp, transition timestamp, operator user,
operator comment, alarm category, description. `MxAccessSeverityMapper`
(PR B.1) translates the raw severity onto the four-bucket
`AlarmSeverity` ladder — boundaries match v1's `GalaxyAlarmTracker`
so customers see no surprise re-classification.
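
As an illustration of what a four-bucket ladder mapping looks like, here is a minimal Python sketch. Everything specific in it is an assumption: the bucket names, the threshold values, and the direction of the raw scale are placeholders chosen for the example, not the boundaries used by `MxAccessSeverityMapper` or v1's `GalaxyAlarmTracker`.

```python
from bisect import bisect_right
from enum import IntEnum
from typing import Sequence


class AlarmSeverity(IntEnum):
    # Hypothetical bucket names; the real enum members may differ.
    LOW = 0
    MEDIUM = 1
    HIGH = 2
    CRITICAL = 3


# PLACEHOLDER thresholds for illustration only. They are not the
# real boundaries, and the sketch assumes larger raw values are
# more severe, which may not hold for the MxAccess scale.
PLACEHOLDER_BOUNDARIES = (250, 500, 750)


def map_severity(raw: int, boundaries: Sequence[int] = PLACEHOLDER_BOUNDARIES) -> AlarmSeverity:
    """Map a raw severity onto one of four buckets.

    Three sorted boundaries split the scale into four half-open
    ranges; a raw value equal to a boundary lands in the upper bucket.
    """
    if len(boundaries) != 3:
        raise ValueError("a four-bucket ladder needs exactly three boundaries")
    return AlarmSeverity(bisect_right(boundaries, raw))
```

Passing the boundaries as a parameter keeps the mapping testable against whichever thresholds the real mapper uses.
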
The richer fields surface on `Core.Abstractions.AlarmEventArgs` via
the optional properties added in PR E.7 (`OperatorComment`,
`OriginalRaiseTimestampUtc`, `AlarmCategory`). Consumers that don't
need them are unaffected; consumers that do (Client.UI, Client.CLI
verbose mode) read the new fields when present.
## Galaxy sub-attribute fallback
For Galaxy templates without `$Alarm*` extensions, the value-driven
path stays in place: `DriverNodeManager` registers an
`AlarmConditionState` per Galaxy variable that carries alarm
sub-attributes (`InAlarm`, `Acked`, `Priority`, `Description`),
subscribes to those sub-attributes, and synthesizes Part 9 transitions
when the values change. This path operated as the only Galaxy alarm
path between PR 7.2 and the alarms-over-gateway epic; it remains the
fallback today.
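
The value-driven synthesis can be pictured as a comparison of the previous and current sub-attribute values. The Python sketch below is an assumption about the shape of that logic, not a transcription of `DriverNodeManager`: `SubAttributes` and `synthesize_transition` are hypothetical names, only `InAlarm` and `Acked` are modelled, and the precedence between simultaneous changes is a guess.

```python
from typing import NamedTuple, Optional


class SubAttributes(NamedTuple):
    """Hypothetical snapshot of the two state-bearing sub-attributes."""
    in_alarm: bool
    acked: bool


def synthesize_transition(prev: SubAttributes, cur: SubAttributes) -> Optional[str]:
    """Derive a transition kind from a value change.

    Returns None when nothing relevant changed. A Retrigger cannot be
    recognised from these two booleans alone, so this sketch never
    emits one.
    """
    if cur.in_alarm and not prev.in_alarm:
        return "Raise"
    if prev.in_alarm and not cur.in_alarm:
        return "Clear"
    if cur.acked and not prev.acked:
        return "Acknowledge"
    return None
```

The sketch also shows why this path is the poorer one: the synthesized transition has no operator, comment, or original raise time attached.
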
When both paths report the same condition,
`AlarmConditionService.AlarmConditionState` keeps the
driver-native record and discards the duplicate sub-attribute
synthesis. Driver-native transitions are richer (they carry the
operator comment and original raise time) and arrive with lower
latency (no publishing-interval delay on the sub-attribute reads),
so they win the dedup.
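
A minimal Python sketch of that precedence rule follows. It is illustrative only: `DedupSketch`, `Transition`, and the rule that two reports are duplicates when they carry the same transition kind for the same condition are assumptions made for the example. The real matching criteria are not spelled out here.

```python
from dataclasses import dataclass
from typing import Dict, Optional

DRIVER_NATIVE = "driver-native"
SUB_ATTRIBUTE = "sub-attribute"


@dataclass(frozen=True)
class Transition:
    condition_id: str
    kind: str    # "Raise", "Acknowledge", "Clear" or "Retrigger"
    origin: str  # DRIVER_NATIVE or SUB_ATTRIBUTE
    operator_comment: Optional[str] = None


class DedupSketch:
    """Keeps one record per condition, preferring driver-native reports."""

    def __init__(self) -> None:
        self._last: Dict[str, Transition] = {}

    def offer(self, t: Transition) -> bool:
        """Return True if the transition is kept, False if discarded."""
        held = self._last.get(t.condition_id)
        if held is None or held.kind != t.kind:
            # A genuinely new transition for this condition.
            self._last[t.condition_id] = t
            return True
        if held.origin == SUB_ATTRIBUTE and t.origin == DRIVER_NATIVE:
            # Same transition, richer record: replace the synthesized one.
            self._last[t.condition_id] = t
            return True
        # Duplicate that adds nothing (for example the synthesized echo).
        return False

    def current(self, condition_id: str) -> Optional[Transition]:
        return self._last.get(condition_id)
```

The replacement branch covers the less common ordering where the synthesized report arrives first; with the latency difference described above, the discard branch is the usual one.
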
## Acknowledge routing
`DriverNodeManager` picks the acknowledger when registering each
condition (PR B.3 logic):
- Driver implements `IAlarmSource` →
`DriverAlarmSourceAcknowledger` routes the operator comment
through `IAlarmSource.AcknowledgeAsync` via the existing
`AlarmSurfaceInvoker` (Phase 6.1 resilience pipeline; no-retry
per decision #143). End-to-end operator-comment fidelity is
preserved.
- Driver doesn't implement `IAlarmSource` →
`DriverWritableAcknowledger` writes the comment into the
`AckMsgWriteRef` sub-attribute via `IWritable.WriteAsync`. Same
resilience pipeline; collapses the comment into a single string
write at the wire level.
The OPC UA Part 9 `AlarmConditionState.OnAcknowledge` delegate
already validates the session's `AlarmAck` role before dispatching,
so the gateway-side ack RPC only sees authenticated, authorised
calls.
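
The selection and the role gate can be sketched as follows. This Python version is a rough analogue of the C# design, under stated assumptions: the `AlarmSource` and `Writable` protocols, `pick_acknowledger`, `on_acknowledge`, and the `"<condition>.AckMsgWriteRef"` reference format are all hypothetical, and the resilience pipeline is left out.

```python
from typing import Callable, Iterable, Protocol, runtime_checkable

Acknowledger = Callable[[str, str], None]


@runtime_checkable
class AlarmSource(Protocol):
    def acknowledge(self, condition_id: str, comment: str) -> None: ...


@runtime_checkable
class Writable(Protocol):
    def write(self, ref: str, value: str) -> None: ...


def pick_acknowledger(driver: object) -> Acknowledger:
    """Choose the ack route once, when the condition is registered."""
    if isinstance(driver, AlarmSource):
        # Native route: the comment travels as a structured argument.
        return lambda cid, comment: driver.acknowledge(cid, comment)
    if isinstance(driver, Writable):
        # Fallback route: the comment collapses into one string write.
        # The reference format below is an assumption for illustration.
        return lambda cid, comment: driver.write(f"{cid}.AckMsgWriteRef", comment)
    raise TypeError("driver supports neither acknowledge route")


def on_acknowledge(session_roles: Iterable[str], ack: Acknowledger,
                   condition_id: str, comment: str) -> None:
    """Check the role first so the driver only sees authorised calls."""
    if "AlarmAck" not in set(session_roles):
        raise PermissionError("session lacks the AlarmAck role")
    ack(condition_id, comment)
```

Resolving the acknowledger at registration time, rather than on every acknowledge call, keeps the per-call path to a role check plus one dispatch.
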
## Historian write-back (non-Galaxy alarms)
Scripted alarms (and any future non-Galaxy `IAlarmSource` like
AB CIP ALMD) route to AVEVA Historian via the Wonderware sidecar:
- `IAlarmHistorianSink` is the DI-registered intake contract. The
default binding is `NullAlarmHistorianSink` (registered in
`ServiceCollectionExtensions.AddOtOpcUaRuntime`). Production
deployments override it with `SqliteStoreAndForwardSink` wrapping
`WonderwareHistorianClient` (the AVEVA Historian sidecar IPC client)
— see [ServiceHosting.md](ServiceHosting.md) for the sidecar setup.
- `SqliteStoreAndForwardSink` queues each transition to a local
SQLite database and drains in the background via an
`IAlarmHistorianWriter`. **The durability guarantee is bounded**: the
queue capacity defaults to 1,000,000 rows; under a sustained
historian outage, older non-dead-lettered rows are evicted (oldest
first) to make room for new events. The `HistorianSinkStatus.EvictedCount`
counter surfaces lifetime eviction events so operators can detect
silent data loss without log scraping.
- `HistorianAdapterActor`
(`src/Server/ZB.MOM.WW.OtOpcUa.Runtime/Historian/HistorianAdapterActor.cs`)
bridges Akka cluster messages from `ScriptedAlarmActor` into the
sink's `EnqueueAsync`; fire-and-forget so the actor loop is never
blocked on historian reachability.
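
The bounded-durability behaviour is easier to reason about with a small model. The sketch below uses Python's standard `sqlite3` module to show a capped queue that evicts the oldest non-dead-lettered rows and counts evictions. It is not the `SqliteStoreAndForwardSink` schema: the table layout, the `BoundedQueueSketch` class, and the choice to count dead-lettered rows toward capacity are assumptions.

```python
import sqlite3
from typing import Callable, List


class BoundedQueueSketch:
    """Capped store-and-forward queue with oldest-first eviction."""

    def __init__(self, capacity: int, path: str = ":memory:") -> None:
        self.capacity = capacity
        self.evicted_count = 0  # lifetime evictions, for operators to watch
        self._db = sqlite3.connect(path)
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS queue ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, "
            "payload TEXT NOT NULL, "
            "dead_lettered INTEGER NOT NULL DEFAULT 0)"
        )

    def enqueue(self, payload: str) -> None:
        with self._db:  # one transaction: insert, then trim if over capacity
            self._db.execute("INSERT INTO queue (payload) VALUES (?)", (payload,))
            (count,) = self._db.execute("SELECT COUNT(*) FROM queue").fetchone()
            overflow = count - self.capacity
            if overflow > 0:
                cur = self._db.execute(
                    "DELETE FROM queue WHERE id IN ("
                    "SELECT id FROM queue WHERE dead_lettered = 0 "
                    "ORDER BY id LIMIT ?)",
                    (overflow,),
                )
                self.evicted_count += cur.rowcount

    def drain(self, write: Callable[[str], bool]) -> int:
        """Send rows oldest-first; stop at the first failed write."""
        rows = self._db.execute(
            "SELECT id, payload FROM queue WHERE dead_lettered = 0 ORDER BY id"
        ).fetchall()
        sent = 0
        for row_id, payload in rows:
            if not write(payload):
                break  # historian unreachable: keep the row for later
            with self._db:
                self._db.execute("DELETE FROM queue WHERE id = ?", (row_id,))
            sent += 1
        return sent

    def pending(self) -> List[str]:
        return [p for (p,) in self._db.execute("SELECT payload FROM queue ORDER BY id")]
```

With a capacity of 3 and five enqueued transitions, the two oldest rows are dropped and `evicted_count` reads 2, which is the signal an operator would watch for during a long historian outage.
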
Galaxy-native alarms with `$Alarm*` extensions reach AVEVA Historian
directly via System Platform's `HistorizeToAveva` toggle on the
alarm primitive — no involvement from OtOpcUa. This sidecar path is
exclusively for non-Galaxy alarm producers.
## Cross-references
- Plan: [docs/plans/alarms-over-gateway.md](plans/alarms-over-gateway.md)
- v1 archive: [docs/v1/AlarmTracking.md](v1/AlarmTracking.md)
- Galaxy driver: [docs/drivers/Galaxy.md](drivers/Galaxy.md)
- Phase 7 scripting + alarming: [docs/v2/implementation/phase-7-scripting-and-alarming.md](v2/implementation/phase-7-scripting-and-alarming.md)
- Security + ACL: [docs/security.md](security.md)