# Alarm tracking: v2 final architecture

This document describes how OtOpcUa surfaces alarms to OPC UA Part 9
clients after the **alarms-over-gateway** epic
([docs/plans/alarms-over-gateway.md](plans/alarms-over-gateway.md))
landed. The v1 architecture (Galaxy.Host's COM-side `GalaxyAlarmTracker`)
is preserved at [docs/v1/AlarmTracking.md](v1/AlarmTracking.md) for
historical reference.

## Three alarm sources, one OPC UA Part 9 surface

| Source | Driver capability | Path |
|--------|-------------------|------|
| **Galaxy MxAccess (driver-native)** | `GalaxyDriver : IAlarmSource` | gateway → worker → MxAccess alarm sink → `MX_EVENT_FAMILY_ON_ALARM_TRANSITION` → `EventPump` → driver `OnAlarmEvent` → `AlarmConditionService` |
| **Galaxy sub-attribute fallback** | `IWritable` writes to `$Alarm*` sub-attributes | gateway data subscription → driver `OnDataChange` → `DriverNodeManager` ConditionSink → `AlarmConditionService` |
| **Scripted alarms** | `Phase7Composer` | server-side script evaluator → `ScriptedAlarmActor` transitions → `HistorianAdapterActor` → `IAlarmHistorianSink` |

All three converge on the alarm-state actor. In v2 the OPC UA Part 9
state machine lives inside `ScriptedAlarmActor`
(`src/Server/ZB.MOM.WW.OtOpcUa.Runtime/ScriptedAlarms/ScriptedAlarmActor.cs`),
which dispatches transitions to the OPC UA condition node managers.
Driver-native transitions take precedence over sub-attribute synthesis
when both arrive for the same condition. The dedup logic prefers the
richer driver-native record because it carries the full operator,
raise-time, and category metadata that the value-driven path collapses.

## Galaxy driver path (driver-native)

Restored in PR B.2 of the epic. `GalaxyDriver` implements
`IAlarmSource` with these surfaces:

- `SubscribeAlarmsAsync(sourceNodeIds)` → returns a sentinel handle.
  The driver doesn't multiplex per source-node-id today; every
  active handle observes the gateway's alarm-event stream. The
  server-side `AlarmConditionService` filters by source node before
  raising the OPC UA condition.
- `UnsubscribeAlarmsAsync(handle)` → symmetric handle removal.
- `AcknowledgeAsync(requests)` → routes one gateway RPC per
  acknowledgement through `IGalaxyAlarmAcknowledger`. Production
  uses `GatewayGalaxyAlarmAcknowledger` calling
  `MxGatewayClient.AcknowledgeAlarmAsync` (PR E.2 SDK method).
- `OnAlarmEvent` → bridges `EventPump.OnAlarmTransition` (PR B.1)
  onto `AlarmEventArgs`. Suppressed when no alarm subscription is
  active, so untracked transitions don't leak through.
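
The subscription behaviour above can be modelled in a few lines. The following Python sketch uses hypothetical names throughout (`SketchAlarmSource`, `Transition`, `make_server_filter`); the production code is C#. It shows only what the list states:

- every handle sees the whole stream;
- transitions are suppressed while no handle is active;
- source-node filtering happens on the server side, not in the driver.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional, Set


@dataclass(frozen=True)
class Transition:
    # Hypothetical stand-in for AlarmEventArgs, trimmed to two fields.
    source_node_id: str
    kind: str  # "Raise", "Acknowledge", "Clear", or "Retrigger"


class SketchAlarmSource:
    """Sentinel-handle subscription model (all names are hypothetical)."""

    def __init__(self) -> None:
        self._handles: Set[int] = set()
        self._next_handle = 1
        self.on_alarm_event: Optional[Callable[[Transition], None]] = None

    def subscribe_alarms(self, source_node_ids: Iterable[str]) -> int:
        # The ids are accepted but ignored: there is no per-node
        # multiplexing, so every handle observes the whole stream.
        handle = self._next_handle
        self._next_handle += 1
        self._handles.add(handle)
        return handle

    def unsubscribe_alarms(self, handle: int) -> None:
        self._handles.discard(handle)

    def pump(self, transition: Transition) -> bool:
        """Forward one gateway transition; return False if it was suppressed."""
        if not self._handles or self.on_alarm_event is None:
            return False
        self.on_alarm_event(transition)
        return True


def make_server_filter(wanted: Set[str], raised: list) -> Callable[[Transition], None]:
    # Server-side filtering by source node (the role the text gives to
    # AlarmConditionService); only matching transitions are recorded.
    def handler(t: Transition) -> None:
        if t.source_node_id in wanted:
            raised.append(t)

    return handler
```

The driver does not multiplex. While any handle is active, a transition for an unwatched node still reaches the server callback, and only the server-side filter drops it.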

The proto contract carries the rich payload: alarm full reference,
source-object reference, alarm-type-name, transition kind (Raise /
Acknowledge / Clear / Retrigger), severity (raw MxAccess scale),
original raise timestamp, transition timestamp, operator user,
operator comment, alarm category, and description.
`MxAccessSeverityMapper` (PR B.1) translates the raw severity onto the
four-bucket `AlarmSeverity` ladder. Its bucket boundaries match v1's
`GalaxyAlarmTracker`, so customers see no unexpected re-classification.
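
A four-bucket mapping is essentially a threshold table. The following Python sketch illustrates that shape only. Several details are hypothetical and are not taken from `MxAccessSeverityMapper` or v1's `GalaxyAlarmTracker`:

- the enum member names;
- the boundary values;
- the assumption that a larger raw value is more severe.

```python
from bisect import bisect_right
from enum import Enum
from typing import Tuple


class AlarmSeverity(Enum):
    # Hypothetical bucket names; the real AlarmSeverity enum may differ.
    LOW = 0
    MEDIUM = 1
    HIGH = 2
    CRITICAL = 3


# Hypothetical boundaries, NOT the real v1 values. Each entry is the
# smallest raw value that enters the next bucket, assuming a larger raw
# value means a more severe alarm.
HYPOTHETICAL_BOUNDARIES: Tuple[int, int, int] = (250, 500, 750)


def map_severity(
    raw: int, boundaries: Tuple[int, int, int] = HYPOTHETICAL_BOUNDARIES
) -> AlarmSeverity:
    if len(boundaries) != 3 or list(boundaries) != sorted(boundaries):
        raise ValueError("expected three ascending boundaries for four buckets")
    # bisect_right counts how many boundaries are <= raw, which is the bucket index.
    return AlarmSeverity(bisect_right(boundaries, raw))
```

Keeping the boundaries in a single table is one way to make two trackers classify identically.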

The richer fields surface on `Core.Abstractions.AlarmEventArgs` via
the optional properties added in PR E.7 (`OperatorComment`,
`OriginalRaiseTimestampUtc`, `AlarmCategory`). Consumers that don't
need them are unaffected; consumers that do (Client.UI, Client.CLI
verbose mode) read the new fields when present.
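
The optional-property pattern keeps older consumers working: absent fields are simply `None` and are skipped. Below is a hypothetical Python mirror of `AlarmEventArgs`, with field names snake_cased from PR E.7, plus a verbose formatter. The formatter is loosely modelled on the Client.CLI verbose mode mentioned above, but its output format is invented.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class AlarmEvent:
    # Hypothetical Python mirror of AlarmEventArgs; only a few fields shown.
    source: str
    message: str
    # Optional fields; None means "not supplied by this producer".
    operator_comment: Optional[str] = None
    original_raise_timestamp_utc: Optional[datetime] = None
    alarm_category: Optional[str] = None


def format_event(e: AlarmEvent, verbose: bool = False) -> str:
    line = f"{e.source}: {e.message}"
    if not verbose:
        return line  # non-verbose consumers never look at the optional fields
    extras = []
    if e.operator_comment is not None:
        extras.append(f"comment={e.operator_comment!r}")
    if e.original_raise_timestamp_utc is not None:
        extras.append(f"raised={e.original_raise_timestamp_utc.isoformat()}")
    if e.alarm_category is not None:
        extras.append(f"category={e.alarm_category}")
    return line + (" [" + ", ".join(extras) + "]" if extras else "")
```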

## Galaxy sub-attribute fallback

For Galaxy templates without `$Alarm*` extensions, the value-driven
path stays in place: `DriverNodeManager` registers an
`AlarmConditionState` per Galaxy variable that carries alarm
sub-attributes (`InAlarm`, `Acked`, `Priority`, `Description`),
subscribes to those sub-attributes, and synthesizes Part 9 transitions
when their values change. This was the only Galaxy alarm path between
PR 7.2 and the alarms-over-gateway epic, and it remains the fallback
today.
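
The value-driven synthesis can be sketched as a comparison of the previous and current sub-attribute snapshots. The rules below are assumptions for illustration only:

- a rising `InAlarm` raises;
- a rising `Acked` acknowledges;
- a falling `InAlarm` clears.

The real `DriverNodeManager` logic may differ. `Priority`, `Description`, and Retrigger handling are not modelled.

```python
from typing import List, NamedTuple, Optional


class SubAttrs(NamedTuple):
    # Snapshot of the two sub-attributes these rules look at.
    in_alarm: bool
    acked: bool


def synthesize(prev: Optional[SubAttrs], curr: SubAttrs) -> List[str]:
    """Derive Part 9-style transition kinds from one sub-attribute update.

    Hypothetical rules: with no previous snapshot, the condition is treated
    as inactive and acknowledged.
    """
    was_active = prev.in_alarm if prev is not None else False
    was_acked = prev.acked if prev is not None else True
    kinds: List[str] = []
    if curr.in_alarm and not was_active:
        kinds.append("Raise")
    if curr.acked and not was_acked:
        kinds.append("Acknowledge")
    if was_active and not curr.in_alarm:
        kinds.append("Clear")
    return kinds
```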

When both paths report the same condition,
`AlarmConditionService.AlarmConditionState` keeps the
driver-native record and discards the duplicate sub-attribute
synthesis. Driver-native transitions are richer (they carry the
operator comment and original raise time) and arrive with lower
latency (no publishing-interval delay on the sub-attribute reads), so
they win the dedup.
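
Below is a minimal Python sketch of the dedup rule. It assumes duplicates are identified by condition id, transition kind, and a hypothetical time window; the real matching key in `AlarmConditionService` is not documented here. A driver-native record replaces a synthesized one for the same transition, and a synthesized record never replaces a driver-native one.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class Record:
    condition_id: str
    kind: str               # "Raise", "Acknowledge", "Clear", ...
    transition_time: float  # seconds on a hypothetical shared clock
    driver_native: bool


class ConditionDedup:
    """Keep one record per (condition, kind) within a window; driver-native wins."""

    def __init__(self, window_s: float = 5.0) -> None:
        self._window_s = window_s  # hypothetical duplicate window
        self._last: Dict[Tuple[str, str], Record] = {}

    def offer(self, rec: Record) -> Optional[Record]:
        """Return the record to apply to the condition, or None to discard it."""
        key = (rec.condition_id, rec.kind)
        prev = self._last.get(key)
        is_dup = (
            prev is not None
            and abs(rec.transition_time - prev.transition_time) <= self._window_s
        )
        if not is_dup:
            self._last[key] = rec
            return rec  # first report of this transition
        if rec.driver_native and not prev.driver_native:
            self._last[key] = rec
            return rec  # richer record replaces the synthesized one
        return None  # duplicate of an equal-or-richer record

    def current(self, condition_id: str, kind: str) -> Optional[Record]:
        return self._last.get((condition_id, kind))
```

The driver-native record usually arrives first. In that common case, `offer` returns `None` for the synthesized copy. The replacement branch covers the rarer case where the sub-attribute read lands first.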

## Acknowledge routing

`DriverNodeManager` picks the acknowledger when registering each
condition (PR B.3 logic):

- Driver implements `IAlarmSource` →
  `DriverAlarmSourceAcknowledger` routes the operator comment
  through `IAlarmSource.AcknowledgeAsync` via the existing
  `AlarmSurfaceInvoker` (Phase 6.1 resilience pipeline; no-retry
  per decision #143). End-to-end operator-comment fidelity is
  preserved.
- Driver doesn't implement `IAlarmSource` →
  `DriverWritableAcknowledger` writes the comment into the
  `AckMsgWriteRef` sub-attribute via `IWritable.WriteAsync`. It uses
  the same resilience pipeline, but the acknowledgement collapses into
  a single string write at the wire level.

The OPC UA Part 9 `AlarmConditionState.OnAcknowledge` delegate
already validates the session's `AlarmAck` role before dispatching,
so the gateway-side ack RPC only sees authenticated, authorized
calls.
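
The selection rule and the role gate can be sketched together in Python. `AlarmSource` and `Writable` below are hypothetical stand-ins for `IAlarmSource` and `IWritable`, checked structurally with `typing.Protocol`. The resilience pipeline (`AlarmSurfaceInvoker`) and the async signatures are deliberately left out.

```python
from typing import Callable, Protocol, Set, runtime_checkable

Acknowledger = Callable[[str, str], None]  # (condition_id, comment) -> None


@runtime_checkable
class AlarmSource(Protocol):
    # Hypothetical stand-in for IAlarmSource (only the ack surface).
    def acknowledge(self, condition_id: str, comment: str) -> None: ...


@runtime_checkable
class Writable(Protocol):
    # Hypothetical stand-in for IWritable.
    def write(self, reference: str, value: object) -> None: ...


def pick_acknowledger(driver: object, ack_msg_write_ref: str) -> Acknowledger:
    """Choose the ack route once, when the condition is registered."""
    if isinstance(driver, AlarmSource):
        # Native route: the operator comment travels unchanged.
        return lambda condition_id, comment: driver.acknowledge(condition_id, comment)
    if isinstance(driver, Writable):
        # Fallback route: one string write of the comment; the write
        # reference was bound per condition at registration time.
        return lambda condition_id, comment: driver.write(ack_msg_write_ref, comment)
    raise TypeError("driver supports neither alarm acknowledgement nor writes")


def on_acknowledge(
    session_roles: Set[str], ack: Acknowledger, condition_id: str, comment: str
) -> bool:
    # The AlarmAck role is checked before anything is dispatched.
    if "AlarmAck" not in session_roles:
        return False
    ack(condition_id, comment)
    return True
```

A driver offering both capabilities takes the `IAlarmSource` branch because that branch is checked first. This matches the preference stated in the list.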

## Historian write-back (non-Galaxy alarms)

Scripted alarms (and any future non-Galaxy `IAlarmSource`, such as
AB CIP ALMD) route to AVEVA Historian via the Wonderware sidecar:

- `IAlarmHistorianSink` is the DI-registered intake contract. The
  default binding is `NullAlarmHistorianSink` (registered in
  `ServiceCollectionExtensions.AddOtOpcUaRuntime`). Production
  deployments override it with `SqliteStoreAndForwardSink` wrapping
  `WonderwareHistorianClient` (the AVEVA Historian sidecar IPC
  client); see [ServiceHosting.md](ServiceHosting.md) for the sidecar
  setup.
- `SqliteStoreAndForwardSink` queues each transition to a local
  SQLite database and drains it in the background via an
  `IAlarmHistorianWriter`. **The durability guarantee is bounded**: the
  queue capacity defaults to 1,000,000 rows, and under a sustained
  historian outage the oldest non-dead-lettered rows are evicted first
  to make room for new events. The `HistorianSinkStatus.EvictedCount`
  counter reports the lifetime number of evictions, so operators can
  detect silent data loss without log scraping.
- `HistorianAdapterActor`
  (`src/Server/ZB.MOM.WW.OtOpcUa.Runtime/Historian/HistorianAdapterActor.cs`)
  bridges Akka cluster messages from `ScriptedAlarmActor` into the
  sink's `EnqueueAsync`. The hand-off is fire-and-forget, so the actor
  loop is never blocked on historian reachability.
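
The bounded store-and-forward behaviour can be sketched with Python's built-in `sqlite3` module. This is a minimal single-threaded sketch, not `SqliteStoreAndForwardSink`. The following are assumptions made for illustration:

- the table schema and the class and method names;
- the three-outcome writer callback: `"ok"` to delete, `"reject"` to dead-letter, anything else for a transient failure.

```python
import sqlite3
from typing import Callable, List


class BoundedStoreAndForward:
    """Capacity-bounded SQLite queue with oldest-first eviction (hypothetical)."""

    def __init__(self, capacity: int = 1_000_000, path: str = ":memory:") -> None:
        self.capacity = capacity
        self.evicted_count = 0  # lifetime evictions, in the spirit of EvictedCount
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS queue ("
            " id INTEGER PRIMARY KEY AUTOINCREMENT,"
            " payload TEXT NOT NULL,"
            " dead_lettered INTEGER NOT NULL DEFAULT 0)"
        )

    def enqueue(self, payload: str) -> None:
        with self.db:  # one transaction per enqueue
            count = self.db.execute("SELECT COUNT(*) FROM queue").fetchone()[0]
            overflow = count - self.capacity + 1
            if overflow > 0:
                # Evict the oldest rows that are not dead-lettered.
                cur = self.db.execute(
                    "DELETE FROM queue WHERE id IN ("
                    " SELECT id FROM queue WHERE dead_lettered = 0"
                    " ORDER BY id LIMIT ?)",
                    (overflow,),
                )
                self.evicted_count += cur.rowcount
            self.db.execute("INSERT INTO queue (payload) VALUES (?)", (payload,))

    def pending(self) -> List[str]:
        rows = self.db.execute(
            "SELECT payload FROM queue WHERE dead_lettered = 0 ORDER BY id"
        )
        return [payload for (payload,) in rows]

    def drain(self, write: Callable[[str], str]) -> int:
        """Send pending rows oldest first; return how many were delivered."""
        sent = 0
        rows = self.db.execute(
            "SELECT id, payload FROM queue WHERE dead_lettered = 0 ORDER BY id"
        ).fetchall()
        for row_id, payload in rows:
            outcome = write(payload)
            with self.db:
                if outcome == "ok":
                    self.db.execute("DELETE FROM queue WHERE id = ?", (row_id,))
                    sent += 1
                elif outcome == "reject":
                    self.db.execute(
                        "UPDATE queue SET dead_lettered = 1 WHERE id = ?", (row_id,)
                    )
                else:
                    break  # transient failure: keep the rest for the next drain
        return sent
```

Dead-lettered rows are never evicted in this sketch. If they alone fill the table, the queue can grow past `capacity`, and a real sink needs its own policy for that case.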

Galaxy-native alarms with `$Alarm*` extensions reach AVEVA Historian
directly via System Platform's `HistorizeToAveva` toggle on the
alarm primitive, with no involvement from OtOpcUa. The sidecar path
is exclusively for non-Galaxy alarm producers.

## Cross-references

- Plan: [docs/plans/alarms-over-gateway.md](plans/alarms-over-gateway.md)
- v1 archive: [docs/v1/AlarmTracking.md](v1/AlarmTracking.md)
- Galaxy driver: [docs/drivers/Galaxy.md](drivers/Galaxy.md)
- Phase 7 scripting + alarming: [docs/v2/implementation/phase-7-scripting-and-alarming.md](v2/implementation/phase-7-scripting-and-alarming.md)
- Security + ACL: [docs/security.md](security.md)