Read-only mirror of native alarm sources into a unified A&C-style state model (severity + active/acked/shelved/suppressed). Instance-bound source discovery, site-only SQLite state with live central query (no central tables), DebugView enrichment. OPC UA A&C events + ConditionRefresh and MxGateway session-less StreamAlarms via a new IAlarmSubscribableConnection seam routed connection-level by source reference; new NativeAlarmActor peer to computed AlarmActor.
Native OPC UA & MxAccess Gateway Alarms — Design
Date: 2026-05-29 Status: Approved
Problem
Today alarms are computed at the site: a TemplateAlarm defines a trigger (ValueMatch, RangeViolation, RateOfChange, HiLo, Expression); one AlarmActor per alarm evaluates attribute values and emits AlarmStateChanged carrying a bare AlarmState { Active, Normal } plus an integer Priority and (for HiLo) a Level. State is in-memory only — there is no severity dimension, no acknowledgement, no shelve/suppress state, and no operator metadata — and it surfaces only in the per-instance DebugView.
Two data sources we connect to own their own alarm lifecycle and expose far richer state:
- OPC UA Alarms & Conditions (Part 9): the server raises/acks/clears AlarmCondition nodes with orthogonal sub-states (Active/Inactive, Acked/Unacked, Confirmed/Unconfirmed, Shelved, Suppressed) and a 1–1000 severity. The DCL OPC UA adapter currently subscribes only to the Value attribute.
- MxAccess Gateway: already exposes a session-less StreamAlarms feed (OnAlarmTransitionEvent: raise/ack/clear/retrigger, severity, operator user + comment, category, description, current/limit value) plus QueryActiveAlarms. The DCL MxGateway adapter currently consumes only the OnDataChange event family.
These are mirrored alarms — the source is the source of truth — which is a real divergence from the computed model. This design enriches the alarm tracking model to carry severity + ack/shelve/suppress state, and ingests native alarms from both sources.
Design Decisions
| Decision | Choice |
|---|---|
| State model scope | Unified A&C-style state model for all alarms (computed + native) |
| Interactivity | Read-only mirror — display source-reported state; no acking/shelving from ScadaBridge, no command relay, no operator identity captured by ScadaBridge (source-supplied operator user/comment are displayed for native alarms) |
| Binding | Instance declares a NativeAlarmSource (connection + source ref); conditions under it are discovered at runtime, keyed by source reference |
| State location | Site-only, persisted to SQLite (survives restart/failover); central queries live (snapshot + live stream); no central tables, no central history |
| MxGateway transport | Gateway session-less StreamAlarms feed |
| OPC UA transport | Alarms & Conditions events + ConditionRefresh snapshot |
| Site actor structure | New NativeAlarmActor child of InstanceActor, peer to computed AlarmActors (Approach 1) |
| Authoring | Central UI design-time panels (Template editor + Instance Configure) and CLI |
| Runtime UI | Enrich the per-instance DebugView alarm table only (no new operator page) |
Trade-offs accepted
- No central audit trail of alarms (who acked, history). Acceptable because the source systems own ack and retain their own alarm history; ScadaBridge is a read-only window. If audit of alarm state is later wanted, a central mirror following the Site Call Audit (#22) pattern can be added without disturbing this design.
- Read-only means MxGateway/OPC UA acknowledgements happen in the source's own tools; ScadaBridge reflects them.
Section 1 — Unified state model & wire contracts
New Commons types (Types/Enums/, Types/Alarms/):
enum AlarmKind { Computed, NativeOpcUa, NativeMxAccess }
enum AlarmShelveState { Unshelved, OneShotShelved, TimedShelved, PermanentShelved }
record AlarmConditionState(
bool Active, // Active vs Inactive
bool Acknowledged, // Acked / Unacked
bool? Confirmed, // null = not a confirmable condition (OPC UA optional)
AlarmShelveState Shelve,
bool Suppressed,
int Severity) // 0–1000, unified scale
The OPC UA Part 9 sub-conditions are orthogonal, and MxAccess's ACTIVE / ACTIVE_ACKED / INACTIVE states map cleanly onto them, so they are modeled as independent flags (the UI rolls them up for display).
AlarmStateChanged is extended additively (existing fields kept for back-compat; new fields defaulted):
| New field | Default | Notes |
|---|---|---|
| Kind | Computed | discriminator |
| Condition | computed from existing | the AlarmConditionState above |
| SourceReference | "" | native key, e.g. "Tank01.Level.HiHi" |
| AlarmTypeName | "" | native, e.g. "AnalogLimitAlarm.HiHi" |
| Category | "" | native taxonomy |
| OperatorUser | "" | native ack metadata (display-only) |
| OperatorComment | "" | native ack metadata (display-only) |
| OriginalRaiseTime | null | native |
| CurrentValue | "" | native, display |
| LimitValue | "" | native, display |
Identity / key: computed alarms key by AlarmName (canonical); native alarms key by SourceReference (stable across transitions). Kind discriminates. The existing AlarmName field carries the source reference's display form for native rows so existing consumers don't break.
Source → AlarmConditionState mapping:
- Computed: Active = state==Active, Acknowledged = true (auto), Confirmed = null, Shelve = Unshelved, Suppressed = false, Severity = Priority, Level retained for HiLo.
- OPC UA A&C: read ActiveState, AckedState, ConfirmedState, ShelvingState, SuppressedState, Severity from the condition's event fields.
- MxAccess: ACTIVE → (Active=t, Ack=f), ACTIVE_ACKED → (Active=t, Ack=t), INACTIVE → (Active=f); Severity from the gateway's remapped 0–1000; shelve/suppress default (gateway proto doesn't surface them).
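The computed and MxAccess rows of this mapping can be sketched as below. This is a minimal illustration, not the actual Commons code: the MxAlarmState enum and the AlarmConditionMapper class are hypothetical names, and the Acknowledged value for INACTIVE (which the mapping above leaves open) is assumed to carry forward the last known flag.

```csharp
using System;

public enum AlarmShelveState { Unshelved, OneShotShelved, TimedShelved, PermanentShelved }

public record AlarmConditionState(
    bool Active, bool Acknowledged, bool? Confirmed,
    AlarmShelveState Shelve, bool Suppressed, int Severity);

// Hypothetical stand-in for the gateway's ACTIVE / ACTIVE_ACKED / INACTIVE values.
public enum MxAlarmState { Active, ActiveAcked, Inactive }

public static class AlarmConditionMapper
{
    // Computed alarms: auto-acknowledged, not confirmable, never shelved or
    // suppressed; Severity is taken directly from the existing Priority.
    public static AlarmConditionState FromComputed(bool isActive, int priority) =>
        new AlarmConditionState(
            Active: isActive, Acknowledged: true, Confirmed: null,
            Shelve: AlarmShelveState.Unshelved, Suppressed: false, Severity: priority);

    // MxAccess: shelve/suppress stay at defaults because the gateway proto
    // does not surface them. Severity is clamped to the unified 0-1000 scale.
    public static AlarmConditionState FromMxAccess(
        MxAlarmState state, int gatewaySeverity, bool lastKnownAcknowledged = true)
    {
        bool active = state != MxAlarmState.Inactive;
        bool acknowledged = state switch
        {
            MxAlarmState.Active => false,
            MxAlarmState.ActiveAcked => true,
            // Assumption: INACTIVE says nothing about ack, so keep the last known flag.
            _ => lastKnownAcknowledged,
        };
        return new AlarmConditionState(
            Active: active, Acknowledged: acknowledged, Confirmed: null,
            Shelve: AlarmShelveState.Unshelved, Suppressed: false,
            Severity: Math.Clamp(gatewaySeverity, 0, 1000));
    }
}
```

Keeping both mappings as pure functions makes the "AlarmConditionState mapping" unit tests a matter of comparing records by value.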
gRPC AlarmStateUpdate (sitestream.proto) gets the same fields appended as new field numbers (additive — never renumber/remove): kind, active, acknowledged, confirmed, shelve_state, suppressed, source_reference, alarm_type_name, category, operator_user, operator_comment, original_raise_time, current_value, limit_value. Existing state, priority, level, message stay for compatibility.
Section 2 — Configuration, binding & deployment
This mirrors how template attributes bind to a data source today (template declares, instance overrides the concrete reference).
New Commons entities
- TemplateNativeAlarmSource (Entities/Templates/): Id, TemplateId, Name (unique within template), Description?, ConnectionName, SourceReference (OPC UA SourceNode/notifier nodeId, or MxAccess object/area), ConditionFilter? (null = mirror all conditions under the source), IsLocked, IsInherited (same lock/inherit bookkeeping as TemplateAlarm).
- InstanceNativeAlarmSourceOverride (Entities/Instances/): Id, InstanceId, SourceCanonicalName, ConnectionNameOverride?, SourceReferenceOverride?, ConditionFilterOverride?; unique (InstanceId, SourceCanonicalName). SourceReference is the field that varies per physical instance, so per-instance override is the common case.
Flattening (FlatteningService, FlattenedConfiguration)
- New ResolvedNativeAlarmSource { CanonicalName, ConnectionName, SourceReference, ConditionFilter?, Source }, resolved through the same steps as ResolvedAlarm: inherited → composed (path-qualified [Module].[Name]) → instance overrides applied.
- Pre-deployment semantic validation (extends existing checks): ConnectionName resolves to a real site DataConnection; that connection's protocol is alarm-capable (OpcUa or MxGateway); SourceReference non-empty; canonical-name collision check.
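The last resolution step (instance overrides applied over the template declaration) can be sketched as follows. This is an illustration under assumptions, not the FlatteningService code: the records are trimmed to the fields needed here (Id, lock/inherit flags and the Source field are omitted), NativeAlarmSourceFlattener is a hypothetical name, and a null override is assumed to mean "keep the template value".

```csharp
#nullable enable
using System.Collections.Generic;
using System.Linq;

public record TemplateNativeAlarmSource(
    string Name, string ConnectionName, string SourceReference, string? ConditionFilter);

public record InstanceNativeAlarmSourceOverride(
    string SourceCanonicalName, string? ConnectionNameOverride,
    string? SourceReferenceOverride, string? ConditionFilterOverride);

public record ResolvedNativeAlarmSource(
    string CanonicalName, string ConnectionName, string SourceReference, string? ConditionFilter);

public static class NativeAlarmSourceFlattener
{
    // modulePath is null/empty for sources declared directly on the template,
    // otherwise the composed module path used to build [Module].[Name].
    public static ResolvedNativeAlarmSource Resolve(
        string? modulePath,
        TemplateNativeAlarmSource template,
        IEnumerable<InstanceNativeAlarmSourceOverride> overrides)
    {
        string canonical = string.IsNullOrEmpty(modulePath)
            ? template.Name
            : $"{modulePath}.{template.Name}";

        // At most one override exists per (InstanceId, SourceCanonicalName).
        var o = overrides.FirstOrDefault(x => x.SourceCanonicalName == canonical);

        // Assumption: a null override field falls back to the template value.
        return new ResolvedNativeAlarmSource(
            canonical,
            o?.ConnectionNameOverride ?? template.ConnectionName,
            o?.SourceReferenceOverride ?? template.SourceReference,
            o?.ConditionFilterOverride ?? template.ConditionFilter);
    }
}
```

One consequence of the null-means-inherit assumption is that an instance cannot widen a template's ConditionFilter back to "mirror all"; if that case matters, the override would need an explicit flag.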
ConfigurationDatabase (EF + migration)
- TemplateNativeAlarmSourceConfiguration → table TemplateNativeAlarmSources, unique (TemplateId, Name), FK cascade.
- InstanceNativeAlarmSourceOverrideConfiguration → table InstanceNativeAlarmSourceOverrides, unique (InstanceId, SourceCanonicalName), FK cascade.
- One migration adds both tables (auto-apply in dev per existing convention).
Deployment — FlattenedConfiguration carries ResolvedNativeAlarmSource[], deployed alongside ResolvedAlarm[] on the existing artifact path. Site Runtime consumes them when building the instance actor hierarchy. All-or-nothing per-instance apply unchanged.
Authoring (Central UI + CLI)
- Template editor: a "Native Alarm Sources" subsection (name, connection dropdown filtered to alarm-capable protocols, source reference, optional filter).
- Instance Configure: override connection/source-ref/filter per instance, like attribute data-source overrides.
- CLI: template native-alarm-source add/list/remove, instance native-alarm-source set/clear.
Section 3 — DCL ingestion & the two adapters
Capability seam (mirrors the existing IBrowsableDataConnection pattern):
interface IAlarmSubscribableConnection {
Task<string> SubscribeAlarmsAsync(string sourceReference, string? conditionFilter, AlarmTransitionCallback cb);
Task UnsubscribeAlarmsAsync(string subscriptionId);
}
delegate void AlarmTransitionCallback(NativeAlarmTransition t);
Protocol-neutral transition (Commons/Types/Alarms/):
enum AlarmTransitionKind { Snapshot, SnapshotComplete, Raise, Acknowledge, Clear, Retrigger, StateChange }
record NativeAlarmTransition(
string SourceReference, string SourceObjectReference, string AlarmTypeName,
AlarmTransitionKind Kind, AlarmConditionState Condition,
string Category, string Description, string Message,
string OperatorUser, string OperatorComment,
DateTimeOffset? OriginalRaiseTime, DateTimeOffset TransitionTime,
string CurrentValue, string LimitValue)
Snapshot/SnapshotComplete carry the initial active-condition replay so the consumer re-seeds a source's state on every (re)subscribe — this is how reconnect reconciliation works without central storage.
Connection-level transport + source-ref routing. Although binding is declared per-instance, the subscription is naturally connection-level (OPC UA wants one event subscription; MxGateway StreamAlarms is one session-less feed). DataConnectionActor opens one alarm feed per connection and maintains _alarmSubscribers: SourceObjectRef → set<instance actorRef>, routing each transition to matching instances.
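The routing table described above can be sketched as a small generic class. This is an assumption-laden illustration rather than the DataConnectionActor code: AlarmSubscriberRouter is a hypothetical name, the subscriber type is left generic in place of an actor reference, the transition record is trimmed to two fields, and matching is assumed to be an exact, case-sensitive comparison on SourceObjectReference.

```csharp
using System;
using System.Collections.Generic;

// Trimmed to the fields the router needs.
public record NativeAlarmTransition(string SourceReference, string SourceObjectReference);

public sealed class AlarmSubscriberRouter<TSubscriber> where TSubscriber : notnull
{
    // SourceObjectRef -> set of subscribed instances (one feed per connection).
    private readonly Dictionary<string, HashSet<TSubscriber>> _alarmSubscribers =
        new Dictionary<string, HashSet<TSubscriber>>(StringComparer.Ordinal);

    public void Subscribe(string sourceObjectRef, TSubscriber subscriber)
    {
        if (!_alarmSubscribers.TryGetValue(sourceObjectRef, out var set))
        {
            set = new HashSet<TSubscriber>();
            _alarmSubscribers[sourceObjectRef] = set;
        }
        set.Add(subscriber);
    }

    // Returns true when no subscriber is left for the source, so the caller
    // can narrow or close the underlying feed.
    public bool Unsubscribe(string sourceObjectRef, TSubscriber subscriber)
    {
        if (!_alarmSubscribers.TryGetValue(sourceObjectRef, out var set)) return false;
        set.Remove(subscriber);
        if (set.Count > 0) return false;
        _alarmSubscribers.Remove(sourceObjectRef);
        return true;
    }

    // Instances that should receive this transition; empty when nothing is bound.
    public IReadOnlyCollection<TSubscriber> Route(NativeAlarmTransition transition)
    {
        if (_alarmSubscribers.TryGetValue(transition.SourceObjectReference, out var set))
            return set;
        return Array.Empty<TSubscriber>();
    }
}
```

Because the class is only ever touched from inside one actor's message loop, it needs no locking of its own.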
New messages (Messages/DataConnection/): SubscribeAlarmsRequest/Response, UnsubscribeAlarmsRequest, internal NativeAlarmTransitionReceived(conn, transition, generation), forwarded as NativeAlarmTransitionUpdate(conn, transition). Subscribe/unsubscribe obey the existing Become/Stash lifecycle (stashed while Connecting/Reconnecting, replayed on Connected). The stale-callback generation guard and once-only disconnect guard apply unchanged. On disconnect the actor emits a per-source NativeAlarmSourceUnavailable so consumers mark mirrored alarms uncertain rather than clearing them.
OPC UA A&C adapter (OpcUaDataConnection / RealOpcUaClient)
- One event MonitoredItem (AttributeId = EventNotifier) on the Server object (i=2253) or configured notifier, with an EventFilter: SelectClauses for EventId, EventType, SourceNode, SourceName, Time, Message, Severity + ConditionType/AcknowledgeableConditionType/AlarmConditionType state fields (Acked/Confirmed/Active/Shelving/Suppressed). Optional WhereClause scoping to the union of bound SourceNodes.
- Map event fields → NativeAlarmTransition; derive Kind from which sub-state changed.
- Call ConditionRefresh on (re)subscribe → emit the Snapshot/SnapshotComplete sequence.
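Deriving Kind from which sub-state changed might look like the sketch below. The precedence rules are assumptions made for illustration (the design does not spell them out), SubStates and TransitionKindDeriver are hypothetical names, and Snapshot/SnapshotComplete are deliberately not produced here because they come from the ConditionRefresh replay rather than from a state comparison.

```csharp
#nullable enable

public enum AlarmTransitionKind
{
    Snapshot, SnapshotComplete, Raise, Acknowledge, Clear, Retrigger, StateChange
}

// The two sub-states this sketch compares; other flags fall through to StateChange.
public record SubStates(bool Active, bool Acknowledged);

public static class TransitionKindDeriver
{
    // previous is null the first time a condition is seen outside a snapshot.
    public static AlarmTransitionKind Derive(SubStates? previous, SubStates current)
    {
        if (previous is null)
            return current.Active ? AlarmTransitionKind.Raise : AlarmTransitionKind.StateChange;

        // Assumed precedence: an Active change wins over an ack change.
        if (!previous.Active && current.Active) return AlarmTransitionKind.Raise;
        if (previous.Active && !current.Active) return AlarmTransitionKind.Clear;
        if (!previous.Acknowledged && current.Acknowledged) return AlarmTransitionKind.Acknowledge;

        // Assumption: still active but ack was reset means the alarm re-triggered.
        if (current.Active && previous.Acknowledged && !current.Acknowledged)
            return AlarmTransitionKind.Retrigger;

        // Shelving, suppression, severity or confirm changes.
        return AlarmTransitionKind.StateChange;
    }
}
```

The adapter would need to remember the last SubStates per condition to call this, which is the same per-condition bookkeeping the snapshot replay re-seeds.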
MxGateway adapter (MxGatewayDataConnection / RealMxGatewayClient)
- Open session-less StreamAlarms (optional alarm_filter_prefix from bound source refs). Map AlarmFeedMessage: active_alarm → Snapshot, snapshot_complete → SnapshotComplete, transition (OnAlarmTransitionEvent) → mapped transition (RAISE/ACK/CLEAR/RETRIGGER, severity, operator user+comment, category, description, raise/transition times, current/limit value).
- Resumable stream; on transport fault re-open (existing RaiseDisconnected once-only guard) → fresh snapshot re-seeds.
- Uses ZB.MOM.WW.MxGateway.Client's StreamAlarmsAsync (already exercised by OtOpcUa's GatewayGalaxyAlarmFeed); bump the NuGet package if the referenced version predates it.
Section 4 — Site runtime, central query, UI, errors & testing
NativeAlarmActor (new)
- Child of InstanceActor, one per ResolvedNativeAlarmSource (named native-alarm-{canonicalName}). On PreStart sends SubscribeAlarmsRequest for its (ConnectionName, SourceReference, ConditionFilter). Holds _alarms: Dictionary<sourceRef, MirroredAlarm> (discovered conditions + AlarmConditionState + metadata).
- On NativeAlarmTransitionUpdate: Snapshot…SnapshotComplete → buffer then atomic swap the source's set (drop conditions absent from the snapshot, emit diffs, no flicker); Raise/Ack/Clear/Retrigger/StateChange → update entry, last-write-wins by TransitionTime (ignore older). Each change emits an enriched AlarmStateChanged to InstanceActor → existing stream path.
- Retention: keep an entry while Active OR Unacked; once fully normal (Inactive AND Acked) emit a final return-to-normal and drop it.
- On NativeAlarmSourceUnavailable: mark its alarms uncertain (snapshot flag) rather than clearing; re-seed from the reconnect snapshot.
- Persistence: site-SQLite table NativeAlarmState (InstanceUniqueName, SourceCanonicalName, SourceReference, serialized condition+metadata, LastTransitionTime). Rehydrate on PreStart (so central can query immediately after restart), then reconcile against the fresh snapshot. Reset on redeployment, like static attribute writes.
- Supervision: coordinator-style child → Resume. A bad source ref / subscribe failure logs to the site event log (alarm), reports unhealthy, and is retried periodically (same spirit as tag-resolution retry) without crashing the instance.
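The snapshot swap, last-write-wins and retention rules above can be sketched without any actor framework. This is a simplified illustration under assumptions: NativeAlarmMirror is a hypothetical name, MirroredAlarm is trimmed to a few fields, and the returned list of changed source references stands in for emitting enriched AlarmStateChanged messages. The uncertain flag and SQLite persistence are left out.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

public enum AlarmTransitionKind
{
    Snapshot, SnapshotComplete, Raise, Acknowledge, Clear, Retrigger, StateChange
}

public record MirroredAlarm(
    string SourceReference, bool Active, bool Acknowledged,
    int Severity, DateTimeOffset TransitionTime);

public sealed class NativeAlarmMirror
{
    private readonly Dictionary<string, MirroredAlarm> _alarms = new Dictionary<string, MirroredAlarm>();
    private Dictionary<string, MirroredAlarm>? _snapshotBuffer;

    public IReadOnlyDictionary<string, MirroredAlarm> Alarms => _alarms;

    // Returns the source references whose mirrored state changed or was dropped.
    public IReadOnlyList<string> Apply(AlarmTransitionKind kind, MirroredAlarm? alarm)
    {
        var changed = new List<string>();

        if (kind == AlarmTransitionKind.SnapshotComplete)
        {
            // Swap in one step: drop what the snapshot omits, then upsert the rest.
            var fresh = _snapshotBuffer ?? new Dictionary<string, MirroredAlarm>();
            _snapshotBuffer = null;
            foreach (var key in new List<string>(_alarms.Keys))
                if (!fresh.ContainsKey(key)) { _alarms.Remove(key); changed.Add(key); }
            foreach (var pair in fresh)
                if (!_alarms.TryGetValue(pair.Key, out var old) || old != pair.Value)
                { _alarms[pair.Key] = pair.Value; changed.Add(pair.Key); }
            return changed;
        }

        if (alarm is null) throw new ArgumentNullException(nameof(alarm));

        if (kind == AlarmTransitionKind.Snapshot)
        {
            // Buffer only; the visible state is untouched until SnapshotComplete.
            _snapshotBuffer ??= new Dictionary<string, MirroredAlarm>();
            _snapshotBuffer[alarm.SourceReference] = alarm;
            return changed;
        }

        // Last-write-wins by TransitionTime: ignore transitions older than the entry.
        if (_alarms.TryGetValue(alarm.SourceReference, out var existing)
            && alarm.TransitionTime < existing.TransitionTime)
            return changed;

        // Retention: keep while Active OR Unacked; drop once Inactive AND Acked.
        if (!alarm.Active && alarm.Acknowledged) _alarms.Remove(alarm.SourceReference);
        else _alarms[alarm.SourceReference] = alarm;
        changed.Add(alarm.SourceReference);
        return changed;
    }
}
```

Keeping this logic in a plain class separate from the actor makes the snapshot-swap and transition-handling unit tests listed under Testing straightforward to write.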
Computed AlarmActor: no logic change — populate AlarmConditionState on emit (Active, Acknowledged=true, Severity=Priority, Level retained, Kind=Computed).
InstanceActor: builds NativeAlarmActors from ResolvedNativeAlarmSource[]; native AlarmStateChanged flows through the existing _alarmStates/_alarmTimestamps + _streamManager.PublishAlarmStateChanged path (state dictionaries extended to carry the enriched shape); the instance snapshot includes native alarms.
Streaming + central query (no central tables)
- Live: enriched AlarmStateChanged → SiteStreamManager → enriched gRPC AlarmStateUpdate → DebugView, as today.
- Initial snapshot: the existing ClusterClient instance-snapshot request (DebugView's seed) is extended to include native alarms in the unified shape. Large snapshots reuse existing per-subscriber buffering / frame-size guard (the browse-cap precedent); chunk if needed.
Central UI — DebugView enrichment (+ Section 2 authoring panels)
- Alarm table gains: Severity, a composite condition badge (Active/Acked/Shelved/Suppressed), a Kind badge (computed vs native), Source reference, Alarm type, Category, Operator/comment (tooltip), Original raise time, Current/Limit value (tooltip). Computed rows show severity=priority, auto-acked. Built with the frontend-design skill, Bootstrap-only custom components.
Error handling / edge cases
- Connection loss → uncertain, not cleared; reconnect snapshot reconciles. Source ref absent from snapshot → cleared. Severity normalized to 0–1000. Bounded growth: configurable per-source mirrored-alarm cap in SiteRuntimeOptions; when hit, log it (no silent truncation). Disabled/deleted instance → unsubscribe.
- DataConnectionActor health report extended with alarm-feed status (active feeds, last-event time, uncertain sources) via ISiteHealthCollector.
Testing
- Unit: AlarmConditionState mapping (computed / OPC UA fields / MxAccess states); NativeAlarmActor snapshot-swap, transition handling, persistence rehydrate, uncertain-on-disconnect; FlatteningService native-source inherit/compose/override; semantic validation.
- Adapter: OPC UA event→transition + ConditionRefresh snapshot (fake client); MxGateway AlarmFeedMessage→transition + reconnect re-seed (fake client, existing fake patterns).
- Integration: end-to-end against the infra OPC UA server. Confirm the test OPC UA server exposes A&C; if not, add an alarm-capable test source or simulate. MxGateway path mocked in CI unless a gateway-with-alarms is available.
- Seed: add a NativeAlarmSource binding to the docker-env2 site-x MxGateway connection for manual verification.
Affected components & documents
| Area | Changes |
|---|---|
| Commons | New enums/records (AlarmKind, AlarmShelveState, AlarmConditionState, NativeAlarmTransition); extend AlarmStateChanged; new entities TemplateNativeAlarmSource, InstanceNativeAlarmSourceOverride; new DCL messages; IAlarmSubscribableConnection |
| Template Engine (#1) | ResolvedNativeAlarmSource, flattening resolution, semantic validation |
| Site Runtime (#3) | NativeAlarmActor, enriched AlarmActor, InstanceActor wiring, NativeAlarmState SQLite persistence, SiteRuntimeOptions cap |
| Data Connection Layer (#4) | DataConnectionActor alarm feed + routing; OPC UA A&C adapter; MxGateway StreamAlarms adapter |
| Communication (#5) | sitestream.proto AlarmStateUpdate enrichment; instance-snapshot enrichment |
| Configuration Database (#17) | EF configurations + migration for two new tables |
| Central UI (#9) | DebugView alarm table enrichment; Template editor + Instance Configure authoring panels |
| CLI (#19) | native-alarm-source commands |
| Health Monitoring (#11) | Alarm-feed status in DataConnectionHealthReport |
| Docs | Component-DataConnectionLayer.md, Component-SiteRuntime.md, Component-TemplateEngine.md, Component-CentralUI.md, Component-CLI.md, Component-Communication.md, Component-ConfigurationDatabase.md; CLAUDE.md design-decisions; README if needed |
Out of scope (this pass)
- Acknowledging / shelving / suppressing from ScadaBridge (read-only mirror).
- Central alarm tables, alarm history/journal, central audit of alarm state.
- A dedicated operator-facing Alarm Summary page (DebugView only).
- Alarm-driven notifications or scripts off native alarms.
Open items / risks
- MxGateway alarm delivery must work end-to-end via StreamAlarms. OtOpcUa notes record the x86 COM worker historically delivered no native alarm events; we are trusting that the gateway now delivers (per the chosen transport). Verify against a live gateway before integration sign-off.
- Test OPC UA server A&C support: confirm the infra OPC UA server exposes Alarms & Conditions; otherwise add/simulate an alarm-capable source for integration tests.
- ZB.MOM.WW.MxGateway.Client version: ensure the referenced package exposes StreamAlarmsAsync; bump if needed.