ScadaBridge/docs/plans/2026-05-29-native-alarms-design.md
Joseph Doherty dadebbe227 docs(plans): native OPC UA & MxAccess GW alarms design
Read-only mirror of native alarm sources into a unified A&C-style state
model (severity + active/acked/shelved/suppressed). Instance-bound source
discovery, site-only SQLite state with live central query (no central
tables), DebugView enrichment. OPC UA A&C events + ConditionRefresh and
MxGateway session-less StreamAlarms via a new IAlarmSubscribableConnection
seam routed connection-level by source reference; new NativeAlarmActor peer
to computed AlarmActor.
2026-05-29 15:14:01 -04:00


Native OPC UA & MxAccess Gateway Alarms — Design

Date: 2026-05-29 Status: Approved

Problem

Today alarms are computed at the site: a TemplateAlarm defines a trigger (ValueMatch, RangeViolation, RateOfChange, HiLo, Expression); one AlarmActor per alarm evaluates attribute values and emits AlarmStateChanged carrying a bare AlarmState { Active, Normal } plus an integer Priority and (for HiLo) a Level. State is in-memory only — there is no severity dimension, no acknowledgement, no shelve/suppress state, and no operator metadata — and it surfaces only in the per-instance DebugView.

Two data sources we connect to own their own alarm lifecycle and expose far richer state:

  • OPC UA Alarms & Conditions (Part 9) — the server raises/acks/clears AlarmCondition nodes with orthogonal sub-states (Active/Inactive, Acked/Unacked, Confirmed/Unconfirmed, Shelved, Suppressed) and a 1-1000 severity. The DCL OPC UA adapter currently subscribes only to the Value attribute.
  • MxAccess Gateway — already exposes a session-less StreamAlarms feed (OnAlarmTransitionEvent: raise/ack/clear/retrigger, severity, operator user + comment, category, description, current/limit value) plus QueryActiveAlarms. The DCL MxGateway adapter currently consumes only the OnDataChange event family.

These are mirrored alarms — the source is the source of truth — which is a real divergence from the computed model. This design enriches the alarm tracking model to carry severity + ack/shelve/suppress state, and ingests native alarms from both sources.

Design Decisions

| Decision | Choice |
| --- | --- |
| State model scope | Unified A&C-style state model for all alarms (computed + native) |
| Interactivity | Read-only mirror — display source-reported state; no acking/shelving from ScadaBridge, no command relay, no operator identity captured by ScadaBridge (source-supplied operator user/comment are displayed for native alarms) |
| Binding | Instance declares a NativeAlarmSource (connection + source ref); conditions under it are discovered at runtime, keyed by source reference |
| State location | Site-only, persisted to SQLite (survives restart/failover); central queries live (snapshot + live stream); no central tables, no central history |
| MxGateway transport | Gateway session-less StreamAlarms feed |
| OPC UA transport | Alarms & Conditions events + ConditionRefresh snapshot |
| Site actor structure | New NativeAlarmActor child of InstanceActor, peer to computed AlarmActors (Approach 1) |
| Authoring | Central UI design-time panels (Template editor + Instance Configure) and CLI |
| Runtime UI | Enrich the per-instance DebugView alarm table only (no new operator page) |

Trade-offs accepted

  • No central audit trail of alarms (who acked, history). Acceptable because the source systems own ack and retain their own alarm history; ScadaBridge is a read-only window. If audit of alarm state is later wanted, a central mirror following the Site Call Audit (#22) pattern can be added without disturbing this design.
  • Read-only means MxGateway/OPC UA acknowledgements happen in the source's own tools; ScadaBridge reflects them.

Section 1 — Unified state model & wire contracts

New Commons types (Types/Enums/, Types/Alarms/):

enum AlarmKind { Computed, NativeOpcUa, NativeMxAccess }
enum AlarmShelveState { Unshelved, OneShotShelved, TimedShelved, PermanentShelved }

record AlarmConditionState(
    bool   Active,            // Active vs Inactive
    bool   Acknowledged,      // Acked / Unacked
    bool?  Confirmed,         // null = not a confirmable condition (OPC UA optional)
    AlarmShelveState Shelve,
    bool   Suppressed,
    int    Severity);         // 0-1000, unified scale

The OPC UA Part 9 sub-conditions are orthogonal, and MxAccess's ACTIVE / ACTIVE_ACKED / INACTIVE states map cleanly onto them, so they are modeled as independent flags (the UI rolls them up for display).

AlarmStateChanged is extended additively (existing fields kept for back-compat; new fields defaulted):

| New field | Default | Notes |
| --- | --- | --- |
| Kind | Computed | discriminator |
| Condition | computed from existing | the AlarmConditionState above |
| SourceReference | "" | native key, e.g. "Tank01.Level.HiHi" |
| AlarmTypeName | "" | native, e.g. "AnalogLimitAlarm.HiHi" |
| Category | "" | native taxonomy |
| OperatorUser | "" | native ack metadata (display-only) |
| OperatorComment | "" | native ack metadata (display-only) |
| OriginalRaiseTime | null | native |
| CurrentValue | "" | native, display |
| LimitValue | "" | native, display |

Identity / key: computed alarms key by AlarmName (canonical); native alarms key by SourceReference (stable across transitions). Kind discriminates. The existing AlarmName field carries the source reference's display form for native rows so existing consumers don't break.

Source → AlarmConditionState mapping:

  • Computed: Active = state==Active, Acknowledged = true (auto), Confirmed = null, Shelve = Unshelved, Suppressed = false, Severity = Priority, Level retained for HiLo.
  • OPC UA A&C: read ActiveState, AckedState, ConfirmedState, ShelvingState, SuppressedState, Severity from the condition's event fields.
  • MxAccess: ACTIVE → (Active=t, Ack=f), ACTIVE_ACKED → (Active=t, Ack=t), INACTIVE → (Active=f); Severity from the gateway's remapped 0-1000; shelve/suppress default (gateway proto doesn't surface them).
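
The computed and MxAccess mappings above can be sketched as two small pure functions. This is a minimal illustration, not the planned implementation: the AlarmConditionMapping class, its method names, and the lastKnownAcknowledged parameter are hypothetical. It assumes that "normalized" severity means clamping into 0-1000, and that INACTIVE keeps the last known ack value because the design leaves Ack unspecified for that state. Records require C# 9 or later.

```csharp
#nullable enable
using System;

public enum AlarmShelveState { Unshelved, OneShotShelved, TimedShelved, PermanentShelved }

public record AlarmConditionState(
    bool Active, bool Acknowledged, bool? Confirmed,
    AlarmShelveState Shelve, bool Suppressed, int Severity);

public static class AlarmConditionMapping
{
    // Assumption: normalizing means clamping into the unified 0-1000 range.
    public static int NormalizeSeverity(int severity) => Math.Clamp(severity, 0, 1000);

    // Computed alarms: auto-acknowledged, not confirmable, never shelved or suppressed.
    public static AlarmConditionState FromComputed(bool isActive, int priority) =>
        new AlarmConditionState(isActive, true, null,
            AlarmShelveState.Unshelved, false, NormalizeSeverity(priority));

    // MxAccess: three gateway states; shelve/suppress stay at their defaults.
    // Assumption: INACTIVE carries no ack information, so the last known value is kept.
    public static AlarmConditionState FromMxAccess(
        string state, int severity, bool lastKnownAcknowledged)
    {
        (bool active, bool acked) = state switch
        {
            "ACTIVE" => (true, false),
            "ACTIVE_ACKED" => (true, true),
            "INACTIVE" => (false, lastKnownAcknowledged),
            _ => throw new ArgumentOutOfRangeException(
                nameof(state), state, "Unknown MxAccess alarm state"),
        };
        return new AlarmConditionState(active, acked, null,
            AlarmShelveState.Unshelved, false, NormalizeSeverity(severity));
    }
}
```

Keeping the mapping free of actor or transport dependencies makes it easy to cover with the unit tests listed in Section 4.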

gRPC AlarmStateUpdate (sitestream.proto) gets the same fields appended as new field numbers (additive — never renumber/remove): kind, active, acknowledged, confirmed, shelve_state, suppressed, source_reference, alarm_type_name, category, operator_user, operator_comment, original_raise_time, current_value, limit_value. Existing state, priority, level, message stay for compatibility.


Section 2 — Configuration, binding & deployment

This mirrors how template attributes bind to a data source today (template declares, instance overrides the concrete reference).

New Commons entities

  • TemplateNativeAlarmSource (Entities/Templates/): Id, TemplateId, Name (unique within template), Description?, ConnectionName, SourceReference (OPC UA SourceNode/notifier nodeId, or MxAccess object/area), ConditionFilter? (null = mirror all conditions under the source), IsLocked, IsInherited — same lock/inherit bookkeeping as TemplateAlarm.
  • InstanceNativeAlarmSourceOverride (Entities/Instances/): Id, InstanceId, SourceCanonicalName, ConnectionNameOverride?, SourceReferenceOverride?, ConditionFilterOverride?; unique (InstanceId, SourceCanonicalName). SourceReference is the field that varies per physical instance, so per-instance override is the common case.

Flattening (FlatteningService, FlattenedConfiguration)

  • New ResolvedNativeAlarmSource { CanonicalName, ConnectionName, SourceReference, ConditionFilter?, Source }, resolved through the same steps as ResolvedAlarm: inherited → composed (path-qualified [Module].[Name]) → instance overrides applied.
  • Pre-deployment semantic validation (extends existing checks): ConnectionName resolves to a real site DataConnection; that connection's protocol is alarm-capable (OpcUa or MxGateway); SourceReference non-empty; canonical-name collision check.
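
The override step and the pre-deployment checks listed above might look roughly like the following sketch. The record shapes are trimmed-down, hypothetical stand-ins for the entities described in this section (inheritance, composition and the Source field are omitted), and the protocol names are assumed to be the strings "OpcUa" and "MxGateway". It also assumes a null override means "keep the template value".

```csharp
#nullable enable
using System.Collections.Generic;

// Hypothetical, simplified shapes for illustration only.
public record TemplateSource(
    string CanonicalName, string ConnectionName, string SourceReference, string? ConditionFilter);
public record SourceOverride(
    string SourceCanonicalName, string? ConnectionNameOverride,
    string? SourceReferenceOverride, string? ConditionFilterOverride);
public record ResolvedSource(
    string CanonicalName, string ConnectionName, string SourceReference, string? ConditionFilter);

public static class NativeAlarmSourceResolver
{
    private static readonly HashSet<string> AlarmCapableProtocols =
        new HashSet<string> { "OpcUa", "MxGateway" };

    // Instance overrides win; a null override falls back to the template value.
    public static ResolvedSource Resolve(TemplateSource source, SourceOverride? o) =>
        new ResolvedSource(
            source.CanonicalName,
            o?.ConnectionNameOverride ?? source.ConnectionName,
            o?.SourceReferenceOverride ?? source.SourceReference,
            o?.ConditionFilterOverride ?? source.ConditionFilter);

    // connectionProtocols maps each site DataConnection name to its protocol.
    public static List<string> Validate(
        IEnumerable<ResolvedSource> sources,
        IReadOnlyDictionary<string, string> connectionProtocols)
    {
        var errors = new List<string>();
        var seen = new HashSet<string>();
        foreach (var s in sources)
        {
            if (!seen.Add(s.CanonicalName))
                errors.Add($"Duplicate canonical name '{s.CanonicalName}'.");
            if (string.IsNullOrWhiteSpace(s.SourceReference))
                errors.Add($"'{s.CanonicalName}': SourceReference is empty.");
            if (!connectionProtocols.TryGetValue(s.ConnectionName, out var protocol))
                errors.Add($"'{s.CanonicalName}': unknown connection '{s.ConnectionName}'.");
            else if (!AlarmCapableProtocols.Contains(protocol))
                errors.Add($"'{s.CanonicalName}': protocol '{protocol}' is not alarm-capable.");
        }
        return errors;
    }
}
```

Returning a list of messages rather than throwing on the first problem lets the deployment UI show every validation failure at once.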

ConfigurationDatabase (EF + migration)

  • TemplateNativeAlarmSourceConfiguration → table Templates.NativeAlarmSources, unique (TemplateId, Name), FK cascade.
  • InstanceNativeAlarmSourceOverrideConfiguration → table InstanceNativeAlarmSourceOverrides, unique (InstanceId, SourceCanonicalName), FK cascade.
  • One migration adds both tables (auto-apply in dev per existing convention).

Deployment: FlattenedConfiguration carries ResolvedNativeAlarmSource[], deployed alongside ResolvedAlarm[] on the existing artifact path. Site Runtime consumes them when building the instance actor hierarchy. All-or-nothing per-instance apply unchanged.

Authoring (Central UI + CLI)

  • Template editor: a "Native Alarm Sources" subsection (name, connection dropdown filtered to alarm-capable protocols, source reference, optional filter).
  • Instance Configure: override connection/source-ref/filter per instance, like attribute data-source overrides.
  • CLI: template native-alarm-source add/list/remove, instance native-alarm-source set/clear.

Section 3 — DCL ingestion & the two adapters

Capability seam (mirrors the existing IBrowsableDataConnection pattern):

interface IAlarmSubscribableConnection {
    Task<string> SubscribeAlarmsAsync(string sourceReference, string? conditionFilter, AlarmTransitionCallback cb);
    Task UnsubscribeAlarmsAsync(string subscriptionId);
}
delegate void AlarmTransitionCallback(NativeAlarmTransition t);

Protocol-neutral transition (Commons/Types/Alarms/):

enum AlarmTransitionKind { Snapshot, SnapshotComplete, Raise, Acknowledge, Clear, Retrigger, StateChange }

record NativeAlarmTransition(
    string SourceReference, string SourceObjectReference, string AlarmTypeName,
    AlarmTransitionKind Kind, AlarmConditionState Condition,
    string Category, string Description, string Message,
    string OperatorUser, string OperatorComment,
    DateTimeOffset? OriginalRaiseTime, DateTimeOffset TransitionTime,
    string CurrentValue, string LimitValue);

Snapshot/SnapshotComplete carry the initial active-condition replay so the consumer re-seeds a source's state on every (re)subscribe — this is how reconnect reconciliation works without central storage.

Connection-level transport + source-ref routing. Although binding is declared per-instance, the subscription is naturally connection-level (OPC UA wants one event subscription; MxGateway StreamAlarms is one session-less feed). DataConnectionActor opens one alarm feed per connection and maintains _alarmSubscribers: SourceObjectRef → set<instance actorRef>, routing each transition to matching instances.

New messages (Messages/DataConnection/): SubscribeAlarmsRequest/Response, UnsubscribeAlarmsRequest, internal NativeAlarmTransitionReceived(conn, transition, generation), forwarded as NativeAlarmTransitionUpdate(conn, transition). Subscribe/unsubscribe obey the existing Become/Stash lifecycle (stashed while Connecting/Reconnecting, replayed on Connected). The stale-callback generation guard and once-only disconnect guard apply unchanged. On disconnect the actor emits a per-source NativeAlarmSourceUnavailable so consumers mark mirrored alarms uncertain rather than clearing them.
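
As a rough illustration of the routing table and the generation guard described above, the sketch below keeps the bookkeeping in a plain class with no actor framework involved. AlarmFeedRouter and its members are hypothetical names, subscribers are represented by strings instead of actor references, and matching is assumed to be an exact comparison on the source object reference.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

public sealed class AlarmFeedRouter
{
    // SourceObjectRef -> set of subscribed instances (actor refs in the real actor).
    private readonly Dictionary<string, HashSet<string>> _alarmSubscribers =
        new Dictionary<string, HashSet<string>>(StringComparer.Ordinal);

    // Bumped on every (re)connect so callbacks from an old feed can be recognised.
    public int Generation { get; private set; }

    public void OnReconnected() => Generation++;

    public void Subscribe(string sourceObjectRef, string subscriber)
    {
        if (!_alarmSubscribers.TryGetValue(sourceObjectRef, out var set))
            _alarmSubscribers[sourceObjectRef] = set = new HashSet<string>();
        set.Add(subscriber);
    }

    public void Unsubscribe(string sourceObjectRef, string subscriber)
    {
        if (_alarmSubscribers.TryGetValue(sourceObjectRef, out var set)
            && set.Remove(subscriber) && set.Count == 0)
            _alarmSubscribers.Remove(sourceObjectRef);
    }

    // Returns the instances that should receive a transition, or nothing
    // when the transition came from a stale feed generation.
    public IReadOnlyCollection<string> Route(string sourceObjectRef, int generation)
    {
        if (generation != Generation) return Array.Empty<string>();
        return _alarmSubscribers.TryGetValue(sourceObjectRef, out var set)
            ? set
            : (IReadOnlyCollection<string>)Array.Empty<string>();
    }
}
```

Because the table is keyed by source reference rather than by instance, two instances bound to the same source share one upstream feed and each still receives every transition.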

OPC UA A&C adapter (OpcUaDataConnection / RealOpcUaClient)

  • One event MonitoredItem (AttributeId = EventNotifier) on the Server object (i=2253) or configured notifier, with an EventFilter: SelectClauses for EventId, EventType, SourceNode, SourceName, Time, Message, Severity + ConditionType/AcknowledgeableConditionType/AlarmConditionType state fields (Acked/Confirmed/Active/Shelving/Suppressed). Optional WhereClause scoping to the union of bound SourceNodes.
  • Map event fields → NativeAlarmTransition; derive Kind from which sub-state changed.
  • Call ConditionRefresh on (re)subscribe → emit the Snapshot/SnapshotComplete sequence.
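
One possible reading of "derive Kind from which sub-state changed" is a comparison between the previously seen and the newly reported sub-states of a condition. The rules below are assumptions made for illustration, in particular treating "still active but no longer acknowledged" as Retrigger; SubStates and TransitionKindDeriver are hypothetical names, and only the Active and Acknowledged flags are considered.

```csharp
#nullable enable

public enum AlarmTransitionKind
{
    Snapshot, SnapshotComplete, Raise, Acknowledge, Clear, Retrigger, StateChange
}

// Hypothetical reduced view of a condition: only the flags used for derivation.
public record SubStates(bool Active, bool Acknowledged);

public static class TransitionKindDeriver
{
    public static AlarmTransitionKind Derive(SubStates? previous, SubStates current)
    {
        // First sighting of a condition outside a snapshot.
        if (previous is null)
            return current.Active ? AlarmTransitionKind.Raise : AlarmTransitionKind.StateChange;

        if (!previous.Active && current.Active) return AlarmTransitionKind.Raise;
        if (previous.Active && !current.Active) return AlarmTransitionKind.Clear;

        // From here on the Active flag is unchanged.
        if (!previous.Acknowledged && current.Acknowledged) return AlarmTransitionKind.Acknowledge;

        // Assumption: an active alarm that loses its ack has re-triggered.
        if (current.Active && previous.Acknowledged && !current.Acknowledged)
            return AlarmTransitionKind.Retrigger;

        // Shelving, suppression, severity or confirm changes fall through to here.
        return AlarmTransitionKind.StateChange;
    }
}
```

Anything that is not a raise, clear, acknowledge or retrigger collapses into StateChange, so the consumer still receives the new AlarmConditionState even when the change has no dedicated kind.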

MxGateway adapter (MxGatewayDataConnection / RealMxGatewayClient)

  • Open session-less StreamAlarms (optional alarm_filter_prefix from bound source refs). Map AlarmFeedMessage: active_alarm → Snapshot, snapshot_complete → SnapshotComplete, transition (OnAlarmTransitionEvent) → mapped transition (RAISE/ACK/CLEAR/RETRIGGER, severity, operator user+comment, category, description, raise/transition times, current/limit value).
  • Resumable stream; on transport fault re-open (existing RaiseDisconnected once-only guard) → fresh snapshot re-seeds.
  • Uses ZB.MOM.WW.MxGateway.Client's StreamAlarmsAsync (already exercised by OtOpcUa's GatewayGalaxyAlarmFeed); bump the NuGet package if the referenced version predates it.

Section 4 — Site runtime, central query, UI, errors & testing

NativeAlarmActor (new)

  • Child of InstanceActor, one per ResolvedNativeAlarmSource (named native-alarm-{canonicalName}). On PreStart sends SubscribeAlarmsRequest for its (ConnectionName, SourceReference, ConditionFilter). Holds _alarms: Dictionary<sourceRef, MirroredAlarm> (discovered conditions + AlarmConditionState + metadata).
  • On NativeAlarmTransitionUpdate: Snapshot…SnapshotComplete → buffer then atomic swap the source's set (drop conditions absent from the snapshot, emit diffs — no flicker); Raise/Ack/Clear/Retrigger/StateChange → update entry, last-write-wins by TransitionTime (ignore older). Each change emits an enriched AlarmStateChanged to InstanceActor → existing stream path.
  • Retention: keep an entry while Active OR Unacked; once fully normal (Inactive AND Acked) emit a final return-to-normal and drop it.
  • On NativeAlarmSourceUnavailable: mark its alarms uncertain (snapshot flag) rather than clearing; re-seed from the reconnect snapshot.
  • Persistence: site-SQLite table NativeAlarmState (InstanceUniqueName, SourceCanonicalName, SourceReference, serialized condition+metadata, LastTransitionTime). Rehydrate on PreStart (so central can query immediately after restart), then reconcile against the fresh snapshot. Reset on redeployment, like static attribute writes.
  • Supervision: coordinator-style child → Resume. A bad source ref / subscribe failure logs to the site event log (alarm), reports unhealthy, and is retried periodically (same spirit as tag-resolution retry) without crashing the instance.
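
The snapshot swap, last-write-wins and retention rules above can be sketched as a small in-memory state holder. This is an illustration under simplifying assumptions, not the actor itself: MirroredAlarmSet and Transition are hypothetical names, the transition carries only the fields needed here, and persistence, diff emission and the uncertain flag are left out.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

public enum AlarmTransitionKind
{
    Snapshot, SnapshotComplete, Raise, Acknowledge, Clear, Retrigger, StateChange
}

// Hypothetical reduced transition: just enough fields for the state rules.
public record Transition(
    string SourceReference, AlarmTransitionKind Kind,
    bool Active, bool Acknowledged, DateTimeOffset TransitionTime);

public sealed class MirroredAlarmSet
{
    private readonly Dictionary<string, Transition> _alarms = new Dictionary<string, Transition>();
    private Dictionary<string, Transition>? _snapshotBuffer;

    public IReadOnlyDictionary<string, Transition> Alarms => _alarms;

    // Returns true when the visible alarm set changed.
    public bool Apply(Transition t)
    {
        switch (t.Kind)
        {
            case AlarmTransitionKind.Snapshot:
                // Buffer only; the visible set is untouched until the snapshot completes.
                _snapshotBuffer ??= new Dictionary<string, Transition>();
                _snapshotBuffer[t.SourceReference] = t;
                return false;

            case AlarmTransitionKind.SnapshotComplete:
                // Swap: conditions absent from the snapshot are dropped.
                var incoming = _snapshotBuffer ?? new Dictionary<string, Transition>();
                _snapshotBuffer = null;
                _alarms.Clear();
                foreach (var kv in incoming) _alarms[kv.Key] = kv.Value;
                return true;

            default:
                // Last-write-wins by TransitionTime: ignore anything older than what we hold.
                if (_alarms.TryGetValue(t.SourceReference, out var existing)
                    && t.TransitionTime < existing.TransitionTime)
                    return false;

                // Retention: keep while Active OR Unacked; drop once Inactive AND Acked.
                if (!t.Active && t.Acknowledged)
                    _alarms.Remove(t.SourceReference);
                else
                    _alarms[t.SourceReference] = t;
                return true;
        }
    }
}
```

An empty snapshot (SnapshotComplete with no preceding Snapshot entries) clears the set, which matches the rule that a source reference absent from the snapshot is treated as cleared.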

Computed AlarmActor: no logic change — populate AlarmConditionState on emit (Active, Acknowledged=true, Severity=Priority, Level retained, Kind=Computed).

InstanceActor: builds NativeAlarmActors from ResolvedNativeAlarmSource[]; native AlarmStateChanged flows through the existing _alarmStates/_alarmTimestamps + _streamManager.PublishAlarmStateChanged path (state dictionaries extended to carry the enriched shape); the instance snapshot includes native alarms.

Streaming + central query (no central tables)

  • Live: enriched AlarmStateChanged → SiteStreamManager → enriched gRPC AlarmStateUpdate → DebugView, as today.
  • Initial snapshot: the existing ClusterClient instance-snapshot request (DebugView's seed) is extended to include native alarms in the unified shape. Large snapshots reuse existing per-subscriber buffering / frame-size guard (the browse-cap precedent); chunk if needed.

Central UI — DebugView enrichment (+ Section 2 authoring panels)

  • Alarm table gains: Severity, a composite condition badge (Active/Acked/Shelved/Suppressed), a Kind badge (computed vs native), Source reference, Alarm type, Category, Operator/comment (tooltip), Original raise time, Current/Limit value (tooltip). Computed rows show severity=priority, auto-acked. Built with the frontend-design skill, Bootstrap-only custom components.

Error handling / edge cases

  • Connection loss → uncertain, not cleared; reconnect snapshot reconciles. Source ref absent from snapshot → cleared. Severity normalized to 0-1000. Bounded growth: configurable per-source mirrored-alarm cap in SiteRuntimeOptions; when hit, log it (no silent truncation). Disabled/deleted instance → unsubscribe.
  • DataConnectionActor health report extended with alarm-feed status (active feeds, last-event time, uncertain sources) via ISiteHealthCollector.

Testing

  • Unit: AlarmConditionState mapping (computed / OPC UA fields / MxAccess states); NativeAlarmActor snapshot-swap, transition handling, persistence rehydrate, uncertain-on-disconnect; FlatteningService native-source inherit/compose/override; semantic validation.
  • Adapter: OPC UA event→transition + ConditionRefresh snapshot (fake client); MxGateway AlarmFeedMessage→transition + reconnect re-seed (fake client, existing fake patterns).
  • Integration: end-to-end against the infra OPC UA server — confirm the test OPC UA server exposes A&C; if not, add an alarm-capable test source or simulate. MxGateway path mocked in CI unless a gateway-with-alarms is available.
  • Seed: add a NativeAlarmSource binding to the docker-env2 site-x MxGateway connection for manual verification.

Affected components & documents

| Area | Changes |
| --- | --- |
| Commons | New enums/records (AlarmKind, AlarmShelveState, AlarmConditionState, NativeAlarmTransition); extend AlarmStateChanged; new entities TemplateNativeAlarmSource, InstanceNativeAlarmSourceOverride; new DCL messages; IAlarmSubscribableConnection |
| Template Engine (#1) | ResolvedNativeAlarmSource, flattening resolution, semantic validation |
| Site Runtime (#3) | NativeAlarmActor, enriched AlarmActor, InstanceActor wiring, NativeAlarmState SQLite persistence, SiteRuntimeOptions cap |
| Data Connection Layer (#4) | DataConnectionActor alarm feed + routing; OPC UA A&C adapter; MxGateway StreamAlarms adapter |
| Communication (#5) | sitestream.proto AlarmStateUpdate enrichment; instance-snapshot enrichment |
| Configuration Database (#17) | EF configurations + migration for two new tables |
| Central UI (#9) | DebugView alarm table enrichment; Template editor + Instance Configure authoring panels |
| CLI (#19) | native-alarm-source commands |
| Health Monitoring (#11) | Alarm-feed status in DataConnectionHealthReport |
| Docs | Component-DataConnectionLayer.md, Component-SiteRuntime.md, Component-TemplateEngine.md, Component-CentralUI.md, Component-CLI.md, Component-Communication.md, Component-ConfigurationDatabase.md; CLAUDE.md design-decisions; README if needed |

Out of scope (this pass)

  • Acknowledging / shelving / suppressing from ScadaBridge (read-only mirror).
  • Central alarm tables, alarm history/journal, central audit of alarm state.
  • A dedicated operator-facing Alarm Summary page (DebugView only).
  • Alarm-driven notifications or scripts off native alarms.

Open items / risks

  • MxGateway alarm delivery must work end-to-end via StreamAlarms. OtOpcUa notes record the x86 COM worker historically delivered no native alarm events; we are trusting that the gateway now delivers (per the chosen transport). Verify against a live gateway before integration sign-off.
  • Test OPC UA server A&C support — confirm the infra OPC UA server exposes Alarms & Conditions; otherwise add/simulate an alarm-capable source for integration tests.
  • ZB.MOM.WW.MxGateway.Client version — ensure the referenced package exposes StreamAlarmsAsync; bump if needed.