# Native OPC UA & MxAccess Gateway Alarms — Design

**Date:** 2026-05-29
**Status:** Approved

## Problem

Today alarms are **computed at the site**: a `TemplateAlarm` defines a trigger (ValueMatch, RangeViolation, RateOfChange, HiLo, Expression); one `AlarmActor` per alarm evaluates attribute values and emits `AlarmStateChanged` carrying a bare `AlarmState { Active, Normal }` plus an integer `Priority` and (for HiLo) a `Level`. State is in-memory only — there is **no severity dimension, no acknowledgement, no shelve/suppress state, and no operator metadata** — and it surfaces only in the per-instance DebugView.

Two data sources we connect to own their own alarm lifecycle and expose far richer state:

- **OPC UA Alarms & Conditions (Part 9)** — the server raises/acks/clears `AlarmCondition` nodes with orthogonal sub-states (Active/Inactive, Acked/Unacked, Confirmed/Unconfirmed, Shelved, Suppressed) and a 1–1000 severity. The DCL OPC UA adapter currently subscribes only to the `Value` attribute.
- **MxAccess Gateway** — already exposes a session-less `StreamAlarms` feed (`OnAlarmTransitionEvent`: raise/ack/clear/retrigger, severity, operator user + comment, category, description, current/limit value) plus `QueryActiveAlarms`. The DCL MxGateway adapter currently consumes only the `OnDataChange` event family.

These are **mirrored** alarms — the source is the source of truth — which is a real divergence from the computed model. This design enriches the alarm tracking model to carry severity + ack/shelve/suppress state, and ingests native alarms from both sources.

## Design Decisions

| Decision | Choice |
|----------|--------|
| State model scope | **Unified** A&C-style state model for *all* alarms (computed + native) |
| Interactivity | **Read-only mirror** — display source-reported state; no acking/shelving from ScadaBridge, no command relay, no operator identity captured by ScadaBridge (source-supplied operator user/comment *are* displayed for native alarms) |
| Binding | Instance declares a `NativeAlarmSource` (connection + source ref); conditions under it are **discovered at runtime**, keyed by source reference |
| State location | **Site-only**, persisted to SQLite (survives restart/failover); central **queries live** (snapshot + live stream); **no central tables, no central history** |
| MxGateway transport | Gateway session-less `StreamAlarms` feed |
| OPC UA transport | Alarms & Conditions events + `ConditionRefresh` snapshot |
| Site actor structure | New `NativeAlarmActor` child of `InstanceActor`, peer to computed `AlarmActor`s (Approach 1) |
| Authoring | Central UI design-time panels (Template editor + Instance Configure) **and** CLI |
| Runtime UI | Enrich the per-instance DebugView alarm table only (no new operator page) |

### Trade-offs accepted

- **No central audit trail of alarms** (who acked, history). Acceptable because the source systems own ack and retain their own alarm history; ScadaBridge is a read-only window. If audit of alarm state is later wanted, a central mirror following the Site Call Audit (#22) pattern can be added without disturbing this design.
- **Read-only** means MxGateway/OPC UA acknowledgements happen in the source's own tools; ScadaBridge reflects them.

---

## Section 1 — Unified state model & wire contracts

**New Commons types** (`Types/Enums/`, `Types/Alarms/`):

```
enum AlarmKind { Computed, NativeOpcUa, NativeMxAccess }
enum AlarmShelveState { Unshelved, OneShotShelved, TimedShelved, PermanentShelved }

record AlarmConditionState(
    bool Active,              // Active vs Inactive
    bool Acknowledged,        // Acked / Unacked
    bool? Confirmed,          // null = not a confirmable condition (OPC UA optional)
    AlarmShelveState Shelve,
    bool Suppressed,
    int Severity);            // 0–1000, unified scale
```

The OPC UA Part 9 sub-conditions are **orthogonal** and MxAccess's `ACTIVE / ACTIVE_ACKED / INACTIVE` maps cleanly onto them, so they are modeled as independent flags (the UI rolls them up for display).

**`AlarmStateChanged` is extended additively** (existing fields kept for back-compat; new fields defaulted):

| New field | Default | Notes |
|-----------|---------|-------|
| `Kind` | `Computed` | discriminator |
| `Condition` | computed from existing | the `AlarmConditionState` above |
| `SourceReference` | `""` | native key, e.g. `"Tank01.Level.HiHi"` |
| `AlarmTypeName` | `""` | native, e.g. `"AnalogLimitAlarm.HiHi"` |
| `Category` | `""` | native taxonomy |
| `OperatorUser` | `""` | native ack metadata (display-only) |
| `OperatorComment` | `""` | native ack metadata (display-only) |
| `OriginalRaiseTime` | `null` | native |
| `CurrentValue` | `""` | native, display |
| `LimitValue` | `""` | native, display |

**Identity / key:** computed alarms key by `AlarmName` (canonical); native alarms key by `SourceReference` (stable across transitions). `Kind` discriminates. The existing `AlarmName` field carries the source reference's display form for native rows so existing consumers don't break.

**Source → `AlarmConditionState` mapping:**
- **Computed:** `Active = state==Active`, `Acknowledged = true` (auto), `Confirmed = null`, `Shelve = Unshelved`, `Suppressed = false`, `Severity = Priority`, `Level` retained for HiLo.
- **OPC UA A&C:** read `ActiveState`, `AckedState`, `ConfirmedState`, `ShelvingState`, `SuppressedState`, `Severity` from the condition's event fields.
- **MxAccess:** `ACTIVE → (Active=t, Ack=f)`, `ACTIVE_ACKED → (Active=t, Ack=t)`, `INACTIVE → (Active=f)`; `Severity` from the gateway's remapped 0–1000; shelve/suppress default (gateway proto doesn't surface them).

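The computed and MxAccess mappings above can be sketched as two small pure functions. This is only an illustration under stated assumptions: the class name `AlarmConditionMapper`, the string form of the MxAccess state, the clamp used to keep severity on the 0–1000 scale, and the choice to treat `INACTIVE` as acknowledged (the mapping above leaves that flag unspecified) are all hypothetical, not part of the approved design.

```csharp
#nullable enable
using System;

public enum AlarmShelveState { Unshelved, OneShotShelved, TimedShelved, PermanentShelved }

public sealed record AlarmConditionState(
    bool Active, bool Acknowledged, bool? Confirmed,
    AlarmShelveState Shelve, bool Suppressed, int Severity);

// Hypothetical helper; the design does not name a mapper type.
public static class AlarmConditionMapper
{
    // Assumption: out-of-range severities are clamped onto the unified 0-1000 scale.
    public static int NormalizeSeverity(int severity) => Math.Clamp(severity, 0, 1000);

    // Computed alarms: auto-acknowledged, not confirmable, never shelved or suppressed,
    // and Severity is taken from the existing Priority.
    public static AlarmConditionState FromComputed(bool isActive, int priority) =>
        new AlarmConditionState(
            Active: isActive,
            Acknowledged: true,
            Confirmed: null,
            Shelve: AlarmShelveState.Unshelved,
            Suppressed: false,
            Severity: NormalizeSeverity(priority));

    // MxAccess: one combined state value is split into independent flags.
    public static AlarmConditionState FromMxAccess(string state, int severity)
    {
        var (active, acked) = state switch
        {
            "ACTIVE" => (true, false),
            "ACTIVE_ACKED" => (true, true),
            // Assumption: INACTIVE is treated as acknowledged in this sketch.
            "INACTIVE" => (false, true),
            _ => throw new ArgumentOutOfRangeException(nameof(state), state, "Unknown MxAccess alarm state"),
        };

        // Shelve/suppress stay at their defaults because the gateway proto does not surface them.
        return new AlarmConditionState(
            Active: active,
            Acknowledged: acked,
            Confirmed: null,
            Shelve: AlarmShelveState.Unshelved,
            Suppressed: false,
            Severity: NormalizeSeverity(severity));
    }
}
```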
**gRPC `AlarmStateUpdate` (`sitestream.proto`)** gets the same fields appended as new field numbers (additive — never renumber/remove): `kind`, `active`, `acknowledged`, `confirmed`, `shelve_state`, `suppressed`, `source_reference`, `alarm_type_name`, `category`, `operator_user`, `operator_comment`, `original_raise_time`, `current_value`, `limit_value`. Existing `state`, `priority`, `level`, `message` stay for compatibility.

---

## Section 2 — Configuration, binding & deployment

This mirrors how template **attributes bind to a data source** today (template declares, instance overrides the concrete reference).

**New Commons entities**
- `TemplateNativeAlarmSource` (`Entities/Templates/`): `Id`, `TemplateId`, `Name` (unique within template), `Description?`, `ConnectionName`, `SourceReference` (OPC UA SourceNode/notifier nodeId, or MxAccess object/area), `ConditionFilter?` (null = mirror *all* conditions under the source), `IsLocked`, `IsInherited` — same lock/inherit bookkeeping as `TemplateAlarm`.
- `InstanceNativeAlarmSourceOverride` (`Entities/Instances/`): `Id`, `InstanceId`, `SourceCanonicalName`, `ConnectionNameOverride?`, `SourceReferenceOverride?`, `ConditionFilterOverride?`; unique `(InstanceId, SourceCanonicalName)`. `SourceReference` is the field that varies per physical instance, so per-instance override is the common case.

**Flattening** (`FlatteningService`, `FlattenedConfiguration`)
- New `ResolvedNativeAlarmSource { CanonicalName, ConnectionName, SourceReference, ConditionFilter?, Source }`, resolved through the same steps as `ResolvedAlarm`: inherited → composed (path-qualified `[Module].[Name]`) → instance overrides applied.
- **Pre-deployment semantic validation** (extends existing checks): `ConnectionName` resolves to a real site `DataConnection`; that connection's protocol is alarm-capable (`OpcUa` or `MxGateway`); `SourceReference` non-empty; canonical-name collision check.

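The last flattening step (instance overrides applied) and three of the validation checks can be sketched as below. The records are trimmed to the fields the sketch needs, and `NativeAlarmSourceResolver`, its method names, and the protocol lookup passed as a dictionary are hypothetical stand-ins, not the real `FlatteningService` API. Inheritance, composition, and the canonical-name collision check are left out. One limitation of this null-means-inherit approach: an override cannot reset `ConditionFilter` back to null.

```csharp
#nullable enable
using System.Collections.Generic;

// Trimmed-down shapes: Id, TemplateId, IsLocked, Source, etc. are omitted.
public sealed record TemplateNativeAlarmSource(
    string Name, string ConnectionName, string SourceReference, string? ConditionFilter);

public sealed record InstanceNativeAlarmSourceOverride(
    string SourceCanonicalName, string? ConnectionNameOverride,
    string? SourceReferenceOverride, string? ConditionFilterOverride);

public sealed record ResolvedNativeAlarmSource(
    string CanonicalName, string ConnectionName, string SourceReference, string? ConditionFilter);

// Hypothetical helper for illustration only.
public static class NativeAlarmSourceResolver
{
    // A null override field means "inherit the template value".
    public static ResolvedNativeAlarmSource Resolve(
        string canonicalName,
        TemplateNativeAlarmSource template,
        InstanceNativeAlarmSourceOverride? instanceOverride) =>
        new ResolvedNativeAlarmSource(
            canonicalName,
            instanceOverride?.ConnectionNameOverride ?? template.ConnectionName,
            instanceOverride?.SourceReferenceOverride ?? template.SourceReference,
            instanceOverride?.ConditionFilterOverride ?? template.ConditionFilter);

    // connectionProtocols maps each site DataConnection name to its protocol name.
    public static IReadOnlyList<string> Validate(
        ResolvedNativeAlarmSource source,
        IReadOnlyDictionary<string, string> connectionProtocols)
    {
        var errors = new List<string>();

        if (!connectionProtocols.TryGetValue(source.ConnectionName, out var protocol))
            errors.Add($"{source.CanonicalName}: connection '{source.ConnectionName}' does not exist");
        else if (protocol is not ("OpcUa" or "MxGateway"))
            errors.Add($"{source.CanonicalName}: protocol '{protocol}' is not alarm-capable");

        if (string.IsNullOrWhiteSpace(source.SourceReference))
            errors.Add($"{source.CanonicalName}: SourceReference is empty");

        return errors;
    }
}
```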
**ConfigurationDatabase (EF + migration)**
- `TemplateNativeAlarmSourceConfiguration` → table `Templates.NativeAlarmSources`, unique `(TemplateId, Name)`, FK cascade.
- `InstanceNativeAlarmSourceOverrideConfiguration` → table `InstanceNativeAlarmSourceOverrides`, unique `(InstanceId, SourceCanonicalName)`, FK cascade.
- One migration adds both tables (auto-apply in dev per existing convention).

**Deployment** — `FlattenedConfiguration` carries `ResolvedNativeAlarmSource[]`, deployed alongside `ResolvedAlarm[]` on the existing artifact path. Site Runtime consumes them when building the instance actor hierarchy. All-or-nothing per-instance apply unchanged.

**Authoring (Central UI + CLI)**
- Template editor: a "Native Alarm Sources" subsection (name, connection dropdown filtered to alarm-capable protocols, source reference, optional filter).
- Instance Configure: override connection/source-ref/filter per instance, like attribute data-source overrides.
- CLI: `template native-alarm-source add/list/remove`, `instance native-alarm-source set/clear`.

---

## Section 3 — DCL ingestion & the two adapters

**Capability seam** (mirrors the existing `IBrowsableDataConnection` pattern):

```
interface IAlarmSubscribableConnection {
    Task<string> SubscribeAlarmsAsync(string sourceReference, string? conditionFilter, AlarmTransitionCallback cb);
    Task UnsubscribeAlarmsAsync(string subscriptionId);
}
delegate void AlarmTransitionCallback(NativeAlarmTransition t);
```

**Protocol-neutral transition** (`Commons/Types/Alarms/`):

```
enum AlarmTransitionKind { Snapshot, SnapshotComplete, Raise, Acknowledge, Clear, Retrigger, StateChange }

record NativeAlarmTransition(
    string SourceReference, string SourceObjectReference, string AlarmTypeName,
    AlarmTransitionKind Kind, AlarmConditionState Condition,
    string Category, string Description, string Message,
    string OperatorUser, string OperatorComment,
    DateTimeOffset? OriginalRaiseTime, DateTimeOffset TransitionTime,
    string CurrentValue, string LimitValue);
```

`Snapshot`/`SnapshotComplete` carry the initial active-condition replay so the consumer re-seeds a source's state on every (re)subscribe — this is how reconnect reconciliation works without central storage.

**Connection-level transport + source-ref routing.** Although binding is *declared* per-instance, the subscription is naturally **connection-level** (OPC UA wants one event subscription; MxGateway `StreamAlarms` is one session-less feed). `DataConnectionActor` opens **one** alarm feed per connection and maintains `_alarmSubscribers: SourceObjectRef → set<instance actorRef>`, routing each transition to matching instances.

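The `_alarmSubscribers` routing table can be sketched without any actor framework. Here a generic `TSubscriber` stands in for the instance actor reference, the transition record is trimmed to two fields, and matching is an exact comparison on `SourceObjectReference`; the design only says "matching instances", so exact matching is an assumption (prefix matching would be another option). The class name `AlarmSubscriberRouter` is hypothetical.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Trimmed-down stand-in for NativeAlarmTransition.
public sealed record NativeAlarmTransition(string SourceReference, string SourceObjectReference);

// Hypothetical helper; in the design this state lives inside DataConnectionActor.
public sealed class AlarmSubscriberRouter<TSubscriber> where TSubscriber : notnull
{
    private readonly Dictionary<string, HashSet<TSubscriber>> _alarmSubscribers =
        new Dictionary<string, HashSet<TSubscriber>>(StringComparer.Ordinal);

    // Returns true when the table was empty before this call,
    // i.e. the caller should open the single connection-level feed.
    public bool Subscribe(string sourceObjectRef, TSubscriber subscriber)
    {
        bool wasEmpty = _alarmSubscribers.Count == 0;
        if (!_alarmSubscribers.TryGetValue(sourceObjectRef, out var subscribers))
        {
            subscribers = new HashSet<TSubscriber>();
            _alarmSubscribers[sourceObjectRef] = subscribers;
        }
        subscribers.Add(subscriber);
        return wasEmpty;
    }

    // Returns true when no subscribers remain, i.e. the feed can be closed.
    public bool Unsubscribe(string sourceObjectRef, TSubscriber subscriber)
    {
        if (_alarmSubscribers.TryGetValue(sourceObjectRef, out var subscribers))
        {
            subscribers.Remove(subscriber);
            if (subscribers.Count == 0)
                _alarmSubscribers.Remove(sourceObjectRef);
        }
        return _alarmSubscribers.Count == 0;
    }

    // Returns every subscriber bound to the transition's source object (possibly none).
    public IReadOnlyCollection<TSubscriber> Route(NativeAlarmTransition transition)
    {
        if (_alarmSubscribers.TryGetValue(transition.SourceObjectReference, out var subscribers))
            return subscribers;
        return Array.Empty<TSubscriber>();
    }
}
```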
New messages (`Messages/DataConnection/`): `SubscribeAlarmsRequest`/`Response`, `UnsubscribeAlarmsRequest`, internal `NativeAlarmTransitionReceived(conn, transition, generation)`, forwarded as `NativeAlarmTransitionUpdate(conn, transition)`. Subscribe/unsubscribe obey the existing **Become/Stash** lifecycle (stashed while Connecting/Reconnecting, replayed on Connected). The stale-callback **generation guard** and once-only disconnect guard apply unchanged. On disconnect the actor emits a per-source `NativeAlarmSourceUnavailable` so consumers mark mirrored alarms *uncertain* rather than clearing them.

**OPC UA A&C adapter** (`OpcUaDataConnection` / `RealOpcUaClient`)
- One event `MonitoredItem` (`AttributeId = EventNotifier`) on the Server object (i=2253) or configured notifier, with an `EventFilter`: SelectClauses for EventId, EventType, SourceNode, SourceName, Time, Message, Severity + `ConditionType`/`AcknowledgeableConditionType`/`AlarmConditionType` state fields (Acked/Confirmed/Active/Shelving/Suppressed). Optional WhereClause scoping to the union of bound SourceNodes.
- Map event fields → `NativeAlarmTransition`; derive `Kind` from which sub-state changed.
- Call `ConditionRefresh` on (re)subscribe → emit the `Snapshot`/`SnapshotComplete` sequence.

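"Derive `Kind` from which sub-state changed" implies comparing the previous and current condition state for the same source reference. The sketch below shows one possible rule order; the design does not specify precedence, and both the class name `TransitionKindDeriver` and the reading of `Retrigger` as "an acknowledged active alarm becomes unacknowledged again" are assumptions.

```csharp
#nullable enable

public enum AlarmTransitionKind { Snapshot, SnapshotComplete, Raise, Acknowledge, Clear, Retrigger, StateChange }
public enum AlarmShelveState { Unshelved, OneShotShelved, TimedShelved, PermanentShelved }

public sealed record AlarmConditionState(
    bool Active, bool Acknowledged, bool? Confirmed,
    AlarmShelveState Shelve, bool Suppressed, int Severity);

// Hypothetical helper for illustration only.
public static class TransitionKindDeriver
{
    public static AlarmTransitionKind Derive(AlarmConditionState? previous, AlarmConditionState current)
    {
        // First sighting of a condition: an active one counts as a raise.
        if (previous is null)
            return current.Active ? AlarmTransitionKind.Raise : AlarmTransitionKind.StateChange;

        // Active/Inactive edges take precedence over acknowledgement edges (assumed order).
        if (!previous.Active && current.Active) return AlarmTransitionKind.Raise;
        if (previous.Active && !current.Active) return AlarmTransitionKind.Clear;

        if (!previous.Acknowledged && current.Acknowledged) return AlarmTransitionKind.Acknowledge;

        // Still active, but acknowledgement was lost: treated as a retrigger (assumption).
        if (current.Active && previous.Acknowledged && !current.Acknowledged)
            return AlarmTransitionKind.Retrigger;

        // Anything else (shelving, suppression, severity, confirmation) is a generic state change.
        return AlarmTransitionKind.StateChange;
    }
}
```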
**MxGateway adapter** (`MxGatewayDataConnection` / `RealMxGatewayClient`)
- Open session-less `StreamAlarms` (optional `alarm_filter_prefix` from bound source refs). Map `AlarmFeedMessage`: `active_alarm` → `Snapshot`, `snapshot_complete` → `SnapshotComplete`, `transition (OnAlarmTransitionEvent)` → mapped transition (RAISE/ACK/CLEAR/RETRIGGER, severity, operator user+comment, category, description, raise/transition times, current/limit value).
- Resumable stream; on transport fault re-open (existing `RaiseDisconnected` once-only guard) → fresh snapshot re-seeds.
- Uses `ZB.MOM.WW.MxGateway.Client`'s `StreamAlarmsAsync` (already exercised by OtOpcUa's `GatewayGalaxyAlarmFeed`); bump the NuGet package if the referenced version predates it.

---

## Section 4 — Site runtime, central query, UI, errors & testing

**`NativeAlarmActor` (new)**
- Child of `InstanceActor`, one per `ResolvedNativeAlarmSource` (named `native-alarm-{canonicalName}`). On `PreStart` sends `SubscribeAlarmsRequest` for its `(ConnectionName, SourceReference, ConditionFilter)`. Holds `_alarms: Dictionary<sourceRef, MirroredAlarm>` (discovered conditions + `AlarmConditionState` + metadata).
- On `NativeAlarmTransitionUpdate`: `Snapshot…SnapshotComplete` → buffer then **atomic swap** the source's set (drop conditions absent from the snapshot, emit diffs — no flicker); `Raise/Ack/Clear/Retrigger/StateChange` → update entry, last-write-wins by `TransitionTime` (ignore older). Each change emits an enriched `AlarmStateChanged` to `InstanceActor` → existing stream path.
- **Retention:** keep an entry while `Active` OR `Unacked`; once fully normal (`Inactive` AND `Acked`) emit a final return-to-normal and drop it.
- On `NativeAlarmSourceUnavailable`: mark its alarms **uncertain** (snapshot flag) rather than clearing; re-seed from the reconnect snapshot.
- **Persistence:** site-SQLite table `NativeAlarmState (InstanceUniqueName, SourceCanonicalName, SourceReference, serialized condition+metadata, LastTransitionTime)`. Rehydrate on `PreStart` (so central can query immediately after restart), then reconcile against the fresh snapshot. Reset on redeployment, like static attribute writes.
- **Supervision:** coordinator-style child → **Resume**. A bad source ref / subscribe failure logs to the site event log (`alarm`), reports unhealthy, and is retried periodically (same spirit as tag-resolution retry) without crashing the instance.

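The transition-handling and retention rules above (buffer then atomic swap, last-write-wins by `TransitionTime`, drop once inactive and acknowledged) can be sketched as a plain in-memory table, leaving out the actor, persistence, and the uncertain flag. `MirroredAlarmTable` and the trimmed `Transition` record are hypothetical names; the returned list of changed source references stands in for the `AlarmStateChanged` emissions. How live transitions that arrive in the middle of a snapshot should be handled is not covered by the design and is ignored here.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

public enum AlarmTransitionKind { Snapshot, SnapshotComplete, Raise, Acknowledge, Clear, Retrigger, StateChange }

// Trimmed-down stand-in for NativeAlarmTransition: only the fields this sketch needs.
public sealed record Transition(
    string SourceReference, AlarmTransitionKind Kind,
    bool Active, bool Acknowledged, DateTimeOffset TransitionTime);

// Hypothetical helper; in the design this state lives inside NativeAlarmActor.
public sealed class MirroredAlarmTable
{
    private readonly Dictionary<string, Transition> _alarms = new Dictionary<string, Transition>();
    private Dictionary<string, Transition>? _snapshotBuffer;

    public IReadOnlyDictionary<string, Transition> Alarms => _alarms;

    // Applies one transition and returns the source references whose mirrored state changed.
    public IReadOnlyList<string> Apply(Transition t)
    {
        switch (t.Kind)
        {
            case AlarmTransitionKind.Snapshot:
            {
                // Buffer only; the visible set is untouched until SnapshotComplete.
                _snapshotBuffer ??= new Dictionary<string, Transition>();
                _snapshotBuffer[t.SourceReference] = t;
                return Array.Empty<string>();
            }
            case AlarmTransitionKind.SnapshotComplete:
            {
                var fresh = _snapshotBuffer ?? new Dictionary<string, Transition>();
                _snapshotBuffer = null;

                // Diff first (materialized with ToList), then swap the whole set at once.
                var dropped = _alarms.Keys.Except(fresh.Keys);
                var addedOrChanged = fresh
                    .Where(kv => !_alarms.TryGetValue(kv.Key, out var old)
                                 || old.Active != kv.Value.Active
                                 || old.Acknowledged != kv.Value.Acknowledged)
                    .Select(kv => kv.Key);
                var changed = dropped.Concat(addedOrChanged).ToList();

                _alarms.Clear();
                foreach (var kv in fresh) _alarms[kv.Key] = kv.Value;
                return changed;
            }
            default:
            {
                // Last-write-wins: ignore transitions older than what is already held.
                if (_alarms.TryGetValue(t.SourceReference, out var current)
                    && t.TransitionTime < current.TransitionTime)
                    return Array.Empty<string>();

                // Retention: keep while Active OR Unacked; drop once Inactive AND Acked.
                if (!t.Active && t.Acknowledged) _alarms.Remove(t.SourceReference);
                else _alarms[t.SourceReference] = t;
                return new[] { t.SourceReference };
            }
        }
    }
}
```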
**Computed `AlarmActor`:** no logic change — populate `AlarmConditionState` on emit (`Active`, `Acknowledged=true`, `Severity=Priority`, `Level` retained, `Kind=Computed`).

**`InstanceActor`:** builds `NativeAlarmActor`s from `ResolvedNativeAlarmSource[]`; native `AlarmStateChanged` flows through the existing `_alarmStates`/`_alarmTimestamps` + `_streamManager.PublishAlarmStateChanged` path (state dictionaries extended to carry the enriched shape); the instance snapshot includes native alarms.

**Streaming + central query (no central tables)**
- Live: enriched `AlarmStateChanged` → `SiteStreamManager` → enriched gRPC `AlarmStateUpdate` → DebugView, as today.
- Initial snapshot: the existing **ClusterClient instance-snapshot** request (DebugView's seed) is extended to include native alarms in the unified shape. Large snapshots reuse existing per-subscriber buffering / frame-size guard (the browse-cap precedent); chunk if needed.

**Central UI — DebugView enrichment** (+ Section 2 authoring panels)
- Alarm table gains: Severity, a composite condition badge (Active/Acked/Shelved/Suppressed), a Kind badge (computed vs native), Source reference, Alarm type, Category, Operator/comment (tooltip), Original raise time, Current/Limit value (tooltip). Computed rows show severity=priority, auto-acked. Built with the `frontend-design` skill, Bootstrap-only custom components.

**Error handling / edge cases**
- Connection loss → uncertain, not cleared; reconnect snapshot reconciles. Source ref absent from snapshot → cleared. Severity normalized to 0–1000. **Bounded growth:** configurable per-source mirrored-alarm cap in `SiteRuntimeOptions`; when hit, **log it** (no silent truncation). Disabled/deleted instance → unsubscribe.
- `DataConnectionActor` health report extended with alarm-feed status (active feeds, last-event time, uncertain sources) via `ISiteHealthCollector`.

**Testing**
- Unit: `AlarmConditionState` mapping (computed / OPC UA fields / MxAccess states); `NativeAlarmActor` snapshot-swap, transition handling, persistence rehydrate, uncertain-on-disconnect; `FlatteningService` native-source inherit/compose/override; semantic validation.
- Adapter: OPC UA event→transition + ConditionRefresh snapshot (fake client); MxGateway `AlarmFeedMessage`→transition + reconnect re-seed (fake client, existing fake patterns).
- Integration: end-to-end against the infra OPC UA server — **confirm the test OPC UA server exposes A&C; if not, add an alarm-capable test source or simulate.** MxGateway path mocked in CI unless a gateway-with-alarms is available.
- Seed: add a `NativeAlarmSource` binding to the `docker-env2` site-x MxGateway connection for manual verification.

---

## Affected components & documents

| Area | Changes |
|------|---------|
| Commons | New enums/records (`AlarmKind`, `AlarmShelveState`, `AlarmConditionState`, `NativeAlarmTransition`); extend `AlarmStateChanged`; new entities `TemplateNativeAlarmSource`, `InstanceNativeAlarmSourceOverride`; new DCL messages; `IAlarmSubscribableConnection` |
| Template Engine (#1) | `ResolvedNativeAlarmSource`, flattening resolution, semantic validation |
| Site Runtime (#3) | `NativeAlarmActor`, enriched `AlarmActor`, `InstanceActor` wiring, `NativeAlarmState` SQLite persistence, `SiteRuntimeOptions` cap |
| Data Connection Layer (#4) | `DataConnectionActor` alarm feed + routing; OPC UA A&C adapter; MxGateway `StreamAlarms` adapter |
| Communication (#5) | `sitestream.proto` `AlarmStateUpdate` enrichment; instance-snapshot enrichment |
| Configuration Database (#17) | EF configurations + migration for two new tables |
| Central UI (#9) | DebugView alarm table enrichment; Template editor + Instance Configure authoring panels |
| CLI (#19) | `native-alarm-source` commands |
| Health Monitoring (#11) | Alarm-feed status in `DataConnectionHealthReport` |
| Docs | `Component-DataConnectionLayer.md`, `Component-SiteRuntime.md`, `Component-TemplateEngine.md`, `Component-CentralUI.md`, `Component-CLI.md`, `Component-Communication.md`, `Component-ConfigurationDatabase.md`; CLAUDE.md design-decisions; README if needed |

## Out of scope (this pass)

- Acknowledging / shelving / suppressing from ScadaBridge (read-only mirror).
- Central alarm tables, alarm history/journal, central audit of alarm state.
- A dedicated operator-facing Alarm Summary page (DebugView only).
- Alarm-driven notifications or scripts off native alarms.

## Open items / risks

- **MxGateway alarm delivery** must work end-to-end via `StreamAlarms`. OtOpcUa notes record the x86 COM worker historically delivered no native alarm events; we are trusting that the gateway now delivers (per the chosen transport). Verify against a live gateway before integration sign-off.
- **Test OPC UA server A&C support** — confirm the infra OPC UA server exposes Alarms & Conditions; otherwise add/simulate an alarm-capable source for integration tests.
- **`ZB.MOM.WW.MxGateway.Client` version** — ensure the referenced package exposes `StreamAlarmsAsync`; bump if needed.