docs: native alarm ingestion across component docs + CLAUDE.md

Joseph Doherty
2026-05-31 02:55:00 -04:00
parent 2b7c765a58
commit 003e54c1fb
9 changed files with 265 additions and 6 deletions
+72 -3
@@ -34,9 +34,10 @@ Deployment Manager Singleton (Cluster Singleton)
│ │ └── Script Execution Actor — short-lived, per invocation
│ ├── Script Actor ("CalculateOEE") — coordinator
│ │ └── Script Execution Actor — short-lived, per invocation
│ ├── Alarm Actor ("OverTemp") — coordinator
│ ├── Alarm Actor ("OverTemp") — coordinator (computed)
│ │ └── Alarm Execution Actor — short-lived, per on-trigger invocation
│ └── Alarm Actor ("LowPressure") — coordinator
│ ├── Alarm Actor ("LowPressure") — coordinator (computed)
│ └── Native Alarm Actor ("OpcUaServer1") — read-only mirror, peer to Alarm Actor
├── Instance Actor ("MachineA-002")
│ └── ...
└── ...
@@ -204,6 +205,74 @@ When the Instance Actor is stopped (due to disable, delete, or redeployment), Ak
---
## Native Alarm Actor
### Role
- **Read-only mirror** of alarms raised natively by an external source — OPC UA Alarms & Conditions (A&C) servers and the MxAccess Gateway — surfaced into the Site Runtime alongside the alarms ScadaBridge computes itself.
- Created as a child of the **Instance Actor** and is a **peer to the computed Alarm Actor** (not a child of it). One `NativeAlarmActor` is spawned per resolved native alarm source binding on the instance.
- Mirrors source-of-truth condition state into the Instance Actor's view and onto the site-wide stream; it **does not** acknowledge, clear, or otherwise write back to the source. There is no ack-back path — the external source remains authoritative.
### Construction
- Constructed with `(ResolvedNativeAlarmSource source, string instanceName, IActorRef instanceActor, IActorRef dclManager, SiteStorageService storage, SiteRuntimeOptions options, ILogger logger, AlarmKind nativeKind = NativeOpcUa)`.
- `nativeKind` distinguishes the two native flavors and stamps the `Kind` on every emitted `AlarmStateChanged`. The Instance Actor selects it from the bound connection's protocol (see **Instance Actor wiring** below).
### Lifecycle & Subscription
- **PreStart**: rehydrates any previously mirrored conditions for this source from the site SQLite `native_alarm_state` table, then subscribes to the source through the Data Connection Layer by sending a `SubscribeAlarmsRequest` to the DCL manager. The DCL routes the subscription to the bound connection's `IAlarmSubscribableConnection` implementation.
- **Failed subscribe**: schedules a retry timer at `NativeAlarmRetryIntervalMs` and re-attempts until the subscription is established. Rehydrated state remains visible in the meantime.
- **`NativeAlarmSourceUnavailable`**: the source connection has dropped. The actor **retains its last-known mirrored conditions** but marks them uncertain rather than purging them, so a transient disconnect does not flap every condition to normal. The set is reconciled against truth by the next reconnect snapshot.
### Transition Handling (`NativeAlarmTransitionUpdate`)
- **Snapshot / SnapshotComplete (reconnect reconciliation)**: `Snapshot` updates buffer into a staging set; `SnapshotComplete` performs an **atomic swap** of the mirrored set with the staged set. Any condition that was previously mirrored but is **not present** in the new snapshot emits a return-to-normal `AlarmStateChanged` and drops out. This is how the mirror self-corrects after an outage.
- **Live transitions** (`Raise` / `Ack` / `Clear` / `Retrigger` / `StateChange`): upsert the condition by `SourceReference`. Updates carrying a `TransitionTime` **older** than the currently held transition are ignored (out-of-order protection). Accepted transitions persist to SQLite and emit an enriched `AlarmStateChanged` upward to the Instance Actor.
- **Retention**: a mirrored condition is dropped once it is both inactive **and** acknowledged (`!Active && Acknowledged`) — the alarm has fully run its course at the source and no longer needs mirroring. The drop emits a final state change and deletes the SQLite row.
- **Per-source cap**: at most `MirroredAlarmCapPerSource` conditions are retained per source. When the cap is exceeded the **oldest** condition is dropped and the eviction is **logged** — there is no silent truncation.
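
The transition rules above can be sketched as a small in-memory model. This is an illustrative Python sketch, not the actual `NativeAlarmActor`: the `Condition` and `NativeAlarmMirror` names, the event tuples, and the reading of "oldest" as "earliest `TransitionTime`" are assumptions made for the example. The sketch also does not emit change events for conditions that are present in a snapshot, since the text does not specify that behavior.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Condition:
    """Hypothetical stand-in for one mirrored condition."""
    source_reference: str
    active: bool
    acknowledged: bool
    transition_time: datetime


class NativeAlarmMirror:
    """Illustrative mirror of one native source (assumed shape)."""

    def __init__(self, cap: int):
        self.cap = cap  # stands in for MirroredAlarmCapPerSource
        self.mirrored: dict[str, Condition] = {}
        self.staged: dict[str, Condition] = {}
        self.events: list[tuple[str, str]] = []  # (event, source_reference)

    def apply_live(self, update: Condition) -> bool:
        """Handle a live transition; returns False if it was ignored."""
        held = self.mirrored.get(update.source_reference)
        # Out-of-order protection: ignore updates older than the held one.
        if held is not None and update.transition_time < held.transition_time:
            return False
        if not update.active and update.acknowledged:
            # Retention rule: inactive and acknowledged, so emit and drop.
            self.mirrored.pop(update.source_reference, None)
            self.events.append(("dropped", update.source_reference))
            return True
        self.mirrored[update.source_reference] = update
        self.events.append(("changed", update.source_reference))
        self._enforce_cap()
        return True

    def _enforce_cap(self) -> None:
        # Assumption: "oldest" means the earliest transition_time.
        while len(self.mirrored) > self.cap:
            oldest = min(self.mirrored.values(), key=lambda c: c.transition_time)
            del self.mirrored[oldest.source_reference]
            self.events.append(("evicted", oldest.source_reference))

    def stage_snapshot(self, update: Condition) -> None:
        """Buffer a Snapshot update without touching the live set."""
        self.staged[update.source_reference] = update

    def complete_snapshot(self) -> None:
        """SnapshotComplete: swap in the staged set in one step."""
        for ref in self.mirrored.keys() - self.staged.keys():
            self.events.append(("return_to_normal", ref))
        self.mirrored, self.staged = self.staged, {}
```

Staging into a separate dictionary and swapping only on `SnapshotComplete` means a partially received snapshot never replaces the last-known set.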
### Persistence
- Mirrored condition state is persisted to the site SQLite `native_alarm_state` table on every accepted transition and removed on drop-out.
- Persistence is **best-effort / fire-and-forget**: a persistence failure is logged but never blocks the actor's mailbox and never aborts the upward `AlarmStateChanged` emit. The in-memory mirror remains authoritative for the running actor; SQLite exists to survive failover.
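
The ordering guarantee above (a persistence failure never suppresses the upward emit) can be illustrated with a minimal Python sketch. The `accept_transition` function and its `persist` / `emit` callbacks are hypothetical names for this example. The sketch is synchronous for brevity, whereas the text describes the real write as fire-and-forget.

```python
import logging
from typing import Callable

logger = logging.getLogger("native_alarm_sketch")


def accept_transition(condition: dict,
                      persist: Callable[[dict], None],
                      emit: Callable[[dict], None]) -> None:
    """Persist best-effort, then emit upward regardless of the outcome."""
    try:
        persist(condition)
    except Exception:
        # A failed write is logged only; it must not abort the emit.
        logger.exception("native_alarm_state write failed; continuing")
    emit(condition)
```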
### Supervision & Restart
- Supervised by the Instance Actor under the same **OneForOneStrategy** as the computed Alarm Actor — a native source fault is isolated to its own actor.
- On site restart or failover, the actor rehydrates its mirror from `native_alarm_state` in PreStart, then reconciles against the source via the reconnect snapshot. Native mirror state therefore **survives failover** (unlike computed alarm state, which is re-evaluated from values).
- Mirrored native state **is cleared on redeploy/undeploy** of the instance (mirroring the static-override reset): the stale rows for the instance are removed and the fresh actor re-subscribes from a clean slate.
---
## Instance Actor — Native Alarm Wiring
The Instance Actor owns native-alarm setup alongside its computed Script and Alarm Actors:
- **Spawning**: for each entry in `_configuration.NativeAlarmSources`, the Instance Actor spawns a `NativeAlarmActor`. Spawning is **skipped when there is no DCL manager** (e.g., debug/test contexts with no data connections), since native alarms require a live source subscription.
- **Kind derivation**: the `AlarmKind` passed to each `NativeAlarmActor` is derived from the bound connection's protocol — `Mx*` protocols → `NativeMxAccess`, otherwise → `NativeOpcUa`.
- **Latest-event retention**: the Instance Actor retains the latest enriched `AlarmStateChanged` per alarm name in `_latestAlarmEvents`. The DebugView snapshot is built from this map so it carries the **unified condition view plus native metadata** for both computed and native alarms. Computed alarms that have not yet produced an event fall back to a **Normal projection** so the snapshot is complete.
- **Reset semantics**: `_latestAlarmEvents` and the mirrored native state are cleared on redeploy/undeploy (same trigger as static-override reset) but rehydrate from SQLite on failover.
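
As a rough illustration of the kind derivation and the DebugView fallback described above, the following Python sketch may help. It assumes that `Mx*` means a case-sensitive `Mx` prefix on the protocol name, and the `derive_native_kind` and `build_debug_snapshot` helpers, along with the dictionary shape of an event, are hypothetical.

```python
from enum import Enum


class AlarmKind(Enum):
    COMPUTED = "Computed"
    NATIVE_OPC_UA = "NativeOpcUa"
    NATIVE_MX_ACCESS = "NativeMxAccess"


def derive_native_kind(protocol: str) -> AlarmKind:
    """Map a connection protocol name to a native alarm kind."""
    # Assumption: "Mx*" is a case-sensitive prefix match.
    if protocol.startswith("Mx"):
        return AlarmKind.NATIVE_MX_ACCESS
    return AlarmKind.NATIVE_OPC_UA


def build_debug_snapshot(computed_alarm_names, latest_events):
    """Start from the latest event per alarm, then fill computed gaps."""
    snapshot = dict(latest_events)
    for name in computed_alarm_names:
        # Computed alarms with no event yet fall back to a Normal projection.
        snapshot.setdefault(
            name, {"name": name, "kind": AlarmKind.COMPUTED, "state": "Normal"}
        )
    return snapshot
```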
---
## Native Alarm State Persistence (Site SQLite)
`SiteStorageService` gains a `native_alarm_state` table backing the native mirror:
- **Primary key**: `(instance_unique_name, source_canonical_name, source_reference)` — one row per mirrored condition.
- **Columns**: `condition_json` (the serialized `AlarmConditionState`) and `last_transition_at` (the accepted `TransitionTime`).
- **Operations**: `Upsert` (on accepted transition), `Delete` (on condition drop-out), `Get` (PreStart rehydrate, scoped to instance + source), and `ClearForInstance` (redeploy/undeploy reset).
- This is a **peer SQLite store** to the existing deployed-configuration, store-and-forward, operation-tracking, and `AuditLog` stores. Unlike computed alarm state, native mirror state is intentionally persisted so it survives failover.
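
A minimal sketch of the table and its four operations, using Python's standard `sqlite3` module, might look like the following. The table name, column names, and primary key come from the description above; the column types, the `INSERT OR REPLACE` upsert, and the function names are assumptions for illustration and not the actual `SiteStorageService` API.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS native_alarm_state (
    instance_unique_name  TEXT NOT NULL,
    source_canonical_name TEXT NOT NULL,
    source_reference      TEXT NOT NULL,
    condition_json        TEXT NOT NULL,
    last_transition_at    TEXT NOT NULL,
    PRIMARY KEY (instance_unique_name, source_canonical_name, source_reference)
)
"""


def open_store(path=":memory:"):
    db = sqlite3.connect(path)
    db.execute(SCHEMA)
    return db


def upsert(db, instance, source, reference, condition_json, last_transition_at):
    # One row per mirrored condition; a repeat key overwrites the row.
    db.execute(
        "INSERT OR REPLACE INTO native_alarm_state VALUES (?, ?, ?, ?, ?)",
        (instance, source, reference, condition_json, last_transition_at),
    )
    db.commit()


def delete(db, instance, source, reference):
    db.execute(
        "DELETE FROM native_alarm_state WHERE instance_unique_name = ? "
        "AND source_canonical_name = ? AND source_reference = ?",
        (instance, source, reference),
    )
    db.commit()


def get(db, instance, source):
    # Rehydrate scope: all conditions for one instance + source.
    cur = db.execute(
        "SELECT source_reference, condition_json, last_transition_at "
        "FROM native_alarm_state WHERE instance_unique_name = ? "
        "AND source_canonical_name = ? ORDER BY source_reference",
        (instance, source),
    )
    return cur.fetchall()


def clear_for_instance(db, instance):
    db.execute(
        "DELETE FROM native_alarm_state WHERE instance_unique_name = ?",
        (instance,),
    )
    db.commit()
```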
---
## Enriched `AlarmStateChanged` Message
The `AlarmStateChanged` message published by both Alarm Actors and Native Alarm Actors was extended **additively** (existing consumers keep working with computed defaults):
- **`Kind`** (`AlarmKind`): `Computed` for ScadaBridge-evaluated alarms; `NativeOpcUa` / `NativeMxAccess` for mirrored native alarms.
- **`Condition`** (`AlarmConditionState`): the unified condition view. Computed alarms supply a computed default; native alarms carry the mirrored source condition.
- **Native metadata** (populated for native alarms; defaulted/empty for computed): `SourceReference`, `AlarmTypeName`, `Category`, `OperatorUser`, `OperatorComment`, `OriginalRaiseTime`, `CurrentValue`, `LimitValue`.
- **Computed-alarm projection**: computed alarms are surfaced as **auto-acknowledged** with `Severity = Priority`, so a single enriched shape carries both computed and native alarms onto the stream and into the DebugView snapshot.
The enriched message flows Instance Actor → site-wide Akka stream → `SiteStreamManager` → `SiteStreamGrpcServer` and is streamed to central as the gRPC `AlarmStateUpdate` event (see [Component-Communication.md](Component-Communication.md)).
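
To make the additive shape concrete, here is a hedged Python sketch of the enriched message and the computed-alarm projection. The field names follow the list above, but the field types, the snake_case spelling, the `alarm_name` field, the three-field `AlarmConditionState`, and the `project_computed` helper are all assumptions for illustration; the real message is a C# type whose exact definition is not shown here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AlarmConditionState:
    """Assumed minimal condition view."""
    active: bool
    acknowledged: bool
    severity: int


@dataclass(frozen=True)
class AlarmStateChanged:
    alarm_name: str
    # Additive fields: defaults keep pre-existing computed producers valid.
    kind: str = "Computed"
    condition: Optional[AlarmConditionState] = None
    source_reference: str = ""
    alarm_type_name: str = ""
    category: str = ""
    operator_user: str = ""
    operator_comment: str = ""
    original_raise_time: Optional[str] = None
    current_value: Optional[float] = None
    limit_value: Optional[float] = None


def project_computed(alarm_name: str, active: bool, priority: int) -> AlarmStateChanged:
    """Computed alarms surface as auto-acknowledged with Severity = Priority."""
    condition = AlarmConditionState(active=active, acknowledged=True, severity=priority)
    return AlarmStateChanged(alarm_name=alarm_name, condition=condition)
```

Because every new field has a default, a producer that only knows the original computed shape can construct the message unchanged, which is the sense in which the extension is additive.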
---
## Shared Script Library
- Shared scripts are compiled at the site when received from central.
@@ -361,7 +430,7 @@ Per Akka.NET best practices, internal actor communication uses **Tell** (fire-an
## Dependencies
- **Data Connection Layer**: Provides tag value updates to Instance Actors. Receives write requests from Instance Actors.
- **Data Connection Layer**: Provides tag value updates to Instance Actors. Receives write requests from Instance Actors. Also feeds Native Alarm Actors: connections implementing `IAlarmSubscribableConnection` (OPC UA A&C servers, MxAccess Gateway) deliver `NativeAlarmTransitionUpdate` events in response to a `SubscribeAlarmsRequest`, and signal `NativeAlarmSourceUnavailable` on connection loss.
- **Store-and-Forward Engine**: Handles reliable delivery for external system calls, cached database writes, and notifications submitted by scripts. For the notification category specifically, it forwards to the central cluster for delivery (not directly to SMTP). Owns the site-local operation tracking table that backs `Tracking.Status(id)`.
- **External System Gateway**: Provides external system method invocations for scripts.
- **Communication Layer**: Receives deployments and lifecycle commands from central. Handles debug view requests. Reports deployment results.