Alarm Historian — store-and-forward SQLite sink
Reference for ZB.MOM.WW.OtOpcUa.Core.AlarmHistorian
(src/Core/ZB.MOM.WW.OtOpcUa.Core.AlarmHistorian/),
the durable local queue that historizes alarm transitions to AVEVA Historian
without ever blocking the alarm engine or operator actions.
This is the sink mechanics doc. For how the three alarm sources converge on the OPC UA Part 9 surface and which alarms route here, see AlarmTracking.md. For the historian client that drains this queue, see DriverLifecycle.md and ServiceHosting.md.
Why store-and-forward
Scripted alarms (and any future non-Galaxy IAlarmSource, e.g. AB CIP ALMD)
must reach AVEVA Historian, but the historian sidecar can be slow, busy, or
disconnected. The sink decouples the alarm engine from historian reachability:
every qualifying transition is committed to a local SQLite queue first, and
a background drain worker forwards rows to the historian on a backoff-aware
cadence. Operator acks and alarm-state transitions are never blocked waiting on
the historian.
Galaxy-native alarms with $Alarm* extensions reach AVEVA Historian directly via System Platform's HistorizeToAveva toggle; they do not flow through this sink. This path is exclusively for non-Galaxy alarm producers.
Contracts
All in
IAlarmHistorianSink.cs
unless noted.
- IAlarmHistorianSink: the intake contract. EnqueueAsync(evt, ct) durably enqueues an event and returns as soon as the queue row is committed (fire-and-forget from the engine's perspective; the sink must not block the emitting thread). GetStatus() returns a HistorianSinkStatus snapshot.
- NullAlarmHistorianSink: the no-op default for tests and deployments that don't historize alarms. It is the default DI binding (registered in the Runtime's AddOtOpcUaRuntime); production overrides it with SqliteStoreAndForwardSink.
- AlarmHistorianEvent (AlarmHistorianEvent.cs): the source-agnostic event record: AlarmId, EquipmentPath (UNS path, doubles as Historian's SourceNode), AlarmName, AlarmTypeName (Part 9 subtype), Severity, EventKind (free-form transition string: "Activated"/"Cleared"/"Acknowledged"/etc.), Message, User, Comment, TimestampUtc.
- IAlarmHistorianWriter: what the drain worker delegates writes to. WriteBatchAsync(batch, ct) returns one HistorianWriteOutcome per event, in order. Production binds this to WonderwareHistorianClient (the AVEVA Historian sidecar IPC client).
- HistorianWriteOutcome: per-event drain result: Ack (persisted, remove from queue), RetryPlease (transient failure; leave queued, retry after backoff), PermanentFail (malformed/unrecoverable; move to dead-letter).
- HistorianSinkStatus: diagnostic snapshot surfaced to the AdminUI and /healthz: QueueDepth, DeadLetterDepth, LastDrainUtc, LastSuccessUtc, LastError, DrainState, and EvictedCount.
- HistorianDrainState: Disabled / Idle / Draining / BackingOff.
SqliteStoreAndForwardSink
SqliteStoreAndForwardSink.cs
is the production IAlarmHistorianSink. Construction takes a SQLite database
path, an IAlarmHistorianWriter, a logger, and optional batchSize (default
100), capacity (default 1,000,000), deadLetterRetention (default 30 days),
and a test clock.
Queue table
The sink owns one SQLite table (created on construction, WAL journal mode):
CREATE TABLE Queue (
    RowId          INTEGER PRIMARY KEY AUTOINCREMENT,
    AlarmId        TEXT NOT NULL,
    EnqueuedUtc    TEXT NOT NULL,
    PayloadJson    TEXT NOT NULL,  -- JSON-serialized AlarmHistorianEvent
    AttemptCount   INTEGER NOT NULL DEFAULT 0,
    LastAttemptUtc TEXT NULL,
    LastError      TEXT NULL,
    DeadLettered   INTEGER NOT NULL DEFAULT 0
);
CREATE INDEX IX_Queue_Drain ON Queue (DeadLettered, RowId);
EnqueueAsync does a single INSERT on the hot path. To avoid a
SELECT COUNT(*) on every enqueue, the sink keeps an in-memory non-dead-lettered
row counter (seeded at startup, kept current by every mutation, and re-synced
from storage every 10,000 enqueues to defend against drift). SQLite writer
contention is handled via PRAGMA busy_timeout=5000 + WAL so an enqueue/drain
collision waits out the file lock instead of failing fast.
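The enqueue path described above can be modeled in a few lines. The following is a minimal Python sketch using the standard sqlite3 module, not the C# implementation: the class name EnqueueSketch, its fields, and the resync_every parameter are hypothetical, and only the schema, the two PRAGMAs, and the 10,000-enqueue re-sync interval come from the text.

```python
import json
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS Queue (
    RowId INTEGER PRIMARY KEY AUTOINCREMENT,
    AlarmId TEXT NOT NULL,
    EnqueuedUtc TEXT NOT NULL,
    PayloadJson TEXT NOT NULL,
    AttemptCount INTEGER NOT NULL DEFAULT 0,
    LastAttemptUtc TEXT NULL,
    LastError TEXT NULL,
    DeadLettered INTEGER NOT NULL DEFAULT 0
);
CREATE INDEX IF NOT EXISTS IX_Queue_Drain ON Queue (DeadLettered, RowId);
"""


class EnqueueSketch:
    """Hypothetical model of the enqueue hot path; not the real sink."""

    def __init__(self, path, resync_every=10_000):
        self.db = sqlite3.connect(path)
        self.db.execute("PRAGMA journal_mode=WAL")    # readers do not block the writer
        self.db.execute("PRAGMA busy_timeout=5000")   # wait up to 5 s on a locked file
        self.db.executescript(SCHEMA)
        self.resync_every = resync_every
        self.enqueues_since_sync = 0
        self.live_rows = self._count_live()           # seeded once at startup

    def _count_live(self):
        row = self.db.execute(
            "SELECT COUNT(*) FROM Queue WHERE DeadLettered = 0").fetchone()
        return row[0]

    def enqueue(self, alarm_id, enqueued_utc, event):
        # Hot path: one INSERT, committed before returning.
        with self.db:
            self.db.execute(
                "INSERT INTO Queue (AlarmId, EnqueuedUtc, PayloadJson) "
                "VALUES (?, ?, ?)",
                (alarm_id, enqueued_utc, json.dumps(event)))
        self.live_rows += 1
        self.enqueues_since_sync += 1
        if self.enqueues_since_sync >= self.resync_every:
            # Periodic re-sync from storage defends against counter drift.
            self.live_rows = self._count_live()
            self.enqueues_since_sync = 0
```

The point of the counter is that the per-enqueue cost stays at one INSERT; the COUNT(*) scan is paid only once per re-sync interval.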
Drain worker
StartDrainLoop(tickInterval) starts a self-rescheduling one-shot
System.Threading.Timer (not started automatically — tests drive
DrainOnceAsync deterministically). Each tick:
- Purges aged dead-lettered rows past the retention window.
- Reads up to batchSize non-dead-lettered rows in RowId order.
- Rows with un-deserializable payloads are dead-lettered immediately (by their own RowId) so they can't stall the queue head.
- The remaining batch is handed to IAlarmHistorianWriter.WriteBatchAsync, and each outcome is applied in one transaction: Ack deletes the row, PermanentFail flips its DeadLettered flag, RetryPlease bumps its attempt count and leaves it queued.
- The timer re-arms its next due-time to max(tickInterval, currentBackoff).
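The read, dead-letter, write, and apply steps of a tick can be sketched as follows. This is an illustrative Python model, not the C# drain worker: drain_once, the Outcome enum, and the write_batch callable are hypothetical stand-ins for DrainOnceAsync, HistorianWriteOutcome, and IAlarmHistorianWriter.WriteBatchAsync. It assumes the Queue table from the schema above already exists, and it leaves out the retention purge and the timer re-arm.

```python
import json
import sqlite3
from enum import Enum


class Outcome(Enum):
    ACK = "Ack"
    RETRY_PLEASE = "RetryPlease"
    PERMANENT_FAIL = "PermanentFail"


def drain_once(db, write_batch, batch_size=100):
    """Run one drain pass; return True only if every event was acknowledged."""
    rows = db.execute(
        "SELECT RowId, PayloadJson FROM Queue WHERE DeadLettered = 0 "
        "ORDER BY RowId LIMIT ?", (batch_size,)).fetchall()

    batch, corrupt = [], []
    for row_id, payload in rows:
        try:
            batch.append((row_id, json.loads(payload)))
        except json.JSONDecodeError:
            corrupt.append(row_id)

    # Corrupt payloads are dead-lettered by their own RowId before the write,
    # so they cannot stall the head of the queue.
    with db:
        for row_id in corrupt:
            db.execute(
                "UPDATE Queue SET DeadLettered = 1 WHERE RowId = ?", (row_id,))

    if not batch:
        return True

    outcomes = write_batch([event for _, event in batch])
    if len(outcomes) != len(batch):
        return False  # cardinality violation: treat as a failed batch

    # Apply every outcome in a single transaction.
    with db:
        for (row_id, _), outcome in zip(batch, outcomes):
            if outcome is Outcome.ACK:
                db.execute("DELETE FROM Queue WHERE RowId = ?", (row_id,))
            elif outcome is Outcome.PERMANENT_FAIL:
                db.execute(
                    "UPDATE Queue SET DeadLettered = 1 WHERE RowId = ?",
                    (row_id,))
            else:  # RETRY_PLEASE: stay queued, count the attempt
                db.execute(
                    "UPDATE Queue SET AttemptCount = AttemptCount + 1 "
                    "WHERE RowId = ?", (row_id,))
    return all(o is Outcome.ACK for o in outcomes)
```

The boolean return value is a simplification: a caller could use it to decide whether to bump or reset the backoff.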
Backoff ladder (applied to the timer's next due-time, so a historian outage
genuinely slows the drain cadence): 1s → 2s → 5s → 15s → 60s cap. Any
RetryPlease outcome — or a writer exception, or a writer cardinality violation
(outcome count ≠ event count) — bumps the backoff and sets DrainState = BackingOff; a clean batch resets it. The async-void timer callback is fully
guarded: a fault is logged and recorded into GetStatus() rather than lost as
an unobserved task exception.
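The interaction between the ladder and the tick interval is easiest to see with numbers. The sketch below is a hypothetical Python model (BackoffSketch is not a type in the codebase). It assumes the first failed batch selects the 1 s rung and that a clean batch clears the backoff entirely; the source states only the ladder values and the max(tickInterval, currentBackoff) rule.

```python
BACKOFF_LADDER_S = (1, 2, 5, 15, 60)  # seconds; the last rung is the cap


class BackoffSketch:
    """Hypothetical model of how the next timer due-time is chosen."""

    def __init__(self, tick_interval_s):
        self.tick_interval_s = tick_interval_s
        self.step = -1  # -1 means "not backing off"

    @property
    def current_backoff_s(self):
        return 0 if self.step < 0 else BACKOFF_LADDER_S[self.step]

    def record(self, clean_batch):
        """Record a batch result and return the next due-time in seconds."""
        if clean_batch:
            self.step = -1  # a clean batch resets the ladder
        else:
            # Climb one rung per failed batch, staying on the last rung.
            self.step = min(self.step + 1, len(BACKOFF_LADDER_S) - 1)
        return max(self.tick_interval_s, self.current_backoff_s)
```

With a 2 s tick interval, six consecutive failed batches under this model give due-times of 2, 2, 5, 15, 60, and 60 seconds, and the next clean batch returns the cadence to 2 seconds.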
Durability bound (important)
The durability guarantee is bounded by capacity (default 1,000,000 rows).
When the non-dead-lettered queue reaches capacity, EnqueueAsync evicts the
oldest non-dead-lettered rows (oldest RowId first) to make room, logs a WARN,
and increments HistorianSinkStatus.EvictedCount. Under a sustained historian
outage, accepted alarm events can therefore be dropped before delivery. A
non-zero EvictedCount is a data-loss signal that requires operator attention —
it surfaces silent loss without log scraping.
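The eviction rule can be illustrated with a small Python model. The function name, the state dictionary, and the choice to evict inside the same transaction as the insert are assumptions for illustration; the source specifies only that the oldest non-dead-lettered rows are evicted by RowId, that a WARN is logged, and that EvictedCount is incremented. The sketch assumes the Queue table from the schema above already exists.

```python
import logging
import sqlite3

log = logging.getLogger("historian-sink-sketch")


def enqueue_with_capacity(db, state, alarm_id, enqueued_utc, payload_json,
                          capacity):
    """Insert one row, first evicting the oldest live rows if the queue is full.

    `state` is a hypothetical dict holding the in-memory `live_rows` counter
    and the `evicted_count` that a status snapshot would report.
    """
    with db:  # eviction and insert commit together
        overflow = state["live_rows"] + 1 - capacity
        if overflow > 0:
            cur = db.execute(
                "DELETE FROM Queue WHERE RowId IN ("
                "  SELECT RowId FROM Queue WHERE DeadLettered = 0"
                "  ORDER BY RowId LIMIT ?)",
                (overflow,))
            state["live_rows"] -= cur.rowcount
            state["evicted_count"] += cur.rowcount
            log.warning("queue at capacity %d: evicted %d oldest row(s)",
                        capacity, cur.rowcount)
        db.execute(
            "INSERT INTO Queue (AlarmId, EnqueuedUtc, PayloadJson) "
            "VALUES (?, ?, ?)",
            (alarm_id, enqueued_utc, payload_json))
        state["live_rows"] += 1
```

Dead-lettered rows are excluded from both the counter and the eviction query, so a backlog of dead letters does not consume the live queue's capacity in this model.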
Dead-letter + operator recovery
PermanentFail and corrupt-payload rows are retained in-place with
DeadLettered = 1 for the retention window (default 30 days) so operators can
inspect them before the sweeper purges them. RetryDeadLettered() is the
operator action (from the AdminUI) that clears the dead-letter flag and attempt
count on every dead-lettered row, returning them to the regular queue with a
fresh backoff.
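Both dead-letter operations reduce to one statement each against the Queue table. The Python sketch below is illustrative only: the function names are hypothetical, and measuring retention from EnqueuedUtc (stored as an ISO 8601 UTC string) is an assumption, since the source does not say which timestamp the sweeper compares against. It assumes the Queue table from the schema above already exists.

```python
import sqlite3
from datetime import datetime, timedelta, timezone


def purge_aged_dead_letters(db, now_utc, retention=timedelta(days=30)):
    """Delete dead-lettered rows older than the retention window."""
    # Assumption: age is measured from EnqueuedUtc, and every timestamp is
    # written with datetime.isoformat() in UTC so string order is time order.
    cutoff = (now_utc - retention).isoformat()
    with db:
        cur = db.execute(
            "DELETE FROM Queue WHERE DeadLettered = 1 AND EnqueuedUtc < ?",
            (cutoff,))
    return cur.rowcount


def retry_dead_lettered(db):
    """Return every dead-lettered row to the regular queue."""
    with db:
        cur = db.execute(
            "UPDATE Queue SET DeadLettered = 0, AttemptCount = 0 "
            "WHERE DeadLettered = 1")
    return cur.rowcount
```

Because rows keep their original RowId, revived rows in this model sort ahead of newer events on the next drain pass.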
Runtime wiring
Production routes alarm transitions through the Akka cluster. The
HistorianAdapterActor
(Runtime/Historian/HistorianAdapterActor.cs)
bridges messages from the scripted-alarm actor into the sink's EnqueueAsync,
fire-and-forget so the actor loop is never blocked on historian reachability.
The WonderwareHistorianClient is the IAlarmHistorianWriter the drain worker
delegates to. See ServiceHosting.md for the sidecar setup.
See also
- AlarmTracking.md: the three alarm sources and the OPC UA Part 9 surface; which alarms route to this sink.
- DriverLifecycle.md: IHistorianDataSource (the historian read surface; this page covers the write path) and the WonderwareHistorianClient.
- ScriptedAlarms.md: the scripted-alarm engine that emits most events into this sink.
- ServiceHosting.md: the optional Wonderware historian sidecar.