# Alarm Historian: store-and-forward SQLite sink

Reference for `ZB.MOM.WW.OtOpcUa.Core.AlarmHistorian` ([`src/Core/ZB.MOM.WW.OtOpcUa.Core.AlarmHistorian/`](../src/Core/ZB.MOM.WW.OtOpcUa.Core.AlarmHistorian/)), the durable local queue that historizes alarm transitions to AVEVA Historian without ever blocking the alarm engine or operator actions.

This is the *sink mechanics* doc. For how the three alarm sources converge on the OPC UA Part 9 surface and which alarms route here, see [AlarmTracking.md](AlarmTracking.md). For the historian client that drains this queue, see [DriverLifecycle.md](DriverLifecycle.md#ihistoriandatasource--server-side-historian-read-surface) and [ServiceHosting.md](ServiceHosting.md).

---

## Why store-and-forward

Scripted alarms (and any future non-Galaxy `IAlarmSource`, e.g. AB CIP ALMD) must reach AVEVA Historian, but the historian sidecar can be slow, busy, or disconnected. The sink decouples the alarm engine from historian reachability: every qualifying transition is committed to a **local SQLite queue first**, and a background drain worker forwards rows to the historian on a backoff-aware cadence. Operator acks and alarm-state transitions are never blocked waiting on the historian.

> Galaxy-native alarms with `$Alarm*` extensions reach AVEVA Historian directly
> via System Platform's `HistorizeToAveva` toggle; they do **not** flow through
> this sink. This path is exclusively for non-Galaxy alarm producers.

---

## Contracts

All in [`IAlarmHistorianSink.cs`](../src/Core/ZB.MOM.WW.OtOpcUa.Core.AlarmHistorian/IAlarmHistorianSink.cs) unless noted.

- **`IAlarmHistorianSink`**: the intake contract. `EnqueueAsync(evt, ct)` durably enqueues an event and returns as soon as the queue row is committed (fire-and-forget from the engine's perspective; the sink must not block the emitting thread). `GetStatus()` returns a `HistorianSinkStatus` snapshot.
- **`NullAlarmHistorianSink`**: the no-op default for tests and deployments that don't historize alarms. It is the default DI binding (registered in the Runtime's `AddOtOpcUaRuntime`); production overrides it with `SqliteStoreAndForwardSink`.
- **`AlarmHistorianEvent`** ([`AlarmHistorianEvent.cs`](../src/Core/ZB.MOM.WW.OtOpcUa.Core.AlarmHistorian/AlarmHistorianEvent.cs)): the source-agnostic event record. Fields: `AlarmId`, `EquipmentPath` (UNS path, doubles as Historian's SourceNode), `AlarmName`, `AlarmTypeName` (Part 9 subtype), `Severity`, `EventKind` (free-form transition string: "Activated"/"Cleared"/"Acknowledged"/etc.), `Message`, `User`, `Comment`, `TimestampUtc`.
- **`IAlarmHistorianWriter`**: what the drain worker delegates writes to. `WriteBatchAsync(batch, ct)` returns one `HistorianWriteOutcome` per event, in order. Production binds this to `WonderwareHistorianClient` (the AVEVA Historian sidecar IPC client).
- **`HistorianWriteOutcome`**: per-event drain result. `Ack` (persisted; remove from queue), `RetryPlease` (transient failure; leave queued, retry after backoff), `PermanentFail` (malformed/unrecoverable; move to dead-letter).
- **`HistorianSinkStatus`**: diagnostic snapshot surfaced to the AdminUI and `/healthz`. Fields: `QueueDepth`, `DeadLetterDepth`, `LastDrainUtc`, `LastSuccessUtc`, `LastError`, `DrainState`, and `EvictedCount`.
- **`HistorianDrainState`**: `Disabled` / `Idle` / `Draining` / `BackingOff`.

---

## SqliteStoreAndForwardSink

[`SqliteStoreAndForwardSink.cs`](../src/Core/ZB.MOM.WW.OtOpcUa.Core.AlarmHistorian/SqliteStoreAndForwardSink.cs) is the production `IAlarmHistorianSink`. Construction takes a SQLite database path, an `IAlarmHistorianWriter`, a logger, and optional `batchSize` (default 100), `capacity` (default 1,000,000), `deadLetterRetention` (default 30 days), and a test clock.
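
The writer contract listed under Contracts (one `HistorianWriteOutcome` per event, in order) can be illustrated with a minimal sketch. It is written in Python for brevity rather than the project's C#, and `checked_write`, `stub_writer`, and the dict-shaped events are hypothetical names for illustration, not part of the library.

```python
from enum import Enum
from typing import Callable, Dict, List, Sequence, Tuple

Event = Dict[str, object]  # hypothetical stand-in for AlarmHistorianEvent


class HistorianWriteOutcome(Enum):
    ACK = "Ack"                       # persisted: remove from queue
    RETRY_PLEASE = "RetryPlease"      # transient failure: leave queued
    PERMANENT_FAIL = "PermanentFail"  # unrecoverable: move to dead-letter


Writer = Callable[[Sequence[Event]], Sequence[HistorianWriteOutcome]]


def checked_write(
    write_batch: Writer, batch: Sequence[Event]
) -> List[Tuple[Event, HistorianWriteOutcome]]:
    """Call the writer and pair each event with its outcome by position."""
    outcomes = list(write_batch(batch))
    if len(outcomes) != len(batch):
        # Cardinality violation. This sketch raises; a caller would treat
        # the error as a reason to back off and leave the rows queued.
        raise ValueError(
            f"writer returned {len(outcomes)} outcomes for {len(batch)} events"
        )
    return list(zip(batch, outcomes))


def stub_writer(batch: Sequence[Event]) -> List[HistorianWriteOutcome]:
    """Hypothetical writer: rejects events with no AlarmId, accepts the rest."""
    return [
        HistorianWriteOutcome.ACK
        if evt.get("AlarmId")
        else HistorianWriteOutcome.PERMANENT_FAIL
        for evt in batch
    ]


pairs = checked_write(stub_writer, [{"AlarmId": "A1"}, {"AlarmId": ""}])
```

Pairing by position is what makes the "in order" requirement matter: a writer that reorders or drops outcomes would cause the wrong rows to be deleted or dead-lettered.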
### Queue table

The sink owns one SQLite table (created on construction, WAL journal mode):

```sql
CREATE TABLE Queue (
    RowId          INTEGER PRIMARY KEY AUTOINCREMENT,
    AlarmId        TEXT NOT NULL,
    EnqueuedUtc    TEXT NOT NULL,
    PayloadJson    TEXT NOT NULL,  -- JSON-serialized AlarmHistorianEvent
    AttemptCount   INTEGER NOT NULL DEFAULT 0,
    LastAttemptUtc TEXT NULL,
    LastError      TEXT NULL,
    DeadLettered   INTEGER NOT NULL DEFAULT 0
);
CREATE INDEX IX_Queue_Drain ON Queue (DeadLettered, RowId);
```

`EnqueueAsync` does a single `INSERT` on the hot path. To avoid a `SELECT COUNT(*)` on every enqueue, the sink keeps an in-memory non-dead-lettered row counter (seeded at startup, kept current by every mutation, and re-synced from storage every 10,000 enqueues to defend against drift). SQLite writer contention is handled via `PRAGMA busy_timeout=5000` + WAL so an enqueue/drain collision waits out the file lock instead of failing fast.

### Drain worker

`StartDrainLoop(tickInterval)` starts a **self-rescheduling one-shot `System.Threading.Timer`** (not started automatically; tests drive `DrainOnceAsync` deterministically). Each tick:

1. Purges aged dead-lettered rows past the retention window.
2. Reads up to `batchSize` non-dead-lettered rows in `RowId` order.
3. Rows with un-deserializable payloads are dead-lettered immediately (by their own `RowId`) so they can't stall the queue head.
4. The remaining batch is handed to `IAlarmHistorianWriter.WriteBatchAsync`, and each outcome is applied in one transaction: `Ack` deletes the row, `PermanentFail` flips its `DeadLettered` flag, `RetryPlease` bumps its attempt count and leaves it queued.
5. The timer re-arms its next due-time to `max(tickInterval, currentBackoff)`.

**Backoff ladder** (applied to the timer's next due-time, so a historian outage genuinely slows the drain cadence): 1s → 2s → 5s → 15s → 60s cap.
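
The ladder and the re-arm rule from step 5 can be sketched as follows. This is a Python illustration rather than the sink's C#; the `Backoff` class, its method names, and the choice to model "no backoff in effect" as zero seconds are assumptions made for the example.

```python
LADDER_SECONDS = (1, 2, 5, 15, 60)  # 1s -> 2s -> 5s -> 15s -> 60s cap


class Backoff:
    """Saturating walk up the ladder; reset drops back to no backoff."""

    def __init__(self) -> None:
        self._step = -1  # assumption: -1 means no backoff is in effect

    @property
    def current(self) -> float:
        return 0.0 if self._step < 0 else float(LADDER_SECONDS[self._step])

    def bump(self) -> None:
        # Move one rung up, staying on the last rung once the cap is reached.
        self._step = min(self._step + 1, len(LADDER_SECONDS) - 1)

    def reset(self) -> None:
        self._step = -1

    def next_due(self, tick_interval: float) -> float:
        # Re-arm rule from step 5: max(tickInterval, currentBackoff).
        return max(tick_interval, self.current)


backoff = Backoff()
healthy_due = backoff.next_due(2.0)   # no backoff: the tick interval wins
for _ in range(3):
    backoff.bump()                    # three failed batches: 1s, 2s, 5s
degraded_due = backoff.next_due(2.0)  # the 5s rung now exceeds the tick
```

Taking the maximum means the ladder only ever slows the drain: a tick interval longer than the current rung is left unchanged.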
Any `RetryPlease` outcome, a writer exception, or a writer cardinality violation (outcome count ≠ event count) bumps the backoff and sets `DrainState = BackingOff`; a clean batch resets it. The async-void timer callback is fully guarded: a fault is logged and recorded into `GetStatus()` rather than lost as an unobserved task exception.

### Durability bound (important)

**The durability guarantee is bounded by `capacity` (default 1,000,000 rows).** When the non-dead-lettered queue reaches capacity, `EnqueueAsync` evicts the oldest non-dead-lettered rows (oldest `RowId` first) to make room, logs a WARN, and increments `HistorianSinkStatus.EvictedCount`. Under a sustained historian outage, accepted alarm events can therefore be dropped before delivery. A non-zero `EvictedCount` is a data-loss signal that requires operator attention; it surfaces silent loss without log scraping.

### Dead-letter + operator recovery

`PermanentFail` and corrupt-payload rows are retained in-place with `DeadLettered = 1` for the retention window (default 30 days) so operators can inspect them before the sweeper purges them. `RetryDeadLettered()` is the operator action (from the AdminUI) that clears the dead-letter flag and attempt count on every dead-lettered row, returning them to the regular queue with a fresh backoff.

---

## Runtime wiring

Production routes alarm transitions through the Akka cluster. The `HistorianAdapterActor` ([`Runtime/Historian/HistorianAdapterActor.cs`](../src/Server/ZB.MOM.WW.OtOpcUa.Runtime/Historian/HistorianAdapterActor.cs)) bridges messages from the scripted-alarm actor into the sink's `EnqueueAsync`, fire-and-forget so the actor loop is never blocked on historian reachability. The `WonderwareHistorianClient` is the `IAlarmHistorianWriter` the drain worker delegates to. See [ServiceHosting.md](ServiceHosting.md) for the sidecar setup.
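
To see the queue mechanics described earlier working together, here is a minimal end-to-end sketch against the `Queue` schema using Python's stdlib `sqlite3` rather than the sink's C#. The function names (`open_queue`, `enqueue`, `drain_once`) are hypothetical. The sketch deliberately simplifies: it counts rows with `COUNT(*)` instead of the in-memory counter, skips WAL and `busy_timeout`, and omits the dead-letter purge, corrupt-payload handling, and writer-exception handling. Treating a batch with no `RetryPlease` as "clean" is an assumption.

```python
import json
import sqlite3

SCHEMA = """
CREATE TABLE Queue (
    RowId INTEGER PRIMARY KEY AUTOINCREMENT,
    AlarmId TEXT NOT NULL,
    EnqueuedUtc TEXT NOT NULL,
    PayloadJson TEXT NOT NULL,
    AttemptCount INTEGER NOT NULL DEFAULT 0,
    LastAttemptUtc TEXT NULL,
    LastError TEXT NULL,
    DeadLettered INTEGER NOT NULL DEFAULT 0
);
CREATE INDEX IX_Queue_Drain ON Queue (DeadLettered, RowId);
"""


def open_queue(path=":memory:"):
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def enqueue(conn, evt, now_utc, capacity):
    """Insert one event; evict oldest live rows first if at capacity.

    Returns the number of rows evicted to make room.
    """
    with conn:  # one transaction: eviction and insert commit together
        live = conn.execute(
            "SELECT COUNT(*) FROM Queue WHERE DeadLettered = 0"
        ).fetchone()[0]
        overflow = live - capacity + 1
        evicted = 0
        if overflow > 0:
            cur = conn.execute(
                "DELETE FROM Queue WHERE RowId IN ("
                " SELECT RowId FROM Queue WHERE DeadLettered = 0"
                " ORDER BY RowId LIMIT ?)",
                (overflow,),
            )
            evicted = cur.rowcount
        conn.execute(
            "INSERT INTO Queue (AlarmId, EnqueuedUtc, PayloadJson)"
            " VALUES (?, ?, ?)",
            (evt["AlarmId"], now_utc, json.dumps(evt)),
        )
    return evicted


def drain_once(conn, write_batch, now_utc, batch_size=100):
    """Run one drain pass. Returns True if the batch was clean."""
    rows = conn.execute(
        "SELECT RowId, PayloadJson FROM Queue WHERE DeadLettered = 0"
        " ORDER BY RowId LIMIT ?",
        (batch_size,),
    ).fetchall()
    if not rows:
        return True
    events = [json.loads(payload) for _, payload in rows]
    outcomes = list(write_batch(events))
    if len(outcomes) != len(events):
        return False  # cardinality violation: rows stay queued, caller backs off
    clean = True
    with conn:  # apply every outcome in one transaction
        for (row_id, _), outcome in zip(rows, outcomes):
            if outcome == "Ack":
                conn.execute("DELETE FROM Queue WHERE RowId = ?", (row_id,))
            elif outcome == "PermanentFail":
                conn.execute(
                    "UPDATE Queue SET DeadLettered = 1, LastAttemptUtc = ?"
                    " WHERE RowId = ?",
                    (now_utc, row_id),
                )
            else:  # "RetryPlease": bump the attempt count, leave queued
                conn.execute(
                    "UPDATE Queue SET AttemptCount = AttemptCount + 1,"
                    " LastAttemptUtc = ? WHERE RowId = ?",
                    (now_utc, row_id),
                )
                clean = False
    return clean
```

A caller would combine the boolean returned by `drain_once` with whatever backoff policy it uses, bumping on `False` and resetting on `True`.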

---

## See also

- [AlarmTracking.md](AlarmTracking.md): the three alarm sources and the OPC UA Part 9 surface; which alarms route to this sink.
- [DriverLifecycle.md](DriverLifecycle.md): `IHistorianDataSource` (the historian *read* surface; this page covers the *write* path) and the `WonderwareHistorianClient`.
- [ScriptedAlarms.md](ScriptedAlarms.md): the scripted-alarm engine that emits most events into this sink.
- [ServiceHosting.md](ServiceHosting.md): the optional Wonderware historian sidecar.