# Notification Outbox

The Notification Outbox is the central component that receives store-and-forwarded notifications from site clusters, persists each one to the `Notifications` table in the central MS SQL database, and delivers them through per-type delivery adapters. It is the first outbox component to run centrally; the Store-and-Forward Engine remains site-only.

## Overview

Notification Outbox (#21) runs exclusively on the central cluster. Sites no longer send notifications directly via SMTP: a script's `Notify.Send` call generates a `NotificationId` (GUID) locally, the notification is stored in the site S&F buffer, forwarded to central via Central–Site Communication, and the central outbox owns all dispatch and delivery from that point on.

The component code lives in `src/ZB.MOM.WW.ScadaBridge.NotificationOutbox/`, with a flat layout:

- Root: `NotificationOutboxActor`, `NotificationOutboxOptions`, `ServiceCollectionExtensions`.
- `Delivery/`: `INotificationDeliveryAdapter`, `DeliveryOutcome`/`DeliveryResult`, `EmailNotificationDeliveryAdapter`.
- `Messages/`: `InternalMessages` (actor-internal timer and pipe messages, never sent over the network).

Shared types used throughout (`Notification`, `NotificationStatus`, `NotificationType`, `INotificationOutboxRepository`, and all ClusterClient message contracts) live in `src/ZB.MOM.WW.ScadaBridge.Commons/`.

The DI entry point is `ServiceCollectionExtensions.AddNotificationOutbox`, called by the Host on central nodes. It binds `NotificationOutboxOptions` and registers the typed delivery adapters. The Host separately registers `NotificationOutboxActor` as a cluster singleton and wires the ClusterClient receptionist so inbound `NotificationSubmit` messages reach the actor regardless of which central node is active.

## Key Concepts

### `Notifications` table as source of truth

The central MS SQL `Notifications` table is the single source of audit truth for every notification in the system.
It holds one row per `NotificationId` (GUID primary key), regardless of delivery channel. The table is type-agnostic: the `Type` discriminator (`Email`; future channels add new enum members) selects the delivery adapter while the rest of the schema is shared. The `Notifications` table answers both operational queries (current status, retry count, next attempt) and KPI queries (queue depth, stuck count, throughput); no separate time-series store is needed.

### Status lifecycle

| Status | Where it lives | Meaning |
|---|---|---|
| `Forwarding` | Site-local only | Notification is in the site S&F buffer; never stored in `Notifications`. |
| `Pending` | Central | Ingested; awaiting first dispatch sweep. |
| `Retrying` | Central | Transient failure; `NextAttemptAt` schedules the next attempt. |
| `Delivered` | Central, terminal | Successfully sent; `DeliveredAt` and `ResolvedTargets` are set. |
| `Parked` | Central, terminal | Permanent failure or retries exhausted; `LastError` records why. |
| `Discarded` | Central, terminal | An operator cancelled a `Parked` notification; the row is kept for the audit record. |

`Forwarding` is answered site-locally by `Notify.Status(id)`; once the ack arrives, queries round-trip to the `Notifications` table on central.

### At-least-once site→central handoff

Central ingests a `NotificationSubmit` with `insert-if-not-exists` on `NotificationId`, then acks the site with `NotificationSubmitAck`. The site S&F engine clears the message only after receiving that ack. A lost ack causes the site to resend; the GUID idempotency key makes the resend a no-op. Because the ack is sent only after the row is persisted, no notification is lost to a race between the ack and a central failover: the row already exists.

### No Akka-level replication

All outbox state lives in MS SQL, which is already the central HA store. There is no Akka replication of the actor's in-memory state.
On a central failover the new active node's singleton starts a fresh dispatch sweep; `Pending` and due `Retrying` rows are reclaimed from the table on the next tick.

## Architecture

### `NotificationOutboxActor`

`NotificationOutboxActor` is a `ReceiveActor` that implements `IWithTimers`. It runs as a cluster singleton on the active central node. The actor is responsible for both the ingest path (accepting `NotificationSubmit` messages) and the dispatch path (running the periodic delivery loop). All async work is executed via `PipeTo(Self)` so every result lands on the actor's mailbox thread, preserving single-threaded actor semantics:

```csharp
public class NotificationOutboxActor : ReceiveActor, IWithTimers
{
    public NotificationOutboxActor(
        IServiceProvider serviceProvider,
        NotificationOutboxOptions options,
        ICentralAuditWriter auditWriter,
        ILogger<NotificationOutboxActor> logger)
    {
        // Ingest path
        Receive<NotificationSubmit>(HandleSubmit);
        Receive<InternalMessages.IngestPersisted>(HandleIngestPersisted);

        // Dispatch and purge timers, plus their piped completion signals
        Receive<InternalMessages.DispatchTick>(_ => HandleDispatchTick());
        Receive<InternalMessages.DispatchComplete>(_ => _dispatching = false);
        Receive<InternalMessages.PurgeTick>(_ => HandlePurgeTick());
        // Purge completion signal (type name assumed here); nothing to do
        Receive<InternalMessages.PurgeComplete>(_ => { });

        // Operator actions and UI/KPI queries
        Receive<NotificationOutboxQueryRequest>(HandleQuery);
        Receive<NotificationStatusQuery>(HandleStatusQuery);
        Receive<NotificationDetailRequest>(HandleDetailRequest);
        Receive<RetryNotificationRequest>(HandleRetry);
        Receive<DiscardNotificationRequest>(HandleDiscard);
        Receive<NotificationKpiRequest>(HandleKpiRequest);
        Receive<PerSiteNotificationKpiRequest>(HandlePerSiteKpiRequest);
    }
}
```

`PreStart` starts two periodic Akka timers: `DispatchTick` at `DispatchInterval` and `PurgeTick` at `PurgeInterval`. A lifecycle-scoped `CancellationTokenSource` (`_shutdownCts`) is created in `PreStart` and cancelled in `PostStop` so any in-flight SMTP send observes coordinated shutdown instead of blocking for a full connect/auth/send timeout.

### Ingest path

`HandleSubmit` maps a `NotificationSubmit` to a `Notification` entity and calls `PersistAsync`, which opens a fresh DI scope, resolves `INotificationOutboxRepository`, and calls `InsertIfNotExistsAsync`. The boolean result is intentionally ignored: an existing row is treated identically to a fresh insert.
The async result is piped back to `Self` as `InternalMessages.IngestPersisted`, which carries the original `Sender` reference so the ack is sent from the actor thread:

```csharp
private void HandleSubmit(NotificationSubmit msg)
{
    // Capture Sender now; it is not valid once the async continuation runs.
    var sender = Sender;
    var notification = BuildNotification(msg);

    PersistAsync(notification).PipeTo(
        Self,
        success: () => new InternalMessages.IngestPersisted(
            msg.NotificationId, sender, Succeeded: true, Error: null),
        failure: ex => new InternalMessages.IngestPersisted(
            msg.NotificationId, sender, Succeeded: false,
            Error: ex.GetBaseException().Message));
}
```

`NotificationSubmitAck(Accepted: true)` is returned for both a fresh insert and an existing row. Only a thrown repository error yields `Accepted: false`, causing the site to retain and retry its S&F message.

### Dispatch loop

On each `DispatchTick` the actor checks a boolean in-flight guard (`_dispatching`). If a sweep is already running the tick is silently dropped, so sweeps never overlap. Otherwise the guard is raised and `RunDispatchPass` is launched:

1. A scoped `INotificationOutboxRepository` fetches `Pending` rows and `Retrying` rows whose `NextAttemptAt ≤ now`, ordered by `CreatedAt` ascending, capped at `DispatchBatchSize`.
2. The retry policy (`maxRetries`, `retryDelay`) is resolved by reading `SmtpConfiguration` from `INotificationRepository`. Non-positive values are clamped to `FallbackMaxRetries = 10` and `FallbackRetryDelay = 1 min` with a warning log so a misconfigured SMTP row does not silently park notifications without retrying.
3. Each notification in the batch is delivered sequentially via `DeliverOneAsync`. Per-notification exceptions are caught and logged so one bad row never aborts the rest of the batch.

`DispatchComplete` (singleton instance) is piped back to `Self` on both the success and failure projections, ensuring the in-flight guard is always cleared even if the sweep faults unexpectedly.

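
The clamping rule in step 2 of the dispatch loop can be illustrated with a minimal sketch. The fallback values come from the text above; the `RetryPolicySketch` class, the `Resolve` method, the tuple return shape, and the `warn` callback (standing in for the `Warning` log) are hypothetical names used only for illustration and are not the component's actual code.

```csharp
using System;

public static class RetryPolicySketch
{
    // Fallback values as documented for the dispatch loop.
    public const int FallbackMaxRetries = 10;
    public static readonly TimeSpan FallbackRetryDelay = TimeSpan.FromMinutes(1);

    // Returns the configured values unchanged when they are positive; otherwise
    // substitutes the fallback and reports the clamp through `warn`
    // (a hypothetical stand-in for the Warning log line).
    public static (int MaxRetries, TimeSpan RetryDelay) Resolve(
        int configuredMaxRetries,
        TimeSpan configuredRetryDelay,
        Action<string> warn)
    {
        var maxRetries = configuredMaxRetries;
        if (maxRetries <= 0)
        {
            warn($"SmtpConfiguration.MaxRetries is non-positive ({configuredMaxRetries}); using {FallbackMaxRetries}.");
            maxRetries = FallbackMaxRetries;
        }

        var retryDelay = configuredRetryDelay;
        if (retryDelay <= TimeSpan.Zero)
        {
            warn($"SmtpConfiguration.RetryDelay is non-positive ({configuredRetryDelay}); using {FallbackRetryDelay}.");
            retryDelay = FallbackRetryDelay;
        }

        return (maxRetries, retryDelay);
    }
}
```

Under these assumptions, `RetryPolicySketch.Resolve(0, TimeSpan.Zero, Console.WriteLine)` yields the fallback pair and emits two warnings, while positive configured values pass through untouched. Clamping instead of failing keeps a misconfigured row from parking every notification on its first transient error.
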
### Delivery adapters

`INotificationDeliveryAdapter` is the per-channel delivery seam:

```csharp
public interface INotificationDeliveryAdapter
{
    // The channel this adapter serves; used as the key in the adapter map.
    NotificationType Type { get; }

    // Attempts one delivery and reports the classified outcome.
    Task<DeliveryOutcome> DeliverAsync(
        Notification notification,
        CancellationToken cancellationToken = default);
}
```

`DeliveryOutcome` carries a `DeliveryResult` enum (`Success`, `TransientFailure`, `PermanentFailure`), resolved recipients on success, and an error string on failure.

The adapter map (`NotificationType → INotificationDeliveryAdapter`) is built lazily on the first dispatch sweep and cached in `_adaptersCache` for the actor's lifetime, paired with an actor-lifetime `IServiceScope` (`_adaptersScope`) disposed in `PostStop`. This avoids rebuilding the dictionary on every sweep while respecting each adapter's scoped DI dependencies.

`EmailNotificationDeliveryAdapter` is the only registered adapter. It resolves the target list and recipients from `INotificationRepository` at delivery time (not at ingest), connects to SMTP via `ISmtpClientWrapper`, acquires an OAuth2 token if configured, and sends a BCC plain-text email. Error classification mirrors the External System Gateway pattern:

| Exception | `DeliveryResult` |
|---|---|
| `SmtpPermanentException` (SMTP 5xx) | `PermanentFailure` |
| Connection/protocol/timeout errors | `TransientFailure` |
| `OperationCanceledException` (shutdown) | propagated, not classified |
| Missing list, empty recipient list, no SMTP config, invalid TLS mode | `PermanentFailure` |
| Unclassified (e.g. OAuth2 token fetch failure) | `PermanentFailure` |

### Status transitions in `DeliverOneAsync`

```csharp
switch (outcome.Result)
{
    case DeliveryResult.Success:
        notification.Status = NotificationStatus.Delivered;
        notification.DeliveredAt = now;
        notification.ResolvedTargets = outcome.ResolvedTargets;
        break;

    case DeliveryResult.TransientFailure:
        notification.RetryCount++;
        notification.LastError = outcome.Error;
        if (notification.RetryCount >= maxRetries)
        {
            // Retries exhausted: park instead of scheduling another attempt.
            notification.Status = NotificationStatus.Parked;
        }
        else
        {
            notification.Status = NotificationStatus.Retrying;
            notification.NextAttemptAt = now + retryDelay;
        }
        break;

    case DeliveryResult.PermanentFailure:
        notification.Status = NotificationStatus.Parked;
        notification.LastError = outcome.Error;
        break;
}

await outboxRepository.UpdateAsync(notification, cancellationToken);
```

Every attempt also writes audit rows via `ICentralAuditWriter` (see Audit integration below). An audit-write failure is caught and logged, and never propagates back into the dispatcher; the delivery outcome on the `Notifications` row stands regardless.

### Audit integration

Each delivery attempt emits at least one `AuditChannel.Notification` / `AuditKind.NotifyDeliver` row via `ICentralAuditWriter`:

- An `AuditStatus.Attempted` row (always, per attempt), carrying attempt duration in milliseconds.
- A second, terminal row (`Delivered`, `Parked`, or `Discarded`) only when the post-outcome status is terminal; a transient failure that transitions the notification to `Retrying` emits only the `Attempted` row.

`CorrelationId` on the emitted row(s) is parsed from the `NotificationId` GUID. `ExecutionId` and `ParentExecutionId` are echoed from `Notification.OriginExecutionId` / `Notification.OriginParentExecutionId`, linking the central `NotifyDeliver` rows to the site-emitted `NotifySend` row for the same script run. The `Actor` field is `"system"` because there is no authenticated user at dispatch time.
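
The per-attempt audit rule described above (one `Attempted` row always, plus a terminal row only when the resulting status is terminal) can be sketched as follows. This is an illustrative sketch, not the actor's code: the local `NotificationStatus` enum is a stand-in for the Commons type, the string row labels replace real audit events, and `AuditRowSketch` is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;

// Local stand-in for the Commons enum (central statuses only).
public enum NotificationStatus { Pending, Retrying, Delivered, Parked, Discarded }

public static class AuditRowSketch
{
    // Terminal statuses as listed in the status lifecycle table.
    public static bool IsTerminal(NotificationStatus status) =>
        status == NotificationStatus.Delivered
        || status == NotificationStatus.Parked
        || status == NotificationStatus.Discarded;

    // Returns the audit status labels one delivery attempt would emit,
    // given the notification's status after the outcome was applied.
    public static IReadOnlyList<string> RowsForAttempt(NotificationStatus postOutcomeStatus)
    {
        var rows = new List<string> { "Attempted" };
        if (IsTerminal(postOutcomeStatus))
        {
            rows.Add(postOutcomeStatus.ToString());
        }
        return rows;
    }
}
```

With this sketch, an attempt that leaves the row `Retrying` yields only `Attempted`, while one that parks the row yields `Attempted` followed by `Parked`.
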
Manual discard via `HandleDiscard` also emits a terminal `Discarded` row (with a null error, because the discard is operator-driven, not a delivery failure).

### Purge

`HandlePurgeTick` fires every `PurgeInterval` (1 day by default). `RunPurgePass` opens a scoped `INotificationOutboxRepository` and calls `DeleteTerminalOlderThanAsync(cutoff)`, where `cutoff = now − TerminalRetention` (default 365 days). The deleted count is logged at `Information`. Purge faults are caught internally so the returned task never faults.

## Usage

The outbox is consumed through two DI seams.

**Ingest**: the Host registers `NotificationOutboxActor` as an Akka cluster singleton and with the `ClusterClientReceptionist`. Site clusters send `NotificationSubmit` messages through Central–Site Communication; the actor ingests them without further configuration by the caller.

**Operator actions / UI queries**: the Central UI's Notification Outbox page and the ManagementActor resolve the singleton `IActorRef` and send query or command messages:

| Message | Actor response | Allowed when |
|---|---|---|
| `NotificationOutboxQueryRequest` | `NotificationOutboxQueryResponse` | Any time |
| `NotificationStatusQuery` | `NotificationStatusResponse` | Any time |
| `NotificationDetailRequest` | `NotificationDetailResponse` | Any time |
| `RetryNotificationRequest` | `RetryNotificationResponse` | Row is `Parked` |
| `DiscardNotificationRequest` | `DiscardNotificationResponse` | Row is `Parked` |
| `NotificationKpiRequest` | `NotificationKpiResponse` | Any time |
| `PerSiteNotificationKpiRequest` | `PerSiteNotificationKpiResponse` | Any time |

Retry resets the notification to `Pending` with `RetryCount = 0`, `NextAttemptAt = null`, and `LastError = null` so the dispatch loop reclaims it on the next sweep. Discard transitions the row to terminal `Discarded` and emits the corresponding audit row.

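
The `Parked`-only precondition and the field resets for Retry and Discard can be sketched as below. This is a simplified illustration under stated assumptions: `NotificationRow` is a hypothetical trimmed-down stand-in for the Commons `Notification` entity, the enum is a local copy, and `TryRetry` / `TryDiscard` are invented names. The real handlers also persist the row and, for Discard, write the audit row; whether Discard alters `LastError` is not stated in this document, so the sketch leaves it untouched.

```csharp
using System;

// Local stand-in for the Commons enum (central statuses only).
public enum NotificationStatus { Pending, Retrying, Delivered, Parked, Discarded }

// Hypothetical, trimmed-down stand-in for the Notification entity.
public sealed class NotificationRow
{
    public NotificationStatus Status { get; set; }
    public int RetryCount { get; set; }
    public DateTimeOffset? NextAttemptAt { get; set; }
    public string LastError { get; set; }
}

public static class OperatorActionSketch
{
    // Retry is allowed only on Parked rows; it resets the row so the
    // dispatch loop reclaims it as Pending on the next sweep.
    public static bool TryRetry(NotificationRow row)
    {
        if (row.Status != NotificationStatus.Parked) return false;
        row.Status = NotificationStatus.Pending;
        row.RetryCount = 0;
        row.NextAttemptAt = null;
        row.LastError = null;
        return true;
    }

    // Discard is allowed only on Parked rows; the row is kept and
    // moved to the terminal Discarded status.
    public static bool TryDiscard(NotificationRow row)
    {
        if (row.Status != NotificationStatus.Parked) return false;
        row.Status = NotificationStatus.Discarded;
        return true;
    }
}
```

Returning `false` for a row that is not `Parked` mirrors the "Allowed when" column in the table above; how the actual response messages report a rejected request is not covered here.
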
## Configuration

Options are bound from `ScadaBridge:NotificationOutbox` via `NotificationOutboxOptions`:

| Key | Default | Description |
|---|---|---|
| `DispatchInterval` | `00:00:10` (10 s) | How often the dispatcher polls for due rows. |
| `DispatchBatchSize` | `100` | Maximum notifications claimed per sweep. |
| `StuckAgeThreshold` | `00:10:00` (10 min) | Age beyond which a non-terminal row is counted as stuck in KPIs and the UI badge. Display-only; does not affect dispatcher behaviour. |
| `TerminalRetention` | `365` days | How long terminal rows are kept before the daily purge removes them. |
| `PurgeInterval` | `1` day | Cadence of the background purge sweep. |
| `DeliveredKpiWindow` | `00:01:00` (1 min) | Trailing window for the "delivered last interval" throughput KPI. |

Delivery retry policy (`MaxRetries`, `RetryDelay`) is read at runtime from `SmtpConfiguration` via `INotificationRepository`, not from `NotificationOutboxOptions`. Non-positive values are clamped to `FallbackMaxRetries = 10` and `FallbackRetryDelay = 1 min` with a `Warning` log.

## Dependencies & Interactions

- [Commons (#16)](./Commons.md): owns `Notification`, `NotificationStatus`, `NotificationType`, `INotificationOutboxRepository`, `INotificationRepository`, and all message contracts (`NotificationSubmit`, `NotificationSubmitAck`, `NotificationStatusQuery`, `NotificationKpiRequest`, and their responses). Also owns `ScadaBridgeAuditEventFactory` and the `AuditChannel`/`AuditKind`/`AuditStatus` enums used to build dispatch audit rows.
- [Configuration Database (#17)](./ConfigurationDatabase.md): registers the scoped `INotificationOutboxRepository` (the central `dbo.Notifications` table) and `INotificationRepository` (notification-list, recipient, and SMTP configuration tables). Central hosts must call `AddConfigurationDatabase` before `AddNotificationOutbox`.
- [Notification Service (#8)](./NotificationService.md): supplies `ISmtpClientWrapper`, `OAuth2TokenService`, `NotificationOptions`, `SmtpTlsModeParser`, `SmtpErrorClassifier`, and the `SmtpPermanentException` type. `AddNotificationOutbox` relies on `AddNotificationService` being called by the Host to register these shared SMTP primitives; registering them twice would duplicate them.
- [Central–Site Communication (#5)](./Communication.md): carries `NotificationSubmit` / `NotificationSubmitAck` between sites and central via ClusterClient, and `NotificationStatusQuery` / `NotificationStatusResponse` for the `Notify.Status` round-trip.
- [Store-and-Forward Engine (#6)](./StoreAndForward.md): the site-side component that durably buffers notifications in SQLite and retries forwarding until central acks. The outbox is the receiving end of the S&F handoff.
- [Audit Log (#23)](./AuditLog.md): the outbox is a central direct-write caller of `ICentralAuditWriter`. It emits an `Attempted` `NotifyDeliver` row per delivery attempt, plus a terminal row only when the attempt drives the notification to a terminal status (`Delivered`/`Parked`/`Discarded`); it also emits a terminal row per operator Discard. The upstream `NotifySend` row is emitted by the site and arrives at central via standard audit telemetry.
- [Health Monitoring (#11)](./HealthMonitoring.md): polls `NotificationKpiRequest` / `PerSiteNotificationKpiRequest` for the headline KPI tiles on the health dashboard (queue depth, stuck count, parked count). These are central-computed from the `Notifications` table and are separate from the site S&F backlog metric.
- [Central UI (#9)](./CentralUI.md): hosts the Notification Outbox page: KPI tiles, a queryable/filterable notification list, per-row Retry/Discard actions on parked notifications, and a stuck-row badge.

## Troubleshooting

### Notifications stuck in `Pending`

A notification stays `Pending` when the dispatch loop is not running or is failing silently.
Check for `"Dispatch sweep failed"` at `Error` level in the central node logs. The most common cause is a missing or misconfigured `SmtpConfiguration` row, which the adapter surfaces as a `PermanentFailure` and parks the notification immediately. A `Warning` log line naming `SmtpConfiguration.MaxRetries` or `SmtpConfiguration.RetryDelay` being non-positive indicates the retry policy was clamped; correct the SMTP configuration row.

### Notifications parked with "no delivery adapter for type"

The actor parks a notification immediately and logs this message when `NotificationType` has a value for which no `INotificationDeliveryAdapter` is registered. Currently only `Email` has an adapter; future channel types must be registered in `AddNotificationOutbox` before notifications of that type are submitted.

### Dispatch loop wedged (guard stuck `true`)

The boolean `_dispatching` guard is cleared by `DispatchComplete`, which is piped even on a faulted sweep. If the actor itself stops and restarts, `PreStart` reinitialises the timers and the guard resets. A wedged guard without a restart indicates the `PipeTo` callback is never completing; examine logs around `"Dispatch sweep faulted unexpectedly"`.

### SMTP credentials appearing in logs

`EmailNotificationDeliveryAdapter` runs `CredentialRedactor.Scrub` on all exception messages before logging. If credential strings appear in logs, the SMTP exception message is not being routed through the `catch` filters in `DeliverAsync`; ensure the exception type is reachable by one of the three `catch` blocks and is not escaping before scrubbing.

### Failover mid-delivery

A central failover while a delivery attempt is in flight leaves the row in its pre-attempt status (`Pending` or `Retrying`). The new active node picks it up on the next dispatch tick. One notification may be re-sent to SMTP as a result; this is an accepted trade-off, consistent with the at-least-once guarantee the S&F Engine already provides.

## Related Documentation

- [Notification Outbox design specification](../requirements/Component-NotificationOutbox.md)
- [Audit Log](./AuditLog.md)
- [Commons](./Commons.md)
- [Configuration Database](./ConfigurationDatabase.md)
- [Central–Site Communication](./Communication.md)
- [Store-and-Forward Engine](./StoreAndForward.md)
- [Health Monitoring](./HealthMonitoring.md)