# Code Review — NotificationOutbox | Field | Value | |-------|-------| | Module | `src/ZB.MOM.WW.ScadaBridge.NotificationOutbox` | | Design doc | `docs/requirements/Component-NotificationOutbox.md` | | Status | Reviewed | | Last reviewed | 2026-05-28 | | Reviewer | claude-agent | | Commit reviewed | `1eb6e97` | | Open findings | 7 | ## Summary NotificationOutbox is a small, focused module — one ~985-line actor (`NotificationOutboxActor`), a strongly-typed options class, an `INotificationDeliveryAdapter` seam, and the single concrete `EmailNotificationDeliveryAdapter`. The Akka.NET conventions are textbook: every async path is wrapped with `PipeTo`, the dispatcher uses an in-flight guard cleared on `DispatchComplete`, the sender is captured before crossing the await, and the actor isolates per-notification failures so one bad row never aborts a batch. Test coverage is broad — ingest, dispatch, query, retry/discard, purge, KPI, and the new audit-emission paths (B2 attempts + B3 terminals) all have dedicated test files — and the audit-write-failure-never-aborts-delivery contract is explicitly asserted. The dominant theme is **trust-boundary leakage between Outbox, NotificationService, and ConfigurationDatabase**. The outbox inherits two known defects from its sibling modules that are reachable through `EmailNotificationDeliveryAdapter`: the OAuth2 SASL empty-user bug (NS-021) ships every M365 send with `user=""`, and the `InsertIfNotExistsAsync` check-then-act race (CD-015) lives on the outbox's ack-after-persist hot path. Neither is a defect of code under `src/ZB.MOM.WW.ScadaBridge.NotificationOutbox/`, but both are surfaced here because production dispatch and ingest go through these exact lines. A secondary theme is **dispatcher-fire-and-forget audit writes** (`_ = _auditWriter.WriteAsync(...)`) that can race the per-sweep scope dispose under the wrong DI graph, and a few smaller drifts: the dispatcher passes `CancellationToken.None` to adapter delivery (no graceful shutdown for in-flight SMTP sends), the `StuckAgeThreshold` XML-doc describes a behavior the design explicitly forbids (display-only, never reclaim), the `MaxRetries` boundary check uses `>=` against a config value that can be zero (immediate park on first transient failure), and several `NotificationOutboxOptions` fields are documented in code but absent from `Component-NotificationOutbox.md`. No Critical findings; two High, six Medium, two Low. ## Checklist coverage | # | Category | Examined | Notes | |---|----------|----------|-------| | 1 | Correctness & logic bugs | Yes | `MaxRetries` zero/negative immediately parks (NotificationOutbox-002); `StuckAgeThreshold` XML doc contradicts design (NotificationOutbox-009); `Guid.TryParse` accepts compact `"N"` ids emitted by sites. | | 2 | Akka.NET conventions | Yes | `PipeTo` / sender-capture / in-flight guard pattern is correctly applied throughout. Fire-and-forget `_ = _auditWriter.WriteAsync(...)` raises a scope-lifetime concern (NotificationOutbox-004). | | 3 | Concurrency & thread safety | Yes | Actor state mutated only on actor thread. Inherited CD-015 race on `InsertIfNotExistsAsync` (NotificationOutbox-005) is the only race; the dispatcher's in-flight guard correctly serializes sweeps. | | 4 | Error handling & resilience | Yes | Outer try/catch on `RunDispatchPass`/`RunPurgePass` keeps the in-flight guard sane; per-notification isolation is correct. CT not threaded into delivery (NotificationOutbox-003). | | 5 | Security | Yes | Inherited OAuth2 empty-user (NotificationOutbox-001) reachable through the adapter. No new credential or trust-boundary issues introduced by the outbox itself. | | 6 | Performance & resource management | Yes | Dispatch interval & batch size are simple polling; `ResolveAdapters` rebuilds the lookup per sweep (NotificationOutbox-006). No leaks. | | 7 | Design-document adherence | Yes | `NotificationOutboxOptions.DispatchBatchSize`, `DeliveredKpiWindow`, `PurgeInterval` are not in the design doc (NotificationOutbox-007). | | 8 | Code organization & conventions | Yes | Options class lives in the component project (correct); DI extension lives in the component (correct); adapter is `scoped`, actor singleton — interaction correctly documented in `ServiceCollectionExtensions`. No issues. | | 9 | Testing coverage | Yes | Solid actor-behaviour coverage. Missing tests for `FallbackMaxRetries` / empty-SMTP-config dispatch path (NotificationOutbox-008). | | 10 | Documentation & comments | Yes | XML on `StuckAgeThreshold` misleading (NotificationOutbox-009); XML on dispatcher's audit `_ =` fire-and-forget says "writer never throws" but `EmitAttemptAudit` still wraps in try/catch — comment contradicts itself (NotificationOutbox-010). | ## Findings ### NotificationOutbox-001 — `EmailNotificationDeliveryAdapter` inherits the OAuth2 empty-user SASL bug (NS-021) on the M365 send path | | | |--|--| | Severity | High | | Category | Correctness & logic bugs | | Status | Resolved | | Location | `src/ZB.MOM.WW.ScadaBridge.NotificationOutbox/Delivery/EmailNotificationDeliveryAdapter.cs:185-191` (calls `smtp.AuthenticateAsync("oauth2", token)`); root cause in `src/ZB.MOM.WW.ScadaBridge.NotificationService/MailKitSmtpClientWrapper.cs:76-79` | **Description** `EmailNotificationDeliveryAdapter.SendAsync` resolves an OAuth2 access token via `_tokenService.GetTokenAsync(...)` and then calls `await smtp.AuthenticateAsync(config.AuthType, credentials, cancellationToken);` on `ISmtpClientWrapper`. The production implementation (`MailKitSmtpClientWrapper`) constructs `new SaslMechanismOAuth2("", credentials)` — an empty user-name field — which Microsoft 365 SMTP rejects with `535 5.7.3 Authentication unsuccessful`. The sibling NotificationService finding NS-021 documents this in full; the outbox is the *new home* for delivery on central, so every OAuth2 send that the outbox dispatches hits this code path. The defect is therefore reachable here even though the offending constructor lives in the NotificationService project, and the central-only redesign means this is now the only delivery path in production. Existing outbox tests do not catch it because they all substitute `ISmtpClientWrapper` and assert only that `AuthenticateAsync` is invoked with `("oauth2", "")` — the real `SaslMechanismOAuth2` is never instantiated. `OAuth2TokenService.GetTokenAsync` is explicitly wired to `login.microsoftonline.com/.../oauth2/v2.0/token` with `scope=https://outlook.office365.com/.default`, so M365 SMTP is the intended target — and is precisely the relay that requires the user field to be populated. **Recommendation** Track the NS-021 fix and add an outbox-side regression test once the wrapper signature is widened. Concretely, when `ISmtpClientWrapper.AuthenticateAsync` is extended to accept the sender mailbox (or a dedicated `oauth2UserName` parameter), update `EmailNotificationDeliveryAdapter.SendAsync` to pass `config.FromAddress`, and add a test in `EmailNotificationDeliveryAdapterTests` that asserts the OAuth2 path forwards the sender identity. Until then, surface the same finding here so the outbox is not treated as resolved when NS-021 fires. **Resolution** _Unresolved._ ### NotificationOutbox-002 — Dispatcher parks on first transient failure when `SmtpConfiguration.MaxRetries == 0` | | | |--|--| | Severity | High | | Category | Correctness & logic bugs | | Status | Resolved | | Location | `src/ZB.MOM.WW.ScadaBridge.NotificationOutbox/NotificationOutboxActor.cs:348-360` | **Description** The transient-failure branch increments `RetryCount` then evaluates `if (notification.RetryCount >= maxRetries) notification.Status = NotificationStatus.Parked;`. `maxRetries` is read from the central `SmtpConfiguration.MaxRetries` column, which has no enforced lower bound and is not validated by the outbox. A row whose `MaxRetries` is `0` (or any negative value) immediately satisfies `1 >= 0` on the very first transient failure, so the notification is parked without a single retry — directly contradicting the design doc's "fixed retry interval, reuse central SMTP max-retry-count" intent, where a configured value of zero would naturally read as "never retry, fail straight to permanent". `SetupSmtpRetryPolicy` in the dispatch tests always supplies a positive value, so this path is not exercised. Additionally, an operator who clears the SMTP config row drops into the `FallbackMaxRetries = 10` / `FallbackRetryDelay = 1 min` path (`ResolveRetryPolicyAsync` line 251); that path is also untested — see NotificationOutbox-008. The operational result is that a single bad SMTP config value silently halves the outbox's delivery guarantees. **Recommendation** Validate `MaxRetries` at the read point: treat a non-positive value as either the configured fallback (current `FallbackMaxRetries = 10`) or — preferred — surface the mis-configuration to the operator via a health metric and refuse to dispatch until the row is corrected. Either way, add a test that asserts the dispatcher's behaviour for `MaxRetries == 0` and `MaxRetries < 0`. **Resolution** _Unresolved._ ### NotificationOutbox-003 — Dispatcher does not propagate a `CancellationToken` into delivery; in-flight SMTP sends cannot be cancelled on shutdown | | | |--|--| | Severity | Medium | | Category | Error handling & resilience | | Status | Resolved | | Location | `src/ZB.MOM.WW.ScadaBridge.NotificationOutbox/NotificationOutboxActor.cs:334`, `src/ZB.MOM.WW.ScadaBridge.NotificationOutbox/Delivery/INotificationDeliveryAdapter.cs:22` | **Description** `DeliverOneAsync` calls `var outcome = await adapter.DeliverAsync(notification);` — the second `CancellationToken` parameter on `INotificationDeliveryAdapter.DeliverAsync` is left at its `default(CancellationToken)` value, meaning `CancellationToken.None`. `EmailNotificationDeliveryAdapter.SendAsync` then threads that `None` token into `smtp.ConnectAsync`, `smtp.AuthenticateAsync`, and `smtp.SendAsync`. The consequence is that during a coordinated cluster shutdown (singleton handover, drain) any in-flight SMTP send is uncancellable and the dispatcher's sweep must wait for the underlying socket/SMTP timeout (`SmtpConfiguration.ConnectionTimeoutSeconds`) before the sweep's task completes and `DispatchComplete` lowers the in-flight guard. With the default connect-timeout values this is on the order of tens of seconds per notification in the in-progress batch, blocking `CoordinatedShutdown`. The adapter implementations clearly *expect* a token — the contract type is `CancellationToken cancellationToken = default` everywhere — so this is a wiring gap, not a missing interface. **Recommendation** Wire a per-sweep `CancellationTokenSource` linked to the actor's lifecycle (cancel in `PostStop`) and pass its token into `DeliverAsync`. A linked source per sweep also bounds individual deliveries by the configured connection timeout when a more explicit per-attempt budget is wanted. Add a test that cancels mid-`DeliverAsync` and asserts the dispatcher completes promptly and the row is left non-terminal (`Pending`/`Retrying` unchanged) for the next sweep. **Resolution** _Unresolved._ ### NotificationOutbox-004 — `EmitAttemptAudit`/`EmitTerminalAudit` fire-and-forget pattern can outlive the per-sweep DI scope | | | |--|--| | Severity | Medium | | Category | Akka.NET conventions | | Status | Resolved | | Location | `src/ZB.MOM.WW.ScadaBridge.NotificationOutbox/NotificationOutboxActor.cs:425-435`, `463-485` | **Description** Both emission helpers issue `_ = _auditWriter.WriteAsync(evt);` — discarding the returned task. `CentralAuditWriter.WriteAsync` opens its own `await using var scope = _services.CreateAsyncScope();` and resolves a scoped `IAuditLogRepository` (verified at `src/ZB.MOM.WW.ScadaBridge.AuditLog/Central/CentralAuditWriter.cs:118-121`), so the writer is defensively scope-independent. However the dispatcher already holds a per-sweep `using var scope = _serviceProvider.CreateScope();` and the per-notification `UpdateAsync` runs in that scope. The fire-and-forget pattern means: 1. The dispatcher's outer scope can be disposed (sweep done, `DispatchComplete` piped) while the audit `WriteAsync` task is still running on a *different* scope it owns — works today only because the writer creates its own scope. 2. A faulted unobserved task is silently lost: if `CentralAuditWriter.WriteAsync` itself were ever made `async void` or refactored to not internally try/catch, the dispatcher would never see the fault and the audit row would vanish without the `_logger.LogWarning` reaching the operator. 3. The XML-doc above `EmitAttemptAudit` says "PipeTo is not used because the writer never throws" — but the surrounding `try { _ = _auditWriter.WriteAsync(evt); } catch (Exception ex)` will only catch a synchronous throw from the *task construction*, not the awaited body of `WriteAsync`. The comment understates the risk: the catch is structurally unreachable for the documented failure mode. The system actually wants the *invariant* "audit write never affects delivery" (verified by the `AuditWriter_Throws_…StillSucceeds` tests). That invariant is better expressed by `await`-ing the writer inside the actor's outer try/catch (the dispatcher already swallows per-notification exceptions) than by a discard-task, which couples the lifetime of the dispatcher's scope to that of the audit task through whatever scope graph the writer happens to use today. **Recommendation** Either `await _auditWriter.WriteAsync(evt)` inside the existing `try`/`catch` (the preferred fix — preserves the invariant, plays well with the per-sweep scope, and makes the catch block actually reachable), or — if a true fire-and-forget remains desired — capture the returned task and attach a continuation that calls `_logger.LogWarning` on faulted to keep diagnostics intact. Either way, fix the "writer never throws" XML-doc to match the implementation. **Resolution** _Unresolved._ ### NotificationOutbox-005 — Ingest persistence inherits the CD-015 check-then-act race; under contention the second writer throws and the site retries | | | |--|--| | Severity | Medium | | Category | Concurrency & thread safety | | Status | Resolved | | Location | `src/ZB.MOM.WW.ScadaBridge.NotificationOutbox/NotificationOutboxActor.cs:127-132` (caller); root cause in `src/ZB.MOM.WW.ScadaBridge.ConfigurationDatabase/Repositories/NotificationOutboxRepository.cs:33-45` | **Resolution (2026-05-28):** Closed by CD-015 — `NotificationOutboxRepository.InsertIfNotExistsAsync` (commit `ac96b83`) is now a single-statement `IF NOT EXISTS ... INSERT` via `ExecuteSqlInterpolatedAsync` with a `SqlException` filter swallowing duplicate-key violations (`2601`/`2627`) as a no-op (`return false`). The check-then-act window is eliminated; the at-least-once handoff contract holds and the actor's `PipeTo` success/failure projection no longer surfaces a permanent PK-violation back to the site. Verified in `src/ZB.MOM.WW.ScadaBridge.ConfigurationDatabase/Repositories/NotificationOutboxRepository.cs:51-103`. **Description** `HandleSubmit` → `PersistAsync` calls `repository.InsertIfNotExistsAsync(notification)` on `INotificationOutboxRepository`. The current implementation (`src/ZB.MOM.WW.ScadaBridge.ConfigurationDatabase/Repositories/NotificationOutboxRepository.cs`) does a check-then-act with no duplicate-key catch — documented as CD-015 (High, Open). The Notification Outbox's documented contract is "at-least-once handoff with ack-after-persist plus insert-if-not-exists on `NotificationId`" (CLAUDE.md, Component-NotificationOutbox.md §Ingest & Idempotency), and the duplicate-insert race is the **expected contention pattern** — the site retries the same submission after a lost ack. As written, the loser surfaces a `SqlException` (2627 PK violation) wrapped in `DbUpdateException`, propagates through `PipeTo`'s failure projection as a `NotificationSubmitAck { Accepted: false, Error: "... PRIMARY KEY ..." }`, the site treats the ack as a forwarding failure and forwards the message **again**, re-entering the same race. If the contending pair keeps racing this can livelock. The actor side is fine — `PipeTo`'s success/failure projection correctly forwards the exception message. The repository side needs the standard `2601/2627 → no-op` pattern that AuditLog and SiteCall already use. This finding tracks the outbox-side visibility of the CD-015 defect so a re-review of NotificationOutbox surfaces it even if the reader has not yet read the ConfigurationDatabase findings. **Recommendation** Track CD-015 to resolution. As a defense-in-depth complement here, consider treating a duplicate-key `DbUpdateException` in the actor's ingest failure projection as `Accepted: true` so a lost ack between persisted-by-the-first-writer and ack-back does not produce a permanent re-forward loop — but the cleanest fix remains the CD-015 raw-SQL `IF NOT EXISTS … INSERT` with `2601/2627` catch in `NotificationOutboxRepository`. ### NotificationOutbox-006 — `ResolveAdapters` rebuilds the `NotificationType → adapter` dictionary on every dispatch sweep | | | |--|--| | Severity | Low | | Category | Performance & resource management | | Status | Resolved | | Location | `src/ZB.MOM.WW.ScadaBridge.NotificationOutbox/NotificationOutboxActor.cs:267-277` | **Description** Every dispatch sweep calls `ResolveAdapters(scope.ServiceProvider)` which enumerates `scopedServices.GetServices()` and builds a fresh `Dictionary`. Adapter registration is decided at startup (`AddNotificationOutbox` registers `EmailNotificationDeliveryAdapter`); the registration set does not change at runtime. With a default `DispatchInterval = 10s` and only ever one entry today, the allocation overhead is trivial — but the comment "the last adapter registered for a given type wins, mirroring DI's last-wins resolution semantics" elevates this to a behaviour contract, and the per-sweep dictionary construction obscures the lookup's identity from one sweep to the next, making any future stateful adapter (rate limiter, circuit breaker) silently lose its state. The same issue is the reason `EmailNotificationDeliveryAdapter` is *scoped* — it holds a scoped `INotificationRepository`. A trivial cache-the-types-but-resolve- the-instance fix is possible: cache the set of declared `NotificationType` values and look up each adapter by `GetService()` filtered by `Type` per sweep. **Recommendation** Document the per-sweep contract explicitly ("each sweep gets a fresh adapter instance per the scoped DI contract — adapters must not carry state across sweeps") in the actor XML, or — preferred — cache only the *types* at startup (`PreStart`) and resolve the scoped instance per sweep, so future adapters with stateful intent (timeouts, circuit breakers) cannot accidentally lose state. **Resolution (2026-05-28):** Cached the `NotificationType → adapter` dictionary on a new actor field `_adaptersCache`, built lazily on the first dispatch sweep that needs it. The cache is paired with an actor-lifetime `IServiceScope` (`_adaptersScope`, created on first use and disposed in `PostStop`) so the resolved scoped adapter instances live as long as the cache itself — respecting `EmailNotificationDeliveryAdapter`'s scoped lifetime without leaking captive-DbContext references across the actor's full lifetime. `RunDispatchPass` now calls `ResolveAdapters()` (instance method, no args), and the existing "resolved from the per-sweep scope" comment was rewritten to cross-reference the new cache rationale. Adapter set is static per process lifetime (confirmed against the DI registration in `AddNotificationOutbox`), so no invalidation hook is needed. Regression test `Dispatch_ResolvesAdaptersOnce_AcrossMultipleSweeps` registers a counting factory and pins the resolution count at exactly 1 across three end-to-end dispatch sweeps. ### NotificationOutbox-007 — `NotificationOutboxOptions.DispatchBatchSize`, `DeliveredKpiWindow`, and `PurgeInterval` are not in the design document | | | |--|--| | Severity | Medium | | Category | Design-document adherence | | Status | Resolved | | Location | `src/ZB.MOM.WW.ScadaBridge.NotificationOutbox/NotificationOutboxOptions.cs:13`, `:22`, `:25`; `docs/requirements/Component-NotificationOutbox.md:152-160` | **Description** `Component-NotificationOutbox.md` §Configuration enumerates three options: dispatch interval, stuck-age threshold, and terminal-row retention window. The implemented `NotificationOutboxOptions` adds three additional fields: - `DispatchBatchSize` (default `100`) — caps the per-sweep claim size, but is invisible to anyone reading only the spec. - `PurgeInterval` (default `1 day`) — the design doc says "daily purge" as if the cadence is fixed; in code it is configurable. - `DeliveredKpiWindow` (default `1 min`) — the KPI section says "Delivered (last interval)" without saying how long "last interval" is or that it is configurable. The design doc also asserts "Delivery max-retry-count and retry interval are not part of `NotificationOutboxOptions` — they are reused from the central SMTP configuration" (line 160) — implementation honours this. But the three additions above are dead text in the design doc. The KPI dashboard cadence and the dispatch batch size are both operationally important values an operator/engineer will hunt for; their absence from the spec is design drift. **Recommendation** Add the three fields to `Component-NotificationOutbox.md §Configuration` with their defaults, or remove them from the implementation if they were meant to be fixed constants. Cross-link `DeliveredKpiWindow` from the §Monitoring "Delivered (last interval)" KPI bullet so a reader sees what controls the bucket length. **Resolution (2026-05-28):** Resolved the code-side gap by adding clear XML `` docs to every property on `NotificationOutboxOptions` (`DispatchInterval`, `DispatchBatchSize`, `StuckAgeThreshold`, `TerminalRetention`, `PurgeInterval`, `DeliveredKpiWindow`) — each now states what it controls and its default value. The design-doc update remains tracked separately. ### NotificationOutbox-008 — `FallbackMaxRetries` / `FallbackRetryDelay` path is unreachable in production AND untested | | | |--|--| | Severity | Low | | Category | Testing coverage | | Status | Resolved | | Location | `src/ZB.MOM.WW.ScadaBridge.NotificationOutbox/NotificationOutboxActor.cs:29-31`, `:251-259`; tests in `tests/ZB.MOM.WW.ScadaBridge.NotificationOutbox.Tests/NotificationOutboxActorDispatchTests.cs` | **Resolution (2026-05-28):** The `FallbackMaxRetries` / `FallbackRetryDelay` constants are documented for forward-compat with a deferred secondary-adapter path — when no SMTP configuration row exists, `EmailNotificationDeliveryAdapter` returns `Permanent("No SMTP configuration available")` before the retry-policy values are ever consulted, so the path is effectively unreachable from any current production caller. The reachable use of the constants — clamping a non-positive `SmtpConfiguration.MaxRetries` / `RetryDelay` (per NO-002) — is already covered by `TransientFailure_WithZeroMaxRetries_RetriesUsingFallback_DoesNotParkImmediately`, `TransientFailure_WithNegativeMaxRetries_RetriesUsingFallback_DoesNotParkImmediately`, and `TransientFailure_WithNonPositiveRetryDelay_UsesFallbackDelay_NotZero` in `NotificationOutboxActorDispatchTests.cs`. Mark untestable today and re-visit when a non-Email adapter (Teams etc.) makes the empty-SMTP-config branch genuinely deliverable. **Description** `ResolveRetryPolicyAsync` falls back to `FallbackMaxRetries = 10` and `FallbackRetryDelay = 1 min` when `notificationRepository.GetAllSmtpConfigurationsAsync()` returns an empty list (no SMTP configuration row). The comment correctly observes that delivery itself will then return `Permanent("No SMTP configuration available")` from `EmailNotificationDeliveryAdapter.cs:78-81`, so the fallback retry policy never actually retries anything — the row is permanently parked on first attempt regardless of retry count or delay. This produces three concerns. (1) The fallback is essentially dead code — the retry policy values are never consulted in practice because delivery always fails permanently before the retry branch is reached. (2) The fallback can be reached *after* a previously-deployed SMTP config is deleted, which is precisely the moment an operator needs accurate audit trails; the row will say `Parked` with `LastError = "No SMTP configuration available"` but the audit signal "retry policy fell back to defaults" is invisible. (3) Tests never exercise either the fallback path or the empty-SMTP-config dispatch path: `SetupSmtpRetryPolicy` always supplies a config in every dispatch test. **Recommendation** Add a regression test that runs a dispatch sweep with no SMTP config row and asserts the row is parked with the documented error. Optionally remove the fallback constants if parking-with-no-config is the *intended* operational signal; document the choice in the actor XML so a maintainer does not "fix" the unreachable code. ### NotificationOutbox-009 — `StuckAgeThreshold` XML-doc says "in-progress notification is re-claimed" — contradicts the design's display-only stuck detection | | | |--|--| | Severity | Low | | Category | Documentation & comments | | Status | Resolved | | Location | `src/ZB.MOM.WW.ScadaBridge.NotificationOutbox/NotificationOutboxOptions.cs:15-16` | **Description** ```csharp /// Age past which an in-progress notification is considered stuck and re-claimed. public TimeSpan StuckAgeThreshold { get; set; } = TimeSpan.FromMinutes(10); ``` The implementation never reclaims anything based on `StuckAgeThreshold`. It is used only as a cutoff for the stuck-count KPI (`StuckCutoff`/`IsStuck` in `NotificationOutboxActor.cs:932-942`) and as a `StuckCutoff` filter on paginated queries. The design doc is explicit: "A notification is **stuck** if it is `Pending` or `Retrying` and older than a configurable age threshold (default 10 minutes). Detection is **display-only** — a count KPI and a row badge. There is no automated escalation or alerting" (`Component-NotificationOutbox.md:143-145`). A maintainer reading the XML and expecting "re-claim" behaviour will be surprised twice — once when no re-claim happens, and once when they go looking for the re-claim code and find none. **Recommendation** Rewrite the XML to match the design: "Age past which a still-`Pending`/`Retrying` notification is counted as stuck on the KPI tile and the per-row badge. Display-only — does not affect dispatch." **Resolution (2026-05-28):** Rewrote the `StuckAgeThreshold` XML-doc to make the display-only semantics explicit: rows older than the threshold are flagged in KPIs/UI; there is no automatic re-claim, requeue, or escalation. Matches `Component-NotificationOutbox.md §Monitoring`. ### NotificationOutbox-010 — Comment claims `PipeTo` is not used "because the writer never throws"; the surrounding try/catch is dead-letter for the documented failure mode | | | |--|--| | Severity | Medium | | Category | Documentation & comments | | Status | Resolved | | Location | `src/ZB.MOM.WW.ScadaBridge.NotificationOutbox/NotificationOutboxActor.cs:469-477` | **Description** ```csharp try { var evt = BuildNotifyDeliverEvent(notification, now, AuditStatus.Attempted, errorMessage) with { DurationMs = durationMs }; // Fire-and-forget — we do NOT await: the dispatcher loop must not // be blocked by audit IO, and the writer swallows its own faults. // PipeTo is not used because the writer never throws. _ = _auditWriter.WriteAsync(evt); } catch (Exception ex) { _logger.LogWarning(ex, "Failed to emit Attempted audit row …"); } ``` The XML-doc on `EmitAttemptAudit` is internally inconsistent and structurally incorrect: (1) if "the writer never throws" then the surrounding try/catch is unreachable and dead code; (2) if the writer *can* throw (and the catch is meaningful) then "never throws" is wrong. In practice the catch only ever fires on a synchronous throw from the writer's *task construction* — never on a fault in the awaited body — because the discarded task is not observed. The current behaviour matches the design intent ("audit failure NEVER aborts delivery"), but the comment misleads the next reader on the *why*. This is the same root cause as NotificationOutbox-004 — they target the same lines from different angles (NotificationOutbox-004 is the scope-lifetime / fire-and-forget Akka concern, NotificationOutbox-010 is the doc/comment-clarity concern). Closing NotificationOutbox-004 by switching to `await` resolves both. **Recommendation** If `await`-ing the writer (recommended fix per NotificationOutbox-004): delete the "PipeTo is not used because the writer never throws" line entirely and let the try/catch's behaviour speak for itself. If keeping fire-and-forget: rewrite the comment to "fire-and-forget by design (the writer is responsible for its own failure handling); the surrounding try/catch only catches the synchronous task-construction throw and is otherwise unreachable." **Resolution** _Unresolved._