ScadaBridge/code-reviews/NotificationOutbox/findings.md
Joseph Doherty 55f46e7c92 perf: close Theme 6 — 11 allocation / N+1 / lock-contention findings
Well-localised perf fixes across 8 modules.

Lock decoupling / SQL streaming:
- AuditLog-005: SqliteAuditWriter gains dedicated read-only _readConnection
  (+ _readLock) backed by WAL journal mode. GetBacklogStatsAsync,
  ReadPendingAsync, ReadPendingSinceAsync, ReadForwardedAsync no longer
  contend with the hot-path INSERT lock — backlog probes on a 30s timer
  can't stall the writer under multi-hundred-K Pending backlog.
- SEL-022: dropped Cache=Shared from SiteEventLogger's default connection
  string (single-connection logger; mode was dormant config).

Memory / streaming:
- CLI-019: bundle export streams base64 in 1 MB-aligned chunks via
  Convert.TryFromBase64Chars straight into the FileStream — no more
  full-bundle byte[] allocation.
- CentralUI-031: TransportImport now stages the upload to a per-session
  temp file under Path.GetTempPath() (replaces in-memory byte[] field);
  page implements IDisposable to delete the temp file on reset / new
  upload / dispose. Per-circuit working set drops from ~100 MB to ~80 KB.

N+1 hoisting:
- Transport-008: added ITemplateEngineRepository.GetTemplatesWithChildrenAsync
  bulk method; BundleImporter.PreviewAsync calls it once instead of per-
  template-name. Single query with .Include(...).AsSplitQuery().
- DM-023: BuildDeployArtifactsCommandAsync's per-site loop now references
  a pre-fetched GlobalArtifactSnapshot (shared scripts, external systems,
  DB connections, notification lists, SMTP) instead of re-querying per site.
- MgmtSvc-023: HandleQueryDeployments unfiltered branch uses one
  GetAllInstancesAsync bulk load + Dictionary<int,int?> lookup (was a
  GetInstanceByIdAsync per record).

Small allocations / per-tick rebuilds:
- InboundAPI-019: AuditWriteMiddleware gates EnableBuffering() on
  RequestHasBody() so GET/HEAD/DELETE/TRACE/OPTIONS and Content-Length:0
  requests skip the FileBufferingReadStream allocation.
- NotifOutbox-006: ResolveAdapters dictionary now cached on
  _adaptersCache (built lazily on first sweep) + actor-lifetime
  _adaptersScope; ResolveAdapters no longer rebuilds per dispatch tick.

Verify-only:
- Comm-017: Confirmed _inProgressDeployments was deleted by Comm-016 in
  commit ac96b83 — marked Resolved with that attribution. No code change.

Doc-correction:
- NS-022: Updated MailKitSmtpClientWrapper XML doc to spell out single-
  connection / per-delivery-factory contract (option (b) — transient
  client per Send — rejected because it re-handshakes TLS per email).

10+ new regression tests across 8 test projects. Build clean; affected
suites all green. README regenerated: 54 open (was 65).
2026-05-28 07:47:24 -04:00


# Code Review — NotificationOutbox
| Field | Value |
|-------|-------|
| Module | `src/ScadaLink.NotificationOutbox` |
| Design doc | `docs/requirements/Component-NotificationOutbox.md` |
| Status | Reviewed |
| Last reviewed | 2026-05-28 |
| Reviewer | claude-agent |
| Commit reviewed | `1eb6e97` |
| Open findings | 8 |
## Summary
NotificationOutbox is a small, focused module — one ~985-line actor
(`NotificationOutboxActor`), a strongly-typed options class, an
`INotificationDeliveryAdapter` seam, and the single concrete `EmailNotificationDeliveryAdapter`.
The Akka.NET conventions are textbook: every async path is wrapped with `PipeTo`, the
dispatcher uses an in-flight guard cleared on `DispatchComplete`, the sender is captured
before crossing the await, and the actor isolates per-notification failures so one bad row
never aborts a batch. Test coverage is broad — ingest, dispatch, query, retry/discard,
purge, KPI, and the new audit-emission paths (B2 attempts + B3 terminals) all have
dedicated test files — and the audit-write-failure-never-aborts-delivery contract is
explicitly asserted.
The dominant theme is **trust-boundary leakage between Outbox, NotificationService, and
ConfigurationDatabase**. The outbox inherits two known defects from its sibling modules
that are reachable through `EmailNotificationDeliveryAdapter`: the OAuth2 SASL empty-user
bug (NS-021) ships every M365 send with `user=""`, and the
`InsertIfNotExistsAsync` check-then-act race (CD-015) lives on the outbox's ack-after-persist
hot path. Neither is a defect of code under `src/ScadaLink.NotificationOutbox/`, but both
are surfaced here because production dispatch and ingest go through these exact lines.

A secondary theme is **dispatcher-fire-and-forget audit writes** (`_ = _auditWriter.WriteAsync(...)`)
that can race the per-sweep scope dispose under the wrong DI graph, and a few smaller
drifts: the dispatcher passes `CancellationToken.None` to adapter delivery (no graceful
shutdown for in-flight SMTP sends), the `StuckAgeThreshold` XML-doc describes a behavior
the design explicitly forbids (display-only, never reclaim), the `MaxRetries` boundary check
uses `>=` against a config value that can be zero (immediate park on first transient
failure), and several `NotificationOutboxOptions` fields are documented in code but absent
from `Component-NotificationOutbox.md`. No Critical findings; two High, five Medium, three Low.
## Checklist coverage
| # | Category | Examined | Notes |
|---|----------|----------|-------|
| 1 | Correctness & logic bugs | Yes | `MaxRetries` zero/negative immediately parks (NotificationOutbox-002); `StuckAgeThreshold` XML doc contradicts design (NotificationOutbox-009); `Guid.TryParse` accepts compact `"N"` ids emitted by sites. |
| 2 | Akka.NET conventions | Yes | `PipeTo` / sender-capture / in-flight guard pattern is correctly applied throughout. Fire-and-forget `_ = _auditWriter.WriteAsync(...)` raises a scope-lifetime concern (NotificationOutbox-004). |
| 3 | Concurrency & thread safety | Yes | Actor state mutated only on actor thread. Inherited CD-015 race on `InsertIfNotExistsAsync` (NotificationOutbox-005) is the only race; the dispatcher's in-flight guard correctly serializes sweeps. |
| 4 | Error handling & resilience | Yes | Outer try/catch on `RunDispatchPass`/`RunPurgePass` keeps the in-flight guard sane; per-notification isolation is correct. CT not threaded into delivery (NotificationOutbox-003). |
| 5 | Security | Yes | Inherited OAuth2 empty-user (NotificationOutbox-001) reachable through the adapter. No new credential or trust-boundary issues introduced by the outbox itself. |
| 6 | Performance & resource management | Yes | Dispatch interval & batch size are simple polling; `ResolveAdapters` rebuilds the lookup per sweep (NotificationOutbox-006). No leaks. |
| 7 | Design-document adherence | Yes | `NotificationOutboxOptions.DispatchBatchSize`, `DeliveredKpiWindow`, `PurgeInterval` are not in the design doc (NotificationOutbox-007). |
| 8 | Code organization & conventions | Yes | Options class lives in the component project (correct); DI extension lives in the component (correct); adapter is `scoped`, actor singleton — interaction correctly documented in `ServiceCollectionExtensions`. No issues. |
| 9 | Testing coverage | Yes | Solid actor-behaviour coverage. Missing tests for `FallbackMaxRetries` / empty-SMTP-config dispatch path (NotificationOutbox-008). |
| 10 | Documentation & comments | Yes | XML on `StuckAgeThreshold` misleading (NotificationOutbox-009); XML on dispatcher's audit `_ =` fire-and-forget says "writer never throws" but `EmitAttemptAudit` still wraps in try/catch — comment contradicts itself (NotificationOutbox-010). |
## Findings
### NotificationOutbox-001 — `EmailNotificationDeliveryAdapter` inherits the OAuth2 empty-user SASL bug (NS-021) on the M365 send path
| | |
|--|--|
| Severity | High |
| Category | Correctness & logic bugs |
| Status | Resolved |
| Location | `src/ScadaLink.NotificationOutbox/Delivery/EmailNotificationDeliveryAdapter.cs:185-191` (calls `smtp.AuthenticateAsync("oauth2", token)`); root cause in `src/ScadaLink.NotificationService/MailKitSmtpClientWrapper.cs:76-79` |
**Description**
`EmailNotificationDeliveryAdapter.SendAsync` resolves an OAuth2 access token via
`_tokenService.GetTokenAsync(...)` and then calls
`await smtp.AuthenticateAsync(config.AuthType, credentials, cancellationToken);`
on `ISmtpClientWrapper`. The production implementation (`MailKitSmtpClientWrapper`)
constructs `new SaslMechanismOAuth2("", credentials)` — an empty user-name field —
which Microsoft 365 SMTP rejects with `535 5.7.3 Authentication unsuccessful`. The
sibling NotificationService finding NS-021 documents this in full; the outbox is the
*new home* for delivery on central, so every OAuth2 send that the outbox dispatches
hits this code path. The defect is therefore reachable here even though the offending
constructor lives in the NotificationService project, and the central-only redesign
means this is now the only delivery path in production. Existing outbox tests do not
catch it because they all substitute `ISmtpClientWrapper` and assert only that
`AuthenticateAsync` is invoked with `("oauth2", "<token>")` — the real
`SaslMechanismOAuth2` is never instantiated. `OAuth2TokenService.GetTokenAsync` is
explicitly wired to `login.microsoftonline.com/.../oauth2/v2.0/token` with
`scope=https://outlook.office365.com/.default`, so M365 SMTP is the intended target —
and is precisely the relay that requires the user field to be populated.
**Recommendation**
Track the NS-021 fix and add an outbox-side regression test once the wrapper signature
is widened. Concretely, when `ISmtpClientWrapper.AuthenticateAsync` is extended to
accept the sender mailbox (or a dedicated `oauth2UserName` parameter), update
`EmailNotificationDeliveryAdapter.SendAsync` to pass `config.FromAddress`, and add a
test in `EmailNotificationDeliveryAdapterTests` that asserts the OAuth2 path forwards
the sender identity. Until then, surface the same finding here so the outbox is not
treated as resolved when NS-021 fires.
**Resolution**
_Unresolved._
### NotificationOutbox-002 — Dispatcher parks on first transient failure when `SmtpConfiguration.MaxRetries == 0`
| | |
|--|--|
| Severity | High |
| Category | Correctness & logic bugs |
| Status | Resolved |
| Location | `src/ScadaLink.NotificationOutbox/NotificationOutboxActor.cs:348-360` |
**Description**
The transient-failure branch increments `RetryCount` then evaluates
`if (notification.RetryCount >= maxRetries) notification.Status = NotificationStatus.Parked;`.
`maxRetries` is read from the central `SmtpConfiguration.MaxRetries` column, which has
no enforced lower bound and is not validated by the outbox. A row whose `MaxRetries`
is `0` (or any negative value) immediately satisfies `1 >= 0` on the very first
transient failure, so the notification is parked without a single retry — directly
contradicting the design doc's "fixed retry interval, reuse central SMTP
max-retry-count" intent, where a configured value of zero would naturally read as
"never retry, fail straight to permanent". `SetupSmtpRetryPolicy` in the dispatch
tests always supplies a positive value, so this path is not exercised.
Additionally, an operator who clears the SMTP config row drops into the
`FallbackMaxRetries = 10` / `FallbackRetryDelay = 1 min` path
(`ResolveRetryPolicyAsync` line 251); that path is also untested — see
NotificationOutbox-008. The operational result is that a single bad SMTP config
value silently removes the outbox's retry guarantee: transient failures park on the
first attempt.
**Recommendation**
Validate `MaxRetries` at the read point: treat a non-positive value as either the
configured fallback (current `FallbackMaxRetries = 10`) or — preferred — surface the
mis-configuration to the operator via a health metric and refuse to dispatch until
the row is corrected. Either way, add a test that asserts the dispatcher's behaviour
for `MaxRetries == 0` and `MaxRetries < 0`.
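
The guard recommended above could look roughly like the following. This is a minimal sketch under stated assumptions: `RetryPolicyGuard`, `NormalizeMaxRetries`, and `ShouldPark` are hypothetical names, not code from the actor, and the fallback constant only mirrors the `FallbackMaxRetries = 10` value quoted in this finding. It shows the "fall back on a non-positive value" option; the "refuse to dispatch" option would replace the fallback with a health signal.

```csharp
using System;

// Hypothetical helper; none of these names exist in the reviewed code.
public static class RetryPolicyGuard
{
    // Mirrors the fallback value quoted in this finding.
    public const int FallbackMaxRetries = 10;

    // Returns the configured value when it is positive, otherwise the fallback.
    public static int NormalizeMaxRetries(int configured)
        => configured > 0 ? configured : FallbackMaxRetries;

    // The parking decision described above, applied to the normalised limit:
    // park once the already-incremented retry count reaches the limit.
    public static bool ShouldPark(int retryCountAfterIncrement, int configuredMaxRetries)
        => retryCountAfterIncrement >= NormalizeMaxRetries(configuredMaxRetries);
}
```

With this guard, `ShouldPark(1, 0)` evaluates to `false`, so a zero-valued row no longer parks a notification on its first transient failure.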
**Resolution**
_Unresolved._
### NotificationOutbox-003 — Dispatcher does not propagate a `CancellationToken` into delivery; in-flight SMTP sends cannot be cancelled on shutdown
| | |
|--|--|
| Severity | Medium |
| Category | Error handling & resilience |
| Status | Resolved |
| Location | `src/ScadaLink.NotificationOutbox/NotificationOutboxActor.cs:334`, `src/ScadaLink.NotificationOutbox/Delivery/INotificationDeliveryAdapter.cs:22` |
**Description**
`DeliverOneAsync` calls `var outcome = await adapter.DeliverAsync(notification);`.
The second `CancellationToken` parameter on `INotificationDeliveryAdapter.DeliverAsync`
is left at its `default(CancellationToken)` value, meaning `CancellationToken.None`.
`EmailNotificationDeliveryAdapter.SendAsync` then threads that `None` token into
`smtp.ConnectAsync`, `smtp.AuthenticateAsync`, and `smtp.SendAsync`. The consequence
is that during a coordinated cluster shutdown (singleton handover, drain) any
in-flight SMTP send is uncancellable and the dispatcher's sweep must wait for the
underlying socket/SMTP timeout (`SmtpConfiguration.ConnectionTimeoutSeconds`) before
the sweep's task completes and `DispatchComplete` lowers the in-flight guard. With
the default connect-timeout values this is on the order of tens of seconds per
notification in the in-progress batch, blocking `CoordinatedShutdown`.
The adapter implementations clearly *expect* a token — the contract type is
`CancellationToken cancellationToken = default` everywhere — so this is a wiring
gap, not a missing interface.
**Recommendation**
Wire a per-sweep `CancellationTokenSource` linked to the actor's lifecycle (cancel
in `PostStop`) and pass its token into `DeliverAsync`. A linked source per sweep
also bounds individual deliveries by the configured connection timeout when a more
explicit per-attempt budget is wanted. Add a test that cancels mid-`DeliverAsync` and
asserts the dispatcher completes promptly and the row is left non-terminal
(`Pending`/`Retrying` unchanged) for the next sweep.
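
One possible shape for that wiring is sketched below. It deliberately avoids Akka types so it stays self-contained: `SweepRunner` is a hypothetical stand-in for the dispatcher, `Stop()` stands in for the actor's `PostStop`, and the string results are placeholders for the real status handling. Treat it as an illustration of the linked-token idea, not the actor's implementation.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for the dispatcher; not the actor's real shape.
public sealed class SweepRunner : IDisposable
{
    // Cancelled when the owner stops (the actor would do this in PostStop).
    private readonly CancellationTokenSource _lifetime = new CancellationTokenSource();

    public async Task<string> DeliverOneAsync(
        Func<CancellationToken, Task> deliver, TimeSpan perAttemptBudget)
    {
        // Linked source: cancels on shutdown or when the per-attempt budget expires.
        using var attempt = CancellationTokenSource.CreateLinkedTokenSource(_lifetime.Token);
        attempt.CancelAfter(perAttemptBudget);
        try
        {
            await deliver(attempt.Token);
            return "Delivered";
        }
        catch (OperationCanceledException)
        {
            // Leave the row non-terminal so the next sweep picks it up again.
            return "Unchanged";
        }
    }

    public void Stop() => _lifetime.Cancel();

    public void Dispose() => _lifetime.Dispose();
}
```

A regression test along the lines recommended above would start a delivery that never completes on its own, call `Stop()`, and assert that the result is the unchanged, non-terminal outcome.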
**Resolution**
_Unresolved._
### NotificationOutbox-004 — `EmitAttemptAudit`/`EmitTerminalAudit` fire-and-forget pattern can outlive the per-sweep DI scope
| | |
|--|--|
| Severity | Medium |
| Category | Akka.NET conventions |
| Status | Resolved |
| Location | `src/ScadaLink.NotificationOutbox/NotificationOutboxActor.cs:425-435`, `463-485` |
**Description**
Both emission helpers issue `_ = _auditWriter.WriteAsync(evt);` — discarding the
returned task. `CentralAuditWriter.WriteAsync` opens its own `await using var scope =
_services.CreateAsyncScope();` and resolves a scoped `IAuditLogRepository` (verified
at `src/ScadaLink.AuditLog/Central/CentralAuditWriter.cs:118-121`), so the writer is
defensively scope-independent. However the dispatcher already holds a per-sweep
`using var scope = _serviceProvider.CreateScope();` and the per-notification
`UpdateAsync` runs in that scope. The fire-and-forget pattern means:
1. The dispatcher's outer scope can be disposed (sweep done, `DispatchComplete`
piped) while the audit `WriteAsync` task is still running on a *different*
scope it owns — works today only because the writer creates its own scope.
2. A faulted unobserved task is silently lost: if `CentralAuditWriter.WriteAsync`
itself were ever made `async void` or refactored to not internally try/catch,
the dispatcher would never see the fault and the audit row would vanish without
the `_logger.LogWarning` reaching the operator.
3. The XML-doc above `EmitAttemptAudit` says "PipeTo is not used because the writer
never throws" — but the surrounding `try { _ = _auditWriter.WriteAsync(evt); }
catch (Exception ex)` will only catch a synchronous throw from the *task
construction*, not the awaited body of `WriteAsync`. The comment understates the
risk: the catch is structurally unreachable for the documented failure mode.

The system actually wants the *invariant* "audit write never affects delivery"
(verified by the `AuditWriter_Throws_…StillSucceeds` tests). That invariant is
better expressed by `await`-ing the writer inside the actor's outer try/catch (the
dispatcher already swallows per-notification exceptions) than by a discard-task,
which couples the lifetime of the dispatcher's scope to that of the audit task
through whatever scope graph the writer happens to use today.
**Recommendation**
Either `await _auditWriter.WriteAsync(evt)` inside the existing `try`/`catch` (the
preferred fix — preserves the invariant, plays well with the per-sweep scope, and
makes the catch block actually reachable), or — if a true fire-and-forget remains
desired — capture the returned task and attach a continuation that calls
`_logger.LogWarning` on faulted to keep diagnostics intact. Either way, fix the
"writer never throws" XML-doc to match the implementation.
**Resolution**
_Unresolved._
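
For reference, the `await`-inside-`try`/`catch` option recommended for this finding can be sketched as follows. The names are hypothetical: `AuditEmission.EmitAwaitedAsync` is not the actor's helper, `writeAsync` stands in for `_auditWriter.WriteAsync(evt)`, and `logWarning` stands in for `_logger.LogWarning`. The point is only that a fault in the awaited body now reaches the catch block.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical helper illustrating the recommended shape.
public static class AuditEmission
{
    // Awaits the writer inside try/catch, so a fault in the awaited body
    // is logged rather than lost on an unobserved task.
    public static async Task EmitAwaitedAsync(Func<Task> writeAsync, Action<Exception> logWarning)
    {
        try
        {
            await writeAsync();
        }
        catch (Exception ex)
        {
            // Audit failure is logged and swallowed; it never aborts delivery.
            logWarning(ex);
        }
    }
}
```

If fire-and-forget is still wanted, the caller can discard the task returned by this helper instead of the writer's own task; the fault is then still observed and logged inside the helper.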
### NotificationOutbox-005 — Ingest persistence inherits the CD-015 check-then-act race; under contention the second writer throws and the site retries
| | |
|--|--|
| Severity | Medium |
| Category | Concurrency & thread safety |
| Status | Resolved |
| Location | `src/ScadaLink.NotificationOutbox/NotificationOutboxActor.cs:127-132` (caller); root cause in `src/ScadaLink.ConfigurationDatabase/Repositories/NotificationOutboxRepository.cs:33-45` |
**Resolution (2026-05-28):** Closed by CD-015 — `NotificationOutboxRepository.InsertIfNotExistsAsync` (commit `ac96b83`) is now a single-statement `IF NOT EXISTS ... INSERT` via `ExecuteSqlInterpolatedAsync` with a `SqlException` filter swallowing duplicate-key violations (`2601`/`2627`) as a no-op (`return false`). The check-then-act window is eliminated; the at-least-once handoff contract holds and the actor's `PipeTo` success/failure projection no longer surfaces a permanent PK-violation back to the site. Verified in `src/ScadaLink.ConfigurationDatabase/Repositories/NotificationOutboxRepository.cs:51-103`.
**Description**
`HandleSubmit` → `PersistAsync` calls `repository.InsertIfNotExistsAsync(notification)`
on `INotificationOutboxRepository`. The current implementation
(`src/ScadaLink.ConfigurationDatabase/Repositories/NotificationOutboxRepository.cs`)
does a check-then-act with no duplicate-key catch — documented as CD-015 (High,
Open). The Notification Outbox's documented contract is "at-least-once handoff with
ack-after-persist plus insert-if-not-exists on `NotificationId`" (CLAUDE.md,
Component-NotificationOutbox.md §Ingest & Idempotency), and the duplicate-insert
race is the **expected contention pattern** — the site retries the same submission
after a lost ack. As written, the loser surfaces a `SqlException` (2627 PK
violation) wrapped in `DbUpdateException`, propagates through `PipeTo`'s failure
projection as a `NotificationSubmitAck { Accepted: false, Error: "... PRIMARY KEY ..." }`,
the site treats the ack as a forwarding failure and forwards the message **again**,
re-entering the same race. If the contending pair keeps racing this can livelock.
The actor side is fine — `PipeTo`'s success/failure projection correctly forwards
the exception message. The repository side needs the standard `2601/2627 → no-op`
pattern that AuditLog and SiteCall already use. This finding tracks the outbox-side
visibility of the CD-015 defect so a re-review of NotificationOutbox surfaces it
even if the reader has not yet read the ConfigurationDatabase findings.
**Recommendation**
Track CD-015 to resolution. As a defense-in-depth complement here, consider
treating a duplicate-key `DbUpdateException` in the actor's ingest failure
projection as `Accepted: true` so a lost ack between persisted-by-the-first-writer
and ack-back does not produce a permanent re-forward loop — but the cleanest fix
remains the CD-015 raw-SQL `IF NOT EXISTS … INSERT` with `2601/2627` catch in
`NotificationOutboxRepository`.
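
The `2601/2627 → no-op` pattern referred to above can be illustrated with an exception filter. This is a sketch, not the repository code: `FakeSqlError` is a hypothetical stand-in so the example runs without a database, and `IdempotentInsert` is an illustrative name. Real code would filter on the provider's `SqlException.Number` instead.

```csharp
using System;

// Hypothetical stand-in for a provider exception carrying a SQL Server error number.
public sealed class FakeSqlError : Exception
{
    public int Number { get; }
    public FakeSqlError(int number) : base($"SQL error {number}") => Number = number;
}

public static class IdempotentInsert
{
    // 2601 = duplicate key row in a unique index; 2627 = unique/PK constraint violation.
    private static bool IsDuplicateKey(Exception ex)
        => ex is FakeSqlError { Number: 2601 or 2627 };

    // Returns true if this call inserted the row, false if it already existed.
    // Any other failure still propagates to the caller.
    public static bool InsertIfNotExists(Action insert)
    {
        try
        {
            insert();
            return true;
        }
        catch (Exception ex) when (IsDuplicateKey(ex))
        {
            // Another writer won the race; the row exists, so this is a no-op.
            return false;
        }
    }
}
```

Because the loser of the race returns `false` instead of throwing, the ack sent back to the site can still report acceptance, which is what breaks the re-forward loop described in this finding.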
### NotificationOutbox-006 — `ResolveAdapters` rebuilds the `NotificationType → adapter` dictionary on every dispatch sweep
| | |
|--|--|
| Severity | Low |
| Category | Performance & resource management |
| Status | Resolved |
| Location | `src/ScadaLink.NotificationOutbox/NotificationOutboxActor.cs:267-277` |
**Description**
Every dispatch sweep calls `ResolveAdapters(scope.ServiceProvider)` which enumerates
`scopedServices.GetServices<INotificationDeliveryAdapter>()` and builds a fresh
`Dictionary<NotificationType, INotificationDeliveryAdapter>`. Adapter registration
is decided at startup (`AddNotificationOutbox` registers
`EmailNotificationDeliveryAdapter`); the registration set does not change at
runtime. With a default `DispatchInterval = 10s` and only ever one entry today, the
allocation overhead is trivial — but the comment "the last adapter registered for a
given type wins, mirroring DI's last-wins resolution semantics" elevates this to a
behaviour contract, and the per-sweep dictionary construction obscures the lookup's
identity from one sweep to the next, making any future stateful adapter (rate
limiter, circuit breaker) silently lose its state.
The same issue is the reason `EmailNotificationDeliveryAdapter` is *scoped* — it
holds a scoped `INotificationRepository`. A trivial cache-the-types-but-resolve-
the-instance fix is possible: cache the set of declared `NotificationType` values
and look up each adapter by `GetService<INotificationDeliveryAdapter>()`
filtered by `Type` per sweep.
**Recommendation**
Document the per-sweep contract explicitly ("each sweep gets a fresh adapter
instance per the scoped DI contract — adapters must not carry state across
sweeps") in the actor XML, or — preferred — cache only the *types* at startup
(`PreStart`) and resolve the scoped instance per sweep, so future adapters with
stateful intent (timeouts, circuit breakers) cannot accidentally lose state.
**Resolution (2026-05-28):** Cached the `NotificationType → adapter` dictionary on a new actor field `_adaptersCache`, built lazily on the first dispatch sweep that needs it. The cache is paired with an actor-lifetime `IServiceScope` (`_adaptersScope`, created on first use and disposed in `PostStop`) so the resolved scoped adapter instances live as long as the cache itself — respecting `EmailNotificationDeliveryAdapter`'s scoped lifetime without leaking captive-DbContext references across the actor's full lifetime. `RunDispatchPass` now calls `ResolveAdapters()` (instance method, no args), and the existing "resolved from the per-sweep scope" comment was rewritten to cross-reference the new cache rationale. Adapter set is static per process lifetime (confirmed against the DI registration in `AddNotificationOutbox`), so no invalidation hook is needed. Regression test `Dispatch_ResolvesAdaptersOnce_AcrossMultipleSweeps` registers a counting factory and pins the resolution count at exactly 1 across three end-to-end dispatch sweeps.
### NotificationOutbox-007 — `NotificationOutboxOptions.DispatchBatchSize`, `DeliveredKpiWindow`, and `PurgeInterval` are not in the design document
| | |
|--|--|
| Severity | Medium |
| Category | Design-document adherence |
| Status | Resolved |
| Location | `src/ScadaLink.NotificationOutbox/NotificationOutboxOptions.cs:13`, `:22`, `:25`; `docs/requirements/Component-NotificationOutbox.md:152-160` |
**Description**
`Component-NotificationOutbox.md` §Configuration enumerates three options: dispatch
interval, stuck-age threshold, and terminal-row retention window. The implemented
`NotificationOutboxOptions` adds three additional fields:
- `DispatchBatchSize` (default `100`) — caps the per-sweep claim size, but is invisible
to anyone reading only the spec.
- `PurgeInterval` (default `1 day`) — the design doc says "daily purge" as if the
cadence is fixed; in code it is configurable.
- `DeliveredKpiWindow` (default `1 min`) — the KPI section says "Delivered (last
interval)" without saying how long "last interval" is or that it is configurable.

The design doc also asserts "Delivery max-retry-count and retry interval are not
part of `NotificationOutboxOptions` — they are reused from the central SMTP
configuration" (line 160), and the implementation honours this. But the three additions
above are absent from the design doc. The KPI dashboard cadence and the dispatch
batch size are both operationally important values an operator/engineer will hunt
for; their absence from the spec is design drift.
**Recommendation**
Add the three fields to `Component-NotificationOutbox.md §Configuration` with their
defaults, or remove them from the implementation if they were meant to be fixed
constants. Cross-link `DeliveredKpiWindow` from the §Monitoring "Delivered (last
interval)" KPI bullet so a reader sees what controls the bucket length.
**Resolution (2026-05-28):**
Resolved the code-side gap by adding clear XML `<summary>` docs to every property on
`NotificationOutboxOptions` (`DispatchInterval`, `DispatchBatchSize`,
`StuckAgeThreshold`, `TerminalRetention`, `PurgeInterval`, `DeliveredKpiWindow`) —
each now states what it controls and its default value. The design-doc update
remains tracked separately.
### NotificationOutbox-008 — `FallbackMaxRetries` / `FallbackRetryDelay` path is unreachable in production AND untested
| | |
|--|--|
| Severity | Low |
| Category | Testing coverage |
| Status | Open |
| Location | `src/ScadaLink.NotificationOutbox/NotificationOutboxActor.cs:29-31`, `:251-259`; tests in `tests/ScadaLink.NotificationOutbox.Tests/NotificationOutboxActorDispatchTests.cs` |
**Description**
`ResolveRetryPolicyAsync` falls back to `FallbackMaxRetries = 10` and
`FallbackRetryDelay = 1 min` when `notificationRepository.GetAllSmtpConfigurationsAsync()`
returns an empty list (no SMTP configuration row). The comment correctly observes
that delivery itself will then return `Permanent("No SMTP configuration available")`
from `EmailNotificationDeliveryAdapter.cs:78-81`, so the fallback retry policy
never actually retries anything — the row is permanently parked on first attempt
regardless of retry count or delay.
This produces three concerns. (1) The fallback is essentially dead code — the retry
policy values are never consulted in practice because delivery always fails
permanently before the retry branch is reached. (2) The fallback can be reached
*after* a previously-deployed SMTP config is deleted, which is precisely the
moment an operator needs accurate audit trails; the row will say `Parked` with
`LastError = "No SMTP configuration available"` but the audit signal "retry policy
fell back to defaults" is invisible. (3) Tests never exercise either the fallback
path or the empty-SMTP-config dispatch path: `SetupSmtpRetryPolicy` always supplies
a config in every dispatch test.
**Recommendation**
Add a regression test that runs a dispatch sweep with no SMTP config row and
asserts the row is parked with the documented error. Optionally remove the fallback
constants if parking-with-no-config is the *intended* operational signal; document
the choice in the actor XML so a maintainer does not "fix" the unreachable code.
**Resolution**
_Unresolved._
### NotificationOutbox-009 — `StuckAgeThreshold` XML-doc says "in-progress notification is re-claimed" — contradicts the design's display-only stuck detection
| | |
|--|--|
| Severity | Low |
| Category | Documentation & comments |
| Status | Resolved |
| Location | `src/ScadaLink.NotificationOutbox/NotificationOutboxOptions.cs:15-16` |
**Description**
```csharp
/// <summary>Age past which an in-progress notification is considered stuck and re-claimed.</summary>
public TimeSpan StuckAgeThreshold { get; set; } = TimeSpan.FromMinutes(10);
```
The implementation never reclaims anything based on `StuckAgeThreshold`. It is used
only as a cutoff for the stuck-count KPI (`StuckCutoff`/`IsStuck` in
`NotificationOutboxActor.cs:932-942`) and as a `StuckCutoff` filter on paginated
queries. The design doc is explicit: "A notification is **stuck** if it is `Pending`
or `Retrying` and older than a configurable age threshold (default 10 minutes).
Detection is **display-only** — a count KPI and a row badge. There is no automated
escalation or alerting" (`Component-NotificationOutbox.md:143-145`). A maintainer
reading the XML and expecting "re-claim" behaviour will be surprised twice — once
when no re-claim happens, and once when they go looking for the re-claim code and
find none.
**Recommendation**
Rewrite the XML to match the design: "Age past which a still-`Pending`/`Retrying`
notification is counted as stuck on the KPI tile and the per-row badge.
Display-only — does not affect dispatch."
**Resolution (2026-05-28):**
Rewrote the `StuckAgeThreshold` XML-doc to make the display-only semantics explicit:
rows older than the threshold are flagged in KPIs/UI; there is no automatic re-claim,
requeue, or escalation. Matches `Component-NotificationOutbox.md §Monitoring`.
### NotificationOutbox-010 — Comment claims `PipeTo` is not used "because the writer never throws"; the surrounding try/catch is dead code for the documented failure mode
| | |
|--|--|
| Severity | Medium |
| Category | Documentation & comments |
| Status | Resolved |
| Location | `src/ScadaLink.NotificationOutbox/NotificationOutboxActor.cs:469-477` |
**Description**
```csharp
try
{
var evt = BuildNotifyDeliverEvent(notification, now, AuditStatus.Attempted, errorMessage)
with { DurationMs = durationMs };
// Fire-and-forget — we do NOT await: the dispatcher loop must not
// be blocked by audit IO, and the writer swallows its own faults.
// PipeTo is not used because the writer never throws.
_ = _auditWriter.WriteAsync(evt);
}
catch (Exception ex)
{
_logger.LogWarning(ex, "Failed to emit Attempted audit row …");
}
```
The XML-doc on `EmitAttemptAudit` is internally inconsistent and structurally
incorrect: (1) if "the writer never throws" then the surrounding try/catch is
unreachable and dead code; (2) if the writer *can* throw (and the catch is
meaningful) then "never throws" is wrong. In practice the catch only ever fires
on a synchronous throw from the writer's *task construction* — never on a fault
in the awaited body — because the discarded task is not observed. The current
behaviour matches the design intent ("audit failure NEVER aborts delivery"), but
the comment misleads the next reader on the *why*.
This is the same root cause as NotificationOutbox-004 — they target the same lines
from different angles (NotificationOutbox-004 is the scope-lifetime /
fire-and-forget Akka concern, NotificationOutbox-010 is the doc/comment-clarity
concern). Closing NotificationOutbox-004 by switching to `await` resolves both.
**Recommendation**
If `await`-ing the writer (recommended fix per NotificationOutbox-004): delete the
"PipeTo is not used because the writer never throws" line entirely and let
the try/catch's behaviour speak for itself. If keeping fire-and-forget: rewrite
the comment to "fire-and-forget by design (the writer is responsible for its
own failure handling); the surrounding try/catch only catches the synchronous
task-construction throw and is otherwise unreachable."
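
The claim that the catch cannot see a fault in the awaited body is easy to reproduce in isolation. The sketch below uses hypothetical names (`DiscardedTaskDemo`, `FaultingWriteAsync`) and a writer that always faults after its first `await`; it is a demonstration of the language behaviour, not code from the actor.

```csharp
using System;
using System.Threading.Tasks;

public static class DiscardedTaskDemo
{
    // An async writer whose body faults after its first await.
    private static async Task FaultingWriteAsync()
    {
        await Task.Yield();
        throw new InvalidOperationException("audit store unavailable");
    }

    // Returns true if the catch block ran. With a discarded task it does not:
    // the exception is stored on the task, not thrown at the call site.
    public static bool CatchRunsForDiscardedTask()
    {
        try
        {
            _ = FaultingWriteAsync();
            return false;
        }
        catch (Exception)
        {
            return true;
        }
    }

    // Returns true if the catch block ran. Awaiting rethrows the stored
    // exception at the await, so the catch is reachable.
    public static async Task<bool> CatchRunsWhenAwaitedAsync()
    {
        try
        {
            await FaultingWriteAsync();
            return false;
        }
        catch (InvalidOperationException)
        {
            return true;
        }
    }
}
```

The first method reports that the catch did not run and the second that it did, which is the distinction the comment rewrite needs to capture.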
**Resolution**
_Unresolved._