f93b7b99bb
Re-applies the full 10-category checklist to every src/ project — including
first-time reviews of the four newer components (AuditLog, NotificationOutbox,
SiteCallAudit, Transport) — so the code-reviews/ index reflects today's
codebase rather than the 2026-05-16 baseline. 172 new Open findings (0
Critical, 18 High, 62 Medium, 92 Low); 481 findings total across 23 modules.
regen-readme.py now derives each module's Last reviewed + Commit from its
findings.md header instead of hard-coding 2026-05-16 / 9c60592, so future
single-module re-reviews show their own date in the Module Status table.
489 lines
26 KiB
Markdown
# Code Review — NotificationOutbox

| Field | Value |
|-------|-------|
| Module | `src/ScadaLink.NotificationOutbox` |
| Design doc | `docs/requirements/Component-NotificationOutbox.md` |
| Status | Reviewed |
| Last reviewed | 2026-05-28 |
| Reviewer | claude-agent |
| Commit reviewed | `1eb6e97` |
| Open findings | 10 |

## Summary

NotificationOutbox is a small, focused module: one ~985-line actor
(`NotificationOutboxActor`), a strongly-typed options class, an
`INotificationDeliveryAdapter` seam, and the single concrete `EmailNotificationDeliveryAdapter`.
The Akka.NET conventions are textbook: every async path is wrapped with `PipeTo`, the
dispatcher uses an in-flight guard cleared on `DispatchComplete`, the sender is captured
before crossing the await, and the actor isolates per-notification failures so one bad row
never aborts a batch. Test coverage is broad: ingest, dispatch, query, retry/discard,
purge, KPI, and the new audit-emission paths (B2 attempts + B3 terminals) all have
dedicated test files, and the audit-write-failure-never-aborts-delivery contract is
explicitly asserted.

The dominant theme is **trust-boundary leakage between Outbox, NotificationService, and
ConfigurationDatabase**. The outbox inherits two known defects from its sibling modules:
the OAuth2 SASL empty-user bug (NS-021), reachable through
`EmailNotificationDeliveryAdapter`, ships every M365 send with `user=""`, and the
`InsertIfNotExistsAsync` check-then-act race (CD-015) lives on the outbox's ack-after-persist
hot path. Neither is a defect of code under `src/ScadaLink.NotificationOutbox/`, but both
are surfaced here because production dispatch and ingest go through these exact lines.
A secondary theme is **dispatcher-fire-and-forget audit writes** (`_ = _auditWriter.WriteAsync(...)`)
that can race the per-sweep scope dispose under the wrong DI graph, and a few smaller
drifts: the dispatcher passes `CancellationToken.None` to adapter delivery (no graceful
shutdown for in-flight SMTP sends), the `StuckAgeThreshold` XML-doc describes a re-claim
behaviour the design explicitly rules out (stuck detection is display-only), the
`MaxRetries` boundary check uses `>=` against a config value that can be zero (immediate
park on first transient failure), and several `NotificationOutboxOptions` fields are
documented in code but absent from `Component-NotificationOutbox.md`. No Critical
findings; two High, five Medium, three Low.

## Checklist coverage

| # | Category | Examined | Notes |
|---|----------|----------|-------|
| 1 | Correctness & logic bugs | Yes | `MaxRetries` zero/negative immediately parks (NotificationOutbox-002); `StuckAgeThreshold` XML doc contradicts design (NotificationOutbox-009); `Guid.TryParse` accepts compact `"N"` ids emitted by sites. |
| 2 | Akka.NET conventions | Yes | `PipeTo` / sender-capture / in-flight guard pattern is correctly applied throughout. Fire-and-forget `_ = _auditWriter.WriteAsync(...)` raises a scope-lifetime concern (NotificationOutbox-004). |
| 3 | Concurrency & thread safety | Yes | Actor state mutated only on actor thread. Inherited CD-015 race on `InsertIfNotExistsAsync` (NotificationOutbox-005) is the only race; the dispatcher's in-flight guard correctly serializes sweeps. |
| 4 | Error handling & resilience | Yes | Outer try/catch on `RunDispatchPass`/`RunPurgePass` keeps the in-flight guard sane; per-notification isolation is correct. CT not threaded into delivery (NotificationOutbox-003). |
| 5 | Security | Yes | Inherited OAuth2 empty-user (NotificationOutbox-001) reachable through the adapter. No new credential or trust-boundary issues introduced by the outbox itself. |
| 6 | Performance & resource management | Yes | Dispatch interval & batch size are simple polling; `ResolveAdapters` rebuilds the lookup per sweep (NotificationOutbox-006). No leaks. |
| 7 | Design-document adherence | Yes | `NotificationOutboxOptions.DispatchBatchSize`, `DeliveredKpiWindow`, `PurgeInterval` are not in the design doc (NotificationOutbox-007). |
| 8 | Code organization & conventions | Yes | Options class lives in the component project (correct); DI extension lives in the component (correct); adapter is `scoped`, actor singleton; interaction correctly documented in `ServiceCollectionExtensions`. No issues. |
| 9 | Testing coverage | Yes | Solid actor-behaviour coverage. Missing tests for `FallbackMaxRetries` / empty-SMTP-config dispatch path (NotificationOutbox-008). |
| 10 | Documentation & comments | Yes | XML on `StuckAgeThreshold` misleading (NotificationOutbox-009); XML on dispatcher's audit `_ =` fire-and-forget says "writer never throws" but `EmitAttemptAudit` still wraps in try/catch, so the comment contradicts itself (NotificationOutbox-010). |

## Findings

### NotificationOutbox-001 — `EmailNotificationDeliveryAdapter` inherits the OAuth2 empty-user SASL bug (NS-021) on the M365 send path

| | |
|--|--|
| Severity | High |
| Category | Correctness & logic bugs |
| Status | Open |
| Location | `src/ScadaLink.NotificationOutbox/Delivery/EmailNotificationDeliveryAdapter.cs:185-191` (calls `smtp.AuthenticateAsync("oauth2", token)`); root cause in `src/ScadaLink.NotificationService/MailKitSmtpClientWrapper.cs:76-79` |

**Description**

`EmailNotificationDeliveryAdapter.SendAsync` resolves an OAuth2 access token via
`_tokenService.GetTokenAsync(...)` and then calls
`await smtp.AuthenticateAsync(config.AuthType, credentials, cancellationToken);`
on `ISmtpClientWrapper`. The production implementation (`MailKitSmtpClientWrapper`)
constructs `new SaslMechanismOAuth2("", credentials)`, leaving the user-name field empty,
which Microsoft 365 SMTP rejects with `535 5.7.3 Authentication unsuccessful`. The
sibling NotificationService finding NS-021 documents this in full; the outbox is the
*new home* for delivery on central, so every OAuth2 send that the outbox dispatches
hits this code path. The defect is therefore reachable here even though the offending
constructor lives in the NotificationService project, and the central-only redesign
means this is now the only delivery path in production. Existing outbox tests do not
catch it because they all substitute `ISmtpClientWrapper` and assert only that
`AuthenticateAsync` is invoked with `("oauth2", "<token>")`; the real
`SaslMechanismOAuth2` is never instantiated. `OAuth2TokenService.GetTokenAsync` is
explicitly wired to `login.microsoftonline.com/.../oauth2/v2.0/token` with
`scope=https://outlook.office365.com/.default`, so M365 SMTP is the intended target,
and it is precisely the relay that requires the user field to be populated.

**Recommendation**

Track the NS-021 fix and add an outbox-side regression test once the wrapper signature
is widened. Concretely, when `ISmtpClientWrapper.AuthenticateAsync` is extended to
accept the sender mailbox (or a dedicated `oauth2UserName` parameter), update
`EmailNotificationDeliveryAdapter.SendAsync` to pass `config.FromAddress`, and add a
test in `EmailNotificationDeliveryAdapterTests` that asserts the OAuth2 path forwards
the sender identity. Until then, keep this finding open here so the outbox is not
treated as unaffected while NS-021 remains unresolved.

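To illustrate why the empty user name matters, the sketch below hand-builds the SASL
XOAUTH2 initial client response. It is an illustration only: in production MailKit's
`SaslMechanismOAuth2` assembles this payload, and the `XOAuth2Sketch` class, its
method names, and the example mailbox are hypothetical.

```csharp
using System;
using System.Text;

// Hypothetical helper, not part of ScadaLink or MailKit.
public static class XOAuth2Sketch
{
    // XOAUTH2 initial response, before base64:
    //   "user=" {mailbox} ^A "auth=Bearer " {token} ^A ^A   (^A is U+0001)
    public static string BuildInitialResponse(string userName, string accessToken)
    {
        // Assumption: failing fast on an empty mailbox is preferable to sending "user=".
        if (string.IsNullOrWhiteSpace(userName))
            throw new ArgumentException("XOAUTH2 needs the mailbox the token was issued for.", nameof(userName));

        var raw = $"user={userName}\u0001auth=Bearer {accessToken}\u0001\u0001";
        return Convert.ToBase64String(Encoding.UTF8.GetBytes(raw));
    }

    // Decodes a response for inspection, rendering each ^A separator as '|'.
    public static string Decode(string response) =>
        Encoding.UTF8.GetString(Convert.FromBase64String(response)).Replace('\u0001', '|');
}
```

With `userName` set to the sender mailbox (for example `config.FromAddress`, as
recommended above) the decoded payload starts with `user=<mailbox>`. With `""` it would
start with a bare `user=`, which is the shape NS-021 reports M365 rejecting.
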

**Resolution**

_Unresolved._

### NotificationOutbox-002 — Dispatcher parks on first transient failure when `SmtpConfiguration.MaxRetries == 0`

| | |
|--|--|
| Severity | High |
| Category | Correctness & logic bugs |
| Status | Open |
| Location | `src/ScadaLink.NotificationOutbox/NotificationOutboxActor.cs:348-360` |

**Description**

The transient-failure branch increments `RetryCount` then evaluates
`if (notification.RetryCount >= maxRetries) notification.Status = NotificationStatus.Parked;`.
`maxRetries` is read from the central `SmtpConfiguration.MaxRetries` column, which has
no enforced lower bound and is not validated by the outbox. A row whose `MaxRetries`
is `0` (or any negative value) immediately satisfies `1 >= 0` on the very first
transient failure, so the notification is parked without a single retry. That directly
contradicts the design doc's "fixed retry interval, reuse central SMTP
max-retry-count" intent, where a configured value of zero would naturally read as
"never retry, fail straight to permanent". `SetupSmtpRetryPolicy` in the dispatch
tests always supplies a positive value, so this path is not exercised.

Additionally, an operator who clears the SMTP config row drops into the
`FallbackMaxRetries = 10` / `FallbackRetryDelay = 1 min` path
(`ResolveRetryPolicyAsync` line 251); that path is also untested (see
NotificationOutbox-008). The operational result is that a single bad SMTP config
value silently disables retries and weakens the outbox's delivery guarantees.

**Recommendation**

Validate `MaxRetries` at the read point: either treat a non-positive value as the
configured fallback (currently `FallbackMaxRetries = 10`) or, preferably, surface the
mis-configuration to the operator via a health metric and refuse to dispatch until
the row is corrected. Either way, add a test that asserts the dispatcher's behaviour
for `MaxRetries == 0` and `MaxRetries < 0`.

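A minimal sketch of the first option (non-positive values fall back to the default)
might look like the following. `RetryPolicySketch`, `RetryDecision`, and the method
names are hypothetical; only the `>=` comparison and the fallback value of 10 come
from the finding. The preferred option (health metric plus refusing to dispatch) is
not shown.

```csharp
using System;

// Hypothetical names; the real check lives inline in NotificationOutboxActor.
public enum RetryDecision { Retry, Park }

public static class RetryPolicySketch
{
    // Value quoted in this review as the current fallback.
    public const int FallbackMaxRetries = 10;

    // Validate at the read point: zero or negative is treated as mis-configuration.
    public static int NormalizeMaxRetries(int configured) =>
        configured > 0 ? configured : FallbackMaxRetries;

    // retryCount is the value AFTER the transient-failure increment,
    // matching the order of operations described above.
    public static RetryDecision AfterTransientFailure(int retryCount, int configuredMaxRetries)
    {
        var maxRetries = NormalizeMaxRetries(configuredMaxRetries);
        return retryCount >= maxRetries ? RetryDecision.Park : RetryDecision.Retry;
    }
}
```

Under this sketch a row with `MaxRetries == 0` retries like a row configured with 10
instead of parking on the first transient failure. Whether silently falling back is
acceptable, or the dispatcher should stop and alert, remains the design decision the
recommendation calls out.
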

**Resolution**

_Unresolved._

### NotificationOutbox-003 — Dispatcher does not propagate a `CancellationToken` into delivery; in-flight SMTP sends cannot be cancelled on shutdown

| | |
|--|--|
| Severity | Medium |
| Category | Error handling & resilience |
| Status | Open |
| Location | `src/ScadaLink.NotificationOutbox/NotificationOutboxActor.cs:334`, `src/ScadaLink.NotificationOutbox/Delivery/INotificationDeliveryAdapter.cs:22` |

**Description**

`DeliverOneAsync` calls `var outcome = await adapter.DeliverAsync(notification);`, so
the second `CancellationToken` parameter on `INotificationDeliveryAdapter.DeliverAsync`
is left at its `default(CancellationToken)` value, meaning `CancellationToken.None`.
`EmailNotificationDeliveryAdapter.SendAsync` then threads that `None` token into
`smtp.ConnectAsync`, `smtp.AuthenticateAsync`, and `smtp.SendAsync`. The consequence
is that during a coordinated cluster shutdown (singleton handover, drain) any
in-flight SMTP send is uncancellable and the dispatcher's sweep must wait for the
underlying socket/SMTP timeout (`SmtpConfiguration.ConnectionTimeoutSeconds`) before
the sweep's task completes and `DispatchComplete` lowers the in-flight guard. With
the default connect-timeout values this is on the order of tens of seconds per
notification in the in-progress batch, blocking `CoordinatedShutdown`.

The adapter implementations clearly *expect* a token (the parameter is declared as
`CancellationToken cancellationToken = default` everywhere), so this is a wiring
gap, not a missing interface.

**Recommendation**

Wire a per-sweep `CancellationTokenSource` linked to the actor's lifecycle (cancel
in `PostStop`) and pass its token into `DeliverAsync`. A linked source per sweep
can also bound individual deliveries by the configured connection timeout when a more
explicit per-attempt budget is wanted. Add a test that cancels mid-`DeliverAsync` and
asserts the dispatcher completes promptly and the row is left non-terminal
(`Pending`/`Retrying` unchanged) for the next sweep.

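The wiring could be sketched roughly as below, using only the BCL. The
`SweepCancellationSketch` class and its members are hypothetical stand-ins: in the
real actor `Stop()` would be called from `PostStop`, and the `deliver` delegate would
be the call to `adapter.DeliverAsync(notification, token)`.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for the actor-side wiring.
public sealed class SweepCancellationSketch : IDisposable
{
    // Cancelled once, when the owning actor stops.
    private readonly CancellationTokenSource _lifetime = new CancellationTokenSource();

    // Returns true when delivery ran to completion, false when it was cancelled.
    // On false the caller leaves the row non-terminal for the next sweep.
    public async Task<bool> RunSweepAsync(Func<CancellationToken, Task> deliver, TimeSpan budget)
    {
        // Linked source: fires on actor stop OR when the budget elapses.
        using var sweep = CancellationTokenSource.CreateLinkedTokenSource(_lifetime.Token);
        sweep.CancelAfter(budget);
        try
        {
            await deliver(sweep.Token);
            return true;
        }
        catch (OperationCanceledException)
        {
            return false;
        }
    }

    public void Stop() => _lifetime.Cancel();

    public void Dispose() => _lifetime.Dispose();
}
```

A test in the spirit of the recommendation would start `RunSweepAsync` with a
long-running delegate, call `Stop()`, and assert the task completes promptly with
`false` while the notification status stays `Pending`/`Retrying`.
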

**Resolution**

_Unresolved._

### NotificationOutbox-004 — `EmitAttemptAudit`/`EmitTerminalAudit` fire-and-forget pattern can outlive the per-sweep DI scope

| | |
|--|--|
| Severity | Medium |
| Category | Akka.NET conventions |
| Status | Open |
| Location | `src/ScadaLink.NotificationOutbox/NotificationOutboxActor.cs:425-435`, `463-485` |

**Description**

Both emission helpers issue `_ = _auditWriter.WriteAsync(evt);`, discarding the
returned task. `CentralAuditWriter.WriteAsync` opens its own
`await using var scope = _services.CreateAsyncScope();` and resolves a scoped
`IAuditLogRepository` (verified at
`src/ScadaLink.AuditLog/Central/CentralAuditWriter.cs:118-121`), so the writer is
defensively scope-independent. However the dispatcher already holds a per-sweep
`using var scope = _serviceProvider.CreateScope();` and the per-notification
`UpdateAsync` runs in that scope. The fire-and-forget pattern means:

1. The dispatcher's outer scope can be disposed (sweep done, `DispatchComplete`
   piped) while the audit `WriteAsync` task is still running on a *different*
   scope it owns. This works today only because the writer creates its own scope.
2. A faulted unobserved task is silently lost: if `CentralAuditWriter.WriteAsync`
   itself were ever made `async void` or refactored to not internally try/catch,
   the dispatcher would never see the fault and the audit row would vanish without
   the `_logger.LogWarning` reaching the operator.
3. The XML-doc above `EmitAttemptAudit` says "PipeTo is not used because the writer
   never throws", but the surrounding
   `try { _ = _auditWriter.WriteAsync(evt); } catch (Exception ex)` will only catch a
   synchronous throw from the *task construction*, not the awaited body of
   `WriteAsync`. The comment understates the risk: the catch is structurally
   unreachable for the documented failure mode.

The system actually wants the *invariant* "audit write never affects delivery"
(verified by the `AuditWriter_Throws_…StillSucceeds` tests). That invariant is
better expressed by `await`-ing the writer inside the actor's outer try/catch (the
dispatcher already swallows per-notification exceptions) than by a discard-task,
which couples the lifetime of the dispatcher's scope to that of the audit task
through whatever scope graph the writer happens to use today.

**Recommendation**

Either `await _auditWriter.WriteAsync(evt)` inside the existing `try`/`catch` (the
preferred fix: it preserves the invariant, plays well with the per-sweep scope, and
makes the catch block actually reachable), or, if a true fire-and-forget remains
desired, capture the returned task and attach a continuation that calls
`_logger.LogWarning` on fault to keep diagnostics intact. Either way, fix the
"writer never throws" XML-doc to match the implementation.

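The two options can be contrasted in a small, self-contained sketch. The
`AuditEmitSketch` class is hypothetical; `writeAsync` stands in for
`_auditWriter.WriteAsync(evt)` and `logWarning` for `_logger.LogWarning`.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical helpers, for illustration only.
public static class AuditEmitSketch
{
    // Option A (preferred in this finding): await inside try/catch, so both a
    // synchronous throw and an asynchronous fault reach the catch block.
    public static async Task EmitAwaitedAsync(Func<Task> writeAsync, Action<Exception> logWarning)
    {
        try
        {
            await writeAsync();
        }
        catch (Exception ex)
        {
            logWarning(ex);
        }
    }

    // Option B: stay fire-and-forget, but observe the task so a fault is logged
    // rather than lost. The returned task is only there so tests can wait on it.
    public static Task EmitObserved(Func<Task> writeAsync, Action<Exception> logWarning)
    {
        Task write;
        try
        {
            write = writeAsync(); // a synchronous throw surfaces here
        }
        catch (Exception ex)
        {
            logWarning(ex);
            return Task.CompletedTask;
        }

        return write.ContinueWith(t =>
        {
            var fault = t.Exception; // non-null only when the write faulted
            if (fault != null) logWarning(fault.GetBaseException());
        }, TaskScheduler.Default);
    }
}
```

In both variants the audit failure is logged and never propagates to the caller,
which is the invariant the existing `AuditWriter_Throws_…StillSucceeds` tests assert.
Option A additionally keeps the write inside the sweep's lifetime.
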

**Resolution**

_Unresolved._

### NotificationOutbox-005 — Ingest persistence inherits the CD-015 check-then-act race; under contention the second writer throws and the site retries

| | |
|--|--|
| Severity | Medium |
| Category | Concurrency & thread safety |
| Status | Open |
| Location | `src/ScadaLink.NotificationOutbox/NotificationOutboxActor.cs:127-132` (caller); root cause in `src/ScadaLink.ConfigurationDatabase/Repositories/NotificationOutboxRepository.cs:33-45` |

**Description**

`HandleSubmit` → `PersistAsync` calls `repository.InsertIfNotExistsAsync(notification)`
on `INotificationOutboxRepository`. The current implementation
(`src/ScadaLink.ConfigurationDatabase/Repositories/NotificationOutboxRepository.cs`)
does a check-then-act with no duplicate-key catch, documented as CD-015 (High,
Open). The Notification Outbox's documented contract is "at-least-once handoff with
ack-after-persist plus insert-if-not-exists on `NotificationId`" (CLAUDE.md,
Component-NotificationOutbox.md §Ingest & Idempotency), and the duplicate-insert
race is the **expected contention pattern**: the site retries the same submission
after a lost ack. As written, the loser surfaces a `SqlException` (2627 PK
violation) wrapped in `DbUpdateException`, which propagates through `PipeTo`'s failure
projection as a `NotificationSubmitAck { Accepted: false, Error: "... PRIMARY KEY ..." }`;
the site treats the ack as a forwarding failure and forwards the message **again**,
re-entering the same race. If the contending pair keeps racing this can livelock.

The actor side is fine: `PipeTo`'s success/failure projection correctly forwards
the exception message. The repository side needs the standard `2601/2627 → no-op`
pattern that AuditLog and SiteCall already use. This finding tracks the outbox-side
visibility of the CD-015 defect so a re-review of NotificationOutbox surfaces it
even if the reader has not yet read the ConfigurationDatabase findings.

**Recommendation**

Track CD-015 to resolution. As a defense-in-depth complement here, consider
treating a duplicate-key `DbUpdateException` in the actor's ingest failure
projection as `Accepted: true` so a lost ack between persisted-by-the-first-writer
and ack-back does not produce a permanent re-forward loop. The cleanest fix
remains the CD-015 raw-SQL `IF NOT EXISTS … INSERT` with `2601/2627` catch in
`NotificationOutboxRepository`.

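The `2601/2627 → no-op` pattern can be sketched as below. This is a self-contained
illustration, not the repository code: `FakeSqlException` is a hypothetical stand-in
for the provider's `SqlException` (whose `Number` property carries the SQL Server
error code), and `IdempotentInsertSketch` is likewise an invented name.

```csharp
using System;

// Stand-in for the provider exception so the sketch has no package dependencies.
public sealed class FakeSqlException : Exception
{
    public int Number { get; }

    public FakeSqlException(int number) : base($"SQL error {number}") => Number = number;
}

public static class IdempotentInsertSketch
{
    // 2601 = duplicate key in a unique index, 2627 = PK/unique constraint violation.
    // Walks InnerException because EF Core wraps the provider error in DbUpdateException.
    public static bool IsDuplicateKey(Exception ex)
    {
        for (var e = ex; e != null; e = e.InnerException)
        {
            if (e is FakeSqlException sql && (sql.Number == 2601 || sql.Number == 2627))
                return true;
        }
        return false;
    }

    // Returns true when the row is persisted, whether by this call or by a racing
    // writer. Any other failure still propagates to the caller.
    public static bool InsertIfNotExists(Action insert)
    {
        try
        {
            insert();
            return true;
        }
        catch (Exception ex) when (IsDuplicateKey(ex))
        {
            return true; // the other writer won the race; treat as success
        }
    }
}
```

Applied in the repository, the losing writer returns normally and the site receives
`Accepted: true`, which ends the re-forward loop described above.
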

**Resolution**

_Unresolved._

### NotificationOutbox-006 — `ResolveAdapters` rebuilds the `NotificationType → adapter` dictionary on every dispatch sweep

| | |
|--|--|
| Severity | Low |
| Category | Performance & resource management |
| Status | Open |
| Location | `src/ScadaLink.NotificationOutbox/NotificationOutboxActor.cs:267-277` |

**Description**

Every dispatch sweep calls `ResolveAdapters(scope.ServiceProvider)` which enumerates
`scopedServices.GetServices<INotificationDeliveryAdapter>()` and builds a fresh
`Dictionary<NotificationType, INotificationDeliveryAdapter>`. Adapter registration
is decided at startup (`AddNotificationOutbox` registers
`EmailNotificationDeliveryAdapter`); the registration set does not change at
runtime. With a default `DispatchInterval = 10s` and only ever one entry today, the
allocation overhead is trivial. However, the comment "the last adapter registered for a
given type wins, mirroring DI's last-wins resolution semantics" elevates this to a
behaviour contract, and the per-sweep dictionary construction obscures the lookup's
identity from one sweep to the next, making any future stateful adapter (rate
limiter, circuit breaker) silently lose its state.

The same issue is the reason `EmailNotificationDeliveryAdapter` is *scoped*: it
holds a scoped `INotificationRepository`. A trivial cache-the-types-but-resolve-the-instance
fix is possible: cache the set of declared `NotificationType` values
and look up each adapter via `GetServices<INotificationDeliveryAdapter>()`
filtered by `Type` per sweep.

**Recommendation**

Document the per-sweep contract explicitly ("each sweep gets a fresh adapter
instance per the scoped DI contract; adapters must not carry state across
sweeps") in the actor XML, or (preferred) cache only the *types* at startup
(`PreStart`) and resolve the scoped instance per sweep, so future adapters with
stateful intent (rate limiters, circuit breakers) cannot accidentally lose state.

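One possible shape for the cache-the-types variant is sketched below. Everything in
it is a simplified, hypothetical stand-in: the enum and interface only mirror the
names used in this review (the `Sms` member is invented for the example), and the
`scopedAdapters` argument represents what
`GetServices<INotificationDeliveryAdapter>()` would return inside a sweep's scope.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Simplified stand-ins for the real types.
public enum NotificationType { Email, Sms }

public interface INotificationDeliveryAdapter
{
    NotificationType Type { get; }
}

public sealed class EmailAdapterStub : INotificationDeliveryAdapter
{
    public NotificationType Type => NotificationType.Email;
}

public sealed class AdapterCatalogSketch
{
    private readonly HashSet<NotificationType> _knownTypes;

    // Built once (PreStart in the real actor) from a single probe resolution.
    public AdapterCatalogSketch(IEnumerable<INotificationDeliveryAdapter> probe) =>
        _knownTypes = new HashSet<NotificationType>(probe.Select(a => a.Type));

    public bool Handles(NotificationType type) => _knownTypes.Contains(type);

    // Called per sweep with that sweep's scoped instances.
    // The last registration for a type wins, matching the comment quoted above.
    public INotificationDeliveryAdapter Resolve(
        NotificationType type,
        IEnumerable<INotificationDeliveryAdapter> scopedAdapters)
    {
        if (!_knownTypes.Contains(type))
            throw new InvalidOperationException($"No adapter registered for {type}.");
        return scopedAdapters.Last(a => a.Type == type);
    }
}
```

The set of supported types is fixed at startup while each sweep still receives a
fresh scoped instance, so any state that must survive across sweeps would have to
live in a separately registered singleton rather than in the adapter itself.
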

**Resolution**

_Unresolved._

### NotificationOutbox-007 — `NotificationOutboxOptions.DispatchBatchSize`, `DeliveredKpiWindow`, and `PurgeInterval` are not in the design document

| | |
|--|--|
| Severity | Medium |
| Category | Design-document adherence |
| Status | Open |
| Location | `src/ScadaLink.NotificationOutbox/NotificationOutboxOptions.cs:13`, `:22`, `:25`; `docs/requirements/Component-NotificationOutbox.md:152-160` |

**Description**

`Component-NotificationOutbox.md` §Configuration enumerates three options: dispatch
interval, stuck-age threshold, and terminal-row retention window. The implemented
`NotificationOutboxOptions` adds three additional fields:

- `DispatchBatchSize` (default `100`): caps the per-sweep claim size, but is invisible
  to anyone reading only the spec.
- `PurgeInterval` (default `1 day`): the design doc says "daily purge" as if the
  cadence is fixed; in code it is configurable.
- `DeliveredKpiWindow` (default `1 min`): the KPI section says "Delivered (last
  interval)" without saying how long "last interval" is or that it is configurable.

The design doc also asserts "Delivery max-retry-count and retry interval are not
part of `NotificationOutboxOptions` — they are reused from the central SMTP
configuration" (line 160), and the implementation honours this. But the three additions
above are absent from the design doc. The KPI dashboard cadence and the dispatch
batch size are both operationally important values an operator/engineer will hunt
for; their absence from the spec is design drift.

**Recommendation**

Add the three fields to `Component-NotificationOutbox.md §Configuration` with their
defaults, or remove them from the implementation if they were meant to be fixed
constants. Cross-link `DeliveredKpiWindow` from the §Monitoring "Delivered (last
interval)" KPI bullet so a reader sees what controls the bucket length.

**Resolution**

_Unresolved._

### NotificationOutbox-008 — `FallbackMaxRetries` / `FallbackRetryDelay` path is unreachable in production AND untested

| | |
|--|--|
| Severity | Low |
| Category | Testing coverage |
| Status | Open |
| Location | `src/ScadaLink.NotificationOutbox/NotificationOutboxActor.cs:29-31`, `:251-259`; tests in `tests/ScadaLink.NotificationOutbox.Tests/NotificationOutboxActorDispatchTests.cs` |

**Description**

`ResolveRetryPolicyAsync` falls back to `FallbackMaxRetries = 10` and
`FallbackRetryDelay = 1 min` when `notificationRepository.GetAllSmtpConfigurationsAsync()`
returns an empty list (no SMTP configuration row). The comment correctly observes
that delivery itself will then return `Permanent("No SMTP configuration available")`
from `EmailNotificationDeliveryAdapter.cs:78-81`, so the fallback retry policy
never actually retries anything: the row is permanently parked on first attempt
regardless of retry count or delay.

This produces three concerns. (1) The fallback is essentially dead code: the retry
policy values are never consulted in practice because delivery always fails
permanently before the retry branch is reached. (2) The fallback can be reached
*after* a previously-deployed SMTP config is deleted, which is precisely the
moment an operator needs accurate audit trails; the row will say `Parked` with
`LastError = "No SMTP configuration available"` but the audit signal "retry policy
fell back to defaults" is invisible. (3) Tests never exercise either the fallback
path or the empty-SMTP-config dispatch path: `SetupSmtpRetryPolicy` always supplies
a config in every dispatch test.

**Recommendation**

Add a regression test that runs a dispatch sweep with no SMTP config row and
asserts the row is parked with the documented error. Optionally remove the fallback
constants if parking-with-no-config is the *intended* operational signal; document
the choice in the actor XML so a maintainer does not "fix" the unreachable code.

**Resolution**

_Unresolved._

### NotificationOutbox-009 — `StuckAgeThreshold` XML-doc says "in-progress notification is re-claimed" — contradicts the design's display-only stuck detection

| | |
|--|--|
| Severity | Low |
| Category | Documentation & comments |
| Status | Open |
| Location | `src/ScadaLink.NotificationOutbox/NotificationOutboxOptions.cs:15-16` |

**Description**

```csharp
/// <summary>Age past which an in-progress notification is considered stuck and re-claimed.</summary>
public TimeSpan StuckAgeThreshold { get; set; } = TimeSpan.FromMinutes(10);
```

The implementation never reclaims anything based on `StuckAgeThreshold`. It is used
only as a cutoff for the stuck-count KPI (`StuckCutoff`/`IsStuck` in
`NotificationOutboxActor.cs:932-942`) and as a `StuckCutoff` filter on paginated
queries. The design doc is explicit: "A notification is **stuck** if it is `Pending`
or `Retrying` and older than a configurable age threshold (default 10 minutes).
Detection is **display-only** — a count KPI and a row badge. There is no automated
escalation or alerting" (`Component-NotificationOutbox.md:143-145`). A maintainer
reading the XML and expecting "re-claim" behaviour will be surprised twice: once
when no re-claim happens, and once when they go looking for the re-claim code and
find none.

**Recommendation**

Rewrite the XML to match the design: "Age past which a still-`Pending`/`Retrying`
notification is counted as stuck on the KPI tile and the per-row badge.
Display-only; does not affect dispatch."

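For reference, the display-only rule quoted from the design doc reduces to a pure
predicate. The sketch below is an assumption-laden illustration rather than the
actor's `IsStuck`: the enum members other than `Pending`, `Retrying`, and `Parked`
are guesses, and measuring age from a `createdAt` timestamp is assumed.

```csharp
using System;

// Simplified stand-in; the real enum may have different members.
public enum NotificationStatus { Pending, Retrying, Delivered, Parked, Discarded }

public static class StuckDetectionSketch
{
    /// <summary>
    /// True when a still-Pending/Retrying notification is older than the threshold.
    /// Display-only: the result feeds a KPI count and a row badge, never dispatch.
    /// </summary>
    public static bool IsStuck(
        NotificationStatus status,
        DateTimeOffset createdAt,
        DateTimeOffset now,
        TimeSpan stuckAgeThreshold)
    {
        var nonTerminal = status == NotificationStatus.Pending
                       || status == NotificationStatus.Retrying;
        return nonTerminal && createdAt < now - stuckAgeThreshold;
    }
}
```

Nothing in the predicate mutates the row or re-queues it, which is the point the
rewritten XML-doc needs to convey.
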

**Resolution**

_Unresolved._

### NotificationOutbox-010 — Comment claims `PipeTo` is not used "because the writer never throws"; the surrounding try/catch is dead-letter for the documented failure mode

| | |
|--|--|
| Severity | Medium |
| Category | Documentation & comments |
| Status | Open |
| Location | `src/ScadaLink.NotificationOutbox/NotificationOutboxActor.cs:469-477` |

**Description**

```csharp
try
{
    var evt = BuildNotifyDeliverEvent(notification, now, AuditStatus.Attempted, errorMessage)
        with { DurationMs = durationMs };
    // Fire-and-forget — we do NOT await: the dispatcher loop must not
    // be blocked by audit IO, and the writer swallows its own faults.
    // PipeTo is not used because the writer never throws.
    _ = _auditWriter.WriteAsync(evt);
}
catch (Exception ex)
{
    _logger.LogWarning(ex, "Failed to emit Attempted audit row …");
}
```

The comment in `EmitAttemptAudit` is internally inconsistent and structurally
incorrect: (1) if "the writer never throws" then the surrounding try/catch is
unreachable and dead code; (2) if the writer *can* throw (and the catch is
meaningful) then "never throws" is wrong. In practice the catch only ever fires
on a synchronous throw from the writer's *task construction*, never on a fault
in the awaited body, because the discarded task is not observed. The current
behaviour matches the design intent ("audit failure NEVER aborts delivery"), but
the comment misleads the next reader on the *why*.

This is the same root cause as NotificationOutbox-004: they target the same lines
from different angles (NotificationOutbox-004 is the scope-lifetime /
fire-and-forget Akka concern, NotificationOutbox-010 is the doc/comment-clarity
concern). Closing NotificationOutbox-004 by switching to `await` resolves both.

**Recommendation**

If `await`-ing the writer (recommended fix per NotificationOutbox-004): delete the
"PipeTo is not used because the writer never throws" line entirely and let
the try/catch's behaviour speak for itself. If keeping fire-and-forget: rewrite
the comment to "fire-and-forget by design (the writer is responsible for its
own failure handling); the surrounding try/catch only catches the synchronous
task-construction throw and is otherwise unreachable."

**Resolution**

_Unresolved._