5d2386cc9d
T-003: move the unlock lockout server-side. The 3-strike counter used to live in the Razor page only — a second tab / CLI caller could re-upload the same bytes and grind PBKDF2 indefinitely. The counter now lives in IBundleSessionStore, keyed by ContentHash, so retries against identical bundle bytes are throttled regardless of client. BundleLockedException surfaces the new typed error path. T-005: bind the manifest's non-derivative fields into AES-GCM AAD. A SHA-256 of the manifest (with ContentHash + Encryption normalised to sentinels) is now passed to AesGcm.Encrypt / .Decrypt, so a tampered SourceEnvironment / ExportedBy / CreatedAtUtc on a stolen bundle yields an authentication-tag mismatch instead of slipping past the Step-4 typo-resistant confirmation gate. T-006: cap zip entry count, decompressed length, and compression ratio in LoadAsync's envelope validator BEFORE any payload is decompressed, using ZipArchiveEntry.Length / .CompressedLength. New TransportOptions fields default to 4 entries / 200 MB / 50x ratio. T-007: clear decrypted plaintext on the ApplyAsync failure path and zero the buffer on success before removing the session, so a 100 MB DecryptedContent doesn't sit in memory for the 30-min TTL after a failed apply. A BundleSessionEvictionService BackgroundService now also drives EvictExpired periodically so abandoned sessions clear without needing a fresh Get() call to trigger lazy eviction. Also resolves NO-010 — the misleading "writer never throws" XML doc was the same code+comment my prior NO-004 await-the-writer fix already rewrote.
489 lines
26 KiB
Markdown
489 lines
26 KiB
Markdown
# Code Review — NotificationOutbox
|
|
|
|
| Field | Value |
|
|
|-------|-------|
|
|
| Module | `src/ScadaLink.NotificationOutbox` |
|
|
| Design doc | `docs/requirements/Component-NotificationOutbox.md` |
|
|
| Status | Reviewed |
|
|
| Last reviewed | 2026-05-28 |
|
|
| Reviewer | claude-agent |
|
|
| Commit reviewed | `1eb6e97` |
|
|
| Open findings | 10 |
|
|
|
|
## Summary
|
|
|
|
NotificationOutbox is a small, focused module — one ~985-line actor
|
|
(`NotificationOutboxActor`), a strongly-typed options class, an
|
|
`INotificationDeliveryAdapter` seam, and the single concrete `EmailNotificationDeliveryAdapter`.
|
|
The Akka.NET conventions are textbook: every async path is wrapped with `PipeTo`, the
|
|
dispatcher uses an in-flight guard cleared on `DispatchComplete`, the sender is captured
|
|
before crossing the await, and the actor isolates per-notification failures so one bad row
|
|
never aborts a batch. Test coverage is broad — ingest, dispatch, query, retry/discard,
|
|
purge, KPI, and the new audit-emission paths (B2 attempts + B3 terminals) all have
|
|
dedicated test files — and the audit-write-failure-never-aborts-delivery contract is
|
|
explicitly asserted.
|
|
|
|
The dominant theme is **trust-boundary leakage between Outbox, NotificationService, and
|
|
ConfigurationDatabase**. The outbox inherits two known defects from its sibling modules
|
|
that are reachable through `EmailNotificationDeliveryAdapter`: the OAuth2 SASL empty-user
|
|
bug (NS-021) ships every M365 send with `user=""`, and the
|
|
`InsertIfNotExistsAsync` check-then-act race (CD-015) lives on the outbox's ack-after-persist
|
|
hot path. Neither is a defect of code under `src/ScadaLink.NotificationOutbox/`, but both
|
|
are surfaced here because production dispatch and ingest go through these exact lines.
|
|
A secondary theme is **dispatcher-fire-and-forget audit writes** (`_ = _auditWriter.WriteAsync(...)`)
|
|
that can race the per-sweep scope dispose under the wrong DI graph, and a few smaller
|
|
drifts: the dispatcher passes `CancellationToken.None` to adapter delivery (no graceful
|
|
shutdown for in-flight SMTP sends), the `StuckAgeThreshold` XML-doc describes a behavior
|
|
the design explicitly forbids (display-only, never reclaim), the `MaxRetries` boundary check
|
|
uses `>=` against a config value that can be zero (immediate park on first transient
|
|
failure), and several `NotificationOutboxOptions` fields are documented in code but absent
|
|
from `Component-NotificationOutbox.md`. No Critical findings; two High, six Medium, two Low.
|
|
|
|
## Checklist coverage
|
|
|
|
| # | Category | Examined | Notes |
|
|
|---|----------|----------|-------|
|
|
| 1 | Correctness & logic bugs | Yes | `MaxRetries` zero/negative immediately parks (NotificationOutbox-002); `StuckAgeThreshold` XML doc contradicts design (NotificationOutbox-009); `Guid.TryParse` accepts compact `"N"` ids emitted by sites. |
|
|
| 2 | Akka.NET conventions | Yes | `PipeTo` / sender-capture / in-flight guard pattern is correctly applied throughout. Fire-and-forget `_ = _auditWriter.WriteAsync(...)` raises a scope-lifetime concern (NotificationOutbox-004). |
|
|
| 3 | Concurrency & thread safety | Yes | Actor state mutated only on actor thread. Inherited CD-015 race on `InsertIfNotExistsAsync` (NotificationOutbox-005) is the only race; the dispatcher's in-flight guard correctly serializes sweeps. |
|
|
| 4 | Error handling & resilience | Yes | Outer try/catch on `RunDispatchPass`/`RunPurgePass` keeps the in-flight guard sane; per-notification isolation is correct. CT not threaded into delivery (NotificationOutbox-003). |
|
|
| 5 | Security | Yes | Inherited OAuth2 empty-user (NotificationOutbox-001) reachable through the adapter. No new credential or trust-boundary issues introduced by the outbox itself. |
|
|
| 6 | Performance & resource management | Yes | Dispatch interval & batch size are simple polling; `ResolveAdapters` rebuilds the lookup per sweep (NotificationOutbox-006). No leaks. |
|
|
| 7 | Design-document adherence | Yes | `NotificationOutboxOptions.DispatchBatchSize`, `DeliveredKpiWindow`, `PurgeInterval` are not in the design doc (NotificationOutbox-007). |
|
|
| 8 | Code organization & conventions | Yes | Options class lives in the component project (correct); DI extension lives in the component (correct); adapter is `scoped`, actor singleton — interaction correctly documented in `ServiceCollectionExtensions`. No issues. |
|
|
| 9 | Testing coverage | Yes | Solid actor-behaviour coverage. Missing tests for `FallbackMaxRetries` / empty-SMTP-config dispatch path (NotificationOutbox-008). |
|
|
| 10 | Documentation & comments | Yes | XML on `StuckAgeThreshold` misleading (NotificationOutbox-009); XML on dispatcher's audit `_ =` fire-and-forget says "writer never throws" but `EmitAttemptAudit` still wraps in try/catch — comment contradicts itself (NotificationOutbox-010). |
|
|
|
|
## Findings
|
|
|
|
### NotificationOutbox-001 — `EmailNotificationDeliveryAdapter` inherits the OAuth2 empty-user SASL bug (NS-021) on the M365 send path
|
|
|
|
| | |
|
|
|--|--|
|
|
| Severity | High |
|
|
| Category | Correctness & logic bugs |
|
|
| Status | Resolved |
|
|
| Location | `src/ScadaLink.NotificationOutbox/Delivery/EmailNotificationDeliveryAdapter.cs:185-191` (calls `smtp.AuthenticateAsync("oauth2", token)`); root cause in `src/ScadaLink.NotificationService/MailKitSmtpClientWrapper.cs:76-79` |
|
|
|
|
**Description**
|
|
|
|
`EmailNotificationDeliveryAdapter.SendAsync` resolves an OAuth2 access token via
|
|
`_tokenService.GetTokenAsync(...)` and then calls
|
|
`await smtp.AuthenticateAsync(config.AuthType, credentials, cancellationToken);`
|
|
on `ISmtpClientWrapper`. The production implementation (`MailKitSmtpClientWrapper`)
|
|
constructs `new SaslMechanismOAuth2("", credentials)` — an empty user-name field —
|
|
which Microsoft 365 SMTP rejects with `535 5.7.3 Authentication unsuccessful`. The
|
|
sibling NotificationService finding NS-021 documents this in full; the outbox is the
|
|
*new home* for delivery on central, so every OAuth2 send that the outbox dispatches
|
|
hits this code path. The defect is therefore reachable here even though the offending
|
|
constructor lives in the NotificationService project, and the central-only redesign
|
|
means this is now the only delivery path in production. Existing outbox tests do not
|
|
catch it because they all substitute `ISmtpClientWrapper` and assert only that
|
|
`AuthenticateAsync` is invoked with `("oauth2", "<token>")` — the real
|
|
`SaslMechanismOAuth2` is never instantiated. `OAuth2TokenService.GetTokenAsync` is
|
|
explicitly wired to `login.microsoftonline.com/.../oauth2/v2.0/token` with
|
|
`scope=https://outlook.office365.com/.default`, so M365 SMTP is the intended target —
|
|
and is precisely the relay that requires the user field to be populated.
|
|
|
|
**Recommendation**
|
|
|
|
Track the NS-021 fix and add an outbox-side regression test once the wrapper signature
|
|
is widened. Concretely, when `ISmtpClientWrapper.AuthenticateAsync` is extended to
|
|
accept the sender mailbox (or a dedicated `oauth2UserName` parameter), update
|
|
`EmailNotificationDeliveryAdapter.SendAsync` to pass `config.FromAddress`, and add a
|
|
test in `EmailNotificationDeliveryAdapterTests` that asserts the OAuth2 path forwards
|
|
the sender identity. Until then, surface the same finding here so the outbox is not
|
|
treated as resolved when NS-021 fires.
|
|
|
|
**Resolution**
|
|
|
|
_Unresolved._
|
|
|
|
### NotificationOutbox-002 — Dispatcher parks on first transient failure when `SmtpConfiguration.MaxRetries == 0`
|
|
|
|
| | |
|
|
|--|--|
|
|
| Severity | High |
|
|
| Category | Correctness & logic bugs |
|
|
| Status | Resolved |
|
|
| Location | `src/ScadaLink.NotificationOutbox/NotificationOutboxActor.cs:348-360` |
|
|
|
|
**Description**
|
|
|
|
The transient-failure branch increments `RetryCount` then evaluates
|
|
`if (notification.RetryCount >= maxRetries) notification.Status = NotificationStatus.Parked;`.
|
|
`maxRetries` is read from the central `SmtpConfiguration.MaxRetries` column, which has
|
|
no enforced lower bound and is not validated by the outbox. A row whose `MaxRetries`
|
|
is `0` (or any negative value) immediately satisfies `1 >= 0` on the very first
|
|
transient failure, so the notification is parked without a single retry — directly
|
|
contradicting the design doc's "fixed retry interval, reuse central SMTP
|
|
max-retry-count" intent, where a configured value of zero would naturally read as
|
|
"never retry, fail straight to permanent". `SetupSmtpRetryPolicy` in the dispatch
|
|
tests always supplies a positive value, so this path is not exercised.
|
|
|
|
Additionally, an operator who clears the SMTP config row drops into the
|
|
`FallbackMaxRetries = 10` / `FallbackRetryDelay = 1 min` path
|
|
(`ResolveRetryPolicyAsync` line 251); that path is also untested — see
|
|
NotificationOutbox-008. The operational result is that a single bad SMTP config
|
|
value silently halves the outbox's delivery guarantees.
|
|
|
|
**Recommendation**
|
|
|
|
Validate `MaxRetries` at the read point: treat a non-positive value as either the
|
|
configured fallback (current `FallbackMaxRetries = 10`) or — preferred — surface the
|
|
mis-configuration to the operator via a health metric and refuse to dispatch until
|
|
the row is corrected. Either way, add a test that asserts the dispatcher's behaviour
|
|
for `MaxRetries == 0` and `MaxRetries < 0`.
|
|
|
|
**Resolution**
|
|
|
|
_Unresolved._
|
|
|
|
### NotificationOutbox-003 — Dispatcher does not propagate a `CancellationToken` into delivery; in-flight SMTP sends cannot be cancelled on shutdown
|
|
|
|
| | |
|
|
|--|--|
|
|
| Severity | Medium |
|
|
| Category | Error handling & resilience |
|
|
| Status | Resolved |
|
|
| Location | `src/ScadaLink.NotificationOutbox/NotificationOutboxActor.cs:334`, `src/ScadaLink.NotificationOutbox/Delivery/INotificationDeliveryAdapter.cs:22` |
|
|
|
|
**Description**
|
|
|
|
`DeliverOneAsync` calls `var outcome = await adapter.DeliverAsync(notification);` —
|
|
the second `CancellationToken` parameter on `INotificationDeliveryAdapter.DeliverAsync`
|
|
is left at its `default(CancellationToken)` value, meaning `CancellationToken.None`.
|
|
`EmailNotificationDeliveryAdapter.SendAsync` then threads that `None` token into
|
|
`smtp.ConnectAsync`, `smtp.AuthenticateAsync`, and `smtp.SendAsync`. The consequence
|
|
is that during a coordinated cluster shutdown (singleton handover, drain) any
|
|
in-flight SMTP send is uncancellable and the dispatcher's sweep must wait for the
|
|
underlying socket/SMTP timeout (`SmtpConfiguration.ConnectionTimeoutSeconds`) before
|
|
the sweep's task completes and `DispatchComplete` lowers the in-flight guard. With
|
|
the default connect-timeout values this is on the order of tens of seconds per
|
|
notification in the in-progress batch, blocking `CoordinatedShutdown`.
|
|
|
|
The adapter implementations clearly *expect* a token — the contract type is
|
|
`CancellationToken cancellationToken = default` everywhere — so this is a wiring
|
|
gap, not a missing interface.
|
|
|
|
**Recommendation**
|
|
|
|
Wire a per-sweep `CancellationTokenSource` linked to the actor's lifecycle (cancel
|
|
in `PostStop`) and pass its token into `DeliverAsync`. A linked source per sweep
|
|
also bounds individual deliveries by the configured connection timeout when a more
|
|
explicit per-attempt budget is wanted. Add a test that cancels mid-`DeliverAsync` and
|
|
asserts the dispatcher completes promptly and the row is left non-terminal
|
|
(`Pending`/`Retrying` unchanged) for the next sweep.
|
|
|
|
**Resolution**
|
|
|
|
_Unresolved._
|
|
|
|
### NotificationOutbox-004 — `EmitAttemptAudit`/`EmitTerminalAudit` fire-and-forget pattern can outlive the per-sweep DI scope
|
|
|
|
| | |
|
|
|--|--|
|
|
| Severity | Medium |
|
|
| Category | Akka.NET conventions |
|
|
| Status | Resolved |
|
|
| Location | `src/ScadaLink.NotificationOutbox/NotificationOutboxActor.cs:425-435`, `463-485` |
|
|
|
|
**Description**
|
|
|
|
Both emission helpers issue `_ = _auditWriter.WriteAsync(evt);` — discarding the
|
|
returned task. `CentralAuditWriter.WriteAsync` opens its own `await using var scope =
|
|
_services.CreateAsyncScope();` and resolves a scoped `IAuditLogRepository` (verified
|
|
at `src/ScadaLink.AuditLog/Central/CentralAuditWriter.cs:118-121`), so the writer is
|
|
defensively scope-independent. However the dispatcher already holds a per-sweep
|
|
`using var scope = _serviceProvider.CreateScope();` and the per-notification
|
|
`UpdateAsync` runs in that scope. The fire-and-forget pattern means:
|
|
|
|
1. The dispatcher's outer scope can be disposed (sweep done, `DispatchComplete`
|
|
piped) while the audit `WriteAsync` task is still running on a *different*
|
|
scope it owns — works today only because the writer creates its own scope.
|
|
2. A faulted unobserved task is silently lost: if `CentralAuditWriter.WriteAsync`
|
|
itself were ever made `async void` or refactored to not internally try/catch,
|
|
the dispatcher would never see the fault and the audit row would vanish without
|
|
the `_logger.LogWarning` reaching the operator.
|
|
3. The XML-doc above `EmitAttemptAudit` says "PipeTo is not used because the writer
|
|
never throws" — but the surrounding `try { _ = _auditWriter.WriteAsync(evt); }
|
|
catch (Exception ex)` will only catch a synchronous throw from the *task
|
|
construction*, not the awaited body of `WriteAsync`. The comment understates the
|
|
risk: the catch is structurally unreachable for the documented failure mode.
|
|
|
|
The system actually wants the *invariant* "audit write never affects delivery"
|
|
(verified by the `AuditWriter_Throws_…StillSucceeds` tests). That invariant is
|
|
better expressed by `await`-ing the writer inside the actor's outer try/catch (the
|
|
dispatcher already swallows per-notification exceptions) than by a discard-task,
|
|
which couples the lifetime of the dispatcher's scope to that of the audit task
|
|
through whatever scope graph the writer happens to use today.
|
|
|
|
**Recommendation**
|
|
|
|
Either `await _auditWriter.WriteAsync(evt)` inside the existing `try`/`catch` (the
|
|
preferred fix — preserves the invariant, plays well with the per-sweep scope, and
|
|
makes the catch block actually reachable), or — if a true fire-and-forget remains
|
|
desired — capture the returned task and attach a continuation that calls
|
|
`_logger.LogWarning` on faulted to keep diagnostics intact. Either way, fix the
|
|
"writer never throws" XML-doc to match the implementation.
|
|
|
|
**Resolution**
|
|
|
|
_Unresolved._
|
|
|
|
### NotificationOutbox-005 — Ingest persistence inherits the CD-015 check-then-act race; under contention the second writer throws and the site retries
|
|
|
|
| | |
|
|
|--|--|
|
|
| Severity | Medium |
|
|
| Category | Concurrency & thread safety |
|
|
| Status | Open |
|
|
| Location | `src/ScadaLink.NotificationOutbox/NotificationOutboxActor.cs:127-132` (caller); root cause in `src/ScadaLink.ConfigurationDatabase/Repositories/NotificationOutboxRepository.cs:33-45` |
|
|
|
|
**Description**
|
|
|
|
`HandleSubmit` → `PersistAsync` calls `repository.InsertIfNotExistsAsync(notification)`
|
|
on `INotificationOutboxRepository`. The current implementation
|
|
(`src/ScadaLink.ConfigurationDatabase/Repositories/NotificationOutboxRepository.cs`)
|
|
does a check-then-act with no duplicate-key catch — documented as CD-015 (High,
|
|
Open). The Notification Outbox's documented contract is "at-least-once handoff with
|
|
ack-after-persist plus insert-if-not-exists on `NotificationId`" (CLAUDE.md,
|
|
Component-NotificationOutbox.md §Ingest & Idempotency), and the duplicate-insert
|
|
race is the **expected contention pattern** — the site retries the same submission
|
|
after a lost ack. As written, the loser surfaces a `SqlException` (2627 PK
|
|
violation) wrapped in `DbUpdateException`, propagates through `PipeTo`'s failure
|
|
projection as a `NotificationSubmitAck { Accepted: false, Error: "... PRIMARY KEY ..." }`,
|
|
the site treats the ack as a forwarding failure and forwards the message **again**,
|
|
re-entering the same race. If the contending pair keeps racing this can livelock.
|
|
|
|
The actor side is fine — `PipeTo`'s success/failure projection correctly forwards
|
|
the exception message. The repository side needs the standard `2601/2627 → no-op`
|
|
pattern that AuditLog and SiteCall already use. This finding tracks the outbox-side
|
|
visibility of the CD-015 defect so a re-review of NotificationOutbox surfaces it
|
|
even if the reader has not yet read the ConfigurationDatabase findings.
|
|
|
|
**Recommendation**
|
|
|
|
Track CD-015 to resolution. As a defense-in-depth complement here, consider
|
|
treating a duplicate-key `DbUpdateException` in the actor's ingest failure
|
|
projection as `Accepted: true` so a lost ack between persisted-by-the-first-writer
|
|
and ack-back does not produce a permanent re-forward loop — but the cleanest fix
|
|
remains the CD-015 raw-SQL `IF NOT EXISTS … INSERT` with `2601/2627` catch in
|
|
`NotificationOutboxRepository`.
|
|
|
|
**Resolution**
|
|
|
|
_Unresolved._
|
|
|
|
### NotificationOutbox-006 — `ResolveAdapters` rebuilds the `NotificationType → adapter` dictionary on every dispatch sweep
|
|
|
|
| | |
|
|
|--|--|
|
|
| Severity | Low |
|
|
| Category | Performance & resource management |
|
|
| Status | Open |
|
|
| Location | `src/ScadaLink.NotificationOutbox/NotificationOutboxActor.cs:267-277` |
|
|
|
|
**Description**
|
|
|
|
Every dispatch sweep calls `ResolveAdapters(scope.ServiceProvider)` which enumerates
|
|
`scopedServices.GetServices<INotificationDeliveryAdapter>()` and builds a fresh
|
|
`Dictionary<NotificationType, INotificationDeliveryAdapter>`. Adapter registration
|
|
is decided at startup (`AddNotificationOutbox` registers
|
|
`EmailNotificationDeliveryAdapter`); the registration set does not change at
|
|
runtime. With a default `DispatchInterval = 10s` and only ever one entry today, the
|
|
allocation overhead is trivial — but the comment "the last adapter registered for a
|
|
given type wins, mirroring DI's last-wins resolution semantics" elevates this to a
|
|
behaviour contract, and the per-sweep dictionary construction obscures the lookup's
|
|
identity from one sweep to the next, making any future stateful adapter (rate
|
|
limiter, circuit breaker) silently lose its state.
|
|
|
|
The same issue is the reason `EmailNotificationDeliveryAdapter` is *scoped* — it
|
|
holds a scoped `INotificationRepository`. A trivial cache-the-types-but-resolve-
|
|
the-instance fix is possible: cache the set of declared `NotificationType` values
|
|
and look up each adapter by `GetService<INotificationDeliveryAdapter>()`
|
|
filtered by `Type` per sweep.
|
|
|
|
**Recommendation**
|
|
|
|
Document the per-sweep contract explicitly ("each sweep gets a fresh adapter
|
|
instance per the scoped DI contract — adapters must not carry state across
|
|
sweeps") in the actor XML, or — preferred — cache only the *types* at startup
|
|
(`PreStart`) and resolve the scoped instance per sweep, so future adapters with
|
|
stateful intent (timeouts, circuit breakers) cannot accidentally lose state.
|
|
|
|
**Resolution**
|
|
|
|
_Unresolved._
|
|
|
|
### NotificationOutbox-007 — `NotificationOutboxOptions.DispatchBatchSize`, `DeliveredKpiWindow`, and `PurgeInterval` are not in the design document
|
|
|
|
| | |
|
|
|--|--|
|
|
| Severity | Medium |
|
|
| Category | Design-document adherence |
|
|
| Status | Open |
|
|
| Location | `src/ScadaLink.NotificationOutbox/NotificationOutboxOptions.cs:13`, `:22`, `:25`; `docs/requirements/Component-NotificationOutbox.md:152-160` |
|
|
|
|
**Description**
|
|
|
|
`Component-NotificationOutbox.md` §Configuration enumerates three options: dispatch
|
|
interval, stuck-age threshold, and terminal-row retention window. The implemented
|
|
`NotificationOutboxOptions` adds three additional fields:
|
|
|
|
- `DispatchBatchSize` (default `100`) — caps the per-sweep claim size, but is invisible
|
|
to anyone reading only the spec.
|
|
- `PurgeInterval` (default `1 day`) — the design doc says "daily purge" as if the
|
|
cadence is fixed; in code it is configurable.
|
|
- `DeliveredKpiWindow` (default `1 min`) — the KPI section says "Delivered (last
|
|
interval)" without saying how long "last interval" is or that it is configurable.
|
|
|
|
The design doc also asserts "Delivery max-retry-count and retry interval are not
|
|
part of `NotificationOutboxOptions` — they are reused from the central SMTP
|
|
configuration" (line 160) — implementation honours this. But the three additions
|
|
above are dead text in the design doc. The KPI dashboard cadence and the dispatch
|
|
batch size are both operationally important values an operator/engineer will hunt
|
|
for; their absence from the spec is design drift.
|
|
|
|
**Recommendation**
|
|
|
|
Add the three fields to `Component-NotificationOutbox.md §Configuration` with their
|
|
defaults, or remove them from the implementation if they were meant to be fixed
|
|
constants. Cross-link `DeliveredKpiWindow` from the §Monitoring "Delivered (last
|
|
interval)" KPI bullet so a reader sees what controls the bucket length.
|
|
|
|
**Resolution**
|
|
|
|
_Unresolved._
|
|
|
|
### NotificationOutbox-008 — `FallbackMaxRetries` / `FallbackRetryDelay` path is unreachable in production AND untested
|
|
|
|
| | |
|
|
|--|--|
|
|
| Severity | Low |
|
|
| Category | Testing coverage |
|
|
| Status | Open |
|
|
| Location | `src/ScadaLink.NotificationOutbox/NotificationOutboxActor.cs:29-31`, `:251-259`; tests in `tests/ScadaLink.NotificationOutbox.Tests/NotificationOutboxActorDispatchTests.cs` |
|
|
|
|
**Description**
|
|
|
|
`ResolveRetryPolicyAsync` falls back to `FallbackMaxRetries = 10` and
|
|
`FallbackRetryDelay = 1 min` when `notificationRepository.GetAllSmtpConfigurationsAsync()`
|
|
returns an empty list (no SMTP configuration row). The comment correctly observes
|
|
that delivery itself will then return `Permanent("No SMTP configuration available")`
|
|
from `EmailNotificationDeliveryAdapter.cs:78-81`, so the fallback retry policy
|
|
never actually retries anything — the row is permanently parked on first attempt
|
|
regardless of retry count or delay.
|
|
|
|
This produces three concerns. (1) The fallback is essentially dead code — the retry
|
|
policy values are never consulted in practice because delivery always fails
|
|
permanently before the retry branch is reached. (2) The fallback can be reached
|
|
*after* a previously-deployed SMTP config is deleted, which is precisely the
|
|
moment an operator needs accurate audit trails; the row will say `Parked` with
|
|
`LastError = "No SMTP configuration available"` but the audit signal "retry policy
|
|
fell back to defaults" is invisible. (3) Tests never exercise either the fallback
|
|
path or the empty-SMTP-config dispatch path: `SetupSmtpRetryPolicy` always supplies
|
|
a config in every dispatch test.
|
|
|
|
**Recommendation**
|
|
|
|
Add a regression test that runs a dispatch sweep with no SMTP config row and
|
|
asserts the row is parked with the documented error. Optionally remove the fallback
|
|
constants if parking-with-no-config is the *intended* operational signal; document
|
|
the choice in the actor XML so a maintainer does not "fix" the unreachable code.
|
|
|
|
**Resolution**
|
|
|
|
_Unresolved._
|
|
|
|
### NotificationOutbox-009 — `StuckAgeThreshold` XML-doc says "in-progress notification is re-claimed" — contradicts the design's display-only stuck detection
|
|
|
|
| | |
|
|
|--|--|
|
|
| Severity | Low |
|
|
| Category | Documentation & comments |
|
|
| Status | Open |
|
|
| Location | `src/ScadaLink.NotificationOutbox/NotificationOutboxOptions.cs:15-16` |
|
|
|
|
**Description**
|
|
|
|
```csharp
|
|
/// <summary>Age past which an in-progress notification is considered stuck and re-claimed.</summary>
|
|
public TimeSpan StuckAgeThreshold { get; set; } = TimeSpan.FromMinutes(10);
|
|
```
|
|
|
|
The implementation never reclaims anything based on `StuckAgeThreshold`. It is used
|
|
only as a cutoff for the stuck-count KPI (`StuckCutoff`/`IsStuck` in
|
|
`NotificationOutboxActor.cs:932-942`) and as a `StuckCutoff` filter on paginated
|
|
queries. The design doc is explicit: "A notification is **stuck** if it is `Pending`
|
|
or `Retrying` and older than a configurable age threshold (default 10 minutes).
|
|
Detection is **display-only** — a count KPI and a row badge. There is no automated
|
|
escalation or alerting" (`Component-NotificationOutbox.md:143-145`). A maintainer
|
|
reading the XML and expecting "re-claim" behaviour will be surprised twice — once
|
|
when no re-claim happens, and once when they go looking for the re-claim code and
|
|
find none.
|
|
|
|
**Recommendation**
|
|
|
|
Rewrite the XML to match the design: "Age past which a still-`Pending`/`Retrying`
|
|
notification is counted as stuck on the KPI tile and the per-row badge.
|
|
Display-only — does not affect dispatch."
|
|
|
|
**Resolution**
|
|
|
|
_Unresolved._
|
|
|
|
### NotificationOutbox-010 — Comment claims `PipeTo` is not used "because the writer never throws"; the surrounding try/catch is dead-letter for the documented failure mode
|
|
|
|
| | |
|
|
|--|--|
|
|
| Severity | Medium |
|
|
| Category | Documentation & comments |
|
|
| Status | Resolved |
|
|
| Location | `src/ScadaLink.NotificationOutbox/NotificationOutboxActor.cs:469-477` |
|
|
|
|
**Description**
|
|
|
|
```csharp
|
|
try
|
|
{
|
|
var evt = BuildNotifyDeliverEvent(notification, now, AuditStatus.Attempted, errorMessage)
|
|
with { DurationMs = durationMs };
|
|
// Fire-and-forget — we do NOT await: the dispatcher loop must not
|
|
// be blocked by audit IO, and the writer swallows its own faults.
|
|
// PipeTo is not used because the writer never throws.
|
|
_ = _auditWriter.WriteAsync(evt);
|
|
}
|
|
catch (Exception ex)
|
|
{
|
|
_logger.LogWarning(ex, "Failed to emit Attempted audit row …");
|
|
}
|
|
```
|
|
|
|
The XML-doc on `EmitAttemptAudit` is internally inconsistent and structurally
|
|
incorrect: (1) if "the writer never throws" then the surrounding try/catch is
|
|
unreachable and dead code; (2) if the writer *can* throw (and the catch is
|
|
meaningful) then "never throws" is wrong. In practice the catch only ever fires
|
|
on a synchronous throw from the writer's *task construction* — never on a fault
|
|
in the awaited body — because the discarded task is not observed. The current
|
|
behaviour matches the design intent ("audit failure NEVER aborts delivery"), but
|
|
the comment misleads the next reader on the *why*.
|
|
|
|
This is the same root cause as NotificationOutbox-004 — they target the same lines
|
|
from different angles (NotificationOutbox-004 is the scope-lifetime /
|
|
fire-and-forget Akka concern, NotificationOutbox-010 is the doc/comment-clarity
|
|
concern). Closing NotificationOutbox-004 by switching to `await` resolves both.
|
|
|
|
**Recommendation**
|
|
|
|
If `await`-ing the writer (recommended fix per NotificationOutbox-004): delete the
|
|
"PipeTo is not used because the writer never throws" line entirely and let
|
|
the try/catch's behaviour speak for itself. If keeping fire-and-forget: rewrite
|
|
the comment to "fire-and-forget by design (the writer is responsible for its
|
|
own failure handling); the surrounding try/catch only catches the synchronous
|
|
task-construction throw and is otherwise unreachable."
|
|
|
|
**Resolution**
|
|
|
|
_Unresolved._
|