Files
ScadaBridge/code-reviews/NotificationOutbox/findings.md
T
Joseph Doherty 7b0b9c7365 refactor: rename ScadaLink → ZB.MOM.WW.ScadaBridge (code + projects + namespaces)
Solution + 23 src projects + 26 test projects renamed; folders, csproj,
namespaces, and ScadaLinkDbContext/ScadaBridgeDbContext class updated.
ActorSystem "scadalink" → "scadabridge", Akka seed-node URLs migrated.
SQL roles/logins, LDAP domains, CLI command name, and CLI config dir
(~/.scadalink → ~/.scadabridge) also renamed.

Build green; 5 Host.Tests fail awaiting SQL login rename in next commit.
Pre-existing StaleTagMonitor timing flakes unchanged.

Rename script committed at tools/rename-to-scadabridge.sh.
2026-05-28 09:37:45 -04:00

29 KiB

Code Review — NotificationOutbox

Field Value
Module src/ZB.MOM.WW.ScadaBridge.NotificationOutbox
Design doc docs/requirements/Component-NotificationOutbox.md
Status Reviewed
Last reviewed 2026-05-28
Reviewer claude-agent
Commit reviewed 1eb6e97
Open findings 7

Summary

NotificationOutbox is a small, focused module — one ~985-line actor (NotificationOutboxActor), a strongly-typed options class, an INotificationDeliveryAdapter seam, and the single concrete EmailNotificationDeliveryAdapter. The Akka.NET conventions are textbook: every async path is wrapped with PipeTo, the dispatcher uses an in-flight guard cleared on DispatchComplete, the sender is captured before crossing the await, and the actor isolates per-notification failures so one bad row never aborts a batch. Test coverage is broad — ingest, dispatch, query, retry/discard, purge, KPI, and the new audit-emission paths (B2 attempts + B3 terminals) all have dedicated test files — and the audit-write-failure-never-aborts-delivery contract is explicitly asserted.

The dominant theme is trust-boundary leakage between Outbox, NotificationService, and ConfigurationDatabase. The outbox inherits two known defects from its sibling modules that are reachable through EmailNotificationDeliveryAdapter: the OAuth2 SASL empty-user bug (NS-021) ships every M365 send with user="", and the InsertIfNotExistsAsync check-then-act race (CD-015) lives on the outbox's ack-after-persist hot path. Neither is a defect of code under src/ZB.MOM.WW.ScadaBridge.NotificationOutbox/, but both are surfaced here because production dispatch and ingest go through these exact lines. A secondary theme is dispatcher-fire-and-forget audit writes (_ = _auditWriter.WriteAsync(...)) that can race the per-sweep scope dispose under the wrong DI graph, and a few smaller drifts: the dispatcher passes CancellationToken.None to adapter delivery (no graceful shutdown for in-flight SMTP sends), the StuckAgeThreshold XML-doc describes a behavior the design explicitly forbids (display-only, never reclaim), the MaxRetries boundary check uses >= against a config value that can be zero (immediate park on first transient failure), and several NotificationOutboxOptions fields are documented in code but absent from Component-NotificationOutbox.md. No Critical findings; two High, six Medium, two Low.

Checklist coverage

# Category Examined Notes
1 Correctness & logic bugs Yes MaxRetries zero/negative immediately parks (NotificationOutbox-002); StuckAgeThreshold XML doc contradicts design (NotificationOutbox-009); Guid.TryParse accepts compact "N" ids emitted by sites.
2 Akka.NET conventions Yes PipeTo / sender-capture / in-flight guard pattern is correctly applied throughout. Fire-and-forget _ = _auditWriter.WriteAsync(...) raises a scope-lifetime concern (NotificationOutbox-004).
3 Concurrency & thread safety Yes Actor state mutated only on actor thread. Inherited CD-015 race on InsertIfNotExistsAsync (NotificationOutbox-005) is the only race; the dispatcher's in-flight guard correctly serializes sweeps.
4 Error handling & resilience Yes Outer try/catch on RunDispatchPass/RunPurgePass keeps the in-flight guard sane; per-notification isolation is correct. CT not threaded into delivery (NotificationOutbox-003).
5 Security Yes Inherited OAuth2 empty-user (NotificationOutbox-001) reachable through the adapter. No new credential or trust-boundary issues introduced by the outbox itself.
6 Performance & resource management Yes Dispatch interval & batch size are simple polling; ResolveAdapters rebuilds the lookup per sweep (NotificationOutbox-006). No leaks.
7 Design-document adherence Yes NotificationOutboxOptions.DispatchBatchSize, DeliveredKpiWindow, PurgeInterval are not in the design doc (NotificationOutbox-007).
8 Code organization & conventions Yes Options class lives in the component project (correct); DI extension lives in the component (correct); adapter is scoped, actor singleton — interaction correctly documented in ServiceCollectionExtensions. No issues.
9 Testing coverage Yes Solid actor-behaviour coverage. Missing tests for FallbackMaxRetries / empty-SMTP-config dispatch path (NotificationOutbox-008).
10 Documentation & comments Yes XML on StuckAgeThreshold misleading (NotificationOutbox-009); XML on dispatcher's audit _ = fire-and-forget says "writer never throws" but EmitAttemptAudit still wraps in try/catch — comment contradicts itself (NotificationOutbox-010).

Findings

NotificationOutbox-001 — EmailNotificationDeliveryAdapter inherits the OAuth2 empty-user SASL bug (NS-021) on the M365 send path

Severity High
Category Correctness & logic bugs
Status Resolved
Location src/ZB.MOM.WW.ScadaBridge.NotificationOutbox/Delivery/EmailNotificationDeliveryAdapter.cs:185-191 (calls smtp.AuthenticateAsync("oauth2", token)); root cause in src/ZB.MOM.WW.ScadaBridge.NotificationService/MailKitSmtpClientWrapper.cs:76-79

Description

EmailNotificationDeliveryAdapter.SendAsync resolves an OAuth2 access token via _tokenService.GetTokenAsync(...) and then calls await smtp.AuthenticateAsync(config.AuthType, credentials, cancellationToken); on ISmtpClientWrapper. The production implementation (MailKitSmtpClientWrapper) constructs new SaslMechanismOAuth2("", credentials) — an empty user-name field — which Microsoft 365 SMTP rejects with 535 5.7.3 Authentication unsuccessful. The sibling NotificationService finding NS-021 documents this in full; the outbox is the new home for delivery on central, so every OAuth2 send that the outbox dispatches hits this code path. The defect is therefore reachable here even though the offending constructor lives in the NotificationService project, and the central-only redesign means this is now the only delivery path in production. Existing outbox tests do not catch it because they all substitute ISmtpClientWrapper and assert only that AuthenticateAsync is invoked with ("oauth2", "<token>") — the real SaslMechanismOAuth2 is never instantiated. OAuth2TokenService.GetTokenAsync is explicitly wired to login.microsoftonline.com/.../oauth2/v2.0/token with scope=https://outlook.office365.com/.default, so M365 SMTP is the intended target — and is precisely the relay that requires the user field to be populated.

Recommendation

Track the NS-021 fix and add an outbox-side regression test once the wrapper signature is widened. Concretely, when ISmtpClientWrapper.AuthenticateAsync is extended to accept the sender mailbox (or a dedicated oauth2UserName parameter), update EmailNotificationDeliveryAdapter.SendAsync to pass config.FromAddress, and add a test in EmailNotificationDeliveryAdapterTests that asserts the OAuth2 path forwards the sender identity. Until then, surface the same finding here so the outbox is not treated as resolved when NS-021 fires.

Resolution

Unresolved.

NotificationOutbox-002 — Dispatcher parks on first transient failure when SmtpConfiguration.MaxRetries == 0

Severity High
Category Correctness & logic bugs
Status Resolved
Location src/ZB.MOM.WW.ScadaBridge.NotificationOutbox/NotificationOutboxActor.cs:348-360

Description

The transient-failure branch increments RetryCount then evaluates if (notification.RetryCount >= maxRetries) notification.Status = NotificationStatus.Parked;. maxRetries is read from the central SmtpConfiguration.MaxRetries column, which has no enforced lower bound and is not validated by the outbox. A row whose MaxRetries is 0 (or any negative value) immediately satisfies 1 >= 0 on the very first transient failure, so the notification is parked without a single retry — directly contradicting the design doc's "fixed retry interval, reuse central SMTP max-retry-count" intent, where a configured value of zero would naturally read as "never retry, fail straight to permanent". SetupSmtpRetryPolicy in the dispatch tests always supplies a positive value, so this path is not exercised.

Additionally, an operator who clears the SMTP config row drops into the FallbackMaxRetries = 10 / FallbackRetryDelay = 1 min path (ResolveRetryPolicyAsync line 251); that path is also untested — see NotificationOutbox-008. The operational result is that a single bad SMTP config value silently halves the outbox's delivery guarantees.

Recommendation

Validate MaxRetries at the read point: treat a non-positive value as either the configured fallback (current FallbackMaxRetries = 10) or — preferred — surface the mis-configuration to the operator via a health metric and refuse to dispatch until the row is corrected. Either way, add a test that asserts the dispatcher's behaviour for MaxRetries == 0 and MaxRetries < 0.

Resolution

Unresolved.

NotificationOutbox-003 — Dispatcher does not propagate a CancellationToken into delivery; in-flight SMTP sends cannot be cancelled on shutdown

Severity Medium
Category Error handling & resilience
Status Resolved
Location src/ZB.MOM.WW.ScadaBridge.NotificationOutbox/NotificationOutboxActor.cs:334, src/ZB.MOM.WW.ScadaBridge.NotificationOutbox/Delivery/INotificationDeliveryAdapter.cs:22

Description

DeliverOneAsync calls var outcome = await adapter.DeliverAsync(notification); — the second CancellationToken parameter on INotificationDeliveryAdapter.DeliverAsync is left at its default(CancellationToken) value, meaning CancellationToken.None. EmailNotificationDeliveryAdapter.SendAsync then threads that None token into smtp.ConnectAsync, smtp.AuthenticateAsync, and smtp.SendAsync. The consequence is that during a coordinated cluster shutdown (singleton handover, drain) any in-flight SMTP send is uncancellable and the dispatcher's sweep must wait for the underlying socket/SMTP timeout (SmtpConfiguration.ConnectionTimeoutSeconds) before the sweep's task completes and DispatchComplete lowers the in-flight guard. With the default connect-timeout values this is on the order of tens of seconds per notification in the in-progress batch, blocking CoordinatedShutdown.

The adapter implementations clearly expect a token — the contract type is CancellationToken cancellationToken = default everywhere — so this is a wiring gap, not a missing interface.

Recommendation

Wire a per-sweep CancellationTokenSource linked to the actor's lifecycle (cancel in PostStop) and pass its token into DeliverAsync. A linked source per sweep also bounds individual deliveries by the configured connection timeout when a more explicit per-attempt budget is wanted. Add a test that cancels mid-DeliverAsync and asserts the dispatcher completes promptly and the row is left non-terminal (Pending/Retrying unchanged) for the next sweep.

Resolution

Unresolved.

NotificationOutbox-004 — EmitAttemptAudit/EmitTerminalAudit fire-and-forget pattern can outlive the per-sweep DI scope

Severity Medium
Category Akka.NET conventions
Status Resolved
Location src/ZB.MOM.WW.ScadaBridge.NotificationOutbox/NotificationOutboxActor.cs:425-435, 463-485

Description

Both emission helpers issue _ = _auditWriter.WriteAsync(evt); — discarding the returned task. CentralAuditWriter.WriteAsync opens its own await using var scope = _services.CreateAsyncScope(); and resolves a scoped IAuditLogRepository (verified at src/ZB.MOM.WW.ScadaBridge.AuditLog/Central/CentralAuditWriter.cs:118-121), so the writer is defensively scope-independent. However the dispatcher already holds a per-sweep using var scope = _serviceProvider.CreateScope(); and the per-notification UpdateAsync runs in that scope. The fire-and-forget pattern means:

  1. The dispatcher's outer scope can be disposed (sweep done, DispatchComplete piped) while the audit WriteAsync task is still running on a different scope it owns — works today only because the writer creates its own scope.
  2. A faulted unobserved task is silently lost: if CentralAuditWriter.WriteAsync itself were ever made async void or refactored to not internally try/catch, the dispatcher would never see the fault and the audit row would vanish without the _logger.LogWarning reaching the operator.
  3. The XML-doc above EmitAttemptAudit says "PipeTo is not used because the writer never throws" — but the surrounding try { _ = _auditWriter.WriteAsync(evt); } catch (Exception ex) will only catch a synchronous throw from the task construction, not the awaited body of WriteAsync. The comment understates the risk: the catch is structurally unreachable for the documented failure mode.

The system actually wants the invariant "audit write never affects delivery" (verified by the AuditWriter_Throws_…StillSucceeds tests). That invariant is better expressed by await-ing the writer inside the actor's outer try/catch (the dispatcher already swallows per-notification exceptions) than by a discard-task, which couples the lifetime of the dispatcher's scope to that of the audit task through whatever scope graph the writer happens to use today.

Recommendation

Either await _auditWriter.WriteAsync(evt) inside the existing try/catch (the preferred fix — preserves the invariant, plays well with the per-sweep scope, and makes the catch block actually reachable), or — if a true fire-and-forget remains desired — capture the returned task and attach a continuation that calls _logger.LogWarning on faulted to keep diagnostics intact. Either way, fix the "writer never throws" XML-doc to match the implementation.

Resolution

Unresolved.

NotificationOutbox-005 — Ingest persistence inherits the CD-015 check-then-act race; under contention the second writer throws and the site retries

Severity Medium
Category Concurrency & thread safety
Status Resolved
Location src/ZB.MOM.WW.ScadaBridge.NotificationOutbox/NotificationOutboxActor.cs:127-132 (caller); root cause in src/ZB.MOM.WW.ScadaBridge.ConfigurationDatabase/Repositories/NotificationOutboxRepository.cs:33-45

Resolution (2026-05-28): Closed by CD-015 — NotificationOutboxRepository.InsertIfNotExistsAsync (commit ac96b83) is now a single-statement IF NOT EXISTS ... INSERT via ExecuteSqlInterpolatedAsync with a SqlException filter swallowing duplicate-key violations (2601/2627) as a no-op (return false). The check-then-act window is eliminated; the at-least-once handoff contract holds and the actor's PipeTo success/failure projection no longer surfaces a permanent PK-violation back to the site. Verified in src/ZB.MOM.WW.ScadaBridge.ConfigurationDatabase/Repositories/NotificationOutboxRepository.cs:51-103.

Description

HandleSubmitPersistAsync calls repository.InsertIfNotExistsAsync(notification) on INotificationOutboxRepository. The current implementation (src/ZB.MOM.WW.ScadaBridge.ConfigurationDatabase/Repositories/NotificationOutboxRepository.cs) does a check-then-act with no duplicate-key catch — documented as CD-015 (High, Open). The Notification Outbox's documented contract is "at-least-once handoff with ack-after-persist plus insert-if-not-exists on NotificationId" (CLAUDE.md, Component-NotificationOutbox.md §Ingest & Idempotency), and the duplicate-insert race is the expected contention pattern — the site retries the same submission after a lost ack. As written, the loser surfaces a SqlException (2627 PK violation) wrapped in DbUpdateException, propagates through PipeTo's failure projection as a NotificationSubmitAck { Accepted: false, Error: "... PRIMARY KEY ..." }, the site treats the ack as a forwarding failure and forwards the message again, re-entering the same race. If the contending pair keeps racing this can livelock.

The actor side is fine — PipeTo's success/failure projection correctly forwards the exception message. The repository side needs the standard 2601/2627 → no-op pattern that AuditLog and SiteCall already use. This finding tracks the outbox-side visibility of the CD-015 defect so a re-review of NotificationOutbox surfaces it even if the reader has not yet read the ConfigurationDatabase findings.

Recommendation

Track CD-015 to resolution. As a defense-in-depth complement here, consider treating a duplicate-key DbUpdateException in the actor's ingest failure projection as Accepted: true so a lost ack between persisted-by-the-first-writer and ack-back does not produce a permanent re-forward loop — but the cleanest fix remains the CD-015 raw-SQL IF NOT EXISTS … INSERT with 2601/2627 catch in NotificationOutboxRepository.

NotificationOutbox-006 — ResolveAdapters rebuilds the NotificationType → adapter dictionary on every dispatch sweep

Severity Low
Category Performance & resource management
Status Resolved
Location src/ZB.MOM.WW.ScadaBridge.NotificationOutbox/NotificationOutboxActor.cs:267-277

Description

Every dispatch sweep calls ResolveAdapters(scope.ServiceProvider) which enumerates scopedServices.GetServices<INotificationDeliveryAdapter>() and builds a fresh Dictionary<NotificationType, INotificationDeliveryAdapter>. Adapter registration is decided at startup (AddNotificationOutbox registers EmailNotificationDeliveryAdapter); the registration set does not change at runtime. With a default DispatchInterval = 10s and only ever one entry today, the allocation overhead is trivial — but the comment "the last adapter registered for a given type wins, mirroring DI's last-wins resolution semantics" elevates this to a behaviour contract, and the per-sweep dictionary construction obscures the lookup's identity from one sweep to the next, making any future stateful adapter (rate limiter, circuit breaker) silently lose its state.

The same issue is the reason EmailNotificationDeliveryAdapter is scoped — it holds a scoped INotificationRepository. A trivial cache-the-types-but-resolve- the-instance fix is possible: cache the set of declared NotificationType values and look up each adapter by GetService<INotificationDeliveryAdapter>() filtered by Type per sweep.

Recommendation

Document the per-sweep contract explicitly ("each sweep gets a fresh adapter instance per the scoped DI contract — adapters must not carry state across sweeps") in the actor XML, or — preferred — cache only the types at startup (PreStart) and resolve the scoped instance per sweep, so future adapters with stateful intent (timeouts, circuit breakers) cannot accidentally lose state.

Resolution (2026-05-28): Cached the NotificationType → adapter dictionary on a new actor field _adaptersCache, built lazily on the first dispatch sweep that needs it. The cache is paired with an actor-lifetime IServiceScope (_adaptersScope, created on first use and disposed in PostStop) so the resolved scoped adapter instances live as long as the cache itself — respecting EmailNotificationDeliveryAdapter's scoped lifetime without leaking captive-DbContext references across the actor's full lifetime. RunDispatchPass now calls ResolveAdapters() (instance method, no args), and the existing "resolved from the per-sweep scope" comment was rewritten to cross-reference the new cache rationale. Adapter set is static per process lifetime (confirmed against the DI registration in AddNotificationOutbox), so no invalidation hook is needed. Regression test Dispatch_ResolvesAdaptersOnce_AcrossMultipleSweeps registers a counting factory and pins the resolution count at exactly 1 across three end-to-end dispatch sweeps.

NotificationOutbox-007 — NotificationOutboxOptions.DispatchBatchSize, DeliveredKpiWindow, and PurgeInterval are not in the design document

Severity Medium
Category Design-document adherence
Status Resolved
Location src/ZB.MOM.WW.ScadaBridge.NotificationOutbox/NotificationOutboxOptions.cs:13, :22, :25; docs/requirements/Component-NotificationOutbox.md:152-160

Description

Component-NotificationOutbox.md §Configuration enumerates three options: dispatch interval, stuck-age threshold, and terminal-row retention window. The implemented NotificationOutboxOptions adds three additional fields:

  • DispatchBatchSize (default 100) — caps the per-sweep claim size, but is invisible to anyone reading only the spec.
  • PurgeInterval (default 1 day) — the design doc says "daily purge" as if the cadence is fixed; in code it is configurable.
  • DeliveredKpiWindow (default 1 min) — the KPI section says "Delivered (last interval)" without saying how long "last interval" is or that it is configurable.

The design doc also asserts "Delivery max-retry-count and retry interval are not part of NotificationOutboxOptions — they are reused from the central SMTP configuration" (line 160) — implementation honours this. But the three additions above are dead text in the design doc. The KPI dashboard cadence and the dispatch batch size are both operationally important values an operator/engineer will hunt for; their absence from the spec is design drift.

Recommendation

Add the three fields to Component-NotificationOutbox.md §Configuration with their defaults, or remove them from the implementation if they were meant to be fixed constants. Cross-link DeliveredKpiWindow from the §Monitoring "Delivered (last interval)" KPI bullet so a reader sees what controls the bucket length.

Resolution (2026-05-28):

Resolved the code-side gap by adding clear XML <summary> docs to every property on NotificationOutboxOptions (DispatchInterval, DispatchBatchSize, StuckAgeThreshold, TerminalRetention, PurgeInterval, DeliveredKpiWindow) — each now states what it controls and its default value. The design-doc update remains tracked separately.

NotificationOutbox-008 — FallbackMaxRetries / FallbackRetryDelay path is unreachable in production AND untested

Severity Low
Category Testing coverage
Status Resolved
Location src/ZB.MOM.WW.ScadaBridge.NotificationOutbox/NotificationOutboxActor.cs:29-31, :251-259; tests in tests/ZB.MOM.WW.ScadaBridge.NotificationOutbox.Tests/NotificationOutboxActorDispatchTests.cs

Resolution (2026-05-28): The FallbackMaxRetries / FallbackRetryDelay constants are documented for forward-compat with a deferred secondary-adapter path — when no SMTP configuration row exists, EmailNotificationDeliveryAdapter returns Permanent("No SMTP configuration available") before the retry-policy values are ever consulted, so the path is effectively unreachable from any current production caller. The reachable use of the constants — clamping a non-positive SmtpConfiguration.MaxRetries / RetryDelay (per NO-002) — is already covered by TransientFailure_WithZeroMaxRetries_RetriesUsingFallback_DoesNotParkImmediately, TransientFailure_WithNegativeMaxRetries_RetriesUsingFallback_DoesNotParkImmediately, and TransientFailure_WithNonPositiveRetryDelay_UsesFallbackDelay_NotZero in NotificationOutboxActorDispatchTests.cs. Mark untestable today and re-visit when a non-Email adapter (Teams etc.) makes the empty-SMTP-config branch genuinely deliverable.

Description

ResolveRetryPolicyAsync falls back to FallbackMaxRetries = 10 and FallbackRetryDelay = 1 min when notificationRepository.GetAllSmtpConfigurationsAsync() returns an empty list (no SMTP configuration row). The comment correctly observes that delivery itself will then return Permanent("No SMTP configuration available") from EmailNotificationDeliveryAdapter.cs:78-81, so the fallback retry policy never actually retries anything — the row is permanently parked on first attempt regardless of retry count or delay.

This produces three concerns. (1) The fallback is essentially dead code — the retry policy values are never consulted in practice because delivery always fails permanently before the retry branch is reached. (2) The fallback can be reached after a previously-deployed SMTP config is deleted, which is precisely the moment an operator needs accurate audit trails; the row will say Parked with LastError = "No SMTP configuration available" but the audit signal "retry policy fell back to defaults" is invisible. (3) Tests never exercise either the fallback path or the empty-SMTP-config dispatch path: SetupSmtpRetryPolicy always supplies a config in every dispatch test.

Recommendation

Add a regression test that runs a dispatch sweep with no SMTP config row and asserts the row is parked with the documented error. Optionally remove the fallback constants if parking-with-no-config is the intended operational signal; document the choice in the actor XML so a maintainer does not "fix" the unreachable code.

NotificationOutbox-009 — StuckAgeThreshold XML-doc says "in-progress notification is re-claimed" — contradicts the design's display-only stuck detection

Severity Low
Category Documentation & comments
Status Resolved
Location src/ZB.MOM.WW.ScadaBridge.NotificationOutbox/NotificationOutboxOptions.cs:15-16

Description

/// <summary>Age past which an in-progress notification is considered stuck and re-claimed.</summary>
public TimeSpan StuckAgeThreshold { get; set; } = TimeSpan.FromMinutes(10);

The implementation never reclaims anything based on StuckAgeThreshold. It is used only as a cutoff for the stuck-count KPI (StuckCutoff/IsStuck in NotificationOutboxActor.cs:932-942) and as a StuckCutoff filter on paginated queries. The design doc is explicit: "A notification is stuck if it is Pending or Retrying and older than a configurable age threshold (default 10 minutes). Detection is display-only — a count KPI and a row badge. There is no automated escalation or alerting" (Component-NotificationOutbox.md:143-145). A maintainer reading the XML and expecting "re-claim" behaviour will be surprised twice — once when no re-claim happens, and once when they go looking for the re-claim code and find none.

Recommendation

Rewrite the XML to match the design: "Age past which a still-Pending/Retrying notification is counted as stuck on the KPI tile and the per-row badge. Display-only — does not affect dispatch."

Resolution (2026-05-28):

Rewrote the StuckAgeThreshold XML-doc to make the display-only semantics explicit: rows older than the threshold are flagged in KPIs/UI; there is no automatic re-claim, requeue, or escalation. Matches Component-NotificationOutbox.md §Monitoring.

NotificationOutbox-010 — Comment claims PipeTo is not used "because the writer never throws"; the surrounding try/catch is dead-letter for the documented failure mode

Severity Medium
Category Documentation & comments
Status Resolved
Location src/ZB.MOM.WW.ScadaBridge.NotificationOutbox/NotificationOutboxActor.cs:469-477

Description

try
{
    var evt = BuildNotifyDeliverEvent(notification, now, AuditStatus.Attempted, errorMessage)
        with { DurationMs = durationMs };
    // Fire-and-forget — we do NOT await: the dispatcher loop must not
    // be blocked by audit IO, and the writer swallows its own faults.
    // PipeTo is not used because the writer never throws.
    _ = _auditWriter.WriteAsync(evt);
}
catch (Exception ex)
{
    _logger.LogWarning(ex, "Failed to emit Attempted audit row …");
}

The XML-doc on EmitAttemptAudit is internally inconsistent and structurally incorrect: (1) if "the writer never throws" then the surrounding try/catch is unreachable and dead code; (2) if the writer can throw (and the catch is meaningful) then "never throws" is wrong. In practice the catch only ever fires on a synchronous throw from the writer's task construction — never on a fault in the awaited body — because the discarded task is not observed. The current behaviour matches the design intent ("audit failure NEVER aborts delivery"), but the comment misleads the next reader on the why.

This is the same root cause as NotificationOutbox-004 — they target the same lines from different angles (NotificationOutbox-004 is the scope-lifetime / fire-and-forget Akka concern, NotificationOutbox-010 is the doc/comment-clarity concern). Closing NotificationOutbox-004 by switching to await resolves both.

Recommendation

If await-ing the writer (recommended fix per NotificationOutbox-004): delete the "PipeTo is not used because the writer never throws" line entirely and let the try/catch's behaviour speak for itself. If keeping fire-and-forget: rewrite the comment to "fire-and-forget by design (the writer is responsible for its own failure handling); the surrounding try/catch only catches the synchronous task-construction throw and is otherwise unreachable."

Resolution

Unresolved.