fix(store-and-forward): resolve StoreAndForward-003, re-triage 002 — fix retry-count off-by-one

Joseph Doherty
2026-05-16 19:57:28 -04:00
parent 09b4bd5dfa
commit 71c0564ec0
4 changed files with 101 additions and 14 deletions


@@ -8,7 +8,7 @@
| Last reviewed | 2026-05-16 |
| Reviewer | claude-agent |
| Commit reviewed | `9c60592` |
-| Open findings | 12 |
+| Open findings | 11 |
## Summary
@@ -94,7 +94,7 @@ commit whose message references `StoreAndForward-001`.
| | |
|--|--|
-| Severity | High |
+| Severity | ~~High~~ → Low (re-triaged) |
| Category | Error handling & resilience |
| Status | Open |
| Location | `src/ScadaLink.StoreAndForward/StoreAndForwardService.cs:162`, `:201` |
@@ -121,9 +121,39 @@ handler exists rather than silently buffering an undeliverable message, and wire
registration is intended, the retry sweep should treat a still-missing handler as a
transient condition with bounded logging rather than a permanent no-op.
+**Re-triage note (2026-05-16)**
+The finding's central factual claim, *"No caller in the codebase ever calls
+`RegisterDeliveryHandler`"* and therefore *"every buffered message lands in this dead
+state"*, is **no longer true of the current code**. `ScadaLink.Host`
+(`AkkaHostedService.RegisterSiteActors`, `AkkaHostedService.cs:353-379`) registers all
+three delivery handlers (`ExternalSystem`, `CachedDbWrite`, `Notification`) at site
+startup, immediately after `StoreAndForwardService.StartAsync()`. The finding was
+written against commit `9c60592`, before that wiring existed; the High-severity
+"engine cannot deliver anything" outcome no longer occurs.
+The residual risk is narrow: a message enqueued for a category that genuinely
+has no handler (e.g. an enqueue racing ahead of `RegisterDeliveryHandler`, or a future
+category added without a handler) is still buffered and then skipped by the sweep
+forever. That is a real but minor robustness gap, hence the **downgrade to Low**.
+It is left **Open** rather than fixed in this pass because the finding's recommended
+fix (making `EnqueueAsync` reject when no handler is registered) is a behavioural
+contract change, not a localised bug fix. The "buffer with no handler yet" path is
+exercised by `StoreAndForwardReplicationTests` and by three NotificationService and
+ExternalSystemGateway tests (`Send_TransientError_WithStoreAndForward_BuffersMessage`,
+`Send_Smtp4xxCommandException_ClassifiedTransientAndBuffered`,
+`Send_SmtpProtocolException_ClassifiedTransient`), which construct a real
+`StoreAndForwardService` without registering a handler and assert `WasBuffered`.
+Changing the contract requires deciding whether late handler registration is supported
+and updating tests in modules outside this review's edit scope. That design decision
+should be made deliberately rather than forced here.
**Resolution**
-_Unresolved._
+_Open, re-triaged to Low. Premise (no handler registration anywhere) is stale: Host
+now wires all three handlers. Residual gap is minor and the prescribed fix is a
+cross-module contract change needing a design decision._
### StoreAndForward-003 — Off-by-one in retry accounting: immediate failure pre-counts as retry 1
@@ -131,7 +161,7 @@ _Unresolved._
|--|--|
| Severity | High |
| Category | Correctness & logic bugs |
-| Status | Open |
+| Status | Resolved |
| Location | `src/ScadaLink.StoreAndForward/StoreAndForwardService.cs:153`, `:229`, `:233` |
**Description**
@@ -159,7 +189,21 @@ the comparison. Update the affected test to match the chosen semantics.
**Resolution**
-_Unresolved._
+Resolved 2026-05-16 (commit `<pending>`). `RetryCount` now consistently means "number
+of background retry-sweep attempts so far"; the initial immediate (or caller-made)
+delivery attempt is attempt 0 and is not counted, and `MaxRetries` bounds retry-sweep
+attempts after that initial attempt. `EnqueueAsync` no longer seeds `RetryCount = 1` on
+either the transient-immediate-failure path or the `attemptImmediateDelivery: false`
+path; a freshly buffered message has `RetryCount = 0`. `RetryMessageAsync` already
+increments before the `>= MaxRetries` check, which is now correct, so a message with
+`MaxRetries = 1` gets exactly one real retry before parking (previously zero). The
+`StoreAndForwardMessage.RetryCount` XML doc was corrected to match. Regression test
+`RetryPendingMessagesAsync_MaxRetriesOne_PerformsExactlyOneRetryBeforeParking` asserts
+that the immediate attempt plus exactly one retry occur before parking; the affected
+existing tests (`EnqueueAsync_TransientFailure_BuffersForRetry`,
+`EnqueueAsync_AttemptImmediateDeliveryFalse_BuffersWithoutInvokingHandler`,
+`RetryPendingMessagesAsync_MaxRetriesReached_ParksMessage`) were updated to the
+corrected semantics.
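The corrected retry accounting in the StoreAndForward-003 resolution can be modelled in a few lines. This is a minimal Python sketch under stated assumptions, not the C# code in `StoreAndForwardService.cs`. `BufferedMessage`, `enqueue`, `retry_message` and `count_attempts_until_parked` are hypothetical names, and every delivery failure is treated as transient. The retry step assumes the order "attempt delivery, increment on failure, then compare with `>= max_retries`". The sketch models only the corrected semantics: a freshly buffered message starts at `retry_count = 0`, and `max_retries = n` yields exactly n sweep attempts after the initial attempt 0. It makes no attempt to reproduce the previous seeding behaviour.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class BufferedMessage:
    """Hypothetical stand-in for StoreAndForwardMessage; only retry bookkeeping is modelled."""
    max_retries: int
    retry_count: int = 0    # retry-sweep attempts so far; the initial attempt (attempt 0) is not counted
    parked: bool = False
    delivered: bool = False


def enqueue(max_retries: int, deliver: Callable[[], bool],
            attempt_immediate_delivery: bool = True) -> BufferedMessage:
    """Make the initial attempt (attempt 0) if requested; otherwise just buffer.

    On both paths a buffered message starts with retry_count == 0 (no seeding to 1).
    Assumption of this sketch: every failure is transient, and max_retries >= 1.
    """
    if max_retries < 1:
        raise ValueError("sketch assumes max_retries >= 1")
    msg = BufferedMessage(max_retries=max_retries)
    if attempt_immediate_delivery and deliver():
        msg.delivered = True
    return msg


def retry_message(msg: BufferedMessage, deliver: Callable[[], bool]) -> None:
    """One retry-sweep attempt: try delivery; on failure increment, then compare."""
    if msg.delivered or msg.parked:
        return
    if deliver():
        msg.delivered = True
        return
    msg.retry_count += 1                    # count this sweep attempt first...
    if msg.retry_count >= msg.max_retries:  # ...then park once the retry budget is spent
        msg.parked = True


def count_attempts_until_parked(max_retries: int) -> int:
    """Total delivery calls (initial + sweep) for a message whose delivery always fails."""
    calls = 0

    def always_fails() -> bool:
        nonlocal calls
        calls += 1
        return False

    msg = enqueue(max_retries, always_fails)
    while not (msg.parked or msg.delivered):
        retry_message(msg, always_fails)
    return calls


print(count_attempts_until_parked(1))  # 2: the initial attempt plus exactly one retry
print(count_attempts_until_parked(3))  # 4: the initial attempt plus three retries
```

Because the counter tracks only sweep attempts, `max_retries` reads directly as a retry budget. Seeding the counter with 1 when buffering is what made the immediate failure pre-count as retry 1, which is the off-by-one named in the finding title.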
### StoreAndForward-004 — `RegisterDeliveryHandler` XML doc contradicts the implemented contract