fix(store-and-forward): resolve StoreAndForward-003, re-triage 002 — fix retry-count off-by-one
@@ -8,7 +8,7 @@
 | Last reviewed | 2026-05-16 |
 | Reviewer | claude-agent |
 | Commit reviewed | `9c60592` |
-| Open findings | 12 |
+| Open findings | 11 |

 ## Summary

@@ -94,7 +94,7 @@ commit whose message references `StoreAndForward-001`.

 | | |
 |--|--|
-| Severity | High |
+| Severity | ~~High~~ → Low (re-triaged) |
 | Category | Error handling & resilience |
 | Status | Open |
 | Location | `src/ScadaLink.StoreAndForward/StoreAndForwardService.cs:162`, `:201` |
@@ -121,9 +121,39 @@ handler exists rather than silently buffering an undeliverable message, and wire
 registration is intended, the retry sweep should treat a still-missing handler as a
 transient condition with bounded logging rather than a permanent no-op.

+**Re-triage note (2026-05-16)**
+
+The finding's central factual claim — *"No caller in the codebase ever calls
+`RegisterDeliveryHandler`"* and therefore *"every buffered message lands in this dead
+state"* — is **no longer true at the reviewed code**. `ScadaLink.Host`
+(`AkkaHostedService.RegisterSiteActors`, `AkkaHostedService.cs:353-379`) registers all
+three delivery handlers (`ExternalSystem`, `CachedDbWrite`, `Notification`) at site
+startup, immediately after `StoreAndForwardService.StartAsync()`. The finding was
+written against commit `9c60592` before that wiring existed; the High-severity
+"engine cannot deliver anything" outcome no longer occurs.
+
+The remaining residual risk is narrow: a message enqueued for a category that genuinely
+has no handler (e.g. an enqueue racing ahead of `RegisterDeliveryHandler`, or a future
+category added without a handler) is still buffered and then skipped by the sweep
+forever. That is a real but minor robustness gap, hence the **downgrade to Low**.
+
+It is left **Open** rather than fixed in this pass because the finding's recommended
+fix — making `EnqueueAsync` reject when no handler is registered — is a behavioural
+contract change, not a localised bug fix: the "buffer with no handler yet" path is
+exercised by `StoreAndForwardReplicationTests` and by three NotificationService and
+ExternalSystemGateway tests (`Send_TransientError_WithStoreAndForward_BuffersMessage`,
+`Send_Smtp4xxCommandException_ClassifiedTransientAndBuffered`,
+`Send_SmtpProtocolException_ClassifiedTransient`) which construct a real
+`StoreAndForwardService` without registering a handler and assert `WasBuffered`.
+Changing the contract requires deciding whether late handler registration is supported
+and updating tests in modules outside this review's edit scope — a design decision that
+should be made deliberately rather than forced here.
+
 **Resolution**

-_Unresolved._
+_Open — re-triaged to Low. Premise (no handler registration anywhere) is stale: Host
+now wires all three handlers. Residual gap is minor and the prescribed fix is a
+cross-module contract change needing a design decision._

 ### StoreAndForward-003 — Off-by-one in retry accounting: immediate failure pre-counts as retry 1

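Reviewer sketch for the StoreAndForward-002 residual gap described in the hunk above: a message buffered for a category with no handler is skipped by every sweep. The model below illustrates the alternative named in the finding, treating a still-missing handler as a transient condition with bounded logging. It is a language-neutral Python model, not the C# service; `SweepSketch`, its fields, and the "warn once per category until a handler appears" policy are assumptions made for illustration.

```python
import logging
from typing import Callable, Dict, List, Set, Tuple

log = logging.getLogger("store_and_forward_sketch")


class SweepSketch:
    """Hypothetical model of a retry sweep; not the ScadaLink implementation."""

    def __init__(self) -> None:
        self.handlers: Dict[str, Callable[[str], bool]] = {}
        self.pending: List[Tuple[str, str]] = []  # (category, payload)
        self._warned: Set[str] = set()            # categories already warned about
        self.warnings_logged = 0                  # exposed so the bound can be checked

    def register_delivery_handler(self, category: str,
                                  handler: Callable[[str], bool]) -> None:
        self.handlers[category] = handler
        self._warned.discard(category)  # re-arm the warning if the handler vanishes later

    def enqueue(self, category: str, payload: str) -> None:
        # Buffering without a handler stays allowed (late registration is supported here).
        self.pending.append((category, payload))

    def sweep(self) -> int:
        """Run one sweep and return the number of messages delivered."""
        delivered = 0
        still_pending: List[Tuple[str, str]] = []
        for category, payload in self.pending:
            handler = self.handlers.get(category)
            if handler is None:
                # Transient condition: keep the message, warn once per category.
                if category not in self._warned:
                    log.warning("no delivery handler for category %s yet", category)
                    self._warned.add(category)
                    self.warnings_logged += 1
                still_pending.append((category, payload))
            elif handler(payload):
                delivered += 1
            else:
                still_pending.append((category, payload))
        self.pending = still_pending
        return delivered
```

Under this policy an enqueue that races ahead of registration is delivered by the first sweep after the handler appears, and a category that never gets a handler produces one warning rather than silence or a warning per sweep. Whether late registration should be supported at all remains the open design decision noted in the re-triage.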
@@ -131,7 +161,7 @@ _Unresolved._
 |--|--|
 | Severity | High |
 | Category | Correctness & logic bugs |
-| Status | Open |
+| Status | Resolved |
 | Location | `src/ScadaLink.StoreAndForward/StoreAndForwardService.cs:153`, `:229`, `:233` |

 **Description**
@@ -159,7 +189,21 @@ the comparison. Update the affected test to match the chosen semantics.

 **Resolution**

-_Unresolved._
+Resolved 2026-05-16 (commit `<pending>`). `RetryCount` now consistently means "number
+of background retry-sweep attempts so far"; the initial immediate (or caller-made)
+delivery attempt is attempt 0 and is not counted, and `MaxRetries` bounds retry-sweep
+attempts after that initial attempt. `EnqueueAsync` no longer seeds `RetryCount = 1` on
+either the transient-immediate-failure path or the `attemptImmediateDelivery: false`
+path — a freshly buffered message has `RetryCount = 0`. `RetryMessageAsync` already
+increments before the `>= MaxRetries` check, which is now correct, so a message with
+`MaxRetries = 1` gets exactly one real retry before parking (previously zero). The
+`StoreAndForwardMessage.RetryCount` XML doc was corrected to match. Regression test
+`RetryPendingMessagesAsync_MaxRetriesOne_PerformsExactlyOneRetryBeforeParking` asserts
+the immediate attempt plus exactly one retry occur before parking; the affected
+existing tests (`EnqueueAsync_TransientFailure_BuffersForRetry`,
+`EnqueueAsync_AttemptImmediateDeliveryFalse_BuffersWithoutInvokingHandler`,
+`RetryPendingMessagesAsync_MaxRetriesReached_ParksMessage`) were updated to the
+corrected semantics.

 ### StoreAndForward-004 — `RegisterDeliveryHandler` XML doc contradicts the implemented contract

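Reviewer sketch for the StoreAndForward-003 resolution above: the corrected counting rule (initial attempt is attempt 0, `RetryCount` counts sweep attempts only, park once `RetryCount >= MaxRetries`) can be modelled in a few lines. This is a Python model of the semantics as the resolution text states them, not the C# code; `BufferedMessage`, `enqueue`, and `retry_sweep_once` are hypothetical names, and attempting delivery before the increment is an assumption about ordering inside the sweep.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class BufferedMessage:
    """Hypothetical stand-in for StoreAndForwardMessage; fields are assumptions."""
    max_retries: int
    retry_count: int = 0  # sweep attempts so far; the initial attempt is attempt 0
    parked: bool = False


def enqueue(msg: BufferedMessage,
            deliver: Callable[[BufferedMessage], bool]) -> bool:
    """Make the initial attempt; on failure the message is buffered with retry_count 0."""
    # The initial attempt is never counted, so retry_count is not seeded to 1 here.
    return deliver(msg)


def retry_sweep_once(msg: BufferedMessage,
                     deliver: Callable[[BufferedMessage], bool]) -> bool:
    """One sweep attempt for one message; returns True if it was delivered."""
    if msg.parked:
        return False  # parked messages are not retried
    if deliver(msg):
        return True
    msg.retry_count += 1                     # count this failed sweep attempt
    if msg.retry_count >= msg.max_retries:   # bound applies to sweep attempts only
        msg.parked = True
    return False


# Usage: MaxRetries = 1 with a handler that always fails.
attempts = []


def always_fails(m: BufferedMessage) -> bool:
    attempts.append(m.retry_count)
    return False


msg = BufferedMessage(max_retries=1)
enqueue(msg, always_fails)           # initial attempt (attempt 0)
retry_sweep_once(msg, always_fails)  # the one real retry, then parked
retry_sweep_once(msg, always_fails)  # no attempt: message is parked
print(len(attempts), msg.retry_count, msg.parked)  # prints "2 1 True"
```

The usage mirrors what the regression test `RetryPendingMessagesAsync_MaxRetriesOne_PerformsExactlyOneRetryBeforeParking` is described as asserting: the immediate attempt plus exactly one retry occur before parking.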