fix(store-and-forward): resolve StoreAndForward-003, re-triage 002 — fix retry-count off-by-one

This commit is contained in:
Joseph Doherty
2026-05-16 19:57:28 -04:00
parent 09b4bd5dfa
commit 71c0564ec0
4 changed files with 101 additions and 14 deletions

View File

@@ -8,7 +8,7 @@
| Last reviewed | 2026-05-16 | | Last reviewed | 2026-05-16 |
| Reviewer | claude-agent | | Reviewer | claude-agent |
| Commit reviewed | `9c60592` | | Commit reviewed | `9c60592` |
| Open findings | 12 | | Open findings | 11 |
## Summary ## Summary
@@ -94,7 +94,7 @@ commit whose message references `StoreAndForward-001`.
| | | | | |
|--|--| |--|--|
| Severity | High | | Severity | ~~High~~ → Low (re-triaged) |
| Category | Error handling & resilience | | Category | Error handling & resilience |
| Status | Open | | Status | Open |
| Location | `src/ScadaLink.StoreAndForward/StoreAndForwardService.cs:162`, `:201` | | Location | `src/ScadaLink.StoreAndForward/StoreAndForwardService.cs:162`, `:201` |
@@ -121,9 +121,39 @@ handler exists rather than silently buffering an undeliverable message, and wire
registration is intended, the retry sweep should treat a still-missing handler as a registration is intended, the retry sweep should treat a still-missing handler as a
transient condition with bounded logging rather than a permanent no-op. transient condition with bounded logging rather than a permanent no-op.
**Re-triage note (2026-05-16)**
The finding's central factual claim — *"No caller in the codebase ever calls
`RegisterDeliveryHandler`"* and therefore *"every buffered message lands in this dead
state"* — is **no longer true at the reviewed code**. `ScadaLink.Host`
(`AkkaHostedService.RegisterSiteActors`, `AkkaHostedService.cs:353-379`) registers all
three delivery handlers (`ExternalSystem`, `CachedDbWrite`, `Notification`) at site
startup, immediately after `StoreAndForwardService.StartAsync()`. The finding was
written against commit `9c60592` before that wiring existed; the High-severity
"engine cannot deliver anything" outcome no longer occurs.
The remaining residual risk is narrow: a message enqueued for a category that genuinely
has no handler (e.g. an enqueue racing ahead of `RegisterDeliveryHandler`, or a future
category added without a handler) is still buffered and then skipped by the sweep
forever. That is a real but minor robustness gap, hence the **downgrade to Low**.
It is left **Open** rather than fixed in this pass because the finding's recommended
fix — making `EnqueueAsync` reject when no handler is registered — is a behavioural
contract change, not a localised bug fix: the "buffer with no handler yet" path is
exercised by `StoreAndForwardReplicationTests` and by three NotificationService and
ExternalSystemGateway tests (`Send_TransientError_WithStoreAndForward_BuffersMessage`,
`Send_Smtp4xxCommandException_ClassifiedTransientAndBuffered`,
`Send_SmtpProtocolException_ClassifiedTransient`) which construct a real
`StoreAndForwardService` without registering a handler and assert `WasBuffered`.
Changing the contract requires deciding whether late handler registration is supported
and updating tests in modules outside this review's edit scope — a design decision that
should be made deliberately rather than forced here.
**Resolution** **Resolution**
_Unresolved._ _Open — re-triaged to Low. Premise (no handler registration anywhere) is stale: Host
now wires all three handlers. Residual gap is minor and the prescribed fix is a
cross-module contract change needing a design decision._
### StoreAndForward-003 — Off-by-one in retry accounting: immediate failure pre-counts as retry 1 ### StoreAndForward-003 — Off-by-one in retry accounting: immediate failure pre-counts as retry 1
@@ -131,7 +161,7 @@ _Unresolved._
|--|--| |--|--|
| Severity | High | | Severity | High |
| Category | Correctness & logic bugs | | Category | Correctness & logic bugs |
| Status | Open | | Status | Resolved |
| Location | `src/ScadaLink.StoreAndForward/StoreAndForwardService.cs:153`, `:229`, `:233` | | Location | `src/ScadaLink.StoreAndForward/StoreAndForwardService.cs:153`, `:229`, `:233` |
**Description** **Description**
@@ -159,7 +189,21 @@ the comparison. Update the affected test to match the chosen semantics.
**Resolution** **Resolution**
_Unresolved._ Resolved 2026-05-16 (commit `<pending>`). `RetryCount` now consistently means "number
of background retry-sweep attempts so far"; the initial immediate (or caller-made)
delivery attempt is attempt 0 and is not counted, and `MaxRetries` bounds retry-sweep
attempts after that initial attempt. `EnqueueAsync` no longer seeds `RetryCount = 1` on
either the transient-immediate-failure path or the `attemptImmediateDelivery: false`
path — a freshly buffered message has `RetryCount = 0`. `RetryMessageAsync` already
increments before the `>= MaxRetries` check, which is now correct, so a message with
`MaxRetries = 1` gets exactly one real retry before parking (previously zero). The
`StoreAndForwardMessage.RetryCount` XML doc was corrected to match. Regression test
`RetryPendingMessagesAsync_MaxRetriesOne_PerformsExactlyOneRetryBeforeParking` asserts
the immediate attempt plus exactly one retry occur before parking; the affected
existing tests (`EnqueueAsync_TransientFailure_BuffersForRetry`,
`EnqueueAsync_AttemptImmediateDeliveryFalse_BuffersWithoutInvokingHandler`,
`RetryPendingMessagesAsync_MaxRetriesReached_ParksMessage`) were updated to the
corrected semantics.
### StoreAndForward-004 — `RegisterDeliveryHandler` XML doc contradicts the implemented contract ### StoreAndForward-004 — `RegisterDeliveryHandler` XML doc contradicts the implemented contract

View File

@@ -20,10 +20,14 @@ public class StoreAndForwardMessage
/// <summary>JSON-serialized payload containing the call details.</summary> /// <summary>JSON-serialized payload containing the call details.</summary>
public string PayloadJson { get; set; } = string.Empty; public string PayloadJson { get; set; } = string.Empty;
/// <summary>Number of delivery attempts so far.</summary> /// <summary>
/// Number of retry-sweep attempts performed so far. The initial (immediate or
/// caller-made) delivery attempt is attempt 0 and is not counted here; this
/// field counts only background retry attempts (StoreAndForward-003).
/// </summary>
public int RetryCount { get; set; } public int RetryCount { get; set; }
/// <summary>Maximum retry attempts before parking (0 = no limit).</summary> /// <summary>Maximum retry-sweep attempts before parking (0 = no limit).</summary>
public int MaxRetries { get; set; } public int MaxRetries { get; set; }
/// <summary>Retry interval in milliseconds.</summary> /// <summary>Retry interval in milliseconds.</summary>

View File

@@ -148,13 +148,14 @@ public class StoreAndForwardService
} }
catch (Exception ex) catch (Exception ex)
{ {
// Transient failure — buffer for retry // Transient failure — buffer for retry. The immediate attempt is
// attempt 0; RetryCount tracks only sweep retries, so it stays 0
// here (StoreAndForward-003).
_logger.LogWarning(ex, _logger.LogWarning(ex,
"Immediate delivery to {Target} failed (transient), buffering for retry", "Immediate delivery to {Target} failed (transient), buffering for retry",
target); target);
message.LastAttemptAt = DateTimeOffset.UtcNow; message.LastAttemptAt = DateTimeOffset.UtcNow;
message.RetryCount = 1;
message.LastError = ex.Message; message.LastError = ex.Message;
await BufferAsync(message); await BufferAsync(message);
@@ -165,11 +166,11 @@ public class StoreAndForwardService
// Either no handler is registered yet, or the caller already attempted // Either no handler is registered yet, or the caller already attempted
// delivery itself — buffer for the background retry sweep to deliver. // delivery itself — buffer for the background retry sweep to deliver.
// The initial attempt (caller-made, or skipped because no handler is
// registered) is attempt 0; RetryCount tracks only sweep retries and
// therefore stays 0 here (StoreAndForward-003).
if (!attemptImmediateDelivery) if (!attemptImmediateDelivery)
{ {
// The caller made (and failed) one attempt before handing the
// message over, so it counts as the first retry.
message.RetryCount = 1;
message.LastAttemptAt = DateTimeOffset.UtcNow; message.LastAttemptAt = DateTimeOffset.UtcNow;
} }
await BufferAsync(message); await BufferAsync(message);

View File

@@ -86,7 +86,9 @@ public class StoreAndForwardServiceTests : IAsyncLifetime, IDisposable
var msg = await _storage.GetMessageByIdAsync(result.MessageId); var msg = await _storage.GetMessageByIdAsync(result.MessageId);
Assert.NotNull(msg); Assert.NotNull(msg);
Assert.Equal(StoreAndForwardMessageStatus.Pending, msg!.Status); Assert.Equal(StoreAndForwardMessageStatus.Pending, msg!.Status);
Assert.Equal(1, msg.RetryCount); // StoreAndForward-003: RetryCount counts sweep retries only; the immediate
// attempt is attempt 0, so a freshly buffered message has RetryCount 0.
Assert.Equal(0, msg.RetryCount);
} }
[Fact] [Fact]
@@ -134,6 +136,12 @@ public class StoreAndForwardServiceTests : IAsyncLifetime, IDisposable
StoreAndForwardCategory.ExternalSystem, "api", """{}""", StoreAndForwardCategory.ExternalSystem, "api", """{}""",
maxRetries: 2); maxRetries: 2);
// StoreAndForward-003: MaxRetries bounds sweep retries (not the immediate
// attempt), so a message with MaxRetries=2 needs two retry sweeps to park.
await _service.RetryPendingMessagesAsync();
var afterFirst = await _storage.GetMessageByIdAsync(result.MessageId);
Assert.Equal(StoreAndForwardMessageStatus.Pending, afterFirst!.Status);
await _service.RetryPendingMessagesAsync(); await _service.RetryPendingMessagesAsync();
var msg = await _storage.GetMessageByIdAsync(result.MessageId); var msg = await _storage.GetMessageByIdAsync(result.MessageId);
@@ -141,6 +149,34 @@ public class StoreAndForwardServiceTests : IAsyncLifetime, IDisposable
Assert.Equal(StoreAndForwardMessageStatus.Parked, msg!.Status); Assert.Equal(StoreAndForwardMessageStatus.Parked, msg!.Status);
} }
// ── StoreAndForward-003: retry-count accounting ──
[Fact]
public async Task RetryPendingMessagesAsync_MaxRetriesOne_PerformsExactlyOneRetryBeforeParking()
{
// The immediate attempt is attempt 0; MaxRetries=1 must allow exactly one
// retry sweep before parking. The pre-fix off-by-one parked with zero retries.
var attempts = 0;
_service.RegisterDeliveryHandler(StoreAndForwardCategory.ExternalSystem,
_ => { Interlocked.Increment(ref attempts); throw new HttpRequestException("always fails"); });
var result = await _service.EnqueueAsync(
StoreAndForwardCategory.ExternalSystem, "api", """{}""",
maxRetries: 1);
// After the immediate failed attempt the message is buffered, not parked.
var buffered = await _storage.GetMessageByIdAsync(result.MessageId);
Assert.Equal(StoreAndForwardMessageStatus.Pending, buffered!.Status);
Assert.Equal(1, attempts); // only the immediate attempt so far
await _service.RetryPendingMessagesAsync();
var msg = await _storage.GetMessageByIdAsync(result.MessageId);
Assert.Equal(StoreAndForwardMessageStatus.Parked, msg!.Status);
Assert.Equal(2, attempts); // immediate attempt + exactly one retry
Assert.Equal(1, msg.RetryCount); // one sweep retry recorded
}
[Fact] [Fact]
public async Task RetryPendingMessagesAsync_PermanentFailureOnRetry_ParksMessage() public async Task RetryPendingMessagesAsync_PermanentFailureOnRetry_ParksMessage()
{ {
@@ -332,6 +368,8 @@ public class StoreAndForwardServiceTests : IAsyncLifetime, IDisposable
var msg = await _storage.GetMessageByIdAsync(result.MessageId); var msg = await _storage.GetMessageByIdAsync(result.MessageId);
Assert.NotNull(msg); Assert.NotNull(msg);
Assert.Equal(StoreAndForwardMessageStatus.Pending, msg!.Status); Assert.Equal(StoreAndForwardMessageStatus.Pending, msg!.Status);
Assert.Equal(1, msg.RetryCount); // counts as the caller's first attempt // StoreAndForward-003: the caller's own attempt is attempt 0; RetryCount
// counts only sweep retries, so a freshly buffered message has RetryCount 0.
Assert.Equal(0, msg.RetryCount);
} }
} }