fix(store-and-forward): resolve StoreAndForward-004,005,010,013 — accurate handler-contract doc, conditional sweep writes, reset LastAttemptAt on parked retry, test coverage
This commit is contained in:
@@ -8,7 +8,7 @@
|
||||
| Last reviewed | 2026-05-16 |
|
||||
| Reviewer | claude-agent |
|
||||
| Commit reviewed | `9c60592` |
|
||||
| Open findings | 11 |
|
||||
| Open findings | 7 |
|
||||
|
||||
## Summary
|
||||
|
||||
@@ -212,7 +212,7 @@ corrected semantics.
|
||||
|--|--|
|
||||
| Severity | Medium |
|
||||
| Category | Documentation & comments |
|
||||
| Status | Open |
|
||||
| Status | Resolved |
|
||||
| Location | `src/ScadaLink.StoreAndForward/StoreAndForwardService.cs:38`, `:60` |
|
||||
|
||||
**Description**
|
||||
@@ -233,15 +233,25 @@ retry); exception = transient failure (buffer / increment retry).
|
||||
|
||||
**Resolution**
|
||||
|
||||
_Unresolved._
|
||||
Resolved 2026-05-16 (commit pending). Confirmed the root cause against the source: the
|
||||
old XML comment's "Permanent failures should return false (message will NOT be
|
||||
buffered)" describes only the `EnqueueAsync` immediate path; on the retry path a
|
||||
`false` return parks the already-buffered message. Reworded the `_deliveryHandlers`
|
||||
field doc into an explicit three-way contract (`true` = delivered/removed-or-never-
|
||||
buffered; `false` = permanent failure = not-buffered-on-immediate / parked-on-retry;
|
||||
throws = transient = buffered-for-retry / retry-count-incremented), noting it applies
|
||||
identically on both paths, and pointed `RegisterDeliveryHandler`'s summary at it.
|
||||
Documentation-only change — no behavioural code touched, so no regression test
|
||||
(an XML comment is not test-observable).
|
||||
|
||||
### StoreAndForward-005 — Parked-message retry/discard can race with the in-progress retry sweep
|
||||
|
||||
| | |
|
||||
|--|--|
|
||||
| Severity | Medium |
|
||||
| Severity | Low — re-triaged down from Medium on 2026-05-16; the described data-loss race is not actually reachable, see Re-triage note |
|
||||
| Original severity | Medium |
|
||||
| Category | Concurrency & thread safety |
|
||||
| Status | Open |
|
||||
| Status | Resolved |
|
||||
| Location | `src/ScadaLink.StoreAndForward/StoreAndForwardService.cs:184`, `:266`, `:280` |
|
||||
|
||||
**Description**
|
||||
@@ -266,9 +276,42 @@ so the actor mailbox serialises them, or make status transitions conditional in
|
||||
(e.g. `UPDATE ... WHERE id = @id AND status = @expected`) and re-check the affected
|
||||
row count.
|
||||
|
||||
**Re-triage note (2026-05-16)**
|
||||
|
||||
Verified against the source: the *specific* race the finding describes — *"an operator
|
||||
`RetryParkedMessageAsync` resets a row to Pending while the sweep simultaneously parks
|
||||
the same in-flight message"* — **cannot occur** at the reviewed code. The two operator
|
||||
operations (`StoreAndForwardStorage.RetryParkedMessageAsync` /
|
||||
`DiscardParkedMessageAsync`) are already SQL-conditional on `status = Parked`, and the
|
||||
retry sweep only ever loads and processes rows that are `Pending`. For the operator and
|
||||
the sweep to touch the *same* message the row would have to be simultaneously `Parked`
|
||||
(operator-actionable) and in the sweep's in-flight `Pending` snapshot — mutually
|
||||
exclusive states. Overlapping sweeps are additionally prevented by the `Interlocked`
|
||||
guard, so there is only ever one sweep writer. The described data-loss outcome is
|
||||
therefore not reachable, which makes the original Medium ("risky behaviour") severity
|
||||
an over-statement — **re-triaged to Low**.
|
||||
|
||||
What *is* real is a latent fragility: the sweep's own state-changing writes used the
|
||||
unconditional `UpdateMessageAsync` (`WHERE id = @id`), so the no-clobber guarantee
|
||||
rested entirely on the `Interlocked` invariant with zero defence-in-depth if that
|
||||
invariant were ever broken (e.g. `RetryMessageAsync` called outside the sweep, or the
|
||||
guard removed). That residual Low-severity weakness is worth closing, so the
|
||||
recommendation's conditional-SQL option was applied.
|
||||
|
||||
**Resolution**
|
||||
|
||||
_Unresolved._
|
||||
Resolved 2026-05-16 (commit pending). Re-triaged Medium → Low (the described race is
|
||||
unreachable; see Re-triage note) and the residual latent fragility fixed: added
|
||||
`StoreAndForwardStorage.UpdateMessageIfStatusAsync`, a conditional update
|
||||
(`UPDATE ... WHERE id = @id AND status = @expectedStatus`) returning whether a row was
|
||||
written. `RetryMessageAsync`'s three state-changing writes (park-on-permanent-failure,
|
||||
park-on-max-retries, retry-count increment) now use it with `expectedStatus = Pending`,
|
||||
so the sweep can never overwrite a row whose status changed underneath it; a skipped
|
||||
write is logged and the message left for the next sweep. Regression test
|
||||
`RetryMessageAsync_StatusChangedDuringDelivery_SweepParkWriteIsSkipped` drives a writer
|
||||
that moves the row out of `Pending` mid-delivery and asserts the sweep's stale park
|
||||
write is skipped (it failed against the pre-fix unconditional update, clobbering the
|
||||
other writer's `RetryCount`).
|
||||
|
||||
### StoreAndForward-006 — `GetParkedMessagesAsync` count and page run without a transaction
|
||||
|
||||
@@ -395,7 +438,7 @@ _Unresolved._
|
||||
|--|--|
|
||||
| Severity | Medium |
|
||||
| Category | Correctness & logic bugs |
|
||||
| Status | Open |
|
||||
| Status | Resolved |
|
||||
| Location | `src/ScadaLink.StoreAndForward/StoreAndForwardStorage.cs:203`, `:101` |
|
||||
|
||||
**Description**
|
||||
@@ -419,7 +462,16 @@ interval. Document the chosen behaviour in the method's XML comment.
|
||||
|
||||
**Resolution**
|
||||
|
||||
_Unresolved._
|
||||
Resolved 2026-05-16 (commit pending). Confirmed against the source: `RetryParkedMessage
|
||||
Async` reset `status`, `retry_count` and `last_error` but left `last_attempt_at` stale,
|
||||
so the operator-retried message's retry timing depended on the elapsed time since the
|
||||
original pre-park attempt. Encoded the intent explicitly — an operator retry means
|
||||
"attempt this again now" — by also setting `last_attempt_at = NULL` in the UPDATE, so
|
||||
the re-queued message is unambiguously due on the next sweep regardless of the
|
||||
configured `retry_interval_ms`. The method's XML comment now documents this. Regression
|
||||
test `RetryParkedMessageAsync_ClearsLastAttemptAt_SoMessageIsImmediatelyDue` uses a
|
||||
1-hour retry interval and a recent `last_attempt_at`; it failed pre-fix (timestamp not
|
||||
cleared, message excluded from the retry-due set) and passes post-fix.
|
||||
|
||||
### StoreAndForward-011 — `StoreAndForwardMessageStatus.InFlight` is unused and the doc's "retrying" status is unmodelled
|
||||
|
||||
@@ -488,7 +540,7 @@ _Unresolved._
|
||||
|--|--|
|
||||
| Severity | Medium |
|
||||
| Category | Testing coverage |
|
||||
| Status | Open |
|
||||
| Status | Resolved |
|
||||
| Location | `tests/ScadaLink.StoreAndForward.Tests/` (whole directory); `src/ScadaLink.StoreAndForward/StoreAndForwardStorage.cs:101`; `src/ScadaLink.StoreAndForward/ParkedMessageHandlerActor.cs` |
|
||||
|
||||
**Description**
|
||||
@@ -514,7 +566,20 @@ invalid-JSON payloads.
|
||||
|
||||
**Resolution**
|
||||
|
||||
_Unresolved._
|
||||
Resolved 2026-05-16 (commit pending). All three gaps closed.
|
||||
(1) Retry-due timing: `GetMessagesForRetryAsync_NonZeroInterval_ExcludesNotYetDue
|
||||
IncludesDue` exercises the `julianday` elapsed-time SQL with non-zero intervals — a
|
||||
just-attempted message (1-hour interval) is excluded, an old one (attempted 2h ago,
|
||||
5-min interval) and a never-attempted one are included.
|
||||
(2) Active-side replication: this sub-claim was **already stale** at the reviewed code —
|
||||
`StoreAndForwardReplicationTests` (added with finding 001's fix) asserts the Add, Remove
|
||||
and Park replication operations; no new test was needed.
|
||||
(3) `ParkedMessageHandlerActor`: new `ParkedMessageHandlerActorTests` using
|
||||
`Akka.TestKit.Xunit2` (package reference added to the test project) covers Query/Retry/
|
||||
Discard request-to-response mapping, correlation-ID propagation, the unknown-message
|
||||
failure responses, and `ExtractMethodName` for `MethodName`, `Subject`, missing-property
|
||||
and malformed-JSON payloads (all falling back to the category name without throwing).
|
||||
These are coverage-gap tests over already-correct code, so they pass on first run.
|
||||
|
||||
### StoreAndForward-014 — Storage does not create its SQLite database directory
|
||||
|
||||
|
||||
@@ -36,8 +36,20 @@ public class StoreAndForwardService
|
||||
private int _retryInProgress;
|
||||
|
||||
/// <summary>
|
||||
/// WP-10: Delivery handler delegate. Returns true on success, throws on transient failure.
|
||||
/// Permanent failures should return false (message will NOT be buffered).
|
||||
/// WP-10: Delivery handler delegate. The return value / exception is interpreted
|
||||
/// the same way on both the immediate-delivery path (<see cref="EnqueueAsync"/>)
|
||||
/// and the background retry path (<c>RetryMessageAsync</c>):
|
||||
/// <list type="bullet">
|
||||
/// <item><description><c>true</c> — delivered successfully. The message is
|
||||
/// removed from the buffer (or, on the immediate path, never buffered).</description></item>
|
||||
/// <item><description><c>false</c> — permanent failure. On the immediate path
|
||||
/// the message is NOT buffered; on a retry the message is already buffered and
|
||||
/// is parked immediately (no further retries).</description></item>
|
||||
/// <item><description>throws — transient failure. On the immediate path the
|
||||
/// message is buffered for retry; on a retry the retry count is incremented and
|
||||
/// the message is parked once <see cref="StoreAndForwardMessage.MaxRetries"/> is
|
||||
/// reached.</description></item>
|
||||
/// </list>
|
||||
/// </summary>
|
||||
private readonly Dictionary<StoreAndForwardCategory, Func<StoreAndForwardMessage, Task<bool>>> _deliveryHandlers = new();
|
||||
|
||||
@@ -59,7 +71,9 @@ public class StoreAndForwardService
|
||||
}
|
||||
|
||||
/// <summary>
|
||||
/// Registers a delivery handler for a given message category.
|
||||
/// Registers a delivery handler for a given message category. See the
|
||||
/// <c>_deliveryHandlers</c> field documentation for the true/false/throws contract,
|
||||
/// which applies identically on the immediate and retry paths.
|
||||
/// </summary>
|
||||
public void RegisterDeliveryHandler(
|
||||
StoreAndForwardCategory category,
|
||||
@@ -241,11 +255,22 @@ public class StoreAndForwardService
|
||||
return;
|
||||
}
|
||||
|
||||
// Permanent failure on retry — park immediately
|
||||
// Permanent failure on retry — park immediately.
|
||||
// StoreAndForward-005: the sweep observed this row as Pending; only commit
|
||||
// the park if it is still Pending so a concurrent operator action that
|
||||
// moved it (retry/discard) is not silently overwritten.
|
||||
message.Status = StoreAndForwardMessageStatus.Parked;
|
||||
message.LastAttemptAt = DateTimeOffset.UtcNow;
|
||||
message.LastError = "Permanent failure (handler returned false)";
|
||||
await _storage.UpdateMessageAsync(message);
|
||||
var parked = await _storage.UpdateMessageIfStatusAsync(
|
||||
message, StoreAndForwardMessageStatus.Pending);
|
||||
if (!parked)
|
||||
{
|
||||
_logger.LogDebug(
|
||||
"Message {MessageId} changed status during delivery; sweep park skipped",
|
||||
message.Id);
|
||||
return;
|
||||
}
|
||||
_replication?.ReplicatePark(message);
|
||||
RaiseActivity("Parked", message.Category,
|
||||
$"Permanent failure for {message.Target}: handler returned false");
|
||||
@@ -259,8 +284,18 @@ public class StoreAndForwardService
|
||||
|
||||
if (message.MaxRetries > 0 && message.RetryCount >= message.MaxRetries)
|
||||
{
|
||||
// StoreAndForward-005: conditional park — see the permanent-failure
|
||||
// branch above for rationale.
|
||||
message.Status = StoreAndForwardMessageStatus.Parked;
|
||||
await _storage.UpdateMessageAsync(message);
|
||||
var parked = await _storage.UpdateMessageIfStatusAsync(
|
||||
message, StoreAndForwardMessageStatus.Pending);
|
||||
if (!parked)
|
||||
{
|
||||
_logger.LogDebug(
|
||||
"Message {MessageId} changed status during delivery; sweep park skipped",
|
||||
message.Id);
|
||||
return;
|
||||
}
|
||||
_replication?.ReplicatePark(message);
|
||||
RaiseActivity("Parked", message.Category,
|
||||
$"Max retries ({message.MaxRetries}) reached for {message.Target}");
|
||||
@@ -270,7 +305,17 @@ public class StoreAndForwardService
|
||||
}
|
||||
else
|
||||
{
|
||||
await _storage.UpdateMessageAsync(message);
|
||||
// StoreAndForward-005: the retry-count increment is also conditional
|
||||
// on the row still being Pending so it cannot clobber an operator
|
||||
// action that ran during the failed delivery.
|
||||
if (!await _storage.UpdateMessageIfStatusAsync(
|
||||
message, StoreAndForwardMessageStatus.Pending))
|
||||
{
|
||||
_logger.LogDebug(
|
||||
"Message {MessageId} changed status during delivery; sweep retry-count update skipped",
|
||||
message.Id);
|
||||
return;
|
||||
}
|
||||
RaiseActivity("Retried", message.Category,
|
||||
$"Retry {message.RetryCount}/{message.MaxRetries} for {message.Target}: {ex.Message}");
|
||||
}
|
||||
|
||||
@@ -164,6 +164,47 @@ public class StoreAndForwardStorage
|
||||
await cmd.ExecuteNonQueryAsync();
|
||||
}
|
||||
|
||||
/// <summary>
|
||||
/// WP-10: Updates a message after a delivery attempt, but only if the row is still
|
||||
/// in the expected status. Returns true if the row was updated, false if it had
|
||||
/// already been changed (e.g. an operator retried or discarded the message) and so
|
||||
/// was skipped.
|
||||
///
|
||||
/// StoreAndForward-005: the retry sweep uses this for its state-changing writes so
|
||||
/// it cannot clobber a concurrent operator action (RetryParkedMessageAsync /
|
||||
/// DiscardParkedMessageAsync). Those operator operations are themselves SQL-
|
||||
/// conditional on <c>status = Parked</c>; making the sweep's writes conditional on
|
||||
/// the status the sweep observed closes the sweep-vs-management race rather than
|
||||
/// relying only on the in-process overlapping-sweep guard.
|
||||
/// </summary>
|
||||
public async Task<bool> UpdateMessageIfStatusAsync(
|
||||
StoreAndForwardMessage message,
|
||||
StoreAndForwardMessageStatus expectedStatus)
|
||||
{
|
||||
await using var connection = new SqliteConnection(_connectionString);
|
||||
await connection.OpenAsync();
|
||||
|
||||
await using var cmd = connection.CreateCommand();
|
||||
cmd.CommandText = @"
|
||||
UPDATE sf_messages
|
||||
SET retry_count = @retryCount,
|
||||
last_attempt_at = @lastAttempt,
|
||||
status = @status,
|
||||
last_error = @lastError
|
||||
WHERE id = @id AND status = @expectedStatus";
|
||||
|
||||
cmd.Parameters.AddWithValue("@id", message.Id);
|
||||
cmd.Parameters.AddWithValue("@retryCount", message.RetryCount);
|
||||
cmd.Parameters.AddWithValue("@lastAttempt", message.LastAttemptAt.HasValue
|
||||
? message.LastAttemptAt.Value.ToString("O") : DBNull.Value);
|
||||
cmd.Parameters.AddWithValue("@status", (int)message.Status);
|
||||
cmd.Parameters.AddWithValue("@lastError", (object?)message.LastError ?? DBNull.Value);
|
||||
cmd.Parameters.AddWithValue("@expectedStatus", (int)expectedStatus);
|
||||
|
||||
var rows = await cmd.ExecuteNonQueryAsync();
|
||||
return rows > 0;
|
||||
}
|
||||
|
||||
/// <summary>
|
||||
/// WP-10: Removes a successfully delivered message.
|
||||
/// </summary>
|
||||
@@ -221,6 +262,13 @@ public class StoreAndForwardStorage
|
||||
|
||||
/// <summary>
|
||||
/// WP-12: Moves a parked message back to pending for retry.
|
||||
///
|
||||
/// StoreAndForward-010: <c>last_attempt_at</c> is reset to NULL so the re-queued
|
||||
/// message is unambiguously due on the next retry sweep. An operator-initiated
|
||||
/// retry means "attempt this again now"; leaving the stale parked timestamp in
|
||||
/// place would make the message's retry timing depend on the configured retry
|
||||
/// interval relative to the original (pre-park) attempt — "try immediately" only
|
||||
/// by accident, and a long interval would instead delay the operator's retry.
|
||||
/// </summary>
|
||||
public async Task<bool> RetryParkedMessageAsync(string messageId)
|
||||
{
|
||||
@@ -230,7 +278,7 @@ public class StoreAndForwardStorage
|
||||
await using var cmd = connection.CreateCommand();
|
||||
cmd.CommandText = @"
|
||||
UPDATE sf_messages
|
||||
SET status = @pending, retry_count = 0, last_error = NULL
|
||||
SET status = @pending, retry_count = 0, last_error = NULL, last_attempt_at = NULL
|
||||
WHERE id = @id AND status = @parked";
|
||||
|
||||
cmd.Parameters.AddWithValue("@id", messageId);
|
||||
|
||||
@@ -0,0 +1,173 @@
|
||||
using Akka.Actor;
|
||||
using Akka.TestKit.Xunit2;
|
||||
using Microsoft.Data.Sqlite;
|
||||
using Microsoft.Extensions.Logging.Abstractions;
|
||||
using ScadaLink.Commons.Messages.RemoteQuery;
|
||||
using ScadaLink.Commons.Types.Enums;
|
||||
|
||||
namespace ScadaLink.StoreAndForward.Tests;
|
||||
|
||||
/// <summary>
|
||||
/// StoreAndForward-013: tests for the <see cref="ParkedMessageHandlerActor"/> actor
|
||||
/// bridge — the Query/Retry/Discard request-to-response mapping and the
|
||||
/// <c>ExtractMethodName</c> payload-JSON parsing (including the malformed-JSON branch).
|
||||
/// </summary>
|
||||
public class ParkedMessageHandlerActorTests : TestKit, IAsyncLifetime, IDisposable
|
||||
{
|
||||
private readonly SqliteConnection _keepAlive;
|
||||
private readonly StoreAndForwardStorage _storage;
|
||||
private readonly StoreAndForwardService _service;
|
||||
|
||||
public ParkedMessageHandlerActorTests()
|
||||
{
|
||||
var connStr = $"Data Source=ActorTests_{Guid.NewGuid():N};Mode=Memory;Cache=Shared";
|
||||
_keepAlive = new SqliteConnection(connStr);
|
||||
_keepAlive.Open();
|
||||
|
||||
_storage = new StoreAndForwardStorage(connStr, NullLogger<StoreAndForwardStorage>.Instance);
|
||||
|
||||
var options = new StoreAndForwardOptions
|
||||
{
|
||||
DefaultRetryInterval = TimeSpan.Zero,
|
||||
DefaultMaxRetries = 1,
|
||||
RetryTimerInterval = TimeSpan.FromMinutes(10),
|
||||
ReplicationEnabled = false,
|
||||
};
|
||||
|
||||
_service = new StoreAndForwardService(
|
||||
_storage, options, NullLogger<StoreAndForwardService>.Instance);
|
||||
}
|
||||
|
||||
public async Task InitializeAsync() => await _storage.InitializeAsync();
|
||||
|
||||
public Task DisposeAsync() => Task.CompletedTask;
|
||||
|
||||
protected override void Dispose(bool disposing)
|
||||
{
|
||||
if (disposing) _keepAlive.Dispose();
|
||||
base.Dispose(disposing);
|
||||
}
|
||||
|
||||
/// <summary>Enqueues a message and parks it via the retry sweep (MaxRetries = 1).</summary>
|
||||
private async Task<string> ParkMessageAsync(StoreAndForwardCategory category, string payloadJson)
|
||||
{
|
||||
_service.RegisterDeliveryHandler(category, _ => throw new HttpRequestException("always fails"));
|
||||
var result = await _service.EnqueueAsync(category, "target", payloadJson, maxRetries: 1);
|
||||
await _service.RetryPendingMessagesAsync();
|
||||
return result.MessageId;
|
||||
}
|
||||
|
||||
[Fact]
|
||||
public async Task Query_ReturnsParkedEntries_WithExtractedMethodName()
|
||||
{
|
||||
await ParkMessageAsync(StoreAndForwardCategory.ExternalSystem,
|
||||
"""{"MethodName":"StartPump","Args":{}}""");
|
||||
|
||||
var actor = Sys.ActorOf(Props.Create(() => new ParkedMessageHandlerActor(_service, "site-1")));
|
||||
actor.Tell(new ParkedMessageQueryRequest("corr-1", "site-1", 1, 50, DateTimeOffset.UtcNow));
|
||||
|
||||
var response = ExpectMsg<ParkedMessageQueryResponse>();
|
||||
Assert.True(response.Success);
|
||||
Assert.Equal("corr-1", response.CorrelationId);
|
||||
Assert.Equal("site-1", response.SiteId);
|
||||
Assert.Single(response.Messages);
|
||||
Assert.Equal("StartPump", response.Messages[0].MethodName);
|
||||
Assert.Equal(StoreAndForwardCategory.ExternalSystem, response.Messages[0].Category);
|
||||
}
|
||||
|
||||
[Fact]
|
||||
public async Task Query_NotificationPayload_UsesSubjectAsMethodName()
|
||||
{
|
||||
await ParkMessageAsync(StoreAndForwardCategory.Notification,
|
||||
"""{"Subject":"Tank overflow","Body":"..."}""");
|
||||
|
||||
var actor = Sys.ActorOf(Props.Create(() => new ParkedMessageHandlerActor(_service, "site-1")));
|
||||
actor.Tell(new ParkedMessageQueryRequest("corr-2", "site-1", 1, 50, DateTimeOffset.UtcNow));
|
||||
|
||||
var response = ExpectMsg<ParkedMessageQueryResponse>();
|
||||
Assert.True(response.Success);
|
||||
Assert.Equal("Tank overflow", response.Messages[0].MethodName);
|
||||
}
|
||||
|
||||
[Fact]
|
||||
public async Task Query_MalformedJsonPayload_FallsBackToCategoryName()
|
||||
{
|
||||
await ParkMessageAsync(StoreAndForwardCategory.CachedDbWrite, "not-valid-json{");
|
||||
|
||||
var actor = Sys.ActorOf(Props.Create(() => new ParkedMessageHandlerActor(_service, "site-1")));
|
||||
actor.Tell(new ParkedMessageQueryRequest("corr-3", "site-1", 1, 50, DateTimeOffset.UtcNow));
|
||||
|
||||
var response = ExpectMsg<ParkedMessageQueryResponse>();
|
||||
Assert.True(response.Success);
|
||||
// Malformed JSON must not throw — ExtractMethodName falls back to the category.
|
||||
Assert.Equal(StoreAndForwardCategory.CachedDbWrite.ToString(), response.Messages[0].MethodName);
|
||||
}
|
||||
|
||||
[Fact]
|
||||
public async Task Query_PayloadWithoutMethodNameOrSubject_FallsBackToCategoryName()
|
||||
{
|
||||
await ParkMessageAsync(StoreAndForwardCategory.ExternalSystem, """{"Unrelated":"value"}""");
|
||||
|
||||
var actor = Sys.ActorOf(Props.Create(() => new ParkedMessageHandlerActor(_service, "site-1")));
|
||||
actor.Tell(new ParkedMessageQueryRequest("corr-4", "site-1", 1, 50, DateTimeOffset.UtcNow));
|
||||
|
||||
var response = ExpectMsg<ParkedMessageQueryResponse>();
|
||||
Assert.Equal(StoreAndForwardCategory.ExternalSystem.ToString(), response.Messages[0].MethodName);
|
||||
}
|
||||
|
||||
[Fact]
|
||||
public async Task Retry_ParkedMessage_ReturnsSuccess()
|
||||
{
|
||||
var messageId = await ParkMessageAsync(StoreAndForwardCategory.ExternalSystem, """{}""");
|
||||
|
||||
var actor = Sys.ActorOf(Props.Create(() => new ParkedMessageHandlerActor(_service, "site-1")));
|
||||
actor.Tell(new ParkedMessageRetryRequest("corr-5", "site-1", messageId, DateTimeOffset.UtcNow));
|
||||
|
||||
var response = ExpectMsg<ParkedMessageRetryResponse>();
|
||||
Assert.True(response.Success);
|
||||
Assert.Equal("corr-5", response.CorrelationId);
|
||||
Assert.Null(response.ErrorMessage);
|
||||
|
||||
var msg = await _storage.GetMessageByIdAsync(messageId);
|
||||
Assert.Equal(StoreAndForwardMessageStatus.Pending, msg!.Status);
|
||||
}
|
||||
|
||||
[Fact]
|
||||
public void Retry_UnknownMessage_ReturnsFailureWithMessage()
|
||||
{
|
||||
var actor = Sys.ActorOf(Props.Create(() => new ParkedMessageHandlerActor(_service, "site-1")));
|
||||
actor.Tell(new ParkedMessageRetryRequest("corr-6", "site-1", "does-not-exist", DateTimeOffset.UtcNow));
|
||||
|
||||
var response = ExpectMsg<ParkedMessageRetryResponse>();
|
||||
Assert.False(response.Success);
|
||||
Assert.Equal("corr-6", response.CorrelationId);
|
||||
Assert.NotNull(response.ErrorMessage);
|
||||
}
|
||||
|
||||
[Fact]
|
||||
public async Task Discard_ParkedMessage_ReturnsSuccessAndRemovesMessage()
|
||||
{
|
||||
var messageId = await ParkMessageAsync(StoreAndForwardCategory.ExternalSystem, """{}""");
|
||||
|
||||
var actor = Sys.ActorOf(Props.Create(() => new ParkedMessageHandlerActor(_service, "site-1")));
|
||||
actor.Tell(new ParkedMessageDiscardRequest("corr-7", "site-1", messageId, DateTimeOffset.UtcNow));
|
||||
|
||||
var response = ExpectMsg<ParkedMessageDiscardResponse>();
|
||||
Assert.True(response.Success);
|
||||
Assert.Equal("corr-7", response.CorrelationId);
|
||||
|
||||
var msg = await _storage.GetMessageByIdAsync(messageId);
|
||||
Assert.Null(msg);
|
||||
}
|
||||
|
||||
[Fact]
|
||||
public void Discard_UnknownMessage_ReturnsFailureWithMessage()
|
||||
{
|
||||
var actor = Sys.ActorOf(Props.Create(() => new ParkedMessageHandlerActor(_service, "site-1")));
|
||||
actor.Tell(new ParkedMessageDiscardRequest("corr-8", "site-1", "does-not-exist", DateTimeOffset.UtcNow));
|
||||
|
||||
var response = ExpectMsg<ParkedMessageDiscardResponse>();
|
||||
Assert.False(response.Success);
|
||||
Assert.NotNull(response.ErrorMessage);
|
||||
}
|
||||
}
|
||||
@@ -9,6 +9,7 @@
|
||||
</PropertyGroup>
|
||||
|
||||
<ItemGroup>
|
||||
<PackageReference Include="Akka.TestKit.Xunit2" />
|
||||
<PackageReference Include="coverlet.collector" />
|
||||
<PackageReference Include="Microsoft.Data.Sqlite" />
|
||||
<PackageReference Include="Microsoft.NET.Test.Sdk" />
|
||||
|
||||
@@ -177,6 +177,49 @@ public class StoreAndForwardServiceTests : IAsyncLifetime, IDisposable
|
||||
Assert.Equal(1, msg.RetryCount); // one sweep retry recorded
|
||||
}
|
||||
|
||||
// ── StoreAndForward-005: sweep-vs-management race hardening ──
|
||||
|
||||
[Fact]
|
||||
public async Task RetryMessageAsync_StatusChangedDuringDelivery_SweepParkWriteIsSkipped()
|
||||
{
|
||||
// StoreAndForward-005: the retry sweep's state-changing writes must be
|
||||
// conditional on the status it observed, so a concurrent operator action that
|
||||
// moved the row out of Pending (e.g. between the sweep's snapshot load and its
|
||||
// park write) is not silently overwritten by the sweep's stale view.
|
||||
var result = await _service.EnqueueAsync(
|
||||
StoreAndForwardCategory.ExternalSystem, "api", """{}""",
|
||||
attemptImmediateDelivery: false, maxRetries: 1);
|
||||
|
||||
_service.RegisterDeliveryHandler(StoreAndForwardCategory.ExternalSystem,
|
||||
async msg =>
|
||||
{
|
||||
// Simulate an operator action winning the race: the row leaves Pending
|
||||
// (here: parked) while the sweep is still mid-delivery. The sweep would
|
||||
// otherwise unconditionally re-write this row from its stale snapshot.
|
||||
var parkedOutFromUnderTheSweep = new StoreAndForwardMessage
|
||||
{
|
||||
Id = msg.Id, Category = msg.Category, Target = msg.Target,
|
||||
PayloadJson = msg.PayloadJson, RetryCount = 7,
|
||||
MaxRetries = msg.MaxRetries, RetryIntervalMs = msg.RetryIntervalMs,
|
||||
CreatedAt = msg.CreatedAt, LastAttemptAt = DateTimeOffset.UtcNow,
|
||||
Status = StoreAndForwardMessageStatus.Parked,
|
||||
LastError = "operator/other writer"
|
||||
};
|
||||
await _storage.UpdateMessageAsync(parkedOutFromUnderTheSweep);
|
||||
throw new HttpRequestException("transient — sweep will try to park");
|
||||
});
|
||||
|
||||
await _service.RetryPendingMessagesAsync();
|
||||
|
||||
// The sweep observed Pending; the row is now Parked with the other writer's
|
||||
// RetryCount (7), not the sweep's (1). The sweep's conditional write was skipped.
|
||||
var msg = await _storage.GetMessageByIdAsync(result.MessageId);
|
||||
Assert.NotNull(msg);
|
||||
Assert.Equal(StoreAndForwardMessageStatus.Parked, msg!.Status);
|
||||
Assert.Equal(7, msg.RetryCount);
|
||||
Assert.Equal("operator/other writer", msg.LastError);
|
||||
}
|
||||
|
||||
[Fact]
|
||||
public async Task RetryPendingMessagesAsync_PermanentFailureOnRetry_ParksMessage()
|
||||
{
|
||||
|
||||
@@ -106,6 +106,35 @@ public class StoreAndForwardStorageTests : IAsyncLifetime, IDisposable
|
||||
Assert.All(forRetry, m => Assert.Equal(StoreAndForwardMessageStatus.Pending, m.Status));
|
||||
}
|
||||
|
||||
[Fact]
|
||||
public async Task GetMessagesForRetryAsync_NonZeroInterval_ExcludesNotYetDueIncludesDue()
|
||||
{
|
||||
// StoreAndForward-013: exercise the julianday elapsed-time comparison with a
|
||||
// non-zero retry interval. A message attempted just now must NOT be due; one
|
||||
// attempted long ago must be due.
|
||||
var notDue = CreateMessage("notdue", StoreAndForwardCategory.ExternalSystem);
|
||||
notDue.RetryIntervalMs = (long)TimeSpan.FromHours(1).TotalMilliseconds;
|
||||
notDue.LastAttemptAt = DateTimeOffset.UtcNow;
|
||||
await _storage.EnqueueAsync(notDue);
|
||||
|
||||
var due = CreateMessage("due", StoreAndForwardCategory.ExternalSystem);
|
||||
due.RetryIntervalMs = (long)TimeSpan.FromMinutes(5).TotalMilliseconds;
|
||||
due.LastAttemptAt = DateTimeOffset.UtcNow.AddHours(-2);
|
||||
await _storage.EnqueueAsync(due);
|
||||
|
||||
var neverAttempted = CreateMessage("never", StoreAndForwardCategory.ExternalSystem);
|
||||
neverAttempted.RetryIntervalMs = (long)TimeSpan.FromHours(1).TotalMilliseconds;
|
||||
neverAttempted.LastAttemptAt = null;
|
||||
await _storage.EnqueueAsync(neverAttempted);
|
||||
|
||||
var forRetry = await _storage.GetMessagesForRetryAsync();
|
||||
var ids = forRetry.Select(m => m.Id).ToHashSet();
|
||||
|
||||
Assert.DoesNotContain("notdue", ids);
|
||||
Assert.Contains("due", ids);
|
||||
Assert.Contains("never", ids);
|
||||
}
|
||||
|
||||
[Fact]
|
||||
public async Task GetParkedMessagesAsync_ReturnsParkedOnly()
|
||||
{
|
||||
@@ -136,6 +165,31 @@ public class StoreAndForwardStorageTests : IAsyncLifetime, IDisposable
|
||||
Assert.Equal(0, retrieved.RetryCount);
|
||||
}
|
||||
|
||||
[Fact]
|
||||
public async Task RetryParkedMessageAsync_ClearsLastAttemptAt_SoMessageIsImmediatelyDue()
|
||||
{
|
||||
// StoreAndForward-010: a re-queued parked message must be unambiguously due
|
||||
// for the next sweep regardless of its (stale) last_attempt_at. Use a large
|
||||
// retry interval so a leftover timestamp would otherwise exclude the message.
|
||||
var msg = CreateMessage("requeue1", StoreAndForwardCategory.ExternalSystem);
|
||||
msg.RetryIntervalMs = (long)TimeSpan.FromHours(1).TotalMilliseconds;
|
||||
msg.LastAttemptAt = DateTimeOffset.UtcNow; // recent attempt
|
||||
msg.Status = StoreAndForwardMessageStatus.Parked;
|
||||
await _storage.EnqueueAsync(msg);
|
||||
await _storage.UpdateMessageAsync(msg);
|
||||
|
||||
var requeued = await _storage.RetryParkedMessageAsync("requeue1");
|
||||
Assert.True(requeued);
|
||||
|
||||
var retrieved = await _storage.GetMessageByIdAsync("requeue1");
|
||||
Assert.Null(retrieved!.LastAttemptAt);
|
||||
|
||||
// It must appear in the retry-due set even though the configured interval
|
||||
// (1 hour) has not elapsed since the original attempt.
|
||||
var due = await _storage.GetMessagesForRetryAsync();
|
||||
Assert.Contains(due, m => m.Id == "requeue1");
|
||||
}
|
||||
|
||||
[Fact]
|
||||
public async Task DiscardParkedMessageAsync_RemovesMessage()
|
||||
{
|
||||
|
||||
Reference in New Issue
Block a user