fix(store-and-forward): resolve StoreAndForward-004,005,010,013 — accurate handler-contract doc, conditional sweep writes, reset LastAttemptAt on parked retry, test coverage
@@ -8,7 +8,7 @@
|
|||||||
| Last reviewed | 2026-05-16 |
|
| Last reviewed | 2026-05-16 |
|
||||||
| Reviewer | claude-agent |
|
| Reviewer | claude-agent |
|
||||||
| Commit reviewed | `9c60592` |
|
| Commit reviewed | `9c60592` |
|
||||||
| Open findings | 11 |
|
| Open findings | 7 |
|
||||||
|
|
||||||
## Summary
|
## Summary
|
||||||
|
|
||||||
@@ -212,7 +212,7 @@ corrected semantics.
|
|||||||
|--|--|
|
|--|--|
|
||||||
| Severity | Medium |
|
| Severity | Medium |
|
||||||
| Category | Documentation & comments |
|
| Category | Documentation & comments |
|
||||||
| Status | Open |
|
| Status | Resolved |
|
||||||
| Location | `src/ScadaLink.StoreAndForward/StoreAndForwardService.cs:38`, `:60` |
|
| Location | `src/ScadaLink.StoreAndForward/StoreAndForwardService.cs:38`, `:60` |
|
||||||
|
|
||||||
**Description**
|
**Description**
|
||||||
@@ -233,15 +233,25 @@ retry); exception = transient failure (buffer / increment retry).
|
|||||||
|
|
||||||
**Resolution**
|
**Resolution**
|
||||||
|
|
||||||
_Unresolved._
|
Resolved 2026-05-16 (commit pending). Confirmed the root cause against the source: the
|
||||||
|
old XML comment's "Permanent failures should return false (message will NOT be
|
||||||
|
buffered)" describes only the `EnqueueAsync` immediate path; on the retry path a
|
||||||
|
`false` return parks the already-buffered message. Reworded the `_deliveryHandlers`
|
||||||
|
field doc into an explicit three-way contract (`true` = delivered/removed-or-never-
|
||||||
|
buffered; `false` = permanent failure = not-buffered-on-immediate / parked-on-retry;
|
||||||
|
throws = transient = buffered-for-retry / retry-count-incremented), noting it applies
|
||||||
|
identically on both paths, and pointed `RegisterDeliveryHandler`'s summary at it.
|
||||||
|
Documentation-only change — no behavioural code touched, so no regression test
|
||||||
|
(an XML comment is not test-observable).
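
The three-way contract described above can be sketched as a small pure function. This is only an illustration under stated assumptions: `DeliveryPath`, `HandlerOutcome` and `HandlerContractSketch.InterpretAsync` are hypothetical names that do not exist in `StoreAndForwardService`, and the real retry path additionally parks the message once `MaxRetries` is reached.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical names for illustration only; none of these exist in the reviewed code.
public enum DeliveryPath { Immediate, Retry }

public enum HandlerOutcome
{
    Delivered,              // true: removed from the buffer, or never buffered
    NotBuffered,            // false on the immediate path
    Parked,                 // false on the retry path
    BufferedForRetry,       // throws on the immediate path
    RetryCountIncremented   // throws on the retry path (parks later at MaxRetries)
}

public static class HandlerContractSketch
{
    // Maps a handler's return value or exception to the action the documented
    // contract prescribes for the given path.
    public static async Task<HandlerOutcome> InterpretAsync(
        Func<Task<bool>> handler, DeliveryPath path)
    {
        try
        {
            bool delivered = await handler();
            if (delivered)
            {
                return HandlerOutcome.Delivered;
            }

            // false means permanent failure on both paths.
            return path == DeliveryPath.Immediate
                ? HandlerOutcome.NotBuffered
                : HandlerOutcome.Parked;
        }
        catch (Exception)
        {
            // Any exception is treated as a transient failure.
            return path == DeliveryPath.Immediate
                ? HandlerOutcome.BufferedForRetry
                : HandlerOutcome.RetryCountIncremented;
        }
    }
}
```

The sketch shows why the old comment was misleading: the same `false` return leads to "not buffered" on one path and "parked" on the other.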
|
||||||
|
|
||||||
### StoreAndForward-005 — Parked-message retry/discard can race with the in-progress retry sweep
|
### StoreAndForward-005 — Parked-message retry/discard can race with the in-progress retry sweep
|
||||||
|
|
||||||
| | |
|
| | |
|
||||||
|--|--|
|
|--|--|
|
||||||
| Severity | Medium |
|
| Severity | Low — re-triaged down from Medium on 2026-05-16; the described data-loss race is not actually reachable, see Re-triage note |
|
||||||
|
| Original severity | Medium |
|
||||||
| Category | Concurrency & thread safety |
|
| Category | Concurrency & thread safety |
|
||||||
| Status | Open |
|
| Status | Resolved |
|
||||||
| Location | `src/ScadaLink.StoreAndForward/StoreAndForwardService.cs:184`, `:266`, `:280` |
|
| Location | `src/ScadaLink.StoreAndForward/StoreAndForwardService.cs:184`, `:266`, `:280` |
|
||||||
|
|
||||||
**Description**
|
**Description**
|
||||||
@@ -266,9 +276,42 @@ so the actor mailbox serialises them, or make status transitions conditional in
|
|||||||
(e.g. `UPDATE ... WHERE id = @id AND status = @expected`) and re-check the affected
|
(e.g. `UPDATE ... WHERE id = @id AND status = @expected`) and re-check the affected
|
||||||
row count.
|
row count.
|
||||||
|
|
||||||
|
**Re-triage note (2026-05-16)**
|
||||||
|
|
||||||
|
Verified against the source: the *specific* race the finding describes — *"an operator
|
||||||
|
`RetryParkedMessageAsync` resets a row to Pending while the sweep simultaneously parks
|
||||||
|
the same in-flight message"* — **cannot occur** at the reviewed code. The two operator
|
||||||
|
operations (`StoreAndForwardStorage.RetryParkedMessageAsync` /
|
||||||
|
`DiscardParkedMessageAsync`) are already SQL-conditional on `status = Parked`, and the
|
||||||
|
retry sweep only ever loads and processes rows that are `Pending`. For the operator and
|
||||||
|
the sweep to touch the *same* message the row would have to be simultaneously `Parked`
|
||||||
|
(operator-actionable) and in the sweep's in-flight `Pending` snapshot — mutually
|
||||||
|
exclusive states. Overlapping sweeps are additionally prevented by the `Interlocked`
|
||||||
|
guard, so there is only ever one sweep writer. The described data-loss outcome is
|
||||||
|
therefore not reachable, which makes the original Medium ("risky behaviour") severity
|
||||||
|
an over-statement — **re-triaged to Low**.
|
||||||
|
|
||||||
|
What *is* real is a latent fragility: the sweep's own state-changing writes used the
|
||||||
|
unconditional `UpdateMessageAsync` (`WHERE id = @id`), so the no-clobber guarantee
|
||||||
|
rested entirely on the `Interlocked` invariant with zero defence-in-depth if that
|
||||||
|
invariant were ever broken (e.g. `RetryMessageAsync` called outside the sweep, or the
|
||||||
|
guard removed). That residual Low-severity weakness is worth closing, so the
|
||||||
|
recommendation's conditional-SQL option was applied.
|
||||||
|
|
||||||
**Resolution**
|
**Resolution**
|
||||||
|
|
||||||
_Unresolved._
|
Resolved 2026-05-16 (commit pending). Re-triaged Medium → Low (the described race is
|
||||||
|
unreachable; see Re-triage note) and the residual latent fragility fixed: added
|
||||||
|
`StoreAndForwardStorage.UpdateMessageIfStatusAsync`, a conditional update
|
||||||
|
(`UPDATE ... WHERE id = @id AND status = @expectedStatus`) returning whether a row was
|
||||||
|
written. `RetryMessageAsync`'s three state-changing writes (park-on-permanent-failure,
|
||||||
|
park-on-max-retries, retry-count increment) now use it with `expectedStatus = Pending`,
|
||||||
|
so the sweep can never overwrite a row whose status changed underneath it; a skipped
|
||||||
|
write is logged and the message left for the next sweep. Regression test
|
||||||
|
`RetryMessageAsync_StatusChangedDuringDelivery_SweepParkWriteIsSkipped` drives a writer
|
||||||
|
that moves the row out of `Pending` mid-delivery and asserts the sweep's stale park
|
||||||
|
write is skipped (it failed against the pre-fix unconditional update, clobbering the
|
||||||
|
other writer's `RetryCount`).
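
The effect of the conditional write can be illustrated with an in-memory analogue. This is a sketch, not the actual storage code: `ConditionalStoreSketch`, `MessageRow` and `RowStatus` are hypothetical stand-ins, and a lock plays the role that SQLite's single-statement atomicity plays for `UPDATE ... WHERE id = @id AND status = @expectedStatus`.

```csharp
using System.Collections.Generic;

// Hypothetical stand-ins for illustration; the real table has more columns.
public enum RowStatus { Pending, Parked }

public sealed record MessageRow(RowStatus Status, int RetryCount, string LastError);

public sealed class ConditionalStoreSketch
{
    private readonly Dictionary<string, MessageRow> _rows = new();
    private readonly object _gate = new();

    // Unconditional write, analogous to UPDATE ... WHERE id = @id.
    public void Put(string id, MessageRow row)
    {
        lock (_gate) { _rows[id] = row; }
    }

    public MessageRow Get(string id)
    {
        lock (_gate) { return _rows[id]; }
    }

    // Conditional write: succeeds only if the row exists and is still in the
    // status the caller observed. Returns whether a row was written.
    public bool UpdateIfStatus(string id, MessageRow updated, RowStatus expected)
    {
        lock (_gate)
        {
            if (!_rows.TryGetValue(id, out var current) || current.Status != expected)
            {
                return false;
            }

            _rows[id] = updated;
            return true;
        }
    }
}
```

If another writer moves the row to `Parked` with `RetryCount = 7` after the sweep took its snapshot, the sweep's `UpdateIfStatus(..., RowStatus.Pending)` returns `false` and the other writer's values survive, which mirrors what the regression test asserts.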
|
||||||
|
|
||||||
### StoreAndForward-006 — `GetParkedMessagesAsync` count and page run without a transaction
|
### StoreAndForward-006 — `GetParkedMessagesAsync` count and page run without a transaction
|
||||||
|
|
||||||
@@ -395,7 +438,7 @@ _Unresolved._
|
|||||||
|--|--|
|
|--|--|
|
||||||
| Severity | Medium |
|
| Severity | Medium |
|
||||||
| Category | Correctness & logic bugs |
|
| Category | Correctness & logic bugs |
|
||||||
| Status | Open |
|
| Status | Resolved |
|
||||||
| Location | `src/ScadaLink.StoreAndForward/StoreAndForwardStorage.cs:203`, `:101` |
|
| Location | `src/ScadaLink.StoreAndForward/StoreAndForwardStorage.cs:203`, `:101` |
|
||||||
|
|
||||||
**Description**
|
**Description**
|
||||||
@@ -419,7 +462,16 @@ interval. Document the chosen behaviour in the method's XML comment.
|
|||||||
|
|
||||||
**Resolution**
|
**Resolution**
|
||||||
|
|
||||||
_Unresolved._
|
Resolved 2026-05-16 (commit pending). Confirmed against the source:
|
||||||
|
`RetryParkedMessageAsync` reset `status`, `retry_count` and `last_error` but left `last_attempt_at` stale,
|
||||||
|
so the operator-retried message's retry timing depended on the elapsed time since the
|
||||||
|
original pre-park attempt. Encoded the intent explicitly — an operator retry means
|
||||||
|
"attempt this again now" — by also setting `last_attempt_at = NULL` in the UPDATE, so
|
||||||
|
the re-queued message is unambiguously due on the next sweep regardless of the
|
||||||
|
configured `retry_interval_ms`. The method's XML comment now documents this. Regression
|
||||||
|
test `RetryParkedMessageAsync_ClearsLastAttemptAt_SoMessageIsImmediatelyDue` uses a
|
||||||
|
1-hour retry interval and a recent `last_attempt_at`; it failed pre-fix (timestamp not
|
||||||
|
cleared, message excluded from the retry-due set) and passes post-fix.
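
Why clearing `last_attempt_at` makes the message immediately due can be seen in a small predicate. This is a sketch under assumptions: the real check is the `julianday` SQL in `GetMessagesForRetryAsync`, which is not shown here, so `RetryDueSketch.IsDue` is a hypothetical C# restatement and the `>=` boundary is a guess.

```csharp
using System;

public static class RetryDueSketch
{
    // Hypothetical restatement of the retry-due rule: a message that has never
    // been attempted (null timestamp) is always due; otherwise it is due once
    // the retry interval has elapsed since the last attempt.
    public static bool IsDue(DateTimeOffset? lastAttemptAt, long retryIntervalMs, DateTimeOffset now)
    {
        if (lastAttemptAt is null)
        {
            return true;
        }

        // Assumption: the boundary is inclusive; the real SQL may differ.
        return (now - lastAttemptAt.Value).TotalMilliseconds >= retryIntervalMs;
    }
}
```

With a 1-hour interval, a message attempted a minute ago is not due, while the same message with its timestamp reset to `null` is due on the next sweep.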
|
||||||
|
|
||||||
### StoreAndForward-011 — `StoreAndForwardMessageStatus.InFlight` is unused and the doc's "retrying" status is unmodelled
|
### StoreAndForward-011 — `StoreAndForwardMessageStatus.InFlight` is unused and the doc's "retrying" status is unmodelled
|
||||||
|
|
||||||
@@ -488,7 +540,7 @@ _Unresolved._
|
|||||||
|--|--|
|
|--|--|
|
||||||
| Severity | Medium |
|
| Severity | Medium |
|
||||||
| Category | Testing coverage |
|
| Category | Testing coverage |
|
||||||
| Status | Open |
|
| Status | Resolved |
|
||||||
| Location | `tests/ScadaLink.StoreAndForward.Tests/` (whole directory); `src/ScadaLink.StoreAndForward/StoreAndForwardStorage.cs:101`; `src/ScadaLink.StoreAndForward/ParkedMessageHandlerActor.cs` |
|
| Location | `tests/ScadaLink.StoreAndForward.Tests/` (whole directory); `src/ScadaLink.StoreAndForward/StoreAndForwardStorage.cs:101`; `src/ScadaLink.StoreAndForward/ParkedMessageHandlerActor.cs` |
|
||||||
|
|
||||||
**Description**
|
**Description**
|
||||||
@@ -514,7 +566,20 @@ invalid-JSON payloads.
|
|||||||
|
|
||||||
**Resolution**
|
**Resolution**
|
||||||
|
|
||||||
_Unresolved._
|
Resolved 2026-05-16 (commit pending). All three gaps closed.
|
||||||
|
(1) Retry-due timing:
|
||||||
|
`GetMessagesForRetryAsync_NonZeroInterval_ExcludesNotYetDueIncludesDue` exercises the `julianday` elapsed-time SQL with non-zero intervals: a
|
||||||
|
just-attempted message (1-hour interval) is excluded, an old one (attempted 2h ago,
|
||||||
|
5-min interval) and a never-attempted one are included.
|
||||||
|
(2) Active-side replication: this sub-claim was **already stale** at the reviewed code —
|
||||||
|
`StoreAndForwardReplicationTests` (added with finding 001's fix) asserts the Add, Remove
|
||||||
|
and Park replication operations; no new test was needed.
|
||||||
|
(3) `ParkedMessageHandlerActor`: new `ParkedMessageHandlerActorTests` using
|
||||||
|
`Akka.TestKit.Xunit2` (package reference added to the test project) covers
|
||||||
|
Query/Retry/Discard request-to-response mapping, correlation-ID propagation, the unknown-message
|
||||||
|
failure responses, and `ExtractMethodName` for `MethodName`, `Subject`, missing-property
|
||||||
|
and malformed-JSON payloads (all falling back to the category name without throwing).
|
||||||
|
These are coverage-gap tests over already-correct code, so they pass on first run.
|
||||||
|
|
||||||
### StoreAndForward-014 — Storage does not create its SQLite database directory
|
### StoreAndForward-014 — Storage does not create its SQLite database directory
|
||||||
|
|
||||||
|
|||||||
@@ -36,8 +36,20 @@ public class StoreAndForwardService
|
|||||||
private int _retryInProgress;
|
private int _retryInProgress;
|
||||||
|
|
||||||
/// <summary>
|
/// <summary>
|
||||||
/// WP-10: Delivery handler delegate. Returns true on success, throws on transient failure.
|
/// WP-10: Delivery handler delegate. The return value / exception is interpreted
|
||||||
/// Permanent failures should return false (message will NOT be buffered).
|
/// the same way on both the immediate-delivery path (<see cref="EnqueueAsync"/>)
|
||||||
|
/// and the background retry path (<c>RetryMessageAsync</c>):
|
||||||
|
/// <list type="bullet">
|
||||||
|
/// <item><description><c>true</c> — delivered successfully. The message is
|
||||||
|
/// removed from the buffer (or, on the immediate path, never buffered).</description></item>
|
||||||
|
/// <item><description><c>false</c> — permanent failure. On the immediate path
|
||||||
|
/// the message is NOT buffered; on a retry the message is already buffered and
|
||||||
|
/// is parked immediately (no further retries).</description></item>
|
||||||
|
/// <item><description>throws — transient failure. On the immediate path the
|
||||||
|
/// message is buffered for retry; on a retry the retry count is incremented and
|
||||||
|
/// the message is parked once <see cref="StoreAndForwardMessage.MaxRetries"/> is
|
||||||
|
/// reached.</description></item>
|
||||||
|
/// </list>
|
||||||
/// </summary>
|
/// </summary>
|
||||||
private readonly Dictionary<StoreAndForwardCategory, Func<StoreAndForwardMessage, Task<bool>>> _deliveryHandlers = new();
|
private readonly Dictionary<StoreAndForwardCategory, Func<StoreAndForwardMessage, Task<bool>>> _deliveryHandlers = new();
|
||||||
|
|
||||||
@@ -59,7 +71,9 @@ public class StoreAndForwardService
|
|||||||
}
|
}
|
||||||
|
|
||||||
/// <summary>
|
/// <summary>
|
||||||
/// Registers a delivery handler for a given message category.
|
/// Registers a delivery handler for a given message category. See the
|
||||||
|
/// <c>_deliveryHandlers</c> field documentation for the true/false/throws contract,
|
||||||
|
/// which applies identically on the immediate and retry paths.
|
||||||
/// </summary>
|
/// </summary>
|
||||||
public void RegisterDeliveryHandler(
|
public void RegisterDeliveryHandler(
|
||||||
StoreAndForwardCategory category,
|
StoreAndForwardCategory category,
|
||||||
@@ -241,11 +255,22 @@ public class StoreAndForwardService
|
|||||||
return;
|
return;
|
||||||
}
|
}
|
||||||
|
|
||||||
// Permanent failure on retry — park immediately
|
// Permanent failure on retry — park immediately.
|
||||||
|
// StoreAndForward-005: the sweep observed this row as Pending; only commit
|
||||||
|
// the park if it is still Pending so a concurrent operator action that
|
||||||
|
// moved it (retry/discard) is not silently overwritten.
|
||||||
message.Status = StoreAndForwardMessageStatus.Parked;
|
message.Status = StoreAndForwardMessageStatus.Parked;
|
||||||
message.LastAttemptAt = DateTimeOffset.UtcNow;
|
message.LastAttemptAt = DateTimeOffset.UtcNow;
|
||||||
message.LastError = "Permanent failure (handler returned false)";
|
message.LastError = "Permanent failure (handler returned false)";
|
||||||
await _storage.UpdateMessageAsync(message);
|
var parked = await _storage.UpdateMessageIfStatusAsync(
|
||||||
|
message, StoreAndForwardMessageStatus.Pending);
|
||||||
|
if (!parked)
|
||||||
|
{
|
||||||
|
_logger.LogDebug(
|
||||||
|
"Message {MessageId} changed status during delivery; sweep park skipped",
|
||||||
|
message.Id);
|
||||||
|
return;
|
||||||
|
}
|
||||||
_replication?.ReplicatePark(message);
|
_replication?.ReplicatePark(message);
|
||||||
RaiseActivity("Parked", message.Category,
|
RaiseActivity("Parked", message.Category,
|
||||||
$"Permanent failure for {message.Target}: handler returned false");
|
$"Permanent failure for {message.Target}: handler returned false");
|
||||||
@@ -259,8 +284,18 @@ public class StoreAndForwardService
|
|||||||
|
|
||||||
if (message.MaxRetries > 0 && message.RetryCount >= message.MaxRetries)
|
if (message.MaxRetries > 0 && message.RetryCount >= message.MaxRetries)
|
||||||
{
|
{
|
||||||
|
// StoreAndForward-005: conditional park — see the permanent-failure
|
||||||
|
// branch above for rationale.
|
||||||
message.Status = StoreAndForwardMessageStatus.Parked;
|
message.Status = StoreAndForwardMessageStatus.Parked;
|
||||||
await _storage.UpdateMessageAsync(message);
|
var parked = await _storage.UpdateMessageIfStatusAsync(
|
||||||
|
message, StoreAndForwardMessageStatus.Pending);
|
||||||
|
if (!parked)
|
||||||
|
{
|
||||||
|
_logger.LogDebug(
|
||||||
|
"Message {MessageId} changed status during delivery; sweep park skipped",
|
||||||
|
message.Id);
|
||||||
|
return;
|
||||||
|
}
|
||||||
_replication?.ReplicatePark(message);
|
_replication?.ReplicatePark(message);
|
||||||
RaiseActivity("Parked", message.Category,
|
RaiseActivity("Parked", message.Category,
|
||||||
$"Max retries ({message.MaxRetries}) reached for {message.Target}");
|
$"Max retries ({message.MaxRetries}) reached for {message.Target}");
|
||||||
@@ -270,7 +305,17 @@ public class StoreAndForwardService
|
|||||||
}
|
}
|
||||||
else
|
else
|
||||||
{
|
{
|
||||||
await _storage.UpdateMessageAsync(message);
|
// StoreAndForward-005: the retry-count increment is also conditional
|
||||||
|
// on the row still being Pending so it cannot clobber an operator
|
||||||
|
// action that ran during the failed delivery.
|
||||||
|
if (!await _storage.UpdateMessageIfStatusAsync(
|
||||||
|
message, StoreAndForwardMessageStatus.Pending))
|
||||||
|
{
|
||||||
|
_logger.LogDebug(
|
||||||
|
"Message {MessageId} changed status during delivery; sweep retry-count update skipped",
|
||||||
|
message.Id);
|
||||||
|
return;
|
||||||
|
}
|
||||||
RaiseActivity("Retried", message.Category,
|
RaiseActivity("Retried", message.Category,
|
||||||
$"Retry {message.RetryCount}/{message.MaxRetries} for {message.Target}: {ex.Message}");
|
$"Retry {message.RetryCount}/{message.MaxRetries} for {message.Target}: {ex.Message}");
|
||||||
}
|
}
|
||||||
|
|||||||
@@ -164,6 +164,47 @@ public class StoreAndForwardStorage
|
|||||||
await cmd.ExecuteNonQueryAsync();
|
await cmd.ExecuteNonQueryAsync();
|
||||||
}
|
}
|
||||||
|
|
||||||
|
/// <summary>
|
||||||
|
/// WP-10: Updates a message after a delivery attempt, but only if the row is still
|
||||||
|
/// in the expected status. Returns true if the row was updated, false if it had
|
||||||
|
/// already been changed (e.g. an operator retried or discarded the message) and so
|
||||||
|
/// was skipped.
|
||||||
|
///
|
||||||
|
/// StoreAndForward-005: the retry sweep uses this for its state-changing writes so
|
||||||
|
/// it cannot clobber a concurrent operator action (RetryParkedMessageAsync /
|
||||||
|
/// DiscardParkedMessageAsync). Those operator operations are themselves SQL-
|
||||||
|
/// conditional on <c>status = Parked</c>; making the sweep's writes conditional on
|
||||||
|
/// the status the sweep observed closes the sweep-vs-management race rather than
|
||||||
|
/// relying only on the in-process overlapping-sweep guard.
|
||||||
|
/// </summary>
|
||||||
|
public async Task<bool> UpdateMessageIfStatusAsync(
|
||||||
|
StoreAndForwardMessage message,
|
||||||
|
StoreAndForwardMessageStatus expectedStatus)
|
||||||
|
{
|
||||||
|
await using var connection = new SqliteConnection(_connectionString);
|
||||||
|
await connection.OpenAsync();
|
||||||
|
|
||||||
|
await using var cmd = connection.CreateCommand();
|
||||||
|
cmd.CommandText = @"
|
||||||
|
UPDATE sf_messages
|
||||||
|
SET retry_count = @retryCount,
|
||||||
|
last_attempt_at = @lastAttempt,
|
||||||
|
status = @status,
|
||||||
|
last_error = @lastError
|
||||||
|
WHERE id = @id AND status = @expectedStatus";
|
||||||
|
|
||||||
|
cmd.Parameters.AddWithValue("@id", message.Id);
|
||||||
|
cmd.Parameters.AddWithValue("@retryCount", message.RetryCount);
|
||||||
|
cmd.Parameters.AddWithValue("@lastAttempt", message.LastAttemptAt.HasValue
|
||||||
|
? message.LastAttemptAt.Value.ToString("O") : DBNull.Value);
|
||||||
|
cmd.Parameters.AddWithValue("@status", (int)message.Status);
|
||||||
|
cmd.Parameters.AddWithValue("@lastError", (object?)message.LastError ?? DBNull.Value);
|
||||||
|
cmd.Parameters.AddWithValue("@expectedStatus", (int)expectedStatus);
|
||||||
|
|
||||||
|
var rows = await cmd.ExecuteNonQueryAsync();
|
||||||
|
return rows > 0;
|
||||||
|
}
|
||||||
|
|
||||||
/// <summary>
|
/// <summary>
|
||||||
/// WP-10: Removes a successfully delivered message.
|
/// WP-10: Removes a successfully delivered message.
|
||||||
/// </summary>
|
/// </summary>
|
||||||
@@ -221,6 +262,13 @@ public class StoreAndForwardStorage
|
|||||||
|
|
||||||
/// <summary>
|
/// <summary>
|
||||||
/// WP-12: Moves a parked message back to pending for retry.
|
/// WP-12: Moves a parked message back to pending for retry.
|
||||||
|
///
|
||||||
|
/// StoreAndForward-010: <c>last_attempt_at</c> is reset to NULL so the re-queued
|
||||||
|
/// message is unambiguously due on the next retry sweep. An operator-initiated
|
||||||
|
/// retry means "attempt this again now"; leaving the stale parked timestamp in
|
||||||
|
/// place would make the message's retry timing depend on the configured retry
|
||||||
|
/// interval relative to the original (pre-park) attempt — "try immediately" only
|
||||||
|
/// by accident, and a long interval would instead delay the operator's retry.
|
||||||
/// </summary>
|
/// </summary>
|
||||||
public async Task<bool> RetryParkedMessageAsync(string messageId)
|
public async Task<bool> RetryParkedMessageAsync(string messageId)
|
||||||
{
|
{
|
||||||
@@ -230,7 +278,7 @@ public class StoreAndForwardStorage
|
|||||||
await using var cmd = connection.CreateCommand();
|
await using var cmd = connection.CreateCommand();
|
||||||
cmd.CommandText = @"
|
cmd.CommandText = @"
|
||||||
UPDATE sf_messages
|
UPDATE sf_messages
|
||||||
SET status = @pending, retry_count = 0, last_error = NULL
|
SET status = @pending, retry_count = 0, last_error = NULL, last_attempt_at = NULL
|
||||||
WHERE id = @id AND status = @parked";
|
WHERE id = @id AND status = @parked";
|
||||||
|
|
||||||
cmd.Parameters.AddWithValue("@id", messageId);
|
cmd.Parameters.AddWithValue("@id", messageId);
|
||||||
|
|||||||
@@ -0,0 +1,173 @@
|
|||||||
|
using Akka.Actor;
|
||||||
|
using Akka.TestKit.Xunit2;
|
||||||
|
using Microsoft.Data.Sqlite;
|
||||||
|
using Microsoft.Extensions.Logging.Abstractions;
|
||||||
|
using ScadaLink.Commons.Messages.RemoteQuery;
|
||||||
|
using ScadaLink.Commons.Types.Enums;
|
||||||
|
|
||||||
|
namespace ScadaLink.StoreAndForward.Tests;
|
||||||
|
|
||||||
|
/// <summary>
|
||||||
|
/// StoreAndForward-013: tests for the <see cref="ParkedMessageHandlerActor"/> actor
|
||||||
|
/// bridge — the Query/Retry/Discard request-to-response mapping and the
|
||||||
|
/// <c>ExtractMethodName</c> payload-JSON parsing (including the malformed-JSON branch).
|
||||||
|
/// </summary>
|
||||||
|
public class ParkedMessageHandlerActorTests : TestKit, IAsyncLifetime, IDisposable
|
||||||
|
{
|
||||||
|
private readonly SqliteConnection _keepAlive;
|
||||||
|
private readonly StoreAndForwardStorage _storage;
|
||||||
|
private readonly StoreAndForwardService _service;
|
||||||
|
|
||||||
|
public ParkedMessageHandlerActorTests()
|
||||||
|
{
|
||||||
|
var connStr = $"Data Source=ActorTests_{Guid.NewGuid():N};Mode=Memory;Cache=Shared";
|
||||||
|
_keepAlive = new SqliteConnection(connStr);
|
||||||
|
_keepAlive.Open();
|
||||||
|
|
||||||
|
_storage = new StoreAndForwardStorage(connStr, NullLogger<StoreAndForwardStorage>.Instance);
|
||||||
|
|
||||||
|
var options = new StoreAndForwardOptions
|
||||||
|
{
|
||||||
|
DefaultRetryInterval = TimeSpan.Zero,
|
||||||
|
DefaultMaxRetries = 1,
|
||||||
|
RetryTimerInterval = TimeSpan.FromMinutes(10),
|
||||||
|
ReplicationEnabled = false,
|
||||||
|
};
|
||||||
|
|
||||||
|
_service = new StoreAndForwardService(
|
||||||
|
_storage, options, NullLogger<StoreAndForwardService>.Instance);
|
||||||
|
}
|
||||||
|
|
||||||
|
public async Task InitializeAsync() => await _storage.InitializeAsync();
|
||||||
|
|
||||||
|
public Task DisposeAsync() => Task.CompletedTask;
|
||||||
|
|
||||||
|
protected override void Dispose(bool disposing)
|
||||||
|
{
|
||||||
|
if (disposing) _keepAlive.Dispose();
|
||||||
|
base.Dispose(disposing);
|
||||||
|
}
|
||||||
|
|
||||||
|
/// <summary>Enqueues a message and parks it via the retry sweep (MaxRetries = 1).</summary>
|
||||||
|
private async Task<string> ParkMessageAsync(StoreAndForwardCategory category, string payloadJson)
|
||||||
|
{
|
||||||
|
_service.RegisterDeliveryHandler(category, _ => throw new HttpRequestException("always fails"));
|
||||||
|
var result = await _service.EnqueueAsync(category, "target", payloadJson, maxRetries: 1);
|
||||||
|
await _service.RetryPendingMessagesAsync();
|
||||||
|
return result.MessageId;
|
||||||
|
}
|
||||||
|
|
||||||
|
[Fact]
|
||||||
|
public async Task Query_ReturnsParkedEntries_WithExtractedMethodName()
|
||||||
|
{
|
||||||
|
await ParkMessageAsync(StoreAndForwardCategory.ExternalSystem,
|
||||||
|
"""{"MethodName":"StartPump","Args":{}}""");
|
||||||
|
|
||||||
|
var actor = Sys.ActorOf(Props.Create(() => new ParkedMessageHandlerActor(_service, "site-1")));
|
||||||
|
actor.Tell(new ParkedMessageQueryRequest("corr-1", "site-1", 1, 50, DateTimeOffset.UtcNow));
|
||||||
|
|
||||||
|
var response = ExpectMsg<ParkedMessageQueryResponse>();
|
||||||
|
Assert.True(response.Success);
|
||||||
|
Assert.Equal("corr-1", response.CorrelationId);
|
||||||
|
Assert.Equal("site-1", response.SiteId);
|
||||||
|
Assert.Single(response.Messages);
|
||||||
|
Assert.Equal("StartPump", response.Messages[0].MethodName);
|
||||||
|
Assert.Equal(StoreAndForwardCategory.ExternalSystem, response.Messages[0].Category);
|
||||||
|
}
|
||||||
|
|
||||||
|
[Fact]
|
||||||
|
public async Task Query_NotificationPayload_UsesSubjectAsMethodName()
|
||||||
|
{
|
||||||
|
await ParkMessageAsync(StoreAndForwardCategory.Notification,
|
||||||
|
"""{"Subject":"Tank overflow","Body":"..."}""");
|
||||||
|
|
||||||
|
var actor = Sys.ActorOf(Props.Create(() => new ParkedMessageHandlerActor(_service, "site-1")));
|
||||||
|
actor.Tell(new ParkedMessageQueryRequest("corr-2", "site-1", 1, 50, DateTimeOffset.UtcNow));
|
||||||
|
|
||||||
|
var response = ExpectMsg<ParkedMessageQueryResponse>();
|
||||||
|
Assert.True(response.Success);
|
||||||
|
Assert.Equal("Tank overflow", response.Messages[0].MethodName);
|
||||||
|
}
|
||||||
|
|
||||||
|
[Fact]
|
||||||
|
public async Task Query_MalformedJsonPayload_FallsBackToCategoryName()
|
||||||
|
{
|
||||||
|
await ParkMessageAsync(StoreAndForwardCategory.CachedDbWrite, "not-valid-json{");
|
||||||
|
|
||||||
|
var actor = Sys.ActorOf(Props.Create(() => new ParkedMessageHandlerActor(_service, "site-1")));
|
||||||
|
actor.Tell(new ParkedMessageQueryRequest("corr-3", "site-1", 1, 50, DateTimeOffset.UtcNow));
|
||||||
|
|
||||||
|
var response = ExpectMsg<ParkedMessageQueryResponse>();
|
||||||
|
Assert.True(response.Success);
|
||||||
|
// Malformed JSON must not throw — ExtractMethodName falls back to the category.
|
||||||
|
Assert.Equal(StoreAndForwardCategory.CachedDbWrite.ToString(), response.Messages[0].MethodName);
|
||||||
|
}
|
||||||
|
|
||||||
|
[Fact]
|
||||||
|
public async Task Query_PayloadWithoutMethodNameOrSubject_FallsBackToCategoryName()
|
||||||
|
{
|
||||||
|
await ParkMessageAsync(StoreAndForwardCategory.ExternalSystem, """{"Unrelated":"value"}""");
|
||||||
|
|
||||||
|
var actor = Sys.ActorOf(Props.Create(() => new ParkedMessageHandlerActor(_service, "site-1")));
|
||||||
|
actor.Tell(new ParkedMessageQueryRequest("corr-4", "site-1", 1, 50, DateTimeOffset.UtcNow));
|
||||||
|
|
||||||
|
var response = ExpectMsg<ParkedMessageQueryResponse>();
|
||||||
|
Assert.Equal(StoreAndForwardCategory.ExternalSystem.ToString(), response.Messages[0].MethodName);
|
||||||
|
}
|
||||||
|
|
||||||
|
[Fact]
|
||||||
|
public async Task Retry_ParkedMessage_ReturnsSuccess()
|
||||||
|
{
|
||||||
|
var messageId = await ParkMessageAsync(StoreAndForwardCategory.ExternalSystem, """{}""");
|
||||||
|
|
||||||
|
var actor = Sys.ActorOf(Props.Create(() => new ParkedMessageHandlerActor(_service, "site-1")));
|
||||||
|
actor.Tell(new ParkedMessageRetryRequest("corr-5", "site-1", messageId, DateTimeOffset.UtcNow));
|
||||||
|
|
||||||
|
var response = ExpectMsg<ParkedMessageRetryResponse>();
|
||||||
|
Assert.True(response.Success);
|
||||||
|
Assert.Equal("corr-5", response.CorrelationId);
|
||||||
|
Assert.Null(response.ErrorMessage);
|
||||||
|
|
||||||
|
var msg = await _storage.GetMessageByIdAsync(messageId);
|
||||||
|
Assert.Equal(StoreAndForwardMessageStatus.Pending, msg!.Status);
|
||||||
|
}
|
||||||
|
|
||||||
|
[Fact]
|
||||||
|
public void Retry_UnknownMessage_ReturnsFailureWithMessage()
|
||||||
|
{
|
||||||
|
var actor = Sys.ActorOf(Props.Create(() => new ParkedMessageHandlerActor(_service, "site-1")));
|
||||||
|
actor.Tell(new ParkedMessageRetryRequest("corr-6", "site-1", "does-not-exist", DateTimeOffset.UtcNow));
|
||||||
|
|
||||||
|
var response = ExpectMsg<ParkedMessageRetryResponse>();
|
||||||
|
Assert.False(response.Success);
|
||||||
|
Assert.Equal("corr-6", response.CorrelationId);
|
||||||
|
Assert.NotNull(response.ErrorMessage);
|
||||||
|
}
|
||||||
|
|
||||||
|
[Fact]
|
||||||
|
public async Task Discard_ParkedMessage_ReturnsSuccessAndRemovesMessage()
|
||||||
|
{
|
||||||
|
var messageId = await ParkMessageAsync(StoreAndForwardCategory.ExternalSystem, """{}""");
|
||||||
|
|
||||||
|
var actor = Sys.ActorOf(Props.Create(() => new ParkedMessageHandlerActor(_service, "site-1")));
|
||||||
|
actor.Tell(new ParkedMessageDiscardRequest("corr-7", "site-1", messageId, DateTimeOffset.UtcNow));
|
||||||
|
|
||||||
|
var response = ExpectMsg<ParkedMessageDiscardResponse>();
|
||||||
|
Assert.True(response.Success);
|
||||||
|
Assert.Equal("corr-7", response.CorrelationId);
|
||||||
|
|
||||||
|
var msg = await _storage.GetMessageByIdAsync(messageId);
|
||||||
|
Assert.Null(msg);
|
||||||
|
}
|
||||||
|
|
||||||
|
[Fact]
|
||||||
|
public void Discard_UnknownMessage_ReturnsFailureWithMessage()
|
||||||
|
{
|
||||||
|
var actor = Sys.ActorOf(Props.Create(() => new ParkedMessageHandlerActor(_service, "site-1")));
|
||||||
|
actor.Tell(new ParkedMessageDiscardRequest("corr-8", "site-1", "does-not-exist", DateTimeOffset.UtcNow));
|
||||||
|
|
||||||
|
var response = ExpectMsg<ParkedMessageDiscardResponse>();
|
||||||
|
Assert.False(response.Success);
|
||||||
|
Assert.NotNull(response.ErrorMessage);
|
||||||
|
}
|
||||||
|
}
|
||||||
@@ -9,6 +9,7 @@
|
|||||||
</PropertyGroup>
|
</PropertyGroup>
|
||||||
|
|
||||||
<ItemGroup>
|
<ItemGroup>
|
||||||
|
<PackageReference Include="Akka.TestKit.Xunit2" />
|
||||||
<PackageReference Include="coverlet.collector" />
|
<PackageReference Include="coverlet.collector" />
|
||||||
<PackageReference Include="Microsoft.Data.Sqlite" />
|
<PackageReference Include="Microsoft.Data.Sqlite" />
|
||||||
<PackageReference Include="Microsoft.NET.Test.Sdk" />
|
<PackageReference Include="Microsoft.NET.Test.Sdk" />
|
||||||
|
|||||||
@@ -177,6 +177,49 @@ public class StoreAndForwardServiceTests : IAsyncLifetime, IDisposable
         Assert.Equal(1, msg.RetryCount); // one sweep retry recorded
     }
 
+    // ── StoreAndForward-005: sweep-vs-management race hardening ──
+
+    [Fact]
+    public async Task RetryMessageAsync_StatusChangedDuringDelivery_SweepParkWriteIsSkipped()
+    {
+        // StoreAndForward-005: the retry sweep's state-changing writes must be
+        // conditional on the status it observed, so a concurrent operator action that
+        // moved the row out of Pending (e.g. between the sweep's snapshot load and its
+        // park write) is not silently overwritten by the sweep's stale view.
+        var result = await _service.EnqueueAsync(
+            StoreAndForwardCategory.ExternalSystem, "api", """{}""",
+            attemptImmediateDelivery: false, maxRetries: 1);
+
+        _service.RegisterDeliveryHandler(StoreAndForwardCategory.ExternalSystem,
+            async msg =>
+            {
+                // Simulate an operator action winning the race: the row leaves Pending
+                // (here: parked) while the sweep is still mid-delivery. The sweep would
+                // otherwise unconditionally re-write this row from its stale snapshot.
+                var parkedOutFromUnderTheSweep = new StoreAndForwardMessage
+                {
+                    Id = msg.Id, Category = msg.Category, Target = msg.Target,
+                    PayloadJson = msg.PayloadJson, RetryCount = 7,
+                    MaxRetries = msg.MaxRetries, RetryIntervalMs = msg.RetryIntervalMs,
+                    CreatedAt = msg.CreatedAt, LastAttemptAt = DateTimeOffset.UtcNow,
+                    Status = StoreAndForwardMessageStatus.Parked,
+                    LastError = "operator/other writer"
+                };
+                await _storage.UpdateMessageAsync(parkedOutFromUnderTheSweep);
+                throw new HttpRequestException("transient — sweep will try to park");
+            });
+
+        await _service.RetryPendingMessagesAsync();
+
+        // The sweep observed Pending; the row is now Parked with the other writer's
+        // RetryCount (7), not the sweep's (1). The sweep's conditional write was skipped.
+        var msg = await _storage.GetMessageByIdAsync(result.MessageId);
+        Assert.NotNull(msg);
+        Assert.Equal(StoreAndForwardMessageStatus.Parked, msg!.Status);
+        Assert.Equal(7, msg.RetryCount);
+        Assert.Equal("operator/other writer", msg.LastError);
+    }
+
     [Fact]
     public async Task RetryPendingMessagesAsync_PermanentFailureOnRetry_ParksMessage()
     {
@@ -106,6 +106,35 @@ public class StoreAndForwardStorageTests : IAsyncLifetime, IDisposable
         Assert.All(forRetry, m => Assert.Equal(StoreAndForwardMessageStatus.Pending, m.Status));
     }
 
+    [Fact]
+    public async Task GetMessagesForRetryAsync_NonZeroInterval_ExcludesNotYetDueIncludesDue()
+    {
+        // StoreAndForward-013: exercise the julianday elapsed-time comparison with a
+        // non-zero retry interval. A message attempted just now must NOT be due; one
+        // attempted long ago must be due.
+        var notDue = CreateMessage("notdue", StoreAndForwardCategory.ExternalSystem);
+        notDue.RetryIntervalMs = (long)TimeSpan.FromHours(1).TotalMilliseconds;
+        notDue.LastAttemptAt = DateTimeOffset.UtcNow;
+        await _storage.EnqueueAsync(notDue);
+
+        var due = CreateMessage("due", StoreAndForwardCategory.ExternalSystem);
+        due.RetryIntervalMs = (long)TimeSpan.FromMinutes(5).TotalMilliseconds;
+        due.LastAttemptAt = DateTimeOffset.UtcNow.AddHours(-2);
+        await _storage.EnqueueAsync(due);
+
+        var neverAttempted = CreateMessage("never", StoreAndForwardCategory.ExternalSystem);
+        neverAttempted.RetryIntervalMs = (long)TimeSpan.FromHours(1).TotalMilliseconds;
+        neverAttempted.LastAttemptAt = null;
+        await _storage.EnqueueAsync(neverAttempted);
+
+        var forRetry = await _storage.GetMessagesForRetryAsync();
+        var ids = forRetry.Select(m => m.Id).ToHashSet();
+
+        Assert.DoesNotContain("notdue", ids);
+        Assert.Contains("due", ids);
+        Assert.Contains("never", ids);
+    }
+
     [Fact]
     public async Task GetParkedMessagesAsync_ReturnsParkedOnly()
     {
@@ -136,6 +165,31 @@ public class StoreAndForwardStorageTests : IAsyncLifetime, IDisposable
         Assert.Equal(0, retrieved.RetryCount);
     }
 
+    [Fact]
+    public async Task RetryParkedMessageAsync_ClearsLastAttemptAt_SoMessageIsImmediatelyDue()
+    {
+        // StoreAndForward-010: a re-queued parked message must be unambiguously due
+        // for the next sweep regardless of its (stale) last_attempt_at. Use a large
+        // retry interval so a leftover timestamp would otherwise exclude the message.
+        var msg = CreateMessage("requeue1", StoreAndForwardCategory.ExternalSystem);
+        msg.RetryIntervalMs = (long)TimeSpan.FromHours(1).TotalMilliseconds;
+        msg.LastAttemptAt = DateTimeOffset.UtcNow; // recent attempt
+        msg.Status = StoreAndForwardMessageStatus.Parked;
+        await _storage.EnqueueAsync(msg);
+        await _storage.UpdateMessageAsync(msg);
+
+        var requeued = await _storage.RetryParkedMessageAsync("requeue1");
+        Assert.True(requeued);
+
+        var retrieved = await _storage.GetMessageByIdAsync("requeue1");
+        Assert.Null(retrieved!.LastAttemptAt);
+
+        // It must appear in the retry-due set even though the configured interval
+        // (1 hour) has not elapsed since the original attempt.
+        var due = await _storage.GetMessagesForRetryAsync();
+        Assert.Contains(due, m => m.Id == "requeue1");
+    }
+
     [Fact]
     public async Task DiscardParkedMessageAsync_RemovesMessage()
     {