fix(store-and-forward): resolve StoreAndForward-004,005,010,013 — accurate handler-contract doc, conditional sweep writes, reset LastAttemptAt on parked retry, test coverage
This commit is contained in:
@@ -8,7 +8,7 @@
|
||||
| Last reviewed | 2026-05-16 |
|
||||
| Reviewer | claude-agent |
|
||||
| Commit reviewed | `9c60592` |
|
||||
| Open findings | 11 |
|
||||
| Open findings | 7 |
|
||||
|
||||
## Summary
|
||||
|
||||
@@ -212,7 +212,7 @@ corrected semantics.
|
||||
|--|--|
|
||||
| Severity | Medium |
|
||||
| Category | Documentation & comments |
|
||||
| Status | Open |
|
||||
| Status | Resolved |
|
||||
| Location | `src/ScadaLink.StoreAndForward/StoreAndForwardService.cs:38`, `:60` |
|
||||
|
||||
**Description**
|
||||
@@ -233,15 +233,25 @@ retry); exception = transient failure (buffer / increment retry).
|
||||
|
||||
**Resolution**
|
||||
|
||||
_Unresolved._
|
||||
Resolved 2026-05-16 (commit pending). Confirmed the root cause against the source: the
|
||||
old XML comment's "Permanent failures should return false (message will NOT be
|
||||
buffered)" describes only the `EnqueueAsync` immediate path; on the retry path a
|
||||
`false` return parks the already-buffered message. Reworded the `_deliveryHandlers`
|
||||
field doc into an explicit three-way contract (`true` = delivered/removed-or-never-
|
||||
buffered; `false` = permanent failure = not-buffered-on-immediate / parked-on-retry;
|
||||
throws = transient = buffered-for-retry / retry-count-incremented), noting it applies
|
||||
identically on both paths, and pointed `RegisterDeliveryHandler`'s summary at it.
|
||||
Documentation-only change — no behavioural code touched, so no regression test
|
||||
(an XML comment is not test-observable).
|
||||
|
||||
### StoreAndForward-005 — Parked-message retry/discard can race with the in-progress retry sweep
|
||||
|
||||
| | |
|
||||
|--|--|
|
||||
| Severity | Medium |
|
||||
| Severity | Low — re-triaged down from Medium on 2026-05-16; the described data-loss race is not actually reachable, see Re-triage note |
|
||||
| Original severity | Medium |
|
||||
| Category | Concurrency & thread safety |
|
||||
| Status | Open |
|
||||
| Status | Resolved |
|
||||
| Location | `src/ScadaLink.StoreAndForward/StoreAndForwardService.cs:184`, `:266`, `:280` |
|
||||
|
||||
**Description**
|
||||
@@ -266,9 +276,42 @@ so the actor mailbox serialises them, or make status transitions conditional in
|
||||
(e.g. `UPDATE ... WHERE id = @id AND status = @expected`) and re-check the affected
|
||||
row count.
|
||||
|
||||
**Re-triage note (2026-05-16)**
|
||||
|
||||
Verified against the source: the *specific* race the finding describes — *"an operator
|
||||
`RetryParkedMessageAsync` resets a row to Pending while the sweep simultaneously parks
|
||||
the same in-flight message"* — **cannot occur** at the reviewed code. The two operator
|
||||
operations (`StoreAndForwardStorage.RetryParkedMessageAsync` /
|
||||
`DiscardParkedMessageAsync`) are already SQL-conditional on `status = Parked`, and the
|
||||
retry sweep only ever loads and processes rows that are `Pending`. For the operator and
|
||||
the sweep to touch the *same* message the row would have to be simultaneously `Parked`
|
||||
(operator-actionable) and in the sweep's in-flight `Pending` snapshot — mutually
|
||||
exclusive states. Overlapping sweeps are additionally prevented by the `Interlocked`
|
||||
guard, so there is only ever one sweep writer. The described data-loss outcome is
|
||||
therefore not reachable, which makes the original Medium ("risky behaviour") severity
|
||||
an over-statement — **re-triaged to Low**.
|
||||
|
||||
What *is* real is a latent fragility: the sweep's own state-changing writes used the
|
||||
unconditional `UpdateMessageAsync` (`WHERE id = @id`), so the no-clobber guarantee
|
||||
rested entirely on the `Interlocked` invariant with zero defence-in-depth if that
|
||||
invariant were ever broken (e.g. `RetryMessageAsync` called outside the sweep, or the
|
||||
guard removed). That residual Low-severity weakness is worth closing, so the
|
||||
recommendation's conditional-SQL option was applied.
|
||||
|
||||
**Resolution**
|
||||
|
||||
_Unresolved._
|
||||
Resolved 2026-05-16 (commit pending). Re-triaged Medium → Low (the described race is
|
||||
unreachable; see Re-triage note) and the residual latent fragility fixed: added
|
||||
`StoreAndForwardStorage.UpdateMessageIfStatusAsync`, a conditional update
|
||||
(`UPDATE ... WHERE id = @id AND status = @expectedStatus`) returning whether a row was
|
||||
written. `RetryMessageAsync`'s three state-changing writes (park-on-permanent-failure,
|
||||
park-on-max-retries, retry-count increment) now use it with `expectedStatus = Pending`,
|
||||
so the sweep can never overwrite a row whose status changed underneath it; a skipped
|
||||
write is logged and the message left for the next sweep. Regression test
|
||||
`RetryMessageAsync_StatusChangedDuringDelivery_SweepParkWriteIsSkipped` drives a writer
|
||||
that moves the row out of `Pending` mid-delivery and asserts the sweep's stale park
|
||||
write is skipped (it failed against the pre-fix unconditional update, clobbering the
|
||||
other writer's `RetryCount`).
|
||||
|
||||
### StoreAndForward-006 — `GetParkedMessagesAsync` count and page run without a transaction
|
||||
|
||||
@@ -395,7 +438,7 @@ _Unresolved._
|
||||
|--|--|
|
||||
| Severity | Medium |
|
||||
| Category | Correctness & logic bugs |
|
||||
| Status | Open |
|
||||
| Status | Resolved |
|
||||
| Location | `src/ScadaLink.StoreAndForward/StoreAndForwardStorage.cs:203`, `:101` |
|
||||
|
||||
**Description**
|
||||
@@ -419,7 +462,16 @@ interval. Document the chosen behaviour in the method's XML comment.
|
||||
|
||||
**Resolution**
|
||||
|
||||
_Unresolved._
|
||||
Resolved 2026-05-16 (commit pending). Confirmed against the source: `RetryParkedMessage
|
||||
Async` reset `status`, `retry_count` and `last_error` but left `last_attempt_at` stale,
|
||||
so the operator-retried message's retry timing depended on the elapsed time since the
|
||||
original pre-park attempt. Encoded the intent explicitly — an operator retry means
|
||||
"attempt this again now" — by also setting `last_attempt_at = NULL` in the UPDATE, so
|
||||
the re-queued message is unambiguously due on the next sweep regardless of the
|
||||
configured `retry_interval_ms`. The method's XML comment now documents this. Regression
|
||||
test `RetryParkedMessageAsync_ClearsLastAttemptAt_SoMessageIsImmediatelyDue` uses a
|
||||
1-hour retry interval and a recent `last_attempt_at`; it failed pre-fix (timestamp not
|
||||
cleared, message excluded from the retry-due set) and passes post-fix.
|
||||
|
||||
### StoreAndForward-011 — `StoreAndForwardMessageStatus.InFlight` is unused and the doc's "retrying" status is unmodelled
|
||||
|
||||
@@ -488,7 +540,7 @@ _Unresolved._
|
||||
|--|--|
|
||||
| Severity | Medium |
|
||||
| Category | Testing coverage |
|
||||
| Status | Open |
|
||||
| Status | Resolved |
|
||||
| Location | `tests/ScadaLink.StoreAndForward.Tests/` (whole directory); `src/ScadaLink.StoreAndForward/StoreAndForwardStorage.cs:101`; `src/ScadaLink.StoreAndForward/ParkedMessageHandlerActor.cs` |
|
||||
|
||||
**Description**
|
||||
@@ -514,7 +566,20 @@ invalid-JSON payloads.
|
||||
|
||||
**Resolution**
|
||||
|
||||
_Unresolved._
|
||||
Resolved 2026-05-16 (commit pending). All three gaps closed.
|
||||
(1) Retry-due timing: `GetMessagesForRetryAsync_NonZeroInterval_ExcludesNotYetDue
|
||||
IncludesDue` exercises the `julianday` elapsed-time SQL with non-zero intervals — a
|
||||
just-attempted message (1-hour interval) is excluded, an old one (attempted 2h ago,
|
||||
5-min interval) and a never-attempted one are included.
|
||||
(2) Active-side replication: this sub-claim was **already stale** at the reviewed code —
|
||||
`StoreAndForwardReplicationTests` (added with finding 001's fix) asserts the Add, Remove
|
||||
and Park replication operations; no new test was needed.
|
||||
(3) `ParkedMessageHandlerActor`: new `ParkedMessageHandlerActorTests` using
|
||||
`Akka.TestKit.Xunit2` (package reference added to the test project) covers Query/Retry/
|
||||
Discard request-to-response mapping, correlation-ID propagation, the unknown-message
|
||||
failure responses, and `ExtractMethodName` for `MethodName`, `Subject`, missing-property
|
||||
and malformed-JSON payloads (all falling back to the category name without throwing).
|
||||
These are coverage-gap tests over already-correct code, so they pass on first run.
|
||||
|
||||
### StoreAndForward-014 — Storage does not create its SQLite database directory
|
||||
|
||||
|
||||
Reference in New Issue
Block a user