docs(code-reviews): re-review batch 4 at 39d737e — SiteEventLogging, SiteRuntime, StoreAndForward, TemplateEngine
11 new findings: SiteEventLogging-012..014, SiteRuntime-017..019, StoreAndForward-015..017, TemplateEngine-015..016.
This commit is contained in:
@@ -5,10 +5,10 @@
|
||||
| Module | `src/ScadaLink.StoreAndForward` |
|
||||
| Design doc | `docs/requirements/Component-StoreAndForward.md` |
|
||||
| Status | Reviewed |
|
||||
| Last reviewed | 2026-05-16 |
|
||||
| Last reviewed | 2026-05-17 |
|
||||
| Reviewer | claude-agent |
|
||||
| Commit reviewed | `9c60592` |
|
||||
| Open findings | 0 (3 Deferred: 002, 011, 012 — see notes) |
|
||||
| Commit reviewed | `39d737e` |
|
||||
| Open findings | 3 (3 Deferred: 002, 011, 012 — see notes) |
|
||||
|
||||
## Summary
|
||||
|
||||
@@ -30,20 +30,45 @@ status set, and untested critical paths (retry-due timing, replication-from-acti
|
||||
the actor bridge). None of the findings are blockers for compilation, but the
|
||||
replication and retry-count issues are functional defects against the design.
|
||||
|
||||
#### Re-review 2026-05-17 (commit `39d737e`)
|
||||
|
||||
Re-reviewed at commit `39d737e` after the batch-3 fixes. All of findings 001 and
|
||||
003–010, plus 013–014, are confirmed `Resolved` against the current source: the
|
||||
replication wiring (`BufferAsync`/`ReplicateRemove`/`ReplicatePark`), the corrected
|
||||
retry-count semantics, the conditional `UpdateMessageIfStatusAsync` writes, the
|
||||
transactioned parked-message reads, the `PipeTo` refactor, the `RaiseActivity`
|
||||
hardening, the `RetryParkedMessageAsync` `last_attempt_at` reset and the database
|
||||
directory creation are all present as described. Findings 002, 011 and 012 were
|
||||
re-verified and remain validly `Deferred` — their preconditions are unchanged (002's
|
||||
residual no-handler gap, 011's Commons-owned enum, 012's Commons-owned entity placement).
|
||||
|
||||
This pass surfaced **three new findings**. StoreAndForward-015 records the
|
||||
StoreAndForward side of the cross-module `MaxRetries == 0` ambiguity flagged by
|
||||
ExternalSystemGateway-015: `EnqueueAsync`'s public contract documents `maxRetries` only
|
||||
as "parked once `MaxRetries` is reached" and never states the `0 = no limit / retry
|
||||
forever` special case that `RetryMessageAsync` actually enforces, so an ESG caller
|
||||
passing `0` to mean "never retry" gets the opposite behaviour with no warning from the
|
||||
S&F API surface. StoreAndForward-016 records that operator-initiated parked-message
|
||||
retry and discard are not replicated to the standby — only the add/remove/park sweep
|
||||
paths are — so a failover diverges the standby buffer from the active one.
|
||||
StoreAndForward-017 records that the Retry/Discard activity-log entries hard-code the
|
||||
`ExternalSystem` category, mislabelling notification and cached-DB-write messages in
|
||||
the site event log.
|
||||
|
||||
## Checklist coverage
|
||||
|
||||
| # | Category | Examined | Notes |
|
||||
|---|----------|----------|-------|
|
||||
| 1 | Correctness & logic bugs | ☑ | Off-by-one in retry counting (003); parked-message retry timing (010). |
|
||||
| 1 | Correctness & logic bugs | ☑ | Off-by-one in retry counting (003); parked-message retry timing (010); Retry/Discard activity log hard-codes the category (017). |
|
||||
| 2 | Akka.NET conventions | ☑ | `ContinueWith` used instead of `PipeTo`-friendly continuations; default supervision; see 007. |
|
||||
| 3 | Concurrency & thread safety | ☑ | Sweep guarded by `Interlocked`, but no guard against retry-vs-manage races (005); `OnActivity` event not thread-safe (009). |
|
||||
| 4 | Error handling & resilience | ☑ | Replication never invoked from active path (001); no-handler messages buffered then stuck (002). |
|
||||
| 4 | Error handling & resilience | ☑ | Replication never invoked from active path (001); no-handler messages buffered then stuck (002); operator retry/discard not replicated to standby (016). |
|
||||
| 5 | Security | ☑ | No issues found — parameterised SQL throughout; no secrets handled directly; payload JSON treated opaquely. |
|
||||
| 6 | Performance & resource management | ☑ | New SQLite connection per call; multi-statement operations not wrapped in a transaction (006, 008). |
|
||||
| 7 | Design-document adherence | ☑ | Replication gap (001); `InFlight` status undocumented/unused (011); "retrying" status from design doc not modelled. |
|
||||
| 8 | Code organization & conventions | ☑ | `StoreAndForwardMessage` is an entity-like POCO living in the component, not Commons (012). |
|
||||
| 9 | Testing coverage | ☑ | Retry-due timing, replication-from-active, and `ParkedMessageHandlerActor` are untested (013). |
|
||||
| 10 | Documentation & comments | ☑ | XML doc on `RegisterDeliveryHandler` contract is inconsistent with code (004). |
|
||||
| 10 | Documentation & comments | ☑ | XML doc on `RegisterDeliveryHandler` contract is inconsistent with code (004); `EnqueueAsync` never documents the `maxRetries == 0` = "retry forever" special case (015). |
|
||||
|
||||
## Findings
|
||||
|
||||
@@ -703,3 +728,158 @@ and bare filenames are skipped). Regression test
|
||||
`InitializeAsync_FileInMissingDirectory_CreatesDirectory` fails against the pre-fix code;
|
||||
all six `SiteActorPathTests` now pass. Fixed by the commit whose message references
|
||||
`StoreAndForward-014`.
|
||||
|
||||
### StoreAndForward-015 — `EnqueueAsync`'s public contract never documents that `maxRetries == 0` means "retry forever"
|
||||
|
||||
| | |
|
||||
|--|--|
|
||||
| Severity | Medium |
|
||||
| Category | Documentation & comments |
|
||||
| Status | Open |
|
||||
| Location | `src/ScadaLink.StoreAndForward/StoreAndForwardService.cs:114`–`:130`, `:285` |
|
||||
|
||||
**Description**
|
||||
|
||||
The re-review brief asks for the StoreAndForward side of the cross-module ambiguity
|
||||
recorded as `ExternalSystemGateway-015`. The semantics are split across this module and
|
||||
its callers, and the StoreAndForward side carries a genuine documentation/API-contract
|
||||
fault:
|
||||
|
||||
- `RetryMessageAsync` parks a message only when `message.MaxRetries > 0 && message.RetryCount >= message.MaxRetries`
|
||||
(`StoreAndForwardService.cs:285`). When `MaxRetries == 0` the guard is false on every
|
||||
sweep, so a `0` value means **"no limit — retry forever"**. The
|
||||
`StoreAndForwardMessage.MaxRetries` XML doc (`StoreAndForwardMessage.cs:31`) does state
|
||||
`"0 = no limit"`, so the persistence model is internally consistent.
|
||||
- But `EnqueueAsync` — the *only* public entry point into the engine — exposes a
|
||||
`maxRetries` parameter (`StoreAndForwardService.cs:128`) with **no parameter
|
||||
documentation at all**, and its method summary (lines 114–122) describes the lifecycle
|
||||
only as "On max retries → park" / "parked once `MaxRetries` is reached" (see also the
|
||||
`_deliveryHandlers` field doc, line 50–51). Nothing on the public surface tells a
|
||||
caller that passing `0` flips the meaning from "park immediately / never retry" to
|
||||
"retry forever". A caller reading only `EnqueueAsync` would reasonably assume `0`
|
||||
retries means zero retries.
|
||||
- This is exactly the trap ESG fell into: `ExternalSystemClient.CachedCallAsync` /
|
||||
`DatabaseGateway.CachedWriteAsync` pass the source entity's `MaxRetries` verbatim,
|
||||
intending `0` to mean "never retry", and instead get unbounded retry — the
|
||||
duplicate-delivery / unbounded-buffer-growth hazard the design doc's idempotency note
|
||||
warns against. The fault is not solely ESG's: the S&F public API silently overloads
|
||||
`0` with the opposite of its natural reading and does not document it.
|
||||
|
||||
The defect is in this module's API contract and documentation, so it is recorded here
|
||||
in addition to `ExternalSystemGateway-015`. (Whether `0` *should* mean "no limit" or
|
||||
"no retry" is the cross-module design decision tracked by ESG-015; this finding is
|
||||
specifically that the StoreAndForward public surface fails to document whichever
|
||||
meaning is chosen.)
|
||||
|
||||
**Recommendation**
|
||||
|
||||
Document the `maxRetries` parameter on `EnqueueAsync` explicitly with a `<param>` tag
|
||||
that states the `0` special case in the same words as `StoreAndForwardMessage.MaxRetries`
|
||||
(`"0 = no limit — retried on every sweep until delivered, never parked"`), and add the
|
||||
`0` case to the method summary's lifecycle description. Better still — and consistent
|
||||
with the resolution of ESG-015 — make the engine reject the ambiguity at the API: accept
|
||||
a nullable/`enum` retry policy, or treat `0` as an explicit "no retry" (do not buffer, or
|
||||
park on the first sweep) so the natural reading and the behaviour agree. Either way the
|
||||
public `EnqueueAsync` contract must state the chosen meaning; today it states nothing.
|
||||
|
||||
**Resolution**
|
||||
|
||||
_Unresolved._
|
||||
|
||||
### StoreAndForward-016 — Operator-initiated parked-message retry and discard are not replicated to the standby
|
||||
|
||||
| | |
|
||||
|--|--|
|
||||
| Severity | Medium |
|
||||
| Category | Error handling & resilience |
|
||||
| Status | Open |
|
||||
| Location | `src/ScadaLink.StoreAndForward/StoreAndForwardService.cs:339`–`:362`; `src/ScadaLink.StoreAndForward/ReplicationService.cs:131`–`:136` |
|
||||
|
||||
**Description**
|
||||
|
||||
`StoreAndForward-001`'s fix wired replication into the active *delivery* paths:
|
||||
`BufferAsync` replicates an `Add`, a successful retry replicates a `Remove`, and a park
|
||||
replicates a `Park`. But the two *operator* paths — `RetryParkedMessageAsync` (line 339)
|
||||
and `DiscardParkedMessageAsync` (line 353) — change buffer state and never touch
|
||||
`_replication`:
|
||||
|
||||
- `RetryParkedMessageAsync` flips a row from `Parked` back to `Pending` (and clears
|
||||
`retry_count` / `last_attempt_at`) in the local SQLite only. The standby's copy stays
|
||||
`Parked`.
|
||||
- `DiscardParkedMessageAsync` `DELETE`s the row from the local SQLite only. The standby's
|
||||
copy is left in place, still `Parked`.
|
||||
|
||||
The Component design doc ("Persistence") requires the active node to forward "each
|
||||
buffer operation (add, remove, park)" so that on failover "the new active node has a
|
||||
near-complete copy of the buffer." An operator retrying a parked message is a buffer
|
||||
state change; an operator discarding one is a removal. After a failover that follows an
|
||||
operator action:
|
||||
|
||||
1. A **discarded** message reappears on the new active node — it is still `Parked`
|
||||
there, so it resurfaces in the central UI's parked-message list and an operator must
|
||||
discard it a second time. For a message deliberately removed (e.g. a known-bad
|
||||
payload) this is a correctness regression of the operator's intent.
|
||||
2. A **retried** message is still `Parked` on the new active node, so the operator's
|
||||
"move it back to the queue" action is silently lost across the failover and the
|
||||
message is not re-attempted.
|
||||
|
||||
`ReplicationOperationType` only models `Add`/`Remove`/`Park` — there is no operation for
|
||||
"un-park / move back to pending", so even a minimal fix needs either a new operation
|
||||
type or a re-use of `Add` to overwrite the standby row. This is the same class of defect
|
||||
as the now-resolved `StoreAndForward-001`, for the operator paths rather than the sweep
|
||||
paths.
|
||||
|
||||
**Recommendation**
|
||||
|
||||
Replicate both operator actions. `DiscardParkedMessageAsync` should call
|
||||
`_replication?.ReplicateRemove(messageId)` after a successful local delete (the existing
|
||||
`Remove` op already deletes on the standby). For `RetryParkedMessageAsync`, add a
|
||||
`Requeue`/`Unpark` `ReplicationOperationType` whose `ApplyReplicatedOperationAsync` case
|
||||
resets the standby row to `Pending` with `retry_count = 0`, or have the method re-load
|
||||
the updated message and replicate it as an `Add`-style upsert. Add replication tests for
|
||||
both operator paths (the existing `StoreAndForwardReplicationTests` only cover the sweep
|
||||
paths).
|
||||
|
||||
**Resolution**
|
||||
|
||||
_Unresolved._
|
||||
|
||||
### StoreAndForward-017 — Retry/Discard activity-log entries hard-code the `ExternalSystem` category
|
||||
|
||||
| | |
|
||||
|--|--|
|
||||
| Severity | Low |
|
||||
| Category | Correctness & logic bugs |
|
||||
| Status | Open |
|
||||
| Location | `src/ScadaLink.StoreAndForward/StoreAndForwardService.cs:344`, `:358` |
|
||||
|
||||
**Description**
|
||||
|
||||
`RetryParkedMessageAsync` and `DiscardParkedMessageAsync` raise an S&F activity
|
||||
notification (consumed by Site Event Logging — WP-14) but pass a hard-coded
|
||||
`StoreAndForwardCategory.ExternalSystem` as the category argument:
|
||||
|
||||
```csharp
|
||||
RaiseActivity("Retry", StoreAndForwardCategory.ExternalSystem, $"Parked message {messageId} moved back to queue");
|
||||
RaiseActivity("Discard", StoreAndForwardCategory.ExternalSystem, $"Parked message {messageId} discarded");
|
||||
```
|
||||
|
||||
Both methods take only a `messageId` and never load the message, so they have no access
|
||||
to its real category. When an operator retries or discards a parked **Notification** or
|
||||
**CachedDbWrite** message, the site event log records the activity under
|
||||
`ExternalSystem`. Every other `RaiseActivity` call in the service passes the message's
|
||||
true `Category` (`EnqueueAsync`, `RetryMessageAsync`), so the operator paths are
|
||||
inconsistent and produce mislabelled audit entries — misleading when an operator later
|
||||
filters or reviews S&F activity by category.
|
||||
|
||||
**Recommendation**
|
||||
|
||||
Load the message (or have `StoreAndForwardStorage.RetryParkedMessageAsync` /
|
||||
`DiscardParkedMessageAsync` return the affected row's category) and pass the real
|
||||
`Category` to `RaiseActivity`. If loading the row is considered too costly on these
|
||||
infrequent operator paths, change the `OnActivity` event / `RaiseActivity` signature to
|
||||
allow a nullable category for management actions rather than asserting a false one.
|
||||
|
||||
**Resolution**
|
||||
|
||||
_Unresolved._
|
||||
|
||||
Reference in New Issue
Block a user