fix(store-and-forward): resolve S&F delivery + replication wiring (3 Critical findings)
Resolves StoreAndForward-001, ExternalSystemGateway-001, NotificationService-001: one
systemic gap where buffered messages were persisted but never delivered, and the active
node never replicated its buffer to the standby.

Delivery handlers (ExternalSystemGateway-001 / NotificationService-001):
- AkkaHostedService registers delivery handlers for the ExternalSystem, CachedDbWrite and
  Notification categories after StoreAndForwardService starts; each resolves its scoped
  consumer in a fresh DI scope.
- ExternalSystemClient, DatabaseGateway and NotificationDeliveryService each gain a
  DeliverBufferedAsync method: re-resolve the target and re-attempt delivery, returning
  true/false/throwing per the transient-vs-permanent contract.
- EnqueueAsync gains an attemptImmediateDelivery flag; CachedCallAsync and
  NotificationDeliveryService.SendAsync pass false (they already attempted delivery
  themselves) so registering a handler does not dispatch twice.

Replication (StoreAndForward-001):
- ReplicationService is injected into StoreAndForwardService; a new BufferAsync helper
  replicates every enqueue, and successful-retry removes and parks are replicated too.
  Fire-and-forget, no-op when replication is disabled.

Tests: StoreAndForwardReplicationTests (Add/Remove/Park observed),
attemptImmediateDelivery behaviour, and DeliverBufferedAsync paths for each consumer.

Full solution builds; StoreAndForward/ExternalSystemGateway/NotificationService suites green.
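The handler contract described in the commit message (return true when delivered, return false to park, throw on a transient failure so the message is retried) and the `attemptImmediateDelivery` flag can be illustrated with a small in-memory model. This is a language-neutral sketch in Python, not the C# implementation: `StoreAndForwardSketch`, `register_handler`, `retry_sweep` and the string outcomes are hypothetical names, and keeping a message pending when no handler is registered is an assumption made only for this example.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List, Tuple

# A delivery handler returns True (delivered), False (park), or raises (retry later).
Handler = Callable[[str], Awaitable[bool]]
Message = Tuple[str, str]  # (category, payload)


class StoreAndForwardSketch:
    """Hypothetical in-memory stand-in for the buffer; not the ScadaLink class."""

    def __init__(self) -> None:
        self.handlers: Dict[str, Handler] = {}
        self.pending: List[Message] = []
        self.parked: List[Message] = []

    def register_handler(self, category: str, handler: Handler) -> None:
        self.handlers[category] = handler

    async def _attempt(self, message: Message) -> str:
        category, payload = message
        handler = self.handlers.get(category)
        if handler is None:
            return "retry"  # assumption: with no handler, the message stays buffered
        try:
            delivered = await handler(payload)
        except Exception:
            return "retry"  # a raised error is treated as transient
        return "delivered" if delivered else "parked"

    def _file(self, message: Message, outcome: str) -> None:
        if outcome == "retry":
            self.pending.append(message)
        elif outcome == "parked":
            self.parked.append(message)
        # a "delivered" message is simply dropped from the buffer

    async def enqueue(self, category: str, payload: str,
                      attempt_immediate_delivery: bool = True) -> str:
        message = (category, payload)
        # Callers that already tried delivery themselves pass False, so the
        # registered handler is not invoked a second time for the same request.
        outcome = await self._attempt(message) if attempt_immediate_delivery else "retry"
        self._file(message, outcome)
        return outcome

    async def retry_sweep(self) -> None:
        batch, self.pending = self.pending, []
        for message in batch:
            self._file(message, await self._attempt(message))


async def demo() -> dict:
    target_up = False
    calls: List[str] = []

    async def deliver_buffered(payload: str) -> bool:
        calls.append(payload)
        if payload == "target-removed":
            return False  # permanent: the target no longer exists, so park
        if not target_up:
            raise ConnectionError("target unreachable")  # transient: retry later
        return True

    buffer = StoreAndForwardSketch()
    buffer.register_handler("ExternalSystem", deliver_buffered)
    await buffer.enqueue("ExternalSystem", "write-1", attempt_immediate_delivery=False)
    await buffer.enqueue("ExternalSystem", "target-removed", attempt_immediate_delivery=False)
    calls_after_enqueue = list(calls)

    await buffer.retry_sweep()  # write-1 raises and stays pending; the other is parked
    pending_after_first_sweep = list(buffer.pending)
    target_up = True
    await buffer.retry_sweep()  # write-1 is now delivered
    return {
        "calls_after_enqueue": calls_after_enqueue,
        "pending_after_first_sweep": pending_after_first_sweep,
        "pending": buffer.pending,
        "parked": buffer.parked,
        "calls": calls,
    }
```

Running `asyncio.run(demo())` shows the handler is not called at enqueue time when the flag is false, which is the double-dispatch the commit avoids.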
@@ -8,7 +8,7 @@
 | Last reviewed | 2026-05-16 |
 | Reviewer | claude-agent |
 | Commit reviewed | `9c60592` |
-| Open findings | 14 |
+| Open findings | 13 |
 
 ## Summary
 
@@ -53,7 +53,7 @@ requirements (timeout, retry settings) that are declared but not implemented.
 |--|--|
 | Severity | Critical |
 | Category | Error handling & resilience |
-| Status | Open |
+| Status | Resolved |
 | Location | `src/ScadaLink.ExternalSystemGateway/ExternalSystemClient.cs:109`, `src/ScadaLink.ExternalSystemGateway/DatabaseGateway.cs:81` |
 
 **Description**
@@ -89,7 +89,19 @@ verifies it is delivered by a retry sweep.
 
 **Resolution**
 
-_Unresolved._
+Resolved 2026-05-16. Delivery handlers for `StoreAndForwardCategory.ExternalSystem` and
+`CachedDbWrite` are now registered at site startup in `AkkaHostedService`, after
+`StoreAndForwardService.StartAsync()`. Each handler resolves its consumer in a fresh DI
+scope and calls a new `DeliverBufferedAsync`: `ExternalSystemClient.DeliverBufferedAsync`
+re-resolves the system/method and re-invokes `InvokeHttpAsync`, and
+`DatabaseGateway.DeliverBufferedAsync` executes the buffered SQL — each returning `true`
+on success, `false` (park) when the target no longer exists or fails permanently, and
+throwing on transient failure so the engine retries. `EnqueueAsync` gained an
+`attemptImmediateDelivery` parameter; `CachedCallAsync` passes `false` so registering the
+handler does not dispatch the request twice (the double-dispatch noted in
+`ExternalSystemGateway-003`). Regression tests cover the success, target-removed and
+transient-retry paths. Fixed by the commit whose message references
+`ExternalSystemGateway-001`.
 
 ### ExternalSystemGateway-002 — Per-system call timeout is never applied to HTTP requests
 
@@ -8,7 +8,7 @@
 | Last reviewed | 2026-05-16 |
 | Reviewer | claude-agent |
 | Commit reviewed | `9c60592` |
-| Open findings | 12 |
+| Open findings | 11 |
 
 ## Summary
 
@@ -53,7 +53,7 @@ fallback in `DeliverAsync`, and concurrency on the token cache.
 |--|--|
 | Severity | Critical |
 | Category | Error handling & resilience |
-| Status | Open |
+| Status | Resolved |
 | Location | `src/ScadaLink.NotificationService/NotificationDeliveryService.cs:96`, `src/ScadaLink.NotificationService/ServiceCollectionExtensions.cs:8` |
 
 **Description**
@@ -66,7 +66,15 @@ Register a delivery handler for `StoreAndForwardCategory.Notification` during st
 
 **Resolution**
 
-_Unresolved._
+Resolved 2026-05-16. A delivery handler for `StoreAndForwardCategory.Notification` is now
+registered at site startup in `AkkaHostedService`. The handler resolves
+`NotificationDeliveryService` in a fresh DI scope and calls the new `DeliverBufferedAsync`,
+which re-resolves the list, recipients and SMTP config and re-attempts delivery —
+returning `true` on success, `false` (park) on permanent failure or missing
+configuration, and throwing on transient failure so the engine retries. `SendAsync` now
+buffers with `attemptImmediateDelivery: false` so registering the handler does not send
+the notification twice. Regression tests cover the happy path and the list-removed park
+path. Fixed by the commit whose message references `NotificationService-001`.
 
 ### NotificationService-002 — `TimeoutException`/`OperationCanceledException` misclassified as transient
 
@@ -34,11 +34,11 @@ resolved and re-triaged.
 
 | Severity | Open findings |
 |----------|---------------|
-| Critical | 3 |
+| Critical | 0 |
 | High | 46 |
 | Medium | 100 |
 | Low | 89 |
-| **Total** | **238** |
+| **Total** | **235** |
 
 ## Module Status
 
@@ -52,16 +52,16 @@ resolved and re-triaged.
 | [ConfigurationDatabase](ConfigurationDatabase/findings.md) | 2026-05-16 | `9c60592` | 0/1/4/6 | 11 | 11 |
 | [DataConnectionLayer](DataConnectionLayer/findings.md) | 2026-05-16 | `9c60592` | 0/4/6/2 | 12 | 13 |
 | [DeploymentManager](DeploymentManager/findings.md) | 2026-05-16 | `9c60592` | 0/3/6/5 | 14 | 14 |
-| [ExternalSystemGateway](ExternalSystemGateway/findings.md) | 2026-05-16 | `9c60592` | 1/2/7/4 | 14 | 14 |
+| [ExternalSystemGateway](ExternalSystemGateway/findings.md) | 2026-05-16 | `9c60592` | 0/2/7/4 | 13 | 14 |
 | [HealthMonitoring](HealthMonitoring/findings.md) | 2026-05-16 | `9c60592` | 0/2/5/5 | 12 | 12 |
 | [Host](Host/findings.md) | 2026-05-16 | `9c60592` | 0/1/3/7 | 11 | 11 |
 | [InboundAPI](InboundAPI/findings.md) | 2026-05-16 | `9c60592` | 0/3/5/5 | 13 | 13 |
 | [ManagementService](ManagementService/findings.md) | 2026-05-16 | `9c60592` | 0/3/5/5 | 13 | 13 |
-| [NotificationService](NotificationService/findings.md) | 2026-05-16 | `9c60592` | 1/3/5/3 | 12 | 12 |
+| [NotificationService](NotificationService/findings.md) | 2026-05-16 | `9c60592` | 0/3/5/3 | 11 | 12 |
 | [Security](Security/findings.md) | 2026-05-16 | `9c60592` | 0/3/4/4 | 11 | 11 |
 | [SiteEventLogging](SiteEventLogging/findings.md) | 2026-05-16 | `9c60592` | 0/4/4/3 | 11 | 11 |
 | [SiteRuntime](SiteRuntime/findings.md) | 2026-05-16 | `9c60592` | 0/3/8/5 | 16 | 16 |
-| [StoreAndForward](StoreAndForward/findings.md) | 2026-05-16 | `9c60592` | 1/2/4/6 | 13 | 13 |
+| [StoreAndForward](StoreAndForward/findings.md) | 2026-05-16 | `9c60592` | 0/2/4/6 | 12 | 13 |
 | [TemplateEngine](TemplateEngine/findings.md) | 2026-05-16 | `9c60592` | 0/5/5/4 | 14 | 14 |
 
 ## Pending Findings
@@ -71,13 +71,9 @@ Resolved findings drop off this list but remain recorded in their module's
 `findings.md` (see [REVIEW-PROCESS.md](REVIEW-PROCESS.md) §4–§5). Full detail —
 description, location, recommendation — lives in the module's `findings.md`.
 
-### Critical (3)
+### Critical (0)
 
-| ID | Module | Title |
-|----|--------|-------|
-| ExternalSystemGateway-001 | [ExternalSystemGateway](ExternalSystemGateway/findings.md) | No S&F delivery handler registered; cached calls and writes can never be delivered |
-| NotificationService-001 | [NotificationService](NotificationService/findings.md) | Buffered notifications are never retried (no S&F delivery handler) |
-| StoreAndForward-001 | [StoreAndForward](StoreAndForward/findings.md) | Replication to standby is never triggered by the active node |
+_None open._
 
 ### High (46)
 
@@ -8,7 +8,7 @@
 | Last reviewed | 2026-05-16 |
 | Reviewer | claude-agent |
 | Commit reviewed | `9c60592` |
-| Open findings | 13 |
+| Open findings | 12 |
 
 ## Summary
 
@@ -53,7 +53,7 @@ replication and retry-count issues are functional defects against the design.
 |--|--|
 | Severity | Critical |
 | Category | Error handling & resilience |
-| Status | Open |
+| Status | Resolved |
 | Location | `src/ScadaLink.StoreAndForward/ReplicationService.cs:40`, `:53`, `:66`; `src/ScadaLink.StoreAndForward/StoreAndForwardService.cs:155`, `:212`, `:222`, `:236` |
 
 **Description**
@@ -81,7 +81,14 @@ asserts the replication handler observes each operation type.
 
 **Resolution**
 
-_Unresolved._
+Resolved 2026-05-16. `ReplicationService` is now injected into `StoreAndForwardService`
+(wired in `AddStoreAndForward`), and every buffer operation is forwarded to the standby:
+a new `BufferAsync` helper calls `ReplicateEnqueue` after each persist, `ReplicateRemove`
+runs after a successful retry removes a message, and `ReplicatePark` runs on both park
+paths. Replication stays fire-and-forget and is a no-op when `ReplicationEnabled` is
+false or no handler is wired. Regression tests `StoreAndForwardReplicationTests` assert
+the replication handler observes the Add, Remove and Park operations. Fixed by the
+commit whose message references `StoreAndForward-001`.
 
 ### StoreAndForward-002 — Messages enqueued with no registered handler are buffered but never deliverable
 
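The replication wiring described in the StoreAndForward-001 resolution (replicate after each persist, after a successful-retry remove, and on park; fire-and-forget; no-op when disabled or when no handler is wired) can be sketched roughly as follows. This is an illustrative Python model rather than the C# code: `ReplicationSketch`, `BufferSketch`, `drain` and the method names are hypothetical, and swallowing standby errors is an assumption of the sketch, not something the findings state.

```python
import asyncio
from typing import Awaitable, Callable, List, Optional, Set, Tuple

# (operation, message_id) -> awaitable; operation is "Add", "Remove" or "Park".
ReplicationHandler = Callable[[str, str], Awaitable[None]]


class ReplicationSketch:
    """Hypothetical stand-in for the replication service."""

    def __init__(self, enabled: bool,
                 handler: Optional[ReplicationHandler] = None) -> None:
        self.enabled = enabled
        self.handler = handler
        self._in_flight: Set[asyncio.Task] = set()

    def replicate(self, operation: str, message_id: str) -> None:
        # No-op when replication is disabled or no handler is wired.
        if not self.enabled or self.handler is None:
            return
        # Fire-and-forget: schedule the send and return without awaiting it.
        # Must be called from code running inside an event loop.
        task = asyncio.get_running_loop().create_task(self._send(operation, message_id))
        self._in_flight.add(task)  # keep a reference so the task is not collected
        task.add_done_callback(self._in_flight.discard)

    async def _send(self, operation: str, message_id: str) -> None:
        try:
            await self.handler(operation, message_id)
        except Exception:
            pass  # assumption: a standby failure never fails the buffer operation

    async def drain(self) -> None:
        # Helper for the demo only: wait for scheduled sends to finish.
        if self._in_flight:
            await asyncio.gather(*list(self._in_flight))


class BufferSketch:
    """Hypothetical active-node buffer that forwards every operation."""

    def __init__(self, replication: ReplicationSketch) -> None:
        self.replication = replication
        self.messages: Set[str] = set()
        self.parked: Set[str] = set()

    def buffer(self, message_id: str) -> None:
        self.messages.add(message_id)  # persist first, then tell the standby
        self.replication.replicate("Add", message_id)

    def remove_after_successful_retry(self, message_id: str) -> None:
        self.messages.discard(message_id)
        self.replication.replicate("Remove", message_id)

    def park(self, message_id: str) -> None:
        self.messages.discard(message_id)
        self.parked.add(message_id)
        self.replication.replicate("Park", message_id)


async def demo(enabled: bool) -> List[Tuple[str, str]]:
    observed: List[Tuple[str, str]] = []

    async def standby(operation: str, message_id: str) -> None:
        observed.append((operation, message_id))

    replication = ReplicationSketch(enabled, standby)
    store = BufferSketch(replication)
    store.buffer("m1")
    store.buffer("m2")
    store.remove_after_successful_retry("m1")
    store.park("m2")
    await replication.drain()
    return observed
```

Calling `asyncio.run(demo(True))` lets a test observe the Add, Remove and Park operations in the way the regression tests are described, while `demo(False)` observes nothing.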