fix(external-system-gateway): resolve ExternalSystemGateway-002/003 — apply HTTP call timeout, confirm CachedCall no double-dispatch

Joseph Doherty
2026-05-16 19:40:40 -04:00
parent ab098bf6c8
commit 340a70f0e6
4 changed files with 208 additions and 10 deletions


@@ -8,7 +8,7 @@
| Last reviewed | 2026-05-16 |
| Reviewer | claude-agent |
| Commit reviewed | `9c60592` |
-| Open findings | 13 |
+| Open findings | 11 |
## Summary
@@ -109,7 +109,7 @@ transient-retry paths. Fixed by the commit whose message references
|--|--|
| Severity | High |
| Category | Error handling & resilience |
-| Status | Open |
+| Status | Resolved |
| Location | `src/ScadaLink.ExternalSystemGateway/ExternalSystemClient.cs:130`, `src/ScadaLink.ExternalSystemGateway/ServiceCollectionExtensions.cs:13` |
**Description**
@@ -142,7 +142,24 @@ is classified as transient.
**Resolution**
-_Unresolved._
+Resolved 2026-05-16 (commit `<pending>`). `InvokeHttpAsync` now enforces a call
+timeout: `ExternalSystemClient` takes an `IOptions<ExternalSystemGatewayOptions>` and
+links a `CancellationTokenSource(DefaultHttpTimeout)` with the caller's token before
+`SendAsync` and the response-body read, so the design's "timeout applies to the HTTP
+request round-trip" guarantee now holds within the configured window (default 30s)
+instead of `HttpClient`'s default 100s. A timeout is reclassified as a
+`TransientExternalSystemException`; a caller-initiated cancellation is distinguished
+from a timeout and propagated as `OperationCanceledException` rather than being
+swallowed as transient. Regression tests:
+`Call_SlowSystem_TimesOutAsTransientErrorWithinConfiguredWindow` and
+`Call_CallerCancellation_IsNotMisreportedAsTimeout`.
+
+Note (partial scope): the per-*system* `Timeout` field on `ExternalSystemDefinition`
+remains unimplemented — adding it requires a change to `ScadaLink.Commons`, which is
+outside this module's edit scope. Until that entity field exists, the configured
+`DefaultHttpTimeout` is the effective per-call limit for every system. A follow-up
+against the Commons module should add the `Timeout` field and have `InvokeHttpAsync`
+prefer it over the default. This is a tracked follow-up, not a regression.
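
Illustrative sketch (not taken from the commit): the ExternalSystemGateway-002 resolution describes linking a timeout `CancellationTokenSource` with the caller's token and then telling a timeout apart from a caller cancellation. One way that pattern could look is shown below. `TimeoutSketch`, `InvokeWithTimeoutAsync`, and the `send` delegate are hypothetical names, and the exception class is a local stand-in for the module's real `TransientExternalSystemException`, whose actual constructor is not shown in the source.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Local stand-in for the module's exception type (assumed shape).
public sealed class TransientExternalSystemException : Exception
{
    public TransientExternalSystemException(string message, Exception inner)
        : base(message, inner) { }
}

public static class TimeoutSketch
{
    // `send` stands in for SendAsync plus the response-body read; both
    // should observe the same linked token so the timeout covers the
    // whole round-trip.
    public static async Task<T> InvokeWithTimeoutAsync<T>(
        Func<CancellationToken, Task<T>> send,
        TimeSpan timeout,
        CancellationToken callerToken)
    {
        // Cancels itself once `timeout` has elapsed.
        using var timeoutCts = new CancellationTokenSource(timeout);
        // Cancels when either the caller or the timeout fires.
        using var linked = CancellationTokenSource.CreateLinkedTokenSource(
            callerToken, timeoutCts.Token);
        try
        {
            return await send(linked.Token).ConfigureAwait(false);
        }
        catch (OperationCanceledException ex)
            when (!callerToken.IsCancellationRequested
                  && timeoutCts.IsCancellationRequested)
        {
            // Only the timeout fired: report it as a transient failure.
            throw new TransientExternalSystemException(
                $"External call timed out after {timeout.TotalSeconds:0.###}s.", ex);
        }
        // A caller-initiated cancellation does not match the filter above
        // and propagates unchanged as OperationCanceledException.
    }
}
```

The exception filter checks the caller's token first, so if both tokens happen to be cancelled at once the call is treated as a caller cancellation rather than a timeout. That tie-break is a choice made for this sketch, not something the commit states.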
### ExternalSystemGateway-003 — `CachedCall` double-dispatches the HTTP request
@@ -150,7 +167,7 @@ _Unresolved._
|--|--|
| Severity | High |
| Category | Correctness & logic bugs |
-| Status | Open |
+| Status | Resolved |
| Location | `src/ScadaLink.ExternalSystemGateway/ExternalSystemClient.cs:84-117` |
**Description**
@@ -179,7 +196,18 @@ the duplicated logic.
**Resolution**
-_Unresolved._
+Resolved 2026-05-16 (commit `<pending>`). Re-triage: this finding was already fixed in
+the codebase as a side effect of the `ExternalSystemGateway-001` fix and is no longer
+reproducible against the current source. `StoreAndForwardService.EnqueueAsync` gained an
+`attemptImmediateDelivery` parameter (recommendation approach (b)), and
+`CachedCallAsync` passes `attemptImmediateDelivery: false` after its own first HTTP
+attempt — so `EnqueueAsync` buffers the message for the background retry sweep without
+re-invoking the registered delivery handler, eliminating the duplicate dispatch. A
+dedicated regression test, `CachedCall_TransientFailure_DoesNotImmediatelyRedispatchViaRegisteredHandler`,
+was added in this module's test suite: it registers a counting delivery handler, drives
+a `CachedCall` whose HTTP attempt fails transiently, and asserts the handler is invoked
+zero times during enqueue. The test was verified to fail if `attemptImmediateDelivery`
+is flipped back to `true`.
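
Illustrative sketch (not taken from the commit): the `attemptImmediateDelivery` behaviour described in the ExternalSystemGateway-003 resolution can be modelled with a small in-memory buffer. `StoreAndForwardSketch`, its string messages, the `Func<string, Task<bool>>` handler shape, and the `true` default for the parameter are all assumptions made for this example; the real `StoreAndForwardService.EnqueueAsync` signature is not shown in the source.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical, simplified model of a store-and-forward buffer.
public sealed class StoreAndForwardSketch
{
    private readonly Queue<string> _buffer = new();
    private readonly Func<string, Task<bool>> _deliveryHandler;

    public StoreAndForwardSketch(Func<string, Task<bool>> deliveryHandler)
        => _deliveryHandler = deliveryHandler;

    // Messages waiting for the background retry sweep.
    public int BufferedCount => _buffer.Count;

    public async Task EnqueueAsync(string message, bool attemptImmediateDelivery = true)
    {
        // A caller that has already made its own first attempt passes
        // false, so the handler is not invoked a second time here.
        if (attemptImmediateDelivery
            && await _deliveryHandler(message).ConfigureAwait(false))
        {
            return; // Delivered immediately; nothing to buffer.
        }

        _buffer.Enqueue(message);
    }
}
```

In this model, a caller in the position of `CachedCallAsync` would call `EnqueueAsync(message, attemptImmediateDelivery: false)` after its own transient failure, and a counting handler would then observe zero invocations during enqueue, which is the property the regression test named above asserts.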
### ExternalSystemGateway-004 — System retry settings are not honoured for cached calls/writes