fix(external-system-gateway): resolve ExternalSystemGateway-004..010 — honour retry settings, dispose HTTP messages, fix URL building, truncate error bodies, fix connection leak
This commit is contained in:
@@ -8,7 +8,7 @@
|
||||
| Last reviewed | 2026-05-16 |
|
||||
| Reviewer | claude-agent |
|
||||
| Commit reviewed | `9c60592` |
|
||||
| Open findings | 11 |
|
||||
| Open findings | 4 |
|
||||
|
||||
## Summary
|
||||
|
||||
@@ -215,7 +215,7 @@ is flipped back to `true`.
|
||||
|--|--|
|
||||
| Severity | Medium |
|
||||
| Category | Design-document adherence |
|
||||
| Status | Open |
|
||||
| Status | Resolved |
|
||||
| Location | `src/ScadaLink.ExternalSystemGateway/ExternalSystemClient.cs:114-115`, `src/ScadaLink.ExternalSystemGateway/DatabaseGateway.cs:86-87` |
|
||||
|
||||
**Description**
|
||||
@@ -242,7 +242,21 @@ should be tracked against the SiteRuntime module.
|
||||
|
||||
**Resolution**
|
||||
|
||||
_Unresolved._
|
||||
Resolved 2026-05-16 (commit pending). `CachedCallAsync` and `CachedWriteAsync` now pass
|
||||
the definition's `MaxRetries` to `EnqueueAsync` verbatim — the `> 0` guard is dropped, so
|
||||
a legitimately-configured `MaxRetries` of 0 ("never retry") is honoured instead of being
|
||||
collapsed to the S&F default. The `RetryDelay > TimeSpan.Zero` guard is deliberately
|
||||
**kept**: `TimeSpan.Zero` is the entity default for an unconfigured field and a literal
|
||||
zero-delay retry loop is not a valid configuration, so falling back to the S&F default
|
||||
interval for an unset delay is correct (only `MaxRetries == 0` is a meaningful operator
|
||||
choice). Regression test `CachedCall_TransientFailure_ZeroMaxRetriesIsHonouredNotTreatedAsUnset`
|
||||
buffers a transient failure and asserts the buffered message carries `MaxRetries == 0`
|
||||
rather than the S&F default; `CachedCall_TransientFailure_BuffersWithSystemRetrySettings`
|
||||
additionally covers a non-default settings pass-through. The companion fix in
|
||||
`SiteExternalSystemRepository.MapExternalSystem` to actually read the `MaxRetries` /
|
||||
`RetryDelay` columns from SQLite remains a tracked follow-up against the SiteRuntime
|
||||
module (outside this module's edit scope) — until then, sites still supply
|
||||
`MaxRetries == 0`, which this fix now correctly honours as "never retry".
|
||||
|
||||
### ExternalSystemGateway-005 — `HttpRequestMessage` and `HttpResponseMessage` are not disposed
|
||||
|
||||
@@ -250,7 +264,7 @@ _Unresolved._
|
||||
|--|--|
|
||||
| Severity | Medium |
|
||||
| Category | Performance & resource management |
|
||||
| Status | Open |
|
||||
| Status | Resolved |
|
||||
| Location | `src/ScadaLink.ExternalSystemGateway/ExternalSystemClient.cs:133-167` |
|
||||
|
||||
**Description**
|
||||
@@ -272,15 +286,22 @@ occurs on the exception paths.
|
||||
|
||||
**Resolution**
|
||||
|
||||
_Unresolved._
|
||||
Resolved 2026-05-16 (commit pending). `InvokeHttpAsync` now declares the request as
|
||||
`using var request` and wraps all response handling in a `using (response)` block, so
|
||||
both `IDisposable` instances (and the request's `StringContent` / the response content
|
||||
stream) are released on the success path **and** on the permanent/transient
|
||||
exception paths. Regression tests `Call_SuccessfulHttp_DisposesRequestAndResponse` and
|
||||
`Call_PermanentFailure_StillDisposesRequestAndResponse` use a disposal-tracking
|
||||
`HttpMessageHandler`/`HttpContent` and assert both the request and the response content
|
||||
are disposed; both were verified to fail before the `using` wrappers were added.
|
||||
|
||||
### ExternalSystemGateway-006 — `BuildUrl` ignores path templates and appends a trailing slash for empty paths
|
||||
|
||||
| | |
|
||||
|--|--|
|
||||
| Severity | Medium |
|
||||
| Severity | Medium — partially re-triaged: trailing-slash bug fixed; path-templating sub-issue is a design decision (see Resolution) |
|
||||
| Category | Correctness & logic bugs |
|
||||
| Status | Open |
|
||||
| Status | Resolved |
|
||||
| Location | `src/ScadaLink.ExternalSystemGateway/ExternalSystemClient.cs:180-196` |
|
||||
|
||||
**Description**
|
||||
@@ -305,7 +326,25 @@ example and state that paths are literal. Also avoid appending a trailing `/` wh
|
||||
|
||||
**Resolution**
|
||||
|
||||
_Unresolved._
|
||||
Resolved 2026-05-16 (commit pending). The **trailing-slash bug** is fixed: `BuildUrl`
|
||||
now appends a `/`-joined path segment only when the method's path is non-empty after
|
||||
trimming, so a method targeting the base URL itself produces `https://host/api` rather
|
||||
than `https://host/api/`. Regression tests `Call_MethodWithEmptyPath_DoesNotAppendTrailingSlash`
|
||||
and `Call_MethodWithPath_BuildsExpectedUrl` (asserting on the captured request URI)
|
||||
cover the empty-path and normal-path cases; the empty-path test was verified to fail
|
||||
before the fix.
|
||||
|
||||
Re-triage of the **path-templating sub-issue** (`{id}` placeholder substitution): this
|
||||
is a genuine design decision, not a code bug, and it requires editing the component
|
||||
design doc — both outside this module's edit scope (`src/`, `tests/`, this file only).
|
||||
The current code treats method paths as literal strings and routes parameters to the
|
||||
query string (GET/DELETE) or JSON body (POST/PUT); a method authored as `/recipes/{id}`
|
||||
sends the `{id}` token verbatim. **Tracked follow-up / surfaced design question:** the
|
||||
design owner must decide whether path templating is in scope — if yes, implement
|
||||
`{name}` substitution in `BuildUrl` and exclude substituted params from the
|
||||
query/body; if no, the `Component-ExternalSystemGateway.md` `/recipes/{id}` example must
|
||||
be changed to a literal path. The trailing-slash defect (the concrete correctness bug
|
||||
in this finding) is fully resolved.
|
||||
|
||||
### ExternalSystemGateway-007 — External error response bodies are echoed verbatim into script-visible error messages
|
||||
|
||||
@@ -313,7 +352,7 @@ _Unresolved._
|
||||
|--|--|
|
||||
| Severity | Medium |
|
||||
| Category | Security |
|
||||
| Status | Open |
|
||||
| Status | Resolved |
|
||||
| Location | `src/ScadaLink.ExternalSystemGateway/ExternalSystemClient.cs:167-177` |
|
||||
|
||||
**Description**
|
||||
@@ -335,15 +374,25 @@ the script. Optionally only include the body when the content type is textual.
|
||||
|
||||
**Resolution**
|
||||
|
||||
_Unresolved._
|
||||
Resolved 2026-05-16 (commit pending). `InvokeHttpAsync` now truncates the external error
|
||||
response body to `MaxErrorBodyChars` (2048) via a `Truncate` helper before embedding it
|
||||
into the transient/permanent exception message — so a misbehaving or hostile endpoint
|
||||
can no longer inflate every script-visible `ErrorMessage` and Site Event Logging entry
|
||||
with a multi-megabyte body. When truncation occurs the message is suffixed with
|
||||
`… [truncated, N chars total]` so the original size is still visible. Regression test
|
||||
`Call_PermanentFailureWithHugeErrorBody_TruncatesErrorMessage` drives a 400 with a
|
||||
500 KB body and asserts the resulting `ErrorMessage` is bounded (< 4096 chars); it was
|
||||
verified to fail (500 040-char message) before the cap was added. Content-type
|
||||
filtering was considered optional in the recommendation and was not implemented — the
|
||||
size cap alone closes the inflation/disclosure vector.
|
||||
|
||||
### ExternalSystemGateway-008 — Cancellation is conflated with transient timeout failure
|
||||
|
||||
| | |
|
||||
|--|--|
|
||||
| Severity | Medium |
|
||||
| Severity | Medium — re-triaged: root cause already fixed in current source (see Resolution) |
|
||||
| Category | Error handling & resilience |
|
||||
| Status | Open |
|
||||
| Status | Resolved |
|
||||
| Location | `src/ScadaLink.ExternalSystemGateway/ErrorClassifier.cs:24-30`, `src/ScadaLink.ExternalSystemGateway/ExternalSystemClient.cs:157-159` |
|
||||
|
||||
**Description**
|
||||
@@ -367,15 +416,37 @@ classification. Only treat a cancellation as a timeout when the supplied token i
|
||||
|
||||
**Resolution**
|
||||
|
||||
_Unresolved._
|
||||
Resolved 2026-05-16 (commit pending). **Re-triage:** the root cause described — a
|
||||
caller-initiated cancellation being misclassified as a transient failure — is **no
|
||||
longer present in the current source** and is not reproducible. `InvokeHttpAsync`
|
||||
already wraps both `SendAsync` and the response-body `ReadAsStringAsync` in ordered
|
||||
`catch (OperationCanceledException) when (cancellationToken.IsCancellationRequested)`
|
||||
filters that rethrow the caller's cancellation *before* the
|
||||
`catch (Exception ex) when (ErrorClassifier.IsTransient(ex))` branch is ever reached
|
||||
(this was added alongside the `ExternalSystemGateway-002` timeout fix). A caller-cancel
|
||||
therefore propagates as `OperationCanceledException` and is never buffered; only the
|
||||
gateway's own timeout token reclassifies as transient.
|
||||
|
||||
`ErrorClassifier.IsTransient(Exception)` does still return `true` for
|
||||
`TaskCanceledException`/`OperationCanceledException`, but that is **correct and
|
||||
intentional**: a `TaskCanceledException` raised by an HTTP timeout *is* a genuine
|
||||
transient failure, and the only caller (`InvokeHttpAsync:238`) is unreachable for a
|
||||
caller-cancellation because the two preceding `when`-filtered catches intercept it
|
||||
first. The transient-vs-cancel decision is contextual (which token fired) and cannot
|
||||
be made from the exception type alone, which is exactly why the call site does it.
|
||||
No source change was required. A regression guard,
|
||||
`CachedCall_CallerCancellation_IsNotBufferedAsTransient`, was added: it cancels the
|
||||
caller token mid-`CachedCall` and asserts an `OperationCanceledException` is thrown and
|
||||
the S&F buffer remains empty (the cancelled work is not retried). The existing
|
||||
`Call_CallerCancellation_IsNotMisreportedAsTimeout` covers the synchronous `Call` path.
|
||||
|
||||
### ExternalSystemGateway-009 — `StoreAndForwardResult` from `EnqueueAsync` is discarded; permanent failures during buffering are swallowed
|
||||
|
||||
| | |
|
||||
|--|--|
|
||||
| Severity | Medium |
|
||||
| Severity | Medium — re-triaged: root cause subsumed by the ExternalSystemGateway-003 dispatch redesign (see Resolution) |
|
||||
| Category | Correctness & logic bugs |
|
||||
| Status | Open |
|
||||
| Status | Resolved |
|
||||
| Location | `src/ScadaLink.ExternalSystemGateway/ExternalSystemClient.cs:109-117` |
|
||||
|
||||
**Description**
|
||||
@@ -398,7 +469,27 @@ partly subsumed by the dispatch redesign in finding 003.)
|
||||
|
||||
**Resolution**
|
||||
|
||||
_Unresolved._
|
||||
Resolved 2026-05-16 (commit pending). **Re-triage:** the stated root cause — "a
|
||||
permanent failure surfaced by `EnqueueAsync`'s immediate-delivery attempt is silently
|
||||
lost" — **can no longer occur** in the current source, and the dead `sfResult` variable
|
||||
the finding cites has already been removed. The `ExternalSystemGateway-003` fix changed
|
||||
`CachedCallAsync` to call `EnqueueAsync` with `attemptImmediateDelivery: false`. With
|
||||
that flag, `EnqueueAsync` never invokes the registered delivery handler: it skips the
|
||||
immediate-delivery block entirely (so the `StoreAndForwardResult(false, …, …)`
|
||||
permanent-failure return at `StoreAndForwardService.cs:147` is unreachable from this
|
||||
caller) and unconditionally buffers, returning `Accepted: true, WasBuffered: true`
|
||||
(`StoreAndForwardService.cs:180`). The `ExternalCallResult(true, null, null,
|
||||
WasBuffered: true)` that `CachedCallAsync` returns is therefore now factually correct
|
||||
in every reachable case — the message *is* buffered and there is no swallowed permanent
|
||||
failure. Permanent HTTP 4xx failures are still surfaced synchronously, because
|
||||
`CachedCallAsync` makes its own first HTTP attempt and catches
|
||||
`PermanentExternalSystemException` *before* it ever reaches `EnqueueAsync`. No source
|
||||
change was required beyond the `ExternalSystemGateway-003` redesign that already landed.
|
||||
Coverage: `CachedCall_TransientFailure_BuffersWithSystemRetrySettings` asserts both
|
||||
`result.WasBuffered == true` and that the message is genuinely present in the S&F buffer
|
||||
(depth == 1), confirming the `WasBuffered: true` claim is not a lie; the existing
|
||||
`CachedCall` permanent-failure path is exercised by `Call_Permanent400_ReturnsPermanentError`
|
||||
semantics shared via `InvokeHttpAsync`.
|
||||
|
||||
### ExternalSystemGateway-010 — `GetConnectionAsync` leaks the `SqlConnection` when `OpenAsync` fails
|
||||
|
||||
@@ -406,7 +497,7 @@ _Unresolved._
|
||||
|--|--|
|
||||
| Severity | Medium |
|
||||
| Category | Performance & resource management |
|
||||
| Status | Open |
|
||||
| Status | Resolved |
|
||||
| Location | `src/ScadaLink.ExternalSystemGateway/DatabaseGateway.cs:48-50` |
|
||||
|
||||
**Description**
|
||||
@@ -427,7 +518,14 @@ Wrap the open in a try/catch that disposes the connection before rethrowing:
|
||||
|
||||
**Resolution**
|
||||
|
||||
_Unresolved._
|
||||
Resolved 2026-05-16 (commit pending). `GetConnectionAsync` now wraps `OpenAsync` in a
|
||||
`try/catch` that calls `await connection.DisposeAsync()` before rethrowing, so a failed
|
||||
open (unreachable server, bad credentials, cancellation) no longer leaks the
|
||||
`SqlConnection`. Connection creation was extracted into an `internal virtual
|
||||
CreateConnection(string)` factory so the failure path is unit-testable. Regression test
|
||||
`GetConnection_OpenFails_DisposesConnectionBeforeRethrowing` substitutes a `DbConnection`
|
||||
whose `OpenAsync` always throws and asserts the connection is disposed when the
|
||||
exception propagates; it was verified to fail before the `try/catch` was added.
|
||||
|
||||
### ExternalSystemGateway-011 — Every call performs a full repository scan of all systems and methods
|
||||
|
||||
|
||||
Reference in New Issue
Block a user