fix(external-system-gateway): resolve ExternalSystemGateway-012,013,014 — failure logging, connection-limit wiring, test coverage; ExternalSystemGateway-011 flagged

This commit is contained in:
Joseph Doherty
2026-05-16 22:14:23 -04:00
parent e9ee4e3ea5
commit e57ccd78b7
7 changed files with 509 additions and 15 deletions

View File

@@ -8,7 +8,7 @@
| Last reviewed | 2026-05-16 |
| Reviewer | claude-agent |
| Commit reviewed | `9c60592` |
| Open findings | 4 |
| Open findings | 1 |
## Summary
@@ -533,8 +533,8 @@ exception propagates; it was verified to fail before the `try/catch` was added.
|--|--|
| Severity | Low |
| Category | Performance & resource management |
| Status | Open |
| Location | `src/ScadaLink.ExternalSystemGateway/ExternalSystemClient.cs:231-245`, `src/ScadaLink.ExternalSystemGateway/DatabaseGateway.cs:90-97` |
| Status | Open — root cause confirmed; no correct in-module fix exists (see Resolution) |
| Location | `src/ScadaLink.ExternalSystemGateway/ExternalSystemClient.cs:360-374`, `src/ScadaLink.ExternalSystemGateway/DatabaseGateway.cs:169-176` |
**Description**
@@ -554,7 +554,26 @@ rather than fetch-all-then-filter.
**Resolution**
_Unresolved._
2026-05-16 — **Root cause confirmed, but left Open: no correct fix is possible within
this module's edit scope.** `ResolveSystemAndMethodAsync`
(`ExternalSystemClient.cs:360`) does call `GetAllExternalSystemsAsync()` followed by
`GetMethodsByExternalSystemIdAsync()` and filters in memory, and
`ResolveConnectionAsync` (`DatabaseGateway.cs:169`) does `GetAllDatabaseConnectionsAsync()`
then filters — fetch-all-then-filter on every hot-path call, as described.
Both recommended fixes require changes outside `src/ScadaLink.ExternalSystemGateway`:
(a) a **name-keyed repository lookup** (e.g. `GetExternalSystemByNameAsync`) means adding
methods to `IExternalSystemRepository` in `ScadaLink.Commons` and implementing them in
`ScadaLink.ConfigurationDatabase` / `ScadaLink.SiteRuntime`; (b) an **in-memory cache
invalidated on artifact deployment** requires subscribing to a deployment-applied event
owned by `ScadaLink.SiteRuntime` / `ScadaLink.DeploymentManager`. A purely module-local
cache with a time-based TTL was rejected as a fix: definitions only change on deployment
and must reflect a deployment promptly, so a TTL would either be too short to help the
hot path or long enough to serve stale definitions after a redeploy — trading a
correctness hazard for a performance gain on a Low-severity issue. **Tracked follow-up:**
add a name-keyed lookup to `IExternalSystemRepository` (Commons) and have the gateway use
it, or add a deployment-invalidated definition cache wired from SiteRuntime. No source
change was made in this module.
### ExternalSystemGateway-012 — Permanent-failure logging requirement is not met; `_logger` is injected but unused
@@ -562,7 +581,7 @@ _Unresolved._
|--|--|
| Severity | Low |
| Category | Design-document adherence |
| Status | Open |
| Status | Resolved |
| Location | `src/ScadaLink.ExternalSystemGateway/ExternalSystemClient.cs:24,169-177`, `src/ScadaLink.ExternalSystemGateway/DatabaseGateway.cs:22` |
**Description**
@@ -585,7 +604,21 @@ caller's responsibility and remove the unused `_logger` fields. Add a comment in
**Resolution**
_Unresolved._
Resolved 2026-05-16 (commit pending). Partial re-triage of the finding text: the
"`_logger` injected but unused" claim is **stale** — both loggers are already used (the
`DeliverBufferedAsync` retry-sweep handlers added by `ExternalSystemGateway-001` log
park/error events). The genuine remaining gap — `InvokeHttpAsync` performing no logging
on the HTTP-failure paths — is now fixed: `InvokeHttpAsync` emits a
`_logger.LogWarning` on the permanent-failure path (status code, system, method,
truncated error body) so a permanent failure is visible in Site Event Logging as the
design requires, and a `_logger.LogDebug` on the transient-failure path (transient
failures are normal retry/S&F operation and must not be noisy at warning level). A
documentation comment was added to `ErrorClassifier.IsTransient(HttpStatusCode)`
explaining the "any other non-success status defaults to permanent" behaviour and why
permanent is the safe default. Regression tests: `Call_PermanentFailure_LogsAWarning`
(asserts a warning carrying the system name is emitted; verified to fail before the
`LogWarning` was added) and `Call_TransientFailure_DoesNotLogAtWarningOrAbove` (guards
against over-logging transient failures).
### ExternalSystemGateway-013 — `MaxConcurrentConnectionsPerSystem` and `DefaultHttpTimeout` options are defined but never used
@@ -593,7 +626,7 @@ _Unresolved._
|--|--|
| Severity | Low |
| Category | Code organization & conventions |
| Status | Open |
| Status | Resolved |
| Location | `src/ScadaLink.ExternalSystemGateway/ExternalSystemGatewayOptions.cs:9,12`, `src/ScadaLink.ExternalSystemGateway/ServiceCollectionExtensions.cs:13` |
**Description**
@@ -614,7 +647,20 @@ options to avoid implying behaviour that does not exist.
**Resolution**
_Unresolved._
Resolved 2026-05-16 (commit pending). Partial re-triage of the finding text:
`DefaultHttpTimeout` is **no longer unused** — it became the effective per-call HTTP
round-trip limit when `ExternalSystemGateway-002` was fixed (`InvokeHttpAsync` builds a
linked `CancellationTokenSource(DefaultHttpTimeout)`). The genuinely-unused option,
`MaxConcurrentConnectionsPerSystem`, is now wired in: `AddExternalSystemGateway` adds a
`ConfigureHttpClientDefaults` registration that supplies a `SocketsHttpHandler` whose
`MaxConnectionsPerServer` is bound from the option, so it applies to every per-system
named client (`ExternalSystem_{name}`) the gateway creates rather than being silently
ignored. Regression test
`ServiceWiringTests.MaxConcurrentConnectionsPerSystem_IsAppliedToTheNamedHttpClientPrimaryHandler`
builds the DI container, resolves the named client's handler chain via
`IHttpMessageHandlerFactory`, walks to the primary handler and asserts
`SocketsHttpHandler.MaxConnectionsPerServer` equals the configured value; it was verified
to fail before the wiring was added.
### ExternalSystemGateway-014 — Cached-call buffering path and `DatabaseGateway` are untested
@@ -622,8 +668,8 @@ _Unresolved._
|--|--|
| Severity | Low |
| Category | Testing coverage |
| Status | Open |
| Location | `tests/ScadaLink.ExternalSystemGateway.Tests/ExternalSystemClientTests.cs:1`, (no `DatabaseGatewayTests.cs`) |
| Status | Resolved |
| Location | `tests/ScadaLink.ExternalSystemGateway.Tests/ExternalSystemClientTests.cs:1`, `tests/ScadaLink.ExternalSystemGateway.Tests/DatabaseGatewayTests.cs` |
**Description**
@@ -647,4 +693,29 @@ by asserting on the captured `HttpRequestMessage` in the mock handler.
**Resolution**
_Unresolved._
Resolved 2026-05-16 (commit pending). Partial re-triage of the finding text: a number
of the listed gaps were **already closed** by tests added when findings 001010 were
fixed — `DatabaseGatewayTests.cs` now exists (not-found, null-S&F guard, buffered-write
delivery), and the `CachedCall` transient/buffering and permanent paths are covered by
the `ExternalSystemGateway-001/003/004/008` regression tests. The remaining genuine
coverage gaps are now closed with new tests:
- `Call_GetWithParameters_AppendsEscapedQueryString``BuildUrl` GET query-string
construction, asserting on the captured request URI and that an `&` inside a value is
percent-escaped.
- `Call_PostWithParameters_SendsJsonBody` — POST JSON-body construction.
- `Call_ApiKeyAuthWithDefaultHeader_SendsXApiKeyHeader`,
`Call_ApiKeyAuthWithCustomHeader_SendsNamedHeader`,
`Call_BasicAuth_SendsBase64AuthorizationHeader``ApplyAuth` for all three auth
variants, asserting on the captured request headers.
- `Call_ConnectionError_IsClassifiedAsTransient` — a connection-level
`HttpRequestException` is classified transient.
- `CachedWrite_BuffersTheWriteWithConnectionRetrySettings` and
`CachedWrite_ZeroMaxRetriesIsHonouredNotTreatedAsUnset` — the `DatabaseGateway`
`CachedWrite` happy-path enqueue against a real S&F service.
The shared `RequestCapturingHandler` test helper was extended to capture request
headers and body so URL/auth/body construction is now verified, not just status codes.
These are new-coverage tests against already-correct behaviour, so they pass on the
current source; the `BuildUrl` and `ApplyAuth` paths they exercise are now protected
against regression.