code-review: 2026-05-28 baseline re-review of all 23 modules at 1eb6e97

Re-applies the full 10-category checklist to every src/ project — including
first-time reviews of the four newer components (AuditLog, NotificationOutbox,
SiteCallAudit, Transport) — so the code-reviews/ index reflects today's
codebase rather than the 2026-05-16 baseline. 172 new Open findings (0
Critical, 18 High, 62 Medium, 92 Low); 481 findings total across 23 modules.

regen-readme.py now derives each module's Last reviewed + Commit from its
findings.md header instead of hard-coding 2026-05-16 / 9c60592, so future
single-module re-reviews show their own date in the Module Status table.
Joseph Doherty
2026-05-28 02:55:47 -04:00
parent 1eb6e972b0
commit f93b7b99bb
25 changed files with 8793 additions and 115 deletions
+343 -3
@@ -5,10 +5,10 @@
| Module | `src/ScadaLink.ExternalSystemGateway` |
| Design doc | `docs/requirements/Component-ExternalSystemGateway.md` |
| Status | Reviewed |
| Last reviewed | 2026-05-17 |
| Last reviewed | 2026-05-28 |
| Reviewer | claude-agent |
| Commit reviewed | `39d737e` |
| Open findings | 0 |
| Commit reviewed | `1eb6e97` |
| Open findings | 6 |
## Summary
@@ -51,6 +51,36 @@ both substantive findings are second-order defects in earlier fixes — the earl
resolutions did not verify the downstream contract of the S&F engine they integrate
with.
#### Re-review 2026-05-28 (commit `1eb6e97`)
All seventeen prior findings (001-017) remain `Resolved`; spot-checks against the
current source confirm the fixes still hold. Between `39d737e` and this re-review the
only source changes to the module are the documentation-only commit `1eb6e97` (XML
doc additions) and the `executionId` / `sourceScript` / `parentExecutionId` plumbing
threaded through `CachedCallAsync` / `CachedWriteAsync` to the S&F enqueue (Audit Log
#23 Tasks 4/6). The re-review walked the full 10-category checklist again and
surfaced **six new findings**, none Critical. The most serious
(`ExternalSystemGateway-018`, High) is that `DeliverBufferedAsync` on both
`ExternalSystemClient` and `DatabaseGateway` lets a `JsonException` from
`JsonSerializer.Deserialize` propagate out of the delivery handler — the S&F engine
treats any thrown exception as a transient retry, so a corrupted or
schema-incompatible buffered row becomes a permanent poison message that is retried
on every sweep forever (the same retry-forever class of hazard `-015` already
addressed for a different cause). `ExternalSystemGateway-019` (Medium) is that
`HttpClient.Timeout` is never set, so any operator-configured `DefaultHttpTimeout`
greater than 100s is silently clipped by `HttpClient`'s built-in 100s default and the
gateway's "timeout applies to the HTTP request round-trip" guarantee no longer
holds — a partial reopen of the `-002` contract for the long-timeout case.
`ExternalSystemGateway-020` (Medium) is a silent precision-loss bug in the cached-DB-write
retry path: `JsonElementToParameterValue` collapses any JSON number that is not
Int64-convertible to `double`, so a script's `decimal` SQL parameter is downcast on
retry and only on retry. The remaining three (`-021`/`-022`/`-023`, Low) are an
unauthenticated-by-default `ApplyAuth` for unknown `AuthType` / malformed Basic config,
runtime-only HTTP-verb validation, and an undocumented PATCH HTTP method (code vs
design-doc drift). Theme: every new finding is in a code path that was added or
touched by the earlier fix bundle but whose error-propagation contract was not
verified end-to-end against the S&F engine or the design doc.
## Checklist coverage
| # | Category | Examined | Notes |
@@ -66,6 +96,21 @@ with.
| 9 | Testing coverage | ☑ | Coverage is broad after finding 014. Re-review note: the `ZeroMaxRetries...` tests assert the persisted column, not the sweep outcome, and so lock in the finding-015 defect. |
| 10 | Documentation & comments | ☑ | Inline comments at `ExternalSystemClient.cs:118-119` / `DatabaseGateway.cs:99-101` assert a "never retry" semantic that the code does not deliver — see finding 015. |
_Re-review (2026-05-28, `1eb6e97`):_
| # | Category | Examined | Notes |
|---|----------|----------|-------|
| 1 | Correctness & logic bugs | ☑ | `JsonException` not caught in either `DeliverBufferedAsync`, so a corrupt buffered payload becomes a permanent poison-message retried forever — finding 018. `JsonElementToParameterValue` collapses a non-Int64 number to `double`, silently losing precision for `decimal` SQL parameters on cached-write retry — finding 020. `new HttpMethod(method.HttpMethod)` accepts any string at runtime, so an invalid HTTP verb is only diagnosed at call time — finding 022. |
| 2 | Akka.NET conventions | ☑ | Still no actors in this module; `AddExternalSystemGatewayActors` remains a no-op. The cached-call lifecycle/audit emission lives in `ScriptRuntimeContext` / `CachedCallTelemetryForwarder` (SiteRuntime / AuditLog), not here, and that boundary is correct. No issues found. |
| 3 | Concurrency & thread safety | ☑ | Services are still stateless and DI-scoped; the S&F delivery handlers resolve in a fresh DI scope on the sweep thread. The added `executionId` / `sourceScript` / `parentExecutionId` plumbing flows through method arguments only — no shared state introduced. No findings. |
| 4 | Error handling & resilience | ☑ | The poison-payload retry-forever path is the headline resilience issue (finding 018). `HttpClient.Timeout` not being set leaves the gateway's per-call round-trip cap clipped to the framework's 100s default whenever the configured `DefaultHttpTimeout` is larger — finding 019 (partial reopen of the `-002` contract). |
| 5 | Security | ☑ | Auth secrets still never logged; error bodies still truncated. `ApplyAuth` is silent on unknown `AuthType` / empty `AuthConfiguration` / malformed Basic config — finding 021 (fail-open is a real but bounded risk; recorded Low because misconfiguration is the precondition). Connection-string handling in `DatabaseGateway` reads from the entity verbatim and never logs it. |
| 6 | Performance & resource management | ☑ | Disposal paths from findings 005/010 still hold. The `IHttpClientFactory` name-keyed-options registration (finding 016 fix) creates a fresh `SocketsHttpHandler` per primary-handler build — acceptable because `IHttpClientFactory` recycles handlers. No new findings. |
| 7 | Design-document adherence | ☑ | The design doc enumerates GET/POST/PUT/DELETE but the code also serializes a body for PATCH (and accepts arbitrary HTTP verbs at runtime) — finding 023 (drift to be reconciled). The per-call timeout guarantee is partially defeated by the unset `HttpClient.Timeout` for option values > 100s — finding 019. |
| 8 | Code organization & conventions | ☑ | The `-016` fix replaced `ConfigureHttpClientDefaults` with a scoped `IConfigureNamedOptions<HttpClientFactoryOptions>` — verified clean, no new conventions issue. `internal virtual CreateConnection` (DatabaseGateway) and `internal InvokeHttpAsync` (ExternalSystemClient) are exposed via `InternalsVisibleTo` for tests — acceptable. No new findings. |
| 9 | Testing coverage | ☑ | The `JsonException` deserialization path for `DeliverBufferedAsync` is untested; the `JsonElementToParameterValue` `double`-downcast path is untested; `ApplyAuth`'s unknown-AuthType / empty-config / malformed-Basic branches are untested. Recorded under findings 018 / 020 / 021 rather than a standalone coverage finding. |
| 10 | Documentation & comments | ☑ | XML doc additions in `1eb6e97` are accurate and consistent. PATCH support is undocumented in the design doc (finding 023). The inline `ExternalSystemGateway-015` block-comment in `CachedCallAsync` (lines 126-133) and the equivalent in `DatabaseGateway.cs:106-113` now correctly describe the "treat 0 as unset" semantics. |
## Findings
### ExternalSystemGateway-001 — No S&F delivery handler registered; cached calls and writes can never be delivered
@@ -951,3 +996,298 @@ method whose effective parameter set is empty produces a URL identical to the
no-parameters case. Regression test
`Call_GetWithAllNullParameters_DoesNotAppendTrailingQuestionMark` asserts the
captured request URI has no trailing `?`; it was verified to fail before the fix.
### ExternalSystemGateway-018 — `DeliverBufferedAsync` lets `JsonException` propagate, turning a corrupt buffered row into a permanent retry-forever poison message
| | |
|--|--|
| Severity | High |
| Category | Error handling & resilience |
| Status | Open |
| Location | `src/ScadaLink.ExternalSystemGateway/ExternalSystemClient.cs:176`, `src/ScadaLink.ExternalSystemGateway/DatabaseGateway.cs:151` |
**Description**
Both `ExternalSystemClient.DeliverBufferedAsync` and `DatabaseGateway.DeliverBufferedAsync`
begin with an unguarded `JsonSerializer.Deserialize<...>(message.PayloadJson)`:
```csharp
var payload = JsonSerializer.Deserialize<CachedCallPayload>(message.PayloadJson);
if (payload == null || string.IsNullOrEmpty(payload.SystemName) || ...) {
_logger.LogError("... unreadable payload; parking.");
return false;
}
```
The "unreadable payload; parking" branch is only entered when `Deserialize` *succeeds*
and produces a null / partially-empty object. If `PayloadJson` is **malformed JSON**
(the column was truncated mid-write, an older payload schema is being deserialized into a
newer record, or storage corruption occurred), `Deserialize` throws `JsonException`
before that check is ever reached. The exception propagates out of the delivery handler.
The Store-and-Forward retry loop treats *any* thrown exception from a delivery handler
as a transient failure (only a returned `false` parks the message); see
`StoreAndForwardService.RetryMessageAsync`. Combined with the `MaxRetries == 0`
"unset → bounded default" fix from `-015`, the resulting behaviour is:
1. Corrupt payload arrives in the buffer.
2. Every retry sweep deserializes, throws `JsonException`, increments `RetryCount`.
3. The message is retried until `RetryCount >= MaxRetries`, then parked, but *only* if
a retry bound is in effect (`-015` established that `MaxRetries > 0` is not the default
site configuration today). With the bounded S&F default it does eventually park, but
only after failing noisily for `DefaultMaxRetries` sweeps; without that bound it
retries forever.
4. The script is unaware — the cached call was returned `WasBuffered: true` long ago.
This is the same "poison message buffered forever" class of hazard that
`ExternalSystemGateway-001` (no-handler) and `ExternalSystemGateway-015` (MaxRetries==0)
already removed for their own causes; corrupt JSON is an alternative arrival path into
the same bad state.
The `DatabaseGateway.DeliverBufferedAsync` path has the same shape and the same defect:
`JsonSerializer.Deserialize<CachedWritePayload>` at line 151 is not guarded.
**Recommendation**
Wrap the `Deserialize` call in a `try/catch (JsonException)` block in both
`DeliverBufferedAsync` methods. A `JsonException` is by definition a permanent failure —
re-running the same deserialization against the same payload will produce the same
exception — so the catch should log at `LogError` and **return `false`** so the S&F
engine parks the message rather than retrying. Add regression tests that feed a
malformed `PayloadJson` to each handler and assert `delivered == false` (i.e. the
message parks) and that no exception escapes the handler.
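The guard recommended above could look roughly like the following. This is a minimal sketch, not the module's code: `BufferedPayloadReader`, `TryRead`, and the two-property `CachedCallPayload` shape are hypothetical stand-ins, and logging is left to the caller.

```csharp
#nullable enable
using System;
using System.Text.Json;

// Hypothetical stand-in for the module's payload type; only the shape matters here.
public sealed class CachedCallPayload
{
    public string? SystemName { get; set; }
    public string? MethodName { get; set; }
}

public static class BufferedPayloadReader
{
    // Returns false for any payload that can never become deliverable, so the
    // delivery handler can log and return false (park) instead of throwing.
    public static bool TryRead(string payloadJson, out CachedCallPayload? payload)
    {
        payload = null;
        try
        {
            payload = JsonSerializer.Deserialize<CachedCallPayload>(payloadJson);
        }
        catch (JsonException)
        {
            // Malformed or schema-incompatible JSON fails identically on every
            // sweep, so it is treated as a permanent failure.
            return false;
        }

        // Same "unreadable payload" check the handlers already perform.
        return payload is not null && !string.IsNullOrEmpty(payload.SystemName);
    }
}
```

A handler built this way would call `TryRead` first and `return false` when it fails, which keeps the parse failure on the "park" side of the S&F contract described above.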
### ExternalSystemGateway-019 — `HttpClient.Timeout` is not set; `DefaultHttpTimeout` > 100s is silently clipped by the framework default
| | |
|--|--|
| Severity | Medium |
| Category | Design-document adherence |
| Status | Open |
| Location | `src/ScadaLink.ExternalSystemGateway/ExternalSystemClient.cs:226,257-264`, `src/ScadaLink.ExternalSystemGateway/ServiceCollectionExtensions.cs:90-102` |
**Description**
The `-002` fix enforces the per-call timeout via a linked `CancellationTokenSource`
built from `_options.DefaultHttpTimeout` and passed into `SendAsync`. That correctly
caps every call to *at most* the configured value when `DefaultHttpTimeout` ≤ 100s.
However, `HttpClient.Timeout` (the framework default) is never set on either the named
client or its primary handler — the `GatewayHttpClientConfigurator` only sets
`MaxConnectionsPerServer`. `HttpClient.Timeout` defaults to **100 seconds**, and
`SendAsync` enforces it internally by cancelling its own private CTS, raising a
`TaskCanceledException` from `SendAsync` *without* cancelling either the caller's token
or the gateway's `timeoutCts`.
Consequences when an operator configures `DefaultHttpTimeout` to anything > 100s
(a legitimate setting for external systems with long-running endpoints — recipe
exports, large queries):
1. The gateway's `timeoutCts` (e.g. 5 minutes) has not yet fired.
2. `HttpClient.Timeout` fires at 100s, `SendAsync` throws.
3. Neither `when (cancellationToken.IsCancellationRequested)` nor
`when (timeoutCts.IsCancellationRequested)` matches, so the exception falls into
the generic `catch (Exception ex) when (ErrorClassifier.IsTransient(ex))` branch
(line 277) and is re-thrown as a `TransientExternalSystemException` with the
message `"Connection error to {Name}: A task was canceled."` — misattributing a
timeout as a connection error.
4. The configured 5-minute round-trip window the design doc promises ("Each external
system definition specifies a timeout that applies to all method calls on that
system" / "applies to the HTTP request round-trip") is silently overridden.
The opposite case (`DefaultHttpTimeout` < 100s) is the only one the `-002` regression
test exercises (200ms), so the defect is not caught by the existing suite.
**Recommendation**
Set `HttpClient.Timeout = Timeout.InfiniteTimeSpan` on the gateway's named clients via
the existing `GatewayHttpClientConfigurator` (delegate `HttpClientActions` rather than
just `HttpMessageHandlerBuilderActions`), so the cancellation-token mechanism is the
sole timeout source. The linked `timeoutCts` then reliably enforces
`DefaultHttpTimeout` for every value, and the timeout-vs-cancellation classification at
lines 266-276 stays accurate. Add a regression test that configures `DefaultHttpTimeout`
to ~150s, hangs the handler, and asserts the call times out at the configured value
and produces a `"Timeout calling..."` (not `"Connection error to..."`) error.
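The interaction between the two timeout sources can be sketched in isolation. The names below (`GatewayTimeoutSketch`, `SendWithTimeoutAsync`, `HangingHandler`) are hypothetical, and the client is constructed directly rather than through `IHttpClientFactory`; in the module the same `Timeout` assignment would go through the named-client configuration the recommendation describes.

```csharp
using System;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

public static class GatewayTimeoutSketch
{
    // Disables HttpClient's built-in 100 s cap so the linked token is the only timeout.
    public static HttpClient CreateClient(HttpMessageHandler handler) =>
        new HttpClient(handler) { Timeout = Timeout.InfiniteTimeSpan };

    // Returns "ok", "timeout", or "cancelled" so the classification is observable.
    public static async Task<string> SendWithTimeoutAsync(
        HttpClient client, HttpRequestMessage request,
        TimeSpan timeout, CancellationToken callerToken)
    {
        using var timeoutCts = CancellationTokenSource.CreateLinkedTokenSource(callerToken);
        timeoutCts.CancelAfter(timeout);
        try
        {
            using var response = await client.SendAsync(request, timeoutCts.Token);
            return "ok";
        }
        catch (OperationCanceledException) when (callerToken.IsCancellationRequested)
        {
            return "cancelled"; // the caller gave up
        }
        catch (OperationCanceledException) when (timeoutCts.IsCancellationRequested)
        {
            return "timeout";   // the configured round-trip cap fired
        }
    }
}

// Test double: never responds until its token is cancelled.
public sealed class HangingHandler : HttpMessageHandler
{
    protected override async Task<HttpResponseMessage> SendAsync(
        HttpRequestMessage request, CancellationToken cancellationToken)
    {
        await Task.Delay(Timeout.Infinite, cancellationToken);
        return new HttpResponseMessage();
    }
}
```

With `Timeout` left at its default, a configured cap above 100s would never reach the `"timeout"` branch, because the framework's own cancellation fires first and matches neither filter.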
### ExternalSystemGateway-020 — `JsonElementToParameterValue` silently downcasts non-Int64 JSON numbers to `double`, losing precision for `decimal` SQL parameters on retry
| | |
|--|--|
| Severity | Medium |
| Category | Correctness & logic bugs |
| Status | Open |
| Location | `src/ScadaLink.ExternalSystemGateway/DatabaseGateway.cs:185-193` |
**Description**
`DatabaseGateway.JsonElementToParameterValue` materialises the buffered cached-write
SQL parameter values during a retry-sweep delivery:
```csharp
private static object JsonElementToParameterValue(JsonElement element) => element.ValueKind switch
{
JsonValueKind.String => (object?)element.GetString() ?? DBNull.Value,
JsonValueKind.Number => element.TryGetInt64(out var l) ? l : element.GetDouble(),
...
};
```
For a JSON number, the helper attempts `Int64` first and otherwise returns a `double`.
There is no `decimal` branch. The immediate-attempt path is unaffected — `CachedWriteAsync`
on the original call serializes the script-provided typed parameters via
`JsonSerializer.Serialize(new { ConnectionName, Sql, Parameters = parameters })` and
executes the SQL directly outside this code path. But the **retry path** runs through
`DeliverBufferedAsync` → `JsonElementToParameterValue`, so a script that submitted
a `decimal` value (e.g. `123.4567890123m`) gets:
1. Immediate attempt: the `decimal` parameter is bound at full precision; the value
never enters this helper, because the immediate path executes the typed parameters
directly.
2. Retry attempt(s) after a transient failure: the value is deserialized as a JSON
number, fails `TryGetInt64`, and is downcast to `double`, which has ~15-17 significant
digits of precision against `decimal`'s 28-29. A SQL column of type `decimal(18, 6)` or
`numeric` receives a value that has been reduced to `double` precision before
parameter binding.
Two further consequences worth recording:
- The downcast is **silent** — there is no log, no error, and the cached-write
acknowledgement to the script has long since happened. Data drift between a
same-call immediate-success delivery and a same-call retry delivery is the worst
shape of "looks like the right value but isn't" defect.
- For SCADA telemetry (process variables, totals, currency-denominated quality
reports) `decimal` is the correct CLR type and `double`'s representation error
changes the persisted value.
**Recommendation**
Replace the `Number` branch with a precision-preserving cascade — try `Int64`, then
`decimal` (`element.TryGetDecimal(out var d) ? d : element.GetDouble()`), and only
fall back to `double` when even `decimal` fails. Add a regression test against
`DatabaseGateway.DeliverBufferedAsync` that buffers a write with a high-precision
`decimal` parameter, drives the delivery, and asserts the SQL parameter bound is a
`decimal` (or compares the round-tripped value to the original at the parameter level)
rather than a `double` with truncated precision. The same Number-branch decision should
be reviewed against `JsonValueKind.True`/`False`/`Null` (currently fine) and a string
that happens to encode a number (already correctly returns `string`).
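One possible shape for the precision-preserving cascade is sketched below. `JsonNumberSketch` and `ToParameterValue` are hypothetical names, and the non-numeric arms are filled in only so the example is self-contained; they are assumptions, not the module's actual arms.

```csharp
#nullable enable
using System;
using System.Text.Json;

public static class JsonNumberSketch
{
    public static object ToParameterValue(JsonElement element)
    {
        switch (element.ValueKind)
        {
            case JsonValueKind.Number:
                // Narrowest exact representation first; double is the last resort.
                if (element.TryGetInt64(out long l)) return l;
                if (element.TryGetDecimal(out decimal d)) return d;
                return element.GetDouble();
            case JsonValueKind.String:
                return (object?)element.GetString() ?? DBNull.Value;
            case JsonValueKind.True:
                return true;
            case JsonValueKind.False:
                return false;
            case JsonValueKind.Null:
                return DBNull.Value;
            default:
                // Assumption: objects and arrays are passed through as raw JSON text.
                return element.GetRawText();
        }
    }
}
```

The cascade is written as statements rather than a nested conditional on purpose: in C#, a conditional expression with a `long` arm and a `double` arm takes the type `double`, so separate `return` statements are the simplest way to keep each value boxed as its own type.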
### ExternalSystemGateway-021 — `ApplyAuth` silently sends an unauthenticated request on unknown `AuthType`, empty `AuthConfiguration`, or malformed Basic config
| | |
|--|--|
| Severity | Low |
| Category | Security |
| Status | Open |
| Location | `src/ScadaLink.ExternalSystemGateway/ExternalSystemClient.cs:385-415` |
**Description**
`ApplyAuth` has three fail-open paths that all result in an HTTP request being sent
**without** the credential the operator configured:
1. Line 387 — `if (string.IsNullOrEmpty(system.AuthConfiguration)) return;` returns
early regardless of `AuthType`. A system entity with `AuthType = "apikey"` but an
empty `AuthConfiguration` (e.g. the secret column failed to deploy, or the
protector key changed and decryption produced `""`) sends every request with no
`X-API-Key` header — the gateway is silent.
2. The `switch` has no `default` arm. A system entity with `AuthType = "bearer"`,
`"oauth2"`, a typo like `"ApiKey "` (trailing space) or even `"none"` falls off the
`switch` and the request is sent without any auth header — again silent.
3. Line 408 — `if (basicParts.Length == 2)` skips the auth attach when
`AuthConfiguration` for `basic` lacks a `:` separator. The request is sent with no
`Authorization` header.
Effectively the gateway treats every misconfiguration as "send anonymously" and
relies on the remote system rejecting it with a 401/403. That is a defensible default
on its own, but combined with `-007`'s 2 KB error-body cap and the fact that no audit
or warning is emitted, an operator debugging "why does my external system always
return 401" has nothing to go on inside ScadaLink — the gateway never says it failed
to apply auth. For `AuthType = "none"` (the design's expected sentinel for
unauthenticated systems) the fall-through is correct; the failure mode is misconfig.
**Recommendation**
Add a `default:` arm to the `switch` that logs `_logger.LogWarning(...)` naming the
unknown `AuthType` and the system, and emit a similar warning when
`AuthConfiguration` is empty for an `AuthType` of `"apikey"` or `"basic"` (those
require a value; `"none"` does not). For Basic auth specifically, the
`basicParts.Length != 2` branch should also warn. Do **not** include the
`AuthConfiguration` value in the log message — secrets must stay out of the log
(consistent with the existing module). A small set of `ApplyAuth` unit tests
verifying that the warning is emitted, and that no `Authorization` / `X-API-Key`
header value ever appears in the warning text, would close the test gap as well.
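A fail-loud variant of the three paths above might be sketched as follows. Everything here is illustrative: `ApplyAuthSketch`, the `warn` callback (standing in for `_logger.LogWarning`), the boolean return value, and the case-insensitive `AuthType` match are all assumptions, not the module's actual signature or behaviour.

```csharp
#nullable enable
using System;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Text;

public static class ApplyAuthSketch
{
    // Returns true when the request is in the intended auth state; otherwise
    // emits a warning that names the system and AuthType but never the secret.
    public static bool ApplyAuth(HttpRequestMessage request, string systemName,
        string? authType, string? authConfiguration, Action<string> warn)
    {
        switch (authType?.ToLowerInvariant())
        {
            case "none":
                return true; // explicitly unauthenticated system
            case "apikey":
                if (string.IsNullOrEmpty(authConfiguration))
                {
                    warn($"System '{systemName}': AuthType 'apikey' has no AuthConfiguration; no key sent.");
                    return false;
                }
                request.Headers.Add("X-API-Key", authConfiguration);
                return true;
            case "basic":
                var parts = authConfiguration?.Split(':', 2);
                if (parts is not { Length: 2 })
                {
                    warn($"System '{systemName}': AuthType 'basic' config is missing or malformed; no credentials sent.");
                    return false;
                }
                request.Headers.Authorization = new AuthenticationHeaderValue(
                    "Basic", Convert.ToBase64String(Encoding.UTF8.GetBytes(authConfiguration!)));
                return true;
            default:
                warn($"System '{systemName}': unknown AuthType '{authType}'; request is unauthenticated.");
                return false;
        }
    }
}
```

Whether a failed `ApplyAuth` should merely warn or abort the call is a separate decision; the sketch only makes the misconfiguration visible.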
### ExternalSystemGateway-022 — `new HttpMethod(method.HttpMethod)` accepts any string at runtime; an invalid HTTP verb fails only at call time
| | |
|--|--|
| Severity | Low |
| Category | Correctness & logic bugs |
| Status | Open |
| Location | `src/ScadaLink.ExternalSystemGateway/ExternalSystemClient.cs:233` |
**Description**
`InvokeHttpAsync` constructs the request method directly from the string column:
`new HttpRequestMessage(new HttpMethod(method.HttpMethod), url)`. `System.Net.Http.HttpMethod(string)`
performs only a token-character validation (it rejects whitespace and control chars
but accepts arbitrary non-standard tokens like `"FOO"` or `"GIT"`). The body-vs-query
selection at lines 239-250 explicitly checks for POST/PUT/PATCH; for any other
non-standard verb (`"FOO"`) the parameters silently go to neither body nor query
string and the request is dispatched anyway.
The design doc enumerates GET/POST/PUT/DELETE as the supported set. There is no
validation at deployment time, at definition save time, or at gateway
resolution time that `method.HttpMethod` is one of the expected verbs. An operator
who typos `"DLETE"` discovers the issue only when a script invokes that method and
the remote server rejects the request — usually as a 4xx that the gateway classifies
as permanent, which is correct but obscures the root cause.
**Recommendation**
Validate `method.HttpMethod` at gateway entry — either with a small `switch` of
allowed verbs in `InvokeHttpAsync` that throws `PermanentExternalSystemException` for
an unsupported verb (cheap, immediate, surfaces a clear error to the script), or by
adding a validation pass in the Template/Deployment Manager so it can never reach
the gateway. The first option is local to this module and cheaper to land. Either
way, the canonical list should agree with `BuildUrl`'s query-vs-body decision (which
currently knows about POST/PUT/PATCH for body and GET/DELETE for query — note PATCH
is in the body branch but not the design-doc list; see finding 023).
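The first option (a local allow-list check) could be as small as the sketch below. `HttpVerbSketch` and `Parse` are hypothetical names, `ArgumentException` stands in for the module's `PermanentExternalSystemException`, and the allow-list mirrors the design doc's four verbs; whether `PATCH` belongs in it depends on how finding 023 is resolved.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Net.Http;

public static class HttpVerbSketch
{
    // Assumption: the canonical set is the design doc's list.
    private static readonly HashSet<string> AllowedVerbs =
        new(StringComparer.OrdinalIgnoreCase) { "GET", "POST", "PUT", "DELETE" };

    // Rejects unknown verbs up front instead of letting the remote server do it.
    public static HttpMethod Parse(string? verb)
    {
        if (verb is null || !AllowedVerbs.Contains(verb))
        {
            throw new ArgumentException(
                $"Unsupported HTTP method '{verb}'. Expected one of: {string.Join(", ", AllowedVerbs)}.",
                nameof(verb));
        }

        // Normalise casing so downstream body-vs-query checks see one spelling.
        return new HttpMethod(verb.ToUpperInvariant());
    }
}
```

A typo such as `"DLETE"` then fails at the gateway with a message naming the bad verb, rather than surfacing as an opaque 4xx from the remote system.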
### ExternalSystemGateway-023 — PATCH HTTP method is supported by code but absent from the design doc; body-vs-query decision drifts from the documented set
| | |
|--|--|
| Severity | Low |
| Category | Design-document adherence |
| Status | Open |
| Location | `src/ScadaLink.ExternalSystemGateway/ExternalSystemClient.cs:241`, `docs/requirements/Component-ExternalSystemGateway.md:43` |
**Description**
The component design doc lists the supported HTTP methods as `GET, POST, PUT, or
DELETE` (line 43: `**HTTP method**: GET, POST, PUT, or DELETE.`). `InvokeHttpAsync`'s
body-serialization branch at lines 239-250 explicitly includes `PATCH` alongside POST
and PUT — so PATCH is in fact supported (and routes parameters into the JSON body),
but operators reading the spec would not know it. Conversely, `BuildUrl`'s
query-string branch at lines 364-366 lists only `GET` and `DELETE`, so a PATCH
method's parameters always go to the body, matching the body-branch but not appearing
anywhere in the documented contract.
This is mild drift — the code is more permissive than the spec. It only becomes a
real issue if a future change relies on the documented "only GET/POST/PUT/DELETE"
set and breaks the PATCH path silently, or if PATCH is genuinely out of scope and a
template author defines a PATCH method on purpose only to learn later it is
unsupported.
**Recommendation**
Pick one direction and apply it in the same session, per the project's "design doc +
code travel together" rule:
- If PATCH is intentionally supported, add `PATCH` to the Component-ExternalSystemGateway.md
HTTP-method list (line 43) and add a parameterised test confirming a PATCH method
sends its parameters in the JSON body and resolves like POST/PUT for error
classification.
- If PATCH is not in scope, remove `method.HttpMethod.Equals("PATCH", ...)` from the
body branch in `InvokeHttpAsync` and let finding-022's verb validation reject it.
The design-doc list then remains the single source of truth.