code-review: 2026-05-28 baseline re-review of all 23 modules at 1eb6e97

Re-applies the full 10-category checklist to every src/ project — including
first-time reviews of the four newer components (AuditLog, NotificationOutbox,
SiteCallAudit, Transport) — so the code-reviews/ index reflects today's
codebase rather than the 2026-05-16 baseline. 172 new Open findings (0
Critical, 18 High, 62 Medium, 92 Low); 481 findings total across 23 modules.

regen-readme.py now derives each module's Last reviewed + Commit from its
findings.md header instead of hard-coding 2026-05-16 / 9c60592, so future
single-module re-reviews show their own date in the Module Status table.
Joseph Doherty
2026-05-28 02:55:47 -04:00
parent 1eb6e972b0
commit f93b7b99bb
25 changed files with 8793 additions and 115 deletions
+389 -3
@@ -5,10 +5,10 @@
| Module | `src/ScadaLink.InboundAPI` |
| Design doc | `docs/requirements/Component-InboundAPI.md` |
| Status | Reviewed |
| Last reviewed | 2026-05-17 |
| Last reviewed | 2026-05-28 |
| Reviewer | claude-agent |
| Commit reviewed | `39d737e` |
| Open findings | 0 |
| Commit reviewed | `1eb6e97` |
| Open findings | 8 |
## Summary
@@ -64,6 +64,66 @@ statement that the timeout covers routed calls (InboundAPI-016); and (4) `RouteH
| 9 | Testing coverage | ☑ | Re-review: `RouteHelper`/`RouteTarget` (WP-4 routing) entirely untested (InboundAPI-017); validators/executor/filter well covered. |
| 10 | Documentation & comments | ☑ | `ApiKeyValidationResult.NotFound` XML/name says "NotFound" but returns HTTP 400 — misleading (InboundAPI-013). |
#### Re-review 2026-05-28 (commit `1eb6e97`)
All 17 prior findings remain `Resolved`. The module has grown materially since the
last pass — a new `AuditWriteMiddleware` (Audit Log #23 M4 Bundle D) now lives under
`src/ScadaLink.InboundAPI/Middleware/`, the `ApiKeyValidator` was rewired to hash the
candidate with `IApiKeyHasher` (ConfigurationDatabase-012), and an `IInstanceRouter`
seam was introduced. This re-review re-walked all 10 checklist categories against
`1eb6e97` and surfaced **8 new findings** concentrated on the new audit middleware
and a stranded follow-up from InboundAPI-008:
1. The InboundAPI-008 resolution explicitly deferred registering an `IActiveNodeGate`
implementation in `ScadaLink.Host` as a "follow-up outside this module's scope" —
that follow-up is still unfulfilled (no production registration anywhere in
`src/ScadaLink.Host/`), so the design-mandated standby-node gating is silently
disabled in production today (`InboundAPI-022`, High).
2. `AuditWriteMiddleware` is wired in `Program.cs` against `/api/*` rather than the
specific `POST /api/{methodName}` route, so GETs against `/api/audit/query` and
`/api/audit/export` (audit query endpoints — themselves not script invocations)
now emit spurious `AuditChannel.ApiInbound`/`InboundRequest` rows back into the
audit log with `Target` set to the last path segment (`InboundAPI-025`, Medium).
3. The middleware fires its audit write as `_ = _auditWriter.WriteAsync(evt)` — the
wrapping try/catch only catches synchronous throws, so a faulted async writer
task is unobserved and the row silently disappears with no log line
(`InboundAPI-018`, Medium).
4. `ParentExecutionId` correlation flows only through `RouteToCallRequest`;
`RouteToGetAttributesRequest`/`RouteToSetAttributesRequest` have no
`ParentExecutionId` field, so attribute reads/writes from inbound scripts lose
the inbound→site execution-tree link the Audit Log decision in CLAUDE.md
describes (`InboundAPI-021`, Medium).
5. `EndpointExtensions.HandleInboundApiRequest` — the entire wiring composition
that ties validator/executor/route/audit together — has no test coverage; only
the components it composes are tested (`InboundAPI-023`, Low).
6. `EndpointExtensions.HandleInboundApiRequest` does
`ContentType?.Contains("json")` (case-sensitive) so a request with
`application/JSON` and no Content-Length silently skips JSON body parsing
(`InboundAPI-020`, Low).
7. `AuditWriteMiddleware.InvokeAsync` calls `EnableBuffering()` unconditionally
before the empty-body short-circuit, allocating a `FileBufferingReadStream` for
every request including bodyless ones (`InboundAPI-019`, Low).
Severity mix: 1 High, 3 Medium, 4 Low — no Critical. (The eighth finding —
`InboundAPI-024`, Low — is a defensive watch-list item flagging that
`_knownBadMethods` is unbounded; it is bounded *in practice* today by the
configuration database, but the invariant is undocumented.)
## Checklist coverage — 2026-05-28 (commit `1eb6e97`)
| # | Category | Examined | Notes |
|---|----------|----------|-------|
| 1 | Correctness & logic bugs | ☑ | `ContentType?.Contains("json")` is case-sensitive (InboundAPI-020). |
| 2 | Akka.NET conventions | ☑ | ASP.NET-hosted, no actors of its own; routes via `IInstanceRouter`/`CommunicationService`. No new issues. |
| 3 | Concurrency & thread safety | ☑ | `ConcurrentDictionary` handler cache (post-001/002 fix). New audit middleware is per-request scoped, no shared mutable state. No new issues. |
| 4 | Error handling & resilience | ☑ | Audit `WriteAsync` is fire-and-forget; async faults are unobserved (InboundAPI-018). |
| 5 | Security | ☑ | `IActiveNodeGate` not registered in Host — standby-node gating disabled in production (InboundAPI-022). |
| 6 | Performance & resource management | ☑ | `EnableBuffering()` unconditional on bodyless requests (InboundAPI-019); audit middleware wraps `Response.Body` and mints `ExecutionId` for non-script /api routes (InboundAPI-025). |
| 7 | Design-document adherence | ☑ | `ParentExecutionId` not stamped on attribute-read/write routed messages (InboundAPI-021). InboundAPI-008's deferred Host registration still unfulfilled (InboundAPI-022). |
| 8 | Code organization & conventions | ☑ | No new issues. |
| 9 | Testing coverage | ☑ | `EndpointExtensions.HandleInboundApiRequest` composition wiring has no test (InboundAPI-023); middleware/filter/validator/executor/route are individually covered. |
| 10 | Documentation & comments | ☑ | No new issues. |
## Findings
### InboundAPI-001 — Singleton script handler cache mutated without synchronization
@@ -844,3 +904,329 @@ now depends on `IInstanceLocator` + `IInstanceRouter` (both substitutable). Adde
for each routed method, `GetAttribute` delegating to the batch `GetAttributes` and
returning `null` for an absent key, `SetAttribute` delegating to `SetAttributes`, and
the InboundAPI-016 deadline-token inheritance behaviour. All 15 pass.
### InboundAPI-018 — `AuditWriteMiddleware` fires `WriteAsync` as `_ = task` — faulted async writes are unobserved
| | |
|--|--|
| Severity | Medium |
| Category | Error handling & resilience |
| Status | Open |
| Location | `src/ScadaLink.InboundAPI/Middleware/AuditWriteMiddleware.cs:257` |
**Description**
`EmitInboundAudit` calls `_ = _auditWriter.WriteAsync(evt);` — the returned `Task` is
discarded with the discard operator inside a synchronous `try` block. The wrapping
`try/catch (Exception ex)` (lines 198-266) only catches a *synchronous* throw before
the writer returns a task. Once `WriteAsync` returns a task, any exception that
faults that task (e.g. a DB timeout in the central audit writer, a serialization
failure, a cancellation that bubbles up) is never observed: it is not logged, it
does not increment the `CentralAuditWriteFailures` health-monitoring counter the
design doc references ("Fail-soft semantics" paragraph), and the audit row is
silently lost. In .NET, unobserved task exceptions are surfaced via
`TaskScheduler.UnobservedTaskException` only when the faulted task is finalized by
the garbage collector (a nondeterministic point with no request context), and may
also be logged by the runtime;
either way, the middleware itself has no control over what (if anything) happens
on a fault. The XML doc comment at line 255 claims "the writer itself swallows"
but this is an implicit cross-component contract: the abstraction
`ICentralAuditWriter.WriteAsync` returns `Task` and makes no such guarantee, and
the only test that exercises a throwing writer (`AuditWriter_Throws_*` in
`AuditWriteMiddlewareTests.cs`) uses an `OnWrite` callback that throws
*synchronously*, not asynchronously — so the async fault path is not covered by
tests either.
This matters because Component-InboundAPI.md states that audit-emission failures
must increment `CentralAuditWriteFailures` (Health Monitoring #11) — a counter
that, with the current fire-and-forget, will under-count async-faulted writes.
**Recommendation**
Either (a) await the write and rely on the surrounding try/catch to log the
failure, accepting an extra await on the request hot path; or (b) keep the
fire-and-forget for latency but attach a `ContinueWith(t => ..., OnlyOnFaulted)`
that logs the fault and increments the failure counter, so a faulted async write
is at least observed. Option (b) preserves "audit emission never blocks the HTTP
response" while restoring the visibility the design assumes. Add a regression
test using a writer whose `WriteAsync` returns a faulted `Task` (not a
synchronous throw) to pin the new contract.
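
A minimal sketch of option (b), under stated assumptions: `IAuditWriter`, `FaultingWriter`, `AuditEmitter`, and the in-process failure counter are hypothetical stand-ins (the real `ICentralAuditWriter`, event type, and `CentralAuditWriteFailures` plumbing are not shown in this review). It only illustrates how a fault continuation observes an asynchronously faulted write without blocking the caller.

```csharp
#nullable enable
using System;
using System.Threading;
using System.Threading.Tasks;

// Demo: the writer faults asynchronously, which a try/catch around a
// discarded task would never see.
var emitter = new AuditEmitter(new FaultingWriter());
await emitter.Emit("evt-1");                 // awaited only to make the demo deterministic
Console.WriteLine(emitter.WriteFailures);    // prints 1

// Hypothetical stand-in for the writer abstraction.
interface IAuditWriter { Task WriteAsync(string evt); }

sealed class FaultingWriter : IAuditWriter
{
    public async Task WriteAsync(string evt)
    {
        await Task.Yield();                  // force the fault onto the returned task
        throw new TimeoutException("simulated audit DB timeout");
    }
}

sealed class AuditEmitter
{
    private readonly IAuditWriter _writer;
    private int _writeFailures;              // stand-in for the health counter

    public AuditEmitter(IAuditWriter writer) => _writer = writer;
    public int WriteFailures => Volatile.Read(ref _writeFailures);

    // Returns the continuation so a test can await it; a request path would discard it.
    public Task Emit(string evt)
    {
        try
        {
            return _writer.WriteAsync(evt).ContinueWith(
                t => RecordFailure(t.Exception!.GetBaseException()),
                CancellationToken.None,
                TaskContinuationOptions.OnlyOnFaulted | TaskContinuationOptions.ExecuteSynchronously,
                TaskScheduler.Default);
        }
        catch (Exception ex)                 // synchronous throw before a task exists
        {
            RecordFailure(ex);
            return Task.CompletedTask;
        }
    }

    private void RecordFailure(Exception ex)
    {
        Interlocked.Increment(ref _writeFailures);
        Console.Error.WriteLine($"audit write failed: {ex.Message}");
    }
}
```

Note that when the write succeeds, an `OnlyOnFaulted` continuation task completes as canceled, so the request path should keep discarding the returned task rather than awaiting it.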
### InboundAPI-019 — `EnableBuffering()` called unconditionally on every request, including bodyless requests
| | |
|--|--|
| Severity | Low |
| Category | Performance & resource management |
| Status | Open |
| Location | `src/ScadaLink.InboundAPI/Middleware/AuditWriteMiddleware.cs:141` |
**Description**
`InvokeAsync` always calls `ctx.Request.EnableBuffering()` before the empty-body
short-circuit at `ReadBufferedRequestBodyAsync` line 289 (`if (request.ContentLength
is 0) return (null, false);`). `EnableBuffering()` swaps the request stream for a
`FileBufferingReadStream` whose construction allocates an internal buffer (default
threshold ~30 KB before spilling to a temp file) regardless of whether the request
actually has a body. The /api scope this middleware lives under will see at least
some bodyless requests (e.g. GET `/api/audit/query` once that route is in the same
branch — see InboundAPI-025; future health checks; misbehaving clients) and each
one pays the buffering allocation cost for no benefit.
**Recommendation**
Defer the `EnableBuffering()` call into `ReadBufferedRequestBodyAsync` after the
`ContentLength is 0` check, or short-circuit in `InvokeAsync` before enabling
buffering when `ContentLength is 0` and `Method is "GET" or "HEAD" or "DELETE"`.
The win is a per-request `FileBufferingReadStream` allocation avoided on every
bodyless request through the middleware.
### InboundAPI-020 — `ContentType.Contains("json")` is case-sensitive; `application/JSON` with no Content-Length skips body parsing
| | |
|--|--|
| Severity | Low |
| Category | Correctness & logic bugs |
| Status | Open |
| Location | `src/ScadaLink.InboundAPI/EndpointExtensions.cs:70` |
**Description**
`HandleInboundApiRequest` parses the JSON body only when
`httpContext.Request.ContentLength > 0 || httpContext.Request.ContentType?.Contains("json") == true`.
The `string.Contains(string)` overload used here is case-sensitive — a perfectly
valid HTTP header `Content-Type: application/JSON` (uppercase) would yield
`false` (`"application/JSON".Contains("json")` is `false`). With no
Content-Length (e.g. chunked transfer-encoding) and an uppercase content type,
the handler then leaves `body = null` and `ParameterValidator.Validate` runs
against a missing body — so a method that declares any required parameter is
rejected with "Missing required parameters" even though the caller did send a
well-formed JSON body. RFC 7230 §3.2 makes header field names case-insensitive,
and RFC 7231 §3.1.1.1 makes the media type and subtype tokens case-insensitive as
well, so `application/JSON` is a valid spelling; in practice clients do sometimes
uppercase media-type tokens, and the framework's own `MediaTypeHeaderValue` is
case-insensitive.
**Recommendation**
Use the case-insensitive overload —
`httpContext.Request.ContentType?.Contains("json", StringComparison.OrdinalIgnoreCase) == true`
— or rely on the framework's `IsJson` check via
`MediaTypeHeaderValue.TryParse`/`HttpRequest.HasJsonContentType()`. Add a
regression test posting with `application/JSON` and Transfer-Encoding: chunked.
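
The difference between the two `Contains` overloads can be seen in isolation. This is a small sketch, not the handler's code: `JsonContentType.LooksLikeJson` is a hypothetical helper name, and only the `StringComparison.OrdinalIgnoreCase` overload itself comes from the recommendation above.

```csharp
#nullable enable
using System;

Console.WriteLine("application/JSON".Contains("json"));                 // prints False
Console.WriteLine(JsonContentType.LooksLikeJson("application/JSON"));   // prints True
Console.WriteLine(JsonContentType.LooksLikeJson(null));                 // prints False

// Hypothetical helper wrapping the case-insensitive check.
static class JsonContentType
{
    public static bool LooksLikeJson(string? contentType) =>
        contentType?.Contains("json", StringComparison.OrdinalIgnoreCase) == true;
}
```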
### InboundAPI-021 — `ParentExecutionId` correlation flows only through `Call`; attribute reads/writes lose the inbound→site execution-tree link
| | |
|--|--|
| Severity | Medium |
| Category | Design-document adherence |
| Status | Open |
| Location | `src/ScadaLink.InboundAPI/RouteHelper.cs:141-143`, `:182-183`, `:225-226`; `src/ScadaLink.Commons/Messages/InboundApi/RouteToInstanceRequest.cs:15-21`, `:36-40`, `:55-59` |
**Description**
CLAUDE.md's Centralized Audit Log section describes `ParentExecutionId` as the
cross-execution spawn pointer that "every row of a spawned run carries" and
specifically calls out "the inbound API → routed-site-script case". The current
implementation honours this only on `RouteToCallRequest` — which carries
`ParentExecutionId` as its trailing additive field (line 21 of
`RouteToInstanceRequest.cs`) and is stamped by `RouteTarget.Call` with the
inbound request's execution id at line 143 of `RouteHelper.cs`.
`RouteToGetAttributesRequest` and `RouteToSetAttributesRequest`, however, have
**no `ParentExecutionId` field** and the matching `RouteTarget.GetAttributes` /
`SetAttributes` methods (`RouteHelper.cs:182-183`, `:225-226`) never reference
`_parentExecutionId`. So when an inbound API script reads or writes a site
attribute via `Route.To("inst").GetAttribute(...)` /
`Route.To("inst").SetAttribute(...)`, the site-side audit row for that
trust-boundary action (an outbound-by-the-script DB / OPC write at the site) is
emitted with `ParentExecutionId = null` and the execution-tree walk over
`IX_AuditLog_ParentExecution` cannot link it back to the spawning inbound
request. The two-row pair (inbound + spawned site work) reverts to the
"top-level / null" state the design says is the *fallback* for non-spawned runs.
The asymmetry between `Call` and `GetAttributes`/`SetAttributes` is also surprising
— a script author would reasonably expect the same correlation across all
`Route.To(...)` calls.
**Recommendation**
Add a trailing `Guid? ParentExecutionId = null` field to
`RouteToGetAttributesRequest` and `RouteToSetAttributesRequest` (additive
trailing member, matches the message-evolution rule in CLAUDE.md); stamp it
from `_parentExecutionId` in `RouteTarget.GetAttributes` and
`RouteTarget.SetAttributes`; have the site-side handlers thread the field onto
their emitted audit rows. Add a `RouteHelperTests` regression case asserting
that an attribute read/write carries the inherited `ParentExecutionId`.
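
A sketch of the additive trailing member, assuming a positional record. Every field other than `ParentExecutionId` (`CorrelationId`, `InstanceCode`, `AttributeNames`) is a hypothetical placeholder, since the real message shape is not reproduced in this review. It shows why a defaulted trailing parameter leaves existing construction sites compiling.

```csharp
#nullable enable
using System;

var inboundExecutionId = Guid.NewGuid();

// An existing call site that predates the new member still compiles unchanged.
var legacy = new RouteToGetAttributesRequest("corr-1", "inst", new[] { "Temp" });

// A call site updated to stamp the inbound request's execution id.
var stamped = new RouteToGetAttributesRequest("corr-2", "inst", new[] { "Temp" }, inboundExecutionId);

Console.WriteLine(legacy.ParentExecutionId is null);                  // prints True
Console.WriteLine(stamped.ParentExecutionId == inboundExecutionId);   // prints True

// Hypothetical shape; only the trailing ParentExecutionId comes from the recommendation.
record RouteToGetAttributesRequest(
    string CorrelationId,
    string InstanceCode,
    string[] AttributeNames,
    Guid? ParentExecutionId = null);
```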
### InboundAPI-022 — `IActiveNodeGate` has no production registration in Host — standby-node gating is silently disabled in production
| | |
|--|--|
| Severity | High |
| Category | Security |
| Status | Open |
| Location | `src/ScadaLink.InboundAPI/IActiveNodeGate.cs`, `src/ScadaLink.InboundAPI/InboundApiEndpointFilter.cs:52-60`; absent from `src/ScadaLink.Host/Program.cs` |
**Description**
InboundAPI-008's resolution adds `IActiveNodeGate` (lines 17-24 of
`IActiveNodeGate.cs`) so a standby central node can refuse to serve the inbound
API. `InboundApiEndpointFilter.InvokeAsync` consults the gate at line 52
(`var gate = httpContext.RequestServices.GetService<IActiveNodeGate>();`), and
when `gate is { IsActiveNode: false }` returns HTTP 503. The filter's behaviour
when **no implementation is registered** (line 51 comment) is to fall through and
serve the request — the resolution paragraph for InboundAPI-008 closes with:
> "Follow-up (outside this module's scope): `ScadaLink.Host` should register an
> `IActiveNodeGate` implementation backed by `ActiveNodeHealthCheck` /
> `Cluster.State.Leader` in the central-role branch of `Program.cs` so the gate is
> actually enforced in production; until then the endpoint defaults to "allow"."
A grep of the entire `src/ScadaLink.Host/` tree at `1eb6e97` finds **zero**
`IActiveNodeGate` registrations: `grep -rn "IActiveNodeGate\|AddSingleton.*ActiveNode"
src/ScadaLink.Host/` returns no matches. The follow-up was never carried out. So
in production today the standby central node still serves the inbound API exactly
as InboundAPI-008 described — executes method scripts, runs `Route.To()` calls,
races the active node, and may operate against stale singleton state. The new
infrastructure (interface + filter check) is present but unwired; from the user's
perspective the original High-severity issue is unresolved in deployed binaries.
The design says the inbound API is "Central cluster only (active node)" and
"fails over with it" — this guarantee is not currently enforced in production.
**Recommendation**
Register an `IActiveNodeGate` implementation in the central-role branch of
`ScadaLink.Host/Program.cs`. The natural backing is the existing
`ActiveNodeHealthCheck` (already wired for `/health/active`) or a direct read of
`Cluster.Get(actorSystem).State.Leader == Cluster.Get(actorSystem).SelfAddress`.
Add an integration test in the Host that spins up the central role and asserts
that the gate is resolvable and returns `IsActiveNode` consistent with cluster
leader state. Until that wiring lands, this finding is the user-facing
realisation of the InboundAPI-008 vulnerability.
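
One possible shape for the missing implementation, as a sketch only. The `IActiveNodeGate` declaration below mirrors the `IsActiveNode` property the filter reads, but its real declaration is not shown here; `DelegateActiveNodeGate` is a hypothetical name, and the leader check is injected as a `Func<bool>` so the example runs without Akka.NET.

```csharp
using System;

// Demo: the gate re-evaluates the leader check on every read, so it follows failover.
var isLeader = false;
IActiveNodeGate gate = new DelegateActiveNodeGate(() => isLeader);
Console.WriteLine(gate.IsActiveNode);   // prints False
isLeader = true;                        // simulated failover to this node
Console.WriteLine(gate.IsActiveNode);   // prints True

// Assumed to match the property consulted by InboundApiEndpointFilter.
interface IActiveNodeGate { bool IsActiveNode { get; } }

// Hypothetical implementation: delegates to a caller-supplied leader check.
sealed class DelegateActiveNodeGate : IActiveNodeGate
{
    private readonly Func<bool> _isLeader;

    public DelegateActiveNodeGate(Func<bool> isLeader) =>
        _isLeader = isLeader ?? throw new ArgumentNullException(nameof(isLeader));

    public bool IsActiveNode => _isLeader();
}

// In the Host's central-role branch the delegate might wrap the cluster read named above,
// assuming the ActorSystem is resolvable at that point (not confirmed by this review):
//   () => { var c = Cluster.Get(actorSystem); return c.State.Leader == c.SelfAddress; }
```

Evaluating the check on each read, rather than caching it at registration, is what lets the gate track leadership changes.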
### InboundAPI-023 — `EndpointExtensions.HandleInboundApiRequest` composition wiring has no test coverage
| | |
|--|--|
| Severity | Low |
| Category | Testing coverage |
| Status | Open |
| Location | `src/ScadaLink.InboundAPI/EndpointExtensions.cs:31-140`, `tests/ScadaLink.InboundAPI.Tests/` |
**Description**
The endpoint handler `HandleInboundApiRequest` is the wiring composition that
ties the validator → JSON parse → `ParameterValidator` → `InboundScriptExecutor` →
result-serialization path together; it is the single piece of code that maps
validator status codes to HTTP responses, threads the `parentExecutionId` from
`HttpContext.Items` into the executor, stashes the resolved API key name as
`AuditActorItemKey`, and emits the request-aborted short-circuit. The test
project covers each composed component (`ApiKeyValidatorTests`,
`ParameterValidatorTests`, `InboundScriptExecutorTests`, `RouteHelperTests`,
`InboundApiEndpointFilter`, `AuditWriteMiddlewareTests`,
`MiddlewareOrderTests`) but no test exercises `HandleInboundApiRequest` itself —
so regressions in the wiring (e.g. forgetting to stash the actor name on
`HttpContext.Items`, the `Contains("json")` case sensitivity from
InboundAPI-020, or accidentally swapping `validationResult.StatusCode` for a
literal) are not caught.
**Recommendation**
Add an `EndpointExtensionsTests` suite using `TestServer` (the same pattern
`MiddlewareOrderTests` uses) covering: the happy path (200 + body), invalid
JSON (400), validator 401, validator 403, parameter-validation failure (400),
script-failure 500, client-aborted short-circuit (`Results.Empty`), and the
actor-stash invariant (HttpContext.Items[AuditActorItemKey] is set with the
resolved key name after successful auth, but is absent on auth failures).
### InboundAPI-024 — `_knownBadMethods` is unbounded — an attacker can grow the cache by spamming distinct method names against the audit middleware path
| | |
|--|--|
| Severity | Low |
| Category | Performance & resource management |
| Status | Open |
| Location | `src/ScadaLink.InboundAPI/InboundScriptExecutor.cs:30`, `:77`, `:223`, `:233` |
**Description**
The InboundAPI-009 fix introduced `_knownBadMethods`, a `ConcurrentDictionary<string, byte>`
of method names whose Roslyn compilation failed, to short-circuit lazy
recompilation. It is keyed by `method.Name` and entries are only ever removed
when `CompileAndRegister` succeeds for the same name (line 83). Practically the
key space is bounded by the configured method definitions in the database, so
this is bounded in normal operation. But because the cache is mutated from the
lazy-compile path at `InboundScriptExecutor.cs:233`, and `ExecuteAsync` is called from
`HandleInboundApiRequest` only **after** `ApiKeyValidator.ValidateAsync` has
returned `Valid` (i.e. a real method exists), the entry is keyed by a name that
must have already been resolved through `GetMethodByNameAsync` — so this attack
surface is gated by the configuration database. The finding is therefore mostly
defensive: there is no rate limit on inbound API calls (deliberate design), so
if a future change ever causes `ExecuteAsync` to be called for an unvalidated
caller-supplied method name (e.g. a refactor that moves method-existence
checking later), this cache would become attacker-controllable.
**Recommendation**
Optional / defensive: cap `_knownBadMethods` (e.g. an LRU with a fixed size, or
clear it periodically). At minimum, document the invariant in the executor's
XML comment that `_knownBadMethods` keys must come from validated
`ApiMethod.Name` values, so the safety property survives future refactors. No
immediate change required; this is a watch-list item.
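
If a cap is ever wanted, a soft bound is probably enough. The sketch below swaps in a simple "stop caching when full" policy rather than the LRU mentioned above; `BoundedNameSet` and its capacity are hypothetical and not part of the executor.

```csharp
using System;
using System.Collections.Concurrent;

var knownBad = new BoundedNameSet(capacity: 2);
Console.WriteLine(knownBad.TryAdd("a"));   // prints True
Console.WriteLine(knownBad.TryAdd("b"));   // prints True
Console.WriteLine(knownBad.TryAdd("c"));   // prints False (cap reached, entry not cached)
knownBad.Remove("a");                      // e.g. a later successful compile
Console.WriteLine(knownBad.TryAdd("c"));   // prints True

// Hypothetical capped set of method names that failed compilation.
sealed class BoundedNameSet
{
    private readonly ConcurrentDictionary<string, byte> _names = new();
    private readonly int _capacity;

    public BoundedNameSet(int capacity) => _capacity = capacity;

    public bool Contains(string name) => _names.ContainsKey(name);

    public bool TryAdd(string name)
    {
        if (_names.ContainsKey(name)) return true;
        // Soft cap: the check-then-add race can overshoot by a few entries under
        // contention, which is acceptable for a defensive bound.
        if (_names.Count >= _capacity) return false;
        _names.TryAdd(name, 0);
        return true;
    }

    public void Remove(string name) => _names.TryRemove(name, out _);
}
```

When the set is full, the only cost is that an uncached bad method is recompiled on its next call, which is the behaviour that existed before the cache was introduced.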
### InboundAPI-025 — `AuditWriteMiddleware` runs against the entire `/api/*` branch — emits spurious `ApiInbound` audit rows for `/api/audit/query` and `/api/audit/export`
| | |
|--|--|
| Severity | Medium |
| Category | Correctness & logic bugs |
| Status | Open |
| Location | `src/ScadaLink.Host/Program.cs:183-185`; consumers: `src/ScadaLink.ManagementService/AuditEndpoints.cs:93-94`; emitter: `src/ScadaLink.InboundAPI/Middleware/AuditWriteMiddleware.cs:175-252` |
**Description**
`Program.cs` wires the audit middleware as
`app.UseWhen(ctx => ctx.Request.Path.StartsWithSegments("/api"), branch => branch.UseAuditWriteMiddleware())`
— scoped to the `/api` *prefix*, not to the `POST /api/{methodName}` route.
Meanwhile, `ScadaLink.ManagementService/AuditEndpoints.cs` maps
`MapGet("/api/audit/query", ...)` (line 93) and `MapGet("/api/audit/export", ...)`
(line 94). Both routes therefore inherit `AuditWriteMiddleware`, which emits an
`AuditEvent { Channel = AuditChannel.ApiInbound, Kind = AuditKind.InboundRequest, ... }`
row for every call. The middleware's `ResolveMethodName` falls back to the last
path segment (lines 446-452), so a GET `/api/audit/query?...` is recorded as if a
caller had invoked an inbound API method named "query"; an export is recorded
as method "export". Effects:
1. **Audit log is polluted with non-script rows.** The audit log is now
recording its own query traffic as if it were inbound script invocations,
contradicting Component-AuditLog.md's scope ("script trust boundary actions").
2. **Audit reads recursively emit audit writes.** Every audit-log query (e.g.
from the Central UI Audit Log page or the CLI `audit query` command) writes
an additional row into `AuditLog`, growing the table on read.
3. **`Target` is meaningless.** `/api/audit/query` has no method definition, so
the recorded `Target = "query"` is not joinable to any `ApiMethod` row in
audit-log drill-ins.
4. **Wasted resources on health probes / management calls.** Any future routes
added under `/api/` will inherit the middleware and pay the
`EnableBuffering`, `CapturedResponseStream`, and `JsonSerializer.Serialize`
costs even though they are not inbound script invocations.
Tests for the audit middleware (`AuditWriteMiddlewareTests`) and pipeline order
(`MiddlewareOrderTests`) wire the middleware only against the
`POST /api/{methodName}` route in test hosts, so this production-only
mis-scoping is not exercised.
**Recommendation**
Tighten the predicate so the middleware runs only on the inbound API method
route, not on the `/api/` prefix. Options:
- `app.UseWhen(ctx => ctx.Request.Path.StartsWithSegments("/api") && !ctx.Request.Path.StartsWithSegments("/api/audit") && !ctx.Request.Path.StartsWithSegments("/api/management"), ...)`
— defensive, but fragile to future route additions.
- Move the audit emission from a pipeline middleware to an `IEndpointFilter`
applied via `.AddEndpointFilter<>()` on the `MapInboundAPI` registration
(alongside `InboundApiEndpointFilter`). This makes the scope explicit on the
one route that needs it and survives future `/api/...` route additions
unchanged.
The endpoint-filter form is the recommended fix — it co-locates the audit-emission
scope with the route definition and matches how InboundAPI-006/008 gating is
already wired.
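
If the middleware form is kept in the interim, the predicate can match the route shape instead of enumerating exclusions. A minimal sketch, assuming the inbound route is exactly `POST /api/{methodName}`; `InboundRoute.IsInboundMethodCall` is a hypothetical helper, and a real `UseWhen` predicate would pass it `ctx.Request.Method` and `ctx.Request.Path.Value`.

```csharp
using System;

Console.WriteLine(InboundRoute.IsInboundMethodCall("POST", "/api/StartPump"));     // prints True
Console.WriteLine(InboundRoute.IsInboundMethodCall("GET", "/api/audit/query"));    // prints False
Console.WriteLine(InboundRoute.IsInboundMethodCall("POST", "/api/audit/export"));  // prints False

// Hypothetical helper: true only for POST requests with exactly two path segments,
// the first of which is "api".
static class InboundRoute
{
    public static bool IsInboundMethodCall(string httpMethod, string path)
    {
        if (!string.Equals(httpMethod, "POST", StringComparison.OrdinalIgnoreCase))
            return false;

        var segments = path.Split('/', StringSplitOptions.RemoveEmptyEntries);
        return segments.Length == 2
            && string.Equals(segments[0], "api", StringComparison.OrdinalIgnoreCase);
    }
}
```

This still duplicates the route shape outside the route definition, which is the fragility the endpoint-filter form avoids.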