fix(inbound-api): resolve InboundAPI-002,004,006,008 — disconnect vs timeout, body size limit, active-node gate; surface InboundAPI-007

This commit is contained in:
Joseph Doherty
2026-05-16 21:22:01 -04:00
parent 6563511b5f
commit da955042aa
10 changed files with 462 additions and 20 deletions

View File

@@ -8,7 +8,7 @@
| Last reviewed | 2026-05-16 |
| Reviewer | claude-agent |
| Commit reviewed | `9c60592` |
| Open findings | 10 |
| Open findings | 6 |
## Summary
@@ -87,10 +87,10 @@ via `GetOrAdd` so concurrent first-callers share one handler. Regression tests
| | |
|--|--|
| Severity | Medium |
| Severity | Medium — re-triaged: already fixed by the InboundAPI-001 fix; verified and closed |
| Category | Concurrency & thread safety |
| Status | Open |
| Location | `src/ScadaLink.InboundAPI/InboundScriptExecutor.cs:123-129` |
| Status | Resolved |
| Location | `src/ScadaLink.InboundAPI/InboundScriptExecutor.cs:152-161` |
**Description**
@@ -111,7 +111,17 @@ return the handler it produced rather than requiring a separate dictionary read.
**Resolution**
_Unresolved._
Resolved 2026-05-16 (commit `pending`): re-triage — verified against the current
source, this finding was **already fixed** by the InboundAPI-001 fix. The
`InboundScriptExecutor.cs:152-161` lazy-compile path no longer does check-then-act
re-read: `Compile(method)` runs unconditionally (it never reads the cache) and the
result is published via the atomic `_scriptHandlers.GetOrAdd(method.Name, compiled)`.
There is no separate dictionary indexer read, so the `KeyNotFoundException` race the
finding describes cannot occur, and concurrent first-callers all share the single
handler that `GetOrAdd` keeps. Regression test
`LazyCompile_RacingRemoveHandler_NeverThrowsKeyNotFound` added (asserts a concurrent
`RemoveHandler` storm against lazy-compiling callers never yields the catch-all
"Internal script error"); it passes against the current code, confirming the fix.
### InboundAPI-003 — API key compared with non-constant-time string equality
@@ -161,7 +171,7 @@ longer depends on it.
|--|--|
| Severity | Medium |
| Category | Error handling & resilience |
| Status | Open |
| Status | Resolved |
| Location | `src/ScadaLink.InboundAPI/InboundScriptExecutor.cs:117-141` |
**Description**
@@ -185,7 +195,17 @@ or use a dedicated timeout `CancellationTokenSource` so the two are separable.
**Resolution**
_Unresolved._
Resolved 2026-05-16 (commit `pending`): `ExecuteAsync` now uses a dedicated timeout
`CancellationTokenSource` (`new CancellationTokenSource(timeout)`) linked with the
request-abort token, so the two cancellation sources are separable. The
`OperationCanceledException` handler reports "Script execution timed out" (and logs a
warning) **only** when the timeout CTS fired and the request token did not; a client
abort instead returns "Request cancelled by client" and logs at Debug — the failure
log stays reserved for genuine script-execution timeouts. `HandleInboundApiRequest`
additionally short-circuits with `Results.Empty` (no warning log, no 500 body write)
when `RequestAborted` is cancelled, since the connection is already gone. Regression
tests `ClientDisconnect_IsNotReportedAsTimeout` and `GenuineTimeout_StillReportedAsTimeout`
added.
### InboundAPI-005 — Compiled API scripts run with no script-trust-model enforcement
@@ -237,7 +257,7 @@ Regression tests `CompileAndRegister_ForbiddenApi_RejectsScript` (theory),
|--|--|
| Severity | Medium |
| Category | Security |
| Status | Open |
| Status | Resolved |
| Location | `src/ScadaLink.InboundAPI/EndpointExtensions.cs:54-62` |
**Description**
@@ -260,16 +280,24 @@ Reject oversized bodies with 413 before buffering.
**Resolution**
_Unresolved._
Resolved 2026-05-16 (commit `pending`): added `InboundApiEndpointFilter`, an
`IEndpointFilter` applied to `POST /api/{methodName}` via `.AddEndpointFilter<>()`.
It rejects requests whose declared `Content-Length` exceeds `InboundApiOptions.
MaxRequestBodyBytes` (default 1 MiB) with HTTP 413 *before* the handler buffers the
body into a `JsonDocument`, and also lowers the per-request `IHttpMaxRequestBodySizeFeature`
cap so a chunked/unknown-length stream is cut off by Kestrel while being read. The
limit is configurable via the bound `ScadaLink:InboundApi` options section. Regression
tests `OversizedBody_ShortCircuitsWith413_AndDoesNotRunHandler`, `BodyAtLimit_RunsHandler`,
and `FilterCapsMaxRequestBodySizeFeature` added.
### InboundAPI-007 — `Database.Connection()` script API from the design doc is not implemented
| | |
|--|--|
| Severity | Medium |
| Severity | Medium — verified real drift; left Open pending a design decision (see Resolution) |
| Category | Design-document adherence |
| Status | Open |
| Location | `src/ScadaLink.InboundAPI/InboundScriptExecutor.cs:155-170` |
| Location | `src/ScadaLink.InboundAPI/InboundScriptExecutor.cs:188-203` |
**Description**
@@ -289,7 +317,27 @@ API.
**Resolution**
_Unresolved._
_Unresolved — left Open; needs a design decision the resolving agent cannot make._
Re-triage 2026-05-16: confirmed against the current source — the drift is **real**.
`InboundScriptContext` (`InboundScriptExecutor.cs:188-203`) exposes only `Parameters`,
`Route`, and `CancellationToken`; there is no `Database` member, so a method script
following the documented `Database.Connection("name")` API fails to compile.
This finding cannot be closed by the InboundAPI module agent for two reasons:
1. **Scope** — the alternative resolution (deleting the "Database Access" section)
edits `docs/requirements/Component-InboundAPI.md`, which is outside the editable
scope (`src/ScadaLink.InboundAPI`, `tests/`, this file only).
2. **It is a genuine design decision.** Implementing `Database.Connection()` is not a
mechanical fix: it hands inbound API scripts a *raw* MS SQL client. The ScadaLink
script trust model (CLAUDE.md, Akka.NET conventions) forbids scripts from `System.IO`
and raw network access, and `ForbiddenApiChecker` (added for InboundAPI-005) now
statically blocks `System.Net`/`System.IO`. A raw `SqlConnection` is in clear
tension with that trust model, and the set of connection names a script may open,
read-only vs. read-write access, and connection lifetime/pooling all require a
design call. **Surface to the design owner:** decide whether `Database.Connection()`
is in scope — if yes, write a design note covering the trust-model carve-out and
then implement a `Database` member backed by a connection-factory service; if no,
delete the "Database Access" section from `Component-InboundAPI.md`.
### InboundAPI-008 — Inbound API endpoint not restricted to the active central node
@@ -297,7 +345,7 @@ _Unresolved._
|--|--|
| Severity | Medium |
| Category | Design-document adherence |
| Status | Open |
| Status | Resolved |
| Location | `src/ScadaLink.InboundAPI/EndpointExtensions.cs:19-23`, `src/ScadaLink.Host/Program.cs:149` |
**Description**
@@ -318,7 +366,19 @@ treated.
**Resolution**
_Unresolved._
Resolved 2026-05-16 (commit `pending`): introduced `IActiveNodeGate`, an abstraction
the inbound API uses to ask whether this node is the active (cluster-leader) central
node. The new `InboundApiEndpointFilter` (applied to `POST /api/{methodName}`)
consults the gate and short-circuits a standby node with HTTP 503 before any
auth/script work, so Traefik/clients only reach the live node — consistent with
`/health/active`. The gate is resolved optionally: when no implementation is
registered (non-clustered host / tests) the endpoint defaults to "allow", preserving
prior behaviour. Regression tests `StandbyNode_ShortCircuitsWith503_AndDoesNotRunHandler`,
`ActiveNode_PassesGate_RunsHandler`, and `NoGateRegistered_PassesGate_RunsHandler`
added. **Follow-up (outside this module's scope):** `ScadaLink.Host` should register
an `IActiveNodeGate` implementation backed by `ActiveNodeHealthCheck` /
`Cluster.State.Leader` in the central-role branch of `Program.cs` so the gate is
actually enforced in production; until then the endpoint defaults to "allow".
### InboundAPI-009 — Failed compilation is retried on every subsequent request