scadalink-design/code-reviews/InboundAPI/findings.md
Joseph Doherty 977d7369a7 docs: add code review process and baseline review of all 19 modules
Establishes a per-module code review workflow under code-reviews/ and
records the 2026-05-16 baseline review (commit 9c60592): 241 findings
across all src/ modules (6 Critical, 46 High, 100 Medium, 89 Low).
This is the clean starting point for remediation work.
2026-05-16 18:09:09 -04:00


# Code Review — InboundAPI
| Field | Value |
|-------|-------|
| Module | `src/ScadaLink.InboundAPI` |
| Design doc | `docs/requirements/Component-InboundAPI.md` |
| Status | Reviewed |
| Last reviewed | 2026-05-16 |
| Reviewer | claude-agent |
| Commit reviewed | `9c60592` |
| Open findings | 13 |
## Summary
The InboundAPI module is small (8 source files) and the happy-path flow — extract
key, validate, deserialize parameters, execute script, serialize result — is clean
and readable. However the review surfaced several real problems concentrated in two
themes: **concurrency** and **security**. The `InboundScriptExecutor` is a singleton
that mutates a plain `Dictionary` from concurrent ASP.NET request threads with no
synchronization, which can corrupt the handler cache or crash the process under load.
On the security side, API-key comparison is a non-constant-time database string
match (timing oracle), compiled scripts run with no enforcement of the documented
script trust model (forbidden APIs such as `System.IO`/`Process`/`Reflection` are
fully reachable), there is no request-body size limit, and the executor's catch-all
swallows `OperationCanceledException` from genuine client disconnects as a "timeout".
Design-doc adherence is also incomplete: the `Database.Connection()` script API
described in the design doc is entirely absent from `InboundScriptContext`, and the
endpoint never enforces that the API is central-only. Testing covers the validators
well but there is no coverage of the HTTP endpoint, concurrency, or recompilation.
None of the findings are data-loss-class, but the concurrency and trust-model issues
are High severity and should be addressed before production use.
## Checklist coverage
| # | Category | Examined | Notes |
|---|----------|----------|-------|
| 1 | Correctness & logic bugs | ☑ | `CoerceValue` returns `null` for legitimately-null/`String` values indistinguishably; parameter-definition edge cases noted. |
| 2 | Akka.NET conventions | ☑ | Module is ASP.NET-hosted, no actors of its own; routes to actors via `CommunicationService`. No correlation-ID issues — IDs are set in `RouteHelper`. |
| 3 | Concurrency & thread safety | ☑ | Singleton `InboundScriptExecutor` mutates a non-thread-safe `Dictionary` from concurrent request threads — see InboundAPI-001/002. |
| 4 | Error handling & resilience | ☑ | Catch-all conflates client cancellation with timeout (InboundAPI-004); compilation-failure path repeats work on every request (InboundAPI-009). |
| 5 | Security | ☑ | Non-constant-time key comparison, no trust-model enforcement, no body-size limit, missing-method enumeration oracle — see InboundAPI-003/005/006/011. |
| 6 | Performance & resource management | ☑ | Up to 3 separate DB round-trips per request in `ApiKeyValidator`; uncapped lazy recompilation. |
| 7 | Design-document adherence | ☑ | `Database.Connection()` script API missing; central-only hosting not enforced; lazy-compile diverges from "compiled at startup". |
| 8 | Code organization & conventions | ☑ | `ParameterDefinition` is an API-shaped POCO declared in the component project rather than Commons; otherwise conventions followed. |
| 9 | Testing coverage | ☑ | Good unit coverage of the two validators; no endpoint, concurrency, recompilation, or timeout-vs-cancel tests. |
| 10 | Documentation & comments | ☑ | `ApiKeyValidationResult.NotFound` XML/name says "NotFound" but returns HTTP 400 — misleading (InboundAPI-013). |
## Findings
### InboundAPI-001 — Singleton script handler cache mutated without synchronization
| | |
|--|--|
| Severity | High |
| Category | Concurrency & thread safety |
| Status | Open |
| Location | `src/ScadaLink.InboundAPI/InboundScriptExecutor.cs:17`, `:32`, `:40`, `:89`, `:123-128` |
**Description**
`InboundScriptExecutor` is registered as a singleton (`ServiceCollectionExtensions.cs:11`)
and its handler cache is a plain `Dictionary<string, Func<...>>` (`InboundScriptExecutor.cs:17`).
`RegisterHandler`, `RemoveHandler`, `CompileAndRegister`, and the lazy-compile path in
`ExecuteAsync` all read and write this dictionary with no lock. ASP.NET serves inbound
API requests on concurrent thread-pool threads, so two requests for an as-yet-uncompiled
method (or a request racing a CLI-triggered `CompileAndRegister`) can mutate the
dictionary concurrently. `Dictionary` is explicitly not safe for concurrent
read/write — this can corrupt internal buckets, throw `InvalidOperationException`,
or return a torn/`null` handler, crashing the request or the process.
**Recommendation**
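Replace the `Dictionary` with a `ConcurrentDictionary<string, Func<...>>`, or guard all
access with a lock. For the lazy-compile path use `GetOrAdd` with `Lazy<T>` values so
concurrent first-callers compile at most once; `GetOrAdd` on its own may invoke its
value factory more than once under contention.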
**Resolution**
_Unresolved._
### InboundAPI-002 — Lazy compilation is a check-then-act race with no atomicity
| | |
|--|--|
| Severity | Medium |
| Category | Concurrency & thread safety |
| Status | Open |
| Location | `src/ScadaLink.InboundAPI/InboundScriptExecutor.cs:123-129` |
**Description**
`ExecuteAsync` does `if (!_scriptHandlers.TryGetValue(...)) { CompileAndRegister(method); handler = _scriptHandlers[method.Name]; }`.
Even setting aside the unsynchronized dictionary (InboundAPI-001), this is a
check-then-act sequence: between `TryGetValue` failing and the re-read on line 128,
another thread could `RemoveHandler` the entry, causing the indexer on line 128 to
throw `KeyNotFoundException`. Nothing handles that exception at the call site; it is
caught only by the broad catch on line 143 and reported to the caller as "Internal
script error". Multiple concurrent first-callers will also each compile the same
script redundantly (wasted Roslyn work).
**Recommendation**
Make compile-and-fetch a single atomic operation (`ConcurrentDictionary.GetOrAdd`
storing `Lazy<T>` values so the compile runs once, or a per-method lock), and have `CompileAndRegister`
return the handler it produced rather than requiring a separate dictionary read.
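
A minimal sketch of the atomic compile-and-fetch described above. The `compile` delegate is a hypothetical stand-in for the Roslyn step, and the `Func<string, string>` handler shape is simplified for illustration; neither is taken from the module's code.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for the Roslyn compile step. It counts invocations so
// the compile-once behaviour can be observed.
int compileCount = 0;
Func<string, Func<string, string>> compile = script =>
{
    Interlocked.Increment(ref compileCount);
    Thread.Sleep(50); // simulate expensive compilation
    return input => $"{script}:{input}";
};

// GetOrAdd may run its factory more than once under contention, but only one
// Lazy instance is ever stored, and only that instance's Value is evaluated.
var handlers = new ConcurrentDictionary<string, Lazy<Func<string, string>>>();

Func<string, string> GetHandler(string methodName, string script) =>
    handlers.GetOrAdd(
        methodName,
        _ => new Lazy<Func<string, string>>(
            () => compile(script),
            LazyThreadSafetyMode.ExecutionAndPublication)).Value;

// Eight concurrent first-callers for the same, not yet compiled, method.
var results = await Task.WhenAll(
    Enumerable.Range(0, 8).Select(i =>
        Task.Run(() => GetHandler("getStatus", "script-body")(i.ToString()))));

Console.WriteLine($"compiled {compileCount} time(s) for {results.Length} callers");
```

Because the fetch returns the handler directly, there is no second dictionary read for a concurrent `RemoveHandler` to race against. Note that `Lazy<T>` in this mode also caches an exception thrown by the factory, which would need to be accounted for when a definition is updated.
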
**Resolution**
_Unresolved._
### InboundAPI-003 — API key compared with non-constant-time string equality
| | |
|--|--|
| Severity | High |
| Category | Security |
| Status | Open |
| Location | `src/ScadaLink.ConfigurationDatabase/Repositories/InboundApiRepository.cs:22-23`, consumed by `src/ScadaLink.InboundAPI/ApiKeyValidator.cs:33` |
**Description**
API-key authentication resolves the key with
`FirstOrDefaultAsync(k => k.KeyValue == keyValue)` — an ordinary equality match
translated to a SQL `WHERE KeyValue = @p` comparison. The secret is matched with
ordinary (early-exit) string/SQL comparison rather than a constant-time comparison,
which is a classic timing side-channel for secret material. Combined with the design's
explicit "no rate limiting" decision, an attacker with network access to the central
API can mount a timing attack to recover valid keys. The API key is the *sole*
credential for the inbound API, so this is the primary authentication path.
**Recommendation**
Look the key up by a non-secret indexed identifier (e.g. a key prefix/id) or fetch
candidate rows, then verify the secret in-process using
`CryptographicOperations.FixedTimeEquals` over the UTF-8 bytes. Preferably store only
a salted hash of the key value and compare hashes. Avoid leaking secret-length and
match-position timing.
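
A hedged sketch of the in-process verification step. `HashKey` and `KeyMatches` are hypothetical helper names, and the salted SHA-256 scheme assumes API keys are long random strings; the lookup of the row by a non-secret identifier is not shown.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Salted SHA-256 of the key (hypothetical helper, assumes high-entropy keys).
static byte[] HashKey(string key, byte[] salt)
{
    byte[] keyBytes = Encoding.UTF8.GetBytes(key);
    byte[] buffer = new byte[salt.Length + keyBytes.Length];
    salt.CopyTo(buffer, 0);
    keyBytes.CopyTo(buffer, salt.Length);
    return SHA256.HashData(buffer);
}

// Both digests are always 32 bytes, so the fixed-time comparison exposes
// neither the key length nor the position of the first mismatching byte.
static bool KeyMatches(string presentedKey, byte[] salt, byte[] storedHash) =>
    CryptographicOperations.FixedTimeEquals(HashKey(presentedKey, salt), storedHash);

// Illustrative stored values; in practice these would come from the key row.
byte[] salt = RandomNumberGenerator.GetBytes(16);
byte[] stored = HashKey("example-key-value", salt);

Console.WriteLine(KeyMatches("example-key-value", salt, stored));
Console.WriteLine(KeyMatches("example-key-valuX", salt, stored));
```

Comparing fixed-length digests matters because `FixedTimeEquals` returns early when the two inputs differ in length.
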
**Resolution**
_Unresolved._
### InboundAPI-004 — Client disconnect is misreported as a script timeout
| | |
|--|--|
| Severity | Medium |
| Category | Error handling & resilience |
| Status | Open |
| Location | `src/ScadaLink.InboundAPI/InboundScriptExecutor.cs:117-141` |
**Description**
`ExecuteAsync` creates a linked CTS from `httpContext.RequestAborted` and the method
timeout, then catches `OperationCanceledException` and unconditionally returns
"Script execution timed out". When the *client* aborts the request (`RequestAborted`
fires), the same exception type is thrown, so a normal client disconnect is logged as
a timeout (`_logger.LogWarning("Script execution timed out ...")`) and an attempt is
made to write a 500 timeout body to an already-gone connection. This pollutes the
failure log (which the design says is reserved for genuine script errors) and obscures
real timeout incidents.
**Recommendation**
Distinguish the two cancellation sources: if `cancellationToken` (the request token)
is cancelled, treat it as a client abort — do not log a timeout and do not attempt to
write a response. Only when the timeout CTS fired should the result be "timed out".
Check `cts.Token.IsCancellationRequested && !cancellationToken.IsCancellationRequested`
or use a dedicated timeout `CancellationTokenSource` so the two are separable.
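
One way to keep the two cancellation sources separable, sketched with a dedicated timeout `CancellationTokenSource`. `RunAsync`, its string results, and the `requestAborted` parameter (standing in for `HttpContext.RequestAborted`) are illustrative assumptions, not the executor's actual shape.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Runs `script` under a timeout and reports which cancellation source fired.
static async Task<string> RunAsync(
    Func<CancellationToken, Task> script,
    TimeSpan timeout,
    CancellationToken requestAborted)
{
    using var timeoutCts = new CancellationTokenSource(timeout);
    using var linked = CancellationTokenSource.CreateLinkedTokenSource(
        requestAborted, timeoutCts.Token);
    try
    {
        await script(linked.Token);
        return "completed";
    }
    catch (OperationCanceledException) when (requestAborted.IsCancellationRequested)
    {
        // Client went away: no timeout log entry and no response body.
        return "client-aborted";
    }
    catch (OperationCanceledException) when (timeoutCts.IsCancellationRequested)
    {
        return "timed-out";
    }
}

Func<CancellationToken, Task> slowScript = ct => Task.Delay(TimeSpan.FromSeconds(5), ct);

// Case 1: the timeout fires while the client is still connected.
string timedOut = await RunAsync(slowScript, TimeSpan.FromMilliseconds(50), CancellationToken.None);

// Case 2: the client disconnects well before the timeout.
using var client = new CancellationTokenSource(TimeSpan.FromMilliseconds(50));
string aborted = await RunAsync(slowScript, TimeSpan.FromSeconds(5), client.Token);

Console.WriteLine($"{timedOut}, {aborted}");
```

The request token is checked first, so if both sources fire the outcome is treated as a client abort; an `OperationCanceledException` from neither source propagates unchanged.
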
**Resolution**
_Unresolved._
### InboundAPI-005 — Compiled API scripts run with no script-trust-model enforcement
| | |
|--|--|
| Severity | High |
| Category | Security |
| Status | Open |
| Location | `src/ScadaLink.InboundAPI/InboundScriptExecutor.cs:56-93` |
**Description**
CLAUDE.md's Akka.NET conventions state the script trust model forbids `System.IO`,
`Process`, `Threading`, `Reflection`, and raw network access. `CompileAndRegister`
compiles arbitrary C# with `CSharpScript.Create` and only restricts the *default
imports* (`WithImports("System", ...)`). Imports are a convenience, not a sandbox — a
script can still fully-qualify any type (`System.IO.File.Delete(...)`,
`System.Diagnostics.Process.Start(...)`, `System.Reflection`, raw `Socket`) because
the core framework assemblies are referenced and Roslyn scripting performs no API
allow/deny-listing. Inbound API scripts execute on the central node with the host
process's privileges, so a malicious or buggy method definition has full host access.
Note that these scripts are authored by the Design role, which is less trusted than
Admin, so enforcement matters in practice.
**Recommendation**
Add a compile-time analyzer/`SyntaxWalker` (as the Site Runtime does for instance
scripts) that rejects forbidden namespaces/types before registering a handler, and/or
run scripts under a constrained boundary. At minimum, share the Site Runtime's
forbidden-API checker so the trust model is enforced consistently. Reject the method
(and log) when a violation is found instead of registering it.
**Resolution**
_Unresolved._
### InboundAPI-006 — No request body size limit on the inbound endpoint
| | |
|--|--|
| Severity | Medium |
| Category | Security |
| Status | Open |
| Location | `src/ScadaLink.InboundAPI/EndpointExtensions.cs:54-62` |
**Description**
`HandleInboundApiRequest` calls `JsonDocument.ParseAsync(httpContext.Request.Body, ...)`
with no explicit body-size cap and no `[RequestSizeLimit]`/endpoint metadata. Kestrel
does apply a default maximum request body size (30,000,000 bytes, roughly 28.6 MB), but
this endpoint accepts arbitrary JSON from external systems, fully buffers it into a
`JsonDocument`, and then `Clone()`s the root element (`:61`), which materializes the
entire document on the heap. With no rate limiting (a deliberate design choice), a
single caller can drive large allocations. Deep/wide JSON also makes the `CoerceValue`
`object`/`list` deserialization (`ParameterValidator.cs:113,117`) expensive.
**Recommendation**
Set an explicit, modest body-size limit on the endpoint
(`.WithMetadata(new RequestSizeLimitAttribute(...))` or
`IHttpMaxRequestBodySizeFeature`) and consider a `JsonDocumentOptions` `MaxDepth`.
Reject oversized bodies with 413 before buffering.
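
A framework-independent sketch of the two limits, assuming a hypothetical `ReadBoundedJsonAsync` helper and illustrative limit values. In the endpoint itself the size cap would more likely come from the request-size metadata named above; this only shows the bounded read and the `MaxDepth` option working together.

```csharp
using System;
using System.IO;
using System.Text;
using System.Text.Json;
using System.Threading.Tasks;

// Reads at most maxBytes from the body, then parses with a depth limit.
// Returns the HTTP status the endpoint would use plus the parsed root, if any.
static async Task<(int Status, JsonElement? Root)> ReadBoundedJsonAsync(
    Stream body, int maxBytes, int maxDepth)
{
    using var buffer = new MemoryStream();
    var chunk = new byte[4096];
    int read;
    while ((read = await body.ReadAsync(chunk.AsMemory())) > 0)
    {
        if (buffer.Length + read > maxBytes)
            return (413, null); // stop reading as soon as the cap is exceeded
        buffer.Write(chunk, 0, read);
    }

    try
    {
        var options = new JsonDocumentOptions { MaxDepth = maxDepth };
        using var doc = JsonDocument.Parse(new ReadOnlyMemory<byte>(buffer.ToArray()), options);
        return (200, doc.RootElement.Clone());
    }
    catch (JsonException)
    {
        return (400, null); // malformed JSON or nesting deeper than maxDepth
    }
}

static Stream AsStream(string json) => new MemoryStream(Encoding.UTF8.GetBytes(json));

var ok = await ReadBoundedJsonAsync(AsStream("{\"siteId\": 7}"), maxBytes: 1024, maxDepth: 8);
var tooLarge = await ReadBoundedJsonAsync(
    AsStream(new string(' ', 2048) + "{}"), maxBytes: 1024, maxDepth: 8);
var tooDeep = await ReadBoundedJsonAsync(
    AsStream(new string('[', 20) + new string(']', 20)), maxBytes: 1024, maxDepth: 8);

Console.WriteLine($"{ok.Status}, {tooLarge.Status}, {tooDeep.Status}");
```

The cloned root is bounded by `maxBytes`, so the allocation a single request can cause has a known ceiling.
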
**Resolution**
_Unresolved._
### InboundAPI-007 — `Database.Connection()` script API from the design doc is not implemented
| | |
|--|--|
| Severity | Medium |
| Category | Design-document adherence |
| Status | Open |
| Location | `src/ScadaLink.InboundAPI/InboundScriptExecutor.cs:155-170` |
**Description**
`Component-InboundAPI.md` ("Script Runtime API -> Database Access") specifies
`Database.Connection("connectionName")` as an available script capability for
querying the configuration/machine-data databases. `InboundScriptContext` exposes only
`Parameters`, `Route`, and `CancellationToken` — there is no `Database` member. Any
method script that follows the documented API will fail to compile. Either the code
is incomplete or the design doc is stale; the two must be reconciled.
**Recommendation**
If database access is in scope, add a `Database` property to `InboundScriptContext`
backed by a connection-factory service. If it is not, remove the "Database Access"
section from `Component-InboundAPI.md` so the design doc stops advertising an absent
API.
**Resolution**
_Unresolved._
### InboundAPI-008 — Inbound API endpoint not restricted to the active central node
| | |
|--|--|
| Severity | Medium |
| Category | Design-document adherence |
| Status | Open |
| Location | `src/ScadaLink.InboundAPI/EndpointExtensions.cs:19-23`, `src/ScadaLink.Host/Program.cs:149` |
**Description**
The design states the Inbound API is "Central cluster only (active node)" and "fails
over with it". `MapInboundAPI` registers `POST /api/{methodName}` unconditionally, and
`Program.cs` maps it inside the central-role branch but with no active-node gating —
unlike `/health/active` which has an `active-node` predicate. A standby central node
will happily serve inbound API calls, executing scripts and `Route.To()` calls from a
non-leader, which can race the active node or run against stale singleton state.
**Recommendation**
Gate the endpoint on active-node status (reuse the cluster `active-node` health check
or a leader-state check) and return 503 on the standby, so Traefik/clients only reach
the live node — consistent with how the Management API and `/health/active` are
treated.
**Resolution**
_Unresolved._
### InboundAPI-009 — Failed compilation is retried on every subsequent request
| | |
|--|--|
| Severity | Low |
| Category | Performance & resource management |
| Status | Open |
| Location | `src/ScadaLink.InboundAPI/InboundScriptExecutor.cs:123-128` |
**Description**
When a method's script fails to compile, `CompileAndRegister` returns `false` and
nothing is stored in `_scriptHandlers`. Every subsequent call to that method re-enters
the lazy-compile branch and recompiles the broken script via Roslyn from scratch.
Roslyn compilation is expensive; a single broken method definition repeatedly invoked
by an external caller (no rate limiting) becomes a CPU amplification vector.
**Recommendation**
Cache the compilation *failure* (e.g. store a sentinel handler that immediately
returns the compile error, or keep a `HashSet` of known-bad method names with the
diagnostic) so a broken script is compiled at most once until the definition is
updated via `CompileAndRegister`.
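
The sentinel-handler option could look roughly like this. The `compile` delegate, the `Invoke` and `OnDefinitionUpdated` names, and the parameterless `Func<string>` handler are all hypothetical simplifications for illustration.

```csharp
using System;
using System.Collections.Concurrent;

int compileAttempts = 0;

// Hypothetical compile step: returns a handler, or throws with the diagnostic.
Func<string, Func<string>> compile = script =>
{
    compileAttempts++;
    if (script.Contains("syntax error"))
        throw new InvalidOperationException("CS1002: ; expected");
    return () => "ok";
};

// Each entry is either a working handler or a sentinel that reports the
// cached compile error without invoking the compiler again.
var handlers = new ConcurrentDictionary<string, Func<string>>();

string Invoke(string methodName, string script)
{
    var handler = handlers.GetOrAdd(methodName, _ =>
    {
        try { return compile(script); }
        catch (InvalidOperationException ex)
        {
            string error = $"Compilation failed: {ex.Message}";
            return () => error; // sentinel handler
        }
    });
    return handler();
}

// Updating a definition evicts the entry so the next call recompiles.
void OnDefinitionUpdated(string methodName) => handlers.TryRemove(methodName, out _);

string first = Invoke("broken", "syntax error here");
string second = Invoke("broken", "syntax error here");
int attemptsWhileBroken = compileAttempts;

OnDefinitionUpdated("broken");
string afterFix = Invoke("broken", "return 1;");

Console.WriteLine($"{attemptsWhileBroken} attempt(s) while broken, then: {afterFix}");
```

This demo is single-threaded. Under contention `GetOrAdd` can still run its factory more than once, so a production version would likely combine the sentinel with the `Lazy<T>` approach from InboundAPI-002.
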
**Resolution**
_Unresolved._
### InboundAPI-010 — `ParameterValidator` ignores extra body fields and cannot validate Object/List element types
| | |
|--|--|
| Severity | Low |
| Category | Correctness & logic bugs |
| Status | Open |
| Location | `src/ScadaLink.InboundAPI/ParameterValidator.cs:64-90`, `:112-118` |
**Description**
Two related correctness gaps: (1) The validator iterates only over *defined*
parameters; any extra top-level fields in the request body are silently ignored
rather than reported, so callers get no feedback on typo'd parameter names. (2) For
`Object` and `List` types the validator only checks the JSON *kind* (`Object`/`Array`)
and then blindly `JsonSerializer.Deserialize`s the raw text — the design's extended
type system describes Objects as "named structure with typed fields" and Lists as
collections "of objects or primitive types", but no field-level or element-level type
validation is performed. Invalid nested structures pass validation and surface only
as runtime script errors.
**Recommendation**
Optionally log a warning or return 400 when the body contains unexpected fields. For the extended types, either parse a
richer `ParameterDefinition` (with nested field definitions / element type) and
validate recursively, or document explicitly that Object/List are validated only for
shape — and update the design doc to match.
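
Detecting unexpected top-level fields is a small addition. In this sketch `FindUnexpectedFields` is a hypothetical helper, and the ordinal (case-sensitive) name comparison is an assumption about how parameter names are matched.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.Json;

// Returns the top-level property names in `body` that are not defined parameters.
static List<string> FindUnexpectedFields(JsonElement body, IEnumerable<string> definedNames)
{
    var defined = new HashSet<string>(definedNames, StringComparer.Ordinal);
    return body.EnumerateObject()
        .Select(p => p.Name)
        .Where(name => !defined.Contains(name))
        .ToList();
}

// "sitId" is a typo of the defined "siteId" parameter.
using var doc = JsonDocument.Parse("{\"siteId\": 7, \"sitId\": 8, \"mode\": \"auto\"}");
List<string> unexpected = FindUnexpectedFields(doc.RootElement, new[] { "siteId", "mode" });

Console.WriteLine(string.Join(", ", unexpected));
```

The caller can then decide whether a non-empty list is logged as a warning or rejected with 400.
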
**Resolution**
_Unresolved._
### InboundAPI-011 — Method-existence check leaks to unapproved callers (enumeration oracle)
| | |
|--|--|
| Severity | Low |
| Category | Security |
| Status | Open |
| Location | `src/ScadaLink.InboundAPI/ApiKeyValidator.cs:39-52` |
**Description**
`ValidateAsync` returns 400 `Method '{methodName}' not found` when the method does not
exist, but 403 `API key not approved for this method` when it exists but the key is
not approved. A caller holding any valid enabled key can therefore enumerate which
method names exist on the central API by observing 400-vs-403 responses. The error
message also echoes the caller-supplied `methodName` back verbatim into the JSON
response (`EndpointExtensions.cs:47`), a minor reflected-input concern.
**Recommendation**
Return the same status code and body (e.g. a uniform 403, or a uniform 404) for both
"method not found" and "key not approved" so existence is not observable to unapproved
callers. Avoid echoing raw caller input in error bodies, or sanitize it.
**Resolution**
_Unresolved._
### InboundAPI-012 — `ParameterDefinition` POCO declared in the component project, not Commons
| | |
|--|--|
| Severity | Low |
| Category | Code organization & conventions |
| Status | Open |
| Location | `src/ScadaLink.InboundAPI/ParameterValidator.cs:128-133` |
**Description**
`ParameterDefinition` is a persistence-/contract-shaped POCO: it is the deserialized
form of `ApiMethod.ParameterDefinitions` (a column in the configuration database) and
describes the public API contract. CLAUDE.md's code-organization rules place
persistence-ignorant entity/contract types in `ScadaLink.Commons`. Defining it inside
the InboundAPI project means any other component that needs to read or produce method
parameter definitions (e.g. Central UI's method editor, CLI, Management Service)
cannot share the type and will duplicate it.
**Recommendation**
Move `ParameterDefinition` (and a matching return-definition type, if added) to
`ScadaLink.Commons` under the InboundApi entity/types namespace so it is shared by all
components that work with method definitions.
**Resolution**
_Unresolved._
### InboundAPI-013 — `ApiKeyValidationResult.NotFound` factory returns HTTP 400, contradicting its name
| | |
|--|--|
| Severity | Low |
| Category | Documentation & comments |
| Status | Open |
| Location | `src/ScadaLink.InboundAPI/ApiKeyValidator.cs:78-79` |
**Description**
The static factory is named `NotFound` and is used for the "method not found" case,
but it builds a result with `StatusCode = 400` (Bad Request), not 404. The name
strongly implies 404 and will mislead future maintainers; `EndpointExtensions`
faithfully propagates whatever status code the factory sets, so the misnaming directly
affects the wire contract.
**Recommendation**
Rename the factory to match its behaviour (e.g. `BadRequest`) or change the status
code to 404 if that is the intended contract — and document the chosen "method not
found" status in `Component-InboundAPI.md`'s Error Handling section, which currently
does not list it.
**Resolution**
_Unresolved._