docs: add code review process and baseline review of all 19 modules
Establishes a per-module code review workflow under code-reviews/ and
records the 2026-05-16 baseline review (commit 9c60592): 241 findings
across all src/ modules (6 Critical, 46 High, 100 Medium, 89 Low).
This is the clean starting point for remediation work.
code-reviews/InboundAPI/findings.md
@@ -0,0 +1,442 @@
# Code Review — InboundAPI

| Field | Value |
|-------|-------|
| Module | `src/ScadaLink.InboundAPI` |
| Design doc | `docs/requirements/Component-InboundAPI.md` |
| Status | Reviewed |
| Last reviewed | 2026-05-16 |
| Reviewer | claude-agent |
| Commit reviewed | `9c60592` |
| Open findings | 13 |

## Summary

The InboundAPI module is small (8 source files) and the happy-path flow (extract
key, validate, deserialize parameters, execute script, serialize result) is clean
and readable. However, the review surfaced several real problems concentrated in two
themes: **concurrency** and **security**. The `InboundScriptExecutor` is a singleton
that mutates a plain `Dictionary` from concurrent ASP.NET request threads with no
synchronization, which can corrupt the handler cache or crash the process under load.
On the security side, API-key comparison is a non-constant-time database string
match (timing oracle), compiled scripts run with no enforcement of the documented
script trust model (forbidden APIs such as `System.IO`/`Process`/`Reflection` are
fully reachable), there is no request-body size limit, and the executor's catch-all
swallows `OperationCanceledException` from genuine client disconnects as a "timeout".
Design-doc adherence is also incomplete: the `Database.Connection()` script API
described in the design doc is entirely absent from `InboundScriptContext`, and the
endpoint never enforces that the API is central-only. Testing covers the validators
well, but there is no coverage of the HTTP endpoint, concurrency, or recompilation.
None of the findings are data-loss-class, but the concurrency and trust-model issues
are High severity and should be addressed before production use.

## Checklist coverage

| # | Category | Examined | Notes |
|---|----------|----------|-------|
| 1 | Correctness & logic bugs | ☑ | `CoerceValue` returns `null` for legitimately-null/`String` values indistinguishably; parameter-definition edge cases noted. |
| 2 | Akka.NET conventions | ☑ | Module is ASP.NET-hosted, no actors of its own; routes to actors via `CommunicationService`. No correlation-ID issues: IDs are set in `RouteHelper`. |
| 3 | Concurrency & thread safety | ☑ | Singleton `InboundScriptExecutor` mutates a non-thread-safe `Dictionary` from concurrent request threads; see InboundAPI-001/002. |
| 4 | Error handling & resilience | ☑ | Catch-all conflates client cancellation with timeout (InboundAPI-004); compilation-failure path repeats work on every request (InboundAPI-009). |
| 5 | Security | ☑ | Non-constant-time key comparison, no trust-model enforcement, no body-size limit, missing-method enumeration oracle; see InboundAPI-003/005/006/011. |
| 6 | Performance & resource management | ☑ | Up to 3 separate DB round-trips per request in `ApiKeyValidator`; uncapped lazy recompilation. |
| 7 | Design-document adherence | ☑ | `Database.Connection()` script API missing; central-only hosting not enforced; lazy-compile diverges from "compiled at startup". |
| 8 | Code organization & conventions | ☑ | `ParameterDefinition` is an API-shaped POCO declared in the component project rather than Commons; otherwise conventions followed. |
| 9 | Testing coverage | ☑ | Good unit coverage of the two validators; no endpoint, concurrency, recompilation, or timeout-vs-cancel tests. |
| 10 | Documentation & comments | ☑ | `ApiKeyValidationResult.NotFound` XML/name says "NotFound" but returns HTTP 400, which is misleading (InboundAPI-013). |

## Findings

### InboundAPI-001 — Singleton script handler cache mutated without synchronization

| | |
|--|--|
| Severity | High |
| Category | Concurrency & thread safety |
| Status | Open |
| Location | `src/ScadaLink.InboundAPI/InboundScriptExecutor.cs:17`, `:32`, `:40`, `:89`, `:123-128` |

**Description**

`InboundScriptExecutor` is registered as a singleton (`ServiceCollectionExtensions.cs:11`)
and its handler cache is a plain `Dictionary<string, Func<...>>` (`InboundScriptExecutor.cs:17`).
`RegisterHandler`, `RemoveHandler`, `CompileAndRegister`, and the lazy-compile path in
`ExecuteAsync` all read and write this dictionary with no lock. ASP.NET serves inbound
API requests on concurrent thread-pool threads, so two requests for an as-yet-uncompiled
method (or a request racing a CLI-triggered `CompileAndRegister`) can mutate the
dictionary concurrently. `Dictionary` is explicitly not safe for concurrent
read/write: this can corrupt internal buckets, throw `InvalidOperationException`,
or return a torn/`null` handler, crashing the request or the process.

**Recommendation**

Replace the `Dictionary` with a `ConcurrentDictionary<string, Func<...>>`, or guard all
access with a lock. For the lazy-compile path use `GetOrAdd` with a `Lazy<T>` value so
concurrent first-callers compile at most once; the `GetOrAdd` value factory on its own
is not run under a lock and may execute more than once under contention.

**Resolution**

_Unresolved._

### InboundAPI-002 — Lazy compilation is a check-then-act race with no atomicity

| | |
|--|--|
| Severity | Medium |
| Category | Concurrency & thread safety |
| Status | Open |
| Location | `src/ScadaLink.InboundAPI/InboundScriptExecutor.cs:123-129` |

**Description**

`ExecuteAsync` does `if (!_scriptHandlers.TryGetValue(...)) { CompileAndRegister(method); handler = _scriptHandlers[method.Name]; }`.
Even setting aside the unsynchronized dictionary (InboundAPI-001), this is a
check-then-act sequence: between `TryGetValue` failing and the re-read on line 128,
another thread could `RemoveHandler` the entry, causing the indexer on line 128 to
throw `KeyNotFoundException`. That exception has no dedicated handling; it is caught
only by the broad catch on line 143 and reported to the caller as "Internal script
error". Multiple concurrent first-callers will also each compile the same script
redundantly (wasted Roslyn work).

**Recommendation**

Make compile-and-fetch a single atomic operation (`ConcurrentDictionary.GetOrAdd`
with a lazily-evaluated factory, or a per-method lock), and have `CompileAndRegister`
return the handler it produced rather than requiring a separate dictionary read.
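
A minimal sketch of the atomic compile-and-fetch described above, assuming a simplified handler shape. `HandlerCache`, `GetOrCompile`, and the `Func<string, string>` handler type are hypothetical stand-ins, not the module's actual types; the compile delegate stands in for the Roslyn step.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

// Demo: 32 concurrent first-callers trigger exactly one compilation.
var compileCount = 0;
var cache = new HandlerCache(name =>
{
    Interlocked.Increment(ref compileCount);   // stands in for the expensive compile
    return input => $"{name}:{input}";
});

Parallel.For(0, 32, _ => cache.GetOrCompile("getStatus"));
Console.WriteLine(compileCount);                          // 1
Console.WriteLine(cache.GetOrCompile("getStatus")("x"));  // getStatus:x

// Hypothetical stand-in for the executor's handler cache.
public sealed class HandlerCache
{
    private readonly ConcurrentDictionary<string, Lazy<Func<string, string>>> _handlers = new();
    private readonly Func<string, Func<string, string>> _compile;

    public HandlerCache(Func<string, Func<string, string>> compile) => _compile = compile;

    // GetOrAdd may construct several Lazy wrappers under contention, but only one is
    // stored, and ExecutionAndPublication runs that wrapper's factory exactly once.
    public Func<string, string> GetOrCompile(string methodName) =>
        _handlers.GetOrAdd(
            methodName,
            name => new Lazy<Func<string, string>>(
                () => _compile(name),
                LazyThreadSafetyMode.ExecutionAndPublication)).Value;

    // Removal is atomic too; a later GetOrCompile simply compiles again.
    public bool Remove(string methodName) => _handlers.TryRemove(methodName, out _);
}
```

Because the caller receives the handler directly from `GetOrCompile`, there is no second dictionary read for a concurrent removal to race against. Note that in this mode `Lazy<T>` also caches an exception thrown by the factory, so a throwing compile step would need explicit handling.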

**Resolution**

_Unresolved._

### InboundAPI-003 — API key compared with non-constant-time string equality

| | |
|--|--|
| Severity | High |
| Category | Security |
| Status | Open |
| Location | `src/ScadaLink.ConfigurationDatabase/Repositories/InboundApiRepository.cs:22-23`, consumed by `src/ScadaLink.InboundAPI/ApiKeyValidator.cs:33` |

**Description**

API-key authentication resolves the key with
`FirstOrDefaultAsync(k => k.KeyValue == keyValue)`, an ordinary equality match
translated to a SQL `WHERE KeyValue = @p` comparison. The secret is matched with
ordinary (early-exit) string/SQL comparison rather than a constant-time comparison,
which is a classic timing side-channel for secret material. Combined with the design's
explicit "no rate limiting" decision, an attacker with network access to the central
API can mount a timing attack to recover valid keys. The API key is the *sole*
credential for the inbound API, so this is the primary authentication path.

**Recommendation**

Look the key up by a non-secret indexed identifier (e.g. a key prefix/id) or fetch
candidate rows, then verify the secret in-process using
`CryptographicOperations.FixedTimeEquals` over the UTF-8 bytes. Preferably store only
a salted hash of the key value and compare hashes. Avoid leaking secret-length and
match-position timing.
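
The in-process verification could look roughly like the sketch below. `ApiKeyHasher` is a hypothetical helper, and plain SHA-256 is used only to keep the example short; the salting (or a keyed hash) recommended above is omitted and would be layered on in practice. Hashing first also gives both inputs to `FixedTimeEquals` the same fixed length.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

var stored = ApiKeyHasher.Hash("k-12345-secret");   // what the database row would hold
Console.WriteLine(ApiKeyHasher.Matches("k-12345-secret", stored)); // True
Console.WriteLine(ApiKeyHasher.Matches("k-12345-secreT", stored)); // False

// Hypothetical helper: hash the presented key, then compare in constant time.
public static class ApiKeyHasher
{
    public static byte[] Hash(string keyValue) =>
        SHA256.HashData(Encoding.UTF8.GetBytes(keyValue));

    public static bool Matches(string presentedKey, ReadOnlySpan<byte> storedHash)
    {
        Span<byte> presentedHash = stackalloc byte[32];   // SHA-256 output size
        SHA256.HashData(Encoding.UTF8.GetBytes(presentedKey), presentedHash);

        // FixedTimeEquals does not exit early on the first differing byte.
        return CryptographicOperations.FixedTimeEquals(presentedHash, storedHash);
    }
}
```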

**Resolution**

_Unresolved._

### InboundAPI-004 — Client disconnect is misreported as a script timeout

| | |
|--|--|
| Severity | Medium |
| Category | Error handling & resilience |
| Status | Open |
| Location | `src/ScadaLink.InboundAPI/InboundScriptExecutor.cs:117-141` |

**Description**

`ExecuteAsync` creates a linked CTS from `httpContext.RequestAborted` and the method
timeout, then catches `OperationCanceledException` and unconditionally returns
"Script execution timed out". When the *client* aborts the request (`RequestAborted`
fires), the same exception type is thrown, so a normal client disconnect is logged as
a timeout (`_logger.LogWarning("Script execution timed out ...")`) and an attempt is
made to write a 500 timeout body to an already-gone connection. This pollutes the
failure log (which the design says is reserved for genuine script errors) and obscures
real timeout incidents.

**Recommendation**

Distinguish the two cancellation sources: if `cancellationToken` (the request token)
is cancelled, treat it as a client abort: do not log a timeout and do not attempt to
write a response. Only when the timeout CTS fired should the result be "timed out".
Check `cts.Token.IsCancellationRequested && !cancellationToken.IsCancellationRequested`
or use a dedicated timeout `CancellationTokenSource` so the two are separable.
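
One way to keep the two sources separable is a dedicated timeout source linked to the request token, with exception filters deciding which one fired. This is a sketch only: `ScriptRunner` and `RunOutcome` are hypothetical names, and the script is reduced to a delegate taking a token.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Timeout case: the request token never fires, the 50 ms timeout does.
Console.WriteLine(await ScriptRunner.RunAsync(
    ct => Task.Delay(Timeout.Infinite, ct), TimeSpan.FromMilliseconds(50), CancellationToken.None));
// prints TimedOut

// Client-abort case: the request token is already cancelled, the timeout is far away.
using var aborted = new CancellationTokenSource();
aborted.Cancel();
Console.WriteLine(await ScriptRunner.RunAsync(
    ct => Task.Delay(Timeout.Infinite, ct), TimeSpan.FromSeconds(30), aborted.Token));
// prints ClientAborted

public enum RunOutcome { Completed, TimedOut, ClientAborted }

public static class ScriptRunner
{
    public static async Task<RunOutcome> RunAsync(
        Func<CancellationToken, Task> script, TimeSpan timeout, CancellationToken requestAborted)
    {
        using var timeoutCts = new CancellationTokenSource(timeout);
        using var linked = CancellationTokenSource.CreateLinkedTokenSource(
            requestAborted, timeoutCts.Token);
        try
        {
            await script(linked.Token);
            return RunOutcome.Completed;
        }
        // Checked first: if the client is gone, there is nobody to send a timeout to.
        catch (OperationCanceledException) when (requestAborted.IsCancellationRequested)
        {
            return RunOutcome.ClientAborted;
        }
        catch (OperationCanceledException) when (timeoutCts.IsCancellationRequested)
        {
            return RunOutcome.TimedOut;
        }
    }
}
```

A caller would log and respond only for `TimedOut`; for `ClientAborted` it would return without writing a body. A cancellation raised by neither token is deliberately left to propagate.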

**Resolution**

_Unresolved._

### InboundAPI-005 — Compiled API scripts run with no script-trust-model enforcement

| | |
|--|--|
| Severity | High |
| Category | Security |
| Status | Open |
| Location | `src/ScadaLink.InboundAPI/InboundScriptExecutor.cs:56-93` |

**Description**

CLAUDE.md's Akka.NET conventions state the script trust model forbids `System.IO`,
`Process`, `Threading`, `Reflection`, and raw network access. `CompileAndRegister`
compiles arbitrary C# with `CSharpScript.Create` and only restricts the *default
imports* (`WithImports("System", ...)`). Imports are a convenience, not a sandbox: a
script can still fully-qualify any type (`System.IO.File.Delete(...)`,
`System.Diagnostics.Process.Start(...)`, `System.Reflection`, raw `Socket`) because
the core framework assemblies are referenced and Roslyn scripting performs no API
allow/deny-listing. Inbound API scripts execute on the central node with the host
process's privileges, so a malicious or buggy method definition has full host access.
Note the Design role authors these scripts (less trusted than Admin), making
enforcement material.

**Recommendation**

Add a compile-time analyzer/`SyntaxWalker` (as the Site Runtime does for instance
scripts) that rejects forbidden namespaces/types before registering a handler, and/or
run scripts under a constrained boundary. At minimum, share the Site Runtime's
forbidden-API checker so the trust model is enforced consistently. Reject the method
(and log) when a violation is found instead of registering it.
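
As a rough illustration of the `SyntaxWalker` approach, the sketch below flags dotted names that start with a denied namespace. It assumes the `Microsoft.CodeAnalysis.CSharp` package is referenced; `ForbiddenApiChecker` and its deny-list are hypothetical and are not the Site Runtime's actual checker. It is purely syntactic, so aliases, `global::` prefixes, or unusual whitespace could evade it; a semantic-model check would be sturdier.

```csharp
using System;
using System.Collections.Generic;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

// Demo: list the denied namespaces a script touches.
var violations = ForbiddenApiChecker.Check("System.IO.File.Delete(\"/tmp/x\"); return 1;");
Console.WriteLine(string.Join(", ", violations));

// Hypothetical checker: walks the script's syntax tree before it is compiled.
public sealed class ForbiddenApiChecker : CSharpSyntaxWalker
{
    // Illustrative deny-list only; the real list belongs to the trust-model definition.
    private static readonly string[] Forbidden =
    {
        "System.IO", "System.Diagnostics.Process", "System.Reflection", "System.Net.Sockets",
    };

    private readonly HashSet<string> _hits = new();

    public static IReadOnlyCollection<string> Check(string scriptCode)
    {
        var options = new CSharpParseOptions(kind: SourceCodeKind.Script);
        var root = CSharpSyntaxTree.ParseText(scriptCode, options).GetRoot();
        var checker = new ForbiddenApiChecker();
        checker.Visit(root);
        return checker._hits;
    }

    // Type positions and using directives, e.g. `System.IO.FileInfo f` or `using System.IO;`.
    public override void VisitQualifiedName(QualifiedNameSyntax node)
    {
        Record(node.ToString());
        base.VisitQualifiedName(node);
    }

    // Expression positions, e.g. `System.IO.File.Delete(...)`.
    public override void VisitMemberAccessExpression(MemberAccessExpressionSyntax node)
    {
        Record(node.ToString());
        base.VisitMemberAccessExpression(node);
    }

    private void Record(string dottedName)
    {
        foreach (var ns in Forbidden)
        {
            if (dottedName == ns || dottedName.StartsWith(ns + ".", StringComparison.Ordinal))
                _hits.Add(ns);
        }
    }
}
```

A registration path would call `Check` first and refuse to register (and log) when the returned collection is non-empty.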

**Resolution**

_Unresolved._

### InboundAPI-006 — No request body size limit on the inbound endpoint

| | |
|--|--|
| Severity | Medium |
| Category | Security |
| Status | Open |
| Location | `src/ScadaLink.InboundAPI/EndpointExtensions.cs:54-62` |

**Description**

`HandleInboundApiRequest` calls `JsonDocument.ParseAsync(httpContext.Request.Body, ...)`
with no explicit body-size cap and no `[RequestSizeLimit]`/endpoint metadata. Although
Kestrel has a default max request body size, this endpoint accepts arbitrary JSON from
external systems, fully buffers it into a `JsonDocument`, and then `Clone()`s the
root element (`:61`) which materializes the entire document on the heap. With no rate
limiting (a deliberate design choice) a single caller can drive large allocations.
Deep/wide JSON also makes the `CoerceValue` `object`/`list` deserialization
(`ParameterValidator.cs:113,117`) expensive.

**Recommendation**

Set an explicit, modest body-size limit on the endpoint
(`.WithMetadata(new RequestSizeLimitAttribute(...))` or
`IHttpMaxRequestBodySizeFeature`) and consider a `JsonDocumentOptions` `MaxDepth`.
Reject oversized bodies with 413 before buffering.
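
Independently of the server-level limit, the handler itself can bound both size and depth. The sketch below is framework-free so it runs on a plain `Stream`; `BoundedJson`, `ParseStatus`, and the chosen limits are hypothetical, and mapping `TooLarge` to 413 is left to the endpoint.

```csharp
using System;
using System.IO;
using System.Text;
using System.Text.Json;
using System.Threading.Tasks;

var small = new MemoryStream(Encoding.UTF8.GetBytes("{\"a\":1}"));
Console.WriteLine((await BoundedJson.TryParseAsync(small, maxBytes: 1024, maxDepth: 8)).Status); // Ok

var big = new MemoryStream(new byte[4096]);
Console.WriteLine((await BoundedJson.TryParseAsync(big, maxBytes: 1024, maxDepth: 8)).Status);   // TooLarge

public enum ParseStatus { Ok, TooLarge, Malformed }

public static class BoundedJson
{
    public static async Task<(ParseStatus Status, JsonElement Root)> TryParseAsync(
        Stream body, int maxBytes, int maxDepth)
    {
        // Read at most maxBytes + 1 bytes: enough to detect an oversized body
        // without ever buffering more than the cap allows.
        var buffer = new byte[maxBytes + 1];
        var total = 0;
        int read;
        while (total < buffer.Length &&
               (read = await body.ReadAsync(buffer.AsMemory(total))) > 0)
        {
            total += read;
        }
        if (total > maxBytes) return (ParseStatus.TooLarge, default);

        try
        {
            // MaxDepth makes the parser throw on excessively nested input.
            var options = new JsonDocumentOptions { MaxDepth = maxDepth };
            using var doc = JsonDocument.Parse(new ReadOnlyMemory<byte>(buffer, 0, total), options);
            return (ParseStatus.Ok, doc.RootElement.Clone());
        }
        catch (JsonException)
        {
            return (ParseStatus.Malformed, default);
        }
    }
}
```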

**Resolution**

_Unresolved._

### InboundAPI-007 — `Database.Connection()` script API from the design doc is not implemented

| | |
|--|--|
| Severity | Medium |
| Category | Design-document adherence |
| Status | Open |
| Location | `src/ScadaLink.InboundAPI/InboundScriptExecutor.cs:155-170` |

**Description**

`Component-InboundAPI.md` ("Script Runtime API -> Database Access") specifies
`Database.Connection("connectionName")` as an available script capability for
querying the configuration/machine-data databases. `InboundScriptContext` exposes only
`Parameters`, `Route`, and `CancellationToken`; there is no `Database` member. Any
method script that follows the documented API will fail to compile. Either the code
is incomplete or the design doc is stale; the two must be reconciled.

**Recommendation**

If database access is in scope, add a `Database` property to `InboundScriptContext`
backed by a connection-factory service. If it is not, remove the "Database Access"
section from `Component-InboundAPI.md` so the design doc stops advertising an absent
API.

**Resolution**

_Unresolved._

### InboundAPI-008 — Inbound API endpoint not restricted to the active central node

| | |
|--|--|
| Severity | Medium |
| Category | Design-document adherence |
| Status | Open |
| Location | `src/ScadaLink.InboundAPI/EndpointExtensions.cs:19-23`, `src/ScadaLink.Host/Program.cs:149` |

**Description**

The design states the Inbound API is "Central cluster only (active node)" and "fails
over with it". `MapInboundAPI` registers `POST /api/{methodName}` unconditionally, and
`Program.cs` maps it inside the central-role branch but with no active-node gating,
unlike `/health/active`, which has an `active-node` predicate. A standby central node
will therefore serve inbound API calls, executing scripts and `Route.To()` calls from a
non-leader, which can race the active node or run against stale singleton state.

**Recommendation**

Gate the endpoint on active-node status (reuse the cluster `active-node` health check
or a leader-state check) and return 503 on the standby, so Traefik/clients only reach
the live node, consistent with how the Management API and `/health/active` are
treated.

**Resolution**

_Unresolved._

### InboundAPI-009 — Failed compilation is retried on every subsequent request

| | |
|--|--|
| Severity | Low |
| Category | Performance & resource management |
| Status | Open |
| Location | `src/ScadaLink.InboundAPI/InboundScriptExecutor.cs:123-128` |

**Description**

When a method's script fails to compile, `CompileAndRegister` returns `false` and
nothing is stored in `_scriptHandlers`. Every subsequent call to that method re-enters
the lazy-compile branch and recompiles the broken script via Roslyn from scratch.
Roslyn compilation is expensive; a single broken method definition repeatedly invoked
by an external caller (no rate limiting) becomes a CPU amplification vector.

**Recommendation**

Cache the compilation *failure* (e.g. store a sentinel handler that immediately
returns the compile error, or keep a `HashSet` of known-bad method names with the
diagnostic) so a broken script is compiled at most once until the definition is
updated via `CompileAndRegister`.
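
A small sketch of caching the outcome, success or failure, keyed by method name and invalidated when the script text changes. `CompileCache` and `CompileOutcome` are hypothetical, the compile delegate stands in for Roslyn, and the diagnostic string is made up for the demo.

```csharp
#nullable enable
using System;
using System.Collections.Concurrent;

var attempts = 0;
var cache = new CompileCache(script =>
{
    attempts++;   // counts how often the (stand-in) compiler actually runs
    return script.Contains("syntax error")
        ? CompileOutcome.Failure("illustrative diagnostic")
        : CompileOutcome.Success(p => p.ToUpperInvariant());
});

cache.Get("broken", "syntax error here");
cache.Get("broken", "syntax error here");
Console.WriteLine(attempts);   // 1: the failure was cached
cache.Get("broken", "return p;");
Console.WriteLine(attempts);   // 2: a changed definition is compiled again

// Either a handler or the error explaining why there is none.
public sealed record CompileOutcome(Func<string, string>? Handler, string? Error)
{
    public static CompileOutcome Success(Func<string, string> handler) => new(handler, null);
    public static CompileOutcome Failure(string error) => new(null, error);
}

public sealed class CompileCache
{
    private readonly ConcurrentDictionary<string, (string Script, CompileOutcome Outcome)> _entries = new();
    private readonly Func<string, CompileOutcome> _compile;

    public CompileCache(Func<string, CompileOutcome> compile) => _compile = compile;

    public CompileOutcome Get(string methodName, string script)
    {
        // A cached entry is reused only while the script text is unchanged.
        if (_entries.TryGetValue(methodName, out var entry) && entry.Script == script)
            return entry.Outcome;

        var outcome = _compile(script);
        _entries[methodName] = (script, outcome);
        return outcome;
    }
}
```

The lookup-then-store here is not atomic, so two simultaneous first calls could still both compile; it only guarantees that the failure is remembered afterwards.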

**Resolution**

_Unresolved._

### InboundAPI-010 — `ParameterValidator` ignores extra body fields and cannot validate Object/List element types

| | |
|--|--|
| Severity | Low |
| Category | Correctness & logic bugs |
| Status | Open |
| Location | `src/ScadaLink.InboundAPI/ParameterValidator.cs:64-90`, `:112-118` |

**Description**

Two related correctness gaps: (1) The validator iterates only over *defined*
parameters; any extra top-level fields in the request body are silently ignored
rather than reported, so callers get no feedback on typo'd parameter names. (2) For
`Object` and `List` types the validator only checks the JSON *kind* (`Object`/`Array`)
and then blindly `JsonSerializer.Deserialize`s the raw text. The design's extended
type system describes Objects as "named structure with typed fields" and Lists as
collections "of objects or primitive types", but no field-level or element-level type
validation is performed. Invalid nested structures pass validation and surface only
as runtime script errors.

**Recommendation**

Optionally warn/400 on unexpected body fields. For the extended types, either parse a
richer `ParameterDefinition` (with nested field definitions / element type) and
validate recursively, or document explicitly that Object/List are validated only for
shape, and update the design doc to match.
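
Detecting unexpected top-level fields needs only a set difference between the body's property names and the defined parameter names. A sketch, assuming case-sensitive names; `BodyFieldCheck` and the sample field names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.Json;

// "sitId" is a typo for the defined "siteId" and should be reported.
using var doc = JsonDocument.Parse("{\"siteId\":7,\"sitId\":8}");
var unexpected = BodyFieldCheck.FindUnexpected(doc.RootElement, new[] { "siteId", "tag" });
Console.WriteLine(string.Join(",", unexpected));   // sitId

public static class BodyFieldCheck
{
    // Returns the top-level property names that no parameter definition declares.
    public static IReadOnlyList<string> FindUnexpected(
        JsonElement body, IEnumerable<string> definedNames)
    {
        if (body.ValueKind != JsonValueKind.Object) return Array.Empty<string>();

        var defined = new HashSet<string>(definedNames, StringComparer.Ordinal);
        return body.EnumerateObject()
                   .Select(property => property.Name)
                   .Where(name => !defined.Contains(name))
                   .ToList();
    }
}
```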

**Resolution**

_Unresolved._

### InboundAPI-011 — Method-existence check leaks to unapproved callers (enumeration oracle)

| | |
|--|--|
| Severity | Low |
| Category | Security |
| Status | Open |
| Location | `src/ScadaLink.InboundAPI/ApiKeyValidator.cs:39-52` |

**Description**

`ValidateAsync` returns 400 `Method '{methodName}' not found` when the method does not
exist, but 403 `API key not approved for this method` when it exists but the key is
not approved. A caller holding any valid enabled key can therefore enumerate which
method names exist on the central API by observing 400-vs-403 responses. The error
message also echoes the caller-supplied `methodName` back verbatim into the JSON
response (`EndpointExtensions.cs:47`), a minor reflected-input concern.

**Recommendation**

Return an indistinguishable response (e.g. 403/404) for both "method not found" and
"key not approved" so existence is not observable to unapproved callers. Avoid echoing
raw caller input in error bodies, or sanitize it.
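
The collapse of the two denial reasons can be as small as the sketch below. `AccessCheck`, `AccessDecision`, the choice of 403, and the message text are all illustrative assumptions; the point is only that both cases yield the same status and a fixed body with no caller input in it.

```csharp
using System;

var missing = AccessDecision.ToResponse(AccessCheck.MethodMissing);
var unapproved = AccessDecision.ToResponse(AccessCheck.KeyNotApproved);
Console.WriteLine(missing == unapproved);   // True: the caller cannot tell them apart

public enum AccessCheck { Approved, MethodMissing, KeyNotApproved }

public static class AccessDecision
{
    public static (int StatusCode, string Error) ToResponse(AccessCheck check) => check switch
    {
        AccessCheck.Approved => (200, ""),
        // Every denial maps to one status and one constant message.
        _ => (403, "Not authorized for the requested method"),
    };
}
```

The distinct reason can still be written to the server-side log, where it helps operators without informing the caller.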

**Resolution**

_Unresolved._

### InboundAPI-012 — `ParameterDefinition` POCO declared in the component project, not Commons

| | |
|--|--|
| Severity | Low |
| Category | Code organization & conventions |
| Status | Open |
| Location | `src/ScadaLink.InboundAPI/ParameterValidator.cs:128-133` |

**Description**

`ParameterDefinition` is a persistence-/contract-shaped POCO: it is the deserialized
form of `ApiMethod.ParameterDefinitions` (a column in the configuration database) and
describes the public API contract. CLAUDE.md's code-organization rules place
persistence-ignorant entity/contract types in `ScadaLink.Commons`. Defining it inside
the InboundAPI project means any other component that needs to read or produce method
parameter definitions (e.g. Central UI's method editor, CLI, Management Service)
cannot share the type and will duplicate it.

**Recommendation**

Move `ParameterDefinition` (and a matching return-definition type, if added) to
`ScadaLink.Commons` under the InboundApi entity/types namespace so it is shared by all
components that work with method definitions.

**Resolution**

_Unresolved._

### InboundAPI-013 — `ApiKeyValidationResult.NotFound` factory returns HTTP 400, contradicting its name

| | |
|--|--|
| Severity | Low |
| Category | Documentation & comments |
| Status | Open |
| Location | `src/ScadaLink.InboundAPI/ApiKeyValidator.cs:78-79` |

**Description**

The static factory is named `NotFound` and is used for the "method not found" case,
but it builds a result with `StatusCode = 400` (Bad Request), not 404. The name
strongly implies 404 and will mislead future maintainers; `EndpointExtensions`
faithfully propagates whatever status code the factory sets, so the misnaming directly
affects the wire contract.

**Recommendation**

Rename the factory to match its behaviour (e.g. `BadRequest`) or change the status
code to 404 if that is the intended contract, and document the chosen "method not
found" status in `Component-InboundAPI.md`'s Error Handling section, which currently
does not list it.

**Resolution**

_Unresolved._