46 KiB
Code Review — InboundAPI
| Field | Value |
|---|---|
| Module | src/ScadaLink.InboundAPI |
| Design doc | docs/requirements/Component-InboundAPI.md |
| Status | Reviewed |
| Last reviewed | 2026-05-17 |
| Reviewer | claude-agent |
| Commit reviewed | 39d737e |
| Open findings | 0 |
Summary
The InboundAPI module is small (8 source files) and the happy-path flow — extract
key, validate, deserialize parameters, execute script, serialize result — is clean
and readable. However the review surfaced several real problems concentrated in two
themes: concurrency and security. The InboundScriptExecutor is a singleton
that mutates a plain Dictionary from concurrent ASP.NET request threads with no
synchronization, which can corrupt the handler cache or crash the process under load.
On the security side, API-key comparison is a non-constant-time database string
match (timing oracle), compiled scripts run with no enforcement of the documented
script trust model (forbidden APIs such as System.IO/Process/Reflection are
fully reachable), there is no request-body size limit, and the executor's catch-all
swallows OperationCanceledException from genuine client disconnects as a "timeout".
Design-doc adherence is also incomplete: the Database.Connection() script API
described in the design doc is entirely absent from InboundScriptContext, and the
endpoint never enforces that the API is central-only. Testing covers the validators
well but there is no coverage of the HTTP endpoint, concurrency, or recompilation.
None of the findings are data-loss-class, but the concurrency and trust-model issues
are High severity and should be addressed before production use.
Re-review 2026-05-17 (commit 39d737e)
All 13 findings from the initial review remain Resolved; the module source under
src/ScadaLink.InboundAPI is unchanged since the last InboundAPI fix commit
(8dd7412), which precedes 39d737e. This re-review re-walked all 10 checklist
categories against the resolved code and surfaced 4 new findings — none touching
the previously-fixed concurrency/trust-model code, but all in areas the first pass
did not probe deeply: (1) the ReturnDefinition column is loaded onto ApiMethod
but is never consulted — script return values are serialized verbatim with no shaping
or validation against the declared return structure (InboundAPI-014); (2) the new
ForbiddenApiChecker is a purely textual syntax walker and can be bypassed by
reaching forbidden functionality through member access that never spells a forbidden
namespace, e.g. typeof(x).Assembly.GetType("System.IO.File") (InboundAPI-015);
(3) routed Route.To().Call() invocations are not bound by the method timeout unless
the script explicitly threads Parameters-side cancellation, contradicting the design
statement that the timeout covers routed calls (InboundAPI-016); and (4) RouteHelper
/ RouteTarget — the entire WP-4 cross-site routing surface — has no test coverage
(InboundAPI-017). New findings are one Medium-trio plus one Low; no Critical or High.
Checklist coverage
| # | Category | Examined | Notes |
|---|---|---|---|
| 1 | Correctness & logic bugs | ☑ | CoerceValue returns null for legitimately-null/String values indistinguishably; parameter-definition edge cases noted. |
| 2 | Akka.NET conventions | ☑ | Module is ASP.NET-hosted, no actors of its own; routes to actors via CommunicationService. No correlation-ID issues — IDs are set in RouteHelper. |
| 3 | Concurrency & thread safety | ☑ | Singleton InboundScriptExecutor mutates a non-thread-safe Dictionary from concurrent request threads — see InboundAPI-001/002. |
| 4 | Error handling & resilience | ☑ | Catch-all conflates client cancellation with timeout (InboundAPI-004); compilation-failure path repeats work on every request (InboundAPI-009). |
| 5 | Security | ☑ | Prior items resolved. Re-review: ForbiddenApiChecker is a textual deny-list bypassable via reflection without a forbidden namespace token (InboundAPI-015). |
| 6 | Performance & resource management | ☑ | Up to 3 separate DB round-trips per request in ApiKeyValidator; uncapped lazy recompilation. |
| 7 | Design-document adherence | ☑ | Re-review: ReturnDefinition loaded but never used (InboundAPI-014); routed-call timeout not enforced (InboundAPI-016). Prior Database.Connection()/central-only items resolved. |
| 8 | Code organization & conventions | ☑ | ParameterDefinition moved to Commons (InboundAPI-012 resolved); no new issues. |
| 9 | Testing coverage | ☑ | Re-review: RouteHelper/RouteTarget (WP-4 routing) entirely untested (InboundAPI-017); validators/executor/filter well covered. |
| 10 | Documentation & comments | ☑ | ApiKeyValidationResult.NotFound XML/name says "NotFound" but returns HTTP 400 — misleading (InboundAPI-013). |
Findings
InboundAPI-001 — Singleton script handler cache mutated without synchronization
| Severity | High |
| Category | Concurrency & thread safety |
| Status | Resolved |
| Location | src/ScadaLink.InboundAPI/InboundScriptExecutor.cs:17, :32, :40, :89, :123-128 |
Description
InboundScriptExecutor is registered as a singleton (ServiceCollectionExtensions.cs:11)
and its handler cache is a plain Dictionary<string, Func<...>> (InboundScriptExecutor.cs:17).
RegisterHandler, RemoveHandler, CompileAndRegister, and the lazy-compile path in
ExecuteAsync all read and write this dictionary with no lock. ASP.NET serves inbound
API requests on concurrent thread-pool threads, so two requests for an as-yet-uncompiled
method (or a request racing a CLI-triggered CompileAndRegister) can mutate the
dictionary concurrently. Dictionary is explicitly not safe for concurrent
read/write — this can corrupt internal buckets, throw InvalidOperationException,
or return a torn/null handler, crashing the request or the process.
Recommendation
Replace the Dictionary with a ConcurrentDictionary<string, Func<...>>, or guard all
access with a lock. For the lazy-compile path use GetOrAdd so concurrent first-callers
compile at most once.
Resolution
Resolved 2026-05-16 (commit <pending>): replaced the plain Dictionary handler
cache with a ConcurrentDictionary; RemoveHandler now uses TryRemove; the
lazy-compile path in ExecuteAsync compiles outside the cache and inserts atomically
via GetOrAdd so concurrent first-callers share one handler. Regression tests
ConcurrentLazyCompile_SameMethod_DoesNotCorruptCache and
ConcurrentRegisterAndExecute_DoesNotThrow added.
InboundAPI-002 — Lazy compilation is a check-then-act race with no atomicity
| Severity | Medium — re-triaged: already fixed by the InboundAPI-001 fix; verified and closed |
| Category | Concurrency & thread safety |
| Status | Resolved |
| Location | src/ScadaLink.InboundAPI/InboundScriptExecutor.cs:152-161 |
Description
ExecuteAsync does if (!_scriptHandlers.TryGetValue(...)) { CompileAndRegister(method); handler = _scriptHandlers[method.Name]; }.
Even setting aside the unsynchronized dictionary (InboundAPI-001), this is a
check-then-act sequence: between TryGetValue failing and the re-read on line 128,
another thread could RemoveHandler the entry, causing the indexer on line 128 to
throw KeyNotFoundException — an unhandled-in-context exception that is then caught
only by the broad catch on line 143 and reported to the caller as "Internal script
error". Multiple concurrent first-callers will also each compile the same script
redundantly (wasted Roslyn work).
Recommendation
Make compile-and-fetch a single atomic operation (ConcurrentDictionary.GetOrAdd
with a lazily-evaluated factory, or a per-method lock), and have CompileAndRegister
return the handler it produced rather than requiring a separate dictionary read.
Resolution
Resolved 2026-05-16 (commit pending): re-triage — verified against the current
source, this finding was already fixed by the InboundAPI-001 fix. The
InboundScriptExecutor.cs:152-161 lazy-compile path no longer does check-then-act
re-read: Compile(method) runs unconditionally (it never reads the cache) and the
result is published via the atomic _scriptHandlers.GetOrAdd(method.Name, compiled).
There is no separate dictionary indexer read, so the KeyNotFoundException race the
finding describes cannot occur, and concurrent first-callers all share the single
handler that GetOrAdd keeps. Regression test
LazyCompile_RacingRemoveHandler_NeverThrowsKeyNotFound added (asserts a concurrent
RemoveHandler storm against lazy-compiling callers never yields the catch-all
"Internal script error"); it passes against the current code, confirming the fix.
InboundAPI-003 — API key compared with non-constant-time string equality
| Severity | High |
| Category | Security |
| Status | Resolved |
| Location | src/ScadaLink.ConfigurationDatabase/Repositories/InboundApiRepository.cs:22-23, consumed by src/ScadaLink.InboundAPI/ApiKeyValidator.cs:33 |
Description
API-key authentication resolves the key with
FirstOrDefaultAsync(k => k.KeyValue == keyValue) — an ordinary equality match
translated to a SQL WHERE KeyValue = @p comparison. The secret is matched with
ordinary (early-exit) string/SQL comparison rather than a constant-time comparison,
which is a classic timing side-channel for secret material. Combined with the design's
explicit "no rate limiting" decision, an attacker with network access to the central
API can mount a timing attack to recover valid keys. The API key is the sole
credential for the inbound API, so this is the primary authentication path.
Recommendation
Look the key up by a non-secret indexed identifier (e.g. a key prefix/id) or fetch
candidate rows, then verify the secret in-process using
CryptographicOperations.FixedTimeEquals over the UTF-8 bytes. Preferably store only
a salted hash of the key value and compare hashes. Avoid leaking secret-length and
match-position timing.
Resolution
Resolved 2026-05-16 (commit <pending>): ApiKeyValidator no longer calls the
secret-equality lookup GetApiKeyByValueAsync (the SQL WHERE KeyValue = @secret
timing oracle). It now fetches all keys via GetAllApiKeysAsync and matches the
secret in-process with CryptographicOperations.FixedTimeEquals over the UTF-8 bytes,
scanning every candidate so neither match position nor secret length is observable.
Regression tests ValidateAsync_DoesNotUseSecretEqualityLookup,
ValidateAsync_WrongKey_ConstantTimePath_Returns401, and
ValidateAsync_KeyOfDifferentLength_Returns401 added. Note: the timing-oracle method
GetApiKeyByValueAsync remains on IInboundApiRepository (it is outside this module);
removing it from the repository is left as separate follow-up since the validator no
longer depends on it.
InboundAPI-004 — Client disconnect is misreported as a script timeout
| Severity | Medium |
| Category | Error handling & resilience |
| Status | Resolved |
| Location | src/ScadaLink.InboundAPI/InboundScriptExecutor.cs:117-141 |
Description
ExecuteAsync creates a linked CTS from httpContext.RequestAborted and the method
timeout, then catches OperationCanceledException and unconditionally returns
"Script execution timed out". When the client aborts the request (RequestAborted
fires), the same exception type is thrown, so a normal client disconnect is logged as
a timeout (_logger.LogWarning("Script execution timed out ...")) and an attempt is
made to write a 500 timeout body to an already-gone connection. This pollutes the
failure log (which the design says is reserved for genuine script errors) and obscures
real timeout incidents.
Recommendation
Distinguish the two cancellation sources: if cancellationToken (the request token)
is cancelled, treat it as a client abort — do not log a timeout and do not attempt to
write a response. Only when the timeout CTS fired should the result be "timed out".
Check cts.Token.IsCancellationRequested && !cancellationToken.IsCancellationRequested
or use a dedicated timeout CancellationTokenSource so the two are separable.
Resolution
Resolved 2026-05-16 (commit pending): ExecuteAsync now uses a dedicated timeout
CancellationTokenSource (new CancellationTokenSource(timeout)) linked with the
request-abort token, so the two cancellation sources are separable. The
OperationCanceledException handler reports "Script execution timed out" (and logs a
warning) only when the timeout CTS fired and the request token did not; a client
abort instead returns "Request cancelled by client" and logs at Debug — the failure
log stays reserved for genuine script-execution timeouts. HandleInboundApiRequest
additionally short-circuits with Results.Empty (no warning log, no 500 body write)
when RequestAborted is cancelled, since the connection is already gone. Regression
tests ClientDisconnect_IsNotReportedAsTimeout and GenuineTimeout_StillReportedAsTimeout
added.
InboundAPI-005 — Compiled API scripts run with no script-trust-model enforcement
| Severity | High |
| Category | Security |
| Status | Resolved |
| Location | src/ScadaLink.InboundAPI/InboundScriptExecutor.cs:56-93 |
Description
CLAUDE.md's Akka.NET conventions state the script trust model forbids System.IO,
Process, Threading, Reflection, and raw network access. CompileAndRegister
compiles arbitrary C# with CSharpScript.Create and only restricts the default
imports (WithImports("System", ...)). Imports are a convenience, not a sandbox — a
script can still fully-qualify any type (System.IO.File.Delete(...),
System.Diagnostics.Process.Start(...), System.Reflection, raw Socket) because
the core framework assemblies are referenced and Roslyn scripting performs no API
allow/deny-listing. Inbound API scripts execute on the central node with the host
process's privileges, so a malicious or buggy method definition has full host access.
Note the Design role authors these scripts (less trusted than Admin), making
enforcement material.
Recommendation
Add a compile-time analyzer/SyntaxWalker (as the Site Runtime does for instance
scripts) that rejects forbidden namespaces/types before registering a handler, and/or
run scripts under a constrained boundary. At minimum, share the Site Runtime's
forbidden-API checker so the trust model is enforced consistently. Reject the method
(and log) when a violation is found instead of registering it.
Resolution
Resolved 2026-05-16 (commit <pending>): added ForbiddenApiChecker, a Roslyn
CSharpSyntaxWalker that statically rejects scripts referencing forbidden namespaces
(System.IO, System.Diagnostics, System.Threading except Tasks,
System.Reflection, System.Net, System.Runtime.InteropServices, Microsoft.Win32)
whether reached via a using directive or a fully-qualified name. CompileAndRegister
now runs the check before Roslyn compilation and refuses to register (and logs) a
violating method; ExecuteAsync's lazy-compile path is gated by the same check.
Regression tests CompileAndRegister_ForbiddenApi_RejectsScript (theory),
ExecuteAsync_ForbiddenApiScript_DoesNotRunAndReturnsFailure, and
CompileAndRegister_PermittedScript_StillRegisters added.
InboundAPI-006 — No request body size limit on the inbound endpoint
| Severity | Medium |
| Category | Security |
| Status | Resolved |
| Location | src/ScadaLink.InboundAPI/EndpointExtensions.cs:54-62 |
Description
HandleInboundApiRequest calls JsonDocument.ParseAsync(httpContext.Request.Body, ...)
with no explicit body-size cap and no [RequestSizeLimit]/endpoint metadata. Although
Kestrel has a default max request body size, this endpoint accepts arbitrary JSON from
external systems, fully buffers it into a JsonDocument, and then Clone()s the
root element (:61) which materializes the entire document on the heap. With no rate
limiting (a deliberate design choice) a single caller can drive large allocations.
Deep/wide JSON also makes the CoerceValue object/list deserialization
(ParameterValidator.cs:113,117) expensive.
Recommendation
Set an explicit, modest body-size limit on the endpoint
(.WithMetadata(new RequestSizeLimitAttribute(...)) or
IHttpMaxRequestBodySizeFeature) and consider a JsonDocumentOptions MaxDepth.
Reject oversized bodies with 413 before buffering.
Resolution
Resolved 2026-05-16 (commit pending): added InboundApiEndpointFilter, an
IEndpointFilter applied to POST /api/{methodName} via .AddEndpointFilter<>().
It rejects requests whose declared Content-Length exceeds InboundApiOptions. MaxRequestBodyBytes (default 1 MiB) with HTTP 413 before the handler buffers the
body into a JsonDocument, and also lowers the per-request IHttpMaxRequestBodySizeFeature
cap so a chunked/unknown-length stream is cut off by Kestrel while being read. The
limit is configurable via the bound ScadaLink:InboundApi options section. Regression
tests OversizedBody_ShortCircuitsWith413_AndDoesNotRunHandler, BodyAtLimit_RunsHandler,
and FilterCapsMaxRequestBodySizeFeature added.
InboundAPI-007 — Database.Connection() script API from the design doc is not implemented
| Severity | Medium |
| Category | Design-document adherence |
| Status | Resolved |
| Location | src/ScadaLink.InboundAPI/InboundScriptExecutor.cs:188-203 |
Description
Component-InboundAPI.md ("Script Runtime API -> Database Access") specifies
Database.Connection("connectionName") as an available script capability for
querying the configuration/machine-data databases. InboundScriptContext exposes only
Parameters, Route, and CancellationToken — there is no Database member. Any
method script that follows the documented API will fail to compile. Either the code
is incomplete or the design doc is stale; the two must be reconciled.
Recommendation
If database access is in scope, add a Database property to InboundScriptContext
backed by a connection-factory service. If it is not, remove the "Database Access"
section from Component-InboundAPI.md so the design doc stops advertising an absent
API.
Resolution
Resolved 2026-05-16 (commit <pending>). The drift was confirmed real:
InboundScriptContext (InboundScriptExecutor.cs:188-203) exposes only
Parameters, Route, and CancellationToken — there is no Database member,
so a method script following the documented Database.Connection("name") API
would fail to compile. Resolution direction: the design doc is stale, not the
code. Implementing Database.Connection() would hand inbound API scripts a
raw MS SQL client, in direct tension with the ScadaLink script trust model
(scripts are forbidden System.IO, raw network, etc.; ForbiddenApiChecker
statically enforces this). Rather than carve a hole in the trust model, the
"Database Access" section was removed from docs/requirements/Component-InboundAPI.md
and replaced with an explicit "No direct database access" note explaining that
scripts interact only through the curated Route/Parameters surfaces, and
that any future data access must go behind a dedicated scoped helper added as an
explicit design change. Code and doc now agree; no code or test change required.
InboundAPI-008 — Inbound API endpoint not restricted to the active central node
| Severity | Medium |
| Category | Design-document adherence |
| Status | Resolved |
| Location | src/ScadaLink.InboundAPI/EndpointExtensions.cs:19-23, src/ScadaLink.Host/Program.cs:149 |
Description
The design states the Inbound API is "Central cluster only (active node)" and "fails
over with it". MapInboundAPI registers POST /api/{methodName} unconditionally, and
Program.cs maps it inside the central-role branch but with no active-node gating —
unlike /health/active which has an active-node predicate. A standby central node
will happily serve inbound API calls, executing scripts and Route.To() calls from a
non-leader, which can race the active node or run against stale singleton state.
Recommendation
Gate the endpoint on active-node status (reuse the cluster active-node health check
or a leader-state check) and return 503 on the standby, so Traefik/clients only reach
the live node — consistent with how the Management API and /health/active are
treated.
Resolution
Resolved 2026-05-16 (commit pending): introduced IActiveNodeGate, an abstraction
the inbound API uses to ask whether this node is the active (cluster-leader) central
node. The new InboundApiEndpointFilter (applied to POST /api/{methodName})
consults the gate and short-circuits a standby node with HTTP 503 before any
auth/script work, so Traefik/clients only reach the live node — consistent with
/health/active. The gate is resolved optionally: when no implementation is
registered (non-clustered host / tests) the endpoint defaults to "allow", preserving
prior behaviour. Regression tests StandbyNode_ShortCircuitsWith503_AndDoesNotRunHandler,
ActiveNode_PassesGate_RunsHandler, and NoGateRegistered_PassesGate_RunsHandler
added. Follow-up (outside this module's scope): ScadaLink.Host should register
an IActiveNodeGate implementation backed by ActiveNodeHealthCheck /
Cluster.State.Leader in the central-role branch of Program.cs so the gate is
actually enforced in production; until then the endpoint defaults to "allow".
InboundAPI-009 — Failed compilation is retried on every subsequent request
| Severity | Low |
| Category | Performance & resource management |
| Status | Resolved |
| Location | src/ScadaLink.InboundAPI/InboundScriptExecutor.cs:123-128 |
Description
When a method's script fails to compile, CompileAndRegister returns false and
nothing is stored in _scriptHandlers. Every subsequent call to that method re-enters
the lazy-compile branch and recompiles the broken script via Roslyn from scratch.
Roslyn compilation is expensive; a single broken method definition repeatedly invoked
by an external caller (no rate limiting) becomes a CPU amplification vector.
Recommendation
Cache the compilation failure (e.g. store a sentinel handler that immediately
returns the compile error, or keep a HashSet of known-bad method names with the
diagnostic) so a broken script is compiled at most once until the definition is
updated via CompileAndRegister.
Resolution
Resolved 2026-05-16 (commit pending): confirmed the root cause — a failed Compile
stored nothing in _scriptHandlers, so every subsequent request re-entered the
lazy-compile branch and re-ran Roslyn. Added a _knownBadMethods ConcurrentDictionary
of method names whose compilation failed; ExecuteAsync's lazy-compile path
short-circuits before Roslyn when the method is already known-bad, and records the
failure on a fresh failed compile. CompileAndRegister also records failures and
clears the record on a successful (re)compile, so a fixed method definition is
re-evaluated. Regression tests FailedCompilation_IsNotRetriedOnEveryRequest
(asserts the compile-failure log fires exactly once across 5 requests) and
FailedCompilation_RecompilesAfterCompileAndRegisterCalledAgain added.
InboundAPI-010 — ParameterValidator ignores extra body fields and cannot validate Object/List element types
| Severity | Low |
| Category | Correctness & logic bugs |
| Status | Resolved |
| Location | src/ScadaLink.InboundAPI/ParameterValidator.cs:64-90, :112-118 |
Description
Two related correctness gaps: (1) The validator iterates only over defined
parameters; any extra top-level fields in the request body are silently ignored
rather than reported, so callers get no feedback on typo'd parameter names. (2) For
Object and List types the validator only checks the JSON kind (Object/Array)
and then blindly JsonSerializer.Deserializes the raw text — the design's extended
type system describes Objects as "named structure with typed fields" and Lists as
collections "of objects or primitive types", but no field-level or element-level type
validation is performed. Invalid nested structures pass validation and surface only
as runtime script errors.
Recommendation
Optionally warn/400 on unexpected body fields. For the extended types, either parse a
richer ParameterDefinition (with nested field definitions / element type) and
validate recursively, or document explicitly that Object/List are validated only for
shape — and update the design doc to match.
Resolution
Resolved 2026-05-16 (commit pending): both gaps addressed along the lines the
recommendation offers. (1) ParameterValidator.Validate now enumerates the request
body's top-level fields and returns an Invalid result naming any field that does not
match a defined parameter ("Unexpected parameter(s): ..."), so a typo'd parameter
name is reported instead of silently ignored. (2) For Object/List, recursive
field/element-level type validation is deliberately not added — it requires a
richer nested ParameterDefinition schema, a design decision; instead the
shape-only behaviour is now explicitly documented in the CoerceValue XML comment so
the code's contract is unambiguous. Re-triage note: the design doc
(Component-InboundAPI.md line 43) lists only Boolean/Integer/Float/String as method
parameter types — the Object/List extended types are a CLAUDE.md decision; reconciling
the design doc is out of this module's editable scope and left as a doc-owner
follow-up. Regression tests UnexpectedBodyField_ReturnsInvalid and
OnlyDefinedFields_StillValid added.
InboundAPI-011 — Method-existence check leaks to unapproved callers (enumeration oracle)
| Severity | Low |
| Category | Security |
| Status | Resolved |
| Location | src/ScadaLink.InboundAPI/ApiKeyValidator.cs:39-52 |
Description
ValidateAsync returns 400 Method '{methodName}' not found when the method does not
exist, but 403 API key not approved for this method when it exists but the key is
not approved. A caller holding any valid enabled key can therefore enumerate which
method names exist on the central API by observing 400-vs-403 responses. The error
message also echoes the caller-supplied methodName back verbatim into the JSON
response (EndpointExtensions.cs:47), a minor reflected-input concern.
Recommendation
Return an indistinguishable response (e.g. 403/404) for both "method not found" and "key not approved" so existence is not observable to unapproved callers. Avoid echoing raw caller input in error bodies, or sanitize it.
Resolution
Resolved 2026-05-16 (commit pending): confirmed — ValidateAsync returned 400
Method '{methodName}' not found for a missing method but 403
API key not approved for this method for an existing-but-unapproved one, an
enumeration oracle, and echoed the caller-supplied method name verbatim. Both cases
now return an identical response: HTTP 403 with the single shared message
API key not approved for this method (a NotApprovedMessage constant); the
method name is no longer interpolated into any error body, removing both the
existence oracle and the reflected-input concern. Regression tests
ValidKey_MethodNotFound_IsIndistinguishableFromNotApproved and
ValidKey_MethodNotFound_ErrorMessageDoesNotEchoMethodName added; the pre-existing
ValidKey_MethodNotFound_Returns400 test was updated to assert the new
indistinguishable contract.
InboundAPI-012 — ParameterDefinition POCO declared in the component project, not Commons
| Severity | Low |
| Category | Code organization & conventions |
| Status | Resolved |
| Location | src/ScadaLink.InboundAPI/ParameterValidator.cs:128-133 |
Description
ParameterDefinition is a persistence-/contract-shaped POCO: it is the deserialized
form of ApiMethod.ParameterDefinitions (a column in the configuration database) and
describes the public API contract. CLAUDE.md's code-organization rules place
persistence-ignorant entity/contract types in ScadaLink.Commons. Defining it inside
the InboundAPI project means any other component that needs to read or produce method
parameter definitions (e.g. Central UI's method editor, CLI, Management Service)
cannot share the type and will duplicate it.
Recommendation
Move ParameterDefinition (and a matching return-definition type, if added) to
ScadaLink.Commons under the InboundApi entity/types namespace so it is shared by all
components that work with method definitions.
Resolution
Resolved 2026-05-16 (commit <pending>): root cause confirmed against the source —
ParameterDefinition was a persistence-ignorant, API-contract-shaped POCO (the
deserialized form of the ApiMethod.ParameterDefinitions configuration-database
column) declared inside the component project, contrary to CLAUDE.md's
code-organization rule that such shared contract types live in ScadaLink.Commons.
The type was moved to src/ScadaLink.Commons/Types/InboundApi/ParameterDefinition.cs
(namespace ScadaLink.Commons.Types.InboundApi) — placed under Types/ with an
InboundApi domain subfolder, matching the existing Types/Scripts/ precedent, since
the column itself is the persisted form and this type is its deserialized contract
shape (not an EF-mapped entity). It remains a pure POCO with no EF attributes and no
behaviour. ParameterValidator now imports the moved type via a using ScadaLink.Commons.Types.InboundApi; directive; a tree-wide search confirmed
ParameterValidator.cs was the type's only declaration and only direct consumer (all
other ParameterDefinition* matches are the unrelated ParameterDefinitions string
property). No return-definition type exists in the codebase — only a ReturnDefinition
string column — so none was invented. No behavioural change, so no new runtime
regression test: this is a compile-level type move, and the existing 52
ScadaLink.InboundAPI.Tests (including the ParameterValidator suite) act as the
regression guard. dotnet test for ScadaLink.InboundAPI.Tests (52 passed) and
ScadaLink.Commons.Tests (226 passed) are green; dotnet build ScadaLink.slnx
succeeds with 0 warnings / 0 errors.
InboundAPI-013 — ApiKeyValidationResult.NotFound factory returns HTTP 400, contradicting its name
| Severity | Low |
| Category | Documentation & comments |
| Status | Resolved |
| Location | src/ScadaLink.InboundAPI/ApiKeyValidator.cs:78-79 |
Description
The static factory is named NotFound and is used for the "method not found" case,
but it builds a result with StatusCode = 400 (Bad Request), not 404. The name
strongly implies 404 and will mislead future maintainers; EndpointExtensions
faithfully propagates whatever status code the factory sets, so the misnaming directly
affects the wire contract.
Recommendation
Rename the factory to match its behaviour (e.g. BadRequest) or change the status
code to 404 if that is the intended contract — and document the chosen "method not
found" status in Component-InboundAPI.md's Error Handling section, which currently
does not list it.
Resolution
Resolved 2026-05-16 (commit pending): the misnamed NotFound factory (which built a
StatusCode = 400 result) was the only producer of the "method not found" result,
and the InboundAPI-011 fix made "method not found" return 403 via the existing
Forbidden factory instead. The misleading NotFound factory is therefore now
removed entirely — it has no remaining callers in or out of the module
(ApiKeyValidationResult is InboundAPI-internal), eliminating the name/behaviour
contradiction. No separate regression test is needed: the InboundAPI-011 tests cover
the new method-not-found status, and removing dead code cannot regress. Doc-owner
follow-up: Component-InboundAPI.md's Error Handling section still does not list a
"method not found" status; it should note that it is reported as 403 (indistinguishable
from "key not approved"), but that doc edit is outside this module's editable scope.
InboundAPI-014 — ReturnDefinition is loaded but never used; script return value is unshaped/unvalidated
| Severity | Medium |
| Category | Design-document adherence |
| Status | Resolved |
| Location | src/ScadaLink.InboundAPI/InboundScriptExecutor.cs:201-205, src/ScadaLink.Commons/Entities/InboundApi/ApiMethod.cs:10 |
Description
Component-InboundAPI.md ("API Method Definition → Return Value Definition" and the
"Response Format" section) specifies that each method has a declared return structure
— "Field names and data types … Supports returning lists of objects" — and that the
success response body is "the method's return value as JSON, with fields matching the
return value definition". The ApiMethod entity carries a ReturnDefinition column
to hold exactly this. However, nothing in the module ever reads ReturnDefinition:
ExecuteAsync takes whatever object the script happens to return and does a blind
JsonSerializer.Serialize(result). There is no validation that the script's return
value matches the declared shape, no coercion to the declared field types, and no
error when a method returns a structure inconsistent with its definition. A method
whose script returns the wrong shape (or null where a structure is required) will
silently emit a malformed 200 response, and the documented return-definition contract
is effectively unenforced. This is the response-side mirror of the parameter
validation that ParameterValidator does perform, leaving the two halves of the
method contract asymmetric.
Recommendation
Either (a) implement return-value validation/shaping: parse ReturnDefinition with
the same extended-type machinery used for parameters and validate/coerce the script
result before serializing, returning a 500 (or logging) when the script result does
not match; or (b) if return shaping is deliberately out of scope, remove the "Return
Value Definition" / "fields matching the return value definition" language from
Component-InboundAPI.md and document that the response is the script's raw return
value serialized as-is. Code and design doc must be reconciled.
Resolution
Resolved 2026-05-17 (commit <pending>): root cause confirmed — ApiMethod carries
ReturnDefinition but the executor did a blind JsonSerializer.Serialize(result),
so a script returning the wrong shape silently emitted a malformed 200. Took
option (a): added ReturnValueValidator, the response-side mirror of
ParameterValidator. It parses ReturnDefinition (a JSON array of {name,type}
field definitions, same extended-type set as parameters), validates the serialized
script result against it — declared fields must be present with a compatible JSON
type, primitives type-checked, Object/List shape-checked — and a null/non-object
result is rejected when a structure is declared. InboundScriptExecutor.ExecuteAsync
now runs the validator after serialization and, on mismatch, logs and returns a
script failure ("Method return value did not match its return definition", → 500)
instead of a malformed 200. A method with no ReturnDefinition stays unconstrained
(backward compatible). Doc-owner follow-up (outside this module's editable scope):
the Component-InboundAPI.md "Response Format" section may note that return shaping
is validation-only (no coercion). Regression tests: ReturnValueValidatorTests
(12 cases) plus executor-level ReturnValue_MatchingReturnDefinition_Succeeds,
ReturnValue_NotMatchingReturnDefinition_ReturnsFailureNotMalformed200, and
ReturnValue_NoReturnDefinition_IsUnconstrained.
InboundAPI-015 — ForbiddenApiChecker is purely textual and is bypassable via reflection reachable without a forbidden namespace token
| Severity | Medium |
| Category | Security |
| Status | Resolved |
| Location | src/ScadaLink.InboundAPI/ForbiddenApiChecker.cs:63-119, src/ScadaLink.InboundAPI/InboundScriptExecutor.cs:109-126 |
Description
ForbiddenApiChecker walks the script syntax tree and rejects any using directive,
QualifiedNameSyntax, or MemberAccessExpressionSyntax whose textual dotted name
starts with a forbidden namespace prefix (System.IO, System.Diagnostics,
System.Reflection, System.Net, etc.). This is a textual match, not a semantic
one, and the trust model it enforces (per InboundAPI-005) is explicitly meant to keep
untrusted Design-role scripts away from host APIs. The check can be bypassed because
forbidden functionality is reachable through member access that never spells a
forbidden namespace:
typeof(string).Assembly.GetType("System.IO.File")—typeof(string)is permitted,.Assemblyis aSystem.Typeproperty,.GetType(string)is aSystem.Reflection.Assemblymethod. The string literal"System.IO.File"is a string, not aQualifiedNameSyntaxorMemberAccessExpressionSyntax, soIsForbiddennever sees it. The script obtains aSystem.IO.FileTypeand canInvokeMember/GetMethod(...).Invoke(...)on it — all via members of permitted types — with no forbidden namespace ever appearing in the source.CompileAndRegisterreferencestypeof(object).Assembly(System.Private.CoreLib) inScriptOptions, so every framework type is loadable at runtime.- The executor also references the
Microsoft.CSharp.RuntimeBinderassembly (InboundScriptExecutor.cs:116), enabling thedynamickeyword, which further widens late-bound member access that the static walker cannot see through.
Because the inbound API script runs on the central node with the host process's privileges and is authored by the (less-trusted-than-Admin) Design role, a static textual deny-list gives a false sense of containment.
Recommendation
Treat the syntax walker as defence-in-depth, not the boundary. Strengthen it where
cheap (flag Assembly.GetType, Type.GetType, Activator.CreateInstance,
InvokeMember, and dynamic usage), but for real enforcement run compiled scripts
under a genuine boundary — a restricted AssemblyLoadContext/AppDomain-equivalent, a
curated reference set that does not expose reflection-to-arbitrary-type, or an
out-of-process sandbox — consistent with however the Site Runtime ultimately enforces
its instance-script trust model. At minimum, document in Component-InboundAPI.md
that the current check is best-effort and does not stop a determined script.
Resolution
Resolved 2026-05-17 (commit <pending>): the specific reflection-via-permitted-member
vector was confirmed and the textual checker materially hardened against it (full
sandboxing remains a separate, larger design effort — see below). ForbiddenApiWalker
now, in addition to the namespace deny-list, rejects a curated set of reflection-gateway
member names (GetType, GetTypeInfo, Assembly, Module, CreateInstance,
InvokeMember, GetMethod(s), GetConstructor(s), GetField(s), GetProperty(ies),
GetMember(s), GetRuntimeMethod(s), MethodHandle, TypeHandle) regardless of the
receiver expression — so typeof(string).Assembly.GetType("System.IO.File") is now
caught because .Assembly and .GetType appear as accessed member names. It also
rejects a bare Activator identifier and the dynamic keyword (which widens
late-bound access the static walker cannot see through). Invoke is deliberately
not flagged so legitimate Action/Func delegate invocation still compiles —
the reflection MethodInfo.Invoke path is cut off by rejecting the GetMethod that
produces the MethodInfo. Documented limitation: this is hardened defence-in-depth,
not a true sandbox — a determined author may still find a vector the syntax walker
cannot see (e.g. via Microsoft.CSharp.RuntimeBinder internals or generics tricks).
Genuine containment needs a runtime boundary (restricted AssemblyLoadContext /
curated reference set that does not expose reflection-to-arbitrary-type / out-of-process
sandbox); that is tracked as a future design change and noted in the ForbiddenApiChecker
XML summary. Regression tests: new ForbiddenApiCheckerTests suite (19 cases) covering
the Assembly/GetType/Type.GetType/Activator.CreateInstance/InvokeMember/
GetMethod/GetTypeInfo/dynamic bypass vectors plus permitted-script and
namespace-deny-list regression guards.
InboundAPI-016 — Routed Route.To().Call() invocations are not bound by the method timeout
| Severity | Medium |
| Category | Design-document adherence |
| Status | Resolved |
| Location | src/ScadaLink.InboundAPI/RouteHelper.cs:59-152, src/ScadaLink.InboundAPI/InboundScriptExecutor.cs:177, :199 |
Description
Component-InboundAPI.md states the per-method timeout "defines the maximum time the
method is allowed to execute (including any routed calls to sites)", and the
Routing Behavior section says a routed call "blocks until the site responds or the
method-level timeout is reached". The executor builds a linked
CancellationTokenSource (cts) combining the request-abort token and a dedicated
timeout CTS, and exposes cts.Token to the script as InboundScriptContext.CancellationToken.
However, every RouteTarget method (Call, GetAttribute(s), SetAttribute(s))
takes CancellationToken cancellationToken = default and the script must explicitly
pass the context token for the routed call to honour the timeout. A natural script —
Route.To("inst").Call("doWork", parameters) — invokes the routed call with
CancellationToken.None. That request flows into CommunicationService.RouteToCallAsync
with no cancellation, so the routed call is not bounded by the method timeout at all.
The only timeout guard left is handler(context).WaitAsync(cts.Token) in
ExecuteAsync: when the method timeout fires, WaitAsync returns a cancellation to
the caller, but the underlying script Task — and the in-flight RouteToCallAsync
awaiting a remote site — keeps running orphaned with no cancellation, holding the
correlation/communication resources until the site eventually responds or its own
transport timeout (if any) fires. The design's guarantee that the method timeout
covers routed calls is therefore not met, and a slow/hung site can leak background
work past the timeout the caller was told bounds the request.
Recommendation
Make routed calls inherit the method deadline without relying on script discipline:
have RouteHelper/RouteTarget carry the executing method's CancellationToken
(injected by InboundScriptExecutor when it constructs the context, e.g. a
RouteHelper bound to cts.Token) and pass it into every CommunicationService
call by default, so Route.To("x").Call("s", p) is timeout-bounded with no token
argument. Keep the explicit-token overload for callers that want a tighter bound.
Verify RouteToCallAsync and the attribute-routing calls actually observe the token
and abandon the in-flight request when it fires.
Resolution
Resolved 2026-05-17 (commit <pending>): root cause confirmed — every RouteTarget
method took CancellationToken cancellationToken = default, so a natural script
Route.To("inst").Call("doWork", p) routed with CancellationToken.None and was not
bound by the method timeout at all. RouteHelper now carries the executing method's
deadline token: InboundScriptExecutor.ExecuteAsync calls the new
RouteHelper.WithDeadline(cts.Token) when it builds the script context, so the route
helper handed to the script is bound to the method-level timeout CTS. Each
RouteTarget method resolves an effective token — the explicitly-supplied token if
the caller passed one (tighter bound preserved), otherwise the method deadline — and
forwards it into both IInstanceLocator site resolution and the routed call. The
deadline token therefore flows through to CommunicationService.RouteTo*Async, so
an in-flight routed call observes cancellation when the method timeout fires instead
of running orphaned. Regression tests (in the new RouteHelperTests):
Call_WithNoExplicitToken_InheritsMethodDeadlineToken,
Call_WhenMethodDeadlineCancelled_RoutedCallObservesCancellation,
Call_ExplicitToken_OverridesDeadlineToken,
GetAttributes_WithNoExplicitToken_InheritsMethodDeadlineToken,
SetAttributes_WithNoExplicitToken_InheritsMethodDeadlineToken.
InboundAPI-017 — RouteHelper / RouteTarget has no test coverage
| Severity | Low |
| Category | Testing coverage |
| Status | Resolved |
| Location | src/ScadaLink.InboundAPI/RouteHelper.cs:1-165, tests/ScadaLink.InboundAPI.Tests/ |
Description
RouteHelper/RouteTarget is the entire WP-4 cross-site routing surface — the
Route.To().Call()/GetAttribute(s)/SetAttribute(s) API that inbound API scripts use
to reach instances at any site. It has zero tests: the ScadaLink.InboundAPI.Tests
project covers ApiKeyValidator, ParameterValidator, InboundScriptExecutor, and
InboundApiEndpointFilter, but no test file exercises RouteHelper. Untested
behaviours include site resolution via IInstanceLocator (including the
"instance not found / no assigned site" InvalidOperationException path at
RouteHelper.cs:154-164), the !response.Success → InvalidOperationException
translation in each routed method, GetAttribute delegating to the batch
GetAttributes and returning null for an absent key, correlation-ID generation,
and SetAttribute delegating to SetAttributes. These are non-trivial branches
whose failure modes (a thrown exception inside a script) surface to the caller as a
500, so regressions would be silent.
Recommendation
Add a RouteHelperTests suite using substituted IInstanceLocator and
CommunicationService (the executor tests already substitute CommunicationService):
cover the happy path of each routed method, the unresolved-instance throw, the
!Success → InvalidOperationException mapping, and GetAttribute returning null
for a missing key. This also gives InboundAPI-016 a regression home if the timeout
wiring is added.
Resolution
Resolved 2026-05-17 (commit <pending>): confirmed — ScadaLink.InboundAPI.Tests had
no file exercising RouteHelper/RouteTarget. To make the surface testable without a
live actor system, an IInstanceRouter seam was introduced in the module (the routing
transport RouteHelper depends on); the production CommunicationServiceInstanceRouter
delegates to CommunicationService and is registered by AddInboundAPI. RouteHelper
now depends on IInstanceLocator + IInstanceRouter (both substitutable). Added the
RouteHelperTests suite (15 cases) covering: the happy path of Call/GetAttribute(s)/
SetAttribute(s), correlation-ID generation, the unresolved-instance
InvalidOperationException path, the !Success → InvalidOperationException mapping
for each routed method, GetAttribute delegating to the batch GetAttributes and
returning null for an absent key, SetAttribute delegating to SetAttributes, and
the InboundAPI-016 deadline-token inheritance behaviour. All 15 pass.