docs(code-reviews): re-review batch 3 at 39d737e — Host, InboundAPI, ManagementService, NotificationService, Security

21 new findings: Host-012..015, InboundAPI-014..017, ManagementService-014..017, NotificationService-014..018, Security-012..015.
Joseph Doherty
2026-05-17 00:48:25 -04:00
parent 89636e2bbf
commit 3b3760f026
6 changed files with 873 additions and 41 deletions


@@ -5,10 +5,10 @@
| Module | `src/ScadaLink.InboundAPI` |
| Design doc | `docs/requirements/Component-InboundAPI.md` |
| Status | Reviewed |
| Last reviewed | 2026-05-16 |
| Last reviewed | 2026-05-17 |
| Reviewer | claude-agent |
| Commit reviewed | `9c60592` |
| Open findings | 0 |
| Commit reviewed | `39d737e` |
| Open findings | 4 |
## Summary
@@ -30,6 +30,25 @@ well but there is no coverage of the HTTP endpoint, concurrency, or recompilatio
None of the findings are data-loss-class, but the concurrency and trust-model issues
are High severity and should be addressed before production use.
#### Re-review 2026-05-17 (commit `39d737e`)
All 13 findings from the initial review remain `Resolved`; the module source under
`src/ScadaLink.InboundAPI` is unchanged since the last InboundAPI fix commit
(`8dd7412`), which precedes `39d737e`. This re-review re-walked all 10 checklist
categories against the resolved code and surfaced **4 new findings** — none touching
the previously-fixed concurrency/trust-model code, but all in areas the first pass
did not probe deeply: (1) the `ReturnDefinition` column is loaded onto `ApiMethod`
but is never consulted — script return values are serialized verbatim with no shaping
or validation against the declared return structure (InboundAPI-014); (2) the new
`ForbiddenApiChecker` is a purely textual syntax walker and can be bypassed by
reaching forbidden functionality through member access that never spells a forbidden
namespace, e.g. `typeof(x).Assembly.GetType("System.IO.File")` (InboundAPI-015);
(3) routed `Route.To().Call()` invocations are not bound by the method timeout unless
the script explicitly passes the context's `CancellationToken` to each routed call,
contradicting the design statement that the timeout covers routed calls
(InboundAPI-016); and (4) `RouteHelper`
/ `RouteTarget` — the entire WP-4 cross-site routing surface — has no test coverage
(InboundAPI-017). The new findings are three Medium and one Low; none is Critical or High.
## Checklist coverage
| # | Category | Examined | Notes |
@@ -38,11 +57,11 @@ are High severity and should be addressed before production use.
| 2 | Akka.NET conventions | ☑ | Module is ASP.NET-hosted, no actors of its own; routes to actors via `CommunicationService`. No correlation-ID issues — IDs are set in `RouteHelper`. |
| 3 | Concurrency & thread safety | ☑ | Singleton `InboundScriptExecutor` mutates a non-thread-safe `Dictionary` from concurrent request threads — see InboundAPI-001/002. |
| 4 | Error handling & resilience | ☑ | Catch-all conflates client cancellation with timeout (InboundAPI-004); compilation-failure path repeats work on every request (InboundAPI-009). |
| 5 | Security | ☑ | Non-constant-time key comparison, no trust-model enforcement, no body-size limit, missing-method enumeration oracle — see InboundAPI-003/005/006/011. |
| 5 | Security | ☑ | Prior items resolved. Re-review: `ForbiddenApiChecker` is a textual deny-list bypassable via reflection without a forbidden namespace token (InboundAPI-015). |
| 6 | Performance & resource management | ☑ | Up to 3 separate DB round-trips per request in `ApiKeyValidator`; uncapped lazy recompilation. |
| 7 | Design-document adherence | ☑ | `Database.Connection()` script API missing; central-only hosting not enforced; lazy-compile diverges from "compiled at startup". |
| 8 | Code organization & conventions | ☑ | `ParameterDefinition` is an API-shaped POCO declared in the component project rather than Commons; otherwise conventions followed. |
| 9 | Testing coverage | ☑ | Good unit coverage of the two validators; no endpoint, concurrency, recompilation, or timeout-vs-cancel tests. |
| 7 | Design-document adherence | ☑ | Re-review: `ReturnDefinition` loaded but never used (InboundAPI-014); routed-call timeout not enforced (InboundAPI-016). Prior `Database.Connection()`/central-only items resolved. |
| 8 | Code organization & conventions | ☑ | `ParameterDefinition` moved to Commons (InboundAPI-012 resolved); no new issues. |
| 9 | Testing coverage | ☑ | Re-review: `RouteHelper`/`RouteTarget` (WP-4 routing) entirely untested (InboundAPI-017); validators/executor/filter well covered. |
| 10 | Documentation & comments | ☑ | `ApiKeyValidationResult.NotFound` XML/name says "NotFound" but returns HTTP 400 — misleading (InboundAPI-013). |
## Findings
@@ -580,3 +599,181 @@ the new method-not-found status, and removing dead code cannot regress. Doc-owne
follow-up: `Component-InboundAPI.md`'s Error Handling section still does not list a
"method not found" status; it should note that it is reported as 403 (indistinguishable
from "key not approved"), but that doc edit is outside this module's editable scope.
### InboundAPI-014 — `ReturnDefinition` is loaded but never used; script return value is unshaped/unvalidated
| | |
|--|--|
| Severity | Medium |
| Category | Design-document adherence |
| Status | Open |
| Location | `src/ScadaLink.InboundAPI/InboundScriptExecutor.cs:201-205`, `src/ScadaLink.Commons/Entities/InboundApi/ApiMethod.cs:10` |
**Description**
`Component-InboundAPI.md` ("API Method Definition → Return Value Definition" and the
"Response Format" section) specifies that each method has a declared return structure
— "Field names and data types … Supports returning lists of objects" — and that the
success response body is "the method's return value as JSON, with fields matching the
return value definition". The `ApiMethod` entity carries a `ReturnDefinition` column
to hold exactly this. However, nothing in the module ever reads `ReturnDefinition`:
`ExecuteAsync` takes whatever object the script happens to return and does a blind
`JsonSerializer.Serialize(result)`. There is no validation that the script's return
value matches the declared shape, no coercion to the declared field types, and no
error when a method returns a structure inconsistent with its definition. A method
whose script returns the wrong shape (or `null` where a structure is required) will
silently emit a malformed 200 response, and the documented return-definition contract
is effectively unenforced. This is the response-side mirror of the parameter
validation that `ParameterValidator` does perform, leaving the two halves of the
method contract asymmetric.
**Recommendation**
Either (a) implement return-value validation/shaping: parse `ReturnDefinition` with
the same extended-type machinery used for parameters and validate/coerce the script
result before serializing, returning a 500 (or logging) when the script result does
not match; or (b) if return shaping is deliberately out of scope, remove the "Return
Value Definition" / "fields matching the return value definition" language from
`Component-InboundAPI.md` and document that the response is the script's raw return
value serialized as-is. Code and design doc must be reconciled.
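A minimal sketch of what option (a) could look like. It assumes a hypothetical
`ReturnField` record (field name plus expected JSON kind) standing in for a parsed
`ReturnDefinition`; the real column format and the extended-type machinery are not shown
in this review, so every name below is illustrative rather than the module's API.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical stand-in for one parsed entry of ApiMethod.ReturnDefinition.
public sealed record ReturnField(string Name, JsonValueKind Kind);

public static class ReturnShapeValidator
{
    // Serializes the script result and checks it against the declared fields.
    // Returns the list of mismatches; an empty list means the shape matches.
    public static IReadOnlyList<string> Validate(object? result, IReadOnlyList<ReturnField> declared)
    {
        var errors = new List<string>();
        using JsonDocument doc = JsonDocument.Parse(JsonSerializer.Serialize(result));
        JsonElement root = doc.RootElement;

        if (root.ValueKind != JsonValueKind.Object)
        {
            errors.Add($"expected an object but the script returned {root.ValueKind}");
            return errors;
        }

        foreach (ReturnField field in declared)
        {
            if (!root.TryGetProperty(field.Name, out JsonElement value))
                errors.Add($"missing field '{field.Name}'");
            else if (!KindMatches(value.ValueKind, field.Kind))
                errors.Add($"field '{field.Name}' is {value.ValueKind}, expected {field.Kind}");
        }
        return errors;
    }

    // JSON has separate True/False kinds; treat them as one boolean type.
    private static bool KindMatches(JsonValueKind actual, JsonValueKind expected)
    {
        bool IsBool(JsonValueKind k) => k is JsonValueKind.True or JsonValueKind.False;
        return actual == expected || (IsBool(actual) && IsBool(expected));
    }
}
```

The executor could call `Validate` just before serializing and return a 500 (or log)
when the list is non-empty. The sketch only covers a flat object; the "lists of objects"
case from the design doc would need a recursive definition.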
**Resolution**
_Unresolved._
### InboundAPI-015 — `ForbiddenApiChecker` is purely textual and is bypassable via reflection reachable without a forbidden namespace token
| | |
|--|--|
| Severity | Medium |
| Category | Security |
| Status | Open |
| Location | `src/ScadaLink.InboundAPI/ForbiddenApiChecker.cs:63-119`, `src/ScadaLink.InboundAPI/InboundScriptExecutor.cs:109-126` |
**Description**
`ForbiddenApiChecker` walks the script syntax tree and rejects any `using` directive,
`QualifiedNameSyntax`, or `MemberAccessExpressionSyntax` whose textual dotted name
starts with a forbidden namespace prefix (`System.IO`, `System.Diagnostics`,
`System.Reflection`, `System.Net`, etc.). This is a textual match, not a semantic
one, and the trust model it enforces (per InboundAPI-005) is explicitly meant to keep
*untrusted* Design-role scripts away from host APIs. The check can be bypassed because
forbidden functionality is reachable through member access that never spells a
forbidden namespace:
- `typeof(string).Assembly.GetType("System.IO.File")`: `typeof(string)` is permitted,
`.Assembly` is a `System.Type` property, `.GetType(string)` is a `System.Reflection.Assembly`
method. The string literal `"System.IO.File"` is a string, not a `QualifiedNameSyntax`
or `MemberAccessExpressionSyntax`, so `IsForbidden` never sees it. The script obtains
a `System.IO.File` `Type` and can `InvokeMember`/`GetMethod(...).Invoke(...)` on it —
all via members of permitted types — with no forbidden namespace ever appearing in
the source. `CompileAndRegister` references `typeof(object).Assembly`
(System.Private.CoreLib) in `ScriptOptions`, so every framework type is loadable at
runtime.
- The executor also references the `Microsoft.CSharp` assembly, which contains the
`Microsoft.CSharp.RuntimeBinder` namespace (`InboundScriptExecutor.cs:116`), enabling
the `dynamic` keyword. This further widens late-bound member access that the static
walker cannot see through.
Because the inbound API script runs on the central node with the host process's
privileges and is authored by the (less-trusted-than-Admin) Design role, a static
textual deny-list gives a false sense of containment.
**Recommendation**
Treat the syntax walker as defence-in-depth, not the boundary. Strengthen it where
cheap (flag `Assembly.GetType`, `Type.GetType`, `Activator.CreateInstance`,
`InvokeMember`, and `dynamic` usage), but for real enforcement run compiled scripts
under a genuine boundary — a restricted `AssemblyLoadContext`/AppDomain-equivalent, a
curated reference set that does not expose reflection-to-arbitrary-type, or an
out-of-process sandbox — consistent with however the Site Runtime ultimately enforces
its instance-script trust model. At minimum, document in `Component-InboundAPI.md`
that the current check is best-effort and does not stop a determined script.
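As an illustration of the cheap strengthening step, the sketch below uses the Roslyn
`CSharpSyntaxWalker` to flag reflection gateway members and `dynamic` by name. The class
name and the flagged-member list are assumptions made for this example, not the module's
`ForbiddenApiChecker`, and the check is still name-based (it will also flag harmless
calls such as `obj.GetType()`), so it remains defence-in-depth only.

```csharp
using System.Collections.Generic;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

// Hypothetical supplementary walker; not the module's ForbiddenApiChecker.
public sealed class ReflectionGatewayWalker : CSharpSyntaxWalker
{
    // Member names that open a path to arbitrary types or late-bound calls.
    private static readonly HashSet<string> FlaggedMembers = new()
    {
        "GetType", "CreateInstance", "InvokeMember", "GetMethod", "Invoke", "Assembly",
    };

    public List<string> Violations { get; } = new();

    public override void VisitMemberAccessExpression(MemberAccessExpressionSyntax node)
    {
        string name = node.Name.Identifier.ValueText;
        if (FlaggedMembers.Contains(name))
        {
            int line = node.GetLocation().GetLineSpan().StartLinePosition.Line + 1;
            Violations.Add($"'{name}' on line {line}");
        }
        base.VisitMemberAccessExpression(node);
    }

    public override void VisitIdentifierName(IdentifierNameSyntax node)
    {
        // 'dynamic' is a contextual keyword, so it parses as an identifier name.
        if (node.Identifier.ValueText == "dynamic")
            Violations.Add("use of 'dynamic'");
        base.VisitIdentifierName(node);
    }

    // Parses the script text and returns every flagged construct found.
    public static IReadOnlyList<string> Check(string scriptSource)
    {
        var walker = new ReflectionGatewayWalker();
        walker.Visit(CSharpSyntaxTree.ParseText(scriptSource).GetRoot());
        return walker.Violations;
    }
}
```

Running `Check` on the bypass from the description flags both `.Assembly` and
`.GetType`, even though no forbidden namespace appears outside the string literal.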
**Resolution**
_Unresolved._
### InboundAPI-016 — Routed `Route.To().Call()` invocations are not bound by the method timeout
| | |
|--|--|
| Severity | Medium |
| Category | Design-document adherence |
| Status | Open |
| Location | `src/ScadaLink.InboundAPI/RouteHelper.cs:59-152`, `src/ScadaLink.InboundAPI/InboundScriptExecutor.cs:177`, `:199` |
**Description**
`Component-InboundAPI.md` states the per-method timeout "defines the maximum time the
method is allowed to execute (**including any routed calls to sites**)", and the
Routing Behavior section says a routed call "blocks until the site responds or the
**method-level timeout** is reached". The executor builds a linked
`CancellationTokenSource` (`cts`) combining the request-abort token and a dedicated
timeout CTS, and exposes `cts.Token` to the script as `InboundScriptContext.CancellationToken`.
However, every `RouteTarget` method (`Call`, `GetAttribute(s)`, `SetAttribute(s)`)
takes `CancellationToken cancellationToken = default` and the script must *explicitly*
pass the context token for the routed call to honour the timeout. A natural script —
`Route.To("inst").Call("doWork", parameters)` — invokes the routed call with
`CancellationToken.None`. That request flows into `CommunicationService.RouteToCallAsync`
with no cancellation, so the routed call is not bounded by the method timeout at all.
The only timeout guard left is `handler(context).WaitAsync(cts.Token)` in
`ExecuteAsync`: when the method timeout fires, the `WaitAsync` task is cancelled and
the caller observes an `OperationCanceledException`, but the underlying script `Task`
(and the in-flight `RouteToCallAsync` awaiting a remote site) keeps running orphaned
with no cancellation, holding the
correlation/communication resources until the site eventually responds or its own
transport timeout (if any) fires. The design's guarantee that the method timeout
covers routed calls is therefore not met, and a slow/hung site can leak background
work past the timeout the caller was told bounds the request.
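The orphaning behaviour can be reproduced in isolation. The sketch below assumes .NET 6
or later (for `Task.WaitAsync`) and uses a hypothetical `SlowRoutedCallAsync` as a
stand-in for a routed call that received no token; it is not the module's code.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class OrphanedCallDemo
{
    // Stand-in for a routed call invoked with the default token (CancellationToken.None).
    private static async Task<string> SlowRoutedCallAsync(CancellationToken token = default)
    {
        await Task.Delay(TimeSpan.FromMilliseconds(500), token);
        return "site responded";
    }

    // Reports whether the wait timed out and whether the inner task was still running then.
    public static async Task<(bool TimedOut, bool InnerStillRunning)> RunAsync()
    {
        using var cts = new CancellationTokenSource(TimeSpan.FromMilliseconds(50));
        Task<string> inner = SlowRoutedCallAsync(); // the token is not threaded through
        try
        {
            await inner.WaitAsync(cts.Token);
            return (false, false);
        }
        catch (OperationCanceledException)
        {
            // The wait was abandoned, but the underlying task was never cancelled.
            return (true, !inner.IsCompleted);
        }
    }
}
```

`WaitAsync` only stops waiting; it has no way to cancel work that never saw the token,
which is why the routed call has to receive the token itself.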
**Recommendation**
Make routed calls inherit the method deadline without relying on script discipline:
have `RouteHelper`/`RouteTarget` carry the executing method's `CancellationToken`
(injected by `InboundScriptExecutor` when it constructs the context, e.g. a
`RouteHelper` bound to `cts.Token`) and pass it into every `CommunicationService`
call by default, so `Route.To("x").Call("s", p)` is timeout-bounded with no token
argument. Keep the explicit-token overload for callers that want a tighter bound.
Verify `RouteToCallAsync` and the attribute-routing calls actually observe the token
and abandon the in-flight request when it fires.
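One way this could be wired, sketched under assumptions: `RoutedCall` is a hypothetical
delegate standing in for the `CommunicationService` call (its real signature is not
shown in this review), and `BoundRouteTarget` is an illustrative name, not the existing
`RouteTarget`.

```csharp
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for the CommunicationService routed-call entry point.
public delegate Task<object?> RoutedCall(
    string instance, string script, object? parameters, CancellationToken token);

public sealed class BoundRouteTarget
{
    private readonly RoutedCall _transport;
    private readonly string _instance;
    private readonly CancellationToken _methodToken; // the executor's linked timeout/abort token

    public BoundRouteTarget(RoutedCall transport, string instance, CancellationToken methodToken)
    {
        _transport = transport;
        _instance = instance;
        _methodToken = methodToken;
    }

    // No token argument: the method deadline applies by default.
    public Task<object?> Call(string script, object? parameters = null)
        => _transport(_instance, script, parameters, _methodToken);

    // Explicit token: linked with the method token, so it can only tighten the bound.
    public async Task<object?> Call(string script, object? parameters, CancellationToken tighter)
    {
        using var linked = CancellationTokenSource.CreateLinkedTokenSource(_methodToken, tighter);
        return await _transport(_instance, script, parameters, linked.Token);
    }
}
```

With this shape a script that writes `Route.To("x").Call("s", p)` is bounded by the
method timeout without passing anything, and the explicit overload cannot be used to
escape the deadline because the two tokens are linked.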
**Resolution**
_Unresolved._
### InboundAPI-017 — `RouteHelper` / `RouteTarget` has no test coverage
| | |
|--|--|
| Severity | Low |
| Category | Testing coverage |
| Status | Open |
| Location | `src/ScadaLink.InboundAPI/RouteHelper.cs:1-165`, `tests/ScadaLink.InboundAPI.Tests/` |
**Description**
`RouteHelper`/`RouteTarget` is the entire WP-4 cross-site routing surface — the
`Route.To().Call()/GetAttribute(s)/SetAttribute(s)` API that inbound API scripts use
to reach instances at any site. It has zero tests: the `ScadaLink.InboundAPI.Tests`
project covers `ApiKeyValidator`, `ParameterValidator`, `InboundScriptExecutor`, and
`InboundApiEndpointFilter`, but no test file exercises `RouteHelper`. Untested
behaviours include site resolution via `IInstanceLocator` (including the
"instance not found / no assigned site" `InvalidOperationException` path at
`RouteHelper.cs:154-164`), the `!response.Success` → `InvalidOperationException`
translation in each routed method, `GetAttribute` delegating to the batch
`GetAttributes` and returning `null` for an absent key, correlation-ID generation,
and `SetAttribute` delegating to `SetAttributes`. These are non-trivial branches
whose failure modes (a thrown exception inside a script) surface to the caller only as
a generic 500, so a regression would go unnoticed until a live request hits it.
**Recommendation**
Add a `RouteHelperTests` suite using substituted `IInstanceLocator` and
`CommunicationService` (the executor tests already substitute `CommunicationService`):
cover the happy path of each routed method, the unresolved-instance throw, the
`!Success` → `InvalidOperationException` mapping, and `GetAttribute` returning `null`
for a missing key. This also gives InboundAPI-016 a regression home if the timeout
wiring is added.
**Resolution**
_Unresolved._