Files
Joseph Doherty 7b0b9c7365 refactor: rename ScadaLink → ZB.MOM.WW.ScadaBridge (code + projects + namespaces)
Solution + 23 src projects + 26 test projects renamed; folders, csproj,
namespaces, and ScadaLinkDbContext/ScadaBridgeDbContext class updated.
ActorSystem "scadalink" → "scadabridge", Akka seed-node URLs migrated.
SQL roles/logins, LDAP domains, CLI command name, and CLI config dir
(~/.scadalink → ~/.scadabridge) also renamed.

Build green; 5 Host.Tests fail awaiting SQL login rename in next commit.
Pre-existing StaleTagMonitor timing flakes unchanged.

Rename script committed at tools/rename-to-scadabridge.sh.
2026-05-28 09:37:45 -04:00

75 KiB
Raw Permalink Blame History

Code Review — InboundAPI

Field Value
Module src/ZB.MOM.WW.ScadaBridge.InboundAPI
Design doc docs/requirements/Component-InboundAPI.md
Status Reviewed
Last reviewed 2026-05-28
Reviewer claude-agent
Commit reviewed 1eb6e97
Open findings 0

Summary

The InboundAPI module is small (8 source files) and the happy-path flow — extract key, validate, deserialize parameters, execute script, serialize result — is clean and readable. However the review surfaced several real problems concentrated in two themes: concurrency and security. The InboundScriptExecutor is a singleton that mutates a plain Dictionary from concurrent ASP.NET request threads with no synchronization, which can corrupt the handler cache or crash the process under load. On the security side, API-key comparison is a non-constant-time database string match (timing oracle), compiled scripts run with no enforcement of the documented script trust model (forbidden APIs such as System.IO/Process/Reflection are fully reachable), there is no request-body size limit, and the executor's catch-all swallows OperationCanceledException from genuine client disconnects as a "timeout". Design-doc adherence is also incomplete: the Database.Connection() script API described in the design doc is entirely absent from InboundScriptContext, and the endpoint never enforces that the API is central-only. Testing covers the validators well but there is no coverage of the HTTP endpoint, concurrency, or recompilation. None of the findings are data-loss-class, but the concurrency and trust-model issues are High severity and should be addressed before production use.

Re-review 2026-05-17 (commit 39d737e)

All 13 findings from the initial review remain Resolved; the module source under src/ZB.MOM.WW.ScadaBridge.InboundAPI is unchanged since the last InboundAPI fix commit (8dd7412), which precedes 39d737e. This re-review re-walked all 10 checklist categories against the resolved code and surfaced 4 new findings — none touching the previously-fixed concurrency/trust-model code, but all in areas the first pass did not probe deeply: (1) the ReturnDefinition column is loaded onto ApiMethod but is never consulted — script return values are serialized verbatim with no shaping or validation against the declared return structure (InboundAPI-014); (2) the new ForbiddenApiChecker is a purely textual syntax walker and can be bypassed by reaching forbidden functionality through member access that never spells a forbidden namespace, e.g. typeof(x).Assembly.GetType("System.IO.File") (InboundAPI-015); (3) routed Route.To().Call() invocations are not bound by the method timeout unless the script explicitly threads Parameters-side cancellation, contradicting the design statement that the timeout covers routed calls (InboundAPI-016); and (4) RouteHelper / RouteTarget — the entire WP-4 cross-site routing surface — has no test coverage (InboundAPI-017). New findings are one Medium-trio plus one Low; no Critical or High.

Checklist coverage

# Category Examined Notes
1 Correctness & logic bugs CoerceValue returns null for legitimately-null/String values indistinguishably; parameter-definition edge cases noted.
2 Akka.NET conventions Module is ASP.NET-hosted, no actors of its own; routes to actors via CommunicationService. No correlation-ID issues — IDs are set in RouteHelper.
3 Concurrency & thread safety Singleton InboundScriptExecutor mutates a non-thread-safe Dictionary from concurrent request threads — see InboundAPI-001/002.
4 Error handling & resilience Catch-all conflates client cancellation with timeout (InboundAPI-004); compilation-failure path repeats work on every request (InboundAPI-009).
5 Security Prior items resolved. Re-review: ForbiddenApiChecker is a textual deny-list bypassable via reflection without a forbidden namespace token (InboundAPI-015).
6 Performance & resource management Up to 3 separate DB round-trips per request in ApiKeyValidator; uncapped lazy recompilation.
7 Design-document adherence Re-review: ReturnDefinition loaded but never used (InboundAPI-014); routed-call timeout not enforced (InboundAPI-016). Prior Database.Connection()/central-only items resolved.
8 Code organization & conventions ParameterDefinition moved to Commons (InboundAPI-012 resolved); no new issues.
9 Testing coverage Re-review: RouteHelper/RouteTarget (WP-4 routing) entirely untested (InboundAPI-017); validators/executor/filter well covered.
10 Documentation & comments ApiKeyValidationResult.NotFound XML/name says "NotFound" but returns HTTP 400 — misleading (InboundAPI-013).

Re-review 2026-05-28 (commit 1eb6e97)

All 17 prior findings remain Resolved. The module has grown materially since the last pass — a new AuditWriteMiddleware (Audit Log #23 M4 Bundle D) now lives under src/ZB.MOM.WW.ScadaBridge.InboundAPI/Middleware/, the ApiKeyValidator was rewired to hash the candidate with IApiKeyHasher (ConfigurationDatabase-012), and an IInstanceRouter seam was introduced. This re-review re-walked all 10 checklist categories against 1eb6e97 and surfaced 8 new findings concentrated on the new audit middleware and a stranded follow-up from InboundAPI-008:

  1. The InboundAPI-008 resolution explicitly deferred registering an IActiveNodeGate implementation in ZB.MOM.WW.ScadaBridge.Host as a "follow-up outside this module's scope" — that follow-up is still unfulfilled (no production registration anywhere in src/ZB.MOM.WW.ScadaBridge.Host/), so the design-mandated standby-node gating is silently disabled in production today (InboundAPI-022, High).
  2. AuditWriteMiddleware is wired in Program.cs against /api/* rather than the specific POST /api/{methodName} route, so GETs against /api/audit/query and /api/audit/export (audit query endpoints — themselves not script invocations) now emit spurious AuditChannel.ApiInbound/InboundRequest rows back into the audit log with Target set to the last path segment (InboundAPI-025, Medium).
  3. The middleware fires its audit write as _ = _auditWriter.WriteAsync(evt) — the wrapping try/catch only catches synchronous throws, so a faulted async writer task is unobserved and the row silently disappears with no log line (InboundAPI-018, Low/Medium).
  4. ParentExecutionId correlation flows only through RouteToCallRequestRouteToGetAttributesRequest/RouteToSetAttributesRequest have no ParentExecutionId field, so attribute reads/writes from inbound scripts lose the inbound→site execution-tree link the Audit Log decision in CLAUDE.md describes (InboundAPI-021, Medium).
  5. EndpointExtensions.HandleInboundApiRequest — the entire wiring composition that ties validator/executor/route/audit together — has no test coverage; only the components it composes are tested (InboundAPI-023, Low).
  6. EndpointExtensions.HandleInboundApiRequest does ContentType?.Contains("json") (case-sensitive) so a request with application/JSON and no Content-Length silently skips JSON body parsing (InboundAPI-020, Low).
  7. AuditWriteMiddleware.InvokeAsync calls EnableBuffering() unconditionally before the empty-body short-circuit, allocating a FileBufferingReadStream for every request including bodyless ones (InboundAPI-019, Low).

Severity mix: 1 High, 3 Medium, 4 Low — no Critical. (The eighth finding — InboundAPI-024, Low — is a defensive watch-list item flagging that _knownBadMethods is unbounded; it is bounded in practice today by the configuration database, but the invariant is undocumented.)

Checklist coverage — 2026-05-28 (commit 1eb6e97)

# Category Examined Notes
1 Correctness & logic bugs ContentType?.Contains("json") is case-sensitive (InboundAPI-020).
2 Akka.NET conventions ASP.NET-hosted, no actors of its own; routes via IInstanceRouter/CommunicationService. No new issues.
3 Concurrency & thread safety ConcurrentDictionary handler cache (post-001/002 fix). New audit middleware is per-request scoped, no shared mutable state. No new issues.
4 Error handling & resilience Audit WriteAsync is fire-and-forget; async faults are unobserved (InboundAPI-018).
5 Security IActiveNodeGate not registered in Host — standby-node gating disabled in production (InboundAPI-022).
6 Performance & resource management EnableBuffering() unconditional on bodyless requests (InboundAPI-019); audit middleware wraps Response.Body and mints ExecutionId for non-script /api routes (InboundAPI-025).
7 Design-document adherence ParentExecutionId not stamped on attribute-read/write routed messages (InboundAPI-021). InboundAPI-008's deferred Host registration still unfulfilled (InboundAPI-022).
8 Code organization & conventions No new issues.
9 Testing coverage EndpointExtensions.HandleInboundApiRequest composition wiring has no test (InboundAPI-023); middleware/filter/validator/executor/route are individually covered.
10 Documentation & comments No new issues.

Findings

InboundAPI-001 — Singleton script handler cache mutated without synchronization

Severity High
Category Concurrency & thread safety
Status Resolved
Location src/ZB.MOM.WW.ScadaBridge.InboundAPI/InboundScriptExecutor.cs:17, :32, :40, :89, :123-128

Description

InboundScriptExecutor is registered as a singleton (ServiceCollectionExtensions.cs:11) and its handler cache is a plain Dictionary<string, Func<...>> (InboundScriptExecutor.cs:17). RegisterHandler, RemoveHandler, CompileAndRegister, and the lazy-compile path in ExecuteAsync all read and write this dictionary with no lock. ASP.NET serves inbound API requests on concurrent thread-pool threads, so two requests for an as-yet-uncompiled method (or a request racing a CLI-triggered CompileAndRegister) can mutate the dictionary concurrently. Dictionary is explicitly not safe for concurrent read/write — this can corrupt internal buckets, throw InvalidOperationException, or return a torn/null handler, crashing the request or the process.

Recommendation

Replace the Dictionary with a ConcurrentDictionary<string, Func<...>>, or guard all access with a lock. For the lazy-compile path use GetOrAdd so concurrent first-callers compile at most once.

Resolution

Resolved 2026-05-16 (commit <pending>): replaced the plain Dictionary handler cache with a ConcurrentDictionary; RemoveHandler now uses TryRemove; the lazy-compile path in ExecuteAsync compiles outside the cache and inserts atomically via GetOrAdd so concurrent first-callers share one handler. Regression tests ConcurrentLazyCompile_SameMethod_DoesNotCorruptCache and ConcurrentRegisterAndExecute_DoesNotThrow added.

InboundAPI-002 — Lazy compilation is a check-then-act race with no atomicity

Severity Medium — re-triaged: already fixed by the InboundAPI-001 fix; verified and closed
Category Concurrency & thread safety
Status Resolved
Location src/ZB.MOM.WW.ScadaBridge.InboundAPI/InboundScriptExecutor.cs:152-161

Description

ExecuteAsync does if (!_scriptHandlers.TryGetValue(...)) { CompileAndRegister(method); handler = _scriptHandlers[method.Name]; }. Even setting aside the unsynchronized dictionary (InboundAPI-001), this is a check-then-act sequence: between TryGetValue failing and the re-read on line 128, another thread could RemoveHandler the entry, causing the indexer on line 128 to throw KeyNotFoundException — an unhandled-in-context exception that is then caught only by the broad catch on line 143 and reported to the caller as "Internal script error". Multiple concurrent first-callers will also each compile the same script redundantly (wasted Roslyn work).

Recommendation

Make compile-and-fetch a single atomic operation (ConcurrentDictionary.GetOrAdd with a lazily-evaluated factory, or a per-method lock), and have CompileAndRegister return the handler it produced rather than requiring a separate dictionary read.

Resolution

Resolved 2026-05-16 (commit pending): re-triage — verified against the current source, this finding was already fixed by the InboundAPI-001 fix. The InboundScriptExecutor.cs:152-161 lazy-compile path no longer does check-then-act re-read: Compile(method) runs unconditionally (it never reads the cache) and the result is published via the atomic _scriptHandlers.GetOrAdd(method.Name, compiled). There is no separate dictionary indexer read, so the KeyNotFoundException race the finding describes cannot occur, and concurrent first-callers all share the single handler that GetOrAdd keeps. Regression test LazyCompile_RacingRemoveHandler_NeverThrowsKeyNotFound added (asserts a concurrent RemoveHandler storm against lazy-compiling callers never yields the catch-all "Internal script error"); it passes against the current code, confirming the fix.

InboundAPI-003 — API key compared with non-constant-time string equality

Severity High
Category Security
Status Resolved
Location src/ZB.MOM.WW.ScadaBridge.ConfigurationDatabase/Repositories/InboundApiRepository.cs:22-23, consumed by src/ZB.MOM.WW.ScadaBridge.InboundAPI/ApiKeyValidator.cs:33

Description

API-key authentication resolves the key with FirstOrDefaultAsync(k => k.KeyValue == keyValue) — an ordinary equality match translated to a SQL WHERE KeyValue = @p comparison. The secret is matched with ordinary (early-exit) string/SQL comparison rather than a constant-time comparison, which is a classic timing side-channel for secret material. Combined with the design's explicit "no rate limiting" decision, an attacker with network access to the central API can mount a timing attack to recover valid keys. The API key is the sole credential for the inbound API, so this is the primary authentication path.

Recommendation

Look the key up by a non-secret indexed identifier (e.g. a key prefix/id) or fetch candidate rows, then verify the secret in-process using CryptographicOperations.FixedTimeEquals over the UTF-8 bytes. Preferably store only a salted hash of the key value and compare hashes. Avoid leaking secret-length and match-position timing.

Resolution

Resolved 2026-05-16 (commit <pending>): ApiKeyValidator no longer calls the secret-equality lookup GetApiKeyByValueAsync (the SQL WHERE KeyValue = @secret timing oracle). It now fetches all keys via GetAllApiKeysAsync and matches the secret in-process with CryptographicOperations.FixedTimeEquals over the UTF-8 bytes, scanning every candidate so neither match position nor secret length is observable. Regression tests ValidateAsync_DoesNotUseSecretEqualityLookup, ValidateAsync_WrongKey_ConstantTimePath_Returns401, and ValidateAsync_KeyOfDifferentLength_Returns401 added. Note: the timing-oracle method GetApiKeyByValueAsync remains on IInboundApiRepository (it is outside this module); removing it from the repository is left as separate follow-up since the validator no longer depends on it.

InboundAPI-004 — Client disconnect is misreported as a script timeout

Severity Medium
Category Error handling & resilience
Status Resolved
Location src/ZB.MOM.WW.ScadaBridge.InboundAPI/InboundScriptExecutor.cs:117-141

Description

ExecuteAsync creates a linked CTS from httpContext.RequestAborted and the method timeout, then catches OperationCanceledException and unconditionally returns "Script execution timed out". When the client aborts the request (RequestAborted fires), the same exception type is thrown, so a normal client disconnect is logged as a timeout (_logger.LogWarning("Script execution timed out ...")) and an attempt is made to write a 500 timeout body to an already-gone connection. This pollutes the failure log (which the design says is reserved for genuine script errors) and obscures real timeout incidents.

Recommendation

Distinguish the two cancellation sources: if cancellationToken (the request token) is cancelled, treat it as a client abort — do not log a timeout and do not attempt to write a response. Only when the timeout CTS fired should the result be "timed out". Check cts.Token.IsCancellationRequested && !cancellationToken.IsCancellationRequested or use a dedicated timeout CancellationTokenSource so the two are separable.

Resolution

Resolved 2026-05-16 (commit pending): ExecuteAsync now uses a dedicated timeout CancellationTokenSource (new CancellationTokenSource(timeout)) linked with the request-abort token, so the two cancellation sources are separable. The OperationCanceledException handler reports "Script execution timed out" (and logs a warning) only when the timeout CTS fired and the request token did not; a client abort instead returns "Request cancelled by client" and logs at Debug — the failure log stays reserved for genuine script-execution timeouts. HandleInboundApiRequest additionally short-circuits with Results.Empty (no warning log, no 500 body write) when RequestAborted is cancelled, since the connection is already gone. Regression tests ClientDisconnect_IsNotReportedAsTimeout and GenuineTimeout_StillReportedAsTimeout added.

InboundAPI-005 — Compiled API scripts run with no script-trust-model enforcement

Severity High
Category Security
Status Resolved
Location src/ZB.MOM.WW.ScadaBridge.InboundAPI/InboundScriptExecutor.cs:56-93

Description

CLAUDE.md's Akka.NET conventions state the script trust model forbids System.IO, Process, Threading, Reflection, and raw network access. CompileAndRegister compiles arbitrary C# with CSharpScript.Create and only restricts the default imports (WithImports("System", ...)). Imports are a convenience, not a sandbox — a script can still fully-qualify any type (System.IO.File.Delete(...), System.Diagnostics.Process.Start(...), System.Reflection, raw Socket) because the core framework assemblies are referenced and Roslyn scripting performs no API allow/deny-listing. Inbound API scripts execute on the central node with the host process's privileges, so a malicious or buggy method definition has full host access. Note the Design role authors these scripts (less trusted than Admin), making enforcement material.

Recommendation

Add a compile-time analyzer/SyntaxWalker (as the Site Runtime does for instance scripts) that rejects forbidden namespaces/types before registering a handler, and/or run scripts under a constrained boundary. At minimum, share the Site Runtime's forbidden-API checker so the trust model is enforced consistently. Reject the method (and log) when a violation is found instead of registering it.

Resolution

Resolved 2026-05-16 (commit <pending>): added ForbiddenApiChecker, a Roslyn CSharpSyntaxWalker that statically rejects scripts referencing forbidden namespaces (System.IO, System.Diagnostics, System.Threading except Tasks, System.Reflection, System.Net, System.Runtime.InteropServices, Microsoft.Win32) whether reached via a using directive or a fully-qualified name. CompileAndRegister now runs the check before Roslyn compilation and refuses to register (and logs) a violating method; ExecuteAsync's lazy-compile path is gated by the same check. Regression tests CompileAndRegister_ForbiddenApi_RejectsScript (theory), ExecuteAsync_ForbiddenApiScript_DoesNotRunAndReturnsFailure, and CompileAndRegister_PermittedScript_StillRegisters added.

InboundAPI-006 — No request body size limit on the inbound endpoint

Severity Medium
Category Security
Status Resolved
Location src/ZB.MOM.WW.ScadaBridge.InboundAPI/EndpointExtensions.cs:54-62

Description

HandleInboundApiRequest calls JsonDocument.ParseAsync(httpContext.Request.Body, ...) with no explicit body-size cap and no [RequestSizeLimit]/endpoint metadata. Although Kestrel has a default max request body size, this endpoint accepts arbitrary JSON from external systems, fully buffers it into a JsonDocument, and then Clone()s the root element (:61) which materializes the entire document on the heap. With no rate limiting (a deliberate design choice) a single caller can drive large allocations. Deep/wide JSON also makes the CoerceValue object/list deserialization (ParameterValidator.cs:113,117) expensive.

Recommendation

Set an explicit, modest body-size limit on the endpoint (.WithMetadata(new RequestSizeLimitAttribute(...)) or IHttpMaxRequestBodySizeFeature) and consider a JsonDocumentOptions MaxDepth. Reject oversized bodies with 413 before buffering.

Resolution

Resolved 2026-05-16 (commit pending): added InboundApiEndpointFilter, an IEndpointFilter applied to POST /api/{methodName} via .AddEndpointFilter<>(). It rejects requests whose declared Content-Length exceeds InboundApiOptions. MaxRequestBodyBytes (default 1 MiB) with HTTP 413 before the handler buffers the body into a JsonDocument, and also lowers the per-request IHttpMaxRequestBodySizeFeature cap so a chunked/unknown-length stream is cut off by Kestrel while being read. The limit is configurable via the bound ScadaBridge:InboundApi options section. Regression tests OversizedBody_ShortCircuitsWith413_AndDoesNotRunHandler, BodyAtLimit_RunsHandler, and FilterCapsMaxRequestBodySizeFeature added.

InboundAPI-007 — Database.Connection() script API from the design doc is not implemented

Severity Medium
Category Design-document adherence
Status Resolved
Location src/ZB.MOM.WW.ScadaBridge.InboundAPI/InboundScriptExecutor.cs:188-203

Description

Component-InboundAPI.md ("Script Runtime API -> Database Access") specifies Database.Connection("connectionName") as an available script capability for querying the configuration/machine-data databases. InboundScriptContext exposes only Parameters, Route, and CancellationToken — there is no Database member. Any method script that follows the documented API will fail to compile. Either the code is incomplete or the design doc is stale; the two must be reconciled.

Recommendation

If database access is in scope, add a Database property to InboundScriptContext backed by a connection-factory service. If it is not, remove the "Database Access" section from Component-InboundAPI.md so the design doc stops advertising an absent API.

Resolution

Resolved 2026-05-16 (commit <pending>). The drift was confirmed real: InboundScriptContext (InboundScriptExecutor.cs:188-203) exposes only Parameters, Route, and CancellationToken — there is no Database member, so a method script following the documented Database.Connection("name") API would fail to compile. Resolution direction: the design doc is stale, not the code. Implementing Database.Connection() would hand inbound API scripts a raw MS SQL client, in direct tension with the ScadaBridge script trust model (scripts are forbidden System.IO, raw network, etc.; ForbiddenApiChecker statically enforces this). Rather than carve a hole in the trust model, the "Database Access" section was removed from docs/requirements/Component-InboundAPI.md and replaced with an explicit "No direct database access" note explaining that scripts interact only through the curated Route/Parameters surfaces, and that any future data access must go behind a dedicated scoped helper added as an explicit design change. Code and doc now agree; no code or test change required.

InboundAPI-008 — Inbound API endpoint not restricted to the active central node

Severity Medium
Category Design-document adherence
Status Resolved
Location src/ZB.MOM.WW.ScadaBridge.InboundAPI/EndpointExtensions.cs:19-23, src/ZB.MOM.WW.ScadaBridge.Host/Program.cs:149

Description

The design states the Inbound API is "Central cluster only (active node)" and "fails over with it". MapInboundAPI registers POST /api/{methodName} unconditionally, and Program.cs maps it inside the central-role branch but with no active-node gating — unlike /health/active which has an active-node predicate. A standby central node will happily serve inbound API calls, executing scripts and Route.To() calls from a non-leader, which can race the active node or run against stale singleton state.

Recommendation

Gate the endpoint on active-node status (reuse the cluster active-node health check or a leader-state check) and return 503 on the standby, so Traefik/clients only reach the live node — consistent with how the Management API and /health/active are treated.

Resolution

Resolved 2026-05-16 (commit pending): introduced IActiveNodeGate, an abstraction the inbound API uses to ask whether this node is the active (cluster-leader) central node. The new InboundApiEndpointFilter (applied to POST /api/{methodName}) consults the gate and short-circuits a standby node with HTTP 503 before any auth/script work, so Traefik/clients only reach the live node — consistent with /health/active. The gate is resolved optionally: when no implementation is registered (non-clustered host / tests) the endpoint defaults to "allow", preserving prior behaviour. Regression tests StandbyNode_ShortCircuitsWith503_AndDoesNotRunHandler, ActiveNode_PassesGate_RunsHandler, and NoGateRegistered_PassesGate_RunsHandler added. Follow-up (outside this module's scope): ZB.MOM.WW.ScadaBridge.Host should register an IActiveNodeGate implementation backed by ActiveNodeHealthCheck / Cluster.State.Leader in the central-role branch of Program.cs so the gate is actually enforced in production; until then the endpoint defaults to "allow".

InboundAPI-009 — Failed compilation is retried on every subsequent request

Severity Low
Category Performance & resource management
Status Resolved
Location src/ZB.MOM.WW.ScadaBridge.InboundAPI/InboundScriptExecutor.cs:123-128

Description

When a method's script fails to compile, CompileAndRegister returns false and nothing is stored in _scriptHandlers. Every subsequent call to that method re-enters the lazy-compile branch and recompiles the broken script via Roslyn from scratch. Roslyn compilation is expensive; a single broken method definition repeatedly invoked by an external caller (no rate limiting) becomes a CPU amplification vector.

Recommendation

Cache the compilation failure (e.g. store a sentinel handler that immediately returns the compile error, or keep a HashSet of known-bad method names with the diagnostic) so a broken script is compiled at most once until the definition is updated via CompileAndRegister.

Resolution

Resolved 2026-05-16 (commit pending): confirmed the root cause — a failed Compile stored nothing in _scriptHandlers, so every subsequent request re-entered the lazy-compile branch and re-ran Roslyn. Added a _knownBadMethods ConcurrentDictionary of method names whose compilation failed; ExecuteAsync's lazy-compile path short-circuits before Roslyn when the method is already known-bad, and records the failure on a fresh failed compile. CompileAndRegister also records failures and clears the record on a successful (re)compile, so a fixed method definition is re-evaluated. Regression tests FailedCompilation_IsNotRetriedOnEveryRequest (asserts the compile-failure log fires exactly once across 5 requests) and FailedCompilation_RecompilesAfterCompileAndRegisterCalledAgain added.

InboundAPI-010 — ParameterValidator ignores extra body fields and cannot validate Object/List element types

Severity Low
Category Correctness & logic bugs
Status Resolved
Location src/ZB.MOM.WW.ScadaBridge.InboundAPI/ParameterValidator.cs:64-90, :112-118

Description

Two related correctness gaps: (1) The validator iterates only over defined parameters; any extra top-level fields in the request body are silently ignored rather than reported, so callers get no feedback on typo'd parameter names. (2) For Object and List types the validator only checks the JSON kind (Object/Array) and then blindly JsonSerializer.Deserializes the raw text — the design's extended type system describes Objects as "named structure with typed fields" and Lists as collections "of objects or primitive types", but no field-level or element-level type validation is performed. Invalid nested structures pass validation and surface only as runtime script errors.

Recommendation

Optionally warn/400 on unexpected body fields. For the extended types, either parse a richer ParameterDefinition (with nested field definitions / element type) and validate recursively, or document explicitly that Object/List are validated only for shape — and update the design doc to match.

Resolution

Resolved 2026-05-16 (commit pending): both gaps addressed along the lines the recommendation offers. (1) ParameterValidator.Validate now enumerates the request body's top-level fields and returns an Invalid result naming any field that does not match a defined parameter ("Unexpected parameter(s): ..."), so a typo'd parameter name is reported instead of silently ignored. (2) For Object/List, recursive field/element-level type validation is deliberately not added — it requires a richer nested ParameterDefinition schema, a design decision; instead the shape-only behaviour is now explicitly documented in the CoerceValue XML comment so the code's contract is unambiguous. Re-triage note: the design doc (Component-InboundAPI.md line 43) lists only Boolean/Integer/Float/String as method parameter types — the Object/List extended types are a CLAUDE.md decision; reconciling the design doc is out of this module's editable scope and left as a doc-owner follow-up. Regression tests UnexpectedBodyField_ReturnsInvalid and OnlyDefinedFields_StillValid added.

InboundAPI-011 — Method-existence check leaks to unapproved callers (enumeration oracle)

Severity Low
Category Security
Status Resolved
Location src/ZB.MOM.WW.ScadaBridge.InboundAPI/ApiKeyValidator.cs:39-52

Description

ValidateAsync returns 400 Method '{methodName}' not found when the method does not exist, but 403 API key not approved for this method when it exists but the key is not approved. A caller holding any valid enabled key can therefore enumerate which method names exist on the central API by observing 400-vs-403 responses. The error message also echoes the caller-supplied methodName back verbatim into the JSON response (EndpointExtensions.cs:47), a minor reflected-input concern.

Recommendation

Return an indistinguishable response (e.g. 403/404) for both "method not found" and "key not approved" so existence is not observable to unapproved callers. Avoid echoing raw caller input in error bodies, or sanitize it.

Resolution

Resolved 2026-05-16 (commit pending): confirmed — ValidateAsync returned 400 Method '{methodName}' not found for a missing method but 403 API key not approved for this method for an existing-but-unapproved one, an enumeration oracle, and echoed the caller-supplied method name verbatim. Both cases now return an identical response: HTTP 403 with the single shared message API key not approved for this method (a NotApprovedMessage constant); the method name is no longer interpolated into any error body, removing both the existence oracle and the reflected-input concern. Regression tests ValidKey_MethodNotFound_IsIndistinguishableFromNotApproved and ValidKey_MethodNotFound_ErrorMessageDoesNotEchoMethodName added; the pre-existing ValidKey_MethodNotFound_Returns400 test was updated to assert the new indistinguishable contract.

InboundAPI-012 — ParameterDefinition POCO declared in the component project, not Commons

Severity Low
Category Code organization & conventions
Status Resolved
Location src/ZB.MOM.WW.ScadaBridge.InboundAPI/ParameterValidator.cs:128-133

Description

ParameterDefinition is a persistence-/contract-shaped POCO: it is the deserialized form of ApiMethod.ParameterDefinitions (a column in the configuration database) and describes the public API contract. CLAUDE.md's code-organization rules place persistence-ignorant entity/contract types in ZB.MOM.WW.ScadaBridge.Commons. Defining it inside the InboundAPI project means any other component that needs to read or produce method parameter definitions (e.g. Central UI's method editor, CLI, Management Service) cannot share the type and will duplicate it.

Recommendation

Move ParameterDefinition (and a matching return-definition type, if added) to ZB.MOM.WW.ScadaBridge.Commons under the InboundApi entity/types namespace so it is shared by all components that work with method definitions.

Resolution

Resolved 2026-05-16 (commit <pending>): root cause confirmed against the source — ParameterDefinition was a persistence-ignorant, API-contract-shaped POCO (the deserialized form of the ApiMethod.ParameterDefinitions configuration-database column) declared inside the component project, contrary to CLAUDE.md's code-organization rule that such shared contract types live in ZB.MOM.WW.ScadaBridge.Commons. The type was moved to src/ZB.MOM.WW.ScadaBridge.Commons/Types/InboundApi/ParameterDefinition.cs (namespace ZB.MOM.WW.ScadaBridge.Commons.Types.InboundApi) — placed under Types/ with an InboundApi domain subfolder, matching the existing Types/Scripts/ precedent, since the column itself is the persisted form and this type is its deserialized contract shape (not an EF-mapped entity). It remains a pure POCO with no EF attributes and no behaviour. ParameterValidator now imports the moved type via a using ZB.MOM.WW.ScadaBridge.Commons.Types.InboundApi; directive; a tree-wide search confirmed ParameterValidator.cs was the type's only declaration and only direct consumer (all other ParameterDefinition* matches are the unrelated ParameterDefinitions string property). No return-definition type exists in the codebase — only a ReturnDefinition string column — so none was invented. No behavioural change, so no new runtime regression test: this is a compile-level type move, and the existing 52 ZB.MOM.WW.ScadaBridge.InboundAPI.Tests (including the ParameterValidator suite) act as the regression guard. dotnet test for ZB.MOM.WW.ScadaBridge.InboundAPI.Tests (52 passed) and ZB.MOM.WW.ScadaBridge.Commons.Tests (226 passed) are green; dotnet build ZB.MOM.WW.ScadaBridge.slnx succeeds with 0 warnings / 0 errors.

InboundAPI-013 — ApiKeyValidationResult.NotFound factory returns HTTP 400, contradicting its name

Severity Low
Category Documentation & comments
Status Resolved
Location src/ZB.MOM.WW.ScadaBridge.InboundAPI/ApiKeyValidator.cs:78-79

Description

The static factory is named NotFound and is used for the "method not found" case, but it builds a result with StatusCode = 400 (Bad Request), not 404. The name strongly implies 404 and will mislead future maintainers; EndpointExtensions faithfully propagates whatever status code the factory sets, so the misnaming directly affects the wire contract.

Recommendation

Rename the factory to match its behaviour (e.g. BadRequest) or change the status code to 404 if that is the intended contract — and document the chosen "method not found" status in Component-InboundAPI.md's Error Handling section, which currently does not list it.

Resolution

Resolved 2026-05-16 (commit pending): the misnamed NotFound factory (which built a StatusCode = 400 result) was the only producer of the "method not found" result, and the InboundAPI-011 fix made "method not found" return 403 via the existing Forbidden factory instead. The misleading NotFound factory is therefore now removed entirely — it has no remaining callers in or out of the module (ApiKeyValidationResult is InboundAPI-internal), eliminating the name/behaviour contradiction. No separate regression test is needed: the InboundAPI-011 tests cover the new method-not-found status, and removing dead code cannot regress. Doc-owner follow-up: Component-InboundAPI.md's Error Handling section still does not list a "method not found" status; it should note that it is reported as 403 (indistinguishable from "key not approved"), but that doc edit is outside this module's editable scope.

InboundAPI-014 — ReturnDefinition is loaded but never used; script return value is unshaped/unvalidated

Severity Medium
Category Design-document adherence
Status Resolved
Location src/ZB.MOM.WW.ScadaBridge.InboundAPI/InboundScriptExecutor.cs:201-205, src/ZB.MOM.WW.ScadaBridge.Commons/Entities/InboundApi/ApiMethod.cs:10

Description

Component-InboundAPI.md ("API Method Definition → Return Value Definition" and the "Response Format" section) specifies that each method has a declared return structure — "Field names and data types … Supports returning lists of objects" — and that the success response body is "the method's return value as JSON, with fields matching the return value definition". The ApiMethod entity carries a ReturnDefinition column to hold exactly this. However, nothing in the module ever reads ReturnDefinition: ExecuteAsync takes whatever object the script happens to return and does a blind JsonSerializer.Serialize(result). There is no validation that the script's return value matches the declared shape, no coercion to the declared field types, and no error when a method returns a structure inconsistent with its definition. A method whose script returns the wrong shape (or null where a structure is required) will silently emit a malformed 200 response, and the documented return-definition contract is effectively unenforced. This is the response-side mirror of the parameter validation that ParameterValidator does perform, leaving the two halves of the method contract asymmetric.

Recommendation

Either (a) implement return-value validation/shaping: parse ReturnDefinition with the same extended-type machinery used for parameters and validate/coerce the script result before serializing, returning a 500 (or logging) when the script result does not match; or (b) if return shaping is deliberately out of scope, remove the "Return Value Definition" / "fields matching the return value definition" language from Component-InboundAPI.md and document that the response is the script's raw return value serialized as-is. Code and design doc must be reconciled.

Resolution

Resolved 2026-05-17 (commit <pending>): root cause confirmed — ApiMethod carries ReturnDefinition but the executor did a blind JsonSerializer.Serialize(result), so a script returning the wrong shape silently emitted a malformed 200. Took option (a): added ReturnValueValidator, the response-side mirror of ParameterValidator. It parses ReturnDefinition (a JSON array of {name,type} field definitions, same extended-type set as parameters), validates the serialized script result against it — declared fields must be present with a compatible JSON type, primitives type-checked, Object/List shape-checked — and a null/non-object result is rejected when a structure is declared. InboundScriptExecutor.ExecuteAsync now runs the validator after serialization and, on mismatch, logs and returns a script failure ("Method return value did not match its return definition", → 500) instead of a malformed 200. A method with no ReturnDefinition stays unconstrained (backward compatible). Doc-owner follow-up (outside this module's editable scope): the Component-InboundAPI.md "Response Format" section may note that return shaping is validation-only (no coercion). Regression tests: ReturnValueValidatorTests (12 cases) plus executor-level ReturnValue_MatchingReturnDefinition_Succeeds, ReturnValue_NotMatchingReturnDefinition_ReturnsFailureNotMalformed200, and ReturnValue_NoReturnDefinition_IsUnconstrained.

InboundAPI-015 — ForbiddenApiChecker is purely textual and is bypassable via reflection reachable without a forbidden namespace token

Severity Medium
Category Security
Status Resolved
Location src/ZB.MOM.WW.ScadaBridge.InboundAPI/ForbiddenApiChecker.cs:63-119, src/ZB.MOM.WW.ScadaBridge.InboundAPI/InboundScriptExecutor.cs:109-126

Description

ForbiddenApiChecker walks the script syntax tree and rejects any using directive, QualifiedNameSyntax, or MemberAccessExpressionSyntax whose textual dotted name starts with a forbidden namespace prefix (System.IO, System.Diagnostics, System.Reflection, System.Net, etc.). This is a textual match, not a semantic one, and the trust model it enforces (per InboundAPI-005) is explicitly meant to keep untrusted Design-role scripts away from host APIs. The check can be bypassed because forbidden functionality is reachable through member access that never spells a forbidden namespace:

  • typeof(string).Assembly.GetType("System.IO.File")typeof(string) is permitted, .Assembly is a System.Type property, .GetType(string) is a System.Reflection.Assembly method. The string literal "System.IO.File" is a string, not a QualifiedNameSyntax or MemberAccessExpressionSyntax, so IsForbidden never sees it. The script obtains a System.IO.File Type and can InvokeMember/GetMethod(...).Invoke(...) on it — all via members of permitted types — with no forbidden namespace ever appearing in the source. CompileAndRegister references typeof(object).Assembly (System.Private.CoreLib) in ScriptOptions, so every framework type is loadable at runtime.
  • The executor also references the Microsoft.CSharp.RuntimeBinder assembly (InboundScriptExecutor.cs:116), enabling the dynamic keyword, which further widens late-bound member access that the static walker cannot see through.

Because the inbound API script runs on the central node with the host process's privileges and is authored by the (less-trusted-than-Admin) Design role, a static textual deny-list gives a false sense of containment.

Recommendation

Treat the syntax walker as defence-in-depth, not the boundary. Strengthen it where cheap (flag Assembly.GetType, Type.GetType, Activator.CreateInstance, InvokeMember, and dynamic usage), but for real enforcement run compiled scripts under a genuine boundary — a restricted AssemblyLoadContext/AppDomain-equivalent, a curated reference set that does not expose reflection-to-arbitrary-type, or an out-of-process sandbox — consistent with however the Site Runtime ultimately enforces its instance-script trust model. At minimum, document in Component-InboundAPI.md that the current check is best-effort and does not stop a determined script.

Resolution

Resolved 2026-05-17 (commit <pending>): the specific reflection-via-permitted-member vector was confirmed and the textual checker materially hardened against it (full sandboxing remains a separate, larger design effort — see below). ForbiddenApiWalker now, in addition to the namespace deny-list, rejects a curated set of reflection-gateway member names (GetType, GetTypeInfo, Assembly, Module, CreateInstance, InvokeMember, GetMethod(s), GetConstructor(s), GetField(s), GetProperty(ies), GetMember(s), GetRuntimeMethod(s), MethodHandle, TypeHandle) regardless of the receiver expression — so typeof(string).Assembly.GetType("System.IO.File") is now caught because .Assembly and .GetType appear as accessed member names. It also rejects a bare Activator identifier and the dynamic keyword (which widens late-bound access the static walker cannot see through). Invoke is deliberately not flagged so legitimate Action/Func delegate invocation still compiles — the reflection MethodInfo.Invoke path is cut off by rejecting the GetMethod that produces the MethodInfo. Documented limitation: this is hardened defence-in-depth, not a true sandbox — a determined author may still find a vector the syntax walker cannot see (e.g. via Microsoft.CSharp.RuntimeBinder internals or generics tricks). Genuine containment needs a runtime boundary (restricted AssemblyLoadContext / curated reference set that does not expose reflection-to-arbitrary-type / out-of-process sandbox); that is tracked as a future design change and noted in the ForbiddenApiChecker XML summary. Regression tests: new ForbiddenApiCheckerTests suite (19 cases) covering the Assembly/GetType/Type.GetType/Activator.CreateInstance/InvokeMember/ GetMethod/GetTypeInfo/dynamic bypass vectors plus permitted-script and namespace-deny-list regression guards.

InboundAPI-016 — Routed Route.To().Call() invocations are not bound by the method timeout

Severity Medium
Category Design-document adherence
Status Resolved
Location src/ZB.MOM.WW.ScadaBridge.InboundAPI/RouteHelper.cs:59-152, src/ZB.MOM.WW.ScadaBridge.InboundAPI/InboundScriptExecutor.cs:177, :199

Description

Component-InboundAPI.md states the per-method timeout "defines the maximum time the method is allowed to execute (including any routed calls to sites)", and the Routing Behavior section says a routed call "blocks until the site responds or the method-level timeout is reached". The executor builds a linked CancellationTokenSource (cts) combining the request-abort token and a dedicated timeout CTS, and exposes cts.Token to the script as InboundScriptContext.CancellationToken. However, every RouteTarget method (Call, GetAttribute(s), SetAttribute(s)) takes CancellationToken cancellationToken = default and the script must explicitly pass the context token for the routed call to honour the timeout. A natural script — Route.To("inst").Call("doWork", parameters) — invokes the routed call with CancellationToken.None. That request flows into CommunicationService.RouteToCallAsync with no cancellation, so the routed call is not bounded by the method timeout at all. The only timeout guard left is handler(context).WaitAsync(cts.Token) in ExecuteAsync: when the method timeout fires, WaitAsync returns a cancellation to the caller, but the underlying script Task — and the in-flight RouteToCallAsync awaiting a remote site — keeps running orphaned with no cancellation, holding the correlation/communication resources until the site eventually responds or its own transport timeout (if any) fires. The design's guarantee that the method timeout covers routed calls is therefore not met, and a slow/hung site can leak background work past the timeout the caller was told bounds the request.

Recommendation

Make routed calls inherit the method deadline without relying on script discipline: have RouteHelper/RouteTarget carry the executing method's CancellationToken (injected by InboundScriptExecutor when it constructs the context, e.g. a RouteHelper bound to cts.Token) and pass it into every CommunicationService call by default, so Route.To("x").Call("s", p) is timeout-bounded with no token argument. Keep the explicit-token overload for callers that want a tighter bound. Verify RouteToCallAsync and the attribute-routing calls actually observe the token and abandon the in-flight request when it fires.

Resolution

Resolved 2026-05-17 (commit <pending>): root cause confirmed — every RouteTarget method took CancellationToken cancellationToken = default, so a natural script Route.To("inst").Call("doWork", p) routed with CancellationToken.None and was not bound by the method timeout at all. RouteHelper now carries the executing method's deadline token: InboundScriptExecutor.ExecuteAsync calls the new RouteHelper.WithDeadline(cts.Token) when it builds the script context, so the route helper handed to the script is bound to the method-level timeout CTS. Each RouteTarget method resolves an effective token — the explicitly-supplied token if the caller passed one (tighter bound preserved), otherwise the method deadline — and forwards it into both IInstanceLocator site resolution and the routed call. The deadline token therefore flows through to CommunicationService.RouteTo*Async, so an in-flight routed call observes cancellation when the method timeout fires instead of running orphaned. Regression tests (in the new RouteHelperTests): Call_WithNoExplicitToken_InheritsMethodDeadlineToken, Call_WhenMethodDeadlineCancelled_RoutedCallObservesCancellation, Call_ExplicitToken_OverridesDeadlineToken, GetAttributes_WithNoExplicitToken_InheritsMethodDeadlineToken, SetAttributes_WithNoExplicitToken_InheritsMethodDeadlineToken.

InboundAPI-017 — RouteHelper / RouteTarget has no test coverage

Severity Low
Category Testing coverage
Status Resolved
Location src/ZB.MOM.WW.ScadaBridge.InboundAPI/RouteHelper.cs:1-165, tests/ZB.MOM.WW.ScadaBridge.InboundAPI.Tests/

Description

RouteHelper/RouteTarget is the entire WP-4 cross-site routing surface — the Route.To().Call()/GetAttribute(s)/SetAttribute(s) API that inbound API scripts use to reach instances at any site. It has zero tests: the ZB.MOM.WW.ScadaBridge.InboundAPI.Tests project covers ApiKeyValidator, ParameterValidator, InboundScriptExecutor, and InboundApiEndpointFilter, but no test file exercises RouteHelper. Untested behaviours include site resolution via IInstanceLocator (including the "instance not found / no assigned site" InvalidOperationException path at RouteHelper.cs:154-164), the !response.SuccessInvalidOperationException translation in each routed method, GetAttribute delegating to the batch GetAttributes and returning null for an absent key, correlation-ID generation, and SetAttribute delegating to SetAttributes. These are non-trivial branches whose failure modes (a thrown exception inside a script) surface to the caller as a 500, so regressions would be silent.

Recommendation

Add a RouteHelperTests suite using substituted IInstanceLocator and CommunicationService (the executor tests already substitute CommunicationService): cover the happy path of each routed method, the unresolved-instance throw, the !SuccessInvalidOperationException mapping, and GetAttribute returning null for a missing key. This also gives InboundAPI-016 a regression home if the timeout wiring is added.

Resolution

Resolved 2026-05-17 (commit <pending>): confirmed — ZB.MOM.WW.ScadaBridge.InboundAPI.Tests had no file exercising RouteHelper/RouteTarget. To make the surface testable without a live actor system, an IInstanceRouter seam was introduced in the module (the routing transport RouteHelper depends on); the production CommunicationServiceInstanceRouter delegates to CommunicationService and is registered by AddInboundAPI. RouteHelper now depends on IInstanceLocator + IInstanceRouter (both substitutable). Added the RouteHelperTests suite (15 cases) covering: the happy path of Call/GetAttribute(s)/ SetAttribute(s), correlation-ID generation, the unresolved-instance InvalidOperationException path, the !SuccessInvalidOperationException mapping for each routed method, GetAttribute delegating to the batch GetAttributes and returning null for an absent key, SetAttribute delegating to SetAttributes, and the InboundAPI-016 deadline-token inheritance behaviour. All 15 pass.

InboundAPI-018 — AuditWriteMiddleware fires WriteAsync as _ = task — faulted async writes are unobserved

Severity Medium
Category Error handling & resilience
Status Resolved
Location src/ZB.MOM.WW.ScadaBridge.InboundAPI/Middleware/AuditWriteMiddleware.cs:257

Resolution (2026-05-28): kept the fire-and-forget (audit emission must never block or alter the user-facing response per alog.md §13) but added ObserveAuditWriteFault, a small helper that attaches an OnlyOnFaulted ContinueWith to the writer task — an asynchronously-faulted audit write now logs a Warning (with the captured exception, method, path, and HTTP status) instead of vanishing into TaskScheduler.UnobservedTaskException. The continuation runs off-thread on TaskScheduler.Default so the response hot path is unchanged. Regression test AuditWriter_AsyncFault_IsObserved_AsWarning_AndDoesNotAlterResponse uses an async-yielding throwing writer to prove the post-async fault is logged and the response stays 200.

Description

EmitInboundAudit calls _ = _auditWriter.WriteAsync(evt); — the returned Task is discarded with the discard operator inside a synchronous try block. The wrapping try/catch (Exception ex) (lines 198266) only catches a synchronous throw before the writer returns a task. Once WriteAsync returns a task, any exception that faults that task (e.g. a DB timeout in the central audit writer, a serialization failure, a cancellation that bubbles up) is never observed: it is not logged, it does not increment the CentralAuditWriteFailures health-monitoring counter the design doc references ("Fail-soft semantics" paragraph), and the audit row is silently lost. In .NET, unobserved task exceptions are eventually surfaced via TaskScheduler.UnobservedTaskException and may also be logged by the runtime — either way, the middleware itself has no control over what (if anything) happens on a fault. The XML doc comment at line 255 claims "the writer itself swallows" but this is an implicit cross-component contract: the abstraction ICentralAuditWriter.WriteAsync returns Task and makes no such guarantee, and the only test that exercises a throwing writer (AuditWriter_Throws_* in AuditWriteMiddlewareTests.cs) uses an OnWrite callback that throws synchronously, not asynchronously — so the async fault path is not covered by tests either.

This matters because Component-InboundAPI.md states that audit-emission failures must increment CentralAuditWriteFailures (Health Monitoring #11) — a counter that, with the current fire-and-forget, will under-count async-faulted writes.

Recommendation

Either (a) await the write and rely on the surrounding try/catch to log the failure, accepting an extra await on the request hot path; or (b) keep the fire-and-forget for latency but attach a ContinueWith(t => ..., OnlyOnFaulted) that logs the fault and increments the failure counter, so a faulted async write is at least observed. Option (b) preserves "audit emission never blocks the HTTP response" while restoring the visibility the design assumes. Add a regression test using a writer whose WriteAsync returns a faulted Task (not a synchronous throw) to pin the new contract.

InboundAPI-019 — EnableBuffering() called unconditionally on every request, including bodyless requests

Severity Low
Category Performance & resource management
Location src/ZB.MOM.WW.ScadaBridge.InboundAPI/Middleware/AuditWriteMiddleware.cs:141
Status Resolved

Resolution (2026-05-28): Added a RequestHasBody(HttpRequest) guard that returns true only when ContentLength > 0 or the method is POST / PUT / PATCH (the HttpMethods.IsPost/Put/Patch helpers); the EnableBuffering + ReadBufferedRequestBodyAsync call now sits behind that guard. Bodyless GET / HEAD / DELETE / TRACE / OPTIONS requests (and any explicit Content-Length: 0) skip the FileBufferingReadStream allocation; body-carrying methods with no Content-Length (chunked POST etc.) still buffer. Regression tests BodylessMethod_SkipsEnableBuffering_RequestStreamIsNotReplaced (GET/HEAD/DELETE theory), BodylessPost_ContentLengthZero_SkipsEnableBuffering, and PostWithBody_StillEnablesBuffering_AndCapturesRequestSummary (anti-regression).

Description

InvokeAsync always calls ctx.Request.EnableBuffering() before the empty-body short-circuit at ReadBufferedRequestBodyAsync line 289 (if (request.ContentLength is 0) return (null, false);). EnableBuffering() swaps the request stream for a FileBufferingReadStream whose construction allocates an internal buffer (default threshold ~30 KB before spilling to a temp file) regardless of whether the request actually has a body. The /api scope this middleware lives under will see at least some bodyless requests (e.g. GET /api/audit/query once that route is in the same branch — see InboundAPI-025; future health checks; misbehaving clients) and each one pays the buffering allocation cost for no benefit.

Recommendation

Defer the EnableBuffering() call into ReadBufferedRequestBodyAsync after the ContentLength is 0 check, or short-circuit in InvokeAsync before enabling buffering when ContentLength is 0 and Method is "GET" or "HEAD" or "DELETE". The win is a per-request FileBufferingReadStream allocation avoided on every bodyless request through the middleware.

InboundAPI-020 — ContentType.Contains("json") is case-sensitive; application/JSON with no Content-Length skips body parsing

Severity Low
Category Correctness & logic bugs
Status Resolved
Location src/ZB.MOM.WW.ScadaBridge.InboundAPI/EndpointExtensions.cs:70

Resolution (2026-05-28): swapped the case-sensitive Contains("json") substring match for Contains("json", StringComparison.OrdinalIgnoreCase) so application/JSON / Application/Json / APPLICATION/JSON all enter the JSON-deserialization path. Regression test EndpointContentTypeTests.ContentTypeCheck_IsCaseInsensitive_ParsesBodyForAnyCasing (theory, 4 casings) drives a TestServer-hosted MapInboundAPI end-to-end with a required Integer parameter — proving body parsing actually ran by asserting the parameter reaches the handler.

Description

HandleInboundApiRequest parses the JSON body only when httpContext.Request.ContentLength > 0 || httpContext.Request.ContentType?.Contains("json") == true. The string.Contains(string) overload used here is case-sensitive — a perfectly valid HTTP header Content-Type: application/JSON (uppercase) would yield false ("application/JSON".Contains("json") is false). With no Content-Length (e.g. chunked transfer-encoding) and an uppercase content type, the handler then leaves body = null and ParameterValidator.Validate runs against a missing body — so a method that declares any required parameter is rejected with "Missing required parameters" even though the caller did send a well-formed JSON body. HTTP RFC 7230 §3.2 makes header field names case-insensitive but is silent on values; in practice clients do sometimes uppercase media-type tokens, and the framework's own MediaTypeHeaderValue is case-insensitive.

Recommendation

Use the case-insensitive overload — httpContext.Request.ContentType?.Contains("json", StringComparison.OrdinalIgnoreCase) == true — or rely on the framework's IsJson check via MediaTypeHeaderValue.TryParse/HttpRequest.HasJsonContentType(). Add a regression test posting with application/JSON and Transfer-Encoding: chunked.

Severity Medium
Category Design-document adherence
Status Resolved
Location src/ZB.MOM.WW.ScadaBridge.InboundAPI/RouteHelper.cs:141-143, :182-183, :225-226; src/ZB.MOM.WW.ScadaBridge.Commons/Messages/InboundApi/RouteToInstanceRequest.cs:15-21, :36-40, :55-59

Description

CLAUDE.md's Centralized Audit Log section describes ParentExecutionId as the cross-execution spawn pointer that "every row of a spawned run carries" and specifically calls out "the inbound API → routed-site-script case". The current implementation honours this only on RouteToCallRequest — which carries ParentExecutionId as its trailing additive field (line 21 of RouteToInstanceRequest.cs) and is stamped by RouteTarget.Call with the inbound request's execution id at line 143 of RouteHelper.cs.

RouteToGetAttributesRequest and RouteToSetAttributesRequest, however, have no ParentExecutionId field and the matching RouteTarget.GetAttributes / SetAttributes methods (RouteHelper.cs:182-183, :225-226) never reference _parentExecutionId. So when an inbound API script reads or writes a site attribute via Route.To("inst").GetAttribute(...) / Route.To("inst").SetAttribute(...), the site-side audit row for that trust-boundary action (an outbound-by-the-script DB / OPC write at the site) is emitted with ParentExecutionId = null and the execution-tree walk IX_AuditLog_ParentExecution cannot link it back to the spawning inbound request. The two-row pair (inbound + spawned site work) reverts to the "top-level / null" state the design says is the fallback for non-spawned runs. The asymmetry between Call and GetAttributes/SetAttributes is also surprising — a script author would reasonably expect the same correlation across all Route.To(...) calls.

Recommendation

Add a trailing Guid? ParentExecutionId = null field to RouteToGetAttributesRequest and RouteToSetAttributesRequest (additive trailing member, matches the message-evolution rule in CLAUDE.md); stamp it from _parentExecutionId in RouteTarget.GetAttributes and RouteTarget.SetAttributes; have the site-side handlers thread the field onto their emitted audit rows. Add a RouteHelperTests regression case asserting that an attribute read/write carries the inherited ParentExecutionId.

Resolution (2026-05-28):

Wire fix landed — RouteToGetAttributesRequest and RouteToSetAttributesRequest now carry a trailing additive Guid? ParentExecutionId = null, mirroring RouteToCallRequest; RouteTarget.GetAttributes and RouteTarget.SetAttributes stamp _parentExecutionId onto the request, so the field travels off the inbound API box symmetrically across all three Route.To() verbs. Four regression tests added to RouteHelperTests (with/without parent id for both verbs).

Site-side audit emission for routed reads/writes is NOT currently wired — DeploymentManagerActor.RouteInboundApiGetAttributes / …SetAttributes and InstanceActor.HandleGetAttribute / HandleSetStaticAttribute / HandleSetDataAttribute do not call IAuditWriter today (only script-driven AuditingDbConnection / AuditingDbCommand paths emit audit rows on the site side). The ParentExecutionId is now available on the wire so once those audit emissions land (tracked separately under the Audit Log site-wiring backlog), they can stamp the parent id without any further plumbing.

InboundAPI-022 — IActiveNodeGate has no production registration in Host — standby-node gating is silently disabled in production

Severity High
Category Security
Status Resolved
Location src/ZB.MOM.WW.ScadaBridge.InboundAPI/IActiveNodeGate.cs, src/ZB.MOM.WW.ScadaBridge.InboundAPI/InboundApiEndpointFilter.cs:52-60; absent from src/ZB.MOM.WW.ScadaBridge.Host/Program.cs

Resolution — Added src/ZB.MOM.WW.ScadaBridge.Host/Health/ActiveNodeGate.cs, a production IActiveNodeGate implementation backed by AkkaHostedService that mirrors ActiveNodeHealthCheck's leadership probe (member status Up AND Cluster.State.Leader == SelfAddress), and registered it as a singleton in the central-role branch of Program.cs. A structural regression test (CentralCompositionRootTests.Central_IActiveNodeGate_IsRegisteredAsActiveNodeGate) reflects over the built IServiceProvider to assert the registration's existence and concrete type — failing on main and passing after the fix. The InboundApiEndpointFilter's fall-through-to-allow behaviour is retained as the documented safe default for non-clustered hosts and tests.

Description

InboundAPI-008's resolution adds IActiveNodeGate (lines 1724 of IActiveNodeGate.cs) so a standby central node can refuse to serve the inbound API. InboundApiEndpointFilter.InvokeAsync consults the gate at line 52 (var gate = httpContext.RequestServices.GetService<IActiveNodeGate>();), and when gate is { IsActiveNode: false } returns HTTP 503. The filter's behaviour when no implementation is registered (line 51 comment) is to fall through and serve the request — the resolution paragraph for InboundAPI-008 closes with:

"Follow-up (outside this module's scope): ZB.MOM.WW.ScadaBridge.Host should register an IActiveNodeGate implementation backed by ActiveNodeHealthCheck / Cluster.State.Leader in the central-role branch of Program.cs so the gate is actually enforced in production; until then the endpoint defaults to "allow"."

A grep of the entire src/ZB.MOM.WW.ScadaBridge.Host/ tree at 1eb6e97 finds zero IActiveNodeGate registrations: grep -rn "IActiveNodeGate\|AddSingleton.*ActiveNode" src/ZB.MOM.WW.ScadaBridge.Host/ returns no matches. The follow-up was never carried out. So in production today the standby central node still serves the inbound API exactly as InboundAPI-008 described — executes method scripts, runs Route.To() calls, races the active node, and may operate against stale singleton state. The new infrastructure (interface + filter check) is present but unwired; from the user's perspective the original High-severity issue is unresolved in deployed binaries.

The design says the inbound API is "Central cluster only (active node)" and "fails over with it" — this guarantee is not currently enforced in production.

Recommendation

Register an IActiveNodeGate implementation in the central-role branch of ZB.MOM.WW.ScadaBridge.Host/Program.cs. The natural backing is the existing ActiveNodeHealthCheck (already wired for /health/active) or a direct read of Cluster.Get(actorSystem).State.Leader == Cluster.Get(actorSystem).SelfAddress. Add an integration test in the Host that spins up the central role and asserts that the gate is resolvable and returns IsActiveNode consistent with cluster leader state. Until that wiring lands, this finding is the user-facing realisation of the InboundAPI-008 vulnerability.

InboundAPI-023 — EndpointExtensions.HandleInboundApiRequest composition wiring has no test coverage

Severity Low
Category Testing coverage
Status Resolved
Location src/ZB.MOM.WW.ScadaBridge.InboundAPI/EndpointExtensions.cs:31-140, tests/ZB.MOM.WW.ScadaBridge.InboundAPI.Tests/

Resolution (2026-05-28): Added tests/ZB.MOM.WW.ScadaBridge.InboundAPI.Tests/EndpointExtensionsTests.cs, a TestServer-hosted suite (same pattern as EndpointContentTypeTests) that drives the POST /api/{methodName} wiring end-to-end. Seven cases pin the composed flow: happy path (200 + script result body), missing API key (401), unknown method (403, indistinguishable from "not approved" per InboundAPI-011), invalid JSON body (400), missing required parameter (400 from ParameterValidator), script throws (500 with sanitized error body — the executor's catch-all replaces the raw exception with "Internal script error"), and the HttpContext.Items[AuditWriteMiddleware.AuditActorItemKey] actor-stash invariant (verified by an inline capture middleware reading the slot after the endpoint runs). All 7 new tests pass; total InboundAPI.Tests now 158 (was 151).

Description

The endpoint handler HandleInboundApiRequest is the wiring composition that ties the validator → JSON parse → ParameterValidatorInboundScriptExecutor → result-serialization path together; it is the single piece of code that maps validator status codes to HTTP responses, threads the parentExecutionId from HttpContext.Items into the executor, stashes the resolved API key name as AuditActorItemKey, and emits the request-aborted short-circuit. The test project covers each composed component (ApiKeyValidatorTests, ParameterValidatorTests, InboundScriptExecutorTests, RouteHelperTests, InboundApiEndpointFilter, AuditWriteMiddlewareTests, MiddlewareOrderTests) but no test exercises HandleInboundApiRequest itself — so regressions in the wiring (e.g. forgetting to stash the actor name on HttpContext.Items, the Contains("json") case sensitivity from InboundAPI-020, or accidentally swapping validationResult.StatusCode for a literal) are not caught.

Recommendation

Add an EndpointExtensionsTests suite using TestServer (the same pattern MiddlewareOrderTests uses) covering: the happy path (200 + body), invalid JSON (400), validator 401, validator 403, parameter-validation failure (400), script-failure 500, client-aborted short-circuit (Results.Empty), and the actor-stash invariant (HttpContext.Items[AuditActorItemKey] is set with the resolved key name after successful auth, but is absent on auth failures).

InboundAPI-024 — _knownBadMethods is unbounded — an attacker can grow the cache by spamming distinct method names against the audit middleware path

Severity Low
Category Performance & resource management
Status Resolved
Location src/ZB.MOM.WW.ScadaBridge.InboundAPI/InboundScriptExecutor.cs:30, :77, :223, :233

Resolution (2026-05-28): capped the cache at KnownBadMethodsCap = 1000 entries via a new TryRecordBadMethod helper that short-circuits when the cap is reached — both CompileAndRegister and the ExecuteAsync lazy-compile path now route through it. Once full, new bad-method records are dropped; the cache is just a fast-fail optimisation and the per-request DB lookup remains the correctness path. Regression tests KnownBadMethodsCache_SizeNeverExceedsCap_UnderUniqueNameFlood and KnownBadMethodsCache_LazyCompilePath_AlsoCappedUnderUniqueNameFlood flood cap + 500 / cap + 250 unique broken methods and assert the cache size never exceeds the cap. Internal KnownBadMethodCount exposed for testability only.

Description

The InboundAPI-009 fix introduced _knownBadMethods, a ConcurrentDictionary<string, byte> of method names whose Roslyn compilation failed, to short-circuit lazy recompilation. It is keyed by method.Name and entries are only ever removed when CompileAndRegister succeeds for the same name (line 83). Practically the key space is bounded by the configured method definitions in the database, so this is bounded in normal operation. But because the cache is mutated from the lazy-compile path at ExecuteAsync.cs:233, and ExecuteAsync is called from HandleInboundApiRequest only after ApiKeyValidator.ValidateAsync has returned Valid (i.e. a real method exists), the entry is keyed by a name that must have already been resolved through GetMethodByNameAsync — so this attack surface is gated by the configuration database. The finding is therefore mostly defensive: there is no rate limit on inbound API calls (deliberate design), so if a future change ever causes ExecuteAsync to be called for an unvalidated caller-supplied method name (e.g. a refactor that moves method-existence checking later), this cache would become attacker-controllable.

Recommendation

Optional / defensive: cap _knownBadMethods (e.g. an LRU with a fixed size, or clear it periodically). At minimum, document the invariant in the executor's XML comment that _knownBadMethods keys must come from validated ApiMethod.Name values, so the safety property survives future refactors. No immediate change required; this is a watch-list item.

InboundAPI-025 — AuditWriteMiddleware runs against the entire /api/* branch — emits spurious ApiInbound audit rows for /api/audit/query and /api/audit/export

Severity Medium
Category Correctness & logic bugs
Status Resolved
Location src/ZB.MOM.WW.ScadaBridge.Host/Program.cs:183-185; consumers: src/ZB.MOM.WW.ScadaBridge.ManagementService/AuditEndpoints.cs:93-94; emitter: src/ZB.MOM.WW.ScadaBridge.InboundAPI/Middleware/AuditWriteMiddleware.cs:175-252

Resolution (2026-05-28): Took the defensive path-exclusion in Program.cs (Option 1 from the recommendation). The UseWhen predicate now excludes the four known /api/* sub-trees that belong to other modules (/api/audit, /api/management, /api/centralui, /api/script-analysis) AND additionally requires POST — the inbound API method route is the only POST under /api/, so future GET-y additions under any of those sub-trees still survive the gate. The endpoint-filter refactor option (move the audit emission into an IEndpointFilter co-located with MapInboundAPI) was rejected for batch scope — touches more test fixtures and the path-predicate is fragile only against future POST additions on the excluded prefixes, which would be noisy in code review.

Description

Program.cs wires the audit middleware as app.UseWhen(ctx => ctx.Request.Path.StartsWithSegments("/api"), branch => branch.UseAuditWriteMiddleware()) — scoped to the /api prefix, not to the POST /api/{methodName} route. Meanwhile, ZB.MOM.WW.ScadaBridge.ManagementService/AuditEndpoints.cs maps MapGet("/api/audit/query", ...) (line 93) and MapGet("/api/audit/export", ...) (line 94). Both routes therefore inherit AuditWriteMiddleware, which emits an AuditEvent { Channel = AuditChannel.ApiInbound, Kind = AuditKind.InboundRequest, ... } row for every call. The middleware's ResolveMethodName falls back to the last path segment (lines 446452), so a GET /api/audit/query?... is recorded as if a caller had invoked an inbound API method named "query"; an export is recorded as method "export". Effects:

  1. Audit log is polluted with non-script rows. The audit log is now recording its own query traffic as if it were inbound script invocations, contradicting Component-AuditLog.md's scope ("script trust boundary actions").
  2. Audit reads recursively emit audit writes. Every audit-log query (e.g. from the Central UI Audit Log page or the CLI audit query command) writes an additional row into AuditLog, growing the table on read.
  3. Target is meaningless. /api/audit/query has no method definition, so the recorded Target = "query" is not joinable to any ApiMethod row in audit-log drill-ins.
  4. Wasted resources on health probes / management calls. Any future routes added under /api/ will inherit the middleware and pay the EnableBuffering, CapturedResponseStream, and JsonSerializer.Serialize costs even though they are not inbound script invocations.

Tests for the audit middleware (AuditWriteMiddlewareTests) and pipeline order (MiddlewareOrderTests) wire the middleware only against the POST /api/{methodName} route in test hosts, so this production-only mis-scoping is not exercised.

Recommendation

Tighten the predicate so the middleware runs only on the inbound API method route, not on the /api/ prefix. Options:

  • app.UseWhen(ctx => ctx.Request.Path.StartsWithSegments("/api") && !ctx.Request.Path.StartsWithSegments("/api/audit") && !ctx.Request.Path.StartsWithSegments("/api/management"), ...) — defensive, but fragile to future route additions.
  • Move the audit emission from a pipeline middleware to an IEndpointFilter applied via .AddEndpointFilter<>() on the MapInboundAPI registration (alongside InboundApiEndpointFilter). This makes the scope explicit on the one route that needs it and survives future /api/... route additions unchanged.

The endpoint-filter form is the recommended fix — it co-locates the audit-emission scope with the route definition and matches how InboundAPI-006/008 gating is already wired.