docs(code-review): full review at 4307c381 — 18 modules, 67 findings recorded + remediation tracked

Full per-module re-review of the 16 stale modules (last seen 1eb6e97 / 2026-05-28)
plus first-ever reviews of KpiHistory (#26) and ScriptAnalysis (#25), at HEAD 4307c381.

67 new findings (0 Critical, 6 High, 27 Medium, 34 Low). Remediation in commit
fd618cf1 closed 5 of the 6 Highs and ~33 Medium/Low; the rest are Deferred/Won't Fix
with rationale. Remaining pending (4) are all InboundAPI's Database-helper findings
(IA-026 High .. IA-029), left to the active feat/ipsen-movein effort per owner decision.

Highlights: caught a central-only-delivery security drift (SMTP creds broadcast to
sites — DM-025/SR-031), a never-committed 'Resolved' fix (SiteEventLogging-016 → -024),
an unguarded KPI recorder tick (KH-001), a trust-analyzer fallback weakening (SA-001),
and a native-alarm subscribe-path leak (DCL-023). ScriptAnalysis verdict: trust boundary
is semantically sound (symbol-based) in the production cluster config.

README regenerated; regen-readme.py --check passes (4 pending / 567 total).
Joseph Doherty
2026-06-20 18:02:32 -04:00
parent fd618cf1dc
commit d39089f4ed
19 changed files with 4031 additions and 69 deletions
+229 -3
@@ -5,10 +5,10 @@
| Module | `src/ZB.MOM.WW.ScadaBridge.InboundAPI` |
| Design doc | `docs/requirements/Component-InboundAPI.md` |
| Status | Reviewed |
| Last reviewed | 2026-05-28 |
| Last reviewed | 2026-06-20 |
| Reviewer | claude-agent |
| Commit reviewed | `1eb6e97` |
| Open findings | 0 |
| Commit reviewed | `4307c381` |
| Open findings | 4 |
## Summary
@@ -124,6 +124,36 @@ configuration database, but the invariant is undocumented.)
| 9 | Testing coverage | ☑ | `EndpointExtensions.HandleInboundApiRequest` composition wiring has no test (InboundAPI-023); middleware/filter/validator/executor/route are individually covered. |
| 10 | Documentation & comments | ☑ | No new issues. |
#### Re-review 2026-06-20 (commit `4307c381`) — full review
All 25 prior findings remain `Resolved` (InboundAPI-022's `IActiveNodeGate` Host
registration is now present at `Program.cs:260`; InboundAPI-025's POST-only audit
`UseWhen` predicate is live at `Program.cs:337-344`). The module changed materially:
the whole project was renamed (ScadaLink → ZB.MOM.WW.ScadaBridge), inbound auth was
re-architected to the shared `ZB.MOM.WW.Auth.ApiKeys` Bearer/`sbk_` verifier (the SQL
`ApiKey` store, `ApiKeyValidator`, and peppered `IApiKeyHasher` were retired),
`ForbiddenApiChecker` became a thin shim over Script Analysis #25's `ScriptTrustValidator`,
and JSON-Schema `$ref` runtime resolution (`SchemaRefResolver`) plus recursive
Object/List validation+materialization landed. The dominant new finding is that a
nominally read-only **`InboundDatabaseHelper`** was added as `InboundScriptContext.Database`,
reintroducing the very raw-DB-access capability the design doc still explicitly
forbids (regressing the InboundAPI-007 closure), with no read-only enforcement and a
sync-over-async, token-ignoring data path. **4 new findings**: 1 High, 1 Medium,
2 Low; no Critical.
| # | Category | Examined | Notes |
|---|----------|----------|-------|
| 1 | Correctness & logic bugs | ☑ | Recursive Materialize/Validate look sound; `WaitForAttribute` deadline-vs-wait-timeout tension (InboundAPI-029). |
| 2 | Akka.NET conventions | ☑ | ASP.NET-hosted; routes via `IInstanceRouter` → `CommunicationService`. Correlation/parent ids stamped on all four `Route.To()` verbs. No issues. |
| 3 | Concurrency & thread safety | ☑ | `ConcurrentDictionary` caches; per-request DI scope created from root in singleton executor (valid). New DB helper blocks a pool thread (InboundAPI-027). |
| 4 | Error handling & resilience | ☑ | DB helper's `ExecuteScalar`/`ExecuteReader` are synchronous and ignore `_ct` — a slow query is unbounded by the method timeout (InboundAPI-027). |
| 5 | Security | ☑ | `InboundDatabaseHelper` runs arbitrary script-supplied SQL (not read-only despite the name) on central with host privileges (InboundAPI-026). Bearer verifier delegates constant-time compare to shared lib. Forbidden-API check delegates to #25. |
| 6 | Performance & resource management | ☑ | Capped capture/known-bad caches; `ArrayPool` rent. Sync-over-async DB calls (InboundAPI-027). |
| 7 | Design-document adherence | ☑ | `Database` helper contradicts the doc's "No direct database access" decision (InboundAPI-026, regresses InboundAPI-007); `WaitForAttribute` timeout binding diverges from spec §6 line 192 (InboundAPI-029). |
| 8 | Code organization & conventions | ☑ | `InboundDatabaseHelper` is a component-local helper (fine); schema types in Commons. No issues. |
| 9 | Testing coverage | ☑ | `InboundDatabaseHelper` tests cover the SELECT happy path only; there is no write/DDL-rejection test because no rejection is enforced. `WaitForAttribute`/`Database` are not exercised end-to-end via the endpoint (InboundAPI-028). |
| 10 | Documentation & comments | ☑ | Helper XML doc claims "read-only" which is unenforced (folded into InboundAPI-026). |
## Findings
### InboundAPI-001 — Singleton script handler cache mutated without synchronization
@@ -1299,3 +1329,199 @@ route, not on the `/api/` prefix. Options:
The endpoint-filter form is the recommended fix — it co-locates the audit-emission
scope with the route definition and matches how InboundAPI-006/008 gating is
already wired.
### InboundAPI-026 — `InboundDatabaseHelper` gives inbound scripts arbitrary raw SQL (not read-only); contradicts the design doc's "No direct database access" decision and regresses InboundAPI-007
| | |
|--|--|
| Severity | High |
| Category | Security |
| Status | Open |
| Location | `src/ZB.MOM.WW.ScadaBridge.InboundAPI/InboundDatabaseHelper.cs:13-70`, `src/ZB.MOM.WW.ScadaBridge.InboundAPI/InboundScriptExecutor.cs:267-281`, `:387`; design doc `docs/requirements/Component-InboundAPI.md:202-215` |
**Description**
A new `InboundDatabaseHelper` is exposed to every inbound API script as
`InboundScriptContext.Database` (`InboundScriptExecutor.cs:387`), backed by the
production `IDatabaseGateway` (`AddExternalSystemGateway()` in the central-role branch
of `Program.cs:81`). Its `QuerySingle<T>`/`Query` methods take a script-supplied `sql`
string and run it verbatim — `cmd.CommandText = sql; cmd.ExecuteScalar()/ExecuteReader()`
(`InboundDatabaseHelper.cs:27-29`, `:41-44`) — against a raw ADO.NET `DbConnection`
obtained from `IDatabaseGateway.GetConnectionAsync` (`IDatabaseGateway.cs:20`, the same
gateway the External System Gateway uses for SCADA machine-data DBs). Two problems:
1. **It directly contradicts the design doc.** `Component-InboundAPI.md:202-215` — which
is the *resolution* of finding InboundAPI-007 — states verbatim: *"**No direct
database access.** Inbound API scripts are not given a raw database client. Handing a
script a raw `SqlConnection` is in direct tension with the ScadaBridge script trust
model … Scripts interact with the system only through the curated `Route` and
`Parameters` surfaces … If a method needs data from the configuration or machine-data
databases, that access belongs behind a dedicated, scoped helper — not a
general-purpose connection — and would be added here as an explicit design change."*
The code now ships exactly the capability the doc forbids, and the doc was **not**
updated. This is a regression of the closed InboundAPI-007 (whose chosen resolution
was to remove the DB API and document its absence). Inbound API scripts are authored
by the less-trusted-than-Admin Design role and execute on the central node with the
host process's privileges, so this materially widens the trust boundary the design
deliberately drew.
2. **"Read-only" is unenforced — it is a comment, not a constraint.** The class summary
says *"Read-only database access"* but nothing restricts the SQL: a script may run
`Database.Query("conn", "DELETE FROM Machine")`, `UPDATE`, `INSERT`, `DROP`, or any
DDL/DML, and `Query`/`QuerySingle` will execute it (`ExecuteReader`/`ExecuteScalar`
happily run non-SELECT statements). The gateway opens a normal read-write connection
(no `ApplicationIntent=ReadOnly`, no statement allow-listing). The "read-only" claim
in the XML doc and the call-site comment (`InboundScriptExecutor.cs:240`,
"read-only Database helper") is therefore false and misleading.
Parameter *values* are correctly bound (`AddParameters`, `:59-68`) so value-injection is
not the issue — the issue is that arbitrary statement text is script-controlled and the
"read-only" containment does not exist.
**Recommendation**
Reconcile code and design, and make the safety claim real. Either (a) if scoped DB
access is now intended, update `Component-InboundAPI.md:202-215` to authorize it as the
explicit design change the prior text demanded, AND enforce read-only — restrict to
`SELECT` (reject any statement whose first significant token is not `SELECT`/`WITH`, or
better, route through a read-intent connection / a curated query API that cannot mutate),
fix the XML summary to describe what is actually enforced, and gate which connection
names a script may reach; or (b) if direct DB access is still out of scope, remove
`InboundDatabaseHelper` and the `Database` member (reverting to the InboundAPI-007
posture). Add a regression test asserting a non-SELECT statement is rejected once
read-only is enforced (see InboundAPI-028).
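A minimal sketch of the first-token check mentioned in option (a), assuming a
hypothetical `ReadOnlySqlGuard` class that does not exist in the reviewed code. It skips
leading whitespace and SQL comments, rejects any text containing `;`, and accepts only
statements that begin with `SELECT` or `WITH`. It is an illustration of the idea, not a
complete defence.

```csharp
using System;

// Hypothetical helper for illustration; not part of the reviewed codebase.
public static class ReadOnlySqlGuard
{
    // True only when the first significant token is SELECT or WITH and the
    // text contains no ';' statement separator.
    public static bool LooksReadOnly(string sql)
    {
        if (string.IsNullOrWhiteSpace(sql)) return false;

        // Conservative: refuse anything that could be a ';'-separated batch,
        // even if the ';' sits inside a string literal.
        if (sql.IndexOf(';') >= 0) return false;

        int start = SkipLeadingTrivia(sql);
        int end = start;
        while (end < sql.Length && char.IsLetter(sql[end])) end++;
        string token = sql.Substring(start, end - start);

        return token.Equals("SELECT", StringComparison.OrdinalIgnoreCase)
            || token.Equals("WITH", StringComparison.OrdinalIgnoreCase);
    }

    // Returns the index of the first character that is neither whitespace
    // nor part of a leading "--" line comment or "/* */" block comment.
    private static int SkipLeadingTrivia(string s)
    {
        int i = 0;
        while (i < s.Length)
        {
            if (char.IsWhiteSpace(s[i]))
            {
                i++;
            }
            else if (s[i] == '-' && i + 1 < s.Length && s[i + 1] == '-')
            {
                int newline = s.IndexOf('\n', i);
                i = newline < 0 ? s.Length : newline + 1;
            }
            else if (s[i] == '/' && i + 1 < s.Length && s[i + 1] == '*')
            {
                int close = s.IndexOf("*/", i + 2, StringComparison.Ordinal);
                if (close < 0) return s.Length; // unterminated comment: no token
                i = close + 2;
            }
            else
            {
                break;
            }
        }
        return i;
    }
}
```

With this sketch, `ReadOnlySqlGuard.LooksReadOnly("DELETE FROM Machine")` returns
`false`. A token check alone is weak: a `WITH … DELETE` statement, `SELECT … INTO`, or a
T-SQL batch written without semicolons would still pass, which is why the read-intent
connection or curated query API is the stronger control.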
**Resolution**
_Unresolved._
### InboundAPI-027 — `InboundDatabaseHelper` is sync-over-async and ignores the method-deadline token — thread-pool starvation and an unbounded slow query
| | |
|--|--|
| Severity | Medium |
| Category | Performance & resource management |
| Status | Open |
| Location | `src/ZB.MOM.WW.ScadaBridge.InboundAPI/InboundDatabaseHelper.cs:25-33`, `:40-44` |
**Description**
`QuerySingle`/`Query` execute on the ASP.NET request thread (the script runs inside
`CSharpScript.RunAsync` on a thread-pool thread). Both methods do
`_gateway.GetConnectionAsync(connectionName, _ct).GetAwaiter().GetResult()`
(`:25`, `:40`) — a synchronous block on an async DB-open — and then call the **synchronous**
`cmd.ExecuteScalar()` (`:29`) / `cmd.ExecuteReader()` (`:44`). Two consequences:
1. **Thread-pool starvation.** The inbound API has no rate limiting (a deliberate design
choice). Each DB-backed method invocation blocks a pool thread for the full duration
of the DB round-trip via `.GetAwaiter().GetResult()`. Under concurrent load this
consumes pool threads that cannot be released until the DB responds — exactly the
pattern ASP.NET guidance warns against (it won't deadlock without a sync context, but
it does starve the pool and degrade the whole central node).
2. **The slow query is not bounded by the method timeout.** The cancellation token
`_ct` (the method-deadline `cts.Token` passed at `InboundScriptExecutor.cs:268`) is
forwarded to `GetConnectionAsync` but **not** to the synchronous `ExecuteScalar()` /
`ExecuteReader()` — those overloads ignore cancellation entirely. So once a connection
is open, a hung or slow `SELECT` runs to completion (or the provider's own command
timeout) regardless of the method timeout, holding the blocked thread the whole time.
This contradicts the InboundAPI-016 guarantee that the method timeout bounds the
method's work.
**Recommendation**
Use the async ADO.NET path end-to-end and honour the token: `await
_gateway.GetConnectionAsync(...)`, then `await cmd.ExecuteReaderAsync(_ct)` /
`await cmd.ExecuteScalarAsync(_ct)`, and make `QuerySingle`/`Query` async (`Task<T?>` /
`Task<IReadOnlyList<...>>`) so the script `await`s them rather than blocking a pool
thread. Set a `CommandTimeout` derived from the remaining method deadline as a backstop.
If a synchronous script-facing signature must be kept, at minimum set a `CommandTimeout`
and register `cmd.Cancel` on the token (`_ct.Register(cmd.Cancel)`) for the duration of
the call so a slow query cannot run unbounded.
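A hedged sketch of the async path with a deadline-derived backstop, written against the
provider-neutral `System.Data.Common` types. `AsyncQuerySketch`, `QuerySingleAsync`, and
`CommandTimeoutSeconds` are hypothetical names, and the connection is taken as a
parameter because the exact `IDatabaseGateway` signature is not reproduced here.

```csharp
using System;
using System.Data.Common;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical sketch; names are illustrative and not from the reviewed code.
public static class AsyncQuerySketch
{
    // Converts the remaining method deadline to whole seconds for CommandTimeout.
    // Clamped to at least 1 because providers such as SqlClient treat 0 as "no limit".
    public static int CommandTimeoutSeconds(TimeSpan remaining) =>
        (int)Math.Clamp(Math.Ceiling(remaining.TotalSeconds), 1, int.MaxValue);

    // Runs a scalar query without blocking the calling thread. The token is
    // passed to the provider; CommandTimeout is a backstop in case the provider
    // does not cancel promptly.
    public static async Task<object?> QuerySingleAsync(
        DbConnection connection, string sql, TimeSpan remaining, CancellationToken ct)
    {
        await using DbCommand cmd = connection.CreateCommand();
        cmd.CommandText = sql;
        cmd.CommandTimeout = CommandTimeoutSeconds(remaining);

        object? value = await cmd.ExecuteScalarAsync(ct);
        return value is DBNull ? null : value;
    }
}
```

How promptly cancellation takes effect depends on the ADO.NET provider, so the command
timeout and the token are meant to be used together rather than as alternatives.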
**Resolution**
_Unresolved._
### InboundAPI-028 — `InboundDatabaseHelper` has no negative-path tests; `Database`/`WaitForAttribute` are not exercised end-to-end through the endpoint
| | |
|--|--|
| Severity | Low |
| Category | Testing coverage |
| Status | Open |
| Location | `tests/ZB.MOM.WW.ScadaBridge.InboundAPI.Tests/InboundDatabaseHelperTests.cs:34-67`, `tests/ZB.MOM.WW.ScadaBridge.InboundAPI.Tests/EndpointExtensionsTests.cs` |
**Description**
`InboundDatabaseHelperTests` covers only the SELECT happy path (single value, no-rows
default, multi-column row, empty result). There is no test that:
- a non-SELECT statement (`DELETE`/`UPDATE`/`DROP`) is rejected — because nothing rejects
it today (InboundAPI-026); a test would currently *prove* the helper is not read-only;
- the `InvalidOperationException("Database is not available …")` path fires when the
gateway is null (`InboundDatabaseHelper.cs:24`, `:39`);
- the method-deadline token cancels a slow query (InboundAPI-027).
Separately, `InboundScriptContext.Database` and `RouteTarget.WaitForAttribute` — both
new script-facing capabilities — are unit-tested at the helper/route level but never
driven end-to-end through `HandleInboundApiRequest` (the `EndpointExtensionsTests`
TestServer suite does not invoke a script that touches `Database` or `WaitForAttribute`),
so a wiring regression that fails to construct the `Database` helper or thread the
deadline into a wait would be silent.
**Recommendation**
Add negative-path tests to `InboundDatabaseHelperTests` (null-gateway throw; once
InboundAPI-026 enforces read-only, a write/DDL-rejection test; an InboundAPI-027
cancellation test). Add an `EndpointExtensionsTests` case driving a method script that
calls `Database.Query(...)` and one that calls `Route.To(...).WaitForAttribute(...)` so
the executor→context→helper wiring is covered through the real endpoint.
**Resolution**
_Unresolved._
### InboundAPI-029 — Routed `WaitForAttribute` is cancelled by the method-level deadline, contradicting spec §6 (wait bounded by the wait timeout, not the method timeout)
| | |
|--|--|
| Severity | Low |
| Category | Design-document adherence |
| Status | Open |
| Location | `src/ZB.MOM.WW.ScadaBridge.InboundAPI/RouteHelper.cs:220-247`, `:302-303`; design doc `docs/requirements/Component-InboundAPI.md:192` |
**Description**
`Component-InboundAPI.md:192` says of `WaitForAttribute`: *"The cluster call is bounded
by the **wait timeout** rather than the generic integration timeout."* But
`RouteTarget.WaitForAttribute` routes with `Effective(cancellationToken)` (`:226`),
which — when the script passes no explicit token, the normal case — resolves to
`_deadlineToken` (the method-level timeout CTS bound by `WithDeadline` at
`InboundScriptExecutor.cs:279`). So a method whose timeout (default 30 s) is shorter than
the requested wait timeout (e.g. `WaitForAttribute("Flag", true, TimeSpan.FromMinutes(5))`)
has its wait cancelled at 30 s by the method deadline, never reaching the 5-minute wait
timeout the spec says governs it. The routed wait is thus bounded by the *generic method
timeout* — exactly what spec line 192 says it should not be.
This is in genuine tension with the InboundAPI-016 decision ("the method timeout covers
routed calls, including waits"). The two design statements conflict for the
`WaitForAttribute` case: §6 line 192 wants the wait timeout to win; InboundAPI-016 wants
the method deadline to be the hard ceiling. The current code silently resolves it in
favour of the method deadline, which makes the documented `WaitForAttribute` timeout
semantics (and its `false`-on-timeout return contract) unreliable whenever the method
timeout is the smaller of the two.
**Recommendation**
Reconcile the two design statements explicitly. Options: (a) treat the method deadline as
the hard ceiling and update `Component-InboundAPI.md:192` to say the wait is bounded by
`min(waitTimeout, remaining method deadline)` and that an undersized method timeout will
cut a wait short; or (b) honour spec §6 by deriving the wait's effective token from the
wait timeout (a per-wait CTS) rather than the method deadline, accepting that a long wait
can outlive the generic method timeout. Either way, add a `RouteHelperTests` case pinning
the chosen semantics when method-timeout < wait-timeout (the current suite only tests
deadline-token *inheritance*, not the timeout *ordering* conflict).
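The two options can be sketched with standard `CancellationTokenSource` calls. This is
an illustration under assumed names (`WaitDeadlineSketch` and its members are
hypothetical), not the `RouteHelper` implementation.

```csharp
using System;
using System.Threading;

// Hypothetical sketch of the two reconciliation options; not the reviewed code.
public static class WaitDeadlineSketch
{
    // Option (a): the effective bound is min(waitTimeout, remaining method deadline).
    public static TimeSpan EffectiveWait(TimeSpan waitTimeout, TimeSpan remainingMethodDeadline) =>
        waitTimeout < remainingMethodDeadline ? waitTimeout : remainingMethodDeadline;

    // Option (a) as a token: cancelled when the method deadline fires or the
    // wait timeout elapses, whichever happens first. The caller disposes the source.
    public static CancellationTokenSource LinkedToDeadline(
        TimeSpan waitTimeout, CancellationToken methodDeadline)
    {
        var cts = CancellationTokenSource.CreateLinkedTokenSource(methodDeadline);
        cts.CancelAfter(waitTimeout);
        return cts;
    }

    // Option (b): a per-wait source governed only by the wait timeout, so the
    // wait may outlive the method deadline. The caller disposes the source.
    public static CancellationTokenSource WaitOnly(TimeSpan waitTimeout) =>
        new CancellationTokenSource(waitTimeout);
}
```

For the example in the description (30 s method timeout, 5-minute wait),
`EffectiveWait` yields 30 seconds under option (a), while the option (b) source stays
uncancelled until the 5 minutes elapse.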
**Resolution**
_Unresolved._