docs(code-review): full review at 4307c381 — 18 modules, 67 findings recorded + remediation tracked

Full per-module re-review of the 16 stale modules (last seen 1eb6e97 / 2026-05-28)
plus first-ever reviews of KpiHistory (#26) and ScriptAnalysis (#25), at HEAD 4307c381.

67 new findings (0 Critical, 6 High, 27 Medium, 34 Low). Remediation in commit
fd618cf1 closed 5 of the 6 Highs and ~33 Medium/Low; the rest are Deferred/Won't Fix
with rationale. Remaining pending (4) are all InboundAPI's Database-helper findings
(IA-026 High .. IA-029), left to the active feat/ipsen-movein effort per owner decision.

Highlights: caught a central-only-delivery security drift (SMTP creds broadcast to
sites — DM-025/SR-031), a never-committed 'Resolved' fix (SiteEventLogging-016 → -024),
an unguarded KPI recorder tick (KH-001), a trust-analyzer fallback weakening (SA-001),
and a native-alarm subscribe-path leak (DCL-023). ScriptAnalysis verdict: trust boundary
is semantically sound (symbol-based) in the production cluster config.

README regenerated; regen-readme.py --check passes (4 pending / 567 total).
Joseph Doherty
2026-06-20 18:02:32 -04:00
parent fd618cf1dc
commit d39089f4ed
19 changed files with 4031 additions and 69 deletions
+204 -2
@@ -5,9 +5,9 @@
| Module | `src/ZB.MOM.WW.ScadaBridge.Host` |
| Design doc | `docs/requirements/Component-Host.md` |
| Status | Reviewed |
| Last reviewed | 2026-05-28 |
| Last reviewed | 2026-06-20 |
| Reviewer | claude-agent |
| Commit reviewed | `1eb6e97` |
| Commit reviewed | `4307c381` |
| Open findings | 0 |
## Summary
@@ -80,6 +80,33 @@ Serilog is the only logger provider and the section is dead config (Host-021);
and `ParseLevel` silently swallows an unrecognised `MinimumLevel` value (e.g.
a typo) and falls back to `Information` with no warning (Host-022).
#### Re-review 2026-06-20 (commit `4307c381`) — full review
All twenty-two prior findings (Host-001..022) remain `Resolved` in the current tree
(Host-008 was deliberately *reverted* in M2.9 — see Host-024). This is the first
review since the `ScadaBridge` rename, so a `git diff 1eb6e97 HEAD` shows the whole
directory as additions and is not a useful delta; the review walked the full current
state plus the per-file commit history. Major changes since `1eb6e97`: REQ-HOST-4a was
re-architected (the leader-only `active-node` check left `/health/ready`; a new
leadership-agnostic `RequiredSingletonsHealthCheck` probes the five required central
singletons via `Identify` — clean), shared `ZB.MOM.WW.Health`/`Telemetry` adoption
(`/healthz`, `/metrics`, OTel), a site HTTP/1.1 `MetricsPort` (8084) listener, the
KpiHistory + Transport central registrations, the disable-login dev flag, and the
library-backed inbound API-key auth. Four new findings, none crash/data-loss class.
Host-023 (Medium) is a regression-of-resolution: the shipped `appsettings.Site.json`
second seed-node now targets `localhost:8084` — the new HTTP/1.1 MetricsPort, not an
Akka remoting endpoint — the same defect class as the resolved Host-004 (which chose
8084 as a "remoting port" before MetricsPort claimed it); `StartupValidator` rejects a
seed on the GrpcPort but not on the MetricsPort. Host-024 (Medium) flags the M2.9
reversal of Host-008: `MachineDataDb` is again *required* for Central yet is consumed
by no DbContext/repository AND is absent from the shipped `appsettings.Central.json`,
so a developer running the shipped Central binary fails startup for a value nothing
uses. Host-025 (Low): the Host directly consumes KpiHistory (`AddKpiHistory`,
`KpiHistoryRecorderActor`, `KpiHistoryOptions`) but has no direct `ProjectReference`;
it relies on a transitive pull through CentralUI. Host-026 (Low): the Component
Registration Matrix in `Component-Host.md` is stale — KpiHistory and Transport are
registered central-only in `Program.cs` but absent from the matrix.
## Checklist coverage
| # | Category | Examined | Notes |
@@ -110,6 +137,21 @@ _Re-review (2026-05-28, `1eb6e97`):_
| 9 | Testing coverage | ☑ | Strong existing suite. No coverage for the Site `CentralContactPoints` second-entry rule (Host-016), the site-shutdown ordering (Host-017), the `NodeName`-absent shipped config (Host-018), the unused `CancellationToken` parameter (Host-019), the `MinimumLevel.Is` override semantics (Host-020) or the `ParseLevel` silent fallback (Host-022). |
| 10 | Documentation & comments | ☑ | Re-review: layered `MinimumLevel.Is` / `ReadFrom.Configuration` semantics are not surfaced — an operator-set `Serilog:MinimumLevel` is silently overridden by `ScadaBridge:Logging:MinimumLevel` (Host-020); `ParseLevel` silently coerces a misspelled level to `Information` with no warning (Host-022). |
_Re-review (2026-06-20, `4307c381`):_
| # | Category | Examined | Notes |
|---|----------|----------|-------|
| 1 | Correctness & logic bugs | ☑ | Re-review: shipped `appsettings.Site.json` second seed-node targets MetricsPort 8084 (HTTP/1.1 Kestrel listener), not an Akka remoting endpoint — regression of the resolved Host-004 fix (Host-023). |
| 2 | Akka.NET conventions | ☑ | All five central singletons + the four site singletons reviewed: ClusterSingletonManager/Proxy pairing, role scoping (site singletons `WithRole(site-*)`, central on base "Central"), `PhaseClusterLeave` graceful-stop drain tasks, receptionist registrations, `RequiredSingletonsHealthCheck` `Identify` probe — all sound. No new issues. |
| 3 | Concurrency & thread safety | ☑ | `GetOrCreateActorSystem` double-checked lock (HOST-021) is correct; `_trackedDisposables` snapshot-under-lock; `RequiredSingletonsHealthCheck` probes concurrently and never throws. No new issues. |
| 4 | Error handling & resilience | ☑ | `StartupRetry` now threads `ApplicationStopping` + transient-only classifier (Host-019/015 resolved); migration retry sound. No new issues. |
| 5 | Security | ☑ | Shipped configs keep `${...}` secret placeholders; pepper fail-fast validated (≥16 chars). Note: the docker per-node `appsettings.Central.json` files still carry plaintext SQL passwords / JWT key; that is outside this module's edit scope (Host-003 follow-up) and is not re-filed. |
| 6 | Performance & resource management | ☑ | ActorSystem singleton-not-transient avoids per-probe disposal; gRPC stream cancel-on-shutdown wired. No new issues. |
| 7 | Design-document adherence | ☑ | Re-review: Component Registration Matrix omits KpiHistory + Transport (both central-only in Program.cs) (Host-026); `MachineDataDb` required-but-unconsumed and absent from shipped Central config (Host-024). |
| 8 | Code organization & conventions | ☑ | Re-review: Host directly consumes KpiHistory types but has no direct `ProjectReference` (transitive via CentralUI) (Host-025). |
| 9 | Testing coverage | ☑ | Strong suite (RequiredSingletons, Hocon, Serilog, StartupRetry covered). Gaps: no test asserts a seed on the MetricsPort is rejected (Host-023 — the existing 8084-seed test only avoids the grpc rule), and no shipped-config test catches the missing `MachineDataDb` key (Host-024). |
| 10 | Documentation & comments | ☑ | Re-review: `Component-Host.md` matrix stale vs `Program.cs` (Host-026). Inline XML/comments otherwise accurate and unusually thorough. |
## Findings
### Host-001 — `/health/ready` includes the leader-only `active-node` check
@@ -1118,3 +1160,163 @@ behaviour you choose.
**Resolution**
_Open._
### Host-023 — Shipped Site `SeedNodes` second entry targets the MetricsPort, not a remoting port
| | |
|--|--|
| Severity | Medium |
| Category | Correctness & logic bugs |
| Status | Deferred |
| Location | `src/ZB.MOM.WW.ScadaBridge.Host/appsettings.Site.json:13-17`, `src/ZB.MOM.WW.ScadaBridge.Host/StartupValidator.cs:117-127` |
**Description**
The shipped site config sets `Node:RemotingPort = 8082`, `Node:GrpcPort = 8083`,
`Node:MetricsPort = 8084`, but `Cluster:SeedNodes` is
`["akka.tcp://scadabridge@localhost:8082", "akka.tcp://scadabridge@localhost:8084"]`.
The second seed targets `8084` — which `Program.cs:419-422` binds as the **HTTP/1.1
Kestrel `/metrics` listener** (`HttpProtocols.Http1AndHttp2`), not an Akka.Remote
endpoint. A node joining via that seed attempts an Akka.Remote TCP association against
the Prometheus scrape listener and fails. This is the same defect class as the
**resolved Host-004** (where the second seed targeted the gRPC port 8083): Host-004's
fix corrected the seed to `8084` and called it a "remoting port", but the later
introduction of `MetricsPort` (commits `bbc9f092`/`c41cb41c`) claimed `8084`, silently
re-breaking the example. `StartupValidator` (`StartupValidator.cs:121-127`) was
extended by Host-004 to reject a seed whose port equals `GrpcPort`, but it does **not**
reject a seed whose port equals `MetricsPort`, so this misconfiguration passes
validation silently. For the single-node dev loopback the first seed (`8082`) succeeds
and the bug is masked, but it is an incorrect example that can be copied into multi-node
configs (the docker site configs correctly use distinct hostnames, both on `8082`).
**Recommendation**
The loopback dev site cannot host two distinct remoting endpoints, so drop the
second seed (mirror Host-016's single-entry `CentralContactPoints` template) or use
distinct hostname placeholders, both on `8082`, with a comment that multi-node sites
list each node's *remoting* port. Extend the Site block of `StartupValidator` to also
reject any seed whose port equals the resolved `MetricsPort` (and ideally any port
the node binds for non-remoting use). Add a `StartupValidatorTests` case
`Site_SeedNodeOnMetricsPort_FailsValidation` mirroring
`Site_SeedNodeOnGrpcPort_FailsValidation`.
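
A minimal sketch of the proposed rule, assuming the validator can see the resolved `GrpcPort` and `MetricsPort` alongside the seed list. `SeedPort` and `ValidateSeedPorts` are hypothetical names used only for illustration, not the actual `StartupValidator` API, and the port parsing is deliberately simplistic.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: extracts the port from an Akka address such as
// "akka.tcp://scadabridge@localhost:8082". Returns null when the text after
// the last ':' is not a number (for example, when no port is given).
static int? SeedPort(string seed)
{
    var idx = seed.LastIndexOf(':');
    if (idx < 0 || idx == seed.Length - 1) return null;
    return int.TryParse(seed[(idx + 1)..], out var parsed) ? (int?)parsed : null;
}

// Hypothetical rule: one error per seed whose port collides with a port the
// node binds for non-remoting use (gRPC or metrics).
static List<string> ValidateSeedPorts(IEnumerable<string> seedNodes, int grpcPort, int metricsPort)
{
    var found = new List<string>();
    foreach (var seed in seedNodes)
    {
        var seedPort = SeedPort(seed);
        if (seedPort == grpcPort)
            found.Add($"Seed '{seed}' targets GrpcPort {grpcPort}, not a remoting port");
        else if (seedPort == metricsPort)
            found.Add($"Seed '{seed}' targets MetricsPort {metricsPort}, not a remoting port");
    }
    return found;
}

// Values taken from the finding: the shipped Site seeds and port assignments.
var shippedSeeds = new[]
{
    "akka.tcp://scadabridge@localhost:8082",
    "akka.tcp://scadabridge@localhost:8084",
};
var problems = ValidateSeedPorts(shippedSeeds, grpcPort: 8083, metricsPort: 8084);
Console.WriteLine(string.Join(Environment.NewLine, problems));
```

With the shipped values this flags only the second seed. The same inputs could back the proposed `Site_SeedNodeOnMetricsPort_FailsValidation` case.
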
**Resolution**
Deferred 2026-06-20: the shipped `appsettings.Site.json` second seed-node points at the MetricsPort and the validator lacks a `seedPort != metricsPort` rule; fixing touches shipped dev config and a single-node-loopback assumption. Recorded for a follow-up (no production node ships this dev config).
### Host-024 — `MachineDataDb` re-required for Central but consumed by nothing and absent from the shipped config
| | |
|--|--|
| Severity | Medium |
| Category | Design-document adherence |
| Status | Deferred |
| Location | `src/ZB.MOM.WW.ScadaBridge.Host/StartupValidator.cs:63-65`, `src/ZB.MOM.WW.ScadaBridge.Host/appsettings.Central.json:22-24` |
**Description**
Commit `76198b36` ("reverts Host-008") re-added a fail-fast `.Require` for
`ScadaBridge:Database:MachineDataDb` to the Central `.When` block, citing
REQ-HOST-3/REQ-HOST-4 and that the docker per-node configs carry the key. Two problems
remain after the reversal:
1. **Still consumed by nothing.** A repo-wide search shows `MachineDataDb` is read only
in `StartupValidator.cs` and declared on `DatabaseOptions.cs:11` — no DbContext, no
repository, no component reads it. Only `ConfigurationDb` is wired into
`AddConfigurationDatabase` (`Program.cs:200-202`). The original Host-008 rationale
("dead configuration that fails startup for a value nothing uses") is therefore
still entirely true; the reversal re-instated the fail-fast without adding a
consumer.
2. **Absent from the shipped Central config.** `appsettings.Central.json` (the template
a developer running the binary directly uses) contains `Database:ConfigurationDb`
(a `${...}` placeholder) but **no `MachineDataDb` key at all**. So a Central node
started from the shipped config fails `StartupValidator` immediately with
"ScadaBridge:Database:MachineDataDb connection string required for Central" — for a
value nothing consumes. The docker configs happen to set it; the shipped default
does not. (This is *not* a re-file of Host-008, which was Resolved-then-reverted;
the new, distinct defect is the validation breaking the shipped dev config.)
This is also a code-vs-doc inconsistency: the design intent (validate a value the
system needs) is unmet because the value is needed by nothing.
**Recommendation**
Pick one and make code, shipped config, and doc agree: either (a) wire a machine-data
store so the value is actually consumed and add a `MachineDataDb` placeholder to the
shipped `appsettings.Central.json` (matching the `_secrets` env-var pattern), or (b)
remove the `.Require`, the `DatabaseOptions.MachineDataDb` property, and the docker
keys again. Add a config-shape regression test asserting the shipped
`appsettings.Central.json` satisfies `StartupValidator` for the Central role (the
absence of such a test is why this slipped through).
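
The config-shape regression test could look roughly like the sketch below. It assumes the required keys can be listed independently of `StartupValidator`; `requiredCentralKeys`, `HasNonEmptyValue`, and the inline JSON are hypothetical stand-ins. The JSON only mirrors what this finding says about the shipped file (a `ConfigurationDb` placeholder, no `MachineDataDb` key), and its nesting is an assumption. A real test would load `appsettings.Central.json` from disk and run the actual validator.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical stand-in for the Central-role requirements enforced by StartupValidator.
string[] requiredCentralKeys =
{
    "ScadaBridge:Database:ConfigurationDb",
    "ScadaBridge:Database:MachineDataDb",
};

// Illustrative config only; the "${...}" placeholder is elided exactly as in the finding.
var illustrativeCentralJson =
    @"{ ""ScadaBridge"": { ""Database"": { ""ConfigurationDb"": ""${...}"" } } }";

// Walks a colon-delimited configuration key through the JSON tree and reports
// whether it resolves to a non-empty string value.
static bool HasNonEmptyValue(JsonElement root, string key)
{
    var current = root;
    foreach (var segment in key.Split(':'))
    {
        if (current.ValueKind != JsonValueKind.Object ||
            !current.TryGetProperty(segment, out var next))
            return false;
        current = next;
    }
    return current.ValueKind == JsonValueKind.String &&
           !string.IsNullOrWhiteSpace(current.GetString());
}

using var doc = JsonDocument.Parse(illustrativeCentralJson);
var missing = new List<string>();
foreach (var required in requiredCentralKeys)
{
    if (!HasNonEmptyValue(doc.RootElement, required))
        missing.Add(required);
}

// A regression test would assert that this list is empty for the shipped file.
Console.WriteLine(missing.Count == 0 ? "ok" : "missing: " + string.Join(", ", missing));
```

Against the illustrative JSON the check reports `ScadaBridge:Database:MachineDataDb` as missing, which is the failure a developer hits at startup. Under either option (a) or (b) the list would be empty.
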
**Resolution**
Deferred 2026-06-20: `MachineDataDb` is re-required for Central but consumed by nothing and absent from the shipped Central config (a Central node started from the shipped config fails `StartupValidator`). This reverses the resolution of Host-008 and needs a product decision: either a machine-data store is actually coming (add a config placeholder) or the requirement should be removed again. Recorded for that decision.
### Host-025 — Host directly consumes KpiHistory but has no direct project reference
| | |
|--|--|
| Severity | Low |
| Category | Code organization & conventions |
| Status | Resolved |
| Location | `src/ZB.MOM.WW.ScadaBridge.Host/ZB.MOM.WW.ScadaBridge.Host.csproj:42-63`, `src/ZB.MOM.WW.ScadaBridge.Host/Program.cs:118`, `src/ZB.MOM.WW.ScadaBridge.Host/Actors/AkkaHostedService.cs:717-757` |
**Description**
`Program.cs` calls `builder.Services.AddKpiHistory(builder.Configuration)` and
`AkkaHostedService.RegisterCentralActors` constructs the `KpiHistoryRecorderActor`
cluster singleton from `KpiHistoryOptions` / `KpiHistoryRecorderActor`
(`ZB.MOM.WW.ScadaBridge.KpiHistory`), yet the Host `.csproj` carries **no
`ProjectReference` to `ZB.MOM.WW.ScadaBridge.KpiHistory`**. It compiles only because
CentralUI (`ZB.MOM.WW.ScadaBridge.CentralUI.csproj`) transitively re-exports it. The
Host is the composition root for every component and the project deliberately declares
direct dependencies elsewhere — the csproj even comments that `ZB.MOM.WW.Theme` is
declared directly "rather than leaning on CentralUI re-exporting it transitively"
(lines 37-39). KpiHistory is the same situation but was not given the same treatment.
If CentralUI ever drops or restructures its KpiHistory reference, the Host fails to
compile for a dependency it directly uses — a latent build fragility.
**Recommendation**
Add `<ProjectReference Include="../ZB.MOM.WW.ScadaBridge.KpiHistory/ZB.MOM.WW.ScadaBridge.KpiHistory.csproj" />`
to the Host csproj's `ProjectReference` group, mirroring the explicit Theme reference
rationale already documented there.
**Resolution**
Resolved 2026-06-20 (commit `fd618cf1`): added an explicit `<ProjectReference>` to the KpiHistory project in the Host csproj, so Host's direct use of KpiHistory types no longer relies on the transitive pull through CentralUI.
### Host-026 — Component Registration Matrix omits KpiHistory and Transport
| | |
|--|--|
| Severity | Low |
| Category | Documentation & comments |
| Status | Resolved |
| Location | `docs/requirements/Component-Host.md` (Component Registration Matrix, lines 177-198), `src/ZB.MOM.WW.ScadaBridge.Host/Program.cs:92,118` |
**Description**
`Program.cs` registers two central-only components that have shipped since the matrix
was last updated: **Transport** (`builder.Services.AddTransport()`, line 92, #24) and
**KpiHistory** (`builder.Services.AddKpiHistory(...)`, line 118, plus the
`kpi-history-recorder` cluster singleton in `AkkaHostedService`, #26). Neither appears
in the Component Registration Matrix in `Component-Host.md` (lines 177-198), which is
the documented source of which components register on which role. The matrix is
described in CLAUDE.md as something that "must stay in sync with actual component
documents"; it is now stale against the actual composition root. A maintainer auditing
role-based registration against the doc would miss both components (and miss that
KpiHistory's recorder is deliberately *not* readiness-gated — worth a matrix note).
**Recommendation**
Add `Transport` (Central: Yes / Site: No / DI: Yes / Actors: No / Endpoints: No) and
`KpiHistory` (Central: Yes / Site: No / DI: Yes / Actors: Yes / Endpoints: No) rows to
the matrix, and note KpiHistory's recorder singleton is intentionally absent from
`RequiredSingletonsHealthCheck` (not readiness-gated). Re-check the Dependencies list
("All 19 component libraries") against the actual csproj reference count while editing.
**Resolution**
Resolved 2026-06-20 (commit `fd618cf1`): added `KpiHistory` and `Transport` rows to the Component Registration Matrix in Component-Host.md (both Central=Yes/Site=No/DI=Yes), reflecting their actual central-only registration.