docs(code-review): full review at 4307c381 — 18 modules, 67 findings recorded + remediation tracked

Full per-module re-review of the 16 stale modules (last seen 1eb6e97 / 2026-05-28)
plus first-ever reviews of KpiHistory (#26) and ScriptAnalysis (#25), at HEAD 4307c381.

67 new findings (0 Critical, 6 High, 27 Medium, 34 Low). Remediation in commit
fd618cf1 closed 5 of the 6 Highs and ~33 Medium/Low; the rest are Deferred/Won't Fix
with rationale. Remaining pending (4) are all InboundAPI's Database-helper findings
(IA-026 High .. IA-029), left to the active feat/ipsen-movein effort per owner decision.

Highlights: caught a central-only-delivery security drift (SMTP creds broadcast to
sites — DM-025/SR-031), a never-committed 'Resolved' fix (SiteEventLogging-016 → -024),
an unguarded KPI recorder tick (KH-001), a trust-analyzer fallback weakening (SA-001),
and a native-alarm subscribe-path leak (DCL-023). ScriptAnalysis verdict: trust boundary
is semantically sound (symbol-based) in the production cluster config.

README regenerated; regen-readme.py --check passes (4 pending / 567 total).
Joseph Doherty
2026-06-20 18:02:32 -04:00
parent fd618cf1dc
commit d39089f4ed
19 changed files with 4031 additions and 69 deletions
+222 -3
@@ -5,10 +5,10 @@
| Module | `src/ZB.MOM.WW.ScadaBridge.Security` |
| Design doc | `docs/requirements/Component-Security.md` |
| Status | Reviewed |
-| Last reviewed | 2026-05-28 |
+| Last reviewed | 2026-06-20 |
| Reviewer | claude-agent |
-| Commit reviewed | `1eb6e97` |
-| Open findings | 0 (1 deferred — Security-008) |
+| Commit reviewed | `4307c381` |
+| Open findings | 0 (2 deferred — Security-008, Security-022) |
## Summary
@@ -78,6 +78,50 @@ values silently surface only on first login (Security-020); and the
`RequireHttpsCookie=false` dev opt-out emits no warning, so an HTTP production
deployment silently transmits the JWT bearer credential in cleartext (Security-021).
#### Re-review 2026-06-20 (commit `4307c381`) — full review
Major architectural change since `1eb6e97`: the project was renamed (ScadaLink →
ZB.MOM.WW.ScadaBridge) and the bespoke `LdapAuthService`/`LdapTransport`/`LdapAuthResult`
were **cut over to the external `ZB.MOM.WW.Auth.Ldap` library** (commit `ac34dac4`); the
prior `SiteScopeAuthorizationHandler` was deleted (Security-017 resolution), and the role
model was canonicalised + collapsed (`Admin→Administrator`, `Design→Designer`,
`Deployment→Deployer`, `Audit→Administrator`, `AuditReadOnly→Viewer`; commit `b104760b`)
with M7 `Operator`/`Verifier` added (`a0ce8b6c`). The interactive cookie session now signs
in with **bare claims** via the shared `SessionClaimBuilder` and is policed by
`CookieSessionValidator` (idle-timeout + LDAP-free DB role-refresh, fail-closed, well
tested). All prior fixes that survived the cutover remain in place (key-length guard,
issuer/audience binding, `RoleMapper` union semantics, `Roles` constants, cookie hardening
via `ZbCookieDefaults`, the `RequireHttpsCookie` warning). Security-008 (N+1 scope-rule
query) is still correctly **Deferred**. This pass surfaced **4 new findings**
(Security-022..025): three Medium and one Low. The most consequential is **Security-022**:
the documented "cookie + JWT hybrid" no longer holds at HEAD, because `/auth/token` still *mints* a
bearer JWT but **no code anywhere validates it** (CLI uses HTTP Basic Auth, Inbound API uses
the `sbk_` ApiKeys verifier, the cookie path uses bare claims), so `JwtTokenService`'s entire
`ValidateToken`/`RefreshToken`/`RecordActivity`/`ShouldRefresh`/`IsIdleTimedOut` surface is
dead code minting an orphaned credential. **Security-023**: `LdapGroupMapping.Role` accepts
arbitrary free strings via CLI/API with no canonical-set validation, and the Central UI
`RequireClaim` policies are case-sensitive while the ManagementActor authz check is
case-insensitive, so a mis-cased/typo'd mapping silently grants inconsistent access across
the two surfaces with no operator feedback. **Security-024** — the M7 `RequireOperator`/
`RequireVerifier` SoD policies have no functional `AuthorizeAsync` grant/deny test (only
constant-name assertions), unlike every other policy. **Security-025** (Low) — a unit test
makes a live network connect to `nonexistent.invalid:9999`.
_Re-review (2026-06-20, `4307c381`):_
| # | Category | Examined | Notes |
|---|----------|----------|-------|
| 1 | Correctness & logic bugs | ☑ | `RoleMapper` union semantics (Security-016 fix) hold and read correctly. `LdapGroupMapping.Role` is an unvalidated free string + case-sensitive UI policy vs case-insensitive ManagementActor authz (Security-023). |
| 2 | Akka.NET conventions | ☑ | No actors. `AddSecurityActors` is still a placeholder. No issues. |
| 3 | Concurrency & thread safety | ☑ | Services stateless/DI-scoped; `CookieSessionValidator` reads injected `TimeProvider`. LDAP now in the external library. No shared mutable state. No issues. |
| 4 | Error handling & resilience | ☑ | `CookieSessionValidator` fails closed on idle (missing anchor = timed out) and swallows refresh faults keeping current roles (mirrors "LDAP failure: active sessions continue") — correct and well tested. `LdapAuthFailureMessages` maps the library's structured failure enum to stable user text. No issues. |
| 5 | Security | ☑ | Cookie hardening (HttpOnly/SameSite=Strict/Secure-when-required/sliding) comes from `ZbCookieDefaults` — verified correct. JWT mint binds issuer/audience, pins `MapInbound/OutboundClaims=false`, fails fast on short key. BUT the minted bearer JWT is validated by no consumer (Security-022). DisableLogin auto-login is gated at registration + loud warning + doc'd (acceptable). |
| 6 | Performance & resource management | ☑ | Security-008 N+1 remains correctly Deferred (gated on `ISecurityRepository`). `CookieSessionValidator` deliberately avoids re-minting on the no-refresh path. No new perf issues. |
| 7 | Design-document adherence | ☑ | The cookie+JWT-hybrid design is now cookie-with-bare-claims; the JWT half is orphaned/unvalidated (Security-022) — the M2.19 doc note acknowledges bare-claim cookies but the doc still presents JWT-in-cookie as the model and does not flag that `/auth/token`'s JWT is unconsumed. Operator/Verifier policies match the design. |
| 8 | Code organization & conventions | ☑ | `Roles`/`SecurityOptions` owned by the component; repository interface in Commons; additive `IGroupRoleMapper<string>` adapter. Dead JWT methods (Security-022) are the main org smell. |
| 9 | Testing coverage | ☑ | `CookieSessionValidator`, `SessionClaimBuilder`, `RoleMapper` union, cookie wiring, `SecurityOptionsValidator`, DisableLogin are well covered. Gaps: no functional authz test for `RequireOperator`/`RequireVerifier` (Security-024); a unit test does live network I/O (Security-025); no test for a non-canonical/mis-cased mapping role (Security-023). |
| 10 | Documentation & comments | ☑ | XML docs are thorough and current. `SecurityOptions` and `JwtTokenService` docs still describe an active JWT-refresh lifecycle that no longer has a caller (Security-022). |
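
The fail-closed behaviour noted in row 4 can be modelled in a few lines. This is a language-neutral sketch in Python, not the project's C# `CookieSessionValidator`; the function names, the 30-minute timeout, and the `>=` boundary are all assumptions made for illustration.

```python
from datetime import datetime, timedelta, timezone
from typing import Callable, Iterable, List, Optional

# Assumed value for illustration only; the real timeout is not stated in the review.
IDLE_TIMEOUT = timedelta(minutes=30)


def is_idle_timed_out(last_activity: Optional[datetime],
                      now: datetime,
                      idle_timeout: timedelta = IDLE_TIMEOUT) -> bool:
    """Fail closed: a session with no activity anchor counts as timed out."""
    if last_activity is None:
        return True
    return now - last_activity >= idle_timeout


def refresh_roles(current_roles: Iterable[str],
                  load_roles: Callable[[], Iterable[str]]) -> List[str]:
    """Return freshly loaded roles, or keep the current ones if the lookup faults."""
    try:
        return list(load_roles())
    except Exception:
        # Mirrors "active sessions continue" when the role source is unavailable.
        return list(current_roles)
```

The two halves deliberately fail in opposite directions: a missing anchor ends the session, while a failed refresh keeps the roles the session already holds.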
## Checklist coverage
| # | Category | Examined | Notes |
@@ -1008,3 +1052,178 @@ SecurityOptions:RequireHttpsCookie=true in production.")`. Optionally, also fail
startup when `RequireHttpsCookie=false` AND `ASPNETCORE_ENVIRONMENT=Production`. Add a
regression test that asserts the warning is emitted when the flag is disabled and not
when it is enabled.
### Security-022 — `/auth/token` mints a bearer JWT that no consumer validates; the entire `JwtTokenService` validate/refresh surface is dead code
| | |
|--|--|
| Severity | Medium |
| Category | Design-document adherence |
| Status | Deferred |
| Location | `src/ZB.MOM.WW.ScadaBridge.Security/JwtTokenService.cs:149`, `:194`, `:209`, `:239`, `:272`; `src/ZB.MOM.WW.ScadaBridge.CentralUI/Auth/AuthEndpoints.cs:99-148` |
**Description**
The design (Component-Security.md "Cookie + JWT Hybrid", CLAUDE.md "cookie+JWT hybrid …
HMAC-SHA256, 15-minute expiry w/ sliding refresh") presents a cookie-embedded JWT as the
session model. At HEAD this is no longer how the interactive session works (the M2.19 note
acknowledges the cookie now carries **bare claims** policed by `CookieSessionValidator`),
and — more importantly — the JWT half is now an **orphaned, unconsumed credential**. A grep
across `src/` shows `JwtTokenService.ValidateToken`, `RefreshToken`, `RecordActivity`, and
`ShouldRefresh` have **zero production callers**; only `GenerateToken` is called, solely by
`POST /auth/token` (`AuthEndpoints.cs:134`). That endpoint returns an `access_token` to the
caller, but **no endpoint in the codebase accepts or validates that JWT** as a credential:
the CLI authenticates to the Management API with HTTP Basic Auth
(`CLI/ManagementHttpClient.cs:36`), the Inbound API authenticates with the external
`ZB.MOM.WW.Auth.ApiKeys` `sbk_<keyId>_<secret>` verifier, and the cookie pipeline uses
`SessionClaimBuilder` bare claims (never `ValidateToken`). The result: `/auth/token` issues
HMAC-signed bearer tokens that grant access to nothing, and ~120 lines of carefully-hardened
issuer/audience/clock-skew/idle/refresh logic (the subject of resolved findings
Security-006/007/014) are dead code. The hazard is twofold: (a) the design doc materially
overstates the live session mechanism, misleading any future reader/auditor; and (b) an
orphaned token mint is a latent footgun — a future endpoint added under the assumption that
`/auth/token` produces a *usable* credential would wire a second, parallel auth path with
no test coverage proving it actually validates.
**Recommendation**
Decide the intent explicitly. Either (a) **wire the bearer path**: add
`AddAuthentication().AddJwtBearer(...)` (or a custom handler delegating to
`JwtTokenService.ValidateToken`) on whatever surface is meant to accept the token, and add
an end-to-end test that a `/auth/token`-issued JWT authorizes a protected request — then the
hybrid model the doc describes is real; or (b) **remove the dead code**: delete
`/auth/token` and `JwtTokenService.ValidateToken`/`RefreshToken`/`RecordActivity`/
`ShouldRefresh`/`IsIdleTimedOut` (keep only what the bare-claim cookie path needs, which is
none of `JwtTokenService`), and rewrite Component-Security.md's "Cookie + JWT Hybrid" /
"Token Lifecycle" sections to describe the actual bare-claim cookie + `CookieSessionValidator`
mechanism as the primary model rather than a footnote. Whichever is chosen, the design doc
and the code must agree on whether a usable JWT exists.
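
To make option (a) concrete, the sketch below shows the round trip that the end-to-end test would need to prove: a minted HS256 token is accepted only when the signature, issuer, audience, and expiry all check out. It is a stdlib-only Python model, not the project's `JwtTokenService`; the function names and claim layout are hypothetical, and only HMAC-SHA256 and the 15-minute expiry are taken from the design text quoted above.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def _b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def mint_token(key: bytes, issuer: str, audience: str, subject: str,
               ttl_seconds: int = 900, now: Optional[int] = None) -> str:
    """Mint an HS256 JWT (hypothetical stand-in for the /auth/token mint)."""
    now = int(time.time()) if now is None else now
    header = {"alg": "HS256", "typ": "JWT"}
    payload = {"iss": issuer, "aud": audience, "sub": subject,
               "exp": now + ttl_seconds}
    signing_input = (_b64url(json.dumps(header).encode("utf-8")) + "." +
                     _b64url(json.dumps(payload).encode("utf-8")))
    sig = hmac.new(key, signing_input.encode("ascii"), hashlib.sha256).digest()
    return signing_input + "." + _b64url(sig)


def validate_token(token: str, key: bytes, issuer: str, audience: str,
                   now: Optional[int] = None) -> Optional[dict]:
    """Return the claims if the token is valid, otherwise None (fail closed)."""
    now = int(time.time()) if now is None else now
    try:
        header_b64, payload_b64, sig_b64 = token.split(".")
        header = json.loads(_b64url_decode(header_b64))
        claims = json.loads(_b64url_decode(payload_b64))
        signature = _b64url_decode(sig_b64)
    except (ValueError, TypeError):
        return None
    if not isinstance(header, dict) or not isinstance(claims, dict):
        return None
    if header.get("alg") != "HS256":
        return None  # never accept "none" or an unexpected algorithm
    expected = hmac.new(key, f"{header_b64}.{payload_b64}".encode("ascii"),
                        hashlib.sha256).digest()
    if not hmac.compare_digest(expected, signature):
        return None
    if claims.get("iss") != issuer or claims.get("aud") != audience:
        return None
    exp = claims.get("exp")
    if not isinstance(exp, int) or now >= exp:
        return None
    return claims
```

Whatever form the real handler takes, the property worth testing is the same: a token from the mint is accepted by some consumer, and a token with the wrong key, audience, or an elapsed expiry is rejected.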
**Resolution**
Deferred 2026-06-20: this needs a product decision — either wire a JWT-bearer auth handler so the documented cookie+JWT hybrid is real, or delete `/auth/token` + the orphaned `JwtTokenService` validate/refresh surface and rewrite the design doc to describe the actual bare-claim cookie model. Recorded for that decision.
### Security-023 — `LdapGroupMapping.Role` is an unvalidated free string; case-sensitive UI policy vs case-insensitive server authz grants inconsistent access
| | |
|--|--|
| Severity | Medium |
| Category | Correctness & logic bugs |
| Status | Resolved |
| Location | `src/ZB.MOM.WW.ScadaBridge.Security/AuthorizationPolicies.cs:144-157`; `src/ZB.MOM.WW.ScadaBridge.Security/RoleMapper.cs:40-42`; `src/ZB.MOM.WW.ScadaBridge.Commons/Entities/Security/LdapGroupMapping.cs:10`; `src/ZB.MOM.WW.ScadaBridge.ManagementService/ManagementActor.cs:106`, `:1893` |
**Description**
The role string an operator maps an LDAP group to (`LdapGroupMapping.Role`) is a free
`string` with no validation against the canonical `Roles.*` set anywhere on the write path:
`ManagementActor` (`:1893`) does `new LdapGroupMapping(cmd.LdapGroupName, cmd.Role)` verbatim,
and the CLI `security role-mapping create --role <value>`
(`CLI/Commands/SecurityCommands.cs:157`) accepts any string. The Central UI dropdown
(`LdapMappingForm.razor`) constrains the *UI* path to the canonical constants, but the CLI/API
path does not. Two enforcement surfaces then disagree on casing:
- **Central UI authorization** uses `policy.RequireClaim(JwtTokenService.RoleClaimType, Roles.Deployer)`
(`AuthorizationPolicies.cs:150`). ASP.NET Core's `RequireClaim` compares claim *values*
with ordinal (case-**sensitive**) equality.
- **ManagementActor authorization** uses `user.Roles.Contains(requiredRole, StringComparer.OrdinalIgnoreCase)`
(`ManagementActor.cs:106`) — case-**insensitive**.
- **`RoleMapper`** matches the Deployer scope branch with `StringComparison.OrdinalIgnoreCase`
(`RoleMapper.cs:42`) but stamps the **DB row's verbatim casing** into the role claim.
So a mapping created via CLI/API as `--role deployer` (lowercase) produces a principal whose
role claim is `"deployer"`: the **ManagementActor/CLI surface authorizes it** (case-insensitive),
the **Central UI `RequireDeployment` policy denies it** (case-sensitive), and `RoleMapper` still
resolves site scope (matching case-insensitively). The result is a user who can deploy via CLI
but is locked out of the equivalent UI pages, with no error explaining why. A pure typo
(`"Deploer"`) silently grants a role claim that *no* policy and no ManagementActor check
matches: the user appears to hold a role yet is denied everywhere, with no validation feedback
at mapping-creation time.
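
The disagreement between the two surfaces is easy to reproduce in miniature. The sketch below models the comparisons in Python rather than C#; the function names are hypothetical, and `upper()` only approximates .NET's `OrdinalIgnoreCase`, which is sufficient for ASCII role names.

```python
from typing import Iterable

REQUIRED_ROLE = "Deployer"  # canonical casing, per the role model described above


def ui_policy_allows(role_claims: Iterable[str], required: str = REQUIRED_ROLE) -> bool:
    """Models RequireClaim: ordinal, case-sensitive match on the claim value."""
    return any(claim == required for claim in role_claims)


def management_actor_allows(role_claims: Iterable[str], required: str = REQUIRED_ROLE) -> bool:
    """Models Contains(..., OrdinalIgnoreCase): case-insensitive match."""
    return any(claim.upper() == required.upper() for claim in role_claims)


def describe(role_claim: str) -> str:
    """Summarise how each surface treats a principal holding one role claim."""
    ui = ui_policy_allows([role_claim])
    actor = management_actor_allows([role_claim])
    return f"{role_claim!r}: UI={'allow' if ui else 'deny'}, ManagementActor={'allow' if actor else 'deny'}"
```

Under these assumptions, `describe("deployer")` reports a deny from the UI and an allow from the ManagementActor, while `describe("Deploer")` reports a deny from both.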
**Recommendation**
Validate `cmd.Role` against the canonical `Roles.All` set (case-insensitively, normalising to
the canonical casing) when creating/updating a mapping — in `ManagementActor` (the single
server-side write path) and reject non-canonical values with a clear error; optionally also
guard in the CLI for a fast local failure. Separately, make the two authorization surfaces
agree on casing: either normalise the role claim to canonical casing in `SessionClaimBuilder`/
`RoleMapper` before it is stamped, or make the ManagementActor check case-sensitive to match
`RequireClaim` (the safer direction once values are validated). Add a regression test for a
mis-cased mapping role.
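
A minimal sketch of the write-path check, in Python for brevity: it accepts a role case-insensitively, returns the canonical casing, and rejects anything else before a row is stored. The contents of `ROLES_ALL` are an assumption assembled from the role names mentioned in this review, not a copy of the project's `Roles.All`, and `canonicalise_role` is a hypothetical name.

```python
from typing import Tuple

# Assumed canonical set; the real Roles.All may differ.
ROLES_ALL: Tuple[str, ...] = (
    "Administrator", "Designer", "Deployer", "Viewer", "Operator", "Verifier",
)

_CANONICAL_BY_UPPER = {role.upper(): role for role in ROLES_ALL}


def canonicalise_role(value: str) -> str:
    """Return the canonical casing for a role, or raise ValueError if unknown."""
    try:
        return _CANONICAL_BY_UPPER[value.upper()]
    except KeyError:
        allowed = ", ".join(ROLES_ALL)
        raise ValueError(f"Unknown role {value!r}; expected one of: {allowed}") from None
```

Storing the returned value rather than the caller's input means every downstream comparison sees canonical casing, whichever comparison mode it uses.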
**Resolution**
Resolved 2026-06-20 (commit `fd618cf1`): added membership validation rejecting any `cmd.Role` not in `Roles.All` (canonical set, case-insensitive) at LDAP-group-mapping create AND update, before any DB write — closing the unvalidated-free-string gap. The casing/comparison-asymmetry half is a larger change and remains a recorded follow-up (membership check is safe: a non-canonical role never worked). No existing test regressed.
### Security-024 — `RequireOperator` / `RequireVerifier` policies have no functional authorization test
| | |
|--|--|
| Severity | Medium |
| Category | Testing coverage |
| Status | Resolved |
| Location | `src/ZB.MOM.WW.ScadaBridge.Security/AuthorizationPolicies.cs:153-157`; `tests/ZB.MOM.WW.ScadaBridge.Security.Tests/SecurityTests.cs:1045-1186` (AuthorizationPolicyTests); `tests/ZB.MOM.WW.ScadaBridge.Security.Tests/RolesTests.cs:44-45` |
**Description**
The M7 two-person Secured Write feature added the `RequireOperator` and `RequireVerifier`
policies (`AuthorizationPolicies.cs:153-157`), each a single-role `RequireClaim`. These gate a
SCADA control-surface write workflow whose entire safety argument is separation of duties, so
correct grant/deny behaviour is security-critical. But `AuthorizationPolicyTests` — which
exercises every *other* policy with real `IAuthorizationService.AuthorizeAsync` evaluations
(Admin/Design/Deployment/OperationalAudit/AuditExport, including the load-bearing
"Viewer reads but cannot export" SoD case) — has **no** Operator/Verifier test. The only
coverage is `RolesTests.cs:44-45`, which asserts the *constant string values*
(`"RequireOperator"`, `"RequireVerifier"`) — not that an `Operator` principal satisfies
`RequireOperator`, that a `Verifier` does not, or that the two policies are mutually distinct.
A regression that, say, mapped `RequireOperator` to `Roles.Verifier` (or to
`OperationalAuditRoles`) would compile and pass the existing tests while silently collapsing
the separation of duties.
**Recommendation**
Add `AuthorizeAsync`-based `[Theory]` cases to `AuthorizationPolicyTests` mirroring the
existing policy tests: `RequireOperator` succeeds for `Roles.Operator` and fails for
`Roles.Verifier`/`Administrator`/empty; `RequireVerifier` succeeds for `Roles.Verifier` and
fails for `Roles.Operator`; and a combined case asserting an Operator-only principal cannot
satisfy `RequireVerifier` (the SoD invariant at the policy layer).
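
The grant/deny matrix can be written down as data before it is ported to the test framework. The Python model below is illustrative only: the policy-to-role mapping and helper names are hypothetical, and the real tests would evaluate the registered policies through `IAuthorizationService.AuthorizeAsync`.

```python
from typing import Dict, List, Sequence, Tuple

# Assumed wiring: each policy requires exactly one role.
POLICIES: Dict[str, str] = {
    "RequireOperator": "Operator",
    "RequireVerifier": "Verifier",
}

# (policy, roles held by the principal, expected outcome)
CASES: List[Tuple[str, Sequence[str], bool]] = [
    ("RequireOperator", ["Operator"], True),
    ("RequireOperator", ["Verifier"], False),
    ("RequireOperator", ["Administrator"], False),
    ("RequireOperator", [], False),
    ("RequireVerifier", ["Verifier"], True),
    ("RequireVerifier", ["Operator"], False),  # SoD invariant at the policy layer
]


def authorize(policies: Dict[str, str], policy: str, roles: Sequence[str]) -> bool:
    """Single-role policy check: the required role must be among the held roles."""
    return policies[policy] in roles


def failing_cases(policies: Dict[str, str]) -> List[Tuple[str, Sequence[str]]]:
    """Return every case whose outcome differs from the expected one."""
    return [(policy, roles) for policy, roles, expected in CASES
            if authorize(policies, policy, roles) != expected]
```

Feeding the matrix a miswired mapping, such as `RequireOperator` pointing at `Verifier`, produces a non-empty failure list, which is exactly the regression the constant-name assertions cannot catch.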
**Resolution**
Resolved 2026-06-20 (commit `fd618cf1`): added functional `AuthorizeAsync` grant/deny tests for `RequireOperator`/`RequireVerifier` (and a combined SoD-distinctness test), guarding the separation-of-duties policy wiring against regression.
### Security-025 — Unit test performs a live network connection to `nonexistent.invalid:9999`
| | |
|--|--|
| Severity | Low |
| Category | Testing coverage |
| Status | Resolved |
| Location | `tests/ZB.MOM.WW.ScadaBridge.Security.Tests/SecurityTests.cs:71-89` (`AuthenticateAsync_ConnectionFailure_FailsClosed_NeverThrows`) |
**Description**
`LdapAuthServiceTests.AuthenticateAsync_ConnectionFailure_FailsClosed_NeverThrows` constructs
the external library's `LdapAuthService` pointed at `Server = "nonexistent.invalid"`,
`Port = 9999` with a 2-second `ConnectionTimeoutMs`, then asserts the result fails closed.
Although the test relies on the connection *failing*, it still performs a real DNS resolution
and TCP connect attempt from the unit-test process. In an offline or network-sandboxed CI
environment the resolution/connect can behave differently (immediate failure vs. the full
2-second timeout vs. a captive-portal redirect), making the test both **slow** (up to ~2s of
wall-clock dead time) and **environment-dependent**. Unit tests in this suite are otherwise
network-free (the sibling cases are explicitly named `_NoNetwork`). The behaviour under test
(fail-closed mapping of an unreachable directory) is really the external library's
responsibility and is, per the file's own header comment, already covered by the library's own
test suite through its internal connection seam.
**Recommendation**
Drop this case from the ScadaBridge unit suite (the library owns the fail-closed contract via
its internal `ILdapConnection` seam, as the region comment notes), or move it to an explicitly
opt-in integration-test category so it does not run in the default network-free unit pass. If
kept as a smoke test, point it at a loopback address on a closed port (e.g. `127.0.0.1`),
which is refused immediately and deterministically, rather than relying on DNS resolution
of `.invalid`.
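
The loopback variant can be illustrated with plain sockets. This Python sketch is not the project's test; the helper names are hypothetical, and reserving a port by binding and releasing it is a best-effort way to find one that is very likely closed.

```python
import socket


def reserve_closed_port() -> int:
    """Bind to an ephemeral loopback port, release it, and return its number.

    Best effort: nothing listens on the port afterwards, so a connect to it
    should be refused, although another process could in principle claim it.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind(("127.0.0.1", 0))
        return sock.getsockname()[1]


def connection_fails(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connect fails, False if it succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return False
    except OSError:
        return True
```

Because the target is loopback, the refusal comes from the local network stack without any DNS lookup, so the timeout acts only as a backstop.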
**Resolution**
Resolved 2026-06-20 (commit `fd618cf1`): the fail-closed connection test now targets `127.0.0.1` on a closed port instead of `nonexistent.invalid:9999` — deterministic loopback connection-refused, no external DNS/timeout dependency. Same assertion preserved.