docs(code-review): full review at 4307c381 — 18 modules, 67 findings recorded + remediation tracked

Full per-module re-review of the 16 stale modules (last seen 1eb6e97 / 2026-05-28) plus first-ever reviews of KpiHistory (#26) and ScriptAnalysis (#25), at HEAD 4307c381. 67 new findings (0 Critical, 6 High, 27 Medium, 34 Low). Remediation in commit fd618cf1 closed 5 of the 6 Highs and ~33 Medium/Low; the rest are Deferred/Won't Fix with rationale. Remaining pending (4) are all InboundAPI's Database-helper findings (IA-026 High .. IA-029), left to the active feat/ipsen-movein effort per owner decision. Highlights: caught a central-only-delivery security drift (SMTP creds broadcast to sites — DM-025/SR-031), a never-committed 'Resolved' fix (SiteEventLogging-016 → -024), an unguarded KPI recorder tick (KH-001), a trust-analyzer fallback weakening (SA-001), and a native-alarm subscribe-path leak (DCL-023). ScriptAnalysis verdict: trust boundary is semantically sound (symbol-based) in the production cluster config. README regenerated; regen-readme.py --check passes (4 pending / 567 total).
2026-06-20 18:02:32 -04:00
parent fd618cf1dc
commit d39089f4ed
19 changed files with 4031 additions and 69 deletions
@@ -5,10 +5,10 @@
 | Module | `src/ZB.MOM.WW.ScadaBridge.Security` |
 | Design doc | `docs/requirements/Component-Security.md` |
 | Status | Reviewed |
-| Last reviewed | 2026-05-28 |
+| Last reviewed | 2026-06-20 |
 | Reviewer | claude-agent |
-| Commit reviewed | `1eb6e97` |
-| Open findings | 0 (1 deferred — Security-008) |
+| Commit reviewed | `4307c381` |
+| Open findings | 0 (2 deferred — Security-008, Security-022) |

 ## Summary

@@ -78,6 +78,50 @@ values silently surface only on first login (Security-020); and the
 `RequireHttpsCookie=false` dev opt-out emits no warning, so an HTTP production
 deployment silently transmits the JWT bearer credential in cleartext (Security-021).

+#### Re-review 2026-06-20 (commit `4307c381`) — full review
+
+Major architectural change since `1eb6e97`: the project was renamed (ScadaLink →
+ZB.MOM.WW.ScadaBridge) and the bespoke `LdapAuthService`/`LdapTransport`/`LdapAuthResult`
+were **cut over to the external `ZB.MOM.WW.Auth.Ldap` library** (commit `ac34dac4`); the
+prior `SiteScopeAuthorizationHandler` was deleted (Security-017 resolution), and the role
+model was canonicalised + collapsed (`Admin→Administrator`, `Design→Designer`,
+`Deployment→Deployer`, `Audit→Administrator`, `AuditReadOnly→Viewer`; commit `b104760b`)
+with M7 `Operator`/`Verifier` added (`a0ce8b6c`). The interactive cookie session now signs
+in with **bare claims** via the shared `SessionClaimBuilder` and is policed by
+`CookieSessionValidator` (idle-timeout + LDAP-free DB role-refresh, fail-closed, well
+tested). All prior fixes that survived the cutover remain in place (key-length guard,
+issuer/audience binding, `RoleMapper` union semantics, `Roles` constants, cookie hardening
+via `ZbCookieDefaults`, the `RequireHttpsCookie` warning). Security-008 (N+1 scope-rule
+query) is still correctly **Deferred**. This pass surfaced **4 new findings**
+(Security-022..025): three Medium and one Low. The most consequential is **Security-022** —
+the documented "cookie + JWT hybrid" no longer holds at HEAD: `/auth/token` still *mints* a
+bearer JWT but **no code anywhere validates it** (CLI uses HTTP Basic Auth, Inbound API uses
+the `sbk_` ApiKeys verifier, the cookie path uses bare claims), so `JwtTokenService`'s entire
+`ValidateToken`/`RefreshToken`/`RecordActivity`/`ShouldRefresh`/`IsIdleTimedOut` surface is
+dead code minting an orphaned credential. **Security-023** — `LdapGroupMapping.Role` accepts
+arbitrary free strings via CLI/API with no canonical-set validation, and the Central UI
+`RequireClaim` policies are case-sensitive while the ManagementActor authz check is
+case-insensitive, so a mis-cased/typo'd mapping silently grants inconsistent access across
+the two surfaces with no operator feedback. **Security-024** — the M7 `RequireOperator`/
+`RequireVerifier` SoD policies have no functional `AuthorizeAsync` grant/deny test (only
+constant-name assertions), unlike every other policy. **Security-025** (Low) — a unit test
+makes a live network connect to `nonexistent.invalid:9999`.
+
+_Re-review (2026-06-20, `4307c381`):_
+
+| # | Category | Examined | Notes |
+|---|----------|----------|-------|
+| 1 | Correctness & logic bugs | ☑ | `RoleMapper` union semantics (Security-016 fix) hold and read correctly. `LdapGroupMapping.Role` is an unvalidated free string + case-sensitive UI policy vs case-insensitive ManagementActor authz (Security-023). |
+| 2 | Akka.NET conventions | ☑ | No actors. `AddSecurityActors` is still a placeholder. No issues. |
+| 3 | Concurrency & thread safety | ☑ | Services stateless/DI-scoped; `CookieSessionValidator` reads injected `TimeProvider`. LDAP now in the external library. No shared mutable state. No issues. |
+| 4 | Error handling & resilience | ☑ | `CookieSessionValidator` fails closed on idle (missing anchor = timed out) and swallows refresh faults keeping current roles (mirrors "LDAP failure: active sessions continue") — correct and well tested. `LdapAuthFailureMessages` maps the library's structured failure enum to stable user text. No issues. |
+| 5 | Security | ☑ | Cookie hardening (HttpOnly/SameSite=Strict/Secure-when-required/sliding) comes from `ZbCookieDefaults` — verified correct. JWT mint binds issuer/audience, pins `MapInbound/OutboundClaims=false`, fails fast on short key. BUT the minted bearer JWT is validated by no consumer (Security-022). DisableLogin auto-login is gated at registration + loud warning + doc'd (acceptable). |
+| 6 | Performance & resource management | ☑ | Security-008 N+1 remains correctly Deferred (gated on `ISecurityRepository`). `CookieSessionValidator` deliberately avoids re-minting on the no-refresh path. No new perf issues. |
+| 7 | Design-document adherence | ☑ | The cookie+JWT-hybrid design is now cookie-with-bare-claims; the JWT half is orphaned/unvalidated (Security-022) — the M2.19 doc note acknowledges bare-claim cookies but the doc still presents JWT-in-cookie as the model and does not flag that `/auth/token`'s JWT is unconsumed. Operator/Verifier policies match the design. |
+| 8 | Code organization & conventions | ☑ | `Roles`/`SecurityOptions` owned by the component; repository interface in Commons; additive `IGroupRoleMapper<string>` adapter. Dead JWT methods (Security-022) are the main org smell. |
+| 9 | Testing coverage | ☑ | `CookieSessionValidator`, `SessionClaimBuilder`, `RoleMapper` union, cookie wiring, `SecurityOptionsValidator`, DisableLogin are well covered. Gaps: no functional authz test for `RequireOperator`/`RequireVerifier` (Security-024); a unit test does live network I/O (Security-025); no test for a non-canonical/mis-cased mapping role (Security-023). |
+| 10 | Documentation & comments | ☑ | XML docs are thorough and current. `SecurityOptions` and `JwtTokenService` docs still describe an active JWT-refresh lifecycle that no longer has a caller (Security-022). |
+
 ## Checklist coverage

 | # | Category | Examined | Notes |
@@ -1008,3 +1052,178 @@ SecurityOptions:RequireHttpsCookie=true in production.")`. Optionally, also fail
 startup when `RequireHttpsCookie=false` AND `ASPNETCORE_ENVIRONMENT=Production`. Add a
 regression test that asserts the warning is emitted when the flag is disabled and not
 when it is enabled.
+
+### Security-022 — `/auth/token` mints a bearer JWT that no consumer validates; the entire `JwtTokenService` validate/refresh surface is dead code
+
+| | |
+|--|--|
+| Severity | Medium |
+| Category | Design-document adherence |
+| Status | Deferred |
+| Location | `src/ZB.MOM.WW.ScadaBridge.Security/JwtTokenService.cs:149`, `:194`, `:209`, `:239`, `:272`; `src/ZB.MOM.WW.ScadaBridge.CentralUI/Auth/AuthEndpoints.cs:99-148` |
+
+**Description**
+
+The design (Component-Security.md "Cookie + JWT Hybrid", CLAUDE.md "cookie+JWT hybrid …
+HMAC-SHA256, 15-minute expiry w/ sliding refresh") presents a cookie-embedded JWT as the
+session model. At HEAD this is no longer how the interactive session works (the M2.19 note
+acknowledges the cookie now carries **bare claims** policed by `CookieSessionValidator`),
+and — more importantly — the JWT half is now an **orphaned, unconsumed credential**. A grep
+across `src/` shows `JwtTokenService.ValidateToken`, `RefreshToken`, `RecordActivity`, and
+`ShouldRefresh` have **zero production callers**; only `GenerateToken` is called, solely by
+`POST /auth/token` (`AuthEndpoints.cs:134`). That endpoint returns an `access_token` to the
+caller, but **no endpoint in the codebase accepts or validates that JWT** as a credential:
+the CLI authenticates to the Management API with HTTP Basic Auth
+(`CLI/ManagementHttpClient.cs:36`), the Inbound API authenticates with the external
+`ZB.MOM.WW.Auth.ApiKeys` `sbk_<keyId>_<secret>` verifier, and the cookie pipeline uses
+`SessionClaimBuilder` bare claims (never `ValidateToken`). The result: `/auth/token` issues
+HMAC-signed bearer tokens that grant access to nothing, and ~120 lines of carefully-hardened
+issuer/audience/clock-skew/idle/refresh logic (the subject of resolved findings
+Security-006/007/014) are dead code. The hazard is twofold: (a) the design doc materially
+overstates the live session mechanism, misleading any future reader/auditor; and (b) an
+orphaned token mint is a latent footgun — a future endpoint added under the assumption that
+`/auth/token` produces a *usable* credential would wire a second, parallel auth path with
+no test coverage proving it actually validates.
+
+**Recommendation**
+
+Decide the intent explicitly. Either (a) **wire the bearer path**: add
+`AddAuthentication().AddJwtBearer(...)` (or a custom handler delegating to
+`JwtTokenService.ValidateToken`) on whatever surface is meant to accept the token, and add
+an end-to-end test that a `/auth/token`-issued JWT authorizes a protected request — then the
+hybrid model the doc describes is real; or (b) **remove the dead code**: delete
+`/auth/token` and `JwtTokenService.ValidateToken`/`RefreshToken`/`RecordActivity`/
+`ShouldRefresh`/`IsIdleTimedOut`, keeping only what the bare-claim cookie path still needs
+(none of those methods, though `AuthorizationPolicies` still references the
+`JwtTokenService.RoleClaimType` constant, which would need to stay or move), and rewrite
+Component-Security.md's "Cookie + JWT Hybrid" / "Token Lifecycle" sections to describe the
+actual bare-claim cookie + `CookieSessionValidator` mechanism as the primary model rather
+than a footnote. Whichever is chosen, the design doc and the code must agree on whether a
+usable JWT exists.
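+
+As a rough illustration of option (a), the sketch below wires the standard ASP.NET Core
+JWT-bearer handler in a standalone minimal app. It assumes the
+`Microsoft.AspNetCore.Authentication.JwtBearer` package; the issuer, audience, signing key,
+and the `/protected` route are placeholders rather than values from this codebase, and real
+wiring would have to coexist with the existing cookie scheme.
+
+```csharp
+using System.Text;
+using Microsoft.AspNetCore.Authentication.JwtBearer;
+using Microsoft.AspNetCore.Builder;
+using Microsoft.Extensions.DependencyInjection;
+using Microsoft.IdentityModel.Tokens;
+
+var builder = WebApplication.CreateBuilder(args);
+
+// Placeholder key (36 bytes); the real key, issuer, and audience would come from SecurityOptions.
+var signingKey = new SymmetricSecurityKey(
+    Encoding.UTF8.GetBytes("replace-with-a-32+-byte-secret-key!!"));
+
+builder.Services
+    .AddAuthentication(JwtBearerDefaults.AuthenticationScheme)
+    .AddJwtBearer(options =>
+    {
+        // Keep claim types as issued instead of remapping them to legacy URIs.
+        options.MapInboundClaims = false;
+        options.TokenValidationParameters = new TokenValidationParameters
+        {
+            ValidateIssuer = true,
+            ValidIssuer = "example-issuer",        // assumption
+            ValidateAudience = true,
+            ValidAudience = "example-audience",    // assumption
+            ValidateIssuerSigningKey = true,
+            IssuerSigningKey = signingKey,
+            ValidateLifetime = true,
+        };
+    });
+builder.Services.AddAuthorization();
+
+var app = builder.Build();
+app.UseAuthentication();
+app.UseAuthorization();
+
+// Hypothetical protected route: requests without a valid bearer token get 401.
+app.MapGet("/protected", () => "ok").RequireAuthorization();
+
+app.Run();
+```
+
+An end-to-end test along the lines recommended above would request a token from
+`/auth/token` and assert that it authorizes a call to a route protected this way.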
+
+**Resolution**
+
+Deferred 2026-06-20: this needs a product decision — either wire a JWT-bearer auth handler so the documented cookie+JWT hybrid is real, or delete `/auth/token` + the orphaned `JwtTokenService` validate/refresh surface and rewrite the design doc to describe the actual bare-claim cookie model. Recorded for that decision.
+
+### Security-023 — `LdapGroupMapping.Role` is an unvalidated free string; case-sensitive UI policy vs case-insensitive server authz grants inconsistent access
+
+| | |
+|--|--|
+| Severity | Medium |
+| Category | Correctness & logic bugs |
+| Status | Resolved |
+| Location | `src/ZB.MOM.WW.ScadaBridge.Security/AuthorizationPolicies.cs:144-157`; `src/ZB.MOM.WW.ScadaBridge.Security/RoleMapper.cs:40-42`; `src/ZB.MOM.WW.ScadaBridge.Commons/Entities/Security/LdapGroupMapping.cs:10`; `src/ZB.MOM.WW.ScadaBridge.ManagementService/ManagementActor.cs:106`, `:1893` |
+
+**Description**
+
+The role string an operator maps an LDAP group to (`LdapGroupMapping.Role`) is a free
+`string` with no validation against the canonical `Roles.*` set anywhere on the write path:
+`ManagementActor` (`:1893`) does `new LdapGroupMapping(cmd.LdapGroupName, cmd.Role)` verbatim,
+and the CLI `security role-mapping create --role <value>`
+(`CLI/Commands/SecurityCommands.cs:157`) accepts any string. The Central UI dropdown
+(`LdapMappingForm.razor`) constrains the *UI* path to the canonical constants, but the CLI/API
+path does not. Two enforcement surfaces then disagree on casing:
+
+- **Central UI authorization** uses `policy.RequireClaim(JwtTokenService.RoleClaimType, Roles.Deployer)`
+  (`AuthorizationPolicies.cs:150`). ASP.NET Core's `RequireClaim` compares claim *values*
+  with ordinal (case-**sensitive**) equality.
+- **ManagementActor authorization** uses `user.Roles.Contains(requiredRole, StringComparer.OrdinalIgnoreCase)`
+  (`ManagementActor.cs:106`) — case-**insensitive**.
+- **`RoleMapper`** matches the Deployer scope branch with `StringComparison.OrdinalIgnoreCase`
+  (`RoleMapper.cs:42`) but stamps the **DB row's verbatim casing** into the role claim.
+
+So a mapping created via CLI/API as `--role deployer` (lowercase) produces a principal whose
+role claim is `"deployer"`: the **ManagementActor/CLI surface authorizes it**
+(case-insensitive), the **Central UI `RequireDeployment` policy denies it** (case-sensitive),
+and `RoleMapper` still resolves the Deployer site scope (matching case-insensitively). The
+user can deploy via the CLI but is locked out of the equivalent UI pages, with no error
+explaining why. A pure typo (such as `Deploy` or `Deploer`) silently grants a role claim that
+*no* policy and no ManagementActor check matches, so the user appears to hold a role yet is
+denied everywhere, with no validation feedback at mapping-creation time.
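+
+The disagreement between the two comparisons can be reproduced in isolation. The snippet
+below is a minimal sketch, not code from the repository: it only contrasts an ordinal match
+(how `RequireClaim` compares claim values) with an `OrdinalIgnoreCase` match (how the
+ManagementActor check is described above) for a lowercase role claim.
+
+```csharp
+using System;
+using System.Linq;
+
+string[] claimedRoles = { "deployer" };  // role claim stamped verbatim from the mapping row
+const string required = "Deployer";      // canonical value the policies compare against
+
+// Ordinal (case-sensitive) comparison: no match.
+bool uiAllows = claimedRoles.Contains(required, StringComparer.Ordinal);
+
+// Case-insensitive comparison: match.
+bool actorAllows = claimedRoles.Contains(required, StringComparer.OrdinalIgnoreCase);
+
+Console.WriteLine($"UI policy: {uiAllows}, ManagementActor: {actorAllows}");
+// prints "UI policy: False, ManagementActor: True"
+```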
+
+**Recommendation**
+
+Validate `cmd.Role` against the canonical `Roles.All` set (case-insensitively, normalising to
+the canonical casing) whenever a mapping is created or updated. Do this in `ManagementActor`
+(the single server-side write path) and reject non-canonical values with a clear error;
+optionally also guard in the CLI for a fast local failure. Separately, make the two
+authorization surfaces
+agree on casing: either normalise the role claim to canonical casing in `SessionClaimBuilder`/
+`RoleMapper` before it is stamped, or make the ManagementActor check case-sensitive to match
+`RequireClaim` (the safer direction once values are validated). Add a regression test for a
+mis-cased mapping role.
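+
+One possible shape for the validate-and-normalise step is sketched here. `CanonicalRoles`
+and `TryNormalize` are hypothetical names standing in for `Roles.All` and whatever helper
+the write path would call; the role list is assumed from the canonical names mentioned in
+this review.
+
+```csharp
+using System;
+using System.Collections.Generic;
+using System.Linq;
+
+// Hypothetical stand-in for the component's canonical role set (Roles.All).
+static class CanonicalRoles
+{
+    public static readonly IReadOnlyList<string> All = new[]
+    {
+        "Administrator", "Designer", "Deployer", "Viewer", "Operator", "Verifier"
+    };
+
+    // Returns true and the canonical casing when `input` names a known role.
+    public static bool TryNormalize(string input, out string canonical)
+    {
+        canonical = All.FirstOrDefault(r =>
+            string.Equals(r, input?.Trim(), StringComparison.OrdinalIgnoreCase)) ?? string.Empty;
+        return canonical.Length > 0;
+    }
+}
+
+static class Program
+{
+    static void Main()
+    {
+        foreach (var raw in new[] { "deployer", "Deployer", "Deploer" })
+        {
+            // A mis-cased value is accepted and rewritten; a typo is rejected.
+            Console.WriteLine(CanonicalRoles.TryNormalize(raw, out var role)
+                ? $"{raw} -> {role}"
+                : $"{raw} -> rejected");
+        }
+    }
+}
+```
+
+Storing the normalised value rather than the raw input would also remove the casing
+asymmetry for newly created mappings, since the stamped claim would then always match the
+canonical constant.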
+
+**Resolution**
+
+Resolved 2026-06-20 (commit `fd618cf1`): added membership validation rejecting any `cmd.Role` not in `Roles.All` (canonical set, case-insensitive) at LDAP-group-mapping create AND update, before any DB write — closing the unvalidated-free-string gap. The casing/comparison-asymmetry half is a larger change and remains a recorded follow-up (membership check is safe: a non-canonical role never worked). No existing test regressed.
+
+### Security-024 — `RequireOperator` / `RequireVerifier` policies have no functional authorization test
+
+| | |
+|--|--|
+| Severity | Medium |
+| Category | Testing coverage |
+| Status | Resolved |
+| Location | `src/ZB.MOM.WW.ScadaBridge.Security/AuthorizationPolicies.cs:153-157`; `tests/ZB.MOM.WW.ScadaBridge.Security.Tests/SecurityTests.cs:1045-1186` (AuthorizationPolicyTests); `tests/ZB.MOM.WW.ScadaBridge.Security.Tests/RolesTests.cs:44-45` |
+
+**Description**
+
+The M7 two-person Secured Write feature added the `RequireOperator` and `RequireVerifier`
+policies (`AuthorizationPolicies.cs:153-157`), each a single-role `RequireClaim`. These gate a
+SCADA control-surface write workflow whose entire safety argument is separation of duties, so
+correct grant/deny behaviour is security-critical. But `AuthorizationPolicyTests` — which
+exercises every *other* policy with real `IAuthorizationService.AuthorizeAsync` evaluations
+(Admin/Design/Deployment/OperationalAudit/AuditExport, including the load-bearing
+"Viewer reads but cannot export" SoD case) — has **no** Operator/Verifier test. The only
+coverage is `RolesTests.cs:44-45`, which asserts the *constant string values*
+(`"RequireOperator"`, `"RequireVerifier"`) — not that an `Operator` principal satisfies
+`RequireOperator`, that a `Verifier` does not, or that the two policies are mutually distinct.
+A regression that, say, mapped `RequireOperator` to `Roles.Verifier` (or to
+`OperationalAuditRoles`) would compile and pass the existing tests while silently collapsing
+the separation of duties.
+
+**Recommendation**
+
+Add `AuthorizeAsync`-based `[Theory]` cases to `AuthorizationPolicyTests` mirroring the
+existing policy tests: `RequireOperator` succeeds for `Roles.Operator` and fails for
+`Roles.Verifier`/`Administrator`/empty; `RequireVerifier` succeeds for `Roles.Verifier` and
+fails for `Roles.Operator`; and a combined case asserting an Operator-only principal cannot
+satisfy `RequireVerifier` (the SoD invariant at the policy layer).
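+
+A minimal sketch of such a theory follows, assuming xUnit plus the
+`Microsoft.AspNetCore.Authorization` and `Microsoft.Extensions.Logging` packages. The claim
+type and role strings are assumed values, and the policies are re-declared locally only to
+keep the example self-contained; a real regression test should build the service from the
+component's own policy registration so that a mis-wired policy is actually caught.
+
+```csharp
+using System.Security.Claims;
+using System.Threading.Tasks;
+using Microsoft.AspNetCore.Authorization;
+using Microsoft.Extensions.DependencyInjection;
+using Xunit;
+
+public class OperatorVerifierPolicySketchTests
+{
+    // Assumed value; the real code uses JwtTokenService.RoleClaimType.
+    private const string RoleClaimType = "role";
+
+    private static IAuthorizationService BuildService()
+    {
+        var services = new ServiceCollection();
+        services.AddLogging(); // DefaultAuthorizationService needs an ILogger
+        services.AddAuthorizationCore(options =>
+        {
+            options.AddPolicy("RequireOperator", p => p.RequireClaim(RoleClaimType, "Operator"));
+            options.AddPolicy("RequireVerifier", p => p.RequireClaim(RoleClaimType, "Verifier"));
+        });
+        return services.BuildServiceProvider().GetRequiredService<IAuthorizationService>();
+    }
+
+    private static ClaimsPrincipal PrincipalWith(params string[] roles)
+    {
+        var identity = new ClaimsIdentity("Test");
+        foreach (var role in roles)
+        {
+            identity.AddClaim(new Claim(RoleClaimType, role));
+        }
+        return new ClaimsPrincipal(identity);
+    }
+
+    [Theory]
+    [InlineData("RequireOperator", "Operator", true)]
+    [InlineData("RequireOperator", "Verifier", false)]
+    [InlineData("RequireVerifier", "Verifier", true)]
+    [InlineData("RequireVerifier", "Operator", false)]
+    public async Task Policy_grants_only_its_own_role(string policy, string role, bool expected)
+    {
+        var result = await BuildService().AuthorizeAsync(PrincipalWith(role), policy);
+        Assert.Equal(expected, result.Succeeded);
+    }
+}
+```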
+
+**Resolution**
+
+Resolved 2026-06-20 (commit `fd618cf1`): added functional `AuthorizeAsync` grant/deny tests for `RequireOperator`/`RequireVerifier` (and a combined SoD-distinctness test), guarding the separation-of-duties policy wiring against regression.
+
+### Security-025 — Unit test performs a live network connection to `nonexistent.invalid:9999`
+
+| | |
+|--|--|
+| Severity | Low |
+| Category | Testing coverage |
+| Status | Resolved |
+| Location | `tests/ZB.MOM.WW.ScadaBridge.Security.Tests/SecurityTests.cs:71-89` (`AuthenticateAsync_ConnectionFailure_FailsClosed_NeverThrows`) |
+
+**Description**
+
+`LdapAuthServiceTests.AuthenticateAsync_ConnectionFailure_FailsClosed_NeverThrows` constructs
+the external library's `LdapAuthService` pointed at `Server = "nonexistent.invalid"`,
+`Port = 9999` with a 2-second `ConnectionTimeoutMs`, then asserts the result fails closed.
+Although the test relies on the connection *failing*, it still performs a real DNS resolution
+and TCP connect attempt from the unit-test process. In an offline or network-sandboxed CI
+environment the resolution/connect can behave differently (immediate failure vs. the full
+2-second timeout vs. a captive-portal redirect), making the test both **slow** (up to ~2s of
+wall-clock dead time) and **environment-dependent**. Unit tests in this suite are otherwise
+network-free (the sibling cases are explicitly named `_NoNetwork`). The behaviour under test
+(fail-closed mapping of an unreachable directory) is really the external library's
+responsibility and is, per the file's own header comment, already covered by the library's own
+test suite through its internal connection seam.
+
+**Recommendation**
+
+Drop this case from the ScadaBridge unit suite (the library owns the fail-closed contract via
+its internal `ILdapConnection` seam, as the region comment notes), or move it to an explicitly
+opt-in integration-test category so it does not run in the default network-free unit pass. If
+kept as a smoke test, point it at the loopback address on a closed port (e.g. `127.0.0.1`
+with nothing listening) so the connect is refused locally and the failure is bounded
+deterministically, rather than relying on DNS resolution of `.invalid`.
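+
+One way to obtain a closed loopback port for such a smoke test is sketched below. This is
+an illustration using only `System.Net.Sockets`, not the test that was committed: it asks
+the OS for a free port, releases it, and then shows that a connect attempt fails without
+any DNS lookup. `ClosedPortSketch` and `ReserveClosedLoopbackPort` are hypothetical names.
+
+```csharp
+using System;
+using System.Net;
+using System.Net.Sockets;
+
+static class ClosedPortSketch
+{
+    // Binds to an OS-assigned loopback port, then releases it so nothing listens there.
+    static int ReserveClosedLoopbackPort()
+    {
+        var listener = new TcpListener(IPAddress.Loopback, 0);
+        listener.Start();
+        int port = ((IPEndPoint)listener.LocalEndpoint).Port;
+        listener.Stop();
+        return port;
+    }
+
+    static void Main()
+    {
+        int port = ReserveClosedLoopbackPort();
+        try
+        {
+            using var client = new TcpClient();
+            client.Connect(IPAddress.Loopback, port);
+            Console.WriteLine("unexpected: something is listening");
+        }
+        catch (SocketException ex)
+        {
+            // Typically ConnectionRefused; the attempt never leaves the local host.
+            Console.WriteLine($"connect failed: {ex.SocketErrorCode}");
+        }
+    }
+}
+```
+
+There is a small window in which another process could claim the released port, so a test
+built this way is best treated as a smoke test rather than a strict guarantee.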
+
+**Resolution**
+
+Resolved 2026-06-20 (commit `fd618cf1`): the fail-closed connection test now targets `127.0.0.1` on a closed port instead of `nonexistent.invalid:9999` — deterministic loopback connection-refused, no external DNS/timeout dependency. Same assertion preserved.