Files
Joseph Doherty 7b0b9c7365 refactor: rename ScadaLink → ZB.MOM.WW.ScadaBridge (code + projects + namespaces)
Solution + 23 src projects + 26 test projects renamed; folders, csproj,
namespaces, and ScadaLinkDbContext/ScadaBridgeDbContext class updated.
ActorSystem "scadalink" → "scadabridge", Akka seed-node URLs migrated.
SQL roles/logins, LDAP domains, CLI command name, and CLI config dir
(~/.scadalink → ~/.scadabridge) also renamed.

Build green; 5 Host.Tests fail awaiting SQL login rename in next commit.
Pre-existing StaleTagMonitor timing flakes unchanged.

Rename script committed at tools/rename-to-scadabridge.sh.
2026-05-28 09:37:45 -04:00

1011 lines
58 KiB
Markdown

# Code Review — Security
| Field | Value |
|-------|-------|
| Module | `src/ZB.MOM.WW.ScadaBridge.Security` |
| Design doc | `docs/requirements/Component-Security.md` |
| Status | Reviewed |
| Last reviewed | 2026-05-28 |
| Reviewer | claude-agent |
| Commit reviewed | `1eb6e97` |
| Open findings | 0 (1 deferred — Security-008) |
## Summary
The Security module is small and reasonably structured: a stateless `LdapAuthService`
for search-then-bind authentication, a `JwtTokenService` for HMAC-signed cookie tokens,
a `RoleMapper` that resolves LDAP groups to roles, and ASP.NET Core authorization
policies plus a site-scope handler. Unit-test coverage of the happy paths is decent.
However, the review surfaced several real security weaknesses, the most serious being
that **StartTLS is dead code** (the design's "LDAPS or StartTLS" requirement is only
half met), that **the authentication cookie is not marked `Secure`** despite the design
mandating it, and that **the JWT signing key is never length-validated** so a weak or
empty key is silently accepted. There is also a genuine **DN-injection** gap in the
no-service-account fallback path, a filter/DN attribute mismatch (`uid=` vs `cn=`) that
makes that fallback path internally inconsistent, and an N+1 query in `RoleMapper`.
JWT validation also disables issuer/audience checks and the idle-timeout claim is reset
on every refresh, weakening the documented 30-minute idle policy. None of these are
crash/data-loss bugs, but the TLS, cookie, and key-validation items are security
defects that should be fixed before any production deployment.
#### Re-review 2026-05-17 (commit `39d737e`)
Re-reviewed the module after the batch-resolution of Security-001..007, 009, 010, 011.
Those fixes were verified in place and remain Resolved: the `LdapTransport` enum makes
StartTLS reachable, the auth cookie is `SecurePolicy.Always` with a sliding window, the
JWT signing key is length-validated at construction, issuer/audience are bound and
checked, the DN-injection fallback is RFC 4514-escaped, and the idle-timeout anchor is
carried across refreshes. Security-008 (N+1 scope-rule query) remains correctly
**Deferred**`ISecurityRepository` still exposes no per-set scope-rule query, so the
N+1 cannot be removed from within `RoleMapper.cs` alone; the deferral condition is
unchanged. This pass surfaced **4 new findings** (Security-012..015): two Medium and
two Low. The most notable is that a *partial* LDAP outage during login (bind succeeds,
group search fails) silently returns an authenticated session with **zero roles**
instead of failing the login as the design's LDAP-failure rule requires
(Security-012); and that `RefreshToken` re-issues a token without ever checking
`IsIdleTimedOut`, so an idle-expired session can be renewed indefinitely if the caller
omits the separate idle check (Security-014). The two Low findings concern fragile
DN parsing of group names containing escaped commas and an un-trimmed username flowing
into the LDAP filter, fallback DN, and JWT claims.
#### Re-review 2026-05-28 (commit `1eb6e97`)
Re-reviewed the module on a fresh baseline. All Security-001..007, 009..015 fixes remain
in place; the only Open carry-over is Security-008 (still correctly **Deferred**
`ISecurityRepository` still exposes no per-set scope-rule query, so the N+1 in
`RoleMapper` cannot be removed from within this module). The original
Security-014 fix is now load-bearing: `RefreshToken` calls `IsIdleTimedOut` before
re-issuing, and the new cookie sliding-expiry tests in `SecurityReviewRegressionTests`
pin CentralUI-005's Security-side contract. This pass surfaced **6 new findings**
(Security-016..021): one Medium correctness/security defect, one Medium design-adherence
defect, and four Low. The most consequential is **Security-016** — when a user is
mapped to *both* a system-wide Deployment LDAP group (e.g. `SCADA-Deploy-All`) and a
site-scoped Deployment LDAP group (e.g. `SCADA-Deploy-SiteA`), `RoleMapper` silently
treats the union as site-scoped (the system-wide grant is dropped); the design's
"multiple groups grant multiple independent roles" intent is not honoured for this
mix-and-match case. **Security-017** is the cross-module partner of CentralUI-028:
`SiteScopeRequirement` / `SiteScopeAuthorizationHandler` are declared and registered
but no production caller ever instantiates them — `[Authorize(Policy = RequireDeployment)]`
*does not* enforce the documented site scoping, callers must remember to inject
`SiteScopeService` and re-check `IsSiteAllowedAsync` themselves (which the two new
report pages flagged by CentralUI-028 forgot to do). The remaining Lows are: role names
are magic strings duplicated across `RoleMapper`, `SiteScopeAuthorizationHandler`, and
`AuthorizationPolicies` (Security-018); a service-account-rebind failure is reported
to the user as "Invalid username or password" — masking a misconfiguration as a
user-credential error (Security-019); required `SecurityOptions` fields
(`LdapServer`, `LdapSearchBase`) have no `IValidateOptions` startup check, so empty
values silently surface only on first login (Security-020); and the
`RequireHttpsCookie=false` dev opt-out emits no warning, so an HTTP production
deployment silently transmits the JWT bearer credential in cleartext (Security-021).
## Checklist coverage
| # | Category | Examined | Notes |
|---|----------|----------|-------|
| 1 | Correctness & logic bugs | ☑ | `uid=`/`cn=` attribute mismatch between search filter and fallback DN construction (Security-004); StartTLS branch is unreachable (Security-001). |
| 2 | Akka.NET conventions | ☑ | No actors in this module — `AddSecurityActors` is an empty placeholder. Nothing to assess. |
| 3 | Concurrency & thread safety | ☑ | Services are stateless and DI-scoped; LDAP sync calls wrapped in `Task.Run`. No shared mutable state. No issues found. |
| 4 | Error handling & resilience | ☑ | LDAP failure paths return structured `LdapAuthResult`; group-lookup failure is tolerated per design. `ct` not honored inside `Task.Run` bodies (Security-009). |
| 5 | Security | ☑ | StartTLS dead code (Security-001), cookie not `Secure` (Security-002), JWT key unvalidated (Security-003), DN injection (Security-005), no issuer/audience validation (Security-006), idle-timeout reset on refresh (Security-007). |
| 6 | Performance & resource management | ☑ | N+1 scope-rule query in `RoleMapper` (Security-008). `LdapConnection` correctly disposed via `using`. |
| 7 | Design-document adherence | ☑ | StartTLS unsupported and Secure cookie missing both contradict the design doc; design also says "Windows Integrated Authentication" in Responsibilities, contradicting its own Authentication section (Security-010). |
| 8 | Code organization & conventions | ☑ | `SecurityOptions` correctly owned by the component; repository interface in Commons. No issues found. |
| 9 | Testing coverage | ☑ | No tests for `RoleMapper` N+1 behavior, DN-injection inputs, StartTLS path, or idle-timeout-after-refresh. Insecure-config combinations under-tested (Security-011). |
| 10 | Documentation & comments | ☑ | `SecurityOptions` XML docs say direct bind uses `cn={username}` while the search filter uses `uid=` — comment is misleading (covered under Security-004). |
_Re-review (2026-05-28, `1eb6e97`):_
| # | Category | Examined | Notes |
|---|----------|----------|-------|
| 1 | Correctness & logic bugs | ☑ | `RoleMapper` drops a system-wide Deployment grant when the user is also in any site-scoped Deployment group (Security-016); hard-coded role-name string `"Deployment"` in two separate places allows a refactor to silently break site scoping (Security-018). |
| 2 | Akka.NET conventions | ☑ | No actors. `AddSecurityActors` is still a registration placeholder. No issues. |
| 3 | Concurrency & thread safety | ☑ | Services stateless; LDAP sync calls wrapped in `Task.Run` with the now-bounded timeout (Security-009 resolution holds). No issues found. |
| 4 | Error handling & resilience | ☑ | A service-account-rebind failure inside `AuthenticateAsync` is reported as "Invalid username or password", masking a misconfiguration as a user-credential error (Security-019). LDAP-failure rule + partial-outage path remain correctly enforced post-Security-012. |
| 5 | Security | ☑ | `SiteScopeRequirement` / `SiteScopeAuthorizationHandler` are dead code — no policy is registered that uses them and no production caller instantiates them, so declarative `[Authorize]` does not enforce site scoping (Security-017, cross-module partner of CentralUI-028). `RequireHttpsCookie=false` dev opt-out has no warning path — a production misconfiguration silently transmits the JWT bearer credential over HTTP (Security-021). |
| 6 | Performance & resource management | ☑ | Security-008 N+1 remains correctly Deferred (still gated on `ISecurityRepository`). No new perf issues. |
| 7 | Design-document adherence | ☑ | `RoleMapper`'s drop-system-wide-on-any-scoped behaviour (Security-016) contradicts the design's "A user can hold multiple roles simultaneously … roles are independent — there is no implied hierarchy" rule for the union case; `SiteScopeRequirement` advertises a site-scope authorization pattern the implementation does not actually wire up (Security-017). |
| 8 | Code organization & conventions | ☑ | Role-name strings are duplicated as magic literals across `RoleMapper.cs`, `SiteScopeAuthorizationHandler.cs`, and `AuthorizationPolicies.cs` — only the audit roles have a single source of truth via `OperationalAuditRoles` / `AuditExportRoles` (Security-018). `SecurityOptions` defaults pass through to runtime with no `IValidateOptions` for required fields like `LdapServer` / `LdapSearchBase` (Security-020). |
| 9 | Testing coverage | ☑ | No test covers a user mapped to both a system-wide AND a site-scoped Deployment LDAP group (the Security-016 case). No test covers the `SiteScopeRequirement` cross-page integration — tests evaluate the handler in isolation, not the absence of a policy that uses it (Security-017). |
| 10 | Documentation & comments | ☑ | `SiteScopeAuthorizationHandler` XML doc describes a permission model no caller actually invokes (Security-017). Otherwise stable. |
## Findings
### Security-001 — StartTLS upgrade path is unreachable dead code
| | |
|--|--|
| Severity | High |
| Category | Security |
| Status | Resolved |
| Location | `src/ZB.MOM.WW.ScadaBridge.Security/LdapAuthService.cs:37-47` |
**Description**
When `LdapUseTls` is true the code sets `connection.SecureSocketLayer = true` (LDAPS).
The subsequent StartTLS block is guarded by `if (_options.LdapUseTls && !connection.SecureSocketLayer)`.
Because `SecureSocketLayer` was just set to `true`, the second condition `!connection.SecureSocketLayer`
is always false, so `connection.StartTls()` is never called. The design doc explicitly
states LDAP connections must use **"LDAPS (port 636) or StartTLS"** — StartTLS is in
practice unsupported. A deployment that intends to use StartTLS on port 389 would get a
plaintext LDAPS-mode connection attempt that fails, or worse, an operator may disable
TLS entirely to make it work, sending credentials in cleartext.
**Recommendation**
Introduce an explicit transport mode (e.g. `LdapTransport { Ldaps, StartTls, None }`)
or a separate `LdapUseStartTls` flag. For StartTLS, leave `SecureSocketLayer` false,
call `connection.Connect`, then call `connection.StartTls()` and verify the negotiated
session is encrypted before binding. Remove the unreachable conditional.
**Resolution**
Resolved 2026-05-16 (commit `<pending>`). Added an explicit `LdapTransport` enum
(`Ldaps`/`StartTls`/`None`); `SecureSocketLayer` is set only for LDAPS, and the
StartTLS branch now connects in plaintext, calls `StartTls()`, and verifies
`connection.Tls` before binding. `LdapUseTls` is retained as a compatibility shim
mapping onto the enum. Regression tests `AuthenticateAsync_StartTlsTransport_AttemptsConnection`
and `AuthenticateAsync_NoTlsTransport_RejectedWithoutAllowInsecure`.
### Security-002 — Authentication cookie is not marked `Secure`
| | |
|--|--|
| Severity | High |
| Category | Security |
| Status | Resolved |
| Location | `src/ZB.MOM.WW.ScadaBridge.Security/ServiceCollectionExtensions.cs:16-23` |
**Description**
`AddCookie` sets `HttpOnly = true` and `SameSite = Strict` but never sets
`options.Cookie.SecurePolicy`. The ASP.NET Core default is `CookieSecurePolicy.SameAsRequest`,
which permits the cookie (carrying the embedded JWT — a bearer credential) to be sent
over plain HTTP. The design doc states the cookie is **"HttpOnly and Secure (requires
HTTPS)"**. As written, the module does not enforce that requirement; a misconfigured or
HTTP-fronted deployment would transmit the session token in cleartext.
**Recommendation**
Set `options.Cookie.SecurePolicy = CookieSecurePolicy.Always` in `AddCookie`. Consider
also setting `ExpireTimeSpan` and `SlidingExpiration` to align the cookie lifetime with
the documented 15-minute JWT / 30-minute idle policy.
**Resolution**
Resolved 2026-05-16 (commit `<pending>`). Confirmed the cookie is configured in this
module (`ServiceCollectionExtensions.AddSecurity`), not CentralUI — no misattribution.
Added `options.Cookie.SecurePolicy = CookieSecurePolicy.Always` so the JWT-bearing
cookie is never sent over plain HTTP. Regression test
`AddSecurity_AuthCookie_IsMarkedSecureAlways`. (`ExpireTimeSpan`/`SlidingExpiration`
tuning left as a separate, lower-priority improvement.)
### Security-003 — JWT signing key length is never validated
| | |
|--|--|
| Severity | High |
| Category | Security |
| Status | Resolved |
| Location | `src/ZB.MOM.WW.ScadaBridge.Security/JwtTokenService.cs:33`, `src/ZB.MOM.WW.ScadaBridge.Security/SecurityOptions.cs:42` |
**Description**
`SecurityOptions.JwtSigningKey` defaults to `string.Empty` and is fed directly into
`new SymmetricSecurityKey(Encoding.UTF8.GetBytes(_options.JwtSigningKey))` with no
validation. HMAC-SHA256 requires a key of at least 256 bits (32 bytes); a short or empty
key produces a trivially forgeable token. The `SecurityHardeningTests` comment claims a
minimum length is "enforced", but no code in this module enforces it — the test only
asserts that a 32+ char key works. A deployment with a missing or short `JwtSigningKey`
would start successfully and issue weakly-signed tokens.
**Recommendation**
Validate `JwtSigningKey` at startup — fail fast if it is empty or shorter than 32 bytes.
Use an `IValidateOptions<SecurityOptions>` validator or guard in the `JwtTokenService`
constructor so a weak key is rejected before any token is issued.
**Resolution**
Resolved 2026-05-16 (commit `<pending>`). The `JwtTokenService` constructor now
fails fast with an `InvalidOperationException` when `JwtSigningKey` is empty or
shorter than 32 bytes (`SecurityOptions.MinJwtSigningKeyBytes`), so a weak key is
rejected before any token is issued. The misleading `SecurityOptions` XML doc was
corrected to state the requirement. Regression tests
`JwtTokenService_EmptySigningKey_ThrowsAtConstruction`,
`JwtTokenService_ShortSigningKey_ThrowsAtConstruction`, and
`JwtTokenService_AdequateSigningKey_ConstructsSuccessfully`.
### Security-004 — Search filter uses `uid=` while fallback DN construction uses `cn=`
| | |
|--|--|
| Severity | Medium |
| Category | Correctness & logic bugs |
| Status | Resolved |
| Location | `src/ZB.MOM.WW.ScadaBridge.Security/LdapAuthService.cs:66`, `:138`, `:157-159` |
**Description**
`AuthenticateAsync` and `ResolveUserDnAsync` build the search filter as
`(uid={username})`, but the no-service-account fallback in `ResolveUserDnAsync`
constructs the bind DN as `cn={username},{LdapSearchBase}`. The `SecurityOptions.LdapServiceAccountDn`
XML comment also documents the fallback as `cn={username},{LdapSearchBase}`. A directory
keyed on `uid` will succeed via search-then-bind but fail via the direct-bind fallback
(and vice versa). The attribute used for lookup is hard-coded and inconsistent across
the two code paths, so the two configuration modes are not interchangeable.
**Recommendation**
Introduce a single configurable `LdapUserIdAttribute` (default `uid`) and use it
consistently in both the search filter and the fallback DN. Update the XML doc to match.
**Resolution**
Resolved 2026-05-16 (commit `pending`). Confirmed: the search filter was hard-coded
`(uid={username})` (both in `AuthenticateAsync` and `ResolveUserDnAsync`) while the
fallback DN used `cn={username}` — the two auth modes were not interchangeable. Added
a configurable `SecurityOptions.LdapUserIdAttribute` (default `uid`) used for both the
search filter and the fallback DN via the new `BuildFallbackUserDn` helper, and
corrected the `LdapServiceAccountDn` XML doc to reference `{LdapUserIdAttribute}`.
Regression tests `BuildFallbackUserDn_UsesConfiguredUserIdAttribute`,
`BuildFallbackUserDn_HonoursNonDefaultUserIdAttribute`,
`SecurityOptions_LdapUserIdAttribute_DefaultsToUid`.
### Security-005 — DN injection in the no-service-account bind fallback
| | |
|--|--|
| Severity | Medium |
| Category | Security |
| Status | Resolved |
| Location | `src/ZB.MOM.WW.ScadaBridge.Security/LdapAuthService.cs:157-159` |
**Description**
When no service account is configured, the user-supplied `username` is interpolated
directly into a distinguished name: `$"cn={username},{LdapSearchBase}"`. `EscapeLdapFilter`
escapes *search-filter* metacharacters, but DN construction requires a different
escaping scheme (RFC 4514 — `,`, `+`, `"`, `\`, `<`, `>`, `;`, leading/trailing spaces).
No DN escaping is applied here. A username such as `victim,ou=admins` alters the DN
structure, allowing a caller to attempt a bind as a different DN than intended. Combined
with the `username.Contains('=')` shortcut at line 129 — which lets a caller supply a
full arbitrary DN — the fallback path gives the client undue control over the bind
identity.
**Recommendation**
Apply RFC 4514 DN-component escaping to `username` before interpolation, or use the
LDAP library's DN-builder API. Reconsider the `Contains('=')` shortcut — accepting a
raw DN from untrusted input is risky; restrict it or remove it.
**Resolution**
Resolved 2026-05-16 (commit `pending`). Confirmed: the fallback path interpolated
the raw `username` straight into `cn={username},{LdapSearchBase}` with no DN escaping,
and the `username.Contains('=')` shortcut let a caller supply an arbitrary bind DN.
Added an RFC 4514 `EscapeLdapDn` helper (escapes `, + " \ < > ;`, leading/trailing
space, leading `#`, NUL) applied in `BuildFallbackUserDn`, so a username such as
`victim,ou=admins` can no longer alter the DN structure. The `Contains('=')` raw-DN
shortcut was removed entirely — untrusted input no longer controls the bind identity.
Regression tests `BuildFallbackUserDn_EscapesDnMetacharacters`,
`EscapeLdapDn_EscapesAllRfc4514Specials`, `EscapeLdapDn_EscapesLeadingAndTrailingSpaces`.
### Security-006 — JWT validation disables issuer and audience checks
| | |
|--|--|
| Severity | Medium |
| Category | Security |
| Status | Resolved |
| Location | `src/ZB.MOM.WW.ScadaBridge.Security/JwtTokenService.cs:67-75`, `:56-59` |
**Description**
`ValidateToken` sets `ValidateIssuer = false` and `ValidateAudience = false`, and
`GenerateToken` never sets an `iss` or `aud`. With a shared symmetric HMAC key, any
other system or component that signs JWTs with the same key would produce tokens this
service accepts. While the design states the key is shared only between the two central
nodes, omitting issuer/audience binding removes a cheap defense-in-depth control and
makes accidental key reuse (e.g. the same secret used for another internal token)
silently exploitable.
**Recommendation**
Set a fixed `Issuer` and `Audience` (e.g. `"scadabridge-central"`) when generating tokens
and enable `ValidateIssuer`/`ValidateAudience` with the matching expected values during
validation.
**Resolution**
Resolved 2026-05-16 (commit `pending`). Confirmed: `GenerateToken` set neither `iss`
nor `aud` and `ValidateToken` had `ValidateIssuer = false`/`ValidateAudience = false`.
`GenerateToken` now binds `JwtTokenService.TokenIssuer`/`TokenAudience`
(both `"scadabridge-central"`) into every token, and `ValidateToken` enables
`ValidateIssuer`/`ValidateAudience` against those fixed values — a token signed with
the shared key but a foreign issuer is now rejected. Regression tests
`GenerateToken_SetsIssuerAndAudience`, `ValidateToken_RejectsTokenWithWrongIssuer`,
`ValidateToken_AcceptsOwnIssuerAndAudience`.
### Security-007 — Idle-timeout claim is reset on every token refresh
| | |
|--|--|
| Severity | Medium |
| Category | Correctness & logic bugs |
| Status | Resolved |
| Location | `src/ZB.MOM.WW.ScadaBridge.Security/JwtTokenService.cs:40`, `:111-123` |
**Description**
The design states the 30-minute idle timeout is tracked via a "last-activity timestamp
in the token", and `IsIdleTimedOut` reads the `LastActivity` claim. But `RefreshToken`
calls `GenerateToken`, which unconditionally writes `LastActivity = DateTimeOffset.UtcNow`.
Token refresh fires whenever a request arrives within ~5 minutes of expiry. The result
is that `LastActivity` reflects *token issuance time*, not genuine user activity — and
since refresh itself is a request, the timestamp keeps moving forward. A more subtle
consequence: the idle window is effectively measured from the last refresh, not the
last real interaction, so the documented "no requests within the idle window" semantics
are not faithfully implemented. The claim name `LastActivity` is also misleading.
**Recommendation**
Decide explicitly how activity is tracked. Either (a) carry the original `LastActivity`
forward across refreshes and update it only on real request handling in the middleware,
or (b) rename the claim to `IssuedAt`/`TokenCreated` and document that the idle window
is measured from issuance. Whichever is chosen, ensure `IsIdleTimedOut` and the refresh
path agree on the semantics.
**Resolution**
Resolved 2026-05-16 (commit `pending`). Confirmed: `RefreshToken``GenerateToken`
unconditionally wrote `LastActivity = UtcNow`, so the idle clock reset on every
refresh and the documented 30-minute idle timeout could never fire for a client that
polls in the background. Implemented option (a) — the Security-side half of the
documented "15-min sliding + 30-min idle" policy (the cross-module partner of
CentralUI-005): `GenerateToken` now takes an optional `lastActivity` anchor;
`RefreshToken` carries the **existing** `LastActivity` claim forward unchanged, and a
new explicit `RecordActivity` method advances the anchor to now — to be called by the
CentralUI request pipeline on genuine user interaction (not on a background refresh).
`IsIdleTimedOut` is unchanged and now agrees with the refresh path. The remaining
CentralUI-side wiring (calling `RecordActivity` from the middleware, setting
`SlidingExpiration`/`ExpireTimeSpan`) stays tracked under CentralUI-005; this finding's
Security-side defect — the reset-on-refresh bug — is fully fixed here. Regression tests
`RefreshToken_PreservesOriginalLastActivityClaim`,
`RefreshToken_DoesNotResetIdleTimeoutWhenUserIsActuallyIdle`,
`RecordActivity_UpdatesLastActivityToNow`.
### Security-008 — N+1 query loading site-scope rules in `RoleMapper`
| | |
|--|--|
| Severity | Low |
| Category | Performance & resource management |
| Status | Deferred |
| Location | `src/ZB.MOM.WW.ScadaBridge.Security/RoleMapper.cs:25-48` |
**Description**
`MapGroupsToRolesAsync` first calls `GetAllMappingsAsync`, then inside the per-mapping
loop calls `GetScopeRulesForMappingAsync(mapping.Id, ct)` once for every matched
Deployment mapping. This is an N+1 query pattern executed on the login hot path and on
every 15-minute token refresh. With multiple site-scoped Deployment groups it issues a
round-trip per group.
**Recommendation**
Add a repository method that loads scope rules for a set of mapping IDs in one query
(or eager-loads them with the mappings), and resolve all scope rules with a single call.
**Resolution**
Deferred 2026-05-16 (commit `pending commit`). Finding confirmed accurate: `RoleMapper.MapGroupsToRolesAsync`
issues one `GetScopeRulesForMappingAsync` round-trip per matched Deployment mapping — a
genuine N+1 on the login / 15-minute-refresh path. However, the only correct fix
(a batch `GetScopeRulesForMappingsAsync(IEnumerable<int>)` repository method, or an
eager-load navigation property) requires changes to `ISecurityRepository`
(`src/ZB.MOM.WW.ScadaBridge.Commons`) and `SecurityRepository` (`src/ZB.MOM.WW.ScadaBridge.ConfigurationDatabase`).
Both are outside the Security module's permitted edit scope for this review pass, and the
existing `ISecurityRepository` surface offers no per-set scope-rule query, so the N+1
cannot be removed from within `RoleMapper.cs` alone. Severity is Low (bounded by the
number of site-scoped Deployment groups, typically small). Deferred to a cross-module
change that adds the batch repository method; `RoleMapper` should then resolve all scope
rules in a single call.
### Security-009 — CancellationToken not honored inside `Task.Run` LDAP calls
| | |
|--|--|
| Severity | Low |
| Category | Error handling & resilience |
| Status | Resolved |
| Location | `src/ZB.MOM.WW.ScadaBridge.Security/LdapAuthService.cs:42`, `:46`, `:51`, `:56-57`, `:67-73`, `:135`, `:139-145` |
**Description**
The synchronous Novell LDAP calls are wrapped in `Task.Run(() => ..., ct)`. The `ct`
argument only prevents the work item from *starting* if cancellation is already
signaled; once a `connection.Connect`/`Bind`/`Search` call is in progress it cannot be
cancelled. A cancelled or timed-out login request will continue to occupy a thread-pool
thread and an LDAP connection until the blocking call returns on its own. There is also
no explicit network/operation timeout configured on the `LdapConnection`.
**Recommendation**
Configure `LdapConnection.ConnectionTimeout` and search/operation time limits so a
hung LDAP server cannot pin a thread indefinitely. Document that `ct` only guards
work-item scheduling, or implement a timeout-with-disconnect fallback.
**Resolution**
Resolved 2026-05-16 (commit `pending commit`). Confirmed: the synchronous Novell LDAP
calls were wrapped in `Task.Run(..., ct)` where `ct` only prevents the work item from
starting and cannot interrupt an in-progress blocking `Connect`/`Bind`/`Search`, and no
network/operation timeout was configured on the `LdapConnection`. Added a configurable
`SecurityOptions.LdapConnectionTimeoutMs` (default 10s) and a new `ApplyConnectionTimeout`
helper that sets both `LdapConnection.ConnectionTimeout` (socket connect) and
`LdapConstraints.TimeLimit` (per-operation limit) before connecting, so a hung LDAP
server can no longer pin a thread-pool thread indefinitely. The `ct`-only-guards-scheduling
limitation is now documented in the option's XML doc and an inline comment. Regression
tests `SecurityOptions_LdapConnectionTimeout_HasSaneDefault` and
`AuthenticateAsync_UnreachableHost_FailsWithinConfiguredTimeout`.
### Security-010 — Design doc contradicts itself on Windows Integrated Authentication
| | |
|--|--|
| Severity | Low |
| Category | Design-document adherence |
| Status | Resolved |
| Location | `docs/requirements/Component-Security.md:13` (vs. `:23`) |
**Description**
The Responsibilities section states the component authenticates "using Windows
Integrated Authentication", but the Authentication section (line 23) and CLAUDE.md
explicitly state **"No Windows Integrated Authentication ... authenticates directly
against LDAP/AD, not via Kerberos/NTLM"** — which is what the code actually does
(direct LDAP bind). The Responsibilities line is stale and contradicts both the rest of
the doc and the implementation.
**Recommendation**
Fix `Component-Security.md:13` to say "using a direct LDAP/Active Directory bind"
to match the implemented behavior and the rest of the document.
**Resolution**
Resolved 2026-05-16 (commit `pending commit`). Confirmed: the Responsibilities bullet at
line 13 said "using Windows Integrated Authentication", directly contradicting the
Authentication section's "No Windows Integrated Authentication ... authenticates directly
against LDAP/AD, not via Kerberos/NTLM", CLAUDE.md, and the implementation (`LdapAuthService`
performs a direct username/password bind). Documentation-only finding — no regression test
is meaningful. Reworded the Responsibilities bullet to "Authenticate users against
LDAP/Active Directory using a direct LDAP/AD bind (username/password)", matching the rest
of the document and the code.
### Security-011 — Missing tests for security-critical paths
| | |
|--|--|
| Severity | Low |
| Category | Testing coverage |
| Status | Resolved |
| Location | `tests/ZB.MOM.WW.ScadaBridge.Security.Tests/UnitTest1.cs` |
**Description**
The test suite covers happy paths well but omits several security-relevant cases:
no test exercises the StartTLS path (Security-001), the DN-injection / `Contains('=')`
fallback inputs (Security-005), JWT validation with a too-short or empty signing key
(Security-003), `IsIdleTimedOut` returning true after a token has been refreshed
(Security-007), or the `uid`/`cn` mismatch in the no-service-account path (Security-004).
The integration `SecurityHardeningTests` only asserts default option values, not
enforcement. The test file is still named `UnitTest1.cs`.
**Recommendation**
Add negative/edge-case tests for the items above, particularly key-length rejection,
DN-escaping of hostile usernames, and idle-timeout behavior across a refresh. Rename
`UnitTest1.cs` to a descriptive name.
**Resolution**
Resolved 2026-05-16 (commit `pending commit`). Re-triage: most of the listed gaps were
already closed by the Security-001..007 fixes — the regression classes
`SecurityReviewRegressionTests` (StartTLS path, JWT empty/short-key rejection) and
`SecurityReviewRegressionTests2` (DN-injection / hostile-username escaping, `uid`/`cn`
fallback consistency, idle-timeout preserved across refresh) cover them. The finding's
reference to a `SecurityHardeningTests` class is stale — no such class exists in the
suite. Remaining gaps closed here: renamed the test file `UnitTest1.cs`
`SecurityTests.cs`, and added a new `SecurityReviewRegressionTests3` class with the
LDAP-timeout coverage (Security-009) plus extra no-service-account / DN-path edge cases
(`BuildFallbackUserDn_NoSearchBase_ReturnsBareRdn`, `EscapeLdapDn_LeadingHash_IsEscaped`,
`EscapeLdapDn_NullOrEmpty_ReturnedUnchanged`). Full module suite: 54 tests, all green.
### Security-012 — Partial LDAP failure during login yields a roleless authenticated session
| | |
|--|--|
| Severity | Medium |
| Category | Error handling & resilience |
| Status | Resolved |
| Location | `src/ZB.MOM.WW.ScadaBridge.Security/LdapAuthService.cs:78-118` |
**Description**
In `AuthenticateAsync`, after the user bind succeeds, the group/attribute `Search` is
wrapped in a `try`/`catch (LdapException)` that logs a warning and continues — the
comment states "Auth succeeded even if attribute lookup failed". The method then returns
`new LdapAuthResult(true, displayName, username, groups, null)` with an **empty
`groups`** list. Because `RoleMapper.MapGroupsToRolesAsync` derives all roles from that
group list, a login during a *partial* LDAP outage (bind reachable, search/group server
unavailable, or a transient `LdapException` mid-search) produces a fully authenticated
session with **zero roles** — the user is logged in but silently stripped of all
permissions. The design's LDAP Connection Failure rule states new logins must **fail**
when LDAP is unavailable; this path instead admits the user. The caller cannot
distinguish "user genuinely belongs to no mapped groups" from "the group query failed",
so it cannot make the documented fail-the-login decision. The `while` loop's inner
`catch (LdapException) { break; }` (line 107-111) compounds this: a real error mid-enumeration
is indistinguishable from end-of-results and silently truncates the group list.
**Recommendation**
Treat a failed group/attribute lookup on the *initial login* as an authentication
failure: return `Success = false` with a "directory unavailable, try again" message, or
add an explicit flag on `LdapAuthResult` (e.g. `GroupLookupSucceeded`) so the caller can
apply the design's fail-the-login rule. The inner `while`-loop `catch` should not treat
an arbitrary `LdapException` as a benign end-of-results sentinel — only the specific
"no more results" condition should break the loop; any other exception should propagate
or be surfaced.
**Resolution**
Resolved 2026-05-17 (commit `pending`). Confirmed: the group/attribute `Search` was
wrapped in a `catch (LdapException)` that logged a warning and returned
`new LdapAuthResult(true, …, groups: [])` — a partial LDAP outage produced an
authenticated, zero-role session — and the inner `while`-loop `catch { break; }` masked
real mid-enumeration errors as end-of-results. `AuthenticateAsync` now tracks a
`groupLookupSucceeded` flag (false when any `LdapException` is thrown by the search or
`Next()`); the inner swallowing catch was removed so such errors propagate; the new
`BuildAuthResultFromGroupLookup` helper returns a FAILED `LdapAuthResult`
("directory temporarily unavailable, please try again") on lookup failure while still
treating a genuine empty-groups result as a successful login. Regression tests
`BuildAuthResultFromGroupLookup_LookupFailed_FailsTheLogin`,
`BuildAuthResultFromGroupLookup_LookupSucceededNoGroups_IsAuthenticated`,
`BuildAuthResultFromGroupLookup_LookupSucceededWithGroups_CarriesGroups`.
### Security-013 — `ExtractFirstRdnValue` mis-parses group DNs containing escaped commas
| | |
|--|--|
| Severity | Low |
| Category | Correctness & logic bugs |
| Status | Resolved |
| Location | `src/ZB.MOM.WW.ScadaBridge.Security/LdapAuthService.cs:258-269` |
**Description**
`ExtractFirstRdnValue` extracts a group name from a `memberOf` DN by taking the substring
between the first `=` and the first `,`. RFC 4514 permits a `,` inside an RDN value when
it is backslash-escaped — e.g. `cn=Acme\, Inc Operators,ou=groups,dc=example,dc=com`. The
method splits on the *escaped* comma and returns `Acme\` as the group name. `RoleMapper`
then matches that mangled name verbatim against configured `LdapGroupMapping.LdapGroupName`
values, so a group whose CN legitimately contains a comma silently fails to map to its
role — the user loses that role with no error. The same naive parsing also does not strip
the trailing backslash escape.
**Recommendation**
Parse the first RDN with an escape-aware scan: ignore a `,` preceded by an unescaped
`\`, and unescape RFC 4514 sequences in the extracted value. Alternatively use the LDAP
library's DN-parsing API (`LdapDN` / RDN accessors) rather than hand-rolled `IndexOf`.
**Resolution**
Resolved 2026-05-17 (commit `pending`). Confirmed: `ExtractFirstRdnValue` took the
substring between the first `=` and the first `,`, splitting on an RFC 4514
backslash-escaped comma so a CN like `Acme\, Inc Operators` became `Acme\`. Replaced the
naive `IndexOf` scan with an escape-aware character scan that ignores a backslash-escaped
`,`, unescapes single-character and `\XX` hex escape sequences, and only terminates the
value on an unescaped comma — so a comma-containing group CN is returned intact and
matches its configured `LdapGroupName`. Method was also made `public` for direct
testing. Regression tests `ExtractFirstRdnValue_EscapedComma_KeepsWholeGroupName`,
`ExtractFirstRdnValue_PlainDn_ReturnsFirstRdnValue`,
`ExtractFirstRdnValue_SingleRdn_ReturnsValue`,
`ExtractFirstRdnValue_EscapedSpecials_AreUnescaped`.
### Security-014 — `RefreshToken` re-issues a token without checking the idle timeout
| | |
|--|--|
| Severity | Medium |
| Category | Security |
| Status | Resolved |
| Location | `src/ZB.MOM.WW.ScadaBridge.Security/JwtTokenService.cs:156-169` |
**Description**
`RefreshToken` carries the existing `LastActivity` claim forward (correct, per the
Security-007 fix) and unconditionally issues a fresh token with a new 15-minute expiry.
It never calls `IsIdleTimedOut`. The documented 30-minute idle policy therefore depends
entirely on the CentralUI request pipeline calling `IsIdleTimedOut` *before* calling
`RefreshToken` — two independent, order-dependent checks with no enforcement that they
are used together. If a caller refreshes without first checking idle state (an easy
mistake, and there is no compiler or API signal preventing it), an idle-expired session
is silently renewed: each refresh resets the JWT expiry while `LastActivity` keeps
ageing, so the session can be kept alive indefinitely by background refreshes even
though the user has been idle for hours. `RefreshToken` is the natural place to enforce
the invariant but provides no safety net.
**Recommendation**
Have `RefreshToken` itself reject a principal that is already past the idle timeout —
return `null` (the same "cannot refresh" signal it already uses for missing claims) when
`IsIdleTimedOut(currentPrincipal)` is true — so the idle policy holds regardless of
caller discipline. Document that an idle-expired principal cannot be refreshed and add a
regression test covering refresh of a 31-minute-idle token.
**Resolution**
Resolved 2026-05-17 (commit `pending`). Confirmed: `RefreshToken` carried `LastActivity`
forward and minted a fresh 15-minute token without ever calling `IsIdleTimedOut`, so the
idle policy depended entirely on caller discipline and a background refresh could keep an
idle-expired session alive indefinitely. `RefreshToken` now calls
`IsIdleTimedOut(currentPrincipal)` after the missing-claims check and returns `null`
(its existing "cannot refresh" signal) when the session is past the idle window, so the
30-minute idle policy holds regardless of caller order; XML doc updated to state the
invariant. Regression tests `RefreshToken_IdleExpiredPrincipal_ReturnsNull`,
`RefreshToken_ActiveSessionWithinIdleWindow_StillRefreshes`,
`RefreshToken_MissingLastActivityClaim_ReturnsNull`.
### Security-015 — Username is not trimmed before use in the LDAP filter, fallback DN, and JWT claims
| | |
|--|--|
| Severity | Low |
| Category | Correctness & logic bugs |
| Status | Resolved |
| Location | `src/ZB.MOM.WW.ScadaBridge.Security/LdapAuthService.cs:20-21`, `:80`, `:122`, `:169`, `:193` |
**Description**
`AuthenticateAsync` rejects null/whitespace-only usernames via `IsNullOrWhiteSpace`, but
an otherwise-valid username with leading or trailing whitespace (`" alice"`, copy-paste
artefacts, mobile keyboards) is passed through verbatim. It flows into the LDAP search
filter `({attr}={EscapeLdapFilter(username)})`, into the fallback bind DN via
`BuildFallbackUserDn`, and — most consequentially — into the returned `LdapAuthResult`
and from there into the JWT `Username` claim. Most LDAP directories ignore surrounding
whitespace on a search term, so `" alice"` and `"alice"` may both authenticate but yield
JWT `Username` claims that differ by whitespace. Any downstream identity comparison
(audit-log `who`, site-scope lookups keyed by username, session de-duplication) then
sees two distinct identities for the same person.
**Recommendation**
Trim the username once at the top of `AuthenticateAsync` (after the `IsNullOrWhiteSpace`
guard) and use the trimmed value consistently for the filter, the DN, and the
`LdapAuthResult`. This normalises the identity that ends up in the JWT claim and audit
trail.
**Resolution**
Resolved 2026-05-17 (commit `pending`). Confirmed: an otherwise-valid username with
leading/trailing whitespace passed the `IsNullOrWhiteSpace` guard and flowed verbatim
into the LDAP filter, the fallback DN, and the returned `LdapAuthResult` (and thus the
JWT `Username` claim). Added a `NormalizeUsername` helper (trims whitespace) called once
at the top of `AuthenticateAsync` immediately after the guard; the local `username`
variable is reassigned to the trimmed value so the filter, fallback DN, and result all
use the single canonical identity. Regression tests
`NormalizeUsername_TrimsLeadingAndTrailingWhitespace`,
`BuildFallbackUserDn_TrimmedUsername_NoLeadingTrailingSpace`,
`AuthenticateAsync_UsernameWithSurroundingWhitespace_StillRejectedForInsecure`.
### Security-016 — `RoleMapper` silently drops the system-wide Deployment grant when a user is also in any site-scoped Deployment group
| | |
|--|--|
| Severity | Medium |
| Category | Correctness & logic bugs |
| Status | Resolved |
| Location | `src/ZB.MOM.WW.ScadaBridge.Security/RoleMapper.cs:30-31`, `:41-55`, `:59` |
**Description**
`MapGroupsToRolesAsync` resolves the Deployment role's site scope as a single
`isSystemWide = hasDeploymentRole && !hasDeploymentWithScopeRules` flag computed across
ALL matched Deployment mappings. If a user is a member of both a system-wide Deployment
group (e.g. `SCADA-Deploy-All`, no scope rules) AND a site-scoped Deployment group
(e.g. `SCADA-Deploy-SiteA`, one scope rule for Site A), the second mapping sets
`hasDeploymentWithScopeRules = true`, so the final `isSystemWide` becomes `false` and
the returned `PermittedSiteIds` is just `[SiteA]`. The system-wide grant from
`SCADA-Deploy-All` is silently dropped — the user loses access to every other site, even
though one of their LDAP groups was intended to grant them system-wide reach. This
contradicts the design's "A user can hold multiple roles simultaneously … roles are
independent — there is no implied hierarchy" intent: the union of grants should be the
broadest grant in the set, not the narrowest. The mistake is also non-obvious to an
operator: from the Admin → LDAP Mappings page nothing flags that adding a site-scoped
Deployment mapping for a user already in `SCADA-Deploy-All` *removes* sites from their
effective grant. The downstream `SiteScopeService.IsSystemWideAsync` / `FilterSitesAsync`
faithfully reproduce this narrowing, so the user can no longer see or act on sites
outside `[SiteA]`.
**Recommendation**
Track the union semantics explicitly: if any matched Deployment mapping has no scope
rules, the user is system-wide regardless of what other mappings have. The simplest
change is to set `hasDeploymentWithScopeRules` only when the mapping has scope rules
AND another flag `hasUnscopedDeploymentMapping` is false; then compute
`isSystemWide = hasUnscopedDeploymentMapping || (hasDeploymentRole && !hasDeploymentWithScopeRules)`.
Equivalently: collect per-mapping `(hasRules, scopedSiteIds)` first, then
`isSystemWide = any mapping has hasRules==false`, and `permittedSiteIds = union of all
scopedSiteIds` (left empty for system-wide users). Add a regression test
`MapGroupsToRoles_UserInBothSystemWideAndScopedDeploymentGroup_IsSystemWide` covering
the design's example pair `SCADA-Deploy-All` + `SCADA-Deploy-SiteA`.
**Resolution**
Resolved 2026-05-28 (commit pending). `RoleMapper.MapGroupsToRolesAsync` now tracks two
independent flags per matched Deployment mapping: `hasUnscopedDeploymentMapping` (any
matched mapping with no scope rules) and `hasScopedDeploymentMapping` (any matched
mapping with scope rules). `isSystemWide` is `hasUnscopedDeploymentMapping ||
(hasDeploymentRole && !hasScopedDeploymentMapping)` — so a user in both
`SCADA-Deploy-All` and `SCADA-Deploy-SiteA` is now correctly system-wide, with the
accumulated scope ids cleared. Magic string `"Deployment"` was replaced with the new
`Roles.Deployment` constant (Security-018). Regression test
`MapGroupsToRoles_UserInBothSystemWideAndScopedDeploymentGroup_IsSystemWide`
(seeds Site A, attaches a scope rule to the seeded `SCADA-Deploy-SiteA` mapping, and
asserts a user mapped via both `SCADA-Deploy-All` and `SCADA-Deploy-SiteA` resolves
to system-wide with empty `PermittedSiteIds`) fails on the pre-fix code and passes
after.
### Security-017 — `SiteScopeRequirement` / `SiteScopeAuthorizationHandler` are dead code from production callers — `[Authorize(Policy = RequireDeployment)]` does NOT enforce site scoping
| | |
|--|--|
| Severity | Medium |
| Category | Design-document adherence |
| Status | Resolved |
| Location | `src/ZB.MOM.WW.ScadaBridge.Security/SiteScopeAuthorizationHandler.cs:8-58`; `src/ZB.MOM.WW.ScadaBridge.Security/AuthorizationPolicies.cs:113-143` |
**Description**
The module declares `SiteScopeRequirement` (an `IAuthorizationRequirement` carrying a
`TargetSiteId`) and the matching `SiteScopeAuthorizationHandler` that combines the
Deployment role claim with the `SiteId` claims to enforce the design's site-scoping
rule. The handler is registered in `AddScadaBridgeAuthorization`
(`services.AddSingleton<IAuthorizationHandler, SiteScopeAuthorizationHandler>()`). But
no `AddPolicy` call ever wires the requirement to a named policy, and a grep across
`src/ZB.MOM.WW.ScadaBridge.CentralUI` and `src/ZB.MOM.WW.ScadaBridge.ManagementService` confirms that **no
production code ever instantiates `new SiteScopeRequirement(...)` or calls
`AuthorizeAsync(...)` with one** — the only callers are the unit tests in
`SecurityTests.cs:1146,1166,1185,1203`. The design + CLAUDE.md state that "Deployment
and Monitoring pages must filter every site/instance list through `FilterSitesAsync`
and re-check `IsSiteAllowedAsync` before any cross-site command", and the
CentralUI-028 finding (High, Open) confirms this is exactly the contract two new
report pages forgot — because there is no declarative `[Authorize(Policy = ...)]`
shortcut, callers must remember to inject `SiteScopeService` and write the check by
hand, and any new page that forgets is a silent regression with no compile-time or
test-time signal. The module's published surface advertises an authorization-handler
pattern that is, in practice, unwired plumbing.
**Recommendation**
Either (a) **delete** `SiteScopeRequirement` and `SiteScopeAuthorizationHandler` (and
the dead `IAuthorizationHandler` registration) and document `SiteScopeService` as the
sole site-scoping mechanism — this is the smaller change and matches what the codebase
actually does today; or, preferably, (b) **finish the wiring**: add a `RequireSiteScope`
policy that uses `SiteScopeRequirement` and provide a small helper / source generator
or analyzer that flags Deployment-policy-attributed pages without a site-scope check.
Either way, address the cross-module gap: CentralUI-028 stays open until production
pages reliably enforce the rule. If (b) is chosen, a route-parameter-aware
`IAuthorizationPolicyProvider` is needed so the policy can read the target site id from
the request — that is a meaningful design extension and would need to be planned
alongside the Central UI's existing `SiteScopeService` usage rather than replacing it
piecemeal.
**Resolution**
Resolved 2026-05-28 (commit pending) per recommendation option (a): deleted
`SiteScopeRequirement` and `SiteScopeAuthorizationHandler` outright, along with the
unwired `services.AddSingleton<IAuthorizationHandler, SiteScopeAuthorizationHandler>()`
registration in `AuthorizationPolicies.AddScadaBridgeAuthorization` and the four
isolation tests in `SecurityTests.cs`. `SiteScopeService` (CentralUI-002) is now
documented as the sole site-scoping mechanism — the stale "mirrors
SiteScopeAuthorizationHandler" comment in `SiteScopeService.ResolveAsync` was rewritten
to say so. The cross-module partner CentralUI-028 is fixed in the same batch (the two
new report pages now consume `SiteScopeService`).
### Security-018 — Role names are hard-coded magic strings duplicated across `RoleMapper`, `SiteScopeAuthorizationHandler`, and `AuthorizationPolicies`
| | |
|--|--|
| Severity | Low |
| Category | Code organization & conventions |
| Status | Resolved |
| Location | `src/ZB.MOM.WW.ScadaBridge.Security/RoleMapper.cs:41`; `src/ZB.MOM.WW.ScadaBridge.Security/SiteScopeAuthorizationHandler.cs:36`; `src/ZB.MOM.WW.ScadaBridge.Security/AuthorizationPolicies.cs:118,121,124,95,107` |
**Description**
The role-name literals `"Admin"`, `"Design"`, `"Deployment"`, `"Audit"`, and
`"AuditReadOnly"` are duplicated as magic strings across three separate files:
`RoleMapper.cs:41` hard-codes `"Deployment"` to detect the site-scope branch;
`SiteScopeAuthorizationHandler.cs:36` independently hard-codes `"Deployment"` to gate
the handler; and `AuthorizationPolicies.cs:118,121,124` hard-code the four role names
as the policy `RequireClaim` values. Only the audit roles have a single source of truth
(via the `OperationalAuditRoles` / `AuditExportRoles` arrays on
`AuthorizationPolicies`). A future rename or addition of a role that misses any one of
these call sites silently breaks the system: e.g. renaming "Deployment" → "Deployer"
in `RoleMapper` alone would leave the policy still requiring `"Deployment"` (logins
get the new role name but the policy never matches), while changing it in the policy
alone would leave `RoleMapper` failing to populate scope rules for the renamed role.
The bug class is "string drift" — exactly the kind the `OperationalAuditRoles` constant
was introduced to prevent.
**Recommendation**
Introduce a `public static class Roles { public const string Admin = "Admin"; public const
string Design = "Design"; public const string Deployment = "Deployment"; public const string
Audit = "Audit"; public const string AuditReadOnly = "AuditReadOnly"; }` in the Security
project and replace every magic-string occurrence — including the elements of
`OperationalAuditRoles` and `AuditExportRoles` — with the constants. A single rename
will then either succeed everywhere or fail to compile.
**Resolution**
Resolved 2026-05-28 (commit pending). Added `src/ZB.MOM.WW.ScadaBridge.Security/Roles.cs` holding
`Admin`/`Design`/`Deployment`/`Audit`/`AuditReadOnly` as `public const string`
fields. Replaced every magic-string occurrence in this module:
`RoleMapper.MapGroupsToRolesAsync` now compares against `Roles.Deployment`;
`AuthorizationPolicies.AddScadaBridgeAuthorization` binds `RequireClaim(...)` to
`Roles.{Admin,Design,Deployment}`; `OperationalAuditRoles` /
`AuditExportRoles` are now built from `Roles.Admin`, `Roles.Audit`, `Roles.AuditReadOnly`.
`SiteScopeAuthorizationHandler.cs` was deleted under Security-017, so its
`"Deployment"` literal is gone with it. A future rename now propagates by a
single edit or fails to compile.
### Security-019 — Service-account rebind failure is reported as "Invalid username or password" — masks misconfiguration as a user-credential error
| | |
|--|--|
| Severity | Low |
| Category | Error handling & resilience |
| Status | Resolved |
| Location | `src/ZB.MOM.WW.ScadaBridge.Security/LdapAuthService.cs:85-89`, `:147-151` |
**Description**
After the user's credentials bind successfully, `AuthenticateAsync` re-binds as the
configured service account to perform the group/attribute search
(`connection.Bind(_options.LdapServiceAccountDn, _options.LdapServiceAccountPassword)`).
A failure of this second bind — wrong service-account password, deleted/disabled
service-account, locked-out service-account — throws `LdapException` which is caught by
the broad outer `catch (LdapException)` and returned as
`new LdapAuthResult(false, null, username, null, "Invalid username or password.")`.
The user sees an "invalid credentials" message for *their* credentials even though
their bind succeeded and the failure was in the system's own service-account
configuration. Worse, every user attempting to log in sees the same incorrect message
during a service-account outage, which routes operators down the wrong incident path
(reset the user's password) instead of the right one (check the service-account
credentials). The successful user bind itself is also not auditable as a discrete
event because the result is "Invalid username or password" — indistinguishable from a
genuine bad-password attempt.
**Recommendation**
Wrap the service-account rebind in its own `try`/`catch (LdapException)` and surface a
distinct error: log `_logger.LogError(ex, "Service-account rebind failed; check
LdapServiceAccountDn / LdapServiceAccountPassword configuration")` and return
`new LdapAuthResult(false, null, username, null, "Authentication service is misconfigured. Contact an administrator.")`.
Add a regression test that exercises the service-account-bind failure path (a mocked
or seamed `LdapConnection.Bind` that throws on the second call) and asserts the
distinct error message.
**Resolution**
Resolved 2026-05-28 (commit pending). Added a new `ServiceAccountBindException`
(`src/ZB.MOM.WW.ScadaBridge.Security/ServiceAccountBindException.cs`) — deliberately NOT an
`LdapException` subtype so it short-circuits the generic LDAP catch — and a private
`BindServiceAccountAsync` helper on `LdapAuthService` that wraps both service-account
rebind sites (the post-user-bind group-lookup rebind AND the `ResolveUserDnAsync`
DN-search rebind). On `LdapException`, the helper logs Error
("Service-account rebind failed; check LdapServiceAccountDn /
LdapServiceAccountPassword configuration") and rethrows as `ServiceAccountBindException`,
which the outer `AuthenticateAsync` catch chain maps to the distinct user-facing message
"Authentication service is misconfigured. Contact an administrator." Service-account
faults no longer surface as "Invalid username or password". Regression test
`ServiceAccountBindException_DoesNotInheritLdapException_SoCatchOrderIsCorrect` pins the
load-bearing inheritance contract; a full end-to-end auth test that exercises the
service-account-bind failure path is not feasible without an LDAP seam in
`LdapAuthService` (the `LdapConnection` is constructed in-method), so the structural test
is the closest meaningful unit-level coverage.
### Security-020 — `SecurityOptions` has no startup validation for required fields (`LdapServer`, `LdapSearchBase`)
| | |
|--|--|
| Severity | Low |
| Category | Error handling & resilience |
| Status | Resolved |
| Location | `src/ZB.MOM.WW.ScadaBridge.Security/SecurityOptions.cs:6-7`, `:36-37`; `src/ZB.MOM.WW.ScadaBridge.Security/ServiceCollectionExtensions.cs:13-30` |
**Resolution (2026-05-28):** added `SecurityOptionsValidator`
(`IValidateOptions<SecurityOptions>`) that rejects empty/whitespace
`LdapServer` and `LdapSearchBase` with messages naming the full
`Security:Field` key the operator would edit. `AddSecurity` registers it via
`services.AddOptions<SecurityOptions>().ValidateOnStart()` +
`TryAddEnumerable(... SecurityOptionsValidator)` so a misconfigured
`Security` section fails fast at boot rather than minutes later on the first
login. `JwtSigningKey` is deliberately left to `JwtTokenService`'s existing
length-aware constructor guard (Security-003). Regression tests in
`SecurityOptionsValidatorTests`: valid-options succeed; empty/whitespace
`LdapServer` and `LdapSearchBase` each fail with the key-naming message
(theory); both-empty reports both keys; `AddSecurity_RegistersSecurityOptionsValidator`
pins the DI wiring.
**Description**
`SecurityOptions.JwtSigningKey` correctly fails fast at `JwtTokenService` construction
(Security-003 fix), but the LDAP-side required fields — `LdapServer` (default
`string.Empty`) and `LdapSearchBase` (default `string.Empty`) — have no equivalent
guard. `AddSecurity` does not register an `IValidateOptions<SecurityOptions>`. A
deployment that fails to set `LdapServer` (a typo in the appsettings.json section name,
a missing environment-variable substitution, a misconfigured Docker compose file)
starts cleanly — the Central UI comes up, the login page loads, and only the first
authentication attempt fails with `LdapConnection.Connect("")` throwing a low-level
exception that bubbles up as the generic "An unexpected error occurred during
authentication." message. The misconfiguration surfaces minutes or hours into the
deploy, on the first real user login, rather than at startup where it is cheap to
diagnose.
**Recommendation**
Add an `IValidateOptions<SecurityOptions>` registered via
`services.AddOptions<SecurityOptions>().ValidateOnStart()` that fails when
`LdapServer` is null/whitespace, `LdapSearchBase` is null/whitespace, or
`LdapPort <= 0`. Combine with the existing `JwtTokenService` constructor check so
every required `SecurityOptions` field is enforced at startup, not at first use.
### Security-021 — `RequireHttpsCookie=false` dev opt-out has no warning path — an HTTP production deployment silently transmits the JWT bearer credential in cleartext
| | |
|--|--|
| Severity | Low |
| Category | Security |
| Status | Resolved |
| Location | `src/ZB.MOM.WW.ScadaBridge.Security/SecurityOptions.cs:100-108`; `src/ZB.MOM.WW.ScadaBridge.Security/ServiceCollectionExtensions.cs:54-59` |
**Resolution (2026-05-28):** Added `ILoggerFactory` to the cookie-options
`Configure` callback in `AddSecurity` so an explicit `RequireHttpsCookie=false`
opt-out now emits a startup `LogWarning` ("auth cookie SecurePolicy is
SameAsRequest. The cookie-embedded JWT will be transmitted in cleartext
over plain HTTP. This setting is intended for local dev only — set
SecurityOptions:RequireHttpsCookie=true in production."). Default is still
`true`, so production deployments unchanged. The "fail startup when
RequireHttpsCookie=false AND ASPNETCORE_ENVIRONMENT=Production" hard-stop
option was not implemented (the dev Docker cluster intentionally runs with
the flag false, and the env-var name varies across deploy mechanisms);
the warning is the right ergonomic floor.
**Description**
The Security-002 fix added `RequireHttpsCookie` (default `true`) so the auth cookie's
`SecurePolicy` is `Always` in production. The current Docker dev cluster sets
`RequireHttpsCookie=false` in both central nodes' `appsettings.Central.json`, downgrading
to `SameAsRequest` so the local HTTP cluster works. The downgrade is documented in the
XML doc but is silent at runtime: no log line warns that the cookie carrying the JWT
bearer credential is being sent over an HTTP-only path. A production deployment that
inherits a dev-derived appsettings — or that copy-pastes the docker config and forgets
to flip the flag — transmits the session token in cleartext with no diagnostic signal.
The default is correct; the gap is that the unsafe override has no operational guard.
**Recommendation**
In the `PostConfigure` block in `AddSecurity`, when `RequireHttpsCookie == false`, log
a single startup warning along the lines of `_logger.LogWarning("RequireHttpsCookie is
DISABLED — auth cookie SecurePolicy is SameAsRequest. The cookie-embedded JWT will be
transmitted over plain HTTP. This setting is intended for local dev only — set
SecurityOptions:RequireHttpsCookie=true in production.")`. Optionally, also fail
startup when `RequireHttpsCookie=false` AND `ASPNETCORE_ENVIRONMENT=Production`. Add a
regression test that asserts the warning is emitted when the flag is disabled and not
when it is enabled.