# Code Review — Security | Field | Value | |-------|-------| | Module | `src/ScadaLink.Security` | | Design doc | `docs/requirements/Component-Security.md` | | Status | Reviewed | | Last reviewed | 2026-05-16 | | Reviewer | claude-agent | | Commit reviewed | `9c60592` | | Open findings | 0 (1 deferred — Security-008) | ## Summary The Security module is small and reasonably structured: a stateless `LdapAuthService` for search-then-bind authentication, a `JwtTokenService` for HMAC-signed cookie tokens, a `RoleMapper` that resolves LDAP groups to roles, and ASP.NET Core authorization policies plus a site-scope handler. Unit-test coverage of the happy paths is decent. However, the review surfaced several real security weaknesses, the most serious being that **StartTLS is dead code** (the design's "LDAPS or StartTLS" requirement is only half met), that **the authentication cookie is not marked `Secure`** despite the design mandating it, and that **the JWT signing key is never length-validated** so a weak or empty key is silently accepted. There is also a genuine **DN-injection** gap in the no-service-account fallback path, a filter/DN attribute mismatch (`uid=` vs `cn=`) that makes that fallback path internally inconsistent, and an N+1 query in `RoleMapper`. JWT validation also disables issuer/audience checks and the idle-timeout claim is reset on every refresh, weakening the documented 30-minute idle policy. None of these are crash/data-loss bugs, but the TLS, cookie, and key-validation items are security defects that should be fixed before any production deployment. ## Checklist coverage | # | Category | Examined | Notes | |---|----------|----------|-------| | 1 | Correctness & logic bugs | ☑ | `uid=`/`cn=` attribute mismatch between search filter and fallback DN construction (Security-004); StartTLS branch is unreachable (Security-001). | | 2 | Akka.NET conventions | ☑ | No actors in this module — `AddSecurityActors` is an empty placeholder. Nothing to assess. | | 3 | Concurrency & thread safety | ☑ | Services are stateless and DI-scoped; LDAP sync calls wrapped in `Task.Run`. No shared mutable state. No issues found. | | 4 | Error handling & resilience | ☑ | LDAP failure paths return structured `LdapAuthResult`; group-lookup failure is tolerated per design. `ct` not honored inside `Task.Run` bodies (Security-009). | | 5 | Security | ☑ | StartTLS dead code (Security-001), cookie not `Secure` (Security-002), JWT key unvalidated (Security-003), DN injection (Security-005), no issuer/audience validation (Security-006), idle-timeout reset on refresh (Security-007). | | 6 | Performance & resource management | ☑ | N+1 scope-rule query in `RoleMapper` (Security-008). `LdapConnection` correctly disposed via `using`. | | 7 | Design-document adherence | ☑ | StartTLS unsupported and Secure cookie missing both contradict the design doc; design also says "Windows Integrated Authentication" in Responsibilities, contradicting its own Authentication section (Security-010). | | 8 | Code organization & conventions | ☑ | `SecurityOptions` correctly owned by the component; repository interface in Commons. No issues found. | | 9 | Testing coverage | ☑ | No tests for `RoleMapper` N+1 behavior, DN-injection inputs, StartTLS path, or idle-timeout-after-refresh. Insecure-config combinations under-tested (Security-011). | | 10 | Documentation & comments | ☑ | `SecurityOptions` XML docs say direct bind uses `cn={username}` while the search filter uses `uid=` — comment is misleading (covered under Security-004). | ## Findings ### Security-001 — StartTLS upgrade path is unreachable dead code | | | |--|--| | Severity | High | | Category | Security | | Status | Resolved | | Location | `src/ScadaLink.Security/LdapAuthService.cs:37-47` | **Description** When `LdapUseTls` is true the code sets `connection.SecureSocketLayer = true` (LDAPS). The subsequent StartTLS block is guarded by `if (_options.LdapUseTls && !connection.SecureSocketLayer)`. Because `SecureSocketLayer` was just set to `true`, the second condition `!connection.SecureSocketLayer` is always false, so `connection.StartTls()` is never called. The design doc explicitly states LDAP connections must use **"LDAPS (port 636) or StartTLS"** — StartTLS is in practice unsupported. A deployment that intends to use StartTLS on port 389 would get a plaintext LDAPS-mode connection attempt that fails, or worse, an operator may disable TLS entirely to make it work, sending credentials in cleartext. **Recommendation** Introduce an explicit transport mode (e.g. `LdapTransport { Ldaps, StartTls, None }`) or a separate `LdapUseStartTls` flag. For StartTLS, leave `SecureSocketLayer` false, call `connection.Connect`, then call `connection.StartTls()` and verify the negotiated session is encrypted before binding. Remove the unreachable conditional. **Resolution** Resolved 2026-05-16 (commit ``). Added an explicit `LdapTransport` enum (`Ldaps`/`StartTls`/`None`); `SecureSocketLayer` is set only for LDAPS, and the StartTLS branch now connects in plaintext, calls `StartTls()`, and verifies `connection.Tls` before binding. `LdapUseTls` is retained as a compatibility shim mapping onto the enum. Regression tests `AuthenticateAsync_StartTlsTransport_AttemptsConnection` and `AuthenticateAsync_NoTlsTransport_RejectedWithoutAllowInsecure`. ### Security-002 — Authentication cookie is not marked `Secure` | | | |--|--| | Severity | High | | Category | Security | | Status | Resolved | | Location | `src/ScadaLink.Security/ServiceCollectionExtensions.cs:16-23` | **Description** `AddCookie` sets `HttpOnly = true` and `SameSite = Strict` but never sets `options.Cookie.SecurePolicy`. The ASP.NET Core default is `CookieSecurePolicy.SameAsRequest`, which permits the cookie (carrying the embedded JWT — a bearer credential) to be sent over plain HTTP. The design doc states the cookie is **"HttpOnly and Secure (requires HTTPS)"**. As written, the module does not enforce that requirement; a misconfigured or HTTP-fronted deployment would transmit the session token in cleartext. **Recommendation** Set `options.Cookie.SecurePolicy = CookieSecurePolicy.Always` in `AddCookie`. Consider also setting `ExpireTimeSpan` and `SlidingExpiration` to align the cookie lifetime with the documented 15-minute JWT / 30-minute idle policy. **Resolution** Resolved 2026-05-16 (commit ``). Confirmed the cookie is configured in this module (`ServiceCollectionExtensions.AddSecurity`), not CentralUI — no misattribution. Added `options.Cookie.SecurePolicy = CookieSecurePolicy.Always` so the JWT-bearing cookie is never sent over plain HTTP. Regression test `AddSecurity_AuthCookie_IsMarkedSecureAlways`. (`ExpireTimeSpan`/`SlidingExpiration` tuning left as a separate, lower-priority improvement.) ### Security-003 — JWT signing key length is never validated | | | |--|--| | Severity | High | | Category | Security | | Status | Resolved | | Location | `src/ScadaLink.Security/JwtTokenService.cs:33`, `src/ScadaLink.Security/SecurityOptions.cs:42` | **Description** `SecurityOptions.JwtSigningKey` defaults to `string.Empty` and is fed directly into `new SymmetricSecurityKey(Encoding.UTF8.GetBytes(_options.JwtSigningKey))` with no validation. HMAC-SHA256 requires a key of at least 256 bits (32 bytes); a short or empty key produces a trivially forgeable token. The `SecurityHardeningTests` comment claims a minimum length is "enforced", but no code in this module enforces it — the test only asserts that a 32+ char key works. A deployment with a missing or short `JwtSigningKey` would start successfully and issue weakly-signed tokens. **Recommendation** Validate `JwtSigningKey` at startup — fail fast if it is empty or shorter than 32 bytes. Use an `IValidateOptions` validator or guard in the `JwtTokenService` constructor so a weak key is rejected before any token is issued. **Resolution** Resolved 2026-05-16 (commit ``). The `JwtTokenService` constructor now fails fast with an `InvalidOperationException` when `JwtSigningKey` is empty or shorter than 32 bytes (`SecurityOptions.MinJwtSigningKeyBytes`), so a weak key is rejected before any token is issued. The misleading `SecurityOptions` XML doc was corrected to state the requirement. Regression tests `JwtTokenService_EmptySigningKey_ThrowsAtConstruction`, `JwtTokenService_ShortSigningKey_ThrowsAtConstruction`, and `JwtTokenService_AdequateSigningKey_ConstructsSuccessfully`. ### Security-004 — Search filter uses `uid=` while fallback DN construction uses `cn=` | | | |--|--| | Severity | Medium | | Category | Correctness & logic bugs | | Status | Resolved | | Location | `src/ScadaLink.Security/LdapAuthService.cs:66`, `:138`, `:157-159` | **Description** `AuthenticateAsync` and `ResolveUserDnAsync` build the search filter as `(uid={username})`, but the no-service-account fallback in `ResolveUserDnAsync` constructs the bind DN as `cn={username},{LdapSearchBase}`. The `SecurityOptions.LdapServiceAccountDn` XML comment also documents the fallback as `cn={username},{LdapSearchBase}`. A directory keyed on `uid` will succeed via search-then-bind but fail via the direct-bind fallback (and vice versa). The attribute used for lookup is hard-coded and inconsistent across the two code paths, so the two configuration modes are not interchangeable. **Recommendation** Introduce a single configurable `LdapUserIdAttribute` (default `uid`) and use it consistently in both the search filter and the fallback DN. Update the XML doc to match. **Resolution** Resolved 2026-05-16 (commit `pending`). Confirmed: the search filter was hard-coded `(uid={username})` (both in `AuthenticateAsync` and `ResolveUserDnAsync`) while the fallback DN used `cn={username}` — the two auth modes were not interchangeable. Added a configurable `SecurityOptions.LdapUserIdAttribute` (default `uid`) used for both the search filter and the fallback DN via the new `BuildFallbackUserDn` helper, and corrected the `LdapServiceAccountDn` XML doc to reference `{LdapUserIdAttribute}`. Regression tests `BuildFallbackUserDn_UsesConfiguredUserIdAttribute`, `BuildFallbackUserDn_HonoursNonDefaultUserIdAttribute`, `SecurityOptions_LdapUserIdAttribute_DefaultsToUid`. ### Security-005 — DN injection in the no-service-account bind fallback | | | |--|--| | Severity | Medium | | Category | Security | | Status | Resolved | | Location | `src/ScadaLink.Security/LdapAuthService.cs:157-159` | **Description** When no service account is configured, the user-supplied `username` is interpolated directly into a distinguished name: `$"cn={username},{LdapSearchBase}"`. `EscapeLdapFilter` escapes *search-filter* metacharacters, but DN construction requires a different escaping scheme (RFC 4514 — `,`, `+`, `"`, `\`, `<`, `>`, `;`, leading/trailing spaces). No DN escaping is applied here. A username such as `victim,ou=admins` alters the DN structure, allowing a caller to attempt a bind as a different DN than intended. Combined with the `username.Contains('=')` shortcut at line 129 — which lets a caller supply a full arbitrary DN — the fallback path gives the client undue control over the bind identity. **Recommendation** Apply RFC 4514 DN-component escaping to `username` before interpolation, or use the LDAP library's DN-builder API. Reconsider the `Contains('=')` shortcut — accepting a raw DN from untrusted input is risky; restrict it or remove it. **Resolution** Resolved 2026-05-16 (commit `pending`). Confirmed: the fallback path interpolated the raw `username` straight into `cn={username},{LdapSearchBase}` with no DN escaping, and the `username.Contains('=')` shortcut let a caller supply an arbitrary bind DN. Added an RFC 4514 `EscapeLdapDn` helper (escapes `, + " \ < > ;`, leading/trailing space, leading `#`, NUL) applied in `BuildFallbackUserDn`, so a username such as `victim,ou=admins` can no longer alter the DN structure. The `Contains('=')` raw-DN shortcut was removed entirely — untrusted input no longer controls the bind identity. Regression tests `BuildFallbackUserDn_EscapesDnMetacharacters`, `EscapeLdapDn_EscapesAllRfc4514Specials`, `EscapeLdapDn_EscapesLeadingAndTrailingSpaces`. ### Security-006 — JWT validation disables issuer and audience checks | | | |--|--| | Severity | Medium | | Category | Security | | Status | Resolved | | Location | `src/ScadaLink.Security/JwtTokenService.cs:67-75`, `:56-59` | **Description** `ValidateToken` sets `ValidateIssuer = false` and `ValidateAudience = false`, and `GenerateToken` never sets an `iss` or `aud`. With a shared symmetric HMAC key, any other system or component that signs JWTs with the same key would produce tokens this service accepts. While the design states the key is shared only between the two central nodes, omitting issuer/audience binding removes a cheap defense-in-depth control and makes accidental key reuse (e.g. the same secret used for another internal token) silently exploitable. **Recommendation** Set a fixed `Issuer` and `Audience` (e.g. `"scadalink-central"`) when generating tokens and enable `ValidateIssuer`/`ValidateAudience` with the matching expected values during validation. **Resolution** Resolved 2026-05-16 (commit `pending`). Confirmed: `GenerateToken` set neither `iss` nor `aud` and `ValidateToken` had `ValidateIssuer = false`/`ValidateAudience = false`. `GenerateToken` now binds `JwtTokenService.TokenIssuer`/`TokenAudience` (both `"scadalink-central"`) into every token, and `ValidateToken` enables `ValidateIssuer`/`ValidateAudience` against those fixed values — a token signed with the shared key but a foreign issuer is now rejected. Regression tests `GenerateToken_SetsIssuerAndAudience`, `ValidateToken_RejectsTokenWithWrongIssuer`, `ValidateToken_AcceptsOwnIssuerAndAudience`. ### Security-007 — Idle-timeout claim is reset on every token refresh | | | |--|--| | Severity | Medium | | Category | Correctness & logic bugs | | Status | Resolved | | Location | `src/ScadaLink.Security/JwtTokenService.cs:40`, `:111-123` | **Description** The design states the 30-minute idle timeout is tracked via a "last-activity timestamp in the token", and `IsIdleTimedOut` reads the `LastActivity` claim. But `RefreshToken` calls `GenerateToken`, which unconditionally writes `LastActivity = DateTimeOffset.UtcNow`. Token refresh fires whenever a request arrives within ~5 minutes of expiry. The result is that `LastActivity` reflects *token issuance time*, not genuine user activity — and since refresh itself is a request, the timestamp keeps moving forward. A more subtle consequence: the idle window is effectively measured from the last refresh, not the last real interaction, so the documented "no requests within the idle window" semantics are not faithfully implemented. The claim name `LastActivity` is also misleading. **Recommendation** Decide explicitly how activity is tracked. Either (a) carry the original `LastActivity` forward across refreshes and update it only on real request handling in the middleware, or (b) rename the claim to `IssuedAt`/`TokenCreated` and document that the idle window is measured from issuance. Whichever is chosen, ensure `IsIdleTimedOut` and the refresh path agree on the semantics. **Resolution** Resolved 2026-05-16 (commit `pending`). Confirmed: `RefreshToken` → `GenerateToken` unconditionally wrote `LastActivity = UtcNow`, so the idle clock reset on every refresh and the documented 30-minute idle timeout could never fire for a client that polls in the background. Implemented option (a) — the Security-side half of the documented "15-min sliding + 30-min idle" policy (the cross-module partner of CentralUI-005): `GenerateToken` now takes an optional `lastActivity` anchor; `RefreshToken` carries the **existing** `LastActivity` claim forward unchanged, and a new explicit `RecordActivity` method advances the anchor to now — to be called by the CentralUI request pipeline on genuine user interaction (not on a background refresh). `IsIdleTimedOut` is unchanged and now agrees with the refresh path. The remaining CentralUI-side wiring (calling `RecordActivity` from the middleware, setting `SlidingExpiration`/`ExpireTimeSpan`) stays tracked under CentralUI-005; this finding's Security-side defect — the reset-on-refresh bug — is fully fixed here. Regression tests `RefreshToken_PreservesOriginalLastActivityClaim`, `RefreshToken_DoesNotResetIdleTimeoutWhenUserIsActuallyIdle`, `RecordActivity_UpdatesLastActivityToNow`. ### Security-008 — N+1 query loading site-scope rules in `RoleMapper` | | | |--|--| | Severity | Low | | Category | Performance & resource management | | Status | Deferred | | Location | `src/ScadaLink.Security/RoleMapper.cs:25-48` | **Description** `MapGroupsToRolesAsync` first calls `GetAllMappingsAsync`, then inside the per-mapping loop calls `GetScopeRulesForMappingAsync(mapping.Id, ct)` once for every matched Deployment mapping. This is an N+1 query pattern executed on the login hot path and on every 15-minute token refresh. With multiple site-scoped Deployment groups it issues a round-trip per group. **Recommendation** Add a repository method that loads scope rules for a set of mapping IDs in one query (or eager-loads them with the mappings), and resolve all scope rules with a single call. **Resolution** Deferred 2026-05-16 (commit `pending commit`). Finding confirmed accurate: `RoleMapper.MapGroupsToRolesAsync` issues one `GetScopeRulesForMappingAsync` round-trip per matched Deployment mapping — a genuine N+1 on the login / 15-minute-refresh path. However, the only correct fix (a batch `GetScopeRulesForMappingsAsync(IEnumerable)` repository method, or an eager-load navigation property) requires changes to `ISecurityRepository` (`src/ScadaLink.Commons`) and `SecurityRepository` (`src/ScadaLink.ConfigurationDatabase`). Both are outside the Security module's permitted edit scope for this review pass, and the existing `ISecurityRepository` surface offers no per-set scope-rule query, so the N+1 cannot be removed from within `RoleMapper.cs` alone. Severity is Low (bounded by the number of site-scoped Deployment groups, typically small). Deferred to a cross-module change that adds the batch repository method; `RoleMapper` should then resolve all scope rules in a single call. ### Security-009 — CancellationToken not honored inside `Task.Run` LDAP calls | | | |--|--| | Severity | Low | | Category | Error handling & resilience | | Status | Resolved | | Location | `src/ScadaLink.Security/LdapAuthService.cs:42`, `:46`, `:51`, `:56-57`, `:67-73`, `:135`, `:139-145` | **Description** The synchronous Novell LDAP calls are wrapped in `Task.Run(() => ..., ct)`. The `ct` argument only prevents the work item from *starting* if cancellation is already signaled; once a `connection.Connect`/`Bind`/`Search` call is in progress it cannot be cancelled. A cancelled or timed-out login request will continue to occupy a thread-pool thread and an LDAP connection until the blocking call returns on its own. There is also no explicit network/operation timeout configured on the `LdapConnection`. **Recommendation** Configure `LdapConnection.ConnectionTimeout` and search/operation time limits so a hung LDAP server cannot pin a thread indefinitely. Document that `ct` only guards work-item scheduling, or implement a timeout-with-disconnect fallback. **Resolution** Resolved 2026-05-16 (commit `pending commit`). Confirmed: the synchronous Novell LDAP calls were wrapped in `Task.Run(..., ct)` where `ct` only prevents the work item from starting and cannot interrupt an in-progress blocking `Connect`/`Bind`/`Search`, and no network/operation timeout was configured on the `LdapConnection`. Added a configurable `SecurityOptions.LdapConnectionTimeoutMs` (default 10s) and a new `ApplyConnectionTimeout` helper that sets both `LdapConnection.ConnectionTimeout` (socket connect) and `LdapConstraints.TimeLimit` (per-operation limit) before connecting, so a hung LDAP server can no longer pin a thread-pool thread indefinitely. The `ct`-only-guards-scheduling limitation is now documented in the option's XML doc and an inline comment. Regression tests `SecurityOptions_LdapConnectionTimeout_HasSaneDefault` and `AuthenticateAsync_UnreachableHost_FailsWithinConfiguredTimeout`. ### Security-010 — Design doc contradicts itself on Windows Integrated Authentication | | | |--|--| | Severity | Low | | Category | Design-document adherence | | Status | Resolved | | Location | `docs/requirements/Component-Security.md:13` (vs. `:23`) | **Description** The Responsibilities section states the component authenticates "using Windows Integrated Authentication", but the Authentication section (line 23) and CLAUDE.md explicitly state **"No Windows Integrated Authentication ... authenticates directly against LDAP/AD, not via Kerberos/NTLM"** — which is what the code actually does (direct LDAP bind). The Responsibilities line is stale and contradicts both the rest of the doc and the implementation. **Recommendation** Fix `Component-Security.md:13` to say "using a direct LDAP/Active Directory bind" to match the implemented behavior and the rest of the document. **Resolution** Resolved 2026-05-16 (commit `pending commit`). Confirmed: the Responsibilities bullet at line 13 said "using Windows Integrated Authentication", directly contradicting the Authentication section's "No Windows Integrated Authentication ... authenticates directly against LDAP/AD, not via Kerberos/NTLM", CLAUDE.md, and the implementation (`LdapAuthService` performs a direct username/password bind). Documentation-only finding — no regression test is meaningful. Reworded the Responsibilities bullet to "Authenticate users against LDAP/Active Directory using a direct LDAP/AD bind (username/password)", matching the rest of the document and the code. ### Security-011 — Missing tests for security-critical paths | | | |--|--| | Severity | Low | | Category | Testing coverage | | Status | Resolved | | Location | `tests/ScadaLink.Security.Tests/UnitTest1.cs` | **Description** The test suite covers happy paths well but omits several security-relevant cases: no test exercises the StartTLS path (Security-001), the DN-injection / `Contains('=')` fallback inputs (Security-005), JWT validation with a too-short or empty signing key (Security-003), `IsIdleTimedOut` returning true after a token has been refreshed (Security-007), or the `uid`/`cn` mismatch in the no-service-account path (Security-004). The integration `SecurityHardeningTests` only asserts default option values, not enforcement. The test file is still named `UnitTest1.cs`. **Recommendation** Add negative/edge-case tests for the items above, particularly key-length rejection, DN-escaping of hostile usernames, and idle-timeout behavior across a refresh. Rename `UnitTest1.cs` to a descriptive name. **Resolution** Resolved 2026-05-16 (commit `pending commit`). Re-triage: most of the listed gaps were already closed by the Security-001..007 fixes — the regression classes `SecurityReviewRegressionTests` (StartTLS path, JWT empty/short-key rejection) and `SecurityReviewRegressionTests2` (DN-injection / hostile-username escaping, `uid`/`cn` fallback consistency, idle-timeout preserved across refresh) cover them. The finding's reference to a `SecurityHardeningTests` class is stale — no such class exists in the suite. Remaining gaps closed here: renamed the test file `UnitTest1.cs` → `SecurityTests.cs`, and added a new `SecurityReviewRegressionTests3` class with the LDAP-timeout coverage (Security-009) plus extra no-service-account / DN-path edge cases (`BuildFallbackUserDn_NoSearchBase_ReturnsBareRdn`, `EscapeLdapDn_LeadingHash_IsEscaped`, `EscapeLdapDn_NullOrEmpty_ReturnedUnchanged`). Full module suite: 54 tests, all green.