Files
ScadaBridge/code-reviews/Security/findings.md
T
Joseph Doherty 6ae0fea558 fix(error-handling): close Theme 4 — 18 cancellation / fire-and-forget findings
Async cancellation hygiene, fire-and-forget observability, retry/shutdown
semantics, and audit-row coverage across 9 modules. Highlights:

Cancellation & lifecycle:
- AuditLog-006: SqliteAuditWriter.Dispose hops to thread pool, escaping the
  captured SyncContext that risked sync-over-async deadlock.
- AuditLog-010: SiteAuditTelemetryActor owns a private lifecycle CTS,
  threaded through drain paths instead of CancellationToken.None.
- Comm-019: CentralCommunicationActor adds lifecycle CTS for repo calls.
- Host-019: Migration StartupRetry forwards ApplicationStopping so SIGTERM
  during the bounded-retry window aborts cleanly.

Cursor / retry / counter correctness:
- AuditLog-004: SiteAuditReconciliationActor's cursor now holds at `since`
  when any row's idempotent insert is still being retried (per-EventId
  retry counter, MaxPermanentInsertAttempts=5 escape valve with LogCritical
  abandon). No more silent abandonment of permanently-failing rows.
- ConfigDB-019: Dropped the catch-and-continue on EnsureLookaheadAsync's
  SPLIT loop — by class-doc construction the catch could only mask real
  failures and let the next iteration create permanent partition holes.
- HM-017/018: HealthReportSender + CentralHealthReportLoop snapshot
  per-interval counters before sending, restore via new
  ISiteHealthCollector.AddIntervalCounters on transport failure so counts
  aren't silently lost.

Fire-and-forget / shutdown waits:
- InboundAPI-018: AuditWriteMiddleware observes faulted audit-write tasks
  via OnlyOnFaulted continuation (Warning log; response unchanged).
- SnF-024: StoreAndForwardService.StopAsync awaits in-flight retry sweep
  with a bounded SweepShutdownWaitTimeout (10s).

Leak / refactor:
- Comm-021: SiteStreamGrpcServer.SubscribeInstance wraps Subscribe in its
  own try/catch so a throw doesn't leak the relay actor or _activeStreams
  entry.
- Comm-022: VERIFIED already-closed by Comm-016's dead-code purge.
- CLI-017: BundleCommands' three subcommands delegate to ExecuteCommandAsync
  (auth-failure exit-code contract unified).

Defensive / validation:
- CLI-021: CliConfig.Load wraps file-read/JSON parse so malformed config
  prints a warning and returns defaults instead of crashing the CLI.
- Host-022: ParseLevel emits stderr one-shot warning for unrecognised
  MinimumLevel instead of silently coercing to Information.
- ESG-019: ExternalSystemClient sets HttpClient.Timeout=Infinite so the
  per-call CTS is the sole timeout source (was clipped to 100s by .NET).
- Security-020: New SecurityOptionsValidator (IValidateOptions) rejects
  empty LdapServer/LdapSearchBase with ValidateOnStart.
- DM-019: Lifecycle command timeouts now emit DisableTimedOut/EnableTimedOut/
  DeleteTimedOut audit entries (mirrors DeployFailed pattern).

Plus reconciled stale per-module Open-findings counters that had drifted
from prior sessions.

20+ new regression tests across 11 test projects; build clean; affected
suites all green. README regenerated: 75 open (was 93).
2026-05-28 07:13:28 -04:00

57 KiB

Code Review — Security

Field Value
Module src/ScadaLink.Security
Design doc docs/requirements/Component-Security.md
Status Reviewed
Last reviewed 2026-05-28
Reviewer claude-agent
Commit reviewed 1eb6e97
Open findings 1 (Security-021); 1 deferred (Security-008)

Summary

The Security module is small and reasonably structured: a stateless LdapAuthService for search-then-bind authentication, a JwtTokenService for HMAC-signed cookie tokens, a RoleMapper that resolves LDAP groups to roles, and ASP.NET Core authorization policies plus a site-scope handler. Unit-test coverage of the happy paths is decent. However, the review surfaced several real security weaknesses, the most serious being that StartTLS is dead code (the design's "LDAPS or StartTLS" requirement is only half met), that the authentication cookie is not marked Secure despite the design mandating it, and that the JWT signing key is never length-validated so a weak or empty key is silently accepted. There is also a genuine DN-injection gap in the no-service-account fallback path, a filter/DN attribute mismatch (uid= vs cn=) that makes that fallback path internally inconsistent, and an N+1 query in RoleMapper. JWT validation also disables issuer/audience checks and the idle-timeout claim is reset on every refresh, weakening the documented 30-minute idle policy. None of these are crash/data-loss bugs, but the TLS, cookie, and key-validation items are security defects that should be fixed before any production deployment.

Re-review 2026-05-17 (commit 39d737e)

Re-reviewed the module after the batch-resolution of Security-001..007, 009, 010, 011. Those fixes were verified in place and remain Resolved: the LdapTransport enum makes StartTLS reachable, the auth cookie is SecurePolicy.Always with a sliding window, the JWT signing key is length-validated at construction, issuer/audience are bound and checked, the DN-injection fallback is RFC 4514-escaped, and the idle-timeout anchor is carried across refreshes. Security-008 (N+1 scope-rule query) remains correctly DeferredISecurityRepository still exposes no per-set scope-rule query, so the N+1 cannot be removed from within RoleMapper.cs alone; the deferral condition is unchanged. This pass surfaced 4 new findings (Security-012..015): two Medium and two Low. The most notable is that a partial LDAP outage during login (bind succeeds, group search fails) silently returns an authenticated session with zero roles instead of failing the login as the design's LDAP-failure rule requires (Security-012); and that RefreshToken re-issues a token without ever checking IsIdleTimedOut, so an idle-expired session can be renewed indefinitely if the caller omits the separate idle check (Security-014). The two Low findings concern fragile DN parsing of group names containing escaped commas and an un-trimmed username flowing into the LDAP filter, fallback DN, and JWT claims.

Re-review 2026-05-28 (commit 1eb6e97)

Re-reviewed the module on a fresh baseline. All Security-001..007, 009..015 fixes remain in place; the only Open carry-over is Security-008 (still correctly DeferredISecurityRepository still exposes no per-set scope-rule query, so the N+1 in RoleMapper cannot be removed from within this module). The original Security-014 fix is now load-bearing: RefreshToken calls IsIdleTimedOut before re-issuing, and the new cookie sliding-expiry tests in SecurityReviewRegressionTests pin CentralUI-005's Security-side contract. This pass surfaced 6 new findings (Security-016..021): one Medium correctness/security defect, one Medium design-adherence defect, and four Low. The most consequential is Security-016 — when a user is mapped to both a system-wide Deployment LDAP group (e.g. SCADA-Deploy-All) and a site-scoped Deployment LDAP group (e.g. SCADA-Deploy-SiteA), RoleMapper silently treats the union as site-scoped (the system-wide grant is dropped); the design's "multiple groups grant multiple independent roles" intent is not honoured for this mix-and-match case. Security-017 is the cross-module partner of CentralUI-028: SiteScopeRequirement / SiteScopeAuthorizationHandler are declared and registered but no production caller ever instantiates them — [Authorize(Policy = RequireDeployment)] does not enforce the documented site scoping, callers must remember to inject SiteScopeService and re-check IsSiteAllowedAsync themselves (which the two new report pages flagged by CentralUI-028 forgot to do). The remaining Lows are: role names are magic strings duplicated across RoleMapper, SiteScopeAuthorizationHandler, and AuthorizationPolicies (Security-018); a service-account-rebind failure is reported to the user as "Invalid username or password" — masking a misconfiguration as a user-credential error (Security-019); required SecurityOptions fields (LdapServer, LdapSearchBase) have no IValidateOptions startup check, so empty values silently surface only on first login (Security-020); and the RequireHttpsCookie=false dev opt-out emits no warning, so an HTTP production deployment silently transmits the JWT bearer credential in cleartext (Security-021).

Checklist coverage

# Category Examined Notes
1 Correctness & logic bugs uid=/cn= attribute mismatch between search filter and fallback DN construction (Security-004); StartTLS branch is unreachable (Security-001).
2 Akka.NET conventions No actors in this module — AddSecurityActors is an empty placeholder. Nothing to assess.
3 Concurrency & thread safety Services are stateless and DI-scoped; LDAP sync calls wrapped in Task.Run. No shared mutable state. No issues found.
4 Error handling & resilience LDAP failure paths return structured LdapAuthResult; group-lookup failure is tolerated per design. ct not honored inside Task.Run bodies (Security-009).
5 Security StartTLS dead code (Security-001), cookie not Secure (Security-002), JWT key unvalidated (Security-003), DN injection (Security-005), no issuer/audience validation (Security-006), idle-timeout reset on refresh (Security-007).
6 Performance & resource management N+1 scope-rule query in RoleMapper (Security-008). LdapConnection correctly disposed via using.
7 Design-document adherence StartTLS unsupported and Secure cookie missing both contradict the design doc; design also says "Windows Integrated Authentication" in Responsibilities, contradicting its own Authentication section (Security-010).
8 Code organization & conventions SecurityOptions correctly owned by the component; repository interface in Commons. No issues found.
9 Testing coverage No tests for RoleMapper N+1 behavior, DN-injection inputs, StartTLS path, or idle-timeout-after-refresh. Insecure-config combinations under-tested (Security-011).
10 Documentation & comments SecurityOptions XML docs say direct bind uses cn={username} while the search filter uses uid= — comment is misleading (covered under Security-004).

Re-review (2026-05-28, 1eb6e97):

# Category Examined Notes
1 Correctness & logic bugs RoleMapper drops a system-wide Deployment grant when the user is also in any site-scoped Deployment group (Security-016); hard-coded role-name string "Deployment" in two separate places allows a refactor to silently break site scoping (Security-018).
2 Akka.NET conventions No actors. AddSecurityActors is still a registration placeholder. No issues.
3 Concurrency & thread safety Services stateless; LDAP sync calls wrapped in Task.Run with the now-bounded timeout (Security-009 resolution holds). No issues found.
4 Error handling & resilience A service-account-rebind failure inside AuthenticateAsync is reported as "Invalid username or password", masking a misconfiguration as a user-credential error (Security-019). LDAP-failure rule + partial-outage path remain correctly enforced post-Security-012.
5 Security SiteScopeRequirement / SiteScopeAuthorizationHandler are dead code — no policy is registered that uses them and no production caller instantiates them, so declarative [Authorize] does not enforce site scoping (Security-017, cross-module partner of CentralUI-028). RequireHttpsCookie=false dev opt-out has no warning path — a production misconfiguration silently transmits the JWT bearer credential over HTTP (Security-021).
6 Performance & resource management Security-008 N+1 remains correctly Deferred (still gated on ISecurityRepository). No new perf issues.
7 Design-document adherence RoleMapper's drop-system-wide-on-any-scoped behaviour (Security-016) contradicts the design's "A user can hold multiple roles simultaneously … roles are independent — there is no implied hierarchy" rule for the union case; SiteScopeRequirement advertises a site-scope authorization pattern the implementation does not actually wire up (Security-017).
8 Code organization & conventions Role-name strings are duplicated as magic literals across RoleMapper.cs, SiteScopeAuthorizationHandler.cs, and AuthorizationPolicies.cs — only the audit roles have a single source of truth via OperationalAuditRoles / AuditExportRoles (Security-018). SecurityOptions defaults pass through to runtime with no IValidateOptions for required fields like LdapServer / LdapSearchBase (Security-020).
9 Testing coverage No test covers a user mapped to both a system-wide AND a site-scoped Deployment LDAP group (the Security-016 case). No test covers the SiteScopeRequirement cross-page integration — tests evaluate the handler in isolation, not the absence of a policy that uses it (Security-017).
10 Documentation & comments SiteScopeAuthorizationHandler XML doc describes a permission model no caller actually invokes (Security-017). Otherwise stable.

Findings

Security-001 — StartTLS upgrade path is unreachable dead code

Severity High
Category Security
Status Resolved
Location src/ScadaLink.Security/LdapAuthService.cs:37-47

Description

When LdapUseTls is true the code sets connection.SecureSocketLayer = true (LDAPS). The subsequent StartTLS block is guarded by if (_options.LdapUseTls && !connection.SecureSocketLayer). Because SecureSocketLayer was just set to true, the second condition !connection.SecureSocketLayer is always false, so connection.StartTls() is never called. The design doc explicitly states LDAP connections must use "LDAPS (port 636) or StartTLS" — StartTLS is in practice unsupported. A deployment that intends to use StartTLS on port 389 would get a plaintext LDAPS-mode connection attempt that fails, or worse, an operator may disable TLS entirely to make it work, sending credentials in cleartext.

Recommendation

Introduce an explicit transport mode (e.g. LdapTransport { Ldaps, StartTls, None }) or a separate LdapUseStartTls flag. For StartTLS, leave SecureSocketLayer false, call connection.Connect, then call connection.StartTls() and verify the negotiated session is encrypted before binding. Remove the unreachable conditional.

Resolution

Resolved 2026-05-16 (commit <pending>). Added an explicit LdapTransport enum (Ldaps/StartTls/None); SecureSocketLayer is set only for LDAPS, and the StartTLS branch now connects in plaintext, calls StartTls(), and verifies connection.Tls before binding. LdapUseTls is retained as a compatibility shim mapping onto the enum. Regression tests AuthenticateAsync_StartTlsTransport_AttemptsConnection and AuthenticateAsync_NoTlsTransport_RejectedWithoutAllowInsecure.

Severity High
Category Security
Status Resolved
Location src/ScadaLink.Security/ServiceCollectionExtensions.cs:16-23

Description

AddCookie sets HttpOnly = true and SameSite = Strict but never sets options.Cookie.SecurePolicy. The ASP.NET Core default is CookieSecurePolicy.SameAsRequest, which permits the cookie (carrying the embedded JWT — a bearer credential) to be sent over plain HTTP. The design doc states the cookie is "HttpOnly and Secure (requires HTTPS)". As written, the module does not enforce that requirement; a misconfigured or HTTP-fronted deployment would transmit the session token in cleartext.

Recommendation

Set options.Cookie.SecurePolicy = CookieSecurePolicy.Always in AddCookie. Consider also setting ExpireTimeSpan and SlidingExpiration to align the cookie lifetime with the documented 15-minute JWT / 30-minute idle policy.

Resolution

Resolved 2026-05-16 (commit <pending>). Confirmed the cookie is configured in this module (ServiceCollectionExtensions.AddSecurity), not CentralUI — no misattribution. Added options.Cookie.SecurePolicy = CookieSecurePolicy.Always so the JWT-bearing cookie is never sent over plain HTTP. Regression test AddSecurity_AuthCookie_IsMarkedSecureAlways. (ExpireTimeSpan/SlidingExpiration tuning left as a separate, lower-priority improvement.)

Security-003 — JWT signing key length is never validated

Severity High
Category Security
Status Resolved
Location src/ScadaLink.Security/JwtTokenService.cs:33, src/ScadaLink.Security/SecurityOptions.cs:42

Description

SecurityOptions.JwtSigningKey defaults to string.Empty and is fed directly into new SymmetricSecurityKey(Encoding.UTF8.GetBytes(_options.JwtSigningKey)) with no validation. HMAC-SHA256 requires a key of at least 256 bits (32 bytes); a short or empty key produces a trivially forgeable token. The SecurityHardeningTests comment claims a minimum length is "enforced", but no code in this module enforces it — the test only asserts that a 32+ char key works. A deployment with a missing or short JwtSigningKey would start successfully and issue weakly-signed tokens.

Recommendation

Validate JwtSigningKey at startup — fail fast if it is empty or shorter than 32 bytes. Use an IValidateOptions<SecurityOptions> validator or guard in the JwtTokenService constructor so a weak key is rejected before any token is issued.

Resolution

Resolved 2026-05-16 (commit <pending>). The JwtTokenService constructor now fails fast with an InvalidOperationException when JwtSigningKey is empty or shorter than 32 bytes (SecurityOptions.MinJwtSigningKeyBytes), so a weak key is rejected before any token is issued. The misleading SecurityOptions XML doc was corrected to state the requirement. Regression tests JwtTokenService_EmptySigningKey_ThrowsAtConstruction, JwtTokenService_ShortSigningKey_ThrowsAtConstruction, and JwtTokenService_AdequateSigningKey_ConstructsSuccessfully.

Security-004 — Search filter uses uid= while fallback DN construction uses cn=

Severity Medium
Category Correctness & logic bugs
Status Resolved
Location src/ScadaLink.Security/LdapAuthService.cs:66, :138, :157-159

Description

AuthenticateAsync and ResolveUserDnAsync build the search filter as (uid={username}), but the no-service-account fallback in ResolveUserDnAsync constructs the bind DN as cn={username},{LdapSearchBase}. The SecurityOptions.LdapServiceAccountDn XML comment also documents the fallback as cn={username},{LdapSearchBase}. A directory keyed on uid will succeed via search-then-bind but fail via the direct-bind fallback (and vice versa). The attribute used for lookup is hard-coded and inconsistent across the two code paths, so the two configuration modes are not interchangeable.

Recommendation

Introduce a single configurable LdapUserIdAttribute (default uid) and use it consistently in both the search filter and the fallback DN. Update the XML doc to match.

Resolution

Resolved 2026-05-16 (commit pending). Confirmed: the search filter was hard-coded (uid={username}) (both in AuthenticateAsync and ResolveUserDnAsync) while the fallback DN used cn={username} — the two auth modes were not interchangeable. Added a configurable SecurityOptions.LdapUserIdAttribute (default uid) used for both the search filter and the fallback DN via the new BuildFallbackUserDn helper, and corrected the LdapServiceAccountDn XML doc to reference {LdapUserIdAttribute}. Regression tests BuildFallbackUserDn_UsesConfiguredUserIdAttribute, BuildFallbackUserDn_HonoursNonDefaultUserIdAttribute, SecurityOptions_LdapUserIdAttribute_DefaultsToUid.

Security-005 — DN injection in the no-service-account bind fallback

Severity Medium
Category Security
Status Resolved
Location src/ScadaLink.Security/LdapAuthService.cs:157-159

Description

When no service account is configured, the user-supplied username is interpolated directly into a distinguished name: $"cn={username},{LdapSearchBase}". EscapeLdapFilter escapes search-filter metacharacters, but DN construction requires a different escaping scheme (RFC 4514 — ,, +, ", \, <, >, ;, leading/trailing spaces). No DN escaping is applied here. A username such as victim,ou=admins alters the DN structure, allowing a caller to attempt a bind as a different DN than intended. Combined with the username.Contains('=') shortcut at line 129 — which lets a caller supply a full arbitrary DN — the fallback path gives the client undue control over the bind identity.

Recommendation

Apply RFC 4514 DN-component escaping to username before interpolation, or use the LDAP library's DN-builder API. Reconsider the Contains('=') shortcut — accepting a raw DN from untrusted input is risky; restrict it or remove it.

Resolution

Resolved 2026-05-16 (commit pending). Confirmed: the fallback path interpolated the raw username straight into cn={username},{LdapSearchBase} with no DN escaping, and the username.Contains('=') shortcut let a caller supply an arbitrary bind DN. Added an RFC 4514 EscapeLdapDn helper (escapes , + " \ < > ;, leading/trailing space, leading #, NUL) applied in BuildFallbackUserDn, so a username such as victim,ou=admins can no longer alter the DN structure. The Contains('=') raw-DN shortcut was removed entirely — untrusted input no longer controls the bind identity. Regression tests BuildFallbackUserDn_EscapesDnMetacharacters, EscapeLdapDn_EscapesAllRfc4514Specials, EscapeLdapDn_EscapesLeadingAndTrailingSpaces.

Security-006 — JWT validation disables issuer and audience checks

Severity Medium
Category Security
Status Resolved
Location src/ScadaLink.Security/JwtTokenService.cs:67-75, :56-59

Description

ValidateToken sets ValidateIssuer = false and ValidateAudience = false, and GenerateToken never sets an iss or aud. With a shared symmetric HMAC key, any other system or component that signs JWTs with the same key would produce tokens this service accepts. While the design states the key is shared only between the two central nodes, omitting issuer/audience binding removes a cheap defense-in-depth control and makes accidental key reuse (e.g. the same secret used for another internal token) silently exploitable.

Recommendation

Set a fixed Issuer and Audience (e.g. "scadalink-central") when generating tokens and enable ValidateIssuer/ValidateAudience with the matching expected values during validation.

Resolution

Resolved 2026-05-16 (commit pending). Confirmed: GenerateToken set neither iss nor aud and ValidateToken had ValidateIssuer = false/ValidateAudience = false. GenerateToken now binds JwtTokenService.TokenIssuer/TokenAudience (both "scadalink-central") into every token, and ValidateToken enables ValidateIssuer/ValidateAudience against those fixed values — a token signed with the shared key but a foreign issuer is now rejected. Regression tests GenerateToken_SetsIssuerAndAudience, ValidateToken_RejectsTokenWithWrongIssuer, ValidateToken_AcceptsOwnIssuerAndAudience.

Security-007 — Idle-timeout claim is reset on every token refresh

Severity Medium
Category Correctness & logic bugs
Status Resolved
Location src/ScadaLink.Security/JwtTokenService.cs:40, :111-123

Description

The design states the 30-minute idle timeout is tracked via a "last-activity timestamp in the token", and IsIdleTimedOut reads the LastActivity claim. But RefreshToken calls GenerateToken, which unconditionally writes LastActivity = DateTimeOffset.UtcNow. Token refresh fires whenever a request arrives within ~5 minutes of expiry. The result is that LastActivity reflects token issuance time, not genuine user activity — and since refresh itself is a request, the timestamp keeps moving forward. A more subtle consequence: the idle window is effectively measured from the last refresh, not the last real interaction, so the documented "no requests within the idle window" semantics are not faithfully implemented. The claim name LastActivity is also misleading.

Recommendation

Decide explicitly how activity is tracked. Either (a) carry the original LastActivity forward across refreshes and update it only on real request handling in the middleware, or (b) rename the claim to IssuedAt/TokenCreated and document that the idle window is measured from issuance. Whichever is chosen, ensure IsIdleTimedOut and the refresh path agree on the semantics.

Resolution

Resolved 2026-05-16 (commit pending). Confirmed: RefreshTokenGenerateToken unconditionally wrote LastActivity = UtcNow, so the idle clock reset on every refresh and the documented 30-minute idle timeout could never fire for a client that polls in the background. Implemented option (a) — the Security-side half of the documented "15-min sliding + 30-min idle" policy (the cross-module partner of CentralUI-005): GenerateToken now takes an optional lastActivity anchor; RefreshToken carries the existing LastActivity claim forward unchanged, and a new explicit RecordActivity method advances the anchor to now — to be called by the CentralUI request pipeline on genuine user interaction (not on a background refresh). IsIdleTimedOut is unchanged and now agrees with the refresh path. The remaining CentralUI-side wiring (calling RecordActivity from the middleware, setting SlidingExpiration/ExpireTimeSpan) stays tracked under CentralUI-005; this finding's Security-side defect — the reset-on-refresh bug — is fully fixed here. Regression tests RefreshToken_PreservesOriginalLastActivityClaim, RefreshToken_DoesNotResetIdleTimeoutWhenUserIsActuallyIdle, RecordActivity_UpdatesLastActivityToNow.

Security-008 — N+1 query loading site-scope rules in RoleMapper

Severity Low
Category Performance & resource management
Status Deferred
Location src/ScadaLink.Security/RoleMapper.cs:25-48

Description

MapGroupsToRolesAsync first calls GetAllMappingsAsync, then inside the per-mapping loop calls GetScopeRulesForMappingAsync(mapping.Id, ct) once for every matched Deployment mapping. This is an N+1 query pattern executed on the login hot path and on every 15-minute token refresh. With multiple site-scoped Deployment groups it issues a round-trip per group.

Recommendation

Add a repository method that loads scope rules for a set of mapping IDs in one query (or eager-loads them with the mappings), and resolve all scope rules with a single call.

Resolution

Deferred 2026-05-16 (commit pending commit). Finding confirmed accurate: RoleMapper.MapGroupsToRolesAsync issues one GetScopeRulesForMappingAsync round-trip per matched Deployment mapping — a genuine N+1 on the login / 15-minute-refresh path. However, the only correct fix (a batch GetScopeRulesForMappingsAsync(IEnumerable<int>) repository method, or an eager-load navigation property) requires changes to ISecurityRepository (src/ScadaLink.Commons) and SecurityRepository (src/ScadaLink.ConfigurationDatabase). Both are outside the Security module's permitted edit scope for this review pass, and the existing ISecurityRepository surface offers no per-set scope-rule query, so the N+1 cannot be removed from within RoleMapper.cs alone. Severity is Low (bounded by the number of site-scoped Deployment groups, typically small). Deferred to a cross-module change that adds the batch repository method; RoleMapper should then resolve all scope rules in a single call.

Security-009 — CancellationToken not honored inside Task.Run LDAP calls

Severity Low
Category Error handling & resilience
Status Resolved
Location src/ScadaLink.Security/LdapAuthService.cs:42, :46, :51, :56-57, :67-73, :135, :139-145

Description

The synchronous Novell LDAP calls are wrapped in Task.Run(() => ..., ct). The ct argument only prevents the work item from starting if cancellation is already signaled; once a connection.Connect/Bind/Search call is in progress it cannot be cancelled. A cancelled or timed-out login request will continue to occupy a thread-pool thread and an LDAP connection until the blocking call returns on its own. There is also no explicit network/operation timeout configured on the LdapConnection.

Recommendation

Configure LdapConnection.ConnectionTimeout and search/operation time limits so a hung LDAP server cannot pin a thread indefinitely. Document that ct only guards work-item scheduling, or implement a timeout-with-disconnect fallback.

Resolution

Resolved 2026-05-16 (commit pending commit). Confirmed: the synchronous Novell LDAP calls were wrapped in Task.Run(..., ct) where ct only prevents the work item from starting and cannot interrupt an in-progress blocking Connect/Bind/Search, and no network/operation timeout was configured on the LdapConnection. Added a configurable SecurityOptions.LdapConnectionTimeoutMs (default 10s) and a new ApplyConnectionTimeout helper that sets both LdapConnection.ConnectionTimeout (socket connect) and LdapConstraints.TimeLimit (per-operation limit) before connecting, so a hung LDAP server can no longer pin a thread-pool thread indefinitely. The ct-only-guards-scheduling limitation is now documented in the option's XML doc and an inline comment. Regression tests SecurityOptions_LdapConnectionTimeout_HasSaneDefault and AuthenticateAsync_UnreachableHost_FailsWithinConfiguredTimeout.

Security-010 — Design doc contradicts itself on Windows Integrated Authentication

Severity Low
Category Design-document adherence
Status Resolved
Location docs/requirements/Component-Security.md:13 (vs. :23)

Description

The Responsibilities section states the component authenticates "using Windows Integrated Authentication", but the Authentication section (line 23) and CLAUDE.md explicitly state "No Windows Integrated Authentication ... authenticates directly against LDAP/AD, not via Kerberos/NTLM" — which is what the code actually does (direct LDAP bind). The Responsibilities line is stale and contradicts both the rest of the doc and the implementation.

Recommendation

Fix Component-Security.md:13 to say "using a direct LDAP/Active Directory bind" to match the implemented behavior and the rest of the document.

Resolution

Resolved 2026-05-16 (commit pending commit). Confirmed: the Responsibilities bullet at line 13 said "using Windows Integrated Authentication", directly contradicting the Authentication section's "No Windows Integrated Authentication ... authenticates directly against LDAP/AD, not via Kerberos/NTLM", CLAUDE.md, and the implementation (LdapAuthService performs a direct username/password bind). Documentation-only finding — no regression test is meaningful. Reworded the Responsibilities bullet to "Authenticate users against LDAP/Active Directory using a direct LDAP/AD bind (username/password)", matching the rest of the document and the code.

Security-011 — Missing tests for security-critical paths

Severity Low
Category Testing coverage
Status Resolved
Location tests/ScadaLink.Security.Tests/UnitTest1.cs

Description

The test suite covers happy paths well but omits several security-relevant cases: no test exercises the StartTLS path (Security-001), the DN-injection / Contains('=') fallback inputs (Security-005), JWT validation with a too-short or empty signing key (Security-003), IsIdleTimedOut returning true after a token has been refreshed (Security-007), or the uid/cn mismatch in the no-service-account path (Security-004). The integration SecurityHardeningTests only asserts default option values, not enforcement. The test file is still named UnitTest1.cs.

Recommendation

Add negative/edge-case tests for the items above, particularly key-length rejection, DN-escaping of hostile usernames, and idle-timeout behavior across a refresh. Rename UnitTest1.cs to a descriptive name.

Resolution

Resolved 2026-05-16 (commit pending commit). Re-triage: most of the listed gaps were already closed by the Security-001..007 fixes — the regression classes SecurityReviewRegressionTests (StartTLS path, JWT empty/short-key rejection) and SecurityReviewRegressionTests2 (DN-injection / hostile-username escaping, uid/cn fallback consistency, idle-timeout preserved across refresh) cover them. The finding's reference to a SecurityHardeningTests class is stale — no such class exists in the suite. Remaining gaps closed here: renamed the test file UnitTest1.csSecurityTests.cs, and added a new SecurityReviewRegressionTests3 class with the LDAP-timeout coverage (Security-009) plus extra no-service-account / DN-path edge cases (BuildFallbackUserDn_NoSearchBase_ReturnsBareRdn, EscapeLdapDn_LeadingHash_IsEscaped, EscapeLdapDn_NullOrEmpty_ReturnedUnchanged). Full module suite: 54 tests, all green.

Security-012 — Partial LDAP failure during login yields a roleless authenticated session

Severity Medium
Category Error handling & resilience
Status Resolved
Location src/ScadaLink.Security/LdapAuthService.cs:78-118

Description

In AuthenticateAsync, after the user bind succeeds, the group/attribute Search is wrapped in a try/catch (LdapException) that logs a warning and continues — the comment states "Auth succeeded even if attribute lookup failed". The method then returns new LdapAuthResult(true, displayName, username, groups, null) with an empty groups list. Because RoleMapper.MapGroupsToRolesAsync derives all roles from that group list, a login during a partial LDAP outage (bind reachable, search/group server unavailable, or a transient LdapException mid-search) produces a fully authenticated session with zero roles — the user is logged in but silently stripped of all permissions. The design's LDAP Connection Failure rule states new logins must fail when LDAP is unavailable; this path instead admits the user. The caller cannot distinguish "user genuinely belongs to no mapped groups" from "the group query failed", so it cannot make the documented fail-the-login decision. The while loop's inner catch (LdapException) { break; } (line 107-111) compounds this: a real error mid-enumeration is indistinguishable from end-of-results and silently truncates the group list.

Recommendation

Treat a failed group/attribute lookup on the initial login as an authentication failure: return Success = false with a "directory unavailable, try again" message, or add an explicit flag on LdapAuthResult (e.g. GroupLookupSucceeded) so the caller can apply the design's fail-the-login rule. The inner while-loop catch should not treat an arbitrary LdapException as a benign end-of-results sentinel — only the specific "no more results" condition should break the loop; any other exception should propagate or be surfaced.

Resolution

Resolved 2026-05-17 (commit pending). Confirmed: the group/attribute Search was wrapped in a catch (LdapException) that logged a warning and returned new LdapAuthResult(true, …, groups: []) — a partial LDAP outage produced an authenticated, zero-role session — and the inner while-loop catch { break; } masked real mid-enumeration errors as end-of-results. AuthenticateAsync now tracks a groupLookupSucceeded flag (false when any LdapException is thrown by the search or Next()); the inner swallowing catch was removed so such errors propagate; the new BuildAuthResultFromGroupLookup helper returns a FAILED LdapAuthResult ("directory temporarily unavailable, please try again") on lookup failure while still treating a genuine empty-groups result as a successful login. Regression tests BuildAuthResultFromGroupLookup_LookupFailed_FailsTheLogin, BuildAuthResultFromGroupLookup_LookupSucceededNoGroups_IsAuthenticated, BuildAuthResultFromGroupLookup_LookupSucceededWithGroups_CarriesGroups.

Security-013 — ExtractFirstRdnValue mis-parses group DNs containing escaped commas

Severity Low
Category Correctness & logic bugs
Status Resolved
Location src/ScadaLink.Security/LdapAuthService.cs:258-269

Description

ExtractFirstRdnValue extracts a group name from a memberOf DN by taking the substring between the first = and the first ,. RFC 4514 permits a , inside an RDN value when it is backslash-escaped — e.g. cn=Acme\, Inc Operators,ou=groups,dc=example,dc=com. The method splits on the escaped comma and returns Acme\ as the group name. RoleMapper then matches that mangled name verbatim against configured LdapGroupMapping.LdapGroupName values, so a group whose CN legitimately contains a comma silently fails to map to its role — the user loses that role with no error. The same naive parsing also does not strip the trailing backslash escape.

Recommendation

Parse the first RDN with an escape-aware scan: ignore a , preceded by an unescaped \, and unescape RFC 4514 sequences in the extracted value. Alternatively use the LDAP library's DN-parsing API (LdapDN / RDN accessors) rather than hand-rolled IndexOf.

Resolution

Resolved 2026-05-17 (commit pending). Confirmed: ExtractFirstRdnValue took the substring between the first = and the first ,, splitting on an RFC 4514 backslash-escaped comma so a CN like Acme\, Inc Operators became Acme\. Replaced the naive IndexOf scan with an escape-aware character scan that ignores a backslash-escaped ,, unescapes single-character and \XX hex escape sequences, and only terminates the value on an unescaped comma — so a comma-containing group CN is returned intact and matches its configured LdapGroupName. Method was also made public for direct testing. Regression tests ExtractFirstRdnValue_EscapedComma_KeepsWholeGroupName, ExtractFirstRdnValue_PlainDn_ReturnsFirstRdnValue, ExtractFirstRdnValue_SingleRdn_ReturnsValue, ExtractFirstRdnValue_EscapedSpecials_AreUnescaped.

Security-014 — RefreshToken re-issues a token without checking the idle timeout

Severity Medium
Category Security
Status Resolved
Location src/ScadaLink.Security/JwtTokenService.cs:156-169

Description

RefreshToken carries the existing LastActivity claim forward (correct, per the Security-007 fix) and unconditionally issues a fresh token with a new 15-minute expiry. It never calls IsIdleTimedOut. The documented 30-minute idle policy therefore depends entirely on the CentralUI request pipeline calling IsIdleTimedOut before calling RefreshToken — two independent, order-dependent checks with no enforcement that they are used together. If a caller refreshes without first checking idle state (an easy mistake, and there is no compiler or API signal preventing it), an idle-expired session is silently renewed: each refresh resets the JWT expiry while LastActivity keeps ageing, so the session can be kept alive indefinitely by background refreshes even though the user has been idle for hours. RefreshToken is the natural place to enforce the invariant but provides no safety net.

Recommendation

Have RefreshToken itself reject a principal that is already past the idle timeout — return null (the same "cannot refresh" signal it already uses for missing claims) when IsIdleTimedOut(currentPrincipal) is true — so the idle policy holds regardless of caller discipline. Document that an idle-expired principal cannot be refreshed and add a regression test covering refresh of a 31-minute-idle token.

Resolution

Resolved 2026-05-17 (commit pending). Confirmed: RefreshToken carried LastActivity forward and minted a fresh 15-minute token without ever calling IsIdleTimedOut, so the idle policy depended entirely on caller discipline and a background refresh could keep an idle-expired session alive indefinitely. RefreshToken now calls IsIdleTimedOut(currentPrincipal) after the missing-claims check and returns null (its existing "cannot refresh" signal) when the session is past the idle window, so the 30-minute idle policy holds regardless of caller order; XML doc updated to state the invariant. Regression tests RefreshToken_IdleExpiredPrincipal_ReturnsNull, RefreshToken_ActiveSessionWithinIdleWindow_StillRefreshes, RefreshToken_MissingLastActivityClaim_ReturnsNull.

Security-015 — Username is not trimmed before use in the LDAP filter, fallback DN, and JWT claims

Severity Low
Category Correctness & logic bugs
Status Resolved
Location src/ScadaLink.Security/LdapAuthService.cs:20-21, :80, :122, :169, :193

Description

AuthenticateAsync rejects null/whitespace-only usernames via IsNullOrWhiteSpace, but an otherwise-valid username with leading or trailing whitespace (" alice", copy-paste artefacts, mobile keyboards) is passed through verbatim. It flows into the LDAP search filter ({attr}={EscapeLdapFilter(username)}), into the fallback bind DN via BuildFallbackUserDn, and — most consequentially — into the returned LdapAuthResult and from there into the JWT Username claim. Most LDAP directories ignore surrounding whitespace on a search term, so " alice" and "alice" may both authenticate but yield JWT Username claims that differ by whitespace. Any downstream identity comparison (audit-log who, site-scope lookups keyed by username, session de-duplication) then sees two distinct identities for the same person.

Recommendation

Trim the username once at the top of AuthenticateAsync (after the IsNullOrWhiteSpace guard) and use the trimmed value consistently for the filter, the DN, and the LdapAuthResult. This normalises the identity that ends up in the JWT claim and audit trail.

Resolution

Resolved 2026-05-17 (commit pending). Confirmed: an otherwise-valid username with leading/trailing whitespace passed the IsNullOrWhiteSpace guard and flowed verbatim into the LDAP filter, the fallback DN, and the returned LdapAuthResult (and thus the JWT Username claim). Added a NormalizeUsername helper (trims whitespace) called once at the top of AuthenticateAsync immediately after the guard; the local username variable is reassigned to the trimmed value so the filter, fallback DN, and result all use the single canonical identity. Regression tests NormalizeUsername_TrimsLeadingAndTrailingWhitespace, BuildFallbackUserDn_TrimmedUsername_NoLeadingTrailingSpace, AuthenticateAsync_UsernameWithSurroundingWhitespace_StillRejectedForInsecure.

Security-016 — RoleMapper silently drops the system-wide Deployment grant when a user is also in any site-scoped Deployment group

Severity Medium
Category Correctness & logic bugs
Status Resolved
Location src/ScadaLink.Security/RoleMapper.cs:30-31, :41-55, :59

Description

MapGroupsToRolesAsync resolves the Deployment role's site scope as a single isSystemWide = hasDeploymentRole && !hasDeploymentWithScopeRules flag computed across ALL matched Deployment mappings. If a user is a member of both a system-wide Deployment group (e.g. SCADA-Deploy-All, no scope rules) AND a site-scoped Deployment group (e.g. SCADA-Deploy-SiteA, one scope rule for Site A), the second mapping sets hasDeploymentWithScopeRules = true, so the final isSystemWide becomes false and the returned PermittedSiteIds is just [SiteA]. The system-wide grant from SCADA-Deploy-All is silently dropped — the user loses access to every other site, even though one of their LDAP groups was intended to grant them system-wide reach. This contradicts the design's "A user can hold multiple roles simultaneously … roles are independent — there is no implied hierarchy" intent: the union of grants should be the broadest grant in the set, not the narrowest. The mistake is also non-obvious to an operator: from the Admin → LDAP Mappings page nothing flags that adding a site-scoped Deployment mapping for a user already in SCADA-Deploy-All removes sites from their effective grant. The downstream SiteScopeService.IsSystemWideAsync / FilterSitesAsync faithfully reproduce this narrowing, so the user can no longer see or act on sites outside [SiteA].

Recommendation

Track the union semantics explicitly: if any matched Deployment mapping has no scope rules, the user is system-wide regardless of what other mappings have. The simplest change is to set hasDeploymentWithScopeRules only when the mapping has scope rules AND another flag hasUnscopedDeploymentMapping is false; then compute isSystemWide = hasUnscopedDeploymentMapping || (hasDeploymentRole && !hasDeploymentWithScopeRules). Equivalently: collect per-mapping (hasRules, scopedSiteIds) first, then isSystemWide = any mapping has hasRules==false, and permittedSiteIds = union of all scopedSiteIds (left empty for system-wide users). Add a regression test MapGroupsToRoles_UserInBothSystemWideAndScopedDeploymentGroup_IsSystemWide covering the design's example pair SCADA-Deploy-All + SCADA-Deploy-SiteA.

Resolution

Resolved 2026-05-28 (commit pending). RoleMapper.MapGroupsToRolesAsync now tracks two independent flags per matched Deployment mapping: hasUnscopedDeploymentMapping (any matched mapping with no scope rules) and hasScopedDeploymentMapping (any matched mapping with scope rules). isSystemWide is hasUnscopedDeploymentMapping || (hasDeploymentRole && !hasScopedDeploymentMapping) — so a user in both SCADA-Deploy-All and SCADA-Deploy-SiteA is now correctly system-wide, with the accumulated scope ids cleared. Magic string "Deployment" was replaced with the new Roles.Deployment constant (Security-018). Regression test MapGroupsToRoles_UserInBothSystemWideAndScopedDeploymentGroup_IsSystemWide (seeds Site A, attaches a scope rule to the seeded SCADA-Deploy-SiteA mapping, and asserts a user mapped via both SCADA-Deploy-All and SCADA-Deploy-SiteA resolves to system-wide with empty PermittedSiteIds) fails on the pre-fix code and passes after.

Security-017 — SiteScopeRequirement / SiteScopeAuthorizationHandler are dead code from production callers — [Authorize(Policy = RequireDeployment)] does NOT enforce site scoping

Severity Medium
Category Design-document adherence
Status Resolved
Location src/ScadaLink.Security/SiteScopeAuthorizationHandler.cs:8-58; src/ScadaLink.Security/AuthorizationPolicies.cs:113-143

Description

The module declares SiteScopeRequirement (an IAuthorizationRequirement carrying a TargetSiteId) and the matching SiteScopeAuthorizationHandler that combines the Deployment role claim with the SiteId claims to enforce the design's site-scoping rule. The handler is registered in AddScadaLinkAuthorization (services.AddSingleton<IAuthorizationHandler, SiteScopeAuthorizationHandler>()). But no AddPolicy call ever wires the requirement to a named policy, and a grep across src/ScadaLink.CentralUI and src/ScadaLink.ManagementService confirms that no production code ever instantiates new SiteScopeRequirement(...) or calls AuthorizeAsync(...) with one — the only callers are the unit tests in SecurityTests.cs:1146,1166,1185,1203. The design + CLAUDE.md state that "Deployment and Monitoring pages must filter every site/instance list through FilterSitesAsync and re-check IsSiteAllowedAsync before any cross-site command", and the CentralUI-028 finding (High, Open) confirms this is exactly the contract two new report pages forgot — because there is no declarative [Authorize(Policy = ...)] shortcut, callers must remember to inject SiteScopeService and write the check by hand, and any new page that forgets is a silent regression with no compile-time or test-time signal. The module's published surface advertises an authorization-handler pattern that is, in practice, unwired plumbing.

Recommendation

Either (a) delete SiteScopeRequirement and SiteScopeAuthorizationHandler (and the dead IAuthorizationHandler registration) and document SiteScopeService as the sole site-scoping mechanism — this is the smaller change and matches what the codebase actually does today; or, preferably, (b) finish the wiring: add a RequireSiteScope policy that uses SiteScopeRequirement and provide a small helper / source generator or analyzer that flags Deployment-policy-attributed pages without a site-scope check. Either way, address the cross-module gap: CentralUI-028 stays open until production pages reliably enforce the rule. If (b) is chosen, a route-parameter-aware IAuthorizationPolicyProvider is needed so the policy can read the target site id from the request — that is a meaningful design extension and would need to be planned alongside the Central UI's existing SiteScopeService usage rather than replacing it piecemeal.

Resolution

Resolved 2026-05-28 (commit pending) per recommendation option (a): deleted SiteScopeRequirement and SiteScopeAuthorizationHandler outright, along with the unwired services.AddSingleton<IAuthorizationHandler, SiteScopeAuthorizationHandler>() registration in AuthorizationPolicies.AddScadaLinkAuthorization and the four isolation tests in SecurityTests.cs. SiteScopeService (CentralUI-002) is now documented as the sole site-scoping mechanism — the stale "mirrors SiteScopeAuthorizationHandler" comment in SiteScopeService.ResolveAsync was rewritten to say so. The cross-module partner CentralUI-028 is fixed in the same batch (the two new report pages now consume SiteScopeService).

Security-018 — Role names are hard-coded magic strings duplicated across RoleMapper, SiteScopeAuthorizationHandler, and AuthorizationPolicies

Severity Low
Category Code organization & conventions
Status Resolved
Location src/ScadaLink.Security/RoleMapper.cs:41; src/ScadaLink.Security/SiteScopeAuthorizationHandler.cs:36; src/ScadaLink.Security/AuthorizationPolicies.cs:118,121,124,95,107

Description

The role-name literals "Admin", "Design", "Deployment", "Audit", and "AuditReadOnly" are duplicated as magic strings across three separate files: RoleMapper.cs:41 hard-codes "Deployment" to detect the site-scope branch; SiteScopeAuthorizationHandler.cs:36 independently hard-codes "Deployment" to gate the handler; and AuthorizationPolicies.cs:118,121,124 hard-code the four role names as the policy RequireClaim values. Only the audit roles have a single source of truth (via the OperationalAuditRoles / AuditExportRoles arrays on AuthorizationPolicies). A future rename or addition of a role that misses any one of these call sites silently breaks the system: e.g. renaming "Deployment" → "Deployer" in RoleMapper alone would leave the policy still requiring "Deployment" (logins get the new role name but the policy never matches), while changing it in the policy alone would leave RoleMapper failing to populate scope rules for the renamed role. The bug class is "string drift" — exactly the kind the OperationalAuditRoles constant was introduced to prevent.

Recommendation

Introduce a public static class Roles { public const string Admin = "Admin"; public const string Design = "Design"; public const string Deployment = "Deployment"; public const string Audit = "Audit"; public const string AuditReadOnly = "AuditReadOnly"; } in the Security project and replace every magic-string occurrence — including the elements of OperationalAuditRoles and AuditExportRoles — with the constants. A single rename will then either succeed everywhere or fail to compile.

Resolution

Resolved 2026-05-28 (commit pending). Added src/ScadaLink.Security/Roles.cs holding Admin/Design/Deployment/Audit/AuditReadOnly as public const string fields. Replaced every magic-string occurrence in this module: RoleMapper.MapGroupsToRolesAsync now compares against Roles.Deployment; AuthorizationPolicies.AddScadaLinkAuthorization binds RequireClaim(...) to Roles.{Admin,Design,Deployment}; OperationalAuditRoles / AuditExportRoles are now built from Roles.Admin, Roles.Audit, Roles.AuditReadOnly. SiteScopeAuthorizationHandler.cs was deleted under Security-017, so its "Deployment" literal is gone with it. A future rename now propagates by a single edit or fails to compile.

Security-019 — Service-account rebind failure is reported as "Invalid username or password" — masks misconfiguration as a user-credential error

Severity Low
Category Error handling & resilience
Status Resolved
Location src/ScadaLink.Security/LdapAuthService.cs:85-89, :147-151

Description

After the user's credentials bind successfully, AuthenticateAsync re-binds as the configured service account to perform the group/attribute search (connection.Bind(_options.LdapServiceAccountDn, _options.LdapServiceAccountPassword)). A failure of this second bind — wrong service-account password, deleted/disabled service-account, locked-out service-account — throws LdapException which is caught by the broad outer catch (LdapException) and returned as new LdapAuthResult(false, null, username, null, "Invalid username or password."). The user sees an "invalid credentials" message for their credentials even though their bind succeeded and the failure was in the system's own service-account configuration. Worse, every user attempting to log in sees the same incorrect message during a service-account outage, which routes operators down the wrong incident path (reset the user's password) instead of the right one (check the service-account credentials). The successful user bind itself is also not auditable as a discrete event because the result is "Invalid username or password" — indistinguishable from a genuine bad-password attempt.

Recommendation

Wrap the service-account rebind in its own try/catch (LdapException) and surface a distinct error: log _logger.LogError(ex, "Service-account rebind failed; check LdapServiceAccountDn / LdapServiceAccountPassword configuration") and return new LdapAuthResult(false, null, username, null, "Authentication service is misconfigured. Contact an administrator."). Add a regression test that exercises the service-account-bind failure path (a mocked or seamed LdapConnection.Bind that throws on the second call) and asserts the distinct error message.

Resolution

Resolved 2026-05-28 (commit pending). Added a new ServiceAccountBindException (src/ScadaLink.Security/ServiceAccountBindException.cs) — deliberately NOT an LdapException subtype so it short-circuits the generic LDAP catch — and a private BindServiceAccountAsync helper on LdapAuthService that wraps both service-account rebind sites (the post-user-bind group-lookup rebind AND the ResolveUserDnAsync DN-search rebind). On LdapException, the helper logs Error ("Service-account rebind failed; check LdapServiceAccountDn / LdapServiceAccountPassword configuration") and rethrows as ServiceAccountBindException, which the outer AuthenticateAsync catch chain maps to the distinct user-facing message "Authentication service is misconfigured. Contact an administrator." Service-account faults no longer surface as "Invalid username or password". Regression test ServiceAccountBindException_DoesNotInheritLdapException_SoCatchOrderIsCorrect pins the load-bearing inheritance contract; a full end-to-end auth test that exercises the service-account-bind failure path is not feasible without an LDAP seam in LdapAuthService (the LdapConnection is constructed in-method), so the structural test is the closest meaningful unit-level coverage.

Security-020 — SecurityOptions has no startup validation for required fields (LdapServer, LdapSearchBase)

Severity Low
Category Error handling & resilience
Status Resolved
Location src/ScadaLink.Security/SecurityOptions.cs:6-7, :36-37; src/ScadaLink.Security/ServiceCollectionExtensions.cs:13-30

Resolution (2026-05-28): added SecurityOptionsValidator (IValidateOptions<SecurityOptions>) that rejects empty/whitespace LdapServer and LdapSearchBase with messages naming the full Security:Field key the operator would edit. AddSecurity registers it via services.AddOptions<SecurityOptions>().ValidateOnStart() + TryAddEnumerable(... SecurityOptionsValidator) so a misconfigured Security section fails fast at boot rather than minutes later on the first login. JwtSigningKey is deliberately left to JwtTokenService's existing length-aware constructor guard (Security-003). Regression tests in SecurityOptionsValidatorTests: valid-options succeed; empty/whitespace LdapServer and LdapSearchBase each fail with the key-naming message (theory); both-empty reports both keys; AddSecurity_RegistersSecurityOptionsValidator pins the DI wiring.

Description

SecurityOptions.JwtSigningKey correctly fails fast at JwtTokenService construction (Security-003 fix), but the LDAP-side required fields — LdapServer (default string.Empty) and LdapSearchBase (default string.Empty) — have no equivalent guard. AddSecurity does not register an IValidateOptions<SecurityOptions>. A deployment that fails to set LdapServer (a typo in the appsettings.json section name, a missing environment-variable substitution, a misconfigured Docker compose file) starts cleanly — the Central UI comes up, the login page loads, and only the first authentication attempt fails with LdapConnection.Connect("") throwing a low-level exception that bubbles up as the generic "An unexpected error occurred during authentication." message. The misconfiguration surfaces minutes or hours into the deploy, on the first real user login, rather than at startup where it is cheap to diagnose.

Recommendation

Add an IValidateOptions<SecurityOptions> registered via services.AddOptions<SecurityOptions>().ValidateOnStart() that fails when LdapServer is null/whitespace, LdapSearchBase is null/whitespace, or LdapPort <= 0. Combine with the existing JwtTokenService constructor check so every required SecurityOptions field is enforced at startup, not at first use.

Security-021 — RequireHttpsCookie=false dev opt-out has no warning path — an HTTP production deployment silently transmits the JWT bearer credential in cleartext

Severity Low
Category Security
Status Open
Location src/ScadaLink.Security/SecurityOptions.cs:100-108; src/ScadaLink.Security/ServiceCollectionExtensions.cs:54-59

Description

The Security-002 fix added RequireHttpsCookie (default true) so the auth cookie's SecurePolicy is Always in production. The current Docker dev cluster sets RequireHttpsCookie=false in both central nodes' appsettings.Central.json, downgrading to SameAsRequest so the local HTTP cluster works. The downgrade is documented in the XML doc but is silent at runtime: no log line warns that the cookie carrying the JWT bearer credential is being sent over an HTTP-only path. A production deployment that inherits a dev-derived appsettings — or that copy-pastes the docker config and forgets to flip the flag — transmits the session token in cleartext with no diagnostic signal. The default is correct; the gap is that the unsafe override has no operational guard.

Recommendation

In the PostConfigure block in AddSecurity, when RequireHttpsCookie == false, log a single startup warning along the lines of _logger.LogWarning("RequireHttpsCookie is DISABLED — auth cookie SecurePolicy is SameAsRequest. The cookie-embedded JWT will be transmitted over plain HTTP. This setting is intended for local dev only — set SecurityOptions:RequireHttpsCookie=true in production."). Optionally, also fail startup when RequireHttpsCookie=false AND ASPNETCORE_ENVIRONMENT=Production. Add a regression test that asserts the warning is emitted when the flag is disabled and not when it is enabled.