6ae0fea558
Async cancellation hygiene, fire-and-forget observability, retry/shutdown semantics, and audit-row coverage across 9 modules. Highlights: Cancellation & lifecycle: - AuditLog-006: SqliteAuditWriter.Dispose hops to thread pool, escaping the captured SyncContext that risked sync-over-async deadlock. - AuditLog-010: SiteAuditTelemetryActor owns a private lifecycle CTS, threaded through drain paths instead of CancellationToken.None. - Comm-019: CentralCommunicationActor adds lifecycle CTS for repo calls. - Host-019: Migration StartupRetry forwards ApplicationStopping so SIGTERM during the bounded-retry window aborts cleanly. Cursor / retry / counter correctness: - AuditLog-004: SiteAuditReconciliationActor's cursor now holds at `since` when any row's idempotent insert is still being retried (per-EventId retry counter, MaxPermanentInsertAttempts=5 escape valve with LogCritical abandon). No more silent abandonment of permanently-failing rows. - ConfigDB-019: Dropped the catch-and-continue on EnsureLookaheadAsync's SPLIT loop — by class-doc construction the catch could only mask real failures and let the next iteration create permanent partition holes. - HM-017/018: HealthReportSender + CentralHealthReportLoop snapshot per-interval counters before sending, restore via new ISiteHealthCollector.AddIntervalCounters on transport failure so counts aren't silently lost. Fire-and-forget / shutdown waits: - InboundAPI-018: AuditWriteMiddleware observes faulted audit-write tasks via OnlyOnFaulted continuation (Warning log; response unchanged). - SnF-024: StoreAndForwardService.StopAsync awaits in-flight retry sweep with a bounded SweepShutdownWaitTimeout (10s). Leak / refactor: - Comm-021: SiteStreamGrpcServer.SubscribeInstance wraps Subscribe in its own try/catch so a throw doesn't leak the relay actor or _activeStreams entry. - Comm-022: VERIFIED already-closed by Comm-016's dead-code purge. - CLI-017: BundleCommands' three subcommands delegate to ExecuteCommandAsync (auth-failure exit-code contract unified). Defensive / validation: - CLI-021: CliConfig.Load wraps file-read/JSON parse so malformed config prints a warning and returns defaults instead of crashing the CLI. - Host-022: ParseLevel emits stderr one-shot warning for unrecognised MinimumLevel instead of silently coercing to Information. - ESG-019: ExternalSystemClient sets HttpClient.Timeout=Infinite so the per-call CTS is the sole timeout source (was clipped to 100s by .NET). - Security-020: New SecurityOptionsValidator (IValidateOptions) rejects empty LdapServer/LdapSearchBase with ValidateOnStart. - DM-019: Lifecycle command timeouts now emit DisableTimedOut/EnableTimedOut/ DeleteTimedOut audit entries (mirrors DeployFailed pattern). Plus reconciled stale per-module Open-findings counters that had drifted from prior sessions. 20+ new regression tests across 11 test projects; build clean; affected suites all green. README regenerated: 75 open (was 93).
999 lines
57 KiB
Markdown
999 lines
57 KiB
Markdown
# Code Review — Security
|
|
|
|
| Field | Value |
|
|
|-------|-------|
|
|
| Module | `src/ScadaLink.Security` |
|
|
| Design doc | `docs/requirements/Component-Security.md` |
|
|
| Status | Reviewed |
|
|
| Last reviewed | 2026-05-28 |
|
|
| Reviewer | claude-agent |
|
|
| Commit reviewed | `1eb6e97` |
|
|
| Open findings | 1 (Security-021); 1 deferred (Security-008) |
|
|
|
|
## Summary
|
|
|
|
The Security module is small and reasonably structured: a stateless `LdapAuthService`
|
|
for search-then-bind authentication, a `JwtTokenService` for HMAC-signed cookie tokens,
|
|
a `RoleMapper` that resolves LDAP groups to roles, and ASP.NET Core authorization
|
|
policies plus a site-scope handler. Unit-test coverage of the happy paths is decent.
|
|
However, the review surfaced several real security weaknesses, the most serious being
|
|
that **StartTLS is dead code** (the design's "LDAPS or StartTLS" requirement is only
|
|
half met), that **the authentication cookie is not marked `Secure`** despite the design
|
|
mandating it, and that **the JWT signing key is never length-validated** so a weak or
|
|
empty key is silently accepted. There is also a genuine **DN-injection** gap in the
|
|
no-service-account fallback path, a filter/DN attribute mismatch (`uid=` vs `cn=`) that
|
|
makes that fallback path internally inconsistent, and an N+1 query in `RoleMapper`.
|
|
JWT validation also disables issuer/audience checks and the idle-timeout claim is reset
|
|
on every refresh, weakening the documented 30-minute idle policy. None of these are
|
|
crash/data-loss bugs, but the TLS, cookie, and key-validation items are security
|
|
defects that should be fixed before any production deployment.
|
|
|
|
#### Re-review 2026-05-17 (commit `39d737e`)
|
|
|
|
Re-reviewed the module after the batch-resolution of Security-001..007, 009, 010, 011.
|
|
Those fixes were verified in place and remain Resolved: the `LdapTransport` enum makes
|
|
StartTLS reachable, the auth cookie is `SecurePolicy.Always` with a sliding window, the
|
|
JWT signing key is length-validated at construction, issuer/audience are bound and
|
|
checked, the DN-injection fallback is RFC 4514-escaped, and the idle-timeout anchor is
|
|
carried across refreshes. Security-008 (N+1 scope-rule query) remains correctly
|
|
**Deferred** — `ISecurityRepository` still exposes no per-set scope-rule query, so the
|
|
N+1 cannot be removed from within `RoleMapper.cs` alone; the deferral condition is
|
|
unchanged. This pass surfaced **4 new findings** (Security-012..015): two Medium and
|
|
two Low. The most notable is that a *partial* LDAP outage during login (bind succeeds,
|
|
group search fails) silently returns an authenticated session with **zero roles**
|
|
instead of failing the login as the design's LDAP-failure rule requires
|
|
(Security-012); and that `RefreshToken` re-issues a token without ever checking
|
|
`IsIdleTimedOut`, so an idle-expired session can be renewed indefinitely if the caller
|
|
omits the separate idle check (Security-014). The two Low findings concern fragile
|
|
DN parsing of group names containing escaped commas and an un-trimmed username flowing
|
|
into the LDAP filter, fallback DN, and JWT claims.
|
|
|
|
#### Re-review 2026-05-28 (commit `1eb6e97`)
|
|
|
|
Re-reviewed the module on a fresh baseline. All Security-001..007, 009..015 fixes remain
|
|
in place; the only Open carry-over is Security-008 (still correctly **Deferred** —
|
|
`ISecurityRepository` still exposes no per-set scope-rule query, so the N+1 in
|
|
`RoleMapper` cannot be removed from within this module). The original
|
|
Security-014 fix is now load-bearing: `RefreshToken` calls `IsIdleTimedOut` before
|
|
re-issuing, and the new cookie sliding-expiry tests in `SecurityReviewRegressionTests`
|
|
pin CentralUI-005's Security-side contract. This pass surfaced **6 new findings**
|
|
(Security-016..021): one Medium correctness/security defect, one Medium design-adherence
|
|
defect, and four Low. The most consequential is **Security-016** — when a user is
|
|
mapped to *both* a system-wide Deployment LDAP group (e.g. `SCADA-Deploy-All`) and a
|
|
site-scoped Deployment LDAP group (e.g. `SCADA-Deploy-SiteA`), `RoleMapper` silently
|
|
treats the union as site-scoped (the system-wide grant is dropped); the design's
|
|
"multiple groups grant multiple independent roles" intent is not honoured for this
|
|
mix-and-match case. **Security-017** is the cross-module partner of CentralUI-028:
|
|
`SiteScopeRequirement` / `SiteScopeAuthorizationHandler` are declared and registered
|
|
but no production caller ever instantiates them — `[Authorize(Policy = RequireDeployment)]`
|
|
*does not* enforce the documented site scoping, callers must remember to inject
|
|
`SiteScopeService` and re-check `IsSiteAllowedAsync` themselves (which the two new
|
|
report pages flagged by CentralUI-028 forgot to do). The remaining Lows are: role names
|
|
are magic strings duplicated across `RoleMapper`, `SiteScopeAuthorizationHandler`, and
|
|
`AuthorizationPolicies` (Security-018); a service-account-rebind failure is reported
|
|
to the user as "Invalid username or password" — masking a misconfiguration as a
|
|
user-credential error (Security-019); required `SecurityOptions` fields
|
|
(`LdapServer`, `LdapSearchBase`) have no `IValidateOptions` startup check, so empty
|
|
values silently surface only on first login (Security-020); and the
|
|
`RequireHttpsCookie=false` dev opt-out emits no warning, so an HTTP production
|
|
deployment silently transmits the JWT bearer credential in cleartext (Security-021).
|
|
|
|
## Checklist coverage
|
|
|
|
| # | Category | Examined | Notes |
|
|
|---|----------|----------|-------|
|
|
| 1 | Correctness & logic bugs | ☑ | `uid=`/`cn=` attribute mismatch between search filter and fallback DN construction (Security-004); StartTLS branch is unreachable (Security-001). |
|
|
| 2 | Akka.NET conventions | ☑ | No actors in this module — `AddSecurityActors` is an empty placeholder. Nothing to assess. |
|
|
| 3 | Concurrency & thread safety | ☑ | Services are stateless and DI-scoped; LDAP sync calls wrapped in `Task.Run`. No shared mutable state. No issues found. |
|
|
| 4 | Error handling & resilience | ☑ | LDAP failure paths return structured `LdapAuthResult`; group-lookup failure is tolerated per design. `ct` not honored inside `Task.Run` bodies (Security-009). |
|
|
| 5 | Security | ☑ | StartTLS dead code (Security-001), cookie not `Secure` (Security-002), JWT key unvalidated (Security-003), DN injection (Security-005), no issuer/audience validation (Security-006), idle-timeout reset on refresh (Security-007). |
|
|
| 6 | Performance & resource management | ☑ | N+1 scope-rule query in `RoleMapper` (Security-008). `LdapConnection` correctly disposed via `using`. |
|
|
| 7 | Design-document adherence | ☑ | StartTLS unsupported and Secure cookie missing both contradict the design doc; design also says "Windows Integrated Authentication" in Responsibilities, contradicting its own Authentication section (Security-010). |
|
|
| 8 | Code organization & conventions | ☑ | `SecurityOptions` correctly owned by the component; repository interface in Commons. No issues found. |
|
|
| 9 | Testing coverage | ☑ | No tests for `RoleMapper` N+1 behavior, DN-injection inputs, StartTLS path, or idle-timeout-after-refresh. Insecure-config combinations under-tested (Security-011). |
|
|
| 10 | Documentation & comments | ☑ | `SecurityOptions` XML docs say direct bind uses `cn={username}` while the search filter uses `uid=` — comment is misleading (covered under Security-004). |
|
|
|
|
_Re-review (2026-05-28, `1eb6e97`):_
|
|
|
|
| # | Category | Examined | Notes |
|
|
|---|----------|----------|-------|
|
|
| 1 | Correctness & logic bugs | ☑ | `RoleMapper` drops a system-wide Deployment grant when the user is also in any site-scoped Deployment group (Security-016); hard-coded role-name string `"Deployment"` in two separate places allows a refactor to silently break site scoping (Security-018). |
|
|
| 2 | Akka.NET conventions | ☑ | No actors. `AddSecurityActors` is still a registration placeholder. No issues. |
|
|
| 3 | Concurrency & thread safety | ☑ | Services stateless; LDAP sync calls wrapped in `Task.Run` with the now-bounded timeout (Security-009 resolution holds). No issues found. |
|
|
| 4 | Error handling & resilience | ☑ | A service-account-rebind failure inside `AuthenticateAsync` is reported as "Invalid username or password", masking a misconfiguration as a user-credential error (Security-019). LDAP-failure rule + partial-outage path remain correctly enforced post-Security-012. |
|
|
| 5 | Security | ☑ | `SiteScopeRequirement` / `SiteScopeAuthorizationHandler` are dead code — no policy is registered that uses them and no production caller instantiates them, so declarative `[Authorize]` does not enforce site scoping (Security-017, cross-module partner of CentralUI-028). `RequireHttpsCookie=false` dev opt-out has no warning path — a production misconfiguration silently transmits the JWT bearer credential over HTTP (Security-021). |
|
|
| 6 | Performance & resource management | ☑ | Security-008 N+1 remains correctly Deferred (still gated on `ISecurityRepository`). No new perf issues. |
|
|
| 7 | Design-document adherence | ☑ | `RoleMapper`'s drop-system-wide-on-any-scoped behaviour (Security-016) contradicts the design's "A user can hold multiple roles simultaneously … roles are independent — there is no implied hierarchy" rule for the union case; `SiteScopeRequirement` advertises a site-scope authorization pattern the implementation does not actually wire up (Security-017). |
|
|
| 8 | Code organization & conventions | ☑ | Role-name strings are duplicated as magic literals across `RoleMapper.cs`, `SiteScopeAuthorizationHandler.cs`, and `AuthorizationPolicies.cs` — only the audit roles have a single source of truth via `OperationalAuditRoles` / `AuditExportRoles` (Security-018). `SecurityOptions` defaults pass through to runtime with no `IValidateOptions` for required fields like `LdapServer` / `LdapSearchBase` (Security-020). |
|
|
| 9 | Testing coverage | ☑ | No test covers a user mapped to both a system-wide AND a site-scoped Deployment LDAP group (the Security-016 case). No test covers the `SiteScopeRequirement` cross-page integration — tests evaluate the handler in isolation, not the absence of a policy that uses it (Security-017). |
|
|
| 10 | Documentation & comments | ☑ | `SiteScopeAuthorizationHandler` XML doc describes a permission model no caller actually invokes (Security-017). Otherwise stable. |
|
|
|
|
## Findings
|
|
|
|
### Security-001 — StartTLS upgrade path is unreachable dead code
|
|
|
|
| | |
|
|
|--|--|
|
|
| Severity | High |
|
|
| Category | Security |
|
|
| Status | Resolved |
|
|
| Location | `src/ScadaLink.Security/LdapAuthService.cs:37-47` |
|
|
|
|
**Description**
|
|
|
|
When `LdapUseTls` is true the code sets `connection.SecureSocketLayer = true` (LDAPS).
|
|
The subsequent StartTLS block is guarded by `if (_options.LdapUseTls && !connection.SecureSocketLayer)`.
|
|
Because `SecureSocketLayer` was just set to `true`, the second condition `!connection.SecureSocketLayer`
|
|
is always false, so `connection.StartTls()` is never called. The design doc explicitly
|
|
states LDAP connections must use **"LDAPS (port 636) or StartTLS"** — StartTLS is in
|
|
practice unsupported. A deployment that intends to use StartTLS on port 389 would get a
|
|
plaintext LDAPS-mode connection attempt that fails, or worse, an operator may disable
|
|
TLS entirely to make it work, sending credentials in cleartext.
|
|
|
|
**Recommendation**
|
|
|
|
Introduce an explicit transport mode (e.g. `LdapTransport { Ldaps, StartTls, None }`)
|
|
or a separate `LdapUseStartTls` flag. For StartTLS, leave `SecureSocketLayer` false,
|
|
call `connection.Connect`, then call `connection.StartTls()` and verify the negotiated
|
|
session is encrypted before binding. Remove the unreachable conditional.
|
|
|
|
**Resolution**
|
|
|
|
Resolved 2026-05-16 (commit `<pending>`). Added an explicit `LdapTransport` enum
|
|
(`Ldaps`/`StartTls`/`None`); `SecureSocketLayer` is set only for LDAPS, and the
|
|
StartTLS branch now connects in plaintext, calls `StartTls()`, and verifies
|
|
`connection.Tls` before binding. `LdapUseTls` is retained as a compatibility shim
|
|
mapping onto the enum. Regression tests `AuthenticateAsync_StartTlsTransport_AttemptsConnection`
|
|
and `AuthenticateAsync_NoTlsTransport_RejectedWithoutAllowInsecure`.
|
|
|
|
### Security-002 — Authentication cookie is not marked `Secure`
|
|
|
|
| | |
|
|
|--|--|
|
|
| Severity | High |
|
|
| Category | Security |
|
|
| Status | Resolved |
|
|
| Location | `src/ScadaLink.Security/ServiceCollectionExtensions.cs:16-23` |
|
|
|
|
**Description**
|
|
|
|
`AddCookie` sets `HttpOnly = true` and `SameSite = Strict` but never sets
|
|
`options.Cookie.SecurePolicy`. The ASP.NET Core default is `CookieSecurePolicy.SameAsRequest`,
|
|
which permits the cookie (carrying the embedded JWT — a bearer credential) to be sent
|
|
over plain HTTP. The design doc states the cookie is **"HttpOnly and Secure (requires
|
|
HTTPS)"**. As written, the module does not enforce that requirement; a misconfigured or
|
|
HTTP-fronted deployment would transmit the session token in cleartext.
|
|
|
|
**Recommendation**
|
|
|
|
Set `options.Cookie.SecurePolicy = CookieSecurePolicy.Always` in `AddCookie`. Consider
|
|
also setting `ExpireTimeSpan` and `SlidingExpiration` to align the cookie lifetime with
|
|
the documented 15-minute JWT / 30-minute idle policy.
|
|
|
|
**Resolution**
|
|
|
|
Resolved 2026-05-16 (commit `<pending>`). Confirmed the cookie is configured in this
|
|
module (`ServiceCollectionExtensions.AddSecurity`), not CentralUI — no misattribution.
|
|
Added `options.Cookie.SecurePolicy = CookieSecurePolicy.Always` so the JWT-bearing
|
|
cookie is never sent over plain HTTP. Regression test
|
|
`AddSecurity_AuthCookie_IsMarkedSecureAlways`. (`ExpireTimeSpan`/`SlidingExpiration`
|
|
tuning left as a separate, lower-priority improvement.)
|
|
|
|
### Security-003 — JWT signing key length is never validated
|
|
|
|
| | |
|
|
|--|--|
|
|
| Severity | High |
|
|
| Category | Security |
|
|
| Status | Resolved |
|
|
| Location | `src/ScadaLink.Security/JwtTokenService.cs:33`, `src/ScadaLink.Security/SecurityOptions.cs:42` |
|
|
|
|
**Description**
|
|
|
|
`SecurityOptions.JwtSigningKey` defaults to `string.Empty` and is fed directly into
|
|
`new SymmetricSecurityKey(Encoding.UTF8.GetBytes(_options.JwtSigningKey))` with no
|
|
validation. HMAC-SHA256 requires a key of at least 256 bits (32 bytes); a short or empty
|
|
key produces a trivially forgeable token. The `SecurityHardeningTests` comment claims a
|
|
minimum length is "enforced", but no code in this module enforces it — the test only
|
|
asserts that a 32+ char key works. A deployment with a missing or short `JwtSigningKey`
|
|
would start successfully and issue weakly-signed tokens.
|
|
|
|
**Recommendation**
|
|
|
|
Validate `JwtSigningKey` at startup — fail fast if it is empty or shorter than 32 bytes.
|
|
Use an `IValidateOptions<SecurityOptions>` validator or guard in the `JwtTokenService`
|
|
constructor so a weak key is rejected before any token is issued.
|
|
|
|
**Resolution**
|
|
|
|
Resolved 2026-05-16 (commit `<pending>`). The `JwtTokenService` constructor now
|
|
fails fast with an `InvalidOperationException` when `JwtSigningKey` is empty or
|
|
shorter than 32 bytes (`SecurityOptions.MinJwtSigningKeyBytes`), so a weak key is
|
|
rejected before any token is issued. The misleading `SecurityOptions` XML doc was
|
|
corrected to state the requirement. Regression tests
|
|
`JwtTokenService_EmptySigningKey_ThrowsAtConstruction`,
|
|
`JwtTokenService_ShortSigningKey_ThrowsAtConstruction`, and
|
|
`JwtTokenService_AdequateSigningKey_ConstructsSuccessfully`.
|
|
|
|
### Security-004 — Search filter uses `uid=` while fallback DN construction uses `cn=`
|
|
|
|
| | |
|
|
|--|--|
|
|
| Severity | Medium |
|
|
| Category | Correctness & logic bugs |
|
|
| Status | Resolved |
|
|
| Location | `src/ScadaLink.Security/LdapAuthService.cs:66`, `:138`, `:157-159` |
|
|
|
|
**Description**
|
|
|
|
`AuthenticateAsync` and `ResolveUserDnAsync` build the search filter as
|
|
`(uid={username})`, but the no-service-account fallback in `ResolveUserDnAsync`
|
|
constructs the bind DN as `cn={username},{LdapSearchBase}`. The `SecurityOptions.LdapServiceAccountDn`
|
|
XML comment also documents the fallback as `cn={username},{LdapSearchBase}`. A directory
|
|
keyed on `uid` will succeed via search-then-bind but fail via the direct-bind fallback
|
|
(and vice versa). The attribute used for lookup is hard-coded and inconsistent across
|
|
the two code paths, so the two configuration modes are not interchangeable.
|
|
|
|
**Recommendation**
|
|
|
|
Introduce a single configurable `LdapUserIdAttribute` (default `uid`) and use it
|
|
consistently in both the search filter and the fallback DN. Update the XML doc to match.
|
|
|
|
**Resolution**
|
|
|
|
Resolved 2026-05-16 (commit `pending`). Confirmed: the search filter was hard-coded
|
|
`(uid={username})` (both in `AuthenticateAsync` and `ResolveUserDnAsync`) while the
|
|
fallback DN used `cn={username}` — the two auth modes were not interchangeable. Added
|
|
a configurable `SecurityOptions.LdapUserIdAttribute` (default `uid`) used for both the
|
|
search filter and the fallback DN via the new `BuildFallbackUserDn` helper, and
|
|
corrected the `LdapServiceAccountDn` XML doc to reference `{LdapUserIdAttribute}`.
|
|
Regression tests `BuildFallbackUserDn_UsesConfiguredUserIdAttribute`,
|
|
`BuildFallbackUserDn_HonoursNonDefaultUserIdAttribute`,
|
|
`SecurityOptions_LdapUserIdAttribute_DefaultsToUid`.
|
|
|
|
### Security-005 — DN injection in the no-service-account bind fallback
|
|
|
|
| | |
|
|
|--|--|
|
|
| Severity | Medium |
|
|
| Category | Security |
|
|
| Status | Resolved |
|
|
| Location | `src/ScadaLink.Security/LdapAuthService.cs:157-159` |
|
|
|
|
**Description**
|
|
|
|
When no service account is configured, the user-supplied `username` is interpolated
|
|
directly into a distinguished name: `$"cn={username},{LdapSearchBase}"`. `EscapeLdapFilter`
|
|
escapes *search-filter* metacharacters, but DN construction requires a different
|
|
escaping scheme (RFC 4514 — `,`, `+`, `"`, `\`, `<`, `>`, `;`, leading/trailing spaces).
|
|
No DN escaping is applied here. A username such as `victim,ou=admins` alters the DN
|
|
structure, allowing a caller to attempt a bind as a different DN than intended. Combined
|
|
with the `username.Contains('=')` shortcut at line 129 — which lets a caller supply a
|
|
full arbitrary DN — the fallback path gives the client undue control over the bind
|
|
identity.
|
|
|
|
**Recommendation**
|
|
|
|
Apply RFC 4514 DN-component escaping to `username` before interpolation, or use the
|
|
LDAP library's DN-builder API. Reconsider the `Contains('=')` shortcut — accepting a
|
|
raw DN from untrusted input is risky; restrict it or remove it.
|
|
|
|
**Resolution**
|
|
|
|
Resolved 2026-05-16 (commit `pending`). Confirmed: the fallback path interpolated
|
|
the raw `username` straight into `cn={username},{LdapSearchBase}` with no DN escaping,
|
|
and the `username.Contains('=')` shortcut let a caller supply an arbitrary bind DN.
|
|
Added an RFC 4514 `EscapeLdapDn` helper (escapes `, + " \ < > ;`, leading/trailing
|
|
space, leading `#`, NUL) applied in `BuildFallbackUserDn`, so a username such as
|
|
`victim,ou=admins` can no longer alter the DN structure. The `Contains('=')` raw-DN
|
|
shortcut was removed entirely — untrusted input no longer controls the bind identity.
|
|
Regression tests `BuildFallbackUserDn_EscapesDnMetacharacters`,
|
|
`EscapeLdapDn_EscapesAllRfc4514Specials`, `EscapeLdapDn_EscapesLeadingAndTrailingSpaces`.
|
|
|
|
### Security-006 — JWT validation disables issuer and audience checks
|
|
|
|
| | |
|
|
|--|--|
|
|
| Severity | Medium |
|
|
| Category | Security |
|
|
| Status | Resolved |
|
|
| Location | `src/ScadaLink.Security/JwtTokenService.cs:67-75`, `:56-59` |
|
|
|
|
**Description**
|
|
|
|
`ValidateToken` sets `ValidateIssuer = false` and `ValidateAudience = false`, and
|
|
`GenerateToken` never sets an `iss` or `aud`. With a shared symmetric HMAC key, any
|
|
other system or component that signs JWTs with the same key would produce tokens this
|
|
service accepts. While the design states the key is shared only between the two central
|
|
nodes, omitting issuer/audience binding removes a cheap defense-in-depth control and
|
|
makes accidental key reuse (e.g. the same secret used for another internal token)
|
|
silently exploitable.
|
|
|
|
**Recommendation**
|
|
|
|
Set a fixed `Issuer` and `Audience` (e.g. `"scadalink-central"`) when generating tokens
|
|
and enable `ValidateIssuer`/`ValidateAudience` with the matching expected values during
|
|
validation.
|
|
|
|
**Resolution**
|
|
|
|
Resolved 2026-05-16 (commit `pending`). Confirmed: `GenerateToken` set neither `iss`
|
|
nor `aud` and `ValidateToken` had `ValidateIssuer = false`/`ValidateAudience = false`.
|
|
`GenerateToken` now binds `JwtTokenService.TokenIssuer`/`TokenAudience`
|
|
(both `"scadalink-central"`) into every token, and `ValidateToken` enables
|
|
`ValidateIssuer`/`ValidateAudience` against those fixed values — a token signed with
|
|
the shared key but a foreign issuer is now rejected. Regression tests
|
|
`GenerateToken_SetsIssuerAndAudience`, `ValidateToken_RejectsTokenWithWrongIssuer`,
|
|
`ValidateToken_AcceptsOwnIssuerAndAudience`.
|
|
|
|
### Security-007 — Idle-timeout claim is reset on every token refresh
|
|
|
|
| | |
|
|
|--|--|
|
|
| Severity | Medium |
|
|
| Category | Correctness & logic bugs |
|
|
| Status | Resolved |
|
|
| Location | `src/ScadaLink.Security/JwtTokenService.cs:40`, `:111-123` |
|
|
|
|
**Description**
|
|
|
|
The design states the 30-minute idle timeout is tracked via a "last-activity timestamp
|
|
in the token", and `IsIdleTimedOut` reads the `LastActivity` claim. But `RefreshToken`
|
|
calls `GenerateToken`, which unconditionally writes `LastActivity = DateTimeOffset.UtcNow`.
|
|
Token refresh fires whenever a request arrives within ~5 minutes of expiry. The result
|
|
is that `LastActivity` reflects *token issuance time*, not genuine user activity — and
|
|
since refresh itself is a request, the timestamp keeps moving forward. A more subtle
|
|
consequence: the idle window is effectively measured from the last refresh, not the
|
|
last real interaction, so the documented "no requests within the idle window" semantics
|
|
are not faithfully implemented. The claim name `LastActivity` is also misleading.
|
|
|
|
**Recommendation**
|
|
|
|
Decide explicitly how activity is tracked. Either (a) carry the original `LastActivity`
|
|
forward across refreshes and update it only on real request handling in the middleware,
|
|
or (b) rename the claim to `IssuedAt`/`TokenCreated` and document that the idle window
|
|
is measured from issuance. Whichever is chosen, ensure `IsIdleTimedOut` and the refresh
|
|
path agree on the semantics.
|
|
|
|
**Resolution**
|
|
|
|
Resolved 2026-05-16 (commit `pending`). Confirmed: `RefreshToken` → `GenerateToken`
|
|
unconditionally wrote `LastActivity = UtcNow`, so the idle clock reset on every
|
|
refresh and the documented 30-minute idle timeout could never fire for a client that
|
|
polls in the background. Implemented option (a) — the Security-side half of the
|
|
documented "15-min sliding + 30-min idle" policy (the cross-module partner of
|
|
CentralUI-005): `GenerateToken` now takes an optional `lastActivity` anchor;
|
|
`RefreshToken` carries the **existing** `LastActivity` claim forward unchanged, and a
|
|
new explicit `RecordActivity` method advances the anchor to now — to be called by the
|
|
CentralUI request pipeline on genuine user interaction (not on a background refresh).
|
|
`IsIdleTimedOut` is unchanged and now agrees with the refresh path. The remaining
|
|
CentralUI-side wiring (calling `RecordActivity` from the middleware, setting
|
|
`SlidingExpiration`/`ExpireTimeSpan`) stays tracked under CentralUI-005; this finding's
|
|
Security-side defect — the reset-on-refresh bug — is fully fixed here. Regression tests
|
|
`RefreshToken_PreservesOriginalLastActivityClaim`,
|
|
`RefreshToken_DoesNotResetIdleTimeoutWhenUserIsActuallyIdle`,
|
|
`RecordActivity_UpdatesLastActivityToNow`.
|
|
|
|
### Security-008 — N+1 query loading site-scope rules in `RoleMapper`
|
|
|
|
| | |
|
|
|--|--|
|
|
| Severity | Low |
|
|
| Category | Performance & resource management |
|
|
| Status | Deferred |
|
|
| Location | `src/ScadaLink.Security/RoleMapper.cs:25-48` |
|
|
|
|
**Description**
|
|
|
|
`MapGroupsToRolesAsync` first calls `GetAllMappingsAsync`, then inside the per-mapping
|
|
loop calls `GetScopeRulesForMappingAsync(mapping.Id, ct)` once for every matched
|
|
Deployment mapping. This is an N+1 query pattern executed on the login hot path and on
|
|
every 15-minute token refresh. With multiple site-scoped Deployment groups it issues a
|
|
round-trip per group.
|
|
|
|
**Recommendation**
|
|
|
|
Add a repository method that loads scope rules for a set of mapping IDs in one query
|
|
(or eager-loads them with the mappings), and resolve all scope rules with a single call.
|
|
|
|
**Resolution**
|
|
|
|
Deferred 2026-05-16 (commit `pending commit`). Finding confirmed accurate: `RoleMapper.MapGroupsToRolesAsync`
|
|
issues one `GetScopeRulesForMappingAsync` round-trip per matched Deployment mapping — a
|
|
genuine N+1 on the login / 15-minute-refresh path. However, the only correct fix
|
|
(a batch `GetScopeRulesForMappingsAsync(IEnumerable<int>)` repository method, or an
|
|
eager-load navigation property) requires changes to `ISecurityRepository`
|
|
(`src/ScadaLink.Commons`) and `SecurityRepository` (`src/ScadaLink.ConfigurationDatabase`).
|
|
Both are outside the Security module's permitted edit scope for this review pass, and the
|
|
existing `ISecurityRepository` surface offers no per-set scope-rule query, so the N+1
|
|
cannot be removed from within `RoleMapper.cs` alone. Severity is Low (bounded by the
|
|
number of site-scoped Deployment groups, typically small). Deferred to a cross-module
|
|
change that adds the batch repository method; `RoleMapper` should then resolve all scope
|
|
rules in a single call.
|
|
|
|
### Security-009 — CancellationToken not honored inside `Task.Run` LDAP calls
|
|
|
|
| | |
|
|
|--|--|
|
|
| Severity | Low |
|
|
| Category | Error handling & resilience |
|
|
| Status | Resolved |
|
|
| Location | `src/ScadaLink.Security/LdapAuthService.cs:42`, `:46`, `:51`, `:56-57`, `:67-73`, `:135`, `:139-145` |
|
|
|
|
**Description**
|
|
|
|
The synchronous Novell LDAP calls are wrapped in `Task.Run(() => ..., ct)`. The `ct`
|
|
argument only prevents the work item from *starting* if cancellation is already
|
|
signaled; once a `connection.Connect`/`Bind`/`Search` call is in progress it cannot be
|
|
cancelled. A cancelled or timed-out login request will continue to occupy a thread-pool
|
|
thread and an LDAP connection until the blocking call returns on its own. There is also
|
|
no explicit network/operation timeout configured on the `LdapConnection`.
|
|
|
|
**Recommendation**
|
|
|
|
Configure `LdapConnection.ConnectionTimeout` and search/operation time limits so a
|
|
hung LDAP server cannot pin a thread indefinitely. Document that `ct` only guards
|
|
work-item scheduling, or implement a timeout-with-disconnect fallback.
|
|
|
|
**Resolution**
|
|
|
|
Resolved 2026-05-16 (commit `pending commit`). Confirmed: the synchronous Novell LDAP
|
|
calls were wrapped in `Task.Run(..., ct)` where `ct` only prevents the work item from
|
|
starting and cannot interrupt an in-progress blocking `Connect`/`Bind`/`Search`, and no
|
|
network/operation timeout was configured on the `LdapConnection`. Added a configurable
|
|
`SecurityOptions.LdapConnectionTimeoutMs` (default 10s) and a new `ApplyConnectionTimeout`
|
|
helper that sets both `LdapConnection.ConnectionTimeout` (socket connect) and
|
|
`LdapConstraints.TimeLimit` (per-operation limit) before connecting, so a hung LDAP
|
|
server can no longer pin a thread-pool thread indefinitely. The `ct`-only-guards-scheduling
|
|
limitation is now documented in the option's XML doc and an inline comment. Regression
|
|
tests `SecurityOptions_LdapConnectionTimeout_HasSaneDefault` and
|
|
`AuthenticateAsync_UnreachableHost_FailsWithinConfiguredTimeout`.
|
|
|
|
### Security-010 — Design doc contradicts itself on Windows Integrated Authentication
|
|
|
|
| | |
|
|
|--|--|
|
|
| Severity | Low |
|
|
| Category | Design-document adherence |
|
|
| Status | Resolved |
|
|
| Location | `docs/requirements/Component-Security.md:13` (vs. `:23`) |
|
|
|
|
**Description**
|
|
|
|
The Responsibilities section states the component authenticates "using Windows
|
|
Integrated Authentication", but the Authentication section (line 23) and CLAUDE.md
|
|
explicitly state **"No Windows Integrated Authentication ... authenticates directly
|
|
against LDAP/AD, not via Kerberos/NTLM"** — which is what the code actually does
|
|
(direct LDAP bind). The Responsibilities line is stale and contradicts both the rest of
|
|
the doc and the implementation.
|
|
|
|
**Recommendation**
|
|
|
|
Fix `Component-Security.md:13` to say "using a direct LDAP/Active Directory bind"
|
|
to match the implemented behavior and the rest of the document.
|
|
|
|
**Resolution**
|
|
|
|
Resolved 2026-05-16 (commit `pending commit`). Confirmed: the Responsibilities bullet at
|
|
line 13 said "using Windows Integrated Authentication", directly contradicting the
|
|
Authentication section's "No Windows Integrated Authentication ... authenticates directly
|
|
against LDAP/AD, not via Kerberos/NTLM", CLAUDE.md, and the implementation (`LdapAuthService`
|
|
performs a direct username/password bind). Documentation-only finding — no regression test
|
|
is meaningful. Reworded the Responsibilities bullet to "Authenticate users against
|
|
LDAP/Active Directory using a direct LDAP/AD bind (username/password)", matching the rest
|
|
of the document and the code.
|
|
|
|
### Security-011 — Missing tests for security-critical paths
|
|
|
|
| | |
|
|
|--|--|
|
|
| Severity | Low |
|
|
| Category | Testing coverage |
|
|
| Status | Resolved |
|
|
| Location | `tests/ScadaLink.Security.Tests/UnitTest1.cs` |
|
|
|
|
**Description**
|
|
|
|
The test suite covers happy paths well but omits several security-relevant cases:
|
|
no test exercises the StartTLS path (Security-001), the DN-injection / `Contains('=')`
|
|
fallback inputs (Security-005), JWT validation with a too-short or empty signing key
|
|
(Security-003), `IsIdleTimedOut` returning true after a token has been refreshed
|
|
(Security-007), or the `uid`/`cn` mismatch in the no-service-account path (Security-004).
|
|
The integration `SecurityHardeningTests` only asserts default option values, not
|
|
enforcement. The test file is still named `UnitTest1.cs`.
|
|
|
|
**Recommendation**
|
|
|
|
Add negative/edge-case tests for the items above, particularly key-length rejection,
|
|
DN-escaping of hostile usernames, and idle-timeout behavior across a refresh. Rename
|
|
`UnitTest1.cs` to a descriptive name.
|
|
|
|
**Resolution**
|
|
|
|
Resolved 2026-05-16 (commit `pending commit`). Re-triage: most of the listed gaps were
|
|
already closed by the Security-001..007 fixes — the regression classes
|
|
`SecurityReviewRegressionTests` (StartTLS path, JWT empty/short-key rejection) and
|
|
`SecurityReviewRegressionTests2` (DN-injection / hostile-username escaping, `uid`/`cn`
|
|
fallback consistency, idle-timeout preserved across refresh) cover them. The finding's
|
|
reference to a `SecurityHardeningTests` class is stale — no such class exists in the
|
|
suite. Remaining gaps closed here: renamed the test file `UnitTest1.cs` →
|
|
`SecurityTests.cs`, and added a new `SecurityReviewRegressionTests3` class with the
|
|
LDAP-timeout coverage (Security-009) plus extra no-service-account / DN-path edge cases
|
|
(`BuildFallbackUserDn_NoSearchBase_ReturnsBareRdn`, `EscapeLdapDn_LeadingHash_IsEscaped`,
|
|
`EscapeLdapDn_NullOrEmpty_ReturnedUnchanged`). Full module suite: 54 tests, all green.
|
|
|
|
### Security-012 — Partial LDAP failure during login yields a roleless authenticated session
|
|
|
|
| | |
|
|
|--|--|
|
|
| Severity | Medium |
|
|
| Category | Error handling & resilience |
|
|
| Status | Resolved |
|
|
| Location | `src/ScadaLink.Security/LdapAuthService.cs:78-118` |
|
|
|
|
**Description**
|
|
|
|
In `AuthenticateAsync`, after the user bind succeeds, the group/attribute `Search` is
|
|
wrapped in a `try`/`catch (LdapException)` that logs a warning and continues — the
|
|
comment states "Auth succeeded even if attribute lookup failed". The method then returns
|
|
`new LdapAuthResult(true, displayName, username, groups, null)` with an **empty
|
|
`groups`** list. Because `RoleMapper.MapGroupsToRolesAsync` derives all roles from that
|
|
group list, a login during a *partial* LDAP outage (bind reachable, search/group server
|
|
unavailable, or a transient `LdapException` mid-search) produces a fully authenticated
|
|
session with **zero roles** — the user is logged in but silently stripped of all
|
|
permissions. The design's LDAP Connection Failure rule states new logins must **fail**
|
|
when LDAP is unavailable; this path instead admits the user. The caller cannot
|
|
distinguish "user genuinely belongs to no mapped groups" from "the group query failed",
|
|
so it cannot make the documented fail-the-login decision. The `while` loop's inner
|
|
`catch (LdapException) { break; }` (line 107-111) compounds this: a real error mid-enumeration
|
|
is indistinguishable from end-of-results and silently truncates the group list.
|
|
|
|
**Recommendation**
|
|
|
|
Treat a failed group/attribute lookup on the *initial login* as an authentication
|
|
failure: return `Success = false` with a "directory unavailable, try again" message, or
|
|
add an explicit flag on `LdapAuthResult` (e.g. `GroupLookupSucceeded`) so the caller can
|
|
apply the design's fail-the-login rule. The inner `while`-loop `catch` should not treat
|
|
an arbitrary `LdapException` as a benign end-of-results sentinel — only the specific
|
|
"no more results" condition should break the loop; any other exception should propagate
|
|
or be surfaced.
|
|
|
|
**Resolution**
|
|
|
|
Resolved 2026-05-17 (commit `pending`). Confirmed: the group/attribute `Search` was
|
|
wrapped in a `catch (LdapException)` that logged a warning and returned
|
|
`new LdapAuthResult(true, …, groups: [])` — a partial LDAP outage produced an
|
|
authenticated, zero-role session — and the inner `while`-loop `catch { break; }` masked
|
|
real mid-enumeration errors as end-of-results. `AuthenticateAsync` now tracks a
|
|
`groupLookupSucceeded` flag (false when any `LdapException` is thrown by the search or
|
|
`Next()`); the inner swallowing catch was removed so such errors propagate; the new
|
|
`BuildAuthResultFromGroupLookup` helper returns a FAILED `LdapAuthResult`
|
|
("directory temporarily unavailable, please try again") on lookup failure while still
|
|
treating a genuine empty-groups result as a successful login. Regression tests
|
|
`BuildAuthResultFromGroupLookup_LookupFailed_FailsTheLogin`,
|
|
`BuildAuthResultFromGroupLookup_LookupSucceededNoGroups_IsAuthenticated`,
|
|
`BuildAuthResultFromGroupLookup_LookupSucceededWithGroups_CarriesGroups`.
|
|
|
|
### Security-013 — `ExtractFirstRdnValue` mis-parses group DNs containing escaped commas
|
|
|
|
| | |
|
|
|--|--|
|
|
| Severity | Low |
|
|
| Category | Correctness & logic bugs |
|
|
| Status | Resolved |
|
|
| Location | `src/ScadaLink.Security/LdapAuthService.cs:258-269` |
|
|
|
|
**Description**
|
|
|
|
`ExtractFirstRdnValue` extracts a group name from a `memberOf` DN by taking the substring
|
|
between the first `=` and the first `,`. RFC 4514 permits a `,` inside an RDN value when
|
|
it is backslash-escaped — e.g. `cn=Acme\, Inc Operators,ou=groups,dc=example,dc=com`. The
|
|
method splits on the *escaped* comma and returns `Acme\` as the group name. `RoleMapper`
|
|
then matches that mangled name verbatim against configured `LdapGroupMapping.LdapGroupName`
|
|
values, so a group whose CN legitimately contains a comma silently fails to map to its
|
|
role — the user loses that role with no error. The same naive parsing also does not strip
|
|
the trailing backslash escape.
|
|
|
|
**Recommendation**
|
|
|
|
Parse the first RDN with an escape-aware scan: ignore a `,` preceded by an unescaped
|
|
`\`, and unescape RFC 4514 sequences in the extracted value. Alternatively use the LDAP
|
|
library's DN-parsing API (`LdapDN` / RDN accessors) rather than hand-rolled `IndexOf`.
|
|
|
|
**Resolution**
|
|
|
|
Resolved 2026-05-17 (commit `pending`). Confirmed: `ExtractFirstRdnValue` took the
|
|
substring between the first `=` and the first `,`, splitting on an RFC 4514
|
|
backslash-escaped comma so a CN like `Acme\, Inc Operators` became `Acme\`. Replaced the
|
|
naive `IndexOf` scan with an escape-aware character scan that ignores a backslash-escaped
|
|
`,`, unescapes single-character and `\XX` hex escape sequences, and only terminates the
|
|
value on an unescaped comma — so a comma-containing group CN is returned intact and
|
|
matches its configured `LdapGroupName`. Method was also made `public` for direct
|
|
testing. Regression tests `ExtractFirstRdnValue_EscapedComma_KeepsWholeGroupName`,
|
|
`ExtractFirstRdnValue_PlainDn_ReturnsFirstRdnValue`,
|
|
`ExtractFirstRdnValue_SingleRdn_ReturnsValue`,
|
|
`ExtractFirstRdnValue_EscapedSpecials_AreUnescaped`.
|
|
|
|
### Security-014 — `RefreshToken` re-issues a token without checking the idle timeout
|
|
|
|
| | |
|
|
|--|--|
|
|
| Severity | Medium |
|
|
| Category | Security |
|
|
| Status | Resolved |
|
|
| Location | `src/ScadaLink.Security/JwtTokenService.cs:156-169` |
|
|
|
|
**Description**
|
|
|
|
`RefreshToken` carries the existing `LastActivity` claim forward (correct, per the
|
|
Security-007 fix) and unconditionally issues a fresh token with a new 15-minute expiry.
|
|
It never calls `IsIdleTimedOut`. The documented 30-minute idle policy therefore depends
|
|
entirely on the CentralUI request pipeline calling `IsIdleTimedOut` *before* calling
|
|
`RefreshToken` — two independent, order-dependent checks with no enforcement that they
|
|
are used together. If a caller refreshes without first checking idle state (an easy
|
|
mistake, and there is no compiler or API signal preventing it), an idle-expired session
|
|
is silently renewed: each refresh resets the JWT expiry while `LastActivity` keeps
|
|
ageing, so the session can be kept alive indefinitely by background refreshes even
|
|
though the user has been idle for hours. `RefreshToken` is the natural place to enforce
|
|
the invariant but provides no safety net.
|
|
|
|
**Recommendation**
|
|
|
|
Have `RefreshToken` itself reject a principal that is already past the idle timeout —
|
|
return `null` (the same "cannot refresh" signal it already uses for missing claims) when
|
|
`IsIdleTimedOut(currentPrincipal)` is true — so the idle policy holds regardless of
|
|
caller discipline. Document that an idle-expired principal cannot be refreshed and add a
|
|
regression test covering refresh of a 31-minute-idle token.
|
|
|
|
**Resolution**
|
|
|
|
Resolved 2026-05-17 (commit `pending`). Confirmed: `RefreshToken` carried `LastActivity`
|
|
forward and minted a fresh 15-minute token without ever calling `IsIdleTimedOut`, so the
|
|
idle policy depended entirely on caller discipline and a background refresh could keep an
|
|
idle-expired session alive indefinitely. `RefreshToken` now calls
|
|
`IsIdleTimedOut(currentPrincipal)` after the missing-claims check and returns `null`
|
|
(its existing "cannot refresh" signal) when the session is past the idle window, so the
|
|
30-minute idle policy holds regardless of caller order; XML doc updated to state the
|
|
invariant. Regression tests `RefreshToken_IdleExpiredPrincipal_ReturnsNull`,
|
|
`RefreshToken_ActiveSessionWithinIdleWindow_StillRefreshes`,
|
|
`RefreshToken_MissingLastActivityClaim_ReturnsNull`.
|
|
|
|
### Security-015 — Username is not trimmed before use in the LDAP filter, fallback DN, and JWT claims
|
|
|
|
| | |
|
|
|--|--|
|
|
| Severity | Low |
|
|
| Category | Correctness & logic bugs |
|
|
| Status | Resolved |
|
|
| Location | `src/ScadaLink.Security/LdapAuthService.cs:20-21`, `:80`, `:122`, `:169`, `:193` |
|
|
|
|
**Description**
|
|
|
|
`AuthenticateAsync` rejects null/whitespace-only usernames via `IsNullOrWhiteSpace`, but
|
|
an otherwise-valid username with leading or trailing whitespace (`" alice"`, copy-paste
|
|
artefacts, mobile keyboards) is passed through verbatim. It flows into the LDAP search
|
|
filter `({attr}={EscapeLdapFilter(username)})`, into the fallback bind DN via
|
|
`BuildFallbackUserDn`, and — most consequentially — into the returned `LdapAuthResult`
|
|
and from there into the JWT `Username` claim. Most LDAP directories ignore surrounding
|
|
whitespace on a search term, so `" alice"` and `"alice"` may both authenticate but yield
|
|
JWT `Username` claims that differ by whitespace. Any downstream identity comparison
|
|
(audit-log `who`, site-scope lookups keyed by username, session de-duplication) then
|
|
sees two distinct identities for the same person.
|
|
|
|
**Recommendation**
|
|
|
|
Trim the username once at the top of `AuthenticateAsync` (after the `IsNullOrWhiteSpace`
|
|
guard) and use the trimmed value consistently for the filter, the DN, and the
|
|
`LdapAuthResult`. This normalises the identity that ends up in the JWT claim and audit
|
|
trail.
|
|
|
|
**Resolution**
|
|
|
|
Resolved 2026-05-17 (commit `pending`). Confirmed: an otherwise-valid username with
|
|
leading/trailing whitespace passed the `IsNullOrWhiteSpace` guard and flowed verbatim
|
|
into the LDAP filter, the fallback DN, and the returned `LdapAuthResult` (and thus the
|
|
JWT `Username` claim). Added a `NormalizeUsername` helper (trims whitespace) called once
|
|
at the top of `AuthenticateAsync` immediately after the guard; the local `username`
|
|
variable is reassigned to the trimmed value so the filter, fallback DN, and result all
|
|
use the single canonical identity. Regression tests
|
|
`NormalizeUsername_TrimsLeadingAndTrailingWhitespace`,
|
|
`BuildFallbackUserDn_TrimmedUsername_NoLeadingTrailingSpace`,
|
|
`AuthenticateAsync_UsernameWithSurroundingWhitespace_StillRejectedForInsecure`.
|
|
|
|
### Security-016 — `RoleMapper` silently drops the system-wide Deployment grant when a user is also in any site-scoped Deployment group
|
|
|
|
| | |
|
|
|--|--|
|
|
| Severity | Medium |
|
|
| Category | Correctness & logic bugs |
|
|
| Status | Resolved |
|
|
| Location | `src/ScadaLink.Security/RoleMapper.cs:30-31`, `:41-55`, `:59` |
|
|
|
|
**Description**
|
|
|
|
`MapGroupsToRolesAsync` resolves the Deployment role's site scope as a single
|
|
`isSystemWide = hasDeploymentRole && !hasDeploymentWithScopeRules` flag computed across
|
|
ALL matched Deployment mappings. If a user is a member of both a system-wide Deployment
|
|
group (e.g. `SCADA-Deploy-All`, no scope rules) AND a site-scoped Deployment group
|
|
(e.g. `SCADA-Deploy-SiteA`, one scope rule for Site A), the second mapping sets
|
|
`hasDeploymentWithScopeRules = true`, so the final `isSystemWide` becomes `false` and
|
|
the returned `PermittedSiteIds` is just `[SiteA]`. The system-wide grant from
|
|
`SCADA-Deploy-All` is silently dropped — the user loses access to every other site, even
|
|
though one of their LDAP groups was intended to grant them system-wide reach. This
|
|
contradicts the design's "A user can hold multiple roles simultaneously … roles are
|
|
independent — there is no implied hierarchy" intent: the union of grants should be the
|
|
broadest grant in the set, not the narrowest. The mistake is also non-obvious to an
|
|
operator: from the Admin → LDAP Mappings page nothing flags that adding a site-scoped
|
|
Deployment mapping for a user already in `SCADA-Deploy-All` *removes* sites from their
|
|
effective grant. The downstream `SiteScopeService.IsSystemWideAsync` / `FilterSitesAsync`
|
|
faithfully reproduce this narrowing, so the user can no longer see or act on sites
|
|
outside `[SiteA]`.
|
|
|
|
**Recommendation**
|
|
|
|
Track the union semantics explicitly: if any matched Deployment mapping has no scope
|
|
rules, the user is system-wide regardless of what other mappings have. The simplest
|
|
change is to set `hasDeploymentWithScopeRules` only when the mapping has scope rules
|
|
AND another flag `hasUnscopedDeploymentMapping` is false; then compute
|
|
`isSystemWide = hasUnscopedDeploymentMapping || (hasDeploymentRole && !hasDeploymentWithScopeRules)`.
|
|
Equivalently: collect per-mapping `(hasRules, scopedSiteIds)` first, then
|
|
`isSystemWide = any mapping has hasRules==false`, and `permittedSiteIds = union of all
|
|
scopedSiteIds` (left empty for system-wide users). Add a regression test
|
|
`MapGroupsToRoles_UserInBothSystemWideAndScopedDeploymentGroup_IsSystemWide` covering
|
|
the design's example pair `SCADA-Deploy-All` + `SCADA-Deploy-SiteA`.
|
|
|
|
**Resolution**
|
|
|
|
Resolved 2026-05-28 (commit pending). `RoleMapper.MapGroupsToRolesAsync` now tracks two
|
|
independent flags per matched Deployment mapping: `hasUnscopedDeploymentMapping` (any
|
|
matched mapping with no scope rules) and `hasScopedDeploymentMapping` (any matched
|
|
mapping with scope rules). `isSystemWide` is `hasUnscopedDeploymentMapping ||
|
|
(hasDeploymentRole && !hasScopedDeploymentMapping)` — so a user in both
|
|
`SCADA-Deploy-All` and `SCADA-Deploy-SiteA` is now correctly system-wide, with the
|
|
accumulated scope ids cleared. Magic string `"Deployment"` was replaced with the new
|
|
`Roles.Deployment` constant (Security-018). Regression test
|
|
`MapGroupsToRoles_UserInBothSystemWideAndScopedDeploymentGroup_IsSystemWide`
|
|
(seeds Site A, attaches a scope rule to the seeded `SCADA-Deploy-SiteA` mapping, and
|
|
asserts a user mapped via both `SCADA-Deploy-All` and `SCADA-Deploy-SiteA` resolves
|
|
to system-wide with empty `PermittedSiteIds`) fails on the pre-fix code and passes
|
|
after.
|
|
|
|
### Security-017 — `SiteScopeRequirement` / `SiteScopeAuthorizationHandler` are dead code from production callers — `[Authorize(Policy = RequireDeployment)]` does NOT enforce site scoping
|
|
|
|
| | |
|
|
|--|--|
|
|
| Severity | Medium |
|
|
| Category | Design-document adherence |
|
|
| Status | Resolved |
|
|
| Location | `src/ScadaLink.Security/SiteScopeAuthorizationHandler.cs:8-58`; `src/ScadaLink.Security/AuthorizationPolicies.cs:113-143` |
|
|
|
|
**Description**
|
|
|
|
The module declares `SiteScopeRequirement` (an `IAuthorizationRequirement` carrying a
|
|
`TargetSiteId`) and the matching `SiteScopeAuthorizationHandler` that combines the
|
|
Deployment role claim with the `SiteId` claims to enforce the design's site-scoping
|
|
rule. The handler is registered in `AddScadaLinkAuthorization`
|
|
(`services.AddSingleton<IAuthorizationHandler, SiteScopeAuthorizationHandler>()`). But
|
|
no `AddPolicy` call ever wires the requirement to a named policy, and a grep across
|
|
`src/ScadaLink.CentralUI` and `src/ScadaLink.ManagementService` confirms that **no
|
|
production code ever instantiates `new SiteScopeRequirement(...)` or calls
|
|
`AuthorizeAsync(...)` with one** — the only callers are the unit tests in
|
|
`SecurityTests.cs:1146,1166,1185,1203`. The design + CLAUDE.md state that "Deployment
|
|
and Monitoring pages must filter every site/instance list through `FilterSitesAsync`
|
|
and re-check `IsSiteAllowedAsync` before any cross-site command", and the
|
|
CentralUI-028 finding (High, Open) confirms this is exactly the contract two new
|
|
report pages forgot — because there is no declarative `[Authorize(Policy = ...)]`
|
|
shortcut, callers must remember to inject `SiteScopeService` and write the check by
|
|
hand, and any new page that forgets is a silent regression with no compile-time or
|
|
test-time signal. The module's published surface advertises an authorization-handler
|
|
pattern that is, in practice, unwired plumbing.
|
|
|
|
**Recommendation**
|
|
|
|
Either (a) **delete** `SiteScopeRequirement` and `SiteScopeAuthorizationHandler` (and
|
|
the dead `IAuthorizationHandler` registration) and document `SiteScopeService` as the
|
|
sole site-scoping mechanism — this is the smaller change and matches what the codebase
|
|
actually does today; or, preferably, (b) **finish the wiring**: add a `RequireSiteScope`
|
|
policy that uses `SiteScopeRequirement` and provide a small helper / source generator
|
|
or analyzer that flags Deployment-policy-attributed pages without a site-scope check.
|
|
Either way, address the cross-module gap: CentralUI-028 stays open until production
|
|
pages reliably enforce the rule. If (b) is chosen, a route-parameter-aware
|
|
`IAuthorizationPolicyProvider` is needed so the policy can read the target site id from
|
|
the request — that is a meaningful design extension and would need to be planned
|
|
alongside the Central UI's existing `SiteScopeService` usage rather than replacing it
|
|
piecemeal.
|
|
|
|
**Resolution**
|
|
|
|
Resolved 2026-05-28 (commit pending) per recommendation option (a): deleted
|
|
`SiteScopeRequirement` and `SiteScopeAuthorizationHandler` outright, along with the
|
|
unwired `services.AddSingleton<IAuthorizationHandler, SiteScopeAuthorizationHandler>()`
|
|
registration in `AuthorizationPolicies.AddScadaLinkAuthorization` and the four
|
|
isolation tests in `SecurityTests.cs`. `SiteScopeService` (CentralUI-002) is now
|
|
documented as the sole site-scoping mechanism — the stale "mirrors
|
|
SiteScopeAuthorizationHandler" comment in `SiteScopeService.ResolveAsync` was rewritten
|
|
to say so. The cross-module partner CentralUI-028 is fixed in the same batch (the two
|
|
new report pages now consume `SiteScopeService`).
|
|
|
|
### Security-018 — Role names are hard-coded magic strings duplicated across `RoleMapper`, `SiteScopeAuthorizationHandler`, and `AuthorizationPolicies`
|
|
|
|
| | |
|
|
|--|--|
|
|
| Severity | Low |
|
|
| Category | Code organization & conventions |
|
|
| Status | Resolved |
|
|
| Location | `src/ScadaLink.Security/RoleMapper.cs:41`; `src/ScadaLink.Security/SiteScopeAuthorizationHandler.cs:36`; `src/ScadaLink.Security/AuthorizationPolicies.cs:118,121,124,95,107` |
|
|
|
|
**Description**
|
|
|
|
The role-name literals `"Admin"`, `"Design"`, `"Deployment"`, `"Audit"`, and
|
|
`"AuditReadOnly"` are duplicated as magic strings across three separate files:
|
|
`RoleMapper.cs:41` hard-codes `"Deployment"` to detect the site-scope branch;
|
|
`SiteScopeAuthorizationHandler.cs:36` independently hard-codes `"Deployment"` to gate
|
|
the handler; and `AuthorizationPolicies.cs:118,121,124` hard-code the four role names
|
|
as the policy `RequireClaim` values. Only the audit roles have a single source of truth
|
|
(via the `OperationalAuditRoles` / `AuditExportRoles` arrays on
|
|
`AuthorizationPolicies`). A future rename or addition of a role that misses any one of
|
|
these call sites silently breaks the system: e.g. renaming "Deployment" → "Deployer"
|
|
in `RoleMapper` alone would leave the policy still requiring `"Deployment"` (logins
|
|
get the new role name but the policy never matches), while changing it in the policy
|
|
alone would leave `RoleMapper` failing to populate scope rules for the renamed role.
|
|
The bug class is "string drift" — exactly the kind the `OperationalAuditRoles` constant
|
|
was introduced to prevent.
|
|
|
|
**Recommendation**
|
|
|
|
Introduce a `public static class Roles { public const string Admin = "Admin"; public const
|
|
string Design = "Design"; public const string Deployment = "Deployment"; public const string
|
|
Audit = "Audit"; public const string AuditReadOnly = "AuditReadOnly"; }` in the Security
|
|
project and replace every magic-string occurrence — including the elements of
|
|
`OperationalAuditRoles` and `AuditExportRoles` — with the constants. A single rename
|
|
will then either succeed everywhere or fail to compile.
|
|
|
|
**Resolution**
|
|
|
|
Resolved 2026-05-28 (commit pending). Added `src/ScadaLink.Security/Roles.cs` holding
|
|
`Admin`/`Design`/`Deployment`/`Audit`/`AuditReadOnly` as `public const string`
|
|
fields. Replaced every magic-string occurrence in this module:
|
|
`RoleMapper.MapGroupsToRolesAsync` now compares against `Roles.Deployment`;
|
|
`AuthorizationPolicies.AddScadaLinkAuthorization` binds `RequireClaim(...)` to
|
|
`Roles.{Admin,Design,Deployment}`; `OperationalAuditRoles` /
|
|
`AuditExportRoles` are now built from `Roles.Admin`, `Roles.Audit`, `Roles.AuditReadOnly`.
|
|
`SiteScopeAuthorizationHandler.cs` was deleted under Security-017, so its
|
|
`"Deployment"` literal is gone with it. A future rename now propagates by a
|
|
single edit or fails to compile.
|
|
|
|
### Security-019 — Service-account rebind failure is reported as "Invalid username or password" — masks misconfiguration as a user-credential error
|
|
|
|
| | |
|
|
|--|--|
|
|
| Severity | Low |
|
|
| Category | Error handling & resilience |
|
|
| Status | Resolved |
|
|
| Location | `src/ScadaLink.Security/LdapAuthService.cs:85-89`, `:147-151` |
|
|
|
|
**Description**
|
|
|
|
After the user's credentials bind successfully, `AuthenticateAsync` re-binds as the
|
|
configured service account to perform the group/attribute search
|
|
(`connection.Bind(_options.LdapServiceAccountDn, _options.LdapServiceAccountPassword)`).
|
|
A failure of this second bind — wrong service-account password, deleted/disabled
|
|
service-account, locked-out service-account — throws `LdapException` which is caught by
|
|
the broad outer `catch (LdapException)` and returned as
|
|
`new LdapAuthResult(false, null, username, null, "Invalid username or password.")`.
|
|
The user sees an "invalid credentials" message for *their* credentials even though
|
|
their bind succeeded and the failure was in the system's own service-account
|
|
configuration. Worse, every user attempting to log in sees the same incorrect message
|
|
during a service-account outage, which routes operators down the wrong incident path
|
|
(reset the user's password) instead of the right one (check the service-account
|
|
credentials). The successful user bind itself is also not auditable as a discrete
|
|
event because the result is "Invalid username or password" — indistinguishable from a
|
|
genuine bad-password attempt.
|
|
|
|
**Recommendation**
|
|
|
|
Wrap the service-account rebind in its own `try`/`catch (LdapException)` and surface a
|
|
distinct error: log `_logger.LogError(ex, "Service-account rebind failed; check
|
|
LdapServiceAccountDn / LdapServiceAccountPassword configuration")` and return
|
|
`new LdapAuthResult(false, null, username, null, "Authentication service is misconfigured. Contact an administrator.")`.
|
|
Add a regression test that exercises the service-account-bind failure path (a mocked
|
|
or seamed `LdapConnection.Bind` that throws on the second call) and asserts the
|
|
distinct error message.
|
|
|
|
**Resolution**
|
|
|
|
Resolved 2026-05-28 (commit pending). Added a new `ServiceAccountBindException`
|
|
(`src/ScadaLink.Security/ServiceAccountBindException.cs`) — deliberately NOT an
|
|
`LdapException` subtype so it short-circuits the generic LDAP catch — and a private
|
|
`BindServiceAccountAsync` helper on `LdapAuthService` that wraps both service-account
|
|
rebind sites (the post-user-bind group-lookup rebind AND the `ResolveUserDnAsync`
|
|
DN-search rebind). On `LdapException`, the helper logs Error
|
|
("Service-account rebind failed; check LdapServiceAccountDn /
|
|
LdapServiceAccountPassword configuration") and rethrows as `ServiceAccountBindException`,
|
|
which the outer `AuthenticateAsync` catch chain maps to the distinct user-facing message
|
|
"Authentication service is misconfigured. Contact an administrator." Service-account
|
|
faults no longer surface as "Invalid username or password". Regression test
|
|
`ServiceAccountBindException_DoesNotInheritLdapException_SoCatchOrderIsCorrect` pins the
|
|
load-bearing inheritance contract; a full end-to-end auth test that exercises the
|
|
service-account-bind failure path is not feasible without an LDAP seam in
|
|
`LdapAuthService` (the `LdapConnection` is constructed in-method), so the structural test
|
|
is the closest meaningful unit-level coverage.
|
|
|
|
### Security-020 — `SecurityOptions` has no startup validation for required fields (`LdapServer`, `LdapSearchBase`)
|
|
|
|
| | |
|
|
|--|--|
|
|
| Severity | Low |
|
|
| Category | Error handling & resilience |
|
|
| Status | Resolved |
|
|
| Location | `src/ScadaLink.Security/SecurityOptions.cs:6-7`, `:36-37`; `src/ScadaLink.Security/ServiceCollectionExtensions.cs:13-30` |
|
|
|
|
**Resolution (2026-05-28):** added `SecurityOptionsValidator`
|
|
(`IValidateOptions<SecurityOptions>`) that rejects empty/whitespace
|
|
`LdapServer` and `LdapSearchBase` with messages naming the full
|
|
`Security:Field` key the operator would edit. `AddSecurity` registers it via
|
|
`services.AddOptions<SecurityOptions>().ValidateOnStart()` +
|
|
`TryAddEnumerable(... SecurityOptionsValidator)` so a misconfigured
|
|
`Security` section fails fast at boot rather than minutes later on the first
|
|
login. `JwtSigningKey` is deliberately left to `JwtTokenService`'s existing
|
|
length-aware constructor guard (Security-003). Regression tests in
|
|
`SecurityOptionsValidatorTests`: valid-options succeed; empty/whitespace
|
|
`LdapServer` and `LdapSearchBase` each fail with the key-naming message
|
|
(theory); both-empty reports both keys; `AddSecurity_RegistersSecurityOptionsValidator`
|
|
pins the DI wiring.
|
|
|
|
**Description**
|
|
|
|
`SecurityOptions.JwtSigningKey` correctly fails fast at `JwtTokenService` construction
|
|
(Security-003 fix), but the LDAP-side required fields — `LdapServer` (default
|
|
`string.Empty`) and `LdapSearchBase` (default `string.Empty`) — have no equivalent
|
|
guard. `AddSecurity` does not register an `IValidateOptions<SecurityOptions>`. A
|
|
deployment that fails to set `LdapServer` (a typo in the appsettings.json section name,
|
|
a missing environment-variable substitution, a misconfigured Docker compose file)
|
|
starts cleanly — the Central UI comes up, the login page loads, and only the first
|
|
authentication attempt fails with `LdapConnection.Connect("")` throwing a low-level
|
|
exception that bubbles up as the generic "An unexpected error occurred during
|
|
authentication." message. The misconfiguration surfaces minutes or hours into the
|
|
deploy, on the first real user login, rather than at startup where it is cheap to
|
|
diagnose.
|
|
|
|
**Recommendation**
|
|
|
|
Add an `IValidateOptions<SecurityOptions>` registered via
|
|
`services.AddOptions<SecurityOptions>().ValidateOnStart()` that fails when
|
|
`LdapServer` is null/whitespace, `LdapSearchBase` is null/whitespace, or
|
|
`LdapPort <= 0`. Combine with the existing `JwtTokenService` constructor check so
|
|
every required `SecurityOptions` field is enforced at startup, not at first use.
|
|
|
|
### Security-021 — `RequireHttpsCookie=false` dev opt-out has no warning path — an HTTP production deployment silently transmits the JWT bearer credential in cleartext
|
|
|
|
| | |
|
|
|--|--|
|
|
| Severity | Low |
|
|
| Category | Security |
|
|
| Status | Open |
|
|
| Location | `src/ScadaLink.Security/SecurityOptions.cs:100-108`; `src/ScadaLink.Security/ServiceCollectionExtensions.cs:54-59` |
|
|
|
|
**Description**
|
|
|
|
The Security-002 fix added `RequireHttpsCookie` (default `true`) so the auth cookie's
|
|
`SecurePolicy` is `Always` in production. The current Docker dev cluster sets
|
|
`RequireHttpsCookie=false` in both central nodes' `appsettings.Central.json`, downgrading
|
|
to `SameAsRequest` so the local HTTP cluster works. The downgrade is documented in the
|
|
XML doc but is silent at runtime: no log line warns that the cookie carrying the JWT
|
|
bearer credential is being sent over an HTTP-only path. A production deployment that
|
|
inherits a dev-derived appsettings — or that copy-pastes the docker config and forgets
|
|
to flip the flag — transmits the session token in cleartext with no diagnostic signal.
|
|
The default is correct; the gap is that the unsafe override has no operational guard.
|
|
|
|
**Recommendation**
|
|
|
|
In the `PostConfigure` block in `AddSecurity`, when `RequireHttpsCookie == false`, log
|
|
a single startup warning along the lines of `_logger.LogWarning("RequireHttpsCookie is
|
|
DISABLED — auth cookie SecurePolicy is SameAsRequest. The cookie-embedded JWT will be
|
|
transmitted over plain HTTP. This setting is intended for local dev only — set
|
|
SecurityOptions:RequireHttpsCookie=true in production.")`. Optionally, also fail
|
|
startup when `RequireHttpsCookie=false` AND `ASPNETCORE_ENVIRONMENT=Production`. Add a
|
|
regression test that asserts the warning is emitted when the flag is disabled and not
|
|
when it is enabled.
|