Files
scadaproj/code-reviews/Auth/findings.md
T
Joseph Doherty ae0ccc9a3a Mark all baseline code-review findings resolved
All 35 findings fixed in 544a6dd and marked Status: Resolved with resolution
notes. README regenerated: 0 pending / 35 total across 6 libraries.
2026-06-01 11:22:37 -04:00

255 lines
16 KiB
Markdown

# Code Review — Auth
| Field | Value |
|-------|-------|
| Library | `ZB.MOM.WW.Auth/` |
| Packages | `ZB.MOM.WW.Auth.Abstractions`, `ZB.MOM.WW.Auth.Ldap`, `ZB.MOM.WW.Auth.ApiKeys`, `ZB.MOM.WW.Auth.AspNetCore` |
| Component spec | `components/auth/spec/SPEC.md` |
| Shared contract | `components/auth/shared-contract/ZB.MOM.WW.Auth.md` |
| Status | Reviewed |
| Last reviewed | 2026-06-01 |
| Reviewer | Claude (automated baseline) |
| Commit reviewed | `5f75cd4` |
| Open findings | 0 |
## Summary
The security-critical core of this library is sound: API-key secrets are 32-byte CSPRNG values,
hashing is HMAC-SHA256 keyed by an external pepper, verification uses
`CryptographicOperations.FixedTimeEquals`, the SQLite stores use parameterized SQL throughout,
LDAP filter values are RFC 4515-escaped before being interpolated into the search filter, group-DN
parsing is RFC 4514 escape-aware, and the LDAP service is genuinely fail-closed (never throws,
requires exactly one match plus a verified password plus >=1 group for success). Test coverage is
strong and contract-focused (~172 tests; every failure mode, the parser, the hasher, the escaper,
the stores, the migrator and the DI wiring are exercised). The findings below are about
robustness-around-the-edges and documentation, not the cryptographic core. The two themes are
(1) two narrow paths that can turn a recoverable condition into a thrown exception on the
authentication hot path, contradicting the library's own "only exception path is cancellation" /
"fail fast at startup" promises, and (2) public-facing documentation (README) that misstates the
hashing algorithm and the AspNetCore surface. The library is **not yet adopted** by the three apps
(tracked in `components/auth/GAPS.md`).
## Checklist coverage
| # | Category | Examined | Notes |
|---|----------|----------|-------|
| 1 | Correctness & logic bugs | ☑ | Core logic correct; see Auth-003 (JSON parse can throw on hot path), Auth-005 (KeyPrefix value inconsistency) |
| 2 | Public API surface & compatibility | ☑ | Surface is minimal, sealed records, correct nullable annotations, internal seams hidden via `InternalsVisibleTo`. No issues found |
| 3 | Concurrency & thread safety | ☑ | Options/records immutable; services stateless singletons; fresh SQLite connection per op with WAL + busy_timeout. No issues found |
| 4 | Error handling & resilience | ☑ | LDAP path fully fail-closed; but see Auth-001 (no `ValidateOnStart`), Auth-002 (`MarkUsedAsync` failure fails a valid auth), Auth-003 |
| 5 | Security & secret handling | ☑ | Peppered HMAC-SHA256 + FixedTimeEquals, CSPRNG secrets, parameterized SQL, LDAP escaping, hash-free list projection, secrets never logged. See Auth-006 (TLS cert-validation hook) |
| 6 | Performance & resource management | ☑ | Connections disposed via `await using`; `LdapConnection` disposed; no hot-path allocs of note. No issues found |
| 7 | Spec & shared-contract adherence | ☑ | Matches SPEC §1–§5 closely (HMAC-SHA256 pepper, constant-time compare, discriminated failures, cookie hardening). See Auth-001 (insecure-refusal deferred to first use) |
| 8 | Packaging, dependencies & project layout | ☑ | Clean 4-package split, central versions, minimal deps, Abstractions has no runtime deps. No issues found |
| 9 | Testing coverage | ☑ | Broad and contract-level; gaps: no test for corrupt-`scopes` deserialization (Auth-003) or `MarkUsedAsync`-throws (Auth-002); `ValidateOnStart` behavior untested (Auth-001) |
| 10 | Documentation & XML docs | ☑ | XML docs thorough and accurate; README is stale — see Auth-004 |
## Findings
### Auth-001 — LDAP options validator is registered but never runs at startup
| | |
|--|--|
| Severity | Medium |
| Category | Error handling & resilience |
| Status | Resolved |
| Location | `ZB.MOM.WW.Auth/src/ZB.MOM.WW.Auth.AspNetCore/ServiceCollectionExtensions.cs:40` |
**Description**
`AddZbLdapAuth` binds `LdapOptions` with `services.Configure<LdapOptions>(...)` and registers the
validator with a plain `services.AddSingleton<IValidateOptions<LdapOptions>, LdapOptionsValidator>()`.
It never calls `OptionsBuilder.ValidateOnStart()`. With `Microsoft.Extensions.Options`, an
`IValidateOptions<T>` runs only when the options instance is first materialized (`IOptions<T>.Value`),
not at host startup. Here the only consumer that materializes the options is the `ILdapAuthService`
singleton factory (line 48), whose lambda runs on first resolution of `ILdapAuthService` — i.e. on
the first login attempt, not at boot.
The result contradicts the explicit promise in both the method comment ("Fail fast at startup on a
misconfigured directory rather than on first login") and the validator's own XML doc ("Validates
`LdapOptions` at startup so a misconfiguration fails fast at boot"). It also weakens SPEC §2 step 1,
which says an insecure transport without `AllowInsecure` must be *refused* — that refusal is silently
deferred until a user tries to log in. A process can therefore boot green with `Transport=None` and
no `AllowInsecure`, and only surface the misconfiguration as a failed login later. Because this is a
shared library consumed by three hosts, the missing guard affects all of them.
**Recommendation**
Register options via the builder and add `ValidateOnStart`:
```csharp
services.AddOptions<LdapOptions>()
.Bind(config.GetSection(sectionPath))
.ValidateOnStart();
services.AddSingleton<IValidateOptions<LdapOptions>, LdapOptionsValidator>();
```
(`ValidateOnStart` relies on the host's start-time options validation, which the ASP.NET Core hosts
already run.) Add a test asserting that building + starting the host fails for an insecure config.
**Resolution**
Resolved in `544a6dd`, 2026-06-01 — `AddZbLdapAuth` now binds via `AddOptions<LdapOptions>().Bind(...).ValidateOnStart()` so the validator runs at host start, not on first login (test: `ServiceCollectionExtensionsTests.AddZbLdapAuth_StartingHost_FailsForInsecureConfig`, plus a `...SucceedsForSecureConfig` companion).
### Auth-002 — A failed `MarkUsedAsync` write turns a valid API key into a thrown exception
| | |
|--|--|
| Severity | Medium |
| Category | Error handling & resilience |
| Status | Resolved |
| Location | `ZB.MOM.WW.Auth/src/ZB.MOM.WW.Auth.ApiKeys/ApiKeyVerifier.cs:66` |
**Description**
After a successful constant-time secret match, `VerifyAsync` does
`await store.MarkUsedAsync(record.KeyId, _timeProvider.GetUtcNow(), ct)` before returning the
success result. `MarkUsedAsync` is a best-effort "last used" bookkeeping write. If that write fails
(SQLite still `SQLITE_BUSY` after the 5 s busy-timeout, disk full, DB locked by a long migration,
etc.), the exception propagates out of `VerifyAsync` and the otherwise-valid credential fails
authentication. The verifier's own class remark states "The only exception path is cancellation",
so callers (gRPC interceptors / middleware) will not be wrapping this in a try/catch and a transient
storage hiccup becomes an auth outage for a legitimate caller.
**Recommendation**
Treat the usage write as best-effort: either swap the order (return success, then fire-and-record),
or wrap the `MarkUsedAsync` call so a failure is logged/swallowed rather than failing the
verification — the authentication decision has already been made by line 60. If `MarkUsedAsync` is
allowed to throw, update the XML remark to say so. Add a test where the store's `MarkUsedAsync`
throws and assert the result is still `Succeeded == true`.
**Resolution**
Resolved in `544a6dd`, 2026-06-01 — `VerifyAsync` now wraps the best-effort `MarkUsedAsync` write in try/catch, re-throwing only `OperationCanceledException` and swallowing all other failures so a valid key still authenticates (tests: `ApiKeyVerifierTests.VerifyAsync_ValidKey_MarkUsedThrows_StillSucceeds` and `..._MarkUsedThrowsOperationCanceled_Propagates`).
### Auth-003 — Corrupt `scopes`/`constraints` column throws `JsonException` through the verifier
| | |
|--|--|
| Severity | Medium |
| Category | Correctness & logic bugs |
| Status | Resolved |
| Location | `ZB.MOM.WW.Auth/src/ZB.MOM.WW.Auth.ApiKeys/Sqlite/ScopeSerializer.cs:31`, `ZB.MOM.WW.Auth/src/ZB.MOM.WW.Auth.ApiKeys/Sqlite/SqliteApiKeyStore.cs:71` |
**Description**
`SqliteApiKeyStore.ReadRecord` calls `ScopeSerializer.Deserialize(reader.GetString(4))`, which does
`JsonSerializer.Deserialize<string[]>(value)`. If the `scopes` column ever holds non-array or
malformed JSON (operator tampering, a partial write, a future schema/format change, or a bug in
another writer), `Deserialize` throws `JsonException`. That exception propagates out of
`FindByKeyIdAsync`, and therefore out of `ApiKeyVerifier.VerifyAsync`, as an unhandled exception —
again contradicting the "only exception path is cancellation" contract and converting a single
poisoned row into a crash on the auth path rather than a structured fail-closed result. The same
applies to the admin `ListAsync` projection.
**Recommendation**
Decide the policy explicitly. Either (a) catch `JsonException` in `ScopeSerializer.Deserialize` and
treat a malformed column as an empty/invalid scope set so verification fails closed structurally, or
(b) catch it at the store boundary and surface it as a controlled error. Add a test that inserts a
row with a malformed `scopes` value and asserts the verifier returns a structured failure (or a
documented exception) rather than an unhandled `JsonException`.
**Resolution**
Resolved in `544a6dd`, 2026-06-01 — `ScopeSerializer.Deserialize` now catches `JsonException` and fails closed to an empty scope set (policy (a)), so a corrupt `scopes`/`constraints` column degrades to a zero-scope identity instead of throwing through the store and verifier (tests: `SqliteApiKeyStoreTests.ScopeSerializer_DeserializeMalformed_ReturnsEmptySet_DoesNotThrow` (Theory) and `..._FindByKeyId_CorruptScopesColumn_ReturnsRecordWithEmptyScopes_DoesNotThrow`).
### Auth-004 — README misstates the hashing algorithm and the AspNetCore public surface
| | |
|--|--|
| Severity | Low |
| Category | Documentation & XML docs |
| Status | Resolved |
| Location | `ZB.MOM.WW.Auth/README.md:13`, `ZB.MOM.WW.Auth/README.md:14` |
**Description**
The README's package table contains two inaccuracies that matter for a security library:
1. It describes `ZB.MOM.WW.Auth.ApiKeys` as a store "with pepper-based **PBKDF2** hashing". The code
uses **HMAC-SHA256** keyed by the pepper (`ApiKeySecretHasher.Hash``new HMACSHA256(pepperBytes)`),
not PBKDF2. This is the correct algorithm per SPEC §4 ("HMAC-SHA256 with an external pepper"), so
the *code* is right and the *README* is wrong — but a reader auditing the crypto will be misled
about iteration count / KDF properties that do not exist here.
2. It says `ZB.MOM.WW.Auth.AspNetCore` provides "ASP.NET Core DI helpers (`AddZbAuth`) ... and
`LdapOptionsValidator` registration. **Wires together Ldap + ApiKeys + cookie middleware**." The
actual public surface is a single extension method `AddZbLdapAuth` (LDAP only). There is no
`AddZbAuth`, no API-key re-export from this package, and no cookie-middleware wiring —
`ZbCookieDefaults.Apply` is a helper the consumer must call itself, and API-key DI lives in the
separate `ZB.MOM.WW.Auth.ApiKeys` package (`AddZbApiKeyAuth`). The shared contract
(`ZB.MOM.WW.Auth.md`) likewise lists a combined DI shape that the code splits differently.
**Recommendation**
Fix the README table: change "PBKDF2" to "HMAC-SHA256", and correct the AspNetCore row to describe
the real surface (`AddZbLdapAuth`, `ZbCookieDefaults.Apply`, `ZbClaimTypes`) and note that API-key
wiring is `AddZbApiKeyAuth` in the ApiKeys package. Optionally reconcile the shared-contract doc's
DI-method names with the shipped names.
**Resolution**
Resolved in `544a6dd`, 2026-06-01 — README package table corrected: "PBKDF2" → "pepper-keyed HMAC-SHA256" and the AspNetCore row rewritten to the real surface (`AddZbLdapAuth`, `ZbCookieDefaults.Apply`, `ZbClaimTypes`, LDAP-only, no API-key/cookie-middleware wiring), with the stale `ApiKeys` dependency dropped from that row to match the `.csproj` (test: doc-only).
### Auth-005 — `CreateKeyAsync` persists `KeyPrefix` as `prefix_keyId`, inconsistent with the read path
| | |
|--|--|
| Severity | Low |
| Category | Correctness & logic bugs |
| Status | Resolved |
| Location | `ZB.MOM.WW.Auth/src/ZB.MOM.WW.Auth.ApiKeys/Admin/ApiKeyAdminCommands.cs:104` |
**Description**
When creating a key, `KeyPrefix` is set to `$"{_options.TokenPrefix}_{keyId}"` (e.g. `mxgw_key-1`),
embedding the key id into the column. Elsewhere `KeyPrefix` is treated as the bare token prefix:
the verifier test (`ApiKeyVerifierTests.BuildRecord`) and the store test
(`SqliteApiKeyStoreTests.SampleRecord`) seed it as `"mxgw"` / `"mxgw_ab12"`, and the `ApiKeyRecord`
contract gives no indication it should include the key id. `KeyPrefix` is not consulted during
verification (the token is re-parsed from the header), so there is no security impact — but the
persisted/displayed value is inconsistent and self-referential (the key id is already its own
column), which will confuse admin tooling that surfaces `KeyPrefix`.
**Recommendation**
Persist `KeyPrefix = _options.TokenPrefix` (the bare prefix) and let the key id remain the `KeyId`
column, or document precisely what `KeyPrefix` is meant to contain on the `ApiKeyRecord` /
`ApiKeyListItem` records and align all call sites to it.
**Resolution**
Resolved in `544a6dd`, 2026-06-01 — `CreateKeyAsync` now persists `KeyPrefix = _options.TokenPrefix` (the bare prefix) instead of `prefix_keyId`, aligning the write path with the read/test paths (test: `ApiKeyAdminCommandsTests.CreateKey_PersistsBareTokenPrefix_NotPrefixUnderscoreKeyId`).
### Auth-006 — LDAP connection ignores `allowInsecure` and offers no TLS certificate-validation hook
| | |
|--|--|
| Severity | Low |
| Category | Security & secret handling |
| Status | Resolved |
| Location | `ZB.MOM.WW.Auth/src/ZB.MOM.WW.Auth.Ldap/Internal/NovellLdapConnection.cs:16` |
**Description**
`NovellLdapConnection.Connect` receives `bool allowInsecure` but never uses it; transport-security
correctness rests entirely on `LdapOptionsValidator` having run beforehand (and that registration is
itself lazy — see Auth-001). The method also sets `SecureSocketLayer`/`StartTls` but never configures
a server-certificate validation delegate or any pinning/CA hook. The underlying
`Novell.Directory.Ldap.NETStandard` 3.6.0 validates against the OS trust store by default (it does
*not* blind-accept), so this is not an unauthenticated-TLS hole today — hence Low — but the library
gives a consumer no seam to tighten (pin a CA, validate the SAN against `Server`, or deliberately
accept a self-signed dev cert), and the unused parameter reads as a guard that does nothing.
**Recommendation**
Either remove the unused `allowInsecure` parameter from the `ILdapConnection.Connect` seam or use it
to gate an explicit "accept invalid cert" path for dev only. Add an optional
`RemoteCertificateValidationCallback` (or CA-pinning option) so production consumers can harden LDAPS
beyond the OS default, and document the default validation behavior.
**Resolution**
Resolved in `544a6dd`, 2026-06-01 — added an optional `LdapOptions.ServerCertificateValidationCallback` (BCL `RemoteCertificateValidationCallback`, no new dependency) threaded through the `ILdapConnection.Connect` seam; `NovellLdapConnection` now builds `LdapConnectionOptions.ConfigureRemoteCertificateValidationCallback(...)` (a supplied callback wins; otherwise the previously-dead `allowInsecure` gates a dev-only accept-any path; otherwise OS-trust-store default), and the default validation behaviour is documented in XML/README (tests: `LdapAuthServiceTests.Connect_ReceivesConfiguredCertValidationCallback`, `..._Connect_ReceivesAllowInsecureFlag_FromOptions`, `..._Connect_NoCertCallbackConfigured_PassesNull`).