# Code Review — Auth | Field | Value | |-------|-------| | Library | `ZB.MOM.WW.Auth/` | | Packages | `ZB.MOM.WW.Auth.Abstractions`, `ZB.MOM.WW.Auth.Ldap`, `ZB.MOM.WW.Auth.ApiKeys`, `ZB.MOM.WW.Auth.AspNetCore` | | Component spec | `components/auth/spec/SPEC.md` | | Shared contract | `components/auth/shared-contract/ZB.MOM.WW.Auth.md` | | Status | Reviewed | | Last reviewed | 2026-06-01 | | Reviewer | Claude (automated baseline) | | Commit reviewed | `5f75cd4` | | Open findings | 0 | ## Summary The security-critical core of this library is sound: API-key secrets are 32-byte CSPRNG values, hashing is HMAC-SHA256 keyed by an external pepper, verification uses `CryptographicOperations.FixedTimeEquals`, the SQLite stores use parameterized SQL throughout, LDAP filter values are RFC 4515-escaped before being interpolated into the search filter, group-DN parsing is RFC 4514 escape-aware, and the LDAP service is genuinely fail-closed (never throws, requires exactly one match plus a verified password plus >=1 group for success). Test coverage is strong and contract-focused (~172 tests; every failure mode, the parser, the hasher, the escaper, the stores, the migrator and the DI wiring are exercised). The findings below are about robustness-around-the-edges and documentation, not the cryptographic core. The two themes are (1) two narrow paths that can turn a recoverable condition into a thrown exception on the authentication hot path, contradicting the library's own "only exception path is cancellation" / "fail fast at startup" promises, and (2) public-facing documentation (README) that misstates the hashing algorithm and the AspNetCore surface. The library is **not yet adopted** by the three apps (tracked in `components/auth/GAPS.md`). ## Checklist coverage | # | Category | Examined | Notes | |---|----------|----------|-------| | 1 | Correctness & logic bugs | ☑ | Core logic correct; see Auth-003 (JSON parse can throw on hot path), Auth-005 (KeyPrefix value inconsistency) | | 2 | Public API surface & compatibility | ☑ | Surface is minimal, sealed records, correct nullable annotations, internal seams hidden via `InternalsVisibleTo`. No issues found | | 3 | Concurrency & thread safety | ☑ | Options/records immutable; services stateless singletons; fresh SQLite connection per op with WAL + busy_timeout. No issues found | | 4 | Error handling & resilience | ☑ | LDAP path fully fail-closed; but see Auth-001 (no `ValidateOnStart`), Auth-002 (`MarkUsedAsync` failure fails a valid auth), Auth-003 | | 5 | Security & secret handling | ☑ | Peppered HMAC-SHA256 + FixedTimeEquals, CSPRNG secrets, parameterized SQL, LDAP escaping, hash-free list projection, secrets never logged. See Auth-006 (TLS cert-validation hook) | | 6 | Performance & resource management | ☑ | Connections disposed via `await using`; `LdapConnection` disposed; no hot-path allocs of note. No issues found | | 7 | Spec & shared-contract adherence | ☑ | Matches SPEC §1–§5 closely (HMAC-SHA256 pepper, constant-time compare, discriminated failures, cookie hardening). See Auth-001 (insecure-refusal deferred to first use) | | 8 | Packaging, dependencies & project layout | ☑ | Clean 4-package split, central versions, minimal deps, Abstractions has no runtime deps. No issues found | | 9 | Testing coverage | ☑ | Broad and contract-level; gaps: no test for corrupt-`scopes` deserialization (Auth-003) or `MarkUsedAsync`-throws (Auth-002); `ValidateOnStart` behavior untested (Auth-001) | | 10 | Documentation & XML docs | ☑ | XML docs thorough and accurate; README is stale — see Auth-004 | ## Findings ### Auth-001 — LDAP options validator is registered but never runs at startup | | | |--|--| | Severity | Medium | | Category | Error handling & resilience | | Status | Resolved | | Location | `ZB.MOM.WW.Auth/src/ZB.MOM.WW.Auth.AspNetCore/ServiceCollectionExtensions.cs:40` | **Description** `AddZbLdapAuth` binds `LdapOptions` with `services.Configure(...)` and registers the validator with a plain `services.AddSingleton, LdapOptionsValidator>()`. It never calls `OptionsBuilder.ValidateOnStart()`. With `Microsoft.Extensions.Options`, an `IValidateOptions` runs only when the options instance is first materialized (`IOptions.Value`), not at host startup. Here the only consumer that materializes the options is the `ILdapAuthService` singleton factory (line 48), whose lambda runs on first resolution of `ILdapAuthService` — i.e. on the first login attempt, not at boot. The result contradicts the explicit promise in both the method comment ("Fail fast at startup on a misconfigured directory rather than on first login") and the validator's own XML doc ("Validates `LdapOptions` at startup so a misconfiguration fails fast at boot"). It also weakens SPEC §2 step 1, which says an insecure transport without `AllowInsecure` must be *refused* — that refusal is silently deferred until a user tries to log in. A process can therefore boot green with `Transport=None` and no `AllowInsecure`, and only surface the misconfiguration as a failed login later. Because this is a shared library consumed by three hosts, the missing guard affects all of them. **Recommendation** Register options via the builder and add `ValidateOnStart`: ```csharp services.AddOptions() .Bind(config.GetSection(sectionPath)) .ValidateOnStart(); services.AddSingleton, LdapOptionsValidator>(); ``` (`ValidateOnStart` relies on the host's start-time options validation, which the ASP.NET Core hosts already run.) Add a test asserting that building + starting the host fails for an insecure config. **Resolution** Resolved in `544a6dd`, 2026-06-01 — `AddZbLdapAuth` now binds via `AddOptions().Bind(...).ValidateOnStart()` so the validator runs at host start, not on first login (test: `ServiceCollectionExtensionsTests.AddZbLdapAuth_StartingHost_FailsForInsecureConfig`, plus a `...SucceedsForSecureConfig` companion). ### Auth-002 — A failed `MarkUsedAsync` write turns a valid API key into a thrown exception | | | |--|--| | Severity | Medium | | Category | Error handling & resilience | | Status | Resolved | | Location | `ZB.MOM.WW.Auth/src/ZB.MOM.WW.Auth.ApiKeys/ApiKeyVerifier.cs:66` | **Description** After a successful constant-time secret match, `VerifyAsync` does `await store.MarkUsedAsync(record.KeyId, _timeProvider.GetUtcNow(), ct)` before returning the success result. `MarkUsedAsync` is a best-effort "last used" bookkeeping write. If that write fails (SQLite still `SQLITE_BUSY` after the 5 s busy-timeout, disk full, DB locked by a long migration, etc.), the exception propagates out of `VerifyAsync` and the otherwise-valid credential fails authentication. The verifier's own class remark states "The only exception path is cancellation", so callers (gRPC interceptors / middleware) will not be wrapping this in a try/catch and a transient storage hiccup becomes an auth outage for a legitimate caller. **Recommendation** Treat the usage write as best-effort: either swap the order (return success, then fire-and-record), or wrap the `MarkUsedAsync` call so a failure is logged/swallowed rather than failing the verification — the authentication decision has already been made by line 60. If `MarkUsedAsync` is allowed to throw, update the XML remark to say so. Add a test where the store's `MarkUsedAsync` throws and assert the result is still `Succeeded == true`. **Resolution** Resolved in `544a6dd`, 2026-06-01 — `VerifyAsync` now wraps the best-effort `MarkUsedAsync` write in try/catch, re-throwing only `OperationCanceledException` and swallowing all other failures so a valid key still authenticates (tests: `ApiKeyVerifierTests.VerifyAsync_ValidKey_MarkUsedThrows_StillSucceeds` and `..._MarkUsedThrowsOperationCanceled_Propagates`). ### Auth-003 — Corrupt `scopes`/`constraints` column throws `JsonException` through the verifier | | | |--|--| | Severity | Medium | | Category | Correctness & logic bugs | | Status | Resolved | | Location | `ZB.MOM.WW.Auth/src/ZB.MOM.WW.Auth.ApiKeys/Sqlite/ScopeSerializer.cs:31`, `ZB.MOM.WW.Auth/src/ZB.MOM.WW.Auth.ApiKeys/Sqlite/SqliteApiKeyStore.cs:71` | **Description** `SqliteApiKeyStore.ReadRecord` calls `ScopeSerializer.Deserialize(reader.GetString(4))`, which does `JsonSerializer.Deserialize(value)`. If the `scopes` column ever holds non-array or malformed JSON (operator tampering, a partial write, a future schema/format change, or a bug in another writer), `Deserialize` throws `JsonException`. That exception propagates out of `FindByKeyIdAsync`, and therefore out of `ApiKeyVerifier.VerifyAsync`, as an unhandled exception — again contradicting the "only exception path is cancellation" contract and converting a single poisoned row into a crash on the auth path rather than a structured fail-closed result. The same applies to the admin `ListAsync` projection. **Recommendation** Decide the policy explicitly. Either (a) catch `JsonException` in `ScopeSerializer.Deserialize` and treat a malformed column as an empty/invalid scope set so verification fails closed structurally, or (b) catch it at the store boundary and surface it as a controlled error. Add a test that inserts a row with a malformed `scopes` value and asserts the verifier returns a structured failure (or a documented exception) rather than an unhandled `JsonException`. **Resolution** Resolved in `544a6dd`, 2026-06-01 — `ScopeSerializer.Deserialize` now catches `JsonException` and fails closed to an empty scope set (policy (a)), so a corrupt `scopes`/`constraints` column degrades to a zero-scope identity instead of throwing through the store and verifier (tests: `SqliteApiKeyStoreTests.ScopeSerializer_DeserializeMalformed_ReturnsEmptySet_DoesNotThrow` (Theory) and `..._FindByKeyId_CorruptScopesColumn_ReturnsRecordWithEmptyScopes_DoesNotThrow`). ### Auth-004 — README misstates the hashing algorithm and the AspNetCore public surface | | | |--|--| | Severity | Low | | Category | Documentation & XML docs | | Status | Resolved | | Location | `ZB.MOM.WW.Auth/README.md:13`, `ZB.MOM.WW.Auth/README.md:14` | **Description** The README's package table contains two inaccuracies that matter for a security library: 1. It describes `ZB.MOM.WW.Auth.ApiKeys` as a store "with pepper-based **PBKDF2** hashing". The code uses **HMAC-SHA256** keyed by the pepper (`ApiKeySecretHasher.Hash` → `new HMACSHA256(pepperBytes)`), not PBKDF2. This is the correct algorithm per SPEC §4 ("HMAC-SHA256 with an external pepper"), so the *code* is right and the *README* is wrong — but a reader auditing the crypto will be misled about iteration count / KDF properties that do not exist here. 2. It says `ZB.MOM.WW.Auth.AspNetCore` provides "ASP.NET Core DI helpers (`AddZbAuth`) ... and `LdapOptionsValidator` registration. **Wires together Ldap + ApiKeys + cookie middleware**." The actual public surface is a single extension method `AddZbLdapAuth` (LDAP only). There is no `AddZbAuth`, no API-key re-export from this package, and no cookie-middleware wiring — `ZbCookieDefaults.Apply` is a helper the consumer must call itself, and API-key DI lives in the separate `ZB.MOM.WW.Auth.ApiKeys` package (`AddZbApiKeyAuth`). The shared contract (`ZB.MOM.WW.Auth.md`) likewise lists a combined DI shape that the code splits differently. **Recommendation** Fix the README table: change "PBKDF2" to "HMAC-SHA256", and correct the AspNetCore row to describe the real surface (`AddZbLdapAuth`, `ZbCookieDefaults.Apply`, `ZbClaimTypes`) and note that API-key wiring is `AddZbApiKeyAuth` in the ApiKeys package. Optionally reconcile the shared-contract doc's DI-method names with the shipped names. **Resolution** Resolved in `544a6dd`, 2026-06-01 — README package table corrected: "PBKDF2" → "pepper-keyed HMAC-SHA256" and the AspNetCore row rewritten to the real surface (`AddZbLdapAuth`, `ZbCookieDefaults.Apply`, `ZbClaimTypes`, LDAP-only, no API-key/cookie-middleware wiring), with the stale `ApiKeys` dependency dropped from that row to match the `.csproj` (test: doc-only). ### Auth-005 — `CreateKeyAsync` persists `KeyPrefix` as `prefix_keyId`, inconsistent with the read path | | | |--|--| | Severity | Low | | Category | Correctness & logic bugs | | Status | Resolved | | Location | `ZB.MOM.WW.Auth/src/ZB.MOM.WW.Auth.ApiKeys/Admin/ApiKeyAdminCommands.cs:104` | **Description** When creating a key, `KeyPrefix` is set to `$"{_options.TokenPrefix}_{keyId}"` (e.g. `mxgw_key-1`), embedding the key id into the column. Elsewhere `KeyPrefix` is treated as the bare token prefix: the verifier test (`ApiKeyVerifierTests.BuildRecord`) and the store test (`SqliteApiKeyStoreTests.SampleRecord`) seed it as `"mxgw"` / `"mxgw_ab12"`, and the `ApiKeyRecord` contract gives no indication it should include the key id. `KeyPrefix` is not consulted during verification (the token is re-parsed from the header), so there is no security impact — but the persisted/displayed value is inconsistent and self-referential (the key id is already its own column), which will confuse admin tooling that surfaces `KeyPrefix`. **Recommendation** Persist `KeyPrefix = _options.TokenPrefix` (the bare prefix) and let the key id remain the `KeyId` column, or document precisely what `KeyPrefix` is meant to contain on the `ApiKeyRecord` / `ApiKeyListItem` records and align all call sites to it. **Resolution** Resolved in `544a6dd`, 2026-06-01 — `CreateKeyAsync` now persists `KeyPrefix = _options.TokenPrefix` (the bare prefix) instead of `prefix_keyId`, aligning the write path with the read/test paths (test: `ApiKeyAdminCommandsTests.CreateKey_PersistsBareTokenPrefix_NotPrefixUnderscoreKeyId`). ### Auth-006 — LDAP connection ignores `allowInsecure` and offers no TLS certificate-validation hook | | | |--|--| | Severity | Low | | Category | Security & secret handling | | Status | Resolved | | Location | `ZB.MOM.WW.Auth/src/ZB.MOM.WW.Auth.Ldap/Internal/NovellLdapConnection.cs:16` | **Description** `NovellLdapConnection.Connect` receives `bool allowInsecure` but never uses it; transport-security correctness rests entirely on `LdapOptionsValidator` having run beforehand (and that registration is itself lazy — see Auth-001). The method also sets `SecureSocketLayer`/`StartTls` but never configures a server-certificate validation delegate or any pinning/CA hook. The underlying `Novell.Directory.Ldap.NETStandard` 3.6.0 validates against the OS trust store by default (it does *not* blind-accept), so this is not an unauthenticated-TLS hole today — hence Low — but the library gives a consumer no seam to tighten (pin a CA, validate the SAN against `Server`, or deliberately accept a self-signed dev cert), and the unused parameter reads as a guard that does nothing. **Recommendation** Either remove the unused `allowInsecure` parameter from the `ILdapConnection.Connect` seam or use it to gate an explicit "accept invalid cert" path for dev only. Add an optional `RemoteCertificateValidationCallback` (or CA-pinning option) so production consumers can harden LDAPS beyond the OS default, and document the default validation behavior. **Resolution** Resolved in `544a6dd`, 2026-06-01 — added an optional `LdapOptions.ServerCertificateValidationCallback` (BCL `RemoteCertificateValidationCallback`, no new dependency) threaded through the `ILdapConnection.Connect` seam; `NovellLdapConnection` now builds `LdapConnectionOptions.ConfigureRemoteCertificateValidationCallback(...)` (a supplied callback wins; otherwise the previously-dead `allowInsecure` gates a dev-only accept-any path; otherwise OS-trust-store default), and the default validation behaviour is documented in XML/README (tests: `LdapAuthServiceTests.Connect_ReceivesConfiguredCertValidationCallback`, `..._Connect_ReceivesAllowInsecureFlag_FromOptions`, `..._Connect_NoCertCallbackConfigured_PassesNull`).