diff --git a/code-reviews/Audit/findings.md b/code-reviews/Audit/findings.md index 2734f51..adbb946 100644 --- a/code-reviews/Audit/findings.md +++ b/code-reviews/Audit/findings.md @@ -6,31 +6,222 @@ | Packages | `ZB.MOM.WW.Audit` | | Component spec | `components/audit/spec/SPEC.md` | | Shared contract | `components/audit/shared-contract/ZB.MOM.WW.Audit.md` | -| Status | Not yet reviewed | -| Last reviewed | — | -| Reviewer | — | -| Commit reviewed | — | -| Open findings | 0 | +| Status | Reviewed | +| Last reviewed | 2026-06-01 | +| Reviewer | Claude (automated baseline) | +| Commit reviewed | `5f75cd4` | +| Open findings | 5 | ## Summary -_Not yet reviewed._ +`ZB.MOM.WW.Audit` is a small, clean, well-scoped library: one canonical `AuditEvent` +sealed record, a 3-value `AuditOutcome` enum, two narrow seams (`IAuditWriter`, +`IAuditRedactor`), five shipped helpers, and a single `AddZbAudit` registration. The record +matches `EVENT-MODEL.md` field-for-field (names, types, required/optional split, and the +`OccurredAtUtc` UTC-normalization contract — including value-equality on the normalized +instant). The dependency footprint is exactly as specified: the only non-BCL reference is +`Microsoft.Extensions.DependencyInjection.Abstractions` (verified against the built +`deps.json`), with no Akka/EF/SQLite/Serilog/Telemetry leakage. Guard clauses are present on +every public entry point, `RedactingAuditWriter` guarantees redaction runs before the inner +writer (no unredacted-PII leak), and `CompositeAuditWriter` isolates per-writer failures so a +throwing writer does not lose downstream events. 19 tests cover the happy paths and the most +important contract guards. No Critical or High findings. The five findings are all Medium/Low +and concern: a cancellation-propagation tension with the "never throw to the caller" writer +contract (Audit-001), a partial over-redaction path in `TruncatingAuditRedactor` (Audit-002), +mutable options on a singleton redactor (Audit-003), un-emitted XML docs (Audit-004, a +family-wide baseline pattern), and a couple of missing edge-case tests (Audit-005). ## Checklist coverage | # | Category | Examined | Notes | |---|----------|----------|-------| -| 1 | Correctness & logic bugs | ☐ | | -| 2 | Public API surface & compatibility | ☐ | | -| 3 | Concurrency & thread safety | ☐ | | -| 4 | Error handling & resilience | ☐ | | -| 5 | Security & secret handling | ☐ | | -| 6 | Performance & resource management | ☐ | | -| 7 | Spec & shared-contract adherence | ☐ | | -| 8 | Packaging, dependencies & project layout | ☐ | | -| 9 | Testing coverage | ☐ | | -| 10 | Documentation & XML docs | ☐ | | +| 1 | Correctness & logic bugs | ☑ | `Truncate` boundary logic correct for sane configs; marker-longer-than-max clips safely. Partial over-redaction edge under negative max → Audit-002. | +| 2 | Public API surface & compatibility | ☑ | Minimal, intentional surface; `sealed` records/classes; nullable annotations correct; no internal-type leakage. Options type is a mutable class, not the record the contract describes → Audit-003. | +| 3 | Concurrency & thread safety | ☑ | `AuditEvent` immutable (readonly backing field); singletons (`NoOp`/`Null`) stateless; composite snapshots its writers via `ToArray()`. Mutable singleton options noted in Audit-003. | +| 4 | Error handling & resilience | ☑ | Guard clauses on all entry points; composite swallows writer faults. OCE re-throw vs writer contract → Audit-001; partial over-redact → Audit-002. | +| 5 | Security & secret handling | ☑ | `RedactingAuditWriter` applies the redactor strictly before the inner writer — no unredacted-event leak. `Actor` kept a plain string (no Auth coupling), as specified. No secrets in code/messages. | +| 6 | Performance & resource management | ☑ | No `IDisposable` to manage; no hot-path allocations beyond a single `with`-copy per redact; DI registration is trivial. No issues found. | +| 7 | Spec & shared-contract adherence | ☑ | `AuditEvent` + `AuditOutcome` match `EVENT-MODEL.md` exactly; `AddZbAudit` uses `TryAdd` as specified. OCE re-throw deviates from the §1 writer hard rule → Audit-001. | +| 8 | Packaging, dependencies & project layout | ☑ | Sole non-BCL dep confirmed via `deps.json`; net10.0; correct PackageId/version (0.1.0). XML docs not packed → Audit-004. | +| 9 | Testing coverage | ☑ | 19 tests; public contract exercised (round-trip, UTC normalization, fan-out, cancellation, redaction ordering). Missing never-throw/over-redact and null-writer cases → Audit-005. | +| 10 | Documentation & XML docs | ☑ | XML docs present and accurate on the whole public surface; README accurate. Docs not emitted to a file → Audit-004. | ## Findings -_No findings recorded yet._ +### Audit-001 — `CompositeAuditWriter` re-throws `OperationCanceledException` to the caller, contradicting the "must not throw" writer contract + +| | | +|--|--| +| Severity | Medium | +| Category | Error handling & resilience | +| Status | Open | +| Location | `ZB.MOM.WW.Audit/src/ZB.MOM.WW.Audit/CompositeAuditWriter.cs:24` | + +**Description** + +`IAuditWriter`'s hard contract is unambiguous: "Best-effort; must not throw to the caller" +(`IAuditWriter.cs:11`), and SPEC §1 states "`WriteAsync` MUST NOT throw to the caller … a +failed write must never abort the user-facing action it is recording." The shared contract +further says a cancellation "does not constitute a contract violation" and the implementation +"may choose to complete a partially-written event anyway" — it permits ignoring the token, not +throwing on it. + +`CompositeAuditWriter.WriteAsync` re-throws `OperationCanceledException` +(`catch (OperationCanceledException) { throw; }`, line 24). The very common call shape is to +pass the ambient request token, e.g. `writer.WriteAsync(evt, httpContext.RequestAborted)`. When +a client disconnects mid-request, the inner writer observes the cancelled token, throws OCE, and +the composite propagates it straight into the user-facing action — exactly the abort the seam +promises never to cause. This also directly contradicts the helper's own XML doc, which claims +"the fan-out drains and the caller is never aborted" (`CompositeAuditWriter.cs:4`). Because the +composite is the recommended way to wire multiple sinks, every consumer that uses it inherits +this behaviour; the only workaround is to not pass a request-scoped token, which is non-obvious. + +Note this is a deliberate, tested choice (`CompositeAuditWriterTests.cs:41`), so it is a +design/contract conflict rather than an oversight — but the conflict is real: the writer contract +says "never throw," the composite says "drains, never aborts," and the code throws on +cancellation. + +**Recommendation** + +Reconcile the contract and the implementation. Either (a) honour the "never throw" rule +literally — swallow OCE like any other writer failure (drop the remaining writers or continue, +but do not propagate), and update the XML doc accordingly; or (b) if cancellation propagation is +genuinely wanted, amend the `IAuditWriter` SPEC/shared-contract to carve out OCE explicitly and +add a doc warning that consumers must not pass a request-scoped token they cannot afford to see +surface. Option (a) is the lower-blast-radius choice and matches the seam's stated intent. + +**Resolution** + +_Unresolved._ + +### Audit-002 — `TruncatingAuditRedactor` over-redaction is partial: the catch path scrubs only `DetailsJson`, leaving `Target` unredacted + +| | | +|--|--| +| Severity | Low | +| Category | Error handling & resilience | +| Status | Open | +| Location | `ZB.MOM.WW.Audit/src/ZB.MOM.WW.Audit/TruncatingAuditRedactor.cs:27-31`, `:34-40` | + +**Description** + +The redactor's over-redact fallback returns `rawEvent with { DetailsJson = null }` +(line 30). That scrubs only `DetailsJson` — `Target`, which this redactor is also responsible +for capping, is carried through untouched. The spec's "over-redact" rule is "returns a strictly +safer event" (SPEC §2); leaving an oversized/sensitive `Target` in place is not strictly safer. + +This becomes reachable under misconfiguration. `Truncate(value, max)` does `marker[..max]` when +`marker.Length >= max` (line 38). For a negative `MaxTargetLength` or `MaxDetailsJsonLength`, +that slice throws `ArgumentOutOfRangeException`, the `Apply` catch fires, and the result keeps +the **original, untruncated `Target`** while only nulling `DetailsJson`. So a config bug that +should fail safe instead leaves the `Target` field unredacted. (`max == 0` is handled correctly — +`marker[..0]` is `""`.) The redactor never throws, so the "must not throw" half of the contract +holds; it is the "strictly safer" half that is violated. + +**Recommendation** + +Make the fallback strictly safer across the fields this redactor owns, e.g. +`return rawEvent with { DetailsJson = null, Target = null };` in the catch. Additionally, guard +against nonsensical caps at the boundary — clamp `max` to `>= 0` in `Truncate` (treat negative +as 0), or validate the options in the constructor — so the catch path is only ever hit by truly +unexpected failures rather than a predictable negative-length misconfiguration. + +**Resolution** + +_Unresolved._ + +### Audit-003 — `TruncatingAuditRedactorOptions` is a mutable class, not the immutable "options record" the contract describes + +| | | +|--|--| +| Severity | Low | +| Category | Public API surface & compatibility | +| Status | Open | +| Location | `ZB.MOM.WW.Audit/src/ZB.MOM.WW.Audit/TruncatingAuditRedactorOptions.cs:4-12` | + +**Description** + +The shared contract lists `TruncatingAuditRedactorOptions` as an "Options record" +(`shared-contract/ZB.MOM.WW.Audit.md:93`), but it is implemented as a `sealed class` with three +mutable `get; set;` auto-properties. `TruncatingAuditRedactor` captures the instance by reference +(`_options = options ?? new(...)`, `TruncatingAuditRedactor.cs:14`) and the redactor is typically +registered as a singleton. Because the options object stays mutable, a caller (or any other +holder of the same reference) can change `MaxDetailsJsonLength` / `MaxTargetLength` / +`TruncationMarker` after the redactor is built, changing redaction behaviour at runtime under a +concurrent host with no synchronization. This is a (minor) thread-safety footgun and a drift from +both the documented "record" shape and the library's otherwise consistent immutability posture +(`AuditEvent` is a record with init-only members). + +**Recommendation** + +Make the options immutable: convert to a `sealed record` with `init`-only properties (matching +the contract wording), or have the redactor defensively snapshot the three values into private +readonly fields at construction so post-construction mutation cannot affect it. + +**Resolution** + +_Unresolved._ + +### Audit-004 — XML documentation is authored but not emitted, so IntelliSense docs do not ship to consumers + +| | | +|--|--| +| Severity | Low | +| Category | Documentation & XML docs | +| Status | Open | +| Location | `ZB.MOM.WW.Audit/src/ZB.MOM.WW.Audit/ZB.MOM.WW.Audit.csproj:1-18`, `ZB.MOM.WW.Audit/Directory.Build.props:1-10` | + +**Description** + +Every public type and member carries accurate XML doc comments, but neither the project nor +`Directory.Build.props` sets `GenerateDocumentationFile`. As a result no `ZB.MOM.WW.Audit.xml` +is produced or packed alongside the DLL (confirmed: no XML doc file in `bin/`), so consuming +apps get no IntelliSense/tooltip documentation from the package. For a shared library where, per +the review process, "public docs matter," the authored docs are effectively invisible to +consumers. (This is a family-wide baseline gap — none of the sibling `ZB.MOM.WW.*` libraries set +the flag either — so it is a baseline issue rather than an Audit-specific regression, hence Low.) + +**Recommendation** + +Set `true` (ideally once in the shared +`Directory.Build.props` so it applies across the family). Because the public surface is fully +documented, no CS1591 "missing XML comment" warnings should appear. + +**Resolution** + +_Unresolved._ + +### Audit-005 — Missing edge-case tests for the redactor never-throw/over-redact contract and composite null/empty handling + +| | | +|--|--| +| Severity | Low | +| Category | Testing coverage | +| Status | Open | +| Location | `ZB.MOM.WW.Audit/tests/ZB.MOM.WW.Audit.Tests/TruncatingAuditRedactorTests.cs`, `.../CompositeAuditWriterTests.cs` | + +**Description** + +The 19 tests cover the happy paths and the key guards (UTC normalization + equality, fan-out, +cancellation propagation, redaction ordering, marker-longer-than-max). Two contract-critical +behaviours are untested: + +1. The `IAuditRedactor` "never throws / over-redacts on failure" contract — no test drives + `TruncatingAuditRedactor.Apply` into its `catch` branch, so the over-redaction fallback + (and the asymmetry flagged in Audit-002) is unverified. +2. `CompositeAuditWriter` boundary inputs — no test covers an empty writer list (should be a + no-op) or a list containing a `null` writer (currently swallowed via the bare `catch`, + silently dropping that sink — behaviour that should be pinned by a test). + +Pinning these would have surfaced Audit-002 and would lock the documented never-throw guarantee. + +**Recommendation** + +Add: (a) a redactor test that forces an internal failure (e.g. a misconfigured cap) and asserts +`Apply` returns a strictly-safer event without throwing; (b) a composite test for the empty-list +no-op; and (c) a composite test documenting the null-entry behaviour (whether swallowed or +guarded — pair with the chosen fix for the null case). + +**Resolution** + +_Unresolved._ diff --git a/code-reviews/Auth/findings.md b/code-reviews/Auth/findings.md index e1016cd..0ad35cb 100644 --- a/code-reviews/Auth/findings.md +++ b/code-reviews/Auth/findings.md @@ -6,31 +6,249 @@ | Packages | `ZB.MOM.WW.Auth.Abstractions`, `ZB.MOM.WW.Auth.Ldap`, `ZB.MOM.WW.Auth.ApiKeys`, `ZB.MOM.WW.Auth.AspNetCore` | | Component spec | `components/auth/spec/SPEC.md` | | Shared contract | `components/auth/shared-contract/ZB.MOM.WW.Auth.md` | -| Status | Not yet reviewed | -| Last reviewed | — | -| Reviewer | — | -| Commit reviewed | — | -| Open findings | 0 | +| Status | Reviewed | +| Last reviewed | 2026-06-01 | +| Reviewer | Claude (automated baseline) | +| Commit reviewed | `5f75cd4` | +| Open findings | 6 | ## Summary -_Not yet reviewed._ +The security-critical core of this library is sound: API-key secrets are 32-byte CSPRNG values, +hashing is HMAC-SHA256 keyed by an external pepper, verification uses +`CryptographicOperations.FixedTimeEquals`, the SQLite stores use parameterized SQL throughout, +LDAP filter values are RFC 4515-escaped before being interpolated into the search filter, group-DN +parsing is RFC 4514 escape-aware, and the LDAP service is genuinely fail-closed (never throws, +requires exactly one match plus a verified password plus >=1 group for success). Test coverage is +strong and contract-focused (~172 tests; every failure mode, the parser, the hasher, the escaper, +the stores, the migrator and the DI wiring are exercised). The findings below are about +robustness-around-the-edges and documentation, not the cryptographic core. The two themes are +(1) two narrow paths that can turn a recoverable condition into a thrown exception on the +authentication hot path, contradicting the library's own "only exception path is cancellation" / +"fail fast at startup" promises, and (2) public-facing documentation (README) that misstates the +hashing algorithm and the AspNetCore surface. The library is **not yet adopted** by the three apps +(tracked in `components/auth/GAPS.md`). ## Checklist coverage | # | Category | Examined | Notes | |---|----------|----------|-------| -| 1 | Correctness & logic bugs | ☐ | | -| 2 | Public API surface & compatibility | ☐ | | -| 3 | Concurrency & thread safety | ☐ | | -| 4 | Error handling & resilience | ☐ | | -| 5 | Security & secret handling | ☐ | | -| 6 | Performance & resource management | ☐ | | -| 7 | Spec & shared-contract adherence | ☐ | | -| 8 | Packaging, dependencies & project layout | ☐ | | -| 9 | Testing coverage | ☐ | | -| 10 | Documentation & XML docs | ☐ | | +| 1 | Correctness & logic bugs | ☑ | Core logic correct; see Auth-003 (JSON parse can throw on hot path), Auth-005 (KeyPrefix value inconsistency) | +| 2 | Public API surface & compatibility | ☑ | Surface is minimal, sealed records, correct nullable annotations, internal seams hidden via `InternalsVisibleTo`. No issues found | +| 3 | Concurrency & thread safety | ☑ | Options/records immutable; services stateless singletons; fresh SQLite connection per op with WAL + busy_timeout. No issues found | +| 4 | Error handling & resilience | ☑ | LDAP path fully fail-closed; but see Auth-001 (no `ValidateOnStart`), Auth-002 (`MarkUsedAsync` failure fails a valid auth), Auth-003 | +| 5 | Security & secret handling | ☑ | Peppered HMAC-SHA256 + FixedTimeEquals, CSPRNG secrets, parameterized SQL, LDAP escaping, hash-free list projection, secrets never logged. See Auth-006 (TLS cert-validation hook) | +| 6 | Performance & resource management | ☑ | Connections disposed via `await using`; `LdapConnection` disposed; no hot-path allocs of note. No issues found | +| 7 | Spec & shared-contract adherence | ☑ | Matches SPEC §1–§5 closely (HMAC-SHA256 pepper, constant-time compare, discriminated failures, cookie hardening). See Auth-001 (insecure-refusal deferred to first use) | +| 8 | Packaging, dependencies & project layout | ☑ | Clean 4-package split, central versions, minimal deps, Abstractions has no runtime deps. No issues found | +| 9 | Testing coverage | ☑ | Broad and contract-level; gaps: no test for corrupt-`scopes` deserialization (Auth-003) or `MarkUsedAsync`-throws (Auth-002); `ValidateOnStart` behavior untested (Auth-001) | +| 10 | Documentation & XML docs | ☑ | XML docs thorough and accurate; README is stale — see Auth-004 | ## Findings -_No findings recorded yet._ +### Auth-001 — LDAP options validator is registered but never runs at startup + +| | | +|--|--| +| Severity | Medium | +| Category | Error handling & resilience | +| Status | Open | +| Location | `ZB.MOM.WW.Auth/src/ZB.MOM.WW.Auth.AspNetCore/ServiceCollectionExtensions.cs:40` | + +**Description** + +`AddZbLdapAuth` binds `LdapOptions` with `services.Configure(...)` and registers the +validator with a plain `services.AddSingleton, LdapOptionsValidator>()`. +It never calls `OptionsBuilder.ValidateOnStart()`. With `Microsoft.Extensions.Options`, an +`IValidateOptions` runs only when the options instance is first materialized (`IOptions.Value`), +not at host startup. Here the only consumer that materializes the options is the `ILdapAuthService` +singleton factory (line 48), whose lambda runs on first resolution of `ILdapAuthService` — i.e. on +the first login attempt, not at boot. + +The result contradicts the explicit promise in both the method comment ("Fail fast at startup on a +misconfigured directory rather than on first login") and the validator's own XML doc ("Validates +`LdapOptions` at startup so a misconfiguration fails fast at boot"). It also weakens SPEC §2 step 1, +which says an insecure transport without `AllowInsecure` must be *refused* — that refusal is silently +deferred until a user tries to log in. A process can therefore boot green with `Transport=None` and +no `AllowInsecure`, and only surface the misconfiguration as a failed login later. Because this is a +shared library consumed by three hosts, the missing guard affects all of them. + +**Recommendation** + +Register options via the builder and add `ValidateOnStart`: + +```csharp +services.AddOptions() + .Bind(config.GetSection(sectionPath)) + .ValidateOnStart(); +services.AddSingleton, LdapOptionsValidator>(); +``` + +(`ValidateOnStart` relies on the host's start-time options validation, which the ASP.NET Core hosts +already run.) Add a test asserting that building + starting the host fails for an insecure config. + +**Resolution** + +_Unresolved._ + +### Auth-002 — A failed `MarkUsedAsync` write turns a valid API key into a thrown exception + +| | | +|--|--| +| Severity | Medium | +| Category | Error handling & resilience | +| Status | Open | +| Location | `ZB.MOM.WW.Auth/src/ZB.MOM.WW.Auth.ApiKeys/ApiKeyVerifier.cs:66` | + +**Description** + +After a successful constant-time secret match, `VerifyAsync` does +`await store.MarkUsedAsync(record.KeyId, _timeProvider.GetUtcNow(), ct)` before returning the +success result. `MarkUsedAsync` is a best-effort "last used" bookkeeping write. If that write fails +(SQLite still `SQLITE_BUSY` after the 5 s busy-timeout, disk full, DB locked by a long migration, +etc.), the exception propagates out of `VerifyAsync` and the otherwise-valid credential fails +authentication. The verifier's own class remark states "The only exception path is cancellation", +so callers (gRPC interceptors / middleware) will not be wrapping this in a try/catch and a transient +storage hiccup becomes an auth outage for a legitimate caller. + +**Recommendation** + +Treat the usage write as best-effort: either swap the order (return success, then fire-and-record), +or wrap the `MarkUsedAsync` call so a failure is logged/swallowed rather than failing the +verification — the authentication decision has already been made by line 60. If `MarkUsedAsync` is +allowed to throw, update the XML remark to say so. Add a test where the store's `MarkUsedAsync` +throws and assert the result is still `Succeeded == true`. + +**Resolution** + +_Unresolved._ + +### Auth-003 — Corrupt `scopes`/`constraints` column throws `JsonException` through the verifier + +| | | +|--|--| +| Severity | Medium | +| Category | Correctness & logic bugs | +| Status | Open | +| Location | `ZB.MOM.WW.Auth/src/ZB.MOM.WW.Auth.ApiKeys/Sqlite/ScopeSerializer.cs:31`, `ZB.MOM.WW.Auth/src/ZB.MOM.WW.Auth.ApiKeys/Sqlite/SqliteApiKeyStore.cs:71` | + +**Description** + +`SqliteApiKeyStore.ReadRecord` calls `ScopeSerializer.Deserialize(reader.GetString(4))`, which does +`JsonSerializer.Deserialize(value)`. If the `scopes` column ever holds non-array or +malformed JSON (operator tampering, a partial write, a future schema/format change, or a bug in +another writer), `Deserialize` throws `JsonException`. That exception propagates out of +`FindByKeyIdAsync`, and therefore out of `ApiKeyVerifier.VerifyAsync`, as an unhandled exception — +again contradicting the "only exception path is cancellation" contract and converting a single +poisoned row into a crash on the auth path rather than a structured fail-closed result. The same +applies to the admin `ListAsync` projection. + +**Recommendation** + +Decide the policy explicitly. Either (a) catch `JsonException` in `ScopeSerializer.Deserialize` and +treat a malformed column as an empty/invalid scope set so verification fails closed structurally, or +(b) catch it at the store boundary and surface it as a controlled error. Add a test that inserts a +row with a malformed `scopes` value and asserts the verifier returns a structured failure (or a +documented exception) rather than an unhandled `JsonException`. + +**Resolution** + +_Unresolved._ + +### Auth-004 — README misstates the hashing algorithm and the AspNetCore public surface + +| | | +|--|--| +| Severity | Low | +| Category | Documentation & XML docs | +| Status | Open | +| Location | `ZB.MOM.WW.Auth/README.md:13`, `ZB.MOM.WW.Auth/README.md:14` | + +**Description** + +The README's package table contains two inaccuracies that matter for a security library: + +1. It describes `ZB.MOM.WW.Auth.ApiKeys` as a store "with pepper-based **PBKDF2** hashing". The code + uses **HMAC-SHA256** keyed by the pepper (`ApiKeySecretHasher.Hash` → `new HMACSHA256(pepperBytes)`), + not PBKDF2. This is the correct algorithm per SPEC §4 ("HMAC-SHA256 with an external pepper"), so + the *code* is right and the *README* is wrong — but a reader auditing the crypto will be misled + about iteration count / KDF properties that do not exist here. +2. It says `ZB.MOM.WW.Auth.AspNetCore` provides "ASP.NET Core DI helpers (`AddZbAuth`) ... and + `LdapOptionsValidator` registration. **Wires together Ldap + ApiKeys + cookie middleware**." The + actual public surface is a single extension method `AddZbLdapAuth` (LDAP only). There is no + `AddZbAuth`, no API-key re-export from this package, and no cookie-middleware wiring — + `ZbCookieDefaults.Apply` is a helper the consumer must call itself, and API-key DI lives in the + separate `ZB.MOM.WW.Auth.ApiKeys` package (`AddZbApiKeyAuth`). The shared contract + (`ZB.MOM.WW.Auth.md`) likewise lists a combined DI shape that the code splits differently. + +**Recommendation** + +Fix the README table: change "PBKDF2" to "HMAC-SHA256", and correct the AspNetCore row to describe +the real surface (`AddZbLdapAuth`, `ZbCookieDefaults.Apply`, `ZbClaimTypes`) and note that API-key +wiring is `AddZbApiKeyAuth` in the ApiKeys package. Optionally reconcile the shared-contract doc's +DI-method names with the shipped names. + +**Resolution** + +_Unresolved._ + +### Auth-005 — `CreateKeyAsync` persists `KeyPrefix` as `prefix_keyId`, inconsistent with the read path + +| | | +|--|--| +| Severity | Low | +| Category | Correctness & logic bugs | +| Status | Open | +| Location | `ZB.MOM.WW.Auth/src/ZB.MOM.WW.Auth.ApiKeys/Admin/ApiKeyAdminCommands.cs:104` | + +**Description** + +When creating a key, `KeyPrefix` is set to `$"{_options.TokenPrefix}_{keyId}"` (e.g. `mxgw_key-1`), +embedding the key id into the column. Elsewhere `KeyPrefix` is treated as the bare token prefix: +the verifier test (`ApiKeyVerifierTests.BuildRecord`) and the store test +(`SqliteApiKeyStoreTests.SampleRecord`) seed it as `"mxgw"` / `"mxgw_ab12"`, and the `ApiKeyRecord` +contract gives no indication it should include the key id. `KeyPrefix` is not consulted during +verification (the token is re-parsed from the header), so there is no security impact — but the +persisted/displayed value is inconsistent and self-referential (the key id is already its own +column), which will confuse admin tooling that surfaces `KeyPrefix`. + +**Recommendation** + +Persist `KeyPrefix = _options.TokenPrefix` (the bare prefix) and let the key id remain the `KeyId` +column, or document precisely what `KeyPrefix` is meant to contain on the `ApiKeyRecord` / +`ApiKeyListItem` records and align all call sites to it. + +**Resolution** + +_Unresolved._ + +### Auth-006 — LDAP connection ignores `allowInsecure` and offers no TLS certificate-validation hook + +| | | +|--|--| +| Severity | Low | +| Category | Security & secret handling | +| Status | Open | +| Location | `ZB.MOM.WW.Auth/src/ZB.MOM.WW.Auth.Ldap/Internal/NovellLdapConnection.cs:16` | + +**Description** + +`NovellLdapConnection.Connect` receives `bool allowInsecure` but never uses it; transport-security +correctness rests entirely on `LdapOptionsValidator` having run beforehand (and that registration is +itself lazy — see Auth-001). The method also sets `SecureSocketLayer`/`StartTls` but never configures +a server-certificate validation delegate or any pinning/CA hook. The underlying +`Novell.Directory.Ldap.NETStandard` 3.6.0 validates against the OS trust store by default (it does +*not* blind-accept), so this is not an unauthenticated-TLS hole today — hence Low — but the library +gives a consumer no seam to tighten (pin a CA, validate the SAN against `Server`, or deliberately +accept a self-signed dev cert), and the unused parameter reads as a guard that does nothing. + +**Recommendation** + +Either remove the unused `allowInsecure` parameter from the `ILdapConnection.Connect` seam or use it +to gate an explicit "accept invalid cert" path for dev only. Add an optional +`RemoteCertificateValidationCallback` (or CA-pinning option) so production consumers can harden LDAPS +beyond the OS default, and document the default validation behavior. + +**Resolution** + +_Unresolved._ diff --git a/code-reviews/Configuration/findings.md b/code-reviews/Configuration/findings.md index 7d23733..840e833 100644 --- a/code-reviews/Configuration/findings.md +++ b/code-reviews/Configuration/findings.md @@ -6,31 +6,181 @@ | Packages | `ZB.MOM.WW.Configuration` | | Component spec | `components/configuration/spec/SPEC.md` | | Shared contract | `components/configuration/shared-contract/ZB.MOM.WW.Configuration.md` | -| Status | Not yet reviewed | -| Last reviewed | — | -| Reviewer | — | -| Commit reviewed | — | -| Open findings | 0 | +| Status | Reviewed | +| Last reviewed | 2026-06-01 | +| Reviewer | Claude (automated baseline) | +| Commit reviewed | `5f75cd4` | +| Open findings | 4 | ## Summary -_Not yet reviewed._ +The library is small (five source files, ~230 LOC) and in good health. Its core promise — +**accumulate every failure, never short-circuit** — holds across all three entry points +(`ValidationBuilder`, `OptionsValidatorBase`, `ConfigPreflight`): the override never returns +early, each primitive appends to a `List` and returns `this`, and the `Success`/`Fail` +decision is taken only after every rule has run. The boundary cases that matter are correct: +`Port` rejects 0 and 65536 and accepts 1/65535; `HostPort` correctly rejects bare hosts, empty +ports, out-of-range ports, and (deliberately) unbracketed IPv6; `Required` treats null / empty / +whitespace alike; `OneOf` is case-insensitive and treats null as a failure (documented); +`MinCount` treats null as zero. Null-guarding is consistent (`ArgumentNullException`/ +`ArgumentException.ThrowIfNullOrWhiteSpace` on every public entry point). The +`ConfigPreflight.ThrowIfInvalid()` envelope is **byte-compatible** with the ScadaBridge +`StartupValidator` format the SPEC pins (`"Configuration validation failed:\n - "`, +`\n`-joined), and the validator created by `OptionsValidatorBase` is stateless (fresh +`ValidationBuilder` per call), so the singleton-registration requirement in SPEC §3 is honoured. +Dependency footprint is minimal and exactly as documented (four `Microsoft.Extensions.*` +abstractions, no ASP.NET Core). The four findings are all **Low**: a non-idempotent DI +registration (`AddSingleton` vs `TryAddEnumerable`), a small wording inconsistency in the raw-port +check, over-permissive integer port parsing, and the XML-doc-not-shipped packaging gap (which is +fleet-wide, not specific to this library). Test coverage (27 tests across the four public surfaces) +is solid for the happy and primary-failure paths; a few edge cases noted below are untested. ## Checklist coverage | # | Category | Examined | Notes | |---|----------|----------|-------| -| 1 | Correctness & logic bugs | ☐ | | -| 2 | Public API surface & compatibility | ☐ | | -| 3 | Concurrency & thread safety | ☐ | | -| 4 | Error handling & resilience | ☐ | | -| 5 | Security & secret handling | ☐ | | -| 6 | Performance & resource management | ☐ | | -| 7 | Spec & shared-contract adherence | ☐ | | -| 8 | Packaging, dependencies & project layout | ☐ | | -| 9 | Testing coverage | ☐ | | -| 10 | Documentation & XML docs | ☐ | | +| 1 | Correctness & logic bugs | ☑ | Accumulation never short-circuits; port/host:port/duration/min-count boundaries correct. Port parsing over-permissive (003). | +| 2 | Public API surface & compatibility | ☑ | Surface matches the shared contract exactly; nullability annotations correct; `sealed`/`abstract` deliberate; internal `Checks` does not leak. No issues. | +| 3 | Concurrency & thread safety | ☑ | Validator is stateless (fresh builder per `Validate`); `ValidationBuilder`/`ConfigPreflight` are per-call locals. Safe as a singleton. No issues. | +| 4 | Error handling & resilience | ☑ | Guard clauses on every public entry; correct exception types (`OptionsValidationException` via `ValidateOnStart`, `InvalidOperationException` from `ConfigPreflight`); no swallowed exceptions. No issues. | +| 5 | Security & secret handling | ☑ | No secrets handled; failure messages echo config *values* (port/host/role), which are non-sensitive by design. No issues. | +| 6 | Performance & resource management | ☑ | Startup-only, no hot path, nothing disposable, no async. No issues. | +| 7 | Spec & shared-contract adherence | ☑ | `ConfigPreflight` envelope byte-compatible with `StartupValidator`; primitives match §2 table. Minor wording inconsistency in `PortValue` (002). | +| 8 | Packaging, dependencies & project layout | ☑ | Single package, minimal closure, central versions, `.gitignore` present, no tracked build output. XML docs / README not packaged (004). | +| 9 | Testing coverage | ☑ | 27 tests cover the four public surfaces + accumulation + byte-format. Untested: `AddValidatedOptions` null guards, double-registration, exact wording strings. | +| 10 | Documentation & XML docs | ☑ | Public XML docs present and accurate on every type/member. Not emitted into the nupkg (004). | ## Findings -_No findings recorded yet._ +### Configuration-001 — `AddValidatedOptions` uses `AddSingleton`, so a double call registers (and runs) the validator twice + +| | | +|--|--| +| Severity | Low | +| Category | Correctness & logic bugs | +| Status | Open | +| Location | `ZB.MOM.WW.Configuration/src/ZB.MOM.WW.Configuration/ServiceCollectionExtensions.cs:36` | + +**Description** + +`AddValidatedOptions` registers the validator with +`services.AddSingleton, TValidator>()`. `IValidateOptions` +is a multi-registration (enumerable) service — the options pipeline resolves *all* registered +`IValidateOptions` and runs each. Because `AddSingleton` appends unconditionally, calling +`AddValidatedOptions` twice for the same `TOptions`/`TValidator` pair (e.g. +two modules both guarding the same section, or an accidental duplicate during refactoring) +registers the validator twice, so it executes twice and every accumulated failure is reported +twice in the resulting `OptionsValidationException`. The framework-idiomatic registration for +`IValidateOptions` is `TryAddEnumerable(ServiceDescriptor.Singleton, +TValidator>())`, which is idempotent. Impact is limited (consumers normally call once per section, +and duplicate messages are cosmetic, not a correctness break of the accumulate-all contract), hence +Low — but the deviation from the .NET idiom is real and easy to fix. The SPEC §3 wording ("registers +the validator as a singleton") is satisfied either way. + +**Recommendation** + +Replace `services.AddSingleton, TValidator>()` with +`services.TryAddEnumerable(ServiceDescriptor.Singleton, TValidator>())` +(via `Microsoft.Extensions.DependencyInjection.Extensions`). Add a test asserting that calling +`AddValidatedOptions` twice yields a single set of failure messages. + +**Resolution** + +_Unresolved._ + +### Configuration-002 — `Checks.PortValue` quotes the raw value on a parse failure but not on a range failure + +| | | +|--|--| +| Severity | Low | +| Category | Spec & shared-contract adherence | +| Status | Open | +| Location | `ZB.MOM.WW.Configuration/src/ZB.MOM.WW.Configuration/Checks.cs:21` | + +**Description** + +`PortValue` produces two different message shapes for the same field depending on *why* the port +is invalid. When the raw string parses but is out of range (e.g. `"0"` or `"70000"`), it delegates +to `Checks.Port(int, field)`, which renders `" must be between 1 and 65535 (was 0)"` — +**unquoted**. When the raw string does not parse (e.g. `"notaport"`, `null`), it renders +`" must be between 1 and 65535 (was 'notaport')"` — **quoted**. So two failures of the same +rule, from the same raw-config caller, read with inconsistent quoting. The shared contract +(`shared-contract/ZB.MOM.WW.Configuration.md`, "Internal `Checks` seam") states `Checks` is "the +single source of failure wording" so "a port failure reads the same whether it came from a bound +options object or a raw config key" — this inconsistency is in mild tension with that claim. It is +cosmetic (no consumer parses the text) and the overlapping integer case *does* match the bound-options +wording, so it is Low. + +**Recommendation** + +Make the two branches use a consistent quoting convention for the offending value — e.g. have the +range-failure branch also quote (`(was '0')`), or have the parse-failure branch route through a +shared formatter. Lock the exact strings down with a wording assertion test. + +**Resolution** + +_Unresolved._ + +### Configuration-003 — Port parsing accepts leading sign and surrounding whitespace and is culture-sensitive + +| | | +|--|--| +| Severity | Low | +| Category | Correctness & logic bugs | +| Status | Open | +| Location | `ZB.MOM.WW.Configuration/src/ZB.MOM.WW.Configuration/Checks.cs:22`, `ZB.MOM.WW.Configuration/src/ZB.MOM.WW.Configuration/Checks.cs:36` | + +**Description** + +Both `PortValue` (`int.TryParse(raw, out var port)`) and `HostPort` +(`int.TryParse(value[(idx + 1)..], out var port)`) use the default `int.TryParse` overload, which +applies `NumberStyles.Integer` (`AllowLeadingWhite | AllowTrailingWhite | AllowLeadingSign`) and the +current culture. As a result strings the documentation describes as "integer TCP port" are accepted +more loosely than intended: `"+5000"`, `" 5000 "`, and `"host: 5000"` (space after the colon) all +parse and pass, and parsing is locale-dependent. A leading `-` parses to a negative number that the +subsequent range check rejects (so `"-1"` is correctly rejected), but the whitespace/leading-`+` +cases silently pass. This is a robustness nuance rather than a security or correctness break — a port +that survives the loose parse is still in `1..65535` — so it is Low. + +**Recommendation** + +Parse with an explicit, culture-invariant, strict style, e.g. +`int.TryParse(raw, NumberStyles.None, CultureInfo.InvariantCulture, out var port)` (rejects sign and +whitespace), in both `PortValue` and `HostPort`. Add `Theory` cases for `"+5000"`, `" 5000 "`, and a +space-after-colon endpoint to pin the behaviour. + +**Resolution** + +_Unresolved._ + +### Configuration-004 — XML documentation and README are not packaged into the nupkg + +| | | +|--|--| +| Severity | Low | +| Category | Documentation & XML docs | +| Status | Open | +| Location | `ZB.MOM.WW.Configuration/src/ZB.MOM.WW.Configuration/ZB.MOM.WW.Configuration.csproj:1`, `ZB.MOM.WW.Configuration/Directory.Build.props:1` | + +**Description** + +Every public type and member carries accurate XML doc comments, but neither the project nor +`Directory.Build.props` sets `true`, so no +`.xml` doc file is produced or included in the `dotnet pack` output. Consumers of the package +therefore get **no IntelliSense documentation** for `OptionsValidatorBase`, `ValidationBuilder`, +`AddValidatedOptions`, or `ConfigPreflight`, despite the docs existing in source. The project also +does not set `PackageReadmeFile`, so the README is not embedded in the package for display on the +NuGet feed. Category 10 of the review process explicitly notes "these are libraries — public docs +matter," so the gap is worth recording. It is fleet-wide (the sibling `ZB.MOM.WW.Audit` library has +the same omission), not a Configuration-specific regression, hence Low. + +**Recommendation** + +Add `true` (ideally in +`Directory.Build.props` so the whole family inherits it) and consider `README.md` +plus packing the README. Treat as a shared-infra change across the six libraries rather than a +one-off. + +**Resolution** + +_Unresolved._ diff --git a/code-reviews/Health/findings.md b/code-reviews/Health/findings.md index d016820..0e4478c 100644 --- a/code-reviews/Health/findings.md +++ b/code-reviews/Health/findings.md @@ -6,31 +6,250 @@ | Packages | `ZB.MOM.WW.Health`, `ZB.MOM.WW.Health.Akka`, `ZB.MOM.WW.Health.EntityFrameworkCore` | | Component spec | `components/health/spec/SPEC.md` | | Shared contract | `components/health/shared-contract/ZB.MOM.WW.Health.md` | -| Status | Not yet reviewed | -| Last reviewed | — | -| Reviewer | — | -| Commit reviewed | — | -| Open findings | 0 | +| Status | Reviewed | +| Last reviewed | 2026-06-01 | +| Reviewer | Claude (automated baseline) | +| Commit reviewed | `5f75cd4` | +| Open findings | 6 | ## Summary -_Not yet reviewed._ +The library is small, cohesive, and well-documented, and it tracks the normalized spec closely: +the three-tier endpoint convention, the canonical JSON writer, the `IActiveNodeGate` seam, the +configurable `AkkaClusterStatusPolicy` presets, and the role-filtered `ActiveNodeHealthCheck` all +match the contract. The package split is clean — Akka and EF Core stay out of the core package, so +MxGateway can take core only (category 8: no leakage). The decision logic is factored into pure, +exhaustively table-tested functions (`AkkaClusterStatusPolicy`, `ActiveNodeDecision`), and the test +suite (58 tests) exercises the public surface through a real TestServer / real SQLite / real probe +delegates rather than mocks. + +The findings cluster around **error-handling completeness in the health-check probes** (category 4), +the highest-risk area for this library: a probe that throws past the `IHealthCheck` boundary, or that +returns a status the spec says it should not, degrades the value of every consuming app's `/health/*` +endpoints. The two Akka checks and the gRPC check do not guard the cluster-state / probe call the way +the spec requires (return Degraded on inaccessible cluster) or the way the sibling `DatabaseHealthCheck` +does (catch-all → Unhealthy). The framework's `HealthCheckService` catches escaping exceptions, so none +of these crash an endpoint — that caps their severity at Medium — but they produce a result that +disagrees with the documented contract. The remainder are Low: a JSON shape nuance (`description` key +omitted vs. emitted-null), recommended-tag drift in XML docs, a return-value footgun in `MapZbHealth`, +and missing `GenerateDocumentationFile` so the (otherwise excellent) XML docs do not ship in the nupkgs. ## Checklist coverage | # | Category | Examined | Notes | |---|----------|----------|-------| -| 1 | Correctness & logic bugs | ☐ | | -| 2 | Public API surface & compatibility | ☐ | | -| 3 | Concurrency & thread safety | ☐ | | -| 4 | Error handling & resilience | ☐ | | -| 5 | Security & secret handling | ☐ | | -| 6 | Performance & resource management | ☐ | | -| 7 | Spec & shared-contract adherence | ☐ | | -| 8 | Packaging, dependencies & project layout | ☐ | | -| 9 | Testing coverage | ☐ | | -| 10 | Documentation & XML docs | ☐ | | +| 1 | Correctness & logic bugs | ☑ | Tier predicates, policy presets, and `ActiveNodeDecision` matrices are correct and match the spec tables. No logic defects found. | +| 2 | Public API surface & compatibility | ☑ | Health-005. Surface is minimal, sealed, well-named; nullable annotations correct; no internal types leak (`ActiveNodeDecision` is `internal`). | +| 3 | Concurrency & thread safety | ☑ | Options are mutable POCOs but per-registration; checks hold no shared mutable state; cluster-state reads are snapshot reads; static writer options are immutable. No issues found. | +| 4 | Error handling & resilience | ☑ | Health-001, Health-002. Akka checks don't guard cluster-state access; gRPC check lets non-Rpc/non-OCE exceptions escape. | +| 5 | Security & secret handling | ☑ | No secrets/PII. Exception objects attached to `HealthCheckResult` (standard); endpoints `AllowAnonymous` by design. No issues found. | +| 6 | Performance & resource management | ☑ | `DatabaseHealthCheck` is pool-safe and eagerly releases the timeout timer; scopes/contexts disposed via `await using`; channel not owned (correctly not disposed). No issues found. | +| 7 | Spec & shared-contract adherence | ☑ | Health-001, Health-003, Health-004. Endpoint convention + status strings otherwise match §3. | +| 8 | Packaging, dependencies & project layout | ☑ | Health-006. Dependency split correct; central versions; net10.0; correct package ids. | +| 9 | Testing coverage | ☑ | Strong: TestServer tier tests, real SQLite EF tests, table-driven policy/decision tests, gRPC probe + timeout + cancellation. Gaps noted inline in findings, not raised separately. | +| 10 | Documentation & XML docs | ☑ | Health-004, Health-006. Public API XML docs are thorough and accurate; the issues are recommended-tag drift and docs not being packed. | ## Findings -_No findings recorded yet._ +### Health-001 — Akka health checks throw (instead of returning Degraded) when cluster state is inaccessible + +| | | +|--|--| +| Severity | Medium | +| Category | Error handling & resilience | +| Status | Open | +| Location | `ZB.MOM.WW.Health/src/ZB.MOM.WW.Health.Akka/AkkaClusterHealthCheck.cs:45`, `ZB.MOM.WW.Health/src/ZB.MOM.WW.Health.Akka/ActiveNodeHealthCheck.cs:106` | + +**Description** + +Both Akka checks guard only the *null* ActorSystem case: `_serviceProvider.GetService()` +returns `null` → Degraded. But once a non-null `ActorSystem` is resolved they call `Cluster.Get(system)` +and read `.SelfMember` / `.State.Leader` with no try/catch. The spec is explicit that this path must be +Degraded, not an exception: + +> "in both modes, if the ActorSystem is not yet ready **or cluster state is inaccessible** (e.g. +> during startup), the check returns Degraded (startup-safety rule)." — `SPEC.md` §2.2 note. + +`Cluster.Get(system)` throws (`ConfigurationException`) when the `Akka.Cluster` extension is not +configured on the resolved ActorSystem, and `SelfMember` can throw during the window where the +ActorSystem object exists but the cluster has not finished initialising — exactly the startup race the +spec calls out. The ActorSystem being present-but-not-yet-clustered is a very real ordering in a +`Microsoft.Extensions.Hosting` app (the ActorSystem is registered in DI before `Cluster` has joined). + +The escaping exception is caught by ASP.NET Core's `HealthCheckService`, which records the entry as +Unhealthy — so the endpoint does not crash. But Unhealthy on the `ready` tier returns **503** and +removes the node from load-balancing, whereas the spec's intended Degraded returns **200** (still in +rotation). A transient startup window therefore yanks a healthy-but-starting node out of rotation, +the opposite of the "startup-safe" guarantee the XML docs advertise. + +**Recommendation** + +Wrap the cluster-state read in a try/catch and return `HealthCheckResult.Degraded(...)` on failure, so +both the null-ActorSystem and inaccessible-cluster paths converge on Degraded as the spec requires: +resolve `Cluster.Get(system)` and read membership inside a `try`, catching the cluster extension's +startup exceptions and mapping them to Degraded with a "cluster not yet ready" description. + +**Resolution** + +_Unresolved._ + +### Health-002 — `GrpcDependencyHealthCheck` lets non-`RpcException`/non-`OperationCanceledException` errors escape + +| | | +|--|--| +| Severity | Medium | +| Category | Error handling & resilience | +| Status | Open | +| Location | `ZB.MOM.WW.Health/src/ZB.MOM.WW.Health/GrpcDependencyHealthCheck.cs:54` | + +**Description** + +`CheckHealthAsync` catches only `RpcException` and `OperationCanceledException`. Any other exception +thrown by the probe escapes the method. This is reachable on both paths: + +- Default probe: `GrpcChannel.ConnectAsync` can throw exceptions other than `RpcException` — + e.g. `InvalidOperationException` (channel/socket misuse, shutdown) or `HttpRequestException` / + `SocketException` surfaced from the transport before a gRPC status is produced. +- Custom probe (`GrpcDependencyOptions.Probe`): a caller-supplied delegate (e.g. a + `grpc.health.v1.Health/Check` RPC built on a non-gRPC client) can throw anything. + +The XML docs and shared contract describe the result as Unhealthy "when it returns `false`, throws an +`RpcException`, or times out" — they do not promise to handle arbitrary exceptions, and the code does +not. By contrast the sibling `DatabaseHealthCheck` has a catch-all `catch (Exception ex)` → +`Unhealthy("Database connection failed.", ex)` (`DatabaseHealthCheck.cs:78`), so the two probes are +asymmetric. As with Health-001 the framework's `HealthCheckService` records the escaping exception as +Unhealthy, so no endpoint crashes — but the dependency-named, controlled description +(`"{name} probe failed…"`) is lost, and the library violates its own "must not throw past the +`IHealthCheck` boundary" intent. + +**Recommendation** + +Add a trailing `catch (Exception ex)` returning `HealthCheckResult.Unhealthy($"{name} probe failed: +{ex.Message}", ex)`, after the existing OCE/Rpc handlers (so external cancellation still propagates). +This brings the gRPC probe in line with `DatabaseHealthCheck` and the documented contract. + +**Resolution** + +_Unresolved._ + +### Health-003 — Null `description` is omitted from the JSON body instead of emitted as `null` + +| | | +|--|--| +| Severity | Low | +| Category | Spec & shared-contract adherence | +| Status | Open | +| Location | `ZB.MOM.WW.Health/src/ZB.MOM.WW.Health/ZbHealthWriter.cs:32` | + +**Description** + +`SerializerOptions` sets `DefaultIgnoreCondition = JsonIgnoreCondition.WhenWritingNull`, so a check that +produces no description renders an entry with **no** `description` key at all: +`{ "status": "Healthy", "durationMs": 0.1 }`. The spec models `description` as a `string?` that "may be +null" and both the canonical example (`SPEC.md` §3) and the `HealthChecks.UI.Client` shape the writer +claims to mirror emit the key with a `null` value rather than dropping it. Consumers/dashboards parsing +the body and reading `entries..description` must therefore handle a *missing* property, not just a +null one. The deviation is undocumented and untested — `ResponseWriterTests` only asserts the +present-description case. + +Low because the aggregate `status`/HTTP-code contract that orchestrators key on is unaffected; only the +descriptive sub-field shape drifts. + +**Recommendation** + +Either remove `WhenWritingNull` (so `description: null` is emitted, matching the spec example and the +UI-client shape), or document explicitly in the writer's XML docs and `SPEC.md` §3 that absent +descriptions are omitted, and add a test pinning the chosen behaviour. + +**Resolution** + +_Unresolved._ + +### Health-004 — XML docs recommend the `active` tag for ready-tier probes, contradicting the spec + +| | | +|--|--| +| Severity | Low | +| Category | Documentation & XML docs | +| Status | Open | +| Location | `ZB.MOM.WW.Health/src/ZB.MOM.WW.Health.Akka/AkkaClusterHealthCheck.cs:11`, `ZB.MOM.WW.Health/src/ZB.MOM.WW.Health/GrpcDependencyHealthCheck.cs:21` | + +**Description** + +`AkkaClusterHealthCheck`'s summary says *"Register to the `ZbHealthTags.Ready` tag (recommended +`[ready, active]`)"*, and `GrpcDependencyHealthCheck` likewise recommends tagging `[ready, active]`. +The spec is unambiguous that both probes belong to the `ready` tier only: §2.2 "Registered to the +`ready` tag" (Akka cluster) and §2.4 "Registered to the `ready` tag" (gRPC dependency). The `active` +tier is reserved by §1 / §2.3 for the leader/active-singleton probe (`ActiveNodeHealthCheck`); putting +cluster-membership or gRPC-reachability checks on the `active` tier pollutes the active-node tier with +non-leadership concerns and contradicts the convergence convention the library exists to enforce. + +**Recommendation** + +Bring the XML-doc recommendations in line with the spec — recommend `ZbHealthTags.Ready` only for both +probes (drop the `active` suggestion); or, if the dual-tag recommendation is intentional, reconcile it +back into `SPEC.md` §2.2/§2.4 so code and spec agree. + +**Resolution** + +_Unresolved._ + +### Health-005 — `MapZbHealth` returns only the readiness builder, silently dropping conventions for the active/live tiers + +| | | +|--|--| +| Severity | Low | +| Category | Public API surface & compatibility | +| Status | Open | +| Location | `ZB.MOM.WW.Health/src/ZB.MOM.WW.Health/ZbHealthEndpointExtensions.cs:64` | + +**Description** + +`MapZbHealth` maps three endpoints but returns the `IEndpointConventionBuilder` for `/health/ready` +only. Any convention a caller chains onto the result — `.RequireHost(...)`, `.RequireAuthorization()`, +`.WithMetadata(...)`, `.CacheOutput()` — applies to the readiness endpoint **alone**; the active and +liveness endpoints silently do not receive it. The behaviour is documented in the XML ``, so +this is a deliberate, disclosed trade-off rather than a bug, but it is a real footgun: the most natural +reading of `endpoints.MapZbHealth().RequireHost("…")` is "gate all three health endpoints", and the +result type gives the caller no signal that two of the three are excluded. + +**Recommendation** + +Consider returning a composite builder that fans conventions out to all three endpoints (collect the +three `IEndpointConventionBuilder`s and wrap them) so chained conventions behave as a caller expects. +If the single-builder return is kept for simplicity, strengthen the `` remark to a warning and +note it in the README's `MapZbHealth` example. + +**Resolution** + +_Unresolved._ + +### Health-006 — XML documentation is not emitted into the packed nupkgs + +| | | +|--|--| +| Severity | Low | +| Category | Packaging, dependencies & project layout | +| Status | Open | +| Location | `ZB.MOM.WW.Health/Directory.Build.props:3`, `ZB.MOM.WW.Health/src/ZB.MOM.WW.Health/ZB.MOM.WW.Health.csproj:3` | + +**Description** + +Every public type and member in all three packages carries thorough, accurate XML documentation, but no +project (and neither `Directory.Build.props` nor the three `.csproj` files) sets +`true`. As a result `dotnet pack` produces nupkgs +that contain the DLLs but **no `.xml` doc files**, so consumers of `ZB.MOM.WW.Health` / `.Akka` / +`.EntityFrameworkCore` get no IntelliSense summaries or parameter docs for the API. For a shared library +whose value proposition is a documented common surface across three apps, this discards the +documentation effort at the package boundary. (It also means the compiler does not enforce the +"public members are documented" invariant via CS1591.) + +**Recommendation** + +Set `true` once in `Directory.Build.props` so all +three packable projects emit and pack their XML docs. Optionally pair with targeted `` if any +CS1591 warnings surface on currently-undocumented members. + +**Resolution** + +_Unresolved._ diff --git a/code-reviews/README.md b/code-reviews/README.md index d1a4b3f..713ebcc 100644 --- a/code-reviews/README.md +++ b/code-reviews/README.md @@ -35,26 +35,26 @@ code-reviews/ ## Summary -0 of 6 libraries reviewed. 0 pending findings across all libraries. +6 of 6 libraries reviewed. 35 pending findings across all libraries. | Severity | Open findings | |----------|---------------| | Critical | 0 | -| High | 0 | -| Medium | 0 | -| Low | 0 | -| **Total** | **0** | +| High | 1 | +| Medium | 9 | +| Low | 25 | +| **Total** | **35** | ## Library Status | Library | Last reviewed | Commit | Open (C/H/M/L) | Open | Total | |---------|---------------|--------|----------------|------|-------| -| [Audit](Audit/findings.md) | — | — | 0/0/0/0 | 0 | 0 | -| [Auth](Auth/findings.md) | — | — | 0/0/0/0 | 0 | 0 | -| [Configuration](Configuration/findings.md) | — | — | 0/0/0/0 | 0 | 0 | -| [Health](Health/findings.md) | — | — | 0/0/0/0 | 0 | 0 | -| [Telemetry](Telemetry/findings.md) | — | — | 0/0/0/0 | 0 | 0 | -| [Theme](Theme/findings.md) | — | — | 0/0/0/0 | 0 | 0 | +| [Audit](Audit/findings.md) | 2026-06-01 | `5f75cd4` | 0/0/1/4 | 5 | 5 | +| [Auth](Auth/findings.md) | 2026-06-01 | `5f75cd4` | 0/0/3/3 | 6 | 6 | +| [Configuration](Configuration/findings.md) | 2026-06-01 | `5f75cd4` | 0/0/0/4 | 4 | 4 | +| [Health](Health/findings.md) | 2026-06-01 | `5f75cd4` | 0/0/2/4 | 6 | 6 | +| [Telemetry](Telemetry/findings.md) | 2026-06-01 | `5f75cd4` | 0/1/2/5 | 8 | 8 | +| [Theme](Theme/findings.md) | 2026-06-01 | `5f75cd4` | 0/0/1/5 | 6 | 6 | ## Pending Findings @@ -67,14 +67,52 @@ description, location, recommendation — lives in the library's `findings.md`. _None open._ -### High (0) +### High (1) -_None open._ +| ID | Library | Title | +|----|---------|-------| +| Telemetry-001 | [Telemetry](Telemetry/findings.md) | `RedactionEnricher` ignores property removal, leaving secrets in the event | -### Medium (0) +### Medium (9) -_None open._ +| ID | Library | Title | +|----|---------|-------| +| Audit-001 | [Audit](Audit/findings.md) | `CompositeAuditWriter` re-throws `OperationCanceledException` to the caller, contradicting the "must not throw" writer contract | +| Auth-001 | [Auth](Auth/findings.md) | LDAP options validator is registered but never runs at startup | +| Auth-002 | [Auth](Auth/findings.md) | A failed `MarkUsedAsync` write turns a valid API key into a thrown exception | +| Auth-003 | [Auth](Auth/findings.md) | Corrupt `scopes`/`constraints` column throws `JsonException` through the verifier | +| Health-001 | [Health](Health/findings.md) | Akka health checks throw (instead of returning Degraded) when cluster state is inaccessible | +| Health-002 | [Health](Health/findings.md) | `GrpcDependencyHealthCheck` lets non-`RpcException`/non-`OperationCanceledException` errors escape | +| Telemetry-002 | [Telemetry](Telemetry/findings.md) | Redactor cannot inspect or scrub destructured/structured property values | +| Telemetry-003 | [Telemetry](Telemetry/findings.md) | No tests for redactor removal or structured-value redaction | +| Theme-001 | [Theme](Theme/findings.md) | Mobile hamburger toggle silently depends on Bootstrap collapse JS | -### Low (0) +### Low (25) -_None open._ +| ID | Library | Title | +|----|---------|-------| +| Audit-002 | [Audit](Audit/findings.md) | `TruncatingAuditRedactor` over-redaction is partial: the catch path scrubs only `DetailsJson`, leaving `Target` unredacted | +| Audit-003 | [Audit](Audit/findings.md) | `TruncatingAuditRedactorOptions` is a mutable class, not the immutable "options record" the contract describes | +| Audit-004 | [Audit](Audit/findings.md) | XML documentation is authored but not emitted, so IntelliSense docs do not ship to consumers | +| Audit-005 | [Audit](Audit/findings.md) | Missing edge-case tests for the redactor never-throw/over-redact contract and composite null/empty handling | +| Auth-004 | [Auth](Auth/findings.md) | README misstates the hashing algorithm and the AspNetCore public surface | +| Auth-005 | [Auth](Auth/findings.md) | `CreateKeyAsync` persists `KeyPrefix` as `prefix_keyId`, inconsistent with the read path | +| Auth-006 | [Auth](Auth/findings.md) | LDAP connection ignores `allowInsecure` and offers no TLS certificate-validation hook | +| Configuration-001 | [Configuration](Configuration/findings.md) | `AddValidatedOptions` uses `AddSingleton`, so a double call registers (and runs) the validator twice | +| Configuration-002 | [Configuration](Configuration/findings.md) | `Checks.PortValue` quotes the raw value on a parse failure but not on a range failure | +| Configuration-003 | [Configuration](Configuration/findings.md) | Port parsing accepts leading sign and surrounding whitespace and is culture-sensitive | +| Configuration-004 | [Configuration](Configuration/findings.md) | XML documentation and README are not packaged into the nupkg | +| Health-003 | [Health](Health/findings.md) | Null `description` is omitted from the JSON body instead of emitted as `null` | +| Health-004 | [Health](Health/findings.md) | XML docs recommend the `active` tag for ready-tier probes, contradicting the spec | +| Health-005 | [Health](Health/findings.md) | `MapZbHealth` returns only the readiness builder, silently dropping conventions for the active/live tiers | +| Health-006 | [Health](Health/findings.md) | XML documentation is not emitted into the packed nupkgs | +| Telemetry-004 | [Telemetry](Telemetry/findings.md) | `service.instance.id` Resource attribute is undocumented in spec and contract | +| Telemetry-005 | [Telemetry](Telemetry/findings.md) | Two hand-maintained Resource-attribute builders can silently drift | +| Telemetry-006 | [Telemetry](Telemetry/findings.md) | Malformed `OtlpEndpoint` throws `UriFormatException` late, with no context | +| Telemetry-007 | [Telemetry](Telemetry/findings.md) | Redaction snapshot allocates a dictionary on every log event | +| Telemetry-008 | [Telemetry](Telemetry/findings.md) | `MapZbMetrics` XML doc claims it is "only valid when Exporter = Prometheus" — stale | +| Theme-002 | [Theme](Theme/findings.md) | `.chip-idle` foreground diverges from the documented token pairing | +| Theme-003 | [Theme](Theme/findings.md) | Hardcoded hex values in CSS contradict the "no hardcoded hex" rule | +| Theme-004 | [Theme](Theme/findings.md) | `NavRailItem` emits a `.rail-ico` span that no stylesheet defines | +| Theme-005 | [Theme](Theme/findings.md) | Orphan and unstyled nav CSS classes in `layout.css` | +| Theme-006 | [Theme](Theme/findings.md) | Public component/parameter surface lacks XML documentation | diff --git a/code-reviews/Telemetry/findings.md b/code-reviews/Telemetry/findings.md index 366dea4..8566ea9 100644 --- a/code-reviews/Telemetry/findings.md +++ b/code-reviews/Telemetry/findings.md @@ -6,31 +6,312 @@ | Packages | `ZB.MOM.WW.Telemetry`, `ZB.MOM.WW.Telemetry.Serilog` | | Component spec | `components/observability/spec/SPEC.md` | | Shared contract | `components/observability/shared-contract/ZB.MOM.WW.Telemetry.md` | -| Status | Not yet reviewed | -| Last reviewed | — | -| Reviewer | — | -| Commit reviewed | — | -| Open findings | 0 | +| Status | Reviewed | +| Last reviewed | 2026-06-01 | +| Reviewer | Claude (automated baseline) | +| Commit reviewed | `5f75cd4` | +| Open findings | 8 | ## Summary -_Not yet reviewed._ +The library is small, focused, and well-structured: two packages with a clean Serilog/OTel +boundary (the `Serilog.*` stack appears only in the `.Serilog` package; the core package is +pure OTel + ASP.NET Core framework reference), correct argument validation, deliberate +`sealed` types, thorough XML docs, and a deliberate no-process-global-state design for +`AddZbSerilog` that is well covered by `MultiHostTests`. The identity triple, Resource +omission rules, exporter wiring (Prometheus always-on, OTLP additive), and trace/log +correlation all match the spec's intent and are exercised by the 19 tests. + +The most material problems are in the redaction seam — the one component the review brief +flags as security-critical. `RedactionEnricher` honours only *replacement* of scalar +properties: it silently ignores the redactor **removing** a key (a documented capability of +`ILogRedactor`), and it cannot see inside destructured/structured property values, so a +secret logged as a field of `{@Object}` is never scrubbed. Both let secrets reach sinks +despite a conforming redactor. Secondary themes: a spec drift around an undocumented +`service.instance.id` Resource attribute, two hand-maintained Resource-attribute builders +that can drift apart, and a stale doc-comment on `MapZbMetrics`. Tests are solid for the +happy paths but have no coverage for redactor removal or structured-value redaction. ## Checklist coverage | # | Category | Examined | Notes | |---|----------|----------|-------| -| 1 | Correctness & logic bugs | ☐ | | -| 2 | Public API surface & compatibility | ☐ | | -| 3 | Concurrency & thread safety | ☐ | | -| 4 | Error handling & resilience | ☐ | | -| 5 | Security & secret handling | ☐ | | -| 6 | Performance & resource management | ☐ | | -| 7 | Spec & shared-contract adherence | ☐ | | -| 8 | Packaging, dependencies & project layout | ☐ | | -| 9 | Testing coverage | ☐ | | -| 10 | Documentation & XML docs | ☐ | | +| 1 | Correctness & logic bugs | ☑ | Redactor "remove" path is a no-op (Telemetry-001); structured values opaque to redactor (Telemetry-002). | +| 2 | Public API surface & compatibility | ☑ | Surface minimal, `sealed`, nullable-correct. `ZbResource.InstanceId` is an added public member not in the contract (Telemetry-004). | +| 3 | Concurrency & thread safety | ☑ | No issues found. Enrichers stateless; `Lazy` uses `ExecutionAndPublication`; `Activity.Current` is async-local. | +| 4 | Error handling & resilience | ☑ | Guard clauses present. `new Uri(OtlpEndpoint)` can throw late on malformed input (Telemetry-006). | +| 5 | Security & secret handling | ☑ | Redaction gaps (Telemetry-001/002) are security-relevant — secrets can survive a conforming redactor. | +| 6 | Performance & resource management | ☑ | Per-event dictionary snapshot when a redactor is registered (Telemetry-007); acceptable but noted. | +| 7 | Spec & shared-contract adherence | ☑ | Undocumented `service.instance.id` attribute (Telemetry-004); two Resource builders that can drift (Telemetry-005). | +| 8 | Packaging, dependencies & project layout | ☑ | No issues found. Serilog stack confined to `.Serilog`; central versions; correct net10.0; framework ref justified. | +| 9 | Testing coverage | ☑ | No tests for redactor removal or structured-value redaction (Telemetry-003). | +| 10 | Documentation & XML docs | ☑ | `MapZbMetrics` doc-comment is stale: claims "only valid when Exporter = Prometheus" (Telemetry-008). | ## Findings -_No findings recorded yet._ +### Telemetry-001 — `RedactionEnricher` ignores property removal, leaving secrets in the event + +| | | +|--|--| +| Severity | High | +| Category | Security & secret handling | +| Status | Open | +| Location | `ZB.MOM.WW.Telemetry/src/ZB.MOM.WW.Telemetry.Serilog/RedactionEnricher.cs:49-67` | + +**Description** + +`ILogRedactor.Redact` is documented to let a project "remove or replace any sensitive +values" (`ILogRedactor.cs:13` and the XML doc on the interface method: *"remove or replace"*; +the shared contract repeats *"Remove or replace any sensitive values"*). `RedactionEnricher` +builds a `snapshot` dictionary, hands it to the redactor, then writes back only via +`AddOrUpdateProperty` for entries that remain in the snapshot and `HasChanged`: + +```csharp +foreach (var entry in snapshot) +{ + if (HasChanged(logEvent, entry.Key, entry.Value)) + logEvent.AddOrUpdateProperty(propertyFactory.CreateProperty(entry.Key, entry.Value)); +} +``` + +If a redactor *removes* a key from the dictionary (`properties.Remove("apiKey")`) — the most +natural way to implement "must not leave the process" — that key simply no longer appears in +the write-back loop, so the original property is **never removed from `logEvent`**. The +secret reaches every sink unredacted, even though the redactor did exactly what its contract +permits. This defeats the seam's stated operational guarantee ("secrets never leave the +process in log events") for any removal-style redactor. + +**Recommendation** + +After calling the redactor, reconcile deletions: for each property key present on the +original `logEvent` but absent from the returned `snapshot`, call +`logEvent.RemovePropertyIfPresent(key)`. (Capture the original key set before mutation, then +diff.) Add a test asserting a removing redactor scrubs the property (see Telemetry-003). + +**Resolution** + +_Unresolved._ + +### Telemetry-002 — Redactor cannot inspect or scrub destructured/structured property values + +| | | +|--|--| +| Severity | Medium | +| Category | Security & secret handling | +| Status | Open | +| Location | `ZB.MOM.WW.Telemetry/src/ZB.MOM.WW.Telemetry.Serilog/RedactionEnricher.cs:49-55` | + +**Description** + +The snapshot only unwraps `ScalarValue`; every other `LogEventPropertyValue` +(`StructureValue` from `{@Object}`, `SequenceValue`, `DictionaryValue`) is passed to the +redactor as the raw Serilog wrapper object: + +```csharp +snapshot[property.Key] = property.Value is ScalarValue scalar ? scalar.Value : property.Value; +``` + +A project redactor written against the seam (`IDictionary` of "values") +therefore sees an opaque `StructureValue` for a destructured payload — it cannot read or +mask a secret *field inside* a logged object (e.g. `logger.Information("{@Command}", cmd)` +where `cmd.ApiKey` is sensitive). MxGateway's reference redactor specifically guards +"which command payloads must not leave the process" (per `ILogRedactor` XML doc and the +contract), which is precisely the destructured-object case. The seam silently cannot meet +that requirement; the redactor only works for top-level scalar properties. + +**Recommendation** + +Document the seam's actual reach (scalar top-level properties only) on `ILogRedactor` and in +the shared contract, and/or recursively project `StructureValue`/`SequenceValue`/ +`DictionaryValue` into the snapshot and rebuild them on write-back so nested fields are +redactable. At minimum, make the limitation explicit so consumers do not assume nested +payloads are scrubbed when they are not. + +**Resolution** + +_Unresolved._ + +### Telemetry-003 — No tests for redactor removal or structured-value redaction + +| | | +|--|--| +| Severity | Medium | +| Category | Testing coverage | +| Status | Open | +| Location | `ZB.MOM.WW.Telemetry/tests/ZB.MOM.WW.Telemetry.Serilog.Tests/RedactionTests.cs:33-69` | + +**Description** + +`RedactionTests` covers exactly two redaction behaviours: a registered redactor replacing a +scalar value, and a no-op when none is registered. The `FakeRedactor` only ever *reassigns* +`properties["apiKey"]`. There is no test that a redactor which **removes** a key actually +scrubs it (the Telemetry-001 defect would have been caught), and no test that a redactor can +mask a field of a destructured/structured property (Telemetry-002). For a seam whose entire +purpose is secret containment, the most security-relevant behaviours are untested. + +**Recommendation** + +Add tests: (a) a redactor calling `properties.Remove(key)` results in the property being +absent from the emitted `LogEvent`; (b) a redactor attempting to mask a nested field of a +`{@Object}` payload, asserting the documented behaviour (whichever resolution Telemetry-002 +takes). These should fail today and pin the fixes. + +**Resolution** + +_Unresolved._ + +### Telemetry-004 — `service.instance.id` Resource attribute is undocumented in spec and contract + +| | | +|--|--| +| Severity | Low | +| Category | Spec & shared-contract adherence | +| Status | Open | +| Location | `ZB.MOM.WW.Telemetry/src/ZB.MOM.WW.Telemetry/ZbResource.cs:19-45` | + +**Description** + +`ZbResource` adds a `service.instance.id` attribute (deterministic `MachineName:ProcessId`) +to the Resource, and exposes it as a new public member `ZbResource.InstanceId`. The +normalized Resource attribute set is enumerated exhaustively in two authoritative docs — +`SPEC.md` §2 and `METRIC-CONVENTIONS.md` §4 — and **neither lists `service.instance.id`**; +the shared contract (`ZB.MOM.WW.Telemetry.md`) likewise documents `ZbResource.Build` as +populating only `service.name/namespace/version/site.id/node.role/host.name` and does not +mention an `InstanceId` member. The attribute itself is a reasonable, standards-aligned +improvement (and disabling the OTel SDK's random-GUID default is sensible for cross-signal +correlation), but it is a silent divergence: the spec/contract are now stale relative to the +code. Per REVIEW-PROCESS §2.7, both directions of drift must be flagged. + +**Recommendation** + +Add `service.instance.id` (with the `MachineName:ProcessId` rationale) to the Resource table +in `SPEC.md` §2 and `METRIC-CONVENTIONS.md` §4, and document the public `ZbResource.InstanceId` +member in the shared contract, so the normalized spec and the code agree. + +**Resolution** + +_Unresolved._ + +### Telemetry-005 — Two hand-maintained Resource-attribute builders can silently drift + +| | | +|--|--| +| Severity | Low | +| Category | Spec & shared-contract adherence | +| Status | Open | +| Location | `ZB.MOM.WW.Telemetry/src/ZB.MOM.WW.Telemetry/ZbResource.cs:38-64`, `ZB.MOM.WW.Telemetry/src/ZB.MOM.WW.Telemetry.Serilog/ZbSerilogConfig.cs:125-151` | + +**Description** + +The Resource attached to metrics/traces is built by `ZbResource.Configure` (via the OTel +`AddService` + `AddAttributes` API), while the Resource attached to the OTLP *log* sink is +built independently by `ZbSerilogConfig.BuildResourceAttributes` (a hand-rolled +`Dictionary`). The two currently agree, but they enumerate the same six/seven +attributes in two places with two different mechanisms, so a future change to one (a new +attribute, a renamed key, a changed omission rule) will silently desynchronize logs from +metrics/traces and break the cross-signal correlation the library's whole "unifying hinge" +depends on. There is no test asserting parity between the two attribute sets. + +**Recommendation** + +Derive both from a single source of truth — e.g. have `ZbResource` expose the canonical +attribute map (already mostly the shape `BuildResourceAttributes` returns) and have the +Serilog sink consume it — or add a parity test that asserts the two attribute sets are +key-for-key identical for a representative options object. + +**Resolution** + +_Unresolved._ + +### Telemetry-006 — Malformed `OtlpEndpoint` throws `UriFormatException` late, with no context + +| | | +|--|--| +| Severity | Low | +| Category | Error handling & resilience | +| Status | Open | +| Location | `ZB.MOM.WW.Telemetry/src/ZB.MOM.WW.Telemetry/ZbTelemetryExtensions.cs:127-135` | + +**Description** + +`ConfigureOtlp` does `otlp.Endpoint = new Uri(options.OtlpEndpoint)` with no validation. A +malformed endpoint string (typo, missing scheme) throws a raw `UriFormatException` deep +inside OTel exporter construction at host-build time, with no mention of which option was at +fault. `BuildOptions` already fails fast and clearly for a missing `ServiceName`, but does +not validate that `OtlpEndpoint` is a well-formed absolute URI when `Exporter == Otlp` (nor +that it is non-empty — an Otlp exporter with a null endpoint is silently registered and +points nowhere). The Serilog path (`ZbSerilogConfig`) has the same untyped string→endpoint +handoff. + +**Recommendation** + +In `BuildOptions`, when `Exporter == ZbExporter.Otlp`, validate `OtlpEndpoint` with +`Uri.TryCreate(..., UriKind.Absolute, out _)` and throw an `ArgumentException` naming the +option (consistent with the existing `ServiceName` guard) rather than letting a bare +`UriFormatException` escape later. + +**Resolution** + +_Unresolved._ + +### Telemetry-007 — Redaction snapshot allocates a dictionary on every log event + +| | | +|--|--| +| Severity | Low | +| Category | Performance & resource management | +| Status | Open | +| Location | `ZB.MOM.WW.Telemetry/src/ZB.MOM.WW.Telemetry.Serilog/RedactionEnricher.cs:49-67` | + +**Description** + +When an `ILogRedactor` is registered, `Enrich` allocates a new +`Dictionary(logEvent.Properties.Count)`, copies every property into it, and +then iterates again to diff/write-back — on every single log event, across every logging +thread. Enrichers are on the hottest path in the library (they run for each event the level +filter admits). The early-return when no redactor is registered keeps the common case free, +so the cost is borne only by redaction-enabled consumers (MxGateway), but for a high-volume +gateway this is non-trivial steady-state allocation/GC pressure. + +**Recommendation** + +Consider redacting in place against `logEvent.Properties` without a full snapshot copy (e.g. +only materialize replacements for keys the redactor touches), or short-circuit when the event +has no properties. At minimum, document the per-event cost so consumers can weigh enabling +redaction on very hot loggers. Acceptable as-is given redaction is opt-in and security-first. + +**Resolution** + +_Unresolved._ + +### Telemetry-008 — `MapZbMetrics` XML doc claims it is "only valid when Exporter = Prometheus" — stale + +| | | +|--|--| +| Severity | Low | +| Category | Documentation & XML docs | +| Status | Open | +| Location | `ZB.MOM.WW.Telemetry/src/ZB.MOM.WW.Telemetry/ZbMetricsEndpointExtensions.cs:11-14` | + +**Description** + +The XML doc on `MapZbMetrics` states it is "Only valid when +`ZbTelemetryOptions.Exporter = ZbExporter.Prometheus`." That contradicts the actual wiring: +`ApplyMetricsExporter` (`ZbTelemetryExtensions.cs:107-116`) **always** calls +`AddPrometheusExporter()` regardless of the `Exporter` setting — OTLP is purely additive. +The library's own README ("Prometheus is **always wired** for metrics regardless of the +`Exporter` setting") and the test `AddZbTelemetry_OtlpExporter_StillServesPrometheusEndpoint` +both confirm `/metrics` works under `Exporter = Otlp`. The doc-comment therefore tells +consumers the opposite of the real (and intended) behaviour and could lead them to wrongly +believe `MapZbMetrics` is a no-op under OTLP. The same stale wording is mirrored in the +shared contract (`ZB.MOM.WW.Telemetry.md`, `MapZbMetrics` summary). + +**Recommendation** + +Update the doc-comment to state that the Prometheus exporter is always registered and +`MapZbMetrics` is valid under any `Exporter` value (Prometheus is always-on; OTLP is an +overlay). Align the shared-contract summary for `MapZbMetrics` to match. + +**Resolution** + +_Unresolved._ diff --git a/code-reviews/Theme/findings.md b/code-reviews/Theme/findings.md index 7764e40..8193ca3 100644 --- a/code-reviews/Theme/findings.md +++ b/code-reviews/Theme/findings.md @@ -6,31 +6,237 @@ | Packages | `ZB.MOM.WW.Theme` (Razor Class Library) | | Component spec | `components/ui-theme/spec/SPEC.md` | | Shared contract | `components/ui-theme/shared-contract/ZB.MOM.WW.Theme.md` | -| Status | Not yet reviewed | -| Last reviewed | — | -| Reviewer | — | -| Commit reviewed | — | -| Open findings | 0 | +| Status | Reviewed | +| Last reviewed | 2026-06-01 | +| Reviewer | Claude (automated baseline) | +| Commit reviewed | `5f75cd4` | +| Open findings | 6 | ## Summary -_Not yet reviewed._ +`ZB.MOM.WW.Theme` is a small, clean, well-scoped Blazor Razor Class Library: ten flat-namespace +SSR components, two stylesheets (`theme.css` 379 lines, `layout.css` 191 lines), three vendored +IBM Plex `.woff2` fonts, and two public enums. The component C# is correct and minimal — stateless +render logic, no concurrency surface, no secret handling, sensible `[Parameter]`/`EditorRequired` +choices, and `TreatWarningsAsErrors`. The 32 bUnit tests (20 `[Fact]` + 12 `[Theory]` cases) exercise +the public render contract of every component plus a static-asset check on the corrected font path. +The standout security-sensitive area — `LoginCard` (open-redirect + antiforgery) — is handled +correctly by delegating both concerns to the consumer with explicit XML docs and in-markup security +notes, matching the spec. No findings are Critical or High. The findings are concentrated in two +themes: (1) one functional gap — the mobile hamburger relies on Bootstrap's collapse JS, an +undocumented runtime dependency that contradicts the "no JavaScript" framing; and (2) CSS / token +hygiene — a token-fidelity drift on `.chip-idle`, hardcoded hex values that violate the spec's +"no hardcoded hex" rule, an undefined `.rail-ico` class, orphan/dead nav CSS, and a near-total +absence of XML docs on the public parameter surface. ## Checklist coverage | # | Category | Examined | Notes | |---|----------|----------|-------| -| 1 | Correctness & logic bugs | ☐ | | -| 2 | Public API surface & compatibility | ☐ | | -| 3 | Concurrency & thread safety | ☐ | | -| 4 | Error handling & resilience | ☐ | | -| 5 | Security & secret handling | ☐ | | -| 6 | Performance & resource management | ☐ | | -| 7 | Spec & shared-contract adherence | ☐ | | -| 8 | Packaging, dependencies & project layout | ☐ | | -| 9 | Testing coverage | ☐ | | -| 10 | Documentation & XML docs | ☐ | | +| 1 | Correctness & logic bugs | ☑ | Component render logic correct; `open="@Expanded"` boolean-attribute binding behaves (tests confirm); chip/variant `switch` mappings complete. No issues found. | +| 2 | Public API surface & compatibility | ☑ | Flat `ZB.MOM.WW.Theme` namespace; minimal, intentional; `EditorRequired` applied; nullable annotations correct; no internal types leak. Surface matches contract. No issues found. | +| 3 | Concurrency & thread safety | ☑ | Stateless components, no shared mutable state, no `async`/statics/singletons. No issues found (UI kit). | +| 4 | Error handling & resilience | ☑ | No throwing paths; required params default to `string.Empty`; null `RenderFragment` slots are guarded (`is not null` / `IsNullOrEmpty`). No issues found. | +| 5 | Security & secret handling | ☑ | No secrets/PII. `LoginCard` open-redirect + CSRF correctly delegated to consumer with docs. `Accent` interpolates into a `style` attr but is developer-supplied (not untrusted). No issues found. | +| 6 | Performance & resource management | ☑ | No `IDisposable`, no allocations of note, fonts vendored once. No issues found. | +| 7 | Spec & shared-contract adherence | ☑ | Component/parameter API matches SPEC §4 and the shared contract. Drifts found: `.chip-idle` foreground (Theme-002) and "no hardcoded hex" rule (Theme-003). | +| 8 | Packaging, dependencies & project layout | ☑ | `net10.0` RCL, single package id `ZB.MOM.WW.Theme`, `FrameworkReference` only, version `0.1.0` (inherited from `Directory.Build.props`), static assets under `_content/`. No issues found. | +| 9 | Testing coverage | ☑ | 32 tests cover every component's render contract + corrected font path. CSS-class wiring (e.g. `.rail-ico`, orphan nav CSS) is not asserted, which is how Theme-004/005 slipped through. | +| 10 | Documentation & XML docs | ☑ | README/contract accurate. XML docs present only on `LoginCard`; absent on the other 9 components, all their parameters, and both enums (Theme-006). | ## Findings -_No findings recorded yet._ +### Theme-001 — Mobile hamburger toggle silently depends on Bootstrap collapse JS + +| | | +|--|--| +| Severity | Medium | +| Category | Spec & shared-contract adherence | +| Status | Open | +| Location | `ZB.MOM.WW.Theme/src/ZB.MOM.WW.Theme/Components/ThemeShell.razor:5` | + +**Description** + +`ThemeShell` renders the narrow-viewport navigation toggle as a Bootstrap collapse trigger: +`