Baseline code review of the six ZB.MOM.WW.* shared libraries

All six libraries reviewed at commit 5f75cd4 against their components/ specs,
following code-reviews/REVIEW-PROCESS.md. 35 findings (0 Critical, 1 High,
9 Medium, 25 Low); none block adoption.

- Auth      0/0/3/3  (security core sound; startup-validation + key-verify contract gaps)
- Telemetry 0/1/2/5  (HIGH Telemetry-001: redactor 'remove' is a no-op -> secrets reach sinks)
- Health    0/0/2/4  (Akka checks throw instead of Degraded when cluster not yet up)
- Theme     0/0/1/5  (undocumented Bootstrap-collapse JS dep; token/CSS hygiene)
- Audit     0/0/1/4  (composite re-throws OCE vs never-throw writer contract)
- Configuration 0/0/0/4 (DI idempotency, port-parse strictness, packaging)

Cross-cutting: XML docs authored but GenerateDocumentationFile unset -> docs
not shipped in any nupkg (Auth/Health/Telemetry/Configuration/Audit).

README.md regenerated from the per-library findings; regen-readme.py --check passes.
This commit is contained in:
Joseph Doherty
2026-06-01 11:08:12 -04:00
parent 5f75cd4dab
commit 26ba1c7215
7 changed files with 1422 additions and 119 deletions
+208 -17
View File
@@ -6,31 +6,222 @@
| Packages | `ZB.MOM.WW.Audit` |
| Component spec | `components/audit/spec/SPEC.md` |
| Shared contract | `components/audit/shared-contract/ZB.MOM.WW.Audit.md` |
| Status | Not yet reviewed |
| Last reviewed | |
| Reviewer | |
| Commit reviewed | |
| Open findings | 0 |
| Status | Reviewed |
| Last reviewed | 2026-06-01 |
| Reviewer | Claude (automated baseline) |
| Commit reviewed | `5f75cd4` |
| Open findings | 5 |
## Summary
_Not yet reviewed._
`ZB.MOM.WW.Audit` is a small, clean, well-scoped library: one canonical `AuditEvent`
sealed record, a 3-value `AuditOutcome` enum, two narrow seams (`IAuditWriter`,
`IAuditRedactor`), five shipped helpers, and a single `AddZbAudit` registration. The record
matches `EVENT-MODEL.md` field-for-field (names, types, required/optional split, and the
`OccurredAtUtc` UTC-normalization contract — including value-equality on the normalized
instant). The dependency footprint is exactly as specified: the only non-BCL reference is
`Microsoft.Extensions.DependencyInjection.Abstractions` (verified against the built
`deps.json`), with no Akka/EF/SQLite/Serilog/Telemetry leakage. Guard clauses are present on
every public entry point, `RedactingAuditWriter` guarantees redaction runs before the inner
writer (no unredacted-PII leak), and `CompositeAuditWriter` isolates per-writer failures so a
throwing writer does not lose downstream events. 19 tests cover the happy paths and the most
important contract guards. No Critical or High findings. The five findings are all Medium/Low
and concern: a cancellation-propagation tension with the "never throw to the caller" writer
contract (Audit-001), a partial over-redaction path in `TruncatingAuditRedactor` (Audit-002),
mutable options on a singleton redactor (Audit-003), un-emitted XML docs (Audit-004, a
family-wide baseline pattern), and a couple of missing edge-case tests (Audit-005).
## Checklist coverage
| # | Category | Examined | Notes |
|---|----------|----------|-------|
| 1 | Correctness & logic bugs | | |
| 2 | Public API surface & compatibility | | |
| 3 | Concurrency & thread safety | | |
| 4 | Error handling & resilience | | |
| 5 | Security & secret handling | | |
| 6 | Performance & resource management | | |
| 7 | Spec & shared-contract adherence | | |
| 8 | Packaging, dependencies & project layout | | |
| 9 | Testing coverage | | |
| 10 | Documentation & XML docs | | |
| 1 | Correctness & logic bugs | | `Truncate` boundary logic correct for sane configs; marker-longer-than-max clips safely. Partial over-redaction edge under negative max → Audit-002. |
| 2 | Public API surface & compatibility | | Minimal, intentional surface; `sealed` records/classes; nullable annotations correct; no internal-type leakage. Options type is a mutable class, not the record the contract describes → Audit-003. |
| 3 | Concurrency & thread safety | | `AuditEvent` immutable (readonly backing field); singletons (`NoOp`/`Null`) stateless; composite snapshots its writers via `ToArray()`. Mutable singleton options noted in Audit-003. |
| 4 | Error handling & resilience | | Guard clauses on all entry points; composite swallows writer faults. OCE re-throw vs writer contract → Audit-001; partial over-redact → Audit-002. |
| 5 | Security & secret handling | | `RedactingAuditWriter` applies the redactor strictly before the inner writer — no unredacted-event leak. `Actor` kept a plain string (no Auth coupling), as specified. No secrets in code/messages. |
| 6 | Performance & resource management | | No `IDisposable` to manage; no hot-path allocations beyond a single `with`-copy per redact; DI registration is trivial. No issues found. |
| 7 | Spec & shared-contract adherence | | `AuditEvent` + `AuditOutcome` match `EVENT-MODEL.md` exactly; `AddZbAudit` uses `TryAdd` as specified. OCE re-throw deviates from the §1 writer hard rule → Audit-001. |
| 8 | Packaging, dependencies & project layout | | Sole non-BCL dep confirmed via `deps.json`; net10.0; correct PackageId/version (0.1.0). XML docs not packed → Audit-004. |
| 9 | Testing coverage | | 19 tests; public contract exercised (round-trip, UTC normalization, fan-out, cancellation, redaction ordering). Missing never-throw/over-redact and null-writer cases → Audit-005. |
| 10 | Documentation & XML docs | | XML docs present and accurate on the whole public surface; README accurate. Docs not emitted to a file → Audit-004. |
## Findings
_No findings recorded yet._
### Audit-001 — `CompositeAuditWriter` re-throws `OperationCanceledException` to the caller, contradicting the "must not throw" writer contract
| | |
|--|--|
| Severity | Medium |
| Category | Error handling & resilience |
| Status | Open |
| Location | `ZB.MOM.WW.Audit/src/ZB.MOM.WW.Audit/CompositeAuditWriter.cs:24` |
**Description**
`IAuditWriter`'s hard contract is unambiguous: "Best-effort; must not throw to the caller"
(`IAuditWriter.cs:11`), and SPEC §1 states "`WriteAsync` MUST NOT throw to the caller … a
failed write must never abort the user-facing action it is recording." The shared contract
further says a cancellation "does not constitute a contract violation" and the implementation
"may choose to complete a partially-written event anyway" — it permits ignoring the token, not
throwing on it.
`CompositeAuditWriter.WriteAsync` re-throws `OperationCanceledException`
(`catch (OperationCanceledException) { throw; }`, line 24). The very common call shape is to
pass the ambient request token, e.g. `writer.WriteAsync(evt, httpContext.RequestAborted)`. When
a client disconnects mid-request, the inner writer observes the cancelled token, throws OCE, and
the composite propagates it straight into the user-facing action — exactly the abort the seam
promises never to cause. This also directly contradicts the helper's own XML doc, which claims
"the fan-out drains and the caller is never aborted" (`CompositeAuditWriter.cs:4`). Because the
composite is the recommended way to wire multiple sinks, every consumer that uses it inherits
this behaviour; the only workaround is to not pass a request-scoped token, which is non-obvious.
Note this is a deliberate, tested choice (`CompositeAuditWriterTests.cs:41`), so it is a
design/contract conflict rather than an oversight — but the conflict is real: the writer contract
says "never throw," the composite says "drains, never aborts," and the code throws on
cancellation.
**Recommendation**
Reconcile the contract and the implementation. Either (a) honour the "never throw" rule
literally — swallow OCE like any other writer failure (drop the remaining writers or continue,
but do not propagate), and update the XML doc accordingly; or (b) if cancellation propagation is
genuinely wanted, amend the `IAuditWriter` SPEC/shared-contract to carve out OCE explicitly and
add a doc warning that consumers must not pass a request-scoped token they cannot afford to see
surface. Option (a) is the lower-blast-radius choice and matches the seam's stated intent.
**Resolution**
_Unresolved._
### Audit-002 — `TruncatingAuditRedactor` over-redaction is partial: the catch path scrubs only `DetailsJson`, leaving `Target` unredacted
| | |
|--|--|
| Severity | Low |
| Category | Error handling & resilience |
| Status | Open |
| Location | `ZB.MOM.WW.Audit/src/ZB.MOM.WW.Audit/TruncatingAuditRedactor.cs:27-31`, `:34-40` |
**Description**
The redactor's over-redact fallback returns `rawEvent with { DetailsJson = null }`
(line 30). That scrubs only `DetailsJson``Target`, which this redactor is also responsible
for capping, is carried through untouched. The spec's "over-redact" rule is "returns a strictly
safer event" (SPEC §2); leaving an oversized/sensitive `Target` in place is not strictly safer.
This becomes reachable under misconfiguration. `Truncate(value, max)` does `marker[..max]` when
`marker.Length >= max` (line 38). For a negative `MaxTargetLength` or `MaxDetailsJsonLength`,
that slice throws `ArgumentOutOfRangeException`, the `Apply` catch fires, and the result keeps
the **original, untruncated `Target`** while only nulling `DetailsJson`. So a config bug that
should fail safe instead leaves the `Target` field unredacted. (`max == 0` is handled correctly —
`marker[..0]` is `""`.) The redactor never throws, so the "must not throw" half of the contract
holds; it is the "strictly safer" half that is violated.
**Recommendation**
Make the fallback strictly safer across the fields this redactor owns, e.g.
`return rawEvent with { DetailsJson = null, Target = null };` in the catch. Additionally, guard
against nonsensical caps at the boundary — clamp `max` to `>= 0` in `Truncate` (treat negative
as 0), or validate the options in the constructor — so the catch path is only ever hit by truly
unexpected failures rather than a predictable negative-length misconfiguration.
**Resolution**
_Unresolved._
### Audit-003 — `TruncatingAuditRedactorOptions` is a mutable class, not the immutable "options record" the contract describes
| | |
|--|--|
| Severity | Low |
| Category | Public API surface & compatibility |
| Status | Open |
| Location | `ZB.MOM.WW.Audit/src/ZB.MOM.WW.Audit/TruncatingAuditRedactorOptions.cs:4-12` |
**Description**
The shared contract lists `TruncatingAuditRedactorOptions` as an "Options record"
(`shared-contract/ZB.MOM.WW.Audit.md:93`), but it is implemented as a `sealed class` with three
mutable `get; set;` auto-properties. `TruncatingAuditRedactor` captures the instance by reference
(`_options = options ?? new(...)`, `TruncatingAuditRedactor.cs:14`) and the redactor is typically
registered as a singleton. Because the options object stays mutable, a caller (or any other
holder of the same reference) can change `MaxDetailsJsonLength` / `MaxTargetLength` /
`TruncationMarker` after the redactor is built, changing redaction behaviour at runtime under a
concurrent host with no synchronization. This is a (minor) thread-safety footgun and a drift from
both the documented "record" shape and the library's otherwise consistent immutability posture
(`AuditEvent` is a record with init-only members).
**Recommendation**
Make the options immutable: convert to a `sealed record` with `init`-only properties (matching
the contract wording), or have the redactor defensively snapshot the three values into private
readonly fields at construction so post-construction mutation cannot affect it.
**Resolution**
_Unresolved._
### Audit-004 — XML documentation is authored but not emitted, so IntelliSense docs do not ship to consumers
| | |
|--|--|
| Severity | Low |
| Category | Documentation & XML docs |
| Status | Open |
| Location | `ZB.MOM.WW.Audit/src/ZB.MOM.WW.Audit/ZB.MOM.WW.Audit.csproj:1-18`, `ZB.MOM.WW.Audit/Directory.Build.props:1-10` |
**Description**
Every public type and member carries accurate XML doc comments, but neither the project nor
`Directory.Build.props` sets `GenerateDocumentationFile`. As a result no `ZB.MOM.WW.Audit.xml`
is produced or packed alongside the DLL (confirmed: no XML doc file in `bin/`), so consuming
apps get no IntelliSense/tooltip documentation from the package. For a shared library where, per
the review process, "public docs matter," the authored docs are effectively invisible to
consumers. (This is a family-wide baseline gap — none of the sibling `ZB.MOM.WW.*` libraries set
the flag either — so it is a baseline issue rather than an Audit-specific regression, hence Low.)
**Recommendation**
Set `<GenerateDocumentationFile>true</GenerateDocumentationFile>` (ideally once in the shared
`Directory.Build.props` so it applies across the family). Because the public surface is fully
documented, no CS1591 "missing XML comment" warnings should appear.
**Resolution**
_Unresolved._
### Audit-005 — Missing edge-case tests for the redactor never-throw/over-redact contract and composite null/empty handling
| | |
|--|--|
| Severity | Low |
| Category | Testing coverage |
| Status | Open |
| Location | `ZB.MOM.WW.Audit/tests/ZB.MOM.WW.Audit.Tests/TruncatingAuditRedactorTests.cs`, `.../CompositeAuditWriterTests.cs` |
**Description**
The 19 tests cover the happy paths and the key guards (UTC normalization + equality, fan-out,
cancellation propagation, redaction ordering, marker-longer-than-max). Two contract-critical
behaviours are untested:
1. The `IAuditRedactor` "never throws / over-redacts on failure" contract — no test drives
`TruncatingAuditRedactor.Apply` into its `catch` branch, so the over-redaction fallback
(and the asymmetry flagged in Audit-002) is unverified.
2. `CompositeAuditWriter` boundary inputs — no test covers an empty writer list (should be a
no-op) or a list containing a `null` writer (currently swallowed via the bare `catch`,
silently dropping that sink — behaviour that should be pinned by a test).
Pinning these would have surfaced Audit-002 and would lock the documented never-throw guarantee.
**Recommendation**
Add: (a) a redactor test that forces an internal failure (e.g. a misconfigured cap) and asserts
`Apply` returns a strictly-safer event without throwing; (b) a composite test for the empty-list
no-op; and (c) a composite test documenting the null-entry behaviour (whether swallowed or
guarded — pair with the chosen fix for the null case).
**Resolution**
_Unresolved._
+235 -17
View File
@@ -6,31 +6,249 @@
| Packages | `ZB.MOM.WW.Auth.Abstractions`, `ZB.MOM.WW.Auth.Ldap`, `ZB.MOM.WW.Auth.ApiKeys`, `ZB.MOM.WW.Auth.AspNetCore` |
| Component spec | `components/auth/spec/SPEC.md` |
| Shared contract | `components/auth/shared-contract/ZB.MOM.WW.Auth.md` |
| Status | Not yet reviewed |
| Last reviewed | |
| Reviewer | |
| Commit reviewed | |
| Open findings | 0 |
| Status | Reviewed |
| Last reviewed | 2026-06-01 |
| Reviewer | Claude (automated baseline) |
| Commit reviewed | `5f75cd4` |
| Open findings | 6 |
## Summary
_Not yet reviewed._
The security-critical core of this library is sound: API-key secrets are 32-byte CSPRNG values,
hashing is HMAC-SHA256 keyed by an external pepper, verification uses
`CryptographicOperations.FixedTimeEquals`, the SQLite stores use parameterized SQL throughout,
LDAP filter values are RFC 4515-escaped before being interpolated into the search filter, group-DN
parsing is RFC 4514 escape-aware, and the LDAP service is genuinely fail-closed (never throws,
requires exactly one match plus a verified password plus >=1 group for success). Test coverage is
strong and contract-focused (~172 tests; every failure mode, the parser, the hasher, the escaper,
the stores, the migrator and the DI wiring are exercised). The findings below are about
robustness-around-the-edges and documentation, not the cryptographic core. The two themes are
(1) two narrow paths that can turn a recoverable condition into a thrown exception on the
authentication hot path, contradicting the library's own "only exception path is cancellation" /
"fail fast at startup" promises, and (2) public-facing documentation (README) that misstates the
hashing algorithm and the AspNetCore surface. The library is **not yet adopted** by the three apps
(tracked in `components/auth/GAPS.md`).
## Checklist coverage
| # | Category | Examined | Notes |
|---|----------|----------|-------|
| 1 | Correctness & logic bugs | | |
| 2 | Public API surface & compatibility | | |
| 3 | Concurrency & thread safety | | |
| 4 | Error handling & resilience | | |
| 5 | Security & secret handling | | |
| 6 | Performance & resource management | | |
| 7 | Spec & shared-contract adherence | | |
| 8 | Packaging, dependencies & project layout | | |
| 9 | Testing coverage | | |
| 10 | Documentation & XML docs | | |
| 1 | Correctness & logic bugs | | Core logic correct; see Auth-003 (JSON parse can throw on hot path), Auth-005 (KeyPrefix value inconsistency) |
| 2 | Public API surface & compatibility | | Surface is minimal, sealed records, correct nullable annotations, internal seams hidden via `InternalsVisibleTo`. No issues found |
| 3 | Concurrency & thread safety | | Options/records immutable; services stateless singletons; fresh SQLite connection per op with WAL + busy_timeout. No issues found |
| 4 | Error handling & resilience | | LDAP path fully fail-closed; but see Auth-001 (no `ValidateOnStart`), Auth-002 (`MarkUsedAsync` failure fails a valid auth), Auth-003 |
| 5 | Security & secret handling | | Peppered HMAC-SHA256 + FixedTimeEquals, CSPRNG secrets, parameterized SQL, LDAP escaping, hash-free list projection, secrets never logged. See Auth-006 (TLS cert-validation hook) |
| 6 | Performance & resource management | | Connections disposed via `await using`; `LdapConnection` disposed; no hot-path allocs of note. No issues found |
| 7 | Spec & shared-contract adherence | | Matches SPEC §1–§5 closely (HMAC-SHA256 pepper, constant-time compare, discriminated failures, cookie hardening). See Auth-001 (insecure-refusal deferred to first use) |
| 8 | Packaging, dependencies & project layout | | Clean 4-package split, central versions, minimal deps, Abstractions has no runtime deps. No issues found |
| 9 | Testing coverage | | Broad and contract-level; gaps: no test for corrupt-`scopes` deserialization (Auth-003) or `MarkUsedAsync`-throws (Auth-002); `ValidateOnStart` behavior untested (Auth-001) |
| 10 | Documentation & XML docs | | XML docs thorough and accurate; README is stale — see Auth-004 |
## Findings
_No findings recorded yet._
### Auth-001 — LDAP options validator is registered but never runs at startup
| | |
|--|--|
| Severity | Medium |
| Category | Error handling & resilience |
| Status | Open |
| Location | `ZB.MOM.WW.Auth/src/ZB.MOM.WW.Auth.AspNetCore/ServiceCollectionExtensions.cs:40` |
**Description**
`AddZbLdapAuth` binds `LdapOptions` with `services.Configure<LdapOptions>(...)` and registers the
validator with a plain `services.AddSingleton<IValidateOptions<LdapOptions>, LdapOptionsValidator>()`.
It never calls `OptionsBuilder.ValidateOnStart()`. With `Microsoft.Extensions.Options`, an
`IValidateOptions<T>` runs only when the options instance is first materialized (`IOptions<T>.Value`),
not at host startup. Here the only consumer that materializes the options is the `ILdapAuthService`
singleton factory (line 48), whose lambda runs on first resolution of `ILdapAuthService` — i.e. on
the first login attempt, not at boot.
The result contradicts the explicit promise in both the method comment ("Fail fast at startup on a
misconfigured directory rather than on first login") and the validator's own XML doc ("Validates
`LdapOptions` at startup so a misconfiguration fails fast at boot"). It also weakens SPEC §2 step 1,
which says an insecure transport without `AllowInsecure` must be *refused* — that refusal is silently
deferred until a user tries to log in. A process can therefore boot green with `Transport=None` and
no `AllowInsecure`, and only surface the misconfiguration as a failed login later. Because this is a
shared library consumed by three hosts, the missing guard affects all of them.
**Recommendation**
Register options via the builder and add `ValidateOnStart`:
```csharp
services.AddOptions<LdapOptions>()
.Bind(config.GetSection(sectionPath))
.ValidateOnStart();
services.AddSingleton<IValidateOptions<LdapOptions>, LdapOptionsValidator>();
```
(`ValidateOnStart` relies on the host's start-time options validation, which the ASP.NET Core hosts
already run.) Add a test asserting that building + starting the host fails for an insecure config.
**Resolution**
_Unresolved._
### Auth-002 — A failed `MarkUsedAsync` write turns a valid API key into a thrown exception
| | |
|--|--|
| Severity | Medium |
| Category | Error handling & resilience |
| Status | Open |
| Location | `ZB.MOM.WW.Auth/src/ZB.MOM.WW.Auth.ApiKeys/ApiKeyVerifier.cs:66` |
**Description**
After a successful constant-time secret match, `VerifyAsync` does
`await store.MarkUsedAsync(record.KeyId, _timeProvider.GetUtcNow(), ct)` before returning the
success result. `MarkUsedAsync` is a best-effort "last used" bookkeeping write. If that write fails
(SQLite still `SQLITE_BUSY` after the 5 s busy-timeout, disk full, DB locked by a long migration,
etc.), the exception propagates out of `VerifyAsync` and the otherwise-valid credential fails
authentication. The verifier's own class remark states "The only exception path is cancellation",
so callers (gRPC interceptors / middleware) will not be wrapping this in a try/catch and a transient
storage hiccup becomes an auth outage for a legitimate caller.
**Recommendation**
Treat the usage write as best-effort: either swap the order (return success, then fire-and-record),
or wrap the `MarkUsedAsync` call so a failure is logged/swallowed rather than failing the
verification — the authentication decision has already been made by line 60. If `MarkUsedAsync` is
allowed to throw, update the XML remark to say so. Add a test where the store's `MarkUsedAsync`
throws and assert the result is still `Succeeded == true`.
**Resolution**
_Unresolved._
### Auth-003 — Corrupt `scopes`/`constraints` column throws `JsonException` through the verifier
| | |
|--|--|
| Severity | Medium |
| Category | Correctness & logic bugs |
| Status | Open |
| Location | `ZB.MOM.WW.Auth/src/ZB.MOM.WW.Auth.ApiKeys/Sqlite/ScopeSerializer.cs:31`, `ZB.MOM.WW.Auth/src/ZB.MOM.WW.Auth.ApiKeys/Sqlite/SqliteApiKeyStore.cs:71` |
**Description**
`SqliteApiKeyStore.ReadRecord` calls `ScopeSerializer.Deserialize(reader.GetString(4))`, which does
`JsonSerializer.Deserialize<string[]>(value)`. If the `scopes` column ever holds non-array or
malformed JSON (operator tampering, a partial write, a future schema/format change, or a bug in
another writer), `Deserialize` throws `JsonException`. That exception propagates out of
`FindByKeyIdAsync`, and therefore out of `ApiKeyVerifier.VerifyAsync`, as an unhandled exception —
again contradicting the "only exception path is cancellation" contract and converting a single
poisoned row into a crash on the auth path rather than a structured fail-closed result. The same
applies to the admin `ListAsync` projection.
**Recommendation**
Decide the policy explicitly. Either (a) catch `JsonException` in `ScopeSerializer.Deserialize` and
treat a malformed column as an empty/invalid scope set so verification fails closed structurally, or
(b) catch it at the store boundary and surface it as a controlled error. Add a test that inserts a
row with a malformed `scopes` value and asserts the verifier returns a structured failure (or a
documented exception) rather than an unhandled `JsonException`.
**Resolution**
_Unresolved._
### Auth-004 — README misstates the hashing algorithm and the AspNetCore public surface
| | |
|--|--|
| Severity | Low |
| Category | Documentation & XML docs |
| Status | Open |
| Location | `ZB.MOM.WW.Auth/README.md:13`, `ZB.MOM.WW.Auth/README.md:14` |
**Description**
The README's package table contains two inaccuracies that matter for a security library:
1. It describes `ZB.MOM.WW.Auth.ApiKeys` as a store "with pepper-based **PBKDF2** hashing". The code
uses **HMAC-SHA256** keyed by the pepper (`ApiKeySecretHasher.Hash``new HMACSHA256(pepperBytes)`),
not PBKDF2. This is the correct algorithm per SPEC §4 ("HMAC-SHA256 with an external pepper"), so
the *code* is right and the *README* is wrong — but a reader auditing the crypto will be misled
about iteration count / KDF properties that do not exist here.
2. It says `ZB.MOM.WW.Auth.AspNetCore` provides "ASP.NET Core DI helpers (`AddZbAuth`) ... and
`LdapOptionsValidator` registration. **Wires together Ldap + ApiKeys + cookie middleware**." The
actual public surface is a single extension method `AddZbLdapAuth` (LDAP only). There is no
`AddZbAuth`, no API-key re-export from this package, and no cookie-middleware wiring —
`ZbCookieDefaults.Apply` is a helper the consumer must call itself, and API-key DI lives in the
separate `ZB.MOM.WW.Auth.ApiKeys` package (`AddZbApiKeyAuth`). The shared contract
(`ZB.MOM.WW.Auth.md`) likewise lists a combined DI shape that the code splits differently.
**Recommendation**
Fix the README table: change "PBKDF2" to "HMAC-SHA256", and correct the AspNetCore row to describe
the real surface (`AddZbLdapAuth`, `ZbCookieDefaults.Apply`, `ZbClaimTypes`) and note that API-key
wiring is `AddZbApiKeyAuth` in the ApiKeys package. Optionally reconcile the shared-contract doc's
DI-method names with the shipped names.
**Resolution**
_Unresolved._
### Auth-005 — `CreateKeyAsync` persists `KeyPrefix` as `prefix_keyId`, inconsistent with the read path
| | |
|--|--|
| Severity | Low |
| Category | Correctness & logic bugs |
| Status | Open |
| Location | `ZB.MOM.WW.Auth/src/ZB.MOM.WW.Auth.ApiKeys/Admin/ApiKeyAdminCommands.cs:104` |
**Description**
When creating a key, `KeyPrefix` is set to `$"{_options.TokenPrefix}_{keyId}"` (e.g. `mxgw_key-1`),
embedding the key id into the column. Elsewhere `KeyPrefix` is treated as the bare token prefix:
the verifier test (`ApiKeyVerifierTests.BuildRecord`) and the store test
(`SqliteApiKeyStoreTests.SampleRecord`) seed it as `"mxgw"` / `"mxgw_ab12"`, and the `ApiKeyRecord`
contract gives no indication it should include the key id. `KeyPrefix` is not consulted during
verification (the token is re-parsed from the header), so there is no security impact — but the
persisted/displayed value is inconsistent and self-referential (the key id is already its own
column), which will confuse admin tooling that surfaces `KeyPrefix`.
**Recommendation**
Persist `KeyPrefix = _options.TokenPrefix` (the bare prefix) and let the key id remain the `KeyId`
column, or document precisely what `KeyPrefix` is meant to contain on the `ApiKeyRecord` /
`ApiKeyListItem` records and align all call sites to it.
**Resolution**
_Unresolved._
### Auth-006 — LDAP connection ignores `allowInsecure` and offers no TLS certificate-validation hook
| | |
|--|--|
| Severity | Low |
| Category | Security & secret handling |
| Status | Open |
| Location | `ZB.MOM.WW.Auth/src/ZB.MOM.WW.Auth.Ldap/Internal/NovellLdapConnection.cs:16` |
**Description**
`NovellLdapConnection.Connect` receives `bool allowInsecure` but never uses it; transport-security
correctness rests entirely on `LdapOptionsValidator` having run beforehand (and that registration is
itself lazy — see Auth-001). The method also sets `SecureSocketLayer`/`StartTls` but never configures
a server-certificate validation delegate or any pinning/CA hook. The underlying
`Novell.Directory.Ldap.NETStandard` 3.6.0 validates against the OS trust store by default (it does
*not* blind-accept), so this is not an unauthenticated-TLS hole today — hence Low — but the library
gives a consumer no seam to tighten (pin a CA, validate the SAN against `Server`, or deliberately
accept a self-signed dev cert), and the unused parameter reads as a guard that does nothing.
**Recommendation**
Either remove the unused `allowInsecure` parameter from the `ILdapConnection.Connect` seam or use it
to gate an explicit "accept invalid cert" path for dev only. Add an optional
`RemoteCertificateValidationCallback` (or CA-pinning option) so production consumers can harden LDAPS
beyond the OS default, and document the default validation behavior.
**Resolution**
_Unresolved._
+167 -17
View File
@@ -6,31 +6,181 @@
| Packages | `ZB.MOM.WW.Configuration` |
| Component spec | `components/configuration/spec/SPEC.md` |
| Shared contract | `components/configuration/shared-contract/ZB.MOM.WW.Configuration.md` |
| Status | Not yet reviewed |
| Last reviewed | |
| Reviewer | |
| Commit reviewed | |
| Open findings | 0 |
| Status | Reviewed |
| Last reviewed | 2026-06-01 |
| Reviewer | Claude (automated baseline) |
| Commit reviewed | `5f75cd4` |
| Open findings | 4 |
## Summary
_Not yet reviewed._
The library is small (five source files, ~230 LOC) and in good health. Its core promise —
**accumulate every failure, never short-circuit** — holds across all three entry points
(`ValidationBuilder`, `OptionsValidatorBase`, `ConfigPreflight`): the override never returns
early, each primitive appends to a `List<string>` and returns `this`, and the `Success`/`Fail`
decision is taken only after every rule has run. The boundary cases that matter are correct:
`Port` rejects 0 and 65536 and accepts 1/65535; `HostPort` correctly rejects bare hosts, empty
ports, out-of-range ports, and (deliberately) unbracketed IPv6; `Required` treats null / empty /
whitespace alike; `OneOf` is case-insensitive and treats null as a failure (documented);
`MinCount` treats null as zero. Null-guarding is consistent (`ArgumentNullException`/
`ArgumentException.ThrowIfNullOrWhiteSpace` on every public entry point). The
`ConfigPreflight.ThrowIfInvalid()` envelope is **byte-compatible** with the ScadaBridge
`StartupValidator` format the SPEC pins (`"Configuration validation failed:\n - <field> <reason>"`,
`\n`-joined), and the validator created by `OptionsValidatorBase` is stateless (fresh
`ValidationBuilder` per call), so the singleton-registration requirement in SPEC §3 is honoured.
Dependency footprint is minimal and exactly as documented (four `Microsoft.Extensions.*`
abstractions, no ASP.NET Core). The four findings are all **Low**: a non-idempotent DI
registration (`AddSingleton` vs `TryAddEnumerable`), a small wording inconsistency in the raw-port
check, over-permissive integer port parsing, and the XML-doc-not-shipped packaging gap (which is
fleet-wide, not specific to this library). Test coverage (27 tests across the four public surfaces)
is solid for the happy and primary-failure paths; a few edge cases noted below are untested.
## Checklist coverage
| # | Category | Examined | Notes |
|---|----------|----------|-------|
| 1 | Correctness & logic bugs | | |
| 2 | Public API surface & compatibility | | |
| 3 | Concurrency & thread safety | | |
| 4 | Error handling & resilience | | |
| 5 | Security & secret handling | | |
| 6 | Performance & resource management | | |
| 7 | Spec & shared-contract adherence | | |
| 8 | Packaging, dependencies & project layout | | |
| 9 | Testing coverage | | |
| 10 | Documentation & XML docs | | |
| 1 | Correctness & logic bugs | | Accumulation never short-circuits; port/host:port/duration/min-count boundaries correct. Port parsing over-permissive (003). |
| 2 | Public API surface & compatibility | | Surface matches the shared contract exactly; nullability annotations correct; `sealed`/`abstract` deliberate; internal `Checks` does not leak. No issues. |
| 3 | Concurrency & thread safety | | Validator is stateless (fresh builder per `Validate`); `ValidationBuilder`/`ConfigPreflight` are per-call locals. Safe as a singleton. No issues. |
| 4 | Error handling & resilience | | Guard clauses on every public entry; correct exception types (`OptionsValidationException` via `ValidateOnStart`, `InvalidOperationException` from `ConfigPreflight`); no swallowed exceptions. No issues. |
| 5 | Security & secret handling | | No secrets handled; failure messages echo config *values* (port/host/role), which are non-sensitive by design. No issues. |
| 6 | Performance & resource management | | Startup-only, no hot path, nothing disposable, no async. No issues. |
| 7 | Spec & shared-contract adherence | | `ConfigPreflight` envelope byte-compatible with `StartupValidator`; primitives match §2 table. Minor wording inconsistency in `PortValue` (002). |
| 8 | Packaging, dependencies & project layout | | Single package, minimal closure, central versions, `.gitignore` present, no tracked build output. XML docs / README not packaged (004). |
| 9 | Testing coverage | | 27 tests cover the four public surfaces + accumulation + byte-format. Untested: `AddValidatedOptions` null guards, double-registration, exact wording strings. |
| 10 | Documentation & XML docs | | Public XML docs present and accurate on every type/member. Not emitted into the nupkg (004). |
## Findings
_No findings recorded yet._
### Configuration-001 — `AddValidatedOptions` uses `AddSingleton`, so a double call registers (and runs) the validator twice
| | |
|--|--|
| Severity | Low |
| Category | Correctness & logic bugs |
| Status | Open |
| Location | `ZB.MOM.WW.Configuration/src/ZB.MOM.WW.Configuration/ServiceCollectionExtensions.cs:36` |
**Description**
`AddValidatedOptions` registers the validator with
`services.AddSingleton<IValidateOptions<TOptions>, TValidator>()`. `IValidateOptions<TOptions>`
is a multi-registration (enumerable) service — the options pipeline resolves *all* registered
`IValidateOptions<TOptions>` and runs each. Because `AddSingleton` appends unconditionally, calling
`AddValidatedOptions<TOptions, TValidator>` twice for the same `TOptions`/`TValidator` pair (e.g.
two modules both guarding the same section, or an accidental duplicate during refactoring)
registers the validator twice, so it executes twice and every accumulated failure is reported
twice in the resulting `OptionsValidationException`. The framework-idiomatic registration for
`IValidateOptions<T>` is `TryAddEnumerable(ServiceDescriptor.Singleton<IValidateOptions<TOptions>,
TValidator>())`, which is idempotent. Impact is limited (consumers normally call once per section,
and duplicate messages are cosmetic, not a correctness break of the accumulate-all contract), hence
Low — but the deviation from the .NET idiom is real and easy to fix. The SPEC §3 wording ("registers
the validator as a singleton") is satisfied either way.
**Recommendation**
Replace `services.AddSingleton<IValidateOptions<TOptions>, TValidator>()` with
`services.TryAddEnumerable(ServiceDescriptor.Singleton<IValidateOptions<TOptions>, TValidator>())`
(via `Microsoft.Extensions.DependencyInjection.Extensions`). Add a test asserting that calling
`AddValidatedOptions` twice yields a single set of failure messages.
**Resolution**
_Unresolved._
### Configuration-002 — `Checks.PortValue` quotes the raw value on a parse failure but not on a range failure
| | |
|--|--|
| Severity | Low |
| Category | Spec & shared-contract adherence |
| Status | Open |
| Location | `ZB.MOM.WW.Configuration/src/ZB.MOM.WW.Configuration/Checks.cs:21` |
**Description**
`PortValue` produces two different message shapes for the same field depending on *why* the port
is invalid. When the raw string parses but is out of range (e.g. `"0"` or `"70000"`), it delegates
to `Checks.Port(int, field)`, which renders `"<field> must be between 1 and 65535 (was 0)"`
**unquoted**. When the raw string does not parse (e.g. `"notaport"`, `null`), it renders
`"<field> must be between 1 and 65535 (was 'notaport')"`**quoted**. So two failures of the same
rule, from the same raw-config caller, read with inconsistent quoting. The shared contract
(`shared-contract/ZB.MOM.WW.Configuration.md`, "Internal `Checks` seam") states `Checks` is "the
single source of failure wording" so "a port failure reads the same whether it came from a bound
options object or a raw config key" — this inconsistency is in mild tension with that claim. It is
cosmetic (no consumer parses the text) and the overlapping integer case *does* match the bound-options
wording, so it is Low.
**Recommendation**
Make the two branches use a consistent quoting convention for the offending value — e.g. have the
range-failure branch also quote (`(was '0')`), or have the parse-failure branch route through a
shared formatter. Lock the exact strings down with a wording assertion test.
**Resolution**
_Unresolved._
### Configuration-003 — Port parsing accepts leading sign and surrounding whitespace and is culture-sensitive
| | |
|--|--|
| Severity | Low |
| Category | Correctness & logic bugs |
| Status | Open |
| Location | `ZB.MOM.WW.Configuration/src/ZB.MOM.WW.Configuration/Checks.cs:22`, `ZB.MOM.WW.Configuration/src/ZB.MOM.WW.Configuration/Checks.cs:36` |
**Description**
Both `PortValue` (`int.TryParse(raw, out var port)`) and `HostPort`
(`int.TryParse(value[(idx + 1)..], out var port)`) use the default `int.TryParse` overload, which
applies `NumberStyles.Integer` (`AllowLeadingWhite | AllowTrailingWhite | AllowLeadingSign`) and the
current culture. As a result strings the documentation describes as "integer TCP port" are accepted
more loosely than intended: `"+5000"`, `" 5000 "`, and `"host: 5000"` (space after the colon) all
parse and pass, and parsing is locale-dependent. A leading `-` parses to a negative number that the
subsequent range check rejects (so `"-1"` is correctly rejected), but the whitespace/leading-`+`
cases silently pass. This is a robustness nuance rather than a security or correctness break — a port
that survives the loose parse is still in `1..65535` — so it is Low.
**Recommendation**
Parse with an explicit, culture-invariant, strict style, e.g.
`int.TryParse(raw, NumberStyles.None, CultureInfo.InvariantCulture, out var port)` (rejects sign and
whitespace), in both `PortValue` and `HostPort`. Add `Theory` cases for `"+5000"`, `" 5000 "`, and a
space-after-colon endpoint to pin the behaviour.
**Resolution**
_Unresolved._
### Configuration-004 — XML documentation and README are not packaged into the nupkg
| | |
|--|--|
| Severity | Low |
| Category | Documentation & XML docs |
| Status | Open |
| Location | `ZB.MOM.WW.Configuration/src/ZB.MOM.WW.Configuration/ZB.MOM.WW.Configuration.csproj:1`, `ZB.MOM.WW.Configuration/Directory.Build.props:1` |
**Description**
Every public type and member carries accurate XML doc comments, but neither the project nor
`Directory.Build.props` sets `<GenerateDocumentationFile>true</GenerateDocumentationFile>`, so no
`.xml` doc file is produced or included in the `dotnet pack` output. Consumers of the package
therefore get **no IntelliSense documentation** for `OptionsValidatorBase`, `ValidationBuilder`,
`AddValidatedOptions`, or `ConfigPreflight`, despite the docs existing in source. The project also
does not set `PackageReadmeFile`, so the README is not embedded in the package for display on the
NuGet feed. Category 10 of the review process explicitly notes "these are libraries — public docs
matter," so the gap is worth recording. It is fleet-wide (the sibling `ZB.MOM.WW.Audit` library has
the same omission), not a Configuration-specific regression, hence Low.
**Recommendation**
Add `<GenerateDocumentationFile>true</GenerateDocumentationFile>` (ideally in
`Directory.Build.props` so the whole family inherits it) and consider `<PackageReadmeFile>README.md</PackageReadmeFile>`
plus packing the README. Treat as a shared-infra change across the six libraries rather than a
one-off.
**Resolution**
_Unresolved._
+236 -17
View File
@@ -6,31 +6,250 @@
| Packages | `ZB.MOM.WW.Health`, `ZB.MOM.WW.Health.Akka`, `ZB.MOM.WW.Health.EntityFrameworkCore` |
| Component spec | `components/health/spec/SPEC.md` |
| Shared contract | `components/health/shared-contract/ZB.MOM.WW.Health.md` |
| Status | Not yet reviewed |
| Last reviewed | |
| Reviewer | |
| Commit reviewed | |
| Open findings | 0 |
| Status | Reviewed |
| Last reviewed | 2026-06-01 |
| Reviewer | Claude (automated baseline) |
| Commit reviewed | `5f75cd4` |
| Open findings | 6 |
## Summary
_Not yet reviewed._
The library is small, cohesive, and well-documented, and it tracks the normalized spec closely:
the three-tier endpoint convention, the canonical JSON writer, the `IActiveNodeGate` seam, the
configurable `AkkaClusterStatusPolicy` presets, and the role-filtered `ActiveNodeHealthCheck` all
match the contract. The package split is clean — Akka and EF Core stay out of the core package, so
MxGateway can take core only (category 8: no leakage). The decision logic is factored into pure,
exhaustively table-tested functions (`AkkaClusterStatusPolicy`, `ActiveNodeDecision`), and the test
suite (58 tests) exercises the public surface through a real TestServer / real SQLite / real probe
delegates rather than mocks.
The findings cluster around **error-handling completeness in the health-check probes** (category 4),
the highest-risk area for this library: a probe that throws past the `IHealthCheck` boundary, or that
returns a status the spec says it should not, degrades the value of every consuming app's `/health/*`
endpoints. The two Akka checks and the gRPC check do not guard the cluster-state / probe call the way
the spec requires (return Degraded on inaccessible cluster) or the way the sibling `DatabaseHealthCheck`
does (catch-all → Unhealthy). The framework's `HealthCheckService` catches escaping exceptions, so none
of these crash an endpoint — that caps their severity at Medium — but they produce a result that
disagrees with the documented contract. The remainder are Low: a JSON shape nuance (`description` key
omitted vs. emitted-null), recommended-tag drift in XML docs, a return-value footgun in `MapZbHealth`,
and missing `GenerateDocumentationFile` so the (otherwise excellent) XML docs do not ship in the nupkgs.
## Checklist coverage
| # | Category | Examined | Notes |
|---|----------|----------|-------|
| 1 | Correctness & logic bugs | | |
| 2 | Public API surface & compatibility | | |
| 3 | Concurrency & thread safety | | |
| 4 | Error handling & resilience | | |
| 5 | Security & secret handling | | |
| 6 | Performance & resource management | | |
| 7 | Spec & shared-contract adherence | | |
| 8 | Packaging, dependencies & project layout | | |
| 9 | Testing coverage | | |
| 10 | Documentation & XML docs | | |
| 1 | Correctness & logic bugs | | Tier predicates, policy presets, and `ActiveNodeDecision` matrices are correct and match the spec tables. No logic defects found. |
| 2 | Public API surface & compatibility | | Health-005. Surface is minimal, sealed, well-named; nullable annotations correct; no internal types leak (`ActiveNodeDecision` is `internal`). |
| 3 | Concurrency & thread safety | | Options are mutable POCOs but per-registration; checks hold no shared mutable state; cluster-state reads are snapshot reads; static writer options are immutable. No issues found. |
| 4 | Error handling & resilience | | Health-001, Health-002. Akka checks don't guard cluster-state access; gRPC check lets non-Rpc/non-OCE exceptions escape. |
| 5 | Security & secret handling | | No secrets/PII. Exception objects attached to `HealthCheckResult` (standard); endpoints `AllowAnonymous` by design. No issues found. |
| 6 | Performance & resource management | | `DatabaseHealthCheck` is pool-safe and eagerly releases the timeout timer; scopes/contexts disposed via `await using`; channel not owned (correctly not disposed). No issues found. |
| 7 | Spec & shared-contract adherence | | Health-001, Health-003, Health-004. Endpoint convention + status strings otherwise match §3. |
| 8 | Packaging, dependencies & project layout | | Health-006. Dependency split correct; central versions; net10.0; correct package ids. |
| 9 | Testing coverage | | Strong: TestServer tier tests, real SQLite EF tests, table-driven policy/decision tests, gRPC probe + timeout + cancellation. Gaps noted inline in findings, not raised separately. |
| 10 | Documentation & XML docs | | Health-004, Health-006. Public API XML docs are thorough and accurate; the issues are recommended-tag drift and docs not being packed. |
## Findings
_No findings recorded yet._
### Health-001 — Akka health checks throw (instead of returning Degraded) when cluster state is inaccessible
| | |
|--|--|
| Severity | Medium |
| Category | Error handling & resilience |
| Status | Open |
| Location | `ZB.MOM.WW.Health/src/ZB.MOM.WW.Health.Akka/AkkaClusterHealthCheck.cs:45`, `ZB.MOM.WW.Health/src/ZB.MOM.WW.Health.Akka/ActiveNodeHealthCheck.cs:106` |
**Description**
Both Akka checks guard only the *null* ActorSystem case: `_serviceProvider.GetService<ActorSystem>()`
returns `null` → Degraded. But once a non-null `ActorSystem` is resolved they call `Cluster.Get(system)`
and read `.SelfMember` / `.State.Leader` with no try/catch. The spec is explicit that this path must be
Degraded, not an exception:
> "in both modes, if the ActorSystem is not yet ready **or cluster state is inaccessible** (e.g.
> during startup), the check returns Degraded (startup-safety rule)." — `SPEC.md` §2.2 note.
`Cluster.Get(system)` throws (`ConfigurationException`) when the `Akka.Cluster` extension is not
configured on the resolved ActorSystem, and `SelfMember` can throw during the window where the
ActorSystem object exists but the cluster has not finished initialising — exactly the startup race the
spec calls out. The ActorSystem being present-but-not-yet-clustered is a very real ordering in a
`Microsoft.Extensions.Hosting` app (the ActorSystem is registered in DI before `Cluster` has joined).
The escaping exception is caught by ASP.NET Core's `HealthCheckService`, which records the entry as
Unhealthy — so the endpoint does not crash. But Unhealthy on the `ready` tier returns **503** and
removes the node from load-balancing, whereas the spec's intended Degraded returns **200** (still in
rotation). A transient startup window therefore yanks a healthy-but-starting node out of rotation,
the opposite of the "startup-safe" guarantee the XML docs advertise.
**Recommendation**
Wrap the cluster-state read in a try/catch and return `HealthCheckResult.Degraded(...)` on failure, so
both the null-ActorSystem and inaccessible-cluster paths converge on Degraded as the spec requires:
resolve `Cluster.Get(system)` and read membership inside a `try`, catching the cluster extension's
startup exceptions and mapping them to Degraded with a "cluster not yet ready" description.
**Resolution**
_Unresolved._
### Health-002 — `GrpcDependencyHealthCheck` lets non-`RpcException`/non-`OperationCanceledException` errors escape
| | |
|--|--|
| Severity | Medium |
| Category | Error handling & resilience |
| Status | Open |
| Location | `ZB.MOM.WW.Health/src/ZB.MOM.WW.Health/GrpcDependencyHealthCheck.cs:54` |
**Description**
`CheckHealthAsync` catches only `RpcException` and `OperationCanceledException`. Any other exception
thrown by the probe escapes the method. This is reachable on both paths:
- Default probe: `GrpcChannel.ConnectAsync` can throw exceptions other than `RpcException`
e.g. `InvalidOperationException` (channel/socket misuse, shutdown) or `HttpRequestException` /
`SocketException` surfaced from the transport before a gRPC status is produced.
- Custom probe (`GrpcDependencyOptions.Probe`): a caller-supplied delegate (e.g. a
`grpc.health.v1.Health/Check` RPC built on a non-gRPC client) can throw anything.
The XML docs and shared contract describe the result as Unhealthy "when it returns `false`, throws an
`RpcException`, or times out" — they do not promise to handle arbitrary exceptions, and the code does
not. By contrast the sibling `DatabaseHealthCheck` has a catch-all `catch (Exception ex)`
`Unhealthy("Database connection failed.", ex)` (`DatabaseHealthCheck.cs:78`), so the two probes are
asymmetric. As with Health-001 the framework's `HealthCheckService` records the escaping exception as
Unhealthy, so no endpoint crashes — but the dependency-named, controlled description
(`"{name} probe failed…"`) is lost, and the library violates its own "must not throw past the
`IHealthCheck` boundary" intent.
**Recommendation**
Add a trailing `catch (Exception ex)` returning `HealthCheckResult.Unhealthy($"{name} probe failed:
{ex.Message}", ex)`, after the existing OCE/Rpc handlers (so external cancellation still propagates).
This brings the gRPC probe in line with `DatabaseHealthCheck` and the documented contract.
**Resolution**
_Unresolved._
### Health-003 — Null `description` is omitted from the JSON body instead of emitted as `null`
| | |
|--|--|
| Severity | Low |
| Category | Spec & shared-contract adherence |
| Status | Open |
| Location | `ZB.MOM.WW.Health/src/ZB.MOM.WW.Health/ZbHealthWriter.cs:32` |
**Description**
`SerializerOptions` sets `DefaultIgnoreCondition = JsonIgnoreCondition.WhenWritingNull`, so a check that
produces no description renders an entry with **no** `description` key at all:
`{ "status": "Healthy", "durationMs": 0.1 }`. The spec models `description` as a `string?` that "may be
null" and both the canonical example (`SPEC.md` §3) and the `HealthChecks.UI.Client` shape the writer
claims to mirror emit the key with a `null` value rather than dropping it. Consumers/dashboards parsing
the body and reading `entries.<name>.description` must therefore handle a *missing* property, not just a
null one. The deviation is undocumented and untested — `ResponseWriterTests` only asserts the
present-description case.
Low because the aggregate `status`/HTTP-code contract that orchestrators key on is unaffected; only the
descriptive sub-field shape drifts.
**Recommendation**
Either remove `WhenWritingNull` (so `description: null` is emitted, matching the spec example and the
UI-client shape), or document explicitly in the writer's XML docs and `SPEC.md` §3 that absent
descriptions are omitted, and add a test pinning the chosen behaviour.
**Resolution**
_Unresolved._
### Health-004 — XML docs recommend the `active` tag for ready-tier probes, contradicting the spec
| | |
|--|--|
| Severity | Low |
| Category | Documentation & XML docs |
| Status | Open |
| Location | `ZB.MOM.WW.Health/src/ZB.MOM.WW.Health.Akka/AkkaClusterHealthCheck.cs:11`, `ZB.MOM.WW.Health/src/ZB.MOM.WW.Health/GrpcDependencyHealthCheck.cs:21` |
**Description**
`AkkaClusterHealthCheck`'s summary says *"Register to the `ZbHealthTags.Ready` tag (recommended
`[ready, active]`)"*, and `GrpcDependencyHealthCheck` likewise recommends tagging `[ready, active]`.
The spec is unambiguous that both probes belong to the `ready` tier only: §2.2 "Registered to the
`ready` tag" (Akka cluster) and §2.4 "Registered to the `ready` tag" (gRPC dependency). The `active`
tier is reserved by §1 / §2.3 for the leader/active-singleton probe (`ActiveNodeHealthCheck`); putting
cluster-membership or gRPC-reachability checks on the `active` tier pollutes the active-node tier with
non-leadership concerns and contradicts the convergence convention the library exists to enforce.
**Recommendation**
Bring the XML-doc recommendations in line with the spec — recommend `ZbHealthTags.Ready` only for both
probes (drop the `active` suggestion); or, if the dual-tag recommendation is intentional, reconcile it
back into `SPEC.md` §2.2/§2.4 so code and spec agree.
**Resolution**
_Unresolved._
### Health-005 — `MapZbHealth` returns only the readiness builder, silently dropping conventions for the active/live tiers
| | |
|--|--|
| Severity | Low |
| Category | Public API surface & compatibility |
| Status | Open |
| Location | `ZB.MOM.WW.Health/src/ZB.MOM.WW.Health/ZbHealthEndpointExtensions.cs:64` |
**Description**
`MapZbHealth` maps three endpoints but returns the `IEndpointConventionBuilder` for `/health/ready`
only. Any convention a caller chains onto the result — `.RequireHost(...)`, `.RequireAuthorization()`,
`.WithMetadata(...)`, `.CacheOutput()` — applies to the readiness endpoint **alone**; the active and
liveness endpoints silently do not receive it. The behaviour is documented in the XML `<returns>`, so
this is a deliberate, disclosed trade-off rather than a bug, but it is a real footgun: the most natural
reading of `endpoints.MapZbHealth().RequireHost("…")` is "gate all three health endpoints", and the
result type gives the caller no signal that two of the three are excluded.
**Recommendation**
Consider returning a composite builder that fans conventions out to all three endpoints (collect the
three `IEndpointConventionBuilder`s and wrap them) so chained conventions behave as a caller expects.
If the single-builder return is kept for simplicity, strengthen the `<returns>` remark to a warning and
note it in the README's `MapZbHealth` example.
**Resolution**
_Unresolved._
### Health-006 — XML documentation is not emitted into the packed nupkgs
| | |
|--|--|
| Severity | Low |
| Category | Packaging, dependencies & project layout |
| Status | Open |
| Location | `ZB.MOM.WW.Health/Directory.Build.props:3`, `ZB.MOM.WW.Health/src/ZB.MOM.WW.Health/ZB.MOM.WW.Health.csproj:3` |
**Description**
Every public type and member in all three packages carries thorough, accurate XML documentation, but no
project (and neither `Directory.Build.props` nor the three `.csproj` files) sets
`<GenerateDocumentationFile>true</GenerateDocumentationFile>`. As a result `dotnet pack` produces nupkgs
that contain the DLLs but **no `.xml` doc files**, so consumers of `ZB.MOM.WW.Health` / `.Akka` /
`.EntityFrameworkCore` get no IntelliSense summaries or parameter docs for the API. For a shared library
whose value proposition is a documented common surface across three apps, this discards the
documentation effort at the package boundary. (It also means the compiler does not enforce the
"public members are documented" invariant via CS1591.)
**Recommendation**
Set `<GenerateDocumentationFile>true</GenerateDocumentationFile>` once in `Directory.Build.props` so all
three packable projects emit and pack their XML docs. Optionally pair with targeted `<NoWarn>` if any
CS1591 warnings surface on currently-undocumented members.
**Resolution**
_Unresolved._
+55 -17
View File
@@ -35,26 +35,26 @@ code-reviews/
## Summary
0 of 6 libraries reviewed. 0 pending findings across all libraries.
6 of 6 libraries reviewed. 35 pending findings across all libraries.
| Severity | Open findings |
|----------|---------------|
| Critical | 0 |
| High | 0 |
| Medium | 0 |
| Low | 0 |
| **Total** | **0** |
| High | 1 |
| Medium | 9 |
| Low | 25 |
| **Total** | **35** |
## Library Status
| Library | Last reviewed | Commit | Open (C/H/M/L) | Open | Total |
|---------|---------------|--------|----------------|------|-------|
| [Audit](Audit/findings.md) | — | — | 0/0/0/0 | 0 | 0 |
| [Auth](Auth/findings.md) | — | — | 0/0/0/0 | 0 | 0 |
| [Configuration](Configuration/findings.md) | — | — | 0/0/0/0 | 0 | 0 |
| [Health](Health/findings.md) | — | — | 0/0/0/0 | 0 | 0 |
| [Telemetry](Telemetry/findings.md) | — | — | 0/0/0/0 | 0 | 0 |
| [Theme](Theme/findings.md) | — | — | 0/0/0/0 | 0 | 0 |
| [Audit](Audit/findings.md) | 2026-06-01 | `5f75cd4` | 0/0/1/4 | 5 | 5 |
| [Auth](Auth/findings.md) | 2026-06-01 | `5f75cd4` | 0/0/3/3 | 6 | 6 |
| [Configuration](Configuration/findings.md) | 2026-06-01 | `5f75cd4` | 0/0/0/4 | 4 | 4 |
| [Health](Health/findings.md) | 2026-06-01 | `5f75cd4` | 0/0/2/4 | 6 | 6 |
| [Telemetry](Telemetry/findings.md) | 2026-06-01 | `5f75cd4` | 0/1/2/5 | 8 | 8 |
| [Theme](Theme/findings.md) | 2026-06-01 | `5f75cd4` | 0/0/1/5 | 6 | 6 |
## Pending Findings
@@ -67,14 +67,52 @@ description, location, recommendation — lives in the library's `findings.md`.
_None open._
### High (0)
### High (1)
_None open._
| ID | Library | Title |
|----|---------|-------|
| Telemetry-001 | [Telemetry](Telemetry/findings.md) | `RedactionEnricher` ignores property removal, leaving secrets in the event |
### Medium (0)
### Medium (9)
_None open._
| ID | Library | Title |
|----|---------|-------|
| Audit-001 | [Audit](Audit/findings.md) | `CompositeAuditWriter` re-throws `OperationCanceledException` to the caller, contradicting the "must not throw" writer contract |
| Auth-001 | [Auth](Auth/findings.md) | LDAP options validator is registered but never runs at startup |
| Auth-002 | [Auth](Auth/findings.md) | A failed `MarkUsedAsync` write turns a valid API key into a thrown exception |
| Auth-003 | [Auth](Auth/findings.md) | Corrupt `scopes`/`constraints` column throws `JsonException` through the verifier |
| Health-001 | [Health](Health/findings.md) | Akka health checks throw (instead of returning Degraded) when cluster state is inaccessible |
| Health-002 | [Health](Health/findings.md) | `GrpcDependencyHealthCheck` lets non-`RpcException`/non-`OperationCanceledException` errors escape |
| Telemetry-002 | [Telemetry](Telemetry/findings.md) | Redactor cannot inspect or scrub destructured/structured property values |
| Telemetry-003 | [Telemetry](Telemetry/findings.md) | No tests for redactor removal or structured-value redaction |
| Theme-001 | [Theme](Theme/findings.md) | Mobile hamburger toggle silently depends on Bootstrap collapse JS |
### Low (0)
### Low (25)
_None open._
| ID | Library | Title |
|----|---------|-------|
| Audit-002 | [Audit](Audit/findings.md) | `TruncatingAuditRedactor` over-redaction is partial: the catch path scrubs only `DetailsJson`, leaving `Target` unredacted |
| Audit-003 | [Audit](Audit/findings.md) | `TruncatingAuditRedactorOptions` is a mutable class, not the immutable "options record" the contract describes |
| Audit-004 | [Audit](Audit/findings.md) | XML documentation is authored but not emitted, so IntelliSense docs do not ship to consumers |
| Audit-005 | [Audit](Audit/findings.md) | Missing edge-case tests for the redactor never-throw/over-redact contract and composite null/empty handling |
| Auth-004 | [Auth](Auth/findings.md) | README misstates the hashing algorithm and the AspNetCore public surface |
| Auth-005 | [Auth](Auth/findings.md) | `CreateKeyAsync` persists `KeyPrefix` as `prefix_keyId`, inconsistent with the read path |
| Auth-006 | [Auth](Auth/findings.md) | LDAP connection ignores `allowInsecure` and offers no TLS certificate-validation hook |
| Configuration-001 | [Configuration](Configuration/findings.md) | `AddValidatedOptions` uses `AddSingleton`, so a double call registers (and runs) the validator twice |
| Configuration-002 | [Configuration](Configuration/findings.md) | `Checks.PortValue` quotes the raw value on a parse failure but not on a range failure |
| Configuration-003 | [Configuration](Configuration/findings.md) | Port parsing accepts leading sign and surrounding whitespace and is culture-sensitive |
| Configuration-004 | [Configuration](Configuration/findings.md) | XML documentation and README are not packaged into the nupkg |
| Health-003 | [Health](Health/findings.md) | Null `description` is omitted from the JSON body instead of emitted as `null` |
| Health-004 | [Health](Health/findings.md) | XML docs recommend the `active` tag for ready-tier probes, contradicting the spec |
| Health-005 | [Health](Health/findings.md) | `MapZbHealth` returns only the readiness builder, silently dropping conventions for the active/live tiers |
| Health-006 | [Health](Health/findings.md) | XML documentation is not emitted into the packed nupkgs |
| Telemetry-004 | [Telemetry](Telemetry/findings.md) | `service.instance.id` Resource attribute is undocumented in spec and contract |
| Telemetry-005 | [Telemetry](Telemetry/findings.md) | Two hand-maintained Resource-attribute builders can silently drift |
| Telemetry-006 | [Telemetry](Telemetry/findings.md) | Malformed `OtlpEndpoint` throws `UriFormatException` late, with no context |
| Telemetry-007 | [Telemetry](Telemetry/findings.md) | Redaction snapshot allocates a dictionary on every log event |
| Telemetry-008 | [Telemetry](Telemetry/findings.md) | `MapZbMetrics` XML doc claims it is "only valid when Exporter = Prometheus" — stale |
| Theme-002 | [Theme](Theme/findings.md) | `.chip-idle` foreground diverges from the documented token pairing |
| Theme-003 | [Theme](Theme/findings.md) | Hardcoded hex values in CSS contradict the "no hardcoded hex" rule |
| Theme-004 | [Theme](Theme/findings.md) | `NavRailItem` emits a `.rail-ico` span that no stylesheet defines |
| Theme-005 | [Theme](Theme/findings.md) | Orphan and unstyled nav CSS classes in `layout.css` |
| Theme-006 | [Theme](Theme/findings.md) | Public component/parameter surface lacks XML documentation |
+298 -17
View File
@@ -6,31 +6,312 @@
| Packages | `ZB.MOM.WW.Telemetry`, `ZB.MOM.WW.Telemetry.Serilog` |
| Component spec | `components/observability/spec/SPEC.md` |
| Shared contract | `components/observability/shared-contract/ZB.MOM.WW.Telemetry.md` |
| Status | Not yet reviewed |
| Last reviewed | |
| Reviewer | |
| Commit reviewed | |
| Open findings | 0 |
| Status | Reviewed |
| Last reviewed | 2026-06-01 |
| Reviewer | Claude (automated baseline) |
| Commit reviewed | `5f75cd4` |
| Open findings | 8 |
## Summary
_Not yet reviewed._
The library is small, focused, and well-structured: two packages with a clean Serilog/OTel
boundary (the `Serilog.*` stack appears only in the `.Serilog` package; the core package is
pure OTel + ASP.NET Core framework reference), correct argument validation, deliberate
`sealed` types, thorough XML docs, and a deliberate no-process-global-state design for
`AddZbSerilog` that is well covered by `MultiHostTests`. The identity triple, Resource
omission rules, exporter wiring (Prometheus always-on, OTLP additive), and trace/log
correlation all match the spec's intent and are exercised by the 19 tests.
The most material problems are in the redaction seam — the one component the review brief
flags as security-critical. `RedactionEnricher` honours only *replacement* of scalar
properties: it silently ignores the redactor **removing** a key (a documented capability of
`ILogRedactor`), and it cannot see inside destructured/structured property values, so a
secret logged as a field of `{@Object}` is never scrubbed. Both let secrets reach sinks
despite a conforming redactor. Secondary themes: a spec drift around an undocumented
`service.instance.id` Resource attribute, two hand-maintained Resource-attribute builders
that can drift apart, and a stale doc-comment on `MapZbMetrics`. Tests are solid for the
happy paths but have no coverage for redactor removal or structured-value redaction.
## Checklist coverage
| # | Category | Examined | Notes |
|---|----------|----------|-------|
| 1 | Correctness & logic bugs | | |
| 2 | Public API surface & compatibility | | |
| 3 | Concurrency & thread safety | | |
| 4 | Error handling & resilience | | |
| 5 | Security & secret handling | | |
| 6 | Performance & resource management | | |
| 7 | Spec & shared-contract adherence | | |
| 8 | Packaging, dependencies & project layout | | |
| 9 | Testing coverage | | |
| 10 | Documentation & XML docs | | |
| 1 | Correctness & logic bugs | | Redactor "remove" path is a no-op (Telemetry-001); structured values opaque to redactor (Telemetry-002). |
| 2 | Public API surface & compatibility | | Surface minimal, `sealed`, nullable-correct. `ZbResource.InstanceId` is an added public member not in the contract (Telemetry-004). |
| 3 | Concurrency & thread safety | | No issues found. Enrichers stateless; `Lazy` uses `ExecutionAndPublication`; `Activity.Current` is async-local. |
| 4 | Error handling & resilience | | Guard clauses present. `new Uri(OtlpEndpoint)` can throw late on malformed input (Telemetry-006). |
| 5 | Security & secret handling | | Redaction gaps (Telemetry-001/002) are security-relevant — secrets can survive a conforming redactor. |
| 6 | Performance & resource management | | Per-event dictionary snapshot when a redactor is registered (Telemetry-007); acceptable but noted. |
| 7 | Spec & shared-contract adherence | | Undocumented `service.instance.id` attribute (Telemetry-004); two Resource builders that can drift (Telemetry-005). |
| 8 | Packaging, dependencies & project layout | | No issues found. Serilog stack confined to `.Serilog`; central versions; correct net10.0; framework ref justified. |
| 9 | Testing coverage | | No tests for redactor removal or structured-value redaction (Telemetry-003). |
| 10 | Documentation & XML docs | | `MapZbMetrics` doc-comment is stale: claims "only valid when Exporter = Prometheus" (Telemetry-008). |
## Findings
_No findings recorded yet._
### Telemetry-001 — `RedactionEnricher` ignores property removal, leaving secrets in the event
| | |
|--|--|
| Severity | High |
| Category | Security & secret handling |
| Status | Open |
| Location | `ZB.MOM.WW.Telemetry/src/ZB.MOM.WW.Telemetry.Serilog/RedactionEnricher.cs:49-67` |
**Description**
`ILogRedactor.Redact` is documented to let a project "remove or replace any sensitive
values" (`ILogRedactor.cs:13` and the XML doc on the interface method: *"remove or replace"*;
the shared contract repeats *"Remove or replace any sensitive values"*). `RedactionEnricher`
builds a `snapshot` dictionary, hands it to the redactor, then writes back only via
`AddOrUpdateProperty` for entries that remain in the snapshot and `HasChanged`:
```csharp
foreach (var entry in snapshot)
{
if (HasChanged(logEvent, entry.Key, entry.Value))
logEvent.AddOrUpdateProperty(propertyFactory.CreateProperty(entry.Key, entry.Value));
}
```
If a redactor *removes* a key from the dictionary (`properties.Remove("apiKey")`) — the most
natural way to implement "must not leave the process" — that key simply no longer appears in
the write-back loop, so the original property is **never removed from `logEvent`**. The
secret reaches every sink unredacted, even though the redactor did exactly what its contract
permits. This defeats the seam's stated operational guarantee ("secrets never leave the
process in log events") for any removal-style redactor.
**Recommendation**
After calling the redactor, reconcile deletions: for each property key present on the
original `logEvent` but absent from the returned `snapshot`, call
`logEvent.RemovePropertyIfPresent(key)`. (Capture the original key set before mutation, then
diff.) Add a test asserting a removing redactor scrubs the property (see Telemetry-003).
**Resolution**
_Unresolved._
### Telemetry-002 — Redactor cannot inspect or scrub destructured/structured property values
| | |
|--|--|
| Severity | Medium |
| Category | Security & secret handling |
| Status | Open |
| Location | `ZB.MOM.WW.Telemetry/src/ZB.MOM.WW.Telemetry.Serilog/RedactionEnricher.cs:49-55` |
**Description**
The snapshot only unwraps `ScalarValue`; every other `LogEventPropertyValue`
(`StructureValue` from `{@Object}`, `SequenceValue`, `DictionaryValue`) is passed to the
redactor as the raw Serilog wrapper object:
```csharp
snapshot[property.Key] = property.Value is ScalarValue scalar ? scalar.Value : property.Value;
```
A project redactor written against the seam (`IDictionary<string, object?>` of "values")
therefore sees an opaque `StructureValue` for a destructured payload — it cannot read or
mask a secret *field inside* a logged object (e.g. `logger.Information("{@Command}", cmd)`
where `cmd.ApiKey` is sensitive). MxGateway's reference redactor specifically guards
"which command payloads must not leave the process" (per `ILogRedactor` XML doc and the
contract), which is precisely the destructured-object case. The seam silently cannot meet
that requirement; the redactor only works for top-level scalar properties.
**Recommendation**
Document the seam's actual reach (scalar top-level properties only) on `ILogRedactor` and in
the shared contract, and/or recursively project `StructureValue`/`SequenceValue`/
`DictionaryValue` into the snapshot and rebuild them on write-back so nested fields are
redactable. At minimum, make the limitation explicit so consumers do not assume nested
payloads are scrubbed when they are not.
**Resolution**
_Unresolved._
### Telemetry-003 — No tests for redactor removal or structured-value redaction
| | |
|--|--|
| Severity | Medium |
| Category | Testing coverage |
| Status | Open |
| Location | `ZB.MOM.WW.Telemetry/tests/ZB.MOM.WW.Telemetry.Serilog.Tests/RedactionTests.cs:33-69` |
**Description**
`RedactionTests` covers exactly two redaction behaviours: a registered redactor replacing a
scalar value, and a no-op when none is registered. The `FakeRedactor` only ever *reassigns*
`properties["apiKey"]`. There is no test that a redactor which **removes** a key actually
scrubs it (the Telemetry-001 defect would have been caught), and no test that a redactor can
mask a field of a destructured/structured property (Telemetry-002). For a seam whose entire
purpose is secret containment, the most security-relevant behaviours are untested.
**Recommendation**
Add tests: (a) a redactor calling `properties.Remove(key)` results in the property being
absent from the emitted `LogEvent`; (b) a redactor attempting to mask a nested field of a
`{@Object}` payload, asserting the documented behaviour (whichever resolution Telemetry-002
takes). These should fail today and pin the fixes.
**Resolution**
_Unresolved._
### Telemetry-004 — `service.instance.id` Resource attribute is undocumented in spec and contract
| | |
|--|--|
| Severity | Low |
| Category | Spec & shared-contract adherence |
| Status | Open |
| Location | `ZB.MOM.WW.Telemetry/src/ZB.MOM.WW.Telemetry/ZbResource.cs:19-45` |
**Description**
`ZbResource` adds a `service.instance.id` attribute (deterministic `MachineName:ProcessId`)
to the Resource, and exposes it as a new public member `ZbResource.InstanceId`. The
normalized Resource attribute set is enumerated exhaustively in two authoritative docs —
`SPEC.md` §2 and `METRIC-CONVENTIONS.md` §4 — and **neither lists `service.instance.id`**;
the shared contract (`ZB.MOM.WW.Telemetry.md`) likewise documents `ZbResource.Build` as
populating only `service.name/namespace/version/site.id/node.role/host.name` and does not
mention an `InstanceId` member. The attribute itself is a reasonable, standards-aligned
improvement (and disabling the OTel SDK's random-GUID default is sensible for cross-signal
correlation), but it is a silent divergence: the spec/contract are now stale relative to the
code. Per REVIEW-PROCESS §2.7, both directions of drift must be flagged.
**Recommendation**
Add `service.instance.id` (with the `MachineName:ProcessId` rationale) to the Resource table
in `SPEC.md` §2 and `METRIC-CONVENTIONS.md` §4, and document the public `ZbResource.InstanceId`
member in the shared contract, so the normalized spec and the code agree.
**Resolution**
_Unresolved._
### Telemetry-005 — Two hand-maintained Resource-attribute builders can silently drift
| | |
|--|--|
| Severity | Low |
| Category | Spec & shared-contract adherence |
| Status | Open |
| Location | `ZB.MOM.WW.Telemetry/src/ZB.MOM.WW.Telemetry/ZbResource.cs:38-64`, `ZB.MOM.WW.Telemetry/src/ZB.MOM.WW.Telemetry.Serilog/ZbSerilogConfig.cs:125-151` |
**Description**
The Resource attached to metrics/traces is built by `ZbResource.Configure` (via the OTel
`AddService` + `AddAttributes` API), while the Resource attached to the OTLP *log* sink is
built independently by `ZbSerilogConfig.BuildResourceAttributes` (a hand-rolled
`Dictionary<string, object>`). The two currently agree, but they enumerate the same six/seven
attributes in two places with two different mechanisms, so a future change to one (a new
attribute, a renamed key, a changed omission rule) will silently desynchronize logs from
metrics/traces and break the cross-signal correlation the library's whole "unifying hinge"
depends on. There is no test asserting parity between the two attribute sets.
**Recommendation**
Derive both from a single source of truth — e.g. have `ZbResource` expose the canonical
attribute map (already mostly the shape `BuildResourceAttributes` returns) and have the
Serilog sink consume it — or add a parity test that asserts the two attribute sets are
key-for-key identical for a representative options object.
**Resolution**
_Unresolved._
### Telemetry-006 — Malformed `OtlpEndpoint` throws `UriFormatException` late, with no context
| | |
|--|--|
| Severity | Low |
| Category | Error handling & resilience |
| Status | Open |
| Location | `ZB.MOM.WW.Telemetry/src/ZB.MOM.WW.Telemetry/ZbTelemetryExtensions.cs:127-135` |
**Description**
`ConfigureOtlp` does `otlp.Endpoint = new Uri(options.OtlpEndpoint)` with no validation. A
malformed endpoint string (typo, missing scheme) throws a raw `UriFormatException` deep
inside OTel exporter construction at host-build time, with no mention of which option was at
fault. `BuildOptions` already fails fast and clearly for a missing `ServiceName`, but does
not validate that `OtlpEndpoint` is a well-formed absolute URI when `Exporter == Otlp` (nor
that it is non-empty — an Otlp exporter with a null endpoint is silently registered and
points nowhere). The Serilog path (`ZbSerilogConfig`) has the same untyped string→endpoint
handoff.
**Recommendation**
In `BuildOptions`, when `Exporter == ZbExporter.Otlp`, validate `OtlpEndpoint` with
`Uri.TryCreate(..., UriKind.Absolute, out _)` and throw an `ArgumentException` naming the
option (consistent with the existing `ServiceName` guard) rather than letting a bare
`UriFormatException` escape later.
**Resolution**
_Unresolved._
### Telemetry-007 — Redaction snapshot allocates a dictionary on every log event
| | |
|--|--|
| Severity | Low |
| Category | Performance & resource management |
| Status | Open |
| Location | `ZB.MOM.WW.Telemetry/src/ZB.MOM.WW.Telemetry.Serilog/RedactionEnricher.cs:49-67` |
**Description**
When an `ILogRedactor` is registered, `Enrich` allocates a new
`Dictionary<string, object?>(logEvent.Properties.Count)`, copies every property into it, and
then iterates again to diff/write-back — on every single log event, across every logging
thread. Enrichers are on the hottest path in the library (they run for each event the level
filter admits). The early-return when no redactor is registered keeps the common case free,
so the cost is borne only by redaction-enabled consumers (MxGateway), but for a high-volume
gateway this is non-trivial steady-state allocation/GC pressure.
**Recommendation**
Consider redacting in place against `logEvent.Properties` without a full snapshot copy (e.g.
only materialize replacements for keys the redactor touches), or short-circuit when the event
has no properties. At minimum, document the per-event cost so consumers can weigh enabling
redaction on very hot loggers. Acceptable as-is given redaction is opt-in and security-first.
**Resolution**
_Unresolved._
### Telemetry-008 — `MapZbMetrics` XML doc claims it is "only valid when Exporter = Prometheus" — stale
| | |
|--|--|
| Severity | Low |
| Category | Documentation & XML docs |
| Status | Open |
| Location | `ZB.MOM.WW.Telemetry/src/ZB.MOM.WW.Telemetry/ZbMetricsEndpointExtensions.cs:11-14` |
**Description**
The XML doc on `MapZbMetrics` states it is "Only valid when
`ZbTelemetryOptions.Exporter = ZbExporter.Prometheus`." That contradicts the actual wiring:
`ApplyMetricsExporter` (`ZbTelemetryExtensions.cs:107-116`) **always** calls
`AddPrometheusExporter()` regardless of the `Exporter` setting — OTLP is purely additive.
The library's own README ("Prometheus is **always wired** for metrics regardless of the
`Exporter` setting") and the test `AddZbTelemetry_OtlpExporter_StillServesPrometheusEndpoint`
both confirm `/metrics` works under `Exporter = Otlp`. The doc-comment therefore tells
consumers the opposite of the real (and intended) behaviour and could lead them to wrongly
believe `MapZbMetrics` is a no-op under OTLP. The same stale wording is mirrored in the
shared contract (`ZB.MOM.WW.Telemetry.md`, `MapZbMetrics` summary).
**Recommendation**
Update the doc-comment to state that the Prometheus exporter is always registered and
`MapZbMetrics` is valid under any `Exporter` value (Prometheus is always-on; OTLP is an
overlay). Align the shared-contract summary for `MapZbMetrics` to match.
**Resolution**
_Unresolved._
+223 -17
View File
@@ -6,31 +6,237 @@
| Packages | `ZB.MOM.WW.Theme` (Razor Class Library) |
| Component spec | `components/ui-theme/spec/SPEC.md` |
| Shared contract | `components/ui-theme/shared-contract/ZB.MOM.WW.Theme.md` |
| Status | Not yet reviewed |
| Last reviewed | |
| Reviewer | |
| Commit reviewed | |
| Open findings | 0 |
| Status | Reviewed |
| Last reviewed | 2026-06-01 |
| Reviewer | Claude (automated baseline) |
| Commit reviewed | `5f75cd4` |
| Open findings | 6 |
## Summary
_Not yet reviewed._
`ZB.MOM.WW.Theme` is a small, clean, well-scoped Blazor Razor Class Library: ten flat-namespace
SSR components, two stylesheets (`theme.css` 379 lines, `layout.css` 191 lines), three vendored
IBM Plex `.woff2` fonts, and two public enums. The component C# is correct and minimal — stateless
render logic, no concurrency surface, no secret handling, sensible `[Parameter]`/`EditorRequired`
choices, and `TreatWarningsAsErrors`. The 32 bUnit tests (20 `[Fact]` + 12 `[Theory]` cases) exercise
the public render contract of every component plus a static-asset check on the corrected font path.
The standout security-sensitive area — `LoginCard` (open-redirect + antiforgery) — is handled
correctly by delegating both concerns to the consumer with explicit XML docs and in-markup security
notes, matching the spec. No findings are Critical or High. The findings are concentrated in two
themes: (1) one functional gap — the mobile hamburger relies on Bootstrap's collapse JS, an
undocumented runtime dependency that contradicts the "no JavaScript" framing; and (2) CSS / token
hygiene — a token-fidelity drift on `.chip-idle`, hardcoded hex values that violate the spec's
"no hardcoded hex" rule, an undefined `.rail-ico` class, orphan/dead nav CSS, and a near-total
absence of XML docs on the public parameter surface.
## Checklist coverage
| # | Category | Examined | Notes |
|---|----------|----------|-------|
| 1 | Correctness & logic bugs | | |
| 2 | Public API surface & compatibility | | |
| 3 | Concurrency & thread safety | | |
| 4 | Error handling & resilience | | |
| 5 | Security & secret handling | | |
| 6 | Performance & resource management | | |
| 7 | Spec & shared-contract adherence | | |
| 8 | Packaging, dependencies & project layout | | |
| 9 | Testing coverage | | |
| 10 | Documentation & XML docs | | |
| 1 | Correctness & logic bugs | | Component render logic correct; `open="@Expanded"` boolean-attribute binding behaves (tests confirm); chip/variant `switch` mappings complete. No issues found. |
| 2 | Public API surface & compatibility | | Flat `ZB.MOM.WW.Theme` namespace; minimal, intentional; `EditorRequired` applied; nullable annotations correct; no internal types leak. Surface matches contract. No issues found. |
| 3 | Concurrency & thread safety | | Stateless components, no shared mutable state, no `async`/statics/singletons. No issues found (UI kit). |
| 4 | Error handling & resilience | | No throwing paths; required params default to `string.Empty`; null `RenderFragment` slots are guarded (`is not null` / `IsNullOrEmpty`). No issues found. |
| 5 | Security & secret handling | | No secrets/PII. `LoginCard` open-redirect + CSRF correctly delegated to consumer with docs. `Accent` interpolates into a `style` attr but is developer-supplied (not untrusted). No issues found. |
| 6 | Performance & resource management | | No `IDisposable`, no allocations of note, fonts vendored once. No issues found. |
| 7 | Spec & shared-contract adherence | | Component/parameter API matches SPEC §4 and the shared contract. Drifts found: `.chip-idle` foreground (Theme-002) and "no hardcoded hex" rule (Theme-003). |
| 8 | Packaging, dependencies & project layout | | `net10.0` RCL, single package id `ZB.MOM.WW.Theme`, `FrameworkReference` only, version `0.1.0` (inherited from `Directory.Build.props`), static assets under `_content/`. No issues found. |
| 9 | Testing coverage | | 32 tests cover every component's render contract + corrected font path. CSS-class wiring (e.g. `.rail-ico`, orphan nav CSS) is not asserted, which is how Theme-004/005 slipped through. |
| 10 | Documentation & XML docs | | README/contract accurate. XML docs present only on `LoginCard`; absent on the other 9 components, all their parameters, and both enums (Theme-006). |
## Findings
_No findings recorded yet._
### Theme-001 — Mobile hamburger toggle silently depends on Bootstrap collapse JS
| | |
|--|--|
| Severity | Medium |
| Category | Spec & shared-contract adherence |
| Status | Open |
| Location | `ZB.MOM.WW.Theme/src/ZB.MOM.WW.Theme/Components/ThemeShell.razor:5` |
**Description**
`ThemeShell` renders the narrow-viewport navigation toggle as a Bootstrap collapse trigger:
`<button ... data-bs-toggle="collapse" data-bs-target="#theme-rail" aria-expanded="false">`, with the
rail wrapped in `<div class="collapse d-lg-block" id="theme-rail">`. The expand/collapse behaviour
and the `aria-expanded` state transition are driven entirely by **Bootstrap's collapse JavaScript
bundle** at runtime. The shared contract states "No global JavaScript … `NavRailSection` is CSS-only"
and the spec emphasizes static-SSR/no-JS operation, while §5 only requires each app to keep its own
Bootstrap **`<link>`** (CSS) — Bootstrap *JS* is never mentioned. A consumer that includes only
Bootstrap CSS (a reasonable reading of the docs) gets a hamburger button that does nothing below the
`lg` breakpoint: the rail stays `display:none` (`.collapse`) and is unreachable on mobile/narrow
windows. There is no compile-time signal, and the hardcoded `aria-expanded="false"` never updates
without the JS, so assistive tech is also misinformed. This affects every consumer's small-viewport
navigation.
**Recommendation**
Either (a) document the Bootstrap-JS runtime requirement explicitly in the README/spec adoption steps
("the side-rail mobile toggle requires Bootstrap's collapse JS bundle"), or (b) make the toggle
JS-independent to match the "no JavaScript" promise — e.g. wrap the rail in a `<details>`/checkbox
CSS-only disclosure (the same pattern `NavRailSection` already uses), so collapse works in pure static
SSR. If keeping the Bootstrap-collapse approach, document that `aria-expanded` requires the JS to stay
accurate.
**Resolution**
_Unresolved._
### Theme-002 — `.chip-idle` foreground diverges from the documented token pairing
| | |
|--|--|
| Severity | Low |
| Category | Spec & shared-contract adherence |
| Status | Open |
| Location | `ZB.MOM.WW.Theme/src/ZB.MOM.WW.Theme/wwwroot/css/theme.css:177` |
**Description**
`DESIGN-TOKENS.md` ("Status tokens — tinted backgrounds") states the chip classes pair each tinted
background with its matching foreground token — "`--ok-bg` background with `--ok` foreground … Used by
`.chip-ok`, `.chip-warn`, `.chip-bad`, `.chip-idle`". The `--idle` token (`#868e96`) is correctly used
for `.s-idle` and the `.conn-pill .dot`, but `.chip-idle` uses `color: var(--ink-soft)` (`#5a6066`)
instead of `color: var(--idle)`:
```css
.chip-idle { color: var(--ink-soft); background: var(--idle-bg); border-color: var(--rule-strong); }
```
So the one idle chip variant does not follow the pairing rule the design-tokens doc spells out. It is
not a contrast or readability regression (`--ink-soft` is darker than `--idle`), but code and the
normalized token reference disagree, which is exactly the spec-drift category §7 asks to flag. Note the
`.chip-warn` foreground is likewise a hardcoded `#b56a00` rather than `var(--warn)` — see Theme-003.
**Recommendation**
Either change `.chip-idle` to `color: var(--idle)` to honor the documented pairing, or, if the darker
`--ink-soft` ink is the deliberate choice, amend `DESIGN-TOKENS.md` to record that `.chip-idle` uses
`--ink-soft` foreground (as it already does for the `Info` variant footnote).
**Resolution**
_Unresolved._
### Theme-003 — Hardcoded hex values in CSS contradict the "no hardcoded hex" rule
| | |
|--|--|
| Severity | Low |
| Category | Spec & shared-contract adherence |
| Status | Open |
| Location | `ZB.MOM.WW.Theme/src/ZB.MOM.WW.Theme/wwwroot/css/theme.css:174` (and theme.css:96-98,145-150,175-176,251-254,281,290-291,335,348,368-369,377; layout.css:123,128,184) |
**Description**
`SPEC.md` §1 and `DESIGN-TOKENS.md` both assert the rule: "Components carry **no hardcoded hex values**
— everything references these tokens" / "no hardcoded hex values appear in component markup or CSS".
In practice both stylesheets contain numerous literal hex colors outside the `:root` token block —
chip/agg border tints (`#c6e6cd`, `#eec3c3`, `#efd6a6`, `#bfe3c6`, `#f0d9ab`, `#f0c0c0`), chip/notice
foregrounds (`#b56a00`, `#8a5a00`), zebra/hover fills (`#fbfbf9`, `#f3f6fd`, `#eef2fc`), and the
info/dir tints (`#e7ecfb`, `#cdd9f7`). These are derived shades (lighter borders, hover washes) with no
corresponding token, so they are not re-themable via the token block and any per-app accent override
will not reach them. The rule as written is therefore aspirational, and the `DESIGN-TOKENS.md` Info
footnote itself already references a raw `#e7ecfb`, so the doc partially contradicts its own rule.
**Recommendation**
Reconcile code and doc: either promote the recurring derived shades to named tokens (e.g.
`--ok-border`, `--hover-bg`, `--info-bg`/`--info-border`) and reference them, or soften the spec wording
to "no hardcoded hex in **component markup**; CSS derives border/hover shades from tokens where
practical" so the rule matches reality. This keeps the "token changes are breaking" SemVer guarantee
honest.
**Resolution**
_Unresolved._
### Theme-004 — `NavRailItem` emits a `.rail-ico` span that no stylesheet defines
| | |
|--|--|
| Severity | Low |
| Category | Correctness & logic bugs |
| Status | Open |
| Location | `ZB.MOM.WW.Theme/src/ZB.MOM.WW.Theme/Components/NavRailItem.razor:4` |
**Description**
When the optional `Icon` slot is supplied, `NavRailItem` wraps it: `<span class="rail-ico">@Icon</span>`.
Neither `theme.css` nor `layout.css` defines a `.rail-ico` rule (confirmed by grep). The icon therefore
renders as an unstyled inline span — no sizing, alignment, or gap relative to the label text — so an
icon + text rail link will not line up the way the other rail styling implies. It is not a crash, but
the component ships a class hook with no backing style, which is a latent presentation bug for any
consumer that uses the `Icon` parameter (and none of the 32 tests cover the icon path).
**Recommendation**
Add a `.rail-ico` rule to `layout.css` (e.g. `display:inline-flex; width:1rem; margin-right:0.4rem;
color:var(--ink-faint);`) alongside the existing `.rail-link` rules, and add a bUnit test asserting the
`.rail-ico` span is emitted when `Icon` is supplied.
**Resolution**
_Unresolved._
### Theme-005 — Orphan and unstyled nav CSS classes in `layout.css`
| | |
|--|--|
| Severity | Low |
| Category | Documentation & XML docs |
| Status | Open |
| Location | `ZB.MOM.WW.Theme/src/ZB.MOM.WW.Theme/wwwroot/css/layout.css:75` (`.rail-eyebrow`), `:103` (`.rail-eyebrow-chevron`); `ZB.MOM.WW.Theme/src/ZB.MOM.WW.Theme/Components/NavRailSection.razor:5` (`.rail-eyebrow-label`) |
**Description**
`NavRailSection` renders `<summary class="rail-eyebrow-toggle"><span class="rail-eyebrow-label">@Title</span></summary>`
and `layout.css` provides the actual collapse chevron via `.rail-section > summary::before` (lines
178-181). That leaves several mismatched class hooks: `.rail-eyebrow` (line 75, non-toggle) and
`.rail-eyebrow-chevron` (lines 103-108) are defined in CSS but emitted by **no** component (dead CSS),
while `.rail-eyebrow-label` is emitted by the component but has **no** CSS rule (it merely inherits from
`.rail-eyebrow-toggle`). The result is two parallel chevron mechanisms (the styled-but-unused
`.rail-eyebrow-chevron` and the actually-used `::before`) and stylesheet cruft that misleads future
maintainers about how the section header is themed.
**Recommendation**
Remove the dead `.rail-eyebrow` and `.rail-eyebrow-chevron` rules (or wire `.rail-eyebrow-chevron` into
`NavRailSection` and drop the `::before` approach — pick one), and either add a `.rail-eyebrow-label`
rule or drop the now-redundant wrapper span. Keeping one chevron mechanism removes the ambiguity.
**Resolution**
_Unresolved._
### Theme-006 — Public component/parameter surface lacks XML documentation
| | |
|--|--|
| Severity | Low |
| Category | Documentation & XML docs |
| Status | Open |
| Location | `ZB.MOM.WW.Theme/src/ZB.MOM.WW.Theme/Components/ThemeShell.razor:23` (and `BrandBar.razor`, `NavRailItem.razor`, `NavRailSection.razor`, `StatusPill.razor`, `TechButton.razor`, `TechCard.razor`, `TechField.razor`, `ThemeHead.razor`, `ButtonVariant.cs`, `StatusState.cs`) |
**Description**
This is a packaged library (`IsPackable=true`, `0.1.0`) consumed by three separate apps, and
`REVIEW-PROCESS.md` §2.10 calls for "XML doc accuracy and presence on the **public** API (these are
libraries — public docs matter)." Only `LoginCard` carries XML docs (on `ReturnUrl` and `ChildContent`).
The other nine components, all of their `[Parameter]` properties, and both public enums
(`ButtonVariant`, `StatusState`) have no `///` documentation, so consumers get no IntelliSense/tooltip
help on the parameter contract (e.g. that `ThemeShell.Accent` overrides the `--accent` token, that
`NavRailItem.Match` defaults to `Prefix`, or what each `StatusState`/`ButtonVariant` member maps to).
The README and shared contract document all of this, but that prose is not surfaced at the call site.
**Recommendation**
Add brief `///` summaries to the public components, their parameters, and the two enum types — at
minimum the non-obvious ones (`ThemeShell.Accent`, `NavRailItem.Match`, `TechButton.Busy`,
`TechCard.Title` vs `Header` precedence, and each enum member). Consider enabling
`GenerateDocumentationFile` so missing-doc warnings surface under the existing `TreatWarningsAsErrors`.
**Resolution**
_Unresolved._