docs(code-review): full review at 4307c381 — 18 modules, 67 findings recorded + remediation tracked

Full per-module re-review of the 16 stale modules (last seen 1eb6e97 / 2026-05-28)
plus first-ever reviews of KpiHistory (#26) and ScriptAnalysis (#25), at HEAD 4307c381.

67 new findings (0 Critical, 6 High, 27 Medium, 34 Low). Remediation in commit
fd618cf1 closed 5 of the 6 Highs and ~33 Medium/Low; the rest are Deferred/Won't Fix
with rationale. Remaining pending (4) are all InboundAPI's Database-helper findings
(IA-026 High .. IA-029), left to the active feat/ipsen-movein effort per owner decision.

Highlights: caught a central-only-delivery security drift (SMTP creds broadcast to
sites — DM-025/SR-031), a never-committed 'Resolved' fix (SiteEventLogging-016 → -024),
an unguarded KPI recorder tick (KH-001), a trust-analyzer fallback weakening (SA-001),
and a native-alarm subscribe-path leak (DCL-023). ScriptAnalysis verdict: trust boundary
is semantically sound (symbol-based) in the production cluster config.

README regenerated; regen-readme.py --check passes (4 pending / 567 total).
Joseph Doherty
2026-06-20 18:02:32 -04:00
parent fd618cf1dc
commit d39089f4ed
19 changed files with 4031 additions and 69 deletions
+98 -3
@@ -5,10 +5,10 @@
| Module | `src/ZB.MOM.WW.ScadaBridge.ClusterInfrastructure` |
| Design doc | `docs/requirements/Component-ClusterInfrastructure.md` |
| Status | Reviewed |
| Last reviewed | 2026-05-28 |
| Last reviewed | 2026-06-20 |
| Reviewer | claude-agent |
| Commit reviewed | `1eb6e97` |
| Open findings | 1 |
| Commit reviewed | `4307c381` |
| Open findings | 0 |
## Summary
@@ -82,6 +82,32 @@ either did not surface or that have aged into the file:
but the method now offers nothing — keeping it is API-surface noise that an
IDE will still suggest via auto-complete.
#### Re-review 2026-06-20 (commit `4307c381`) — full review
The module remains small and clean — `ClusterOptions`, `ClusterOptionsValidator`,
and `ServiceCollectionExtensions`, all well-documented and well-tested. The
split-brain-resolver, seed-node, and failure-detector configuration contract
matches the design doc, and all fourteen prior findings (CI-001..014) still hold
as resolved: `DownIfAlone` is now consumed by the Host's HOCON builder, the dead
`AddClusterInfrastructureActors` surface is gone, and the validator enforces every
catastrophic value. The only new observation is one Low code-organization item — the
validator hand-rolls every rule via raw `RequireThat` rather than the base's
`MinCount`/`OneOf`/`PositiveTimeSpan` primitives (CI-015), a deliberate trade-off
flagged as a judgment call rather than a clear defect.
| # | Category | Examined | Notes |
|---|----------|----------|-------|
| 1 | Correctness & logic bugs | ✓ | Validator rules and DI registration are correct. No new defects. |
| 2 | Akka.NET conventions | ✓ | No actors in this module (legitimate, per CI-001 resolution). Nothing actor-shaped to evaluate. |
| 3 | Concurrency & thread safety | ✓ | Validator and DI extension remain stateless. No issues. |
| 4 | Error handling & resilience | ✓ | Validator rejects every catastrophic value the design doc enumerates, now including `DownIfAlone = false` (CI-010) and `SeedNodes.Count < 2` (CI-012). No issues. |
| 5 | Security | ✓ | No authn/authz surface, no secret handling, no remoting transport configured here. No issues. |
| 6 | Performance & resource management | ✓ | No resources held; validator allocates a small failure list per call only. No issues. |
| 7 | Design-document adherence | ✓ | `ClusterOptions` contract complete; SBR/seed/failure-detector config matches the design doc. No new drift. |
| 8 | Code organization & conventions | ✓ | Options/validator placement and Options pattern correct; CI-014 dead surface removed. New — validator hand-rolls every rule via raw `RequireThat` instead of the `OptionsValidatorBase` primitives `MinCount`/`OneOf`/`PositiveTimeSpan` (CI-015, a judgment call). |
| 9 | Testing coverage | ✓ | 19 tests across three classes covering options, validator (incl. single-seed and `DownIfAlone=false`), and DI registration. No gaps. |
| 10 | Documentation & comments | ✓ | XML docs accurate across all source files; CI-013 intent comment present. No issues. |
## Checklist coverage
Original review (2026-05-16, `9c60592`) below; the re-review notes (2026-05-17,
@@ -857,3 +883,72 @@ explicitly stating that this project exposes no actor-registration extension
(actor wiring lives in `ZB.MOM.WW.ScadaBridge.Host`). If the user prefers to keep the
"fail-fast" trap, mark the method `[Obsolete("<reason>", error: true)]` so the compiler,
not the runtime, rejects the call.
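A minimal sketch of that compile-time trap, assuming the method in question is the `AddClusterInfrastructureActors` extension mentioned elsewhere in this review. The class names and the message text are hypothetical; only `ObsoleteAttribute(string, bool)` and its `IsError` property are standard .NET.

```csharp
using System;
using System.Reflection;

public static class ServiceCollectionExtensionsSketch
{
    // Hypothetical stand-in for the retained method. With error: true, any direct
    // call site fails to compile (CS0619) instead of throwing at runtime.
    [Obsolete("This project registers no actors; actor wiring lives in the Host.", error: true)]
    public static void AddClusterInfrastructureActors() =>
        throw new NotSupportedException("This project registers no actors.");
}

public static class ObsoleteSketchDemo
{
    // Looks the method up by name via reflection, which is not a call site and
    // therefore still compiles, then reports whether the attribute is an error.
    public static bool IsCompileTimeError()
    {
        MethodInfo method = typeof(ServiceCollectionExtensionsSketch)
            .GetMethod("AddClusterInfrastructureActors");
        ObsoleteAttribute attribute = method?.GetCustomAttribute<ObsoleteAttribute>();
        return attribute != null && attribute.IsError;
    }
}
```

The attribute's first argument is the message string; the boolean only selects between a warning and an error.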
### ClusterInfrastructure-015 — Validator hand-rolls every rule via raw `RequireThat` instead of the base primitives
| | |
|--|--|
| Severity | Low |
| Category | Code organization & conventions |
| Status | Deferred |
| Location | `src/ZB.MOM.WW.ScadaBridge.ClusterInfrastructure/ClusterOptionsValidator.cs:29-57` |
**Description**
`ClusterOptionsValidator` derives from `OptionsValidatorBase<ClusterOptions>`
(`ZB.MOM.WW.Configuration`), whose `ValidationBuilder` exposes a set of typed
validation primitives — `MinCount<T>` ("requires a collection with at least N
items"), `OneOf` ("requires the value to be one of `allowed`, case-insensitive"),
`PositiveTimeSpan` ("requires a strictly positive duration"), plus `Required`,
`Port`, and `HostPort` — alongside the lower-level `RequireThat(bool, message)`
escape hatch for "custom or cross-field rules". `ClusterOptionsValidator.Validate`
expresses *every* one of its rules through raw `RequireThat`, even where a
first-class primitive exists for the exact check:
- the `SeedNodes.Count >= 2` rule (29-33) is what `MinCount<string>(options.SeedNodes, 2, …)` does;
- the `keep-oldest`-only strategy rule (35-39) is what `OneOf(options.SplitBrainResolverStrategy, AllowedStrategies, …)` does;
- the three `> TimeSpan.Zero` timing rules (45-52) are each what `PositiveTimeSpan(…)` does.
Only the `MinNrOfMembers == 1` equality check, the cross-field
`HeartbeatInterval < FailureDetectionThreshold` comparison, and the `DownIfAlone`
boolean (59-63) genuinely require `RequireThat` (no primitive covers them). Routing
the five primitive-eligible rules listed above (one `MinCount`, one `OneOf`, three
`PositiveTimeSpan`) through `RequireThat` makes this validator inconsistent with the
*intent* of the shared base (the primitives exist precisely so common rules read
uniformly across the codebase's validators) and risks subtle wording drift between
modules for identical checks.
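The two styles can be compared side by side in a small self-contained sketch. `SketchBuilder` is a hypothetical stand-in for the shared `ValidationBuilder`; its signatures and message wording are assumptions, not the real `ZB.MOM.WW.Configuration` API, and the hand-rolled message is a placeholder rather than the validator's actual text.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the shared builder; the real signatures may differ.
public sealed class SketchBuilder
{
    public List<string> Failures { get; } = new List<string>();

    // Lower-level escape hatch: records the message when the condition is false.
    public SketchBuilder RequireThat(bool condition, string message)
    {
        if (!condition) Failures.Add(message);
        return this;
    }

    // Typed primitive: same check, but generic wording with no module context.
    public SketchBuilder MinCount<T>(IReadOnlyCollection<T> items, int min, string name) =>
        RequireThat(items != null && items.Count >= min,
            $"{name} requires a collection with at least {min} items");

    // Typed primitive for durations that must be strictly positive.
    public SketchBuilder PositiveTimeSpan(TimeSpan value, string name) =>
        RequireThat(value > TimeSpan.Zero, $"{name} requires a strictly positive duration");
}

public static class SeedNodeRuleSketch
{
    // Hand-rolled form: the caller owns the full message text.
    public static List<string> HandRolled(IReadOnlyCollection<string> seedNodes) =>
        new SketchBuilder()
            .RequireThat(seedNodes.Count >= 2,
                "SeedNodes must contain at least 2 entries. <design-doc citation and consequence text>")
            .Failures;

    // Primitive form: identical accept/reject behavior, generic message.
    public static List<string> Primitive(IReadOnlyCollection<string> seedNodes) =>
        new SketchBuilder().MinCount(seedNodes, 2, "SeedNodes").Failures;
}
```

Both forms accept and reject the same inputs; under these assumptions they differ only in the text of the failure they record.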
Adopting the primitives is **a deliberate trade-off, not a clear win**, so this is
filed as a judgment call rather than a defect. The switch would **lose** what makes
the current messages valuable: each hand-rolled message cites the governing design-doc
section (`Component-ClusterInfrastructure.md → Node Configuration` / `→ Split-Brain
Resolution`) and spells out the catastrophic consequence of the misconfiguration
("would risk a total cluster shutdown on a partition", "blocks the cluster singleton
after failover and halts all data collection", "the oldest node can run as an isolated
single-node cluster"). The shared primitives emit terse, generic messages
("requires a collection with at least 2 items") that cannot carry that
safety-critical context. The peer `AuditLogOptionsValidator` likewise prefers raw
`RequireThat` for the same reason, so the current style is at least internally
consistent.
**Recommendation**
Treat this as a conscious choice between two competing goods rather than assuming a
fix is warranted:
- **Adopt the primitives** (`MinCount`/`OneOf`/`PositiveTimeSpan` for the five
eligible rules; keep `RequireThat` for the equality, cross-field, and boolean
checks) if codebase-wide *wording consistency* across validators is the priority,
accepting that the design-doc citations and consequence explanations move out of the
failure message (e.g. into XML docs or an appended suffix).
- **Keep as-is** if the *richer, safety-critical failure messages* are the priority
for a module whose misconfiguration is cluster-fatal — accepting the minor
inconsistency with the base's primitive surface.
Do not assume either direction; decide deliberately. If kept as-is, a one-line code
comment recording the choice (richer messages over primitive uniformity) would stop a
future reader from "tidying" the validator into the primitives and silently dropping
the consequence text.
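If the validator is kept as-is, the recording comment could look roughly like the sketch below. The helper, class name, and message placeholders are hypothetical; only the two rules shown (`DownIfAlone` must be true, `HeartbeatInterval < FailureDetectionThreshold`) come from this review.

```csharp
using System;
using System.Collections.Generic;

public static class RecordedChoiceSketch
{
    // Hypothetical helper mirroring the base's RequireThat(bool, message).
    private static void RequireThat(List<string> failures, bool condition, string message)
    {
        if (!condition) failures.Add(message);
    }

    public static List<string> Validate(bool downIfAlone, TimeSpan heartbeat, TimeSpan threshold)
    {
        var failures = new List<string>();

        // Deliberate (CI-015): rules use raw RequireThat rather than the base
        // primitives so each message can carry the design-doc citation and the
        // consequence text. Do not "tidy" these into MinCount/OneOf/PositiveTimeSpan.
        RequireThat(failures, downIfAlone,
            "DownIfAlone must be true. <design-doc citation and consequence text>");
        RequireThat(failures, heartbeat < threshold,
            "HeartbeatInterval must be less than FailureDetectionThreshold. <citation and consequence text>");

        return failures;
    }
}
```

Placing the comment directly above the rules, rather than in the class-level XML docs, keeps it in view of anyone editing the individual checks.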
**Resolution**
Deferred 2026-06-20: adopting the shared base primitives (`MinCount`/`OneOf`/`PositiveTimeSpan`) is a trade-off, not a clear win. The hand-rolled `RequireThat` messages carry design-doc citations and cluster-fatal consequence text that the primitives cannot, and the peer `AuditLogOptionsValidator` follows the same style. Whether to prioritize cross-validator wording consistency over those richer safety-critical messages is left to the owner. No change this pass.