docs(code-review): full review at 4307c381 — 18 modules, 67 findings recorded + remediation tracked
Full per-module re-review of the 16 stale modules (last seen 1eb6e97 / 2026-05-28) plus first-ever reviews of KpiHistory (#26) and ScriptAnalysis (#25), at HEAD 4307c381. 67 new findings (0 Critical, 6 High, 27 Medium, 34 Low).

Remediation in commit fd618cf1 closed 5 of the 6 Highs and ~33 Medium/Low; the rest are Deferred/Won't Fix with rationale. Remaining pending (4) are all InboundAPI's Database-helper findings (IA-026 High .. IA-029), left to the active feat/ipsen-movein effort per owner decision.

Highlights: caught a central-only-delivery security drift (SMTP creds broadcast to sites — DM-025/SR-031), a never-committed 'Resolved' fix (SiteEventLogging-016 → -024), an unguarded KPI recorder tick (KH-001), a trust-analyzer fallback weakening (SA-001), and a native-alarm subscribe-path leak (DCL-023). ScriptAnalysis verdict: trust boundary is semantically sound (symbol-based) in the production cluster config.

README regenerated; regen-readme.py --check passes (4 pending / 567 total).
@@ -5,10 +5,10 @@
| Module | `src/ZB.MOM.WW.ScadaBridge.ClusterInfrastructure` |
| Design doc | `docs/requirements/Component-ClusterInfrastructure.md` |
| Status | Reviewed |
| Last reviewed | 2026-05-28 |
| Last reviewed | 2026-06-20 |
| Reviewer | claude-agent |
| Commit reviewed | `1eb6e97` |
| Open findings | 1 |
| Commit reviewed | `4307c381` |
| Open findings | 0 |

## Summary

@@ -82,6 +82,32 @@ either did not surface or that have aged into the file:
but the method now offers nothing — keeping it is API-surface noise that an
IDE will still suggest via auto-complete.

#### Re-review 2026-06-20 (commit `4307c381`) — full review

The module remains small and clean — `ClusterOptions`, `ClusterOptionsValidator`,
and `ServiceCollectionExtensions`, all well-documented and well-tested. The
split-brain-resolver, seed-node, and failure-detector configuration contract
matches the design doc, and all fourteen prior findings (CI-001..014) still hold
as resolved: `DownIfAlone` is now consumed by the Host's HOCON builder, the dead
`AddClusterInfrastructureActors` surface is gone, and the validator rejects every
catastrophic value. The only new observation is one Low code-organization item — the
validator hand-rolls every rule via raw `RequireThat` rather than the base's
`MinCount`/`OneOf`/`PositiveTimeSpan` primitives (CI-015), a deliberate trade-off
flagged as a judgment call rather than a clear defect.

| # | Category | Examined | Notes |
|---|----------|----------|-------|
| 1 | Correctness & logic bugs | ✓ | Validator rules and DI registration are correct. No new defects. |
| 2 | Akka.NET conventions | ✓ | No actors in this module (legitimate, per CI-001 resolution). Nothing actor-shaped to evaluate. |
| 3 | Concurrency & thread safety | ✓ | Validator and DI extension remain stateless. No issues. |
| 4 | Error handling & resilience | ✓ | Validator rejects every catastrophic value the design doc enumerates, now including `DownIfAlone = false` (CI-010) and `SeedNodes.Count < 2` (CI-012). No issues. |
| 5 | Security | ✓ | No authn/authz surface, no secret handling, no remoting transport configured here. No issues. |
| 6 | Performance & resource management | ✓ | No resources held; validator allocates a small failure list per call only. No issues. |
| 7 | Design-document adherence | ✓ | `ClusterOptions` contract complete; SBR/seed/failure-detector config matches the design doc. No new drift. |
| 8 | Code organization & conventions | ✓ | Options/validator placement and Options pattern correct; CI-014 dead surface removed. New — validator hand-rolls every rule via raw `RequireThat` instead of the `OptionsValidatorBase` primitives `MinCount`/`OneOf`/`PositiveTimeSpan` (CI-015, a judgment call). |
| 9 | Testing coverage | ✓ | 19 tests across three classes covering options, validator (incl. single-seed and `DownIfAlone=false`), and DI registration. No gaps. |
| 10 | Documentation & comments | ✓ | XML docs accurate across all source files; CI-013 intent comment present. No issues. |

## Checklist coverage

Original review (2026-05-16, `9c60592`) below; the re-review notes (2026-05-17,
@@ -857,3 +883,72 @@ explicitly stating that this project exposes no actor-registration extension
(actor wiring lives in `ZB.MOM.WW.ScadaBridge.Host`). If the user prefers to keep the
"fail-fast" trap, mark the method `[Obsolete("<message>", error: true)]` so the compiler —
not the runtime — rejects the call.

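A minimal sketch of that compile-time trap, assuming the method in question is the `AddClusterInfrastructureActors` surface named elsewhere in this review. The class name, parameter type, and message text are illustrative stand-ins, not the repository's actual code.

```csharp
using System;

// Hypothetical stand-in for the module's extension class; the real signature may differ.
public static class ClusterInfrastructureExtensionsSketch
{
    // error: true turns every call site into compile error CS0619 instead of a warning,
    // so the mistake is rejected at build time and never reaches startup.
    [Obsolete("This project registers no actors; actor wiring lives in ZB.MOM.WW.ScadaBridge.Host.", error: true)]
    public static void AddClusterInfrastructureActors(object services) =>
        throw new NotSupportedException("Actor wiring lives in the Host project.");
}

// A caller such as
//   ClusterInfrastructureExtensionsSketch.AddClusterInfrastructureActors(services);
// would then fail to build with CS0619.
```

`ObsoleteAttribute` takes the message as its first argument and the `error` flag as its second, so the message string cannot be omitted when `error: true` is set.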
### ClusterInfrastructure-015 — Validator hand-rolls every rule via raw `RequireThat` instead of the base primitives

| | |
|--|--|
| Severity | Low |
| Category | Code organization & conventions |
| Status | Deferred |
| Location | `src/ZB.MOM.WW.ScadaBridge.ClusterInfrastructure/ClusterOptionsValidator.cs:29-57` |

**Description**

`ClusterOptionsValidator` derives from `OptionsValidatorBase<ClusterOptions>`
(`ZB.MOM.WW.Configuration`), whose `ValidationBuilder` exposes a set of typed
validation primitives — `MinCount<T>` ("requires a collection with at least N
items"), `OneOf` ("requires the value to be one of `allowed`, case-insensitive"),
`PositiveTimeSpan` ("requires a strictly positive duration"), plus `Required`,
`Port`, and `HostPort` — alongside the lower-level `RequireThat(bool, message)`
escape hatch for "custom or cross-field rules". `ClusterOptionsValidator.Validate`
expresses *every* one of its rules through raw `RequireThat`, even where a
first-class primitive exists for the exact check:

- the `SeedNodes.Count >= 2` rule (lines 29-33) is what `MinCount<string>(options.SeedNodes, 2, …)` does;
- the `keep-oldest`-only strategy rule (lines 35-39) is what `OneOf(options.SplitBrainResolverStrategy, AllowedStrategies, …)` does;
- the three `> TimeSpan.Zero` timing rules (lines 45-52) are each what `PositiveTimeSpan(…)` does.

Only the `MinNrOfMembers == 1` equality check, the cross-field
`HeartbeatInterval < FailureDetectionThreshold` comparison, and the `DownIfAlone`
boolean (lines 59-63) genuinely require `RequireThat` (no primitive covers them). Routing
the three primitive-eligible rule groups through `RequireThat` makes this validator
inconsistent with the *intent* of the shared base (the primitives exist precisely so
common rules read uniformly across the codebase's validators) and risks subtle
wording drift between modules for identical checks.

This is **explicitly a deliberate trade-off, not a clear win**, and is filed as a
judgment call rather than a defect. Adopting the primitives would **lose** what makes
the current messages valuable: each hand-rolled message cites the governing design-doc
section (`Component-ClusterInfrastructure.md → Node Configuration` / `→ Split-Brain
Resolution`) and spells out the catastrophic consequence of the misconfiguration
("would risk a total cluster shutdown on a partition", "blocks the cluster singleton
after failover and halts all data collection", "the oldest node can run as an isolated
single-node cluster"). The shared primitives emit terse, generic messages
("requires a collection with at least 2 items") that cannot carry that
safety-critical context. The peer `AuditLogOptionsValidator` likewise prefers raw
`RequireThat` for the same reason, so the current style is at least internally
consistent.

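The trade-off described above can be sketched with a small stand-in builder. This is an illustration under stated assumptions, not the real `ZB.MOM.WW.Configuration` API: `ValidationBuilderSketch`, its method signatures, the exact primitive wording, and the hand-rolled message text (including the `<consequence of the misconfiguration>` placeholder) are all hypothetical.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the shared ValidationBuilder; the real
// signatures and message wording may differ.
public sealed class ValidationBuilderSketch
{
    public List<string> Failures { get; } = new();

    // Low-level escape hatch: the caller supplies the complete message.
    public ValidationBuilderSketch RequireThat(bool condition, string message)
    {
        if (!condition) Failures.Add(message);
        return this;
    }

    // Typed primitive: uniform wording, but no room for module-specific context.
    public ValidationBuilderSketch MinCount<T>(IReadOnlyCollection<T> items, int min, string name)
    {
        if (items is null || items.Count < min)
            Failures.Add($"{name} requires a collection with at least {min} items.");
        return this;
    }
}

public static class SeedNodeRuleSketch
{
    // Hand-rolled style: the message carries the design-doc citation and consequence.
    public static List<string> HandRolled(IReadOnlyCollection<string> seedNodes) =>
        new ValidationBuilderSketch()
            .RequireThat(
                seedNodes.Count >= 2,
                "SeedNodes must contain at least 2 entries " +
                "(Component-ClusterInfrastructure.md → Node Configuration): " +
                "<consequence of the misconfiguration>.")
            .Failures;

    // Primitive style: same check, generic message.
    public static List<string> Primitive(IReadOnlyCollection<string> seedNodes) =>
        new ValidationBuilderSketch()
            .MinCount(seedNodes, 2, "SeedNodes")
            .Failures;
}
```

Both methods reject a single-seed list and accept a two-seed list; they differ only in what the failure message tells the operator.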
**Recommendation**

Treat this as a conscious choice between two competing goods rather than assuming a
fix is warranted:

- **Adopt the primitives** (`MinCount`/`OneOf`/`PositiveTimeSpan` for the three
  eligible rule groups; keep `RequireThat` for the equality, cross-field, and boolean
  checks) if codebase-wide *wording consistency* across validators is the priority —
  accepting that the design-doc citations and consequence explanations move out of the
  failure message (e.g. into XML docs or an appended suffix).
- **Keep as-is** if the *richer, safety-critical failure messages* are the priority
  for a module whose misconfiguration is cluster-fatal — accepting the minor
  inconsistency with the base's primitive surface.

Do not assume either direction; decide deliberately. If kept as-is, a one-line code
comment recording the choice (richer messages over primitive uniformity) would stop a
future reader from "tidying" the validator into the primitives and silently dropping
the consequence text.

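If the first option were chosen, the "appended suffix" idea could look roughly like the sketch below. `FailureContextSketch` and `AppendContext` are hypothetical names introduced for illustration; whether the real base class offers any hook for post-processing failure messages is an assumption.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class FailureContextSketch
{
    // Hypothetical helper: appends module-specific context (design-doc citation,
    // consequence text) to each generic failure message a primitive produced.
    public static List<string> AppendContext(IEnumerable<string> failures, string context) =>
        failures.Select(failure => $"{failure} {context}").ToList();
}

// Usage (illustrative strings only):
//   FailureContextSketch.AppendContext(
//       new[] { "SeedNodes requires a collection with at least 2 items." },
//       "See Component-ClusterInfrastructure.md → Node Configuration.");
```

This keeps the primitive's uniform wording at the front of the message while still giving the operator the citation, at the cost of one extra step per rule.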
**Resolution**

Deferred 2026-06-20: this is a deliberate trade-off, not a clear win — the hand-rolled `RequireThat` messages carry design-doc citations and cluster-fatal consequence text that the shared base primitives (`MinCount`/`OneOf`/`PositiveTimeSpan`) cannot, and the peer `AuditLogOptionsValidator` follows the same style. Adopting the primitives would prioritize cross-validator wording consistency over those richer safety-critical messages — a deliberate decision left to the owner. No change this pass.