docs(code-reviews): re-review batch 1 at 39d737e — CentralUI, CLI, ClusterInfrastructure, Commons, Communication
17 new findings: CentralUI-020..025, CLI-014..016, ClusterInfrastructure-009..010, Commons-013..014, Communication-012..015.
@@ -5,10 +5,10 @@
|
||||
| Module | `src/ScadaLink.ClusterInfrastructure` |
|
||||
| Design doc | `docs/requirements/Component-ClusterInfrastructure.md` |
|
||||
| Status | Reviewed |
|
||||
| Last reviewed | 2026-05-16 |
|
||||
| Last reviewed | 2026-05-17 |
|
||||
| Reviewer | claude-agent |
|
||||
| Commit reviewed | `9c60592` |
|
||||
| Open findings | 0 |
|
||||
| Commit reviewed | `39d737e` |
|
||||
| Open findings | 2 |
|
||||
|
||||
## Summary
|
||||
|
||||
@@ -29,20 +29,39 @@ every other component runs on, yet it presently delivers nothing the design requ
|
||||
The single options class is clean and its test covers defaults and setters
|
||||
adequately for what exists.
|
||||
|
||||
**Re-review 2026-05-17 (commit `39d737e`).** All eight prior findings (CI-001..008)
|
||||
were resolved by the batch of work between `9c60592` and `39d737e`: `ClusterOptions`
|
||||
gained XML docs, the `SectionName` constant, and the `DownIfAlone` property;
|
||||
`ClusterOptionsValidator` was added; `ServiceCollectionExtensions` now registers the
|
||||
validator and throws from the dead actor-registration method; and the test project
|
||||
grew to 16 cases across three test classes. The module is in good shape — the
|
||||
`ClusterOptions` contract, its validator, and the DI registration are all sound,
|
||||
well-documented, and well-tested. This re-review examined all three source files and
|
||||
all three test files against the full 10-category checklist and found **two new
|
||||
issues**, both stemming from work the prior review explicitly deferred to a "Host
|
||||
review" that has not happened: the `DownIfAlone` property is exposed and validated as
|
||||
part of the configuration contract but is never consumed — `ScadaLink.Host`'s
|
||||
`BuildHocon` still hard-codes `down-if-alone = on` (CI-009, Medium) — and the validator
|
||||
does not enforce the design doc's requirement that `down-if-alone` be `on` for the
|
||||
keep-oldest resolver, so `DownIfAlone = false` is silently accepted (CI-010, Low).
|
||||
|
||||
## Checklist coverage
|
||||
|
||||
Original review (2026-05-16, `9c60592`) below; the re-review notes (2026-05-17,
|
||||
`39d737e`) are appended in each row.
|
||||
|
||||
| # | Category | Examined | Notes |
|
||||
|---|----------|----------|-------|
|
||||
| 1 | Correctness & logic bugs | ✓ | No executable logic exists beyond an options POCO; no logic bugs, but `ServiceCollectionExtensions` returns success while doing nothing (CI-002). |
|
||||
| 2 | Akka.NET conventions | ✓ | No actors, no `ActorSystem` bootstrap, no supervision, no cluster/singleton wiring exist despite the design doc requiring all of them (CI-001). Nothing to assess against `Tell`/`Ask`, immutability, or `PipeTo`. |
|
||||
| 3 | Concurrency & thread safety | ✓ | No shared mutable state, no actors, no async code. No issues found in current code. |
|
||||
| 4 | Error handling & resilience | ✓ | Failover, split-brain, dual-node recovery, and graceful-shutdown logic are entirely absent (CI-001). No exception paths to review in current code. |
|
||||
| 5 | Security | ✓ | No authn/authz surface in this module. Akka remoting is unconfigured, so transport security cannot be assessed; flagged as part of the missing implementation (CI-001). No secret handling present. |
|
||||
| 6 | Performance & resource management | ✓ | No streams, connections, timers, or `IDisposable` resources exist yet. No issues found in current code. |
|
||||
| 7 | Design-document adherence | ✓ | Severe drift: the module implements none of its documented responsibilities (CI-001). `ClusterOptions` also omits remoting host/port, cluster role/site identifier, gRPC port, storage paths, and `down-if-alone` (CI-003). |
|
||||
| 8 | Code organization & conventions | ✓ | Options class is correctly owned by the component project. Missing config-section-name constant (CI-005) and missing `IValidateOptions`/data-annotation validation (CI-004) versus the Options pattern intent. |
|
||||
| 9 | Testing coverage | ✓ | `ClusterOptionsTests` covers defaults and setters. No tests for any cluster behaviour because none exists; the test project references nothing else (CI-006). |
|
||||
| 10 | Documentation & comments | ✓ | `ClusterOptions` has no XML doc comments unlike peer options classes (CI-007). The "Phase 0 skeleton" placeholders are undocumented at the module level — no README or tracking note (CI-008). |
|
||||
| 1 | Correctness & logic bugs | ✓ | No executable logic exists beyond an options POCO; no logic bugs, but `ServiceCollectionExtensions` returns success while doing nothing (CI-002). **Re-review:** CI-002 resolved. New — `DownIfAlone` is a settable property that controls nothing because the HOCON builder hard-codes the value (CI-009). |
|
||||
| 2 | Akka.NET conventions | ✓ | No actors, no `ActorSystem` bootstrap, no supervision, no cluster/singleton wiring exist despite the design doc requiring all of them (CI-001). Nothing to assess against `Tell`/`Ask`, immutability, or `PipeTo`. **Re-review:** confirmed the Akka bootstrap legitimately lives in `ScadaLink.Host` (CI-001 resolution); still nothing actor-related in this module. No issues. |
|
||||
| 3 | Concurrency & thread safety | ✓ | No shared mutable state, no actors, no async code. No issues found in current code. **Re-review:** validator and DI extensions are stateless; no issues. |
|
||||
| 4 | Error handling & resilience | ✓ | Failover, split-brain, dual-node recovery, and graceful-shutdown logic are entirely absent (CI-001). No exception paths to review in current code. **Re-review:** the validator now fails fast on misconfiguration. New — it does not enforce the design doc's `down-if-alone = on` requirement (CI-010). |
|
||||
| 5 | Security | ✓ | No authn/authz surface in this module. Akka remoting is unconfigured, so transport security cannot be assessed; flagged as part of the missing implementation (CI-001). No secret handling present. **Re-review:** still no authn/authz surface, no secret handling. No issues. |
|
||||
| 6 | Performance & resource management | ✓ | No streams, connections, timers, or `IDisposable` resources exist yet. No issues found in current code. **Re-review:** no resources held; the validator allocates a small failure list per call only. No issues. |
|
||||
| 7 | Design-document adherence | ✓ | Severe drift: the module implements none of its documented responsibilities (CI-001). `ClusterOptions` also omits remoting host/port, cluster role/site identifier, gRPC port, storage paths, and `down-if-alone` (CI-003). **Re-review:** CI-001/CI-003 resolved (ownership split documented; `DownIfAlone` added). New — `DownIfAlone` was added to the contract but never wired into the HOCON (CI-009). |
|
||||
| 8 | Code organization & conventions | ✓ | Options class is correctly owned by the component project. Missing config-section-name constant (CI-005) and missing `IValidateOptions`/data-annotation validation (CI-004) versus the Options pattern intent. **Re-review:** CI-004/CI-005 resolved; `SectionName` constant present and options/validator placement correct. No issues. |
|
||||
| 9 | Testing coverage | ✓ | `ClusterOptionsTests` covers defaults and setters. No tests for any cluster behaviour because none exists; the test project references nothing else (CI-006). **Re-review:** CI-006 resolved — 16 tests across three classes covering options, validator, and DI registration. No `DownIfAlone`-wiring test exists, but that wiring lives in the Host (CI-009). No new issue here. |
|
||||
| 10 | Documentation & comments | ✓ | `ClusterOptions` has no XML doc comments unlike peer options classes (CI-007). The "Phase 0 skeleton" placeholders are undocumented at the module level — no README or tracking note (CI-008). **Re-review:** CI-007/CI-008 resolved — full XML docs on all members; skeleton comments gone. Note: the `DownIfAlone` XML doc calls `true` "the design-doc requirement" yet the value is inert (CI-009) and unenforced (CI-010). |
|
||||
|
||||
## Findings
|
||||
|
||||
@@ -490,3 +509,97 @@ section, and the component table reflects the true placement. This is a
|
||||
documentation-only finding, so no runtime regression test is meaningful; verified by
|
||||
inspection of `ServiceCollectionExtensions.cs` and
|
||||
`docs/requirements/Component-ClusterInfrastructure.md:21-39`.
|
||||
|
||||
### ClusterInfrastructure-009 — `DownIfAlone` is an inert configuration knob — never consumed by the HOCON builder
|
||||
|
||||
| | |
|
||||
|--|--|
|
||||
| Severity | Medium |
|
||||
| Category | Design-document adherence |
|
||||
| Status | Open |
|
||||
| Location | `src/ScadaLink.ClusterInfrastructure/ClusterOptions.cs:74` |
|
||||
|
||||
**Description**
|
||||
|
||||
The `DownIfAlone` property was added to `ClusterOptions` by CI-003's resolution as
|
||||
part of "the split-brain configuration contract". It is public, defaults to `true`,
|
||||
carries an XML doc presenting it as "the design-doc requirement", and is exercised by
|
||||
`ClusterOptionsTests.DownIfAlone_CanBeSet`. However, nothing in the system reads it.
|
||||
The Akka.NET HOCON is generated by `ScadaLink.Host.Actors.AkkaHostedService.BuildHocon`,
|
||||
which **hard-codes** the resolver setting:
|
||||
|
||||
```
split-brain-resolver {
    active-strategy = ...
    stable-after = ...
    keep-oldest {
        down-if-alone = on
    }
}
```
|
||||
|
||||
`BuildHocon` receives the full `ClusterOptions` instance and consumes every other
|
||||
field (`SeedNodes`, `MinNrOfMembers`, `SplitBrainResolverStrategy`, `StableAfter`,
|
||||
`HeartbeatInterval`, `FailureDetectionThreshold`) but ignores `DownIfAlone` entirely.
|
||||
The result is a configuration property that an operator can set in `appsettings.json`,
|
||||
that passes validation, and that has **zero runtime effect** — setting
|
||||
`DownIfAlone: false` does not turn the flag off. CI-003's resolution explicitly
|
||||
acknowledged this gap ("wiring it to read `DownIfAlone` is a one-line `ScadaLink.Host`
|
||||
change ... noted for the Host's review") but the wiring was never done and no tracked
|
||||
finding carried it, so the gap has silently persisted to commit `39d737e`. An inert,
|
||||
misleadingly documented configuration knob is a correctness and design-adherence
|
||||
defect: it gives operators a false sense of control over a safety-critical resolver
|
||||
setting.
|
||||
|
||||
**Recommendation**
|
||||
|
||||
Either (a) wire `DownIfAlone` into `BuildHocon` — emit `down-if-alone = {(clusterOptions.DownIfAlone ? "on" : "off")}`
|
||||
— so the property does what its XML doc claims (a Host-side change, to be tracked in
|
||||
the Host module's review since `BuildHocon` lives there), or (b) if the flag is
|
||||
intentionally fixed at `on` and must never be operator-configurable, remove the
|
||||
`DownIfAlone` property from `ClusterOptions` and document the hard-coded `on` value as
|
||||
a non-negotiable invariant. Do not leave a public, settable, validated property that
|
||||
controls nothing.
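
A minimal sketch of option (a), under stated assumptions. This is not the actual `BuildHocon` code, which this review does not reproduce. `ClusterOptionsSketch`, `HoconSketch`, and `BuildSplitBrainResolverSection` are hypothetical names used only for illustration; the property names and HOCON keys are the ones quoted in this finding. The `TimeSpan` type for `StableAfter` and the whole-seconds rendering are assumptions.

```csharp
using System;

// Hypothetical stand-in for the three ClusterOptions fields this sketch needs.
// The real class has more members and may use different types.
public sealed record ClusterOptionsSketch(
    string SplitBrainResolverStrategy,
    TimeSpan StableAfter,
    bool DownIfAlone);

public static class HoconSketch
{
    // Renders only the split-brain-resolver fragment, not the full Akka.NET config.
    public static string BuildSplitBrainResolverSection(ClusterOptionsSketch options)
    {
        // Derive the HOCON boolean from the option instead of hard-coding "on".
        var downIfAlone = options.DownIfAlone ? "on" : "off";

        // Doubled braces are literal braces inside an interpolated verbatim string.
        return $@"split-brain-resolver {{
  active-strategy = {options.SplitBrainResolverStrategy}
  stable-after = {(long)options.StableAfter.TotalSeconds}s
  keep-oldest {{
    down-if-alone = {downIfAlone}
  }}
}}";
    }
}
```

A Host-side test could then assert that the generated text contains `down-if-alone = off` when `DownIfAlone` is `false`, which is the wiring test the checklist notes is currently missing.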
|
||||
|
||||
**Resolution**
|
||||
|
||||
_Unresolved._
|
||||
|
||||
### ClusterInfrastructure-010 — Validator does not enforce `DownIfAlone = true` despite the design doc requiring it
|
||||
|
||||
| | |
|
||||
|--|--|
|
||||
| Severity | Low |
|
||||
| Category | Error handling & resilience |
|
||||
| Status | Open |
|
||||
| Location | `src/ScadaLink.ClusterInfrastructure/ClusterOptionsValidator.cs:21-71` |
|
||||
|
||||
**Description**
|
||||
|
||||
`Component-ClusterInfrastructure.md` (Split-Brain Resolution) states the keep-oldest
|
||||
resolver must be configured with `down-if-alone = on`, and the XML doc on
|
||||
`ClusterOptions.DownIfAlone` calls `true` "the design-doc requirement" — the rationale
|
||||
being that without it the oldest node can run as an isolated single-node cluster
|
||||
during a partition while the younger node forms its own. `ClusterOptionsValidator`
|
||||
guards every other safety-critical setting (`MinNrOfMembers == 1`, `keep-oldest`-only
|
||||
strategy, positive timings, heartbeat below the failure threshold) but performs no
|
||||
check on `DownIfAlone`. A configuration of `DownIfAlone: false` therefore passes
|
||||
validation cleanly. This is currently latent because CI-009 shows the property is not
|
||||
consumed at all, but the moment CI-009 is fixed by wiring the property into the HOCON
|
||||
(option (a)), `DownIfAlone: false` would silently produce the unsafe single-node
|
||||
behaviour the design doc explicitly forbids — with no fail-fast guard. The validator
|
||||
is the right place to enforce the invariant, consistent with how it already rejects
|
||||
quorum split-brain strategies.
|
||||
|
||||
**Recommendation**
|
||||
|
||||
If CI-009 is resolved by keeping `DownIfAlone` configurable, add a check to
|
||||
`ClusterOptionsValidator.Validate` that fails when `DownIfAlone` is `false` (or, if
|
||||
some future deployment legitimately needs it off, fails only in combination with the
|
||||
`keep-oldest` strategy), with a message explaining the isolated-single-node-cluster
|
||||
hazard. If CI-009 is resolved by removing the property, this finding is moot and
|
||||
should be closed as resolved alongside it.
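
The narrower variant of this check might look like the sketch below. It is an assumption-laden illustration, not the existing validator: `DownIfAloneCheckSketch` and `Check` are hypothetical names, the case-insensitive strategy comparison is assumed, and the message wording is only a suggestion. In the real `ClusterOptionsValidator.Validate`, the message would presumably be added to the failure list that method already builds.

```csharp
using System;
using System.Collections.Generic;

public static class DownIfAloneCheckSketch
{
    // Returns failure messages in the style of a validator's failure list.
    // An empty list means the DownIfAlone setting is acceptable.
    public static IReadOnlyList<string> Check(string strategy, bool downIfAlone)
    {
        var failures = new List<string>();

        // Narrow variant: reject only the combination the design doc forbids.
        if (!downIfAlone &&
            string.Equals(strategy, "keep-oldest", StringComparison.OrdinalIgnoreCase))
        {
            failures.Add(
                "DownIfAlone must be true when the keep-oldest strategy is used; " +
                "otherwise the oldest node can keep running as an isolated " +
                "single-node cluster during a partition.");
        }

        return failures;
    }
}
```

Because the validator already accepts only the `keep-oldest` strategy, the narrow and the unconditional variants would behave the same today; the conditional form only matters if other strategies are permitted later.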
|
||||
|
||||
**Resolution**
|
||||
|
||||
_Unresolved._
|
||||
|
||||