fix(cluster-infrastructure): resolve ClusterInfrastructure-002..006 — options validation, DI registration, down-if-alone

Joseph Doherty
2026-05-16 20:58:03 -04:00
parent 71b90ba499
commit dba1a1b25f
8 changed files with 441 additions and 12 deletions

@@ -8,7 +8,7 @@
| Last reviewed | 2026-05-16 |
| Reviewer | claude-agent |
| Commit reviewed | `9c60592` |
| Open findings | 7 |
| Open findings | 3 |
## Summary
@@ -144,7 +144,7 @@ module-ownership claim was wrong. Module test suite green (3 passed).
|--|--|
| Severity | Medium |
| Category | Correctness & logic bugs |
| Status | Open |
| Status | Resolved |
| Location | `src/ScadaLink.ClusterInfrastructure/ServiceCollectionExtensions.cs:7-17` |
**Description**
@@ -167,7 +167,23 @@ with the genuine registration when CI-001 is addressed.
**Resolution**
_Unresolved._
Confirmed against the source: both methods returned the `IServiceCollection`
unchanged. Verified the consumers — `ScadaLink.Host` calls `AddClusterInfrastructure()`
(`Program.cs:68`, `SiteServiceRegistration.cs:24`); `AddClusterInfrastructureActors`
is dead — it is called nowhere in the solution.
**Resolved** — fixing commit `commit pending`, date 2026-05-16.
`AddClusterInfrastructure` now does real work: it registers the
`ClusterOptionsValidator` (CI-004) via `TryAddEnumerable`, so the method is no longer a
no-op and a misconfigured `ScadaLink:Cluster` section fails fast on the first
`IOptions<ClusterOptions>` resolution. `AddClusterInfrastructureActors` has never had any
actors to register, because CI-001 established that the Akka bootstrap lives in
`ScadaLink.Host`. It now throws `NotImplementedException` with a message pointing the
caller to the Host, rather than masquerading as a completed registration.
Covered by `ServiceCollectionExtensionsTests`
(`AddClusterInfrastructure_RegistersOptionsValidator`,
`AddClusterInfrastructure_ValidatorRejectsBadOptionsAtResolution`,
`AddClusterInfrastructureActors_ThrowsRatherThanSilentlySucceeding`).
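As an illustration only, the registration described above might look roughly like the
sketch below. It is not the committed source: the `ClusterOptions` and
`ClusterOptionsValidator` stubs, the exception message text, and the method bodies are
assumptions made so the example is self-contained. Only the method names, the use of
`TryAddEnumerable`, and the `NotImplementedException` come from this resolution.

```csharp
using System;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.DependencyInjection.Extensions;
using Microsoft.Extensions.Options;

// Hypothetical stub; the real class carries the full cluster-formation settings.
public sealed class ClusterOptions
{
    public int MinNrOfMembers { get; set; } = 1;
}

// Hypothetical stub; the real validator enforces more rules (see CI-004).
public sealed class ClusterOptionsValidator : IValidateOptions<ClusterOptions>
{
    public ValidateOptionsResult Validate(string? name, ClusterOptions options) =>
        options.MinNrOfMembers == 1
            ? ValidateOptionsResult.Success
            : ValidateOptionsResult.Fail("MinNrOfMembers must be 1.");
}

public static class ServiceCollectionExtensions
{
    public static IServiceCollection AddClusterInfrastructure(this IServiceCollection services)
    {
        // TryAddEnumerable adds the descriptor only if this exact service/implementation
        // pair is not already registered, so calling the method twice is harmless.
        services.TryAddEnumerable(
            ServiceDescriptor.Singleton<IValidateOptions<ClusterOptions>, ClusterOptionsValidator>());
        return services;
    }

    public static IServiceCollection AddClusterInfrastructureActors(this IServiceCollection services)
    {
        // Assumed message wording; the point is to fail loudly instead of returning silently.
        throw new NotImplementedException(
            "This project registers no actors; the Akka bootstrap lives in ScadaLink.Host.");
    }
}
```

With this shape, the validator runs whenever `IOptions<ClusterOptions>.Value` is first
read, which is where an `OptionsValidationException` would surface.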
### ClusterInfrastructure-003 — ClusterOptions omits several documented node-configuration settings
@@ -175,7 +191,7 @@ _Unresolved._
|--|--|
| Severity | Medium |
| Category | Design-document adherence |
| Status | Open |
| Status | Resolved |
| Location | `src/ScadaLink.ClusterInfrastructure/ClusterOptions.cs:3-11` |
**Description**
@@ -202,7 +218,27 @@ agree on where each value lives.
**Resolution**
_Unresolved._
Partially re-triaged. Verified against the source: most of the "missing" settings are
**deliberately owned by `ScadaLink.Host.NodeOptions`**. `NodeOptions` already carries
`Role`, `NodeHostname`, `SiteId`, `RemotingPort` and `GrpcPort`, and `AkkaHostedService`
builds the HOCON from `NodeOptions` for exactly those values. Local SQLite storage paths
live in the database / store-and-forward options. This is the ownership split CI-001
established (the Host owns node identity and bootstrap; this project owns the
cluster-formation contract), so those settings do **not** belong in `ClusterOptions`.
The one genuine gap the finding identifies is `down-if-alone`, which the design doc
puts with the split-brain settings.
**Resolved** — fixing commit `commit pending`, date 2026-05-16. Added the
`DownIfAlone` boolean (default `true`) to `ClusterOptions` so the split-brain
configuration contract is complete, and added a class-level XML doc that records the
deliberate ownership split — node identity/remoting/gRPC in `Host.NodeOptions`, storage
paths in the database options, cluster-formation settings here — so the design doc and
the options classes now agree on where each value lives. (`AkkaHostedService` currently
hard-codes `down-if-alone = on` in HOCON; wiring it to read `DownIfAlone` is a one-line
`ScadaLink.Host` change, outside this module's permitted edit scope, and is noted for
the Host's review.) Covered by `ClusterOptionsTests.DefaultValues_AreCorrect` and
`ClusterOptionsTests.DownIfAlone_CanBeSet`.
### ClusterInfrastructure-004 — ClusterOptions has no validation despite safety-critical values
@@ -210,7 +246,7 @@ _Unresolved._
|--|--|
| Severity | Medium |
| Category | Code organization & conventions |
| Status | Open |
| Status | Resolved |
| Location | `src/ScadaLink.ClusterInfrastructure/ClusterOptions.cs:3-11` |
**Description**
@@ -239,7 +275,26 @@ FailureDetectionThreshold` and positive `StableAfter`. Register it with
**Resolution**
_Unresolved._
Confirmed: `ClusterOptions` had no validation of any kind, so the values the design doc
identifies as catastrophic misconfigurations (`MinNrOfMembers: 2`, a quorum split-brain
strategy) would have been bound silently.
**Resolved** — fixing commit `commit pending`, date 2026-05-16. Added
`ClusterOptionsValidator : IValidateOptions<ClusterOptions>`, which enforces
`MinNrOfMembers == 1`, restricts `SplitBrainResolverStrategy` to an allowed set that
contains only `keep-oldest`, requires a non-empty `SeedNodes`, requires positive
`StableAfter` / `HeartbeatInterval` / `FailureDetectionThreshold`, and asserts
`HeartbeatInterval < FailureDetectionThreshold`. It accumulates every failure into one
result. It is registered by `AddClusterInfrastructure()` (CI-002) as a singleton
`IValidateOptions<ClusterOptions>`, so a misconfigured section throws
`OptionsValidationException` on the first `IOptions<ClusterOptions>.Value` resolution
— which `AkkaHostedService` performs during startup, giving the fail-fast-at-boot
behaviour the recommendation asks for without the src project taking a dependency on
the full `Microsoft.Extensions.DependencyInjection` package needed for the
`ValidateOnStart()` overload. Data annotations were not used — a single
`IValidateOptions` implementation expresses the interdependent timing rules that
attributes cannot. Covered by `ClusterOptionsValidatorTests` (8 cases) and
`ServiceCollectionExtensionsTests.AddClusterInfrastructure_ValidatorRejectsBadOptionsAtResolution`.
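The rules listed above can be sketched as a single `IValidateOptions<ClusterOptions>`
implementation. This is a minimal sketch under stated assumptions, not the committed
validator: the property types (`TimeSpan`, `List<string>`), the placeholder default
values, and the failure message wording are all hypothetical. The property names and
the rules themselves are taken from this resolution.

```csharp
using System;
using System.Collections.Generic;
using Microsoft.Extensions.Options;

// Hypothetical stand-in for ClusterOptions. Property types and defaults are assumed,
// except DownIfAlone, which this document states defaults to true.
public sealed class ClusterOptions
{
    public List<string> SeedNodes { get; set; } = new();
    public int MinNrOfMembers { get; set; } = 1;
    public string SplitBrainResolverStrategy { get; set; } = "keep-oldest";
    public TimeSpan StableAfter { get; set; } = TimeSpan.FromSeconds(15);
    public TimeSpan HeartbeatInterval { get; set; } = TimeSpan.FromSeconds(2);
    public TimeSpan FailureDetectionThreshold { get; set; } = TimeSpan.FromSeconds(10);
    public bool DownIfAlone { get; set; } = true;
}

public sealed class ClusterOptionsValidator : IValidateOptions<ClusterOptions>
{
    public ValidateOptionsResult Validate(string? name, ClusterOptions options)
    {
        // Collect every failure instead of returning at the first one, so a single
        // exception reports all problems in the configuration section.
        var failures = new List<string>();

        if (options.MinNrOfMembers != 1)
            failures.Add("MinNrOfMembers must be 1.");

        if (!string.Equals(options.SplitBrainResolverStrategy, "keep-oldest", StringComparison.Ordinal))
            failures.Add("SplitBrainResolverStrategy must be 'keep-oldest'.");

        if (options.SeedNodes is null || options.SeedNodes.Count == 0)
            failures.Add("SeedNodes must contain at least one entry.");

        if (options.StableAfter <= TimeSpan.Zero)
            failures.Add("StableAfter must be positive.");

        if (options.HeartbeatInterval <= TimeSpan.Zero)
            failures.Add("HeartbeatInterval must be positive.");

        if (options.FailureDetectionThreshold <= TimeSpan.Zero)
            failures.Add("FailureDetectionThreshold must be positive.");

        // Interdependent timing rule: a cross-property check like this is the kind of
        // rule that per-property attributes do not express directly.
        if (options.HeartbeatInterval >= options.FailureDetectionThreshold)
            failures.Add("HeartbeatInterval must be less than FailureDetectionThreshold.");

        return failures.Count == 0
            ? ValidateOptionsResult.Success
            : ValidateOptionsResult.Fail(failures);
    }
}
```

Passing the whole list to `ValidateOptionsResult.Fail` is what lets one
`OptionsValidationException` carry every accumulated failure.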
### ClusterInfrastructure-005 — No configuration section name constant for the Options pattern binding
@@ -276,7 +331,7 @@ _Unresolved._
|--|--|
| Severity | Medium |
| Category | Testing coverage |
| Status | Open |
| Status | Resolved |
| Location | `tests/ScadaLink.ClusterInfrastructure.Tests/ClusterOptionsTests.cs:1-51` |
**Description**
@@ -301,7 +356,28 @@ from `ClusterOptions` and for the options validation from CI-004.
**Resolution**
_Unresolved._
Re-triaged in light of CI-001's resolution. The Akka bootstrap, HOCON generation,
cluster formation, failover and singleton handover are owned by `ScadaLink.Host`, not
this project — multi-node `Akka.Cluster.TestKit` tests for that behaviour belong in the
Host's test suite, outside this module's scope. What this module legitimately owns is
`ClusterOptions`, its validator, and the DI registration, and the testing gap there is
now closed.
**Resolved** — fixing commit `commit pending`, date 2026-05-16. Added two test classes
to `tests/ScadaLink.ClusterInfrastructure.Tests`: `ClusterOptionsValidatorTests`
(8 cases — valid defaults pass; `MinNrOfMembers != 1`, unsupported split-brain
strategies, empty seed nodes, heartbeat not below the failure threshold, non-positive
`StableAfter` all fail; and a multi-failure accumulation case) and
`ServiceCollectionExtensionsTests` (3 cases — `AddClusterInfrastructure` registers the
validator, the validator rejects bad options at `IOptions` resolution, and
`AddClusterInfrastructureActors` throws). The pre-existing `ClusterOptionsTests` was
extended with `DownIfAlone` coverage. The test project gained references to
`Microsoft.Extensions.DependencyInjection` and `Microsoft.Extensions.Options`. Module
test suite green: 16 passed (was 3). Note: the `keep-majority` value used in the
pre-existing `ClusterOptionsTests.Properties_CanBeSetToCustomValues` is intentionally
left unchanged. That test exercises the POCO's property setter (the POCO accepts any
string by design); `ClusterOptionsValidator` is the layer that now rejects
`keep-majority`, and `UnsupportedSplitBrainStrategy_FailsValidation` proves it.
### ClusterInfrastructure-007 — ClusterOptions lacks XML documentation comments