# Code Review — ClusterInfrastructure | Field | Value | |-------|-------| | Module | `src/ScadaLink.ClusterInfrastructure` | | Design doc | `docs/requirements/Component-ClusterInfrastructure.md` | | Status | Reviewed | | Last reviewed | 2026-05-17 | | Reviewer | claude-agent | | Commit reviewed | `39d737e` | | Open findings | 0 | ## Summary The ClusterInfrastructure module is currently a **Phase 0 skeleton**. It contains only two source files: `ClusterOptions.cs`, a plain options POCO, and `ServiceCollectionExtensions.cs`, whose two registration methods are explicit no-ops. None of the responsibilities described in `Component-ClusterInfrastructure.md` — Akka.NET cluster bootstrap, leader election, failover detection, split-brain resolution, cluster singleton hosting, Windows service lifecycle — are implemented. There are therefore no correctness, concurrency, or Akka-convention defects to find in *behaviour*, because there is no behaviour. The findings below instead concern (a) the large gap between the design doc and the code, (b) the options class missing the validation, configuration-binding affordances, and coverage of documented settings that peer modules provide, and (c) the no-op DI extensions silently returning success, which is a latent reliability hazard once the Host wires this module in. The dominant theme is **incompleteness**: this module is the foundation every other component runs on, yet it presently delivers nothing the design requires. The single options class is clean and its test covers defaults and setters adequately for what exists. **Re-review 2026-05-17 (commit `39d737e`).** All eight prior findings (CI-001..008) were resolved by the batch of work between `9c60592` and `39d737e`: `ClusterOptions` gained XML docs, the `SectionName` constant, and the `DownIfAlone` property; `ClusterOptionsValidator` was added; `ServiceCollectionExtensions` now registers the validator and throws from the dead actor-registration method; and the test project grew to 16 cases across three test classes. The module is in good shape — the `ClusterOptions` contract, its validator, and the DI registration are all sound, well-documented, and well-tested. This re-review examined all three source files and all three test files against the full 10-category checklist and found **two new issues**, both stemming from work the prior review explicitly deferred to a "Host review" that has not happened: the `DownIfAlone` property is exposed and validated as part of the configuration contract but is never consumed — `ScadaLink.Host`'s `BuildHocon` still hard-codes `down-if-alone = on` (CI-009, Medium) — and the validator does not enforce the design doc's requirement that `down-if-alone` be `on` for the keep-oldest resolver, so `DownIfAlone = false` is silently accepted (CI-010, Low). ## Checklist coverage Original review (2026-05-16, `9c60592`) below; the re-review notes (2026-05-17, `39d737e`) are appended in each row. | # | Category | Examined | Notes | |---|----------|----------|-------| | 1 | Correctness & logic bugs | ✓ | No executable logic exists beyond an options POCO; no logic bugs, but `ServiceCollectionExtensions` returns success while doing nothing (CI-002). **Re-review:** CI-002 resolved. New — `DownIfAlone` is a settable property that controls nothing because the HOCON builder hard-codes the value (CI-009). | | 2 | Akka.NET conventions | ✓ | No actors, no `ActorSystem` bootstrap, no supervision, no cluster/singleton wiring exist despite the design doc requiring all of them (CI-001). Nothing to assess against `Tell`/`Ask`, immutability, or `PipeTo`. **Re-review:** confirmed the Akka bootstrap legitimately lives in `ScadaLink.Host` (CI-001 resolution); still nothing actor-related in this module. No issues. | | 3 | Concurrency & thread safety | ✓ | No shared mutable state, no actors, no async code. No issues found in current code. **Re-review:** validator and DI extensions are stateless; no issues. | | 4 | Error handling & resilience | ✓ | Failover, split-brain, dual-node recovery, and graceful-shutdown logic are entirely absent (CI-001). No exception paths to review in current code. **Re-review:** the validator now fails fast on misconfiguration. New — it does not enforce the design doc's `down-if-alone = on` requirement (CI-010). | | 5 | Security | ✓ | No authn/authz surface in this module. Akka remoting is unconfigured, so transport security cannot be assessed; flagged as part of the missing implementation (CI-001). No secret handling present. **Re-review:** still no authn/authz surface, no secret handling. No issues. | | 6 | Performance & resource management | ✓ | No streams, connections, timers, or `IDisposable` resources exist yet. No issues found in current code. **Re-review:** no resources held; the validator allocates a small failure list per call only. No issues. | | 7 | Design-document adherence | ✓ | Severe drift: the module implements none of its documented responsibilities (CI-001). `ClusterOptions` also omits remoting host/port, cluster role/site identifier, gRPC port, storage paths, and `down-if-alone` (CI-003). **Re-review:** CI-001/CI-003 resolved (ownership split documented; `DownIfAlone` added). New — `DownIfAlone` was added to the contract but never wired into the HOCON (CI-009). | | 8 | Code organization & conventions | ✓ | Options class is correctly owned by the component project. Missing config-section-name constant (CI-005) and missing `IValidateOptions`/data-annotation validation (CI-004) versus the Options pattern intent. **Re-review:** CI-004/CI-005 resolved; `SectionName` constant present and options/validator placement correct. No issues. | | 9 | Testing coverage | ✓ | `ClusterOptionsTests` covers defaults and setters. No tests for any cluster behaviour because none exists; the test project references nothing else (CI-006). **Re-review:** CI-006 resolved — 16 tests across three classes covering options, validator, and DI registration. No `DownIfAlone`-wiring test exists, but that wiring lives in the Host (CI-009). No new issue here. | | 10 | Documentation & comments | ✓ | `ClusterOptions` has no XML doc comments unlike peer options classes (CI-007). The "Phase 0 skeleton" placeholders are undocumented at the module level — no README or tracking note (CI-008). **Re-review:** CI-007/CI-008 resolved — full XML docs on all members; skeleton comments gone. Note: the `DownIfAlone` XML doc calls `true` "the design-doc requirement" yet the value is inert (CI-009) and unenforced (CI-010). | ## Findings ### ClusterInfrastructure-001 — Module implements none of its documented responsibilities | | | |--|--| | Severity | High | | Category | Design-document adherence | | Status | Resolved | | Location | `src/ScadaLink.ClusterInfrastructure/ServiceCollectionExtensions.cs:9`, `src/ScadaLink.ClusterInfrastructure/ServiceCollectionExtensions.cs:16` | **Description** `Component-ClusterInfrastructure.md` assigns this module seven concrete responsibilities: bootstrap the Akka.NET `ActorSystem`, form the two-node cluster, manage leader election / active-standby role assignment, detect node failures and trigger failover, provide remoting, host the cluster singleton, and manage the Windows service lifecycle. The entire module is two files: a `ClusterOptions` POCO and a `ServiceCollectionExtensions` whose methods are explicitly commented `// Phase 0: skeleton only` and `// Phase 0: placeholder for Akka actor registration` and simply return the unmodified `IServiceCollection`. There is no `Akka.Cluster`, `Akka.Cluster.Tools`, `Akka.Remote`, or split-brain-resolver dependency in the `.csproj` at all (it references only `Microsoft.Extensions.DependencyInjection.Abstractions`, `Microsoft.Extensions.Options`, and `ScadaLink.Commons`). Because every other ScadaLink component runs inside the actor system this module is responsible for creating, the absence of any implementation blocks the foundational layer of the system. **Recommendation** Track the gap explicitly (a milestone/issue) and implement the documented behaviour: add the Akka cluster/remote/cluster-tools and split-brain-resolver package references, build the cluster bootstrap (HOCON generation from `ClusterOptions`), the split-brain resolver configuration, cluster-singleton hosting support, and `CoordinatedShutdown` wiring. Until then, the module's `Status` and the design doc should clearly state it is unimplemented so callers do not assume otherwise. **Resolution** _Re-triaged 2026-05-16 — remains Open, needs a design decision from the user._ Verified against the source at the reviewed commit: the finding's factual claims hold. `src/ScadaLink.ClusterInfrastructure` still contains only `ClusterOptions.cs` and a no-op `ServiceCollectionExtensions.cs`, and the `.csproj` references no Akka packages. However, the documented cluster behaviour is **not actually absent from the system** — it has been implemented in the **Host** project rather than in this module: - `src/ScadaLink.Host/Actors/AkkaHostedService.cs` bootstraps the `ActorSystem`, generates the HOCON from `ClusterOptions` (it imports `ScadaLink.ClusterInfrastructure` and injects `IOptions`), and configures the `keep-oldest` split-brain resolver with `down-if-alone = on` (see `AkkaHostedService.cs:95-96`). - `src/ScadaLink.Host/Health/AkkaClusterHealthCheck.cs`, `AkkaClusterNodeProvider.cs`, and `Health/ActiveNodeHealthCheck.cs` cover cluster membership / active-node detection. - Akka cluster/remote package references live in `ScadaLink.Host.csproj` and the per-component projects (`SiteRuntime`, `Communication`, etc.). So the real situation is an **ownership / design-doc drift**, not missing behaviour: `Component-ClusterInfrastructure.md` assigns the Akka bootstrap, HOCON generation, split-brain config and `CoordinatedShutdown` wiring to this module, but the implementation deliberately lives in the Host. `ClusterOptions` is the one piece this module legitimately owns and it is consumed correctly by the Host. Resolving CI-001 as literally written is **not a small, well-scoped fix** — it is one of two substantial decisions, both requiring the user: 1. **Move the bootstrap into this module** — relocate the HOCON generation, split-brain config, cluster-singleton helpers and `CoordinatedShutdown` wiring out of `ScadaLink.Host` into `ScadaLink.ClusterInfrastructure`, add the Akka package references, and re-wire the Host to call into it. This is a cross-module refactor touching `src/ScadaLink.Host/*` and several other projects — outside the edit scope permitted for this finding (only `src/ScadaLink.ClusterInfrastructure/`, `tests/ScadaLink.ClusterInfrastructure.Tests/`, and this file may be edited). 2. **Accept the current placement** — keep the bootstrap in the Host and update `Component-ClusterInfrastructure.md` (and the README component table) to record that the Host owns the actor-system/cluster bootstrap and that this module's role is the shared `ClusterOptions` contract. That fix is a design-doc edit, also outside this module's permitted edit scope. Either path is a deliberate architecture decision, not a bug fix. The decision was surfaced to the user, who chose **option 2 — accept the current placement**: the Akka bootstrap stays in the Host (the single deployable binary that performs all actor-system bring-up), and the design docs are corrected to record the true ownership. **Resolved** — fixing commit ``, date 2026-05-16. The finding was a design-doc drift, not missing behaviour. `docs/requirements/Component-ClusterInfrastructure.md` now carries an "Implementation Note — Code Placement" section stating that the `ScadaLink.ClusterInfrastructure` project owns the `ClusterOptions` configuration model while `ScadaLink.Host` owns the Akka bootstrap, HOCON generation, split-brain-resolver wiring, `CoordinatedShutdown` integration, and active-node health checks. The README component table (row 13) was updated to match. No code change was required — the documented cluster behaviour already exists and is exercised; only the doc's module-ownership claim was wrong. Module test suite green (3 passed). ### ClusterInfrastructure-002 — No-op DI extension methods report success while doing nothing | | | |--|--| | Severity | Medium | | Category | Correctness & logic bugs | | Status | Resolved | | Location | `src/ScadaLink.ClusterInfrastructure/ServiceCollectionExtensions.cs:7-17` | **Description** `AddClusterInfrastructure` and `AddClusterInfrastructureActors` both accept an `IServiceCollection` and return it unchanged. A caller (e.g. the Host) that invokes `services.AddClusterInfrastructure()` receives a fluent, success-looking result but no actor system, no cluster, and no singleton support is actually registered. This is a silent failure: the system will appear to start, then fail later and far from the cause (e.g. when a component resolves an `ActorSystem` that was never added, or when the cluster singleton never forms). A no-op that masquerades as a completed registration is worse than an unimplemented method that throws. **Recommendation** Until the real implementation exists, make the placeholder loud rather than silent — either throw `NotImplementedException` from the methods, or have them log a prominent warning, so an integrating caller fails fast with a clear cause. Replace with the genuine registration when CI-001 is addressed. **Resolution** Confirmed against the source: both methods returned the `IServiceCollection` unchanged. Verified the consumers — `ScadaLink.Host` calls `AddClusterInfrastructure()` (`Program.cs:68`, `SiteServiceRegistration.cs:24`); `AddClusterInfrastructureActors` is dead — it is called nowhere in the solution. **Resolved** — fixing commit `commit pending`, date 2026-05-16. `AddClusterInfrastructure` now does real work: it registers the `ClusterOptionsValidator` (CI-004) via `TryAddEnumerable`, so the method is no longer a no-op and a misconfigured `ScadaLink:Cluster` section fails fast on the first `IOptions` resolution. `AddClusterInfrastructureActors` — which this component never had any actors to register, as CI-001 established the Akka bootstrap lives in `ScadaLink.Host` — now throws `NotImplementedException` with a message pointing the caller to the Host, rather than masquerading as a completed registration. Covered by `ServiceCollectionExtensionsTests` (`AddClusterInfrastructure_RegistersOptionsValidator`, `AddClusterInfrastructure_ValidatorRejectsBadOptionsAtResolution`, `AddClusterInfrastructureActors_ThrowsRatherThanSilentlySucceeding`). ### ClusterInfrastructure-003 — ClusterOptions omits several documented node-configuration settings | | | |--|--| | Severity | Medium | | Category | Design-document adherence | | Status | Resolved | | Location | `src/ScadaLink.ClusterInfrastructure/ClusterOptions.cs:3-11` | **Description** The "Node Configuration", "Split-Brain Resolution", and "Failure Detection Timing" sections of the design doc enumerate the settings each node needs. `ClusterOptions` exposes `SeedNodes`, `SplitBrainResolverStrategy`, `StableAfter`, `HeartbeatInterval`, `FailureDetectionThreshold`, and `MinNrOfMembers`, but is missing: the Akka remoting hostname/port (default 8081 central, 8082 site), the cluster role (Central vs. Site) and the site identifier, the `down-if-alone` flag (the design explicitly requires `down-if-alone = on` for the keep-oldest resolver), and — for site nodes — the gRPC port (default 8083) and local SQLite storage paths. Without these, the options class cannot drive a correct HOCON configuration when CI-001 is implemented. (Some settings such as remoting host/port may instead belong in `Host/NodeOptions.cs`; the split of ownership should be decided deliberately, but at minimum `down-if-alone` belongs with the split-brain settings here.) **Recommendation** Add the missing settings — at minimum a `DownIfAlone` boolean (default `true`) and the cluster role / site identifier — or document explicitly which settings are owned by `Host/NodeOptions.cs` instead, so the design doc and the options classes agree on where each value lives. **Resolution** Partially re-triaged. Verified against the source: most of the "missing" settings are **deliberately owned by `ScadaLink.Host.NodeOptions`** — `NodeOptions` already carries `Role`, `NodeHostname`, `SiteId`, `RemotingPort` and `GrpcPort`, and `AkkaHostedService` builds the HOCON from `NodeOptions` for exactly those values. Local SQLite storage paths live in the database / store-and-forward options. This is the ownership split CI-001 established (the Host owns node identity and bootstrap; this project owns the cluster-formation contract), so those settings do **not** belong in `ClusterOptions`. The one genuine gap the finding identifies is `down-if-alone`, which the design doc puts with the split-brain settings. **Resolved** — fixing commit `commit pending`, date 2026-05-16. Added the `DownIfAlone` boolean (default `true`) to `ClusterOptions` so the split-brain configuration contract is complete, and added a class-level XML doc that records the deliberate ownership split — node identity/remoting/gRPC in `Host.NodeOptions`, storage paths in the database options, cluster-formation settings here — so the design doc and the options classes now agree on where each value lives. (`AkkaHostedService` currently hard-codes `down-if-alone = on` in HOCON; wiring it to read `DownIfAlone` is a one-line `ScadaLink.Host` change, outside this module's permitted edit scope, and is noted for the Host's review.) Covered by `ClusterOptionsTests.DefaultValues_AreCorrect` and `ClusterOptionsTests.DownIfAlone_CanBeSet`. ### ClusterInfrastructure-004 — ClusterOptions has no validation despite safety-critical values | | | |--|--| | Severity | Medium | | Category | Code organization & conventions | | Status | Resolved | | Location | `src/ScadaLink.ClusterInfrastructure/ClusterOptions.cs:3-11` | **Description** `ClusterOptions` carries values whose misconfiguration has cluster-wide consequences. The design doc is emphatic that `min-nr-of-members` must be `1` (a value of `2` blocks the singleton and therefore all data collection indefinitely after failover), that `SplitBrainResolverStrategy` must be `keep-oldest` for a two-node cluster (quorum strategies cause total shutdown), and that the timing values are interdependent (`HeartbeatInterval` must be well below `FailureDetectionThreshold`). The class has no data annotations, no `IValidateOptions`, and no guard logic, so an `appsettings.json` setting `MinNrOfMembers: 2` or `SplitBrainResolverStrategy: "keep-majority"` (the exact value the test at `ClusterOptionsTests.cs:35` shows is settable) would be accepted silently and produce the catastrophic outcomes the design doc warns against. **Recommendation** Add validation — data annotations (`[Range]` for `MinNrOfMembers`, etc.) plus an `IValidateOptions` implementation that enforces `MinNrOfMembers == 1`, restricts `SplitBrainResolverStrategy` to a known set, requires `SeedNodes` non-empty, and asserts `HeartbeatInterval < FailureDetectionThreshold` and positive `StableAfter`. Register it with `ValidateOnStart()` so misconfiguration fails fast at boot. **Resolution** Confirmed: `ClusterOptions` had no validation of any kind, and the design doc's catastrophic-misconfiguration values (`MinNrOfMembers: 2`, a quorum split-brain strategy) would have been bound silently. **Resolved** — fixing commit `commit pending`, date 2026-05-16. Added `ClusterOptionsValidator : IValidateOptions`, which enforces `MinNrOfMembers == 1`, restricts `SplitBrainResolverStrategy` to the `keep-oldest`-only allowed set, requires a non-empty `SeedNodes`, requires positive `StableAfter` / `HeartbeatInterval` / `FailureDetectionThreshold`, and asserts `HeartbeatInterval < FailureDetectionThreshold`. It accumulates every failure into one result. It is registered by `AddClusterInfrastructure()` (CI-002) as a singleton `IValidateOptions`, so a misconfigured section throws `OptionsValidationException` on the first `IOptions.Value` resolution — which `AkkaHostedService` performs during startup, giving the fail-fast-at-boot behaviour the recommendation asks for without the src project taking a dependency on the full `Microsoft.Extensions.DependencyInjection` package needed for the `ValidateOnStart()` overload. Data annotations were not used — a single `IValidateOptions` implementation expresses the interdependent timing rules that attributes cannot. Covered by `ClusterOptionsValidatorTests` (8 cases) and `ServiceCollectionExtensionsTests.AddClusterInfrastructure_ValidatorRejectsBadOptionsAtResolution`. ### ClusterInfrastructure-005 — No configuration section name constant for the Options pattern binding | | | |--|--| | Severity | Low | | Category | Code organization & conventions | | Status | Resolved | | Location | `src/ScadaLink.ClusterInfrastructure/ClusterOptions.cs:3` | **Description** CLAUDE.md specifies per-component configuration via `appsettings.json` sections bound with the Options pattern. `ClusterOptions` provides no `public const string SectionName` (or equivalent) for the binding site to reference, so whichever code binds the section must hard-code the magic string, and there is no single source of truth for the section name. Because `AddClusterInfrastructure` is itself a no-op (CI-002), the options class is currently bound nowhere at all, making the missing constant easy to overlook. **Recommendation** Add a `public const string SectionName = "Cluster";` (or the agreed name) to `ClusterOptions` and have the eventual `AddClusterInfrastructure` bind `configuration.GetSection(ClusterOptions.SectionName)` against it. **Resolution** Confirmed against the source: `ClusterOptions` previously exposed no section-name constant, leaving binding sites to hard-code the magic string. **Resolved** — fixing commit `commit pending`, date 2026-05-16. `ClusterOptions` now exposes `public const string SectionName = "ScadaLink:Cluster";` as the single source of truth for the `appsettings.json` section name, with an XML doc explaining its purpose. The chosen value matches the `ScadaLink:`-prefixed section convention used by peer option classes and referenced by `ClusterOptionsValidator` / the design doc. Covered by `ClusterOptionsTests.SectionName_IsTheExpectedAppSettingsSection`, which both pins the value and — by referencing the constant — guards against its removal (its deletion would break compilation). ### ClusterInfrastructure-006 — No tests for any cluster behaviour; only the options POCO is covered | | | |--|--| | Severity | Medium | | Category | Testing coverage | | Status | Resolved | | Location | `tests/ScadaLink.ClusterInfrastructure.Tests/ClusterOptionsTests.cs:1-51` | **Description** The test project contains only `ClusterOptionsTests`, exercising default values and property setters of `ClusterOptions`. There are no tests for cluster formation, leader election, failover detection, split-brain resolution, singleton handover, or the `ServiceCollectionExtensions` registration methods — none can exist because the behaviour itself is absent (CI-001). This is recorded so the testing gap is tracked alongside the implementation gap: the most safety-critical paths of the entire system (failover, split-brain, dual-node recovery) are completely untested. The test at line 30-50 also asserts that `SplitBrainResolverStrategy` can be set to `"keep-majority"`, implicitly endorsing a value the design doc forbids for a two-node cluster — see CI-004. **Recommendation** When CI-001 is implemented, add multi-node `Akka.Cluster.TestKit` / `MultiNodeTestKit` tests covering cluster formation, failover promotion, split-brain downing, and singleton handover, plus unit tests for HOCON generation from `ClusterOptions` and for the options validation from CI-004. **Resolution** Re-triaged in light of CI-001's resolution. The Akka bootstrap, HOCON generation, cluster formation, failover and singleton handover are owned by `ScadaLink.Host`, not this project — multi-node `Akka.Cluster.TestKit` tests for that behaviour belong in the Host's test suite, outside this module's scope. What this module legitimately owns is `ClusterOptions`, its validator, and the DI registration, and the testing gap there is now closed. **Resolved** — fixing commit `commit pending`, date 2026-05-16. Added two test classes to `tests/ScadaLink.ClusterInfrastructure.Tests`: `ClusterOptionsValidatorTests` (8 cases — valid defaults pass; `MinNrOfMembers != 1`, unsupported split-brain strategies, empty seed nodes, heartbeat not below the failure threshold, non-positive `StableAfter` all fail; and a multi-failure accumulation case) and `ServiceCollectionExtensionsTests` (3 cases — `AddClusterInfrastructure` registers the validator, the validator rejects bad options at `IOptions` resolution, and `AddClusterInfrastructureActors` throws). The pre-existing `ClusterOptionsTests` was extended with `DownIfAlone` coverage. The test project gained references to `Microsoft.Extensions.DependencyInjection` and `Microsoft.Extensions.Options`. Module test suite green: 16 passed (was 3). Note: the `keep-majority` value used in the pre-existing `ClusterOptionsTests.Properties_CanBeSetToCustomValues` is intentionally left — that test exercises the POCO's property setter (the POCO accepts any string by design); `ClusterOptionsValidator` is the layer that now rejects `keep-majority`, and `UnsupportedSplitBrainStrategy_FailsValidation` proves it. ### ClusterInfrastructure-007 — ClusterOptions lacks XML documentation comments | | | |--|--| | Severity | Low | | Category | Documentation & comments | | Status | Resolved | | Location | `src/ScadaLink.ClusterInfrastructure/ClusterOptions.cs:3-11` | **Description** `ClusterOptions` and each of its six properties have no XML doc comments. Peer options classes such as `StoreAndForward/StoreAndForwardOptions.cs` document the class and every property (including units and design-doc references). For a class whose values carry the cluster-wide consequences described in the design doc (notably `MinNrOfMembers` and `SplitBrainResolverStrategy`), the absence of inline documentation is a maintainability and safety gap — a future editor has no in-code warning that `MinNrOfMembers` must stay `1`. **Recommendation** Add `` comments to the class and each property, stating units and the documented constraints (e.g. that `MinNrOfMembers` must be `1`, that `HeartbeatInterval` must be well below `FailureDetectionThreshold`), referencing the relevant design-doc sections as peer modules do. **Resolution** Confirmed against the source: `ClusterOptions` previously had no XML doc comments on the class or any property. **Resolved** — fixing commit `commit pending`, date 2026-05-16. `ClusterOptions` now carries a class-level `` documenting the cluster configuration contract and the deliberate ownership split (node identity / remoting / gRPC in `Host.NodeOptions`, storage paths in the database options, cluster-formation settings here), plus `` comments on all seven properties — each stating units, defaults, and the design-doc constraints. Notably `MinNrOfMembers` is documented as required to be `1` (a value of `2` blocks the cluster singleton after failover) and `HeartbeatInterval` as required to be well below `FailureDetectionThreshold`, giving a future editor the in-code warnings the recommendation asks for. This is a documentation-only change, so no runtime regression test is meaningful; verified by inspection of `ClusterOptions.cs:3-74`. Module test suite green (17 passed). ### ClusterInfrastructure-008 — "Phase 0 skeleton" status is undocumented at the module level | | | |--|--| | Severity | Low | | Category | Documentation & comments | | Status | Resolved | | Location | `src/ScadaLink.ClusterInfrastructure/ServiceCollectionExtensions.cs:9`, `src/ScadaLink.ClusterInfrastructure/ServiceCollectionExtensions.cs:16` | **Description** The only indication that this foundational module is unimplemented is two inline comments inside private method bodies (`// Phase 0: skeleton only` / `// Phase 0: placeholder for Akka actor registration`). There is no module README, no `` in the design doc, and no tracking marker visible to anyone reading the project structure or the component table. Given that the design doc (`Component-ClusterInfrastructure.md`) describes a fully featured component with no caveat, a reader will reasonably assume the module is built. The mismatch between a complete-looking design doc and an empty implementation is itself a documentation defect. **Recommendation** Add a short note to the design doc (or a module-level `README.md`) stating the current implementation status and what "Phase 0" delivers, and reference a tracked issue for the remaining work (CI-001). Keep the README component table accurate about which components are skeletons versus implemented. **Resolution** Re-triaged in light of CI-001's resolution. The original defect was the mismatch between two misleading inline `// Phase 0: skeleton only` / `// Phase 0: placeholder` comments buried in private method bodies and a complete-looking design doc with no caveat. That premise has been overtaken by the CI-001/CI-002 work: - The "Phase 0 skeleton" comments no longer exist anywhere in `src/ScadaLink.ClusterInfrastructure` (verified by `grep`). `ServiceCollectionExtensions` now does real work (registers `ClusterOptionsValidator`) and `AddClusterInfrastructureActors` throws explicitly — both with accurate XML docs explaining the ownership split. - The module is no longer an unimplemented skeleton. CI-001 established that the Akka bootstrap legitimately lives in `ScadaLink.Host`, and this project's true scope — the `ClusterOptions` configuration contract, its validator, and DI registration — is fully implemented and tested. - The design doc `Component-ClusterInfrastructure.md` now opens with an "Implementation Note — Code Placement" section (added by CI-001) that explicitly states the component is a *design responsibility* realised across `ScadaLink.ClusterInfrastructure` (configuration model) and `ScadaLink.Host` (bootstrap/runtime wiring), and the README component table (row 13) was updated to match. A reader of the design doc no longer assumes a single fully-built project. **Resolved** — fixing commit `commit pending`, date 2026-05-16. No further change required: the misleading skeleton comments are gone, the module's real status is documented at the design-doc level via the "Implementation Note — Code Placement" section, and the component table reflects the true placement. This is a documentation-only finding, so no runtime regression test is meaningful; verified by inspection of `ServiceCollectionExtensions.cs` and `docs/requirements/Component-ClusterInfrastructure.md:21-39`. ### ClusterInfrastructure-009 — `DownIfAlone` is an inert configuration knob — never consumed by the HOCON builder | | | |--|--| | Severity | Medium | | Category | Design-document adherence | | Status | Resolved | | Location | `src/ScadaLink.ClusterInfrastructure/ClusterOptions.cs:74` | **Description** The `DownIfAlone` property was added to `ClusterOptions` by CI-003's resolution as part of "the split-brain configuration contract". It is public, defaults to `true`, carries an XML doc presenting it as "the design-doc requirement", and is exercised by `ClusterOptionsTests.DownIfAlone_CanBeSet`. However, nothing in the system reads it. The Akka.NET HOCON is generated by `ScadaLink.Host.Actors.AkkaHostedService.BuildHocon`, which **hard-codes** the resolver setting: ``` split-brain-resolver { active-strategy = ... stable-after = ... keep-oldest { down-if-alone = on } } ``` `BuildHocon` receives the full `ClusterOptions` instance and consumes every other field (`SeedNodes`, `MinNrOfMembers`, `SplitBrainResolverStrategy`, `StableAfter`, `HeartbeatInterval`, `FailureDetectionThreshold`) but ignores `DownIfAlone` entirely. The result is a configuration property that an operator can set in `appsettings.json`, that passes validation, and that has **zero runtime effect** — setting `DownIfAlone: false` does not turn the flag off. CI-003's resolution explicitly acknowledged this gap ("wiring it to read `DownIfAlone` is a one-line `ScadaLink.Host` change ... noted for the Host's review") but the wiring was never done and no tracked finding carried it, so the gap has silently persisted to commit `39d737e`. An inert, misleadingly-documented configuration knob is a correctness and design-adherence defect: it gives operators a false sense of control over a safety-critical resolver setting. **Recommendation** Either (a) wire `DownIfAlone` into `BuildHocon` — emit `down-if-alone = {(clusterOptions.DownIfAlone ? "on" : "off")}` — so the property does what its XML doc claims (a Host-side change, to be tracked in the Host module's review since `BuildHocon` lives there), or (b) if the flag is intentionally fixed at `on` and must never be operator-configurable, remove the `DownIfAlone` property from `ClusterOptions` and document the hard-coded `on` value as a non-negotiable invariant. Do not leave a public, settable, validated property that controls nothing. **Resolution** Root cause verified against the source at commit `39d737e`: `src/ScadaLink.Host/Actors/AkkaHostedService.cs:147` hard-codes `down-if-alone = on` inside the `keep-oldest` block, and `BuildHocon` consumes every other `ClusterOptions` field but never reads `clusterOptions.DownIfAlone`. The finding's facts hold. The fix is correctly scoped to the **Host** module — the configuration property and its validation legitimately live in `ScadaLink.ClusterInfrastructure` (this module), and the per-CLAUDE.md ownership split (CI-001) places HOCON generation in `ScadaLink.Host`. **Resolved** — by cross-reference, date 2026-05-17. No code change in this module: `ClusterOptions.DownIfAlone` and its validation (CI-010) are correct and complete here. The consumption fix — wiring `DownIfAlone` into `BuildHocon` so the property has runtime effect — is tracked and fixed as **Host-012** in `code-reviews/Host/findings.md`, addressed this session. CI-009 closes against that cross-reference. Module test suite green (18 passed). ### ClusterInfrastructure-010 — Validator does not enforce `DownIfAlone = true` despite the design doc requiring it | | | |--|--| | Severity | Low | | Category | Error handling & resilience | | Status | Resolved | | Location | `src/ScadaLink.ClusterInfrastructure/ClusterOptionsValidator.cs:21-71` | **Description** `Component-ClusterInfrastructure.md` (Split-Brain Resolution) states the keep-oldest resolver must be configured with `down-if-alone = on`, and the XML doc on `ClusterOptions.DownIfAlone` calls `true` "the design-doc requirement" — the rationale being that without it the oldest node can run as an isolated single-node cluster during a partition while the younger node forms its own. `ClusterOptionsValidator` guards every other safety-critical setting (`MinNrOfMembers == 1`, `keep-oldest`-only strategy, positive timings, heartbeat below the failure threshold) but performs no check on `DownIfAlone`. A configuration of `DownIfAlone: false` therefore passes validation cleanly. This is currently latent because CI-009 shows the property is not consumed at all, but the moment CI-009 is fixed by wiring the property into the HOCON (option (a)), `DownIfAlone: false` would silently produce the unsafe single-node behaviour the design doc explicitly forbids — with no fail-fast guard. The validator is the right place to enforce the invariant, consistent with how it already rejects quorum split-brain strategies. **Recommendation** If CI-009 is resolved by keeping `DownIfAlone` configurable, add a check to `ClusterOptionsValidator.Validate` that fails when `DownIfAlone` is `false` (or, if some future deployment legitimately needs it off, fails only in combination with the `keep-oldest` strategy), with a message explaining the isolated-single-node-cluster hazard. If CI-009 is resolved by removing the property, this finding is moot and should be closed as resolved alongside it. **Resolution** Confirmed against the source: `ClusterOptionsValidator.Validate` guarded every other safety-critical setting but performed no check on `DownIfAlone`, so `DownIfAlone: false` passed validation cleanly. With CI-009/Host-012 wiring the property into the HOCON this session, an unguarded `false` would now silently produce the unsafe isolated-single-node behaviour the design doc forbids. **Resolved** — fixing commit `commit pending`, date 2026-05-17. Added a `DownIfAlone` check to `ClusterOptionsValidator.Validate` that fails when the flag is `false`, with a message explaining the isolated-single-node-cluster hazard, consistent with how the validator already rejects quorum split-brain strategies. Developed test-first: `ClusterOptionsValidatorTests.DownIfAloneFalse_FailsValidation` was written first, confirmed failing, then passing after the fix. Module test suite green (18 passed).