Doc/XML-comment drift + small adherence fixes across 17 modules. Highlights: - Host-017: site CoordinatedShutdown ordering — SiteStreamGrpcServer gains CancelAllStreams() (refuse new streams, cancel active), wired into Program.cs site branch via ApplicationStopping. - InboundAPI-021: ParentExecutionId now travels on RouteToGet/SetAttributes symmetric with RouteToCallRequest; RouteHelper stamps from _parentExecutionId. - ClusterInfra-012: ClusterOptionsValidator now requires both seed nodes. - Comm-018: SiteCommunicationActor.HeartbeatMessage.IsActive derived from cluster leader check (was hardcoded true). - DM-020: reconciliation audit row attributes the current user, not prior deployer. - SEL-019: EventLogPurgeService early-exits on standby via active-node check. - Plus comment/XML-doc accuracy fixes across AuditLog, ConfigurationDatabase, NotificationOutbox, SiteRuntime, SiteCallAudit; doc refreshes for Component- Commons / -ManagementService / -CLI / -ExternalSystemGateway / -HealthMonitoring / -Transport / -ConfigurationDatabase; CD-023 index-name doc alignment. 11 new regression tests (RouteHelper x4, SiteStreamGrpcServer x2, ClusterOptionsValidator x1, SiteCommunicationActor x1, DeploymentService x1, EventLogPurgeService x3). Build clean (0 warnings); InboundAPI/Communication/ Host suites all green. README regenerated: 112 open (was 136).
50 KiB
Code Review — ClusterInfrastructure
| Field | Value |
|---|---|
| Module | src/ScadaLink.ClusterInfrastructure |
| Design doc | docs/requirements/Component-ClusterInfrastructure.md |
| Status | Reviewed |
| Last reviewed | 2026-05-28 |
| Reviewer | claude-agent |
| Commit reviewed | 1eb6e97 |
| Open findings | 4 |
Summary
The ClusterInfrastructure module is currently a Phase 0 skeleton. It contains
only two source files: ClusterOptions.cs, a plain options POCO, and
ServiceCollectionExtensions.cs, whose two registration methods are explicit no-ops.
None of the responsibilities described in Component-ClusterInfrastructure.md —
Akka.NET cluster bootstrap, leader election, failover detection, split-brain
resolution, cluster singleton hosting, Windows service lifecycle — are implemented.
There are therefore no correctness, concurrency, or Akka-convention defects to find
in behaviour, because there is no behaviour. The findings below instead concern
(a) the large gap between the design doc and the code, (b) the options class missing
the validation, configuration-binding affordances, and coverage of documented
settings that peer modules provide, and (c) the no-op DI extensions silently
returning success, which is a latent reliability hazard once the Host wires this
module in. The dominant theme is incompleteness: this module is the foundation
every other component runs on, yet it presently delivers nothing the design requires.
The single options class is clean and its test covers defaults and setters
adequately for what exists.
Re-review 2026-05-17 (commit 39d737e). All eight prior findings (CI-001..008)
were resolved by the batch of work between 9c60592 and 39d737e: ClusterOptions
gained XML docs, the SectionName constant, and the DownIfAlone property;
ClusterOptionsValidator was added; ServiceCollectionExtensions now registers the
validator and throws from the dead actor-registration method; and the test project
grew to 16 cases across three test classes. The module is in good shape — the
ClusterOptions contract, its validator, and the DI registration are all sound,
well-documented, and well-tested. This re-review examined all three source files and
all three test files against the full 10-category checklist and found two new
issues, both stemming from work the prior review explicitly deferred to a "Host
review" that has not happened: the DownIfAlone property is exposed and validated as
part of the configuration contract but is never consumed — ScadaLink.Host's
BuildHocon still hard-codes down-if-alone = on (CI-009, Medium) — and the validator
does not enforce the design doc's requirement that down-if-alone be on for the
keep-oldest resolver, so DownIfAlone = false is silently accepted (CI-010, Low).
Re-review 2026-05-28 (commit 1eb6e97)
The only change to this module between 39d737e and 1eb6e97 is the
documentation-only commit 1eb6e97 itself, which added a handful of <param>
XML doc tags to ClusterOptionsValidator.Validate and to
AddClusterInfrastructureActors — no source-of-truth changes. Walked all three
source files and all three test files against the full 10-category checklist
again. Found four new issues, all Low severity, that the prior re-review
either did not surface or that have aged into the file:
- CI-011 (Low, Code organization) —
ClusterOptions.SectionNameis documented as "the single source of truth so binding sites do not hard-code the magic string" (the very justification CI-005's resolution offered), butScadaLink.Host.SiteServiceRegistration.BindSharedOptions:100and three references inScadaLink.Host.StartupValidatorall hard-code"ScadaLink:Cluster"literals. The constant is decorative — a "single source of truth" that nothing reads. Same pattern as CI-009 (inert configuration knob). - CI-012 (Low, Design-document adherence) — the validator accepts
SeedNodes.Count == 1even though the design doc states "both nodes are seed nodes" (a properly-configured deployment lists 2).Host.StartupValidator:45already enforces>= 2, so this module's own contract validator is the weaker of the two. Inconsistent enforcement across the two projects that share ownership of the cluster contract. - CI-013 (Low, Documentation & comments) —
ClusterOptionsTests .Properties_CanBeSetToCustomValuesdeliberately setsSplitBrainResolverStrategy = "keep-majority"andMinNrOfMembers = 2— the exact values the design doc warns are catastrophic. The CI-006 resolution acknowledged this is intentional (testing the POCO accepts any value; the validator does the rejecting) but the test has no inline comment saying so, and a future reader could easily misinterpret it as endorsing those values. - CI-014 (Low, Code organization) —
AddClusterInfrastructureActorsis dead surface: no caller exists anywhere in the solution (verified viagrep -rn), its XML doc instructs callers "do not call", and its body unconditionally throws. CI-002's resolution chose "fail loudly" over "delete" but the method now offers nothing — keeping it is API-surface noise that an IDE will still suggest via auto-complete.
Checklist coverage
Original review (2026-05-16, 9c60592) below; the re-review notes (2026-05-17,
39d737e) are appended in each row.
| # | Category | Examined | Notes |
|---|---|---|---|
| 1 | Correctness & logic bugs | ✓ | No executable logic exists beyond an options POCO; no logic bugs, but ServiceCollectionExtensions returns success while doing nothing (CI-002). Re-review: CI-002 resolved. New — DownIfAlone is a settable property that controls nothing because the HOCON builder hard-codes the value (CI-009). |
| 2 | Akka.NET conventions | ✓ | No actors, no ActorSystem bootstrap, no supervision, no cluster/singleton wiring exist despite the design doc requiring all of them (CI-001). Nothing to assess against Tell/Ask, immutability, or PipeTo. Re-review: confirmed the Akka bootstrap legitimately lives in ScadaLink.Host (CI-001 resolution); still nothing actor-related in this module. No issues. |
| 3 | Concurrency & thread safety | ✓ | No shared mutable state, no actors, no async code. No issues found in current code. Re-review: validator and DI extensions are stateless; no issues. |
| 4 | Error handling & resilience | ✓ | Failover, split-brain, dual-node recovery, and graceful-shutdown logic are entirely absent (CI-001). No exception paths to review in current code. Re-review: the validator now fails fast on misconfiguration. New — it does not enforce the design doc's down-if-alone = on requirement (CI-010). |
| 5 | Security | ✓ | No authn/authz surface in this module. Akka remoting is unconfigured, so transport security cannot be assessed; flagged as part of the missing implementation (CI-001). No secret handling present. Re-review: still no authn/authz surface, no secret handling. No issues. |
| 6 | Performance & resource management | ✓ | No streams, connections, timers, or IDisposable resources exist yet. No issues found in current code. Re-review: no resources held; the validator allocates a small failure list per call only. No issues. |
| 7 | Design-document adherence | ✓ | Severe drift: the module implements none of its documented responsibilities (CI-001). ClusterOptions also omits remoting host/port, cluster role/site identifier, gRPC port, storage paths, and down-if-alone (CI-003). Re-review: CI-001/CI-003 resolved (ownership split documented; DownIfAlone added). New — DownIfAlone was added to the contract but never wired into the HOCON (CI-009). |
| 8 | Code organization & conventions | ✓ | Options class is correctly owned by the component project. Missing config-section-name constant (CI-005) and missing IValidateOptions/data-annotation validation (CI-004) versus the Options pattern intent. Re-review: CI-004/CI-005 resolved; SectionName constant present and options/validator placement correct. No issues. |
| 9 | Testing coverage | ✓ | ClusterOptionsTests covers defaults and setters. No tests for any cluster behaviour because none exists; the test project references nothing else (CI-006). Re-review: CI-006 resolved — 16 tests across three classes covering options, validator, and DI registration. No DownIfAlone-wiring test exists, but that wiring lives in the Host (CI-009). No new issue here. |
| 10 | Documentation & comments | ✓ | ClusterOptions has no XML doc comments unlike peer options classes (CI-007). The "Phase 0 skeleton" placeholders are undocumented at the module level — no README or tracking note (CI-008). Re-review: CI-007/CI-008 resolved — full XML docs on all members; skeleton comments gone. Note: the DownIfAlone XML doc calls true "the design-doc requirement" yet the value is inert (CI-009) and unenforced (CI-010). |
Re-review (2026-05-28, 1eb6e97):
| # | Category | Examined | Notes |
|---|---|---|---|
| 1 | Correctness & logic bugs | ✓ | Validator logic and DI registration are correct. No new defects. |
| 2 | Akka.NET conventions | ✓ | No actors in this module (legitimate, per CI-001 resolution). Nothing actor-shaped to evaluate. |
| 3 | Concurrency & thread safety | ✓ | Validator and DI extensions remain stateless. No issues. |
| 4 | Error handling & resilience | ✓ | Validator now rejects every catastrophic value the design doc enumerates. New — it accepts SeedNodes.Count == 1 even though the design doc requires both nodes as seeds, and Host.StartupValidator enforces >= 2, so the module's own validator is the weaker check (CI-012). |
| 5 | Security | ✓ | No authn/authz surface, no secret handling, no remoting transport configured here. No issues. |
| 6 | Performance & resource management | ✓ | No resources held; validator allocates a small failure list per call only. No issues. |
| 7 | Design-document adherence | ✓ | ClusterOptions contract complete and validated. New — validator's seed-node count check is weaker than the design (CI-012). |
| 8 | Code organization & conventions | ✓ | Options/validator placement and Options pattern correct. New — SectionName constant documented as "single source of truth" but never read by any binding site (CI-011); AddClusterInfrastructureActors is dead surface that no caller invokes (CI-014). |
| 9 | Testing coverage | ✓ | 16 tests across three classes. New — ClusterOptionsTests.Properties_CanBeSetToCustomValues sets the exact catastrophic values the design doc forbids without an inline comment explaining why (CI-013). |
| 10 | Documentation & comments | ✓ | XML docs accurate across all source files (commit 1eb6e97 filled in the remaining <param> tags). New — CI-013 (test lacks intent comment); CI-011 (XML doc for SectionName claims a property the code does not deliver). |
Findings
ClusterInfrastructure-001 — Module implements none of its documented responsibilities
| Severity | High |
| Category | Design-document adherence |
| Status | Resolved |
| Location | src/ScadaLink.ClusterInfrastructure/ServiceCollectionExtensions.cs:9, src/ScadaLink.ClusterInfrastructure/ServiceCollectionExtensions.cs:16 |
Description
Component-ClusterInfrastructure.md assigns this module seven concrete
responsibilities: bootstrap the Akka.NET ActorSystem, form the two-node cluster,
manage leader election / active-standby role assignment, detect node failures and
trigger failover, provide remoting, host the cluster singleton, and manage the
Windows service lifecycle. The entire module is two files: a ClusterOptions POCO
and a ServiceCollectionExtensions whose methods are explicitly commented
// Phase 0: skeleton only and // Phase 0: placeholder for Akka actor registration
and simply return the unmodified IServiceCollection. There is no Akka.Cluster,
Akka.Cluster.Tools, Akka.Remote, or split-brain-resolver dependency in the
.csproj at all (it references only Microsoft.Extensions.DependencyInjection.Abstractions,
Microsoft.Extensions.Options, and ScadaLink.Commons). Because every other
ScadaLink component runs inside the actor system this module is responsible for
creating, the absence of any implementation blocks the foundational layer of the
system.
Recommendation
Track the gap explicitly (a milestone/issue) and implement the documented behaviour:
add the Akka cluster/remote/cluster-tools and split-brain-resolver package
references, build the cluster bootstrap (HOCON generation from ClusterOptions),
the split-brain resolver configuration, cluster-singleton hosting support, and
CoordinatedShutdown wiring. Until then, the module's Status and the design doc
should clearly state it is unimplemented so callers do not assume otherwise.
Resolution
Re-triaged 2026-05-16 — remains Open, needs a design decision from the user.
Verified against the source at the reviewed commit: the finding's factual claims hold.
src/ScadaLink.ClusterInfrastructure still contains only ClusterOptions.cs and a
no-op ServiceCollectionExtensions.cs, and the .csproj references no Akka packages.
However, the documented cluster behaviour is not actually absent from the system — it has been implemented in the Host project rather than in this module:
src/ScadaLink.Host/Actors/AkkaHostedService.csbootstraps theActorSystem, generates the HOCON fromClusterOptions(it importsScadaLink.ClusterInfrastructureand injectsIOptions<ClusterOptions>), and configures thekeep-oldestsplit-brain resolver withdown-if-alone = on(seeAkkaHostedService.cs:95-96).src/ScadaLink.Host/Health/AkkaClusterHealthCheck.cs,AkkaClusterNodeProvider.cs, andHealth/ActiveNodeHealthCheck.cscover cluster membership / active-node detection.- Akka cluster/remote package references live in
ScadaLink.Host.csprojand the per-component projects (SiteRuntime,Communication, etc.).
So the real situation is an ownership / design-doc drift, not missing behaviour:
Component-ClusterInfrastructure.md assigns the Akka bootstrap, HOCON generation,
split-brain config and CoordinatedShutdown wiring to this module, but the
implementation deliberately lives in the Host. ClusterOptions is the one piece this
module legitimately owns and it is consumed correctly by the Host.
Resolving CI-001 as literally written is not a small, well-scoped fix — it is one of two substantial decisions, both requiring the user:
- Move the bootstrap into this module — relocate the HOCON generation, split-brain
config, cluster-singleton helpers and
CoordinatedShutdownwiring out ofScadaLink.HostintoScadaLink.ClusterInfrastructure, add the Akka package references, and re-wire the Host to call into it. This is a cross-module refactor touchingsrc/ScadaLink.Host/*and several other projects — outside the edit scope permitted for this finding (onlysrc/ScadaLink.ClusterInfrastructure/,tests/ScadaLink.ClusterInfrastructure.Tests/, and this file may be edited). - Accept the current placement — keep the bootstrap in the Host and update
Component-ClusterInfrastructure.md(and the README component table) to record that the Host owns the actor-system/cluster bootstrap and that this module's role is the sharedClusterOptionscontract. That fix is a design-doc edit, also outside this module's permitted edit scope.
Either path is a deliberate architecture decision, not a bug fix. The decision was surfaced to the user, who chose option 2 — accept the current placement: the Akka bootstrap stays in the Host (the single deployable binary that performs all actor-system bring-up), and the design docs are corrected to record the true ownership.
Resolved — fixing commit <pending>, date 2026-05-16. The finding was a design-doc
drift, not missing behaviour. docs/requirements/Component-ClusterInfrastructure.md now
carries an "Implementation Note — Code Placement" section stating that the
ScadaLink.ClusterInfrastructure project owns the ClusterOptions configuration model
while ScadaLink.Host owns the Akka bootstrap, HOCON generation, split-brain-resolver
wiring, CoordinatedShutdown integration, and active-node health checks. The README
component table (row 13) was updated to match. No code change was required — the
documented cluster behaviour already exists and is exercised; only the doc's
module-ownership claim was wrong. Module test suite green (3 passed).
ClusterInfrastructure-002 — No-op DI extension methods report success while doing nothing
| Severity | Medium |
| Category | Correctness & logic bugs |
| Status | Resolved |
| Location | src/ScadaLink.ClusterInfrastructure/ServiceCollectionExtensions.cs:7-17 |
Description
AddClusterInfrastructure and AddClusterInfrastructureActors both accept an
IServiceCollection and return it unchanged. A caller (e.g. the Host) that invokes
services.AddClusterInfrastructure() receives a fluent, success-looking result but
no actor system, no cluster, and no singleton support is actually registered. This
is a silent failure: the system will appear to start, then fail later and far from
the cause (e.g. when a component resolves an ActorSystem that was never added, or
when the cluster singleton never forms). A no-op that masquerades as a completed
registration is worse than an unimplemented method that throws.
Recommendation
Until the real implementation exists, make the placeholder loud rather than silent —
either throw NotImplementedException from the methods, or have them log a
prominent warning, so an integrating caller fails fast with a clear cause. Replace
with the genuine registration when CI-001 is addressed.
Resolution
Confirmed against the source: both methods returned the IServiceCollection
unchanged. Verified the consumers — ScadaLink.Host calls AddClusterInfrastructure()
(Program.cs:68, SiteServiceRegistration.cs:24); AddClusterInfrastructureActors
is dead — it is called nowhere in the solution.
Resolved — fixing commit commit pending, date 2026-05-16.
AddClusterInfrastructure now does real work: it registers the
ClusterOptionsValidator (CI-004) via TryAddEnumerable, so the method is no longer a
no-op and a misconfigured ScadaLink:Cluster section fails fast on the first
IOptions<ClusterOptions> resolution. AddClusterInfrastructureActors — which this
component never had any actors to register, as CI-001 established the Akka bootstrap
lives in ScadaLink.Host — now throws NotImplementedException with a message
pointing the caller to the Host, rather than masquerading as a completed registration.
Covered by ServiceCollectionExtensionsTests
(AddClusterInfrastructure_RegistersOptionsValidator,
AddClusterInfrastructure_ValidatorRejectsBadOptionsAtResolution,
AddClusterInfrastructureActors_ThrowsRatherThanSilentlySucceeding).
ClusterInfrastructure-003 — ClusterOptions omits several documented node-configuration settings
| Severity | Medium |
| Category | Design-document adherence |
| Status | Resolved |
| Location | src/ScadaLink.ClusterInfrastructure/ClusterOptions.cs:3-11 |
Description
The "Node Configuration", "Split-Brain Resolution", and "Failure Detection Timing"
sections of the design doc enumerate the settings each node needs. ClusterOptions
exposes SeedNodes, SplitBrainResolverStrategy, StableAfter,
HeartbeatInterval, FailureDetectionThreshold, and MinNrOfMembers, but is
missing: the Akka remoting hostname/port (default 8081 central, 8082 site), the
cluster role (Central vs. Site) and the site identifier, the down-if-alone flag
(the design explicitly requires down-if-alone = on for the keep-oldest resolver),
and — for site nodes — the gRPC port (default 8083) and local SQLite storage paths.
Without these, the options class cannot drive a correct HOCON configuration when
CI-001 is implemented. (Some settings such as remoting host/port may instead belong
in Host/NodeOptions.cs; the split of ownership should be decided deliberately, but
at minimum down-if-alone belongs with the split-brain settings here.)
Recommendation
Add the missing settings — at minimum a DownIfAlone boolean (default true) and
the cluster role / site identifier — or document explicitly which settings are
owned by Host/NodeOptions.cs instead, so the design doc and the options classes
agree on where each value lives.
Resolution
Partially re-triaged. Verified against the source: most of the "missing" settings are
deliberately owned by ScadaLink.Host.NodeOptions — NodeOptions already carries
Role, NodeHostname, SiteId, RemotingPort and GrpcPort, and AkkaHostedService
builds the HOCON from NodeOptions for exactly those values. Local SQLite storage paths
live in the database / store-and-forward options. This is the ownership split CI-001
established (the Host owns node identity and bootstrap; this project owns the
cluster-formation contract), so those settings do not belong in ClusterOptions.
The one genuine gap the finding identifies is down-if-alone, which the design doc
puts with the split-brain settings.
Resolved — fixing commit commit pending, date 2026-05-16. Added the
DownIfAlone boolean (default true) to ClusterOptions so the split-brain
configuration contract is complete, and added a class-level XML doc that records the
deliberate ownership split — node identity/remoting/gRPC in Host.NodeOptions, storage
paths in the database options, cluster-formation settings here — so the design doc and
the options classes now agree on where each value lives. (AkkaHostedService currently
hard-codes down-if-alone = on in HOCON; wiring it to read DownIfAlone is a one-line
ScadaLink.Host change, outside this module's permitted edit scope, and is noted for
the Host's review.) Covered by ClusterOptionsTests.DefaultValues_AreCorrect and
ClusterOptionsTests.DownIfAlone_CanBeSet.
ClusterInfrastructure-004 — ClusterOptions has no validation despite safety-critical values
| Severity | Medium |
| Category | Code organization & conventions |
| Status | Resolved |
| Location | src/ScadaLink.ClusterInfrastructure/ClusterOptions.cs:3-11 |
Description
ClusterOptions carries values whose misconfiguration has cluster-wide
consequences. The design doc is emphatic that min-nr-of-members must be 1 (a
value of 2 blocks the singleton and therefore all data collection indefinitely
after failover), that SplitBrainResolverStrategy must be keep-oldest for a
two-node cluster (quorum strategies cause total shutdown), and that the timing
values are interdependent (HeartbeatInterval must be well below
FailureDetectionThreshold). The class has no data annotations, no
IValidateOptions<ClusterOptions>, and no guard logic, so an appsettings.json
setting MinNrOfMembers: 2 or SplitBrainResolverStrategy: "keep-majority" (the
exact value the test at ClusterOptionsTests.cs:35 shows is settable) would be
accepted silently and produce the catastrophic outcomes the design doc warns
against.
Recommendation
Add validation — data annotations ([Range] for MinNrOfMembers, etc.) plus an
IValidateOptions<ClusterOptions> implementation that enforces
MinNrOfMembers == 1, restricts SplitBrainResolverStrategy to a known set,
requires SeedNodes non-empty, and asserts HeartbeatInterval < FailureDetectionThreshold and positive StableAfter. Register it with
ValidateOnStart() so misconfiguration fails fast at boot.
Resolution
Confirmed: ClusterOptions had no validation of any kind, and the design doc's
catastrophic-misconfiguration values (MinNrOfMembers: 2, a quorum split-brain
strategy) would have been bound silently.
Resolved — fixing commit commit pending, date 2026-05-16. Added
ClusterOptionsValidator : IValidateOptions<ClusterOptions>, which enforces
MinNrOfMembers == 1, restricts SplitBrainResolverStrategy to the
keep-oldest-only allowed set, requires a non-empty SeedNodes, requires positive
StableAfter / HeartbeatInterval / FailureDetectionThreshold, and asserts
HeartbeatInterval < FailureDetectionThreshold. It accumulates every failure into one
result. It is registered by AddClusterInfrastructure() (CI-002) as a singleton
IValidateOptions<ClusterOptions>, so a misconfigured section throws
OptionsValidationException on the first IOptions<ClusterOptions>.Value resolution
— which AkkaHostedService performs during startup, giving the fail-fast-at-boot
behaviour the recommendation asks for without the src project taking a dependency on
the full Microsoft.Extensions.DependencyInjection package needed for the
ValidateOnStart() overload. Data annotations were not used — a single
IValidateOptions implementation expresses the interdependent timing rules that
attributes cannot. Covered by ClusterOptionsValidatorTests (8 cases) and
ServiceCollectionExtensionsTests.AddClusterInfrastructure_ValidatorRejectsBadOptionsAtResolution.
ClusterInfrastructure-005 — No configuration section name constant for the Options pattern binding
| Severity | Low |
| Category | Code organization & conventions |
| Status | Resolved |
| Location | src/ScadaLink.ClusterInfrastructure/ClusterOptions.cs:3 |
Description
CLAUDE.md specifies per-component configuration via appsettings.json sections
bound with the Options pattern. ClusterOptions provides no public const string SectionName (or equivalent) for the binding site to reference, so whichever code
binds the section must hard-code the magic string, and there is no single source of
truth for the section name. Because AddClusterInfrastructure is itself a no-op
(CI-002), the options class is currently bound nowhere at all, making the missing
constant easy to overlook.
Recommendation
Add a public const string SectionName = "Cluster"; (or the agreed name) to
ClusterOptions and have the eventual AddClusterInfrastructure bind
configuration.GetSection(ClusterOptions.SectionName) against it.
Resolution
Confirmed against the source: ClusterOptions previously exposed no section-name
constant, leaving binding sites to hard-code the magic string.
Resolved — fixing commit commit pending, date 2026-05-16. ClusterOptions now
exposes public const string SectionName = "ScadaLink:Cluster"; as the single source
of truth for the appsettings.json section name, with an XML doc explaining its
purpose. The chosen value matches the ScadaLink:-prefixed section convention used by
peer option classes and referenced by ClusterOptionsValidator / the design doc.
Covered by ClusterOptionsTests.SectionName_IsTheExpectedAppSettingsSection, which
both pins the value and — by referencing the constant — guards against its removal
(its deletion would break compilation).
ClusterInfrastructure-006 — No tests for any cluster behaviour; only the options POCO is covered
| Severity | Medium |
| Category | Testing coverage |
| Status | Resolved |
| Location | tests/ScadaLink.ClusterInfrastructure.Tests/ClusterOptionsTests.cs:1-51 |
Description
The test project contains only ClusterOptionsTests, exercising default values and
property setters of ClusterOptions. There are no tests for cluster formation,
leader election, failover detection, split-brain resolution, singleton handover, or
the ServiceCollectionExtensions registration methods — none can exist because the
behaviour itself is absent (CI-001). This is recorded so the testing gap is tracked
alongside the implementation gap: the most safety-critical paths of the entire
system (failover, split-brain, dual-node recovery) are completely untested. The
test at line 30-50 also asserts that SplitBrainResolverStrategy can be set to
"keep-majority", implicitly endorsing a value the design doc forbids for a
two-node cluster — see CI-004.
Recommendation
When CI-001 is implemented, add multi-node Akka.Cluster.TestKit /
MultiNodeTestKit tests covering cluster formation, failover promotion,
split-brain downing, and singleton handover, plus unit tests for HOCON generation
from ClusterOptions and for the options validation from CI-004.
Resolution
Re-triaged in light of CI-001's resolution. The Akka bootstrap, HOCON generation,
cluster formation, failover and singleton handover are owned by ScadaLink.Host, not
this project — multi-node Akka.Cluster.TestKit tests for that behaviour belong in the
Host's test suite, outside this module's scope. What this module legitimately owns is
ClusterOptions, its validator, and the DI registration, and the testing gap there is
now closed.
Resolved — fixing commit commit pending, date 2026-05-16. Added two test classes
to tests/ScadaLink.ClusterInfrastructure.Tests: ClusterOptionsValidatorTests
(8 cases — valid defaults pass; MinNrOfMembers != 1, unsupported split-brain
strategies, empty seed nodes, heartbeat not below the failure threshold, non-positive
StableAfter all fail; and a multi-failure accumulation case) and
ServiceCollectionExtensionsTests (3 cases — AddClusterInfrastructure registers the
validator, the validator rejects bad options at IOptions resolution, and
AddClusterInfrastructureActors throws). The pre-existing ClusterOptionsTests was
extended with DownIfAlone coverage. The test project gained references to
Microsoft.Extensions.DependencyInjection and Microsoft.Extensions.Options. Module
test suite green: 16 passed (was 3). Note: the keep-majority value used in the
pre-existing ClusterOptionsTests.Properties_CanBeSetToCustomValues is intentionally
left — that test exercises the POCO's property setter (the POCO accepts any string by
design); ClusterOptionsValidator is the layer that now rejects keep-majority, and
UnsupportedSplitBrainStrategy_FailsValidation proves it.
ClusterInfrastructure-007 — ClusterOptions lacks XML documentation comments
| Severity | Low |
| Category | Documentation & comments |
| Status | Resolved |
| Location | src/ScadaLink.ClusterInfrastructure/ClusterOptions.cs:3-11 |
Description
ClusterOptions and each of its six properties have no XML doc comments. Peer
options classes such as StoreAndForward/StoreAndForwardOptions.cs document the
class and every property (including units and design-doc references). For a class
whose values carry the cluster-wide consequences described in the design doc
(notably MinNrOfMembers and SplitBrainResolverStrategy), the absence of inline
documentation is a maintainability and safety gap — a future editor has no in-code
warning that MinNrOfMembers must stay 1.
Recommendation
Add <summary> comments to the class and each property, stating units and the
documented constraints (e.g. that MinNrOfMembers must be 1, that
HeartbeatInterval must be well below FailureDetectionThreshold), referencing
the relevant design-doc sections as peer modules do.
Resolution
Confirmed against the source: ClusterOptions previously had no XML doc comments on
the class or any property.
Resolved — fixing commit commit pending, date 2026-05-16. ClusterOptions now
carries a class-level <summary> documenting the cluster configuration contract and
the deliberate ownership split (node identity / remoting / gRPC in
Host.NodeOptions, storage paths in the database options, cluster-formation settings
here), plus <summary> comments on all seven properties — each stating units,
defaults, and the design-doc constraints. Notably MinNrOfMembers is documented as
required to be 1 (a value of 2 blocks the cluster singleton after failover) and
HeartbeatInterval as required to be well below FailureDetectionThreshold, giving a
future editor the in-code warnings the recommendation asks for. This is a
documentation-only change, so no runtime regression test is meaningful; verified by
inspection of ClusterOptions.cs:3-74. Module test suite green (17 passed).
ClusterInfrastructure-008 — "Phase 0 skeleton" status is undocumented at the module level
| Severity | Low |
| Category | Documentation & comments |
| Status | Resolved |
| Location | src/ScadaLink.ClusterInfrastructure/ServiceCollectionExtensions.cs:9, src/ScadaLink.ClusterInfrastructure/ServiceCollectionExtensions.cs:16 |
Description
The only indication that this foundational module is unimplemented is two inline
comments inside private method bodies (// Phase 0: skeleton only /
// Phase 0: placeholder for Akka actor registration). There is no module README,
no <!-- TODO --> in the design doc, and no tracking marker visible to anyone
reading the project structure or the component table. Given that the design doc
(Component-ClusterInfrastructure.md) describes a fully featured component with no
caveat, a reader will reasonably assume the module is built. The mismatch between a
complete-looking design doc and an empty implementation is itself a documentation
defect.
Recommendation
Add a short note to the design doc (or a module-level README.md) stating the
current implementation status and what "Phase 0" delivers, and reference a tracked
issue for the remaining work (CI-001). Keep the README component table accurate
about which components are skeletons versus implemented.
Resolution
Re-triaged in light of CI-001's resolution. The original defect was the mismatch
between two misleading inline // Phase 0: skeleton only /
// Phase 0: placeholder comments buried in private method bodies and a
complete-looking design doc with no caveat. That premise has been overtaken by the
CI-001/CI-002 work:
- The "Phase 0 skeleton" comments no longer exist anywhere in
src/ScadaLink.ClusterInfrastructure(verified bygrep).ServiceCollectionExtensionsnow does real work (registersClusterOptionsValidator) andAddClusterInfrastructureActorsthrows explicitly — both with accurate XML docs explaining the ownership split. - The module is no longer an unimplemented skeleton. CI-001 established that the Akka
bootstrap legitimately lives in
ScadaLink.Host, and this project's true scope — theClusterOptionsconfiguration contract, its validator, and DI registration — is fully implemented and tested. - The design doc
Component-ClusterInfrastructure.mdnow opens with an "Implementation Note — Code Placement" section (added by CI-001) that explicitly states the component is a design responsibility realised acrossScadaLink.ClusterInfrastructure(configuration model) andScadaLink.Host(bootstrap/runtime wiring), and the README component table (row 13) was updated to match. A reader of the design doc no longer assumes a single fully-built project.
Resolved — fixing commit commit pending, date 2026-05-16. No further change
required: the misleading skeleton comments are gone, the module's real status is
documented at the design-doc level via the "Implementation Note — Code Placement"
section, and the component table reflects the true placement. This is a
documentation-only finding, so no runtime regression test is meaningful; verified by
inspection of ServiceCollectionExtensions.cs and
docs/requirements/Component-ClusterInfrastructure.md:21-39.
ClusterInfrastructure-009 — DownIfAlone is an inert configuration knob — never consumed by the HOCON builder
| Severity | Medium |
| Category | Design-document adherence |
| Status | Resolved |
| Location | src/ScadaLink.ClusterInfrastructure/ClusterOptions.cs:74 |
Description
The DownIfAlone property was added to ClusterOptions by CI-003's resolution as
part of "the split-brain configuration contract". It is public, defaults to true,
carries an XML doc presenting it as "the design-doc requirement", and is exercised by
ClusterOptionsTests.DownIfAlone_CanBeSet. However, nothing in the system reads it.
The Akka.NET HOCON is generated by ScadaLink.Host.Actors.AkkaHostedService.BuildHocon,
which hard-codes the resolver setting:
split-brain-resolver {
active-strategy = ...
stable-after = ...
keep-oldest {
down-if-alone = on
}
}
BuildHocon receives the full ClusterOptions instance and consumes every other
field (SeedNodes, MinNrOfMembers, SplitBrainResolverStrategy, StableAfter,
HeartbeatInterval, FailureDetectionThreshold) but ignores DownIfAlone entirely.
The result is a configuration property that an operator can set in appsettings.json,
that passes validation, and that has zero runtime effect — setting
DownIfAlone: false does not turn the flag off. CI-003's resolution explicitly
acknowledged this gap ("wiring it to read DownIfAlone is a one-line ScadaLink.Host
change ... noted for the Host's review") but the wiring was never done and no tracked
finding carried it, so the gap has silently persisted to commit 39d737e. An inert,
misleadingly-documented configuration knob is a correctness and design-adherence
defect: it gives operators a false sense of control over a safety-critical resolver
setting.
Recommendation
Either (a) wire DownIfAlone into BuildHocon — emit down-if-alone = {(clusterOptions.DownIfAlone ? "on" : "off")}
— so the property does what its XML doc claims (a Host-side change, to be tracked in
the Host module's review since BuildHocon lives there), or (b) if the flag is
intentionally fixed at on and must never be operator-configurable, remove the
DownIfAlone property from ClusterOptions and document the hard-coded on value as
a non-negotiable invariant. Do not leave a public, settable, validated property that
controls nothing.
Resolution
Root cause verified against the source at commit 39d737e:
src/ScadaLink.Host/Actors/AkkaHostedService.cs:147 hard-codes down-if-alone = on
inside the keep-oldest block, and BuildHocon consumes every other ClusterOptions
field but never reads clusterOptions.DownIfAlone. The finding's facts hold. The fix
is correctly scoped to the Host module — the configuration property and its
validation legitimately live in ScadaLink.ClusterInfrastructure (this module),
and the per-CLAUDE.md ownership split (CI-001) places HOCON generation in
ScadaLink.Host.
Resolved — by cross-reference, date 2026-05-17. No code change in this module:
ClusterOptions.DownIfAlone and its validation (CI-010) are correct and complete here.
The consumption fix — wiring DownIfAlone into BuildHocon so the property has runtime
effect — is tracked and fixed as Host-012 in code-reviews/Host/findings.md,
addressed this session. CI-009 closes against that cross-reference. Module test suite
green (18 passed).
ClusterInfrastructure-010 — Validator does not enforce DownIfAlone = true despite the design doc requiring it
| Severity | Low |
| Category | Error handling & resilience |
| Status | Resolved |
| Location | src/ScadaLink.ClusterInfrastructure/ClusterOptionsValidator.cs:21-71 |
Description
Component-ClusterInfrastructure.md (Split-Brain Resolution) states the keep-oldest
resolver must be configured with down-if-alone = on, and the XML doc on
ClusterOptions.DownIfAlone calls true "the design-doc requirement" — the rationale
being that without it the oldest node can run as an isolated single-node cluster
during a partition while the younger node forms its own. ClusterOptionsValidator
guards every other safety-critical setting (MinNrOfMembers == 1, keep-oldest-only
strategy, positive timings, heartbeat below the failure threshold) but performs no
check on DownIfAlone. A configuration of DownIfAlone: false therefore passes
validation cleanly. This is currently latent because CI-009 shows the property is not
consumed at all, but the moment CI-009 is fixed by wiring the property into the HOCON
(option (a)), DownIfAlone: false would silently produce the unsafe single-node
behaviour the design doc explicitly forbids — with no fail-fast guard. The validator
is the right place to enforce the invariant, consistent with how it already rejects
quorum split-brain strategies.
Recommendation
If CI-009 is resolved by keeping DownIfAlone configurable, add a check to
ClusterOptionsValidator.Validate that fails when DownIfAlone is false (or, if
some future deployment legitimately needs it off, fails only in combination with the
keep-oldest strategy), with a message explaining the isolated-single-node-cluster
hazard. If CI-009 is resolved by removing the property, this finding is moot and
should be closed as resolved alongside it.
Resolution
Confirmed against the source: ClusterOptionsValidator.Validate guarded every other
safety-critical setting but performed no check on DownIfAlone, so DownIfAlone: false
passed validation cleanly. With CI-009/Host-012 wiring the property into the HOCON this
session, an unguarded false would now silently produce the unsafe isolated-single-node
behaviour the design doc forbids.
Resolved — fixing commit commit pending, date 2026-05-17. Added a DownIfAlone
check to ClusterOptionsValidator.Validate that fails when the flag is false, with a
message explaining the isolated-single-node-cluster hazard, consistent with how the
validator already rejects quorum split-brain strategies. Developed test-first:
ClusterOptionsValidatorTests.DownIfAloneFalse_FailsValidation was written first,
confirmed failing, then passing after the fix. Module test suite green (18 passed).
ClusterInfrastructure-011 — SectionName constant is decorative — no binding site references it
| Severity | Low |
| Category | Code organization & conventions |
| Status | Open |
| Location | src/ScadaLink.ClusterInfrastructure/ClusterOptions.cs:24-27, src/ScadaLink.Host/SiteServiceRegistration.cs:100, src/ScadaLink.Host/StartupValidator.cs:43, src/ScadaLink.Host/StartupValidator.cs:45, src/ScadaLink.Host/StartupValidator.cs:75 |
Description
ClusterOptions.SectionName was added by CI-005 as public const string SectionName = "ScadaLink:Cluster";, with an XML doc declaring it "the single source of truth so
binding sites do not hard-code the magic string". CI-005's resolution likewise framed
the constant as the canonical reference value. In practice, no caller in the
solution reads it. grep -rn "ClusterOptions.SectionName" src/ returns zero hits.
Every site that needs the section name hard-codes the literal:
ScadaLink.Host.SiteServiceRegistration.BindSharedOptions:100—services.Configure<ClusterOptions>(config.GetSection("ScadaLink:Cluster"));ScadaLink.Host.StartupValidator:43,45,75— three"ScadaLink:Cluster"/"ScadaLink:Cluster:SeedNodes"literals.
The SectionName_IsTheExpectedAppSettingsSection test pins the constant's value but
does not protect against the underlying drift hazard: if someone changes
SectionName to "ScadaLink:Akka:Cluster", the test still passes (because it tests
the constant against the same literal), the validator still registers, and binding
silently goes to whichever string the Host hard-codes. The constant currently
provides none of the safety its XML doc claims. This is the same pattern of "inert
configuration knob" CI-009 flagged for DownIfAlone, just with the harm being
configuration drift rather than runtime behaviour.
Recommendation
Either (a) replace the hard-coded "ScadaLink:Cluster" literals in
SiteServiceRegistration.cs:100 and StartupValidator.cs:43,45,75 with
ClusterOptions.SectionName (a small Host-module change, to be tracked there), or
(b) if the constant is intentionally decorative, soften the XML doc so it does not
claim to be the source of truth. Do not leave a public constant whose stated
guarantee the code does not deliver.
Resolution
Open — needs a one-line Host-side change to reference the constant, plus a test that proves the section name flows from this module to the Host.
ClusterInfrastructure-012 — Validator accepts SeedNodes.Count == 1 despite design requiring both nodes as seeds
| Severity | Low |
| Category | Design-document adherence |
| Status | Resolved |
| Location | src/ScadaLink.ClusterInfrastructure/ClusterOptionsValidator.cs:30-43 |
Description
Component-ClusterInfrastructure.md (Node Configuration) states:
Cluster seed nodes: Both nodes are seed nodes — each node lists both itself and its partner. Either node can start first and form the cluster; the other joins when it starts. No startup ordering dependency.
A correctly-configured ScadaLink deployment therefore lists two seed nodes.
ClusterOptionsValidator.Validate only checks that SeedNodes is non-null and
non-empty (Count == 0). A configuration with a single seed node passes validation
silently — but that defeats the "no startup ordering dependency" guarantee the
design doc explicitly calls out.
ScadaLink.Host.StartupValidator:43-46 does enforce the rule:
var seedNodes = configuration.GetSection("ScadaLink:Cluster:SeedNodes").Get<List<string>>();
if (seedNodes is null || seedNodes.Count < 2)
errors.Add("ScadaLink:Cluster:SeedNodes must have at least 2 entries");
So the rule is enforced — but by the other project, after the
ClusterOptionsValidator (the contract owner) already accepted the value. This is
both inconsistent (two validators with different rules for the same field) and the
weaker check is the contract-owner's. The pre-existing test
ServiceCollectionExtensionsTests.AddClusterInfrastructure_ValidatorRejectsBadOptionsAtResolution
even constructs a SeedNodes list with one entry and expects validation to succeed
on that count — locking in the gap.
Recommendation
Tighten the validator: require SeedNodes.Count >= 2 with a message that references
the "both nodes are seed nodes" design rule. Update
AddClusterInfrastructure_ValidatorRejectsBadOptionsAtResolution to use a two-entry
list, and add a test case for SeedNodes.Count == 1 failing validation. Once this
module's validator enforces the rule, Host.StartupValidator's duplicate check
becomes redundant and can be removed in the Host's review.
Resolution (2026-05-28): Tightened the seed-node check to require
SeedNodes.Count >= 2 with a message that cites Component-ClusterInfrastructure.md
→ Node Configuration ("both nodes are seed nodes"). Added
ClusterOptionsValidatorTests.SingleSeedNode_FailsValidation; the existing
ServiceCollectionExtensionsTests.AddClusterInfrastructure_ValidatorRejectsBadOptionsAtResolution
still passes because errors accumulate (MinNrOfMembers=2 still fails as before).
Removing the duplicate check in Host.StartupValidator is left for the Host's
review per the original recommendation. Tests green (19/19 in
ClusterInfrastructure.Tests).
ClusterInfrastructure-013 — Test uses catastrophic config values without an inline-intent comment
| Severity | Low |
| Category | Documentation & comments |
| Status | Open |
| Location | tests/ScadaLink.ClusterInfrastructure.Tests/ClusterOptionsTests.cs:47-67 |
Description
ClusterOptionsTests.Properties_CanBeSetToCustomValues deliberately sets two values
the design doc explicitly warns are catastrophic:
SplitBrainResolverStrategy = "keep-majority", // design doc: total cluster shutdown on partition
...
MinNrOfMembers = 2 // design doc: blocks singleton, halts data collection
The CI-006 resolution acknowledged this is intentional — the test exercises the POCO
property setter (which by design accepts any string/int because the validator does
the rejecting), and ClusterOptionsValidatorTests.UnsupportedSplitBrainStrategy_FailsValidation
MinNrOfMembers_NotOne_FailsValidationprove the validator rejects them. But this reasoning is recorded only in the CI-006 resolution text in this findings file, not in the test itself. A reader landing on the test cold has no signal that these values are forbidden in production; they could reasonably infer the test endorses them.
Recommendation
Add a brief XML-doc / inline comment to Properties_CanBeSetToCustomValues stating
that it exercises only the POCO's setter — these values intentionally do not
represent a valid runtime configuration, and ClusterOptionsValidator rejects them
(with a cross-reference to the relevant validator tests). Two lines is enough; the
goal is to make the test's intent self-documenting.
Resolution
Open.
ClusterInfrastructure-014 — AddClusterInfrastructureActors is dead surface — no caller, no behaviour
| Severity | Low |
| Category | Code organization & conventions |
| Status | Open |
| Location | src/ScadaLink.ClusterInfrastructure/ServiceCollectionExtensions.cs:42-48 |
Description
AddClusterInfrastructureActors has now reached a curious state: it is a public
extension method with an XML doc that ends "Do not call AddClusterInfrastructureActors()"
and a body that unconditionally throws NotImplementedException. CI-002's resolution
chose "throw loudly" over "delete" specifically because CI-001 was still resolving the
ownership-split question. That question is settled — the design doc, the README
component table, and Component-ClusterInfrastructure.md's "Implementation Note — Code
Placement" all permanently locate the Akka actor bootstrap in ScadaLink.Host.
A grep -rn "AddClusterInfrastructureActors" src/ tests/ confirms there is no caller
anywhere in the solution. The method's only consumer is its own test
(AddClusterInfrastructureActors_ThrowsRatherThanSilentlySucceeding), which asserts
that the method throws when called. Keeping it costs API surface (IDE auto-complete
suggests it, the docs render it, and a future contributor might re-introduce a call
expecting it to register something), and gives nothing in return.
Recommendation
Delete AddClusterInfrastructureActors, delete its test, and add a one-line note to
docs/requirements/Component-ClusterInfrastructure.md's code-placement section
explicitly stating that this project exposes no actor-registration extension
(actor wiring lives in ScadaLink.Host). If the user prefers to keep the
"fail-fast" trap, mark the method [Obsolete(true, error: true)] so the compiler —
not the runtime — rejects the call.
Resolution
Open.