From cac8aebe9f6bf97110e8dddd5c8e2abd0b05bf27 Mon Sep 17 00:00:00 2001
From: Joseph Doherty
Date: Sat, 16 May 2026 20:12:24 -0400
Subject: [PATCH] =?UTF-8?q?docs(cluster-infrastructure):=20resolve=20Clust?=
 =?UTF-8?q?erInfrastructure-001=20=E2=80=94=20document=20that=20the=20Host?=
 =?UTF-8?q?=20owns=20the=20Akka=20bootstrap?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

---
 README.md                                     |  2 +-
 .../ClusterInfrastructure/findings.md         | 21 ++++++++++++++-----
 .../Component-ClusterInfrastructure.md        | 20 ++++++++++++++++++
 3 files changed, 37 insertions(+), 6 deletions(-)

diff --git a/README.md b/README.md
index 36f9fd0..b4932d3 100644
--- a/README.md
+++ b/README.md
@@ -46,7 +46,7 @@ This document serves as the master index for the SCADA system design. The system
 | 10 | Security & Auth | [docs/requirements/Component-Security.md](docs/requirements/Component-Security.md) | Direct LDAP bind (LDAPS/StartTLS), JWT sessions (HMAC-SHA256, 15-min refresh, 30-min idle), role-based authorization, site-scoped permissions. |
 | 11 | Health Monitoring | [docs/requirements/Component-HealthMonitoring.md](docs/requirements/Component-HealthMonitoring.md) | 30s report interval, 60s offline threshold, monotonic sequence numbers, raw error counts, tag resolution counts, dead letter monitoring. |
 | 12 | Site Event Logging | [docs/requirements/Component-SiteEventLogging.md](docs/requirements/Component-SiteEventLogging.md) | SQLite storage, 30-day retention + 1GB cap, daily purge, paginated remote queries with keyword search. |
-| 13 | Cluster Infrastructure | [docs/requirements/Component-ClusterInfrastructure.md](docs/requirements/Component-ClusterInfrastructure.md) | Akka.NET cluster, keep-oldest SBR with down-if-alone, min-nr-of-members=1, 2s/10s/15s failure detection, CoordinatedShutdown, automatic dual-node recovery. |
+| 13 | Cluster Infrastructure | [docs/requirements/Component-ClusterInfrastructure.md](docs/requirements/Component-ClusterInfrastructure.md) | Akka.NET cluster, keep-oldest SBR with down-if-alone, min-nr-of-members=1, 2s/10s/15s failure detection, CoordinatedShutdown, automatic dual-node recovery. The `ClusterInfrastructure` project owns the `ClusterOptions` config model; the Akka bootstrap/SBR/CoordinatedShutdown wiring lives in the Host. |
 | 14 | Inbound API | [docs/requirements/Component-InboundAPI.md](docs/requirements/Component-InboundAPI.md) | POST /api/{methodName}, X-API-Key header, flat JSON, extended type system (Object/List), script-based implementations, failures-only logging. |
 | 15 | Host | [docs/requirements/Component-Host.md](docs/requirements/Component-Host.md) | Single deployable binary, role-based component registration, per-component config binding (Options pattern), readiness gating, dead letter monitoring, Akka.NET bootstrap, ASP.NET Core hosting for central. |
 | 16 | Commons | [docs/requirements/Component-Commons.md](docs/requirements/Component-Commons.md) | Namespace/folder convention (Types/Interfaces/Entities/Messages), shared data types, POCOs, repository interfaces, message contracts with additive-only versioning, UTC timestamp convention. |
diff --git a/code-reviews/ClusterInfrastructure/findings.md b/code-reviews/ClusterInfrastructure/findings.md
index e067a3d..57d8b9f 100644
--- a/code-reviews/ClusterInfrastructure/findings.md
+++ b/code-reviews/ClusterInfrastructure/findings.md
@@ -8,7 +8,7 @@
 | Last reviewed | 2026-05-16 |
 | Reviewer | claude-agent |
 | Commit reviewed | `9c60592` |
-| Open findings | 8 |
+| Open findings | 7 |
 
 ## Summary
 
@@ -52,7 +52,7 @@ adequately for what exists.
 |--|--|
 | Severity | High |
 | Category | Design-document adherence |
-| Status | Open |
+| Status | Resolved |
 | Location | `src/ScadaLink.ClusterInfrastructure/ServiceCollectionExtensions.cs:9`, `src/ScadaLink.ClusterInfrastructure/ServiceCollectionExtensions.cs:16` |
 
 **Description**
@@ -123,9 +123,20 @@ of two substantial decisions, both requiring the user:
 shared `ClusterOptions` contract. That fix is a design-doc edit, also outside this
 module's permitted edit scope.
 
-Either path is a deliberate architecture decision, not a bug fix, so per
-REVIEW-PROCESS.md §2 this finding is left **Open** and surfaced for the user to decide.
-No code change was made. Module test suite verified green (3 passed) at re-triage time.
+Either path is a deliberate architecture decision, not a bug fix. The decision was
+surfaced to the user, who chose **option 2 — accept the current placement**: the Akka
+bootstrap stays in the Host (the single deployable binary that performs all actor-system
+bring-up), and the design docs are corrected to record the true ownership.
+
+**Resolved** — fixing commit ``, date 2026-05-16. The finding was a design-doc
+drift, not missing behaviour. `docs/requirements/Component-ClusterInfrastructure.md` now
+carries an "Implementation Note — Code Placement" section stating that the
+`ScadaLink.ClusterInfrastructure` project owns the `ClusterOptions` configuration model
+while `ScadaLink.Host` owns the Akka bootstrap, HOCON generation, split-brain-resolver
+wiring, `CoordinatedShutdown` integration, and active-node health checks. The README
+component table (row 13) was updated to match. No code change was required — the
+documented cluster behaviour already exists and is exercised; only the doc's
+module-ownership claim was wrong. Module test suite green (3 passed).
 
 ### ClusterInfrastructure-002 — No-op DI extension methods report success while doing nothing
 
diff --git a/docs/requirements/Component-ClusterInfrastructure.md b/docs/requirements/Component-ClusterInfrastructure.md
index dd48517..265795e 100644
--- a/docs/requirements/Component-ClusterInfrastructure.md
+++ b/docs/requirements/Component-ClusterInfrastructure.md
@@ -18,6 +18,26 @@ Both central and site clusters.
 - Support cluster singleton hosting (used by the Site Runtime Deployment Manager singleton on site clusters).
 - Manage Windows service lifecycle (start, stop, restart) on each node.
 
+## Implementation Note — Code Placement
+
+This component is a **design responsibility**, not a single buildable project that
+contains all of the code. The cluster-infrastructure responsibilities above are
+realised across two projects:
+
+- **`src/ScadaLink.ClusterInfrastructure`** owns the cluster **configuration model**:
+  the `ClusterOptions` POCO (seed nodes, roles, remoting/gRPC ports, failure-detection
+  timings, split-brain settings) bound from `appsettings.json` via the Options pattern.
+- **`src/ScadaLink.Host`** owns the cluster **bootstrap and runtime wiring**: it
+  builds the Akka.NET HOCON from `ClusterOptions`, starts the `ActorSystem`,
+  configures the keep-oldest split-brain resolver (`down-if-alone = on`), wires
+  `CoordinatedShutdown` into the service lifecycle, and provides active-node /
+  cluster-membership health checks. See `Component-Host.md` (REQ-HOST-*) for detail.
+
+This split is deliberate — the Host is the single deployable binary and the only
+project that performs Akka.NET bootstrap, so the cluster bring-up lives there
+alongside role-based component registration. The `ClusterInfrastructure` project
+remains the home of the configuration contract that the Host consumes.
+
 ## Cluster Topology
 
 ### Central Cluster