docs(config): components/configuration normalization (spec, shared-contract, current-state x3, GAPS, README)

Joseph Doherty
2026-06-01 09:48:49 -04:00
parent b754873a44
commit 46c4bfae31
7 changed files with 1033 additions and 0 deletions
@@ -0,0 +1,191 @@
# Configuration validation — current state: ScadaBridge
Repo: `~/Desktop/ScadaBridge`. Stack: .NET 10, Akka.NET, Docker; solution
`ZB.MOM.WW.ScadaBridge.slnx`. All paths relative to repo root. Verified 2026-06-01.
ScadaBridge is the **heaviest** consumer — it has the most validation surface and the only
pre-host preflight in the family:
1. **Four per-module `*OptionsValidator : IValidateOptions<T>`** (Cluster, Security,
HealthMonitoring, AuditLog), each open-coding the same `List<string>` accumulation, each wired
through its module's bespoke `AddXxx` DI extension.
2. **One raw-config, pre-Akka `StartupValidator`** that validates critical node/cluster keys
*before* the actor system is built — the canonical motivation for `ConfigPreflight`. Its thrown
message is **byte-compatible** with `ConfigPreflight.ThrowIfInvalid()`.
## 1. Per-module options validators
All four follow the same shape: `List<string> failures`, a run of `if (...) failures.Add(...)`,
and `failures.Count > 0 ? Fail(failures) : Success` (order varies). Three are registered via
`TryAddEnumerable(ServiceDescriptor.Singleton<IValidateOptions<T>, ...>())` and AuditLog via a plain
`AddSingleton`, so a misconfigured section throws `OptionsValidationException` at startup (with
`ValidateOnStart`) or on the first `IOptions<T>` resolve.
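The shared shape can be sketched as follows. This is a minimal illustration, not code from the repo: `DemoOptions`, `DemoOptionsValidator`, and both messages are hypothetical, and the block assumes the `Microsoft.Extensions.Options` package is referenced.

```csharp
using System;
using System.Collections.Generic;
using Microsoft.Extensions.Options;

// Hypothetical options type, for illustration only (not a ScadaBridge class).
public sealed class DemoOptions
{
    public string? Server { get; set; }
    public TimeSpan PollInterval { get; set; }
}

public sealed class DemoOptionsValidator : IValidateOptions<DemoOptions>
{
    public ValidateOptionsResult Validate(string? name, DemoOptions options)
    {
        // Accumulate every failure instead of stopping at the first one.
        var failures = new List<string>();

        if (string.IsNullOrWhiteSpace(options.Server))
            failures.Add("DemoOptions.Server is required");

        if (options.PollInterval <= TimeSpan.Zero)
            failures.Add("DemoOptions.PollInterval must be positive");

        // The Count-based tail that each validator repeats.
        return failures.Count > 0
            ? ValidateOptionsResult.Fail(failures)
            : ValidateOptionsResult.Success;
    }
}
```

Everything except the `if` rules is plumbing that repeats across the four validators, which is the part a shared base class can absorb.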
### `ClusterOptionsValidator`
`src/ZB.MOM.WW.ScadaBridge.ClusterInfrastructure/ClusterOptionsValidator.cs`:
- `:13`: `public sealed class ClusterOptionsValidator : IValidateOptions<ClusterOptions>`.
- `:28`: `var failures = new List<string>();`.
- `:30` `SeedNodes` ≥ 2 (→ `MinCount`); `:44` `SplitBrainResolverStrategy` ∈ {`keep-oldest`}
(→ `OneOf`, with the allowed set at `:16-19`); `:52` `MinNrOfMembers == 1` (→ `RequireThat`);
`:59`/`:64`/`:69` three positive-`TimeSpan` checks (→ `PositiveTimeSpan`); `:74` cross-field
`HeartbeatInterval < FailureDetectionThreshold` (→ `RequireThat`); `:82` `DownIfAlone` must be
true (→ `RequireThat`).
- `:91-93`: the `failures.Count > 0 ? Fail : Success` tail.
- Wired: `src/ZB.MOM.WW.ScadaBridge.ClusterInfrastructure/ServiceCollectionExtensions.cs:28-29`,
`TryAddEnumerable(ServiceDescriptor.Singleton<IValidateOptions<ClusterOptions>, ClusterOptionsValidator>())`.
### `SecurityOptionsValidator`
`src/ZB.MOM.WW.ScadaBridge.Security/SecurityOptionsValidator.cs`:
- `:32`: `public sealed class SecurityOptionsValidator : IValidateOptions<SecurityOptions>`.
- `:48`: `var failures = new List<string>();`; `:50` required `LdapServer`, `:58` required
`LdapSearchBase` (both → `Required`). `JwtSigningKey` is intentionally **not** validated here
(`:24-30`; it fails fast in `JwtTokenService`'s constructor instead).
- `:66-68`: the `failures.Count == 0 ? Success : Fail(failures)` tail.
- Wired: `src/ZB.MOM.WW.ScadaBridge.Security/ServiceCollectionExtensions.cs:28-30`,
`AddOptions<SecurityOptions>().ValidateOnStart()` + `TryAddEnumerable(...Singleton<IValidateOptions<SecurityOptions>, SecurityOptionsValidator>())`.
### `HealthMonitoringOptionsValidator`
`src/ZB.MOM.WW.ScadaBridge.HealthMonitoring/HealthMonitoringOptionsValidator.cs`:
- `:17`: `public sealed class HealthMonitoringOptionsValidator : IValidateOptions<HealthMonitoringOptions>`.
- `:26`: `var failures = new List<string>();`; `:28`/`:35`/`:42` three positive-`TimeSpan` checks
(→ `PositiveTimeSpan`); `:49` cross-field `CentralOfflineTimeout >= OfflineTimeout` (→ `RequireThat`).
- `:60-62`: the `Count > 0 ? Fail : Success` tail.
- Wired: `src/ZB.MOM.WW.ScadaBridge.HealthMonitoring/ServiceCollectionExtensions.cs:60-64`, where a private
idempotent `AddOptionsValidation` does
`TryAddEnumerable(...Singleton<IValidateOptions<HealthMonitoringOptions>, HealthMonitoringOptionsValidator>())`,
called from all three `Add*HealthMonitoring`/`AddCentralHealthAggregation` entry points (`:16`, `:29`, `:42`).
### `AuditLogOptionsValidator`
`src/ZB.MOM.WW.ScadaBridge.AuditLog/Configuration/AuditLogOptionsValidator.cs`:
- `:16`: `public sealed class AuditLogOptionsValidator : IValidateOptions<AuditLogOptions>`.
- `:35`: `var failures = new List<string>();`; `:37` `DefaultCapBytes > 0` (→ `RequireThat`);
`:44` cross-field `ErrorCapBytes >= DefaultCapBytes` (→ `RequireThat`); `:52` `RetentionDays` ∈
[30, 3650] (→ `RequireThat`, bounds at `:19-22`); `:59` `InboundMaxBytes` ∈ [8 KiB, 16 MiB]
(→ `RequireThat`, bounds at `:25-28`).
- `:66-68`: the `Count == 0 ? Success : Fail` tail.
- Wired: `src/ZB.MOM.WW.ScadaBridge.AuditLog/ServiceCollectionExtensions.cs:65-68`,
`AddOptions<AuditLogOptions>().Bind(config.GetSection(ConfigSectionName)).ValidateOnStart()` +
`AddSingleton<IValidateOptions<AuditLogOptions>, AuditLogOptionsValidator>()`. This is the exact
`AddValidatedOptions` triple, spelled out.
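The bind / validate-on-start / register-validator triple could be folded into one helper along these lines. This is a sketch under assumptions: `AddValidatedOptionsSketch` is a hypothetical name, the shared library's real `AddValidatedOptions` signature and body may differ, and the block assumes the usual `Microsoft.Extensions.*` options, configuration, and DI packages are referenced.

```csharp
using Microsoft.Extensions.Configuration;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.DependencyInjection.Extensions;
using Microsoft.Extensions.Options;

public static class ValidatedOptionsSketch
{
    // Hypothetical helper; not the shared library's actual implementation.
    public static IServiceCollection AddValidatedOptionsSketch<TOptions, TValidator>(
        this IServiceCollection services, IConfiguration configuration, string sectionName)
        where TOptions : class
        where TValidator : class, IValidateOptions<TOptions>
    {
        // Steps 1 and 2: bind the section and validate eagerly when the host starts.
        services.AddOptions<TOptions>()
            .Bind(configuration.GetSection(sectionName))
            .ValidateOnStart();

        // Step 3: register the validator. TryAddEnumerable skips the add when this
        // exact service/implementation pair is already present, so repeat calls
        // from several entry points do not duplicate the validator.
        services.TryAddEnumerable(
            ServiceDescriptor.Singleton<IValidateOptions<TOptions>, TValidator>());

        return services;
    }
}
```

Using `TryAddEnumerable` rather than `AddSingleton` in step 3 is a design choice in this sketch: it gives the idempotency that HealthMonitoring currently gets from its private `AddOptionsValidation`.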
> Other modules bind options with no validator (`Communication`, `DataConnectionLayer`,
> `Transport`, `Notification*`, `ExternalSystemGateway`, `ManagementService`, `SiteCallAudit`,
> `DeploymentManager` — all `AddOptions().BindConfiguration(...)` without `ValidateOnStart` or a
> validator). They are candidates for `AddValidatedOptions` only if/when they grow validators;
> not part of this pass's adoption.
## 2. `StartupValidator` — raw-config, pre-Akka preflight
`src/ZB.MOM.WW.ScadaBridge.Host/StartupValidator.cs`:
- `:7`: `public static class StartupValidator`; `:11`
`public static void Validate(IConfiguration configuration)`.
- `:13`: `var errors = new List<string>();`. Reads raw keys off `configuration` (no binding):
  - `:16` `ScadaBridge:Node:Role` ∈ {`Central`, `Site`} (→ `Require(key, predicate, reason)`);
  - `:20` `Node:NodeHostname` required (→ `RequireValue`);
  - `:23` `Node:RemotingPort` parseable port 1-65535 (→ `RequirePort`);
  - `:27` `Node:SiteId` required **when** role == Site (→ `When(role == "Site", ...)`);
  - `:30-41` `Database:ConfigurationDb` / `Security:LdapServer` / `Security:JwtSigningKey`
    required **when** role == Central (→ `When(role == "Central", ...)`);
  - `:43` `Cluster:SeedNodes` ≥ 2 entries (→ a `Require`/custom rule over the bound list);
  - `:47-79` Site-only rules: `GrpcPort` range (`:49`), `GrpcPort != RemotingPort` (`:58`),
    `Database:SiteDbPath` required (`:61`), and seed-node-must-not-target-gRPC-port (`:69-78`),
    all under a `When(role == "Site", ...)` block, with `SeedNodePort` (`:90`) as a domain helper
    that stays per-project.
- `:81-83`: **the throw:**
```csharp
throw new InvalidOperationException(
    $"Configuration validation failed:\n{string.Join("\n", errors.Select(e => $" - {e}"))}");
```
- Called once, before the actor system is built:
`src/ZB.MOM.WW.ScadaBridge.Host/Program.cs:39` — `StartupValidator.Validate(configuration);`.
### Message byte-compatibility with `ConfigPreflight` ✅
`StartupValidator`'s throw (`:81-83`) and `ConfigPreflight.ThrowIfInvalid()`
(`ZB.MOM.WW.Configuration/src/ZB.MOM.WW.Configuration/ConfigPreflight.cs:63-68`) build the **same
string**:
- both prefix `"Configuration validation failed:\n"`;
- both join the failures with `"\n"`;
- both format each failure as `" - " + message`.
The library deliberately copied this envelope so the migration is a **behaviour-preserving swap**:
same exception type (`InvalidOperationException`), same message bytes. The individual failure
messages are `"<field> <reason>"` (`StartupValidator` open-codes them; `ConfigPreflight` produces
them via the shared `Checks` primitives for the standardized rules — `RequireValue` → `"<key> is
required"`, `RequirePort` → `"<key> must be between 1 and 65535 (was '<raw>')"`). Rules that have
no shared primitive (role-set membership, gRPC-port-vs-remoting, seed-node-vs-gRPC-port) keep their
exact wording via `Require(key, predicate, reason)` and `When(...)`.
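The three envelope rules can be checked mechanically. The sketch below is an illustration, not code from either repo: `PreflightEnvelope` and its members are hypothetical, and the per-failure prefix is taken exactly as quoted in the throw above. It builds the message twice, once in the open-coded form and once from the three stated rules, and compares the results ordinally.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

// Hypothetical helper for illustration; neither method is library code.
public static class PreflightEnvelope
{
    // Open-coded form, following the StartupValidator throw quoted above.
    public static string OpenCoded(IReadOnlyList<string> errors) =>
        $"Configuration validation failed:\n{string.Join("\n", errors.Select(e => $" - {e}"))}";

    // The same envelope assembled from the three rules:
    // fixed prefix, "\n" between failures, " - " before each failure.
    public static string FromRules(IReadOnlyList<string> errors)
    {
        var sb = new StringBuilder("Configuration validation failed:\n");
        for (var i = 0; i < errors.Count; i++)
        {
            if (i > 0) sb.Append('\n');
            sb.Append(" - ").Append(errors[i]);
        }
        return sb.ToString();
    }

    // Ordinal equality compares code units, so equal strings encode to equal bytes.
    public static bool SameBytes(string a, string b) =>
        string.Equals(a, b, StringComparison.Ordinal);
}
```

A compatibility test would feed the same failure list through the old and new validators and assert `SameBytes` on the two exception messages.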
## 3. Summary
| Surface | What exists | Shared-lib mapping |
|---|---|---|
| Cluster validator | `ClusterOptionsValidator : IValidateOptions<ClusterOptions>` | → `OptionsValidatorBase<ClusterOptions>` |
| Security validator | `SecurityOptionsValidator : IValidateOptions<SecurityOptions>` | → `OptionsValidatorBase<SecurityOptions>` |
| Health validator | `HealthMonitoringOptionsValidator : IValidateOptions<HealthMonitoringOptions>` | → `OptionsValidatorBase<HealthMonitoringOptions>` |
| Audit validator | `AuditLogOptionsValidator : IValidateOptions<AuditLogOptions>` | → `OptionsValidatorBase<AuditLogOptions>` |
| Failure accumulation (×4) | private `List<string>` + `Count`-based tail in each | → owned by base + `ValidationBuilder` |
| DI wiring (×4) | per-module `TryAddEnumerable`/`AddSingleton` + `AddOptions().Bind().ValidateOnStart()` | → `AddValidatedOptions<T, TValidator>` |
| Pre-host preflight | `StartupValidator` (raw config, pre-Akka, `Program.cs:39`) | → `ConfigPreflight` (**byte-compatible** message) |
---
## Adoption plan → `ZB.MOM.WW.Configuration`
ScadaBridge is the heaviest adoption — five validation surfaces — but every change is
behaviour-preserving.
**Migrate the four module validators to the base:**
- For each of `ClusterOptionsValidator`, `SecurityOptionsValidator`,
`HealthMonitoringOptionsValidator`, `AuditLogOptionsValidator`: change the declaration from
`: IValidateOptions<T>` to `: OptionsValidatorBase<T>`, replace
`Validate(string?, T)` with `protected override void Validate(ValidationBuilder v, T o)`, delete
the `List<string> failures` and the `Count`-based tail, and re-express each rule on `v`:
  - `SeedNodes` ≥ 2 → `v.MinCount(o.SeedNodes, 2, "ClusterOptions.SeedNodes")`;
  - strategy set → `v.OneOf(o.SplitBrainResolverStrategy, ["keep-oldest"], "...")`;
  - positive durations → `v.PositiveTimeSpan(...)`;
  - required strings → `v.Required(...)`;
  - cross-field / bounds rules (`MinNrOfMembers == 1`, heartbeat < threshold, `ErrorCapBytes >=
    DefaultCapBytes`, retention bounds, etc.) → `v.RequireThat(condition, message)` with the
    **existing message strings preserved verbatim**.
- Update each module's `ServiceCollectionExtensions` to register via
`AddValidatedOptions<T, TValidator>(configuration, "<section>")` instead of the
`AddOptions().Bind/BindConfiguration(...).ValidateOnStart()` + `AddSingleton`/`TryAddEnumerable`
pair (`ClusterInfrastructure/ServiceCollectionExtensions.cs:28-29`;
`Security/ServiceCollectionExtensions.cs:28-30`;
`HealthMonitoring/ServiceCollectionExtensions.cs:60-64`;
`AuditLog/ServiceCollectionExtensions.cs:65-68`). Where a module uses `TryAddEnumerable` for
idempotency across multiple entry points (HealthMonitoring), keep an idempotency guard around the
single `AddValidatedOptions` call.
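To make the validator migration concrete, here is a sketch with stand-in types. `ValidationBuilderSketch`, `OptionsValidatorBaseSketch<T>`, `ClusterOptionsSketch`, and every message string are assumptions made for illustration. The shared library's real `ValidationBuilder` and `OptionsValidatorBase<T>` may differ in signatures and wording, and the real migration keeps the existing messages verbatim. The block assumes `Microsoft.Extensions.Options` is referenced.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using Microsoft.Extensions.Options;

// Stand-in for the shared builder: owns the failure list that each validator used to keep.
public sealed class ValidationBuilderSketch
{
    public List<string> Failures { get; } = new();

    public void RequireThat(bool condition, string message)
    {
        if (!condition) Failures.Add(message);
    }

    public void MinCount<TItem>(IReadOnlyCollection<TItem>? items, int min, string field) =>
        RequireThat((items?.Count ?? 0) >= min, $"{field} must contain at least {min} entries");

    public void OneOf(string? value, IReadOnlyCollection<string> allowed, string field) =>
        RequireThat(value is not null && allowed.Contains(value),
            $"{field} must be one of: {string.Join(", ", allowed)}");

    public void PositiveTimeSpan(TimeSpan value, string field) =>
        RequireThat(value > TimeSpan.Zero, $"{field} must be positive");
}

// Stand-in for the shared base: owns the Count-based tail.
public abstract class OptionsValidatorBaseSketch<T> : IValidateOptions<T> where T : class
{
    public ValidateOptionsResult Validate(string? name, T options)
    {
        var v = new ValidationBuilderSketch();
        Validate(v, options);
        return v.Failures.Count > 0
            ? ValidateOptionsResult.Fail(v.Failures)
            : ValidateOptionsResult.Success;
    }

    protected abstract void Validate(ValidationBuilderSketch v, T options);
}

// Cut-down, illustrative options class; not the real ClusterOptions.
public sealed class ClusterOptionsSketch
{
    public List<string> SeedNodes { get; set; } = new();
    public string? SplitBrainResolverStrategy { get; set; }
    public TimeSpan HeartbeatInterval { get; set; }
    public TimeSpan FailureDetectionThreshold { get; set; }
}

// After migration the validator contains rules only.
public sealed class ClusterOptionsValidatorSketch : OptionsValidatorBaseSketch<ClusterOptionsSketch>
{
    protected override void Validate(ValidationBuilderSketch v, ClusterOptionsSketch o)
    {
        v.MinCount(o.SeedNodes, 2, "ClusterOptions.SeedNodes");
        v.OneOf(o.SplitBrainResolverStrategy, new[] { "keep-oldest" },
            "ClusterOptions.SplitBrainResolverStrategy");
        v.PositiveTimeSpan(o.HeartbeatInterval, "ClusterOptions.HeartbeatInterval");
        // Placeholder wording; the real rule would reuse the existing message.
        v.RequireThat(o.HeartbeatInterval < o.FailureDetectionThreshold,
            "HeartbeatInterval must be less than FailureDetectionThreshold");
    }
}
```

Because the base still implements `IValidateOptions<T>`, the DI registration and the `OptionsValidationException` behaviour stay as they are today.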
**Migrate `StartupValidator` → `ConfigPreflight`:**
- Replace the body of `StartupValidator.Validate(IConfiguration)` with a `ConfigPreflight.For(configuration)`
chain: `.Require("ScadaBridge:Node:Role", v => v is "Central" or "Site", "must be 'Central' or 'Site'")`,
`.RequireValue("ScadaBridge:Node:NodeHostname")`, `.RequirePort("ScadaBridge:Node:RemotingPort")`,
`.When(role == "Site", p => p.RequireValue("ScadaBridge:Node:SiteId"))`,
`.When(role == "Central", p => p.RequireValue("ScadaBridge:Database:ConfigurationDb")...)`, the
Site-only block (`GrpcPort` range, `GrpcPort != RemotingPort`, `SiteDbPath`, seed-vs-gRPC-port),
then `.ThrowIfInvalid()`. Keep the `SeedNodePort` helper and the seed-node/gRPC-port custom rules
as `Require(...)` predicates — they have no shared primitive.
- **Verify the byte-compatibility** (covered by the library's `ConfigPreflightTests`): the swap
preserves the exact `"Configuration validation failed:\n - ..."` message and the
`InvalidOperationException` type. The call site (`Program.cs:39`) is unchanged.
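The chain described above can be sketched with a stand-in preflight type. `PreflightSketch` is hypothetical and reads keys through a lookup delegate instead of `IConfiguration`, so the block has no package dependencies. The real `ConfigPreflight` may differ in shape. The message formats follow the ones stated in this document.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

// Hypothetical stand-in for ConfigPreflight, for illustration only.
public sealed class PreflightSketch
{
    private readonly Func<string, string?> _get;
    private readonly List<string> _errors = new();

    private PreflightSketch(Func<string, string?> get) => _get = get;

    public static PreflightSketch For(Func<string, string?> get) => new(get);

    // Custom rule: records "<key> <reason>" when the predicate rejects the raw value.
    public PreflightSketch Require(string key, Func<string?, bool> predicate, string reason)
    {
        if (!predicate(_get(key))) _errors.Add($"{key} {reason}");
        return this;
    }

    public PreflightSketch RequireValue(string key) =>
        Require(key, v => !string.IsNullOrWhiteSpace(v), "is required");

    public PreflightSketch RequirePort(string key)
    {
        var raw = _get(key);
        var ok = int.TryParse(raw, NumberStyles.None, CultureInfo.InvariantCulture, out var port)
                 && port is >= 1 and <= 65535;
        if (!ok) _errors.Add($"{key} must be between 1 and 65535 (was '{raw}')");
        return this;
    }

    // Conditional block: the nested rules run only when the condition holds.
    public PreflightSketch When(bool condition, Action<PreflightSketch> rules)
    {
        if (condition) rules(this);
        return this;
    }

    public void ThrowIfInvalid()
    {
        if (_errors.Count == 0) return;
        throw new InvalidOperationException(
            $"Configuration validation failed:\n{string.Join("\n", _errors.Select(e => $" - {e}"))}");
    }
}

public static class PreflightSketchDemo
{
    // A cut-down version of the chain: role, hostname, port, and the Site-only SiteId rule.
    public static void Validate(IReadOnlyDictionary<string, string?> raw)
    {
        string? Get(string key) => raw.TryGetValue(key, out var v) ? v : null;
        var role = Get("ScadaBridge:Node:Role");

        PreflightSketch.For(Get)
            .Require("ScadaBridge:Node:Role", v => v is "Central" or "Site", "must be 'Central' or 'Site'")
            .RequireValue("ScadaBridge:Node:NodeHostname")
            .RequirePort("ScadaBridge:Node:RemotingPort")
            .When(role == "Site", p => p.RequireValue("ScadaBridge:Node:SiteId"))
            .ThrowIfInvalid();
    }
}
```

All failures are collected before the single throw, so one run reports every misconfigured key instead of only the first.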
**Keep bespoke (unchanged):**
- All options classes (`ClusterOptions`, `SecurityOptions`, `HealthMonitoringOptions`,
`AuditLogOptions`, `NodeOptions`) and every domain message/rule — split-brain strategy, Akka
heartbeat/threshold relationship, audit retention bounds, gRPC-port-vs-remoting-port, seed-node
topology. The library carries plumbing, not policy.
- The no-validator modules (`Communication`, `DataConnectionLayer`, `Transport`, etc.) — they have
no validation to migrate; leave them until they grow validators.
**Status:** follow-on (tracked in [`../GAPS.md`](../GAPS.md)). Heaviest of the three (five surfaces),
but every item is a behaviour-preserving swap: low risk, with the preflight swap gated on the
byte-compatibility test.