docs(config): components/configuration normalization (spec, shared-contract, current-state x3, GAPS, README)

Joseph Doherty
2026-06-01 09:48:49 -04:00
parent b754873a44
commit 46c4bfae31
7 changed files with 1033 additions and 0 deletions
@@ -0,0 +1,201 @@
# Configuration validation — normalized target spec
Status: **Draft**. The single design the sister projects converge on for **startup
configuration validation**. Derived from the three code-verified current-state docs
(`../current-state/`). Goal is *path to shared code*
(`../shared-contract/ZB.MOM.WW.Configuration.md`), so each normalized section maps to a shared
library seam. The library is **already built** at
[`../../../ZB.MOM.WW.Configuration/`](../../../ZB.MOM.WW.Configuration/) (`0.1.0`, 27 tests).
## 0. Scope
The common concern is **fail-fast validation of configuration at process startup**: bind an
`appsettings.json` / environment section to a typed options object (or read raw keys before the
host exists), check every field, and refuse to start when anything is wrong — surfacing **all**
problems at once so an operator fixes them in one edit rather than one boot-loop per typo. All
three apps already do this; they do it with three private copies of the same plumbing.
**Normalized here** (goes in the shared `ZB.MOM.WW.Configuration` library):
- **The `IValidateOptions<T>` failure-accumulation convention.** Every app hand-rolls a
`List<string> failures`, a pile of `if (...) failures.Add(...)`, and the
`failures.Count == 0 ? Success : Fail(failures)` tail. That plumbing becomes
`OptionsValidatorBase<TOptions>`: override `protected void Validate(ValidationBuilder, TOptions)`,
record failures on the builder, and the base aggregates them and returns a single
`ValidateOptionsResult` (Success only when the builder is clean).
- **Reusable rule primitives.** The same checks recur across apps — required-string, TCP port
range, `host:port` endpoint, positive `TimeSpan`, one-of-a-set, minimum collection count. They
become `ValidationBuilder` primitives (`Required`, `Port`, `HostPort`, `PositiveTimeSpan`,
`OneOf`, `MinCount`) plus `RequireThat(bool, message)` / `Add(message)` escape hatches for
custom and cross-field rules. Wording is centralized in an internal `Checks` seam so a
given rule reads identically everywhere.
- **`AddValidatedOptions<TOptions, TValidator>(IConfiguration, sectionPath)`** — one DI call that
binds the section, registers the validator as the options' `IValidateOptions<TOptions>`, and
enables `ValidateOnStart()`. Replaces the per-module `AddOptions().Bind(...).ValidateOnStart()`
+ `AddSingleton<IValidateOptions<...>, ...>()` pair that each app open-codes.
- **The pre-host `ConfigPreflight` aggregator** — a fluent checker over raw `IConfiguration` for
the keys that must be valid *before* the host / DI container / actor system is built (node
role, remoting port, site id). Generalizes ScadaBridge's `StartupValidator`. Fluent surface:
`For(config)`, `.Require(key, predicate, reason)`, `.RequireValue(key)`, `.RequirePort(key)`,
`.When(condition, block)` (role-conditional rules), `.ThrowIfInvalid()`.
**The error-handling contract** (shared across both front-ends):
- **Accumulate ALL failures.** Never short-circuit on the first failure — collect every problem
and surface them together. (`OptionsValidatorBase` and `ConfigPreflight` both do this; it is
the behaviour every app already wanted.)
- **Two surfacing paths**, by where validation runs:
  1. **Options bound through DI**: `ValidateOnStart()` raises an
     **`OptionsValidationException`** at host start (the .NET options pipeline aggregates the
     failures). This is the `AddValidatedOptions` path.
  2. **Raw config, pre-host**: `ConfigPreflight.ThrowIfInvalid()` throws an
     **`InvalidOperationException`** listing all failures.
- **Message format `"<field> <reason>"`** for each individual failure, produced by the shared
`Checks` primitives (e.g. `"ScadaBridge:Node:RemotingPort must be between 1 and 65535 (was '0')"`).
`ConfigPreflight.ThrowIfInvalid()` wraps the accumulated lines in the exact envelope
ScadaBridge's `StartupValidator` uses today (§4) so the migration is byte-compatible.
**Explicitly NOT normalized** (domain-specific — stays per project):
- **Each app's options classes and their domain rules.** `GatewayOptions` (worker exe path,
heartbeat grace ≥ interval, TLS validity years), `ClusterOptions` (split-brain strategy,
`MinNrOfMembers == 1`, heartbeat ≪ failure-detection), `SecurityOptions` (LDAP server /
search base), `HealthMonitoringOptions` (positive `PeriodicTimer` intervals),
`AuditLogOptions` (payload caps, retention bounds), and ScadaBridge's `Node` topology rules
(gRPC port ≠ remoting port, seed nodes must not target the gRPC port) all stay where they
live. Only the *plumbing they sit on* is shared; the *rules* are theirs.
- **OtOpcUa's runtime draft/snapshot validation** (`DraftValidator` + `DraftSnapshot`). This is
**not** options/config validation at all — it is managed pre-publish validation of an operator's
*configuration draft* (UNS segment regex, EquipmentId derivation, cross-cluster namespace
binding, reservation pre-flight), run in the publish pipeline against database rows, not against
`IConfiguration`. It shares only a *philosophy* (return every failure in one pass) with this
component and is **out of scope** for the shared library. It stays entirely in OtOpcUa.
## 1. `IValidateOptions` base — `OptionsValidatorBase<TOptions>`
The headline plumbing fix. Today each validator re-implements: the `Validate(string?, TOptions)`
signature, a local `List<string>`, the `failures.Count == 0 ? Success : Fail(failures)` tail,
and (in several) private `AddIfBlank` / `AddIfNotPositive` helpers. The base owns all of that:
```csharp
public sealed class ClusterOptionsValidator : OptionsValidatorBase<ClusterOptions>
{
    protected override void Validate(ValidationBuilder v, ClusterOptions o)
    {
        v.MinCount(o.SeedNodes, 2, "ClusterOptions.SeedNodes");
        v.OneOf(o.SplitBrainResolverStrategy, ["keep-oldest"], "ClusterOptions.SplitBrainResolverStrategy");
        v.PositiveTimeSpan(o.StableAfter, "ClusterOptions.StableAfter");
        v.RequireThat(o.MinNrOfMembers == 1,
            $"ClusterOptions.MinNrOfMembers must be 1 (was {o.MinNrOfMembers})");
        // cross-field rule:
        v.RequireThat(o.HeartbeatInterval < o.FailureDetectionThreshold,
            "ClusterOptions.HeartbeatInterval must be below FailureDetectionThreshold");
    }
}
```
`OptionsValidatorBase<TOptions>.Validate(string?, TOptions)` guards null, creates a
`ValidationBuilder`, calls the override, and returns `Success` only when `builder.IsValid`.
**Accumulation is automatic** — the override never returns early; it records everything.
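
The flow described above (guard null, create a builder, call the override, succeed only when the builder is clean) could be laid out roughly as follows. This is a minimal sketch, not the library's source: it assumes the `Microsoft.Extensions.Options` package, and `OptionsValidatorBaseSketch`, `MiniValidationBuilder`, `DemoOptions`, and the null-guard message are all hypothetical names and wording chosen for illustration.

```csharp
using System.Collections.Generic;
using Microsoft.Extensions.Options;

// Pared-down stand-in for the library's ValidationBuilder (assumption).
public sealed class MiniValidationBuilder
{
    private readonly List<string> _failures = new();

    public IReadOnlyList<string> Failures => _failures;
    public bool IsValid => _failures.Count == 0;

    public MiniValidationBuilder Add(string message)
    {
        _failures.Add(message);
        return this;
    }

    public MiniValidationBuilder RequireThat(bool ok, string message)
        => ok ? this : Add(message);
}

// Hypothetical sketch of the base class shape described in this section.
public abstract class OptionsValidatorBaseSketch<TOptions> : IValidateOptions<TOptions>
    where TOptions : class
{
    public ValidateOptionsResult Validate(string? name, TOptions options)
    {
        // Null guard; the exact message is an assumption.
        if (options is null)
        {
            return ValidateOptionsResult.Fail($"{typeof(TOptions).Name} is required");
        }

        var builder = new MiniValidationBuilder();
        Validate(builder, options); // the override only records failures

        return builder.IsValid
            ? ValidateOptionsResult.Success
            : ValidateOptionsResult.Fail(builder.Failures);
    }

    protected abstract void Validate(MiniValidationBuilder builder, TOptions options);
}

// Hypothetical options type and validator, used only to exercise the sketch.
public sealed class DemoOptions
{
    public string? Endpoint { get; set; }
    public int MaxAttempts { get; set; }
}

public sealed class DemoOptionsValidator : OptionsValidatorBaseSketch<DemoOptions>
{
    protected override void Validate(MiniValidationBuilder v, DemoOptions o)
    {
        v.RequireThat(!string.IsNullOrWhiteSpace(o.Endpoint), "DemoOptions.Endpoint is required");
        v.RequireThat(o.MaxAttempts > 0, "DemoOptions.MaxAttempts must be positive");
    }
}
```

Calling `new DemoOptionsValidator().Validate(null, new DemoOptions())` on a default instance reports both failures in one result, because the override has no way to return early.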
## 2. Rule primitives — `ValidationBuilder`
`ValidationBuilder` is the accumulator passed into the override. Primitives both check a value
and append a consistently-worded `"<field> <reason>"` message on failure; escape hatches cover
the rest:
| Primitive | Checks | Failure wording (from `Checks`) |
|---|---|---|
| `Required(value, field)` | non-null, non-whitespace string | `"<field> is required"` |
| `Port(value, field)` | int in 1–65535 | `"<field> must be between 1 and 65535 (was <value>)"` |
| `HostPort(value, field)` | `host:port` with port 1–65535 | `"<field> must be 'host:port' with port 1-65535 (was '<value>')"` |
| `PositiveTimeSpan(value, field)` | `> TimeSpan.Zero` | `"<field> must be a positive duration (was <value>)"` |
| `OneOf(value, allowed, field)` | case-insensitive membership | `"<field> must be one of [<allowed>] (was '<value>')"` |
| `MinCount(value, min, field)` | collection ≥ `min` items | `"<field> must contain at least <min> item(s) (had <n>)"` |
| `RequireThat(ok, message)` | arbitrary boolean (cross-field, custom) | caller-supplied |
| `Add(message)` | unconditional failure | caller-supplied |
Properties: `Failures` (read-only accumulated list) and `IsValid`. Every method returns the
builder for chaining. `Add`/`RequireThat` carry the rules that are genuinely app-specific (e.g.
MxGateway's "ExecutablePath must point to a .exe", ScadaBridge's heartbeat-vs-threshold
ordering) without forcing them into a primitive.
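
To make the table concrete, here is a rough sketch of a builder with the same primitives and failure wording. It uses only the base class library. The type name `ValidationBuilderSketch`, the parameter types, the `", "` separator inside `[<allowed>]`, and the rule that `HostPort` splits on the last colon are assumptions; the real `ValidationBuilder` and its `Checks` seam may differ.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

// Hypothetical sketch; wording follows the table above, details are assumed.
public sealed class ValidationBuilderSketch
{
    private readonly List<string> _failures = new();

    public IReadOnlyList<string> Failures => _failures;
    public bool IsValid => _failures.Count == 0;

    // Escape hatches: unconditional failure and arbitrary boolean rule.
    public ValidationBuilderSketch Add(string message)
    {
        _failures.Add(message);
        return this;
    }

    public ValidationBuilderSketch RequireThat(bool ok, string message)
        => ok ? this : Add(message);

    public ValidationBuilderSketch Required(string? value, string field)
        => RequireThat(!string.IsNullOrWhiteSpace(value), $"{field} is required");

    public ValidationBuilderSketch Port(int value, string field)
        => RequireThat(value is >= 1 and <= 65535,
            $"{field} must be between 1 and 65535 (was {value})");

    public ValidationBuilderSketch HostPort(string? value, string field)
    {
        // Assumption: split on the last ':' and require a non-empty host.
        var idx = value?.LastIndexOf(':') ?? -1;
        var ok = idx > 0
            && int.TryParse(value![(idx + 1)..], NumberStyles.None,
                CultureInfo.InvariantCulture, out var port)
            && port is >= 1 and <= 65535;
        return RequireThat(ok,
            $"{field} must be 'host:port' with port 1-65535 (was '{value}')");
    }

    public ValidationBuilderSketch PositiveTimeSpan(TimeSpan value, string field)
        => RequireThat(value > TimeSpan.Zero,
            $"{field} must be a positive duration (was {value})");

    public ValidationBuilderSketch OneOf(string? value, IEnumerable<string> allowed, string field)
    {
        var set = allowed.ToList();
        var ok = value is not null && set.Contains(value, StringComparer.OrdinalIgnoreCase);
        return RequireThat(ok,
            $"{field} must be one of [{string.Join(", ", set)}] (was '{value}')");
    }

    public ValidationBuilderSketch MinCount<T>(IReadOnlyCollection<T>? value, int min, string field)
    {
        var n = value?.Count ?? 0;
        return RequireThat(n >= min,
            $"{field} must contain at least {min} item(s) (had {n})");
    }
}
```

Because every method returns the builder, rules chain: `new ValidationBuilderSketch().Required("", "Ldap.Server").Port(0, "Node.RemotingPort")` leaves two entries in `Failures`, one per failed rule (the field names here are made up for the example).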
## 3. DI wiring — `AddValidatedOptions`
```csharp
builder.Services.AddValidatedOptions<ClusterOptions, ClusterOptionsValidator>(
    builder.Configuration, "ScadaBridge:Cluster");
```
Binds `ScadaBridge:Cluster` → `ClusterOptions`, registers `ClusterOptionsValidator` as a
singleton `IValidateOptions<ClusterOptions>`, and calls `ValidateOnStart()`. Returns the
`OptionsBuilder<TOptions>` for further chaining (e.g. `.PostConfigure(...)`). This collapses the
three-line idiom every module repeats (`AddOptions().Bind(...).ValidateOnStart()` +
`AddSingleton<IValidateOptions<...>, ...>()`) into one call.
> The validator is registered as a **singleton** (it backs the singleton options factory). It
> must be singleton-safe — no scoped dependencies. All current validators are stateless, so this
> holds.
When a section bound this way fails, the .NET options pipeline raises **`OptionsValidationException`**
at host start (because of `ValidateOnStart()`), with all accumulated messages.
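
One plausible shape for such an extension is sketched below, built only from the standard `AddOptions` / `Bind` / `ValidateOnStart` / `AddSingleton` calls that the open-coded idiom already uses. It assumes the `Microsoft.Extensions.*` packages that provide those calls (which package carries `ValidateOnStart` depends on the .NET version). `AddValidatedOptionsSketch`, `DemoOptions`, and `DemoOptionsValidator` are hypothetical names; the library's actual method may be implemented differently.

```csharp
using Microsoft.Extensions.Configuration;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Options;

public static class ValidatedOptionsSketchExtensions
{
    // Hypothetical sketch: bind the section, register the validator as a
    // singleton, and opt in to validation at host start.
    public static OptionsBuilder<TOptions> AddValidatedOptionsSketch<TOptions, TValidator>(
        this IServiceCollection services,
        IConfiguration configuration,
        string sectionPath)
        where TOptions : class
        where TValidator : class, IValidateOptions<TOptions>
    {
        services.AddSingleton<IValidateOptions<TOptions>, TValidator>();

        return services
            .AddOptions<TOptions>()
            .Bind(configuration.GetSection(sectionPath))
            .ValidateOnStart();
    }
}

// Hypothetical options type and validator, used only to exercise the sketch.
public sealed class DemoOptions
{
    public string? Name { get; set; }
}

public sealed class DemoOptionsValidator : IValidateOptions<DemoOptions>
{
    public ValidateOptionsResult Validate(string? name, DemoOptions options)
        => string.IsNullOrWhiteSpace(options.Name)
            ? ValidateOptionsResult.Fail("DemoOptions.Name is required")
            : ValidateOptionsResult.Success;
}
```

Under a generic host, `ValidateOnStart()` runs the validator during startup. Without a host, the same validator still runs on first access to `IOptions<DemoOptions>.Value`, and a failure surfaces as `OptionsValidationException` in both cases.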
## 4. Pre-host preflight — `ConfigPreflight`
For keys that must be valid **before** the host / DI / actor system exists, `ConfigPreflight`
reads raw `IConfiguration` and accumulates failures the same way:
```csharp
ConfigPreflight.For(configuration)
    .Require("ScadaBridge:Node:Role", v => v is "Central" or "Site", "must be 'Central' or 'Site'")
    .RequireValue("ScadaBridge:Node:NodeHostname")
    .RequirePort("ScadaBridge:Node:RemotingPort")
    .When(role == "Site", p => p.RequireValue("ScadaBridge:Node:SiteId"))
    .ThrowIfInvalid();
```
`.ThrowIfInvalid()` throws **`InvalidOperationException`** when any failure was recorded, with
this exact envelope:
```
Configuration validation failed:
 - <field> <reason>
 - <field> <reason>
```
> **Byte-compatibility with ScadaBridge's `StartupValidator`.** ScadaBridge's
> `StartupValidator.Validate` throws
> `$"Configuration validation failed:\n{string.Join("\n", errors.Select(e => $" - {e}"))}"`.
> `ConfigPreflight.ThrowIfInvalid()` produces the **identical** string:
> `"Configuration validation failed:\n"` followed by the same `" - <field> <reason>"` lines, joined with `\n`.
> The migration is a behaviour-preserving swap: same exception type
> (`InvalidOperationException`), same message bytes. This is verified in the library's
> `ConfigPreflightTests` and is the reason the message format is pinned in §0.
`.When(condition, block)` carries role-conditional rules (ScadaBridge only validates database /
security / gRPC-port keys when the node is `Central` or `Site` respectively) without an `if` ladder.
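
The fluent surface and envelope described in this section could be sketched as below. This is an illustration under stated assumptions, not the library's source: it assumes the `Microsoft.Extensions.Configuration` abstractions, the type name `ConfigPreflightSketch` is hypothetical, `When` is assumed to take an `Action`, and the `"is required"` wording and the whitespace before each dash follow the strings as quoted in this document.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;
using Microsoft.Extensions.Configuration;

// Hypothetical sketch of a pre-host aggregator over raw IConfiguration.
public sealed class ConfigPreflightSketch
{
    private readonly IConfiguration _config;
    private readonly List<string> _errors = new();

    private ConfigPreflightSketch(IConfiguration config) => _config = config;

    public static ConfigPreflightSketch For(IConfiguration config) => new(config);

    // Records "<key> <reason>" when the predicate rejects the raw value.
    public ConfigPreflightSketch Require(string key, Func<string?, bool> predicate, string reason)
    {
        if (!predicate(_config[key]))
        {
            _errors.Add($"{key} {reason}");
        }
        return this;
    }

    public ConfigPreflightSketch RequireValue(string key)
        => Require(key, v => !string.IsNullOrWhiteSpace(v), "is required");

    public ConfigPreflightSketch RequirePort(string key)
        => Require(
            key,
            v => int.TryParse(v, NumberStyles.None, CultureInfo.InvariantCulture, out var p)
                 && p is >= 1 and <= 65535,
            $"must be between 1 and 65535 (was '{_config[key]}')");

    // Runs the nested rules only when the condition holds (role-conditional checks).
    public ConfigPreflightSketch When(bool condition, Action<ConfigPreflightSketch> block)
    {
        if (condition)
        {
            block(this);
        }
        return this;
    }

    // Throws once, listing every recorded failure; does nothing when clean.
    public void ThrowIfInvalid()
    {
        if (_errors.Count == 0)
        {
            return;
        }

        var lines = _errors.Select(e => $" - {e}");
        throw new InvalidOperationException(
            "Configuration validation failed:\n" + string.Join("\n", lines));
    }
}
```

Nothing is thrown until `ThrowIfInvalid()` is called, so a configuration with a bad port and a missing site id produces one exception whose message carries both lines.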
## 5. Per-project migration
| Project | Current state | Primary gaps | What normalizes |
|---|---|---|---|
| **OtOpcUa** | **No options validation at all** — options bound with bare `.Bind()` (`LdapOptions`, `OpcUa`); zero `IValidateOptions` / `ValidateOnStart` in the repo. Only validator is `DraftValidator` (runtime draft/snapshot, **out of scope**). | No startup validation of `Ldap` / `OpcUa` sections — a bad value fails opaquely on first use. | *Optional* adoption: add `OptionsValidatorBase` subclasses + `AddValidatedOptions` for the sections worth guarding. `DraftValidator`/`DraftSnapshot` stay per-project untouched. Lightest consumer. |
| **MxGateway** | One large `GatewayOptionsValidator : IValidateOptions<GatewayOptions>` (~360 LOC, 9 sub-validators, private `AddIfBlank`/`AddIfNotPositive`/`AddIfInvalidPath` helpers); wired via `AddGatewayConfiguration` (`AddOptions().BindConfiguration().ValidateOnStart()`). | Hand-rolled accumulation + helpers duplicate the base; bespoke DI wiring duplicates `AddValidatedOptions`. | `GatewayOptionsValidator` → `OptionsValidatorBase<GatewayOptions>` (delete the `List<string>`/tail/helpers; keep the domain rules); `AddGatewayConfiguration` → `AddValidatedOptions<GatewayOptions, GatewayOptionsValidator>`. Domain rules unchanged. |
| **ScadaBridge** | **Heaviest.** Four per-module `*OptionsValidator : IValidateOptions<T>` (Cluster / Security / HealthMonitoring / AuditLog) each with their own `List<string>` accumulation, wired through bespoke `AddXxx` extensions; **plus** a raw-config pre-Akka `StartupValidator`. | Four copies of the accumulation plumbing + bespoke DI wiring; `StartupValidator` open-codes the preflight envelope. | Each `*OptionsValidator` → `OptionsValidatorBase<T>`; each module's `AddXxx` → `AddValidatedOptions`; `StartupValidator` → `ConfigPreflight` (byte-compatible message, §4). Domain rules unchanged. |
> No sister-repo adoption is in scope for this release — the library is built; adoption is the
> follow-on tracked in [`../GAPS.md`](../GAPS.md). (Unlike the observability pass, which carried
> one in-pass MxGateway adoption, this pass is library-only.)
## 6. Acceptance (what "converged" means)
A project is converged when: (a) every options validator it owns derives from
`OptionsValidatorBase<TOptions>` and records failures on the supplied `ValidationBuilder` (no
private `List<string>` plumbing, no early return); (b) every bind-and-validate registration goes
through `AddValidatedOptions<TOptions, TValidator>(config, sectionPath)`; (c) any pre-host raw-config
checks go through `ConfigPreflight` and surface via `ThrowIfInvalid()`; (d) all validation
**accumulates every failure** and surfaces them together (`OptionsValidationException` at host
start, or `InvalidOperationException` from `ConfigPreflight`); and (e) failure wording for the
shared primitives comes from the library's `Checks` seam, identical across the fleet. Each app's
**options classes and domain rules stay its own**; only the plumbing is shared. OtOpcUa's
`DraftValidator` is explicitly exempt — it is not part of the converged surface.