scadaproj/components/configuration/spec/SPEC.md
Joseph Doherty fbf0f23e76 docs(config): correct OtOpcUa draft-validation description
The C# DraftValidator/DraftSnapshot has NO live caller in OtOpcUa src/ (verified
repo-wide) — it is dormant complement code. The enforced pre-publish draft
validation runs DB-side in the sp_ValidateDraft stored procedure (Status='Draft'
-> sp_PublishGeneration lifecycle). Reframe across current-state/SPEC/GAPS/README/
CLAUDE.md from 'runtime draft validation' + a false publish-pipeline caller to
'dormant managed validator; enforcement is DB-side'. Out-of-scope conclusion
for ZB.MOM.WW.Configuration is unchanged.
2026-06-01 10:13:29 -04:00

# Configuration validation — normalized target spec
Status: **Draft**. The single design the sister projects converge on for **startup
configuration validation**. Derived from the three code-verified current-state docs
(`../current-state/`). Goal is *path to shared code*
(`../shared-contract/ZB.MOM.WW.Configuration.md`), so each normalized section maps to a shared
library seam. The library is **already built** at
[`../../../ZB.MOM.WW.Configuration/`](../../../ZB.MOM.WW.Configuration/) (`0.1.0`, 27 tests).
## 0. Scope
The common concern is **fail-fast validation of configuration at process startup**: bind an
`appsettings.json` / environment section to a typed options object (or read raw keys before the
host exists), check every field, and refuse to start when anything is wrong — surfacing **all**
problems at once so an operator fixes them in one edit rather than one boot-loop per typo.
MxGateway and ScadaBridge already do this, each with its own private copy of the same plumbing;
OtOpcUa binds its options without validating them (§5).
**Normalized here** (goes in the shared `ZB.MOM.WW.Configuration` library):
- **The `IValidateOptions<T>` failure-accumulation convention.** Every app hand-rolls a
`List<string> failures`, a pile of `if (...) failures.Add(...)`, and the
`failures.Count == 0 ? Success : Fail(failures)` tail. That plumbing becomes
`OptionsValidatorBase<TOptions>`: override `protected void Validate(ValidationBuilder, TOptions)`,
record failures on the builder, and the base aggregates them and returns a single
`ValidateOptionsResult` (Success only when the builder is clean).
- **Reusable rule primitives.** The same checks recur across apps — required-string, TCP port
range, `host:port` endpoint, positive `TimeSpan`, one-of-a-set, minimum collection count. They
become `ValidationBuilder` primitives (`Required`, `Port`, `HostPort`, `PositiveTimeSpan`,
`OneOf`, `MinCount`) plus `RequireThat(bool, message)` / `Add(message)` escape hatches for
custom and cross-field rules. Wording is centralized in an internal `Checks` seam so a
given rule reads identically everywhere.
- **`AddValidatedOptions<TOptions, TValidator>(IConfiguration, sectionPath)`** — one DI call that
binds the section, registers the validator as the options' `IValidateOptions<TOptions>`, and
enables `ValidateOnStart()`. Replaces the per-module `AddOptions().Bind(...).ValidateOnStart()`
+ `AddSingleton<IValidateOptions<...>, ...>()` pair that each app open-codes.
- **The pre-host `ConfigPreflight` aggregator** — a fluent checker over raw `IConfiguration` for
the keys that must be valid *before* the host / DI container / actor system is built (node
role, remoting port, site id). Generalizes ScadaBridge's `StartupValidator`. Fluent surface:
`For(config)`, `.Require(key, predicate, reason)`, `.RequireValue(key)`, `.RequirePort(key)`,
`.When(condition, block)` (role-conditional rules), `.ThrowIfInvalid()`.
**The error-handling contract** (shared across both front-ends):
- **Accumulate ALL failures.** Never short-circuit on the first failure — collect every problem
and surface them together. (`OptionsValidatorBase` and `ConfigPreflight` both do this; it is
the behaviour every app already wanted.)
- **Two surfacing paths**, by where validation runs:
1. **Options bound through DI** → `ValidateOnStart()` raises an
**`OptionsValidationException`** at host start (the .NET options pipeline aggregates the
failures). This is the `AddValidatedOptions` path.
2. **Raw config, pre-host** → `ConfigPreflight.ThrowIfInvalid()` throws an
**`InvalidOperationException`** listing all failures.
- **Message format `"<field> <reason>"`** for each individual failure, produced by the shared
`Checks` primitives (e.g. `"ScadaBridge:Node:RemotingPort must be between 1 and 65535 (was '0')"`).
`ConfigPreflight.ThrowIfInvalid()` wraps the accumulated lines in the exact envelope
ScadaBridge's `StartupValidator` uses today (§4) so the migration is byte-compatible.
**Explicitly NOT normalized** (domain-specific — stays per project):
- **Each app's options classes and their domain rules.** `GatewayOptions` (worker exe path,
heartbeat grace ≥ interval, TLS validity years), `ClusterOptions` (split-brain strategy,
`MinNrOfMembers == 1`, heartbeat ≪ failure-detection), `SecurityOptions` (LDAP server /
search base), `HealthMonitoringOptions` (positive `PeriodicTimer` intervals),
`AuditLogOptions` (payload caps, retention bounds), and ScadaBridge's `Node` topology rules
(gRPC port ≠ remoting port, seed nodes must not target the gRPC port) all stay where they
live. Only the *plumbing they sit on* is shared; the *rules* are theirs.
- **OtOpcUa's draft/generation-content validation** (the dormant C# `DraftValidator` /
`DraftSnapshot`, plus the live DB stored procedure `sp_ValidateDraft` it was designed to
complement). This is **not** options/config validation at all — it is pre-publish validation of an
operator's *configuration draft content* (UNS segment regex, EquipmentId derivation, cross-cluster
namespace binding, reservation pre-flight) against database rows, not against `IConfiguration`;
enforcement lives DB-side in `sp_ValidateDraft` and the managed `DraftValidator` has **no live
caller** in `src/` today. It shares only a *philosophy* (return every failure in one pass) with
this component and is **out of scope** for the shared library. It stays entirely in OtOpcUa.
## 1. `IValidateOptions` base — `OptionsValidatorBase<TOptions>`
The headline plumbing fix. Today each validator re-implements: the `Validate(string?, TOptions)`
signature, a local `List<string>`, the `failures.Count == 0 ? Success : Fail(failures)` tail,
and (in several) private `AddIfBlank` / `AddIfNotPositive` helpers. The base owns all of that:
```csharp
public sealed class ClusterOptionsValidator : OptionsValidatorBase<ClusterOptions>
{
    protected override void Validate(ValidationBuilder v, ClusterOptions o)
    {
        v.MinCount(o.SeedNodes, 2, "ClusterOptions.SeedNodes");
        v.OneOf(o.SplitBrainResolverStrategy, ["keep-oldest"], "ClusterOptions.SplitBrainResolverStrategy");
        v.PositiveTimeSpan(o.StableAfter, "ClusterOptions.StableAfter");
        v.RequireThat(o.MinNrOfMembers == 1,
            $"ClusterOptions.MinNrOfMembers must be 1 (was {o.MinNrOfMembers})");
        // cross-field rule:
        v.RequireThat(o.HeartbeatInterval < o.FailureDetectionThreshold,
            "ClusterOptions.HeartbeatInterval must be below FailureDetectionThreshold");
    }
}
```
`OptionsValidatorBase<TOptions>.Validate(string?, TOptions)` guards null, creates a
`ValidationBuilder`, calls the override, and returns `Success` only when `builder.IsValid`.
**Accumulation is automatic** — the override never returns early; it records everything.
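To make the guard / build / delegate / aggregate sequence concrete, here is a minimal sketch of what such a base could look like. It is not the library's source: the null-guard wording and the cut-down `ValidationBuilder` stand-in are assumptions for illustration, and it presumes a reference to `Microsoft.Extensions.Options`.

```csharp
using System.Collections.Generic;
using Microsoft.Extensions.Options;

// Cut-down stand-in for the library's builder, just enough for this sketch.
public sealed class ValidationBuilder
{
    private readonly List<string> _failures = new();
    public IReadOnlyList<string> Failures => _failures;
    public bool IsValid => _failures.Count == 0;

    public ValidationBuilder RequireThat(bool ok, string message)
    {
        if (!ok) _failures.Add(message);
        return this;
    }
}

public abstract class OptionsValidatorBase<TOptions> : IValidateOptions<TOptions>
    where TOptions : class
{
    public ValidateOptionsResult Validate(string? name, TOptions options)
    {
        // Assumption: a null options instance is reported as a failure, not thrown.
        if (options is null)
            return ValidateOptionsResult.Fail($"{typeof(TOptions).Name} is required");

        var builder = new ValidationBuilder();
        Validate(builder, options); // the override only records failures

        // Success only when nothing was recorded; otherwise every message at once.
        return builder.IsValid
            ? ValidateOptionsResult.Success
            : ValidateOptionsResult.Fail(builder.Failures);
    }

    protected abstract void Validate(ValidationBuilder builder, TOptions options);
}
```

Because the override returns `void`, a subclass has no way to hand back a partial result, which is what keeps accumulation from being skipped by accident.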
## 2. Rule primitives — `ValidationBuilder`
`ValidationBuilder` is the accumulator passed into the override. Primitives both check a value
and append a consistently-worded `"<field> <reason>"` message on failure; escape hatches cover
the rest:
| Primitive | Checks | Failure wording (from `Checks`) |
|---|---|---|
| `Required(value, field)` | non-null, non-whitespace string | `"<field> is required"` |
| `Port(value, field)` | int in 1–65535 | `"<field> must be between 1 and 65535 (was <value>)"` |
| `HostPort(value, field)` | `host:port` with port 1–65535 | `"<field> must be 'host:port' with port 1-65535 (was '<value>')"` |
| `PositiveTimeSpan(value, field)` | `> TimeSpan.Zero` | `"<field> must be a positive duration (was <value>)"` |
| `OneOf(value, allowed, field)` | case-insensitive membership | `"<field> must be one of [<allowed>] (was '<value>')"` |
| `MinCount(value, min, field)` | collection ≥ `min` items | `"<field> must contain at least <min> item(s) (had <n>)"` |
| `RequireThat(ok, message)` | arbitrary boolean (cross-field, custom) | caller-supplied |
| `Add(message)` | unconditional failure | caller-supplied |
Properties: `Failures` (read-only accumulated list) and `IsValid`. Every method returns the
builder for chaining. `Add`/`RequireThat` carry the rules that are genuinely app-specific (e.g.
MxGateway's "ExecutablePath must point to a .exe", ScadaBridge's heartbeat-vs-threshold
ordering) without forcing them into a primitive.
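As an illustration of how primitives and escape hatches can share one accumulator, the following sketch implements three of the primitives on top of `RequireThat`. It is a hypothetical reduction, not the library's code: the real builder routes wording through the internal `Checks` seam, which is inlined here, and only the message texts are taken from the table above.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch; the real ValidationBuilder lives in ZB.MOM.WW.Configuration.
public sealed class ValidationBuilder
{
    private readonly List<string> _failures = new();

    public IReadOnlyList<string> Failures => _failures;
    public bool IsValid => _failures.Count == 0;

    // Each primitive is "check + fixed wording" layered over RequireThat.
    public ValidationBuilder Required(string? value, string field)
        => RequireThat(!string.IsNullOrWhiteSpace(value), $"{field} is required");

    public ValidationBuilder Port(int value, string field)
        => RequireThat(value is >= 1 and <= 65535,
            $"{field} must be between 1 and 65535 (was {value})");

    public ValidationBuilder PositiveTimeSpan(TimeSpan value, string field)
        => RequireThat(value > TimeSpan.Zero,
            $"{field} must be a positive duration (was {value})");

    // Escape hatch: arbitrary boolean with caller-supplied wording.
    public ValidationBuilder RequireThat(bool ok, string message)
    {
        if (!ok) _failures.Add(message);
        return this;
    }

    // Escape hatch: unconditional failure.
    public ValidationBuilder Add(string message)
    {
        _failures.Add(message);
        return this;
    }
}
```

With this shape, `new ValidationBuilder().Required(" ", "Ldap:Server").Port(0, "Node:RemotingPort")` ends up with two entries in `Failures` and `IsValid == false`; neither call throws or stops the chain. The field names in that call are made up for the example.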
## 3. DI wiring — `AddValidatedOptions`
```csharp
builder.Services.AddValidatedOptions<ClusterOptions, ClusterOptionsValidator>(
    builder.Configuration, "ScadaBridge:Cluster");
```
Binds `ScadaBridge:Cluster` → `ClusterOptions`, registers `ClusterOptionsValidator` as a
singleton `IValidateOptions<ClusterOptions>`, and calls `ValidateOnStart()`. Returns the
`OptionsBuilder<TOptions>` for further chaining (e.g. `.PostConfigure(...)`). This collapses the
three-line idiom every module repeats (`AddOptions().Bind(...).ValidateOnStart()` +
`AddSingleton<IValidateOptions<...>, ...>()`) into one call.
> The validator is registered as a **singleton** (it backs the singleton options factory). It
> must be singleton-safe — no scoped dependencies. All current validators are stateless, so this
> holds.
When a section bound this way fails, the .NET options pipeline raises **`OptionsValidationException`**
at host start (because of `ValidateOnStart()`), with all accumulated messages.
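For orientation, an extension with this behaviour can be written in a few lines over the standard options APIs. The sketch below is an assumption about the shape, not the library's actual implementation; the class name `ValidatedOptionsExtensions` is hypothetical. It relies on `Bind` from `Microsoft.Extensions.Options.ConfigurationExtensions` and on `ValidateOnStart`, which ships in `Microsoft.Extensions.Hosting` or `Microsoft.Extensions.Options` depending on the .NET version.

```csharp
using Microsoft.Extensions.Configuration;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Options;

// Hypothetical sketch of the one-call registration described above.
public static class ValidatedOptionsExtensions
{
    public static OptionsBuilder<TOptions> AddValidatedOptions<TOptions, TValidator>(
        this IServiceCollection services,
        IConfiguration configuration,
        string sectionPath)
        where TOptions : class
        where TValidator : class, IValidateOptions<TOptions>
    {
        // Singleton validator: it must not take scoped dependencies.
        services.AddSingleton<IValidateOptions<TOptions>, TValidator>();

        // Bind the section and force validation at host start instead of first use.
        return services.AddOptions<TOptions>()
            .Bind(configuration.GetSection(sectionPath))
            .ValidateOnStart();
    }
}
```

Returning the `OptionsBuilder<TOptions>` is what allows callers to keep chaining, for example with `.PostConfigure(...)`.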
## 4. Pre-host preflight — `ConfigPreflight`
For keys that must be valid **before** the host / DI / actor system exists, `ConfigPreflight`
reads raw `IConfiguration` and accumulates failures the same way:
```csharp
// Read the role up front so role-conditional rules can branch on it.
var role = configuration["ScadaBridge:Node:Role"];

ConfigPreflight.For(configuration)
    .Require("ScadaBridge:Node:Role", v => v is "Central" or "Site", "must be 'Central' or 'Site'")
    .RequireValue("ScadaBridge:Node:NodeHostname")
    .RequirePort("ScadaBridge:Node:RemotingPort")
    .When(role == "Site", p => p.RequireValue("ScadaBridge:Node:SiteId"))
    .ThrowIfInvalid();
```
`.ThrowIfInvalid()` throws **`InvalidOperationException`** when any failure was recorded, with
this exact envelope:
```
Configuration validation failed:
- <field> <reason>
- <field> <reason>
```
> **Byte-compatibility with ScadaBridge's `StartupValidator`.** ScadaBridge's
> `StartupValidator.Validate` throws
> `$"Configuration validation failed:\n{string.Join("\n", errors.Select(e => $" - {e}"))}"`.
> `ConfigPreflight.ThrowIfInvalid()` produces the **identical** string
> (`"Configuration validation failed:\n"` followed by the same `" - <field> <reason>"` lines, joined with `\n`).
> The migration is a behaviour-preserving swap: same exception type
> (`InvalidOperationException`), same message bytes. This is verified in the library's
> `ConfigPreflightTests` and is the reason the message format is pinned in §0.
`.When(condition, block)` carries role-conditional rules (ScadaBridge only validates database /
security / gRPC-port keys when the node is `Central` or `Site` respectively) without an `if` ladder.
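The fluent surface above can be approximated with a small accumulator over `IConfiguration`. This is a hypothetical sketch rather than the library's source: the `RequireValue` wording is borrowed from the `Required` primitive in §2 as an assumption, and the line prefix is copied from the `StartupValidator` interpolation quoted above. It presumes a reference to `Microsoft.Extensions.Configuration`.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using Microsoft.Extensions.Configuration;

// Hypothetical sketch of the fluent preflight; not the library's actual code.
public sealed class ConfigPreflight
{
    private readonly IConfiguration _config;
    private readonly List<string> _errors = new();

    private ConfigPreflight(IConfiguration config) => _config = config;

    public static ConfigPreflight For(IConfiguration config) => new(config);

    public ConfigPreflight Require(string key, Func<string?, bool> predicate, string reason)
    {
        if (!predicate(_config[key])) _errors.Add($"{key} {reason}");
        return this;
    }

    // Assumed wording, matching the Required primitive.
    public ConfigPreflight RequireValue(string key)
        => Require(key, v => !string.IsNullOrWhiteSpace(v), "is required");

    public ConfigPreflight RequirePort(string key)
    {
        var raw = _config[key];
        var ok = int.TryParse(raw, out var port) && port is >= 1 and <= 65535;
        if (!ok) _errors.Add($"{key} must be between 1 and 65535 (was '{raw}')");
        return this;
    }

    // Runs the nested rules only when the condition holds (role-conditional checks).
    public ConfigPreflight When(bool condition, Action<ConfigPreflight> block)
    {
        if (condition) block(this);
        return this;
    }

    public void ThrowIfInvalid()
    {
        if (_errors.Count == 0) return;
        var lines = string.Join("\n", _errors.Select(e => " - " + e));
        throw new InvalidOperationException("Configuration validation failed:\n" + lines);
    }
}
```

Nothing throws until `ThrowIfInvalid()`, so a missing role and an out-of-range port are reported in the same exception message rather than across two restarts.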
## 5. Per-project migration
| Project | Current state | Primary gaps | What normalizes |
|---|---|---|---|
| **OtOpcUa** | **No options validation at all** — options bound with bare `.Bind()` (`LdapOptions`, `OpcUa`); zero `IValidateOptions` / `ValidateOnStart` in the repo. The only validation-shaped type is the dormant C# `DraftValidator` (draft/generation content; real enforcement is DB-side `sp_ValidateDraft`) — **out of scope**. | No startup validation of `Ldap` / `OpcUa` sections — a bad value fails opaquely on first use. | *Optional* adoption: add `OptionsValidatorBase` subclasses + `AddValidatedOptions` for the sections worth guarding. `DraftValidator`/`DraftSnapshot` stay per-project untouched. Lightest consumer. |
| **MxGateway** | One large `GatewayOptionsValidator : IValidateOptions<GatewayOptions>` (~360 LOC, 9 sub-validators, private `AddIfBlank`/`AddIfNotPositive`/`AddIfInvalidPath` helpers); wired via `AddGatewayConfiguration` (`AddOptions().BindConfiguration().ValidateOnStart()`). | Hand-rolled accumulation + helpers duplicate the base; bespoke DI wiring duplicates `AddValidatedOptions`. | `GatewayOptionsValidator` → `OptionsValidatorBase<GatewayOptions>` (delete the `List<string>`/tail/helpers; keep the domain rules); `AddGatewayConfiguration` → `AddValidatedOptions<GatewayOptions, GatewayOptionsValidator>`. Domain rules unchanged. |
| **ScadaBridge** | **Heaviest.** Four per-module `*OptionsValidator : IValidateOptions<T>` (Cluster / Security / HealthMonitoring / AuditLog) each with their own `List<string>` accumulation, wired through bespoke `AddXxx` extensions; **plus** a raw-config pre-Akka `StartupValidator`. | Four copies of the accumulation plumbing + bespoke DI wiring; `StartupValidator` open-codes the preflight envelope. | Each `*OptionsValidator` → `OptionsValidatorBase<T>`; each module's `AddXxx` → `AddValidatedOptions`; `StartupValidator` → `ConfigPreflight` (byte-compatible message, §4). Domain rules unchanged. |
> No sister-repo adoption is in scope for this release — the library is built; adoption is the
> follow-on tracked in [`../GAPS.md`](../GAPS.md). (Unlike the observability pass, which carried
> one in-pass MxGateway adoption, this pass is library-only.)
## 6. Acceptance (what "converged" means)
A project is converged when: (a) every options validator it owns derives from
`OptionsValidatorBase<TOptions>` and records failures on the supplied `ValidationBuilder` (no
private `List<string>` plumbing, no early return); (b) every bind-and-validate registration goes
through `AddValidatedOptions<TOptions, TValidator>(config, sectionPath)`; (c) any pre-host raw-config
checks go through `ConfigPreflight` and surface via `ThrowIfInvalid()`; (d) all validation
**accumulates every failure** and surfaces them together (`OptionsValidationException` at host
start, or `InvalidOperationException` from `ConfigPreflight`); and (e) failure wording for the
shared primitives comes from the library's `Checks` seam, identical across the fleet. Each app's
**options classes and domain rules stay its own**; only the plumbing is shared. OtOpcUa's
`DraftValidator` is explicitly exempt — it is not part of the converged surface.