scadaproj/components/configuration/spec/SPEC.md
Joseph Doherty fbf0f23e76 docs(config): correct OtOpcUa draft-validation description
The C# DraftValidator/DraftSnapshot has NO live caller in OtOpcUa src/ (verified
repo-wide) — it is dormant complement code. The enforced pre-publish draft
validation runs DB-side in the sp_ValidateDraft stored procedure (Status='Draft'
-> sp_PublishGeneration lifecycle). Reframe across current-state/SPEC/GAPS/README/
CLAUDE.md from 'runtime draft validation' + a false publish-pipeline caller to
'dormant managed validator; enforcement is DB-side'. Out-of-scope conclusion
for ZB.MOM.WW.Configuration is unchanged.
2026-06-01 10:13:29 -04:00

Configuration validation — normalized target spec

Status: Draft. This is the single design on which the sister projects converge for startup configuration validation. It is derived from the three code-verified current-state docs (../current-state/). The goal is a path to shared code (../shared-contract/ZB.MOM.WW.Configuration.md), so each normalized section maps to a shared library seam. The library is already built at ../../../ZB.MOM.WW.Configuration/ (0.1.0, 27 tests).

0. Scope

The common concern is fail-fast validation of configuration at process startup: bind an appsettings.json / environment section to a typed options object (or read raw keys before the host exists), check every field, and refuse to start when anything is wrong — surfacing all problems at once so an operator fixes them in one edit rather than one boot-loop per typo. All three apps already do this; they do it with three private copies of the same plumbing.

Normalized here (goes in the shared ZB.MOM.WW.Configuration library):

  • The IValidateOptions<T> failure-accumulation convention. Every app hand-rolls a List<string> failures, a pile of if (...) failures.Add(...), and the failures.Count == 0 ? Success : Fail(failures) tail. That plumbing becomes OptionsValidatorBase<TOptions>: override protected void Validate(ValidationBuilder, TOptions), record failures on the builder, and the base aggregates them and returns a single ValidateOptionsResult (Success only when the builder is clean).
  • Reusable rule primitives. The same checks recur across apps — required-string, TCP port range, host:port endpoint, positive TimeSpan, one-of-a-set, minimum collection count. They become ValidationBuilder primitives (Required, Port, HostPort, PositiveTimeSpan, OneOf, MinCount) plus RequireThat(bool, message) / Add(message) escape hatches for custom and cross-field rules. Wording is centralized in an internal Checks seam so a given rule reads identically everywhere.
  • AddValidatedOptions<TOptions, TValidator>(IConfiguration, sectionPath). One DI call that binds the section, registers the validator as the options' IValidateOptions<TOptions>, and enables ValidateOnStart(). Replaces the per-module AddOptions().Bind(...).ValidateOnStart() + AddSingleton<IValidateOptions<...>, ...>() pair that each app open-codes.
  • The pre-host ConfigPreflight aggregator — a fluent checker over raw IConfiguration for the keys that must be valid before the host / DI container / actor system is built (node role, remoting port, site id). Generalizes ScadaBridge's StartupValidator. Fluent surface: For(config), .Require(key, predicate, reason), .RequireValue(key), .RequirePort(key), .When(condition, block) (role-conditional rules), .ThrowIfInvalid().

The error-handling contract (shared across both front-ends):

  • Accumulate ALL failures. Never short-circuit on the first failure — collect every problem and surface them together. (OptionsValidatorBase and ConfigPreflight both do this; it is the behaviour every app already wanted.)
  • Two surfacing paths, by where validation runs:
    1. Options bound through DI: ValidateOnStart() raises an OptionsValidationException at host start (the .NET options pipeline aggregates the failures). This is the AddValidatedOptions path.
    2. Raw config, pre-host: ConfigPreflight.ThrowIfInvalid() throws an InvalidOperationException listing all failures.
  • Message format "<field> <reason>" for each individual failure, produced by the shared Checks primitives (e.g. "ScadaBridge:Node:RemotingPort must be between 1 and 65535 (was '0')"). ConfigPreflight.ThrowIfInvalid() wraps the accumulated lines in the exact envelope ScadaBridge's StartupValidator uses today (§4) so the migration is byte-compatible.

Explicitly NOT normalized (domain-specific — stays per project):

  • Each app's options classes and their domain rules. GatewayOptions (worker exe path, heartbeat grace ≥ interval, TLS validity years), ClusterOptions (split-brain strategy, MinNrOfMembers == 1, heartbeat ≪ failure-detection), SecurityOptions (LDAP server / search base), HealthMonitoringOptions (positive PeriodicTimer intervals), AuditLogOptions (payload caps, retention bounds), and ScadaBridge's Node topology rules (gRPC port ≠ remoting port, seed nodes must not target the gRPC port) all stay where they live. Only the plumbing they sit on is shared; the rules are theirs.
  • OtOpcUa's draft/generation-content validation (the dormant C# DraftValidator / DraftSnapshot, plus the live DB stored procedure sp_ValidateDraft it was designed to complement). This is not options/config validation at all — it is pre-publish validation of an operator's configuration draft content (UNS segment regex, EquipmentId derivation, cross-cluster namespace binding, reservation pre-flight) against database rows, not against IConfiguration; enforcement lives DB-side in sp_ValidateDraft and the managed DraftValidator has no live caller in src/ today. It shares only a philosophy (return every failure in one pass) with this component and is out of scope for the shared library. It stays entirely in OtOpcUa.

1. IValidateOptions base — OptionsValidatorBase<TOptions>

The headline plumbing fix. Today each validator re-implements: the Validate(string?, TOptions) signature, a local List<string>, the failures.Count == 0 ? Success : Fail(failures) tail, and (in several) private AddIfBlank / AddIfNotPositive helpers. The base owns all of that:

public sealed class ClusterOptionsValidator : OptionsValidatorBase<ClusterOptions>
{
    protected override void Validate(ValidationBuilder v, ClusterOptions o)
    {
        v.MinCount(o.SeedNodes, 2, "ClusterOptions.SeedNodes");
        v.OneOf(o.SplitBrainResolverStrategy, ["keep-oldest"], "ClusterOptions.SplitBrainResolverStrategy");
        v.PositiveTimeSpan(o.StableAfter, "ClusterOptions.StableAfter");
        v.RequireThat(o.MinNrOfMembers == 1,
            $"ClusterOptions.MinNrOfMembers must be 1 (was {o.MinNrOfMembers})");
        // cross-field rule:
        v.RequireThat(o.HeartbeatInterval < o.FailureDetectionThreshold,
            "ClusterOptions.HeartbeatInterval must be below FailureDetectionThreshold");
    }
}

OptionsValidatorBase<TOptions>.Validate(string?, TOptions) guards null, creates a ValidationBuilder, calls the override, and returns Success only when builder.IsValid. Accumulation is automatic — the override never returns early; it records everything.
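The behaviour described above can be sketched as follows. This is a minimal illustration, not the library's actual source: the ValidationBuilder here is a cut-down stand-in with only Add / RequireThat, and throwing ArgumentNullException for the null guard is an assumption (the spec only says the base "guards null"). It relies on the Microsoft.Extensions.Options package for IValidateOptions<TOptions> and ValidateOptionsResult.

```csharp
using System;
using System.Collections.Generic;
using Microsoft.Extensions.Options;

// Stand-in accumulator for this sketch only; the real builder also has rule primitives.
public sealed class ValidationBuilder
{
    private readonly List<string> _failures = new();

    public IReadOnlyList<string> Failures => _failures;
    public bool IsValid => _failures.Count == 0;

    // Records an unconditional failure and returns the builder for chaining.
    public ValidationBuilder Add(string message)
    {
        _failures.Add(message);
        return this;
    }

    // Records the message only when the condition is false.
    public ValidationBuilder RequireThat(bool ok, string message) => ok ? this : Add(message);
}

public abstract class OptionsValidatorBase<TOptions> : IValidateOptions<TOptions>
    where TOptions : class
{
    // The IValidateOptions entry point: owns the builder and the Success/Fail tail.
    public ValidateOptionsResult Validate(string? name, TOptions options)
    {
        // Assumption: a null options instance is treated as a programming error.
        ArgumentNullException.ThrowIfNull(options);

        var builder = new ValidationBuilder();
        Validate(builder, options); // the override records every failure; no early return

        return builder.IsValid
            ? ValidateOptionsResult.Success
            : ValidateOptionsResult.Fail(builder.Failures);
    }

    // Subclasses implement only the rules.
    protected abstract void Validate(ValidationBuilder builder, TOptions options);
}
```

Because the aggregation lives in one place, a subclass cannot accidentally short-circuit: it has no result to return, only a builder to record on.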

2. Rule primitives — ValidationBuilder

ValidationBuilder is the accumulator passed into the override. Primitives both check a value and append a consistently-worded "<field> <reason>" message on failure; escape hatches cover the rest:

| Primitive | Checks | Failure wording (from Checks) |
| --- | --- | --- |
| Required(value, field) | non-null, non-whitespace string | "<field> is required" |
| Port(value, field) | int in 1-65535 | "<field> must be between 1 and 65535 (was <value>)" |
| HostPort(value, field) | host:port with port 1-65535 | "<field> must be 'host:port' with port 1-65535 (was '<value>')" |
| PositiveTimeSpan(value, field) | > TimeSpan.Zero | "<field> must be a positive duration (was <value>)" |
| OneOf(value, allowed, field) | case-insensitive membership | "<field> must be one of [<allowed>] (was '<value>')" |
| MinCount(value, min, field) | collection ≥ min items | "<field> must contain at least <min> item(s) (had <n>)" |
| RequireThat(ok, message) | arbitrary boolean (cross-field, custom) | caller-supplied |
| Add(message) | unconditional failure | caller-supplied |

Properties: Failures (read-only accumulated list) and IsValid. Every method returns the builder for chaining. Add/RequireThat carry the rules that are genuinely app-specific (e.g. MxGateway's "ExecutablePath must point to a .exe", ScadaBridge's heartbeat-vs-threshold ordering) without forcing them into a primitive.
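To make the primitive contract concrete, here is a small self-contained sketch of a builder with a subset of the primitives (HostPort is omitted for brevity). It is an illustration under assumptions, not the library's code: the parameter types, the ", " separator used when listing the allowed values, and the field names in the usage lines are all hypothetical. The failure wording follows the table above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Usage: every failing check is recorded; nothing stops at the first failure.
var v = new ValidationBuilder()
    .Required("", "Ldap.Server")
    .Port(0, "Node.RemotingPort")
    .MinCount(new[] { "a" }, 2, "Cluster.SeedNodes");

Console.WriteLine(string.Join("\n", v.Failures));
// Ldap.Server is required
// Node.RemotingPort must be between 1 and 65535 (was 0)
// Cluster.SeedNodes must contain at least 2 item(s) (had 1)

public sealed class ValidationBuilder
{
    private readonly List<string> _failures = new();

    public IReadOnlyList<string> Failures => _failures;
    public bool IsValid => _failures.Count == 0;

    // Escape hatches: every primitive below funnels through these two.
    public ValidationBuilder Add(string message) { _failures.Add(message); return this; }
    public ValidationBuilder RequireThat(bool ok, string message) => ok ? this : Add(message);

    public ValidationBuilder Required(string? value, string field) =>
        RequireThat(!string.IsNullOrWhiteSpace(value), $"{field} is required");

    public ValidationBuilder Port(int value, string field) =>
        RequireThat(value is >= 1 and <= 65535,
            $"{field} must be between 1 and 65535 (was {value})");

    public ValidationBuilder PositiveTimeSpan(TimeSpan value, string field) =>
        RequireThat(value > TimeSpan.Zero,
            $"{field} must be a positive duration (was {value})");

    // Case-insensitive membership; the ", " separator is an assumption.
    public ValidationBuilder OneOf(string? value, IReadOnlyCollection<string> allowed, string field) =>
        RequireThat(
            value is not null && allowed.Contains(value, StringComparer.OrdinalIgnoreCase),
            $"{field} must be one of [{string.Join(", ", allowed)}] (was '{value}')");

    // A null collection is counted as empty in this sketch.
    public ValidationBuilder MinCount<T>(IReadOnlyCollection<T>? value, int min, string field)
    {
        var count = value?.Count ?? 0;
        return RequireThat(count >= min,
            $"{field} must contain at least {min} item(s) (had {count})");
    }
}
```

Routing every primitive through RequireThat keeps the accumulation logic in one method, so a new primitive only has to supply a condition and its wording.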

3. DI wiring — AddValidatedOptions

builder.Services.AddValidatedOptions<ClusterOptions, ClusterOptionsValidator>(
    builder.Configuration, "ScadaBridge:Cluster");

Binds ScadaBridge:Cluster to ClusterOptions, registers ClusterOptionsValidator as a singleton IValidateOptions<ClusterOptions>, and calls ValidateOnStart(). Returns the OptionsBuilder<TOptions> for further chaining (e.g. .PostConfigure(...)). This collapses the three-line idiom every module repeats (AddOptions().Bind(...).ValidateOnStart() + AddSingleton<IValidateOptions<...>, ...>()) into one call.

The validator is registered as a singleton (it backs the singleton options factory). It must be singleton-safe — no scoped dependencies. All current validators are stateless, so this holds.

When a section bound this way fails, the .NET options pipeline raises OptionsValidationException at host start (because of ValidateOnStart()), with all accumulated messages.
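One plausible shape for the extension, assuming it simply wraps the idiom it replaces. The containing class name ValidatedOptionsExtensions is hypothetical, and the real library may differ in details. ValidateOnStart() ships in Microsoft.Extensions.Options from .NET 8 (earlier versions get it from Microsoft.Extensions.Hosting), and Bind comes from Microsoft.Extensions.Options.ConfigurationExtensions.

```csharp
using Microsoft.Extensions.Configuration;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Options;

public static class ValidatedOptionsExtensions
{
    public static OptionsBuilder<TOptions> AddValidatedOptions<TOptions, TValidator>(
        this IServiceCollection services,
        IConfiguration configuration,
        string sectionPath)
        where TOptions : class
        where TValidator : class, IValidateOptions<TOptions>
    {
        // Register the validator as a singleton so the options pipeline picks it up.
        services.AddSingleton<IValidateOptions<TOptions>, TValidator>();

        // Bind the section and force validation at host start instead of on first use.
        return services
            .AddOptions<TOptions>()
            .Bind(configuration.GetSection(sectionPath))
            .ValidateOnStart();
    }
}
```

Returning the OptionsBuilder<TOptions> keeps the call composable with the standard options API, which is what allows the .PostConfigure(...) chaining mentioned above.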

4. Pre-host preflight — ConfigPreflight

For keys that must be valid before the host / DI / actor system exists, ConfigPreflight reads raw IConfiguration and accumulates failures the same way:

// Read the role up front so the role-conditional rule below can branch on it.
var role = configuration["ScadaBridge:Node:Role"];

ConfigPreflight.For(configuration)
    .Require("ScadaBridge:Node:Role", v => v is "Central" or "Site", "must be 'Central' or 'Site'")
    .RequireValue("ScadaBridge:Node:NodeHostname")
    .RequirePort("ScadaBridge:Node:RemotingPort")
    .When(role == "Site", p => p.RequireValue("ScadaBridge:Node:SiteId"))
    .ThrowIfInvalid();

.ThrowIfInvalid() throws InvalidOperationException when any failure was recorded, with this exact envelope:

Configuration validation failed:
  - <field> <reason>
  - <field> <reason>

Byte-compatibility with ScadaBridge's StartupValidator. ScadaBridge's StartupValidator.Validate throws $"Configuration validation failed:\n{string.Join("\n", errors.Select(e => $"  - {e}"))}". ConfigPreflight.ThrowIfInvalid() produces the identical string ("Configuration validation failed:\n" followed by the same "  - " lines, joined with \n). The migration is a behaviour-preserving swap: same exception type (InvalidOperationException), same message bytes. This is verified in the library's ConfigPreflightTests and is the reason the message format is pinned in §0.
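As a worked example of the envelope, the snippet below builds the message from a list of failure lines using only the base class library. PreflightEnvelope is a hypothetical helper name for illustration, the second sample failure line is made up, and the two-space indent before each "-" is taken from the envelope block shown above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var errors = new List<string>
{
    "ScadaBridge:Node:RemotingPort must be between 1 and 65535 (was '0')",
    "ScadaBridge:Node:NodeHostname is required", // illustrative wording only
};

Console.WriteLine(PreflightEnvelope.Format(errors));
// Configuration validation failed:
//   - ScadaBridge:Node:RemotingPort must be between 1 and 65535 (was '0')
//   - ScadaBridge:Node:NodeHostname is required

public static class PreflightEnvelope
{
    // Header line, then one indented "- " line per failure, all joined with "\n".
    public static string Format(IEnumerable<string> errors)
    {
        var lines = errors.Select(e => "  - " + e);
        return "Configuration validation failed:\n" + string.Join("\n", lines);
    }
}
```

Note that the separator is a bare "\n" rather than Environment.NewLine, so the bytes are the same on Windows and Linux hosts.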

.When(condition, block) carries role-conditional rules (ScadaBridge only validates database / security / gRPC-port keys when the node is Central or Site respectively) without an if ladder.
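The fluent surface described in this section could be sketched roughly as below. This is an assumption-laden illustration rather than the library's implementation: the predicate type (Func<string?, bool>), the Action<ConfigPreflight> block type, and the "is required" wording for RequireValue are all guesses. Only the method names, the port-range wording, and the envelope come from this spec. It needs the Microsoft.Extensions.Configuration abstractions for IConfiguration.

```csharp
using System;
using System.Collections.Generic;
using Microsoft.Extensions.Configuration;

public sealed class ConfigPreflight
{
    private readonly IConfiguration _config;
    private readonly List<string> _errors = new();

    private ConfigPreflight(IConfiguration config) => _config = config;

    public static ConfigPreflight For(IConfiguration config) => new(config);

    // Core rule: read the raw value, record "<key> <reason>" when the predicate fails.
    public ConfigPreflight Require(string key, Func<string?, bool> predicate, string reason)
    {
        if (!predicate(_config[key])) _errors.Add($"{key} {reason}");
        return this;
    }

    // Assumed wording; the spec does not pin the RequireValue message.
    public ConfigPreflight RequireValue(string key) =>
        Require(key, v => !string.IsNullOrWhiteSpace(v), "is required");

    public ConfigPreflight RequirePort(string key) =>
        Require(key,
            v => int.TryParse(v, out var port) && port is >= 1 and <= 65535,
            $"must be between 1 and 65535 (was '{_config[key]}')");

    // Role-conditional rules: the block runs against the same accumulator.
    public ConfigPreflight When(bool condition, Action<ConfigPreflight> block)
    {
        if (condition) block(this);
        return this;
    }

    // Throws once, with every recorded failure in the shared envelope.
    public void ThrowIfInvalid()
    {
        if (_errors.Count == 0) return;
        throw new InvalidOperationException(
            "Configuration validation failed:\n" +
            string.Join("\n", _errors.ConvertAll(e => "  - " + e)));
    }
}
```

Because When passes the same instance into the block, conditional failures land in the same list and appear in the same exception as the unconditional ones.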

5. Per-project migration

| Project | Current state | Primary gaps | What normalizes |
| --- | --- | --- | --- |
| OtOpcUa | No options validation at all: options are bound with bare .Bind() (LdapOptions, OpcUa); zero IValidateOptions / ValidateOnStart in the repo. The only validation-shaped type is the dormant C# DraftValidator (draft/generation content; real enforcement is DB-side sp_ValidateDraft), which is out of scope. | No startup validation of Ldap / OpcUa sections; a bad value fails opaquely on first use. | Optional adoption: add OptionsValidatorBase subclasses + AddValidatedOptions for the sections worth guarding. DraftValidator/DraftSnapshot stay per-project untouched. Lightest consumer. |
| MxGateway | One large GatewayOptionsValidator : IValidateOptions<GatewayOptions> (~360 LOC, 9 sub-validators, private AddIfBlank/AddIfNotPositive/AddIfInvalidPath helpers); wired via AddGatewayConfiguration (AddOptions().BindConfiguration().ValidateOnStart()). | Hand-rolled accumulation + helpers duplicate the base; bespoke DI wiring duplicates AddValidatedOptions. | GatewayOptionsValidator → OptionsValidatorBase<GatewayOptions> (delete the List<string>/tail/helpers; keep the domain rules); AddGatewayConfiguration → AddValidatedOptions<GatewayOptions, GatewayOptionsValidator>. Domain rules unchanged. |
| ScadaBridge | Heaviest. Four per-module *OptionsValidator : IValidateOptions<T> (Cluster / Security / HealthMonitoring / AuditLog) each with their own List<string> accumulation, wired through bespoke AddXxx extensions; plus a raw-config pre-Akka StartupValidator. | Four copies of the accumulation plumbing + bespoke DI wiring; StartupValidator open-codes the preflight envelope. | Each *OptionsValidator → OptionsValidatorBase<T>; each module's AddXxx → AddValidatedOptions; StartupValidator → ConfigPreflight (byte-compatible message, §4). Domain rules unchanged. |

No sister-repo adoption is in scope for this release — the library is built; adoption is the follow-on tracked in ../GAPS.md. (Unlike the observability pass, which carried one in-pass MxGateway adoption, this pass is library-only.)

6. Acceptance (what "converged" means)

A project is converged when: (a) every options validator it owns derives from OptionsValidatorBase<TOptions> and records failures on the supplied ValidationBuilder (no private List<string> plumbing, no early return); (b) every bind-and-validate registration goes through AddValidatedOptions<TOptions, TValidator>(config, sectionPath); (c) any pre-host raw-config checks go through ConfigPreflight and surface via ThrowIfInvalid(); (d) all validation accumulates every failure and surfaces them together (OptionsValidationException at host start, or InvalidOperationException from ConfigPreflight); and (e) failure wording for the shared primitives comes from the library's Checks seam, identical across the fleet. Each app's options classes and domain rules stay its own; only the plumbing is shared. OtOpcUa's DraftValidator is explicitly exempt — it is not part of the converged surface.