lmxopcua/docs/v2/implementation/phase-1-configuration-and-admin-scaffold.md
Joseph Doherty 592fa79e3c Add Phase 0 + Phase 1 detailed implementation plans under docs/v2/implementation/ with a phase-gate model so the work can be verified for compliance to the v2 design as it lands. Three-gate structure per phase (entry / mid / exit) with explicit compliance-check categories: schema compliance (live DB introspected against config-db-schema.md DDL via xUnit), decision compliance (every decision number cited in the phase doc must have at least one code/test citation in the codebase, verified via git grep), visual compliance (Admin UI screenshots reviewed side-by-side against ScadaLink CentralUI's equivalent screens), behavioral compliance (per-phase end-to-end smoke test that always passes at exit, never "known broken fix later"), stability compliance (cross-cutting protections from driver-stability.md wired up and regression-tested for Tier C drivers), and documentation compliance (any deviation from v2 design docs reflected back as decision-log updates with explicit "supersedes" notes). Exit gate requires two-reviewer signoff and an exit-gate-{phase}.md record; silent deviation is the failure mode the gates exist to make impossible to ship. Phase 0 doc covers the mechanical LmxOpcUa → OtOpcUa rename with 9 tasks, 7 compliance checks, and a completion checklist that gates on baseline test count parity. 
Phase 1 doc covers the largest greenfield phase — 5 work streams (Core.Abstractions, Configuration project with EF Core schema + stored procs + LiteDB cache + generation-diff applier, Core with GenericDriverNodeManager rename + IAddressSpaceBuilder + driver isolation, Server with Microsoft.Extensions.Hosting replacing TopShelf + credential-bound bootstrap, Admin Blazor Server app mirroring ScadaLink CentralUI verbatim with LDAP cookie auth + draft/diff/publish workflow + UNS structure management + equipment CRUD + release-reservation and merge-equipment operator flows) — with task-level acceptance criteria, a 14-step end-to-end smoke test, and decision citation requirements for #1-125. New decisions #126-127 capture the gate model and per-phase doc structure. Cross-references added to plan.md Reference Documents section.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-17 11:25:09 -04:00


Phase 1 — Configuration Project + Core.Abstractions + Admin UI Scaffold

Status: DRAFT — implementation plan for Phase 1 of the v2 build (plan.md §6).

Branch: v2/phase-1-configuration
Estimated duration: 4–6 weeks (largest greenfield phase; most foundational)
Predecessor: Phase 0 (phase-0-rename-and-net10.md)
Successor: Phase 2 (Galaxy parity refactor)

Phase Objective

Stand up the central configuration substrate for the v2 fleet:

  1. Core.Abstractions project — driver capability interfaces (IDriver, ITagDiscovery, IReadable, IWritable, ISubscribable, IAlarmSource, IHistoryProvider, IRediscoverable, IHostConnectivityProbe, IDriverConfigEditor, DriverAttributeInfo)
  2. Configuration project — central MSSQL schema + EF Core migrations + stored procedures + LiteDB local cache + generation-diff application logic
  3. Core project — GenericDriverNodeManager (renamed from LmxNodeManager), driver-hosting infrastructure, OPC UA server lifecycle, address-space registration via IAddressSpaceBuilder
  4. Server project — Microsoft.Extensions.Hosting-based Windows Service host (replacing TopShelf), bootstrap from Configuration using node-bound credential, register drivers, start Core
  5. Admin project — Blazor Server admin app scaffolded with ScadaLink CentralUI parity (Bootstrap 5, dark sidebar, LDAP cookie auth, three admin roles, draft → publish → rollback workflow, cluster/node/namespace/equipment/tag CRUD)

No driver instances yet (Galaxy stays in legacy in-process Host until Phase 2). The phase exit requires that an empty cluster can be created in Admin, an empty generation can be published, and a node can fetch the published generation — proving the configuration substrate works end-to-end.

Scope — What Changes

| Concern | Change |
|---|---|
| New projects | 5 new src projects + 5 matching test projects |
| Existing v1 Host project | Refactored to consume Core.Abstractions interfaces against its existing Galaxy implementation — but not split into Proxy/Host/Shared yet (Phase 2) |
| LmxNodeManager | Renamed to GenericDriverNodeManager in Core, with IDriver swapped in for IMxAccessClient. The existing v1 Host instantiates GalaxyNodeManager : GenericDriverNodeManager (legacy in-process) — see plan.md §5a |
| Service hosting | TopShelf removed; Microsoft.Extensions.Hosting BackgroundService used (decision #30) |
| Central config DB | New SQL Server database OtOpcUaConfig provisioned from EF Core migrations |
| LDAP authentication for Admin | Admin.Security project mirrors ScadaLink.Security; cookie auth + JWT API endpoint |
| Local LiteDB cache on each node | New config_cache.db per node; bootstraps from central DB or cache |

Scope — What Does NOT Change

| Item | Reason |
|---|---|
| Galaxy out-of-process split | Phase 2 |
| Any new driver (Modbus, AB, S7, etc.) | Phase 3+ |
| OPC UA wire behavior | Galaxy address space still served exactly as v1; the Configuration substrate is read but not yet driving everything |
| Equipment-class template integration with future schemas repo | EquipmentClassRef is a nullable hook column; no validation yet (decisions #112, #115) |
| Per-driver custom config editors in Admin | Generic JSON editor only in v2.0 (decision #27); driver-specific editors land in their respective phases |
| Consumer cutover (ScadaBridge / Ignition / SystemPlatform IO) | Phases 6–8 |
| Equipment Protocol Survey | External prerequisite — ideally runs in parallel with Phase 1 (handoff §"Equipment Protocol Survey") |

Entry Gate Checklist

  • Phase 0 exit gate cleared (rename complete, all v1 tests pass under OtOpcUa names)
  • v2 branch is clean
  • Phase 0 PR merged
  • SQL Server 2019+ instance available for development (local dev box minimum; shared dev instance for integration tests)
  • LDAP / GLAuth dev instance available for Admin auth integration testing
  • ScadaLink CentralUI source accessible at C:\Users\dohertj2\Desktop\scadalink-design\ for parity reference
  • All Phase 1-relevant design docs reviewed: plan.md §4–5, config-db-schema.md (entire), admin-ui.md (entire), driver-stability.md §"Cross-Cutting Protections" (sets context for Core.Abstractions scope)
  • Decisions #1–125 read at least skim-level; key ones for Phase 1: #14–22, #25, #28, #30, #32–33, #46–51, #79–125

Evidence file: docs/v2/implementation/entry-gate-phase-1.md recording date, signoff, environment availability.

Task Breakdown

Phase 1 is large — broken into 5 work streams (A–E) that can partly overlap. A typical sequencing: A → B → (C and D in parallel) → E.

Stream A — Core.Abstractions (1 week)

Task A.1 — Define driver capability interfaces

Create src/ZB.MOM.WW.OtOpcUa.Core.Abstractions/ (.NET 10, no dependencies). Define:

public interface IDriver { /* lifecycle, metadata, health */ }
public interface ITagDiscovery { /* discover tags/hierarchy from backend */ }
public interface IReadable { /* on-demand read */ }
public interface IWritable { /* on-demand write */ }
public interface ISubscribable { /* data change subscriptions */ }
public interface IAlarmSource { /* alarm events + acknowledgment */ }
public interface IHistoryProvider { /* historical reads */ }
public interface IRediscoverable { /* opt-in change-detection signal */ }
public interface IHostConnectivityProbe { /* per-host runtime status */ }
public interface IDriverConfigEditor { /* Admin UI plug point per driver */ }
public interface IAddressSpaceBuilder { /* core-owned tree builder */ }

Plus the data models referenced from the interfaces:

public sealed record DriverAttributeInfo(
    string FullName,
    DriverDataType DriverDataType,
    bool IsArray,
    uint? ArrayDim,
    SecurityClassification SecurityClass,
    bool IsHistorized);
public enum DriverDataType { Boolean, Int16, Int32, Int64, UInt16, UInt32, UInt64, Float32, Float64, String, DateTime, Reference, Custom }
public enum SecurityClassification { FreeAccess, Operate, SecuredWrite, VerifiedWrite, Tune, Configure, ViewOnly }

Acceptance:

  • All interfaces compile in a project with zero dependencies beyond BCL
  • xUnit test project asserts (via reflection) that no interface returns or accepts a type from Core or Configuration (interface independence per decision #59)
  • Each interface XML doc cites the design decision(s) it implements (e.g. IRediscoverable cites #54)
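
The interface-independence check above could be built on a small reflection helper. The following is a minimal sketch, not the project's actual test: `InterfaceIndependence`, `FindViolations`, and the `isForbidden` predicate are hypothetical names introduced here for illustration. It walks each interface's method signatures (property accessors included) and unwraps generic arguments and array element types so that `Task<Foo>` or `Foo[]` is checked as `Foo` too.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;

public static class InterfaceIndependence
{
    // Returns one "Interface.Member: Type" entry per signature type that the
    // caller-supplied predicate classifies as forbidden.
    public static IReadOnlyList<string> FindViolations(
        IEnumerable<Type> interfaces, Func<Type, bool> isForbidden)
    {
        var violations = new List<string>();
        foreach (var iface in interfaces.Where(t => t.IsInterface))
        {
            // GetMethods() on an interface also returns property/event accessors.
            foreach (MethodInfo method in iface.GetMethods())
            {
                var signatureTypes = method.GetParameters()
                    .Select(p => p.ParameterType)
                    .Append(method.ReturnType)
                    .SelectMany(t => Flatten(t));

                foreach (var type in signatureTypes.Where(isForbidden))
                    violations.Add($"{iface.Name}.{method.Name}: {type}");
            }
        }
        return violations;
    }

    // Yields the type itself plus any element types and generic arguments.
    private static IEnumerable<Type> Flatten(Type type)
    {
        yield return type;
        if (type.HasElementType)
            foreach (var inner in Flatten(type.GetElementType()))
                yield return inner;
        if (type.IsGenericType)
            foreach (var arg in type.GetGenericArguments().SelectMany(t => Flatten(t)))
                yield return arg;
    }
}
```

An xUnit test would pass every interface exported by the Core.Abstractions assembly and assert the result is empty. Note that a plain namespace-prefix predicate for Core would also match Core.Abstractions itself, so the predicate presumably needs to exclude the abstractions namespace explicitly.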

Task A.2 — Define DriverTypeRegistry

public sealed class DriverTypeRegistry
{
    public DriverTypeMetadata Get(string driverType);
    public IEnumerable<DriverTypeMetadata> All();
}

public sealed record DriverTypeMetadata(
    string TypeName,                                          // "Galaxy" | "ModbusTcp" | ...
    NamespaceKindCompatibility AllowedNamespaceKinds,         // per decision #111
    string DriverConfigJsonSchema,                            // per decision #91
    string DeviceConfigJsonSchema,                            // optional
    string TagConfigJsonSchema);

[Flags]
public enum NamespaceKindCompatibility
{
    Equipment = 1, SystemPlatform = 2, Simulated = 4
}

In v2.0, only the Galaxy type is registered (AllowedNamespaceKinds = SystemPlatform). Phase 3+ extends the registry with further driver types.

Acceptance:

  • Registry compiles, has unit tests for: register a type, look it up, reject duplicate registration, enumerate all
  • Galaxy registration entry exists with AllowedNamespaceKinds = SystemPlatform per decision #111
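
As an illustration of the acceptance criteria above, a dictionary-backed registry might look like the sketch below. The `Register` method is an assumption: the class outline in this task lists only `Get` and `All`, but the tests call for registering a type and rejecting duplicates. The exception types and the case-sensitive comparison are likewise illustrative choices, not decisions taken from the design docs.

```csharp
using System;
using System.Collections.Generic;

[Flags]
public enum NamespaceKindCompatibility { Equipment = 1, SystemPlatform = 2, Simulated = 4 }

public sealed record DriverTypeMetadata(
    string TypeName,
    NamespaceKindCompatibility AllowedNamespaceKinds,
    string DriverConfigJsonSchema,
    string DeviceConfigJsonSchema,
    string TagConfigJsonSchema);

public sealed class DriverTypeRegistry
{
    private readonly Dictionary<string, DriverTypeMetadata> _types =
        new Dictionary<string, DriverTypeMetadata>(StringComparer.Ordinal);

    // Assumed API: rejects a second registration of the same TypeName.
    public void Register(DriverTypeMetadata metadata)
    {
        if (!_types.TryAdd(metadata.TypeName, metadata))
            throw new InvalidOperationException(
                $"Driver type '{metadata.TypeName}' is already registered.");
    }

    // Throws for unknown types rather than returning null (an assumption).
    public DriverTypeMetadata Get(string driverType) =>
        _types.TryGetValue(driverType, out var metadata)
            ? metadata
            : throw new KeyNotFoundException($"Unknown driver type '{driverType}'.");

    public IEnumerable<DriverTypeMetadata> All() => _types.Values;
}
```

The Galaxy entry would then be registered once at startup with `NamespaceKindCompatibility.SystemPlatform`; the JSON schema strings are placeholders until the real schemas exist.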

Stream B — Configuration project (1.5 weeks)

Task B.1 — EF Core schema + initial migration

Create src/ZB.MOM.WW.OtOpcUa.Configuration/ (.NET 10, EF Core 10).

Implement DbContext with entities matching config-db-schema.md exactly:

  • ServerCluster, ClusterNode, ClusterNodeCredential
  • Namespace (generation-versioned per decision #123)
  • UnsArea, UnsLine
  • ConfigGeneration
  • DriverInstance, Device, Equipment, Tag, PollGroup
  • ClusterNodeGenerationState, ConfigAuditLog
  • ExternalIdReservation (NOT generation-versioned per decision #124)

Generate the initial migration:

dotnet ef migrations add InitialSchema --project src/ZB.MOM.WW.OtOpcUa.Configuration

Acceptance:

  • Applying the migration to a clean SQL Server instance produces the schema in config-db-schema.md
  • Schema-validation test (SchemaComplianceTests) introspects the live DB and asserts every table/column/index/constraint matches the doc
  • Test runs in CI against a SQL Server container

Task B.2 — Stored procedures via MigrationBuilder.Sql

Add stored procedures from config-db-schema.md §"Stored Procedures":

  • sp_GetCurrentGenerationForCluster
  • sp_GetGenerationContent
  • sp_RegisterNodeGenerationApplied
  • sp_PublishGeneration (with the MERGE against ExternalIdReservation per decision #124)
  • sp_RollbackToGeneration
  • sp_ValidateDraft (structural checks only; content and JSON-schema validation runs in managed validator code in the Admin app per decision #91)
  • sp_ComputeGenerationDiff
  • sp_ReleaseExternalIdReservation (FleetAdmin only)

Use CREATE OR ALTER style in MigrationBuilder.Sql() blocks so procs version with the schema.

Acceptance:

  • Each proc has at least one xUnit test exercising the happy path + at least one error path
  • sp_PublishGeneration has a concurrency test: two simultaneous publishes for the same cluster → one wins, one fails with a recognizable error
  • sp_GetCurrentGenerationForCluster has an authorization test: caller bound to NodeId X cannot read cluster Y's generation

Task B.3 — Authorization model (SQL principals + GRANT)

Add a separate migration AuthorizationGrants that:

  • Creates two SQL roles: OtOpcUaNode, OtOpcUaAdmin
  • Grants EXECUTE on the appropriate procs per config-db-schema.md §"Authorization Model"
  • Grants no direct table access to either role

Acceptance:

  • A test running as an OtOpcUaNode-roled principal can call only the node procs, not the admin procs
  • A test running as an OtOpcUaAdmin-roled principal can call the publish/rollback procs
  • A direct SELECT * FROM dbo.ConfigGeneration from an OtOpcUaNode principal is denied

Task B.4 — JSON-schema validators (managed code)

In Configuration.Validation/, implement the managed validators that run alongside sp_ValidateDraft (invoked from the Admin app pre-publish per decision #91):

  • UNS segment regex (^[a-z0-9-]{1,32}$ or _default)
  • Path length (≤200 chars)
  • UUID immutability across generations
  • Same-cluster namespace binding (decision #122)
  • ZTag/SAPID reservation pre-flight (decision #124)
  • EquipmentId derivation rule (decision #125)
  • Driver type ↔ namespace kind allowed (decision #111)
  • JSON-schema validation per DriverType from DriverTypeRegistry

Acceptance:

  • One unit test per rule, both passing and failing cases
  • Cross-rule integration test: a draft that violates 3 rules surfaces all 3 (not just the first)
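
The "surface all violations, not just the first" requirement suggests validators that accumulate errors instead of throwing. A minimal sketch for the two UNS path rules follows. `UnsPathValidator` is a hypothetical name, and joining segments with `/` to measure path length is an assumption, since this document does not state how the path string is assembled.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class UnsPathValidator
{
    // Segment rule from this task: ^[a-z0-9-]{1,32}$ or the literal _default.
    private static readonly Regex Segment =
        new Regex("^[a-z0-9-]{1,32}$", RegexOptions.CultureInvariant);
    private const string DefaultSegment = "_default";
    private const int MaxPathLength = 200;

    // Returns every rule violation found; an empty list means the path is valid.
    public static IReadOnlyList<string> Validate(IReadOnlyList<string> segments)
    {
        var errors = new List<string>();
        for (var i = 0; i < segments.Count; i++)
        {
            var s = segments[i];
            if (s != DefaultSegment && !Segment.IsMatch(s))
                errors.Add($"Segment {i} ('{s}') must match ^[a-z0-9-]{{1,32}}$ or be _default.");
        }

        // Assumption: path length is measured on the '/'-joined segments.
        var path = string.Join("/", segments);
        if (path.Length > MaxPathLength)
            errors.Add($"Path is {path.Length} chars; the limit is {MaxPathLength}.");

        return errors;
    }
}
```

Because each rule appends to the same list, a draft that breaks several rules reports all of them in one pass, which is what the cross-rule integration test checks.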

Task B.5 — LiteDB local cache

In Configuration.LocalCache/, implement the LiteDB schema from config-db-schema.md §"Local LiteDB Cache":

public interface ILocalConfigCache
{
    Task<GenerationCacheEntry?> GetMostRecentAsync(string clusterId);
    Task PutAsync(GenerationCacheEntry entry);
    Task PruneOldGenerationsAsync(string clusterId, int keepLatest = 10);
}

Acceptance:

  • Round-trip test: write a generation snapshot, read it back, assert deep equality
  • Pruning test: write 15 generations, prune to 10, assert the 5 oldest are gone
  • Corruption test: corrupt the LiteDB file, assert the loader fails fast with a clear error
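
The pruning rule (keep the latest N generations per cluster) can be expressed independently of LiteDB. The sketch below assumes generations are ordered by a numeric `GenerationId`; `CachedGeneration` and `CachePruning` are hypothetical stand-ins, not the schema from config-db-schema.md.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical minimal shape of a cache entry, for illustration only.
public sealed record CachedGeneration(string ClusterId, long GenerationId);

public static class CachePruning
{
    // Returns the entries to delete so that only the keepLatest
    // highest-numbered generations remain for the given cluster.
    public static IReadOnlyList<CachedGeneration> SelectForPruning(
        IEnumerable<CachedGeneration> entries, string clusterId, int keepLatest = 10) =>
        entries.Where(e => e.ClusterId == clusterId)
               .OrderByDescending(e => e.GenerationId)
               .Skip(keepLatest)
               .ToList();
}
```

With 15 cached generations and `keepLatest = 10`, the selection contains the 5 oldest entries, and entries belonging to other clusters are never selected.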

Task B.6 — Generation-diff application logic

In Configuration.Apply/, implement the diff-and-apply logic that runs on each node when a new generation arrives:

public interface IGenerationApplier
{
    Task<ApplyResult> ApplyAsync(GenerationSnapshot from, GenerationSnapshot to, CancellationToken ct);
}

Diff per entity type, dispatch to driver Reinitialize / cache flush as needed.

Acceptance:

  • Diff test: from = empty, to = (1 driver + 5 equipment + 50 tags) → Added for each
  • Diff test: from = (above), to = same with one tag's Name changed → Modified for one tag, no other changes
  • Diff test: from = (above), to = same with one equipment removed → Removed for the equipment + cascading Removed for its tags
  • Apply test against an in-memory mock driver: applies the diff in correct order, idempotent on retry
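
The per-entity diff described above amounts to a keyed comparison of two snapshots. The following sketch shows one way to classify entries as Added, Modified, or Removed. It assumes each entity type is available as a dictionary keyed by a stable logical ID and that entity values have value equality (for example, C# records); `GenerationDiff`, `EntityChange`, and `ChangeKind` are illustrative names. Cascading removals (equipment to its tags) are not modelled here.

```csharp
using System.Collections.Generic;
using System.Linq;

public enum ChangeKind { Added, Modified, Removed }

public sealed record EntityChange(ChangeKind Kind, string Id);

public static class GenerationDiff
{
    // Compares two keyed snapshots of one entity type.
    public static IReadOnlyList<EntityChange> Compute<T>(
        IReadOnlyDictionary<string, T> previous, IReadOnlyDictionary<string, T> next)
    {
        var changes = new List<EntityChange>();

        // Present in next: either new, or changed relative to previous.
        foreach (var pair in next)
        {
            if (!previous.TryGetValue(pair.Key, out var before))
                changes.Add(new EntityChange(ChangeKind.Added, pair.Key));
            else if (!EqualityComparer<T>.Default.Equals(before, pair.Value))
                changes.Add(new EntityChange(ChangeKind.Modified, pair.Key));
        }

        // Present only in previous: removed.
        foreach (var key in previous.Keys.Where(k => !next.ContainsKey(k)))
            changes.Add(new EntityChange(ChangeKind.Removed, key));

        return changes;
    }
}
```

Keying on a stable ID rather than on Name is what makes a rename show up as a single Modified entry instead of a Removed plus Added pair.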

Stream C — Core project (1 week, can parallel with Stream D)

Task C.1 — Rename LmxNodeManager → GenericDriverNodeManager

Per plan.md §5a:

  • Lift the file from Host/OpcUa/LmxNodeManager.cs to Core/OpcUa/GenericDriverNodeManager.cs
  • Swap IMxAccessClient for IDriver (composing IReadable / IWritable / ISubscribable)
  • Swap GalaxyAttributeInfo for DriverAttributeInfo
  • Promote GalaxyRuntimeProbeManager interactions to use IHostConnectivityProbe
  • Move MxDataTypeMapper and SecurityClassificationMapper to a new Driver.Galaxy.Mapping/ (still in legacy Host until Phase 2)

Acceptance:

  • v1 IntegrationTests still pass against the renamed class (parity is the gate, decision #62 — class is "foundation, not rewrite")
  • Reflection test asserts GenericDriverNodeManager has no static or instance reference to any Galaxy-specific type

Task C.2 — Derive GalaxyNodeManager : GenericDriverNodeManager (legacy in-process)

In the existing Host project, add a thin GalaxyNodeManager that:

  • Inherits from GenericDriverNodeManager
  • Wires up MxDataTypeMapper, SecurityClassificationMapper, the probe manager, etc.
  • Replaces direct instantiation of the renamed class

Acceptance:

  • v1 IntegrationTests pass identically with GalaxyNodeManager instantiated instead of the old direct class
  • Existing dev Galaxy still serves the same address space byte-for-byte (compare with a baseline browse capture)

Task C.3 — IAddressSpaceBuilder API (decision #52)

Implement the streaming builder API drivers use to register nodes:

public interface IAddressSpaceBuilder
{
    IFolderBuilder Folder(string browseName, string displayName);
    IVariableBuilder Variable(string browseName, DriverDataType type, ...);
    void AddProperty(string browseName, object value);
}

Refactor GenericDriverNodeManager.BuildAddressSpace to consume IAddressSpaceBuilder (driver streams in tags rather than buffering them).

Acceptance:

  • Build a Galaxy address space via the new builder API, assert byte-equivalent OPC UA browse output vs v1
  • Memory profiling test: building a 5000-tag address space via the builder uses less than 50% of the peak RAM of the buffered approach

Task C.4 — Driver hosting + isolation (decision #65, #74)

Implement the in-process driver host that:

  • Loads each DriverInstance row's driver assembly
  • Catches and contains driver exceptions (driver isolation, decision #12)
  • Surfaces IDriver.Reinitialize() to the configuration applier
  • Tracks per-driver allocation footprint (GetMemoryFootprint() polled every 30s per driver-stability.md)
  • Flushes optional caches on budget breach
  • Marks drivers Faulted (Bad quality on their nodes) if Reinitialize fails

Acceptance:

  • Integration test: spin up two mock drivers; one throws on Read; the other keeps working. Quality on the broken driver's nodes goes Bad; the other driver is unaffected.
  • Memory-budget test: mock driver reports growing footprint above budget; cache-flush is triggered; footprint drops; no process action taken.
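
The isolation behaviour in the first acceptance test can be sketched as a per-driver guard that contains exceptions and records a fault, so that one failing driver never propagates into another driver's calls. `DriverGuard` and `DriverHealth` are hypothetical names. Marking the driver Faulted on any exception is a simplification: the task text reserves Faulted for a failed Reinitialize, and a read failure would more likely just produce Bad quality.

```csharp
using System;
using System.Threading.Tasks;

public enum DriverHealth { Healthy, Faulted }

// One guard instance per driver, so fault state is never shared.
public sealed class DriverGuard
{
    public DriverHealth Health { get; private set; } = DriverHealth.Healthy;
    public string LastError { get; private set; } = string.Empty;

    // Runs a driver call; on failure, records the fault and returns the
    // fallback (e.g. a Bad-quality value) instead of rethrowing.
    public async Task<T> InvokeAsync<T>(Func<Task<T>> driverCall, T fallback)
    {
        try
        {
            return await driverCall().ConfigureAwait(false);
        }
        catch (Exception ex)
        {
            Health = DriverHealth.Faulted;
            LastError = ex.Message;
            return fallback;
        }
    }
}
```

A production host would probably let cancellation and fatal exceptions pass through rather than catching `Exception` wholesale.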

Stream D — Server project (4 days, can parallel with Stream C)

Task D.1 — Microsoft.Extensions.Hosting Windows Service host (decision #30)

Replace TopShelf with Microsoft.Extensions.Hosting:

  • New Program.cs using Host.CreateApplicationBuilder()
  • BackgroundService that owns the OPC UA server lifecycle
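  • builder.Services.AddWindowsService() registers the Windows Service lifetime (the Host.CreateApplicationBuilder() counterpart of IHostBuilder.UseWindowsService())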
  • Configuration bootstrap from appsettings.json (NodeId + ClusterId + DB conn) per decision #18

Acceptance:

  • dotnet run runs interactively (console mode)
  • Installed as a Windows Service (sc create OtOpcUa ...), starts and stops cleanly
  • Service install + uninstall cycle leaves no leftover state

Task D.2 — Bootstrap with credential-bound DB connection (decisions #46, #83)

On startup:

  • Read Cluster.NodeId + Cluster.ClusterId + ConfigDatabase.ConnectionString from appsettings.json
  • Connect to central DB with the configured principal (gMSA / SQL login / cert-mapped)
  • Call sp_GetCurrentGenerationForCluster(@NodeId, @ClusterId) — the proc verifies the connected principal is bound to NodeId
  • If proc rejects → fail startup loudly with the principal mismatch message

Acceptance:

  • Test: principal bound to Node A boots successfully when configured with NodeId = A
  • Test: principal bound to Node A configured with NodeId = B → startup fails with Unauthorized and the service does not stay running
  • Test: principal bound to Node A in cluster C1 configured with ClusterId = C2 → Forbidden

Task D.3 — LiteDB cache fallback on DB outage

If the central DB is unreachable at startup, load the most recent cached generation from LiteDB and start with it. Log loudly. Continue retrying the central DB in the background; on reconnect, resume normal poll cycle.

Acceptance:

  • Test: with central DB unreachable, node starts from cache, logs ConfigDbUnreachableUsingCache event, OPC UA endpoint serves the cached config
  • Test: cache empty AND central DB unreachable → startup fails with NoConfigAvailable (decision #21)
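
The startup fallback above reduces to a small decision: use the central DB if reachable, otherwise the cache, otherwise fail. The sketch below shows that shape only. `fetchCentral`, `readCache`, and `log` are hypothetical delegates standing in for the stored-procedure call, `ILocalConfigCache.GetMostRecentAsync`, and the logger; a generation is represented as a plain string for brevity; the background retry loop is omitted.

```csharp
using System;
using System.Threading.Tasks;

public sealed class NoConfigAvailableException : Exception
{
    public NoConfigAvailableException(Exception inner)
        : base("Central DB unreachable and local cache is empty.", inner) { }
}

public static class ConfigBootstrap
{
    // Returns the generation to start with and whether it came from the cache.
    public static async Task<(string Generation, bool FromCache)> LoadAsync(
        Func<Task<string>> fetchCentral, Func<Task<string>> readCache, Action<string> log)
    {
        try
        {
            return (await fetchCentral().ConfigureAwait(false), false);
        }
        catch (Exception ex)
        {
            // Central DB failed: fall back to the most recent cached generation.
            var cached = await readCache().ConfigureAwait(false);
            if (cached is null)
                throw new NoConfigAvailableException(ex);

            log("ConfigDbUnreachableUsingCache");
            return (cached, true);
        }
    }
}
```

A real implementation would presumably catch only connectivity failures here, so that an authorization rejection from Task D.2 still fails startup instead of silently falling back to the cache.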

Stream E — Admin project (2.5 weeks)

Task E.1 — Project scaffold (ScadaLink CentralUI layout)

Copy the project layout from scadalink-design/src/ScadaLink.CentralUI/ (decision #104):

  • src/ZB.MOM.WW.OtOpcUa.Admin/: Razor Components project, .NET 10, AddInteractiveServerComponents
  • Auth/AuthEndpoints.cs, Auth/CookieAuthenticationStateProvider.cs
  • Components/Layout/MainLayout.razor, Components/Layout/NavMenu.razor
  • Components/Pages/Login.razor, Components/Pages/Dashboard.razor
  • Components/Shared/{DataTable, ConfirmDialog, LoadingSpinner, NotAuthorizedView, RedirectToLogin, TimestampDisplay, ToastNotification}.razor
  • EndpointExtensions.cs, ServiceCollectionExtensions.cs

Plus src/ZB.MOM.WW.OtOpcUa.Admin.Security/ (decision #104): LdapAuthService, RoleMapper, JwtTokenService, AuthorizationPolicies mirroring ScadaLink.Security.

Acceptance:

  • App builds and runs locally
  • Login page renders with OtOpcUa branding (only the <h4> text differs from ScadaLink)
  • Visual diff between OtOpcUa and ScadaLink login pages: only the brand text differs (compliance check #3)

Task E.2 — LDAP authentication + role mapping

Wire up LdapAuthService against the dev GLAuth instance per Security.md. Map LDAP groups to admin roles:

  • OtOpcUaAdmins → FleetAdmin
  • OtOpcUaConfigEditors → ConfigEditor
  • OtOpcUaViewers → ReadOnly

Plus cluster-scoped grants per decision #105 (LDAP group OtOpcUaConfigEditors-LINE3 → ConfigEditor + ClusterId = LINE3-OPCUA claim).
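
The group-to-role mapping, including the cluster-scoped case, can be pictured as a lookup that yields a role plus a scope. This is a sketch under stated assumptions rather than the ScadaLink `RoleMapper`: `RoleGrant`, `RoleMapperSketch`, the `"*"` marker for fleet-wide scope, and the hard-coded table are all hypothetical, and how the LINE3 group suffix resolves to the LINE3-OPCUA cluster ID is assumed to come from configuration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// ClusterScope "*" means fleet-wide; otherwise it is a single ClusterId.
public sealed record RoleGrant(string Role, string ClusterScope);

public static class RoleMapperSketch
{
    // Hypothetical static table; a real mapper would load this from settings.
    private static readonly Dictionary<string, RoleGrant> GroupMap =
        new Dictionary<string, RoleGrant>(StringComparer.OrdinalIgnoreCase)
        {
            ["OtOpcUaAdmins"] = new RoleGrant("FleetAdmin", "*"),
            ["OtOpcUaConfigEditors"] = new RoleGrant("ConfigEditor", "*"),
            ["OtOpcUaViewers"] = new RoleGrant("ReadOnly", "*"),
            ["OtOpcUaConfigEditors-LINE3"] = new RoleGrant("ConfigEditor", "LINE3-OPCUA"),
        };

    // Unknown groups are ignored rather than treated as errors.
    public static IReadOnlyList<RoleGrant> Map(IEnumerable<string> ldapGroups) =>
        ldapGroups.Where(g => GroupMap.ContainsKey(g)).Select(g => GroupMap[g]).ToList();

    // Assumption: FleetAdmin and ConfigEditor may edit; ReadOnly may not.
    public static bool CanEdit(IEnumerable<RoleGrant> grants, string clusterId) =>
        grants.Any(g => (g.Role == "FleetAdmin" || g.Role == "ConfigEditor")
                        && (g.ClusterScope == "*" || g.ClusterScope == clusterId));
}
```

The same scope check would drive the /clusters list filter, so a cluster-scoped ConfigEditor sees only the clusters named in their grants.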

Acceptance:

  • Login as a FleetAdmin-mapped user → redirected to /, sidebar shows admin sections
  • Login as a ReadOnly-mapped user → redirected to /, sidebar shows view-only sections
  • Login as a cluster-scoped ConfigEditor → only their permitted clusters appear in /clusters
  • Login with bad credentials → redirected to /login?error=... with the LDAP error surfaced

Task E.3 — Cluster CRUD pages

Implement per admin-ui.md:

  • /clusters — Cluster list (FleetAdmin sees all, ConfigEditor sees scoped)
  • /clusters/{ClusterId} — Cluster Detail with all 9 tabs (Overview / Namespaces / UNS Structure / Drivers / Devices / Equipment / Tags / Generations / Audit), but Drivers/Devices/Equipment/Tags tabs initially show empty tables (no driver implementations yet — Phase 2+)
  • "New cluster" workflow per admin-ui.md §"Add a new cluster" — creates cluster row, opens initial draft with default namespaces (decision #123)
  • ApplicationUri auto-suggest on node create per decision #86

Acceptance:

  • Create a cluster → cluster row exists, initial draft exists with Equipment-kind namespace
  • Edit cluster name → change reflected in list + detail
  • Disable a cluster → no longer offered as a target for new nodes; existing nodes keep showing in list with "Disabled" badge

Task E.4 — Draft → diff → publish workflow (decision #89)

Implement per admin-ui.md §"Draft Editor", §"Diff Viewer", §"Generation History":

  • /clusters/{Id}/draft — full draft editor with auto-save (debounced 500ms per decision #97)
  • /clusters/{Id}/draft/diff — three-column diff viewer
  • /clusters/{Id}/generations — list of historical generations with rollback action
  • Live sp_ValidateDraft invocation in the validation panel; publish disabled while errors exist
  • Publish dialog requires Notes; runs sp_PublishGeneration in a transaction

Acceptance:

  • Create draft → validation panel runs and shows clean state for empty draft
  • Add an invalid Equipment row (bad UNS segment) → validation panel surfaces the error inline + publish stays disabled
  • Fix the row → validation panel goes green + publish enables
  • Publish → generation moves Draft → Published; previous Published moves to Superseded; audit log row created
  • Roll back to a prior generation → new generation cloned from target; previous generation moves to Superseded; nodes pick up the new generation on next poll
  • The "Push now" button per decision #96 is rendered but disabled with the "Available in v2.1" label

Task E.5 — UNS Structure + Equipment + Namespace tabs

Implement the three hybrid tabs:

  • Namespaces tab — list with click-to-edit-in-draft
  • UNS Structure tab — tree view with drag-drop reorganize, rename with live impact preview
  • Equipment tab — list with default sort by ZTag, search across all 5 identifiers

CSV import for Equipment per the revised schema in admin-ui.md (no EquipmentId column; matches by EquipmentUuid for updates per decision #125).

Acceptance:

  • Add a UnsArea via draft → publishes → appears in tree
  • Drag a UnsLine to a different UnsArea → impact preview shows count of affected equipment + signals → publish moves it; UUIDs preserved
  • Equipment CSV import: 10 new rows → all get system-generated EquipmentId + EquipmentUuid; ZTag uniqueness checked against ExternalIdReservation (decision #124)
  • Equipment CSV import: 1 row with existing EquipmentUuid → updates the matched row's editable fields

Task E.6 — Generic JSON config editor for DriverConfig

Per decision #94 — until per-driver editors land in their respective phases, use a generic JSON editor with schema-driven validation against DriverTypeRegistry's registered JSON schema for the driver type.

Acceptance:

  • Add a Galaxy DriverInstance in a draft → JSON editor renders the Galaxy DriverConfig schema
  • Editing produces live validation errors per the schema
  • Saving with errors → publish stays disabled

Task E.7 — Real-time updates via SignalR (admin-ui.md §"Real-Time Updates")

Two SignalR hubs:

  • FleetStatusHub — pushes ClusterNodeGenerationState changes
  • AlertHub — pushes new sticky alerts (crash-loop circuit trips, failed applies)

Backend IHostedService polls every 5s and diffs.

Acceptance:

  • Open Cluster Detail in two browser tabs → publish in tab A → tab B's "current generation" updates within 5s without page reload
  • Simulate a LastAppliedStatus = Failed for a node → AlertHub pushes a sticky alert that doesn't auto-clear

Task E.8 — Release reservation + Merge equipment workflows

Per admin-ui.md §"Release an external-ID reservation" and §"Merge or rebind equipment":

  • Release flow: FleetAdmin only, requires reason, audit-logged via sp_ReleaseExternalIdReservation
  • Merge flow: opens a draft that disables source equipment, re-points tags, releases + re-reserves IDs

Acceptance:

  • Release a reservation → ReleasedAt set in DB + audit log entry created with reason
  • After release: same (Kind, Value) can be reserved by a different EquipmentUuid in a future publish
  • Merge equipment A → B: draft preview shows tag re-pointing + ID re-reservation; publish executes atomically; A is disabled with EquipmentMergedAway audit entry

Compliance Checks (run at exit gate)

A phase-1-compliance.ps1 script that exits non-zero on any failure:

Schema compliance

# Run all migrations against a clean SQL Server instance
dotnet ef database update --project src/ZB.MOM.WW.OtOpcUa.Configuration --connection "Server=...;Database=OtOpcUaConfig_Test_$([DateTimeOffset]::UtcNow.ToUnixTimeSeconds());..."

# Run schema-introspection tests
dotnet test tests/ZB.MOM.WW.OtOpcUa.Configuration.Tests --filter "Category=SchemaCompliance"

Expected: every table, column, index, FK, CHECK, and stored procedure in config-db-schema.md is present and matches.

Decision compliance

# For each decision number Phase 1 implements (#9, #14-22, #25, #28, #30, #32-33, #46-51, #79-125),
# verify at least one citation exists in source, tests, or migrations:
$decisions = @(9, 14, 15, 16, 17, 18, 19, 20, 21, 22, 25, 28, 30, 32, 33, 46, 47, 48, 49, 50, 51, 79..125)
foreach ($d in $decisions) {
    # -w matches whole words only, so "decision #9" does not also match "decision #91"
    $hits = git grep -w "decision #$d" -- 'src/' 'tests/' 'docs/v2/implementation/'
    if (-not $hits) { Write-Error "Decision #$d has no citation in code or tests"; exit 1 }
}

Visual compliance (Admin UI)

Manual screenshot review:

  1. Login page side-by-side with ScadaLink's Login.razor rendered
  2. Sidebar + main layout side-by-side with ScadaLink's MainLayout.razor + NavMenu.razor
  3. Dashboard side-by-side with ScadaLink's Dashboard.razor
  4. Reconnect overlay triggered (kill the SignalR connection) — same modal as ScadaLink

Reviewer answers: "could the same operator move between apps without noticing?" Y/N. N = blocking.

Behavioral compliance (end-to-end smoke test)

dotnet test tests/ZB.MOM.WW.OtOpcUa.IntegrationTests --filter "Category=Phase1Smoke"

The smoke test:

  1. Spins up SQL Server in a container
  2. Runs all migrations
  3. Creates an OtOpcUaAdmin SQL principal + OtOpcUaNode principal bound to a test NodeId
  4. Starts the Admin app
  5. Creates a cluster + 1 node + Equipment-kind namespace via Admin API
  6. Opens a draft, adds 1 UnsArea + 1 UnsLine + 1 Equipment + 0 tags (empty)
  7. Publishes the draft
  8. Boots a Server instance configured with the test NodeId
  9. Asserts the Server fetched the published generation via sp_GetCurrentGenerationForCluster
  10. Asserts the Server's ClusterNodeGenerationState row reports Applied
  11. Adds a tag in a new draft, publishes
  12. Asserts the Server picks up the new generation within 30s (next poll)
  13. Rolls back to generation 1
  14. Asserts the Server picks up the rollback within 30s

Expected: all 14 steps pass. Smoke test runs in CI on every PR to v2/phase-1-* branches.

Stability compliance

For Phase 1 the only stability concern is the in-process driver isolation primitives (used later by Phase 3+ drivers, but built in Phase 1):

  • IDriver.Reinitialize() semantics tested
  • Driver-instance allocation tracking + cache flush tested with a mock driver
  • Crash-loop circuit breaker tested with a mock driver that throws on every Reinitialize
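
For the crash-loop circuit breaker, a sliding-window failure counter is one plausible shape. The sketch below is an assumption-heavy illustration: `CrashLoopBreaker`, the threshold and window parameters, and the manual `Reset` are hypothetical, since the actual policy lives in driver-stability.md and is not reproduced in this document.

```csharp
using System;
using System.Collections.Generic;

public sealed class CrashLoopBreaker
{
    private readonly int _maxFailures;
    private readonly TimeSpan _window;
    private readonly Queue<DateTimeOffset> _failures = new Queue<DateTimeOffset>();

    public CrashLoopBreaker(int maxFailures, TimeSpan window)
    {
        _maxFailures = maxFailures;
        _window = window;
    }

    // Once open, the circuit stays open until Reset is called (sticky).
    public bool IsOpen { get; private set; }

    // Records a failed Reinitialize and opens the circuit when maxFailures
    // have occurred within the sliding window ending at 'now'.
    public void RecordFailure(DateTimeOffset now)
    {
        _failures.Enqueue(now);
        while (_failures.Count > 0 && now - _failures.Peek() > _window)
            _failures.Dequeue();
        if (_failures.Count >= _maxFailures)
            IsOpen = true;
    }

    public void Reset()
    {
        _failures.Clear();
        IsOpen = false;
    }
}
```

Passing the timestamp in, rather than reading the clock inside the class, keeps the mock-driver test deterministic: a driver that throws on every Reinitialize can be driven through the threshold without real delays.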

Galaxy is still legacy in-process in Phase 1 — Tier C protections for Galaxy land in Phase 2.

Documentation compliance

# Every Phase 1 task in this doc must either be Done or have a deferral note in exit-gate-phase-1.md
# Every decision the phase implements must be reflected in plan.md (no silent decisions)
# Schema doc + admin-ui doc must be updated if implementation deviated

Completion Checklist

The exit gate signs off only when every item below is checked. Each item links to the verifying artifact (test name, screenshot, log line, etc.).

Stream A — Core.Abstractions

  • All 11 capability interfaces defined and compiling
  • DriverAttributeInfo + supporting enums defined
  • DriverTypeRegistry implemented with Galaxy registration
  • Interface-independence reflection test passes

Stream B — Configuration

  • EF Core migration InitialSchema applies cleanly to a clean SQL Server
  • Schema introspection test asserts the live schema matches config-db-schema.md
  • All stored procedures present and tested (happy path + error paths)
  • sp_PublishGeneration concurrency test passes (one wins, one fails)
  • Authorization tests pass (Node principal limited to its cluster, Admin can read/write fleet-wide)
  • All validation rules from Task B.4 in Configuration.Validation have unit tests
  • LiteDB cache round-trip + pruning + corruption tests pass
  • Generation-diff applier handles add/remove/modify across all entity types

Stream C — Core

  • LmxNodeManager renamed to GenericDriverNodeManager; v1 IntegrationTests still pass
  • GalaxyNodeManager : GenericDriverNodeManager exists in legacy Host
  • IAddressSpaceBuilder API implemented; byte-equivalent OPC UA browse output to v1
  • Driver hosting + isolation tested with mock drivers (one fails, others continue)
  • Memory-budget cache-flush tested with mock driver

Stream D — Server

  • Microsoft.Extensions.Hosting host runs in console mode and as Windows Service
  • TopShelf removed from the codebase
  • Credential-bound bootstrap tested (correct principal succeeds; wrong principal fails)
  • LiteDB fallback on DB outage tested

Stream E — Admin

  • Admin app boots, login screen renders with ScadaLink-equivalent visual
  • LDAP cookie auth works against dev GLAuth
  • Admin roles mapped (FleetAdmin / ConfigEditor / ReadOnly)
  • Cluster-scoped grants work (decision #105)
  • Cluster CRUD works end-to-end
  • Draft → diff → publish workflow works end-to-end
  • Rollback works end-to-end
  • UNS Structure tab supports add / rename / drag-move with impact preview
  • Equipment tab supports CSV import + search across 5 identifiers
  • Generic JSON config editor renders + validates DriverConfig per registered schema
  • SignalR real-time updates work (multi-tab test)
  • Release reservation flow works + audit-logged
  • Merge equipment flow works + audit-logged

Cross-cutting

  • phase-1-compliance.ps1 runs and exits 0
  • Smoke test (14 steps) passes in CI
  • Visual compliance review signed off (operator-equivalence test)
  • All decisions cited in code/tests (git grep "decision #N" returns hits for each)
  • Adversarial review of the phase diff (/codex:adversarial-review --base v2) — findings closed or deferred with rationale
  • PR opened against v2, includes: link to this doc, link to exit-gate record, compliance script output, smoke test logs, adversarial review output, screenshots
  • Reviewer signoff (one reviewer beyond the implementation lead)
  • exit-gate-phase-1.md recorded

Risks and Mitigations

| Risk | Likelihood | Impact | Mitigation |
|---|---|---|---|
| EF Core 10 idiosyncrasies vs the documented schema | Medium | Medium | Schema-introspection test catches drift; validate early in Stream B |
| sp_ValidateDraft cross-table checks complex enough to be slow | Medium | Medium | Per-decision-cited test exists; benchmark with a large draft (1000+ tags) before exit |
| Visual parity with ScadaLink slips because two component libraries diverge over time | Low | Medium | Copy ScadaLink's CSS verbatim where possible; shared component set is structurally identical |
| LDAP integration breaks against production GLAuth (different schema than dev) | Medium | High | Use the v1 LDAP layer as the integration reference; mirror its config exactly |
| Generation-diff applier has subtle bugs on edge cases (renamed entity with same logical ID) | High | High | Property-based test that generates random diffs and asserts apply-then-rebuild produces the same end state |
| ScadaLink.Security pattern works well for site-scoped roles but our cluster-scoped grants are subtly different | Medium | Medium | Side-by-side review of RoleMapper after Stream E starts; refactor if claim shape diverges |
| Phase 1 takes longer than 6 weeks | High | Medium | Mid-gate review at 3 weeks — if Stream B isn't done, defer Stream E.5–8 to a Phase 1.5 follow-up |
| MERGE against ExternalIdReservation has a deadlock pathology under concurrent publishes | Medium | High | Concurrency test in Task B.2 specifically targets this; if it deadlocks, switch to INSERT ... WHERE NOT EXISTS with explicit row locks |

Out of Scope (do not do in Phase 1)

  • Galaxy out-of-process split (Phase 2)
  • Any Modbus / AB / S7 / TwinCAT / FOCAS driver code (Phases 3–5)
  • Per-driver custom config editors in Admin (each driver's phase)
  • Equipment-class template integration with the schemas repo
  • Consumer cutover (Phases 6–8, separate planning track)
  • ACL / namespace-level authorization for OPC UA clients (corrections doc B1 — needs scoping before Phase 6, parallel work track)
  • Push-from-DB notification (decision #96 — v2.1)
  • Generation pruning operator UI (decision #93 — v2.1)
  • Cluster-scoped admin grant editor in UI (admin-ui.md "Deferred / Out of Scope" — v2.1)
  • Mobile / tablet layout