lmxopcua/docs/v2/implementation/phase-1-configuration-and-admin-scaffold.md
Joseph Doherty 1189dc87fd Close corrections-doc E2 (Aveva System Platform IO upstream-OPC-UA pattern verification) with GREEN-YELLOW verdict (decision #141) — AVEVA's OI Gateway communication driver is the documented path for AppServer to consume from arbitrary upstream OPC UA servers; multiple AVEVA partners (Software Toolbox, InSource) have published end-to-end integrations against four different non-AVEVA upstream servers (TOP Server, OPC Router, OmniServer, Cogent DataHub). No re-architecting of OtOpcUa required. Path: OPC UA node → OI Gateway → SuiteLink → $DDESuiteLinkDIObject → AppServer attribute. Recommended AppServer floor: System Platform 2023 R2 Patch 01.
OtOpcUa-side requirements all met or trivially met by v2: Basic256Sha256 + SignAndEncrypt + username token (transport security covers this), reject-and-trust cert workflow, endpoint URL must NOT include /discovery suffix (forum-documented failure mode), hostname-stable certs (decision #86 already enforces this since clients pin trust to ApplicationUri), OI Gateway service must NOT run under SYSTEM (deployment-guide concern). Two integrator-burden risks tracked: validation/GxP paperwork (no AVEVA blueprint exists for non-AVEVA upstream servers in Part 11 deployments — engage QA/regulatory in Year 1) and unpublished scale benchmarks (in-house benchmark required in Year 2 before cutover scheduling).

Phase 1 acceptance gains Task E.10 (decision #142): end-to-end AppServer-via-OI-Gateway smoke test against a Phase 1 OtOpcUa instance, catching AppServer-specific quirks (cert exchange, endpoint URL handling, service account, security mode combo) well before the Year 3 tier-3 cutover schedule. Non-blocking for Phase 1 exit if it surfaces only documentation-level fixes; blocking if it surfaces architectural incompatibility.

New file `docs/v2/aveva-system-platform-io-research.md` captures the full research with all source citations (AVEVA docs, Communications Drivers Pack readmes, Software Toolbox / InSource partner walkthroughs, Inductive Automation forum failure-mode reports). Plan.md decision log gains #141 and #142; Reference Documents section links the new doc.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-17 13:27:13 -04:00


Phase 1 — Configuration Project + Core.Abstractions + Admin UI Scaffold

Status: DRAFT — implementation plan for Phase 1 of the v2 build (plan.md §6).

Branch: v2/phase-1-configuration
Estimated duration: 4–6 weeks (largest greenfield phase; most foundational)
Predecessor: Phase 0 (phase-0-rename-and-net10.md)
Successor: Phase 2 (Galaxy parity refactor)

Phase Objective

Stand up the central configuration substrate for the v2 fleet:

  1. Core.Abstractions project — driver capability interfaces (IDriver, ITagDiscovery, IReadable, IWritable, ISubscribable, IAlarmSource, IHistoryProvider, IRediscoverable, IHostConnectivityProbe, IDriverConfigEditor, DriverAttributeInfo)
  2. Configuration project — central MSSQL schema + EF Core migrations + stored procedures + LiteDB local cache + generation-diff application logic
  3. Core project — GenericDriverNodeManager (renamed from LmxNodeManager), driver-hosting infrastructure, OPC UA server lifecycle, address-space registration via IAddressSpaceBuilder
  4. Server project — Microsoft.Extensions.Hosting-based Windows Service host (replacing TopShelf), bootstrap from Configuration using node-bound credential, register drivers, start Core
  5. Admin project — Blazor Server admin app scaffolded with ScadaLink CentralUI parity (Bootstrap 5, dark sidebar, LDAP cookie auth, three admin roles, draft → publish → rollback workflow, cluster/node/namespace/equipment/tag CRUD)

No driver instances yet (Galaxy stays in legacy in-process Host until Phase 2). The phase exit requires that an empty cluster can be created in Admin, an empty generation can be published, and a node can fetch the published generation — proving the configuration substrate works end-to-end.

Scope — What Changes

| Concern | Change |
| --- | --- |
| New projects | 5 new src projects + 5 matching test projects |
| Existing v1 Host project | Refactored to consume Core.Abstractions interfaces against its existing Galaxy implementation — but not split into Proxy/Host/Shared yet (Phase 2) |
| LmxNodeManager | Renamed to GenericDriverNodeManager in Core, with IDriver swapped in for IMxAccessClient. The existing v1 Host instantiates GalaxyNodeManager : GenericDriverNodeManager (legacy in-process) — see plan.md §5a |
| Service hosting | TopShelf removed; Microsoft.Extensions.Hosting BackgroundService used (decision #30) |
| Central config DB | New SQL Server database OtOpcUaConfig provisioned from EF Core migrations |
| LDAP authentication for Admin | Admin.Security project mirrors ScadaLink.Security; cookie auth + JWT API endpoint |
| Local LiteDB cache on each node | New config_cache.db per node; bootstraps from central DB or cache |

Scope — What Does NOT Change

| Item | Reason |
| --- | --- |
| Galaxy out-of-process split | Phase 2 |
| Any new driver (Modbus, AB, S7, etc.) | Phase 3+ |
| OPC UA wire behavior | Galaxy address space still served exactly as v1; the Configuration substrate is read but not yet driving everything |
| Equipment-class template integration with future schemas repo | EquipmentClassRef is a nullable hook column; no validation yet (decisions #112, #115) |
| Per-driver custom config editors in Admin | Generic JSON editor only in v2.0 (decision #27); driver-specific editors land in their respective phases |
| Consumer cutover (ScadaBridge / Ignition / SystemPlatform IO) | Out of v2 scope — separate integration-team track per implementation/overview.md |

Entry Gate Checklist

  • Phase 0 exit gate cleared (rename complete, all v1 tests pass under OtOpcUa names)
  • v2 branch is clean
  • Phase 0 PR merged
  • SQL Server 2019+ instance available for development (local dev box minimum; shared dev instance for integration tests)
  • LDAP / GLAuth dev instance available for Admin auth integration testing
  • ScadaLink CentralUI source accessible at C:\Users\dohertj2\Desktop\scadalink-design\ for parity reference
  • All Phase 1-relevant design docs reviewed: plan.md §4–5, config-db-schema.md (entire), admin-ui.md (entire), driver-stability.md §"Cross-Cutting Protections" (sets context for Core.Abstractions scope)
  • Decisions #1–125 read at least skim-level; key ones for Phase 1: #14–22, #25, #28, #30, #32–33, #46–51, #79–125

Evidence file: docs/v2/implementation/entry-gate-phase-1.md recording date, signoff, environment availability.

Task Breakdown

Phase 1 is large — broken into 5 work streams (A–E) that can partly overlap. A typical sequencing: A → B → (C and D in parallel) → E.

Stream A — Core.Abstractions (1 week)

Task A.1 — Define driver capability interfaces

Create src/ZB.MOM.WW.OtOpcUa.Core.Abstractions/ (.NET 10, no dependencies). Define:

public interface IDriver { /* lifecycle, metadata, health */ }
public interface ITagDiscovery { /* discover tags/hierarchy from backend */ }
public interface IReadable { /* on-demand read */ }
public interface IWritable { /* on-demand write */ }
public interface ISubscribable { /* data change subscriptions */ }
public interface IAlarmSource { /* alarm events + acknowledgment */ }
public interface IHistoryProvider { /* historical reads */ }
public interface IRediscoverable { /* opt-in change-detection signal */ }
public interface IHostConnectivityProbe { /* per-host runtime status */ }
public interface IDriverConfigEditor { /* Admin UI plug point per driver */ }
public interface IAddressSpaceBuilder { /* core-owned tree builder */ }

Plus the data models referenced from the interfaces:

public sealed record DriverAttributeInfo(
    string FullName,
    DriverDataType DriverDataType,
    bool IsArray,
    uint? ArrayDim,
    SecurityClassification SecurityClass,
    bool IsHistorized);
public enum DriverDataType { Boolean, Int16, Int32, Int64, UInt16, UInt32, UInt64, Float32, Float64, String, DateTime, Reference, Custom }
public enum SecurityClassification { FreeAccess, Operate, SecuredWrite, VerifiedWrite, Tune, Configure, ViewOnly }

Acceptance:

  • All interfaces compile in a project with zero dependencies beyond BCL
  • xUnit test project asserts (via reflection) that no interface returns or accepts a type from Core or Configuration (interface independence per decision #59)
  • Each interface XML doc cites the design decision(s) it implements (e.g. IRediscoverable cites #54)
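The interface-independence assertion above can be sketched with reflection — a minimal version, where the helper name and the "BCL-or-same-assembly" rule are illustrative, not the real test:

```csharp
using System;
using System.Linq;
using System.Reflection;

// Sketch of the interface-independence check (decision #59): every method on
// every public interface in the abstractions assembly may only mention BCL
// types or types defined in that same assembly. Names are illustrative.
public static class InterfaceIndependence
{
    public static string[] Violations(Assembly abstractions)
    {
        bool IsAllowed(Type t) =>
            t.Assembly == abstractions ||
            t.Assembly == typeof(object).Assembly ||
            (t.Namespace?.StartsWith("System") ?? false);

        return abstractions.GetExportedTypes()
            .Where(t => t.IsInterface)
            .SelectMany(i => i.GetMethods())
            .SelectMany(m => m.GetParameters().Select(p => p.ParameterType)
                              .Append(m.ReturnType))
            .SelectMany(Flatten)
            .Where(t => !IsAllowed(t))
            .Select(t => t.FullName ?? t.Name)
            .Distinct()
            .ToArray();
    }

    // Unwrap constructed generics (Task<T>, IEnumerable<T>, ...) so the inner
    // type arguments are checked as well.
    static System.Collections.Generic.IEnumerable<Type> Flatten(Type t) =>
        t.IsGenericType
            ? t.GetGenericArguments().SelectMany(Flatten).Append(t.GetGenericTypeDefinition())
            : new[] { t };
}
```

The real test would run this against the Core.Abstractions assembly and assert the returned array is empty.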

Task A.2 — Define DriverTypeRegistry

public sealed class DriverTypeRegistry
{
    public DriverTypeMetadata Get(string driverType);
    public IEnumerable<DriverTypeMetadata> All();
}

public sealed record DriverTypeMetadata(
    string TypeName,                                          // "Galaxy" | "ModbusTcp" | ...
    NamespaceKindCompatibility AllowedNamespaceKinds,         // per decision #111
    string DriverConfigJsonSchema,                            // per decision #91
    string DeviceConfigJsonSchema,                            // optional
    string TagConfigJsonSchema);

[Flags]
public enum NamespaceKindCompatibility
{
    Equipment = 1, SystemPlatform = 2, Simulated = 4
}

In v2.0, only the Galaxy type is registered (AllowedNamespaceKinds = SystemPlatform). Phase 3+ extends the registry.

Acceptance:

  • Registry compiles, has unit tests for: register a type, look it up, reject duplicate registration, enumerate all
  • Galaxy registration entry exists with AllowedNamespaceKinds = SystemPlatform per decision #111
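A minimal sketch of the registry semantics the acceptance tests exercise — register, look up, reject duplicates. The metadata record is abbreviated from the full definition above:

```csharp
using System;
using System.Collections.Generic;

// Abbreviated shapes mirroring the definitions above; only TypeName and
// AllowedNamespaceKinds are carried here for brevity.
[Flags]
public enum NamespaceKindCompatibility { Equipment = 1, SystemPlatform = 2, Simulated = 4 }

public sealed record DriverTypeMetadata(string TypeName, NamespaceKindCompatibility AllowedNamespaceKinds);

public sealed class DriverTypeRegistry
{
    private readonly Dictionary<string, DriverTypeMetadata> _types = new();

    public void Register(DriverTypeMetadata meta)
    {
        // Duplicate registration is a hard error, not a silent overwrite.
        if (!_types.TryAdd(meta.TypeName, meta))
            throw new InvalidOperationException($"Driver type '{meta.TypeName}' already registered.");
    }

    public DriverTypeMetadata Get(string driverType) => _types[driverType];
    public IEnumerable<DriverTypeMetadata> All() => _types.Values;
}
```

In v2.0 the only registration would be `new DriverTypeMetadata("Galaxy", NamespaceKindCompatibility.SystemPlatform)` per decision #111.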

Stream B — Configuration project (1.5 weeks)

Task B.1 — EF Core schema + initial migration

Create src/ZB.MOM.WW.OtOpcUa.Configuration/ (.NET 10, EF Core 10).

Implement DbContext with entities matching config-db-schema.md exactly:

  • ServerCluster, ClusterNode, ClusterNodeCredential
  • Namespace (generation-versioned per decision #123)
  • UnsArea, UnsLine
  • ConfigGeneration
  • DriverInstance, Device, Equipment, Tag, PollGroup
  • NodeAcl (generation-versioned per decision #130; data-path authorization grants per acl-design.md)
  • ClusterNodeGenerationState, ConfigAuditLog
  • ExternalIdReservation (NOT generation-versioned per decision #124)

Generate the initial migration:

dotnet ef migrations add InitialSchema --project src/ZB.MOM.WW.OtOpcUa.Configuration

Acceptance:

  • Apply migration to a clean SQL Server instance produces the schema in config-db-schema.md
  • Schema-validation test (SchemaComplianceTests) introspects the live DB and asserts every table/column/index/constraint matches the doc
  • Test runs in CI against a SQL Server container

Task B.2 — Stored procedures via MigrationBuilder.Sql

Add stored procedures from config-db-schema.md §"Stored Procedures":

  • sp_GetCurrentGenerationForCluster
  • sp_GetGenerationContent
  • sp_RegisterNodeGenerationApplied
  • sp_PublishGeneration (with the MERGE against ExternalIdReservation per decision #124)
  • sp_RollbackToGeneration
  • sp_ValidateDraft (calls into managed validator code per decision #91 — proc is structural-only, content schema validation is in the Admin app)
  • sp_ComputeGenerationDiff
  • sp_ReleaseExternalIdReservation (FleetAdmin only)

Use CREATE OR ALTER style in MigrationBuilder.Sql() blocks so procs version with the schema.
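As an illustration, the kind of SQL that goes inside a MigrationBuilder.Sql() block — the proc body below is a sketch, not the real definition from config-db-schema.md:

```sql
-- Sketch only: the real proc also verifies the connected principal is bound
-- to @NodeId before returning anything (see Task D.2).
CREATE OR ALTER PROCEDURE dbo.sp_GetCurrentGenerationForCluster
    @NodeId NVARCHAR(64), @ClusterId NVARCHAR(64)
AS
BEGIN
    SET NOCOUNT ON;
    SELECT TOP 1 GenerationId, PublishedAt
    FROM dbo.ConfigGeneration
    WHERE ClusterId = @ClusterId AND Status = 'Published'
    ORDER BY PublishedAt DESC;
END
```

Because CREATE OR ALTER is idempotent, re-running a migration chain (or adding a later migration that redefines the proc) always converges on the latest body.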

Acceptance:

  • Each proc has at least one xUnit test exercising the happy path + at least one error path
  • sp_PublishGeneration has a concurrency test: two simultaneous publishes for the same cluster → one wins, one fails with a recognizable error
  • sp_GetCurrentGenerationForCluster has an authorization test: caller bound to NodeId X cannot read cluster Y's generation

Task B.3 — Authorization model (SQL principals + GRANT)

Add a separate migration AuthorizationGrants that:

  • Creates two SQL roles: OtOpcUaNode, OtOpcUaAdmin
  • Grants EXECUTE on the appropriate procs per config-db-schema.md §"Authorization Model"
  • Grants no direct table access to either role
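A sketch of the grant migration's SQL (proc list abbreviated; the exact EXECUTE grants come from config-db-schema.md §"Authorization Model"):

```sql
-- Two roles, EXECUTE-only; neither role gets any direct table access.
CREATE ROLE OtOpcUaNode;
CREATE ROLE OtOpcUaAdmin;

-- Node-facing procs (abbreviated)
GRANT EXECUTE ON dbo.sp_GetCurrentGenerationForCluster TO OtOpcUaNode;
GRANT EXECUTE ON dbo.sp_GetGenerationContent           TO OtOpcUaNode;
GRANT EXECUTE ON dbo.sp_RegisterNodeGenerationApplied  TO OtOpcUaNode;

-- Admin-facing procs (abbreviated)
GRANT EXECUTE ON dbo.sp_PublishGeneration              TO OtOpcUaAdmin;
GRANT EXECUTE ON dbo.sp_RollbackToGeneration           TO OtOpcUaAdmin;
GRANT EXECUTE ON dbo.sp_ReleaseExternalIdReservation   TO OtOpcUaAdmin;
```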

Acceptance:

  • A test running as an OtOpcUaNode-roled principal can call only the node procs, not the admin procs
  • A test running as an OtOpcUaAdmin-roled principal can call the publish/rollback procs
  • A direct SELECT * FROM dbo.ConfigGeneration as an OtOpcUaNode principal is denied

Task B.4 — JSON-schema validators (managed code)

In Configuration.Validation/, implement validators consumed by sp_ValidateDraft (called from the Admin app pre-publish per decision #91):

  • UNS segment regex (^[a-z0-9-]{1,32}$ or _default)
  • Path length (≤200 chars)
  • UUID immutability across generations
  • Same-cluster namespace binding (decision #122)
  • ZTag/SAPID reservation pre-flight (decision #124)
  • EquipmentId derivation rule (decision #125)
  • Driver type ↔ namespace kind allowed (decision #111)
  • JSON-schema validation per DriverType from DriverTypeRegistry
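The first two rules above are simple enough to sketch directly; the regex and length limit come straight from this plan, while the helper names are illustrative (the real validators also aggregate violations rather than stopping at the first):

```csharp
using System.Text.RegularExpressions;

// Sketch of the UNS segment and path-length rules. Helper names illustrative.
public static class UnsRules
{
    // ^[a-z0-9-]{1,32}$ per the rule list; "_default" is the one allowed exception.
    private static readonly Regex Segment = new(@"^[a-z0-9-]{1,32}$", RegexOptions.Compiled);

    public static bool IsValidSegment(string s) => s == "_default" || Segment.IsMatch(s);

    // Full UNS path must stay within 200 characters.
    public static bool IsValidPath(string path) => path.Length <= 200;
}
```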

Acceptance:

  • One unit test per rule, both passing and failing cases
  • Cross-rule integration test: a draft that violates 3 rules surfaces all 3 (not just the first)

Task B.5 — LiteDB local cache

In Configuration.LocalCache/, implement the LiteDB schema from config-db-schema.md §"Local LiteDB Cache":

public interface ILocalConfigCache
{
    Task<GenerationCacheEntry?> GetMostRecentAsync(string clusterId);
    Task PutAsync(GenerationCacheEntry entry);
    Task PruneOldGenerationsAsync(string clusterId, int keepLatest = 10);
}
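An in-memory reference implementation of the same contract is useful both as a test double and to pin down the keep-latest-N pruning semantics; the real implementation stores these entries as LiteDB documents. The entry shape and Count helper here are illustrative:

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Illustrative cache entry: the real GenerationCacheEntry carries the full
// generation snapshot.
public sealed record GenerationCacheEntry(string ClusterId, long GenerationNumber, string SnapshotJson);

public sealed class InMemoryConfigCache
{
    private readonly List<GenerationCacheEntry> _entries = new();

    public Task<GenerationCacheEntry?> GetMostRecentAsync(string clusterId) =>
        Task.FromResult(_entries
            .Where(e => e.ClusterId == clusterId)
            .OrderByDescending(e => e.GenerationNumber)
            .FirstOrDefault());

    public Task PutAsync(GenerationCacheEntry entry)
    {
        _entries.Add(entry);
        return Task.CompletedTask;
    }

    // Keep the newest keepLatest generations for the cluster, drop the rest.
    public Task PruneOldGenerationsAsync(string clusterId, int keepLatest = 10)
    {
        var stale = _entries
            .Where(e => e.ClusterId == clusterId)
            .OrderByDescending(e => e.GenerationNumber)
            .Skip(keepLatest)
            .ToList();
        foreach (var e in stale) _entries.Remove(e);
        return Task.CompletedTask;
    }

    // Test-only convenience, not part of ILocalConfigCache.
    public int Count(string clusterId) => _entries.Count(e => e.ClusterId == clusterId);
}
```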

Acceptance:

  • Round-trip test: write a generation snapshot, read it back, assert deep equality
  • Pruning test: write 15 generations, prune to 10, assert the 5 oldest are gone
  • Corruption test: corrupt the LiteDB file, assert the loader fails fast with a clear error

Task B.6 — Generation-diff application logic

In Configuration.Apply/, implement the diff-and-apply logic that runs on each node when a new generation arrives:

public interface IGenerationApplier
{
    Task<ApplyResult> ApplyAsync(GenerationSnapshot from, GenerationSnapshot to, CancellationToken ct);
}

Diff per entity type, dispatch to driver Reinitialize / cache flush as needed.
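Since UUIDs are immutable across generations (decision #123's versioning model depends on this), the per-entity diff can key on UUID alone. A generic sketch — the entity and change shapes are illustrative:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public enum ChangeKind { Added, Modified, Removed }
public sealed record EntityChange(Guid Uuid, ChangeKind Kind);

public static class GenerationDiff
{
    // Diff one entity type between two generation snapshots, keyed by its
    // immutable UUID. Equality of the entity value decides Modified.
    public static List<EntityChange> Diff<T>(
        IEnumerable<T> from, IEnumerable<T> to, Func<T, Guid> key) where T : notnull
    {
        var a = from.ToDictionary(key);
        var b = to.ToDictionary(key);
        var changes = new List<EntityChange>();

        foreach (var (id, entity) in b)
            if (!a.TryGetValue(id, out var old))
                changes.Add(new EntityChange(id, ChangeKind.Added));
            else if (!old.Equals(entity))
                changes.Add(new EntityChange(id, ChangeKind.Modified));

        foreach (var id in a.Keys.Where(id => !b.ContainsKey(id)))
            changes.Add(new EntityChange(id, ChangeKind.Removed));

        return changes;
    }
}
```

The applier would run this once per entity type (drivers, equipment, tags, ...) and sequence the resulting change sets so that cascades (equipment removed → its tags removed) apply in a safe order.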

Acceptance:

  • Diff test: from = empty, to = (1 driver + 5 equipment + 50 tags) → Added for each
  • Diff test: from = (above), to = same with one tag's Name changed → Modified for one tag, no other changes
  • Diff test: from = (above), to = same with one equipment removed → Removed for the equipment + cascading Removed for its tags
  • Apply test against an in-memory mock driver: applies the diff in correct order, idempotent on retry

Stream C — Core project (1 week, can parallel with Stream D)

Task C.1 — Rename LmxNodeManager → GenericDriverNodeManager

Per plan.md §5a:

  • Lift the file from Host/OpcUa/LmxNodeManager.cs to Core/OpcUa/GenericDriverNodeManager.cs
  • Swap IMxAccessClient for IDriver (composing IReadable / IWritable / ISubscribable)
  • Swap GalaxyAttributeInfo for DriverAttributeInfo
  • Promote GalaxyRuntimeProbeManager interactions to use IHostConnectivityProbe
  • Move MxDataTypeMapper and SecurityClassificationMapper to a new Driver.Galaxy.Mapping/ (still in legacy Host until Phase 2)

Acceptance:

  • v1 IntegrationTests still pass against the renamed class (parity is the gate, decision #62 — class is "foundation, not rewrite")
  • Reflection test asserts GenericDriverNodeManager has no static or instance reference to any Galaxy-specific type

Task C.2 — Derive GalaxyNodeManager : GenericDriverNodeManager (legacy in-process)

In the existing Host project, add a thin GalaxyNodeManager that:

  • Inherits from GenericDriverNodeManager
  • Wires up MxDataTypeMapper, SecurityClassificationMapper, the probe manager, etc.
  • Replaces direct instantiation of the renamed class

Acceptance:

  • v1 IntegrationTests pass identically with GalaxyNodeManager instantiated instead of the old direct class
  • Existing dev Galaxy still serves the same address space byte-for-byte (compare with a baseline browse capture)

Task C.3 — IAddressSpaceBuilder API (decision #52)

Implement the streaming builder API drivers use to register nodes:

public interface IAddressSpaceBuilder
{
    IFolderBuilder Folder(string browseName, string displayName);
    IVariableBuilder Variable(string browseName, DriverDataType type, ...);
    void AddProperty(string browseName, object value);
}

Refactor GenericDriverNodeManager.BuildAddressSpace to consume IAddressSpaceBuilder (driver streams in tags rather than buffering them).

Acceptance:

  • Build a Galaxy address space via the new builder API, assert byte-equivalent OPC UA browse output vs v1
  • Memory profiling test: building a 5000-tag address space via the builder uses <50% the peak RAM of the buffered approach

Task C.4 — Driver hosting + isolation (decision #65, #74)

Implement the in-process driver host that:

  • Loads each DriverInstance row's driver assembly
  • Catches and contains driver exceptions (driver isolation, decision #12)
  • Surfaces IDriver.Reinitialize() to the configuration applier
  • Tracks per-driver allocation footprint (GetMemoryFootprint() polled every 30s per driver-stability.md)
  • Flushes optional caches on budget breach
  • Marks drivers Faulted (Bad quality on their nodes) if Reinitialize fails
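The containment rule in the bullets above — a throwing driver is marked Faulted while its siblings keep running — can be sketched as a guard wrapper; the DriverHost shape here is illustrative, not the real host:

```csharp
using System;
using System.Collections.Generic;

public enum DriverState { Running, Faulted }

// Illustrative host: every call into a driver goes through Guarded, so no
// driver exception can escape to the hosting process (decision #12).
public sealed class DriverHost
{
    private readonly Dictionary<string, DriverState> _states = new();

    public DriverState StateOf(string driverId) => _states[driverId];

    public T? Guarded<T>(string driverId, Func<T> call)
    {
        _states.TryAdd(driverId, DriverState.Running);
        try { return call(); }
        catch (Exception)
        {
            // Real host would also push Bad quality to the driver's nodes.
            _states[driverId] = DriverState.Faulted;
            return default;
        }
    }
}
```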

Acceptance:

  • Integration test: spin up two mock drivers; one throws on Read; the other keeps working. Quality on the broken driver's nodes goes Bad; the other driver is unaffected.
  • Memory-budget test: mock driver reports growing footprint above budget; cache-flush is triggered; footprint drops; no process action taken.

Stream D — Server project (4 days, can parallel with Stream C)

Task D.1 — Microsoft.Extensions.Hosting Windows Service host (decision #30)

Replace TopShelf with Microsoft.Extensions.Hosting:

  • New Program.cs using Host.CreateApplicationBuilder()
  • BackgroundService that owns the OPC UA server lifecycle
  • builder.Services.AddWindowsService() registers the process as a Windows Service (with Host.CreateApplicationBuilder, the IServiceCollection extension replaces the older IHostBuilder.UseWindowsService())
  • Configuration bootstrap from appsettings.json (NodeId + ClusterId + DB conn) per decision #18
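A minimal Program.cs sketch for those bullets, assuming the Microsoft.Extensions.Hosting.WindowsServices package is referenced (OpcUaServerService is an illustrative shell, not the real lifecycle owner):

```csharp
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Hosting;

var builder = Host.CreateApplicationBuilder(args);

// No-op when run from a console; integrates with the SCM when installed
// as a Windows service.
builder.Services.AddWindowsService(options => options.ServiceName = "OtOpcUa");

// Owns the OPC UA server lifecycle.
builder.Services.AddHostedService<OpcUaServerService>();

await builder.Build().RunAsync();

sealed class OpcUaServerService : BackgroundService
{
    protected override Task ExecuteAsync(CancellationToken stoppingToken)
    {
        // Start the OPC UA server here; tear it down when stoppingToken fires.
        return Task.Delay(Timeout.Infinite, stoppingToken);
    }
}
```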

Acceptance:

  • dotnet run runs interactively (console mode)
  • Installed as a Windows Service (sc create OtOpcUa ...), starts and stops cleanly
  • Service install + uninstall cycle leaves no leftover state

Task D.2 — Bootstrap with credential-bound DB connection (decisions #46, #83)

On startup:

  • Read Cluster.NodeId + Cluster.ClusterId + ConfigDatabase.ConnectionString from appsettings.json
  • Connect to central DB with the configured principal (gMSA / SQL login / cert-mapped)
  • Call sp_GetCurrentGenerationForCluster(@NodeId, @ClusterId) — the proc verifies the connected principal is bound to NodeId
  • If proc rejects → fail startup loudly with the principal mismatch message

Acceptance:

  • Test: principal bound to Node A boots successfully when configured with NodeId = A
  • Test: principal bound to Node A configured with NodeId = B → startup fails with Unauthorized and the service does not stay running
  • Test: principal bound to Node A in cluster C1 configured with ClusterId = C2 → Forbidden

Task D.3 — LiteDB cache fallback on DB outage

If the central DB is unreachable at startup, load the most recent cached generation from LiteDB and start with it. Log loudly. Continue retrying the central DB in the background; on reconnect, resume normal poll cycle.
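The startup decision tree reduces to three cases; a sketch of the policy (names illustrative), matching decision #21's fail-fast behavior when neither source is available:

```csharp
using System;

public enum ConfigSource { CentralDb, LocalCache }

// Illustrative bootstrap policy: central DB wins, cache is the fallback,
// and no config at all is a hard startup failure (NoConfigAvailable).
public static class BootstrapPolicy
{
    public static ConfigSource Choose(bool centralDbReachable, bool cacheHasGeneration) =>
        (centralDbReachable, cacheHasGeneration) switch
        {
            (true, _)      => ConfigSource.CentralDb,
            (false, true)  => ConfigSource.LocalCache,   // log ConfigDbUnreachableUsingCache
            (false, false) => throw new InvalidOperationException("NoConfigAvailable"),
        };
}
```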

Acceptance:

  • Test: with central DB unreachable, node starts from cache, logs ConfigDbUnreachableUsingCache event, OPC UA endpoint serves the cached config
  • Test: cache empty AND central DB unreachable → startup fails with NoConfigAvailable (decision #21)

Stream E — Admin project (2.5 weeks)

Task E.1 — Scaffold Admin project from ScadaLink CentralUI

Copy the project layout from scadalink-design/src/ScadaLink.CentralUI/ (decision #104):

  • src/ZB.MOM.WW.OtOpcUa.Admin/: Razor Components project, .NET 10, AddInteractiveServerComponents
  • Auth/AuthEndpoints.cs, Auth/CookieAuthenticationStateProvider.cs
  • Components/Layout/MainLayout.razor, Components/Layout/NavMenu.razor
  • Components/Pages/Login.razor, Components/Pages/Dashboard.razor
  • Components/Shared/{DataTable, ConfirmDialog, LoadingSpinner, NotAuthorizedView, RedirectToLogin, TimestampDisplay, ToastNotification}.razor
  • EndpointExtensions.cs, ServiceCollectionExtensions.cs

Plus src/ZB.MOM.WW.OtOpcUa.Admin.Security/ (decision #104): LdapAuthService, RoleMapper, JwtTokenService, AuthorizationPolicies mirroring ScadaLink.Security.

Acceptance:

  • App builds and runs locally
  • Login page renders with OtOpcUa branding (only the <h4> text differs from ScadaLink)
  • Visual diff between OtOpcUa and ScadaLink login pages: only the brand text differs (compliance check #3)

Task E.2 — LDAP authentication and role mapping

Wire up LdapAuthService against the dev GLAuth instance per Security.md. Map LDAP groups to admin roles:

  • OtOpcUaAdmins → FleetAdmin
  • OtOpcUaConfigEditors → ConfigEditor
  • OtOpcUaViewers → ReadOnly

Plus cluster-scoped grants per decision #105 (LDAP group OtOpcUaConfigEditors-LINE3 → ConfigEditor + ClusterId = LINE3-OPCUA claim).

Acceptance:

  • Login as a FleetAdmin-mapped user → redirected to /, sidebar shows admin sections
  • Login as a ReadOnly-mapped user → redirected to /, sidebar shows view-only sections
  • Login as a cluster-scoped ConfigEditor → only their permitted clusters appear in /clusters
  • Login with bad credentials → redirected to /login?error=... with the LDAP error surfaced

Task E.3 — Cluster CRUD pages

Implement per admin-ui.md:

  • /clusters — Cluster list (FleetAdmin sees all, ConfigEditor sees scoped)
  • /clusters/{ClusterId} — Cluster Detail with all 9 tabs (Overview / Namespaces / UNS Structure / Drivers / Devices / Equipment / Tags / Generations / Audit), but Drivers/Devices/Equipment/Tags tabs initially show empty tables (no driver implementations yet — Phase 2+)
  • "New cluster" workflow per admin-ui.md §"Add a new cluster" — creates cluster row, opens initial draft with default namespaces (decision #123)
  • ApplicationUri auto-suggest on node create per decision #86

Acceptance:

  • Create a cluster → cluster row exists, initial draft exists with Equipment-kind namespace
  • Edit cluster name → change reflected in list + detail
  • Disable a cluster → no longer offered as a target for new nodes; existing nodes keep showing in list with "Disabled" badge

Task E.4 — Draft → diff → publish workflow (decision #89)

Implement per admin-ui.md §"Draft Editor", §"Diff Viewer", §"Generation History":

  • /clusters/{Id}/draft — full draft editor with auto-save (debounced 500ms per decision #97)
  • /clusters/{Id}/draft/diff — three-column diff viewer
  • /clusters/{Id}/generations — list of historical generations with rollback action
  • Live sp_ValidateDraft invocation in the validation panel; publish disabled while errors exist
  • Publish dialog requires Notes; runs sp_PublishGeneration in a transaction

Acceptance:

  • Create draft → validation panel runs and shows clean state for empty draft
  • Add an invalid Equipment row (bad UNS segment) → validation panel surfaces the error inline + publish stays disabled
  • Fix the row → validation panel goes green + publish enables
  • Publish → generation moves Draft → Published; previous Published moves to Superseded; audit log row created
  • Roll back to a prior generation → new generation cloned from target; previous generation moves to Superseded; nodes pick up the new generation on next poll
  • The "Push now" button per decision #96 is rendered but disabled with the "Available in v2.1" label

Task E.5 — UNS Structure + Equipment + Namespace tabs

Implement the three hybrid tabs:

  • Namespaces tab — list with click-to-edit-in-draft
  • UNS Structure tab — tree view with drag-drop reorganize, rename with live impact preview
  • Equipment tab — list with default sort by ZTag, search across all 5 identifiers

CSV import for Equipment per the revised schema in admin-ui.md (no EquipmentId column; matches by EquipmentUuid for updates per decision #125).

Acceptance:

  • Add a UnsArea via draft → publishes → appears in tree
  • Drag a UnsLine to a different UnsArea → impact preview shows count of affected equipment + signals → publish moves it; UUIDs preserved
  • Equipment CSV import: 10 new rows → all get system-generated EquipmentId + EquipmentUuid; ZTag uniqueness checked against ExternalIdReservation (decision #124)
  • Equipment CSV import: 1 row with existing EquipmentUuid → updates the matched row's editable fields

Task E.6 — Generic JSON config editor for DriverConfig

Per decision #94 — until per-driver editors land in their respective phases, use a generic JSON editor with schema-driven validation against DriverTypeRegistry's registered JSON schema for the driver type.

Acceptance:

  • Add a Galaxy DriverInstance in a draft → JSON editor renders the Galaxy DriverConfig schema
  • Editing produces live validation errors per the schema
  • Saving with errors → publish stays disabled

Task E.7 — Real-time updates via SignalR (admin-ui.md §"Real-Time Updates")

Two SignalR hubs:

  • FleetStatusHub — pushes ClusterNodeGenerationState changes
  • AlertHub — pushes new sticky alerts (crash-loop circuit trips, failed applies)

Backend IHostedService polls every 5s and diffs.
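The core of that poll-and-diff loop is comparing the previous snapshot of ClusterNodeGenerationState rows to the current one and pushing only the changed nodes through FleetStatusHub. A sketch of the diff step (record shapes illustrative):

```csharp
using System.Collections.Generic;
using System.Linq;

// Illustrative projection of a ClusterNodeGenerationState row; record equality
// gives change detection for free.
public sealed record NodeGenState(string NodeId, long Generation, string LastAppliedStatus);

public static class FleetStatusDiff
{
    // Returns the nodes whose state changed (or appeared) since the last poll;
    // the hosted service would forward these via IHubContext<FleetStatusHub>.
    public static IReadOnlyList<NodeGenState> Changed(
        IReadOnlyDictionary<string, NodeGenState> previous,
        IEnumerable<NodeGenState> current) =>
        current.Where(n => !previous.TryGetValue(n.NodeId, out var p) || p != n).ToList();
}
```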

Acceptance:

  • Open Cluster Detail in two browser tabs → publish in tab A → tab B's "current generation" updates within 5s without page reload
  • Simulate a LastAppliedStatus = Failed for a node → AlertHub pushes a sticky alert that doesn't auto-clear

Task E.8 — Release reservation + Merge equipment workflows

Per admin-ui.md §"Release an external-ID reservation" and §"Merge or rebind equipment":

  • Release flow: FleetAdmin only, requires reason, audit-logged via sp_ReleaseExternalIdReservation
  • Merge flow: opens a draft that disables source equipment, re-points tags, releases + re-reserves IDs

Acceptance:

  • Release a reservation → ReleasedAt set in DB + audit log entry created with reason
  • After release: same (Kind, Value) can be reserved by a different EquipmentUuid in a future publish
  • Merge equipment A → B: draft preview shows tag re-pointing + ID re-reservation; publish executes atomically; A is disabled with EquipmentMergedAway audit entry

Task E.10 — AppServer-via-OI-Gateway end-to-end smoke test (decision #142)

Per aveva-system-platform-io-research.md, the tier-3 (Year 3) cutover redirects AVEVA System Platform's IO layer from direct-equipment connections to consuming OtOpcUa via OI Gateway. Catching AppServer-specific quirks at Phase 1 — well before the cutover schedule — protects the Year 3 timeline and ensures OtOpcUa's transport security model is actually compatible with the most opinionated downstream consumer.

Stand up a non-production AppServer + OtOpcUa pairing and verify:

  1. AppServer (System Platform 2023 R2 Patch 01 or later) installed with the matching Communications Drivers Pack
  2. OI Gateway service runs under a dedicated service account (NOT SYSTEM — known issue per AVEVA 2020 R2 readme)
  3. OtOpcUa endpoint exposed as opc.tcp://{host}:{port} with no /discovery suffix (forum-documented failure mode that produces Bad_SecurityChecksFailed even after cert trust)
  4. Configure an OPCUA Connection in OI Gateway pointing at OtOpcUa with Basic256Sha256 + SignAndEncrypt + LDAP-username token
  5. OI Gateway client cert appears in OtOpcUa's pending-certs folder; admin moves it to Trusted; OtOpcUa server cert trusted on OI Gateway side
  6. Configure an OPCUAGroup with at least one tag from a published OtOpcUa generation
  7. Configure a SuiteLink DI Object in AppServer pointing at the OI Gateway instance
  8. Create an AppServer attribute with IO.SourceAttribute = <SuiteLinkDIObjectName>.<TopicName>.<ItemReference>
  9. Verify the attribute reads end-to-end with quality 0x00C0 (good)
  10. Re-verify after a publish-generation cycle in OtOpcUa (the AppServer attribute must continue reading without manual re-trust)
  11. Capture the full configuration as docs/deployment/aveva-system-platform-integration.md for the future tier-3 cutover team

Acceptance:

  • All 10 connection steps succeed; AppServer reads at least one tag end-to-end with good quality
  • Reconnect after OtOpcUa publish: no manual intervention required
  • Documentation captured for the cutover team
  • Any failure mode that surfaces during the test is either: (a) fixed in OtOpcUa Phase 1, (b) added to Phase 1 known-limitations + escalated to corrections doc, or (c) confirmed as an AppServer / OI Gateway quirk operators must accept

This is non-blocking for Phase 1 exit if the test surfaces only documentation-level fixes. It IS blocking if it surfaces an OtOpcUa-side incompatibility that requires architectural change — that would be a tier-3 cutover risk and should escalate immediately.

Task E.9 — ACLs tab + bulk-grant + permission simulator

Per admin-ui.md Cluster Detail tab #8 ("ACLs") and acl-design.md §"Admin UI":

  • ACLs tab on Cluster Detail with two views ("By LDAP group" + "By scope")
  • Edit grant flow: pick scope, group, permission bundle or per-flag, save to draft
  • Bulk-grant flow: multi-select scope, group, permissions, preview rows that will be created, publish via draft
  • Permission simulator: enter username + LDAP groups → live trie of effective permissions across the cluster's UNS tree
  • Cluster-create workflow seeds the v1-compatibility default ACL set (per decision #131)
  • Banner on Cluster Detail when the cluster's ACL set diverges from the seed

Acceptance:

  • Add an ACL grant via draft → publishes → row in NodeAcl table; appears in both Admin views
  • Bulk grant 10 LDAP groups × 1 permission set across 5 UnsAreas → preview shows 50 rows; publish creates them atomically
  • Simulator: a user in OtOpcUaReadOnly group sees ReadOnly bundle effective at every node in the cluster
  • Simulator: a user in OtOpcUaWriteTune sees Engineer bundle effective; WriteConfigure is denied
  • Cluster-create workflow seeds 5 default ACL grants matching v1 LDAP roles (table in acl-design.md §"Default Permissions")
  • Divergence banner appears when an operator removes any of the seeded grants

Compliance Checks (run at exit gate)

A phase-1-compliance.ps1 script that exits non-zero on any failure:

Schema compliance

# Run all migrations against a clean SQL Server instance
dotnet ef database update --project src/ZB.MOM.WW.OtOpcUa.Configuration --connection "Server=...;Database=OtOpcUaConfig_Test_$(Get-Date -UFormat %s);..."

# Run schema-introspection tests
dotnet test tests/ZB.MOM.WW.OtOpcUa.Configuration.Tests --filter "Category=SchemaCompliance"

Expected: every table, column, index, FK, CHECK, and stored procedure in config-db-schema.md is present and matches.

Decision compliance

# For each decision number Phase 1 implements (#9, #14-22, #25, #28, #30, #32-33, #46-51, #79-125),
# verify at least one citation exists in source, tests, or migrations:
$decisions = @(9, 14, 15, 16, 17, 18, 19, 20, 21, 22, 25, 28, 30, 32, 33, 46, 47, 48, 49, 50, 51, 79..125)
foreach ($d in $decisions) {
    $hits = git grep "decision #$d" -- 'src/' 'tests/' 'docs/v2/implementation/'
    if (-not $hits) { Write-Error "Decision #$d has no citation in code or tests"; exit 1 }
}

Visual compliance (Admin UI)

Manual screenshot review:

  1. Login page side-by-side with ScadaLink's Login.razor rendered
  2. Sidebar + main layout side-by-side with ScadaLink's MainLayout.razor + NavMenu.razor
  3. Dashboard side-by-side with ScadaLink's Dashboard.razor
  4. Reconnect overlay triggered (kill the SignalR connection) — same modal as ScadaLink

Reviewer answers: "could the same operator move between apps without noticing?" Y/N. N = blocking.

Behavioral compliance (end-to-end smoke test)

dotnet test tests/ZB.MOM.WW.OtOpcUa.IntegrationTests --filter "Category=Phase1Smoke"

The smoke test:

  1. Spins up SQL Server in a container
  2. Runs all migrations
  3. Creates an OtOpcUaAdmin SQL principal + an OtOpcUaNode principal bound to a test NodeId
  4. Starts the Admin app
  5. Creates a cluster + 1 node + Equipment-kind namespace via Admin API
  6. Opens a draft, adds 1 UnsArea + 1 UnsLine + 1 Equipment + 0 tags (empty)
  7. Publishes the draft
  8. Boots a Server instance configured with the test NodeId
  9. Asserts the Server fetched the published generation via sp_GetCurrentGenerationForCluster
  10. Asserts the Server's ClusterNodeGenerationState row reports Applied
  11. Adds a tag in a new draft, publishes
  12. Asserts the Server picks up the new generation within 30s (next poll)
  13. Rolls back to generation 1
  14. Asserts the Server picks up the rollback within 30s

Expected: all 14 steps pass. The smoke test runs in CI on every PR to `v2/phase-1-*` branches.
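Steps 12 and 14 are timing-sensitive: the Server polls, so the assertion must be "picks up the change within the window", not a fixed sleep. The usual shape is a poll-with-timeout helper. A minimal sketch (names are hypothetical; the real assertion lives in the .NET integration suite):

```python
import time

def wait_for(predicate, timeout_s=30.0, interval_s=1.0):
    """Poll `predicate` until it returns truthy or `timeout_s` elapses.

    Mirrors the smoke-test assertions that the Server applies a new
    generation (or a rollback) within 30s, i.e. within one poll cycle.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if predicate():
            return True
        time.sleep(interval_s)
    return bool(predicate())  # one final check at the deadline

# Toy stand-in for "the Server reports the expected generation".
observed = {"generation": 1}

def server_caught_up():
    return observed["generation"] == 2

observed["generation"] = 2  # simulate the Server applying the new generation
assert wait_for(server_caught_up, timeout_s=5.0, interval_s=0.1)
```

The final re-check at the deadline avoids a flaky failure when the change lands between the last in-loop poll and the timeout.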

### Stability compliance

For Phase 1 the only stability concern is the in-process driver-isolation primitives (used later by Phase 3+ drivers, but built in Phase 1):

- `IDriver.Reinitialize()` semantics tested
- Driver-instance allocation tracking + cache flush tested with a mock driver
- Crash-loop circuit breaker tested with a mock driver that throws on every `Reinitialize`

Galaxy is still legacy in-process in Phase 1 — Tier C protections for Galaxy land in Phase 2.
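The crash-loop circuit breaker named above follows the standard pattern: count consecutive `Reinitialize` failures, trip open at a threshold, and refuse further attempts until a cool-down expires. An illustrative Python sketch (thresholds and names are hypothetical; the real implementation is .NET code in the driver host):

```python
import time

class CrashLoopBreaker:
    """Trips after `max_failures` consecutive Reinitialize failures and
    stays open for `cooldown_s`, so a flapping driver cannot hot-loop."""

    def __init__(self, max_failures=3, cooldown_s=60.0):
        self.max_failures = max_failures
        self.cooldown_s = cooldown_s
        self.failures = 0
        self.open_until = 0.0

    def allow(self, now=None):
        """True if a Reinitialize attempt may proceed right now."""
        now = time.monotonic() if now is None else now
        return now >= self.open_until

    def record_failure(self, now=None):
        now = time.monotonic() if now is None else now
        self.failures += 1
        if self.failures >= self.max_failures:
            self.open_until = now + self.cooldown_s

    def record_success(self):
        self.failures = 0  # any clean Reinitialize resets the count

# A mock driver that throws on every Reinitialize trips the breaker:
b = CrashLoopBreaker(max_failures=3, cooldown_s=60.0)
for _ in range(3):
    b.record_failure(now=100.0)
assert not b.allow(now=101.0)   # open: further attempts refused
assert b.allow(now=161.0)       # cool-down elapsed: one retry permitted
```

Passing `now` explicitly, as the unit tests here do, keeps the breaker deterministic under test without mocking the clock.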

### Documentation compliance

- Every Phase 1 task in this doc must either be Done or have a deferral note in `exit-gate-phase-1.md`
- Every decision the phase implements must be reflected in `plan.md` (no silent decisions)
- The schema doc + admin-ui doc must be updated if implementation deviated

## Completion Checklist

The exit gate signs off only when every item below is checked. Each item links to the verifying artifact (test name, screenshot, log line, etc.).

### Stream A — Core.Abstractions

- All 11 capability interfaces defined and compiling
- `DriverAttributeInfo` + supporting enums defined
- `DriverTypeRegistry` implemented with Galaxy registration
- Interface-independence reflection test passes

### Stream B — Configuration

- EF Core migration `InitialSchema` applies cleanly to a clean SQL Server
- Schema introspection test asserts the live schema matches `config-db-schema.md`
- All stored procedures present and tested (happy path + error paths)
- `sp_PublishGeneration` concurrency test passes (one wins, one fails)
- Authorization tests pass (Node principal limited to its cluster, Admin can read/write fleet-wide)
- All 12 validation rules in `Configuration.Validation` have unit tests
- LiteDB cache round-trip + pruning + corruption tests pass
- Generation-diff applier handles add/remove/modify across all entity types

### Stream C — Core

- `LmxNodeManager` renamed to `GenericDriverNodeManager`; v1 IntegrationTests still pass
- `GalaxyNodeManager : GenericDriverNodeManager` exists in the legacy Host
- `IAddressSpaceBuilder` API implemented; byte-equivalent OPC UA browse output to v1
- Driver hosting + isolation tested with mock drivers (one fails, others continue)
- Memory-budget cache-flush tested with a mock driver

### Stream D — Server

- `Microsoft.Extensions.Hosting` host runs in console mode and as a Windows Service
- TopShelf removed from the codebase
- Credential-bound bootstrap tested (correct principal succeeds; wrong principal fails)
- LiteDB fallback on DB outage tested

### Stream E — Admin

- Admin app boots; login screen renders with ScadaLink-equivalent visuals
- LDAP cookie auth works against dev GLAuth
- Admin roles mapped (FleetAdmin / ConfigEditor / ReadOnly)
- Cluster-scoped grants work (decision #105)
- Cluster CRUD works end-to-end
- Draft → diff → publish workflow works end-to-end
- Rollback works end-to-end
- UNS Structure tab supports add / rename / drag-move with impact preview
- Equipment tab supports CSV import + search across 5 identifiers
- Generic JSON config editor renders + validates `DriverConfig` per registered schema
- SignalR real-time updates work (multi-tab test)
- Release-reservation flow works + is audit-logged
- Merge-equipment flow works + is audit-logged

### Cross-cutting

- `phase-1-compliance.ps1` runs and exits 0
- Smoke test (14 steps) passes in CI
- Visual compliance review signed off (operator-equivalence test)
- All decisions cited in code/tests (`git grep "decision #N"` returns hits for each)
- Adversarial review of the phase diff (`/codex:adversarial-review --base v2`) — findings closed or deferred with rationale
- PR opened against v2 includes: link to this doc, link to the exit-gate record, compliance script output, smoke test logs, adversarial review output, screenshots
- Reviewer signoff (one reviewer beyond the implementation lead)
- `exit-gate-phase-1.md` recorded

## Risks and Mitigations

| Risk | Likelihood | Impact | Mitigation |
|---|---|---|---|
| EF Core 10 idiosyncrasies vs the documented schema | Medium | Medium | Schema-introspection test catches drift; validate early in Stream B |
| `sp_ValidateDraft` cross-table checks complex enough to be slow | Medium | Medium | Per-decision-cited test exists; benchmark with a large draft (1000+ tags) before exit |
| Visual parity with ScadaLink slips because two component libraries diverge over time | Low | Medium | Copy ScadaLink's CSS verbatim where possible; shared component set is structurally identical |
| LDAP integration breaks against production GLAuth (different schema than dev) | Medium | High | Use the v1 LDAP layer as the integration reference; mirror its config exactly |
| Generation-diff applier has subtle bugs on edge cases (renamed entity with same logical ID) | High | High | Property-based test that generates random diffs and asserts apply-then-rebuild produces the same end state |
| ScadaLink.Security pattern works well for site-scoped roles but our cluster-scoped grants are subtly different | Medium | Medium | Side-by-side review of RoleMapper after Stream E starts; refactor if claim shape diverges |
| Phase 1 takes longer than 6 weeks | High | Medium | Mid-gate review at 3 weeks — if Stream B isn't done, defer Task E.58 to a Phase 1.5 follow-up |
| MERGE against ExternalIdReservation has a deadlock pathology under concurrent publishes | Medium | High | Concurrency test in Task B.2 specifically targets this; if it deadlocks, switch to `INSERT ... WHERE NOT EXISTS` with explicit row locks |
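The property-based mitigation for the diff applier reduces to one invariant: applying the computed diff to the old state must yield exactly the new state, for arbitrary states. A toy Python sketch of that oracle (the entity model here is a flat id-to-value map, which is a deliberate simplification; the real test would use a .NET property-testing library such as FsCheck against the actual entity types):

```python
import random

def make_diff(old, new):
    """Compute the add/remove/modify operations that turn `old` into `new`."""
    diff = []
    for key in old.keys() - new.keys():
        diff.append(("remove", key, None))
    for key, value in new.items():
        if key not in old:
            diff.append(("add", key, value))
        elif old[key] != value:
            diff.append(("modify", key, value))
    return diff

def apply_diff(state, diff):
    """Apply a diff to a dict of id -> value, returning the new state."""
    out = dict(state)
    for op, key, value in diff:
        if op in ("add", "modify"):
            out[key] = value
        elif op == "remove":
            out.pop(key, None)
    return out

# Property: for random (old, new) pairs, apply-then-compare equals
# rebuild-from-scratch. A renamed entity with the same logical ID shows
# up here as a remove of one key plus an add of another.
rng = random.Random(42)
for _ in range(200):
    old = {i: rng.randint(0, 5) for i in rng.sample(range(20), 8)}
    new = {i: rng.randint(0, 5) for i in rng.sample(range(20), 8)}
    assert apply_diff(old, make_diff(old, new)) == new
```

Generating hundreds of random state pairs is what surfaces the edge cases a hand-written example set misses; a failing pair becomes a minimal regression test.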

## Out of Scope (do not do in Phase 1)

- Galaxy out-of-process split (Phase 2)
- Any Modbus / AB / S7 / TwinCAT / FOCAS driver code (Phases 3–5)
- Per-driver custom config editors in Admin (each driver's phase)
- Equipment-class template integration with the schemas repo
- Consumer cutover (out of v2 scope; separate integration-team track per `implementation/overview.md`)
- Wiring the OPC UA NodeManager to enforce ACLs at runtime (Phase 2+, in each driver's phase). Phase 1 ships the `NodeAcl` table + Admin UI ACL editing + evaluator unit tests; per-driver enforcement lands in each driver's phase per `acl-design.md` §"Implementation Plan"
- Push-from-DB notification (decision #96 — v2.1)
- Generation pruning operator UI (decision #93 — v2.1)
- Cluster-scoped admin grant editor in UI (`admin-ui.md` "Deferred / Out of Scope" — v2.1)
- Mobile / tablet layout