Phase 1 — Configuration Project + Core.Abstractions + Admin UI Scaffold
Status: DRAFT — implementation plan for Phase 1 of the v2 build (plan.md §6)
Branch: v2/phase-1-configuration
Estimated duration: 4–6 weeks (largest greenfield phase; most foundational)
Predecessor: Phase 0 (phase-0-rename-and-net10.md)
Successor: Phase 2 (Galaxy parity refactor)
Phase Objective
Stand up the central configuration substrate for the v2 fleet:
- Core.Abstractions project — driver capability interfaces (IDriver, ITagDiscovery, IReadable, IWritable, ISubscribable, IAlarmSource, IHistoryProvider, IRediscoverable, IHostConnectivityProbe, IDriverConfigEditor, DriverAttributeInfo)
- Configuration project — central MSSQL schema + EF Core migrations + stored procedures + LiteDB local cache + generation-diff application logic
- Core project — GenericDriverNodeManager (renamed from LmxNodeManager), driver-hosting infrastructure, OPC UA server lifecycle, address-space registration via IAddressSpaceBuilder
- Server project — Microsoft.Extensions.Hosting-based Windows Service host (replacing TopShelf), bootstrap from Configuration using node-bound credential, register drivers, start Core
- Admin project — Blazor Server admin app scaffolded with ScadaLink CentralUI parity (Bootstrap 5, dark sidebar, LDAP cookie auth, three admin roles, draft → publish → rollback workflow, cluster/node/namespace/equipment/tag CRUD)
No driver instances yet (Galaxy stays in legacy in-process Host until Phase 2). The phase exit requires that an empty cluster can be created in Admin, an empty generation can be published, and a node can fetch the published generation — proving the configuration substrate works end-to-end.
Scope — What Changes
| Concern | Change |
|---|---|
| New projects | 5 new src projects + 5 matching test projects |
| Existing v1 Host project | Refactored to consume Core.Abstractions interfaces against its existing Galaxy implementation — but not split into Proxy/Host/Shared yet (Phase 2) |
| LmxNodeManager | Renamed to GenericDriverNodeManager in Core, with IDriver swapped in for IMxAccessClient. The existing v1 Host instantiates GalaxyNodeManager : GenericDriverNodeManager (legacy in-process) — see plan.md §5a |
| Service hosting | TopShelf removed; Microsoft.Extensions.Hosting BackgroundService used (decision #30) |
| Central config DB | New SQL Server database OtOpcUaConfig provisioned from EF Core migrations |
| LDAP authentication for Admin | Admin.Security project mirrors ScadaLink.Security; cookie auth + JWT API endpoint |
| Local LiteDB cache on each node | New config_cache.db per node; bootstraps from central DB or cache |
Scope — What Does NOT Change
| Item | Reason |
|---|---|
| Galaxy out-of-process split | Phase 2 |
| Any new driver (Modbus, AB, S7, etc.) | Phase 3+ |
| OPC UA wire behavior | Galaxy address space still served exactly as v1; the Configuration substrate is read but not yet driving everything |
| Equipment-class template integration with future schemas repo | EquipmentClassRef is a nullable hook column; no validation yet (decisions #112, #115) |
| Per-driver custom config editors in Admin | Generic JSON editor only in v2.0 (decision #27); driver-specific editors land in their respective phases |
| Consumer cutover (ScadaBridge / Ignition / SystemPlatform IO) | OUT of v2 scope — separate integration-team track per implementation/overview.md |
Entry Gate Checklist
- Phase 0 exit gate cleared (rename complete, all v1 tests pass under OtOpcUa names)
- v2 branch is clean
- Phase 0 PR merged
- SQL Server 2019+ instance available for development (local dev box minimum; shared dev instance for integration tests)
- LDAP / GLAuth dev instance available for Admin auth integration testing
- ScadaLink CentralUI source accessible at C:\Users\dohertj2\Desktop\scadalink-design\ for parity reference
- All Phase 1-relevant design docs reviewed: plan.md §4–5, config-db-schema.md (entire), admin-ui.md (entire), driver-stability.md §"Cross-Cutting Protections" (sets context for Core.Abstractions scope)
- Decisions #1–125 read at least skim-level; key ones for Phase 1: #14–22, #25, #28, #30, #32–33, #46–51, #79–125
Evidence file: docs/v2/implementation/entry-gate-phase-1.md recording date, signoff, environment availability.
Task Breakdown
Phase 1 is large — broken into 5 work streams (A–E) that can partly overlap. A typical sequencing: A → B → (C and D in parallel) → E.
Stream A — Core.Abstractions (1 week)
Task A.1 — Define driver capability interfaces
Create src/ZB.MOM.WW.OtOpcUa.Core.Abstractions/ (.NET 10, no dependencies). Define:
public interface IDriver { /* lifecycle, metadata, health */ }
public interface ITagDiscovery { /* discover tags/hierarchy from backend */ }
public interface IReadable { /* on-demand read */ }
public interface IWritable { /* on-demand write */ }
public interface ISubscribable { /* data change subscriptions */ }
public interface IAlarmSource { /* alarm events + acknowledgment */ }
public interface IHistoryProvider { /* historical reads */ }
public interface IRediscoverable { /* opt-in change-detection signal */ }
public interface IHostConnectivityProbe { /* per-host runtime status */ }
public interface IDriverConfigEditor { /* Admin UI plug point per driver */ }
public interface IAddressSpaceBuilder { /* core-owned tree builder */ }
Plus the data models referenced from the interfaces:
public sealed record DriverAttributeInfo(
string FullName,
DriverDataType DriverDataType,
bool IsArray,
uint? ArrayDim,
SecurityClassification SecurityClass,
bool IsHistorized);
public enum DriverDataType { Boolean, Int16, Int32, Int64, UInt16, UInt32, UInt64, Float32, Float64, String, DateTime, Reference, Custom }
public enum SecurityClassification { FreeAccess, Operate, SecuredWrite, VerifiedWrite, Tune, Configure, ViewOnly }
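Capabilities compose rather than inherit: a driver implements only the interfaces its backend supports, and the host feature-detects the rest at runtime. A hypothetical sketch (the SimulatedDriver type and the host-side helper names are illustrative, not part of the contract):

```csharp
// Illustrative only: a driver that supports reads and subscriptions but has
// no history, alarms, or write path simply omits those interfaces.
public sealed class SimulatedDriver : IDriver, IReadable, ISubscribable
{
    // ... IDriver lifecycle / metadata / health members ...
    // ... IReadable / ISubscribable members ...
}

// Host-side feature detection: optional capabilities are plain type tests.
// if (driver is IHistoryProvider history) RegisterHistoryRead(history);
// if (driver is IWritable)               EnableWriteServices();
```

This keeps the acceptance rule below honest: no capability interface needs to know about any other, and the host never assumes more than the driver declared.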
Acceptance:
- All interfaces compile in a project with zero dependencies beyond BCL
- xUnit test project asserts (via reflection) that no interface returns or accepts a type from Core or Configuration (interface independence per decision #59)
- Each interface XML doc cites the design decision(s) it implements (e.g. IRediscoverable cites #54)
Task A.2 — Define DriverTypeRegistry
public sealed class DriverTypeRegistry
{
    public void Register(DriverTypeMetadata metadata); // rejects duplicate TypeName
    public DriverTypeMetadata Get(string driverType);
    public IEnumerable<DriverTypeMetadata> All();
}
public sealed record DriverTypeMetadata(
string TypeName, // "Galaxy" | "ModbusTcp" | ...
NamespaceKindCompatibility AllowedNamespaceKinds, // per decision #111
string DriverConfigJsonSchema, // per decision #91
string DeviceConfigJsonSchema, // optional
string TagConfigJsonSchema);
[Flags]
public enum NamespaceKindCompatibility
{
Equipment = 1, SystemPlatform = 2, Simulated = 4
}
In v2.0, only the Galaxy type is registered (AllowedNamespaceKinds = SystemPlatform); Phase 3+ extends the registry.
Acceptance:
- Registry compiles, has unit tests for: register a type, look it up, reject duplicate registration, enumerate all
- Galaxy registration entry exists with AllowedNamespaceKinds = SystemPlatform per decision #111
Stream B — Configuration project (1.5 weeks)
Task B.1 — EF Core schema + initial migration
Create src/ZB.MOM.WW.OtOpcUa.Configuration/ (.NET 10, EF Core 10).
Implement DbContext with entities matching config-db-schema.md exactly:
- ServerCluster, ClusterNode, ClusterNodeCredential
- Namespace (generation-versioned per decision #123)
- UnsArea, UnsLine
- ConfigGeneration
- DriverInstance, Device, Equipment, Tag, PollGroup
- NodeAcl (generation-versioned per decision #130; data-path authorization grants per acl-design.md)
- ClusterNodeGenerationState, ConfigAuditLog
- ExternalIdReservation (NOT generation-versioned per decision #124)
Generate the initial migration:
dotnet ef migrations add InitialSchema --project src/ZB.MOM.WW.OtOpcUa.Configuration
Acceptance:
- Applying the migration to a clean SQL Server instance produces the schema in config-db-schema.md
- Schema-validation test (SchemaComplianceTests) introspects the live DB and asserts every table/column/index/constraint matches the doc
- Test runs in CI against a SQL Server container
Task B.2 — Stored procedures via MigrationBuilder.Sql
Add stored procedures from config-db-schema.md §"Stored Procedures":
- sp_GetCurrentGenerationForCluster
- sp_GetGenerationContent
- sp_RegisterNodeGenerationApplied
- sp_PublishGeneration (with the MERGE against ExternalIdReservation per decision #124)
- sp_RollbackToGeneration
- sp_ValidateDraft (calls into managed validator code per decision #91 — proc is structural-only; content schema validation is in the Admin app)
- sp_ComputeGenerationDiff
- sp_ReleaseExternalIdReservation (FleetAdmin only)
Use CREATE OR ALTER style in MigrationBuilder.Sql() blocks so procs version with the schema.
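Versioning the procs through migrations can look like the following sketch (the proc body is elided; the real definitions come from config-db-schema.md):

```csharp
using Microsoft.EntityFrameworkCore.Migrations;

public partial class StoredProcedures : Migration
{
    protected override void Up(MigrationBuilder migrationBuilder)
    {
        // CREATE OR ALTER keeps the migration idempotent and lets proc
        // definitions evolve in later migrations alongside the schema.
        migrationBuilder.Sql(@"
CREATE OR ALTER PROCEDURE dbo.sp_GetCurrentGenerationForCluster
    @NodeId NVARCHAR(64),
    @ClusterId NVARCHAR(64)
AS
BEGIN
    SET NOCOUNT ON;
    -- body per config-db-schema.md: verify the connected principal is bound
    -- to @NodeId, then return the current Published generation for @ClusterId
END");
    }

    protected override void Down(MigrationBuilder migrationBuilder)
        => migrationBuilder.Sql(
            "DROP PROCEDURE IF EXISTS dbo.sp_GetCurrentGenerationForCluster;");
}
```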
Acceptance:
- Each proc has at least one xUnit test exercising the happy path + at least one error path
- sp_PublishGeneration has a concurrency test: two simultaneous publishes for the same cluster → one wins, one fails with a recognizable error
- sp_GetCurrentGenerationForCluster has an authorization test: caller bound to NodeId X cannot read cluster Y's generation
Task B.3 — Authorization model (SQL principals + GRANT)
Add a separate migration AuthorizationGrants that:
- Creates two SQL roles: OtOpcUaNode, OtOpcUaAdmin
- Grants EXECUTE on the appropriate procs per config-db-schema.md §"Authorization Model"
- Grants no direct table access to either role
Acceptance:
- A test running as an OtOpcUaNode-roled principal can only call the node procs, not admin procs
- A test running as an OtOpcUaAdmin-roled principal can call publish/rollback procs
- Direct SELECT * FROM dbo.ConfigGeneration from an OtOpcUaNode principal is denied
Task B.4 — JSON-schema validators (managed code)
In Configuration.Validation/, implement validators consumed by sp_ValidateDraft (called from the Admin app pre-publish per decision #91):
- UNS segment regex (^[a-z0-9-]{1,32}$ or _default)
- Path length (≤200 chars)
- UUID immutability across generations
- Same-cluster namespace binding (decision #122)
- ZTag/SAPID reservation pre-flight (decision #124)
- EquipmentId derivation rule (decision #125)
- Driver type ↔ namespace kind allowed (decision #111)
- JSON-schema validation per DriverType from DriverTypeRegistry
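The segment rule alone can be sketched as follows (the error-reporting shape is an assumption; the real validators plug into sp_ValidateDraft's result set):

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class UnsSegmentRule
{
    // Lowercase alphanumerics and hyphen, 1-32 chars, with "_default"
    // as the single allowed exception (per the rule above).
    private static readonly Regex Segment =
        new("^[a-z0-9-]{1,32}$", RegexOptions.Compiled);

    public static IEnumerable<string> Validate(IEnumerable<string> segments)
    {
        foreach (var s in segments)
            if (s != "_default" && !Segment.IsMatch(s))
                yield return $"UNS segment '{s}' violates ^[a-z0-9-]{{1,32}}$";
    }
}
```

Returning all violations rather than throwing on the first supports the cross-rule acceptance criterion below: a draft that breaks several rules surfaces every error at once.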
Acceptance:
- One unit test per rule, both passing and failing cases
- Cross-rule integration test: a draft that violates 3 rules surfaces all 3 (not just the first)
Task B.5 — LiteDB local cache
In Configuration.LocalCache/, implement the LiteDB schema from config-db-schema.md §"Local LiteDB Cache":
public interface ILocalConfigCache
{
Task<GenerationCacheEntry?> GetMostRecentAsync(string clusterId);
Task PutAsync(GenerationCacheEntry entry);
Task PruneOldGenerationsAsync(string clusterId, int keepLatest = 10);
}
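A minimal LiteDB-backed implementation sketch (assumes GenerationCacheEntry carries Id, ClusterId, and a monotonically increasing GenerationNumber — adjust to the real entry shape):

```csharp
using System.Linq;
using System.Threading.Tasks;
using LiteDB;

public sealed class LiteDbConfigCache : ILocalConfigCache, System.IDisposable
{
    private readonly LiteDatabase _db;
    private readonly ILiteCollection<GenerationCacheEntry> _col;

    public LiteDbConfigCache(string path)
    {
        _db = new LiteDatabase(path);
        _col = _db.GetCollection<GenerationCacheEntry>("generations");
        _col.EnsureIndex(e => e.ClusterId);
    }

    public Task<GenerationCacheEntry?> GetMostRecentAsync(string clusterId) =>
        Task.FromResult(_col.Find(e => e.ClusterId == clusterId)
                            .OrderByDescending(e => e.GenerationNumber)
                            .FirstOrDefault());

    public Task PutAsync(GenerationCacheEntry entry)
    {
        _col.Upsert(entry);
        return Task.CompletedTask;
    }

    public Task PruneOldGenerationsAsync(string clusterId, int keepLatest = 10)
    {
        var stale = _col.Find(e => e.ClusterId == clusterId)
                        .OrderByDescending(e => e.GenerationNumber)
                        .Skip(keepLatest)
                        .ToList();
        foreach (var entry in stale)
            _col.Delete(entry.Id);   // delete-by-id; LiteDB maps Id to _id
        return Task.CompletedTask;
    }

    public void Dispose() => _db.Dispose();
}
```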
Acceptance:
- Round-trip test: write a generation snapshot, read it back, assert deep equality
- Pruning test: write 15 generations, prune to 10, assert the 5 oldest are gone
- Corruption test: corrupt the LiteDB file, assert the loader fails fast with a clear error
Task B.6 — Generation-diff application logic
In Configuration.Apply/, implement the diff-and-apply logic that runs on each node when a new generation arrives:
public interface IGenerationApplier
{
Task<ApplyResult> ApplyAsync(GenerationSnapshot from, GenerationSnapshot to, CancellationToken ct);
}
Diff per entity type, dispatch to driver Reinitialize / cache flush as needed.
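The per-entity diff reduces to a keyed set comparison on each entity's immutable UUID; a sketch (the Change/ChangeKind shapes are illustrative):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public enum ChangeKind { Added, Removed, Modified }

public sealed record Change<T>(ChangeKind Kind, T Entity);

public static class EntityDiff
{
    // Keyed on the immutable UUID so a rename surfaces as Modified,
    // never as Removed + Added (which would churn OPC UA node identity).
    public static IEnumerable<Change<T>> Diff<T>(
        IEnumerable<T> from, IEnumerable<T> to,
        Func<T, Guid> key, Func<T, T, bool> equal)
    {
        var before = from.ToDictionary(key);
        var after  = to.ToDictionary(key);

        foreach (var (id, entity) in after)
            if (!before.TryGetValue(id, out var old))
                yield return new Change<T>(ChangeKind.Added, entity);
            else if (!equal(old, entity))
                yield return new Change<T>(ChangeKind.Modified, entity);

        foreach (var (id, entity) in before)
            if (!after.ContainsKey(id))
                yield return new Change<T>(ChangeKind.Removed, entity);
    }
}
```

Cascades (an equipment removal implying its tags' removal) layer on top of this primitive per entity type.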
Acceptance:
- Diff test: from = empty, to = (1 driver + 5 equipment + 50 tags) → Added for each
- Diff test: from = (above), to = same with one tag's Name changed → Modified for one tag, no other changes
- Diff test: from = (above), to = same with one equipment removed → Removed for the equipment + cascading Removed for its tags
- Apply test against an in-memory mock driver: applies the diff in correct order, idempotent on retry
Stream C — Core project (1 week, can parallel with Stream D)
Task C.1 — Rename LmxNodeManager → GenericDriverNodeManager
Per plan.md §5a:
- Lift the file from Host/OpcUa/LmxNodeManager.cs to Core/OpcUa/GenericDriverNodeManager.cs
- Swap IMxAccessClient for IDriver (composing IReadable / IWritable / ISubscribable)
- Swap GalaxyAttributeInfo for DriverAttributeInfo
- Promote GalaxyRuntimeProbeManager interactions to use IHostConnectivityProbe
- Move MxDataTypeMapper and SecurityClassificationMapper to a new Driver.Galaxy.Mapping/ (still in legacy Host until Phase 2)
Acceptance:
- v1 IntegrationTests still pass against the renamed class (parity is the gate, decision #62 — class is "foundation, not rewrite")
- Reflection test asserts GenericDriverNodeManager has no static or instance reference to any Galaxy-specific type
Task C.2 — Derive GalaxyNodeManager : GenericDriverNodeManager (legacy in-process)
In the existing Host project, add a thin GalaxyNodeManager that:
- Inherits from GenericDriverNodeManager
- Wires up MxDataTypeMapper, SecurityClassificationMapper, the probe manager, etc.
- Replaces direct instantiation of the renamed class
Acceptance:
- v1 IntegrationTests pass identically with GalaxyNodeManager instantiated instead of the old direct class
- Existing dev Galaxy still serves the same address space byte-for-byte (compare with a baseline browse capture)
Task C.3 — IAddressSpaceBuilder API (decision #52)
Implement the streaming builder API drivers use to register nodes:
public interface IAddressSpaceBuilder
{
IFolderBuilder Folder(string browseName, string displayName);
IVariableBuilder Variable(string browseName, DriverDataType type, ...);
void AddProperty(string browseName, object value);
}
Refactor GenericDriverNodeManager.BuildAddressSpace to consume IAddressSpaceBuilder (driver streams in tags rather than buffering them).
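Driver-side usage then looks roughly like this (the attribute-to-node mapping is illustrative, and the real Variable overload carries more parameters, elided as ... above):

```csharp
using System.Collections.Generic;

// Illustrative: nodes are registered as the backend enumerates attributes,
// so the driver never materializes the whole tag list in memory at once.
public static class GalaxyAddressSpace
{
    public static void Build(IAddressSpaceBuilder builder,
                             IEnumerable<DriverAttributeInfo> attributes)
    {
        foreach (var attr in attributes)
        {
            // e.g. FullName "Line3.Mixer1.PV" becomes a variable node;
            // folder placement follows the dotted path segments.
            builder.Variable(attr.FullName, attr.DriverDataType);
        }
    }
}
```

Streaming is what makes the <50% peak-RAM acceptance target below plausible: the buffered v1 approach held every tag in a list before building a single node.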
Acceptance:
- Build a Galaxy address space via the new builder API, assert byte-equivalent OPC UA browse output vs v1
- Memory profiling test: building a 5000-tag address space via the builder uses <50% the peak RAM of the buffered approach
Task C.4 — Driver hosting + isolation (decision #65, #74)
Implement the in-process driver host that:
- Loads each DriverInstance row's driver assembly
- Catches and contains driver exceptions (driver isolation, decision #12)
- Surfaces IDriver.Reinitialize() to the configuration applier
- Tracks per-driver allocation footprint (GetMemoryFootprint() polled every 30s per driver-stability.md)
- Flushes optional caches on budget breach
- Marks drivers Faulted (Bad quality on their nodes) if Reinitialize fails
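The containment contract can be sketched as a wrapper that turns a driver exception into Bad quality instead of a process fault (ReadAsync is a stand-in for whatever IReadable actually exposes; the markBad callback is hypothetical):

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical containment wrapper: an exception from one driver degrades
// that driver's nodes to Bad quality and never escapes to the host process.
public sealed class IsolatedReadable
{
    private readonly IReadable _inner;
    private readonly Action<string, Exception> _markBad;

    public IsolatedReadable(IReadable inner, Action<string, Exception> markBad)
        => (_inner, _markBad) = (inner, markBad);

    public async Task<object?> TryReadAsync(string tag, CancellationToken ct)
    {
        try
        {
            return await _inner.ReadAsync(tag, ct);
        }
        catch (Exception ex)
        {
            _markBad(tag, ex);   // quality goes Bad, health counter incremented
            return null;         // other drivers keep serving untouched
        }
    }
}
```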
Acceptance:
- Integration test: spin up two mock drivers; one throws on Read; the other keeps working. Quality on the broken driver's nodes goes Bad; the other driver is unaffected.
- Memory-budget test: mock driver reports growing footprint above budget; cache-flush is triggered; footprint drops; no process action taken.
Stream D — Server project (4 days, can parallel with Stream C)
Task D.1 — Microsoft.Extensions.Hosting Windows Service host (decision #30)
Replace TopShelf with Microsoft.Extensions.Hosting:
- New Program.cs using Host.CreateApplicationBuilder()
- BackgroundService that owns the OPC UA server lifecycle
- builder.Services.AddWindowsService() registers the host as a Windows service
- Configuration bootstrap from appsettings.json (NodeId + ClusterId + DB conn) per decision #18
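A minimal Program.cs along these lines (OpcUaServerService and ClusterOptions are illustrative names, not final types):

```csharp
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Hosting;

var builder = Host.CreateApplicationBuilder(args);

// Runs as a Windows service when installed, as a console app under `dotnet run`.
builder.Services.AddWindowsService(o => o.ServiceName = "OtOpcUa");

// BackgroundService owning the OPC UA server lifecycle (name illustrative).
builder.Services.AddHostedService<OpcUaServerService>();

// NodeId / ClusterId / connection string from appsettings.json (decision #18).
builder.Services.Configure<ClusterOptions>(
    builder.Configuration.GetSection("Cluster"));

builder.Build().Run();
```

With Host.CreateApplicationBuilder, the Windows-service lifetime comes from AddWindowsService (the IHostBuilder-era UseWindowsService extension does not apply here); it is a no-op when the process is not running as a service, which is what makes the console-mode acceptance test below work unchanged.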
Acceptance:
- dotnet run runs interactively (console mode)
- Installed as a Windows Service (sc create OtOpcUa ...), starts and stops cleanly
- Service install + uninstall cycle leaves no leftover state
Task D.2 — Bootstrap with credential-bound DB connection (decisions #46, #83)
On startup:
- Read Cluster.NodeId + Cluster.ClusterId + ConfigDatabase.ConnectionString from appsettings.json
- Connect to central DB with the configured principal (gMSA / SQL login / cert-mapped)
- Call sp_GetCurrentGenerationForCluster(@NodeId, @ClusterId) — the proc verifies the connected principal is bound to NodeId
- If the proc rejects → fail startup loudly with the principal-mismatch message
Acceptance:
- Test: principal bound to Node A boots successfully when configured with NodeId = A
- Test: principal bound to Node A configured with NodeId = B → startup fails with Unauthorized and the service does not stay running
- Test: principal bound to Node A in cluster C1 configured with ClusterId = C2 → Forbidden
Task D.3 — LiteDB cache fallback on DB outage
If the central DB is unreachable at startup, load the most recent cached generation from LiteDB and start with it. Log loudly. Continue retrying the central DB in the background; on reconnect, resume normal poll cycle.
Acceptance:
- Test: with central DB unreachable, node starts from cache, logs a ConfigDbUnreachableUsingCache event, and the OPC UA endpoint serves the cached config
- Test: cache empty AND central DB unreachable → startup fails with NoConfigAvailable (decision #21)
Stream E — Admin project (2.5 weeks)
Task E.1 — Project scaffold mirroring ScadaLink CentralUI (decision #102)
Copy the project layout from scadalink-design/src/ScadaLink.CentralUI/ (decision #104):
- src/ZB.MOM.WW.OtOpcUa.Admin/: Razor Components project, .NET 10, AddInteractiveServerComponents
- Auth/AuthEndpoints.cs, Auth/CookieAuthenticationStateProvider.cs
- Components/Layout/MainLayout.razor, Components/Layout/NavMenu.razor
- Components/Pages/Login.razor, Components/Pages/Dashboard.razor
- Components/Shared/{DataTable, ConfirmDialog, LoadingSpinner, NotAuthorizedView, RedirectToLogin, TimestampDisplay, ToastNotification}.razor
- EndpointExtensions.cs, ServiceCollectionExtensions.cs
Plus src/ZB.MOM.WW.OtOpcUa.Admin.Security/ (decision #104): LdapAuthService, RoleMapper, JwtTokenService, AuthorizationPolicies mirroring ScadaLink.Security.
Acceptance:
- App builds and runs locally
- Login page renders with OtOpcUa branding (only the <h4> text differs from ScadaLink)
- Visual diff between OtOpcUa and ScadaLink login pages: only the brand text differs (compliance check #3)
Task E.2 — Bootstrap LDAP + cookie auth + admin role mapping
Wire up LdapAuthService against the dev GLAuth instance per Security.md. Map LDAP groups to admin roles:
- OtOpcUaAdmins → FleetAdmin
- OtOpcUaConfigEditors → ConfigEditor
- OtOpcUaViewers → ReadOnly
Plus cluster-scoped grants per decision #105 (LDAP group OtOpcUaConfigEditors-LINE3 → ConfigEditor + ClusterId = LINE3-OPCUA claim).
Acceptance:
- Login as a FleetAdmin-mapped user → redirected to /, sidebar shows admin sections
- Login as a ReadOnly-mapped user → redirected to /, sidebar shows view-only sections
- Login as a cluster-scoped ConfigEditor → only their permitted clusters appear in /clusters
- Login with bad credentials → redirected to /login?error=... with the LDAP error surfaced
Task E.3 — Cluster CRUD pages
Implement per admin-ui.md:
- /clusters — Cluster list (FleetAdmin sees all, ConfigEditor sees scoped)
- /clusters/{ClusterId} — Cluster Detail with all 9 tabs (Overview / Namespaces / UNS Structure / Drivers / Devices / Equipment / Tags / Generations / Audit), though the Drivers / Devices / Equipment / Tags tabs initially show empty tables (no driver implementations yet — Phase 2+)
- "New cluster" workflow per admin-ui.md §"Add a new cluster" — creates cluster row, opens initial draft with default namespaces (decision #123)
- ApplicationUri auto-suggest on node create per decision #86
Acceptance:
- Create a cluster → cluster row exists, initial draft exists with Equipment-kind namespace
- Edit cluster name → change reflected in list + detail
- Disable a cluster → no longer offered as a target for new nodes; existing nodes keep showing in list with "Disabled" badge
Task E.4 — Draft → diff → publish workflow (decision #89)
Implement per admin-ui.md §"Draft Editor", §"Diff Viewer", §"Generation History":
- /clusters/{Id}/draft — full draft editor with auto-save (debounced 500 ms per decision #97)
- /clusters/{Id}/draft/diff — three-column diff viewer
- /clusters/{Id}/generations — list of historical generations with rollback action
- Live sp_ValidateDraft invocation in the validation panel; publish disabled while errors exist
- Publish dialog requires Notes; runs sp_PublishGeneration in a transaction
Acceptance:
- Create draft → validation panel runs and shows clean state for empty draft
- Add an invalid Equipment row (bad UNS segment) → validation panel surfaces the error inline + publish stays disabled
- Fix the row → validation panel goes green + publish enables
- Publish → generation moves Draft → Published; previous Published moves to Superseded; audit log row created
- Roll back to a prior generation → new generation cloned from target; previous generation moves to Superseded; nodes pick up the new generation on next poll
- The "Push now" button per decision #96 is rendered but disabled with the "Available in v2.1" label
Task E.5 — UNS Structure + Equipment + Namespace tabs
Implement the three hybrid tabs:
- Namespaces tab — list with click-to-edit-in-draft
- UNS Structure tab — tree view with drag-drop reorganize, rename with live impact preview
- Equipment tab — list with default sort by ZTag, search across all 5 identifiers
CSV import for Equipment per the revised schema in admin-ui.md (no EquipmentId column; matches by EquipmentUuid for updates per decision #125).
Acceptance:
- Add a UnsArea via draft → publishes → appears in tree
- Drag a UnsLine to a different UnsArea → impact preview shows count of affected equipment + signals → publish moves it; UUIDs preserved
- Equipment CSV import: 10 new rows → all get system-generated EquipmentId + EquipmentUuid; ZTag uniqueness checked against ExternalIdReservation (decision #124)
- Equipment CSV import: 1 row with existing EquipmentUuid → updates the matched row's editable fields
Task E.6 — Generic JSON config editor for DriverConfig
Per decision #94 — until per-driver editors land in their respective phases, use a generic JSON editor with schema-driven validation against DriverTypeRegistry's registered JSON schema for the driver type.
Acceptance:
- Add a Galaxy DriverInstance in a draft → JSON editor renders the Galaxy DriverConfig schema
- Editing produces live validation errors per the schema
- Saving with errors → publish stays disabled
Task E.7 — Real-time updates via SignalR (admin-ui.md §"Real-Time Updates")
Two SignalR hubs:
- FleetStatusHub — pushes ClusterNodeGenerationState changes
- AlertHub — pushes new sticky alerts (crash-loop circuit trips, failed applies)
Backend IHostedService polls every 5s and diffs.
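The poll-and-diff backend can be sketched as follows (the hub method name, payload shape, and INodeStateReader abstraction are assumptions, not the final API):

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.AspNetCore.SignalR;
using Microsoft.Extensions.Hosting;

public sealed class FleetStatusPoller : BackgroundService
{
    private readonly IHubContext<FleetStatusHub> _hub;
    private readonly INodeStateReader _reader;          // hypothetical DB query abstraction
    private Dictionary<string, string> _last = new();   // NodeId -> generation/status

    public FleetStatusPoller(IHubContext<FleetStatusHub> hub, INodeStateReader reader)
        => (_hub, _reader) = (hub, reader);

    protected override async Task ExecuteAsync(CancellationToken ct)
    {
        while (!ct.IsCancellationRequested)
        {
            var current = await _reader.GetNodeGenerationStatesAsync(ct);
            foreach (var (nodeId, state) in current)
                if (!_last.TryGetValue(nodeId, out var prev) || prev != state)
                    await _hub.Clients.All.SendAsync("NodeStateChanged",
                                                     nodeId, state, ct);
            _last = current;
            await Task.Delay(TimeSpan.FromSeconds(5), ct);  // 5s poll per the spec
        }
    }
}
```

Pushing only the diff (not the full state) is what makes the multi-tab acceptance test cheap: an unchanged fleet produces zero hub traffic.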
Acceptance:
- Open Cluster Detail in two browser tabs → publish in tab A → tab B's "current generation" updates within 5s without page reload
- Simulate a LastAppliedStatus = Failed for a node → AlertHub pushes a sticky alert that doesn't auto-clear
Task E.8 — Release reservation + Merge equipment workflows
Per admin-ui.md §"Release an external-ID reservation" and §"Merge or rebind equipment":
- Release flow: FleetAdmin only, requires reason, audit-logged via sp_ReleaseExternalIdReservation
- Merge flow: opens a draft that disables source equipment, re-points tags, releases + re-reserves IDs
Acceptance:
- Release a reservation → ReleasedAt set in DB + audit log entry created with reason
- After release: the same (Kind, Value) can be reserved by a different EquipmentUuid in a future publish
- Merge equipment A → B: draft preview shows tag re-pointing + ID re-reservation; publish executes atomically; A is disabled with an EquipmentMergedAway audit entry
Task E.9 — ACLs tab + bulk-grant + permission simulator
Per admin-ui.md Cluster Detail tab #8 ("ACLs") and acl-design.md §"Admin UI":
- ACLs tab on Cluster Detail with two views ("By LDAP group" + "By scope")
- Edit grant flow: pick scope, group, permission bundle or per-flag, save to draft
- Bulk-grant flow: multi-select scope, group, permissions, preview rows that will be created, publish via draft
- Permission simulator: enter username + LDAP groups → live trie of effective permissions across the cluster's UNS tree
- Cluster-create workflow seeds the v1-compatibility default ACL set (per decision #131)
- Banner on Cluster Detail when the cluster's ACL set diverges from the seed
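The permission model the simulator evaluates is a bitmask with named bundles per acl-design.md; a sketch (flag values and bundle compositions are illustrative, taken only from the names used in the acceptance criteria below):

```csharp
[Flags]
public enum NodePermissions
{
    None           = 0,
    Browse         = 1 << 0,
    Read           = 1 << 1,
    Subscribe      = 1 << 2,
    WriteOperate   = 1 << 3,
    WriteTune      = 1 << 4,
    WriteConfigure = 1 << 5,
    // ... further flags (alarms, history, method call) per acl-design.md

    // Bundles referenced by the two Admin views (compositions illustrative):
    ReadOnly = Browse | Read | Subscribe,
    Engineer = ReadOnly | WriteOperate | WriteTune,
}
```

A bitmask keeps the simulator's effective-permission computation a cheap OR over the grants matched along the scope hierarchy.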
Acceptance:
- Add an ACL grant via draft → publishes → row in the NodeAcl table; appears in both Admin views
- Bulk grant 10 LDAP groups × 1 permission set across 5 UnsAreas → preview shows 50 rows; publish creates them atomically
- Simulator: a user in the OtOpcUaReadOnly group sees the ReadOnly bundle effective at every node in the cluster
- Simulator: a user in OtOpcUaWriteTune sees the Engineer bundle effective; WriteConfigure is denied
- Cluster-create workflow seeds 5 default ACL grants matching v1 LDAP roles (table in acl-design.md §"Default Permissions")
- Divergence banner appears when an operator removes any of the seeded grants
Compliance Checks (run at exit gate)
A phase-1-compliance.ps1 script that exits non-zero on any failure:
Schema compliance
# Run all migrations against a clean SQL Server instance
dotnet ef database update --project src/ZB.MOM.WW.OtOpcUa.Configuration --connection "Server=...;Database=OtOpcUaConfig_Test_$(Get-Date -UFormat %s);..."
# Run schema-introspection tests
dotnet test tests/ZB.MOM.WW.OtOpcUa.Configuration.Tests --filter "Category=SchemaCompliance"
Expected: every table, column, index, FK, CHECK, and stored procedure in config-db-schema.md is present and matches.
Decision compliance
# For each decision number Phase 1 implements (#9, #14-22, #25, #28, #30, #32-33, #46-51, #79-125),
# verify at least one citation exists in source, tests, or migrations:
$decisions = @(9, 14, 15, 16, 17, 18, 19, 20, 21, 22, 25, 28, 30, 32, 33, 46, 47, 48, 49, 50, 51, 79..125)
foreach ($d in $decisions) {
$hits = git grep "decision #$d" -- 'src/' 'tests/' 'docs/v2/implementation/'
if (-not $hits) { Write-Error "Decision #$d has no citation in code or tests"; exit 1 }
}
Visual compliance (Admin UI)
Manual screenshot review:
- Login page side-by-side with ScadaLink's Login.razor rendered
- Sidebar + main layout side-by-side with ScadaLink's MainLayout.razor + NavMenu.razor
- Dashboard side-by-side with ScadaLink's Dashboard.razor
- Reconnect overlay triggered (kill the SignalR connection) — same modal as ScadaLink
Reviewer answers: "could the same operator move between apps without noticing?" Y/N. N = blocking.
Behavioral compliance (end-to-end smoke test)
dotnet test tests/ZB.MOM.WW.OtOpcUa.IntegrationTests --filter "Category=Phase1Smoke"
The smoke test:
- Spins up SQL Server in a container
- Runs all migrations
- Creates an OtOpcUaAdmin SQL principal + an OtOpcUaNode principal bound to a test NodeId
- Starts the Admin app
- Creates a cluster + 1 node + Equipment-kind namespace via Admin API
- Opens a draft, adds 1 UnsArea + 1 UnsLine + 1 Equipment + 0 tags (empty)
- Publishes the draft
- Boots a Server instance configured with the test NodeId
- Asserts the Server fetched the published generation via sp_GetCurrentGenerationForCluster
- Asserts the Server's ClusterNodeGenerationState row reports Applied
- Adds a tag in a new draft, publishes
- Asserts the Server picks up the new generation within 30s (next poll)
- Rolls back to generation 1
- Asserts the Server picks up the rollback within 30s
Expected: all 14 steps pass. Smoke test runs in CI on every PR to v2/phase-1-* branches.
Stability compliance
For Phase 1 the only stability concern is the in-process driver isolation primitives (used later by Phase 3+ drivers, but built in Phase 1):
- IDriver.Reinitialize() semantics tested
- Driver-instance allocation tracking + cache flush tested with a mock driver
- Crash-loop circuit breaker tested with a mock driver that throws on every Reinitialize
Galaxy is still legacy in-process in Phase 1 — Tier C protections for Galaxy land in Phase 2.
Documentation compliance
# Every Phase 1 task in this doc must either be Done or have a deferral note in exit-gate-phase-1.md
# Every decision the phase implements must be reflected in plan.md (no silent decisions)
# Schema doc + admin-ui doc must be updated if implementation deviated
Completion Checklist
The exit gate signs off only when every item below is checked. Each item links to the verifying artifact (test name, screenshot, log line, etc.).
Stream A — Core.Abstractions
- All 11 capability interfaces defined and compiling
- DriverAttributeInfo + supporting enums defined
- DriverTypeRegistry implemented with Galaxy registration
- Interface-independence reflection test passes
Stream B — Configuration
- EF Core migration InitialSchema applies cleanly to a clean SQL Server
- Schema introspection test asserts the live schema matches config-db-schema.md
- All stored procedures present and tested (happy path + error paths)
- sp_PublishGeneration concurrency test passes (one wins, one fails)
- Authorization tests pass (Node principal limited to its cluster, Admin can read/write fleet-wide)
- All 12 validation rules in Configuration.Validation have unit tests
- LiteDB cache round-trip + pruning + corruption tests pass
- Generation-diff applier handles add/remove/modify across all entity types
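The last item covers the generation-diff applier. A minimal sketch of its shape, in Python for brevity (the codebase is C#); the diff keys and entity map are illustrative assumptions, not the real schema:

```python
def apply_diff(state, diff):
    """Apply a generation diff (adds / removes / modifies keyed by entity id)
    to an in-memory entity map, returning the new map without mutating the old."""
    new_state = dict(state)
    for entity_id, entity in diff.get("add", {}).items():
        new_state[entity_id] = entity
    for entity_id in diff.get("remove", []):
        new_state.pop(entity_id, None)  # tolerate already-absent ids
    for entity_id, changes in diff.get("modify", {}).items():
        # sketch assumes the modified entity exists in the current generation
        new_state[entity_id] = {**new_state[entity_id], **changes}
    return new_state
```

Keeping the applier pure over an immutable input map is what makes the apply-then-rebuild equivalence check in the risk table straightforward to state.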
Stream C — Core
- `LmxNodeManager` renamed to `GenericDriverNodeManager`; v1 IntegrationTests still pass
- `GalaxyNodeManager : GenericDriverNodeManager` exists in legacy Host
- `IAddressSpaceBuilder` API implemented; byte-equivalent OPC UA browse output to v1
- Driver hosting + isolation tested with mock drivers (one fails, others continue)
- Memory-budget cache-flush tested with mock driver
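One plausible shape for the memory-budget cache flush above is LRU eviction against a byte budget. A hedged Python sketch (the real policy and class names are not specified here; this only illustrates the budget-then-evict mechanic the mock-driver test would exercise):

```python
from collections import OrderedDict

class BudgetedCache:
    """Evicts least-recently-used entries once the total tracked size
    exceeds the memory budget."""
    def __init__(self, budget_bytes):
        self.budget = budget_bytes
        self.entries = OrderedDict()  # key -> size_bytes, in LRU order
        self.total = 0

    def put(self, key, size_bytes):
        if key in self.entries:
            self.total -= self.entries.pop(key)
        self.entries[key] = size_bytes
        self.total += size_bytes
        while self.total > self.budget and self.entries:
            _, evicted = self.entries.popitem(last=False)  # oldest first
            self.total -= evicted

    def touch(self, key):
        if key in self.entries:
            self.entries.move_to_end(key)  # mark as recently used
```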
Stream D — Server
- `Microsoft.Extensions.Hosting` host runs in console mode and as Windows Service
- TopShelf removed from the codebase
- Credential-bound bootstrap tested (correct principal succeeds; wrong principal fails)
- LiteDB fallback on DB outage tested
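The LiteDB fallback item reduces to a try-SQL-then-cache read path. A minimal Python sketch (the codebase is C# and uses LiteDB as the local cache; the callables and return shape here are illustrative):

```python
def load_generation(read_sql, read_cache, write_cache):
    """Read the active generation from SQL Server; on an outage, fall back
    to the last locally cached copy so the node keeps serving."""
    try:
        gen = read_sql()
        write_cache(gen)      # refresh the local fallback copy on every success
        return gen, "sql"
    except ConnectionError:   # illustrative stand-in for the DB-outage exception
        return read_cache(), "cache"
```

The Phase 1 test simulates the outage by making `read_sql` throw and asserts the previously cached generation is served.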
Stream E — Admin
- Admin app boots, login screen renders with ScadaLink-equivalent visual
- LDAP cookie auth works against dev GLAuth
- Admin roles mapped (FleetAdmin / ConfigEditor / ReadOnly)
- Cluster-scoped grants work (decision #105)
- Cluster CRUD works end-to-end
- Draft → diff → publish workflow works end-to-end
- Rollback works end-to-end
- UNS Structure tab supports add / rename / drag-move with impact preview
- Equipment tab supports CSV import + search across 5 identifiers
- Generic JSON config editor renders + validates DriverConfig per registered schema
- SignalR real-time updates work (multi-tab test)
- Release reservation flow works + audit-logged
- Merge equipment flow works + audit-logged
Cross-cutting
- `phase-1-compliance.ps1` runs and exits 0
- Smoke test (14 steps) passes in CI
- Visual compliance review signed off (operator-equivalence test)
- All decisions cited in code/tests (`git grep "decision #N"` returns hits for each)
- Adversarial review of the phase diff (`/codex:adversarial-review --base v2`) — findings closed or deferred with rationale
- PR opened against `v2`, includes: link to this doc, link to exit-gate record, compliance script output, smoke test logs, adversarial review output, screenshots
- Reviewer signoff (one reviewer beyond the implementation lead)
- `exit-gate-phase-1.md` recorded
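The decision-citation check above is mechanical enough to script. A hedged Python sketch of the core check (the real gate uses `git grep`; function and parameter names here are hypothetical):

```python
import re

def missing_decision_citations(decisions, blobs):
    """Return the decision numbers with no 'decision #N' citation in any
    of the given source/test file contents."""
    cited = set()
    for text in blobs:
        cited.update(int(n) for n in re.findall(r"decision #(\d+)", text))
    return sorted(set(decisions) - cited)
```

The compliance script would feed it every decision number the phase claims to implement and fail the gate when the returned list is non-empty.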
Risks and Mitigations
| Risk | Likelihood | Impact | Mitigation |
|---|---|---|---|
| EF Core 10 idiosyncrasies vs the documented schema | Medium | Medium | Schema-introspection test catches drift; validate early in Stream B |
| `sp_ValidateDraft` cross-table checks complex enough to be slow | Medium | Medium | Per-decision-cited test exists; benchmark with a large draft (1000+ tags) before exit |
| Visual parity with ScadaLink slips because two component libraries diverge over time | Low | Medium | Copy ScadaLink's CSS verbatim where possible; shared component set is structurally identical |
| LDAP integration breaks against production GLAuth (different schema than dev) | Medium | High | Use the v1 LDAP layer as the integration reference; mirror its config exactly |
| Generation-diff applier has subtle bugs on edge cases (renamed entity with same logical ID) | High | High | Property-based test that generates random diffs and asserts apply-then-rebuild produces the same end state |
| ScadaLink.Security pattern works well for site-scoped roles but our cluster-scoped grants are subtly different | Medium | Medium | Side-by-side review of RoleMapper after Stream E starts; refactor if claim shape diverges |
| Phase 1 takes longer than 6 weeks | High | Medium | Mid-gate review at 3 weeks — if Stream B isn't done, defer Stream E.5–8 to a Phase 1.5 follow-up |
| `MERGE` against `ExternalIdReservation` has a deadlock pathology under concurrent publishes | Medium | High | Concurrency test in Task B.2 specifically targets this; if it deadlocks, switch to `INSERT ... WHERE NOT EXISTS` with explicit row locks |
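Independently of the row-lock mitigation in the last row, concurrent publishes usually also want caller-side retry when SQL Server picks one transaction as the deadlock victim (error 1205). A hedged Python sketch of that complementary pattern, with hypothetical names (not the documented mitigation, just a common pairing):

```python
import random
import time

def with_deadlock_retry(op, is_deadlock, attempts=5, base_delay_s=0.01):
    """Run op(); if it fails as a deadlock victim, retry with exponential
    backoff plus jitter. Non-deadlock errors and the final failure re-raise."""
    for attempt in range(attempts):
        try:
            return op()
        except Exception as e:
            if not is_deadlock(e) or attempt == attempts - 1:
                raise
            time.sleep(base_delay_s * (2 ** attempt) * (0.5 + random.random()))
```

Jitter matters here: two publishers that deadlocked once will otherwise retry in lockstep and deadlock again.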
Out of Scope (do not do in Phase 1)
- Galaxy out-of-process split (Phase 2)
- Any Modbus / AB / S7 / TwinCAT / FOCAS driver code (Phases 3–5)
- Per-driver custom config editors in Admin (each driver's phase)
- Equipment-class template integration with the schemas repo
- Consumer cutover (out of v2 scope, separate integration-team track per `implementation/overview.md`)
- Wiring the OPC UA NodeManager to enforce ACLs at runtime (Phase 2+ in each driver phase). Phase 1 ships the `NodeAcl` table + Admin UI ACL editing + evaluator unit tests; per-driver enforcement lands in each driver's phase per `acl-design.md` §"Implementation Plan"
- Push-from-DB notification (decision #96 — v2.1)
- Generation pruning operator UI (decision #93 — v2.1)
- Cluster-scoped admin grant editor in UI (admin-ui.md "Deferred / Out of Scope" — v2.1)
- Mobile / tablet layout