lmxopcua/docs/v2/implementation/phase-1-configuration-and-admin-scaffold.md
Joseph Doherty 4903a19ec9 Add data-path ACL design (acl-design.md, closes corrections B1) + dev-environment inventory and setup plan (dev-environment.md), and remove consumer cutover from OtOpcUa v2 scope.
ACL design defines NodePermissions bitmask flags covering Browse / Read / Subscribe / HistoryRead / WriteOperate / WriteTune / WriteConfigure / AlarmRead / AlarmAcknowledge / AlarmConfirm / AlarmShelve / MethodCall plus common bundles (ReadOnly / Operator / Engineer / Admin); 6-level scope hierarchy (Cluster / Namespace / UnsArea / UnsLine / Equipment / Tag) with default-deny + additive grants and Browse-implication on ancestors; per-LDAP-group grants in a new generation-versioned NodeAcl table edited via the same draft → diff → publish → rollback boundary as every other content table; per-session permission-trie evaluator with O(depth × group-count) cost cached for the lifetime of the session and rebuilt on generation-apply or LDAP group cache expiry; cluster-create workflow seeds a default ACL set matching the v1 LmxOpcUa LDAP-role-to-permission map for v1 → v2 consumer migration parity; Admin UI ACL tab with two views (by LDAP group, by scope), bulk-grant flow, and permission simulator that lets operators preview "as user X" effective permissions across the cluster's UNS tree before publishing; explicit Deny deferred to v2.1 since verbose grants suffice at v2.0 fleet sizes; only denied OPC UA operations are audit-logged (not allowed ones — would dwarf the audit log). Schema doc gains the NodeAcl table with cross-cluster invariant enforcement and same-generation FK validation; admin-ui.md gains the ACLs tab; phase-1 doc gains Task E.9 wiring this through Stream E plus a NodeAcl entry in Task B.1's DbContext list.

Dev-environment doc inventories every external resource the v2 build needs across two tiers per decision #99 — inner-loop (in-process simulators on developer machines: SQL Server local or container, GLAuth at C:\publish\glauth\, local dev Galaxy) and integration (one dedicated Windows host with Docker Desktop on WSL2 backend so TwinCAT XAR VM can run in Hyper-V alongside containerized oitc/modbus-server, plus WSL2-hosted Snap7 and ab_server, plus OPC Foundation reference server, plus FOCAS TestStub and FaultShim) — with concrete container images, ports, default dev credentials (clearly marked dev-only since production uses Integrated Security / gMSA per decision #46), bootstrap order for both tiers, network topology diagram, test data seed locations, and operational risks (TwinCAT trial expiry automation, Docker pricing, integration host SPOF mitigation, per-developer GLAuth config sync, Aveva license scoping that keeps Galaxy tests on developer machines and off the shared host).

Removes consumer cutover (ScadaBridge / Ignition / System Platform IO) from OtOpcUa v2 scope per decision #136 — owned by a separate integration / operations team, tracked in 3-year-plan handoff §"Rollout Posture" and corrections §C5; OtOpcUa team's scope ends at Phase 5. Updates implementation/overview.md phase index to drop the "6+" row and add an explicit "OUT of v2 scope" callout; updates phase-1 and phase-2 docs to reframe cutover as integration-team-owned rather than future-phase numbered.

Decisions #129–137 added: ACL model (#129), NodeAcl generation-versioned (#130), v1-compatibility seed (#131), denied-only audit logging (#132), two-tier dev environment (#133), Docker WSL2 backend for TwinCAT VM coexistence (#134), TwinCAT VM centrally managed / Galaxy on dev machines only (#135), cutover out of v2 scope (#136), dev credentials documented openly (#137).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-17 11:58:33 -04:00

# Phase 1 — Configuration Project + Core.Abstractions + Admin UI Scaffold
> **Status**: DRAFT — implementation plan for Phase 1 of the v2 build (`plan.md` §6).
>
> **Branch**: `v2/phase-1-configuration`
> **Estimated duration**: 4-6 weeks (largest greenfield phase; most foundational)
> **Predecessor**: Phase 0 (`phase-0-rename-and-net10.md`)
> **Successor**: Phase 2 (Galaxy parity refactor)
## Phase Objective
Stand up the **central configuration substrate** for the v2 fleet:
1. **`Core.Abstractions` project** — driver capability interfaces (`IDriver`, `ITagDiscovery`, `IReadable`, `IWritable`, `ISubscribable`, `IAlarmSource`, `IHistoryProvider`, `IRediscoverable`, `IHostConnectivityProbe`, `IDriverConfigEditor`, `DriverAttributeInfo`)
2. **`Configuration` project** — central MSSQL schema + EF Core migrations + stored procedures + LiteDB local cache + generation-diff application logic
3. **`Core` project** — `GenericDriverNodeManager` (renamed from `LmxNodeManager`), driver-hosting infrastructure, OPC UA server lifecycle, address-space registration via `IAddressSpaceBuilder`
4. **`Server` project** — `Microsoft.Extensions.Hosting`-based Windows Service host (replacing TopShelf), bootstrap from Configuration using node-bound credential, register drivers, start Core
5. **`Admin` project** — Blazor Server admin app scaffolded with ScadaLink CentralUI parity (Bootstrap 5, dark sidebar, LDAP cookie auth, three admin roles, draft → publish → rollback workflow, cluster/node/namespace/equipment/tag CRUD)
**No driver instances yet** (Galaxy stays in legacy in-process Host until Phase 2). The phase exit requires that an empty cluster can be created in Admin, an empty generation can be published, and a node can fetch the published generation — proving the configuration substrate works end-to-end.
## Scope — What Changes
| Concern | Change |
|---------|--------|
| New projects | 5 new src projects + 5 matching test projects |
| Existing v1 Host project | Refactored to consume `Core.Abstractions` interfaces against its existing Galaxy implementation — **but not split into Proxy/Host/Shared yet** (Phase 2) |
| `LmxNodeManager` | **Renamed to `GenericDriverNodeManager`** in Core, with `IDriver` swapped in for `IMxAccessClient`. The existing v1 Host instantiates `GalaxyNodeManager : GenericDriverNodeManager` (legacy in-process) — see `plan.md` §5a |
| Service hosting | TopShelf removed; `Microsoft.Extensions.Hosting` BackgroundService used (decision #30) |
| Central config DB | New SQL Server database `OtOpcUaConfig` provisioned from EF Core migrations |
| LDAP authentication for Admin | `Admin.Security` project mirrors `ScadaLink.Security`; cookie auth + JWT API endpoint |
| Local LiteDB cache on each node | New `config_cache.db` per node; bootstraps from central DB or cache |
## Scope — What Does NOT Change
| Item | Reason |
|------|--------|
| Galaxy out-of-process split | Phase 2 |
| Any new driver (Modbus, AB, S7, etc.) | Phase 3+ |
| OPC UA wire behavior | Galaxy address space still served exactly as v1; the Configuration substrate is read but not yet driving everything |
| Equipment-class template integration with future schemas repo | `EquipmentClassRef` is a nullable hook column; no validation yet (decisions #112, #115) |
| Per-driver custom config editors in Admin | Generic JSON editor only in v2.0 (decision #27); driver-specific editors land in their respective phases |
| Consumer cutover (ScadaBridge / Ignition / SystemPlatform IO) | OUT of v2 scope — separate integration-team track per `implementation/overview.md` |
## Entry Gate Checklist
- [ ] Phase 0 exit gate cleared (rename complete, all v1 tests pass under OtOpcUa names)
- [ ] `v2` branch is clean
- [ ] Phase 0 PR merged
- [ ] SQL Server 2019+ instance available for development (local dev box minimum; shared dev instance for integration tests)
- [ ] LDAP / GLAuth dev instance available for Admin auth integration testing
- [ ] ScadaLink CentralUI source accessible at `C:\Users\dohertj2\Desktop\scadalink-design\` for parity reference
- [ ] All Phase 1-relevant design docs reviewed: `plan.md` §4-5, `config-db-schema.md` (entire), `admin-ui.md` (entire), `driver-stability.md` §"Cross-Cutting Protections" (sets context for `Core.Abstractions` scope)
- [ ] Decisions #1-125 read at least skim-level; key ones for Phase 1: #14-22, #25, #28, #30, #32-33, #46-51, #79-125
**Evidence file**: `docs/v2/implementation/entry-gate-phase-1.md` recording date, signoff, environment availability.
## Task Breakdown
Phase 1 is large — broken into 5 work streams (A-E) that can partly overlap. A typical sequencing: A → B → (C and D in parallel) → E.
### Stream A — Core.Abstractions (1 week)
#### Task A.1 — Define driver capability interfaces
Create `src/ZB.MOM.WW.OtOpcUa.Core.Abstractions/` (.NET 10, no dependencies). Define:
```csharp
public interface IDriver { /* lifecycle, metadata, health */ }
public interface ITagDiscovery { /* discover tags/hierarchy from backend */ }
public interface IReadable { /* on-demand read */ }
public interface IWritable { /* on-demand write */ }
public interface ISubscribable { /* data change subscriptions */ }
public interface IAlarmSource { /* alarm events + acknowledgment */ }
public interface IHistoryProvider { /* historical reads */ }
public interface IRediscoverable { /* opt-in change-detection signal */ }
public interface IHostConnectivityProbe { /* per-host runtime status */ }
public interface IDriverConfigEditor { /* Admin UI plug point per driver */ }
public interface IAddressSpaceBuilder { /* core-owned tree builder */ }
```
Plus the data models referenced from the interfaces:
```csharp
public sealed record DriverAttributeInfo(
    string FullName,
    DriverDataType DriverDataType,
    bool IsArray,
    uint? ArrayDim,
    SecurityClassification SecurityClass,
    bool IsHistorized);
public enum DriverDataType { Boolean, Int16, Int32, Int64, UInt16, UInt32, UInt64, Float32, Float64, String, DateTime, Reference, Custom }
public enum SecurityClassification { FreeAccess, Operate, SecuredWrite, VerifiedWrite, Tune, Configure, ViewOnly }
```
**Acceptance**:
- All interfaces compile in a project with **zero dependencies** beyond BCL
- xUnit test project asserts (via reflection) that no interface returns or accepts a type from `Core` or `Configuration` (interface independence per decision #59)
- Each interface XML doc cites the design decision(s) it implements (e.g. `IRediscoverable` cites #54)
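The reflection assertion in the second bullet can be sketched roughly as follows — `CoreOnlyType`, `ICompliant`, and `ILeaky` are hypothetical stand-ins for a Core-owned type and two candidate interfaces, not types from the real projects:

```csharp
using System;
using System.Linq;
using System.Reflection;

// Hypothetical stand-ins: a type that "lives in Core", a compliant interface,
// and one that leaks the Core type through its signature.
public sealed class CoreOnlyType { }
public interface ICompliant { int Read(string tagName); }
public interface ILeaky { CoreOnlyType Fetch(); }

public static class InterfaceIndependenceCheck
{
    // Names of methods whose return or parameter types hit the forbidden predicate.
    public static string[] Violations(Type iface, Func<Type, bool> isForbidden) =>
        iface.GetMethods()
             .Where(m => isForbidden(m.ReturnType) ||
                         m.GetParameters().Any(p => isForbidden(p.ParameterType)))
             .Select(m => m.Name)
             .ToArray();
}
```

The real test would scan every interface in the `Core.Abstractions` assembly with a predicate like "type's assembly is `Core` or `Configuration`".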
#### Task A.2 — Define DriverTypeRegistry
```csharp
public sealed class DriverTypeRegistry
{
    public DriverTypeMetadata Get(string driverType);
    public IEnumerable<DriverTypeMetadata> All();
}

public sealed record DriverTypeMetadata(
    string TypeName,                                  // "Galaxy" | "ModbusTcp" | ...
    NamespaceKindCompatibility AllowedNamespaceKinds, // per decision #111
    string DriverConfigJsonSchema,                    // per decision #91
    string DeviceConfigJsonSchema,                    // optional
    string TagConfigJsonSchema);

[Flags]
public enum NamespaceKindCompatibility
{
    Equipment = 1,
    SystemPlatform = 2,
    Simulated = 4
}
```
In v2.0, only the `Galaxy` type is registered (`AllowedNamespaceKinds = SystemPlatform`); Phase 3+ extends the registry.
**Acceptance**:
- Registry compiles, has unit tests for: register a type, look it up, reject duplicate registration, enumerate all
- Galaxy registration entry exists with `AllowedNamespaceKinds = SystemPlatform` per decision #111
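A minimal runnable sketch of the registry behavior the acceptance bullets describe (register, look up, reject duplicates, enumerate). The trimmed `DriverTypeMetadataSketch` record is illustrative — the real record also carries the JSON schemas:

```csharp
using System;
using System.Collections.Generic;

// Trimmed metadata record for illustration only.
public sealed record DriverTypeMetadataSketch(string TypeName, int AllowedNamespaceKinds);

public sealed class DriverTypeRegistrySketch
{
    private readonly Dictionary<string, DriverTypeMetadataSketch> _types = new(StringComparer.Ordinal);

    public void Register(DriverTypeMetadataSketch meta)
    {
        // TryAdd returns false on duplicate keys — duplicates are a hard error.
        if (!_types.TryAdd(meta.TypeName, meta))
            throw new InvalidOperationException($"Driver type '{meta.TypeName}' already registered.");
    }

    public DriverTypeMetadataSketch Get(string typeName) =>
        _types.TryGetValue(typeName, out var meta)
            ? meta
            : throw new KeyNotFoundException($"Unknown driver type '{typeName}'.");

    public IEnumerable<DriverTypeMetadataSketch> All() => _types.Values;
}
```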
### Stream B — Configuration project (1.5 weeks)
#### Task B.1 — EF Core schema + initial migration
Create `src/ZB.MOM.WW.OtOpcUa.Configuration/` (.NET 10, EF Core 10).
Implement DbContext with entities matching `config-db-schema.md` exactly:
- `ServerCluster`, `ClusterNode`, `ClusterNodeCredential`
- `Namespace` (generation-versioned per decision #123)
- `UnsArea`, `UnsLine`
- `ConfigGeneration`
- `DriverInstance`, `Device`, `Equipment`, `Tag`, `PollGroup`
- `NodeAcl` (generation-versioned per decision #130; data-path authorization grants per `acl-design.md`)
- `ClusterNodeGenerationState`, `ConfigAuditLog`
- `ExternalIdReservation` (NOT generation-versioned per decision #124)
Generate the initial migration:
```bash
dotnet ef migrations add InitialSchema --project src/ZB.MOM.WW.OtOpcUa.Configuration
```
**Acceptance**:
- Apply migration to a clean SQL Server instance produces the schema in `config-db-schema.md`
- Schema-validation test (`SchemaComplianceTests`) introspects the live DB and asserts every table/column/index/constraint matches the doc
- Test runs in CI against a SQL Server container
#### Task B.2 — Stored procedures via `MigrationBuilder.Sql`
Add stored procedures from `config-db-schema.md` §"Stored Procedures":
- `sp_GetCurrentGenerationForCluster`
- `sp_GetGenerationContent`
- `sp_RegisterNodeGenerationApplied`
- `sp_PublishGeneration` (with the `MERGE` against `ExternalIdReservation` per decision #124)
- `sp_RollbackToGeneration`
- `sp_ValidateDraft` (calls into managed validator code per decision #91 — proc is structural-only, content schema validation is in the Admin app)
- `sp_ComputeGenerationDiff`
- `sp_ReleaseExternalIdReservation` (FleetAdmin only)
Use `CREATE OR ALTER` style in `MigrationBuilder.Sql()` blocks so procs version with the schema.
**Acceptance**:
- Each proc has at least one xUnit test exercising the happy path + at least one error path
- `sp_PublishGeneration` has a concurrency test: two simultaneous publishes for the same cluster → one wins, one fails with a recognizable error
- `sp_GetCurrentGenerationForCluster` has an authorization test: caller bound to NodeId X cannot read cluster Y's generation
#### Task B.3 — Authorization model (SQL principals + GRANT)
Add a separate migration `AuthorizationGrants` that:
- Creates two SQL roles: `OtOpcUaNode`, `OtOpcUaAdmin`
- Grants EXECUTE on the appropriate procs per `config-db-schema.md` §"Authorization Model"
- Grants no direct table access to either role
**Acceptance**:
- Test that a principal in the `OtOpcUaNode` role can call only the node procs, not admin procs
- Test that a principal in the `OtOpcUaAdmin` role can call publish/rollback procs
- Test that a direct `SELECT * FROM dbo.ConfigGeneration` from an `OtOpcUaNode` principal is denied
#### Task B.4 — JSON-schema validators (managed code)
In `Configuration.Validation/`, implement validators consumed by `sp_ValidateDraft` (called from the Admin app pre-publish per decision #91):
- UNS segment regex (`^[a-z0-9-]{1,32}$` or `_default`)
- Path length (≤200 chars)
- UUID immutability across generations
- Same-cluster namespace binding (decision #122)
- ZTag/SAPID reservation pre-flight (decision #124)
- EquipmentId derivation rule (decision #125)
- Driver type ↔ namespace kind allowed (decision #111)
- JSON-schema validation per `DriverType` from `DriverTypeRegistry`
**Acceptance**:
- One unit test per rule, both passing and failing cases
- Cross-rule integration test: a draft that violates 3 rules surfaces all 3 (not just the first)
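Two of the structural rules above (segment regex, path length) can be sketched like this — shaped to satisfy the cross-rule acceptance test by collecting every violation rather than stopping at the first. The error-message wording is illustrative:

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class DraftValidatorSketch
{
    // Segment rule from the list above: ^[a-z0-9-]{1,32}$ or the literal "_default".
    private static readonly Regex UnsSegment = new("^[a-z0-9-]{1,32}$", RegexOptions.Compiled);

    public static bool IsValidSegment(string s) => s == "_default" || UnsSegment.IsMatch(s);

    // Collects every violation so a draft with 3 bad rows surfaces all 3.
    public static IReadOnlyList<string> Validate(IEnumerable<string> unsPaths)
    {
        var errors = new List<string>();
        foreach (var path in unsPaths)
        {
            if (path.Length > 200)
                errors.Add($"Path too long ({path.Length} > 200): {path}");
            foreach (var seg in path.Split('/'))
                if (!IsValidSegment(seg))
                    errors.Add($"Invalid UNS segment '{seg}' in '{path}'");
        }
        return errors;
    }
}
```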
#### Task B.5 — LiteDB local cache
In `Configuration.LocalCache/`, implement the LiteDB schema from `config-db-schema.md` §"Local LiteDB Cache":
```csharp
public interface ILocalConfigCache
{
    Task<GenerationCacheEntry?> GetMostRecentAsync(string clusterId);
    Task PutAsync(GenerationCacheEntry entry);
    Task PruneOldGenerationsAsync(string clusterId, int keepLatest = 10);
}
```
**Acceptance**:
- Round-trip test: write a generation snapshot, read it back, assert deep equality
- Pruning test: write 15 generations, prune to 10, assert the 5 oldest are gone
- Corruption test: corrupt the LiteDB file, assert the loader fails fast with a clear error
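The pruning rule in the second bullet reduces to keep-the-N-newest. A minimal sketch, assuming integer generation ids for illustration (the real cache keys by cluster + generation snapshot):

```csharp
using System.Collections.Generic;
using System.Linq;

public static class CachePruneSketch
{
    // Keeps the keepLatest newest generation ids, discarding everything older.
    public static List<int> Prune(IEnumerable<int> generationIds, int keepLatest = 10) =>
        generationIds.OrderByDescending(id => id)
                     .Take(keepLatest)      // newest N survive
                     .OrderBy(id => id)     // return in ascending order
                     .ToList();
}
```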
#### Task B.6 — Generation-diff application logic
In `Configuration.Apply/`, implement the diff-and-apply logic that runs on each node when a new generation arrives:
```csharp
public interface IGenerationApplier
{
    Task<ApplyResult> ApplyAsync(GenerationSnapshot from, GenerationSnapshot to, CancellationToken ct);
}
```
Diff per entity type, dispatch to driver `Reinitialize` / cache flush as needed.
**Acceptance**:
- Diff test: from = empty, to = (1 driver + 5 equipment + 50 tags) → `Added` for each
- Diff test: from = (above), to = same with one tag's `Name` changed → `Modified` for one tag, no other changes
- Diff test: from = (above), to = same with one equipment removed → `Removed` for the equipment + cascading `Removed` for its tags
- Apply test against an in-memory mock driver: applies the diff in correct order, idempotent on retry
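The per-entity diff in the acceptance bullets keys on the immutable UUID: rows present only in `to` are `Added`, only in `from` are `Removed`, and rows whose values changed are `Modified`. A sketch with an illustrative `TagRow` record (record value equality does the modification check):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Illustrative row shape; the real snapshot rows carry full tag config.
public sealed record TagRow(Guid Uuid, string Name);
public sealed record EntityDiff(
    IReadOnlyList<TagRow> Added,
    IReadOnlyList<TagRow> Removed,
    IReadOnlyList<TagRow> Modified);

public static class GenerationDiffSketch
{
    public static EntityDiff Diff(IEnumerable<TagRow> from, IEnumerable<TagRow> to)
    {
        var before = from.ToDictionary(t => t.Uuid);
        var after  = to.ToDictionary(t => t.Uuid);

        var added    = after.Values.Where(t => !before.ContainsKey(t.Uuid)).ToList();
        var removed  = before.Values.Where(t => !after.ContainsKey(t.Uuid)).ToList();
        // Record value-inequality flags any field change as Modified.
        var modified = after.Values
            .Where(t => before.TryGetValue(t.Uuid, out var old) && old != t)
            .ToList();

        return new EntityDiff(added, removed, modified);
    }
}
```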
### Stream C — Core project (1 week, can parallel with Stream D)
#### Task C.1 — Rename `LmxNodeManager` → `GenericDriverNodeManager`
Per `plan.md` §5a:
- Lift the file from `Host/OpcUa/LmxNodeManager.cs` to `Core/OpcUa/GenericDriverNodeManager.cs`
- Swap `IMxAccessClient` for `IDriver` (composing `IReadable` / `IWritable` / `ISubscribable`)
- Swap `GalaxyAttributeInfo` for `DriverAttributeInfo`
- Promote `GalaxyRuntimeProbeManager` interactions to use `IHostConnectivityProbe`
- Move `MxDataTypeMapper` and `SecurityClassificationMapper` to a new `Driver.Galaxy.Mapping/` (still in legacy Host until Phase 2)
**Acceptance**:
- v1 IntegrationTests still pass against the renamed class (parity is the gate, decision #62 — class is "foundation, not rewrite")
- Reflection test asserts `GenericDriverNodeManager` has no static or instance reference to any Galaxy-specific type
#### Task C.2 — Derive `GalaxyNodeManager : GenericDriverNodeManager` (legacy in-process)
In the existing Host project, add a thin `GalaxyNodeManager` that:
- Inherits from `GenericDriverNodeManager`
- Wires up `MxDataTypeMapper`, `SecurityClassificationMapper`, the probe manager, etc.
- Replaces direct instantiation of the renamed class
**Acceptance**:
- v1 IntegrationTests pass identically with `GalaxyNodeManager` instantiated instead of the old direct class
- Existing dev Galaxy still serves the same address space byte-for-byte (compare with a baseline browse capture)
#### Task C.3 — `IAddressSpaceBuilder` API (decision #52)
Implement the streaming builder API drivers use to register nodes:
```csharp
public interface IAddressSpaceBuilder
{
    IFolderBuilder Folder(string browseName, string displayName);
    IVariableBuilder Variable(string browseName, DriverDataType type, ...);
    void AddProperty(string browseName, object value);
}
```
Refactor `GenericDriverNodeManager.BuildAddressSpace` to consume `IAddressSpaceBuilder` (driver streams in tags rather than buffering them).
**Acceptance**:
- Build a Galaxy address space via the new builder API, assert byte-equivalent OPC UA browse output vs v1
- Memory profiling test: building a 5000-tag address space via the builder uses <50% the peak RAM of the buffered approach
#### Task C.4 — Driver hosting + isolation (decision #65, #74)
Implement the in-process driver host that:
- Loads each `DriverInstance` row's driver assembly
- Catches and contains driver exceptions (driver isolation, decision #12)
- Surfaces `IDriver.Reinitialize()` to the configuration applier
- Tracks per-driver allocation footprint (`GetMemoryFootprint()` polled every 30s per `driver-stability.md`)
- Flushes optional caches on budget breach
- Marks drivers `Faulted` (Bad quality on their nodes) if `Reinitialize` fails
**Acceptance**:
- Integration test: spin up two mock drivers; one throws on Read; the other keeps working. Quality on the broken driver's nodes goes Bad; the other driver is unaffected.
- Memory-budget test: mock driver reports growing footprint above budget; cache-flush is triggered; footprint drops; no process action taken.
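The containment behavior in the first integration test — a throwing driver is marked `Faulted` and its failure never escapes to the host or to sibling drivers — can be sketched as follows. The interface shape and mock drivers are illustrative, not the real `IDriver` contract:

```csharp
using System;

public interface IReadableSketch { double Read(string tag); }

public sealed class IsolatedDriverHost
{
    private readonly IReadableSketch _driver;
    public bool Faulted { get; private set; }

    public IsolatedDriverHost(IReadableSketch driver) => _driver = driver;

    // Returns null (Bad quality on the node) instead of letting a driver exception escape.
    public double? TryRead(string tag)
    {
        if (Faulted) return null;
        try { return _driver.Read(tag); }
        catch { Faulted = true; return null; }
    }
}

// Mock drivers for the acceptance scenario.
public sealed class ConstDriver : IReadableSketch { public double Read(string tag) => 42.0; }
public sealed class ThrowingDriver : IReadableSketch { public double Read(string tag) => throw new InvalidOperationException("backend down"); }
```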
### Stream D — Server project (4 days, can parallel with Stream C)
#### Task D.1 — `Microsoft.Extensions.Hosting` Windows Service host (decision #30)
Replace TopShelf with `Microsoft.Extensions.Hosting`:
- New `Program.cs` using `Host.CreateApplicationBuilder()`
- `BackgroundService` that owns the OPC UA server lifecycle
- `builder.Services.AddWindowsService()` registers the process as a Windows service (the `IServiceCollection` extension that pairs with `Host.CreateApplicationBuilder()`)
- Configuration bootstrap from `appsettings.json` (NodeId + ClusterId + DB conn) per decision #18
**Acceptance**:
- `dotnet run` runs interactively (console mode)
- Installed as a Windows Service (`sc create OtOpcUa ...`), starts and stops cleanly
- Service install + uninstall cycle leaves no leftover state
#### Task D.2 — Bootstrap with credential-bound DB connection (decisions #46, #83)
On startup:
- Read `Cluster.NodeId` + `Cluster.ClusterId` + `ConfigDatabase.ConnectionString` from `appsettings.json`
- Connect to central DB with the configured principal (gMSA / SQL login / cert-mapped)
- Call `sp_GetCurrentGenerationForCluster(@NodeId, @ClusterId)` — the proc verifies the connected principal is bound to NodeId
- If proc rejects → fail startup loudly with the principal mismatch message
**Acceptance**:
- Test: principal bound to Node A boots successfully when configured with NodeId = A
- Test: principal bound to Node A configured with NodeId = B → startup fails with `Unauthorized` and the service does not stay running
- Test: principal bound to Node A in cluster C1 configured with ClusterId = C2 → `Forbidden`
#### Task D.3 — LiteDB cache fallback on DB outage
If the central DB is unreachable at startup, load the most recent cached generation from LiteDB and start with it. Log loudly. Continue retrying the central DB in the background; on reconnect, resume normal poll cycle.
**Acceptance**:
- Test: with central DB unreachable, node starts from cache, logs `ConfigDbUnreachableUsingCache` event, OPC UA endpoint serves the cached config
- Test: cache empty AND central DB unreachable → startup fails with `NoConfigAvailable` (decision #21)
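The startup decision in D.2/D.3 reduces to: central DB first, LiteDB cache second, fail fast with `NoConfigAvailable` when both are missing. A sketch with delegate stand-ins for the two fetch paths (names and shapes are illustrative):

```csharp
using System;

public static class ConfigBootstrapSketch
{
    public static (string Source, string Generation) Load(
        Func<string> fetchCentral,   // throws when the central DB is unreachable
        Func<string?> fetchCache)    // returns null when the local cache is empty
    {
        try { return ("CentralDb", fetchCentral()); }
        catch (Exception)
        {
            // Here the real host logs ConfigDbUnreachableUsingCache and keeps retrying in the background.
            var cached = fetchCache();
            if (cached is not null) return ("LocalCache", cached);
            throw new InvalidOperationException("NoConfigAvailable"); // decision #21: no config, no start
        }
    }
}
```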
### Stream E — Admin project (2.5 weeks)
#### Task E.1 — Project scaffold mirroring ScadaLink CentralUI (decision #102)
Copy the project layout from `scadalink-design/src/ScadaLink.CentralUI/` (decision #104):
- `src/ZB.MOM.WW.OtOpcUa.Admin/`: Razor Components project, .NET 10, `AddInteractiveServerComponents`
- `Auth/AuthEndpoints.cs`, `Auth/CookieAuthenticationStateProvider.cs`
- `Components/Layout/MainLayout.razor`, `Components/Layout/NavMenu.razor`
- `Components/Pages/Login.razor`, `Components/Pages/Dashboard.razor`
- `Components/Shared/{DataTable, ConfirmDialog, LoadingSpinner, NotAuthorizedView, RedirectToLogin, TimestampDisplay, ToastNotification}.razor`
- `EndpointExtensions.cs`, `ServiceCollectionExtensions.cs`
Plus `src/ZB.MOM.WW.OtOpcUa.Admin.Security/` (decision #104): `LdapAuthService`, `RoleMapper`, `JwtTokenService`, `AuthorizationPolicies` mirroring `ScadaLink.Security`.
**Acceptance**:
- App builds and runs locally
- Login page renders with OtOpcUa branding (only the `<h4>` text differs from ScadaLink)
- Visual diff between OtOpcUa and ScadaLink login pages: only the brand text differs (compliance check #3)
#### Task E.2 — Bootstrap LDAP + cookie auth + admin role mapping
Wire up `LdapAuthService` against the dev GLAuth instance per `Security.md`. Map LDAP groups to admin roles:
- `OtOpcUaAdmins` → `FleetAdmin`
- `OtOpcUaConfigEditors` → `ConfigEditor`
- `OtOpcUaViewers` → `ReadOnly`
Plus cluster-scoped grants per decision #105 (LDAP group `OtOpcUaConfigEditors-LINE3` → `ConfigEditor` + `ClusterId = LINE3-OPCUA` claim).
**Acceptance**:
- Login as a `FleetAdmin`-mapped user → redirected to `/`, sidebar shows admin sections
- Login as a `ReadOnly`-mapped user → redirected to `/`, sidebar shows view-only sections
- Login as a cluster-scoped `ConfigEditor` → only their permitted clusters appear in `/clusters`
- Login with bad credentials → redirected to `/login?error=...` with the LDAP error surfaced
#### Task E.3 — Cluster CRUD pages
Implement per `admin-ui.md`:
- `/clusters` — Cluster list (FleetAdmin sees all, ConfigEditor sees scoped)
- `/clusters/{ClusterId}` — Cluster Detail with all 10 tabs (Overview / Namespaces / UNS Structure / Drivers / Devices / Equipment / Tags / ACLs / Generations / Audit), but Drivers/Devices/Equipment/Tags tabs initially show empty tables (no driver implementations yet — Phase 2+)
- "New cluster" workflow per `admin-ui.md` §"Add a new cluster" — creates cluster row, opens initial draft with default namespaces (decision #123)
- ApplicationUri auto-suggest on node create per decision #86
**Acceptance**:
- Create a cluster → cluster row exists, initial draft exists with Equipment-kind namespace
- Edit cluster name → change reflected in list + detail
- Disable a cluster → no longer offered as a target for new nodes; existing nodes keep showing in list with "Disabled" badge
#### Task E.4 — Draft → diff → publish workflow (decision #89)
Implement per `admin-ui.md` §"Draft Editor", §"Diff Viewer", §"Generation History":
- `/clusters/{Id}/draft` — full draft editor with auto-save (debounced 500ms per decision #97)
- `/clusters/{Id}/draft/diff` — three-column diff viewer
- `/clusters/{Id}/generations` — list of historical generations with rollback action
- Live `sp_ValidateDraft` invocation in the validation panel; publish disabled while errors exist
- Publish dialog requires Notes; runs `sp_PublishGeneration` in a transaction
**Acceptance**:
- Create draft → validation panel runs and shows clean state for empty draft
- Add an invalid Equipment row (bad UNS segment) → validation panel surfaces the error inline + publish stays disabled
- Fix the row → validation panel goes green + publish enables
- Publish → generation moves Draft → Published; previous Published moves to Superseded; audit log row created
- Roll back to a prior generation → new generation cloned from target; previous generation moves to Superseded; nodes pick up the new generation on next poll
- The "Push now" button per decision #96 is rendered but disabled with the "Available in v2.1" label
#### Task E.5 — UNS Structure + Equipment + Namespace tabs
Implement the three hybrid tabs:
- Namespaces tab — list with click-to-edit-in-draft
- UNS Structure tab — tree view with drag-drop reorganize, rename with live impact preview
- Equipment tab — list with default sort by ZTag, search across all 5 identifiers
CSV import for Equipment per the revised schema in `admin-ui.md` (no EquipmentId column; matches by EquipmentUuid for updates per decision #125).
**Acceptance**:
- Add a UnsArea via draft → publishes → appears in tree
- Drag a UnsLine to a different UnsArea → impact preview shows count of affected equipment + signals → publish moves it; UUIDs preserved
- Equipment CSV import: 10 new rows → all get system-generated EquipmentId + EquipmentUuid; ZTag uniqueness checked against `ExternalIdReservation` (decision #124)
- Equipment CSV import: 1 row with existing EquipmentUuid → updates the matched row's editable fields
#### Task E.6 — Generic JSON config editor for `DriverConfig`
Per decision #94 — until per-driver editors land in their respective phases, use a generic JSON editor with schema-driven validation against `DriverTypeRegistry`'s registered JSON schema for the driver type.
**Acceptance**:
- Add a Galaxy `DriverInstance` in a draft → JSON editor renders the Galaxy DriverConfig schema
- Editing produces live validation errors per the schema
- Saving with errors → publish stays disabled
#### Task E.7 — Real-time updates via SignalR (admin-ui.md §"Real-Time Updates")
Two SignalR hubs:
- `FleetStatusHub` — pushes `ClusterNodeGenerationState` changes
- `AlertHub` — pushes new sticky alerts (crash-loop circuit trips, failed applies)
Backend `IHostedService` polls every 5s and diffs.
**Acceptance**:
- Open Cluster Detail in two browser tabs → publish in tab A → tab B's "current generation" updates within 5s without page reload
- Simulate a `LastAppliedStatus = Failed` for a node → AlertHub pushes a sticky alert that doesn't auto-clear
#### Task E.8 — Release reservation + Merge equipment workflows
Per `admin-ui.md` §"Release an external-ID reservation" and §"Merge or rebind equipment":
- Release flow: FleetAdmin only, requires reason, audit-logged via `sp_ReleaseExternalIdReservation`
- Merge flow: opens a draft that disables source equipment, re-points tags, releases + re-reserves IDs
**Acceptance**:
- Release a reservation → `ReleasedAt` set in DB + audit log entry created with reason
- After release: same `(Kind, Value)` can be reserved by a different EquipmentUuid in a future publish
- Merge equipment A → B: draft preview shows tag re-pointing + ID re-reservation; publish executes atomically; A is disabled with `EquipmentMergedAway` audit entry
#### Task E.9 — ACLs tab + bulk-grant + permission simulator
Per `admin-ui.md` Cluster Detail tab #8 ("ACLs") and `acl-design.md` §"Admin UI":
- ACLs tab on Cluster Detail with two views ("By LDAP group" + "By scope")
- Edit grant flow: pick scope, group, permission bundle or per-flag, save to draft
- Bulk-grant flow: multi-select scope, group, permissions, preview rows that will be created, publish via draft
- Permission simulator: enter username + LDAP groups → live trie of effective permissions across the cluster's UNS tree
- Cluster-create workflow seeds the v1-compatibility default ACL set (per decision #131)
- Banner on Cluster Detail when the cluster's ACL set diverges from the seed
**Acceptance**:
- Add an ACL grant via draft → publishes → row in `NodeAcl` table; appears in both Admin views
- Bulk grant 10 LDAP groups × 1 permission set across 5 UnsAreas → preview shows 50 rows; publish creates them atomically
- Simulator: a user in `OtOpcUaReadOnly` group sees `ReadOnly` bundle effective at every node in the cluster
- Simulator: a user in `OtOpcUaWriteTune` sees `Engineer` bundle effective; `WriteConfigure` is denied
- Cluster-create workflow seeds 5 default ACL grants matching v1 LDAP roles (table in `acl-design.md` §"Default Permissions")
- Divergence banner appears when an operator removes any of the seeded grants
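The simulator's evaluation rule from `acl-design.md` — default-deny with additive grants OR-ed together down the scope hierarchy — can be sketched with a bitmask and a prefix match. Flag names follow the design; the numeric values and the flat `scope path → permissions` grant map are illustrative (the real evaluator is a per-session trie):

```csharp
using System;
using System.Collections.Generic;

[Flags]
public enum NodePermissionsSketch
{
    None = 0,
    Browse = 1, Read = 2, Subscribe = 4,
    WriteOperate = 8, WriteTune = 16, WriteConfigure = 32,
    // Bundles are just unions of flags.
    ReadOnly = Browse | Read | Subscribe,
    Engineer = ReadOnly | WriteOperate | WriteTune,
}

public static class AclEvaluatorSketch
{
    // grants: scope path -> permissions granted to one LDAP group at that scope.
    public static NodePermissionsSketch Effective(
        string nodePath, IReadOnlyDictionary<string, NodePermissionsSketch> grants)
    {
        var effective = NodePermissionsSketch.None;   // default deny
        foreach (var (scope, perms) in grants)
            if (nodePath == scope || nodePath.StartsWith(scope + "/", StringComparison.Ordinal))
                effective |= perms;                   // additive grants only; no explicit Deny until v2.1
        return effective;
    }
}
```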
## Compliance Checks (run at exit gate)
A `phase-1-compliance.ps1` script that exits non-zero on any failure:
### Schema compliance
```powershell
# Run all migrations against a clean SQL Server instance
dotnet ef database update --project src/ZB.MOM.WW.OtOpcUa.Configuration --connection "Server=...;Database=OtOpcUaConfig_Test_$([DateTimeOffset]::UtcNow.ToUnixTimeSeconds());..."
# Run schema-introspection tests
dotnet test tests/ZB.MOM.WW.OtOpcUa.Configuration.Tests --filter "Category=SchemaCompliance"
```
Expected: every table, column, index, FK, CHECK, and stored procedure in `config-db-schema.md` is present and matches.
### Decision compliance
```powershell
# For each decision number Phase 1 implements (#9, #14-22, #25, #28, #30, #32-33, #46-51, #79-125),
# verify at least one citation exists in source, tests, or migrations:
$decisions = @(9, 14, 15, 16, 17, 18, 19, 20, 21, 22, 25, 28, 30, 32, 33, 46, 47, 48, 49, 50, 51, 79..125)
foreach ($d in $decisions) {
$hits = git grep -E "decision #$d\b" -- 'src/' 'tests/' 'docs/v2/implementation/'  # \b so #9 doesn't match #90-#99
if (-not $hits) { Write-Error "Decision #$d has no citation in code or tests"; exit 1 }
}
```
### Visual compliance (Admin UI)
Manual screenshot review:
1. Login page side-by-side with ScadaLink's `Login.razor` rendered
2. Sidebar + main layout side-by-side with ScadaLink's `MainLayout.razor` + `NavMenu.razor`
3. Dashboard side-by-side with ScadaLink's `Dashboard.razor`
4. Reconnect overlay triggered (kill the SignalR connection) — same modal as ScadaLink
Reviewer answers: "could the same operator move between apps without noticing?" Y/N. N = blocking.
### Behavioral compliance (end-to-end smoke test)
```bash
dotnet test tests/ZB.MOM.WW.OtOpcUa.IntegrationTests --filter "Category=Phase1Smoke"
```
The smoke test:
1. Spins up SQL Server in a container
2. Runs all migrations
3. Creates an `OtOpcUaAdmin` SQL principal + an `OtOpcUaNode` principal bound to a test NodeId
4. Starts the Admin app
5. Creates a cluster + 1 node + Equipment-kind namespace via Admin API
6. Opens a draft, adds 1 UnsArea + 1 UnsLine + 1 Equipment + 0 tags (empty)
7. Publishes the draft
8. Boots a Server instance configured with the test NodeId
9. Asserts the Server fetched the published generation via `sp_GetCurrentGenerationForCluster`
10. Asserts the Server's `ClusterNodeGenerationState` row reports `Applied`
11. Adds a tag in a new draft, publishes
12. Asserts the Server picks up the new generation within 30s (next poll)
13. Rolls back to generation 1
14. Asserts the Server picks up the rollback within 30s
Expected: all 14 steps pass. Smoke test runs in CI on every PR to `v2/phase-1-*` branches.
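Steps 12 and 14 assert pickup "within 30s", which calls for a bounded poll rather than a fixed sleep. A minimal sketch of the retry helper such assertions can use — the `wait_until` name and its usage are illustrative, not taken from the test suite:

```bash
#!/usr/bin/env bash
# wait_until: poll a command until it succeeds or a deadline passes.
# Usage: wait_until <timeout_seconds> <command...>
wait_until() {
  local timeout=$1; shift
  local deadline=$((SECONDS + timeout))
  until "$@"; do
    if [ "$SECONDS" -ge "$deadline" ]; then
      echo "timed out after ${timeout}s waiting for: $*" >&2
      return 1
    fi
    sleep 1
  done
}

# Example: succeeds immediately, since `true` always passes
wait_until 30 true && echo "condition met"   # → condition met
```

In the smoke test the polled command would be whatever asserts the Server's current generation (e.g. a query against `ClusterNodeGenerationState`), keeping the 30-second budget explicit instead of baked into a sleep.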
### Stability compliance
For Phase 1, the only stability concerns are the in-process driver-isolation primitives (built in Phase 1, used by Phase 3+ drivers):
- `IDriver.Reinitialize()` semantics tested
- Driver-instance allocation tracking + cache flush tested with a mock driver
- Crash-loop circuit breaker tested with a mock driver that throws on every Reinitialize
Galaxy is still legacy in-process in Phase 1 — Tier C protections for Galaxy land in Phase 2.
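The crash-loop circuit breaker reduces to a consecutive-failure counter that trips after a threshold. A hedged sketch of the behavior the mock-driver test exercises — the threshold name and value are illustrative, not the codebase's:

```bash
#!/usr/bin/env bash
# Crash-loop circuit breaker sketch: open after N consecutive Reinitialize failures.
MAX_FAILURES=3                 # illustrative threshold, not the real configured value
reinitialize() { return 1; }   # mock driver that fails on every Reinitialize
failures=0
state="closed"
for attempt in 1 2 3 4 5; do
  if reinitialize; then
    failures=0                 # any success resets the streak
  else
    failures=$((failures + 1))
    if [ "$failures" -ge "$MAX_FAILURES" ]; then
      state="open"             # stop retrying; surface to the operator instead
      break
    fi
  fi
done
echo "breaker=$state after $failures failures"   # → breaker=open after 3 failures
```

The test's always-throwing mock driver corresponds to `reinitialize` returning nonzero every time, so the breaker must open rather than retry forever.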
### Documentation compliance
- Every Phase 1 task in this doc is either Done or has a deferral note in `exit-gate-phase-1.md`
- Every decision the phase implements is reflected in `plan.md` (no silent decisions)
- Schema doc + admin-ui doc updated if implementation deviated
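A hedged sketch of how the first check could be automated. The doc path matches this file; the location of `exit-gate-phase-1.md` alongside it, and "deferral" as the note keyword, are assumptions:

```bash
#!/usr/bin/env bash
# Fail if any Phase 1 checklist item is unchecked and no deferral note exists.
doc="docs/v2/implementation/phase-1-configuration-and-admin-scaffold.md"
gate="docs/v2/implementation/exit-gate-phase-1.md"   # assumed location

# Count unchecked "- [ ]" checklist items; default to 0 if the file is absent.
unchecked=$(grep -c '^- \[ \]' "$doc" 2>/dev/null)
unchecked=${unchecked:-0}

if [ "$unchecked" -gt 0 ] && ! grep -qi 'deferral' "$gate" 2>/dev/null; then
  echo "$unchecked unchecked items in $doc with no deferral note in $gate" >&2
  exit 1
fi
echo "documentation compliance: ok"
```

The other two checks (plan.md reflection, schema/admin-ui doc drift) stay manual — they require judgment about whether a deviation is material.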
## Completion Checklist
The exit gate signs off only when **every** item below is checked. Each item links to the verifying artifact (test name, screenshot, log line, etc.).
### Stream A — Core.Abstractions
- [ ] All 11 capability interfaces defined and compiling
- [ ] `DriverAttributeInfo` + supporting enums defined
- [ ] `DriverTypeRegistry` implemented with Galaxy registration
- [ ] Interface-independence reflection test passes
### Stream B — Configuration
- [ ] EF Core migration `InitialSchema` applies cleanly to a clean SQL Server
- [ ] Schema introspection test asserts the live schema matches `config-db-schema.md`
- [ ] All stored procedures present and tested (happy path + error paths)
- [ ] `sp_PublishGeneration` concurrency test passes (one wins, one fails)
- [ ] Authorization tests pass (Node principal limited to its cluster, Admin can read/write fleet-wide)
- [ ] All 12 validation rules in `Configuration.Validation` have unit tests
- [ ] LiteDB cache round-trip + pruning + corruption tests pass
- [ ] Generation-diff applier handles add/remove/modify across all entity types
### Stream C — Core
- [ ] `LmxNodeManager` renamed to `GenericDriverNodeManager`; v1 IntegrationTests still pass
- [ ] `GalaxyNodeManager : GenericDriverNodeManager` exists in legacy Host
- [ ] `IAddressSpaceBuilder` API implemented; byte-equivalent OPC UA browse output to v1
- [ ] Driver hosting + isolation tested with mock drivers (one fails, others continue)
- [ ] Memory-budget cache-flush tested with mock driver
### Stream D — Server
- [ ] `Microsoft.Extensions.Hosting` host runs in console mode and as Windows Service
- [ ] Topshelf removed from the codebase
- [ ] Credential-bound bootstrap tested (correct principal succeeds; wrong principal fails)
- [ ] LiteDB fallback on DB outage tested
### Stream E — Admin
- [ ] Admin app boots, login screen renders with ScadaLink-equivalent visual
- [ ] LDAP cookie auth works against dev GLAuth
- [ ] Admin roles mapped (FleetAdmin / ConfigEditor / ReadOnly)
- [ ] Cluster-scoped grants work (decision #105)
- [ ] Cluster CRUD works end-to-end
- [ ] Draft → diff → publish workflow works end-to-end
- [ ] Rollback works end-to-end
- [ ] UNS Structure tab supports add / rename / drag-move with impact preview
- [ ] Equipment tab supports CSV import + search across 5 identifiers
- [ ] Generic JSON config editor renders + validates DriverConfig per registered schema
- [ ] SignalR real-time updates work (multi-tab test)
- [ ] Release reservation flow works + audit-logged
- [ ] Merge equipment flow works + audit-logged
### Cross-cutting
- [ ] `phase-1-compliance.ps1` runs and exits 0
- [ ] Smoke test (14 steps) passes in CI
- [ ] Visual compliance review signed off (operator-equivalence test)
- [ ] All decisions cited in code/tests (`git grep "decision #N"` returns hits for each)
- [ ] Adversarial review of the phase diff (`/codex:adversarial-review --base v2`) — findings closed or deferred with rationale
- [ ] PR opened against `v2`, includes: link to this doc, link to exit-gate record, compliance script output, smoke test logs, adversarial review output, screenshots
- [ ] Reviewer signoff (one reviewer beyond the implementation lead)
- [ ] `exit-gate-phase-1.md` recorded
## Risks and Mitigations
| Risk | Likelihood | Impact | Mitigation |
|------|:----------:|:------:|------------|
| EF Core 10 idiosyncrasies vs the documented schema | Medium | Medium | Schema-introspection test catches drift; validate early in Stream B |
| `sp_ValidateDraft` cross-table checks complex enough to be slow | Medium | Medium | Per-decision-cited test exists; benchmark with a large draft (1000+ tags) before exit |
| Visual parity with ScadaLink slips because two component libraries diverge over time | Low | Medium | Copy ScadaLink's CSS verbatim where possible; shared component set is structurally identical |
| LDAP integration breaks against production GLAuth (different schema than dev) | Medium | High | Use the v1 LDAP layer as the integration reference; mirror its config exactly |
| Generation-diff applier has subtle bugs on edge cases (renamed entity with same logical ID) | High | High | Property-based test that generates random diffs and asserts apply-then-rebuild produces the same end state |
| ScadaLink.Security pattern works well for site-scoped roles but our cluster-scoped grants are subtly different | Medium | Medium | Side-by-side review of `RoleMapper` after Stream E starts; refactor if claim shape diverges |
| Phase 1 takes longer than 6 weeks | High | Medium | Mid-gate review at 3 weeks — if Stream B isn't done, defer Stream E.58 to a Phase 1.5 follow-up |
| `MERGE` against `ExternalIdReservation` has a deadlock pathology under concurrent publishes | Medium | High | Concurrency test in Task B.2 specifically targets this; if it deadlocks, switch to `INSERT ... WHERE NOT EXISTS` with explicit row locks |
## Out of Scope (do not do in Phase 1)
- Galaxy out-of-process split (Phase 2)
- Any Modbus / AB / S7 / TwinCAT / FOCAS driver code (Phases 3-5)
- Per-driver custom config editors in Admin (each driver's phase)
- Equipment-class template integration with the schemas repo
- Consumer cutover (out of v2 scope, separate integration-team track per `implementation/overview.md`)
- Wiring the OPC UA NodeManager to enforce ACLs at runtime (Phase 2+ in each driver phase). Phase 1 ships the `NodeAcl` table + Admin UI ACL editing + evaluator unit tests; per-driver enforcement lands in each driver's phase per `acl-design.md` §"Implementation Plan"
- Push-from-DB notification (decision #96 — v2.1)
- Generation pruning operator UI (decision #93 — v2.1)
- Cluster-scoped admin grant editor in UI (admin-ui.md "Deferred / Out of Scope" — v2.1)
- Mobile / tablet layout