ACL design defines NodePermissions bitmask flags covering Browse / Read / Subscribe / HistoryRead / WriteOperate / WriteTune / WriteConfigure / AlarmRead / AlarmAcknowledge / AlarmConfirm / AlarmShelve / MethodCall plus common bundles (ReadOnly / Operator / Engineer / Admin); 6-level scope hierarchy (Cluster / Namespace / UnsArea / UnsLine / Equipment / Tag) with default-deny + additive grants and Browse-implication on ancestors; per-LDAP-group grants in a new generation-versioned NodeAcl table edited via the same draft → diff → publish → rollback boundary as every other content table; per-session permission-trie evaluator with O(depth × group-count) cost cached for the lifetime of the session and rebuilt on generation-apply or LDAP group cache expiry; cluster-create workflow seeds a default ACL set matching the v1 LmxOpcUa LDAP-role-to-permission map for v1 → v2 consumer migration parity; Admin UI ACL tab with two views (by LDAP group, by scope), bulk-grant flow, and permission simulator that lets operators preview "as user X" effective permissions across the cluster's UNS tree before publishing; explicit Deny deferred to v2.1 since verbose grants suffice at v2.0 fleet sizes; only denied OPC UA operations are audit-logged (not allowed ones — would dwarf the audit log). Schema doc gains the NodeAcl table with cross-cluster invariant enforcement and same-generation FK validation; admin-ui.md gains the ACLs tab; phase-1 doc gains Task E.9 wiring this through Stream E plus a NodeAcl entry in Task B.1's DbContext list. 
Dev-environment doc inventories every external resource the v2 build needs across two tiers per decision #99 — inner-loop (in-process simulators on developer machines: SQL Server local or container, GLAuth at C:\publish\glauth\, local dev Galaxy) and integration (one dedicated Windows host with Docker Desktop on WSL2 backend so TwinCAT XAR VM can run in Hyper-V alongside containerized oitc/modbus-server, plus WSL2-hosted Snap7 and ab_server, plus OPC Foundation reference server, plus FOCAS TestStub and FaultShim) — with concrete container images, ports, default dev credentials (clearly marked dev-only since production uses Integrated Security / gMSA per decision #46), bootstrap order for both tiers, network topology diagram, test data seed locations, and operational risks (TwinCAT trial expiry automation, Docker pricing, integration host SPOF mitigation, per-developer GLAuth config sync, Aveva license scoping that keeps Galaxy tests on developer machines and off the shared host). Removes consumer cutover (ScadaBridge / Ignition / System Platform IO) from OtOpcUa v2 scope per decision #136 — owned by a separate integration / operations team, tracked in 3-year-plan handoff §"Rollout Posture" and corrections §C5; OtOpcUa team's scope ends at Phase 5. Updates implementation/overview.md phase index to drop the "6+" row and add an explicit "OUT of v2 scope" callout; updates phase-1 and phase-2 docs to reframe cutover as integration-team-owned rather than future-phase numbered. Decisions #129–137 added: ACL model (#129), NodeAcl generation-versioned (#130), v1-compatibility seed (#131), denied-only audit logging (#132), two-tier dev environment (#133), Docker WSL2 backend for TwinCAT VM coexistence (#134), TwinCAT VM centrally managed / Galaxy on dev machines only (#135), cutover out of v2 scope (#136), dev credentials documented openly (#137). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
# Phase 1 — Configuration Project + Core.Abstractions + Admin UI Scaffold

> **Status**: DRAFT — implementation plan for Phase 1 of the v2 build (`plan.md` §6).
>
> **Branch**: `v2/phase-1-configuration`
> **Estimated duration**: 4–6 weeks (largest greenfield phase; most foundational)
> **Predecessor**: Phase 0 (`phase-0-rename-and-net10.md`)
> **Successor**: Phase 2 (Galaxy parity refactor)

## Phase Objective

Stand up the **central configuration substrate** for the v2 fleet:

1. **`Core.Abstractions` project** — driver capability interfaces (`IDriver`, `ITagDiscovery`, `IReadable`, `IWritable`, `ISubscribable`, `IAlarmSource`, `IHistoryProvider`, `IRediscoverable`, `IHostConnectivityProbe`, `IDriverConfigEditor`, `DriverAttributeInfo`)
2. **`Configuration` project** — central MSSQL schema + EF Core migrations + stored procedures + LiteDB local cache + generation-diff application logic
3. **`Core` project** — `GenericDriverNodeManager` (renamed from `LmxNodeManager`), driver-hosting infrastructure, OPC UA server lifecycle, address-space registration via `IAddressSpaceBuilder`
4. **`Server` project** — `Microsoft.Extensions.Hosting`-based Windows Service host (replacing TopShelf), bootstrap from Configuration using node-bound credential, register drivers, start Core
5. **`Admin` project** — Blazor Server admin app scaffolded with ScadaLink CentralUI parity (Bootstrap 5, dark sidebar, LDAP cookie auth, three admin roles, draft → publish → rollback workflow, cluster/node/namespace/equipment/tag CRUD)

**No driver instances yet** (Galaxy stays in legacy in-process Host until Phase 2). The phase exit requires that an empty cluster can be created in Admin, an empty generation can be published, and a node can fetch the published generation — proving the configuration substrate works end-to-end.

## Scope — What Changes

| Concern | Change |
|---------|--------|
| New projects | 5 new src projects + 5 matching test projects |
| Existing v1 Host project | Refactored to consume `Core.Abstractions` interfaces against its existing Galaxy implementation — **but not split into Proxy/Host/Shared yet** (Phase 2) |
| `LmxNodeManager` | **Renamed to `GenericDriverNodeManager`** in Core, with `IDriver` swapped in for `IMxAccessClient`. The existing v1 Host instantiates `GalaxyNodeManager : GenericDriverNodeManager` (legacy in-process) — see `plan.md` §5a |
| Service hosting | TopShelf removed; `Microsoft.Extensions.Hosting` BackgroundService used (decision #30) |
| Central config DB | New SQL Server database `OtOpcUaConfig` provisioned from EF Core migrations |
| LDAP authentication for Admin | `Admin.Security` project mirrors `ScadaLink.Security`; cookie auth + JWT API endpoint |
| Local LiteDB cache on each node | New `config_cache.db` per node; bootstraps from central DB or cache |

## Scope — What Does NOT Change

| Item | Reason |
|------|--------|
| Galaxy out-of-process split | Phase 2 |
| Any new driver (Modbus, AB, S7, etc.) | Phase 3+ |
| OPC UA wire behavior | Galaxy address space still served exactly as v1; the Configuration substrate is read but not yet driving everything |
| Equipment-class template integration with future schemas repo | `EquipmentClassRef` is a nullable hook column; no validation yet (decisions #112, #115) |
| Per-driver custom config editors in Admin | Generic JSON editor only in v2.0 (decision #27); driver-specific editors land in their respective phases |
| Consumer cutover (ScadaBridge / Ignition / SystemPlatform IO) | OUT of v2 scope — separate integration-team track per `implementation/overview.md` |

## Entry Gate Checklist

- [ ] Phase 0 exit gate cleared (rename complete, all v1 tests pass under OtOpcUa names)
- [ ] `v2` branch is clean
- [ ] Phase 0 PR merged
- [ ] SQL Server 2019+ instance available for development (local dev box minimum; shared dev instance for integration tests)
- [ ] LDAP / GLAuth dev instance available for Admin auth integration testing
- [ ] ScadaLink CentralUI source accessible at `C:\Users\dohertj2\Desktop\scadalink-design\` for parity reference
- [ ] All Phase 1-relevant design docs reviewed: `plan.md` §4–5, `config-db-schema.md` (entire), `admin-ui.md` (entire), `driver-stability.md` §"Cross-Cutting Protections" (sets context for `Core.Abstractions` scope)
- [ ] Decisions #1–125 read at least skim-level; key ones for Phase 1: #14–22, #25, #28, #30, #32–33, #46–51, #79–125

**Evidence file**: `docs/v2/implementation/entry-gate-phase-1.md` recording date, signoff, environment availability.

## Task Breakdown

Phase 1 is large — broken into 5 work streams (A–E) that can partly overlap. A typical sequencing: A → B → (C and D in parallel) → E.

### Stream A — Core.Abstractions (1 week)

#### Task A.1 — Define driver capability interfaces

Create `src/ZB.MOM.WW.OtOpcUa.Core.Abstractions/` (.NET 10, no dependencies). Define:

```csharp
public interface IDriver { /* lifecycle, metadata, health */ }
public interface ITagDiscovery { /* discover tags/hierarchy from backend */ }
public interface IReadable { /* on-demand read */ }
public interface IWritable { /* on-demand write */ }
public interface ISubscribable { /* data change subscriptions */ }
public interface IAlarmSource { /* alarm events + acknowledgment */ }
public interface IHistoryProvider { /* historical reads */ }
public interface IRediscoverable { /* opt-in change-detection signal */ }
public interface IHostConnectivityProbe { /* per-host runtime status */ }
public interface IDriverConfigEditor { /* Admin UI plug point per driver */ }
public interface IAddressSpaceBuilder { /* core-owned tree builder */ }
```

Plus the data models referenced from the interfaces:

```csharp
public sealed record DriverAttributeInfo(
    string FullName,
    DriverDataType DriverDataType,
    bool IsArray,
    uint? ArrayDim,
    SecurityClassification SecurityClass,
    bool IsHistorized);

public enum DriverDataType { Boolean, Int16, Int32, Int64, UInt16, UInt32, UInt64, Float32, Float64, String, DateTime, Reference, Custom }
public enum SecurityClassification { FreeAccess, Operate, SecuredWrite, VerifiedWrite, Tune, Configure, ViewOnly }
```

**Acceptance**:

- All interfaces compile in a project with **zero dependencies** beyond BCL
- xUnit test project asserts (via reflection) that no interface returns or accepts a type from `Core` or `Configuration` (interface independence per decision #59)
- Each interface XML doc cites the design decision(s) it implements (e.g. `IRediscoverable` cites #54)

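The reflection-based independence check in the acceptance list could be sketched roughly as follows. This is only an illustration: the helper name `InterfaceIndependence` and the idea of matching by assembly name (rather than by namespace, which would wrongly flag `Core.Abstractions` itself as being "under" `Core`) are assumptions, not part of the plan.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;

// Hypothetical helper: lists interface members whose signature touches a forbidden assembly.
public static class InterfaceIndependence
{
    public static IReadOnlyList<string> FindViolations(
        Assembly abstractions, params string[] forbiddenAssemblyNames)
    {
        var forbidden = new HashSet<string>(forbiddenAssemblyNames, StringComparer.Ordinal);
        var violations = new List<string>();

        foreach (var iface in abstractions.GetExportedTypes().Where(t => t.IsInterface))
        foreach (var method in iface.GetMethods()) // includes property and event accessors
        {
            var signature = method.GetParameters()
                .Select(p => p.ParameterType)
                .Append(method.ReturnType);

            foreach (var type in signature.SelectMany(Flatten))
                if (forbidden.Contains(type.Assembly.GetName().Name ?? ""))
                    violations.Add($"{iface.Name}.{method.Name} -> {type}");
        }
        return violations;
    }

    // Unwraps arrays, by-ref types and generic arguments so Task<Foo> also yields Foo.
    private static IEnumerable<Type> Flatten(Type t)
    {
        if (t.HasElementType) return Flatten(t.GetElementType()!);
        if (!t.IsGenericType) return new[] { t };
        return t.GetGenericArguments().SelectMany(Flatten).Prepend(t);
    }
}
```

An xUnit test would then assert that `FindViolations(typeof(IDriver).Assembly, "ZB.MOM.WW.OtOpcUa.Core", "ZB.MOM.WW.OtOpcUa.Configuration")` is empty; those assembly names are inferred from the project paths in this document and may differ in practice.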
#### Task A.2 — Define DriverTypeRegistry

```csharp
public sealed class DriverTypeRegistry
{
    public DriverTypeMetadata Get(string driverType);
    public IEnumerable<DriverTypeMetadata> All();
}

public sealed record DriverTypeMetadata(
    string TypeName,                                  // "Galaxy" | "ModbusTcp" | ...
    NamespaceKindCompatibility AllowedNamespaceKinds, // per decision #111
    string DriverConfigJsonSchema,                    // per decision #91
    string DeviceConfigJsonSchema,                    // optional
    string TagConfigJsonSchema);

[Flags]
public enum NamespaceKindCompatibility
{
    Equipment = 1, SystemPlatform = 2, Simulated = 4
}
```

In v2.0, Phase 1 registers only the `Galaxy` type (`AllowedNamespaceKinds = SystemPlatform`). Phase 3+ extends the registry with further driver types.

**Acceptance**:

- Registry compiles, has unit tests for: register a type, look it up, reject duplicate registration, enumerate all
- Galaxy registration entry exists with `AllowedNamespaceKinds = SystemPlatform` per decision #111

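The registry signatures above are declarations only, and the acceptance criteria imply a registration method that rejects duplicates. A minimal in-memory sketch, assuming a `Register` method, case-sensitive type names, and exceptions as the failure signal (none of which the plan specifies), might look like this. The metadata types are repeated so the sketch compiles on its own.

```csharp
using System;
using System.Collections.Generic;

[Flags]
public enum NamespaceKindCompatibility { Equipment = 1, SystemPlatform = 2, Simulated = 4 }

public sealed record DriverTypeMetadata(
    string TypeName,
    NamespaceKindCompatibility AllowedNamespaceKinds,
    string DriverConfigJsonSchema,
    string DeviceConfigJsonSchema,
    string TagConfigJsonSchema);

// Hypothetical implementation; the class name and Register method are illustrative.
public sealed class InMemoryDriverTypeRegistry
{
    private readonly Dictionary<string, DriverTypeMetadata> _types = new(StringComparer.Ordinal);

    // Rejects a second registration under the same TypeName.
    public void Register(DriverTypeMetadata metadata)
    {
        if (!_types.TryAdd(metadata.TypeName, metadata))
            throw new InvalidOperationException(
                $"Driver type '{metadata.TypeName}' is already registered.");
    }

    public DriverTypeMetadata Get(string driverType) =>
        _types.TryGetValue(driverType, out var metadata)
            ? metadata
            : throw new KeyNotFoundException($"Unknown driver type '{driverType}'.");

    public IEnumerable<DriverTypeMetadata> All() => _types.Values;
}
```

Throwing on duplicates keeps a mis-wired startup loud; returning a `bool` from `Register` would be an equally reasonable choice.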
### Stream B — Configuration project (1.5 weeks)

#### Task B.1 — EF Core schema + initial migration

Create `src/ZB.MOM.WW.OtOpcUa.Configuration/` (.NET 10, EF Core 10).

Implement DbContext with entities matching `config-db-schema.md` exactly:

- `ServerCluster`, `ClusterNode`, `ClusterNodeCredential`
- `Namespace` (generation-versioned per decision #123)
- `UnsArea`, `UnsLine`
- `ConfigGeneration`
- `DriverInstance`, `Device`, `Equipment`, `Tag`, `PollGroup`
- `NodeAcl` (generation-versioned per decision #130; data-path authorization grants per `acl-design.md`)
- `ClusterNodeGenerationState`, `ConfigAuditLog`
- `ExternalIdReservation` (NOT generation-versioned per decision #124)

Generate the initial migration:

```bash
dotnet ef migrations add InitialSchema --project src/ZB.MOM.WW.OtOpcUa.Configuration
```

**Acceptance**:

- Apply migration to a clean SQL Server instance produces the schema in `config-db-schema.md`
- Schema-validation test (`SchemaComplianceTests`) introspects the live DB and asserts every table/column/index/constraint matches the doc
- Test runs in CI against a SQL Server container

#### Task B.2 — Stored procedures via `MigrationBuilder.Sql`

Add stored procedures from `config-db-schema.md` §"Stored Procedures":

- `sp_GetCurrentGenerationForCluster`
- `sp_GetGenerationContent`
- `sp_RegisterNodeGenerationApplied`
- `sp_PublishGeneration` (with the `MERGE` against `ExternalIdReservation` per decision #124)
- `sp_RollbackToGeneration`
- `sp_ValidateDraft` (calls into managed validator code per decision #91 — proc is structural-only, content schema validation is in the Admin app)
- `sp_ComputeGenerationDiff`
- `sp_ReleaseExternalIdReservation` (FleetAdmin only)

Use `CREATE OR ALTER` style in `MigrationBuilder.Sql()` blocks so procs version with the schema.

**Acceptance**:

- Each proc has at least one xUnit test exercising the happy path + at least one error path
- `sp_PublishGeneration` has a concurrency test: two simultaneous publishes for the same cluster → one wins, one fails with a recognizable error
- `sp_GetCurrentGenerationForCluster` has an authorization test: caller bound to NodeId X cannot read cluster Y's generation

#### Task B.3 — Authorization model (SQL principals + GRANT)

Add a separate migration `AuthorizationGrants` that:

- Creates two SQL roles: `OtOpcUaNode`, `OtOpcUaAdmin`
- Grants EXECUTE on the appropriate procs per `config-db-schema.md` §"Authorization Model"
- Grants no direct table access to either role

**Acceptance**:

- Test that runs as a `OtOpcUaNode`-roled principal can only call the node procs, not admin procs
- Test that runs as a `OtOpcUaAdmin`-roled principal can call publish/rollback procs
- Test that direct `SELECT * FROM dbo.ConfigGeneration` from a `OtOpcUaNode` principal is denied

#### Task B.4 — JSON-schema validators (managed code)

In `Configuration.Validation/`, implement validators consumed by `sp_ValidateDraft` (called from the Admin app pre-publish per decision #91):

- UNS segment regex (`^[a-z0-9-]{1,32}$` or `_default`)
- Path length (≤200 chars)
- UUID immutability across generations
- Same-cluster namespace binding (decision #122)
- ZTag/SAPID reservation pre-flight (decision #124)
- EquipmentId derivation rule (decision #125)
- Driver type ↔ namespace kind allowed (decision #111)
- JSON-schema validation per `DriverType` from `DriverTypeRegistry`

**Acceptance**:

- One unit test per rule, both passing and failing cases
- Cross-rule integration test: a draft that violates 3 rules surfaces all 3 (not just the first)

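To illustrate the "surface every violation, not just the first" requirement, here is a small sketch covering only the first two rules (segment regex and path length). The type names, the `/` path separator, and the error-record shape are assumptions made for the example.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public sealed record ValidationError(string Rule, string Message);

// Hypothetical validator for the UNS segment and path-length rules only.
public static class UnsPathValidator
{
    // \z is used instead of $ so that a trailing newline is rejected as well.
    private static readonly Regex Segment =
        new(@"^[a-z0-9-]{1,32}\z", RegexOptions.CultureInvariant);

    private const int MaxPathLength = 200;

    public static IReadOnlyList<ValidationError> Validate(IReadOnlyList<string> segments)
    {
        var errors = new List<ValidationError>();

        // Collect one error per bad segment rather than stopping at the first.
        foreach (var segment in segments)
            if (segment != "_default" && !Segment.IsMatch(segment))
                errors.Add(new ValidationError("UnsSegment", $"Invalid UNS segment '{segment}'."));

        // Assumption: the path is the segments joined with '/'.
        var path = string.Join("/", segments);
        if (path.Length > MaxPathLength)
            errors.Add(new ValidationError("PathLength",
                $"Path is {path.Length} chars; the limit is {MaxPathLength}."));

        return errors;
    }
}
```

Returning a list instead of throwing lets the Admin validation panel show all problems in one pass.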
#### Task B.5 — LiteDB local cache

In `Configuration.LocalCache/`, implement the LiteDB schema from `config-db-schema.md` §"Local LiteDB Cache":

```csharp
public interface ILocalConfigCache
{
    Task<GenerationCacheEntry?> GetMostRecentAsync(string clusterId);
    Task PutAsync(GenerationCacheEntry entry);
    Task PruneOldGenerationsAsync(string clusterId, int keepLatest = 10);
}
```

**Acceptance**:

- Round-trip test: write a generation snapshot, read it back, assert deep equality
- Pruning test: write 15 generations, prune to 10, assert the 5 oldest are gone
- Corruption test: corrupt the LiteDB file, assert the loader fails fast with a clear error

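The pruning semantics can be pinned down with an in-memory stand-in before the LiteDB implementation exists. The sketch below is not the LiteDB code: the `GenerationCacheEntry` shape (cluster id, numeric generation id, JSON payload) and the `Count` helper are assumptions for illustration.

```csharp
#nullable enable
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Assumed entry shape; the real one is defined by config-db-schema.md.
public sealed record GenerationCacheEntry(string ClusterId, long GenerationId, string PayloadJson);

public interface ILocalConfigCache
{
    Task<GenerationCacheEntry?> GetMostRecentAsync(string clusterId);
    Task PutAsync(GenerationCacheEntry entry);
    Task PruneOldGenerationsAsync(string clusterId, int keepLatest = 10);
}

// In-memory test double that mirrors the intended behavior.
public sealed class InMemoryConfigCache : ILocalConfigCache
{
    private readonly object _gate = new();
    private readonly List<GenerationCacheEntry> _entries = new();

    public Task<GenerationCacheEntry?> GetMostRecentAsync(string clusterId)
    {
        lock (_gate)
            return Task.FromResult(
                _entries.Where(e => e.ClusterId == clusterId).MaxBy(e => e.GenerationId));
    }

    public Task PutAsync(GenerationCacheEntry entry)
    {
        lock (_gate)
        {
            // Replace any existing copy of the same generation.
            _entries.RemoveAll(e => e.ClusterId == entry.ClusterId && e.GenerationId == entry.GenerationId);
            _entries.Add(entry);
        }
        return Task.CompletedTask;
    }

    public Task PruneOldGenerationsAsync(string clusterId, int keepLatest = 10)
    {
        lock (_gate)
        {
            var keep = _entries.Where(e => e.ClusterId == clusterId)
                .OrderByDescending(e => e.GenerationId)
                .Take(keepLatest)
                .Select(e => e.GenerationId)
                .ToHashSet();
            _entries.RemoveAll(e => e.ClusterId == clusterId && !keep.Contains(e.GenerationId));
        }
        return Task.CompletedTask;
    }

    // Test-only helper, not part of the interface.
    public int Count(string clusterId)
    {
        lock (_gate) return _entries.Count(e => e.ClusterId == clusterId);
    }
}
```

The same test cases (round-trip, prune 15 down to 10) can later run unchanged against the LiteDB-backed implementation.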
#### Task B.6 — Generation-diff application logic

In `Configuration.Apply/`, implement the diff-and-apply logic that runs on each node when a new generation arrives:

```csharp
public interface IGenerationApplier
{
    Task<ApplyResult> ApplyAsync(GenerationSnapshot from, GenerationSnapshot to, CancellationToken ct);
}
```

Diff per entity type, dispatch to driver `Reinitialize` / cache flush as needed.

**Acceptance**:

- Diff test: from = empty, to = (1 driver + 5 equipment + 50 tags) → `Added` for each
- Diff test: from = (above), to = same with one tag's `Name` changed → `Modified` for one tag, no other changes
- Diff test: from = (above), to = same with one equipment removed → `Removed` for the equipment + cascading `Removed` for its tags
- Apply test against an in-memory mock driver: applies the diff in correct order, idempotent on retry

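The per-entity diff behind those acceptance cases can be sketched as a keyed comparison of two snapshots. Everything named here (`ChangeKind`, `Change<T>`, `GenerationDiff`) is hypothetical; the sketch assumes each entity type is exposed as a dictionary keyed by a stable id and compared by value equality, as C# records provide.

```csharp
#nullable enable
using System.Collections.Generic;

public enum ChangeKind { Added, Modified, Removed }

public sealed record Change<T>(ChangeKind Kind, string Key, T? Before, T? After) where T : class;

public static class GenerationDiff
{
    // Compares one entity type between two generations.
    public static IReadOnlyList<Change<T>> Compute<T>(
        IReadOnlyDictionary<string, T> from,
        IReadOnlyDictionary<string, T> to) where T : class
    {
        var changes = new List<Change<T>>();

        foreach (var (key, after) in to)
        {
            if (!from.TryGetValue(key, out var before))
                changes.Add(new Change<T>(ChangeKind.Added, key, null, after));
            else if (!EqualityComparer<T>.Default.Equals(before, after))
                changes.Add(new Change<T>(ChangeKind.Modified, key, before, after));
        }

        foreach (var (key, before) in from)
            if (!to.ContainsKey(key))
                changes.Add(new Change<T>(ChangeKind.Removed, key, before, null));

        return changes;
    }
}
```

Because each snapshot is complete, tags belonging to a removed equipment are simply absent from `to`, so the cascading `Removed` entries fall out of running the same comparison over the tag dictionary.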
### Stream C — Core project (1 week, can parallel with Stream D)

#### Task C.1 — Rename `LmxNodeManager` → `GenericDriverNodeManager`

Per `plan.md` §5a:

- Lift the file from `Host/OpcUa/LmxNodeManager.cs` to `Core/OpcUa/GenericDriverNodeManager.cs`
- Swap `IMxAccessClient` for `IDriver` (composing `IReadable` / `IWritable` / `ISubscribable`)
- Swap `GalaxyAttributeInfo` for `DriverAttributeInfo`
- Promote `GalaxyRuntimeProbeManager` interactions to use `IHostConnectivityProbe`
- Move `MxDataTypeMapper` and `SecurityClassificationMapper` to a new `Driver.Galaxy.Mapping/` (still in legacy Host until Phase 2)

**Acceptance**:

- v1 IntegrationTests still pass against the renamed class (parity is the gate, decision #62 — class is "foundation, not rewrite")
- Reflection test asserts `GenericDriverNodeManager` has no static or instance reference to any Galaxy-specific type

#### Task C.2 — Derive `GalaxyNodeManager : GenericDriverNodeManager` (legacy in-process)

In the existing Host project, add a thin `GalaxyNodeManager` that:

- Inherits from `GenericDriverNodeManager`
- Wires up `MxDataTypeMapper`, `SecurityClassificationMapper`, the probe manager, etc.
- Replaces direct instantiation of the renamed class

**Acceptance**:

- v1 IntegrationTests pass identically with `GalaxyNodeManager` instantiated instead of the old direct class
- Existing dev Galaxy still serves the same address space byte-for-byte (compare with a baseline browse capture)

#### Task C.3 — `IAddressSpaceBuilder` API (decision #52)

Implement the streaming builder API drivers use to register nodes:

```csharp
public interface IAddressSpaceBuilder
{
    IFolderBuilder Folder(string browseName, string displayName);
    IVariableBuilder Variable(string browseName, DriverDataType type, ...);
    void AddProperty(string browseName, object value);
}
```

Refactor `GenericDriverNodeManager.BuildAddressSpace` to consume `IAddressSpaceBuilder` (driver streams in tags rather than buffering them).

**Acceptance**:

- Build a Galaxy address space via the new builder API, assert byte-equivalent OPC UA browse output vs v1
- Memory profiling test: building a 5000-tag address space via the builder uses <50% the peak RAM of the buffered approach

#### Task C.4 — Driver hosting + isolation (decision #65, #74)

Implement the in-process driver host that:

- Loads each `DriverInstance` row's driver assembly
- Catches and contains driver exceptions (driver isolation, decision #12)
- Surfaces `IDriver.Reinitialize()` to the configuration applier
- Tracks per-driver allocation footprint (`GetMemoryFootprint()` polled every 30s per `driver-stability.md`)
- Flushes optional caches on budget breach
- Marks drivers `Faulted` (Bad quality on their nodes) if `Reinitialize` fails

**Acceptance**:

- Integration test: spin up two mock drivers; one throws on Read; the other keeps working. Quality on the broken driver's nodes goes Bad; the other driver is unaffected.
- Memory-budget test: mock driver reports growing footprint above budget; cache-flush is triggered; footprint drops; no process action taken.

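The exception-containment part of the host can be illustrated with a thin wrapper around a driver's read call. This is a sketch under stated assumptions: the real `IReadable` signature is not defined in this document, so the driver is modeled as a plain delegate, and `ReadResult` / `IsolatedDriver` are hypothetical names.

```csharp
#nullable enable
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical result type: IsGood == false stands in for OPC UA Bad quality.
public readonly record struct ReadResult(object? Value, bool IsGood);

public sealed class IsolatedDriver
{
    private readonly Func<string, CancellationToken, Task<object?>> _read;

    public IsolatedDriver(Func<string, CancellationToken, Task<object?>> read) => _read = read;

    // Number of contained failures, useful for health reporting.
    public int FailureCount { get; private set; }

    public async Task<ReadResult> ReadAsync(string tag, CancellationToken ct = default)
    {
        try
        {
            return new ReadResult(await _read(tag, ct), IsGood: true);
        }
        catch (OperationCanceledException) when (ct.IsCancellationRequested)
        {
            throw; // caller-requested cancellation is not a driver fault
        }
        catch (Exception)
        {
            // Contain the fault: report Bad quality instead of letting it escape the host.
            FailureCount++;
            return new ReadResult(null, IsGood: false);
        }
    }
}
```

Each driver gets its own wrapper instance, so one driver throwing on every read leaves the results of the other untouched, which is what the two-mock-driver integration test checks.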
### Stream D — Server project (4 days, can parallel with Stream C)

#### Task D.1 — `Microsoft.Extensions.Hosting` Windows Service host (decision #30)

Replace TopShelf with `Microsoft.Extensions.Hosting`:

- New `Program.cs` using `Host.CreateApplicationBuilder()`
- `BackgroundService` that owns the OPC UA server lifecycle
- `builder.Services.AddWindowsService()` registers as a Windows service (`UseWindowsService()` applies only to the older `IHostBuilder` API, not to the builder returned by `Host.CreateApplicationBuilder()`)
- Configuration bootstrap from `appsettings.json` (NodeId + ClusterId + DB conn) per decision #18

**Acceptance**:

- `dotnet run` runs interactively (console mode)
- Installed as a Windows Service (`sc create OtOpcUa ...`), starts and stops cleanly
- Service install + uninstall cycle leaves no leftover state

#### Task D.2 — Bootstrap with credential-bound DB connection (decisions #46, #83)

On startup:

- Read `Cluster.NodeId` + `Cluster.ClusterId` + `ConfigDatabase.ConnectionString` from `appsettings.json`
- Connect to central DB with the configured principal (gMSA / SQL login / cert-mapped)
- Call `sp_GetCurrentGenerationForCluster(@NodeId, @ClusterId)` — the proc verifies the connected principal is bound to NodeId
- If proc rejects → fail startup loudly with the principal mismatch message

**Acceptance**:

- Test: principal bound to Node A boots successfully when configured with NodeId = A
- Test: principal bound to Node A configured with NodeId = B → startup fails with `Unauthorized` and the service does not stay running
- Test: principal bound to Node A in cluster C1 configured with ClusterId = C2 → `Forbidden`

#### Task D.3 — LiteDB cache fallback on DB outage

If the central DB is unreachable at startup, load the most recent cached generation from LiteDB and start with it. Log loudly. Continue retrying the central DB in the background; on reconnect, resume normal poll cycle.

**Acceptance**:

- Test: with central DB unreachable, node starts from cache, logs `ConfigDbUnreachableUsingCache` event, OPC UA endpoint serves the cached config
- Test: cache empty AND central DB unreachable → startup fails with `NoConfigAvailable` (decision #21)

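The startup decision described above (central DB first, then cache, then fail) can be sketched independently of SQL Server and LiteDB. The delegates, the `ConfigSource` enum, and the exception type are hypothetical, and the sketch assumes that any non-cancellation exception from the central fetch counts as "unreachable"; the background retry loop is left out.

```csharp
#nullable enable
using System;
using System.Threading;
using System.Threading.Tasks;

public enum ConfigSource { CentralDb, LocalCache }

public sealed class NoConfigAvailableException : Exception
{
    public NoConfigAvailableException(Exception inner)
        : base("NoConfigAvailable: central DB unreachable and local cache is empty.", inner) { }
}

public static class ConfigBootstrap
{
    public static async Task<(T Config, ConfigSource Source)> LoadAsync<T>(
        Func<CancellationToken, Task<T>> fetchCentral,
        Func<CancellationToken, Task<T?>> loadCache,
        Action<string> log,
        CancellationToken ct = default) where T : class
    {
        try
        {
            return (await fetchCentral(ct), ConfigSource.CentralDb);
        }
        catch (Exception ex) when (ex is not OperationCanceledException)
        {
            var cached = await loadCache(ct);
            if (cached is null)
                throw new NoConfigAvailableException(ex); // nothing to start from

            log("ConfigDbUnreachableUsingCache: " + ex.Message);
            return (cached, ConfigSource.LocalCache);
        }
    }
}
```

Returning the source alongside the config lets the caller decide whether to start the background reconnect loop.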
### Stream E — Admin project (2.5 weeks)

#### Task E.1 — Project scaffold mirroring ScadaLink CentralUI (decision #102)

Copy the project layout from `scadalink-design/src/ScadaLink.CentralUI/` (decision #104):

- `src/ZB.MOM.WW.OtOpcUa.Admin/`: Razor Components project, .NET 10, `AddInteractiveServerComponents`
  - `Auth/AuthEndpoints.cs`, `Auth/CookieAuthenticationStateProvider.cs`
  - `Components/Layout/MainLayout.razor`, `Components/Layout/NavMenu.razor`
  - `Components/Pages/Login.razor`, `Components/Pages/Dashboard.razor`
  - `Components/Shared/{DataTable, ConfirmDialog, LoadingSpinner, NotAuthorizedView, RedirectToLogin, TimestampDisplay, ToastNotification}.razor`
  - `EndpointExtensions.cs`, `ServiceCollectionExtensions.cs`

Plus `src/ZB.MOM.WW.OtOpcUa.Admin.Security/` (decision #104): `LdapAuthService`, `RoleMapper`, `JwtTokenService`, `AuthorizationPolicies` mirroring `ScadaLink.Security`.

**Acceptance**:

- App builds and runs locally
- Login page renders with OtOpcUa branding (only the `<h4>` text differs from ScadaLink)
- Visual diff between OtOpcUa and ScadaLink login pages: only the brand text differs (compliance check #3)

#### Task E.2 — Bootstrap LDAP + cookie auth + admin role mapping

Wire up `LdapAuthService` against the dev GLAuth instance per `Security.md`. Map LDAP groups to admin roles:

- `OtOpcUaAdmins` → `FleetAdmin`
- `OtOpcUaConfigEditors` → `ConfigEditor`
- `OtOpcUaViewers` → `ReadOnly`

Plus cluster-scoped grants per decision #105 (LDAP group `OtOpcUaConfigEditors-LINE3` → `ConfigEditor` + `ClusterId = LINE3-OPCUA` claim).

**Acceptance**:

- Login as a `FleetAdmin`-mapped user → redirected to `/`, sidebar shows admin sections
- Login as a `ReadOnly`-mapped user → redirected to `/`, sidebar shows view-only sections
- Login as a cluster-scoped `ConfigEditor` → only their permitted clusters appear in `/clusters`
- Login with bad credentials → redirected to `/login?error=...` with the LDAP error surfaced

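The group-to-role mapping, including the cluster-scoped variant, might be sketched as below. The three fleet-wide mappings come from the list above; the `RoleGrant` shape, the case-insensitive comparison, and the suffix lookup table (`LINE3` → `LINE3-OPCUA`) are assumptions, since the plan does not say how a group suffix resolves to a `ClusterId`.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

public enum AdminRole { ReadOnly, ConfigEditor, FleetAdmin }

// ClusterId == null means the grant is fleet-wide.
public sealed record RoleGrant(AdminRole Role, string? ClusterId);

// Hypothetical mapper; the production RoleMapper mirrors ScadaLink.Security.
public static class RoleMapperSketch
{
    private static readonly Dictionary<string, AdminRole> BaseGroups =
        new(StringComparer.OrdinalIgnoreCase)
        {
            ["OtOpcUaAdmins"] = AdminRole.FleetAdmin,
            ["OtOpcUaConfigEditors"] = AdminRole.ConfigEditor,
            ["OtOpcUaViewers"] = AdminRole.ReadOnly,
        };

    public static IReadOnlyList<RoleGrant> Map(
        IEnumerable<string> ldapGroups,
        IReadOnlyDictionary<string, string> clusterIdBySuffix)
    {
        var grants = new List<RoleGrant>();
        foreach (var group in ldapGroups)
        {
            if (BaseGroups.TryGetValue(group, out var role))
            {
                grants.Add(new RoleGrant(role, null));
                continue;
            }

            // Cluster-scoped form: "<BaseGroup>-<Suffix>".
            var dash = group.LastIndexOf('-');
            if (dash > 0
                && BaseGroups.TryGetValue(group[..dash], out var scopedRole)
                && clusterIdBySuffix.TryGetValue(group[(dash + 1)..], out var clusterId))
            {
                grants.Add(new RoleGrant(scopedRole, clusterId));
            }
            // Unknown groups grant nothing.
        }
        return grants;
    }
}
```

Each `RoleGrant` would then become a role claim, plus a `ClusterId` claim when the grant is scoped.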
#### Task E.3 — Cluster CRUD pages

Implement per `admin-ui.md`:

- `/clusters` — Cluster list (FleetAdmin sees all, ConfigEditor sees scoped)
- `/clusters/{ClusterId}` — Cluster Detail with all 9 tabs (Overview / Namespaces / UNS Structure / Drivers / Devices / Equipment / Tags / Generations / Audit), but Drivers/Devices/Equipment/Tags tabs initially show empty tables (no driver implementations yet — Phase 2+)
- "New cluster" workflow per `admin-ui.md` §"Add a new cluster" — creates cluster row, opens initial draft with default namespaces (decision #123)
- ApplicationUri auto-suggest on node create per decision #86

**Acceptance**:

- Create a cluster → cluster row exists, initial draft exists with Equipment-kind namespace
- Edit cluster name → change reflected in list + detail
- Disable a cluster → no longer offered as a target for new nodes; existing nodes keep showing in list with "Disabled" badge

#### Task E.4 — Draft → diff → publish workflow (decision #89)

Implement per `admin-ui.md` §"Draft Editor", §"Diff Viewer", §"Generation History":

- `/clusters/{Id}/draft` — full draft editor with auto-save (debounced 500ms per decision #97)
- `/clusters/{Id}/draft/diff` — three-column diff viewer
- `/clusters/{Id}/generations` — list of historical generations with rollback action
- Live `sp_ValidateDraft` invocation in the validation panel; publish disabled while errors exist
- Publish dialog requires Notes; runs `sp_PublishGeneration` in a transaction

**Acceptance**:

- Create draft → validation panel runs and shows clean state for empty draft
- Add an invalid Equipment row (bad UNS segment) → validation panel surfaces the error inline + publish stays disabled
- Fix the row → validation panel goes green + publish enables
- Publish → generation moves Draft → Published; previous Published moves to Superseded; audit log row created
- Roll back to a prior generation → new generation cloned from target; previous generation moves to Superseded; nodes pick up the new generation on next poll
- The "Push now" button per decision #96 is rendered but disabled with the "Available in v2.1" label

#### Task E.5 — UNS Structure + Equipment + Namespace tabs

Implement the three hybrid tabs:

- Namespaces tab — list with click-to-edit-in-draft
- UNS Structure tab — tree view with drag-drop reorganize, rename with live impact preview
- Equipment tab — list with default sort by ZTag, search across all 5 identifiers

CSV import for Equipment per the revised schema in `admin-ui.md` (no EquipmentId column; matches by EquipmentUuid for updates per decision #125).

**Acceptance**:

- Add a UnsArea via draft → publishes → appears in tree
- Drag a UnsLine to a different UnsArea → impact preview shows count of affected equipment + signals → publish moves it; UUIDs preserved
- Equipment CSV import: 10 new rows → all get system-generated EquipmentId + EquipmentUuid; ZTag uniqueness checked against `ExternalIdReservation` (decision #124)
- Equipment CSV import: 1 row with existing EquipmentUuid → updates the matched row's editable fields

#### Task E.6 — Generic JSON config editor for `DriverConfig`

Per decision #94 — until per-driver editors land in their respective phases, use a generic JSON editor with schema-driven validation against `DriverTypeRegistry`'s registered JSON schema for the driver type.

**Acceptance**:

- Add a Galaxy `DriverInstance` in a draft → JSON editor renders the Galaxy DriverConfig schema
- Editing produces live validation errors per the schema
- Saving with errors → publish stays disabled

#### Task E.7 — Real-time updates via SignalR (admin-ui.md §"Real-Time Updates")

Two SignalR hubs:

- `FleetStatusHub` — pushes `ClusterNodeGenerationState` changes
- `AlertHub` — pushes new sticky alerts (crash-loop circuit trips, failed applies)

Backend `IHostedService` polls every 5s and diffs.

**Acceptance**:

- Open Cluster Detail in two browser tabs → publish in tab A → tab B's "current generation" updates within 5s without page reload
- Simulate a `LastAppliedStatus = Failed` for a node → AlertHub pushes a sticky alert that doesn't auto-clear

#### Task E.8 — Release reservation + Merge equipment workflows

Per `admin-ui.md` §"Release an external-ID reservation" and §"Merge or rebind equipment":

- Release flow: FleetAdmin only, requires reason, audit-logged via `sp_ReleaseExternalIdReservation`
- Merge flow: opens a draft that disables source equipment, re-points tags, releases + re-reserves IDs

**Acceptance**:

- Release a reservation → `ReleasedAt` set in DB + audit log entry created with reason
- After release: same `(Kind, Value)` can be reserved by a different EquipmentUuid in a future publish
- Merge equipment A → B: draft preview shows tag re-pointing + ID re-reservation; publish executes atomically; A is disabled with `EquipmentMergedAway` audit entry

#### Task E.9 — ACLs tab + bulk-grant + permission simulator

Per `admin-ui.md` Cluster Detail tab #8 ("ACLs") and `acl-design.md` §"Admin UI":

- ACLs tab on Cluster Detail with two views ("By LDAP group" + "By scope")
- Edit grant flow: pick scope, group, permission bundle or per-flag, save to draft
- Bulk-grant flow: multi-select scope, group, permissions, preview rows that will be created, publish via draft
- Permission simulator: enter username + LDAP groups → live trie of effective permissions across the cluster's UNS tree
- Cluster-create workflow seeds the v1-compatibility default ACL set (per decision #131)
- Banner on Cluster Detail when the cluster's ACL set diverges from the seed

**Acceptance**:

- Add an ACL grant via draft → publishes → row in `NodeAcl` table; appears in both Admin views
- Bulk grant 10 LDAP groups × 1 permission set across 5 UnsAreas → preview shows 50 rows; publish creates them atomically
- Simulator: a user in `OtOpcUaReadOnly` group sees `ReadOnly` bundle effective at every node in the cluster
- Simulator: a user in `OtOpcUaWriteTune` sees `Engineer` bundle effective; `WriteConfigure` is denied
- Cluster-create workflow seeds 5 default ACL grants matching v1 LDAP roles (table in `acl-design.md` §"Default Permissions")
- Divergence banner appears when an operator removes any of the seeded grants

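The bulk-grant preview and the default-deny, additive evaluation can be illustrated with a small sketch. The flag names follow the ACL design summary, but the numeric values, the `NodeAclRow` shape, and the helper names are assumptions; bundle composition, Browse-implication on ancestors, and the per-session trie are defined in `acl-design.md` and are not reproduced here.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

[Flags]
public enum NodePermissions
{
    None = 0,
    Browse = 1 << 0, Read = 1 << 1, Subscribe = 1 << 2, HistoryRead = 1 << 3,
    WriteOperate = 1 << 4, WriteTune = 1 << 5, WriteConfigure = 1 << 6,
    AlarmRead = 1 << 7, AlarmAcknowledge = 1 << 8, AlarmConfirm = 1 << 9,
    AlarmShelve = 1 << 10, MethodCall = 1 << 11,
}

// Assumed row shape: one grant of a permission set to one LDAP group at one scope.
public sealed record NodeAclRow(string LdapGroup, string ScopeId, NodePermissions Permissions);

public static class AclSketch
{
    // Preview = cross product of selected groups and scopes (10 groups x 5 areas -> 50 rows).
    public static IReadOnlyList<NodeAclRow> PreviewBulkGrant(
        IEnumerable<string> groups, IEnumerable<string> scopeIds, NodePermissions permissions) =>
        (from g in groups
         from s in scopeIds
         select new NodeAclRow(g, s, permissions)).ToList();

    // Default-deny + additive: start from None and OR in every grant that matches
    // one of the user's groups at the node's own scope or any ancestor scope.
    public static NodePermissions Effective(
        IEnumerable<NodeAclRow> rows, ISet<string> userGroups, IReadOnlyCollection<string> scopePath) =>
        rows.Where(r => userGroups.Contains(r.LdapGroup) && scopePath.Contains(r.ScopeId))
            .Aggregate(NodePermissions.None, (acc, r) => acc | r.Permissions);
}
```

A flag is denied simply by never being granted, which is why the simulator can report `WriteConfigure` as denied without any explicit Deny rows.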
## Compliance Checks (run at exit gate)

A `phase-1-compliance.ps1` script that exits non-zero on any failure:

### Schema compliance
|
||
|
||
```powershell
|
||
# Run all migrations against a clean SQL Server instance
|
||
dotnet ef database update --project src/ZB.MOM.WW.OtOpcUa.Configuration --connection "Server=...;Database=OtOpcUaConfig_Test_$(date +%s);..."
|
||
|
||
# Run schema-introspection tests
|
||
dotnet test tests/ZB.MOM.WW.OtOpcUa.Configuration.Tests --filter "Category=SchemaCompliance"
|
||
```
|
||
|
||
Expected: every table, column, index, FK, CHECK, and stored procedure in `config-db-schema.md` is present and matches.
|
||
|
||
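One way to frame the schema-introspection assertion is as a set comparison between the objects the schema doc declares and the objects found in the live database. The sketch below assumes both sides have already been reduced to `(kind, name)` tuples; how they are extracted is left out, and `schema_drift` is a hypothetical helper name. It checks presence only; verifying that definitions match would need richer tuples (column types, nullability, and so on).

```python
def schema_drift(expected, actual):
    """Compare two sets of (kind, name) tuples and report both directions of drift."""
    return {
        "missing": sorted(expected - actual),      # documented but not in the database
        "unexpected": sorted(actual - expected),   # in the database but not documented
    }
```

A compliance test built this way would fail whenever either list is non-empty.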
### Decision compliance

```powershell
# For each decision number Phase 1 implements (#9, #14-22, #25, #28, #30, #32-33, #46-51, #79-125),
# verify at least one citation exists in source, tests, or migrations:
# (the range is appended with + because the comma operator binds tighter than ..)
$decisions = @(9, 14, 15, 16, 17, 18, 19, 20, 21, 22, 25, 28, 30, 32, 33, 46, 47, 48, 49, 50, 51) + (79..125)
foreach ($d in $decisions) {
    # -w matches whole words only, so "decision #9" does not also match "decision #99"
    $hits = git grep -w "decision #$d" -- 'src/' 'tests/' 'docs/v2/implementation/'
    if (-not $hits) { Write-Error "Decision #$d has no citation in code or tests"; exit 1 }
}
```

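The citation check has one subtle requirement: a search for `decision #9` must not be satisfied by `decision #99`. The Python sketch below illustrates that matching rule together with the range expansion from the comment in the script. `expand_decisions` and `uncited` are hypothetical helper names, and the input lines stand in for whatever the repository search returns.

```python
import re


def expand_decisions(spec):
    """Expand a spec such as '9, 14-22, 25' into a sorted list of decision numbers."""
    numbers = set()
    for part in spec.split(","):
        part = part.strip().lstrip("#")
        if "-" in part:
            low, high = part.split("-")
            numbers.update(range(int(low), int(high) + 1))
        else:
            numbers.add(int(part))
    return sorted(numbers)


def uncited(decisions, lines):
    """Return the decision numbers that no line cites as 'decision #N'."""
    missing = []
    for d in decisions:
        # (?!\d) rejects a longer number that merely starts with the same digits.
        pattern = re.compile(rf"decision #{d}(?!\d)")
        if not any(pattern.search(line) for line in lines):
            missing.append(d)
    return missing
```

Expanding the spec from the script comment this way gives 68 decision numbers, the same count as the explicit list.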
### Visual compliance (Admin UI)

Manual screenshot review:
1. Login page side-by-side with ScadaLink's `Login.razor` rendered
2. Sidebar + main layout side-by-side with ScadaLink's `MainLayout.razor` + `NavMenu.razor`
3. Dashboard side-by-side with ScadaLink's `Dashboard.razor`
4. Reconnect overlay triggered (kill the SignalR connection) — same modal as ScadaLink

Reviewer answers: "could the same operator move between apps without noticing?" Y/N. N = blocking.

### Behavioral compliance (end-to-end smoke test)

```bash
dotnet test tests/ZB.MOM.WW.OtOpcUa.IntegrationTests --filter "Category=Phase1Smoke"
```

The smoke test:
1. Spins up SQL Server in a container
2. Runs all migrations
3. Creates an `OtOpcUaAdmin` SQL principal + `OtOpcUaNode` principal bound to a test NodeId
4. Starts the Admin app
5. Creates a cluster + 1 node + Equipment-kind namespace via Admin API
6. Opens a draft, adds 1 UnsArea + 1 UnsLine + 1 Equipment + 0 tags (empty)
7. Publishes the draft
8. Boots a Server instance configured with the test NodeId
9. Asserts the Server fetched the published generation via `sp_GetCurrentGenerationForCluster`
10. Asserts the Server's `ClusterNodeGenerationState` row reports `Applied`
11. Adds a tag in a new draft, publishes
12. Asserts the Server picks up the new generation within 30s (next poll)
13. Rolls back to generation 1
14. Asserts the Server picks up the rollback within 30s

Expected: all 14 steps pass. Smoke test runs in CI on every PR to `v2/phase-1-*` branches.

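Steps 12 and 14 are "eventually true within 30s" assertions, which are usually written as a poll against a deadline rather than a fixed sleep. A minimal Python sketch of that pattern is below; `wait_until` is a hypothetical helper, the one-second interval is an assumption, and the clock and sleep functions are injectable so the helper itself can be tested without waiting.

```python
import time


def wait_until(condition, timeout_s=30.0, interval_s=1.0,
               clock=time.monotonic, sleep=time.sleep):
    """Poll `condition` until it returns True or `timeout_s` elapses.

    Returns True if the condition was met in time, False otherwise.
    """
    deadline = clock() + timeout_s
    while True:
        if condition():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

In the smoke test the condition would be a check that the Server reports the expected generation; a `False` result fails the step.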
### Stability compliance

For Phase 1 the only stability concern is the in-process driver isolation primitives (used later by Phase 3+ drivers, but built in Phase 1):
- `IDriver.Reinitialize()` semantics tested
- Driver-instance allocation tracking + cache flush tested with a mock driver
- Crash-loop circuit breaker tested with a mock driver that throws on every Reinitialize

Galaxy is still legacy in-process in Phase 1 — Tier C protections for Galaxy land in Phase 2.

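The crash-loop circuit breaker check can be pictured with a small model: a breaker that counts consecutive reinitialize failures and refuses further attempts past a threshold, exercised by a mock driver that always throws. The Python sketch below is illustrative only. The class names, the threshold of 3, and the absence of a time window or reset timer are all assumptions, not the behaviour specified for the real primitives.

```python
class CircuitOpenError(Exception):
    """Raised when the breaker refuses further reinitialize attempts."""


class CrashLoopBreaker:
    def __init__(self, max_failures=3):
        self.max_failures = max_failures
        self.failures = 0  # consecutive failures seen so far

    @property
    def is_open(self):
        return self.failures >= self.max_failures

    def reinitialize(self, driver):
        if self.is_open:
            raise CircuitOpenError("driver disabled after repeated reinitialize failures")
        try:
            driver.reinitialize()
        except Exception:
            self.failures += 1
            raise
        self.failures = 0  # a successful reinitialize resets the count


class AlwaysFailingDriver:
    """Mock driver that throws on every reinitialize and counts the calls."""

    def __init__(self):
        self.calls = 0

    def reinitialize(self):
        self.calls += 1
        raise RuntimeError("simulated driver fault")
```

A test against this model asserts that the mock driver stops being called once the breaker opens, however many further attempts are made.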
### Documentation compliance

```bash
# Every Phase 1 task in this doc must either be Done or have a deferral note in exit-gate-phase-1.md
# Every decision the phase implements must be reflected in plan.md (no silent decisions)
# Schema doc + admin-ui doc must be updated if implementation deviated
```

## Completion Checklist

The exit gate signs off only when **every** item below is checked. Each item links to the verifying artifact (test name, screenshot, log line, etc.).

### Stream A — Core.Abstractions
- [ ] All 11 capability interfaces defined and compiling
- [ ] `DriverAttributeInfo` + supporting enums defined
- [ ] `DriverTypeRegistry` implemented with Galaxy registration
- [ ] Interface-independence reflection test passes

### Stream B — Configuration
- [ ] EF Core migration `InitialSchema` applies cleanly to a clean SQL Server
- [ ] Schema introspection test asserts the live schema matches `config-db-schema.md`
- [ ] All stored procedures present and tested (happy path + error paths)
- [ ] `sp_PublishGeneration` concurrency test passes (one wins, one fails)
- [ ] Authorization tests pass (Node principal limited to its cluster, Admin can read/write fleet-wide)
- [ ] All 12 validation rules in `Configuration.Validation` have unit tests
- [ ] LiteDB cache round-trip + pruning + corruption tests pass
- [ ] Generation-diff applier handles add/remove/modify across all entity types

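The generation-diff applier item can be stated as a round-trip property: applying the diff between two generations to the older one must reproduce the newer one. A minimal Python sketch of that property follows, with each generation modeled as a dict keyed by logical ID. The dict model and the names `diff` and `apply_diff` are assumptions for illustration; the real applier works across typed entity tables.

```python
def diff(old, new):
    """Compute added, removed, and modified entities between two generations."""
    return {
        "add": {k: new[k] for k in new.keys() - old.keys()},
        "remove": sorted(old.keys() - new.keys()),
        "modify": {k: new[k] for k in old.keys() & new.keys() if old[k] != new[k]},
    }


def apply_diff(state, delta):
    """Return a new state with the delta applied; the input state is not mutated."""
    result = dict(state)
    for key in delta["remove"]:
        del result[key]
    result.update(delta["add"])
    result.update(delta["modify"])
    return result
```

A property-based test would generate random pairs of generations and assert `apply_diff(old, diff(old, new)) == new` for each pair.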
### Stream C — Core
- [ ] `LmxNodeManager` renamed to `GenericDriverNodeManager`; v1 IntegrationTests still pass
- [ ] `GalaxyNodeManager : GenericDriverNodeManager` exists in legacy Host
- [ ] `IAddressSpaceBuilder` API implemented; byte-equivalent OPC UA browse output to v1
- [ ] Driver hosting + isolation tested with mock drivers (one fails, others continue)
- [ ] Memory-budget cache-flush tested with mock driver

### Stream D — Server
- [ ] `Microsoft.Extensions.Hosting` host runs in console mode and as Windows Service
- [ ] TopShelf removed from the codebase
- [ ] Credential-bound bootstrap tested (correct principal succeeds; wrong principal fails)
- [ ] LiteDB fallback on DB outage tested

### Stream E — Admin
- [ ] Admin app boots, login screen renders with ScadaLink-equivalent visual
- [ ] LDAP cookie auth works against dev GLAuth
- [ ] Admin roles mapped (FleetAdmin / ConfigEditor / ReadOnly)
- [ ] Cluster-scoped grants work (decision #105)
- [ ] Cluster CRUD works end-to-end
- [ ] Draft → diff → publish workflow works end-to-end
- [ ] Rollback works end-to-end
- [ ] UNS Structure tab supports add / rename / drag-move with impact preview
- [ ] Equipment tab supports CSV import + search across 5 identifiers
- [ ] Generic JSON config editor renders + validates DriverConfig per registered schema
- [ ] SignalR real-time updates work (multi-tab test)
- [ ] Release reservation flow works + audit-logged
- [ ] Merge equipment flow works + audit-logged

### Cross-cutting
- [ ] `phase-1-compliance.ps1` runs and exits 0
- [ ] Smoke test (14 steps) passes in CI
- [ ] Visual compliance review signed off (operator-equivalence test)
- [ ] All decisions cited in code/tests (`git grep "decision #N"` returns hits for each)
- [ ] Adversarial review of the phase diff (`/codex:adversarial-review --base v2`) — findings closed or deferred with rationale
- [ ] PR opened against `v2`, includes: link to this doc, link to exit-gate record, compliance script output, smoke test logs, adversarial review output, screenshots
- [ ] Reviewer signoff (one reviewer beyond the implementation lead)
- [ ] `exit-gate-phase-1.md` recorded

## Risks and Mitigations

| Risk | Likelihood | Impact | Mitigation |
|------|:----------:|:------:|------------|
| EF Core 10 idiosyncrasies vs the documented schema | Medium | Medium | Schema-introspection test catches drift; validate early in Stream B |
| `sp_ValidateDraft` cross-table checks complex enough to be slow | Medium | Medium | Per-decision-cited test exists; benchmark with a large draft (1000+ tags) before exit |
| Visual parity with ScadaLink slips because two component libraries diverge over time | Low | Medium | Copy ScadaLink's CSS verbatim where possible; shared component set is structurally identical |
| LDAP integration breaks against production GLAuth (different schema than dev) | Medium | High | Use the v1 LDAP layer as the integration reference; mirror its config exactly |
| Generation-diff applier has subtle bugs on edge cases (renamed entity with same logical ID) | High | High | Property-based test that generates random diffs and asserts apply-then-rebuild produces the same end state |
| ScadaLink.Security pattern works well for site-scoped roles but our cluster-scoped grants are subtly different | Medium | Medium | Side-by-side review of `RoleMapper` after Stream E starts; refactor if claim shape diverges |
| Phase 1 takes longer than 6 weeks | High | Medium | Mid-gate review at 3 weeks — if Stream B isn't done, defer Stream E.5–8 to a Phase 1.5 follow-up |
| `MERGE` against `ExternalIdReservation` has a deadlock pathology under concurrent publishes | Medium | High | Concurrency test in Task B.2 specifically targets this; if it deadlocks, switch to `INSERT ... WHERE NOT EXISTS` with explicit row locks |

## Out of Scope (do not do in Phase 1)

- Galaxy out-of-process split (Phase 2)
- Any Modbus / AB / S7 / TwinCAT / FOCAS driver code (Phases 3–5)
- Per-driver custom config editors in Admin (each driver's phase)
- Equipment-class template integration with the schemas repo
- Consumer cutover (out of v2 scope, separate integration-team track per `implementation/overview.md`)
- Wiring the OPC UA NodeManager to enforce ACLs at runtime (Phase 2+ in each driver phase). Phase 1 ships the `NodeAcl` table + Admin UI ACL editing + evaluator unit tests; per-driver enforcement lands in each driver's phase per `acl-design.md` §"Implementation Plan"
- Push-from-DB notification (decision #96 — v2.1)
- Generation pruning operator UI (decision #93 — v2.1)
- Cluster-scoped admin grant editor in UI (admin-ui.md "Deferred / Out of Scope" — v2.1)
- Mobile / tablet layout