Add Phase 0 + Phase 1 detailed implementation plans under docs/v2/implementation/ with a phase-gate model so the work can be verified for compliance with the v2 design as it lands. Three-gate structure per phase (entry / mid / exit) with explicit compliance-check categories: schema compliance (live DB introspected against config-db-schema.md DDL via xUnit), decision compliance (every decision number cited in the phase doc must have at least one code/test citation in the codebase, verified via git grep), visual compliance (Admin UI screenshots reviewed side-by-side against ScadaLink CentralUI's equivalent screens), behavioral compliance (per-phase end-to-end smoke test that always passes at exit, never "known broken, fix later"), stability compliance (cross-cutting protections from driver-stability.md wired up and regression-tested for Tier C drivers), and documentation compliance (any deviation from v2 design docs reflected back as decision-log updates with explicit "supersedes" notes). Exit gate requires two-reviewer signoff and an exit-gate-{phase}.md record; silent deviation is the failure mode the gates exist to make impossible to ship. Phase 0 doc covers the mechanical LmxOpcUa → OtOpcUa rename with 9 tasks, 7 compliance checks, and a completion checklist that gates on baseline test count parity. 
Phase 1 doc covers the largest greenfield phase — 5 work streams (Core.Abstractions, Configuration project with EF Core schema + stored procs + LiteDB cache + generation-diff applier, Core with GenericDriverNodeManager rename + IAddressSpaceBuilder + driver isolation, Server with Microsoft.Extensions.Hosting replacing TopShelf + credential-bound bootstrap, Admin Blazor Server app mirroring ScadaLink CentralUI verbatim with LDAP cookie auth + draft/diff/publish workflow + UNS structure management + equipment CRUD + release-reservation and merge-equipment operator flows) — with task-level acceptance criteria, a 14-step end-to-end smoke test, and decision citation requirements for #1-125. New decisions #126-127 capture the gate model and per-phase doc structure. Cross-references added to plan.md Reference Documents section.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Joseph Doherty
2026-04-17 11:25:09 -04:00
parent a59ad2e0c6
commit 592fa79e3c
4 changed files with 1062 additions and 0 deletions


@@ -0,0 +1,180 @@
# Implementation Plan Overview — OtOpcUa v2
> **Status**: DRAFT — defines the gate structure, compliance check approach, and deliverable conventions used across all phase implementation plans (`phase-0-*.md`, `phase-1-*.md`, etc.).
>
> **Branch**: `v2`
> **Created**: 2026-04-17
## Purpose
Each phase of the v2 build (`plan.md` §6 Migration Strategy) gets a dedicated detailed implementation doc in this folder. This overview defines the structure those docs follow so reviewers can verify compliance with the v2 design without re-reading every artifact.
## Phase Gate Structure
Every phase has **three gates** the work must pass through:
```
        ┌──────────┐        ┌──────────┐            ┌──────────┐
START ──┤  ENTRY   │── do ──┤   MID    │── verify ──┤   EXIT   │── PHASE COMPLETE
        │   GATE   │  work  │   GATE   │ artifacts  │   GATE   │
        └──────────┘        └──────────┘            └──────────┘
```
### Entry gate
**Purpose**: ensures the phase starts from a known-good state with all prerequisites met, so work never begins on top of broken foundations.
**Checked before any phase work begins**:
- Prior phase has cleared its **exit gate** (or this is Phase 0)
- Working tree is clean on the appropriate branch
- All baseline tests for the prior phase still pass
- Any external dependencies the phase needs are confirmed available
- Implementation lead has read the phase doc and the relevant sections of `plan.md`, `config-db-schema.md`, `driver-specs.md`, `driver-stability.md`, `admin-ui.md`
**Evidence captured**: a short markdown file `entry-gate-{phase}.md` recording the date, signoff, baseline test pass, and any deviations noted.
### Mid gate
**Purpose**: course-correct partway through the phase. Catches drift before it compounds. Optional for phases ≤ 2 weeks; required for longer phases.
**Checked at the midpoint**:
- Are the highest-risk deliverables landing on schedule?
- Have any new design questions surfaced that the v2 docs don't answer? If so, escalate to plan revision before continuing.
- Are tests being written alongside code, or accumulating as a backlog?
- Has any decision (`plan.md` decision log) been silently violated by the implementation? If so, either revise the implementation or revise the decision (with explicit "supersedes" entry).
**Evidence captured**: short status update appended to the phase doc.
### Exit gate
**Purpose**: ensures the phase actually achieved what the v2 design specified, not just "the code compiles". This is where compliance verification happens.
**Checked before the phase is declared complete**:
- All **acceptance criteria** for every task in the phase doc are met (each criterion has explicit evidence)
- All **compliance checks** (see below) pass
- All **completion checklist** items are ticked, with links to the verifying artifact (test, screenshot, log line, etc.)
- Phase commit history is clean (no half-merged WIP, no skipped hooks)
- Documentation updates merged: any change in approach during the phase is reflected back in the v2 design docs (`plan.md` decision log gets new entries; `config-db-schema.md` updated if schema differed from spec; etc.)
- Adversarial review run on the phase output (`/codex:adversarial-review` or equivalent) — findings closed or explicitly deferred with rationale
- Implementation lead **and** one other reviewer sign off
**Evidence captured**: `exit-gate-{phase}.md` recording all of the above with links and signatures.
## Compliance Check Categories
Phase exit gates run compliance checks across these axes. Each phase doc enumerates the specific checks for that phase under "Compliance Checks".
### 1. Schema compliance (Phase 1+)
For phases that touch the central config DB:
- Run EF Core migrations against a clean SQL Server instance
- Diff the resulting schema against the DDL in `config-db-schema.md`:
  - Table list matches
  - Column types and nullability match
  - Indexes (regular + unique + filtered) match
  - CHECK constraints match
  - Foreign keys match
  - Stored procedures present and signatures match
- Any drift = blocking. Either fix the migration or update the schema doc with explicit reasoning, then re-run.
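The table-list bullet is the easiest of these to script outside a test project. The bash sketch below (function names hypothetical) assumes the DDL in `config-db-schema.md` declares tables as `CREATE TABLE dbo.Name (` or `CREATE TABLE [dbo].[Name] (`, and that the live table list is dumped one name per line. It does not cover columns, indexes, constraints, or stored procedures.

```bash
#!/usr/bin/env bash
# Sketch: table-list drift between config-db-schema.md and a migrated database.
# Assumes DDL lines like "CREATE TABLE dbo.Name (" or "CREATE TABLE [dbo].[Name] (".

# Print the table names declared in the DDL file, sorted and de-duplicated.
ddl_tables() {
  grep -oiE 'CREATE[[:space:]]+TABLE[[:space:]]+[^[:space:](]+' "$1" \
    | awk '{ print $3 }' \
    | sed -E 's/[][]//g; s/^.*\.//' \
    | LC_ALL=C sort -u
}

# Compare expected vs actual table lists (one name per line) and report drift.
# Returns non-zero when any table is missing from either side.
compare_tables() {
  local exp act missing extra
  exp=$(LC_ALL=C sort -u "$1")
  act=$(LC_ALL=C sort -u "$2")
  missing=$(LC_ALL=C comm -23 <(printf '%s\n' "$exp") <(printf '%s\n' "$act"))
  extra=$(LC_ALL=C comm -13 <(printf '%s\n' "$exp") <(printf '%s\n' "$act"))
  [ -n "$missing" ] && printf 'missing from DB: %s\n' $missing
  [ -n "$extra" ] && printf 'not in DDL:      %s\n' $extra
  [ -z "$missing" ] && [ -z "$extra" ]
}
```

For example, dump the migrated database with `sqlcmd -S <server> -d OtOpcUaConfig -h -1 -W -Q "SET NOCOUNT ON; SELECT name FROM sys.tables WHERE name <> '__EFMigrationsHistory'" > db-tables.txt`, then run `compare_tables <(ddl_tables docs/v2/config-db-schema.md) db-tables.txt`. EF Core's `__EFMigrationsHistory` table is filtered out because it is not part of the designed schema.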
### 2. Decision compliance
For each decision number cited in the phase doc (`#XX` references to `plan.md` decision log):
- Locate the artifact (code module, test, configuration file) that demonstrates the decision is honored
- Add a code comment or test name that cites the decision number
- Phase exit gate uses a script (or grep) to verify every cited decision has at least one citation in the codebase
This makes the decision log a **load-bearing reference**, not a historical record.
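A minimal bash sketch of that check follows. It assumes, beyond what the plan states, that both the phase doc and the code write citations as `decision #N` (for example `// decision #30`); only the first number of a list such as `decisions #112, #115` is picked up. It uses `grep -r` over explicit paths so it also runs outside a checkout, where inside the repo `git grep` is the natural equivalent. Function names are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch of the decision-compliance check. Assumed citation form: "decision #N".

# List decision numbers cited in a phase doc, numerically sorted and unique.
cited_decisions() {
  grep -oE '[Dd]ecisions? #[0-9]+' "$1" | grep -oE '[0-9]+' | sort -nu
}

# Fail if any decision cited in the doc ($1) has no citation under paths $2...
check_decision_citations() {
  local doc="$1"; shift
  local n missing=0
  for n in $(cited_decisions "$doc"); do
    # "([^0-9]|$)" stops "#9" from also matching "#91".
    if ! grep -rqE "[Dd]ecision #${n}([^0-9]|\$)" "$@"; then
      echo "decision #$n: no citation found"
      missing=1
    fi
  done
  return "$missing"
}
```

Typical use: `check_decision_citations docs/v2/implementation/phase-1-configuration-and-admin-scaffold.md src tests`. The digit boundary matters because decision numbers run past 100.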
### 3. Visual compliance (Admin UI phases)
For phases that touch the Admin UI:
- Side-by-side screenshots of equivalent ScadaLink CentralUI screens vs the new OtOpcUa Admin screens
- Login page, sidebar, dashboard, generic forms — must visually match per `admin-ui.md` §"Visual Design — Direct Parity with ScadaLink"
- Reviewer signoff: "could the same operator move between apps without noticing?"
### 4. Behavioral compliance (end-to-end smoke tests)
For each phase, an integration test exercises the new capability end-to-end:
- Phase 0: existing v1 IntegrationTests pass under the renamed projects
- Phase 1: create a cluster → publish a generation → node fetches the generation → roll back → fetch again
- Phase 2: v1 IntegrationTests parity suite passes against the v2 Galaxy.Host (per decision #56)
- Phase 3+: per-driver smoke test against the simulator
Smoke tests are **always green at exit**, never "known broken, fix later".
### 5. Stability compliance (Phase 2+ for Tier C drivers)
For phases that introduce Tier C drivers (Galaxy in Phase 2, FOCAS in Phase 5):
- All `Driver Stability & Isolation` cross-cutting protections from `driver-stability.md` §"Cross-Cutting Protections" are wired up:
  - SafeHandle wrappers exist for every native handle
  - Memory watchdog runs and triggers recycle on threshold breach (testable via FaultShim)
  - Crash-loop circuit breaker fires after 3 crashes / 5 min (testable via stub-injected crash)
  - Heartbeat between proxy and host functions; missed heartbeats trigger respawn
  - Post-mortem MMF survives a hard process kill and the supervisor reads it on respawn
- Each protection has a regression test in the driver's test suite
### 6. Documentation compliance
For every phase:
- Any deviation from the v2 design docs (`plan.md`, `config-db-schema.md`, `admin-ui.md`, `driver-specs.md`, `driver-stability.md`, `test-data-sources.md`) is reflected back in the docs
- New decisions added to the decision log with rationale
- Old decisions superseded explicitly (not silently)
- Cross-references between docs stay current
## Deliverable Types
Each phase produces a defined set of deliverables. The phase doc enumerates which deliverables apply.
| Type | Format | Purpose |
|------|--------|---------|
| **Code** | Source files committed to a feature branch, merged to `v2` after exit gate | The implementation itself |
| **Tests** | xUnit unit + integration tests; per-phase smoke tests | Behavioral evidence |
| **Migrations** | EF Core migrations under `Configuration/Migrations/` | Schema delta |
| **Decision-log entries** | New rows appended to `plan.md` decision table | Architectural choices made during the phase |
| **Doc updates** | Edits to existing v2 docs | Keep design and implementation aligned |
| **Gate records** | `entry-gate-{phase}.md`, `exit-gate-{phase}.md` in this folder | Audit trail of gate clearance |
| **Compliance script** | Per-phase shell or PowerShell script that runs the compliance checks | Repeatable verification |
| **Adversarial review** | `/codex:adversarial-review` output on the phase diff | Independent challenge |
## Branch and PR Conventions
| Branch | Purpose |
|--------|---------|
| `v2` | Long-running design + implementation branch. All phase work merges here. |
| `v2/phase-{N}-{slug}` | Per-phase feature branch (e.g. `v2/phase-0-rename`) |
| `v2/phase-{N}-{slug}-{subtask}` | Per-subtask branches when the phase is large enough to warrant them |
Each phase merges to `v2` via PR after the exit gate clears. PRs include:
- Link to the phase implementation doc
- Link to the exit-gate record
- Compliance-script output
- Adversarial-review output
- Reviewer signoffs
The `master` branch stays at v1 production state until all phases are complete and a separate v2 release decision is made.
## What Counts as "Following the Plan"
The implementation **follows the plan** when, at every phase exit gate:
1. Every task listed in the phase doc has been done OR explicitly deferred with rationale
2. Every compliance check has a passing artifact OR an explicit deviation note signed off by the reviewer
3. The codebase contains traceable references to every decision number the phase implements
4. The v2 design docs are updated to reflect any approach changes
5. The smoke test for the phase passes
6. Two people have signed off — implementation lead + one other reviewer
The implementation **deviates from the plan** when any of those conditions fails. Deviations are not failures; they are signals to update the plan or revise the implementation. The unrecoverable failure mode is **silent deviation** — code that doesn't match the plan, with no decision-log update explaining why. The exit gate's compliance checks exist specifically to make silent deviation impossible to ship.
## Phase Implementation Docs
| Phase | Doc | Status |
|-------|-----|--------|
| 0 | [`phase-0-rename-and-net10.md`](phase-0-rename-and-net10.md) | DRAFT |
| 1 | [`phase-1-configuration-and-admin-scaffold.md`](phase-1-configuration-and-admin-scaffold.md) | DRAFT |
| 2 | (Phase 2: Galaxy parity refactor — TBD) | NOT STARTED |
| 3 | (Phase 3: Modbus TCP driver — TBD) | NOT STARTED |
| 4 | (Phase 4: PLC drivers AB CIP / AB Legacy / S7 / TwinCAT — TBD) | NOT STARTED |
| 5 | (Phase 5: Specialty drivers FOCAS / OPC UA Client — TBD) | NOT STARTED |
| 6+ | (Phases 6–8: tier 1/2/3 consumer cutover — separate planning track per corrections doc C5) | NOT SCOPED |


@@ -0,0 +1,269 @@
# Phase 0 — Rename to OtOpcUa + .NET 10 Cleanup
> **Status**: DRAFT — implementation plan for Phase 0 of the v2 build (`plan.md` §6).
>
> **Branch**: `v2/phase-0-rename`
> **Estimated duration**: 3–5 working days
> **Predecessor**: none (first phase)
> **Successor**: Phase 1 (`phase-1-configuration-and-admin-scaffold.md`)
## Phase Objective
Mechanically rename the existing v1 codebase from `LmxOpcUa` to `OtOpcUa` and verify all existing v1 tests still pass under the new names. **No new functionality**, **no .NET 10 retargeting of `Host` or `Historian.Aveva`** (those move in Phase 2 with the Galaxy split — they need to stay on .NET 4.8 because of MXAccess and Wonderware Historian SDK dependencies). All other projects are already on .NET 10 and stay there.
The phase exists as a clean checkpoint: future PRs reference `OtOpcUa` consistently, the rename is not entangled with semantic changes, and the diff is mechanical enough to review safely.
## Scope — What Changes
| Concern | Change |
|---------|--------|
| Project names | `ZB.MOM.WW.LmxOpcUa.*` → `ZB.MOM.WW.OtOpcUa.*` (all 11 projects) |
| Solution file | `ZB.MOM.WW.LmxOpcUa.slnx` → `ZB.MOM.WW.OtOpcUa.slnx` |
| Namespaces | `ZB.MOM.WW.LmxOpcUa` root → `ZB.MOM.WW.OtOpcUa` root (all source files) |
| Assembly names | `<AssemblyName>` and `<RootNamespace>` in every csproj |
| Folder names | `src/ZB.MOM.WW.LmxOpcUa.*` → `src/ZB.MOM.WW.OtOpcUa.*`; same in `tests/` |
| Default `appsettings.json` keys | `Lmx*` → `Ot*` only where the section name is product-bound (e.g. `LmxOpcUa.Server` → `OtOpcUa.Server`); leave `MxAccess.*` keys alone (those refer to the AVEVA product, not ours) |
| Service registration name | TopShelf service name `LmxOpcUa` → `OtOpcUa` (until Phase 1 swaps TopShelf for `Microsoft.Extensions.Hosting`) |
| Documentation | All `docs/*.md` references; `CLAUDE.md` |
| Repo name | **NOT** in scope for Phase 0 — repo rename happens in a separate ops step after exit gate clears |
## Scope — What Does NOT Change
| Item | Reason |
|------|--------|
| `.NET Framework 4.8` target on `Host` and `Historian.Aveva` | MXAccess COM is 32-bit only; Wonderware Historian SDK is .NET 4.8. Both move to `Galaxy.Host` (still .NET 4.8 x86) in Phase 2. |
| `.NET 10` target on Client.CLI / Client.Shared / Client.UI / all Tests | Already there (verified 2026-04-17 via `grep TargetFramework src/*/*.csproj`). |
| Project structure (no new projects) | New projects (Configuration, Core, Core.Abstractions, Server, Admin) are added in Phase 1, not Phase 0. |
| Galaxy / MXAccess implementation | Stays in `OtOpcUa.Host` for now; Phase 2 splits it into Proxy/Host/Shared. |
| `master` branch / production deployments | Untouched — v2 work all happens on the `v2` branch. |
| OPC UA `ApplicationUri` defaults | Currently include `LmxOpcUa` — leave as-is to avoid breaking existing client trust during v1/v2 coexistence. New `ApplicationUri` defaults land in Phase 1 alongside the cluster model. |
| MxAccess product references in docs / code | "MxAccess" is AVEVA's product name, not ours. Stays. |
## Entry Gate Checklist
Verify all before opening the `v2/phase-0-rename` branch:
- [ ] `v2` branch is at commit `a59ad2e` or later (decisions #1–125 captured)
- [ ] `git status` is clean on `v2`
- [ ] `dotnet test ZB.MOM.WW.LmxOpcUa.slnx` passes locally with **zero failing tests**, baseline test count recorded
- [ ] `dotnet build ZB.MOM.WW.LmxOpcUa.slnx` succeeds with zero errors and ≤ baseline warning count
- [ ] All design docs reviewed by the implementation lead: `docs/v2/plan.md`, `docs/v2/config-db-schema.md`, `docs/v2/admin-ui.md`, `docs/v2/driver-specs.md`, `docs/v2/driver-stability.md`, `docs/v2/implementation/overview.md`
- [ ] Decision #9 (rename to OtOpcUa as step 1) re-read and confirmed
- [ ] No other developers have open work on `v2` that would conflict with bulk renames
**Evidence file**: `docs/v2/implementation/entry-gate-phase-0.md` recording date, baseline test count, signoff name.
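The branch, clean-tree, and commit items in the checklist above are mechanical, so they can be scripted. A bash sketch using `git rev-parse`, `git status --porcelain`, and `git merge-base --is-ancestor`; the function name is hypothetical, and the baseline build and test numbers still have to be captured separately.

```bash
#!/usr/bin/env bash
# Sketch: scriptable part of the Phase 0 entry gate.
#   $1 = commit that must already be in history (e.g. a59ad2e)
#   $2 = branch that must be checked out (e.g. v2)
check_entry_gate() {
  local required="$1" branch="$2" rc=0
  if [ "$(git rev-parse --abbrev-ref HEAD)" != "$branch" ]; then
    echo "not on branch $branch"; rc=1
  fi
  # Any tracked change or untracked file means the tree is not clean.
  if [ -n "$(git status --porcelain)" ]; then
    echo "working tree is not clean"; rc=1
  fi
  if ! git merge-base --is-ancestor "$required" HEAD 2> /dev/null; then
    echo "HEAD does not contain $required"; rc=1
  fi
  return "$rc"
}
```

For Phase 0 this would be run as `check_entry_gate a59ad2e v2` from the repository root.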
## Task Breakdown
### Task 0.1 — Inventory references
Generate a complete map of every place `LmxOpcUa` appears:
```bash
grep -rln "LmxOpcUa" --include="*.cs" --include="*.csproj" --include="*.slnx" --include="*.json" --include="*.md" --include="*.razor" .
```
Save the result to `docs/v2/implementation/phase-0-rename-inventory.md` (gitignored after phase completes).
**Acceptance**:
- Inventory file exists, lists every reference grouped by file type
- Reviewer agrees inventory is complete (cross-check against `git grep -i lmx` for case-sensitivity bugs)
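A bash sketch for producing the grouped inventory, with string literals listed separately as the Risks table recommends. The markdown layout of the output and the function names are assumptions; the file-type filter is the one Task 0.1 already uses.

```bash
#!/usr/bin/env bash
# Sketch: Task 0.1 inventory of LmxOpcUa references, grouped by file extension.
inventory_by_type() {
  local root="${1:-.}"
  grep -rl "LmxOpcUa" --include="*.cs" --include="*.csproj" --include="*.slnx" \
       --include="*.json" --include="*.md" --include="*.razor" "$root" \
    | awk -F. '{ print $NF "\t" $0 }' \
    | LC_ALL=C sort \
    | awk -F'\t' '$1 != prev { printf "\n### .%s\n", $1; prev = $1 }
                  { printf "- `%s`\n", $2 }'
}

# List C# lines where LmxOpcUa sits inside a string literal: these may be
# runtime identifiers and should be renamed deliberately, not by bulk sed.
string_literal_refs() {
  grep -rnE '"[^"]*LmxOpcUa[^"]*"' --include="*.cs" "${1:-.}"
}
```

For example, `inventory_by_type . > docs/v2/implementation/phase-0-rename-inventory.md`, with the output of `string_literal_refs .` reviewed on its own.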
### Task 0.2 — Rename project folders
Per project (11 projects total — 5 src + 6 tests):
```bash
git mv src/ZB.MOM.WW.LmxOpcUa.Client.CLI src/ZB.MOM.WW.OtOpcUa.Client.CLI
git mv src/ZB.MOM.WW.OtOpcUa.Client.CLI/ZB.MOM.WW.LmxOpcUa.Client.CLI.csproj \
src/ZB.MOM.WW.OtOpcUa.Client.CLI/ZB.MOM.WW.OtOpcUa.Client.CLI.csproj
```
Repeat for: `Client.Shared`, `Client.UI`, `Historian.Aveva`, `Host`, and all 6 test projects.
Use `git mv` (not `mv` + `git rm`/`git add`) to preserve history.
**Acceptance**:
- `ls src/` shows only `ZB.MOM.WW.OtOpcUa.*` folders
- `ls tests/` shows only `ZB.MOM.WW.OtOpcUa.*` folders
- `git log --follow` on a renamed file shows continuous history pre-rename
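Since the test project names are not listed here, a loop over the existing folders avoids typing 11 pairs of `git mv` commands by hand. A bash sketch, assuming every project folder contains a csproj with the same name as the folder (as in the `Client.CLI` example above); the function name is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: history-preserving rename of every LmxOpcUa project folder + csproj.
# Assumes each folder holds a csproj named after the folder.
rename_all_projects() {
  local old="ZB.MOM.WW.LmxOpcUa" new="ZB.MOM.WW.OtOpcUa" path dir name
  for path in src/"$old".* tests/"$old".*; do
    [ -d "$path" ] || continue            # glob matched nothing
    dir=${path%%/*}                       # "src" or "tests"
    name=${path#*/"$old".}                # e.g. "Client.CLI"
    git mv "$path" "$dir/$new.$name" || return 1
    git mv "$dir/$new.$name/$old.$name.csproj" \
           "$dir/$new.$name/$new.$name.csproj" || return 1
  done
}
```

Run it from the repository root, then check the result with `git status --short` before committing.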
### Task 0.3 — Rename solution file
```bash
git mv ZB.MOM.WW.LmxOpcUa.slnx ZB.MOM.WW.OtOpcUa.slnx
```
Edit the `.slnx` to update every project path reference inside it.
**Acceptance**:
- `ZB.MOM.WW.OtOpcUa.slnx` exists and references the renamed project paths
- `dotnet sln list` (or `dotnet build` against the slnx) succeeds
### Task 0.4 — Update csproj contents
For every csproj:
- Update `<AssemblyName>` if explicitly set
- Update `<RootNamespace>` if explicitly set
- Update `<ProjectReference Include=...>` paths for inter-project refs
- Update `<PackageId>` if any project ships as a NuGet (none currently expected, but verify)
**Acceptance**:
- `grep -rl "LmxOpcUa" src/*/*.csproj tests/*/*.csproj` returns empty
- `dotnet restore` succeeds with no missing project references
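Tasks 0.3 and 0.4 both reduce to rewriting the namespaced prefix inside project files. A bash sketch, assuming GNU `sed -i` (as Task 0.5 does) and that references carry the full `ZB.MOM.WW.LmxOpcUa` prefix. An unprefixed value such as a bare `LmxOpcUa` in `<AssemblyName>` survives it, which is exactly what the acceptance `grep` is there to catch. The function name is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: rewrite project paths and names inside the .slnx and every csproj.
# Only the full "ZB.MOM.WW.LmxOpcUa" prefix is replaced (GNU sed -i assumed).
update_project_files() {
  local f
  for f in ./*.slnx src/*/*.csproj tests/*/*.csproj; do
    [ -f "$f" ] || continue   # skip globs that matched nothing
    sed -i 's/ZB\.MOM\.WW\.LmxOpcUa/ZB.MOM.WW.OtOpcUa/g' "$f"
  done
}
```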
### Task 0.5 — Bulk-rename namespaces in source files
Run the rename across all `.cs` and `.razor` files:
```bash
grep -rlZ "ZB.MOM.WW.LmxOpcUa" --include="*.cs" --include="*.razor" . \
  | xargs -0 -r sed -i 's/ZB\.MOM\.WW\.LmxOpcUa/ZB.MOM.WW.OtOpcUa/g'
```
**Acceptance**:
- `grep -rln "ZB.MOM.WW.LmxOpcUa" --include="*.cs" --include="*.razor" .` returns empty
- `dotnet build ZB.MOM.WW.OtOpcUa.slnx` succeeds
### Task 0.6 — Update appsettings.json + service hosting
In `src/ZB.MOM.WW.OtOpcUa.Host/appsettings.json` and equivalents:
- Rename product-named sections: `LmxOpcUa.Server` → `OtOpcUa.Server` (if present)
- Leave `MxAccess`, `Galaxy`, `Historian` keys untouched (those are external product names)
- Update TopShelf `ServiceName` constant from `LmxOpcUa` to `OtOpcUa`
**Acceptance**:
- Service install (`dotnet run --project src/.../Host install`) registers as `OtOpcUa`
- Service uninstall + reinstall cycle succeeds on a Windows test box
### Task 0.7 — Update documentation references
- `CLAUDE.md`: replace `LmxOpcUa` references with `OtOpcUa` in product-naming contexts; leave `MxAccess` / `MXAccess` references alone
- `docs/*.md` (existing v1 docs): same pattern
- `docs/v2/*.md`: already uses `OtOpcUa` — verify with grep
**Acceptance**:
- `grep -rln "LmxOpcUa" docs/ CLAUDE.md` returns only references that explicitly need to retain the old name (e.g. historical sections, change log)
- Each retained reference has a comment explaining why
### Task 0.8 — Run full test suite + smoke test
```bash
dotnet build ZB.MOM.WW.OtOpcUa.slnx
dotnet test ZB.MOM.WW.OtOpcUa.slnx
```
Plus manual smoke test of Client.CLI against a running v1 OPC UA server:
```bash
dotnet run --project src/ZB.MOM.WW.OtOpcUa.Client.CLI -- connect -u opc.tcp://localhost:4840
dotnet run --project src/ZB.MOM.WW.OtOpcUa.Client.CLI -- browse -u opc.tcp://localhost:4840 -r -d 2
```
**Acceptance**:
- Test count matches the baseline recorded at entry gate; **zero failing tests**
- Smoke test produces equivalent output to baseline (capture both, diff)
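Comparing the run against the entry-gate baseline is easy to automate once the `dotnet test` output is saved to a log. The sketch assumes the per-assembly summary line that VSTest-based `dotnet test` prints (`Passed!  - Failed: 0, Passed: 40, Skipped: 2, Total: 42, ...`); other test runners may format it differently, and the function names are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: compare a saved `dotnet test` log against the entry-gate baseline.
# Assumes per-assembly summary lines of the form
#   Passed!  - Failed:     0, Passed:    40, Skipped:     2, Total:    42, Duration: ...

# Print "total passed failed skipped" summed over all test assemblies.
summarize_test_log() {
  awk '/(Passed|Failed)!  *- Failed:/ {
         for (i = 1; i < NF; i++) {
           # A field such as "40," converts to 40 when used as a number.
           if ($i == "Failed:")  f += $(i + 1)
           if ($i == "Passed:")  p += $(i + 1)
           if ($i == "Skipped:") s += $(i + 1)
           if ($i == "Total:")   t += $(i + 1)
         }
       }
       END { printf "%d %d %d %d\n", t, p, f, s }' "$1"
}

# Succeed only if the total equals baseline $2, nothing failed, and (when
# baseline $3 is given) the skipped count equals it too.
check_against_baseline() {
  local total passed failed skipped
  read -r total passed failed skipped <<< "$(summarize_test_log "$1")"
  echo "total=$total passed=$passed failed=$failed skipped=$skipped"
  [ "$total" -eq "$2" ] && [ "$failed" -eq 0 ] && [ "$skipped" -eq "${3:-$skipped}" ]
}
```

Usage: `dotnet test ZB.MOM.WW.OtOpcUa.slnx | tee test.log`, then `check_against_baseline test.log <baseline-total> <baseline-skipped>` with the numbers recorded in `entry-gate-phase-0.md`.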
### Task 0.9 — Update build commands in CLAUDE.md
The Build Commands section currently references `ZB.MOM.WW.LmxOpcUa.slnx`. Update to `ZB.MOM.WW.OtOpcUa.slnx`. Also update test paths.
**Acceptance**:
- `cat CLAUDE.md | grep -i lmxopcua` returns only retained-by-design references
- A new developer cloning the repo can follow CLAUDE.md to build + test successfully
## Compliance Checks (run at exit gate)
A `phase-0-compliance.ps1` (or `.sh`) script runs all these and exits non-zero on any failure:
1. **No stale `LmxOpcUa` references**:
```
grep -rln "LmxOpcUa" --include="*.cs" --include="*.csproj" --include="*.slnx" \
--include="*.json" --include="*.razor" . | wc -l
```
Expected: 0 (or only allowlisted retained references)
2. **All projects build**:
```
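dotnet build ZB.MOM.WW.OtOpcUa.slnx
```
Expected: success with zero errors; warning count ≤ baseline. (`--warnaserror` is deliberately not used: it would fail the build on any pre-existing warning, contradicting a non-zero baseline.)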
3. **All tests pass**:
```
dotnet test ZB.MOM.WW.OtOpcUa.slnx
```
Expected: total count = baseline, failures = 0
4. **Solution structure matches plan**:
- `ls src/` shows exactly: `ZB.MOM.WW.OtOpcUa.{Client.CLI, Client.Shared, Client.UI, Historian.Aveva, Host}` (5 entries)
- `ls tests/` shows the 6 test projects similarly renamed
- No new projects yet (those land in Phase 1)
5. **.NET targets unchanged**:
- Client projects (CLI/Shared/UI): `net10.0`
- Host + Historian.Aveva: `net48` (split + retarget happens Phase 2)
- All test projects: same targets as their SUT projects
6. **Decision compliance**: this phase implements decision #9 ("Rename to OtOpcUa as step 1"). Verify by:
```
grep -rlE '[Dd]ecision #9([^0-9]|$)' src/ tests/ docs/ CLAUDE.md
```
Expected: at least one citation in CLAUDE.md or a phase-rename README explaining the mechanical scope. The `([^0-9]|$)` suffix keeps `#9` from also matching `#90`–`#99`.
7. **Service registration works**:
- Install service → `sc query OtOpcUa` returns the service
- Uninstall service → `sc query OtOpcUa` fails with error 1060 ("The specified service does not exist as an installed service")
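A minimal shape for the compliance script, shown in bash (the plan allows `.sh` or `.ps1`): each check is a function, and the runner prints PASS/FAIL for every check and returns non-zero if any failed. Only checks 1, 4, and 5 are sketched, the allowlist for retained references is omitted, and all function names are hypothetical. Check 5 reads `<TargetFramework>` with a regex rather than evaluating MSBuild, so a target set in `Directory.Build.props` would not be seen.

```bash
#!/usr/bin/env bash
# Sketch of phase-0-compliance.sh: run each check, report all, fail if any fail.
run_checks() {
  local check failed=0
  for check in "$@"; do
    if "$check" > /dev/null 2>&1; then
      printf 'PASS  %s\n' "$check"
    else
      printf 'FAIL  %s\n' "$check"
      failed=1
    fi
  done
  return "$failed"
}

# Check 1: no stale LmxOpcUa references (allowlist omitted in this sketch).
check_no_stale_refs() {
  ! grep -rlq "LmxOpcUa" --include="*.cs" --include="*.csproj" --include="*.slnx" \
      --include="*.json" --include="*.razor" .
}

# Check 4: src/ holds exactly the five renamed projects and nothing else.
check_src_layout() {
  local expected="Client.CLI Client.Shared Client.UI Historian.Aveva Host" actual
  actual=$(ls src | sed 's/^ZB\.MOM\.WW\.OtOpcUa\.//' | LC_ALL=C sort | tr '\n' ' ')
  [ "$actual" = "$expected " ]
}

# Check 5 helper: print the first <TargetFramework(s)> value in a csproj.
target_of() {
  grep -oE '<TargetFrameworks?>[^<]+' "$1" | head -n 1 | sed -E 's/.*>//'
}

# Check 5: client projects on net10.0, Host and Historian.Aveva still on net48.
check_targets() {
  local p
  for p in src/*.Client.*/*.csproj; do
    [ "$(target_of "$p")" = net10.0 ] || return 1
  done
  for p in src/*.Host/*.csproj src/*.Historian.Aveva/*.csproj; do
    [ "$(target_of "$p")" = net48 ] || return 1
  done
}
```

The script body would then end with `run_checks check_no_stale_refs check_src_layout check_targets`, so its exit status is the gate result.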
## Behavioral Smoke Test (exit gate)
The v1 IntegrationTests suite is the authoritative behavioral spec for Phase 0. The renamed code must pass it identically.
```bash
dotnet test tests/ZB.MOM.WW.OtOpcUa.IntegrationTests --logger "console;verbosity=detailed"
```
Expected: pass count = baseline. Fail count = 0. Skipped count = baseline.
## Completion Checklist
The exit gate signs off only when **every** item below is checked:
- [ ] All 11 projects renamed (5 src + 6 tests)
- [ ] Solution file renamed
- [ ] All `<AssemblyName>` / `<RootNamespace>` / `<ProjectReference>` updated
- [ ] All namespaces in source files updated
- [ ] `appsettings.json` product-named sections updated; external product names untouched
- [ ] TopShelf service name updated; install/uninstall cycle verified on a Windows host
- [ ] `docs/*.md` and `CLAUDE.md` references updated; retained references explained
- [ ] Build succeeds with zero errors and warning count ≤ baseline
- [ ] Test suite passes with zero failures and count = baseline
- [ ] Smoke test against running OPC UA server matches baseline output
- [ ] `phase-0-compliance.ps1` script runs and exits 0
- [ ] Adversarial review of the phase diff (`/codex:adversarial-review --base v2`) — findings closed or deferred with rationale
- [ ] PR opened against `v2`, includes: link to this doc, link to exit-gate record, compliance script output, adversarial review output
- [ ] Reviewer signoff (one reviewer beyond the implementation lead)
- [ ] `exit-gate-phase-0.md` recorded with all of the above
After the PR merges, repo rename (`lmxopcua` → `otopcua` on Gitea) happens as a separate ops step — out of scope for Phase 0.
## Risks and Mitigations
| Risk | Likelihood | Impact | Mitigation |
|------|:----------:|:------:|------------|
| Bulk `sed` rename breaks string literals (e.g. `"LmxOpcUa"` used as a runtime identifier) | Medium | Medium | Inventory step (0.1) flags string literals separately; rename them deliberately, not via bulk sed |
| MxAccess / Galaxy / Wonderware references accidentally renamed | Low | High (breaks COM interop) | Inventory step (0.1) calls out external product names explicitly; bulk rename targets only `ZB.MOM.WW.LmxOpcUa` (with namespace prefix), not bare `LmxOpcUa` |
| Test count drops silently because a test project doesn't get re-discovered | Medium | High | Baseline test count captured at entry gate; exit gate compares exactly |
| `.slnx` references break and projects disappear from solution view | Low | Medium | `dotnet sln list` after Task 0.3 verifies all projects load |
| TopShelf service install fails on a hardened Windows box (UAC, signing) | Low | Low | Manual install/uninstall cycle is part of Task 0.6 acceptance |
| Long-lived branches diverge while phase 0 is in flight | Medium | Low | Phase 0 expected duration ≤ 5 days; coordinate that no other v2 work merges during the phase |
## Out of Scope (do not do in Phase 0)
- Adding any new project (Configuration, Admin, Core, Server, Driver.* — all Phase 1+)
- Splitting Host into Galaxy.Proxy/Host/Shared (Phase 2)
- Migrating Host/Historian.Aveva to .NET 10 (Phase 2 — when Galaxy is split, the .NET 4.8 x86 piece becomes Galaxy.Host and the rest can move)
- Replacing TopShelf with `Microsoft.Extensions.Hosting` (Phase 1, decision #30)
- Implementing the cluster / namespace / equipment data model (Phase 1)
- Changing any OPC UA wire behavior
- Renaming the Gitea repo


@@ -0,0 +1,608 @@
# Phase 1 — Configuration Project + Core.Abstractions + Admin UI Scaffold
> **Status**: DRAFT — implementation plan for Phase 1 of the v2 build (`plan.md` §6).
>
> **Branch**: `v2/phase-1-configuration`
> **Estimated duration**: 4–6 weeks (largest greenfield phase; most foundational)
> **Predecessor**: Phase 0 (`phase-0-rename-and-net10.md`)
> **Successor**: Phase 2 (Galaxy parity refactor)
## Phase Objective
Stand up the **central configuration substrate** for the v2 fleet:
1. **`Core.Abstractions` project** — driver capability interfaces (`IDriver`, `ITagDiscovery`, `IReadable`, `IWritable`, `ISubscribable`, `IAlarmSource`, `IHistoryProvider`, `IRediscoverable`, `IHostConnectivityProbe`, `IDriverConfigEditor`) plus the `DriverAttributeInfo` data model
2. **`Configuration` project** — central MSSQL schema + EF Core migrations + stored procedures + LiteDB local cache + generation-diff application logic
3. **`Core` project** — `GenericDriverNodeManager` (renamed from `LmxNodeManager`), driver-hosting infrastructure, OPC UA server lifecycle, address-space registration via `IAddressSpaceBuilder`
4. **`Server` project** — `Microsoft.Extensions.Hosting`-based Windows Service host (replacing TopShelf), bootstrap from Configuration using node-bound credential, register drivers, start Core
5. **`Admin` project** — Blazor Server admin app scaffolded with ScadaLink CentralUI parity (Bootstrap 5, dark sidebar, LDAP cookie auth, three admin roles, draft → publish → rollback workflow, cluster/node/namespace/equipment/tag CRUD)
**No driver instances yet** (Galaxy stays in legacy in-process Host until Phase 2). The phase exit requires that an empty cluster can be created in Admin, an empty generation can be published, and a node can fetch the published generation — proving the configuration substrate works end-to-end.
## Scope — What Changes
| Concern | Change |
|---------|--------|
| New projects | 5 new src projects + 5 matching test projects |
| Existing v1 Host project | Refactored to consume `Core.Abstractions` interfaces against its existing Galaxy implementation — **but not split into Proxy/Host/Shared yet** (Phase 2) |
| `LmxNodeManager` | **Renamed to `GenericDriverNodeManager`** in Core, with `IDriver` swapped in for `IMxAccessClient`. The existing v1 Host instantiates `GalaxyNodeManager : GenericDriverNodeManager` (legacy in-process) — see `plan.md` §5a |
| Service hosting | TopShelf removed; `Microsoft.Extensions.Hosting` BackgroundService used (decision #30) |
| Central config DB | New SQL Server database `OtOpcUaConfig` provisioned from EF Core migrations |
| LDAP authentication for Admin | `Admin.Security` project mirrors `ScadaLink.Security`; cookie auth + JWT API endpoint |
| Local LiteDB cache on each node | New `config_cache.db` per node; bootstraps from central DB or cache |
## Scope — What Does NOT Change
| Item | Reason |
|------|--------|
| Galaxy out-of-process split | Phase 2 |
| Any new driver (Modbus, AB, S7, etc.) | Phase 3+ |
| OPC UA wire behavior | Galaxy address space still served exactly as v1; the Configuration substrate is read but not yet driving everything |
| Equipment-class template integration with future schemas repo | `EquipmentClassRef` is a nullable hook column; no validation yet (decisions #112, #115) |
| Per-driver custom config editors in Admin | Generic JSON editor only in v2.0 (decision #27); driver-specific editors land in their respective phases |
| Consumer cutover (ScadaBridge / Ignition / SystemPlatform IO) | Phases 6–8 |
| Equipment Protocol Survey | External prerequisite — ideally runs in parallel with Phase 1 (handoff §"Equipment Protocol Survey") |
## Entry Gate Checklist
- [ ] Phase 0 exit gate cleared (rename complete, all v1 tests pass under OtOpcUa names)
- [ ] `v2` branch is clean
- [ ] Phase 0 PR merged
- [ ] SQL Server 2019+ instance available for development (local dev box minimum; shared dev instance for integration tests)
- [ ] LDAP / GLAuth dev instance available for Admin auth integration testing
- [ ] ScadaLink CentralUI source accessible at `C:\Users\dohertj2\Desktop\scadalink-design\` for parity reference
- [ ] All Phase 1-relevant design docs reviewed: `plan.md` §4–5, `config-db-schema.md` (entire), `admin-ui.md` (entire), `driver-stability.md` §"Cross-Cutting Protections" (sets context for `Core.Abstractions` scope)
- [ ] Decisions #1–125 read at least at skim level; key ones for Phase 1: #14–22, #25, #28, #30, #32–33, #46–51, #79–125
**Evidence file**: `docs/v2/implementation/entry-gate-phase-1.md` recording date, signoff, environment availability.
## Task Breakdown
Phase 1 is large, so it is broken into 5 work streams (A–E) that can partly overlap. A typical sequencing: A → B → (C and D in parallel) → E.
### Stream A — Core.Abstractions (1 week)
#### Task A.1 — Define driver capability interfaces
Create `src/ZB.MOM.WW.OtOpcUa.Core.Abstractions/` (.NET 10, no dependencies). Define:
```csharp
public interface IDriver { /* lifecycle, metadata, health */ }
public interface ITagDiscovery { /* discover tags/hierarchy from backend */ }
public interface IReadable { /* on-demand read */ }
public interface IWritable { /* on-demand write */ }
public interface ISubscribable { /* data change subscriptions */ }
public interface IAlarmSource { /* alarm events + acknowledgment */ }
public interface IHistoryProvider { /* historical reads */ }
public interface IRediscoverable { /* opt-in change-detection signal */ }
public interface IHostConnectivityProbe { /* per-host runtime status */ }
public interface IDriverConfigEditor { /* Admin UI plug point per driver */ }
public interface IAddressSpaceBuilder { /* core-owned tree builder */ }
```
Plus the data models referenced from the interfaces:
```csharp
public sealed record DriverAttributeInfo(
string FullName,
DriverDataType DriverDataType,
bool IsArray,
uint? ArrayDim,
SecurityClassification SecurityClass,
bool IsHistorized);
public enum DriverDataType { Boolean, Int16, Int32, Int64, UInt16, UInt32, UInt64, Float32, Float64, String, DateTime, Reference, Custom }
public enum SecurityClassification { FreeAccess, Operate, SecuredWrite, VerifiedWrite, Tune, Configure, ViewOnly }
```
**Acceptance**:
- All interfaces compile in a project with **zero dependencies** beyond BCL
- xUnit test project asserts (via reflection) that no interface returns or accepts a type from `Core` or `Configuration` (interface independence per decision #59)
- Each interface XML doc cites the design decision(s) it implements (e.g. `IRediscoverable` cites #54)
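The interface-independence check can be a plain reflection scan. The sketch below is one way to write it, not the project's test: `InterfaceIndependence`, `FindViolations`, and the example `ISampleReadable` are hypothetical, and it assumes each project builds to an assembly named after the project. Comparing assembly names rather than namespace prefixes matters here, because `ZB.MOM.WW.OtOpcUa.Core.Abstractions` itself starts with `ZB.MOM.WW.OtOpcUa.Core`.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;
using System.Threading.Tasks;

// Example interface used only to exercise the check; it is not one of the Phase 1 interfaces.
public interface ISampleReadable
{
    Task<int> ReadAsync(string fullName);
}

public static class InterfaceIndependence
{
    // Returns one entry per interface member whose signature mentions a type defined
    // in one of the forbidden assemblies (e.g. "ZB.MOM.WW.OtOpcUa.Core").
    public static List<string> FindViolations(Assembly abstractions, ISet<string> forbiddenAssemblies)
    {
        var violations = new List<string>();
        foreach (var iface in abstractions.GetTypes().Where(t => t.IsInterface))
        {
            // Property getters/setters and event accessors show up here as methods too.
            foreach (var method in iface.GetMethods())
            {
                var signature = method.GetParameters().Select(p => p.ParameterType).Append(method.ReturnType);
                foreach (var type in signature.SelectMany(t => Expand(t)))
                {
                    var assemblyName = type.Assembly.GetName().Name;
                    if (assemblyName is not null && forbiddenAssemblies.Contains(assemblyName))
                        violations.Add($"{iface.FullName}.{method.Name} uses {type.FullName}");
                }
            }
        }
        return violations;
    }

    // Unwraps arrays, ref/out parameters, and generic arguments such as Task<T> or IEnumerable<T>.
    private static IEnumerable<Type> Expand(Type type)
    {
        if (type.HasElementType)
        {
            foreach (var inner in Expand(type.GetElementType()!)) yield return inner;
            yield break;
        }
        yield return type;
        if (type.IsGenericType)
            foreach (var arg in type.GetGenericArguments())
                foreach (var inner in Expand(arg)) yield return inner;
    }
}
```

In the xUnit test, pass `typeof(IDriver).Assembly` with the `Core` and `Configuration` assembly names and assert that the returned list is empty.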
#### Task A.2 — Define DriverTypeRegistry
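```csharp
using System;
using System.Collections.Generic;

public sealed class DriverTypeRegistry
{
    private readonly Dictionary<string, DriverTypeMetadata> _types = new();

    // Rejects a second registration under the same TypeName.
    public void Register(DriverTypeMetadata metadata)
    {
        if (!_types.TryAdd(metadata.TypeName, metadata))
            throw new InvalidOperationException($"Driver type '{metadata.TypeName}' is already registered");
    }

    public DriverTypeMetadata Get(string driverType) =>
        _types.TryGetValue(driverType, out var metadata)
            ? metadata
            : throw new KeyNotFoundException($"Unknown driver type '{driverType}'");

    public IEnumerable<DriverTypeMetadata> All() => _types.Values;
}

public sealed record DriverTypeMetadata(
    string TypeName,                                   // "Galaxy" | "ModbusTcp" | ...
    NamespaceKindCompatibility AllowedNamespaceKinds,  // per decision #111
    string DriverConfigJsonSchema,                     // per decision #91
    string? DeviceConfigJsonSchema,                    // optional
    string TagConfigJsonSchema);

[Flags]
public enum NamespaceKindCompatibility
{
    Equipment = 1, SystemPlatform = 2, Simulated = 4
}
```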
In Phase 1, only the `Galaxy` type is registered (`AllowedNamespaceKinds = SystemPlatform`); Phase 3 and later phases add more types.
**Acceptance**:
- Registry compiles, has unit tests for: register a type, look it up, reject duplicate registration, enumerate all
- Galaxy registration entry exists with `AllowedNamespaceKinds = SystemPlatform` per decision #111
### Stream B — Configuration project (1.5 weeks)
#### Task B.1 — EF Core schema + initial migration
Create `src/ZB.MOM.WW.OtOpcUa.Configuration/` (.NET 10, EF Core 10).
Implement DbContext with entities matching `config-db-schema.md` exactly:
- `ServerCluster`, `ClusterNode`, `ClusterNodeCredential`
- `Namespace` (generation-versioned per decision #123)
- `UnsArea`, `UnsLine`
- `ConfigGeneration`
- `DriverInstance`, `Device`, `Equipment`, `Tag`, `PollGroup`
- `ClusterNodeGenerationState`, `ConfigAuditLog`
- `ExternalIdReservation` (NOT generation-versioned per decision #124)
Generate the initial migration:
```bash
dotnet ef migrations add InitialSchema --project src/ZB.MOM.WW.OtOpcUa.Configuration
```
**Acceptance**:
- Apply migration to a clean SQL Server instance produces the schema in `config-db-schema.md`
- Schema-validation test (`SchemaComplianceTests`) introspects the live DB and asserts every table/column/index/constraint matches the doc
- Test runs in CI against a SQL Server container
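At its core, the schema-compliance test is a set comparison between what the doc declares and what the live database reports. A minimal sketch for columns only, assuming the expected rows are extracted from the DDL in `config-db-schema.md` and the actual rows come from `INFORMATION_SCHEMA.COLUMNS`. `ColumnSpec` and `SchemaDiff` are hypothetical names; indexes, FKs, CHECK constraints, and stored procedures would each need an analogous comparison.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// One column as declared in config-db-schema.md (expected) or read from INFORMATION_SCHEMA.COLUMNS (actual).
public sealed record ColumnSpec(string Table, string Column, string DataType, bool IsNullable);

public static class SchemaDiff
{
    // Returns human-readable mismatches; an empty list means the live columns match the doc.
    public static List<string> Compare(IEnumerable<ColumnSpec> expected, IEnumerable<ColumnSpec> actual)
    {
        // Case-insensitive keys, matching SQL Server's default collation.
        static (string, string) Key(ColumnSpec c) => (c.Table.ToLowerInvariant(), c.Column.ToLowerInvariant());
        var exp = expected.ToDictionary(c => Key(c));
        var act = actual.ToDictionary(c => Key(c));
        var problems = new List<string>();

        foreach (var (key, e) in exp)
        {
            if (!act.TryGetValue(key, out var a))
                problems.Add($"missing column {e.Table}.{e.Column}");
            else if (!string.Equals(e.DataType, a.DataType, StringComparison.OrdinalIgnoreCase) || e.IsNullable != a.IsNullable)
                problems.Add($"column {e.Table}.{e.Column}: expected {e.DataType} (nullable={e.IsNullable}), found {a.DataType} (nullable={a.IsNullable})");
        }
        foreach (var (key, a) in act)
            if (!exp.ContainsKey(key))
                problems.Add($"unexpected column {a.Table}.{a.Column}");
        return problems;
    }
}
```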
#### Task B.2 — Stored procedures via `MigrationBuilder.Sql`
Add stored procedures from `config-db-schema.md` §"Stored Procedures":
- `sp_GetCurrentGenerationForCluster`
- `sp_GetGenerationContent`
- `sp_RegisterNodeGenerationApplied`
- `sp_PublishGeneration` (with the `MERGE` against `ExternalIdReservation` per decision #124)
- `sp_RollbackToGeneration`
- `sp_ValidateDraft` (structural checks only, per decision #91; content and JSON-schema validation run in managed code in the Admin app, see Task B.4)
- `sp_ComputeGenerationDiff`
- `sp_ReleaseExternalIdReservation` (FleetAdmin only)
Use `CREATE OR ALTER` style in `MigrationBuilder.Sql()` blocks so procs version with the schema.
**Acceptance**:
- Each proc has at least one xUnit test exercising the happy path + at least one error path
- `sp_PublishGeneration` has a concurrency test: two simultaneous publishes for the same cluster → one wins, one fails with a recognizable error
- `sp_GetCurrentGenerationForCluster` has an authorization test: caller bound to NodeId X cannot read cluster Y's generation
#### Task B.3 — Authorization model (SQL principals + GRANT)
Add a separate migration `AuthorizationGrants` that:
- Creates two SQL roles: `OtOpcUaNode`, `OtOpcUaAdmin`
- Grants EXECUTE on the appropriate procs per `config-db-schema.md` §"Authorization Model"
- Grants no direct table access to either role
**Acceptance**:
- A test running as an `OtOpcUaNode`-role principal can call only the node procs, not the admin procs
- A test running as an `OtOpcUaAdmin`-role principal can call the publish/rollback procs
- A test confirms that a direct `SELECT * FROM dbo.ConfigGeneration` from an `OtOpcUaNode` principal is denied
#### Task B.4 — JSON-schema validators (managed code)
In `Configuration.Validation/`, implement the managed validators that complement the structural checks in `sp_ValidateDraft`; the Admin app runs both before publish (decision #91):
- UNS segment regex (`^[a-z0-9-]{1,32}$` or `_default`)
- Path length (≤200 chars)
- UUID immutability across generations
- Same-cluster namespace binding (decision #122)
- ZTag/SAPID reservation pre-flight (decision #124)
- EquipmentId derivation rule (decision #125)
- Driver type ↔ namespace kind allowed (decision #111)
- JSON-schema validation per `DriverType` from `DriverTypeRegistry`
**Acceptance**:
- One unit test per rule, both passing and failing cases
- Cross-rule integration test: a draft that violates 3 rules surfaces all 3 (not just the first)
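The "surface all 3, not just the first" requirement means each rule appends to a shared error list instead of throwing. A sketch of that shape with two of the rules, assuming a UNS path is its segments joined with `/` (the text above does not say how the path is formed). `ValidationError` and `UnsRules` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Which rule fired, where in the draft, and why.
public sealed record ValidationError(string Rule, string Location, string Message);

public static class UnsRules
{
    private static readonly Regex Segment = new("^[a-z0-9-]{1,32}$", RegexOptions.Compiled);
    public const int MaxPathLength = 200;

    public static bool IsValidSegment(string segment) => segment == "_default" || Segment.IsMatch(segment);

    // Runs every rule and returns all failures instead of stopping at the first one.
    public static List<ValidationError> Validate(string location, IReadOnlyList<string> segments)
    {
        var errors = new List<ValidationError>();
        foreach (var segment in segments)
            if (!IsValidSegment(segment))
                errors.Add(new ValidationError("UnsSegment", location, $"'{segment}' must match ^[a-z0-9-]{{1,32}}$ or be _default"));

        var path = string.Join('/', segments);   // assumption: segments are '/'-joined
        if (path.Length > MaxPathLength)
            errors.Add(new ValidationError("PathLength", location, $"path is {path.Length} chars (max {MaxPathLength})"));
        return errors;
    }
}
```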
#### Task B.5 — LiteDB local cache
In `Configuration.LocalCache/`, implement the LiteDB schema from `config-db-schema.md` §"Local LiteDB Cache":
```csharp
public interface ILocalConfigCache
{
    Task<GenerationCacheEntry?> GetMostRecentAsync(string clusterId);
    Task PutAsync(GenerationCacheEntry entry);
    Task PruneOldGenerationsAsync(string clusterId, int keepLatest = 10);
}
```
**Acceptance**:
- Round-trip test: write a generation snapshot, read it back, assert deep equality
- Pruning test: write 15 generations, prune to 10, assert the 5 oldest are gone
- Corruption test: corrupt the LiteDB file, assert the loader fails fast with a clear error
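The pruning semantics are easiest to pin down against an in-memory stand-in before wiring up LiteDB. A sketch, assuming "most recent" means the highest generation id per cluster. The `GenerationCacheEntry` fields and the `InMemoryConfigCache` class are hypothetical; the real entry shape follows `config-db-schema.md`.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical entry shape for the sketch.
public sealed record GenerationCacheEntry(string ClusterId, long GenerationId, string PayloadJson);

public interface ILocalConfigCache
{
    Task<GenerationCacheEntry?> GetMostRecentAsync(string clusterId);
    Task PutAsync(GenerationCacheEntry entry);
    Task PruneOldGenerationsAsync(string clusterId, int keepLatest = 10);
}

// In-memory stand-in that fixes the semantics the LiteDB version must match.
public sealed class InMemoryConfigCache : ILocalConfigCache
{
    private readonly List<GenerationCacheEntry> _entries = new();

    public Task<GenerationCacheEntry?> GetMostRecentAsync(string clusterId) =>
        Task.FromResult(_entries.Where(x => x.ClusterId == clusterId)
                                .OrderByDescending(x => x.GenerationId)
                                .FirstOrDefault());

    public Task PutAsync(GenerationCacheEntry entry)
    {
        // Re-putting the same generation replaces it rather than duplicating it.
        _entries.RemoveAll(x => x.ClusterId == entry.ClusterId && x.GenerationId == entry.GenerationId);
        _entries.Add(entry);
        return Task.CompletedTask;
    }

    public Task PruneOldGenerationsAsync(string clusterId, int keepLatest = 10)
    {
        var stale = _entries.Where(x => x.ClusterId == clusterId)
                            .OrderByDescending(x => x.GenerationId)
                            .Skip(keepLatest)
                            .ToList();
        foreach (var old in stale) _entries.Remove(old);
        return Task.CompletedTask;
    }

    // Test helper: cached generation ids for a cluster, ascending.
    public IReadOnlyList<long> GenerationIds(string clusterId) =>
        _entries.Where(x => x.ClusterId == clusterId).Select(x => x.GenerationId).OrderBy(g => g).ToList();
}
```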
#### Task B.6 — Generation-diff application logic
In `Configuration.Apply/`, implement the diff-and-apply logic that runs on each node when a new generation arrives:
```csharp
public interface IGenerationApplier
{
    Task<ApplyResult> ApplyAsync(GenerationSnapshot from, GenerationSnapshot to, CancellationToken ct);
}
```
Diff each entity type, then dispatch driver `Reinitialize` calls and cache flushes as needed.
**Acceptance**:
- Diff test: from = empty, to = (1 driver + 5 equipment + 50 tags) → `Added` for each
- Diff test: from = (above), to = same with one tag's `Name` changed → `Modified` for one tag, no other changes
- Diff test: from = (above), to = same with one equipment removed → `Removed` for the equipment + cascading `Removed` for its tags
- Apply test against an in-memory mock driver: applies the diff in correct order, idempotent on retry
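Keyed by logical id, the diff reduces to two dictionary passes. A sketch under two assumptions: each snapshot is a complete generation (so removing an equipment also drops its tags from the new snapshot, and they surface as `Removed`), and an entity's editable fields can be compared as one value. `Entity`, `Change`, and `GenerationDiff` are hypothetical names. Because the key is the logical id, a renamed entity appears as `Modified` rather than as a remove plus an add.

```csharp
using System.Collections.Generic;
using System.Linq;

public enum ChangeKind { Added, Modified, Removed }

// Content stands in for all editable fields of the entity (e.g. a canonical JSON string).
public sealed record Entity(string Id, string Type, string Content);
public sealed record Change(ChangeKind Kind, string Type, string Id);

public static class GenerationDiff
{
    public static List<Change> Compute(IEnumerable<Entity> previous, IEnumerable<Entity> next)
    {
        var before = previous.ToDictionary(x => x.Id);
        var after = next.ToDictionary(x => x.Id);
        var changes = new List<Change>();

        foreach (var entity in after.Values)
        {
            if (!before.TryGetValue(entity.Id, out var old))
                changes.Add(new Change(ChangeKind.Added, entity.Type, entity.Id));
            else if (old != entity)   // record equality: any field difference counts
                changes.Add(new Change(ChangeKind.Modified, entity.Type, entity.Id));
        }
        // Anything missing from the new snapshot was removed, including children of a removed parent.
        foreach (var gone in before.Values.Where(x => !after.ContainsKey(x.Id)))
            changes.Add(new Change(ChangeKind.Removed, gone.Type, gone.Id));
        return changes;
    }
}
```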
### Stream C — Core project (1 week, can parallel with Stream D)
#### Task C.1 — Rename `LmxNodeManager` → `GenericDriverNodeManager`
Per `plan.md` §5a:
- Lift the file from `Host/OpcUa/LmxNodeManager.cs` to `Core/OpcUa/GenericDriverNodeManager.cs`
- Swap `IMxAccessClient` for `IDriver` (composing `IReadable` / `IWritable` / `ISubscribable`)
- Swap `GalaxyAttributeInfo` for `DriverAttributeInfo`
- Promote `GalaxyRuntimeProbeManager` interactions to use `IHostConnectivityProbe`
- Move `MxDataTypeMapper` and `SecurityClassificationMapper` to a new `Driver.Galaxy.Mapping/` (still in legacy Host until Phase 2)
**Acceptance**:
- v1 IntegrationTests still pass against the renamed class (parity is the gate, decision #62 — class is "foundation, not rewrite")
- Reflection test asserts `GenericDriverNodeManager` has no static or instance reference to any Galaxy-specific type
#### Task C.2 — Derive `GalaxyNodeManager : GenericDriverNodeManager` (legacy in-process)
In the existing Host project, add a thin `GalaxyNodeManager` that:
- Inherits from `GenericDriverNodeManager`
- Wires up `MxDataTypeMapper`, `SecurityClassificationMapper`, the probe manager, etc.
- Replaces direct instantiation of the renamed class
**Acceptance**:
- v1 IntegrationTests pass identically with `GalaxyNodeManager` instantiated instead of the old direct class
- Existing dev Galaxy still serves the same address space byte-for-byte (compare with a baseline browse capture)
#### Task C.3 — `IAddressSpaceBuilder` API (decision #52)
Implement the streaming builder API drivers use to register nodes:
```csharp
public interface IAddressSpaceBuilder
{
    IFolderBuilder Folder(string browseName, string displayName);
    IVariableBuilder Variable(string browseName, DriverDataType type, ...);
    void AddProperty(string browseName, object value);
}
```
Refactor `GenericDriverNodeManager.BuildAddressSpace` to consume `IAddressSpaceBuilder` (driver streams in tags rather than buffering them).
**Acceptance**:
- Build a Galaxy address space via the new builder API, assert byte-equivalent OPC UA browse output vs v1
- Memory profiling test: building a 5000-tag address space via the builder uses less than 50% of the peak RAM of the buffered approach
#### Task C.4 — Driver hosting + isolation (decision #65, #74)
Implement the in-process driver host that:
- Loads each `DriverInstance` row's driver assembly
- Catches and contains driver exceptions (driver isolation, decision #12)
- Surfaces `IDriver.Reinitialize()` to the configuration applier
- Tracks per-driver allocation footprint (`GetMemoryFootprint()` polled every 30s per `driver-stability.md`)
- Flushes optional caches on budget breach
- Marks drivers `Faulted` (Bad quality on their nodes) if `Reinitialize` fails
**Acceptance**:
- Integration test: spin up two mock drivers; one throws on Read; the other keeps working. Quality on the broken driver's nodes goes Bad; the other driver is unaffected.
- Memory-budget test: mock driver reports growing footprint above budget; cache-flush is triggered; footprint drops; no process action taken.
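The isolation and budget rules above fit in a small host class. A sketch only: `ISketchDriver`, `DriverHost`, and `MockDriver` are hypothetical stand-ins for the `Core.Abstractions` interfaces, `IsGood = false` stands for OPC UA Bad quality, and the 30-second polling timer is left out.

```csharp
using System;
using System.Collections.Generic;

public enum DriverHealth { Healthy, Faulted }
public sealed record ReadResult(object? Value, bool IsGood);   // IsGood = false maps to Bad quality

// Trimmed-down driver surface for the sketch.
public interface ISketchDriver
{
    string Name { get; }
    object Read(string tag);
    void Reinitialize();
    long GetMemoryFootprint();
    void FlushOptionalCaches();
}

public sealed class DriverHost
{
    private readonly Dictionary<string, (ISketchDriver Driver, DriverHealth Health)> _drivers = new();
    private readonly long _memoryBudgetBytes;

    public DriverHost(long memoryBudgetBytes) => _memoryBudgetBytes = memoryBudgetBytes;

    public void Add(ISketchDriver driver) => _drivers[driver.Name] = (driver, DriverHealth.Healthy);
    public DriverHealth HealthOf(string name) => _drivers[name].Health;

    // Exceptions stay inside the host: the caller sees Bad quality, other drivers are untouched.
    public ReadResult Read(string driverName, string tag)
    {
        var (driver, health) = _drivers[driverName];
        if (health == DriverHealth.Faulted) return new ReadResult(null, false);
        try { return new ReadResult(driver.Read(tag), true); }
        catch (Exception) { return new ReadResult(null, false); }
    }

    // A failed Reinitialize marks the driver Faulted, so all of its nodes report Bad.
    public void Reinitialize(string driverName)
    {
        var driver = _drivers[driverName].Driver;
        try { driver.Reinitialize(); _drivers[driverName] = (driver, DriverHealth.Healthy); }
        catch (Exception) { _drivers[driverName] = (driver, DriverHealth.Faulted); }
    }

    // Meant to run on the 30s timer: over budget means flush optional caches, nothing more.
    public void CheckMemoryBudgets()
    {
        foreach (var (driver, _) in _drivers.Values)
            if (driver.GetMemoryFootprint() > _memoryBudgetBytes)
                driver.FlushOptionalCaches();
    }
}

// Mock driver of the kind the acceptance tests call for.
public sealed class MockDriver : ISketchDriver
{
    private readonly bool _throwOnRead;
    private readonly bool _throwOnReinitialize;

    public MockDriver(string name, bool throwOnRead = false, bool throwOnReinitialize = false)
    {
        Name = name;
        _throwOnRead = throwOnRead;
        _throwOnReinitialize = throwOnReinitialize;
    }

    public string Name { get; }
    public long Footprint { get; set; }
    public int FlushCount { get; private set; }

    public object Read(string tag) => _throwOnRead ? throw new InvalidOperationException("read failed") : 42;
    public void Reinitialize() { if (_throwOnReinitialize) throw new InvalidOperationException("reinit failed"); }
    public long GetMemoryFootprint() => Footprint;
    public void FlushOptionalCaches() { FlushCount++; Footprint /= 2; }
}
```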
### Stream D — Server project (4 days, can parallel with Stream C)
#### Task D.1 — `Microsoft.Extensions.Hosting` Windows Service host (decision #30)
Replace TopShelf with `Microsoft.Extensions.Hosting`:
- New `Program.cs` using `Host.CreateApplicationBuilder()`
- `BackgroundService` that owns the OPC UA server lifecycle
- `builder.Services.AddWindowsService()` (from `Microsoft.Extensions.Hosting.WindowsServices`) lets the same host run under the Windows Service Control Manager
- Configuration bootstrap from `appsettings.json` (NodeId + ClusterId + DB conn) per decision #18
**Acceptance**:
- `dotnet run` runs interactively (console mode)
- Installed as a Windows Service (`sc create OtOpcUa ...`), starts and stops cleanly
- Service install + uninstall cycle leaves no leftover state
#### Task D.2 — Bootstrap with credential-bound DB connection (decisions #46, #83)
On startup:
- Read `Cluster.NodeId` + `Cluster.ClusterId` + `ConfigDatabase.ConnectionString` from `appsettings.json`
- Connect to central DB with the configured principal (gMSA / SQL login / cert-mapped)
- Call `sp_GetCurrentGenerationForCluster(@NodeId, @ClusterId)` — the proc verifies the connected principal is bound to NodeId
- If proc rejects → fail startup loudly with the principal mismatch message
**Acceptance**:
- Test: principal bound to Node A boots successfully when configured with NodeId = A
- Test: principal bound to Node A configured with NodeId = B → startup fails with `Unauthorized` and the service does not stay running
- Test: principal bound to Node A in cluster C1 configured with ClusterId = C2 → `Forbidden`
#### Task D.3 — LiteDB cache fallback on DB outage
If the central DB is unreachable at startup, load the most recent cached generation from LiteDB and start with it. Log loudly. Continue retrying the central DB in the background; on reconnect, resume normal poll cycle.
**Acceptance**:
- Test: with central DB unreachable, node starts from cache, logs `ConfigDbUnreachableUsingCache` event, OPC UA endpoint serves the cached config
- Test: cache empty AND central DB unreachable → startup fails with `NoConfigAvailable` (decision #21)
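Tasks D.2 and D.3 together define a small startup decision table. One reading of it, sketched below, is that only an unreachable database may fall back to the cache; a principal mismatch must still fail, so the cache cannot mask a misconfigured NodeId. The enum, class, and exception names are hypothetical; `Unauthorized`, `Forbidden`, and `NoConfigAvailable` are the outcomes named in the acceptance tests.

```csharp
using System;

public enum StartupSource { CentralDb, LocalCache }

// Outcome of calling sp_GetCurrentGenerationForCluster at startup.
public enum DbBootstrapResult { Ok, Unauthorized, Forbidden, Unreachable }

public sealed class StartupFailedException : Exception
{
    public StartupFailedException(string code) : base(code) { }
}

public static class BootstrapPolicy
{
    // Principal mismatches are fatal; only an unreachable DB may fall back to the LiteDB cache.
    public static StartupSource Decide(DbBootstrapResult db, bool cacheHasGeneration) => db switch
    {
        DbBootstrapResult.Ok => StartupSource.CentralDb,
        DbBootstrapResult.Unauthorized => throw new StartupFailedException("Unauthorized"),
        DbBootstrapResult.Forbidden => throw new StartupFailedException("Forbidden"),
        DbBootstrapResult.Unreachable when cacheHasGeneration => StartupSource.LocalCache, // log ConfigDbUnreachableUsingCache
        DbBootstrapResult.Unreachable => throw new StartupFailedException("NoConfigAvailable"),
        _ => throw new ArgumentOutOfRangeException(nameof(db)),
    };
}
```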
### Stream E — Admin project (2.5 weeks)
#### Task E.1 — Project scaffold mirroring ScadaLink CentralUI (decision #102)
Copy the project layout from `scadalink-design/src/ScadaLink.CentralUI/` (decision #104):
- `src/ZB.MOM.WW.OtOpcUa.Admin/`: Razor Components project, .NET 10, `AddInteractiveServerComponents`
- `Auth/AuthEndpoints.cs`, `Auth/CookieAuthenticationStateProvider.cs`
- `Components/Layout/MainLayout.razor`, `Components/Layout/NavMenu.razor`
- `Components/Pages/Login.razor`, `Components/Pages/Dashboard.razor`
- `Components/Shared/{DataTable, ConfirmDialog, LoadingSpinner, NotAuthorizedView, RedirectToLogin, TimestampDisplay, ToastNotification}.razor`
- `EndpointExtensions.cs`, `ServiceCollectionExtensions.cs`
Plus `src/ZB.MOM.WW.OtOpcUa.Admin.Security/` (decision #104): `LdapAuthService`, `RoleMapper`, `JwtTokenService`, `AuthorizationPolicies` mirroring `ScadaLink.Security`.
**Acceptance**:
- App builds and runs locally
- Login page renders with OtOpcUa branding (only the `<h4>` text differs from ScadaLink)
- Visual diff between OtOpcUa and ScadaLink login pages: only the brand text differs (compliance check #3)
#### Task E.2 — Bootstrap LDAP + cookie auth + admin role mapping
Wire up `LdapAuthService` against the dev GLAuth instance per `Security.md`. Map LDAP groups to admin roles:
- `OtOpcUaAdmins` → `FleetAdmin`
- `OtOpcUaConfigEditors` → `ConfigEditor`
- `OtOpcUaViewers` → `ReadOnly`
Plus cluster-scoped grants per decision #105 (LDAP group `OtOpcUaConfigEditors-LINE3` → `ConfigEditor` + `ClusterId = LINE3-OPCUA` claim).
**Acceptance**:
- Login as a `FleetAdmin`-mapped user → redirected to `/`, sidebar shows admin sections
- Login as a `ReadOnly`-mapped user → redirected to `/`, sidebar shows view-only sections
- Login as a cluster-scoped `ConfigEditor` → only their permitted clusters appear in `/clusters`
- Login with bad credentials → redirected to `/login?error=...` with the LDAP error surfaced
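The group-to-role mapping can be modeled as a table of grants, where a grant optionally carries a cluster scope. A sketch, assuming the table is static (in practice it would come from configuration) and that any fleet-wide grant makes every cluster visible. `RoleGrant` and `RoleMapperSketch` are hypothetical and are not the `ScadaLink.Security` `RoleMapper` API.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public enum AdminRole { ReadOnly, ConfigEditor, FleetAdmin }

// A role, optionally limited to one cluster (ClusterId == null means fleet-wide).
public sealed record RoleGrant(AdminRole Role, string? ClusterId);

public static class RoleMapperSketch
{
    // Static table mirroring the mappings listed above.
    private static readonly Dictionary<string, RoleGrant> GroupMap = new(StringComparer.OrdinalIgnoreCase)
    {
        ["OtOpcUaAdmins"] = new RoleGrant(AdminRole.FleetAdmin, null),
        ["OtOpcUaConfigEditors"] = new RoleGrant(AdminRole.ConfigEditor, null),
        ["OtOpcUaViewers"] = new RoleGrant(AdminRole.ReadOnly, null),
        ["OtOpcUaConfigEditors-LINE3"] = new RoleGrant(AdminRole.ConfigEditor, "LINE3-OPCUA"),
    };

    // Unknown LDAP groups are ignored rather than rejected.
    public static List<RoleGrant> Map(IEnumerable<string> ldapGroups) =>
        ldapGroups.Where(g => GroupMap.ContainsKey(g)).Select(g => GroupMap[g]).Distinct().ToList();

    // Any fleet-wide grant shows every cluster; otherwise only the scoped clusters are listed.
    public static List<string> VisibleClusters(IReadOnlyList<RoleGrant> grants, IEnumerable<string> allClusters) =>
        grants.Any(grant => grant.ClusterId is null)
            ? allClusters.ToList()
            : allClusters.Where(c => grants.Any(grant => grant.ClusterId == c)).ToList();
}
```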
#### Task E.3 — Cluster CRUD pages
Implement per `admin-ui.md`:
- `/clusters` — Cluster list (FleetAdmin sees all, ConfigEditor sees scoped)
- `/clusters/{ClusterId}` — Cluster Detail with all 9 tabs (Overview / Namespaces / UNS Structure / Drivers / Devices / Equipment / Tags / Generations / Audit), but Drivers/Devices/Equipment/Tags tabs initially show empty tables (no driver implementations yet — Phase 2+)
- "New cluster" workflow per `admin-ui.md` §"Add a new cluster" — creates cluster row, opens initial draft with default namespaces (decision #123)
- ApplicationUri auto-suggest on node create per decision #86
**Acceptance**:
- Create a cluster → cluster row exists, initial draft exists with Equipment-kind namespace
- Edit cluster name → change reflected in list + detail
- Disable a cluster → no longer offered as a target for new nodes; existing nodes keep showing in list with "Disabled" badge
#### Task E.4 — Draft → diff → publish workflow (decision #89)
Implement per `admin-ui.md` §"Draft Editor", §"Diff Viewer", §"Generation History":
- `/clusters/{Id}/draft` — full draft editor with auto-save (debounced 500ms per decision #97)
- `/clusters/{Id}/draft/diff` — three-column diff viewer
- `/clusters/{Id}/generations` — list of historical generations with rollback action
- Live `sp_ValidateDraft` invocation in the validation panel; publish disabled while errors exist
- Publish dialog requires Notes; runs `sp_PublishGeneration` in a transaction
**Acceptance**:
- Create draft → validation panel runs and shows clean state for empty draft
- Add an invalid Equipment row (bad UNS segment) → validation panel surfaces the error inline + publish stays disabled
- Fix the row → validation panel goes green + publish enables
- Publish → generation moves Draft → Published; previous Published moves to Superseded; audit log row created
- Roll back to a prior generation → new generation cloned from target; previous generation moves to Superseded; nodes pick up the new generation on next poll
- The "Push now" button per decision #96 is rendered but disabled with the "Available in v2.1" label
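The status rules for publish and rollback can be captured in a small in-memory model before they are encoded in `sp_PublishGeneration` and `sp_RollbackToGeneration`. A sketch: the class and member names are hypothetical, the model tracks status only (not generation content), and requiring Notes on rollback is an assumption (the text above only requires them for publish).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public enum GenerationStatus { Draft, Published, Superseded }

public sealed class Generation
{
    public long Id { get; init; }
    public GenerationStatus Status { get; set; }
    public string? Notes { get; set; }
    public long? ClonedFrom { get; init; }   // set only on generations created by a rollback
}

// In-memory model of the state rules; in the real system each transition is one transaction.
public sealed class ClusterGenerations
{
    private readonly List<Generation> _generations = new();

    public Generation OpenDraft() => Add(new Generation { Id = NextId(), Status = GenerationStatus.Draft });

    public void Publish(long draftId, string notes)
    {
        if (string.IsNullOrWhiteSpace(notes)) throw new ArgumentException("Publish requires Notes", nameof(notes));
        var draft = _generations.Single(x => x.Id == draftId && x.Status == GenerationStatus.Draft);
        foreach (var current in _generations.Where(x => x.Status == GenerationStatus.Published))
            current.Status = GenerationStatus.Superseded;
        draft.Status = GenerationStatus.Published;
        draft.Notes = notes;
    }

    // Rollback never revives the old row: it clones the target into a new generation and publishes that.
    public Generation RollbackTo(long targetId, string notes)
    {
        _ = _generations.Single(x => x.Id == targetId && x.Status != GenerationStatus.Draft); // throws if unknown
        var clone = Add(new Generation { Id = NextId(), Status = GenerationStatus.Draft, ClonedFrom = targetId });
        Publish(clone.Id, notes);
        return clone;
    }

    private long NextId() => _generations.Count == 0 ? 1 : _generations.Max(x => x.Id) + 1;
    private Generation Add(Generation generation) { _generations.Add(generation); return generation; }
}
```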
#### Task E.5 — UNS Structure + Equipment + Namespace tabs
Implement the three hybrid tabs:
- Namespaces tab — list with click-to-edit-in-draft
- UNS Structure tab — tree view with drag-drop reorganize, rename with live impact preview
- Equipment tab — list with default sort by ZTag, search across all 5 identifiers
CSV import for Equipment per the revised schema in `admin-ui.md` (no EquipmentId column; matches by EquipmentUuid for updates per decision #125).
**Acceptance**:
- Add a UnsArea via draft → publishes → appears in tree
- Drag a UnsLine to a different UnsArea → impact preview shows count of affected equipment + signals → publish moves it; UUIDs preserved
- Equipment CSV import: 10 new rows → all get system-generated EquipmentId + EquipmentUuid; ZTag uniqueness checked against `ExternalIdReservation` (decision #124)
- Equipment CSV import: 1 row with existing EquipmentUuid → updates the matched row's editable fields
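The import rule (match on `EquipmentUuid`, otherwise create) can be sketched as below. Assumptions: the reservation lookup is a ZTag → owning-UUID view of `ExternalIdReservation`, a row whose UUID matches nothing is rejected, duplicate ZTags within one file are not checked, and `deriveId` stands in for the decision #125 derivation rule, which this sketch does not define. All type names are hypothetical.

```csharp
using System;
using System.Collections.Generic;

public sealed record CsvRow(Guid? EquipmentUuid, string ZTag, string Name);

public sealed class EquipmentRecord
{
    public Guid EquipmentUuid { get; init; }
    public string EquipmentId { get; init; } = "";
    public string ZTag { get; set; } = "";
    public string Name { get; set; } = "";
}

public static class EquipmentCsvImport
{
    // Returns per-row errors; rows without errors are applied to `draft` (keyed by EquipmentUuid).
    public static List<string> Apply(
        IEnumerable<CsvRow> rows,
        Dictionary<Guid, EquipmentRecord> draft,
        IReadOnlyDictionary<string, Guid> reservedZTags,   // ZTag → owning EquipmentUuid
        Func<Guid, string> deriveId)                        // stand-in for the decision #125 rule
    {
        var errors = new List<string>();
        foreach (var row in rows)
        {
            var uuid = row.EquipmentUuid ?? Guid.NewGuid();   // blank UUID → new equipment
            if (reservedZTags.TryGetValue(row.ZTag, out var owner) && owner != uuid)
            {
                errors.Add($"ZTag '{row.ZTag}' is reserved by equipment {owner}");
                continue;
            }
            if (draft.TryGetValue(uuid, out var existing))
            {
                existing.ZTag = row.ZTag;   // update editable fields only; ids never change
                existing.Name = row.Name;
            }
            else if (row.EquipmentUuid is null)
            {
                draft[uuid] = new EquipmentRecord { EquipmentUuid = uuid, EquipmentId = deriveId(uuid), ZTag = row.ZTag, Name = row.Name };
            }
            else
            {
                errors.Add($"EquipmentUuid {uuid} not found; UUIDs are system-generated, not imported");
            }
        }
        return errors;
    }
}
```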
#### Task E.6 — Generic JSON config editor for `DriverConfig`
Per decision #94 — until per-driver editors land in their respective phases, use a generic JSON editor with schema-driven validation against `DriverTypeRegistry`'s registered JSON schema for the driver type.
**Acceptance**:
- Add a Galaxy `DriverInstance` in a draft → JSON editor renders the Galaxy DriverConfig schema
- Editing produces live validation errors per the schema
- Saving with errors → publish stays disabled
#### Task E.7 — Real-time updates via SignalR (admin-ui.md §"Real-Time Updates")
Two SignalR hubs:
- `FleetStatusHub` — pushes `ClusterNodeGenerationState` changes
- `AlertHub` — pushes new sticky alerts (crash-loop circuit trips, failed applies)
A backend `IHostedService` polls every 5s, diffs each poll against the previous one, and pushes only the changes.
**Acceptance**:
- Open Cluster Detail in two browser tabs → publish in tab A → tab B's "current generation" updates within 5s without page reload
- Simulate a `LastAppliedStatus = Failed` for a node → AlertHub pushes a sticky alert that doesn't auto-clear
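The poll-and-diff loop behind the hubs only needs to compare consecutive snapshots. A sketch with a hypothetical `NodeState` row shape (only `LastAppliedStatus` comes from the text above) and `"Failed"` assumed as the failure value; the hub push and the 5s timer are omitted.

```csharp
using System.Collections.Generic;

// Hypothetical per-node row from ClusterNodeGenerationState.
public sealed record NodeState(string NodeId, long CurrentGenerationId, string LastAppliedStatus);

public static class FleetStatusDiff
{
    // Rows that are new or differ from the previous poll: candidates for FleetStatusHub.
    public static List<NodeState> Changed(IReadOnlyDictionary<string, NodeState> previous, IEnumerable<NodeState> current)
    {
        var changed = new List<NodeState>();
        foreach (var row in current)
            if (!previous.TryGetValue(row.NodeId, out var old) || old != row)
                changed.Add(row);
        return changed;
    }

    // Rows that just transitioned to Failed: candidates for a sticky AlertHub alert.
    public static List<NodeState> NewFailures(IReadOnlyDictionary<string, NodeState> previous, IEnumerable<NodeState> current)
    {
        var failed = new List<NodeState>();
        foreach (var row in Changed(previous, current))
            if (row.LastAppliedStatus == "Failed" &&
                (!previous.TryGetValue(row.NodeId, out var old) || old.LastAppliedStatus != "Failed"))
                failed.Add(row);
        return failed;
    }
}
```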
#### Task E.8 — Release reservation + Merge equipment workflows
Per `admin-ui.md` §"Release an external-ID reservation" and §"Merge or rebind equipment":
- Release flow: FleetAdmin only, requires reason, audit-logged via `sp_ReleaseExternalIdReservation`
- Merge flow: opens a draft that disables source equipment, re-points tags, releases + re-reserves IDs
**Acceptance**:
- Release a reservation → `ReleasedAt` set in DB + audit log entry created with reason
- After release: same `(Kind, Value)` can be reserved by a different EquipmentUuid in a future publish
- Merge equipment A → B: draft preview shows tag re-pointing + ID re-reservation; publish executes atomically; A is disabled with `EquipmentMergedAway` audit entry
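The release rule can be checked against an in-memory model of `ExternalIdReservation`. A sketch with hypothetical names; for brevity it overwrites a released row on re-reservation, whereas the real table would presumably keep released rows for audit.

```csharp
using System;
using System.Collections.Generic;

public sealed class ExternalIdReservations
{
    private sealed record Reservation(Guid EquipmentUuid, DateTime? ReleasedAt, string? ReleaseReason);

    private readonly Dictionary<(string Kind, string Value), Reservation> _rows = new();
    public List<string> AuditLog { get; } = new();

    // Succeeds unless an unreleased reservation for (Kind, Value) belongs to another equipment.
    public bool TryReserve(string kind, string value, Guid equipmentUuid)
    {
        if (_rows.TryGetValue((kind, value), out var existing) && existing.ReleasedAt is null
            && existing.EquipmentUuid != equipmentUuid)
            return false;
        _rows[(kind, value)] = new Reservation(equipmentUuid, null, null);   // overwrites a released row
        return true;
    }

    public void Release(string kind, string value, string reason, bool callerIsFleetAdmin)
    {
        if (!callerIsFleetAdmin) throw new UnauthorizedAccessException("Releasing a reservation is FleetAdmin only");
        if (string.IsNullOrWhiteSpace(reason)) throw new ArgumentException("A reason is required", nameof(reason));
        var current = _rows[(kind, value)];
        _rows[(kind, value)] = current with { ReleasedAt = DateTime.UtcNow, ReleaseReason = reason };
        AuditLog.Add($"Released {kind}={value} from {current.EquipmentUuid}: {reason}");
    }
}
```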
## Compliance Checks (run at exit gate)
A `phase-1-compliance.ps1` script that exits non-zero on any failure:
### Schema compliance
```powershell
# Run all migrations against a clean SQL Server instance
dotnet ef database update --project src/ZB.MOM.WW.OtOpcUa.Configuration --connection "Server=...;Database=OtOpcUaConfig_Test_$([DateTimeOffset]::UtcNow.ToUnixTimeSeconds());..."
# Run schema-introspection tests
dotnet test tests/ZB.MOM.WW.OtOpcUa.Configuration.Tests --filter "Category=SchemaCompliance"
```
Expected: every table, column, index, FK, CHECK, and stored procedure in `config-db-schema.md` is present and matches.
### Decision compliance
```powershell
# For each decision number Phase 1 implements (#9, #14-22, #25, #28, #30, #32-33, #46-51, #79-125),
# verify at least one citation exists in source, tests, or migrations.
# Ranges are concatenated so $decisions is flat; 79..125 inside @(...) would nest as a single element.
$decisions = @(9) + (14..22) + @(25, 28, 30) + (32..33) + (46..51) + (79..125)
$missing = @()
foreach ($d in $decisions) {
    # -w keeps "decision #9" from matching "decision #91"; -i also accepts "Decision #9"
    $hits = git grep -i -w "decision #$d" -- 'src/' 'tests/' 'docs/v2/implementation/'
    if (-not $hits) { $missing += $d }
}
if ($missing) { Write-Error "Decisions with no citation in code or tests: $($missing -join ', ')"; exit 1 }
```
### Visual compliance (Admin UI)
Manual screenshot review:
1. Login page side-by-side with ScadaLink's `Login.razor` rendered
2. Sidebar + main layout side-by-side with ScadaLink's `MainLayout.razor` + `NavMenu.razor`
3. Dashboard side-by-side with ScadaLink's `Dashboard.razor`
4. Reconnect overlay triggered (kill the SignalR connection) — same modal as ScadaLink
Reviewer answers: "could the same operator move between apps without noticing?" Y/N. N = blocking.
### Behavioral compliance (end-to-end smoke test)
```bash
dotnet test tests/ZB.MOM.WW.OtOpcUa.IntegrationTests --filter "Category=Phase1Smoke"
```
The smoke test:
1. Spins up SQL Server in a container
2. Runs all migrations
3. Creates an `OtOpcUaAdmin` SQL principal and an `OtOpcUaNode` principal bound to a test NodeId
4. Starts the Admin app
5. Creates a cluster + 1 node + Equipment-kind namespace via Admin API
6. Opens a draft, adds 1 UnsArea + 1 UnsLine + 1 Equipment + 0 tags (empty)
7. Publishes the draft
8. Boots a Server instance configured with the test NodeId
9. Asserts the Server fetched the published generation via `sp_GetCurrentGenerationForCluster`
10. Asserts the Server's `ClusterNodeGenerationState` row reports `Applied`
11. Adds a tag in a new draft, publishes
12. Asserts the Server picks up the new generation within 30s (next poll)
13. Rolls back to generation 1
14. Asserts the Server picks up the rollback within 30s
Expected: all 14 steps pass. Smoke test runs in CI on every PR to `v2/phase-1-*` branches.
### Stability compliance
For Phase 1 the only stability concern is the in-process driver isolation primitives (used later by Phase 3+ drivers, but built in Phase 1):
- `IDriver.Reinitialize()` semantics tested
- Driver-instance allocation tracking + cache flush tested with a mock driver
- Crash-loop circuit breaker tested with a mock driver that throws on every Reinitialize
Galaxy is still legacy in-process in Phase 1 — Tier C protections for Galaxy land in Phase 2.
### Documentation compliance
```bash
# Every Phase 1 task in this doc must either be Done or have a deferral note in exit-gate-phase-1.md
# Every decision the phase implements must be reflected in plan.md (no silent decisions)
# Schema doc + admin-ui doc must be updated if implementation deviated
```
## Completion Checklist
The exit gate signs off only when **every** item below is checked. Each item links to the verifying artifact (test name, screenshot, log line, etc.).
### Stream A — Core.Abstractions
- [ ] All 11 capability interfaces defined and compiling
- [ ] `DriverAttributeInfo` + supporting enums defined
- [ ] `DriverTypeRegistry` implemented with Galaxy registration
- [ ] Interface-independence reflection test passes
### Stream B — Configuration
- [ ] EF Core migration `InitialSchema` applies cleanly to a clean SQL Server
- [ ] Schema introspection test asserts the live schema matches `config-db-schema.md`
- [ ] All stored procedures present and tested (happy path + error paths)
- [ ] `sp_PublishGeneration` concurrency test passes (one wins, one fails)
- [ ] Authorization tests pass (Node principal limited to its cluster, Admin can read/write fleet-wide)
- [ ] Every validation rule listed in Task B.4 has unit tests in `Configuration.Validation`
- [ ] LiteDB cache round-trip + pruning + corruption tests pass
- [ ] Generation-diff applier handles add/remove/modify across all entity types
### Stream C — Core
- [ ] `LmxNodeManager` renamed to `GenericDriverNodeManager`; v1 IntegrationTests still pass
- [ ] `GalaxyNodeManager : GenericDriverNodeManager` exists in legacy Host
- [ ] `IAddressSpaceBuilder` API implemented; byte-equivalent OPC UA browse output to v1
- [ ] Driver hosting + isolation tested with mock drivers (one fails, others continue)
- [ ] Memory-budget cache-flush tested with mock driver
### Stream D — Server
- [ ] `Microsoft.Extensions.Hosting` host runs in console mode and as Windows Service
- [ ] TopShelf removed from the codebase
- [ ] Credential-bound bootstrap tested (correct principal succeeds; wrong principal fails)
- [ ] LiteDB fallback on DB outage tested
### Stream E — Admin
- [ ] Admin app boots, login screen renders with ScadaLink-equivalent visual
- [ ] LDAP cookie auth works against dev GLAuth
- [ ] Admin roles mapped (FleetAdmin / ConfigEditor / ReadOnly)
- [ ] Cluster-scoped grants work (decision #105)
- [ ] Cluster CRUD works end-to-end
- [ ] Draft → diff → publish workflow works end-to-end
- [ ] Rollback works end-to-end
- [ ] UNS Structure tab supports add / rename / drag-move with impact preview
- [ ] Equipment tab supports CSV import + search across 5 identifiers
- [ ] Generic JSON config editor renders + validates DriverConfig per registered schema
- [ ] SignalR real-time updates work (multi-tab test)
- [ ] Release reservation flow works + audit-logged
- [ ] Merge equipment flow works + audit-logged
### Cross-cutting
- [ ] `phase-1-compliance.ps1` runs and exits 0
- [ ] Smoke test (14 steps) passes in CI
- [ ] Visual compliance review signed off (operator-equivalence test)
- [ ] All decisions cited in code/tests (`git grep "decision #N"` returns hits for each)
- [ ] Adversarial review of the phase diff (`/codex:adversarial-review --base v2`) — findings closed or deferred with rationale
- [ ] PR opened against `v2`, includes: link to this doc, link to exit-gate record, compliance script output, smoke test logs, adversarial review output, screenshots
- [ ] Reviewer signoff (one reviewer beyond the implementation lead)
- [ ] `exit-gate-phase-1.md` recorded
## Risks and Mitigations
| Risk | Likelihood | Impact | Mitigation |
|------|:----------:|:------:|------------|
| EF Core 10 idiosyncrasies vs the documented schema | Medium | Medium | Schema-introspection test catches drift; validate early in Stream B |
| `sp_ValidateDraft` cross-table checks complex enough to be slow | Medium | Medium | Per-decision-cited test exists; benchmark with a large draft (1000+ tags) before exit |
| Visual parity with ScadaLink slips because two component libraries diverge over time | Low | Medium | Copy ScadaLink's CSS verbatim where possible; shared component set is structurally identical |
| LDAP integration breaks against production GLAuth (different schema than dev) | Medium | High | Use the v1 LDAP layer as the integration reference; mirror its config exactly |
| Generation-diff applier has subtle bugs on edge cases (renamed entity with same logical ID) | High | High | Property-based test that generates random diffs and asserts apply-then-rebuild produces the same end state |
| ScadaLink.Security pattern works well for site-scoped roles but our cluster-scoped grants are subtly different | Medium | Medium | Side-by-side review of `RoleMapper` after Stream E starts; refactor if claim shape diverges |
| Phase 1 takes longer than 6 weeks | High | Medium | Mid-gate review at 3 weeks: if Stream B isn't done, defer Tasks E.5-E.8 to a Phase 1.5 follow-up |
| `MERGE` against `ExternalIdReservation` has a deadlock pathology under concurrent publishes | Medium | High | Concurrency test in Task B.2 specifically targets this; if it deadlocks, switch to `INSERT ... WHERE NOT EXISTS` with explicit row locks |
## Out of Scope (do not do in Phase 1)
- Galaxy out-of-process split (Phase 2)
- Any Modbus / AB / S7 / TwinCAT / FOCAS driver code (Phases 3-5)
- Per-driver custom config editors in Admin (each driver's phase)
- Equipment-class template integration with the schemas repo
- Consumer cutover (Phases 6-8, separate planning track)
- ACL / namespace-level authorization for OPC UA clients (corrections doc B1 — needs scoping before Phase 6, parallel work track)
- Push-from-DB notification (decision #96 — v2.1)
- Generation pruning operator UI (decision #93 — v2.1)
- Cluster-scoped admin grant editor in UI (admin-ui.md "Deferred / Out of Scope" — v2.1)
- Mobile / tablet layout