Add Phase 0 + Phase 1 detailed implementation plans under docs/v2/implementation/ with a phase-gate model so the work can be verified for compliance to the v2 design as it lands. Three-gate structure per phase (entry / mid / exit) with explicit compliance-check categories: schema compliance (live DB introspected against config-db-schema.md DDL via xUnit), decision compliance (every decision number cited in the phase doc must have at least one code/test citation in the codebase, verified via git grep), visual compliance (Admin UI screenshots reviewed side-by-side against ScadaLink CentralUI's equivalent screens), behavioral compliance (per-phase end-to-end smoke test that always passes at exit, never "known broken fix later"), stability compliance (cross-cutting protections from driver-stability.md wired up and regression-tested for Tier C drivers), and documentation compliance (any deviation from v2 design docs reflected back as decision-log updates with explicit "supersedes" notes). Exit gate requires two-reviewer signoff and an exit-gate-{phase}.md record; silent deviation is the failure mode the gates exist to make impossible to ship. Phase 0 doc covers the mechanical LmxOpcUa → OtOpcUa rename with 9 tasks, 7 compliance checks, and a completion checklist that gates on baseline test count parity. 
Phase 1 doc covers the largest greenfield phase — 5 work streams (Core.Abstractions, Configuration project with EF Core schema + stored procs + LiteDB cache + generation-diff applier, Core with GenericDriverNodeManager rename + IAddressSpaceBuilder + driver isolation, Server with Microsoft.Extensions.Hosting replacing TopShelf + credential-bound bootstrap, Admin Blazor Server app mirroring ScadaLink CentralUI verbatim with LDAP cookie auth + draft/diff/publish workflow + UNS structure management + equipment CRUD + release-reservation and merge-equipment operator flows) — with task-level acceptance criteria, a 14-step end-to-end smoke test, and decision citation requirements for #1-125. New decisions #126-127 capture the gate model and per-phase doc structure. Cross-references added to plan.md Reference Documents section.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
180
docs/v2/implementation/overview.md
Normal file
@@ -0,0 +1,180 @@
# Implementation Plan Overview — OtOpcUa v2

> **Status**: DRAFT — defines the gate structure, compliance check approach, and deliverable conventions used across all phase implementation plans (`phase-0-*.md`, `phase-1-*.md`, etc.).
>
> **Branch**: `v2`
> **Created**: 2026-04-17

## Purpose

Each phase of the v2 build (`plan.md` §6 Migration Strategy) gets a dedicated detailed implementation doc in this folder. This overview defines the structure those docs follow so reviewers can verify compliance with the v2 design without re-reading every artifact.
## Phase Gate Structure

Every phase has **three gates** the work must pass through:

```
        ┌──────────┐        ┌──────────┐            ┌──────────┐
START ──┤  ENTRY   │── do ──┤   MID    │── verify ──┤   EXIT   │── PHASE COMPLETE
        │   GATE   │  work  │   GATE   │ artifacts  │   GATE   │
        └──────────┘        └──────────┘            └──────────┘
```
### Entry gate

**Purpose**: ensures the phase starts with a known-good state and all prerequisites met. Prevents starting work on top of broken foundations.

**Checked before any phase work begins**:
- Prior phase has cleared its **exit gate** (or this is Phase 0)
- Working tree is clean on the appropriate branch
- All baseline tests for the prior phase still pass
- Any external dependencies the phase needs are confirmed available
- Implementation lead has read the phase doc and the relevant sections of `plan.md`, `config-db-schema.md`, `driver-specs.md`, `driver-stability.md`, `admin-ui.md`

**Evidence captured**: a short markdown file `entry-gate-{phase}.md` recording the date, signoff, baseline test pass, and any deviations noted.
### Mid gate

**Purpose**: course-correct partway through the phase. Catches drift before it compounds. Optional for phases ≤ 2 weeks; required for longer phases.

**Checked at the midpoint**:
- Are the highest-risk deliverables landing on schedule?
- Have any new design questions surfaced that the v2 docs don't answer? If so, escalate to plan revision before continuing.
- Are tests being written alongside code, or accumulating as a backlog?
- Has any decision (`plan.md` decision log) been silently violated by the implementation? If so, either revise the implementation or revise the decision (with explicit "supersedes" entry).

**Evidence captured**: short status update appended to the phase doc.
### Exit gate

**Purpose**: ensures the phase actually achieved what the v2 design specified, not just "the code compiles". This is where compliance verification happens.

**Checked before the phase is declared complete**:
- All **acceptance criteria** for every task in the phase doc are met (each criterion has explicit evidence)
- All **compliance checks** (see below) pass
- All **completion checklist** items are ticked, with links to the verifying artifact (test, screenshot, log line, etc.)
- Phase commit history is clean (no half-merged WIP, no skipped hooks)
- Documentation updates merged: any change in approach during the phase is reflected back in the v2 design docs (`plan.md` decision log gets new entries; `config-db-schema.md` updated if schema differed from spec; etc.)
- Adversarial review run on the phase output (`/codex:adversarial-review` or equivalent) — findings closed or explicitly deferred with rationale
- Implementation lead **and** one other reviewer sign off

**Evidence captured**: `exit-gate-{phase}.md` recording all of the above with links and signatures.
## Compliance Check Categories

Phase exit gates run compliance checks across these axes. Each phase doc enumerates the specific checks for that phase under "Compliance Checks".

### 1. Schema compliance (Phase 1+)

For phases that touch the central config DB:
- Run EF Core migrations against a clean SQL Server instance
- Diff the resulting schema against the DDL in `config-db-schema.md`:
  - Table list matches
  - Column types and nullability match
  - Indexes (regular + unique + filtered) match
  - CHECK constraints match
  - Foreign keys match
  - Stored procedures present and signatures match
- Any drift = blocking. Either fix the migration or update the schema doc with explicit reasoning, then re-run.
### 2. Decision compliance

For each decision number cited in the phase doc (`#XX` references to `plan.md` decision log):
- Locate the artifact (code module, test, configuration file) that demonstrates the decision is honored
- Add a code comment or test name that cites the decision number
- Phase exit gate uses a script (or grep) to verify every cited decision has at least one citation in the codebase

This makes the decision log a **load-bearing reference**, not a historical record.
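
The grep-based verification described above could look roughly like the sketch below. The function name `check_decision_citations`, the `decision #N` citation wording, and the choice of search roots are assumptions made for illustration; the plan does not fix any of them. Ranges such as `#1–125` in a phase doc are not expanded by this sketch, only their first number is picked up.

```bash
#!/usr/bin/env bash
# Sketch only: verify that each decision number cited in a phase doc
# appears at least once under the given search roots.
# Assumption: citations in code/tests read "decision #42" (any letter case).

check_decision_citations() {
  local phase_doc="$1"; shift          # remaining args: paths to search
  local missing=0 n

  # Collect the unique decision numbers (#9, #30, ...) cited in the phase doc.
  for n in $(grep -oE '#[0-9]+' "$phase_doc" | tr -d '#' | sort -un); do
    # Require a non-digit (or end of line) after the number so #9 does not match #90.
    if ! grep -rqiE "decision #${n}([^0-9]|\$)" "$@"; then
      echo "MISSING: decision #${n} has no citation"
      missing=1
    fi
  done
  return "$missing"
}

# Hypothetical usage from the repo root:
#   check_decision_citations docs/v2/implementation/phase-0-rename-and-net10.md src tests
```

Returning non-zero on any missing citation lets a per-phase compliance script treat this as one blocking check among others.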
### 3. Visual compliance (Admin UI phases)

For phases that touch the Admin UI:
- Side-by-side screenshots of equivalent ScadaLink CentralUI screens vs the new OtOpcUa Admin screens
- Login page, sidebar, dashboard, generic forms — must visually match per `admin-ui.md` §"Visual Design — Direct Parity with ScadaLink"
- Reviewer signoff: "could the same operator move between apps without noticing?"

### 4. Behavioral compliance (end-to-end smoke tests)

For each phase, an integration test exercises the new capability end-to-end:
- Phase 0: existing v1 IntegrationTests pass under the renamed projects
- Phase 1: create a cluster → publish a generation → node fetches the generation → roll back → fetch again
- Phase 2: v1 IntegrationTests parity suite passes against the v2 Galaxy.Host (per decision #56)
- Phase 3+: per-driver smoke test against the simulator

Smoke tests are **always green at exit**, never "known broken, fix later".
### 5. Stability compliance (Phase 2+ for Tier C drivers)

For phases that introduce Tier C drivers (Galaxy in Phase 2, FOCAS in Phase 5):
- All `Driver Stability & Isolation` cross-cutting protections from `driver-stability.md` §"Cross-Cutting Protections" are wired up:
  - SafeHandle wrappers exist for every native handle
  - Memory watchdog runs and triggers recycle on threshold breach (testable via FaultShim)
  - Crash-loop circuit breaker fires after 3 crashes / 5 min (testable via stub-injected crash)
  - Heartbeat between proxy and host functions; missed heartbeats trigger respawn
  - Post-mortem MMF survives a hard process kill and the supervisor reads it on respawn
- Each protection has a regression test in the driver's test suite

### 6. Documentation compliance

For every phase:
- Any deviation from the v2 design docs (`plan.md`, `config-db-schema.md`, `admin-ui.md`, `driver-specs.md`, `driver-stability.md`, `test-data-sources.md`) is reflected back in the docs
- New decisions added to the decision log with rationale
- Old decisions superseded explicitly (not silently)
- Cross-references between docs stay current
## Deliverable Types

Each phase produces a defined set of deliverables. The phase doc enumerates which deliverables apply.

| Type | Format | Purpose |
|------|--------|---------|
| **Code** | Source files committed to a feature branch, merged to `v2` after exit gate | The implementation itself |
| **Tests** | xUnit unit + integration tests; per-phase smoke tests | Behavioral evidence |
| **Migrations** | EF Core migrations under `Configuration/Migrations/` | Schema delta |
| **Decision-log entries** | New rows appended to `plan.md` decision table | Architectural choices made during the phase |
| **Doc updates** | Edits to existing v2 docs | Keep design and implementation aligned |
| **Gate records** | `entry-gate-{phase}.md`, `exit-gate-{phase}.md` in this folder | Audit trail of gate clearance |
| **Compliance script** | Per-phase shell or PowerShell script that runs the compliance checks | Repeatable verification |
| **Adversarial review** | `/codex:adversarial-review` output on the phase diff | Independent challenge |
## Branch and PR Conventions

| Branch | Purpose |
|--------|---------|
| `v2` | Long-running design + implementation branch. All phase work merges here. |
| `v2/phase-{N}-{slug}` | Per-phase feature branch (e.g. `v2/phase-0-rename`) |
| `v2/phase-{N}-{slug}-{subtask}` | Per-subtask branches when the phase is large enough to warrant them |

Each phase merges to `v2` via PR after the exit gate clears. PRs include:
- Link to the phase implementation doc
- Link to the exit-gate record
- Compliance-script output
- Adversarial-review output
- Reviewer signoffs

The `master` branch stays at v1 production state until all phases are complete and a separate v2 release decision is made.
## What Counts as "Following the Plan"

The implementation **follows the plan** when, at every phase exit gate:

1. Every task listed in the phase doc has been done OR explicitly deferred with rationale
2. Every compliance check has a passing artifact OR an explicit deviation note signed off by the reviewer
3. The codebase contains traceable references to every decision number the phase implements
4. The v2 design docs are updated to reflect any approach changes
5. The smoke test for the phase passes
6. Two people have signed off — implementation lead + one other reviewer

The implementation **deviates from the plan** when any of those conditions fails. Deviations are not failures; they are signals to update the plan or revise the implementation. The unrecoverable failure mode is **silent deviation** — code that doesn't match the plan, with no decision-log update explaining why. The exit gate's compliance checks exist specifically to make silent deviation impossible to ship.
## Phase Implementation Docs

| Phase | Doc | Status |
|-------|-----|--------|
| 0 | [`phase-0-rename-and-net10.md`](phase-0-rename-and-net10.md) | DRAFT |
| 1 | [`phase-1-configuration-and-admin-scaffold.md`](phase-1-configuration-and-admin-scaffold.md) | DRAFT |
| 2 | (Phase 2: Galaxy parity refactor — TBD) | NOT STARTED |
| 3 | (Phase 3: Modbus TCP driver — TBD) | NOT STARTED |
| 4 | (Phase 4: PLC drivers AB CIP / AB Legacy / S7 / TwinCAT — TBD) | NOT STARTED |
| 5 | (Phase 5: Specialty drivers FOCAS / OPC UA Client — TBD) | NOT STARTED |
| 6+ | (Phases 6–8: tier 1/2/3 consumer cutover — separate planning track per corrections doc C5) | NOT SCOPED |
269
docs/v2/implementation/phase-0-rename-and-net10.md
Normal file
@@ -0,0 +1,269 @@
# Phase 0 — Rename to OtOpcUa + .NET 10 Cleanup

> **Status**: DRAFT — implementation plan for Phase 0 of the v2 build (`plan.md` §6).
>
> **Branch**: `v2/phase-0-rename`
> **Estimated duration**: 3–5 working days
> **Predecessor**: none (first phase)
> **Successor**: Phase 1 (`phase-1-configuration-and-admin-scaffold.md`)

## Phase Objective

Mechanically rename the existing v1 codebase from `LmxOpcUa` to `OtOpcUa` and verify all existing v1 tests still pass under the new names. **No new functionality**, **no .NET 10 retargeting of `Host` or `Historian.Aveva`** (those move in Phase 2 with the Galaxy split — they need to stay on .NET 4.8 because of MXAccess and Wonderware Historian SDK dependencies). All other projects are already on .NET 10 and stay there.

The phase exists as a clean checkpoint: future PRs reference `OtOpcUa` consistently, the rename is not entangled with semantic changes, and the diff is mechanical enough to review safely.
## Scope — What Changes

| Concern | Change |
|---------|--------|
| Project names | `ZB.MOM.WW.LmxOpcUa.*` → `ZB.MOM.WW.OtOpcUa.*` (all 11 projects) |
| Solution file | `ZB.MOM.WW.LmxOpcUa.slnx` → `ZB.MOM.WW.OtOpcUa.slnx` |
| Namespaces | `ZB.MOM.WW.LmxOpcUa` root → `ZB.MOM.WW.OtOpcUa` root (all source files) |
| Assembly names | `<AssemblyName>` and `<RootNamespace>` in every csproj |
| Folder names | `src/ZB.MOM.WW.LmxOpcUa.*` → `src/ZB.MOM.WW.OtOpcUa.*`; same in `tests/` |
| Default `appsettings.json` keys | `Lmx*` → `Ot*` only where the section name is product-bound (e.g. `LmxOpcUa.Server` → `OtOpcUa.Server`); leave `MxAccess.*` keys alone (those refer to the AVEVA product, not ours) |
| Service registration name | TopShelf service name `LmxOpcUa` → `OtOpcUa` (until Phase 1 swaps TopShelf for `Microsoft.Extensions.Hosting`) |
| Documentation | All `docs/*.md` references; `CLAUDE.md` |
| Repo name | **NOT** in scope for Phase 0 — repo rename happens in a separate ops step after exit gate clears |
## Scope — What Does NOT Change

| Item | Reason |
|------|--------|
| `.NET Framework 4.8` target on `Host` and `Historian.Aveva` | MXAccess COM is 32-bit only; Wonderware Historian SDK is .NET 4.8. Both move to `Galaxy.Host` (still .NET 4.8 x86) in Phase 2. |
| `.NET 10` target on Client.CLI / Client.Shared / Client.UI / all Tests | Already there (verified 2026-04-17 via `grep TargetFramework src/*/*.csproj`). |
| Project structure (no new projects) | New projects (Configuration, Core, Core.Abstractions, Server, Admin) are added in Phase 1, not Phase 0. |
| Galaxy / MXAccess implementation | Stays in `OtOpcUa.Host` for now; Phase 2 splits it into Proxy/Host/Shared. |
| `master` branch / production deployments | Untouched — v2 work all happens on the `v2` branch. |
| OPC UA `ApplicationUri` defaults | Currently include `LmxOpcUa` — leave as-is to avoid breaking existing client trust during v1/v2 coexistence. New `ApplicationUri` defaults land in Phase 1 alongside the cluster model. |
| MxAccess product references in docs / code | "MxAccess" is AVEVA's product name, not ours. Stays. |
## Entry Gate Checklist

Verify all before opening the `v2/phase-0-rename` branch:

- [ ] `v2` branch is at commit `a59ad2e` or later (decisions #1–125 captured)
- [ ] `git status` is clean on `v2`
- [ ] `dotnet test ZB.MOM.WW.LmxOpcUa.slnx` passes locally with **zero failing tests**, baseline test count recorded
- [ ] `dotnet build ZB.MOM.WW.LmxOpcUa.slnx` succeeds with zero errors and ≤ baseline warning count
- [ ] All design docs reviewed by the implementation lead: `docs/v2/plan.md`, `docs/v2/config-db-schema.md`, `docs/v2/admin-ui.md`, `docs/v2/driver-specs.md`, `docs/v2/driver-stability.md`, `docs/v2/implementation/overview.md`
- [ ] Decision #9 (rename to OtOpcUa as step 1) re-read and confirmed
- [ ] No other developers have open work on `v2` that would conflict with bulk renames

**Evidence file**: `docs/v2/implementation/entry-gate-phase-0.md` recording date, baseline test count, signoff name.
## Task Breakdown

### Task 0.1 — Inventory references

Generate a complete map of every place `LmxOpcUa` appears:

```bash
grep -rln "LmxOpcUa" --include="*.cs" --include="*.csproj" --include="*.slnx" --include="*.json" --include="*.md" --include="*.razor" .
```

Save the result to `docs/v2/implementation/phase-0-rename-inventory.md` (gitignored after phase completes).

**Acceptance**:
- Inventory file exists, lists every reference grouped by file type
- Reviewer agrees inventory is complete (cross-check against `git grep -i lmx` for case-sensitivity bugs)
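
One way to get the "grouped by file type" layout the acceptance criteria ask for is to post-process the grep output. The sketch below is illustrative only: the function name `inventory_by_type` and the choice to group by file extension under `## .ext` headings are assumptions, not part of the plan.

```bash
#!/usr/bin/env bash
# Sketch: list files that mention LmxOpcUa, grouped by file extension.
# Assumption: "file type" means file extension; output is markdown-style.

inventory_by_type() {
  local root="${1:-.}"
  grep -rl "LmxOpcUa" \
      --include="*.cs" --include="*.csproj" --include="*.slnx" \
      --include="*.json" --include="*.md" --include="*.razor" "$root" \
    | awk -F. '{ print $NF "\t" $0 }' \
    | sort \
    | awk -F'\t' '$1 != ext { ext = $1; print "## ." ext } { print "- " $2 }'
}

# Hypothetical usage from the repo root:
#   inventory_by_type . > docs/v2/implementation/phase-0-rename-inventory.md
```

The first `awk` prefixes each path with its extension so that `sort` clusters files of the same type; the second prints a heading whenever the extension changes.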
### Task 0.2 — Rename project folders

Per project (11 projects total — 5 src + 6 tests):

```bash
git mv src/ZB.MOM.WW.LmxOpcUa.Client.CLI src/ZB.MOM.WW.OtOpcUa.Client.CLI
git mv src/ZB.MOM.WW.OtOpcUa.Client.CLI/ZB.MOM.WW.LmxOpcUa.Client.CLI.csproj \
       src/ZB.MOM.WW.OtOpcUa.Client.CLI/ZB.MOM.WW.OtOpcUa.Client.CLI.csproj
```

Repeat for: `Client.Shared`, `Client.UI`, `Historian.Aveva`, `Host`, and all 6 test projects.

Use `git mv` (not `mv` + `git rm`/`git add`) to preserve history.

**Acceptance**:
- `ls src/` shows only `ZB.MOM.WW.OtOpcUa.*` folders
- `ls tests/` shows only `ZB.MOM.WW.OtOpcUa.*` folders
- `git log --follow` on a renamed file shows continuous history pre-rename
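
The "repeat for" step could be scripted as a loop rather than typed eleven times. The following is a sketch under the assumption that every project folder contains a csproj named after the folder, as in the `Client.CLI` example above; the function name `rename_projects` is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: apply the folder + csproj rename to every project under src/ and tests/.
# Assumption: each folder ZB.MOM.WW.LmxOpcUa.X holds ZB.MOM.WW.LmxOpcUa.X.csproj.
# Run from the repo root.

rename_projects() {
  local dir parent old new
  for dir in src/ZB.MOM.WW.LmxOpcUa.* tests/ZB.MOM.WW.LmxOpcUa.*; do
    [ -d "$dir" ] || continue                  # skip globs that matched nothing
    parent="$(dirname "$dir")"
    old="$(basename "$dir")"
    new="${old/LmxOpcUa/OtOpcUa}"              # replace the first occurrence only
    git mv "$dir" "$parent/$new"
    if [ -f "$parent/$new/$old.csproj" ]; then
      git mv "$parent/$new/$old.csproj" "$parent/$new/$new.csproj"
    fi
  done
}
```

Because it still goes through `git mv`, the history-preservation acceptance check (`git log --follow`) applies unchanged.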
### Task 0.3 — Rename solution file

```bash
git mv ZB.MOM.WW.LmxOpcUa.slnx ZB.MOM.WW.OtOpcUa.slnx
```

Edit the `.slnx` to update every project path reference inside it.

**Acceptance**:
- `ZB.MOM.WW.OtOpcUa.slnx` exists and references the renamed project paths
- `dotnet sln list` (or `dotnet build` against the slnx) succeeds

### Task 0.4 — Update csproj contents

For every csproj:
- Update `<AssemblyName>` if explicitly set
- Update `<RootNamespace>` if explicitly set
- Update `<ProjectReference Include=...>` paths for inter-project refs
- Update `<PackageId>` if any project ships as a NuGet (none currently expected, but verify)

**Acceptance**:
- `grep -rl "LmxOpcUa" src/*/*.csproj tests/*/*.csproj` returns empty
- `dotnet restore` succeeds with no missing project references
### Task 0.5 — Bulk-rename namespaces in source files

Run the rename across all `.cs` and `.razor` files:

```bash
grep -rl "ZB.MOM.WW.LmxOpcUa" --include="*.cs" --include="*.razor" . \
  | xargs sed -i 's/ZB\.MOM\.WW\.LmxOpcUa/ZB.MOM.WW.OtOpcUa/g'
```

**Acceptance**:
- `grep -rln "ZB.MOM.WW.LmxOpcUa" --include="*.cs" --include="*.razor" .` returns empty
- `dotnet build ZB.MOM.WW.OtOpcUa.slnx` succeeds
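
Before running the in-place `sed`, it may help to preview exactly which lines would change, so that string literals and other deliberate uses can be reviewed first. The sketch below is one possible dry run; the function name `preview_namespace_rename` is an assumption, and it edits nothing.

```bash
#!/usr/bin/env bash
# Sketch: print "file: rewritten line" for every line the bulk rename would touch.
# Read-only: uses sed -n ... p instead of sed -i.

preview_namespace_rename() {
  grep -rl 'ZB\.MOM\.WW\.LmxOpcUa' --include="*.cs" --include="*.razor" "${1:-.}" \
    | while IFS= read -r f; do
        sed -n 's/ZB\.MOM\.WW\.LmxOpcUa/ZB.MOM.WW.OtOpcUa/gp' "$f" \
          | awk -v f="$f" '{ print f ": " $0 }'
      done
}

# Hypothetical usage from the repo root:
#   preview_namespace_rename . | less
```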
### Task 0.6 — Update appsettings.json + service hosting

In `src/ZB.MOM.WW.OtOpcUa.Host/appsettings.json` and equivalents:
- Rename product-named sections: `LmxOpcUa.Server` → `OtOpcUa.Server` (if present)
- Leave `MxAccess`, `Galaxy`, `Historian` keys untouched (those are external product names)
- Update TopShelf `ServiceName` constant from `LmxOpcUa` → `OtOpcUa`

**Acceptance**:
- Service install (`dotnet run --project src/.../Host install`) registers as `OtOpcUa`
- Service uninstall + reinstall cycle succeeds on a Windows test box

### Task 0.7 — Update documentation references

- `CLAUDE.md`: replace `LmxOpcUa` references with `OtOpcUa` in product-naming contexts; leave `MxAccess` / `MXAccess` references alone
- `docs/*.md` (existing v1 docs): same pattern
- `docs/v2/*.md`: already uses `OtOpcUa` — verify with grep

**Acceptance**:
- `grep -rln "LmxOpcUa" docs/ CLAUDE.md` returns only references that explicitly need to retain the old name (e.g. historical sections, change log)
- Each retained reference has a comment explaining why
### Task 0.8 — Run full test suite + smoke test

```bash
dotnet build ZB.MOM.WW.OtOpcUa.slnx
dotnet test ZB.MOM.WW.OtOpcUa.slnx
```

Plus manual smoke test of Client.CLI against a running v1 OPC UA server:

```bash
dotnet run --project src/ZB.MOM.WW.OtOpcUa.Client.CLI -- connect -u opc.tcp://localhost:4840
dotnet run --project src/ZB.MOM.WW.OtOpcUa.Client.CLI -- browse -u opc.tcp://localhost:4840 -r -d 2
```

**Acceptance**:
- Test count matches the baseline recorded at entry gate; **zero failing tests**
- Smoke test produces equivalent output to baseline (capture both, diff)
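
The baseline comparison can be made mechanical once the counts are written down. The sketch below assumes a purely hypothetical record format, one line of `<total> <failed> <skipped>` per file, filled in by hand or by whatever tooling the team uses to read the test results; it does not parse `dotnet test` output itself. The skipped-count comparison mirrors the expectation stated for the exit-gate smoke test.

```bash
#!/usr/bin/env bash
# Sketch: compare the current test run against the entry-gate baseline.
# Assumption (hypothetical format): each file holds one line "<total> <failed> <skipped>".

check_test_parity() {
  local b_total b_skipped c_total c_failed c_skipped
  read -r b_total _ b_skipped < "$1"           # baseline file
  read -r c_total c_failed c_skipped < "$2"    # current-run file

  [ "$c_failed" -eq 0 ] \
    || { echo "FAIL: $c_failed failing tests"; return 1; }
  [ "$c_total" -eq "$b_total" ] \
    || { echo "FAIL: total $c_total != baseline $b_total"; return 1; }
  [ "$c_skipped" -eq "$b_skipped" ] \
    || { echo "FAIL: skipped $c_skipped != baseline $b_skipped"; return 1; }
  echo "OK: $c_total tests, 0 failures, $c_skipped skipped"
}
```

An exact-equality check, rather than "at least as many", is what catches a test project that silently dropped out of discovery after the rename.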
### Task 0.9 — Update build commands in CLAUDE.md

The Build Commands section currently references `ZB.MOM.WW.LmxOpcUa.slnx`. Update to `ZB.MOM.WW.OtOpcUa.slnx`. Also update test paths.

**Acceptance**:
- `grep -i lmxopcua CLAUDE.md` returns only retained-by-design references
- A new developer cloning the repo can follow CLAUDE.md to build + test successfully
## Compliance Checks (run at exit gate)

A `phase-0-compliance.ps1` (or `.sh`) script runs all these and exits non-zero on any failure:

1. **No stale `LmxOpcUa` references**:
   ```
   grep -rln "LmxOpcUa" --include="*.cs" --include="*.csproj" --include="*.slnx" \
        --include="*.json" --include="*.razor" . | wc -l
   ```
   Expected: 0 (or only allowlisted retained references)

2. **All projects build**:
   ```
   dotnet build ZB.MOM.WW.OtOpcUa.slnx --warnaserror
   ```
   Expected: success. Because `--warnaserror` promotes every warning to an error, this passes only when the build is warning-free; if the entry-gate baseline has warnings, run without the flag and check warning count ≤ baseline instead.

3. **All tests pass**:
   ```
   dotnet test ZB.MOM.WW.OtOpcUa.slnx
   ```
   Expected: total count = baseline, failures = 0

4. **Solution structure matches plan**:
   - `ls src/` shows exactly: `ZB.MOM.WW.OtOpcUa.{Client.CLI, Client.Shared, Client.UI, Historian.Aveva, Host}` (5 entries)
   - `ls tests/` shows the 6 test projects similarly renamed
   - No new projects yet (those land in Phase 1)

5. **.NET targets unchanged**:
   - Client projects (CLI/Shared/UI): `net10.0`
   - Host + Historian.Aveva: `net48` (split + retarget happens Phase 2)
   - All test projects: same targets as their SUT projects

6. **Decision compliance**: this phase implements decision #9 ("Rename to OtOpcUa as step 1"). Verify by:
   ```
   grep -rlE '[Dd]ecision #9([^0-9]|$)' src/ tests/ CLAUDE.md
   ```
   Expected: at least one citation in CLAUDE.md or a phase-rename README explaining the mechanical scope.

7. **Service registration works**:
   - Install service → `sc query OtOpcUa` returns the service
   - Uninstall service → `sc query OtOpcUa` returns "service does not exist"
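
For the `.sh` variant of the compliance script, a small runner that records each check and fails the whole run on any failure might look like the sketch below. Everything named here is hypothetical: `run_check`, `no_stale_refs`, and the allowlist file format (one fixed-string path fragment per line, no blank lines) are illustrative assumptions, not something the plan specifies.

```bash
#!/usr/bin/env bash
# Sketch of a check runner: run each named check, print PASS/FAIL,
# and count failures so the script can exit non-zero.
failures=0

run_check() {
  local name="$1"; shift
  if "$@" > /dev/null 2>&1; then
    echo "PASS  $name"
  else
    echo "FAIL  $name"
    failures=$((failures + 1))
  fi
}

# Check 1 helper: succeed only when no file outside the allowlist mentions LmxOpcUa.
# Assumption: the allowlist holds one fixed-string path fragment per line.
no_stale_refs() {
  local root="$1" allowlist="$2" hits
  [ -f "$allowlist" ] || return 2              # a missing allowlist must not pass silently
  hits=$(grep -rl "LmxOpcUa" --include="*.cs" --include="*.csproj" \
           --include="*.slnx" --include="*.json" --include="*.razor" "$root" \
         | grep -vFf "$allowlist")
  [ -z "$hits" ]
}

# Hypothetical wiring at the bottom of phase-0-compliance.sh:
#   run_check "no stale references" no_stale_refs . phase-0-allowlist.txt
#   run_check "all projects build"  dotnet build ZB.MOM.WW.OtOpcUa.slnx
#   run_check "all tests pass"      dotnet test ZB.MOM.WW.OtOpcUa.slnx
#   exit "$failures"
```

Running every check before exiting, instead of stopping at the first failure, gives the reviewer the full picture in a single run.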
## Behavioral Smoke Test (exit gate)

The v1 IntegrationTests suite is the authoritative behavioral spec for Phase 0. The renamed code must pass it identically.

```bash
dotnet test tests/ZB.MOM.WW.OtOpcUa.IntegrationTests --logger "console;verbosity=detailed"
```

Expected: pass count = baseline. Fail count = 0. Skipped count = baseline.
## Completion Checklist

The exit gate signs off only when **every** item below is checked:

- [ ] All 11 projects renamed (5 src + 6 tests)
- [ ] Solution file renamed
- [ ] All `<AssemblyName>` / `<RootNamespace>` / `<ProjectReference>` updated
- [ ] All namespaces in source files updated
- [ ] `appsettings.json` product-named sections updated; external product names untouched
- [ ] TopShelf service name updated; install/uninstall cycle verified on a Windows host
- [ ] `docs/*.md` and `CLAUDE.md` references updated; retained references explained
- [ ] Build succeeds with zero errors and warning count ≤ baseline
- [ ] Test suite passes with zero failures and count = baseline
- [ ] Smoke test against running OPC UA server matches baseline output
- [ ] `phase-0-compliance.ps1` script runs and exits 0
- [ ] Adversarial review of the phase diff (`/codex:adversarial-review --base v2`) — findings closed or deferred with rationale
- [ ] PR opened against `v2`, includes: link to this doc, link to exit-gate record, compliance script output, adversarial review output
- [ ] Reviewer signoff (one reviewer beyond the implementation lead)
- [ ] `exit-gate-phase-0.md` recorded with all of the above

After the PR merges, repo rename (`lmxopcua` → `otopcua` on Gitea) happens as a separate ops step — out of scope for Phase 0.
## Risks and Mitigations

| Risk | Likelihood | Impact | Mitigation |
|------|:----------:|:------:|------------|
| Bulk `sed` rename breaks string literals (e.g. `"LmxOpcUa"` used as a runtime identifier) | Medium | Medium | Inventory step (0.1) flags string literals separately; rename them deliberately, not via bulk sed |
| MxAccess / Galaxy / Wonderware references accidentally renamed | Low | High (breaks COM interop) | Inventory step (0.1) calls out external product names explicitly; bulk rename targets only `ZB.MOM.WW.LmxOpcUa` (with namespace prefix), not bare `LmxOpcUa` |
| Test count drops silently because a test project doesn't get re-discovered | Medium | High | Baseline test count captured at entry gate; exit gate compares exactly |
| `.slnx` references break and projects disappear from solution view | Low | Medium | `dotnet sln list` after Task 0.3 verifies all projects load |
| TopShelf service install fails on a hardened Windows box (UAC, signing) | Low | Low | Manual install/uninstall cycle is part of Task 0.6 acceptance |
| Long-lived branches diverge while phase 0 is in flight | Medium | Low | Phase 0 expected duration ≤ 5 days; coordinate that no other v2 work merges during the phase |
## Out of Scope (do not do in Phase 0)

- Adding any new project (Configuration, Admin, Core, Server, Driver.* — all Phase 1+)
- Splitting Host into Galaxy.Proxy/Host/Shared (Phase 2)
- Migrating Host/Historian.Aveva to .NET 10 (Phase 2 — when Galaxy is split, the .NET 4.8 x86 piece becomes Galaxy.Host and the rest can move)
- Replacing TopShelf with `Microsoft.Extensions.Hosting` (Phase 1, decision #30)
- Implementing the cluster / namespace / equipment data model (Phase 1)
- Changing any OPC UA wire behavior
- Renaming the Gitea repo
@@ -0,0 +1,608 @@
# Phase 1 — Configuration Project + Core.Abstractions + Admin UI Scaffold

> **Status**: DRAFT — implementation plan for Phase 1 of the v2 build (`plan.md` §6).
>
> **Branch**: `v2/phase-1-configuration`
> **Estimated duration**: 4–6 weeks (largest greenfield phase; most foundational)
> **Predecessor**: Phase 0 (`phase-0-rename-and-net10.md`)
> **Successor**: Phase 2 (Galaxy parity refactor)

## Phase Objective

Stand up the **central configuration substrate** for the v2 fleet:

1. **`Core.Abstractions` project** — driver capability interfaces (`IDriver`, `ITagDiscovery`, `IReadable`, `IWritable`, `ISubscribable`, `IAlarmSource`, `IHistoryProvider`, `IRediscoverable`, `IHostConnectivityProbe`, `IDriverConfigEditor`, `DriverAttributeInfo`)
2. **`Configuration` project** — central MSSQL schema + EF Core migrations + stored procedures + LiteDB local cache + generation-diff application logic
3. **`Core` project** — `GenericDriverNodeManager` (renamed from `LmxNodeManager`), driver-hosting infrastructure, OPC UA server lifecycle, address-space registration via `IAddressSpaceBuilder`
4. **`Server` project** — `Microsoft.Extensions.Hosting`-based Windows Service host (replacing TopShelf), bootstrap from Configuration using node-bound credential, register drivers, start Core
5. **`Admin` project** — Blazor Server admin app scaffolded with ScadaLink CentralUI parity (Bootstrap 5, dark sidebar, LDAP cookie auth, three admin roles, draft → publish → rollback workflow, cluster/node/namespace/equipment/tag CRUD)

**No driver instances yet** (Galaxy stays in legacy in-process Host until Phase 2). The phase exit requires that an empty cluster can be created in Admin, an empty generation can be published, and a node can fetch the published generation — proving the configuration substrate works end-to-end.
|
||||
|
||||
## Scope — What Changes

| Concern | Change |
|---------|--------|
| New projects | 5 new src projects + 5 matching test projects |
| Existing v1 Host project | Refactored to consume `Core.Abstractions` interfaces against its existing Galaxy implementation — **but not split into Proxy/Host/Shared yet** (Phase 2) |
| `LmxNodeManager` | **Renamed to `GenericDriverNodeManager`** in Core, with `IDriver` swapped in for `IMxAccessClient`. The existing v1 Host instantiates `GalaxyNodeManager : GenericDriverNodeManager` (legacy in-process) — see `plan.md` §5a |
| Service hosting | TopShelf removed; `Microsoft.Extensions.Hosting` BackgroundService used (decision #30) |
| Central config DB | New SQL Server database `OtOpcUaConfig` provisioned from EF Core migrations |
| LDAP authentication for Admin | `Admin.Security` project mirrors `ScadaLink.Security`; cookie auth + JWT API endpoint |
| Local LiteDB cache on each node | New `config_cache.db` per node; bootstraps from central DB or cache |

## Scope — What Does NOT Change

| Item | Reason |
|------|--------|
| Galaxy out-of-process split | Phase 2 |
| Any new driver (Modbus, AB, S7, etc.) | Phase 3+ |
| OPC UA wire behavior | Galaxy address space still served exactly as v1; the Configuration substrate is read but not yet driving everything |
| Equipment-class template integration with future schemas repo | `EquipmentClassRef` is a nullable hook column; no validation yet (decisions #112, #115) |
| Per-driver custom config editors in Admin | Generic JSON editor only in v2.0 (decision #27); driver-specific editors land in their respective phases |
| Consumer cutover (ScadaBridge / Ignition / SystemPlatform IO) | Phases 6–8 |
| Equipment Protocol Survey | External prerequisite — ideally runs in parallel with Phase 1 (handoff §"Equipment Protocol Survey") |

## Entry Gate Checklist

- [ ] Phase 0 exit gate cleared (rename complete, all v1 tests pass under OtOpcUa names)
- [ ] `v2` branch is clean
- [ ] Phase 0 PR merged
- [ ] SQL Server 2019+ instance available for development (local dev box minimum; shared dev instance for integration tests)
- [ ] LDAP / GLAuth dev instance available for Admin auth integration testing
- [ ] ScadaLink CentralUI source accessible at `C:\Users\dohertj2\Desktop\scadalink-design\` for parity reference
- [ ] All Phase 1-relevant design docs reviewed: `plan.md` §4–5, `config-db-schema.md` (entire), `admin-ui.md` (entire), `driver-stability.md` §"Cross-Cutting Protections" (sets context for `Core.Abstractions` scope)
- [ ] Decisions #1–125 read at least skim-level; key ones for Phase 1: #14–22, #25, #28, #30, #32–33, #46–51, #79–125

**Evidence file**: `docs/v2/implementation/entry-gate-phase-1.md` recording date, signoff, environment availability.

## Task Breakdown

Phase 1 is large, so it is broken into 5 work streams (A–E) that can partly overlap. A typical sequencing: A → B → (C and D in parallel) → E.

### Stream A — Core.Abstractions (1 week)

#### Task A.1 — Define driver capability interfaces

Create `src/ZB.MOM.WW.OtOpcUa.Core.Abstractions/` (.NET 10, no dependencies). Define:

```csharp
public interface IDriver { /* lifecycle, metadata, health */ }
public interface ITagDiscovery { /* discover tags/hierarchy from backend */ }
public interface IReadable { /* on-demand read */ }
public interface IWritable { /* on-demand write */ }
public interface ISubscribable { /* data change subscriptions */ }
public interface IAlarmSource { /* alarm events + acknowledgment */ }
public interface IHistoryProvider { /* historical reads */ }
public interface IRediscoverable { /* opt-in change-detection signal */ }
public interface IHostConnectivityProbe { /* per-host runtime status */ }
public interface IDriverConfigEditor { /* Admin UI plug point per driver */ }
public interface IAddressSpaceBuilder { /* core-owned tree builder */ }
```

Plus the data models referenced from the interfaces:

```csharp
public sealed record DriverAttributeInfo(
    string FullName,
    DriverDataType DriverDataType,
    bool IsArray,
    uint? ArrayDim,
    SecurityClassification SecurityClass,
    bool IsHistorized);

public enum DriverDataType { Boolean, Int16, Int32, Int64, UInt16, UInt32, UInt64, Float32, Float64, String, DateTime, Reference, Custom }

public enum SecurityClassification { FreeAccess, Operate, SecuredWrite, VerifiedWrite, Tune, Configure, ViewOnly }
```

**Acceptance**:
- All interfaces compile in a project with **zero dependencies** beyond BCL
- xUnit test project asserts (via reflection) that no interface returns or accepts a type from `Core` or `Configuration` (interface independence per decision #59)
- Each interface XML doc cites the design decision(s) it implements (e.g. `IRediscoverable` cites #54)

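The interface-independence check in the acceptance list could be approached roughly as follows. This is a minimal sketch, not the project's test: `InterfaceIndependence` and `FindViolations` are hypothetical names, and the forbidden namespace prefixes would be whatever `Core` and `Configuration` end up using. It walks each interface's methods (property accessors included) and reports any parameter or return type that lives in a forbidden namespace, looking through arrays and generic arguments.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;

// Hypothetical helper; an xUnit test would assert that the result is empty.
public static class InterfaceIndependence
{
    public static IReadOnlyList<string> FindViolations(
        IEnumerable<Type> interfaces, params string[] forbiddenNamespacePrefixes)
    {
        var violations = new List<string>();
        foreach (var iface in interfaces.Where(t => t.IsInterface))
        {
            foreach (MethodInfo method in iface.GetMethods())
            {
                // Every parameter type plus the return type, unwrapped.
                var exposed = method.GetParameters()
                    .Select(p => p.ParameterType)
                    .Append(method.ReturnType)
                    .SelectMany(Flatten);

                foreach (var type in exposed)
                {
                    var ns = type.Namespace ?? string.Empty;
                    bool forbidden = forbiddenNamespacePrefixes.Any(prefix =>
                        ns == prefix || ns.StartsWith(prefix + ".", StringComparison.Ordinal));
                    if (forbidden)
                        violations.Add($"{iface.Name}.{method.Name} exposes {type.FullName}");
                }
            }
        }
        return violations;
    }

    // Looks through arrays/by-ref types and generic arguments,
    // so Task<Foo> yields both Foo and Task<Foo>.
    private static IEnumerable<Type> Flatten(Type type)
    {
        if (type.HasElementType) return Flatten(type.GetElementType()!);
        if (!type.IsGenericType) return new[] { type };
        return type.GetGenericArguments().SelectMany(Flatten).Append(type);
    }
}
```

In a real test the `interfaces` argument would presumably come from the `Core.Abstractions` assembly's exported types; that wiring is left out here because the assembly does not exist yet.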
#### Task A.2 — Define DriverTypeRegistry

```csharp
public sealed class DriverTypeRegistry
{
    public DriverTypeMetadata Get(string driverType);
    public IEnumerable<DriverTypeMetadata> All();
}

public sealed record DriverTypeMetadata(
    string TypeName,                                    // "Galaxy" | "ModbusTcp" | ...
    NamespaceKindCompatibility AllowedNamespaceKinds,   // per decision #111
    string DriverConfigJsonSchema,                      // per decision #91
    string DeviceConfigJsonSchema,                      // optional
    string TagConfigJsonSchema);

[Flags]
public enum NamespaceKindCompatibility
{
    Equipment = 1, SystemPlatform = 2, Simulated = 4
}
```

In v2.0, Phase 1 registers only the `Galaxy` type (`AllowedNamespaceKinds = SystemPlatform`). Phase 3+ extends the registry with further driver types.

**Acceptance**:
- Registry compiles, has unit tests for: register a type, look it up, reject duplicate registration, enumerate all
- Galaxy registration entry exists with `AllowedNamespaceKinds = SystemPlatform` per decision #111

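The outline above shows `Get` and `All` but not how a type is registered, although the acceptance list requires duplicate rejection. One possible shape is sketched below. The `Register` method, the case-insensitive key comparison, the exception types, and the nullable `DeviceConfigJsonSchema` are all assumptions made for illustration, not part of the design docs.

```csharp
using System;
using System.Collections.Generic;

[Flags]
public enum NamespaceKindCompatibility { Equipment = 1, SystemPlatform = 2, Simulated = 4 }

public sealed record DriverTypeMetadata(
    string TypeName,
    NamespaceKindCompatibility AllowedNamespaceKinds,
    string DriverConfigJsonSchema,
    string? DeviceConfigJsonSchema,   // assumption: "optional" modelled as nullable
    string TagConfigJsonSchema);

public sealed class DriverTypeRegistry
{
    // Assumption: driver type names are compared case-insensitively.
    private readonly Dictionary<string, DriverTypeMetadata> _types =
        new(StringComparer.OrdinalIgnoreCase);

    // Hypothetical method: rejects a second registration of the same type name.
    public void Register(DriverTypeMetadata metadata)
    {
        if (!_types.TryAdd(metadata.TypeName, metadata))
            throw new InvalidOperationException(
                $"Driver type '{metadata.TypeName}' is already registered.");
    }

    public DriverTypeMetadata Get(string driverType) =>
        _types.TryGetValue(driverType, out var metadata)
            ? metadata
            : throw new KeyNotFoundException($"Unknown driver type '{driverType}'.");

    public IEnumerable<DriverTypeMetadata> All() => _types.Values;
}
```

With this shape the Galaxy entry would be a single `Register(new DriverTypeMetadata("Galaxy", NamespaceKindCompatibility.SystemPlatform, ...))` call at startup, and the four unit tests in the acceptance list map directly onto `Register`, `Get`, the duplicate `Register`, and `All`.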
### Stream B — Configuration project (1.5 weeks)

#### Task B.1 — EF Core schema + initial migration

Create `src/ZB.MOM.WW.OtOpcUa.Configuration/` (.NET 10, EF Core 10).

Implement DbContext with entities matching `config-db-schema.md` exactly:
- `ServerCluster`, `ClusterNode`, `ClusterNodeCredential`
- `Namespace` (generation-versioned per decision #123)
- `UnsArea`, `UnsLine`
- `ConfigGeneration`
- `DriverInstance`, `Device`, `Equipment`, `Tag`, `PollGroup`
- `ClusterNodeGenerationState`, `ConfigAuditLog`
- `ExternalIdReservation` (NOT generation-versioned per decision #124)

Generate the initial migration:

```bash
dotnet ef migrations add InitialSchema --project src/ZB.MOM.WW.OtOpcUa.Configuration
```

**Acceptance**:
- Applying the migration to a clean SQL Server instance produces the schema in `config-db-schema.md`
- Schema-validation test (`SchemaComplianceTests`) introspects the live DB and asserts every table/column/index/constraint matches the doc
- Test runs in CI against a SQL Server container

#### Task B.2 — Stored procedures via `MigrationBuilder.Sql`

Add stored procedures from `config-db-schema.md` §"Stored Procedures":
- `sp_GetCurrentGenerationForCluster`
- `sp_GetGenerationContent`
- `sp_RegisterNodeGenerationApplied`
- `sp_PublishGeneration` (with the `MERGE` against `ExternalIdReservation` per decision #124)
- `sp_RollbackToGeneration`
- `sp_ValidateDraft` (structural checks only; content schema validation runs in managed validator code in the Admin app, per decision #91)
- `sp_ComputeGenerationDiff`
- `sp_ReleaseExternalIdReservation` (FleetAdmin only)

Use `CREATE OR ALTER` style in `MigrationBuilder.Sql()` blocks so procs version with the schema.

**Acceptance**:
- Each proc has at least one xUnit test exercising the happy path + at least one error path
- `sp_PublishGeneration` has a concurrency test: two simultaneous publishes for the same cluster → one wins, one fails with a recognizable error
- `sp_GetCurrentGenerationForCluster` has an authorization test: caller bound to NodeId X cannot read cluster Y's generation

#### Task B.3 — Authorization model (SQL principals + GRANT)

Add a separate migration `AuthorizationGrants` that:
- Creates two SQL roles: `OtOpcUaNode`, `OtOpcUaAdmin`
- Grants EXECUTE on the appropriate procs per `config-db-schema.md` §"Authorization Model"
- Grants no direct table access to either role

**Acceptance**:
- Test that runs as an `OtOpcUaNode`-roled principal can only call the node procs, not admin procs
- Test that runs as an `OtOpcUaAdmin`-roled principal can call publish/rollback procs
- Test that direct `SELECT * FROM dbo.ConfigGeneration` from an `OtOpcUaNode` principal is denied

#### Task B.4 — JSON-schema validators (managed code)

In `Configuration.Validation/`, implement validators consumed by `sp_ValidateDraft` (called from the Admin app pre-publish per decision #91):
- UNS segment regex (`^[a-z0-9-]{1,32}$` or `_default`)
- Path length (≤200 chars)
- UUID immutability across generations
- Same-cluster namespace binding (decision #122)
- ZTag/SAPID reservation pre-flight (decision #124)
- EquipmentId derivation rule (decision #125)
- Driver type ↔ namespace kind allowed (decision #111)
- JSON-schema validation per `DriverType` from `DriverTypeRegistry`

**Acceptance**:
- One unit test per rule, both passing and failing cases
- Cross-rule integration test: a draft that violates 3 rules surfaces all 3 (not just the first)

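The "surface all violations, not just the first" requirement suggests validators that accumulate errors instead of throwing. A small sketch covering the first two rules is shown below. `UnsPathRules`, the error message wording, and the `/` path separator are assumptions for illustration; only the regex, the `_default` exception, and the 200-character limit come from the list above.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical validator for the UNS segment and path-length rules.
public static class UnsPathRules
{
    private static readonly Regex SegmentPattern =
        new Regex(@"^[a-z0-9-]{1,32}$", RegexOptions.CultureInvariant);

    public const int MaxPathLength = 200;

    // Returns every violation found; an empty list means the path is valid.
    public static IReadOnlyList<string> Validate(IReadOnlyList<string> segments)
    {
        var errors = new List<string>();

        foreach (var segment in segments)
        {
            if (segment != "_default" && !SegmentPattern.IsMatch(segment))
                errors.Add($"Segment '{segment}' must match ^[a-z0-9-]{{1,32}}$ or be _default.");
        }

        // Assumption: segments are joined with '/' when measuring path length.
        var path = string.Join("/", segments);
        if (path.Length > MaxPathLength)
            errors.Add($"Path is {path.Length} chars; the limit is {MaxPathLength}.");

        return errors;
    }
}
```

Returning a list keeps each rule independently testable and lets the cross-rule integration test simply concatenate the results of all validators.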
#### Task B.5 — LiteDB local cache

In `Configuration.LocalCache/`, implement the LiteDB schema from `config-db-schema.md` §"Local LiteDB Cache":

```csharp
public interface ILocalConfigCache
{
    Task<GenerationCacheEntry?> GetMostRecentAsync(string clusterId);
    Task PutAsync(GenerationCacheEntry entry);
    Task PruneOldGenerationsAsync(string clusterId, int keepLatest = 10);
}
```

**Acceptance**:
- Round-trip test: write a generation snapshot, read it back, assert deep equality
- Pruning test: write 15 generations, prune to 10, assert the 5 oldest are gone

- Corruption test: corrupt the LiteDB file, assert the loader fails fast with a clear error

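The selection step behind `PruneOldGenerationsAsync` can be isolated from LiteDB and tested on its own. The sketch below assumes generation IDs are numeric and increase with each publish, which the text above does not state; `CachePruning` and `SelectGenerationsToDelete` are hypothetical names.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: decides which cached generations to delete.
public static class CachePruning
{
    // Assumption: a larger generation ID means a newer generation.
    public static IReadOnlyList<long> SelectGenerationsToDelete(
        IEnumerable<long> cachedGenerationIds, int keepLatest = 10) =>
        cachedGenerationIds
            .OrderByDescending(id => id)   // newest first
            .Skip(keepLatest)              // keep the newest N
            .ToList();                     // everything else is deleted
}
```

For the pruning test in the acceptance list, IDs 1 through 15 with `keepLatest = 10` would select generations 1 through 5 for deletion.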
#### Task B.6 — Generation-diff application logic

In `Configuration.Apply/`, implement the diff-and-apply logic that runs on each node when a new generation arrives:

```csharp
public interface IGenerationApplier
{
    Task<ApplyResult> ApplyAsync(GenerationSnapshot from, GenerationSnapshot to, CancellationToken ct);
}
```

Diff per entity type, dispatch to driver `Reinitialize` / cache flush as needed.

**Acceptance**:
- Diff test: from = empty, to = (1 driver + 5 equipment + 50 tags) → `Added` for each
- Diff test: from = (above), to = same with one tag's `Name` changed → `Modified` for one tag, no other changes
- Diff test: from = (above), to = same with one equipment removed → `Removed` for the equipment + cascading `Removed` for its tags
- Apply test against an in-memory mock driver: applies the diff in correct order, idempotent on retry

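The per-entity-type diff can be sketched as a keyed comparison of two snapshots. The names below (`ChangeKind`, `EntityChange`, `GenerationDiff.Compute`) are hypothetical, and the sketch assumes each entity has a stable logical ID and a value with structural equality (for example a C# record or a content hash). Cascading removals and apply ordering are deliberately out of scope here.

```csharp
using System.Collections.Generic;
using System.Linq;

public enum ChangeKind { Added, Modified, Removed }

public sealed record EntityChange(string Id, ChangeKind Kind);

// Hypothetical diff for a single entity type, keyed by stable logical ID.
public static class GenerationDiff
{
    public static IReadOnlyList<EntityChange> Compute<T>(
        IReadOnlyDictionary<string, T> from,
        IReadOnlyDictionary<string, T> to)
    {
        var comparer = EqualityComparer<T>.Default;
        var changes = new List<EntityChange>();

        foreach (var pair in to)
        {
            if (!from.TryGetValue(pair.Key, out var previous))
                changes.Add(new EntityChange(pair.Key, ChangeKind.Added));
            else if (!comparer.Equals(previous, pair.Value))
                changes.Add(new EntityChange(pair.Key, ChangeKind.Modified));
        }

        // Anything present before but absent now was removed.
        changes.AddRange(from.Keys
            .Where(id => !to.ContainsKey(id))
            .Select(id => new EntityChange(id, ChangeKind.Removed)));

        return changes;
    }
}
```

Keying on the logical ID rather than the name is what lets a renamed tag show up as one `Modified` entry instead of a `Removed` plus an `Added` pair, which is the behavior the second diff test expects.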
### Stream C — Core project (1 week, can parallel with Stream D)

#### Task C.1 — Rename `LmxNodeManager` → `GenericDriverNodeManager`

Per `plan.md` §5a:
- Lift the file from `Host/OpcUa/LmxNodeManager.cs` to `Core/OpcUa/GenericDriverNodeManager.cs`
- Swap `IMxAccessClient` for `IDriver` (composing `IReadable` / `IWritable` / `ISubscribable`)
- Swap `GalaxyAttributeInfo` for `DriverAttributeInfo`
- Promote `GalaxyRuntimeProbeManager` interactions to use `IHostConnectivityProbe`
- Move `MxDataTypeMapper` and `SecurityClassificationMapper` to a new `Driver.Galaxy.Mapping/` (still in legacy Host until Phase 2)

**Acceptance**:
- v1 IntegrationTests still pass against the renamed class (parity is the gate, decision #62 — class is "foundation, not rewrite")
- Reflection test asserts `GenericDriverNodeManager` has no static or instance reference to any Galaxy-specific type

#### Task C.2 — Derive `GalaxyNodeManager : GenericDriverNodeManager` (legacy in-process)

In the existing Host project, add a thin `GalaxyNodeManager` that:
- Inherits from `GenericDriverNodeManager`
- Wires up `MxDataTypeMapper`, `SecurityClassificationMapper`, the probe manager, etc.
- Replaces direct instantiation of the renamed class

**Acceptance**:
- v1 IntegrationTests pass identically with `GalaxyNodeManager` instantiated instead of the old direct class
- Existing dev Galaxy still serves the same address space byte-for-byte (compare with a baseline browse capture)

#### Task C.3 — `IAddressSpaceBuilder` API (decision #52)

Implement the streaming builder API drivers use to register nodes:

```csharp
public interface IAddressSpaceBuilder
{
    IFolderBuilder Folder(string browseName, string displayName);
    IVariableBuilder Variable(string browseName, DriverDataType type /*, ... */);
    void AddProperty(string browseName, object value);
}
```

Refactor `GenericDriverNodeManager.BuildAddressSpace` to consume `IAddressSpaceBuilder` (driver streams in tags rather than buffering them).

**Acceptance**:
- Build a Galaxy address space via the new builder API, assert byte-equivalent OPC UA browse output vs v1
- Memory profiling test: building a 5000-tag address space via the builder uses less than 50% of the peak RAM of the buffered approach

#### Task C.4 — Driver hosting + isolation (decisions #65, #74)

Implement the in-process driver host that:
- Loads each `DriverInstance` row's driver assembly
- Catches and contains driver exceptions (driver isolation, decision #12)
- Surfaces `IDriver.Reinitialize()` to the configuration applier
- Tracks per-driver allocation footprint (`GetMemoryFootprint()` polled every 30s per `driver-stability.md`)
- Flushes optional caches on budget breach
- Marks drivers `Faulted` (Bad quality on their nodes) if `Reinitialize` fails

**Acceptance**:
- Integration test: spin up two mock drivers; one throws on Read; the other keeps working. Quality on the broken driver's nodes goes Bad; the other driver is unaffected.
- Memory-budget test: mock driver reports growing footprint above budget; cache-flush is triggered; footprint drops; no process action taken.

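The "catch and contain" behavior in the first integration test can be illustrated with a per-driver wrapper that converts exceptions into a `Faulted` state. Everything named here (`DriverState`, `IsolatedDriverSlot`, `TryInvokeAsync`) is hypothetical; the sketch shows the containment idea only and leaves out memory tracking, cache flushing, and recovery.

```csharp
using System;
using System.Threading.Tasks;

public enum DriverState { Healthy, Faulted }

// Hypothetical wrapper: one slot per hosted driver instance.
public sealed class IsolatedDriverSlot
{
    public string Name { get; }
    public DriverState State { get; private set; } = DriverState.Healthy;
    public Exception? LastError { get; private set; }

    public IsolatedDriverSlot(string name) => Name = name;

    // Runs a driver call and contains any exception it throws.
    // Returns false and marks the slot Faulted instead of propagating.
    public async Task<bool> TryInvokeAsync(Func<Task> driverCall)
    {
        try
        {
            await driverCall();
            return true;
        }
        catch (Exception ex)
        {
            // The host would also set Bad quality on this driver's nodes here.
            State = DriverState.Faulted;
            LastError = ex;
            return false;
        }
    }
}
```

Because each driver gets its own slot, a failure in one call path never touches the state of another slot, which is the property the two-mock-driver test asserts.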
### Stream D — Server project (4 days, can parallel with Stream C)

#### Task D.1 — `Microsoft.Extensions.Hosting` Windows Service host (decision #30)

Replace TopShelf with `Microsoft.Extensions.Hosting`:
- New `Program.cs` using `Host.CreateApplicationBuilder()`
- `BackgroundService` that owns the OPC UA server lifecycle
- `builder.Services.AddWindowsService()` registers the Windows Service lifetime (this is the `IServiceCollection` counterpart of `UseWindowsService()`, which only applies to `IHostBuilder`)
- Configuration bootstrap from `appsettings.json` (NodeId + ClusterId + DB conn) per decision #18

**Acceptance**:
- `dotnet run` runs interactively (console mode)
- Installed as a Windows Service (`sc create OtOpcUa ...`), starts and stops cleanly
- Service install + uninstall cycle leaves no leftover state

#### Task D.2 — Bootstrap with credential-bound DB connection (decisions #46, #83)

On startup:
- Read `Cluster.NodeId` + `Cluster.ClusterId` + `ConfigDatabase.ConnectionString` from `appsettings.json`
- Connect to central DB with the configured principal (gMSA / SQL login / cert-mapped)
- Call `sp_GetCurrentGenerationForCluster(@NodeId, @ClusterId)` — the proc verifies the connected principal is bound to NodeId
- If proc rejects → fail startup loudly with the principal mismatch message

**Acceptance**:
- Test: principal bound to Node A boots successfully when configured with NodeId = A
- Test: principal bound to Node A configured with NodeId = B → startup fails with `Unauthorized` and the service does not stay running
- Test: principal bound to Node A in cluster C1 configured with ClusterId = C2 → `Forbidden`

#### Task D.3 — LiteDB cache fallback on DB outage

If the central DB is unreachable at startup, load the most recent cached generation from LiteDB and start with it. Log the fallback at a level operators will notice. Keep retrying the central DB in the background; on reconnect, resume the normal poll cycle.

**Acceptance**:
- Test: with central DB unreachable, node starts from cache, logs `ConfigDbUnreachableUsingCache` event, OPC UA endpoint serves the cached config
- Test: cache empty AND central DB unreachable → startup fails with `NoConfigAvailable` (decision #21)

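The startup decision in Task D.3 can be sketched as a single function over two data sources. The names (`ConfigBootstrap`, `LoadAsync`, `NoConfigAvailableException`) and the `isUnreachable` predicate are assumptions for illustration. The predicate exists because a rejection from the Task D.2 principal check should presumably fail startup, not fall back to cache; only genuine connectivity failures take the cache path.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical exception carrying the NoConfigAvailable outcome.
public sealed class NoConfigAvailableException : Exception
{
    public NoConfigAvailableException(Exception inner)
        : base("Central DB unreachable and local cache is empty.", inner) { }
}

public static class ConfigBootstrap
{
    // Tries the central DB first; falls back to the cache only when the
    // failure is classified as "unreachable" by the supplied predicate.
    public static async Task<(T Snapshot, bool FromCache)> LoadAsync<T>(
        Func<Task<T>> fetchFromCentralDb,
        Func<Task<T?>> readMostRecentFromCache,
        Func<Exception, bool> isUnreachable,
        Action<string> log) where T : class
    {
        try
        {
            return (await fetchFromCentralDb(), false);
        }
        catch (Exception ex) when (isUnreachable(ex))
        {
            var cached = await readMostRecentFromCache();
            if (cached is null)
                throw new NoConfigAvailableException(ex);

            log("ConfigDbUnreachableUsingCache");
            return (cached, true);
        }
    }
}
```

The `FromCache` flag gives the caller what it needs to start the background retry loop; that loop is not shown here.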
### Stream E — Admin project (2.5 weeks)

#### Task E.1 — Project scaffold mirroring ScadaLink CentralUI (decision #102)

Copy the project layout from `scadalink-design/src/ScadaLink.CentralUI/` (decision #104):
- `src/ZB.MOM.WW.OtOpcUa.Admin/`: Razor Components project, .NET 10, `AddInteractiveServerComponents`
- `Auth/AuthEndpoints.cs`, `Auth/CookieAuthenticationStateProvider.cs`
- `Components/Layout/MainLayout.razor`, `Components/Layout/NavMenu.razor`
- `Components/Pages/Login.razor`, `Components/Pages/Dashboard.razor`
- `Components/Shared/{DataTable, ConfirmDialog, LoadingSpinner, NotAuthorizedView, RedirectToLogin, TimestampDisplay, ToastNotification}.razor`
- `EndpointExtensions.cs`, `ServiceCollectionExtensions.cs`

Plus `src/ZB.MOM.WW.OtOpcUa.Admin.Security/` (decision #104): `LdapAuthService`, `RoleMapper`, `JwtTokenService`, `AuthorizationPolicies` mirroring `ScadaLink.Security`.

**Acceptance**:
- App builds and runs locally
- Login page renders with OtOpcUa branding (only the `<h4>` text differs from ScadaLink)
- Visual diff between OtOpcUa and ScadaLink login pages: only the brand text differs (compliance check #3)

#### Task E.2 — Bootstrap LDAP + cookie auth + admin role mapping

Wire up `LdapAuthService` against the dev GLAuth instance per `Security.md`. Map LDAP groups to admin roles:
- `OtOpcUaAdmins` → `FleetAdmin`
- `OtOpcUaConfigEditors` → `ConfigEditor`
- `OtOpcUaViewers` → `ReadOnly`

Plus cluster-scoped grants per decision #105 (LDAP group `OtOpcUaConfigEditors-LINE3` → `ConfigEditor` + `ClusterId = LINE3-OPCUA` claim).

**Acceptance**:
- Login as a `FleetAdmin`-mapped user → redirected to `/`, sidebar shows admin sections
- Login as a `ReadOnly`-mapped user → redirected to `/`, sidebar shows view-only sections
- Login as a cluster-scoped `ConfigEditor` → only their permitted clusters appear in `/clusters`
- Login with bad credentials → redirected to `/login?error=...` with the LDAP error surfaced

#### Task E.3 — Cluster CRUD pages

Implement per `admin-ui.md`:
- `/clusters` — Cluster list (FleetAdmin sees all, ConfigEditor sees scoped)
- `/clusters/{ClusterId}` — Cluster Detail with all 9 tabs (Overview / Namespaces / UNS Structure / Drivers / Devices / Equipment / Tags / Generations / Audit), but Drivers/Devices/Equipment/Tags tabs initially show empty tables (no driver implementations yet — Phase 2+)
- "New cluster" workflow per `admin-ui.md` §"Add a new cluster" — creates cluster row, opens initial draft with default namespaces (decision #123)
- ApplicationUri auto-suggest on node create per decision #86

**Acceptance**:
- Create a cluster → cluster row exists, initial draft exists with Equipment-kind namespace
- Edit cluster name → change reflected in list + detail
- Disable a cluster → no longer offered as a target for new nodes; existing nodes keep showing in list with "Disabled" badge

#### Task E.4 — Draft → diff → publish workflow (decision #89)

Implement per `admin-ui.md` §"Draft Editor", §"Diff Viewer", §"Generation History":
- `/clusters/{Id}/draft` — full draft editor with auto-save (debounced 500ms per decision #97)
- `/clusters/{Id}/draft/diff` — three-column diff viewer
- `/clusters/{Id}/generations` — list of historical generations with rollback action
- Live `sp_ValidateDraft` invocation in the validation panel; publish disabled while errors exist
- Publish dialog requires Notes; runs `sp_PublishGeneration` in a transaction

**Acceptance**:
- Create draft → validation panel runs and shows clean state for empty draft
- Add an invalid Equipment row (bad UNS segment) → validation panel surfaces the error inline + publish stays disabled
- Fix the row → validation panel goes green + publish enables
- Publish → generation moves Draft → Published; previous Published moves to Superseded; audit log row created
- Roll back to a prior generation → new generation cloned from target; previous generation moves to Superseded; nodes pick up the new generation on next poll
- The "Push now" button per decision #96 is rendered but disabled with the "Available in v2.1" label

#### Task E.5 — UNS Structure + Equipment + Namespace tabs

Implement the three hybrid tabs:
- Namespaces tab — list with click-to-edit-in-draft
- UNS Structure tab — tree view with drag-drop reorganize, rename with live impact preview
- Equipment tab — list with default sort by ZTag, search across all 5 identifiers

CSV import for Equipment per the revised schema in `admin-ui.md` (no EquipmentId column; matches by EquipmentUuid for updates per decision #125).

**Acceptance**:
- Add a UnsArea via draft → publishes → appears in tree
- Drag a UnsLine to a different UnsArea → impact preview shows count of affected equipment + signals → publish moves it; UUIDs preserved
- Equipment CSV import: 10 new rows → all get system-generated EquipmentId + EquipmentUuid; ZTag uniqueness checked against `ExternalIdReservation` (decision #124)
- Equipment CSV import: 1 row with existing EquipmentUuid → updates the matched row's editable fields

#### Task E.6 — Generic JSON config editor for `DriverConfig`

Per decision #94: until per-driver editors land in their respective phases, use a generic JSON editor that validates against the JSON schema registered for the driver type in `DriverTypeRegistry`.

**Acceptance**:
- Add a Galaxy `DriverInstance` in a draft → JSON editor renders the Galaxy DriverConfig schema
- Editing produces live validation errors per the schema
- Saving with errors → publish stays disabled

#### Task E.7 — Real-time updates via SignalR (admin-ui.md §"Real-Time Updates")

Two SignalR hubs:
- `FleetStatusHub` — pushes `ClusterNodeGenerationState` changes
- `AlertHub` — pushes new sticky alerts (crash-loop circuit trips, failed applies)

A backend `IHostedService` polls the database every 5s, diffs the result against the previous poll, and pushes only the changes.

**Acceptance**:
- Open Cluster Detail in two browser tabs → publish in tab A → tab B's "current generation" updates within 5s without page reload
- Simulate a `LastAppliedStatus = Failed` for a node → AlertHub pushes a sticky alert that doesn't auto-clear

#### Task E.8 — Release reservation + Merge equipment workflows

Per `admin-ui.md` §"Release an external-ID reservation" and §"Merge or rebind equipment":
- Release flow: FleetAdmin only, requires reason, audit-logged via `sp_ReleaseExternalIdReservation`
- Merge flow: opens a draft that disables source equipment, re-points tags, releases + re-reserves IDs

**Acceptance**:
- Release a reservation → `ReleasedAt` set in DB + audit log entry created with reason
- After release: same `(Kind, Value)` can be reserved by a different EquipmentUuid in a future publish
- Merge equipment A → B: draft preview shows tag re-pointing + ID re-reservation; publish executes atomically; A is disabled with `EquipmentMergedAway` audit entry

## Compliance Checks (run at exit gate)

A `phase-1-compliance.ps1` script runs the automated checks below and exits non-zero on any failure:

### Schema compliance

```powershell
# Run all migrations against a clean SQL Server instance.
# The Unix timestamp suffix gives each run its own throwaway database.
dotnet ef database update --project src/ZB.MOM.WW.OtOpcUa.Configuration --connection "Server=...;Database=OtOpcUaConfig_Test_$([DateTimeOffset]::UtcNow.ToUnixTimeSeconds());..."

# Run schema-introspection tests
dotnet test tests/ZB.MOM.WW.OtOpcUa.Configuration.Tests --filter "Category=SchemaCompliance"
```

Expected: every table, column, index, FK, CHECK, and stored procedure in `config-db-schema.md` is present and matches.

### Decision compliance

```powershell
# For each decision number Phase 1 implements (#9, #14-22, #25, #28, #30, #32-33, #46-51, #79-125),
# verify at least one citation exists in source, tests, or migrations:
$decisions = @(9, 14, 15, 16, 17, 18, 19, 20, 21, 22, 25, 28, 30, 32, 33, 46, 47, 48, 49, 50, 51, 79..125)
foreach ($d in $decisions) {
    # -w matches on word boundaries, so "decision #9" does not also match "decision #91"
    $hits = git grep -w "decision #$d" -- 'src/' 'tests/' 'docs/v2/implementation/'
    if (-not $hits) { Write-Error "Decision #$d has no citation in code or tests"; exit 1 }
}
```

### Visual compliance (Admin UI)

Manual screenshot review:
1. Login page side-by-side with ScadaLink's `Login.razor` rendered
2. Sidebar + main layout side-by-side with ScadaLink's `MainLayout.razor` + `NavMenu.razor`
3. Dashboard side-by-side with ScadaLink's `Dashboard.razor`
4. Reconnect overlay triggered (kill the SignalR connection) — same modal as ScadaLink

Reviewer answers: "could the same operator move between apps without noticing?" Y/N. N = blocking.

### Behavioral compliance (end-to-end smoke test)

```bash
dotnet test tests/ZB.MOM.WW.OtOpcUa.IntegrationTests --filter "Category=Phase1Smoke"
```

The smoke test:
1. Spins up SQL Server in a container
2. Runs all migrations
3. Creates an `OtOpcUaAdmin` SQL principal + `OtOpcUaNode` principal bound to a test NodeId
4. Starts the Admin app
5. Creates a cluster + 1 node + Equipment-kind namespace via Admin API
6. Opens a draft, adds 1 UnsArea + 1 UnsLine + 1 Equipment + 0 tags (empty)
7. Publishes the draft
8. Boots a Server instance configured with the test NodeId
9. Asserts the Server fetched the published generation via `sp_GetCurrentGenerationForCluster`
10. Asserts the Server's `ClusterNodeGenerationState` row reports `Applied`
11. Adds a tag in a new draft, publishes
12. Asserts the Server picks up the new generation within 30s (next poll)
13. Rolls back to generation 1
14. Asserts the Server picks up the rollback within 30s

Expected: all 14 steps pass. Smoke test runs in CI on every PR to `v2/phase-1-*` branches.

### Stability compliance

For Phase 1 the only stability concern is the in-process driver isolation primitives (used later by Phase 3+ drivers, but built in Phase 1):
- `IDriver.Reinitialize()` semantics tested
- Driver-instance allocation tracking + cache flush tested with a mock driver
- Crash-loop circuit breaker tested with a mock driver that throws on every Reinitialize

Galaxy is still legacy in-process in Phase 1 — Tier C protections for Galaxy land in Phase 2.

### Documentation compliance

- Every Phase 1 task in this doc must either be Done or have a deferral note in `exit-gate-phase-1.md`
- Every decision the phase implements must be reflected in `plan.md` (no silent decisions)
- Schema doc + admin-ui doc must be updated if implementation deviated

## Completion Checklist

The exit gate signs off only when **every** item below is checked. Each item links to the verifying artifact (test name, screenshot, log line, etc.).

### Stream A — Core.Abstractions
- [ ] All 11 capability interfaces defined and compiling
- [ ] `DriverAttributeInfo` + supporting enums defined
- [ ] `DriverTypeRegistry` implemented with Galaxy registration
- [ ] Interface-independence reflection test passes

### Stream B — Configuration
- [ ] EF Core migration `InitialSchema` applies cleanly to a clean SQL Server
- [ ] Schema introspection test asserts the live schema matches `config-db-schema.md`
- [ ] All stored procedures present and tested (happy path + error paths)
- [ ] `sp_PublishGeneration` concurrency test passes (one wins, one fails)
- [ ] Authorization tests pass (Node principal limited to its cluster, Admin can read/write fleet-wide)
- [ ] All 12 validation rules in `Configuration.Validation` have unit tests
- [ ] LiteDB cache round-trip + pruning + corruption tests pass
- [ ] Generation-diff applier handles add/remove/modify across all entity types

### Stream C — Core
- [ ] `LmxNodeManager` renamed to `GenericDriverNodeManager`; v1 IntegrationTests still pass
- [ ] `GalaxyNodeManager : GenericDriverNodeManager` exists in legacy Host
- [ ] `IAddressSpaceBuilder` API implemented; byte-equivalent OPC UA browse output to v1
- [ ] Driver hosting + isolation tested with mock drivers (one fails, others continue)
- [ ] Memory-budget cache-flush tested with mock driver

### Stream D — Server
- [ ] `Microsoft.Extensions.Hosting` host runs in console mode and as Windows Service
- [ ] TopShelf removed from the codebase
- [ ] Credential-bound bootstrap tested (correct principal succeeds; wrong principal fails)
- [ ] LiteDB fallback on DB outage tested

### Stream E — Admin
- [ ] Admin app boots, login screen renders with ScadaLink-equivalent visual
- [ ] LDAP cookie auth works against dev GLAuth
- [ ] Admin roles mapped (FleetAdmin / ConfigEditor / ReadOnly)
- [ ] Cluster-scoped grants work (decision #105)
- [ ] Cluster CRUD works end-to-end
- [ ] Draft → diff → publish workflow works end-to-end
- [ ] Rollback works end-to-end
- [ ] UNS Structure tab supports add / rename / drag-move with impact preview
- [ ] Equipment tab supports CSV import + search across 5 identifiers
- [ ] Generic JSON config editor renders + validates DriverConfig per registered schema
- [ ] SignalR real-time updates work (multi-tab test)
- [ ] Release reservation flow works + audit-logged
- [ ] Merge equipment flow works + audit-logged

### Cross-cutting
- [ ] `phase-1-compliance.ps1` runs and exits 0
- [ ] Smoke test (14 steps) passes in CI
- [ ] Visual compliance review signed off (operator-equivalence test)
- [ ] All decisions cited in code/tests (`git grep "decision #N"` returns hits for each)
- [ ] Adversarial review of the phase diff (`/codex:adversarial-review --base v2`) — findings closed or deferred with rationale
- [ ] PR opened against `v2`, includes: link to this doc, link to exit-gate record, compliance script output, smoke test logs, adversarial review output, screenshots
- [ ] Reviewer signoff (one reviewer beyond the implementation lead)
- [ ] `exit-gate-phase-1.md` recorded

## Risks and Mitigations

| Risk | Likelihood | Impact | Mitigation |
|------|:----------:|:------:|------------|
| EF Core 10 idiosyncrasies vs the documented schema | Medium | Medium | Schema-introspection test catches drift; validate early in Stream B |
| `sp_ValidateDraft` cross-table checks complex enough to be slow | Medium | Medium | Per-decision-cited test exists; benchmark with a large draft (1000+ tags) before exit |
| Visual parity with ScadaLink slips because two component libraries diverge over time | Low | Medium | Copy ScadaLink's CSS verbatim where possible; shared component set is structurally identical |
| LDAP integration breaks against production GLAuth (different schema than dev) | Medium | High | Use the v1 LDAP layer as the integration reference; mirror its config exactly |
| Generation-diff applier has subtle bugs on edge cases (renamed entity with same logical ID) | High | High | Property-based test that generates random diffs and asserts apply-then-rebuild produces the same end state |
| ScadaLink.Security pattern works well for site-scoped roles but our cluster-scoped grants are subtly different | Medium | Medium | Side-by-side review of `RoleMapper` after Stream E starts; refactor if claim shape diverges |
| Phase 1 takes longer than 6 weeks | High | Medium | Mid-gate review at 3 weeks — if Stream B isn't done, defer Stream E.5–8 to a Phase 1.5 follow-up |
| `MERGE` against `ExternalIdReservation` has a deadlock pathology under concurrent publishes | Medium | High | Concurrency test in Task B.2 specifically targets this; if it deadlocks, switch to `INSERT ... WHERE NOT EXISTS` with explicit row locks |

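The property-based mitigation for the generation-diff applier can be illustrated with a small model. The sketch below is written in Python with the standard `random` module rather than a property-testing library, and it assumes a generation can be modeled as a map from logical ID to attributes. The names `diff`, `apply_diff`, `random_generation`, and `check_apply_matches_rebuild` are hypothetical and only stand in for the real applier.

```python
import random


def diff(old, new):
    """Compute a generation diff between two {logical_id: attributes} maps."""
    return {
        "removed": sorted(k for k in old if k not in new),
        "added": {k: new[k] for k in new if k not in old},
        "changed": {k: new[k] for k in new if k in old and old[k] != new[k]},
    }


def apply_diff(current, d):
    """Apply a diff to `current` and return the resulting state as a new dict."""
    result = {k: v for k, v in current.items() if k not in d["removed"]}
    result.update(d["added"])
    result.update(d["changed"])
    return result


def random_generation(rng, max_entities=8):
    """Build a random generation. Names come from a small pool so that
    renames under the same logical ID occur often."""
    ids = rng.sample(range(20), rng.randint(0, max_entities))
    return {i: {"name": rng.choice("ABCD"), "enabled": rng.random() < 0.5}
            for i in ids}


def check_apply_matches_rebuild(seed, rounds=200):
    """Property: applying diff(old, new) to old yields exactly new."""
    rng = random.Random(seed)
    for _ in range(rounds):
        old, new = random_generation(rng), random_generation(rng)
        applied = apply_diff(old, diff(old, new))
        assert applied == new, (seed, old, new, applied)
    return True
```

Seeding the generator makes any failing case reproducible from the seed alone. The renamed-entity edge case from the table shows up here as an ID present in both generations with a different `name`, which must land in `changed` rather than as a remove plus an add.
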
## Out of Scope (do not do in Phase 1)

- Galaxy out-of-process split (Phase 2)
- Any Modbus / AB / S7 / TwinCAT / FOCAS driver code (Phases 3–5)
- Per-driver custom config editors in Admin (each driver's phase)
- Equipment-class template integration with the schemas repo
- Consumer cutover (Phases 6–8, separate planning track)
- ACL / namespace-level authorization for OPC UA clients (corrections doc B1 — needs scoping before Phase 6, parallel work track)
- Push-from-DB notification (decision #96 — v2.1)
- Generation pruning operator UI (decision #93 — v2.1)
- Cluster-scoped admin grant editor in UI (admin-ui.md "Deferred / Out of Scope" — v2.1)
- Mobile / tablet layout