Joseph Doherty 4903a19ec9 Add data-path ACL design (acl-design.md, closes corrections B1) + dev-environment inventory and setup plan (dev-environment.md), and remove consumer cutover from OtOpcUa v2 scope.
ACL design defines NodePermissions bitmask flags covering Browse / Read / Subscribe / HistoryRead / WriteOperate / WriteTune / WriteConfigure / AlarmRead / AlarmAcknowledge / AlarmConfirm / AlarmShelve / MethodCall plus common bundles (ReadOnly / Operator / Engineer / Admin); 6-level scope hierarchy (Cluster / Namespace / UnsArea / UnsLine / Equipment / Tag) with default-deny + additive grants and Browse-implication on ancestors; per-LDAP-group grants in a new generation-versioned NodeAcl table edited via the same draft → diff → publish → rollback boundary as every other content table; per-session permission-trie evaluator with O(depth × group-count) cost cached for the lifetime of the session and rebuilt on generation-apply or LDAP group cache expiry; cluster-create workflow seeds a default ACL set matching the v1 LmxOpcUa LDAP-role-to-permission map for v1 → v2 consumer migration parity; Admin UI ACL tab with two views (by LDAP group, by scope), bulk-grant flow, and permission simulator that lets operators preview "as user X" effective permissions across the cluster's UNS tree before publishing; explicit Deny deferred to v2.1 since verbose grants suffice at v2.0 fleet sizes; only denied OPC UA operations are audit-logged (not allowed ones — would dwarf the audit log). Schema doc gains the NodeAcl table with cross-cluster invariant enforcement and same-generation FK validation; admin-ui.md gains the ACLs tab; phase-1 doc gains Task E.9 wiring this through Stream E plus a NodeAcl entry in Task B.1's DbContext list.

Dev-environment doc inventories every external resource the v2 build needs across two tiers per decision #99 — inner-loop (in-process simulators on developer machines: SQL Server local or container, GLAuth at C:\publish\glauth\, local dev Galaxy) and integration (one dedicated Windows host with Docker Desktop on WSL2 backend so TwinCAT XAR VM can run in Hyper-V alongside containerized oitc/modbus-server, plus WSL2-hosted Snap7 and ab_server, plus OPC Foundation reference server, plus FOCAS TestStub and FaultShim) — with concrete container images, ports, default dev credentials (clearly marked dev-only since production uses Integrated Security / gMSA per decision #46), bootstrap order for both tiers, network topology diagram, test data seed locations, and operational risks (TwinCAT trial expiry automation, Docker pricing, integration host SPOF mitigation, per-developer GLAuth config sync, Aveva license scoping that keeps Galaxy tests on developer machines and off the shared host).

Removes consumer cutover (ScadaBridge / Ignition / System Platform IO) from OtOpcUa v2 scope per decision #136 — owned by a separate integration / operations team, tracked in 3-year-plan handoff §"Rollout Posture" and corrections §C5; OtOpcUa team's scope ends at Phase 5. Updates implementation/overview.md phase index to drop the "6+" row and add an explicit "OUT of v2 scope" callout; updates phase-1 and phase-2 docs to reframe cutover as integration-team-owned rather than future-phase numbered.

Decisions #129–137 added: ACL model (#129), NodeAcl generation-versioned (#130), v1-compatibility seed (#131), denied-only audit logging (#132), two-tier dev environment (#133), Docker WSL2 backend for TwinCAT VM coexistence (#134), TwinCAT VM centrally managed / Galaxy on dev machines only (#135), cutover out of v2 scope (#136), dev credentials documented openly (#137).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-17 11:58:33 -04:00


# Implementation Plan Overview — OtOpcUa v2
> **Status**: DRAFT — defines the gate structure, compliance check approach, and deliverable conventions used across all phase implementation plans (`phase-0-*.md`, `phase-1-*.md`, etc.).
>
> **Branch**: `v2`
> **Created**: 2026-04-17
## Purpose
Each phase of the v2 build (`plan.md` §6 Migration Strategy) gets a dedicated detailed implementation doc in this folder. This overview defines the structure those docs follow so reviewers can verify compliance with the v2 design without re-reading every artifact.
## Phase Gate Structure
Every phase has **three gates** the work must pass through:
```
        ┌──────────┐        ┌──────────┐            ┌──────────┐
START ──┤  ENTRY   │── do ──┤   MID    │── verify ──┤   EXIT   │── PHASE COMPLETE
        │   GATE   │  work  │   GATE   │ artifacts  │   GATE   │
        └──────────┘        └──────────┘            └──────────┘
```
### Entry gate
**Purpose**: ensures the phase starts with a known-good state and all prerequisites met. Prevents starting work on top of broken foundations.
**Checked before any phase work begins**:
- Prior phase has cleared its **exit gate** (or this is Phase 0)
- Working tree is clean on the appropriate branch
- All baseline tests for the prior phase still pass
- Any external dependencies the phase needs are confirmed available
- Implementation lead has read the phase doc and the relevant sections of `plan.md`, `config-db-schema.md`, `driver-specs.md`, `driver-stability.md`, `admin-ui.md`

**Evidence captured**: a short markdown file `entry-gate-{phase}.md` recording the date, signoff, baseline test pass, and any deviations noted.
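
The entry-gate record described above could be generated rather than hand-written. The following is a minimal sketch only: the function name, the field labels, and the layout of the file body are illustrative assumptions, and the plan does not say how `{phase}` is expanded, so the caller supplies that identifier.

```python
from datetime import date
from typing import List, Tuple


def render_entry_gate(phase: str, on: date, signed_by: str,
                      baseline_passed: bool,
                      deviations: List[str]) -> Tuple[str, str]:
    """Return (filename, markdown body) for an entry-gate record.

    The layout is hypothetical; only the four recorded facts (date, signoff,
    baseline test pass, deviations) come from the plan.
    """
    filename = f"entry-gate-{phase}.md"
    lines = [
        f"# Entry gate: {phase}",
        "",
        f"- Date: {on.isoformat()}",
        f"- Signoff: {signed_by}",
        f"- Baseline tests: {'PASS' if baseline_passed else 'FAIL'}",
        "- Deviations:" if deviations else "- Deviations: none",
    ]
    # Each deviation becomes a nested bullet under "Deviations:".
    lines += [f"  - {d}" for d in deviations]
    return filename, "\n".join(lines) + "\n"
```

Keeping the record as a pure function of its inputs means the same helper can be reused by a compliance script or called by hand.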
### Mid gate
**Purpose**: course-correct partway through the phase. Catches drift before it compounds. Optional for phases ≤ 2 weeks; required for longer phases.
**Checked at the midpoint**:
- Are the highest-risk deliverables landing on schedule?
- Have any new design questions surfaced that the v2 docs don't answer? If so, escalate to plan revision before continuing.
- Are tests being written alongside code, or accumulating as a backlog?
- Has any decision (`plan.md` decision log) been silently violated by the implementation? If so, either revise the implementation or revise the decision (with explicit "supersedes" entry).

**Evidence captured**: short status update appended to the phase doc.
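
The "optional for phases ≤ 2 weeks" rule can be made mechanical. This is a sketch under the assumption that a phase has a planned start and end date; the function names are hypothetical and not part of the plan.

```python
from datetime import date

MID_GATE_THRESHOLD_DAYS = 14  # "2 weeks", taken from the mid-gate rule


def mid_gate_required(start: date, planned_end: date) -> bool:
    """True when the planned duration is longer than two weeks."""
    return (planned_end - start).days > MID_GATE_THRESHOLD_DAYS


def mid_gate_date(start: date, planned_end: date) -> date:
    """Calendar midpoint of the phase, rounded down to a whole day."""
    return start + (planned_end - start) // 2
```

A phase planned for exactly 14 days is treated as optional here, matching the "≤ 2 weeks" wording.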
### Exit gate
**Purpose**: ensures the phase actually achieved what the v2 design specified, not just "the code compiles". This is where compliance verification happens.
**Checked before the phase is declared complete**:
- All **acceptance criteria** for every task in the phase doc are met (each criterion has explicit evidence)
- All **compliance checks** (see below) pass
- All **completion checklist** items are ticked, with links to the verifying artifact (test, screenshot, log line, etc.)
- Phase commit history is clean (no half-merged WIP, no skipped hooks)
- Documentation updates merged: any change in approach during the phase is reflected back in the v2 design docs (`plan.md` decision log gets new entries; `config-db-schema.md` updated if schema differed from spec; etc.)
- Adversarial review run on the phase output (`/codex:adversarial-review` or equivalent) — findings closed or explicitly deferred with rationale
- Implementation lead **and** one other reviewer sign off

**Evidence captured**: `exit-gate-{phase}.md` recording all of the above with links and signatures.
## Compliance Check Categories
Phase exit gates run compliance checks across these axes. Each phase doc enumerates the specific checks for that phase under "Compliance Checks".
### 1. Schema compliance (Phase 1+)
For phases that touch the central config DB:
- Run EF Core migrations against a clean SQL Server instance
- Diff the resulting schema against the DDL in `config-db-schema.md`:
  - Table list matches
  - Column types and nullability match
  - Indexes (regular + unique + filtered) match
  - CHECK constraints match
  - Foreign keys match
  - Stored procedures present and signatures match
- Any drift = blocking. Either fix the migration or update the schema doc with explicit reasoning, then re-run.
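
The schema diff can be sketched as a comparison of two in-memory descriptions. The example below assumes both the migrated database and the DDL in `config-db-schema.md` have already been reduced to a `table -> column -> (type, nullable)` mapping. That extraction step is not shown, and the mapping shape is an illustrative assumption. It covers only the table and column checks; indexes, constraints, foreign keys, and stored procedures would need analogous comparisons.

```python
from typing import Dict, List, Tuple

Column = Tuple[str, bool]               # (sql_type, is_nullable)
Schema = Dict[str, Dict[str, Column]]   # table -> column -> Column


def diff_schema(expected: Schema, actual: Schema) -> List[str]:
    """Return one human-readable line per drift; an empty list means no drift."""
    drift: List[str] = []
    for table in sorted(expected.keys() | actual.keys()):
        if table not in actual:
            drift.append(f"missing table: {table}")
            continue
        if table not in expected:
            drift.append(f"unexpected table: {table}")
            continue
        exp_cols, act_cols = expected[table], actual[table]
        for col in sorted(exp_cols.keys() | act_cols.keys()):
            if col not in act_cols:
                drift.append(f"{table}.{col}: missing column")
            elif col not in exp_cols:
                drift.append(f"{table}.{col}: unexpected column")
            elif exp_cols[col] != act_cols[col]:
                drift.append(
                    f"{table}.{col}: expected {exp_cols[col]}, got {act_cols[col]}"
                )
    return drift
```

Because any drift is blocking, a compliance script could simply fail when the returned list is non-empty and print the lines as the report.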
### 2. Decision compliance
For each decision number cited in the phase doc (`#XX` references to `plan.md` decision log):
- Locate the artifact (code module, test, configuration file) that demonstrates the decision is honored
- Add a code comment or test name that cites the decision number
- Phase exit gate uses a script (or grep) to verify every cited decision has at least one citation in the codebase

This makes the decision log a **load-bearing reference**, not a historical record.
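
A script for the citation check might look like the sketch below. It assumes decisions appear in the phase doc as `#NN` and are cited in code as `decision #NN` (case-insensitive). The code-side convention is an assumption, since the plan only requires that a comment or test name cite the number. All function names are hypothetical.

```python
import re
from pathlib import Path
from typing import Iterable, Iterator, Set, Tuple

DOC_REF = re.compile(r"#(\d+)")                            # e.g. "#56" in the phase doc
CODE_REF = re.compile(r"decision\s*#(\d+)", re.IGNORECASE)  # assumed code convention


def cited_decisions(phase_doc_text: str) -> Set[int]:
    """Decision numbers the phase doc refers to."""
    return {int(n) for n in DOC_REF.findall(phase_doc_text)}


def uncited_in_code(phase_doc_text: str, source_texts: Iterable[str]) -> Set[int]:
    """Decisions cited by the phase doc that no source text cites."""
    found: Set[int] = set()
    for text in source_texts:
        found |= {int(n) for n in CODE_REF.findall(text)}
    return cited_decisions(phase_doc_text) - found


def read_sources(root: Path, suffixes: Tuple[str, ...] = (".cs",)) -> Iterator[str]:
    """Yield the text of every file under root with one of the given suffixes."""
    for path in root.rglob("*"):
        if path.is_file() and path.suffix in suffixes:
            yield path.read_text(encoding="utf-8", errors="replace")
```

The `#(\d+)` pattern is deliberately naive: it also matches any other `#NN` token in the doc, and a range such as `#129–137` yields only its first number, so a real script would likely need a stricter citation format.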
### 3. Visual compliance (Admin UI phases)
For phases that touch the Admin UI:
- Side-by-side screenshots of equivalent ScadaLink CentralUI screens vs the new OtOpcUa Admin screens
- Login page, sidebar, dashboard, generic forms — must visually match per `admin-ui.md` §"Visual Design — Direct Parity with ScadaLink"
- Reviewer signoff: "could the same operator move between apps without noticing?"
### 4. Behavioral compliance (end-to-end smoke tests)
For each phase, an integration test exercises the new capability end-to-end:
- Phase 0: existing v1 IntegrationTests pass under the renamed projects
- Phase 1: create a cluster → publish a generation → node fetches the generation → roll back → fetch again
- Phase 2: v1 IntegrationTests parity suite passes against the v2 Galaxy.Host (per decision #56)
- Phase 3+: per-driver smoke test against the simulator

Smoke tests are **always green at exit**, never "known broken, fix later".
### 5. Stability compliance (Phase 2+ for Tier C drivers)
For phases that introduce Tier C drivers (Galaxy in Phase 2, FOCAS in Phase 5):
- All `Driver Stability & Isolation` cross-cutting protections from `driver-stability.md` §"Cross-Cutting Protections" are wired up:
  - SafeHandle wrappers exist for every native handle
  - Memory watchdog runs and triggers recycle on threshold breach (testable via FaultShim)
  - Crash-loop circuit breaker fires after 3 crashes / 5 min (testable via stub-injected crash)
  - Heartbeat between proxy and host functions; missed heartbeats trigger respawn
  - Post-mortem MMF survives a hard process kill and the supervisor reads it on respawn
- Each protection has a regression test in the driver's test suite
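
As an illustration of the crash-loop protection above, a sliding-window breaker can be sketched in a few lines. This is not the driver's implementation: the class name is hypothetical, and it assumes "3 crashes / 5 min" means the breaker opens on the third crash inside any 300-second window. The timestamp is passed in explicitly so that a regression test can inject crashes without waiting.

```python
from collections import deque
from typing import Deque


class CrashLoopBreaker:
    """Opens once max_crashes crashes fall inside a sliding time window."""

    def __init__(self, max_crashes: int = 3, window_seconds: float = 300.0) -> None:
        self.max_crashes = max_crashes
        self.window_seconds = window_seconds
        self._crashes: Deque[float] = deque()
        self.open = False

    def record_crash(self, now: float) -> bool:
        """Record a crash at time `now` (seconds); return True if the breaker is open."""
        self._crashes.append(now)
        # Drop crashes that have aged out of the window.
        while self._crashes and now - self._crashes[0] > self.window_seconds:
            self._crashes.popleft()
        if len(self._crashes) >= self.max_crashes:
            self.open = True  # stays open until something external resets it
        return self.open
```

In this sketch the breaker latches open; whether and how it resets is a design decision the plan leaves to `driver-stability.md`.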
### 6. Documentation compliance
For every phase:
- Any deviation from the v2 design docs (`plan.md`, `config-db-schema.md`, `admin-ui.md`, `driver-specs.md`, `driver-stability.md`, `test-data-sources.md`) is reflected back in the docs
- New decisions added to the decision log with rationale
- Old decisions superseded explicitly (not silently)
- Cross-references between docs stay current
## Deliverable Types
Each phase produces a defined set of deliverables. The phase doc enumerates which deliverables apply.
| Type | Format | Purpose |
|------|--------|---------|
| **Code** | Source files committed to a feature branch, merged to `v2` after exit gate | The implementation itself |
| **Tests** | xUnit unit + integration tests; per-phase smoke tests | Behavioral evidence |
| **Migrations** | EF Core migrations under `Configuration/Migrations/` | Schema delta |
| **Decision-log entries** | New rows appended to `plan.md` decision table | Architectural choices made during the phase |
| **Doc updates** | Edits to existing v2 docs | Keep design and implementation aligned |
| **Gate records** | `entry-gate-{phase}.md`, `exit-gate-{phase}.md` in this folder | Audit trail of gate clearance |
| **Compliance script** | Per-phase shell or PowerShell script that runs the compliance checks | Repeatable verification |
| **Adversarial review** | `/codex:adversarial-review` output on the phase diff | Independent challenge |
## Branch and PR Conventions
| Branch | Purpose |
|--------|---------|
| `v2` | Long-running design + implementation branch. All phase work merges here. |
| `v2/phase-{N}-{slug}` | Per-phase feature branch (e.g. `v2/phase-0-rename`) |
| `v2/phase-{N}-{slug}-{subtask}` | Per-subtask branches when the phase is large enough to warrant them |

Each phase merges to `v2` via PR after the exit gate clears. PRs include:
- Link to the phase implementation doc
- Link to the exit-gate record
- Compliance-script output
- Adversarial-review output
- Reviewer signoffs

The `master` branch stays at v1 production state until all phases are complete and a separate v2 release decision is made.
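
The branch naming convention is regular enough to check automatically, for example in a pre-push hook. The sketch below assumes slugs and subtask names are lowercase alphanumerics separated by hyphens, which the plan does not state. Under that assumption a subtask suffix cannot be told apart from a longer slug, so only the phase number is extracted.

```python
import re
from typing import Optional

# "v2" alone, or "v2/phase-{N}-{slug}" with an optional "-{subtask}" tail.
BRANCH = re.compile(r"v2(?:/phase-(?P<phase>\d+)-(?P<rest>[a-z0-9]+(?:-[a-z0-9]+)*))?")


def phase_of(branch: str) -> Optional[int]:
    """Return the phase number of a v2 branch, or None for the `v2` trunk.

    Raises ValueError for names that do not follow the convention.
    """
    match = BRANCH.fullmatch(branch)
    if match is None:
        raise ValueError(f"not a v2 branch name: {branch!r}")
    phase = match.group("phase")
    return int(phase) if phase is not None else None
```

For instance, `phase_of("v2/phase-0-rename")` returns `0`, while `phase_of("master")` raises `ValueError`.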
## What Counts as "Following the Plan"
The implementation **follows the plan** when, at every phase exit gate:
1. Every task listed in the phase doc has been done OR explicitly deferred with rationale
2. Every compliance check has a passing artifact OR an explicit deviation note signed off by the reviewer
3. The codebase contains traceable references to every decision number the phase implements
4. The v2 design docs are updated to reflect any approach changes
5. The smoke test for the phase passes
6. Two people have signed off — implementation lead + one other reviewer

The implementation **deviates from the plan** when any of those conditions fails. Deviations are not failures; they are signals to update the plan or revise the implementation. The unrecoverable failure mode is **silent deviation**: code that doesn't match the plan, with no decision-log update explaining why. The exit gate's compliance checks exist specifically to make silent deviation impossible to ship.
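
The six conditions above can be read as a predicate over the exit-gate state. The following sketch makes that explicit; the record type, its field names, and the way each condition is reduced to a count or flag are illustrative assumptions rather than anything the plan prescribes.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class ExitGateState:
    open_tasks: int            # tasks neither done nor explicitly deferred
    unresolved_checks: int     # checks with no passing artifact or signed deviation
    uncited_decisions: int     # decisions with no traceable reference in the code
    docs_updated: bool         # design docs reflect any approach changes
    smoke_test_passed: bool
    implementation_lead: str
    signoffs: Set[str]         # names of everyone who signed off


def plan_deviations(state: ExitGateState) -> List[str]:
    """Return the numbered conditions that fail; an empty list means 'following the plan'."""
    others = state.signoffs - {state.implementation_lead}
    conditions = [
        (state.open_tasks == 0, "1: task neither done nor deferred"),
        (state.unresolved_checks == 0, "2: compliance check unresolved"),
        (state.uncited_decisions == 0, "3: decision without code reference"),
        (state.docs_updated, "4: design docs not updated"),
        (state.smoke_test_passed, "5: smoke test not passing"),
        (state.implementation_lead in state.signoffs and len(others) >= 1,
         "6: missing lead or second reviewer signoff"),
    ]
    return [label for ok, label in conditions if not ok]
```

Returning the failing conditions, rather than a single boolean, keeps a deviation visible and attributable, which is the point of the exit gate.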
## Phase Implementation Docs
| Phase | Doc | Status |
|-------|-----|--------|
| 0 | [`phase-0-rename-and-net10.md`](phase-0-rename-and-net10.md) | DRAFT |
| 1 | [`phase-1-configuration-and-admin-scaffold.md`](phase-1-configuration-and-admin-scaffold.md) | DRAFT |
| 2 | [`phase-2-galaxy-out-of-process.md`](phase-2-galaxy-out-of-process.md) | DRAFT |
| 3 | (Phase 3: Modbus TCP driver — TBD) | NOT STARTED |
| 4 | (Phase 4: PLC drivers AB CIP / AB Legacy / S7 / TwinCAT — TBD) | NOT STARTED |
| 5 | (Phase 5: Specialty drivers FOCAS / OPC UA Client — TBD) | NOT STARTED |

**Consumer cutover (ScadaBridge / Ignition / System Platform IO) is OUT of v2 scope.** It is a separate work track owned by the integration / operations team, tracked in the 3-year-plan handoff (`handoffs/otopcua-handoff.md` §"Rollout Posture") and the corrections doc (§C5). The OtOpcUa team's responsibility ends at Phase 5 (all drivers built, all stability protections in place, full Admin UI shipped). Cutover sequencing, validation methodology, rollback procedures, and Aveva-pattern validation for tier 3 are the integration team's deliverables.