Add Phase 2 detailed implementation plan (docs/v2/implementation/phase-2-galaxy-out-of-process.md) covering the largest refactor phase — moving Galaxy from the legacy in-process OtOpcUa.Host project into the Tier C out-of-process topology specified in driver-stability.md.

Five work streams:

- A. Driver.Galaxy.Shared (.NET Standard 2.0 IPC contracts using MessagePack with hello-message version negotiation)
- B. Driver.Galaxy.Host (.NET 4.8 x86 separate Windows service that owns MxAccessBridge / GalaxyRepository / alarm tracking / GalaxyRuntimeProbeManager / Wonderware Historian SDK / STA thread + Win32 message pump with health probe / MxAccessHandle SafeHandle for COM lifetime / subscription registry with cross-host quality scoping / named-pipe IPC server with mandatory ACL + caller SID verification + per-process shared secret / memory watchdog with Galaxy-specific 1.5x baseline + 200MB floor + 1.5GB ceiling / recycle policy with 15s grace + WM_QUIT escalation to hard-exit / post-mortem MMF writer / Driver.Galaxy.FaultShim test-only assembly)
- C. Driver.Galaxy.Proxy (.NET 10 in-process driver implementing every capability interface, heartbeat sender on dedicated channel with 2s/3-miss tolerance, supervisor with respawn-with-backoff and crash-loop circuit breaker with escalating cooldown 1h/4h/24h, address space build via IAddressSpaceBuilder producing byte-equivalent v1 output)
- D. Retire legacy OtOpcUa.Host (delete from solution, two-service Windows installer, migrate appsettings.json Galaxy sections to central DB DriverConfig blob)
- E. Parity validation (v1 IntegrationTests pass count = baseline failures = 0, scripted Client.CLI walkthrough output diff vs v1 only differs in timestamps/latency, four named regression tests for the 2026-04-13 stability findings)

Compliance script verifies all eight Tier C cross-cutting protections have named passing tests.

Decision #128 captures the survey-removal; cross-references added to plan.md Reference Documents and overview.md phase index.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

# Implementation Plan Overview — OtOpcUa v2

> **Status**: DRAFT — defines the gate structure, compliance check approach, and deliverable conventions used across all phase implementation plans (`phase-0-*.md`, `phase-1-*.md`, etc.).
>
> **Branch**: `v2`
> **Created**: 2026-04-17

## Purpose

Each phase of the v2 build (`plan.md` §6 Migration Strategy) gets a dedicated detailed implementation doc in this folder. This overview defines the structure those docs follow so reviewers can verify compliance with the v2 design without re-reading every artifact.

## Phase Gate Structure

Every phase has **three gates** the work must pass through:

```
        ┌──────────┐        ┌──────────┐            ┌──────────┐
START ──┤  ENTRY   │── do ──┤   MID    │── verify ──┤   EXIT   │── PHASE COMPLETE
        │   GATE   │  work  │   GATE   │ artifacts  │   GATE   │
        └──────────┘        └──────────┘            └──────────┘
```

### Entry gate

**Purpose**: ensures the phase starts with a known-good state and all prerequisites met. Prevents starting work on top of broken foundations.

**Checked before any phase work begins**:

- Prior phase has cleared its **exit gate** (or this is Phase 0)
- Working tree is clean on the appropriate branch
- All baseline tests for the prior phase still pass
- Any external dependencies the phase needs are confirmed available
- Implementation lead has read the phase doc and the relevant sections of `plan.md`, `config-db-schema.md`, `driver-specs.md`, `driver-stability.md`, `admin-ui.md`

**Evidence captured**: a short markdown file `entry-gate-{phase}.md` recording the date, signoff, baseline test pass, and any deviations noted.

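The entry-gate checks and the evidence file described above could be captured with a small helper. The following is a minimal sketch in Python, chosen only for illustration (the plan itself calls for shell or PowerShell compliance scripts). The names `EntryGateCheck`, `entry_gate_cleared`, and `render_entry_gate_record`, as well as the layout of the generated record, are hypothetical and not defined by the plan.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Tuple


@dataclass(frozen=True)
class EntryGateCheck:
    """One entry-gate prerequisite and whether it was confirmed."""
    description: str
    passed: bool
    deviation: str = ""  # free-text note; empty when nothing needs recording


def entry_gate_cleared(checks: List[EntryGateCheck]) -> bool:
    """The gate clears only when every prerequisite passed."""
    return all(check.passed for check in checks)


def render_entry_gate_record(phase: str, signoff: str, on: date,
                             checks: List[EntryGateCheck]) -> Tuple[str, str]:
    """Return (filename, markdown body) for the evidence file.

    The record layout is an assumption; only the filename pattern
    entry-gate-{phase}.md comes from the plan.
    """
    lines = [
        f"# Entry gate: phase {phase}",
        "",
        f"- Date: {on.isoformat()}",
        f"- Signoff: {signoff}",
        f"- Cleared: {'yes' if entry_gate_cleared(checks) else 'no'}",
        "",
        "## Checks",
        "",
    ]
    for check in checks:
        mark = "x" if check.passed else " "
        note = f" (deviation: {check.deviation})" if check.deviation else ""
        lines.append(f"- [{mark}] {check.description}{note}")
    return f"entry-gate-{phase}.md", "\n".join(lines) + "\n"
```

Keeping the checks as data rather than prose means the same list can drive both the pass/fail decision and the written record, so the two cannot drift apart.
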
### Mid gate

**Purpose**: course-correct partway through the phase. Catches drift before it compounds. Optional for phases ≤ 2 weeks; required for longer phases.

**Checked at the midpoint**:

- Are the highest-risk deliverables landing on schedule?
- Have any new design questions surfaced that the v2 docs don't answer? If so, escalate to plan revision before continuing.
- Are tests being written alongside code, or accumulating as a backlog?
- Has any decision (`plan.md` decision log) been silently violated by the implementation? If so, either revise the implementation or revise the decision (with explicit "supersedes" entry).

**Evidence captured**: short status update appended to the phase doc.

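The "optional for phases ≤ 2 weeks" rule can be stated as a small function. This is a sketch only: it assumes "2 weeks" means 14 calendar days between the planned start and end dates, and the function names `mid_gate_required` and `mid_gate_date` are hypothetical.

```python
from datetime import date, timedelta
from typing import Optional

# Assumption: the 2-week threshold is measured in calendar days.
MID_GATE_THRESHOLD = timedelta(weeks=2)


def mid_gate_required(start: date, planned_end: date) -> bool:
    """True when the phase is longer than two weeks."""
    if planned_end < start:
        raise ValueError("planned_end must not precede start")
    return (planned_end - start) > MID_GATE_THRESHOLD


def mid_gate_date(start: date, planned_end: date) -> Optional[date]:
    """Midpoint of the phase, or None when the mid gate is optional."""
    if not mid_gate_required(start, planned_end):
        return None
    # date + timedelta ignores the sub-day part, so odd lengths round down.
    return start + (planned_end - start) / 2
```

For example, a phase planned for 1 April to 1 May 2026 is 30 days long, so the mid gate is required and falls on 16 April.
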
### Exit gate

**Purpose**: ensures the phase actually achieved what the v2 design specified, not just "the code compiles". This is where compliance verification happens.

**Checked before the phase is declared complete**:

- All **acceptance criteria** for every task in the phase doc are met (each criterion has explicit evidence)
- All **compliance checks** (see below) pass
- All **completion checklist** items are ticked, with links to the verifying artifact (test, screenshot, log line, etc.)
- Phase commit history is clean (no half-merged WIP, no skipped hooks)
- Documentation updates merged: any change in approach during the phase is reflected back in the v2 design docs (`plan.md` decision log gets new entries; `config-db-schema.md` updated if schema differed from spec; etc.)
- Adversarial review run on the phase output (`/codex:adversarial-review` or equivalent) — findings closed or explicitly deferred with rationale
- Implementation lead **and** one other reviewer sign off

**Evidence captured**: `exit-gate-{phase}.md` recording all of the above with links and signatures.

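The requirement that every completion-checklist item is ticked and linked to a verifying artifact lends itself to a mechanical check. The sketch below assumes the checklist is written as Markdown task-list items (`- [x] ...`) and that evidence is given as an inline Markdown link; both are assumptions about the phase-doc format, and `unverified_items` is a hypothetical name.

```python
import re
from typing import List

# A Markdown task-list item: "- [x] text" or "- [ ] text".
CHECKBOX = re.compile(r"^\s*[-*]\s+\[(?P<mark>[ xX])\]\s+(?P<text>.+)$")
# An inline Markdown link: "[label](target)".
LINK = re.compile(r"\[[^\]]+\]\([^)]+\)")


def unverified_items(markdown: str) -> List[str]:
    """List checklist items that are unticked or lack an evidence link."""
    problems = []
    for line in markdown.splitlines():
        match = CHECKBOX.match(line)
        if match is None:
            continue  # not a checklist line
        text = match.group("text").strip()
        if match.group("mark") == " ":
            problems.append(f"unticked: {text}")
        elif not LINK.search(text):
            problems.append(f"no evidence link: {text}")
    return problems
```

An empty result means the checklist condition is met; it does not verify that the linked artifact exists or actually demonstrates the item, which remains a reviewer judgment.
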
## Compliance Check Categories

Phase exit gates run compliance checks across these axes. Each phase doc enumerates the specific checks for that phase under "Compliance Checks".

### 1. Schema compliance (Phase 1+)

For phases that touch the central config DB:

- Run EF Core migrations against a clean SQL Server instance
- Diff the resulting schema against the DDL in `config-db-schema.md`:
  - Table list matches
  - Column types and nullability match
  - Indexes (regular + unique + filtered) match
  - CHECK constraints match
  - Foreign keys match
  - Stored procedures present and signatures match
- Any drift = blocking. Either fix the migration or update the schema doc with explicit reasoning, then re-run.

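The table and column part of the schema diff can be sketched as a comparison of two snapshots. This is an illustration under stated assumptions: the snapshots are plain dictionaries (how they are extracted from the migrated database and from the DDL in `config-db-schema.md` is out of scope here), the `Schema` shape and the `schema_drift` function are hypothetical, and indexes, CHECK constraints, foreign keys, and stored procedures would need analogous comparisons.

```python
from typing import Dict, List, Tuple

Column = Tuple[str, bool]              # (SQL type, is_nullable)
Schema = Dict[str, Dict[str, Column]]  # table -> column -> definition


def schema_drift(expected: Schema, actual: Schema) -> List[str]:
    """Describe every table/column difference; an empty list means no drift."""
    drift = []
    for table in sorted(expected.keys() - actual.keys()):
        drift.append(f"missing table: {table}")
    for table in sorted(actual.keys() - expected.keys()):
        drift.append(f"unexpected table: {table}")
    for table in sorted(expected.keys() & actual.keys()):
        exp_cols, act_cols = expected[table], actual[table]
        for col in sorted(exp_cols.keys() - act_cols.keys()):
            drift.append(f"{table}: missing column {col}")
        for col in sorted(act_cols.keys() - exp_cols.keys()):
            drift.append(f"{table}: unexpected column {col}")
        for col in sorted(exp_cols.keys() & act_cols.keys()):
            if exp_cols[col] != act_cols[col]:
                drift.append(
                    f"{table}.{col}: expected {exp_cols[col]}, got {act_cols[col]}"
                )
    return drift
```

Because any drift is blocking, a compliance script built on this idea would exit non-zero whenever the returned list is non-empty.
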
### 2. Decision compliance

For each decision number cited in the phase doc (`#XX` references to `plan.md` decision log):

- Locate the artifact (code module, test, configuration file) that demonstrates the decision is honored
- Add a code comment or test name that cites the decision number
- Phase exit gate uses a script (or grep) to verify every cited decision has at least one citation in the codebase

This makes the decision log a **load-bearing reference**, not a historical record.

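The "script (or grep)" check could look roughly like the sketch below. It assumes two citation conventions that the plan does not spell out: phase docs cite decisions as `#N`, and code cites them with the word "decision" followed by the number (for example `// Decision #128` in a comment or `Decision56_...` in a test name). All function names are hypothetical.

```python
import re
from pathlib import Path
from typing import Dict, Iterable, Set

# Assumed conventions; adjust to whatever the codebase actually uses.
PHASE_DOC_REF = re.compile(r"#(\d+)\b")
CODE_REF = re.compile(r"decision\s*[#_]?\s*(\d+)", re.IGNORECASE)


def decisions_in_phase_doc(phase_doc: str) -> Set[int]:
    """Decision numbers the phase doc cites as #N."""
    return {int(n) for n in PHASE_DOC_REF.findall(phase_doc)}


def decisions_in_code(sources: Dict[str, str]) -> Set[int]:
    """Decision numbers cited anywhere in the given path -> content map."""
    found: Set[int] = set()
    for content in sources.values():
        found |= {int(n) for n in CODE_REF.findall(content)}
    return found


def uncited_decisions(phase_doc: str, sources: Dict[str, str]) -> Set[int]:
    """Decisions the phase doc cites that no source file references."""
    return decisions_in_phase_doc(phase_doc) - decisions_in_code(sources)


def load_sources(root: str, suffixes: Iterable[str] = (".cs",)) -> Dict[str, str]:
    """Read every file under root with a matching suffix."""
    wanted = set(suffixes)
    return {
        str(path): path.read_text(encoding="utf-8", errors="ignore")
        for path in Path(root).rglob("*")
        if path.is_file() and path.suffix in wanted
    }
```

A non-empty result from `uncited_decisions` would fail the exit gate. The `#N` pattern is deliberately loose, so a phase doc that uses `#` followed by digits for other purposes would need a stricter pattern.
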
### 3. Visual compliance (Admin UI phases)

For phases that touch the Admin UI:

- Side-by-side screenshots of equivalent ScadaLink CentralUI screens vs the new OtOpcUa Admin screens
- Login page, sidebar, dashboard, generic forms — must visually match per `admin-ui.md` §"Visual Design — Direct Parity with ScadaLink"
- Reviewer signoff: "could the same operator move between apps without noticing?"

### 4. Behavioral compliance (end-to-end smoke tests)

For each phase, an integration test exercises the new capability end-to-end:

- Phase 0: existing v1 IntegrationTests pass under the renamed projects
- Phase 1: create a cluster → publish a generation → node fetches the generation → roll back → fetch again
- Phase 2: v1 IntegrationTests parity suite passes against the v2 Galaxy.Host (per decision #56)
- Phase 3+: per-driver smoke test against the simulator

Smoke tests are **always green at exit**, never "known broken, fix later".

### 5. Stability compliance (Phase 2+ for Tier C drivers)

For phases that introduce Tier C drivers (Galaxy in Phase 2, FOCAS in Phase 5):

- All `Driver Stability & Isolation` cross-cutting protections from `driver-stability.md` §"Cross-Cutting Protections" are wired up:
  - SafeHandle wrappers exist for every native handle
  - Memory watchdog runs and triggers recycle on threshold breach (testable via FaultShim)
  - Crash-loop circuit breaker fires after 3 crashes / 5 min (testable via stub-injected crash)
  - Heartbeat between proxy and host functions; missed heartbeats trigger respawn
  - Post-mortem MMF survives a hard process kill and the supervisor reads it on respawn
- Each protection has a regression test in the driver's test suite

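The "3 crashes / 5 min" rule for the crash-loop circuit breaker is a sliding-window count. The sketch below shows that logic in Python purely for illustration; the real breaker would live in the .NET supervisor, and the class name, the injected clock value, and the choice to open on the third crash (rather than after it) are assumptions, not taken from `driver-stability.md`.

```python
from collections import deque
from typing import Deque


class CrashLoopBreaker:
    """Opens when max_crashes occur within window_seconds of each other."""

    def __init__(self, max_crashes: int = 3, window_seconds: float = 300.0):
        self._max = max_crashes
        self._window = window_seconds
        self._crashes: Deque[float] = deque()  # crash times, oldest first

    def record_crash(self, now: float) -> bool:
        """Record a crash at time `now` (seconds); return True if the breaker opens.

        The timestamp is passed in rather than read from a clock so that a
        test can inject crashes at chosen times.
        """
        self._crashes.append(now)
        # Drop crashes that have aged out of the window.
        while self._crashes and now - self._crashes[0] > self._window:
            self._crashes.popleft()
        return len(self._crashes) >= self._max
```

Passing time in as a parameter mirrors the "testable via stub-injected crash" requirement: a regression test can simulate three crashes inside five minutes without waiting.
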
### 6. Documentation compliance

For every phase:

- Any deviation from the v2 design docs (`plan.md`, `config-db-schema.md`, `admin-ui.md`, `driver-specs.md`, `driver-stability.md`, `test-data-sources.md`) is reflected back in the docs
- New decisions added to the decision log with rationale
- Old decisions superseded explicitly (not silently)
- Cross-references between docs stay current

## Deliverable Types

Each phase produces a defined set of deliverables. The phase doc enumerates which deliverables apply.

| Type | Format | Purpose |
|------|--------|---------|
| **Code** | Source files committed to a feature branch, merged to `v2` after exit gate | The implementation itself |
| **Tests** | xUnit unit + integration tests; per-phase smoke tests | Behavioral evidence |
| **Migrations** | EF Core migrations under `Configuration/Migrations/` | Schema delta |
| **Decision-log entries** | New rows appended to `plan.md` decision table | Architectural choices made during the phase |
| **Doc updates** | Edits to existing v2 docs | Keep design and implementation aligned |
| **Gate records** | `entry-gate-{phase}.md`, `exit-gate-{phase}.md` in this folder | Audit trail of gate clearance |
| **Compliance script** | Per-phase shell or PowerShell script that runs the compliance checks | Repeatable verification |
| **Adversarial review** | `/codex:adversarial-review` output on the phase diff | Independent challenge |

## Branch and PR Conventions

| Branch | Purpose |
|--------|---------|
| `v2` | Long-running design + implementation branch. All phase work merges here. |
| `v2/phase-{N}-{slug}` | Per-phase feature branch (e.g. `v2/phase-0-rename`) |
| `v2/phase-{N}-{slug}-{subtask}` | Per-subtask branches when the phase is large enough to warrant them |

Each phase merges to `v2` via PR after the exit gate clears. PRs include:

- Link to the phase implementation doc
- Link to the exit-gate record
- Compliance-script output
- Adversarial-review output
- Reviewer signoffs

The `master` branch stays at v1 production state until all phases are complete and a separate v2 release decision is made.

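The branch naming scheme above can be checked with a single pattern. This sketch assumes slugs and subtasks are lowercase alphanumeric words joined by hyphens, which the plan does not state explicitly. Because both parts are hyphen-separated, the pattern cannot tell where the slug ends and the subtask begins, so it validates them together. The function names are hypothetical.

```python
import re
from typing import Optional

# "v2" alone, or "v2/phase-{N}-{slug}" with an optional "-{subtask}" tail.
# Assumption: slug/subtask words are lowercase letters and digits.
BRANCH = re.compile(
    r"v2(?:/phase-(?P<phase>\d+)-(?P<rest>[a-z0-9]+(?:-[a-z0-9]+)*))?"
)


def is_valid_branch(name: str) -> bool:
    """True when the name follows the v2 branch conventions."""
    return BRANCH.fullmatch(name) is not None


def phase_of(name: str) -> Optional[int]:
    """Phase number of a phase branch, or None for the long-running v2 branch."""
    match = BRANCH.fullmatch(name)
    if match is None:
        raise ValueError(f"not a v2 branch name: {name!r}")
    phase = match.group("phase")
    return int(phase) if phase is not None else None
```

A check like this could run in a pre-push hook or in CI, so that PRs targeting `v2` always come from a branch that maps back to a phase doc.
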
## What Counts as "Following the Plan"

The implementation **follows the plan** when, at every phase exit gate:

1. Every task listed in the phase doc has been done OR explicitly deferred with rationale
2. Every compliance check has a passing artifact OR an explicit deviation note signed off by the reviewer
3. The codebase contains traceable references to every decision number the phase implements
4. The v2 design docs are updated to reflect any approach changes
5. The smoke test for the phase passes
6. Two people have signed off — implementation lead + one other reviewer

The implementation **deviates from the plan** when any of those conditions fails. Deviations are not failures; they are signals to update the plan or revise the implementation. The unrecoverable failure mode is **silent deviation** — code that doesn't match the plan, with no decision-log update explaining why. The exit gate's compliance checks exist specifically to make silent deviation impossible to ship.

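The distinction between an explicit deferral and a silent deviation can be made concrete for condition 1. The sketch below is illustrative only: the `Task` shape, the `TaskState` values, and the function names are hypothetical, and it covers only the task condition, not the other five.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class TaskState(Enum):
    DONE = "done"
    DEFERRED = "deferred"
    OPEN = "open"


@dataclass(frozen=True)
class Task:
    name: str
    state: TaskState
    rationale: str = ""  # required when the task is deferred


def silent_deviations(tasks: List[Task]) -> List[str]:
    """Names of tasks that are neither done nor deferred with a rationale."""
    return [
        task.name
        for task in tasks
        if task.state is TaskState.OPEN
        or (task.state is TaskState.DEFERRED and not task.rationale.strip())
    ]


def tasks_follow_plan(tasks: List[Task]) -> bool:
    """Condition 1 holds when no task is a silent deviation."""
    return not silent_deviations(tasks)
```

A deferred task with a written rationale passes; the same task with an empty rationale is reported, which is the case the exit gate is meant to catch.
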
## Phase Implementation Docs

| Phase | Doc | Status |
|-------|-----|--------|
| 0 | [`phase-0-rename-and-net10.md`](phase-0-rename-and-net10.md) | DRAFT |
| 1 | [`phase-1-configuration-and-admin-scaffold.md`](phase-1-configuration-and-admin-scaffold.md) | DRAFT |
| 2 | [`phase-2-galaxy-out-of-process.md`](phase-2-galaxy-out-of-process.md) | DRAFT |
| 3 | (Phase 3: Modbus TCP driver — TBD) | NOT STARTED |
| 4 | (Phase 4: PLC drivers AB CIP / AB Legacy / S7 / TwinCAT — TBD) | NOT STARTED |
| 5 | (Phase 5: Specialty drivers FOCAS / OPC UA Client — TBD) | NOT STARTED |
| 6+ | (Phases 6–8: tier 1/2/3 consumer cutover — separate planning track per corrections doc C5) | NOT SCOPED |