Confirm the v2 driver list as fixed (decision #128) and remove the Equipment Protocol Survey from the v2 prerequisites — the seven committed drivers (Modbus TCP including DL205, AB CIP, AB Legacy, S7, TwinCAT, FOCAS, OPC UA Client) plus Galaxy/MXAccess are confirmed by direct knowledge of the equipment estate (TwinCAT and AB Legacy specifically called out by the OtOpcUa team based on known Beckhoff installations and SLC/MicroLogix legacy equipment); the survey may still inform long-tail driver scoping and per-site capacity planning but adding/removing drivers from the v2 implementation list is now out of scope. Phase-1 implementation doc loses the survey row from its Out-of-Scope table.

Add Phase 2 detailed implementation plan (docs/v2/implementation/phase-2-galaxy-out-of-process.md) covering the largest refactor phase: moving Galaxy from the legacy in-process OtOpcUa.Host project into the Tier C out-of-process topology specified in driver-stability.md. Five work streams:

A. Driver.Galaxy.Shared: .NET Standard 2.0 IPC contracts using MessagePack with hello-message version negotiation.
B. Driver.Galaxy.Host: .NET 4.8 x86 separate Windows service that owns:
   - MxAccessBridge
   - GalaxyRepository
   - alarm tracking
   - GalaxyRuntimeProbeManager
   - Wonderware Historian SDK
   - STA thread + Win32 message pump with health probe
   - MxAccessHandle SafeHandle for COM lifetime
   - subscription registry with cross-host quality scoping
   - named-pipe IPC server with mandatory ACL + caller SID verification + per-process shared secret
   - memory watchdog with Galaxy-specific 1.5x baseline + 200MB floor + 1.5GB ceiling
   - recycle policy with 15s grace + WM_QUIT escalation to hard-exit
   - post-mortem MMF writer
   - Driver.Galaxy.FaultShim test-only assembly
C. Driver.Galaxy.Proxy: .NET 10 in-process driver implementing every capability interface, heartbeat sender on a dedicated channel with 2s/3-miss tolerance, supervisor with respawn-with-backoff and crash-loop circuit breaker with escalating cooldown 1h/4h/24h, address space build via IAddressSpaceBuilder producing byte-equivalent v1 output.
D. Retire legacy OtOpcUa.Host: delete from solution, two-service Windows installer, migrate appsettings.json Galaxy sections to central DB DriverConfig blob.
E. Parity validation: v1 IntegrationTests pass count = baseline and failures = 0, scripted Client.CLI walkthrough output diff vs v1 differs only in timestamps/latency, four named regression tests for the 2026-04-13 stability findings.

Compliance script verifies all eight Tier C cross-cutting protections have named passing tests.
Decision #128 captures the survey-removal; cross-references added to plan.md Reference Documents and overview.md phase index.
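
For orientation, the supervisor timing figures quoted for work stream C (2s heartbeat, 3-miss tolerance, 1h/4h/24h escalating cooldown) can be sketched as below. This is a minimal illustration in Python, not the project's .NET code. The function names, the "dead after three full intervals without an ack" reading, and the choice to stay at 24h after the third trip are all assumptions, not taken from the Phase 2 doc.

```python
# Illustrative sketch only: names and exact semantics are assumptions.
HEARTBEAT_INTERVAL_S = 2.0   # from the commit message: 2s heartbeat
MISSED_BEATS_TOLERATED = 3   # from the commit message: 3-miss tolerance
COOLDOWNS_S = (1 * 3600, 4 * 3600, 24 * 3600)  # 1h / 4h / 24h


def host_is_dead(seconds_since_last_ack: float) -> bool:
    """Assumed reading: the host is declared dead once three full
    heartbeat intervals have elapsed without an acknowledgement."""
    return seconds_since_last_ack >= HEARTBEAT_INTERVAL_S * MISSED_BEATS_TOLERATED


def cooldown_for_trip(trip_number: int) -> int:
    """Cooldown in seconds for the Nth circuit-breaker trip (1-based).
    Assumption: trips beyond the third stay at the 24h cooldown."""
    if trip_number < 1:
        raise ValueError("trip_number is 1-based")
    index = min(trip_number, len(COOLDOWNS_S)) - 1
    return COOLDOWNS_S[index]
```

Under these assumptions the proxy would treat the host as unresponsive after 6 seconds of silence, and a third consecutive crash loop would hold respawns for a full day.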

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Joseph Doherty
2026-04-17 11:35:04 -04:00
parent 592fa79e3c
commit 2a6c9828e4
4 changed files with 508 additions and 2 deletions

@@ -892,6 +892,7 @@ Each step leaves the system runnable. The generic extraction is effectively free
| 125 | `Equipment.EquipmentId` is system-generated (`'EQ-' + first 12 hex chars of EquipmentUuid`), never operator-supplied or editable, never in CSV imports | Operator-supplied IDs are a real corruption path: typos and bulk-import renames mint new EquipmentIds, which then get new UUIDs even when the physical asset is the same. That permanently splits downstream joins keyed on EquipmentUuid. Removing operator authoring of EquipmentId eliminates the failure mode entirely. CSV imports match by EquipmentUuid (preferred) for updates; rows without UUID create new equipment with system-generated identifiers. Explicit Merge / Rebind operator flow handles the rare case where two UUIDs need to be reconciled. (Closes adversarial review 2026-04-17 finding #4, supersedes part of #116) | 2026-04-17 |
| 126 | Three-gate model (entry / mid / exit) for every implementation phase, with explicit compliance-check categories | Specified in `implementation/overview.md`. Categories: schema compliance (DB matches the doc), decision compliance (every decision number has a code/test citation), visual compliance (Admin UI parity with ScadaLink), behavioral compliance (per-phase smoke test), stability compliance (cross-cutting protections wired up for Tier C drivers), documentation compliance (any deviation reflected back in v2 docs). Exit gate requires two-reviewer signoff; silent deviation is the failure mode the gates exist to prevent | 2026-04-17 |
| 127 | Per-phase implementation docs live under `docs/v2/implementation/` with structured task / acceptance / compliance / completion sections | Each phase doc enumerates: scope (in / out), entry gate checklist, task breakdown with per-task acceptance criteria, compliance checks (script-runnable), behavioral smoke test, completion checklist. Phase 0 + Phase 1 docs are committed; Phases 2–8 land as their predecessors clear exit gates | 2026-04-17 |
| 128 | Driver list is fixed for v2.0 — Equipment Protocol Survey is NOT a prerequisite | The seven committed drivers (Modbus TCP including DL205, AB CIP, AB Legacy, S7, TwinCAT, FOCAS, OPC UA Client) plus the existing Galaxy/MXAccess driver are confirmed by direct knowledge of the equipment estate, not pending the formal survey. Supersedes the corrections-doc concern (C1) that the v2 commitment was made pre-survey. The survey may still produce useful inventory data for downstream planning (capacity, prioritization), but adding or removing drivers from the v2 implementation list is out of scope. Closes corrections-doc C1 | 2026-04-17 |
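
The derivation rule in decision 125 (`'EQ-' + first 12 hex chars of EquipmentUuid`) can be sketched as follows. This is a hedged illustration in Python, not the project's implementation. The function name is hypothetical, and both the lowercase casing and the stripping of hyphens before taking the first 12 characters are assumptions the decision text does not spell out.

```python
import uuid


def equipment_id_from_uuid(equipment_uuid: uuid.UUID) -> str:
    """Derive a system-generated EquipmentId from an EquipmentUuid.

    Assumption: "hex chars" means the 32-character hyphen-free form,
    in lowercase, which is what uuid.UUID.hex returns.
    """
    return "EQ-" + equipment_uuid.hex[:12]


example = equipment_id_from_uuid(uuid.UUID("12345678-9abc-def0-1234-56789abcdef0"))
# example == "EQ-123456789abc"
```

Because the identifier is a pure function of the UUID, an operator typo or a bulk-import rename cannot mint a new EquipmentId for an existing asset, which is the failure mode the decision closes.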
## Reference Documents
@@ -903,6 +904,7 @@ Each step leaves the system runnable. The generic extraction is effectively free
- **[Implementation Plan Overview](implementation/overview.md)** — phase gate structure (entry / mid / exit), compliance check categories (schema / decision / visual / behavioral / stability / documentation), deliverable conventions, "what counts as following the plan"
- **[Phase 0 — Rename + .NET 10 cleanup](implementation/phase-0-rename-and-net10.md)** — mechanical LmxOpcUa → OtOpcUa rename with full task breakdown, compliance checks, completion checklist
- **[Phase 1 — Configuration + Core.Abstractions + Admin scaffold](implementation/phase-1-configuration-and-admin-scaffold.md)** — central MSSQL schema, EF Core migrations, stored procs, LDAP-authenticated Blazor Server admin app with ScadaLink visual parity, LiteDB local cache, generation-diff applier; 5 work streams (A–E), full task breakdown, compliance checks, 14-step end-to-end smoke test
- **[Phase 2 — Galaxy out-of-process refactor (Tier C)](implementation/phase-2-galaxy-out-of-process.md)** — split legacy in-process Galaxy into `Driver.Galaxy.Shared` (.NET Standard 2.0 IPC contracts) + `Driver.Galaxy.Host` (.NET 4.8 x86 separate Windows service with STA pump, COM SafeHandle wrappers, named-pipe IPC with mandatory ACL, memory watchdog, scheduled recycle with WM_QUIT escalation, post-mortem MMF, FaultShim) + `Driver.Galaxy.Proxy` (.NET 10 in-process IDriver implementation with heartbeat sender and crash-loop circuit-breaker supervisor); retire legacy `OtOpcUa.Host` project; parity gate is v1 IntegrationTests + scripted Client.CLI walkthrough passing byte-equivalent to v1; closes the four 2026-04-13 stability findings as named regression tests
## Out of Scope / Deferred