After shipping the four Phase 6 plan drafts (PRs 77-80), the adversarial-review
adjustments lived only as trailing "Review" sections. An implementer reading
Stream A would find the original unadjusted guidance, then have to cross-reference
the review to reconcile. This PR makes the plans genuinely executable:
1. Merges every ACCEPTed review finding into the actual Scope / Stream / Compliance
sections of each phase plan:
- phase-6-1: Scope table rewrite (per-capability retry, (instance,host) pipeline key,
MemoryTracking vs MemoryRecycle split, hybrid watchdog formula, demand-aware
wedge detector, generation-sealed LiteDB). Streams A/B/D + Compliance rewritten.
- phase-6-2: AuthorizationDecision tri-state, control/data-plane separation,
MembershipFreshnessInterval (15 min), AuthCacheMaxStaleness (5 min),
subscription stamp-and-reevaluate. Stream C widened to 11 OPC UA operations.
- phase-6-3: 8-state ServiceLevel matrix (OPC UA Part 5 §6.3.34-compliant),
two-layer peer probe (/healthz + UaHealthProbe), apply-lease via await using,
publish-generation fencing, InvalidTopology runtime state, ServerUriArray
self-first + peers. New Stream F (interop matrix + Galaxy failover).
- phase-6-4: DraftRevisionToken concurrency control, staged-import via
EquipmentImportBatch with user-scoped visibility, CSV header version marker,
decision-#117-aligned identifier columns, 1000-row diff cap,
decision-#139 OPC 40010 fields, Identification inherits Equipment ACL.
2. Appends decisions #143 through #162 to docs/v2/plan.md capturing the
architectural commitments the adjustments created. Each decision carries its
dated rationale so future readers know why the choice was made.
3. Scaffolds scripts/compliance/phase-6-{1,2,3,4}-compliance.ps1 — PowerShell
stubs with Assert-Todo / Assert-Pass / Assert-Fail helpers. Every check
maps to a Stream task ID from the corresponding phase plan. Currently all
checks are TODO and scripts exit 0; each implementation task is responsible
for replacing its TODO with a real check before closing that task. Saved
as UTF-8 with BOM so Windows PowerShell 5.1 parses em-dash characters
without breaking.
Net result: the Phase 6.1 plan is genuinely ready to execute. Stream A.3 can
start tomorrow without reconciling Streams vs. Review on every task; the
compliance script is wired to the Stream IDs; plan.md has the architectural
commitments that justify the Stream choices.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
OtOpcUa-side requirements all met or trivially met by v2: Basic256Sha256 + SignAndEncrypt + username token (transport security covers this), reject-and-trust cert workflow, endpoint URL must NOT include /discovery suffix (forum-documented failure mode), hostname-stable certs (decision #86 already enforces this since clients pin trust to ApplicationUri), OI Gateway service must NOT run under SYSTEM (deployment-guide concern). Two integrator-burden risks tracked: validation/GxP paperwork (no AVEVA blueprint exists for non-AVEVA upstream servers in Part 11 deployments — engage QA/regulatory in Year 1) and unpublished scale benchmarks (in-house benchmark required in Year 2 before cutover scheduling).
Phase 1 acceptance gains Task E.10 (decision #142): end-to-end AppServer-via-OI-Gateway smoke test against a Phase 1 OtOpcUa instance, catching AppServer-specific quirks (cert exchange, endpoint URL handling, service account, security mode combo) well before the Year 3 tier-3 cutover schedule. Non-blocking for Phase 1 exit if it surfaces only documentation-level fixes; blocking if it surfaces architectural incompatibility.
New file `docs/v2/aveva-system-platform-io-research.md` captures the full research with all source citations (AVEVA docs, Communications Drivers Pack readmes, Software Toolbox / InSource partner walkthroughs, Inductive Automation forum failure-mode reports). Plan.md decision log gains #141 and #142; Reference Documents section links the new doc.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ACL design defines NodePermissions bitmask flags covering Browse / Read / Subscribe / HistoryRead / WriteOperate / WriteTune / WriteConfigure / AlarmRead / AlarmAcknowledge / AlarmConfirm / AlarmShelve / MethodCall plus common bundles (ReadOnly / Operator / Engineer / Admin); 6-level scope hierarchy (Cluster / Namespace / UnsArea / UnsLine / Equipment / Tag) with default-deny + additive grants and Browse-implication on ancestors; per-LDAP-group grants in a new generation-versioned NodeAcl table edited via the same draft → diff → publish → rollback boundary as every other content table; per-session permission-trie evaluator with O(depth × group-count) cost cached for the lifetime of the session and rebuilt on generation-apply or LDAP group cache expiry; cluster-create workflow seeds a default ACL set matching the v1 LmxOpcUa LDAP-role-to-permission map for v1 → v2 consumer migration parity; Admin UI ACL tab with two views (by LDAP group, by scope), bulk-grant flow, and permission simulator that lets operators preview "as user X" effective permissions across the cluster's UNS tree before publishing; explicit Deny deferred to v2.1 since verbose grants suffice at v2.0 fleet sizes; only denied OPC UA operations are audit-logged (not allowed ones — would dwarf the audit log). Schema doc gains the NodeAcl table with cross-cluster invariant enforcement and same-generation FK validation; admin-ui.md gains the ACLs tab; phase-1 doc gains Task E.9 wiring this through Stream E plus a NodeAcl entry in Task B.1's DbContext list.
Dev-environment doc inventories every external resource the v2 build needs across two tiers per decision #99 — inner-loop (in-process simulators on developer machines: SQL Server local or container, GLAuth at C:\publish\glauth\, local dev Galaxy) and integration (one dedicated Windows host with Docker Desktop on WSL2 backend so TwinCAT XAR VM can run in Hyper-V alongside containerized oitc/modbus-server, plus WSL2-hosted Snap7 and ab_server, plus OPC Foundation reference server, plus FOCAS TestStub and FaultShim) — with concrete container images, ports, default dev credentials (clearly marked dev-only since production uses Integrated Security / gMSA per decision #46), bootstrap order for both tiers, network topology diagram, test data seed locations, and operational risks (TwinCAT trial expiry automation, Docker pricing, integration host SPOF mitigation, per-developer GLAuth config sync, Aveva license scoping that keeps Galaxy tests on developer machines and off the shared host).
Removes consumer cutover (ScadaBridge / Ignition / System Platform IO) from OtOpcUa v2 scope per decision #136 — owned by a separate integration / operations team, tracked in 3-year-plan handoff §"Rollout Posture" and corrections §C5; OtOpcUa team's scope ends at Phase 5. Updates implementation/overview.md phase index to drop the "6+" row and add an explicit "OUT of v2 scope" callout; updates phase-1 and phase-2 docs to reframe cutover as integration-team-owned rather than future-phase numbered.
Decisions #129–137 added: ACL model (#129), NodeAcl generation-versioned (#130), v1-compatibility seed (#131), denied-only audit logging (#132), two-tier dev environment (#133), Docker WSL2 backend for TwinCAT VM coexistence (#134), TwinCAT VM centrally managed / Galaxy on dev machines only (#135), cutover out of v2 scope (#136), dev credentials documented openly (#137).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Add Phase 2 detailed implementation plan (docs/v2/implementation/phase-2-galaxy-out-of-process.md) covering the largest refactor phase — moving Galaxy from the legacy in-process OtOpcUa.Host project into the Tier C out-of-process topology specified in driver-stability.md. Five work streams: A. Driver.Galaxy.Shared (.NET Standard 2.0 IPC contracts using MessagePack with hello-message version negotiation), B. Driver.Galaxy.Host (.NET 4.8 x86 separate Windows service that owns MxAccessBridge / GalaxyRepository / alarm tracking / GalaxyRuntimeProbeManager / Wonderware Historian SDK / STA thread + Win32 message pump with health probe / MxAccessHandle SafeHandle for COM lifetime / subscription registry with cross-host quality scoping / named-pipe IPC server with mandatory ACL + caller SID verification + per-process shared secret / memory watchdog with Galaxy-specific 1.5x baseline + 200MB floor + 1.5GB ceiling / recycle policy with 15s grace + WM_QUIT escalation to hard-exit / post-mortem MMF writer / Driver.Galaxy.FaultShim test-only assembly), C. Driver.Galaxy.Proxy (.NET 10 in-process driver implementing every capability interface, heartbeat sender on dedicated channel with 2s/3-miss tolerance, supervisor with respawn-with-backoff and crash-loop circuit breaker with escalating cooldown 1h/4h/24h, address space build via IAddressSpaceBuilder producing byte-equivalent v1 output), D. Retire legacy OtOpcUa.Host (delete from solution, two-service Windows installer, migrate appsettings.json Galaxy sections to central DB DriverConfig blob), E. Parity validation (v1 IntegrationTests pass count = baseline failures = 0, scripted Client.CLI walkthrough output diff vs v1 only differs in timestamps/latency, four named regression tests for the 2026-04-13 stability findings). Compliance script verifies all eight Tier C cross-cutting protections have named passing tests. Decision #128 captures the survey-removal; cross-references added to plan.md Reference Documents and overview.md phase index.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>