From 3b2d0474a77d914c4b4bf402f2c2f4d4a33cb7b2 Mon Sep 17 00:00:00 2001
From: Joseph Doherty
Date: Sun, 19 Apr 2026 10:32:21 -0400
Subject: [PATCH] =?UTF-8?q?v2=20release-readiness=20capstone=20=E2=80=94?=
 =?UTF-8?q?=20aggregate=20compliance=20runner=20+=20release-readiness=20da?=
 =?UTF-8?q?shboard?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Closes out Phase 6 with the two pieces a release engineer needs before
tagging v2 GA:

1. scripts/compliance/phase-6-all.ps1 — meta-runner that invokes every
   per-phase Phase 6.N compliance script in sequence and aggregates the
   results. Each sub-script runs in its own powershell.exe child process
   so per-script $ErrorActionPreference and exit semantics can't
   interfere with the parent. Exit 0 = every phase passes; exit 1 = one
   or more phases failed. Prints a PASS/FAIL summary matrix at the end.

2. docs/v2/v2-release-readiness.md — single-view dashboard of everything
   shipped, everything still deferred, and the release exit criteria.
   Called out explicitly:

   - Three release BLOCKERS (must close before v2 GA):
     * Phase 6.2 Stream C dispatch wiring — AuthorizationGate exists but
       no DriverNodeManager Read/Write/etc. path calls it (task #143).
     * Phase 6.1 Stream D follow-up — ResilientConfigReader +
       sealed-cache hook not yet consumed by any read path (task #136).
     * Phase 6.3 Streams A/C/F — coordinator + UA-node wiring + client
       interop still deferred (tasks #145, #147, #150).
   - Three nice-to-haves (not release-blocking) — Admin UI polish,
     background services, multi-host dispatch.
   - Release exit criteria: all 4 compliance scripts exit 0, dotnet test
     with ≤ 1 known flake, blockers closed or v2.1-deferred with a
     written decision, Fleet Admin signoff on the deployment checklist,
     live-Galaxy smoke test, OPC UA CTT pass, and a redundancy cutover
     validated with at least one production client.
   - Change log at the bottom so future ships of deferred follow-ups
     just append dates and close out dashboard rows.
Meta-runner verified locally:

  Phase 6.1 — PASS
  Phase 6.2 — PASS
  Phase 6.3 — PASS
  Phase 6.4 — PASS
  Aggregate: PASS (elapsed 340 s — most of that is the full solution
  `dotnet test` each phase runs).

Net counts at capstone time: 906 baseline → 1159 passing across Phase 6
(+253). 15 deferred follow-up tasks tracked with IDs (#134-137,
#143-144, #145, #147, #149-150, #153, #155-157).

v2 is NOT YET release-ready — the capstone makes that explicit rather
than letting the "shipped" label on each phase imply full readiness.

Co-Authored-By: Claude Opus 4.7 (1M context)
---
 docs/v2/v2-release-readiness.md    | 102 +++++++++++++++++++++++++++++
 scripts/compliance/phase-6-all.ps1 |  77 ++++++++++++++++++++++
 2 files changed, 179 insertions(+)
 create mode 100644 docs/v2/v2-release-readiness.md
 create mode 100644 scripts/compliance/phase-6-all.ps1

diff --git a/docs/v2/v2-release-readiness.md b/docs/v2/v2-release-readiness.md
new file mode 100644
index 0000000..7effcd4
--- /dev/null
+++ b/docs/v2/v2-release-readiness.md
@@ -0,0 +1,102 @@
+# v2 Release Readiness
+
+> **Last updated**: 2026-04-19 (Phase 6.4 data layer merged)
+> **Status**: **NOT YET RELEASE-READY** — four Phase 6 data-layer ships have landed, but several production-path wirings are still deferred.
+
+This doc is the single view of where v2 stands against its release criteria. Update it whenever a deferred follow-up closes or a new release blocker is discovered.
+
+## Release-readiness dashboard
+
+| Phase | Shipped | Status |
+|---|---|---|
+| Phase 0 — Rename + entry gate | ✓ | Shipped |
+| Phase 1 — Configuration + Admin scaffold | ✓ | Shipped (some UI items deferred to 6.4) |
+| Phase 2 — Galaxy driver split (Proxy/Host/Shared) | ✓ | Shipped |
+| Phase 3 — OPC UA server + LDAP + security profiles | ✓ | Shipped |
+| Phase 4 — Redundancy scaffold (entities + endpoints) | ✓ | Shipped (runtime closes in 6.3) |
+| Phase 5 — Drivers | ⚠ partial | Galaxy / Modbus / S7 / OpcUaClient shipped; AB CIP / AB Legacy / TwinCAT / FOCAS deferred (task #120) |
+| Phase 6.1 — Resilience & Observability | ✓ | **SHIPPED** (PRs #78–83) |
+| Phase 6.2 — Authorization runtime | ◐ core | **SHIPPED (core)** (PRs #84–88); dispatch wiring + Admin UI deferred |
+| Phase 6.3 — Redundancy runtime | ◐ core | **SHIPPED (core)** (PRs #89–90); coordinator + UA-node wiring + Admin UI + interop deferred |
+| Phase 6.4 — Admin UI completion | ◐ data layer | **SHIPPED (data layer)** (PRs #91–92); Blazor UI + OPC 40010 address-space wiring deferred |
+
+**Aggregate test counts:** 906 baseline (pre-Phase-6) → **1159 passing** across Phase 6. One pre-existing Client.CLI `SubscribeCommandTests.Execute_PrintsSubscriptionMessage` flake tracked separately.
+
+## Release blockers (must close before v2 GA)
+
+Ordered by severity + impact on production fitness.
+
+### Security — Phase 6.2 dispatch wiring (task #143)
+
+The `AuthorizationGate` + `IPermissionEvaluator` + `PermissionTrie` stack is fully built and unit-tested, **but no dispatch path in `DriverNodeManager` actually calls it**. Every OPC UA Read / Write / HistoryRead / Browse / Call / CreateMonitoredItems on the live server currently runs through the pre-Phase-6.2 code path (which gates Write via `WriteAuthzPolicy` only — no per-tag ACL).
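+
+A minimal sketch of the intended gate call at the write dispatch point. The handler name, the simplified signature, and the field names below are assumptions for illustration (the real OPC UA SDK delegate also carries index-range / status-code / timestamp parameters):
+
+```csharp
+// Hypothetical write handler inside DriverNodeManager — illustrative only,
+// sketched against the assumed IPermissionEvaluator/AuthorizationGate surface.
+private ServiceResult HandleWrite(IUserIdentity identity, NodeId nodeId, object value)
+{
+    NodeScope scope = _scopeResolver.Resolve(nodeId);        // fullRef → scope, cached per generation
+    if (!_gate.IsAllowed(identity, Operation.Write, scope))  // per-tag ACL check, deny by default in strict mode
+    {
+        return StatusCodes.BadUserAccessDenied;              // deny before touching the driver
+    }
+    return WriteToDriver(nodeId, value);                     // pre-existing dispatch path
+}
+```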
+
+Closing this requires:
+
+- Thread `AuthorizationGate` through `OpcUaApplicationHost → OtOpcUaServer → DriverNodeManager` (the same plumbing path Phase 6.1's `DriverResiliencePipelineBuilder` took).
+- Build a `NodeScopeResolver` that maps `fullRef → NodeScope` via a live DB lookup of the tag's UnsArea / UnsLine / Equipment path. Cache per generation.
+- Call `gate.IsAllowed(identity, operation, scope)` in OnReadValue / OnWriteValue / the four HistoryRead paths / Browse / Call / Acknowledge/Confirm/Shelve / CreateMonitoredItems / TransferSubscriptions.
+- Stamp MonitoredItems with `(AuthGenerationId, MembershipVersion)` per decision #153 so revoked grants surface `BadUserAccessDenied` within one publish cycle.
+- Run a 3-user integration matrix covering each operation × allow/deny.
+
+**Strict-mode default**: start lax (`Authorization:StrictMode = false`) during rollout so deployments without populated ACLs keep working. Flip to strict once ACL seeding lands for production clusters.
+
+### Config fallback — Phase 6.1 Stream D wiring (task #136)
+
+`ResilientConfigReader` + `GenerationSealedCache` + `StaleConfigFlag` all exist, but nothing consumes them. The `NodeBootstrap` path still uses the original single-file `LiteDbConfigCache` via `ILocalConfigCache`; `sp_PublishGeneration` doesn't call `GenerationSealedCache.SealAsync` after commit; and the Configuration read services don't wrap queries in `ResilientConfigReader.ReadAsync`.
+
+Closing this requires:
+
+- `sp_PublishGeneration` (or its EF-side wrapper) calls `SealAsync` after a successful commit.
+- DriverInstance enumeration, LdapGroupRoleMapping fetches, and cluster + namespace metadata reads all route through `ResilientConfigReader.ReadAsync`.
+- Integration test: SQL container killed mid-operation → the server serves the sealed snapshot, `UsingStaleConfig` = true, the driver stays Healthy, and the `/healthz` body reflects the flag.
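+
+The intended consumption shape, sketched with assumed signatures (a `ReadAsync` taking a cache key plus a live-query delegate is an assumption for illustration; the real surface is in the Phase 6.1 Stream D code):
+
+```csharp
+// Hypothetical call site in a Configuration read service — illustrative only.
+public async Task<IReadOnlyList<DriverInstance>> GetDriverInstancesAsync(CancellationToken ct)
+{
+    // On SQL failure, ReadAsync is expected to fall back to the generation-sealed
+    // LiteDB snapshot and raise UsingStaleConfig instead of throwing.
+    return await _resilientReader.ReadAsync(
+        "driver-instances",
+        () => _db.DriverInstances.AsNoTracking().ToListAsync(ct),
+        ct);
+}
+```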
+
+### Redundancy — Phase 6.3 Streams A/C/F (tasks #145, #147, #150)
+
+`ServiceLevelCalculator` + `RecoveryStateManager` + `ApplyLeaseRegistry` exist as pure logic. **No code invokes them at runtime.** The OPC UA server still publishes a static `ServiceLevel`; `ServerUriArray` still carries only self; no coordinator reads cluster topology; no peer probing.
+
+Closing this requires:
+
+- A `RedundancyCoordinator` singleton reads `ClusterNode` + the peer list at startup (Stream A).
+- `PeerHttpProbeLoop` + `PeerUaProbeLoop` feed the calculator.
+- OPC UA node wiring: `ServiceLevel` becomes a live `BaseDataVariable` driven by the calculator's observer output; `ServerUriArray` includes self + peers; `RedundancySupport` is set statically from `RedundancyMode` (Stream C).
+- `sp_PublishGeneration` wraps in `await using var lease = coordinator.BeginApplyLease(...)` so the `PrimaryMidApply` band fires during actual publishes.
+- Client interop matrix validated against Ignition / Kepware / Aveva OI Gateway (Stream F).
+
+### Remaining drivers (task #120)
+
+The AB CIP, AB Legacy, TwinCAT ADS, and FOCAS drivers are planned but unshipped. A decision is pending on whether these are release-blocking for v2 GA or can slip to a v2.1 follow-up.
+
+## Nice-to-haves (not release-blocking)
+
+- **Admin UI** — Phase 6.1 Stream E.2/E.3 (`/hosts` column refresh), Phase 6.2 Stream D (`RoleGrantsTab` + `AclsTab` Probe), Phase 6.3 Stream E (`RedundancyTab`), Phase 6.4 Streams A/B UI pieces, Stream C DiffViewer, Stream D `IdentificationFields.razor`. Tasks #134, #144, #149, #153, #155, #156, #157.
+- **Background services** — Phase 6.1 Stream B.4 `ScheduledRecycleScheduler` HostedService (task #137); Phase 6.1 Stream A analyzer (task #135 — a Roslyn analyzer asserting every capability surface routes through `CapabilityInvoker`).
+- **Multi-host dispatch** — Phase 6.1 Stream A follow-up (task #135).
+  Currently every driver gets a single pipeline keyed on `driver.DriverInstanceId`; multi-host drivers (Modbus with N PLCs) need per-PLC host resolution so failing PLCs trip per-PLC breakers without poisoning their siblings. Decision #144 requires this, but it isn't wired yet.
+
+## Running the release-readiness check
+
+```bash
+pwsh ./scripts/compliance/phase-6-all.ps1
+```
+
+This meta-runner invokes each `phase-6-N-compliance.ps1` script in sequence and reports an aggregate PASS/FAIL. It is the single-command verification that what we claim is shipped still compiles, its tests still pass, and the plan-level invariants are still satisfied.
+
+Exit 0 = every phase passes its compliance checks and shows no test-count regression.
+
+## Release-readiness exit criteria
+
+v2 GA requires all of the following:
+
+- [ ] All four Phase 6.N compliance scripts exit 0.
+- [ ] `dotnet test ZB.MOM.WW.OtOpcUa.slnx` passes with ≤ 1 known-flake failure.
+- [ ] All release blockers listed above closed (or consciously deferred to v2.1 with a written decision).
+- [ ] Production deployment checklist (separate doc) signed off by Fleet Admin.
+- [ ] At least one end-to-end integration run against the live Galaxy on the dev box succeeds.
+- [ ] OPC UA conformance test (the OPC Foundation Compliance Test Tool, CTT) passes against the live endpoint.
+- [ ] Non-transparent redundancy cutover validated with at least one production client (Ignition 8.3 recommended — see decision #85).
+
+## Change log
+
+- **2026-04-19** — Phase 6.4 data layer merged (PRs #91–92). Phase 6 core complete. Capstone doc created.
+- **2026-04-19** — Phase 6.3 core merged (PRs #89–90). `ServiceLevelCalculator` + `RecoveryStateManager` + `ApplyLeaseRegistry` land as pure logic; coordinator / UA-node wiring / Admin UI / interop deferred.
+- **2026-04-19** — Phase 6.2 core merged (PRs #84–88). `AuthorizationGate` + `TriePermissionEvaluator` + `LdapGroupRoleMapping` land; dispatch wiring + Admin UI deferred.
+- **2026-04-19** — Phase 6.1 shipped (PRs #78–83). Polly resilience + Tier A/B/C stability + health endpoints + LiteDB generation-sealed cache + Admin `/hosts` data layer all live.
diff --git a/scripts/compliance/phase-6-all.ps1 b/scripts/compliance/phase-6-all.ps1
new file mode 100644
index 0000000..038c4f4
--- /dev/null
+++ b/scripts/compliance/phase-6-all.ps1
@@ -0,0 +1,77 @@
+<#
+.SYNOPSIS
+    Meta-runner that invokes every per-phase Phase 6.x compliance script and
+    reports an aggregate verdict.
+
+.DESCRIPTION
+    Runs phase-6-1-compliance.ps1, phase-6-2, phase-6-3, phase-6-4 in sequence.
+    Each sub-script returns its own exit code; this wrapper aggregates them.
+    Useful before a v2 release tag and as the `dotnet test` companion in CI.
+
+.NOTES
+    Usage: pwsh ./scripts/compliance/phase-6-all.ps1
+    Exit:  0 = every phase passed; 1 = one or more phases failed
+#>
+[CmdletBinding()]
+param()
+
+$ErrorActionPreference = 'Continue'
+
+$phases = @(
+    @{ Name = 'Phase 6.1 - Resilience & Observability'; Script = 'phase-6-1-compliance.ps1' },
+    @{ Name = 'Phase 6.2 - Authorization runtime';      Script = 'phase-6-2-compliance.ps1' },
+    @{ Name = 'Phase 6.3 - Redundancy runtime';         Script = 'phase-6-3-compliance.ps1' },
+    @{ Name = 'Phase 6.4 - Admin UI completion';        Script = 'phase-6-4-compliance.ps1' }
+)
+
+$results   = @()
+$startedAt = Get-Date
+
+foreach ($phase in $phases) {
+    Write-Host ""
+    Write-Host ""
+    Write-Host "=============================================================" -ForegroundColor DarkGray
+    Write-Host ("Running {0}" -f $phase.Name) -ForegroundColor Cyan
+    Write-Host "=============================================================" -ForegroundColor DarkGray
+
+    $scriptPath = Join-Path $PSScriptRoot $phase.Script
+    if (-not (Test-Path $scriptPath)) {
+        Write-Host ("  [MISSING] {0}" -f $phase.Script) -ForegroundColor Red
+        $results += @{ Name = $phase.Name; Exit = 2 }
+        continue
+    }
+
+    # Invoke each sub-script in its own powershell.exe process so its local
+    # $ErrorActionPreference + exit-code semantics can't interfere with the
+    # meta-runner's state. Slower (one process spawn per phase), but it makes
+    # the aggregate PASS/FAIL match standalone runs exactly.
+    & powershell.exe -NoProfile -ExecutionPolicy Bypass -File $scriptPath
+    $exitCode = $LASTEXITCODE
+    $results += @{ Name = $phase.Name; Exit = $exitCode }
+}
+
+$elapsed = (Get-Date) - $startedAt
+
+Write-Host ""
+Write-Host ""
+Write-Host "=============================================================" -ForegroundColor DarkGray
+Write-Host "Phase 6 compliance aggregate" -ForegroundColor Cyan
+Write-Host "=============================================================" -ForegroundColor DarkGray
+
+$totalFailures = 0
+foreach ($r in $results) {
+    $colour = if ($r.Exit -eq 0) { 'Green' } else { 'Red' }
+    $tag    = if ($r.Exit -eq 0) { 'PASS' } else { "FAIL (exit=$($r.Exit))" }
+    Write-Host ("  [{0}] {1}" -f $tag, $r.Name) -ForegroundColor $colour
+    if ($r.Exit -ne 0) { $totalFailures++ }
+}
+
+Write-Host ""
+Write-Host ("Elapsed: {0:N1} s" -f $elapsed.TotalSeconds) -ForegroundColor DarkGray
+
+if ($totalFailures -eq 0) {
+    Write-Host "Phase 6 aggregate: PASS" -ForegroundColor Green
+    exit 0
+}
+Write-Host ("Phase 6 aggregate: {0} phase(s) FAILED" -f $totalFailures) -ForegroundColor Red
+exit 1