Closes out Phase 6 with the two pieces a release engineer needs before
tagging v2 GA:
1. scripts/compliance/phase-6-all.ps1 — meta-runner that invokes every
per-phase Phase 6.N compliance script in sequence + aggregates results.
Each sub-script runs in its own powershell.exe child process so per-script
$ErrorActionPreference + exit semantics can't interfere with the parent.
Exit 0 = every phase passes; exit 1 = one or more phases failed. Prints a
PASS/FAIL summary matrix at the end.
2. docs/v2/v2-release-readiness.md — single-view dashboard of everything
shipped + everything still deferred + release exit criteria. Called out
explicitly:
- Three release BLOCKERS (must close before v2 GA):
* Phase 6.2 Stream C dispatch wiring — AuthorizationGate exists but no
DriverNodeManager Read/Write/etc. path calls it (task #143).
* Phase 6.1 Stream D follow-up — ResilientConfigReader + sealed-cache
hook not yet consumed by any read path (task #136).
* Phase 6.3 Streams A/C/F — coordinator + UA-node wiring + client
interop still deferred (tasks #145, #147, #150).
- Three nice-to-haves (not release-blocking) — Admin UI polish, background
services, multi-host dispatch.
- Release exit criteria: all 4 compliance scripts exit 0, dotnet test ≤ 1
known flake, blockers closed or v2.1-deferred with written decision,
Fleet Admin signoff on deployment checklist, live-Galaxy smoke test,
OPC UA CTT pass, redundancy cutover validated with at least one
production client.
- Change log at the bottom so future ships of deferred follow-ups just
append dates + close out dashboard rows.
Meta-runner verified locally:
Phase 6.1 — PASS
Phase 6.2 — PASS
Phase 6.3 — PASS
Phase 6.4 — PASS
Aggregate: PASS (elapsed 340 s — most of that is the full solution
`dotnet test` each phase runs).
Net counts at capstone time: 906 baseline → 1159 passing across Phase 6
(+253). 15 deferred follow-up tasks tracked with IDs (#134-137, #143-144,
#145, #147, #149-150, #153, #155-157). v2 is NOT YET release-ready —
capstone makes that explicit rather than letting the "shipped" label on
each phase imply full readiness.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
# v2 Release Readiness
Last updated: 2026-04-19 (Phase 6.4 data layer merged)

Status: NOT YET RELEASE-READY — four Phase 6 data-layer ships have landed, but several production-path wirings are still deferred.
This doc is the single view of where v2 stands against its release criteria. Update it whenever a deferred follow-up closes or a new release blocker is discovered.
## Release-readiness dashboard
| Phase | Shipped | Status |
|---|---|---|
| Phase 0 — Rename + entry gate | ✓ | Shipped |
| Phase 1 — Configuration + Admin scaffold | ✓ | Shipped (some UI items deferred to 6.4) |
| Phase 2 — Galaxy driver split (Proxy/Host/Shared) | ✓ | Shipped |
| Phase 3 — OPC UA server + LDAP + security profiles | ✓ | Shipped |
| Phase 4 — Redundancy scaffold (entities + endpoints) | ✓ | Shipped (runtime closes in 6.3) |
| Phase 5 — Drivers | ⚠ partial | Galaxy / Modbus / S7 / OpcUaClient shipped; AB CIP / AB Legacy / TwinCAT / FOCAS deferred (task #120) |
| Phase 6.1 — Resilience & Observability | ✓ | SHIPPED (PRs #78–83) |
| Phase 6.2 — Authorization runtime | ◐ core | SHIPPED (core) (PRs #84–88); dispatch wiring + Admin UI deferred |
| Phase 6.3 — Redundancy runtime | ◐ core | SHIPPED (core) (PRs #89–90); coordinator + UA-node wiring + Admin UI + interop deferred |
| Phase 6.4 — Admin UI completion | ◐ data layer | SHIPPED (data layer) (PRs #91–92); Blazor UI + OPC 40010 address-space wiring deferred |
Aggregate test counts: 906 baseline (pre-Phase-6) → 1159 passing across Phase 6. One pre-existing Client.CLI `SubscribeCommandTests.Execute_PrintsSubscriptionMessage` flake tracked separately.
## Release blockers (must close before v2 GA)
Ordered by severity + impact on production fitness.
### Security — Phase 6.2 dispatch wiring (task #143)
The `AuthorizationGate` + `IPermissionEvaluator` + `PermissionTrie` stack is fully built and unit-tested, but no dispatch path in `DriverNodeManager` actually calls it. Every OPC UA Read / Write / HistoryRead / Browse / Call / CreateMonitoredItems on the live server currently runs through the pre-Phase-6.2 code path (which gates Write via `WriteAuthzPolicy` only — no per-tag ACL).
Closing this requires:

- Thread `AuthorizationGate` through `OpcUaApplicationHost → OtOpcUaServer → DriverNodeManager` (the same plumbing path Phase 6.1's `DriverResiliencePipelineBuilder` took).
- Build a `NodeScopeResolver` that maps `fullRef → NodeScope` via a live DB lookup of the tag's UnsArea / UnsLine / Equipment path. Cache per generation.
- Call `gate.IsAllowed(identity, operation, scope)` in OnReadValue / OnWriteValue / the four HistoryRead paths / Browse / Call / Acknowledge/Confirm/Shelve / CreateMonitoredItems / TransferSubscriptions.
- Stamp MonitoredItems with `(AuthGenerationId, MembershipVersion)` per decision #153 so revoked grants surface `BadUserAccessDenied` within one publish cycle.
- 3-user integration matrix covering each operation × allow/deny.
Strict mode default: start lax (`Authorization:StrictMode = false`) during rollout so deployments without populated ACLs keep working. Flip to strict once ACL seeding lands for production clusters.
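As a rough illustration of the lax-vs-strict default above, here is a hypothetical, self-contained sketch of the gate-call pattern. The type and member names mirror the ones this doc names (`AuthorizationGate`, `NodeScope`, `IsAllowed`), but the signatures, the flat dictionary ACL (the real stack uses a `PermissionTrie`), and the demo values are all assumptions, not the product code:

```csharp
using System;
using System.Collections.Generic;

// Illustrative operation set; the real dispatch covers HistoryRead, Call, etc.
enum OtOperation { Read, Write, Browse }

// Stand-in for the tag's UnsArea / UnsLine / Equipment scope.
record NodeScope(string UnsArea, string UnsLine, string Equipment);

class AuthorizationGate
{
    // Mirrors the Authorization:StrictMode config switch described above.
    public bool StrictMode { get; init; }

    // Flat ACL keyed on (user, operation); the real evaluator walks a trie
    // and would also consult the NodeScope. Kept minimal for the sketch.
    readonly Dictionary<(string User, OtOperation Op), bool> _acl = new();

    public void Grant(string user, OtOperation op) => _acl[(user, op)] = true;

    // Lax mode: no ACL entry => allow, so unpopulated deployments keep working.
    // Strict mode: no ACL entry => deny (BadUserAccessDenied on the UA side).
    public bool IsAllowed(string user, OtOperation op, NodeScope scope) =>
        _acl.TryGetValue((user, op), out var allowed) ? allowed : !StrictMode;
}

class Demo
{
    static void Main()
    {
        var scope = new NodeScope("Area1", "Line3", "Mixer");

        var lax = new AuthorizationGate { StrictMode = false };
        var strict = new AuthorizationGate { StrictMode = true };
        strict.Grant("alice", OtOperation.Read);

        Console.WriteLine(lax.IsAllowed("bob", OtOperation.Write, scope));     // True (lax default)
        Console.WriteLine(strict.IsAllowed("bob", OtOperation.Write, scope));  // False
        Console.WriteLine(strict.IsAllowed("alice", OtOperation.Read, scope)); // True
    }
}
```

The point of the sketch is only the default-direction flip: unresolved lookups fall through to `!StrictMode`, which is why the rollout can start lax and tighten later without touching call sites.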
### Config fallback — Phase 6.1 Stream D wiring (task #136)
`ResilientConfigReader` + `GenerationSealedCache` + `StaleConfigFlag` all exist but nothing consumes them. The NodeBootstrap path still uses the original single-file `LiteDbConfigCache` via `ILocalConfigCache`; `sp_PublishGeneration` doesn't call `GenerationSealedCache.SealAsync` after commit; the Configuration read services don't wrap queries in `ResilientConfigReader.ReadAsync`.
Closing this requires:

- `sp_PublishGeneration` (or its EF-side wrapper) calls `SealAsync` after successful commit.
- DriverInstance enumeration, LdapGroupRoleMapping fetches, cluster + namespace metadata reads route through `ResilientConfigReader.ReadAsync`.
- Integration test: SQL container kill mid-operation → serves sealed snapshot, `UsingStaleConfig = true`, driver stays Healthy, `/healthz` body reflects the flag.
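The fallback shape the integration test above exercises can be sketched in a few lines. This is a hypothetical stand-in, not the real `ResilientConfigReader`: the names `ReadAsync` and `UsingStaleConfig` come from this doc, but the constructor, the delegate-based wiring, and the catch-all exception handling are assumptions made to keep the sketch self-contained:

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical sketch: live SQL read with fallback to the generation-sealed
// snapshot, surfacing the stale flag for /healthz. Not the product signatures.
class ResilientConfigReader<T>
{
    readonly Func<Task<T>> _liveRead;   // SQL-backed read path
    readonly Func<T> _sealedSnapshot;   // generation-sealed local cache

    public bool UsingStaleConfig { get; private set; }

    public ResilientConfigReader(Func<Task<T>> liveRead, Func<T> sealedSnapshot)
        => (_liveRead, _sealedSnapshot) = (liveRead, sealedSnapshot);

    public async Task<T> ReadAsync()
    {
        try
        {
            var value = await _liveRead();
            UsingStaleConfig = false;   // live DB answered; clear the flag
            return value;
        }
        catch (Exception)               // e.g. SQL container killed mid-operation
        {
            UsingStaleConfig = true;    // driver stays Healthy, flag goes stale
            return _sealedSnapshot();
        }
    }
}

class Demo
{
    static async Task Main()
    {
        var reader = new ResilientConfigReader<string>(
            liveRead: () => throw new InvalidOperationException("SQL down"),
            sealedSnapshot: () => "generation-42 snapshot");

        Console.WriteLine(await reader.ReadAsync());  // generation-42 snapshot
        Console.WriteLine(reader.UsingStaleConfig);   // True
    }
}
```

The key property the test asserts is visible here: a failed live read returns data (the sealed snapshot) rather than an exception, and the staleness is observable out-of-band via the flag instead of via the return value.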
### Redundancy — Phase 6.3 Streams A/C/F (tasks #145, #147, #150)
`ServiceLevelCalculator` + `RecoveryStateManager` + `ApplyLeaseRegistry` exist as pure logic. No code invokes them at runtime. The OPC UA server still publishes a static `ServiceLevel`; `ServerUriArray` still carries only self; no coordinator reads cluster topology; no peer probing.
Closing this requires:

- `RedundancyCoordinator` singleton reads `ClusterNode` + peer list at startup (Stream A). `PeerHttpProbeLoop` + `PeerUaProbeLoop` feed the calculator.
- OPC UA node wiring: `ServiceLevel` becomes a live `BaseDataVariable` on calculator observer output; `ServerUriArray` includes self + peers; `RedundancySupport` static from `RedundancyMode` (Stream C).
- `sp_PublishGeneration` wraps in `await using var lease = coordinator.BeginApplyLease(...)` so the `PrimaryMidApply` band fires during actual publishes.
- Client interop matrix validation against Ignition / Kepware / Aveva OI Gateway (Stream F).
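The observer wiring in the second bullet above can be sketched as follows. Everything here is illustrative: the event-based shape, the `Recalculate` inputs, and the concrete band values (255, 180, etc.) are assumptions standing in for the real `ServiceLevelCalculator` output feeding the UA `ServiceLevel` variable (a byte, per OPC UA, where higher means fitter to serve):

```csharp
using System;

// Hypothetical sketch of the calculator-observer pattern: the calculator
// emits ServiceLevel bytes and a subscriber mirrors them into what would be
// the live BaseDataVariable on the UA server. Band values are illustrative.
class ServiceLevelCalculator
{
    public event Action<byte>? ServiceLevelChanged;

    public void Recalculate(bool isPrimary, bool midApply, int healthyPeers)
    {
        byte level = (isPrimary, midApply) switch
        {
            (true, true)  => 180,  // PrimaryMidApply band: serving while a generation publishes
            (true, false) => 255,  // healthy primary
            _             => (byte)(healthyPeers > 0 ? 100 : 10),  // backup bands
        };
        ServiceLevelChanged?.Invoke(level);
    }
}

class Demo
{
    static void Main()
    {
        byte liveVariable = 0;  // stand-in for the BaseDataVariable's value
        var calc = new ServiceLevelCalculator();
        calc.ServiceLevelChanged += v => liveVariable = v;

        calc.Recalculate(isPrimary: true, midApply: false, healthyPeers: 1);
        Console.WriteLine(liveVariable);  // 255

        calc.Recalculate(isPrimary: true, midApply: true, healthyPeers: 1);
        Console.WriteLine(liveVariable);  // 180
    }
}
```

This is why the apply-lease bullet matters: wrapping `sp_PublishGeneration` in the lease is what drives the `midApply` input during real publishes, so clients see the degraded band exactly when a cutover would be risky.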
### Remaining drivers (task #120)
AB CIP, AB Legacy, TwinCAT ADS, FOCAS drivers are planned but unshipped. Decision pending on whether these are release-blocking for v2 GA or can slip to a v2.1 follow-up.
## Nice-to-haves (not release-blocking)
- Admin UI — Phase 6.1 Stream E.2/E.3 (`/hosts` column refresh), Phase 6.2 Stream D (`RoleGrantsTab` + `AclsTabProbe`), Phase 6.3 Stream E (`RedundancyTab`), Phase 6.4 Streams A/B UI pieces, Stream C DiffViewer, Stream D `IdentificationFields.razor`. Tasks #134, #144, #149, #153, #155, #156, #157.
- Background services — Phase 6.1 Stream B.4 `ScheduledRecycleSchedulerHostedService` (task #137), Phase 6.1 Stream A analyzer (task #135 — Roslyn analyzer asserting every capability surface routes through `CapabilityInvoker`).
- Multi-host dispatch — Phase 6.1 Stream A follow-up (task #135). Currently every driver gets a single pipeline keyed on `driver.DriverInstanceId`; multi-host drivers (Modbus with N PLCs) need per-PLC host resolution so failing PLCs trip per-PLC breakers without poisoning siblings. Decision #144 requires this, but we haven't wired it yet.
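The per-PLC keying described in the last bullet comes down to widening the pipeline lookup key. A hypothetical sketch, with an invented `PipelineRegistry` type and string pipelines standing in for real resilience pipelines (the actual stack builds Polly pipelines via `DriverResiliencePipelineBuilder`):

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch: key pipelines on (driver, host) instead of just the
// driver, so one failing PLC trips only its own breaker. Names are invented.
class PipelineRegistry
{
    readonly Dictionary<(Guid DriverInstanceId, string Host), string> _pipelines = new();

    public string GetOrCreate(Guid driverInstanceId, string host)
    {
        var key = (driverInstanceId, host);
        if (!_pipelines.TryGetValue(key, out var pipeline))
        {
            // Stand-in for building a real per-host resilience pipeline.
            pipeline = $"breaker:{driverInstanceId:N}:{host}";
            _pipelines[key] = pipeline;
        }
        return pipeline;
    }

    public int Count => _pipelines.Count;
}

class Demo
{
    static void Main()
    {
        var registry = new PipelineRegistry();
        var driver = Guid.NewGuid();

        // A Modbus driver with two PLCs gets two independent breakers...
        registry.GetOrCreate(driver, "plc-a");
        registry.GetOrCreate(driver, "plc-b");
        // ...and repeat lookups reuse the existing one rather than rebuilding.
        registry.GetOrCreate(driver, "plc-a");

        Console.WriteLine(registry.Count);  // 2
    }
}
```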
## Running the release-readiness check

```
pwsh ./scripts/compliance/phase-6-all.ps1
```
This meta-runner invokes each `phase-6-N-compliance.ps1` script in sequence and reports an aggregate PASS/FAIL. It is the single-command verification that what we claim is shipped still compiles, still passes its tests, and still satisfies the plan-level invariants.
Exit 0 = every phase passes its compliance checks + no test-count regression.
## Release-readiness exit criteria
v2 GA requires all of the following:
- All four Phase 6.N compliance scripts exit 0.
- `dotnet test ZB.MOM.WW.OtOpcUa.slnx` passes with ≤ 1 known-flake failure.
- Release blockers listed above all closed (or consciously deferred to v2.1 with a written decision).
- Production deployment checklist (separate doc) signed off by Fleet Admin.
- At least one end-to-end integration run against the live Galaxy on the dev box succeeds.
- OPC UA conformance test (the OPC Foundation Compliance Test Tool, CTT) passes against the live endpoint.
- Non-transparent redundancy cutover validated with at least one production client (Ignition 8.3 recommended — see decision #85).
## Change log
- 2026-04-19 — Phase 6.4 data layer merged (PRs #91–92). Phase 6 core complete. Capstone doc created.
- 2026-04-19 — Phase 6.3 core merged (PRs #89–90). `ServiceLevelCalculator` + `RecoveryStateManager` + `ApplyLeaseRegistry` land as pure logic; coordinator / UA-node wiring / Admin UI / interop deferred.
- 2026-04-19 — Phase 6.2 core merged (PRs #84–88). `AuthorizationGate` + `TriePermissionEvaluator` + `LdapGroupRoleMapping` land; dispatch wiring + Admin UI deferred.
- 2026-04-19 — Phase 6.1 shipped (PRs #78–83). Polly resilience + Tier A/B/C stability + health endpoints + LiteDB generation-sealed cache + Admin `/hosts` data layer all live.