lmxopcua/docs/v2/v2-release-readiness.md
Joseph Doherty a8401ab8fd v2 release-readiness — blocker #2 closed; doc reflects state
PR #96 closed the Phase 6.1 Stream D config-cache wiring blocker.

- Status line: "one of three release blockers remains".
- Blocker #2 struck through + CLOSED with PR link. Periodic-poller + richer-
  snapshot-payload follow-ups downgraded to hardening.
- Change log: dated entry.

One blocker remains: Phase 6.3 Streams A/C/F redundancy runtime (tasks
#145, #147, #150).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 11:16:31 -04:00


v2 Release Readiness

Last updated: 2026-04-19 (release blockers #1 + #2 closed; Phase 6.3 redundancy runtime is the last)
Status: NOT YET RELEASE-READY — one of three release blockers remains (Phase 6.3 Streams A/C/F redundancy-coordinator + OPC UA node wiring + client interop).

This doc is the single view of where v2 stands against its release criteria. Update it whenever a deferred follow-up closes or a new release blocker is discovered.

Release-readiness dashboard

| Phase | Shipped | Status |
|---|---|---|
| Phase 0 — Rename + entry gate | yes | Shipped |
| Phase 1 — Configuration + Admin scaffold | yes | Shipped (some UI items deferred to 6.4) |
| Phase 2 — Galaxy driver split (Proxy/Host/Shared) | yes | Shipped |
| Phase 3 — OPC UA server + LDAP + security profiles | yes | Shipped |
| Phase 4 — Redundancy scaffold (entities + endpoints) | yes | Shipped (runtime closes in 6.3) |
| Phase 5 — Drivers | ⚠ partial | Galaxy / Modbus / S7 / OpcUaClient shipped; AB CIP / AB Legacy / TwinCAT / FOCAS deferred (task #120) |
| Phase 6.1 — Resilience & Observability | yes | SHIPPED (PRs #78-83) |
| Phase 6.2 — Authorization runtime | ◐ core | SHIPPED (core) (PRs #84-88); dispatch wiring + Admin UI deferred |
| Phase 6.3 — Redundancy runtime | ◐ core | SHIPPED (core) (PRs #89-90); coordinator + UA-node wiring + Admin UI + interop deferred |
| Phase 6.4 — Admin UI completion | ◐ data layer | SHIPPED (data layer) (PRs #91-92); Blazor UI + OPC 40010 address-space wiring deferred |

Aggregate test counts: 906 baseline (pre-Phase-6) → 1159 passing across Phase 6. One pre-existing Client.CLI SubscribeCommandTests.Execute_PrintsSubscriptionMessage flake tracked separately.

Release blockers (must close before v2 GA)

Ordered by severity + impact on production fitness.

Security — Phase 6.2 dispatch wiring (task #143 — CLOSED 2026-04-19, PR #94)

Closed. AuthorizationGate + NodeScopeResolver now thread through OpcUaApplicationHost → OtOpcUaServer → DriverNodeManager. OnReadValue + OnWriteValue + all four HistoryRead paths call gate.IsAllowed(identity, operation, scope) before the invoker. Production deployments activate enforcement by constructing OpcUaApplicationHost with an AuthorizationGate(StrictMode: true) + populating the NodeAcl table.
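The gate-before-invoker pattern described above can be sketched roughly as follows. Every type here (SketchAuthorizationGate, OpcOperation, GatedDispatch) is a hypothetical stand-in: the real AuthorizationGate signature, its identity and scope types, and what happens when StrictMode is false are not shown in this doc, so the lax-mode behavior below is an assumption made for illustration only.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical operation set; the doc only names Read, Write and HistoryRead as gated.
public enum OpcOperation { Read, Write, HistoryRead }

// Hypothetical stand-in for AuthorizationGate + the NodeAcl table.
public sealed class SketchAuthorizationGate
{
    private readonly HashSet<(string Identity, OpcOperation Operation, string Scope)> _grants = new();

    public bool StrictMode { get; }

    public SketchAuthorizationGate(bool strictMode) => StrictMode = strictMode;

    // Stands in for a row in the NodeAcl table.
    public void Grant(string identity, OpcOperation operation, string scope) =>
        _grants.Add((identity, operation, scope));

    // Assumed semantics: a missing grant is a deny in strict mode and an allow otherwise.
    public bool IsAllowed(string identity, OpcOperation operation, string scope) =>
        _grants.Contains((identity, operation, scope)) || !StrictMode;
}

public static class GatedDispatch
{
    // The gate is consulted first; the invoker only runs when the gate allows it.
    public static string Read(
        SketchAuthorizationGate gate, string identity, string scope, Func<string> invoker) =>
        gate.IsAllowed(identity, OpcOperation.Read, scope)
            ? invoker()
            : "BadUserAccessDenied";
}
```

In this sketch a denied read never touches the driver, which is the property that makes an empty NodeAcl table fail closed once strict mode is on.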

Additional Stream C surfaces (not release-blocking, hardening only):

  • Browse + TranslateBrowsePathsToNodeIds gating with ancestor-visibility logic per acl-design.md §Browse.
  • CreateMonitoredItems + TransferSubscriptions gating with per-item (AuthGenerationId, MembershipVersion) stamp so revoked grants surface BadUserAccessDenied within one publish cycle (decision #153).
  • Alarm Acknowledge / Confirm / Shelve gating.
  • Call (method invocation) gating.
  • Finer-grained scope resolution — current NodeScopeResolver returns a flat cluster-level scope. Joining against the live Configuration DB to populate UnsArea / UnsLine / Equipment path is tracked as Stream C.12.
  • 3-user integration matrix covering every operation × allow/deny.

These items are hardening only: the three highest-value surfaces (Read / Write / HistoryRead) are now gated, which covers the base-security gap for v2 GA.

Config fallback — Phase 6.1 Stream D wiring (task #136 — CLOSED 2026-04-19, PR #96)

Closed. SealedBootstrap consumes ResilientConfigReader + GenerationSealedCache + StaleConfigFlag end-to-end: bootstrap calls go through the timeout → retry → fallback-to-sealed pipeline; every central-DB success writes a fresh sealed snapshot so the next cache-miss has a known-good fallback; StaleConfigFlag.IsStale is now consumed by HealthEndpointsHost.usingStaleConfig so /healthz body reports reality.

Production activation: Program.cs switches NodeBootstrap → SealedBootstrap + constructs OpcUaApplicationHost with the StaleConfigFlag as an optional ctor parameter.
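As a rough illustration of the timeout, retry, then fall-back-to-sealed flow described above, here is a minimal sketch. ResilientReaderSketch is a hypothetical class, not the project's ResilientConfigReader; it reduces a configuration snapshot to a single generation id and keeps the sealed copy in memory, whereas the real implementation uses Polly and a LiteDB-backed cache whose APIs are not shown here.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical sketch: the "snapshot" is just a generation id held in memory.
public sealed class ResilientReaderSketch
{
    private readonly Func<Task<int>> _readCentral;
    private readonly TimeSpan _timeout;
    private readonly int _maxAttempts;
    private int? _sealedGeneration;

    // Plays the role of StaleConfigFlag.IsStale in this sketch.
    public bool IsStale { get; private set; }

    public ResilientReaderSketch(Func<Task<int>> readCentral, TimeSpan timeout, int maxAttempts)
    {
        _readCentral = readCentral;
        _timeout = timeout;
        _maxAttempts = maxAttempts;
    }

    public async Task<int> ReadGenerationAsync()
    {
        for (var attempt = 1; attempt <= _maxAttempts; attempt++)
        {
            try
            {
                // Timeout: WaitAsync throws TimeoutException if the read runs too long.
                var fresh = await _readCentral().WaitAsync(_timeout);

                // Every central-DB success re-seals the snapshot and clears the stale flag.
                _sealedGeneration = fresh;
                IsStale = false;
                return fresh;
            }
            catch (Exception)
            {
                // Retry: fall through to the next attempt, if any remain.
            }
        }

        // Fallback: serve the last sealed snapshot and report that it is stale.
        if (_sealedGeneration is int sealedValue)
        {
            IsStale = true;
            return sealedValue;
        }

        throw new InvalidOperationException("Central DB unreachable and no sealed snapshot exists.");
    }
}
```

The stale flag is set only on the fallback path, so a health endpoint reading it reports stale config exactly when the node is running from the sealed copy.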

Remaining follow-ups (hardening, not release-blocking):

  • A HostedService that polls sp_GetCurrentGenerationForCluster periodically so peer-published generations land in this node's cache without a restart.
  • Richer snapshot payload via sp_GetGenerationContent so fallback can serve the full generation content (DriverInstance enumeration, ACL rows, etc.) from the sealed cache alone.
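The periodic-poller follow-up could look something like the sketch below. GenerationPollerSketch and its delegates are hypothetical: the getCurrentGeneration delegate stands in for a call to sp_GetCurrentGenerationForCluster, whose actual parameters and result shape are not described in this doc, and the sketch uses a plain PeriodicTimer loop rather than a real HostedService so that it stays self-contained.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical poller: notices when the cluster's current generation changes.
public sealed class GenerationPollerSketch
{
    private readonly Func<Task<long>> _getCurrentGeneration; // stand-in for the stored-procedure call
    private readonly Action<long> _onNewGeneration;          // e.g. refresh this node's sealed cache
    private long? _lastSeen;

    public GenerationPollerSketch(Func<Task<long>> getCurrentGeneration, Action<long> onNewGeneration)
    {
        _getCurrentGeneration = getCurrentGeneration;
        _onNewGeneration = onNewGeneration;
    }

    // One poll step; returns true when a generation not seen before was observed.
    public async Task<bool> PollOnceAsync()
    {
        var current = await _getCurrentGeneration();
        if (_lastSeen == current) return false;

        _lastSeen = current;
        _onNewGeneration(current);
        return true;
    }

    // Runs PollOnceAsync on a fixed interval until cancellation is requested.
    public async Task RunAsync(TimeSpan interval, CancellationToken ct)
    {
        using var timer = new PeriodicTimer(interval);
        try
        {
            while (await timer.WaitForNextTickAsync(ct))
            {
                await PollOnceAsync();
            }
        }
        catch (OperationCanceledException)
        {
            // Normal shutdown path.
        }
    }
}
```

Keeping the single poll step separate from the timer loop makes the change-detection logic testable without waiting on real time.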

Redundancy — Phase 6.3 Streams A/C/F (tasks #145, #147, #150)

ServiceLevelCalculator + RecoveryStateManager + ApplyLeaseRegistry exist as pure logic. No code invokes them at runtime. The OPC UA server still publishes a static ServiceLevel; ServerUriArray still carries only self; no coordinator reads cluster topology; no peer probing.

Closing this requires:

  • RedundancyCoordinator singleton reads ClusterNode + peer list at startup (Stream A).
  • PeerHttpProbeLoop + PeerUaProbeLoop feed the calculator.
  • OPC UA node wiring: ServiceLevel becomes a live BaseDataVariable on calculator observer output; ServerUriArray includes self + peers; RedundancySupport static from RedundancyMode (Stream C).
  • The sp_PublishGeneration call is wrapped in await using var lease = coordinator.BeginApplyLease(...) so the PrimaryMidApply band fires during actual publishes.
  • Client interop matrix validation against Ignition / Kepware / Aveva OI Gateway (Stream F).
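To make the lease and live-ServiceLevel items above concrete, here is a small sketch of how an apply lease could feed the calculator. CoordinatorSketch, ApplyLeaseSketch and PublishSketch are hypothetical, the parameterless BeginApplyLease is a simplification of the elided call in the list, and the numeric band values are placeholders rather than the project's real ServiceLevel bands.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical lease: releasing it tells the coordinator the apply has finished.
public sealed class ApplyLeaseSketch : IAsyncDisposable
{
    private readonly Action _release;

    public ApplyLeaseSketch(Action release) => _release = release;

    public ValueTask DisposeAsync()
    {
        _release();
        return ValueTask.CompletedTask;
    }
}

public sealed class CoordinatorSketch
{
    private int _activeLeases;

    // Would be fed by the peer probe loops in the real design.
    public bool PeerReachable { get; set; } = true;

    public ApplyLeaseSketch BeginApplyLease()
    {
        Interlocked.Increment(ref _activeLeases);
        return new ApplyLeaseSketch(() => Interlocked.Decrement(ref _activeLeases));
    }

    // Placeholder bands: 200 while a publish is mid-apply, 255 healthy, 230 peer unreachable.
    public byte ComputeServiceLevel()
    {
        if (Volatile.Read(ref _activeLeases) > 0) return 200;
        return PeerReachable ? (byte)255 : (byte)230;
    }
}

public static class PublishSketch
{
    // The lease is held for the whole publish, so the mid-apply band is visible while it runs.
    public static async Task<byte> PublishAsync(CoordinatorSketch coordinator)
    {
        await using var lease = coordinator.BeginApplyLease();
        await Task.Yield(); // stands in for the actual publish work
        return coordinator.ComputeServiceLevel();
    }
}
```

Because the lease is released by DisposeAsync, the service level returns to its normal band even if the publish throws.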

Remaining drivers (task #120)

AB CIP, AB Legacy, TwinCAT ADS, FOCAS drivers are planned but unshipped. Decision pending on whether these are release-blocking for v2 GA or can slip to a v2.1 follow-up.

Nice-to-haves (not release-blocking)

  • Admin UI — Phase 6.1 Stream E.2/E.3 (/hosts column refresh), Phase 6.2 Stream D (RoleGrantsTab + AclsTab Probe), Phase 6.3 Stream E (RedundancyTab), Phase 6.4 Streams A/B UI pieces, Stream C DiffViewer, Stream D IdentificationFields.razor. Tasks #134, #144, #149, #153, #155, #156, #157.
  • Background services — Phase 6.1 Stream B.4 ScheduledRecycleScheduler HostedService (task #137), Phase 6.1 Stream A analyzer (task #135 — Roslyn analyzer asserting every capability surface routes through CapabilityInvoker).
  • Multi-host dispatch — Phase 6.1 Stream A follow-up (task #135). Currently every driver gets a single pipeline keyed on driver.DriverInstanceId; multi-host drivers (Modbus with N PLCs) need per-PLC host resolution so failing PLCs trip per-PLC breakers without poisoning siblings. Decision #144 requires this but we haven't wired it yet.
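The multi-host dispatch item above amounts to changing the pipeline key from the driver instance alone to the pair of driver instance and host. A minimal sketch, assuming a hand-rolled consecutive-failure breaker in place of the project's Polly pipelines; BreakerSketch and PerHostBreakerRegistry are hypothetical names, and the simple counter is not thread-safe.

```csharp
using System.Collections.Concurrent;

// Hypothetical stand-in for a real circuit breaker: opens after N consecutive failures.
public sealed class BreakerSketch
{
    private readonly int _threshold;
    private int _consecutiveFailures;

    public BreakerSketch(int threshold) => _threshold = threshold;

    public bool IsOpen => _consecutiveFailures >= _threshold;

    public void RecordFailure() => _consecutiveFailures++;

    public void RecordSuccess() => _consecutiveFailures = 0;
}

// One breaker per (driver instance, host) pair instead of one per driver instance.
public sealed class PerHostBreakerRegistry
{
    private readonly ConcurrentDictionary<(string DriverInstanceId, string Host), BreakerSketch> _breakers = new();
    private readonly int _threshold;

    public PerHostBreakerRegistry(int threshold) => _threshold = threshold;

    public BreakerSketch For(string driverInstanceId, string host) =>
        _breakers.GetOrAdd((driverInstanceId, host), _ => new BreakerSketch(_threshold));
}
```

With this keying, three failures against one PLC open only that PLC's breaker; calls to its siblings under the same driver instance keep flowing.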

Running the release-readiness check

pwsh ./scripts/compliance/phase-6-all.ps1

This meta-runner invokes each phase-6-N-compliance.ps1 script in sequence and reports an aggregate PASS/FAIL. It is the single-command check that everything we claim as shipped still compiles, its tests still pass, and the plan-level invariants still hold.

Exit 0 = every phase passes its compliance checks + no test-count regression.

Release-readiness exit criteria

v2 GA requires all of the following:

  • All four Phase 6.N compliance scripts exit 0.
  • dotnet test ZB.MOM.WW.OtOpcUa.slnx passes with ≤ 1 known-flake failure.
  • Release blockers listed above all closed (or consciously deferred to v2.1 with a written decision).
  • Production deployment checklist (separate doc) signed off by Fleet Admin.
  • At least one end-to-end integration run against the live Galaxy on the dev box succeeds.
  • OPC UA conformance test (CTT or UA Compliance Test Tool) passes against the live endpoint.
  • Non-transparent redundancy cutover validated with at least one production client (Ignition 8.3 recommended — see decision #85).

Change log

  • 2026-04-19 — Release blocker #2 closed (PR #96). SealedBootstrap consumes ResilientConfigReader + GenerationSealedCache + StaleConfigFlag; /healthz now surfaces the stale flag. Remaining follow-ups (periodic poller + richer snapshot payload) downgraded to hardening.
  • 2026-04-19 — Release blocker #1 closed (PR #94). AuthorizationGate wired into DriverNodeManager Read / Write / HistoryRead dispatch. Remaining Stream C surfaces (Browse / Subscribe / Alarm / Call + finer-grained scope resolution) downgraded to hardening follow-ups — no longer release-blocking.
  • 2026-04-19 — Phase 6.4 data layer merged (PRs #91-92). Phase 6 core complete. Capstone doc created.
  • 2026-04-19 — Phase 6.3 core merged (PRs #89-90). ServiceLevelCalculator + RecoveryStateManager + ApplyLeaseRegistry land as pure logic; coordinator / UA-node wiring / Admin UI / interop deferred.
  • 2026-04-19 — Phase 6.2 core merged (PRs #84-88). AuthorizationGate + TriePermissionEvaluator + LdapGroupRoleMapping land; dispatch wiring + Admin UI deferred.
  • 2026-04-19 — Phase 6.1 shipped (PRs #78-83). Polly resilience + Tier A/B/C stability + health endpoints + LiteDB generation-sealed cache + Admin /hosts data layer all live.