lmxopcua/docs/v2/v2-release-readiness.md
Joseph Doherty ded292ecd7 Phase 6.2 Stream C — Call + Alarm Acknowledge/Confirm gating
Closes task #122 (Acknowledge + Confirm + generic Call — Shelve stays as
a follow-up pending per-instance method-NodeId resolution).

Before this commit any session with a connected channel could invoke
method nodes on driver-materialized equipment — including alarm
Acknowledge / Confirm. Combined with the Browse + CreateMonitoredItems
gates that landed earlier in Stream C, this was the last service-layer
entry point where a session could still affect state without passing
the authz trie.

Implementation on DriverNodeManager:
- `Call` override — pre-iterates methodsToCall, gates each through
  AuthorizationGate with the operation kind returned by
  MapCallOperation. Denied calls get errors[i] = BadUserAccessDenied
  before delegating to base.Call.
- `MapCallOperation(NodeId methodId)` — maps well-known Part 9 method
  NodeIds to dedicated operation kinds:
    MethodIds.AcknowledgeableConditionType_Acknowledge →
        OpcUaOperation.AlarmAcknowledge
    MethodIds.AcknowledgeableConditionType_Confirm →
        OpcUaOperation.AlarmConfirm
    everything else → OpcUaOperation.Call
  Lets the ACL distinguish "can acknowledge alarms" from "can invoke
  arbitrary methods" without conflating the two roles.
- Shelve dispatch paths through per-instance ShelvedStateMachine methods
  with dynamic NodeIds that can't be constant-matched — falls through
  to generic Call. Fine-grained OpcUaOperation.AlarmShelve is a follow-
  up when the method-invocation path grows a "method-role" annotation.

Extracted GateCallMethodRequests + MapCallOperation as static internal
for unit-testability. 8 new tests (MapCallOperation Acknowledge /
Confirm / generic; gate-null no-op, denied-Acknowledge, allowed-
Acknowledge, mixed-batch, pre-populated-error-preserved).
Server.Tests 269 → 277.

Known follow-ups:
- Shelve per-operation gating (see above).
- TranslateBrowsePathsToNodeIds gating (Browse follow-up from #120).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 15:22:19 -04:00


v2 Release Readiness

Last updated: 2026-04-24 (Phase 5 driver complement closed — AB CIP, AB Legacy, TwinCAT, FOCAS all shipped; FOCAS Tier-C retired for a pure-managed in-process client)
Status: RELEASE-READY (code-path) for v2 GA. All three original code-path release blockers remain closed. Phase 5 is now complete. Remaining work is manual (live-hardware validations, client interop matrix, deployment checklist signoff, OPC UA CTT pass) + hardening follow-ups; see exit-criteria checklist below.

This doc is the single view of where v2 stands against its release criteria. Update it whenever a deferred follow-up closes or a new release blocker is discovered.

Release-readiness dashboard

| Phase | Shipped | Status |
|---|---|---|
| Phase 0 — Rename + entry gate | Yes | Shipped |
| Phase 1 — Configuration + Admin scaffold | Yes | Shipped (some UI items deferred to 6.4) |
| Phase 2 — Galaxy driver split (Proxy/Host/Shared) | Yes | Shipped |
| Phase 3 — OPC UA server + LDAP + security profiles | Yes | Shipped |
| Phase 4 — Redundancy scaffold (entities + endpoints) | Yes | Shipped (runtime closes in 6.3) |
| Phase 5 — Drivers | Yes | Shipped — Galaxy, Modbus (+ DL205/S7/MELSEC profiles), S7 native, OPC UA Client, AB CIP, AB Legacy, TwinCAT ADS, FOCAS (managed wire client) |
| Phase 6.1 — Resilience & Observability | Yes | Shipped (PRs #78–83) |
| Phase 6.2 — Authorization runtime | ◐ core | Core shipped (PRs #84–88, #94 dispatch wiring); finer-grained Browse/Subscribe/Alarm/Call gating + 3-user interop matrix deferred |
| Phase 6.3 — Redundancy runtime | ◐ core | Core shipped (PRs #89–90, #98–99); peer-probe HostedServices, OPC UA variable-node binding, sp_PublishGeneration lease wrap, client interop matrix deferred |
| Phase 6.4 — Admin UI completion | ◐ data layer + Identification | Data layer + OPC 40010 Identification folder shipped (PRs #91–92, Identification audit close-out 2026-04-23); Blazor UI pieces deferred |

Driver integration-test counts (end-to-end against live or simulated targets): Modbus 26, FOCAS 9, AbCip 7, OpcUaClient 3, S7 3, AbLegacy 2, TwinCAT 2. Plus Galaxy's separate cross-FX parity/stability suite.

Aggregate test counts (2026-04-19 baseline): 1159 passing across the solution. One pre-existing Client.CLI SubscribeCommandTests.Execute_PrintsSubscriptionMessage flake tracked separately. Rerun dotnet test ZB.MOM.WW.OtOpcUa.slnx after the FOCAS migration commits land to refresh the number.

Release blockers (must close before v2 GA)

All code-path release blockers are closed. The remaining items are live-hardware / manual validations listed under exit criteria.

Security — Phase 6.2 dispatch wiring (task #143 — CLOSED 2026-04-19, PR #94)

Closed. AuthorizationGate + NodeScopeResolver thread through OpcUaApplicationHost → OtOpcUaServer → DriverNodeManager. OnReadValue + OnWriteValue + all four HistoryRead paths call gate.IsAllowed(identity, operation, scope) before the invoker. Production deployments activate enforcement by constructing OpcUaApplicationHost with an AuthorizationGate(StrictMode: true) + populating the NodeAcl table.
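
The check-before-invoke pattern described above can be sketched in a few lines. This is a language-neutral illustration in Python, not the project's C# code. The grant store, the `read_value` helper, and in particular the meaning given to strict mode (deny anything without an explicit grant) are assumptions made for the example.

```python
from dataclasses import dataclass, field
from enum import Enum


class Operation(Enum):
    READ = "Read"
    WRITE = "Write"
    HISTORY_READ = "HistoryRead"


@dataclass
class AuthorizationGate:
    # Hypothetical ACL store: a set of (identity, operation, scope) grants.
    grants: set = field(default_factory=set)
    # Assumption: strict mode denies anything without an explicit grant,
    # while non-strict mode lets ungranted requests through.
    strict_mode: bool = True

    def is_allowed(self, identity, operation, scope):
        if (identity, operation, scope) in self.grants:
            return True
        return not self.strict_mode


def read_value(gate, identity, scope, invoker):
    """Consult the gate first; only reach the driver invoker when allowed."""
    if gate is not None and not gate.is_allowed(identity, Operation.READ, scope):
        return ("BadUserAccessDenied", None)
    return ("Good", invoker())
```

A denied request never reaches the invoker, so the driver sees no traffic from an unauthorized session. With no gate configured the handler behaves as it did before the wiring.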

Remaining Stream C surfaces (hardening, not release-blocking):

  • Browse + TranslateBrowsePathsToNodeIds gating with ancestor-visibility logic per acl-design.md §Browse. Partial, 2026-04-24. DriverNodeManager.Browse override post-filters the ReferenceDescription list via a new FilterBrowseReferences helper — denied nodes disappear silently per OPC UA convention. Ancestor-visibility implication (Read-grant at Line/Tag implying Browse on Line) still to ship; needs a subtree-has-any-grant query on the trie evaluator. TranslateBrowsePathsToNodeIds surface not yet wired.
  • CreateMonitoredItems + TransferSubscriptions gating with per-item (AuthGenerationId, MembershipVersion) stamp so revoked grants surface BadUserAccessDenied within one publish cycle (decision #153). Partial, 2026-04-24. DriverNodeManager.CreateMonitoredItems override pre-gates each request and pre-populates BadUserAccessDenied into the errors slot for denied items (the base stack honours pre-set errors and skips those items). Decision #153's per-item (AuthGenerationId, MembershipVersion) stamp for detecting mid-subscription revocation is still to ship — needs subscription-layer plumbing. TransferSubscriptions not yet wired (same pattern).
  • Alarm Acknowledge / Confirm / Shelve gating. Partial, 2026-04-24. Acknowledge + Confirm map to dedicated OpcUaOperation.AlarmAcknowledge / AlarmConfirm via MapCallOperation; Shelve falls through to generic OpcUaOperation.Call (needs per-instance method NodeId resolution to distinguish — follow-up).
  • Call (method invocation) gating. Closed 2026-04-24. DriverNodeManager.Call override pre-gates each CallMethodRequest via GateCallMethodRequests. Denied calls return BadUserAccessDenied without running the method. Alarm methods map to alarm-specific operation kinds; everything else gates as generic Call.
  • Finer-grained scope resolution — current NodeScopeResolver returns a flat cluster-level scope. Joining against the live Configuration DB to populate UnsArea / UnsLine / Equipment path is tracked as Stream C.12.
  • 3-user integration matrix covering every operation × allow/deny.
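
As a rough illustration of the Call gating bullet above, the sketch below pre-gates a batch of method calls before the base stack would run them. It is written in Python for brevity rather than the project's C#. The string method ids are placeholders for the OPC UA `MethodIds` constants, and `is_allowed` is a hypothetical callback standing in for the authorization gate.

```python
from enum import Enum

# Placeholder ids; the real server matches on OPC UA MethodIds constants.
ACKNOWLEDGE = "AcknowledgeableConditionType_Acknowledge"
CONFIRM = "AcknowledgeableConditionType_Confirm"


class Op(Enum):
    CALL = "Call"
    ALARM_ACKNOWLEDGE = "AlarmAcknowledge"
    ALARM_CONFIRM = "AlarmConfirm"


def map_call_operation(method_id):
    """Map well-known alarm methods to dedicated operation kinds."""
    if method_id == ACKNOWLEDGE:
        return Op.ALARM_ACKNOWLEDGE
    if method_id == CONFIRM:
        return Op.ALARM_CONFIRM
    # Everything else, including per-instance Shelve methods, is a generic Call.
    return Op.CALL


def gate_call_method_requests(methods_to_call, errors, is_allowed):
    """Write BadUserAccessDenied into the error slot of each denied call.

    Slots that already hold an error are left untouched, and a missing
    gate (is_allowed is None) makes the whole function a no-op.
    """
    if is_allowed is None:
        return
    for i, method_id in enumerate(methods_to_call):
        if errors[i] is not None:
            continue  # preserve a pre-populated error
        if not is_allowed(map_call_operation(method_id)):
            errors[i] = "BadUserAccessDenied"
```

Keeping both functions free of server state is what makes them easy to unit-test. A mixed batch of allowed, denied, and already-failed requests can be checked with plain lists.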

Config fallback — Phase 6.1 Stream D wiring (task #136 — CLOSED 2026-04-19, PR #96)

Closed. SealedBootstrap consumes ResilientConfigReader + GenerationSealedCache + StaleConfigFlag end-to-end; /healthz surfaces the stale flag.
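
The fallback behaviour can be pictured with the following minimal Python sketch. It is an assumption-laden illustration, not the shipped reader: the `cache` dict stands in for the generation-sealed cache, and the `/healthz` payload shape (`status`, `configStale`) is hypothetical.

```python
class StaleConfigFlag:
    def __init__(self):
        self.is_stale = False


def read_generation(fetch_central, cache, flag):
    """Prefer the central config DB; fall back to the last sealed snapshot."""
    try:
        snapshot = fetch_central()
    except Exception:
        if "sealed" not in cache:
            raise  # nothing to fall back to on a first boot
        flag.is_stale = True
        return cache["sealed"]
    cache["sealed"] = snapshot  # seal the fresh snapshot for later fallback
    flag.is_stale = False
    return snapshot


def healthz(flag):
    # Hypothetical payload shape; the point is that the stale flag is surfaced.
    return {
        "status": "Degraded" if flag.is_stale else "Healthy",
        "configStale": flag.is_stale,
    }
```

The flag clears on the next successful central read, so a health probe reports stale configuration only while the node is actually serving from cache.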

Remaining follow-ups (hardening):

  • A HostedService that polls sp_GetCurrentGenerationForCluster periodically so peer-published generations land in this node's cache without a restart.
  • Richer snapshot payload via sp_GetGenerationContent so fallback can serve full generation content (DriverInstance enumeration, ACL rows, etc.) from the sealed cache alone.

Redundancy — Phase 6.3 Streams A/C core (tasks #145 + #147 — CLOSED 2026-04-19, PRs #98–99)

Closed. RedundancyCoordinator + RedundancyStatePublisher + PeerReachabilityTracker orchestrate topology + apply lease + recovery state + peer reachability through ServiceLevelCalculator + emit OnStateChanged / OnServerUriArrayChanged edge-triggered events.

Remaining Phase 6.3 surfaces (hardening, not release-blocking):

  • PeerHttpProbeLoop + PeerUaProbeLoop HostedServices populating PeerReachabilityTracker on each tick. Closed 2026-04-24. Two-layer probe model shipped: an HTTP probe against /healthz every 2 s with a 1 s timeout, and an OPC UA probe via DiscoveryClient.GetEndpoints every 10 s with a 5 s timeout that short-circuits when HTTP reports the peer unhealthy. Registered on the Server as AddHostedService<PeerHttpProbeLoop> + AddHostedService<PeerUaProbeLoop>. Publisher now sees accurate PeerReachability per peer instead of degrading to Unknown → Isolated-Primary band (230).
  • OPC UA variable-node wiring: bind ServiceLevel Byte + ServerUriArray String[] to the publisher's events via BaseDataVariable.OnReadValue / direct value push.
  • sp_PublishGeneration wraps its apply in await using var lease = coordinator.BeginApplyLease(...) so the PrimaryMidApply band (200) fires during actual publishes (task #148 part 2). Closed 2026-04-24. The apply loop now lives in GenerationRefreshHostedService — polls sp_GetCurrentGenerationForCluster every 5s, opens a lease when a new generation is detected, calls RedundancyCoordinator.RefreshAsync inside the await using, releases the lease on all exit paths. Replaces the previous "topology never refreshes without a process restart" behaviour.
  • Client interop matrix — Ignition / Kepware / Aveva OI Gateway (Stream F, task #150). Manual + doc-only.
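
The poll-then-lease loop from the sp_PublishGeneration bullet above can be sketched as follows. This is a simplified Python model under stated assumptions: `Coordinator`, `poll_once`, and the lease counter are illustrative names, and a context manager plays the role of the C# `await using` lease.

```python
from contextlib import contextmanager


class Coordinator:
    """Toy stand-in for the redundancy coordinator (names are assumptions)."""

    def __init__(self):
        self.open_leases = 0
        self.topology_generation = None

    @contextmanager
    def begin_apply_lease(self, generation_id):
        self.open_leases += 1
        try:
            yield generation_id
        finally:
            # Released on every exit path, including exceptions.
            self.open_leases -= 1

    @property
    def mid_apply(self):
        return self.open_leases > 0

    def refresh(self, generation_id):
        self.topology_generation = generation_id


def poll_once(coordinator, get_current_generation, last_seen):
    """One poll tick: refresh inside a lease only when the generation changed."""
    current = get_current_generation()
    if current == last_seen:
        return last_seen
    with coordinator.begin_apply_lease(current):
        coordinator.refresh(current)
    return current
```

While the lease is open, `mid_apply` is true, which is the window in which a service-level calculation would report the mid-apply band. Once the block exits, normally or through an exception, the lease count returns to zero.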

Phase 5 driver complement (task #120 — CLOSED 2026-04-24)

Closed. All four deferred drivers shipped:

  • AB CIP (PRs #202–222) — Driver.AbCip, Driver.AbCip.IntegrationTests (7 tests), AB CIP Cli. Live-boot verified against a ControlLogix rig.
  • AB Legacy (PRs #202, #223) — Driver.AbLegacy, Driver.AbLegacy.IntegrationTests (2 tests), AB Legacy Cli. PCCC cip-path workaround for SLC/MicroLogix.
  • TwinCAT ADS (PRs #205, this branch task-galaxy-e2e) — Driver.TwinCAT, Driver.TwinCAT.IntegrationTests (2 tests), TwinCAT Cli. TCBSD/ESXi fixture for e2e since local Hyper-V / TwinCAT RTIME are mutually exclusive on the dev box.
  • FOCAS (PRs #173, #199 + this session's migration) — Driver.FOCAS with an in-process managed FocasWireClient that speaks FOCAS/2 over TCP directly. Tier-C isolation retired — Driver.FOCAS.Host + Driver.FOCAS.Shared + FwlibNative P/Invoke + shim DLL + NSSM service all deleted. Driver.FOCAS.IntegrationTests covers 9 scenarios (fixed tree identity/axes/program/timers/spindle + user-authored PARAM/MACRO/PMC reads, Browse, Subscribe, IAlarmSource raise/clear, Probe transitions).

Decision recorded: FOCAS is read-only against the CNC by design — writes return BadNotWritable. See docs/drivers/FOCAS.md + docs/drivers/FOCAS-Test-Fixture.md for the deployment + coverage map.

Nice-to-haves (not release-blocking)

  • Admin UI — Phase 6.1 Stream E.2/E.3 (/hosts column refresh), Phase 6.2 Stream D (RoleGrantsTab + AclsTab Probe), Phase 6.3 Stream E (RedundancyTab), Phase 6.4 Streams A/B UI pieces, Stream C DiffViewer, Stream D IdentificationFields.razor. Tasks #134, #144, #149, #153, #155, #156, #157.
  • Background services — Phase 6.1 Stream B.4 ScheduledRecycleScheduler HostedService (task #137), Phase 6.1 Stream A analyzer (task #135 — Roslyn analyzer asserting every capability surface routes through CapabilityInvoker).
  • Multi-host dispatch — Phase 6.1 Stream A follow-up (task #135). Every driver currently gets a single pipeline keyed on driver.DriverInstanceId; multi-host drivers (Modbus with N PLCs) need per-PLC host resolution so failing PLCs trip per-PLC breakers without poisoning siblings. Decision #144 requires this, but it is not yet wired.
  • Phase 7 — scripting + alarming + historian sink (plan drafted 2026-04-20 in docs/v2/implementation/phase-7-*.md). Out of scope for v2 GA.
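
To make the multi-host dispatch item concrete, here is a minimal sketch of breaker state keyed on (driver instance, host) instead of the driver instance alone. It is a plain consecutive-failure counter written in Python for illustration. It is not the project's Polly pipeline, and `BreakerRegistry` and its threshold are hypothetical.

```python
from collections import defaultdict


class BreakerRegistry:
    """Consecutive-failure breakers keyed per (driver instance, host)."""

    def __init__(self, failure_threshold=3):
        self.failure_threshold = failure_threshold
        self._failures = defaultdict(int)

    def record_failure(self, driver_instance_id, host):
        self._failures[(driver_instance_id, host)] += 1

    def record_success(self, driver_instance_id, host):
        self._failures[(driver_instance_id, host)] = 0

    def is_open(self, driver_instance_id, host):
        # Use .get so a lookup does not create an entry for an unseen host.
        count = self._failures.get((driver_instance_id, host), 0)
        return count >= self.failure_threshold
```

With the host in the key, one unreachable PLC opens only its own breaker. Keying on the driver instance alone would let that PLC's failures count against every sibling behind the same driver.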

Live-hardware validations (task #54 + task family)

The code ships; these tasks remain open as lab/field verification:

  • #54 — FOCAS live-CNC wire-level smoke against a real FANUC control. The mock's wire responder is PDU-verified against fwlibe64.dll upstream but OtOpcUa's managed client has not been pointed at a production CNC.
  • AB CIP live-boot — already passed on a ControlLogix rig (PR #222). Continue to run ahead of each release.
  • TwinCAT wire-live — TCBSD/ESXi fixture covers the common path; production PLC verification remains lab-gated.

Running the release-readiness check

pwsh ./scripts/compliance/phase-6-all.ps1

This meta-runner invokes each phase-6-N-compliance.ps1 script in sequence and reports an aggregate PASS/FAIL:

  • phase-6-1-compliance.ps1 — Resilience & Observability
  • phase-6-2-compliance.ps1 — Authorization runtime
  • phase-6-3-compliance.ps1 — Redundancy runtime
  • phase-6-4-compliance.ps1 — Admin UI completion

Exit 0 = every phase passes its compliance checks + no test-count regression.
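
The aggregation rule of the meta-runner can be illustrated with a short Python sketch. The real runner is a PowerShell script; here each phase check is modelled as a callable returning an exit code, and the choice to keep running later phases after a failure is an assumption.

```python
def run_all(checks):
    """Run each (name, check) pair in order and return an aggregate exit code.

    Each check is a callable returning a process-style exit code (0 = pass).
    """
    results = [(name, check()) for name, check in checks]
    for name, code in results:
        print(f"{name}: {'PASS' if code == 0 else 'FAIL'}")
    exit_code = 0 if all(code == 0 for _, code in results) else 1
    print("AGGREGATE:", "PASS" if exit_code == 0 else "FAIL")
    return exit_code
```

Running every phase even after one fails gives a complete picture in a single pass, at the cost of a slightly longer run.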

Release-readiness exit criteria

v2 GA requires all of the following:

  • All four Phase 6.N compliance scripts exit 0.
  • dotnet test ZB.MOM.WW.OtOpcUa.slnx passes with ≤ 1 known-flake failure.
  • Release blockers listed above all closed.
  • Phase 5 driver complement shipped (Galaxy, Modbus, S7, OpcUaClient, AbCip, AbLegacy, TwinCAT, FOCAS).
  • Production deployment checklist (separate doc) signed off by Fleet Admin.
  • At least one end-to-end integration run against the live Galaxy on the dev box succeeds.
  • FOCAS live-CNC wire-level smoke (#54) runs clean against a real FANUC control.
  • OPC UA conformance test (CTT or UA Compliance Test Tool) passes against the live endpoint.
  • Non-transparent redundancy cutover validated with at least one production client (Ignition 8.3 recommended — see decision #85).
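
One way to read the "≤ 1 known-flake failure" criterion is as a check over the list of failed test names. The Python sketch below assumes that the only tolerated flake is the Client.CLI test named earlier in this doc; the function name and the default limit are hypothetical.

```python
# Assumption: the single tolerated flake is the one tracked in this doc.
KNOWN_FLAKES = frozenset({"SubscribeCommandTests.Execute_PrintsSubscriptionMessage"})


def run_is_acceptable(failed_tests, known_flakes=KNOWN_FLAKES, max_flakes=1):
    """Accept a run only if every failure is a known flake, within the limit."""
    failed = set(failed_tests)
    return failed <= known_flakes and len(failed) <= max_flakes
```

Under this reading a single unknown failure fails the criterion even when the total failure count is one.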

Change log

  • 2026-04-24 — Phase 5 driver complement closed (task #120 CLOSED). AB CIP, AB Legacy, TwinCAT, FOCAS all shipped. FOCAS migration: retired the Tier-C split (Driver.FOCAS.Host + Driver.FOCAS.Shared + FwlibNative + shim DLL deleted) in favour of a pure-managed in-process FocasWireClient inlined into Driver.FOCAS; driver is now read-only against the CNC by design. Integration test matrix grew to cover Browse / Subscribe / IAlarmSource / Probe end-to-end.
  • 2026-04-23 — Phase 6.4 audit close-out. IdentificationFolderBuilder + OPC 40010 Identification folder verified against the shipped code.
  • 2026-04-20 — Phase 7 plan drafted (phase-7-scripting-and-alarming.md, phase-7-e2e-smoke.md). Out of scope for v2 GA.
  • 2026-04-19 — Release blocker #3 closed (PRs #98–99). Phase 6.3 Streams A + C core shipped: ClusterTopologyLoader + RedundancyCoordinator + RedundancyStatePublisher + PeerReachabilityTracker. Code-path release blockers all closed; remaining Phase 6.3 surfaces (peer-probe HostedServices, OPC UA variable-node binding, sp_PublishGeneration lease wrap, client interop matrix) are hardening follow-ups.
  • 2026-04-19 — Release blocker #2 closed (PR #96). SealedBootstrap consumes ResilientConfigReader + GenerationSealedCache + StaleConfigFlag; /healthz surfaces the stale flag. Remaining follow-ups (periodic poller + richer snapshot payload) downgraded to hardening.
  • 2026-04-19 — Release blocker #1 closed (PR #94). AuthorizationGate wired into DriverNodeManager Read / Write / HistoryRead dispatch. Remaining Stream C surfaces (Browse / Subscribe / Alarm / Call + finer-grained scope resolution) downgraded to hardening follow-ups — no longer release-blocking.
  • 2026-04-19 — Phase 6.4 data layer merged (PRs #91–92). Phase 6 core complete.
  • 2026-04-19 — Phase 6.3 core merged (PRs #89–90). ServiceLevelCalculator + RecoveryStateManager + ApplyLeaseRegistry land as pure logic; coordinator / UA-node wiring / Admin UI / interop deferred.
  • 2026-04-19 — Phase 6.2 core merged (PRs #84–88). AuthorizationGate + TriePermissionEvaluator + LdapGroupRoleMapping land; dispatch wiring + Admin UI deferred.
  • 2026-04-19 — Phase 6.1 shipped (PRs #78–83). Polly resilience + Tier A/B/C stability + health endpoints + LiteDB generation-sealed cache + Admin /hosts data layer all live.