Closes task #121 (partial — creation-time gate; decision #153 per-item revocation stamp is a follow-up). Before this commit a session could subscribe to any node via CreateMonitoredItems, even nodes where Read was denied — the subscription would surface BadUserAccessDenied on each data-change read, but the client saw a successful CreateMonitoredItems response and held the subscription open, wasting resources and leaking the address-space shape through the item metadata. New override on DriverNodeManager.CreateMonitoredItems: - Pre-iterates itemsToCreate, gates each through AuthorizationGate with OpcUaOperation.CreateMonitoredItems at the target node's scope. - For denied slots: sets errors[i] = new ServiceResult( StatusCodes.BadUserAccessDenied). The OPC Foundation base stack honours pre-populated non-success errors and skips item creation for those slots — the subscription never holds a handle to a denied node. - Preserves prior errors (e.g. BadNodeIdUnknown) — first diagnosis wins. - Non-string-identifier references (stack-synthesized numeric ids) bypass the gate. Extracted the pure filter logic into GateMonitoredItemCreateRequests(items, errors, identity, gate, scopeResolver) — static internal, unit-testable without the OPC UA server stack. Tests — 6 new in MonitoredItemGatingTests.cs (gate-null no-op, denied-gets-BadUserAccessDenied, allowed-passes, mixed-batch-denies- per-item, pre-populated-error-preserved, numeric-id-bypass). Server.Tests 263 → 269. Known follow-ups: - Per-item (AuthGenerationId, MembershipVersion) stamp (decision #153) for detecting revocation mid-subscription — needs subscription-layer plumbing. - TransferSubscriptions not yet wired (same pattern, smaller scope). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
131 lines
14 KiB
Markdown
131 lines
14 KiB
Markdown
# v2 Release Readiness
|
||
|
||
> **Last updated**: 2026-04-24 (Phase 5 driver complement closed — AB CIP, AB Legacy, TwinCAT, FOCAS all shipped; FOCAS Tier-C retired for a pure-managed in-process client)
|
||
> **Status**: **RELEASE-READY (code-path)** for v2 GA. All three original code-path release blockers remain closed. Phase 5 is now complete. Remaining work is manual (live-hardware validations, client interop matrix, deployment checklist signoff, OPC UA CTT pass) + hardening follow-ups; see exit-criteria checklist below.
|
||
|
||
This doc is the single view of where v2 stands against its release criteria. Update it whenever a deferred follow-up closes or a new release blocker is discovered.
|
||
|
||
## Release-readiness dashboard
|
||
|
||
| Phase | Shipped | Status |
|
||
|---|---|---|
|
||
| Phase 0 — Rename + entry gate | ✓ | Shipped |
|
||
| Phase 1 — Configuration + Admin scaffold | ✓ | Shipped (some UI items deferred to 6.4) |
|
||
| Phase 2 — Galaxy driver split (Proxy/Host/Shared) | ✓ | Shipped |
|
||
| Phase 3 — OPC UA server + LDAP + security profiles | ✓ | Shipped |
|
||
| Phase 4 — Redundancy scaffold (entities + endpoints) | ✓ | Shipped (runtime closes in 6.3) |
|
||
| Phase 5 — Drivers | ✓ | **Shipped** — Galaxy, Modbus (+ DL205/S7/MELSEC profiles), S7 native, OPC UA Client, AB CIP, AB Legacy, TwinCAT ADS, FOCAS (managed wire client) |
|
||
| Phase 6.1 — Resilience & Observability | ✓ | Shipped (PRs #78–83) |
|
||
| Phase 6.2 — Authorization runtime | ◐ core | Core shipped (PRs #84–88, #94 dispatch wiring); finer-grained Browse/Subscribe/Alarm/Call gating + 3-user interop matrix deferred |
|
||
| Phase 6.3 — Redundancy runtime | ◐ core | Core shipped (PRs #89–90, #98–99); peer-probe HostedServices, OPC UA variable-node binding, `sp_PublishGeneration` lease wrap, client interop matrix deferred |
|
||
| Phase 6.4 — Admin UI completion | ◐ data layer + Identification | Data layer + OPC 40010 Identification folder shipped (PRs #91–92, Identification audit close-out 2026-04-23); Blazor UI pieces deferred |
|
||
|
||
**Driver integration-test counts** (end-to-end against live or simulated targets): Modbus 26, FOCAS 9, AbCip 7, OpcUaClient 3, S7 3, AbLegacy 2, TwinCAT 2. Plus Galaxy's separate cross-FX parity/stability suite.
|
||
|
||
**Aggregate test counts** (2026-04-19 baseline): 1159 passing across the solution. One pre-existing Client.CLI `SubscribeCommandTests.Execute_PrintsSubscriptionMessage` flake tracked separately. Rerun `dotnet test ZB.MOM.WW.OtOpcUa.slnx` after the FOCAS migration commits land to refresh the number.
|
||
|
||
## Release blockers (must close before v2 GA)
|
||
|
||
All code-path release blockers are closed. The remaining items are live-hardware / manual validations listed under exit criteria.
|
||
|
||
### ~~Security — Phase 6.2 dispatch wiring~~ (task #143 — **CLOSED** 2026-04-19, PR #94)
|
||
|
||
**Closed**. `AuthorizationGate` + `NodeScopeResolver` thread through `OpcUaApplicationHost → OtOpcUaServer → DriverNodeManager`. `OnReadValue` + `OnWriteValue` + all four HistoryRead paths call `gate.IsAllowed(identity, operation, scope)` before the invoker. Production deployments activate enforcement by constructing `OpcUaApplicationHost` with an `AuthorizationGate(StrictMode: true)` + populating the `NodeAcl` table.
|
||
|
||
Remaining Stream C surfaces (hardening, not release-blocking):
|
||
|
||
- ~~Browse + TranslateBrowsePathsToNodeIds gating with ancestor-visibility logic per `acl-design.md` §Browse.~~ **Partial, 2026-04-24.** `DriverNodeManager.Browse` override post-filters the `ReferenceDescription` list via a new `FilterBrowseReferences` helper — denied nodes disappear silently per OPC UA convention. Ancestor-visibility implication (Read-grant at `Line/Tag` implying Browse on `Line`) still to ship; needs a subtree-has-any-grant query on the trie evaluator. `TranslateBrowsePathsToNodeIds` surface not yet wired.
|
||
- ~~CreateMonitoredItems + TransferSubscriptions gating with per-item `(AuthGenerationId, MembershipVersion)` stamp so revoked grants surface `BadUserAccessDenied` within one publish cycle (decision #153).~~ **Partial, 2026-04-24.** `DriverNodeManager.CreateMonitoredItems` override pre-gates each request and pre-populates `BadUserAccessDenied` into the errors slot for denied items (the base stack honours pre-set errors and skips those items). Decision #153's per-item `(AuthGenerationId, MembershipVersion)` stamp for detecting mid-subscription revocation is still to ship — needs subscription-layer plumbing. TransferSubscriptions not yet wired (same pattern).
|
||
- Alarm Acknowledge / Confirm / Shelve gating.
|
||
- Call (method invocation) gating.
|
||
- Finer-grained scope resolution — current `NodeScopeResolver` returns a flat cluster-level scope. Joining against the live Configuration DB to populate UnsArea / UnsLine / Equipment path is tracked as Stream C.12.
|
||
- 3-user integration matrix covering every operation × allow/deny.
|
||
|
||
### ~~Config fallback — Phase 6.1 Stream D wiring~~ (task #136 — **CLOSED** 2026-04-19, PR #96)
|
||
|
||
**Closed**. `SealedBootstrap` consumes `ResilientConfigReader` + `GenerationSealedCache` + `StaleConfigFlag` end-to-end; `/healthz` surfaces the stale flag.
|
||
|
||
Remaining follow-ups (hardening):
|
||
|
||
- A `HostedService` that polls `sp_GetCurrentGenerationForCluster` periodically so peer-published generations land in this node's cache without a restart.
|
||
- Richer snapshot payload via `sp_GetGenerationContent` so fallback can serve full generation content (DriverInstance enumeration, ACL rows, etc.) from the sealed cache alone.
|
||
|
||
### ~~Redundancy — Phase 6.3 Streams A/C core~~ (tasks #145 + #147 — **CLOSED** 2026-04-19, PRs #98–99)
|
||
|
||
**Closed**. `RedundancyCoordinator` + `RedundancyStatePublisher` + `PeerReachabilityTracker` orchestrate topology + apply lease + recovery state + peer reachability through `ServiceLevelCalculator` + emit `OnStateChanged` / `OnServerUriArrayChanged` edge-triggered events.
|
||
|
||
Remaining Phase 6.3 surfaces (hardening, not release-blocking):
|
||
|
||
- ~~`PeerHttpProbeLoop` + `PeerUaProbeLoop` HostedServices populating `PeerReachabilityTracker` on each tick.~~ **Closed 2026-04-24.** Two-layer probe model shipped: HTTP probe at 2 s / 1 s timeout against `/healthz`; OPC UA probe at 10 s / 5 s timeout via `DiscoveryClient.GetEndpoints`, short-circuiting when HTTP reports the peer unhealthy. Registered on the Server as `AddHostedService<PeerHttpProbeLoop>` + `AddHostedService<PeerUaProbeLoop>`. Publisher now sees accurate `PeerReachability` per peer instead of degrading to `Unknown` → Isolated-Primary band (230).
|
||
- OPC UA variable-node wiring: bind `ServiceLevel` Byte + `ServerUriArray` String[] to the publisher's events via `BaseDataVariable.OnReadValue` / direct value push.
|
||
- ~~`sp_PublishGeneration` wraps its apply in `await using var lease = coordinator.BeginApplyLease(...)` so the `PrimaryMidApply` band (200) fires during actual publishes (task #148 part 2).~~ **Closed 2026-04-24.** The apply loop now lives in `GenerationRefreshHostedService` — polls `sp_GetCurrentGenerationForCluster` every 5s, opens a lease when a new generation is detected, calls `RedundancyCoordinator.RefreshAsync` inside the `await using`, releases the lease on all exit paths. Replaces the previous "topology never refreshes without a process restart" behaviour.
|
||
- Client interop matrix — Ignition / Kepware / Aveva OI Gateway (Stream F, task #150). Manual + doc-only.
|
||
|
||
### ~~Phase 5 driver complement~~ (task #120 — **CLOSED** 2026-04-24)
|
||
|
||
**Closed**. All four deferred drivers shipped:
|
||
|
||
- **AB CIP** (PRs #202–222) — `Driver.AbCip`, `Driver.AbCip.IntegrationTests` (7 tests), AB CIP Cli. Live-boot verified against a ControlLogix rig.
|
||
- **AB Legacy** (PRs #202, #223) — `Driver.AbLegacy`, `Driver.AbLegacy.IntegrationTests` (2 tests), AB Legacy Cli. PCCC cip-path workaround for SLC/MicroLogix.
|
||
- **TwinCAT ADS** (PRs #205, this branch `task-galaxy-e2e`) — `Driver.TwinCAT`, `Driver.TwinCAT.IntegrationTests` (2 tests), TwinCAT Cli. TCBSD/ESXi fixture for e2e since local Hyper-V / TwinCAT RTIME are mutually exclusive on the dev box.
|
||
- **FOCAS** (PRs #173, #199 + this session's migration) — `Driver.FOCAS` with an **in-process managed `FocasWireClient`** that speaks FOCAS/2 over TCP directly. Tier-C isolation retired — `Driver.FOCAS.Host` + `Driver.FOCAS.Shared` + `FwlibNative` P/Invoke + shim DLL + NSSM service all deleted. `Driver.FOCAS.IntegrationTests` covers 9 scenarios (fixed tree identity/axes/program/timers/spindle + user-authored PARAM/MACRO/PMC reads, Browse, Subscribe, IAlarmSource raise/clear, Probe transitions).
|
||
|
||
Decision recorded: FOCAS is **read-only** against the CNC by design — writes return `BadNotWritable`. See `docs/drivers/FOCAS.md` + `docs/drivers/FOCAS-Test-Fixture.md` for the deployment + coverage map.
|
||
|
||
## Nice-to-haves (not release-blocking)
|
||
|
||
- **Admin UI** — Phase 6.1 Stream E.2/E.3 (`/hosts` column refresh), Phase 6.2 Stream D (`RoleGrantsTab` + `AclsTab` Probe), Phase 6.3 Stream E (`RedundancyTab`), Phase 6.4 Streams A/B UI pieces, Stream C DiffViewer, Stream D `IdentificationFields.razor`. Tasks #134, #144, #149, #153, #155, #156, #157.
|
||
- **Background services** — Phase 6.1 Stream B.4 `ScheduledRecycleScheduler` HostedService (task #137), Phase 6.1 Stream A analyzer (task #135 — Roslyn analyzer asserting every capability surface routes through `CapabilityInvoker`).
|
||
- **Multi-host dispatch** — Phase 6.1 Stream A follow-up (task #135). Every driver currently gets a single pipeline keyed on `driver.DriverInstanceId`; multi-host drivers (Modbus with N PLCs) need per-PLC host resolution so failing PLCs trip per-PLC breakers without poisoning siblings. Decision #144 requires this but not wired.
|
||
- **Phase 7** — scripting + alarming + historian sink (plan drafted 2026-04-20 in `docs/v2/implementation/phase-7-*.md`). Out of scope for v2 GA.
|
||
|
||
## Live-hardware validations (task #54 + task family)
|
||
|
||
The code ships; these tasks remain open as lab/field verification:
|
||
|
||
- **#54** — FOCAS live-CNC wire-level smoke against a real FANUC control. The mock's wire responder is PDU-verified against `fwlibe64.dll` upstream but OtOpcUa's managed client has not been pointed at a production CNC.
|
||
- **AB CIP live-boot** — already passed on a ControlLogix rig (PR #222). Continue to run ahead of each release.
|
||
- **TwinCAT wire-live** — TCBSD/ESXi fixture covers the common path; production PLC verification remains lab-gated.
|
||
|
||
## Running the release-readiness check
|
||
|
||
```bash
|
||
pwsh ./scripts/compliance/phase-6-all.ps1
|
||
```
|
||
|
||
This meta-runner invokes each `phase-6-N-compliance.ps1` script in sequence and reports an aggregate PASS/FAIL:
|
||
|
||
- `phase-6-1-compliance.ps1` — Resilience & Observability
|
||
- `phase-6-2-compliance.ps1` — Authorization runtime
|
||
- `phase-6-3-compliance.ps1` — Redundancy runtime
|
||
- `phase-6-4-compliance.ps1` — Admin UI completion
|
||
|
||
Exit 0 = every phase passes its compliance checks + no test-count regression.
|
||
|
||
## Release-readiness exit criteria
|
||
|
||
v2 GA requires all of the following:
|
||
|
||
- [ ] All four Phase 6.N compliance scripts exit 0.
|
||
- [ ] `dotnet test ZB.MOM.WW.OtOpcUa.slnx` passes with ≤ 1 known-flake failure.
|
||
- [x] Release blockers listed above all closed.
|
||
- [x] Phase 5 driver complement shipped (Galaxy, Modbus, S7, OpcUaClient, AbCip, AbLegacy, TwinCAT, FOCAS).
|
||
- [ ] Production deployment checklist (separate doc) signed off by Fleet Admin.
|
||
- [ ] At least one end-to-end integration run against the live Galaxy on the dev box succeeds.
|
||
- [ ] FOCAS live-CNC wire-level smoke (#54) runs clean against a real FANUC control.
|
||
- [ ] OPC UA conformance test (CTT or UA Compliance Test Tool) passes against the live endpoint.
|
||
- [ ] Non-transparent redundancy cutover validated with at least one production client (Ignition 8.3 recommended — see decision #85).
|
||
|
||
## Change log
|
||
|
||
- **2026-04-24** — Phase 5 driver complement closed (task #120 CLOSED). AB CIP, AB Legacy, TwinCAT, FOCAS all shipped. FOCAS migration: retired the Tier-C split (`Driver.FOCAS.Host` + `Driver.FOCAS.Shared` + `FwlibNative` + shim DLL deleted) in favour of a pure-managed in-process `FocasWireClient` inlined into `Driver.FOCAS`; driver is now read-only against the CNC by design. Integration test matrix grew to cover Browse / Subscribe / IAlarmSource / Probe end-to-end.
|
||
- **2026-04-23** — Phase 6.4 audit close-out. IdentificationFolderBuilder + OPC 40010 Identification folder verified against the shipped code.
|
||
- **2026-04-20** — Phase 7 plan drafted (`phase-7-scripting-and-alarming.md`, `phase-7-e2e-smoke.md`). Out of scope for v2 GA.
|
||
- **2026-04-19** — Release blocker #3 closed (PRs #98–99). Phase 6.3 Streams A + C core shipped: `ClusterTopologyLoader` + `RedundancyCoordinator` + `RedundancyStatePublisher` + `PeerReachabilityTracker`. Code-path release blockers all closed; remaining Phase 6.3 surfaces (peer-probe HostedServices, OPC UA variable-node binding, `sp_PublishGeneration` lease wrap, client interop matrix) are hardening follow-ups.
|
||
- **2026-04-19** — Release blocker #2 closed (PR #96). `SealedBootstrap` consumes `ResilientConfigReader` + `GenerationSealedCache` + `StaleConfigFlag`; `/healthz` surfaces the stale flag. Remaining follow-ups (periodic poller + richer snapshot payload) downgraded to hardening.
|
||
- **2026-04-19** — Release blocker #1 closed (PR #94). `AuthorizationGate` wired into `DriverNodeManager` Read / Write / HistoryRead dispatch. Remaining Stream C surfaces (Browse / Subscribe / Alarm / Call + finer-grained scope resolution) downgraded to hardening follow-ups — no longer release-blocking.
|
||
- **2026-04-19** — Phase 6.4 data layer merged (PRs #91–92). Phase 6 core complete.
|
||
- **2026-04-19** — Phase 6.3 core merged (PRs #89–90). `ServiceLevelCalculator` + `RecoveryStateManager` + `ApplyLeaseRegistry` land as pure logic; coordinator / UA-node wiring / Admin UI / interop deferred.
|
||
- **2026-04-19** — Phase 6.2 core merged (PRs #84–88). `AuthorizationGate` + `TriePermissionEvaluator` + `LdapGroupRoleMapping` land; dispatch wiring + Admin UI deferred.
|
||
- **2026-04-19** — Phase 6.1 shipped (PRs #78–83). Polly resilience + Tier A/B/C stability + health endpoints + LiteDB generation-sealed cache + Admin `/hosts` data layer all live.
|