lmxopcua/docs/v2/v2-release-readiness.md
Joseph Doherty 4b0664bd55 FOCAS — retire Tier-C split, inline managed wire client, make read-only
Migration closes the FOCAS Tier-C architecture. OtOpcUa previously had
`Driver.FOCAS.Host` (NSSM-wrapped Windows service loading Fwlib64.dll via
P/Invoke) + `Driver.FOCAS.Shared` (MessagePack IPC contracts) + a C shim
DLL stand-in for unit tests. All of it is deleted; the driver is now a
single in-process managed assembly talking the FOCAS/2 Ethernet binary
protocol directly on TCP:8193.

Architecture

- Pure-managed `FocasWireClient` inlined at `src/.../Driver.FOCAS/Wire/`
  (owner-imported — see Wire/FocasWireClient.cs for the full surface).
  Opens two TCP sockets, runs the initiate handshake, serialises requests
  on socket 2 through a semaphore, closes cleanly with PDU + socket
  teardown. Both sync `IDisposable` and async `IAsyncDisposable`.
- `WireFocasClient` (same folder) adapts the wire client to OtOpcUa's
  `IFocasClient` surface — fixed-tree reads, PARAM/MACRO/PMC addresses,
  alarms. Writes return `BadNotWritable` by design — OtOpcUa is read-only
  against FOCAS.
- `FocasDriverFactoryExtensions` now accepts `"Backend": "wire"` (default)
  and `"Backend": "unimplemented"`. Legacy `ipc` and `fwlib` backends are
  rejected at startup with a diagnostic pointing at the migration doc.
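The backend selection rule above can be sketched as follows. This is a minimal Python illustration, not the C# `FocasDriverFactoryExtensions` code: the function name `resolve_backend`, the exact-match comparison, and the path used in the diagnostic are all assumptions made for the example.

```python
# Hypothetical sketch of the "Backend" config rule described above.
SUPPORTED_BACKENDS = {"wire", "unimplemented"}
RETIRED_BACKENDS = {"ipc", "fwlib"}
MIGRATION_DOC = "docs/drivers/FOCAS.md"  # assumed target of the diagnostic


def resolve_backend(config: dict) -> str:
    """Return the backend name, defaulting to "wire" when the key is absent."""
    backend = config.get("Backend", "wire")
    if backend in RETIRED_BACKENDS:
        # Legacy backends fail fast at startup instead of degrading silently.
        raise ValueError(
            f"FOCAS backend '{backend}' has been retired; see {MIGRATION_DOC}"
        )
    if backend not in SUPPORTED_BACKENDS:
        raise ValueError(f"Unknown FOCAS backend '{backend}'")
    return backend
```

Failing at startup keeps a stale `ipc` or `fwlib` config from silently running with a backend the operator did not choose.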

Deletions

- `src/ZB.MOM.WW.OtOpcUa.Driver.FOCAS.Host/` — whole project + Ipc/,
  Backend/, Stability/, Program.cs.
- `src/ZB.MOM.WW.OtOpcUa.Driver.FOCAS.Shared/` — Contracts/, FrameReader,
  FrameWriter, whole project.
- `tests/...Driver.FOCAS.Host.Tests/` + `.Shared.Tests/` — whole projects.
- `src/.../Driver.FOCAS/FwlibNative.cs` + `FwlibFocasClient.cs` — 21
  P/Invokes + 7 `Pack=1` marshalling structs + the Fwlib-backed
  `IFocasClient` implementation.
- `src/.../Driver.FOCAS/Ipc/` + `Supervisor/` — IPC client wrapper +
  Host-process supervisor (backoff, circuit breaker, heartbeat, post-
  mortem reader, process launcher).
- `scripts/install/Install-FocasHost.ps1` — NSSM service installer.
- `tests/.../Driver.FOCAS.Tests/{IpcFocasClientTests, IpcLoopback,
  FwlibNativeHelperTests, PostMortemReaderCompatibilityTests,
  SupervisorTests, FocasDriverFactoryExtensionsTests}.cs` — tests that
  exercised the retired surfaces.
- `tests/.../Driver.FOCAS.IntegrationTests/Shim/` — the zig-built C shim
  DLL that masqueraded as Fwlib64.dll.

Solution changes

- `ZB.MOM.WW.OtOpcUa.slnx` drops the 4 retired project refs.
- `src/.../Driver.FOCAS.csproj` drops the Shared ProjectReference, adds
  `Microsoft.Extensions.Logging.Abstractions` for the optional `ILogger`
  hook in `FocasWireClient`.
- `src/.../Driver.FOCAS.Cli.csproj` drops the six `<Content Include>`
  entries that copied `vendor/fanuc/*.dll` into the CLI bin. CLI now uses
  `WireFocasClient` directly.
- `FocasDriver` default factory flips to `Wire.WireFocasClientFactory`.

Integration tests

- New `tests/.../Driver.FOCAS.IntegrationTests/` project covering fixed-
  tree reads (identity, axes, dynamic, program, operation mode, timers,
  spindle load + max RPM, servo meters), user-authored PARAM / MACRO /
  PMC reads, `DiscoverAsync` emission, `SubscribeAsync` + `OnDataChange`,
  `IAlarmSource` raise/clear transitions, and `ProbeAsync` /
  `OnHostStatusChanged`. 9 e2e tests against the focas-mock fixture
  (Docker container with the vendored Python mock's native FOCAS/2
  Ethernet responder).
- `scripts/integration/run-focas.ps1` orchestrates compose up → tests →
  compose down. Dropped the shim-build stage + DLL-copy step + the split
  testhost workaround (the latter only existed because of native-DLL
  lifecycle bugs the shim tripped).
- Docker compose collapses from 11 per-series services to one `focas-sim`
  service. Tests seed per-series state via `mock_load_profile` at test
  start.
- Vendored focas-mock snapshot refreshed to pick up upstream's native
  FOCAS/2 Ethernet responder (was 660 lines, now 1018) — the
  pre-refresh snapshot only spoke the JSON admin protocol.

Tests

- 145/145 unit tests in `Driver.FOCAS.Tests` pass (was 208 pre-deletion;
  63 removed tests exercised the retired IPC/shim/supervisor/Fwlib
  surfaces).
- 9/9 integration tests pass against the refreshed mock.
- `FocasScaffoldingTests.Unimplemented_factory_throws_on_Create…` updated
  to assert the new diagnostic message pointing at
  `docs/drivers/FOCAS.md` rather than the now-gone `Fwlib64.dll`.

Docs

- `docs/drivers/FOCAS.md` rewritten for the managed wire topology —
  deployment collapses to one `"Backend": "wire"` config block, no
  separate service, no DLL deployment, no pipe ACL.
- `docs/drivers/FOCAS-Test-Fixture.md` updated — single TCP probe skip
  gate instead of TCP + shim probe; fewer moving parts.
- `docs/drivers/README.md` row for FOCAS reflects the Tier-A managed
  topology (previously listed Tier-C + `Fwlib64.dll` P/Invoke).
- `docs/Driver.FOCAS.Cli.md` drops the Tier-C architecture-note section.
- `docs/v2/implementation/focas-isolation-plan.md` marked historical —
  the plan it documents was executed then superseded by the wire client.
- `docs/v2/v2-release-readiness.md` re-audited 2026-04-24. Phase 5
  driver complement closed. FOCAS change-log entry added.
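A single TCP probe skip gate of the kind mentioned for the test fixture could look roughly like this. It is a Python sketch under assumptions: the helper name `focas_endpoint_reachable` and the one-second timeout are illustrative, and the real fixture is part of the .NET test project.

```python
import socket


def focas_endpoint_reachable(host: str, port: int = 8193, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port can be opened within timeout."""
    try:
        # create_connection raises OSError (refused, unreachable, timed out) on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

A test suite would call this once at start-up and skip the integration tests when it returns False, rather than letting each test fail on its own connection timeout.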

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 14:10:59 -04:00


v2 Release Readiness

Last updated: 2026-04-24 (Phase 5 driver complement closed: AB CIP, AB Legacy, TwinCAT, FOCAS all shipped; FOCAS Tier-C retired for a pure-managed in-process client)

Status: RELEASE-READY (code-path) for v2 GA. All three original code-path release blockers remain closed. Phase 5 is now complete. Remaining work is manual (live-hardware validations, client interop matrix, deployment checklist signoff, OPC UA CTT pass) + hardening follow-ups; see exit-criteria checklist below.

This doc is the single view of where v2 stands against its release criteria. Update it whenever a deferred follow-up closes or a new release blocker is discovered.

Release-readiness dashboard

| Phase | Shipped | Status |
|---|---|---|
| Phase 0 — Rename + entry gate | | Shipped |
| Phase 1 — Configuration + Admin scaffold | | Shipped (some UI items deferred to 6.4) |
| Phase 2 — Galaxy driver split (Proxy/Host/Shared) | | Shipped |
| Phase 3 — OPC UA server + LDAP + security profiles | | Shipped |
| Phase 4 — Redundancy scaffold (entities + endpoints) | | Shipped (runtime closes in 6.3) |
| Phase 5 — Drivers | | Shipped: Galaxy, Modbus (+ DL205/S7/MELSEC profiles), S7 native, OPC UA Client, AB CIP, AB Legacy, TwinCAT ADS, FOCAS (managed wire client) |
| Phase 6.1 — Resilience & Observability | | Shipped (PRs #78-83) |
| Phase 6.2 — Authorization runtime | ◐ core | Core shipped (PRs #84-88, #94 dispatch wiring); finer-grained Browse/Subscribe/Alarm/Call gating + 3-user interop matrix deferred |
| Phase 6.3 — Redundancy runtime | ◐ core | Core shipped (PRs #89-90, #98-99); peer-probe HostedServices, OPC UA variable-node binding, sp_PublishGeneration lease wrap, client interop matrix deferred |
| Phase 6.4 — Admin UI completion | ◐ data layer + Identification | Data layer + OPC 40010 Identification folder shipped (PRs #91-92, Identification audit close-out 2026-04-23); Blazor UI pieces deferred |

Driver integration-test counts (end-to-end against live or simulated targets): Modbus 26, FOCAS 9, AbCip 7, OpcUaClient 3, S7 3, AbLegacy 2, TwinCAT 2. Plus Galaxy's separate cross-FX parity/stability suite.

Aggregate test counts (2026-04-19 baseline): 1159 passing across the solution. One pre-existing Client.CLI SubscribeCommandTests.Execute_PrintsSubscriptionMessage flake tracked separately. Rerun dotnet test ZB.MOM.WW.OtOpcUa.slnx after the FOCAS migration commits land to refresh the number.

Release blockers (must close before v2 GA)

All code-path release blockers are closed. The remaining items are live-hardware / manual validations listed under exit criteria.

Security — Phase 6.2 dispatch wiring (task #143 — CLOSED 2026-04-19, PR #94)

Closed. AuthorizationGate + NodeScopeResolver thread through OpcUaApplicationHost → OtOpcUaServer → DriverNodeManager. OnReadValue + OnWriteValue + all four HistoryRead paths call gate.IsAllowed(identity, operation, scope) before the invoker. Production deployments activate enforcement by constructing OpcUaApplicationHost with an AuthorizationGate(StrictMode: true) + populating the NodeAcl table.
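The gate-before-invoker pattern described above can be sketched as follows. This is a Python illustration, not the shipped C# types: the in-memory `grants` set stands in for the NodeAcl table, the lax-mode behaviour (un-granted calls pass when strict mode is off) is an assumption about what StrictMode means, and returning BadUserAccessDenied on a denied read is likewise assumed.

```python
from dataclasses import dataclass, field


@dataclass
class AuthorizationGate:
    strict_mode: bool = False
    # Stand-in for the NodeAcl table: explicitly granted (identity, operation, scope) triples.
    grants: set = field(default_factory=set)

    def is_allowed(self, identity: str, operation: str, scope: str) -> bool:
        if (identity, operation, scope) in self.grants:
            return True
        # Assumption: without strict mode the gate does not enforce.
        return not self.strict_mode


def read_value(gate: AuthorizationGate, identity: str, scope: str, invoker):
    """Consult the gate first; the driver invoker only runs for allowed calls."""
    if not gate.is_allowed(identity, "Read", scope):
        return ("BadUserAccessDenied", None)
    return ("Good", invoker())
```

Checking before the invoker means a denied request never reaches the driver, so a denial costs no device round-trip.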

Remaining Stream C surfaces (hardening, not release-blocking):

  • Browse + TranslateBrowsePathsToNodeIds gating with ancestor-visibility logic per acl-design.md §Browse.
  • CreateMonitoredItems + TransferSubscriptions gating with per-item (AuthGenerationId, MembershipVersion) stamp so revoked grants surface BadUserAccessDenied within one publish cycle (decision #153).
  • Alarm Acknowledge / Confirm / Shelve gating.
  • Call (method invocation) gating.
  • Finer-grained scope resolution — current NodeScopeResolver returns a flat cluster-level scope. Joining against the live Configuration DB to populate UnsArea / UnsLine / Equipment path is tracked as Stream C.12.
  • 3-user integration matrix covering every operation × allow/deny.
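One way to read the per-item stamp idea in the CreateMonitoredItems bullet is sketched below in Python. The class and function names are hypothetical, and the re-evaluate-on-mismatch flow is an assumption about how a stamp would be used, not a description of the planned implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AuthStamp:
    auth_generation_id: int
    membership_version: int


@dataclass
class MonitoredItem:
    node_id: str
    stamp: AuthStamp  # authorization state the item was last evaluated against


def publish_status(item: MonitoredItem, current: AuthStamp, still_allowed) -> str:
    """Decide the item's status for this publish cycle."""
    if item.stamp == current:
        return "Good"  # nothing changed since the last evaluation
    if still_allowed(item.node_id):
        item.stamp = current  # re-stamp so later cycles take the cheap path
        return "Good"
    return "BadUserAccessDenied"
```

Comparing two integers per item per cycle is cheap, so the full ACL evaluation only runs when a generation or group membership has actually changed.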

Config fallback — Phase 6.1 Stream D wiring (task #136 — CLOSED 2026-04-19, PR #96)

Closed. SealedBootstrap consumes ResilientConfigReader + GenerationSealedCache + StaleConfigFlag end-to-end; /healthz surfaces the stale flag.
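The fallback flow can be illustrated with a short Python sketch. The two reader callables stand in for the central config read and the sealed cache read, and the `configStale` key is a hypothetical payload field; none of these names come from the shipped code.

```python
def load_generation(read_central, read_sealed_cache):
    """Return (generation, stale). Fall back to the sealed cache if the central read fails."""
    try:
        return read_central(), False
    except ConnectionError:
        # The central config DB is unreachable: serve the last sealed generation
        # and mark the result stale so health reporting can surface it.
        return read_sealed_cache(), True


def healthz_payload(stale: bool) -> dict:
    # Hypothetical shape; the point is only that the stale flag is visible.
    return {"configStale": stale}
```

Returning the flag alongside the data keeps the degraded state explicit at the call site instead of hiding it in a global.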

Remaining follow-ups (hardening):

  • A HostedService that polls sp_GetCurrentGenerationForCluster periodically so peer-published generations land in this node's cache without a restart.
  • Richer snapshot payload via sp_GetGenerationContent so fallback can serve full generation content (DriverInstance enumeration, ACL rows, etc.) from the sealed cache alone.

Redundancy — Phase 6.3 Streams A/C core (tasks #145 + #147 — CLOSED 2026-04-19, PRs #98-99)

Closed. RedundancyCoordinator + RedundancyStatePublisher + PeerReachabilityTracker orchestrate topology + apply lease + recovery state + peer reachability through ServiceLevelCalculator + emit OnStateChanged / OnServerUriArrayChanged edge-triggered events.

Remaining Phase 6.3 surfaces (hardening, not release-blocking):

  • PeerHttpProbeLoop + PeerUaProbeLoop HostedServices populating PeerReachabilityTracker on each tick. Without these the publisher sees PeerReachability.Unknown → Isolated-Primary band (230). Safe default but not the full non-transparent-redundancy UX.
  • OPC UA variable-node wiring: bind ServiceLevel Byte + ServerUriArray String[] to the publisher's events via BaseDataVariable.OnReadValue / direct value push.
  • sp_PublishGeneration wraps its apply in await using var lease = coordinator.BeginApplyLease(...) so the PrimaryMidApply band (200) fires during actual publishes (task #148 part 2).
  • Client interop matrix — Ignition / Kepware / Aveva OI Gateway (Stream F, task #150). Manual + doc-only.
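The two ServiceLevel bands named in this list (230 for Isolated-Primary, 200 for PrimaryMidApply) and the apply-lease wrap can be sketched together. This Python version models `await using` with a context manager. The `Coordinator` class, the precedence of mid-apply over peer state, and the `None` return for bands this doc does not list are assumptions for illustration.

```python
from contextlib import contextmanager

ISOLATED_PRIMARY = 230   # band quoted above for PeerReachability.Unknown
PRIMARY_MID_APPLY = 200  # band quoted above for an in-flight apply


class Coordinator:
    def __init__(self):
        self._active_leases = 0

    @contextmanager
    def begin_apply_lease(self):
        """Mark an apply as in progress; the lease is released even if the apply throws."""
        self._active_leases += 1
        try:
            yield
        finally:
            self._active_leases -= 1

    @property
    def apply_in_progress(self) -> bool:
        return self._active_leases > 0


def service_level(coordinator: Coordinator, peer_reachability: str):
    if coordinator.apply_in_progress:  # assumed to take precedence over peer state
        return PRIMARY_MID_APPLY
    if peer_reachability == "Unknown":
        return ISOLATED_PRIMARY
    return None  # other bands are not covered by this document
```

Tying the band to a scoped lease means a failed publish cannot leave the node stuck advertising the mid-apply level.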

Phase 5 driver complement (task #120 — CLOSED 2026-04-24)

Closed. All four deferred drivers shipped:

  • AB CIP (PRs #202-222) — Driver.AbCip, Driver.AbCip.IntegrationTests (7 tests), AB CIP Cli. Live-boot verified against a ControlLogix rig.
  • AB Legacy (PRs #202, #223) — Driver.AbLegacy, Driver.AbLegacy.IntegrationTests (2 tests), AB Legacy Cli. PCCC cip-path workaround for SLC/MicroLogix.
  • TwinCAT ADS (PRs #205, this branch task-galaxy-e2e) — Driver.TwinCAT, Driver.TwinCAT.IntegrationTests (2 tests), TwinCAT Cli. TCBSD/ESXi fixture for e2e since local Hyper-V / TwinCAT RTIME are mutually exclusive on the dev box.
  • FOCAS (PRs #173, #199 + this session's migration) — Driver.FOCAS with an in-process managed FocasWireClient that speaks FOCAS/2 over TCP directly. Tier-C isolation retired — Driver.FOCAS.Host + Driver.FOCAS.Shared + FwlibNative P/Invoke + shim DLL + NSSM service all deleted. Driver.FOCAS.IntegrationTests covers 9 scenarios (fixed tree identity/axes/program/timers/spindle + user-authored PARAM/MACRO/PMC reads, Browse, Subscribe, IAlarmSource raise/clear, Probe transitions).

Decision recorded: FOCAS is read-only against the CNC by design — writes return BadNotWritable. See docs/drivers/FOCAS.md + docs/drivers/FOCAS-Test-Fixture.md for the deployment + coverage map.
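The read-only contract amounts to a client whose write path never touches the device. The Python sketch below uses hypothetical address strings and a hypothetical class name, and the status returned for an unknown address is an assumption; only the BadNotWritable result for writes comes from the decision above.

```python
class ReadOnlyFocasClient:
    """Illustrative stand-in: reads are served, writes are always refused."""

    def __init__(self, values: dict):
        self._values = dict(values)

    def read(self, address: str):
        if address in self._values:
            return ("Good", self._values[address])
        return ("BadNodeIdUnknown", None)  # assumed status for an unknown address

    def write(self, address: str, value) -> str:
        # By design no write request is ever sent to the CNC.
        return "BadNotWritable"
```

Refusing in the adapter rather than relying on CNC-side protection means a misconfigured OPC UA client cannot alter machine parameters through this driver.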

Nice-to-haves (not release-blocking)

  • Admin UI — Phase 6.1 Stream E.2/E.3 (/hosts column refresh), Phase 6.2 Stream D (RoleGrantsTab + AclsTab Probe), Phase 6.3 Stream E (RedundancyTab), Phase 6.4 Streams A/B UI pieces, Stream C DiffViewer, Stream D IdentificationFields.razor. Tasks #134, #144, #149, #153, #155, #156, #157.
  • Background services — Phase 6.1 Stream B.4 ScheduledRecycleScheduler HostedService (task #137), Phase 6.1 Stream A analyzer (task #135 — Roslyn analyzer asserting every capability surface routes through CapabilityInvoker).
  • Multi-host dispatch — Phase 6.1 Stream A follow-up (task #135). Every driver currently gets a single pipeline keyed on driver.DriverInstanceId; multi-host drivers (Modbus with N PLCs) need per-PLC host resolution so failing PLCs trip per-PLC breakers without poisoning siblings. Decision #144 requires this, but it is not yet wired.
  • Phase 7 — scripting + alarming + historian sink (plan drafted 2026-04-20 in docs/v2/implementation/phase-7-*.md). Out of scope for v2 GA.
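The multi-host dispatch item comes down to which key the breaker is stored under. The Python sketch below uses a toy consecutive-failure breaker, not Polly, and the class names and threshold are illustrative assumptions.

```python
from collections import defaultdict


class Breaker:
    """Toy breaker: opens after `threshold` consecutive failures."""

    def __init__(self, threshold: int = 3):
        self.threshold = threshold
        self.failures = 0

    @property
    def is_open(self) -> bool:
        return self.failures >= self.threshold

    def record(self, ok: bool) -> None:
        self.failures = 0 if ok else self.failures + 1


class BreakerRegistry:
    def __init__(self, per_host: bool):
        self.per_host = per_host
        self._breakers = defaultdict(Breaker)

    def breaker_for(self, driver_instance_id: str, host: str) -> Breaker:
        # Current behaviour: one breaker per driver instance.
        # Proposed behaviour: one breaker per (driver instance, host) pair.
        key = (driver_instance_id, host) if self.per_host else driver_instance_id
        return self._breakers[key]
```

With the single key, three failures against one PLC open the breaker for every PLC behind the same driver instance; with the composite key the healthy siblings keep polling.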

Live-hardware validations (task #54 + task family)

The code ships; these tasks remain open as lab/field verification:

  • #54 — FOCAS live-CNC wire-level smoke against a real FANUC control. The mock's wire responder is PDU-verified against fwlibe64.dll upstream but OtOpcUa's managed client has not been pointed at a production CNC.
  • AB CIP live-boot — already passed on a ControlLogix rig (PR #222). Continue to run ahead of each release.
  • TwinCAT wire-live — TCBSD/ESXi fixture covers the common path; production PLC verification remains lab-gated.

Running the release-readiness check

pwsh ./scripts/compliance/phase-6-all.ps1

This meta-runner invokes each phase-6-N-compliance.ps1 script in sequence and reports an aggregate PASS/FAIL:

  • phase-6-1-compliance.ps1 — Resilience & Observability
  • phase-6-2-compliance.ps1 — Authorization runtime
  • phase-6-3-compliance.ps1 — Redundancy runtime
  • phase-6-4-compliance.ps1 — Admin UI completion

Exit 0 = every phase passes its compliance checks + no test-count regression.
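The aggregate rule can be written down as a small function. This Python sketch only illustrates the logic: the real runner is the PowerShell script above, and reading "no test-count regression" as "current count is at least the baseline" is an assumption.

```python
def aggregate_exit_code(script_exit_codes: dict, baseline_tests: int, current_tests: int) -> int:
    """Return 0 only if every phase script exited 0 and the test count did not drop."""
    all_pass = all(code == 0 for code in script_exit_codes.values())
    no_regression = current_tests >= baseline_tests
    return 0 if all_pass and no_regression else 1
```

For example, four passing scripts with a test count that fell below the baseline would still yield a non-zero exit under this reading.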

Release-readiness exit criteria

v2 GA requires all of the following:

  • All four Phase 6.N compliance scripts exit 0.
  • dotnet test ZB.MOM.WW.OtOpcUa.slnx passes with ≤ 1 known-flake failure.
  • Release blockers listed above all closed.
  • Phase 5 driver complement shipped (Galaxy, Modbus, S7, OpcUaClient, AbCip, AbLegacy, TwinCAT, FOCAS).
  • Production deployment checklist (separate doc) signed off by Fleet Admin.
  • At least one end-to-end integration run against the live Galaxy on the dev box succeeds.
  • FOCAS live-CNC wire-level smoke (#54) runs clean against a real FANUC control.
  • OPC UA conformance test (CTT or UA Compliance Test Tool) passes against the live endpoint.
  • Non-transparent redundancy cutover validated with at least one production client (Ignition 8.3 recommended — see decision #85).
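The "≤ 1 known-flake failure" criterion can be made mechanical. The Python sketch below assumes that the single tolerated failure must be the tracked flake named earlier in this doc; a stricter or looser reading is possible, and the function name is hypothetical.

```python
# The one flake this document tracks separately.
KNOWN_FLAKES = {"SubscribeCommandTests.Execute_PrintsSubscriptionMessage"}


def run_is_acceptable(failed_tests) -> bool:
    """Accept a run with no failures, or with exactly the known flake failing."""
    failed = set(failed_tests)
    return len(failed) <= 1 and failed <= KNOWN_FLAKES
```

Checking the failure's identity, not just the count, stops a new regression from hiding in the slot reserved for the flake.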

Change log

  • 2026-04-24 — Phase 5 driver complement closed (task #120 CLOSED). AB CIP, AB Legacy, TwinCAT, FOCAS all shipped. FOCAS migration: retired the Tier-C split (Driver.FOCAS.Host + Driver.FOCAS.Shared + FwlibNative + shim DLL deleted) in favour of a pure-managed in-process FocasWireClient inlined into Driver.FOCAS; driver is now read-only against the CNC by design. Integration test matrix grew to cover Browse / Subscribe / IAlarmSource / Probe end-to-end.
  • 2026-04-23 — Phase 6.4 audit close-out. IdentificationFolderBuilder + OPC 40010 Identification folder verified against the shipped code.
  • 2026-04-20 — Phase 7 plan drafted (phase-7-scripting-and-alarming.md, phase-7-e2e-smoke.md). Out of scope for v2 GA.
  • 2026-04-19 — Release blocker #3 closed (PRs #98-99). Phase 6.3 Streams A + C core shipped: ClusterTopologyLoader + RedundancyCoordinator + RedundancyStatePublisher + PeerReachabilityTracker. Code-path release blockers all closed; remaining Phase 6.3 surfaces (peer-probe HostedServices, OPC UA variable-node binding, sp_PublishGeneration lease wrap, client interop matrix) are hardening follow-ups.
  • 2026-04-19 — Release blocker #2 closed (PR #96). SealedBootstrap consumes ResilientConfigReader + GenerationSealedCache + StaleConfigFlag; /healthz surfaces the stale flag. Remaining follow-ups (periodic poller + richer snapshot payload) downgraded to hardening.
  • 2026-04-19 — Release blocker #1 closed (PR #94). AuthorizationGate wired into DriverNodeManager Read / Write / HistoryRead dispatch. Remaining Stream C surfaces (Browse / Subscribe / Alarm / Call + finer-grained scope resolution) downgraded to hardening follow-ups — no longer release-blocking.
  • 2026-04-19 — Phase 6.4 data layer merged (PRs #91-92). Phase 6 core complete.
  • 2026-04-19 — Phase 6.3 core merged (PRs #89-90). ServiceLevelCalculator + RecoveryStateManager + ApplyLeaseRegistry land as pure logic; coordinator / UA-node wiring / Admin UI / interop deferred.
  • 2026-04-19 — Phase 6.2 core merged (PRs #84-88). AuthorizationGate + TriePermissionEvaluator + LdapGroupRoleMapping land; dispatch wiring + Admin UI deferred.
  • 2026-04-19 — Phase 6.1 shipped (PRs #78-83). Polly resilience + Tier A/B/C stability + health endpoints + LiteDB generation-sealed cache + Admin /hosts data layer all live.