lmxopcua/docs/v2/implementation/focas-isolation-plan.md
Joseph Doherty 4b0664bd55 FOCAS — retire Tier-C split, inline managed wire client, make read-only
Migration closes the FOCAS Tier-C architecture. OtOpcUa previously had
`Driver.FOCAS.Host` (NSSM-wrapped Windows service loading Fwlib64.dll via
P/Invoke) + `Driver.FOCAS.Shared` (MessagePack IPC contracts) + a C shim
DLL stand-in for unit tests. All of it is deleted; the driver is now a
single in-process managed assembly talking the FOCAS/2 Ethernet binary
protocol directly on TCP:8193.

Architecture

- Pure-managed `FocasWireClient` inlined at `src/.../Driver.FOCAS/Wire/`
  (owner-imported — see Wire/FocasWireClient.cs for the full surface).
  Opens two TCP sockets, runs the initiate handshake, serialises requests
  on socket 2 through a semaphore, closes cleanly with PDU + socket
  teardown. Both sync `IDisposable` and async `IAsyncDisposable`.
- `WireFocasClient` (same folder) adapts the wire client to OtOpcUa's
  `IFocasClient` surface — fixed-tree reads, PARAM/MACRO/PMC addresses,
  alarms. Writes return `BadNotWritable` by design — OtOpcUa is read-only
  against FOCAS.
- `FocasDriverFactoryExtensions` now accepts `"Backend": "wire"` (default)
  and `"Backend": "unimplemented"`. Legacy `ipc` and `fwlib` backends are
  rejected at startup with a diagnostic pointing at the migration doc.
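
A minimal sketch of the backend check described in the last bullet, written in Python for brevity rather than in the driver's own .NET code. The function name, the error wording, and the choice of `docs/drivers/FOCAS.md` as the migration pointer are assumptions for illustration, not the actual `FocasDriverFactoryExtensions` implementation.

```python
# Hypothetical illustration only: names and messages are not taken from the driver.
SUPPORTED_BACKENDS = {"wire", "unimplemented"}
RETIRED_BACKENDS = {"ipc", "fwlib"}
MIGRATION_DOC = "docs/drivers/FOCAS.md"  # assumed location of the migration notes


def resolve_backend(config: dict) -> str:
    """Return the backend name to use, or raise with a migration hint."""
    # "wire" is the default when the "Backend" key is absent.
    name = config.get("Backend", "wire")
    if name in SUPPORTED_BACKENDS:
        return name
    if name in RETIRED_BACKENDS:
        # Fail at startup instead of silently falling back to another backend.
        raise ValueError(
            f"FOCAS backend '{name}' was retired; use 'wire'. See {MIGRATION_DOC}."
        )
    raise ValueError(f"Unknown FOCAS backend '{name}'.")
```

Rejecting retired names explicitly, rather than treating them as unknown, lets the diagnostic tell an operator why a previously valid config no longer loads.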

Deletions

- `src/ZB.MOM.WW.OtOpcUa.Driver.FOCAS.Host/` — whole project + Ipc/,
  Backend/, Stability/, Program.cs.
- `src/ZB.MOM.WW.OtOpcUa.Driver.FOCAS.Shared/` — Contracts/, FrameReader,
  FrameWriter, whole project.
- `tests/...Driver.FOCAS.Host.Tests/` + `.Shared.Tests/` — whole projects.
- `src/.../Driver.FOCAS/FwlibNative.cs` + `FwlibFocasClient.cs` — 21
  P/Invokes + 7 `Pack=1` marshalling structs + the Fwlib-backed
  `IFocasClient` implementation.
- `src/.../Driver.FOCAS/Ipc/` + `Supervisor/` — IPC client wrapper +
  Host-process supervisor (backoff, circuit breaker, heartbeat, post-
  mortem reader, process launcher).
- `scripts/install/Install-FocasHost.ps1` — NSSM service installer.
- `tests/.../Driver.FOCAS.Tests/{IpcFocasClientTests, IpcLoopback,
  FwlibNativeHelperTests, PostMortemReaderCompatibilityTests,
  SupervisorTests, FocasDriverFactoryExtensionsTests}.cs` — tests that
  exercised the retired surfaces.
- `tests/.../Driver.FOCAS.IntegrationTests/Shim/` — the zig-built C shim
  DLL that masqueraded as Fwlib64.dll.

Solution changes

- `ZB.MOM.WW.OtOpcUa.slnx` drops the 4 retired project refs.
- `src/.../Driver.FOCAS.csproj` drops the Shared ProjectReference, adds
  `Microsoft.Extensions.Logging.Abstractions` for the optional `ILogger`
  hook in `FocasWireClient`.
- `src/.../Driver.FOCAS.Cli.csproj` drops the six `<Content Include>`
  entries that copied `vendor/fanuc/*.dll` into the CLI bin. CLI now uses
  `WireFocasClient` directly.
- `FocasDriver` default factory flips to `Wire.WireFocasClientFactory`.

Integration tests

- New `tests/.../Driver.FOCAS.IntegrationTests/` project covering fixed-
  tree reads (identity, axes, dynamic, program, operation mode, timers,
  spindle load + max RPM, servo meters), user-authored PARAM / MACRO /
  PMC reads, `DiscoverAsync` emission, `SubscribeAsync` + `OnDataChange`,
  `IAlarmSource` raise/clear transitions, and `ProbeAsync` /
  `OnHostStatusChanged`. 9 e2e tests against the focas-mock fixture
  (Docker container with the vendored Python mock's native FOCAS/2
  Ethernet responder).
- `scripts/integration/run-focas.ps1` orchestrates compose up → tests →
  compose down. Dropped the shim-build stage + DLL-copy step + the split
  testhost workaround (the latter only existed because of native-DLL
  lifecycle bugs the shim tripped).
- Docker compose collapses from 11 per-series services to one `focas-sim`
  service. Tests seed per-series state via `mock_load_profile` at test
  start.
- Vendored focas-mock snapshot refreshed to pick up upstream's native
  FOCAS/2 Ethernet responder (was 660 lines, now 1018) — the
  pre-refresh snapshot only spoke the JSON admin protocol.

Tests

- 145/145 unit tests in `Driver.FOCAS.Tests` pass (was 208 pre-deletion;
  63 removed tests exercised the retired IPC/shim/supervisor/Fwlib
  surfaces).
- 9/9 integration tests pass against the refreshed mock.
- `FocasScaffoldingTests.Unimplemented_factory_throws_on_Create…` updated
  to assert the new diagnostic message pointing at
  `docs/drivers/FOCAS.md` rather than the now-gone `Fwlib64.dll`.

Docs

- `docs/drivers/FOCAS.md` rewritten for the managed wire topology —
  deployment collapses to one `"Backend": "wire"` config block, no
  separate service, no DLL deployment, no pipe ACL.
- `docs/drivers/FOCAS-Test-Fixture.md` updated — single TCP probe skip
  gate instead of TCP + shim probe; fewer moving parts.
- `docs/drivers/README.md` row for FOCAS reflects the Tier-A managed
  topology (previously listed Tier-C + `Fwlib64.dll` P/Invoke).
- `docs/Driver.FOCAS.Cli.md` drops the Tier-C architecture-note section.
- `docs/v2/implementation/focas-isolation-plan.md` marked historical —
  the plan it documents was executed then superseded by the wire client.
- `docs/v2/v2-release-readiness.md` re-audited 2026-04-24. Phase 5
  driver complement closed. FOCAS change-log entry added.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 14:10:59 -04:00


FOCAS Tier-C isolation — plan for task #220

Status: FULLY SHIPPED (code). PRs A–E shipped the architecture; the 2026-04-23 follow-up shipped the production Fwlib64FocasBackend wrapping the licensed Fwlib64.dll. Only the wire-level live-boot against real hardware remains (task #222 / requires a bench CNC).

Major update 2026-04-23: Host retargeted to .NET 10 x64 + Fwlib64. Both Fwlib32.dll and Fwlib64.dll are licensed for this project. The original plan put the Host on .NET 4.8 x86 because Fwlib32 was assumed. With Fwlib64 available, the Host moves to net10.0-windows x64, the same runtime as the rest of the fleet. Tier-C isolation stays anyway: the blast-radius argument against a closed-source vendor P/Invoke is independent of bitness. Galaxy's out-of-process host is forced purely by bitness (MXAccess COM forces x86), whereas the FOCAS host is purely a blast-radius choice. The body of this document still reflects the original x86 assumptions in a few places; read them as historical design context. The current shape is in docs/drivers/FOCAS-Test-Fixture.md and exit-gate-phase-3.md.

Pre-reqs shipped: version matrix + pre-flight validation (PR #168 — the cheap half of the hardware-free stability gap).

Why isolate

Fwlib32.dll is a proprietary Fanuc library with no source, no symbols, and a documented habit of crashing the hosting process on network errors, malformed responses, and during handle recycling. Today the FOCAS driver runs in-process with the OPC UA server — a crash inside the Fanuc DLL takes every driver down with it, including ones that have nothing to do with FOCAS. Galaxy has the same class of problem and solved it with the Tier-C pattern (host service + proxy driver + named-pipe IPC); FOCAS should follow that playbook.

Topology (target)

+-------------------------------------+         +--------------------------+
|  OtOpcUa.Server (.NET 10 x64)       |         | OtOpcUaFocasHost         |
|                                     |  pipe   | (.NET 4.8 x86 Windows    |
|  ZB.MOM.WW.OtOpcUa.Driver.FOCAS     | <-----> |  service)                |
|    - FocasProxyDriver (in-proc)     |         |                          |
|    - supervisor / respawn / BackPr  |         |  Fwlib32.dll + session   |
|                                     |         |  handles + STA thread    |
+-------------------------------------+         +--------------------------+

Why .NET 4.8 x86 for the host: Fwlib32.dll ships as 32-bit only. The Galaxy.Host is already .NET 4.8 x86 for the same reason (MXAccess COM bitness), so the NSSM wrapper pattern transfers directly.

Three new projects

| Project | TFM | Role |
| --- | --- | --- |
| ZB.MOM.WW.OtOpcUa.Driver.FOCAS.Shared | netstandard2.0 | MessagePack DTOs: FocasReadRequest, FocasReadResponse, FocasSubscribeRequest, FocasPmcBitWriteRequest, etc. Same assembly referenced by .NET 10 + .NET 4.8 so the wire format stays identical. |
| ZB.MOM.WW.OtOpcUa.Driver.FOCAS.Host | net48 x86 | Windows service. Owns the Fwlib32 session handles + STA thread + handle-recycling loop. Pipe server + per-call auth (same ACL + caller SID + shared secret pattern as Galaxy.Host). |
| ZB.MOM.WW.OtOpcUa.Driver.FOCAS (existing) | net10.0 | Collapses to a proxy that forwards each IReadable / IWritable / ISubscribable call over the pipe. FocasCapabilityMatrix + FocasAddress stay here; pre-flight runs before any IPC. |

Supervisor responsibilities (in the Proxy)

Mirrors Galaxy.Proxy 1:1:

  1. Start the Host process on first InitializeAsync (NSSM-wrapped service in production, direct spawn in dev) + heartbeat every 5s.
  2. If heartbeat misses 3× in a row, fan out BadCommunicationError to every subscription and respawn with exponential backoff (1s / 2s / 4s / max 30s).
  3. Crash-loop circuit breaker: 5 respawns in 60s → drop to BadDeviceFailure steady state until operator resets.
  4. Post-mortem MMF: on Host exit, Host writes its last-N operations + session state to an MMF the Proxy reads to log context.
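
The respawn policy in items 2 and 3 can be sketched as follows, in Python for brevity. This is an illustration using the figures from the list above (1s / 2s / 4s capped at 30s, and 5 respawns in 60s); the function and class names are hypothetical and do not correspond to the supervisor's actual types.

```python
from collections import deque


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 30.0) -> float:
    """Delay before respawn number `attempt` (0-based): 1s, 2s, 4s, ... capped at `cap`."""
    return min(cap, base * (2 ** attempt))


class CrashLoopBreaker:
    """Opens once `limit` respawns fall inside a sliding window of `window` seconds."""

    def __init__(self, limit: int = 5, window: float = 60.0) -> None:
        self.limit = limit
        self.window = window
        self._respawns = deque()  # timestamps of recent respawns, oldest first
        self.is_open = False

    def record_respawn(self, now: float) -> bool:
        """Record a respawn at time `now`; return True if respawning may continue."""
        if self.is_open:
            return False
        self._respawns.append(now)
        # Drop respawns that have aged out of the window.
        while self._respawns and now - self._respawns[0] >= self.window:
            self._respawns.popleft()
        if len(self._respawns) >= self.limit:
            self.is_open = True  # stays open until an operator calls reset()
        return not self.is_open

    def reset(self) -> None:
        """Operator reset: forget history and allow respawns again."""
        self._respawns.clear()
        self.is_open = False
```

Passing the clock value in as `now` keeps the breaker deterministic under test; a real supervisor would feed it a monotonic clock reading.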

IPC surface (approximate)

Every FocasDriver method that today calls into Fwlib32 directly becomes an ExecuteAsync call with a typed request:

| Today (in-process) | Tier-C (IPC) |
| --- | --- |
| FocasTagReader.Read(tag) | client.Execute(new FocasReadRequest(session, address)) |
| FocasTagWriter.Write(tag, value) | client.Execute(new FocasWriteRequest(...)) |
| FocasPmcBitRmw.Write(tag, bit, value) | client.Execute(new FocasPmcBitWriteRequest(...)). RMW happens in Host so the critical section stays on one process |
| FocasConnectivityProbe.ProbeAsync | client.Execute(new FocasProbeRequest()) |
| FocasSubscriber.Subscribe(tags) | client.Execute(new FocasSubscribeRequest(tags)). Host owns the poll loop + streams changes back as FocasDataChangedNotification over the pipe |
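
To illustrate why the PMC bit write is a read-modify-write that belongs in the Host, here is a minimal Python sketch. The class name and the `read_byte` / `write_byte` callables are hypothetical stand-ins for the Host's calls into the vendor library; the point is only that the read, the bit update, and the write happen under a single in-process lock.

```python
import threading


class HostPmcBitWriter:
    """Host-side read-modify-write of one bit in a PMC byte (illustrative only)."""

    def __init__(self, read_byte, write_byte) -> None:
        self._read_byte = read_byte    # hypothetical: address -> int (0..255)
        self._write_byte = write_byte  # hypothetical: (address, int) -> None
        self._lock = threading.Lock()  # one lock suffices because one process owns all writes

    def write_bit(self, address: int, bit: int, value: bool) -> int:
        """Set or clear `bit` at `address` and return the byte that was written."""
        if not 0 <= bit <= 7:
            raise ValueError("bit must be in 0..7")
        with self._lock:
            current = self._read_byte(address)
            mask = 1 << bit
            updated = (current | mask) if value else (current & ~mask & 0xFF)
            self._write_byte(address, updated)
            return updated
```

If the Proxy did the read and the write as two separate IPC calls, another writer could change the byte in between and one of the updates would be lost.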

Subscription streaming is the non-obvious piece: the Host polls on its own timer + pushes change notifications so the Proxy doesn't round-trip per poll. Matches Driver.Galaxy.Host subscription forwarding.
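
A rough Python sketch of the host-side poll step, assuming a hypothetical `read_tag` callable and a `notify` callback that would serialise a change notification onto the pipe. It shows only the change-detection idea: the Host reads every subscribed tag on its own timer and pushes a message only when a value differs from the last one sent.

```python
def poll_once(read_tag, tags, last_values, notify) -> int:
    """Poll every tag once; push a notification only for new or changed values.

    `last_values` is a dict the caller keeps between polls. Returns the number
    of notifications pushed.
    """
    pushed = 0
    for tag in tags:
        value = read_tag(tag)
        if tag not in last_values or last_values[tag] != value:
            last_values[tag] = value
            notify({"tag": tag, "value": value})
            pushed += 1
    return pushed
```

With this shape an unchanged tag costs the Proxy nothing per poll, which is the reason for keeping the loop on the Host side of the pipe.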

PR sequence — shipped

  1. PR A (#169), shared contracts: Driver.FOCAS.Shared netstandard2.0 with MessagePack DTOs for every IPC surface (Hello/Heartbeat/OpenSession/Read/Write/PmcBitWrite/Subscribe/Probe/RuntimeStatus/Recycle/ErrorResponse) + FrameReader/FrameWriter + 24 round-trip tests.
  2. PR B (#170), Host project skeleton: Driver.FOCAS.Host net48 x86 Windows Service entry point, PipeAcl + PipeServer + IFrameHandler + StubFrameHandler. ACL denies LocalSystem/Administrators; Hello verifies shared-secret + protocol major. 3 handshake tests.
  3. PR C (#171), IPC path end-to-end: Proxy Ipc/FocasIpcClient + Ipc/IpcFocasClient (implements IFocasClient via IPC). Host Backend/IFocasBackend + FakeFocasBackend + UnconfiguredFocasBackend + Ipc/FwlibFrameHandler replacing the stub. 13 new round-trip tests via in-memory loopback.
  4. PR D (#172), Supervisor + respawn: Supervisor/Backoff (5s→15s→60s) + CircuitBreaker (3-in-5min → 1h→4h→manual) + HeartbeatMonitor + IHostProcessLauncher + FocasHostSupervisor. 14 tests.
  5. PR E, Ops glue (this PR): ProcessHostLauncher (real Process.Start + FocasIpcClient connect), Host/Stability/PostMortemMmf (magic 'OFPC') + Proxy Supervisor/PostMortemReader, scripts/install/Install-FocasHost.ps1 + Uninstall-FocasHost.ps1 NSSM wrappers. 7 tests (4 MMF round-trip + 3 reader format compatibility).
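
The Hello check mentioned under PR B (shared secret plus protocol major) could look roughly like the Python sketch below. The field names, the version constant, and the reason strings are assumptions made for illustration; the shipped DTOs are MessagePack contracts whose exact shape is not reproduced here.

```python
import hmac

PROTOCOL_MAJOR = 1  # hypothetical version number, not taken from the contracts


def accept_hello(hello: dict, expected_secret: str):
    """Return (accepted, reason) for a Hello message from a connecting client."""
    if hello.get("protocol_major") != PROTOCOL_MAJOR:
        return False, "protocol-major-mismatch"
    offered = str(hello.get("shared_secret", ""))
    # Constant-time comparison so response timing does not leak how much matched.
    if not hmac.compare_digest(offered.encode(), expected_secret.encode()):
        return False, "bad-shared-secret"
    return True, "ok"
```

Checking the protocol major first means a version-skewed client gets a version error rather than a misleading authentication failure.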

Post-shipment totals: 189 FOCAS driver tests + 24 Shared tests + 13 Host tests = 226 FOCAS-family tests green.

What remains is hardware-dependent: wiring Fwlib32.dll P/Invoke into a real FwlibHostedBackend implementation of IFocasBackend + validating against a live CNC. The architecture is all the plumbing that work needs.

Testing without hardware

Same constraint as today: no CNC, no simulator. The isolation work itself is verifiable without Fwlib32 actually being called:

  • Pipe contract: PR A's MessagePack round-trip tests cover every DTO.
  • Supervisor: PR D uses a FakeFocasHost stub that can be told to crash, hang, or miss heartbeats. The supervisor's respawn + circuit-breaker behaviour is fully testable against the stub.
  • IPC ACL + auth: reuse the Galaxy.Host's existing test harness pattern — negative tests attempt to connect as the wrong user and assert rejection.
  • Fwlib32 integration itself: still untestable without hardware. When a real CNC becomes available, the smoke tests already scaffolded in tests/ZB.MOM.WW.OtOpcUa.Driver.FOCAS.IntegrationTests/ run against it via FOCAS_ENDPOINT.
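
As an illustration of the supervisor bullet, the Python sketch below drives a miss counter from a scripted fake host. Both classes are hypothetical stand-ins written for this example (they are not the project's HeartbeatMonitor or FakeFocasHost); the three-miss threshold comes from the supervisor responsibilities earlier in this plan.

```python
class HeartbeatMonitor:
    """Counts consecutive missed heartbeats and signals when a respawn is due."""

    def __init__(self, max_misses: int = 3) -> None:
        self.max_misses = max_misses
        self.consecutive_misses = 0

    def observe(self, heartbeat_ok: bool) -> bool:
        """Feed one heartbeat result; return True when a respawn should be triggered."""
        if heartbeat_ok:
            self.consecutive_misses = 0  # any success resets the streak
            return False
        self.consecutive_misses += 1
        return self.consecutive_misses >= self.max_misses


class FakeHost:
    """Scripted stand-in that replays a fixed sequence of heartbeat outcomes."""

    def __init__(self, script) -> None:
        self._script = iter(script)

    def heartbeat(self) -> bool:
        # Once the script is exhausted the fake behaves like a hung host.
        return next(self._script, False)
```

A test can then script "two misses, one recovery, three misses" and assert that only the final miss triggers a respawn, with no CNC and no vendor DLL involved.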

Decisions to confirm before starting

  • Sharing transport code with Galaxy.Host — should the pipe server + ACL + shared-secret + MMF plumbing go into a common Core.Hosting.Tier-C project both hosts reference? Probably yes; deferred until PR B is drafted because the right abstraction only becomes visible after two uses.
  • Handle-recycling cadence — Fwlib32 session handles leak memory over weeks per the Fanuc-published defect list. Galaxy recycles MXAccess handles on a 24h timer; FOCAS should mirror but the trigger point (idle vs scheduled) needs operator input.
  • Per-CNC Host process vs one Host serving N CNCs — one-per-CNC isolates blast radius but scales poorly past ~20 machines; shared Host scales but one bad CNC can wedge the lot. Start with shared Host + document the blast-radius trade; revisit if operators hit it.

Non-goals

  • Simulator work. open_focas + other OSS FOCAS simulators are untested + not maintained; not worth chasing vs. waiting for real hardware.
  • Changing the public FocasDriverOptions shape beyond what already shipped (the Series knob). Operator config continues to look the same after the split — the Tier-C topology is invisible from appsettings.json.
  • Historian / long-term history integration. FOCAS driver doesn't implement IHistoryProvider + there's no plan to add it.

References