lmxopcua/docs/v2/implementation/focas-isolation-plan.md
Joseph Doherty 969b0847a1 docs: update path references for module-folder reorganization
Rewrite src/ and tests/ project paths in docs, CLAUDE.md, README.md, and
test-fixture READMEs to the new module-folder layout (Core/Server/Drivers/
Client/Tooling). References to retired v1 projects (Galaxy.Host/Proxy/Shared,
the legacy monolithic test projects) are left untouched.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 02:10:29 -04:00

# FOCAS Tier-C isolation — plan for task #220
> **Status**: **FULLY SHIPPED** (code). PRs A-E shipped the architecture; the
> 2026-04-23 follow-up shipped the production `Fwlib64FocasBackend` wrapping
> the licensed `Fwlib64.dll`. Only the wire-level live-boot against real
> hardware remains (task #222 / requires a bench CNC).
>
> **Major update 2026-04-23 — Host retargeted to .NET 10 x64 + Fwlib64**:
> Both `Fwlib32.dll` and `Fwlib64.dll` are licensed for this project. The
> original plan put the Host on .NET 4.8 x86 because Fwlib32 was assumed.
> With Fwlib64 available, the Host moves to `net10.0-windows` x64 — same
> runtime as the rest of the fleet. **Tier-C isolation stays anyway** — the
> blast-radius argument against a closed-source vendor P/Invoke is independent
> of bitness. Galaxy (forced x86 by MXAccess COM) is a pure bitness forcing;
> FOCAS is a pure blast-radius choice. Body of this document still reflects
> the original x86 assumptions in a few places — read them as historical
> design context; the current shape is in `docs/drivers/FOCAS-Test-Fixture.md`
> and `exit-gate-phase-3.md`.
>
> **Pre-reqs shipped**: version matrix + pre-flight validation
> (PR #168 — the cheap half of the hardware-free stability gap).
## Why isolate
`Fwlib32.dll` is a proprietary Fanuc library with no source, no
symbols, and a documented habit of crashing the hosting process on
network errors, malformed responses, and during handle recycling.
Today the FOCAS driver runs in-process with the OPC UA server —
a crash inside the Fanuc DLL takes every driver down with it,
including ones that have nothing to do with FOCAS. Galaxy has the
same class of problem and solved it with the Tier-C pattern (host
service + proxy driver + named-pipe IPC); FOCAS should follow that
playbook.
## Topology (target)
```
+-------------------------------------+ +--------------------------+
| OtOpcUa.Server (.NET 10 x64) | | OtOpcUaFocasHost |
| | pipe | (.NET 4.8 x86 Windows |
| ZB.MOM.WW.OtOpcUa.Driver.FOCAS | <-----> | service) |
| - FocasProxyDriver (in-proc) | | |
| - supervisor / respawn / BackPr | | Fwlib32.dll + session |
| | | handles + STA thread |
+-------------------------------------+ +--------------------------+
```
Why .NET 4.8 x86 for the host: `Fwlib32.dll` ships as 32-bit only.
The Galaxy.Host is already .NET 4.8 x86 for the same reason
(MXAccess COM bitness), so the NSSM wrapper pattern transfers
directly.
## Three new projects
| Project | TFM | Role |
| --- | --- | --- |
| `ZB.MOM.WW.OtOpcUa.Driver.FOCAS.Shared` | `netstandard2.0` | MessagePack DTOs — `FocasReadRequest`, `FocasReadResponse`, `FocasSubscribeRequest`, `FocasPmcBitWriteRequest`, etc. Same assembly referenced by .NET 10 + .NET 4.8 so the wire format stays identical. |
| `ZB.MOM.WW.OtOpcUa.Driver.FOCAS.Host` | `net48` x86 | Windows service. Owns the Fwlib32 session handles + STA thread + handle-recycling loop. Pipe server + per-call auth (same ACL + caller SID + shared secret pattern as Galaxy.Host). |
| `ZB.MOM.WW.OtOpcUa.Driver.FOCAS` (existing) | `net10.0` | Collapses to a proxy that forwards each `IReadable` / `IWritable` / `ISubscribable` call over the pipe. `FocasCapabilityMatrix` + `FocasAddress` stay here — pre-flight runs before any IPC. |
## Supervisor responsibilities (in the Proxy)
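Mirrors Galaxy.Proxy 1:1. The timings below are the planned values; PR D
shipped different constants (backoff 5s→15s→60s, breaker 3-in-5min →
1h→4h→manual):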
1. Start the Host process on first `InitializeAsync` (NSSM-wrapped
service in production, direct spawn in dev) + heartbeat every
5s.
2. If heartbeat misses 3× in a row, fan out `BadCommunicationError`
to every subscription and respawn with exponential backoff
(1s / 2s / 4s / max 30s).
3. Crash-loop circuit breaker: 5 respawns in 60s → drop to
`BadDeviceFailure` steady state until operator resets.
4. Post-mortem MMF: on Host exit, Host writes its last-N operations
and session state to an MMF the Proxy reads to log context.
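
The respawn rules in items 2 and 3 can be sketched as two small helpers. This is a minimal illustration using the planned values from the list (1s / 2s / 4s doubling to a 30s cap, and five respawns in 60s), not the shipped `Supervisor/Backoff` or `CircuitBreaker` code, whose constants differ. `RespawnBackoff` and `CrashLoopBreaker` are hypothetical names, and the sliding-window bookkeeping is an assumption about how "5 respawns in 60s" would be counted.

```csharp
using System;
using System.Collections.Generic;

// Planned backoff for the first six respawn attempts: 1s, 2s, 4s, 8s, 16s, 30s.
for (var attempt = 0; attempt < 6; attempt++)
    Console.WriteLine(RespawnBackoff.DelayFor(attempt));

// Five respawns inside one 60s window trip the breaker.
var breaker = new CrashLoopBreaker(maxRespawns: 5, window: TimeSpan.FromSeconds(60));
var start = new DateTime(2026, 1, 1, 0, 0, 0, DateTimeKind.Utc);
for (var i = 0; i < 5; i++)
    breaker.RecordRespawn(start.AddSeconds(i * 10));
Console.WriteLine(breaker.IsOpen); // True

// Hypothetical helper: exponential backoff capped at 30 seconds.
static class RespawnBackoff
{
    public static TimeSpan DelayFor(int attempt)
    {
        if (attempt < 0) throw new ArgumentOutOfRangeException(nameof(attempt));
        // Clamp the shift so large attempt counts cannot overflow (2^5 = 32 already exceeds the cap).
        var seconds = 1L << Math.Min(attempt, 5);
        return TimeSpan.FromSeconds((double)Math.Min(seconds, 30L));
    }
}

// Hypothetical helper: opens once maxRespawns respawns fall inside one sliding window.
sealed class CrashLoopBreaker
{
    private readonly int _maxRespawns;
    private readonly TimeSpan _window;
    private readonly Queue<DateTime> _respawns = new Queue<DateTime>();

    public CrashLoopBreaker(int maxRespawns, TimeSpan window)
    {
        _maxRespawns = maxRespawns;
        _window = window;
    }

    // While open, the supervisor would stop respawning and report BadDeviceFailure.
    public bool IsOpen { get; private set; }

    public void RecordRespawn(DateTime now)
    {
        // Drop respawns that have aged out of the window.
        while (_respawns.Count > 0 && now - _respawns.Peek() > _window)
            _respawns.Dequeue();
        _respawns.Enqueue(now);
        if (_respawns.Count >= _maxRespawns) IsOpen = true;
    }

    // Operator reset, as described in item 3.
    public void Reset()
    {
        _respawns.Clear();
        IsOpen = false;
    }
}
```

Passing the timestamp in, rather than reading the clock inside the breaker, keeps the crash-loop behaviour testable without real delays.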
## IPC surface (approximate)
Every `FocasDriver` method that today calls into Fwlib32 directly
becomes an `ExecuteAsync` call with a typed request:
| Today (in-process) | Tier-C (IPC) |
| --- | --- |
| `FocasTagReader.Read(tag)` | `client.Execute(new FocasReadRequest(session, address))` |
| `FocasTagWriter.Write(tag, value)` | `client.Execute(new FocasWriteRequest(...))` |
| `FocasPmcBitRmw.Write(tag, bit, value)` | `client.Execute(new FocasPmcBitWriteRequest(...))` — RMW happens in Host so the critical section stays on one process |
| `FocasConnectivityProbe.ProbeAsync` | `client.Execute(new FocasProbeRequest())` |
| `FocasSubscriber.Subscribe(tags)` | `client.Execute(new FocasSubscribeRequest(tags))` — Host owns the poll loop + streams changes back as `FocasDataChangedNotification` over the pipe |
Subscription streaming is the non-obvious piece: the Host polls on
its own timer + pushes change notifications so the Proxy doesn't
round-trip per poll. Matches `Driver.Galaxy.Host` subscription
forwarding.
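
The Host-side poll loop described above could look roughly like the following. It is a sketch only: `ChangePoller` is a hypothetical name, the `read` delegate stands in for the real Fwlib call, and the `notify` delegate stands in for writing a `FocasDataChangedNotification` to the pipe. The tag addresses are made up for the demo.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Demo with a fake reader standing in for the Fwlib call (addresses are illustrative).
var values = new Dictionary<string, object?> { ["R100"] = 1, ["R101"] = 7 };
var sent = new List<string>();
var poller = new ChangePoller(
    new[] { "R100", "R101" },
    read: address => values[address],
    notify: (address, value) => sent.Add($"{address}={value}"));

poller.PollOnce();   // first poll reports both tags
values["R100"] = 2;
poller.PollOnce();   // only R100 changed
poller.PollOnce();   // nothing changed, nothing is sent
Console.WriteLine(string.Join(", ", sent)); // R100=1, R101=7, R100=2

// Hypothetical Host-side poller: reads every subscribed address and pushes only changes.
sealed class ChangePoller
{
    private readonly IReadOnlyList<string> _addresses;
    private readonly Func<string, object?> _read;
    private readonly Action<string, object?> _notify;
    private readonly Dictionary<string, object?> _last = new Dictionary<string, object?>();

    public ChangePoller(
        IReadOnlyList<string> addresses,
        Func<string, object?> read,
        Action<string, object?> notify)
    {
        _addresses = addresses;
        _read = read;
        _notify = notify;
    }

    public void PollOnce()
    {
        foreach (var address in _addresses)
        {
            var current = _read(address);
            // Skip the notification when the value matches the last one sent.
            if (_last.TryGetValue(address, out var previous) && Equals(previous, current))
                continue;
            _last[address] = current;
            _notify(address, current);
        }
    }

    // Drives PollOnce on the Host's own timer until cancelled.
    public async Task RunAsync(TimeSpan interval, CancellationToken ct)
    {
        using var timer = new PeriodicTimer(interval);
        try
        {
            while (await timer.WaitForNextTickAsync(ct))
                PollOnce();
        }
        catch (OperationCanceledException)
        {
            // Normal shutdown path.
        }
    }
}
```

Because change detection lives next to the read, the Proxy sees one pipe message per change instead of one round trip per poll.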
## PR sequence — shipped
1. **PR A (#169) — shared contracts**
`Driver.FOCAS.Shared` netstandard2.0 with MessagePack DTOs for every
IPC surface (Hello/Heartbeat/OpenSession/Read/Write/PmcBitWrite/
Subscribe/Probe/RuntimeStatus/Recycle/ErrorResponse) + FrameReader/
FrameWriter + 24 round-trip tests.
2. **PR B (#170) — Host project skeleton**
`Driver.FOCAS.Host` net48 x86 Windows Service entry point,
`PipeAcl` + `PipeServer` + `IFrameHandler` + `StubFrameHandler`.
ACL denies LocalSystem/Administrators; Hello verifies
shared-secret + protocol major. 3 handshake tests.
3. **PR C (#171) — IPC path end-to-end**
Proxy `Ipc/FocasIpcClient` + `Ipc/IpcFocasClient` (implements
IFocasClient via IPC). Host `Backend/IFocasBackend` +
`FakeFocasBackend` + `UnconfiguredFocasBackend` +
`Ipc/FwlibFrameHandler` replacing the stub. 13 new round-trip
tests via in-memory loopback.
4. **PR D (#172) — Supervisor + respawn**
`Supervisor/Backoff` (5s→15s→60s) + `CircuitBreaker` (3-in-5min →
1h→4h→manual) + `HeartbeatMonitor` + `IHostProcessLauncher` +
`FocasHostSupervisor`. 14 tests.
5. **PR E — Ops glue** ✅ (this PR)
`ProcessHostLauncher` (real Process.Start + FocasIpcClient
connect), `Host/Stability/PostMortemMmf` (magic 'OFPC') +
Proxy `Supervisor/PostMortemReader`, `scripts/install/
Install-FocasHost.ps1` + `Uninstall-FocasHost.ps1` NSSM wrappers.
7 tests (4 MMF round-trip + 3 reader format compatibility).
**Post-shipment totals: 189 FOCAS driver tests + 24 Shared tests + 13 Host tests = 226 FOCAS-family tests green.**
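When PR E landed, what remained was hardware-dependent: wiring `Fwlib32.dll`
P/Invoke into a real `FwlibHostedBackend` implementation of `IFocasBackend`
and validating against a live CNC. Per the status note at the top, the
2026-04-23 follow-up has since shipped the backend as `Fwlib64FocasBackend`,
so only live validation (task #222) is still open. The architecture is all
the plumbing that work needs.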
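
For readers unfamiliar with the framing layer PR A mentions, the sketch below shows one common way to frame messages on a byte stream. The actual `FrameReader`/`FrameWriter` wire layout is not described in this document, so the 4-byte little-endian length prefix, the 1-byte message kind, and the 1 MiB payload cap are all assumptions, and `Framing` is a hypothetical name. A `MemoryStream` stands in for the named pipe.

```csharp
using System;
using System.Buffers.Binary;
using System.IO;
using System.Text;

using var pipe = new MemoryStream(); // stands in for the named pipe
Framing.WriteFrame(pipe, 0x01, Encoding.UTF8.GetBytes("hello"));
pipe.Position = 0;
var (frameKind, body) = Framing.ReadFrame(pipe);
Console.WriteLine($"{frameKind}: {Encoding.UTF8.GetString(body)}"); // 1: hello

// Hypothetical framing helper; the layout is assumed, not taken from the shipped code.
static class Framing
{
    public const int MaxPayloadBytes = 1 << 20; // assumed cap to reject corrupt lengths

    public static void WriteFrame(Stream stream, byte kind, ReadOnlySpan<byte> payload)
    {
        if (payload.Length > MaxPayloadBytes)
            throw new ArgumentException("Payload too large.", nameof(payload));
        Span<byte> header = stackalloc byte[5];
        BinaryPrimitives.WriteInt32LittleEndian(header, payload.Length);
        header[4] = kind;
        stream.Write(header);
        stream.Write(payload);
    }

    public static (byte Kind, byte[] Payload) ReadFrame(Stream stream)
    {
        Span<byte> header = stackalloc byte[5];
        ReadExact(stream, header);
        var length = BinaryPrimitives.ReadInt32LittleEndian(header);
        if (length < 0 || length > MaxPayloadBytes)
            throw new InvalidDataException("Frame length out of range.");
        var payload = new byte[length];
        ReadExact(stream, payload);
        return (header[4], payload);
    }

    // Pipes may return short reads, so loop until the buffer is full.
    private static void ReadExact(Stream stream, Span<byte> buffer)
    {
        while (!buffer.IsEmpty)
        {
            var read = stream.Read(buffer);
            if (read == 0) throw new EndOfStreamException();
            buffer = buffer.Slice(read);
        }
    }
}
```

In a real pipeline the payload would be the MessagePack-serialized DTO and the kind byte would select the DTO type.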
## Testing without hardware
Same constraint as today: no CNC, no simulator. The isolation work
itself is verifiable without Fwlib32 actually being called:
- **Pipe contract**: PR A's MessagePack round-trip tests cover every
DTO.
- **Supervisor**: PR D uses a `FakeFocasHost` stub that can be told
to crash, hang, or miss heartbeats. The supervisor's respawn +
circuit-breaker behaviour is fully testable against the stub.
- **IPC ACL + auth**: reuse the Galaxy.Host's existing test harness
pattern — negative tests attempt to connect as the wrong user and
assert rejection.
- **Fwlib32 integration itself**: still untestable without hardware.
When a real CNC becomes available, the smoke tests already
scaffolded in `tests/Drivers/ZB.MOM.WW.OtOpcUa.Driver.FOCAS.IntegrationTests/`
run against it via `FOCAS_ENDPOINT`.
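
The supervisor bullet above relies on a stub that can be told to misbehave. A minimal sketch of that testing style follows. `ScriptedFakeHost` and `MissCounter` are hypothetical stand-ins rather than the shipped `FakeFocasHost` or `HeartbeatMonitor`, and the three-miss limit is taken from the plan's supervisor rules.

```csharp
using System;

var host = new ScriptedFakeHost();
var monitor = new MissCounter(missLimit: 3);

monitor.Observe(host.TryHeartbeat()); // healthy beat keeps the count at zero
host.Responding = false;              // tell the fake to go silent
var dead = false;
for (var beat = 0; beat < 3 && !dead; beat++)
    dead = monitor.Observe(host.TryHeartbeat());
Console.WriteLine(dead); // True after the third consecutive miss

// Hypothetical fake: the test flips Responding to simulate a hung Host.
sealed class ScriptedFakeHost
{
    public bool Responding { get; set; } = true;

    public bool TryHeartbeat() => Responding;
}

// Hypothetical monitor core: counts consecutive misses and resets on any answer.
sealed class MissCounter
{
    private readonly int _missLimit;
    private int _consecutiveMisses;

    public MissCounter(int missLimit)
    {
        _missLimit = missLimit;
    }

    // Returns true once the Host should be declared dead and respawned.
    public bool Observe(bool heartbeatAnswered)
    {
        _consecutiveMisses = heartbeatAnswered ? 0 : _consecutiveMisses + 1;
        return _consecutiveMisses >= _missLimit;
    }
}
```

Keeping the miss-counting logic free of timers and pipes is what lets tests like these run without a Host process or hardware.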
## Decisions to confirm before starting
- **Sharing transport code with Galaxy.Host** — should the pipe
server + ACL + shared-secret + MMF plumbing go into a common
`Core.Hosting.Tier-C` project both hosts reference? Probably yes;
deferred until PR B is drafted because the right abstraction only
becomes visible after two uses.
- **Handle-recycling cadence** — Fwlib32 session handles leak
memory over weeks per the Fanuc-published defect list. Galaxy
recycles MXAccess handles on a 24h timer; FOCAS should mirror but
the trigger point (idle vs scheduled) needs operator input.
- **Per-CNC Host process vs one Host serving N CNCs** — one-per-CNC
isolates blast radius but scales poorly past ~20 machines; shared
Host scales but one bad CNC can wedge the lot. Start with shared
Host + document the blast-radius trade; revisit if operators hit
it.
## Non-goals
- Simulator work. `open_focas` + other OSS FOCAS simulators are
untested + not maintained; not worth chasing vs. waiting for real
hardware.
- Changing the public `FocasDriverOptions` shape beyond what
already shipped (the `Series` knob). Operator config continues to
look the same after the split — the Tier-C topology is invisible
from `appsettings.json`.
- Historian / long-term history integration. FOCAS driver doesn't
implement `IHistoryProvider` + there's no plan to add it.
## References
- [`docs/v2/implementation/phase-2-galaxy-out-of-process.md`](phase-2-galaxy-out-of-process.md)
— the working Tier-C template this plan follows.
- [`docs/drivers/FOCAS-Test-Fixture.md`](../../drivers/FOCAS-Test-Fixture.md)
— what's covered today + what stays blocked on hardware.
- [`docs/v2/focas-version-matrix.md`](../focas-version-matrix.md) —
the capability matrix that pre-flights configs before IPC runs.