# FOCAS Tier-C isolation: plan for task #220

> **Status**: **FULLY SHIPPED** (code). PRs A–E shipped the architecture; the
> 2026-04-23 follow-up shipped the production `Fwlib64FocasBackend` wrapping
> the licensed `Fwlib64.dll`. Only the wire-level live-boot against real
> hardware remains (task #222; requires a bench CNC).
>
> **Major update 2026-04-23: Host retargeted to .NET 10 x64 + Fwlib64.**
> Both `Fwlib32.dll` and `Fwlib64.dll` are licensed for this project. The
> original plan put the Host on .NET 4.8 x86 because Fwlib32 was assumed.
> With Fwlib64 available, the Host moves to `net10.0-windows` x64, the same
> runtime as the rest of the fleet. **Tier-C isolation stays anyway**: the
> blast-radius argument against a closed-source vendor P/Invoke is independent
> of bitness. Galaxy (forced x86 by MXAccess COM) is a pure bitness forcing;
> FOCAS is a pure blast-radius choice. The body of this document still reflects
> the original x86 assumptions in a few places; read them as historical
> design context. The current shape is in `docs/drivers/FOCAS-Test-Fixture.md`
> and `exit-gate-phase-3.md`.
>
> **Pre-reqs shipped**: version matrix + pre-flight validation
> (PR #168, the cheap half of the hardware-free stability gap).

## Why isolate

`Fwlib32.dll` is a proprietary Fanuc library with no source, no
symbols, and a documented habit of crashing the hosting process on
network errors, on malformed responses, and during handle recycling.
Today the FOCAS driver runs in-process with the OPC UA server, so
a crash inside the Fanuc DLL takes every driver down with it,
including ones that have nothing to do with FOCAS. Galaxy has the
same class of problem and solved it with the Tier-C pattern (host
service + proxy driver + named-pipe IPC); FOCAS should follow that
playbook.

## Topology (target)

```
+-------------------------------------+         +--------------------------+
| OtOpcUa.Server (.NET 10 x64)        |         | OtOpcUaFocasHost         |
|                                     |  pipe   | (.NET 4.8 x86 Windows    |
| ZB.MOM.WW.OtOpcUa.Driver.FOCAS      | <-----> |  service)                |
|  - FocasProxyDriver (in-proc)       |         |                          |
|  - supervisor / respawn / BackPr    |         | Fwlib32.dll + session    |
|                                     |         |  handles + STA thread    |
+-------------------------------------+         +--------------------------+
```

Why .NET 4.8 x86 for the host: `Fwlib32.dll` ships as 32-bit only.
The Galaxy.Host is already .NET 4.8 x86 for the same reason
(MXAccess COM bitness), so the NSSM wrapper pattern transfers
directly.

## Three new projects

| Project | TFM | Role |
| --- | --- | --- |
| `ZB.MOM.WW.OtOpcUa.Driver.FOCAS.Shared` | `netstandard2.0` | MessagePack DTOs: `FocasReadRequest`, `FocasReadResponse`, `FocasSubscribeRequest`, `FocasPmcBitWriteRequest`, etc. The same assembly is referenced by .NET 10 and .NET 4.8 so the wire format stays identical. |
| `ZB.MOM.WW.OtOpcUa.Driver.FOCAS.Host` | `net48` x86 | Windows service. Owns the Fwlib32 session handles + STA thread + handle-recycling loop. Pipe server + per-call auth (same ACL + caller SID + shared secret pattern as Galaxy.Host). |
| `ZB.MOM.WW.OtOpcUa.Driver.FOCAS` (existing) | `net10.0` | Collapses to a proxy that forwards each `IReadable` / `IWritable` / `ISubscribable` call over the pipe. `FocasCapabilityMatrix` + `FocasAddress` stay here, so pre-flight runs before any IPC. |

## Supervisor responsibilities (in the Proxy)

Mirrors Galaxy.Proxy 1:1:

1. Start the Host process on first `InitializeAsync` (NSSM-wrapped
   service in production, direct spawn in dev) and heartbeat every
   5s.
2. If the heartbeat misses 3× in a row, fan out `BadCommunicationError`
   to every subscription and respawn with exponential backoff
   (1s / 2s / 4s / max 30s).
3. Crash-loop circuit breaker: 5 respawns in 60s → drop to
   `BadDeviceFailure` steady state until an operator resets.
4. Post-mortem MMF: on Host exit, the Host writes its last-N operations
   and session state to an MMF that the Proxy reads to log context.

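The respawn backoff and crash-loop rules in the list above can be sketched as follows. This is a minimal, language-neutral illustration in Python, not the shipped C# code: `backoff_delay` and `CrashLoopBreaker` are hypothetical names, the numbers (1s doubling to a 30s cap, 5 respawns in 60s) are taken from the list, and reading "5 respawns in 60s" as a sliding window is an assumption. The values recorded for PR D in the PR sequence differ from these plan numbers.

```python
from collections import deque


def backoff_delay(attempt, base=1.0, cap=30.0):
    """Seconds to wait before respawn number `attempt` (0-based): 1, 2, 4, ... capped at `cap`."""
    return min(cap, base * (2 ** attempt))


class CrashLoopBreaker:
    """Trips once `limit` respawns fall inside a sliding window of `window` seconds."""

    def __init__(self, limit=5, window=60.0):
        self.limit = limit
        self.window = window
        self._stamps = deque()  # timestamps of recent respawns, oldest first
        self.tripped = False

    def record_respawn(self, now):
        """Record a respawn at time `now` (seconds); return True once the breaker is tripped."""
        self._stamps.append(now)
        # Forget respawns that are older than the window.
        while self._stamps and now - self._stamps[0] > self.window:
            self._stamps.popleft()
        if len(self._stamps) >= self.limit:
            self.tripped = True  # stays tripped until reset() is called
        return self.tripped

    def reset(self):
        """Operator reset: clear history and allow respawns again."""
        self._stamps.clear()
        self.tripped = False
```

Passing the clock in as `now` instead of reading it inside the class keeps the breaker deterministic under test, which suits the no-hardware testing constraint described later in this plan.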
## IPC surface (approximate)

Every `FocasDriver` method that today calls into Fwlib32 directly
becomes an `ExecuteAsync` call with a typed request:

| Today (in-process) | Tier-C (IPC) |
| --- | --- |
| `FocasTagReader.Read(tag)` | `client.Execute(new FocasReadRequest(session, address))` |
| `FocasTagWriter.Write(tag, value)` | `client.Execute(new FocasWriteRequest(...))` |
| `FocasPmcBitRmw.Write(tag, bit, value)` | `client.Execute(new FocasPmcBitWriteRequest(...))`; the RMW happens in the Host so the critical section stays in one process |
| `FocasConnectivityProbe.ProbeAsync` | `client.Execute(new FocasProbeRequest())` |
| `FocasSubscriber.Subscribe(tags)` | `client.Execute(new FocasSubscribeRequest(tags))`; the Host owns the poll loop and streams changes back as `FocasDataChangedNotification` over the pipe |

Subscription streaming is the non-obvious piece: the Host polls on
its own timer and pushes change notifications so the Proxy doesn't
round-trip per poll. This matches `Driver.Galaxy.Host` subscription
forwarding.

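The Host-side change detection behind subscription streaming might look like the sketch below. It is a Python illustration under stated assumptions, not the real poll loop: `poll_once`, the `read` callback, and the `last_seen` cache are hypothetical names, and the comparison is plain value equality. Each returned pair stands in for one `FocasDataChangedNotification` pushed over the pipe.

```python
def poll_once(tags, read, last_seen):
    """Poll every subscribed tag once and return only the values that changed.

    tags      -- iterable of tag names in subscription order
    read      -- callable taking a tag name and returning its current value
    last_seen -- dict cache of the last value pushed per tag (mutated in place)
    """
    changes = []
    for tag in tags:
        value = read(tag)
        # Push on first sight of a tag, or when the value differs from the cache.
        if tag not in last_seen or last_seen[tag] != value:
            last_seen[tag] = value
            changes.append((tag, value))
    return changes
```

Under this scheme an idle CNC generates no pipe traffic after the first poll, which is the point of keeping the loop on the Host side.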
## PR sequence (shipped)

1. **PR A (#169): shared contracts** ✅
   `Driver.FOCAS.Shared` netstandard2.0 with MessagePack DTOs for every
   IPC surface (Hello/Heartbeat/OpenSession/Read/Write/PmcBitWrite/Subscribe/Probe/RuntimeStatus/Recycle/ErrorResponse)
   + FrameReader/FrameWriter + 24 round-trip tests.
2. **PR B (#170): Host project skeleton** ✅
   `Driver.FOCAS.Host` net48 x86 Windows Service entry point,
   `PipeAcl` + `PipeServer` + `IFrameHandler` + `StubFrameHandler`.
   ACL denies LocalSystem/Administrators; Hello verifies
   shared-secret + protocol major. 3 handshake tests.
3. **PR C (#171): IPC path end-to-end** ✅
   Proxy `Ipc/FocasIpcClient` + `Ipc/IpcFocasClient` (implements
   IFocasClient via IPC). Host `Backend/IFocasBackend` +
   `FakeFocasBackend` + `UnconfiguredFocasBackend` +
   `Ipc/FwlibFrameHandler` replacing the stub. 13 new round-trip
   tests via in-memory loopback.
4. **PR D (#172): Supervisor + respawn** ✅
   `Supervisor/Backoff` (5s→15s→60s) + `CircuitBreaker` (3-in-5min →
   1h→4h→manual) + `HeartbeatMonitor` + `IHostProcessLauncher` +
   `FocasHostSupervisor`. 14 tests.
5. **PR E: Ops glue** ✅ (this PR)
   `ProcessHostLauncher` (real Process.Start + FocasIpcClient
   connect), `Host/Stability/PostMortemMmf` (magic 'OFPC') +
   Proxy `Supervisor/PostMortemReader`,
   `scripts/install/Install-FocasHost.ps1` + `Uninstall-FocasHost.ps1` NSSM wrappers.
   7 tests (4 MMF round-trip + 3 reader format compatibility).

**Post-shipment totals: 189 FOCAS driver tests + 24 Shared tests + 13 Host tests = 226 FOCAS-family tests green.**

What remains is hardware-dependent: wiring `Fwlib32.dll` P/Invoke
into a real `FwlibHostedBackend` implementation of `IFocasBackend`
and validating against a live CNC. The architecture is all the
plumbing that work needs.

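The Hello check listed under PR B (shared secret plus protocol major) can be illustrated with a short sketch. Everything here is hypothetical: `verify_hello`, the result strings, and `PROTOCOL_MAJOR = 1` are placeholders, and the constant-time comparison is a common hardening choice rather than something this plan states about the shipped handler.

```python
import hmac
from typing import Tuple

PROTOCOL_MAJOR = 1  # placeholder value for illustration only


def verify_hello(secret: bytes, expected_secret: bytes,
                 major: int, expected_major: int = PROTOCOL_MAJOR) -> Tuple[bool, str]:
    """Accept a Hello only if the shared secret and protocol major both match."""
    # compare_digest avoids leaking how many leading bytes of the secret matched.
    if not hmac.compare_digest(secret, expected_secret):
        return (False, "bad-secret")
    if major != expected_major:
        return (False, "protocol-major-mismatch")
    return (True, "ok")
```

Checking the secret before the version means an unauthenticated caller learns nothing about which protocol major the Host speaks.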
## Testing without hardware

Same constraint as today: no CNC, no simulator. The isolation work
itself is verifiable without Fwlib32 actually being called:

- **Pipe contract**: PR A's MessagePack round-trip tests cover every
  DTO.
- **Supervisor**: PR D uses a `FakeFocasHost` stub that can be told
  to crash, hang, or miss heartbeats. The supervisor's respawn +
  circuit-breaker behaviour is fully testable against the stub.
- **IPC ACL + auth**: reuse Galaxy.Host's existing test harness
  pattern, where negative tests attempt to connect as the wrong user
  and assert rejection.
- **Fwlib32 integration itself**: still untestable without hardware.
  When a real CNC becomes available, the smoke tests already
  scaffolded in `tests/Drivers/ZB.MOM.WW.OtOpcUa.Driver.FOCAS.IntegrationTests/`
  run against it via `FOCAS_ENDPOINT`.

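The stub-driven supervisor testing described in the list above could take roughly this shape. This is a Python sketch with hypothetical stand-ins (`FakeHost`, `HeartbeatMonitor.tick`), not the project's `FakeFocasHost` or its C# monitor; the three-consecutive-misses threshold comes from the supervisor responsibilities earlier in this plan.

```python
class FakeHost:
    """Test stub: answers heartbeats until the test flips `responsive` off."""

    def __init__(self):
        self.responsive = True

    def ping(self):
        return self.responsive


class HeartbeatMonitor:
    """Counts consecutive missed heartbeats and reports when the host looks dead."""

    def __init__(self, host, max_misses=3):
        self.host = host
        self.max_misses = max_misses
        self.misses = 0

    def tick(self):
        """Run one heartbeat; return True once `max_misses` in a row have failed."""
        if self.host.ping():
            self.misses = 0  # any successful heartbeat clears the streak
        else:
            self.misses += 1
        return self.misses >= self.max_misses
```

A test drives `tick()` directly instead of waiting on a real 5s timer, so a hang or a missed-heartbeat scenario runs in microseconds.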
## Decisions to confirm before starting

- **Sharing transport code with Galaxy.Host**: should the pipe
  server + ACL + shared-secret + MMF plumbing go into a common
  `Core.Hosting.Tier-C` project both hosts reference? Probably yes;
  deferred until PR B is drafted because the right abstraction only
  becomes visible after two uses.
- **Handle-recycling cadence**: Fwlib32 session handles leak
  memory over weeks per the Fanuc-published defect list. Galaxy
  recycles MXAccess handles on a 24h timer; FOCAS should mirror that,
  but the trigger point (idle vs scheduled) needs operator input.
- **Per-CNC Host process vs one Host serving N CNCs**: one-per-CNC
  isolates blast radius but scales poorly past ~20 machines; a shared
  Host scales but one bad CNC can wedge the lot. Start with a shared
  Host and document the blast-radius trade; revisit if operators hit
  it.

## Non-goals

- Simulator work. `open_focas` + other OSS FOCAS simulators are
  untested + not maintained; not worth chasing vs. waiting for real
  hardware.
- Changing the public `FocasDriverOptions` shape beyond what
  already shipped (the `Series` knob). Operator config continues to
  look the same after the split: the Tier-C topology is invisible
  from `appsettings.json`.
- Historian / long-term history integration. The FOCAS driver doesn't
  implement `IHistoryProvider` and there's no plan to add it.

## References

- [`docs/v2/implementation/phase-2-galaxy-out-of-process.md`](phase-2-galaxy-out-of-process.md):
  the working Tier-C template this plan follows.
- [`docs/drivers/FOCAS-Test-Fixture.md`](../../drivers/FOCAS-Test-Fixture.md):
  what's covered today + what stays blocked on hardware.
- [`docs/v2/focas-version-matrix.md`](../focas-version-matrix.md):
  the capability matrix that pre-flights configs before IPC runs.