# FOCAS Tier-C isolation: plan for task #220

> **Status**: DRAFT, not yet started. Tracks the multi-PR work to
> move `Fwlib32.dll` behind an out-of-process host, mirroring the
> Galaxy Tier-C split in [`phase-2-galaxy-out-of-process.md`](phase-2-galaxy-out-of-process.md).
>
> **Pre-reqs shipped** (this PR): version matrix + pre-flight
> validation + unit tests. Those close the cheap half of the
> hardware-free stability gap. Tier-C closes the expensive half.

## Why isolate

`Fwlib32.dll` is a proprietary Fanuc library with no source, no
symbols, and a documented habit of crashing the hosting process on
network errors, on malformed responses, and during handle recycling.
Today the FOCAS driver runs in-process with the OPC UA server, so
a crash inside the Fanuc DLL takes every driver down with it,
including ones that have nothing to do with FOCAS. Galaxy has the
same class of problem and solved it with the Tier-C pattern (host
service + proxy driver + named-pipe IPC); FOCAS should follow that
playbook.

## Topology (target)

```
+-------------------------------------+         +--------------------------+
| OtOpcUa.Server (.NET 10 x64)        |         | OtOpcUaFocasHost         |
|                                     |  pipe   | (.NET 4.8 x86 Windows    |
| ZB.MOM.WW.OtOpcUa.Driver.FOCAS      | <-----> |  service)                |
|   - FocasProxyDriver (in-proc)      |         |                          |
|   - supervisor / respawn / BackPr   |         | Fwlib32.dll + session    |
|                                     |         | handles + STA thread     |
+-------------------------------------+         +--------------------------+
```

Why .NET 4.8 x86 for the host: `Fwlib32.dll` ships as 32-bit only.
The Galaxy.Host is already .NET 4.8 x86 for the same reason
(MXAccess COM bitness), so the NSSM wrapper pattern transfers
directly.

## Three new projects

| Project | TFM | Role |
| --- | --- | --- |
| `ZB.MOM.WW.OtOpcUa.Driver.FOCAS.Shared` | `netstandard2.0` | MessagePack DTOs: `FocasReadRequest`, `FocasReadResponse`, `FocasSubscribeRequest`, `FocasPmcBitWriteRequest`, etc. Same assembly referenced by .NET 10 + .NET 4.8 so the wire format stays identical. |
| `ZB.MOM.WW.OtOpcUa.Driver.FOCAS.Host` | `net48` x86 | Windows service. Owns the Fwlib32 session handles + STA thread + handle-recycling loop. Pipe server + per-call auth (same ACL + caller SID + shared secret pattern as Galaxy.Host). |
| `ZB.MOM.WW.OtOpcUa.Driver.FOCAS` (existing) | `net10.0` | Collapses to a proxy that forwards each `IReadable` / `IWritable` / `ISubscribable` call over the pipe. `FocasCapabilityMatrix` + `FocasAddress` stay here, so pre-flight runs before any IPC. |

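The "pre-flight runs before any IPC" point in the last row can be sketched roughly as follows. This is a language-neutral illustration in Python, not the C# driver code: the capability table contents, `ReadRequest`, `PreflightError`, `RecordingPipe`, and `forward_read` are all hypothetical names invented for the example, and the real `FocasCapabilityMatrix` is not reproduced here.

```python
from dataclasses import dataclass

# Hypothetical capability table: series name -> address kinds it supports.
# These entries are made up; the real FocasCapabilityMatrix is not shown here.
CAPABILITIES = {
    "series-a": {"pmc", "macro"},
    "series-b": {"pmc"},
}


@dataclass(frozen=True)
class ReadRequest:
    series: str
    kind: str
    address: str


class PreflightError(ValueError):
    """Raised in the proxy, before anything is written to the pipe."""


class RecordingPipe:
    """Stand-in for the named-pipe client; records what would be sent."""

    def __init__(self):
        self.sent = []

    def send(self, request):
        self.sent.append(request)
        return "ok"


def forward_read(pipe, request):
    """Validate against the capability table, then forward over the pipe."""
    supported = CAPABILITIES.get(request.series)
    if supported is None:
        raise PreflightError(f"unknown series {request.series!r}")
    if request.kind not in supported:
        raise PreflightError(
            f"{request.kind!r} is not supported on {request.series!r}"
        )
    # Only requests that pass pre-flight ever reach the Host.
    return pipe.send(request)
```

A rejected request never touches the pipe, so a bad config surfaces in the proxy without costing a round-trip or exercising Fwlib32.
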
## Supervisor responsibilities (in the Proxy)

Mirrors Galaxy.Proxy 1:1:

1. Start the Host process on first `InitializeAsync` (NSSM-wrapped
   service in production, direct spawn in dev) + heartbeat every
   5s.
2. If heartbeat misses 3× in a row, fan out `BadCommunicationError`
   to every subscription and respawn with exponential backoff
   (1s / 2s / 4s / max 30s).
3. Crash-loop circuit breaker: 5 respawns in 60s → drop to
   `BadDeviceFailure` steady state until operator resets.
4. Post-mortem MMF (memory-mapped file): on Host exit, Host writes
   its last-N operations and session state to an MMF the Proxy reads
   to log context.

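The respawn backoff and crash-loop breaker from items 2 and 3 can be sketched as below. This is a Python illustration under stated assumptions, not the Galaxy.Proxy implementation: `backoff_delays` and `CrashLoopBreaker` are hypothetical names, the delay is assumed to keep doubling past 4s until it hits the 30s cap, and "5 respawns in 60s" is read as "the fifth respawn inside a sliding 60-second window trips the breaker".

```python
from collections import deque


def backoff_delays(base=1.0, cap=30.0):
    """Yield respawn delays: 1s, 2s, 4s, ... doubling until capped."""
    delay = base
    while True:
        yield min(delay, cap)
        delay = min(delay * 2, cap)


class CrashLoopBreaker:
    """Opens after `limit` respawns inside a sliding `window` of seconds."""

    def __init__(self, limit=5, window=60.0):
        self.limit = limit
        self.window = window
        self.respawns = deque()
        self.open = False

    def record_respawn(self, now):
        """Record a respawn at time `now` (seconds); return True if open."""
        self.respawns.append(now)
        # Drop respawns that have aged out of the window.
        while self.respawns and now - self.respawns[0] > self.window:
            self.respawns.popleft()
        if len(self.respawns) >= self.limit:
            self.open = True
        return self.open

    def reset(self):
        """Operator reset: clear history and close the breaker."""
        self.respawns.clear()
        self.open = False
```

Passing the clock in as `now` rather than reading it inside the class keeps the breaker testable without sleeping, which suits the hardware-free test constraint.
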
## IPC surface (approximate)

Every `FocasDriver` method that today calls into Fwlib32 directly
becomes an `ExecuteAsync` call with a typed request:

| Today (in-process) | Tier-C (IPC) |
| --- | --- |
| `FocasTagReader.Read(tag)` | `client.Execute(new FocasReadRequest(session, address))` |
| `FocasTagWriter.Write(tag, value)` | `client.Execute(new FocasWriteRequest(...))` |
| `FocasPmcBitRmw.Write(tag, bit, value)` | `client.Execute(new FocasPmcBitWriteRequest(...))`; the RMW happens in Host so the critical section stays in one process |
| `FocasConnectivityProbe.ProbeAsync` | `client.Execute(new FocasProbeRequest())` |
| `FocasSubscriber.Subscribe(tags)` | `client.Execute(new FocasSubscribeRequest(tags))`; Host owns the poll loop + streams changes back as `FocasDataChangedNotification` over the pipe |

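Why the PMC bit write is a single request rather than a read followed by a write can be illustrated with a small sketch. It is written in Python for brevity and is not the Fwlib32 API: `FakePmc` and `PmcBitWriter` are hypothetical stand-ins, and the lock models the critical section that the table says should live entirely inside the Host.

```python
import threading


class FakePmc:
    """In-memory stand-in for PMC byte storage; not the Fwlib32 API."""

    def __init__(self):
        self.bytes = {}

    def read_byte(self, address):
        return self.bytes.get(address, 0)

    def write_byte(self, address, value):
        self.bytes[address] = value & 0xFF


class PmcBitWriter:
    """Host-side read-modify-write of one bit, guarded by one lock."""

    def __init__(self, pmc):
        self._pmc = pmc
        self._lock = threading.Lock()

    def write_bit(self, address, bit, value):
        if not 0 <= bit <= 7:
            raise ValueError("bit must be in 0..7")
        mask = 1 << bit
        # Read, modify and write happen under the same lock, so two
        # concurrent bit writes to one byte cannot overwrite each other.
        with self._lock:
            current = self._pmc.read_byte(address)
            updated = (current | mask) if value else (current & ~mask)
            self._pmc.write_byte(address, updated)
            return updated
```

If the proxy instead issued a read request and then a write request, the lock would have to span two pipe round-trips across two processes, which is the situation the single `FocasPmcBitWriteRequest` avoids.
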
Subscription streaming is the non-obvious piece: the Host polls on
its own timer + pushes change notifications so the Proxy doesn't
round-trip per poll. This matches `Driver.Galaxy.Host` subscription
forwarding.

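A minimal sketch of the Host-side poll loop, assuming it keeps the last value per tag and only pushes when that value changes. It is Python for illustration only; `ChangePoller`, `read_tag`, and `notify` are hypothetical names, and in the real Host the notification would be a `FocasDataChangedNotification` written to the pipe.

```python
_UNSET = object()  # sentinel: tag subscribed but never read yet


class ChangePoller:
    """Polls subscribed tags and pushes a notification only on change."""

    def __init__(self, read_tag, notify):
        self._read_tag = read_tag  # callable: tag -> current value
        self._notify = notify      # callable: (tag, value) -> None
        self._last = {}

    def subscribe(self, tags):
        for tag in tags:
            self._last.setdefault(tag, _UNSET)

    def poll_once(self):
        """Run one poll cycle; return how many notifications were pushed."""
        pushed = 0
        for tag, previous in list(self._last.items()):
            value = self._read_tag(tag)
            if value != previous:
                self._last[tag] = value
                self._notify(tag, value)
                pushed += 1
        return pushed
```

The Host would call `poll_once` from its own timer. Unchanged tags produce no pipe traffic at all, which is the saving over having the Proxy issue a read request per tag per poll.
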
## PR sequence (proposed)

1. **PR A: shared contracts**
   Create `Driver.FOCAS.Shared` with the MessagePack DTOs. No
   behaviour change. ~200 LOC + round-trip tests for each DTO.
2. **PR B: Host project skeleton**
   Create `Driver.FOCAS.Host` .NET 4.8 x86 project, NSSM wrapper,
   pipe server scaffold with the same ACL + caller-SID + shared
   secret plumbing as Galaxy.Host. No Fwlib32 wiring yet; returns
   `NotImplemented` for everything. ~400 LOC.
3. **PR C: Move Fwlib32 calls into Host**
   Move `FocasNativeSession`, `FocasTagReader`, `FocasTagWriter`,
   `FocasPmcBitRmw` + the STA thread into the Host. Proxy forwards
   over IPC. This is the biggest PR, probably 800-1500 LOC of
   move-with-translation. Existing unit tests keep passing because
   `IFocasTagFactory` is the DI seam the tests inject against.
4. **PR D: Supervisor + respawn**
   Proxy-side heartbeat + respawn + crash-loop circuit breaker +
   BackPressure fan-out on Host death. ~500 LOC + chaos tests.
5. **PR E: Post-mortem MMF + operational glue**
   MMF writer in Host, reader in Proxy. Install scripts for the
   new `OtOpcUaFocasHost` Windows service. Docs. ~300 LOC.

Total estimate: 2200-3200 LOC across 5 PRs (the per-PR figures
above sum to roughly 2200-2900 LOC). Consistent with Galaxy
Tier-C but narrower since FOCAS has no Historian + no alarm
history.

## Testing without hardware

Same constraint as today: no CNC, no simulator. The isolation work
itself is verifiable without Fwlib32 actually being called:

- **Pipe contract**: PR A's MessagePack round-trip tests cover every
  DTO.
- **Supervisor**: PR D uses a `FakeFocasHost` stub that can be told
  to crash, hang, or miss heartbeats. The supervisor's respawn +
  circuit-breaker behaviour is fully testable against the stub.
- **IPC ACL + auth**: reuse the Galaxy.Host's existing test harness
  pattern: negative tests attempt to connect as the wrong user and
  assert rejection.
- **Fwlib32 integration itself**: still untestable without hardware.
  When a real CNC becomes available, the smoke tests already
  scaffolded in `tests/ZB.MOM.WW.OtOpcUa.Driver.FOCAS.IntegrationTests/`
  run against it via `FOCAS_ENDPOINT`.

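The supervisor bullet can be made concrete with a sketch of a scriptable stub and the miss counter it exercises. This is a Python illustration, not the planned C# test code: the class name `FakeFocasHost` comes from the plan, but its `mode` switch, the `HeartbeatMonitor` class, and the choice to count a closed pipe as one missed heartbeat are assumptions made for the example.

```python
class FakeFocasHost:
    """Stub host whose heartbeat behaviour is scripted by the test."""

    def __init__(self):
        self.mode = "healthy"  # "healthy", "hang" or "crash"

    def heartbeat(self):
        if self.mode == "crash":
            raise ConnectionError("pipe closed")
        # A hung host is modelled as a heartbeat that gets no reply.
        return self.mode == "healthy"


class HeartbeatMonitor:
    """Declares the host dead after `max_misses` consecutive misses."""

    def __init__(self, host, max_misses=3):
        self._host = host
        self._max_misses = max_misses
        self.misses = 0

    def tick(self):
        """Run one heartbeat; return True when the host counts as dead."""
        try:
            ok = self._host.heartbeat()
        except ConnectionError:
            ok = False  # assumption: a closed pipe counts as one miss
        self.misses = 0 if ok else self.misses + 1
        return self.misses >= self._max_misses
```

A test flips `mode` between ticks and asserts on the return value, so the 3-in-a-row rule and its reset on a good heartbeat are checked without a real process or pipe.
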
## Decisions to confirm before starting

- **Sharing transport code with Galaxy.Host**: should the pipe
  server + ACL + shared-secret + MMF plumbing go into a common
  `Core.Hosting.Tier-C` project both hosts reference? Probably yes;
  deferred until PR B is drafted because the right abstraction only
  becomes visible after two uses.
- **Handle-recycling cadence**: Fwlib32 session handles leak
  memory over weeks per the Fanuc-published defect list. Galaxy
  recycles MXAccess handles on a 24h timer; FOCAS should mirror but
  the trigger point (idle vs scheduled) needs operator input.
- **Per-CNC Host process vs one Host serving N CNCs**: one-per-CNC
  isolates blast radius but scales poorly past ~20 machines; shared
  Host scales but one bad CNC can wedge the lot. Start with shared
  Host + document the blast-radius trade; revisit if operators hit
  it.

## Non-goals

- Simulator work. `open_focas` + other OSS FOCAS simulators are
  untested + not maintained; not worth chasing vs. waiting for real
  hardware.
- Changing the public `FocasDriverOptions` shape beyond what
  already shipped (the `Series` knob). Operator config continues to
  look the same after the split: the Tier-C topology is invisible
  from `appsettings.json`.
- Historian / long-term history integration. FOCAS driver doesn't
  implement `IHistoryProvider` + there's no plan to add it.

## References

- [`docs/v2/implementation/phase-2-galaxy-out-of-process.md`](phase-2-galaxy-out-of-process.md):
  the working Tier-C template this plan follows.
- [`docs/drivers/FOCAS-Test-Fixture.md`](../../drivers/FOCAS-Test-Fixture.md):
  what's covered today + what stays blocked on hardware.
- [`docs/v2/focas-version-matrix.md`](../focas-version-matrix.md):
  the capability matrix that pre-flights configs before IPC runs.