FOCAS version-matrix stabilization (PR 1 of #220 split): ship the cheap half of the hardware-free stability gap ahead of the Tier-C out-of-process split.

Without any CNC or simulator on the bench, the highest-leverage move is to catch operator config errors at init time instead of at steady-state per-read.

Adds FocasCncSeries enum (Unknown/16i/0i-D/0i-F family/30i family/PowerMotion-i) + FocasCapabilityMatrix static class that encodes the per-series documented ranges for macro variables (cnc_rdmacro/wrmacro), parameters (cnc_rdparam/wrparam), and PMC letters + byte ceilings (pmc_rdpmcrng/wrpmcrng) straight from the Fanuc FOCAS Developer Kit. FocasDeviceOptions gains a Series knob (defaults Unknown = permissive so pre-matrix configs don't break on upgrade). FocasDriver.InitializeAsync now calls FocasAddress.TryParse on every tag + runs FocasCapabilityMatrix.Validate against the owning device's declared series, throwing InvalidOperationException with a reason string that names both the series and the documented limit ("Parameter #30000 is outside the documented range [0, 29999] for Thirty_i") so an operator can tell whether the mismatch is in the config or in their declared CNC model. Unknown series skips validation entirely.

Ships 46 new theory cases in FocasCapabilityMatrixTests.cs, covering every boundary in the matrix (widen 16i->0i-F: macro ceiling 999->9999, param 9999->14999; widen 0i-F->30i: PMC letters +K+T; PMC-number 16i=999/0i-D=1999/0i-F=9999/30i=59999), permissive Unknown-series behavior, rejection-message content, and case-insensitive PMC-letter matching. Widening a range without updating docs/v2/focas-version-matrix.md fails a test because every InlineData cites the row it reflects. Full FOCAS test suite is now at 165/165 passing (119 existing + 46 new).

Also authors docs/v2/focas-version-matrix.md as the authoritative range reference with per-function citations, CNC-series era context, error-surface shape, and the link back to the matrix code; docs/v2/implementation/focas-isolation-plan.md as the multi-PR plan for #220 Tier-C isolation (Shared contracts -> Host skeleton -> move Fwlib32 calls -> Supervisor+respawn -> MMF+ops glue, 2200-3200 LOC across 5 PRs mirroring the Galaxy Tier-C topology); and promotes docs/drivers/FOCAS-Test-Fixture.md from "version-matrix coverage = no" to explicit coverage via the new test file + cross-links to the matrix and isolation-plan docs. Leaves task #220 open since isolation itself (the expensive half) is still ahead.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
163
docs/v2/implementation/focas-isolation-plan.md
Normal file
@@ -0,0 +1,163 @@
# FOCAS Tier-C isolation — plan for task #220

> **Status**: DRAFT, not yet started. Tracks the multi-PR work to
> move `Fwlib32.dll` behind an out-of-process host, mirroring the
> Galaxy Tier-C split in [`phase-2-galaxy-out-of-process.md`](phase-2-galaxy-out-of-process.md).
>
> **Pre-reqs shipped** (this PR): version matrix + pre-flight
> validation + unit tests. Those close the cheap half of the
> hardware-free stability gap. Tier-C closes the expensive half.

## Why isolate

`Fwlib32.dll` is a proprietary Fanuc library with no source, no
symbols, and a documented habit of crashing the hosting process on
network errors, on malformed responses, and during handle recycling.
Today the FOCAS driver runs in-process with the OPC UA server, so
a crash inside the Fanuc DLL takes every driver down with it,
including ones that have nothing to do with FOCAS. Galaxy has the
same class of problem and solved it with the Tier-C pattern (host
service + proxy driver + named-pipe IPC); FOCAS should follow that
playbook.

## Topology (target)

```
+-------------------------------------+         +--------------------------+
| OtOpcUa.Server (.NET 10 x64)        |         | OtOpcUaFocasHost         |
|                                     |  pipe   | (.NET 4.8 x86 Windows    |
| ZB.MOM.WW.OtOpcUa.Driver.FOCAS      | <-----> |  service)                |
|  - FocasProxyDriver (in-proc)       |         |                          |
|  - supervisor / respawn / BackPr    |         | Fwlib32.dll + session    |
|                                     |         | handles + STA thread     |
+-------------------------------------+         +--------------------------+
```

Why .NET 4.8 x86 for the host: `Fwlib32.dll` ships as 32-bit only.
The Galaxy.Host is already .NET 4.8 x86 for the same reason
(MXAccess COM bitness), so the NSSM wrapper pattern transfers
directly.

## Three new projects

| Project | TFM | Role |
| --- | --- | --- |
| `ZB.MOM.WW.OtOpcUa.Driver.FOCAS.Shared` | `netstandard2.0` | MessagePack DTOs: `FocasReadRequest`, `FocasReadResponse`, `FocasSubscribeRequest`, `FocasPmcBitWriteRequest`, etc. Same assembly referenced by .NET 10 + .NET 4.8 so the wire format stays identical. |
| `ZB.MOM.WW.OtOpcUa.Driver.FOCAS.Host` | `net48` x86 | Windows service. Owns the Fwlib32 session handles + STA thread + handle-recycling loop. Pipe server + per-call auth (same ACL + caller SID + shared secret pattern as Galaxy.Host). |
| `ZB.MOM.WW.OtOpcUa.Driver.FOCAS` (existing) | `net10.0` | Collapses to a proxy that forwards each `IReadable` / `IWritable` / `ISubscribable` call over the pipe. `FocasCapabilityMatrix` + `FocasAddress` stay here, so pre-flight runs before any IPC. |

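To make "pre-flight runs before any IPC" concrete, here is a minimal, language-neutral sketch in Python (the real driver is C#). It assumes only the parameter ceilings quoted in this PR's commit message (16i = 9999, 0i-F = 14999, 30i = 29999). The series keys `Sixteen_i` and `Zero_i_F`, the `Tag` type, and the `validate` / `initialize` helpers are hypothetical names for illustration, not the shipped `FocasCapabilityMatrix` API.

```python
from dataclasses import dataclass
from typing import List, Optional

# Parameter ceilings quoted in the commit message. Other series and other
# address kinds are left out rather than guessed. Only "Thirty_i" is a name
# that appears in the source; the other two keys are hypothetical.
PARAM_MAX = {"Sixteen_i": 9999, "Zero_i_F": 14999, "Thirty_i": 29999}


@dataclass(frozen=True)
class Tag:
    kind: str    # only "Parameter" is modelled in this sketch
    number: int


def validate(tag: Tag, series: str) -> Optional[str]:
    """Return a rejection reason, or None when the tag is acceptable."""
    if series == "Unknown":
        return None  # permissive: pre-matrix configs keep working
    ceiling = PARAM_MAX[series]
    if tag.kind == "Parameter" and not 0 <= tag.number <= ceiling:
        return (f"{tag.kind} #{tag.number} is outside the documented range "
                f"[0, {ceiling}] for {series}")
    return None


def initialize(tags: List[Tag], series: str) -> None:
    """Fail fast on the first bad tag, before any pipe traffic would start."""
    for tag in tags:
        reason = validate(tag, series)
        if reason is not None:
            raise ValueError(reason)
```

Because the check is pure and local to the proxy, a misconfigured tag is rejected without the Host process ever being started.
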
## Supervisor responsibilities (in the Proxy)

Mirrors Galaxy.Proxy 1:1:

1. Start the Host process on first `InitializeAsync` (NSSM-wrapped
   service in production, direct spawn in dev) + heartbeat every
   5s.
2. If the heartbeat misses 3× in a row, fan out `BadCommunicationError`
   to every subscription and respawn with exponential backoff
   (1s / 2s / 4s / max 30s).
3. Crash-loop circuit breaker: 5 respawns in 60s → drop to
   `BadDeviceFailure` steady state until an operator resets it.
4. Post-mortem MMF: on Host exit, Host writes its last-N operations
   + session state to an MMF the Proxy reads to log context.

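The respawn and circuit-breaker rules above can be sketched as follows. This is an illustration in Python, not the Galaxy.Proxy implementation: `backoff_seconds` and `CrashLoopBreaker` are hypothetical names, and treating "5 respawns in 60s" as a sliding window is an assumption the plan does not spell out.

```python
from collections import deque


def backoff_seconds(attempt, base=1.0, cap=30.0):
    """Delay before respawn number `attempt` (0-based): 1s, 2s, 4s, ... capped at 30s."""
    return min(cap, base * (2 ** attempt))


class CrashLoopBreaker:
    """Opens once `limit` respawns fall inside a sliding `window` of seconds."""

    def __init__(self, limit=5, window=60.0):
        self.limit = limit
        self.window = window
        self.respawns = deque()  # timestamps of recent respawns
        self.tripped = False

    def record_respawn(self, now):
        """Record a respawn at time `now`; return True once the breaker is open."""
        self.respawns.append(now)
        # Drop respawns that have aged out of the window.
        while self.respawns and now - self.respawns[0] >= self.window:
            self.respawns.popleft()
        if len(self.respawns) >= self.limit:
            self.tripped = True  # stays open until reset() is called
        return self.tripped

    def reset(self):
        """Operator reset: forget history and close the breaker."""
        self.respawns.clear()
        self.tripped = False
```

Passing the clock in as `now` rather than reading it inside the class keeps the breaker deterministic, which suits the stub-driven supervisor tests described later in this plan.
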
## IPC surface (approximate)

Every `FocasDriver` method that today calls into Fwlib32 directly
becomes an `ExecuteAsync` call with a typed request:

| Today (in-process) | Tier-C (IPC) |
| --- | --- |
| `FocasTagReader.Read(tag)` | `client.Execute(new FocasReadRequest(session, address))` |
| `FocasTagWriter.Write(tag, value)` | `client.Execute(new FocasWriteRequest(...))` |
| `FocasPmcBitRmw.Write(tag, bit, value)` | `client.Execute(new FocasPmcBitWriteRequest(...))`. RMW happens in Host so the critical section stays in one process. |
| `FocasConnectivityProbe.ProbeAsync` | `client.Execute(new FocasProbeRequest())` |
| `FocasSubscriber.Subscribe(tags)` | `client.Execute(new FocasSubscribeRequest(tags))`. Host owns the poll loop + streams changes back as `FocasDataChangedNotification` over the pipe. |

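The `FocasPmcBitRmw` row is worth a sketch, because a bit write is really read byte, flip bit, write byte. Doing all three steps under one lock inside the Host means two proxied bit writes to the same byte cannot interleave. The Python below is an illustration only: `FakePmc` stands in for the Fwlib32 PMC calls, and `HostPmcBitWriter` with its single global lock is a hypothetical simplification.

```python
import threading


class FakePmc:
    """Stand-in for PMC byte storage; real code would call Fwlib32 instead."""

    def __init__(self):
        self.bytes = {}

    def read_byte(self, address):
        return self.bytes.get(address, 0)

    def write_byte(self, address, value):
        self.bytes[address] = value & 0xFF


class HostPmcBitWriter:
    """Host-side read-modify-write of one bit inside a PMC byte."""

    def __init__(self, pmc):
        self._pmc = pmc
        self._lock = threading.Lock()  # one lock for all bytes: sketch only

    def write_bit(self, address, bit, value):
        if not 0 <= bit <= 7:
            raise ValueError("bit must be in 0..7")
        with self._lock:  # critical section: read, modify, write
            current = self._pmc.read_byte(address)
            mask = 1 << bit
            updated = (current | mask) if value else (current & ~mask)
            self._pmc.write_byte(address, updated)
            return updated & 0xFF
```

If the read and the write were separate IPC calls issued by the Proxy, another writer could change a neighbouring bit between them and have its update overwritten.
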
Subscription streaming is the non-obvious piece: the Host polls on
its own timer + pushes change notifications so the Proxy doesn't
round-trip per poll. Matches `Driver.Galaxy.Host` subscription
forwarding.

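A minimal sketch of the Host-side change detection, assuming the Host keeps the last value seen per tag and forwards only the differences. `poll_once` and its `read` callback are hypothetical; in the real Host each entry in the returned mapping would become a `FocasDataChangedNotification` on the pipe.

```python
def poll_once(read, tags, last_seen):
    """Read every tag once and return {tag: value} for values that changed.

    `read` is any callable mapping a tag to its current value; `last_seen`
    is mutated in place so the next poll compares against this one.
    """
    changes = {}
    for tag in tags:
        value = read(tag)
        if tag not in last_seen or last_seen[tag] != value:
            last_seen[tag] = value
            changes[tag] = value
    return changes
```

With this shape, a poll in which nothing changed produces an empty mapping and therefore no pipe traffic at all.
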
## PR sequence (proposed)

1. **PR A: shared contracts**
   Create `Driver.FOCAS.Shared` with the MessagePack DTOs. No
   behaviour change. ~200 LOC + round-trip tests for each DTO.
2. **PR B: Host project skeleton**
   Create `Driver.FOCAS.Host` .NET 4.8 x86 project, NSSM wrapper,
   pipe server scaffold with the same ACL + caller-SID + shared
   secret plumbing as Galaxy.Host. No Fwlib32 wiring yet; returns
   `NotImplemented` for everything. ~400 LOC.
3. **PR C: Move Fwlib32 calls into Host**
   Move `FocasNativeSession`, `FocasTagReader`, `FocasTagWriter`,
   `FocasPmcBitRmw` + the STA thread into the Host. Proxy forwards
   over IPC. This is the biggest PR, probably 800-1500 LOC of
   move-with-translation. Existing unit tests keep passing because
   `IFocasTagFactory` is the DI seam the tests inject against.
4. **PR D: Supervisor + respawn**
   Proxy-side heartbeat + respawn + crash-loop circuit breaker +
   BackPressure fan-out on Host death. ~500 LOC + chaos tests.
5. **PR E: Post-mortem MMF + operational glue**
   MMF writer in Host, reader in Proxy. Install scripts for the
   new `OtOpcUaFocasHost` Windows service. Docs. ~300 LOC.

Total estimate: 2200-3200 LOC across 5 PRs. Consistent with Galaxy
Tier-C but narrower since FOCAS has no Historian + no alarm
history.

## Testing without hardware

Same constraint as today: no CNC, no simulator. The isolation work
itself is verifiable without Fwlib32 actually being called:

- **Pipe contract**: PR A's MessagePack round-trip tests cover every
  DTO.
- **Supervisor**: PR D uses a `FakeFocasHost` stub that can be told
  to crash, hang, or miss heartbeats. The supervisor's respawn +
  circuit-breaker behaviour is fully testable against the stub.
- **IPC ACL + auth**: reuse Galaxy.Host's existing test harness
  pattern, where negative tests attempt to connect as the wrong user
  and assert rejection.
- **Fwlib32 integration itself**: still untestable without hardware.
  When a real CNC becomes available, the smoke tests already
  scaffolded in `tests/ZB.MOM.WW.OtOpcUa.Driver.FOCAS.IntegrationTests/`
  run against it via `FOCAS_ENDPOINT`.

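As a rough picture of the supervisor test approach, the sketch below scripts a fake host's heartbeat replies and checks the "3 misses in a row" rule against it. It is Python for brevity and everything in it is hypothetical: the plan names `FakeFocasHost` but does not define its API, and `HeartbeatMonitor` is an illustrative name.

```python
class FakeFocasHost:
    """Scripted stand-in for the Host: each heartbeat consumes one scripted reply."""

    def __init__(self, script):
        # True = heartbeat answered, False = heartbeat missed.
        self._script = iter(script)

    def heartbeat(self):
        # Once the script is exhausted the fake behaves like a dead process.
        return next(self._script, False)


class HeartbeatMonitor:
    """Counts consecutive missed heartbeats against a threshold."""

    def __init__(self, host, max_misses=3):
        self.host = host
        self.max_misses = max_misses
        self.misses = 0

    def tick(self):
        """Run one heartbeat; return True when the host should be declared dead."""
        if self.host.heartbeat():
            self.misses = 0  # any answer resets the consecutive-miss count
        else:
            self.misses += 1
        return self.misses >= self.max_misses
```

A script such as `[True, False, False, True, False, False, False]` exercises both the reset on a successful heartbeat and the trip on the third consecutive miss, with no timers or processes involved.
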
## Decisions to confirm before starting

- **Sharing transport code with Galaxy.Host**: should the pipe
  server + ACL + shared-secret + MMF plumbing go into a common
  `Core.Hosting.Tier-C` project both hosts reference? Probably yes;
  deferred until PR B is drafted because the right abstraction only
  becomes visible after two uses.
- **Handle-recycling cadence**: Fwlib32 session handles leak
  memory over weeks per the Fanuc-published defect list. Galaxy
  recycles MXAccess handles on a 24h timer; FOCAS should mirror but
  the trigger point (idle vs scheduled) needs operator input.
- **Per-CNC Host process vs one Host serving N CNCs**: one-per-CNC
  isolates blast radius but scales poorly past ~20 machines; shared
  Host scales but one bad CNC can wedge the lot. Start with shared
  Host + document the blast-radius trade; revisit if operators hit
  it.

## Non-goals

- Simulator work. `open_focas` + other OSS FOCAS simulators are
  untested + not maintained; not worth chasing vs. waiting for real
  hardware.
- Changing the public `FocasDriverOptions` shape beyond what
  already shipped (the `Series` knob). Operator config continues to
  look the same after the split; the Tier-C topology is invisible
  from `appsettings.json`.
- Historian / long-term history integration. FOCAS driver doesn't
  implement `IHistoryProvider` + there's no plan to add it.

## References

- [`docs/v2/implementation/phase-2-galaxy-out-of-process.md`](phase-2-galaxy-out-of-process.md):
  the working Tier-C template this plan follows.
- [`docs/drivers/FOCAS-Test-Fixture.md`](../../drivers/FOCAS-Test-Fixture.md):
  what's covered today + what stays blocked on hardware.
- [`docs/v2/focas-version-matrix.md`](../focas-version-matrix.md):
  the capability matrix that pre-flights configs before IPC runs.