Production `IHostProcessLauncher` (`ProcessHostLauncher.cs`): `Process.Start` spawns `OtOpcUa.Driver.FOCAS.Host.exe` with `OTOPCUA_FOCAS_PIPE` / `OTOPCUA_ALLOWED_SID` / `OTOPCUA_FOCAS_SECRET` / `OTOPCUA_FOCAS_BACKEND` in the environment (supervisor-owned, never written to disk). It then polls `FocasIpcClient.ConnectAsync` at a 250ms cadence until the pipe is up, the Host exits, or the `ConnectTimeout` deadline passes, and wraps the connected client in an `IpcFocasClient`. `TerminateAsync` kills the entire process tree and disposes the IPC stream. `ProcessHostLauncherOptions` carries `HostExePath` + `PipeName` + `AllowedSid` plus an optional `SharedSecret` (auto-generated from a GUID when omitted so install scripts don't have to), `Arguments`, `Backend` (`fwlib32` / `fake` / `unconfigured`; default `unconfigured`), `ConnectTimeout` (15s), and `Series` for CNC pre-flight.

Post-mortem MMF (`Host/Stability/PostMortemMmf.cs` + `Proxy/Supervisor/PostMortemReader.cs`): a ring buffer of the last ~1000 IPC operations, written by the Host into a memory-mapped file. On a Host crash the supervisor reads the MMF, which survives process death, to see what was in flight. File format: a 16-byte header `[magic 'OFPC' (0x4F465043) | version | capacity | writeIndex]` followed by N × 256-byte entries `[8-byte UTC unix ms | 8-byte opKind | 240-byte UTF-8 message + null terminator]`. The magic distinguishes FOCAS MMFs from the Galaxy MMFs, which share the same format shape. The writer is single-producer (Host), serialized by a `_writeGate` lock; the reader is multi-consumer (Proxy + any diagnostic tool) and uses a separate `MemoryMappedFile` handle.

NSSM install wrappers (`scripts/install/Install-FocasHost.ps1` + `Uninstall-FocasHost.ps1`): idempotent service registration for `OtOpcUaFocasHost`. The install script resolves the SID from the `ServiceAccount`, generates a fresh shared secret per install if none is supplied, stages `OTOPCUA_FOCAS_PIPE`/`SID`/`SECRET`/`BACKEND` in `AppEnvironmentExtra` so they never hit disk, rotates 10MB stdout/stderr logs under `%ProgramData%\OtOpcUa`, and sets `DependOnService=OtOpcUa` so startup order is deterministic.

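
The environment handoff described above can be sketched as follows. This is a hypothetical Python illustration, not the C# `ProcessHostLauncher`: `build_host_environment` and `select_backend` are invented names, formatting the GUID secret as 32 hex characters is an assumption, and rejecting unknown backend values (rather than falling back) is an assumption about the Host's behaviour.

```python
import os
import uuid
from typing import Dict, Mapping, Optional

# Backend names taken from the Backend option above.
BACKENDS = ("fwlib32", "fake", "unconfigured")


def build_host_environment(pipe_name: str, allowed_sid: str,
                           shared_secret: Optional[str] = None,
                           backend: str = "unconfigured",
                           base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Supervisor side: build the Host's environment in memory; nothing is written to disk."""
    if backend not in BACKENDS:
        raise ValueError(f"unknown backend {backend!r}")
    env = dict(os.environ if base is None else base)
    env.update({
        "OTOPCUA_FOCAS_PIPE": pipe_name,
        "OTOPCUA_ALLOWED_SID": allowed_sid,
        # Auto-generate a secret from a random GUID when the caller didn't supply one.
        "OTOPCUA_FOCAS_SECRET": shared_secret or uuid.uuid4().hex,
        "OTOPCUA_FOCAS_BACKEND": backend,
    })
    return env


def select_backend(env: Mapping[str, str]) -> str:
    """Host side: unset or empty means 'unconfigured', so a fresh install never loads Fwlib32.dll."""
    value = env.get("OTOPCUA_FOCAS_BACKEND", "") or "unconfigured"
    if value not in BACKENDS:
        raise ValueError(f"unknown OTOPCUA_FOCAS_BACKEND {value!r}")
    return value
```

In Python the returned dict would be passed as `env=` to `subprocess.Popen`, the analogue of populating `ProcessStartInfo.Environment` before `Process.Start`.
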
The backend selector defaults to `unconfigured` so a fresh install doesn't accidentally load a half-configured `Fwlib32.dll` on first start.

Tests (7 new, 2 files):

- `PostMortemMmfTests.cs` in `FOCAS.Host.Tests`: round-trip write + read preserves order and content; the ring buffer wraps at capacity (writes 10 entries to a 3-slot buffer and asserts only op-7/8/9 survive, in FIFO order); message truncation at the 240-byte cap is null-terminated and non-overflowing; reopening an existing file preserves entries.
- `PostMortemReaderCompatibilityTests.cs` in `FOCAS.Tests`: hand-writes a file in the exact Host format (magic/entry layout) and asserts the Proxy reader decodes it with correct ring-walk ordering when `writeIndex != 0`, and returns empty on a missing file and on a magic mismatch. This keeps the two codebases in format lockstep without the net10 test project referencing the net48 Host assembly.

Docs updated:

- `docs/v2/implementation/focas-isolation-plan.md` promoted from DRAFT to "PRs A–E shipped" status, with per-PR citations and post-ship test counts (189 + 24 + 13 = 226 FOCAS-family tests green).
- `docs/drivers/FOCAS-Test-Fixture.md` §5 updated from "architecture scoped but not implemented" to a list of the shipped components, with the `FwlibHostedBackend` gap explicitly labeled as hardware-gated.
- `Install-FocasHost.ps1` documents the `OTOPCUA_FOCAS_BACKEND` selector and points at `docs/v2/focas-deployment.md` for `Fwlib32.dll` licensing.

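
The post-mortem file layout can be made concrete with Python's `struct` module. This sketch works on an in-memory `bytearray` rather than a real memory-mapped file, and it assumes little-endian `int32` header fields, an `int64` `opKind`, `writeIndex` as the next slot to write (modulo capacity), and a zero timestamp marking a never-written slot; none of those details is stated above. It reproduces the behaviours the tests check: wrap at capacity, null-terminated truncation at 240 bytes, ring-walk ordering, and an empty result on a magic mismatch.

```python
import struct
from typing import List, Tuple

# Assumed on-disk layout (little-endian, no padding), per the description above.
MAGIC = 0x4F465043                   # 'OFPC'
HEADER = struct.Struct("<iiii")      # magic | version | capacity | writeIndex -> 16 bytes
ENTRY = struct.Struct("<qq240s")     # UTC unix ms | opKind | UTF-8 message   -> 256 bytes
MESSAGE_BYTES = 240


def create(capacity: int, version: int = 1) -> bytearray:
    """Allocate an empty ring buffer: header plus `capacity` zeroed entries."""
    buf = bytearray(HEADER.size + capacity * ENTRY.size)
    HEADER.pack_into(buf, 0, MAGIC, version, capacity, 0)
    return buf


def _encode_message(text: str) -> bytes:
    # Keep one of the 240 bytes for the null terminator; the decode/encode
    # round-trip drops a multi-byte UTF-8 sequence that the cut split in half.
    raw = text.encode("utf-8")[: MESSAGE_BYTES - 1]
    return raw.decode("utf-8", errors="ignore").encode("utf-8")


def write(buf: bytearray, unix_ms: int, op_kind: int, message: str) -> None:
    """Single producer: overwrite the slot at writeIndex, then advance it modulo capacity."""
    magic, version, capacity, write_index = HEADER.unpack_from(buf, 0)
    offset = HEADER.size + write_index * ENTRY.size
    # struct pads the 240s field with NUL bytes, which supplies the terminator.
    ENTRY.pack_into(buf, offset, unix_ms, op_kind, _encode_message(message))
    HEADER.pack_into(buf, 0, magic, version, capacity, (write_index + 1) % capacity)


def read(buf) -> List[Tuple[int, int, str]]:
    """Return (unix_ms, op_kind, message) oldest-first; empty if the magic doesn't match."""
    if len(buf) < HEADER.size:
        return []
    magic, _version, capacity, write_index = HEADER.unpack_from(buf, 0)
    if magic != MAGIC:
        return []
    entries = []
    # Ring walk: once the buffer has wrapped, the slot at writeIndex holds the oldest entry.
    for i in range(capacity):
        slot = (write_index + i) % capacity
        unix_ms, op_kind, raw = ENTRY.unpack_from(buf, HEADER.size + slot * ENTRY.size)
        if unix_ms == 0:  # assumed marker for a never-written slot
            continue
        entries.append((unix_ms, op_kind, raw.split(b"\x00", 1)[0].decode("utf-8")))
    return entries
```

Writing `op-0` through `op-9` into a 3-slot buffer and calling `read` returns only `op-7`, `op-8`, `op-9`, oldest first, matching the wrap test described above. `struct.pack_into` and `unpack_from` accept any buffer object, so the same functions work on an `mmap.mmap` in place of the `bytearray`.
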
What ISN'T in this PR:

1. The real `FwlibHostedBackend` implementing `IFocasBackend` with the P/Invoke. Validating it requires either a CNC on the bench or a licensed FANUC developer kit; tracked under #220 as a single follow-up task.
2. Admin `/hosts` surface integration for FOCAS runtime status. Galaxy Tier-C already has the shape; FOCAS can slot in when someone wires `ObservedCrashes` / `StickyAlertActive` / `BackoffAttempt` to the `FleetStatusHub`.
3. A full integration test that spawns a real FOCAS Host process. `ProcessHostLauncher` is tested via its contract and the MMF via round-trip, but no test spins up the real exe (the Galaxy Tier-C tests do this, and the FOCAS equivalent would add no new coverage over what's already in place).

Total FOCAS-family tests green after this PR: 189 driver + 24 Shared + 13 Host = 226.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

# FOCAS Tier-C isolation — plan for task #220

> **Status**: PRs A–E shipped. Architecture is in place; the only
> remaining FOCAS work is the hardware-dependent production
> integration of `Fwlib32.dll` into a real `IFocasBackend`
> (`FwlibHostedBackend`), which needs an actual CNC on the bench
> and is tracked as a follow-up on #220.
>
> **Pre-reqs shipped**: version matrix + pre-flight validation
> (PR #168 — the cheap half of the hardware-free stability gap).

## Why isolate

`Fwlib32.dll` is a proprietary Fanuc library with no source, no
symbols, and a documented habit of crashing the hosting process on
network errors, malformed responses, and during handle recycling.
Today the FOCAS driver runs in-process with the OPC UA server —
a crash inside the Fanuc DLL takes every driver down with it,
including ones that have nothing to do with FOCAS. Galaxy has the
same class of problem and solved it with the Tier-C pattern (host
service + proxy driver + named-pipe IPC); FOCAS should follow that
playbook.

## Topology (target)

```
+-------------------------------------+         +--------------------------+
| OtOpcUa.Server (.NET 10 x64)        |         | OtOpcUaFocasHost         |
|                                     |  pipe   | (.NET 4.8 x86 Windows    |
| ZB.MOM.WW.OtOpcUa.Driver.FOCAS      | <-----> |  service)                |
|   - FocasProxyDriver (in-proc)      |         |                          |
|   - supervisor / respawn / BackPr   |         | Fwlib32.dll + session    |
|                                     |         |  handles + STA thread    |
+-------------------------------------+         +--------------------------+
```

Why .NET 4.8 x86 for the host: `Fwlib32.dll` ships as 32-bit only.
The Galaxy.Host is already .NET 4.8 x86 for the same reason
(MXAccess COM bitness), so the NSSM wrapper pattern transfers
directly.

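
Because the Host's bitness is dictated by the DLL, an install-time pre-flight could confirm that the `Fwlib32.dll` being deployed really is an x86 image. The sketch below is hypothetical (the install scripts are not described as doing this); it reads the COFF `Machine` field of the PE header, where `0x014C` means x86 and `0x8664` means x64 per the PE/COFF specification.

```python
import struct

IMAGE_FILE_MACHINE_I386 = 0x014C
IMAGE_FILE_MACHINE_AMD64 = 0x8664


def pe_machine(data: bytes) -> int:
    """Return the COFF Machine field of a PE image (DLL or EXE)."""
    if data[:2] != b"MZ":
        raise ValueError("not a PE image: missing MZ header")
    # e_lfanew at offset 0x3C points at the "PE\0\0" signature.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\x00\x00":
        raise ValueError("not a PE image: missing PE signature")
    # The COFF file header follows the signature; Machine is its first WORD.
    (machine,) = struct.unpack_from("<H", data, pe_offset + 4)
    return machine


def is_32bit_x86(path: str) -> bool:
    """True if the file at `path` is an x86 (32-bit) PE image."""
    with open(path, "rb") as f:
        return pe_machine(f.read()) == IMAGE_FILE_MACHINE_I386
```
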
## Three new projects

| Project | TFM | Role |
| --- | --- | --- |
| `ZB.MOM.WW.OtOpcUa.Driver.FOCAS.Shared` | `netstandard2.0` | MessagePack DTOs — `FocasReadRequest`, `FocasReadResponse`, `FocasSubscribeRequest`, `FocasPmcBitWriteRequest`, etc. Same assembly referenced by .NET 10 + .NET 4.8 so the wire format stays identical. |
| `ZB.MOM.WW.OtOpcUa.Driver.FOCAS.Host` | `net48` x86 | Windows service. Owns the Fwlib32 session handles + STA thread + handle-recycling loop. Pipe server + per-call auth (same ACL + caller SID + shared secret pattern as Galaxy.Host). |
| `ZB.MOM.WW.OtOpcUa.Driver.FOCAS` (existing) | `net10.0` | Collapses to a proxy that forwards each `IReadable` / `IWritable` / `ISubscribable` call over the pipe. `FocasCapabilityMatrix` + `FocasAddress` stay here — pre-flight runs before any IPC. |

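
The shared-secret half of the per-call auth (the Hello check of the secret and the protocol major version that PR B ships) might look like the following. It is a Python sketch with hypothetical names (`Hello`, `verify_hello`, `PROTOCOL_MAJOR`); the pipe ACL and caller-SID checks are enforced by Windows and are not modeled.

```python
import hmac
from dataclasses import dataclass
from typing import Optional

PROTOCOL_MAJOR = 1  # hypothetical value; the real constant would live in the Shared contracts


@dataclass(frozen=True)
class Hello:
    shared_secret: str
    protocol_major: int
    protocol_minor: int = 0


def verify_hello(hello: Hello, expected_secret: str) -> Optional[str]:
    """Return None if the Hello is accepted, otherwise a rejection reason."""
    # Constant-time comparison so a caller can't probe the secret byte by byte.
    if not hmac.compare_digest(hello.shared_secret.encode(), expected_secret.encode()):
        return "shared secret mismatch"
    # Minor versions may differ; a different major means an incompatible wire format.
    if hello.protocol_major != PROTOCOL_MAJOR:
        return f"protocol major {hello.protocol_major} != {PROTOCOL_MAJOR}"
    return None
```
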
## Supervisor responsibilities (in the Proxy)

Mirrors Galaxy.Proxy 1:1. The thresholds below are from the original
plan; PR D shipped different values (`Backoff` 5s→15s→60s,
`CircuitBreaker` 3-in-5min → 1h→4h→manual).

1. Start the Host process on first `InitializeAsync` (NSSM-wrapped
   service in production, direct spawn in dev) + heartbeat every
   5s.
2. If the heartbeat misses 3× in a row, fan out `BadCommunicationError`
   to every subscription and respawn with exponential backoff
   (1s / 2s / 4s / max 30s).
3. Crash-loop circuit breaker: 5 respawns in 60s → drop to
   `BadDeviceFailure` steady state until an operator resets it.
4. Post-mortem MMF: the Host keeps its last-N operations and session
   state in a memory-mapped file that survives process death; on Host
   exit the Proxy reads it to log context.

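
The backoff and crash-loop rules in items 2 and 3 can be sketched as below. This is a Python sketch using the plan's numbers as defaults, not the shipped C# `Backoff` / `CircuitBreaker`; `backoff_delay` and `CrashLoopBreaker` are hypothetical names, and reading "5 respawns in 60s" as a sliding window is an assumption.

```python
from collections import deque


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 30.0) -> float:
    """Seconds to wait before respawn attempt `attempt` (0-based): 1, 2, 4, ... capped at 30."""
    return min(cap, base * (2 ** attempt))


class CrashLoopBreaker:
    """Trips once `max_respawns` respawns fall inside a sliding `window` (seconds).

    Once tripped it stays tripped (the BadDeviceFailure steady state)
    until reset() is called by an operator.
    """

    def __init__(self, max_respawns: int = 5, window: float = 60.0):
        self.max_respawns = max_respawns
        self.window = window
        self._respawns = deque()  # timestamps of recent respawns
        self.tripped = False

    def record_respawn(self, now: float) -> None:
        if self.tripped:
            return  # no respawns happen while tripped
        self._respawns.append(now)
        # Forget respawns that have aged out of the window.
        while now - self._respawns[0] > self.window:
            self._respawns.popleft()
        if len(self._respawns) >= self.max_respawns:
            self.tripped = True

    def reset(self) -> None:
        self._respawns.clear()
        self.tripped = False
```

The escalating 1h→4h cooldowns of the shipped breaker are not modeled here; this version goes straight to the manual-reset state.
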
## IPC surface (approximate)

Every `FocasDriver` method that today calls into Fwlib32 directly
becomes an `ExecuteAsync` call with a typed request:

| Today (in-process) | Tier-C (IPC) |
| --- | --- |
| `FocasTagReader.Read(tag)` | `client.Execute(new FocasReadRequest(session, address))` |
| `FocasTagWriter.Write(tag, value)` | `client.Execute(new FocasWriteRequest(...))` |
| `FocasPmcBitRmw.Write(tag, bit, value)` | `client.Execute(new FocasPmcBitWriteRequest(...))` — RMW happens in Host so the critical section stays on one process |
| `FocasConnectivityProbe.ProbeAsync` | `client.Execute(new FocasProbeRequest())` |
| `FocasSubscriber.Subscribe(tags)` | `client.Execute(new FocasSubscribeRequest(tags))` — Host owns the poll loop + streams changes back as `FocasDataChangedNotification` over the pipe |

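
The point of moving the PMC bit read-modify-write into the Host is that the read, the bit change, and the write-back must form one critical section. A minimal Python sketch, with hypothetical `read_byte` / `write_byte` callables standing in for the backend and an illustrative `R100` address:

```python
import threading


class PmcBitWriter:
    """Hypothetical Host-side handler for a PMC bit write request.

    The whole read-modify-write of the containing byte runs under one lock,
    so two concurrent bit writes to the same byte cannot lose an update.
    """

    def __init__(self, read_byte, write_byte):
        # read_byte(address) -> int and write_byte(address, value) stand in for backend calls.
        self._read_byte = read_byte
        self._write_byte = write_byte
        self._lock = threading.Lock()

    def write_bit(self, address: str, bit: int, value: bool) -> int:
        if not 0 <= bit <= 7:
            raise ValueError(f"PMC bit index out of range: {bit}")
        with self._lock:
            current = self._read_byte(address)
            if value:
                updated = current | (1 << bit)
            else:
                updated = current & ~(1 << bit) & 0xFF
            self._write_byte(address, updated)
            return updated
```

Eight threads each setting a different bit of the same byte end with `0xFF`. Without the lock, two interleaved read-modify-writes could each write back a stale byte and drop the other's bit, which is the race the Proxy could not prevent if it did the RMW as two separate IPC calls.
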
Subscription streaming is the non-obvious piece: the Host polls on
its own timer + pushes change notifications so the Proxy doesn't
round-trip per poll. Matches `Driver.Galaxy.Host` subscription
forwarding.

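
The Host-side half of that streaming can be sketched as a poll step that remembers the last value per tag and emits only changes. In this Python sketch `SubscriptionPoller` and `DataChanged` are hypothetical stand-ins (the latter for `FocasDataChangedNotification`, whose fields aren't specified here); the timer and the pipe send are left out.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass(frozen=True)
class DataChanged:
    tag: str
    value: Any


class SubscriptionPoller:
    """Read every subscribed tag on each tick; report only the tags whose value changed."""

    def __init__(self, tags: List[str]):
        self._tags = list(tags)
        self._last: Dict[str, Any] = {}

    def poll(self, read_tag: Callable[[str], Any]) -> List[DataChanged]:
        changes = []
        for tag in self._tags:
            value = read_tag(tag)
            # The first read of a tag always reports, so the Proxy gets an initial value.
            if tag not in self._last or self._last[tag] != value:
                self._last[tag] = value
                changes.append(DataChanged(tag, value))
        return changes
```

Only a non-empty result would cross the pipe, so an unchanged machine costs the Proxy nothing per poll.
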
## PR sequence — shipped

1. **PR A (#169) — shared contracts** ✅
   `Driver.FOCAS.Shared` netstandard2.0 with MessagePack DTOs for every
   IPC surface (Hello/Heartbeat/OpenSession/Read/Write/PmcBitWrite/
   Subscribe/Probe/RuntimeStatus/Recycle/ErrorResponse) + FrameReader/
   FrameWriter + 24 round-trip tests.
2. **PR B (#170) — Host project skeleton** ✅
   `Driver.FOCAS.Host` net48 x86 Windows Service entry point,
   `PipeAcl` + `PipeServer` + `IFrameHandler` + `StubFrameHandler`.
   ACL denies LocalSystem/Administrators; Hello verifies
   shared-secret + protocol major. 3 handshake tests.
3. **PR C (#171) — IPC path end-to-end** ✅
   Proxy `Ipc/FocasIpcClient` + `Ipc/IpcFocasClient` (implements
   `IFocasClient` via IPC). Host `Backend/IFocasBackend` +
   `FakeFocasBackend` + `UnconfiguredFocasBackend` +
   `Ipc/FwlibFrameHandler` replacing the stub. 13 new round-trip
   tests via in-memory loopback.
4. **PR D (#172) — Supervisor + respawn** ✅
   `Supervisor/Backoff` (5s→15s→60s) + `CircuitBreaker` (3-in-5min →
   1h→4h→manual) + `HeartbeatMonitor` + `IHostProcessLauncher` +
   `FocasHostSupervisor`. 14 tests.
5. **PR E — Ops glue** ✅ (this PR)
   `ProcessHostLauncher` (real Process.Start + FocasIpcClient
   connect), `Host/Stability/PostMortemMmf` (magic 'OFPC') +
   Proxy `Supervisor/PostMortemReader`,
   `scripts/install/Install-FocasHost.ps1` + `Uninstall-FocasHost.ps1`
   NSSM wrappers. 7 tests (4 MMF round-trip + 3 reader format
   compatibility).

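
The `ProcessHostLauncher` connect step from PR E (poll `FocasIpcClient.ConnectAsync` every 250ms until the pipe is up, the Host exits, or `ConnectTimeout` passes) reduces to a deadline loop. This synchronous Python sketch takes hypothetical `try_connect` / `host_exited` callables in place of the async C# calls, plus injectable `clock` / `sleep` so it can be tested without real waiting.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class HostStartError(Exception):
    pass


def wait_for_pipe(
    try_connect: Callable[[], Optional[T]],
    host_exited: Callable[[], bool],
    timeout: float = 15.0,
    interval: float = 0.25,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Poll until the pipe accepts a connection, the Host exits, or the deadline passes."""
    deadline = clock() + timeout
    while True:
        client = try_connect()
        if client is not None:
            return client
        # A Host that died during startup will never open the pipe; fail fast.
        if host_exited():
            raise HostStartError("Host process exited before the pipe came up")
        if clock() >= deadline:
            raise HostStartError(f"pipe not up after {timeout}s")
        sleep(interval)
```

Checking `host_exited` on every iteration is what lets a crash during startup surface immediately instead of after the full 15s timeout.
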
**Post-shipment totals: 189 FOCAS driver tests + 24 Shared tests + 13 Host tests = 226 FOCAS-family tests green.**

What remains is hardware-dependent: wiring the `Fwlib32.dll` P/Invoke
into a real `FwlibHostedBackend` implementation of `IFocasBackend`
and validating it against a live CNC. The architecture provides all
the plumbing that work needs.

## Testing without hardware

Same constraint as today: no CNC, no simulator. The isolation work
itself is verifiable without Fwlib32 actually being called:

- **Pipe contract**: PR A's MessagePack round-trip tests cover every
  DTO.
- **Supervisor**: PR D uses a `FakeFocasHost` stub that can be told
  to crash, hang, or miss heartbeats. The supervisor's respawn +
  circuit-breaker behaviour is fully testable against the stub.
- **IPC ACL + auth**: reuse the Galaxy.Host's existing test harness
  pattern — negative tests attempt to connect as the wrong user and
  assert rejection.
- **Fwlib32 integration itself**: still untestable without hardware.
  When a real CNC becomes available, the smoke tests already
  scaffolded in `tests/ZB.MOM.WW.OtOpcUa.Driver.FOCAS.IntegrationTests/`
  run against it via `FOCAS_ENDPOINT`.

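
The supervisor tests above rely on a scriptable fake. Below is a Python sketch of the heartbeat rule from the supervisor section (3 consecutive misses → Host declared dead) driven by such a fake. `HeartbeatMissCounter`, `FakeHost`, and `ticks_until_declared_dead` are hypothetical names, not PR D's `FakeFocasHost` or `HeartbeatMonitor`.

```python
from typing import Optional


class HeartbeatMissCounter:
    """Counts consecutive missed heartbeats; reports failure after `max_misses` in a row."""

    def __init__(self, max_misses: int = 3):
        self.max_misses = max_misses
        self.consecutive_misses = 0

    def on_tick(self, heartbeat_ok: bool) -> bool:
        """Called once per heartbeat interval; True means declare the Host dead."""
        if heartbeat_ok:
            self.consecutive_misses = 0  # any answered heartbeat resets the streak
            return False
        self.consecutive_misses += 1
        return self.consecutive_misses >= self.max_misses


class FakeHost:
    """Scriptable stand-in for the Host: answers heartbeats until told to hang."""

    def __init__(self):
        self.hung = False

    def heartbeat(self) -> bool:
        return not self.hung


def ticks_until_declared_dead(host: FakeHost, counter: HeartbeatMissCounter,
                              hang_at: Optional[int], max_ticks: int = 100) -> Optional[int]:
    """Drive the counter tick by tick; return the tick at which the Host is declared dead."""
    for tick in range(max_ticks):
        if tick == hang_at:
            host.hung = True
        if counter.on_tick(host.heartbeat()):
            return tick
    return None
```

A Host that hangs at tick 5 is declared dead at tick 7, after misses at ticks 5, 6, and 7; no real process or timer is involved.
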
## Decisions to confirm before starting

- **Sharing transport code with Galaxy.Host** — should the pipe
  server + ACL + shared-secret + MMF plumbing go into a common
  `Core.Hosting.Tier-C` project both hosts reference? Probably yes;
  deferred until PR B is drafted because the right abstraction only
  becomes visible after two uses.
- **Handle-recycling cadence** — Fwlib32 session handles leak
  memory over weeks per the Fanuc-published defect list. Galaxy
  recycles MXAccess handles on a 24h timer; FOCAS should mirror that,
  but the trigger point (idle vs scheduled) needs operator input.
- **Per-CNC Host process vs one Host serving N CNCs** — one-per-CNC
  isolates blast radius but scales poorly past ~20 machines; shared
  Host scales but one bad CNC can wedge the lot. Start with shared
  Host + document the blast-radius trade; revisit if operators hit
  it.

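
The two recycling triggers under discussion can be expressed as one policy so the choice becomes configuration rather than code. This Python sketch is hypothetical: `RecyclePolicy` and its fields are invented, the 24h default mirrors the Galaxy timer, and the `min_age` guard (so an idle trigger doesn't churn fresh handles) is an assumption.

```python
from dataclasses import dataclass
from typing import Optional

DAY = 24 * 60 * 60  # seconds


@dataclass
class RecyclePolicy:
    max_age: float = DAY                 # scheduled trigger: recycle at this handle age regardless
    idle_after: Optional[float] = None   # idle trigger: required quiet period; None disables it
    min_age: float = 0.0                 # idle trigger only fires for handles at least this old

    def should_recycle(self, now: float, opened_at: float, last_activity: float) -> bool:
        age = now - opened_at
        if age >= self.max_age:
            return True
        if self.idle_after is None:
            return False
        return age >= self.min_age and now - last_activity >= self.idle_after
```

With only `max_age` set this is the Galaxy-style schedule; setting `idle_after` lets the recycle land in a quiet period before the hard deadline.
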
## Non-goals

- Simulator work. `open_focas` + other OSS FOCAS simulators are
  untested + not maintained; not worth chasing vs. waiting for real
  hardware.
- Changing the public `FocasDriverOptions` shape beyond what
  already shipped (the `Series` knob). Operator config continues to
  look the same after the split — the Tier-C topology is invisible
  from `appsettings.json`.
- Historian / long-term history integration. FOCAS driver doesn't
  implement `IHistoryProvider` + there's no plan to add it.

## References

- [`docs/v2/implementation/phase-2-galaxy-out-of-process.md`](phase-2-galaxy-out-of-process.md)
  — the working Tier-C template this plan follows.
- [`docs/drivers/FOCAS-Test-Fixture.md`](../../drivers/FOCAS-Test-Fixture.md)
  — what's covered today + what stays blocked on hardware.
- [`docs/v2/focas-version-matrix.md`](../focas-version-matrix.md) —
  the capability matrix that pre-flights configs before IPC runs.