feat(focas): real FANUC 30i/31i-B PDU-v3 support (live-validated on a 31i-B)

First real FOCAS hardware contact (Makino Pro 5 / 31i-B @ 10.201.31.5). A full
v3 data-PDU capture corrected the initial diagnosis: the v3 block envelope is
identical to v1, so only specific payload structs / request math / one client
robustness gap were wrong — not "framing rewrites".

Fixes (all re-validated live through the fixed driver):
- version gate: accept inbound PDU {1,3}, keep emitting v1 (FocasWireProtocol).
- cnc_rdtimer: 8-byte {minute,msec} payload is little-endian (ParseTimer) — the
  only decode with an in-range msec field.
- pmc_rdpmcrng: request range widened to the data-type byte width
  (end = start + width - 1) so a Word/Long isn't truncated to 0 values
  (was spurious BadOutOfRange); decode extracted to ParsePmcRange.
- cnc_rdsvmeter: per-axis LOADELM is 8 bytes (not 12) and names come from the
  0x0089 block — ParseServoMeters fixes the misaligned 655360 garbage. Also the
  "hang" was NetworkStream.ReadAsync not aborting a stalled socket: ReadExactlyAsync
  now disposes the stream on cancellation so a stalled peer can't wedge a poll loop.
- cnc_rddynamic2: contract guard rejecting axis < 1 (driver poll already 1-based).
- FocasDriverProbe: run a real wire session (initiate + cnc_statinfo) instead of
  degrading to Ok=true "TCP reachability only" when FWLIB is absent — a bare TCP
  listener no longer reports HEALTHY.

cnc_rdparam (0x000e) is unsupported on this control — EW_FUNC across 14
request-framing variants x 4 known-present params; needs a reference FWLIB trace
or is restricted. Deferred (deployed config uses macros, not parameters).

Tests: FOCAS suite 234 green (+16), full solution builds 0 errors. Raw v3
captures checked in under tests/.../Fixtures/v3/. Capture tools under scripts/focas/.

Docs: docs/plans/2026-06-25-focas-pdu-v3-{30i-b-support,implementation-plan}.md,
docs/drivers/FOCAS.md, docs/v2/focas-version-matrix.md,
docs/deployments/wonder-app-vd03-makino-z-34184.md.
Joseph Doherty
2026-06-25 16:41:42 -04:00
parent fd01448ac4
commit 5f0a52864c
36 changed files with 1567 additions and 177 deletions
@@ -0,0 +1,222 @@
# FOCAS wire-protocol PDU v3 — finding + support analysis (Makino Pro 5 / FANUC 30i-B)
**Date:** 2026-06-25
**Author:** field investigation against a live CNC (first real FOCAS hardware contact)
**Status:** **RESOLVED + live-validated on a real 31i-B (`10.201.31.5`) 2026-06-25** — version gate, timer, PMC range, servo-meter, alarms, probe all fixed and verified live; `cnc_rdparam` found unsupported on this control (see Resolution). See `2026-06-25-focas-pdu-v3-implementation-plan.md` for the per-phase record.
**Components:** `ZB.MOM.WW.OtOpcUa.Driver.FOCAS` (`Wire/FocasWireProtocol.cs`, `Wire/FocasWireClient.cs`)
---
## TL;DR
The pure-managed `WireFocasClient` only implements **FOCAS Ethernet wire-protocol PDU version 1**
and hard-rejects every other version. A real **Makino Pro 5 (FANUC 30i-B)** at `10.201.31.5:8193`
speaks **PDU version 3**, so the driver fails the session with
`BadCommunicationError` / "Unsupported FOCAS PDU version 3" and no tag ever produces a value.
A live experiment (relaxing the version gate to accept v3) showed the **v3 initiate handshake and the
macro-read data framing are already compatible** — `MACRO:*` reads returned correct values from the
real machine. **PMC** and **Parameter** reads still failed and need per-command v3 work. So v3 support
is *not* a rewrite, but it is also *not* a one-line version bump if PMC/PARAM/FixedTree are required.
This was the **first time the driver has ever been pointed at real FOCAS hardware** — the wire
implementation was written from `strangesast/fwlib` + public docs with an explicit "needs Wireshark
traces to validate" caveat (see `docs/v2/implementation/focas-wire-protocol.md`). This finding is
those traces.
---
## Resolution (2026-06-25 — implemented + live-validated)
A full v3 data-PDU capture (`scripts/focas/capture-v3.py`, fixtures under
`tests/Drivers/ZB.MOM.WW.OtOpcUa.Driver.FOCAS.Tests/Fixtures/v3/`) **substantially corrected the
initial diagnosis** — 4 of the 6 "v3 framing failures" were not framing problems at all. The block
envelope is byte-identical to v1; only specific payload structs / request-range math / a client-side
robustness gap were wrong. Every fix below was re-validated by reading the live 31i-B through the
fixed driver.
| Surface | Root cause (from the capture) | Fix | Live result |
|---|---|---|---|
| version gate | hard `version != 1` reject | accept `{1,3}` inbound (`FocasWireProtocol`) | macros + all reads work on v3 |
| `cnc_rdtimer` | 8-byte {minute,msec} payload is **little-endian** (only decode with an in-range msec) | `FocasWireClient.ParseTimer` reads LE | cutting = 1,110,700 min / 41,872 ms |
| `pmc_rdpmcrng` | request asked for `end=start` (1 byte) but a Word needs its byte width → 0 values → spurious `BadOutOfRange` | `WireFocasClient.ReadPmcAsync` sets `end = start + width - 1`; decode extracted to `ParsePmcRange` | R0 = 7873, R100 = 0, status Good |
| `cnc_rdsvmeter` | (a) **no wire hang** — CNC answers fully + promptly; the "hang" was `NetworkStream.ReadAsync` not aborting a genuinely stalled socket. (b) per-axis LOADELM is **8 bytes**, not 12 → 12-byte stride misaligned (→ 655360 garbage); names live in the 0x0089 block | (a) `ReadExactlyAsync` dispose-on-cancel abort. (b) `ParseServoMeters` 8-byte stride + name correlation | 7 axes X,Y,Z,B,C,AA,AA, aligned values (≈0 idle) |
| `cnc_rdalmmsg2` | not broken — empty payload = no active alarms | none (parser already handles empty) | returned active alarm `#3080 WRONG PALLET IN MACHINE` |
| `cnc_rddynamic2` axis 0 | not a driver bug — the FixedTree poll already iterates 1..N; only a direct harness call used 0 | contract guard in `ReadDynamicAsync` (reject `axisIndex < 1`) | axes 1..N read clean |
| Test-Connect probe | degraded to `Ok=true` "TCP reachability only" when FWLIB absent → any TCP listener looked HEALTHY | `FocasDriverProbe` now runs a real wire session (initiate + `cnc_statinfo`) | `Ok=true` vs real CNC, `Ok=false` vs bare listener |
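
The `pmc_rdpmcrng` row comes down to one line of arithmetic, sketched here in Python (the language of the capture scripts). The names `PMC_WIDTH_BYTES`, `pmc_request_range` and `values_in_range` are illustrative and are not the driver's API. The Byte/Word/Long widths of 1/2/4 bytes are the usual FOCAS sizes and are assumed here.

```python
# Assumed byte widths per PMC data type (hypothetical table, not driver code).
PMC_WIDTH_BYTES = {"Byte": 1, "Word": 2, "Long": 4}


def pmc_request_range(start: int, data_type: str) -> tuple:
    """Inclusive (start, end) byte range that covers exactly one value."""
    width = PMC_WIDTH_BYTES[data_type]
    return start, start + width - 1


def values_in_range(start: int, end: int, data_type: str) -> int:
    """How many whole values of data_type fit in the inclusive range."""
    return (end - start + 1) // PMC_WIDTH_BYTES[data_type]


# Old request: end == start, so a Word range holds zero whole values.
old_count = values_in_range(100, 100, "Word")
# Fixed request: the range spans the full width of the value.
new_start, new_end = pmc_request_range(100, "Word")
new_count = values_in_range(new_start, new_end, "Word")
```

With `end == start` a Word request covers one byte and therefore zero whole values, which is the condition the table describes as surfacing as `BadOutOfRange`.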
### `cnc_rdparam` — unsupported on this control (blocked)
The one genuine v3 problem. A live matrix of **14 request-framing variants × 4 known-present
parameters** (8130 / 1320 / 1825 / 3201) — every combination of arg ordering, axis, length,
request-class, and extra payload (`scripts/focas/param-probe.py`) — returned **`EW_FUNC(1)`
uniformly**. That is not a tweakable-framing bug. `0x000e` is also the ignored post-connect setup
command, which makes it a doubtful parameter opcode. Either parameter read is genuinely restricted on
this control via the wire path, or the v3 command id differs from `0x000e` and cannot be recovered
without a reference FWLIB Wireshark trace (the long-blocked "Stream C.2"). Parameter support is parked
on that reference; the deployed config uses macros, not parameters, so nothing live depends on it.
### Open caveats
- **Servo-load magnitude/scaling** (`data / 10^dec`; `dec` read as 10 → idle loads ≈ 0) is inferred
from the wire and unconfirmed against the machine's servo-meter screen — confirm at commissioning.
- **Timer type→counter mapping**: power-on / operating / cycle read 0 while cutting is non-zero on
this control. The *decode* is correct; whether type 0/1/3 map to populated counters here is a
CNC-configuration question for commissioning.
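
To illustrate why little-endian is "the only decode with an in-range msec", here is a minimal Python sketch. It assumes the 8-byte payload is two signed 32-bit integers and that `msec` is a remainder below 60,000; neither is spelled out by the capture, and `parse_timer` is a stand-in, not the driver's `ParseTimer`. The sample payload is rebuilt from the live cutting-time values in the table above, not copied from a fixture.

```python
import struct

MSEC_PER_MINUTE = 60_000  # assumption: msec counts milliseconds within a minute


def parse_timer(payload: bytes) -> tuple:
    """Decode {minute, msec} as two little-endian signed 32-bit integers."""
    if len(payload) != 8:
        raise ValueError("timer payload must be 8 bytes")
    minute, msec = struct.unpack("<ii", payload)
    return minute, msec


def msec_in_range(msec: int) -> bool:
    return 0 <= msec < MSEC_PER_MINUTE


# Payload reconstructed from the live result: 1,110,700 min / 41,872 ms.
payload = struct.pack("<ii", 1_110_700, 41_872)

le_minute, le_msec = parse_timer(payload)
# The same bytes read big-endian give a msec far outside the range.
be_minute, be_msec = struct.unpack(">ii", payload)
```

Only the little-endian read yields a plausible `msec`, which is the plausibility check the fix relies on.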
## Environment
- **CNC:** Makino Pro 5, FANUC 30i-B control, `10.201.31.5:8193` (ZTag `Z-34184`). FOCAS Ethernet
reachable (TCP 8193 open from both the OtOpcUa host `wonder-app-vd03` and a dev laptop over VPN).
- **Driver backend:** `wire` (the default and now only real backend — `fwlib`/`ipc` were retired in
the Wire migration; see `FocasDriverFactoryExtensions.BuildClientFactory`). FANUC FWLIB is NOT
installed on the host, and is not used for reads.
- **Deployment using this CNC:** see `docs/deployments/wonder-app-vd03-makino-z-34184.md`.
## Symptom
Equipment tags backed by this device never leave OPC UA `Bad_WaitingForInitialData` (`0x80320000`),
and the FOCAS FixedTree emits no nodes (capability detection against the CNC never succeeds).
## Reproduction (driver CLI, straight to the CNC — no OPC UA server involved)
```bash
dotnet run --project src/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.FOCAS.Cli -- \
probe -h 10.201.31.5 -s Thirty_i --timeout-ms 6000 --verbose
```
```
CNC: 10.201.31.5:8193
Series: Thirty_i
Health: Degraded
Last error: Unsupported FOCAS PDU version 3.
R100 → 0x80050000 (BadCommunicationError)
```
## Root cause (exact)
`src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.FOCAS/Wire/FocasWireProtocol.cs`
- Line 22: `public const ushort Version = 1;`
- `ReadPduAsync` (~line 102) and `ReadPdu` (~line 125): `if (version != Version) throw new FocasWireException($"Unsupported FOCAS PDU version {version}.");`
The 10-byte PDU header is `A0 A0 A0 A0` magic + `u16` version + type byte + direction byte + `u16`
body length. The client emits version 1 and **rejects any response whose version field != 1**. The
30i-B answers the initiate with version 3.
## What the wire actually shows (captured live, read-only initiate handshake)
Request the driver sends (socket-1 initiate, exactly as today):
```
a0 a0 a0 a0 00 01 01 01 00 02 00 01
^magic------ ^v=1 ^t ^d ^len ^sockIdx=1
```
Response from the real 30i-B:
```
a0 a0 a0 a0 00 03 01 02 01 68 + 360-byte body
^magic------ ^v=3 ^t ^d ^len=0x0168(360)
body[0..]: 00 08 00 05 00 03 00 20 00 08 00 0a 00 03 00 03 00 0f 00 0f 00 01 00 02 00 01 00 00 00 01 02 00 ...
```
Key observations:
- **The 10-byte header framing is byte-identical to v1** — same magic, same `type=0x01` (initiate),
same `direction=0x02` (response). **Only the version field differs (3 vs 1).**
- The initiate-response **body is 360 bytes** (v3 carries a larger capability/version descriptor block;
it parses as a run of big-endian `u16` words). The current client doesn't deeply parse the initiate
body, so its size/shape did not block the handshake once the version gate was relaxed.
- **No version negotiation:** sending version 3 in our *request* header produced the identical response —
the CNC speaks v3 unconditionally.
(Capture is reproducible with `scripts/focas/capture-initiate.py <host>`.)
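
The captured bytes above can be decoded with a few lines of Python. This is a minimal sketch of the header layout as described in this document, not the contents of `capture-initiate.py`; `parse_header` and `be_u16_words` are names invented for the example.

```python
import struct

# 4-byte magic, u16 version, type byte, direction byte, u16 body length.
# All multi-byte fields are big-endian; 10 bytes in total.
HEADER = struct.Struct(">4sHBBH")
MAGIC = bytes.fromhex("a0a0a0a0")


def parse_header(pdu: bytes) -> dict:
    """Decode the 10-byte PDU header at the start of pdu."""
    magic, version, pdu_type, direction, body_len = HEADER.unpack_from(pdu, 0)
    if magic != MAGIC:
        raise ValueError("bad magic")
    return {"version": version, "type": pdu_type,
            "direction": direction, "body_len": body_len}


def be_u16_words(body: bytes) -> list:
    """Read a body as a run of big-endian u16 words (odd tail byte ignored)."""
    count = len(body) // 2
    return list(struct.unpack(f">{count}H", body[:count * 2]))


# Bytes copied from the capture above.
request = bytes.fromhex("a0a0a0a0 0001 01 01 0002 0001")
response_header = bytes.fromhex("a0a0a0a0 0003 01 02 0168")
body_prefix = bytes.fromhex(
    "0008 0005 0003 0020 0008 000a 0003 0003"
    "000f 000f 0001 0002 0001 0000 0001 0200"
)

req = parse_header(request)
resp = parse_header(response_header)
words = be_u16_words(body_prefix)
```

The request and response differ only in version, direction and body length, which matches the first observation above.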
## Implemented + validated — accept v3 on inbound PDUs
Shipped change (`Wire/FocasWireProtocol.cs`): the hard `version != Version` reject is replaced by a
supported-read-version set `{1, 3}` (`SupportedReadVersions` + `IsSupportedReadVersion`). We still
**emit** `Version` (v1) on requests — the 30i-B accepts v1 request framing — and now **accept** v1 or v3
on inbound PDUs. Covered by `FocasWireProtocolTests.ReadPduAsync_accepts_supported_version` (v1 + v3
theory) with the existing v99-rejection test still green; full FOCAS unit suite 218/218.
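
The shape of that gate, emit v1 and accept `{1, 3}`, is sketched below in Python for illustration only. The shipped code is the C# `SupportedReadVersions` / `IsSupportedReadVersion` pair; every name in this block is a hypothetical stand-in.

```python
import struct

EMIT_VERSION = 1                             # requests still go out as v1
SUPPORTED_READ_VERSIONS = frozenset({1, 3})  # inbound PDUs may be v1 or v3
MAGIC = b"\xa0\xa0\xa0\xa0"


class UnsupportedPduVersion(Exception):
    """Raised when an inbound PDU carries a version outside the accepted set."""


def build_header(pdu_type: int, direction: int, body_len: int) -> bytes:
    """Build a request header; the version is always EMIT_VERSION."""
    return struct.pack(">4sHBBH", MAGIC, EMIT_VERSION,
                       pdu_type, direction, body_len)


def check_inbound_version(header: bytes) -> int:
    """Return the inbound version, or raise if it is not supported."""
    (version,) = struct.unpack_from(">H", header, 4)
    if version not in SUPPORTED_READ_VERSIONS:
        raise UnsupportedPduVersion(f"Unsupported FOCAS PDU version {version}.")
    return version
```

Keeping the emitted version and the accepted set as separate constants means a later need to echo the negotiated version touches only `build_header`.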
Validated live against `10.201.31.5` (30i-B) with the change in source — reading several addresses:
| Address | Type | Result |
|---|---|---|
| `MACRO:500` | Float64 | **0.02 — Good (`0x0`)** |
| `MACRO:3901` (parts total) | Float64 | **0 — Good (`0x0`)** |
| `MACRO:3902` (parts required) | Float64 | Good |
| `R100` (PMC R-file) | Int16 | `0x803C0000` BadOutOfRange |
| `PARAM:1320/0` | Int32 | `0x803D0000` BadNotSupported |
Interpretation:
- **Initiate handshake + macro command (`cnc_rdmacro`, cmd path) data framing are already v3-compatible.**
Macro reads returned correct, plausible values from the real machine. The deployed equipment tags
(`MACRO:3901`/`3902`) would go **Good** with nothing more than the version-gate relaxation.
- **PMC (`pmc_rdpmcrng`) and Parameter (`cnc_rdparam`) reads still fail.** Two candidate causes, not yet
separated: (a) the v3 response *block/struct* framing for these specific commands differs from the
v1-shaped parser (so the return-code/value lands at the wrong offset → spurious `BadOutOfRange` /
`BadNotSupported`); or (b) genuine CNC restrictions (PMC path/range, parameter not present). Macro
working argues the *envelope* is fine, so this is per-command struct work, not a framing rewrite.
## Status-command validation on v3 (the FixedTree surface) — 2026-06-25
Drove every `IFocasClient` status call directly against the live control (v3 accepted). **Most of the
FixedTree lights up on v3.** Note: sysinfo reveals the control is actually a **31i** (CncType=31, Series
`G431`, MaxAxis 32, 7 axes, MtType MM) — the deployment declared `Thirty_i`; same family, reads fine.
| Call (FOCAS fn) | Result on v3 |
|---|---|
| `GetSysInfoAsync` (`cnc_sysinfo`) → Identity | ✅ real — CncType 31, Series G431, 7 axes, MtType MM |
| `GetAxisNamesAsync` (`cnc_rdaxisname`) | ✅ real — X,Y,Z,B,C,A,A |
| `GetSpindleNamesAsync` (`cnc_rdspdlname`) | ✅ real — S1 |
| `GetProgramInfoAsync` (program/mode) | ✅ real — `//CNC_MEM/USER/LIBRARY/O1111`, Mode 1 |
| `ReadDynamicAsync(n)` (`cnc_rddynamic2`) | ✅ real for axes 1..N (feed 4200, spindle ~15000, live positions). **axis 0 → `EW_4`** — the call is 1-based; FixedTree must iterate 1..N, not 0 |
| `GetTimerAsync(*)` | ⚠️ **misparsed** — a running machine shows PowerOn/Operating/Cycle = 0 and Cutting = garbage; the v3 timer struct differs |
| `GetServoLoadsAsync` (`cnc_rdsvmeter`) | ❌ **hangs** — blocks awaiting bytes that never arrive (v3 framing differs) *and* ignores the cancellation token (poll-loop-stalling robustness bug; the read must honor CT regardless) |
| `ReadAlarmsAsync` (`cnc_rdalmmsg2`) | ❓ untested — ServoLoads hung ahead of it; validate once the hang is fixed |
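
The Resolution section later traced the `cnc_rdsvmeter` garbage to a stride error, not a framing difference. The Python sketch below shows how a 12-byte stride over 8-byte records misaligns. The field layout (big-endian `i32 data`, `i16 dec`, `i16 unit`) is an assumption: it is consistent with the reported `dec` of 10 and the 655360 figure, but the capture notes in this document do not confirm it. The load values are synthetic.

```python
import struct

# Assumed per-axis record: i32 data, i16 dec, i16 unit (8 bytes, big-endian).
RECORD = struct.Struct(">ihh")


def parse_servo_meters(payload: bytes, stride: int = 8) -> list:
    """Return (data, dec) per record, stepping through payload by stride."""
    out = []
    for offset in range(0, len(payload) - RECORD.size + 1, stride):
        data, dec, _unit = RECORD.unpack_from(payload, offset)
        out.append((data, dec))
    return out


def scaled_load(data: int, dec: int) -> float:
    """Scaling inferred in this document: data / 10^dec (unconfirmed)."""
    return data / (10 ** dec)


# Three synthetic axes with dec = 10 and unit = 0.
payload = b"".join(RECORD.pack(data, 10, 0) for data in (3, 0, 5))

aligned = parse_servo_meters(payload, stride=8)
misaligned = parse_servo_meters(payload, stride=12)
```

With a 12-byte stride the second read starts on the `dec`/`unit` bytes of the next record, so `00 0a 00 00` is taken as `data` and comes out as 655360.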
Remaining v3 work, now scoped concretely: timer struct; 1-based axis iteration for dynamic; the
`cnc_rdsvmeter` framing + a cancellation-honoring read; and the PMC (`pmc_rdpmcrng`) + Parameter
(`cnc_rdparam`) struct diffs. Identity / axes / positions / feed / spindle / program-mode already work.
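
The "cancellation-honoring read" can be sketched as follows. This is a synchronous Python analogue that uses a socket timeout in place of a cancellation token; it is not the driver's `ReadExactlyAsync`. The idea it illustrates is the one described in the Resolution: when the read is abandoned, the connection is torn down so that a stalled peer cannot wedge the caller.

```python
import socket


def read_exactly(sock: socket.socket, n: int, timeout_s: float) -> bytes:
    """Read exactly n bytes, or close the socket and re-raise on any failure."""
    sock.settimeout(timeout_s)
    buf = bytearray()
    try:
        while len(buf) < n:
            chunk = sock.recv(n - len(buf))
            if not chunk:
                raise ConnectionError("peer closed mid-PDU")
            buf += chunk
    except BaseException:
        # A partially read PDU leaves the stream out of sync, so abort the
        # connection instead of leaving it open for the next poll.
        sock.close()
        raise
    return bytes(buf)
```

Closing on failure is deliberate: after a partial read there is no reliable way to find the next PDU boundary, so the next poll should start from a fresh session.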
## Implementation analysis — what "support PDU v3" actually involves
1. **Accept v3 at the framing layer (cheap, validated). — ✅ DONE.** Replaced the `version != Version`
hard reject with the supported-set `{1, 3}`. This alone makes the **initiate handshake + all macro
reads** work on a real 30i-B (validated live; deployed `MACRO:3901`/`3902` read Good). We still emit
v1 on requests (the CNC accepts it). If a future command turns out to need the request version echoed,
thread the negotiated version from the initiate response onto the connection.
2. **Validate each command family against v3 response framing.** Capture v3 `0x21` data-PDU responses
for `cnc_rdparam`, `pmc_rdpmcrng`, `cnc_statinfo`, `cnc_rddynamic2`, `cnc_rdaxisname`, and the timer
reads (the FixedTree set), and diff the block/struct offsets vs the v1 assumptions in
`FocasWireModels.cs` / the `ParseX` helpers. Where they differ, add v3 parsing. Capture by extending
`scripts/focas/capture-initiate.py` to complete the handshake and issue one data request per command.
3. **FixedTree depends on (2).** Identity/Axes/Timers/Program nodes only emit if `cnc_sysinfo` +
`cnc_rdaxisname` + dynamic/timer reads succeed at discovery — so they come online once `cnc_statinfo`
/ `cnc_rddynamic2` / timer framing is v3-validated.
4. **Don't let the gate lie.** Shipping only step 1 makes the driver accept v3 while PMC/PARAM/FixedTree
silently misbehave. Either gate macro-only configs as "supported on v3" with the others explicitly
flagged, or land steps 1-3 together.
### Alternative considered: reinstate an FWLIB-backed client
The official FANUC FWLIB (`Fwlib64.dll`) handles all protocol versions natively. But the `fwlib`/`ipc`
backends were deliberately retired in the Wire migration (native Windows component, x86/x64 + STA, and
licensing — the exact coupling the managed client removed). Reintroducing it reverses that decision and
is heavier than completing v3 in the wire client; recommend only if multiple controls need surfaces the
managed client can't reach.
## Secondary finding — the Test-Connect / health probe is misleading without FWLIB
`FocasDriverProbe` Phase 2 (the real `cnc_allclibhndl3` FWLIB handshake) **catches the FWLIB-absent load
failure and degrades to `Ok=true` ("TCP reachability only")**. On a host with no FWLIB (the normal case
for the managed wire client), the driver therefore reports **HEALTHY off a bare TCP connect** — which is
exactly how this CNC looked "healthy" while no data flowed. The probe should exercise the wire-client
path (open a `WireFocasClient` session + one sample read) so health reflects real FOCAS reachability,
not just an open socket.
## Recommended next steps
1. Land step 1 (accept v3) + capture/validate PMC + Parameter + FixedTree command framing (step 2),
ideally in one change, tested against `10.201.31.5` while access lasts.
2. Fix the probe to use the wire client so HEALTHY means "FOCAS session + read OK," not "TCP open."
3. Add a real-hardware row to `docs/v2/focas-version-matrix.md` (currently hardware-free) recording that
30i-B = PDU v3, macro reads validated.