lmxopcua/docs/plans/2026-06-25-focas-pdu-v3-30i-b-support.md
Joseph Doherty 5f0a52864c feat(focas): real FANUC 30i/31i-B PDU-v3 support (live-validated on a 31i-B)
First real FOCAS hardware contact (Makino Pro 5 / 31i-B @ 10.201.31.5). A full
v3 data-PDU capture corrected the initial diagnosis: the v3 block envelope is
identical to v1, so only specific payload structs / request math / one client
robustness gap were wrong — not "framing rewrites".

Fixes (all re-validated live through the fixed driver):
- version gate: accept inbound PDU {1,3}, keep emitting v1 (FocasWireProtocol).
- cnc_rdtimer: 8-byte {minute,msec} payload is little-endian (ParseTimer) — the
  only decode with an in-range msec field.
- pmc_rdpmcrng: request range widened to the data-type byte width
  (end = start + width - 1) so a Word/Long isn't truncated to 0 values
  (was spurious BadOutOfRange); decode extracted to ParsePmcRange.
- cnc_rdsvmeter: per-axis LOADELM is 8 bytes (not 12) and names come from the
  0x0089 block — ParseServoMeters fixes the misaligned 655360 garbage. Also the
  "hang" was NetworkStream.ReadAsync not aborting a stalled socket: ReadExactlyAsync
  now disposes the stream on cancellation so a stalled peer can't wedge a poll loop.
- cnc_rddynamic2: contract guard rejecting axis < 1 (driver poll already 1-based).
- FocasDriverProbe: run a real wire session (initiate + cnc_statinfo) instead of
  degrading to Ok=true "TCP reachability only" when FWLIB is absent — a bare TCP
  listener no longer reports HEALTHY.

cnc_rdparam (0x000e) is unsupported on this control — EW_FUNC across 14
request-framing variants x 4 known-present params; needs a reference FWLIB trace
or is restricted. Deferred (deployed config uses macros, not parameters).

Tests: FOCAS suite 234 green (+16), full solution builds 0 errors. Raw v3
captures checked in under tests/.../Fixtures/v3/. Capture tools under scripts/focas/.

Docs: docs/plans/2026-06-25-focas-pdu-v3-{30i-b-support,implementation-plan}.md,
docs/drivers/FOCAS.md, docs/v2/focas-version-matrix.md,
docs/deployments/wonder-app-vd03-makino-z-34184.md.
2026-06-25 16:41:42 -04:00


FOCAS wire-protocol PDU v3 — finding + support analysis (Makino Pro 5 / FANUC 30i-B)

Date: 2026-06-25

Author: field investigation against a live CNC (first real FOCAS hardware contact)

Status: RESOLVED + live-validated on a real 31i-B (10.201.31.5) 2026-06-25. Version gate, timer, PMC range, servo-meter, alarms, and probe are all fixed and verified live; cnc_rdparam was found unsupported on this control (see Resolution). See 2026-06-25-focas-pdu-v3-implementation-plan.md for the per-phase record.

Components: ZB.MOM.WW.OtOpcUa.Driver.FOCAS (Wire/FocasWireProtocol.cs, Wire/FocasWireClient.cs)


TL;DR

The pure-managed WireFocasClient only implements FOCAS Ethernet wire-protocol PDU version 1 and hard-rejects every other version. A real Makino Pro 5 (FANUC 30i-B) at 10.201.31.5:8193 speaks PDU version 3, so the driver fails the session with BadCommunicationError / "Unsupported FOCAS PDU version 3" and no tag ever produces a value.

A live experiment (relaxing the version gate to accept v3) showed that the v3 initiate handshake and the macro-read data framing are already compatible: MACRO:* reads returned correct values from the real machine. PMC and Parameter reads still failed and need per-command v3 work. So v3 support is not a rewrite, but it is also not a one-line version bump if PMC/PARAM/FixedTree are required.

This was the first time the driver has ever been pointed at real FOCAS hardware — the wire implementation was written from strangesast/fwlib + public docs with an explicit "needs Wireshark traces to validate" caveat (see docs/v2/implementation/focas-wire-protocol.md). This finding is those traces.


Resolution (2026-06-25 — implemented + live-validated)

A full v3 data-PDU capture (scripts/focas/capture-v3.py, fixtures under tests/Drivers/ZB.MOM.WW.OtOpcUa.Driver.FOCAS.Tests/Fixtures/v3/) substantially corrected the initial diagnosis — 4 of the 6 "v3 framing failures" were not framing problems at all. The block envelope is byte-identical to v1; only specific payload structs / request-range math / a client-side robustness gap were wrong. Every fix below was re-validated by reading the live 31i-B through the fixed driver.

| Surface | Root cause (from the capture) | Fix | Live result |
| --- | --- | --- | --- |
| version gate | hard version != 1 reject | accept {1,3} inbound (FocasWireProtocol) | macros + all reads work on v3 |
| cnc_rdtimer | 8-byte {minute,msec} payload is little-endian (only decode with an in-range msec) | FocasWireClient.ParseTimer reads LE | cutting = 1,110,700 min / 41,872 ms |
| pmc_rdpmcrng | request asked for end=start (1 byte) but a Word needs its byte width → 0 values → spurious BadOutOfRange | WireFocasClient.ReadPmcAsync sets end = start + width - 1; decode extracted to ParsePmcRange | R0 = 7873, R100 = 0, status Good |
| cnc_rdsvmeter | (a) no wire hang: the CNC answers fully and promptly; the "hang" was NetworkStream.ReadAsync not aborting a genuinely stalled socket. (b) per-axis LOADELM is 8 bytes, not 12 → 12-byte stride misaligned (→ 655360 garbage); names live in the 0x0089 block | (a) ReadExactlyAsync dispose-on-cancel abort. (b) ParseServoMeters 8-byte stride + name correlation | 7 axes X,Y,Z,B,C,AA,AA, aligned values (≈0 idle) |
| cnc_rdalmmsg2 | not broken: empty payload = no active alarms | none (parser already handles empty) | returned active alarm #3080 WRONG PALLET IN MACHINE |
| cnc_rddynamic2 axis 0 | not a driver bug: the FixedTree poll already iterates 1..N; only a direct harness call used 0 | contract guard in ReadDynamicAsync (reject axisIndex < 1) | axes 1..N read clean |
| Test-Connect probe | degraded to Ok=true "TCP reachability only" when FWLIB absent → any TCP listener looked HEALTHY | FocasDriverProbe now runs a real wire session (initiate + cnc_statinfo) | Ok=true vs real CNC, Ok=false vs bare listener |
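
The cnc_rdtimer row says little-endian is the only decode that yields an in-range msec field. A minimal Python sketch of that selection rule follows. It assumes the 8-byte payload is two signed 32-bit integers (minute, then msec) and that "in range" means 0 <= msec < 60000; `decode_timer` is a hypothetical helper, not the driver's ParseTimer, and the payload is rebuilt from the reported live values rather than taken from a raw capture.

```python
import struct

def decode_timer(payload: bytes) -> tuple[int, int]:
    """Decode an 8-byte {minute, msec} payload, keeping the byte order
    whose msec field is plausible (0 <= msec < 60000)."""
    if len(payload) != 8:
        raise ValueError(f"expected 8 bytes, got {len(payload)}")
    for fmt in ("<ii", ">ii"):  # little-endian first, then big-endian
        minute, msec = struct.unpack(fmt, payload)
        if minute >= 0 and 0 <= msec < 60_000:
            return minute, msec
    raise ValueError("no byte order yields an in-range msec field")

# Synthetic payload rebuilt from the reported cutting timer (assumption:
# two little-endian i32 fields), not bytes from the checked-in fixtures.
payload = struct.pack("<ii", 1_110_700, 41_872)
minute, msec = decode_timer(payload)

# The same bytes read big-endian give an msec far outside 0..59999.
_, msec_big_endian = struct.unpack(">ii", payload)
```

For these values the big-endian reading of msec is negative, which is the kind of out-of-range result that rules that byte order out.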

cnc_rdparam — unsupported on this control (blocked)

The one genuine v3 problem. A live matrix of 14 request-framing variants × 4 known-present parameters (8130 / 1320 / 1825 / 3201) — every combination of arg ordering, axis, length, request-class, and extra payload (scripts/focas/param-probe.py) — returned EW_FUNC(1) uniformly. That is not a tweakable-framing bug. 0x000e is also the ignored post-connect setup command, which makes it a doubtful parameter opcode. Either parameter read is genuinely restricted on this control via the wire path, or the v3 command id differs from 0x000e and cannot be recovered without a reference FWLIB Wireshark trace (the long-blocked "Stream C.2"). Parameter support is parked on that reference; the deployed config uses macros, not parameters, so nothing live depends on it.

Open caveats

  • Servo-load magnitude/scaling (data / 10^dec; dec read as 10 → idle loads ≈ 0) is inferred from the wire and unconfirmed against the machine's servo-meter screen — confirm at commissioning.
  • Timer type→counter mapping: power-on / operating / cycle read 0 while cutting is non-zero on this control. The decode is correct; whether type 0/1/3 map to populated counters here is a CNC-configuration question for commissioning.
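
To make the servo-meter stride and scaling concrete, here is a small Python sketch. The element layout (big-endian i32 data, i16 dec, i16 unit) is an assumption inferred from this document's numbers, not a confirmed wire definition; `parse_servo_meters` is a hypothetical helper and the payload is synthetic. Axis names, which the document says come from the 0x0089 block, are not modeled.

```python
import struct

# ASSUMED 8-byte element: i32 data, i16 dec, i16 unit, big-endian.
ELEM = struct.Struct(">ihh")

def parse_servo_meters(payload: bytes, axis_count: int):
    """Walk the payload at an 8-byte stride and scale each load as data / 10^dec."""
    meters = []
    for i in range(axis_count):
        data, dec, _unit = ELEM.unpack_from(payload, i * ELEM.size)
        meters.append((data, dec, data / 10 ** dec))
    return meters

# Synthetic idle payload: three axes with data = 0, dec = 10, unit = 0.
payload = ELEM.pack(0, 10, 0) * 3
aligned = parse_servo_meters(payload, 3)

# A 12-byte stride puts the second "data" read on the dec/unit words instead.
misaligned_data = struct.unpack_from(">i", payload, 12)[0]
```

Under this assumed layout the misaligned read evaluates to 0x000A0000, which matches the 655360 garbage value described above. That is suggestive of the layout but does not confirm it, and the scaling itself still needs checking against the machine's servo-meter screen.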

Environment

  • CNC: Makino Pro 5, FANUC 30i-B control, 10.201.31.5:8193 (ZTag Z-34184). FOCAS Ethernet reachable (TCP 8193 open from both the OtOpcUa host wonder-app-vd03 and a dev laptop over VPN).
  • Driver backend: wire (the default and now only real backend — fwlib/ipc were retired in the Wire migration; see FocasDriverFactoryExtensions.BuildClientFactory). FANUC FWLIB is NOT installed on the host, and is not used for reads.
  • Deployment using this CNC: see docs/deployments/wonder-app-vd03-makino-z-34184.md.

Symptom

Equipment tags backed by this device never leave OPC UA Bad_WaitingForInitialData (0x80320000), and the FOCAS FixedTree emits no nodes (capability detection against the CNC never succeeds).

Reproduction (driver CLI, straight to the CNC — no OPC UA server involved)

dotnet run --project src/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.FOCAS.Cli -- \
  probe -h 10.201.31.5 -s Thirty_i --timeout-ms 6000 --verbose
CNC:        10.201.31.5:8193
Series:     Thirty_i
Health:     Degraded
Last error: Unsupported FOCAS PDU version 3.
R100 →      0x80050000 (BadCommunicationError)

Root cause (exact)

src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.FOCAS/Wire/FocasWireProtocol.cs

  • Line 22: public const ushort Version = 1;
  • ReadPduAsync (~line 102) and ReadPdu (~line 125): if (version != Version) throw new FocasWireException($"Unsupported FOCAS PDU version {version}.");

The 10-byte PDU header is A0 A0 A0 A0 magic + u16 version + type byte + direction byte + u16 body length. The client emits version 1 and rejects any response whose version field != 1. The 30i-B answers the initiate with version 3.
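
The header layout described above can be sketched in Python with `struct`. The field order and big-endian u16 fields follow this document's description and its captured initiate bytes; `parse_header` and the dictionary keys are hypothetical names, and the sketch performs no version check.

```python
import struct

MAGIC = b"\xa0\xa0\xa0\xa0"
# magic (4 bytes), u16 version, type byte, direction byte, u16 body length
HEADER = struct.Struct(">4sHBBH")

def parse_header(raw: bytes) -> dict:
    """Split the first 10 bytes of a PDU into its header fields."""
    magic, version, pdu_type, direction, body_len = HEADER.unpack_from(raw)
    if magic != MAGIC:
        raise ValueError("bad FOCAS magic")
    return {
        "version": version,
        "type": pdu_type,
        "direction": direction,
        "body_len": body_len,
    }

# Bytes taken from the initiate handshake recorded in this document.
request = bytes.fromhex("a0a0a0a0 0001 01 01 0002 0001")
response_header = bytes.fromhex("a0a0a0a0 0003 01 02 0168")

req = parse_header(request)
resp = parse_header(response_header)
```

Parsing both captures shows the only header difference that matters here: the request carries version 1 and the 30i-B response carries version 3 with a 360-byte body.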

What the wire actually shows (captured live, read-only initiate handshake)

Request the driver sends (socket-1 initiate, exactly as today):

a0 a0 a0 a0 00 01 01 01 00 02 00 01
^magic------ ^v=1  ^t ^d ^len  ^sockIdx=1

Response from the real 30i-B:

a0 a0 a0 a0 00 03 01 02 01 68  + 360-byte body
^magic------ ^v=3  ^t ^d ^len=0x0168(360)
body[0..]: 00 08 00 05 00 03 00 20 00 08 00 0a 00 03 00 03 00 0f 00 0f 00 01 00 02 00 01 00 00 00 01 02 00 ...

Key observations:

  • The 10-byte header framing is byte-identical to v1 — same magic, same type=0x01 (initiate), same direction=0x02 (response). Only the version field differs (3 vs 1).
  • The initiate-response body is 360 bytes (v3 carries a larger capability/version descriptor block; it parses as a run of big-endian u16 words). The current client doesn't deeply parse the initiate body, so its size/shape did not block the handshake once the version gate was relaxed.
  • No version negotiation: sending version 3 in our request header produced the identical response — the CNC speaks v3 unconditionally.

(Capture is reproducible with scripts/focas/capture-initiate.py <host>.)
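
As an illustration of "parses as a run of big-endian u16 words", the following Python sketch decodes the first 32 body bytes shown in the capture. `be_u16_words` is a hypothetical helper; the document does not assign meanings to the individual words, so none are assigned here.

```python
import struct

# First 32 bytes of the 360-byte initiate-response body, as captured above.
body_prefix = bytes.fromhex(
    "0008 0005 0003 0020 0008 000a 0003 0003 "
    "000f 000f 0001 0002 0001 0000 0001 0200"
)

def be_u16_words(data: bytes) -> list[int]:
    """Interpret data as consecutive big-endian unsigned 16-bit words.
    A trailing odd byte, if any, is ignored."""
    count = len(data) // 2
    return list(struct.unpack(f">{count}H", data[: count * 2]))

words = be_u16_words(body_prefix)
```

The same helper can be pointed at the full 360-byte body from a capture to diff descriptor blocks between controls.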

Implemented + validated — accept v3 on inbound PDUs

Shipped change (Wire/FocasWireProtocol.cs): the hard version != Version reject is replaced by a supported-read-version set {1, 3} (SupportedReadVersions + IsSupportedReadVersion). We still emit Version (v1) on requests — the 30i-B accepts v1 request framing — and now accept v1 or v3 on inbound PDUs. Covered by FocasWireProtocolTests.ReadPduAsync_accepts_supported_version (v1 + v3 theory) with the existing v99-rejection test still green; full FOCAS unit suite 218/218.
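
A rough Python analogue of the gate described above, to show the asymmetry between what is emitted and what is accepted. The names mirror the ones in this section but are illustrative only; the shipped implementation is the C# in Wire/FocasWireProtocol.cs.

```python
EMIT_VERSION = 1                              # requests still go out as v1
SUPPORTED_READ_VERSIONS = frozenset({1, 3})   # inbound PDUs may be v1 or v3

class FocasWireError(Exception):
    """Raised when an inbound PDU cannot be accepted (hypothetical type)."""

def is_supported_read_version(version: int) -> bool:
    return version in SUPPORTED_READ_VERSIONS

def check_inbound_version(version: int) -> int:
    """Return the version if acceptable, otherwise raise with the driver's message text."""
    if not is_supported_read_version(version):
        raise FocasWireError(f"Unsupported FOCAS PDU version {version}.")
    return version

accepted = [check_inbound_version(v) for v in (1, 3)]
try:
    check_inbound_version(99)
    rejected_99 = False
except FocasWireError:
    rejected_99 = True
```

Keeping the accepted set explicit, rather than accepting any version, preserves the existing v99 rejection behaviour.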

Validated live against 10.201.31.5 (30i-B) with the change in source — reading several addresses:

| Address | Type | Result |
| --- | --- | --- |
| MACRO:500 | Float64 | 0.02, Good (0x0) |
| MACRO:3901 (parts total) | Float64 | 0, Good (0x0) |
| MACRO:3902 (parts required) | Float64 | Good |
| R100 (PMC R-file) | Int16 | 0x803C0000 BadOutOfRange |
| PARAM:1320/0 | Int32 | 0x803D0000 BadNotSupported |

Interpretation:

  • Initiate handshake + macro command (cnc_rdmacro, cmd path) data framing are already v3-compatible. Macro reads returned correct, plausible values from the real machine. The deployed equipment tags (MACRO:3901/3902) would go Good with nothing more than the version-gate relaxation.
  • PMC (pmc_rdpmcrng) and Parameter (cnc_rdparam) reads still fail. Two candidate causes, not yet separated: (a) the v3 response block/struct framing for these specific commands differs from the v1-shaped parser (so the return-code/value lands at the wrong offset → spurious BadOutOfRange / BadNotSupported); or (b) genuine CNC restrictions (PMC path/range, parameter not present). Macro working argues the envelope is fine, so this is per-command struct work, not a framing rewrite.
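
For the PMC case, the Resolution section traces the BadOutOfRange to request-range math rather than response framing. A minimal Python sketch of that arithmetic follows; `pmc_request_range`, `values_in_range`, and the width table are hypothetical, with widths assumed as Byte = 1, Word = 2, Long = 4.

```python
# Assumed byte widths per PMC data type.
WIDTH = {"Byte": 1, "Word": 2, "Long": 4}

def pmc_request_range(start: int, data_type: str) -> tuple[int, int]:
    """Inclusive (start, end) byte range for one value: end = start + width - 1."""
    return start, start + WIDTH[data_type] - 1

def values_in_range(start: int, end: int, width: int) -> int:
    """How many whole values of the given width fit in an inclusive byte range."""
    return (end - start + 1) // width

# Old behaviour: end == start covers one byte, so a Word yields zero values.
old_word_values = values_in_range(100, 100, WIDTH["Word"])

# Fixed behaviour: the range is widened to the data-type width.
word_range = pmc_request_range(100, "Word")
long_range = pmc_request_range(100, "Long")
new_word_values = values_in_range(*word_range, WIDTH["Word"])
```

With the widened range a Word at R100 spans bytes 100..101 and decodes to exactly one value, which is consistent with R100 reading Good after the fix.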

Status-command validation on v3 (the FixedTree surface) — 2026-06-25

Drove every IFocasClient status call directly against the live control (v3 accepted). Most of the FixedTree lights up on v3. Note: sysinfo reveals the control is actually a 31i (CncType=31, Series G431, MaxAxis 32, 7 axes, MtType MM) — the deployment declared Thirty_i; same family, reads fine.

| Call (FOCAS fn) | Result on v3 |
| --- | --- |
| GetSysInfoAsync (cnc_sysinfo) → Identity | real: CncType 31, Series G431, 7 axes, MtType MM |
| GetAxisNamesAsync (cnc_rdaxisname) | real: X,Y,Z,B,C,A,A |
| GetSpindleNamesAsync (cnc_rdspdlname) | real: S1 |
| GetProgramInfoAsync (program/mode) | real: //CNC_MEM/USER/LIBRARY/O1111, Mode 1 |
| ReadDynamicAsync(n) (cnc_rddynamic2) | real for axes 1..N (feed 4200, spindle ~15000, live positions). axis 0 → EW_4: the call is 1-based; FixedTree must iterate 1..N, not 0 |
| GetTimerAsync(*) | ⚠️ misparsed: a running machine shows PowerOn/Operating/Cycle = 0 and Cutting = garbage; the v3 timer struct differs |
| GetServoLoadsAsync (cnc_rdsvmeter) | hangs: blocks awaiting bytes that never arrive (v3 framing differs) and ignores the cancellation token (poll-loop-stalling robustness bug; the read must honor CT regardless) |
| ReadAlarmsAsync (cnc_rdalmmsg2) | untested: ServoLoads hung ahead of it; validate once the hang is fixed |

Remaining v3 work, now scoped concretely: timer struct; 1-based axis iteration for dynamic; the cnc_rdsvmeter framing + a cancellation-honoring read; and the PMC (pmc_rdpmcrng) + Parameter (cnc_rdparam) struct diffs. Identity / axes / positions / feed / spindle / program-mode already work.
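
The "cancellation-honoring read" can be illustrated in Python with a blocking socket and an overall deadline. This is an analogue of the idea only (abort and close the connection when a peer stalls), not the driver's ReadExactlyAsync; `read_exactly` is a hypothetical helper and the stalled peer is simulated with `socket.socketpair()`.

```python
import socket
import time

def read_exactly(sock: socket.socket, n: int, timeout_s: float) -> bytes:
    """Read exactly n bytes within timeout_s overall; on a stall or an early
    close, close the socket so the caller cannot keep waiting on a dead link."""
    deadline = time.monotonic() + timeout_s
    buf = bytearray()
    try:
        while len(buf) < n:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                raise socket.timeout("read deadline exceeded")
            sock.settimeout(remaining)
            chunk = sock.recv(n - len(buf))
            if not chunk:
                raise ConnectionError("peer closed before the full read completed")
            buf += chunk
    except OSError:  # covers socket.timeout and ConnectionError
        sock.close()
        raise
    return bytes(buf)

client, peer = socket.socketpair()
peer.sendall(b"\xa0\xa0")
got = read_exactly(client, 2, timeout_s=0.5)   # data is available: returns promptly

try:
    read_exactly(client, 10, timeout_s=0.1)    # peer sends nothing more: stalls
    stalled = False
except socket.timeout:
    stalled = True
client_closed = client.fileno() == -1
peer.close()
```

Closing the socket on abort is the design choice that matters: the next poll has to reconnect instead of blocking again on the same stalled stream.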

Implementation analysis — what "support PDU v3" actually involves

  1. Accept v3 at the framing layer (cheap, validated). — DONE. Replaced the version != Version hard reject with the supported-set {1, 3}. This alone makes the initiate handshake + all macro reads work on a real 30i-B (validated live; deployed MACRO:3901/3902 read Good). We still emit v1 on requests (the CNC accepts it). If a future command turns out to need the request version echoed, thread the negotiated version from the initiate response onto the connection.
  2. Validate each command family against v3 response framing. Capture v3 0x21 data-PDU responses for cnc_rdparam, pmc_rdpmcrng, cnc_statinfo, cnc_rddynamic2, cnc_rdaxisname, and the timer reads (the FixedTree set), and diff the block/struct offsets vs the v1 assumptions in FocasWireModels.cs / the ParseX helpers. Where they differ, add v3 parsing. Capture by extending scripts/focas/capture-initiate.py to complete the handshake and issue one data request per command.
  3. FixedTree depends on (2). Identity/Axes/Timers/Program nodes only emit if cnc_sysinfo + cnc_rdaxisname + dynamic/timer reads succeed at discovery — so they come online once cnc_statinfo / cnc_rddynamic2 / timer framing is v3-validated.
  4. Don't let the gate lie. Shipping only step 1 makes the driver accept v3 while PMC/PARAM/FixedTree silently misbehave. Either gate macro-only configs as "supported on v3" with the others explicitly flagged, or land steps 1-3 together.

Alternative considered: reinstate an FWLIB-backed client

The official FANUC FWLIB (Fwlib64.dll) handles all protocol versions natively. But the fwlib/ipc backends were deliberately retired in the Wire migration (native Windows component, x86/x64 + STA, and licensing — the exact coupling the managed client removed). Reintroducing it reverses that decision and is heavier than completing v3 in the wire client; recommend only if multiple controls need surfaces the managed client can't reach.

Secondary finding — the Test-Connect / health probe is misleading without FWLIB

FocasDriverProbe Phase 2 (the real cnc_allclibhndl3 FWLIB handshake) catches the FWLIB-absent load failure and degrades to Ok=true ("TCP reachability only"). On a host with no FWLIB (the normal case for the managed wire client), the driver therefore reports HEALTHY off a bare TCP connect — which is exactly how this CNC looked "healthy" while no data flowed. The probe should exercise the wire-client path (open a WireFocasClient session + one sample read) so health reflects real FOCAS reachability, not just an open socket.
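
The difference between "TCP open" and "FOCAS session OK" can be sketched in Python as follows. This covers only the initiate step, using the request bytes and header layout recorded in this document; the cnc_statinfo read that the fixed probe also performs is omitted because its request bytes are not given here. `probe_wire_session` is a hypothetical function, and the CNC and the bare listener are both simulated with `socket.socketpair()`.

```python
import socket
import struct

MAGIC = b"\xa0\xa0\xa0\xa0"
INITIATE_REQUEST = bytes.fromhex("a0a0a0a0 0001 01 01 0002 0001")

def probe_wire_session(sock: socket.socket, timeout_s: float = 1.0) -> bool:
    """True only if the peer answers the initiate with a well-formed response header."""
    sock.settimeout(timeout_s)
    try:
        sock.sendall(INITIATE_REQUEST)
        header = b""
        while len(header) < 10:
            chunk = sock.recv(10 - len(header))
            if not chunk:
                return False            # peer closed without answering
            header += chunk
    except OSError:
        return False                    # timeout or socket error: not healthy
    magic, version, pdu_type, direction, _body_len = struct.unpack(">4sHBBH", header)
    return magic == MAGIC and version in (1, 3) and pdu_type == 0x01 and direction == 0x02

# Simulated bare listener: accepts the connection but never replies.
probe_a, silent_peer = socket.socketpair()
bare_listener_ok = probe_wire_session(probe_a, timeout_s=0.1)

# Simulated CNC: replies with the captured v3 initiate-response header.
probe_b, cnc_peer = socket.socketpair()
cnc_peer.sendall(bytes.fromhex("a0a0a0a0 0003 01 02 0168"))
cnc_ok = probe_wire_session(probe_b, timeout_s=0.5)

for s in (probe_a, silent_peer, probe_b, cnc_peer):
    s.close()
```

A bare listener passes a plain TCP connect but fails this check, which is the distinction the health status needs to carry.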

  1. Land step 1 (accept v3) + capture/validate PMC + Parameter + FixedTree command framing (step 2), ideally in one change, tested against 10.201.31.5 while access lasts.
  2. Fix the probe to use the wire client so HEALTHY means "FOCAS session + read OK," not "TCP open."
  3. Add a real-hardware row to docs/v2/focas-version-matrix.md (currently hardware-free) recording that 30i-B = PDU v3, macro reads validated.