Files
lmxopcua/code-reviews/Driver.FOCAS/findings.md
T
Joseph Doherty e07a4fbf52 review(Driver.FOCAS): add byte-level wire-protocol test coverage
Re-review at 7286d320. -013 (Medium, testing): the managed FOCAS/2 wire-decode layer
(BuildPdu/ParseResponseBlocks, incl. cnc_getfigure stride) had zero byte-level tests; added
15 (no decode bug found). -014 (spindle-load truncation heuristic) deferred bench-gated.
Note: runtime read path is now pure-managed TCP (no P/Invoke except the probe handshake).
2026-06-19 11:47:11 -04:00

26 KiB

Code Review — Driver.FOCAS

Field Value
Module src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.FOCAS
Reviewer Claude Code
Review date 2026-06-19
Commit reviewed 04e0877b (re-review; prior 76d35d1)
Status Reviewed
Open findings 0

Checklist coverage

A comprehensive review completes every category, recording "No issues found" where a category produced nothing rather than leaving it blank.

# Category Result
1 Correctness & logic bugs Driver.FOCAS-001, Driver.FOCAS-002, Driver.FOCAS-003
2 OtOpcUa conventions Driver.FOCAS-004
3 Concurrency & thread safety Driver.FOCAS-005
4 Error handling & resilience Driver.FOCAS-006, Driver.FOCAS-007
5 Security No issues found
6 Performance & resource management Driver.FOCAS-008
7 Design-document adherence Driver.FOCAS-009
8 Code organization & conventions Driver.FOCAS-010, Driver.FOCAS-011
9 Testing coverage Driver.FOCAS-012
10 Documentation & comments No issues found

Findings

Driver.FOCAS-001

Field Value
Severity High
Category Correctness & logic bugs
Location FocasDriverFactoryExtensions.cs:54-86, FocasDriverFactoryExtensions.cs:132-140
Status Resolved

Description: FocasDriverConfigDto exposes only Backend, Series, TimeoutMs, Devices, Tags, and Probe. It has no FixedTree, AlarmProjection, or HandleRecycle properties, and CreateInstance never sets those three options on FocasDriverOptions. As a result, a deployment that follows the documented config - docs/drivers/FOCAS.md shows "FixedTree": { "Enabled": true }, "AlarmProjection": { "Enabled": true }, and "HandleRecycle": { "Enabled": true } inside Config - is parsed with PropertyNameCaseInsensitive and the unknown sections are discarded. The features stay at their hard-coded defaults (all Enabled = false). The fixed-node tree never appears, alarm subscriptions throw NotSupportedException ("FOCAS alarm projection is disabled"), and handle recycling never runs - despite the operator explicitly opting in.

Recommendation: Add FixedTree, AlarmProjection, and HandleRecycle DTO classes to FocasDriverConfigDto, parse their TimeSpan/bool fields, and populate the corresponding FocasDriverOptions properties in CreateInstance. Consider enabling strict JSON handling (UnmappedMemberHandling.Disallow) so future unknown config sections fail loudly instead of being dropped.

Resolution: Resolved 2026-05-22 — added FixedTreeDto/AlarmProjectionDto/HandleRecycleDto to FocasDriverConfigDto and Build* mappers in CreateInstance that populate the matching FocasDriverOptions properties (missing section / field keeps its default).

Driver.FOCAS-002

Field Value
Severity High
Category Correctness & logic bugs
Location WireFocasClient.cs:164-179, FocasDriver.cs:513, FocasDriver.cs:593
Status Resolved

Description: The fixed-tree bootstrap probes the ProgramInfo capability via SafeTryProbe(() => client.GetProgramInfoAsync(ct)) and treats a non-null result as "supported". But WireFocasClient.GetProgramInfoAsync never throws on a FOCAS error return code: ReadExecutingProgramNameAsync, ReadBlockCountAsync, and ReadOperationModeCodeAsync all return FocasResult<T> envelopes, and the method substitutes defaults (string.Empty, 0) when IsOk is false instead of throwing. It only throws from RequireConnected(). Consequently GetProgramInfoAsync always returns a non-null FocasProgramInfo, so Capabilities.ProgramInfo is set true even on a CNC series that returns EW_FUNC/EW_NOOPT for cnc_exeprgname2/cnc_rdopmode. The driver then emits the Program/ and OperationMode/ subtrees and polls them every tick against a controller that does not support them - the exact "nodes that only ever return BadDeviceFailure" outcome the capability suppression was designed to prevent (docs/drivers/FOCAS.md, "Per-series node suppression").

Recommendation: Make GetProgramInfoAsync throw (or return a nullable result) when the underlying cnc_exeprgname2 / cnc_rdopmode calls report a non-zero RC, so SafeTryProbe can correctly classify the series. At minimum require the program-name or op-mode read to be IsOk before declaring the capability present.

Resolution: Resolved 2026-05-22 — WireFocasClient.GetProgramInfoAsync now throws InvalidOperationException when neither the cnc_exeprgname2 nor the cnc_rdopmode read is IsOk, so SafeTryProbe records ProgramInfo as unsupported on series that answer EW_FUNC/EW_NOOPT.

Driver.FOCAS-003

Field Value
Severity Medium
Category Correctness & logic bugs
Location FocasDriver.cs:71-79
Status Resolved

Description: In InitializeAsync, capability-matrix validation only runs when _devices.TryGetValue(tag.DeviceHostAddress, out var device) succeeds. A tag whose DeviceHostAddress does not match any configured device (a common config typo, e.g. a trailing :8193 mismatch or a wrong host) silently skips validation and is still added to _tagsByName. The mistake is not surfaced at load time - it only manifests at read time as BadNodeIdUnknown (ReadAsync lines 191-194), defeating the documented goal that "config errors now fail at load instead of per-read" (docs/v2/focas-version-matrix.md).

Recommendation: After parsing the tag address, if _devices does not contain tag.DeviceHostAddress, throw an InvalidOperationException naming the tag and the unresolved device host so the operator fixes the typo at startup.

Resolution: Resolved 2026-05-22 — InitializeAsync now throws InvalidOperationException naming the tag and the unresolved device when _devices does not contain tag.DeviceHostAddress, preventing silent skip-and-defer to per-read BadNodeIdUnknown.

Driver.FOCAS-004

Field Value
Severity Medium
Category OtOpcUa conventions
Location FocasDriver.cs:374-379, WireFocasClient.cs:48-50
Status Resolved

Description: DiscoverAsync emits user tags with SecurityClass = tag.Writable ? SecurityClassification.Operate : SecurityClassification.ViewOnly, and FocasTagDefinition.Writable defaults to true (also defaulted to true in the factory - t.Writable ?? true). But the production wire backend's WireFocasClient.WriteAsync unconditionally returns FocasStatusMapper.BadNotWritable

  • the driver is read-only against FOCAS by design (docs/drivers/FOCAS.md). The result is that every tag is advertised in the address space as a writable Operate node, yet every write attempt fails. This is misleading to OPC UA clients and to the DriverNodeManager ACL layer, which will grant write permission on nodes that can never be written.

Recommendation: Either default Writable to false for the FOCAS driver, or have DiscoverAsync force SecurityClassification.ViewOnly when the active backend cannot write. Given the wire backend is read-only and is the only production backend, treating all FOCAS tags as ViewOnly is the simplest correct behaviour.

Resolution: Resolved 2026-05-22 — DiscoverAsync now unconditionally emits SecurityClassification.ViewOnly for all user-authored tags; the Writable config field no longer influences the advertised security class since the wire backend never writes.

Driver.FOCAS-005

Field Value
Severity Medium
Category Concurrency & thread safety
Location FocasDriver.cs:28, FocasDriver.cs:206-215, FocasDriver.cs:261, FocasDriver.cs:274
Status Resolved

Description: _health is a plain (non-volatile) field mutated from multiple concurrent contexts - ReadAsync, WriteAsync, and the per-device ProbeLoopAsync can all run on different threads simultaneously (subscriptions go through PollGroupEngine timers; probe loops are Task.Run). Several updates are read-modify-write - e.g. new DriverHealth(DriverState.Degraded, _health.LastSuccessfulRead, ...) reads _health then writes a new instance - so a concurrent update can be lost or a stale LastSuccessfulRead propagated. While DriverHealth is an immutable record and the reference write is atomic, the lack of synchronization means GetHealth() can observe torn-in-time state and successful-read timestamps can regress.

Recommendation: Guard _health reads/writes with a lock, or use Interlocked.Exchange/Volatile around the whole record reference and compute the new value from a single captured snapshot. The DeviceState/HostState transition already uses ProbeLock; apply the same discipline to driver health.

Resolution: Resolved 2026-05-22 — All _health reads use Volatile.Read(ref _health) and all writes use Volatile.Write(ref _health, ...), ensuring every thread observes the latest reference and multi-step read-modify-write sequences capture a stable snapshot before computing the new value.

Driver.FOCAS-006

Field Value
Severity Medium
Category Error handling & resilience
Location FocasDriver.cs:859-874, WireFocasClient.cs:22-31
Status Resolved

Description: EnsureConnectedAsync reuses the cached IFocasClient instance across a transient disconnect: it only checks device.Client is { IsConnected: true } and otherwise calls ConnectAsync again on the same object. For a WireFocasClient whose underlying FocasWireClient has been disposed (e.g. via a HandleRecycle / DisposeClient race, or a prior teardown), every subsequent call hits FocasWireClient.ThrowIfDisposed and throws ObjectDisposedException. In ReadAsync that exception is caught only by the generic catch (Exception ex) and mapped to a permanent BadCommunicationError - the device stays wedged with no recovery path until ReinitializeAsync is invoked, because the reconnect logic never discards the disposed client.

Recommendation: On any connect/use failure, treat a disposed or non-connected client as unrecoverable and recreate it from _clientFactory. Simplest: in EnsureConnectedAsync, when device.Client is non-null but not connected, dispose and null it before creating a fresh instance, rather than retrying ConnectAsync on the stale object.

Resolution: Resolved 2026-05-22 — EnsureConnectedAsync now unconditionally disposes and nulls any existing non-connected client before calling _clientFactory.Create(), preventing ObjectDisposedException loops on a stale WireFocasClient after a HandleRecycle race or prior teardown.

Driver.FOCAS-007

Field Value
Severity Low
Category Error handling & resilience
Location FocasDriver.cs:140-148, FocasDriver.cs:478-484, FocasDriver.cs:529-533, FocasAlarmProjection.cs:61-63
Status Resolved

Description: Numerous try { ... } catch {} blocks swallow every exception with no logging - ShutdownAsync (CTS cancel/dispose), RecycleLoopAsync (DisposeClient), FixedTreeLoopAsync transient catches, ProbeLoopAsync, and the alarm projection's sub.Cts.Cancel(). The driver takes no ILogger dependency at all (only FocasWireClient optionally accepts one, and the driver never supplies it). A CNC that is silently failing every probe/poll tick produces no diagnostic trail, which conflicts with the project's Serilog logging convention and forces field troubleshooting to rely solely on GetHealth().

Recommendation: Inject an ILogger<FocasDriver> and log caught exceptions in the poll/probe/recycle loops at Debug/Warning. Pass a logger into FocasWireClient so the per-response Debug entries it already emits are actually captured.

Resolution: Resolved 2026-05-23 — FocasDriver now takes an optional ILogger<FocasDriver> (defaulting to NullLogger) and every previously-empty catch { } in ShutdownAsync / ProbeLoopAsync / FixedTreeLoopAsync / RecycleLoopAsync / ReadActiveAlarmsAcrossDevicesAsync now logs at Debug with the host address + context. FocasAlarmProjection also accepts an optional ILogger (forwarded by the driver) so its unsubscribe / dispose / per-tick poll swallows log. WireFocasClientFactory gained a logger-accepting overload that threads through to FocasWireClient, so its per-response Debug entries actually reach the host pipeline.

Driver.FOCAS-008

Field Value
Severity Low
Category Performance & resource management
Location FocasDriver.cs:201, FocasDriver.cs:253
Status Resolved

Description: ReadAsync and WriteAsync call FocasAddress.TryParse(def.Address) on every operation, even though InitializeAsync already parsed and validated every tag address. On a subscription hot path (each poll tick re-enters ReadAsync) this re-parses and allocates a FocasAddress record per tag per tick unnecessarily.

Recommendation: Parse each tag address once at InitializeAsync and store the parsed FocasAddress on FocasTagDefinition (or in a side dictionary), so the runtime read/write paths use the cached value.

Resolution: Resolved 2026-05-23 — FocasDriver now holds a _parsedAddressesByTagName side dictionary populated at InitializeAsync. ReadAsync and WriteAsync look up the cached FocasAddress instance; the defensive fallback TryParse only fires if a tag was somehow not seeded. The cache is cleared on ShutdownAsync. Regression test ReadAsync_uses_cached_FocasAddress_when_tag_definition_has_a_malformed_address_after_init (and the matching WriteAsync variant) asserts the same FocasAddress instance is reused across calls.

Driver.FOCAS-009

Field Value
Severity Low
Category Design-document adherence
Location FocasDriverOptions.cs:110-115, FocasDriver.cs:468-486, FocasDriverFactoryExtensions.cs:75-80
Status Resolved

Description: FocasProbeOptions.Timeout is parsed by the factory (FocasProbeDto.TimeoutMs to FocasProbeOptions.Timeout) but never consumed. ProbeLoopAsync calls client.ProbeAsync(ct) with only the probe-loop cancellation token; no per-probe timeout is applied, and EnsureConnectedAsync uses _options.Timeout rather than Probe.Timeout. A hung CNC socket during a probe blocks until the OS TCP timeout rather than the configured Probe.Timeout.

Recommendation: Apply Probe.Timeout as a linked CancellationTokenSource timeout around the ProbeAsync call, or remove the dead Timeout field from FocasProbeOptions / FocasProbeDto if it is genuinely not intended.

Resolution: Resolved 2026-05-23 — FocasDriver.ProbeLoopAsync now wraps client.ProbeAsync in a linked CancellationTokenSource that fires after Probe.Timeout (skipped when the timeout is <= TimeSpan.Zero). On timeout the loop logs the cancellation at Debug and surfaces it as a failed probe, so a hung CNC socket transitions the host to Stopped at the configured budget instead of blocking on the OS TCP timeout. Regression test ProbeLoop_cancels_a_slow_ProbeAsync_at_Probe_Timeout asserts the cancellation reaches the fake ProbeAsync within the configured 100 ms.

Driver.FOCAS-010

Field Value
Severity Low
Category Code organization & conventions
Location IFocasClient.cs:210-227 (FocasOpMode), FocasConstants.cs:42-78 (FocasOperationMode)
Status Resolved

Description: There are two parallel operation-mode-to-text mappings with divergent labels. FocasOpMode.ToText (used by the driver fixed-tree OperationMode/ModeText node) yields "TJOG", "TEACH_IN_HANDLE"; FocasOperationModeExtensions.ToText (in the Wire layer) yields "T-JOG", "TEACH-IN-HANDLE". They also use different fallback formats (Mode{mode} vs the bare number). The same concept is encoded twice with inconsistent results depending on which path renders it.

Recommendation: Consolidate to a single op-mode enum + ToText helper shared by both the wire layer and the driver projection, with one canonical label set.

Resolution: Resolved 2026-05-23 — FocasOperationModeExtensions.ToText now delegates to FocasOpMode.ToText((short)mode), so the wire layer and the driver fixed-tree projection render identical labels. FocasOpMode keeps its existing labels (TJOG, TEACH_IN_HANDLE, Mode{n} fallback), which are now the single canonical surface. Regression theory OpMode_ToText_yields_the_same_label_in_both_namespaces cross-checks every defined code; OpMode_ToText_fallback_label_is_consistent covers the unknown-code path.

Driver.FOCAS-011

Field Value
Severity Low
Category Code organization & conventions
Location IFocasClient.cs:275-287 (FocasAlarmType), FocasAlarmProjection.cs:149-175
Status Resolved

Description: FocasAlarmType declares its constants as public const int, but the only consumers - FocasAlarmProjection.MapAlarmType(short type) and MapSeverity(short type) - take a short and switch against these int constants. It compiles only because the values (0..13) fit in short range as constant expressions. The type mismatch is a latent maintenance hazard: adding a constant above short.MaxValue, or changing the projection signatures, would break the switch in non-obvious ways. FocasAlarmType.All is -1 and is also passed where a short is expected by ReadAlarmsAsync.

Recommendation: Declare the FocasAlarmType constants as short (or make it an enum : short) so the type matches the wire field width and the projection signatures.

Resolution: Resolved 2026-05-23 — every FocasAlarmType constant (All, Parameter, PulseCode, Overtravel, Overheat, Servo, DataIo, MemoryCheck, MacroAlarm) is now typed short, matching the wire field width on cnc_rdalmmsg2 and the switch (short type) arms in FocasAlarmProjection.MapAlarmType / MapSeverity. Regression test FocasAlarmType_constants_are_typed_short uses reflection to guarantee the type is preserved against future drift.

Driver.FOCAS-012

Field Value
Severity Medium
Category Testing coverage
Location FocasDriverFactoryExtensions.cs, FocasDriver.cs:495-629 (FixedTreeLoopAsync)
Status Resolved

Description: The unit test project does not exercise FocasDriverFactoryExtensions.CreateInstance with FixedTree / AlarmProjection / HandleRecycle config sections - which is why the config-mapping gap in Driver.FOCAS-001 was not caught. There is also no test that drives the fixed-tree bootstrap / capability-probe path (FixedTreeLoopAsync), so the false-positive ProgramInfo capability in Driver.FOCAS-002 is untested, and the EnsureConnectedAsync reconnect-after-disconnect path (Driver.FOCAS-006) has no coverage.

Recommendation: Add factory tests that round-trip a full JSON config including the three opt-in sections and assert the options reach the driver; add a FakeFocasClient-driven test for fixed-tree bootstrap capability classification (including the unsupported-program-info case); add a reconnect test that disposes the fake client mid-session and asserts recovery.

Resolution: Resolved 2026-05-22 — Added FocasDriverMediumFindingsTests.cs covering: unknown-DeviceHostAddress init throw (003), ViewOnly enforcement for all tags (004), Volatile _health under concurrent reads (005), reconnect-after-external-dispose recovery (006), and a factory full-round-trip test for all three opt-in config sections (012).

Re-review 2026-06-19 (commit 04e0877b)

Re-review of the FOCAS driver at HEAD (04e0877b, which descends from 7286d320). Since the prior review at 76d35d1 the driver gained ~1,200 lines, most notably the pure-managed FOCAS/2 Ethernet wire backend (Wire/FocasWireClient.cs, Wire/FocasWireProtocol.cs, Wire/WireFocasClient.cs, Wire/FocasConstants.cs) and the cnc_getfigure per-axis position auto-scale path (ReadPositionFiguresAsync + AxisFactor). NB: there is no P/Invoke in the runtime read path — the wire backend speaks the Fanuc binary protocol directly over TCP. The only P/Invoke is the cnc_allclibhndl3 / cnc_freelibhndl Test-Connect handshake in FocasDriverProbe.cs (marshalling reviewed below).

All 12 prior findings remain Resolved. Two new findings recorded below.

# Category Result
1 Correctness & logic bugs Driver.FOCAS-014 (deferred) — otherwise no issues found
2 OtOpcUa conventions No issues found
3 Concurrency & thread safety No new issues found (see note)
4 Error handling & resilience No issues found
5 Security No issues found
6 Performance & resource management No issues found
7 Design-document adherence No issues found
8 Code organization & conventions No issues found
9 Testing coverage Driver.FOCAS-013
10 Documentation & comments No issues found

Interop / marshalling note (P/Invoke probe). FocasDriverProbe.NativeFwlib declares cnc_allclibhndl3([MarshalAs(LPStr)] string ipaddr, ushort port, int timeout, out ushort handle) with CallingConvention.Cdecl + CharSet.Ansi. This matches the published FWLIB signature (short cnc_allclibhndl3(const char*, unsigned short, long, unsigned short*)). The out ushort handle is freed best-effort in a finally even on a timeout race, and the whole call is wrapped so a DllNotFoundException / load failure degrades to TCP-only — sound. The one residual subtlety (FWLIB long is 32-bit on Win32 → int is correct on the typical x86 worker; on an LP64 Linux libfwlib32.so the C long is 64-bit and int would mis-marshal) is bench-gated (no FWLIB on this dev/CI host) and is recorded as context only, not a separate finding, because the project ships no Linux-FWLIB path today.

Concurrency note (not a new finding). FocasDriver.DeviceState.Client is a plain auto-property read/written without a lock from EnsureConnectedAsync (Read/Write/probe/ fixed-tree threads) and RecycleLoopAsync/DisposeClient. A HandleRecycle-enabled config can therefore race a dispose against an in-flight read's captured client reference. This is the same lifecycle class as the already-Resolved Driver.FOCAS-006; HandleRecycle is disabled by default and the wire client's own _lifetimeGate/_requestGate plus the "reconnect on next call" recovery make the window self-correcting. Left as-is (no new finding) — hardening it to a per-device lock is a larger change with no offline-observable defect.

Driver.FOCAS-013

Field Value
Severity Medium
Category Testing coverage
Location Wire/FocasWireProtocol.cs, Wire/FocasWireClient.cs (ReadPositionFiguresAsync, ParseAlarms, name/sysinfo decode)
Status Resolved

Description: The entire managed wire-protocol decode layer — the byte-offset-fragile code that is this driver's analogue of P/Invoke struct marshalling — has zero unit tests. No test in Driver.FOCAS.Tests references FocasWireProtocol, FocasWireClient, ParseResponseBlocks, BuildPdu/BuildRequestBody, ReadAscii/ReadNameRecord, or the new cnc_getfigure figure decode (ReadPositionFiguresAsync, added since the prior review). Every existing test drives the IFocasClient seam through FakeFocasClient, which bypasses all wire framing and big-endian decode. A regression in a block-envelope offset, a PDU length field, the ASCII NUL/space trimming, or the figure short stride would not be caught by any test — exactly the corruption class the review brief calls out for marshalling code.

Recommendation: Add offline (no-socket) unit tests for the public/internal static decode primitives: BuildPdu header layout + ReadPduAsync round-trip; BuildRequestBody block count/stride; ParseResponseBlocks (command/RC/payload extraction, truncation guards); ReadAscii (NUL stop + trailing-space/NUL trim) and ReadNameRecord; and a ResponseBlock-shaped payload that exercises the cnc_getfigure figure short stride.

Resolution: Resolved 2026-06-19 — Added FocasWireProtocolTests.cs (15 tests) covering BuildPdu header layout + magic/version/length, the BuildPduReadPdu round-trip, BuildRequestBody block framing, ParseResponseBlocks (multi-block command/RC/payload extraction + the truncated-length and bad-block-length guards), ReadAscii NUL-stop and trailing space/NUL trimming, ReadNameRecord 2-byte extraction, and a response-block payload decoded into the per-axis short figure sequence that the cnc_getfigure path relies on.

Driver.FOCAS-014

Field Value
Severity Low
Category Correctness & logic bugs
Location Wire/WireFocasClient.cs:318-325 (ReadSpindleMetricAsync trailing-zero truncation)
Status Deferred

Description: WireFocasClient.ReadSpindleMetricAsync (the shared decode for GetSpindleLoadsAsync / GetSpindleMaxRpmsAsync) stops accumulating at the first zero value after a non-zero one: if (m.Value == 0 && list.Count > 0) break;. The intent (per the inline comment) is to drop Fanuc's trailing zero-padding of unused spindle slots. But a spindle that is legitimately at 0% load (stopped) sitting between two running spindles truncates the list at that point, dropping every subsequent spindle. Because the consumer in FocasDriver.FixedTreeLoopAsync index-aligns the returned list to the spindle order (state.LastSpindleLoads[i] = loads[i]) and the read path looks up Load by that index, a dropped middle element misaligns all later spindles' Load values. The truncation is correct only if the CNC always returns a contiguous run of active spindles followed by zero padding — which holds for single-spindle and trailing-stopped layouts but not for a stopped middle spindle.

Recommendation: Distinguish "trailing padding" from "a real zero in the middle". Either read the actual spindle count from cnc_rdspdlname / sysinfo and decode exactly that many fixed-width slots (no zero-based truncation), or keep zeros and trim only a trailing run of zeros after decoding the full payload.

Resolution: Deferred — the correct slot-count/padding semantics of cnc_rdspload / cnc_rdspmaxrpm depend on the real Fanuc wire response shape, which is bench-CNC-gated (this managed backend's binary shapes are validated only against the in-tree focas_mock sim, per the Wire/FocasWireClient.cs remarks). Fixing the truncation heuristic without a real CNC's padding behaviour risks substituting one wrong assumption for another. Waiting on a bench-CNC verification pass (the same gate that covers the cnc_getfigure binary shape).