Files
lmxopcua/code-reviews/Driver.Modbus/findings.md
T
Joseph Doherty 6853a0430f review(Driver.Modbus): validate FC03 RMW response + correct write error mapping
Re-review at 7286d320. Modbus-013 (Low): bit RMW now routes the FC03 read through the
validated ReadRegisterBlockAsync (was raw-indexing readResp -> IndexOutOfRange on a truncated
PDU). Modbus-014 (Low): WriteAsync maps InvalidDataException to BadCommunicationError (was
BadInternalError), matching ReadAsync. + TDD.
2026-06-19 11:34:34 -04:00

23 KiB

Code Review — Driver.Modbus

Field Value
Module src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.Modbus
Reviewer Claude Code
Review date 2026-06-19
Commit reviewed 7286d320
Status Reviewed
Open findings 0

Checklist coverage

# Category Result
1 Correctness & logic bugs Driver.Modbus-002, Driver.Modbus-005, Driver.Modbus-009
2 OtOpcUa conventions No issues found
3 Concurrency & thread safety Driver.Modbus-001, Driver.Modbus-003
4 Error handling & resilience Driver.Modbus-006, Driver.Modbus-010
5 Security No issues found
6 Performance & resource management Driver.Modbus-004
7 Design-document adherence Driver.Modbus-007
8 Code organization & conventions Driver.Modbus-011
9 Testing coverage Driver.Modbus-012
10 Documentation & comments Driver.Modbus-008

Findings

Driver.Modbus-001

Field Value
Severity High
Category Concurrency & thread safety
Location ModbusDriver.cs:92,99-122
Status Resolved

Description: _lastPublishedByRef is a plain Dictionary<string, object> mutated inside ShouldPublish, which runs on the PollGroupEngine.onChange callback. PollGroupEngine runs one background Task per subscription (PollGroupEngine.cs:64), so a driver with two or more subscriptions invokes onChange — and therefore ShouldPublish — concurrently on separate threads. ShouldPublish does TryGetValue and indexer writes on the unsynchronized dictionary (ModbusDriver.cs:108, 112, 120). Concurrent reads/writes of a non-thread-safe Dictionary can corrupt internal state, drop entries, or throw IndexOutOfRangeException/InvalidOperationException, crashing the poll loop. The sibling cache _lastWrittenByRef is correctly guarded by _lastWrittenLock — only the deadband cache was left unprotected.

Recommendation: Guard _lastPublishedByRef with a dedicated lock around every access in ShouldPublish, or switch it to ConcurrentDictionary<string, object> and use AddOrUpdate/TryGetValue.

Resolution: Resolved 2026-05-22 — switched _lastPublishedByRef to ConcurrentDictionary<string, object> so the TryGetValue/indexer-write accesses in ShouldPublish are thread-safe under concurrent multi-subscription onChange callbacks; added a concurrent-deadband-subscription regression test.

Driver.Modbus-002

Field Value
Severity Medium
Category Correctness & logic bugs
Location ModbusDriver.cs:127-186
Status Resolved

Description: ShutdownAsync never clears _tagsByName, and InitializeAsync repopulates it with _tagsByName[t.Name] = t (ModbusDriver.cs:134) without clearing first. ReinitializeAsync calls ShutdownAsync then InitializeAsync. Because _options.Tags is fixed for a driver instance, the same set re-inserts harmlessly today — but the asymmetry is a latent bug: any future path that re-runs init with a different tag set leaves stale tag entries that resolve reads/writes against deleted nodes. _lastPublishedByRef and _lastWrittenByRef similarly survive a Reinitialize, retaining deadband/write-suppression baselines against the old config, while _autoProhibited is deliberately cleared (ModbusDriver.cs:179) — the inconsistency shows the clearing was simply overlooked.

Recommendation: Clear _tagsByName, _lastPublishedByRef, and _lastWrittenByRef in ShutdownAsync (or at the top of InitializeAsync) so a Reinitialize starts from a clean state, consistent with the existing _autoProhibited.Clear().

Resolution: Resolved 2026-05-22 — added _tagsByName.Clear(), _lastPublishedByRef.Clear(), and _lastWrittenByRef.Clear() to ShutdownAsync (via the new shared TeardownAsync helper) so a ReinitializeAsync cycle always starts from a clean state, consistent with the existing _autoProhibited.Clear().

Driver.Modbus-003

Field Value
Severity Low
Category Concurrency & thread safety
Location ModbusDriver.cs:59,188,241,259,266,726,745,759
Status Resolved

Description: _health is a non-volatile reference field written from multiple threads (concurrent ReadAsync callers, the coalesced-read path, WriteAsync indirectly, and ProbeLoopAsync) and read by GetHealth(). Reference assignment is atomic on .NET so a torn read cannot occur, but there is no happens-before ordering: a stale DriverHealth can be observed on another core, and concurrent writers race so "last write wins" is non-deterministic (a Degraded write from a failed read can clobber a just-published Healthy, or vice versa).

Recommendation: Mark _health volatile, or assign via Volatile.Write and read with Volatile.Read, to give GetHealth() a defined ordering guarantee.

Resolution: Resolved 2026-05-23 — routed every _health access through new ReadHealth() (Volatile.Read) and WriteHealth() (Volatile.Write) helpers, giving GetHealth() a defined ordering guarantee on every core. Stress-test (ModbusLifecycleHygieneTests.GetHealth_under_concurrent_pressure_always_returns_a_complete_snapshot) confirms the read path never sees a torn / half-constructed snapshot under concurrent reader + writer pressure.

Driver.Modbus-004

Field Value
Severity Medium
Category Performance & resource management
Location ModbusDriver.cs:1468-1473
Status Resolved

Description: DisposeAsync() only disposes _transport. Unlike ShutdownAsync, it does not cancel/dispose _probeCts or _reprobeCts, nor dispose _poll (the PollGroupEngine). A caller that uses await using or using without first calling ShutdownAsync leaks the probe loop, the re-probe loop, and every active polled subscription background Task/CancellationTokenSource. The two Task.Run loops keep running against a disposed transport, throwing on every tick. Dispose() (sync) has the same gap and additionally blocks on the async path via GetAwaiter().GetResult().

Recommendation: Make DisposeAsync perform the same teardown as ShutdownAsync (cancel both CTSs, dispose them, dispose _poll) before disposing _transport. Have ShutdownAsync and DisposeAsync share a private TeardownAsync helper.

Resolution: Resolved 2026-05-22 — refactored teardown into a shared TeardownAsync helper; DisposeAsync now delegates to it, cancelling both CTS objects, disposing _poll, and disposing _transport — matching ShutdownAsync and eliminating the probe/re-probe/poll-engine leak on await using callers.

Driver.Modbus-005

Field Value
Severity Medium
Category Correctness & logic bugs
Location ModbusDriver.cs:777-798,323-330
Status Resolved

Description: ReadRegisterBlockAsync and ReadBitBlockAsync index resp[1] and call Buffer.BlockCopy(resp, 2, ..., resp[1]) with no bounds validation. ModbusTcpTransport.SendOnceAsync validates only the MBAP length field and the exception high-bit — it does not guarantee a non-exception response PDU is long enough to hold function-code + byte-count + the claimed data. A device (or buggy server) returning a 1-byte PDU, or a byte-count larger than the actual payload, produces an IndexOutOfRangeException/ArgumentException rather than a clean comms error. DecodeBitArray similarly indexes bitmap[0] (ModbusDriver.cs:325) without checking the bitmap is non-empty. In ReadAsync these are caught by the catch-all and mapped to BadCommunicationError, so impact is limited; in ReadCoalescedAsync the exception is opaque to the narrower catch arms.

Recommendation: In ReadRegisterBlockAsync/ReadBitBlockAsync, validate resp.Length >= 2 and resp.Length >= 2 + resp[1] before slicing, throwing a descriptive InvalidDataException. Validate the decoded byte/bit count matches the request quantity.

Resolution: Resolved 2026-05-22 — added resp.Length >= 2, resp.Length >= 2 + resp[1], and byte-count-vs-quantity checks in both ReadRegisterBlockAsync and ReadBitBlockAsync, throwing InvalidDataException with precise diagnostics; added an empty-bitmap guard in DecodeBitArray.

Driver.Modbus-006

Field Value
Severity Medium
Category Error handling & resilience
Location ModbusDriver.cs:514-524,532-550
Status Resolved

Description: RunReprobeOnceForTestAsync reads _transport once at the top (var transport = _transport ?? throw ...). If ShutdownAsync runs (setting _transport = null and disposing it) while a re-probe pass is mid-iteration, the loop keeps issuing reads against the captured, disposed transport. ReprobeLoopAsync only catches OperationCanceledException when (ct.IsCancellationRequested) — an ObjectDisposedException from the disposed transport escapes RunReprobeOnceForTestAsync and faults the fire-and-forget background Task, silently killing the re-probe loop with the wrong failure mode.

Recommendation: Re-check _transport/cancellation inside the per-candidate loop, or broaden the ReprobeLoopAsync catch to also swallow ObjectDisposedException when ct.IsCancellationRequested.

Resolution: Resolved 2026-05-22 — broadened ReprobeLoopAsync to catch ObjectDisposedException when (ct.IsCancellationRequested) and return cleanly, so a transport disposal race during shutdown exits the background task rather than faulting it.

Driver.Modbus-007

Field Value
Severity Low
Category Design-document adherence
Location ModbusDriver.cs:1392, ModbusDriverOptions.cs:74-80
Status Resolved

Description: Two design-vs-code drifts. (1) MapDataType maps Int64/UInt64 to DriverDataType.Int32 with the inline comment "widening to Int32 loses precision; PR 25 adds Int64 to DriverDataType". The address-space node for a 64-bit Modbus tag is declared Int32, misrepresenting the OPC UA variable's DataType even though DecodeRegister produces a correct long/ulong value — clients see a type/value mismatch. (2) DisableFC23 is documented and bound from JSON but is a confirmed no-op ("The driver does not currently emit FC23"). Both are acknowledged-but-unfinished items worth tracking.

Recommendation: Track the PR 25 DriverDataType.Int64 follow-up; until then document the Int32 surfacing limitation in docs/v2/modbus-addressing.md so operators configuring I_64/UI_64 tags understand the node type. Mark DisableFC23 clearly as reserved/unimplemented or gate it once FC23 ships.

Resolution: Resolved 2026-05-23 — promoted the inline Int64/UInt64 caveat into a full <remarks> block on MapDataType calling out the surfacing limitation and the tracked follow-up, and rewrote the DisableFC23 XML doc to flag the option as "Reserved / no-op" with a Driver.Modbus-007 tracking reference. (The cross-module doc update in docs/v2/modbus-addressing.md is out of scope for this module's edits — code is now self-documenting.)

Driver.Modbus-008

Field Value
Severity Low
Category Documentation & comments
Location ModbusDriver.cs:411-417,700-703,737-744
Status Resolved

Description: Stale/misleading comments. (1) The <summary> block at ModbusDriver.cs:411-417 says auto-prohibited ranges are "Cleared by ReinitializeAsync ... or by an explicit re-probe API (not yet shipped)" — the re-probe loop has shipped (#151, ReprobeLoopAsync), so the parenthetical is wrong. (2) The comment at ModbusDriver.cs:700-703 ("On block-level failure mark every member Bad — caller's per-tag fallback won't re-try since handled-set already includes them; auto-split-on-failure is a follow-up") contradicts the actual catch (ModbusException) arm below it, which deliberately does not add members to handled and does defer to per-tag fallback (and auto-split has shipped via bisection). The empty foreach (var (idx, _) in block.Members) { } loop at ModbusDriver.cs:737-744, with only a comment body, is dead code from that superseded design.

Recommendation: Update the two comments to match the shipped #148/#150/#151 behaviour and delete the empty foreach loop in the catch (ModbusException) arm.

Resolution: Resolved 2026-05-23 — deleted the duplicate <summary> orphaned at the top of the prohibition block, rewrote the surviving one to credit the shipped #151 re-probe loop, replaced the "auto-split-on-failure is a follow-up" comment above the block loop with the actual #148/#150 behaviour (per-tag fallback + bisection), and removed the empty foreach (var (idx, _) in block.Members) { } plus its unused status local from the catch (ModbusException) arm.

Driver.Modbus-009

Field Value
Severity Low
Category Correctness & logic bugs
Location ModbusDriver.cs:1160-1167, ModbusTcpTransport.cs:94-95
Status Resolved

Description: Two edge cases. (1) RegisterCount for ModbusDataType.String computes (tag.StringLength + 1) / 2; a tag configured with StringLength = 0 yields a register count of 0, flowing into ReadOneAsync as totalRegs = 0 and producing an FC03/FC04 with quantity 0 — a spec-illegal request the PLC rejects with exception 03. The factory does not reject StringLength = 0 for String tags. (2) EnableKeepAlive casts opts.Time.TotalSeconds/opts.Interval.TotalSeconds to int; a sub-second configured TimeSpan (e.g. 500 ms) truncates to 0, which most OSes reject or interpret as "use default", silently defeating the configured keep-alive timing.

Recommendation: Validate StringLength >= 1 for String tags in ModbusDriverFactoryExtensions.BuildTag. For keep-alive, round up to a minimum of 1 second or validate the configured TimeSpan is a whole number of seconds.

Resolution: Resolved 2026-05-23 — added ValidateStringLength in ModbusDriverFactoryExtensions.BuildTag so String-typed tags with StringLength < 1 throw at bind time with a clear diagnostic (both AddressString + structured DTO paths), and introduced ModbusTcpTransport.ClampToWholeSeconds which rounds the configured keep-alive TimeSpan up to a minimum of 1 second so sub-second values no longer truncate to 0. Regression coverage in ModbusEdgeCaseValidationTests.

Driver.Modbus-010

Field Value
Severity Low
Category Error handling & resilience
Location ModbusDriver.cs:864-868, ModbusDriverOptions.cs:116-125
Status Resolved

Description: When WriteOnChangeOnly is enabled and IsRedundantWrite returns true, WriteAsync returns WriteResult(0u) (Good) without touching the wire. The suppression baseline (_lastWrittenByRef) is only invalidated by a read that returns a divergent value. If a driver instance has WriteOnChangeOnly = true but a tag is never subscribed/read (write-only setpoint), a value the operator believes was re-asserted is silently suppressed forever after the first write — no time- or count-based expiry exists. The option XML doc describes the read-invalidation path but does not warn about write-only tags.

Recommendation: Document the write-only-tag caveat on the WriteOnChangeOnly option, or add an optional TTL to the suppression cache so a periodic re-write still reaches the PLC.

Resolution: Resolved 2026-05-23 — added a <remarks> block on ModbusDriverOptions.WriteOnChangeOnly that calls out the write-only-tag caveat explicitly: the cache is only invalidated by reads, so a tag that is never subscribed/polled stays suppressed forever after the first write. Operators choosing this option are directed to either subscribe every affected tag or leave WriteOnChangeOnly = false. Adding a TTL was considered but the safer option for an OPC UA driver is to make the limitation discoverable in the documentation surface (no behaviour change for existing deployments).

Driver.Modbus-011

Field Value
Severity Low
Category Code organization & conventions
Location ModbusDriver.cs:23-43,89-97,408-432
Status Resolved

Description: Field and member declarations are interleaved with methods throughout ModbusDriver. ResolveHost (a public method) is the first member of the class, followed by BuildSlaveHostName, then a block of fields; _lastPublishedByRef/_lastWrittenByRef are declared after the constructor; ProhibitionState, _autoProhibited, and _reprobeCts are declared mid-file between DecodeRegisterArray and RangeIsAutoProhibited. There are also two near-identical <summary> blocks stacked back-to-back at ModbusDriver.cs:411-423. This hurts readability of a 1400-line file and makes the field inventory hard to audit (relevant to the thread-safety findings above).

Recommendation: Group all instance fields at the top of the class, move nested types together, and remove the orphaned first <summary> at lines 411-417 that no longer precedes a member.

Resolution: Resolved 2026-05-23 — reorganized ModbusDriver so every instance field (including the _autoProhibited / _autoProhibitedLock / _reprobeCts / _rmwLocks / _lastPublishedByRef / _lastWrittenByRef fields that were declared mid-file) lives in a single contiguous block at the top of the class, followed by the ProhibitionState nested type, the constructor, and then methods. Removed the duplicate orphan <summary> and the now-redundant field declarations that had been scattered through the file. The full 263-test suite passes with no behavioural change.

Driver.Modbus-012

Field Value
Severity Low
Category Testing coverage
Location tests/Drivers/ZB.MOM.WW.OtOpcUa.Driver.Modbus.Tests/
Status Resolved

Description: The unit suite is broad (coalescing, bisection, auto-recovery, byte order, arrays, BCD, RMW, caps, multi-unit, probe, reconnect, subscription). Gaps relative to the findings above: (1) no test exercises concurrent multi-subscription publishing, so the _lastPublishedByRef race (Driver.Modbus-001) is uncaught; (2) no test covers ReinitializeAsync state hygiene for stale _tagsByName/caches (Driver.Modbus-002); (3) no test feeds a malformed/short response PDU through ReadRegisterBlockAsync/DecodeBitArray to confirm a clean BadCommunicationError rather than an index-range crash (Driver.Modbus-005); (4) no test asserts DisposeAsync (vs ShutdownAsync) tears down the probe/re-probe loops and _poll (Driver.Modbus-004).

Recommendation: Add unit tests for concurrent deadband publishing across two subscriptions, ReinitializeAsync state hygiene, malformed-response handling in the register/bit block readers, and DisposeAsync loop teardown.

Resolution: Resolved 2026-05-23 — gap (1) was already covered by ModbusSubscriptionTests.Concurrent_deadband_subscriptions_do_not_corrupt_the_publish_cache from the Driver.Modbus-001 fix. Added the remaining three in a new ModbusLifecycleHygieneTests file: Reinitialize_clears_stale_tagsByName_entries + Reinitialize_clears_lastPublished_and_lastWritten_caches (gap 2), Short_response_PDU_surfaces_as_BadCommunicationError_not_an_IndexOutOfRangeException + Response_payload_truncated_below_declared_byteCount_surfaces_as_BadCommunicationError + DecodeBitArray_rejects_an_empty_bitmap_with_InvalidDataException (gap 3), DisposeAsync_without_explicit_Shutdown_tears_down_probe_loop_and_transport + DisposeAsync_disposes_the_pollEngine_so_subscriptions_stop (gap 4). All 12 new tests pass (full suite: 263/263 green).

Re-review 2026-06-19 (commit 7286d320)

Context

Re-review following addition of the bit read-modify-write path (_rmwLocks, WriteBitInRegisterAsync), enum-serialization fix (JsonStringEnumConverter on AdminUI page), and equipment-tag ref resolver. Previous findings 001-012 are all Resolved.

Checklist coverage

# Category Result
1 Correctness & logic bugs Driver.Modbus-013, Driver.Modbus-014
2 OtOpcUa conventions No issues found
3 Concurrency & thread safety No new issues — _rmwLocks per-address key over-serializes multi-unit but is not incorrect (transport _gate already serializes all sends)
4 Error handling & resilience Driver.Modbus-014
5 Security No issues found
6 Performance & resource management No issues found
7 Design-document adherence No issues found
8 Code organization & conventions No issues found
9 Testing coverage Driver.Modbus-013 (truncated RMW response) had no test; added
10 Documentation & comments No issues found

Driver.Modbus-013

Field Value
Severity Low
Category Correctness & logic bugs
Location ModbusDriver.cs:1169-1171
Status Resolved

Description: WriteBitInRegisterAsync called transport.SendAsync directly for the FC03 read and then immediately indexed readResp[2] and readResp[3] without validating the response length. A device returning a malformed or truncated PDU caused IndexOutOfRangeException. Unlike the normal read path (ReadRegisterBlockAsync) which was hardened in Driver.Modbus-005, the RMW read bypassed that validation entirely. The exception was absorbed by WriteAsync's catch-all and mapped to the wrong status code (see Driver.Modbus-014).

Recommendation: Route the FC03 read in WriteBitInRegisterAsync through ReadRegisterBlockAsync — the same validated path the normal read uses — so truncated responses throw InvalidDataException rather than IndexOutOfRangeException.

Resolution: Resolved 2026-06-19 (SHA blank) — replaced direct transport.SendAsync + raw indexing in WriteBitInRegisterAsync with a ReadRegisterBlockAsync(... qty=1 ...) call. ReadRegisterBlockAsync validates resp.Length >= 2, resp.Length >= 2 + resp[1], and byte-count-vs-quantity before returning clean [hi, lo] register bytes. Regression test: ModbusBitRmwTests.BitInRegister_write_with_truncated_FC03_response_returns_BadCommunicationError.

Driver.Modbus-014

Field Value
Severity Low
Category Error handling & resilience
Location ModbusDriver.cs:990-993
Status Resolved

Description: WriteAsync's generic catch (Exception) arm mapped all non-ModbusException failures to StatusBadInternalError (0x80020000). This includes InvalidDataException thrown by ReadRegisterBlockAsync for a malformed PDU during a BitInRegister RMW read — a communication/protocol error, not a driver code defect. By contrast, ReadAsync correctly returns StatusBadCommunicationError (0x80050000) for the same class of failure. The asymmetry makes write-time communication errors look like internal driver bugs to OPC UA clients monitoring StatusCodes, complicating diagnosis.

Recommendation: Add a catch (InvalidDataException) arm before the generic catch in WriteAsync that maps the exception to StatusBadCommunicationError, matching the ReadAsync error surface.

Resolution: Resolved 2026-06-19 (SHA blank) — added catch (InvalidDataException) { results[i] = new WriteResult(StatusBadCommunicationError); } between the ModbusException and generic catch arms in WriteAsync. Covered by the same regression test as Driver.Modbus-013: ModbusBitRmwTests.BitInRegister_write_with_truncated_FC03_response_returns_BadCommunicationError.