Clear _tagsByName, _lastPublishedByRef, and _lastWrittenByRef in ShutdownAsync (via the new shared TeardownAsync helper) so a ReinitializeAsync cycle starts from a clean state, consistent with the existing _autoProhibited.Clear(). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
208 lines
16 KiB
Markdown
208 lines
16 KiB
Markdown
# Code Review — Driver.Modbus
|
|
|
|
| Field | Value |
|
|
|---|---|
|
|
| Module | `src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.Modbus` |
|
|
| Reviewer | Claude Code |
|
|
| Review date | 2026-05-22 |
|
|
| Commit reviewed | `76d35d1` |
|
|
| Status | Reviewed |
|
|
| Open findings | 7 |
|
|
|
|
## Checklist coverage
|
|
|
|
| # | Category | Result |
|
|
|---|---|---|
|
|
| 1 | Correctness & logic bugs | Driver.Modbus-002, Driver.Modbus-005, Driver.Modbus-009 |
|
|
| 2 | OtOpcUa conventions | No issues found |
|
|
| 3 | Concurrency & thread safety | Driver.Modbus-001, Driver.Modbus-003 |
|
|
| 4 | Error handling & resilience | Driver.Modbus-006, Driver.Modbus-010 |
|
|
| 5 | Security | No issues found |
|
|
| 6 | Performance & resource management | Driver.Modbus-004 |
|
|
| 7 | Design-document adherence | Driver.Modbus-007 |
|
|
| 8 | Code organization & conventions | Driver.Modbus-011 |
|
|
| 9 | Testing coverage | Driver.Modbus-012 |
|
|
| 10 | Documentation & comments | Driver.Modbus-008 |
|
|
|
|
## Findings
|
|
|
|
### Driver.Modbus-001
|
|
|
|
| Field | Value |
|
|
|---|---|
|
|
| Severity | High |
|
|
| Category | Concurrency & thread safety |
|
|
| Location | `ModbusDriver.cs:92,99-122` |
|
|
| Status | Resolved |
|
|
|
|
**Description:** `_lastPublishedByRef` is a plain `Dictionary<string, object>` mutated inside `ShouldPublish`, which runs on the `PollGroupEngine.onChange` callback. `PollGroupEngine` runs one background `Task` per subscription (`PollGroupEngine.cs:64`), so a driver with two or more subscriptions invokes `onChange` — and therefore `ShouldPublish` — concurrently on separate threads. `ShouldPublish` does `TryGetValue` and indexer writes on the unsynchronized dictionary (`ModbusDriver.cs:108`, `112`, `120`). Concurrent reads/writes of a non-thread-safe `Dictionary` can corrupt internal state, drop entries, or throw `IndexOutOfRangeException`/`InvalidOperationException`, crashing the poll loop. The sibling cache `_lastWrittenByRef` is correctly guarded by `_lastWrittenLock` — only the deadband cache was left unprotected.
|
|
|
|
**Recommendation:** Guard `_lastPublishedByRef` with a dedicated lock around every access in `ShouldPublish`, or switch it to `ConcurrentDictionary<string, object>` and use `AddOrUpdate`/`TryGetValue`.
|
|
|
|
**Resolution:** Resolved 2026-05-22 — switched `_lastPublishedByRef` to `ConcurrentDictionary<string, object>` so the `TryGetValue`/indexer-write accesses in `ShouldPublish` are thread-safe under concurrent multi-subscription `onChange` callbacks; added a concurrent-deadband-subscription regression test.
|
|
|
|
### Driver.Modbus-002
|
|
|
|
| Field | Value |
|
|
|---|---|
|
|
| Severity | Medium |
|
|
| Category | Correctness & logic bugs |
|
|
| Location | `ModbusDriver.cs:127-186` |
|
|
| Status | Resolved |
|
|
|
|
**Description:** `ShutdownAsync` never clears `_tagsByName`, and `InitializeAsync` repopulates it with `_tagsByName[t.Name] = t` (`ModbusDriver.cs:134`) without clearing first. `ReinitializeAsync` calls `ShutdownAsync` then `InitializeAsync`. Because `_options.Tags` is fixed for a driver instance, the same set re-inserts harmlessly today — but the asymmetry is a latent bug: any future path that re-runs init with a different tag set leaves stale tag entries that resolve reads/writes against deleted nodes. `_lastPublishedByRef` and `_lastWrittenByRef` similarly survive a Reinitialize, retaining deadband/write-suppression baselines against the old config, while `_autoProhibited` *is* deliberately cleared (`ModbusDriver.cs:179`) — the inconsistency shows the clearing was simply overlooked.
|
|
|
|
**Recommendation:** Clear `_tagsByName`, `_lastPublishedByRef`, and `_lastWrittenByRef` in `ShutdownAsync` (or at the top of `InitializeAsync`) so a Reinitialize starts from a clean state, consistent with the existing `_autoProhibited.Clear()`.
|
|
|
|
**Resolution:** Resolved 2026-05-22 — added `_tagsByName.Clear()`, `_lastPublishedByRef.Clear()`, and `_lastWrittenByRef.Clear()` to `ShutdownAsync` (via the new shared `TeardownAsync` helper) so a `ReinitializeAsync` cycle always starts from a clean state, consistent with the existing `_autoProhibited.Clear()`.
|
|
|
|
### Driver.Modbus-003
|
|
|
|
| Field | Value |
|
|
|---|---|
|
|
| Severity | Low |
|
|
| Category | Concurrency & thread safety |
|
|
| Location | `ModbusDriver.cs:59,188,241,259,266,726,745,759` |
|
|
| Status | Open |
|
|
|
|
**Description:** `_health` is a non-`volatile` reference field written from multiple threads (concurrent `ReadAsync` callers, the coalesced-read path, `WriteAsync` indirectly, and `ProbeLoopAsync`) and read by `GetHealth()`. Reference assignment is atomic on .NET so a torn read cannot occur, but there is no happens-before ordering: a stale `DriverHealth` can be observed on another core, and concurrent writers race so "last write wins" is non-deterministic (a `Degraded` write from a failed read can clobber a just-published `Healthy`, or vice versa).
|
|
|
|
**Recommendation:** Mark `_health` `volatile`, or assign via `Volatile.Write` and read with `Volatile.Read`, to give `GetHealth()` a defined ordering guarantee.
|
|
|
|
**Resolution:** _(open)_
|
|
|
|
### Driver.Modbus-004
|
|
|
|
| Field | Value |
|
|
|---|---|
|
|
| Severity | Medium |
|
|
| Category | Performance & resource management |
|
|
| Location | `ModbusDriver.cs:1468-1473` |
|
|
| Status | Resolved |
|
|
|
|
**Description:** `DisposeAsync()` only disposes `_transport`. Unlike `ShutdownAsync`, it does not cancel/dispose `_probeCts` or `_reprobeCts`, nor dispose `_poll` (the `PollGroupEngine`). A caller that uses `await using` or `using` without first calling `ShutdownAsync` leaks the probe loop, the re-probe loop, and every active polled subscription background `Task`/`CancellationTokenSource`. The two `Task.Run` loops keep running against a disposed transport, throwing on every tick. `Dispose()` (sync) has the same gap and additionally blocks on the async path via `GetAwaiter().GetResult()`.
|
|
|
|
**Recommendation:** Make `DisposeAsync` perform the same teardown as `ShutdownAsync` (cancel both CTSs, dispose them, dispose `_poll`) before disposing `_transport`. Have `ShutdownAsync` and `DisposeAsync` share a private `TeardownAsync` helper.
|
|
|
|
**Resolution:** Resolved 2026-05-22 — refactored teardown into a shared `TeardownAsync` helper; `DisposeAsync` now delegates to it, cancelling both CTS objects, disposing `_poll`, and disposing `_transport` — matching `ShutdownAsync` and eliminating the probe/re-probe/poll-engine leak on `await using` callers.
|
|
|
|
### Driver.Modbus-005
|
|
|
|
| Field | Value |
|
|
|---|---|
|
|
| Severity | Medium |
|
|
| Category | Correctness & logic bugs |
|
|
| Location | `ModbusDriver.cs:777-798,323-330` |
|
|
| Status | Resolved |
|
|
|
|
**Description:** `ReadRegisterBlockAsync` and `ReadBitBlockAsync` index `resp[1]` and call `Buffer.BlockCopy(resp, 2, ..., resp[1])` with no bounds validation. `ModbusTcpTransport.SendOnceAsync` validates only the MBAP length field and the exception high-bit — it does not guarantee a non-exception response PDU is long enough to hold function-code + byte-count + the claimed data. A device (or buggy server) returning a 1-byte PDU, or a byte-count larger than the actual payload, produces an `IndexOutOfRangeException`/`ArgumentException` rather than a clean comms error. `DecodeBitArray` similarly indexes `bitmap[0]` (`ModbusDriver.cs:325`) without checking the bitmap is non-empty. In `ReadAsync` these are caught by the catch-all and mapped to `BadCommunicationError`, so impact is limited; in `ReadCoalescedAsync` the exception is opaque to the narrower catch arms.
|
|
|
|
**Recommendation:** In `ReadRegisterBlockAsync`/`ReadBitBlockAsync`, validate `resp.Length >= 2` and `resp.Length >= 2 + resp[1]` before slicing, throwing a descriptive `InvalidDataException`. Validate the decoded byte/bit count matches the request quantity.
|
|
|
|
**Resolution:** Resolved 2026-05-22 — added `resp.Length >= 2`, `resp.Length >= 2 + resp[1]`, and byte-count-vs-quantity checks in both `ReadRegisterBlockAsync` and `ReadBitBlockAsync`, throwing `InvalidDataException` with precise diagnostics; added an empty-bitmap guard in `DecodeBitArray`.
|
|
|
|
### Driver.Modbus-006
|
|
|
|
| Field | Value |
|
|
|---|---|
|
|
| Severity | Medium |
|
|
| Category | Error handling & resilience |
|
|
| Location | `ModbusDriver.cs:514-524,532-550` |
|
|
| Status | Resolved |
|
|
|
|
**Description:** `RunReprobeOnceForTestAsync` reads `_transport` once at the top (`var transport = _transport ?? throw ...`). If `ShutdownAsync` runs (setting `_transport = null` and disposing it) while a re-probe pass is mid-iteration, the loop keeps issuing reads against the captured, disposed transport. `ReprobeLoopAsync` only catches `OperationCanceledException when (ct.IsCancellationRequested)` — an `ObjectDisposedException` from the disposed transport escapes `RunReprobeOnceForTestAsync` and faults the fire-and-forget background `Task`, silently killing the re-probe loop with the wrong failure mode.
|
|
|
|
**Recommendation:** Re-check `_transport`/cancellation inside the per-candidate loop, or broaden the `ReprobeLoopAsync` catch to also swallow `ObjectDisposedException` when `ct.IsCancellationRequested`.
|
|
|
|
**Resolution:** Resolved 2026-05-22 — broadened `ReprobeLoopAsync` to catch `ObjectDisposedException when (ct.IsCancellationRequested)` and return cleanly, so a transport disposal race during shutdown exits the background task rather than faulting it.
|
|
|
|
### Driver.Modbus-007
|
|
|
|
| Field | Value |
|
|
|---|---|
|
|
| Severity | Low |
|
|
| Category | Design-document adherence |
|
|
| Location | `ModbusDriver.cs:1392`, `ModbusDriverOptions.cs:74-80` |
|
|
| Status | Open |
|
|
|
|
**Description:** Two design-vs-code drifts. (1) `MapDataType` maps `Int64`/`UInt64` to `DriverDataType.Int32` with the inline comment "widening to Int32 loses precision; PR 25 adds Int64 to DriverDataType". The address-space node for a 64-bit Modbus tag is declared `Int32`, misrepresenting the OPC UA variable's `DataType` even though `DecodeRegister` produces a correct `long`/`ulong` value — clients see a type/value mismatch. (2) `DisableFC23` is documented and bound from JSON but is a confirmed no-op ("The driver does not currently emit FC23"). Both are acknowledged-but-unfinished items worth tracking.
|
|
|
|
**Recommendation:** Track the PR 25 `DriverDataType.Int64` follow-up; until then document the Int32 surfacing limitation in `docs/v2/modbus-addressing.md` so operators configuring `I_64`/`UI_64` tags understand the node type. Mark `DisableFC23` clearly as reserved/unimplemented or gate it once FC23 ships.
|
|
|
|
**Resolution:** _(open)_
|
|
|
|
### Driver.Modbus-008
|
|
|
|
| Field | Value |
|
|
|---|---|
|
|
| Severity | Low |
|
|
| Category | Documentation & comments |
|
|
| Location | `ModbusDriver.cs:411-417,700-703,737-744` |
|
|
| Status | Open |
|
|
|
|
**Description:** Stale/misleading comments. (1) The `<summary>` block at `ModbusDriver.cs:411-417` says auto-prohibited ranges are "Cleared by ReinitializeAsync ... or by an explicit re-probe API (not yet shipped)" — the re-probe loop has shipped (#151, `ReprobeLoopAsync`), so the parenthetical is wrong. (2) The comment at `ModbusDriver.cs:700-703` ("On block-level failure mark every member Bad — caller's per-tag fallback won't re-try since handled-set already includes them; auto-split-on-failure is a follow-up") contradicts the actual `catch (ModbusException)` arm below it, which deliberately does not add members to `handled` and does defer to per-tag fallback (and auto-split has shipped via bisection). The empty `foreach (var (idx, _) in block.Members) { }` loop at `ModbusDriver.cs:737-744`, with only a comment body, is dead code from that superseded design.
|
|
|
|
**Recommendation:** Update the two comments to match the shipped #148/#150/#151 behaviour and delete the empty `foreach` loop in the `catch (ModbusException)` arm.
|
|
|
|
**Resolution:** _(open)_
|
|
|
|
### Driver.Modbus-009
|
|
|
|
| Field | Value |
|
|
|---|---|
|
|
| Severity | Low |
|
|
| Category | Correctness & logic bugs |
|
|
| Location | `ModbusDriver.cs:1160-1167`, `ModbusTcpTransport.cs:94-95` |
|
|
| Status | Open |
|
|
|
|
**Description:** Two edge cases. (1) `RegisterCount` for `ModbusDataType.String` computes `(tag.StringLength + 1) / 2`; a tag configured with `StringLength = 0` yields a register count of 0, flowing into `ReadOneAsync` as `totalRegs = 0` and producing an FC03/FC04 with quantity 0 — a spec-illegal request the PLC rejects with exception 03. The factory does not reject `StringLength = 0` for String tags. (2) `EnableKeepAlive` casts `opts.Time.TotalSeconds`/`opts.Interval.TotalSeconds` to `int`; a sub-second configured `TimeSpan` (e.g. 500 ms) truncates to 0, which most OSes reject or interpret as "use default", silently defeating the configured keep-alive timing.
|
|
|
|
**Recommendation:** Validate `StringLength >= 1` for `String` tags in `ModbusDriverFactoryExtensions.BuildTag`. For keep-alive, round up to a minimum of 1 second or validate the configured `TimeSpan` is a whole number of seconds.
|
|
|
|
**Resolution:** _(open)_
|
|
|
|
### Driver.Modbus-010
|
|
|
|
| Field | Value |
|
|
|---|---|
|
|
| Severity | Low |
|
|
| Category | Error handling & resilience |
|
|
| Location | `ModbusDriver.cs:864-868`, `ModbusDriverOptions.cs:116-125` |
|
|
| Status | Open |
|
|
|
|
**Description:** When `WriteOnChangeOnly` is enabled and `IsRedundantWrite` returns true, `WriteAsync` returns `WriteResult(0u)` (Good) without touching the wire. The suppression baseline (`_lastWrittenByRef`) is only invalidated by a *read* that returns a divergent value. If a driver instance has `WriteOnChangeOnly = true` but a tag is never subscribed/read (write-only setpoint), a value the operator believes was re-asserted is silently suppressed forever after the first write — no time- or count-based expiry exists. The option XML doc describes the read-invalidation path but does not warn about write-only tags.
|
|
|
|
**Recommendation:** Document the write-only-tag caveat on the `WriteOnChangeOnly` option, or add an optional TTL to the suppression cache so a periodic re-write still reaches the PLC.
|
|
|
|
**Resolution:** _(open)_
|
|
|
|
### Driver.Modbus-011
|
|
|
|
| Field | Value |
|
|
|---|---|
|
|
| Severity | Low |
|
|
| Category | Code organization & conventions |
|
|
| Location | `ModbusDriver.cs:23-43,89-97,408-432` |
|
|
| Status | Open |
|
|
|
|
**Description:** Field and member declarations are interleaved with methods throughout `ModbusDriver`. `ResolveHost` (a public method) is the first member of the class, followed by `BuildSlaveHostName`, then a block of fields; `_lastPublishedByRef`/`_lastWrittenByRef` are declared after the constructor; `ProhibitionState`, `_autoProhibited`, and `_reprobeCts` are declared mid-file between `DecodeRegisterArray` and `RangeIsAutoProhibited`. There are also two near-identical `<summary>` blocks stacked back-to-back at `ModbusDriver.cs:411-423`. This hurts readability of a 1400-line file and makes the field inventory hard to audit (relevant to the thread-safety findings above).
|
|
|
|
**Recommendation:** Group all instance fields at the top of the class, move nested types together, and remove the orphaned first `<summary>` at lines 411-417 that no longer precedes a member.
|
|
|
|
**Resolution:** _(open)_
|
|
|
|
### Driver.Modbus-012
|
|
|
|
| Field | Value |
|
|
|---|---|
|
|
| Severity | Low |
|
|
| Category | Testing coverage |
|
|
| Location | `tests/Drivers/ZB.MOM.WW.OtOpcUa.Driver.Modbus.Tests/` |
|
|
| Status | Open |
|
|
|
|
**Description:** The unit suite is broad (coalescing, bisection, auto-recovery, byte order, arrays, BCD, RMW, caps, multi-unit, probe, reconnect, subscription). Gaps relative to the findings above: (1) no test exercises concurrent multi-subscription publishing, so the `_lastPublishedByRef` race (Driver.Modbus-001) is uncaught; (2) no test covers `ReinitializeAsync` state hygiene for stale `_tagsByName`/caches (Driver.Modbus-002); (3) no test feeds a malformed/short response PDU through `ReadRegisterBlockAsync`/`DecodeBitArray` to confirm a clean `BadCommunicationError` rather than an index-range crash (Driver.Modbus-005); (4) no test asserts `DisposeAsync` (vs `ShutdownAsync`) tears down the probe/re-probe loops and `_poll` (Driver.Modbus-004).
|
|
|
|
**Recommendation:** Add unit tests for concurrent deadband publishing across two subscriptions, `ReinitializeAsync` state hygiene, malformed-response handling in the register/bit block readers, and `DisposeAsync` loop teardown.
|
|
|
|
**Resolution:** _(open)_
|