Files
lmxopcua/code-reviews/Driver.AbLegacy/findings.md
Joseph Doherty 54d51a1d20 fix(driver-ablegacy): resolve Medium code-review finding (Driver.AbLegacy-009)
InitializeAsync catch block now mirrors ShutdownAsync teardown: cancels
and disposes probe CancellationTokenSources, calls DisposeRuntimes, and
clears _devices/_tagsByName before rethrowing. A caller that catches and
abandons (rather than retrying via ReinitializeAsync) no longer leaves
orphaned probe tasks or libplctag handles alive.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 09:26:42 -04:00

18 KiB

Code Review - Driver.AbLegacy

Field Value
Module src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.AbLegacy
Reviewer Claude Code
Review date 2026-05-22
Commit reviewed 76d35d1
Status Reviewed
Open findings 11

Checklist coverage

A comprehensive review completes every category, recording "No issues found" where a category produced nothing rather than leaving it blank.

# Category Result
1 Correctness & logic bugs Driver.AbLegacy-001, Driver.AbLegacy-002, Driver.AbLegacy-003, Driver.AbLegacy-004
2 OtOpcUa conventions Driver.AbLegacy-005
3 Concurrency & thread safety Driver.AbLegacy-006, Driver.AbLegacy-007, Driver.AbLegacy-008
4 Error handling & resilience Driver.AbLegacy-009, Driver.AbLegacy-010
5 Security No issues found
6 Performance & resource management Driver.AbLegacy-011
7 Design-document adherence Driver.AbLegacy-012
8 Code organization & conventions Driver.AbLegacy-013
9 Testing coverage No issues found
10 Documentation & comments No issues found

Findings

Driver.AbLegacy-001

Field Value
Severity High
Category Correctness & logic bugs
Location AbLegacyAddress.cs:54, AbLegacyDriver.cs:368-374
Status Resolved

Description: AbLegacyAddress.TryParse accepts a BitIndex of 0..31 for every file type. A PCCC N-file word is a signed 16-bit integer, so valid bit indices are 0..15. When a tag is Bit-typed against an N-file with a bit suffix of 16..31 (e.g. N7:0/20), WriteBitInWordAsync reads the parent as AbLegacyDataType.Int (16-bit), then computes current | (1 << bit) / current & ~(1 << bit) with bit up to 31. 1 << 20 produces a value outside the 16-bit range, the result is cast (short)updated, and the high bits are silently truncated - the wrong bit (or no bit) is written and no error is surfaced. The mask arithmetic is also done on a sign-extended int. For L-file (32-bit) bits the parent is still read as Int (16-bit), so bits 16..31 of a long can never be addressed correctly.

Recommendation: Validate BitIndex against the parent word width during parse or in WriteBitInWordAsync - reject bit > 15 for N/B/I/O/S files and bit > 31 for L files. For bit-in-word RMW against L files, read the parent as Long. Mask the read-back value to the word width before applying the bit operation.

Resolution: Resolved 2026-05-22 — AbLegacyAddress.TryParse now range-checks the bit index against per-file word width (0..15 for N/B/I/O/S/A, 0..31 for L, no bits on F); WriteBitInWordAsync reads/writes an L-file parent as 32-bit Long and masks the RMW arithmetic to the native width so sign-extension can no longer corrupt high bits.

Driver.AbLegacy-002

Field Value
Severity Medium
Category Correctness & logic bugs
Location AbLegacyDriver.cs:368
Status Resolved

Description: In WriteBitInWordAsync the parent word is decoded with Convert.ToInt32(parentRuntime.DecodeValue(AbLegacyDataType.Int, ...)). LibplctagLegacyTagRuntime.DecodeValue for AbLegacyDataType.Int returns (int)_tag.GetInt16(0) - a sign-extended int. When the current word has its high bit set (value 0x8000..0xFFFF, decoded as a negative int), the subsequent (short)updated cast re-encodes the low 16 bits correctly, but current | (1 << bit) is performed on the sign-extended value. The result is bit-correct for the low 16 bits only because the cast preserves them; any future change to widen the mask range will break silently. Combined with finding 001 this is a latent correctness hazard.

Recommendation: Mask current to current & 0xFFFF before the bit operation and operate on an explicitly 16-bit value, or document the reliance on low-16-bit preservation explicitly.

Resolution: Resolved 2026-05-22 — current & widthMask already applied in WriteBitInWordAsync by the -001 fix; no additional change needed.

Driver.AbLegacy-003

Field Value
Severity Medium
Category Correctness & logic bugs
Location AbLegacyAddress.cs:62-95
Status Resolved

Description: TryParse does not reject several malformed PCCC addresses that the XML docs imply are invalid:

  • A sub-element and a bit index together (T4:0.ACC/2) parse successfully even though no PCCC element supports both.
  • I/O/S files with a file number (I3:0, S2:1) parse successfully - I/O and S are single-letter files with no file number per the doc comment, but the parser only requires "letter then optional digits".
  • B-file addresses with a sub-element (B3:0.DN) parse successfully. ToLibplctagName() re-emits whatever was parsed, so a malformed address is passed through to libplctag rather than rejected early with a clear error.

Recommendation: Tighten the parser: reject sub-element + bit-index combinations, reject file numbers on I/O/S, and restrict which file letters may carry a sub-element (T/C/R only). Add unit coverage for the rejection cases.

Resolution: Resolved 2026-05-22 — TryParse now rejects sub-element+bit-index combinations, file numbers on I/O/S files, and sub-elements on non-T/C/R files; unit tests added in AbLegacyAddressTests.

Driver.AbLegacy-004

Field Value
Severity Medium
Category Correctness & logic bugs
Location LibplctagLegacyTagRuntime.cs:36-37
Status Resolved

Description: DecodeValue for AbLegacyDataType.Bit with bitIndex == null returns _tag.GetInt8(0) != 0. A bit-file element (B3:0/0) is a single bit inside a 16-bit word; reading only the low byte (GetInt8(0)) means a Bit tag whose live bit sits in bits 8..15 of the word, or a B-file element addressed without an explicit bit suffix, decodes incorrectly. The driver passes parsed.ToLibplctagName() which preserves the /bit suffix, so libplctag resolves the bit when a suffix is present - but a Bit-typed tag configured with an address that has no /bit suffix (e.g. B3:0) silently decodes the wrong thing.

Recommendation: For Bit with no bitIndex, decide explicitly: either require a bit suffix on Bit-typed tags (validate in CreateInstance/DiscoverAsync) or decode the full 16-bit word and test bit 0.

Resolution: Resolved 2026-05-22 — DecodeValue for Bit with no bitIndex now reads the full 16-bit word via GetInt16(0) and tests bit 0, avoiding the silent half-word truncation from GetInt8.

Driver.AbLegacy-005

Field Value
Severity Low
Category OtOpcUa conventions
Location AbLegacyDriver.cs (whole file)
Status Open

Description: The driver uses no ILogger/Serilog at all. Probe-loop failures, runtime initialisation failures, libplctag non-zero statuses, and read/write exceptions are folded into DriverHealth.Detail strings but never logged. CLAUDE.md names Serilog with a rolling daily file sink as the logging library. The complete absence of structured logging makes field diagnosis of a PCCC comms problem (timeout vs route failure vs wrong PLC family) rely entirely on a single overwritten Detail string that the next read or write immediately clobbers.

Recommendation: Inject ILogger<AbLegacyDriver> (optional, like tagFactory) and log probe transitions, runtime-init failures, and the first occurrence of a non-zero libplctag status per device.

Resolution: (open)

Driver.AbLegacy-006

Field Value
Severity High
Category Concurrency & thread safety
Location AbLegacyDriver.cs:107-158, AbLegacyDriver.cs:162-234, LibplctagLegacyTagRuntime.cs
Status Resolved

Description: A per-tag IAbLegacyTagRuntime (wrapping a single libplctag Tag) is cached in DeviceState.Runtimes and reused. ReadAsync (called directly by the server read path) and the PollGroupEngine poll loop (which also calls ReadAsync via the reader delegate) can run concurrently, and two poll subscriptions covering the same tag run on independent background tasks. All of them call EnsureTagRuntimeAsync to the same Tag instance and call runtime.ReadAsync / GetStatus / DecodeValue with no synchronisation. A libplctag Tag is not safe for concurrent operations on the same handle: an interleaved Read/GetStatus/DecodeValue from two threads can read a value mid-update or observe a status that belongs to the other operation. WriteAsync shares the same runtime dictionary and compounds the hazard. Only the bit-in-word RMW path is serialised (per-parent SemaphoreSlim).

Recommendation: Serialise all operations against a given runtime - a per-runtime SemaphoreSlim, or a per-device read lock - so no two threads touch the same Tag handle concurrently.

Resolution: Resolved 2026-05-22 — added a per-runtime SemaphoreSlim (DeviceState.GetRuntimeLock, keyed by tag name); ReadAsync and WriteAsync now hold it around the whole Read→GetStatus→Decode / Encode→Write→GetStatus sequence so the shared libplctag Tag handle is never touched by two threads at once.

Driver.AbLegacy-007

Field Value
Severity Medium
Category Concurrency & thread safety
Location AbLegacyDriver.cs:411-438, AbLegacyDriver.cs:386-409
Status Resolved

Description: EnsureTagRuntimeAsync and EnsureParentRuntimeAsync are check-then-act: device.Runtimes.TryGetValue(...) then, after await runtime.InitializeAsync, device.Runtimes[def.Name] = runtime. Dictionary is not thread-safe, and two concurrent callers for the same tag (read + poll, or two poll loops) both miss the lookup, both Create + InitializeAsync a runtime, and both write the dictionary. One runtime is overwritten and leaked - DisposeRuntimes only disposes what is currently in the dict - and concurrent Dictionary writes can corrupt internal state. ParentRuntimes has the identical pattern.

Recommendation: Replace the runtime caches with ConcurrentDictionary and use GetOrAdd, or guard runtime creation under a per-device lock. Ensure the losing runtime of any race is disposed.

Resolution: Resolved 2026-05-22 — Runtimes and ParentRuntimes changed to ConcurrentDictionary; EnsureTagRuntimeAsync and EnsureParentRuntimeAsync now hold a per-key GetCreationLock semaphore around the double-checked create+initialize+store sequence so exactly one runtime is created per key and no race-loser is leaked.

Driver.AbLegacy-008

Field Value
Severity Medium
Category Concurrency & thread safety
Location AbLegacyDriver.cs:21, AbLegacyDriver.cs:138-146, AbLegacyDriver.cs:216-229
Status Resolved

Description: _health is a plain non-volatile reference field mutated from ReadAsync, WriteAsync (both can run on multiple threads / poll loops) and InitializeAsync/ShutdownAsync, and read by GetHealth() from yet another thread. There is no lock, no volatile, and no Interlocked exchange. The record reference assignment is atomic, but without a memory barrier a reader can observe a stale _health indefinitely, and concurrent writers race so a Healthy write from one successful read can clobber a Degraded write from a concurrent failing read. GetHealth() may therefore report Healthy while reads are persistently failing.

Recommendation: Mark _health volatile, or funnel health transitions through a lock / Interlocked.Exchange. Consider only downgrading on failure and upgrading on a successful poll so a single failed read does not flap the surface.

Resolution: Resolved 2026-05-22 — _health marked volatile; memory barrier comment documents the acquire/release ordering guarantee.

Driver.AbLegacy-009

Field Value
Severity Medium
Category Error handling & resilience
Location AbLegacyDriver.cs:41-74
Status Resolved

Description: InitializeAsync starts probe loops with Task.Run inside the try block. If InitializeAsync fails - or is re-entered - after some probe loops are already started, the catch only sets _health = Faulted and rethrows; it does not cancel state.ProbeCts, dispose runtimes, or clear _devices. A caller that catches the exception and retries via ReinitializeAsync is covered (it calls ShutdownAsync first), but a caller that catches and abandons the driver leaves orphaned probe tasks and CancellationTokenSources alive holding libplctag handles. Separately, ProbeLoopAsync never escalates a permanently-unreachable device beyond Stopped.

Recommendation: On the catch path in InitializeAsync, run the same teardown as ShutdownAsync (cancel probe CTSs, dispose runtimes, clear dictionaries) before rethrowing, so a failed initialise leaves no live background work.

Resolution: Resolved 2026-05-22 — InitializeAsync catch block now cancels and disposes probe CTSs, calls DisposeRuntimes, and clears _devices/_tagsByName before rethrowing, leaving no orphaned background tasks or handles.

Driver.AbLegacy-010

Field Value
Severity Medium
Category Error handling & resilience
Location AbLegacyStatusMapper.cs:26-56
Status Open

Description: MapLibplctagStatus maps the integer codes -5/-7/-14/-16/-17. These do not match the native libplctag PLCTAG_ERR_* constants (PLCTAG_ERR_TIMEOUT = -32, PLCTAG_ERR_NOT_FOUND = -22, PLCTAG_ERR_NOT_ALLOWED = -21, PLCTAG_ERR_OUT_OF_BOUNDS = -25, PLCTAG_ERR_BAD_CONNECTION = -8). The mapper operates on (int)_tag.GetStatus(), where GetStatus() returns the libplctag .NET wrapper Status enum whose underlying ordinals differ from the native codes - so the -5/-7/... values are at best the .NET enum ordinals (unverified, undocumented) and at worst wrong. Any unmatched negative status falls through to BadCommunicationError, so a timeout is reported as a generic comms error rather than BadTimeout. MapPcccStatus is dead code - the PCCC STS byte is never inspected because libplctag surfaces only its own status enum.

Recommendation: Verify the actual libplctag.Status enum values against the 1.5.2 package and map by enum name rather than magic integers. Either wire MapPcccStatus into a real PCCC-STS path or delete it as dead code. The same defect exists in AbCipStatusMapper and should be fixed in lockstep.

Resolution: (open)

Driver.AbLegacy-011

Field Value
Severity Low
Category Performance & resource management
Location AbLegacyDriver.cs:440
Status Open

Description: Dispose() is implemented as DisposeAsync().AsTask().GetAwaiter().GetResult() - sync-over-async. ShutdownAsync awaits _poll.DisposeAsync() (which completes synchronously) and does no other real async work, so a deadlock is unlikely in practice, but the pattern blocks the calling thread and would deadlock if any awaited continuation were ever marshalled back to a single-threaded synchronization context.

Recommendation: Prefer callers use IAsyncDisposable. If a synchronous Dispose() must exist, perform the synchronous teardown directly (cancel CTSs, dispose runtimes) rather than blocking on the async path.

Resolution: (open)

Driver.AbLegacy-012

Field Value
Severity Medium
Category Design-document adherence
Location PlcFamilies/AbLegacyPlcFamilyProfile.cs:7-54, AbLegacyDriver.cs:48-52
Status Open

Description: AbLegacyPlcFamilyProfile declares four record properties - DefaultCipPath, MaxTagBytes, SupportsStringFile, SupportsLongFile - and only LibplctagPlcAttribute is ever consumed. In particular:

  • DefaultCipPath is dead: the per-family default path (empty for MicroLogix, 1,0 for SLC/PLC-5) is never used to substitute an empty CIP path. The CIP path always comes verbatim from AbLegacyHostAddress.CipPath, so a SLC 500 misconfigured with an empty path is never corrected to 1,0 even though the profile knows the right default - contradicting the test-fixture doc, which calls out the /1,0 cip-path workaround as required for SLC.
  • MaxTagBytes is never used to validate or chunk a string/array read.
  • SupportsStringFile/SupportsLongFile are never checked, so a String or Long tag configured against a MicroLogix or PLC-5 (which the profile says lack them) is accepted and only fails at runtime with an opaque comms error.

Recommendation: Either consume the profile fields (substitute DefaultCipPath when the host CIP path is empty; reject Long/String tags against families whose profile sets the corresponding flag false; use MaxTagBytes for validation) or remove the unused fields and the doc comments that imply they are load-bearing.

Resolution: (open)

Driver.AbLegacy-013

Field Value
Severity Low
Category Code organization & conventions
Location AbLegacyDriver.cs:340-345, AbLegacyDriver.cs:238-264
Status Open

Description: Two minor organisational issues:

  1. ResolveHost returns _options.Devices.FirstOrDefault()?.HostAddress ?? DriverInstanceId when the reference is unknown and no devices are configured. DriverInstanceId is not a host address (ab://...), so a downstream IHostConnectivityProbe / host lookup keyed on the returned value never matches a real device. Returning the instance id as a fake host masks a configuration error.
  2. DiscoverAsync always emits IsArray: false / ArrayDim: null. PCCC files are inherently arrays of elements; a tag that genuinely addresses a multi-element region cannot be represented. This is consistent with the PR-staged scope (the doc says array coverage is thin) but should be tracked rather than silently shipped.

Recommendation: For (1), either throw / return a sentinel the caller can detect, or document why falling back to the instance id is acceptable. For (2), record the array-addressing gap as a tracked follow-up.

Resolution: (open)