Files
lmxopcua/code-reviews/Driver.TwinCAT/findings.md
Joseph Doherty 40aa27b64b fix(driver-twincat): resolve Medium code-review finding (Driver.TwinCAT-005)
Inject optional ILogger<TwinCATDriver> (NullLogger default) and log
connect success/failure, ADS read errors, symbol-browse fallback,
native-notification registration failures, and host-state transitions.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-22 10:42:08 -04:00

22 KiB

Code Review — Driver.TwinCAT

Field Value
Module src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.TwinCAT
Reviewer Claude Code
Review date 2026-05-22
Commit reviewed 76d35d1
Status Reviewed
Open findings 11

Checklist coverage

A comprehensive review completes every category, recording "No issues found" where a category produced nothing rather than leaving it blank.

# Category Result
1 Correctness & logic bugs Driver.TwinCAT-001, Driver.TwinCAT-002, Driver.TwinCAT-003, Driver.TwinCAT-004
2 OtOpcUa conventions Driver.TwinCAT-005, Driver.TwinCAT-006
3 Concurrency & thread safety Driver.TwinCAT-007, Driver.TwinCAT-008, Driver.TwinCAT-009
4 Error handling & resilience Driver.TwinCAT-010, Driver.TwinCAT-011
5 Security No issues found
6 Performance & resource management Driver.TwinCAT-012
7 Design-document adherence Driver.TwinCAT-013, Driver.TwinCAT-014
8 Code organization & conventions Driver.TwinCAT-015
9 Testing coverage Driver.TwinCAT-016
10 Documentation & comments Driver.TwinCAT-004 (data-type comment), Driver.TwinCAT-014

Findings

Driver.TwinCAT-001

Field Value
Severity High
Category Correctness & logic bugs
Location TwinCATDriver.cs:41-78
Status Resolved

Description: InitializeAsync and ReinitializeAsync both ignore their driverConfigJson parameter entirely. InitializeAsync builds device/tag state exclusively from _options, captured once in the constructor. ReinitializeAsync calls ShutdownAsync then InitializeAsync(driverConfigJson, ...) — but since InitializeAsync never reads driverConfigJson, a ReinitializeAsync with a changed config silently re-applies the original constructor-time options. Per IDriver.ReinitializeAsync docs and docs/v2/driver-stability.md section "In-process only (Tier A/B)", Reinitialize is the only Core-initiated path to apply a new config generation without a process restart. As written, config changes (added/removed devices, tags, probe settings) to a TwinCAT driver instance are never picked up at runtime.

Recommendation: Parse driverConfigJson in InitializeAsync (reuse TwinCATDriverFactoryExtensions DTO + option-builder logic — extract it to a shared static parser) and assign the resulting options to a mutable field, rather than relying on the constructor-captured _options. Alternatively, document explicitly that the constructor is the sole config source and have the Core recreate the driver instance on config change.

Resolution: Resolved 2026-05-22 — extracted the DTO→options parse into a shared TwinCATDriverFactoryExtensions.ParseOptions; InitializeAsync re-parses driverConfigJson into a now-mutable _options field, so ReinitializeAsync applies a changed config generation.

Driver.TwinCAT-002

Field Value
Severity High
Category Correctness & logic bugs
Location TwinCATDataType.cs:34-48, AdsTwinCATClient.cs:264-281
Status Resolved

Description: TwinCATDataTypeExtensions.ToDriverDataType maps LInt and ULInt (signed/ unsigned 64-bit) to DriverDataType.Int32 (comment: "matches Int64 gap"). The address-space layer therefore creates a 32-bit OPC UA node for a 64-bit PLC value. Meanwhile AdsTwinCATClient.MapToClrType reads LInt/ULInt as long/ulong (64-bit), so the read path returns a boxed long/ulong into a node typed Int32. DriverDataType already has an Int64/UInt64 member (DriverDataType.cs:16-19), so the "gap" the comment refers to does not exist. Values above int.MaxValue are silently truncated or produce a type mismatch at the OPC UA encode layer; UDInt is also folded into Int32, so unsigned 32-bit values in the range 0x80000000 to 0xFFFFFFFF surface as negative.

Recommendation: Map LInt to Int64, ULInt to UInt64, UDInt to UInt32, UInt to UInt16, and USInt/SInt to their natural widths. Remove the stale "Int64 gap" comment.

Resolution: Resolved 2026-05-22 — ToDriverDataType now maps LInt→Int64, ULInt→UInt64, UDInt→UInt32, UInt/USInt→UInt16, Int/SInt→Int16, and the IEC time types→UInt32; removed the stale Int64-gap comment.

Driver.TwinCAT-003

Field Value
Severity Medium
Category Correctness & logic bugs
Location AdsTwinCATClient.cs:264-281, 283-300
Status Open

Description: MapToClrType has a _ => typeof(int) fallthrough and ConvertForWrite has a _ => throw NotSupportedException fallthrough. TwinCATDataType.Structure is a declared enum member, and a config-supplied tag can carry DataType: "Structure" because ParseEnum in the factory accepts any enum name case-insensitively. A Structure tag therefore reads as a 4-byte int against whatever the symbol actually is (a UDT blob) — a garbage/out-of-bounds read rather than a clean rejection — while a write fails late with NotSupportedException. Discovery ToDriverDataType maps Structure to String, compounding the inconsistency.

Recommendation: Reject Structure-typed pre-declared tags at InitializeAsync / TwinCATDriverFactoryExtensions.BuildTag time with a clear error — the driver atomic surface does not support UDT tags, and BrowseSymbolsAsync already correctly yields DataType = null for them.

Resolution: Resolved 2026-05-22 — BuildTag now parses the DataType field first and rejects TwinCATDataType.Structure with an InvalidOperationException that names the tag and explains the limitation; configuration-time failure replaces the previous silent garbage read or late NotSupportedException.

Driver.TwinCAT-004

Field Value
Severity Low
Category Correctness & logic bugs
Location TwinCATDataType.cs:24-27
Status Open

Description: The inline comments for the IEC time types are inaccurate. TwinCAT TIME is a duration (32-bit, milliseconds) — not "ms since epoch of day". DATE is stored as seconds since 1970-01-01 (truncated to a day boundary), not "days since 1970-01-01". These types are also all read/written as raw uint and mapped to DriverDataType.Int32 — the operator sees a raw counter, not a usable date/duration. Misleading comments will mislead the next implementer who tries to add proper conversion.

Recommendation: Correct the comments to match the TwinCAT/IEC 61131-3 representation. If date/time semantics are intended to be exposed properly, track a follow-up to decode them to DriverDataType.DateTime; otherwise document that they surface as raw counters.

Resolution: (open)

Driver.TwinCAT-005

Field Value
Severity Medium
Category OtOpcUa conventions
Location TwinCATDriver.cs (whole file), AdsTwinCATClient.cs (whole file)
Status Resolved

Description: The driver performs no logging. CLAUDE.md Library Preferences mandate Serilog with a rolling daily file sink. Connect failures, ADS error codes, symbol-browse failures (DiscoverAsync swallows them in a bare catch), notification-registration failures, and probe state transitions all vanish into status fields or are swallowed silently. Operators get a Degraded health string with no correlatable log trail.

Recommendation: Inject an ILogger/Serilog logger and log at minimum: connect success/failure per device, ADS errors with code, symbol-browse fallback (the DiscoverAsync catch), native-notification registration failures, and host state transitions (TransitionDeviceState).

Resolution: Resolved 2026-05-22 — added optional ILogger<TwinCATDriver> constructor parameter (defaults to NullLogger); logs connect success/failure in EnsureConnectedAsync, ADS read errors in ReadAsync, symbol-browse fallback in DiscoverAsync, notification-registration failures in SubscribeAsync, and host-state transitions in TransitionDeviceState.

Driver.TwinCAT-006

Field Value
Severity Low
Category OtOpcUa conventions
Location TwinCATDriver.cs:406-411
Status Open

Description: ResolveHost falls back to DriverInstanceId when there are no configured devices and the reference is unknown. DriverInstanceId is a logical config-DB identifier, not a host address; IPerCallHostResolver consumers expect a host key that correlates with GetHostStatuses() entries (HostConnectivityStatus.HostName equals device.Options.HostAddress). Returning the instance ID produces a host key that matches no connectivity-status row.

Recommendation: Return a stable sentinel that will not be confused with a real host (an empty string or a documented unresolved marker), or document why the instance ID is the chosen fallback. Prefer the first device HostAddress only when one exists (already done).

Resolution: (open)

Driver.TwinCAT-007

Field Value
Severity High
Category Concurrency & thread safety
Location TwinCATDriver.cs:413-429
Status Resolved

Description: EnsureConnectedAsync is not thread-safe. ReadAsync, WriteAsync, SubscribeAsync, and the per-device ProbeLoopAsync background task can all call it concurrently for the same DeviceState. The sequence device.Client ??= _clientFactory.Create() followed by await device.Client.ConnectAsync(...) has no lock: two threads can both observe device.Client null-or-disconnected, each create/connect a client, and one AdsTwinCATClient is leaked (its AdsClient + AdsNotificationEx handler never disposed). Worse, on the connect-failure path one thread does device.Client.Dispose(); device.Client = null; while another thread is mid-ConnectAsync on that same client instance — a disposal race that can throw ObjectDisposedException or corrupt the AdsClient. The probe loop runs continuously, so this race is not hypothetical under any concurrent read/write load.

Recommendation: Guard EnsureConnectedAsync per-device with a SemaphoreSlim (one per DeviceState), or use an async-lazy connect with proper invalidation. The S7/AB-CIP drivers serialize device access with a SemaphoreSlim — follow that pattern. Note this also serializes the wire, which docs/v2/driver-specs.md recommends for single-connection-per-PLC.

Resolution: Resolved 2026-05-22 — EnsureConnectedAsync is now serialized per device by a SemaphoreSlim (DeviceState.ConnectGate) with a double-checked connect, mirroring the S7 driver; no client is leaked and no disposal race remains.

Driver.TwinCAT-008

Field Value
Severity High
Category Concurrency & thread safety
Location AdsTwinCATClient.cs:162-169, TwinCATDriver.cs:319-324
Status Resolved

Description: Native ADS notification callbacks (OnAdsNotificationEx) run on the AdsClient AMS router thread. docs/v2/driver-specs.md section 6 explicitly calls this out as a code-review checklist item: "Callbacks must marshal to a managed work queue immediately (no driver logic on the router thread) — blocking the router thread blocks every ADS notification across the process." The current path invokes reg.OnChange(...) directly on the router thread, and OnChange is the driver lambda that calls OnDataChange?.Invoke(this, ...) — i.e. every downstream Core/OPC UA subscriber handler executes synchronously on the AMS router thread. A single slow consumer stalls ADS notification delivery for every tag on every device in the process.

Recommendation: Marshal notification values onto a bounded Channel/work queue drained by a dedicated managed task before invoking OnChange/OnDataChange, exactly as the Galaxy EventPump does. Keep the router-thread callback to a non-blocking enqueue only.

Resolution: Resolved 2026-05-22 — AdsTwinCATClient now enqueues native AdsNotificationEx callbacks onto a bounded Channel drained by a dedicated managed task; the AMS router thread only does a non-blocking TryWrite, so a slow consumer cannot stall ADS delivery.

Driver.TwinCAT-009

Field Value
Severity Medium
Category Concurrency & thread safety
Location TwinCATDriver.cs:80-99, 41-72, 366-388
Status Open

Description: ShutdownAsync mutates _devices, _tagsByName, and _nativeSubs with no synchronization while ReadAsync/WriteAsync/SubscribeAsync may be iterating or indexing those same plain Dictionary<> instances on other threads (_devices and _tagsByName are non-concurrent dictionaries). ShutdownAsync calls _devices.Clear()/_tagsByName.Clear() concurrently with _devices.TryGetValue in a read — Dictionary<> is not safe for concurrent read+write and can throw or corrupt internal state. ReinitializeAsync makes this worse: it runs ShutdownAsync then InitializeAsync (which re-populates the same dictionaries) while in-flight reads continue. The probe loop EnsureConnectedAsync also touches DeviceState objects that ShutdownAsync is disposing — ShutdownAsync cancels ProbeCts but does not await the probe task before calling DisposeClient().

Recommendation: Either swap _devices/_tagsByName to ConcurrentDictionary and snapshot them on rebuild, or introduce a lifecycle lock / volatile running guard so reads fail fast with BadServerHalted/BadNodeIdUnknown once shutdown begins. Cancel and await the probe tasks before disposing DeviceStates.

Resolution: (open)

Driver.TwinCAT-010

Field Value
Severity Medium
Category Error handling & resilience
Location AdsTwinCATClient.cs:178-195
Status Open

Description: BrowseSymbolsAsync checks cancellationToken.IsCancellationRequested and does yield break (a clean completion) rather than throwing OperationCanceledException. DiscoverAsync (TwinCATDriver.cs:274) explicitly has catch (OperationCanceledException) { throw; } to propagate cancellation distinctly from a genuine browse failure. Because the client never throws on cancellation, a cancelled discovery silently completes as if the symbol table were fully and successfully walked — the address space is built from a partial symbol set with no indication it was truncated. The SymbolLoaderFactory.Create / loader.Symbols enumeration itself is also not cancellable.

Recommendation: Call cancellationToken.ThrowIfCancellationRequested() instead of yield break so a cancelled browse surfaces as cancellation, not as a successful but partial discovery.

Resolution: (open)

Driver.TwinCAT-011

Field Value
Severity Medium
Category Error handling & resilience
Location TwinCATStatusMapper.cs:29-42
Status Open

Description: ADS error-code mapping has gaps and an inconsistency versus docs/v2/driver-specs.md section 6. The spec documents symbol-not-found as 0x0701 (1793 decimal) and symbol-version-changed as 0x0702 (1794 decimal). MapAdsError maps decimal 1798 to BadNodeIdUnknown (symbol not found) and 1793/1794 to BadOutOfRange (invalid index group/offset). The decimal-vs-hex interpretation of the documented codes does not line up, so the mapper appears to treat the symbol-version-changed code as a generic range error. 0x0710 "Not ready / PLC in Config mode" has no entry and falls through to BadCommunicationError; the driver-spec recommends distinguishing it. And 0x0702 symbol-version-changed is never routed to rediscovery (see Driver.TwinCAT-013).

Recommendation: Confirm the actual AdsErrorCode numeric values from Beckhoff.TwinCAT.Ads (the SDK enum, not the doc hex shorthand) and align the mapper. Add an explicit case for symbol-version-changed routed to rediscovery, and for PLC-in-Config mapped to BadOutOfService/BadInvalidState.

Resolution: (open)

Driver.TwinCAT-012

Field Value
Severity Medium
Category Performance & resource management
Location TwinCATDriver.cs:102, AdsTwinCATClient.cs:178-195
Status Open

Description: GetMemoryFootprint() returns a hard-coded 0. docs/v2/driver-stability.md section "In-process only (Tier A/B) — driver-instance allocation tracking" requires the footprint to reflect "bytes attributable to their own caches (symbol cache, subscription items, queued operations)", and section 6 of driver-specs.md explicitly identifies cached symbol info as "the largest in-driver allocation" for TwinCAT and ties FlushOptionalCachesAsync to flushing it. Reporting 0 means Core allocation-slope detection and cache-budget enforcement are blind to this driver, and FlushOptionalCachesAsync is a no-op. (Note: the current BrowseSymbolsAsync does not retain a symbol cache — it streams and discards — so re-discovery re-downloads the whole symbol table each time, itself a performance concern for EnableControllerBrowse deployments.)

Recommendation: Either implement an actual symbol cache + report its size via GetMemoryFootprint() and flush it in FlushOptionalCachesAsync, or, if the stream-and-discard design is intentional, report the real footprint of _nativeSubs / _tagsByName and document that the driver holds no flushable cache.

Resolution: (open)

Driver.TwinCAT-013

Field Value
Severity High
Category Design-document adherence
Location TwinCATDriver.cs:11-12 (capability list), whole file
Status Resolved

Description: TwinCATDriver does not implement IRediscoverable. Both docs/v2/driver-specs.md section 6 and docs/v2/driver-stability.md section "TwinCAT — Deep Dive" state this as the defining TwinCAT failure mode: "Symbol-version-changed (0x0702) is the unique TwinCAT failure mode... The driver must catch 0x0702, mark its symbol cache invalid, re-upload symbols, rebuild the address space subtree... Treat this as a IRediscoverable invocation, not as a connection error." The IRediscoverable XML doc names TwinCAT symbol-version-changed as a canonical example. The current driver maps the error to a generic BadOutOfRange/BadCommunicationError quality code and never re-runs discovery, so after a PLC program re-download every symbol handle and notification silently goes stale with no address-space rebuild.

Recommendation: Implement IRediscoverable; detect the symbol-version-changed ADS error on read/write/notification paths, raise OnRediscoveryNeeded with a scoped reason string, and re-establish native notifications after the Core re-runs DiscoverAsync. This is explicitly part of the documented driver contract, not optional.

Resolution: Resolved 2026-05-22 — TwinCATDriver implements IRediscoverable; AdsTwinCATClient detects ADS 0x0702 on read/write paths and raises OnSymbolVersionChanged, which the driver forwards as OnRediscoveryNeeded so Core rebuilds the address space.

Driver.TwinCAT-014

Field Value
Severity Low
Category Design-document adherence
Location TwinCATDriverOptions.cs:41-43, TwinCATDriverOptions.cs:57-62, AdsTwinCATClient.cs:145
Status Open

Description: Several drifts between the implemented config surface and docs/v2/driver-specs.md section 6. The spec connection-settings list has separate Host (IP), AmsNetId, and AmsPort fields; the implementation collapses these into a single HostAddress string parsed as ads://{netId}:{port}, so the target device IP has no home field. TwinCATProbeOptions.Timeout (TwinCATDriverOptions.cs:61) is never read anywhere — the probe path connects via _options.Timeout — a dead config field. The spec lists NotificationMaxDelayMs; the code hard-codes max-delay 0 in NotificationSettings (AdsTwinCATClient.cs:145) with no config knob.

Recommendation: Reconcile the driver-spec doc with the implemented TwinCATDriverOptions shape (the doc is DRAFT, so updating it is acceptable). Remove or wire up TwinCATProbeOptions.Timeout. Expose NotificationMaxDelayMs if batching control is wanted.

Resolution: (open)

Driver.TwinCAT-015

Field Value
Severity Low
Category Code organization & conventions
Location TwinCATDriver.cs:431-432
Status Open

Description: Dispose() runs DisposeAsync().AsTask().GetAwaiter().GetResult() — sync-over-async. docs/v2/driver-stability.md section Galaxy explicitly lists "sync-over-async on the OPC UA stack thread" among the four 2026-04-13 stability findings that had to be closed. DisposeAsync calls ShutdownAsync, which awaits _poll.DisposeAsync() and disposes clients; if Dispose() is ever called on a thread with a single-threaded synchronization context (the OPC UA stack), GetResult() can deadlock.

Recommendation: Make Dispose() perform a genuinely synchronous teardown. The operations here — cancelling token sources, disposing clients, clearing dictionaries — are all synchronous, and PollGroupEngine.DisposeAsync completes synchronously, so factor the synchronous teardown out so Dispose() does not block on a Task.

Resolution: (open)

Driver.TwinCAT-016

Field Value
Severity Low
Category Testing coverage
Location tests/Drivers/ZB.MOM.WW.OtOpcUa.Driver.TwinCAT.Tests/
Status Open

Description: Unit coverage exists for AMS-address parsing, symbol-path parsing, read/write, native notifications, symbol browse, and the capability surface. Gaps tied to the findings above: no test exercises ReinitializeAsync with a changed config (Driver.TwinCAT-001 would have been caught); no concurrency test drives ReadAsync/WriteAsync/probe against one device simultaneously (Driver.TwinCAT-007/009); no test covers the symbol-version-changed to rediscovery path (Driver.TwinCAT-013, currently unimplemented); no test covers a Structure- typed pre-declared tag (Driver.TwinCAT-003); no test asserts 64-bit LInt/ULInt round-trip without truncation (Driver.TwinCAT-002).

Recommendation: Add unit tests for the above paths once the corresponding findings are addressed, especially a concurrency stress test for EnsureConnectedAsync and a ReinitializeAsync-applies-new-config test.

Resolution: (open)