Files
lmxopcua/code-reviews/Client.Shared/findings.md
Joseph Doherty 7e54e1e4a0 fix(client-shared): resolve Medium code-review findings (Client.Shared-001, -002, -007, -008)
Client.Shared-001: lowered the OnAlarmEventNotification early-return guard
from <6 to <1; per-index field guards already default missing fields safely.
Client.Shared-002: GetRedundancyInfoAsync replaces unguarded unboxing casts
with StatusCode.IsGood + Convert.ToInt32/ToByte, defaulting on bad reads.
Client.Shared-007: alarm fallback Task.Run guards on ReferenceEquals(session,
_session) and drops stale alarms on ObjectDisposedException after failover.
Client.Shared-008: WriteValueAsync rejects type inference from bad/null reads;
ValueConverter wraps parse failures in a descriptive FormatException.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 08:11:23 -04:00

15 KiB

Code Review — Client.Shared

Field Value
Module src/Client/ZB.MOM.WW.OtOpcUa.Client.Shared
Reviewer Claude Code
Review date 2026-05-22
Commit reviewed 76d35d1
Status Reviewed
Open findings 5

Checklist coverage

# Category Result
1 Correctness & logic bugs Client.Shared-001, Client.Shared-002, Client.Shared-003
2 OtOpcUa conventions Client.Shared-004
3 Concurrency & thread safety Client.Shared-005, Client.Shared-006, Client.Shared-007
4 Error handling & resilience Client.Shared-008, Client.Shared-009
5 Security No issues found
6 Performance & resource management Client.Shared-010
7 Design-document adherence No issues found
8 Code organization & conventions No issues found
9 Testing coverage Client.Shared-011
10 Documentation & comments Client.Shared-009

Findings

Client.Shared-001

Field Value
Severity Medium
Category Correctness & logic bugs
Location OpcUaClientService.cs:552
Status Resolved

Description: OnAlarmEventNotification returns early when eventFields.EventFields has fewer than 6 entries. The event filter built by CreateAlarmEventFilter always registers 13 select clauses, so a conforming server returns 13 fields. The < 6 threshold is arbitrary and inconsistent: SourceName is index 2 and Severity index 5, but ConditionName (6), Retain (7), Acked/Active (8/9) and ConditionNodeId (12) are all needed for a usable alarm and are each guarded individually with fields.Count > N. A non-conforming server that returns a truncated list (or fewer fields than requested) makes the < 6 early return silently drop the entire notification, including the EventId/SourceName/Severity that are present.

Recommendation: Drop the < 6 early return (or lower it to < 1) and rely on the existing per-index fields.Count > N guards, which already default missing fields safely. If a hard floor is wanted, document why 6 and not 13.

Resolution: Resolved 2026-05-22 — lowered the early-return threshold to < 1 (null or empty guard only); per-index fields.Count > N guards already default missing fields safely for all higher indices.

Client.Shared-002

Field Value
Severity Medium
Category Correctness & logic bugs
Location OpcUaClientService.cs:351-355, OpcUaClientService.cs:373
Status Resolved

Description: GetRedundancyInfoAsync performs unguarded unboxing casts on values read from the server: (int)redundancySupportValue.Value and (byte)serviceLevelValue.Value. Unlike the ServerUriArray/ServerArray reads below them, the RedundancySupport and ServiceLevel reads are not wrapped in try/catch. If the server returns the value boxed as a different numeric type than expected (e.g. ServiceLevel boxed as int instead of byte), or returns a null Value on a Bad DataValue, the cast throws InvalidCastException/NullReferenceException and the whole call fails instead of returning a sensible default.

Recommendation: Wrap the RedundancySupport and ServiceLevel reads in the same defensive pattern used for the array reads, using Convert.ToInt32/Convert.ToByte on the boxed value and falling back to None/0 when the read status is bad or the value is null.

Resolution: Resolved 2026-05-22 — replaced direct casts with StatusCode.IsGood guard + Convert.ToInt32/Convert.ToByte coercion; falls back to None/0 when status is bad or value is null.

Client.Shared-003

Field Value
Severity Low
Category Correctness & logic bugs
Location Adapters/DefaultSessionAdapter.cs:76, Adapters/DefaultSessionAdapter.cs:273
Status Open

Description: WriteValueAsync returns response.Results[0] and CallMethodAsync reads result.Results[0] without first checking the Results collection is non-empty. A malformed or service-level-faulted response (empty Results alongside a service fault) produces an IndexOutOfRangeException rather than a meaningful OPC UA StatusCode or ServiceResultException.

Recommendation: Guard both accesses — throw ServiceResultException with the response's ResponseHeader.ServiceResult (or BadUnexpectedError) when Results is empty.

Resolution: (open)

Client.Shared-004

Field Value
Severity Low
Category OtOpcUa conventions
Location Adapters/DefaultSessionAdapter.cs:228, Adapters/DefaultSessionAdapter.cs:121, Adapters/DefaultSessionAdapter.cs:172
Status Open

Description: CloseAsync, HistoryReadRawAsync, and HistoryReadAggregateAsync are declared async Task but call the synchronous Session.Close() / Session.HistoryRead(...) APIs and contain no await. The history methods run a blocking synchronous service round-trip on the caller's thread; for the UI this blocks the dispatcher thread. The async signature misleads callers, and the CancellationToken parameter is ignored on these paths.

Recommendation: Use the stack's async overloads (Session.HistoryReadAsync, Session.CloseAsync) where available, or wrap the synchronous calls in Task.Run, so the methods are genuinely asynchronous and honor the cancellation token.

Resolution: (open)

Client.Shared-005

Field Value
Severity High
Category Concurrency & thread safety
Location OpcUaClientService.cs:19, OpcUaClientService.cs:226-249, OpcUaClientService.cs:499-521
Status Resolved

Description: _activeDataSubscriptions is a plain Dictionary mutated from at least three thread contexts with no synchronization: the caller thread (SubscribeAsync/UnsubscribeAsync), the keep-alive callback thread (HandleKeepAliveFailureAsync -> ReplaySubscriptionsAsync, invoked fire-and-forget from the OPC UA KeepAlive event), and DisconnectAsync. Concurrent Add/Remove/Clear/enumeration on a non-thread-safe Dictionary can corrupt its internal buckets, throw InvalidOperationException, or lose entries. A failover firing while the UI calls SubscribeAsync is a realistic trigger. The _activeAlarmSubscription nullable tuple has the same exposure.

Recommendation: Guard all access to _activeDataSubscriptions / _activeAlarmSubscription (and the _session/_dataSubscription/_alarmSubscription fields) with a single lock, or move subscription bookkeeping behind a ConcurrentDictionary plus a lock for the multi-field failover transition.

Resolution: Resolved 2026-05-22 — added a dedicated _subscriptionLock and wrapped every read/write of _activeDataSubscriptions and _activeAlarmSubscription (in Subscribe/Unsubscribe[Alarms]Async, Disconnect, Dispose, and the snapshot/clear/re-record steps of ReplaySubscriptionsAsync) inside it; awaited adapter calls run outside the lock to avoid holding it across I/O.

Client.Shared-006

Field Value
Severity High
Category Concurrency & thread safety
Location OpcUaClientService.cs:97-100, OpcUaClientService.cs:432-497
Status Resolved

Description: HandleKeepAliveFailureAsync is launched fire-and-forget (_ = HandleKeepAliveFailureAsync()) from every bad keep-alive callback. The only guard against re-entry is the non-atomic check if (_state == Reconnecting || _state == Disconnected) return; at the top. Between that read and the TransitionState(Reconnecting, ...) write a few lines later, a second keep-alive failure (the SDK raises KeepAlive repeatedly while a session is down) can pass the same guard, and two failover loops run concurrently — each disposing _session, nulling subscription fields, and racing to assign a new _session. The session created by the loser leaks, and ReplaySubscriptionsAsync can run twice creating duplicate monitored items.

Recommendation: Serialize failover with an Interlocked.CompareExchange flag or a SemaphoreSlim(1,1) so only one failover loop runs at a time; subsequent keep-alive failures during an in-flight failover should be ignored. Make the state transition atomic with the re-entry guard.

Resolution: Resolved 2026-05-22 — HandleKeepAliveFailureAsync now claims an atomic _failoverInProgress slot via Interlocked.CompareExchange(ref _failoverInProgress, 1, 0); a re-entrant bad keep-alive sees 1 and returns immediately, so only one failover loop runs. The loop body moved to RunFailoverAsync, wrapped in try/finally that resets the flag with Interlocked.Exchange.

Client.Shared-007

Field Value
Severity Medium
Category Concurrency & thread safety
Location OpcUaClientService.cs:581-622
Status Resolved

Description: In the alarm fallback path, the Task.Run closure mutates the captured locals activeState, ackedState, time, and capturedMessage, then reads them when invoking AlarmEvent. Because the captured _session reference can be replaced by a concurrent failover (see Client.Shared-006), the supplemental ReadValueAsync calls may run against a session being disposed, throwing ObjectDisposedException — caught by the bare catch, after which the alarm is delivered with default (false/MinValue) states, silently misreporting it as inactive/unacknowledged. The notification callback also has no back-pressure: a burst of alarm events spawns an unbounded number of Task.Run continuations each doing 3-4 server round-trips.

Recommendation: Capture the session under the same lock proposed in Client.Shared-005 and skip the supplemental read if the session has changed or is disposed. Consider batching the four sequential ReadValueAsync calls into one Read request.

Resolution: Resolved 2026-05-22 — added a ReferenceEquals(session, _session) guard at the top of the Task.Run body to skip reads if the session was replaced by failover; separated ObjectDisposedException from the general catch to drop rather than deliver the stale alarm.

Client.Shared-008

Field Value
Severity Medium
Category Error handling & resilience
Location OpcUaClientService.cs:170-180, Helpers/ValueConverter.cs:15-31
Status Resolved

Description: WriteValueAsync coerces a string input to the target type by reading the node's current value and inferring the type from currentDataValue.Value. When the node has never been written, or the read returns a Bad status with a null Value, ValueConverter.ConvertValue falls through to the _ => rawValue default and writes a raw string into, for example, an Int32 node — the server then rejects it with BadTypeMismatch, surfacing as a confusing failure unrelated to the operator's input. Separately, ConvertValue uses bool.Parse, which accepts only true/false — operator input of 1/0 throws FormatException that propagates raw to the caller. The read-before-write also doubles the round-trip cost of every string write.

Recommendation: Inspect currentDataValue.StatusCode before trusting Value; when the type cannot be inferred, surface a clear error rather than writing a mistyped value. Make boolean parsing accept 1/0/yes/no, and wrap parse failures in a descriptive exception naming the node and target type.

Resolution: Resolved 2026-05-22 — WriteValueAsync now checks StatusCode.IsGood and non-null Value before calling ConvertValue, throwing a descriptive InvalidOperationException on bad reads; ValueConverter now uses a ParseBool helper accepting true/false/1/0/yes/no (case-insensitive) and wraps all parse/overflow failures in a FormatException with the target type and input value in the message.

Client.Shared-009

Field Value
Severity Low
Category Error handling & resilience / Documentation & comments
Location OpcUaClientService.cs:302-322
Status Open

Description: AcknowledgeAlarmAsync is typed Task<StatusCode> and its XML doc implies the returned code reports the ack outcome, but the method unconditionally return StatusCodes.Good. The actual failure path is DefaultSessionAdapter.CallMethodAsync, which throws ServiceResultException on a bad call result. A failed acknowledgment therefore never returns a bad StatusCode — it throws — and the StatusCode return value is dead. Callers writing if (StatusCode.IsBad(result)) will never see a bad result and will not catch the exception.

Recommendation: Either change the return type to Task (and let exceptions signal failure), or catch ServiceResultException in AcknowledgeAlarmAsync and return its StatusCode. Update the XML doc to match whichever is chosen.

Resolution: (open)

Client.Shared-010

Field Value
Severity Low
Category Performance & resource management
Location Models/ConnectionSettings.cs:48, OpcUaClientService.cs:408-417
Status Open

Description: ConnectionSettings.CertificateStorePath is initialized to ClientStoragePaths.GetPkiPath() as a property initializer, so every ConnectionSettings instantiation runs Environment.GetFolderPath + Path.Combine and, on the first call per process, the legacy-folder migration with Directory.Exists/Directory.Move filesystem IO. ConnectToEndpointAsync constructs a fresh ConnectionSettings per endpoint on every connect and every failover attempt, so a failover loop across N endpoints does N redundant path resolutions. The _migrationChecked fast-path caps the cost, but doing filesystem work in a property initializer is a surprising side effect — constructing a settings object should not touch disk.

Recommendation: Make CertificateStorePath default to string.Empty and resolve ClientStoragePaths.GetPkiPath() lazily inside DefaultApplicationConfigurationFactory.CreateAsync only when the path is unset.

Resolution: (open)

Client.Shared-011

Field Value
Severity Low
Category Testing coverage
Location tests/Client/ZB.MOM.WW.OtOpcUa.Client.Shared.Tests/OpcUaClientServiceTests.cs
Status Open

Description: The test suite is solid for the happy paths, connection lifecycle, and single-failover behavior. Gaps relative to the findings above: (a) no test exercises concurrent SubscribeAsync/failover to expose the _activeDataSubscriptions race (Client.Shared-005) or re-entrant keep-alive failures (Client.Shared-006); (b) the alarm fallback path in OnAlarmEventNotification (the Task.Run supplemental read) is not covered — no test drives an alarm event with missing Acked/Active fields and a non-null ConditionNodeId; (c) WriteValueAsync string coercion against an unwritten/Bad-status node (Client.Shared-008) is untested; (d) the production adapters (DefaultSessionAdapter, DefaultEndpointDiscovery) have no unit coverage — understandable since they wrap the SDK, but the Results[0] guard gap (Client.Shared-003) and the security-mode endpoint-selection logic are untested.

Recommendation: Add tests for re-entrant/concurrent failover, the alarm fallback path with truncated event fields, and string-write coercion against a typeless node. Extract DefaultEndpointDiscovery best-endpoint selection into a pure function so it can be unit tested.

Resolution: (open)