Client.Shared-001: lowered the OnAlarmEventNotification early-return guard from <6 to <1; per-index field guards already default missing fields safely. Client.Shared-002: GetRedundancyInfoAsync replaces unguarded unboxing casts with StatusCode.IsGood + Convert.ToInt32/ToByte, defaulting on bad reads. Client.Shared-007: alarm fallback Task.Run guards on ReferenceEquals(session, _session) and drops stale alarms on ObjectDisposedException after failover. Client.Shared-008: WriteValueAsync rejects type inference from bad/null reads; ValueConverter wraps parse failures in a descriptive FormatException. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
15 KiB
Code Review — Client.Shared
| Field | Value |
|---|---|
| Module | src/Client/ZB.MOM.WW.OtOpcUa.Client.Shared |
| Reviewer | Claude Code |
| Review date | 2026-05-22 |
| Commit reviewed | 76d35d1 |
| Status | Reviewed |
| Open findings | 5 |
Checklist coverage
| # | Category | Result |
|---|---|---|
| 1 | Correctness & logic bugs | Client.Shared-001, Client.Shared-002, Client.Shared-003 |
| 2 | OtOpcUa conventions | Client.Shared-004 |
| 3 | Concurrency & thread safety | Client.Shared-005, Client.Shared-006, Client.Shared-007 |
| 4 | Error handling & resilience | Client.Shared-008, Client.Shared-009 |
| 5 | Security | No issues found |
| 6 | Performance & resource management | Client.Shared-010 |
| 7 | Design-document adherence | No issues found |
| 8 | Code organization & conventions | No issues found |
| 9 | Testing coverage | Client.Shared-011 |
| 10 | Documentation & comments | Client.Shared-009 |
Findings
Client.Shared-001
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Correctness & logic bugs |
| Location | OpcUaClientService.cs:552 |
| Status | Resolved |
Description: OnAlarmEventNotification returns early when eventFields.EventFields has fewer than 6 entries. The event filter built by CreateAlarmEventFilter always registers 13 select clauses, so a conforming server returns 13 fields. The < 6 threshold is arbitrary and inconsistent: SourceName is index 2 and Severity index 5, but ConditionName (6), Retain (7), Acked/Active (8/9) and ConditionNodeId (12) are all needed for a usable alarm and are each guarded individually with fields.Count > N. A non-conforming server that returns a truncated list (or fewer fields than requested) makes the < 6 early return silently drop the entire notification, including the EventId/SourceName/Severity that are present.
Recommendation: Drop the < 6 early return (or lower it to < 1) and rely on the existing per-index fields.Count > N guards, which already default missing fields safely. If a hard floor is wanted, document why 6 and not 13.
Resolution: Resolved 2026-05-22 — lowered the early-return threshold to < 1 (null or empty guard only); per-index fields.Count > N guards already default missing fields safely for all higher indices.
Client.Shared-002
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Correctness & logic bugs |
| Location | OpcUaClientService.cs:351-355, OpcUaClientService.cs:373 |
| Status | Resolved |
Description: GetRedundancyInfoAsync performs unguarded unboxing casts on values read from the server: (int)redundancySupportValue.Value and (byte)serviceLevelValue.Value. Unlike the ServerUriArray/ServerArray reads below them, the RedundancySupport and ServiceLevel reads are not wrapped in try/catch. If the server returns the value boxed as a different numeric type than expected (e.g. ServiceLevel boxed as int instead of byte), or returns a null Value on a Bad DataValue, the cast throws InvalidCastException/NullReferenceException and the whole call fails instead of returning a sensible default.
Recommendation: Wrap the RedundancySupport and ServiceLevel reads in the same defensive pattern used for the array reads, using Convert.ToInt32/Convert.ToByte on the boxed value and falling back to None/0 when the read status is bad or the value is null.
Resolution: Resolved 2026-05-22 — replaced direct casts with StatusCode.IsGood guard + Convert.ToInt32/Convert.ToByte coercion; falls back to None/0 when status is bad or value is null.
Client.Shared-003
| Field | Value |
|---|---|
| Severity | Low |
| Category | Correctness & logic bugs |
| Location | Adapters/DefaultSessionAdapter.cs:76, Adapters/DefaultSessionAdapter.cs:273 |
| Status | Open |
Description: WriteValueAsync returns response.Results[0] and CallMethodAsync reads result.Results[0] without first checking the Results collection is non-empty. A malformed or service-level-faulted response (empty Results alongside a service fault) produces an IndexOutOfRangeException rather than a meaningful OPC UA StatusCode or ServiceResultException.
Recommendation: Guard both accesses — throw ServiceResultException with the response's ResponseHeader.ServiceResult (or BadUnexpectedError) when Results is empty.
Resolution: (open)
Client.Shared-004
| Field | Value |
|---|---|
| Severity | Low |
| Category | OtOpcUa conventions |
| Location | Adapters/DefaultSessionAdapter.cs:228, Adapters/DefaultSessionAdapter.cs:121, Adapters/DefaultSessionAdapter.cs:172 |
| Status | Open |
Description: CloseAsync, HistoryReadRawAsync, and HistoryReadAggregateAsync are declared async Task but call the synchronous Session.Close() / Session.HistoryRead(...) APIs and contain no await. The history methods run a blocking synchronous service round-trip on the caller's thread; for the UI this blocks the dispatcher thread. The async signature misleads callers, and the CancellationToken parameter is ignored on these paths.
Recommendation: Use the stack's async overloads (Session.HistoryReadAsync, Session.CloseAsync) where available, or wrap the synchronous calls in Task.Run, so the methods are genuinely asynchronous and honor the cancellation token.
Resolution: (open)
Client.Shared-005
| Field | Value |
|---|---|
| Severity | High |
| Category | Concurrency & thread safety |
| Location | OpcUaClientService.cs:19, OpcUaClientService.cs:226-249, OpcUaClientService.cs:499-521 |
| Status | Resolved |
Description: _activeDataSubscriptions is a plain Dictionary mutated from at least three thread contexts with no synchronization: the caller thread (SubscribeAsync/UnsubscribeAsync), the keep-alive callback thread (HandleKeepAliveFailureAsync -> ReplaySubscriptionsAsync, invoked fire-and-forget from the OPC UA KeepAlive event), and DisconnectAsync. Concurrent Add/Remove/Clear/enumeration on a non-thread-safe Dictionary can corrupt its internal buckets, throw InvalidOperationException, or lose entries. A failover firing while the UI calls SubscribeAsync is a realistic trigger. The _activeAlarmSubscription nullable tuple has the same exposure.
Recommendation: Guard all access to _activeDataSubscriptions / _activeAlarmSubscription (and the _session/_dataSubscription/_alarmSubscription fields) with a single lock, or move subscription bookkeeping behind a ConcurrentDictionary plus a lock for the multi-field failover transition.
Resolution: Resolved 2026-05-22 — added a dedicated _subscriptionLock and wrapped every read/write of _activeDataSubscriptions and _activeAlarmSubscription (in Subscribe/Unsubscribe[Alarms]Async, Disconnect, Dispose, and the snapshot/clear/re-record steps of ReplaySubscriptionsAsync) inside it; awaited adapter calls run outside the lock to avoid holding it across I/O.
Client.Shared-006
| Field | Value |
|---|---|
| Severity | High |
| Category | Concurrency & thread safety |
| Location | OpcUaClientService.cs:97-100, OpcUaClientService.cs:432-497 |
| Status | Resolved |
Description: HandleKeepAliveFailureAsync is launched fire-and-forget (_ = HandleKeepAliveFailureAsync()) from every bad keep-alive callback. The only guard against re-entry is the non-atomic check if (_state == Reconnecting || _state == Disconnected) return; at the top. Between that read and the TransitionState(Reconnecting, ...) write a few lines later, a second keep-alive failure (the SDK raises KeepAlive repeatedly while a session is down) can pass the same guard, and two failover loops run concurrently — each disposing _session, nulling subscription fields, and racing to assign a new _session. The session created by the loser leaks, and ReplaySubscriptionsAsync can run twice creating duplicate monitored items.
Recommendation: Serialize failover with an Interlocked.CompareExchange flag or a SemaphoreSlim(1,1) so only one failover loop runs at a time; subsequent keep-alive failures during an in-flight failover should be ignored. Make the state transition atomic with the re-entry guard.
Resolution: Resolved 2026-05-22 — HandleKeepAliveFailureAsync now claims an atomic _failoverInProgress slot via Interlocked.CompareExchange(ref _failoverInProgress, 1, 0); a re-entrant bad keep-alive sees 1 and returns immediately, so only one failover loop runs. The loop body moved to RunFailoverAsync, wrapped in try/finally that resets the flag with Interlocked.Exchange.
Client.Shared-007
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Concurrency & thread safety |
| Location | OpcUaClientService.cs:581-622 |
| Status | Resolved |
Description: In the alarm fallback path, the Task.Run closure mutates the captured locals activeState, ackedState, time, and capturedMessage, then reads them when invoking AlarmEvent. Because the captured _session reference can be replaced by a concurrent failover (see Client.Shared-006), the supplemental ReadValueAsync calls may run against a session being disposed, throwing ObjectDisposedException — caught by the bare catch, after which the alarm is delivered with default (false/MinValue) states, silently misreporting it as inactive/unacknowledged. The notification callback also has no back-pressure: a burst of alarm events spawns an unbounded number of Task.Run continuations each doing 3-4 server round-trips.
Recommendation: Capture the session under the same lock proposed in Client.Shared-005 and skip the supplemental read if the session has changed or is disposed. Consider batching the four sequential ReadValueAsync calls into one Read request.
Resolution: Resolved 2026-05-22 — added a ReferenceEquals(session, _session) guard at the top of the Task.Run body to skip reads if the session was replaced by failover; separated ObjectDisposedException from the general catch to drop rather than deliver the stale alarm.
Client.Shared-008
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Error handling & resilience |
| Location | OpcUaClientService.cs:170-180, Helpers/ValueConverter.cs:15-31 |
| Status | Resolved |
Description: WriteValueAsync coerces a string input to the target type by reading the node's current value and inferring the type from currentDataValue.Value. When the node has never been written, or the read returns a Bad status with a null Value, ValueConverter.ConvertValue falls through to the _ => rawValue default and writes a raw string into, for example, an Int32 node — the server then rejects it with BadTypeMismatch, surfacing as a confusing failure unrelated to the operator's input. Separately, ConvertValue uses bool.Parse, which accepts only true/false — operator input of 1/0 throws FormatException that propagates raw to the caller. The read-before-write also doubles the round-trip cost of every string write.
Recommendation: Inspect currentDataValue.StatusCode before trusting Value; when the type cannot be inferred, surface a clear error rather than writing a mistyped value. Make boolean parsing accept 1/0/yes/no, and wrap parse failures in a descriptive exception naming the node and target type.
Resolution: Resolved 2026-05-22 — WriteValueAsync now checks StatusCode.IsGood and non-null Value before calling ConvertValue, throwing a descriptive InvalidOperationException on bad reads; ValueConverter now uses a ParseBool helper accepting true/false/1/0/yes/no (case-insensitive) and wraps all parse/overflow failures in a FormatException with the target type and input value in the message.
Client.Shared-009
| Field | Value |
|---|---|
| Severity | Low |
| Category | Error handling & resilience / Documentation & comments |
| Location | OpcUaClientService.cs:302-322 |
| Status | Open |
Description: AcknowledgeAlarmAsync is typed Task<StatusCode> and its XML doc implies the returned code reports the ack outcome, but the method unconditionally return StatusCodes.Good. The actual failure path is DefaultSessionAdapter.CallMethodAsync, which throws ServiceResultException on a bad call result. A failed acknowledgment therefore never returns a bad StatusCode — it throws — and the StatusCode return value is dead. Callers writing if (StatusCode.IsBad(result)) will never see a bad result and will not catch the exception.
Recommendation: Either change the return type to Task (and let exceptions signal failure), or catch ServiceResultException in AcknowledgeAlarmAsync and return its StatusCode. Update the XML doc to match whichever is chosen.
Resolution: (open)
Client.Shared-010
| Field | Value |
|---|---|
| Severity | Low |
| Category | Performance & resource management |
| Location | Models/ConnectionSettings.cs:48, OpcUaClientService.cs:408-417 |
| Status | Open |
Description: ConnectionSettings.CertificateStorePath is initialized to ClientStoragePaths.GetPkiPath() as a property initializer, so every ConnectionSettings instantiation runs Environment.GetFolderPath + Path.Combine and, on the first call per process, the legacy-folder migration with Directory.Exists/Directory.Move filesystem IO. ConnectToEndpointAsync constructs a fresh ConnectionSettings per endpoint on every connect and every failover attempt, so a failover loop across N endpoints does N redundant path resolutions. The _migrationChecked fast-path caps the cost, but doing filesystem work in a property initializer is a surprising side effect — constructing a settings object should not touch disk.
Recommendation: Make CertificateStorePath default to string.Empty and resolve ClientStoragePaths.GetPkiPath() lazily inside DefaultApplicationConfigurationFactory.CreateAsync only when the path is unset.
Resolution: (open)
Client.Shared-011
| Field | Value |
|---|---|
| Severity | Low |
| Category | Testing coverage |
| Location | tests/Client/ZB.MOM.WW.OtOpcUa.Client.Shared.Tests/OpcUaClientServiceTests.cs |
| Status | Open |
Description: The test suite is solid for the happy paths, connection lifecycle, and single-failover behavior. Gaps relative to the findings above: (a) no test exercises concurrent SubscribeAsync/failover to expose the _activeDataSubscriptions race (Client.Shared-005) or re-entrant keep-alive failures (Client.Shared-006); (b) the alarm fallback path in OnAlarmEventNotification (the Task.Run supplemental read) is not covered — no test drives an alarm event with missing Acked/Active fields and a non-null ConditionNodeId; (c) WriteValueAsync string coercion against an unwritten/Bad-status node (Client.Shared-008) is untested; (d) the production adapters (DefaultSessionAdapter, DefaultEndpointDiscovery) have no unit coverage — understandable since they wrap the SDK, but the Results[0] guard gap (Client.Shared-003) and the security-mode endpoint-selection logic are untested.
Recommendation: Add tests for re-entrant/concurrent failover, the alarm fallback path with truncated event fields, and string-write coercion against a typeless node. Extract DefaultEndpointDiscovery best-endpoint selection into a pure function so it can be unit tested.
Resolution: (open)