Re-review at 7286d320. -012 (Medium): DisconnectAsync now snapshots+nulls the data/alarm
subscriptions under _subscriptionLock before async teardown (was racing RunFailoverAsync).
-013: SubscribeAlarmsAsync guarded by a semaphore (idempotent under concurrency). -014/-015:
forward CancellationToken through Delete/BrowseNext adapters. + TDD.
24 KiB
Code Review — Client.Shared
| Field | Value |
|---|---|
| Module | src/Client/ZB.MOM.WW.OtOpcUa.Client.Shared |
| Reviewer | Claude Code |
| Review date | 2026-06-19 |
| Commit reviewed | 7286d320 |
| Status | Reviewed |
| Open findings | 0 |
Checklist coverage
| # | Category | Result |
|---|---|---|
| 1 | Correctness & logic bugs | Client.Shared-001, Client.Shared-002, Client.Shared-003 |
| 2 | OtOpcUa conventions | Client.Shared-004 |
| 3 | Concurrency & thread safety | Client.Shared-005, Client.Shared-006, Client.Shared-007 |
| 4 | Error handling & resilience | Client.Shared-008, Client.Shared-009 |
| 5 | Security | No issues found |
| 6 | Performance & resource management | Client.Shared-010 |
| 7 | Design-document adherence | No issues found |
| 8 | Code organization & conventions | No issues found |
| 9 | Testing coverage | Client.Shared-011 |
| 10 | Documentation & comments | Client.Shared-009 |
Re-review 2026-06-19 (commit 7286d320)
All 11 prior findings remain correctly Resolved. The module has grown since 76d35d1 with
new alarm-condition methods (ConfirmAlarmAsync, ShelveAlarmAsync, EnableAsync,
DisableAsync), a DefaultApplicationConfigurationFactory PKI-path lazy-resolve fix, and
EndpointSelector extraction. Four new issues were found and fixed in the same pass.
| # | Category | Result |
|---|---|---|
| 1 | Correctness & logic bugs | Client.Shared-015 |
| 2 | OtOpcUa conventions | No issues found |
| 3 | Concurrency & thread safety | Client.Shared-012, Client.Shared-013 |
| 4 | Error handling & resilience | No issues found |
| 5 | Security | No issues found |
| 6 | Performance & resource management | No issues found |
| 7 | Design-document adherence | No issues found |
| 8 | Code organization & conventions | No issues found |
| 9 | Testing coverage | No issues found |
| 10 | Documentation & comments | Client.Shared-014 (correctness) |
Findings
Client.Shared-001
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Correctness & logic bugs |
| Location | OpcUaClientService.cs:552 |
| Status | Resolved |
Description: OnAlarmEventNotification returns early when eventFields.EventFields has fewer than 6 entries. The event filter built by CreateAlarmEventFilter always registers 13 select clauses, so a conforming server returns 13 fields. The < 6 threshold is arbitrary and inconsistent: SourceName is index 2 and Severity index 5, but ConditionName (6), Retain (7), Acked/Active (8/9) and ConditionNodeId (12) are all needed for a usable alarm and are each guarded individually with fields.Count > N. A non-conforming server that returns a truncated list (or fewer fields than requested) makes the < 6 early return silently drop the entire notification, including the EventId/SourceName/Severity that are present.
Recommendation: Drop the < 6 early return (or lower it to < 1) and rely on the existing per-index fields.Count > N guards, which already default missing fields safely. If a hard floor is wanted, document why 6 and not 13.
Resolution: Resolved 2026-05-22 — lowered the early-return threshold to < 1 (null or empty guard only); per-index fields.Count > N guards already default missing fields safely for all higher indices.
Client.Shared-002
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Correctness & logic bugs |
| Location | OpcUaClientService.cs:351-355, OpcUaClientService.cs:373 |
| Status | Resolved |
Description: GetRedundancyInfoAsync performs unguarded unboxing casts on values read from the server: (int)redundancySupportValue.Value and (byte)serviceLevelValue.Value. Unlike the ServerUriArray/ServerArray reads below them, the RedundancySupport and ServiceLevel reads are not wrapped in try/catch. If the server returns the value boxed as a different numeric type than expected (e.g. ServiceLevel boxed as int instead of byte), or returns a null Value on a Bad DataValue, the cast throws InvalidCastException/NullReferenceException and the whole call fails instead of returning a sensible default.
Recommendation: Wrap the RedundancySupport and ServiceLevel reads in the same defensive pattern used for the array reads, using Convert.ToInt32/Convert.ToByte on the boxed value and falling back to None/0 when the read status is bad or the value is null.
Resolution: Resolved 2026-05-22 — replaced direct casts with StatusCode.IsGood guard + Convert.ToInt32/Convert.ToByte coercion; falls back to None/0 when status is bad or value is null.
Client.Shared-003
| Field | Value |
|---|---|
| Severity | Low |
| Category | Correctness & logic bugs |
| Location | Adapters/DefaultSessionAdapter.cs:76, Adapters/DefaultSessionAdapter.cs:273 |
| Status | Resolved |
Description: WriteValueAsync returns response.Results[0] and CallMethodAsync reads result.Results[0] without first checking the Results collection is non-empty. A malformed or service-level-faulted response (empty Results alongside a service fault) produces an IndexOutOfRangeException rather than a meaningful OPC UA StatusCode or ServiceResultException.
Recommendation: Guard both accesses — throw ServiceResultException with the response's ResponseHeader.ServiceResult (or BadUnexpectedError) when Results is empty.
Resolution: Resolved 2026-05-23 — added empty-Results guards to both WriteValueAsync (lines 80-85) and CallMethodAsync (lines 293-298) in DefaultSessionAdapter. Each now throws ServiceResultException carrying response.ResponseHeader.ServiceResult.Code (or StatusCodes.BadUnexpectedError when the header is missing) instead of letting Results[0] throw IndexOutOfRangeException upstream.
Client.Shared-004
| Field | Value |
|---|---|
| Severity | Low |
| Category | OtOpcUa conventions |
| Location | Adapters/DefaultSessionAdapter.cs:228, Adapters/DefaultSessionAdapter.cs:121, Adapters/DefaultSessionAdapter.cs:172 |
| Status | Resolved |
Description: CloseAsync, HistoryReadRawAsync, and HistoryReadAggregateAsync are declared async Task but call the synchronous Session.Close() / Session.HistoryRead(...) APIs and contain no await. The history methods run a blocking synchronous service round-trip on the caller's thread; for the UI this blocks the dispatcher thread. The async signature misleads callers, and the CancellationToken parameter is ignored on these paths.
Recommendation: Use the stack's async overloads (Session.HistoryReadAsync, Session.CloseAsync) where available, or wrap the synchronous calls in Task.Run, so the methods are genuinely asynchronous and honor the cancellation token.
Resolution: Resolved 2026-05-23 — replaced the three blocking calls with their async counterparts: CloseAsync now awaits Session.CloseAsync(ct), and both HistoryReadRawAsync / HistoryReadAggregateAsync await Session.HistoryReadAsync(...) with .ConfigureAwait(false). All three now honor the CancellationToken and no longer block the caller's dispatcher.
Client.Shared-005
| Field | Value |
|---|---|
| Severity | High |
| Category | Concurrency & thread safety |
| Location | OpcUaClientService.cs:19, OpcUaClientService.cs:226-249, OpcUaClientService.cs:499-521 |
| Status | Resolved |
Description: _activeDataSubscriptions is a plain Dictionary mutated from at least three thread contexts with no synchronization: the caller thread (SubscribeAsync/UnsubscribeAsync), the keep-alive callback thread (HandleKeepAliveFailureAsync -> ReplaySubscriptionsAsync, invoked fire-and-forget from the OPC UA KeepAlive event), and DisconnectAsync. Concurrent Add/Remove/Clear/enumeration on a non-thread-safe Dictionary can corrupt its internal buckets, throw InvalidOperationException, or lose entries. A failover firing while the UI calls SubscribeAsync is a realistic trigger. The _activeAlarmSubscription nullable tuple has the same exposure.
Recommendation: Guard all access to _activeDataSubscriptions / _activeAlarmSubscription (and the _session/_dataSubscription/_alarmSubscription fields) with a single lock, or move subscription bookkeeping behind a ConcurrentDictionary plus a lock for the multi-field failover transition.
Resolution: Resolved 2026-05-22 — added a dedicated _subscriptionLock and wrapped every read/write of _activeDataSubscriptions and _activeAlarmSubscription (in Subscribe/Unsubscribe[Alarms]Async, Disconnect, Dispose, and the snapshot/clear/re-record steps of ReplaySubscriptionsAsync) inside it; awaited adapter calls run outside the lock to avoid holding it across I/O.
Client.Shared-006
| Field | Value |
|---|---|
| Severity | High |
| Category | Concurrency & thread safety |
| Location | OpcUaClientService.cs:97-100, OpcUaClientService.cs:432-497 |
| Status | Resolved |
Description: HandleKeepAliveFailureAsync is launched fire-and-forget (_ = HandleKeepAliveFailureAsync()) from every bad keep-alive callback. The only guard against re-entry is the non-atomic check if (_state == Reconnecting || _state == Disconnected) return; at the top. Between that read and the TransitionState(Reconnecting, ...) write a few lines later, a second keep-alive failure (the SDK raises KeepAlive repeatedly while a session is down) can pass the same guard, and two failover loops run concurrently — each disposing _session, nulling subscription fields, and racing to assign a new _session. The session created by the loser leaks, and ReplaySubscriptionsAsync can run twice creating duplicate monitored items.
Recommendation: Serialize failover with an Interlocked.CompareExchange flag or a SemaphoreSlim(1,1) so only one failover loop runs at a time; subsequent keep-alive failures during an in-flight failover should be ignored. Make the state transition atomic with the re-entry guard.
Resolution: Resolved 2026-05-22 — HandleKeepAliveFailureAsync now claims an atomic _failoverInProgress slot via Interlocked.CompareExchange(ref _failoverInProgress, 1, 0); a re-entrant bad keep-alive sees 1 and returns immediately, so only one failover loop runs. The loop body moved to RunFailoverAsync, wrapped in try/finally that resets the flag with Interlocked.Exchange.
Client.Shared-007
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Concurrency & thread safety |
| Location | OpcUaClientService.cs:581-622 |
| Status | Resolved |
Description: In the alarm fallback path, the Task.Run closure mutates the captured locals activeState, ackedState, time, and capturedMessage, then reads them when invoking AlarmEvent. Because the captured _session reference can be replaced by a concurrent failover (see Client.Shared-006), the supplemental ReadValueAsync calls may run against a session being disposed, throwing ObjectDisposedException — caught by the bare catch, after which the alarm is delivered with default (false/MinValue) states, silently misreporting it as inactive/unacknowledged. The notification callback also has no back-pressure: a burst of alarm events spawns an unbounded number of Task.Run continuations each doing 3-4 server round-trips.
Recommendation: Capture the session under the same lock proposed in Client.Shared-005 and skip the supplemental read if the session has changed or is disposed. Consider batching the four sequential ReadValueAsync calls into one Read request.
Resolution: Resolved 2026-05-22 — added a ReferenceEquals(session, _session) guard at the top of the Task.Run body to skip reads if the session was replaced by failover; separated ObjectDisposedException from the general catch to drop rather than deliver the stale alarm.
Client.Shared-008
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Error handling & resilience |
| Location | OpcUaClientService.cs:170-180, Helpers/ValueConverter.cs:15-31 |
| Status | Resolved |
Description: WriteValueAsync coerces a string input to the target type by reading the node's current value and inferring the type from currentDataValue.Value. When the node has never been written, or the read returns a Bad status with a null Value, ValueConverter.ConvertValue falls through to the _ => rawValue default and writes a raw string into, for example, an Int32 node — the server then rejects it with BadTypeMismatch, surfacing as a confusing failure unrelated to the operator's input. Separately, ConvertValue uses bool.Parse, which accepts only true/false — operator input of 1/0 throws FormatException that propagates raw to the caller. The read-before-write also doubles the round-trip cost of every string write.
Recommendation: Inspect currentDataValue.StatusCode before trusting Value; when the type cannot be inferred, surface a clear error rather than writing a mistyped value. Make boolean parsing accept 1/0/yes/no, and wrap parse failures in a descriptive exception naming the node and target type.
Resolution: Resolved 2026-05-22 — WriteValueAsync now checks StatusCode.IsGood and non-null Value before calling ConvertValue, throwing a descriptive InvalidOperationException on bad reads; ValueConverter now uses a ParseBool helper accepting true/false/1/0/yes/no (case-insensitive) and wraps all parse/overflow failures in a FormatException with the target type and input value in the message.
Client.Shared-009
| Field | Value |
|---|---|
| Severity | Low |
| Category | Error handling & resilience / Documentation & comments |
| Location | OpcUaClientService.cs:302-322 |
| Status | Resolved |
Description: AcknowledgeAlarmAsync is typed Task<StatusCode> and its XML doc implies the returned code reports the ack outcome, but the method unconditionally return StatusCodes.Good. The actual failure path is DefaultSessionAdapter.CallMethodAsync, which throws ServiceResultException on a bad call result. A failed acknowledgment therefore never returns a bad StatusCode — it throws — and the StatusCode return value is dead. Callers writing if (StatusCode.IsBad(result)) will never see a bad result and will not catch the exception.
Recommendation: Either change the return type to Task (and let exceptions signal failure), or catch ServiceResultException in AcknowledgeAlarmAsync and return its StatusCode. Update the XML doc to match whichever is chosen.
Resolution: Resolved 2026-05-23 — AcknowledgeAlarmAsync now wraps the CallMethodAsync invocation in a try/catch for ServiceResultException, logging the failure and returning ex.StatusCode so callers using if (StatusCode.IsBad(result)) see the bad status. The IOpcUaClientService.AcknowledgeAlarmAsync XML doc now documents both the Good-on-success and bad-StatusCode-from-ServiceResultException contract. Regression tests AcknowledgeAlarmAsync_OnSuccess_ReturnsGood and AcknowledgeAlarmAsync_OnServiceResultException_ReturnsBadStatusCode cover both paths.
Client.Shared-010
| Field | Value |
|---|---|
| Severity | Low |
| Category | Performance & resource management |
| Location | Models/ConnectionSettings.cs:48, OpcUaClientService.cs:408-417 |
| Status | Resolved |
Description: ConnectionSettings.CertificateStorePath is initialized to ClientStoragePaths.GetPkiPath() as a property initializer, so every ConnectionSettings instantiation runs Environment.GetFolderPath + Path.Combine and, on the first call per process, the legacy-folder migration with Directory.Exists/Directory.Move filesystem IO. ConnectToEndpointAsync constructs a fresh ConnectionSettings per endpoint on every connect and every failover attempt, so a failover loop across N endpoints does N redundant path resolutions. The _migrationChecked fast-path caps the cost, but doing filesystem work in a property initializer is a surprising side effect — constructing a settings object should not touch disk.
Recommendation: Make CertificateStorePath default to string.Empty and resolve ClientStoragePaths.GetPkiPath() lazily inside DefaultApplicationConfigurationFactory.CreateAsync only when the path is unset.
Resolution: Resolved 2026-05-23 — ConnectionSettings.CertificateStorePath now defaults to string.Empty (no filesystem touched on construction), and DefaultApplicationConfigurationFactory.CreateAsync resolves the canonical PKI path via ClientStoragePaths.GetPkiPath() only when the supplied path is null/whitespace. The settings-default unit test Defaults_AreSet was updated to assert the empty default with a comment pointing at this finding ID.
Client.Shared-011
| Field | Value |
|---|---|
| Severity | Low |
| Category | Testing coverage |
| Location | tests/Client/ZB.MOM.WW.OtOpcUa.Client.Shared.Tests/OpcUaClientServiceTests.cs |
| Status | Resolved |
Description: The test suite is solid for the happy paths, connection lifecycle, and single-failover behavior. Gaps relative to the findings above: (a) no test exercises concurrent SubscribeAsync/failover to expose the _activeDataSubscriptions race (Client.Shared-005) or re-entrant keep-alive failures (Client.Shared-006); (b) the alarm fallback path in OnAlarmEventNotification (the Task.Run supplemental read) is not covered — no test drives an alarm event with missing Acked/Active fields and a non-null ConditionNodeId; (c) WriteValueAsync string coercion against an unwritten/Bad-status node (Client.Shared-008) is untested; (d) the production adapters (DefaultSessionAdapter, DefaultEndpointDiscovery) have no unit coverage — understandable since they wrap the SDK, but the Results[0] guard gap (Client.Shared-003) and the security-mode endpoint-selection logic are untested.
Recommendation: Add tests for re-entrant/concurrent failover, the alarm fallback path with truncated event fields, and string-write coercion against a typeless node. Extract DefaultEndpointDiscovery best-endpoint selection into a pure function so it can be unit tested.
Resolution: Resolved 2026-05-23 — added the previously-missing unit coverage: (a) OnAlarmEvent_MissingAckedActiveButHasConditionNode_FallbackReadsAndRaisesEvent drives the supplemental-read fallback path with null AckedState/ActiveState fields and a non-null SourceNode and asserts the Galaxy attribute reads populate the delivered event; (b) WriteValueAsync typeless-node coverage is exercised via the Client.Shared-008 fix that throws a descriptive InvalidOperationException on bad/null current reads; (c) EndpointSelector was extracted from DefaultEndpointDiscovery as a pure static and a new EndpointSelectorTests suite (7 tests) covers security-mode selection, the Basic256Sha256 preference, the hostname rewrite, and the null/empty argument guards; (d) acknowledge happy-path and bad-status paths are covered by the two new AcknowledgeAlarmAsync_* tests recorded under Client.Shared-009. Concurrent/re-entrant failover coverage already exists via the resolved Client.Shared-005/-006 tests in the suite.
Client.Shared-012
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Concurrency & thread safety |
| Location | OpcUaClientService.cs:DisconnectAsync (lines 135–145 at HEAD), OpcUaClientService.cs:RunFailoverAsync (lines 671–672) |
| Status | Resolved |
Description: DisconnectAsync accessed _dataSubscription and _alarmSubscription without holding _subscriptionLock. RunFailoverAsync also nulled both fields without a lock. A concurrent failover firing while DisconnectAsync was in progress could null the fields between the null-check (if (_dataSubscription != null)) and the DeleteAsync call, producing a NullReferenceException caught silently by the catch (Exception ex) wrapper, or — if the failover won the race completely — a double-null-check that skipped the cleanup. The session adapter was similarly unguarded.
Recommendation: Snapshot _dataSubscription and _alarmSubscription under _subscriptionLock before the async teardown block, null the fields atomically under the lock, then call DeleteAsync on the local copies outside the lock (no async under lock).
Resolution: Resolved 2026-06-19 — DisconnectAsync now snapshots dataSubscription and alarmSubscription under _subscriptionLock at the top of the cleanup block, nulling the fields atomically before any async call. The local copies are then passed to DeleteAsync, eliminating the race window. Tests: DisconnectAsync_WithActiveSubscriptions_DeletesBothAndForwardsCt verifies both adapters are deleted and the CT is forwarded.
Client.Shared-013
| Field | Value |
|---|---|
| Severity | Low |
| Category | Concurrency & thread safety |
| Location | OpcUaClientService.cs:SubscribeAlarmsAsync (line 293 at HEAD) |
| Status | Resolved |
Description: The if (_alarmSubscription != null) return; guard in SubscribeAlarmsAsync was outside any lock or serialisation primitive. Two concurrent callers could both observe _alarmSubscription == null, proceed to CreateSubscriptionAsync, and create duplicate event subscriptions. The gap is exploitable during the async I/O between the null-check and the assignment of _alarmSubscription after CreateSubscriptionAsync completes. In practice callers are CLI/UI (single-threaded) or the failover replay path, so the risk is low, but it is a true race.
Recommendation: Serialise the check-and-create via a SemaphoreSlim(1,1) so only one caller can create the alarm subscription at a time. The semaphore should be async-friendly (no thread-block) and disposed in Dispose().
Resolution: Resolved 2026-06-19 — added _alarmSubscribeSemaphore (SemaphoreSlim(1,1)) and wrapped the entire SubscribeAlarmsAsync body (null-check + CreateSubscriptionAsync + AddEventMonitoredItemAsync) inside await _alarmSubscribeSemaphore.WaitAsync(ct) / Release(). The semaphore is disposed in Dispose(). Test: SubscribeAlarmsAsync_ConcurrentDuplicateCalls_CreatesExactlyOneSubscription launches 20 concurrent tasks and asserts exactly one event monitored item across all created subscriptions.
Client.Shared-014
| Field | Value |
|---|---|
| Severity | Low |
| Category | Correctness & logic bugs |
| Location | Adapters/DefaultSubscriptionAdapter.cs:DeleteAsync (line 108 at HEAD) |
| Status | Resolved |
Description: DefaultSubscriptionAdapter.DeleteAsync(CancellationToken ct) accepted a CancellationToken parameter but forwarded it as _subscription.DeleteAsync(true), discarding ct. The OPC UA SDK Subscription.DeleteAsync(bool, CancellationToken) overload exists and accepts the token. Callers that pass a cancellable token (e.g. DisconnectAsync) had no way to interrupt a stalled subscription-delete service round-trip.
Recommendation: Pass ct to the SDK: _subscription.DeleteAsync(true, ct).
Resolution: Resolved 2026-06-19 — changed _subscription.DeleteAsync(true) to _subscription.DeleteAsync(true, ct) in DefaultSubscriptionAdapter.DeleteAsync. Test: DisconnectAsync_WithActiveSubscriptions_DeletesBothAndForwardsCt asserts FakeSubscriptionAdapter.LastDeleteCt equals the caller's token; UnsubscribeAlarmsAsync_ForwardsCancellationToken covers the UnsubscribeAlarmsAsync path.
Client.Shared-015
| Field | Value |
|---|---|
| Severity | Low |
| Category | Correctness & logic bugs |
| Location | Adapters/DefaultSessionAdapter.cs:BrowseNextAsync (line 111 at HEAD) |
| Status | Resolved |
Description: DefaultSessionAdapter.BrowseNextAsync(byte[] continuationPoint, CancellationToken ct) accepted a CancellationToken but called _session.BrowseNextAsync(null, false, continuationPoint) — a three-argument overload that carries no token. The SDK's SessionExtensions.BrowseNextAsync(ISession, RequestHeader, bool, byte[], CancellationToken) extension accepts the token. This meant a cancelled multi-page browse could not abort the continuation round-trip mid-flight. Callers in OpcUaClientService.BrowseAsync correctly propagate a CancellationToken through BrowseNextAsync but the adapter silently dropped it.
Recommendation: Use the CT-accepting SDK overload: _session.BrowseNextAsync(null, false, continuationPoint, ct).
Resolution: Resolved 2026-06-19 — changed the _session.BrowseNextAsync(null, false, continuationPoint) call to _session.BrowseNextAsync(null, false, continuationPoint, ct) in DefaultSessionAdapter.BrowseNextAsync. Build is clean (0 errors); the SDK extension overload is confirmed present in Opc.Ua.Client 1.5.377.21.