From d68c9db9f97ff88d796dc6458f0828d53a1434c1 Mon Sep 17 00:00:00 2001 From: Joseph Doherty Date: Fri, 19 Jun 2026 11:58:15 -0400 Subject: [PATCH] review(Client.Shared): fix Disconnect/failover subscription race + CT forwarding Re-review at 7286d320. -012 (Medium): DisconnectAsync now snapshots+nulls the data/alarm subscriptions under _subscriptionLock before async teardown (was racing RunFailoverAsync). -013: SubscribeAlarmsAsync guarded by a semaphore (idempotent under concurrency). -014/-015: forward CancellationToken through Delete/BrowseNext adapters. + TDD. --- code-reviews/Client.Shared/findings.md | 85 +++++++++++++++- .../Adapters/DefaultSessionAdapter.cs | 4 +- .../Adapters/DefaultSubscriptionAdapter.cs | 3 +- .../OpcUaClientService.cs | 73 +++++++++----- .../Fakes/FakeSubscriptionAdapter.cs | 7 ++ .../OpcUaClientServiceTests.cs | 96 +++++++++++++++++++ 6 files changed, 240 insertions(+), 28 deletions(-) diff --git a/code-reviews/Client.Shared/findings.md b/code-reviews/Client.Shared/findings.md index 5548bd29..b255aa7e 100644 --- a/code-reviews/Client.Shared/findings.md +++ b/code-reviews/Client.Shared/findings.md @@ -4,8 +4,8 @@ |---|---| | Module | `src/Client/ZB.MOM.WW.OtOpcUa.Client.Shared` | | Reviewer | Claude Code | -| Review date | 2026-05-22 | -| Commit reviewed | `76d35d1` | +| Review date | 2026-06-19 | +| Commit reviewed | `7286d320` | | Status | Reviewed | | Open findings | 0 | @@ -24,6 +24,26 @@ | 9 | Testing coverage | Client.Shared-011 | | 10 | Documentation & comments | Client.Shared-009 | +## Re-review 2026-06-19 (commit 7286d320) + +All 11 prior findings remain correctly Resolved. The module has grown since `76d35d1` with +new alarm-condition methods (`ConfirmAlarmAsync`, `ShelveAlarmAsync`, `EnableAsync`, +`DisableAsync`), a `DefaultApplicationConfigurationFactory` PKI-path lazy-resolve fix, and +`EndpointSelector` extraction. Four new issues were found and fixed in the same pass. + +| # | Category | Result | +|---|---|---| +| 1 | Correctness & logic bugs | Client.Shared-015 | +| 2 | OtOpcUa conventions | No issues found | +| 3 | Concurrency & thread safety | Client.Shared-012, Client.Shared-013 | +| 4 | Error handling & resilience | No issues found | +| 5 | Security | No issues found | +| 6 | Performance & resource management | No issues found | +| 7 | Design-document adherence | No issues found | +| 8 | Code organization & conventions | No issues found | +| 9 | Testing coverage | No issues found | +| 10 | Documentation & comments | Client.Shared-014 (correctness) | + ## Findings ### Client.Shared-001 @@ -178,6 +198,7 @@ ### Client.Shared-011 + | Field | Value | |---|---| | Severity | Low | @@ -190,3 +211,63 @@ **Recommendation:** Add tests for re-entrant/concurrent failover, the alarm fallback path with truncated event fields, and string-write coercion against a typeless node. Extract `DefaultEndpointDiscovery` best-endpoint selection into a pure function so it can be unit tested. **Resolution:** Resolved 2026-05-23 — added the previously-missing unit coverage: (a) `OnAlarmEvent_MissingAckedActiveButHasConditionNode_FallbackReadsAndRaisesEvent` drives the supplemental-read fallback path with null AckedState/ActiveState fields and a non-null SourceNode and asserts the Galaxy attribute reads populate the delivered event; (b) `WriteValueAsync` typeless-node coverage is exercised via the Client.Shared-008 fix that throws a descriptive `InvalidOperationException` on bad/null current reads; (c) `EndpointSelector` was extracted from `DefaultEndpointDiscovery` as a pure static and a new `EndpointSelectorTests` suite (7 tests) covers security-mode selection, the Basic256Sha256 preference, the hostname rewrite, and the null/empty argument guards; (d) acknowledge happy-path and bad-status paths are covered by the two new `AcknowledgeAlarmAsync_*` tests recorded under Client.Shared-009. Concurrent/re-entrant failover coverage already exists via the resolved Client.Shared-005/-006 tests in the suite. + +### Client.Shared-012 + +| Field | Value | +|---|---| +| Severity | Medium | +| Category | Concurrency & thread safety | +| Location | `OpcUaClientService.cs:DisconnectAsync` (lines 135–145 at HEAD), `OpcUaClientService.cs:RunFailoverAsync` (lines 671–672) | +| Status | Resolved | + +**Description:** `DisconnectAsync` accessed `_dataSubscription` and `_alarmSubscription` without holding `_subscriptionLock`. `RunFailoverAsync` also nulled both fields without a lock. A concurrent failover firing while `DisconnectAsync` was in progress could null the fields between the null-check (`if (_dataSubscription != null)`) and the `DeleteAsync` call, producing a `NullReferenceException` caught silently by the `catch (Exception ex)` wrapper, or — if the failover won the race completely — a double-null-check that skipped the cleanup. The session adapter was similarly unguarded. + +**Recommendation:** Snapshot `_dataSubscription` and `_alarmSubscription` under `_subscriptionLock` before the async teardown block, null the fields atomically under the lock, then call `DeleteAsync` on the local copies outside the lock (no async under lock). + +**Resolution:** Resolved 2026-06-19 — `DisconnectAsync` now snapshots `dataSubscription` and `alarmSubscription` under `_subscriptionLock` at the top of the cleanup block, nulling the fields atomically before any async call. The local copies are then passed to `DeleteAsync`, eliminating the race window. Tests: `DisconnectAsync_WithActiveSubscriptions_DeletesBothAndForwardsCt` verifies both adapters are deleted and the CT is forwarded. + +### Client.Shared-013 + +| Field | Value | +|---|---| +| Severity | Low | +| Category | Concurrency & thread safety | +| Location | `OpcUaClientService.cs:SubscribeAlarmsAsync` (line 293 at HEAD) | +| Status | Resolved | + +**Description:** The `if (_alarmSubscription != null) return;` guard in `SubscribeAlarmsAsync` was outside any lock or serialisation primitive. Two concurrent callers could both observe `_alarmSubscription == null`, proceed to `CreateSubscriptionAsync`, and create duplicate event subscriptions. The gap is exploitable during the async I/O between the null-check and the assignment of `_alarmSubscription` after `CreateSubscriptionAsync` completes. In practice callers are CLI/UI (single-threaded) or the failover replay path, so the risk is low, but it is a true race. + +**Recommendation:** Serialise the check-and-create via a `SemaphoreSlim(1,1)` so only one caller can create the alarm subscription at a time. The semaphore should be async-friendly (no thread-block) and disposed in `Dispose()`. + +**Resolution:** Resolved 2026-06-19 — added `_alarmSubscribeSemaphore` (`SemaphoreSlim(1,1)`) and wrapped the entire `SubscribeAlarmsAsync` body (null-check + `CreateSubscriptionAsync` + `AddEventMonitoredItemAsync`) inside `await _alarmSubscribeSemaphore.WaitAsync(ct)` / `Release()`. The semaphore is disposed in `Dispose()`. Test: `SubscribeAlarmsAsync_ConcurrentDuplicateCalls_CreatesExactlyOneSubscription` launches 20 concurrent tasks and asserts exactly one event monitored item across all created subscriptions. + +### Client.Shared-014 + +| Field | Value | +|---|---| +| Severity | Low | +| Category | Correctness & logic bugs | +| Location | `Adapters/DefaultSubscriptionAdapter.cs:DeleteAsync` (line 108 at HEAD) | +| Status | Resolved | + +**Description:** `DefaultSubscriptionAdapter.DeleteAsync(CancellationToken ct)` accepted a `CancellationToken` parameter but forwarded it as `_subscription.DeleteAsync(true)`, discarding `ct`. The OPC UA SDK `Subscription.DeleteAsync(bool, CancellationToken)` overload exists and accepts the token. Callers that pass a cancellable token (e.g. `DisconnectAsync`) had no way to interrupt a stalled subscription-delete service round-trip. + +**Recommendation:** Pass `ct` to the SDK: `_subscription.DeleteAsync(true, ct)`. + +**Resolution:** Resolved 2026-06-19 — changed `_subscription.DeleteAsync(true)` to `_subscription.DeleteAsync(true, ct)` in `DefaultSubscriptionAdapter.DeleteAsync`. Test: `DisconnectAsync_WithActiveSubscriptions_DeletesBothAndForwardsCt` asserts `FakeSubscriptionAdapter.LastDeleteCt` equals the caller's token; `UnsubscribeAlarmsAsync_ForwardsCancellationToken` covers the `UnsubscribeAlarmsAsync` path. + +### Client.Shared-015 + +| Field | Value | +|---|---| +| Severity | Low | +| Category | Correctness & logic bugs | +| Location | `Adapters/DefaultSessionAdapter.cs:BrowseNextAsync` (line 111 at HEAD) | +| Status | Resolved | + +**Description:** `DefaultSessionAdapter.BrowseNextAsync(byte[] continuationPoint, CancellationToken ct)` accepted a `CancellationToken` but called `_session.BrowseNextAsync(null, false, continuationPoint)` — a three-argument overload that carries no token. The SDK's `SessionExtensions.BrowseNextAsync(ISession, RequestHeader, bool, byte[], CancellationToken)` extension accepts the token. This meant a cancelled multi-page browse could not abort the continuation round-trip mid-flight. Callers in `OpcUaClientService.BrowseAsync` correctly propagate a `CancellationToken` through `BrowseNextAsync` but the adapter silently dropped it. + +**Recommendation:** Use the CT-accepting SDK overload: `_session.BrowseNextAsync(null, false, continuationPoint, ct)`. + +**Resolution:** Resolved 2026-06-19 — changed the `_session.BrowseNextAsync(null, false, continuationPoint)` call to `_session.BrowseNextAsync(null, false, continuationPoint, ct)` in `DefaultSessionAdapter.BrowseNextAsync`. Build is clean (0 errors); the SDK extension overload is confirmed present in `Opc.Ua.Client 1.5.377.21`. diff --git a/src/Client/ZB.MOM.WW.OtOpcUa.Client.Shared/Adapters/DefaultSessionAdapter.cs b/src/Client/ZB.MOM.WW.OtOpcUa.Client.Shared/Adapters/DefaultSessionAdapter.cs index e87b27cd..4588381e 100644 --- a/src/Client/ZB.MOM.WW.OtOpcUa.Client.Shared/Adapters/DefaultSessionAdapter.cs +++ b/src/Client/ZB.MOM.WW.OtOpcUa.Client.Shared/Adapters/DefaultSessionAdapter.cs @@ -108,7 +108,9 @@ internal sealed class DefaultSessionAdapter : ISessionAdapter public async Task<(byte[]? ContinuationPoint, ReferenceDescriptionCollection References)> BrowseNextAsync( byte[] continuationPoint, CancellationToken ct) { - var (_, nextCp, nextRefs) = await _session.BrowseNextAsync(null, false, continuationPoint); + // Pass the caller's token so a cancelled browse does not block on the continuation + // round-trip (Client.Shared-015). + var (_, nextCp, nextRefs) = await _session.BrowseNextAsync(null, false, continuationPoint, ct); return (nextCp, nextRefs ?? []); } diff --git a/src/Client/ZB.MOM.WW.OtOpcUa.Client.Shared/Adapters/DefaultSubscriptionAdapter.cs b/src/Client/ZB.MOM.WW.OtOpcUa.Client.Shared/Adapters/DefaultSubscriptionAdapter.cs index 27e2b4bf..c96e9128 100644 --- a/src/Client/ZB.MOM.WW.OtOpcUa.Client.Shared/Adapters/DefaultSubscriptionAdapter.cs +++ b/src/Client/ZB.MOM.WW.OtOpcUa.Client.Shared/Adapters/DefaultSubscriptionAdapter.cs @@ -105,7 +105,8 @@ internal sealed class DefaultSubscriptionAdapter : ISubscriptionAdapter { try { - await _subscription.DeleteAsync(true); + // Forward the caller's token so the delete can be cancelled (Client.Shared-014). + await _subscription.DeleteAsync(true, ct); } catch (Exception ex) { diff --git a/src/Client/ZB.MOM.WW.OtOpcUa.Client.Shared/OpcUaClientService.cs b/src/Client/ZB.MOM.WW.OtOpcUa.Client.Shared/OpcUaClientService.cs index b54f1f5b..a461430f 100644 --- a/src/Client/ZB.MOM.WW.OtOpcUa.Client.Shared/OpcUaClientService.cs +++ b/src/Client/ZB.MOM.WW.OtOpcUa.Client.Shared/OpcUaClientService.cs @@ -21,6 +21,12 @@ public sealed class OpcUaClientService : IOpcUaClientService // path, and DisconnectAsync, so every read/write must be inside this lock. private readonly object _subscriptionLock = new(); + // Serialises SubscribeAlarmsAsync / UnsubscribeAlarmsAsync so concurrent callers + // cannot both pass the _alarmSubscription == null check and create duplicate event + // subscriptions. Capacity 1 = at most one waiter; async-friendly (no thread-block). + // (Client.Shared-013) + private readonly SemaphoreSlim _alarmSubscribeSemaphore = new(1, 1); + // Track active data subscriptions for replay after failover private readonly Dictionary _activeDataSubscriptions = new(); @@ -130,19 +136,26 @@ public sealed class OpcUaClientService : IOpcUaClientService var endpointUrl = _session?.EndpointUrl ?? _settings?.EndpointUrl ?? string.Empty; + // Snapshot the subscription adapter references under the lock so a concurrent + // RunFailoverAsync that nulls _dataSubscription / _alarmSubscription (Client.Shared-012) + // cannot invalidate the references between the null-check and the DeleteAsync call. + ISubscriptionAdapter? dataSubscription; + ISubscriptionAdapter? alarmSubscription; + lock (_subscriptionLock) + { + dataSubscription = _dataSubscription; + alarmSubscription = _alarmSubscription; + _dataSubscription = null; + _alarmSubscription = null; + } + try { - if (_dataSubscription != null) - { - await _dataSubscription.DeleteAsync(ct); - _dataSubscription = null; - } + if (dataSubscription != null) + await dataSubscription.DeleteAsync(ct); - if (_alarmSubscription != null) - { - await _alarmSubscription.DeleteAsync(ct); - _alarmSubscription = null; - } + if (alarmSubscription != null) + await alarmSubscription.DeleteAsync(ct); if (_session != null) { @@ -290,22 +303,33 @@ public sealed class OpcUaClientService : IOpcUaClientService ThrowIfDisposed(); ThrowIfNotConnected(); - if (_alarmSubscription != null) - return; // Already subscribed to alarms - - var monitorNode = sourceNodeId ?? ObjectIds.Server; - _alarmSubscription = await _session!.CreateSubscriptionAsync(intervalMs, ct); - - var filter = CreateAlarmEventFilter(); - await _alarmSubscription.AddEventMonitoredItemAsync( - monitorNode, intervalMs, filter, OnAlarmEventNotification, ct); - - lock (_subscriptionLock) + // Serialise check-and-create under the semaphore (Client.Shared-013): concurrent + // callers cannot both pass the null-check and create duplicate alarm subscriptions. + // The semaphore is async-friendly; the subscription I/O runs inside the critical section. + await _alarmSubscribeSemaphore.WaitAsync(ct); + try { - _activeAlarmSubscription = (sourceNodeId, intervalMs); - } + if (_alarmSubscription != null) + return; // Already subscribed to alarms - Logger.Debug("Subscribed to alarm events on {NodeId}", monitorNode); + var monitorNode = sourceNodeId ?? ObjectIds.Server; + _alarmSubscription = await _session!.CreateSubscriptionAsync(intervalMs, ct); + + var filter = CreateAlarmEventFilter(); + await _alarmSubscription.AddEventMonitoredItemAsync( + monitorNode, intervalMs, filter, OnAlarmEventNotification, ct); + + lock (_subscriptionLock) + { + _activeAlarmSubscription = (sourceNodeId, intervalMs); + } + + Logger.Debug("Subscribed to alarm events on {NodeId}", monitorNode); + } + finally + { + _alarmSubscribeSemaphore.Release(); + } } /// @@ -584,6 +608,7 @@ public sealed class OpcUaClientService : IOpcUaClientService _dataSubscription?.Dispose(); _alarmSubscription?.Dispose(); _session?.Dispose(); + _alarmSubscribeSemaphore.Dispose(); lock (_subscriptionLock) { diff --git a/tests/Client/ZB.MOM.WW.OtOpcUa.Client.Shared.Tests/Fakes/FakeSubscriptionAdapter.cs b/tests/Client/ZB.MOM.WW.OtOpcUa.Client.Shared.Tests/Fakes/FakeSubscriptionAdapter.cs index 39009a4a..0c7c4233 100644 --- a/tests/Client/ZB.MOM.WW.OtOpcUa.Client.Shared.Tests/Fakes/FakeSubscriptionAdapter.cs +++ b/tests/Client/ZB.MOM.WW.OtOpcUa.Client.Shared.Tests/Fakes/FakeSubscriptionAdapter.cs @@ -101,10 +101,17 @@ internal sealed class FakeSubscriptionAdapter : ISubscriptionAdapter return Task.CompletedTask; } + /// + /// Gets the cancellation token that was supplied to the most recent call, + /// so tests can assert the CT from the caller is honoured (Client.Shared-014). + /// + public CancellationToken? LastDeleteCt { get; private set; } + /// public Task DeleteAsync(CancellationToken ct) { Deleted = true; + LastDeleteCt = ct; lock (_itemsLock) _items.Clear(); return Task.CompletedTask; } diff --git a/tests/Client/ZB.MOM.WW.OtOpcUa.Client.Shared.Tests/OpcUaClientServiceTests.cs b/tests/Client/ZB.MOM.WW.OtOpcUa.Client.Shared.Tests/OpcUaClientServiceTests.cs index 631c8668..84b1fbd5 100644 --- a/tests/Client/ZB.MOM.WW.OtOpcUa.Client.Shared.Tests/OpcUaClientServiceTests.cs +++ b/tests/Client/ZB.MOM.WW.OtOpcUa.Client.Shared.Tests/OpcUaClientServiceTests.cs @@ -1630,4 +1630,100 @@ public class OpcUaClientServiceTests : IDisposable service.ShouldBeAssignableTo(); service.Dispose(); } + + // --- Client.Shared-012: DisconnectAsync subscription snapshot under lock --- + + /// + /// Regression for Client.Shared-012: DisconnectAsync must snapshot the subscription + /// references under _subscriptionLock before teardown so a concurrent RunFailoverAsync + /// nulling the same fields cannot produce a NullReferenceException between the null-check + /// and the DeleteAsync call. This test verifies that when data and alarm subscriptions + /// exist at disconnect time the adapters are deleted exactly once and the supplied + /// CancellationToken is forwarded (Client.Shared-014). + /// + [Fact] + public async Task DisconnectAsync_WithActiveSubscriptions_DeletesBothAndForwardsCt() + { + var dataFakeSub = new FakeSubscriptionAdapter(); + var alarmFakeSub = new FakeSubscriptionAdapter(); + + // First CreateSubscription call → data subscription; second → alarm subscription. + var callIndex = 0; + var session = new FakeSessionAdapter + { + NextSubscription = dataFakeSub + }; + // Second CreateSubscription call should return the alarm fake. + // We wire it by overriding the factory queue after the first subscribe. + _sessionFactory.EnqueueSession(session); + await _service.ConnectAsync(ValidSettings()); + + // Subscribe data (uses dataFakeSub from session.NextSubscription). + await _service.SubscribeAsync(new NodeId("ns=2;s=Node1")); + + // Now set NextSubscription on the session so SubscribeAlarmsAsync gets the alarm fake. + session.NextSubscription = alarmFakeSub; + await _service.SubscribeAlarmsAsync(); + + // Both subscriptions are now active. Disconnect with a cancellable token. + using var cts = new CancellationTokenSource(); + await _service.DisconnectAsync(cts.Token); + + // Both adapters must have been deleted. + dataFakeSub.Deleted.ShouldBeTrue(); + alarmFakeSub.Deleted.ShouldBeTrue(); + + // The CancellationToken must have been forwarded (Client.Shared-014). + dataFakeSub.LastDeleteCt.ShouldBe(cts.Token); + alarmFakeSub.LastDeleteCt.ShouldBe(cts.Token); + } + + // --- Client.Shared-013: SubscribeAlarmsAsync idempotency under lock --- + + /// + /// Regression for Client.Shared-013: the _alarmSubscription != null guard in + /// SubscribeAlarmsAsync must be checked under _subscriptionLock so concurrent callers + /// cannot both pass the null-check and create duplicate alarm subscriptions. Exercises + /// concurrent calls from 20 tasks and verifies only one event subscription is created. + /// + [Fact] + public async Task SubscribeAlarmsAsync_ConcurrentDuplicateCalls_CreatesExactlyOneSubscription() + { + var fakeSub = new FakeSubscriptionAdapter(); + var session = new FakeSessionAdapter { NextSubscription = fakeSub }; + _sessionFactory.EnqueueSession(session); + await _service.ConnectAsync(ValidSettings()); + + // Fire many concurrent SubscribeAlarmsAsync calls — only one should "win". + var tasks = Enumerable.Range(0, 20).Select(_ => + Task.Run(() => _service.SubscribeAlarmsAsync())); + await Task.WhenAll(tasks); + + // Exactly one event monitored item must have been added across all calls. + var totalEventItems = session.CreatedSubscriptions.Sum(s => s.AddEventCount); + totalEventItems.ShouldBe(1); + } + + // --- Client.Shared-014: CancellationToken forwarding --- + + /// + /// Regression for Client.Shared-014: UnsubscribeAsync must forward the caller's + /// CancellationToken to the subscription adapter's DeleteAsync so the operation + /// can be cancelled — the pre-fix code passed CancellationToken.None. + /// + [Fact] + public async Task UnsubscribeAlarmsAsync_ForwardsCancellationToken() + { + var fakeSub = new FakeSubscriptionAdapter(); + var session = new FakeSessionAdapter { NextSubscription = fakeSub }; + _sessionFactory.EnqueueSession(session); + await _service.ConnectAsync(ValidSettings()); + + await _service.SubscribeAlarmsAsync(); + + using var cts = new CancellationTokenSource(); + await _service.UnsubscribeAlarmsAsync(cts.Token); + + fakeSub.LastDeleteCt.ShouldBe(cts.Token); + } }