fix(client-shared): resolve High code-review findings (Client.Shared-005, Client.Shared-006)
Client.Shared-005: _activeDataSubscriptions (a plain Dictionary) and the _activeAlarmSubscription tuple were mutated from the caller thread, the keep-alive failover path, and DisconnectAsync with no synchronization, risking bucket corrosion / InvalidOperationException / lost entries. Added a dedicated _subscriptionLock and wrapped every read/write of that bookkeeping state inside it (Subscribe/Unsubscribe[Alarms]Async, Disconnect, Dispose, and the snapshot/clear/re-record steps of ReplaySubscriptionsAsync). Awaited adapter calls stay outside the lock so it is never held across I/O. Client.Shared-006: HandleKeepAliveFailureAsync had only a non-atomic state check guarding re-entry, so two bad keep-alives could each start a failover loop, racing to dispose/replace _session and double-replaying subscriptions. It now claims an atomic _failoverInProgress slot via Interlocked.CompareExchange; a re-entrant call returns immediately. The loop body moved to RunFailoverAsync, wrapped in try/finally that resets the flag. Tests: added KeepAliveFailure_ReentrantWhileFailoverInFlight_RunsFailoverOnce and SubscribeAndUnsubscribe_ConcurrentCalls_DoNotCorruptState regression tests; made the FakeSubscriptionAdapter / FakeSessionAdapter / FakeSessionFactory test doubles thread-safe (and added a CreateGate hook) so the concurrency tests exercise production locking rather than fake state. All 138 Client.Shared tests pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
@@ -7,7 +7,7 @@
|
||||
| Review date | 2026-05-22 |
|
||||
| Commit reviewed | `76d35d1` |
|
||||
| Status | Reviewed |
|
||||
| Open findings | 11 |
|
||||
| Open findings | 9 |
|
||||
|
||||
## Checklist coverage
|
||||
|
||||
@@ -93,13 +93,13 @@
|
||||
| Severity | High |
|
||||
| Category | Concurrency & thread safety |
|
||||
| Location | `OpcUaClientService.cs:19`, `OpcUaClientService.cs:226-249`, `OpcUaClientService.cs:499-521` |
|
||||
| Status | Open |
|
||||
| Status | Resolved |
|
||||
|
||||
**Description:** `_activeDataSubscriptions` is a plain `Dictionary` mutated from at least three thread contexts with no synchronization: the caller thread (`SubscribeAsync`/`UnsubscribeAsync`), the keep-alive callback thread (`HandleKeepAliveFailureAsync` -> `ReplaySubscriptionsAsync`, invoked fire-and-forget from the OPC UA `KeepAlive` event), and `DisconnectAsync`. Concurrent `Add`/`Remove`/`Clear`/enumeration on a non-thread-safe `Dictionary` can corrupt its internal buckets, throw `InvalidOperationException`, or lose entries. A failover firing while the UI calls `SubscribeAsync` is a realistic trigger. The `_activeAlarmSubscription` nullable tuple has the same exposure.
|
||||
|
||||
**Recommendation:** Guard all access to `_activeDataSubscriptions` / `_activeAlarmSubscription` (and the `_session`/`_dataSubscription`/`_alarmSubscription` fields) with a single lock, or move subscription bookkeeping behind a `ConcurrentDictionary` plus a lock for the multi-field failover transition.
|
||||
|
||||
**Resolution:** _(open)_
|
||||
**Resolution:** Resolved 2026-05-22 — added a dedicated `_subscriptionLock` and wrapped every read/write of `_activeDataSubscriptions` and `_activeAlarmSubscription` (in Subscribe/Unsubscribe[Alarms]Async, Disconnect, Dispose, and the snapshot/clear/re-record steps of ReplaySubscriptionsAsync) inside it; awaited adapter calls run outside the lock to avoid holding it across I/O.
|
||||
|
||||
### Client.Shared-006
|
||||
|
||||
@@ -108,13 +108,13 @@
|
||||
| Severity | High |
|
||||
| Category | Concurrency & thread safety |
|
||||
| Location | `OpcUaClientService.cs:97-100`, `OpcUaClientService.cs:432-497` |
|
||||
| Status | Open |
|
||||
| Status | Resolved |
|
||||
|
||||
**Description:** `HandleKeepAliveFailureAsync` is launched fire-and-forget (`_ = HandleKeepAliveFailureAsync()`) from every bad keep-alive callback. The only guard against re-entry is the non-atomic check `if (_state == Reconnecting || _state == Disconnected) return;` at the top. Between that read and the `TransitionState(Reconnecting, ...)` write a few lines later, a second keep-alive failure (the SDK raises `KeepAlive` repeatedly while a session is down) can pass the same guard, and two failover loops run concurrently — each disposing `_session`, nulling subscription fields, and racing to assign a new `_session`. The session created by the loser leaks, and `ReplaySubscriptionsAsync` can run twice creating duplicate monitored items.
|
||||
|
||||
**Recommendation:** Serialize failover with an `Interlocked.CompareExchange` flag or a `SemaphoreSlim(1,1)` so only one failover loop runs at a time; subsequent keep-alive failures during an in-flight failover should be ignored. Make the state transition atomic with the re-entry guard.
|
||||
|
||||
**Resolution:** _(open)_
|
||||
**Resolution:** Resolved 2026-05-22 — `HandleKeepAliveFailureAsync` now claims an atomic `_failoverInProgress` slot via `Interlocked.CompareExchange(ref _failoverInProgress, 1, 0)`; a re-entrant bad keep-alive sees `1` and returns immediately, so only one failover loop runs. The loop body moved to `RunFailoverAsync`, wrapped in try/finally that resets the flag with `Interlocked.Exchange`.
|
||||
|
||||
### Client.Shared-007
|
||||
|
||||
|
||||
Reference in New Issue
Block a user