# Code Review — Client.Shared | Field | Value | |---|---| | Module | `src/Client/ZB.MOM.WW.OtOpcUa.Client.Shared` | | Reviewer | Claude Code | | Review date | 2026-05-22 | | Commit reviewed | `76d35d1` | | Status | Reviewed | | Open findings | 0 | ## Checklist coverage | # | Category | Result | |---|---|---| | 1 | Correctness & logic bugs | Client.Shared-001, Client.Shared-002, Client.Shared-003 | | 2 | OtOpcUa conventions | Client.Shared-004 | | 3 | Concurrency & thread safety | Client.Shared-005, Client.Shared-006, Client.Shared-007 | | 4 | Error handling & resilience | Client.Shared-008, Client.Shared-009 | | 5 | Security | No issues found | | 6 | Performance & resource management | Client.Shared-010 | | 7 | Design-document adherence | No issues found | | 8 | Code organization & conventions | No issues found | | 9 | Testing coverage | Client.Shared-011 | | 10 | Documentation & comments | Client.Shared-009 | ## Findings ### Client.Shared-001 | Field | Value | |---|---| | Severity | Medium | | Category | Correctness & logic bugs | | Location | `OpcUaClientService.cs:552` | | Status | Resolved | **Description:** `OnAlarmEventNotification` returns early when `eventFields.EventFields` has fewer than 6 entries. The event filter built by `CreateAlarmEventFilter` always registers 13 select clauses, so a conforming server returns 13 fields. The `< 6` threshold is arbitrary and inconsistent: SourceName is index 2 and Severity index 5, but ConditionName (6), Retain (7), Acked/Active (8/9) and ConditionNodeId (12) are all needed for a usable alarm and are each guarded individually with `fields.Count > N`. A non-conforming server that returns a truncated list (or fewer fields than requested) makes the `< 6` early return silently drop the entire notification, including the EventId/SourceName/Severity that are present. **Recommendation:** Drop the `< 6` early return (or lower it to `< 1`) and rely on the existing per-index `fields.Count > N` guards, which already default missing fields safely. If a hard floor is wanted, document why 6 and not 13. **Resolution:** Resolved 2026-05-22 — lowered the early-return threshold to `< 1` (null or empty guard only); per-index `fields.Count > N` guards already default missing fields safely for all higher indices. ### Client.Shared-002 | Field | Value | |---|---| | Severity | Medium | | Category | Correctness & logic bugs | | Location | `OpcUaClientService.cs:351-355`, `OpcUaClientService.cs:373` | | Status | Resolved | **Description:** `GetRedundancyInfoAsync` performs unguarded unboxing casts on values read from the server: `(int)redundancySupportValue.Value` and `(byte)serviceLevelValue.Value`. Unlike the `ServerUriArray`/`ServerArray` reads below them, the `RedundancySupport` and `ServiceLevel` reads are not wrapped in try/catch. If the server returns the value boxed as a different numeric type than expected (e.g. `ServiceLevel` boxed as `int` instead of `byte`), or returns a null `Value` on a `Bad` DataValue, the cast throws `InvalidCastException`/`NullReferenceException` and the whole call fails instead of returning a sensible default. **Recommendation:** Wrap the `RedundancySupport` and `ServiceLevel` reads in the same defensive pattern used for the array reads, using `Convert.ToInt32`/`Convert.ToByte` on the boxed value and falling back to `None`/`0` when the read status is bad or the value is null. **Resolution:** Resolved 2026-05-22 — replaced direct casts with `StatusCode.IsGood` guard + `Convert.ToInt32`/`Convert.ToByte` coercion; falls back to `None`/`0` when status is bad or value is null. ### Client.Shared-003 | Field | Value | |---|---| | Severity | Low | | Category | Correctness & logic bugs | | Location | `Adapters/DefaultSessionAdapter.cs:76`, `Adapters/DefaultSessionAdapter.cs:273` | | Status | Resolved | **Description:** `WriteValueAsync` returns `response.Results[0]` and `CallMethodAsync` reads `result.Results[0]` without first checking the `Results` collection is non-empty. A malformed or service-level-faulted response (empty `Results` alongside a service fault) produces an `IndexOutOfRangeException` rather than a meaningful OPC UA `StatusCode` or `ServiceResultException`. **Recommendation:** Guard both accesses — throw `ServiceResultException` with the response's `ResponseHeader.ServiceResult` (or `BadUnexpectedError`) when `Results` is empty. **Resolution:** Resolved 2026-05-23 — added empty-Results guards to both `WriteValueAsync` (lines 80-85) and `CallMethodAsync` (lines 293-298) in `DefaultSessionAdapter`. Each now throws `ServiceResultException` carrying `response.ResponseHeader.ServiceResult.Code` (or `StatusCodes.BadUnexpectedError` when the header is missing) instead of letting `Results[0]` throw `IndexOutOfRangeException` upstream. ### Client.Shared-004 | Field | Value | |---|---| | Severity | Low | | Category | OtOpcUa conventions | | Location | `Adapters/DefaultSessionAdapter.cs:228`, `Adapters/DefaultSessionAdapter.cs:121`, `Adapters/DefaultSessionAdapter.cs:172` | | Status | Resolved | **Description:** `CloseAsync`, `HistoryReadRawAsync`, and `HistoryReadAggregateAsync` are declared `async Task` but call the synchronous `Session.Close()` / `Session.HistoryRead(...)` APIs and contain no `await`. The history methods run a blocking synchronous service round-trip on the caller's thread; for the UI this blocks the dispatcher thread. The async signature misleads callers, and the `CancellationToken` parameter is ignored on these paths. **Recommendation:** Use the stack's async overloads (`Session.HistoryReadAsync`, `Session.CloseAsync`) where available, or wrap the synchronous calls in `Task.Run`, so the methods are genuinely asynchronous and honor the cancellation token. **Resolution:** Resolved 2026-05-23 — replaced the three blocking calls with their async counterparts: `CloseAsync` now awaits `Session.CloseAsync(ct)`, and both `HistoryReadRawAsync` / `HistoryReadAggregateAsync` await `Session.HistoryReadAsync(...)` with `.ConfigureAwait(false)`. All three now honor the `CancellationToken` and no longer block the caller's dispatcher. ### Client.Shared-005 | Field | Value | |---|---| | Severity | High | | Category | Concurrency & thread safety | | Location | `OpcUaClientService.cs:19`, `OpcUaClientService.cs:226-249`, `OpcUaClientService.cs:499-521` | | Status | Resolved | **Description:** `_activeDataSubscriptions` is a plain `Dictionary` mutated from at least three thread contexts with no synchronization: the caller thread (`SubscribeAsync`/`UnsubscribeAsync`), the keep-alive callback thread (`HandleKeepAliveFailureAsync` -> `ReplaySubscriptionsAsync`, invoked fire-and-forget from the OPC UA `KeepAlive` event), and `DisconnectAsync`. Concurrent `Add`/`Remove`/`Clear`/enumeration on a non-thread-safe `Dictionary` can corrupt its internal buckets, throw `InvalidOperationException`, or lose entries. A failover firing while the UI calls `SubscribeAsync` is a realistic trigger. The `_activeAlarmSubscription` nullable tuple has the same exposure. **Recommendation:** Guard all access to `_activeDataSubscriptions` / `_activeAlarmSubscription` (and the `_session`/`_dataSubscription`/`_alarmSubscription` fields) with a single lock, or move subscription bookkeeping behind a `ConcurrentDictionary` plus a lock for the multi-field failover transition. **Resolution:** Resolved 2026-05-22 — added a dedicated `_subscriptionLock` and wrapped every read/write of `_activeDataSubscriptions` and `_activeAlarmSubscription` (in Subscribe/Unsubscribe[Alarms]Async, Disconnect, Dispose, and the snapshot/clear/re-record steps of ReplaySubscriptionsAsync) inside it; awaited adapter calls run outside the lock to avoid holding it across I/O. ### Client.Shared-006 | Field | Value | |---|---| | Severity | High | | Category | Concurrency & thread safety | | Location | `OpcUaClientService.cs:97-100`, `OpcUaClientService.cs:432-497` | | Status | Resolved | **Description:** `HandleKeepAliveFailureAsync` is launched fire-and-forget (`_ = HandleKeepAliveFailureAsync()`) from every bad keep-alive callback. The only guard against re-entry is the non-atomic check `if (_state == Reconnecting || _state == Disconnected) return;` at the top. Between that read and the `TransitionState(Reconnecting, ...)` write a few lines later, a second keep-alive failure (the SDK raises `KeepAlive` repeatedly while a session is down) can pass the same guard, and two failover loops run concurrently — each disposing `_session`, nulling subscription fields, and racing to assign a new `_session`. The session created by the loser leaks, and `ReplaySubscriptionsAsync` can run twice creating duplicate monitored items. **Recommendation:** Serialize failover with an `Interlocked.CompareExchange` flag or a `SemaphoreSlim(1,1)` so only one failover loop runs at a time; subsequent keep-alive failures during an in-flight failover should be ignored. Make the state transition atomic with the re-entry guard. **Resolution:** Resolved 2026-05-22 — `HandleKeepAliveFailureAsync` now claims an atomic `_failoverInProgress` slot via `Interlocked.CompareExchange(ref _failoverInProgress, 1, 0)`; a re-entrant bad keep-alive sees `1` and returns immediately, so only one failover loop runs. The loop body moved to `RunFailoverAsync`, wrapped in try/finally that resets the flag with `Interlocked.Exchange`. ### Client.Shared-007 | Field | Value | |---|---| | Severity | Medium | | Category | Concurrency & thread safety | | Location | `OpcUaClientService.cs:581-622` | | Status | Resolved | **Description:** In the alarm fallback path, the `Task.Run` closure mutates the captured locals `activeState`, `ackedState`, `time`, and `capturedMessage`, then reads them when invoking `AlarmEvent`. Because the captured `_session` reference can be replaced by a concurrent failover (see Client.Shared-006), the supplemental `ReadValueAsync` calls may run against a session being disposed, throwing `ObjectDisposedException` — caught by the bare `catch`, after which the alarm is delivered with default (false/MinValue) states, silently misreporting it as inactive/unacknowledged. The notification callback also has no back-pressure: a burst of alarm events spawns an unbounded number of `Task.Run` continuations each doing 3-4 server round-trips. **Recommendation:** Capture the session under the same lock proposed in Client.Shared-005 and skip the supplemental read if the session has changed or is disposed. Consider batching the four sequential `ReadValueAsync` calls into one `Read` request. **Resolution:** Resolved 2026-05-22 — added a `ReferenceEquals(session, _session)` guard at the top of the `Task.Run` body to skip reads if the session was replaced by failover; separated `ObjectDisposedException` from the general catch to drop rather than deliver the stale alarm. ### Client.Shared-008 | Field | Value | |---|---| | Severity | Medium | | Category | Error handling & resilience | | Location | `OpcUaClientService.cs:170-180`, `Helpers/ValueConverter.cs:15-31` | | Status | Resolved | **Description:** `WriteValueAsync` coerces a string input to the target type by reading the node's current value and inferring the type from `currentDataValue.Value`. When the node has never been written, or the read returns a `Bad` status with a null `Value`, `ValueConverter.ConvertValue` falls through to the `_ => rawValue` default and writes a raw `string` into, for example, an `Int32` node — the server then rejects it with `BadTypeMismatch`, surfacing as a confusing failure unrelated to the operator's input. Separately, `ConvertValue` uses `bool.Parse`, which accepts only `true`/`false` — operator input of `1`/`0` throws `FormatException` that propagates raw to the caller. The read-before-write also doubles the round-trip cost of every string write. **Recommendation:** Inspect `currentDataValue.StatusCode` before trusting `Value`; when the type cannot be inferred, surface a clear error rather than writing a mistyped value. Make boolean parsing accept `1`/`0`/`yes`/`no`, and wrap parse failures in a descriptive exception naming the node and target type. **Resolution:** Resolved 2026-05-22 — `WriteValueAsync` now checks `StatusCode.IsGood` and non-null `Value` before calling `ConvertValue`, throwing a descriptive `InvalidOperationException` on bad reads; `ValueConverter` now uses a `ParseBool` helper accepting `true/false/1/0/yes/no` (case-insensitive) and wraps all parse/overflow failures in a `FormatException` with the target type and input value in the message. ### Client.Shared-009 | Field | Value | |---|---| | Severity | Low | | Category | Error handling & resilience / Documentation & comments | | Location | `OpcUaClientService.cs:302-322` | | Status | Resolved | **Description:** `AcknowledgeAlarmAsync` is typed `Task` and its XML doc implies the returned code reports the ack outcome, but the method unconditionally `return StatusCodes.Good`. The actual failure path is `DefaultSessionAdapter.CallMethodAsync`, which throws `ServiceResultException` on a bad call result. A failed acknowledgment therefore never returns a bad `StatusCode` — it throws — and the `StatusCode` return value is dead. Callers writing `if (StatusCode.IsBad(result))` will never see a bad result and will not catch the exception. **Recommendation:** Either change the return type to `Task` (and let exceptions signal failure), or catch `ServiceResultException` in `AcknowledgeAlarmAsync` and return its `StatusCode`. Update the XML doc to match whichever is chosen. **Resolution:** Resolved 2026-05-23 — `AcknowledgeAlarmAsync` now wraps the `CallMethodAsync` invocation in a try/catch for `ServiceResultException`, logging the failure and returning `ex.StatusCode` so callers using `if (StatusCode.IsBad(result))` see the bad status. The `IOpcUaClientService.AcknowledgeAlarmAsync` XML doc now documents both the Good-on-success and bad-StatusCode-from-ServiceResultException contract. Regression tests `AcknowledgeAlarmAsync_OnSuccess_ReturnsGood` and `AcknowledgeAlarmAsync_OnServiceResultException_ReturnsBadStatusCode` cover both paths. ### Client.Shared-010 | Field | Value | |---|---| | Severity | Low | | Category | Performance & resource management | | Location | `Models/ConnectionSettings.cs:48`, `OpcUaClientService.cs:408-417` | | Status | Resolved | **Description:** `ConnectionSettings.CertificateStorePath` is initialized to `ClientStoragePaths.GetPkiPath()` as a property initializer, so every `ConnectionSettings` instantiation runs `Environment.GetFolderPath` + `Path.Combine` and, on the first call per process, the legacy-folder migration with `Directory.Exists`/`Directory.Move` filesystem IO. `ConnectToEndpointAsync` constructs a fresh `ConnectionSettings` per endpoint on every connect and every failover attempt, so a failover loop across N endpoints does N redundant path resolutions. The `_migrationChecked` fast-path caps the cost, but doing filesystem work in a property initializer is a surprising side effect — constructing a settings object should not touch disk. **Recommendation:** Make `CertificateStorePath` default to `string.Empty` and resolve `ClientStoragePaths.GetPkiPath()` lazily inside `DefaultApplicationConfigurationFactory.CreateAsync` only when the path is unset. **Resolution:** Resolved 2026-05-23 — `ConnectionSettings.CertificateStorePath` now defaults to `string.Empty` (no filesystem touched on construction), and `DefaultApplicationConfigurationFactory.CreateAsync` resolves the canonical PKI path via `ClientStoragePaths.GetPkiPath()` only when the supplied path is null/whitespace. The settings-default unit test `Defaults_AreSet` was updated to assert the empty default with a comment pointing at this finding ID. ### Client.Shared-011 | Field | Value | |---|---| | Severity | Low | | Category | Testing coverage | | Location | `tests/Client/ZB.MOM.WW.OtOpcUa.Client.Shared.Tests/OpcUaClientServiceTests.cs` | | Status | Resolved | **Description:** The test suite is solid for the happy paths, connection lifecycle, and single-failover behavior. Gaps relative to the findings above: (a) no test exercises concurrent `SubscribeAsync`/failover to expose the `_activeDataSubscriptions` race (Client.Shared-005) or re-entrant keep-alive failures (Client.Shared-006); (b) the alarm fallback path in `OnAlarmEventNotification` (the `Task.Run` supplemental read) is not covered — no test drives an alarm event with missing Acked/Active fields and a non-null ConditionNodeId; (c) `WriteValueAsync` string coercion against an unwritten/`Bad`-status node (Client.Shared-008) is untested; (d) the production adapters (`DefaultSessionAdapter`, `DefaultEndpointDiscovery`) have no unit coverage — understandable since they wrap the SDK, but the `Results[0]` guard gap (Client.Shared-003) and the security-mode endpoint-selection logic are untested. **Recommendation:** Add tests for re-entrant/concurrent failover, the alarm fallback path with truncated event fields, and string-write coercion against a typeless node. Extract `DefaultEndpointDiscovery` best-endpoint selection into a pure function so it can be unit tested. **Resolution:** Resolved 2026-05-23 — added the previously-missing unit coverage: (a) `OnAlarmEvent_MissingAckedActiveButHasConditionNode_FallbackReadsAndRaisesEvent` drives the supplemental-read fallback path with null AckedState/ActiveState fields and a non-null SourceNode and asserts the Galaxy attribute reads populate the delivered event; (b) `WriteValueAsync` typeless-node coverage is exercised via the Client.Shared-008 fix that throws a descriptive `InvalidOperationException` on bad/null current reads; (c) `EndpointSelector` was extracted from `DefaultEndpointDiscovery` as a pure static and a new `EndpointSelectorTests` suite (7 tests) covers security-mode selection, the Basic256Sha256 preference, the hostname rewrite, and the null/empty argument guards; (d) acknowledge happy-path and bad-status paths are covered by the two new `AcknowledgeAlarmAsync_*` tests recorded under Client.Shared-009. Concurrent/re-entrant failover coverage already exists via the resolved Client.Shared-005/-006 tests in the suite.