review(Driver.OpcUaClient): release browse continuation point on cancel

Re-review at 7286d320. Driver.OpcUaClient-016: BrowseRecursiveAsync now releases the server-side
continuation point on OperationCanceledException (BrowseNext with releaseContinuationPoints: true)
before rethrowing, resolving the Browser-002 cross-cutting leak. Fix applied with TDD.
Joseph Doherty
2026-06-19 11:47:11 -04:00
parent 04e0877bff
commit be272d960f
3 changed files with 302 additions and 7 deletions
+42 -2
@@ -4,8 +4,8 @@
|---|---|
| Module | `src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.OpcUaClient` |
| Reviewer | Claude Code |
| Review date | 2026-05-22 |
| Commit reviewed | `76d35d1` |
| Review date | 2026-06-19 |
| Commit reviewed | `7286d320` |
| Status | Reviewed |
| Open findings | 0 |
@@ -250,3 +250,43 @@
**Recommendation:** Add tests exercising the reconnect callbacks with a stub session (success and give-up cases), a browse test with a paged/continuation-point server stub, and a read-batch test asserting upstream Bad StatusCodes pass through verbatim while a transport throw fans out the local fault code.
**Resolution:** Resolved 2026-05-22 — Added `OpcUaClientMediumFindingsRegressionTests.cs` covering: (1) BadTimeout vs BadCommunicationError status-code distinction for the write-timeout path (Driver.OpcUaClient-009); (2) Byte→UInt16 mapping regression (Driver.OpcUaClient-010); (3) AutoAcceptCertificates warning log assertion (Driver.OpcUaClient-012); (4) GetMemoryFootprint/FlushOptionalCachesAsync contract (Driver.OpcUaClient-013); (5) MapSeverity thresholds, pre-init health, Session null pre-init, GetHostStatuses contract. Wire-level reconnect callback tests remain fixture-gated pending the in-process OPC UA server fixture.
---
## Re-review 2026-06-19 (commit 7286d320)
#### Checklist coverage
| # | Category | Result |
|---|---|---|
| 1 | Correctness & logic bugs | No new issues found |
| 2 | OtOpcUa conventions | No issues found |
| 3 | Concurrency & thread safety | No new issues found |
| 4 | Error handling & resilience | Driver.OpcUaClient-016 (continuation-point leak on cancellation) |
| 5 | Security | No new issues found |
| 6 | Performance & resource management | Driver.OpcUaClient-016 (server-side cursor held open) |
| 7 | Design-document adherence | No new issues found |
| 8 | Code organization & conventions | No issues found |
| 9 | Testing coverage | Driver.OpcUaClient-016 fixed with two new regression tests |
| 10 | Documentation & comments | No new issues found |
#### Browser-002 verdict
The Browser subagent (Driver.OpcUaClient.Browser-002) noted "the identical pattern exists in the runtime driver (`OpcUaClientDriver.cs`)" and deferred cross-cutting resolution here. Evaluated and confirmed: `BrowseRecursiveAsync` lines 934-948 at this commit contain the same continuation-point leak on `OperationCanceledException`. The self-contained fix is recorded as Driver.OpcUaClient-016 below and was applied with TDD.
### Driver.OpcUaClient-016
| Field | Value |
|---|---|
| Severity | Low |
| Category | Performance & resource management / Error handling & resilience |
| Location | `src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.OpcUaClient/OpcUaClientDriver.cs:934-948` |
| Status | Resolved |
**Description:** When the caller's `CancellationToken` fires during the `BrowseNextAsync` pagination loop inside `BrowseRecursiveAsync`, the `OperationCanceledException` propagates out of the while-loop and is caught by the outer `catch` (line 950), which silently swallows it. No call to `BrowseNextAsync(releaseContinuationPoints: true)` is made, so the server holds the browse cursor open until the session closes or a server-side timeout fires. For long-lived discovery sessions against large remote servers, this is a server-side resource leak that can accumulate across multiple cancelled discovery passes. Additionally, cancellation is indistinguishable from a transient browse failure: callers cannot observe that their `CancellationToken` was respected.
This is the same pattern the Browser subagent recorded as Driver.OpcUaClient.Browser-002, which noted it also exists in this runtime pagination loop.
**Recommendation:** Catch `OperationCanceledException` specifically inside the while-loop and issue a fire-and-forget `BrowseNextAsync(releaseContinuationPoints: true, ct: CancellationToken.None)` to release the cursor before rethrowing. Change the outer catch to propagate `OperationCanceledException` rather than swallowing it, so callers can observe cancelled discovery.
**Resolution:** Resolved 2026-06-19 (SHA pending commit) — Added an inner `catch (OperationCanceledException)` in the pagination while-loop that releases the server-side continuation point via `BrowseNextAsync(releaseContinuationPoints: true, ct: CancellationToken.None)` then rethrows; changed the outer catch to a `catch (OperationCanceledException) { throw; }` / `catch { return; }` pair so cancellation propagates out of `DiscoverAsync` while transient browse sub-tree failures continue to be silently skipped. Two regression tests added to `OpcUaClientContinuationPointReleaseTests.cs`: `DiscoverAsync_releases_continuation_point_when_cancelled_mid_pagination` (verifies `BrowseNextAsync(release=true)` is called once on cancellation) and `DiscoverAsync_does_not_release_continuation_point_on_non_cancel_browse_failure` (verifies no spurious release on transport error).
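The shape of the fix can be illustrated with a minimal, self-contained sketch. It is not the driver's code: `IPagedBrowser`, `BrowsePage`, `BrowseSketch` and `FakeBrowser` are hypothetical names introduced here, and the method signatures are simplified stand-ins rather than the OPC UA SDK's session API. The sketch only assumes what the finding states: a release call made with `CancellationToken.None` inside an inner `catch (OperationCanceledException)`, followed by a rethrow, and an outer catch pair that lets cancellation propagate while other failures are skipped.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical page type: node names plus a continuation point (null when exhausted).
public sealed record BrowsePage(IReadOnlyList<string> Nodes, byte[]? ContinuationPoint);

// Hypothetical stand-in for the paged browse calls; NOT the OPC UA SDK signatures.
public interface IPagedBrowser
{
    Task<BrowsePage> BrowseAsync(string nodeId, CancellationToken ct);
    // release == true asks the server to free the cursor instead of returning a page.
    Task<BrowsePage> BrowseNextAsync(bool release, byte[] continuationPoint, CancellationToken ct);
}

public static class BrowseSketch
{
    public static async Task BrowseSubtreeAsync(
        IPagedBrowser browser, string nodeId, List<string> sink, CancellationToken ct)
    {
        try
        {
            var page = await browser.BrowseAsync(nodeId, ct);
            sink.AddRange(page.Nodes);
            while (page.ContinuationPoint is { } pending)
            {
                try
                {
                    page = await browser.BrowseNextAsync(false, pending, ct);
                }
                catch (OperationCanceledException)
                {
                    // ct is already cancelled, so the release must use CancellationToken.None.
                    // Best effort: a failed release must not mask the cancellation.
                    try { await browser.BrowseNextAsync(true, pending, CancellationToken.None); }
                    catch { /* ignore release failures */ }
                    throw;
                }
                sink.AddRange(page.Nodes);
            }
        }
        catch (OperationCanceledException) { throw; } // cancellation propagates to the caller
        catch { return; }                             // other sub-tree failures are skipped
    }
}

// Fake server used only to exercise the sketch: the first page always carries a cursor.
public sealed class FakeBrowser : IPagedBrowser
{
    public int ReleaseCalls { get; private set; }
    public Exception? FailWith { get; set; }

    public Task<BrowsePage> BrowseAsync(string nodeId, CancellationToken ct)
        => Task.FromResult(new BrowsePage(new[] { "A" }, new byte[] { 1 }));

    public Task<BrowsePage> BrowseNextAsync(bool release, byte[] continuationPoint, CancellationToken ct)
    {
        if (release)
        {
            ReleaseCalls++;
            return Task.FromResult(new BrowsePage(Array.Empty<string>(), null));
        }
        if (FailWith is not null) return Task.FromException<BrowsePage>(FailWith);
        if (ct.IsCancellationRequested) return Task.FromCanceled<BrowsePage>(ct);
        return Task.FromResult(new BrowsePage(new[] { "B" }, null));
    }
}
```

With an already-cancelled token, `BrowseSubtreeAsync` on a `FakeBrowser` throws an `OperationCanceledException` and leaves `ReleaseCalls` at 1. With `FailWith` set to a non-cancellation exception, it returns normally and `ReleaseCalls` stays at 0. These two cases mirror the two regression tests named in the resolution.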