Files
lmxopcua/code-reviews/Core.Abstractions/findings.md
Joseph Doherty ff2e75ab98 fix(core-abstractions): resolve Low code-review findings (Core.Abstractions-004,005,006,007,008)
- Core.Abstractions-004: guard DriverTypeRegistry.Register with a Lock so
  concurrent registrations are atomic.
- Core.Abstractions-005: narrow PollGroupEngine catch blocks to non-fatal
  exceptions, add optional onError callback, tolerate disposed-CTS races.
- Core.Abstractions-006: document the deliberate int-vs-uint asymmetry on
  IHistoryProvider.ReadEventsAsync / IHistorianDataSource.ReadEventsAsync.
- Core.Abstractions-007: pin the gaps with PollGroupEngine + DriverHealth
  contract tests.
- Core.Abstractions-008: correct XML docs on DriverHealth.LastError and
  the optional / required asymmetry on the history-read surfaces.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-23 05:37:54 -04:00

16 KiB

Code Review — Core.Abstractions

Field Value
Module src/Core/ZB.MOM.WW.OtOpcUa.Core.Abstractions
Reviewer Claude Code
Review date 2026-05-22
Commit reviewed 76d35d1
Status Reviewed
Open findings 0

Checklist coverage

A comprehensive review completes every category, recording "No issues found" where a category produced nothing rather than leaving it blank.

# Category Result
1 Correctness & logic bugs Core.Abstractions-001, Core.Abstractions-002
2 OtOpcUa conventions No issues found
3 Concurrency & thread safety Core.Abstractions-003, Core.Abstractions-004
4 Error handling & resilience Core.Abstractions-005
5 Security No issues found
6 Performance & resource management No issues found
7 Design-document adherence No issues found
8 Code organization & conventions Core.Abstractions-006
9 Testing coverage Core.Abstractions-007
10 Documentation & comments Core.Abstractions-008

Findings

Core.Abstractions-001

Field Value
Severity Medium
Category Correctness & logic bugs
Location src/Core/ZB.MOM.WW.OtOpcUa.Core.Abstractions/PollGroupEngine.cs:112
Status Resolved

Description: PollOnceAsync detects a change with !Equals(lastSeen?.Value, current.Value). object.Equals falls back to reference equality for reference types that do not override it — including T[] array values. The capability interfaces explicitly support 1-D array attributes (DriverAttributeInfo.IsArray, ValueRank=1), and a driver's batch reader produces a fresh array instance on every poll. As a result every poll of an array-valued tag is treated as a change, so OnDataChange fires on every tick regardless of whether the array contents actually changed. This produces spurious data-change notifications and unnecessary OPC UA monitored-item publishes for any poll-based driver (Modbus, S7, AB CIP, FOCAS) that exposes array tags.

Recommendation: Compare array values structurally — e.g. when both lastSeen?.Value and current.Value are arrays, compare with StructuralComparisons.StructuralEqualityComparer.Equals (or element-wise) — instead of relying on object.Equals. Add a test covering an array-valued tag whose contents are unchanged across polls.

Resolution: Resolved 2026-05-22 — introduced ValuesAreDifferent helper in PollGroupEngine that uses StructuralComparisons.StructuralEqualityComparer for Array values, falling back to object.Equals for scalars; added Array_valued_tag_unchanged_contents_raises_only_once and Array_valued_tag_changed_contents_raises_event tests.

Core.Abstractions-002

Field Value
Severity Medium
Category Correctness & logic bugs
Location src/Core/ZB.MOM.WW.OtOpcUa.Core.Abstractions/PollGroupEngine.cs:105-109
Status Resolved

Description: PollOnceAsync iterates state.TagReferences and indexes the reader's result with snapshots[i], assuming the driver-supplied _reader delegate returns exactly one snapshot per input reference in input order. The contract is documented (ctor XML doc: "snapshots MUST be returned in the same order as the input references"), but it is never validated. A reader that returns a shorter list — a plausible driver bug, or a partial result on a backend error — throws ArgumentOutOfRangeException from the indexer. That exception escapes PollOnceAsync, is swallowed by the catch-all in PollLoopAsync (line 99), and the subscription then silently produces no further OnDataChange callbacks for the rest of its lifetime with no diagnostic. The failure mode is a permanently stalled subscription that looks healthy.

Recommendation: Validate snapshots.Count == state.TagReferences.Count at the top of PollOnceAsync and throw a descriptive exception (or skip the tick with a logged diagnostic) so the contract violation is visible rather than silently degrading. Consider surfacing repeated reader-contract failures through a callback the driver can route to its health surface.

Resolution: Resolved 2026-05-22 — added count-guard at the top of PollOnceAsync that throws InvalidOperationException with a descriptive message when the reader returns the wrong number of snapshots; added Reader_short_result_list_raises_descriptive_exception_and_loop_continues test verifying the loop survives contract violations and resumes delivering events once the reader recovers.

Core.Abstractions-003

Field Value
Severity Medium
Category Concurrency & thread safety
Location src/Core/ZB.MOM.WW.OtOpcUa.Core.Abstractions/PollGroupEngine.cs:64,121-130
Status Resolved

Description: Subscribe starts the poll loop with a fire-and-forget Task.Run and keeps no reference to the returned Task. Neither Unsubscribe nor DisposeAsync awaits the loop's completion — they only cancel the CancellationTokenSource and dispose it. Two consequences:

  1. After DisposeAsync/Unsubscribe returns, a poll already in flight inside PollOnceAsync can still complete and invoke the _onChange callback. A driver that disposes the engine during shutdown can therefore receive a data-change callback after it considers the engine torn down, with no way to know the engine is gone.
  2. Unsubscribe/DisposeAsync call state.Cts.Dispose() immediately while the loop may still be inside Task.Delay(state.Interval, ct). Cancelling-then-disposing a CTS while a consumer still touches the token can race; Task.Delay on a disposed token can throw ObjectDisposedException rather than OperationCanceledException, which the Task.Delay await in PollLoopAsync does not catch (it catches only OperationCanceledException).

Recommendation: Track each loop Task in SubscriptionState and await it (with a timeout) in Unsubscribe/DisposeAsync before disposing the CTS, so disposal is deterministic and no callback can fire after teardown. At minimum, defer Cts.Dispose() until the loop task has observed cancellation, or wrap the Task.Delay await to also tolerate ObjectDisposedException.

Resolution: Resolved 2026-05-22 — stored the loop Task in SubscriptionState.LoopTask; Unsubscribe calls StopState which cancels then awaits the task (5 s timeout) before disposing the CTS; DisposeAsync cancels all loops in parallel then awaits them all via Task.WhenAll with a 5 s timeout before disposing CTSs, making teardown deterministic and preventing post-disposal callbacks.

Core.Abstractions-004

Field Value
Severity Low
Category Concurrency & thread safety
Location src/Core/ZB.MOM.WW.OtOpcUa.Core.Abstractions/DriverTypeRegistry.cs:23-40
Status Resolved

Description: Register performs a check-then-act sequence (snapshot.ContainsKey then build next then Interlocked.Exchange) that is not atomic. Two threads registering concurrently can both pass the duplicate check and both build a next dictionary; the second Interlocked.Exchange then wins and silently discards the first registration, defeating the documented "registered only once" guarantee. The class XML doc states registration happens single-threaded at startup, so this is not a live defect — but the use of Interlocked.Exchange for the swap implies the type is fully thread-safe for writers, which it is not. The mismatch between the implementation's apparent intent and its actual guarantee is a maintenance hazard.

Recommendation: Either guard Register with a lock so the check-build-swap is atomic, or strengthen the XML Thread-safety remark to state explicitly that concurrent Register calls are unsupported and only reader/writer concurrency is safe.

Resolution: Resolved 2026-05-23 — guarded the duplicate check + copy-on-write rebuild + swap with a private Lock, making the check-build-swap atomic. Added Register_ConcurrentDistinctTypes_AllSucceed and Register_ConcurrentDuplicateType_ExactlyOneWins tests that exercise 16/32 racing threads and assert the "registered only once" guarantee holds.

Core.Abstractions-005

Field Value
Severity Low
Category Error handling & resilience
Location src/Core/ZB.MOM.WW.OtOpcUa.Core.Abstractions/PollGroupEngine.cs:90,99
Status Resolved

Description: Both the initial-poll and steady-state catch blocks use a bare catch { } that swallows every exception type, including non-transient programmer errors such as NullReferenceException and ArgumentOutOfRangeException (see Core.Abstractions-002). The XML remark says "transient poll error — loop continues, driver health surface logs it", but the engine never actually notifies the driver — there is no callback or event for a caught exception, so the driver's health surface has nothing to log. A persistently failing reader produces a silently spinning loop with zero observability from inside this module.

Recommendation: Narrow the catch to the exception types a reader is expected to throw (or at least exclude obviously-fatal ones), and add an optional Action<Exception> error callback (or raise an event) so the owning driver can record poll failures on its health surface as the doc claims happens.

Resolution: Resolved 2026-05-23 — narrowed the catch blocks with an IsFatal guard so OutOfMemoryException / StackOverflowException / AccessViolationException / ThreadAbortException propagate instead of being swallowed; added an optional Action<Exception>? onError constructor parameter (backward-compatible — every existing driver call site uses named args and is unaffected) and routed every caught reader / contract-violation exception through a ReportError helper that defends against a buggy error sink. Also tolerates ObjectDisposedException from Task.Delay against an already-disposed CTS (defensive race-safety). Added Reader_exception_is_reported_to_onError_callback, Reader_contract_violation_routes_to_onError_callback, and OnError_handler_that_throws_does_not_crash_loop regression tests.

Core.Abstractions-006

Field Value
Severity Low
Category Code organization & conventions
Location src/Core/ZB.MOM.WW.OtOpcUa.Core.Abstractions/IHistoryProvider.cs:63,84-86, src/Core/ZB.MOM.WW.OtOpcUa.Core.Abstractions/Historian/IHistorianDataSource.cs:30,63
Status Resolved

Description: The two history-read surfaces use inconsistent integer types for the same "maximum rows" concept. IHistoryProvider.ReadRawAsync and IHistorianDataSource.ReadRawAsync take uint maxValuesPerNode, but ReadEventsAsync (on both interfaces) takes int maxEvents. The OPC UA HistoryRead service request fields are unsigned, and a negative maxEvents has no defined meaning. Mixing int and uint for the same parameter role across sibling methods forces every caller and implementer to reason about the inconsistency and risks accidental sign issues at the boundary.

Recommendation: Standardize on uint for all max-rows parameters across both IHistoryProvider and IHistorianDataSource (or document explicitly why maxEvents differs).

Resolution: Resolved 2026-05-23 — took the documented-difference path (signature change would cross ten files outside Core.Abstractions). int maxEvents is intentionally signed: callers and downstream historian adapters (WonderwareHistorianClient, HistorianDataSource) treat maxEvents <= 0 as a "use the backend's default cap" sentinel that has no uint equivalent. Updated XML docs on both IHistoryProvider.ReadEventsAsync and IHistorianDataSource.ReadEventsAsync to spell out the asymmetry and rationale. Added HistoryRead_MaxParameter_TypePinned / HistoryProvider_MaxParameter_TypePinned contract tests so an accidental future flip is caught.

Core.Abstractions-007

Field Value
Severity Low
Category Testing coverage
Location tests/Core/ZB.MOM.WW.OtOpcUa.Core.Abstractions.Tests/PollGroupEngineTests.cs
Status Resolved

Description: PollGroupEngine is the only behavioural (non-DTO) type in the module and its tests, while solid for the happy paths, miss two paths that this review identifies as defect-prone: (a) no test exercises an array-valued tag whose contents are unchanged across polls (would catch Core.Abstractions-001), and (b) no test exercises a reader that returns a snapshot list shorter than the input references (would catch Core.Abstractions-002). The Reader_exception_does_not_crash_loop test only covers a reader that throws before producing any result. DataValueSnapshot change-detection semantics for reference-typed values are therefore unverified.

Recommendation: Add tests for the unchanged-array case and the short-result-list case once Core.Abstractions-001/002 are addressed, so the intended contract is locked down.

Resolution: Resolved 2026-05-23 — the gap is filled by the regression tests added when Core.Abstractions-001 and -002 were closed: Array_valued_tag_unchanged_contents_raises_only_once, Array_valued_tag_changed_contents_raises_event, and Reader_short_result_list_raises_descriptive_exception_and_loop_continues lock the two previously-untested paths down. The fixes for -004 / -005 / -008 added a further nine regression tests (Register_ConcurrentDistinctTypes_AllSucceed, Register_ConcurrentDuplicateType_ExactlyOneWins, Reader_exception_is_reported_to_onError_callback, Reader_contract_violation_routes_to_onError_callback, OnError_handler_that_throws_does_not_crash_loop, LastError_IsIndependent_OfState, DriverState_EnumContainsExpectedMembers, HistoryRead_MaxParameter_TypePinned, HistoryProvider_MaxParameter_TypePinned, HistoryProvider_OptionalMethods_HaveDefaultImplementation). Total test count rose from 57 to 75.

Core.Abstractions-008

Field Value
Severity Low
Category Documentation & comments
Location src/Core/ZB.MOM.WW.OtOpcUa.Core.Abstractions/DriverHealth.cs:9, src/Core/ZB.MOM.WW.OtOpcUa.Core.Abstractions/IHistoryProvider.cs:39-43,65-69
Status Resolved

Description: Two XML-doc inaccuracies:

  1. DriverHealth.LastError is documented as "Most recent error message; null when state is Healthy." The DriverState enum also defines Degraded, Reconnecting, and Faulted states, all of which carry an error; and a driver in Healthy state may legitimately retain the last error from a previously-recovered failure. The "null when Healthy" claim is stronger than the type enforces and than callers should rely on.
  2. IHistoryProvider.ReadAtTimeAsync and ReadEventsAsync are C# default interface methods whose <remarks> say "Default implementation throws". This is accurate, but the sibling IHistorianDataSource declares the same methods as required (non-default) members — the asymmetry between the two history surfaces is undocumented and could surprise an implementer who assumes parity.

Recommendation: Reword DriverHealth.LastError to "Most recent error message; may be null when no error has been recorded" without tying nullness to a specific state. Add a one-line note on IHistoryProvider/IHistorianDataSource explaining why one surface uses default methods and the other does not.

Resolution: Resolved 2026-05-23 — reworded DriverHealth.LastError to "null when no error has been recorded" and called out that the field is independent of State. Added an asymmetry <remarks> to IHistoryProvider (default-impl-throws so legacy drivers compile) and to IHistorianDataSource.ReadEventsAsync (required because server-side historians own the full surface) cross-referencing the finding. Added LastError_IsIndependent_OfState + DriverState_EnumContainsExpectedMembers and HistoryProvider_OptionalMethods_HaveDefaultImplementation contract tests so the asymmetry pins.