Files
lmxopcua/code-reviews/Core.Abstractions/findings.md
Joseph Doherty 11612900ba fix(core-abstractions): resolve Medium code-review findings (Core.Abstractions-001, -002, -003)
Core.Abstractions-001: PollGroupEngine compares array values with structural
equality so a driver returning a fresh T[] each poll no longer fires spuriously.
Core.Abstractions-002: PollOnceAsync guards reader result cardinality and
throws a descriptive InvalidOperationException on mismatch instead of a
swallowed ArgumentOutOfRangeException that stalled the subscription.
Core.Abstractions-003: the poll loop Task is tracked; Unsubscribe/DisposeAsync
await loop completion before disposing the CTS.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 08:29:49 -04:00

12 KiB

Code Review — Core.Abstractions

Field Value
Module src/Core/ZB.MOM.WW.OtOpcUa.Core.Abstractions
Reviewer Claude Code
Review date 2026-05-22
Commit reviewed 76d35d1
Status Reviewed
Open findings 5

Checklist coverage

A comprehensive review completes every category, recording "No issues found" where a category produced nothing rather than leaving it blank.

# Category Result
1 Correctness & logic bugs Core.Abstractions-001, Core.Abstractions-002
2 OtOpcUa conventions No issues found
3 Concurrency & thread safety Core.Abstractions-003, Core.Abstractions-004
4 Error handling & resilience Core.Abstractions-005
5 Security No issues found
6 Performance & resource management No issues found
7 Design-document adherence No issues found
8 Code organization & conventions Core.Abstractions-006
9 Testing coverage Core.Abstractions-007
10 Documentation & comments Core.Abstractions-008

Findings

Core.Abstractions-001

Field Value
Severity Medium
Category Correctness & logic bugs
Location src/Core/ZB.MOM.WW.OtOpcUa.Core.Abstractions/PollGroupEngine.cs:112
Status Resolved

Description: PollOnceAsync detects a change with !Equals(lastSeen?.Value, current.Value). object.Equals falls back to reference equality for reference types that do not override it — including T[] array values. The capability interfaces explicitly support 1-D array attributes (DriverAttributeInfo.IsArray, ValueRank=1), and a driver's batch reader produces a fresh array instance on every poll. As a result every poll of an array-valued tag is treated as a change, so OnDataChange fires on every tick regardless of whether the array contents actually changed. This produces spurious data-change notifications and unnecessary OPC UA monitored-item publishes for any poll-based driver (Modbus, S7, AB CIP, FOCAS) that exposes array tags.

Recommendation: Compare array values structurally — e.g. when both lastSeen?.Value and current.Value are arrays, compare with StructuralComparisons.StructuralEqualityComparer.Equals (or element-wise) — instead of relying on object.Equals. Add a test covering an array-valued tag whose contents are unchanged across polls.

Resolution: Resolved 2026-05-22 — introduced ValuesAreDifferent helper in PollGroupEngine that uses StructuralComparisons.StructuralEqualityComparer for Array values, falling back to object.Equals for scalars; added Array_valued_tag_unchanged_contents_raises_only_once and Array_valued_tag_changed_contents_raises_event tests.

Core.Abstractions-002

Field Value
Severity Medium
Category Correctness & logic bugs
Location src/Core/ZB.MOM.WW.OtOpcUa.Core.Abstractions/PollGroupEngine.cs:105-109
Status Resolved

Description: PollOnceAsync iterates state.TagReferences and indexes the reader's result with snapshots[i], assuming the driver-supplied _reader delegate returns exactly one snapshot per input reference in input order. The contract is documented (ctor XML doc: "snapshots MUST be returned in the same order as the input references"), but it is never validated. A reader that returns a shorter list — a plausible driver bug, or a partial result on a backend error — throws ArgumentOutOfRangeException from the indexer. That exception escapes PollOnceAsync, is swallowed by the catch-all in PollLoopAsync (line 99), and the subscription then silently produces no further OnDataChange callbacks for the rest of its lifetime with no diagnostic. The failure mode is a permanently stalled subscription that looks healthy.

Recommendation: Validate snapshots.Count == state.TagReferences.Count at the top of PollOnceAsync and throw a descriptive exception (or skip the tick with a logged diagnostic) so the contract violation is visible rather than silently degrading. Consider surfacing repeated reader-contract failures through a callback the driver can route to its health surface.

Resolution: Resolved 2026-05-22 — added count-guard at the top of PollOnceAsync that throws InvalidOperationException with a descriptive message when the reader returns the wrong number of snapshots; added Reader_short_result_list_raises_descriptive_exception_and_loop_continues test verifying the loop survives contract violations and resumes delivering events once the reader recovers.

Core.Abstractions-003

Field Value
Severity Medium
Category Concurrency & thread safety
Location src/Core/ZB.MOM.WW.OtOpcUa.Core.Abstractions/PollGroupEngine.cs:64,121-130
Status Resolved

Description: Subscribe starts the poll loop with a fire-and-forget Task.Run and keeps no reference to the returned Task. Neither Unsubscribe nor DisposeAsync awaits the loop's completion — they only cancel the CancellationTokenSource and dispose it. Two consequences:

  1. After DisposeAsync/Unsubscribe returns, a poll already in flight inside PollOnceAsync can still complete and invoke the _onChange callback. A driver that disposes the engine during shutdown can therefore receive a data-change callback after it considers the engine torn down, with no way to know the engine is gone.
  2. Unsubscribe/DisposeAsync call state.Cts.Dispose() immediately while the loop may still be inside Task.Delay(state.Interval, ct). Cancelling-then-disposing a CTS while a consumer still touches the token can race; Task.Delay on a disposed token can throw ObjectDisposedException rather than OperationCanceledException, which the Task.Delay await in PollLoopAsync does not catch (it catches only OperationCanceledException).

Recommendation: Track each loop Task in SubscriptionState and await it (with a timeout) in Unsubscribe/DisposeAsync before disposing the CTS, so disposal is deterministic and no callback can fire after teardown. At minimum, defer Cts.Dispose() until the loop task has observed cancellation, or wrap the Task.Delay await to also tolerate ObjectDisposedException.

Resolution: Resolved 2026-05-22 — stored the loop Task in SubscriptionState.LoopTask; Unsubscribe calls StopState which cancels then awaits the task (5 s timeout) before disposing the CTS; DisposeAsync cancels all loops in parallel then awaits them all via Task.WhenAll with a 5 s timeout before disposing CTSs, making teardown deterministic and preventing post-disposal callbacks.

Core.Abstractions-004

Field Value
Severity Low
Category Concurrency & thread safety
Location src/Core/ZB.MOM.WW.OtOpcUa.Core.Abstractions/DriverTypeRegistry.cs:23-40
Status Open

Description: Register performs a check-then-act sequence (snapshot.ContainsKey then build next then Interlocked.Exchange) that is not atomic. Two threads registering concurrently can both pass the duplicate check and both build a next dictionary; the second Interlocked.Exchange then wins and silently discards the first registration, defeating the documented "registered only once" guarantee. The class XML doc states registration happens single-threaded at startup, so this is not a live defect — but the use of Interlocked.Exchange for the swap implies the type is fully thread-safe for writers, which it is not. The mismatch between the implementation's apparent intent and its actual guarantee is a maintenance hazard.

Recommendation: Either guard Register with a lock so the check-build-swap is atomic, or strengthen the XML Thread-safety remark to state explicitly that concurrent Register calls are unsupported and only reader/writer concurrency is safe.

Resolution: (open)

Core.Abstractions-005

Field Value
Severity Low
Category Error handling & resilience
Location src/Core/ZB.MOM.WW.OtOpcUa.Core.Abstractions/PollGroupEngine.cs:90,99
Status Open

Description: Both the initial-poll and steady-state catch blocks use a bare catch { } that swallows every exception type, including non-transient programmer errors such as NullReferenceException and ArgumentOutOfRangeException (see Core.Abstractions-002). The XML remark says "transient poll error — loop continues, driver health surface logs it", but the engine never actually notifies the driver — there is no callback or event for a caught exception, so the driver's health surface has nothing to log. A persistently failing reader produces a silently spinning loop with zero observability from inside this module.

Recommendation: Narrow the catch to the exception types a reader is expected to throw (or at least exclude obviously-fatal ones), and add an optional Action<Exception> error callback (or raise an event) so the owning driver can record poll failures on its health surface as the doc claims happens.

Resolution: (open)

Core.Abstractions-006

Field Value
Severity Low
Category Code organization & conventions
Location src/Core/ZB.MOM.WW.OtOpcUa.Core.Abstractions/IHistoryProvider.cs:63,84-86, src/Core/ZB.MOM.WW.OtOpcUa.Core.Abstractions/Historian/IHistorianDataSource.cs:30,63
Status Open

Description: The two history-read surfaces use inconsistent integer types for the same "maximum rows" concept. IHistoryProvider.ReadRawAsync and IHistorianDataSource.ReadRawAsync take uint maxValuesPerNode, but ReadEventsAsync (on both interfaces) takes int maxEvents. The OPC UA HistoryRead service request fields are unsigned, and a negative maxEvents has no defined meaning. Mixing int and uint for the same parameter role across sibling methods forces every caller and implementer to reason about the inconsistency and risks accidental sign issues at the boundary.

Recommendation: Standardize on uint for all max-rows parameters across both IHistoryProvider and IHistorianDataSource (or document explicitly why maxEvents differs).

Resolution: (open)

Core.Abstractions-007

Field Value
Severity Low
Category Testing coverage
Location tests/Core/ZB.MOM.WW.OtOpcUa.Core.Abstractions.Tests/PollGroupEngineTests.cs
Status Open

Description: PollGroupEngine is the only behavioural (non-DTO) type in the module and its tests, while solid for the happy paths, miss two paths that this review identifies as defect-prone: (a) no test exercises an array-valued tag whose contents are unchanged across polls (would catch Core.Abstractions-001), and (b) no test exercises a reader that returns a snapshot list shorter than the input references (would catch Core.Abstractions-002). The Reader_exception_does_not_crash_loop test only covers a reader that throws before producing any result. DataValueSnapshot change-detection semantics for reference-typed values are therefore unverified.

Recommendation: Add tests for the unchanged-array case and the short-result-list case once Core.Abstractions-001/002 are addressed, so the intended contract is locked down.

Resolution: (open)

Core.Abstractions-008

Field Value
Severity Low
Category Documentation & comments
Location src/Core/ZB.MOM.WW.OtOpcUa.Core.Abstractions/DriverHealth.cs:9, src/Core/ZB.MOM.WW.OtOpcUa.Core.Abstractions/IHistoryProvider.cs:39-43,65-69
Status Open

Description: Two XML-doc inaccuracies:

  1. DriverHealth.LastError is documented as "Most recent error message; null when state is Healthy." The DriverState enum also defines Degraded, Reconnecting, and Faulted states, all of which carry an error; and a driver in Healthy state may legitimately retain the last error from a previously-recovered failure. The "null when Healthy" claim is stronger than the type enforces and than callers should rely on.
  2. IHistoryProvider.ReadAtTimeAsync and ReadEventsAsync are C# default interface methods whose <remarks> say "Default implementation throws". This is accurate, but the sibling IHistorianDataSource declares the same methods as required (non-default) members — the asymmetry between the two history surfaces is undocumented and could surprise an implementer who assumes parity.

Recommendation: Reword DriverHealth.LastError to "Most recent error message; may be null when no error has been recorded" without tying nullness to a specific state. Add a one-line note on IHistoryProvider/IHistorianDataSource explaining why one surface uses default methods and the other does not.

Resolution: (open)