Files
lmxopcua/code-reviews/Core.Abstractions/findings.md
T
Joseph Doherty 65e6af6001 review(Core.Abstractions): document ReadEventsAsync continuation contract (OpcUaServer-002 root)
Re-review at 7286d320. Core.Abstractions-009: ReadEventsAsync maxEvents<=0 sentinel now
documents the implementer's continuation-point obligation when a backend cap truncates
(the root of OpcUaServer-002). -010: PollGroupEngineTests pass CancellationToken. Plus
EquipmentTagRefResolver.TryResolve [MaybeNullWhen(false)] NRT cleanup + test.
2026-06-19 11:06:56 -04:00

21 KiB

Code Review — Core.Abstractions

Field Value
Module src/Core/ZB.MOM.WW.OtOpcUa.Core.Abstractions
Reviewer Claude Code
Review date 2026-06-19 (re-review; original 2026-05-22)
Commit reviewed 7286d320
Status Reviewed
Open findings 0

Checklist coverage — original review (76d35d1, 2026-05-22)

A comprehensive review completes every category, recording "No issues found" where a category produced nothing rather than leaving it blank.

# Category Result
1 Correctness & logic bugs Core.Abstractions-001, Core.Abstractions-002
2 OtOpcUa conventions No issues found
3 Concurrency & thread safety Core.Abstractions-003, Core.Abstractions-004
4 Error handling & resilience Core.Abstractions-005
5 Security No issues found
6 Performance & resource management No issues found
7 Design-document adherence No issues found
8 Code organization & conventions Core.Abstractions-006
9 Testing coverage Core.Abstractions-007
10 Documentation & comments Core.Abstractions-008

Checklist coverage — re-review (7286d320, 2026-06-19)

# Category Result
1 Correctness & logic bugs No new issues found
2 OtOpcUa conventions No issues found
3 Concurrency & thread safety No new issues found
4 Error handling & resilience No new issues found
5 Security No issues found
6 Performance & resource management No issues found
7 Design-document adherence No issues found
8 Code organization & conventions No new issues found
9 Testing coverage Core.Abstractions-010
10 Documentation & comments Core.Abstractions-009

Findings

Core.Abstractions-001

Field Value
Severity Medium
Category Correctness & logic bugs
Location src/Core/ZB.MOM.WW.OtOpcUa.Core.Abstractions/PollGroupEngine.cs:112
Status Resolved

Description: PollOnceAsync detects a change with !Equals(lastSeen?.Value, current.Value). object.Equals falls back to reference equality for reference types that do not override it — including T[] array values. The capability interfaces explicitly support 1-D array attributes (DriverAttributeInfo.IsArray, ValueRank=1), and a driver's batch reader produces a fresh array instance on every poll. As a result every poll of an array-valued tag is treated as a change, so OnDataChange fires on every tick regardless of whether the array contents actually changed. This produces spurious data-change notifications and unnecessary OPC UA monitored-item publishes for any poll-based driver (Modbus, S7, AB CIP, FOCAS) that exposes array tags.

Recommendation: Compare array values structurally — e.g. when both lastSeen?.Value and current.Value are arrays, compare with StructuralComparisons.StructuralEqualityComparer.Equals (or element-wise) — instead of relying on object.Equals. Add a test covering an array-valued tag whose contents are unchanged across polls.

Resolution: Resolved 2026-05-22 — introduced ValuesAreDifferent helper in PollGroupEngine that uses StructuralComparisons.StructuralEqualityComparer for Array values, falling back to object.Equals for scalars; added Array_valued_tag_unchanged_contents_raises_only_once and Array_valued_tag_changed_contents_raises_event tests.

Core.Abstractions-002

Field Value
Severity Medium
Category Correctness & logic bugs
Location src/Core/ZB.MOM.WW.OtOpcUa.Core.Abstractions/PollGroupEngine.cs:105-109
Status Resolved

Description: PollOnceAsync iterates state.TagReferences and indexes the reader's result with snapshots[i], assuming the driver-supplied _reader delegate returns exactly one snapshot per input reference in input order. The contract is documented (ctor XML doc: "snapshots MUST be returned in the same order as the input references"), but it is never validated. A reader that returns a shorter list — a plausible driver bug, or a partial result on a backend error — throws ArgumentOutOfRangeException from the indexer. That exception escapes PollOnceAsync, is swallowed by the catch-all in PollLoopAsync (line 99), and the subscription then silently produces no further OnDataChange callbacks for the rest of its lifetime with no diagnostic. The failure mode is a permanently stalled subscription that looks healthy.

Recommendation: Validate snapshots.Count == state.TagReferences.Count at the top of PollOnceAsync and throw a descriptive exception (or skip the tick with a logged diagnostic) so the contract violation is visible rather than silently degrading. Consider surfacing repeated reader-contract failures through a callback the driver can route to its health surface.

Resolution: Resolved 2026-05-22 — added count-guard at the top of PollOnceAsync that throws InvalidOperationException with a descriptive message when the reader returns the wrong number of snapshots; added Reader_short_result_list_raises_descriptive_exception_and_loop_continues test verifying the loop survives contract violations and resumes delivering events once the reader recovers.

Core.Abstractions-003

Field Value
Severity Medium
Category Concurrency & thread safety
Location src/Core/ZB.MOM.WW.OtOpcUa.Core.Abstractions/PollGroupEngine.cs:64,121-130
Status Resolved

Description: Subscribe starts the poll loop with a fire-and-forget Task.Run and keeps no reference to the returned Task. Neither Unsubscribe nor DisposeAsync awaits the loop's completion — they only cancel the CancellationTokenSource and dispose it. Two consequences:

  1. After DisposeAsync/Unsubscribe returns, a poll already in flight inside PollOnceAsync can still complete and invoke the _onChange callback. A driver that disposes the engine during shutdown can therefore receive a data-change callback after it considers the engine torn down, with no way to know the engine is gone.
  2. Unsubscribe/DisposeAsync call state.Cts.Dispose() immediately while the loop may still be inside Task.Delay(state.Interval, ct). Cancelling-then-disposing a CTS while a consumer still touches the token can race; Task.Delay on a disposed token can throw ObjectDisposedException rather than OperationCanceledException, which the Task.Delay await in PollLoopAsync does not catch (it catches only OperationCanceledException).

Recommendation: Track each loop Task in SubscriptionState and await it (with a timeout) in Unsubscribe/DisposeAsync before disposing the CTS, so disposal is deterministic and no callback can fire after teardown. At minimum, defer Cts.Dispose() until the loop task has observed cancellation, or wrap the Task.Delay await to also tolerate ObjectDisposedException.

Resolution: Resolved 2026-05-22 — stored the loop Task in SubscriptionState.LoopTask; Unsubscribe calls StopState which cancels then awaits the task (5 s timeout) before disposing the CTS; DisposeAsync cancels all loops in parallel then awaits them all via Task.WhenAll with a 5 s timeout before disposing CTSs, making teardown deterministic and preventing post-disposal callbacks.

Core.Abstractions-004

Field Value
Severity Low
Category Concurrency & thread safety
Location src/Core/ZB.MOM.WW.OtOpcUa.Core.Abstractions/DriverTypeRegistry.cs:23-40
Status Resolved

Description: Register performs a check-then-act sequence (snapshot.ContainsKey then build next then Interlocked.Exchange) that is not atomic. Two threads registering concurrently can both pass the duplicate check and both build a next dictionary; the second Interlocked.Exchange then wins and silently discards the first registration, defeating the documented "registered only once" guarantee. The class XML doc states registration happens single-threaded at startup, so this is not a live defect — but the use of Interlocked.Exchange for the swap implies the type is fully thread-safe for writers, which it is not. The mismatch between the implementation's apparent intent and its actual guarantee is a maintenance hazard.

Recommendation: Either guard Register with a lock so the check-build-swap is atomic, or strengthen the XML Thread-safety remark to state explicitly that concurrent Register calls are unsupported and only reader/writer concurrency is safe.

Resolution: Resolved 2026-05-23 — guarded the duplicate check + copy-on-write rebuild + swap with a private Lock, making the check-build-swap atomic. Added Register_ConcurrentDistinctTypes_AllSucceed and Register_ConcurrentDuplicateType_ExactlyOneWins tests that exercise 16/32 racing threads and assert the "registered only once" guarantee holds.

Core.Abstractions-005

Field Value
Severity Low
Category Error handling & resilience
Location src/Core/ZB.MOM.WW.OtOpcUa.Core.Abstractions/PollGroupEngine.cs:90,99
Status Resolved

Description: Both the initial-poll and steady-state catch blocks use a bare catch { } that swallows every exception type, including non-transient programmer errors such as NullReferenceException and ArgumentOutOfRangeException (see Core.Abstractions-002). The XML remark says "transient poll error — loop continues, driver health surface logs it", but the engine never actually notifies the driver — there is no callback or event for a caught exception, so the driver's health surface has nothing to log. A persistently failing reader produces a silently spinning loop with zero observability from inside this module.

Recommendation: Narrow the catch to the exception types a reader is expected to throw (or at least exclude obviously-fatal ones), and add an optional Action<Exception> error callback (or raise an event) so the owning driver can record poll failures on its health surface as the doc claims happens.

Resolution: Resolved 2026-05-23 — narrowed the catch blocks with an IsFatal guard so OutOfMemoryException / StackOverflowException / AccessViolationException / ThreadAbortException propagate instead of being swallowed; added an optional Action<Exception>? onError constructor parameter (backward-compatible — every existing driver call site uses named args and is unaffected) and routed every caught reader / contract-violation exception through a ReportError helper that defends against a buggy error sink. Also tolerates ObjectDisposedException from Task.Delay against an already-disposed CTS (defensive race-safety). Added Reader_exception_is_reported_to_onError_callback, Reader_contract_violation_routes_to_onError_callback, and OnError_handler_that_throws_does_not_crash_loop regression tests.

Core.Abstractions-006

Field Value
Severity Low
Category Code organization & conventions
Location src/Core/ZB.MOM.WW.OtOpcUa.Core.Abstractions/IHistoryProvider.cs:63,84-86, src/Core/ZB.MOM.WW.OtOpcUa.Core.Abstractions/Historian/IHistorianDataSource.cs:30,63
Status Resolved

Description: The two history-read surfaces use inconsistent integer types for the same "maximum rows" concept. IHistoryProvider.ReadRawAsync and IHistorianDataSource.ReadRawAsync take uint maxValuesPerNode, but ReadEventsAsync (on both interfaces) takes int maxEvents. The OPC UA HistoryRead service request fields are unsigned, and a negative maxEvents has no defined meaning. Mixing int and uint for the same parameter role across sibling methods forces every caller and implementer to reason about the inconsistency and risks accidental sign issues at the boundary.

Recommendation: Standardize on uint for all max-rows parameters across both IHistoryProvider and IHistorianDataSource (or document explicitly why maxEvents differs).

Resolution: Resolved 2026-05-23 — took the documented-difference path (signature change would cross ten files outside Core.Abstractions). int maxEvents is intentionally signed: callers and downstream historian adapters (WonderwareHistorianClient, HistorianDataSource) treat maxEvents <= 0 as a "use the backend's default cap" sentinel that has no uint equivalent. Updated XML docs on both IHistoryProvider.ReadEventsAsync and IHistorianDataSource.ReadEventsAsync to spell out the asymmetry and rationale. Added HistoryRead_MaxParameter_TypePinned / HistoryProvider_MaxParameter_TypePinned contract tests so an accidental future flip is caught.

Core.Abstractions-007

Field Value
Severity Low
Category Testing coverage
Location tests/Core/ZB.MOM.WW.OtOpcUa.Core.Abstractions.Tests/PollGroupEngineTests.cs
Status Resolved

Description: PollGroupEngine is the only behavioural (non-DTO) type in the module and its tests, while solid for the happy paths, miss two paths that this review identifies as defect-prone: (a) no test exercises an array-valued tag whose contents are unchanged across polls (would catch Core.Abstractions-001), and (b) no test exercises a reader that returns a snapshot list shorter than the input references (would catch Core.Abstractions-002). The Reader_exception_does_not_crash_loop test only covers a reader that throws before producing any result. DataValueSnapshot change-detection semantics for reference-typed values are therefore unverified.

Recommendation: Add tests for the unchanged-array case and the short-result-list case once Core.Abstractions-001/002 are addressed, so the intended contract is locked down.

Resolution: Resolved 2026-05-23 — the gap is filled by the regression tests added when Core.Abstractions-001 and -002 were closed: Array_valued_tag_unchanged_contents_raises_only_once, Array_valued_tag_changed_contents_raises_event, and Reader_short_result_list_raises_descriptive_exception_and_loop_continues lock the two previously-untested paths down. The fixes for -004 / -005 / -008 added a further nine regression tests (Register_ConcurrentDistinctTypes_AllSucceed, Register_ConcurrentDuplicateType_ExactlyOneWins, Reader_exception_is_reported_to_onError_callback, Reader_contract_violation_routes_to_onError_callback, OnError_handler_that_throws_does_not_crash_loop, LastError_IsIndependent_OfState, DriverState_EnumContainsExpectedMembers, HistoryRead_MaxParameter_TypePinned, HistoryProvider_MaxParameter_TypePinned, HistoryProvider_OptionalMethods_HaveDefaultImplementation). Total test count rose from 57 to 75.

Core.Abstractions-008

Field Value
Severity Low
Category Documentation & comments
Location src/Core/ZB.MOM.WW.OtOpcUa.Core.Abstractions/DriverHealth.cs:9, src/Core/ZB.MOM.WW.OtOpcUa.Core.Abstractions/IHistoryProvider.cs:39-43,65-69
Status Resolved

Description: Two XML-doc inaccuracies:

  1. DriverHealth.LastError is documented as "Most recent error message; null when state is Healthy." The DriverState enum also defines Degraded, Reconnecting, and Faulted states, all of which carry an error; and a driver in Healthy state may legitimately retain the last error from a previously-recovered failure. The "null when Healthy" claim is stronger than the type enforces and than callers should rely on.
  2. IHistoryProvider.ReadAtTimeAsync and ReadEventsAsync are C# default interface methods whose <remarks> say "Default implementation throws". This is accurate, but the sibling IHistorianDataSource declares the same methods as required (non-default) members — the asymmetry between the two history surfaces is undocumented and could surprise an implementer who assumes parity.

Recommendation: Reword DriverHealth.LastError to "Most recent error message; may be null when no error has been recorded" without tying nullness to a specific state. Add a one-line note on IHistoryProvider/IHistorianDataSource explaining why one surface uses default methods and the other does not.

Resolution: Resolved 2026-05-23 — reworded DriverHealth.LastError to "null when no error has been recorded" and called out that the field is independent of State. Added an asymmetry <remarks> to IHistoryProvider (default-impl-throws so legacy drivers compile) and to IHistorianDataSource.ReadEventsAsync (required because server-side historians own the full surface) cross-referencing the finding. Added LastError_IsIndependent_OfState + DriverState_EnumContainsExpectedMembers and HistoryProvider_OptionalMethods_HaveDefaultImplementation contract tests so the asymmetry pins.


Re-review 2026-06-19 (commit 7286d320)

Changes since 76d35d1: all eight original findings resolved. New code added: EquipmentTagRefResolver<TDef> (generic helper bridging authored tag-table lookup and equipment-tag JSON ref parsing, with ConcurrentDictionary negative-result cache), NullHistorianDataSource singleton, AlarmAcknowledgeRequest.OperatorUser optional param, AlarmEventArgs.Kind (AlarmTransitionKind), NamespaceKindCompatibility.SystemPlatform removed, AlarmConditionInfo sub-attribute refs, numerous XML-doc backfill. Test count rose to 88 at the prior commit. Re-review scope: verify all prior findings are properly closed, evaluate all new code against the 10 categories, and assess the OpcUaServer-002 cross-module item.

Core.Abstractions-009

Field Value
Severity Low
Category Documentation & comments
Location src/Core/ZB.MOM.WW.OtOpcUa.Core.Abstractions/IHistoryProvider.cs:89-107, src/Core/ZB.MOM.WW.OtOpcUa.Core.Abstractions/Historian/IHistorianDataSource.cs:91-101
Status Resolved

Description (OpcUaServer-002 verdict): The ReadEventsAsync maxEvents <= 0 sentinel contract is present (Core.Abstractions-006 fix), but neither IHistoryProvider.ReadEventsAsync nor IHistorianDataSource.ReadEventsAsync specifies what a compliant implementation MUST do with ContinuationPoint when the backend caps the result. Without this, callers cannot distinguish "all matching events were returned" from "the backend hit its cap and silently truncated." The OpcUaServer review noted that when NumValuesPerNode == 0 ("all events") the server dispatches with maxEvents = 0 and gets back a result whose ContinuationPoint is null even when the backend capped it — causing the server to report GoodNoData / empty with no GoodMoreData response. This is a contract-under-specification issue in the interface definition: it needs to state that when the backend cap truncates, ContinuationPoint MUST be non-null.

Changing the signature is forbidden (public-contract break); the fix is a documentation clarification.

Recommendation: Add a <b>Continuation contract when using the sentinel:</b> note to IHistoryProvider.ReadEventsAsync and update the <param name="maxEvents"> doc in IHistorianDataSource.ReadEventsAsync to state: implementations MUST set ContinuationPoint non-null when the backend cap truncates the result; null means all matching events were returned.

Resolution: Resolved 2026-06-19 — added explicit continuation-contract note to IHistoryProvider.ReadEventsAsync <param name="maxEvents"> and to IHistorianDataSource.ReadEventsAsync <param name="maxEvents"> (both: "when backend cap truncates, MUST set ContinuationPoint non-null; null means all events returned"). Added ReadEventsAsync_sentinel_and_continuation_contract_types_pinned contract test in IHistorianDataSourceContractTests pinning maxEvents type and ContinuationPoint member presence. SHA: (2026-06-19, blank — not committed).

Core.Abstractions-010

Field Value
Severity Low
Category Testing coverage
Location tests/Core/ZB.MOM.WW.OtOpcUa.Core.Abstractions.Tests/PollGroupEngineTests.cs:68,111,128,246,280
Status Resolved

Description: Five Task.Delay(ms) calls in PollGroupEngineTests omit a CancellationToken, triggering xUnit v3 analyser warning xUnit1051 ("Calls to methods which accept CancellationToken should use TestContext.Current.CancellationToken"). These delays cannot be cancelled when a test is cancelled or times out, making those test methods unresponsive to test-runner abort signals.

Recommendation: Pass TestContext.Current.CancellationToken as the second argument to every Task.Delay call inside test methods.

Resolution: Resolved 2026-06-19 — updated all five bare Task.Delay(ms) calls in PollGroupEngineTests (lines 68, 111, 128, 246, 280 at review commit) to Task.Delay(ms, TestContext.Current.CancellationToken). Build warnings eliminated; test count unchanged at 88 (+ 2 new from -009 fix = 90 total). SHA: (2026-06-19, blank — not committed).

Note: EquipmentTagRefResolver null-safety improvement (not a separately-tracked finding)

The TryResolve out-parameter def was typed as out TDef def (non-nullable) but set to null! on the false path — idiomatic but silently unsound in NRT-enabled callers. Added [MaybeNullWhen(false)] attribute and changed null! to default. Pinned by new TryResolve_false_path_sets_def_to_default test in EquipmentTagRefResolverTests.