Files
lmxopcua/code-reviews/Driver.Historian.Wonderware.Client/findings.md
Joseph Doherty 5499b817c8 fix(driver-historian-wonderware-client): resolve High code-review finding (Driver.Historian.Wonderware.Client-001)
WonderwareHistorianClient.ReadAtTimeAsync passed the sidecar's reply.Samples
straight through ToSnapshots, which violated the IHistorianDataSource
contract: the result MUST be the same length and order as the requested
timestampsUtc, with gaps returned as Bad-quality snapshots. If the sidecar
dropped or reordered samples, OPC UA HistoryReadAtTime would silently
misalign values with timestamps.

Add an AlignAtTimeSnapshots helper that indexes the returned samples by
timestamp ticks, builds the result array at timestampsUtc.Count in request
order, and emits a Bad-quality (0x80000000) snapshot for any requested
timestamp the sidecar did not return.

Add the ReadAtTimeAsync_PartialAndReorderedReply_AlignsByTimestamp_AndFillsGapsAsBad
regression test where the fake returns a partial, reordered sample set.

Update code-reviews/Driver.Historian.Wonderware.Client/findings.md: -001
Resolved, open-finding count 10 -> 9.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 06:59:40 -04:00

15 KiB

Code Review — Driver.Historian.Wonderware.Client

Field Value
Module src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware.Client
Reviewer Claude Code
Review date 2026-05-22
Commit reviewed 76d35d1
Status Reviewed
Open findings 9

Checklist coverage

A comprehensive review completes every category, recording "No issues found" where a category produced nothing rather than leaving it blank.

# Category Result
1 Correctness & logic bugs Driver.Historian.Wonderware.Client-001, Driver.Historian.Wonderware.Client-002
2 OtOpcUa conventions No issues found
3 Concurrency & thread safety Driver.Historian.Wonderware.Client-003, Driver.Historian.Wonderware.Client-004
4 Error handling & resilience Driver.Historian.Wonderware.Client-005, Driver.Historian.Wonderware.Client-006
5 Security Driver.Historian.Wonderware.Client-007, Driver.Historian.Wonderware.Client-008
6 Performance & resource management No issues found
7 Design-document adherence No issues found
8 Code organization & conventions No issues found
9 Testing coverage Driver.Historian.Wonderware.Client-009
10 Documentation & comments Driver.Historian.Wonderware.Client-010

Findings

Driver.Historian.Wonderware.Client-001

Field Value
Severity High
Category Correctness & logic bugs
Location WonderwareHistorianClient.cs:98-113
Status Resolved

Description: ReadAtTimeAsync violates the explicit IHistorianDataSource.ReadAtTimeAsync contract. The interface XML doc states: the returned list MUST be the same length and order as timestampsUtc, and gaps are returned as Bad-quality snapshots. The client passes reply.Samples straight through ToSnapshots with no check that the sidecar returned exactly one sample per requested timestamp, nor that the order matches. If the sidecar returns fewer/more samples (e.g. it drops boundary-less timestamps), the OPC UA HistoryReadAtTime service receives a result that the spec-compliant caller expects to index positionally against the request timestamps, silently misaligning values with timestamps. The matching ReadAtTimeAsync_PreservesTimestampOrder test only passes because the fake echoes the request verbatim; it never exercises a short/reordered reply.

Recommendation: After receiving the reply, reconcile reply.Samples against timestampsUtc by timestamp: build the result array at timestampsUtc.Count, fill matched entries, and emit a Bad-quality (0x80000000) snapshot for any requested timestamp the sidecar did not return. Alternatively assert reply.Samples.Length == timestampsUtc.Count and fail loudly. Add a test where the fake returns a partial/reordered sample set.

Resolution: Resolved 2026-05-22 — ReadAtTimeAsync now reconciles the sidecar reply against the requested timestamps via a new AlignAtTimeSnapshots helper: it indexes returned samples by timestamp ticks, builds the result at timestampsUtc.Count in request order, and emits a Bad-quality (0x80000000) snapshot for any requested timestamp the sidecar did not return; added the ReadAtTimeAsync_PartialAndReorderedReply_AlignsByTimestamp_AndFillsGapsAsBad regression test.

Driver.Historian.Wonderware.Client-002

Field Value
Severity Medium
Category Correctness & logic bugs
Location WonderwareHistorianClient.cs:154-199, IAlarmHistorianSink.cs:66-74
Status Open

Description: WriteBatchAsync can never return HistorianWriteOutcome.PermanentFail. HistorianWriteOutcome defines three states (Ack, RetryPlease, PermanentFail) and the drain worker is documented to move the event to the dead-letter table on PermanentFail. The client maps the sidecar WriteAlarmEventsReply.PerEventOk bool array to only Ack/RetryPlease, and the whole-call-failure and catch paths also only emit RetryPlease. A malformed alarm event the sidecar can never persist (unrecoverable SDK error on that specific row) therefore retries forever, blocking the head of the store-and-forward queue and never dead-lettering. The wire contract (WriteAlarmEventsReply) carries no per-event permanent/transient distinction, so the limitation is structural.

Recommendation: Extend the wire contract: replace bool[] PerEventOk with a per-event status enum (Ack/Retry/Permanent), coordinated as an additive change on both sidecar and client per the Contracts.cs versioning rules, so unrecoverable events can be dead-lettered. Until then, document explicitly that this writer never produces PermanentFail and that poison events retry indefinitely.

Resolution: (open)

Driver.Historian.Wonderware.Client-003

Field Value
Severity Low
Category Concurrency & thread safety
Location WonderwareHistorianClient.cs:207, WonderwareHistorianClient.cs:132-150
Status Open

Description: _totalQueries is mutated with Interlocked.Increment in Invoke, but read inside GetHealthSnapshot under _healthLock, and every other counter (_totalSuccesses, _totalFailures, _consecutiveFailures) is mutated only under _healthLock. The two synchronization mechanisms do not compose: an Interlocked increment is not ordered against lock-protected reads, so a snapshot can observe a _totalQueries value inconsistent with the lock-protected counters. The window is small and the counters are advisory, but the mixed model is a latent hazard.

Recommendation: Pick one mechanism. Simplest: move the _totalQueries++ into the _healthLock block (a new RecordQuery() helper, or fold it into RecordSuccess/ RecordFailure) so all six health fields share a single lock.

Resolution: (open)

Driver.Historian.Wonderware.Client-004

Field Value
Severity Low
Category Concurrency & thread safety
Location WonderwareHistorianClient.cs:203-267
Status Open

Description: A sidecar-reported failure is recorded in two non-atomic steps under separate lock acquisitions: Invoke calls RecordSuccess() (line 211) and then the caller calls ThrowIfFailed which calls ReclassifySuccessAsFailure() (line 256), decrementing _totalSuccesses and incrementing _totalFailures. Between those two locked regions a concurrent GetHealthSnapshot can observe a transient state where the operation counts as both a success and not-yet-a-failure (_totalSuccesses inflated, _consecutiveFailures still 0). The undo-a-success/record-a-failure dance is also fragile: if a future change adds an early return or exception between RecordSuccess and ThrowIfFailed, the success is never reversed.

Recommendation: Classify the call once: do not call RecordSuccess until the sidecar-level Success flag has been checked, or pass the reply success/error into a single RecordOutcome(bool transportOk, bool sidecarOk, string? error) that updates all counters under one lock acquisition.

Resolution: (open)

Driver.Historian.Wonderware.Client-005

Field Value
Severity Medium
Category Error handling & resilience
Location Ipc/FrameReader.cs:31-32
Status Open

Description: After reading the 4-byte length prefix, ReadFrameAsync reads the kind byte with the synchronous, blocking _stream.ReadByte() and ignores the CancellationToken. On a NamedPipeClientStream with PipeOptions.Asynchronous, a synchronous ReadByte() blocks the calling thread until a byte arrives or the pipe closes. If the sidecar sends a length prefix and then stalls (slow/hung peer), the call hangs on a thread-pool thread and the EffectiveCallTimeout linked token in PipeChannel.InvokeAsync cannot interrupt it because the timeout only fires between awaits. This defeats the documented cap on a single read/write call once connected and can wedge the single-in-flight call gate.

Recommendation: Read the kind byte asynchronously and cancellably: extend the length prefix read to 5 bytes, or do a second ReadExactAsync(new byte[1], ct). This makes the whole frame read honor the call-timeout token and matches the async style of the rest of the reader.

Resolution: (open)

Driver.Historian.Wonderware.Client-006

Field Value
Severity Low
Category Error handling & resilience
Location Internal/PipeChannel.cs:96-107, WonderwareHistorianClientOptions.cs:11-12
Status Open

Description: PipeChannel.InvokeAsync retries exactly once on transport failure and otherwise propagates. The options expose ReconnectInitialBackoff and ReconnectMaxBackoff and WonderwareHistorianClientOptions documents them as exponential backoff between reconnects, but neither field is referenced anywhere in the module: the single retry reconnects immediately with no delay. A sidecar that is restarting will reject or refuse the immediate reconnect, the call fails, and there is no backoff before the next caller-driven attempt. Either the backoff belongs in the channel and is missing, or the options are dead config that misleads operators.

Recommendation: Either implement the documented exponential backoff in the reconnect path, or remove the two unused option fields and their XML docs and state plainly that retry/backoff is owned by the caller (the alarm drain worker / history router).

Resolution: (open)

Driver.Historian.Wonderware.Client-007

Field Value
Severity Medium
Category Security
Location WonderwareHistorianClient.cs:276
Status Open

Description: ToSnapshots deserializes peer-supplied bytes with MessagePackSerializer.Deserialize<object>(dto.ValueBytes), typeless MessagePack deserialization. The object overload resolves runtime types from the wire payload. The client treats the pipe peer as untrusted elsewhere (16 MiB frame cap stated to protect the receiver from a hostile or buggy peer, shared-secret Hello). Typeless deserialization of bytes that originate from the historian database widens the trust surface. The MessagePack standard resolver is primitive-only by default so the practical blast radius is limited, but this is the pattern called out by the two suppressed MessagePack advisories on this project (see finding 008).

Recommendation: Confirm the serializer options here use the default (non-typeless) resolver and that no TypelessContractlessStandardResolver is in play; if so, document that. Prefer round-tripping the value as a constrained set of known primitive types rather than object, and validate ValueBytes.Length against a sane per-sample cap before deserializing.

Resolution: (open)

Driver.Historian.Wonderware.Client-008

Field Value
Severity Low
Category Security
Location ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware.Client.csproj:29-32
Status Open

Description: The csproj suppresses two NuGet audit advisories (GHSA-37gx-xxp4-5rgx, GHSA-w3x6-4m5h-cxqf) for the MessagePack 2.5.187 dependency with no inline comment recording why the suppression is safe, who reviewed it, or when it should be revisited. Blanket NuGetAuditSuppress entries silence the very signal that would flag the next related CVE. Combined with finding 007 (typeless deserialization), an unexplained MessagePack advisory suppression is a maintainability and audit-trail gap.

Recommendation: Add an XML comment next to each NuGetAuditSuppress stating the advisory title, why it does not apply to this module usage, and a revisit trigger. Track a follow-up to upgrade MessagePack once a patched version is available so the suppressions can be dropped.

Resolution: (open)

Driver.Historian.Wonderware.Client-009

Field Value
Severity Medium
Category Testing coverage
Location tests/Drivers/ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware.Client.Tests/WonderwareHistorianClientTests.cs
Status Open

Description: The suite covers happy paths, server-error, bad-secret, a single reconnect and health counters, but several critical paths are untested: (1) ReadAtTimeAsync with a partial/reordered sidecar reply, the contract-alignment case from finding 001 (the existing test only echoes the request); (2) the WriteBatchAsync catch branch, a transport/deserialization throw during a write, which must return RetryPlease for every event; (3) InvokeAsync second-attempt-also-fails path (the test only proves a successful reconnect, never a reconnect that fails again and propagates); (4) the CallTimeout path, no test asserts that a stalled sidecar produces a timed-out OperationCanceledException; (5) MapAggregate for HistoryAggregateType.Total throwing NotSupportedException; (6) the InvalidDataException path when the sidecar replies with an unexpected MessageKind. The byte-equality / round-trip parity test the Contracts.cs and Framing.cs comments repeatedly promise is not present in this test project.

Recommendation: Add the missing-edge-case tests above. In particular add the wire-parity test the source comments commit to: serialize each DTO with the client copy and assert byte-equality against the sidecar Driver.Historian.Wonderware.Ipc copy, so a silent [Key] drift between the two duplicated contract sets is caught at build time.

Resolution: (open)

Driver.Historian.Wonderware.Client-010

Field Value
Severity Low
Category Documentation & comments
Location WonderwareHistorianClient.cs:355-361, WonderwareHistorianClient.cs:132-150
Status Open

Description: Two doc/behaviour mismatches. (1) The Dispose() XML comment asserts the underlying channel async cleanup is non-blocking so the GetAwaiter()/GetResult() bridge is safe. PipeChannel.DisposeAsync calls ResetTransport(), which invokes synchronous Stream.Dispose() on a NamedPipeClientStream; pipe disposal can block briefly on OS handle teardown. The bridge is safe (no deadlock, no captured context) but not strictly non-blocking; the comment should say "does not deadlock". (2) GetHealthSnapshot populates both ProcessConnectionOpen and EventConnectionOpen from the same _channel.IsConnected, and ActiveProcessNode/ActiveEventNode/Nodes are hard-coded to null/empty. A consumer reading HistorianHealthSnapshot would assume two independent connections and per-node health; this client has a single channel and no node concept. The collapse is reasonable but undocumented.

Recommendation: Reword the Dispose() comment to claim only deadlock-safety. Add a short remark on GetHealthSnapshot explaining that the single-channel client maps both connection flags to one transport and does not track per-node health.

Resolution: (open)