Add six previously-missing edge-case tests to WonderwareHistorianClientTests: (2) WriteBatchAsync transport-drop catch path returns RetryPlease for all events; (3) InvokeAsync second-attempt-also-fails propagates the exception; (4) stalled sidecar fires OperationCanceledException within CallTimeout; (5) HistoryAggregateType.Total throws NotSupportedException via ReadProcessedAsync; (6) sidecar wrong-MessageKind reply throws InvalidDataException. Extend FakeSidecarServer with DisconnectBeforeReply, ReplyWithWrongKind, and StallAfterRequest test knobs to support these scenarios. Add ContractsWireParityTests.cs (11 tests) to pin the MessagePack byte layout, round-trip correctness, MessageKind enum values, and Framing constants — catching silent [Key] index drift between the client and sidecar mirror copies without requiring a cross-TFM (net10 vs net48) project reference. Test count grew from 11 to 27; all 27 pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
17 KiB
Code Review — Driver.Historian.Wonderware.Client
| Field | Value |
|---|---|
| Module | src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware.Client |
| Reviewer | Claude Code |
| Review date | 2026-05-22 |
| Commit reviewed | 76d35d1 |
| Status | Reviewed |
| Open findings | 5 |
Checklist coverage
A comprehensive review completes every category, recording "No issues found" where a category produced nothing rather than leaving it blank.
| # | Category | Result |
|---|---|---|
| 1 | Correctness & logic bugs | Driver.Historian.Wonderware.Client-001, Driver.Historian.Wonderware.Client-002 |
| 2 | OtOpcUa conventions | No issues found |
| 3 | Concurrency & thread safety | Driver.Historian.Wonderware.Client-003, Driver.Historian.Wonderware.Client-004 |
| 4 | Error handling & resilience | Driver.Historian.Wonderware.Client-005, Driver.Historian.Wonderware.Client-006 |
| 5 | Security | Driver.Historian.Wonderware.Client-007, Driver.Historian.Wonderware.Client-008 |
| 6 | Performance & resource management | No issues found |
| 7 | Design-document adherence | No issues found |
| 8 | Code organization & conventions | No issues found |
| 9 | Testing coverage | Driver.Historian.Wonderware.Client-009 |
| 10 | Documentation & comments | Driver.Historian.Wonderware.Client-010 |
Findings
Driver.Historian.Wonderware.Client-001
| Field | Value |
|---|---|
| Severity | High |
| Category | Correctness & logic bugs |
| Location | WonderwareHistorianClient.cs:98-113 |
| Status | Resolved |
Description: ReadAtTimeAsync violates the explicit IHistorianDataSource.ReadAtTimeAsync
contract. The interface XML doc states: the returned list MUST be the same length and
order as timestampsUtc, and gaps are returned as Bad-quality snapshots. The client passes
reply.Samples straight through ToSnapshots with no check that the sidecar returned
exactly one sample per requested timestamp, nor that the order matches. If the sidecar
returns fewer/more samples (e.g. it drops boundary-less timestamps), the OPC UA
HistoryReadAtTime service receives a result that the spec-compliant caller expects to
index positionally against the request timestamps, silently misaligning values with
timestamps. The matching ReadAtTimeAsync_PreservesTimestampOrder test only passes because
the fake echoes the request verbatim; it never exercises a short/reordered reply.
Recommendation: After receiving the reply, reconcile reply.Samples against
timestampsUtc by timestamp: build the result array at timestampsUtc.Count, fill matched
entries, and emit a Bad-quality (0x80000000) snapshot for any requested timestamp the
sidecar did not return. Alternatively assert reply.Samples.Length == timestampsUtc.Count
and fail loudly. Add a test where the fake returns a partial/reordered sample set.
Resolution: Resolved 2026-05-22 — ReadAtTimeAsync now reconciles the sidecar reply against the requested timestamps via a new AlignAtTimeSnapshots helper: it indexes returned samples by timestamp ticks, builds the result at timestampsUtc.Count in request order, and emits a Bad-quality (0x80000000) snapshot for any requested timestamp the sidecar did not return; added the ReadAtTimeAsync_PartialAndReorderedReply_AlignsByTimestamp_AndFillsGapsAsBad regression test.
Driver.Historian.Wonderware.Client-002
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Correctness & logic bugs |
| Location | WonderwareHistorianClient.cs:154-199, IAlarmHistorianSink.cs:66-74 |
| Status | Resolved |
Description: WriteBatchAsync can never return HistorianWriteOutcome.PermanentFail.
HistorianWriteOutcome defines three states (Ack, RetryPlease, PermanentFail) and
the drain worker is documented to move the event to the dead-letter table on
PermanentFail. The client maps the sidecar WriteAlarmEventsReply.PerEventOk bool array
to only Ack/RetryPlease, and the whole-call-failure and catch paths also only emit
RetryPlease. A malformed alarm event the sidecar can never persist (unrecoverable SDK
error on that specific row) therefore retries forever, blocking the head of the
store-and-forward queue and never dead-lettering. The wire contract
(WriteAlarmEventsReply) carries no per-event permanent/transient distinction, so the
limitation is structural.
Recommendation: Extend the wire contract: replace bool[] PerEventOk with a
per-event status enum (Ack/Retry/Permanent), coordinated as an additive change on both
sidecar and client per the Contracts.cs versioning rules, so unrecoverable events can be
dead-lettered. Until then, document explicitly that this writer never produces
PermanentFail and that poison events retry indefinitely.
Resolution: Resolved 2026-05-22 — extending the wire contract (replacing bool[] PerEventOk with a per-event status enum) requires a coordinated change to the .NET 4.8 sidecar; instead, added a <remarks> XML doc block on WriteBatchAsync explicitly stating that PermanentFail is never returned, that poison events retry indefinitely until the drain worker's own retry-count limit fires, and that the protocol extension is a tracked follow-up; also added inline // NOTE comments in both the success and catch paths.
Driver.Historian.Wonderware.Client-003
| Field | Value |
|---|---|
| Severity | Low |
| Category | Concurrency & thread safety |
| Location | WonderwareHistorianClient.cs:207, WonderwareHistorianClient.cs:132-150 |
| Status | Open |
Description: _totalQueries is mutated with Interlocked.Increment in Invoke, but
read inside GetHealthSnapshot under _healthLock, and every other counter
(_totalSuccesses, _totalFailures, _consecutiveFailures) is mutated only under
_healthLock. The two synchronization mechanisms do not compose: an Interlocked
increment is not ordered against lock-protected reads, so a snapshot can observe a
_totalQueries value inconsistent with the lock-protected counters. The window is small
and the counters are advisory, but the mixed model is a latent hazard.
Recommendation: Pick one mechanism. Simplest: move the _totalQueries++ into the
_healthLock block (a new RecordQuery() helper, or fold it into RecordSuccess/
RecordFailure) so all six health fields share a single lock.
Resolution: (open)
Driver.Historian.Wonderware.Client-004
| Field | Value |
|---|---|
| Severity | Low |
| Category | Concurrency & thread safety |
| Location | WonderwareHistorianClient.cs:203-267 |
| Status | Open |
Description: A sidecar-reported failure is recorded in two non-atomic steps under
separate lock acquisitions: Invoke calls RecordSuccess() (line 211) and then the
caller calls ThrowIfFailed which calls ReclassifySuccessAsFailure() (line 256),
decrementing _totalSuccesses and incrementing _totalFailures. Between those two locked
regions a concurrent GetHealthSnapshot can observe a transient state where the operation
counts as both a success and not-yet-a-failure (_totalSuccesses inflated,
_consecutiveFailures still 0). The undo-a-success/record-a-failure dance is also fragile:
if a future change adds an early return or exception between RecordSuccess and
ThrowIfFailed, the success is never reversed.
Recommendation: Classify the call once: do not call RecordSuccess until the
sidecar-level Success flag has been checked, or pass the reply success/error into a
single RecordOutcome(bool transportOk, bool sidecarOk, string? error) that updates all
counters under one lock acquisition.
Resolution: (open)
Driver.Historian.Wonderware.Client-005
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Error handling & resilience |
| Location | Ipc/FrameReader.cs:31-32 |
| Status | Resolved |
Description: After reading the 4-byte length prefix, ReadFrameAsync reads the kind
byte with the synchronous, blocking _stream.ReadByte() and ignores the
CancellationToken. On a NamedPipeClientStream with PipeOptions.Asynchronous, a
synchronous ReadByte() blocks the calling thread until a byte arrives or the pipe
closes. If the sidecar sends a length prefix and then stalls (slow/hung peer), the call
hangs on a thread-pool thread and the EffectiveCallTimeout linked token in
PipeChannel.InvokeAsync cannot interrupt it because the timeout only fires between
awaits. This defeats the documented cap on a single read/write call once connected and can
wedge the single-in-flight call gate.
Recommendation: Read the kind byte asynchronously and cancellably: extend the length
prefix read to 5 bytes, or do a second ReadExactAsync(new byte[1], ct). This makes the
whole frame read honor the call-timeout token and matches the async style of the rest of
the reader.
Resolution: Resolved 2026-05-22 — replaced the synchronous, non-cancellable _stream.ReadByte() for the kind byte with an async ReadExactAsync(new byte[1], ct) call so the full frame read honours the call-timeout token and cannot wedge the channel on a stalled peer.
Driver.Historian.Wonderware.Client-006
| Field | Value |
|---|---|
| Severity | Low |
| Category | Error handling & resilience |
| Location | Internal/PipeChannel.cs:96-107, WonderwareHistorianClientOptions.cs:11-12 |
| Status | Open |
Description: PipeChannel.InvokeAsync retries exactly once on transport failure and
otherwise propagates. The options expose ReconnectInitialBackoff and
ReconnectMaxBackoff and WonderwareHistorianClientOptions documents them as exponential
backoff between reconnects, but neither field is referenced anywhere in the module: the
single retry reconnects immediately with no delay. A sidecar that is restarting will
reject or refuse the immediate reconnect, the call fails, and there is no backoff before
the next caller-driven attempt. Either the backoff belongs in the channel and is missing,
or the options are dead config that misleads operators.
Recommendation: Either implement the documented exponential backoff in the reconnect path, or remove the two unused option fields and their XML docs and state plainly that retry/backoff is owned by the caller (the alarm drain worker / history router).
Resolution: (open)
Driver.Historian.Wonderware.Client-007
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Security |
| Location | WonderwareHistorianClient.cs:276 |
| Status | Resolved |
Description: ToSnapshots deserializes peer-supplied bytes with
MessagePackSerializer.Deserialize<object>(dto.ValueBytes), typeless MessagePack
deserialization. The object overload resolves runtime types from the wire payload. The
client treats the pipe peer as untrusted elsewhere (16 MiB frame cap stated to protect
the receiver from a hostile or buggy peer, shared-secret Hello). Typeless deserialization
of bytes that originate from the historian database widens the trust surface. The
MessagePack standard resolver is primitive-only by default so the practical blast radius
is limited, but this is the pattern called out by the two suppressed MessagePack
advisories on this project (see finding 008).
Recommendation: Confirm the serializer options here use the default (non-typeless)
resolver and that no TypelessContractlessStandardResolver is in play; if so, document
that. Prefer round-tripping the value as a constrained set of known primitive types rather
than object, and validate ValueBytes.Length against a sane per-sample cap before
deserializing.
Resolution: Resolved 2026-05-22 — added DeserializeSampleValue() helper that enforces a 64 KiB per-sample ValueBytes cap before deserialization and documents that the default StandardResolver (primitive-only, no TypelessContractlessStandardResolver) is in use; both ToSnapshots and AlignAtTimeSnapshots now route through the helper; added inline XML comments to the two NuGetAuditSuppress entries in the csproj stating the advisory title, why it does not apply to this usage, and the revisit trigger.
Driver.Historian.Wonderware.Client-008
| Field | Value |
|---|---|
| Severity | Low |
| Category | Security |
| Location | ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware.Client.csproj:29-32 |
| Status | Open |
Description: The csproj suppresses two NuGet audit advisories
(GHSA-37gx-xxp4-5rgx, GHSA-w3x6-4m5h-cxqf) for the MessagePack 2.5.187 dependency
with no inline comment recording why the suppression is safe, who reviewed it, or when it
should be revisited. Blanket NuGetAuditSuppress entries silence the very signal that
would flag the next related CVE. Combined with finding 007 (typeless deserialization), an
unexplained MessagePack advisory suppression is a maintainability and audit-trail gap.
Recommendation: Add an XML comment next to each NuGetAuditSuppress stating the
advisory title, why it does not apply to this module usage, and a revisit trigger. Track a
follow-up to upgrade MessagePack once a patched version is available so the suppressions
can be dropped.
Resolution: (open)
Driver.Historian.Wonderware.Client-009
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Testing coverage |
| Location | tests/Drivers/ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware.Client.Tests/WonderwareHistorianClientTests.cs |
| Status | Resolved |
Description: The suite covers happy paths, server-error, bad-secret, a single
reconnect and health counters, but several critical paths are untested:
(1) ReadAtTimeAsync with a partial/reordered sidecar reply, the contract-alignment case
from finding 001 (the existing test only echoes the request);
(2) the WriteBatchAsync catch branch, a transport/deserialization throw during a write,
which must return RetryPlease for every event;
(3) InvokeAsync second-attempt-also-fails path (the test only proves a successful
reconnect, never a reconnect that fails again and propagates);
(4) the CallTimeout path, no test asserts that a stalled sidecar produces a timed-out
OperationCanceledException;
(5) MapAggregate for HistoryAggregateType.Total throwing NotSupportedException;
(6) the InvalidDataException path when the sidecar replies with an unexpected
MessageKind. The byte-equality / round-trip parity test the Contracts.cs and Framing.cs
comments repeatedly promise is not present in this test project.
Recommendation: Add the missing-edge-case tests above. In particular add the
wire-parity test the source comments commit to: serialize each DTO with the client copy
and assert byte-equality against the sidecar Driver.Historian.Wonderware.Ipc copy, so a
silent [Key] drift between the two duplicated contract sets is caught at build time.
Resolution: Resolved 2026-05-22 — added six missing tests to WonderwareHistorianClientTests.cs (WriteBatchAsync transport-drop catch path returns RetryPlease; InvokeAsync both-attempts-fail propagates exception; stalled sidecar fires OperationCanceledException within CallTimeout; ReadProcessedAsync Total aggregate throws NotSupportedException; sidecar wrong-kind reply throws InvalidDataException) and extended FakeSidecarServer with DisconnectBeforeReply, ReplyWithWrongKind, and StallAfterRequest test knobs; added new ContractsWireParityTests.cs with 11 tests pinning MessagePack byte layout, round-trip correctness, MessageKind enum values, and Framing constants to catch silent [Key] index drift between the client and sidecar mirror copies. Total test count grew from 11 to 27, all passing.
Driver.Historian.Wonderware.Client-010
| Field | Value |
|---|---|
| Severity | Low |
| Category | Documentation & comments |
| Location | WonderwareHistorianClient.cs:355-361, WonderwareHistorianClient.cs:132-150 |
| Status | Open |
Description: Two doc/behaviour mismatches.
(1) The Dispose() XML comment asserts the underlying channel async cleanup is
non-blocking so the GetAwaiter()/GetResult() bridge is safe. PipeChannel.DisposeAsync
calls ResetTransport(), which invokes synchronous Stream.Dispose() on a
NamedPipeClientStream; pipe disposal can block briefly on OS handle teardown. The bridge
is safe (no deadlock, no captured context) but not strictly non-blocking; the comment
should say "does not deadlock".
(2) GetHealthSnapshot populates both ProcessConnectionOpen and EventConnectionOpen
from the same _channel.IsConnected, and ActiveProcessNode/ActiveEventNode/Nodes
are hard-coded to null/empty. A consumer reading HistorianHealthSnapshot would assume
two independent connections and per-node health; this client has a single channel and no
node concept. The collapse is reasonable but undocumented.
Recommendation: Reword the Dispose() comment to claim only deadlock-safety. Add a
short remark on GetHealthSnapshot explaining that the single-channel client maps both
connection flags to one transport and does not track per-node health.
Resolution: (open)