WonderwareHistorianClient.ReadAtTimeAsync passed the sidecar's reply.Samples straight through ToSnapshots, which violated the IHistorianDataSource contract: the result MUST be the same length and order as the requested timestampsUtc, with gaps returned as Bad-quality snapshots. If the sidecar dropped or reordered samples, OPC UA HistoryReadAtTime would silently misalign values with timestamps. Add an AlignAtTimeSnapshots helper that indexes the returned samples by timestamp ticks, builds the result array at timestampsUtc.Count in request order, and emits a Bad-quality (0x80000000) snapshot for any requested timestamp the sidecar did not return. Add the ReadAtTimeAsync_PartialAndReorderedReply_AlignsByTimestamp_AndFillsGapsAsBad regression test where the fake returns a partial, reordered sample set. Update code-reviews/Driver.Historian.Wonderware.Client/findings.md: -001 Resolved, open-finding count 10 -> 9. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
295 lines
15 KiB
Markdown
295 lines
15 KiB
Markdown
# Code Review — Driver.Historian.Wonderware.Client
|
|
|
|
| Field | Value |
|
|
|---|---|
|
|
| Module | `src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware.Client` |
|
|
| Reviewer | Claude Code |
|
|
| Review date | 2026-05-22 |
|
|
| Commit reviewed | `76d35d1` |
|
|
| Status | Reviewed |
|
|
| Open findings | 9 |
|
|
|
|
## Checklist coverage
|
|
|
|
A comprehensive review completes every category, recording "No issues found" where
|
|
a category produced nothing rather than leaving it blank.
|
|
|
|
| # | Category | Result |
|
|
|---|---|---|
|
|
| 1 | Correctness & logic bugs | Driver.Historian.Wonderware.Client-001, Driver.Historian.Wonderware.Client-002 |
|
|
| 2 | OtOpcUa conventions | No issues found |
|
|
| 3 | Concurrency & thread safety | Driver.Historian.Wonderware.Client-003, Driver.Historian.Wonderware.Client-004 |
|
|
| 4 | Error handling & resilience | Driver.Historian.Wonderware.Client-005, Driver.Historian.Wonderware.Client-006 |
|
|
| 5 | Security | Driver.Historian.Wonderware.Client-007, Driver.Historian.Wonderware.Client-008 |
|
|
| 6 | Performance & resource management | No issues found |
|
|
| 7 | Design-document adherence | No issues found |
|
|
| 8 | Code organization & conventions | No issues found |
|
|
| 9 | Testing coverage | Driver.Historian.Wonderware.Client-009 |
|
|
| 10 | Documentation & comments | Driver.Historian.Wonderware.Client-010 |
|
|
|
|
## Findings
|
|
|
|
### Driver.Historian.Wonderware.Client-001
|
|
|
|
| Field | Value |
|
|
|---|---|
|
|
| Severity | High |
|
|
| Category | Correctness & logic bugs |
|
|
| Location | `WonderwareHistorianClient.cs:98-113` |
|
|
| Status | Resolved |
|
|
|
|
**Description:** `ReadAtTimeAsync` violates the explicit `IHistorianDataSource.ReadAtTimeAsync`
|
|
contract. The interface XML doc states: the returned list MUST be the same length and
|
|
order as `timestampsUtc`, and gaps are returned as Bad-quality snapshots. The client passes
|
|
`reply.Samples` straight through `ToSnapshots` with no check that the sidecar returned
|
|
exactly one sample per requested timestamp, nor that the order matches. If the sidecar
|
|
returns fewer/more samples (e.g. it drops boundary-less timestamps), the OPC UA
|
|
HistoryReadAtTime service receives a result that the spec-compliant caller expects to
|
|
index positionally against the request timestamps, silently misaligning values with
|
|
timestamps. The matching `ReadAtTimeAsync_PreservesTimestampOrder` test only passes because
|
|
the fake echoes the request verbatim; it never exercises a short/reordered reply.
|
|
|
|
**Recommendation:** After receiving the reply, reconcile `reply.Samples` against
|
|
`timestampsUtc` by timestamp: build the result array at `timestampsUtc.Count`, fill matched
|
|
entries, and emit a Bad-quality (`0x80000000`) snapshot for any requested timestamp the
|
|
sidecar did not return. Alternatively assert `reply.Samples.Length == timestampsUtc.Count`
|
|
and fail loudly. Add a test where the fake returns a partial/reordered sample set.
|
|
|
|
**Resolution:** Resolved 2026-05-22 — `ReadAtTimeAsync` now reconciles the sidecar reply against the requested timestamps via a new `AlignAtTimeSnapshots` helper: it indexes returned samples by timestamp ticks, builds the result at `timestampsUtc.Count` in request order, and emits a Bad-quality (`0x80000000`) snapshot for any requested timestamp the sidecar did not return; added the `ReadAtTimeAsync_PartialAndReorderedReply_AlignsByTimestamp_AndFillsGapsAsBad` regression test.
|
|
|
|
### Driver.Historian.Wonderware.Client-002
|
|
|
|
| Field | Value |
|
|
|---|---|
|
|
| Severity | Medium |
|
|
| Category | Correctness & logic bugs |
|
|
| Location | `WonderwareHistorianClient.cs:154-199`, `IAlarmHistorianSink.cs:66-74` |
|
|
| Status | Open |
|
|
|
|
**Description:** `WriteBatchAsync` can never return `HistorianWriteOutcome.PermanentFail`.
|
|
`HistorianWriteOutcome` defines three states (`Ack`, `RetryPlease`, `PermanentFail`) and
|
|
the drain worker is documented to move the event to the dead-letter table on
|
|
`PermanentFail`. The client maps the sidecar `WriteAlarmEventsReply.PerEventOk` bool array
|
|
to only `Ack`/`RetryPlease`, and the whole-call-failure and catch paths also only emit
|
|
`RetryPlease`. A malformed alarm event the sidecar can never persist (unrecoverable SDK
|
|
error on that specific row) therefore retries forever, blocking the head of the
|
|
store-and-forward queue and never dead-lettering. The wire contract
|
|
(`WriteAlarmEventsReply`) carries no per-event permanent/transient distinction, so the
|
|
limitation is structural.
|
|
|
|
**Recommendation:** Extend the wire contract: replace `bool[] PerEventOk` with a
|
|
per-event status enum (Ack/Retry/Permanent), coordinated as an additive change on both
|
|
sidecar and client per the Contracts.cs versioning rules, so unrecoverable events can be
|
|
dead-lettered. Until then, document explicitly that this writer never produces
|
|
`PermanentFail` and that poison events retry indefinitely.
|
|
|
|
**Resolution:** _(open)_
|
|
|
|
### Driver.Historian.Wonderware.Client-003
|
|
|
|
| Field | Value |
|
|
|---|---|
|
|
| Severity | Low |
|
|
| Category | Concurrency & thread safety |
|
|
| Location | `WonderwareHistorianClient.cs:207`, `WonderwareHistorianClient.cs:132-150` |
|
|
| Status | Open |
|
|
|
|
**Description:** `_totalQueries` is mutated with `Interlocked.Increment` in `Invoke`, but
|
|
read inside `GetHealthSnapshot` under `_healthLock`, and every other counter
|
|
(`_totalSuccesses`, `_totalFailures`, `_consecutiveFailures`) is mutated only under
|
|
`_healthLock`. The two synchronization mechanisms do not compose: an `Interlocked`
|
|
increment is not ordered against `lock`-protected reads, so a snapshot can observe a
|
|
`_totalQueries` value inconsistent with the lock-protected counters. The window is small
|
|
and the counters are advisory, but the mixed model is a latent hazard.
|
|
|
|
**Recommendation:** Pick one mechanism. Simplest: move the `_totalQueries++` into the
|
|
`_healthLock` block (a new `RecordQuery()` helper, or fold it into `RecordSuccess`/
|
|
`RecordFailure`) so all six health fields share a single lock.
|
|
|
|
**Resolution:** _(open)_
|
|
|
|
### Driver.Historian.Wonderware.Client-004
|
|
|
|
| Field | Value |
|
|
|---|---|
|
|
| Severity | Low |
|
|
| Category | Concurrency & thread safety |
|
|
| Location | `WonderwareHistorianClient.cs:203-267` |
|
|
| Status | Open |
|
|
|
|
**Description:** A sidecar-reported failure is recorded in two non-atomic steps under
|
|
separate lock acquisitions: `Invoke` calls `RecordSuccess()` (line 211) and then the
|
|
caller calls `ThrowIfFailed` which calls `ReclassifySuccessAsFailure()` (line 256),
|
|
decrementing `_totalSuccesses` and incrementing `_totalFailures`. Between those two locked
|
|
regions a concurrent `GetHealthSnapshot` can observe a transient state where the operation
|
|
counts as both a success and not-yet-a-failure (`_totalSuccesses` inflated,
|
|
`_consecutiveFailures` still 0). The undo-a-success/record-a-failure dance is also fragile:
|
|
if a future change adds an early return or exception between `RecordSuccess` and
|
|
`ThrowIfFailed`, the success is never reversed.
|
|
|
|
**Recommendation:** Classify the call once: do not call `RecordSuccess` until the
|
|
sidecar-level `Success` flag has been checked, or pass the reply success/error into a
|
|
single `RecordOutcome(bool transportOk, bool sidecarOk, string? error)` that updates all
|
|
counters under one lock acquisition.
|
|
|
|
**Resolution:** _(open)_
|
|
|
|
### Driver.Historian.Wonderware.Client-005
|
|
|
|
| Field | Value |
|
|
|---|---|
|
|
| Severity | Medium |
|
|
| Category | Error handling & resilience |
|
|
| Location | `Ipc/FrameReader.cs:31-32` |
|
|
| Status | Open |
|
|
|
|
**Description:** After reading the 4-byte length prefix, `ReadFrameAsync` reads the kind
|
|
byte with the synchronous, blocking `_stream.ReadByte()` and ignores the
|
|
`CancellationToken`. On a `NamedPipeClientStream` with `PipeOptions.Asynchronous`, a
|
|
synchronous `ReadByte()` blocks the calling thread until a byte arrives or the pipe
|
|
closes. If the sidecar sends a length prefix and then stalls (slow/hung peer), the call
|
|
hangs on a thread-pool thread and the `EffectiveCallTimeout` linked token in
|
|
`PipeChannel.InvokeAsync` cannot interrupt it because the timeout only fires between
|
|
awaits. This defeats the documented cap on a single read/write call once connected and can
|
|
wedge the single-in-flight call gate.
|
|
|
|
**Recommendation:** Read the kind byte asynchronously and cancellably: extend the length
|
|
prefix read to 5 bytes, or do a second `ReadExactAsync(new byte[1], ct)`. This makes the
|
|
whole frame read honor the call-timeout token and matches the async style of the rest of
|
|
the reader.
|
|
|
|
**Resolution:** _(open)_
|
|
|
|
### Driver.Historian.Wonderware.Client-006
|
|
|
|
| Field | Value |
|
|
|---|---|
|
|
| Severity | Low |
|
|
| Category | Error handling & resilience |
|
|
| Location | `Internal/PipeChannel.cs:96-107`, `WonderwareHistorianClientOptions.cs:11-12` |
|
|
| Status | Open |
|
|
|
|
**Description:** `PipeChannel.InvokeAsync` retries exactly once on transport failure and
|
|
otherwise propagates. The options expose `ReconnectInitialBackoff` and
|
|
`ReconnectMaxBackoff` and `WonderwareHistorianClientOptions` documents them as exponential
|
|
backoff between reconnects, but neither field is referenced anywhere in the module: the
|
|
single retry reconnects immediately with no delay. A sidecar that is restarting will
|
|
reject or refuse the immediate reconnect, the call fails, and there is no backoff before
|
|
the next caller-driven attempt. Either the backoff belongs in the channel and is missing,
|
|
or the options are dead config that misleads operators.
|
|
|
|
**Recommendation:** Either implement the documented exponential backoff in the reconnect
|
|
path, or remove the two unused option fields and their XML docs and state plainly that
|
|
retry/backoff is owned by the caller (the alarm drain worker / history router).
|
|
|
|
**Resolution:** _(open)_
|
|
|
|
### Driver.Historian.Wonderware.Client-007
|
|
|
|
| Field | Value |
|
|
|---|---|
|
|
| Severity | Medium |
|
|
| Category | Security |
|
|
| Location | `WonderwareHistorianClient.cs:276` |
|
|
| Status | Open |
|
|
|
|
**Description:** `ToSnapshots` deserializes peer-supplied bytes with
|
|
`MessagePackSerializer.Deserialize<object>(dto.ValueBytes)`, typeless MessagePack
|
|
deserialization. The `object` overload resolves runtime types from the wire payload. The
|
|
client treats the pipe peer as untrusted elsewhere (16 MiB frame cap stated to protect
|
|
the receiver from a hostile or buggy peer, shared-secret Hello). Typeless deserialization
|
|
of bytes that originate from the historian database widens the trust surface. The
|
|
MessagePack standard resolver is primitive-only by default so the practical blast radius
|
|
is limited, but this is the pattern called out by the two suppressed MessagePack
|
|
advisories on this project (see finding 008).
|
|
|
|
**Recommendation:** Confirm the serializer options here use the default (non-typeless)
|
|
resolver and that no `TypelessContractlessStandardResolver` is in play; if so, document
|
|
that. Prefer round-tripping the value as a constrained set of known primitive types rather
|
|
than `object`, and validate `ValueBytes.Length` against a sane per-sample cap before
|
|
deserializing.
|
|
|
|
**Resolution:** _(open)_
|
|
|
|
### Driver.Historian.Wonderware.Client-008
|
|
|
|
| Field | Value |
|
|
|---|---|
|
|
| Severity | Low |
|
|
| Category | Security |
|
|
| Location | `ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware.Client.csproj:29-32` |
|
|
| Status | Open |
|
|
|
|
**Description:** The csproj suppresses two NuGet audit advisories
|
|
(`GHSA-37gx-xxp4-5rgx`, `GHSA-w3x6-4m5h-cxqf`) for the `MessagePack` 2.5.187 dependency
|
|
with no inline comment recording why the suppression is safe, who reviewed it, or when it
|
|
should be revisited. Blanket `NuGetAuditSuppress` entries silence the very signal that
|
|
would flag the next related CVE. Combined with finding 007 (typeless deserialization), an
|
|
unexplained MessagePack advisory suppression is a maintainability and audit-trail gap.
|
|
|
|
**Recommendation:** Add an XML comment next to each `NuGetAuditSuppress` stating the
|
|
advisory title, why it does not apply to this module usage, and a revisit trigger. Track a
|
|
follow-up to upgrade `MessagePack` once a patched version is available so the suppressions
|
|
can be dropped.
|
|
|
|
**Resolution:** _(open)_
|
|
|
|
### Driver.Historian.Wonderware.Client-009
|
|
|
|
| Field | Value |
|
|
|---|---|
|
|
| Severity | Medium |
|
|
| Category | Testing coverage |
|
|
| Location | `tests/Drivers/ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware.Client.Tests/WonderwareHistorianClientTests.cs` |
|
|
| Status | Open |
|
|
|
|
**Description:** The suite covers happy paths, server-error, bad-secret, a single
|
|
reconnect and health counters, but several critical paths are untested:
|
|
(1) `ReadAtTimeAsync` with a partial/reordered sidecar reply, the contract-alignment case
|
|
from finding 001 (the existing test only echoes the request);
|
|
(2) the `WriteBatchAsync` catch branch, a transport/deserialization throw during a write,
|
|
which must return `RetryPlease` for every event;
|
|
(3) `InvokeAsync` second-attempt-also-fails path (the test only proves a successful
|
|
reconnect, never a reconnect that fails again and propagates);
|
|
(4) the `CallTimeout` path, no test asserts that a stalled sidecar produces a timed-out
|
|
`OperationCanceledException`;
|
|
(5) `MapAggregate` for `HistoryAggregateType.Total` throwing `NotSupportedException`;
|
|
(6) the `InvalidDataException` path when the sidecar replies with an unexpected
|
|
`MessageKind`. The byte-equality / round-trip parity test the Contracts.cs and Framing.cs
|
|
comments repeatedly promise is not present in this test project.
|
|
|
|
**Recommendation:** Add the missing-edge-case tests above. In particular add the
|
|
wire-parity test the source comments commit to: serialize each DTO with the client copy
|
|
and assert byte-equality against the sidecar `Driver.Historian.Wonderware.Ipc` copy, so a
|
|
silent `[Key]` drift between the two duplicated contract sets is caught at build time.
|
|
|
|
**Resolution:** _(open)_
|
|
|
|
### Driver.Historian.Wonderware.Client-010
|
|
|
|
| Field | Value |
|
|
|---|---|
|
|
| Severity | Low |
|
|
| Category | Documentation & comments |
|
|
| Location | `WonderwareHistorianClient.cs:355-361`, `WonderwareHistorianClient.cs:132-150` |
|
|
| Status | Open |
|
|
|
|
**Description:** Two doc/behaviour mismatches.
|
|
(1) The `Dispose()` XML comment asserts the underlying channel async cleanup is
|
|
non-blocking so the `GetAwaiter()/GetResult()` bridge is safe. `PipeChannel.DisposeAsync`
|
|
calls `ResetTransport()`, which invokes synchronous `Stream.Dispose()` on a
|
|
`NamedPipeClientStream`; pipe disposal can block briefly on OS handle teardown. The bridge
|
|
is safe (no deadlock, no captured context) but not strictly non-blocking; the comment
|
|
should say "does not deadlock".
|
|
(2) `GetHealthSnapshot` populates both `ProcessConnectionOpen` and `EventConnectionOpen`
|
|
from the same `_channel.IsConnected`, and `ActiveProcessNode`/`ActiveEventNode`/`Nodes`
|
|
are hard-coded to null/empty. A consumer reading `HistorianHealthSnapshot` would assume
|
|
two independent connections and per-node health; this client has a single channel and no
|
|
node concept. The collapse is reasonable but undocumented.
|
|
|
|
**Recommendation:** Reword the `Dispose()` comment to claim only deadlock-safety. Add a
|
|
short remark on `GetHealthSnapshot` explaining that the single-channel client maps both
|
|
connection flags to one transport and does not track per-node health.
|
|
|
|
**Resolution:** _(open)_
|