fix(driver-historian-wonderware-client): resolve Low code-review findings (Driver.Historian.Wonderware.Client-003,004,006,008,010)

- Driver.Historian.Wonderware.Client-003: replaced the mixed Interlocked
  + healthLock counters with RecordOutcome that touches _totalQueries
  and exactly one of _totalSuccesses / _totalFailures under one
  acquisition.
- Driver.Historian.Wonderware.Client-004: InvokeAndClassifyAsync routes
  transport + sidecar classification through a single RecordOutcome
  call; the legacy ReclassifySuccessAsFailure two-step is gone.
- Driver.Historian.Wonderware.Client-006: removed the dead
  ReconnectInitialBackoff / ReconnectMaxBackoff options and added a
  doc <remarks> stating the channel performs a single in-place
  reconnect; retry/backoff stays with the caller.
- Driver.Historian.Wonderware.Client-008: the audit-suppression comment
  block now records advisory titles, why neither applies, and the
  revisit trigger.
- Driver.Historian.Wonderware.Client-010: reworded Dispose() to claim
  deadlock-safety and added a GetHealthSnapshot summary documenting the
  single-channel collapse + counter invariant.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Joseph Doherty
2026-05-23 11:12:16 -04:00
parent 3ca569f621
commit 879925180b
4 changed files with 206 additions and 71 deletions


@@ -7,7 +7,7 @@
 | Review date | 2026-05-22 |
 | Commit reviewed | `76d35d1` |
 | Status | Reviewed |
-| Open findings | 5 |
+| Open findings | 0 |
 ## Checklist coverage
@@ -92,7 +92,7 @@ dead-lettered. Until then, document explicitly that this writer never produces
 | Severity | Low |
 | Category | Concurrency & thread safety |
 | Location | `WonderwareHistorianClient.cs:207`, `WonderwareHistorianClient.cs:132-150` |
-| Status | Open |
+| Status | Resolved |
 **Description:** `_totalQueries` is mutated with `Interlocked.Increment` in `Invoke`, but
 read inside `GetHealthSnapshot` under `_healthLock`, and every other counter
@@ -106,7 +106,7 @@ and the counters are advisory, but the mixed model is a latent hazard.
 `_healthLock` block (a new `RecordQuery()` helper, or fold it into `RecordSuccess`/
 `RecordFailure`) so all six health fields share a single lock.
-**Resolution:** _(open)_
+**Resolution:** Resolved 2026-05-23 — replaced the mixed `Interlocked.Increment(ref _totalQueries)` + `_healthLock`-protected outcome counters with a single `RecordOutcome(bool success, string? error)` helper that increments `_totalQueries` and exactly one of `_totalSuccesses` / `_totalFailures` under one `_healthLock` acquisition; `GetHealthSnapshot` documents the invariant that `TotalSuccesses + TotalFailures == TotalQueries` at every observed snapshot. Added the regression test `GetHealthSnapshot_ConcurrentCallsAndReads_CountersAreInternallyConsistent`, which runs a polling reader concurrently with 50 sequential calls and asserts that the invariant never breaks (fails red against the previous code, passes green now).
 ### Driver.Historian.Wonderware.Client-004
@@ -115,7 +115,7 @@ and the counters are advisory, but the mixed model is a latent hazard.
 | Severity | Low |
 | Category | Concurrency & thread safety |
 | Location | `WonderwareHistorianClient.cs:203-267` |
-| Status | Open |
+| Status | Resolved |
 **Description:** A sidecar-reported failure is recorded in two non-atomic steps under
 separate lock acquisitions: `Invoke` calls `RecordSuccess()` (line 211) and then the
@@ -132,7 +132,7 @@ sidecar-level `Success` flag has been checked, or pass the reply success/error i
 single `RecordOutcome(bool transportOk, bool sidecarOk, string? error)` that updates all
 counters under one lock acquisition.
-**Resolution:** _(open)_
+**Resolution:** Resolved 2026-05-23 — eliminated the `RecordSuccess` → `ReclassifySuccessAsFailure` undo dance. `InvokeAsync` now takes a `Func<TReply, (bool ok, string? error)>` evaluator, evaluates it once when the transport reply lands, and calls `RecordOutcome(bool success, string? error)` exactly once per call under a single `_healthLock` acquisition. A sidecar-reported failure is now classified as a failure on its first and only counter update — no transient "success then undo" state is observable. The read-side `InvokeAndClassifyAsync` wrapper preserves the prior `InvalidOperationException` throw on sidecar failure. Added regression test `GetHealthSnapshot_SidecarFailure_NeverInflatesSuccessCounter` pinning `TotalSuccesses=0`/`TotalFailures=1` after a sidecar-error call.
 ### Driver.Historian.Wonderware.Client-005
@@ -167,7 +167,7 @@ the reader.
 | Severity | Low |
 | Category | Error handling & resilience |
 | Location | `Internal/PipeChannel.cs:96-107`, `WonderwareHistorianClientOptions.cs:11-12` |
-| Status | Open |
+| Status | Resolved |
 **Description:** `PipeChannel.InvokeAsync` retries exactly once on transport failure and
 otherwise propagates. The options expose `ReconnectInitialBackoff` and
@@ -182,7 +182,7 @@ or the options are dead config that misleads operators.
 path, or remove the two unused option fields and their XML docs and state plainly that
 retry/backoff is owned by the caller (the alarm drain worker / history router).
-**Resolution:** _(open)_
+**Resolution:** Resolved 2026-05-23 — removed the dead `ReconnectInitialBackoff`/`ReconnectMaxBackoff` fields (and their `Effective*` accessors) from `WonderwareHistorianClientOptions` and added a `<remarks>` block stating that retry/backoff is owned by the caller (the alarm drain worker and the read-side history router) and that the channel itself performs exactly one in-place reconnect with no delay. Confirmed that no consumer referenced the removed fields (only `code-reviews/` references remain). The solution-level build is clean, and Server picks up the new options shape without changes.
 ### Driver.Historian.Wonderware.Client-007
@@ -218,7 +218,7 @@ deserializing.
 | Severity | Low |
 | Category | Security |
 | Location | `ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware.Client.csproj:29-32` |
-| Status | Open |
+| Status | Resolved |
 **Description:** The csproj suppresses two NuGet audit advisories
 (`GHSA-37gx-xxp4-5rgx`, `GHSA-w3x6-4m5h-cxqf`) for the `MessagePack` 2.5.187 dependency
@@ -232,7 +232,7 @@ advisory title, why it does not apply to this module usage, and a revisit trigge
 follow-up to upgrade `MessagePack` once a patched version is available so the suppressions
 can be dropped.
-**Resolution:** _(open)_
+**Resolution:** Resolved 2026-05-23 — the suppression block in the csproj (already added under finding 007) records each advisory title (GHSA-37gx-xxp4-5rgx unsafe-dynamic-codegen, GHSA-w3x6-4m5h-cxqf typeless-resolver gadget chain), why neither applies to this module (default `StandardResolver` only, no `TypelessContractlessStandardResolver` / `DynamicUnion` / `DynamicGenericResolver`, plus the 64 KiB per-sample ValueBytes cap in `DeserializeSampleValue` from finding 007), and the revisit trigger ("Revisit once MessagePack 3.x is available and drop these suppressions at that time"). All three pieces the recommendation asked for are present; the single comment block above both `NuGetAuditSuppress` entries was confirmed to close the audit-trail gap.
 ### Driver.Historian.Wonderware.Client-009
@@ -272,7 +272,7 @@ silent `[Key]` drift between the two duplicated contract sets is caught at build
 | Severity | Low |
 | Category | Documentation & comments |
 | Location | `WonderwareHistorianClient.cs:355-361`, `WonderwareHistorianClient.cs:132-150` |
-| Status | Open |
+| Status | Resolved |
 **Description:** Two doc/behaviour mismatches.
 (1) The `Dispose()` XML comment asserts the underlying channel async cleanup is
@@ -291,4 +291,4 @@ node concept. The collapse is reasonable but undocumented.
 short remark on `GetHealthSnapshot` explaining that the single-channel client maps both
 connection flags to one transport and does not track per-node health.
-**Resolution:** _(open)_
+**Resolution:** Resolved 2026-05-23 — (1) reworded the `Dispose()` XML comment to drop the "non-blocking" claim and instead state that the bridge is **deadlock-safe** because the cleanup never awaits a captured `SynchronizationContext` and never takes a lock the caller could hold, while acknowledging that `NamedPipeClientStream` teardown can block briefly on OS handle release. (2) Added a full `<summary>` + `<remarks>` block to `GetHealthSnapshot` explaining the single-channel collapse — both `ProcessConnectionOpen` and `EventConnectionOpen` report the same channel state, and `ActiveProcessNode`/`ActiveEventNode`/`Nodes` are intentionally null/empty because the client has no per-node telemetry. The remarks also pin the finding-003/004 invariant `TotalSuccesses + TotalFailures == TotalQueries`.
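
The single-lock pattern behind findings 003 and 004 can be sketched in isolation. This is a minimal, hypothetical stand-in, not the shipped `WonderwareHistorianClient` code: plain locals and local functions play the role of `_healthLock`, the counter fields, `RecordOutcome`, and `GetHealthSnapshot`. Unlike the regression test, which issues its calls one at a time, the writers here run genuinely in parallel. Because every counter is both written and read under the same lock, the polling reader should never observe `successes + failures != queries`.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-ins for the client's _healthLock and counter fields.
object healthLock = new();
long totalQueries = 0, totalSuccesses = 0, totalFailures = 0;
string? lastError = null;

// One completed call: bump totalQueries and exactly one outcome counter
// inside a single lock acquisition (the RecordOutcome shape).
void RecordOutcome(bool success, string? error)
{
    lock (healthLock)
    {
        totalQueries++;
        if (success)
        {
            totalSuccesses++;
        }
        else
        {
            totalFailures++;
            lastError = error;
        }
    }
}

// Reads every counter under the same lock, so a snapshot is never torn.
(long Queries, long Successes, long Failures, string? LastError) Snapshot()
{
    lock (healthLock)
    {
        return (totalQueries, totalSuccesses, totalFailures, lastError);
    }
}

var sawInconsistent = false;
using var stop = new CancellationTokenSource();

// Reader: polls until the writers finish and checks the invariant on every snapshot.
var reader = Task.Run(() =>
{
    while (!stop.IsCancellationRequested)
    {
        var s = Snapshot();
        if (s.Successes + s.Failures != s.Queries)
        {
            sawInconsistent = true;
        }
    }
});

// Writers: 1,000 concurrent calls; every call with i % 3 == 0 is recorded as a failure.
Parallel.For(0, 1000, i => RecordOutcome(success: i % 3 != 0, error: i % 3 == 0 ? "boom" : null));

stop.Cancel();
await reader;

var final = Snapshot();
Console.WriteLine($"queries={final.Queries} successes={final.Successes} failures={final.Failures} torn={sawInconsistent}");
```

In the pre-fix shape, `totalQueries` was bumped with `Interlocked.Increment` outside the lock. That gap is exactly what lets a reader see a query without its outcome.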


@@ -72,8 +72,9 @@ public sealed class WonderwareHistorianClient : IHistorianDataSource, IAlarmHist
             MaxValues = (int)Math.Min(maxValuesPerNode, int.MaxValue),
             CorrelationId = Guid.NewGuid().ToString("N"),
         };
-        var reply = await Invoke<ReadRawRequest, ReadRawReply>(MessageKind.ReadRawRequest, MessageKind.ReadRawReply, req, cancellationToken).ConfigureAwait(false);
-        ThrowIfFailed(reply.Success, reply.Error, "ReadRaw");
+        var reply = await InvokeAndClassifyAsync<ReadRawRequest, ReadRawReply>(
+            MessageKind.ReadRawRequest, MessageKind.ReadRawReply, req,
+            r => (r.Success, r.Error), "ReadRaw", cancellationToken).ConfigureAwait(false);
         return new HistoryReadResult(ToSnapshots(reply.Samples), ContinuationPoint: null);
     }
@@ -90,8 +91,9 @@ public sealed class WonderwareHistorianClient : IHistorianDataSource, IAlarmHist
             AggregateColumn = MapAggregate(aggregate),
             CorrelationId = Guid.NewGuid().ToString("N"),
         };
-        var reply = await Invoke<ReadProcessedRequest, ReadProcessedReply>(MessageKind.ReadProcessedRequest, MessageKind.ReadProcessedReply, req, cancellationToken).ConfigureAwait(false);
-        ThrowIfFailed(reply.Success, reply.Error, "ReadProcessed");
+        var reply = await InvokeAndClassifyAsync<ReadProcessedRequest, ReadProcessedReply>(
+            MessageKind.ReadProcessedRequest, MessageKind.ReadProcessedReply, req,
+            r => (r.Success, r.Error), "ReadProcessed", cancellationToken).ConfigureAwait(false);
         return new HistoryReadResult(ToAggregateSnapshots(reply.Buckets), ContinuationPoint: null);
     }
@@ -107,8 +109,9 @@ public sealed class WonderwareHistorianClient : IHistorianDataSource, IAlarmHist
             TimestampsUtcTicks = ticks,
             CorrelationId = Guid.NewGuid().ToString("N"),
         };
-        var reply = await Invoke<ReadAtTimeRequest, ReadAtTimeReply>(MessageKind.ReadAtTimeRequest, MessageKind.ReadAtTimeReply, req, cancellationToken).ConfigureAwait(false);
-        ThrowIfFailed(reply.Success, reply.Error, "ReadAtTime");
+        var reply = await InvokeAndClassifyAsync<ReadAtTimeRequest, ReadAtTimeReply>(
+            MessageKind.ReadAtTimeRequest, MessageKind.ReadAtTimeReply, req,
+            r => (r.Success, r.Error), "ReadAtTime", cancellationToken).ConfigureAwait(false);
         return new HistoryReadResult(AlignAtTimeSnapshots(timestampsUtc, reply.Samples), ContinuationPoint: null);
     }
@@ -167,11 +170,34 @@ public sealed class WonderwareHistorianClient : IHistorianDataSource, IAlarmHist
             MaxEvents = maxEvents,
             CorrelationId = Guid.NewGuid().ToString("N"),
         };
-        var reply = await Invoke<ReadEventsRequest, ReadEventsReply>(MessageKind.ReadEventsRequest, MessageKind.ReadEventsReply, req, cancellationToken).ConfigureAwait(false);
-        ThrowIfFailed(reply.Success, reply.Error, "ReadEvents");
+        var reply = await InvokeAndClassifyAsync<ReadEventsRequest, ReadEventsReply>(
+            MessageKind.ReadEventsRequest, MessageKind.ReadEventsReply, req,
+            r => (r.Success, r.Error), "ReadEvents", cancellationToken).ConfigureAwait(false);
         return new HistoricalEventsResult(ToHistoricalEvents(reply.Events), ContinuationPoint: null);
     }
+    /// <summary>
+    /// Returns a snapshot of operation counters and the single pipe channel's connection
+    /// state.
+    /// </summary>
+    /// <remarks>
+    /// This client owns one duplex named-pipe channel to the sidecar — it has no notion of
+    /// separate process / event connections and no per-node telemetry. The single channel's
+    /// connected state is reported for both <see cref="HistorianHealthSnapshot.ProcessConnectionOpen"/>
+    /// and <see cref="HistorianHealthSnapshot.EventConnectionOpen"/>, and
+    /// <see cref="HistorianHealthSnapshot.ActiveProcessNode"/> /
+    /// <see cref="HistorianHealthSnapshot.ActiveEventNode"/> /
+    /// <see cref="HistorianHealthSnapshot.Nodes"/> are intentionally null/empty. Consumers
+    /// that need to distinguish the two connections should use a driver that tracks them
+    /// separately. (Finding 010.)
+    /// <para>
+    /// All seven health fields (TotalQueries, TotalSuccesses, TotalFailures,
+    /// ConsecutiveFailures, LastSuccessTime, LastFailureTime, LastError) are mutated
+    /// exclusively under <c>_healthLock</c>, so the snapshot is internally consistent —
+    /// in particular <c>TotalSuccesses + TotalFailures == TotalQueries</c> at every
+    /// observed snapshot (a call that has started but not yet completed has not
+    /// incremented any counter). (Finding 003 / 004.)
+    /// </para>
+    /// </remarks>
     public HistorianHealthSnapshot GetHealthSnapshot()
     {
         lock (_healthLock)
@@ -233,8 +259,9 @@ public sealed class WonderwareHistorianClient : IHistorianDataSource, IAlarmHist
         try
         {
-            var reply = await Invoke<WriteAlarmEventsRequest, WriteAlarmEventsReply>(
-                MessageKind.WriteAlarmEventsRequest, MessageKind.WriteAlarmEventsReply, req, cancellationToken).ConfigureAwait(false);
+            var reply = await InvokeAsync<WriteAlarmEventsRequest, WriteAlarmEventsReply>(
+                MessageKind.WriteAlarmEventsRequest, MessageKind.WriteAlarmEventsReply, req,
+                r => (r.Success, r.Error), cancellationToken).ConfigureAwait(false);
             // Whole-call failure → transient retry for every event in the batch.
             if (!reply.Success)
@@ -279,70 +306,80 @@ public sealed class WonderwareHistorianClient : IHistorianDataSource, IAlarmHist
     // ===== Helpers =====
-    private async Task<TReply> Invoke<TRequest, TReply>(
-        MessageKind requestKind, MessageKind expectedReplyKind, TRequest request, CancellationToken ct)
+    /// <summary>
+    /// Sends one request through the channel and records its outcome with exactly one
+    /// <see cref="RecordOutcome"/> call, which also bumps <c>_totalQueries</c> under the same
+    /// <c>_healthLock</c> acquisition. A transport exception is recorded as a failure;
+    /// otherwise <paramref name="evaluate"/> decides from the reply whether the sidecar
+    /// reported success. Throwing on a sidecar-reported failure is left to
+    /// <see cref="InvokeAndClassifyAsync"/>. (Finding 003 / 004: all seven health fields
+    /// share one synchronization mechanism, so a snapshot can never observe a torn state.)
+    /// </summary>
+    private async Task<TReply> InvokeAsync<TRequest, TReply>(
+        MessageKind requestKind, MessageKind expectedReplyKind, TRequest request,
+        Func<TReply, (bool ok, string? error)> evaluate, CancellationToken ct)
         where TReply : class
     {
-        Interlocked.Increment(ref _totalQueries);
         try
         {
             var reply = await _channel.InvokeAsync<TRequest, TReply>(requestKind, expectedReplyKind, request, ct).ConfigureAwait(false);
-            RecordSuccess();
+            // Classify transport+sidecar in one lock so TotalQueries/TotalSuccesses/
+            // TotalFailures move together and no intermediate "success-then-undo" state is
+            // visible to a concurrent GetHealthSnapshot.
+            var (ok, error) = evaluate(reply);
+            RecordOutcome(ok, error);
             return reply;
         }
         catch (Exception ex)
         {
-            RecordFailure(ex.Message);
+            RecordOutcome(success: false, ex.Message);
             throw;
         }
     }
-    private void RecordSuccess()
+    /// <summary>
+    /// Convenience wrapper around <see cref="InvokeAsync"/> that throws
+    /// <see cref="InvalidOperationException"/> on a sidecar-reported failure. Used by the
+    /// <see cref="IHistorianDataSource"/> read methods.
+    /// </summary>
+    private async Task<TReply> InvokeAndClassifyAsync<TRequest, TReply>(
+        MessageKind requestKind, MessageKind expectedReplyKind, TRequest request,
+        Func<TReply, (bool ok, string? error)> evaluate, string op, CancellationToken ct)
+        where TReply : class
+    {
+        var reply = await InvokeAsync<TRequest, TReply>(requestKind, expectedReplyKind, request, evaluate, ct).ConfigureAwait(false);
+        var (ok, error) = evaluate(reply);
+        if (!ok)
+        {
+            throw new InvalidOperationException(
+                $"Sidecar {op} failed: {error ?? "<no message>"}.");
+        }
+        return reply;
+    }
+    /// <summary>
+    /// Records the outcome of a single call — increments <c>_totalQueries</c> and exactly
+    /// one of <c>_totalSuccesses</c> / <c>_totalFailures</c> under a single
+    /// <c>_healthLock</c> acquisition. (Findings 003 + 004.)
+    /// </summary>
+    private void RecordOutcome(bool success, string? error)
     {
         lock (_healthLock)
+        {
+            _totalQueries++;
+            if (success)
             {
                 _totalSuccesses++;
                 _consecutiveFailures = 0;
                 _lastSuccessUtc = DateTime.UtcNow;
             }
-    }
-    private void RecordFailure(string message)
-    {
-        lock (_healthLock)
+            else
             {
                 _totalFailures++;
                 _consecutiveFailures++;
                 _lastFailureUtc = DateTime.UtcNow;
-            _lastError = message;
+                _lastError = error;
             }
         }
-    private void ThrowIfFailed(bool success, string? error, string op)
-    {
-        if (!success)
-        {
-            // Sidecar-reported failure counts as an operation failure even though the
-            // transport delivered a reply. The Invoke wrapper already recorded transport
-            // success — undo that and record the failure so the health snapshot reflects
-            // operation-level success rates rather than just connectivity.
-            ReclassifySuccessAsFailure(error);
-            throw new InvalidOperationException(
-                $"Sidecar {op} failed: {error ?? "<no message>"}.");
-        }
-    }
-    private void ReclassifySuccessAsFailure(string? message)
-    {
-        lock (_healthLock)
-        {
-            // Transport-level RecordSuccess happened a moment ago; reverse it.
-            _totalSuccesses--;
-            _totalFailures++;
-            _consecutiveFailures++;
-            _lastFailureUtc = DateTime.UtcNow;
-            _lastError = message;
-        }
     }
     /// <summary>
@@ -452,9 +489,12 @@ public sealed class WonderwareHistorianClient : IHistorianDataSource, IAlarmHist
     /// <summary>
     /// Synchronous dispose required by <see cref="IDisposable"/> on
-    /// <see cref="IHistorianDataSource"/>. The underlying channel's async cleanup is
-    /// non-blocking (just resets transport state + disposes streams), so the
-    /// GetAwaiter()/GetResult() bridge is safe.
+    /// <see cref="IHistorianDataSource"/>. The underlying channel's async cleanup runs
+    /// <see cref="System.IO.Pipes.NamedPipeClientStream"/> teardown, which can block briefly
+    /// on OS handle release, so it is not strictly non-blocking. The
+    /// <c>GetAwaiter()/GetResult()</c> bridge is nevertheless deadlock-safe because the
+    /// cleanup never awaits a captured <see cref="System.Threading.SynchronizationContext"/>
+    /// and never takes a lock that the caller could hold. (Finding 010.)
     /// </summary>
     public void Dispose() => _channel.DisposeAsync().AsTask().GetAwaiter().GetResult();
 }
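
The evaluator-delegate shape that replaces `ThrowIfFailed` / `ReclassifySuccessAsFailure` can be reduced to a few local functions. This is a hypothetical sketch, not the client's code. `sendOverTransport` stands in for `_channel.InvokeAsync`, and a `(bool Success, string? Error)` tuple stands in for reply types such as `ReadRawReply`. It shows that a successful call, a sidecar-reported failure, and a transport exception each lead to exactly one `RecordOutcome` call. No success is ever recorded and later undone.

```csharp
using System;
using System.IO;
using System.Threading.Tasks;

// Hypothetical stand-ins for _healthLock and the health counters.
object healthLock = new();
int queries = 0, successes = 0, failures = 0;
string? lastError = null;

void RecordOutcome(bool success, string? error)
{
    lock (healthLock)
    {
        queries++;
        if (success) { successes++; }
        else { failures++; lastError = error; }
    }
}

// Mirrors InvokeAsync: exactly one RecordOutcome per call, whichever way it ends.
async Task<TReply> InvokeAsync<TReply>(
    Func<Task<TReply>> sendOverTransport,
    Func<TReply, (bool ok, string? error)> evaluate)
{
    try
    {
        var reply = await sendOverTransport();
        var (ok, error) = evaluate(reply);   // sidecar-level verdict
        RecordOutcome(ok, error);
        return reply;
    }
    catch (Exception ex)
    {
        RecordOutcome(false, ex.Message);    // transport-level failure
        throw;
    }
}

// Mirrors InvokeAndClassifyAsync: same accounting, plus a throw on sidecar failure.
async Task<TReply> InvokeAndClassifyAsync<TReply>(
    Func<Task<TReply>> sendOverTransport,
    Func<TReply, (bool ok, string? error)> evaluate,
    string op)
{
    var reply = await InvokeAsync(sendOverTransport, evaluate);
    var (ok, error) = evaluate(reply);
    if (!ok)
    {
        throw new InvalidOperationException($"Sidecar {op} failed: {error ?? "<no message>"}.");
    }
    return reply;
}

// 1) Transport and sidecar both succeed.
await InvokeAndClassifyAsync(
    () => Task.FromResult((Success: true, Error: (string?)null)),
    r => (r.Success, r.Error), "ReadRaw");

// 2) A reply arrives but the sidecar reports failure: counted once as a failure, then thrown.
try
{
    await InvokeAndClassifyAsync(
        () => Task.FromResult((Success: false, Error: (string?)"boom")),
        r => (r.Success, r.Error), "ReadRaw");
}
catch (InvalidOperationException) { }

// 3) The transport itself fails before any reply is read.
try
{
    await InvokeAsync<(bool Success, string? Error)>(
        () => throw new IOException("pipe broken"),
        r => (r.Success, r.Error));
}
catch (IOException) { }

Console.WriteLine($"queries={queries} successes={successes} failures={failures} lastError={lastError}");
```

Evaluating the delegate a second time in the wrapper does not affect the counters. It costs little as long as the evaluator is a pure projection such as `r => (r.Success, r.Error)`.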


@@ -3,24 +3,28 @@ namespace ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware.Client;
 /// <summary>
 /// Connection options for <see cref="WonderwareHistorianClient"/>.
 /// </summary>
+/// <remarks>
+/// <para>
+/// <b>Retry / backoff ownership (finding 006):</b> this module performs exactly one
+/// in-place transport reconnect inside <c>PipeChannel.InvokeAsync</c> with no delay,
+/// and does NOT implement exponential reconnect backoff. Broader retry/backoff is the
+/// caller's responsibility — the alarm drain worker
+/// (<c>Core.AlarmHistorian.SqliteStoreAndForwardSink</c>) and the read-side
+/// history router are expected to layer their own backoff on top.
+/// </para>
+/// </remarks>
 /// <param name="PipeName">Named-pipe name the sidecar listens on (matches the sidecar's <c>OTOPCUA_HISTORIAN_PIPE</c>).</param>
 /// <param name="SharedSecret">Per-process shared secret the sidecar will verify in the Hello frame.</param>
 /// <param name="PeerName">Diagnostic peer identifier sent in Hello — typically the OtOpcUa instance id.</param>
 /// <param name="ConnectTimeout">Cap on the named-pipe connect + Hello round trip on each (re)connect.</param>
 /// <param name="CallTimeout">Cap on a single read/write call once connected.</param>
-/// <param name="ReconnectInitialBackoff">Backoff between the first failed reconnect attempts.</param>
-/// <param name="ReconnectMaxBackoff">Upper bound on the exponential backoff between reconnects.</param>
 public sealed record WonderwareHistorianClientOptions(
     string PipeName,
     string SharedSecret,
     string PeerName = "OtOpcUa",
     TimeSpan? ConnectTimeout = null,
-    TimeSpan? CallTimeout = null,
-    TimeSpan? ReconnectInitialBackoff = null,
-    TimeSpan? ReconnectMaxBackoff = null)
+    TimeSpan? CallTimeout = null)
 {
     public TimeSpan EffectiveConnectTimeout => ConnectTimeout ?? TimeSpan.FromSeconds(10);
     public TimeSpan EffectiveCallTimeout => CallTimeout ?? TimeSpan.FromSeconds(30);
-    public TimeSpan EffectiveReconnectInitialBackoff => ReconnectInitialBackoff ?? TimeSpan.FromMilliseconds(500);
-    public TimeSpan EffectiveReconnectMaxBackoff => ReconnectMaxBackoff ?? TimeSpan.FromSeconds(30);
 }
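
With the reconnect-backoff options removed, any backoff now has to come from the caller. Below is a minimal sketch of what that could look like, built around a hypothetical `RetryWithBackoffAsync` helper. Its name, parameters, and attempt limit are illustrative assumptions and are not taken from `SqliteStoreAndForwardSink` or the history router. Each attempt is one client call, which already includes the channel's single in-place reconnect. The wait between attempts doubles up to a cap, and the delay is injected so the schedule can be checked without actually sleeping.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical caller-side retry loop; nothing here is part of WonderwareHistorianClientOptions.
async Task<T> RetryWithBackoffAsync<T>(
    Func<CancellationToken, Task<T>> operation,
    TimeSpan initialBackoff,
    TimeSpan maxBackoff,
    int maxAttempts,
    Func<TimeSpan, CancellationToken, Task> delay,
    CancellationToken ct)
{
    var backoff = initialBackoff;
    for (var attempt = 1; ; attempt++)
    {
        try
        {
            // One attempt = one client call (which may itself reconnect once in place).
            return await operation(ct);
        }
        catch (Exception) when (attempt < maxAttempts && !ct.IsCancellationRequested)
        {
            await delay(backoff, ct);
            // Exponential growth, clamped to the cap.
            backoff = TimeSpan.FromTicks(Math.Min(backoff.Ticks * 2, maxBackoff.Ticks));
        }
    }
}

// Demo: the first three attempts fail and the fourth succeeds. Delays are recorded, not slept.
var calls = 0;
var recordedDelays = new List<TimeSpan>();
var result = await RetryWithBackoffAsync(
    _ => ++calls < 4 ? throw new IOException("sidecar unavailable") : Task.FromResult("ok"),
    initialBackoff: TimeSpan.FromMilliseconds(100),
    maxBackoff: TimeSpan.FromMilliseconds(300),
    maxAttempts: 5,
    delay: (d, _) => { recordedDelays.Add(d); return Task.CompletedTask; },
    ct: CancellationToken.None);

Console.WriteLine($"{result} after {calls} calls; delays: {string.Join(", ", recordedDelays)}");
```

In production, the `delay` argument would simply be `(d, token) => Task.Delay(d, token)`. Injecting it only keeps the backoff schedule testable.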


@@ -491,4 +491,95 @@ public sealed class WonderwareHistorianClientTests
         await Should.ThrowAsync<InvalidDataException>(() =>
             client.ReadRawAsync("Tag", DateTime.UtcNow, DateTime.UtcNow, 100, CancellationToken.None));
     }
+    // ===== Finding-003 / Finding-004: health counter consistency =====
+    /// <summary>
+    /// (Finding 003 + 004) A sidecar-level failure must be classified once: TotalSuccesses
+    /// must stay at 0, TotalFailures must become 1, and TotalQueries / TotalSuccesses /
+    /// TotalFailures must all be updated under the same lock so a concurrent snapshot can
+    /// never observe inflated successes or out-of-band TotalQueries. This pins behaviour so
+    /// a future regression to the "RecordSuccess then undo via ReclassifySuccessAsFailure"
+    /// dance is caught.
+    /// </summary>
+    [Fact]
+    public async Task GetHealthSnapshot_SidecarFailure_NeverInflatesSuccessCounter()
+    {
+        var pipe = UniquePipeName();
+        await using var server = new FakeSidecarServer(pipe, Secret)
+        {
+            OnReadRaw = _ => new ReadRawReply { Success = false, Error = "boom" },
+        };
+        await server.StartAsync();
+        await using var client = new WonderwareHistorianClient(OptsFor(pipe));
+        await Should.ThrowAsync<InvalidOperationException>(() =>
+            client.ReadRawAsync("Tag", DateTime.UtcNow, DateTime.UtcNow, 1, CancellationToken.None));
+        var snap = client.GetHealthSnapshot();
+        snap.TotalQueries.ShouldBe(1);
+        snap.TotalSuccesses.ShouldBe(0);
+        snap.TotalFailures.ShouldBe(1);
+        snap.ConsecutiveFailures.ShouldBe(1);
+        snap.LastError.ShouldNotBeNull();
+    }
+    /// <summary>
+    /// (Finding 003) Calls observed by a concurrent
+    /// <see cref="WonderwareHistorianClient.GetHealthSnapshot"/> reader must yield consistent
+    /// counters: TotalSuccesses + TotalFailures must equal TotalQueries at every observed
+    /// snapshot (no torn read between an Interlocked-incremented TotalQueries and a
+    /// lock-protected outcome counter). The calls are issued one at a time (the channel
+    /// serializes them anyway), so each completed query increments either successes or
+    /// failures by exactly one while the reader polls.
+    /// </summary>
+    [Fact]
+    public async Task GetHealthSnapshot_ConcurrentCallsAndReads_CountersAreInternallyConsistent()
+    {
+        var pipe = UniquePipeName();
+        await using var server = new FakeSidecarServer(pipe, Secret)
+        {
+            OnReadRaw = _ => new ReadRawReply { Success = true },
+        };
+        await server.StartAsync();
+        await using var client = new WonderwareHistorianClient(OptsFor(pipe));
+        using var stop = new CancellationTokenSource();
+        var readerSawInconsistent = false;
+#pragma warning disable xUnit1051 // Internal Task.Run loop drives a polling stress test; cancellation flows via stop.IsCancellationRequested below.
+        var reader = Task.Run(() =>
+        {
+            while (!stop.IsCancellationRequested)
+            {
+                var snap = client.GetHealthSnapshot();
+                // Every completed call increments TotalQueries AND exactly one of
+                // TotalSuccesses or TotalFailures under the same lock; an in-flight call
+                // has not yet incremented any of them. So TotalQueries should always equal
+                // the sum of TotalSuccesses + TotalFailures (no in-between state visible).
+                if (snap.TotalSuccesses + snap.TotalFailures != snap.TotalQueries)
+                {
+                    readerSawInconsistent = true;
+                }
+            }
+        });
+#pragma warning restore xUnit1051
+        for (var i = 0; i < 50; i++)
+        {
+            await client.ReadRawAsync("Tag", DateTime.UtcNow, DateTime.UtcNow, 1, TestContext.Current.CancellationToken);
+        }
+        stop.Cancel();
+        await reader;
+        readerSawInconsistent.ShouldBeFalse(
+            "GetHealthSnapshot exposed TotalQueries that disagreed with the sum of TotalSuccesses + TotalFailures — counters are not updated under a single lock.");
+        var final = client.GetHealthSnapshot();
+        final.TotalQueries.ShouldBe(50);
+        final.TotalSuccesses.ShouldBe(50);
+        final.TotalFailures.ShouldBe(0);
+    }
 }
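
The concurrent-reader test fails against the pre-fix code even though its calls are sequential, and the reason can be shown deterministically. In this hypothetical sketch, local functions stand in for the old and new `Invoke` shapes, and a `TaskCompletionSource` stands in for a pipe round trip that has not returned yet. Under the old shape, a snapshot taken while one call is in flight is torn, because `TotalQueries` was already bumped on entry. Under the new shape, the same snapshot is consistent, because nothing is counted until the call completes.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-ins for _healthLock and the three counters.
object healthLock = new();
long queries = 0, successes = 0, failures = 0;

(long Q, long S, long F) Snapshot()
{
    lock (healthLock) { return (queries, successes, failures); }
}

bool Consistent((long Q, long S, long F) s) => s.S + s.F == s.Q;

// Pre-fix shape: TotalQueries bumped on entry via Interlocked, the outcome later under the lock.
async Task OldStyleCallAsync(Task roundTrip)
{
    Interlocked.Increment(ref queries);
    await roundTrip;
    lock (healthLock) { successes++; }
}

// Post-fix shape: nothing is counted until the round trip completes, then one lock covers all counters.
async Task NewStyleCallAsync(Task roundTrip)
{
    await roundTrip;
    lock (healthLock) { queries++; successes++; }
}

// Old shape: snapshot while the call is still waiting on the pipe.
var oldGate = new TaskCompletionSource<bool>(TaskCreationOptions.RunContinuationsAsynchronously);
var oldCall = OldStyleCallAsync(oldGate.Task);
var duringOld = Snapshot();      // queries already 1, no outcome recorded yet
oldGate.SetResult(true);
await oldCall;

// Reset and repeat with the new shape.
lock (healthLock) { queries = 0; successes = 0; failures = 0; }
var newGate = new TaskCompletionSource<bool>(TaskCreationOptions.RunContinuationsAsynchronously);
var newCall = NewStyleCallAsync(newGate.Task);
var duringNew = Snapshot();      // nothing counted yet
newGate.SetResult(true);
await newCall;
var afterNew = Snapshot();

Console.WriteLine($"old in-flight consistent: {Consistent(duringOld)}; new in-flight consistent: {Consistent(duringNew)}; after: {afterNew}");
```

This mirrors the `GetHealthSnapshot` remark that a call which has started but not yet completed has not incremented any counter.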