fix(driver-historian-wonderware-client): resolve Low code-review findings (Driver.Historian.Wonderware.Client-003,004,006,008,010)
- Driver.Historian.Wonderware.Client-003: replaced the mixed Interlocked + healthLock counters with RecordOutcome that touches _totalQueries and exactly one of _totalSuccesses / _totalFailures under one acquisition.
- Driver.Historian.Wonderware.Client-004: InvokeAndClassifyAsync routes transport + sidecar classification through a single RecordOutcome call; the legacy ReclassifySuccessAsFailure two-step is gone.
- Driver.Historian.Wonderware.Client-006: removed the dead ReconnectInitialBackoff / ReconnectMaxBackoff options and added a doc <remarks> stating the channel performs a single in-place reconnect; retry/backoff stays with the caller.
- Driver.Historian.Wonderware.Client-008: the audit-suppression comment block now records advisory titles, why neither applies, and the revisit trigger.
- Driver.Historian.Wonderware.Client-010: reworded Dispose() to claim deadlock-safety and added a GetHealthSnapshot summary documenting the single-channel collapse + counter invariant.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@@ -7,7 +7,7 @@
|
|||||||
| Review date | 2026-05-22 |
|
| Review date | 2026-05-22 |
|
||||||
| Commit reviewed | `76d35d1` |
|
| Commit reviewed | `76d35d1` |
|
||||||
| Status | Reviewed |
|
| Status | Reviewed |
|
||||||
| Open findings | 5 |
|
| Open findings | 0 |
|
||||||
|
|
||||||
## Checklist coverage
|
## Checklist coverage
|
||||||
|
|
||||||
@@ -92,7 +92,7 @@ dead-lettered. Until then, document explicitly that this writer never produces
|
|||||||
| Severity | Low |
|
| Severity | Low |
|
||||||
| Category | Concurrency & thread safety |
|
| Category | Concurrency & thread safety |
|
||||||
| Location | `WonderwareHistorianClient.cs:207`, `WonderwareHistorianClient.cs:132-150` |
|
| Location | `WonderwareHistorianClient.cs:207`, `WonderwareHistorianClient.cs:132-150` |
|
||||||
| Status | Open |
|
| Status | Resolved |
|
||||||
|
|
||||||
**Description:** `_totalQueries` is mutated with `Interlocked.Increment` in `Invoke`, but
|
**Description:** `_totalQueries` is mutated with `Interlocked.Increment` in `Invoke`, but
|
||||||
read inside `GetHealthSnapshot` under `_healthLock`, and every other counter
|
read inside `GetHealthSnapshot` under `_healthLock`, and every other counter
|
||||||
@@ -106,7 +106,7 @@ and the counters are advisory, but the mixed model is a latent hazard.
|
|||||||
`_healthLock` block (a new `RecordQuery()` helper, or fold it into `RecordSuccess`/
|
`_healthLock` block (a new `RecordQuery()` helper, or fold it into `RecordSuccess`/
|
||||||
`RecordFailure`) so all six health fields share a single lock.
|
`RecordFailure`) so all six health fields share a single lock.
|
||||||
|
|
||||||
**Resolution:** _(open)_
|
**Resolution:** Resolved 2026-05-23 — replaced the mixed `Interlocked.Increment(ref _totalQueries)` + `_healthLock`-protected outcome counters with a single `RecordOutcome(bool success, string? error)` helper that increments `_totalQueries` and exactly one of `_totalSuccesses` / `_totalFailures` under one `_healthLock` acquisition; `GetHealthSnapshot` documents the invariant that `TotalSuccesses + TotalFailures == TotalQueries` at every observed snapshot. Added the regression test `GetHealthSnapshot_ConcurrentCallsAndReads_CountersAreInternallyConsistent` that runs a polling reader concurrently with 50 calls and asserts the invariant never breaks (fails red against the previous code, passes green now).
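
The single-lock bookkeeping described in this resolution can be illustrated with a minimal, self-contained sketch. The `HealthCounters` class and its `Snapshot` method are hypothetical stand-ins, not the driver's actual types; only the `RecordOutcome(bool, string?)` shape and the invariant come from the finding.

```csharp
using System;

// Hypothetical stand-in for the client's health bookkeeping (illustrative names).
public sealed class HealthCounters
{
    private readonly object _lock = new();
    private long _totalQueries, _totalSuccesses, _totalFailures;
    private int _consecutiveFailures;
    private string? _lastError;

    // One lock acquisition moves the query count and exactly one outcome counter.
    public void RecordOutcome(bool success, string? error)
    {
        lock (_lock)
        {
            _totalQueries++;
            if (success)
            {
                _totalSuccesses++;
                _consecutiveFailures = 0;
            }
            else
            {
                _totalFailures++;
                _consecutiveFailures++;
                _lastError = error;
            }
        }
    }

    // Readers take the same lock, so they never observe a half-applied update.
    public (long Queries, long Successes, long Failures) Snapshot()
    {
        lock (_lock)
        {
            return (_totalQueries, _totalSuccesses, _totalFailures);
        }
    }
}
```

Because writers and readers share one lock, `Successes + Failures == Queries` holds for every value `Snapshot()` can return in this sketch.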
|
||||||
|
|
||||||
### Driver.Historian.Wonderware.Client-004
|
### Driver.Historian.Wonderware.Client-004
|
||||||
|
|
||||||
@@ -115,7 +115,7 @@ and the counters are advisory, but the mixed model is a latent hazard.
|
|||||||
| Severity | Low |
|
| Severity | Low |
|
||||||
| Category | Concurrency & thread safety |
|
| Category | Concurrency & thread safety |
|
||||||
| Location | `WonderwareHistorianClient.cs:203-267` |
|
| Location | `WonderwareHistorianClient.cs:203-267` |
|
||||||
| Status | Open |
|
| Status | Resolved |
|
||||||
|
|
||||||
**Description:** A sidecar-reported failure is recorded in two non-atomic steps under
|
**Description:** A sidecar-reported failure is recorded in two non-atomic steps under
|
||||||
separate lock acquisitions: `Invoke` calls `RecordSuccess()` (line 211) and then the
|
separate lock acquisitions: `Invoke` calls `RecordSuccess()` (line 211) and then the
|
||||||
@@ -132,7 +132,7 @@ sidecar-level `Success` flag has been checked, or pass the reply success/error i
|
|||||||
single `RecordOutcome(bool transportOk, bool sidecarOk, string? error)` that updates all
|
single `RecordOutcome(bool transportOk, bool sidecarOk, string? error)` that updates all
|
||||||
counters under one lock acquisition.
|
counters under one lock acquisition.
|
||||||
|
|
||||||
**Resolution:** _(open)_
|
**Resolution:** Resolved 2026-05-23 — eliminated the `RecordSuccess` → `ReclassifySuccessAsFailure` undo dance. `InvokeAsync` now takes a `Func<TReply, (bool ok, string? error)>` evaluator, evaluates it once when the transport reply lands, and calls `RecordOutcome(bool success, string? error)` exactly once per call under a single `_healthLock` acquisition. A sidecar-reported failure is now classified as a failure on its first and only counter update — no transient "success then undo" state is observable. The read-side `InvokeAndClassifyAsync` wrapper preserves the prior `InvalidOperationException` throw on sidecar failure. Added regression test `GetHealthSnapshot_SidecarFailure_NeverInflatesSuccessCounter` pinning `TotalSuccesses=0`/`TotalFailures=1` after a sidecar-error call.
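
As a rough illustration of the classify-once flow, the sketch below passes the reply through an evaluator and reports a single outcome. `ClassifyOnce`, `send`, and `record` are hypothetical names standing in for the pipe round trip and `RecordOutcome`; the real `InvokeAsync` signature in the driver differs.

```csharp
using System;
using System.Threading.Tasks;

public static class ClassifyOnce
{
    // Hypothetical helper: 'send' stands in for the channel round trip and
    // 'record' for RecordOutcome. The outcome is reported at most once per call.
    public static async Task<TReply> InvokeAsync<TReply>(
        Func<Task<TReply>> send,
        Func<TReply, (bool ok, string? error)> evaluate,
        Action<bool, string?> record)
    {
        TReply reply;
        try
        {
            reply = await send().ConfigureAwait(false);
        }
        catch (Exception ex)
        {
            // Transport failure: no reply to evaluate, record the failure directly.
            record(false, ex.Message);
            throw;
        }

        // Transport succeeded: the sidecar-level flag decides success vs. failure.
        var (ok, error) = evaluate(reply);
        record(ok, error);
        return reply;
    }
}
```

In this sketch a sidecar-reported error is counted as a failure on its first and only update, so there is no intermediate state in which it was counted as a success.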
|
||||||
|
|
||||||
### Driver.Historian.Wonderware.Client-005
|
### Driver.Historian.Wonderware.Client-005
|
||||||
|
|
||||||
@@ -167,7 +167,7 @@ the reader.
|
|||||||
| Severity | Low |
|
| Severity | Low |
|
||||||
| Category | Error handling & resilience |
|
| Category | Error handling & resilience |
|
||||||
| Location | `Internal/PipeChannel.cs:96-107`, `WonderwareHistorianClientOptions.cs:11-12` |
|
| Location | `Internal/PipeChannel.cs:96-107`, `WonderwareHistorianClientOptions.cs:11-12` |
|
||||||
| Status | Open |
|
| Status | Resolved |
|
||||||
|
|
||||||
**Description:** `PipeChannel.InvokeAsync` retries exactly once on transport failure and
|
**Description:** `PipeChannel.InvokeAsync` retries exactly once on transport failure and
|
||||||
otherwise propagates. The options expose `ReconnectInitialBackoff` and
|
otherwise propagates. The options expose `ReconnectInitialBackoff` and
|
||||||
@@ -182,7 +182,7 @@ or the options are dead config that misleads operators.
|
|||||||
path, or remove the two unused option fields and their XML docs and state plainly that
|
path, or remove the two unused option fields and their XML docs and state plainly that
|
||||||
retry/backoff is owned by the caller (the alarm drain worker / history router).
|
retry/backoff is owned by the caller (the alarm drain worker / history router).
|
||||||
|
|
||||||
**Resolution:** _(open)_
|
**Resolution:** Resolved 2026-05-23 — removed the dead `ReconnectInitialBackoff`/`ReconnectMaxBackoff` fields (and their `Effective*` accessors) from `WonderwareHistorianClientOptions` and added a `<remarks>` block stating that retry/backoff is owned by the caller (the alarm drain worker and the read-side history router) and that the channel itself performs exactly one in-place reconnect with no delay. Confirmed no consumer referenced the removed fields (only `code-reviews/` references remain). Solution-level build clean — Server picks up the new options shape without change.
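
Since backoff now belongs to the caller, a caller might layer something like the following on top of the client. This is only a sketch under assumptions: `CallerBackoff`, `DelayFor`, and `RetryAsync` are hypothetical names, and the doubling schedule is one common choice, not something the drain worker or history router is known to use.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class CallerBackoff
{
    // Delay before retry number 'attempt' (0-based): initial * 2^attempt, capped at max.
    public static TimeSpan DelayFor(int attempt, TimeSpan initial, TimeSpan max)
    {
        var factor = Math.Pow(2, Math.Min(attempt, 30)); // clamp exponent to keep the value finite
        var ms = Math.Min(initial.TotalMilliseconds * factor, max.TotalMilliseconds);
        return TimeSpan.FromMilliseconds(ms);
    }

    // Retries 'call' up to maxAttempts times, sleeping between failed attempts.
    public static async Task<T> RetryAsync<T>(
        Func<CancellationToken, Task<T>> call, int maxAttempts,
        TimeSpan initial, TimeSpan max, CancellationToken ct)
    {
        for (var attempt = 0; ; attempt++)
        {
            try
            {
                return await call(ct).ConfigureAwait(false);
            }
            catch (Exception) when (attempt + 1 < maxAttempts && !ct.IsCancellationRequested)
            {
                // Not the last attempt: wait, then loop. The final failure propagates.
                await Task.Delay(DelayFor(attempt, initial, max), ct).ConfigureAwait(false);
            }
        }
    }
}
```

With the 500 ms initial and 30 s maximum that the removed options used as defaults, this schedule would wait 500 ms, 1 s, 2 s, and so on until it reaches the cap.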
|
||||||
|
|
||||||
### Driver.Historian.Wonderware.Client-007
|
### Driver.Historian.Wonderware.Client-007
|
||||||
|
|
||||||
@@ -218,7 +218,7 @@ deserializing.
|
|||||||
| Severity | Low |
|
| Severity | Low |
|
||||||
| Category | Security |
|
| Category | Security |
|
||||||
| Location | `ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware.Client.csproj:29-32` |
|
| Location | `ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware.Client.csproj:29-32` |
|
||||||
| Status | Open |
|
| Status | Resolved |
|
||||||
|
|
||||||
**Description:** The csproj suppresses two NuGet audit advisories
|
**Description:** The csproj suppresses two NuGet audit advisories
|
||||||
(`GHSA-37gx-xxp4-5rgx`, `GHSA-w3x6-4m5h-cxqf`) for the `MessagePack` 2.5.187 dependency
|
(`GHSA-37gx-xxp4-5rgx`, `GHSA-w3x6-4m5h-cxqf`) for the `MessagePack` 2.5.187 dependency
|
||||||
@@ -232,7 +232,7 @@ advisory title, why it does not apply to this module usage, and a revisit trigge
|
|||||||
follow-up to upgrade `MessagePack` once a patched version is available so the suppressions
|
follow-up to upgrade `MessagePack` once a patched version is available so the suppressions
|
||||||
can be dropped.
|
can be dropped.
|
||||||
|
|
||||||
**Resolution:** _(open)_
|
**Resolution:** Resolved 2026-05-23 — the suppression block in the csproj (already added under finding 007) records each advisory title (GHSA-37gx-xxp4-5rgx unsafe-dynamic-codegen, GHSA-w3x6-4m5h-cxqf typeless-resolver gadget chain), why neither applies to this module (default `StandardResolver` only, no `TypelessContractlessStandardResolver` / `DynamicUnion` / `DynamicGenericResolver`, plus the 64 KiB per-sample ValueBytes cap in `DeserializeSampleValue` from finding 007), and the revisit trigger ("Revisit once MessagePack 3.x is available and drop these suppressions at that time"). All three pieces the recommendation asked for are present; the single comment block above both `NuGetAuditSuppress` entries was confirmed to satisfy the audit-trail gap.
|
||||||
|
|
||||||
### Driver.Historian.Wonderware.Client-009
|
### Driver.Historian.Wonderware.Client-009
|
||||||
|
|
||||||
@@ -272,7 +272,7 @@ silent `[Key]` drift between the two duplicated contract sets is caught at build
|
|||||||
| Severity | Low |
|
| Severity | Low |
|
||||||
| Category | Documentation & comments |
|
| Category | Documentation & comments |
|
||||||
| Location | `WonderwareHistorianClient.cs:355-361`, `WonderwareHistorianClient.cs:132-150` |
|
| Location | `WonderwareHistorianClient.cs:355-361`, `WonderwareHistorianClient.cs:132-150` |
|
||||||
| Status | Open |
|
| Status | Resolved |
|
||||||
|
|
||||||
**Description:** Two doc/behaviour mismatches.
|
**Description:** Two doc/behaviour mismatches.
|
||||||
(1) The `Dispose()` XML comment asserts the underlying channel async cleanup is
|
(1) The `Dispose()` XML comment asserts the underlying channel async cleanup is
|
||||||
@@ -291,4 +291,4 @@ node concept. The collapse is reasonable but undocumented.
|
|||||||
short remark on `GetHealthSnapshot` explaining that the single-channel client maps both
|
short remark on `GetHealthSnapshot` explaining that the single-channel client maps both
|
||||||
connection flags to one transport and does not track per-node health.
|
connection flags to one transport and does not track per-node health.
|
||||||
|
|
||||||
**Resolution:** _(open)_
|
**Resolution:** Resolved 2026-05-23 — (1) reworded the `Dispose()` XML comment to drop the "non-blocking" claim and instead state that the bridge is **deadlock-safe** because the cleanup never awaits a captured `SynchronizationContext` nor takes any lock the caller could hold, while acknowledging that `NamedPipeClientStream` teardown can block briefly on OS handle release. (2) Added a full `<summary>` + `<remarks>` block to `GetHealthSnapshot` explaining the single-channel collapse — both `ProcessConnectionOpen` and `EventConnectionOpen` report the same channel state, and `ActiveProcessNode`/`ActiveEventNode`/`Nodes` are intentionally null/empty because the client has no per-node telemetry. The remarks also pin the finding-003/004 invariant `TotalSuccesses + TotalFailures == TotalQueries`.
|
||||||
|
|||||||
@@ -72,8 +72,9 @@ public sealed class WonderwareHistorianClient : IHistorianDataSource, IAlarmHist
|
|||||||
MaxValues = (int)Math.Min(maxValuesPerNode, int.MaxValue),
|
MaxValues = (int)Math.Min(maxValuesPerNode, int.MaxValue),
|
||||||
CorrelationId = Guid.NewGuid().ToString("N"),
|
CorrelationId = Guid.NewGuid().ToString("N"),
|
||||||
};
|
};
|
||||||
var reply = await Invoke<ReadRawRequest, ReadRawReply>(MessageKind.ReadRawRequest, MessageKind.ReadRawReply, req, cancellationToken).ConfigureAwait(false);
|
var reply = await InvokeAndClassifyAsync<ReadRawRequest, ReadRawReply>(
|
||||||
ThrowIfFailed(reply.Success, reply.Error, "ReadRaw");
|
MessageKind.ReadRawRequest, MessageKind.ReadRawReply, req,
|
||||||
|
r => (r.Success, r.Error), "ReadRaw", cancellationToken).ConfigureAwait(false);
|
||||||
return new HistoryReadResult(ToSnapshots(reply.Samples), ContinuationPoint: null);
|
return new HistoryReadResult(ToSnapshots(reply.Samples), ContinuationPoint: null);
|
||||||
}
|
}
|
||||||
|
|
||||||
@@ -90,8 +91,9 @@ public sealed class WonderwareHistorianClient : IHistorianDataSource, IAlarmHist
|
|||||||
AggregateColumn = MapAggregate(aggregate),
|
AggregateColumn = MapAggregate(aggregate),
|
||||||
CorrelationId = Guid.NewGuid().ToString("N"),
|
CorrelationId = Guid.NewGuid().ToString("N"),
|
||||||
};
|
};
|
||||||
var reply = await Invoke<ReadProcessedRequest, ReadProcessedReply>(MessageKind.ReadProcessedRequest, MessageKind.ReadProcessedReply, req, cancellationToken).ConfigureAwait(false);
|
var reply = await InvokeAndClassifyAsync<ReadProcessedRequest, ReadProcessedReply>(
|
||||||
ThrowIfFailed(reply.Success, reply.Error, "ReadProcessed");
|
MessageKind.ReadProcessedRequest, MessageKind.ReadProcessedReply, req,
|
||||||
|
r => (r.Success, r.Error), "ReadProcessed", cancellationToken).ConfigureAwait(false);
|
||||||
return new HistoryReadResult(ToAggregateSnapshots(reply.Buckets), ContinuationPoint: null);
|
return new HistoryReadResult(ToAggregateSnapshots(reply.Buckets), ContinuationPoint: null);
|
||||||
}
|
}
|
||||||
|
|
||||||
@@ -107,8 +109,9 @@ public sealed class WonderwareHistorianClient : IHistorianDataSource, IAlarmHist
|
|||||||
TimestampsUtcTicks = ticks,
|
TimestampsUtcTicks = ticks,
|
||||||
CorrelationId = Guid.NewGuid().ToString("N"),
|
CorrelationId = Guid.NewGuid().ToString("N"),
|
||||||
};
|
};
|
||||||
var reply = await Invoke<ReadAtTimeRequest, ReadAtTimeReply>(MessageKind.ReadAtTimeRequest, MessageKind.ReadAtTimeReply, req, cancellationToken).ConfigureAwait(false);
|
var reply = await InvokeAndClassifyAsync<ReadAtTimeRequest, ReadAtTimeReply>(
|
||||||
ThrowIfFailed(reply.Success, reply.Error, "ReadAtTime");
|
MessageKind.ReadAtTimeRequest, MessageKind.ReadAtTimeReply, req,
|
||||||
|
r => (r.Success, r.Error), "ReadAtTime", cancellationToken).ConfigureAwait(false);
|
||||||
return new HistoryReadResult(AlignAtTimeSnapshots(timestampsUtc, reply.Samples), ContinuationPoint: null);
|
return new HistoryReadResult(AlignAtTimeSnapshots(timestampsUtc, reply.Samples), ContinuationPoint: null);
|
||||||
}
|
}
|
||||||
|
|
||||||
@@ -167,11 +170,34 @@ public sealed class WonderwareHistorianClient : IHistorianDataSource, IAlarmHist
|
|||||||
MaxEvents = maxEvents,
|
MaxEvents = maxEvents,
|
||||||
CorrelationId = Guid.NewGuid().ToString("N"),
|
CorrelationId = Guid.NewGuid().ToString("N"),
|
||||||
};
|
};
|
||||||
var reply = await Invoke<ReadEventsRequest, ReadEventsReply>(MessageKind.ReadEventsRequest, MessageKind.ReadEventsReply, req, cancellationToken).ConfigureAwait(false);
|
var reply = await InvokeAndClassifyAsync<ReadEventsRequest, ReadEventsReply>(
|
||||||
ThrowIfFailed(reply.Success, reply.Error, "ReadEvents");
|
MessageKind.ReadEventsRequest, MessageKind.ReadEventsReply, req,
|
||||||
|
r => (r.Success, r.Error), "ReadEvents", cancellationToken).ConfigureAwait(false);
|
||||||
return new HistoricalEventsResult(ToHistoricalEvents(reply.Events), ContinuationPoint: null);
|
return new HistoricalEventsResult(ToHistoricalEvents(reply.Events), ContinuationPoint: null);
|
||||||
}
|
}
|
||||||
|
|
||||||
|
/// <summary>
|
||||||
|
/// Returns a snapshot of operation counters and the single pipe channel's connection
|
||||||
|
/// state.
|
||||||
|
/// </summary>
|
||||||
|
/// <remarks>
|
||||||
|
/// This client owns one duplex named-pipe channel to the sidecar — it has no notion of
|
||||||
|
/// separate process / event connections and no per-node telemetry. The single channel's
|
||||||
|
/// connected state is reported for both <see cref="HistorianHealthSnapshot.ProcessConnectionOpen"/>
|
||||||
|
/// and <see cref="HistorianHealthSnapshot.EventConnectionOpen"/>, and
|
||||||
|
/// <see cref="HistorianHealthSnapshot.ActiveProcessNode"/> /
|
||||||
|
/// <see cref="HistorianHealthSnapshot.ActiveEventNode"/> /
|
||||||
|
/// <see cref="HistorianHealthSnapshot.Nodes"/> are intentionally null/empty. Consumers
|
||||||
|
/// that need to distinguish the two connections should query a different driver. (Finding 010.)
|
||||||
|
/// <para>
|
||||||
|
/// All seven health fields (TotalQueries, TotalSuccesses, TotalFailures,
|
||||||
|
/// ConsecutiveFailures, LastSuccessTime, LastFailureTime, LastError) are mutated
|
||||||
|
/// exclusively under <c>_healthLock</c>, so the snapshot is internally consistent —
|
||||||
|
/// in particular <c>TotalSuccesses + TotalFailures == TotalQueries</c> at every
|
||||||
|
/// observed snapshot (a call that has started but not yet completed has not
|
||||||
|
/// incremented any counter). (Finding 003 / 004.)
|
||||||
|
/// </para>
|
||||||
|
/// </remarks>
|
||||||
public HistorianHealthSnapshot GetHealthSnapshot()
|
public HistorianHealthSnapshot GetHealthSnapshot()
|
||||||
{
|
{
|
||||||
lock (_healthLock)
|
lock (_healthLock)
|
||||||
@@ -233,8 +259,9 @@ public sealed class WonderwareHistorianClient : IHistorianDataSource, IAlarmHist
|
|||||||
|
|
||||||
try
|
try
|
||||||
{
|
{
|
||||||
var reply = await Invoke<WriteAlarmEventsRequest, WriteAlarmEventsReply>(
|
var reply = await InvokeAsync<WriteAlarmEventsRequest, WriteAlarmEventsReply>(
|
||||||
MessageKind.WriteAlarmEventsRequest, MessageKind.WriteAlarmEventsReply, req, cancellationToken).ConfigureAwait(false);
|
MessageKind.WriteAlarmEventsRequest, MessageKind.WriteAlarmEventsReply, req,
|
||||||
|
r => (r.Success, r.Error), cancellationToken).ConfigureAwait(false);
|
||||||
|
|
||||||
// Whole-call failure → transient retry for every event in the batch.
|
// Whole-call failure → transient retry for every event in the batch.
|
||||||
if (!reply.Success)
|
if (!reply.Success)
|
||||||
@@ -279,69 +306,79 @@ public sealed class WonderwareHistorianClient : IHistorianDataSource, IAlarmHist
|
|||||||
|
|
||||||
// ===== Helpers =====
|
// ===== Helpers =====
|
||||||
|
|
||||||
private async Task<TReply> Invoke<TRequest, TReply>(
|
/// <summary>
|
||||||
MessageKind requestKind, MessageKind expectedReplyKind, TRequest request, CancellationToken ct)
|
/// Sends one request through the channel and records its outcome exactly once, under a
|
||||||
|
/// single <c>_healthLock</c> acquisition that also bumps <c>_totalQueries</c>. A transport
|
||||||
|
/// exception counts as a failure; otherwise <c>evaluate</c> classifies the sidecar reply.
|
||||||
|
/// <see cref="InvokeAndClassifyAsync"/> only adds the throw on sidecar failure. (Finding
|
||||||
|
/// 003 / 004: all health fields share one synchronization mechanism so a snapshot
|
||||||
|
/// can never observe a torn state.)
|
||||||
|
/// </summary>
|
||||||
|
private async Task<TReply> InvokeAsync<TRequest, TReply>(
|
||||||
|
MessageKind requestKind, MessageKind expectedReplyKind, TRequest request,
|
||||||
|
Func<TReply, (bool ok, string? error)> evaluate, CancellationToken ct)
|
||||||
where TReply : class
|
where TReply : class
|
||||||
{
|
{
|
||||||
Interlocked.Increment(ref _totalQueries);
|
|
||||||
try
|
try
|
||||||
{
|
{
|
||||||
var reply = await _channel.InvokeAsync<TRequest, TReply>(requestKind, expectedReplyKind, request, ct).ConfigureAwait(false);
|
var reply = await _channel.InvokeAsync<TRequest, TReply>(requestKind, expectedReplyKind, request, ct).ConfigureAwait(false);
|
||||||
RecordSuccess();
|
// Classify transport+sidecar in one lock so TotalQueries/TotalSuccesses/
|
||||||
|
// TotalFailures move together and no intermediate "success-then-undo" state is
|
||||||
|
// visible to a concurrent GetHealthSnapshot.
|
||||||
|
var (ok, error) = evaluate(reply);
|
||||||
|
RecordOutcome(ok, error);
|
||||||
return reply;
|
return reply;
|
||||||
}
|
}
|
||||||
catch (Exception ex)
|
catch (Exception ex)
|
||||||
{
|
{
|
||||||
RecordFailure(ex.Message);
|
RecordOutcome(success: false, ex.Message);
|
||||||
throw;
|
throw;
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
private void RecordSuccess()
|
/// <summary>
|
||||||
|
/// Convenience wrapper around <see cref="InvokeAsync"/> that throws
|
||||||
|
/// <see cref="InvalidOperationException"/> on a sidecar-reported failure. Used by the
|
||||||
|
/// <see cref="IHistorianDataSource"/> read methods.
|
||||||
|
/// </summary>
|
||||||
|
private async Task<TReply> InvokeAndClassifyAsync<TRequest, TReply>(
|
||||||
|
MessageKind requestKind, MessageKind expectedReplyKind, TRequest request,
|
||||||
|
Func<TReply, (bool ok, string? error)> evaluate, string op, CancellationToken ct)
|
||||||
|
where TReply : class
|
||||||
{
|
{
|
||||||
lock (_healthLock)
|
var reply = await InvokeAsync<TRequest, TReply>(requestKind, expectedReplyKind, request, evaluate, ct).ConfigureAwait(false);
|
||||||
|
var (ok, error) = evaluate(reply);
|
||||||
|
if (!ok)
|
||||||
{
|
{
|
||||||
_totalSuccesses++;
|
|
||||||
_consecutiveFailures = 0;
|
|
||||||
_lastSuccessUtc = DateTime.UtcNow;
|
|
||||||
}
|
|
||||||
}
|
|
||||||
|
|
||||||
private void RecordFailure(string message)
|
|
||||||
{
|
|
||||||
lock (_healthLock)
|
|
||||||
{
|
|
||||||
_totalFailures++;
|
|
||||||
_consecutiveFailures++;
|
|
||||||
_lastFailureUtc = DateTime.UtcNow;
|
|
||||||
_lastError = message;
|
|
||||||
}
|
|
||||||
}
|
|
||||||
|
|
||||||
private void ThrowIfFailed(bool success, string? error, string op)
|
|
||||||
{
|
|
||||||
if (!success)
|
|
||||||
{
|
|
||||||
// Sidecar-reported failure counts as an operation failure even though the
|
|
||||||
// transport delivered a reply. The Invoke wrapper already recorded transport
|
|
||||||
// success — undo that and record the failure so the health snapshot reflects
|
|
||||||
// operation-level success rates rather than just connectivity.
|
|
||||||
ReclassifySuccessAsFailure(error);
|
|
||||||
throw new InvalidOperationException(
|
throw new InvalidOperationException(
|
||||||
$"Sidecar {op} failed: {error ?? "<no message>"}.");
|
$"Sidecar {op} failed: {error ?? "<no message>"}.");
|
||||||
}
|
}
|
||||||
|
return reply;
|
||||||
}
|
}
|
||||||
|
|
||||||
private void ReclassifySuccessAsFailure(string? message)
|
/// <summary>
|
||||||
|
/// Records the outcome of a single call — increments <c>_totalQueries</c> and exactly
|
||||||
|
/// one of <c>_totalSuccesses</c> / <c>_totalFailures</c> under a single
|
||||||
|
/// <c>_healthLock</c> acquisition. (Findings 003 + 004.)
|
||||||
|
/// </summary>
|
||||||
|
private void RecordOutcome(bool success, string? error)
|
||||||
{
|
{
|
||||||
lock (_healthLock)
|
lock (_healthLock)
|
||||||
{
|
{
|
||||||
// Transport-level RecordSuccess happened a moment ago; reverse it.
|
_totalQueries++;
|
||||||
_totalSuccesses--;
|
if (success)
|
||||||
_totalFailures++;
|
{
|
||||||
_consecutiveFailures++;
|
_totalSuccesses++;
|
||||||
_lastFailureUtc = DateTime.UtcNow;
|
_consecutiveFailures = 0;
|
||||||
_lastError = message;
|
_lastSuccessUtc = DateTime.UtcNow;
|
||||||
|
}
|
||||||
|
else
|
||||||
|
{
|
||||||
|
_totalFailures++;
|
||||||
|
_consecutiveFailures++;
|
||||||
|
_lastFailureUtc = DateTime.UtcNow;
|
||||||
|
_lastError = error;
|
||||||
|
}
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
@@ -452,9 +489,12 @@ public sealed class WonderwareHistorianClient : IHistorianDataSource, IAlarmHist
|
|||||||
|
|
||||||
/// <summary>
|
/// <summary>
|
||||||
/// Synchronous dispose required by <see cref="IDisposable"/> on
|
/// Synchronous dispose required by <see cref="IDisposable"/> on
|
||||||
/// <see cref="IHistorianDataSource"/>. The underlying channel's async cleanup is
|
/// <see cref="IHistorianDataSource"/>. The underlying channel's async cleanup runs
|
||||||
/// non-blocking (just resets transport state + disposes streams), so the
|
/// <see cref="System.IO.Pipes.NamedPipeClientStream"/> teardown, which can block briefly
|
||||||
/// GetAwaiter()/GetResult() bridge is safe.
|
/// on OS handle release — strictly speaking it is not non-blocking — but the
|
||||||
|
/// <c>GetAwaiter()/GetResult()</c> bridge is deadlock-safe because the cleanup never
|
||||||
|
/// awaits a captured <see cref="System.Threading.SynchronizationContext"/> nor takes any
|
||||||
|
/// lock that the caller could hold. (Finding 010.)
|
||||||
/// </summary>
|
/// </summary>
|
||||||
public void Dispose() => _channel.DisposeAsync().AsTask().GetAwaiter().GetResult();
|
public void Dispose() => _channel.DisposeAsync().AsTask().GetAwaiter().GetResult();
|
||||||
}
|
}
|
||||||
|
|||||||
@@ -3,24 +3,28 @@ namespace ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware.Client;
|
|||||||
/// <summary>
|
/// <summary>
|
||||||
/// Connection options for <see cref="WonderwareHistorianClient"/>.
|
/// Connection options for <see cref="WonderwareHistorianClient"/>.
|
||||||
/// </summary>
|
/// </summary>
|
||||||
|
/// <remarks>
|
||||||
|
/// <para>
|
||||||
|
/// <b>Retry / backoff ownership (finding 006):</b> this module performs exactly one
|
||||||
|
/// in-place transport reconnect inside <c>PipeChannel.InvokeAsync</c> with no delay,
|
||||||
|
/// and does NOT implement exponential reconnect backoff. Broader retry/backoff is the
|
||||||
|
/// caller's responsibility — the alarm drain worker
|
||||||
|
/// (<c>Core.AlarmHistorian.SqliteStoreAndForwardSink</c>) and the read-side
|
||||||
|
/// history router are expected to layer their own backoff on top.
|
||||||
|
/// </para>
|
||||||
|
/// </remarks>
|
||||||
/// <param name="PipeName">Named-pipe name the sidecar listens on (matches the sidecar's <c>OTOPCUA_HISTORIAN_PIPE</c>).</param>
|
/// <param name="PipeName">Named-pipe name the sidecar listens on (matches the sidecar's <c>OTOPCUA_HISTORIAN_PIPE</c>).</param>
|
||||||
/// <param name="SharedSecret">Per-process shared secret the sidecar will verify in the Hello frame.</param>
|
/// <param name="SharedSecret">Per-process shared secret the sidecar will verify in the Hello frame.</param>
|
||||||
/// <param name="PeerName">Diagnostic peer identifier sent in Hello — typically the OtOpcUa instance id.</param>
|
/// <param name="PeerName">Diagnostic peer identifier sent in Hello — typically the OtOpcUa instance id.</param>
|
||||||
/// <param name="ConnectTimeout">Cap on the named-pipe connect + Hello round trip on each (re)connect.</param>
|
/// <param name="ConnectTimeout">Cap on the named-pipe connect + Hello round trip on each (re)connect.</param>
|
||||||
/// <param name="CallTimeout">Cap on a single read/write call once connected.</param>
|
/// <param name="CallTimeout">Cap on a single read/write call once connected.</param>
|
||||||
/// <param name="ReconnectInitialBackoff">Backoff between the first failed reconnect attempts.</param>
|
|
||||||
/// <param name="ReconnectMaxBackoff">Upper bound on the exponential backoff between reconnects.</param>
|
|
||||||
public sealed record WonderwareHistorianClientOptions(
|
public sealed record WonderwareHistorianClientOptions(
|
||||||
string PipeName,
|
string PipeName,
|
||||||
string SharedSecret,
|
string SharedSecret,
|
||||||
string PeerName = "OtOpcUa",
|
string PeerName = "OtOpcUa",
|
||||||
TimeSpan? ConnectTimeout = null,
|
TimeSpan? ConnectTimeout = null,
|
||||||
TimeSpan? CallTimeout = null,
|
TimeSpan? CallTimeout = null)
|
||||||
TimeSpan? ReconnectInitialBackoff = null,
|
|
||||||
TimeSpan? ReconnectMaxBackoff = null)
|
|
||||||
{
|
{
|
||||||
public TimeSpan EffectiveConnectTimeout => ConnectTimeout ?? TimeSpan.FromSeconds(10);
|
public TimeSpan EffectiveConnectTimeout => ConnectTimeout ?? TimeSpan.FromSeconds(10);
|
||||||
public TimeSpan EffectiveCallTimeout => CallTimeout ?? TimeSpan.FromSeconds(30);
|
public TimeSpan EffectiveCallTimeout => CallTimeout ?? TimeSpan.FromSeconds(30);
|
||||||
public TimeSpan EffectiveReconnectInitialBackoff => ReconnectInitialBackoff ?? TimeSpan.FromMilliseconds(500);
|
|
||||||
public TimeSpan EffectiveReconnectMaxBackoff => ReconnectMaxBackoff ?? TimeSpan.FromSeconds(30);
|
|
||||||
}
|
}
|
||||||
|
|||||||
@@ -491,4 +491,95 @@ public sealed class WonderwareHistorianClientTests
|
|||||||
await Should.ThrowAsync<InvalidDataException>(() =>
|
await Should.ThrowAsync<InvalidDataException>(() =>
|
||||||
client.ReadRawAsync("Tag", DateTime.UtcNow, DateTime.UtcNow, 100, CancellationToken.None));
|
client.ReadRawAsync("Tag", DateTime.UtcNow, DateTime.UtcNow, 100, CancellationToken.None));
|
||||||
}
|
}
|
||||||
|
|
||||||
|
// ===== Finding-003 / Finding-004: health counter consistency =====
|
||||||
|
|
||||||
|
/// <summary>
|
||||||
|
/// (Finding 003 + 004) A sidecar-level failure must be classified once: TotalSuccesses
|
||||||
|
/// must stay at 0, TotalFailures must become 1, and TotalQueries / TotalSuccesses /
|
||||||
|
/// TotalFailures must all be updated under the same lock so a concurrent snapshot can
|
||||||
|
/// never observe inflated successes or out-of-band TotalQueries. This pins behaviour so
|
||||||
|
/// a future regression to the "RecordSuccess then undo via ReclassifySuccessAsFailure"
|
||||||
|
/// dance is caught.
|
||||||
|
/// </summary>
|
||||||
|
[Fact]
|
||||||
|
public async Task GetHealthSnapshot_SidecarFailure_NeverInflatesSuccessCounter()
|
||||||
|
{
|
||||||
|
var pipe = UniquePipeName();
|
||||||
|
await using var server = new FakeSidecarServer(pipe, Secret)
|
||||||
|
{
|
||||||
|
OnReadRaw = _ => new ReadRawReply { Success = false, Error = "boom" },
|
||||||
|
};
|
||||||
|
await server.StartAsync();
|
||||||
|
|
||||||
|
await using var client = new WonderwareHistorianClient(OptsFor(pipe));
|
||||||
|
|
||||||
|
await Should.ThrowAsync<InvalidOperationException>(() =>
|
||||||
|
client.ReadRawAsync("Tag", DateTime.UtcNow, DateTime.UtcNow, 1, CancellationToken.None));
|
||||||
|
|
||||||
|
var snap = client.GetHealthSnapshot();
|
||||||
|
snap.TotalQueries.ShouldBe(1);
|
||||||
|
snap.TotalSuccesses.ShouldBe(0);
|
||||||
|
snap.TotalFailures.ShouldBe(1);
|
||||||
|
snap.ConsecutiveFailures.ShouldBe(1);
|
||||||
|
snap.LastError.ShouldNotBeNull();
|
||||||
|
}
|
||||||
|
|
||||||
|
/// <summary>
|
||||||
|
/// (Finding 003) Concurrent calls + concurrent <see cref="WonderwareHistorianClient.GetHealthSnapshot"/>
|
||||||
|
/// reads must observe consistent counters. Specifically, TotalSuccesses + TotalFailures
|
||||||
|
/// must equal TotalQueries at every observed snapshot (no torn read between an
|
||||||
|
/// Interlocked-incremented TotalQueries and a lock-protected outcome counter). The
|
||||||
|
/// channel serializes calls, so the test is observable: each completed query strictly
|
||||||
|
/// increments either successes or failures by one.
|
||||||
|
/// </summary>
|
||||||
|
[Fact]
|
||||||
|
public async Task GetHealthSnapshot_ConcurrentCallsAndReads_CountersAreInternallyConsistent()
|
||||||
|
{
|
||||||
|
var pipe = UniquePipeName();
|
||||||
|
await using var server = new FakeSidecarServer(pipe, Secret)
|
||||||
|
{
|
||||||
|
OnReadRaw = _ => new ReadRawReply { Success = true },
|
||||||
|
};
|
||||||
|
await server.StartAsync();
|
||||||
|
|
||||||
|
await using var client = new WonderwareHistorianClient(OptsFor(pipe));
|
||||||
|
|
||||||
|
using var stop = new CancellationTokenSource();
|
||||||
|
var readerSawInconsistent = false;
|
||||||
|
|
||||||
|
#pragma warning disable xUnit1051 // Internal Task.Run loop drives a polling stress test; cancellation flows via stop.IsCancellationRequested below.
|
||||||
|
var reader = Task.Run(() =>
|
||||||
|
{
|
||||||
|
while (!stop.IsCancellationRequested)
|
||||||
|
{
|
||||||
|
var snap = client.GetHealthSnapshot();
|
||||||
|
// Every completed call increments TotalQueries AND exactly one of
|
||||||
|
// TotalSuccesses or TotalFailures under the same lock; an in-flight call
|
||||||
|
// has not yet incremented any of them. So TotalQueries should always equal
|
||||||
|
// the sum of TotalSuccesses + TotalFailures (no in-between state visible).
|
||||||
|
if (snap.TotalSuccesses + snap.TotalFailures != snap.TotalQueries)
|
||||||
|
{
|
||||||
|
readerSawInconsistent = true;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
});
|
||||||
|
#pragma warning restore xUnit1051
|
||||||
|
|
||||||
|
for (var i = 0; i < 50; i++)
|
||||||
|
{
|
||||||
|
await client.ReadRawAsync("Tag", DateTime.UtcNow, DateTime.UtcNow, 1, TestContext.Current.CancellationToken);
|
||||||
|
}
|
||||||
|
|
||||||
|
stop.Cancel();
|
||||||
|
await reader;
|
||||||
|
|
||||||
|
readerSawInconsistent.ShouldBeFalse(
|
||||||
|
"GetHealthSnapshot exposed TotalQueries that disagreed with the sum of TotalSuccesses + TotalFailures — counters are not updated under a single lock.");
|
||||||
|
|
||||||
|
var final = client.GetHealthSnapshot();
|
||||||
|
final.TotalQueries.ShouldBe(50);
|
||||||
|
final.TotalSuccesses.ShouldBe(50);
|
||||||
|
final.TotalFailures.ShouldBe(0);
|
||||||
|
}
|
||||||
}
|
}
|
||||||
|
|||||||