fix(data-connection): resolve DataConnectionLayer-006..012 — quality-counter reconciliation, per-tag batch reads, configurable failover threshold, dedup retry, stale-callback guard, secure cert default
@@ -8,7 +8,7 @@
| Last reviewed | 2026-05-16 |
| Reviewer | claude-agent |
| Commit reviewed | `9c60592` |
| Open findings | 2 |

## Summary

@@ -287,7 +287,7 @@ unbounded code and passes after. Fixed by the commit whose message references
|--|--|
| Severity | Medium |
| Category | Correctness & logic bugs |
| Status | Resolved |
| Location | `src/ScadaLink.DataConnectionLayer/Actors/DataConnectionActor.cs:645-673,721-756` |

**Description**

@@ -303,6 +303,14 @@ decrement, and the totals can drift above `_totalSubscribed`. Over repeated
disconnect/reconnect cycles the health report's good/bad/uncertain counts become
unreliable.

**Verification note**: Confirmed against source. The root cause is broader than the
reconnect path the finding describes. `HandleUnsubscribe` also neither removes a tag
from `_lastTagQuality` nor decrements its quality bucket. An unsubscribed tag therefore
lingers, and `PushBadQualityForAllTags` (which sets `_tagsBadQuality = _lastTagQuality.Count`)
over-counts it. This drives the bad-quality count above `_totalSubscribed` even without
a re-subscribe. Both the unsubscribe leak and the re-subscribe drift are real.

**Recommendation**

On `BecomeConnected` after a re-subscribe (or in `ReSubscribeAll`), clear

@@ -312,7 +320,17 @@ fresh `TagValueReceived` messages. Alternatively recompute the buckets from

**Resolution**

Resolved 2026-05-16 (commit pending). `HandleUnsubscribe` now removes each
unsubscribed tag from `_lastTagQuality` and decrements the corresponding quality
bucket. It then reports the corrected counters via `UpdateTagQuality`/`UpdateTagResolution`.
`ReSubscribeAll` clears `_lastTagQuality` and zeroes the three quality counters, so
post-reconnect tags are repopulated from fresh `TagValueReceived` messages instead of
the counters only ever being incremented. Regression test
`DCL006_DisconnectAfterUnsubscribe_BadQualityCountMatchesRemainingTags` subscribes two
tags, pushes Good values, unsubscribes one, then disconnects. It asserts that
`PushBadQualityForAllTags` reports exactly 1 bad tag (the reconnect is gated open so
`ReSubscribeAll` does not run before the assertion). It reports 2 against the pre-fix
code and 1 after.

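The bookkeeping this fix restores can be modelled compactly. The following is a Python sketch, not the C# actor. `QualityCounters` and its methods are hypothetical names standing in for `_lastTagQuality`, the three quality buckets, `HandleUnsubscribe`, `PushBadQualityForAllTags`, and `ReSubscribeAll`. Two things are assumptions:

- Each tag's last quality lives in a single map.
- `PushBadQualityForAllTags` also zeroes the good and uncertain buckets.

```python
from enum import Enum


class Quality(Enum):
    GOOD = "good"
    BAD = "bad"
    UNCERTAIN = "uncertain"


class QualityCounters:
    """Hypothetical model of _lastTagQuality plus the good/bad/uncertain buckets."""

    def __init__(self) -> None:
        self.last_quality: dict[str, Quality] = {}
        self.buckets: dict[Quality, int] = {q: 0 for q in Quality}

    def on_value(self, tag: str, quality: Quality) -> None:
        # Move the tag out of its previous bucket first, so repeated values never double-count.
        previous = self.last_quality.get(tag)
        if previous is not None:
            self.buckets[previous] -= 1
        self.last_quality[tag] = quality
        self.buckets[quality] += 1

    def on_unsubscribe(self, tag: str) -> None:
        # The DCL-006 fix: forget the tag and decrement the bucket it was counted in.
        previous = self.last_quality.pop(tag, None)
        if previous is not None:
            self.buckets[previous] -= 1

    def push_bad_for_all(self) -> None:
        # On disconnect every still-tracked tag is reported bad; zeroing the
        # good/uncertain buckets here is an assumption of this sketch.
        self.last_quality = {tag: Quality.BAD for tag in self.last_quality}
        self.buckets = {q: 0 for q in Quality}
        self.buckets[Quality.BAD] = len(self.last_quality)

    def reset_for_resubscribe(self) -> None:
        # ReSubscribeAll: start from zero and let fresh values repopulate the counts.
        self.last_quality.clear()
        self.buckets = {q: 0 for q in Quality}


# Mirrors the DCL006 regression scenario: two Good tags, unsubscribe one, disconnect.
counters = QualityCounters()
counters.on_value("Tag1", Quality.GOOD)
counters.on_value("Tag2", Quality.GOOD)
counters.on_unsubscribe("Tag2")
counters.push_bad_for_all()
print(counters.buckets[Quality.BAD])  # 1
```

Without the `on_unsubscribe` decrement, `Tag2` would stay in the map and the disconnect would report two bad tags. This matches the pre-fix behaviour the regression test caught.
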
### DataConnectionLayer-007 — `ReadBatchAsync` aborts the whole batch on the first failing tag

@@ -320,7 +338,7 @@ _Unresolved._
|--|--|
| Severity | Medium |
| Category | Correctness & logic bugs |
| Status | Resolved |
| Location | `src/ScadaLink.DataConnectionLayer/Adapters/OpcUaDataConnection.cs:187-195` |

**Description**

@@ -333,6 +351,9 @@ has a `Success`/`ErrorMessage` shape designed to carry per-tag failures. The bat
also fully serial (one round-trip per tag), defeating the point of a batch API; the
design doc lists `ReadBatch`/`WriteBatch` as first-class operations.

**Verification note**: Confirmed against source. `ReadAsync` re-throws any exception
other than `OperationCanceledException`, aborting the whole batch.

**Recommendation**

Catch per-tag exceptions inside the loop and store a failed `ReadResult` for that tag

@@ -342,7 +363,17 @@ for all node IDs (`RealOpcUaClient.ReadValueAsync` already builds a

**Resolution**

Resolved 2026-05-16 (commit pending). `ReadBatchAsync` now wraps each per-tag
`ReadAsync` in a try/catch. A per-tag exception is recorded as a failed `ReadResult`
(`Success: false`, message = the exception message), so the batch returns a complete
result map for every requested tag. `OperationCanceledException` is still propagated,
so a cancelled batch aborts as a whole. The per-tag serial loop and the single-service-call
optimisation were deliberately left for a follow-up: they are a performance concern,
not the correctness bug this finding raised. Regression test
`DCL007_ReadBatch_ReturnsPerTagResults_WhenOneTagFails` reads three tags where the
middle one throws. It asserts that all three appear in the result map, with the failing
one marked unsuccessful. Against the pre-fix code it threw (no map returned); after the
fix it passes.

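A per-tag try/catch that still lets cancellation abort the batch can be sketched in Python `asyncio`. This is an illustrative model, not the C# adapter. `read_batch`, `read_one`, `fake_read`, and this `ReadResult` dataclass are hypothetical stand-ins for `ReadBatchAsync`, `ReadAsync`, and the real `ReadResult`. `asyncio.CancelledError` plays the role of `OperationCanceledException`. Like the fix, the sketch stays serial, with one awaited read per tag.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, Optional


@dataclass
class ReadResult:
    success: bool
    value: object = None
    error_message: Optional[str] = None


async def read_batch(
    tags: list[str],
    read_one: Callable[[str], Awaitable[object]],
) -> dict[str, ReadResult]:
    """Read tags one at a time, recording per-tag failures instead of aborting."""
    results: dict[str, ReadResult] = {}
    for tag in tags:
        try:
            results[tag] = ReadResult(success=True, value=await read_one(tag))
        except asyncio.CancelledError:
            # Cancellation still aborts the whole batch (the OperationCanceledException case).
            raise
        except Exception as exc:
            # Any other failure is scoped to this tag; the loop carries on.
            results[tag] = ReadResult(success=False, error_message=str(exc))
    return results


async def fake_read(tag: str) -> object:
    # Hypothetical reader: the middle tag fails, as in the DCL007 regression test.
    if tag == "B":
        raise RuntimeError("simulated failure")
    return 42


results = asyncio.run(read_batch(["A", "B", "C"], fake_read))
print({tag: r.success for tag, r in results.items()})  # {'A': True, 'B': False, 'C': True}
```

The explicit `except asyncio.CancelledError: raise` ahead of the broad handler keeps the cancellation contract readable. It also mirrors the ordering the C# fix needs, because there `OperationCanceledException` *is* caught by a bare `catch (Exception)`.
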
### DataConnectionLayer-008 — `HandleUnsubscribe` is O(n^2) over instances and rechecks `_unresolvedTags` redundantly

@@ -379,9 +410,9 @@ _Unresolved._

| | |
|--|--|
| Severity | Medium (partially design-doc work outside this module's editable scope) |
| Category | Design-document adherence |
| Status | Resolved |
| Location | `src/ScadaLink.DataConnectionLayer/Actors/DataConnectionActor.cs:189,242-297,379-449`, `docs/requirements/Component-DataConnectionLayer.md:73-85` |

**Description**

@@ -398,6 +429,10 @@ all. A reviewer or operator reading `Component-DataConnectionLayer.md` would not
predict this behaviour, and the 60 s threshold is a magic constant not exposed via
`DataConnectionOptions`.

**Verification note**: Confirmed against source. The hard-coded `static readonly` field
`StableConnectionThreshold = TimeSpan.FromSeconds(60)` and the
`_consecutiveUnstableDisconnects` failover path both exist as described.

**Recommendation**

Update `Component-DataConnectionLayer.md` to document the unstable-disconnect failover

@@ -406,7 +441,19 @@ path and the stability threshold, and move the 60 s threshold into

**Resolution**

Resolved 2026-05-16 (commit pending). The configurability half of the recommendation
is done. The hard-coded `StableConnectionThreshold` constant was removed from
`DataConnectionActor` and replaced with a new `DataConnectionOptions.StableConnectionThreshold`
property (60 s default). It is bindable from the `DataConnectionLayer` `appsettings.json`
section in the same way as `ReconnectInterval`, `TagResolutionRetryInterval`, and `WriteTimeout`.
Regression test `DCL009_StableConnectionThreshold_IsConfigurable_WithSixtySecondDefault`
guards the default and the setter. **The documentation half is out of this module's editable
scope**: `docs/requirements/Component-DataConnectionLayer.md` (lines 73-85) still
describes only the connect-failure failover path and does not mention the
unstable-disconnect trigger. **Action required (surfaced):** the DCL design doc should
be updated to document the unstable-disconnect failover path and the configurable
stability threshold. That edit was deliberately not made here because this task is
scoped to `src/ScadaLink.DataConnectionLayer`, tests, and this findings file only.

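The following hedged Python model shows what a configurable threshold changes. It is a sketch, not the C# code. `DataConnectionOptions.from_section`, `parse_hms`, and `StabilityTracker` are hypothetical names. Three points are assumptions or omissions:

- The `"00:01:30"` value assumes the `hh:mm:ss` string form that .NET configuration binding accepts for a `TimeSpan`.
- Resetting the streak after a stable connection is an assumption about `_consecutiveUnstableDisconnects`.
- The number of unstable disconnects that triggers failover is not modelled.

```python
import json
from dataclasses import dataclass
from datetime import timedelta


def parse_hms(text: str) -> timedelta:
    """Parse the "hh:mm:ss" form of a .NET TimeSpan string (minimal sketch)."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


@dataclass
class DataConnectionOptions:
    # Models DataConnectionOptions.StableConnectionThreshold and its 60 s default.
    stable_connection_threshold: timedelta = timedelta(seconds=60)

    @classmethod
    def from_section(cls, section: dict) -> "DataConnectionOptions":
        # Keep the default unless the key is present, as configuration binding does.
        options = cls()
        if "StableConnectionThreshold" in section:
            options.stable_connection_threshold = parse_hms(section["StableConnectionThreshold"])
        return options


class StabilityTracker:
    """Hypothetical counterpart of _consecutiveUnstableDisconnects."""

    def __init__(self, options: DataConnectionOptions) -> None:
        self._threshold = options.stable_connection_threshold
        self.consecutive_unstable_disconnects = 0

    def on_disconnect(self, connected_for: timedelta) -> int:
        if connected_for < self._threshold:
            # Dropped before the threshold elapsed: count it as unstable.
            self.consecutive_unstable_disconnects += 1
        else:
            # Assumption: a connection that outlived the threshold resets the streak.
            self.consecutive_unstable_disconnects = 0
        return self.consecutive_unstable_disconnects


settings = json.loads('{"DataConnectionLayer": {"StableConnectionThreshold": "00:01:30"}}')
options = DataConnectionOptions.from_section(settings["DataConnectionLayer"])
tracker = StabilityTracker(options)
print(tracker.on_disconnect(timedelta(seconds=75)))  # 1 (75 s is under the 90 s threshold)
```

With the old hard-coded 60 s value, the same 75 s connection would have counted as stable. Exposing the setting lets an operator tune what counts as "unstable" per deployment.
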
### DataConnectionLayer-010 — Tag-resolution retry can issue duplicate concurrent subscribe attempts

@@ -414,7 +461,7 @@ _Unresolved._
|--|--|
| Severity | Medium |
| Category | Correctness & logic bugs |
| Status | Resolved |
| Location | `src/ScadaLink.DataConnectionLayer/Actors/DataConnectionActor.cs:594-619,689-703` |

**Description**

@@ -429,6 +476,10 @@ monitored items / leaked subscription IDs (the second success overwrites
with no `UnsubscribeAsync` call). The timer-cancel condition in
`HandleTagResolutionSucceeded` is also non-deterministic for the same reason.

**Verification note**: Confirmed against source. `HandleRetryTagResolution` dispatched
`SubscribeAsync` for every tag in `_unresolvedTags` on every tick, with no in-flight
guard.

**Recommendation**

Remove tags from `_unresolvedTags` (into an "in-flight" set) when a retry is

@@ -437,7 +488,18 @@ subscribe attempts and makes the timer-cancel condition deterministic.

**Resolution**

Resolved 2026-05-16 (commit pending). A new `_resolutionInFlight` `HashSet<string>`
tracks tags whose retry `SubscribeAsync` is currently outstanding.
`HandleRetryTagResolution` now dispatches only for unresolved tags that are **not** already
in flight. It skips the tick entirely if all of them are, and adds each dispatched tag to
the set. `HandleTagResolutionSucceeded` and `HandleTagResolutionFailed` remove the tag
from the set when its attempt completes, and `HandleUnsubscribe`/`ReSubscribeAll`
clear stale entries. This prevents overlapping duplicate subscribe attempts and the
resulting orphaned monitored items. Regression test
`DCL010_TagResolutionRetry_DoesNotIssueDuplicateConcurrentSubscribes` gives a tag a
genuine initial failure, then a retry `SubscribeAsync` that never completes, and lets six
100 ms retry ticks elapse. It asserts that exactly one retry was dispatched (2 total
subscribe calls); the pre-fix code dispatched on every tick (6 total).

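The in-flight guard reduces to set arithmetic. In this Python model, a retry tick dispatches only `unresolved - in_flight`, and both completion handlers release the tag. `TagResolutionRetrier` and its methods are hypothetical; the C# fix uses `_unresolvedTags` and `_resolutionInFlight`. The model also assumes that the success handler drops the tag from the unresolved set. Because an actor processes one message at a time, plain sets need no locking, and the model relies on that too.

```python
from typing import Callable


class TagResolutionRetrier:
    """Hypothetical model of _unresolvedTags plus the new _resolutionInFlight set."""

    def __init__(self, subscribe: Callable[[str], None]) -> None:
        self._subscribe = subscribe  # starts an asynchronous subscribe attempt
        self.unresolved: set[str] = set()
        self.in_flight: set[str] = set()

    def on_retry_tick(self) -> list[str]:
        # Dispatch only tags that are unresolved AND have no attempt outstanding.
        to_dispatch = sorted(self.unresolved - self.in_flight)
        for tag in to_dispatch:
            self.in_flight.add(tag)
            self._subscribe(tag)
        return to_dispatch

    def on_resolution_succeeded(self, tag: str) -> None:
        self.in_flight.discard(tag)
        self.unresolved.discard(tag)

    def on_resolution_failed(self, tag: str) -> None:
        # Still unresolved, but eligible again on the next tick.
        self.in_flight.discard(tag)

    def on_unsubscribe(self, tag: str) -> None:
        # Clear stale entries so a later re-subscribe starts clean.
        self.unresolved.discard(tag)
        self.in_flight.discard(tag)


# Mirrors DCL010: the retry for Tag1 never completes while six ticks elapse.
calls: list[str] = []
retrier = TagResolutionRetrier(calls.append)
retrier.unresolved.add("Tag1")
for _ in range(6):
    retrier.on_retry_tick()
print(len(calls))  # 1
```

A failed attempt makes the tag eligible again on the next tick, so retries continue. They are just never stacked on top of an outstanding one.
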
### DataConnectionLayer-011 — Stale subscription callbacks from disposed adapters can still reach the actor

@@ -445,7 +507,7 @@ _Unresolved._
|--|--|
| Severity | Medium |
| Category | Error handling & resilience |
| Status | Resolved |
| Location | `src/ScadaLink.DataConnectionLayer/Actors/DataConnectionActor.cs:486-489,278-285,416-425`, `src/ScadaLink.DataConnectionLayer/Adapters/OpcUaDataConnection.cs:252-262` |

**Description**

@@ -460,6 +522,10 @@ data with the new endpoint's data and briefly reporting a value the active endpo
never produced. There is no per-adapter generation/epoch tag on `TagValueReceived` to
distinguish current from stale callbacks.

**Verification note**: Confirmed against source. `TagValueReceived` carried no
adapter identity, and `HandleTagValueReceived` (reachable in `Connected`) processed
any such message regardless of which adapter produced it.

**Recommendation**

Add an adapter-generation counter incremented on every adapter swap; stamp it onto

@@ -468,15 +534,28 @@ generation does not match the current adapter in `HandleTagValueReceived`.

**Resolution**

Resolved 2026-05-16 (commit pending). Implemented exactly as recommended. A new
`_adapterGeneration` `int` field is incremented at both adapter-swap sites: the
unstable-disconnect failover in `BecomeReconnecting` and the connect-failure failover
in `HandleReconnectResult`. The `TagValueReceived` record gained an
`AdapterGeneration` field. Every subscription callback closure (`HandleSubscribe`, the
initial-read seed, `HandleRetryTagResolution`, `ReSubscribeAll`) captures the
generation in effect at subscribe time and stamps it onto each `TagValueReceived`.
`HandleTagValueReceived` drops any message whose generation no longer matches the
current adapter, so a callback fired by a disposed adapter after failover cannot reach
an Instance Actor. Regression test
`DCL011_StaleTagValueFromOldAdapter_IsNotForwardedAfterFailover` subscribes on the
primary, fails over to the backup, then invokes the captured primary callback with a
stale value. It asserts that the subscriber receives nothing. The stale value reached
the subscriber against the pre-fix code and is dropped after the fix.

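The generation check can be modelled in a few lines of Python. This is a sketch under stated assumptions, not the C# code. `ConnectionActorModel`, `make_callback`, and `swap_adapter` are hypothetical names. The model's callback handles the message inline, whereas the real closure would send `TagValueReceived` to the actor's mailbox. The `forwarded` list stands in for delivery to an Instance Actor.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class TagValueReceived:
    tag: str
    value: object
    adapter_generation: int


class ConnectionActorModel:
    """Hypothetical model of _adapterGeneration and HandleTagValueReceived."""

    def __init__(self) -> None:
        self.adapter_generation = 0
        self.forwarded: list[TagValueReceived] = []

    def make_callback(self) -> Callable[[str, object], None]:
        # Capture the generation in effect at subscribe time.
        generation = self.adapter_generation

        def callback(tag: str, value: object) -> None:
            # The real closure would send this to the actor; it is handled inline here.
            self.handle_tag_value_received(TagValueReceived(tag, value, generation))

        return callback

    def swap_adapter(self) -> None:
        # Called at every adapter-swap (failover) site.
        self.adapter_generation += 1

    def handle_tag_value_received(self, msg: TagValueReceived) -> None:
        if msg.adapter_generation != self.adapter_generation:
            return  # stale callback from a disposed adapter: drop it
        self.forwarded.append(msg)  # stands in for forwarding to the Instance Actor


actor = ConnectionActorModel()
primary_callback = actor.make_callback()
actor.swap_adapter()  # fail over to the backup endpoint
backup_callback = actor.make_callback()
primary_callback("Tag1", 1.0)  # late value from the disposed primary adapter
backup_callback("Tag1", 2.0)
print([m.value for m in actor.forwarded])  # [2.0]
```

Capturing the generation at subscribe time, rather than reading it when the callback fires, is what makes the check work. A late callback carries the generation of the adapter that created it, not the current one.
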
### DataConnectionLayer-012 — `AutoAcceptUntrustedCerts` defaults to `true`, accepting any server certificate

| | |
|--|--|
| Severity | Medium (a fully secure default also requires a Commons and design-doc change outside this module) |
| Category | Security |
| Status | Resolved |
| Location | `src/ScadaLink.DataConnectionLayer/Adapters/IOpcUaClient.cs:17`, `src/ScadaLink.DataConnectionLayer/Adapters/RealOpcUaClient.cs:49,60-61`, `docs/requirements/Component-DataConnectionLayer.md:116` |

**Description**

@@ -490,6 +569,13 @@ UA link. The design doc explicitly lists `true` as the default. For an industria
control link this is a meaningful exposure; a secure-by-default posture would reject
untrusted certs unless an operator opts in per connection.

**Verification note**: Confirmed against source. Note that the *authoritative* runtime
default does not actually live on `OpcUaConnectionOptions`. For a real connection,
`OpcUaDataConnection.ConnectAsync` builds `OpcUaConnectionOptions` from
`OpcUaEndpointConfig` (in `ScadaLink.Commons`), whose `AutoAcceptUntrustedCerts`
property also defaults to `true`. The default on `OpcUaConnectionOptions` itself is only
the fallback used when an `OpcUaConnectionOptions` is constructed directly.

**Recommendation**

Default `AutoAcceptUntrustedCerts` to `false` and require explicit per-connection

@@ -498,7 +584,21 @@ installed. Update the design doc to reflect the secure default.

**Resolution**

Resolved 2026-05-16 (commit pending). The two in-scope parts of the recommendation
are done:

1. `OpcUaConnectionOptions.AutoAcceptUntrustedCerts` now defaults to `false`.
2. `RealOpcUaClient.ConnectAsync` logs a prominent `ILogger` warning whenever the
   auto-accept certificate validator is installed. An `ILogger<RealOpcUaClient>` was
   added as an optional constructor parameter, defaulting to `NullLogger`, so existing
   callers are unaffected.

Regression test `DCL012_OpcUaConnectionOptions_AutoAcceptUntrustedCerts_DefaultsToFalse`
guards the new secure default. **Two parts remain outside this module's editable scope
and are surfaced as action required:**

- (a) `ScadaLink.Commons.Types.DataConnections.OpcUaEndpointConfig.AutoAcceptUntrustedCerts`
  still defaults to `true`. That is the value actually used for a real connection (see
  the verification note above), so the Commons default must also be flipped to `false`
  for the system to be secure by default.
- (b) `docs/requirements/Component-DataConnectionLayer.md` line 116 still documents
  `true` as the default and must be updated.

Both edits were deliberately not made here because this task is scoped to
`src/ScadaLink.DataConnectionLayer`, tests, and this findings file only.

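A small Python model of the two option types shows why the Commons default still matters. All names are hypothetical snake_case stand-ins:

- `OpcUaEndpointConfig` and `OpcUaConnectionOptions` for the two option types.
- `build_options` for the option-building step in `OpcUaDataConnection.ConnectAsync`.
- `configure_certificate_validation` for the check that installs the validator and logs the warning in `RealOpcUaClient.ConnectAsync`.

The warning text is illustrative, not the actual log message.

```python
import logging
from dataclasses import dataclass

logger = logging.getLogger("RealOpcUaClient")


@dataclass
class OpcUaEndpointConfig:
    # Models the Commons type, whose default is still True pending the Commons change.
    auto_accept_untrusted_certs: bool = True


@dataclass
class OpcUaConnectionOptions:
    # Models the DCL-012 fix: the adapter-level default is now False.
    auto_accept_untrusted_certs: bool = False


def build_options(endpoint: OpcUaEndpointConfig) -> OpcUaConnectionOptions:
    # For a real connection the endpoint value is copied across, so its default wins.
    return OpcUaConnectionOptions(auto_accept_untrusted_certs=endpoint.auto_accept_untrusted_certs)


def configure_certificate_validation(options: OpcUaConnectionOptions) -> bool:
    """Return True when the auto-accept validator would be installed."""
    if options.auto_accept_untrusted_certs:
        # Illustrative wording; the real warning text is not reproduced here.
        logger.warning("Auto-accepting untrusted OPC UA server certificates")
    return options.auto_accept_untrusted_certs


logging.basicConfig(level=logging.WARNING)
print(OpcUaConnectionOptions().auto_accept_untrusted_certs)  # False
effective = build_options(OpcUaEndpointConfig())
print(effective.auto_accept_untrusted_certs)  # True: the Commons default still wins
configure_certificate_validation(effective)  # logs the warning
```

Once the `OpcUaEndpointConfig` default is also flipped, `build_options(OpcUaEndpointConfig())` yields `False`. No warning is then logged unless an operator opts in for that connection.
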
### DataConnectionLayer-013 — Misleading XML comment: `RaiseDisconnected` claims thread safety it does not provide