diff --git a/code-reviews/DataConnectionLayer/findings.md b/code-reviews/DataConnectionLayer/findings.md index 385535f..2b921f8 100644 --- a/code-reviews/DataConnectionLayer/findings.md +++ b/code-reviews/DataConnectionLayer/findings.md @@ -8,7 +8,7 @@ | Last reviewed | 2026-05-17 | | Reviewer | claude-agent | | Commit reviewed | `39d737e` | -| Open findings | 4 | +| Open findings | 0 | ## Summary @@ -688,7 +688,7 @@ guard), and it passes against the atomic fix. |--|--| | Severity | High | | Category | Security | -| Status | Open | +| Status | Resolved | | Location | `src/ScadaLink.DataConnectionLayer/Adapters/RealOpcUaClient.cs:325`, `src/ScadaLink.DataConnectionLayer/Adapters/RealOpcUaClient.cs:35-39,79-83` | **Description** @@ -724,7 +724,18 @@ the default is `false`. **Resolution** -_Unresolved._ +Resolved 2026-05-17 (commit pending). `RealOpcUaClientFactory` gained an +`ILoggerFactory` constructor parameter and `Create()` now passes +`_loggerFactory.CreateLogger()` into every `RealOpcUaClient` it builds; +`DataConnectionFactory` (which already holds an `ILoggerFactory`) now constructs +`RealOpcUaClientFactory(globalOptions, _loggerFactory)`, so the DCL-012 auto-accept-cert +warning reaches a real logger in production instead of being discarded by `NullLogger`. +The parameterless / single-arg factory constructors still default to `NullLoggerFactory` +so existing callers are unaffected. Regression tests +`DCL014_RealOpcUaClientFactory_CreatesClientWithRealLogger` and +`DCL014_DataConnectionFactory_ThreadsLoggerToRealOpcUaClient` fail against the pre-fix +code (the first fails to compile — no logger ctor; the second observes a `NullLogger`) +and pass after. ### DataConnectionLayer-015 — Initial-connect failures never trigger failover; an unreachable primary at startup never tries the backup @@ -732,7 +743,7 @@ _Unresolved._ |--|--| | Severity | Medium | | Category | Error handling & resilience | -| Status | Open | +| Status | Resolved | | Location | `src/ScadaLink.DataConnectionLayer/Actors/DataConnectionActor.cs:404-417`, `src/ScadaLink.DataConnectionLayer/Actors/DataConnectionActor.cs:419-493` | **Description** @@ -769,7 +780,16 @@ backup". **Resolution** -_Unresolved._ +Resolved 2026-05-17 (commit pending). The endpoint-switch failover logic was extracted +from `HandleReconnectResult` into a shared `CountFailureAndMaybeFailover` helper, and +`HandleConnectResult` (the initial-connect handler in the `Connecting` state) now +increments `_consecutiveFailures` and calls that helper on failure — so a primary that +is unreachable at startup fails over to the configured backup after +`FailoverRetryCount` attempts instead of retrying the primary forever. Single-endpoint +connections (no backup) still retry indefinitely. Regression test +`DCL015_PrimaryDownAtStartup_FailsOverToBackup` fails against the pre-fix code (the +backup adapter is never created) and passes after; `DCL015_SingleEndpointDownAtStartup_RetriesIndefinitely_NoFailover` +guards the no-backup path. ### DataConnectionLayer-016 — `HandleSubscribeCompleted` reports `SubscribeTagsResponse` success even on a connection-level subscribe failure @@ -777,7 +797,7 @@ _Unresolved._ |--|--| | Severity | Medium | | Category | Correctness & logic bugs | -| Status | Open | +| Status | Resolved | | Location | `src/ScadaLink.DataConnectionLayer/Actors/DataConnectionActor.cs:606,666-672`, `src/ScadaLink.DataConnectionLayer/Actors/DataConnectionActor.cs:232-240` | **Description** @@ -814,7 +834,16 @@ Instance Actor can reflect partial success. Add a test asserting the response is **Resolution** -_Unresolved._ +Resolved 2026-05-17 (commit pending). `HandleSubscribeCompleted` now replies +`SubscribeTagsResponse(success: false, error: "connection unavailable — will +re-subscribe on reconnect")` when `connectionLevelFailure` is true — matching the +actor's own assessment as it drives into `Reconnecting` — instead of unconditionally +replying `success: true`. Genuine tag-resolution failures still reply `success: true` +(they are a runtime quality concern tracked via `_unresolvedTags`, with a `Bad`-quality +`TagValueUpdate` already pushed), so that case is not regressed. Regression test +`DCL016_ConnectionLevelSubscribeFailure_RepliesWithUnsuccessfulResponse` fails against +the pre-fix code (it returned `Success: true`) and passes after; +`DCL016_GenuineResolutionFailure_StillRepliesSuccess` guards the resolution-failure path. ### DataConnectionLayer-017 — `WriteBatchAsync` aborts the whole batch on a mid-batch disconnect @@ -822,7 +851,7 @@ _Unresolved._ |--|--| | Severity | Low | | Category | Error handling & resilience | -| Status | Open | +| Status | Resolved | | Location | `src/ScadaLink.DataConnectionLayer/Adapters/OpcUaDataConnection.cs:229-237`, `src/ScadaLink.DataConnectionLayer/Adapters/OpcUaDataConnection.cs:218-227` | **Description** @@ -856,4 +885,14 @@ complete, consistent result map. **Resolution** -_Unresolved._ +Resolved 2026-05-17 (commit pending). `WriteBatchAsync` now wraps each per-tag +`WriteAsync` call in a try/catch — mirroring the DCL-007 fix for `ReadBatchAsync` — so a +mid-batch fault (the `InvalidOperationException` from `EnsureConnected` on a dropped +connection) is recorded as a failed `WriteResult(false, ex.Message)` for that tag and +the batch returns a complete result map; `OperationCanceledException` is still +propagated so a cancelled batch aborts as a whole. Callers (including +`WriteBatchAndWaitAsync`) now get a consistent per-tag result shape instead of an +unhandled exception. Regression test +`DCL017_WriteBatch_ReturnsPerTagResults_WhenConnectionDropsMidBatch` fails against the +pre-fix code (the batch throws, no map returned) and passes after; +`DCL017_WriteBatch_CancellationAbortsWholeBatch` guards that cancellation still aborts. diff --git a/src/ScadaLink.DataConnectionLayer/Actors/DataConnectionActor.cs b/src/ScadaLink.DataConnectionLayer/Actors/DataConnectionActor.cs index a08a16b..a6d8d0b 100644 --- a/src/ScadaLink.DataConnectionLayer/Actors/DataConnectionActor.cs +++ b/src/ScadaLink.DataConnectionLayer/Actors/DataConnectionActor.cs @@ -410,8 +410,14 @@ public class DataConnectionActor : UntypedActor, IWithStash, IWithTimers } else { - _log.Warning("[{0}] Connection failed: {1}. Retrying in {2}s", - _connectionName, result.Error, _options.ReconnectInterval.TotalSeconds); + // DataConnectionLayer-015: the INITIAL connect must participate in the + // failover counter exactly like a reconnect. Without this a primary that is + // unreachable when the actor first starts (fresh deployment, site restart, or + // a primary simply down) is retried forever and the configured backup is + // never tried. Count the failure and switch endpoint once the retry count is + // exhausted, then re-arm the timer. + _consecutiveFailures++; + CountFailureAndMaybeFailover(result.Error); Timers.StartSingleTimer("reconnect", new AttemptConnect(), _options.ReconnectInterval); } } @@ -439,59 +445,69 @@ public class DataConnectionActor : UntypedActor, IWithStash, IWithTimers else { _consecutiveFailures++; - - // Failover: switch endpoint after exhausting retry count (only if backup is configured) - if (_backupConfig != null && _consecutiveFailures >= _failoverRetryCount) - { - var previousEndpoint = _activeEndpoint; - _activeEndpoint = _activeEndpoint == ActiveEndpoint.Primary - ? ActiveEndpoint.Backup - : ActiveEndpoint.Primary; - _consecutiveFailures = 0; - - var newConfig = _activeEndpoint == ActiveEndpoint.Primary - ? _primaryConfig - : _backupConfig; - - // Dispose old adapter (fire-and-forget — don't await in actor context) - _adapter.Disconnected -= OnAdapterDisconnected; - _ = _adapter.DisposeAsync().AsTask(); - - // Create new adapter for the target endpoint - _adapter = _factory.Create(_protocolType, newConfig); - _connectionDetails = newConfig; - - // Wire disconnect handler on new adapter - _adapter.Disconnected += OnAdapterDisconnected; - - // DataConnectionLayer-011: new adapter — bump the generation so callbacks - // from the disposed adapter are recognised as stale and dropped. - _adapterGeneration++; - - _log.Warning("[{0}] Failing over from {1} to {2}", - _connectionName, previousEndpoint, _activeEndpoint); - - // Log failover event to site event log - if (_siteEventLogger != null) - { - _ = _siteEventLogger.LogEventAsync( - "connection", "Warning", null, _connectionName, - $"Failover from {previousEndpoint} to {_activeEndpoint}", - $"After {_failoverRetryCount} consecutive failures"); - } - } - else - { - var retryLimit = _backupConfig != null ? _failoverRetryCount.ToString() : "∞"; - _log.Warning("[{0}] Reconnect failed: {1}. Retrying in {2}s (attempt {3}/{4})", - _connectionName, result.Error, _options.ReconnectInterval.TotalSeconds, - _consecutiveFailures, retryLimit); - } - + CountFailureAndMaybeFailover(result.Error); Timers.StartSingleTimer("reconnect", new AttemptConnect(), _options.ReconnectInterval); } } + /// + /// Shared connect-failure handling for both the initial connect (Connecting state) + /// and reconnect (Reconnecting state). Assumes has + /// already been incremented for the current failure. Switches to the other endpoint + /// once the retry count is exhausted and a backup is configured + /// (DataConnectionLayer-015 brought the initial connect onto this path). + /// + private void CountFailureAndMaybeFailover(string? error) + { + // Failover: switch endpoint after exhausting retry count (only if backup is configured) + if (_backupConfig != null && _consecutiveFailures >= _failoverRetryCount) + { + var previousEndpoint = _activeEndpoint; + _activeEndpoint = _activeEndpoint == ActiveEndpoint.Primary + ? ActiveEndpoint.Backup + : ActiveEndpoint.Primary; + _consecutiveFailures = 0; + + var newConfig = _activeEndpoint == ActiveEndpoint.Primary + ? _primaryConfig + : _backupConfig; + + // Dispose old adapter (fire-and-forget — don't await in actor context) + _adapter.Disconnected -= OnAdapterDisconnected; + _ = _adapter.DisposeAsync().AsTask(); + + // Create new adapter for the target endpoint + _adapter = _factory.Create(_protocolType, newConfig); + _connectionDetails = newConfig; + + // Wire disconnect handler on new adapter + _adapter.Disconnected += OnAdapterDisconnected; + + // DataConnectionLayer-011: new adapter — bump the generation so callbacks + // from the disposed adapter are recognised as stale and dropped. + _adapterGeneration++; + + _log.Warning("[{0}] Failing over from {1} to {2}", + _connectionName, previousEndpoint, _activeEndpoint); + + // Log failover event to site event log + if (_siteEventLogger != null) + { + _ = _siteEventLogger.LogEventAsync( + "connection", "Warning", null, _connectionName, + $"Failover from {previousEndpoint} to {_activeEndpoint}", + $"After {_failoverRetryCount} consecutive failures"); + } + } + else + { + var retryLimit = _backupConfig != null ? _failoverRetryCount.ToString() : "∞"; + _log.Warning("[{0}] Connect failed: {1}. Retrying in {2}s (attempt {3}/{4})", + _connectionName, error, _options.ReconnectInterval.TotalSeconds, + _consecutiveFailures, retryLimit); + } + } + private void HandleDisconnect() { _log.Warning("[{0}] AdapterDisconnected message received — transitioning to Reconnecting", _connectionName); @@ -663,8 +679,18 @@ public class DataConnectionActor : UntypedActor, IWithStash, IWithTimers _options.TagResolutionRetryInterval); } - msg.ReplyTo.Tell(new SubscribeTagsResponse( - msg.Request.CorrelationId, instanceName, true, null, DateTimeOffset.UtcNow)); + // DataConnectionLayer-016: the response must match the actor's own assessment. + // When a connection-level failure is driving the actor into Reconnecting, the + // tags were never subscribed at the adapter — replying Success: true would tell + // the Instance Actor the subscribe succeeded when it did not. Genuine + // tag-resolution failures stay Success: true (they are a runtime quality concern + // tracked via _unresolvedTags, with a Bad-quality TagValueUpdate already pushed). + msg.ReplyTo.Tell(connectionLevelFailure + ? new SubscribeTagsResponse( + msg.Request.CorrelationId, instanceName, false, + "connection unavailable — will re-subscribe on reconnect", DateTimeOffset.UtcNow) + : new SubscribeTagsResponse( + msg.Request.CorrelationId, instanceName, true, null, DateTimeOffset.UtcNow)); // The caller (Connected state only) decides whether to enter Reconnecting. // In Connecting/Reconnecting the connection is not established anyway, so the diff --git a/src/ScadaLink.DataConnectionLayer/Adapters/OpcUaDataConnection.cs b/src/ScadaLink.DataConnectionLayer/Adapters/OpcUaDataConnection.cs index e9c0614..f14ca95 100644 --- a/src/ScadaLink.DataConnectionLayer/Adapters/OpcUaDataConnection.cs +++ b/src/ScadaLink.DataConnectionLayer/Adapters/OpcUaDataConnection.cs @@ -228,10 +228,28 @@ public class OpcUaDataConnection : IDataConnection public async Task> WriteBatchAsync(IDictionary values, CancellationToken cancellationToken = default) { + // DataConnectionLayer-017: a mid-batch fault must not abort the whole batch. + // WriteAsync calls EnsureConnected(), which throws InvalidOperationException when + // the connection drops partway through; catch per-tag exceptions and record a + // failed WriteResult so the caller (including WriteBatchAndWaitAsync) receives a + // complete result map. OperationCanceledException is still propagated so a + // cancelled batch aborts as a whole — mirrors the DCL-007 fix for ReadBatchAsync. var results = new Dictionary(); foreach (var (tagPath, value) in values) { - results[tagPath] = await WriteAsync(tagPath, value, cancellationToken); + try + { + results[tagPath] = await WriteAsync(tagPath, value, cancellationToken); + } + catch (OperationCanceledException) + { + // Cancellation aborts the whole batch — propagate it. + throw; + } + catch (Exception ex) + { + results[tagPath] = new WriteResult(false, ex.Message); + } } return results; } diff --git a/src/ScadaLink.DataConnectionLayer/Adapters/RealOpcUaClient.cs b/src/ScadaLink.DataConnectionLayer/Adapters/RealOpcUaClient.cs index b949ffb..0f16307 100644 --- a/src/ScadaLink.DataConnectionLayer/Adapters/RealOpcUaClient.cs +++ b/src/ScadaLink.DataConnectionLayer/Adapters/RealOpcUaClient.cs @@ -316,11 +316,24 @@ public class RealOpcUaClientFactory : IOpcUaClientFactory { private readonly OpcUaGlobalOptions _globalOptions; + // DataConnectionLayer-014: a real logger must be threaded through to every + // RealOpcUaClient this factory builds, otherwise the DCL-012 auto-accept-certificate + // warning emitted in RealOpcUaClient.ConnectAsync sinks into NullLogger and is never + // seen in production. The factory is constructed by DataConnectionFactory, which has + // an ILoggerFactory available. + private readonly ILoggerFactory _loggerFactory; + public RealOpcUaClientFactory() : this(new OpcUaGlobalOptions()) { } + public RealOpcUaClientFactory(OpcUaGlobalOptions globalOptions) + : this(globalOptions, NullLoggerFactory.Instance) { } + + public RealOpcUaClientFactory(OpcUaGlobalOptions globalOptions, ILoggerFactory loggerFactory) { _globalOptions = globalOptions; + _loggerFactory = loggerFactory; } - public IOpcUaClient Create() => new RealOpcUaClient(_globalOptions); + public IOpcUaClient Create() => + new RealOpcUaClient(_globalOptions, _loggerFactory.CreateLogger()); } diff --git a/src/ScadaLink.DataConnectionLayer/DataConnectionFactory.cs b/src/ScadaLink.DataConnectionLayer/DataConnectionFactory.cs index 82a6ef2..40bc330 100644 --- a/src/ScadaLink.DataConnectionLayer/DataConnectionFactory.cs +++ b/src/ScadaLink.DataConnectionLayer/DataConnectionFactory.cs @@ -22,9 +22,12 @@ public class DataConnectionFactory : IDataConnectionFactory _loggerFactory = loggerFactory; var globalOptions = opcUaGlobalOptions.Value; - // Register built-in protocols + // Register built-in protocols. + // DataConnectionLayer-014: pass the ILoggerFactory into RealOpcUaClientFactory so + // the RealOpcUaClient it builds gets a real logger — without it the DCL-012 + // auto-accept-certificate security warning is silently discarded by NullLogger. RegisterAdapter("OpcUa", details => new OpcUaDataConnection( - new RealOpcUaClientFactory(globalOptions), + new RealOpcUaClientFactory(globalOptions, _loggerFactory), _loggerFactory.CreateLogger())); } diff --git a/tests/ScadaLink.DataConnectionLayer.Tests/DataConnectionActorTests.cs b/tests/ScadaLink.DataConnectionLayer.Tests/DataConnectionActorTests.cs index a1e7820..ba8312a 100644 --- a/tests/ScadaLink.DataConnectionLayer.Tests/DataConnectionActorTests.cs +++ b/tests/ScadaLink.DataConnectionLayer.Tests/DataConnectionActorTests.cs @@ -803,6 +803,132 @@ public class DataConnectionActorTests : TestKit ExpectNoMsg(TimeSpan.FromSeconds(1)); } + // ── DataConnectionLayer-015: initial-connect failures must trigger failover ── + + [Fact] + public async Task DCL015_PrimaryDownAtStartup_FailsOverToBackup() + { + // Regression test for DataConnectionLayer-015. HandleConnectResult — the handler + // for the INITIAL connection attempt in the Connecting state — only logged and + // re-armed the reconnect timer. It never incremented _consecutiveFailures and + // never switched endpoint, so a primary that is unreachable when the actor first + // starts (a fresh deployment, a site restart, a primary simply down) retried the + // primary forever and never tried the configured backup. After the fix the + // initial connect participates in the failover counter like HandleReconnectResult. + var primaryConfig = new Dictionary { ["Endpoint"] = "opc.tcp://primary:4840" }; + var backupConfig = new Dictionary { ["Endpoint"] = "opc.tcp://backup:4840" }; + var primaryAdapter = Substitute.For(); + var backupAdapter = Substitute.For(); + + // Primary is down from the very first attempt — it never connects. + primaryAdapter.ConnectAsync(Arg.Any>(), Arg.Any()) + .Returns(Task.FromException(new Exception("Connection refused"))); + + // Factory returns the backup adapter when called with the backup config. + _mockFactory.Create("OpcUa", Arg.Is>(d => d["Endpoint"] == "opc.tcp://backup:4840")) + .Returns(backupAdapter); + backupAdapter.ConnectAsync(Arg.Any>(), Arg.Any()) + .Returns(Task.CompletedTask); + + var actor = CreateFailoverActor(primaryAdapter, "dcl015-startup-failover", + primaryConfig, backupConfig, failoverRetryCount: 2); + + // After failoverRetryCount initial-connect failures on the primary, the actor + // must build the backup adapter. Pre-fix the factory was never called. + AwaitCondition(() => + _mockFactory.ReceivedCalls().Any(c => + c.GetMethodInfo().Name == "Create" && + c.GetArguments()[1] is IDictionary d && + d["Endpoint"] == "opc.tcp://backup:4840"), + TimeSpan.FromSeconds(5)); + } + + [Fact] + public async Task DCL015_SingleEndpointDownAtStartup_RetriesIndefinitely_NoFailover() + { + // Companion guard: a single-endpoint connection (no backup) whose primary is + // unreachable at startup must keep retrying the same endpoint indefinitely — the + // initial-connect failover counter must not synthesise a non-existent backup. + var primaryConfig = new Dictionary { ["Endpoint"] = "opc.tcp://primary:4840" }; + var primaryAdapter = Substitute.For(); + var connectCount = 0; + primaryAdapter.ConnectAsync(Arg.Any>(), Arg.Any()) + .Returns(_ => + { + Interlocked.Increment(ref connectCount); + return Task.FromException(new Exception("Connection refused")); + }); + + var actor = CreateFailoverActor(primaryAdapter, "dcl015-no-backup", + primaryConfig, backupConfig: null, failoverRetryCount: 2); + + // Many retries occur (well past the failover threshold) but no adapter is ever + // created via the factory — there is nothing to fail over to. + AwaitCondition(() => connectCount >= 6, TimeSpan.FromSeconds(10)); + _mockFactory.DidNotReceive().Create(Arg.Any(), Arg.Any>()); + } + + // ── DataConnectionLayer-016: subscribe response must reflect a connection-level failure ── + + [Fact] + public async Task DCL016_ConnectionLevelSubscribeFailure_RepliesWithUnsuccessfulResponse() + { + // Regression test for DataConnectionLayer-016. When a subscribe arrives while the + // adapter is silently down, HandleSubscribeCompleted drove the actor into + // Reconnecting (a connection-level failure) but still replied to the caller with + // SubscribeTagsResponse(Success: true, Error: null). The Instance Actor was told + // the subscribe succeeded while the tags were never actually subscribed at the + // adapter. After the fix the response matches the actor's own assessment: + // Success: false with an explanatory error. + _mockAdapter.ConnectAsync(Arg.Any>(), Arg.Any()) + .Returns(Task.CompletedTask); + _mockAdapter.Status.Returns(ConnectionHealth.Connected); + // Subscribe fails at connection level (InvalidOperationException from EnsureConnected). + _mockAdapter.SubscribeAsync(Arg.Any(), Arg.Any(), Arg.Any()) + .Returns(Task.FromException( + new InvalidOperationException("OPC UA client is not connected."))); + _mockAdapter.ReadAsync(Arg.Any(), Arg.Any()) + .Returns(new ReadResult(false, null, null)); + + var actor = CreateConnectionActor("dcl016-conn-fail"); + await Task.Delay(300); + + actor.Tell(new SubscribeTagsRequest( + "c1", "inst1", "dcl016-conn-fail", ["some/tag"], DateTimeOffset.UtcNow)); + + // The response must NOT claim success — the connection-level failure that drove + // Reconnecting means the tags were never subscribed at the adapter. + var response = ExpectMsg(TimeSpan.FromSeconds(5)); + Assert.False(response.Success); + Assert.NotNull(response.ErrorMessage); + } + + [Fact] + public async Task DCL016_GenuineResolutionFailure_StillRepliesSuccess() + { + // Companion guard: a genuine tag-resolution failure (the node does not exist) is + // a runtime quality concern, not a connection-level fault — the design tracks it + // via _unresolvedTags and a Bad-quality TagValueUpdate. The overall subscribe + // response stays Success: true so this case is not regressed by the 016 fix. + _mockAdapter.ConnectAsync(Arg.Any>(), Arg.Any()) + .Returns(Task.CompletedTask); + _mockAdapter.Status.Returns(ConnectionHealth.Connected); + _mockAdapter.SubscribeAsync("missing/tag", Arg.Any(), Arg.Any()) + .Returns(Task.FromException(new KeyNotFoundException("node not found"))); + _mockAdapter.ReadAsync(Arg.Any(), Arg.Any()) + .Returns(new ReadResult(false, null, null)); + + var actor = CreateConnectionActor("dcl016-genuine"); + await Task.Delay(300); + + actor.Tell(new SubscribeTagsRequest( + "c1", "inst1", "dcl016-genuine", ["missing/tag"], DateTimeOffset.UtcNow)); + + var ack = FishForMessage(_ => true, TimeSpan.FromSeconds(5)); + Assert.True(ack.Success); + Assert.Null(ack.ErrorMessage); + } + [Fact] public async Task DCL001_SubscribeWithFailedTags_CountsResolvedAndUnresolvedSeparately() { diff --git a/tests/ScadaLink.DataConnectionLayer.Tests/OpcUaDataConnectionTests.cs b/tests/ScadaLink.DataConnectionLayer.Tests/OpcUaDataConnectionTests.cs index f36f388..d93e447 100644 --- a/tests/ScadaLink.DataConnectionLayer.Tests/OpcUaDataConnectionTests.cs +++ b/tests/ScadaLink.DataConnectionLayer.Tests/OpcUaDataConnectionTests.cs @@ -1,7 +1,9 @@ +using Microsoft.Extensions.Logging; using Microsoft.Extensions.Logging.Abstractions; using NSubstitute; using ScadaLink.Commons.Interfaces.Protocol; using ScadaLink.Commons.Types.Enums; +using ScadaLink.DataConnectionLayer; using ScadaLink.DataConnectionLayer.Adapters; namespace ScadaLink.DataConnectionLayer.Tests; @@ -54,6 +56,63 @@ public class OpcUaCertificateDefaultTests } } +/// +/// DataConnectionLayer-014: the DCL-012 auto-accept-certificate security warning is +/// only effective if RealOpcUaClient is built with a real logger. The only production +/// path that constructs one is RealOpcUaClientFactory.Create(); that factory must +/// thread a logger through, otherwise the warning sinks into NullLogger and an +/// operator who enables AutoAcceptUntrustedCerts sees no signal anywhere. +/// +public class RealOpcUaClientFactoryLoggerTests +{ + private static ILogger ReadLogger(RealOpcUaClient client) + { + var field = typeof(RealOpcUaClient).GetField("_logger", + System.Reflection.BindingFlags.Instance | System.Reflection.BindingFlags.NonPublic); + Assert.NotNull(field); + return (ILogger)field!.GetValue(client)!; + } + + [Fact] + public void DCL014_RealOpcUaClientFactory_CreatesClientWithRealLogger() + { + // Regression test for DataConnectionLayer-014. RealOpcUaClientFactory.Create() + // constructed `new RealOpcUaClient(_globalOptions)` with no logger, so the + // DCL-012 man-in-the-middle warning was always discarded by NullLogger in + // production. After the fix the factory accepts an ILoggerFactory and passes a + // real ILogger into every client it creates. + using var loggerFactory = LoggerFactory.Create(b => { }); + var factory = new RealOpcUaClientFactory(new OpcUaGlobalOptions(), loggerFactory); + + var client = factory.Create(); + + var logger = ReadLogger((RealOpcUaClient)client); + Assert.NotSame(NullLogger.Instance, logger); + } + + [Fact] + public void DCL014_DataConnectionFactory_ThreadsLoggerToRealOpcUaClient() + { + // The full production wiring: DataConnectionFactory holds an ILoggerFactory and + // registers the OpcUa adapter. The RealOpcUaClient it ultimately builds must end + // up with a real (non-Null) logger so the auto-accept-cert warning is visible. + using var loggerFactory = LoggerFactory.Create(b => { }); + var dataConnectionFactory = new DataConnectionFactory(loggerFactory); + var adapter = (OpcUaDataConnection)dataConnectionFactory.Create( + "OpcUa", new Dictionary()); + + // Reach the RealOpcUaClient the adapter would create on connect. + var clientFactoryField = typeof(OpcUaDataConnection).GetField("_clientFactory", + System.Reflection.BindingFlags.Instance | System.Reflection.BindingFlags.NonPublic); + Assert.NotNull(clientFactoryField); + var clientFactory = (RealOpcUaClientFactory)clientFactoryField!.GetValue(adapter)!; + var client = (RealOpcUaClient)clientFactory.Create(); + + var logger = ReadLogger(client); + Assert.NotSame(NullLogger.Instance, logger); + } +} + /// /// DataConnectionLayer-009: failover-stability tunables must be configurable. /// @@ -269,6 +328,64 @@ public class OpcUaDataConnectionTests Assert.NotNull(results["bad"].ErrorMessage); } + [Fact] + public async Task DCL017_WriteBatch_ReturnsPerTagResults_WhenConnectionDropsMidBatch() + { + // Regression test for DataConnectionLayer-017. WriteBatchAsync looped calling + // WriteAsync per tag; WriteAsync first calls EnsureConnected(), which throws + // InvalidOperationException when the client is disconnected. WriteBatchAsync did + // not catch that, so a connection dropping partway through a batch made the whole + // WriteBatchAsync throw — the caller lost the per-tag outcomes for the tags that + // already wrote. After the fix (mirroring DCL-007's ReadBatchAsync) each per-tag + // failure is recorded as a failed WriteResult and the batch returns a complete map. + var writeCount = 0; + // First write succeeds; then the client "disconnects" so EnsureConnected throws. + _mockClient.IsConnected.Returns(_ => Interlocked.Increment(ref writeCount) <= 1); + _mockClient.WriteValueAsync(Arg.Any(), Arg.Any(), Arg.Any()) + .Returns((uint)0); + + // Connect leaves IsConnected true for the first WriteAsync's EnsureConnected check. + _mockClient.IsConnected.Returns(true); + await _adapter.ConnectAsync(new Dictionary()); + + // Re-arm: IsConnected true for tag1's check, false for tag2 and tag3. + var checks = 0; + _mockClient.IsConnected.Returns(_ => Interlocked.Increment(ref checks) <= 1); + + var results = await _adapter.WriteBatchAsync(new Dictionary + { + ["tag1"] = 1, + ["tag2"] = 2, + ["tag3"] = 3 + }); + + // Every requested tag is present in the result map — the batch was not aborted. + Assert.Equal(3, results.Count); + Assert.True(results["tag1"].Success); + // tag2 and tag3 fail at the connection check but are reported per-tag. + Assert.False(results["tag2"].Success); + Assert.NotNull(results["tag2"].ErrorMessage); + Assert.False(results["tag3"].Success); + Assert.NotNull(results["tag3"].ErrorMessage); + } + + [Fact] + public async Task DCL017_WriteBatch_CancellationAbortsWholeBatch() + { + // Companion guard: a cancelled batch must still abort as a whole — only + // connection/device faults are demoted to per-tag results, never cancellation. + _mockClient.IsConnected.Returns(true); + await _adapter.ConnectAsync(new Dictionary()); + + using var cts = new CancellationTokenSource(); + cts.Cancel(); + _mockClient.WriteValueAsync(Arg.Any(), Arg.Any(), Arg.Any()) + .Returns(_ => throw new OperationCanceledException()); + + await Assert.ThrowsAnyAsync(() => + _adapter.WriteBatchAsync(new Dictionary { ["tag1"] = 1 }, cts.Token)); + } + [Fact] public async Task NotConnected_ThrowsOnOperations() {