From 5ca1be328c2f122121b49062ec4b3f8896fb40d1 Mon Sep 17 00:00:00 2001
From: Joseph Doherty
Date: Sun, 22 Mar 2026 08:13:23 -0400
Subject: [PATCH] docs(dcl): add primary/backup data connections implementation plan

8 tasks with TDD steps, exact file paths, and code samples. Covers entity
model, failover state machine, health reporting, UI, CLI, management API,
deployment, and documentation.
---
 ...6-03-22-primary-backup-data-connections.md | 695 ++++++++++++++++++
 ...mary-backup-data-connections.md.tasks.json |  14 +
 2 files changed, 709 insertions(+)
 create mode 100644 docs/plans/2026-03-22-primary-backup-data-connections.md
 create mode 100644 docs/plans/2026-03-22-primary-backup-data-connections.md.tasks.json

diff --git a/docs/plans/2026-03-22-primary-backup-data-connections.md b/docs/plans/2026-03-22-primary-backup-data-connections.md
new file mode 100644
index 0000000..505626a
--- /dev/null
+++ b/docs/plans/2026-03-22-primary-backup-data-connections.md
@@ -0,0 +1,695 @@
+# Primary/Backup Data Connection Endpoints — Implementation Plan
+
+> **For Claude:** REQUIRED SUB-SKILL: Use superpowers-extended-cc:executing-plans to implement this plan task-by-task.
+
+**Goal:** Add optional backup endpoints to data connections, with automatic failover after a configurable number of retries.
+
+**Architecture:** The `DataConnectionActor` gains failover logic in its Reconnecting state: after N failed retries on the active endpoint, it disposes the adapter and creates a fresh one with the other endpoint's config. Adapters remain single-endpoint. The entity model splits `Configuration` into `PrimaryConfiguration` and `BackupConfiguration`.
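The Reconnecting-state rule above reduces to a small counter that can be reasoned about without Akka.NET. The sketch below is a hypothetical, framework-free model (`FailoverPolicy` and `EndpointRole` are illustrative names, not types in the codebase). It assumes a retry count of at least 1, and it mirrors the round-robin switch that Task 3 places inside `HandleReconnectResult`: without a backup the actor retries forever, and a successful connect resets the count without changing endpoints.

```csharp
using System;

// Hypothetical role enum; the plan's actor uses ActiveEndpoint { Primary, Backup }.
public enum EndpointRole { Primary, Backup }

// Hypothetical, framework-free model of the Reconnecting-state rule:
// count consecutive failures on the active endpoint and, once the count
// reaches the retry limit, switch to the other endpoint (round-robin).
public sealed class FailoverPolicy
{
    private readonly int _retryCount;
    private readonly bool _hasBackup;
    private int _consecutiveFailures;

    public FailoverPolicy(int retryCount, bool hasBackup)
    {
        // Assumption: a retry count below 1 is treated as a configuration error.
        if (retryCount < 1)
            throw new ArgumentOutOfRangeException(nameof(retryCount));
        _retryCount = retryCount;
        _hasBackup = hasBackup;
    }

    public EndpointRole Active { get; private set; } = EndpointRole.Primary;

    // Records one failed connect attempt; returns true if it triggered a switch.
    public bool RecordFailure()
    {
        _consecutiveFailures++;

        // No backup, or retry budget not yet exhausted: stay on this endpoint.
        if (!_hasBackup || _consecutiveFailures < _retryCount)
            return false;

        Active = Active == EndpointRole.Primary ? EndpointRole.Backup : EndpointRole.Primary;
        _consecutiveFailures = 0; // the new endpoint starts with a fresh budget
        return true;
    }

    // A successful connect resets the counter and keeps the current endpoint.
    public void RecordSuccess() => _consecutiveFailures = 0;
}
```

Keeping the rule in a plain class like this would let the round-robin and counter-reset cases be unit-tested without an actor system. Whether to extract it is a design choice; the plan keeps the logic inline in the actor.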
+
+**Tech Stack:** C# / .NET 10, Akka.NET, EF Core, Blazor Server, System.CommandLine
+
+**Design doc:** `docs/plans/2026-03-22-primary-backup-data-connections-design.md`
+
+---
+
+## Task 1: Entity Model & Database Migration
+
+**Files:**
+- Modify: `src/ScadaLink.Commons/Entities/Sites/DataConnection.cs`
+- Modify: `src/ScadaLink.ConfigurationDatabase/Configurations/SiteConfiguration.cs` (lines 32-56)
+- Modify: `src/ScadaLink.Commons/Messages/Artifacts/DataConnectionArtifact.cs`
+
+### Step 1: Update DataConnection entity
+
+In `DataConnection.cs`, rename `Configuration` to `PrimaryConfiguration`, and add `BackupConfiguration` and `FailoverRetryCount`:
+
+```csharp
+public class DataConnection
+{
+    public int Id { get; set; }
+    public int SiteId { get; set; }
+    public string Name { get; set; }
+    public string Protocol { get; set; }
+    public string? PrimaryConfiguration { get; set; }
+    public string? BackupConfiguration { get; set; } // null = no backup endpoint
+    public int FailoverRetryCount { get; set; } = 3; // failures before switching endpoints
+
+    public DataConnection(int siteId, string name, string protocol)
+    {
+        SiteId = siteId;
+        Name = name ?? throw new ArgumentNullException(nameof(name));
+        Protocol = protocol ?? throw new ArgumentNullException(nameof(protocol));
+    }
+}
+```
+
+### Step 2: Update EF Core mapping
+
+In `SiteConfiguration.cs`, update the DataConnection mapping (around lines 46-47):
+
+- Rename the `Configuration` property mapping to `PrimaryConfiguration` (MaxLength 4000)
+- Add a `BackupConfiguration` property (optional, MaxLength 4000)
+- Add a `FailoverRetryCount` property (required, default 3)
+
+```csharp
+builder.Property(d => d.PrimaryConfiguration).HasMaxLength(4000);
+builder.Property(d => d.BackupConfiguration).HasMaxLength(4000);
+builder.Property(d => d.FailoverRetryCount).IsRequired().HasDefaultValue(3);
+```
+
+### Step 3: Create EF Core migration
+
+Run:
+```bash
+cd src/ScadaLink.ConfigurationDatabase
+dotnet ef migrations add AddDataConnectionBackupEndpoint \
+  --startup-project ../ScadaLink.Host
+```
+
+Verify that the migration renames `Configuration` → `PrimaryConfiguration` with `RenameColumn` rather than a drop + add, which would discard existing configuration data. EF Core generally cannot tell a rename from a drop + add, so if the scaffolded migration drops and recreates the column, fix it manually:
+
+```csharp
+migrationBuilder.RenameColumn(
+    name: "Configuration",
+    table: "DataConnections",
+    newName: "PrimaryConfiguration");
+
+migrationBuilder.AddColumn<string>(
+    name: "BackupConfiguration",
+    table: "DataConnections",
+    maxLength: 4000,
+    nullable: true);
+
+migrationBuilder.AddColumn<int>(
+    name: "FailoverRetryCount",
+    table: "DataConnections",
+    nullable: false,
+    defaultValue: 3);
+```
+
+### Step 4: Update DataConnectionArtifact
+
+In `DataConnectionArtifact.cs`, replace the single `ConfigurationJson` field with primary and backup fields:
+
+```csharp
+public record DataConnectionArtifact(
+    string Name,
+    string Protocol,
+    string? PrimaryConfigurationJson,
+    string? BackupConfigurationJson,
+    int FailoverRetryCount = 3);
+```
+
+### Step 5: Build and fix compile errors
+
+Run: `dotnet build ScadaLink.slnx`
+
+The build errors will surface every reference to the old `Configuration` and `ConfigurationJson` members across the codebase. Fix each one, including:
+- ManagementActor handlers
+- CLI commands
+- UI pages
+- Deployment/flattening code
+- Tests
+
+Fix only the renames in this step (use `PrimaryConfiguration` where `Configuration` was). Don't add backup logic yet; the goal is just to make it compile.
+
+### Step 6: Run tests, fix failures
+
+Run: `dotnet test ScadaLink.slnx`
+
+Fix any test failures caused by the rename.
+
+### Step 7: Commit
+
+```bash
+git add -A
+git commit -m "feat(dcl): rename Configuration to PrimaryConfiguration, add BackupConfiguration and FailoverRetryCount"
+```
+
+---
+
+## Task 2: Update CreateConnectionCommand & Manager Actor
+
+**Files:**
+- Modify: `src/ScadaLink.Commons/Messages/DataConnection/CreateConnectionCommand.cs`
+- Modify: `src/ScadaLink.DataConnectionLayer/Actors/DataConnectionManagerActor.cs` (lines 39-62)
+
+### Step 1: Update CreateConnectionCommand message
+
+```csharp
+public record CreateConnectionCommand(
+    string ConnectionName,
+    string ProtocolType,
+    IDictionary<string, string> PrimaryConnectionDetails,
+    IDictionary<string, string>? BackupConnectionDetails = null, // null = no backup endpoint
+    int FailoverRetryCount = 3);
+```
+
+### Step 2: Update DataConnectionManagerActor.HandleCreateConnection
+
+Update the handler (around lines 39-62) to pass both configs to `DataConnectionActor`:
+
+```csharp
+private void HandleCreateConnection(CreateConnectionCommand command)
+{
+    if (_connectionActors.ContainsKey(command.ConnectionName))
+    {
+        _log.Warning("Connection {0} already exists", command.ConnectionName);
+        return;
+    }
+
+    // The initial adapter always targets the primary endpoint
+    var adapter = _factory.Create(command.ProtocolType, command.PrimaryConnectionDetails);
+
+    var props = Props.Create(() => new DataConnectionActor(
+        command.ConnectionName,
+        adapter,
+        _options,
+        _healthCollector,
+        command.ProtocolType,
+        command.PrimaryConnectionDetails,
+        command.BackupConnectionDetails,
+        command.FailoverRetryCount));
+
+    // Map characters outside the allowed set to '-' so the name is a valid actor path element
+    var actorName = new string(command.ConnectionName
+        .Select(c => char.IsLetterOrDigit(c) || "-_.*$+:@&=,!~';()".Contains(c) ? c : '-')
+        .ToArray());
+    var actorRef = Context.ActorOf(props, actorName);
+    _connectionActors[command.ConnectionName] = actorRef;
+
+    _log.Info("Created DataConnectionActor for {0} (protocol={1}, backup={2})",
+        command.ConnectionName, command.ProtocolType, command.BackupConnectionDetails != null ? "yes" : "no");
+}
+```
+
+Note: `DataConnectionActor` only gets a matching constructor in Task 3 (Step 2, with the factory argument added to this call in Step 4), so the build in Step 4 below will fail until then. Either run Tasks 2 and 3 together before building, or add a temporary constructor overload in this task.
+
+### Step 3: Update all callers of CreateConnectionCommand
+
+Search for every place that constructs `CreateConnectionCommand` and update it to use the new signature. The primary caller is the site-side deployment handler.
+
+### Step 4: Build and test
+
+Run: `dotnet build ScadaLink.slnx && dotnet test tests/ScadaLink.DataConnectionLayer.Tests`
+
+### Step 5: Commit
+
+```bash
+git add -A
+git commit -m "feat(dcl): extend CreateConnectionCommand with backup config and failover retry count"
+```
+
+---
+
+## Task 3: DataConnectionActor Failover State Machine
+
+**Files:**
+- Modify: `src/ScadaLink.DataConnectionLayer/Actors/DataConnectionActor.cs`
+- Modify: `src/ScadaLink.DataConnectionLayer/DataConnectionFactory.cs`
+
+This is the core change: the actor gains failover logic in its Reconnecting state.
+
+### Step 1: Add new state fields to DataConnectionActor
+
+Add these fields alongside the existing ones (around line 30):
+
+```csharp
+private readonly string _protocolType;
+private readonly IDictionary<string, string> _primaryConfig;
+private readonly IDictionary<string, string>? _backupConfig;
+private readonly int _failoverRetryCount;
+private readonly IDataConnectionFactory _factory;
+private ActiveEndpoint _activeEndpoint = ActiveEndpoint.Primary;
+private int _consecutiveFailures;
+
+// The existing _adapter and _connectionDetails fields are reassigned on failover,
+// so they must not be readonly.
+
+public enum ActiveEndpoint { Primary, Backup }
+```
+
+### Step 2: Update constructor
+
+Extend the constructor to accept both configs and the factory:
+
+```csharp
+public DataConnectionActor(
+    string connectionName,
+    IDataConnection adapter,
+    DataConnectionOptions options,
+    ISiteHealthCollector healthCollector,
+    IDataConnectionFactory factory,
+    string protocolType,
+    IDictionary<string, string> primaryConfig,
+    IDictionary<string, string>? backupConfig = null,
+    int failoverRetryCount = 3)
+{
+    _connectionName = connectionName;
+    _adapter = adapter;
+    _options = options;
+    _healthCollector = healthCollector;
+    _factory = factory; // used to create a fresh adapter on failover
+    _protocolType = protocolType;
+    _primaryConfig = primaryConfig;
+    _backupConfig = backupConfig;
+    _failoverRetryCount = failoverRetryCount;
+    _connectionDetails = primaryConfig; // start with primary
+}
+```
+
+Note: The factory is required to create new adapters on failover. `DataConnectionManagerActor` already holds an `IDataConnectionFactory`, so pass it through to the actor constructor (Step 4) rather than resolving it via DI.
+
+### Step 3: Extend HandleReconnectResult with failover logic
+
+Replace the reconnect failure handling (around lines 279-296) to include failover:
+
+```csharp
+private void HandleReconnectResult(ConnectResult result)
+{
+    if (result.Success)
+    {
+        _consecutiveFailures = 0;
+        _log.Info("Reconnected {0} on {1} endpoint", _connectionName, _activeEndpoint);
+        ReSubscribeAll();
+        BecomeConnected();
+        return;
+    }
+
+    _consecutiveFailures++;
+    _log.Warning("Reconnect attempt {0}/{1} failed for {2} on {3}: {4}",
+        _consecutiveFailures, _failoverRetryCount, _connectionName, _activeEndpoint, result.Error);
+
+    // Without a backup, keep retrying the same endpoint indefinitely
+    if (_consecutiveFailures >= _failoverRetryCount && _backupConfig != null)
+    {
+        // Switch to the other endpoint (round-robin) and reset the failure count
+        var previousEndpoint = _activeEndpoint;
+        _activeEndpoint = _activeEndpoint == ActiveEndpoint.Primary
+            ? ActiveEndpoint.Backup
+            : ActiveEndpoint.Primary;
+        _consecutiveFailures = 0;
+
+        var newConfig = _activeEndpoint == ActiveEndpoint.Primary ? _primaryConfig : _backupConfig;
+
+        _log.Warning("Failing over {0} from {1} to {2}", _connectionName, previousEndpoint, _activeEndpoint);
+
+        // Dispose the old adapter (fire-and-forget: disposal errors are not observed)
+        // and create a new one for the other endpoint
+        _ = _adapter.DisposeAsync();
+        _adapter = _factory.Create(_protocolType, newConfig);
+        _connectionDetails = newConfig;
+
+        // Wire up the disconnect handler on the new adapter
+        _adapter.Disconnected += () => _self.Tell(new AdapterDisconnected());
+    }
+
+    // Schedule the next retry
+    Context.System.Scheduler.ScheduleTellOnce(
+        _options.ReconnectInterval, Self, AttemptConnect.Instance, ActorRefs.NoSender);
+}
+```
+
+### Step 4: Pass IDataConnectionFactory to DataConnectionActor
+
+Update `DataConnectionManagerActor.HandleCreateConnection` to pass the factory:
+
+```csharp
+var props = Props.Create(() => new DataConnectionActor(
+    command.ConnectionName, adapter, _options, _healthCollector,
+    _factory, // pass factory for failover adapter creation
+    command.ProtocolType, command.PrimaryConnectionDetails,
+    command.BackupConnectionDetails, command.FailoverRetryCount));
+```
+
+The constructor from Step 2 stores it in `_factory`.
+
+### Step 5: Build and run existing tests
+
+Run: `dotnet build ScadaLink.slnx && dotnet test tests/ScadaLink.DataConnectionLayer.Tests`
+
+Existing tests must pass: they use single-endpoint configs, so failover is never triggered.
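The failover branch in Step 3 discards the `ValueTask` from `DisposeAsync()` and leaves the lambda attached to the old adapter, so a disconnect raised while that adapter is being torn down may still reach the actor as a stale `AdapterDisconnected`. One way to avoid this, sketched below with hypothetical names (`IEndpointAdapter`, `AdapterSlot`, `FakeAdapter`; in the plan the adapter type is `IDataConnection`), is to hold the handler as a single delegate instance so it can be detached before disposal, and to return the disposal task so the caller can observe failures.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical stand-in for the plan's single-endpoint adapter (IDataConnection).
public interface IEndpointAdapter : IAsyncDisposable
{
    event Action? Disconnected;
}

// Hypothetical helper that owns the active adapter and swaps it on failover.
public sealed class AdapterSlot
{
    private readonly Func<IDictionary<string, string>, IEndpointAdapter> _create;
    private readonly Action _onDisconnected; // one delegate instance, so it can be detached

    public AdapterSlot(
        IEndpointAdapter initial,
        Func<IDictionary<string, string>, IEndpointAdapter> create,
        Action onDisconnected)
    {
        _create = create;
        _onDisconnected = onDisconnected;
        Current = initial;
        Current.Disconnected += _onDisconnected;
    }

    public IEndpointAdapter Current { get; private set; }

    // Detaches the handler from the old adapter, creates and wires the new one,
    // then starts disposing the old adapter. The returned ValueTask lets the
    // caller observe disposal failures instead of discarding them.
    public ValueTask SwapAsync(IDictionary<string, string> newConfig)
    {
        var old = Current;
        old.Disconnected -= _onDisconnected;
        Current = _create(newConfig);
        Current.Disconnected += _onDisconnected;
        return old.DisposeAsync();
    }
}

// Minimal test double for exercising AdapterSlot.
public sealed class FakeAdapter : IEndpointAdapter
{
    public event Action? Disconnected;
    public bool Disposed { get; private set; }

    public void RaiseDisconnected() => Disconnected?.Invoke();

    public ValueTask DisposeAsync()
    {
        Disposed = true;
        return ValueTask.CompletedTask;
    }
}
```

In the actor, the handler would be built once (for example `() => _self.Tell(new AdapterDisconnected())`) and the returned task observed, for instance logged on failure, rather than discarded.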
+
+### Step 6: Commit
+
+```bash
+git add -A
+git commit -m "feat(dcl): add failover state machine to DataConnectionActor with round-robin endpoint switching"
+```
+
+---
+
+## Task 4: Failover Tests
+
+**Files:**
+- Modify: `tests/ScadaLink.DataConnectionLayer.Tests/DataConnectionActorTests.cs`
+
+### Step 1: Write test — failover after N retries
+
+```csharp
+[Fact]
+public async Task Reconnecting_AfterFailoverRetryCount_SwitchesToBackup()
+{
+    // Arrange: create actor with primary + backup, failoverRetryCount = 2
+    var primaryAdapter = Substitute.For<IDataConnection>();
+    var backupAdapter = Substitute.For<IDataConnection>();
+    var factory = Substitute.For<IDataConnectionFactory>();
+    factory.Create("OpcUa", Arg.Is<IDictionary<string, string>>(d => d["endpoint"] == "backup"))
+        .Returns(backupAdapter);
+
+    // Primary connects then disconnects
+    primaryAdapter.ConnectAsync(Arg.Any<IDictionary<string, string>>(), Arg.Any<CancellationToken>())
+        .Returns(Task.CompletedTask);
+    primaryAdapter.Status.Returns(ConnectionHealth.Connected);
+
+    var primaryConfig = new Dictionary<string, string> { ["endpoint"] = "primary" };
+    var backupConfig = new Dictionary<string, string> { ["endpoint"] = "backup" };
+
+    // Create actor, connect on primary
+    // ... (use test kit patterns from existing tests)
+    // Simulate disconnect, verify 2 failures, then factory.Create called with backup config
+}
+```
+
+### Step 2: Write test — single endpoint retries forever
+
+```csharp
+[Fact]
+public async Task Reconnecting_NoBackup_RetriesIndefinitely()
+{
+    // Arrange: create actor with primary only, no backup
+    // Simulate 10 reconnect failures
+    // Verify: factory.Create is never called; the actor keeps retrying the primary endpoint
+}
+```
+
+### Step 3: Write test — round-robin back to primary after backup fails
+
+```csharp
+[Fact]
+public async Task Reconnecting_BackupFails_SwitchesBackToPrimary()
+{
+    // Arrange: primary + backup, failoverRetryCount = 1
+    // Simulate: primary fails 1x → switch to backup → backup fails 1x → switch to primary
+    // Verify: round-robin pattern
+}
+```
+
+### Step 4: Write test — successful reconnect resets counter
+
+```csharp
+[Fact]
+public async Task Reconnecting_SuccessfulConnect_ResetsConsecutiveFailures()
+{
+    // Arrange: failoverRetryCount = 3
+    // Simulate: 2 failures on primary, then success
+    // Verify: no failover, counter reset
+}
+```
+
+### Step 5: Write test — ReSubscribeAll called after failover
+
+```csharp
+[Fact]
+public async Task Failover_ReSubscribesAllTagsOnNewAdapter()
+{
+    // Arrange: actor with subscriptions, then failover
+    // Verify: new adapter receives SubscribeAsync calls for all previously subscribed tags
+}
+```
+
+### Step 6: Run all tests
+
+Run: `dotnet test tests/ScadaLink.DataConnectionLayer.Tests -v normal`
+
+### Step 7: Commit
+
+```bash
+git add -A
+git commit -m "test(dcl): add failover state machine tests for DataConnectionActor"
+```
+
+---
+
+## Task 5: Health Reporting & Site Event Logging
+
+**Files:**
+- Modify: `src/ScadaLink.Commons/Messages/DataConnection/DataConnectionHealthReport.cs`
+- Modify: `src/ScadaLink.DataConnectionLayer/Actors/DataConnectionActor.cs` (ReplyWithHealthReport, HandleReconnectResult)
+
+### Step 1: Add ActiveEndpoint to health report
+
+```csharp
+public record DataConnectionHealthReport(
+    string ConnectionName,
+    ConnectionHealth Status,
+    int TotalSubscribedTags,
+    int ResolvedTags,
+    string ActiveEndpoint, // new positional parameter: update every existing constructor call
+    DateTimeOffset Timestamp);
+```
+
+### Step 2: Update ReplyWithHealthReport in DataConnectionActor
+
+Update the health report method (around line 516) to include the active endpoint:
+
+```csharp
+private void ReplyWithHealthReport()
+{
+    // Distinguish "primary only" from "primary with a backup configured"
+    var endpointLabel = _backupConfig == null
+        ? "Primary (no backup)"
+        : _activeEndpoint.ToString();
+
+    Sender.Tell(new DataConnectionHealthReport(
+        _connectionName, _adapter.Status,
+        _subscriptionsByInstance.Values.Sum(s => s.Count),
+        _resolvedTags,
+        endpointLabel,
+        DateTimeOffset.UtcNow));
+}
+```
+
+### Step 3: Add site event logging on failover
+
+In `HandleReconnectResult`, after switching endpoints (inside the failover branch, where `previousEndpoint` is in scope), log a site event:
+
+```csharp
+if (_siteEventLogger != null)
+{
+    _ = _siteEventLogger.LogEventAsync(
+        "connection", "Warning", null, _connectionName,
+        $"Failover from {previousEndpoint} to {_activeEndpoint}",
+        $"After {_failoverRetryCount} consecutive failures");
+}
+```
+
+Note: The actor needs an `ISiteEventLogger` injected. Add it as an optional constructor parameter.
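Step 3 above and Step 4 below together define when a site event is written: a Warning on every failover, and an Info on a successful reconnect only when the endpoint differs from the last one that connected. That rule can be kept out of the actor as a small tracker. The sketch below is hypothetical (`ConnectionEventTracker`, `SiteEvent`, and `EndpointRole` are illustrative names) and assumes the initial connection is always made on the primary endpoint.

```csharp
// Hypothetical role enum; the plan's actor uses ActiveEndpoint { Primary, Backup }.
public enum EndpointRole { Primary, Backup }

// Hypothetical event shape mirroring the LogEventAsync arguments in Steps 3 and 4.
public sealed record SiteEvent(string Category, string Severity, string Message, string? Detail);

// Hypothetical helper that decides which site events to emit.
public sealed class ConnectionEventTracker
{
    // Assumption: the initial connection is always made on the primary endpoint.
    private EndpointRole _lastConnected = EndpointRole.Primary;

    // Every failover is reported as a Warning.
    public SiteEvent OnFailover(EndpointRole from, EndpointRole to, int retryCount) =>
        new("connection", "Warning", $"Failover from {from} to {to}",
            $"After {retryCount} consecutive failures");

    // A successful reconnect is reported only when it lands on a different
    // endpoint than the last successful connection; otherwise returns null.
    public SiteEvent? OnReconnected(EndpointRole active)
    {
        if (active == _lastConnected)
            return null;
        _lastConnected = active;
        return new("connection", "Info", $"Connection restored on {active} endpoint", null);
    }
}
```

In `HandleReconnectResult`, a non-null `SiteEvent` would be forwarded to `_siteEventLogger.LogEventAsync` in the same argument order as the Step 3 snippet.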
+
+### Step 4: Add site event logging on successful reconnect after failover
+
+In the `HandleReconnectResult` success path, log a site event only if the endpoint differs from the last one that connected successfully:
+
+```csharp
+// Assumes a new field: private ActiveEndpoint _lastConnectedEndpoint = ActiveEndpoint.Primary;
+if (_siteEventLogger != null && _activeEndpoint != _lastConnectedEndpoint)
+{
+    _ = _siteEventLogger.LogEventAsync(
+        "connection", "Info", null, _connectionName,
+        $"Connection restored on {_activeEndpoint} endpoint", null);
+}
+_lastConnectedEndpoint = _activeEndpoint;
+```
+
+### Step 5: Build and test
+
+Run: `dotnet build ScadaLink.slnx && dotnet test tests/ScadaLink.DataConnectionLayer.Tests`
+
+### Step 6: Commit
+
+```bash
+git add -A
+git commit -m "feat(dcl): add active endpoint to health reports and log failover events"
+```
+
+---
+
+## Task 6: Central UI Changes
+
+**Files:**
+- Modify: `src/ScadaLink.CentralUI/Components/Pages/Admin/DataConnections.razor`
+- Modify: `src/ScadaLink.CentralUI/Components/Pages/Admin/DataConnectionForm.razor`
+
+### Step 1: Update DataConnections list page
+
+Add an `Active Endpoint` column to the table (around lines 28-64), after the Protocol column:
+
+```html
+<th>Active Endpoint</th>
+```
+
+And in the row template:
+
+```html
+<td>@connection.ActiveEndpoint</td>
+```
+
+This requires the list page to fetch health data alongside the connection list: either add a health-status lookup or include `ActiveEndpoint` in the data connection response.
+
+### Step 2: Update DataConnectionForm — rename Configuration label
+
+Change the "Configuration" label to "Primary Endpoint Configuration" (around lines 44-61).
+
+### Step 3: Add backup endpoint section
+
+Below the primary config field, add:
+
+```html
+@if (!_showBackup)
+{
+
+}
+else
+{
+
+
+ + +
+