# Primary/Backup Data Connection Endpoints: Implementation Plan

> **For Claude:** REQUIRED SUB-SKILL: Use superpowers-extended-cc:executing-plans to implement this plan task-by-task.

**Goal:** Add optional backup endpoints to data connections, with automatic failover after a configurable retry count.

**Architecture:** The `DataConnectionActor` gains failover logic in its Reconnecting state: after N failed retries on the active endpoint, it disposes the adapter and creates a fresh one with the other endpoint's config. Adapters remain single-endpoint. The entity model splits `Configuration` into `PrimaryConfiguration` + `BackupConfiguration`.

**Tech Stack:** C# / .NET 10, Akka.NET, EF Core, Blazor Server, System.CommandLine

**Design doc:** `docs/plans/2026-03-22-primary-backup-data-connections-design.md`

---

## Task 1: Entity Model & Database Migration

**Files:**
- Modify: `src/ScadaLink.Commons/Entities/Sites/DataConnection.cs`
- Modify: `src/ScadaLink.ConfigurationDatabase/Configurations/SiteConfiguration.cs` (lines 32-56)
- Modify: `src/ScadaLink.Commons/Messages/Artifacts/DataConnectionArtifact.cs`

### Step 1: Update DataConnection entity

In `DataConnection.cs`, rename `Configuration` to `PrimaryConfiguration`, and add `BackupConfiguration` and `FailoverRetryCount`:

```csharp
public class DataConnection
{
    public int Id { get; set; }
    public int SiteId { get; set; }
    public string Name { get; set; }
    public string Protocol { get; set; }
    public string? PrimaryConfiguration { get; set; }
    public string? BackupConfiguration { get; set; }
    public int FailoverRetryCount { get; set; } = 3;

    public DataConnection(int siteId, string name, string protocol)
    {
        SiteId = siteId;
        Name = name ?? throw new ArgumentNullException(nameof(name));
        Protocol = protocol ?? throw new ArgumentNullException(nameof(protocol));
    }
}
```

### Step 2: Update EF Core mapping

In `SiteConfiguration.cs`, update the DataConnection mapping (around lines 46-47):
- Rename the `Configuration` property mapping to `PrimaryConfiguration` (MaxLength 4000)
- Add the `BackupConfiguration` property (optional, MaxLength 4000)
- Add the `FailoverRetryCount` property (required, default 3)

```csharp
builder.Property(d => d.PrimaryConfiguration).HasMaxLength(4000);
builder.Property(d => d.BackupConfiguration).HasMaxLength(4000);
builder.Property(d => d.FailoverRetryCount).HasDefaultValue(3);
```

### Step 3: Create EF Core migration

Run:

```bash
cd src/ScadaLink.ConfigurationDatabase
dotnet ef migrations add AddDataConnectionBackupEndpoint \
  --startup-project ../ScadaLink.Host
```

Verify that the migration renames `Configuration` → `PrimaryConfiguration` with `RenameColumn` rather than drop+add, because dropping the column would discard the existing configuration data. If the scaffolded migration drops and recreates the column, fix it manually:

```csharp
// Rename in place so existing configuration values are preserved.
migrationBuilder.RenameColumn(
    name: "Configuration",
    table: "DataConnections",
    newName: "PrimaryConfiguration");

migrationBuilder.AddColumn<string>(
    name: "BackupConfiguration",
    table: "DataConnections",
    maxLength: 4000,
    nullable: true);

migrationBuilder.AddColumn<int>(
    name: "FailoverRetryCount",
    table: "DataConnections",
    nullable: false,
    defaultValue: 3);
```

### Step 4: Update DataConnectionArtifact

In `DataConnectionArtifact.cs`, replace the single `ConfigurationJson` with both configurations:

```csharp
public record DataConnectionArtifact(
    string Name,
    string Protocol,
    string? PrimaryConfigurationJson,
    string? BackupConfigurationJson,
    int FailoverRetryCount = 3);
```

### Step 5: Build and fix compile errors

Run: `dotnet build ScadaLink.slnx`

This will surface all references to the old `Configuration` and `ConfigurationJson` fields across the codebase.
Fix each one, including:
- ManagementActor handlers
- CLI commands
- UI pages
- Deployment/flattening code
- Tests

Fix only the field renames in this step (use `PrimaryConfiguration` where `Configuration` was). Don't add backup logic yet; just make it compile.

### Step 6: Run tests, fix failures

Run: `dotnet test ScadaLink.slnx`

Fix any test failures caused by the rename.

### Step 7: Commit

```bash
git add -A
git commit -m "feat(dcl): rename Configuration to PrimaryConfiguration, add BackupConfiguration and FailoverRetryCount"
```

---

## Task 2: Update CreateConnectionCommand & Manager Actor

**Files:**
- Modify: `src/ScadaLink.Commons/Messages/DataConnection/CreateConnectionCommand.cs`
- Modify: `src/ScadaLink.DataConnectionLayer/Actors/DataConnectionManagerActor.cs` (lines 39-62)

### Step 1: Update CreateConnectionCommand message

```csharp
// Dictionary type arguments are assumed to be <string, string>;
// match the type of the existing connection-details parameter.
public record CreateConnectionCommand(
    string ConnectionName,
    string ProtocolType,
    IDictionary<string, string> PrimaryConnectionDetails,
    IDictionary<string, string>? BackupConnectionDetails = null,
    int FailoverRetryCount = 3);
```

### Step 2: Update DataConnectionManagerActor.HandleCreateConnection

Update the handler (around lines 39-62) to pass both configs to `DataConnectionActor`:

```csharp
private void HandleCreateConnection(CreateConnectionCommand command)
{
    if (_connectionActors.ContainsKey(command.ConnectionName))
    {
        _log.Warning("Connection {0} already exists", command.ConnectionName);
        return;
    }

    // The initial adapter always targets the primary endpoint.
    var adapter = _factory.Create(command.ProtocolType, command.PrimaryConnectionDetails);

    var props = Props.Create(() => new DataConnectionActor(
        command.ConnectionName,
        adapter,
        _options,
        _healthCollector,
        command.ProtocolType,
        command.PrimaryConnectionDetails,
        command.BackupConnectionDetails,
        command.FailoverRetryCount));

    // Replace characters that are not valid in an Akka actor name.
    var actorName = new string(command.ConnectionName
        .Select(c => char.IsLetterOrDigit(c) || "-_.*$+:@&=,!~';()".Contains(c) ? c : '-')
        .ToArray());

    var actorRef = Context.ActorOf(props, actorName);
    _connectionActors[command.ConnectionName] = actorRef;

    _log.Info("Created DataConnectionActor for {0} (protocol={1}, backup={2})",
        command.ConnectionName,
        command.ProtocolType,
        command.BackupConnectionDetails != null ? "yes" : "none");
}
```

### Step 3: Update all callers of CreateConnectionCommand

Search for all places that construct `CreateConnectionCommand` and update them to use the new signature. The primary caller is the site-side deployment handler.

### Step 4: Build and test

Run: `dotnet build ScadaLink.slnx && dotnet test tests/ScadaLink.DataConnectionLayer.Tests`

### Step 5: Commit

```bash
git add -A
git commit -m "feat(dcl): extend CreateConnectionCommand with backup config and failover retry count"
```

---

## Task 3: DataConnectionActor Failover State Machine

**Files:**
- Modify: `src/ScadaLink.DataConnectionLayer/Actors/DataConnectionActor.cs`
- Modify: `src/ScadaLink.DataConnectionLayer/DataConnectionFactory.cs`

This is the core change. The actor gains failover logic in its Reconnecting state.

### Step 1: Add new state fields to DataConnectionActor

Add these fields alongside the existing ones (around line 30). If `_adapter` or `_connectionDetails` is currently declared `readonly`, remove the modifier, because failover reassigns both.

```csharp
private readonly string _protocolType;
private readonly IDictionary<string, string> _primaryConfig;
private readonly IDictionary<string, string>? _backupConfig;
private readonly int _failoverRetryCount;
private readonly IDataConnectionFactory _factory;
private ActiveEndpoint _activeEndpoint = ActiveEndpoint.Primary;
private int _consecutiveFailures;

public enum ActiveEndpoint { Primary, Backup }
```

### Step 2: Update constructor

Extend the constructor to accept both configs (the factory parameter is added in Step 4):

```csharp
public DataConnectionActor(
    string connectionName,
    IDataConnection adapter,
    DataConnectionOptions options,
    ISiteHealthCollector healthCollector,
    string protocolType,
    IDictionary<string, string> primaryConfig,
    IDictionary<string, string>? backupConfig = null,
    int failoverRetryCount = 3)
{
    _connectionName = connectionName;
    _adapter = adapter;
    _options = options;
    _healthCollector = healthCollector;
    _protocolType = protocolType;
    _primaryConfig = primaryConfig;
    _backupConfig = backupConfig;
    _failoverRetryCount = failoverRetryCount;
    _connectionDetails = primaryConfig; // start with primary
}
```

Note: The actor also needs `IDataConnectionFactory` injected so it can create new adapters on failover. The `DataConnectionManagerActor` already has the factory, so pass it through to the actor constructor rather than resolving it via DI.

### Step 3: Extend HandleReconnectResult with failover logic

Replace the reconnect failure handling (around lines 279-296) to include failover:

```csharp
private void HandleReconnectResult(ConnectResult result)
{
    if (result.Success)
    {
        _consecutiveFailures = 0;
        _log.Info("Reconnected {0} on {1} endpoint", _connectionName, _activeEndpoint);
        ReSubscribeAll();
        BecomeConnected();
        return;
    }

    _consecutiveFailures++;
    _log.Warning("Reconnect attempt {0}/{1} failed for {2} on {3}: {4}",
        _consecutiveFailures, _failoverRetryCount, _connectionName, _activeEndpoint, result.Error);

    // Without a backup config the actor keeps retrying the primary indefinitely.
    if (_consecutiveFailures >= _failoverRetryCount && _backupConfig != null)
    {
        // Switch endpoint (round-robin between primary and backup)
        var previousEndpoint = _activeEndpoint;
        _activeEndpoint = _activeEndpoint == ActiveEndpoint.Primary
            ? ActiveEndpoint.Backup
            : ActiveEndpoint.Primary;
        _consecutiveFailures = 0;

        var newConfig = _activeEndpoint == ActiveEndpoint.Primary
            ? _primaryConfig
            : _backupConfig;

        _log.Warning("Failing over {0} from {1} to {2}",
            _connectionName, previousEndpoint, _activeEndpoint);

        // Dispose old adapter (fire-and-forget), create new one
        _ = _adapter.DisposeAsync();
        _adapter = _factory.Create(_protocolType, newConfig);
        _connectionDetails = newConfig;

        // Wire up disconnect handler on new adapter
        _adapter.Disconnected += () => _self.Tell(new AdapterDisconnected());
    }

    // Schedule next retry
    Context.System.Scheduler.ScheduleTellOnce(
        _options.ReconnectInterval, Self, AttemptConnect.Instance, ActorRefs.NoSender);
}
```

### Step 4: Pass IDataConnectionFactory to DataConnectionActor

Update `DataConnectionManagerActor.HandleCreateConnection` to pass the factory:

```csharp
var props = Props.Create(() => new DataConnectionActor(
    command.ConnectionName,
    adapter,
    _options,
    _healthCollector,
    _factory, // pass factory for failover adapter creation
    command.ProtocolType,
    command.PrimaryConnectionDetails,
    command.BackupConnectionDetails,
    command.FailoverRetryCount));
```

Then update the `DataConnectionActor` constructor to accept an `IDataConnectionFactory` parameter in the same position (after `healthCollector`) and store it in `_factory`.

### Step 5: Build and run existing tests

Run: `dotnet build ScadaLink.slnx && dotnet test tests/ScadaLink.DataConnectionLayer.Tests`

Existing tests must pass (they use single-endpoint configs, so no failover is triggered).
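
The endpoint-switching rule in Step 3 can be read separately from the Akka plumbing. The following is a minimal sketch, assuming a hypothetical `FailoverTracker` helper and `Endpoint` enum that are not part of this plan. It models only the counter and the round-robin decision from `HandleReconnectResult`, which may help when reasoning about the test cases in Task 4.

```csharp
using System;

// Hypothetical illustration only: mirrors the decision logic of
// HandleReconnectResult without Akka, adapters, or scheduling.
public enum Endpoint { Primary, Backup }

public sealed class FailoverTracker
{
    private readonly int _failoverRetryCount;
    private readonly bool _hasBackup;

    public Endpoint Active { get; private set; } = Endpoint.Primary;
    public int ConsecutiveFailures { get; private set; }

    public FailoverTracker(int failoverRetryCount, bool hasBackup)
    {
        if (failoverRetryCount < 1)
            throw new ArgumentOutOfRangeException(nameof(failoverRetryCount));
        _failoverRetryCount = failoverRetryCount;
        _hasBackup = hasBackup;
    }

    // A successful connect resets the counter and keeps the current endpoint.
    public void RecordSuccess() => ConsecutiveFailures = 0;

    // Returns true when this failure triggered a switch to the other endpoint.
    public bool RecordFailure()
    {
        ConsecutiveFailures++;

        // No backup configured, or threshold not yet reached: keep retrying.
        if (!_hasBackup || ConsecutiveFailures < _failoverRetryCount)
            return false;

        // Round-robin: Primary -> Backup -> Primary -> ...
        Active = Active == Endpoint.Primary ? Endpoint.Backup : Endpoint.Primary;
        ConsecutiveFailures = 0;
        return true;
    }
}
```

With `failoverRetryCount = 2` and a backup configured, the second consecutive failure switches to `Backup`, and two further failures switch back to `Primary`. With no backup, `RecordFailure` never reports a switch, which corresponds to the "retries indefinitely" case.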
### Step 6: Commit

```bash
git add -A
git commit -m "feat(dcl): add failover state machine to DataConnectionActor with round-robin endpoint switching"
```

---

## Task 4: Failover Tests

**Files:**
- Modify: `tests/ScadaLink.DataConnectionLayer.Tests/DataConnectionActorTests.cs`

### Step 1: Write test (failover after N retries)

```csharp
[Fact]
public async Task Reconnecting_AfterFailoverRetryCount_SwitchesToBackup()
{
    // Arrange: create actor with primary + backup, failoverRetryCount = 2
    var primaryAdapter = Substitute.For<IDataConnection>();
    var backupAdapter = Substitute.For<IDataConnection>();
    var factory = Substitute.For<IDataConnectionFactory>();
    factory.Create("OpcUa", Arg.Is<IDictionary<string, string>>(d => d["endpoint"] == "backup"))
        .Returns(backupAdapter);

    // Primary connects then disconnects.
    // The second argument type is assumed to be CancellationToken; match IDataConnection.ConnectAsync.
    primaryAdapter.ConnectAsync(Arg.Any<IDictionary<string, string>>(), Arg.Any<CancellationToken>())
        .Returns(Task.CompletedTask);
    primaryAdapter.Status.Returns(ConnectionHealth.Connected);

    var primaryConfig = new Dictionary<string, string> { ["endpoint"] = "primary" };
    var backupConfig = new Dictionary<string, string> { ["endpoint"] = "backup" };

    // Create actor, connect on primary
    // ... (use test kit patterns from existing tests)

    // Simulate disconnect, verify 2 failures then factory.Create called with backup config
}
```

### Step 2: Write test (single endpoint retries forever)

```csharp
[Fact]
public async Task Reconnecting_NoBackup_RetriesIndefinitely()
{
    // Arrange: create actor with primary only, no backup
    // Simulate 10 reconnect failures
    // Verify: factory.Create never called with backup, just keeps retrying
}
```

### Step 3: Write test (round-robin back to primary after backup fails)

```csharp
[Fact]
public async Task Reconnecting_BackupFails_SwitchesBackToPrimary()
{
    // Arrange: primary + backup, failoverRetryCount = 1
    // Simulate: primary fails 1x → switch to backup → backup fails 1x → switch to primary
    // Verify: round-robin pattern
}
```

### Step 4: Write test (successful reconnect resets counter)

```csharp
[Fact]
public async Task Reconnecting_SuccessfulConnect_ResetsConsecutiveFailures()
{
    // Arrange: failoverRetryCount = 3
    // Simulate: 2 failures on primary, then success
    // Verify: no failover, counter reset
}
```

### Step 5: Write test (ReSubscribeAll called after failover)

```csharp
[Fact]
public async Task Failover_ReSubscribesAllTagsOnNewAdapter()
{
    // Arrange: actor with subscriptions, then failover
    // Verify: new adapter receives SubscribeAsync calls for all previously subscribed tags
}
```

### Step 6: Run all tests

Run: `dotnet test tests/ScadaLink.DataConnectionLayer.Tests -v normal`

### Step 7: Commit

```bash
git add -A
git commit -m "test(dcl): add failover state machine tests for DataConnectionActor"
```

---

## Task 5: Health Reporting & Site Event Logging

**Files:**
- Modify: `src/ScadaLink.Commons/Messages/DataConnection/DataConnectionHealthReport.cs`
- Modify: `src/ScadaLink.DataConnectionLayer/Actors/DataConnectionActor.cs` (ReplyWithHealthReport, HandleReconnectResult)

### Step 1: Add ActiveEndpoint to health report

```csharp
public record DataConnectionHealthReport(
    string ConnectionName,
    ConnectionHealth Status,
    int TotalSubscribedTags,
    int ResolvedTags,
    string ActiveEndpoint,
    DateTimeOffset Timestamp);
```

### Step 2: Update ReplyWithHealthReport in DataConnectionActor

Update the health report method (around line 516) to include the active endpoint:

```csharp
private void ReplyWithHealthReport()
{
    // Single-endpoint connections are labelled explicitly so the UI can tell them apart.
    var endpointLabel = _backupConfig == null
        ? "Primary (no backup)"
        : _activeEndpoint.ToString();

    Sender.Tell(new DataConnectionHealthReport(
        _connectionName,
        _adapter.Status,
        _subscriptionsByInstance.Values.Sum(s => s.Count),
        _resolvedTags,
        endpointLabel,
        DateTimeOffset.UtcNow));
}
```

### Step 3: Add site event logging on failover

In `HandleReconnectResult`, after switching endpoints (inside the failover block, where `previousEndpoint` is in scope), log a site event:

```csharp
if (_siteEventLogger != null)
{
    _ = _siteEventLogger.LogEventAsync(
        "connection",
        "Warning",
        null,
        _connectionName,
        $"Failover from {previousEndpoint} to {_activeEndpoint}",
        $"After {_failoverRetryCount} consecutive failures");
}
```

Note: The actor needs `ISiteEventLogger` injected. Add it as an optional constructor parameter.

### Step 4: Add site event logging on successful reconnect after failover

In the `HandleReconnectResult` success path, if the endpoint changed from the last known good one:

```csharp
if (_siteEventLogger != null)
{
    _ = _siteEventLogger.LogEventAsync(
        "connection",
        "Info",
        null,
        _connectionName,
        $"Connection restored on {_activeEndpoint} endpoint",
        null);
}
```

### Step 5: Build and test

Run: `dotnet build ScadaLink.slnx && dotnet test tests/ScadaLink.DataConnectionLayer.Tests`

### Step 6: Commit

```bash
git add -A
git commit -m "feat(dcl): add active endpoint to health reports and log failover events"
```

---

## Task 6: Central UI Changes

**Files:**
- Modify: `src/ScadaLink.CentralUI/Components/Pages/Admin/DataConnections.razor`
- Modify: `src/ScadaLink.CentralUI/Components/Pages/Admin/DataConnectionForm.razor`

### Step 1: Update DataConnections list page

Add an `Active Endpoint` column to the table (around lines 28-64).
Insert it after the Protocol column:

```html
<th>Active Endpoint</th>
```

And in the row template:

```html
<td>@connection.ActiveEndpoint</td>
```

This requires the list page to fetch health data alongside the connection list. Either add a health status lookup or include `ActiveEndpoint` in the data connection response.

### Step 2: Update DataConnectionForm (rename Configuration label)

Change the "Configuration" label to "Primary Endpoint Configuration" (around lines 44-61).

### Step 3: Add backup endpoint section

Below the primary config field, add:

```html
@if (!_showBackup) { } else {