8 tasks with TDD steps, exact file paths, and code samples. Covers entity model, failover state machine, health reporting, UI, CLI, management API, deployment, and documentation.
Primary/Backup Data Connection Endpoints — Implementation Plan
For Claude: REQUIRED SUB-SKILL: Use superpowers-extended-cc:executing-plans to implement this plan task-by-task.
Goal: Add optional backup endpoints to data connections with automatic failover after configurable retry count.
Architecture: The DataConnectionActor gains failover logic in its Reconnecting state — after N failed retries on the active endpoint, it disposes the adapter and creates a fresh one with the other endpoint's config. Adapters remain single-endpoint. Entity model splits Configuration into PrimaryConfiguration + BackupConfiguration.
Tech Stack: C# / .NET 10, Akka.NET, EF Core, Blazor Server, System.CommandLine
Design doc: docs/plans/2026-03-22-primary-backup-data-connections-design.md
Task 1: Entity Model & Database Migration
Files:
- Modify: src/ScadaLink.Commons/Entities/Sites/DataConnection.cs
- Modify: src/ScadaLink.ConfigurationDatabase/Configurations/SiteConfiguration.cs (lines 32-56)
- Modify: src/ScadaLink.Commons/Messages/Artifacts/DataConnectionArtifact.cs
Step 1: Update DataConnection entity
In DataConnection.cs, rename Configuration to PrimaryConfiguration, add BackupConfiguration and FailoverRetryCount:
public class DataConnection
{
    public int Id { get; set; }
    public int SiteId { get; set; }
    public string Name { get; set; }
    public string Protocol { get; set; }
    public string? PrimaryConfiguration { get; set; }
    public string? BackupConfiguration { get; set; }
    public int FailoverRetryCount { get; set; } = 3;

    public DataConnection(int siteId, string name, string protocol)
    {
        SiteId = siteId;
        Name = name ?? throw new ArgumentNullException(nameof(name));
        Protocol = protocol ?? throw new ArgumentNullException(nameof(protocol));
    }
}
Step 2: Update EF Core mapping
In SiteConfiguration.cs, update the DataConnection mapping (around lines 46-47):
- Rename Configuration property mapping to PrimaryConfiguration (MaxLength 4000)
- Add BackupConfiguration property (optional, MaxLength 4000)
- Add FailoverRetryCount property (required, default 3)
builder.Property(d => d.PrimaryConfiguration).HasMaxLength(4000);
builder.Property(d => d.BackupConfiguration).HasMaxLength(4000);
builder.Property(d => d.FailoverRetryCount).HasDefaultValue(3);
Step 3: Create EF Core migration
Run:
cd src/ScadaLink.ConfigurationDatabase
dotnet ef migrations add AddDataConnectionBackupEndpoint \
    --startup-project ../ScadaLink.Host
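Verify that the scaffolded migration renames Configuration → PrimaryConfiguration with RenameColumn rather than a drop and add, because dropping the column would discard the configuration stored in existing rows. If the scaffolded migration drops and recreates the column, fix it manually: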
migrationBuilder.RenameColumn(
    name: "Configuration",
    table: "DataConnections",
    newName: "PrimaryConfiguration");

migrationBuilder.AddColumn<string>(
    name: "BackupConfiguration",
    table: "DataConnections",
    maxLength: 4000,
    nullable: true);

migrationBuilder.AddColumn<int>(
    name: "FailoverRetryCount",
    table: "DataConnections",
    nullable: false,
    defaultValue: 3);
Step 4: Update DataConnectionArtifact
In DataConnectionArtifact.cs, replace single ConfigurationJson with both:
public record DataConnectionArtifact(
    string Name,
    string Protocol,
    string? PrimaryConfigurationJson,
    string? BackupConfigurationJson,
    int FailoverRetryCount = 3);
Step 5: Build and fix compile errors
Run: dotnet build ScadaLink.slnx
This will surface all references to the old Configuration and ConfigurationJson fields across the codebase. Fix each one — this includes:
- ManagementActor handlers
- CLI commands
- UI pages
- Deployment/flattening code
- Tests
Fix only the field name renames in this step (use PrimaryConfiguration where Configuration was). Don't add backup logic yet — just make it compile.
Step 6: Run tests, fix failures
Run: dotnet test ScadaLink.slnx
Fix any test failures caused by the rename.
Step 7: Commit
git add -A
git commit -m "feat(dcl): rename Configuration to PrimaryConfiguration, add BackupConfiguration and FailoverRetryCount"
Task 2: Update CreateConnectionCommand & Manager Actor
Files:
- Modify: src/ScadaLink.Commons/Messages/DataConnection/CreateConnectionCommand.cs
- Modify: src/ScadaLink.DataConnectionLayer/Actors/DataConnectionManagerActor.cs (lines 39-62)
Step 1: Update CreateConnectionCommand message
public record CreateConnectionCommand(
    string ConnectionName,
    string ProtocolType,
    IDictionary<string, string> PrimaryConnectionDetails,
    IDictionary<string, string>? BackupConnectionDetails = null,
    int FailoverRetryCount = 3);
Step 2: Update DataConnectionManagerActor.HandleCreateConnection
Update the handler (around lines 39-62) to pass both configs to DataConnectionActor:
private void HandleCreateConnection(CreateConnectionCommand command)
{
    if (_connectionActors.ContainsKey(command.ConnectionName))
    {
        _log.Warning("Connection {0} already exists", command.ConnectionName);
        return;
    }

    // The initial adapter always targets the primary endpoint.
    var adapter = _factory.Create(command.ProtocolType, command.PrimaryConnectionDetails);

    var props = Props.Create(() => new DataConnectionActor(
        command.ConnectionName,
        adapter,
        _options,
        _healthCollector,
        command.ProtocolType,
        command.PrimaryConnectionDetails,
        command.BackupConnectionDetails,
        command.FailoverRetryCount));

    // Replace characters that are not valid in an actor name with '-'.
    var actorName = new string(command.ConnectionName
        .Select(c => char.IsLetterOrDigit(c) || "-_.*$+:@&=,!~';()".Contains(c) ? c : '-')
        .ToArray());

    var actorRef = Context.ActorOf(props, actorName);
    _connectionActors[command.ConnectionName] = actorRef;

    _log.Info("Created DataConnectionActor for {0} (protocol={1}, backup={2})",
        command.ConnectionName, command.ProtocolType, command.BackupConnectionDetails != null ? "yes" : "none");
}
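The name sanitization in the handler can be pulled into a pure helper so it is testable without an actor system. The sketch below is an illustration, not existing code: the ActorNames class and Sanitize method are hypothetical names, and it assumes that Akka.NET only accepts ASCII letters and digits (plus the listed symbols) in an actor name and rejects names that start with '$'. Verify both assumptions against the Akka.NET version in use.

```csharp
using System;
using System.Linq;

// Hypothetical helper; not part of the existing codebase.
public static class ActorNames
{
    // Same symbol set as in HandleCreateConnection.
    private const string AllowedSymbols = "-_.*$+:@&=,!~';()";

    // Replaces every character outside the allowed set with '-'.
    public static string Sanitize(string connectionName)
    {
        if (string.IsNullOrEmpty(connectionName))
            throw new ArgumentException("Connection name must not be empty.", nameof(connectionName));

        var chars = connectionName
            .Select(c => char.IsAsciiLetterOrDigit(c) || AllowedSymbols.Contains(c) ? c : '-')
            .ToArray();

        // Assumption: a leading '$' is reserved for system actors, so replace it too.
        if (chars[0] == '$')
            chars[0] = '-';

        return new string(chars);
    }
}
```

Two different connection names can sanitize to the same actor name (for example "Line A" and "Line-A"), so the duplicate check on _connectionActors does not by itself guarantee a unique actor name.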
Step 3: Update all callers of CreateConnectionCommand
Search for all places that construct CreateConnectionCommand and update them to use the new signature. The primary caller is the site-side deployment handler.
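The artifact carries each configuration as a JSON string, while CreateConnectionCommand expects string dictionaries, so the caller needs a conversion that also maps a missing backup to null. The existing handler presumably already converts the single configuration; if so, reuse that code. Otherwise, a minimal sketch could look like the following, where ConnectionConfigParser and Parse are hypothetical names and the flattening rule for non-string values is an assumption:

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical helper; not part of the existing codebase.
public static class ConnectionConfigParser
{
    // Flattens a top-level JSON object into string key/value pairs.
    // Returns null for null or blank input so that "no backup" stays null.
    public static IDictionary<string, string>? Parse(string? json)
    {
        if (string.IsNullOrWhiteSpace(json))
            return null;

        using var document = JsonDocument.Parse(json);
        if (document.RootElement.ValueKind != JsonValueKind.Object)
            throw new FormatException("Connection configuration must be a JSON object.");

        var details = new Dictionary<string, string>(StringComparer.Ordinal);
        foreach (var property in document.RootElement.EnumerateObject())
        {
            // Strings are unwrapped; numbers, booleans and nested values keep their raw JSON text.
            details[property.Name] = property.Value.ValueKind == JsonValueKind.String
                ? property.Value.GetString() ?? string.Empty
                : property.Value.GetRawText();
        }

        return details;
    }
}
```

With such a helper the caller would pass Parse(artifact.PrimaryConfigurationJson) and Parse(artifact.BackupConfigurationJson) into the command; a null primary still has to be handled by the caller because PrimaryConnectionDetails is not nullable.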
Step 4: Build and test
Run: dotnet build ScadaLink.slnx && dotnet test tests/ScadaLink.DataConnectionLayer.Tests
Step 5: Commit
git add -A
git commit -m "feat(dcl): extend CreateConnectionCommand with backup config and failover retry count"
Task 3: DataConnectionActor Failover State Machine
Files:
- Modify: src/ScadaLink.DataConnectionLayer/Actors/DataConnectionActor.cs
- Modify: src/ScadaLink.DataConnectionLayer/DataConnectionFactory.cs
This is the core change. The actor gains failover logic in its Reconnecting state.
Step 1: Add new state fields to DataConnectionActor
Add these fields alongside the existing ones (around line 30):
private readonly string _protocolType;
private readonly IDictionary<string, string> _primaryConfig;
private readonly IDictionary<string, string>? _backupConfig;
private readonly int _failoverRetryCount;
private readonly IDataConnectionFactory _factory;
private ActiveEndpoint _activeEndpoint = ActiveEndpoint.Primary;
private int _consecutiveFailures;

public enum ActiveEndpoint { Primary, Backup }
Step 2: Update constructor
Extend the constructor to accept both configs and the factory:
public DataConnectionActor(
    string connectionName,
    IDataConnection adapter,
    DataConnectionOptions options,
    ISiteHealthCollector healthCollector,
    IDataConnectionFactory factory,
    string protocolType,
    IDictionary<string, string> primaryConfig,
    IDictionary<string, string>? backupConfig = null,
    int failoverRetryCount = 3)
{
    _connectionName = connectionName;
    _adapter = adapter;
    _options = options;
    _healthCollector = healthCollector;
    _factory = factory; // used to create a fresh adapter on failover
    _protocolType = protocolType;
    _primaryConfig = primaryConfig;
    _backupConfig = backupConfig;
    _failoverRetryCount = failoverRetryCount;
    _connectionDetails = primaryConfig; // start with primary
}
Note: The actor needs IDataConnectionFactory to create new adapters on failover. The DataConnectionManagerActor already has the factory, so pass it through to the actor constructor. Because failover reassigns _adapter and _connectionDetails, remove the readonly modifier from those fields if they currently have it.
Step 3: Extend HandleReconnectResult with failover logic
Replace the reconnect failure handling (around lines 279-296) to include failover:
private void HandleReconnectResult(ConnectResult result)
{
    if (result.Success)
    {
        _consecutiveFailures = 0;
        _log.Info("Reconnected {0} on {1} endpoint", _connectionName, _activeEndpoint);
        ReSubscribeAll();
        BecomeConnected();
        return;
    }

    _consecutiveFailures++;
    _log.Warning("Reconnect attempt {0}/{1} failed for {2} on {3}: {4}",
        _consecutiveFailures, _failoverRetryCount, _connectionName, _activeEndpoint, result.Error);

    // Fail over only when a backup exists; a single-endpoint connection keeps retrying.
    if (_consecutiveFailures >= _failoverRetryCount && _backupConfig != null)
    {
        // Switch endpoint (round-robin between primary and backup)
        var previousEndpoint = _activeEndpoint;
        _activeEndpoint = _activeEndpoint == ActiveEndpoint.Primary
            ? ActiveEndpoint.Backup
            : ActiveEndpoint.Primary;
        _consecutiveFailures = 0;

        var newConfig = _activeEndpoint == ActiveEndpoint.Primary ? _primaryConfig : _backupConfig;
        _log.Warning("Failing over {0} from {1} to {2}", _connectionName, previousEndpoint, _activeEndpoint);

        // Dispose old adapter (fire-and-forget, disposal errors are not observed), create new one
        _ = _adapter.DisposeAsync();
        _adapter = _factory.Create(_protocolType, newConfig);
        _connectionDetails = newConfig;

        // Wire up disconnect handler on new adapter
        _adapter.Disconnected += () => _self.Tell(new AdapterDisconnected());
    }

    // Schedule next retry
    Context.System.Scheduler.ScheduleTellOnce(
        _options.ReconnectInterval, Self, AttemptConnect.Instance, ActorRefs.NoSender);
}
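The counter-and-switch rule in the handler above can be modelled without any Akka types, which makes the round-robin behaviour easy to reason about and to unit test. This is a sketch only: FailoverTracker and its Endpoint enum are hypothetical names that do not exist in the codebase, and rejecting a retry count below 1 is an assumption (the actor code above does not validate it).

```csharp
using System;

// Hypothetical stand-in for the actor's ActiveEndpoint enum.
public enum Endpoint { Primary, Backup }

// Pure model of the failover rule; no Akka types involved.
public sealed class FailoverTracker
{
    private readonly int _retryCount;
    private readonly bool _hasBackup;
    private int _consecutiveFailures;

    public FailoverTracker(int retryCount, bool hasBackup)
    {
        // Assumption: a retry count below 1 is invalid.
        if (retryCount < 1)
            throw new ArgumentOutOfRangeException(nameof(retryCount), "Retry count must be at least 1.");

        _retryCount = retryCount;
        _hasBackup = hasBackup;
    }

    public Endpoint Active { get; private set; } = Endpoint.Primary;

    // A successful connect clears the failure streak and keeps the current endpoint.
    public void RecordSuccess() => _consecutiveFailures = 0;

    // Returns true when this failure triggered a switch to the other endpoint.
    public bool RecordFailure()
    {
        _consecutiveFailures++;
        if (!_hasBackup || _consecutiveFailures < _retryCount)
            return false;

        Active = Active == Endpoint.Primary ? Endpoint.Backup : Endpoint.Primary;
        _consecutiveFailures = 0;
        return true;
    }
}
```

With a retry count of 2 and a backup configured, the second consecutive failure switches to Backup, and two further failures switch back to Primary. Without a backup, RecordFailure never reports a switch.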
Step 4: Pass IDataConnectionFactory to DataConnectionActor
Update DataConnectionManagerActor.HandleCreateConnection to pass the factory:
var props = Props.Create(() => new DataConnectionActor(
    command.ConnectionName, adapter, _options, _healthCollector,
    _factory, // pass factory for failover adapter creation
    command.ProtocolType, command.PrimaryConnectionDetails,
    command.BackupConnectionDetails, command.FailoverRetryCount));
And update the DataConnectionActor constructor to store _factory.
Step 5: Build and run existing tests
Run: dotnet build ScadaLink.slnx && dotnet test tests/ScadaLink.DataConnectionLayer.Tests
Existing tests must pass (they use single-endpoint configs, so no failover triggered).
Step 6: Commit
git add -A
git commit -m "feat(dcl): add failover state machine to DataConnectionActor with round-robin endpoint switching"
Task 4: Failover Tests
Files:
- Modify: tests/ScadaLink.DataConnectionLayer.Tests/DataConnectionActorTests.cs
Step 1: Write test — failover after N retries
[Fact]
public async Task Reconnecting_AfterFailoverRetryCount_SwitchesToBackup()
{
    // Arrange: create actor with primary + backup, failoverRetryCount = 2
    var primaryAdapter = Substitute.For<IDataConnection>();
    var backupAdapter = Substitute.For<IDataConnection>();
    var factory = Substitute.For<IDataConnectionFactory>();
    factory.Create("OpcUa", Arg.Is<IDictionary<string, string>>(d => d["endpoint"] == "backup"))
        .Returns(backupAdapter);

    // Primary connects then disconnects
    primaryAdapter.ConnectAsync(Arg.Any<IDictionary<string, string>>(), Arg.Any<CancellationToken>())
        .Returns(Task.CompletedTask);
    primaryAdapter.Status.Returns(ConnectionHealth.Connected);

    var primaryConfig = new Dictionary<string, string> { ["endpoint"] = "primary" };
    var backupConfig = new Dictionary<string, string> { ["endpoint"] = "backup" };

    // Create actor, connect on primary
    // ... (use test kit patterns from existing tests)
    // Simulate disconnect, verify 2 failures then factory.Create called with backup config
}
Step 2: Write test — single endpoint retries forever
[Fact]
public async Task Reconnecting_NoBackup_RetriesIndefinitely()
{
    // Arrange: create actor with primary only, no backup
    // Simulate 10 reconnect failures
    // Verify: factory.Create never called with backup, just keeps retrying
}
Step 3: Write test — round-robin back to primary after backup fails
[Fact]
public async Task Reconnecting_BackupFails_SwitchesBackToPrimary()
{
    // Arrange: primary + backup, failoverRetryCount = 1
    // Simulate: primary fails 1x → switch to backup → backup fails 1x → switch to primary
    // Verify: round-robin pattern
}
Step 4: Write test — successful reconnect resets counter
[Fact]
public async Task Reconnecting_SuccessfulConnect_ResetsConsecutiveFailures()
{
    // Arrange: failoverRetryCount = 3
    // Simulate: 2 failures on primary, then success
    // Verify: no failover, counter reset
}
Step 5: Write test — ReSubscribeAll called after failover
[Fact]
public async Task Failover_ReSubscribesAllTagsOnNewAdapter()
{
    // Arrange: actor with subscriptions, then failover
    // Verify: new adapter receives SubscribeAsync calls for all previously subscribed tags
}
Step 6: Run all tests
Run: dotnet test tests/ScadaLink.DataConnectionLayer.Tests -v normal
Step 7: Commit
git add -A
git commit -m "test(dcl): add failover state machine tests for DataConnectionActor"
Task 5: Health Reporting & Site Event Logging
Files:
- Modify: src/ScadaLink.Commons/Messages/DataConnection/DataConnectionHealthReport.cs
- Modify: src/ScadaLink.DataConnectionLayer/Actors/DataConnectionActor.cs (ReplyWithHealthReport, HandleReconnectResult)
Step 1: Add ActiveEndpoint to health report
public record DataConnectionHealthReport(
    string ConnectionName,
    ConnectionHealth Status,
    int TotalSubscribedTags,
    int ResolvedTags,
    string ActiveEndpoint,
    DateTimeOffset Timestamp);
Step 2: Update ReplyWithHealthReport in DataConnectionActor
Update the health report method (around line 516) to include the active endpoint:
private void ReplyWithHealthReport()
{
    var endpointLabel = _backupConfig == null
        ? "Primary (no backup)"
        : _activeEndpoint.ToString();

    Sender.Tell(new DataConnectionHealthReport(
        _connectionName, _adapter.Status,
        _subscriptionsByInstance.Values.Sum(s => s.Count),
        _resolvedTags,
        endpointLabel,
        DateTimeOffset.UtcNow));
}
Step 3: Add site event logging on failover
In HandleReconnectResult, after switching endpoints, log a site event:
if (_siteEventLogger != null)
{
    _ = _siteEventLogger.LogEventAsync(
        "connection", "Warning", null, _connectionName,
        $"Failover from {previousEndpoint} to {_activeEndpoint}",
        $"After {_failoverRetryCount} consecutive failures");
}
Note: The actor needs ISiteEventLogger injected. Add it as an optional constructor parameter.
Step 4: Add site event logging on successful reconnect after failover
In the HandleReconnectResult success path, if the active endpoint differs from the last endpoint that connected successfully (track that endpoint in a new field), log the restore:
if (_siteEventLogger != null)
{
    _ = _siteEventLogger.LogEventAsync(
        "connection", "Info", null, _connectionName,
        $"Connection restored on {_activeEndpoint} endpoint", null);
}
Step 5: Build and test
Run: dotnet build ScadaLink.slnx && dotnet test tests/ScadaLink.DataConnectionLayer.Tests
Step 6: Commit
git add -A
git commit -m "feat(dcl): add active endpoint to health reports and log failover events"
Task 6: Central UI Changes
Files:
- Modify: src/ScadaLink.CentralUI/Components/Pages/Admin/DataConnections.razor
- Modify: src/ScadaLink.CentralUI/Components/Pages/Admin/DataConnectionForm.razor
Step 1: Update DataConnections list page
Add an Active Endpoint column to the table (around lines 28-64). Insert it after the Protocol column:
<th>Active Endpoint</th>
And in the row template:
<td>@connection.ActiveEndpoint</td>
This requires the list page to fetch health data alongside the connection list. Add a health status lookup or include ActiveEndpoint in the data connection response.
Step 2: Update DataConnectionForm — rename Configuration label
Change the "Configuration" label to "Primary Endpoint Configuration" (around lines 44-61).
Step 3: Add backup endpoint section
Below the primary config field, add:
@if (!_showBackup)
{
    <button type="button" class="btn btn-outline-secondary btn-sm mt-2"
            @onclick="() => _showBackup = true">
        Add Backup Endpoint
    </button>
}
else
{
    <div class="mt-3">
        <div class="d-flex justify-content-between align-items-center">
            <label class="form-label">Backup Endpoint Configuration</label>
            <button type="button" class="btn btn-outline-danger btn-sm"
                    @onclick="RemoveBackup">
                Remove Backup
            </button>
        </div>
        <textarea class="form-control" rows="4"
                  @bind="_model.BackupConfiguration"
                  placeholder='{"Host": "backup-host", "Port": 50101}' />
    </div>
    <div class="mt-3">
        <label class="form-label">Failover Retry Count</label>
        <input type="number" class="form-control" min="1" max="20"
               @bind="_model.FailoverRetryCount" />
        <small class="text-muted">Retries before switching to backup (default: 3)</small>
    </div>
}
Step 4: Update form model and save logic
Add BackupConfiguration and FailoverRetryCount to the form model. Update the save method to pass both configs to the management API.
In edit mode, set _showBackup = true if BackupConfiguration is not null.
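The Razor markup in Step 3 calls RemoveBackup and toggles _showBackup, but the handlers are not spelled out. The state behind them can be sketched as a plain class, which also keeps it testable outside Blazor. DataConnectionFormModel and DataConnectionFormState are hypothetical names, and resetting the retry count to 3 when the backup is removed is an assumption rather than a stated requirement.

```csharp
// Hypothetical form model; mirrors the fields named in this task.
public sealed class DataConnectionFormModel
{
    public string? PrimaryConfiguration { get; set; }
    public string? BackupConfiguration { get; set; }
    public int FailoverRetryCount { get; set; } = 3;
}

// Hypothetical holder for the form's backup-section state.
public sealed class DataConnectionFormState
{
    public DataConnectionFormModel Model { get; private set; } = new();
    public bool ShowBackup { get; private set; }

    // Edit mode: expand the backup section only when a backup is already stored.
    public void LoadForEdit(DataConnectionFormModel existing)
    {
        Model = existing;
        ShowBackup = existing.BackupConfiguration is not null;
    }

    public void AddBackup() => ShowBackup = true;

    // Collapses the section and clears the value so that the save sends no backup.
    public void RemoveBackup()
    {
        Model.BackupConfiguration = null;
        Model.FailoverRetryCount = 3; // assumption: fall back to the default
        ShowBackup = false;
    }
}
```

Clearing BackupConfiguration in RemoveBackup matters because hiding the textarea alone would leave the old value in the model and it would still be saved.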
Step 5: Build and verify visually
Run: dotnet build ScadaLink.slnx
Visual verification requires running the cluster — document as manual test.
Step 6: Commit
git add -A
git commit -m "feat(ui): add primary/backup endpoint fields to data connection form"
Task 7: CLI, Management API, and Deployment
Files:
- Modify: src/ScadaLink.Commons/Messages/Management/DataConnectionCommands.cs
- Modify: src/ScadaLink.CLI/Commands/DataConnectionCommands.cs
- Modify: src/ScadaLink.ManagementService/ManagementActor.cs (lines 689-711)
- Modify: Deployment/flattening code that creates DataConnectionArtifact
Step 1: Update management command messages
public record CreateDataConnectionCommand(
    int SiteId, string Name, string Protocol,
    string? PrimaryConfiguration,
    string? BackupConfiguration = null,
    int FailoverRetryCount = 3);

public record UpdateDataConnectionCommand(
    int DataConnectionId, string Name, string Protocol,
    string? PrimaryConfiguration,
    string? BackupConfiguration = null,
    int FailoverRetryCount = 3);
Step 2: Update ManagementActor handlers
In HandleCreateDataConnection (around line 689): set PrimaryConfiguration, BackupConfiguration, FailoverRetryCount from command.
In HandleUpdateDataConnection (around line 699): same fields.
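The field copy in the update handler can be sketched as a pure mapping function. The types below are trimmed, self-contained copies of the plan's command and entity so that the example compiles on its own; DataConnectionMapper and Apply are hypothetical names. Treating blank backup text as null and rejecting a retry count below 1 are both assumptions, chosen to match the form's min="1" constraint.

```csharp
using System;

// Trimmed copy of the command from Step 1, for illustration only.
public record UpdateDataConnectionCommand(
    int DataConnectionId, string Name, string Protocol,
    string? PrimaryConfiguration,
    string? BackupConfiguration = null,
    int FailoverRetryCount = 3);

// Trimmed copy of the entity from Task 1, for illustration only.
public class DataConnection
{
    public int Id { get; set; }
    public string Name { get; set; } = "";
    public string Protocol { get; set; } = "";
    public string? PrimaryConfiguration { get; set; }
    public string? BackupConfiguration { get; set; }
    public int FailoverRetryCount { get; set; } = 3;
}

// Hypothetical helper; not part of the existing codebase.
public static class DataConnectionMapper
{
    public static void Apply(UpdateDataConnectionCommand command, DataConnection entity)
    {
        // Assumption: the retry count must be at least 1.
        if (command.FailoverRetryCount < 1)
            throw new ArgumentOutOfRangeException(nameof(command), "FailoverRetryCount must be at least 1.");

        entity.Name = command.Name;
        entity.Protocol = command.Protocol;
        entity.PrimaryConfiguration = command.PrimaryConfiguration;
        // Blank backup text is stored as null so that "no backup" has one representation.
        entity.BackupConfiguration = string.IsNullOrWhiteSpace(command.BackupConfiguration)
            ? null
            : command.BackupConfiguration;
        entity.FailoverRetryCount = command.FailoverRetryCount;
    }
}
```

Normalizing blank text to null keeps the actor's `_backupConfig != null` check meaningful for connections whose backup field was cleared in the UI or CLI.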
Step 3: Update CLI commands
In BuildCreate (around lines 75-98):
- Rename --configuration to --primary-config
- Add hidden alias --configuration pointing to same option
- Add --backup-config option (optional)
- Add --failover-retry-count option (optional, default 3)
In BuildUpdate (around lines 36-59): same changes.
In BuildGet (around lines 22-34): update output to show both configs.
Step 4: Update deployment artifact creation
Find where DataConnectionArtifact is constructed (in deployment/flattening code). Update to pass PrimaryConfigurationJson and BackupConfigurationJson from the entity.
Step 5: Build and test CLI
Run: dotnet build ScadaLink.slnx
Test CLI manually:
scadalink data-connection create --site-id 1 --name "Test" --protocol OpcUa \
    --primary-config '{"endpoint":"opc.tcp://localhost:50000"}' \
    --backup-config '{"endpoint":"opc.tcp://localhost:50010"}' \
    --failover-retry-count 3
Step 6: Commit
git add -A
git commit -m "feat(cli): add --primary-config, --backup-config, --failover-retry-count to data connection commands"
Task 8: Documentation Updates
Files:
- Modify: docs/requirements/Component-DataConnectionLayer.md
- Modify: docs/requirements/HighLevelReqs.md
- Modify: docs/requirements/Component-CentralUI.md
- Modify: docs/test_infra/test_infra.md
Step 1: Update Component-DataConnectionLayer.md
Add new section "Endpoint Redundancy" covering:
- Optional backup endpoints
- Failover state machine (include ASCII diagram from design doc)
- Configuration model (PrimaryConfiguration + BackupConfiguration)
- Failover retry count and round-robin behavior
- Subscription re-creation on failover
- Health reporting (ActiveEndpoint field)
- Site event logging (DataConnectionFailover, DataConnectionRestored)
Update the configuration reference tables to show the new entity fields.
Step 2: Update HighLevelReqs.md
Add requirement: "Data connections support optional backup endpoints with automatic failover after configurable retry count. On failover, all subscriptions are transparently re-created on the new endpoint."
Step 3: Update Component-CentralUI.md
Update the Data Connections workflow section to describe:
- Primary/backup config fields on the form
- Collapsible backup section
- Failover retry count field
- Active endpoint column on list page
Step 4: Update test_infra.md
Add a note in the Remote Test Infrastructure section that the dual OPC UA servers (50000/50010) and dual LmxProxy instances (50100/50101) enable primary/backup testing.
Step 5: Commit
git add -A
git commit -m "docs(dcl): document primary/backup endpoint redundancy across requirements and test infra"