docs(dcl): add primary/backup data connections implementation plan

8 tasks with TDD steps, exact file paths, and code samples.
Covers entity model, failover state machine, health reporting,
UI, CLI, management API, deployment, and documentation.
Joseph Doherty
2026-03-22 08:13:23 -04:00
parent 6267ff882c
commit 5ca1be328c
2 changed files with 709 additions and 0 deletions


@@ -0,0 +1,695 @@
# Primary/Backup Data Connection Endpoints — Implementation Plan

> **For Claude:** REQUIRED SUB-SKILL: Use superpowers-extended-cc:executing-plans to implement this plan task-by-task.

**Goal:** Add optional backup endpoints to data connections, with automatic failover after a configurable retry count.

**Architecture:** The `DataConnectionActor` gains failover logic in its Reconnecting state: after N failed retries on the active endpoint, it disposes the adapter and creates a fresh one with the other endpoint's config. Adapters remain single-endpoint. The entity model splits `Configuration` into `PrimaryConfiguration` + `BackupConfiguration`.

**Tech Stack:** C# / .NET 10, Akka.NET, EF Core, Blazor Server, System.CommandLine

**Design doc:** `docs/plans/2026-03-22-primary-backup-data-connections-design.md`

---
## Task 1: Entity Model & Database Migration
**Files:**
- Modify: `src/ScadaLink.Commons/Entities/Sites/DataConnection.cs`
- Modify: `src/ScadaLink.ConfigurationDatabase/Configurations/SiteConfiguration.cs` (lines 32-56)
- Modify: `src/ScadaLink.Commons/Messages/Artifacts/DataConnectionArtifact.cs`
### Step 1: Update DataConnection entity
In `DataConnection.cs`, rename `Configuration` to `PrimaryConfiguration`, add `BackupConfiguration` and `FailoverRetryCount`:
```csharp
public class DataConnection
{
    public int Id { get; set; }
    public int SiteId { get; set; }
    public string Name { get; set; }
    public string Protocol { get; set; }
    public string? PrimaryConfiguration { get; set; }
    public string? BackupConfiguration { get; set; }
    public int FailoverRetryCount { get; set; } = 3;

    public DataConnection(int siteId, string name, string protocol)
    {
        SiteId = siteId;
        Name = name ?? throw new ArgumentNullException(nameof(name));
        Protocol = protocol ?? throw new ArgumentNullException(nameof(protocol));
    }
}
```
### Step 2: Update EF Core mapping
In `SiteConfiguration.cs`, update the DataConnection mapping (around lines 46-47):
- Rename `Configuration` property mapping to `PrimaryConfiguration` (MaxLength 4000)
- Add `BackupConfiguration` property (optional, MaxLength 4000)
- Add `FailoverRetryCount` property (required, default 3)
```csharp
builder.Property(d => d.PrimaryConfiguration).HasMaxLength(4000);
builder.Property(d => d.BackupConfiguration).HasMaxLength(4000);
builder.Property(d => d.FailoverRetryCount).HasDefaultValue(3);
```
### Step 3: Create EF Core migration
Run:
```bash
cd src/ScadaLink.ConfigurationDatabase
dotnet ef migrations add AddDataConnectionBackupEndpoint \
--startup-project ../ScadaLink.Host
```
Verify the migration renames `Configuration` → `PrimaryConfiguration` (it should use `RenameColumn`, not drop+add). If the scaffolded migration drops and recreates the column, fix it manually:
```csharp
migrationBuilder.RenameColumn(
    name: "Configuration",
    table: "DataConnections",
    newName: "PrimaryConfiguration");

migrationBuilder.AddColumn<string>(
    name: "BackupConfiguration",
    table: "DataConnections",
    maxLength: 4000,
    nullable: true);

migrationBuilder.AddColumn<int>(
    name: "FailoverRetryCount",
    table: "DataConnections",
    nullable: false,
    defaultValue: 3);
```
### Step 4: Update DataConnectionArtifact
In `DataConnectionArtifact.cs`, replace single `ConfigurationJson` with both:
```csharp
public record DataConnectionArtifact(
    string Name,
    string Protocol,
    string? PrimaryConfigurationJson,
    string? BackupConfigurationJson,
    int FailoverRetryCount = 3);
```
### Step 5: Build and fix compile errors
Run: `dotnet build ScadaLink.slnx`
This will surface all references to the old `Configuration` and `ConfigurationJson` fields across the codebase. Fix each one — this includes:
- ManagementActor handlers
- CLI commands
- UI pages
- Deployment/flattening code
- Tests
Fix only the field name renames in this step (use `PrimaryConfiguration` where `Configuration` was). Don't add backup logic yet — just make it compile.
### Step 6: Run tests, fix failures
Run: `dotnet test ScadaLink.slnx`
Fix any test failures caused by the rename.
### Step 7: Commit
```bash
git add -A
git commit -m "feat(dcl): rename Configuration to PrimaryConfiguration, add BackupConfiguration and FailoverRetryCount"
```
---
## Task 2: Update CreateConnectionCommand & Manager Actor
**Files:**
- Modify: `src/ScadaLink.Commons/Messages/DataConnection/CreateConnectionCommand.cs`
- Modify: `src/ScadaLink.DataConnectionLayer/Actors/DataConnectionManagerActor.cs` (lines 39-62)
### Step 1: Update CreateConnectionCommand message
```csharp
public record CreateConnectionCommand(
    string ConnectionName,
    string ProtocolType,
    IDictionary<string, string> PrimaryConnectionDetails,
    IDictionary<string, string>? BackupConnectionDetails = null,
    int FailoverRetryCount = 3);
```
### Step 2: Update DataConnectionManagerActor.HandleCreateConnection
Update the handler (around lines 39-62) to pass both configs to `DataConnectionActor`:
```csharp
private void HandleCreateConnection(CreateConnectionCommand command)
{
    if (_connectionActors.ContainsKey(command.ConnectionName))
    {
        _log.Warning("Connection {0} already exists", command.ConnectionName);
        return;
    }

    // The initial adapter always targets the primary endpoint.
    var adapter = _factory.Create(command.ProtocolType, command.PrimaryConnectionDetails);
    var props = Props.Create(() => new DataConnectionActor(
        command.ConnectionName,
        adapter,
        _options,
        _healthCollector,
        command.ProtocolType,
        command.PrimaryConnectionDetails,
        command.BackupConnectionDetails,
        command.FailoverRetryCount));

    // Replace characters that are not valid in an actor name with '-'.
    var actorName = new string(command.ConnectionName
        .Select(c => char.IsLetterOrDigit(c) || "-_.*$+:@&=,!~';()".Contains(c) ? c : '-')
        .ToArray());
    var actorRef = Context.ActorOf(props, actorName);
    _connectionActors[command.ConnectionName] = actorRef;

    _log.Info("Created DataConnectionActor for {0} (protocol={1}, backup={2})",
        command.ConnectionName, command.ProtocolType, command.BackupConnectionDetails != null ? "yes" : "none");
}
```
### Step 3: Update all callers of CreateConnectionCommand
Search for all places that construct `CreateConnectionCommand` and update them to use the new signature. The primary caller is the site-side deployment handler.
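A caller such as the site-side deployment handler has to turn the artifact's JSON strings into the dictionaries the command expects. The following is a minimal sketch, assuming each configuration is a flat JSON object. `ConfigParser`, `Parse`, and `ToCommand` are hypothetical names that do not exist in the codebase, and the two records are trimmed copies of the ones defined in this plan, redeclared only so the sample compiles on its own:

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// Trimmed copies of the plan's records so the sketch compiles standalone.
public record DataConnectionArtifact(
    string Name,
    string Protocol,
    string? PrimaryConfigurationJson,
    string? BackupConfigurationJson,
    int FailoverRetryCount = 3);

public record CreateConnectionCommand(
    string ConnectionName,
    string ProtocolType,
    IDictionary<string, string> PrimaryConnectionDetails,
    IDictionary<string, string>? BackupConnectionDetails = null,
    int FailoverRetryCount = 3);

// Hypothetical helper; an assumption for illustration, not existing code.
public static class ConfigParser
{
    // Flattens a JSON object into string key/value pairs.
    // Returns null for null/blank input so "no backup" stays null.
    public static IDictionary<string, string>? Parse(string? json)
    {
        if (string.IsNullOrWhiteSpace(json)) return null;

        using var doc = JsonDocument.Parse(json);
        if (doc.RootElement.ValueKind != JsonValueKind.Object)
            throw new FormatException("Connection configuration must be a JSON object.");

        var result = new Dictionary<string, string>();
        foreach (var prop in doc.RootElement.EnumerateObject())
        {
            // Strings are unwrapped; numbers, booleans, etc. keep their raw JSON text.
            result[prop.Name] = prop.Value.ValueKind == JsonValueKind.String
                ? prop.Value.GetString() ?? string.Empty
                : prop.Value.GetRawText();
        }
        return result;
    }

    // Maps a deployed artifact onto the extended command signature.
    public static CreateConnectionCommand ToCommand(DataConnectionArtifact artifact) =>
        new(artifact.Name,
            artifact.Protocol,
            Parse(artifact.PrimaryConfigurationJson) ?? new Dictionary<string, string>(),
            Parse(artifact.BackupConfigurationJson),
            artifact.FailoverRetryCount);
}
```

Keeping a missing backup as `null` rather than an empty dictionary matters here, because the actor treats a `null` backup config as "single endpoint, never fail over".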
### Step 4: Build and test
Run: `dotnet build ScadaLink.slnx && dotnet test tests/ScadaLink.DataConnectionLayer.Tests`
### Step 5: Commit
```bash
git add -A
git commit -m "feat(dcl): extend CreateConnectionCommand with backup config and failover retry count"
```
---
## Task 3: DataConnectionActor Failover State Machine
**Files:**
- Modify: `src/ScadaLink.DataConnectionLayer/Actors/DataConnectionActor.cs`
- Modify: `src/ScadaLink.DataConnectionLayer/DataConnectionFactory.cs`
This is the core change. The actor gains failover logic in its Reconnecting state.
### Step 1: Add new state fields to DataConnectionActor
Add these fields alongside the existing ones (around line 30):
```csharp
private readonly string _protocolType;
private readonly IDictionary<string, string> _primaryConfig;
private readonly IDictionary<string, string>? _backupConfig;
private readonly int _failoverRetryCount;
private readonly IDataConnectionFactory _factory;
private ActiveEndpoint _activeEndpoint = ActiveEndpoint.Primary;
private int _consecutiveFailures;
public enum ActiveEndpoint { Primary, Backup }
```
### Step 2: Update constructor
Extend the constructor to accept both configs and the factory:
```csharp
public DataConnectionActor(
    string connectionName,
    IDataConnection adapter,
    DataConnectionOptions options,
    ISiteHealthCollector healthCollector,
    IDataConnectionFactory factory,
    string protocolType,
    IDictionary<string, string> primaryConfig,
    IDictionary<string, string>? backupConfig = null,
    int failoverRetryCount = 3)
{
    _connectionName = connectionName;
    _adapter = adapter;
    _options = options;
    _healthCollector = healthCollector;
    _factory = factory; // used to create a fresh adapter on failover
    _protocolType = protocolType;
    _primaryConfig = primaryConfig;
    _backupConfig = backupConfig;
    _failoverRetryCount = failoverRetryCount;
    _connectionDetails = primaryConfig; // start with primary
}
```
Note: The actor also needs `IDataConnectionFactory` injected to create new adapters on failover. Pass it through the constructor or resolve via DI. The `DataConnectionManagerActor` already has the factory — pass it through to the actor constructor.
### Step 3: Extend HandleReconnectResult with failover logic
Replace the reconnect failure handling (around lines 279-296) to include failover:
```csharp
private void HandleReconnectResult(ConnectResult result)
{
    if (result.Success)
    {
        _consecutiveFailures = 0;
        _log.Info("Reconnected {0} on {1} endpoint", _connectionName, _activeEndpoint);
        ReSubscribeAll();
        BecomeConnected();
        return;
    }

    _consecutiveFailures++;
    _log.Warning("Reconnect attempt {0}/{1} failed for {2} on {3}: {4}",
        _consecutiveFailures, _failoverRetryCount, _connectionName, _activeEndpoint, result.Error);

    // Without a backup config the actor never switches; it keeps retrying the primary.
    if (_consecutiveFailures >= _failoverRetryCount && _backupConfig != null)
    {
        // Switch to the other endpoint and restart the failure count.
        var previousEndpoint = _activeEndpoint;
        _activeEndpoint = _activeEndpoint == ActiveEndpoint.Primary
            ? ActiveEndpoint.Backup
            : ActiveEndpoint.Primary;
        _consecutiveFailures = 0;
        var newConfig = _activeEndpoint == ActiveEndpoint.Primary ? _primaryConfig : _backupConfig;
        _log.Warning("Failing over {0} from {1} to {2}", _connectionName, previousEndpoint, _activeEndpoint);

        // Dispose the old adapter (fire-and-forget) and create a fresh one.
        _ = _adapter.DisposeAsync();
        _adapter = _factory.Create(_protocolType, newConfig);
        _connectionDetails = newConfig;

        // Wire up the disconnect handler on the new adapter. The callback may run
        // outside the actor context, so it uses the captured _self reference.
        _adapter.Disconnected += () => _self.Tell(new AdapterDisconnected());
    }

    // Schedule the next retry.
    Context.System.Scheduler.ScheduleTellOnce(
        _options.ReconnectInterval, Self, AttemptConnect.Instance, ActorRefs.NoSender);
}
```
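The counting and switching rules in `HandleReconnectResult` can also be isolated from Akka.NET so they are unit-testable without a test kit. This is a sketch of that idea, not part of the plan: `FailoverTracker` and its members are hypothetical names, and rejecting a retry count below 1 is an added assumption (the handler above does not validate it).

```csharp
using System;

public enum ActiveEndpoint { Primary, Backup }

// Hypothetical helper that mirrors the handler's counter and endpoint switch.
public sealed class FailoverTracker
{
    private readonly int _failoverRetryCount;
    private readonly bool _hasBackup;
    private int _consecutiveFailures;

    public FailoverTracker(int failoverRetryCount, bool hasBackup)
    {
        // Assumption: a count below 1 is a configuration error.
        if (failoverRetryCount < 1)
            throw new ArgumentOutOfRangeException(nameof(failoverRetryCount));
        _failoverRetryCount = failoverRetryCount;
        _hasBackup = hasBackup;
    }

    public ActiveEndpoint Active { get; private set; } = ActiveEndpoint.Primary;

    // A successful connect clears the failure streak; the endpoint stays put.
    public void RecordSuccess() => _consecutiveFailures = 0;

    // Returns true when the caller should dispose the adapter and create a
    // new one for the (already updated) Active endpoint.
    public bool RecordFailure()
    {
        _consecutiveFailures++;
        if (!_hasBackup || _consecutiveFailures < _failoverRetryCount)
            return false;

        Active = Active == ActiveEndpoint.Primary
            ? ActiveEndpoint.Backup
            : ActiveEndpoint.Primary;
        _consecutiveFailures = 0;
        return true;
    }
}
```

With a retry count of 2 and a backup configured, four consecutive failures move the tracker from Primary to Backup and back to Primary; with no backup, `RecordFailure` never asks for a switch.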
### Step 4: Pass IDataConnectionFactory to DataConnectionActor
Update `DataConnectionManagerActor.HandleCreateConnection` to pass the factory:
```csharp
var props = Props.Create(() => new DataConnectionActor(
    command.ConnectionName, adapter, _options, _healthCollector,
    _factory, // pass factory for failover adapter creation
    command.ProtocolType, command.PrimaryConnectionDetails,
    command.BackupConnectionDetails, command.FailoverRetryCount));
```
And update the DataConnectionActor constructor to store `_factory`.
### Step 5: Build and run existing tests
Run: `dotnet build ScadaLink.slnx && dotnet test tests/ScadaLink.DataConnectionLayer.Tests`
Existing tests must pass (they use single-endpoint configs, so no failover is triggered).
### Step 6: Commit
```bash
git add -A
git commit -m "feat(dcl): add failover state machine to DataConnectionActor with round-robin endpoint switching"
```
---
## Task 4: Failover Tests
**Files:**
- Modify: `tests/ScadaLink.DataConnectionLayer.Tests/DataConnectionActorTests.cs`
### Step 1: Write test — failover after N retries
```csharp
[Fact]
public async Task Reconnecting_AfterFailoverRetryCount_SwitchesToBackup()
{
    // Arrange: create actor with primary + backup, failoverRetryCount = 2
    var primaryAdapter = Substitute.For<IDataConnection>();
    var backupAdapter = Substitute.For<IDataConnection>();
    var factory = Substitute.For<IDataConnectionFactory>();
    factory.Create("OpcUa", Arg.Is<IDictionary<string, string>>(d => d["endpoint"] == "backup"))
        .Returns(backupAdapter);

    // Primary connects then disconnects
    primaryAdapter.ConnectAsync(Arg.Any<IDictionary<string, string>>(), Arg.Any<CancellationToken>())
        .Returns(Task.CompletedTask);
    primaryAdapter.Status.Returns(ConnectionHealth.Connected);

    var primaryConfig = new Dictionary<string, string> { ["endpoint"] = "primary" };
    var backupConfig = new Dictionary<string, string> { ["endpoint"] = "backup" };

    // Create actor, connect on primary
    // ... (use test kit patterns from existing tests)
    // Simulate disconnect, verify 2 failures then factory.Create called with backup config
}
```
### Step 2: Write test — single endpoint retries forever
```csharp
[Fact]
public async Task Reconnecting_NoBackup_RetriesIndefinitely()
{
// Arrange: create actor with primary only, no backup
// Simulate 10 reconnect failures
// Verify: factory.Create never called with backup, just keeps retrying
}
```
### Step 3: Write test — round-robin back to primary after backup fails
```csharp
[Fact]
public async Task Reconnecting_BackupFails_SwitchesBackToPrimary()
{
// Arrange: primary + backup, failoverRetryCount = 1
// Simulate: primary fails 1x → switch to backup → backup fails 1x → switch to primary
// Verify: round-robin pattern
}
```
### Step 4: Write test — successful reconnect resets counter
```csharp
[Fact]
public async Task Reconnecting_SuccessfulConnect_ResetsConsecutiveFailures()
{
// Arrange: failoverRetryCount = 3
// Simulate: 2 failures on primary, then success
// Verify: no failover, counter reset
}
```
### Step 5: Write test — ReSubscribeAll called after failover
```csharp
[Fact]
public async Task Failover_ReSubscribesAllTagsOnNewAdapter()
{
// Arrange: actor with subscriptions, then failover
// Verify: new adapter receives SubscribeAsync calls for all previously subscribed tags
}
```
### Step 6: Run all tests
Run: `dotnet test tests/ScadaLink.DataConnectionLayer.Tests -v normal`
### Step 7: Commit
```bash
git add -A
git commit -m "test(dcl): add failover state machine tests for DataConnectionActor"
```
---
## Task 5: Health Reporting & Site Event Logging
**Files:**
- Modify: `src/ScadaLink.Commons/Messages/DataConnection/DataConnectionHealthReport.cs`
- Modify: `src/ScadaLink.DataConnectionLayer/Actors/DataConnectionActor.cs` (ReplyWithHealthReport, HandleReconnectResult)
### Step 1: Add ActiveEndpoint to health report
```csharp
public record DataConnectionHealthReport(
    string ConnectionName,
    ConnectionHealth Status,
    int TotalSubscribedTags,
    int ResolvedTags,
    string ActiveEndpoint,
    DateTimeOffset Timestamp);
```
### Step 2: Update ReplyWithHealthReport in DataConnectionActor
Update the health report method (around line 516) to include the active endpoint:
```csharp
private void ReplyWithHealthReport()
{
    var endpointLabel = _backupConfig == null
        ? "Primary (no backup)"
        : _activeEndpoint.ToString();

    Sender.Tell(new DataConnectionHealthReport(
        _connectionName, _adapter.Status,
        _subscriptionsByInstance.Values.Sum(s => s.Count),
        _resolvedTags,
        endpointLabel,
        DateTimeOffset.UtcNow));
}
```
### Step 3: Add site event logging on failover
In `HandleReconnectResult`, after switching endpoints, log a site event:
```csharp
if (_siteEventLogger != null)
{
    _ = _siteEventLogger.LogEventAsync(
        "connection", "Warning", null, _connectionName,
        $"Failover from {previousEndpoint} to {_activeEndpoint}",
        $"After {_failoverRetryCount} consecutive failures");
}
```
Note: The actor needs `ISiteEventLogger` injected. Add it as an optional constructor parameter.
### Step 4: Add site event logging on successful reconnect after failover
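In the `HandleReconnectResult` success path, log a site event when the connection comes back on a different endpoint than the last one that connected successfully. The snippet shows only the logging call; as written it would log on every successful reconnect, so guard it with a comparison against the last known good endpoint (which requires an additional field on the actor):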
```csharp
if (_siteEventLogger != null)
{
    _ = _siteEventLogger.LogEventAsync(
        "connection", "Info", null, _connectionName,
        $"Connection restored on {_activeEndpoint} endpoint", null);
}
```
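One way to decide whether the endpoint "changed from last known good" is to remember the endpoint of the previous successful connect. The sketch below is an assumption about how that check could look, not something the design doc specifies; `LastKnownGoodEndpoint` and `RecordConnected` are hypothetical names.

```csharp
public enum ActiveEndpoint { Primary, Backup }

// Hypothetical helper: remembers which endpoint last connected successfully.
public sealed class LastKnownGoodEndpoint
{
    private ActiveEndpoint? _lastGood;

    // Call on every successful connect. Returns true only when the connection
    // came back on a different endpoint than the previous successful one,
    // which is the case that warrants a "restored after failover" site event.
    public bool RecordConnected(ActiveEndpoint current)
    {
        var changed = _lastGood.HasValue && _lastGood.Value != current;
        _lastGood = current;
        return changed;
    }
}
```

In the actor, the success path would call `RecordConnected(_activeEndpoint)` and emit the "Connection restored" event only when it returns `true`. The very first connect returns `false`, so normal startup does not produce a restore event.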
### Step 5: Build and test
Run: `dotnet build ScadaLink.slnx && dotnet test tests/ScadaLink.DataConnectionLayer.Tests`
### Step 6: Commit
```bash
git add -A
git commit -m "feat(dcl): add active endpoint to health reports and log failover events"
```
---
## Task 6: Central UI Changes
**Files:**
- Modify: `src/ScadaLink.CentralUI/Components/Pages/Admin/DataConnections.razor`
- Modify: `src/ScadaLink.CentralUI/Components/Pages/Admin/DataConnectionForm.razor`
### Step 1: Update DataConnections list page
Add an `Active Endpoint` column to the table (around lines 28-64). Insert it after the Protocol column:
```html
<th>Active Endpoint</th>
```
And in the row template:
```html
<td>@connection.ActiveEndpoint</td>
```
This requires the list page to fetch health data alongside the connection list. Add a health status lookup or include `ActiveEndpoint` in the data connection response.
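If the list page goes the "health status lookup" route, the join can be as simple as a dictionary keyed by connection name. This is a rough sketch under stated assumptions: `ConnectionRow`, `HealthSnapshot`, and `ConnectionListBuilder` are hypothetical types, connection names are assumed unique within the list being rendered, and "Unknown" is an invented placeholder for connections that have not reported health yet.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical view-model types, trimmed to what the list page needs.
public record ConnectionRow(string Name, string Protocol, string ActiveEndpoint);
public record HealthSnapshot(string ConnectionName, string ActiveEndpoint);

public static class ConnectionListBuilder
{
    // Joins stored connections with health snapshots by connection name.
    // If several snapshots exist for a name, the last one in the sequence wins.
    // Connections without a snapshot show "Unknown".
    public static IReadOnlyList<ConnectionRow> Build(
        IEnumerable<(string Name, string Protocol)> connections,
        IEnumerable<HealthSnapshot> health)
    {
        var byName = health
            .GroupBy(h => h.ConnectionName)
            .ToDictionary(g => g.Key, g => g.Last().ActiveEndpoint);

        return connections
            .Select(c => new ConnectionRow(
                c.Name,
                c.Protocol,
                byName.TryGetValue(c.Name, out var endpoint) ? endpoint : "Unknown"))
            .ToList();
    }
}
```

The alternative mentioned above, including `ActiveEndpoint` in the data connection response itself, avoids the client-side join but couples the configuration query to live health state.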
### Step 2: Update DataConnectionForm — rename Configuration label
Change the "Configuration" label to "Primary Endpoint Configuration" (around lines 44-61).
### Step 3: Add backup endpoint section
Below the primary config field, add:
```html
@if (!_showBackup)
{
    <button type="button" class="btn btn-outline-secondary btn-sm mt-2"
            @onclick="() => _showBackup = true">
        Add Backup Endpoint
    </button>
}
else
{
    <div class="mt-3">
        <div class="d-flex justify-content-between align-items-center">
            <label class="form-label">Backup Endpoint Configuration</label>
            <button type="button" class="btn btn-outline-danger btn-sm"
                    @onclick="RemoveBackup">
                Remove Backup
            </button>
        </div>
        <textarea class="form-control" rows="4"
                  @bind="_model.BackupConfiguration"
                  placeholder='{"Host": "backup-host", "Port": 50101}' />
    </div>
    <div class="mt-3">
        <label class="form-label">Failover Retry Count</label>
        <input type="number" class="form-control" min="1" max="20"
               @bind="_model.FailoverRetryCount" />
        <small class="text-muted">Retries before switching to backup (default: 3)</small>
    </div>
}
```
### Step 4: Update form model and save logic
Add `BackupConfiguration` and `FailoverRetryCount` to the form model. Update the save method to pass both configs to the management API.
In edit mode, set `_showBackup = true` if `BackupConfiguration` is not null.
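A possible shape for the form model is sketched below. It is illustrative only: `DataConnectionFormModel`, `HasBackup`, and `Validate` are hypothetical, the 1-20 range is taken from the `min`/`max` attributes on the input above, and two behaviours are assumptions rather than requirements from the design doc (treating the primary configuration as mandatory, and resetting the retry count to 3 when the backup is removed).

```csharp
using System.Text.Json;

// Hypothetical form model; property names mirror the entity fields in this plan.
public sealed class DataConnectionFormModel
{
    public string? PrimaryConfiguration { get; set; }
    public string? BackupConfiguration { get; set; }
    public int FailoverRetryCount { get; set; } = 3;

    // Edit mode: show the backup section when a backup is already stored.
    public bool HasBackup => !string.IsNullOrWhiteSpace(BackupConfiguration);

    // Backs the "Remove Backup" button: clear the config and
    // (assumption) restore the default retry count.
    public void RemoveBackup()
    {
        BackupConfiguration = null;
        FailoverRetryCount = 3;
    }

    // Returns an error message, or null when the model can be saved.
    public string? Validate()
    {
        if (!IsJsonObject(PrimaryConfiguration))
            return "Primary configuration must be a JSON object.";
        if (HasBackup && !IsJsonObject(BackupConfiguration))
            return "Backup configuration must be a JSON object.";
        if (FailoverRetryCount < 1 || FailoverRetryCount > 20)
            return "Failover retry count must be between 1 and 20.";
        return null;
    }

    private static bool IsJsonObject(string? json)
    {
        if (string.IsNullOrWhiteSpace(json)) return false;
        try
        {
            using var doc = JsonDocument.Parse(json);
            return doc.RootElement.ValueKind == JsonValueKind.Object;
        }
        catch (JsonException)
        {
            return false;
        }
    }
}
```

On load, the page would set `_showBackup = _model.HasBackup`; on save, it would call `Validate()` before sending the command and pass `null` for the backup when the section is hidden.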
### Step 5: Build and verify visually
Run: `dotnet build ScadaLink.slnx`
Visual verification requires running the cluster — document as manual test.
### Step 6: Commit
```bash
git add -A
git commit -m "feat(ui): add primary/backup endpoint fields to data connection form"
```
---
## Task 7: CLI, Management API, and Deployment
**Files:**
- Modify: `src/ScadaLink.Commons/Messages/Management/DataConnectionCommands.cs`
- Modify: `src/ScadaLink.CLI/Commands/DataConnectionCommands.cs`
- Modify: `src/ScadaLink.ManagementService/ManagementActor.cs` (lines 689-711)
- Modify: Deployment/flattening code that creates DataConnectionArtifact
### Step 1: Update management command messages
```csharp
public record CreateDataConnectionCommand(
    int SiteId, string Name, string Protocol,
    string? PrimaryConfiguration,
    string? BackupConfiguration = null,
    int FailoverRetryCount = 3);

public record UpdateDataConnectionCommand(
    int DataConnectionId, string Name, string Protocol,
    string? PrimaryConfiguration,
    string? BackupConfiguration = null,
    int FailoverRetryCount = 3);
```
### Step 2: Update ManagementActor handlers
In `HandleCreateDataConnection` (around line 689): set `PrimaryConfiguration`, `BackupConfiguration`, `FailoverRetryCount` from command.
In `HandleUpdateDataConnection` (around line 699): same fields.
### Step 3: Update CLI commands
In `BuildCreate` (around lines 75-98):
- Rename `--configuration` to `--primary-config`
- Add hidden alias `--configuration` pointing to the same option
- Add `--backup-config` option (optional)
- Add `--failover-retry-count` option (optional, default 3)
In `BuildUpdate` (around lines 36-59): same changes.
In `BuildGet` (around lines 22-34): update output to show both configs.
### Step 4: Update deployment artifact creation
Find where `DataConnectionArtifact` is constructed (in deployment/flattening code). Update to pass `PrimaryConfigurationJson` and `BackupConfigurationJson` from the entity.
### Step 5: Build and test CLI
Run: `dotnet build ScadaLink.slnx`
Test CLI manually:
```bash
scadalink data-connection create --site-id 1 --name "Test" --protocol OpcUa \
--primary-config '{"endpoint":"opc.tcp://localhost:50000"}' \
--backup-config '{"endpoint":"opc.tcp://localhost:50010"}' \
--failover-retry-count 3
```
### Step 6: Commit
```bash
git add -A
git commit -m "feat(cli): add --primary-config, --backup-config, --failover-retry-count to data connection commands"
```
---
## Task 8: Documentation Updates
**Files:**
- Modify: `docs/requirements/Component-DataConnectionLayer.md`
- Modify: `docs/requirements/HighLevelReqs.md`
- Modify: `docs/requirements/Component-CentralUI.md`
- Modify: `docs/test_infra/test_infra.md`
### Step 1: Update Component-DataConnectionLayer.md
Add new section "Endpoint Redundancy" covering:
- Optional backup endpoints
- Failover state machine (include ASCII diagram from design doc)
- Configuration model (PrimaryConfiguration + BackupConfiguration)
- Failover retry count and round-robin behavior
- Subscription re-creation on failover
- Health reporting (ActiveEndpoint field)
- Site event logging (DataConnectionFailover, DataConnectionRestored)
Update the configuration reference tables to show the new entity fields.
### Step 2: Update HighLevelReqs.md
Add requirement: "Data connections support optional backup endpoints with automatic failover after a configurable retry count. On failover, all subscriptions are transparently re-created on the new endpoint."
### Step 3: Update Component-CentralUI.md
Update the Data Connections workflow section to describe:
- Primary/backup config fields on the form
- Collapsible backup section
- Failover retry count field
- Active endpoint column on list page
### Step 4: Update test_infra.md
Add a note in the Remote Test Infrastructure section that the dual OPC UA servers (50000/50010) and dual LmxProxy instances (50100/50101) enable primary/backup testing.
### Step 5: Commit
```bash
git add -A
git commit -m "docs(dcl): document primary/backup endpoint redundancy across requirements and test infra"
```


@@ -0,0 +1,14 @@
{
  "planPath": "docs/plans/2026-03-22-primary-backup-data-connections.md",
  "tasks": [
    {"id": 1, "subject": "Task 1: Entity Model & Database Migration", "status": "pending"},
    {"id": 2, "subject": "Task 2: Update CreateConnectionCommand & Manager Actor", "status": "pending", "blockedBy": [1]},
    {"id": 3, "subject": "Task 3: DataConnectionActor Failover State Machine", "status": "pending", "blockedBy": [1, 2]},
    {"id": 4, "subject": "Task 4: Failover Tests", "status": "pending", "blockedBy": [3]},
    {"id": 5, "subject": "Task 5: Health Reporting & Site Event Logging", "status": "pending", "blockedBy": [3]},
    {"id": 6, "subject": "Task 6: Central UI Changes", "status": "pending", "blockedBy": [1]},
    {"id": 7, "subject": "Task 7: CLI, Management API, and Deployment", "status": "pending", "blockedBy": [1]},
    {"id": 8, "subject": "Task 8: Documentation Updates", "status": "pending", "blockedBy": [3]}
  ],
  "lastUpdated": "2026-03-22T12:00:00Z"
}