Notes and documentation covering actors, remoting, clustering, persistence, streams, serialization, hosting, testing, and best practices for the Akka.NET framework used throughout the ScadaLink system.
15 — Coordination (Akka.Coordination)
Overview
Akka.Coordination provides lease-based distributed locking primitives. A lease is a time-bounded lock that a node acquires from an external store. It is used by the Split Brain Resolver, Cluster Sharding, and Cluster Singleton to prevent split-brain scenarios by ensuring only the node holding the lease can act as the leader or singleton host.
In the SCADA system's 2-node topology, lease-based coordination addresses the fundamental challenge of 2-node split-brain resolution: without a third-party arbiter, two partitioned nodes cannot determine which should survive.
When to Use
- If the keep-oldest SBR strategy with down-if-alone = on (see 03-Cluster.md) is insufficient: specifically, if the scenario where both nodes down themselves during a partition is unacceptable
- When a shared resource (file share, database) is available to serve as the lease store
- For stronger singleton guarantees — ensuring only the lease holder can run the Device Manager singleton
When Not to Use
- If the site has no shared resource accessible by both nodes — a lease requires an external store
- If the keep-oldest strategy with automatic Windows Service restart is acceptable for your availability requirements
- If the added dependency on the lease store introduces more risk than it mitigates (lease store becomes a single point of failure)
Design Decisions for the SCADA System
Lease Store Options
For Windows Server deployments without cloud services:
Option A: SMB File Share Lease
If the site has a shared filesystem (NAS, Windows file server), a file-based lease can work. However, Akka.NET does not ship a file-based lease implementation — a custom one would need to be built.
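As a starting point for such a custom implementation, the sketch below shows only the acquisition primitive: creating the lease file with FileMode.CreateNew, which fails if the file already exists. The class FileLeaseSketch and its methods are hypothetical and not part of Akka.NET. The sketch has no heartbeat or expiry, so a crashed holder would leave a stale file behind, and it assumes the file share honours exclusive-create semantics, which should be verified for the specific NAS or file server.

```csharp
using System;
using System.IO;

// Hypothetical helper, not part of Akka.NET: a lease file on a shared path.
public static class FileLeaseSketch
{
    // Returns true only for the caller that actually created the lease file.
    public static bool TryAcquire(string leasePath, string owner)
    {
        try
        {
            // CreateNew throws IOException if the file already exists.
            using var stream = new FileStream(
                leasePath, FileMode.CreateNew, FileAccess.Write, FileShare.None);
            using var writer = new StreamWriter(stream);
            writer.Write(owner);
            return true;
        }
        catch (IOException)
        {
            // File exists or the share is unavailable: treat as "not acquired".
            return false;
        }
    }

    // Deletes the lease file only if it records the given owner.
    // Note: the read-then-delete sequence is not atomic in this sketch.
    public static bool Release(string leasePath, string owner)
    {
        if (!File.Exists(leasePath) || File.ReadAllText(leasePath) != owner)
            return false;
        File.Delete(leasePath);
        return true;
    }
}
```

A real lease would also need a timestamp that the holder refreshes and that other nodes check, so that a stale file can be taken over after the heartbeat timeout.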
Option B: SQL Server Lease (Where Available)
For sites with SQL Server, use a database-backed lease. This provides strong consistency guarantees:
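// Custom lease implementation using SQL Server (skeleton: SQL access not shown)
using System;
using System.Threading.Tasks;
using Akka.Coordination;

public class SqlServerLease : Lease
{
    public SqlServerLease(LeaseSettings settings) : base(settings) { }
    // Acquire: INSERT with optimistic concurrency; Heartbeat: UPDATE timestamp periodically while held
    public override Task<bool> Acquire() => throw new NotImplementedException();
    public override Task<bool> Acquire(Action<Exception> leaseLostCallback) => throw new NotImplementedException();
    // Release: DELETE the lease row
    public override Task<bool> Release() => throw new NotImplementedException();
    // Check: SELECT where timestamp is recent
    public override bool CheckLease() => throw new NotImplementedException();
}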
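The four operations named in the comments above (acquire, heartbeat, release, check) can be illustrated with an in-memory model of a single lease row. This is a sketch under stated assumptions, not the SQL implementation: LeaseRowModel and its method names are hypothetical, the current time is passed in explicitly so the behaviour is deterministic, and a lock stands in for the atomicity that a single SQL statement would provide.

```csharp
using System;

// Hypothetical in-memory stand-in for a one-row lease table.
public sealed class LeaseRowModel
{
    private readonly object _gate = new object();
    private readonly TimeSpan _heartbeatTimeout;
    private string _owner;            // null means "no row exists"
    private DateTime _lastHeartbeat;

    public LeaseRowModel(TimeSpan heartbeatTimeout) => _heartbeatTimeout = heartbeatTimeout;

    private bool IsExpired(DateTime now) => now - _lastHeartbeat > _heartbeatTimeout;

    // Acquire: succeeds if no row exists, the row is ours, or the row has expired.
    public bool TryAcquire(string owner, DateTime now)
    {
        lock (_gate)
        {
            if (_owner != null && _owner != owner && !IsExpired(now)) return false;
            _owner = owner;
            _lastHeartbeat = now;
            return true;
        }
    }

    // Heartbeat: refresh the timestamp, but only while we still own a live row.
    public bool TryHeartbeat(string owner, DateTime now)
    {
        lock (_gate)
        {
            if (_owner != owner || IsExpired(now)) return false;
            _lastHeartbeat = now;
            return true;
        }
    }

    // Release: remove the row if it is ours.
    public bool Release(string owner)
    {
        lock (_gate)
        {
            if (_owner != owner) return false;
            _owner = null;
            return true;
        }
    }

    // Check: the row is ours and its timestamp is recent.
    public bool Check(string owner, DateTime now)
    {
        lock (_gate) { return _owner == owner && !IsExpired(now); }
    }
}
```

In this model a holder that stops heartbeating loses the lease once the timeout has elapsed, at which point another owner's TryAcquire succeeds. A database-backed version would express each method as one conditional INSERT, UPDATE, or DELETE and should compare timestamps using the database server's clock rather than each node's local clock.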
Option C: Azure Blob Storage Lease (Akka.Coordination.Azure)
If the site has Azure connectivity, Akka.Coordination.Azure provides a production-ready lease implementation using Azure Blob Storage:
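// WithClusterBootstrap configures discovery-based cluster formation, not leases.
// Assumed hosting extension from Akka.Coordination.Azure; confirm the overload against the package README.
akkaBuilder.WithAzureLease(options =>
{
    // Configure the Azure Blob Storage lease (connection string, container name)
});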
Current Recommendation:
For most on-premise SCADA sites without cloud access, use the keep-oldest SBR strategy without a lease, and rely on Windows Service auto-restart to recover from the "both nodes downed" scenario. The recovery time is longer (~1–2 minutes for service restart + cluster reformation) but avoids the lease store dependency.
If faster recovery or stronger guarantees are needed, implement a SQL Server-backed lease for sites that have SQL Server.
Lease-Based SBR Configuration
If a lease is available:
akka.cluster {
  downing-provider-class = "Akka.Cluster.SBR.SplitBrainResolverProvider, Akka.Cluster"
  split-brain-resolver {
    active-strategy = lease-majority
    lease-majority {
      lease-implementation = "custom-sql-lease"
      acquire-lease-delay-for-minority = 5s
    }
  }
}
Lease for Cluster Singleton
The Singleton can be configured to require a lease before starting. This provides an additional guarantee against duplicate singletons:
akka.cluster.singleton {
  use-lease = "custom-sql-lease"
  lease-retry-interval = 5s
}
Common Patterns
Lease Heartbeat
Leases are time-bounded. The holder must periodically renew (heartbeat) the lease. If the holder crashes without releasing, the lease expires after the heartbeat timeout, allowing another node to acquire it:
Heartbeat interval: 12s (default)
Heartbeat timeout: 120s (default)
This means after a node crash, it takes up to 120 seconds for the lease to expire and the standby to acquire it. Reduce the timeout for faster failover, but be cautious of network glitches causing false lease expiry:
custom-sql-lease {
  heartbeat-timeout = 30s
  heartbeat-interval = 5s
  lease-operation-timeout = 5s
}
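The effect of these two settings on failover time can be estimated with simple arithmetic. The helper below is a hypothetical sketch (LeaseTiming is not an Akka.NET type) and assumes an idealised model: heartbeats land exactly on the interval, and the lease expires exactly one timeout after the last successful heartbeat.

```csharp
using System;

// Hypothetical helper for reasoning about lease timing; not an Akka.NET API.
public static class LeaseTiming
{
    // Holder crashes right after a heartbeat: the standby waits the full timeout.
    public static TimeSpan WorstCaseTakeover(TimeSpan timeout) => timeout;

    // Holder crashes just before the next heartbeat was due.
    public static TimeSpan BestCaseTakeover(TimeSpan timeout, TimeSpan interval) =>
        timeout - interval;

    // How many heartbeat intervals fit into one timeout window.
    public static double RenewalsPerTimeout(TimeSpan timeout, TimeSpan interval) =>
        timeout.TotalSeconds / interval.TotalSeconds;
}
```

Under this model the defaults (120s timeout, 12s interval) give a takeover window of 108 to 120 seconds, while the tuned values (30s, 5s) give 25 to 30 seconds. The lower bound is also roughly how long the holder can stall, for example in a long GC pause that begins just before a heartbeat is due, before its lease lapses.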
Lease + SBR Interaction
When using lease-based SBR in a 2-node cluster:
- In normal operation the SBR does not hold the lease; it tries to acquire it only after a partition has been detected
- During a partition, the SBR on each side attempts to acquire the lease (the minority side first waits for acquire-lease-delay-for-minority)
- Only one side succeeds; that side survives and the other downs itself
- On the surviving node, the Singleton continues running
- No "both nodes down" scenario occurs (as long as the lease store is reachable)
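The decision each node makes during a partition can be modelled in a few lines. This is a simplified sketch, not the SBR's actual code: InMemoryLeaseStore, Decision, and PartitionModel are hypothetical names, the store is an in-process object standing in for the external arbiter, and the storeReachable flag models whether a node can still talk to it.

```csharp
using System;
using System.Threading;

// Hypothetical arbiter standing in for the external lease store.
public sealed class InMemoryLeaseStore
{
    private string _owner; // null while the lease is free

    // Atomic compare-and-set: only the first caller to arrive wins.
    public bool TryAcquire(string owner) =>
        Interlocked.CompareExchange(ref _owner, owner, null) == null;
}

public enum Decision { StayUp, DownSelf }

public static class PartitionModel
{
    // A node survives only if it can reach the store AND wins the lease.
    public static Decision Decide(string node, InMemoryLeaseStore store, bool storeReachable) =>
        storeReachable && store.TryAcquire(node) ? Decision.StayUp : Decision.DownSelf;
}
```

With both nodes able to reach the store, whichever calls Decide first stays up and the other downs itself. If neither can reach the store, both return DownSelf, which is why the guarantee against the "both nodes down" outcome depends on the lease store remaining reachable.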
Anti-Patterns
Lease Store on One of the Two SCADA Nodes
Never host the lease store on one of the SCADA nodes themselves. If that node goes down, the lease store goes with it, and the surviving node cannot acquire the lease. The lease store must be on an independent resource.
Very Short Lease Timeouts
Setting heartbeat-timeout below 10 seconds risks false lease expiry during garbage collection pauses, network blips, or high CPU load. This would cause the singleton to stop unnecessarily.
Assuming the Lease Prevents All Split-Brain
The lease only works if both nodes can reach the lease store. If the lease store itself is partitioned from one node, that node cannot acquire the lease and will down itself — even if it's otherwise healthy. Consider lease store availability as part of the system's overall availability design.
Configuration Guidance
Without Lease (Current Default)
No coordination configuration needed. Use keep-oldest SBR as described in 03-Cluster.md.
With SQL Server Lease (Future Enhancement)
custom-sql-lease {
  lease-class = "ScadaSystem.SqlServerLease, ScadaSystem"
  heartbeat-timeout = 30s
  heartbeat-interval = 5s
  lease-operation-timeout = 5s
}
akka.cluster.split-brain-resolver {
  active-strategy = lease-majority
  lease-majority {
    lease-implementation = "custom-sql-lease"
  }
}
References
- Akka Coordination Concepts: https://doc.akka.io/docs/akka/current/coordination.html
- Akka.Coordination.Azure: https://github.com/akkadotnet/Akka.Management
- Split Brain Resolver: https://getakka.net/articles/clustering/split-brain-resolver.html