# 04 — Cluster Sharding (Akka.Cluster.Sharding)
## Overview
Cluster Sharding distributes a set of actors (called "entities") across cluster members by hashing each entity's unique ID into one of a fixed number of shards; a coordinator then allocates whole shards to nodes. The sharding infrastructure ensures each entity exists on exactly one node at a time, automatically rebalancing shards when nodes join or leave.

In our SCADA system, Cluster Sharding is a candidate for managing device actors across the cluster — but with a 2-node active/cold-standby topology where only the active node communicates with equipment, the value proposition is nuanced.
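The ID-to-shard mapping can be pictured with a minimal sketch. This shows the idea only — it is not Akka.NET's actual hash (which comes from the configured message extractor), and the device ID and shard count are illustrative:

```csharp
using System;

public static class ShardMath
{
    // Conceptual sketch only: an entity ID always hashes to the same shard,
    // and whole shards (not individual entities) are what the coordinator
    // allocates and rebalances across nodes.
    public static int ShardFor(string entityId, int maxShards) =>
        (int)((uint)entityId.GetHashCode() % (uint)maxShards);

    public static void Main()
    {
        // Within one process, the same device ID maps to the same shard
        Console.WriteLine(ShardFor("plc-042", 100) == ShardFor("plc-042", 100)); // True
        Console.WriteLine(ShardFor("plc-042", 100) is >= 0 and < 100);           // True
    }
}
```

Because routing depends only on the entity ID, a sender never needs to know which node currently hosts a device.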
## When to Use
- If the system evolves beyond a strict active/standby model to allow both nodes to handle subsets of devices
- If device counts grow beyond what a single node can handle (significantly more than 500 devices with high-frequency tag updates)
- If the architecture shifts to active/active with partitioned device ownership
## When Not to Use
- In the current design (active/cold-standby), Cluster Singleton is a better fit for owning all device communication on a single node
- Sharding adds complexity (shard coordinators, rebalancing, remember-entities configuration) that is unnecessary when one node does all the work
- With only 2 nodes, rebalancing is not meaningful — entities would just move from one node to the other
## Design Decisions for the SCADA System
### Current Recommendation: Do Not Use Sharding
For the current architecture where one node is active and the other is a cold standby, Cluster Sharding adds overhead without benefit. The active node hosts all device actors via the Singleton pattern, and on failover, all actors restart on the new active node.
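For contrast, the Singleton hosting that this recommendation relies on can be sketched as follows. This is a sketch, not the production wiring: `DeviceSupervisor` is a hypothetical parent actor for all device actors, and the code assumes the Akka.Cluster.Tools package is referenced and the cluster is already configured:

```csharp
using Akka.Actor;
using Akka.Cluster.Tools.Singleton;

var system = ActorSystem.Create("scada");

// One DeviceSupervisor instance runs on the oldest node; after failover,
// it and all of its device child actors restart on the surviving node.
system.ActorOf(
    ClusterSingletonManager.Props(
        singletonProps: Props.Create(() => new DeviceSupervisor()),
        terminationMessage: PoisonPill.Instance,
        settings: ClusterSingletonManagerSettings.Create(system)),
    name: "device-supervisor");

// Both nodes address the singleton through a proxy, wherever it lives
var proxy = system.ActorOf(
    ClusterSingletonProxy.Props(
        singletonManagerPath: "/user/device-supervisor",
        settings: ClusterSingletonProxySettings.Create(system)),
    name: "device-supervisor-proxy");

// Hypothetical supervisor that owns all device actors on the active node
public class DeviceSupervisor : ReceiveActor
{
    public DeviceSupervisor() => ReceiveAny(_ => { /* route to device children */ });
}
```

Note that on failover all device state is rebuilt from scratch on the new node, which is exactly the behavior described above.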
### Future Consideration: Active/Active Migration Path
If the system eventually needs to scale beyond one node's capacity, Sharding provides a clean migration path. Each device becomes a sharded entity identified by its device ID:
```csharp
// Hypothetical future sharding setup
var sharding = ClusterSharding.Get(system);
var deviceRegion = sharding.Start(
    typeName: "Device",
    entityPropsFactory: entityId => Props.Create(() => new DeviceActor(entityId)),
    settings: ClusterShardingSettings.Create(system),
    messageExtractor: new DeviceMessageExtractor()
);
```
The message extractor would route based on device ID:
```csharp
public class DeviceMessageExtractor : HashCodeMessageExtractor
{
    public DeviceMessageExtractor() : base(maxNumberOfShards: 100) { }

    public override string EntityId(object message) => message switch
    {
        IDeviceMessage m => m.DeviceId,
        _ => null // non-device messages are not routed by sharding
    };
}
```
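The `IDeviceMessage` contract is assumed above rather than shown. A minimal version, plus how a caller would address a device through the region from the earlier snippet, might look like this (the message type and device ID are hypothetical):

```csharp
// Assumed contract: every sharded message carries its target device ID
public interface IDeviceMessage
{
    string DeviceId { get; }
}

public sealed record ReadTags(string DeviceId) : IDeviceMessage;

// Callers never look up which node owns a device; they Tell the shard
// region and sharding routes the message by DeviceId:
deviceRegion.Tell(new ReadTags("plc-042"));
```

If the entity does not yet exist on the owning node, sharding creates it on first message, which is why remember-entities (below) matters for actors that must run without being messaged.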
### If Sharding Is Adopted: Remember Entities
For SCADA, if sharding is used, enable `remember-entities` so that device actors are automatically restarted after rebalancing without waiting for a new message:
```hocon
akka.cluster.sharding {
  remember-entities = on
  remember-entities-store = "eventsourced" # Requires Akka.Persistence
}
```
This is important because device actors need to maintain their tag subscriptions — they can't wait for an external trigger to restart.
## Common Patterns
### Passivation (Not Recommended for SCADA)
Sharding supports passivating idle entities to save memory. For SCADA device actors, passivation is inappropriate — devices need continuous monitoring regardless of command activity. If sharding is used, disable passivation or set very long timeouts:
```hocon
akka.cluster.sharding {
  passivate-idle-entity-after = off
}
```
### Shard Count Sizing
If sharding is adopted, the number of shards should be significantly larger than the number of nodes but not excessively large. For 2 nodes with 500 devices, 50–100 shards is reasonable. The shard count (`maxNumberOfShards`) must be identical on every node and is effectively permanent: changing it remaps entity IDs to different shards, which requires migrating any persisted sharding state.
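The sizing guidance above is easy to sanity-check arithmetically. This is a back-of-envelope sketch: the 500-device and 100-shard figures are this document's planning numbers, and an even hash distribution is an assumption:

```csharp
using System;

public static class ShardSizing
{
    public static void Main()
    {
        const int devices = 500;
        const int maxShards = 100;
        const int nodes = 2;

        // With an even hash distribution, each shard holds ~5 device entities...
        Console.WriteLine(devices / maxShards); // 5
        // ...and a balanced 2-node cluster hosts 50 shards per node.
        Console.WriteLine(maxShards / nodes);   // 50
    }
}
```

Small shards (few entities each) keep rebalancing granular; the cost of more shards is more coordinator bookkeeping, which is why the count should not be excessively large.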
## Anti-Patterns
### Sharding as a Default Choice
Do not adopt Sharding just because it's available. For a 2-node active/standby system, it adds a shard coordinator (which itself is a Singleton), persistence requirements for shard state, and rebalancing logic — all of which increase failure surface area without providing benefit.
### Mixing Sharding and Singleton for the Same Concern
If device actors are managed by Sharding, do not also wrap them in a Singleton. These are alternative distribution strategies. Pick one.
## Configuration Guidance
If Sharding is adopted in the future, here is a starting configuration for the SCADA system:
```hocon
akka.cluster.sharding {
  guardian-name = "sharding"
  role = "scada-node"
  remember-entities = on
  remember-entities-store = "eventsourced"
  passivate-idle-entity-after = off

  # Keep rebalancing conservative for the small 2-node cluster
  least-shard-allocation-strategy {
    rebalance-absolute-limit = 5
    rebalance-relative-limit = 0.1
  }
}
```
## References
- Official Documentation: <https://getakka.net/articles/clustering/cluster-sharding.html>
- Sharding Configuration: <https://getakka.net/articles/configuration/modules/akka.cluster.sharding.html>