# 04 — Cluster Sharding (Akka.Cluster.Sharding)
## Overview
Cluster Sharding distributes a set of actors (called "entities") across cluster members by hashing each entity's unique ID into one of a fixed number of shards; a coordinator then allocates whole shards to nodes. The sharding infrastructure ensures each entity exists on exactly one node at a time, automatically rebalancing shards when nodes join or leave.

In our SCADA system, Cluster Sharding is a candidate for managing device actors across the cluster — but with a 2-node active/cold-standby topology where only the active node communicates with equipment, the value proposition is nuanced.
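The ID-to-shard mapping can be pictured with a minimal sketch. This shows the idea only — it is not Akka.NET's actual hash (which comes from the configured message extractor), and the device ID and shard count are illustrative:

```csharp
using System;

public static class ShardMath
{
    // Conceptual sketch only: an entity ID always hashes to the same shard,
    // and whole shards (not individual entities) are what the coordinator
    // allocates and rebalances across nodes.
    public static int ShardFor(string entityId, int maxShards) =>
        (int)((uint)entityId.GetHashCode() % (uint)maxShards);

    public static void Main()
    {
        // Within one process, the same device ID maps to the same shard
        Console.WriteLine(ShardFor("plc-042", 100) == ShardFor("plc-042", 100)); // True
        Console.WriteLine(ShardFor("plc-042", 100) is >= 0 and < 100);           // True
    }
}
```

Because routing depends only on the entity ID, a sender never needs to know which node currently hosts a device.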
## When to Use
- If the system evolves beyond a strict active/standby model to allow both nodes to handle subsets of devices
- If device counts grow beyond what a single node can handle (significantly more than 500 devices with high-frequency tag updates)
- If the architecture shifts to active/active with partitioned device ownership
## When Not to Use
- In the current design (active/cold-standby), Cluster Singleton is a better fit for owning all device communication on a single node
- Sharding adds complexity (shard coordinators, rebalancing, remember-entities configuration) that is unnecessary when one node does all the work
- With only 2 nodes, rebalancing is not meaningful — entities would just move from one node to the other
## Design Decisions for the SCADA System
### Current Recommendation: Do Not Use Sharding
For the current architecture where one node is active and the other is a cold standby, Cluster Sharding adds overhead without benefit. The active node hosts all device actors via the Singleton pattern, and on failover, all actors restart on the new active node.
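For contrast, the Singleton hosting that this recommendation relies on can be sketched as follows. This is a sketch, not the production wiring: `DeviceSupervisor` is a hypothetical parent actor for all device actors, and the code assumes the Akka.Cluster.Tools package is referenced and the cluster is already configured:

```csharp
using Akka.Actor;
using Akka.Cluster.Tools.Singleton;

var system = ActorSystem.Create("scada");

// One DeviceSupervisor instance runs on the oldest node; after failover,
// it and all of its device child actors restart on the surviving node.
system.ActorOf(
    ClusterSingletonManager.Props(
        singletonProps: Props.Create(() => new DeviceSupervisor()),
        terminationMessage: PoisonPill.Instance,
        settings: ClusterSingletonManagerSettings.Create(system)),
    name: "device-supervisor");

// Both nodes address the singleton through a proxy, wherever it lives
var proxy = system.ActorOf(
    ClusterSingletonProxy.Props(
        singletonManagerPath: "/user/device-supervisor",
        settings: ClusterSingletonProxySettings.Create(system)),
    name: "device-supervisor-proxy");

// Hypothetical supervisor that owns all device actors on the active node
public class DeviceSupervisor : ReceiveActor
{
    public DeviceSupervisor() => ReceiveAny(_ => { /* route to device children */ });
}
```

Note that on failover all device state is rebuilt from scratch on the new node, which is exactly the behavior described above.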
### Future Consideration: Active/Active Migration Path
If the system eventually needs to scale beyond one node's capacity, Sharding provides a clean migration path. Each device becomes a sharded entity identified by its device ID:
```csharp
// Hypothetical future sharding setup
var sharding = ClusterSharding.Get(system);
var deviceRegion = sharding.Start(
    typeName: "Device",
    entityPropsFactory: entityId => Props.Create(() => new DeviceActor(entityId)),
    settings: ClusterShardingSettings.Create(system),
    messageExtractor: new DeviceMessageExtractor()
);
```
The message extractor would route based on device ID:
```csharp
public class DeviceMessageExtractor : HashCodeMessageExtractor
{
    public DeviceMessageExtractor() : base(maxNumberOfShards: 100) { }

    public override string EntityId(object message) => message switch
    {
        IDeviceMessage m => m.DeviceId,
        _ => null // non-device messages are not routed by sharding
    };
}
```
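The `IDeviceMessage` contract is assumed above rather than shown. A minimal version, plus how a caller would address a device through the region from the earlier snippet, might look like this (the message type and device ID are hypothetical):

```csharp
// Assumed contract: every sharded message carries its target device ID
public interface IDeviceMessage
{
    string DeviceId { get; }
}

public sealed record ReadTags(string DeviceId) : IDeviceMessage;

// Callers never look up which node owns a device; they Tell the shard
// region and sharding routes the message by DeviceId:
deviceRegion.Tell(new ReadTags("plc-042"));
```

If the entity does not yet exist on the owning node, sharding creates it on first message, which is why remember-entities (below) matters for actors that must run without being messaged.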
### If Sharding Is Adopted: Remember Entities
For SCADA, if sharding is used, enable `remember-entities` so that device actors are automatically restarted after rebalancing without waiting for a new message:
```hocon
akka.cluster.sharding {
  remember-entities = on
  remember-entities-store = "eventsourced" # Requires Akka.Persistence
}
```
This is important because device actors need to maintain their tag subscriptions — they can't wait for an external trigger to restart.
## Common Patterns
### Passivation (Not Recommended for SCADA)
Sharding supports passivating idle entities to save memory. For SCADA device actors, passivation is inappropriate — devices need continuous monitoring regardless of command activity. If sharding is used, disable passivation or set very long timeouts:
```hocon
akka.cluster.sharding {
  passivate-idle-entity-after = off
}
```
### Shard Count Sizing
If sharding is adopted, the number of shards should be significantly larger than the number of nodes but not excessively large. For 2 nodes with 500 devices, 50–100 shards is reasonable. The shard count (`maxNumberOfShards`) must be identical on every node and is effectively permanent: changing it remaps entity IDs to different shards, which requires migrating any persisted sharding state.
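The sizing guidance above is easy to sanity-check arithmetically. This is a back-of-envelope sketch: the 500-device and 100-shard figures are this document's planning numbers, and an even hash distribution is an assumption:

```csharp
using System;

public static class ShardSizing
{
    public static void Main()
    {
        const int devices = 500;
        const int maxShards = 100;
        const int nodes = 2;

        // With an even hash distribution, each shard holds ~5 device entities...
        Console.WriteLine(devices / maxShards); // 5
        // ...and a balanced 2-node cluster hosts 50 shards per node.
        Console.WriteLine(maxShards / nodes);   // 50
    }
}
```

Small shards (few entities each) keep rebalancing granular; the cost of more shards is more coordinator bookkeeping, which is why the count should not be excessively large.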
## Anti-Patterns
### Sharding as a Default Choice
Do not adopt Sharding just because it's available. For a 2-node active/standby system, it adds a shard coordinator (which itself is a Singleton), persistence requirements for shard state, and rebalancing logic — all of which increase failure surface area without providing benefit.
### Mixing Sharding and Singleton for the Same Concern
If device actors are managed by Sharding, do not also wrap them in a Singleton. These are alternative distribution strategies. Pick one.
## Configuration Guidance
If Sharding is adopted in the future, here is a starting configuration for the SCADA system:
```hocon
akka.cluster.sharding {
  guardian-name = "sharding"
  role = "scada-node"
  remember-entities = on
  remember-entities-store = "eventsourced"
  passivate-idle-entity-after = off

  # Keep rebalancing conservative for the small 2-node cluster
  least-shard-allocation-strategy {
    rebalance-absolute-limit = 5
    rebalance-relative-limit = 0.1
  }
}
```
## References
- Official Documentation: <https://getakka.net/articles/clustering/cluster-sharding.html>
- Sharding Configuration: <https://getakka.net/articles/configuration/modules/akka.cluster.sharding.html>