scadalink-design/AkkaDotNet/04-ClusterSharding.md
Joseph Doherty de636b908b Add Akka.NET reference documentation
Notes and documentation covering actors, remoting, clustering, persistence,
streams, serialization, hosting, testing, and best practices for the Akka.NET
framework used throughout the ScadaLink system.
2026-03-16 09:08:17 -04:00


04 — Cluster Sharding (Akka.Cluster.Sharding)

Overview

Cluster Sharding distributes a set of actors (called "entities") across cluster members by grouping them into shards: a message extractor maps each entity ID to a shard, and a shard coordinator assigns shards to nodes. Entities are identified by a unique ID, and the sharding infrastructure ensures each entity exists on at most one node at a time, automatically rebalancing shards when nodes join or leave.

In our SCADA system, Cluster Sharding is a candidate for managing device actors across the cluster — but with a 2-node active/cold-standby topology where only the active node communicates with equipment, the value proposition is nuanced.

When to Use

  • If the system evolves beyond a strict active/standby model to allow both nodes to handle subsets of devices
  • If device counts grow beyond what a single node can handle (significantly more than 500 devices with high-frequency tag updates)
  • If the architecture shifts to active/active with partitioned device ownership

When Not to Use

  • In the current design (active/cold-standby), Cluster Singleton is a better fit for owning all device communication on a single node
  • Sharding adds complexity (shard coordinators, rebalancing, remember-entities configuration) that is unnecessary when one node does all the work
  • With only 2 nodes, rebalancing is not meaningful — entities would just move from one node to the other

Design Decisions for the SCADA System

Current Recommendation: Do Not Use Sharding

For the current architecture where one node is active and the other is a cold standby, Cluster Sharding adds overhead without benefit. The active node hosts all device actors via the Singleton pattern, and on failover, all actors restart on the new active node.
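For comparison, the current recommendation can be sketched with Cluster Singleton. This is a minimal illustration, assuming a hypothetical DeviceSupervisor actor that owns all device actors; the actor and system names are placeholders, not existing code:

```csharp
using Akka.Actor;
using Akka.Cluster.Tools.Singleton;

var system = ActorSystem.Create("scada");

// The manager guarantees exactly one DeviceSupervisor instance in the
// cluster (hosted on the oldest node). On failover it restarts elsewhere.
system.ActorOf(
    ClusterSingletonManager.Props(
        singletonProps: Props.Create(() => new DeviceSupervisor()),
        terminationMessage: PoisonPill.Instance,
        settings: ClusterSingletonManagerSettings.Create(system)),
    name: "device-supervisor");

// Every node addresses the singleton through a proxy, wherever it lives.
var proxy = system.ActorOf(
    ClusterSingletonProxy.Props(
        singletonManagerPath: "/user/device-supervisor",
        settings: ClusterSingletonProxySettings.Create(system)),
    name: "device-supervisor-proxy");
```

On failover, all device actors restart under the new supervisor instance, which matches the active/cold-standby behavior described above.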

Future Consideration: Active/Active Migration Path

If the system eventually needs to scale beyond one node's capacity, Sharding provides a clean migration path. Each device becomes a sharded entity identified by its device ID:

```csharp
// Hypothetical future sharding setup
using Akka.Actor;
using Akka.Cluster.Sharding;

var sharding = ClusterSharding.Get(system);
var deviceRegion = sharding.Start(
    typeName: "Device",
    entityPropsFactory: entityId => Props.Create(() => new DeviceActor(entityId)),
    settings: ClusterShardingSettings.Create(system),
    messageExtractor: new DeviceMessageExtractor()
);
```

The message extractor would route based on device ID:

```csharp
public class DeviceMessageExtractor : HashCodeMessageExtractor
{
    public DeviceMessageExtractor() : base(maxNumberOfShards: 100) { }

    public override string EntityId(object message) => message switch
    {
        IDeviceMessage m => m.DeviceId,
        _ => null  // null => message carries no device ID and is not routed
    };
}
```
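The extractor above assumes a device-addressed message contract. A minimal sketch of that contract (IDeviceMessage and PollDevice are illustrative names from this design, not Akka types):

```csharp
// Illustrative contract: every device-bound message carries its target
// device ID, which the extractor uses to pick the entity and shard.
public interface IDeviceMessage
{
    string DeviceId { get; }
}

public sealed record PollDevice(string DeviceId) : IDeviceMessage;
```

Callers never need to know which node owns a device; they send to the shard region (e.g. `deviceRegion.Tell(new PollDevice("plc-17"))`) and the region routes the message.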

If Sharding Is Adopted: Remember Entities

For SCADA, if sharding is used, enable remember-entities so that device actors are automatically restarted after rebalancing without waiting for a new message:

```hocon
akka.cluster.sharding {
  remember-entities = on
  remember-entities-store = "eventsourced"  # Requires Akka.Persistence
}
```

This is important because device actors need to maintain their tag subscriptions — they can't wait for an external trigger to restart.
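Restart-on-rebalance only helps if the entity rebuilds its subscriptions when it starts. One way to sketch that, assuming a hypothetical SubscribeToTags helper (not an Akka API):

```csharp
using Akka.Actor;

// Illustrative DeviceActor: with remember-entities on, sharding recreates
// this actor after a rebalance, so PreStart re-establishes the device's
// tag subscriptions without waiting for any external trigger.
public class DeviceActor : ReceiveActor
{
    private readonly string _deviceId;

    public DeviceActor(string deviceId)
    {
        _deviceId = deviceId;
        ReceiveAny(msg => { /* handle device commands and publish tags */ });
    }

    protected override void PreStart() =>
        SubscribeToTags(_deviceId);

    private void SubscribeToTags(string deviceId)
    {
        // Hypothetical: register with the tag bus / equipment driver here.
    }
}
```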

Common Patterns

Sharding supports passivating idle entities to save memory. For SCADA device actors, passivation is inappropriate — devices need continuous monitoring regardless of command activity. If sharding is used, disable passivation or set very long timeouts:

```hocon
akka.cluster.sharding {
  passivate-idle-entity-after = off
}
```

Shard Count Sizing

If sharding is adopted, the number of shards should be significantly larger than the number of nodes but not excessively large; a common rule of thumb is roughly ten times the maximum planned node count. For 2 nodes with up to 500 devices, 50–100 shards is reasonable. The shard count is fixed for the life of the sharding state and cannot be changed without a data migration.
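The reason the count is fixed follows from how an entity maps to a shard. The sketch below is a simplified stand-in for Akka.NET's actual hash-based extraction, using a deterministic toy hash to show the mapping:

```csharp
using System;

// Simplified stand-in for hash-based shard resolution: each entity ID maps
// deterministically to one of maxNumberOfShards shards. Changing the shard
// count changes the mapping for existing entities, which is why the count
// cannot be altered without migrating sharding state.
static int StableHash(string s)
{
    unchecked
    {
        var h = 23;
        foreach (var c in s) h = h * 31 + c;
        return h;
    }
}

static string ShardIdFor(string entityId, int maxNumberOfShards) =>
    ((StableHash(entityId) & int.MaxValue) % maxNumberOfShards).ToString();

// The same device ID always lands on the same shard for a given count.
Console.WriteLine(ShardIdFor("plc-17", 100));
```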

Anti-Patterns

Sharding as a Default Choice

Do not adopt Sharding just because it's available. For a 2-node active/standby system, it adds a shard coordinator (which itself is a Singleton), persistence requirements for shard state, and rebalancing logic — all of which increase failure surface area without providing benefit.

Mixing Sharding and Singleton for the Same Concern

If device actors are managed by Sharding, do not also wrap them in a Singleton. These are alternative distribution strategies. Pick one.

Configuration Guidance

If Sharding is adopted in the future, here is a starting configuration for the SCADA system:

```hocon
akka.cluster.sharding {
  guardian-name = "sharding"
  role = "scada-node"
  remember-entities = on
  remember-entities-store = "eventsourced"
  passivate-idle-entity-after = off

  # Limit rebalance churn in a small cluster; the shard coordinator itself
  # always runs on the oldest node via its internal singleton.
  least-shard-allocation-strategy {
    rebalance-absolute-limit = 5
    rebalance-relative-limit = 0.1
  }
}
```

References