scadalink-design/AkkaDotNet/06-ClusterPubSub.md
Joseph Doherty de636b908b Add Akka.NET reference documentation
Notes and documentation covering actors, remoting, clustering, persistence,
streams, serialization, hosting, testing, and best practices for the Akka.NET
framework used throughout the ScadaLink system.
2026-03-16 09:08:17 -04:00

06 — Cluster Publish-Subscribe (Akka.Cluster.Tools)

Overview

Cluster Publish-Subscribe provides a distributed message broker within the Akka.NET cluster. Actors can publish messages to named topics, and any actor in the cluster that has subscribed to that topic receives the message. It also supports group subscriptions, in which each published message goes to only one member of the group, for load-balanced distribution.

In the SCADA system, Pub-Sub serves as the event distribution backbone — enabling the active node's device actors to broadcast tag updates, alarms, and status changes to subscribers on both nodes (e.g., a logging actor on the standby node, or a monitoring dashboard service).

When to Use

  • Broadcasting alarm events from device actors to all interested subscribers (alarm processors, historians, UI notifiers)
  • Distributing tag value changes to monitoring/dashboard actors
  • Decoupling event producers (device actors) from consumers (alarm handlers, loggers, external integrations)
  • Cross-node event distribution where the producer doesn't need to know about specific consumers

When Not to Use

  • Point-to-point command delivery — use direct actor references or the Singleton Proxy instead
  • High-frequency, high-volume data streams — use Akka.Streams for backpressure support
  • Guaranteed delivery — Pub-Sub is at-most-once; if a subscriber is down, it misses the message

Design Decisions for the SCADA System

Topic Structure

Define topics by event category, not by device. This keeps subscription management simple and avoids an explosion of topics (500 devices × N event types):

```
Topics:
  "alarms"           — all alarm raise/clear events
  "tag-updates"      — tag value changes (filtered by subscriber if needed)
  "device-status"    — connection up/down, device faulted
  "commands"         — command dispatched/acknowledged (for audit/logging)
```
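
To keep publishers and subscribers from drifting apart on topic spelling, the names above can be defined once as constants. A minimal sketch; the class name ScadaTopics is a hypothetical choice, not part of the original design:

```csharp
// Hypothetical central definition of the Pub-Sub topic names listed above.
// Publishers and subscribers reference these constants instead of string
// literals, so a misspelling becomes a compile error rather than a new topic.
public static class ScadaTopics
{
    public const string Alarms = "alarms";              // alarm raise/clear events
    public const string TagUpdates = "tag-updates";     // tag value changes
    public const string DeviceStatus = "device-status"; // connection up/down, device faulted
    public const string Commands = "commands";          // command dispatched/acknowledged (audit)

    // Every topic, e.g. for a diagnostic actor that subscribes to all of them.
    public static readonly string[] All = { Alarms, TagUpdates, DeviceStatus, Commands };
}
```

A publisher would then write mediator.Tell(new Publish(ScadaTopics.Alarms, evt)); a misspelled constant fails at compile time instead of publishing to a topic nobody subscribes to.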

DistributedPubSub Mediator

Access the mediator via the DistributedPubSub extension:

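```csharp
using System;
using Akka.Actor;
using Akka.Cluster.Tools.PublishSubscribe;

// Inside an actor: Context and Self are provided by the actor base class
var mediator = DistributedPubSub.Get(Context.System).Mediator;

// Publishing
mediator.Tell(new Publish("alarms", new AlarmRaised("machine-001", "HighTemp", DateTime.UtcNow)));

// Subscribing (the mediator confirms with a SubscribeAck message)
mediator.Tell(new Subscribe("alarms", Self));
```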
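
Subscribing is asynchronous: the mediator replies to the subscriber with a SubscribeAck once the subscription is registered locally, and gossip then spreads it to the other node. A minimal self-contained subscriber, assuming an AlarmRaised record with the three values used above (the field names DeviceId, AlarmName and Timestamp, and the actor name AlarmSubscriber, are assumptions):

```csharp
using System;
using Akka.Actor;
using Akka.Cluster.Tools.PublishSubscribe;
using Akka.Event;

// Assumed shape of the alarm event used in the snippets above.
public sealed record AlarmRaised(string DeviceId, string AlarmName, DateTime Timestamp);

// Hypothetical subscriber that logs every alarm published on the "alarms" topic.
public class AlarmSubscriber : ReceiveActor
{
    private readonly ILoggingAdapter _log = Context.GetLogger();

    public AlarmSubscriber()
    {
        var mediator = DistributedPubSub.Get(Context.System).Mediator;
        mediator.Tell(new Subscribe("alarms", Self));

        // Sent by the mediator once the subscription has been registered.
        Receive<SubscribeAck>(ack =>
            _log.Info("Subscribed to {0}", ack.Subscribe.Topic));

        Receive<AlarmRaised>(alarm =>
            _log.Info("Alarm {0} on {1} at {2}", alarm.AlarmName, alarm.DeviceId, alarm.Timestamp));
    }
}
```

It can be started with Context.ActorOf(Props.Create(() => new AlarmSubscriber()), "alarm-subscriber"). Handling SubscribeAck explicitly keeps the acknowledgment from being treated as an unhandled message.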

Subscribing from the Standby Node

Even though the standby node is "cold" (not running device actors), it can still subscribe to topics. This enables a monitoring or logging actor on the standby to receive events from the active node's device actors. This is useful for keeping a warm audit log on both nodes:

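```csharp
using System;
using Akka.Actor;
using Akka.Cluster.Tools.PublishSubscribe;

// On the standby node
public class AuditLogActor : ReceiveActor
{
    public AuditLogActor()
    {
        var mediator = DistributedPubSub.Get(Context.System).Mediator;
        mediator.Tell(new Subscribe("alarms", Self));
        mediator.Tell(new Subscribe("commands", Self));

        Receive<SubscribeAck>(_ => { }); // subscription confirmed; nothing else to do
        Receive<AlarmRaised>(msg => WriteToLocalLog(msg));
        Receive<CommandDispatched>(msg => WriteToLocalLog(msg));
    }

    // Placeholder: replace with the node-local audit store (file, database, etc.)
    private static void WriteToLocalLog(object evt) =>
        Console.WriteLine($"{DateTime.UtcNow:O} {evt}");
}
```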

Message Filtering

Pub-Sub delivers all messages on a topic to all subscribers. If a subscriber only cares about a subset (e.g., alarms from a specific device group), filter inside the subscriber actor rather than creating per-device topics:

```csharp
Receive<AlarmRaised>(msg =>
{
    if (_monitoredDeviceGroup.Contains(msg.DeviceId))
        ProcessAlarm(msg);
    // else: ignore
});
```
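
A complete version of that subscriber might receive its device group through the constructor and hold it in a HashSet<string>, so each check is a hash lookup. A sketch under assumptions: FilteredAlarmActor is a hypothetical name, AlarmRaised is assumed to have the fields shown, and a log line stands in for ProcessAlarm:

```csharp
using System;
using System.Collections.Generic;
using Akka.Actor;
using Akka.Cluster.Tools.PublishSubscribe;
using Akka.Event;

// Assumed shape of the alarm event.
public sealed record AlarmRaised(string DeviceId, string AlarmName, DateTime Timestamp);

// Hypothetical subscriber: receives every message on "alarms",
// but only processes alarms from its own device group.
public class FilteredAlarmActor : ReceiveActor
{
    private readonly HashSet<string> _monitoredDeviceGroup;
    private readonly ILoggingAdapter _log = Context.GetLogger();

    public FilteredAlarmActor(IEnumerable<string> deviceIds)
    {
        _monitoredDeviceGroup = new HashSet<string>(deviceIds, StringComparer.Ordinal);

        var mediator = DistributedPubSub.Get(Context.System).Mediator;
        mediator.Tell(new Subscribe("alarms", Self));

        Receive<SubscribeAck>(_ => { });
        Receive<AlarmRaised>(msg =>
        {
            if (_monitoredDeviceGroup.Contains(msg.DeviceId))
                ProcessAlarm(msg);
            // else: alarm belongs to another device group; drop it
        });
    }

    // Stand-in for real alarm handling.
    private void ProcessAlarm(AlarmRaised msg) =>
        _log.Info("Processing {0} from {1}", msg.AlarmName, msg.DeviceId);
}
```

Usage: Props.Create(() => new FilteredAlarmActor(new[] { "machine-001", "machine-002" })). Every subscriber still pays the network cost of receiving the whole topic, which is the trade-off for keeping the topic count small.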

Common Patterns

Topic Groups for Load Distribution

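If multiple actors should share the processing of a topic (only one handles each message), subscribe each of them with the same group name and publish with sendOneMessageToEachGroup set to true; the mediator then delivers each message to one member of each group. This is useful for distributing alarm processing across multiple handler actors. Group subscribers do not receive messages published with the default sendOneMessageToEachGroup = false, and subscribers without a group do not receive messages published with it set to true, so mixing grouped and ungrouped subscribers on one topic means some of them miss each message.

```csharp
// Each handler subscribes with the same group name
mediator.Tell(new Subscribe("alarms", Self, "alarm-processors"));
// Publisher: deliver each message to one member of each group
mediator.Tell(new Publish("alarms", alarm, sendOneMessageToEachGroup: true));
```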
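
Put together, a pool of processors might look like this sketch; AlarmProcessor, AlarmPublishing and the AlarmRaised fields are hypothetical. Which group member receives a given message is chosen by the mediator's routing-logic setting, which defaults to random:

```csharp
using System;
using Akka.Actor;
using Akka.Cluster.Tools.PublishSubscribe;
using Akka.Event;

// Assumed shape of the alarm event.
public sealed record AlarmRaised(string DeviceId, string AlarmName, DateTime Timestamp);

// Hypothetical worker: one of several members of the "alarm-processors" group.
public class AlarmProcessor : ReceiveActor
{
    public const string Group = "alarm-processors";

    public AlarmProcessor()
    {
        var log = Context.GetLogger();
        var mediator = DistributedPubSub.Get(Context.System).Mediator;

        // Same topic and same group name for every worker.
        mediator.Tell(new Subscribe("alarms", Self, Group));

        Receive<SubscribeAck>(_ => { });
        Receive<AlarmRaised>(alarm =>
            log.Info("{0} handling {1} from {2}", Self.Path.Name, alarm.AlarmName, alarm.DeviceId));
    }
}

public static class AlarmPublishing
{
    // Publisher side: one member of each subscribed group receives the alarm.
    public static void PublishToOneProcessor(IActorRef mediator, AlarmRaised alarm) =>
        mediator.Tell(new Publish("alarms", alarm, sendOneMessageToEachGroup: true));
}
```

Starting several workers, for example system.ActorOf(Props.Create<AlarmProcessor>(), $"alarm-processor-{i}") in a loop, spreads alarms across them; workers on either node can join the same group.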

Unsubscribe on Actor Stop

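Pub-Sub watches subscribed actors and removes their subscriptions automatically when they terminate. Unsubscribing explicitly in PostStop is optional, but it makes the cleanup visible in the code and does not depend on the termination notice reaching the mediator: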

```csharp
protected override void PostStop()
{
    var mediator = DistributedPubSub.Get(Context.System).Mediator;
    mediator.Tell(new Unsubscribe("alarms", Self));
    base.PostStop();
}
```

Event Envelope Pattern

Wrap all published events in a common envelope that includes metadata (source device, timestamp, sequence number). This makes it easier for subscribers to filter, order, and deduplicate:

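```csharp
public record ScadaEvent(string DeviceId, DateTime Timestamp, long SequenceNr, object Payload);

// Publishing: the envelope carries the device ID and timestamp, so the payload
// (here an AlarmRaised that holds only the alarm name) carries just the event details
mediator.Tell(new Publish("alarms",
    new ScadaEvent("machine-001", DateTime.UtcNow, _seqNr++, new AlarmRaised("HighTemp"))));
```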
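
On the subscriber side, the sequence number makes duplicate and stale events cheap to detect. A minimal sketch, assuming each device's publisher increments its sequence number by one per event; the EnvelopeTracker and EnvelopeCheck names are hypothetical. It remembers the highest sequence number seen per device, rejects anything at or below it, and flags gaps, which under at-most-once delivery indicate lost events:

```csharp
using System;
using System.Collections.Generic;

public record ScadaEvent(string DeviceId, DateTime Timestamp, long SequenceNr, object Payload);

public enum EnvelopeCheck { Accepted, AcceptedAfterGap, DuplicateOrStale }

// Hypothetical subscriber-side helper: tracks the last sequence number per device.
public sealed class EnvelopeTracker
{
    private readonly Dictionary<string, long> _lastSeen = new();

    public EnvelopeCheck Check(ScadaEvent evt)
    {
        if (!_lastSeen.TryGetValue(evt.DeviceId, out var last))
        {
            // First event from this device: nothing to compare against.
            _lastSeen[evt.DeviceId] = evt.SequenceNr;
            return EnvelopeCheck.Accepted;
        }

        if (evt.SequenceNr <= last)
            return EnvelopeCheck.DuplicateOrStale; // already seen, or older than the newest

        _lastSeen[evt.DeviceId] = evt.SequenceNr;
        // A jump of more than one means the events in between never arrived.
        return evt.SequenceNr == last + 1 ? EnvelopeCheck.Accepted : EnvelopeCheck.AcceptedAfterGap;
    }
}
```

If a publisher restarts and resets its counter to zero, this tracker would classify its new events as stale, so either the counter must survive restarts (for example via persistence) or the subscriber must reset its entry for that device.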

Anti-Patterns

Using Pub-Sub for Commands

Commands (write a tag value, start/stop equipment) should not go through Pub-Sub. Commands need a specific target and acknowledgment. Use direct actor references or the Singleton Proxy for command delivery.
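
A sketch of command delivery with an explicit acknowledgment, assuming hypothetical WriteTag and WriteTagAck messages and a target reference that is either a direct actor reference or the IActorRef of a Cluster Singleton proxy. Ask addresses one specific actor and the returned task faults if no reply arrives within the timeout, which the caller can surface or retry; Pub-Sub offers neither property:

```csharp
using System;
using System.Threading.Tasks;
using Akka.Actor;

// Hypothetical command and acknowledgment messages.
public sealed record WriteTag(string DeviceId, string Tag, double Value);
public sealed record WriteTagAck(string DeviceId, string Tag, bool Succeeded);

public static class CommandClient
{
    // Sends the command to one specific target and waits for its acknowledgment.
    // The task faults if no WriteTagAck arrives within the timeout.
    public static Task<WriteTagAck> WriteTagAsync(IActorRef deviceOrProxy, WriteTag command, TimeSpan timeout) =>
        deviceOrProxy.Ask<WriteTagAck>(command, timeout);
}
```

The device actor is expected to reply with Sender.Tell(new WriteTagAck(...)) after applying the write.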

Publishing High-Frequency Data Without Throttling

If a device updates a tag value every 100ms and publishes each update to Pub-Sub, subscribers can be overwhelmed. Throttle at the publisher: only publish on significant value changes (deadband) or at a reduced rate.
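
A minimal publisher-side filter combining both ideas, assuming numeric tag values and an absolute deadband; DeadbandThrottle and its parameters are illustrative names. A sample is published only if it differs from the last published value by more than the deadband and at least the minimum interval has passed since the last publish. Timestamps are passed in so the logic is easy to test:

```csharp
using System;

// Hypothetical publisher-side filter for one numeric tag.
public sealed class DeadbandThrottle
{
    private readonly double _deadband;      // minimum absolute change worth publishing
    private readonly TimeSpan _minInterval; // never publish more often than this
    private double? _lastValue;
    private DateTime _lastPublished;

    public DeadbandThrottle(double deadband, TimeSpan minInterval)
    {
        _deadband = deadband;
        _minInterval = minInterval;
    }

    // Returns true if this sample should be published, and records it as the last published value.
    public bool ShouldPublish(double value, DateTime now)
    {
        if (_lastValue is double last)
        {
            if (Math.Abs(value - last) <= _deadband) return false; // insignificant change
            if (now - _lastPublished < _minInterval) return false; // too soon since last publish
        }

        _lastValue = value;
        _lastPublished = now;
        return true;
    }
}
```

In a device actor this would gate the publish: if (_throttle.ShouldPublish(value, DateTime.UtcNow)) mediator.Tell(new Publish("tag-updates", ...)). A change suppressed by the rate limit is not queued; it goes out with the next sample that still exceeds the deadband once the interval has elapsed.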

Relying on Pub-Sub for Critical Event Delivery

Pub-Sub provides at-most-once delivery. If the subscriber is temporarily unreachable (e.g., during singleton hand-over), events are lost. For events that must not be lost (safety alarms, command audit trail), persist them to the journal first, then publish to Pub-Sub as a notification.
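
A sketch of the persist-then-publish ordering with Akka.Persistence, assuming a hypothetical SafetyAlarmActor with a RaiseSafetyAlarm command and SafetyAlarmRaised event, and a journal configured elsewhere. The publish happens inside the Persist callback, which runs only after the journal has confirmed the write, so a subscriber may miss the notification but the event is never lost:

```csharp
using System;
using Akka.Actor;
using Akka.Cluster.Tools.PublishSubscribe;
using Akka.Persistence;

// Hypothetical command and event.
public sealed record RaiseSafetyAlarm(string DeviceId, string AlarmName);
public sealed record SafetyAlarmRaised(string DeviceId, string AlarmName, DateTime Timestamp);

public class SafetyAlarmActor : ReceivePersistentActor
{
    private readonly IActorRef _mediator = DistributedPubSub.Get(Context.System).Mediator;

    public override string PersistenceId { get; }

    public SafetyAlarmActor(string deviceId)
    {
        PersistenceId = $"safety-alarms-{deviceId}";

        Command<RaiseSafetyAlarm>(cmd =>
        {
            var evt = new SafetyAlarmRaised(cmd.DeviceId, cmd.AlarmName, DateTime.UtcNow);
            // 1. Write the event to the journal first...
            Persist(evt, persisted =>
            {
                // 2. ...then notify subscribers; runs only after a successful write.
                _mediator.Tell(new Publish("alarms", persisted));
            });
        });

        // Replaying events on restart rebuilds state without re-publishing them.
        Recover<SafetyAlarmRaised>(_ => { });
    }
}
```

Consumers that must see every safety alarm can read the journal (for example through a persistence query) and treat the Pub-Sub message purely as a low-latency hint.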

Configuration Guidance

```hocon
akka.cluster.pub-sub {
  # Actor name of the mediator
  name = "distributedPubSubMediator"

  # Start the mediator only on members with this role; it must match the
  # role listed in akka.cluster.roles on both nodes
  role = "scada-node"

  # How often to gossip subscription state between nodes
  gossip-interval = 1s

  # How long removed subscription entries are kept before they are pruned
  removed-time-to-live = 120s

  # Maximum number of registry entries to transfer in one gossip message
  max-delta-elements = 3000
}
```

Apart from role, the values shown are the library defaults, and they are fine for a 2-node cluster. The gossip-interval of 1 second lets subscription changes propagate quickly between the pair.
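
Because role is set, the mediator only starts on members that carry that role, so each node's akka.cluster.roles must include scada-node for Pub-Sub to work there. A minimal sketch of supplying these settings from code as HOCON; ScadaPubSubConfig is a hypothetical name, and the provider, remoting, and seed-node settings a real cluster also needs are assumed to come from elsewhere:

```csharp
using Akka.Configuration;

public static class ScadaPubSubConfig
{
    // Hypothetical fragment: node role plus the matching pub-sub role.
    public const string Hocon = @"
        akka.cluster.roles = [""scada-node""]
        akka.cluster.pub-sub {
          role = ""scada-node""
          gossip-interval = 1s
        }";

    // Parse the fragment; combine with the rest of the config via WithFallback.
    public static Config Load() => ConfigurationFactory.ParseString(Hocon);
}
```

At startup this might be used as ActorSystem.Create("scada", ScadaPubSubConfig.Load().WithFallback(clusterConfig)), where clusterConfig is the node's remaining configuration.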

References