scadalink-design/AkkaDotNet/06-ClusterPubSub.md
Joseph Doherty de636b908b Add Akka.NET reference documentation
Notes and documentation covering actors, remoting, clustering, persistence,
streams, serialization, hosting, testing, and best practices for the Akka.NET
framework used throughout the ScadaLink system.
2026-03-16 09:08:17 -04:00

# 06 — Cluster Publish-Subscribe (Akka.Cluster.Tools)
## Overview
Cluster Publish-Subscribe provides a distributed message broker within the Akka.NET cluster. Actors can publish messages to named topics, and any actor in the cluster that has subscribed to that topic receives the message. It also supports "send to one subscriber" semantics for load-balanced distribution.
In the SCADA system, Pub-Sub serves as the event distribution backbone — enabling the active node's device actors to broadcast tag updates, alarms, and status changes to subscribers on both nodes (e.g., a logging actor on the standby node, or a monitoring dashboard service).
## When to Use
- Broadcasting alarm events from device actors to all interested subscribers (alarm processors, historians, UI notifiers)
- Distributing tag value changes to monitoring/dashboard actors
- Decoupling event producers (device actors) from consumers (alarm handlers, loggers, external integrations)
- Cross-node event distribution where the producer doesn't need to know about specific consumers
## When Not to Use
- Point-to-point command delivery — use direct actor references or the Singleton Proxy instead
- High-frequency, high-volume data streams — use Akka.Streams for backpressure support
- Guaranteed delivery — Pub-Sub is at-most-once; if a subscriber is down, it misses the message
## Design Decisions for the SCADA System
### Topic Structure
Define topics by event category, not by device. This keeps subscription management simple and avoids an explosion of topics (500 devices × N event types):
```
Topics:
"alarms" — all alarm raise/clear events
"tag-updates" — tag value changes (filtered by subscriber if needed)
"device-status" — connection up/down, device faulted
"commands" — command dispatched/acknowledged (for audit/logging)
```
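One way to keep these names consistent between publishers and subscribers is to centralize them as constants. The following is a minimal sketch; the `ScadaTopics` class name is an assumption for illustration and does not come from the ScadaLink codebase:

```csharp
// Hypothetical helper: a single place for Pub-Sub topic names, so that
// publishers and subscribers cannot drift apart through typos.
public static class ScadaTopics
{
    public const string Alarms = "alarms";
    public const string TagUpdates = "tag-updates";
    public const string DeviceStatus = "device-status";
    public const string Commands = "commands";
}
```

A subscriber would then write `new Subscribe(ScadaTopics.Alarms, Self)` instead of repeating the string literal.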
### DistributedPubSub Mediator
Access the mediator via the `DistributedPubSub` extension:
```csharp
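using Akka.Cluster.Tools.PublishSubscribe;

// Inside an actor
var mediator = DistributedPubSub.Get(Context.System).Mediator;

// Publishing
mediator.Tell(new Publish("alarms", new AlarmRaised("machine-001", "HighTemp", DateTime.UtcNow)));

// Subscribing (the mediator replies to the sender, here Self, with a SubscribeAck)
mediator.Tell(new Subscribe("alarms", Self));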
```
### Subscribing from the Standby Node
Even though the standby node is "cold" (not running device actors), it can still subscribe to topics. This enables a monitoring or logging actor on the standby to receive events from the active node's device actors. This is useful for keeping a warm audit log on both nodes:
```csharp
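using Akka.Actor;
using Akka.Cluster.Tools.PublishSubscribe;

// On the standby node
public class AuditLogActor : ReceiveActor
{
    public AuditLogActor()
    {
        var mediator = DistributedPubSub.Get(Context.System).Mediator;
        mediator.Tell(new Subscribe("alarms", Self));
        mediator.Tell(new Subscribe("commands", Self));

        // The mediator confirms each subscription with a SubscribeAck
        Receive<SubscribeAck>(_ => { });
        Receive<AlarmRaised>(msg => WriteToLocalLog(msg));
        Receive<CommandDispatched>(msg => WriteToLocalLog(msg));
    }

    // WriteToLocalLog is application-specific and omitted here
}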
```
### Message Filtering
Pub-Sub delivers all messages on a topic to all subscribers. If a subscriber only cares about a subset (e.g., alarms from a specific device group), filter inside the subscriber actor rather than creating per-device topics:
```csharp
Receive<AlarmRaised>(msg =>
{
    if (_monitoredDeviceGroup.Contains(msg.DeviceId))
        ProcessAlarm(msg);
    // else: ignore
});
```
## Common Patterns
### Topic Groups for Load Distribution
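If multiple actors should share the processing of a topic, so that only one of them handles each message, have them subscribe with a common group name and publish with the `sendOneMessageToEachGroup` flag set to `true`. Each published message is then delivered to one member of every group. Group and non-group subscriptions do not mix: a message published with the flag set is not delivered to subscribers that registered without a group, and a message published without it is not delivered to subscribers that registered with one. This is useful for distributing alarm processing across multiple handler actors:
```csharp
// Subscribe with a group name; each message goes to one member of the group
mediator.Tell(new Subscribe("alarms", Self, "alarm-processors"));

// Publish with sendOneMessageToEachGroup (third argument) set to true
mediator.Tell(new Publish("alarms", new AlarmRaised("machine-001", "HighTemp", DateTime.UtcNow), true));
```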
### Unsubscribe on Actor Stop
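Pub-Sub automatically cleans up subscriptions when an actor is terminated, but explicitly unsubscribing during graceful shutdown removes the subscription deterministically instead of waiting for the mediator to observe the termination. If the actor subscribed with a group name, pass the same group to `Unsubscribe`: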
```csharp
protected override void PostStop()
{
    var mediator = DistributedPubSub.Get(Context.System).Mediator;
    mediator.Tell(new Unsubscribe("alarms", Self));
    base.PostStop();
}
```
### Event Envelope Pattern
Wrap all published events in a common envelope that includes metadata (source device, timestamp, sequence number). This makes it easier for subscribers to filter, order, and deduplicate:
```csharp
public record ScadaEvent(string DeviceId, DateTime Timestamp, long SequenceNr, object Payload);
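
// Publishing: subscribers now match on ScadaEvent and inspect Payload.
// AlarmRaised carries only the alarm name here, because the device and
// timestamp live on the envelope.
mediator.Tell(new Publish("alarms",
    new ScadaEvent("machine-001", DateTime.UtcNow, _seqNr++, new AlarmRaised("HighTemp"))));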
```
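On the subscriber side, the sequence number can be used to drop duplicates and notice missed events. The following is a minimal sketch, assuming each device numbers its events consecutively starting from any value; `SequenceTracker` and `SequenceCheck` are hypothetical names, not part of Akka.NET:

```csharp
using System.Collections.Generic;

public enum SequenceCheck { Accepted, DuplicateOrStale, GapDetected }

// Hypothetical helper: tracks the highest sequence number seen per device.
public sealed class SequenceTracker
{
    private readonly Dictionary<string, long> _lastSeen = new();

    public SequenceCheck Check(string deviceId, long sequenceNr)
    {
        if (!_lastSeen.TryGetValue(deviceId, out var last))
        {
            // First event from this device: nothing to compare against.
            _lastSeen[deviceId] = sequenceNr;
            return SequenceCheck.Accepted;
        }

        if (sequenceNr <= last)
            return SequenceCheck.DuplicateOrStale;

        _lastSeen[deviceId] = sequenceNr;
        // A jump of more than one means at least one event was missed.
        return sequenceNr == last + 1
            ? SequenceCheck.Accepted
            : SequenceCheck.GapDetected;
    }
}
```

A subscriber could call `Check(evt.DeviceId, evt.SequenceNr)` at the top of its `ScadaEvent` handler and skip anything reported as `DuplicateOrStale`. If sequence numbers restart after a failover, the tracker would need to be reset as well.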
## Anti-Patterns
### Using Pub-Sub for Commands
Commands (write a tag value, start/stop equipment) should not go through Pub-Sub. Commands need a specific target and acknowledgment. Use direct actor references or the Singleton Proxy for command delivery.
### Publishing High-Frequency Data Without Throttling
If a device updates a tag value every 100ms and publishes each update to Pub-Sub, subscribers can be overwhelmed. Throttle at the publisher: only publish on significant value changes (deadband) or at a reduced rate.
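A deadband check combined with a maximum silence interval could be sketched as follows. `DeadbandFilter` is a hypothetical helper, not an Akka.NET type, and the threshold values are for the caller to choose:

```csharp
using System;

// Hypothetical helper: decides whether a tag update is worth publishing.
public sealed class DeadbandFilter
{
    private readonly double _deadband;
    private readonly TimeSpan _maxSilence;
    private double _lastValue;
    private DateTime _lastPublishedAt;
    private bool _hasPublished;

    public DeadbandFilter(double deadband, TimeSpan maxSilence)
    {
        _deadband = deadband;
        _maxSilence = maxSilence;
    }

    // Returns true when the value should be published, and records it as
    // the new reference value for later comparisons.
    public bool ShouldPublish(double value, DateTime now)
    {
        var significant = Math.Abs(value - _lastValue) >= _deadband;
        var overdue = now - _lastPublishedAt >= _maxSilence;
        if (_hasPublished && !significant && !overdue)
            return false;

        _lastValue = value;
        _lastPublishedAt = now;
        _hasPublished = true;
        return true;
    }
}
```

A device actor would keep one filter per tag and send `Publish("tag-updates", ...)` to the mediator only when `ShouldPublish` returns true. The silence interval makes sure subscribers still get a periodic refresh of a value that is not changing.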
### Relying on Pub-Sub for Critical Event Delivery
Pub-Sub provides at-most-once delivery. If the subscriber is temporarily unreachable (e.g., during singleton hand-over), events are lost. For events that must not be lost (safety alarms, command audit trail), persist them to the journal first, then publish to Pub-Sub as a notification.
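The ordering matters more than the mechanism: once an event is in the journal, a subscriber that missed the notification can catch up by replaying. The sketch below illustrates this with plain C# stand-ins. The in-memory list stands in for a durable journal (for example one backed by Akka.Persistence) and the delegate stands in for `mediator.Tell(new Publish(...))`; all type names are hypothetical:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public sealed record JournaledEvent(long SequenceNr, object Payload);

// Hypothetical sketch of "persist first, then notify".
public sealed class PersistThenPublish
{
    // Stand-in for a durable journal; a real system would write to storage.
    private readonly List<JournaledEvent> _journal = new();
    // Stand-in for a fire-and-forget publish to the mediator.
    private readonly Action<JournaledEvent> _publish;

    public PersistThenPublish(Action<JournaledEvent> publish) => _publish = publish;

    public JournaledEvent Record(object payload)
    {
        var evt = new JournaledEvent(_journal.Count + 1, payload);
        _journal.Add(evt); // 1. persist
        _publish(evt);     // 2. notify; may be lost (at-most-once)
        return evt;
    }

    // Lets a subscriber that missed notifications fetch everything after
    // the last sequence number it processed.
    public IReadOnlyList<JournaledEvent> ReplayAfter(long lastSeenSequenceNr) =>
        _journal.Where(e => e.SequenceNr > lastSeenSequenceNr).ToList();
}
```

With this shape, the Pub-Sub message is only a hint that something new exists; the journal remains the source of truth for safety alarms and the command audit trail.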
## Configuration Guidance
```hocon
akka.cluster.pub-sub {
  # Actor name of the mediator
  name = "distributedPubSubMediator"
  # Run the mediator only on nodes with this role; should match our cluster role
  role = "scada-node"
  # How often to gossip subscription state between nodes
  gossip-interval = 1s
  # How long entries for removed subscriptions are kept before being pruned
  removed-time-to-live = 120s
  # Maximum delta elements to transfer in one round of gossip
  max-delta-elements = 3000
}
```
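Apart from `role`, which defaults to an empty string (all nodes), the values above are the library defaults, and they are fine for a 2-node cluster. With `gossip-interval` at 1 second, subscription changes normally reach the other node within about a second. Until that gossip round completes, a new subscriber can miss messages published from the other node.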
## References
- Official Documentation: <https://getakka.net/articles/clustering/distributed-publish-subscribe.html>
- Cluster Tools Configuration: <https://getakka.net/articles/configuration/modules/akka.cluster.tools.html>