# 06 — Cluster Publish-Subscribe (Akka.Cluster.Tools)

## Overview

Cluster Publish-Subscribe provides a distributed message broker within the Akka.NET cluster. Actors can publish messages to named topics, and any actor in the cluster that has subscribed to that topic receives the message. It also supports "send to one subscriber" semantics for load-balanced distribution.

In the SCADA system, Pub-Sub serves as the event distribution backbone: the active node's device actors broadcast tag updates, alarms, and status changes to subscribers on both nodes (e.g., a logging actor on the standby node, or a monitoring dashboard service).

## When to Use

- Broadcasting alarm events from device actors to all interested subscribers (alarm processors, historians, UI notifiers)
- Distributing tag value changes to monitoring/dashboard actors
- Decoupling event producers (device actors) from consumers (alarm handlers, loggers, external integrations)
- Cross-node event distribution where the producer doesn't need to know about specific consumers

## When Not to Use

- Point-to-point command delivery: use direct actor references or the Singleton Proxy instead
- High-frequency, high-volume data streams: use Akka.Streams for backpressure support
- Guaranteed delivery: Pub-Sub is at-most-once; if a subscriber is down, it misses the message

## Design Decisions for the SCADA System

### Topic Structure

Define topics by event category, not by device. This keeps subscription management simple and avoids an explosion of topics (500 devices × N event types):

```
Topics:
  "alarms"         - all alarm raise/clear events
  "tag-updates"    - tag value changes (filtered by subscriber if needed)
  "device-status"  - connection up/down, device faulted
  "commands"       - command dispatched/acknowledged (for audit/logging)
```
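
Because topics are plain strings, a typo on either side silently produces a topic nobody listens to. One way to guard against that is to keep the names in a single constants class. This is a minimal sketch; `ScadaTopics` is a hypothetical name chosen for illustration, not something taken from the ScadaLink codebase.

```csharp
// Hypothetical helper: one place for the topic names listed above,
// so publishers and subscribers cannot drift apart through typos.
public static class ScadaTopics
{
    public const string Alarms = "alarms";
    public const string TagUpdates = "tag-updates";
    public const string DeviceStatus = "device-status";
    public const string Commands = "commands";
}
```

Publishers and subscribers would then write `ScadaTopics.Alarms` wherever the examples in this document use the literal `"alarms"`.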
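
### DistributedPubSub Mediator

Access the mediator via the `DistributedPubSub` extension. The mediator is an ordinary `IActorRef`; publishing and subscribing are both done by sending it messages:

```csharp
using Akka.Actor;
using Akka.Cluster.Tools.PublishSubscribe;

// Inside an actor: resolve the mediator for this actor system
var mediator = DistributedPubSub.Get(Context.System).Mediator;

// Publishing
mediator.Tell(new Publish("alarms", new AlarmRaised("machine-001", "HighTemp", DateTime.UtcNow)));

// Subscribing (the mediator replies to the sender with a SubscribeAck)
mediator.Tell(new Subscribe("alarms", Self));
```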

### Subscribing from the Standby Node

Even though the standby node is "cold" (not running device actors), it can still subscribe to topics. A monitoring or logging actor on the standby can therefore receive events from the active node's device actors, which is useful for keeping a warm audit log on both nodes:

```csharp
using Akka.Actor;
using Akka.Cluster.Tools.PublishSubscribe;

// On the standby node
public class AuditLogActor : ReceiveActor
{
    public AuditLogActor()
    {
        var mediator = DistributedPubSub.Get(Context.System).Mediator;
        mediator.Tell(new Subscribe("alarms", Self));
        mediator.Tell(new Subscribe("commands", Self));

        // The mediator confirms each subscription; handle the ack so it is not left unhandled
        Receive<SubscribeAck>(_ => { });

        // WriteToLocalLog is this actor's own logging helper (not shown)
        Receive<AlarmRaised>(msg => WriteToLocalLog(msg));
        Receive<CommandDispatched>(msg => WriteToLocalLog(msg));
    }
}
```

### Message Filtering

Pub-Sub delivers all messages on a topic to all subscribers. If a subscriber only cares about a subset (e.g., alarms from a specific device group), filter inside the subscriber actor rather than creating per-device topics:

```csharp
Receive<AlarmRaised>(msg =>
{
    if (_monitoredDeviceGroup.Contains(msg.DeviceId))
        ProcessAlarm(msg);
    // else: ignore
});
```
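
## Common Patterns

### Topic Groups for Load Distribution

If multiple actors should share the processing of a topic (only one handles each message), subscribe them with a common group name and publish with `sendOneMessageToEachGroup: true`. Each message is then routed to one member of each group instead of to every subscriber. The group is part of the subscription identity: messages published with `sendOneMessageToEachGroup: true` are not delivered to subscribers that registered without a group, and messages published without it are not delivered to group subscribers. This is useful for distributing alarm processing across multiple handler actors:

```csharp
// Subscribe with a group name: each message goes to one member of the group
mediator.Tell(new Subscribe("alarms", Self, "alarm-processors"));
// Publish with the per-group flag set (alarm is the event being published)
mediator.Tell(new Publish("alarms", alarm, sendOneMessageToEachGroup: true));
```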

### Unsubscribe on Actor Stop

Pub-Sub automatically cleans up subscriptions when an actor is terminated, but explicitly unsubscribing during graceful shutdown makes the intent visible in code and does not rely on the mediator noticing the termination:

```csharp
protected override void PostStop()
{
    var mediator = DistributedPubSub.Get(Context.System).Mediator;
    // If the subscription used a group name, pass the same group to Unsubscribe
    mediator.Tell(new Unsubscribe("alarms", Self));
    base.PostStop();
}
```

### Event Envelope Pattern

Wrap all published events in a common envelope that includes metadata (source device, timestamp, sequence number). This makes it easier for subscribers to filter, order, and deduplicate. Subscribers then match on `ScadaEvent` and inspect `Payload`, rather than matching on the payload type directly:

```csharp
public record ScadaEvent(string DeviceId, DateTime Timestamp, long SequenceNr, object Payload);

// Publishing
// Here AlarmRaised is assumed to carry only the alarm name; device and time live in the envelope
mediator.Tell(new Publish("alarms",
    new ScadaEvent("machine-001", DateTime.UtcNow, _seqNr++, new AlarmRaised("HighTemp"))));
```
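
With a per-device sequence number in the envelope, a subscriber can drop duplicates and stale events by remembering the highest number it has seen for each device. The following is a minimal sketch of that idea. `SequenceTracker` is a hypothetical helper, and it assumes sequence numbers only ever increase for a given device; if a device actor restarts and resets its counter, the tracker would need to be reset as well.

```csharp
using System.Collections.Generic;

// Hypothetical subscriber-side helper: remembers the highest sequence number
// seen per device and rejects anything at or below it.
public sealed class SequenceTracker
{
    private readonly Dictionary<string, long> _lastSeen = new();

    // Returns true if the event is new for this device and should be processed.
    public bool ShouldProcess(string deviceId, long sequenceNr)
    {
        if (_lastSeen.TryGetValue(deviceId, out var last) && sequenceNr <= last)
            return false; // duplicate or stale event

        _lastSeen[deviceId] = sequenceNr;
        return true;
    }
}
```

A subscriber would call `ShouldProcess(evt.DeviceId, evt.SequenceNr)` at the top of its `ScadaEvent` handler and return early when it yields `false`.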

## Anti-Patterns

### Using Pub-Sub for Commands

Commands (write a tag value, start/stop equipment) should not go through Pub-Sub. Commands need a specific target and acknowledgment. Use direct actor references or the Singleton Proxy for command delivery.

### Publishing High-Frequency Data Without Throttling

If a device updates a tag value every 100 ms and publishes each update to Pub-Sub, subscribers can be overwhelmed. Throttle at the publisher: only publish on significant value changes (deadband) or at a reduced rate.
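
A deadband check can be sketched as a small stateful filter that the device actor consults before publishing. This is an illustrative sketch only: `DeadbandFilter` is a hypothetical name, and it assumes numeric tag values and an absolute (not percentage) deadband.

```csharp
using System;

// Hypothetical publisher-side helper: lets a value through only when it differs
// from the last published value by more than the configured deadband.
public sealed class DeadbandFilter
{
    private readonly double _deadband;
    private double? _lastPublished;

    public DeadbandFilter(double deadband) => _deadband = deadband;

    // Returns true if the value should be published; records it as the new baseline.
    public bool ShouldPublish(double value)
    {
        if (_lastPublished is double last && Math.Abs(value - last) <= _deadband)
            return false; // change too small to be worth publishing

        _lastPublished = value;
        return true;
    }
}
```

The device actor would keep one filter per tag and send the `Publish` message only when `ShouldPublish(newValue)` returns `true`. The first value always passes, so subscribers get an initial reading.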

### Relying on Pub-Sub for Critical Event Delivery

Pub-Sub provides at-most-once delivery. If the subscriber is temporarily unreachable (e.g., during singleton hand-over), events are lost. For events that must not be lost (safety alarms, command audit trail), persist them to the journal first, then publish to Pub-Sub as a notification.
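
The persist-then-publish order can be sketched with an Akka.Persistence actor, where the publish happens inside the `Persist` callback. This is a sketch under assumptions, not the system's actual implementation: `AlarmJournalActor` and the persistence id `"alarm-journal"` are hypothetical, and `AlarmRaised` is redeclared here only so the example is self-contained.

```csharp
using System;
using Akka.Actor;
using Akka.Cluster.Tools.PublishSubscribe;
using Akka.Persistence;

// Stand-in for the document's alarm event, declared only to keep the sketch self-contained.
public record AlarmRaised(string DeviceId, string AlarmName, DateTime Timestamp);

// Hypothetical actor: journals each alarm, then announces it on the "alarms" topic.
public class AlarmJournalActor : ReceivePersistentActor
{
    public override string PersistenceId => "alarm-journal"; // assumed id

    public AlarmJournalActor()
    {
        var mediator = DistributedPubSub.Get(Context.System).Mediator;

        // The callback runs only after the journal has confirmed the write,
        // so a lost Pub-Sub notification never means a lost event.
        Command<AlarmRaised>(alarm =>
            Persist(alarm, persisted => mediator.Tell(new Publish("alarms", persisted))));

        // On recovery the events are already durable; do not re-publish them.
        Recover<AlarmRaised>(_ => { });
    }
}
```

A subscriber that missed the notification can still catch up by reading the journal, which is the reason for writing first and announcing second.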

## Configuration Guidance

```hocon
akka.cluster.pub-sub {
  # Actor name of the mediator
  name = "distributedPubSubMediator"

  # Start the mediator only on members with this role; should match our cluster role
  role = "scada-node"

  # How often to gossip subscription state between nodes
  gossip-interval = 1s

  # How long removed entries (unsubscribed or terminated subscribers) are kept before being pruned
  removed-time-to-live = 120s

  # Maximum delta elements to transfer in one round of gossip
  max-delta-elements = 3000
}
```

For a 2-node cluster, the defaults are fine; apart from `role`, the values shown above are the defaults. The `gossip-interval` of 1 second ensures subscription changes propagate quickly between the pair.

## References

- Official Documentation: <https://getakka.net/articles/clustering/distributed-publish-subscribe.html>
- Cluster Tools Configuration: <https://getakka.net/articles/configuration/modules/akka.cluster.tools.html>