# 07 — Cluster Metrics (Akka.Cluster.Metrics)

## Overview

Akka.Cluster.Metrics periodically collects node-level resource metrics (CPU usage, memory consumption) from each cluster member and publishes them across the cluster. It can also drive adaptive load-balancing routers that route messages to the least-loaded node.

In our 2-node SCADA system, Cluster Metrics serves primarily as a health monitoring input — providing visibility into whether the active node is under resource pressure. The adaptive routing capability is less relevant since only one node is actively processing.

## When to Use

- Monitoring active node resource consumption (CPU, memory) as a health indicator
- Triggering alerts if the active node approaches resource limits (e.g., memory pressure from too many device actors or tag subscriptions)
- Feeding operational dashboards that display cluster health

## When Not to Use

- Adaptive load-balancing routing — irrelevant in an active/standby topology since the standby does no work
- As a replacement for proper application-level metrics (device connection counts, command throughput, alarm rates) — Cluster Metrics only covers infrastructure-level metrics

## Design Decisions for the SCADA System

### Enable for Monitoring, Not Routing

Install Cluster Metrics on both nodes to collect health data, but do not configure metrics-based routers. The standby node will report minimal resource usage (it's idle); the active node's metrics are the useful signal.

### Metrics Listener Actor

Create a metrics listener that logs warnings when the active node exceeds thresholds:

```csharp
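using Akka.Actor;
using Akka.Cluster;
using Akka.Cluster.Metrics;
using Akka.Cluster.Metrics.Events;        // ClusterMetricsChanged
using Akka.Cluster.Metrics.Serialization; // NodeMetrics
using Akka.Event;

public class MetricsMonitorActor : ReceiveActor
{
    // Warn once memory usage passes 80%.
    private const double MemoryWarningRatio = 0.80;

    private readonly ILoggingAdapter _log = Context.GetLogger();
    private readonly ClusterMetrics _metrics = ClusterMetrics.Get(Context.System);
    private readonly Cluster _cluster = Cluster.Get(Context.System);

    public MetricsMonitorActor()
    {
        Receive<ClusterMetricsChanged>(changed =>
        {
            foreach (var nodeMetrics in changed.NodeMetrics)
            {
                // Only inspect this node's own metrics.
                if (nodeMetrics.Address.Equals(_cluster.SelfAddress))
                {
                    CheckMemory(nodeMetrics);
                    CheckCpu(nodeMetrics);
                }
            }
        });
    }

    protected override void PreStart()
    {
        _metrics.Subscribe(Self);
    }

    protected override void PostStop()
    {
        _metrics.Unsubscribe(Self);
    }

    private void CheckMemory(NodeMetrics metrics)
    {
        // ExtractMemory yields no value when the sample carries no memory metrics.
        var memory = StandardMetrics.ExtractMemory(metrics);
        if (!memory.HasValue) return;

        // Assumption: Used / Available approximates the share of process memory in use.
        var usedRatio = (double)memory.Value.Used / memory.Value.Available;
        if (usedRatio > MemoryWarningRatio)
            _log.Warning("Memory usage at {0:F0}% on {1}", usedRatio * 100, metrics.Address);
    }

    private void CheckCpu(NodeMetrics metrics)
    {
        var cpu = StandardMetrics.ExtractCpu(metrics);
        if (!cpu.HasValue) return;

        // Confirm the scale of TotalUsage for your collector before adding a threshold.
        _log.Debug("CPU total usage {0} on {1}", cpu.Value.TotalUsage, metrics.Address);
    }
}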
```

### Collection Interval

The default collection interval (`sample-interval`, 3 seconds) is adequate for SCADA monitoring. Shorter intervals spend CPU and allocations on sampling; longer intervals can miss transient spikes. Leave the default unless a specific monitoring need dictates otherwise.

## Common Patterns

### Forwarding Metrics to External Monitoring

In production SCADA deployments, metrics should feed into the site's existing monitoring infrastructure (for example Windows Performance Counters, Prometheus with Grafana dashboards, or a SCADA historian). The metrics listener actor can bridge Cluster Metrics data to these external systems.
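As one possible bridge, the listener could render each sample in the Prometheus text exposition format. The sketch below is illustrative only: the `PrometheusText` helper and the metric names `scada_node_memory_used_bytes` and `scada_node_cpu_total_usage` are hypothetical and not part of Akka.Cluster.Metrics, and serving the text over HTTP is left out.

```csharp
using System;
using System.Globalization;
using System.Text;

// Hypothetical helper: renders one node sample as Prometheus exposition text.
public static class PrometheusText
{
    public static string Format(string node, double memoryUsedBytes, double cpuTotalUsage)
    {
        var label = EscapeLabel(node);
        var sb = new StringBuilder();
        AppendGauge(sb, "scada_node_memory_used_bytes", label, memoryUsedBytes);
        AppendGauge(sb, "scada_node_cpu_total_usage", label, cpuTotalUsage);
        return sb.ToString();
    }

    private static void AppendGauge(StringBuilder sb, string name, string node, double value)
    {
        // One TYPE line, then one sample line: name{node="..."} value
        sb.Append("# TYPE ").Append(name).Append(" gauge\n");
        sb.Append(name).Append("{node=\"").Append(node).Append("\"} ")
          .Append(value.ToString("R", CultureInfo.InvariantCulture)) // culture-independent decimal point
          .Append('\n');
    }

    // Label values must escape backslash, double quote, and newline.
    private static string EscapeLabel(string value) =>
        value.Replace("\\", "\\\\").Replace("\"", "\\\"").Replace("\n", "\\n");
}
```

Inside the listener, the arguments would come from the node address and the values read out of each `NodeMetrics` sample.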

### Memory Pressure Early Warning

Use memory metrics to detect when the active node is approaching limits. If device actor count increases (e.g., new equipment added) and memory grows toward the machine's limit, the metrics listener can log warnings before out-of-memory conditions cause crashes.
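With a 3-second sample interval, warning on every sample above the threshold would flood the log. A minimal sketch of one way to avoid that is to track a pressure level and report only when it changes. The `MemoryPressureTracker` type is hypothetical, the 80% warning level mirrors the listener's threshold, and the 90% critical level and the choice of `limitBytes` are assumptions to adjust per site.

```csharp
using System;

// Hypothetical helper, not part of Akka.Cluster.Metrics.
public enum MemoryPressure { Normal, Warning, Critical }

public sealed class MemoryPressureTracker
{
    private readonly double _warningRatio;
    private readonly double _criticalRatio;

    public MemoryPressureTracker(double warningRatio = 0.80, double criticalRatio = 0.90)
    {
        if (warningRatio <= 0 || criticalRatio > 1 || warningRatio >= criticalRatio)
            throw new ArgumentOutOfRangeException(nameof(warningRatio),
                "Expected 0 < warning < critical <= 1.");
        _warningRatio = warningRatio;
        _criticalRatio = criticalRatio;
    }

    public MemoryPressure Current { get; private set; } = MemoryPressure.Normal;

    // Returns true only when the level changed, so the caller logs once per transition.
    public bool Update(double usedBytes, double limitBytes)
    {
        if (limitBytes <= 0) return false; // no usable limit; keep the previous level

        var ratio = usedBytes / limitBytes;
        var next = ratio >= _criticalRatio ? MemoryPressure.Critical
                 : ratio >= _warningRatio ? MemoryPressure.Warning
                 : MemoryPressure.Normal;

        if (next == Current) return false;
        Current = next;
        return true;
    }
}
```

The metrics listener would call `Update` for each sample of its own node and log `Current` only when the call returns `true`.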

## Anti-Patterns

### Using Metrics for Failover Decisions

Do not use Cluster Metrics to trigger failover. Failover should be driven by Cluster membership events (unreachable/down). High CPU or memory on the active node does not mean it should hand off to the standby — it means the active node needs attention (configuration tuning, resource allocation).

### Excessive Collection Frequency

Collecting metrics every 100 ms creates unnecessary garbage-collection pressure from the metrics snapshots. Stick to the default 3-second interval.

## Configuration Guidance

```hocon
akka {
  extensions = ["Akka.Cluster.Metrics.ClusterMetricsExtensionProvider, Akka.Cluster.Metrics"]

  cluster.metrics {
    # Use the default collector
    collector {
      provider = "Akka.Cluster.Metrics.Collectors.DefaultCollector, Akka.Cluster.Metrics"
      sample-interval = 3s
      gossip-interval = 3s
    }

    # Metrics-based routers are opt-in (not needed for active/standby).
    # With no router deployment configured, none is created.
  }
}
```

## References

- Official Documentation: <https://getakka.net/articles/clustering/cluster-metrics.html>