18 — MultiNodeTestRunner (Akka.MultiNodeTestRunner)
Overview
The MultiNodeTestRunner provides infrastructure for running distributed integration tests across multiple actor systems simultaneously. Each "node" in the test runs in its own process, with full cluster formation, network simulation, and coordinated test assertions. This is the tool for validating failover behavior, split-brain scenarios, and cluster membership transitions.
In the SCADA system, MultiNodeTestRunner is essential for validating the core availability guarantee: that the standby node correctly takes over device communication when the active node fails, without losing or duplicating commands.
When to Use
- Testing failover scenarios (active node crash → standby takes over)
- Validating Split Brain Resolver behavior in the 2-node topology
- Testing Cluster Singleton migration (Device Manager moves to the standby)
- Verifying Distributed Data replication between nodes
- Testing graceful shutdown and rejoin sequences
When Not to Use
- Unit testing individual actor logic — use TestKit
- Integration tests that only need DI — use Hosting.TestKit
- Performance or load testing — MultiNodeTestRunner adds significant overhead from process coordination
Design Decisions for the SCADA System
Key Failover Scenarios to Test
- Active node hard crash: Kill the active node's process. Verify the standby detects the failure, acquires the singleton, and starts device actors.
- Active node graceful shutdown: Initiate CoordinatedShutdown on the active node. Verify the singleton migrates cleanly with buffered messages preserved.
- Network partition (simulated): Prevent the two nodes from communicating. Verify SBR correctly resolves the partition (one node survives, one downs itself).
- Rejoin after failure: After failover, restart the failed node. Verify it joins the cluster as the new standby without disrupting the active node.
- Command in-flight during failover: Send a command to the active node, then kill it before the command is acknowledged. Verify the new active node recovers the pending command from the Persistence journal.
Test Structure
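using System;
using Akka.Actor;
using Akka.Cluster.TestKit;
using Akka.MultiNode.TestAdapter; // home of [MultiNodeFact] in recent Akka.NET releases
using Xunit;

public class FailoverSpec : MultiNodeClusterSpec
{
    private readonly FailoverSpecConfig _config;

    public FailoverSpec() : this(new FailoverSpecConfig()) { }

    protected FailoverSpec(FailoverSpecConfig config) : base(config, typeof(FailoverSpec))
    {
        _config = config;
    }

    [MultiNodeFact]
    public void Active_node_failure_should_trigger_singleton_migration()
    {
        // The first declared role hosts the TestConductor controller, so it issues
        // the kill and must survive. Second therefore plays the initial active node.
        var first = _config.First;
        var second = _config.Second;

        // Arrange: second starts the cluster (becoming the oldest member), then first joins
        AwaitClusterUp(second, first);

        // Verify the singleton runs on the oldest node. Assumes a ClusterSingletonManager
        // named "device-manager" was started on both nodes; the singleton is its child.
        RunOn(() =>
        {
            var singleton = Sys.ActorSelection("/user/device-manager/singleton");
            singleton.Tell(new Identify(1), TestActor);
            var identity = ExpectMsg<ActorIdentity>();
            Assert.NotNull(identity.Subject);
        }, second);
        EnterBarrier("singleton-running");

        // Act: the controller kills the active node
        RunOn(() =>
        {
            TestConductor.Exit(second, 0).Wait();
        }, first);

        // Assert: the singleton is eventually started on the surviving node
        RunOn(() =>
        {
            AwaitAssert(() =>
            {
                var singleton = Sys.ActorSelection("/user/device-manager/singleton");
                singleton.Tell(new Identify(2), TestActor);
                var identity = ExpectMsg<ActorIdentity>(TimeSpan.FromSeconds(3));
                Assert.NotNull(identity.Subject);
            }, TimeSpan.FromSeconds(60), TimeSpan.FromSeconds(2));
        }, first);
    }
}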
Spec Configuration
using Akka.Configuration;
using Akka.Remote.TestKit;

public class FailoverSpecConfig : MultiNodeConfig
{
    public RoleName First { get; }
    public RoleName Second { get; }

    public FailoverSpecConfig()
    {
        // Roles are assigned in declaration order; the first one hosts the TestConductor controller
        First = Role("first");
        Second = Role("second");

        CommonConfig = ConfigurationFactory.ParseString(@"
            akka.actor.provider = cluster
            akka.remote.dot-netty.tcp.port = 0
            akka.cluster {
                downing-provider-class = ""Akka.Cluster.SBR.SplitBrainResolverProvider, Akka.Cluster""
                split-brain-resolver {
                    active-strategy = keep-oldest
                    keep-oldest.down-if-alone = on
                }
                min-nr-of-members = 1
            }
        ");
    }
}
Common Patterns
Barriers for Synchronization
Use EnterBarrier to synchronize test steps across nodes:
// Both nodes reach this point before proceeding
EnterBarrier("cluster-formed");
// ... do work ...
EnterBarrier("work-complete");
RunOn for Node-Specific Logic
Execute test logic on specific nodes:
RunOn(() =>
{
    // This code runs only on the "first" node
    Cluster.Join(GetAddress(First));
}, First);

RunOn(() =>
{
    // This code runs only on the "second" node
    Cluster.Join(GetAddress(First));
}, Second);
TestConductor for Failure Injection
The TestConductor controls node lifecycle and network simulation:
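// Conductor calls must be issued from the controller node (the first declared role),
// so the controller itself cannot be the node that is killed.

// Kill a node
TestConductor.Exit(Second, 0).Wait();

// Simulate network partition (blackhole traffic).
// Blackhole and PassThrough require TestTransport = true in the MultiNodeConfig.
TestConductor.Blackhole(First, Second, ThrottleTransportAdapter.Direction.Both).Wait();

// Restore network
TestConductor.PassThrough(First, Second, ThrottleTransportAdapter.Direction.Both).Wait();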
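The network-simulation calls only take effect when the spec's configuration enables the test transport. A minimal sketch of what SplitBrainSpecConfig might look like, assuming it mirrors FailoverSpecConfig and differs only in switching that transport on (the class body here is an illustration, not the project's actual file):

```csharp
using Akka.Configuration;
using Akka.Remote.TestKit;

public class SplitBrainSpecConfig : MultiNodeConfig
{
    public RoleName First { get; }
    public RoleName Second { get; }

    public SplitBrainSpecConfig()
    {
        First = Role("first");
        Second = Role("second");

        // Routes remoting through the failure-injecting transport adapters,
        // which Blackhole, PassThrough, and Throttle depend on.
        TestTransport = true;

        // Assumed to repeat the SBR settings used by FailoverSpecConfig.
        CommonConfig = ConfigurationFactory.ParseString(@"
            akka.actor.provider = cluster
            akka.remote.dot-netty.tcp.port = 0
            akka.cluster {
                downing-provider-class = ""Akka.Cluster.SBR.SplitBrainResolverProvider, Akka.Cluster""
                split-brain-resolver {
                    active-strategy = keep-oldest
                    keep-oldest.down-if-alone = on
                }
            }
        ");
    }
}
```

Keeping this in a separate config class means the failover specs do not pay for the extra transport layer when they never partition the network.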
Timeout Handling
Multi-node tests involve network coordination and are inherently slower. Use generous timeouts:
AwaitAssert(() =>
{
    // Assertion that may take time (singleton migration, failure detection)
}, duration: TimeSpan.FromSeconds(60), interval: TimeSpan.FromSeconds(2));
Anti-Patterns
Testing Everything Multi-Node
Multi-node tests are slow (process startup, cluster formation, barrier synchronization). Only test scenarios that genuinely require multiple nodes: failover, partition handling, data replication. All other tests should use TestKit or Hosting.TestKit.
Brittle Timing Assertions
Do not assert that failover completes in exactly N seconds. Timing varies with machine load, GC pauses, and CI environment. Use AwaitAssert with a generous maximum timeout.
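The difference between a timing assertion and a deadline-bounded retry can be sketched without any Akka types. The helper below is hypothetical (Eventually.Holds is not part of TestKit) and only illustrates the retry-until-deadline idea behind AwaitAssert; inside a spec, use AwaitAssert itself.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

public static class Eventually
{
    // Re-runs `assertion` until it stops throwing or `max` is about to elapse.
    // The test passes as soon as the condition holds, however long that took.
    public static void Holds(Action assertion, TimeSpan max, TimeSpan interval)
    {
        var clock = Stopwatch.StartNew();
        while (true)
        {
            try
            {
                assertion();
                return; // condition holds: stop immediately
            }
            catch (Exception) when (clock.Elapsed + interval < max)
            {
                // Not true yet and there is still time: wait, then retry.
                Thread.Sleep(interval);
            }
            // Once the deadline is reached the filter is false,
            // so the last failure propagates to the caller.
        }
    }
}
```

A call such as Eventually.Holds(check, TimeSpan.FromSeconds(60), TimeSpan.FromSeconds(2)) succeeds whether failover takes 3 seconds on a developer machine or 40 seconds on a loaded CI agent, and fails only if the condition never becomes true within the limit.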
Forgetting Cleanup
Ensure all node processes are terminated after each test. The MultiNodeTestRunner handles this, but custom test infrastructure must clean up explicitly.
Testing with Real Equipment
Multi-node tests should use mock protocol adapters, not real equipment connections. Equipment behavior during test-driven cluster failures could be unpredictable.
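One way to keep equipment out of the loop is to have device actors depend on an adapter abstraction and inject a recording fake in the specs. The sketch below is an assumption about shape only: IProtocolAdapter and RecordingProtocolAdapter are hypothetical names, since the adapter interface is not defined in these notes.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical abstraction over a device protocol.
public interface IProtocolAdapter
{
    void Write(string tag, double value);
}

// In-memory stand-in that records writes instead of touching equipment.
public sealed class RecordingProtocolAdapter : IProtocolAdapter
{
    private readonly object _gate = new object();
    private readonly List<(string Tag, double Value)> _writes =
        new List<(string Tag, double Value)>();

    public void Write(string tag, double value)
    {
        // Actors may call this from different threads, so guard the list.
        lock (_gate) { _writes.Add((tag, value)); }
    }

    // Returns a snapshot so assertions do not race with ongoing writes.
    public IReadOnlyList<(string Tag, double Value)> Writes
    {
        get { lock (_gate) { return _writes.ToArray(); } }
    }
}
```

Because each node runs in its own process, a recorder like this only sees the writes issued on that node. Checking that a command was neither lost nor duplicated across a failover would need the surviving node's recorder plus whatever was captured before the kill.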
Configuration Guidance
Running Multi-Node Tests
# Reference the Akka.MultiNode.TestAdapter package so the specs run under the standard test host
dotnet add ScadaSystem.MultiNode.Tests package Akka.MultiNode.TestAdapter
# Run tests
dotnet test ScadaSystem.MultiNode.Tests
CI/CD Integration
Multi-node tests require multiple processes on the same machine. Ensure the CI agent has sufficient resources and that ports are available (the test runner uses random ports).
Test Project Structure
ScadaSystem.MultiNode.Tests/
  Specs/
    FailoverSpec.cs
    SplitBrainSpec.cs
    RejoinSpec.cs
    CommandRecoverySpec.cs
  Configs/
    FailoverSpecConfig.cs
    SplitBrainSpecConfig.cs
References
- GitHub: https://github.com/akkadotnet/Akka.MultiNodeTestRunner
- Testing Actor Systems: https://getakka.net/articles/actors/testing-actor-systems.html