test(integration): F22 — failover scenario tests + harness Stop/Restart primitives

Extends TwoNodeClusterHarness with three lifecycle primitives:
- StopNodeBAsync()      — graceful CoordinatedShutdown (Cluster.Leave)
- RestartNodeBAsync()   — rebuild node B on same Akka port + same in-memory DB
- WaitForClusterSizeAsync(n) — convergence assertion helper

Adds three failover scenario tests:
- Stopping node B shrinks cluster to 1 Up member
- Restarted node B rejoins on the same Akka port
- Deployment started with B down seals with a single NodeDeploymentState
  (validates ConfigPublishCoordinator.DiscoverDriverNodes snapshots
   membership at dispatch time)

Closes follow-up F22. Integration test count: 6 → 9 (+3).
Joseph Doherty
2026-05-26 07:13:14 -04:00
parent 4e6ef648d1
commit cd5540cb1a
2 changed files with 153 additions and 0 deletions

@@ -88,6 +88,58 @@ public sealed class TwoNodeClusterHarness : IAsyncDisposable
        return harness;
    }

    /// <summary>
    /// Gracefully shuts down node B via <see cref="WebApplication.DisposeAsync"/>, which runs
    /// CoordinatedShutdown → Cluster.Leave. Node A sees the member transition to Removed within
    /// a couple of seconds. Use this for failover scenarios; call <see cref="RestartNodeBAsync"/>
    /// to bring it back on the same Akka port.
    /// </summary>
    public async Task StopNodeBAsync()
    {
        if (NodeB is null) return;
        await NodeB.DisposeAsync();
        NodeB = null!;
    }

    /// <summary>
    /// Rebuilds node B on the same Akka port + same in-memory ConfigDb and waits for the cluster
    /// to re-converge to 2 Up members. Use after <see cref="StopNodeBAsync"/> to test rejoin.
    /// </summary>
    public async Task RestartNodeBAsync(TimeSpan? formationTimeout = null)
    {
        NodeB = await BuildNodeAsync(
            host: LoopbackHost,
            akkaPort: NodeBAkkaPort,
            seedHost: LoopbackHost,
            seedAkkaPort: NodeAAkkaPort,
            dbName: SharedDbName);
        await WaitForClusterFormationAsync(
            NodeASystem,
            NodeBSystem,
            formationTimeout ?? TimeSpan.FromSeconds(20));
    }

    /// <summary>
    /// Waits for node A's cluster view to reach <paramref name="expectedUpMembers"/> members in
    /// <see cref="MemberStatus.Up"/>. Used for asserting shrink-after-stop or grow-after-restart.
    /// </summary>
    public async Task WaitForClusterSizeAsync(int expectedUpMembers, TimeSpan timeout)
    {
        var deadline = DateTime.UtcNow + timeout;
        while (DateTime.UtcNow < deadline)
        {
            var count = Akka.Cluster.Cluster.Get(NodeASystem).State.Members
                .Count(m => m.Status == MemberStatus.Up);
            if (count == expectedUpMembers) return;
            await Task.Delay(200);
        }
        var actual = Akka.Cluster.Cluster.Get(NodeASystem).State.Members
            .Count(m => m.Status == MemberStatus.Up);
        throw new TimeoutException(
            $"Cluster did not converge to {expectedUpMembers} Up members within {timeout}. Actual={actual}");
    }

    private static async Task<WebApplication> BuildNodeAsync(
        string host, int akkaPort, string seedHost, int seedAkkaPort, string dbName)
    {