review(ControlPlane): fix premature deploy-seal from unexpected-node ack

Review at HEAD 7286d320. ControlPlane-001 (Medium): ConfigPublishCoordinator.HandleAck
now discards acks from nodes not in _expectedAcks, which could previously trigger a premature
SealDeployment; adds a regression test. ControlPlane-002 (flipped-node log count) and
ControlPlane-003 (redundant mapper arms) tidied.
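A minimal sketch of the ControlPlane-001 guard follows. It assumes, as the regression test's doc comment implies, that the coordinator seals once the number of recorded acks reaches the number of expected acks. The names `expectedAcks`, `acks` and `isSealed`, and the `HandleAck(string)` signature, are hypothetical simplifications of the actor's `_expectedAcks`/`_acks` state. The real coordinator is an Akka.NET actor that also tracks apply outcomes and persists deployment status; none of that is modelled here.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical, simplified stand-in for ConfigPublishCoordinator's ack bookkeeping.
// No Akka.NET and no database: just the two sets and the seal decision.
var expectedAcks = new HashSet<string> { "expected-driver" };
var acks = new HashSet<string>();
var isSealed = false;

// Returns true if the ack was recorded, false if it was discarded.
bool HandleAck(string nodeId)
{
    // The guard: ignore acks from nodes outside the expected set, so an
    // unexpected node can never inflate the count used below.
    if (!expectedAcks.Contains(nodeId))
        return false;

    acks.Add(nodeId);

    // Count-based seal check (assumed). This is only safe because the guard
    // above keeps acks a subset of expectedAcks.
    if (acks.Count == expectedAcks.Count)
        isSealed = true;

    return true;
}

Console.WriteLine(HandleAck("rogue-node"));      // False: discarded
Console.WriteLine(isSealed);                     // False: still waiting
Console.WriteLine(HandleAck("expected-driver")); // True
Console.WriteLine(isSealed);                     // True: every expected node acked
```

Without the `Contains` check, the rogue ack would raise the count to 1 and seal before `expected-driver` answered, which is the failure mode described above. Comparing sets instead of counts (for example `expectedAcks.IsSubsetOf(acks)`) would also close the hole. Discarding unknown senders has the added benefit of keeping `acks` limited to nodes the coordinator actually expects.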
Joseph Doherty
2026-06-19 10:52:22 -04:00
parent 3512089c90
commit 1aa7905676
5 changed files with 181 additions and 13 deletions
@@ -60,6 +60,61 @@ public sealed class ConfigPublishCoordinatorTests : ControlPlaneActorTestBase
        db.Deployments.Single().Status.ShouldBe(DeploymentStatus.Sealed);
    }
    /// <summary>
    /// Regression guard for ControlPlane-001: an ApplyAck from a node that is NOT in the expected-ack
    /// set (i.e. was not a driver-role member when DispatchDeployment ran or, after a restart, has no
    /// NodeDeploymentState row for the deployment) must be discarded.
    /// Without the fix, an unexpected-node ack inflates <c>_acks.Count</c> and can cause the
    /// coordinator to seal a deployment before every expected node has responded.
    ///
    /// Scenario: this harness has no driver-role cluster members, so DispatchDeployment would seal
    /// immediately with an empty expected set. The test therefore uses PreStart recovery instead:
    /// a single NodeDeploymentState row ("expected-driver") seeds the expected set, and an ack from
    /// "rogue-node" must be ignored so the deployment keeps waiting for the real node.
    /// </summary>
    [Fact]
    public void ApplyAck_from_unexpected_node_is_discarded_and_does_not_seal_prematurely()
    {
        var dbFactory = NewInMemoryDbFactory();
        var deploymentId = SeedDispatchingDeployment(dbFactory);
        // Seed a NodeDeploymentState row for "expected-driver" so the coordinator sees 1 expected ack.
        using (var db = dbFactory.CreateDbContext())
        {
            db.NodeDeploymentStates.Add(new Configuration.Entities.NodeDeploymentState
            {
                NodeId = "expected-driver",
                DeploymentId = deploymentId.Value,
                Status = Configuration.Enums.NodeDeploymentStatus.Applying,
            });
            db.SaveChanges();
        }
        // Use a long deadline so a timeout cannot confound the result.
        var actor = Sys.ActorOf(ConfigPublishCoordinator.Props(dbFactory, TimeSpan.FromMinutes(5)));
        // The coordinator normally reaches AwaitingApplyAcks via DispatchDeployment, which seeds the
        // expected-ack set from _cluster.State.Members filtered to the driver role. This harness has
        // no driver-role members, so that path would leave _expectedAcks empty and seal immediately.
        //
        // To exercise the discard logic, the test uses the PreStart recovery path instead: the
        // coordinator is started WITHOUT a dispatch, recovers the in-flight deployment from the DB
        // (the NodeDeploymentState row seeds _expectedAcks = {"expected-driver"}), and then receives
        // an ack from an UNEXPECTED node. With the bug, the count check fires and seals; with the
        // fix, the ack is discarded.
        //
        // Send an ack from a rogue node that is NOT in _expectedAcks.
        actor.Tell(new ApplyAck(deploymentId, NodeId.Parse("rogue-node"),
            ApplyAckOutcome.Applied, null, CorrelationId.NewId()));
        // Give the actor time to process the ack (and, if the bug is present, to seal).
        ExpectNoMsg(TimeSpan.FromMilliseconds(400));
        using var db2 = dbFactory.CreateDbContext();
        var status = db2.Deployments.Single().Status;
        // The deployment must still be waiting for acks (AwaitingApplyAcks, or Dispatching as
        // seeded): it must NOT be Sealed or PartiallyFailed.
        status.ShouldNotBe(DeploymentStatus.Sealed);
        status.ShouldNotBe(DeploymentStatus.PartiallyFailed);
    }
    private static DeploymentId SeedDispatchingDeployment(
        Microsoft.EntityFrameworkCore.IDbContextFactory<Configuration.OtOpcUaConfigDbContext> dbFactory)
    {