OtOpcUa.ControlPlane
Five admin-role cluster singletons that drive the v2 deploy, audit, fleet, and redundancy stories. Path: src/Server/ZB.MOM.WW.OtOpcUa.ControlPlane/.
Singletons
| Actor | File | Marker key | Role |
|---|---|---|---|
| ConfigPublishCoordinator | Coordinators/ConfigPublishCoordinator.cs | ConfigPublishCoordinatorKey | Dispatches DispatchDeployment, collects ApplyAcks, seals/fails/times-out. |
| AdminOperationsActor | AdminOperations/AdminOperationsActor.cs | AdminOperationsActorKey | Receives StartDeployment from the UI, snapshots ConfigDb via ConfigComposer, persists Deployment row + ConfigEdit marker, tells the coordinator to dispatch. |
| AuditWriterActor | Audit/AuditWriterActor.cs | AuditWriterActorKey | Batched ConfigAuditLog writer. Flushes every 500 events or 5 s. In-buffer dedup; cross-restart dedup tracked as F3. |
| FleetStatusBroadcaster | Fleet/FleetStatusBroadcaster.cs | FleetStatusBroadcasterKey | Aggregates per-node FleetNodeStatus heartbeats; publishes FleetStatusChanged on the fleet-status DPS topic (SignalR bridge tracked as F16). |
| RedundancyStateActor | Redundancy/RedundancyStateActor.cs | RedundancyStateActorKey | Cluster-event subscriber; debounces 250 ms; publishes RedundancyStateChanged on the redundancy-state DPS topic. |

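
The AuditWriterActor row above describes a size-or-age flush policy with in-buffer dedup. A minimal sketch of that policy as a plain class follows. Everything here is an assumption for illustration: the class name `AuditBatchBuffer`, dedup keyed on an event `Guid`, and age measured from the last flush rather than from the oldest buffered event. It is not the actor's actual implementation.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper; the real AuditWriterActor is not shown in this document.
public sealed class AuditBatchBuffer
{
    private readonly int _maxEvents;
    private readonly TimeSpan _maxAge;
    private readonly Dictionary<Guid, string> _pending = new();
    private DateTimeOffset _lastFlush;

    public AuditBatchBuffer(DateTimeOffset now, int maxEvents = 500, double maxAgeSeconds = 5)
    {
        _maxEvents = maxEvents;
        _maxAge = TimeSpan.FromSeconds(maxAgeSeconds);
        _lastFlush = now;
    }

    public int PendingCount => _pending.Count;

    // Buffers one event; a repeated eventId is ignored (in-buffer dedup).
    // Returns the batch to persist once the size threshold is reached, else an empty list.
    public IReadOnlyList<string> Add(Guid eventId, string payload, DateTimeOffset now)
    {
        _pending.TryAdd(eventId, payload);
        return _pending.Count >= _maxEvents ? Drain(now) : Array.Empty<string>();
    }

    // Intended to be called periodically (for example from an actor timer).
    // Flushes a non-empty buffer once the age threshold has elapsed since the last flush.
    public IReadOnlyList<string> Tick(DateTimeOffset now) =>
        _pending.Count > 0 && now - _lastFlush >= _maxAge ? Drain(now) : Array.Empty<string>();

    private IReadOnlyList<string> Drain(DateTimeOffset now)
    {
        var batch = _pending.Values.ToList();
        _pending.Clear();
        _lastFlush = now;
        return batch;
    }
}
```

Passing the clock in as a parameter keeps the policy testable without real timers; an actor wrapping it would call `Add` on each audit message and `Tick` from a scheduled self-message.
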
All five register via WithOtOpcUaControlPlaneSingletons() (extension on AkkaConfigurationBuilder). Each uses ClusterSingletonOptions { Role = "admin" }, so the singleton runs on the oldest node carrying the admin role and migrates to the next-oldest admin node on failover.
```csharp
// Program.cs (admin role only)
builder.Services.AddAkka("otopcua", (ab, sp) =>
{
    ab.WithOtOpcUaClusterBootstrap(sp);
    if (hasAdmin) ab.WithOtOpcUaControlPlaneSingletons();
    if (hasDriver) ab.WithOtOpcUaRuntimeActors();
});
```
Resolve from anywhere via IRequiredActor<T> or the ActorRegistry:
```csharp
public sealed class AdminOperationsClient(ActorRegistry registry) : IAdminOperationsClient
{
    private readonly IActorRef _proxy = registry.Get<AdminOperationsActorKey>();
    // ...
}
```
Deploy flow
```
UI → IAdminOperationsClient.StartDeploymentAsync(createdBy)
    │  Ask the AdminOperationsActor singleton proxy
    ▼
AdminOperationsActor
    │  ConfigComposer.SnapshotAndFlattenAsync(db) → ConfigArtifact(blob, revHash)
    │  insert Deployment(Dispatching) + ConfigEdit marker
    │  Tell coordinator → DispatchDeployment
    ▼
ConfigPublishCoordinator
    │  DiscoverDriverNodes() → expected ACK set (host:port per member)
    │  insert NodeDeploymentState(Applying) per driver
    │  Publish DispatchDeployment on "deployments" topic
    │  Start apply-deadline timer (2 min default)
    ▼  DistributedPubSub
DriverHostActor (on each driver node, subscribed to "deployments")
    │  PreStart subscribed; current state Steady(rev)
    │  if currentRev == msg.rev → immediate ApplyAck(Applied) (idempotent)
    │  else Become(Applying) → write NodeDeploymentStatus → ApplyAck
    ▼  via "deployment-acks" topic
ConfigPublishCoordinator (subscribed to "deployment-acks" in PreStart)
    │  PersistNodeAck + collect
    │  all-Applied → Sealed
    │  any-Failed → PartiallyFailed
    │  deadline → TimedOut
```
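
The last three arrows of the flow above amount to a small decision function over the expected ACK set. The sketch below shows one way to express it. The type and member names (`AckStatus`, `DeploymentOutcome`, `AckEvaluator`) are hypothetical, and it assumes the coordinator waits for every expected node before choosing between Sealed and PartiallyFailed; whether the real coordinator fails fast on the first Failed ACK is not stated here.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical types for illustration; the coordinator's real types are not shown here.
public enum AckStatus { Applied, Failed }
public enum DeploymentOutcome { AwaitingApplyAcks, Sealed, PartiallyFailed, TimedOut }

public static class AckEvaluator
{
    // expected: node addresses (host:port) captured at dispatch time.
    // acks: ACKs received so far, keyed by node address.
    public static DeploymentOutcome Evaluate(
        HashSet<string> expected,
        Dictionary<string, AckStatus> acks,
        bool deadlineElapsed)
    {
        // Ignore ACKs from nodes that were not part of the expected set.
        var relevant = acks.Where(kv => expected.Contains(kv.Key)).ToList();

        if (relevant.Count == expected.Count)
        {
            // Every expected node has answered: one failure is enough to mark the deploy.
            return relevant.Any(kv => kv.Value == AckStatus.Failed)
                ? DeploymentOutcome.PartiallyFailed
                : DeploymentOutcome.Sealed;
        }

        // Some nodes are still silent: either keep waiting or give up at the deadline.
        return deadlineElapsed
            ? DeploymentOutcome.TimedOut
            : DeploymentOutcome.AwaitingApplyAcks;
    }
}
```

Keeping the decision pure means the actor only has to call it after each persisted ACK and when the deadline timer fires.
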
The dedicated deployment-acks topic and the coordinator's subscription to it were added in commit 5cfbe8b. Before that, ACKs were published back on deployments, which the coordinator did not subscribe to, so they were silently dropped and deployments hung at AwaitingApplyAcks forever in multi-node tests.
Failover recovery
If the admin singleton fails over mid-deploy, the new instance's PreStart queries NodeDeploymentState for any Dispatching/AwaitingApplyAcks row, rebuilds _expectedAcks + _acks from persisted state, and resumes the deadline timer. See Coordinators/ConfigPublishCoordinator.cs::PreStart.
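
The recovery step can be sketched as a pure rebuild from persisted rows. This is an illustration under stated assumptions, not the code in PreStart: the row shape `NodeAckRow`, the string statuses, and the helper names are hypothetical. It also assumes a row still in Applying means "no ACK yet", and that the timer resumes with the time remaining since dispatch rather than restarting in full.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical row shape; the real NodeDeploymentState entity is not shown in this document.
public sealed record NodeAckRow(string NodeAddress, string Status);

public static class CoordinatorRecovery
{
    // Rebuilds the in-memory ACK bookkeeping from persisted per-node rows.
    public static (HashSet<string> ExpectedAcks, Dictionary<string, string> Acks) Rebuild(
        IEnumerable<NodeAckRow> rows)
    {
        var expected = new HashSet<string>();
        var acks = new Dictionary<string, string>();
        foreach (var row in rows)
        {
            // Every persisted row was part of the original dispatch.
            expected.Add(row.NodeAddress);
            // Assumption: "Applying" is the initial state, so anything else is a recorded ACK.
            if (row.Status != "Applying")
                acks[row.NodeAddress] = row.Status;
        }
        return (expected, acks);
    }

    // Time left on the apply deadline, clamped at zero if it has already passed.
    public static TimeSpan RemainingDeadline(
        DateTimeOffset dispatchedAt, DateTimeOffset now, TimeSpan applyDeadline)
    {
        var remaining = dispatchedAt + applyDeadline - now;
        return remaining > TimeSpan.Zero ? remaining : TimeSpan.Zero;
    }
}
```

A zero result from `RemainingDeadline` would let the new instance treat the deploy as already past its deadline instead of granting it a fresh window.
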
ConfigComposer
Pure function SnapshotAndFlattenAsync(db) → ConfigArtifact(byte[], string):
- Reads every live-edit table.
- Serialises to a stable byte[] (deterministic ordering).
- Computes SHA-256 over the bytes → 64-character hex RevisionHash.
Same DB state → same artifact + same hash. That's what makes the NoChanges outcome work (AdminOperations compares the proposed hash to the last sealed deployment's hash).
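
To make the determinism argument concrete, here is a minimal sketch of the "stable bytes, then SHA-256" idea. It stands in key/value pairs for the live-edit tables and uses a simple line-based encoding; the class name `RevisionHashSketch`, the encoding, and the lowercase hex output are all assumptions, since the actual serialisation format is not described here.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Security.Cryptography;
using System.Text;

// Hypothetical stand-in for ConfigComposer; illustrates the hashing idea only.
public static class RevisionHashSketch
{
    public static (byte[] Blob, string RevisionHash) Compose(
        IEnumerable<KeyValuePair<string, string>> rows)
    {
        var sb = new StringBuilder();
        // Ordinal ordering by key makes the output independent of read order and culture.
        foreach (var row in rows.OrderBy(r => r.Key, StringComparer.Ordinal))
            sb.Append(row.Key).Append('=').Append(row.Value).Append('\n');

        byte[] blob = Encoding.UTF8.GetBytes(sb.ToString());
        // SHA-256 yields 32 bytes, which is 64 hex characters.
        string hash = Convert.ToHexString(SHA256.HashData(blob)).ToLowerInvariant();
        return (blob, hash);
    }
}
```

Two calls with the same rows in a different order yield the same hash, which is the property the NoChanges comparison relies on.
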
ServiceLevelCalculator
Pure function exposed at Redundancy/ServiceLevelCalculator.Compute(NodeHealthInputs). Returns the OPC UA ServiceLevel byte per the truth table in Redundancy.md. No side effects; trivially unit-testable.
DPS topics
| Topic | Publisher | Subscribers |
|---|---|---|
| deployments | ConfigPublishCoordinator | DriverHostActor (per-node) |
| deployment-acks | DriverHostActor | ConfigPublishCoordinator |
| fleet-status | FleetStatusBroadcaster | (SignalR bridge, F16) |
| redundancy-state | RedundancyStateActor | (per-node ServiceLevel calc, F10) |

Tests
tests/Server/ZB.MOM.WW.OtOpcUa.ControlPlane.Tests/ — 29 tests covering coordinator (happy path, timeout, failover recovery), AdminOps (StartDeployment outcomes), AuditWriter (batching, dedup), FleetStatusBroadcaster (heartbeat staleness), RedundancyStateActor (debounce, snapshot), ConfigComposer (purity), ServiceLevelCalculator (truth table).
Multi-node tests (cross-ActorSystem) are in tests/Server/ZB.MOM.WW.OtOpcUa.Host.IntegrationTests/.