lmxopcua/docs/v2/ControlPlane.md
Joseph Doherty 1689901c0e docs(v2): Architecture-v2 + Cluster + ControlPlane + Runtime overviews (Task 65)
Four new docs at docs/v2/ giving a single-page tour of each v2 piece:
- Architecture-v2.md: top-level mental model (fused Host + roles + cluster + live-edit)
- Cluster.md: AkkaClusterOptions + IClusterRoleInfo + WithOtOpcUaClusterBootstrap
- ControlPlane.md: 5 admin singletons + DPS topics + deploy flow + failover recovery
- Runtime.md: per-node actor tree + state machines + engine-wiring follow-up map

Each links back to the design doc for depth. Architecture-v2 cross-references
the other three + ServiceHosting + Redundancy + security.
2026-05-26 06:41:48 -04:00

OtOpcUa.ControlPlane

OtOpcUa.ControlPlane hosts five admin-role cluster singletons that drive the v2 deploy, audit, fleet, and redundancy stories. Path: src/Server/ZB.MOM.WW.OtOpcUa.ControlPlane/.

Singletons

| Actor | File | Marker key | Role |
| --- | --- | --- | --- |
| ConfigPublishCoordinator | Coordinators/ConfigPublishCoordinator.cs | ConfigPublishCoordinatorKey | Dispatches DispatchDeployment, collects ApplyAcks, then seals, fails, or times out the deployment. |
| AdminOperationsActor | AdminOperations/AdminOperationsActor.cs | AdminOperationsActorKey | Receives StartDeployment from the UI, snapshots ConfigDb via ConfigComposer, persists Deployment row + ConfigEdit marker, tells the coordinator to dispatch. |
| AuditWriterActor | Audit/AuditWriterActor.cs | AuditWriterActorKey | Batched ConfigAuditLog writer. Flushes every 500 events or 5 s. In-buffer dedup; cross-restart dedup tracked as F3. |
| FleetStatusBroadcaster | Fleet/FleetStatusBroadcaster.cs | FleetStatusBroadcasterKey | Aggregates per-node FleetNodeStatus heartbeats; publishes FleetStatusChanged on the fleet-status DPS topic (SignalR bridge tracked as F16). |
| RedundancyStateActor | Redundancy/RedundancyStateActor.cs | RedundancyStateActorKey | Cluster-event subscriber; debounces 250 ms; publishes RedundancyStateChanged on the redundancy-state DPS topic. |
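
The AuditWriterActor row describes a size-or-time flush rule with in-buffer dedup. A minimal sketch of that buffering policy follows; it is not the actor's actual code. AuditEvent, EventId, and AuditBatchBuffer are hypothetical names, the 5 s limit is interpreted here as the age of the oldest buffered event, and resetting the dedup set on every drain is an assumption.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical audit event; the real ConfigAuditLog row shape is not shown in this document.
public sealed record AuditEvent(Guid EventId, string Action);

// Buffering policy only: dedup on EventId, flush at 500 events or once the oldest event is 5 s old.
public sealed class AuditBatchBuffer
{
    public const int MaxBatchSize = 500;
    public static readonly TimeSpan MaxBatchAge = TimeSpan.FromSeconds(5);

    private readonly List<AuditEvent> _pending = new();
    private readonly HashSet<Guid> _seen = new();
    private DateTimeOffset? _oldest;

    // Returns false when an event with the same id is already buffered (in-buffer dedup).
    public bool TryAdd(AuditEvent evt, DateTimeOffset now)
    {
        if (!_seen.Add(evt.EventId)) return false;
        _pending.Add(evt);
        _oldest ??= now;
        return true;
    }

    // True when either the size limit or the age limit has been reached.
    public bool ShouldFlush(DateTimeOffset now) =>
        _pending.Count >= MaxBatchSize ||
        (_oldest is { } t && now - t >= MaxBatchAge);

    // Hands the batch to the caller and resets the buffer (assumption: dedup does not span batches).
    public IReadOnlyList<AuditEvent> Drain()
    {
        var batch = _pending.ToArray();
        _pending.Clear();
        _seen.Clear();
        _oldest = null;
        return batch;
    }
}
```

Passing the clock in as a parameter keeps the policy testable without timers; an actor would call ShouldFlush from both its message handler and a scheduled tick.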

All five register via WithOtOpcUaControlPlaneSingletons() (extension on AkkaConfigurationBuilder). Each uses ClusterSingletonOptions { Role = "admin" }, so the singleton runs on the oldest node that carries the admin role and migrates to the next-oldest admin node on failover.

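```csharp
// Program.cs: control-plane singletons are registered only when the node has the admin role.
// hasAdmin / hasDriver are role flags assumed to be computed earlier in Program.cs.
builder.Services.AddAkka("otopcua", (ab, sp) =>
{
    ab.WithOtOpcUaClusterBootstrap(sp);
    if (hasAdmin) ab.WithOtOpcUaControlPlaneSingletons();
    if (hasDriver) ab.WithOtOpcUaRuntimeActors();
});
```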

Resolve from anywhere via IRequiredActor<T> or the ActorRegistry:

```csharp
public sealed class AdminOperationsClient(ActorRegistry registry) : IAdminOperationsClient
{
    private readonly IActorRef _proxy = registry.Get<AdminOperationsActorKey>();
    // ...
}
```

Deploy flow

UI → IAdminOperationsClient.StartDeploymentAsync(createdBy)
        │  Ask the AdminOperationsActor singleton proxy
        ▼
AdminOperationsActor
        │  ConfigComposer.SnapshotAndFlattenAsync(db) → ConfigArtifact(blob, revHash)
        │  insert Deployment(Dispatching) + ConfigEdit marker
        │  Tell coordinator → DispatchDeployment
        ▼
ConfigPublishCoordinator
        │  DiscoverDriverNodes() → expected ACK set (host:port per member)
        │  insert NodeDeploymentState(Applying) per driver
        │  Publish DispatchDeployment on "deployments" topic
        │  Start apply-deadline timer (2 min default)
        ▼  DistributedPubSub
DriverHostActor (on each driver node — subscribed to "deployments")
        │  PreStart subscribed; current state Steady(rev)
        │  if currentRev == msg.rev → immediate ApplyAck(Applied)  (idempotent)
        │  else Become(Applying) → write NodeDeploymentStatus → ApplyAck
        ▼  via "deployment-acks" topic
ConfigPublishCoordinator (subscribed to "deployment-acks" in PreStart)
        │  PersistNodeAck + collect
        │  all-Applied → Sealed
        │  any-Failed → PartiallyFailed
        │  deadline → TimedOut
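
The last three lines of the flow amount to a small decision rule over the expected ACK set. The sketch below models that rule as a pure function; it is illustrative, not the coordinator's code. AckStatus, DeploymentOutcome, and AckResolution are hypothetical names, and resolving PartiallyFailed as soon as one Failed ACK arrives is an assumption (the real coordinator may wait for the remaining nodes).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public enum AckStatus { Applied, Failed }
public enum DeploymentOutcome { AwaitingApplyAcks, Sealed, PartiallyFailed, TimedOut }

public static class AckResolution
{
    // expected: node addresses (host:port) discovered at dispatch time.
    // acks: latest ACK per node received so far.
    public static DeploymentOutcome Resolve(
        IReadOnlySet<string> expected,
        IReadOnlyDictionary<string, AckStatus> acks,
        bool deadlineElapsed)
    {
        // ACKs from nodes outside the expected set are ignored.
        var relevant = acks.Where(kv => expected.Contains(kv.Key)).ToList();

        // Assumption: a single Failed ACK decides the outcome immediately.
        if (relevant.Any(kv => kv.Value == AckStatus.Failed))
            return DeploymentOutcome.PartiallyFailed;

        // Every expected node has answered Applied.
        if (relevant.Count == expected.Count)
            return DeploymentOutcome.Sealed;

        // Still missing ACKs: keep waiting unless the apply deadline fired.
        return deadlineElapsed
            ? DeploymentOutcome.TimedOut
            : DeploymentOutcome.AwaitingApplyAcks;
    }
}
```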

The dedicated deployment-acks topic and the coordinator's subscription to it were added in commit 5cfbe8b. Before that, ACKs were published back on deployments and the coordinator, which was not subscribed to that topic, silently dropped them, so deployments hung at AwaitingApplyAcks forever in multi-node tests.
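
On the driver side, the idempotent branch of the flow and the choice of reply topic can be sketched as a small state holder. This is a simplified model under stated assumptions, not DriverHostActor itself: DriverApplyState, ApplyOutcome, the message record shapes, and the apply delegate are all hypothetical, and the Become(Applying) transition is collapsed into a synchronous call.

```csharp
using System;

public enum ApplyOutcome { Applied, Failed }

// Hypothetical message shapes; the real DispatchDeployment / ApplyAck fields are not shown here.
public sealed record DispatchDeployment(string DeploymentId, string RevisionHash);
public sealed record ApplyAck(string DeploymentId, string Node, ApplyOutcome Outcome);

public sealed class DriverApplyState
{
    // ACKs go to their own topic, never back onto "deployments".
    public const string AckTopic = "deployment-acks";

    private readonly string _node;
    private readonly Func<DispatchDeployment, bool> _apply; // hypothetical apply step
    public string? CurrentRev { get; private set; }

    public DriverApplyState(string node, string? currentRev, Func<DispatchDeployment, bool> apply)
    {
        _node = node;
        CurrentRev = currentRev;
        _apply = apply;
    }

    // Returns the topic to publish on together with the ACK to publish.
    public (string Topic, ApplyAck Ack) Handle(DispatchDeployment msg)
    {
        // Idempotent replay: already at this revision, so acknowledge without re-applying.
        if (CurrentRev == msg.RevisionHash)
            return (AckTopic, new ApplyAck(msg.DeploymentId, _node, ApplyOutcome.Applied));

        var ok = _apply(msg);
        if (ok) CurrentRev = msg.RevisionHash;
        return (AckTopic, new ApplyAck(msg.DeploymentId, _node, ok ? ApplyOutcome.Applied : ApplyOutcome.Failed));
    }
}
```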

Failover recovery

If the admin singleton fails over mid-deploy, the new instance's PreStart queries NodeDeploymentState for any Dispatching/AwaitingApplyAcks row, rebuilds _expectedAcks + _acks from persisted state, and resumes the deadline timer. See Coordinators/ConfigPublishCoordinator.cs::PreStart.
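
The recovery step can be pictured as a pure rebuild from persisted rows. The sketch below is an assumption-laden model rather than the actual PreStart logic: NodeDeploymentRow, NodeRowStatus, RecoveredState, and CoordinatorRecovery are hypothetical names, and it assumes the timer resumes with whatever is left of the original deadline instead of restarting from zero.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public enum NodeRowStatus { Applying, Applied, Failed }

// Hypothetical projection of one persisted NodeDeploymentState row.
public sealed record NodeDeploymentRow(string Node, NodeRowStatus Status);

public sealed record RecoveredState(
    IReadOnlySet<string> ExpectedAcks,
    IReadOnlyDictionary<string, NodeRowStatus> Acks,
    TimeSpan RemainingDeadline);

public static class CoordinatorRecovery
{
    public static RecoveredState Rebuild(
        IEnumerable<NodeDeploymentRow> rows,
        DateTimeOffset dispatchedAt,
        DateTimeOffset now,
        TimeSpan applyDeadline)
    {
        var list = rows.ToList();

        // Every row written at dispatch time is a node the coordinator still expects to hear from.
        var expected = list.Select(r => r.Node).ToHashSet();

        // Rows that already left Applying represent ACKs persisted before the failover.
        var acks = list
            .Where(r => r.Status != NodeRowStatus.Applying)
            .ToDictionary(r => r.Node, r => r.Status);

        // Assumption: resume with the time left on the original deadline, clamped at zero.
        var remaining = applyDeadline - (now - dispatchedAt);
        if (remaining < TimeSpan.Zero) remaining = TimeSpan.Zero;

        return new RecoveredState(expected, acks, remaining);
    }
}
```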

ConfigComposer

Pure function SnapshotAndFlattenAsync(db) → ConfigArtifact(byte[], string):

  1. Reads every live-edit table.
  2. Serialises to a stable byte[] (deterministic ordering).
  3. Computes SHA-256 over the bytes → 64-hex RevisionHash.

Same DB state → same artifact + same hash. That's what makes the NoChanges outcome work (AdminOperations compares the proposed hash to the last sealed deployment's hash).
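
The determinism property can be illustrated with a small stand-in. This is a sketch under assumptions, not ConfigComposer's real serialisation: ComposerSketch, the (Table, Key, Json) row shape, the tab/newline framing, and the lowercase hex casing are all hypothetical. Only the sort-then-SHA-256 idea comes from the steps above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Security.Cryptography;
using System.Text;

public sealed record ConfigArtifact(byte[] Blob, string RevisionHash);

public static class ComposerSketch
{
    // rows: hypothetical (table, key, json) triples standing in for the live-edit tables.
    public static ConfigArtifact Flatten(IEnumerable<(string Table, string Key, string Json)> rows)
    {
        var sb = new StringBuilder();

        // Ordinal sorting makes the byte layout independent of read order and culture.
        foreach (var r in rows
                     .OrderBy(r => r.Table, StringComparer.Ordinal)
                     .ThenBy(r => r.Key, StringComparer.Ordinal))
        {
            sb.Append(r.Table).Append('\t').Append(r.Key).Append('\t').Append(r.Json).Append('\n');
        }

        var blob = Encoding.UTF8.GetBytes(sb.ToString());

        // SHA-256 is 32 bytes, so the hex form is 64 characters.
        var hash = Convert.ToHexString(SHA256.HashData(blob)).ToLowerInvariant();
        return new ConfigArtifact(blob, hash);
    }

    // NoChanges check: the proposed hash equals the last sealed deployment's hash.
    public static bool IsNoChanges(ConfigArtifact proposed, string? lastSealedHash) =>
        string.Equals(proposed.RevisionHash, lastSealedHash, StringComparison.Ordinal);
}
```

Feeding the same rows in a different order yields the same RevisionHash, which is the property the NoChanges comparison relies on.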

ServiceLevelCalculator

Pure function exposed at Redundancy/ServiceLevelCalculator.Compute(NodeHealthInputs). Returns the OPC UA ServiceLevel byte per the truth table in Redundancy.md. No side effects; trivially unit-testable.

DPS topics

| Topic | Publisher | Subscribers |
| --- | --- | --- |
| deployments | ConfigPublishCoordinator | DriverHostActor (per-node) |
| deployment-acks | DriverHostActor | ConfigPublishCoordinator |
| fleet-status | FleetStatusBroadcaster | (SignalR bridge, F16) |
| redundancy-state | RedundancyStateActor | (per-node ServiceLevel calc, F10) |

Tests

tests/Server/ZB.MOM.WW.OtOpcUa.ControlPlane.Tests/ — 29 tests covering coordinator (happy path, timeout, failover recovery), AdminOps (StartDeployment outcomes), AuditWriter (batching, dedup), FleetStatusBroadcaster (heartbeat staleness), RedundancyStateActor (debounce, snapshot), ConfigComposer (purity), ServiceLevelCalculator (truth table).

Multi-node tests (cross-ActorSystem) are in tests/Server/ZB.MOM.WW.OtOpcUa.Host.IntegrationTests/.