docs(v2): Architecture-v2 + Cluster + ControlPlane + Runtime overviews (Task 65)

Four new docs at docs/v2/ giving a single-page tour of each v2 piece:

- Architecture-v2.md: top-level mental model (fused Host + roles + cluster + live-edit)
- Cluster.md: AkkaClusterOptions + IClusterRoleInfo + WithOtOpcUaClusterBootstrap
- ControlPlane.md: 5 admin singletons + DPS topics + deploy flow + failover recovery
- Runtime.md: per-node actor tree + state machines + engine-wiring follow-up map

Each links back to the design doc for depth. Architecture-v2 cross-references the other three + ServiceHosting + Redundancy + security.

# OtOpcUa.ControlPlane

Five cluster singletons, hosted on admin-role nodes, that drive the v2 deployment, audit, fleet-status, and redundancy-state workflows. Path: `src/Server/ZB.MOM.WW.OtOpcUa.ControlPlane/`.

## Singletons

| Actor | File | Marker key | Responsibility |
|---|---|---|---|
| `ConfigPublishCoordinator` | `Coordinators/ConfigPublishCoordinator.cs` | `ConfigPublishCoordinatorKey` | Publishes `DispatchDeployment` to driver nodes, collects their `ApplyAck`s, and marks the deployment Sealed, PartiallyFailed, or TimedOut. |
| `AdminOperationsActor` | `AdminOperations/AdminOperationsActor.cs` | `AdminOperationsActorKey` | Receives `StartDeployment` from the UI, snapshots ConfigDb via `ConfigComposer`, persists a `Deployment` row and a `ConfigEdit` marker, then tells the coordinator to dispatch. |
| `AuditWriterActor` | `Audit/AuditWriterActor.cs` | `AuditWriterActorKey` | Batched `ConfigAuditLog` writer. Flushes when 500 events are buffered or every 5 s, whichever comes first. Deduplicates within the buffer; cross-restart dedup is tracked as F3. |
| `FleetStatusBroadcaster` | `Fleet/FleetStatusBroadcaster.cs` | `FleetStatusBroadcasterKey` | Aggregates per-node `FleetNodeStatus` heartbeats and publishes `FleetStatusChanged` on the `fleet-status` DistributedPubSub (DPS) topic (SignalR bridge tracked as F16). |
| `RedundancyStateActor` | `Redundancy/RedundancyStateActor.cs` | `RedundancyStateActorKey` | Subscribes to cluster events, debounces them with a 250 ms window, and publishes `RedundancyStateChanged` on the `redundancy-state` DPS topic. |
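
The `AuditWriterActor` buffering policy in the table above can be sketched as plain C#, independent of Akka. This is a minimal sketch under stated assumptions, not the actual actor: `AuditEvent`, its `Id` dedup key, and `AuditBatcher` are hypothetical names, and the actor is assumed to call `Tick()` from a 5 s periodic timer and to write each returned batch to `ConfigAuditLog`.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Hypothetical audit record; using Id as the dedup key is an assumption.
public sealed record AuditEvent(Guid Id, string Summary);

// Sketch of the AuditWriterActor buffering policy: flush when the buffer reaches
// a size threshold (500 by default) or when the periodic 5 s timer calls Tick().
public sealed class AuditBatcher
{
    private readonly int _maxBatch;
    private readonly List<AuditEvent> _buffer = new();
    private readonly HashSet<Guid> _seen = new(); // dedup within the current buffer only

    public AuditBatcher(int maxBatch = 500) => _maxBatch = maxBatch;

    public int Pending => _buffer.Count;

    // Buffers an event and returns a batch to persist once the size threshold is reached.
    public IReadOnlyList<AuditEvent>? Add(AuditEvent evt)
    {
        if (_seen.Add(evt.Id)) _buffer.Add(evt); // a repeat Id in this buffer is dropped
        return _buffer.Count >= _maxBatch ? Flush() : null;
    }

    // Timer path: returns whatever is buffered, or null if the buffer is empty.
    public IReadOnlyList<AuditEvent>? Tick() => _buffer.Count > 0 ? Flush() : null;

    private IReadOnlyList<AuditEvent> Flush()
    {
        var batch = _buffer.ToArray();
        _buffer.Clear();
        _seen.Clear(); // dedup state is reset with each batch
        return batch;
    }
}
```

Because the dedup set is cleared on every flush, this sketch only catches duplicates that land in the same batch, matching the "in-buffer" scope described for the actor.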

All five are registered by `WithOtOpcUaControlPlaneSingletons()`, an extension method on `AkkaConfigurationBuilder`. Each uses `ClusterSingletonOptions { Role = "admin" }`, so the singleton runs on the oldest cluster member with the `admin` role and is restarted on the next-oldest admin node on failover.
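
```csharp
// Program.cs: shared by all nodes; each node registers only the actors for its roles.
// hasAdmin / hasDriver are assumed to reflect this node's configured cluster roles.
builder.Services.AddAkka("otopcua", (ab, sp) =>
{
    ab.WithOtOpcUaClusterBootstrap(sp);                    // cluster bootstrap (see Cluster.md)
    if (hasAdmin) ab.WithOtOpcUaControlPlaneSingletons();  // the five admin singletons
    if (hasDriver) ab.WithOtOpcUaRuntimeActors();          // per-node runtime actors
});
```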

Resolve the singleton proxies from anywhere via `IRequiredActor<T>` or the `ActorRegistry`:

```csharp
using Akka.Actor;
using Akka.Hosting;

public sealed class AdminOperationsClient(ActorRegistry registry) : IAdminOperationsClient
{
    // Proxy to the AdminOperationsActor singleton, looked up by its marker key.
    private readonly IActorRef _proxy = registry.Get<AdminOperationsActorKey>();

    // ...
}
```

## Deploy flow

```
UI → IAdminOperationsClient.StartDeploymentAsync(createdBy)
  │  Ask the AdminOperationsActor singleton proxy
  ▼
AdminOperationsActor
  │  ConfigComposer.SnapshotAndFlattenAsync(db) → ConfigArtifact(blob, revHash)
  │  insert Deployment(Dispatching) + ConfigEdit marker
  │  Tell coordinator → DispatchDeployment
  ▼
ConfigPublishCoordinator
  │  DiscoverDriverNodes() → expected ACK set (host:port per member)
  │  insert NodeDeploymentState(Applying) per driver node
  │  Publish DispatchDeployment on the "deployments" topic
  │  Start apply-deadline timer (2 min default)
  ▼  DistributedPubSub
DriverHostActor (on each driver node; subscribed to "deployments")
  │  subscribed in PreStart; current state Steady(rev)
  │  if currentRev == msg.rev → immediate ApplyAck(Applied) (idempotent)
  │  else Become(Applying) → write NodeDeploymentStatus → ApplyAck
  ▼  via the "deployment-acks" topic
ConfigPublishCoordinator (subscribed to "deployment-acks" in PreStart)
  │  PersistNodeAck + collect
  │  all Applied → Sealed
  │  any Failed  → PartiallyFailed
  │  deadline    → TimedOut
```
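
The coordinator's collect step at the bottom of the diagram reduces to a small state machine. The sketch below is an illustration under assumptions, not the `ConfigPublishCoordinator` code: `AckCollector`, `AckResult`, and `DeploymentOutcome` are hypothetical, node identities are plain `host:port` strings, the first ACK from each node wins, and the outcome is assumed to be decided only once every expected node has answered (or when the deadline fires).

```csharp
using System.Collections.Generic;
using System.Linq;

public enum AckResult { Applied, Failed }
public enum DeploymentOutcome { AwaitingApplyAcks, Sealed, PartiallyFailed, TimedOut }

// Hypothetical model of the coordinator's collect step.
public sealed class AckCollector
{
    private readonly HashSet<string> _expectedAcks;                 // host:port per driver node
    private readonly Dictionary<string, AckResult> _acks = new();

    public AckCollector(IEnumerable<string> expectedNodes) =>
        _expectedAcks = new HashSet<string>(expectedNodes);

    public DeploymentOutcome Outcome { get; private set; } = DeploymentOutcome.AwaitingApplyAcks;

    public DeploymentOutcome OnAck(string node, AckResult result)
    {
        // Ignore ACKs after a terminal outcome and ACKs from members outside the expected set.
        if (Outcome != DeploymentOutcome.AwaitingApplyAcks || !_expectedAcks.Contains(node))
            return Outcome;
        if (!_acks.TryAdd(node, result))
            return Outcome; // first ACK per node wins; redeliveries are ignored

        if (_acks.Count == _expectedAcks.Count)
            Outcome = _acks.Values.All(r => r == AckResult.Applied)
                ? DeploymentOutcome.Sealed
                : DeploymentOutcome.PartiallyFailed;
        return Outcome;
    }

    // Called when the apply-deadline timer (2 min by default) fires.
    public DeploymentOutcome OnDeadline()
    {
        if (Outcome == DeploymentOutcome.AwaitingApplyAcks)
            Outcome = DeploymentOutcome.TimedOut;
        return Outcome;
    }
}
```

Keeping this logic free of actor plumbing is one way to unit-test the Sealed / PartiallyFailed / TimedOut transitions without a cluster.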

The dedicated `deployment-acks` topic and the coordinator's subscription to it were added in commit `5cfbe8b`. Before that, ACKs were published back on `deployments`, a topic the coordinator was not subscribed to, so they were silently dropped and deployments hung in `AwaitingApplyAcks` indefinitely in multi-node tests.

### Failover recovery

If the coordinator singleton fails over mid-deploy (for example, because its admin node leaves the cluster), the new instance's `PreStart` queries `NodeDeploymentState` for any row in `Dispatching` or `AwaitingApplyAcks`, rebuilds `_expectedAcks` and `_acks` from the persisted state, and resumes the deadline timer. See `PreStart` in `Coordinators/ConfigPublishCoordinator.cs`.
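
Failover recovery amounts to rebuilding the coordinator's bookkeeping from persisted rows. In this sketch, `NodeStateRow`, `CoordinatorRecovery`, the `"Applying"` state string, and a persisted dispatch timestamp are all hypothetical; it also assumes the timer resumes with the time remaining rather than restarting the full 2-minute deadline, which the source does not specify.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical projection of a persisted NodeDeploymentState row.
public sealed record NodeStateRow(string Node, string State);

public static class CoordinatorRecovery
{
    // Every persisted row is an expected ACK; rows that have left "Applying"
    // have already answered, so they seed the collected-ACK map.
    public static (HashSet<string> Expected, Dictionary<string, string> Acks) Rebuild(
        IEnumerable<NodeStateRow> rows)
    {
        var list = rows.ToList();
        var expected = list.Select(r => r.Node).ToHashSet();
        var acks = list.Where(r => r.State != "Applying")
                       .ToDictionary(r => r.Node, r => r.State);
        return (expected, acks);
    }

    // Time left on the apply deadline, clamped at zero so an overdue
    // deployment times out immediately after recovery.
    public static TimeSpan RemainingDeadline(DateTime dispatchedAtUtc, TimeSpan deadline, DateTime nowUtc)
    {
        var left = deadline - (nowUtc - dispatchedAtUtc);
        return left > TimeSpan.Zero ? left : TimeSpan.Zero;
    }
}
```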

## ConfigComposer

`SnapshotAndFlattenAsync(db) → ConfigArtifact(byte[], string)` is a pure function of the database state:

1. Reads every live-edit table.
2. Serialises the rows to a stable `byte[]` (deterministic ordering).
3. Computes SHA-256 over the bytes, giving the 64-character hex `RevisionHash`.

The same DB state always yields the same artifact and the same hash. That is what makes the `NoChanges` outcome work: AdminOperations compares the proposed hash with the hash of the last sealed deployment.
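
Steps 1 to 3 and the `NoChanges` check can be sketched with the .NET BCL (`System.Text.Json` and `SHA256.HashData`, .NET 5 or later). `ConfigRow` and `ConfigComposerSketch` are hypothetical, the `ConfigArtifact` record only mirrors the documented `(byte[], string)` shape, and an ordinal sort by table and key stands in for whatever ordering the real `ConfigComposer` uses. Emitting lowercase hex is also an assumption.

```csharp
#nullable enable
using System;
using System.Linq;
using System.Security.Cryptography;
using System.Text.Json;

// Hypothetical flattened row from a live-edit table.
public sealed record ConfigRow(string Table, string Key, string Value);

// Mirrors the documented ConfigArtifact(byte[], string) shape.
public sealed record ConfigArtifact(byte[] Blob, string RevisionHash);

public static class ConfigComposerSketch
{
    public static ConfigArtifact Build(ConfigRow[] rows)
    {
        // Deterministic ordering: ordinal sort by table, then key, so neither the
        // DB's return order nor the machine's culture affects the bytes.
        var ordered = rows.OrderBy(r => r.Table, StringComparer.Ordinal)
                          .ThenBy(r => r.Key, StringComparer.Ordinal)
                          .ToArray();
        byte[] blob = JsonSerializer.SerializeToUtf8Bytes(ordered);

        // SHA-256 yields 32 bytes, i.e. 64 hex characters.
        string hash = Convert.ToHexString(SHA256.HashData(blob)).ToLowerInvariant();
        return new ConfigArtifact(blob, hash);
    }

    // NoChanges: the proposed hash equals the last sealed deployment's hash.
    public static bool IsNoChanges(ConfigArtifact proposed, string? lastSealedHash) =>
        string.Equals(proposed.RevisionHash, lastSealedHash, StringComparison.Ordinal);
}
```

Sorting with `StringComparer.Ordinal` matters: a culture-sensitive comparison could order the same rows differently on machines with different locales, changing the hash without any config change.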

## ServiceLevelCalculator

Pure function `ServiceLevelCalculator.Compute(NodeHealthInputs)`, in `Redundancy/`. It returns the OPC UA `ServiceLevel` byte per the truth table in [Redundancy.md](../Redundancy.md#servicelevel-tiers-part-5-65). It has no side effects, so it is trivially unit-testable.

## DPS topics

| Topic | Publisher | Subscribers |
|---|---|---|
| `deployments` | ConfigPublishCoordinator | DriverHostActor (one per driver node) |
| `deployment-acks` | DriverHostActor | ConfigPublishCoordinator |
| `fleet-status` | FleetStatusBroadcaster | Planned: SignalR bridge (F16) |
| `redundancy-state` | RedundancyStateActor | Planned: per-node ServiceLevel calculation (F10) |

## Tests

`tests/Server/ZB.MOM.WW.OtOpcUa.ControlPlane.Tests/` contains 29 tests covering the coordinator (happy path, timeout, failover recovery), AdminOperations (`StartDeployment` outcomes), AuditWriter (batching, dedup), FleetStatusBroadcaster (heartbeat staleness), RedundancyStateActor (debounce, snapshot), ConfigComposer (purity), and ServiceLevelCalculator (truth table).

Multi-node (cross-ActorSystem) tests live in `tests/Server/ZB.MOM.WW.OtOpcUa.Host.IntegrationTests/`.