docs(v2): Architecture-v2 + Cluster + ControlPlane + Runtime overviews (Task 65)
Four new docs at docs/v2/ giving a single-page tour of each v2 piece:

- Architecture-v2.md: top-level mental model (fused Host + roles + cluster + live-edit)
- Cluster.md: AkkaClusterOptions + IClusterRoleInfo + WithOtOpcUaClusterBootstrap
- ControlPlane.md: 5 admin singletons + DPS topics + deploy flow + failover recovery
- Runtime.md: per-node actor tree + state machines + engine-wiring follow-up map

Each links back to the design doc for depth. Architecture-v2 cross-references the other three + ServiceHosting + Redundancy + security.
127
docs/v2/Architecture-v2.md
Normal file
@@ -0,0 +1,127 @@
# OtOpcUa v2 Architecture

Single-page tour of the v2 layout. For decision history + tradeoffs, see [`2026-05-26-akka-hosting-alignment-design.md`](../plans/2026-05-26-akka-hosting-alignment-design.md).

## Big picture

```
┌─────────────────────────────────────────────┐
│                OtOpcUa.Host                 │  (fused binary)
│                                             │
│   reads OTOPCUA_ROLES env, mounts:          │
│   ┌─────────────────────────────────────┐   │
│   │ admin  → Blazor + auth + control-   │   │
│   │          plane singletons           │   │
│   │ driver → OPC UA endpoint +          │   │
│   │          per-node actors            │   │
│   └─────────────────────────────────────┘   │
└─────────────────────────────────────────────┘
                      │
                      │ joins
                      ▼
┌─────────────────────────────────────────────┐
│              Akka.NET cluster               │
│  (split-brain resolver: keep-oldest, 15s)   │
└─────────────────────────────────────────────┘

shared by every node:   ┌─────────────────┐
                        │ ConfigDb (SQL)  │  live-edit + Deployment artifacts + audit
                        └─────────────────┘
```

The v1 setup was two separate Windows services (`OtOpcUa.Server` + `OtOpcUa.Admin`) talking through the DB. v2 collapses them into one binary with role gating, and adds an Akka cluster so admin singletons can drive deploys and the redundancy story is automatic.

## Project layout

```
src/Core/                              shared abstractions, no Server deps
  ZB.MOM.WW.OtOpcUa.Commons            types + Akka message contracts + interfaces
  ZB.MOM.WW.OtOpcUa.Cluster            HOCON, AkkaClusterOptions, IClusterRoleInfo
  ZB.MOM.WW.OtOpcUa.Configuration      EF Core DbContext + entities

src/Server/                            server-side projects
  ZB.MOM.WW.OtOpcUa.Security           cookie+JWT auth, LDAP, JwtTokenService
  ZB.MOM.WW.OtOpcUa.ControlPlane       admin-role cluster singletons
  ZB.MOM.WW.OtOpcUa.Runtime            driver-role per-node actors
  ZB.MOM.WW.OtOpcUa.OpcUaServer        OPC UA endpoint facade + Phase7Composer
  ZB.MOM.WW.OtOpcUa.AdminUI            Blazor Razor class library
  ZB.MOM.WW.OtOpcUa.Host               fused binary (Program.cs)
```

| Project | Role | Doc |
|---|---|---|
| Cluster | Bootstrap + cluster topology view | [Cluster.md](Cluster.md) |
| ControlPlane | Admin singletons (deploy, audit, fleet, redundancy) | [ControlPlane.md](ControlPlane.md) |
| Runtime | Driver-role actor tree | [Runtime.md](Runtime.md) |
| Security | Cookie+JWT auth, LDAP, /auth/* endpoints | [../security.md](../security.md) |
| OpcUaServer | OPC UA endpoint host + composer | [../OpcUaServer.md](../OpcUaServer.md) |
| Host | Role-gated DI graph + Program.cs | [../ServiceHosting.md](../ServiceHosting.md) |

## Role gating

`Program.cs` reads `OTOPCUA_ROLES` once (per process) and decides what to wire:

```csharp
var roles = RoleParser.Parse(Environment.GetEnvironmentVariable("OTOPCUA_ROLES"));
var hasAdmin = roles.Contains("admin");
var hasDriver = roles.Contains("driver");

builder.Services.AddOtOpcUaConfigDb(builder.Configuration);
builder.Services.AddOtOpcUaCluster(builder.Configuration);

builder.Services.AddAkka("otopcua", (ab, sp) =>
{
    ab.WithOtOpcUaClusterBootstrap(sp);   // HOCON + remote + cluster options
    if (hasAdmin) ab.WithOtOpcUaControlPlaneSingletons();
    if (hasDriver) ab.WithOtOpcUaRuntimeActors();
});

if (hasAdmin)
{
    builder.Services.AddOtOpcUaAuth(builder.Configuration);
    builder.Services.AddAdminUI();
    // SignalR, AdminOpsClient, etc.
}

builder.Services.AddOtOpcUaHealth();
```

There is a **single** ActorSystem. Cluster singletons + per-node actors share it via the `Akka.Hosting` registry. This was a v2 fix (the initial Phase 9 wiring ran two ActorSystems by mistake; see commit `d6fac2d`).
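
To make the first line of the snippet above more concrete, here is a minimal sketch of a parser with comma-split, trim, and dedup semantics. The class name `RoleParserSketch`, the lower-casing, and the first-seen ordering are assumptions for illustration; the project's actual `RoleParser` may behave differently.

```csharp
using System;
using System.Linq;

// Hypothetical stand-in for Cluster.RoleParser; the real parser may differ.
public static class RoleParserSketch
{
    // Splits a comma-separated role list, trims whitespace, drops empty
    // entries, and removes duplicates while keeping first-seen order.
    public static string[] Parse(string? raw)
    {
        if (string.IsNullOrWhiteSpace(raw))
            return Array.Empty<string>();

        return raw
            .Split(',', StringSplitOptions.RemoveEmptyEntries | StringSplitOptions.TrimEntries)
            .Select(r => r.ToLowerInvariant())   // assumption: roles compare case-insensitively
            .Distinct(StringComparer.Ordinal)
            .ToArray();
    }
}
```

With this sketch, an unset or blank variable yields an empty array, so neither `hasAdmin` nor `hasDriver` would be true and the host would mount no role-specific services.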

## Live-edit vs draft/publish

v1 had `ConfigGeneration(Draft|Published)` with every live-edit entity FK'd to a generation. Edits accumulated in a Draft until Publish promoted them.

v2 removes that entirely:

- No `ConfigGeneration` table, no `GenerationId` columns.
- Every live-edit entity has a `RowVersion` (`IsRowVersion()`) for last-write-wins.
- Audit goes to `ConfigEdit` (per-row delta) and `ConfigAuditLog` (event-level).
- Deploys snapshot the *current* DB state into an immutable `Deployment.ArtifactBlob` + its `RevisionHash`. That artifact is what driver nodes apply.

See [ControlPlane.md § Deploy flow](ControlPlane.md#deploy-flow) for the end-to-end dispatch + ACK + seal sequence.

## NodeId

Each cluster member has a `NodeId` derived as `{PublicHostname}:{Port}` of the Akka remote endpoint. `ClusterRoleInfo.LocalNode` + `ConfigPublishCoordinator.DiscoverDriverNodes()` use the same formula so they always agree. The port suffix makes loopback test deployments distinguishable (commit `5cfbe8b`); in production the hostname alone is already unique.
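
The formula is simple enough to sketch. The helper below is hypothetical (the real `NodeId` type is not shown in this doc) and only illustrates why the port suffix keeps two loopback nodes apart.

```csharp
using System;

// Hypothetical helper; the project's NodeId type and validation may differ.
public static class NodeIdSketch
{
    // Builds the "{PublicHostname}:{Port}" identity described above.
    public static string Format(string publicHostname, int port)
    {
        if (string.IsNullOrWhiteSpace(publicHostname))
            throw new ArgumentException("PublicHostname must be set.", nameof(publicHostname));
        if (port is < 1 or > 65535)
            throw new ArgumentOutOfRangeException(nameof(port));

        return $"{publicHostname}:{port}";
    }
}
```

Two test nodes bound to `127.0.0.1` on ports 4053 and 4054 would share a hostname but still get distinct identities, which is the case the port suffix was added for.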

## Health endpoints

| Path | Returns 200 when… |
|---|---|
| `/healthz` | Process is alive (no checks). |
| `/health/ready` | DB reachable + this node is `Up` in the cluster. |
| `/health/active` | This node is the admin role-leader (used by Traefik/HA-LB to pin traffic). |

## What lives where (quick map)

| Concern | Project | Entry point |
|---|---|---|
| Read OTOPCUA_ROLES | `Cluster.RoleParser` | static `Parse(string?)` |
| Cluster lifecycle | `Cluster.WithOtOpcUaClusterBootstrap` | extension on `AkkaConfigurationBuilder` |
| Local node identity | `Cluster.IClusterRoleInfo.LocalNode` | DI singleton |
| Admin singletons | `ControlPlane.WithOtOpcUaControlPlaneSingletons` | extension on `AkkaConfigurationBuilder` |
| Driver actors | `Runtime.WithOtOpcUaRuntimeActors` | extension on `AkkaConfigurationBuilder` |
| Auth pipeline | `Security.AddOtOpcUaAuth` + `MapOtOpcUaAuth` | extensions on `IServiceCollection` / `IEndpointRouteBuilder` |
| OPC UA facade | `OpcUaServer.OpcUaApplicationHost` | runtime host, started by driver-role startup |
| Health endpoints | `Host.Health.AddOtOpcUaHealth` + `MapOtOpcUaHealth` | extensions on `IServiceCollection` / `IEndpointRouteBuilder` |
102
docs/v2/Cluster.md
Normal file
@@ -0,0 +1,102 @@
# OtOpcUa.Cluster

Akka.NET cluster bootstrap + topology view. Used by every other server-side project to talk to the live cluster.

Path: `src/Core/ZB.MOM.WW.OtOpcUa.Cluster/`

## Public surface

| Type | Role |
|---|---|
| `AkkaClusterOptions` | DI-bound options from `appsettings.json::Cluster`. Hostname/Port/PublicHostname/SeedNodes/Roles. |
| `IClusterRoleInfo` (interface in Commons) | Live view of cluster membership + role-leader topology. Thread-safe + event-raising. |
| `ClusterRoleInfo` | Implementation. Subscribes to `ClusterEvent.IMemberEvent` + `RoleLeaderChanged` + `LeaderChanged`. |
| `HoconLoader.LoadBaseConfig()` | Reads the embedded `Resources/akka.conf`. |
| `RoleParser.Parse(string?)` | Parses `OTOPCUA_ROLES` env var into a deduped `string[]`. |
| `ServiceCollectionExtensions.AddOtOpcUaCluster(configuration)` | Binds options + registers `IClusterRoleInfo` singleton. **Does not** start an ActorSystem. |
| `WithOtOpcUaClusterBootstrap(serviceProvider)` | Extension on `AkkaConfigurationBuilder`. Loads embedded HOCON + applies `WithRemoting(...)` + `WithClustering(...)` from options. |

## Bootstrap flow

```csharp
// Program.cs
builder.Services.AddOtOpcUaCluster(builder.Configuration);

builder.Services.AddAkka("otopcua", (ab, sp) =>
{
    ab.WithOtOpcUaClusterBootstrap(sp);   // HOCON + remote + cluster
    // …singletons + node actors layered on
});
```

Order matters: `AddOtOpcUaCluster` must come before `AddAkka` so the options binding has run by the time the `AddAkka` lambda fires. Inside the lambda, `WithOtOpcUaClusterBootstrap` resolves `IOptions<AkkaClusterOptions>` from `sp` and writes them into the Akka builder.

The single ActorSystem this produces is what every other v2 piece runs on. There is no second Akka instance; running two was a Phase 9 bug, consolidated in commit `d6fac2d`.

## Embedded HOCON

`src/Core/ZB.MOM.WW.OtOpcUa.Cluster/Resources/akka.conf` contains:

| Setting | Value | Why |
|---|---|---|
| `akka.actor.provider` | `cluster` | Required for `Cluster.Get(system)` to work. |
| `akka.cluster.split-brain-resolver.active-strategy` | `keep-oldest` | On partition, the side that does not contain the oldest member downs itself. |
| `akka.cluster.split-brain-resolver.stable-after` | `15s` | How long membership must be stable before SBR acts. |
| `akka.cluster.failure-detector.threshold` | `10.0` | Higher than default (8.0) for GC-pause tolerance. |
| `opcua-synchronized-dispatcher.type` | `PinnedDispatcher` | Dedicated thread for `OpcUaPublishActor` so SDK calls stay marshalled. |

The Cluster.Tests project verifies these key values stay correct (`HoconLoaderTests`).

## Configuration

```json
{
  "Cluster": {
    "Hostname": "0.0.0.0",
    "Port": 4053,
    "PublicHostname": "node-a.lan",
    "SeedNodes": ["akka.tcp://otopcua@node-a.lan:4053"],
    "Roles": ["admin", "driver"]
  }
}
```

- `Hostname`: interface to bind. `0.0.0.0` listens on every interface.
- `Port`: TCP port for cluster gossip. Default 4053.
- `PublicHostname`: address advertised in cluster gossip. Must be reachable by every other node.
- `SeedNodes`: where new nodes go to join. List one (or two) stable nodes. First node bootstraps the cluster from its own address.
- `Roles`: free-form tags Akka gossip propagates. v2 uses `admin` + `driver`; per-role wiring in `Program.cs` reads the `OTOPCUA_ROLES` env var, not this list, so the two should stay in sync.
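
Since nothing described here enforces that `Roles` and `OTOPCUA_ROLES` agree, a startup guard is one way to catch drift. The sketch below is hypothetical (the class `RoleSyncCheck` and the case-insensitive comparison are assumptions, not part of the documented public surface); it simply reports roles that appear in only one of the two sources.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical startup guard; not part of the documented Cluster project.
public static class RoleSyncCheck
{
    // Returns the roles present in exactly one of the two sources
    // (OTOPCUA_ROLES env var vs. the Cluster:Roles config list).
    public static IReadOnlyList<string> Mismatches(
        IEnumerable<string> envRoles, IEnumerable<string> configRoles)
    {
        var env = new HashSet<string>(envRoles, StringComparer.OrdinalIgnoreCase);
        var cfg = new HashSet<string>(configRoles, StringComparer.OrdinalIgnoreCase);

        env.SymmetricExceptWith(cfg);   // keep elements found in only one set
        return env.OrderBy(r => r, StringComparer.OrdinalIgnoreCase).ToList();
    }
}
```

A host could log a warning, or refuse to start, when the returned list is non-empty; which of those is appropriate is a deployment decision this doc does not settle.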

## IClusterRoleInfo

Anywhere in the host that needs the local node's identity or a view of who-else-is-in-the-cluster, inject `IClusterRoleInfo`:

```csharp
public sealed class MyService
{
    private readonly IClusterRoleInfo _cluster;

    public MyService(IClusterRoleInfo cluster)
    {
        _cluster = cluster;

        // Log every role-leader hand-off.
        _cluster.RoleLeaderChanged += (_, e) =>
            Console.WriteLine($"role={e.Role}: {e.PreviousLeader} → {e.NewLeader}");
    }

    public NodeId Self => _cluster.LocalNode;
    public IReadOnlyList<NodeId> Drivers => _cluster.MembersWithRole("driver");
    public NodeId? AdminLeader => _cluster.RoleLeader("admin");
}
```

`LocalNode` is `{PublicHostname}:{Port}` (the port suffix lets loopback test deployments stay distinct; production hostnames are already unique). `ConfigPublishCoordinator` uses the same `{host}:{port}` formula so the expected-ack set and the driver self-identification agree (commit `5cfbe8b`).

## Lifecycle

Akka.Hosting owns the lifecycle: `IHostedService` starts the ActorSystem at host start, runs `CoordinatedShutdown.ClusterLeavingReason` on host stop. The Cluster project does not register its own `IHostedService` (the v1 `AkkaHostedService` was deleted in commit `d6fac2d`).

## Tests

`tests/Core/ZB.MOM.WW.OtOpcUa.Cluster.Tests/` covers:

- `HoconLoaderTests`: embedded resource loads + key settings parse correctly.
- `RoleParserTests`: comma-split + dedup + trim semantics.

Cross-project integration is in `tests/Server/ZB.MOM.WW.OtOpcUa.Host.IntegrationTests/` (cluster formation, deploy round-trip).
99
docs/v2/ControlPlane.md
Normal file
@@ -0,0 +1,99 @@
# OtOpcUa.ControlPlane

Five admin-role cluster singletons that drive the v2 deploy, audit, fleet, and redundancy stories. Path: `src/Server/ZB.MOM.WW.OtOpcUa.ControlPlane/`.

## Singletons

| Actor | File | Marker key | Role |
|---|---|---|---|
| `ConfigPublishCoordinator` | `Coordinators/ConfigPublishCoordinator.cs` | `ConfigPublishCoordinatorKey` | Dispatches `DispatchDeployment`, collects `ApplyAck`s, seals/fails/times-out. |
| `AdminOperationsActor` | `AdminOperations/AdminOperationsActor.cs` | `AdminOperationsActorKey` | Receives `StartDeployment` from the UI, snapshots ConfigDb via `ConfigComposer`, persists `Deployment` row + `ConfigEdit` marker, tells the coordinator to dispatch. |
| `AuditWriterActor` | `Audit/AuditWriterActor.cs` | `AuditWriterActorKey` | Batched `ConfigAuditLog` writer. Flushes every 500 events or 5 s. In-buffer dedup; cross-restart dedup tracked as F3. |
| `FleetStatusBroadcaster` | `Fleet/FleetStatusBroadcaster.cs` | `FleetStatusBroadcasterKey` | Aggregates per-node `FleetNodeStatus` heartbeats; publishes `FleetStatusChanged` on the `fleet-status` DPS topic (SignalR bridge tracked as F16). |
| `RedundancyStateActor` | `Redundancy/RedundancyStateActor.cs` | `RedundancyStateActorKey` | Cluster-event subscriber; debounces 250 ms; publishes `RedundancyStateChanged` on the `redundancy-state` DPS topic. |

All five register via `WithOtOpcUaControlPlaneSingletons()` (extension on `AkkaConfigurationBuilder`). Each uses `ClusterSingletonOptions { Role = "admin" }`, so the singleton runs only on admin-role nodes (Akka hosts a cluster singleton on the oldest member with the role) and migrates to another admin node on failover.

```csharp
// Program.cs (admin role only)
builder.Services.AddAkka("otopcua", (ab, sp) =>
{
    ab.WithOtOpcUaClusterBootstrap(sp);
    if (hasAdmin) ab.WithOtOpcUaControlPlaneSingletons();
    if (hasDriver) ab.WithOtOpcUaRuntimeActors();
});
```

Resolve from anywhere via `IRequiredActor<T>` or the `ActorRegistry`:

```csharp
public sealed class AdminOperationsClient(ActorRegistry registry) : IAdminOperationsClient
{
    private readonly IActorRef _proxy = registry.Get<AdminOperationsActorKey>();
    // ...
}
```

## Deploy flow

```
UI → IAdminOperationsClient.StartDeploymentAsync(createdBy)
  │   Ask the AdminOperationsActor singleton proxy
  ▼
AdminOperationsActor
  │   ConfigComposer.SnapshotAndFlattenAsync(db) → ConfigArtifact(blob, revHash)
  │   insert Deployment(Dispatching) + ConfigEdit marker
  │   Tell coordinator → DispatchDeployment
  ▼
ConfigPublishCoordinator
  │   DiscoverDriverNodes() → expected ACK set (host:port per member)
  │   insert NodeDeploymentState(Applying) per driver
  │   Publish DispatchDeployment on "deployments" topic
  │   Start apply-deadline timer (2 min default)
  ▼ DistributedPubSub
DriverHostActor (on each driver node, subscribed to "deployments")
  │   PreStart subscribed; current state Steady(rev)
  │   if currentRev == msg.rev → immediate ApplyAck(Applied) (idempotent)
  │   else Become(Applying) → write NodeDeploymentStatus → ApplyAck
  ▼ via "deployment-acks" topic
ConfigPublishCoordinator (subscribed to "deployment-acks" in PreStart)
  │   PersistNodeAck + collect
  │   all-Applied → Sealed
  │   any-Failed  → PartiallyFailed
  │   deadline    → TimedOut
```

The dedicated `deployment-acks` topic + coordinator subscription was added in commit `5cfbe8b`. Before that, ACKs were published back on `deployments` and the coordinator (not subscribed) silently dropped them, so deployments hung at `AwaitingApplyAcks` forever in multi-node tests.
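
The last step of the flow (all-Applied, any-Failed, deadline) can be modelled without any actors. The sketch below is an assumption-laden illustration: the type names, the `Pending` state, and the choice to report `PartiallyFailed` as soon as one failure arrives are not taken from the coordinator's source.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public enum AckResult { Applied, Failed }
public enum DeployOutcome { Pending, Sealed, PartiallyFailed, TimedOut }

// Hypothetical, actor-free model of the coordinator's ACK bookkeeping.
public sealed class AckTracker
{
    private readonly HashSet<string> _expected;
    private readonly Dictionary<string, AckResult> _acks = new();

    public AckTracker(IEnumerable<string> expectedNodeIds) =>
        _expected = new HashSet<string>(expectedNodeIds);

    // Records an ACK; ACKs from nodes outside the expected set are ignored.
    public void Record(string nodeId, AckResult result)
    {
        if (_expected.Contains(nodeId))
            _acks[nodeId] = result;
    }

    // deadlineElapsed would be set when the apply-deadline timer fires.
    public DeployOutcome Evaluate(bool deadlineElapsed)
    {
        if (_acks.Values.Any(r => r == AckResult.Failed))
            return DeployOutcome.PartiallyFailed;   // assumption: fail fast on first failure
        if (_acks.Count == _expected.Count)
            return DeployOutcome.Sealed;
        return deadlineElapsed ? DeployOutcome.TimedOut : DeployOutcome.Pending;
    }
}
```

Keying both sets by the same `host:port` string is what makes the comparison work, which is why the coordinator and the driver nodes have to derive node identity the same way.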

### Failover recovery

If the admin singleton fails over mid-deploy, the new instance's `PreStart` queries `NodeDeploymentState` for any `Dispatching`/`AwaitingApplyAcks` row, rebuilds `_expectedAcks` + `_acks` from persisted state, and resumes the deadline timer. See `Coordinators/ConfigPublishCoordinator.cs::PreStart`.

## ConfigComposer

Pure function `SnapshotAndFlattenAsync(db) → ConfigArtifact(byte[], string)`:

1. Reads every live-edit table.
2. Serialises to a stable byte[] (deterministic ordering).
3. Computes SHA-256 over the bytes → 64-hex `RevisionHash`.

Same DB state → same artifact + same hash. That's what makes the `NoChanges` outcome work (AdminOperations compares the proposed hash to the last sealed deployment's hash).
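
Step 3 maps directly onto the .NET base class library. The sketch below covers only the hashing step; the class name is hypothetical and the lowercase hex casing is an assumption, since the doc only states that the hash is 64 hex characters.

```csharp
using System;
using System.Security.Cryptography;

// Hypothetical illustration of the hashing step only; serialisation is not shown.
public static class RevisionHashSketch
{
    // SHA-256 over the artifact bytes, rendered as 64 hex characters.
    public static string Compute(byte[] artifactBlob)
    {
        byte[] digest = SHA256.HashData(artifactBlob);
        return Convert.ToHexString(digest).ToLowerInvariant(); // casing is an assumption
    }
}
```

Because the digest depends only on the bytes, the determinism guarantee rests entirely on step 2: any unstable ordering in the serialised blob would change the hash and defeat the `NoChanges` comparison.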

## ServiceLevelCalculator

Pure function exposed at `Redundancy/ServiceLevelCalculator.Compute(NodeHealthInputs)`. Returns the OPC UA `ServiceLevel` byte per the truth table in [Redundancy.md](../Redundancy.md#servicelevel-tiers-part-5-65). No side effects; trivially unit-testable.

## DPS topics

| Topic | Publisher | Subscribers |
|---|---|---|
| `deployments` | ConfigPublishCoordinator | DriverHostActor (per-node) |
| `deployment-acks` | DriverHostActor | ConfigPublishCoordinator |
| `fleet-status` | FleetStatusBroadcaster | (SignalR bridge: F16) |
| `redundancy-state` | RedundancyStateActor | (per-node ServiceLevel calc: F10) |

## Tests

`tests/Server/ZB.MOM.WW.OtOpcUa.ControlPlane.Tests/`: 29 tests covering coordinator (happy path, timeout, failover recovery), AdminOps (StartDeployment outcomes), AuditWriter (batching, dedup), FleetStatusBroadcaster (heartbeat staleness), RedundancyStateActor (debounce, snapshot), ConfigComposer (purity), ServiceLevelCalculator (truth table).

Multi-node tests (cross-ActorSystem) are in `tests/Server/ZB.MOM.WW.OtOpcUa.Host.IntegrationTests/`.
126
docs/v2/Runtime.md
Normal file
@@ -0,0 +1,126 @@
# OtOpcUa.Runtime

Driver-role actor tree, one set per node. Path: `src/Server/ZB.MOM.WW.OtOpcUa.Runtime/`.

## Actor tree

```
DriverHostActor (per node)
  │      state machine: Steady ⇄ Applying ⇄ Stale
  │
  ├──▶ DriverInstanceActor (per configured DriverInstance row)
  │      state: Connecting → Connected → Reconnecting (or Stubbed)
  │
  ├──▶ VirtualTagActor (per VirtualTag row)
  │      compiles + evaluates expression, publishes derived value
  │
  ├──▶ ScriptedAlarmActor (per ScriptedAlarm row)
  │      state: Inactive ⇄ Active ⇄ Acknowledged
  │
  ├──▶ OpcUaPublishActor (per node, pinned dispatcher)
  │      marshalled OPC UA SDK writes + RebuildAddressSpace
  │
  ├──▶ HistorianAdapterActor (per node)
  │      pipe IPC to Wonderware historian sidecar
  │
  ├──▶ PeerOpcUaProbeActor (per peer node)
  │      opc.tcp ping → redundancy-state DPS topic
  │
  └──▶ DbHealthProbeActor (per node)
         cached SELECT 1; consumed by /health/ready + redundancy calc
```

## Public surface

| Type | File |
|---|---|
| `WithOtOpcUaRuntimeActors()` | `ServiceCollectionExtensions.cs`: extension on `AkkaConfigurationBuilder`. Spawns `DriverHostActor` + `DbHealthProbeActor` on the host's ActorSystem. |
| `DriverHostActor` | `Drivers/DriverHostActor.cs` |
| `DriverInstanceActor` | `Drivers/DriverInstanceActor.cs` |
| `VirtualTagActor` | `VirtualTags/VirtualTagActor.cs` |
| `ScriptedAlarmActor` | `ScriptedAlarms/ScriptedAlarmActor.cs` |
| `OpcUaPublishActor` | `OpcUa/OpcUaPublishActor.cs` |
| `HistorianAdapterActor` | `Historian/HistorianAdapterActor.cs` |
| `PeerOpcUaProbeActor` | `Health/PeerOpcUaProbeActor.cs` |
| `DbHealthProbeActor` | `Health/DbHealthProbeActor.cs` |

Marker keys for registry lookup: `DriverHostActorKey`, `DbHealthProbeActorKey`.

## DriverHostActor

Per-node supervisor with three Become states:

| State | Meaning |
|---|---|
| `Steady(rev)` | Caught up. `DispatchDeployment` with `msg.rev == currentRev` → immediate `ApplyAck(Applied)` (idempotent). New rev → `Become(Applying)`. |
| `Applying(id)` | Apply in progress. Further `DispatchDeployment` for in-flight ID → debug-log + ignore. For new ID → defer via `Self.Forward`. |
| `Stale` | ConfigDb unreachable on bootstrap. Periodic `RetryConfigDbConnection` tries to advance to `Steady`. |
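
The `Steady` and `Applying` rows of the table reduce to two small decisions. The sketch below models them as pure functions; the enum, the class name, and the use of `Guid` for the deployment ID are assumptions, and the `Stale` state is left out.

```csharp
using System;

public enum DispatchAction { AckAppliedImmediately, StartApplying, Ignore, Defer }

// Hypothetical pure model of how a DispatchDeployment is handled per state.
public static class DispatchDecision
{
    // Steady: a matching revision is acknowledged at once, a new one is applied.
    public static DispatchAction InSteady(string currentRev, string incomingRev) =>
        string.Equals(currentRev, incomingRev, StringComparison.Ordinal)
            ? DispatchAction.AckAppliedImmediately
            : DispatchAction.StartApplying;

    // Applying: a repeat of the in-flight deployment is ignored, any other is deferred.
    public static DispatchAction InApplying(Guid inFlightId, Guid incomingId) =>
        inFlightId == incomingId ? DispatchAction.Ignore : DispatchAction.Defer;
}
```

Keeping the decision separate from the actor makes the idempotency rule easy to unit-test without spinning up an ActorSystem.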

`PreStart`:

1. Subscribe to `deployments` DPS topic.
2. Read most-recent `NodeDeploymentState` for this node from ConfigDb.
3. If `Applied` → restore `_currentRevision`, `Become(Steady)`.
4. If `Applying` (orphan from crash) → replay apply (idempotent).
5. If `Failed` → `Become(Steady)` at last known rev.
6. DB unreachable → `Become(Stale)`, start retry timer.
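
Steps 3 to 6 amount to a lookup from the persisted status to a startup action. The sketch below is a hypothetical model of that mapping; the enum names are invented for illustration, and the handling of a node with no persisted row (the `null` case) is an assumption because the list above does not cover it.

```csharp
using System;

public enum PersistedStatus { Applied, Applying, Failed }
public enum StartupAction { BecomeSteady, ReplayApply, BecomeStale }

// Hypothetical pure model of the PreStart recovery decision.
public static class StartupDecision
{
    // dbReachable = false models ConfigDb being unavailable at bootstrap.
    public static StartupAction Decide(bool dbReachable, PersistedStatus? lastStatus)
    {
        if (!dbReachable)
            return StartupAction.BecomeStale;

        return lastStatus switch
        {
            PersistedStatus.Applied  => StartupAction.BecomeSteady,
            PersistedStatus.Applying => StartupAction.ReplayApply,   // orphan from a crash
            PersistedStatus.Failed   => StartupAction.BecomeSteady,  // at last known rev
            null                     => StartupAction.BecomeSteady,  // assumption: fresh node
            _ => throw new ArgumentOutOfRangeException(nameof(lastStatus)),
        };
    }
}
```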

ACK publishing: when no `_coordinatorOverride` is set (production), `SendAck` publishes on the dedicated `deployment-acks` DPS topic which the coordinator subscribes to (commit `5cfbe8b`).

## DriverInstanceActor

Per-driver-instance child. State machine:

- `Connecting` → first attempt to reach the underlying driver
- `Connected` → subscriptions active, reads/writes flow
- `Reconnecting` → temporary disconnect; backoff retry
- `Stubbed` → DEV-STUB mode for Windows-only drivers (Galaxy, Wonderware Historian) on non-Windows or when `roles` contains `dev`

`ShouldStub(driverType, roles)` returns `true` for `"Galaxy" | "Historian.Wonderware"` on non-Windows; the actor goes straight to `Stubbed` and returns deterministic success without touching real hardware. Wiring this into the DriverHost child-spawn path is follow-up F20 (folds into F7).
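
A minimal sketch of that rule follows. It is not the project's implementation: the extra `isWindows` parameter is added here so the rule can be exercised on any OS, and applying the `dev` role only to the two Windows-only driver types is an assumption drawn from the `Stubbed` bullet above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical version of the stub rule; the real ShouldStub takes no isWindows flag.
public static class StubPolicySketch
{
    private static readonly HashSet<string> WindowsOnlyDrivers =
        new(StringComparer.Ordinal) { "Galaxy", "Historian.Wonderware" };

    // Production code would pass OperatingSystem.IsWindows() for isWindows.
    public static bool ShouldStub(string driverType, IEnumerable<string> roles, bool isWindows)
    {
        if (!WindowsOnlyDrivers.Contains(driverType))
            return false;                          // other drivers are never stubbed here

        return !isWindows || roles.Contains("dev");
    }
}
```

Passing the platform in as a value keeps the function pure, so both branches can be covered from a single test run.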

Engine wiring (subscription publishing, ApplyDelta diff, bad-quality-on-disconnect, write path, supervisor backoff) is stubbed and tracked as F7. Tests exercise message contracts, not engine behaviour.

## VirtualTagActor / ScriptedAlarmActor

Skeleton state machines + message handlers. Engine work:

- `VirtualTagEngine.Evaluate()` not yet called from `VirtualTagActor.DependencyValueChanged` (F8).
- `AlarmConditionService` not yet called from `ScriptedAlarmActor` (F9).
- `ScriptedAlarmState` DB persistence on `PreRestart` not wired (F9).

## OpcUaPublishActor

The only actor on the **pinned dispatcher** (`opcua-synchronized-dispatcher` from `akka.conf`). All OPC UA SDK address-space writes go through it so the SDK's threading model isn't violated.

Message contracts are defined; actual SDK calls are stubbed (counters only). Real address-space writes + `ServiceLevel` Variable updates + `RebuildAddressSpace` after a deploy land in F10 (gated on F13, the full `OpcUaApplicationHost` extraction).

## HistorianAdapterActor, PeerOpcUaProbeActor

Both have message contracts wired. Engine integration deferred:

- `HistorianAdapterActor`: named-pipe IPC to the Wonderware historian sidecar + `SqliteStoreAndForwardSink` (F11).
- `PeerOpcUaProbeActor`: real `opc.tcp://peer:4840` ping (F12). Current stub always returns `Ok=true`.

## DbHealthProbeActor

`Ask<DbHealthStatus>` returns cached state (refreshed every 5 s by an internal `SELECT 1`). Consumed by `/health/ready` and `RedundancyStateActor`.

## Lifecycle wiring

```csharp
// Program.cs (driver role only)
builder.Services.AddAkka("otopcua", (ab, sp) =>
{
    ab.WithOtOpcUaClusterBootstrap(sp);
    if (hasAdmin) ab.WithOtOpcUaControlPlaneSingletons();
    if (hasDriver) ab.WithOtOpcUaRuntimeActors();
});
```

`WithOtOpcUaRuntimeActors` resolves `IDbContextFactory<OtOpcUaConfigDbContext>` + `IClusterRoleInfo` from DI, then spawns `DbHealthProbeActor` and `DriverHostActor` as top-level `/user/` actors. Both register marker keys in `ActorRegistry` so the registry lookup works from anywhere.

## Tests

`tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests/`: 16 tests covering DriverHostActor (Steady ack, Applying transitions, Stale recovery), DriverInstanceActor (state machine, stub mode), VirtualTagActor + ScriptedAlarmActor (message contracts), OpcUaPublishActor (props + message acceptance), DbHealthProbe + PeerOpcUaProbe (probe loop), and the `WithOtOpcUaRuntimeActors` registration round-trip.

End-to-end deploy from admin → driver via the cluster is in `tests/Server/ZB.MOM.WW.OtOpcUa.Host.IntegrationTests/DeployHappyPathTests.cs`.