From 1689901c0e5d799b97779e3b9fe70fa48883c35c Mon Sep 17 00:00:00 2001 From: Joseph Doherty Date: Tue, 26 May 2026 06:41:48 -0400 Subject: [PATCH] docs(v2): Architecture-v2 + Cluster + ControlPlane + Runtime overviews (Task 65) Four new docs at docs/v2/ giving a single-page tour of each v2 piece: - Architecture-v2.md: top-level mental model (fused Host + roles + cluster + live-edit) - Cluster.md: AkkaClusterOptions + IClusterRoleInfo + WithOtOpcUaClusterBootstrap - ControlPlane.md: 5 admin singletons + DPS topics + deploy flow + failover recovery - Runtime.md: per-node actor tree + state machines + engine-wiring follow-up map Each links back to the design doc for depth. Architecture-v2 cross-references the other three + ServiceHosting + Redundancy + security. --- docs/v2/Architecture-v2.md | 127 +++++++++++++++++++++++++++++++++++++ docs/v2/Cluster.md | 102 +++++++++++++++++++++++++++++ docs/v2/ControlPlane.md | 99 +++++++++++++++++++++++++++++ docs/v2/Runtime.md | 126 ++++++++++++++++++++++++++++++++++++ 4 files changed, 454 insertions(+) create mode 100644 docs/v2/Architecture-v2.md create mode 100644 docs/v2/Cluster.md create mode 100644 docs/v2/ControlPlane.md create mode 100644 docs/v2/Runtime.md diff --git a/docs/v2/Architecture-v2.md b/docs/v2/Architecture-v2.md new file mode 100644 index 0000000..1f35833 --- /dev/null +++ b/docs/v2/Architecture-v2.md @@ -0,0 +1,127 @@ +# OtOpcUa v2 Architecture + +Single-page tour of the v2 layout. For decision history + tradeoffs, see [`2026-05-26-akka-hosting-alignment-design.md`](../plans/2026-05-26-akka-hosting-alignment-design.md). 
+ +## Big picture + +``` + ┌─────────────────────────────────────────────┐ + │ OtOpcUa.Host │ (fused binary) + │ │ + │ reads OTOPCUA_ROLES env, mounts: │ + │ ┌─────────────────────────────────────┐ │ + │ │ admin → Blazor + auth + control- │ │ + │ │ plane singletons │ │ + │ │ driver → OPC UA endpoint + │ │ + │ │ per-node actors │ │ + │ └─────────────────────────────────────┘ │ + └─────────────────────────────────────────────┘ + │ + │ joins + ▼ + ┌─────────────────────────────────────────────┐ + │ Akka.NET cluster │ + │ (split-brain resolver: keep-oldest, 15s) │ + └─────────────────────────────────────────────┘ + +shared by every node: ┌─────────────────┐ + │ ConfigDb (SQL) │ live-edit + Deployment artifacts + audit + └─────────────────┘ +``` + +The v1 setup was two separate Windows services (`OtOpcUa.Server` + `OtOpcUa.Admin`) talking through the DB. v2 collapses them into one binary with role gating, and adds an Akka cluster so admin singletons can drive deploys and the redundancy story is automatic. 
+ +## Project layout + +``` +src/Core/ shared abstractions, no Server deps + ZB.MOM.WW.OtOpcUa.Commons types + Akka message contracts + interfaces + ZB.MOM.WW.OtOpcUa.Cluster HOCON, AkkaClusterOptions, IClusterRoleInfo + ZB.MOM.WW.OtOpcUa.Configuration EF Core DbContext + entities + +src/Server/ server-side projects + ZB.MOM.WW.OtOpcUa.Security cookie+JWT auth, LDAP, JwtTokenService + ZB.MOM.WW.OtOpcUa.ControlPlane admin-role cluster singletons + ZB.MOM.WW.OtOpcUa.Runtime driver-role per-node actors + ZB.MOM.WW.OtOpcUa.OpcUaServer OPC UA endpoint facade + Phase7Composer + ZB.MOM.WW.OtOpcUa.AdminUI Blazor Razor class library + ZB.MOM.WW.OtOpcUa.Host fused binary (Program.cs) +``` + +| Project | Role | Doc | +|---|---|---| +| Cluster | Bootstrap + cluster topology view | [Cluster.md](Cluster.md) | +| ControlPlane | Admin singletons (deploy, audit, fleet, redundancy) | [ControlPlane.md](ControlPlane.md) | +| Runtime | Driver-role actor tree | [Runtime.md](Runtime.md) | +| Security | Cookie+JWT auth, LDAP, /auth/* endpoints | [../security.md](../security.md) | +| OpcUaServer | OPC UA endpoint host + composer | [../OpcUaServer.md](../OpcUaServer.md) | +| Host | Role-gated DI graph + Program.cs | [../ServiceHosting.md](../ServiceHosting.md) | + +## Role gating + +`Program.cs` reads `OTOPCUA_ROLES` once (per process) and decides what to wire: + +```csharp +var roles = RoleParser.Parse(Environment.GetEnvironmentVariable("OTOPCUA_ROLES")); +var hasAdmin = roles.Contains("admin"); +var hasDriver = roles.Contains("driver"); + +builder.Services.AddOtOpcUaConfigDb(builder.Configuration); +builder.Services.AddOtOpcUaCluster(builder.Configuration); + +builder.Services.AddAkka("otopcua", (ab, sp) => +{ + ab.WithOtOpcUaClusterBootstrap(sp); // HOCON + remote + cluster options + if (hasAdmin) ab.WithOtOpcUaControlPlaneSingletons(); + if (hasDriver) ab.WithOtOpcUaRuntimeActors(); +}); + +if (hasAdmin) +{ + builder.Services.AddOtOpcUaAuth(builder.Configuration); + 
builder.Services.AddAdminUI(); + // SignalR, AdminOpsClient, etc. +} + +builder.Services.AddOtOpcUaHealth(); +``` + +There is a **single** ActorSystem. Cluster singletons + per-node actors share it via the `Akka.Hosting` registry. This was a v2 fix (the initial Phase 9 wiring ran two ActorSystems by mistake; see commit `d6fac2d`). + +## Live-edit vs draft/publish + +v1 had `ConfigGeneration(Draft|Published)` with every live-edit entity FK'd to a generation. Edits accumulated in a Draft until Publish promoted them. + +v2 removes that entirely: + +- No `ConfigGeneration` table, no `GenerationId` columns. +- Every live-edit entity has a `RowVersion` (`IsRowVersion()`) for last-write-wins. +- Audit goes to `ConfigEdit` (per-row delta) and `ConfigAuditLog` (event-level). +- Deploys snapshot the *current* DB state into an immutable `Deployment.ArtifactBlob` + its `RevisionHash`. That artifact is what driver nodes apply. + +See [ControlPlane.md § Deploy flow](ControlPlane.md#deploy-flow) for the end-to-end dispatch + ACK + seal sequence. + +## NodeId + +Each cluster member has a `NodeId` derived as `{PublicHostname}:{Port}` of the Akka remote endpoint. `ClusterRoleInfo.LocalNode` + `ConfigPublishCoordinator.DiscoverDriverNodes()` use the same formula so they always agree. The port suffix makes loopback test deployments distinguishable (commit `5cfbe8b`); in production the hostname alone is already unique. + +## Health endpoints + +| Path | Returns 200 when… | +|---|---| +| `/healthz` | Process is alive (no checks). | +| `/health/ready` | DB reachable + this node is `Up` in the cluster. | +| `/health/active` | This node is the admin role-leader (used by Traefik/HA-LB to pin traffic). 
| + +## What lives where (quick map) + +| Concern | Project | Entry point | +|---|---|---| +| Read OTOPCUA_ROLES | `Cluster.RoleParser` | static `Parse(string?)` | +| Cluster lifecycle | `Cluster.WithOtOpcUaClusterBootstrap` | extension on `AkkaConfigurationBuilder` | +| Local node identity | `Cluster.IClusterRoleInfo.LocalNode` | DI singleton | +| Admin singletons | `ControlPlane.WithOtOpcUaControlPlaneSingletons` | extension on `AkkaConfigurationBuilder` | +| Driver actors | `Runtime.WithOtOpcUaRuntimeActors` | extension on `AkkaConfigurationBuilder` | +| Auth pipeline | `Security.AddOtOpcUaAuth` + `MapOtOpcUaAuth` | extensions on `IServiceCollection` / `IEndpointRouteBuilder` | +| OPC UA facade | `OpcUaServer.OpcUaApplicationHost` | runtime host, started by driver-role startup | +| Health endpoints | `Host.Health.AddOtOpcUaHealth` + `MapOtOpcUaHealth` | extensions on `IServiceCollection` / `IEndpointRouteBuilder` | diff --git a/docs/v2/Cluster.md b/docs/v2/Cluster.md new file mode 100644 index 0000000..beed2cb --- /dev/null +++ b/docs/v2/Cluster.md @@ -0,0 +1,102 @@ +# OtOpcUa.Cluster + +Akka.NET cluster bootstrap + topology view. Used by every other server-side project to talk to the live cluster. + +Path: `src/Core/ZB.MOM.WW.OtOpcUa.Cluster/` + +## Public surface + +| Type | Role | +|---|---| +| `AkkaClusterOptions` | DI-bound options from `appsettings.json::Cluster`. Hostname/Port/PublicHostname/SeedNodes/Roles. | +| `IClusterRoleInfo` (interface in Commons) | Live view of cluster membership + role-leader topology. Thread-safe + event-raising. | +| `ClusterRoleInfo` | Implementation. Subscribes to `ClusterEvent.IMemberEvent` + `RoleLeaderChanged` + `LeaderChanged`. | +| `HoconLoader.LoadBaseConfig()` | Reads the embedded `Resources/akka.conf`. | +| `RoleParser.Parse(string?)` | Parses `OTOPCUA_ROLES` env var into a deduped `string[]`. | +| `ServiceCollectionExtensions.AddOtOpcUaCluster(configuration)` | Binds options + registers `IClusterRoleInfo` singleton. 
**Does not** start an ActorSystem. | +| `WithOtOpcUaClusterBootstrap(serviceProvider)` | Extension on `AkkaConfigurationBuilder`. Loads embedded HOCON + applies `WithRemoting(...)` + `WithClustering(...)` from options. | + +## Bootstrap flow + +```csharp +// Program.cs +builder.Services.AddOtOpcUaCluster(builder.Configuration); + +builder.Services.AddAkka("otopcua", (ab, sp) => +{ + ab.WithOtOpcUaClusterBootstrap(sp); // HOCON + remote + cluster + // …singletons + node actors layered on +}); +``` + +Order matters: `AddOtOpcUaCluster` must come before `AddAkka` so the options binding has run by the time the `AddAkka` lambda fires. Inside the lambda, `WithOtOpcUaClusterBootstrap` resolves `IOptions<AkkaClusterOptions>` from `sp` and writes the bound values into the Akka builder. + +The single ActorSystem this produces is what every other v2 piece runs on. There is no second Akka instance; that was a Phase 9 bug (commit `d6fac2d` consolidated). + +## Embedded HOCON + +`src/Core/ZB.MOM.WW.OtOpcUa.Cluster/Resources/akka.conf` contains: + +| Setting | Value | Why | +|---|---|---| +| `akka.actor.provider` | `cluster` | Required for `Cluster.Get(system)` to work. | +| `akka.cluster.split-brain-resolver.active-strategy` | `keep-oldest` | The side that does not contain the oldest member downs itself on partition. | +| `akka.cluster.split-brain-resolver.stable-after` | `15s` | How long membership must be stable before SBR acts. | +| `akka.cluster.failure-detector.threshold` | `10.0` | Higher than default (8.0) for GC-pause tolerance. | +| `opcua-synchronized-dispatcher.type` | `PinnedDispatcher` | Dedicated thread for `OpcUaPublishActor` so SDK calls stay marshalled. | + +The Cluster.Tests project verifies these key values stay correct (`HoconLoaderTests`). + +## Configuration + +```json +{ + "Cluster": { + "Hostname": "0.0.0.0", + "Port": 4053, + "PublicHostname": "node-a.lan", + "SeedNodes": ["akka.tcp://otopcua@node-a.lan:4053"], + "Roles": ["admin", "driver"] + } +} +``` + +- `Hostname`: interface to bind. `0.0.0.0` listens on every interface. 
+- `Port`: TCP port for cluster gossip. Default 4053. +- `PublicHostname`: address advertised in cluster gossip. Must be reachable by every other node. +- `SeedNodes`: where new nodes go to join. List one (or two) stable nodes. The first node bootstraps the cluster from its own address. +- `Roles`: free-form tags that Akka gossip propagates. v2 uses `admin` + `driver`. Per-role wiring in `Program.cs` reads the `OTOPCUA_ROLES` env var, not this list, so the two must be kept in sync. + +## IClusterRoleInfo + +Any component in the host that needs the local node's identity or a view of the other cluster members can inject `IClusterRoleInfo`: + +```csharp +public sealed class MyService +{ + private readonly IClusterRoleInfo _cluster; + + public MyService(IClusterRoleInfo cluster) + { + _cluster = cluster; + // Log every role-leader change for the lifetime of the service. + cluster.RoleLeaderChanged += (_, e) => + Console.WriteLine($"role={e.Role}: {e.PreviousLeader} → {e.NewLeader}"); + } + + public NodeId Self => _cluster.LocalNode; + public IReadOnlyList<NodeId> Drivers => _cluster.MembersWithRole("driver"); + public NodeId? AdminLeader => _cluster.RoleLeader("admin"); +} +``` + +`LocalNode` is `{PublicHostname}:{Port}` (the port suffix lets loopback test deployments stay distinct; production hostnames are already unique). `ConfigPublishCoordinator` uses the same `{host}:{port}` formula so the expected-ack set and the driver self-identification agree (commit `5cfbe8b`). + +## Lifecycle + +Akka.Hosting owns the lifecycle: its `IHostedService` starts the ActorSystem at host start and runs `CoordinatedShutdown.ClusterLeavingReason` on host stop. The Cluster project does not register its own `IHostedService` (the v1 `AkkaHostedService` was deleted in commit `d6fac2d`). + +## Tests + +`tests/Core/ZB.MOM.WW.OtOpcUa.Cluster.Tests/` covers: + +- `HoconLoaderTests`: embedded resource loads + key settings parse correctly. +- `RoleParserTests`: comma-split + dedup + trim semantics. + +Cross-project integration is in `tests/Server/ZB.MOM.WW.OtOpcUa.Host.IntegrationTests/` (cluster formation, deploy round-trip). 
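The comma-split, dedup, and trim semantics that `RoleParserTests` covers can be illustrated with a minimal sketch. `RoleParserSketch` is a hypothetical stand-in, not the real `RoleParser`; whether the real parser normalises case or guarantees ordering is not stated in this document, so the version below assumes case-sensitive comparison and keeps the first occurrence of each role.

```csharp
using System;
using System.Linq;

// Hypothetical sketch of OTOPCUA_ROLES parsing; the real RoleParser may differ.
public static class RoleParserSketch
{
    public static string[] Parse(string? raw) =>
        (raw ?? string.Empty)
            // Split on commas, trim each entry, and drop empty or whitespace-only entries.
            .Split(',', StringSplitOptions.RemoveEmptyEntries | StringSplitOptions.TrimEntries)
            // Assumption: ordinal, case-sensitive dedup.
            .Distinct(StringComparer.Ordinal)
            .ToArray();
}
```

Under these assumptions, an unset variable (`null`) yields an empty array, so a process started without `OTOPCUA_ROLES` would mount neither role.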
diff --git a/docs/v2/ControlPlane.md b/docs/v2/ControlPlane.md new file mode 100644 index 0000000..9dbd30b --- /dev/null +++ b/docs/v2/ControlPlane.md @@ -0,0 +1,99 @@ +# OtOpcUa.ControlPlane + +Five admin-role cluster singletons that drive the v2 deploy, audit, fleet, and redundancy stories. Path: `src/Server/ZB.MOM.WW.OtOpcUa.ControlPlane/`. + +## Singletons + +| Actor | File | Marker key | Role | +|---|---|---|---| +| `ConfigPublishCoordinator` | `Coordinators/ConfigPublishCoordinator.cs` | `ConfigPublishCoordinatorKey` | Dispatches `DispatchDeployment`, collects `ApplyAck`s, seals/fails/times-out. | +| `AdminOperationsActor` | `AdminOperations/AdminOperationsActor.cs` | `AdminOperationsActorKey` | Receives `StartDeployment` from the UI, snapshots ConfigDb via `ConfigComposer`, persists `Deployment` row + `ConfigEdit` marker, tells the coordinator to dispatch. | +| `AuditWriterActor` | `Audit/AuditWriterActor.cs` | `AuditWriterActorKey` | Batched `ConfigAuditLog` writer. Flushes every 500 events or 5 s. In-buffer dedup; cross-restart dedup tracked as F3. | +| `FleetStatusBroadcaster` | `Fleet/FleetStatusBroadcaster.cs` | `FleetStatusBroadcasterKey` | Aggregates per-node `FleetNodeStatus` heartbeats; publishes `FleetStatusChanged` on the `fleet-status` DPS topic (SignalR bridge tracked as F16). | +| `RedundancyStateActor` | `Redundancy/RedundancyStateActor.cs` | `RedundancyStateActorKey` | Cluster-event subscriber; debounces 250 ms; publishes `RedundancyStateChanged` on the `redundancy-state` DPS topic. | + +All five register via `WithOtOpcUaControlPlaneSingletons()` (extension on `AkkaConfigurationBuilder`). Each uses `ClusterSingletonOptions { Role = "admin" }` so the singleton runs on the admin role-leader and migrates to the next admin node on failover. 
+ +```csharp +// Program.cs (admin role only) +builder.Services.AddAkka("otopcua", (ab, sp) => +{ + ab.WithOtOpcUaClusterBootstrap(sp); + if (hasAdmin) ab.WithOtOpcUaControlPlaneSingletons(); + if (hasDriver) ab.WithOtOpcUaRuntimeActors(); +}); +``` + +Resolve from anywhere via `IRequiredActor<TKey>` or the `ActorRegistry`, using the marker key from the table above: + +```csharp +public sealed class AdminOperationsClient(ActorRegistry registry) : IAdminOperationsClient +{ + // Look up the singleton proxy by its marker key. + private readonly IActorRef _proxy = registry.Get<AdminOperationsActorKey>(); + // ... +} +``` + +## Deploy flow + +``` +UI → IAdminOperationsClient.StartDeploymentAsync(createdBy) + │ Ask the AdminOperationsActor singleton proxy + ▼ +AdminOperationsActor + │ ConfigComposer.SnapshotAndFlattenAsync(db) → ConfigArtifact(blob, revHash) + │ insert Deployment(Dispatching) + ConfigEdit marker + │ Tell coordinator → DispatchDeployment + ▼ +ConfigPublishCoordinator + │ DiscoverDriverNodes() → expected ACK set (host:port per member) + │ insert NodeDeploymentState(Applying) per driver + │ Publish DispatchDeployment on "deployments" topic + │ Start apply-deadline timer (2 min default) + ▼ DistributedPubSub +DriverHostActor (on each driver node, subscribed to "deployments") + │ PreStart subscribed; current state Steady(rev) + │ if currentRev == msg.rev → immediate ApplyAck(Applied) (idempotent) + │ else Become(Applying) → write NodeDeploymentStatus → ApplyAck + ▼ via "deployment-acks" topic +ConfigPublishCoordinator (subscribed to "deployment-acks" in PreStart) + │ PersistNodeAck + collect + │ all-Applied → Sealed + │ any-Failed → PartiallyFailed + │ deadline → TimedOut +``` + +The dedicated `deployment-acks` topic + coordinator subscription was added in commit `5cfbe8b`. Before that, ACKs were published back on `deployments`; the coordinator was not subscribed to that topic and never received them, so deployments hung at `AwaitingApplyAcks` forever in multi-node tests. 
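The last step of the deploy flow, where the coordinator turns collected ACKs into a final state, can be sketched as a pure function. This is an illustration under stated assumptions, not the coordinator's actual code: `AckStatus`, `DeploymentOutcome`, and `AckReducerSketch` are hypothetical names, and the sketch assumes the coordinator waits for the full expected set before classifying, which the flow diagram does not spell out.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public enum AckStatus { Applied, Failed }
public enum DeploymentOutcome { AwaitingApplyAcks, Sealed, PartiallyFailed, TimedOut }

// Hypothetical reducer over the expected-ACK set (host:port per driver node).
public static class AckReducerSketch
{
    public static DeploymentOutcome Reduce(
        IReadOnlyCollection<string> expectedNodes,
        IReadOnlyDictionary<string, AckStatus> acks,
        bool deadlineElapsed)
    {
        // Only ACKs from expected nodes count; unknown senders are ignored.
        bool allResponded = expectedNodes.All(n => acks.ContainsKey(n));
        if (allResponded)
        {
            bool anyFailed = expectedNodes.Any(n => acks[n] == AckStatus.Failed);
            return anyFailed ? DeploymentOutcome.PartiallyFailed : DeploymentOutcome.Sealed;
        }

        // Assumption: an incomplete set only resolves once the apply deadline fires.
        return deadlineElapsed ? DeploymentOutcome.TimedOut : DeploymentOutcome.AwaitingApplyAcks;
    }
}
```

Keeping the decision free of actor state would make it easy to unit-test alongside the failover path, since the same inputs can be rebuilt from persisted `NodeDeploymentState` rows.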
+ +### Failover recovery + +If the admin singleton fails over mid-deploy, the new instance's `PreStart` queries `NodeDeploymentState` for any `Dispatching`/`AwaitingApplyAcks` row, rebuilds `_expectedAcks` + `_acks` from persisted state, and resumes the deadline timer. See `Coordinators/ConfigPublishCoordinator.cs::PreStart`. + +## ConfigComposer + +Pure function `SnapshotAndFlattenAsync(db) → ConfigArtifact(byte[], string)`: + +1. Reads every live-edit table. +2. Serialises to a stable byte[] (deterministic ordering). +3. Computes SHA-256 over the bytes → 64-hex `RevisionHash`. + +Same DB state → same artifact + same hash. That's what makes the `NoChanges` outcome work (AdminOperations compares the proposed hash to the last sealed deployment's hash). + +## ServiceLevelCalculator + +Pure function exposed at `Redundancy/ServiceLevelCalculator.Compute(NodeHealthInputs)`. Returns the OPC UA `ServiceLevel` byte per the truth table in [Redundancy.md](../Redundancy.md#servicelevel-tiers-part-5-65). No side effects; trivially unit-testable. + +## DPS topics + +| Topic | Publisher | Subscribers | +|---|---|---| +| `deployments` | ConfigPublishCoordinator | DriverHostActor (per-node) | +| `deployment-acks` | DriverHostActor | ConfigPublishCoordinator | +| `fleet-status` | FleetStatusBroadcaster | (SignalR bridge — F16) | +| `redundancy-state` | RedundancyStateActor | (per-node ServiceLevel calc — F10) | + +## Tests + +`tests/Server/ZB.MOM.WW.OtOpcUa.ControlPlane.Tests/` — 29 tests covering coordinator (happy path, timeout, failover recovery), AdminOps (StartDeployment outcomes), AuditWriter (batching, dedup), FleetStatusBroadcaster (heartbeat staleness), RedundancyStateActor (debounce, snapshot), ConfigComposer (purity), ServiceLevelCalculator (truth table). + +Multi-node tests (cross-ActorSystem) are in `tests/Server/ZB.MOM.WW.OtOpcUa.Host.IntegrationTests/`. 
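The purity property that the ConfigComposer tests cover (same DB state, same artifact, same hash) can be sketched as follows. This is a minimal illustration, not the real `ConfigComposer`: the row shape `(Table, Key, Payload)`, the tab-and-newline serialisation, and the lowercase hex encoding are all assumptions chosen only to show deterministic ordering followed by SHA-256.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Security.Cryptography;
using System.Text;

// Hypothetical sketch: deterministic serialisation plus a 64-hex SHA-256 digest.
public static class RevisionHashSketch
{
    public static (byte[] Blob, string RevisionHash) Compose(
        IEnumerable<(string Table, string Key, string Payload)> rows)
    {
        var sb = new StringBuilder();

        // Ordinal ordering makes the output independent of query order and culture.
        foreach (var row in rows
                     .OrderBy(r => r.Table, StringComparer.Ordinal)
                     .ThenBy(r => r.Key, StringComparer.Ordinal))
        {
            sb.Append(row.Table).Append('\t')
              .Append(row.Key).Append('\t')
              .Append(row.Payload).Append('\n');
        }

        byte[] blob = Encoding.UTF8.GetBytes(sb.ToString());

        // SHA-256 yields 32 bytes, which encode to 64 hex characters.
        string hash = Convert.ToHexString(SHA256.HashData(blob)).ToLowerInvariant();
        return (blob, hash);
    }
}
```

With a scheme like this, a `NoChanges` check reduces to a string comparison between the proposed hash and the hash of the last sealed deployment.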
diff --git a/docs/v2/Runtime.md b/docs/v2/Runtime.md new file mode 100644 index 0000000..40bf820 --- /dev/null +++ b/docs/v2/Runtime.md @@ -0,0 +1,126 @@ +# OtOpcUa.Runtime + +Driver-role actor tree — one set per node. Path: `src/Server/ZB.MOM.WW.OtOpcUa.Runtime/`. + +## Actor tree + +``` + DriverHostActor (per node) + │ state machine: Steady ⇄ Applying ⇄ Stale + │ + ├──▶ DriverInstanceActor (per configured DriverInstance row) + │ state: Connecting → Connected → Reconnecting (or Stubbed) + │ + ├──▶ VirtualTagActor (per VirtualTag row) + │ compiles + evaluates expression, publishes derived value + │ + ├──▶ ScriptedAlarmActor (per ScriptedAlarm row) + │ state: Inactive ⇄ Active ⇄ Acknowledged + │ + ├──▶ OpcUaPublishActor (per node, pinned dispatcher) + │ marshalled OPC UA SDK writes + RebuildAddressSpace + │ + ├──▶ HistorianAdapterActor (per node) + │ pipe IPC to Wonderware historian sidecar + │ + ├──▶ PeerOpcUaProbeActor (per peer node) + │ opc.tcp ping → redundancy-state DPS topic + │ + └──▶ DbHealthProbeActor (per node) + cached SELECT 1; consumed by /health/ready + redundancy calc +``` + +## Public surface + +| Type | File | +|---|---| +| `WithOtOpcUaRuntimeActors()` | `ServiceCollectionExtensions.cs` — extension on `AkkaConfigurationBuilder`. Spawns `DriverHostActor` + `DbHealthProbeActor` on the host's ActorSystem. | +| `DriverHostActor` | `Drivers/DriverHostActor.cs` | +| `DriverInstanceActor` | `Drivers/DriverInstanceActor.cs` | +| `VirtualTagActor` | `VirtualTags/VirtualTagActor.cs` | +| `ScriptedAlarmActor` | `ScriptedAlarms/ScriptedAlarmActor.cs` | +| `OpcUaPublishActor` | `OpcUa/OpcUaPublishActor.cs` | +| `HistorianAdapterActor` | `Historian/HistorianAdapterActor.cs` | +| `PeerOpcUaProbeActor` | `Health/PeerOpcUaProbeActor.cs` | +| `DbHealthProbeActor` | `Health/DbHealthProbeActor.cs` | + +Marker keys for registry lookup: `DriverHostActorKey`, `DbHealthProbeActorKey`. 
+ +## DriverHostActor + +Per-node supervisor with three Become states: + +| State | Meaning | +|---|---| +| `Steady(rev)` | Caught up. `DispatchDeployment` with `msg.rev == currentRev` → immediate `ApplyAck(Applied)` (idempotent). New rev → `Become(Applying)`. | +| `Applying(id)` | Apply in progress. Further `DispatchDeployment` for in-flight ID → debug-log + ignore. For new ID → defer via `Self.Forward`. | +| `Stale` | ConfigDb unreachable on bootstrap. Periodic `RetryConfigDbConnection` tries to advance to `Steady`. | + +`PreStart`: + +1. Subscribe to `deployments` DPS topic. +2. Read most-recent `NodeDeploymentState` for this node from ConfigDb. +3. If `Applied` → restore `_currentRevision`, `Become(Steady)`. +4. If `Applying` (orphan from crash) → replay apply (idempotent). +5. If `Failed` → `Become(Steady)` at last known rev. +6. DB unreachable → `Become(Stale)`, start retry timer. + +ACK publishing: when no `_coordinatorOverride` is set (production), `SendAck` publishes on the dedicated `deployment-acks` DPS topic which the coordinator subscribes to (commit `5cfbe8b`). + +## DriverInstanceActor + +Per-driver-instance child. State machine: + +- `Connecting` → first attempt to reach the underlying driver +- `Connected` → subscriptions active, reads/writes flow +- `Reconnecting` → temporary disconnect; backoff retry +- `Stubbed` → DEV-STUB mode for Windows-only drivers (Galaxy, Wonderware Historian) on non-Windows or when `roles` contains `dev` + +`ShouldStub(driverType, roles)` returns `true` for `"Galaxy" | "Historian.Wonderware"` on non-Windows; the actor goes straight to `Stubbed` and returns deterministic success without touching real hardware. Wiring this into the DriverHost child-spawn path is follow-up F20 (folds into F7). + +Engine wiring (subscription publishing, ApplyDelta diff, bad-quality-on-disconnect, write path, supervisor backoff) is stubbed — tracked as F7. Tests exercise message contracts, not engine behaviour. 
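The stub decision described for `ShouldStub(driverType, roles)` can be sketched as below. This is a hedged reading of the section, not the real method: it combines the two statements above (Windows-only driver types are stubbed on non-Windows, or when `roles` contains `dev`), and the three-argument overload with an explicit `isWindows` flag is a hypothetical addition so the logic can be exercised on any platform.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Runtime.InteropServices;

// Hypothetical sketch of the DEV-STUB decision; the real ShouldStub may differ.
public static class StubPolicySketch
{
    // Driver types named in this document as Windows-only.
    private static readonly HashSet<string> WindowsOnlyDrivers =
        new(StringComparer.Ordinal) { "Galaxy", "Historian.Wonderware" };

    // Testable core: the platform is passed in rather than detected.
    public static bool ShouldStub(string driverType, IEnumerable<string> roles, bool isWindows)
    {
        if (!WindowsOnlyDrivers.Contains(driverType))
            return false; // cross-platform drivers are never stubbed in this sketch

        // Assumption: the "dev" role forces stub mode even on Windows.
        return !isWindows || roles.Contains("dev");
    }

    // Convenience overload matching the two-argument shape named in the text.
    public static bool ShouldStub(string driverType, IEnumerable<string> roles) =>
        ShouldStub(driverType, roles, RuntimeInformation.IsOSPlatform(OSPlatform.Windows));
}
```

Any driver type outside the two named here (for example a hypothetical `"Modbus"`) falls through to `false`, so it would follow the normal `Connecting` path.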
+ +## VirtualTagActor / ScriptedAlarmActor + +Skeleton state machines + message handlers. Engine work: + +- `VirtualTagEngine.Evaluate()` not yet called from `VirtualTagActor.DependencyValueChanged` (F8). +- `AlarmConditionService` not yet called from `ScriptedAlarmActor` (F9). +- `ScriptedAlarmState` DB persistence on `PreRestart` not wired (F9). + +## OpcUaPublishActor + +The only actor on the **pinned dispatcher** (`opcua-synchronized-dispatcher` from `akka.conf`). All OPC UA SDK address-space writes go through it so the SDK's threading model isn't violated. + +Message contracts are defined; actual SDK calls are stubbed (counters only). Real address-space writes + `ServiceLevel` Variable updates + `RebuildAddressSpace` after a deploy land in F10 (gated on F13 — full `OpcUaApplicationHost` extraction). + +## HistorianAdapterActor, PeerOpcUaProbeActor + +Both have message contracts wired. Engine integration deferred: + +- `HistorianAdapterActor` — named-pipe IPC to the Wonderware historian sidecar + `SqliteStoreAndForwardSink` (F11). +- `PeerOpcUaProbeActor` — real `opc.tcp://peer:4840` ping (F12). Current stub always returns `Ok=true`. + +## DbHealthProbeActor + +`Ask` returns cached state (refreshed every 5 s by an internal `SELECT 1`). Consumed by `/health/ready` and `RedundancyStateActor`. + +## Lifecycle wiring + +```csharp +// Program.cs (driver role only) +builder.Services.AddAkka("otopcua", (ab, sp) => +{ + ab.WithOtOpcUaClusterBootstrap(sp); + if (hasAdmin) ab.WithOtOpcUaControlPlaneSingletons(); + if (hasDriver) ab.WithOtOpcUaRuntimeActors(); +}); +``` + +`WithOtOpcUaRuntimeActors` resolves `IDbContextFactory` + `IClusterRoleInfo` from DI, then spawns `DbHealthProbeActor` and `DriverHostActor` as top-level `/user/` actors. Both register marker keys in `ActorRegistry` so the registry lookup works from anywhere. 
+ +## Tests + +`tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests/` — 16 tests covering DriverHostActor (Steady ack, Applying transitions, Stale recovery), DriverInstanceActor (state machine, stub mode), VirtualTagActor + ScriptedAlarmActor (message contracts), OpcUaPublishActor (props + message acceptance), DbHealthProbe + PeerOpcUaProbe (probe loop), and the `WithOtOpcUaRuntimeActors` registration round-trip. + +End-to-end deploy from admin → driver via the cluster is in `tests/Server/ZB.MOM.WW.OtOpcUa.Host.IntegrationTests/DeployHappyPathTests.cs`.