Joseph Doherty ab8900eee5 docs(design): per-ClusterId scoping for hub-and-spoke single mesh
Central cluster (2 fused admin+driver nodes) hosts the only UI + deploy
singleton; site clusters (2 driver-only nodes each) join the central mesh
and are logically separated by ClusterId. Each node applies only its own
cluster's drivers + address space on a global deploy. Approved design;
next step is the implementation plan.
2026-06-07 02:50:49 -04:00


Per-ClusterId Scoping (hub-and-spoke single mesh) — Design

Date: 2026-06-07
Status: Approved (brainstorming complete; next step: writing-plans)
Branch: feat/per-cluster-scoping

Goal

Let one central cluster's Admin UI manage and deploy to multiple logically-separate clusters that share a single Akka mesh. The central cluster runs 2 fused admin,driver nodes (the only UI + the only deploy singleton); each site cluster runs 2 driver-only nodes. A single global deploy from the central UI reaches every node, and each node applies only the slice of the configuration that belongs to its own ClusterId — its drivers and its OPC UA address space. Ship global deploy first; per-cluster deploy is a later follow-up.

Why this needs runtime work

The deploy channel is in-mesh: AdminUI → admin-operations singleton → ConfigPublishCoordinator → DistributedPubSub → driver nodes. DistributedPubSub does not cross Akka mesh boundaries, so for the central UI to deploy to site servers the site nodes must join the central mesh. But the runtime currently assumes one Akka mesh == one logical cluster:

  • DriverHostActor.ReconcileDrivers spawns every DriverInstance in the artifact with no cluster filter (DriverHostActor.cs:367). The ClusterId on a spec is used only to label health snapshots.
  • ConfigPublishCoordinator.DiscoverDriverNodes broadcasts to every driver member of the mesh, no ClusterId filter (ConfigPublishCoordinator.cs:248).
  • ConfigComposer.SnapshotAndFlattenAsync snapshots all clusters' rows into one flat artifact, and the address space is built from the entire artifact.

Consequence today: if MAIN, SITE-A, and SITE-B nodes share one mesh, every node spawns every cluster's drivers (the Galaxy driver auto-stubs on Linux, so it would start on every node) and serves a merged address space covering all three clusters. That is why the existing docker-dev rig uses three isolated meshes.

This design adds the missing per-ClusterId scoping so a shared mesh behaves as distinct logical clusters.

Approach (chosen: A — node-side, parse-time filter, ClusterId from the artifact)

Each node resolves its own ClusterId by finding its NodeId (_localNode.Value, format "host:port", e.g. central-1:4053) in the artifact's ClusterNode rows, then filters both the driver specs and the address-space composition to that cluster.

The artifact is a self-contained, consistent snapshot that already includes ClusterNode + DriverInstance + Namespace + UnsArea (all carrying ClusterId), so resolution needs no extra DB query and has no seal-vs-apply inconsistency window. The coordinator stays a single broadcast; every node just applies its own slice.

Alternatives considered

  • B — control-plane per-node artifact slices. ConfigComposer emits a filtered artifact per cluster and the coordinator dispatches the right slice to each node. Rejected: turns one broadcast into per-cluster dispatch (a large change to the deploy/ack model), contradicts "ship global first," and still needs the same transitive ClusterId resolution.
  • C — runtime DB lookup for ClusterId. Node queries ClusterNode by its address at apply time, then filters post-parse. Rejected: extra DB round-trip per node per deploy and a seal-vs-apply inconsistency window; the artifact already contains everything A needs.

Components

1. Self-ClusterId resolution

New helper DeploymentArtifact.ParseClusterScope(blob, nodeId) returning (string? ClusterId, int ClusterCount):

  • ClusterId = the ClusterNode row whose NodeId == nodeId, else null.
  • ClusterCount = number of ServerCluster rows in the artifact.

Both DriverHostActor and OpcUaPublishActor call it with _localNode.Value.
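A minimal Python sketch of ParseClusterScope follows. Python is used for brevity; the project itself is C#. The sketch assumes the artifact blob is JSON with top-level ServerCluster and ClusterNode arrays whose rows expose ClusterId and NodeId. That layout, the snake_case names, and the site NodeId in the example are illustrative assumptions, not the real artifact format.

```python
import json
from typing import NamedTuple, Optional


class ClusterScope(NamedTuple):
    cluster_id: Optional[str]  # None when no ClusterNode row matches this node
    cluster_count: int         # number of ServerCluster rows in the artifact


def parse_cluster_scope(blob: str, node_id: str) -> ClusterScope:
    """Resolve this node's ClusterId purely from the artifact (no DB query).

    Assumed blob layout (hypothetical): JSON with "ServerCluster" and
    "ClusterNode" arrays; ClusterNode rows carry "NodeId" and "ClusterId".
    """
    artifact = json.loads(blob)
    cluster_count = len(artifact.get("ServerCluster", []))
    cluster_id = next(
        (row["ClusterId"] for row in artifact.get("ClusterNode", [])
         if row.get("NodeId") == node_id),
        None,  # miss: the caller applies the fallback rule
    )
    return ClusterScope(cluster_id, cluster_count)


if __name__ == "__main__":
    blob = json.dumps({
        "ServerCluster": [{"ClusterId": "MAIN"}, {"ClusterId": "SITE-A"}],
        "ClusterNode": [
            {"NodeId": "central-1:4053", "ClusterId": "MAIN"},
            {"NodeId": "site-a-1:4053", "ClusterId": "SITE-A"},  # assumed NodeId
        ],
    })
    print(parse_cluster_scope(blob, "central-1:4053"))
    # ClusterScope(cluster_id='MAIN', cluster_count=2)
    print(parse_cluster_scope(blob, "unknown:4053"))
    # ClusterScope(cluster_id=None, cluster_count=2)
```

The lookup reads only the artifact. A node missing from ClusterNode therefore gets (None, n) rather than an exception, and the decision is left to the fallback rule.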

Fallback rule (single source of truth for every filter site):

| Condition | Behavior |
|---|---|
| ClusterCount ≤ 1 | Lenient: no filter. Legacy single-cluster meshes and the entire existing test suite behave exactly as today. |
| ClusterCount > 1, ClusterId resolved | Filter to my cluster. |
| ClusterCount > 1, ClusterId unresolved | Apply nothing and log an error. A node in a multi-cluster mesh with no ClusterNode row is misconfigured, and serving everything would leak other clusters' data. |

The ClusterCount ≤ 1 lenient branch is what protects the existing ~210 v2 tests and any single-cluster deployment from any behavior change.
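The fallback rule could live in one function that both filter sites call, so that DriverHostActor and OpcUaPublishActor can never disagree. The following is a hedged Python sketch; ScopeMode and decide_scope are hypothetical names, not existing types in the codebase.

```python
import logging
from enum import Enum
from typing import Optional

log = logging.getLogger("cluster-scope")


class ScopeMode(Enum):
    LENIENT = "lenient"              # no filter: today's single-cluster behavior
    FILTERED = "filtered"            # keep only rows for the resolved ClusterId
    APPLY_NOTHING = "apply-nothing"  # misconfigured node in a multi-cluster mesh


def decide_scope(cluster_count: int, cluster_id: Optional[str], node_id: str) -> ScopeMode:
    """Single decision point shared by every filter site (drivers and address space)."""
    if cluster_count <= 1:
        # Legacy and single-cluster artifacts: an unresolved ClusterId is irrelevant.
        return ScopeMode.LENIENT
    if cluster_id is not None:
        return ScopeMode.FILTERED
    # Serving everything here would leak other clusters' data, so serve nothing.
    log.error("node %s has no ClusterNode row in a %d-cluster artifact; applying nothing",
              node_id, cluster_count)
    return ScopeMode.APPLY_NOTHING
```

ClusterCount ≤ 1 is checked first. A single-cluster artifact is therefore lenient even when the node has no ClusterNode row, and that ordering is what leaves the existing suites unaffected.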

2. Driver-spawn filter — DriverHostActor

DriverInstanceSpec already carries ClusterId, so in ReconcileDrivers (and the restart RestoreServedState path) apply a one-line predicate over the parsed specs using the fallback rule. In multi-cluster mode, specs with a null ClusterId are excluded + logged (should never occur — ConfigComposer always serializes the column).
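Below is a sketch of the driver-spawn predicate, again in Python and under stated assumptions. DriverInstanceSpec is a minimal stand-in that carries only the two fields the filter reads. drivers_to_spawn is a hypothetical helper that ReconcileDrivers and the RestoreServedState path would both call.

```python
import logging
from dataclasses import dataclass
from typing import Iterable, List, Optional

log = logging.getLogger("driver-host")


@dataclass(frozen=True)
class DriverInstanceSpec:  # stand-in: only the fields the filter needs
    driver_instance_id: str
    cluster_id: Optional[str]  # ConfigComposer always serializes it; None "should never occur"


def drivers_to_spawn(specs: Iterable[DriverInstanceSpec],
                     cluster_count: int,
                     my_cluster_id: Optional[str]) -> List[DriverInstanceSpec]:
    specs = list(specs)
    if cluster_count <= 1:
        return specs  # lenient: spawn everything, exactly as today
    if my_cluster_id is None:
        log.error("unresolved ClusterId in a %d-cluster artifact; spawning no drivers",
                  cluster_count)
        return []
    kept = []
    for spec in specs:
        if spec.cluster_id is None:
            # Excluded and logged rather than guessed at.
            log.error("excluding driver %s: null ClusterId in multi-cluster mode",
                      spec.driver_instance_id)
        elif spec.cluster_id == my_cluster_id:
            kept.append(spec)
    return kept
```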

3. Address-space filter — ParseComposition + OpcUaPublishActor

Add DeploymentArtifact.ParseComposition(blob, clusterId). At parse time the raw artifact entities still carry ClusterId / NamespaceId / UnsAreaId / DriverInstanceId, so build in-cluster id sets from the artifact and filter every projection:

| Projection | Filter predicate |
|---|---|
| UnsAreas | ClusterId == mine (direct) |
| UnsLines | UnsAreaId ∈ myAreas |
| EquipmentNodes | DriverInstanceId ∈ myDrivers |
| DriverInstancePlans | DriverInstanceId ∈ myDrivers |
| GalaxyTags / EquipmentTags | DriverInstanceId ∈ myDrivers |
| ScriptedAlarmPlans | EquipmentId ∈ myEquipment |

OpcUaPublishActor.HandleRebuild resolves myClusterId and calls the filtered parse before Phase7Planner.Compute. _lastApplied becomes the filtered composition, so the incremental diff stays correct across redeploys. The no-arg ParseComposition(blob) is left untouched (legacy / single-cluster path).
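The projection filter in the table above can be sketched in Python as two in-cluster id sets followed by transitive lookups. The input shape is an assumption: a dict of projection rows keyed by projection name, plus raw DriverInstance rows carrying ClusterId. filter_composition is an illustrative name, not the real ParseComposition signature.

```python
from typing import Dict, List, Set

Rows = List[dict]


def filter_composition(artifact: Dict[str, Rows], cluster_id: str) -> Dict[str, Rows]:
    """Return only cluster_id's slice of the address-space projections.

    Assumed input (hypothetical): projection rows keyed by projection name,
    plus raw "DriverInstance" rows that carry ClusterId.
    """
    # Rows that carry ClusterId directly seed the in-cluster id sets.
    areas = [a for a in artifact["UnsAreas"] if a["ClusterId"] == cluster_id]
    my_areas = {a["UnsAreaId"] for a in areas}
    my_drivers = {d["DriverInstanceId"] for d in artifact["DriverInstance"]
                  if d["ClusterId"] == cluster_id}

    def keep(projection: str, key: str, ids: Set[str]) -> Rows:
        # Transitive filter: keep a row only if its foreign key is in-cluster.
        return [row for row in artifact[projection] if row[key] in ids]

    equipment = keep("EquipmentNodes", "DriverInstanceId", my_drivers)
    my_equipment = {e["EquipmentId"] for e in equipment}  # alarms hang off equipment

    return {
        "UnsAreas": areas,
        "UnsLines": keep("UnsLines", "UnsAreaId", my_areas),
        "EquipmentNodes": equipment,
        "DriverInstancePlans": keep("DriverInstancePlans", "DriverInstanceId", my_drivers),
        "GalaxyTags": keep("GalaxyTags", "DriverInstanceId", my_drivers),
        "EquipmentTags": keep("EquipmentTags", "DriverInstanceId", my_drivers),
        "ScriptedAlarmPlans": keep("ScriptedAlarmPlans", "EquipmentId", my_equipment),
    }
```

myEquipment is derived from the already-filtered equipment list. That lets ScriptedAlarmPlans resolve across two hops (alarm → equipment → driver → cluster) without the alarm rows needing a ClusterId of their own.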

4. Deploy ack / convergence

ConfigPublishCoordinator keeps broadcasting to all driver members and waiting for all acks (in the new rig all 6 nodes are driver-role). Each node applies its slice and acks — including a node whose cluster has an empty slice. The one risk: the ack must fire even when the node's plan is empty. Implementation will verify the ack is unconditional and add a small fix if it is currently gated on a non-empty change set. No change to DiscoverDriverNodes.
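The convergence risk can be shown with a toy model. If a node acks only when its plan has changes, the sites with empty slices never ack and the deployment never seals. This Python sketch is hypothetical: DeployTracker and run_deploy are not coordinator code, and real acks arrive asynchronously as actor messages rather than in a synchronous loop.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class DeployTracker:
    """Hypothetical model of the coordinator's wait-for-all-acks step."""
    expected: Set[str]                        # every driver member, all clusters
    acked: Set[str] = field(default_factory=set)

    def on_ack(self, node_id: str) -> None:
        if node_id in self.expected:          # ignore acks from non-members
            self.acked.add(node_id)

    @property
    def sealed(self) -> bool:
        return self.acked == self.expected


def run_deploy(plan_sizes: Dict[str, int], ack_gated_on_changes: bool) -> bool:
    """Broadcast once, let each node ack, and report whether the deploy seals."""
    tracker = DeployTracker(expected=set(plan_sizes))
    for node_id, size in plan_sizes.items():
        # The bug to rule out: skipping the ack when the node's plan is empty.
        if size > 0 or not ack_gated_on_changes:
            tracker.on_ack(node_id)
    return tracker.sealed
```

With two central nodes holding non-empty plans and four empty sites, run_deploy(..., ack_gated_on_changes=True) returns False and the deployment hangs. The unconditional variant returns True.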

5. docker-dev compose + seed rewrite

  • compose: remove admin-a / admin-b / driver-a / driver-b; add central-1 / central-2 (OTOPCUA_ROLES=admin,driver, seed = central-1, OPC UA 4840 / 4841, ASPNETCORE UI on :9000). site-a-1/2, site-b-1/2 become driver-only (OTOPCUA_ROLES=driver, Cluster__Roles__0=driver, seed → central-1, OPC UA 4842-4845), dropping their UI / Jwt / Ldap / DeployApiKey env + Traefik exposure. All nodes share the one ConfigDb.
  • traefik: single PathPrefix(/) router → central-1 / central-2 (sticky cookie); drop the two site routers + services in both docker-compose.yml and traefik-dynamic.yml.
  • seed SQL (seed/seed-clusters.sql): MAIN ClusterNode rows become central-1:4053 / central-2:4053 (replacing driver-a / driver-b); SITE-A / SITE-B keep their ServerCluster + 2 ClusterNode rows but no drivers/tags (empty sites). Update the Notes columns + the file header comments. The Galaxy namespace / driver / tags stay on MAIN (they run on the central fused nodes).
  • compose header + comment blocks: rewrite the topology description (single mesh, hub-and-spoke, central-only UI).
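The target topology can be written down as data and sanity-checked before the rig comes up. This Python sketch rests on several assumptions that are consistent with the bullets above but not taken from the repo:

  • the per-node site ports run from site-a-1 on 4842 through site-b-2 on 4845;
  • the site NodeIds follow the central "host:4053" pattern;
  • check_rig is a hypothetical helper, not part of the compose file or the seed SQL.

```python
from typing import Dict, List, NamedTuple


class Service(NamedTuple):
    name: str
    roles: str       # OTOPCUA_ROLES value
    opcua_port: int
    seed: str        # Akka seed node the service joins


# Target rig after the compose rewrite; site port-per-node mapping is assumed.
RIG = [
    Service("central-1", "admin,driver", 4840, "central-1"),
    Service("central-2", "admin,driver", 4841, "central-1"),
    Service("site-a-1", "driver", 4842, "central-1"),
    Service("site-a-2", "driver", 4843, "central-1"),
    Service("site-b-1", "driver", 4844, "central-1"),
    Service("site-b-2", "driver", 4845, "central-1"),
]

# Seed ClusterNode rows (NodeId -> ClusterId); site NodeIds are assumed.
CLUSTER_NODES = {
    "central-1:4053": "MAIN", "central-2:4053": "MAIN",
    "site-a-1:4053": "SITE-A", "site-a-2:4053": "SITE-A",
    "site-b-1:4053": "SITE-B", "site-b-2:4053": "SITE-B",
}


def check_rig(rig: List[Service], cluster_nodes: Dict[str, str],
              akka_port: int = 4053) -> List[str]:
    """Return human-readable problems; an empty list means the rig looks consistent."""
    problems = []
    if len({s.opcua_port for s in rig}) != len(rig):
        problems.append("OPC UA ports are not unique")
    for s in rig:
        if s.seed != "central-1":
            problems.append(f"{s.name} does not join the central mesh")
        cluster = cluster_nodes.get(f"{s.name}:{akka_port}")
        if cluster is None:
            problems.append(f"{s.name} has no ClusterNode row (would apply nothing)")
        elif "admin" in s.roles.split(",") and cluster != "MAIN":
            problems.append(f"{s.name} runs the admin role outside MAIN")
    return problems
```

Suppose the old driver-a / driver-b rows were left in the seed. The check would then report that central-1 and central-2 have no ClusterNode row. Under the fallback rule, that means the central nodes would apply nothing.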

Data flow (after the change)

  1. Operator clicks Deploy in the central UI (or POST /api/deployments).
  2. admin-operations singleton (on a central node) → ConfigComposer snapshots all clusters' rows into one artifact → ConfigPublishCoordinator broadcasts DispatchDeployment to all driver members.
  3. Each node resolves its own ClusterId from the artifact's ClusterNode rows.
  4. DriverHostActor spawns only its cluster's DriverInstances.
  5. OpcUaPublishActor materialises only its cluster's address space.
  6. Every node acks; the coordinator seals the deployment when all acks arrive.
  7. Result: central :4840/:4841 serve MAIN's Galaxy tree; sites on :4842 to :4845 serve only their own trees (empty until configured).
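Steps 2 through 6 can be simulated end to end in a few lines of Python. Everything here is an illustrative assumption: the JSON artifact layout, the site NodeIds, and the hypothetical DriverInstanceId galaxy-1, which stands in for MAIN's Galaxy driver.

```python
import json
from typing import Dict, List, Tuple

# One global artifact covering all three clusters (hypothetical JSON layout).
ARTIFACT = json.dumps({
    "ServerCluster": [{"ClusterId": c} for c in ("MAIN", "SITE-A", "SITE-B")],
    "ClusterNode": [
        {"NodeId": f"{host}:4053", "ClusterId": cluster}
        for host, cluster in [
            ("central-1", "MAIN"), ("central-2", "MAIN"),
            ("site-a-1", "SITE-A"), ("site-a-2", "SITE-A"),
            ("site-b-1", "SITE-B"), ("site-b-2", "SITE-B"),
        ]
    ],
    "DriverInstance": [{"DriverInstanceId": "galaxy-1", "ClusterId": "MAIN"}],
})


def node_slice(blob: str, node_id: str) -> List[str]:
    """Steps 3-5 on one node: resolve own ClusterId, keep only its drivers."""
    art = json.loads(blob)
    mine = next((r["ClusterId"] for r in art["ClusterNode"] if r["NodeId"] == node_id), None)
    drivers = art["DriverInstance"]
    if len(art["ServerCluster"]) <= 1:
        return [d["DriverInstanceId"] for d in drivers]  # lenient
    if mine is None:
        return []                                       # misconfigured: apply nothing
    return [d["DriverInstanceId"] for d in drivers if d["ClusterId"] == mine]


def global_deploy(blob: str, members: List[str]) -> Tuple[Dict[str, List[str]], bool]:
    """Step 2 broadcasts the same blob to every member; step 6 seals on all acks."""
    slices, acks = {}, set()
    for node_id in members:
        slices[node_id] = node_slice(blob, node_id)
        acks.add(node_id)  # acked even when the slice is empty
    return slices, acks == set(members)
```

Only the central pair ends up with the Galaxy driver. The four site nodes get empty slices and still ack, so the deployment seals.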

Error handling

  • Misconfigured node (multi-cluster mesh, no matching ClusterNode row): applies nothing, logs an error, still acks (so the deploy converges rather than hanging). Surfaced for the operator to add the missing ClusterNode row.
  • Pre-PR / single-cluster artifacts: ClusterCount ≤ 1 → lenient no-filter, identical to current behavior.
  • Empty cluster slice: node applies an empty plan and acks normally.
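On the node, the three error-handling cases share one shape: decide the slice, apply it (or nothing), then ack. The following is a hedged Python sketch. handle_dispatch, the (DriverInstanceId, ClusterId) tuple, and the apply/ack callbacks are hypothetical stand-ins for the actors' real message handling.

```python
import logging
from typing import Callable, List, Optional, Tuple

log = logging.getLogger("deploy-apply")

Spec = Tuple[str, Optional[str]]  # (DriverInstanceId, ClusterId)


def handle_dispatch(node_id: str,
                    cluster_id: Optional[str],
                    cluster_count: int,
                    specs: List[Spec],
                    apply: Callable[[List[Spec]], None],
                    ack: Callable[[str], None]) -> str:
    """Apply this node's slice, then ack; returns which error-handling case ran."""
    if cluster_count <= 1:
        selected, outcome = specs, "legacy-lenient"
    elif cluster_id is None:
        log.error("%s: no ClusterNode row in a multi-cluster mesh; add one for this node",
                  node_id)
        ack(node_id)  # applies nothing, but still acks so the deploy converges
        return "misconfigured"
    else:
        selected = [s for s in specs if s[1] == cluster_id]
        outcome = "filtered" if selected else "empty-slice"
    apply(selected)  # an empty list is a valid, no-op plan
    ack(node_id)     # unconditional for every non-misconfigured case too
    return outcome
```

The misconfigured branch skips apply entirely, which matches "applies nothing". An empty slice, by contrast, still goes through apply as an empty plan.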

Testing

  • Unit: ParseClusterScope (match / miss / count); ParseComposition(blob, clusterId) (cross-cluster projections excluded; transitive resolution for UnsLine / Equipment / Tag / ScriptedAlarm); the driver-spec filter predicate (lenient / strict / unresolved-strict).
  • Integration: a 2-cluster scoping test on the in-process harness — two driver nodes assigned to different ClusterIds, one deploy, assert each spawns only its cluster's drivers and materialises only its cluster's tree.
  • Backward-compat: the existing single-cluster suites must stay green (the ClusterCount ≤ 1 lenient branch guarantees this).
  • Live (docker-dev rig): bring the rig up, sign into the central UI, confirm 3 clusters listed, deploy, confirm :4840 shows the Galaxy tree and :4842/:4844 are empty (not the merged tree).

Classification

High-risk — touches the actor model, the Phase7 data contract, and the deploy path. The implementation plan will be TDD'd section by section.

Out of scope (follow-ups)

  • Per-cluster deploy (deploy just SITE-A from the UI) — global deploy ships first; per-cluster targeting is a later coordinator + UI enhancement.
  • Seeding demo drivers on the sites — sites start empty; drivers are added via the central UI.