docs(design): per-ClusterId scoping for hub-and-spoke single mesh
Central cluster (2 fused admin+driver nodes) hosts the only UI + deploy singleton; site clusters (2 driver-only nodes each) join the central mesh and are logically separated by ClusterId. Each node applies only its own cluster's drivers + address space on a global deploy. Approved design; next step is the implementation plan.
@@ -0,0 +1,193 @@

# Per-ClusterId Scoping (hub-and-spoke single mesh): Design

**Date:** 2026-06-07
**Status:** Approved (brainstorming complete; next step: writing-plans)
**Branch:** `feat/per-cluster-scoping`

## Goal

Let one **central** cluster's Admin UI manage and deploy to multiple
logically separate clusters that share a single Akka mesh. The central cluster
runs 2 fused `admin,driver` nodes (the only UI + the only deploy singleton);
each site cluster runs 2 `driver`-only nodes. A single global deploy from the
central UI reaches every node, and **each node applies only the slice of the
configuration that belongs to its own `ClusterId`**: its drivers and its OPC UA
address space. Ship global deploy first; per-cluster deploy is a later follow-up.

## Why this needs runtime work

The deploy channel is **in-mesh**: AdminUI → `admin-operations` singleton →
`ConfigPublishCoordinator` → DistributedPubSub → driver nodes. DistributedPubSub
does not cross Akka mesh boundaries, so for the central UI to deploy to site
servers the site nodes **must join the central mesh**. But the runtime currently
assumes **one Akka mesh == one logical cluster**:

- `DriverHostActor.ReconcileDrivers` spawns **every** `DriverInstance` in the
  artifact with no cluster filter (`DriverHostActor.cs:367`). The `ClusterId` on
  a spec is used only to *label* health snapshots.
- `ConfigPublishCoordinator.DiscoverDriverNodes` broadcasts to **every** driver
  member of the mesh, with no `ClusterId` filter
  (`ConfigPublishCoordinator.cs:248`).
- `ConfigComposer.SnapshotAndFlattenAsync` snapshots **all** clusters' rows into
  one flat artifact; the address space is built from the whole thing.

Consequence today: put MAIN + SITE-A + SITE-B nodes in one mesh and every node
spawns every cluster's drivers (the Galaxy driver auto-stubs on Linux, so it
*would* start) and serves a **merged** address space of all three clusters. That
is why the existing docker-dev rig uses three isolated meshes.

This design adds the missing per-`ClusterId` scoping so a shared mesh behaves as
distinct logical clusters.

## Approach (chosen: A, node-side parse-time filter, ClusterId from the artifact)

Each node resolves *its own* `ClusterId` by finding its `NodeId`
(`_localNode.Value`, format `"host:port"`, e.g. `central-1:4053`) in the
artifact's `ClusterNode` rows, then filters both the driver specs and the
address-space composition to that cluster.

The artifact is a self-contained, consistent snapshot that already includes
`ClusterNode` + `DriverInstance` + `Namespace` + `UnsArea` (all carrying
`ClusterId`), so resolution needs **no extra DB query** and has no
seal-vs-apply inconsistency window. The coordinator stays a **single broadcast**;
every node just applies its own slice.

### Alternatives considered

- **B: control-plane per-node artifact slices.** `ConfigComposer` emits a
  filtered artifact per cluster and the coordinator dispatches the right slice to
  each node. Rejected: turns one broadcast into per-cluster dispatch (a large
  change to the deploy/ack model), contradicts "ship global first," and still
  needs the same transitive `ClusterId` resolution.
- **C: runtime DB lookup for ClusterId.** Node queries `ClusterNode` by its
  address at apply time, then filters post-parse. Rejected: extra DB round-trip
  per node per deploy and a seal-vs-apply inconsistency window; the artifact
  already contains everything A needs.

## Components

### 1. Self-`ClusterId` resolution

New helper `DeploymentArtifact.ParseClusterScope(blob, nodeId)` returning
`(string? ClusterId, int ClusterCount)`:
- `ClusterId` = the `ClusterId` of the `ClusterNode` row whose
  `NodeId == nodeId`, else `null`.
- `ClusterCount` = number of `ServerCluster` rows in the artifact.

Both `DriverHostActor` and `OpcUaPublishActor` call it with `_localNode.Value`.
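The lookup can be sketched in a few lines. The following is an illustrative Python model, not the project's C# helper: it assumes the artifact blob is JSON with one top-level list of rows per entity type, which this document does not specify, and the site node id `site-a-1:4053` is made up for the example.

```python
import json
from typing import NamedTuple, Optional


class ClusterScope(NamedTuple):
    cluster_id: Optional[str]  # None when no ClusterNode row matches the node
    cluster_count: int         # number of ServerCluster rows in the artifact


def parse_cluster_scope(blob: str, node_id: str) -> ClusterScope:
    """Resolve a node's own ClusterId and count the clusters in the artifact."""
    # Assumption: the blob is JSON shaped like {"ClusterNode": [...], ...}.
    artifact = json.loads(blob)
    cluster_id = next(
        (row["ClusterId"]
         for row in artifact.get("ClusterNode", [])
         if row["NodeId"] == node_id),
        None,  # no matching row: the caller decides what "unresolved" means
    )
    return ClusterScope(cluster_id, len(artifact.get("ServerCluster", [])))


# Hypothetical two-cluster artifact used only for illustration.
EXAMPLE_BLOB = json.dumps({
    "ServerCluster": [{"ClusterId": "MAIN"}, {"ClusterId": "SITE-A"}],
    "ClusterNode": [
        {"NodeId": "central-1:4053", "ClusterId": "MAIN"},
        {"NodeId": "site-a-1:4053", "ClusterId": "SITE-A"},
    ],
})
```

Returning both values from one parse keeps every filter site working from the same snapshot, which is the property the artifact-based approach relies on.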

**Fallback rule (single source of truth for every filter site):**

| Condition | Behavior |
|---|---|
| `ClusterCount ≤ 1` | **Lenient: no filter** (legacy single-cluster meshes + the entire existing test suite behave exactly as today). |
| `ClusterCount > 1` and `ClusterId` resolved | **Filter to my cluster.** |
| `ClusterCount > 1` and `ClusterId` unresolved | **Apply nothing + log error** (a node in a multi-cluster mesh with no `ClusterNode` row is misconfigured; serving everything would leak other clusters' data). |

The `ClusterCount ≤ 1` lenient branch is what protects the existing ~210 v2
tests and any single-cluster deployment from any behavior change.
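The three rows of the fallback table reduce to a small decision function. This is a sketch in Python rather than the project's C#; the `ScopeDecision` enum and `decide_scope` name are hypothetical and exist only to make the branch order explicit.

```python
from enum import Enum
from typing import Optional


class ScopeDecision(Enum):
    NO_FILTER = "no-filter"            # lenient legacy path
    FILTER_TO_CLUSTER = "filter"       # apply only this node's slice
    APPLY_NOTHING = "apply-nothing"    # misconfigured node: apply nothing, log


def decide_scope(cluster_id: Optional[str], cluster_count: int) -> ScopeDecision:
    """Map (ClusterId, ClusterCount) onto the fallback-rule table."""
    # The count check comes first, so a single-cluster artifact never filters,
    # even when the node has no ClusterNode row.
    if cluster_count <= 1:
        return ScopeDecision.NO_FILTER
    if cluster_id is not None:
        return ScopeDecision.FILTER_TO_CLUSTER
    return ScopeDecision.APPLY_NOTHING
```

Checking the count before the resolved id is what keeps single-cluster behavior unchanged regardless of whether the node appears in `ClusterNode`.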

### 2. Driver-spawn filter: `DriverHostActor`

`DriverInstanceSpec` already carries `ClusterId`, so in `ReconcileDrivers` (and
the restart `RestoreServedState` path) apply a one-line predicate over the parsed
specs using the fallback rule. In multi-cluster mode, specs with a `null`
`ClusterId` are excluded + logged (should never occur, since `ConfigComposer`
always serializes the column).
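A minimal sketch of that predicate, again in Python for illustration. The dataclass fields, the `specs_to_spawn` name, the logger name, and the driver ids are assumptions; only the three behaviors (lenient, filtered, apply nothing) come from the fallback rule.

```python
import logging
from dataclasses import dataclass
from typing import List, Optional

log = logging.getLogger("driver_host")  # hypothetical logger name


@dataclass(frozen=True)
class DriverInstanceSpec:
    driver_instance_id: str
    cluster_id: Optional[str]


def specs_to_spawn(
    specs: List[DriverInstanceSpec],
    my_cluster_id: Optional[str],
    cluster_count: int,
) -> List[DriverInstanceSpec]:
    """Return the specs this node should spawn under the fallback rule."""
    if cluster_count <= 1:
        return list(specs)  # lenient: legacy single-cluster behavior
    if my_cluster_id is None:
        log.error("node has no ClusterNode row in a multi-cluster artifact")
        return []  # apply nothing rather than leak other clusters' drivers
    kept = []
    for spec in specs:
        if spec.cluster_id is None:
            # Should never occur; excluded and logged as described above.
            log.warning("spec %s has no ClusterId", spec.driver_instance_id)
            continue
        if spec.cluster_id == my_cluster_id:
            kept.append(spec)
    return kept


# Illustrative specs; the ids are made up.
SPECS = [
    DriverInstanceSpec("galaxy-1", "MAIN"),
    DriverInstanceSpec("site-a-driver", "SITE-A"),
    DriverInstanceSpec("orphan", None),
]
```

For example, `specs_to_spawn(SPECS, "MAIN", 3)` keeps only `galaxy-1`, while the same call with `cluster_count=1` returns all three specs unchanged.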

### 3. Address-space filter: `ParseComposition` + `OpcUaPublishActor`

Add `DeploymentArtifact.ParseComposition(blob, clusterId)`. At parse time the raw
artifact entities still carry `ClusterId` / `NamespaceId` / `UnsAreaId` /
`DriverInstanceId`, so build in-cluster id sets (`myAreas`, `myDrivers`,
`myEquipment`) from the artifact and filter every projection:

| Projection | Filter predicate |
|---|---|
| `UnsAreas` | `ClusterId == mine` (direct) |
| `UnsLines` | `UnsAreaId ∈ myAreas` |
| `EquipmentNodes` | `DriverInstanceId ∈ myDrivers` |
| `DriverInstancePlans` | `DriverInstanceId ∈ myDrivers` |
| `GalaxyTags` / `EquipmentTags` | `DriverInstanceId ∈ myDrivers` |
| `ScriptedAlarmPlans` | `EquipmentId ∈ myEquipment` |

`OpcUaPublishActor.HandleRebuild` resolves `myClusterId` and calls the filtered
parse before `Phase7Planner.Compute`. `_lastApplied` becomes the filtered
composition, so the incremental diff stays correct across redeploys. The existing
`ParseComposition(blob)` overload (no `clusterId`) is left untouched (legacy /
single-cluster path).
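The transitive filtering in the table can be sketched as follows. This Python model assumes the parsed artifact is a dictionary of row lists keyed by projection name, plus a `DriverInstances` list carrying `ClusterId`; those shapes, the id values, and the `filter_composition` name are illustrative assumptions, not the real data contract.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]
Artifact = Dict[str, List[Row]]


def filter_composition(artifact: Artifact, cluster_id: str) -> Artifact:
    """Keep only the rows that belong, directly or transitively, to one cluster."""
    def rows(name: str) -> List[Row]:
        return artifact.get(name, [])

    # Direct membership: these rows carry ClusterId themselves.
    areas = [r for r in rows("UnsAreas") if r["ClusterId"] == cluster_id]
    my_areas = {r["UnsAreaId"] for r in areas}
    my_drivers = {r["DriverInstanceId"] for r in rows("DriverInstances")
                  if r["ClusterId"] == cluster_id}

    def by_driver(name: str) -> List[Row]:
        return [r for r in rows(name) if r["DriverInstanceId"] in my_drivers]

    # Transitive membership: equipment via driver, alarms via equipment.
    equipment = by_driver("EquipmentNodes")
    my_equipment = {r["EquipmentId"] for r in equipment}

    return {
        "UnsAreas": areas,
        "UnsLines": [r for r in rows("UnsLines") if r["UnsAreaId"] in my_areas],
        "EquipmentNodes": equipment,
        "DriverInstancePlans": by_driver("DriverInstancePlans"),
        "GalaxyTags": by_driver("GalaxyTags"),
        "EquipmentTags": by_driver("EquipmentTags"),
        "ScriptedAlarmPlans": [r for r in rows("ScriptedAlarmPlans")
                               if r["EquipmentId"] in my_equipment],
    }


# Hypothetical two-cluster artifact; every id below is made up.
ARTIFACT: Artifact = {
    "DriverInstances": [
        {"DriverInstanceId": "drv-main", "ClusterId": "MAIN"},
        {"DriverInstanceId": "drv-a", "ClusterId": "SITE-A"},
    ],
    "UnsAreas": [
        {"UnsAreaId": "area-main", "ClusterId": "MAIN"},
        {"UnsAreaId": "area-a", "ClusterId": "SITE-A"},
    ],
    "UnsLines": [
        {"UnsLineId": "line-main", "UnsAreaId": "area-main"},
        {"UnsLineId": "line-a", "UnsAreaId": "area-a"},
    ],
    "EquipmentNodes": [
        {"EquipmentId": "eq-main", "DriverInstanceId": "drv-main"},
        {"EquipmentId": "eq-a", "DriverInstanceId": "drv-a"},
    ],
    "ScriptedAlarmPlans": [
        {"AlarmId": "alarm-main", "EquipmentId": "eq-main"},
        {"AlarmId": "alarm-a", "EquipmentId": "eq-a"},
    ],
}
```

Building the id sets once and reusing them keeps each projection's predicate a single membership test, matching the one-predicate-per-row layout of the table.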

### 4. Deploy ack / convergence

`ConfigPublishCoordinator` keeps broadcasting to all driver members and waiting
for all acks (in the new rig all 6 nodes are driver-role). Each node applies its
slice and acks, **including a node whose cluster has an empty slice**. The one
risk: the ack must fire even when the node's plan is empty. Implementation will
**verify the ack is unconditional** and add a small fix if it is currently gated
on a non-empty change set. No change to `DiscoverDriverNodes`.
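The property to verify is small enough to show directly. The sketch below is a Python stand-in for the node-side handler; `DeploymentAck`, `handle_dispatch`, and the callback parameters are hypothetical names, and the real actor messaging is not modelled.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class DeploymentAck:
    deployment_id: str
    node_id: str
    applied_changes: int


def handle_dispatch(
    deployment_id: str,
    node_id: str,
    plan: List[str],
    apply_change: Callable[[str], None],
    send_ack: Callable[[DeploymentAck], None],
) -> None:
    """Apply this node's slice, then ack regardless of how large it was."""
    for change in plan:
        apply_change(change)
    # The ack sits outside any "if plan:" guard, so an empty slice still acks
    # and the coordinator is never left waiting on this node.
    send_ack(DeploymentAck(deployment_id, node_id, len(plan)))
```

A regression test for the fix would dispatch an empty plan and assert that exactly one ack with `applied_changes == 0` was sent.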

### 5. docker-dev compose + seed rewrite

- **compose:** remove `admin-a` / `admin-b` / `driver-a` / `driver-b`; add
  `central-1` / `central-2` (`OTOPCUA_ROLES=admin,driver`, seed = `central-1`,
  OPC UA `4840` / `4841`, ASPNETCORE UI on `:9000`). `site-a-1/2`, `site-b-1/2`
  become `driver`-only (`OTOPCUA_ROLES=driver`, `Cluster__Roles__0=driver`, seed
  → `central-1`, OPC UA `4842`–`4845`), dropping their UI / Jwt / Ldap /
  DeployApiKey env + Traefik exposure. All nodes share the one ConfigDb.
- **traefik:** single `PathPrefix(/)` router → `central-1` / `central-2`
  (sticky cookie); drop the two site routers + services in both
  `docker-compose.yml` and `traefik-dynamic.yml`.
- **seed SQL (`seed/seed-clusters.sql`):** MAIN `ClusterNode` rows become
  `central-1:4053` / `central-2:4053` (replacing `driver-a` / `driver-b`);
  SITE-A / SITE-B keep their `ServerCluster` + 2 `ClusterNode` rows but **no
  drivers/tags** (empty sites). Update the `Notes` columns + the file header
  comments. The Galaxy namespace / driver / tags stay on MAIN (they run on the
  central fused nodes).
- **compose header + comment blocks:** rewrite the topology description (single
  mesh, hub-and-spoke, central-only UI).

## Data flow (after the change)

1. Operator clicks **Deploy** in the central UI (or `POST /api/deployments`).
2. `admin-operations` singleton (on a central node) → `ConfigComposer` snapshots
   **all** clusters' rows into one artifact → `ConfigPublishCoordinator`
   broadcasts `DispatchDeployment` to **all** driver members.
3. Each node resolves its own `ClusterId` from the artifact's `ClusterNode` rows.
4. `DriverHostActor` spawns only its cluster's `DriverInstance`s.
5. `OpcUaPublishActor` materialises only its cluster's address space.
6. Every node acks; the coordinator seals the deployment when all acks arrive.
7. Result: central `:4840`/`:4841` serve MAIN's Galaxy tree; site
   `:4842`–`:4845` serve only their own (empty until configured) trees.
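Step 6 is the convergence condition for the whole flow, and it can be modelled as a pending set. The Python sketch below is illustrative only: `DeploymentTracker` is a hypothetical name, the node names follow the docker-dev rig described above, and timeouts or node departures are deliberately not modelled.

```python
from typing import Iterable, Set


class DeploymentTracker:
    """Tracks which driver nodes still owe an ack for one deployment."""

    def __init__(self, expected_nodes: Iterable[str]) -> None:
        self._pending: Set[str] = set(expected_nodes)
        self.sealed = False

    def on_ack(self, node_id: str) -> bool:
        """Record an ack; return True once every expected node has acked."""
        # discard() makes duplicate or unexpected acks harmless.
        self._pending.discard(node_id)
        if not self._pending:
            self.sealed = True
        return self.sealed


# The six driver-role nodes of the single-mesh rig.
NODES = ["central-1", "central-2", "site-a-1", "site-a-2", "site-b-1", "site-b-2"]
```

Because the tracker waits on every driver member, a site node that applies an empty slice must still ack, or the deployment would never seal.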

## Error handling

- **Misconfigured node** (multi-cluster mesh, no matching `ClusterNode` row):
  applies nothing, logs an error, still acks (so the deploy converges rather than
  hanging). Surfaced for the operator to add the missing `ClusterNode` row.
- **Pre-PR / single-cluster artifacts:** `ClusterCount ≤ 1` → lenient no-filter,
  identical to current behavior.
- **Empty cluster slice:** node applies an empty plan and acks normally.

## Testing

- **Unit:** `ParseClusterScope` (match / miss / count);
  `ParseComposition(blob, clusterId)` (cross-cluster projections excluded;
  transitive resolution for UnsLine / Equipment / Tag / ScriptedAlarm); the
  driver-spec filter predicate (lenient / strict / unresolved-strict).
- **Integration:** a 2-cluster scoping test on the in-process harness: two
  driver nodes assigned to different `ClusterId`s, one deploy, assert each spawns
  only its cluster's drivers and materialises only its cluster's tree.
- **Backward-compat:** the existing single-cluster suites must stay green (the
  `ClusterCount ≤ 1` lenient branch guarantees this).
- **Live (docker-dev rig):** bring the rig up, sign into the central UI, confirm
  3 clusters listed, deploy, confirm `:4840` shows the Galaxy tree and
  `:4842`/`:4844` are empty (not the merged tree).

## Classification

High-risk: touches the actor model, the Phase7 data contract, and the deploy
path. The implementation plan will be TDD'd section by section.

## Out of scope (follow-ups)

- **Per-cluster deploy** (deploy just SITE-A from the UI): global deploy ships
  first; per-cluster targeting is a later coordinator + UI enhancement.
- **Seeding demo drivers on the sites:** sites start empty; drivers are added
  via the central UI.