Joseph Doherty ab8900eee5 docs(design): per-ClusterId scoping for hub-and-spoke single mesh
Central cluster (2 fused admin+driver nodes) hosts the only UI + deploy
singleton; site clusters (2 driver-only nodes each) join the central mesh
and are logically separated by ClusterId. Each node applies only its own
cluster's drivers + address space on a global deploy. Approved design;
next step is the implementation plan.
2026-06-07 02:50:49 -04:00

# Per-ClusterId Scoping (hub-and-spoke single mesh) — Design
**Date:** 2026-06-07
**Status:** Approved (brainstorming complete; next step: writing-plans)
**Branch:** `feat/per-cluster-scoping`
## Goal
Let one **central** cluster's Admin UI manage and deploy to multiple
logically-separate clusters that share a single Akka mesh. The central cluster
runs 2 fused `admin,driver` nodes (the only UI + the only deploy singleton);
each site cluster runs 2 `driver`-only nodes. A single global deploy from the
central UI reaches every node, and **each node applies only the slice of the
configuration that belongs to its own `ClusterId`** — its drivers and its OPC UA
address space. Ship global deploy first; per-cluster deploy is a later follow-up.
## Why this needs runtime work
The deploy channel is **in-mesh**: AdminUI → `admin-operations` singleton →
`ConfigPublishCoordinator` → DistributedPubSub → driver nodes. DistributedPubSub
does not cross Akka mesh boundaries, so for the central UI to deploy to site
servers the site nodes **must join the central mesh**. But the runtime currently
assumes **one Akka mesh == one logical cluster**:
- `DriverHostActor.ReconcileDrivers` spawns **every** `DriverInstance` in the
artifact with no cluster filter (`DriverHostActor.cs:367`). The `ClusterId` on
a spec is used only to *label* health snapshots.
- `ConfigPublishCoordinator.DiscoverDriverNodes` broadcasts to **every** driver
member of the mesh, no `ClusterId` filter (`ConfigPublishCoordinator.cs:248`).
- `ConfigComposer.SnapshotAndFlattenAsync` snapshots **all** clusters' rows into
one flat artifact; the address space is built from the whole thing.
Consequence today: put MAIN + SITE-A + SITE-B nodes in one mesh and every node
spawns every cluster's drivers (the Galaxy driver auto-stubs on Linux, so it
*would* start) and serves a **merged** address space of all three clusters.
That is why the existing docker-dev rig uses three isolated meshes.
This design adds the missing per-`ClusterId` scoping so a shared mesh behaves as
distinct logical clusters.
## Approach (chosen: A — node-side, parse-time filter, ClusterId from the artifact)
Each node resolves *its own* `ClusterId` by finding its `NodeId`
(`_localNode.Value`, format `"host:port"`, e.g. `central-1:4053`) in the
artifact's `ClusterNode` rows, then filters both the driver specs and the
address-space composition to that cluster.
The artifact is a self-contained, consistent snapshot that already includes
`ClusterNode` + `DriverInstance` + `Namespace` + `UnsArea` (all carrying
`ClusterId`), so resolution needs **no extra DB query** and has no
seal-vs-apply inconsistency window. The coordinator stays a **single broadcast**;
every node just applies its own slice.
### Alternatives considered
- **B — control-plane per-node artifact slices.** `ConfigComposer` emits a
filtered artifact per cluster and the coordinator dispatches the right slice to
each node. Rejected: turns one broadcast into per-cluster dispatch (a large
change to the deploy/ack model), contradicts "ship global first," and still
needs the same transitive `ClusterId` resolution.
- **C — runtime DB lookup for ClusterId.** Node queries `ClusterNode` by its
address at apply time, then filters post-parse. Rejected: extra DB round-trip
per node per deploy and a seal-vs-apply inconsistency window; the artifact
already contains everything A needs.
## Components
### 1. Self-`ClusterId` resolution
New helper `DeploymentArtifact.ParseClusterScope(blob, nodeId)` returning
`(string? ClusterId, int ClusterCount)`:
- `ClusterId` = the `ClusterNode` row whose `NodeId == nodeId`, else `null`.
- `ClusterCount` = number of `ServerCluster` rows in the artifact.
Both `DriverHostActor` and `OpcUaPublishActor` call it with `_localNode.Value`.
**Fallback rule (single source of truth for every filter site):**
| Condition | Behavior |
|---|---|
| `ClusterCount ≤ 1` | **Lenient — no filter** (legacy single-cluster meshes + the entire existing test suite behave exactly as today). |
| `ClusterCount > 1` and `ClusterId` resolved | **Filter to my cluster.** |
| `ClusterCount > 1` and `ClusterId` unresolved | **Apply nothing + log error** (a node in a multi-cluster mesh with no `ClusterNode` row is misconfigured; serving everything would leak other clusters' data). |
The `ClusterCount ≤ 1` lenient branch is what protects the existing ~210 v2
tests and any single-cluster deployment from any behavior change.
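
The resolution and fallback rule can be sketched as follows. This is an illustrative Python model only, not the C# helper: the artifact is assumed to be an already-parsed dict keyed by table name, `scope_mode` and `ScopeMode` are hypothetical names, and the site `NodeId` value is made up for the example.

```python
from enum import Enum
from typing import NamedTuple, Optional


class ClusterScope(NamedTuple):
    cluster_id: Optional[str]   # None when no ClusterNode row matches this node
    cluster_count: int          # number of ServerCluster rows in the artifact


class ScopeMode(Enum):          # hypothetical name for the three table rows
    NO_FILTER = "no-filter"          # ClusterCount <= 1: legacy behavior
    FILTER = "filter"                # multi-cluster, ClusterId resolved
    APPLY_NOTHING = "apply-nothing"  # multi-cluster, ClusterId unresolved


def parse_cluster_scope(artifact: dict, node_id: str) -> ClusterScope:
    """Find this node's ClusterId and count the clusters in the artifact."""
    cluster_id = next(
        (row["ClusterId"] for row in artifact.get("ClusterNode", [])
         if row["NodeId"] == node_id),
        None,
    )
    return ClusterScope(cluster_id, len(artifact.get("ServerCluster", [])))


def scope_mode(scope: ClusterScope) -> ScopeMode:
    """Apply the fallback rule in the same order as the table."""
    if scope.cluster_count <= 1:
        return ScopeMode.NO_FILTER
    if scope.cluster_id is not None:
        return ScopeMode.FILTER
    return ScopeMode.APPLY_NOTHING


# Assumed artifact shape; the real blob format is not described here.
artifact = {
    "ServerCluster": [{"ClusterId": "MAIN"}, {"ClusterId": "SITE-A"}],
    "ClusterNode": [
        {"NodeId": "central-1:4053", "ClusterId": "MAIN"},
        {"NodeId": "site-a-1:4053", "ClusterId": "SITE-A"},
    ],
}
```

Keeping the decision in one function mirrors the "single source of truth" intent: both filter sites would branch on the same three-way result instead of re-deriving it.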
### 2. Driver-spawn filter — `DriverHostActor`
`DriverInstanceSpec` already carries `ClusterId`, so in `ReconcileDrivers` (and
the restart `RestoreServedState` path) apply a one-line predicate over the parsed
specs using the fallback rule. In multi-cluster mode, specs with a `null`
`ClusterId` are excluded + logged (should never occur — `ConfigComposer` always
serializes the column).
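
A minimal sketch of the spec predicate, again in Python rather than the real C#. The spec is modeled as a dict with a `ClusterId` key, and `filter_driver_specs` and the `log` callback are hypothetical names introduced for illustration.

```python
from typing import Callable, Iterable, List, Optional


def filter_driver_specs(
    specs: Iterable[dict],
    my_cluster_id: Optional[str],
    cluster_count: int,
    log: Callable[[str], None] = print,
) -> List[dict]:
    """Return the driver specs this node should spawn under the fallback rule."""
    specs = list(specs)
    if cluster_count <= 1:
        return specs                      # lenient: legacy single-cluster mesh
    if my_cluster_id is None:
        log("error: node has no ClusterNode row; applying nothing")
        return []
    kept = []
    for spec in specs:
        if spec.get("ClusterId") is None:
            # Should never occur, but excluding is safer than leaking a driver.
            log(f"warning: spec {spec.get('Name')} has no ClusterId; excluded")
            continue
        if spec["ClusterId"] == my_cluster_id:
            kept.append(spec)
    return kept


# Illustrative specs; the names are made up.
specs = [
    {"Name": "galaxy-main", "ClusterId": "MAIN"},
    {"Name": "modbus-a", "ClusterId": "SITE-A"},
    {"Name": "orphan", "ClusterId": None},
]
```

With these specs, a MAIN node in a three-cluster mesh keeps only `galaxy-main`, while the same call with `cluster_count=1` returns all three unchanged.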
### 3. Address-space filter — `ParseComposition` + `OpcUaPublishActor`
Add `DeploymentArtifact.ParseComposition(blob, clusterId)`. At parse time the raw
artifact entities still carry `ClusterId` / `NamespaceId` / `UnsAreaId` /
`DriverInstanceId`, so build in-cluster id sets from the artifact and filter every
projection:
| Projection | Filter predicate |
|---|---|
| `UnsAreas` | `ClusterId == mine` (direct) |
| `UnsLines` | `UnsAreaId ∈ myAreas` |
| `EquipmentNodes` | `DriverInstanceId ∈ myDrivers` |
| `DriverInstancePlans` | `DriverInstanceId ∈ myDrivers` |
| `GalaxyTags` / `EquipmentTags` | `DriverInstanceId ∈ myDrivers` |
| `ScriptedAlarmPlans` | `EquipmentId ∈ myEquipment` |
`OpcUaPublishActor.HandleRebuild` resolves `myClusterId` and calls the filtered
parse before `Phase7Planner.Compute`. `_lastApplied` becomes the filtered
composition, so the incremental diff stays correct across redeploys. The no-arg
`ParseComposition(blob)` is left untouched (legacy / single-cluster path).
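
The transitive filtering in the table can be sketched like this. It is a Python illustration under stated assumptions: the artifact is a dict of row lists, the `DriverInstances` key and the `EquipmentId` field on equipment rows are assumed names, and `filter_composition` is hypothetical.

```python
def filter_composition(artifact: dict, my_cluster_id: str) -> dict:
    """Keep only the rows that belong, directly or transitively, to one cluster."""
    # Direct membership: rows that carry ClusterId themselves.
    my_areas = {a["UnsAreaId"] for a in artifact.get("UnsAreas", [])
                if a["ClusterId"] == my_cluster_id}
    my_drivers = {d["DriverInstanceId"] for d in artifact.get("DriverInstances", [])
                  if d["ClusterId"] == my_cluster_id}

    # Transitive membership: equipment belongs to a cluster via its driver.
    equipment = [e for e in artifact.get("EquipmentNodes", [])
                 if e["DriverInstanceId"] in my_drivers]
    my_equipment = {e["EquipmentId"] for e in equipment}

    def keep(name: str, key: str, allowed: set) -> list:
        return [row for row in artifact.get(name, []) if row[key] in allowed]

    return {
        "UnsAreas": keep("UnsAreas", "ClusterId", {my_cluster_id}),
        "UnsLines": keep("UnsLines", "UnsAreaId", my_areas),
        "EquipmentNodes": equipment,
        "DriverInstancePlans": keep("DriverInstancePlans", "DriverInstanceId", my_drivers),
        "GalaxyTags": keep("GalaxyTags", "DriverInstanceId", my_drivers),
        "EquipmentTags": keep("EquipmentTags", "DriverInstanceId", my_drivers),
        "ScriptedAlarmPlans": keep("ScriptedAlarmPlans", "EquipmentId", my_equipment),
    }


# Illustrative two-cluster artifact; all ids are made up.
sample = {
    "UnsAreas": [{"UnsAreaId": "a1", "ClusterId": "MAIN"},
                 {"UnsAreaId": "a2", "ClusterId": "SITE-A"}],
    "UnsLines": [{"UnsLineId": "l1", "UnsAreaId": "a1"},
                 {"UnsLineId": "l2", "UnsAreaId": "a2"}],
    "DriverInstances": [{"DriverInstanceId": "d1", "ClusterId": "MAIN"},
                        {"DriverInstanceId": "d2", "ClusterId": "SITE-A"}],
    "EquipmentNodes": [{"EquipmentId": "e1", "DriverInstanceId": "d1"},
                       {"EquipmentId": "e2", "DriverInstanceId": "d2"}],
    "ScriptedAlarmPlans": [{"AlarmId": "s1", "EquipmentId": "e1"},
                           {"AlarmId": "s2", "EquipmentId": "e2"}],
}
main_view = filter_composition(sample, "MAIN")
```

The id sets are built once from the unfiltered artifact, which is why the filter has to run at parse time, while the raw rows still carry their foreign keys.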
### 4. Deploy ack / convergence
`ConfigPublishCoordinator` keeps broadcasting to all driver members and waiting
for all acks (in the new rig all 6 nodes are driver-role). Each node applies its
slice and acks — **including a node whose cluster has an empty slice**. The one
risk: the ack must fire even when the node's plan is empty. Implementation will
**verify the ack is unconditional** and add a small fix if it is currently gated
on a non-empty change set. No change to `DiscoverDriverNodes`.
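
The property to verify can be modeled in a few lines. This Python sketch is not the actor code: `handle_dispatch`, `AckTracker`, and the callbacks are hypothetical stand-ins for the node-side handler and the coordinator's ack bookkeeping.

```python
from typing import Callable, Iterable, List


def handle_dispatch(
    deployment_id: str,
    plan: List[str],
    apply_plan: Callable[[List[str]], None],
    send_ack: Callable[[str], None],
) -> None:
    """Apply this node's slice, then ack whether or not the slice was empty."""
    if plan:
        apply_plan(plan)       # only do work when there is something to apply
    send_ack(deployment_id)    # deliberately NOT nested under `if plan:`


class AckTracker:
    """Coordinator-side bookkeeping: sealed once every expected node has acked."""

    def __init__(self, expected_nodes: Iterable[str]) -> None:
        self.pending = set(expected_nodes)

    def record(self, node_id: str) -> bool:
        self.pending.discard(node_id)
        return not self.pending    # True when no acks are outstanding
```

If the ack were gated on a non-empty plan, an empty site node would never leave `pending` and the deployment would not seal, which is the failure this section guards against.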
### 5. docker-dev compose + seed rewrite
- **compose:** remove `admin-a` / `admin-b` / `driver-a` / `driver-b`; add
`central-1` / `central-2` (`OTOPCUA_ROLES=admin,driver`, seed = `central-1`,
OPC UA `4840` / `4841`, ASPNETCORE UI on `:9000`). `site-a-1/2`, `site-b-1/2`
become `driver`-only (`OTOPCUA_ROLES=driver`, `Cluster__Roles__0=driver`, seed
`central-1`, OPC UA `4842`–`4845`), dropping their UI / Jwt / Ldap /
DeployApiKey env + Traefik exposure. All nodes share the one ConfigDb.
- **traefik:** single `PathPrefix(/)` router → `central-1` / `central-2`
(sticky cookie); drop the two site routers + services in both
`docker-compose.yml` and `traefik-dynamic.yml`.
- **seed SQL (`seed/seed-clusters.sql`):** MAIN `ClusterNode` rows become
`central-1:4053` / `central-2:4053` (replacing `driver-a` / `driver-b`);
SITE-A / SITE-B keep their `ServerCluster` + 2 `ClusterNode` rows but **no
drivers/tags** (empty sites). Update the `Notes` columns + the file header
comments. The Galaxy namespace / driver / tags stay on MAIN (they run on the
central fused nodes).
- **compose header + comment blocks:** rewrite the topology description (single
mesh, hub-and-spoke, central-only UI).
## Data flow (after the change)
1. Operator clicks **Deploy** in the central UI (or `POST /api/deployments`).
2. `admin-operations` singleton (on a central node) → `ConfigComposer` snapshots
**all** clusters' rows into one artifact → `ConfigPublishCoordinator`
broadcasts `DispatchDeployment` to **all** driver members.
3. Each node resolves its own `ClusterId` from the artifact's `ClusterNode` rows.
4. `DriverHostActor` spawns only its cluster's `DriverInstance`s.
5. `OpcUaPublishActor` materialises only its cluster's address space.
6. Every node acks; the coordinator seals the deployment when all acks arrive.
7. Result: central `:4840`/`:4841` serve MAIN's Galaxy tree; site
`:4842`–`:4845` serve only their own (empty until configured) trees.
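
Steps 2 to 6 can be walked through with a small simulation of the docker-dev topology. It is a Python sketch under assumptions: the site `NodeId` values, the `galaxy-main` driver name, and the `deploy` function are illustrative, and the real flow runs through actors and DistributedPubSub rather than a loop.

```python
def deploy(artifact: dict, driver_nodes: list) -> tuple:
    """Simulate one global deploy: broadcast, per-node slice, ack, seal."""
    pending = set(driver_nodes)
    spawned = {}
    for node_id in driver_nodes:                      # step 2: one broadcast to all
        cluster_id = next(                            # step 3: resolve own ClusterId
            (n["ClusterId"] for n in artifact["ClusterNode"]
             if n["NodeId"] == node_id),
            None,
        )
        spawned[node_id] = [                          # step 4: spawn only own drivers
            d["Name"] for d in artifact["DriverInstance"]
            if cluster_id is not None and d["ClusterId"] == cluster_id
        ]
        pending.discard(node_id)                      # step 6: ack, even if empty
    return spawned, not pending                       # sealed when nothing pending


# Assumed rig: MAIN owns the only driver; both sites start empty.
artifact = {
    "ClusterNode": [
        {"NodeId": "central-1:4053", "ClusterId": "MAIN"},
        {"NodeId": "central-2:4053", "ClusterId": "MAIN"},
        {"NodeId": "site-a-1:4053", "ClusterId": "SITE-A"},
        {"NodeId": "site-a-2:4053", "ClusterId": "SITE-A"},
        {"NodeId": "site-b-1:4053", "ClusterId": "SITE-B"},
        {"NodeId": "site-b-2:4053", "ClusterId": "SITE-B"},
    ],
    "DriverInstance": [{"Name": "galaxy-main", "ClusterId": "MAIN"}],
}
nodes = [n["NodeId"] for n in artifact["ClusterNode"]]
spawned, sealed = deploy(artifact, nodes)
```

In this model the two central nodes each spawn `galaxy-main`, the four site nodes spawn nothing, and the deployment still seals because all six nodes ack.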
## Error handling
- **Misconfigured node** (multi-cluster mesh, no matching `ClusterNode` row):
applies nothing, logs an error, still acks (so the deploy converges rather than
hanging). Surfaced for the operator to add the missing `ClusterNode` row.
- **Pre-PR / single-cluster artifacts:** `ClusterCount ≤ 1` → lenient no-filter,
identical to current behavior.
- **Empty cluster slice:** node applies an empty plan and acks normally.
## Testing
- **Unit:** `ParseClusterScope` (match / miss / count); `ParseComposition(blob,
clusterId)` (cross-cluster projections excluded; transitive resolution for
UnsLine / Equipment / Tag / ScriptedAlarm); the driver-spec filter predicate
(lenient / strict / unresolved-strict).
- **Integration:** a 2-cluster scoping test on the in-process harness — two
driver nodes assigned to different `ClusterId`s, one deploy, assert each spawns
only its cluster's drivers and materialises only its cluster's tree.
- **Backward-compat:** the existing single-cluster suites must stay green (the
`ClusterCount ≤ 1` lenient branch guarantees this).
- **Live (docker-dev rig):** bring the rig up, sign into the central UI, confirm
3 clusters listed, deploy, confirm `:4840` shows the Galaxy tree and
`:4842`/`:4844` are empty (not the merged tree).
## Classification
High-risk — touches the actor model, the Phase7 data contract, and the deploy
path. The implementation plan will be TDD'd section by section.
## Out of scope (follow-ups)
- **Per-cluster deploy** (deploy just SITE-A from the UI) — global deploy ships
first; per-cluster targeting is a later coordinator + UI enhancement.
- **Seeding demo drivers on the sites** — sites start empty; drivers are added
via the central UI.