The previous commit (961e094) gave each site cluster its own database
(OtOpcUa_SiteA / OtOpcUa_SiteB). That conflicts with the architecture:
ConfigDb is multi-tenant by design, with one schema whose ServerCluster
table scopes the rest of the configuration via ClusterId. Per-cluster
databases would split the schema and force every singleton/coordinator
to point at a different connection string.

Correct model: one ConfigDb, three ServerCluster rows (MAIN / SITE-A /
SITE-B), each Akka cluster's ClusterNode rows pointing back at the
matching ClusterId. Akka mesh isolation is still enforced by the
disjoint seed-node lists (unchanged from the previous commit).

Compose: all eight host nodes now point at Server=sql,1433;Database=OtOpcUa,
and the README documents the ServerCluster + ClusterNode rows operators
must create via /clusters and /hosts after first boot, before the runtime
can resolve its scope.

# docker-dev

Mac-friendly multi-cluster OtOpcUa fleet for manual UI exercise + integration smoke tests. Spins up **three isolated Akka clusters** + SQL Server + OpenLDAP + Traefik on the same Compose network. All three clusters share the single `OtOpcUa` ConfigDb; multi-tenancy is enforced by per-row `ServerCluster.ClusterId` scoping. Akka.Cluster gossip stays isolated between the meshes because their seed-node lists are disjoint, even though all three use the same actor system name `otopcua`.

## Stack

### Shared infrastructure

| Service | Role | Ports |
|---|---|---|
| `sql` | SQL Server 2022: single `OtOpcUa` ConfigDb shared by all three clusters | host `14330` → container `1433` |
| `ldap` | OpenLDAP with dev users `alice` / `bob` | host `3893` → container `1389` |
| `traefik` | Routes `:80` by Host header / PathPrefix | host `80`, dashboard `8080` |

### Main cluster: split admin/driver roles

| Service | Role | Ports |
|---|---|---|
| `admin-a` | `OTOPCUA_ROLES=admin`, cluster seed | internal `9000` |
| `admin-b` | `OTOPCUA_ROLES=admin`, joins admin-a | internal `9000` |
| `driver-a` | `OTOPCUA_ROLES=driver` | host `4840` → container `4840` |
| `driver-b` | `OTOPCUA_ROLES=driver` | host `4841` → container `4840` |

### Site A cluster: 2-node fused admin+driver

| Service | Role | Ports |
|---|---|---|
| `site-a-1` | `OTOPCUA_ROLES=admin,driver`, cluster seed | host `4842` → container `4840` |
| `site-a-2` | `OTOPCUA_ROLES=admin,driver`, joins site-a-1 | host `4843` → container `4840` |

### Site B cluster: 2-node fused admin+driver

| Service | Role | Ports |
|---|---|---|
| `site-b-1` | `OTOPCUA_ROLES=admin,driver`, cluster seed | host `4844` → container `4840` |
| `site-b-2` | `OTOPCUA_ROLES=admin,driver`, joins site-b-1 | host `4845` → container `4840` |

All containers bind Akka remoting to port `4053` inside their own network namespace, and each node's `PublicHostname` matches its Compose service name. Akka mesh isolation is enforced purely by the disjoint seed lists; configuration-side isolation is enforced by `ServerCluster.ClusterId` (see "Multi-tenancy" below).

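To make the seed-list isolation concrete, here is an illustrative mapping from each Compose service to the seed address it would join. It assumes each mesh's seed list is just the node marked "cluster seed" in the tables above (so, as an assumption the tables don't state, `driver-a` and `driver-b` also join `admin-a`), and it uses the standard Akka.NET `akka.tcp://system@host:port` address format. `seed_for_node` is a hypothetical helper, not how OtOpcUa actually receives its seed list.

```bash
# Hypothetical helper: print the seed-node address a service would join.
# Assumption: one seed per mesh, namely the "cluster seed" node in the tables.
seed_for_node() {
  local seed
  case "$1" in
    admin-a|admin-b|driver-a|driver-b) seed=admin-a ;;
    site-a-1|site-a-2)                 seed=site-a-1 ;;
    site-b-1|site-b-2)                 seed=site-b-1 ;;
    *) echo "unknown node: $1" >&2; return 1 ;;
  esac
  # System name (otopcua) and port (4053) are identical everywhere; only the
  # host differs, so no node is ever pointed at another mesh's seed.
  echo "akka.tcp://otopcua@${seed}:4053"
}
```

Because every node in a mesh resolves to the same seed and no two meshes share one, the shared system name never causes the meshes to merge.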
## Multi-tenancy

All eight host nodes use the same `OtOpcUa` ConfigDb. The `ServerCluster` table differentiates the three Akka meshes: each Akka cluster maps to one row, and each `ClusterNode` row's `ClusterId` ties that runtime node back to its owning cluster scope. After the stack comes up clean for the first time, sign in to any admin UI and create the three `ServerCluster` rows (via `/clusters`) and the eight `ClusterNode` rows (via `/hosts`), or do it via `dotnet run` against the Configuration project's seed script:

| Akka mesh | Suggested `ClusterId` | Nodes (`ClusterNode.NodeId`) |
|---|---|---|
| Main | `MAIN` | `admin-a`, `admin-b`, `driver-a`, `driver-b` |
| Site A | `SITE-A` | `site-a-1`, `site-a-2` |
| Site B | `SITE-B` | `site-b-1`, `site-b-2` |

The `NodeId` of each `ClusterNode` row must match that node's `Cluster__PublicHostname` env value (its Compose service name): the runtime looks itself up by that value to find its own cluster membership.

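Before clicking through `/clusters` and `/hosts`, it can help to print the exact rows to create. This sketch encodes the suggested mapping from the table above as shell functions; `cluster_id_for_node` and `print_cluster_node_rows` are hypothetical helpers, and the output is a checklist rather than SQL, since the ConfigDb column layout isn't documented here.

```bash
# Hypothetical helper: suggested ServerCluster.ClusterId for a node, keyed by
# Compose service name (= Cluster__PublicHostname = ClusterNode.NodeId).
cluster_id_for_node() {
  case "$1" in
    admin-a|admin-b|driver-a|driver-b) echo MAIN ;;
    site-a-1|site-a-2)                 echo SITE-A ;;
    site-b-1|site-b-2)                 echo SITE-B ;;
    *) echo "unknown node: $1" >&2; return 1 ;;
  esac
}

# Print one "ClusterId<TAB>NodeId" line per ClusterNode row to create.
print_cluster_node_rows() {
  local node
  for node in admin-a admin-b driver-a driver-b \
              site-a-1 site-a-2 site-b-1 site-b-2; do
    printf '%s\t%s\n' "$(cluster_id_for_node "$node")" "$node"
  done
}
```

Running `print_cluster_node_rows` lists the eight rows grouped by mesh. Any node whose `NodeId` is missing, or filed under the wrong `ClusterId`, won't find its own membership.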
## Bring up

```bash
# from the repo root
docker compose -f docker-dev/docker-compose.yml up -d --build

# wait ~20 seconds for SQL to come up + all three clusters to form

# `open` is the macOS launcher; on Linux, use xdg-open instead
open http://localhost            # main cluster admin UI
open http://site-a.localhost     # site A admin UI
open http://site-b.localhost     # site B admin UI
open http://localhost:8080       # Traefik dashboard
```

On macOS, `*.localhost` resolves to `127.0.0.1` automatically. On Linux, add `127.0.0.1 site-a.localhost site-b.localhost` to `/etc/hosts` if your resolver doesn't.

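For the Linux case, the `/etc/hosts` edit can be made idempotent. A minimal sketch: `ensure_site_hosts` is a hypothetical helper that appends one `127.0.0.1` line per missing name (rather than the single combined line above; both are valid hosts-file syntax) and skips names already present on a non-comment line.

```bash
# Hypothetical helper: add 127.0.0.1 entries for the site hostnames to a hosts
# file, skipping names that are already present. Defaults to /etc/hosts
# (needs root); pass a scratch file to try it out safely.
ensure_site_hosts() {
  local hosts_file="${1:-/etc/hosts}"
  local name
  for name in site-a.localhost site-b.localhost; do
    # Exact match against every hostname field of non-comment lines.
    if ! awk -v n="$name" '$1 !~ /^#/ { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
                          END { exit !found }' "$hosts_file"; then
      printf '127.0.0.1 %s\n' "$name" >> "$hosts_file"
    fi
  done
}
```

Run it from a root shell to edit `/etc/hosts` itself; running it twice leaves the file unchanged.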
The first build takes a few minutes (.NET SDK image + restore + publish). Subsequent rebuilds are faster thanks to Docker's layer cache.

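Because build and startup times vary, a fixed ~20-second wait is easy to undershoot. A minimal polling sketch, assuming `curl` is installed and that Traefik routes the three admin UIs by the `Host` header values used in the `open` commands above. Sending that header to `127.0.0.1` directly also sidesteps any `*.localhost` resolver differences. `wait_for_admin_uis` is a hypothetical helper. It counts only 2xx/3xx as ready because Traefik answers 404 while no router matches a service yet, and typically a 5xx while the backend is down or unhealthy.

```bash
# Hypothetical readiness check: poll each admin UI through Traefik until it
# answers 2xx/3xx, instead of relying on a fixed sleep. Assumes curl.
wait_for_admin_uis() {
  local limit="${1:-180}"             # overall deadline in seconds
  local base="${2:-http://127.0.0.1}" # Traefik entrypoint published on the host
  local deadline=$(( $(date +%s) + limit ))
  local host code=""
  for host in localhost site-a.localhost site-b.localhost; do
    while :; do
      # curl prints 000 if it cannot connect; the Host header picks the router.
      code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 2 \
        -H "Host: $host" "$base/")
      case "$code" in
        [23]??) echo "$host ready (HTTP $code)"; break ;;
      esac
      if [ "$(date +%s)" -ge "$deadline" ]; then
        echo "$host not ready after ${limit}s (last status: ${code:-none})" >&2
        return 1
      fi
      sleep 2
    done
  done
}
```

For example, run `wait_for_admin_uis 300` right after `docker compose ... up -d --build` and only then open the UIs.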
## Auth (dev only)

Use one of the LDAP dev users from `LDAP_USERS` in `docker-compose.yml`:

| Username | Password |
|---|---|
| `alice` | `alice123` |
| `bob` | `bob123` |

The compose file places every dev user under `ou=FleetAdmin`, so the dev role mapping resolves them to `FleetAdmin`.

## Tear down

```bash
docker compose -f docker-dev/docker-compose.yml down -v
```

The `-v` flag also removes the SQL + LDAP volumes; omit it to keep ConfigDb state across restarts.

## Failover smoke

1. Watch the Traefik dashboard at `http://localhost:8080`. Both `admin-a` and `admin-b` should be listed as healthy in the `otopcua-admin` service.
2. `docker compose -f docker-dev/docker-compose.yml stop admin-a`: `admin-b` should take over as admin role leader within ~15 s (the Akka split-brain resolver's `stable-after` window). Traefik routes traffic to `admin-b` once its `/health/active` returns 200.
3. `docker compose -f docker-dev/docker-compose.yml start admin-a`: `admin-a` rejoins as a follower, and `admin-b` keeps the leader role until something disturbs it.

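To put a number on step 2, the wait for `/health/active` can be timed from the host. This is a sketch under two assumptions: `curl` is installed, and Traefik's router for `http://localhost` forwards `/health/active` to the `otopcua-admin` service. `time_until_active` is a hypothetical helper.

```bash
# Hypothetical helper for step 2: report how long it takes until an HTTP
# endpoint answers 200. Assumes curl and that Traefik forwards
# /health/active on http://localhost to the otopcua-admin service.
time_until_active() {
  local url="${1:-http://localhost/health/active}"
  local limit="${2:-60}"              # give up after this many seconds
  local start elapsed code=""
  start=$(date +%s)
  while :; do
    code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 2 "$url")
    elapsed=$(( $(date +%s) - start ))
    if [ "$code" = 200 ]; then
      echo "HTTP 200 after ${elapsed}s"
      return 0
    fi
    if [ "$elapsed" -ge "$limit" ]; then
      echo "no HTTP 200 from $url within ${limit}s (last status: ${code:-none})" >&2
      return 1
    fi
    sleep 1
  done
}
```

Run it immediately after the `stop admin-a` command. The measured time also includes however long Traefik's own health check takes to notice the change, so expect it to be somewhat above the ~15 s Akka window.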
## Notes

- This compose is for the **local Mac/Linux developer rig**. The team's CI + soak runs go to the remote docker host at `10.100.0.35` (see `docs/v2/dev-environment.md`); the compose file there mirrors this one with adjusted port bindings.
- The OPC UA driver endpoints are reachable directly from the host (Traefik only fronts the admin HTTP surface):
  - Main: `opc.tcp://localhost:4840` (driver-a), `opc.tcp://localhost:4841` (driver-b)
  - Site A: `opc.tcp://localhost:4842` (site-a-1), `opc.tcp://localhost:4843` (site-a-2)
  - Site B: `opc.tcp://localhost:4844` (site-b-1), `opc.tcp://localhost:4845` (site-b-2)
- Galaxy + Wonderware drivers can't run in Linux containers (they need the Windows-only mxaccessgw + Historian SDK). On non-Windows hosts, `DriverInstanceActor.ShouldStub(driverType, roles)` returns `true` for those types, and the actor goes straight to a `Stubbed` state that returns deterministic success.

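When scripting smoke tests against the direct OPC UA endpoints listed above, keeping the service-to-port mapping in one place avoids drift. A minimal sketch: `opc_endpoint` is a hypothetical helper that mirrors the published host ports. Every container listens on `4840` internally, and `admin-a` / `admin-b` publish no OPC UA port at all.

```bash
# Hypothetical helper: print the host-side OPC UA endpoint for a service,
# mirroring the port tables above (container port is always 4840).
opc_endpoint() {
  local port
  case "$1" in
    driver-a) port=4840 ;;
    driver-b) port=4841 ;;
    site-a-1) port=4842 ;;
    site-a-2) port=4843 ;;
    site-b-1) port=4844 ;;
    site-b-2) port=4845 ;;
    *) echo "no published OPC UA port for: $1" >&2; return 1 ;;
  esac
  echo "opc.tcp://localhost:${port}"
}
```

For example, `for svc in driver-a driver-b site-a-1 site-a-2 site-b-1 site-b-2; do opc_endpoint "$svc"; done` prints the six endpoints in the same order as the list above.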