feat(deploy): add site-a + site-b 2-node clusters to docker-dev

Extends the docker-dev compose with two additional, fully-isolated Akka
clusters representing distinct sites. Each site is a 2-node fused
admin+driver cluster (OTOPCUA_ROLES=admin,driver on both nodes), backed
by its own ConfigDb database so configuration state stays separate from
the main cluster and from the other site.

Cluster isolation: the three meshes share the same Akka system name
"otopcua" and remoting port 4053 (inside each container's own network
namespace), but their seed-node lists are disjoint — main seeds at
admin-a, site-a seeds at site-a-1, site-b seeds at site-b-1 — so gossip
doesn't cross between them.

Layout:
  Main cluster   ConfigDb=OtOpcUa        admin-a, admin-b, driver-a, driver-b
  Site A         ConfigDb=OtOpcUa_SiteA  site-a-1, site-a-2 (fused admin+driver)
  Site B         ConfigDb=OtOpcUa_SiteB  site-b-1, site-b-2 (fused admin+driver)

OPC UA endpoints exposed on host ports 4840-4845. Admin UIs reachable
through Traefik via Host-header routing:
  http://localhost               → main cluster (PathPrefix default)
  http://site-a.localhost        → site A
  http://site-b.localhost        → site B

`*.localhost` auto-resolves on macOS; Linux users add the two hosts to
/etc/hosts (or rely on the resolver's RFC 6761 behaviour).
This commit is contained in:
Joseph Doherty
2026-05-26 13:59:23 -04:00
parent a1a7646b33
commit 961e09430a
3 changed files with 209 additions and 26 deletions

View File

@@ -1,20 +1,41 @@
# docker-dev
Mac-friendly four-node OtOpcUa fleet for manual UI exercise + integration smoke tests. Spins up an Akka cluster + SQL Server + OpenLDAP + Traefik in front of two admin nodes.
Mac-friendly multi-cluster OtOpcUa fleet for manual UI exercise + integration smoke tests. Spins up **three isolated Akka clusters** + SQL Server + OpenLDAP + Traefik on the same Compose network. Each cluster has its own ConfigDb database and its own seed-node list, so Akka.Cluster gossip doesn't cross between them even though they share the same system name `otopcua`.
## Stack
### Shared infrastructure
| Service | Role | Ports |
|---|---|---|
| `sql` | SQL Server 2022 (`ConfigDb` backing store) | host `14330` → container `1433` |
| `sql` | SQL Server 2022 (hosts all per-cluster ConfigDb databases) | host `14330` → container `1433` |
| `ldap` | OpenLDAP with dev users `alice` / `bob` | host `3893` → container `1389` |
| `admin-a` | OtOpcUa.Host, `OTOPCUA_ROLES=admin`, cluster seed | internal `9000` |
| `admin-b` | OtOpcUa.Host, `OTOPCUA_ROLES=admin`, joins admin-a | internal `9000` |
| `driver-a` | OtOpcUa.Host, `OTOPCUA_ROLES=driver` | host `4840` → container `4840` |
| `driver-b` | OtOpcUa.Host, `OTOPCUA_ROLES=driver` | host `4841` → container `4840` |
| `traefik` | Routes `:80` to whichever admin-* currently passes `/health/active` | host `80`, dashboard `8080` |
| `traefik` | Routes :80 by Host header / PathPrefix | host `80`, dashboard `8080` |
All six containers share an Akka cluster bound to port `4053` inside the Compose network. The Akka `PublicHostname` of each container matches its Compose service name; the seed-node list points at `admin-a` so the other three join via that.
### Main cluster — split admin/driver roles (ConfigDb: `OtOpcUa`)
| Service | Role | Ports |
|---|---|---|
| `admin-a` | `OTOPCUA_ROLES=admin`, cluster seed | internal `9000` |
| `admin-b` | `OTOPCUA_ROLES=admin`, joins admin-a | internal `9000` |
| `driver-a` | `OTOPCUA_ROLES=driver` | host `4840` → container `4840` |
| `driver-b` | `OTOPCUA_ROLES=driver` | host `4841` → container `4840` |
### Site A cluster — 2-node fused admin+driver (ConfigDb: `OtOpcUa_SiteA`)
| Service | Role | Ports |
|---|---|---|
| `site-a-1` | `OTOPCUA_ROLES=admin,driver`, cluster seed | host `4842` → container `4840` |
| `site-a-2` | `OTOPCUA_ROLES=admin,driver`, joins site-a-1 | host `4843` → container `4840` |
### Site B cluster — 2-node fused admin+driver (ConfigDb: `OtOpcUa_SiteB`)
| Service | Role | Ports |
|---|---|---|
| `site-b-1` | `OTOPCUA_ROLES=admin,driver`, cluster seed | host `4844` → container `4840` |
| `site-b-2` | `OTOPCUA_ROLES=admin,driver`, joins site-b-1 | host `4845` → container `4840` |
All containers bind Akka remoting to port `4053` inside their own network namespace; the `PublicHostname` of each matches its Compose service name. Cluster isolation is enforced purely by disjoint seed lists.
## Bring up
@@ -22,12 +43,16 @@ All six containers share an Akka cluster bound to port `4053` inside the Compose
# from the repo root
docker compose -f docker-dev/docker-compose.yml up -d --build
# wait ~15 seconds for SQL to come up + the cluster to form
# wait ~20 seconds for SQL to come up + all three clusters to form
open http://localhost # Blazor admin UI via Traefik
open http://localhost:8080 # Traefik dashboard
open http://localhost # main cluster admin UI
open http://site-a.localhost # site A admin UI
open http://site-b.localhost # site B admin UI
open http://localhost:8080 # Traefik dashboard
```
On macOS, `*.localhost` resolves to `127.0.0.1` automatically. On Linux add `127.0.0.1 site-a.localhost site-b.localhost` to `/etc/hosts` if your resolver doesn't.
The first build takes a few minutes (.NET SDK image + restore + publish). Subsequent rebuilds are faster with Docker's layer cache.
## Auth (dev only)
@@ -58,5 +83,8 @@ The `-v` drops the SQL + LDAP volumes; remove it to keep ConfigDb state across r
## Notes
- This compose is for the **local Mac/Linux developer rig**. The team's CI + soak runs go to the remote docker host at `10.100.0.35` (see `docs/v2/dev-environment.md`); the file there mirrors this one with adjusted port bindings.
- The OPC UA driver endpoints (`opc.tcp://localhost:4840`, `opc.tcp://localhost:4841`) are reachable directly from the host Traefik is only in front of the admin HTTP surface.
- The OPC UA driver endpoints are reachable directly from the host (Traefik is only in front of the admin HTTP surface):
- Main: `opc.tcp://localhost:4840` (driver-a), `opc.tcp://localhost:4841` (driver-b)
- Site A: `opc.tcp://localhost:4842` (site-a-1), `opc.tcp://localhost:4843` (site-a-2)
- Site B: `opc.tcp://localhost:4844` (site-b-1), `opc.tcp://localhost:4845` (site-b-2)
- Galaxy + Wonderware drivers can't run in Linux containers (they need the Windows-only mxaccessgw + Historian SDK). On non-Windows, `DriverInstanceActor.ShouldStub(driverType, roles)` returns `true` for those types and the actor goes straight to a `Stubbed` state that returns deterministic success.