# docker-dev Mac-friendly multi-cluster OtOpcUa fleet for manual UI exercise + integration smoke tests. Spins up **three isolated Akka clusters** + SQL Server + OpenLDAP + Traefik on the same Compose network. All three clusters share the single `OtOpcUa` ConfigDb — multi-tenancy is enforced by per-row `ServerCluster.ClusterId` scoping. Akka.Cluster gossip stays isolated between meshes because their seed-node lists are disjoint, even though they share the same system name `otopcua`. ## Stack ### Shared infrastructure | Service | Role | Ports | |---|---|---| | `sql` | SQL Server 2022 — single `OtOpcUa` ConfigDb shared by all three clusters | host `14330` → container `1433` | | `traefik` | Routes :80 by Host header / PathPrefix | host `80`, dashboard `8089` | Authentication runs in `DevStubMode` — every host container has `Authentication__Ldap__DevStubMode=true` set, so the LDAP service is not part of the dev compose right now (the `bitnami/openldap:2.6` image was retired and the legacy tag crashes mid-setup with exit 68). Any non-empty username/password signs in as `FleetAdmin`. To restore a real LDAP service, drop the env var and add an `openldap`-compatible image back to compose. ### Main cluster — split admin/driver roles | Service | Role | Ports | |---|---|---| | `admin-a` | `OTOPCUA_ROLES=admin`, cluster seed | internal `9000` | | `admin-b` | `OTOPCUA_ROLES=admin`, joins admin-a | internal `9000` | | `driver-a` | `OTOPCUA_ROLES=driver` | host `4840` → container `4840` | | `driver-b` | `OTOPCUA_ROLES=driver` | host `4841` → container `4840` | ### Site A cluster — 2-node fused admin+driver | Service | Role | Ports | |---|---|---| | `site-a-1` | `OTOPCUA_ROLES=admin,driver`, cluster seed | host `4842` → container `4840` | | `site-a-2` | `OTOPCUA_ROLES=admin,driver`, joins site-a-1 | host `4843` → container `4840` | ### Site B cluster — 2-node fused admin+driver | Service | Role | Ports | |---|---|---| | `site-b-1` | `OTOPCUA_ROLES=admin,driver`, cluster seed | host `4844` → container `4840` | | `site-b-2` | `OTOPCUA_ROLES=admin,driver`, joins site-b-1 | host `4845` → container `4840` | All containers bind Akka remoting to port `4053` inside their own network namespace; the `PublicHostname` of each matches its Compose service name. Akka mesh isolation is enforced purely by disjoint seed lists. Configuration-side isolation is enforced by `ServerCluster.ClusterId` — see "Multi-tenancy" below. ## Multi-tenancy All eight host nodes write to the same `OtOpcUa` ConfigDb. The `ServerCluster` table differentiates the three Akka meshes: each Akka cluster maps to one row, and each `ClusterNode` row's `ClusterId` ties the runtime node back to its owning cluster scope. A one-shot `cluster-seed` Compose service (image `mcr.microsoft.com/mssql-tools`) waits for SQL + the EF auto-migration to complete and then INSERTs the rows below. The seed is **idempotent** — `IF NOT EXISTS` guards every insert — so re-runs on `docker compose up` are no-ops: | Akka mesh | `ServerCluster.ClusterId` | `ClusterNode.NodeId` rows | |---|---|---| | Main | `MAIN` | `driver-a`, `driver-b` (OPC UA publishers) | | Site A | `SITE-A` | `site-a-1`, `site-a-2` | | Site B | `SITE-B` | `site-b-1`, `site-b-2` | `ClusterNode` is the table for **OPC UA-publishing nodes** (not every Akka cluster member), which is why the main cluster's `admin-a` / `admin-b` don't get rows — they're control-plane-only. Each `ClusterNode.NodeId` matches the node's `Cluster__PublicHostname` env value (Compose service name) — that's the lookup the runtime uses to resolve its own membership. `ApplicationUri` follows the `urn:OtOpcUa:` convention. The SQL lives at `seed/seed-clusters.sql`; the wait-and-apply wrapper lives at `seed/entrypoint.sh`. To re-seed manually: ```bash docker compose -f docker-dev/docker-compose.yml run --rm cluster-seed ``` ### Galaxy / MxAccess gateway The seed also pre-creates a `SystemPlatform` Namespace + a `GalaxyMxGateway` DriverInstance in the MAIN cluster pointing at `http://10.100.0.48:5120`. The API key is resolved from the `GALAXY_MXGW_API_KEY` env var set on every driver-role container in compose; override via `GALAXY_MXGW_API_KEY=... docker compose up -d` to swap keys without editing the compose file. The DriverHost actor doesn't spawn drivers from raw DriverInstance rows on its own — the v2 deploy lifecycle requires a *sealed Deployment* before drivers materialise. After first bring-up, sign in to the Admin UI and click **Deploy current configuration** on `/deployments` to compose the seeded rows into an artifact and dispatch it. The Galaxy driver instance will start its gRPC connection to the gateway on the next deploy ack. ## Bring up ```bash # from the repo root docker compose -f docker-dev/docker-compose.yml up -d --build # wait ~20 seconds for SQL to come up + all three clusters to form open http://localhost # main cluster admin UI open http://site-a.localhost # site A admin UI open http://site-b.localhost # site B admin UI open http://localhost:8089 # Traefik dashboard ``` On macOS, `*.localhost` resolves to `127.0.0.1` automatically. On Linux add `127.0.0.1 site-a.localhost site-b.localhost` to `/etc/hosts` if your resolver doesn't. The first build takes a few minutes (.NET SDK image + restore + publish). Subsequent rebuilds are faster with Docker's layer cache. ## Auth (dev only) `Authentication__Ldap__DevStubMode=true` is set on every host container, so any non-empty username/password signs in as a `FleetAdmin` user without contacting an LDAP server. **Do not** ship this configuration to production — set `DevStubMode=false` and wire a real LDAP backend before any non-dev deployment. ## Tear down ```bash docker compose -f docker-dev/docker-compose.yml down -v ``` The `-v` drops the SQL + LDAP volumes; remove it to keep ConfigDb state across restarts. ## Failover smoke 1. Watch the Traefik dashboard at `http://localhost:8089`. Both `admin-a` and `admin-b` should be listed as healthy in the `otopcua-admin` service. 2. `docker compose -f docker-dev/docker-compose.yml stop admin-a` — `admin-b` should pick up the admin role-leader within ~15 s (Akka split-brain stable-after). Traefik will route traffic to `admin-b` once its `/health/active` returns 200. 3. `docker compose -f docker-dev/docker-compose.yml start admin-a` — `admin-a` rejoins as a follower; `admin-b` keeps the leader role until something disturbs it. ## Notes - This compose is for the **local Mac/Linux developer rig**. The team's CI + soak runs go to the remote docker host at `10.100.0.35` (see `docs/v2/dev-environment.md`); the file there mirrors this one with adjusted port bindings. - The OPC UA driver endpoints are reachable directly from the host (Traefik is only in front of the admin HTTP surface): - Main: `opc.tcp://localhost:4840` (driver-a), `opc.tcp://localhost:4841` (driver-b) - Site A: `opc.tcp://localhost:4842` (site-a-1), `opc.tcp://localhost:4843` (site-a-2) - Site B: `opc.tcp://localhost:4844` (site-b-1), `opc.tcp://localhost:4845` (site-b-2) - Galaxy + Wonderware drivers can't run in Linux containers (they need the Windows-only mxaccessgw + Historian SDK). On non-Windows, `DriverInstanceActor.ShouldStub(driverType, roles)` returns `true` for those types and the actor goes straight to a `Stubbed` state that returns deterministic success.