# docker-dev

Mac-friendly multi-cluster OtOpcUa fleet for manual UI exercise + integration smoke tests. Spins up **three isolated Akka clusters** + SQL Server + OpenLDAP + Traefik on the same Compose network. Each cluster has its own ConfigDb database and its own seed-node list, so Akka.Cluster gossip doesn't cross between them even though they share the same system name `otopcua`.

## Stack

### Shared infrastructure

| Service | Role | Ports |
|---|---|---|
| `sql` | SQL Server 2022 (hosts all per-cluster ConfigDb databases) | host `14330` → container `1433` |
| `ldap` | OpenLDAP with dev users `alice` / `bob` | host `3893` → container `1389` |
| `traefik` | Routes :80 by Host header / PathPrefix | host `80`, dashboard `8080` |

### Main cluster: split admin/driver roles (ConfigDb: `OtOpcUa`)

| Service | Role | Ports |
|---|---|---|
| `admin-a` | `OTOPCUA_ROLES=admin`, cluster seed | internal `9000` |
| `admin-b` | `OTOPCUA_ROLES=admin`, joins admin-a | internal `9000` |
| `driver-a` | `OTOPCUA_ROLES=driver` | host `4840` → container `4840` |
| `driver-b` | `OTOPCUA_ROLES=driver` | host `4841` → container `4840` |

### Site A cluster: 2-node fused admin+driver (ConfigDb: `OtOpcUa_SiteA`)

| Service | Role | Ports |
|---|---|---|
| `site-a-1` | `OTOPCUA_ROLES=admin,driver`, cluster seed | host `4842` → container `4840` |
| `site-a-2` | `OTOPCUA_ROLES=admin,driver`, joins site-a-1 | host `4843` → container `4840` |

### Site B cluster: 2-node fused admin+driver (ConfigDb: `OtOpcUa_SiteB`)

| Service | Role | Ports |
|---|---|---|
| `site-b-1` | `OTOPCUA_ROLES=admin,driver`, cluster seed | host `4844` → container `4840` |
| `site-b-2` | `OTOPCUA_ROLES=admin,driver`, joins site-b-1 | host `4845` → container `4840` |

All containers bind Akka remoting to port `4053` inside their own network namespace; the `PublicHostname` of each matches its Compose service name. Cluster isolation is enforced purely by disjoint seed lists.
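As a rough illustration of how the disjoint seed lists partition the fleet, the sketch below maps each Compose service to the seed node of its cluster. The helper names `seed_host` and `seed_address` are hypothetical, and the `akka.tcp://` address scheme and the assumption that each cluster lists exactly one seed are guesses for illustration; the real seed configuration lives in `docker-compose.yml` and may differ.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for illustration; not part of the repo.

AKKA_SYSTEM="otopcua"   # shared system name (from this README)
AKKA_PORT=4053          # remoting port inside each container (from this README)

# Print the seed service for a given Compose service name.
# Assumes one seed per cluster, as marked "cluster seed" in the tables.
seed_host() {
  case "$1" in
    admin-a|admin-b|driver-a|driver-b) echo "admin-a" ;;
    site-a-1|site-a-2)                 echo "site-a-1" ;;
    site-b-1|site-b-2)                 echo "site-b-1" ;;
    *) echo "unknown service: $1" >&2; return 1 ;;
  esac
}

# Print the seed address a service would join (address scheme is an assumption).
seed_address() {
  local host
  host="$(seed_host "$1")" || return 1
  echo "akka.tcp://${AKKA_SYSTEM}@${host}:${AKKA_PORT}"
}
```

For example, `seed_address site-b-2` prints `akka.tcp://otopcua@site-b-1:4053`. No service in one cluster resolves to a seed in another, which is the property that keeps gossip from crossing between clusters.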
## Bring up

```bash
# from the repo root
docker compose -f docker-dev/docker-compose.yml up -d --build

# wait ~20 seconds for SQL to come up + all three clusters to form
# (`open` is the macOS launcher; on Linux use xdg-open or a browser)
open http://localhost            # main cluster admin UI
open http://site-a.localhost     # site A admin UI
open http://site-b.localhost     # site B admin UI
open http://localhost:8080       # Traefik dashboard
```

On macOS, `*.localhost` resolves to `127.0.0.1` automatically. On Linux, add `127.0.0.1 site-a.localhost site-b.localhost` to `/etc/hosts` if your resolver doesn't.

The first build takes a few minutes (.NET SDK image + restore + publish). Subsequent rebuilds are faster thanks to Docker's layer cache.

## Auth (dev only)

Use one of the LDAP dev users from `LDAP_USERS` in `docker-compose.yml`:

| Username | Password |
|---|---|
| `alice` | `alice123` |
| `bob` | `bob123` |

The compose file places every dev user in `ou=FleetAdmin`, so the dev role mapping resolves to `FleetAdmin`.

## Tear down

```bash
docker compose -f docker-dev/docker-compose.yml down -v
```

The `-v` flag drops the SQL + LDAP volumes; omit it to keep ConfigDb state across restarts.

## Failover smoke

1. Watch the Traefik dashboard at `http://localhost:8080`. Both `admin-a` and `admin-b` should be listed as healthy in the `otopcua-admin` service.
2. Run `docker compose -f docker-dev/docker-compose.yml stop admin-a`. `admin-b` should take over as the admin role leader within ~15 s (Akka split-brain stable-after). Traefik routes traffic to `admin-b` once its `/health/active` returns 200.
3. Run `docker compose -f docker-dev/docker-compose.yml start admin-a`. `admin-a` rejoins as a follower; `admin-b` keeps the leader role until something disturbs it.

## Notes

- This compose file is for the **local Mac/Linux developer rig**. The team's CI + soak runs go to the remote docker host at `10.100.0.35` (see `docs/v2/dev-environment.md`); the file there mirrors this one with adjusted port bindings.
- The OPC UA driver endpoints are reachable directly from the host (Traefik is only in front of the admin HTTP surface):
  - Main: `opc.tcp://localhost:4840` (driver-a), `opc.tcp://localhost:4841` (driver-b)
  - Site A: `opc.tcp://localhost:4842` (site-a-1), `opc.tcp://localhost:4843` (site-a-2)
  - Site B: `opc.tcp://localhost:4844` (site-b-1), `opc.tcp://localhost:4845` (site-b-2)
- Galaxy + Wonderware drivers can't run in Linux containers (they need the Windows-only mxaccessgw + Historian SDK). On non-Windows, `DriverInstanceActor.ShouldStub(driverType, roles)` returns `true` for those types and the actor goes straight to a `Stubbed` state that returns deterministic success.
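
To check the host-mapped OPC UA ports listed above after bring-up, a small probe along these lines may help. It is a sketch only: the function names are hypothetical, it relies on bash's `/dev/tcp` redirection (which other shells may not support), and an accepted TCP connection only shows that something is listening on the host port, not that the OPC UA server behind it is ready.

```bash
#!/usr/bin/env bash
# Hypothetical helper functions for illustration; not part of the repo.

# Service name and host port pairs, taken from the endpoint list in this README.
opc_endpoints() {
  cat <<'EOF'
driver-a 4840
driver-b 4841
site-a-1 4842
site-a-2 4843
site-b-1 4844
site-b-2 4845
EOF
}

# Succeeds if a TCP connection to host ($1) and port ($2) can be opened.
# The subshell closes the descriptor again as soon as it exits.
port_open() {
  (exec 3<>"/dev/tcp/$1/$2") 2>/dev/null
}

# Print one "up"/"down" status line per endpoint.
probe_all() {
  opc_endpoints | while read -r name port; do
    if port_open localhost "$port"; then
      echo "up    opc.tcp://localhost:${port} (${name})"
    else
      echo "down  opc.tcp://localhost:${port} (${name})"
    fi
  done
}
```

Calling `probe_all` once the stack is running prints six lines, one per driver endpoint; a real OPC UA client is still needed to confirm that a session can actually be established.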