# docker-dev

Mac-friendly OtOpcUa fleet for manual UI exercise + integration smoke tests. Spins up **one single Akka mesh** (hub-and-spoke topology) + SQL Server + Traefik on the same Compose network. All six host nodes share the single `OtOpcUa` ConfigDb; logical separation between MAIN, SITE-A, and SITE-B is enforced by per-row `ServerCluster.ClusterId` scoping, not by mesh isolation.

## Stack

### Shared infrastructure

| Service | Role | Ports |
|---|---|---|
| `sql` | SQL Server 2022, single `OtOpcUa` ConfigDb shared by all nodes | host `14330` → container `1433` |
| `traefik` | Routes `:80` by PathPrefix to central admin nodes | host `80`, dashboard `8089` |

Authentication uses the **shared GLAuth** on the Linux Docker host at `10.100.0.35:3893` (baseDN `dc=zb,dc=local`). Only the central admin nodes authenticate users. Sign in as `multi-role` / `password` to get all three OtOpcUa roles (Administrator, Designer, Viewer), or use any other shared test user with password `password`. Group→role mappings are seeded by `seed/seed-clusters.sql` (`OtOpcUa-Admins`→Administrator, `OtOpcUa-Designers`→Designer, `OtOpcUa-Viewers`→Viewer). The shared GLAuth source of truth and deploy runbook live in `scadaproj/infra/glauth/`.

### Central nodes: fused admin+driver (MAIN cluster, UI + deploy singleton)

| Service | Roles | Ports |
|---|---|---|
| `central-1` | `OTOPCUA_ROLES=admin,driver`, Akka mesh seed | host `4840` → container `4840`; internal `9000` |
| `central-2` | `OTOPCUA_ROLES=admin,driver`, joins central-1 | host `4841` → container `4840`; internal `9000` |

`central-1` and `central-2` are the **only** nodes that host the Admin UI and the deploy singleton. They are also the OPC UA publishers for the MAIN cluster. Traefik routes all `PathPrefix(/)` traffic to whichever central node has the leader role.

### Site A nodes: driver-only (SITE-A cluster)

| Service | Roles | Ports |
|---|---|---|
| `site-a-1` | `OTOPCUA_ROLES=driver`, joins the single mesh | host `4842` → container `4840` |
| `site-a-2` | `OTOPCUA_ROLES=driver`, joins the single mesh | host `4843` → container `4840` |

### Site B nodes: driver-only (SITE-B cluster)

| Service | Roles | Ports |
|---|---|---|
| `site-b-1` | `OTOPCUA_ROLES=driver`, joins the single mesh | host `4844` → container `4840` |
| `site-b-2` | `OTOPCUA_ROLES=driver`, joins the single mesh | host `4845` → container `4840` |

Site nodes serve no UI and authenticate no users. The central cluster manages and deploys to them over the shared Akka mesh.

All six nodes bind Akka remoting to port `4053` inside their own network namespace; `PublicHostname` for each matches its Compose service name.

## Multi-tenancy

All six host nodes write to the same `OtOpcUa` ConfigDb. The `ServerCluster` table differentiates the three logical clusters: each maps to one row, and each `ClusterNode` row's `ClusterId` ties the runtime node back to its owning cluster scope.

A one-shot `cluster-seed` Compose service (image `mcr.microsoft.com/mssql-tools`) waits for the `OtOpcUa` ConfigDb schema to exist (the host nodes do **not** auto-migrate; you apply EF migrations once, see [First-time setup](#first-time-setup-or-after-down--v)) and then INSERTs the rows below. The seed is **idempotent** (`IF NOT EXISTS` guards every insert), so re-runs on `docker compose up` are no-ops:

| Logical cluster | `ServerCluster.ClusterId` | `ClusterNode.NodeId` rows |
|---|---|---|
| Main | `MAIN` | `central-1`, `central-2` (OPC UA publishers + admin UI) |
| Site A | `SITE-A` | `site-a-1`, `site-a-2` |
| Site B | `SITE-B` | `site-b-1`, `site-b-2` |

Each `ClusterNode.NodeId` matches the node's `Cluster__PublicHostname` env value (Compose service name); that's the lookup the runtime uses to resolve its own membership. `ApplicationUri` follows the `urn:OtOpcUa:` convention.

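For helper scripts that need to know which logical cluster a container belongs to, the node-to-cluster mapping from the seed table can be mirrored in a small shell function. This is only a sketch: `cluster_for_node` is a hypothetical helper name, and it hard-codes the seeded rows rather than querying the ConfigDb the way the runtime does.

```bash
# Hypothetical helper: map a Compose service name (which doubles as
# ClusterNode.NodeId) to the ClusterId seeded for it.
# The mapping is hard-coded from the seed table; it is NOT read from the ConfigDb.
cluster_for_node() {
  case "$1" in
    central-1|central-2) echo "MAIN" ;;
    site-a-1|site-a-2)   echo "SITE-A" ;;
    site-b-1|site-b-2)   echo "SITE-B" ;;
    *)
      # Unknown names are reported on stderr and signalled with a non-zero status.
      echo "unknown node: $1" >&2
      return 1
      ;;
  esac
}

cluster_for_node site-a-1   # prints SITE-A
```

If the seed rows change, the function has to be updated by hand, so treat it as a convenience for local scripts rather than a source of truth.
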
The SQL lives at `seed/seed-clusters.sql`; the wait-and-apply wrapper lives at `seed/entrypoint.sh`. To re-seed manually:

```bash
docker compose -f docker-dev/docker-compose.yml run --rm cluster-seed
```

### Galaxy / MxAccess gateway

The seed also pre-creates a `SystemPlatform` Namespace + a `GalaxyMxGateway` DriverInstance in the MAIN cluster pointing at `http://10.100.0.48:5120`. The API key is resolved from the `GALAXY_MXGW_API_KEY` env var set on every driver-role container in compose; override via `GALAXY_MXGW_API_KEY=... docker compose up -d` to swap keys without editing the compose file.

The DriverHost actor doesn't spawn drivers from raw DriverInstance rows on its own: the v2 deploy lifecycle requires a *sealed Deployment* before drivers materialise. After first bring-up, sign in to the Admin UI and click **Deploy current configuration** on `/deployments` to compose the seeded rows into an artifact and dispatch it. The Galaxy driver instance will start its gRPC connection to the gateway on the next deploy ack.

## Bring up

```bash
# from the repo root
docker compose -f docker-dev/docker-compose.yml up -d --build

# wait ~20 seconds for SQL to come up + the mesh to form
open http://localhost:9200   # Admin UI (Traefik → central-1 or central-2)
open http://localhost:8089   # Traefik dashboard
```

The first build takes a few minutes (.NET SDK image + restore + publish). Subsequent rebuilds are faster with Docker's layer cache.

### First-time setup (or after `down -v`)

The host nodes do **not** auto-create the ConfigDb schema. On a brand-new SQL volume you must apply the EF migrations once, then (re)run the seed. (The auto-started `cluster-seed` polls for `dbo.ServerCluster`, which the *first* migration creates, so if it runs mid-migration it can fail against an intermediate schema. In that case, re-run it after the migrations finish.)

```bash
# 1. bring the stack up (SQL + nodes; nodes retry the DB until the schema exists)
docker compose -f docker-dev/docker-compose.yml up -d --build

# 2. create + migrate the OtOpcUa ConfigDb (one time; the design-time factory reads OTOPCUA_CONFIG_CONNECTION)
#    single quotes keep an interactive bash/zsh from history-expanding the "!" in the password
OTOPCUA_CONFIG_CONNECTION='Server=localhost,14330;Database=OtOpcUa;User Id=sa;Password=OtOpcUa!Dev123;TrustServerCertificate=True;' \
  dotnet ef database update \
    --project src/Core/ZB.MOM.WW.OtOpcUa.Configuration \
    --startup-project src/Core/ZB.MOM.WW.OtOpcUa.Configuration

# 3. apply the cluster/namespace/driver seed against the now-complete schema (idempotent)
docker compose -f docker-dev/docker-compose.yml run --rm cluster-seed
```

After the schema + seed exist, a plain `docker compose ... up -d` is enough. The named SQL volume keeps both across restarts; only `down -v` wipes them, which is when you repeat the steps above.

## Auth (dev only)

Central nodes authenticate against the shared GLAuth at `10.100.0.35:3893` (baseDN `dc=zb,dc=local`). `DevStubMode` is **not** active. Sign in with any test user (password `password`); `multi-role` / `password` returns all three roles (Administrator, Designer, Viewer). Group→role mappings are seeded by `seed/seed-clusters.sql`. The GLAuth source of truth + deploy runbook are in `scadaproj/infra/glauth/`.

**Do not** enable `DevStubMode` outside local debugging. Production must always bind a real LDAP backend.

## Headless deploy

```bash
# POST to the deployments API with the dev deploy key
curl -X POST http://localhost:9200/api/deployments \
  -H "X-Api-Key: docker-dev-deploy-key"
```

## Tear down

```bash
docker compose -f docker-dev/docker-compose.yml down -v
```

The `-v` flag drops the SQL volume; omit it to keep ConfigDb state across restarts. There is no local LDAP volume, because LDAP is the shared external GLAuth on `10.100.0.35:3893`.

## Failover smoke

1. Watch the Traefik dashboard at `http://localhost:8089`. Both `central-1` and `central-2` should be listed as healthy in the `otopcua-admin` service.
2. Run `docker compose -f docker-dev/docker-compose.yml stop central-1`. `central-2` should pick up the admin role-leader within ~15 s (Akka split-brain stable-after). Traefik will route traffic to `central-2` once its `/health/active` returns 200.
3. Run `docker compose -f docker-dev/docker-compose.yml start central-1`. `central-1` rejoins as a follower; `central-2` keeps the leader role until something disturbs it.

## Notes

- This compose is for the **local Mac/Linux developer rig**. The team's CI + soak runs go to the remote docker host at `10.100.0.35` (see `docs/v2/dev-environment.md`); the file there mirrors this one with adjusted port bindings.
- The OPC UA endpoints are reachable directly from the host (Traefik is only in front of the admin HTTP surface):
  - Main: `opc.tcp://localhost:4840` (central-1), `opc.tcp://localhost:4841` (central-2)
  - Site A: `opc.tcp://localhost:4842` (site-a-1), `opc.tcp://localhost:4843` (site-a-2)
  - Site B: `opc.tcp://localhost:4844` (site-b-1), `opc.tcp://localhost:4845` (site-b-2)
- Galaxy + Wonderware drivers can't run in Linux containers (they need the Windows-only mxaccessgw + Historian SDK). On non-Windows, `DriverInstanceActor.ShouldStub(driverType, roles)` returns `true` for those types and the actor goes straight to a `Stubbed` state that returns deterministic success.
- SQL persistence: ConfigDb state survives container restarts (named Docker volume). Drop the volume with `down -v` for a clean slate.
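
The waiting in the failover smoke can be scripted instead of watched by hand. The sketch below polls a URL with `curl` until it answers with a 2xx or 3xx status. `wait_for_http_up` is a hypothetical helper, not part of the repo, and the URL in the usage comment assumes the Admin UI address from the Bring up section; whether that root path answers 200 or redirects to a sign-in page is not confirmed here, which is why redirects count as "up".

```bash
# Hypothetical helper: poll URL until it returns HTTP 2xx/3xx.
#   $1 = URL to poll
#   $2 = maximum number of attempts (default 30)
#   $3 = seconds to sleep between attempts (default 1)
# Returns 0 as soon as the URL answers, 1 if every attempt fails.
wait_for_http_up() {
  local url="$1" attempts="${2:-30}" interval="${3:-1}"
  local i=0 code
  while [ "$i" -lt "$attempts" ]; do
    # -w prints only the status code; connection failures yield "000"
    code="$(curl -s -o /dev/null -w '%{http_code}' --max-time 2 "$url" || true)"
    case "$code" in
      2??|3??) return 0 ;;
    esac
    i=$((i + 1))
    sleep "$interval"
  done
  return 1
}

# Example (assumed URL, run after stopping central-1):
#   docker compose -f docker-dev/docker-compose.yml stop central-1
#   wait_for_http_up http://localhost:9200/ 30 1 && echo "admin UI is back"
```

Counting attempts rather than measuring wall-clock time keeps the function portable across shells; with the defaults it gives the mesh roughly 30 s, comfortably above the ~15 s takeover window quoted above.
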