docker-dev
Mac-friendly OtOpcUa fleet for manual UI exercise and integration smoke tests. Spins up a single Akka mesh (hub-and-spoke topology), SQL Server, and Traefik on the same Compose network. All six host nodes share one OtOpcUa ConfigDb; logical separation between MAIN, SITE-A, and SITE-B is enforced by per-row ServerCluster.ClusterId scoping, not by mesh isolation.
Stack
Shared infrastructure
| Service | Role | Ports |
|---|---|---|
| `sql` | SQL Server 2022, single OtOpcUa ConfigDb shared by all nodes | host 14330 → container 1433 |
| `traefik` | Routes :80 by PathPrefix to central admin nodes | host 80, dashboard 8089 |
Authentication uses the shared GLAuth on the Linux Docker host at 10.100.0.35:3893 (baseDN dc=zb,dc=local). Only the central admin nodes authenticate users. Sign in as multi-role / password to get all three OtOpcUa roles (Administrator, Designer, Viewer), or use any other shared test user with password password. Group→role mappings are seeded by seed/seed-clusters.sql (OtOpcUa-Admins→Administrator, OtOpcUa-Designers→Designer, OtOpcUa-Viewers→Viewer). The shared GLAuth source of truth and deploy runbook live in scadaproj/infra/glauth/.
Central nodes — fused admin+driver (MAIN cluster, UI + deploy singleton)
| Service | Roles | Ports |
|---|---|---|
| `central-1` | `OTOPCUA_ROLES=admin,driver`, Akka mesh seed | host 4840 → container 4840; internal 9000 |
| `central-2` | `OTOPCUA_ROLES=admin,driver`, joins central-1 | host 4841 → container 4840; internal 9000 |
central-1 and central-2 are the only nodes that host the Admin UI and the deploy singleton. They are also the OPC UA publishers for the MAIN cluster. Traefik routes all PathPrefix(/) traffic to whichever central node has the leader role.
Site A nodes — driver-only (SITE-A cluster)
| Service | Roles | Ports |
|---|---|---|
| `site-a-1` | `OTOPCUA_ROLES=driver`, joins the single mesh | host 4842 → container 4840 |
| `site-a-2` | `OTOPCUA_ROLES=driver`, joins the single mesh | host 4843 → container 4840 |
Site B nodes — driver-only (SITE-B cluster)
| Service | Roles | Ports |
|---|---|---|
| `site-b-1` | `OTOPCUA_ROLES=driver`, joins the single mesh | host 4844 → container 4840 |
| `site-b-2` | `OTOPCUA_ROLES=driver`, joins the single mesh | host 4845 → container 4840 |
Site nodes serve no UI and authenticate no users. The central cluster manages and deploys to them over the shared Akka mesh. All six nodes bind Akka remoting to port 4053 inside their own network namespace; PublicHostname for each matches its Compose service name.
Multi-tenancy
All six host nodes write to the same OtOpcUa ConfigDb. The ServerCluster table differentiates the three logical clusters: each maps to one row, and each ClusterNode row's ClusterId ties the runtime node back to its owning cluster scope.
Two one-shot Compose services bootstrap the DB on bring-up. First, migrator applies the EF Core migrations, so a fresh SQL volume gets the schema with no operator step (the host nodes deliberately do not auto-migrate, since production owns schema changes). Then cluster-seed (image mcr.microsoft.com/mssql-tools) INSERTs the rows below. cluster-seed and every host node declare depends_on for migrator with condition service_completed_successfully, so the seed never races an in-progress migration. The seed is idempotent (IF NOT EXISTS guards every insert), so re-runs on docker compose up are no-ops:
| Logical cluster | ServerCluster.ClusterId | ClusterNode.NodeId rows |
|---|---|---|
| Main | MAIN | central-1, central-2 (OPC UA publishers + admin UI) |
| Site A | SITE-A | site-a-1, site-a-2 |
| Site B | SITE-B | site-b-1, site-b-2 |
Each ClusterNode.NodeId matches the node's Cluster__PublicHostname env value (Compose service name) — that's the lookup the runtime uses to resolve its own membership. ApplicationUri follows the urn:OtOpcUa:<NodeId> convention.
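The naming rules above can be sketched in a few lines of shell. This is an illustration only: the function names `application_uri` and `cluster_for_node` are hypothetical helpers, not part of the repo, and the node-to-cluster mapping is copied from the seed table rather than read from the ConfigDb.

```shell
# Sketch: derive a node's identifiers from its Compose service name.
# Function names are hypothetical; the mapping mirrors the seed table above.

# ApplicationUri follows the urn:OtOpcUa:<NodeId> convention.
application_uri() {
  printf 'urn:OtOpcUa:%s\n' "$1"
}

# Map a NodeId to the ServerCluster.ClusterId that owns it.
cluster_for_node() {
  case "$1" in
    central-1|central-2) echo "MAIN" ;;
    site-a-1|site-a-2)   echo "SITE-A" ;;
    site-b-1|site-b-2)   echo "SITE-B" ;;
    *) echo "unknown node: $1" >&2; return 1 ;;
  esac
}

# Print cluster scope and ApplicationUri for a few nodes.
for node in central-1 site-a-2 site-b-1; do
  printf '%s -> %s (%s)\n' "$node" "$(cluster_for_node "$node")" "$(application_uri "$node")"
done
```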
The SQL lives at seed/seed-clusters.sql; the wait-and-apply wrapper lives at seed/entrypoint.sh. To re-seed manually:
docker compose -f docker-dev/docker-compose.yml run --rm cluster-seed
Galaxy / MxAccess gateway
The seed also pre-creates a SystemPlatform Namespace + a GalaxyMxGateway DriverInstance in the MAIN cluster pointing at http://10.100.0.48:5120. The API key is resolved from the GALAXY_MXGW_API_KEY env var set on every driver-role container in compose; override via GALAXY_MXGW_API_KEY=... docker compose up -d to swap keys without editing the compose file.
The DriverHost actor doesn't spawn drivers from raw DriverInstance rows on its own — the v2 deploy lifecycle requires a sealed Deployment before drivers materialise. After first bring-up, sign in to the Admin UI and click Deploy current configuration on /deployments to compose the seeded rows into an artifact and dispatch it. The Galaxy driver instance will start its gRPC connection to the gateway on the next deploy ack.
Bring up
# from the repo root
docker compose -f docker-dev/docker-compose.yml up -d --build
# the one-shot migrator + cluster-seed bootstrap the DB; watch the seed finish:
docker compose -f docker-dev/docker-compose.yml logs -f cluster-seed # ^C once it prints "[cluster-seed] done."
open http://localhost:9200 # Admin UI (Traefik → central-1 or central-2)
open http://localhost:8089 # Traefik dashboard
The first build takes a few minutes (.NET SDK image + restore + publish). No manual schema step is needed — on a fresh SQL volume the one-shot migrator service applies the EF migrations (the host nodes deliberately don't auto-migrate, since production owns schema changes), then cluster-seed populates the cluster/namespace/driver rows. cluster-seed and the host nodes wait for the migrator via service_completed_successfully, so nothing races an in-progress migration. A plain docker compose ... up -d on an existing volume is a fast no-op for both — the named SQL volume keeps the schema + rows across restarts; only down -v wipes them, after which the next up re-migrates + re-seeds automatically.
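For scripted bring-up, tailing the seed log can be replaced by waiting on the one-shot containers and checking their exit codes. The sketch below assumes only the standard `docker compose ps -a -q` and `docker wait` commands; the helper names `wait_for_oneshot` and `bootstrap_ok` and the `COMPOSE_FILE_PATH` variable are hypothetical.

```shell
# Sketch: block until one-shot Compose services have exited, then check exit codes.
COMPOSE_FILE_PATH="docker-dev/docker-compose.yml"

# Prints the exit code of the container for the given one-shot service.
wait_for_oneshot() {
  local service="$1" cid
  # -a includes exited containers; -q prints only the container ID.
  cid="$(docker compose -f "$COMPOSE_FILE_PATH" ps -a -q "$service" | head -n 1)"
  if [ -z "$cid" ]; then
    echo "no container found for service: $service" >&2
    return 2
  fi
  # docker wait blocks until the container stops, then prints its exit code.
  docker wait "$cid"
}

# Returns 0 only if every listed one-shot service exited with code 0.
bootstrap_ok() {
  local service code
  for service in "$@"; do
    code="$(wait_for_oneshot "$service")" || return 1
    if [ "$code" != "0" ]; then
      echo "$service exited with code $code" >&2
      return 1
    fi
  done
}
```

After `up -d --build`, a call such as `bootstrap_ok migrator cluster-seed && echo "DB bootstrapped"` would gate the next step of a script on both services finishing cleanly.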
Auth (dev only)
Central nodes authenticate against the shared GLAuth at 10.100.0.35:3893 (baseDN dc=zb,dc=local). DevStubMode is not active. Sign in with any test user (password password); multi-role / password returns all three roles (Administrator, Designer, Viewer). Group→role mappings are seeded by seed/seed-clusters.sql. The GLAuth source of truth + deploy runbook is in scadaproj/infra/glauth/. Do not enable DevStubMode outside local debugging — production must always bind a real LDAP backend.
Headless deploy
POST http://localhost:9200/api/deployments
X-Api-Key: docker-dev-deploy-key
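The request above could be issued with curl roughly as follows. This is a minimal sketch: the `headless_deploy` function and the `ADMIN_BASE_URL`, `DEPLOY_API_KEY`, and `DRY_RUN` variables are hypothetical conveniences, and it assumes the endpoint needs no request body, since none is shown here.

```shell
# Sketch: send the headless deploy request with curl.
# URL and key default to the values shown above; override via the environment.
ADMIN_BASE_URL="${ADMIN_BASE_URL:-http://localhost:9200}"
DEPLOY_API_KEY="${DEPLOY_API_KEY:-docker-dev-deploy-key}"

headless_deploy() {
  local url="${ADMIN_BASE_URL}/api/deployments"
  local header="X-Api-Key: ${DEPLOY_API_KEY}"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    # Print the command instead of running it.
    printf 'curl -X POST -H "%s" %s\n' "$header" "$url"
  else
    # --fail turns HTTP 4xx/5xx responses into a non-zero exit status.
    curl --fail --silent --show-error -X POST -H "$header" "$url"
  fi
}
```

Setting `DRY_RUN=1` before calling `headless_deploy` prints the command without contacting the stack, which is handy for checking the key and URL first.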
Tear down
docker compose -f docker-dev/docker-compose.yml down -v
The -v flag drops the SQL volume; omit it to keep ConfigDb state across restarts. There is no local LDAP volume because LDAP is the shared external GLAuth on 10.100.0.35:3893.
Failover smoke
- Watch the Traefik dashboard at http://localhost:8089. Both `central-1` and `central-2` should be listed as healthy in the `otopcua-admin` service.
- `docker compose -f docker-dev/docker-compose.yml stop central-1`: `central-2` should pick up the admin role-leader within ~15 s (Akka split-brain stable-after). Traefik will route traffic to `central-2` once its `/health/active` returns 200.
- `docker compose -f docker-dev/docker-compose.yml start central-1`: `central-1` rejoins as a follower; `central-2` keeps the leader role until something disturbs it.
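The failover check above can be automated with a small polling loop instead of watching the dashboard. The helpers `wait_until` and `http_is_200` are hypothetical, and whether `/health/active` is reachable through Traefik on the host (rather than only inside the Compose network) is an assumption to verify against the compose file.

```shell
# Sketch: poll a command until it succeeds or a timeout (in seconds) elapses.
wait_until() {
  local timeout="$1" interval="$2" waited=0
  shift 2
  while ! "$@"; do
    waited=$((waited + interval))
    if [ "$waited" -gt "$timeout" ]; then
      return 1
    fi
    sleep "$interval"
  done
}

# Succeeds when the URL answers with HTTP 200.
http_is_200() {
  [ "$(curl --silent --output /dev/null --write-out '%{http_code}' "$1")" = "200" ]
}
```

After stopping `central-1`, something like `wait_until 30 1 http_is_200 http://localhost:9200/health/active` would return 0 once a central node reports active, with the 30-second budget leaving margin over the ~15 s handover.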
Resource limits & dev logging
The full single-mesh stack (central-1/central-2 + the four site nodes) can OOM-kill central-1 on a loaded host. Two settings in the compose file guard against that:
- EF Core + ASP.NET Core logs are pinned to `Warning` on every host node (`Serilog__MinimumLevel__Override__Microsoft.EntityFrameworkCore` / `…Microsoft.AspNetCore` = `Warning`). The host logs via Serilog (`AddZbSerilog` → `ReadFrom.Configuration`), and in `Development` the default level is `Debug`; without these overrides every Deployment-poll emits an `Executed DbCommand` / `SELECT … FROM [Deployment]` line, flooding the Serilog pipeline and starving the Akka cluster heartbeat thread. Application + Akka log levels are left untouched, so this only silences the per-poll SQL chatter. To temporarily restore the SQL log flood for debugging, drop those two env vars (or set them back to `Information`) on the node you're inspecting.
- Each host node has `mem_limit: 1g` (`mem_reservation: 512m`). A quiet solo `central-1` measures ~357 MiB; the limit leaves headroom for the deploy/UI load and per-cluster driver subscriptions that push a fully-loaded node higher. The limit/reservation live on the `&otopcua-host` anchor, so all six host services inherit them; `sql`, `traefik`, and the one-shot `migrator`/`cluster-seed` are left unbounded.
The full six-node host stack therefore needs roughly 6 GiB of Docker Desktop VM memory just for the host nodes (plus SQL Server's own footprint on top). On a constrained host, either raise the Docker Desktop VM memory or run fewer host services (e.g. just central-1 + central-2, or a single central node) rather than the full mesh.
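The memory estimate is simple arithmetic over the per-node limit, sketched here so the reduced configurations can be compared. The function name `host_memory_budget_mib` is hypothetical; the 1024 MiB default corresponds to `mem_limit: 1g`.

```shell
# Sketch: upper bound on memory the host nodes can claim, in MiB.
# Usage: host_memory_budget_mib <node-count> [per-node-limit-MiB]
host_memory_budget_mib() {
  local nodes="$1" limit_mib="${2:-1024}"
  echo $((nodes * limit_mib))
}

echo "full mesh (6 nodes): $(host_memory_budget_mib 6) MiB"
echo "central pair only:   $(host_memory_budget_mib 2) MiB"
echo "single central node: $(host_memory_budget_mib 1) MiB"
```

To run a reduced set, naming services on the command line (for example `docker compose -f docker-dev/docker-compose.yml up -d central-1 central-2`) should start only those services plus whatever they declare in depends_on.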
Notes
- This compose is for the local Mac/Linux developer rig. The team's CI + soak runs go to the remote docker host at `10.100.0.35` (see `docs/v2/dev-environment.md`); the file there mirrors this one with adjusted port bindings.
- The OPC UA endpoints are reachable directly from the host (Traefik is only in front of the admin HTTP surface):
  - Main: `opc.tcp://localhost:4840` (central-1), `opc.tcp://localhost:4841` (central-2)
  - Site A: `opc.tcp://localhost:4842` (site-a-1), `opc.tcp://localhost:4843` (site-a-2)
  - Site B: `opc.tcp://localhost:4844` (site-b-1), `opc.tcp://localhost:4845` (site-b-2)
- Galaxy + Wonderware drivers can't run in Linux containers (they need the Windows-only mxaccessgw + Historian SDK). On non-Windows, `DriverInstanceActor.ShouldStub(driverType, roles)` returns `true` for those types and the actor goes straight to a `Stubbed` state that returns deterministic success.
- SQL persistence: ConfigDb state survives container restarts (named Docker volume). Drop the volume with `down -v` for a clean slate.
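The endpoint list in the notes above can be turned into a quick lookup and reachability check. This is a sketch under assumptions: the helper names are hypothetical, the port map is copied from this README rather than read from the compose file, and `probe_endpoint` assumes a netcat that supports `-z` and `-w`.

```shell
# Sketch: resolve each node's host-side OPC UA port and endpoint URL.
opcua_host_port() {
  case "$1" in
    central-1) echo 4840 ;;
    central-2) echo 4841 ;;
    site-a-1)  echo 4842 ;;
    site-a-2)  echo 4843 ;;
    site-b-1)  echo 4844 ;;
    site-b-2)  echo 4845 ;;
    *) echo "unknown node: $1" >&2; return 1 ;;
  esac
}

opcua_endpoint() {
  local port
  port="$(opcua_host_port "$1")" || return 1
  printf 'opc.tcp://localhost:%s\n' "$port"
}

# TCP-level check only: confirms the port accepts connections,
# not that an OPC UA session can be established.
probe_endpoint() {
  local port
  port="$(opcua_host_port "$1")" || return 1
  nc -z -w 2 localhost "$port"
}

for node in central-1 central-2 site-a-1 site-a-2 site-b-1 site-b-2; do
  printf '%s: %s\n' "$node" "$(opcua_endpoint "$node")"
done
```

With the stack running, `probe_endpoint site-a-1 && echo reachable` gives a first signal before pointing an OPC UA client at the endpoint.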