docker-dev

Mac-friendly OtOpcUa fleet for manual UI exercise and integration smoke tests. Spins up a single Akka mesh (hub-and-spoke topology), SQL Server, and Traefik on the same Compose network. All six host nodes share one OtOpcUa ConfigDb; logical separation between MAIN, SITE-A, and SITE-B is enforced by per-row ServerCluster.ClusterId scoping, not by mesh isolation.

Stack

Shared infrastructure

| Service | Role | Ports |
|---|---|---|
| sql | SQL Server 2022, single OtOpcUa ConfigDb shared by all nodes | host 14330 → container 1433 |
| traefik | Routes :80 by PathPrefix to central admin nodes | host 80, dashboard 8089 |

Authentication uses the shared GLAuth on the Linux Docker host at 10.100.0.35:3893 (baseDN dc=zb,dc=local). Only the central admin nodes authenticate users. Sign in as multi-role / password to get all three OtOpcUa roles (Administrator, Designer, Viewer), or use any other shared test user with password password. Group→role mappings are seeded by seed/seed-clusters.sql (OtOpcUa-Admins→Administrator, OtOpcUa-Designers→Designer, OtOpcUa-Viewers→Viewer). The shared GLAuth source of truth and deploy runbook live in scadaproj/infra/glauth/.

Central nodes — fused admin+driver (MAIN cluster, UI + deploy singleton)

| Service | Roles | Ports |
|---|---|---|
| central-1 | OTOPCUA_ROLES=admin,driver, Akka mesh seed | host 4840 → container 4840; internal 9000 |
| central-2 | OTOPCUA_ROLES=admin,driver, joins central-1 | host 4841 → container 4840; internal 9000 |

central-1 and central-2 are the only nodes that host the Admin UI and the deploy singleton. They are also the OPC UA publishers for the MAIN cluster. Traefik routes all PathPrefix(/) traffic to whichever central node has the leader role.

Site A nodes — driver-only (SITE-A cluster)

| Service | Roles | Ports |
|---|---|---|
| site-a-1 | OTOPCUA_ROLES=driver, joins the single mesh | host 4842 → container 4840 |
| site-a-2 | OTOPCUA_ROLES=driver, joins the single mesh | host 4843 → container 4840 |

Site B nodes — driver-only (SITE-B cluster)

| Service | Roles | Ports |
|---|---|---|
| site-b-1 | OTOPCUA_ROLES=driver, joins the single mesh | host 4844 → container 4840 |
| site-b-2 | OTOPCUA_ROLES=driver, joins the single mesh | host 4845 → container 4840 |

Site nodes serve no UI and authenticate no users. The central cluster manages and deploys to them over the shared Akka mesh. All six nodes bind Akka remoting to port 4053 inside their own network namespace; PublicHostname for each matches its Compose service name.

Multi-tenancy

All six host nodes write to the same OtOpcUa ConfigDb. The ServerCluster table differentiates the three logical clusters: each maps to one row, and each ClusterNode row's ClusterId ties the runtime node back to its owning cluster scope.

Two one-shot Compose services bootstrap the DB on bring-up. First, migrator applies the EF Core migrations, so a fresh SQL volume gets the schema with no operator step (the host nodes deliberately do not auto-migrate, since production owns schema changes). Then cluster-seed (image mcr.microsoft.com/mssql-tools) INSERTs the rows below. cluster-seed and every host node depend on the migrator completing (depends_on with service_completed_successfully), so the seed never races an in-progress migration. The seed is idempotent: IF NOT EXISTS guards every insert, so re-runs on docker compose up are no-ops:

| Logical cluster | ServerCluster.ClusterId | ClusterNode.NodeId rows |
|---|---|---|
| Main | MAIN | central-1, central-2 (OPC UA publishers + admin UI) |
| Site A | SITE-A | site-a-1, site-a-2 |
| Site B | SITE-B | site-b-1, site-b-2 |

Each ClusterNode.NodeId matches the node's Cluster__PublicHostname env value (Compose service name) — that's the lookup the runtime uses to resolve its own membership. ApplicationUri follows the urn:OtOpcUa:<NodeId> convention.
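The naming convention above can be sketched in a few lines of Python. This is only an illustration of the lookup, not the runtime's actual code: the CLUSTER_NODES dictionary and both helper functions are hypothetical names, and the membership data is copied from the seed table.

```python
# Hypothetical mirror of the seeded ServerCluster / ClusterNode rows.
CLUSTER_NODES = {
    "MAIN": ["central-1", "central-2"],
    "SITE-A": ["site-a-1", "site-a-2"],
    "SITE-B": ["site-b-1", "site-b-2"],
}


def application_uri(node_id: str) -> str:
    """Build the ApplicationUri using the urn:OtOpcUa:<NodeId> convention."""
    return f"urn:OtOpcUa:{node_id}"


def cluster_for(public_hostname: str) -> str:
    """Resolve the owning ClusterId from a node's Cluster__PublicHostname value.

    The NodeId is assumed to equal the Compose service name, as described above.
    """
    for cluster_id, node_ids in CLUSTER_NODES.items():
        if public_hostname in node_ids:
            return cluster_id
    raise KeyError(f"no ClusterNode row for {public_hostname!r}")
```

For example, cluster_for("site-a-2") yields "SITE-A" and application_uri("site-a-2") yields "urn:OtOpcUa:site-a-2". A hostname that does not match any seeded NodeId raises KeyError, which mirrors the point that a node cannot resolve its membership unless the two names agree.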

The SQL lives at seed/seed-clusters.sql; the wait-and-apply wrapper lives at seed/entrypoint.sh. To re-seed manually:

docker compose -f docker-dev/docker-compose.yml run --rm cluster-seed
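Re-running the seed is safe because every insert is guarded. The sketch below illustrates that idempotent pattern with Python's built-in sqlite3 module standing in for SQL Server, so it uses INSERT ... SELECT ... WHERE NOT EXISTS rather than the T-SQL IF NOT EXISTS block the real script uses. The two-column ClusterNode table is a simplified assumption, not the actual schema.

```python
import sqlite3

# (ClusterId, NodeId) pairs taken from the seed table above.
SEED_ROWS = [
    ("MAIN", "central-1"), ("MAIN", "central-2"),
    ("SITE-A", "site-a-1"), ("SITE-A", "site-a-2"),
    ("SITE-B", "site-b-1"), ("SITE-B", "site-b-2"),
]


def seed(conn: sqlite3.Connection) -> int:
    """Insert missing ClusterNode rows; return how many rows were added."""
    # Simplified, hypothetical schema: the real table has more columns.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS ClusterNode ("
        "NodeId TEXT PRIMARY KEY, ClusterId TEXT NOT NULL)"
    )
    inserted = 0
    for cluster_id, node_id in SEED_ROWS:
        cur = conn.execute(
            "INSERT INTO ClusterNode (NodeId, ClusterId) "
            "SELECT ?, ? WHERE NOT EXISTS "
            "(SELECT 1 FROM ClusterNode WHERE NodeId = ?)",
            (node_id, cluster_id, node_id),
        )
        inserted += cur.rowcount  # 1 if the row was added, 0 if it already existed
    conn.commit()
    return inserted
```

Calling seed twice on the same connection adds six rows the first time and none the second, which is the behaviour the cluster-seed service relies on when docker compose up runs against an existing volume.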

Galaxy / MxAccess gateway

The seed also pre-creates a SystemPlatform Namespace + a GalaxyMxGateway DriverInstance in the MAIN cluster pointing at http://10.100.0.48:5120. The API key is resolved from the GALAXY_MXGW_API_KEY env var set on every driver-role container in compose; override via GALAXY_MXGW_API_KEY=... docker compose up -d to swap keys without editing the compose file.

The DriverHost actor doesn't spawn drivers from raw DriverInstance rows on its own — the v2 deploy lifecycle requires a sealed Deployment before drivers materialise. After first bring-up, sign in to the Admin UI and click Deploy current configuration on /deployments to compose the seeded rows into an artifact and dispatch it. The Galaxy driver instance will start its gRPC connection to the gateway on the next deploy ack.

Bring up

# from the repo root
docker compose -f docker-dev/docker-compose.yml up -d --build

# the one-shot migrator + cluster-seed bootstrap the DB; watch the seed finish:
docker compose -f docker-dev/docker-compose.yml logs -f cluster-seed   # ^C once it prints "[cluster-seed] done."

open http://localhost:9200                 # Admin UI (Traefik → central-1 or central-2)
open http://localhost:8089                 # Traefik dashboard

The first build takes a few minutes (.NET SDK image + restore + publish). No manual schema step is needed — on a fresh SQL volume the one-shot migrator service applies the EF migrations (the host nodes deliberately don't auto-migrate, since production owns schema changes), then cluster-seed populates the cluster/namespace/driver rows. cluster-seed and the host nodes wait for the migrator via service_completed_successfully, so nothing races an in-progress migration. A plain docker compose ... up -d on an existing volume is a fast no-op for both — the named SQL volume keeps the schema + rows across restarts; only down -v wipes them, after which the next up re-migrates + re-seeds automatically.

Auth (dev only)

Central nodes authenticate against the shared GLAuth at 10.100.0.35:3893 (baseDN dc=zb,dc=local). DevStubMode is not active. Sign in with any test user (password password); multi-role / password returns all three roles (Administrator, Designer, Viewer). Group→role mappings are seeded by seed/seed-clusters.sql. The GLAuth source of truth + deploy runbook is in scadaproj/infra/glauth/. Do not enable DevStubMode outside local debugging — production must always bind a real LDAP backend.

Headless deploy

POST http://localhost:9200/api/deployments
X-Api-Key: docker-dev-deploy-key
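The request above can be issued from a script with Python's standard library. This is a minimal sketch: the URL and key are the dev values shown above, while the empty request body and the function names are assumptions, since this README does not document a payload or a response shape.

```python
import urllib.request

DEPLOY_URL = "http://localhost:9200/api/deployments"
DEV_API_KEY = "docker-dev-deploy-key"  # dev-only key from this README


def build_deploy_request(url: str = DEPLOY_URL,
                         api_key: str = DEV_API_KEY) -> urllib.request.Request:
    """Build the POST request without sending it."""
    return urllib.request.Request(
        url,
        data=b"",  # assumption: the endpoint accepts an empty body
        method="POST",
        headers={"X-Api-Key": api_key},
    )


def trigger_deploy(timeout: float = 10.0) -> int:
    """Send the request and return the HTTP status code.

    urllib raises urllib.error.HTTPError for 4xx/5xx responses.
    """
    with urllib.request.urlopen(build_deploy_request(), timeout=timeout) as resp:
        return resp.status
```

Calling trigger_deploy() once the stack is up dispatches a deployment without opening the Admin UI. Keeping the request builder separate from the send makes it possible to check the method and headers without a running stack.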

Tear down

docker compose -f docker-dev/docker-compose.yml down -v

The -v flag drops the SQL volume; omit it to keep ConfigDb state across restarts. There is no local LDAP volume, because LDAP is the shared external GLAuth on 10.100.0.35:3893.

Failover smoke

  1. Watch the Traefik dashboard at http://localhost:8089. Both central-1 and central-2 should be listed as healthy in the otopcua-admin service.
  2. Run docker compose -f docker-dev/docker-compose.yml stop central-1. central-2 should pick up the admin role-leader within ~15 s (Akka split-brain stable-after). Traefik will route traffic to central-2 once its /health/active returns 200.
  3. Run docker compose -f docker-dev/docker-compose.yml start central-1. central-1 rejoins as a follower; central-2 keeps the leader role until something disturbs it.
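To time the failover in step 2 rather than watching the dashboard, a small polling loop can help. The sketch below is an assumption-laden helper, not part of the repo: wait_until and http_ok are hypothetical names, and it assumes /health/active is reachable from the host through Traefik at http://localhost:9200, which this README does not confirm.

```python
import http.client
import time
import urllib.error
import urllib.request


def http_ok(url: str, timeout: float = 2.0) -> bool:
    """Return True only if a GET on url answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        return False  # refused, timed out, or non-2xx response


def wait_until(probe, timeout_s: float = 60.0, interval_s: float = 1.0,
               clock=time.monotonic, sleep=time.sleep):
    """Poll probe() until it returns True.

    Returns the elapsed seconds on success, or None if timeout_s passes first.
    """
    start = clock()
    while True:
        if probe():
            return clock() - start
        if clock() - start >= timeout_s:
            return None
        sleep(interval_s)
```

A possible use, right after stopping central-1, is wait_until(lambda: http_ok("http://localhost:9200/health/active")). A result well above the ~15 s stable-after window, or None, suggests the leader hand-off did not complete as expected.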

Resource limits & dev logging

The full single-mesh stack (central-1/central-2 + the four site nodes) can OOM-kill central-1 on a loaded host. Two settings in the compose file guard against that:

  • EF Core + ASP.NET Core logs are pinned to Warning on every host node (Serilog__MinimumLevel__Override__Microsoft.EntityFrameworkCore / …Microsoft.AspNetCore = Warning). The host logs via Serilog (AddZbSerilog → ReadFrom.Configuration), and in Development the default level is Debug. Without these overrides, every Deployment poll emits an Executed DbCommand / SELECT … FROM [Deployment] line, flooding the Serilog pipeline and starving the Akka cluster heartbeat thread. Application + Akka log levels are left untouched, so this only silences the per-poll SQL chatter. To temporarily restore the SQL log flood for debugging, drop those two env vars (or set them back to Information) on the node you're inspecting.
  • Each host node has mem_limit: 1g (mem_reservation: 512m). A quiet solo central-1 measures ~357 MiB; the limit leaves headroom for the deploy/UI load and per-cluster driver subscriptions that push a fully-loaded node higher. The limit/reservation live on the &otopcua-host anchor, so all six host services inherit them; sql, traefik, and the one-shot migrator/cluster-seed are left unbounded.

The full six-node host stack therefore needs roughly 6 GiB of Docker Desktop VM memory just for the host nodes (plus SQL Server's own footprint on top). On a constrained host, either raise the Docker Desktop VM memory or run fewer host services (e.g. just central-1 + central-2, or a single central node) rather than the full mesh.
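The 6 GiB figure is simply the per-node limit multiplied by the node count. The short sketch below reproduces that arithmetic for the full mesh and for a reduced stack; the helper name is hypothetical, and the figures cover host-node limits only, not SQL Server or Traefik.

```python
HOST_NODES = ["central-1", "central-2",
              "site-a-1", "site-a-2",
              "site-b-1", "site-b-2"]

MEM_LIMIT_MIB = 1024        # mem_limit: 1g
MEM_RESERVATION_MIB = 512   # mem_reservation: 512m


def budget_gib(nodes, per_node_mib: int) -> float:
    """Total memory in GiB if every listed node uses per_node_mib."""
    return len(nodes) * per_node_mib / 1024


full_limit = budget_gib(HOST_NODES, MEM_LIMIT_MIB)                # 6.0 GiB worst case
full_reservation = budget_gib(HOST_NODES, MEM_RESERVATION_MIB)    # 3.0 GiB reserved
central_only = budget_gib(HOST_NODES[:2], MEM_LIMIT_MIB)          # 2.0 GiB for central-1 + central-2
```

Running only the two central nodes cuts the worst-case host-node budget from 6 GiB to 2 GiB, which is why trimming services is a practical alternative to raising the Docker Desktop VM memory.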

Notes

  • This compose is for the local Mac/Linux developer rig. The team's CI + soak runs go to the remote docker host at 10.100.0.35 (see docs/v2/dev-environment.md); the file there mirrors this one with adjusted port bindings.
  • The OPC UA endpoints are reachable directly from the host (Traefik is only in front of the admin HTTP surface):
    • Main: opc.tcp://localhost:4840 (central-1), opc.tcp://localhost:4841 (central-2)
    • Site A: opc.tcp://localhost:4842 (site-a-1), opc.tcp://localhost:4843 (site-a-2)
    • Site B: opc.tcp://localhost:4844 (site-b-1), opc.tcp://localhost:4845 (site-b-2)
  • Galaxy + Wonderware drivers can't run in Linux containers (they need the Windows-only mxaccessgw + Historian SDK). On non-Windows, DriverInstanceActor.ShouldStub(driverType, roles) returns true for those types and the actor goes straight to a Stubbed state that returns deterministic success.
  • SQL persistence: ConfigDb state survives container restarts (named Docker volume). Drop the volume with down -v for a clean slate.
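As a quick sanity check after bring-up, the host-side endpoints listed above can be probed with a plain TCP connect. This sketch only mirrors the published port mappings; the helper names are hypothetical, and an open port shows that the container is listening, not that an OPC UA session can actually be established.

```python
import socket

# Host ports published by the Compose file, per the endpoint list above.
ENDPOINT_PORTS = {
    "central-1": 4840, "central-2": 4841,
    "site-a-1": 4842, "site-a-2": 4843,
    "site-b-1": 4844, "site-b-2": 4845,
}


def endpoint_url(node_id: str, host: str = "localhost") -> str:
    """Return the opc.tcp:// URL for a node as seen from the host."""
    return f"opc.tcp://{host}:{ENDPOINT_PORTS[node_id]}"


def is_listening(node_id: str, host: str = "localhost",
                 timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to the node's host port succeeds."""
    try:
        with socket.create_connection((host, ENDPOINT_PORTS[node_id]),
                                      timeout=timeout):
            return True
    except OSError:
        return False
```

Looping over ENDPOINT_PORTS and printing endpoint_url(node) alongside is_listening(node) gives a one-screen view of which of the six publishers are up.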