
# docker-dev

Mac-friendly OtOpcUa fleet for manual UI exercise + integration smoke tests. Spins up a **single Akka mesh** (hub-and-spoke topology) + SQL Server + Traefik on the same Compose network. All six host nodes share the single `OtOpcUa` ConfigDb; logical separation between MAIN, SITE-A, and SITE-B is enforced by per-row `ServerCluster.ClusterId` scoping, not by mesh isolation.

## Stack

### Shared infrastructure

| Service | Role | Ports |
|---|---|---|
| `sql` | SQL Server 2022 — single `OtOpcUa` ConfigDb shared by all nodes | host `14330` → container `1433` |
| `traefik` | Routes `:80` by PathPrefix to central admin nodes | host `80`, dashboard `8089` |

Authentication uses the **shared GLAuth** on the Linux Docker host at `10.100.0.35:3893` (baseDN `dc=zb,dc=local`). Only the central admin nodes authenticate users. Sign in as `multi-role` / `password` to get all three OtOpcUa roles (Administrator, Designer, Viewer), or use any other shared test user with password `password`. Group→role mappings are seeded by `seed/seed-clusters.sql` (`OtOpcUa-Admins`→Administrator, `OtOpcUa-Designers`→Designer, `OtOpcUa-Viewers`→Viewer). The shared GLAuth source of truth and deploy runbook live in `scadaproj/infra/glauth/`.

### Central nodes — fused admin+driver (MAIN cluster, UI + deploy singleton)

| Service | Roles | Ports |
|---|---|---|
| `central-1` | `OTOPCUA_ROLES=admin,driver`, Akka mesh seed | host `4840` → container `4840`; internal `9000` |
| `central-2` | `OTOPCUA_ROLES=admin,driver`, joins central-1 | host `4841` → container `4840`; internal `9000` |

`central-1` and `central-2` are the **only** nodes that host the Admin UI and the deploy singleton. They are also the OPC UA publishers for the MAIN cluster. Traefik routes all `PathPrefix(/)` traffic to whichever central node has the leader role.

### Site A nodes — driver-only (SITE-A cluster)

| Service | Roles | Ports |
|---|---|---|
| `site-a-1` | `OTOPCUA_ROLES=driver`, joins the single mesh | host `4842` → container `4840` |
| `site-a-2` | `OTOPCUA_ROLES=driver`, joins the single mesh | host `4843` → container `4840` |

### Site B nodes — driver-only (SITE-B cluster)

| Service | Roles | Ports |
|---|---|---|
| `site-b-1` | `OTOPCUA_ROLES=driver`, joins the single mesh | host `4844` → container `4840` |
| `site-b-2` | `OTOPCUA_ROLES=driver`, joins the single mesh | host `4845` → container `4840` |

Site nodes serve no UI and authenticate no users. The central cluster manages and deploys to them over the shared Akka mesh. All six nodes bind Akka remoting to port `4053` inside their own network namespace; `PublicHostname` for each matches its Compose service name.
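
For scripting against the fleet, the host-port mappings in the tables above can be captured in a small lookup. This is a minimal sketch: the function name `opcua_endpoint` is hypothetical and not part of the repo, and the port numbers are copied from the tables in this README.

```bash
#!/usr/bin/env bash
# Map a Compose service name to its host-side OPC UA endpoint.
# Port numbers mirror the "Ports" columns above; the function name is illustrative only.
opcua_endpoint() {
  local port
  case "$1" in
    central-1) port=4840 ;;
    central-2) port=4841 ;;
    site-a-1)  port=4842 ;;
    site-a-2)  port=4843 ;;
    site-b-1)  port=4844 ;;
    site-b-2)  port=4845 ;;
    *) echo "unknown node: $1" >&2; return 1 ;;
  esac
  echo "opc.tcp://localhost:${port}"
}
```

For example, `opcua_endpoint central-2` prints `opc.tcp://localhost:4841`, and an unknown name returns a non-zero status.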

## Multi-tenancy

All six host nodes write to the same `OtOpcUa` ConfigDb. The `ServerCluster` table differentiates the three logical clusters: each maps to one row, and each `ClusterNode` row's `ClusterId` ties the runtime node back to its owning cluster scope.

A one-shot `cluster-seed` Compose service (image `mcr.microsoft.com/mssql-tools`) waits for the `OtOpcUa` ConfigDb schema to exist (the host nodes do **not** auto-migrate — you apply EF migrations once; see [First-time setup](#first-time-setup-or-after-down--v)) and then INSERTs the rows below. The seed is **idempotent** — `IF NOT EXISTS` guards every insert — so re-runs on `docker compose up` are no-ops:

| Logical cluster | `ServerCluster.ClusterId` | `ClusterNode.NodeId` rows |
|---|---|---|
| Main | `MAIN` | `central-1`, `central-2` (OPC UA publishers + admin UI) |
| Site A | `SITE-A` | `site-a-1`, `site-a-2` |
| Site B | `SITE-B` | `site-b-1`, `site-b-2` |

Each `ClusterNode.NodeId` matches the node's `Cluster__PublicHostname` env value (Compose service name) — that's the lookup the runtime uses to resolve its own membership. `ApplicationUri` follows the `urn:OtOpcUa:<NodeId>` convention.
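
The node-to-cluster mapping and the `ApplicationUri` convention can be expressed as two small helpers, which may be handy when checking seeded rows by hand. This is a sketch only: `cluster_for_node` and `application_uri` are hypothetical names, and the mapping is transcribed from the table above rather than read from the database.

```bash
#!/usr/bin/env bash
# Resolve the owning ClusterId for a NodeId (mapping copied from the seed table above).
cluster_for_node() {
  case "$1" in
    central-1|central-2) echo "MAIN" ;;
    site-a-1|site-a-2)   echo "SITE-A" ;;
    site-b-1|site-b-2)   echo "SITE-B" ;;
    *) echo "unknown node: $1" >&2; return 1 ;;
  esac
}

# Build the ApplicationUri following the urn:OtOpcUa:<NodeId> convention.
application_uri() {
  echo "urn:OtOpcUa:$1"
}
```

For example, `cluster_for_node site-b-1` prints `SITE-B` and `application_uri site-b-1` prints `urn:OtOpcUa:site-b-1`.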

The SQL lives at `seed/seed-clusters.sql`; the wait-and-apply wrapper lives at `seed/entrypoint.sh`. To re-seed manually:

```bash
docker compose -f docker-dev/docker-compose.yml run --rm cluster-seed
```
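
The wait-and-apply behaviour described above can be sketched roughly as follows. This is not the contents of `seed/entrypoint.sh`. It is an illustration under assumptions: the function names and the `SQL_HOST` / `SQL_PASSWORD` variables are hypothetical, and it assumes `sqlcmd` is on `PATH` (in the `mssql-tools` image it may need its full path).

```bash
#!/usr/bin/env bash
# retry <max_attempts> <delay_seconds> <command> [args...]
# Runs the command until it succeeds or the attempts are exhausted.
retry() {
  local max="$1" delay="$2" attempt=1
  shift 2
  until "$@"; do
    if [ "$attempt" -ge "$max" ]; then
      return 1
    fi
    attempt=$((attempt + 1))
    sleep "$delay"
  done
}

# Succeeds once a table named ServerCluster exists in the OtOpcUa database.
# SQL_HOST and SQL_PASSWORD are assumed variable names, not taken from the repo.
schema_ready() {
  local out
  out="$(sqlcmd -S "${SQL_HOST:-sql}" -U sa -P "${SQL_PASSWORD}" -d OtOpcUa \
    -h -1 -W -b \
    -Q "SET NOCOUNT ON; SELECT COUNT(*) FROM sys.tables WHERE name = 'ServerCluster'")" || return 1
  [ "$(echo "$out" | tr -d '[:space:]')" = "1" ]
}

# Wait for the schema, then apply the seed file (path is an assumption).
wait_and_seed() {
  retry 60 2 schema_ready || { echo "schema never appeared" >&2; return 1; }
  sqlcmd -S "${SQL_HOST:-sql}" -U sa -P "${SQL_PASSWORD}" -d OtOpcUa -b \
    -i "${1:-seed-clusters.sql}"
}
```

Because this check only looks for one table, it shares the limitation noted under first-time setup: it can pass while later migrations are still being applied.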

### Galaxy / MxAccess gateway

The seed also pre-creates a `SystemPlatform` Namespace + a `GalaxyMxGateway` DriverInstance in the MAIN cluster pointing at `http://10.100.0.48:5120`. The API key is resolved from the `GALAXY_MXGW_API_KEY` env var set on every driver-role container in compose; override via `GALAXY_MXGW_API_KEY=... docker compose up -d` to swap keys without editing the compose file.

The DriverHost actor doesn't spawn drivers from raw DriverInstance rows on its own — the v2 deploy lifecycle requires a *sealed Deployment* before drivers materialise. After first bring-up, sign in to the Admin UI and click **Deploy current configuration** on `/deployments` to compose the seeded rows into an artifact and dispatch it. The Galaxy driver instance will start its gRPC connection to the gateway on the next deploy ack.

## Bring up

```bash
# from the repo root
docker compose -f docker-dev/docker-compose.yml up -d --build

# wait ~20 seconds for SQL to come up + the mesh to form

open http://localhost:9200                 # Admin UI (Traefik → central-1 or central-2)
open http://localhost:8089                 # Traefik dashboard
```

The first build takes a few minutes (.NET SDK image + restore + publish). Subsequent rebuilds are faster with Docker's layer cache.

### First-time setup (or after `down -v`)

The host nodes do **not** auto-create the ConfigDb schema. On a brand-new SQL volume you must apply the EF migrations once, then (re)run the seed. The auto-started `cluster-seed` only polls for `dbo.ServerCluster`, which the *first* migration creates, so it can fire while later migrations are still being applied and fail against an intermediate schema. If that happens, re-run it after the migrations finish.

```bash
# 1. bring the stack up (SQL + nodes; nodes retry the DB until the schema exists)
docker compose -f docker-dev/docker-compose.yml up -d --build

# 2. create + migrate the OtOpcUa ConfigDb (one time; the design-time factory reads OTOPCUA_CONFIG_CONNECTION)
OTOPCUA_CONFIG_CONNECTION="Server=localhost,14330;Database=OtOpcUa;User Id=sa;Password=OtOpcUa!Dev123;TrustServerCertificate=True;" \
  dotnet ef database update \
    --project src/Core/ZB.MOM.WW.OtOpcUa.Configuration \
    --startup-project src/Core/ZB.MOM.WW.OtOpcUa.Configuration

# 3. apply the cluster/namespace/driver seed against the now-complete schema (idempotent)
docker compose -f docker-dev/docker-compose.yml run --rm cluster-seed
```

After the schema + seed exist, a plain `docker compose ... up -d` is enough — the named SQL volume keeps both across restarts (only `down -v` wipes them, which is when you repeat the steps above).

## Auth (dev only)

Central nodes authenticate against the shared GLAuth at `10.100.0.35:3893` (baseDN `dc=zb,dc=local`). `DevStubMode` is **not** active. Sign in with any test user (password `password`); `multi-role` / `password` returns all three roles (Administrator, Designer, Viewer). Group→role mappings are seeded by `seed/seed-clusters.sql`. The GLAuth source of truth + deploy runbook is in `scadaproj/infra/glauth/`. **Do not** enable `DevStubMode` outside local debugging — production must always bind a real LDAP backend.

## Headless deploy

```bash
# trigger a deploy without going through the Admin UI
curl -X POST http://localhost:9200/api/deployments \
  -H "X-Api-Key: docker-dev-deploy-key"
```

## Tear down

```bash
docker compose -f docker-dev/docker-compose.yml down -v
```

The `-v` flag drops the SQL volume; omit the flag to keep ConfigDb state across restarts. There is no local LDAP volume because LDAP is the shared external GLAuth at `10.100.0.35:3893`.

## Failover smoke

1. Watch the Traefik dashboard at `http://localhost:8089`. Both `central-1` and `central-2` should be listed as healthy in the `otopcua-admin` service.
2. `docker compose -f docker-dev/docker-compose.yml stop central-1` — `central-2` should pick up the admin role-leader within ~15 s (Akka split-brain stable-after). Traefik will route traffic to `central-2` once its `/health/active` returns 200.
3. `docker compose -f docker-dev/docker-compose.yml start central-1` — `central-1` rejoins as a follower; `central-2` keeps the leader role until something disturbs it.
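
Step 2 can be timed with a small polling helper instead of watching the dashboard. This is a sketch under assumptions: `http_status` and `wait_for_active` are hypothetical names, `curl` must be installed, and the URL is left as a parameter because the address at which `/health/active` is reachable from the host depends on how your checkout publishes the admin surface.

```bash
#!/usr/bin/env bash
# Print only the HTTP status code for a URL (curl prints 000 if the connection fails).
http_status() {
  curl -s -o /dev/null -w '%{http_code}' --max-time 2 "$1" || true
}

# wait_for_active <url> [timeout_seconds]
# Polls once per second until the URL answers 200 or the timeout elapses.
wait_for_active() {
  local url="$1" timeout="${2:-30}" waited=0
  while [ "$(http_status "$url")" != "200" ]; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "timed out after ${timeout}s waiting for $url" >&2
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
  echo "active after ~${waited}s"
}
```

A typical use is to run `stop central-1` and then call `wait_for_active "<admin-base-url>/health/active" 30`, substituting the base URL that applies to your setup. The reported wait should land near the ~15 s mentioned in step 2.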

## Notes

- This compose is for the **local Mac/Linux developer rig**. The team's CI + soak runs go to the remote docker host at `10.100.0.35` (see `docs/v2/dev-environment.md`); the file there mirrors this one with adjusted port bindings.
- The OPC UA endpoints are reachable directly from the host (Traefik is only in front of the admin HTTP surface):
  - Main: `opc.tcp://localhost:4840` (central-1), `opc.tcp://localhost:4841` (central-2)
  - Site A: `opc.tcp://localhost:4842` (site-a-1), `opc.tcp://localhost:4843` (site-a-2)
  - Site B: `opc.tcp://localhost:4844` (site-b-1), `opc.tcp://localhost:4845` (site-b-2)
- Galaxy + Wonderware drivers can't run in Linux containers (they need the Windows-only mxaccessgw + Historian SDK). On non-Windows, `DriverInstanceActor.ShouldStub(driverType, roles)` returns `true` for those types and the actor goes straight to a `Stubbed` state that returns deterministic success.
- SQL persistence: ConfigDb state survives container restarts (named Docker volume). Drop the volume with `down -v` for a clean slate.