diff --git a/docker-dev/Dockerfile b/docker-dev/Dockerfile index b4cf1f72..95008fba 100644 --- a/docker-dev/Dockerfile +++ b/docker-dev/Dockerfile @@ -1,5 +1,5 @@ # Multi-stage build of OtOpcUa.Host targeting linux-x64. Used by docker-dev/docker-compose.yml -# to spin four host containers (admin-a, admin-b, driver-a, driver-b) from a single image — +# to spin six host containers (central-1, central-2, site-a-1, site-a-2, site-b-1, site-b-2) from a single image — # Compose drives OTOPCUA_ROLES + Cluster:* env per container to differentiate them. FROM mcr.microsoft.com/dotnet/sdk:10.0 AS build diff --git a/docker-dev/README.md b/docker-dev/README.md index 6e82dbf0..08fb5ed3 100644 --- a/docker-dev/README.md +++ b/docker-dev/README.md @@ -1,6 +1,6 @@ # docker-dev -Mac-friendly multi-cluster OtOpcUa fleet for manual UI exercise + integration smoke tests. Spins up **three isolated Akka clusters** + SQL Server + Traefik on the same Compose network. All three clusters share the single `OtOpcUa` ConfigDb — multi-tenancy is enforced by per-row `ServerCluster.ClusterId` scoping. Akka.Cluster gossip stays isolated between meshes because their seed-node lists are disjoint, even though they share the same system name `otopcua`. +Mac-friendly OtOpcUa fleet for manual UI exercise + integration smoke tests. Spins up **one single Akka mesh** (hub-and-spoke topology) + SQL Server + Traefik on the same Compose network. All six host nodes share the single `OtOpcUa` ConfigDb — logical separation between MAIN, SITE-A, and SITE-B is enforced by per-row `ServerCluster.ClusterId` scoping, not by mesh isolation. ## Stack @@ -8,50 +8,48 @@ Mac-friendly multi-cluster OtOpcUa fleet for manual UI exercise + integration sm | Service | Role | Ports | |---|---|---| -| `sql` | SQL Server 2022 — single `OtOpcUa` ConfigDb shared by all three clusters | host `14330` → container `1433` | -| `traefik` | Routes :80 by Host header / PathPrefix | host `80`, dashboard `8089` | +| `sql` | SQL Server 2022 — single `OtOpcUa` ConfigDb shared by all nodes | host `14330` → container `1433` | +| `traefik` | Routes `:80` by PathPrefix to central admin nodes | host `80`, dashboard `8089` | -Authentication uses the **shared GLAuth** on the Linux Docker host at `10.100.0.35:3893` (baseDN `dc=zb,dc=local`). Every host container binds that instance via `cn=serviceaccount,dc=zb,dc=local`. `DevStubMode` is **not** active. Sign in as `multi-role` / `password` to get all three OtOpcUa roles (Administrator, Designer, Viewer), or use any other shared test user with password `password`. Group→role mappings are seeded by `seed/seed-clusters.sql` (`OtOpcUa-Admins`→Administrator, `OtOpcUa-Designers`→Designer, `OtOpcUa-Viewers`→Viewer). The shared GLAuth source of truth and deploy runbook live in `scadaproj/infra/glauth/`. +Authentication uses the **shared GLAuth** on the Linux Docker host at `10.100.0.35:3893` (baseDN `dc=zb,dc=local`). Only the central admin nodes authenticate users. Sign in as `multi-role` / `password` to get all three OtOpcUa roles (Administrator, Designer, Viewer), or use any other shared test user with password `password`. Group→role mappings are seeded by `seed/seed-clusters.sql` (`OtOpcUa-Admins`→Administrator, `OtOpcUa-Designers`→Designer, `OtOpcUa-Viewers`→Viewer). The shared GLAuth source of truth and deploy runbook live in `scadaproj/infra/glauth/`. -### Main cluster — split admin/driver roles +### Central nodes — fused admin+driver (MAIN cluster, UI + deploy singleton) -| Service | Role | Ports | +| Service | Roles | Ports | |---|---|---| -| `admin-a` | `OTOPCUA_ROLES=admin`, cluster seed | internal `9000` | -| `admin-b` | `OTOPCUA_ROLES=admin`, joins admin-a | internal `9000` | -| `driver-a` | `OTOPCUA_ROLES=driver` | host `4840` → container `4840` | -| `driver-b` | `OTOPCUA_ROLES=driver` | host `4841` → container `4840` | +| `central-1` | `OTOPCUA_ROLES=admin,driver`, Akka mesh seed | host `4840` → container `4840`; internal `9000` | +| `central-2` | `OTOPCUA_ROLES=admin,driver`, joins central-1 | host `4841` → container `4840`; internal `9000` | -### Site A cluster — 2-node fused admin+driver +`central-1` and `central-2` are the **only** nodes that host the Admin UI and the deploy singleton. They are also the OPC UA publishers for the MAIN cluster. Traefik routes all `PathPrefix(/)` traffic to whichever central node has the leader role. -| Service | Role | Ports | +### Site A nodes — driver-only (SITE-A cluster) + +| Service | Roles | Ports | |---|---|---| -| `site-a-1` | `OTOPCUA_ROLES=admin,driver`, cluster seed | host `4842` → container `4840` | -| `site-a-2` | `OTOPCUA_ROLES=admin,driver`, joins site-a-1 | host `4843` → container `4840` | +| `site-a-1` | `OTOPCUA_ROLES=driver`, joins the single mesh | host `4842` → container `4840` | +| `site-a-2` | `OTOPCUA_ROLES=driver`, joins the single mesh | host `4843` → container `4840` | -### Site B cluster — 2-node fused admin+driver +### Site B nodes — driver-only (SITE-B cluster) -| Service | Role | Ports | +| Service | Roles | Ports | |---|---|---| -| `site-b-1` | `OTOPCUA_ROLES=admin,driver`, cluster seed | host `4844` → container `4840` | -| `site-b-2` | `OTOPCUA_ROLES=admin,driver`, joins site-b-1 | host `4845` → container `4840` | +| `site-b-1` | `OTOPCUA_ROLES=driver`, joins the single mesh | host `4844` → container `4840` | +| `site-b-2` | `OTOPCUA_ROLES=driver`, joins the single mesh | host `4845` → container `4840` | -All containers bind Akka remoting to port `4053` inside their own network namespace; the `PublicHostname` of each matches its Compose service name. Akka mesh isolation is enforced purely by disjoint seed lists. Configuration-side isolation is enforced by `ServerCluster.ClusterId` — see "Multi-tenancy" below. +Site nodes serve no UI and authenticate no users. The central cluster manages and deploys to them over the shared Akka mesh. All six nodes bind Akka remoting to port `4053` inside their own network namespace; `PublicHostname` for each matches its Compose service name. ## Multi-tenancy -All eight host nodes write to the same `OtOpcUa` ConfigDb. The `ServerCluster` table differentiates the three Akka meshes: each Akka cluster maps to one row, and each `ClusterNode` row's `ClusterId` ties the runtime node back to its owning cluster scope. +All six host nodes write to the same `OtOpcUa` ConfigDb. The `ServerCluster` table differentiates the three logical clusters: each maps to one row, and each `ClusterNode` row's `ClusterId` ties the runtime node back to its owning cluster scope. A one-shot `cluster-seed` Compose service (image `mcr.microsoft.com/mssql-tools`) waits for SQL + the EF auto-migration to complete and then INSERTs the rows below. The seed is **idempotent** — `IF NOT EXISTS` guards every insert — so re-runs on `docker compose up` are no-ops: -| Akka mesh | `ServerCluster.ClusterId` | `ClusterNode.NodeId` rows | +| Logical cluster | `ServerCluster.ClusterId` | `ClusterNode.NodeId` rows | |---|---|---| -| Main | `MAIN` | `driver-a`, `driver-b` (OPC UA publishers) | +| Main | `MAIN` | `central-1`, `central-2` (OPC UA publishers + admin UI) | | Site A | `SITE-A` | `site-a-1`, `site-a-2` | | Site B | `SITE-B` | `site-b-1`, `site-b-2` | -`ClusterNode` is the table for **OPC UA-publishing nodes** (not every Akka cluster member), which is why the main cluster's `admin-a` / `admin-b` don't get rows — they're control-plane-only. - Each `ClusterNode.NodeId` matches the node's `Cluster__PublicHostname` env value (Compose service name) — that's the lookup the runtime uses to resolve its own membership. `ApplicationUri` follows the `urn:OtOpcUa:` convention. The SQL lives at `seed/seed-clusters.sql`; the wait-and-apply wrapper lives at `seed/entrypoint.sh`. To re-seed manually: @@ -72,21 +70,24 @@ The DriverHost actor doesn't spawn drivers from raw DriverInstance rows on its o # from the repo root docker compose -f docker-dev/docker-compose.yml up -d --build -# wait ~20 seconds for SQL to come up + all three clusters to form +# wait ~20 seconds for SQL to come up + the mesh to form -open http://localhost # main cluster admin UI -open http://site-a.localhost # site A admin UI -open http://site-b.localhost # site B admin UI +open http://localhost:9200 # Admin UI (Traefik → central-1 or central-2) open http://localhost:8089 # Traefik dashboard ``` -On macOS, `*.localhost` resolves to `127.0.0.1` automatically. On Linux add `127.0.0.1 site-a.localhost site-b.localhost` to `/etc/hosts` if your resolver doesn't. - The first build takes a few minutes (.NET SDK image + restore + publish). Subsequent rebuilds are faster with Docker's layer cache. ## Auth (dev only) -All host containers authenticate against the shared GLAuth at `10.100.0.35:3893` (baseDN `dc=zb,dc=local`). `DevStubMode` is **not** active. Sign in with any test user (password `password`); `multi-role` / `password` returns all three roles (Administrator, Designer, Viewer). Group→role mappings are seeded by `seed/seed-clusters.sql`. The GLAuth source of truth + deploy runbook is in `scadaproj/infra/glauth/`. **Do not** enable `DevStubMode` outside local debugging — production must always bind a real LDAP backend. +Central nodes authenticate against the shared GLAuth at `10.100.0.35:3893` (baseDN `dc=zb,dc=local`). `DevStubMode` is **not** active. Sign in with any test user (password `password`); `multi-role` / `password` returns all three roles (Administrator, Designer, Viewer). Group→role mappings are seeded by `seed/seed-clusters.sql`. The GLAuth source of truth + deploy runbook is in `scadaproj/infra/glauth/`. **Do not** enable `DevStubMode` outside local debugging — production must always bind a real LDAP backend. + +## Headless deploy + +```bash +POST http://localhost:9200/api/deployments +X-Api-Key: docker-dev-deploy-key +``` ## Tear down @@ -98,15 +99,16 @@ The `-v` drops the SQL volume; remove it to keep ConfigDb state across restarts. ## Failover smoke -1. Watch the Traefik dashboard at `http://localhost:8089`. Both `admin-a` and `admin-b` should be listed as healthy in the `otopcua-admin` service. -2. `docker compose -f docker-dev/docker-compose.yml stop admin-a` — `admin-b` should pick up the admin role-leader within ~15 s (Akka split-brain stable-after). Traefik will route traffic to `admin-b` once its `/health/active` returns 200. -3. `docker compose -f docker-dev/docker-compose.yml start admin-a` — `admin-a` rejoins as a follower; `admin-b` keeps the leader role until something disturbs it. +1. Watch the Traefik dashboard at `http://localhost:8089`. Both `central-1` and `central-2` should be listed as healthy in the `otopcua-admin` service. +2. `docker compose -f docker-dev/docker-compose.yml stop central-1` — `central-2` should pick up the admin role-leader within ~15 s (Akka split-brain stable-after). Traefik will route traffic to `central-2` once its `/health/active` returns 200. +3. `docker compose -f docker-dev/docker-compose.yml start central-1` — `central-1` rejoins as a follower; `central-2` keeps the leader role until something disturbs it. ## Notes - This compose is for the **local Mac/Linux developer rig**. The team's CI + soak runs go to the remote docker host at `10.100.0.35` (see `docs/v2/dev-environment.md`); the file there mirrors this one with adjusted port bindings. -- The OPC UA driver endpoints are reachable directly from the host (Traefik is only in front of the admin HTTP surface): - - Main: `opc.tcp://localhost:4840` (driver-a), `opc.tcp://localhost:4841` (driver-b) +- The OPC UA endpoints are reachable directly from the host (Traefik is only in front of the admin HTTP surface): + - Main: `opc.tcp://localhost:4840` (central-1), `opc.tcp://localhost:4841` (central-2) - Site A: `opc.tcp://localhost:4842` (site-a-1), `opc.tcp://localhost:4843` (site-a-2) - Site B: `opc.tcp://localhost:4844` (site-b-1), `opc.tcp://localhost:4845` (site-b-2) - Galaxy + Wonderware drivers can't run in Linux containers (they need the Windows-only mxaccessgw + Historian SDK). On non-Windows, `DriverInstanceActor.ShouldStub(driverType, roles)` returns `true` for those types and the actor goes straight to a `Stubbed` state that returns deterministic success. +- SQL persistence: ConfigDb state survives container restarts (named Docker volume). Drop the volume with `down -v` for a clean slate.