Some checks failed
v2-ci / build (push) Failing after 37s
v2-ci / unit-tests (tests/Core/ZB.MOM.WW.OtOpcUa.Cluster.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.ControlPlane.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Security.Tests) (push) Has been skipped
v2-ci / integration (tests/Server/ZB.MOM.WW.OtOpcUa.Host.IntegrationTests) (push) Has been skipped
v2-ci / integration (tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.IntegrationTests) (push) Has been skipped
User confirmed the mxaccessgw client (Galaxy driver) doesn't need Windows — only the gateway worker has that constraint. This wires the Galaxy driver into the docker-dev fleet: - docker-compose.yml: GALAXY_MXGW_API_KEY env var on every host service (admin nodes harmlessly ignore it; driver-role nodes pick it up when the seeded DriverInstance resolves ApiKeySecretRef=env:GALAXY_MXGW_API_KEY). Default value matches the key the operator provided; override via shell env (GALAXY_MXGW_API_KEY=... docker compose up -d) to rotate without editing compose. - seed-clusters.sql: now creates a SystemPlatform Namespace (MAIN-galaxy, urn:zb:docker-dev:galaxy) plus a GalaxyMxGateway DriverInstance (MAIN-galaxy-mxgw) in the MAIN cluster pointing at http://10.100.0.48:5120 with UseTls=false. Idempotent via IF NOT EXISTS. - DriverInstanceActor.ShouldStub: clarified the doc comment — only the legacy "Galaxy" type name and "Historian.Wonderware" are Windows-only; the v2 "GalaxyMxGateway" driver is .NET 10 cross-platform (gRPC to an external gateway) and is NOT stubbed. - README: documents the final operator step — sign in, click "Deploy current configuration" on /deployments to materialise the seeded Galaxy driver into a running gRPC connection. Raw DriverInstance rows don't spawn drivers on their own; the v2 lifecycle requires a sealed Deployment first.
113 lines
7.3 KiB
Markdown
113 lines
7.3 KiB
Markdown
# docker-dev
|
|
|
|
Mac-friendly multi-cluster OtOpcUa fleet for manual UI exercise + integration smoke tests. Spins up **three isolated Akka clusters** + SQL Server + OpenLDAP + Traefik on the same Compose network. All three clusters share the single `OtOpcUa` ConfigDb — multi-tenancy is enforced by per-row `ServerCluster.ClusterId` scoping. Akka.Cluster gossip stays isolated between meshes because their seed-node lists are disjoint, even though they share the same system name `otopcua`.
|
|
|
|
## Stack
|
|
|
|
### Shared infrastructure
|
|
|
|
| Service | Role | Ports |
|
|
|---|---|---|
|
|
| `sql` | SQL Server 2022 — single `OtOpcUa` ConfigDb shared by all three clusters | host `14330` → container `1433` |
|
|
| `traefik` | Routes :80 by Host header / PathPrefix | host `80`, dashboard `8089` |
|
|
|
|
Authentication runs in `DevStubMode` — every host container has `Authentication__Ldap__DevStubMode=true` set, so the LDAP service is not part of the dev compose right now (the `bitnami/openldap:2.6` image was retired and the legacy tag crashes mid-setup with exit 68). Any non-empty username/password signs in as `FleetAdmin`. To restore a real LDAP service, drop the env var and add an `openldap`-compatible image back to compose.
|
|
|
|
### Main cluster — split admin/driver roles
|
|
|
|
| Service | Role | Ports |
|
|
|---|---|---|
|
|
| `admin-a` | `OTOPCUA_ROLES=admin`, cluster seed | internal `9000` |
|
|
| `admin-b` | `OTOPCUA_ROLES=admin`, joins admin-a | internal `9000` |
|
|
| `driver-a` | `OTOPCUA_ROLES=driver` | host `4840` → container `4840` |
|
|
| `driver-b` | `OTOPCUA_ROLES=driver` | host `4841` → container `4840` |
|
|
|
|
### Site A cluster — 2-node fused admin+driver
|
|
|
|
| Service | Role | Ports |
|
|
|---|---|---|
|
|
| `site-a-1` | `OTOPCUA_ROLES=admin,driver`, cluster seed | host `4842` → container `4840` |
|
|
| `site-a-2` | `OTOPCUA_ROLES=admin,driver`, joins site-a-1 | host `4843` → container `4840` |
|
|
|
|
### Site B cluster — 2-node fused admin+driver
|
|
|
|
| Service | Role | Ports |
|
|
|---|---|---|
|
|
| `site-b-1` | `OTOPCUA_ROLES=admin,driver`, cluster seed | host `4844` → container `4840` |
|
|
| `site-b-2` | `OTOPCUA_ROLES=admin,driver`, joins site-b-1 | host `4845` → container `4840` |
|
|
|
|
All containers bind Akka remoting to port `4053` inside their own network namespace; the `PublicHostname` of each matches its Compose service name. Akka mesh isolation is enforced purely by disjoint seed lists. Configuration-side isolation is enforced by `ServerCluster.ClusterId` — see "Multi-tenancy" below.
|
|
|
|
## Multi-tenancy
|
|
|
|
All eight host nodes write to the same `OtOpcUa` ConfigDb. The `ServerCluster` table differentiates the three Akka meshes: each Akka cluster maps to one row, and each `ClusterNode` row's `ClusterId` ties the runtime node back to its owning cluster scope.
|
|
|
|
A one-shot `cluster-seed` Compose service (image `mcr.microsoft.com/mssql-tools`) waits for SQL + the EF auto-migration to complete and then INSERTs the rows below. The seed is **idempotent** — `IF NOT EXISTS` guards every insert — so re-runs on `docker compose up` are no-ops:
|
|
|
|
| Akka mesh | `ServerCluster.ClusterId` | `ClusterNode.NodeId` rows |
|
|
|---|---|---|
|
|
| Main | `MAIN` | `driver-a`, `driver-b` (OPC UA publishers) |
|
|
| Site A | `SITE-A` | `site-a-1`, `site-a-2` |
|
|
| Site B | `SITE-B` | `site-b-1`, `site-b-2` |
|
|
|
|
`ClusterNode` is the table for **OPC UA-publishing nodes** (not every Akka cluster member), which is why the main cluster's `admin-a` / `admin-b` don't get rows — they're control-plane-only.
|
|
|
|
Each `ClusterNode.NodeId` matches the node's `Cluster__PublicHostname` env value (Compose service name) — that's the lookup the runtime uses to resolve its own membership. `ApplicationUri` follows the `urn:OtOpcUa:<NodeId>` convention.
|
|
|
|
The SQL lives at `seed/seed-clusters.sql`; the wait-and-apply wrapper lives at `seed/entrypoint.sh`. To re-seed manually:
|
|
|
|
```bash
|
|
docker compose -f docker-dev/docker-compose.yml run --rm cluster-seed
|
|
```
|
|
|
|
### Galaxy / MxAccess gateway
|
|
|
|
The seed also pre-creates a `SystemPlatform` Namespace + a `GalaxyMxGateway` DriverInstance in the MAIN cluster pointing at `http://10.100.0.48:5120`. The API key is resolved from the `GALAXY_MXGW_API_KEY` env var set on every driver-role container in compose; override via `GALAXY_MXGW_API_KEY=... docker compose up -d` to swap keys without editing the compose file.
|
|
|
|
The DriverHost actor doesn't spawn drivers from raw DriverInstance rows on its own — the v2 deploy lifecycle requires a *sealed Deployment* before drivers materialise. After first bring-up, sign in to the Admin UI and click **Deploy current configuration** on `/deployments` to compose the seeded rows into an artifact and dispatch it. The Galaxy driver instance will start its gRPC connection to the gateway on the next deploy ack.
|
|
|
|
## Bring up
|
|
|
|
```bash
|
|
# from the repo root
|
|
docker compose -f docker-dev/docker-compose.yml up -d --build
|
|
|
|
# wait ~20 seconds for SQL to come up + all three clusters to form
|
|
|
|
open http://localhost # main cluster admin UI
|
|
open http://site-a.localhost # site A admin UI
|
|
open http://site-b.localhost # site B admin UI
|
|
open http://localhost:8089 # Traefik dashboard
|
|
```
|
|
|
|
On macOS, `*.localhost` resolves to `127.0.0.1` automatically. On Linux add `127.0.0.1 site-a.localhost site-b.localhost` to `/etc/hosts` if your resolver doesn't.
|
|
|
|
The first build takes a few minutes (.NET SDK image + restore + publish). Subsequent rebuilds are faster with Docker's layer cache.
|
|
|
|
## Auth (dev only)
|
|
|
|
`Authentication__Ldap__DevStubMode=true` is set on every host container, so any non-empty username/password signs in as a `FleetAdmin` user without contacting an LDAP server. **Do not** ship this configuration to production — set `DevStubMode=false` and wire a real LDAP backend before any non-dev deployment.
|
|
|
|
## Tear down
|
|
|
|
```bash
|
|
docker compose -f docker-dev/docker-compose.yml down -v
|
|
```
|
|
|
|
The `-v` drops the SQL + LDAP volumes; remove it to keep ConfigDb state across restarts.
|
|
|
|
## Failover smoke
|
|
|
|
1. Watch the Traefik dashboard at `http://localhost:8089`. Both `admin-a` and `admin-b` should be listed as healthy in the `otopcua-admin` service.
|
|
2. `docker compose -f docker-dev/docker-compose.yml stop admin-a` — `admin-b` should pick up the admin role-leader within ~15 s (Akka split-brain stable-after). Traefik will route traffic to `admin-b` once its `/health/active` returns 200.
|
|
3. `docker compose -f docker-dev/docker-compose.yml start admin-a` — `admin-a` rejoins as a follower; `admin-b` keeps the leader role until something disturbs it.
|
|
|
|
## Notes
|
|
|
|
- This compose is for the **local Mac/Linux developer rig**. The team's CI + soak runs go to the remote docker host at `10.100.0.35` (see `docs/v2/dev-environment.md`); the file there mirrors this one with adjusted port bindings.
|
|
- The OPC UA driver endpoints are reachable directly from the host (Traefik is only in front of the admin HTTP surface):
|
|
- Main: `opc.tcp://localhost:4840` (driver-a), `opc.tcp://localhost:4841` (driver-b)
|
|
- Site A: `opc.tcp://localhost:4842` (site-a-1), `opc.tcp://localhost:4843` (site-a-2)
|
|
- Site B: `opc.tcp://localhost:4844` (site-b-1), `opc.tcp://localhost:4845` (site-b-2)
|
|
- Galaxy + Wonderware drivers can't run in Linux containers (they need the Windows-only mxaccessgw + Historian SDK). On non-Windows, `DriverInstanceActor.ShouldStub(driverType, roles)` returns `true` for those types and the actor goes straight to a `Stubbed` state that returns deterministic success.
|