fix(deploy,host): docker-dev bring-up — anon health probes, robust seeder
Some checks failed
v2-ci / build (push) Failing after 32s
v2-ci / unit-tests (tests/Core/ZB.MOM.WW.OtOpcUa.Cluster.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.ControlPlane.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Security.Tests) (push) Has been skipped
v2-ci / integration (tests/Server/ZB.MOM.WW.OtOpcUa.Host.IntegrationTests) (push) Has been skipped
v2-ci / integration (tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.IntegrationTests) (push) Has been skipped

Two fixes surfaced while bringing up the docker-dev stack end-to-end:

- HealthEndpoints.MapOtOpcUaHealth now calls .AllowAnonymous() on /health/ready,
  /health/active, /healthz. Without it the AddOtOpcUaAuth fallback policy 401s
  every probe and Traefik marks every backend unhealthy → all three cluster
  routes return 503.

- cluster-seed entrypoint no longer attempts to apply Migrate-To-V2.sql via
  sqlcmd. The EF-generated idempotent script puts CREATE PROCEDURE inside
  IF NOT EXISTS BEGIN ... END blocks (procs must be first in their batch),
  so sqlcmd fails with "Must declare the scalar variable @FromGenerationId".
  EF's own runner handles this; sqlcmd doesn't. The seed now just waits for
  the schema and applies row inserts. Migrations remain the operator's job:
      dotnet ef database update --project src/Core/.../Configuration \
                                --startup-project src/Server/.../Host

Also: LDAP service removed (bitnami/openldap:2.6 image retired, legacy tag
crashes mid-setup with exit 68); every host now runs with
Authentication__Ldap__DevStubMode=true. Bumped LDAP+Traefik dashboard host
ports to avoid collisions with the sister scadalink dev stack (3893→3894,
8080→8089).

Confirmed working end-to-end: all three Traefik routes return HTTP 200,
cluster-seed populates ServerCluster (MAIN/SITE-A/SITE-B) + ClusterNode
(driver-a/b, site-a-1/2, site-b-1/2) rows on first boot.
This commit is contained in:
Joseph Doherty
2026-05-26 14:37:01 -04:00
parent 1e64488c0d
commit ed1c17bc7b
4 changed files with 52 additions and 59 deletions

View File

@@ -9,8 +9,9 @@ Mac-friendly multi-cluster OtOpcUa fleet for manual UI exercise + integration sm
| Service | Role | Ports |
|---|---|---|
| `sql` | SQL Server 2022 — single `OtOpcUa` ConfigDb shared by all three clusters | host `14330` → container `1433` |
| `ldap` | OpenLDAP with dev users `alice` / `bob` | host `3893` → container `1389` |
| `traefik` | Routes :80 by Host header / PathPrefix | host `80`, dashboard `8080` |
| `traefik` | Routes :80 by Host header / PathPrefix | host `80`, dashboard `8089` |
Authentication runs in `DevStubMode` — every host container has `Authentication__Ldap__DevStubMode=true` set, so the LDAP service is not part of the dev compose right now (the `bitnami/openldap:2.6` image was retired and the legacy tag crashes mid-setup with exit 68). Any non-empty username/password signs in as `FleetAdmin`. To restore a real LDAP service, drop the env var and add an `openldap`-compatible image back to compose.
### Main cluster — split admin/driver roles
@@ -70,7 +71,7 @@ docker compose -f docker-dev/docker-compose.yml up -d --build
open http://localhost # main cluster admin UI
open http://site-a.localhost # site A admin UI
open http://site-b.localhost # site B admin UI
open http://localhost:8080 # Traefik dashboard
open http://localhost:8089 # Traefik dashboard
```
On macOS, `*.localhost` resolves to `127.0.0.1` automatically. On Linux add `127.0.0.1 site-a.localhost site-b.localhost` to `/etc/hosts` if your resolver doesn't.
@@ -79,14 +80,7 @@ The first build takes a few minutes (.NET SDK image + restore + publish). Subseq
## Auth (dev only)
Use one of the LDAP dev users from `LDAP_USERS` in `docker-compose.yml`:
| Username | Password |
|---|---|
| `alice` | `alice123` |
| `bob` | `bob123` |
The compose mounts everyone into `ou=FleetAdmin` so the dev role mapping resolves to `FleetAdmin`.
`Authentication__Ldap__DevStubMode=true` is set on every host container, so any non-empty username/password signs in as a `FleetAdmin` user without contacting an LDAP server. **Do not** ship this configuration to production — set `DevStubMode=false` and wire a real LDAP backend before any non-dev deployment.
## Tear down
@@ -98,7 +92,7 @@ The `-v` drops the SQL + LDAP volumes; remove it to keep ConfigDb state across r
## Failover smoke
1. Watch the Traefik dashboard at `http://localhost:8080`. Both `admin-a` and `admin-b` should be listed as healthy in the `otopcua-admin` service.
1. Watch the Traefik dashboard at `http://localhost:8089`. Both `admin-a` and `admin-b` should be listed as healthy in the `otopcua-admin` service.
2. `docker compose -f docker-dev/docker-compose.yml stop admin-a``admin-b` should pick up the admin role-leader within ~15 s (Akka split-brain stable-after). Traefik will route traffic to `admin-b` once its `/health/active` returns 200.
3. `docker compose -f docker-dev/docker-compose.yml start admin-a``admin-a` rejoins as a follower; `admin-b` keeps the leader role until something disturbs it.