fix(deploy,host): docker-dev bring-up — anon health probes, robust seeder
Some checks failed
v2-ci / build (push) Failing after 32s
v2-ci / unit-tests (tests/Core/ZB.MOM.WW.OtOpcUa.Cluster.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.ControlPlane.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Security.Tests) (push) Has been skipped
v2-ci / integration (tests/Server/ZB.MOM.WW.OtOpcUa.Host.IntegrationTests) (push) Has been skipped
v2-ci / integration (tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.IntegrationTests) (push) Has been skipped

Two fixes surfaced while bringing up the docker-dev stack end-to-end:

- HealthEndpoints.MapOtOpcUaHealth now calls .AllowAnonymous() on /health/ready,
  /health/active, /healthz. Without it the AddOtOpcUaAuth fallback policy 401s
  every probe and Traefik marks every backend unhealthy → all three cluster
  routes return 503.

- cluster-seed entrypoint no longer attempts to apply Migrate-To-V2.sql via
  sqlcmd. The EF-generated idempotent script puts CREATE PROCEDURE inside
  IF NOT EXISTS BEGIN ... END blocks (procs must be first in their batch),
  so sqlcmd fails with "Must declare the scalar variable @FromGenerationId".
  EF's own runner handles this; sqlcmd doesn't. The seed now just waits for
  the schema and applies row inserts. Migrations remain the operator's job:
      dotnet ef database update --project src/Core/.../Configuration \
                                --startup-project src/Server/.../Host

Also: LDAP service removed (bitnami/openldap:2.6 image retired, legacy tag
crashes mid-setup with exit 68); every host now runs with
Authentication__Ldap__DevStubMode=true. Bumped LDAP+Traefik dashboard host
ports to avoid collisions with the sister scadalink dev stack (3893→3894,
8080→8089).

Confirmed working end-to-end: all three Traefik routes return HTTP 200,
cluster-seed populates ServerCluster (MAIN/SITE-A/SITE-B) + ClusterNode
(driver-a/b, site-a-1/2, site-b-1/2) rows on first boot.
This commit is contained in:
Joseph Doherty
2026-05-26 14:37:01 -04:00
parent 1e64488c0d
commit ed1c17bc7b
4 changed files with 52 additions and 59 deletions

View File

@@ -9,8 +9,9 @@ Mac-friendly multi-cluster OtOpcUa fleet for manual UI exercise + integration sm
| Service | Role | Ports | | Service | Role | Ports |
|---|---|---| |---|---|---|
| `sql` | SQL Server 2022 — single `OtOpcUa` ConfigDb shared by all three clusters | host `14330` → container `1433` | | `sql` | SQL Server 2022 — single `OtOpcUa` ConfigDb shared by all three clusters | host `14330` → container `1433` |
| `ldap` | OpenLDAP with dev users `alice` / `bob` | host `3893` → container `1389` | | `traefik` | Routes :80 by Host header / PathPrefix | host `80`, dashboard `8089` |
| `traefik` | Routes :80 by Host header / PathPrefix | host `80`, dashboard `8080` |
Authentication runs in `DevStubMode` — every host container has `Authentication__Ldap__DevStubMode=true` set, so the LDAP service is not part of the dev compose right now (the `bitnami/openldap:2.6` image was retired and the legacy tag crashes mid-setup with exit 68). Any non-empty username/password signs in as `FleetAdmin`. To restore a real LDAP service, drop the env var and add an `openldap`-compatible image back to compose.
### Main cluster — split admin/driver roles ### Main cluster — split admin/driver roles
@@ -70,7 +71,7 @@ docker compose -f docker-dev/docker-compose.yml up -d --build
open http://localhost # main cluster admin UI open http://localhost # main cluster admin UI
open http://site-a.localhost # site A admin UI open http://site-a.localhost # site A admin UI
open http://site-b.localhost # site B admin UI open http://site-b.localhost # site B admin UI
open http://localhost:8080 # Traefik dashboard open http://localhost:8089 # Traefik dashboard
``` ```
On macOS, `*.localhost` resolves to `127.0.0.1` automatically. On Linux add `127.0.0.1 site-a.localhost site-b.localhost` to `/etc/hosts` if your resolver doesn't. On macOS, `*.localhost` resolves to `127.0.0.1` automatically. On Linux add `127.0.0.1 site-a.localhost site-b.localhost` to `/etc/hosts` if your resolver doesn't.
@@ -79,14 +80,7 @@ The first build takes a few minutes (.NET SDK image + restore + publish). Subseq
## Auth (dev only) ## Auth (dev only)
Use one of the LDAP dev users from `LDAP_USERS` in `docker-compose.yml`: `Authentication__Ldap__DevStubMode=true` is set on every host container, so any non-empty username/password signs in as a `FleetAdmin` user without contacting an LDAP server. **Do not** ship this configuration to production — set `DevStubMode=false` and wire a real LDAP backend before any non-dev deployment.
| Username | Password |
|---|---|
| `alice` | `alice123` |
| `bob` | `bob123` |
The compose mounts everyone into `ou=FleetAdmin` so the dev role mapping resolves to `FleetAdmin`.
## Tear down ## Tear down
@@ -98,7 +92,7 @@ The `-v` drops the SQL + LDAP volumes; remove it to keep ConfigDb state across r
## Failover smoke ## Failover smoke
1. Watch the Traefik dashboard at `http://localhost:8080`. Both `admin-a` and `admin-b` should be listed as healthy in the `otopcua-admin` service. 1. Watch the Traefik dashboard at `http://localhost:8089`. Both `admin-a` and `admin-b` should be listed as healthy in the `otopcua-admin` service.
2. `docker compose -f docker-dev/docker-compose.yml stop admin-a``admin-b` should pick up the admin role-leader within ~15 s (Akka split-brain stable-after). Traefik will route traffic to `admin-b` once its `/health/active` returns 200. 2. `docker compose -f docker-dev/docker-compose.yml stop admin-a``admin-b` should pick up the admin role-leader within ~15 s (Akka split-brain stable-after). Traefik will route traffic to `admin-b` once its `/health/active` returns 200.
3. `docker compose -f docker-dev/docker-compose.yml start admin-a``admin-a` rejoins as a follower; `admin-b` keeps the leader role until something disturbs it. 3. `docker compose -f docker-dev/docker-compose.yml start admin-a``admin-a` rejoins as a follower; `admin-b` keeps the leader role until something disturbs it.

View File

@@ -35,7 +35,7 @@
# open http://localhost # main cluster Blazor admin UI # open http://localhost # main cluster Blazor admin UI
# open http://site-a.localhost # site A admin UI # open http://site-a.localhost # site A admin UI
# open http://site-b.localhost # site B admin UI # open http://site-b.localhost # site B admin UI
# open http://localhost:8080 # Traefik dashboard # open http://localhost:8089 # Traefik dashboard (8080 is the sister scadalink stack)
# #
# Tear-down: docker compose -f docker-dev/docker-compose.yml down -v # Tear-down: docker compose -f docker-dev/docker-compose.yml down -v
@@ -71,17 +71,12 @@ services:
entrypoint: ["/bin/bash", "/seed/entrypoint.sh"] entrypoint: ["/bin/bash", "/seed/entrypoint.sh"]
restart: "no" restart: "no"
ldap: # OpenLDAP was previously here but the bitnami/openldap:2.6 image was retired
image: bitnami/openldap:2.6 # (manifest gone) and bitnamilegacy/openldap:2.6 crashes during LDIF setup with
environment: # exit 68. For the dev compose every host container now runs with
LDAP_ROOT: "dc=lmxopcua,dc=local" # Authentication__Ldap__DevStubMode=true, so any non-empty username/password
LDAP_ADMIN_USERNAME: "admin" # signs in as `FleetAdmin`. Restore a real LDAP service when there's a need
LDAP_ADMIN_PASSWORD: "ldapadmin" # for end-to-end LDAP coverage (the host code path is unchanged).
LDAP_USERS: "alice,bob"
LDAP_PASSWORDS: "alice123,bob123"
LDAP_USER_DC: "ou=FleetAdmin"
ports:
- "3893:1389"
admin-a: &otopcua-host admin-a: &otopcua-host
build: build:
@@ -102,9 +97,7 @@ services:
Security__Jwt__SigningKey: "docker-dev-signing-key-with-at-least-32-bytes-of-utf8-content-12345" Security__Jwt__SigningKey: "docker-dev-signing-key-with-at-least-32-bytes-of-utf8-content-12345"
Security__Jwt__Issuer: "otopcua-dev" Security__Jwt__Issuer: "otopcua-dev"
Security__Jwt__Audience: "otopcua-dev" Security__Jwt__Audience: "otopcua-dev"
Authentication__Ldap__Server: "ldap" Authentication__Ldap__DevStubMode: "true"
Authentication__Ldap__Port: "1389"
Authentication__Ldap__AllowInsecureLdap: "true"
admin-b: admin-b:
<<: *otopcua-host <<: *otopcua-host
@@ -120,9 +113,7 @@ services:
Security__Jwt__SigningKey: "docker-dev-signing-key-with-at-least-32-bytes-of-utf8-content-12345" Security__Jwt__SigningKey: "docker-dev-signing-key-with-at-least-32-bytes-of-utf8-content-12345"
Security__Jwt__Issuer: "otopcua-dev" Security__Jwt__Issuer: "otopcua-dev"
Security__Jwt__Audience: "otopcua-dev" Security__Jwt__Audience: "otopcua-dev"
Authentication__Ldap__Server: "ldap" Authentication__Ldap__DevStubMode: "true"
Authentication__Ldap__Port: "1389"
Authentication__Ldap__AllowInsecureLdap: "true"
driver-a: driver-a:
<<: *otopcua-host <<: *otopcua-host
@@ -170,9 +161,7 @@ services:
Security__Jwt__SigningKey: "docker-dev-signing-key-with-at-least-32-bytes-of-utf8-content-12345" Security__Jwt__SigningKey: "docker-dev-signing-key-with-at-least-32-bytes-of-utf8-content-12345"
Security__Jwt__Issuer: "otopcua-dev" Security__Jwt__Issuer: "otopcua-dev"
Security__Jwt__Audience: "otopcua-dev" Security__Jwt__Audience: "otopcua-dev"
Authentication__Ldap__Server: "ldap" Authentication__Ldap__DevStubMode: "true"
Authentication__Ldap__Port: "1389"
Authentication__Ldap__AllowInsecureLdap: "true"
ports: ports:
- "4842:4840" - "4842:4840"
@@ -194,9 +183,7 @@ services:
Security__Jwt__SigningKey: "docker-dev-signing-key-with-at-least-32-bytes-of-utf8-content-12345" Security__Jwt__SigningKey: "docker-dev-signing-key-with-at-least-32-bytes-of-utf8-content-12345"
Security__Jwt__Issuer: "otopcua-dev" Security__Jwt__Issuer: "otopcua-dev"
Security__Jwt__Audience: "otopcua-dev" Security__Jwt__Audience: "otopcua-dev"
Authentication__Ldap__Server: "ldap" Authentication__Ldap__DevStubMode: "true"
Authentication__Ldap__Port: "1389"
Authentication__Ldap__AllowInsecureLdap: "true"
ports: ports:
- "4843:4840" - "4843:4840"
@@ -217,9 +204,7 @@ services:
Security__Jwt__SigningKey: "docker-dev-signing-key-with-at-least-32-bytes-of-utf8-content-12345" Security__Jwt__SigningKey: "docker-dev-signing-key-with-at-least-32-bytes-of-utf8-content-12345"
Security__Jwt__Issuer: "otopcua-dev" Security__Jwt__Issuer: "otopcua-dev"
Security__Jwt__Audience: "otopcua-dev" Security__Jwt__Audience: "otopcua-dev"
Authentication__Ldap__Server: "ldap" Authentication__Ldap__DevStubMode: "true"
Authentication__Ldap__Port: "1389"
Authentication__Ldap__AllowInsecureLdap: "true"
ports: ports:
- "4844:4840" - "4844:4840"
@@ -241,9 +226,7 @@ services:
Security__Jwt__SigningKey: "docker-dev-signing-key-with-at-least-32-bytes-of-utf8-content-12345" Security__Jwt__SigningKey: "docker-dev-signing-key-with-at-least-32-bytes-of-utf8-content-12345"
Security__Jwt__Issuer: "otopcua-dev" Security__Jwt__Issuer: "otopcua-dev"
Security__Jwt__Audience: "otopcua-dev" Security__Jwt__Audience: "otopcua-dev"
Authentication__Ldap__Server: "ldap" Authentication__Ldap__DevStubMode: "true"
Authentication__Ldap__Port: "1389"
Authentication__Ldap__AllowInsecureLdap: "true"
ports: ports:
- "4845:4840" - "4845:4840"
@@ -256,7 +239,7 @@ services:
- --api.insecure=true - --api.insecure=true
ports: ports:
- "80:80" - "80:80"
- "8080:8080" - "8089:8080" # 8080 conflicts with the sister scadalink dev stack
volumes: volumes:
- ./traefik-dynamic.yml:/etc/traefik/dynamic.yml:ro - ./traefik-dynamic.yml:/etc/traefik/dynamic.yml:ro
depends_on: depends_on:

View File

@@ -1,35 +1,48 @@
#!/usr/bin/env bash #!/usr/bin/env bash
# docker-dev cluster-seed entrypoint. Waits for the host containers to finish # docker-dev cluster-seed entrypoint. Waits for the OtOpcUa ConfigDb schema to
# their EF Core auto-migration (which creates the ServerCluster table), then # be in place, then applies the idempotent row seed.
# applies the idempotent seed script.
# #
# Image: mcr.microsoft.com/mssql-tools (Debian + sqlcmd at /opt/mssql-tools18/bin). # IMPORTANT: this container does NOT run EF migrations — sqlcmd can't execute
# the V2 migration script cleanly because it contains CREATE PROCEDURE
# statements inside IF NOT EXISTS BEGIN ... END blocks (procs must be the
# first statement in their batch). Migrations are owned by the operator:
#
# dotnet ef database update \
# --project src/Core/ZB.MOM.WW.OtOpcUa.Configuration \
# --startup-project src/Server/ZB.MOM.WW.OtOpcUa.Host
#
# (with ConnectionStrings__ConfigDb pointing at Server=localhost,14330;...).
# Once the schema is in place, restart the cluster-seed container — or just
# `docker compose up -d` and the seed will pick up where it left off thanks to
# the IF NOT EXISTS guards in seed-clusters.sql.
set -euo pipefail set -euo pipefail
SQLCMD="/opt/mssql-tools18/bin/sqlcmd" SQLCMD="/opt/mssql-tools/bin/sqlcmd"
SERVER="${SQL_HOST:-sql},1433" SERVER="${SQL_HOST:-sql},1433"
USER="${SQL_USER:-sa}" USER="${SQL_USER:-sa}"
PASS="${SQL_PASSWORD:-OtOpcUa!Dev123}" PASS="${SQL_PASSWORD:-OtOpcUa!Dev123}"
DB="${SQL_DATABASE:-OtOpcUa}" DB="${SQL_DATABASE:-OtOpcUa}"
run_sql() { run_sql_in() {
"$SQLCMD" -S "$SERVER" -U "$USER" -P "$PASS" -d "$DB" -No -b -h -1 "$@" local target_db="$1"; shift
# -I forces SET QUOTED_IDENTIFIER ON (needed for filtered indexes if you
# ever extend this script to touch them).
"$SQLCMD" -S "$SERVER" -U "$USER" -P "$PASS" -d "$target_db" -b -h -1 -I "$@"
} }
echo "[cluster-seed] waiting for SQL Server to accept connections..." echo "[cluster-seed] waiting for SQL Server to accept connections..."
until run_sql -Q "SELECT 1" >/dev/null 2>&1; do until run_sql_in master -Q "SELECT 1" >/dev/null 2>&1; do
sleep 2 sleep 2
done done
echo "[cluster-seed] SQL Server up." echo "[cluster-seed] SQL Server up."
echo "[cluster-seed] waiting for $DB.ServerCluster (host containers must finish EF migration)..." echo "[cluster-seed] waiting for ${DB} database + dbo.ServerCluster table (operator must run dotnet ef database update)..."
until run_sql -Q "IF OBJECT_ID('dbo.ServerCluster') IS NULL THROW 50001, 'missing', 1; SELECT 1" >/dev/null 2>&1; do until run_sql_in "$DB" -Q "IF OBJECT_ID('dbo.ServerCluster') IS NULL THROW 50001, 'missing', 1; SELECT 1" >/dev/null 2>&1; do
sleep 3 sleep 3
done done
echo "[cluster-seed] schema ready." echo "[cluster-seed] schema ready."
echo "[cluster-seed] applying seed-clusters.sql..." echo "[cluster-seed] applying seed-clusters.sql (ServerCluster + ClusterNode rows)..."
run_sql -i /seed/seed-clusters.sql run_sql_in "$DB" -i /seed/seed-clusters.sql
echo "[cluster-seed] done." echo "[cluster-seed] done."

View File

@@ -24,18 +24,21 @@ public static class HealthEndpoints
public static IEndpointRouteBuilder MapOtOpcUaHealth(this IEndpointRouteBuilder app) public static IEndpointRouteBuilder MapOtOpcUaHealth(this IEndpointRouteBuilder app)
{ {
// AllowAnonymous on all three — Traefik / k8s liveness probes / load-balancers
// hit these without credentials. Without it the AddOtOpcUaAuth fallback policy
// 401s every probe and Traefik marks every backend unhealthy.
app.MapHealthChecks("/health/ready", new HealthCheckOptions app.MapHealthChecks("/health/ready", new HealthCheckOptions
{ {
Predicate = c => c.Tags.Contains("ready"), Predicate = c => c.Tags.Contains("ready"),
}); }).AllowAnonymous();
app.MapHealthChecks("/health/active", new HealthCheckOptions app.MapHealthChecks("/health/active", new HealthCheckOptions
{ {
Predicate = c => c.Tags.Contains("active"), Predicate = c => c.Tags.Contains("active"),
}); }).AllowAnonymous();
app.MapHealthChecks("/healthz", new HealthCheckOptions app.MapHealthChecks("/healthz", new HealthCheckOptions
{ {
Predicate = _ => false, // process-liveness only — no probes run. Predicate = _ => false, // process-liveness only — no probes run.
}); }).AllowAnonymous();
return app; return app;
} }
} }