dohertj2/lmxopcua

Joseph Doherty ed1c17bc7b

fix(deploy,host): docker-dev bring-up — anon health probes, robust seeder

Two fixes surfaced while bringing up the docker-dev stack end-to-end:

- HealthEndpoints.MapOtOpcUaHealth now calls .AllowAnonymous() on /health/ready,
  /health/active, /healthz. Without it the AddOtOpcUaAuth fallback policy 401s
  every probe and Traefik marks every backend unhealthy → all three cluster
  routes return 503.

- cluster-seed entrypoint no longer attempts to apply Migrate-To-V2.sql via
  sqlcmd. The EF-generated idempotent script puts CREATE PROCEDURE inside
  IF NOT EXISTS BEGIN ... END blocks (procs must be first in their batch),
  so sqlcmd fails with "Must declare the scalar variable @FromGenerationId".
  EF's own runner handles this; sqlcmd doesn't. The seed now just waits for
  the schema and applies row inserts. Migrations remain the operator's job:
      dotnet ef database update --project src/Core/.../Configuration \
                                --startup-project src/Server/.../Host

Also: LDAP service removed (bitnami/openldap:2.6 image retired, legacy tag
crashes mid-setup with exit 68); every host now runs with
Authentication__Ldap__DevStubMode=true. Bumped LDAP+Traefik dashboard host
ports to avoid collisions with the sister scadalink dev stack (3893→3894,
8080→8089).

Confirmed working end-to-end: all three Traefik routes return HTTP 200,
cluster-seed populates ServerCluster (MAIN/SITE-A/SITE-B) + ClusterNode
(driver-a/b, site-a-1/2, site-b-1/2) rows on first boot.

2026-05-26 14:37:01 -04:00


docker-dev

Mac-friendly multi-cluster OtOpcUa fleet for manual UI exercise and integration smoke tests. Spins up three isolated Akka clusters, SQL Server, and Traefik on the same Compose network (the OpenLDAP service is currently removed; see the authentication note under "Shared infrastructure"). All three clusters share the single OtOpcUa ConfigDb; multi-tenancy is enforced by per-row ServerCluster.ClusterId scoping. Akka.Cluster gossip stays isolated between meshes because their seed-node lists are disjoint, even though they share the same system name otopcua.

Stack

Shared infrastructure

| Service | Role | Ports |
|---|---|---|
| `sql` | SQL Server 2022; single `OtOpcUa` ConfigDb shared by all three clusters | host `14330` → container `1433` |
| `traefik` | Routes :80 by Host header / PathPrefix | host `80`, dashboard `8089` |

Authentication runs in DevStubMode — every host container has Authentication__Ldap__DevStubMode=true set, so the LDAP service is not part of the dev compose right now (the bitnami/openldap:2.6 image was retired and the legacy tag crashes mid-setup with exit 68). Any non-empty username/password signs in as FleetAdmin. To restore a real LDAP service, drop the env var and add an openldap-compatible image back to compose.

Main cluster — split admin/driver roles

| Service | Role | Ports |
|---|---|---|
| `admin-a` | `OTOPCUA_ROLES=admin`, cluster seed | internal `9000` |
| `admin-b` | `OTOPCUA_ROLES=admin`, joins admin-a | internal `9000` |
| `driver-a` | `OTOPCUA_ROLES=driver` | host `4840` → container `4840` |
| `driver-b` | `OTOPCUA_ROLES=driver` | host `4841` → container `4840` |

Site A cluster — 2-node fused admin+driver

| Service | Role | Ports |
|---|---|---|
| `site-a-1` | `OTOPCUA_ROLES=admin,driver`, cluster seed | host `4842` → container `4840` |
| `site-a-2` | `OTOPCUA_ROLES=admin,driver`, joins site-a-1 | host `4843` → container `4840` |

Site B cluster — 2-node fused admin+driver

| Service | Role | Ports |
|---|---|---|
| `site-b-1` | `OTOPCUA_ROLES=admin,driver`, cluster seed | host `4844` → container `4840` |
| `site-b-2` | `OTOPCUA_ROLES=admin,driver`, joins site-b-1 | host `4845` → container `4840` |

All containers bind Akka remoting to port 4053 inside their own network namespace; the PublicHostname of each matches its Compose service name. Akka mesh isolation is enforced purely by disjoint seed lists. Configuration-side isolation is enforced by ServerCluster.ClusterId — see "Multi-tenancy" below.
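The isolation-by-seed-list idea can be sketched in a few lines of shell. This is an illustration only: the helper names `seed_address` and `seeds_disjoint` are hypothetical, and the `akka.tcp://` scheme is an assumption about the remoting transport. The system name `otopcua`, port `4053`, and the seed service names come from the text above.

```shell
#!/bin/sh
# Sketch only: helper names and the akka.tcp:// scheme are assumptions.
# System name (otopcua), port (4053), and seed hosts come from this README.

# Build the Akka seed-node address for a Compose service name.
seed_address() {
    printf 'akka.tcp://otopcua@%s:4053\n' "$1"
}

# Succeed when no entry appears in both space-separated lists.
seeds_disjoint() {
    for a in $1; do
        for b in $2; do
            if [ "$a" = "$b" ]; then
                return 1
            fi
        done
    done
    return 0
}
```

For example, `seeds_disjoint "admin-a" "site-a-1"` succeeds, which is the property that keeps the main and Site A meshes from gossiping with each other even though both use the `otopcua` system name.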

Multi-tenancy

All eight host nodes write to the same OtOpcUa ConfigDb. The ServerCluster table differentiates the three Akka meshes: each Akka cluster maps to one row, and each ClusterNode row's ClusterId ties the runtime node back to its owning cluster scope.

A one-shot cluster-seed Compose service (image mcr.microsoft.com/mssql-tools) waits for SQL + the EF auto-migration to complete and then INSERTs the rows below. The seed is idempotent — IF NOT EXISTS guards every insert — so re-runs on docker compose up are no-ops:

| Akka mesh | `ServerCluster.ClusterId` | `ClusterNode.NodeId` rows |
|---|---|---|
| Main | `MAIN` | `driver-a`, `driver-b` (OPC UA publishers) |
| Site A | `SITE-A` | `site-a-1`, `site-a-2` |
| Site B | `SITE-B` | `site-b-1`, `site-b-2` |
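To illustrate why re-running the seed is a no-op, here is a minimal sketch of an `IF NOT EXISTS`-guarded insert, generated from shell. It is not the contents of seed/seed-clusters.sql: the function name `guarded_cluster_insert` is hypothetical, and the statement is simplified to the `ClusterId` column only (the real table very likely has more columns).

```shell
#!/bin/sh
# Sketch only: emits a guarded T-SQL INSERT for one ServerCluster row.
# Simplified to the ClusterId column; other columns are deliberately omitted.
guarded_cluster_insert() {
    cat <<EOF
IF NOT EXISTS (SELECT 1 FROM ServerCluster WHERE ClusterId = N'$1')
    INSERT INTO ServerCluster (ClusterId) VALUES (N'$1');
EOF
}
```

Calling `guarded_cluster_insert MAIN` prints a statement that inserts the row on first run and does nothing afterwards, because the guard finds the existing row.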

ClusterNode is the table for OPC UA-publishing nodes (not every Akka cluster member), which is why the main cluster's admin-a / admin-b don't get rows — they're control-plane-only.

Each ClusterNode.NodeId matches the node's Cluster__PublicHostname env value (Compose service name) — that's the lookup the runtime uses to resolve its own membership. ApplicationUri follows the urn:OtOpcUa:<NodeId> convention.
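The naming convention is mechanical enough to express as a one-line helper. The function name `application_uri` is hypothetical; only the `urn:OtOpcUa:<NodeId>` pattern comes from the text above.

```shell
#!/bin/sh
# Sketch only: derive the ApplicationUri from a ClusterNode.NodeId,
# following the urn:OtOpcUa:<NodeId> convention described in this README.
application_uri() {
    printf 'urn:OtOpcUa:%s\n' "$1"
}
```

Because the NodeId equals the Compose service name, `application_uri site-a-1` gives the value a client should expect to see for that container.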

The SQL lives at seed/seed-clusters.sql; the wait-and-apply wrapper lives at seed/entrypoint.sh. To re-seed manually:

docker compose -f docker-dev/docker-compose.yml run --rm cluster-seed

Bring up

# from the repo root
docker compose -f docker-dev/docker-compose.yml up -d --build

# wait ~20 seconds for SQL to come up + all three clusters to form

open http://localhost                      # main cluster admin UI
open http://site-a.localhost               # site A admin UI
open http://site-b.localhost               # site B admin UI
open http://localhost:8089                 # Traefik dashboard

On macOS, *.localhost resolves to 127.0.0.1 automatically. On Linux add 127.0.0.1 site-a.localhost site-b.localhost to /etc/hosts if your resolver doesn't.

The first build takes a few minutes (.NET SDK image + restore + publish). Subsequent rebuilds are faster with Docker's layer cache.
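Rather than waiting a fixed interval, a small polling helper can block until a route answers. This is a sketch under assumptions: the function names `probe` and `wait_for`, the retry count, and the delay are all illustrative choices, not part of the repo.

```shell
#!/bin/sh
# Sketch only: poll a URL until it responds, with a bounded number of tries.

# Probe one URL. Fails on connection errors and on HTTP 4xx/5xx responses.
probe() {
    curl -fsS -o /dev/null --max-time 2 "$1"
}

# wait_for URL [TRIES] [DELAY_SECONDS]
wait_for() {
    url=$1
    tries=${2:-30}
    delay=${3:-2}
    i=0
    while [ "$i" -lt "$tries" ]; do
        if probe "$url"; then
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    echo "timed out waiting for $url" >&2
    return 1
}
```

A typical use would be `wait_for http://localhost && wait_for http://site-a.localhost && wait_for http://site-b.localhost` before opening the admin UIs. Keeping `probe` separate makes it easy to swap in a different check.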

Auth (dev only)

Authentication__Ldap__DevStubMode=true is set on every host container, so any non-empty username/password signs in as a FleetAdmin user without contacting an LDAP server. Do not ship this configuration to production — set DevStubMode=false and wire a real LDAP backend before any non-dev deployment.

Tear down

docker compose -f docker-dev/docker-compose.yml down -v

The -v flag drops the named volumes, including the SQL data volume; omit it to keep ConfigDb state across restarts.

Failover smoke

1. Watch the Traefik dashboard at http://localhost:8089. Both admin-a and admin-b should be listed as healthy in the otopcua-admin service.
2. Run `docker compose -f docker-dev/docker-compose.yml stop admin-a`. admin-b should take over as the admin role leader within ~15 s (Akka split-brain stable-after). Traefik will route traffic to admin-b once its /health/active returns 200.
3. Run `docker compose -f docker-dev/docker-compose.yml start admin-a`. admin-a rejoins as a follower; admin-b keeps the leader role until something disturbs it.

Notes

- This compose is for the local Mac/Linux developer rig. The team's CI + soak runs go to the remote docker host at 10.100.0.35 (see docs/v2/dev-environment.md); the file there mirrors this one with adjusted port bindings.
- The OPC UA driver endpoints are reachable directly from the host (Traefik is only in front of the admin HTTP surface):
  - Main: opc.tcp://localhost:4840 (driver-a), opc.tcp://localhost:4841 (driver-b)
  - Site A: opc.tcp://localhost:4842 (site-a-1), opc.tcp://localhost:4843 (site-a-2)
  - Site B: opc.tcp://localhost:4844 (site-b-1), opc.tcp://localhost:4845 (site-b-2)
- Galaxy + Wonderware drivers can't run in Linux containers (they need the Windows-only mxaccessgw + Historian SDK). On non-Windows, DriverInstanceActor.ShouldStub(driverType, roles) returns true for those types and the actor goes straight to a Stubbed state that returns deterministic success.
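For scripting against the endpoints listed above, the service-to-port mapping can be captured in a small lookup. The function name `opcua_endpoint` is hypothetical; the ports are the host bindings given in this README.

```shell
#!/bin/sh
# Sketch only: map a Compose service name to its host-side OPC UA endpoint.
# Ports are the host bindings listed in this README.
opcua_endpoint() {
    case "$1" in
        driver-a) port=4840 ;;
        driver-b) port=4841 ;;
        site-a-1) port=4842 ;;
        site-a-2) port=4843 ;;
        site-b-1) port=4844 ;;
        site-b-2) port=4845 ;;
        *) echo "unknown service: $1" >&2; return 1 ;;
    esac
    printf 'opc.tcp://localhost:%s\n' "$port"
}
```

The admin-only services (admin-a, admin-b) are rejected on purpose, since they do not publish an OPC UA endpoint.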
