Two bring-up issues found while clicking through the operator Deploy flow
on the docker-dev stack:
- ConfigPublishCoordinator computes expected-ack NodeIds from
Akka.Cluster.State.Members as "{host}:{port}" (e.g. "driver-a:4053") to
match ClusterRoleInfo's NodeId derivation. The seed had been using the
bare service name ("driver-a"), so the NodeDeploymentState INSERT failed
with SQL Server error 547 (foreign-key violation) on
NodeDeploymentState.NodeId → ClusterNode.NodeId. The seed now writes the
full host:port form for every ClusterNode row.
- Blazor Server uses SignalR (WebSocket upgrade after the initial GET).
Without sticky sessions, Traefik round-robins between admin-a and
admin-b, so the WebSocket upgrade can land on a backend that never
issued the connection ID. That backend returns "No Connection with that
ID: Status code '404'", and @onclick handlers never fire on the client.
Added sticky.cookie (otopcua_lb, SameSite=Lax) to all three Traefik
service loadBalancers so each session pins to one node.
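For reference, a minimal sketch of the sticky-cookie setting in Traefik's
file-provider YAML. The service name otopcua-admin and the backend port
9000 are assumptions for illustration; the cookie name and SameSite value
are the ones described above.

```yaml
http:
  services:
    otopcua-admin:            # hypothetical service name
      loadBalancer:
        sticky:
          cookie:
            name: otopcua_lb  # cookie Traefik sets to pin a client to one backend
            sameSite: lax
        servers:
          - url: "http://admin-a:9000"   # port is an assumption
          - url: "http://admin-b:9000"
```

With the cookie in place, the initial page GET and the later SignalR
requests from the same browser should reach the same admin node.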
Verified end-to-end: clicked "Deploy current configuration" on
/deployments → Deployment row sealed in ~70ms → driver-a + driver-b
spawn GalaxyMxGateway driver (stub=False) → GalaxyDriver connects to
http://10.100.0.48:5120 with the seeded ApiKeySecretRef=env:GALAXY_MXGW_API_KEY.
Extends the docker-dev compose with two additional, fully isolated Akka
clusters representing distinct sites. Each site is a 2-node fused
admin+driver cluster (OTOPCUA_ROLES=admin,driver on both nodes), backed
by its own ConfigDb database so configuration state stays separate from
the main cluster and from the other site.
Cluster isolation: the three meshes share the same Akka system name
"otopcua" and remoting port 4053 (inside each container's own network
namespace), but their seed-node lists are disjoint — main seeds at
admin-a, site-a seeds at site-a-1, site-b seeds at site-b-1 — so gossip
doesn't cross between them.
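A rough compose sketch of the disjoint seed lists. The environment key
Cluster__SeedNodes__0 is a hypothetical name (the real Cluster:* keys are
not shown here); the addresses follow the standard Akka.NET
akka.tcp://{system}@{host}:{port} form with the system name and port given
above.

```yaml
services:
  admin-a:
    environment:
      # hypothetical key; main cluster seeds at admin-a
      Cluster__SeedNodes__0: "akka.tcp://otopcua@admin-a:4053"
  site-a-1:
    environment:
      OTOPCUA_ROLES: "admin,driver"
      # site A seeds only at its own first node
      Cluster__SeedNodes__0: "akka.tcp://otopcua@site-a-1:4053"
  site-b-1:
    environment:
      OTOPCUA_ROLES: "admin,driver"
      # site B seeds only at its own first node
      Cluster__SeedNodes__0: "akka.tcp://otopcua@site-b-1:4053"
```

Because no node lists a seed from another cluster, no node ever attempts
to join a foreign mesh even though the system name and port match.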
Layout:
  Main cluster  ConfigDb=OtOpcUa        admin-a, admin-b, driver-a, driver-b
  Site A        ConfigDb=OtOpcUa_SiteA  site-a-1, site-a-2 (fused admin+driver)
  Site B        ConfigDb=OtOpcUa_SiteB  site-b-1, site-b-2 (fused admin+driver)
OPC UA endpoints exposed on host ports 4840-4845. Admin UIs reachable
through Traefik via Host-header routing:
  http://localhost         → main cluster (PathPrefix default)
  http://site-a.localhost  → site A
  http://site-b.localhost  → site B
`*.localhost` auto-resolves on macOS; Linux users add site-a.localhost
and site-b.localhost to /etc/hosts (or rely on the resolver's RFC 6761
behaviour).
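The routing above could look roughly like this in the Traefik dynamic
config. Router and service names are hypothetical, and the services they
point at are assumed to be defined elsewhere in the same file.

```yaml
http:
  routers:
    site-a:                             # hypothetical router name
      rule: "Host(`site-a.localhost`)"
      service: site-a-admin             # hypothetical service name
    site-b:
      rule: "Host(`site-b.localhost`)"
      service: site-b-admin
    main:
      rule: "PathPrefix(`/`)"           # catch-all for everything else
      service: main-admin
```

Traefik's default router priority is the rule length, so the two Host
rules are evaluated before the shorter PathPrefix catch-all without any
explicit priority setting.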
- scripts/install/traefik.yml + traefik-dynamic.yml: Traefik static + dynamic
config. One :80 entry point, one router on HostRegexp(otopcua.*), one
service load-balancing admin-a:9000 + admin-b:9000 with /health/active health
check (interval 5s, timeout 2s, expected 200). Followers return 503 from
/health/active so Traefik drops them within the next interval after a
leadership change.
- scripts/install/Install-Traefik.ps1: downloads Traefik for Windows, drops the
yml configs, registers the OtOpcUaTraefik Windows service via sc.exe with
restart-on-failure. Companion to Install-Services.ps1.
- docker-dev/{Dockerfile,docker-compose.yml,traefik-dynamic.yml,README.md}:
Mac-friendly four-node fleet (admin-a + admin-b + driver-a + driver-b) plus
SQL Server 2022 + OpenLDAP + Traefik. Single OtOpcUa.Host image built once;
Compose drives OTOPCUA_ROLES + Cluster:* per container to differentiate the
four hosts. README walks through bring-up + failover smoke + the dev LDAP
users.
Note: untested on macOS (no local Docker — see docs/v2/dev-environment.md).
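For reference, a sketch of the /health/active check described in the
traefik-dynamic.yml bullet above. The service name otopcua-admin is
hypothetical; the path, interval, timeout, and backend addresses are the
ones listed in that bullet.

```yaml
http:
  services:
    otopcua-admin:              # hypothetical service name
      loadBalancer:
        healthCheck:
          path: /health/active  # followers answer 503 here, the leader 200
          interval: 5s
          timeout: 2s
        servers:
          - url: "http://admin-a:9000"
          - url: "http://admin-b:9000"
```

A backend that fails the check is taken out of rotation until a later
check succeeds, which is what moves traffic after a leadership change.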