Files
lmxopcua/docker-dev/traefik-dynamic.yml
Joseph Doherty 44b8a9c7ff
Some checks failed
v2-ci / build (push) Failing after 41s
v2-ci / unit-tests (tests/Core/ZB.MOM.WW.OtOpcUa.Cluster.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.ControlPlane.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Security.Tests) (push) Has been skipped
v2-ci / integration (tests/Server/ZB.MOM.WW.OtOpcUa.Host.IntegrationTests) (push) Has been skipped
v2-ci / integration (tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.IntegrationTests) (push) Has been skipped
fix(deploy): ClusterNode NodeId uses host:port + Traefik sticky cookie
Two bring-up issues found while clicking through the operator Deploy flow
on the docker-dev stack:

- ConfigPublishCoordinator computes expected-ack NodeIds from
  Akka.Cluster.State.Members as "{host}:{port}" (e.g. "driver-a:4053") to
  match ClusterRoleInfo's NodeId derivation. The seed had been using the
  bare service name ("driver-a"), so NodeDeploymentState INSERT hit FK
  violation 547 on NodeDeploymentState.NodeId → ClusterNode.NodeId. Seed
  now writes the full host:port form for every ClusterNode row.

- Blazor Server uses SignalR (WebSocket upgrade after the initial GET).
  Without sticky sessions, Traefik round-robins admin-a/admin-b and the
  WebSocket upgrade lands on the wrong backend, returning "No Connection
  with that ID: Status code '404'" so @onclick handlers never fire on the
  client. Added sticky.cookie (otopcua_lb, SameSite=Lax) to all three
  Traefik service loadBalancers so each session pins to one node.

Verified end-to-end: clicked "Deploy current configuration" on
/deployments → Deployment row sealed in ~70ms → driver-a + driver-b
spawn GalaxyMxGateway driver (stub=False) → GalaxyDriver connects to
http://10.100.0.48:5120 with the seeded ApiKeySecretRef=env:GALAXY_MXGW_API_KEY.
2026-05-26 15:10:11 -04:00

82 lines
2.6 KiB
YAML

# docker-dev companion to scripts/install/traefik-dynamic.yml. Routes three
# Akka clusters that share the Compose network:
#
# - Main cluster (default): PathPrefix(`/`) → admin-a / admin-b.
# - Site A cluster: Host(`site-a.localhost`) → site-a-1 / site-a-2.
# - Site B cluster: Host(`site-b.localhost`) → site-b-1 / site-b-2.
#
# Host-header rules are more specific than PathPrefix, so they win over the
# default router for the site hostnames automatically — no priority field needed.
http:
routers:
otopcua-admin:
entryPoints: ["web"]
rule: "PathPrefix(`/`)"
service: otopcua-admin
otopcua-site-a:
entryPoints: ["web"]
rule: "Host(`site-a.localhost`)"
service: otopcua-site-a
otopcua-site-b:
entryPoints: ["web"]
rule: "Host(`site-b.localhost`)"
service: otopcua-site-b
services:
otopcua-admin:
loadBalancer:
# Blazor Server uses SignalR; the WebSocket upgrade must hit the same
# backend that owns the circuit ID. Sticky cookie keeps each session
# pinned to one node so the post-handshake WebSocket doesn't 404.
sticky:
cookie:
name: otopcua_lb
httpOnly: true
sameSite: lax
servers:
- url: "http://admin-a:9000"
- url: "http://admin-b:9000"
healthCheck:
path: /health/active
interval: 5s
timeout: 2s
otopcua-site-a:
loadBalancer:
# Blazor Server uses SignalR; the WebSocket upgrade must hit the same
# backend that owns the circuit ID. Sticky cookie keeps each session
# pinned to one node so the post-handshake WebSocket doesn't 404.
sticky:
cookie:
name: otopcua_lb
httpOnly: true
sameSite: lax
servers:
- url: "http://site-a-1:9000"
- url: "http://site-a-2:9000"
healthCheck:
path: /health/active
interval: 5s
timeout: 2s
otopcua-site-b:
loadBalancer:
# Blazor Server uses SignalR; the WebSocket upgrade must hit the same
# backend that owns the circuit ID. Sticky cookie keeps each session
# pinned to one node so the post-handshake WebSocket doesn't 404.
sticky:
cookie:
name: otopcua_lb
httpOnly: true
sameSite: lax
servers:
- url: "http://site-b-1:9000"
- url: "http://site-b-2:9000"
healthCheck:
path: /health/active
interval: 5s
timeout: 2s