# Traefik Proxy

The Traefik Proxy is the reverse proxy and load balancer that fronts the central cluster's two web servers. It exposes a single stable entrypoint for all central traffic (Central UI, Management API, Inbound API) and routes exclusively to whichever central node is currently the Akka.NET cluster leader, using a health-check on each node's `/health/active` endpoint to make that determination. When the active node changes, Traefik detects the change on its next poll cycle and redirects traffic automatically, with no operator intervention.

## Overview

The proxy runs as the `scadabridge-traefik` Docker container in the main compose stack (`docker/docker-compose.yml`). It is a third-party infrastructure component (Traefik; the image tag is pinned in `docker/docker-compose.yml`), so there is no C# project for it. Its entire configuration is two YAML files mounted read-only into the container:

- `docker/traefik/traefik.yml`: static config (entrypoints, API dashboard, and file provider declaration).
- `docker/traefik/dynamic.yml`: routing rules (the router that catches all traffic, the `central` load-balancer service listing both backend nodes, and the `/health/active` health-check settings).

The proxy sits on the `scadabridge-net` Docker bridge network alongside both central nodes (`scadabridge-central-a`, `scadabridge-central-b`) and all site containers, so it can reach the central backends by container name.

## Key Concepts

### Active-node routing via `/health/active`

Traefik does not know which central node is the Akka.NET cluster leader; it discovers this by polling `/health/active` on both backends. The Host registers `ActiveNodeHealthCheck` under the `Active` health tag; `app.MapZbHealth()` serves it at `/health/active`. The check returns HTTP 200 on the leader and HTTP 503 on the standby (or when the actor system has not yet reached `MemberStatus.Up`):

```csharp
public bool IsActiveNode
{
    get
    {
        var system = _akkaService.ActorSystem;
        if (system == null) return false;

        var cluster = Cluster.Get(system);
        var self = cluster.SelfMember;
        if (self.Status != MemberStatus.Up) return false;

        var leader = cluster.State.Leader;
        return leader != null && leader == self.Address;
    }
}
```

The identical leadership check backs `ActiveNodeGate`, the `IActiveNodeGate` implementation the Inbound API endpoint filter consults before executing method scripts. Both surfaces agree on which node is active because they share the same Akka cluster state.

### Automatic failover

When the active central node goes down, the Akka cluster's keep-oldest split-brain resolver promotes the surviving node to leader (roughly 25 seconds: 10-second heartbeat threshold plus a 15-second stable-after period). Once the surviving node's `ActiveNodeHealthCheck` starts returning 200, Traefik's next poll cycle (within the 5-second interval) removes the failed backend from the pool and routes all subsequent requests to the new active node. No config change or restart is required on the Traefik side.

## Architecture

### Docker topology

```text
Clients (CLI, browser, external API)
        │
   host:9000 (HTTP)
        │
┌───────▼──────────────────┐
│  scadabridge-traefik     │  (Traefik container)
│  entrypoint :80          │
└──────┬──────────┬────────┘
       │  /health/active poll (5s)
       ▼          ▼
scadabridge-     scadabridge-
central-a:5000   central-b:5000
(ACTIVE → 200)   (STANDBY → 503)
```

Clients always connect to `http://localhost:9000`. The two central nodes are also reachable directly (`central-a` on host port 9001, `central-b` on host port 9002), but these bypass the load balancer and should be used only for direct debugging. The Traefik dashboard is accessible at `http://localhost:8180`.

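
When debugging, the direct ports make it possible to see which node currently reports itself as active without going through Traefik. The following is a minimal sketch, not part of the repository: the function names `classify_status`, `probe_status`, and `report_roles` are hypothetical, and it assumes `curl` is installed on the host. It relies on `curl` printing `000` for `%{http_code}` when no HTTP response was received.

```shell
#!/usr/bin/env bash
# Sketch: report each central node's role via the direct debug ports.
# All function names here are illustrative, not from the repository.

# Map the HTTP status returned by /health/active to a role label.
classify_status() {
  case "$1" in
    200) echo "active" ;;       # node is the cluster leader
    503) echo "not-active" ;;   # standby, or actor system not yet Up
    000) echo "unreachable" ;;  # curl received no HTTP response
    *)   echo "unknown" ;;
  esac
}

# Print only the status code for a URL. curl exits non-zero when the
# connection fails, so that exit status is ignored here.
probe_status() {
  curl -s -o /dev/null -w '%{http_code}' --max-time 3 "$1" || true
}

# Probe both nodes on the host ports listed in this document.
report_roles() {
  local pair name port status
  for pair in central-a:9001 central-b:9002; do
    name="${pair%%:*}"
    port="${pair##*:}"
    status="$(probe_status "http://localhost:${port}/health/active")"
    echo "${name}: $(classify_status "$status")"
  done
}
```

After sourcing the file, running `report_roles` prints one line per node. In normal operation one node should report `active` and the other `not-active`; because the probe runs from the host, an `unreachable` result says nothing about whether Traefik can reach the node over `scadabridge-net`.
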
### Request flow

Every incoming request on the `web` entrypoint hits the `central` router, which matches all paths (`PathPrefix("/")`) and forwards to the `central` load-balancer service. The load balancer only includes servers that are currently passing the health check, so in normal operation all traffic goes to the single healthy (active) backend.

## Usage

Traefik starts automatically with the cluster compose stack:

```bash
# Start full cluster (includes Traefik)
docker compose -f docker/docker-compose.yml up -d

# Check Traefik dashboard (shows backend health status)
open http://localhost:8180

# Verify routing: reaches the active node
curl http://localhost:9000/health/active

# Direct node access (bypasses Traefik; use for debugging only)
curl http://localhost:9001/health/active   # central-a
curl http://localhost:9002/health/active   # central-b
```

The Traefik container's `restart: unless-stopped` policy means it recovers automatically after a Docker host restart.

## Configuration

### Static config (`docker/traefik/traefik.yml`)

```yaml
entryPoints:
  web:
    address: ":80"

api:
  dashboard: true
  insecure: true

providers:
  file:
    filename: /etc/traefik/dynamic.yml
```

| Key | Value | Effect |
|-----|-------|--------|
| `entryPoints.web.address` | `:80` | Listens on container port 80, mapped to host port 9000. |
| `api.dashboard` | `true` | Enables the Traefik web dashboard. |
| `api.insecure` | `true` | Serves the dashboard on port 8080 without auth (development only). |
| `providers.file.filename` | `/etc/traefik/dynamic.yml` | Loads routing rules from the mounted dynamic config; no Docker socket required. |

### Dynamic config (`docker/traefik/dynamic.yml`)

```yaml
http:
  routers:
    central:
      rule: "PathPrefix(`/`)"
      service: central
      entryPoints:
        - web

  services:
    central:
      loadBalancer:
        healthCheck:
          path: /health/active
          interval: 5s
          timeout: 3s
        servers:
          - url: "http://scadabridge-central-a:5000"
          - url: "http://scadabridge-central-b:5000"
```

| Setting | Value | Effect |
|---------|-------|--------|
| `routers.central.rule` | `PathPrefix("/")` | Catches every request on the `web` entrypoint. |
| `services.central.loadBalancer.healthCheck.path` | `/health/active` | The endpoint Traefik polls on each backend. |
| `services.central.loadBalancer.healthCheck.interval` | `5s` | Poll cadence; a backend failing the check is removed within one interval. |
| `services.central.loadBalancer.healthCheck.timeout` | `3s` | Per-poll timeout; a non-responding backend counts as unhealthy. |
| `servers[0].url` | `http://scadabridge-central-a:5000` | `central-a` backend, reachable by container name on `scadabridge-net`. |
| `servers[1].url` | `http://scadabridge-central-b:5000` | `central-b` backend, reachable by container name on `scadabridge-net`. |

### Port mapping

| Host port | Container port | Purpose |
|-----------|---------------|---------|
| `9000` | `80` | Load-balanced entrypoint for all central traffic (Central UI, Management API, Inbound API). |
| `8180` | `8080` | Traefik dashboard. |
| `9001` | `5000` | Direct access to `central-a` (bypasses Traefik). |
| `9002` | `5000` | Direct access to `central-b` (bypasses Traefik). |

## Dependencies & Interactions

- [Host (#15)](./Host.md): implements and serves `/health/active` via `ActiveNodeHealthCheck` (tagged `Active`, mounted by `app.MapZbHealth()`). Also implements `ActiveNodeGate`, which enforces the same active-node contract at the Inbound API filter level, providing a defence-in-depth layer if traffic reaches the standby directly.
- [Cluster Infrastructure (#13)](./ClusterInfrastructure.md): the underlying Akka.NET cluster determines which node is the leader. Traefik's routing decision is derived entirely from cluster leadership state via the health-check poll; Traefik has no Akka dependency of its own.
- [Central UI (#9)](./CentralUI.md): Blazor Server (SignalR/WebSocket circuits) is proxied through Traefik. Traefik proxies WebSocket connections natively with no additional config. On failover, active SignalR circuits on the failed node are lost; the browser's reconnection logic re-establishes the circuit on the new active node. Session continuity is preserved because authentication uses a cookie-embedded JWT with Data Protection keys shared across both central nodes.
- [Inbound API (#14)](./InboundAPI.md): external API consumers target `http://localhost:9000/api/{methodName}`. Traefik routes each request to the active node; if a request reaches the standby directly (bypassing Traefik), `ActiveNodeGate` responds with HTTP 503.
- [CLI (#19)](./CLI.md): the CLI connects to the Management API via `http://localhost:9000` (the Traefik entrypoint) by default, so it always reaches the active central node without needing to know which node is active.

## Troubleshooting

### Both backends show unhealthy on the dashboard

If both `central-a` and `central-b` appear red on the Traefik dashboard, neither node's `ActiveNodeHealthCheck` is returning 200. Common causes:

1. **Akka cluster has not formed yet**: both nodes are still starting. Wait for the cluster to stabilise (typically 10–15 seconds after both containers are up). Check the central node logs for `Cluster is now ready`.
2. **Split-brain resolver has downed both nodes**: a network partition followed by a split-brain condition. Restart the cluster via `bash docker/deploy.sh`.
3. **Traefik cannot reach the backends**: the `scadabridge-net` Docker network may not exist. Create it: `docker network create scadabridge-net`.

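
The first and third causes above can be checked from the host with standard Docker commands. The sketch below is illustrative only: the function names `cluster_ready`, `network_exists`, and `triage` are hypothetical, and it assumes the container names and the `Cluster is now ready` log line quoted in this document. It does not detect the split-brain case, which still needs a look at the node logs.

```shell
#!/usr/bin/env bash
# Triage sketch for "both backends unhealthy".
# Function names are illustrative, not from the repository.

# Succeeds if the log text on stdin contains the readiness line.
cluster_ready() {
  grep -q 'Cluster is now ready'
}

# Succeeds if the named Docker network exists.
network_exists() {
  docker network inspect "$1" >/dev/null 2>&1
}

# Check the network, then look for the readiness line in each node's logs.
triage() {
  local node
  if network_exists scadabridge-net; then
    echo "network scadabridge-net: present"
  else
    echo "network scadabridge-net: missing"
  fi
  for node in scadabridge-central-a scadabridge-central-b; do
    # docker logs can write to stderr as well, so merge both streams.
    if docker logs "$node" 2>&1 | cluster_ready; then
      echo "${node}: readiness line found"
    else
      echo "${node}: readiness line not found"
    fi
  done
}
```

After sourcing the file, `triage` prints one line for the network and one per central node. A missing readiness line shortly after startup usually just means the cluster is still forming, so it is worth re-running after a short wait before restarting anything.
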
### Traffic reaches a standby node

If a client receives HTTP 503 with `X-ScadaBridge-Active: false`, the request reached a standby node, either because Traefik has not yet completed its health-check poll after a failover (up to 5 seconds), or because the client is connecting directly to port 9001/9002 instead of port 9000. Use `http://localhost:9000` for all normal access. The 503 is transient during the Traefik poll window; the client should retry.

### Health check succeeds but `/health/ready` returns degraded

`/health/active` and `/health/ready` are independent. A node can pass the active check (it is the leader) but fail the readiness check (database or Akka cluster health probe failed). Traefik only uses `/health/active`; readiness gating is for orchestration and monitoring. Check the node's structured logs for `database` or `akka-cluster` check failures.

## Related Documentation

- [Traefik Proxy design specification](../requirements/Component-TraefikProxy.md)
- [Host](./Host.md)
- [Cluster Infrastructure](./ClusterInfrastructure.md)
- [Central UI](./CentralUI.md)
- [Inbound API](./InboundAPI.md)
- [CLI](./CLI.md)