# ScadaLink Docker Infrastructure

Local Docker deployment of the full ScadaLink cluster topology: a 2-node central cluster and three 2-node site clusters.

## Cluster Topology

```
             ┌───────────────────┐
             │ Traefik LB :9000  │ ◄── CLI / Browser
             │ Dashboard  :8180  │
             └─────────┬─────────┘
                       │ routes to active node
┌──────────────────────┼─────────────────────────┐
│ Central Cluster      │                         │
│                      │                         │
│  ┌─────────────────┐    ┌─────────────────┐    │
│  │ central-node-a  │◄──►│ central-node-b  │    │
│  │ (leader/oldest) │    │ (standby)       │    │
│  │ Web UI :9001    │    │ Web UI :9002    │    │
│  │ Akka   :9011    │    │ Akka   :9012    │    │
│  └────────┬────────┘    └─────────────────┘    │
│           │                                    │
└───────────┼────────────────────────────────────┘
            │ Akka.NET Remoting (hub-and-spoke)
            ├───────────────────────┬───────────────────────┐
            ▼                       ▼                       ▼
  ┌────────────────────┐  ┌────────────────────┐  ┌────────────────────┐
  │ Site-A Cluster     │  │ Site-B Cluster     │  │ Site-C Cluster     │
  │ (Test Plant A)     │  │ (Test Plant B)     │  │ (Test Plant C)     │
  │                    │  │                    │  │                    │
  │ node-a ◄──► node-b │  │ node-a ◄──► node-b │  │ node-a ◄──► node-b │
  │ Akka :9021  :9022  │  │ Akka :9031  :9032  │  │ Akka :9041  :9042  │
  │ gRPC :9023  :9024  │  │ gRPC :9033  :9034  │  │ gRPC :9043  :9044  │
  └────────────────────┘  └────────────────────┘  └────────────────────┘
```

### Central Cluster (active/standby)

Runs the web UI (Blazor Server), Template Engine, Deployment Manager, Security, Inbound API, Management Service, and Health Monitoring. Connects to MS SQL for configuration and machine data, LDAP for authentication, and SMTP for notifications.

### Site Clusters (active/standby each)

Each site cluster runs Site Runtime, Data Connection Layer, Store-and-Forward, and Site Event Logging. Sites connect to OPC UA for device data and to the central cluster via Akka.NET remoting. Each site node also hosts a gRPC streaming server (internal port 8083) that central nodes connect to for real-time attribute value and alarm state streams. Deployed configurations and S&F buffers are stored in local SQLite databases, one per node.
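As a concrete illustration, one site node from the topology above could be wired up in `docker/docker-compose.yml` roughly as follows. This is a hedged sketch only: the image name, service key, and mount paths are assumptions, not copied from the actual file; the ports and container name come from the tables in this document.

```yaml
# Hypothetical compose service for Site-A node A; the real
# docker/docker-compose.yml may differ in image, paths, and settings.
services:
  site-a-a:
    image: scadalink:latest
    container_name: scadalink-site-a-a
    ports:
      - "9021:8082"   # host Akka port -> internal Akka remoting port
      - "9023:8083"   # host gRPC port -> internal gRPC streaming port
    volumes:
      - ./site-a-node-a/appsettings.Site.json:/app/appsettings.Site.json:ro
      - ./site-a-node-a/data:/app/data   # SQLite (deployed configs, S&F buffers)
      - ./site-a-node-a/logs:/app/logs   # Serilog file output
    networks:
      - scadalink-net
```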
| Site Cluster | Site Identifier | Central UI Name |
|--------------|-----------------|-----------------|
| Site-A       | `site-a`        | Test Plant A    |
| Site-B       | `site-b`        | Test Plant B    |
| Site-C       | `site-c`        | Test Plant C    |

## Port Allocation

### Application Nodes

| Node | Container Name | Host Web Port | Host Akka Port | Host gRPC Port | Internal Ports |
|------|----------------|---------------|----------------|----------------|----------------|
| Traefik LB | `scadalink-traefik` | 9000 | — | — | 80 (proxy), 8080 (dashboard) |
| Central A | `scadalink-central-a` | 9001 | 9011 | — | 5000 (web), 8081 (Akka) |
| Central B | `scadalink-central-b` | 9002 | 9012 | — | 5000 (web), 8081 (Akka) |
| Site-A A | `scadalink-site-a-a` | — | 9021 | 9023 | 8082 (Akka), 8083 (gRPC) |
| Site-A B | `scadalink-site-a-b` | — | 9022 | 9024 | 8082 (Akka), 8083 (gRPC) |
| Site-B A | `scadalink-site-b-a` | — | 9031 | 9033 | 8082 (Akka), 8083 (gRPC) |
| Site-B B | `scadalink-site-b-b` | — | 9032 | 9034 | 8082 (Akka), 8083 (gRPC) |
| Site-C A | `scadalink-site-c-a` | — | 9041 | 9043 | 8082 (Akka), 8083 (gRPC) |
| Site-C B | `scadalink-site-c-b` | — | 9042 | 9044 | 8082 (Akka), 8083 (gRPC) |

Port block pattern: `90X1`/`90X2` (Akka) and `90X3`/`90X4` (gRPC), where X = 1 (central), 2 (site-a), 3 (site-b), 4 (site-c). The gRPC streaming ports are used by central nodes to subscribe to real-time site data streams.
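The port-block pattern can be sketched as a tiny shell helper. This is illustrative only: `akka_port` and `grpc_port` are hypothetical names, not scripts that ship with the repo.

```shell
#!/usr/bin/env sh
# Derive host ports from the 90X1/90X2 (Akka) and 90X3/90X4 (gRPC) pattern.
# cluster: 1 = central, 2 = site-a, 3 = site-b, 4 = site-c
# node:    1 = node A,  2 = node B
# (Central nodes expose no host gRPC ports, so grpc_port applies to sites only.)
akka_port() { echo $((9000 + $1 * 10 + $2)); }
grpc_port() { echo $((9000 + $1 * 10 + 2 + $2)); }

akka_port 2 1   # site-a node A Akka -> 9021
grpc_port 4 2   # site-c node B gRPC -> 9044
```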
### Infrastructure Services (from `infra/docker-compose.yml`)

| Service | Container Name | Host Port | Purpose |
|---------|----------------|-----------|---------|
| MS SQL 2022 | `scadalink-mssql` | 1433 | Configuration and machine data databases |
| LDAP (GLAuth) | `scadalink-ldap` | 3893 | Authentication with test users |
| SMTP (Mailpit) | `scadalink-smtp` | 1025 / 8025 | Email capture (SMTP / web UI) |
| OPC UA | `scadalink-opcua` | 50000 / 8080 | Simulated OPC UA server (protocol / web UI) |
| REST API | `scadalink-restapi` | 5200 | External REST API for integration testing |

All containers communicate over the shared `scadalink-net` Docker bridge network, using container names as hostnames.

## Directory Structure

```
docker/
├── Dockerfile                      # Multi-stage build (shared by all nodes)
├── docker-compose.yml              # 8-node application stack
├── build.sh                        # Build Docker image
├── deploy.sh                       # Build + deploy all containers
├── seed-sites.sh                   # Create test sites with Akka + gRPC addresses
├── teardown.sh                     # Stop and remove containers
├── central-node-a/
│   ├── appsettings.Central.json    # Central node A configuration
│   └── logs/                       # Serilog file output (gitignored)
├── central-node-b/
│   ├── appsettings.Central.json
│   └── logs/
├── site-a-node-a/
│   ├── appsettings.Site.json       # Site-A node A configuration
│   ├── data/                       # SQLite databases (gitignored)
│   └── logs/
├── site-a-node-b/
│   ├── appsettings.Site.json
│   ├── data/
│   └── logs/
├── site-b-node-a/
│   ├── appsettings.Site.json       # Site-B node A configuration
│   ├── data/
│   └── logs/
├── site-b-node-b/
│   ├── appsettings.Site.json
│   ├── data/
│   └── logs/
├── site-c-node-a/
│   ├── appsettings.Site.json       # Site-C node A configuration
│   ├── data/
│   └── logs/
└── site-c-node-b/
    ├── appsettings.Site.json
    ├── data/
    └── logs/
```

## Commands

### Initial Setup

Start the infrastructure services first, then build and deploy the application:

```bash
# 1. Start test infrastructure (MS SQL, LDAP, SMTP, OPC UA)
cd infra && docker compose up -d && cd ..

# 2. Build and deploy all 8 ScadaLink nodes
docker/deploy.sh

# 3. Seed test sites (first-time only, after cluster is healthy)
docker/seed-sites.sh
```

### After Code Changes

Rebuild and redeploy. The Docker build cache skips NuGet restore when only source files change:

```bash
docker/deploy.sh
```

### Stop Application Nodes

Stops and removes all 8 application containers. Site SQLite databases and log files are preserved in the node directories:

```bash
docker/teardown.sh
```

### Stop Everything

```bash
docker/teardown.sh
cd infra && docker compose down && cd ..
```

### View Logs

```bash
# All nodes (follow mode)
docker compose -f docker/docker-compose.yml logs -f

# Single node
docker logs -f scadalink-central-a

# Filter by site cluster
docker compose -f docker/docker-compose.yml logs -f site-a-a site-a-b
docker compose -f docker/docker-compose.yml logs -f site-b-a site-b-b
docker compose -f docker/docker-compose.yml logs -f site-c-a site-c-b

# Persisted log files
ls docker/central-node-a/logs/
```

### Restart a Single Node

```bash
docker restart scadalink-central-a
```

### Check Cluster Health

```bash
# Central node A health check
curl -s http://localhost:9001/health/ready | python3 -m json.tool

# Central node B health check
curl -s http://localhost:9002/health/ready | python3 -m json.tool
```

### CLI Access

The CLI connects to the Central Host's HTTP management API via the Traefik load balancer at `http://localhost:9000`, which routes to the active central node:

```bash
dotnet run --project src/ScadaLink.CLI -- \
  --url http://localhost:9000 \
  --username multi-role --password password \
  template list
```

Direct access to individual nodes is also available at `http://localhost:9001` (central-a) and `http://localhost:9002` (central-b).

> **Note:** The `multi-role` test user has the Admin, Design, and Deployment roles. The `admin` user has only the Admin role and cannot perform design or deployment operations.
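Since `docker/seed-sites.sh` should run only after the cluster is healthy, the health endpoint above can be polled until a central node reports ready. A minimal sketch — the `wait_ready` helper is hypothetical, not part of the repo's scripts:

```shell
#!/usr/bin/env sh
# Poll /health/ready until it returns HTTP success, then proceed.
# Assumes the /health/ready endpoints shown under "Check Cluster Health".
wait_ready() {
  url="$1"; tries="${2:-30}"
  i=0
  while [ "$i" -lt "$tries" ]; do
    if curl -fsS "$url/health/ready" >/dev/null 2>&1; then
      echo "ready: $url"
      return 0
    fi
    i=$((i + 1))
    sleep 2
  done
  echo "timed out waiting for $url" >&2
  return 1
}

# Usage: wait_ready http://localhost:9001 && docker/seed-sites.sh
```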
See `infra/glauth/config.toml` for all test users and their group memberships.

A recommended `~/.scadalink/config.json` for the Docker test environment:

```json
{
  "managementUrl": "http://localhost:9000"
}
```

With this config file in place, the URL is supplied automatically:

```bash
dotnet run --project src/ScadaLink.CLI -- \
  --username multi-role --password password \
  template list
```

### Clear Site Data

Remove the SQLite databases to reset site state (deployed configs, S&F buffers):

```bash
# Single site
rm -rf docker/site-a-node-a/data docker/site-a-node-b/data
docker restart scadalink-site-a-a scadalink-site-a-b

# All sites
rm -rf docker/site-*/data
docker restart scadalink-site-a-a scadalink-site-a-b \
               scadalink-site-b-a scadalink-site-b-b \
               scadalink-site-c-a scadalink-site-c-b
```

### Rebuild Image From Scratch (no cache)

```bash
docker build --no-cache -t scadalink:latest -f docker/Dockerfile .
```

## Build Cache

The Dockerfile uses a multi-stage build optimized for fast rebuilds:

1. **Restore stage**: Copies only `.csproj` files and runs `dotnet restore`. This layer stays cached as long as no project file changes.
2. **Build stage**: Copies the source code and runs `dotnet publish --no-restore`. Re-runs on any source change but skips restore.
3. **Runtime stage**: Uses the slim `aspnet:10.0` base image with only the published output.

A typical rebuild after a source-only change takes ~5 seconds (restore cached; only build and publish run).

## Test Users

All test passwords are `password`. See `infra/glauth/config.toml` for the full list.
| Username | Roles | Use Case |
|----------|-------|----------|
| `admin` | Admin | System administration |
| `designer` | Design | Template authoring |
| `deployer` | Deployment | Instance deployment (all sites) |
| `multi-role` | Admin, Design, Deployment | Full access for testing |

## Failover Testing

### Central Failover

```bash
# Stop the active central node
docker stop scadalink-central-a

# Verify central-b takes over (check logs for leader election)
docker logs -f scadalink-central-b

# Access the UI on the standby node
open http://localhost:9002

# Restore the original node
docker start scadalink-central-a
```

### Site Failover

```bash
# Stop the active site-a node
docker stop scadalink-site-a-a

# Verify site-a-b takes over the singleton (DeploymentManager)
docker logs -f scadalink-site-a-b

# Restore
docker start scadalink-site-a-a
```

The same pattern applies for site-b (`scadalink-site-b-a`/`scadalink-site-b-b`) and site-c (`scadalink-site-c-a`/`scadalink-site-c-b`).

Failover takes roughly 25–30 seconds (≈2 s heartbeat interval + 10 s detection threshold + 15 s stable-after window for the split-brain resolver).
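The failover timing is governed by the cluster's failure detector and split-brain resolver. As a hedged sketch, the corresponding Akka.NET HOCON settings would look roughly like this; the exact keys used by this project and their location in each node's `appsettings` files are assumptions, not taken from the repo:

```hocon
# Illustrative Akka.NET timing settings matching the ~2 s / 10 s / 15 s
# breakdown above; not copied from the project's actual configuration.
akka.cluster {
  failure-detector {
    heartbeat-interval = 2s            # how often nodes heartbeat each other
    acceptable-heartbeat-pause = 10s   # pause tolerated before marking a node unreachable
  }
  split-brain-resolver {
    active-strategy = keep-oldest      # consistent with the "leader/oldest" role above
    stable-after = 15s                 # membership must be stable before a down decision
  }
}
```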