diff --git a/CLAUDE.md b/CLAUDE.md index f5839e3..dcbcf29 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -35,7 +35,7 @@ This project contains design documentation for a distributed SCADA system built - Use `git diff` to review changes before committing. - Commit related changes together with a descriptive message summarizing the design decision. -## Current Component List (19 components) +## Current Component List (20 components) 1. Template Engine — Template modeling, inheritance, composition, validation, flattening, diffs. 2. Deployment Manager — Central-side deployment pipeline, system-wide artifact deployment, instance lifecycle. @@ -55,7 +55,8 @@ This project contains design documentation for a distributed SCADA system built 16. Commons — Shared types, POCO entity classes, repository interfaces, message contracts. 17. Configuration Database — EF Core data access layer, repositories, unit-of-work, audit logging (IAuditService), migrations. 18. Management Service — Akka.NET actor providing programmatic access to all admin operations, ClusterClientReceptionist registration. -19. CLI — Command-line tool using ClusterClient to interact with Management Service, System.CommandLine, JSON/table output. +19. CLI — Command-line tool using HTTP Management API, System.CommandLine, JSON/table output. +20. Traefik Proxy — Reverse proxy/load balancer fronting central cluster, active node routing via `/health/active`, automatic failover. ## Key Design Decisions (for context across sessions) @@ -152,7 +153,7 @@ This project contains design documentation for a distributed SCADA system built ### CLI Quick Reference (Docker / OrbStack) -- **Management URL**: `http://localhost:9001` — the CLI connects to the Central Host's HTTP management API (port 5000 mapped to 9001 in Docker). +- **Management URL**: `http://localhost:9000` — the CLI connects via the Traefik load balancer, which routes to the active central node. Direct access: central-a on port 9001, central-b on port 9002. 
- **Test user**: `--username multi-role --password password` — has Admin, Design, and Deployment roles. The `admin` user only has the Admin role and cannot create templates, data connections, or deploy. - **Config file**: `~/.scadalink/config.json` — stores `managementUrl` and default format. See `docker/README.md` for a ready-to-use test config. - **Rebuild cluster**: `bash docker/deploy.sh` — builds the `scadalink:latest` image and recreates all containers. Run this after code changes to ManagementActor, Host, or any server-side component. diff --git a/Component-TraefikProxy.md b/Component-TraefikProxy.md new file mode 100644 index 0000000..5506f6e --- /dev/null +++ b/Component-TraefikProxy.md @@ -0,0 +1,138 @@ +# Component: Traefik Proxy + +## Purpose + +The Traefik Proxy is a reverse proxy and load balancer that sits in front of the central cluster's two web servers. It provides a single stable URL for the CLI, browser, and external API consumers, automatically routing traffic to the active central node. When the active node fails over, Traefik detects the change via health checks and redirects traffic to the new active node without manual intervention. + +## Location + +Runs as a Docker container (`scadalink-traefik`) in the cluster compose stack (`docker/docker-compose.yml`). Not part of the application codebase — it is a third-party infrastructure component with static configuration files. + +`docker/traefik/` + +## Responsibilities + +- Route all HTTP traffic (Central UI, Management API, Inbound API, health endpoints) to the active central node. +- Health-check both central nodes via `/health/active` to determine which is the active (cluster leader) node. +- Automatically fail over to the standby node when the active node goes down. +- Provide a dashboard for monitoring routing state and backend health. + +## How It Works + +### Active Node Detection + +Traefik polls `/health/active` on both central nodes every 5 seconds. 
This endpoint returns: + +- **HTTP 200** on the active node (the Akka.NET cluster leader). +- **HTTP 503** on the standby node (or if the node is unreachable). + +Only the node returning 200 receives traffic. The health check is implemented by `ActiveNodeHealthCheck` in the Host project, which checks `Cluster.Get(system).State.Leader == SelfMember.Address`. + +### Failover Sequence + +1. Active node fails (crash, network partition, or graceful shutdown). +2. Akka.NET cluster detects the failure (~10s heartbeat timeout). +3. Split-brain resolver acts after stable-after period (~15s). +4. Surviving node becomes cluster leader. +5. `ActiveNodeHealthCheck` on the surviving node starts returning 200. +6. Traefik's next health poll (within 5s) detects the change. +7. Traffic routes to the new active node. + +**Total failover time**: ~25–30s (Akka failover ~25s + Traefik poll interval up to 5s). + +### SignalR / Blazor Server Considerations + +Blazor Server uses persistent SignalR connections (WebSocket circuits). During failover: + +- Active SignalR circuits on the failed node are lost. +- The browser's SignalR reconnection logic attempts to reconnect. +- Traefik routes the reconnection to the new active node. +- The user's session survives because authentication uses cookie-embedded JWT with shared Data Protection keys across both central nodes. +- The user may see a brief "Reconnecting..." overlay before the circuit re-establishes. + +## Configuration + +### Static Config (`docker/traefik/traefik.yml`) + +```yaml +entryPoints: + web: + address: ":80" + +api: + dashboard: true + insecure: true + +providers: + file: + filename: /etc/traefik/dynamic.yml +``` + +- **Entrypoint `web`**: Listens on port 80 (mapped to host port 9000). +- **Dashboard**: Enabled in insecure mode (no auth) for development. Accessible at `http://localhost:8180`. +- **File provider**: Loads routing rules from a static YAML file (no Docker socket required). 
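The pool-selection rule described above (keep only backends whose `/health/active` probe returns 200) can be sketched as follows. This is an illustrative model only, not Traefik's implementation; the `check_health` callback is a hypothetical stand-in for the real HTTP probe, and the backend URLs match the dynamic config shown below.

```python
# Illustrative model of health-checked pool selection (not Traefik's code).
# check_health is a hypothetical stand-in for the HTTP probe of /health/active.

def healthy_pool(backends, check_health):
    """Keep only backends whose /health/active probe returns HTTP 200."""
    return [b for b in backends if check_health(b) == 200]

# central-a is the cluster leader (200); central-b is standby (503).
status = {
    "http://scadalink-central-a:5000": 200,
    "http://scadalink-central-b:5000": 503,
}
pool = healthy_pool(list(status), status.get)

# Only the active node remains in the load-balancing pool.
assert pool == ["http://scadalink-central-a:5000"]
```

Because the leader-based check returns 200 on at most one node at a time, the "load balancer" effectively degenerates to active/standby routing.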
+
+### Dynamic Config (`docker/traefik/dynamic.yml`)
+
+```yaml
+http:
+  routers:
+    central:
+      rule: "PathPrefix(`/`)"
+      service: central
+      entryPoints:
+        - web
+
+  services:
+    central:
+      loadBalancer:
+        healthCheck:
+          path: /health/active
+          interval: 5s
+          timeout: 3s
+        servers:
+          - url: "http://scadalink-central-a:5000"
+          - url: "http://scadalink-central-b:5000"
+```
+
+- **Router `central`**: Catches all requests and forwards to the `central` service.
+- **Service `central`**: Load balancer with two backends (both central nodes) and a health check on `/health/active`.
+- **Health check interval**: 5 seconds. A server failing the health check is removed from the pool within one interval.
+
+## Ports
+
+| Host Port | Container Port | Purpose |
+|-----------|---------------|---------|
+| 9000 | 80 | Load-balanced entrypoint (Central UI, Management API, Inbound API) |
+| 8180 | 8080 | Traefik dashboard |
+
+## Health Endpoints
+
+The central nodes expose two health endpoints:
+
+| Endpoint | Purpose | Who Uses It |
+|----------|---------|-------------|
+| `/health/ready` | Readiness gate — 200 when database + Akka cluster are healthy | Kubernetes probes, monitoring |
+| `/health/active` | Active node — 200 only on cluster leader | **Traefik** (routing decisions) |
+
+## Dependencies
+
+- **Central cluster nodes**: The two backends (`scadalink-central-a`, `scadalink-central-b`) on the `scadalink-net` Docker network.
+- **ActiveNodeHealthCheck**: Health check implementation in `src/ScadaLink.Host/Health/ActiveNodeHealthCheck.cs` that determines cluster leader status.
+- **Docker network**: All containers must be on the shared `scadalink-net` bridge network.
+
+## Interactions
+
+- **CLI**: Connects to `http://localhost:9000/management` — routed by Traefik to the active node.
+- **Browser (Central UI)**: Connects to `http://localhost:9000` — Blazor Server + SignalR routed to the active node.
+- **Inbound API consumers**: Connect to `http://localhost:9000/api/{methodName}` — routed to the active node. +- **Cluster Infrastructure**: The `ActiveNodeHealthCheck` relies on Akka.NET cluster gossip state to determine the leader. + +## Production Considerations + +The current configuration is for development/testing. In production: + +- **TLS termination**: Add HTTPS entrypoint with certificates (Let's Encrypt via Traefik's ACME provider, or static certs). +- **Dashboard auth**: Disable `insecure: true` and configure authentication on the dashboard. +- **WebSocket support**: Traefik supports WebSocket proxying natively — no additional config needed for SignalR. +- **Sticky sessions**: Not required. The Management API is stateless (Basic Auth per request). Blazor Server circuits are bound to a specific node via SignalR, but reconnection handles failover transparently. diff --git a/README.md b/README.md index aad2c56..45c1b91 100644 --- a/README.md +++ b/README.md @@ -52,7 +52,8 @@ This document serves as the master index for the SCADA system design. The system | 16 | Commons | [Component-Commons.md](Component-Commons.md) | Namespace/folder convention (Types/Interfaces/Entities/Messages), shared data types, POCOs, repository interfaces, message contracts with additive-only versioning, UTC timestamp convention. | | 17 | Configuration Database | [Component-ConfigurationDatabase.md](Component-ConfigurationDatabase.md) | EF Core data access, per-component repositories, unit-of-work, optimistic concurrency on deployment status, audit logging (IAuditService), migration management. | | 18 | Management Service | [Component-ManagementService.md](Component-ManagementService.md) | Akka.NET ManagementActor on central, ClusterClientReceptionist registration, programmatic access to all admin operations, CLI interface. 
| -| 19 | CLI | [Component-CLI.md](Component-CLI.md) | Standalone command-line tool, System.CommandLine, Akka.NET ClusterClient transport, LDAP auth, JSON/table output, mirrors all Management Service operations. | +| 19 | CLI | [Component-CLI.md](Component-CLI.md) | Standalone command-line tool, System.CommandLine, HTTP transport via Management API, JSON/table output, mirrors all Management Service operations. | +| 20 | Traefik Proxy | [Component-TraefikProxy.md](Component-TraefikProxy.md) | Reverse proxy/load balancer fronting central cluster, active node routing via `/health/active`, automatic failover. | ### Reference Documentation diff --git a/docker/README.md b/docker/README.md index 4cb9de3..9c12f00 100644 --- a/docker/README.md +++ b/docker/README.md @@ -5,7 +5,12 @@ Local Docker deployment of the full ScadaLink cluster topology: a 2-node central ## Cluster Topology ``` -┌─────────────────────────────────────────────────────┐ + ┌───────────────────┐ + │ Traefik LB :9000 │ ◄── CLI / Browser + │ Dashboard :8180 │ + └────────┬──────────┘ + │ routes to active node +┌──────────────────────┼──────────────────────────────┐ │ Central Cluster │ │ │ │ ┌─────────────────┐ ┌─────────────────┐ │ @@ -48,6 +53,7 @@ Each site cluster runs Site Runtime, Data Connection Layer, Store-and-Forward, a | Node | Container Name | Host Web Port | Host Akka Port | Internal Ports | |------|---------------|---------------|----------------|----------------| +| Traefik LB | `scadalink-traefik` | 9000 | — | 80 (proxy), 8080 (dashboard) | | Central A | `scadalink-central-a` | 9001 | 9011 | 5000 (web), 8081 (Akka) | | Central B | `scadalink-central-b` | 9002 | 9012 | 5000 (web), 8081 (Akka) | | Site-A A | `scadalink-site-a-a` | — | 9021 | 8082 (Akka) | @@ -185,22 +191,24 @@ curl -s http://localhost:9002/health/ready | python3 -m json.tool ### CLI Access -The CLI connects to the Central Host's HTTP management API. 
With the Docker setup, the Central UI (and management API) is available at `http://localhost:9001`: +The CLI connects to the Central Host's HTTP management API via the Traefik load balancer at `http://localhost:9000`, which routes to the active central node: ```bash dotnet run --project src/ScadaLink.CLI -- \ - --url http://localhost:9001 \ + --url http://localhost:9000 \ --username multi-role --password password \ template list ``` +Direct access to individual nodes is also available at `http://localhost:9001` (central-a) and `http://localhost:9002` (central-b). + > **Note:** The `multi-role` test user has Admin, Design, and Deployment roles. The `admin` user only has the Admin role and cannot perform design or deployment operations. See `infra/glauth/config.toml` for all test users and their group memberships. A recommended `~/.scadalink/config.json` for the Docker test environment: ```json { - "managementUrl": "http://localhost:9001" + "managementUrl": "http://localhost:9000" } ``` diff --git a/docker/deploy.sh b/docker/deploy.sh index 1cbff6d..3cdd46d 100755 --- a/docker/deploy.sh +++ b/docker/deploy.sh @@ -18,10 +18,12 @@ docker compose -f "$SCRIPT_DIR/docker-compose.yml" ps echo "" echo "Access points:" -echo " Central UI (node A): http://localhost:9001" -echo " Central UI (node B): http://localhost:9002" -echo " Health check: http://localhost:9001/health/ready" -echo " CLI contact points: akka.tcp://scadalink@localhost:9011" -echo " akka.tcp://scadalink@localhost:9012" +echo " Central (Traefik LB): http://localhost:9000" +echo " Central UI (node A): http://localhost:9001" +echo " Central UI (node B): http://localhost:9002" +echo " Health check: http://localhost:9001/health/ready" +echo " Active node check: http://localhost:9001/health/active" +echo " Traefik dashboard: http://localhost:8180" +echo " Management API: http://localhost:9000/management" echo "" echo "Logs: docker compose -f $SCRIPT_DIR/docker-compose.yml logs -f" diff --git 
a/docker/docker-compose.yml b/docker/docker-compose.yml
index c087cd0..97e6c41 100644
--- a/docker/docker-compose.yml
+++ b/docker/docker-compose.yml
@@ -123,6 +123,19 @@ services:
       - scadalink-net
     restart: unless-stopped
 
+  traefik:
+    image: traefik:v3.4
+    container_name: scadalink-traefik
+    ports:
+      - "9000:80"     # Central load-balanced entrypoint
+      - "8180:8080"   # Traefik dashboard
+    volumes:
+      - ./traefik/traefik.yml:/etc/traefik/traefik.yml:ro
+      - ./traefik/dynamic.yml:/etc/traefik/dynamic.yml:ro
+    networks:
+      - scadalink-net
+    restart: unless-stopped
+
 networks:
   scadalink-net:
     external: true
diff --git a/docker/traefik/dynamic.yml b/docker/traefik/dynamic.yml
new file mode 100644
index 0000000..ffecd2b
--- /dev/null
+++ b/docker/traefik/dynamic.yml
@@ -0,0 +1,18 @@
+http:
+  routers:
+    central:
+      rule: "PathPrefix(`/`)"
+      service: central
+      entryPoints:
+        - web
+
+  services:
+    central:
+      loadBalancer:
+        healthCheck:
+          path: /health/active
+          interval: 5s
+          timeout: 3s
+        servers:
+          - url: "http://scadalink-central-a:5000"
+          - url: "http://scadalink-central-b:5000"
diff --git a/docker/traefik/traefik.yml b/docker/traefik/traefik.yml
new file mode 100644
index 0000000..3a921ce
--- /dev/null
+++ b/docker/traefik/traefik.yml
@@ -0,0 +1,11 @@
+entryPoints:
+  web:
+    address: ":80"
+
+api:
+  dashboard: true
+  insecure: true
+
+providers:
+  file:
+    filename: /etc/traefik/dynamic.yml
diff --git a/src/ScadaLink.Host/Health/ActiveNodeHealthCheck.cs b/src/ScadaLink.Host/Health/ActiveNodeHealthCheck.cs
new file mode 100644
index 0000000..4c629da
--- /dev/null
+++ b/src/ScadaLink.Host/Health/ActiveNodeHealthCheck.cs
@@ -0,0 +1,40 @@
+using Akka.Cluster;
+using Microsoft.Extensions.Diagnostics.HealthChecks;
+using ScadaLink.Host.Actors;
+
+namespace ScadaLink.Host.Health;
+
+/// <summary>
+/// Health check that returns healthy only if this node is the active (leader) node
+/// in the Akka.NET cluster. Used by Traefik to route traffic to the active node.
+/// </summary>
+public class ActiveNodeHealthCheck : IHealthCheck
+{
+    private readonly AkkaHostedService _akkaService;
+
+    public ActiveNodeHealthCheck(AkkaHostedService akkaService)
+    {
+        _akkaService = akkaService;
+    }
+
+    public Task<HealthCheckResult> CheckHealthAsync(
+        HealthCheckContext context,
+        CancellationToken cancellationToken = default)
+    {
+        var system = _akkaService.ActorSystem;
+        if (system == null)
+            return Task.FromResult(HealthCheckResult.Unhealthy("ActorSystem not yet available."));
+
+        var cluster = Cluster.Get(system);
+        var self = cluster.SelfMember;
+
+        if (self.Status != MemberStatus.Up)
+            return Task.FromResult(HealthCheckResult.Unhealthy($"Node not Up (status: {self.Status})."));
+
+        var leader = cluster.State.Leader;
+        if (leader != null && leader == self.Address)
+            return Task.FromResult(HealthCheckResult.Healthy("Active node (cluster leader)."));
+
+        return Task.FromResult(HealthCheckResult.Unhealthy("Standby node (not cluster leader)."));
+    }
+}
diff --git a/src/ScadaLink.Host/Health/AkkaClusterHealthCheck.cs b/src/ScadaLink.Host/Health/AkkaClusterHealthCheck.cs
index 8251d4f..d5c3798 100644
--- a/src/ScadaLink.Host/Health/AkkaClusterHealthCheck.cs
+++ b/src/ScadaLink.Host/Health/AkkaClusterHealthCheck.cs
@@ -1,6 +1,6 @@
-using Akka.Actor;
 using Akka.Cluster;
 using Microsoft.Extensions.Diagnostics.HealthChecks;
+using ScadaLink.Host.Actors;
 
 namespace ScadaLink.Host.Health;
 
@@ -10,21 +10,22 @@ namespace ScadaLink.Host.Health;
 /// </summary>
 public class AkkaClusterHealthCheck : IHealthCheck
 {
-    private readonly ActorSystem? _system;
+    private readonly AkkaHostedService _akkaService;
 
-    public AkkaClusterHealthCheck(ActorSystem? system = null)
+    public AkkaClusterHealthCheck(AkkaHostedService akkaService)
     {
-        _system = system;
+        _akkaService = akkaService;
     }
 
     public Task<HealthCheckResult> CheckHealthAsync(
         HealthCheckContext context,
         CancellationToken cancellationToken = default)
     {
-        if (_system == null)
+        var system = _akkaService.ActorSystem;
+        if (system == null)
             return Task.FromResult(HealthCheckResult.Degraded("ActorSystem not yet available."));
 
-        var cluster = Cluster.Get(_system);
+        var cluster = Cluster.Get(system);
         var status = cluster.SelfMember.Status;
 
         var result = status switch
diff --git a/src/ScadaLink.Host/Program.cs b/src/ScadaLink.Host/Program.cs
index 7deff27..189005b 100644
--- a/src/ScadaLink.Host/Program.cs
+++ b/src/ScadaLink.Host/Program.cs
@@ -87,7 +87,8 @@ try
     // WP-12: Health checks for readiness gating
     builder.Services.AddHealthChecks()
         .AddCheck<DatabaseHealthCheck>("database")
-        .AddCheck<AkkaClusterHealthCheck>("akka-cluster");
+        .AddCheck<AkkaClusterHealthCheck>("akka-cluster")
+        .AddCheck<ActiveNodeHealthCheck>("active-node");
 
     // WP-13: Akka.NET bootstrap via hosted service
     builder.Services.AddSingleton<AkkaHostedService>();
@@ -126,6 +127,13 @@ try
         ResponseWriter = UIResponseWriter.WriteHealthCheckUIResponse
     });
 
+    // Active node endpoint — returns 200 only on the cluster leader; used by Traefik for routing
+    app.MapHealthChecks("/health/active", new HealthCheckOptions
+    {
+        Predicate = check => check.Name == "active-node",
+        ResponseWriter = UIResponseWriter.WriteHealthCheckUIResponse
+    });
+
     app.MapStaticAssets();
     app.MapCentralUI();
     app.MapInboundAPI();
diff --git a/test_infra.md b/test_infra.md
index a5095ab..f18f740 100644
--- a/test_infra.md
+++ b/test_infra.md
@@ -1,17 +1,18 @@
 # Test Infrastructure
 
-This document describes the local Docker-based test infrastructure for ScadaLink development. Five services provide the external dependencies needed to run and test the system locally.
+This document describes the local Docker-based test infrastructure for ScadaLink development. Seven services provide the external dependencies needed to run and test the system locally.
The first six run in `infra/docker-compose.yml`; Traefik runs alongside the cluster nodes in `docker/docker-compose.yml`. ## Services -| Service | Image | Port(s) | Config | -|---------|-------|---------|--------| -| OPC UA Server | `mcr.microsoft.com/iotedge/opc-plc:latest` | 50000 (OPC UA), 8080 (web) | `infra/opcua/nodes.json` | -| LDAP Server | `glauth/glauth:latest` | 3893 | `infra/glauth/config.toml` | -| MS SQL 2022 | `mcr.microsoft.com/mssql/server:2022-latest` | 1433 | `infra/mssql/setup.sql` | -| SMTP (Mailpit) | `axllent/mailpit:latest` | 1025 (SMTP), 8025 (web) | Environment vars | -| REST API (Flask) | Custom build (`infra/restapi/Dockerfile`) | 5200 | `infra/restapi/app.py` | -| LmxFakeProxy | Custom build (`infra/lmxfakeproxy/Dockerfile`) | 50051 (gRPC) | Environment vars | +| Service | Image | Port(s) | Config | Compose File | +|---------|-------|---------|--------|-------------| +| OPC UA Server | `mcr.microsoft.com/iotedge/opc-plc:latest` | 50000 (OPC UA), 8080 (web) | `infra/opcua/nodes.json` | `infra/` | +| LDAP Server | `glauth/glauth:latest` | 3893 | `infra/glauth/config.toml` | `infra/` | +| MS SQL 2022 | `mcr.microsoft.com/mssql/server:2022-latest` | 1433 | `infra/mssql/setup.sql` | `infra/` | +| SMTP (Mailpit) | `axllent/mailpit:latest` | 1025 (SMTP), 8025 (web) | Environment vars | `infra/` | +| REST API (Flask) | Custom build (`infra/restapi/Dockerfile`) | 5200 | `infra/restapi/app.py` | `infra/` | +| LmxFakeProxy | Custom build (`infra/lmxfakeproxy/Dockerfile`) | 50051 (gRPC) | Environment vars | `infra/` | +| Traefik LB | `traefik:v3.4` | 9000 (proxy), 8180 (dashboard) | `docker/traefik/` | `docker/` | ## Quick Start @@ -42,6 +43,7 @@ Each service has a dedicated document with configuration details, verification s - [test_infra_smtp.md](test_infra_smtp.md) — SMTP test server (Mailpit) - [test_infra_restapi.md](test_infra_restapi.md) — REST API test server (Flask) - [test_infra_lmxfakeproxy.md](test_infra_lmxfakeproxy.md) — LmxProxy fake 
server (OPC UA bridge)
+- Traefik LB — see `docker/README.md` and `docker/traefik/` (runs with the cluster, not in `infra/`)
 
 ## Connection Strings
@@ -112,4 +114,8 @@ infra/
   lmxfakeproxy/       # .NET gRPC proxy bridging LmxProxy protocol to OPC UA
   tools/              # Python CLI tools (opcua, ldap, mssql, smtp, restapi)
   README.md           # Quick-start for the infra folder
+
+docker/
+  traefik/traefik.yml   # Traefik static config (entrypoints, file provider)
+  traefik/dynamic.yml   # Traefik dynamic config (load balancer, health check routing)
 ```
diff --git a/tests/ScadaLink.Host.Tests/HealthCheckTests.cs b/tests/ScadaLink.Host.Tests/HealthCheckTests.cs
index e16aae7..0f7d770 100644
--- a/tests/ScadaLink.Host.Tests/HealthCheckTests.cs
+++ b/tests/ScadaLink.Host.Tests/HealthCheckTests.cs
@@ -1,10 +1,11 @@
 using Microsoft.AspNetCore.Mvc.Testing;
 using Microsoft.Extensions.Configuration;
+using ScadaLink.Host.Health;
 
 namespace ScadaLink.Host.Tests;
 
 /// <summary>
-/// WP-12: Tests for /health/ready endpoint.
+/// WP-12: Tests for /health/ready and /health/active endpoints.
 /// </summary>
 public class HealthCheckTests : IDisposable
 {
@@ -63,4 +64,94 @@ public class HealthCheckTests : IDisposable
             Environment.SetEnvironmentVariable("DOTNET_ENVIRONMENT", previousEnv);
         }
     }
+
+    [Fact]
+    public async Task HealthActive_Endpoint_ReturnsResponse()
+    {
+        var previousEnv = Environment.GetEnvironmentVariable("DOTNET_ENVIRONMENT");
+        try
+        {
+            Environment.SetEnvironmentVariable("DOTNET_ENVIRONMENT", "Central");
+
+            var factory = new WebApplicationFactory<Program>()
+                .WithWebHostBuilder(builder =>
+                {
+                    builder.ConfigureAppConfiguration((context, config) =>
+                    {
+                        config.AddInMemoryCollection(new Dictionary<string, string?>
+                        {
+                            ["ScadaLink:Node:NodeHostname"] = "localhost",
+                            ["ScadaLink:Node:RemotingPort"] = "0",
+                            ["ScadaLink:Cluster:SeedNodes:0"] = "akka.tcp://scadalink@localhost:2551",
+                            ["ScadaLink:Cluster:SeedNodes:1"] = "akka.tcp://scadalink@localhost:2552",
+                            ["ScadaLink:Database:SkipMigrations"] = "true",
+                        });
+                    });
+                    builder.UseSetting("ScadaLink:Node:Role", "Central");
+                    builder.UseSetting("ScadaLink:Database:SkipMigrations", "true");
+                });
+            _disposables.Add(factory);
+
+            var client = factory.CreateClient();
+            _disposables.Add(client);
+
+            var response = await client.GetAsync("/health/active");
+
+            // In test mode, the ActorSystem may not be fully available,
+            // so the active-node check returns 503 (Unhealthy).
+            Assert.True(
+                response.StatusCode == System.Net.HttpStatusCode.OK ||
+                response.StatusCode == System.Net.HttpStatusCode.ServiceUnavailable,
+                $"Expected 200 or 503, got {(int)response.StatusCode}");
+        }
+        finally
+        {
+            Environment.SetEnvironmentVariable("DOTNET_ENVIRONMENT", previousEnv);
+        }
+    }
+
+    [Fact]
+    public async Task ActiveNodeHealthCheck_SystemNotStarted_ReturnsUnhealthy()
+    {
+        // Before AkkaHostedService.StartAsync runs, ActorSystem is null.
+        // HealthActive_Endpoint_ReturnsResponse exercises the endpoint wiring in
+        // general; this test pins down the path where the ActorSystem is not
+        // available, which the active-node check must report as Unhealthy (503).
+        var previousEnv = Environment.GetEnvironmentVariable("DOTNET_ENVIRONMENT");
+        try
+        {
+            Environment.SetEnvironmentVariable("DOTNET_ENVIRONMENT", "Central");
+            var factory = new WebApplicationFactory<Program>()
+                .WithWebHostBuilder(builder =>
+                {
+                    builder.ConfigureAppConfiguration((context, config) =>
+                    {
+                        config.AddInMemoryCollection(new Dictionary<string, string?>
+                        {
+                            ["ScadaLink:Node:NodeHostname"] = "localhost",
+                            ["ScadaLink:Node:RemotingPort"] = "0",
+                            ["ScadaLink:Cluster:SeedNodes:0"] = "akka.tcp://scadalink@localhost:2551",
+                            ["ScadaLink:Database:SkipMigrations"] = "true",
+                        });
+                    });
+                    builder.UseSetting("ScadaLink:Node:Role", "Central");
+                    builder.UseSetting("ScadaLink:Database:SkipMigrations", "true");
+                });
+            _disposables.Add(factory);
+
+            var client = factory.CreateClient();
+            _disposables.Add(client);
+
+            var response = await client.GetAsync("/health/active");
+            var body = await response.Content.ReadAsStringAsync();
+
+            // Active-node check returns 503 when ActorSystem is not yet available or not leader
+            Assert.Equal(System.Net.HttpStatusCode.ServiceUnavailable, response.StatusCode);
+            Assert.Contains("active-node", body);
+        }
+        finally
+        {
+            Environment.SetEnvironmentVariable("DOTNET_ENVIRONMENT", previousEnv);
+        }
+    }
 }