# SCADA System — Design Documentation

## Overview

This document serves as the master index for the SCADA system design. The system is a centrally-managed, distributed SCADA configuration and deployment platform built on Akka.NET, running across a central cluster and multiple site clusters in a hub-and-spoke topology.

### Technology Stack

| Layer | Technology |
|-------|-----------|
| Runtime | .NET, Akka.NET (actors, clustering, remoting, persistence, streams) |
| Central UI | Blazor Server (ASP.NET Core + SignalR) |
| Inbound API | ASP.NET Core Web API (REST/JSON) |
| Central Database | MS SQL Server, Entity Framework Core |
| Site Storage | SQLite (deployed configs, S&F buffer, event logs) |
| Authentication | Direct LDAP/AD bind (LDAPS/StartTLS), JWT sessions |
| Notifications | SMTP with OAuth2 Client Credentials (Microsoft 365) |
| Hosting | Windows Server, Windows Service |
| Cluster | Akka.NET Cluster (active/standby, keep-oldest SBR) |
| Logging | Serilog (structured) |

### Scale

- ~10 site clusters, each with 50–500 machines, 25–75 live tags per machine.
- Central cluster: 2-node active/standby behind a load balancer.
- Site clusters: 2-node active/standby, headless (no UI).

## Document Map

### Requirements

- [HighLevelReqs.md](docs/requirements/HighLevelReqs.md) — Complete high-level requirements covering all functional areas.

### Component Design Documents

| # | Component | Document | Description |
|---|-----------|----------|-------------|
| 1 | Template Engine | [docs/requirements/Component-TemplateEngine.md](docs/requirements/Component-TemplateEngine.md) | Template modeling, inheritance, composition, path-qualified member addressing, override granularity, locking, alarms, flattening, semantic validation, revision hashing, and diff calculation. |
| 2 | Deployment Manager | [docs/requirements/Component-DeploymentManager.md](docs/requirements/Component-DeploymentManager.md) | Central-side deployment pipeline with deployment ID/idempotency, per-instance operation lock, state transition matrix, all-or-nothing site apply, system-wide artifact deployment with per-site status. |
| 3 | Site Runtime | [docs/requirements/Component-SiteRuntime.md](docs/requirements/Component-SiteRuntime.md) | Site-side actor hierarchy with explicit supervision strategies, staggered startup, script trust model (constrained APIs), Tell/Ask conventions, concurrency serialization, and site-wide Akka stream with per-subscriber backpressure. |
| 4 | Data Connection Layer | [docs/requirements/Component-DataConnectionLayer.md](docs/requirements/Component-DataConnectionLayer.md) | Common data connection interface (OPC UA, custom), Become/Stash connection actor model, auto-reconnect, immediate bad quality on disconnect, transparent re-subscribe, synchronous write failures, tag path resolution retry. |
| 5 | Central–Site Communication | [docs/requirements/Component-Communication.md](docs/requirements/Component-Communication.md) | Dual transport: Akka.NET ClusterClient (command/control) + gRPC server-streaming (real-time data). 8 message patterns with per-pattern timeouts, SiteStreamGrpcServer/Client, application-level correlation IDs, transport heartbeat config, gRPC keepalive, message ordering, connection failure behavior. |
| 6 | Store-and-Forward Engine | [docs/requirements/Component-StoreAndForward.md](docs/requirements/Component-StoreAndForward.md) | Buffering (transient failures only), fixed-interval retry, parking, async best-effort replication, SQLite persistence at sites. |
| 7 | External System Gateway | [docs/requirements/Component-ExternalSystemGateway.md](docs/requirements/Component-ExternalSystemGateway.md) | HTTP/REST + JSON, API key/Basic Auth, per-system timeout, dual call modes (Call/CachedCall), transient/permanent error classification, dedicated blocking I/O dispatcher, ADO.NET connection pooling. |
| 8 | Notification Service | [docs/requirements/Component-NotificationService.md](docs/requirements/Component-NotificationService.md) | SMTP with OAuth2 (M365) or Basic Auth, BCC delivery, plain text, transient/permanent SMTP error classification, store-and-forward integration. |
| 9 | Central UI | [docs/requirements/Component-CentralUI.md](docs/requirements/Component-CentralUI.md) | Blazor Server with SignalR real-time push, load balancer failover with JWT, all management workflows. |
| 10 | Security & Auth | [docs/requirements/Component-Security.md](docs/requirements/Component-Security.md) | Direct LDAP bind (LDAPS/StartTLS), JWT sessions (HMAC-SHA256, 15-min refresh, 30-min idle), role-based authorization, site-scoped permissions. |
| 11 | Health Monitoring | [docs/requirements/Component-HealthMonitoring.md](docs/requirements/Component-HealthMonitoring.md) | 30s report interval, 60s offline threshold, monotonic sequence numbers, raw error counts, tag resolution counts, dead letter monitoring. |
| 12 | Site Event Logging | [docs/requirements/Component-SiteEventLogging.md](docs/requirements/Component-SiteEventLogging.md) | SQLite storage, 30-day retention + 1GB cap, daily purge, paginated remote queries with keyword search. |
| 13 | Cluster Infrastructure | [docs/requirements/Component-ClusterInfrastructure.md](docs/requirements/Component-ClusterInfrastructure.md) | Akka.NET cluster, keep-oldest SBR with down-if-alone, min-nr-of-members=1, 2s/10s/15s failure detection, CoordinatedShutdown, automatic dual-node recovery. |
| 14 | Inbound API | [docs/requirements/Component-InboundAPI.md](docs/requirements/Component-InboundAPI.md) | POST /api/{methodName}, X-API-Key header, flat JSON, extended type system (Object/List), script-based implementations, failures-only logging. |
| 15 | Host | [docs/requirements/Component-Host.md](docs/requirements/Component-Host.md) | Single deployable binary, role-based component registration, per-component config binding (Options pattern), readiness gating, dead letter monitoring, Akka.NET bootstrap, ASP.NET Core hosting for central. |
| 16 | Commons | [docs/requirements/Component-Commons.md](docs/requirements/Component-Commons.md) | Namespace/folder convention (Types/Interfaces/Entities/Messages), shared data types, POCOs, repository interfaces, message contracts with additive-only versioning, UTC timestamp convention. |
| 17 | Configuration Database | [docs/requirements/Component-ConfigurationDatabase.md](docs/requirements/Component-ConfigurationDatabase.md) | EF Core data access, per-component repositories, unit-of-work, optimistic concurrency on deployment status, audit logging (IAuditService), migration management. |
| 18 | Management Service | [docs/requirements/Component-ManagementService.md](docs/requirements/Component-ManagementService.md) | Akka.NET ManagementActor on central, ClusterClientReceptionist registration, programmatic access to all admin operations, CLI interface. |
| 19 | CLI | [docs/requirements/Component-CLI.md](docs/requirements/Component-CLI.md) | Standalone command-line tool, System.CommandLine, HTTP transport via Management API, JSON/table output, mirrors all Management Service operations. |
| 20 | Traefik Proxy | [docs/requirements/Component-TraefikProxy.md](docs/requirements/Component-TraefikProxy.md) | Reverse proxy/load balancer fronting central cluster, active node routing via `/health/active`, automatic failover. |

### Reference Documentation

- [AkkaDotNet/](AkkaDotNet/) — Akka.NET reference notes covering actors, remoting, clustering, persistence, streams, serialization, hosting, testing, and best practices.
- [docs/plans/](docs/plans/) — Design decision documents from refinement sessions.

### Architecture Diagram (Logical)

```
               Users (Blazor Server)
                         │
                   Load Balancer
                         │
┌────────────────────────┼────────────────────────────┐
│                   CENTRAL CLUSTER                   │
│               (2-node active/standby)               │
│                                                     │
│  ┌──────────┐ ┌──────────┐ ┌──────────┐             │
│  │ Template │ │Deployment│ │ Central  │             │
│  │  Engine  │ │ Manager  │ │    UI    │ Blazor Svr  │
│  └──────────┘ └──────────┘ └──────────┘             │
│  ┌──────────┐ ┌──────────┐ ┌──────────┐             │
│  │ Security │ │  Config  │ │  Health  │             │
│  │  & Auth  │ │    DB    │ │ Monitor  │             │
│  │(JWT/LDAP)│ │ (EF+IAud)│ │          │             │
│  └──────────┘ └──────────┘ └──────────┘             │
│  ┌──────────┐                                       │
│  │ Inbound  │ ◄── External Systems (X-API-Key)      │
│  │   API    │     POST /api/{method}, JSON          │
│  └──────────┘                                       │
│  ┌──────────┐                                       │
│  │   Mgmt   │ ◄── CLI (ClusterClient)               │
│  │ Service  │     ManagementActor + Receptionist    │
│  └──────────┘                                       │
│  ┌───────────────────────────────────┐              │
│  │ Akka.NET Communication Layer      │              │
│  │ ClusterClient: command/control    │              │
│  │ gRPC Client: real-time streams    │              │
│  │ (correlation IDs, per-pattern     │              │
│  │  timeouts, message ordering)      │              │
│  └──────────────┬────────────────────┘              │
│  ┌──────────────┴────────────────────┐              │
│  │ Configuration Database (EF)       │──► MS SQL    │
│  └───────────────────────────────────┘   (Config DB)│
│                 │                    Machine Data DB│
└─────────────────┼───────────────────────────────────┘
                  │ Akka.NET Remoting (command/control)
                  │ gRPC HTTP/2 (real-time data, port 8083)
     ┌────────────┼────────────┐
     ▼            ▼            ▼
┌──────────┐ ┌──────────┐ ┌──────────┐
│ SITE A   │ │ SITE B   │ │ SITE N   │
│ (2-node) │ │ (2-node) │ │ (2-node) │
│ ┌──────┐ │ │ ┌──────┐ │ │ ┌──────┐ │
│ │Data  │ │ │ │Data  │ │ │ │Data  │ │
│ │Conn  │ │ │ │Conn  │ │ │ │Conn  │ │
│ │Layer │ │ │ │Layer │ │ │ │Layer │ │
│ ├──────┤ │ │ ├──────┤ │ │ ├──────┤ │
│ │Site  │ │ │ │Site  │ │ │ │Site  │ │
│ │Runtm │ │ │ │Runtm │ │ │ │Runtm │ │
│ ├──────┤ │ │ ├──────┤ │ │ ├──────┤ │
│ │gRPC  │ │ │ │gRPC  │ │ │ │gRPC  │ │
│ │Srvr  │ │ │ │Srvr  │ │ │ │Srvr  │ │
│ ├──────┤ │ │ ├──────┤ │ │ ├──────┤ │
│ │S&F   │ │ │ │S&F   │ │ │ │S&F   │ │
│ │Engine│ │ │ │Engine│ │ │ │Engine│ │
│ ├──────┤ │ │ ├──────┤ │ │ ├──────┤ │
│ │ExtSys│ │ │ │ExtSys│ │ │ │ExtSys│ │
│ │Gatwy │ │ │ │Gatwy │ │ │ │Gatwy │ │
│ └──────┘ │ │ └──────┘ │ │ └──────┘ │
│ SQLite   │ │ SQLite   │ │ SQLite   │
└──────────┘ └──────────┘ └──────────┘
     │            │            │
 OPC UA /     OPC UA /     OPC UA /
  Custom       Custom       Custom
 Protocol     Protocol     Protocol
```

### Site Runtime Actor Hierarchy

```
Deployment Manager Singleton (Cluster Singleton)
├── Instance Actor (one per deployed, enabled instance)
│   ├── Script Actor (coordinator, one per instance script)
│   │   └── Script Execution Actor (short-lived, per invocation)
│   ├── Alarm Actor (coordinator, one per alarm definition)
│   │   └── Alarm Execution Actor (short-lived, per on-trigger invocation)
│   └── ... (more Script/Alarm Actors)
├── Instance Actor
│   └── ...
└── ... (more Instance Actors)

Site-Wide Akka Stream (attribute + alarm state changes)
├── All Instance Actors publish to the stream
└── Debug view subscribes with instance-level filtering
```
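
The publish/subscribe shape of the site-wide stream can be illustrated with a small stand-in. This is only a sketch in plain Python, not Akka Streams and not the project's implementation: the `StateChange` and `SiteStream` names, the event fields, and the instance IDs are all hypothetical, and per-subscriber backpressure is deliberately left out. It shows every instance publishing to one shared stream while a debug-view subscriber receives only the events for the instance it asked for.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class StateChange:
    """Hypothetical event shape: one attribute or alarm state change."""
    instance_id: str
    kind: str      # "attribute" or "alarm"
    name: str
    value: object


class SiteStream:
    """Plain-Python stand-in for the site-wide stream (assumption, not Akka)."""

    def __init__(self) -> None:
        self._subscribers: List[Tuple[str, Callable[[StateChange], None]]] = []

    def subscribe(self, instance_id: str,
                  handler: Callable[[StateChange], None]) -> None:
        # Each subscriber names the single instance it wants to observe.
        self._subscribers.append((instance_id, handler))

    def publish(self, change: StateChange) -> None:
        # All instances publish here; filtering is applied per subscriber.
        for instance_id, handler in self._subscribers:
            if change.instance_id == instance_id:
                handler(change)


stream = SiteStream()
debug_view: List[StateChange] = []
stream.subscribe("pump-01", debug_view.append)

stream.publish(StateChange("pump-01", "attribute", "Speed", 1450))
stream.publish(StateChange("pump-02", "attribute", "Speed", 900))
stream.publish(StateChange("pump-01", "alarm", "HighTemp", "Active"))

print([c.name for c in debug_view])  # ['Speed', 'HighTemp']
```

The event from `pump-02` never reaches the debug view, which is the instance-level filtering the hierarchy above describes. A real subscriber would also need a bounded buffer or demand signalling so that a slow debug view cannot stall publishers.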
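
For a sense of the load this hierarchy and stream carry, the figures in the Scale section can be multiplied out. The sketch below assumes "~10" sites means exactly 10 and that the machine and tag ranges are independent bounds; the variable names are illustrative only.

```python
# Figures from the Scale section; "~10" sites is treated as exactly 10 (assumption).
SITES = 10
MACHINES_PER_SITE = (50, 500)   # (low, high)
TAGS_PER_MACHINE = (25, 75)     # (low, high)


def tag_range(sites: int) -> tuple:
    """Return the (low, high) count of live tags across the given number of sites."""
    low = sites * MACHINES_PER_SITE[0] * TAGS_PER_MACHINE[0]
    high = sites * MACHINES_PER_SITE[1] * TAGS_PER_MACHINE[1]
    return low, high


per_site = tag_range(1)
system_wide = tag_range(SITES)

print(per_site)     # (1250, 37500)
print(system_wide)  # (12500, 375000)
```

Under these assumptions a single site handles between 1,250 and 37,500 live tags, and the system as a whole between 12,500 and 375,000.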