SCADA System — Design Documentation

Overview

This document serves as the master index for the SCADA system design. The system is a centrally-managed, distributed SCADA configuration and deployment platform built on Akka.NET, running across a central cluster and multiple site clusters in a hub-and-spoke topology.

Technology Stack

| Layer | Technology |
|---|---|
| Runtime | .NET, Akka.NET (actors, clustering, remoting, persistence, streams) |
| Central UI | Blazor Server (ASP.NET Core + SignalR) |
| Inbound API | ASP.NET Core Web API (REST/JSON) |
| Central Database | MS SQL Server, Entity Framework Core |
| Site Storage | SQLite (deployed configs, S&F buffer, event logs) |
| Authentication | Direct LDAP/AD bind (LDAPS/StartTLS), JWT sessions |
| Notifications | SMTP with OAuth2 Client Credentials (Microsoft 365) |
| Hosting | Windows Server, Windows Service |
| Cluster | Akka.NET Cluster (active/standby, keep-oldest SBR) |
| Logging | Serilog (structured) |

Scale

  • ~10 site clusters, each with 50-500 machines and 25-75 live tags per machine.
  • Central cluster: 2-node active/standby behind a load balancer.
  • Site clusters: 2-node active/standby, headless (no UI).
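
As a rough sizing aid, the figures above can be multiplied out to bound the live tag count. The sketch below uses Python purely for brevity (the system itself is .NET). It reads the ranges as 50 to 500 machines per site and 25 to 75 live tags per machine, and it treats "~10" as exactly 10 sites, which is an assumption.

```python
# Back-of-envelope sizing from the Scale figures above.
# Assumption: "~10 site clusters" is taken as exactly 10.
SITES = 10
MACHINES_PER_SITE = (50, 500)  # (low, high)
TAGS_PER_MACHINE = (25, 75)    # (low, high)


def tag_range(sites, machines, tags):
    """Return (low, high) bounds on the live tag count for `sites` sites."""
    low = sites * machines[0] * tags[0]
    high = sites * machines[1] * tags[1]
    return low, high


per_site = tag_range(1, MACHINES_PER_SITE, TAGS_PER_MACHINE)
system_wide = tag_range(SITES, MACHINES_PER_SITE, TAGS_PER_MACHINE)

print(f"Live tags per site:    {per_site[0]:,} to {per_site[1]:,}")
print(f"Live tags system-wide: {system_wide[0]:,} to {system_wide[1]:,}")
```

Under these assumptions a single site carries between 1,250 and 37,500 live tags, and the whole system between 12,500 and 375,000.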

Document Map

Requirements

  • HighLevelReqs.md — Complete high-level requirements covering all functional areas.

Component Design Documents

| # | Component | Document | Description |
|---|---|---|---|
| 1 | Template Engine | Component-TemplateEngine.md | Template modeling, inheritance, composition, path-qualified member addressing, override granularity, locking, alarms, flattening, semantic validation, revision hashing, and diff calculation. |
| 2 | Deployment Manager | Component-DeploymentManager.md | Central-side deployment pipeline with deployment ID/idempotency, per-instance operation lock, state transition matrix, all-or-nothing site apply, system-wide artifact deployment with per-site status. |
| 3 | Site Runtime | Component-SiteRuntime.md | Site-side actor hierarchy with explicit supervision strategies, staggered startup, script trust model (constrained APIs), Tell/Ask conventions, concurrency serialization, and site-wide Akka stream with per-subscriber backpressure. |
| 4 | Data Connection Layer | Component-DataConnectionLayer.md | Common data connection interface (OPC UA, custom), Become/Stash connection actor model, auto-reconnect, immediate bad quality on disconnect, transparent re-subscribe, synchronous write failures, tag path resolution retry. |
| 5 | Central-Site Communication | Component-Communication.md | Akka.NET remoting/cluster topology, 8 message patterns with per-pattern timeouts, application-level correlation IDs, transport heartbeat config, message ordering, connection failure behavior. |
| 6 | Store-and-Forward Engine | Component-StoreAndForward.md | Buffering (transient failures only), fixed-interval retry, parking, async best-effort replication, SQLite persistence at sites. |
| 7 | External System Gateway | Component-ExternalSystemGateway.md | HTTP/REST + JSON, API key/Basic Auth, per-system timeout, dual call modes (Call/CachedCall), transient/permanent error classification, dedicated blocking I/O dispatcher, ADO.NET connection pooling. |
| 8 | Notification Service | Component-NotificationService.md | SMTP with OAuth2 (M365) or Basic Auth, BCC delivery, plain text, transient/permanent SMTP error classification, store-and-forward integration. |
| 9 | Central UI | Component-CentralUI.md | Blazor Server with SignalR real-time push, load balancer failover with JWT, all management workflows. |
| 10 | Security & Auth | Component-Security.md | Direct LDAP bind (LDAPS/StartTLS), JWT sessions (HMAC-SHA256, 15-min refresh, 30-min idle), role-based authorization, site-scoped permissions. |
| 11 | Health Monitoring | Component-HealthMonitoring.md | 30s report interval, 60s offline threshold, monotonic sequence numbers, raw error counts, tag resolution counts, dead letter monitoring. |
| 12 | Site Event Logging | Component-SiteEventLogging.md | SQLite storage, 30-day retention + 1GB cap, daily purge, paginated remote queries with keyword search. |
| 13 | Cluster Infrastructure | Component-ClusterInfrastructure.md | Akka.NET cluster, keep-oldest SBR with down-if-alone, min-nr-of-members=1, 2s/10s/15s failure detection, CoordinatedShutdown, automatic dual-node recovery. |
| 14 | Inbound API | Component-InboundAPI.md | POST /api/{methodName}, X-API-Key header, flat JSON, extended type system (Object/List), script-based implementations, failures-only logging. |
| 15 | Host | Component-Host.md | Single deployable binary, role-based component registration, per-component config binding (Options pattern), readiness gating, dead letter monitoring, Akka.NET bootstrap, ASP.NET Core hosting for central. |
| 16 | Commons | Component-Commons.md | Namespace/folder convention (Types/Interfaces/Entities/Messages), shared data types, POCOs, repository interfaces, message contracts with additive-only versioning, UTC timestamp convention. |
| 17 | Configuration Database | Component-ConfigurationDatabase.md | EF Core data access, per-component repositories, unit-of-work, optimistic concurrency on deployment status, audit logging (IAuditService), migration management. |
| 18 | Management Service | Component-ManagementService.md | Akka.NET ManagementActor on central, ClusterClientReceptionist registration, programmatic access to all admin operations, CLI interface. |
| 19 | CLI | Component-CLI.md | Standalone command-line tool, System.CommandLine, Akka.NET ClusterClient transport, LDAP auth, JSON/table output, mirrors all Management Service operations. |
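
To make the Health Monitoring figures (row 11) concrete, here is a minimal sketch of how a 30s report interval, a 60s offline threshold, and monotonic sequence numbers could interact on the central side. It is written in Python for brevity and is not the project's implementation: the `HealthTracker` and `SiteHealth` names are hypothetical, and the policy of ignoring reports whose sequence number does not advance is an assumption about how the sequence numbers are used.

```python
from dataclasses import dataclass

REPORT_INTERVAL_S = 30.0    # from the Health Monitoring row
OFFLINE_THRESHOLD_S = 60.0  # from the Health Monitoring row


@dataclass
class SiteHealth:
    last_sequence: int = -1            # highest sequence number accepted so far
    last_seen: float = float("-inf")   # receive time of that report, in seconds


class HealthTracker:
    """Hypothetical central-side tracker (illustrative only)."""

    def __init__(self):
        self._sites = {}

    def on_report(self, site_id, sequence, now):
        """Record a report; return False if it is stale (assumed policy)."""
        state = self._sites.setdefault(site_id, SiteHealth())
        if sequence <= state.last_sequence:
            return False  # duplicate or out-of-order report: ignore it
        state.last_sequence = sequence
        state.last_seen = now
        return True

    def is_online(self, site_id, now):
        """A site is online if a report was accepted within the threshold."""
        state = self._sites.get(site_id)
        return state is not None and (now - state.last_seen) <= OFFLINE_THRESHOLD_S


tracker = HealthTracker()
tracker.on_report("site-a", sequence=1, now=0.0)
tracker.on_report("site-a", sequence=2, now=REPORT_INTERVAL_S)
print(tracker.is_online("site-a", now=80.0))  # 50 s since the last report
print(tracker.is_online("site-a", now=95.0))  # 65 s since the last report
```

Because the threshold is twice the report interval, one late or missing report does not by itself push a site offline in this sketch; it takes roughly two consecutive misses.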

Reference Documentation

  • AkkaDotNet/ — Akka.NET reference notes covering actors, remoting, clustering, persistence, streams, serialization, hosting, testing, and best practices.
  • docs/plans/ — Design decision documents from refinement sessions.

Architecture Diagram (Logical)

                    Users (Blazor Server)
                         │
                    Load Balancer
                         │
┌────────────────────────┼────────────────────────────┐
│                   CENTRAL CLUSTER                    │
│              (2-node active/standby)                 │
│                                                      │
│  ┌──────────┐ ┌──────────┐ ┌──────────┐            │
│  │ Template  │ │Deployment│ │ Central  │            │
│  │ Engine    │ │ Manager  │ │ UI       │ Blazor Svr │
│  └──────────┘ └──────────┘ └──────────┘            │
│  ┌──────────┐ ┌──────────┐ ┌──────────┐            │
│  │ Security  │ │  Config  │ │  Health  │            │
│  │ & Auth    │ │   DB     │ │ Monitor  │            │
│  │ (JWT/LDAP)│ │ (EF+IAud)│ │          │            │
│  └──────────┘ └──────────┘ └──────────┘            │
│  ┌──────────┐                                       │
│  │ Inbound  │  ◄── External Systems (X-API-Key)     │
│  │ API      │      POST /api/{method}, JSON         │
│  └──────────┘                                       │
│  ┌──────────┐                                       │
│  │ Mgmt     │  ◄── CLI (ClusterClient)              │
│  │ Service  │      ManagementActor + Receptionist   │
│  └──────────┘                                       │
│  ┌───────────────────────────────────┐              │
│  │    Akka.NET Communication Layer   │              │
│  │  (correlation IDs, per-pattern    │              │
│  │   timeouts, message ordering)     │              │
│  └──────────────┬────────────────────┘              │
│  ┌──────────────┴────────────────────┐              │
│  │    Configuration Database (EF)    │──► MS SQL    │
│  └───────────────────────────────────┘   (Config DB)│
│                  │                    Machine Data DB│
└─────────────────┼───────────────────────────────────┘
                  │ Akka.NET Remoting
     ┌────────────┼────────────┐
     ▼            ▼            ▼
┌─────────┐ ┌─────────┐ ┌─────────┐
│ SITE A  │ │ SITE B  │ │ SITE N  │
│ (2-node)│ │ (2-node)│ │ (2-node)│
│ ┌─────┐ │ │ ┌─────┐ │ │ ┌─────┐ │
│ │Data │ │ │ │Data │ │ │ │Data │ │
│ │Conn │ │ │ │Conn │ │ │ │Conn │ │
│ │Layer │ │ │ │Layer │ │ │ │Layer │ │
│ ├─────┤ │ │ ├─────┤ │ │ ├─────┤ │
│ │Site │ │ │ │Site │ │ │ │Site │ │
│ │Runtm│ │ │ │Runtm│ │ │ │Runtm│ │
│ ├─────┤ │ │ ├─────┤ │ │ ├─────┤ │
│ │S&F  │ │ │ │S&F  │ │ │ │S&F  │ │
│ │Engine│ │ │ │Engine│ │ │ │Engine│ │
│ ├─────┤ │ │ ├─────┤ │ │ ├─────┤ │
│ │ExtSys│ │ │ │ExtSys│ │ │ │ExtSys│ │
│ │Gatwy │ │ │ │Gatwy │ │ │ │Gatwy │ │
│ └─────┘ │ │ └─────┘ │ │ └─────┘ │
│ SQLite  │ │ SQLite  │ │ SQLite  │
└─────────┘ └─────────┘ └─────────┘
     │            │            │
  OPC UA /     OPC UA /     OPC UA /
  Custom       Custom       Custom
  Protocol     Protocol     Protocol
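
The communication layer in the diagram is labeled with correlation IDs and per-pattern timeouts. The following sketch illustrates that idea in general terms: each outgoing request gets an application-level correlation ID and a deadline derived from its message pattern, and replies are matched back by ID. It is in Python for brevity. The pattern names, timeout values, and the `PendingRequests` class are all hypothetical; none of them come from Component-Communication.md.

```python
import itertools

# Hypothetical pattern names and timeouts, for illustration only.
PATTERN_TIMEOUTS_S = {"query": 30.0, "deploy": 120.0}


class PendingRequests:
    """Tracks in-flight requests by correlation ID (illustrative only)."""

    def __init__(self, timeouts):
        self._timeouts = timeouts
        self._pending = {}  # correlation_id -> (pattern, deadline)
        self._ids = itertools.count(1)

    def send(self, pattern, now):
        """Register a request and return its correlation ID."""
        correlation_id = f"{pattern}-{next(self._ids)}"
        self._pending[correlation_id] = (pattern, now + self._timeouts[pattern])
        return correlation_id

    def on_reply(self, correlation_id, now):
        """Match a reply; return its pattern, or None if unknown or late."""
        entry = self._pending.pop(correlation_id, None)
        if entry is None:
            return None
        pattern, deadline = entry
        return pattern if now <= deadline else None

    def expire(self, now):
        """Remove and return the IDs of requests whose deadline has passed."""
        expired = [cid for cid, (_, deadline) in self._pending.items()
                   if now > deadline]
        for cid in expired:
            del self._pending[cid]
        return expired


requests = PendingRequests(PATTERN_TIMEOUTS_S)
query_id = requests.send("query", now=0.0)
deploy_id = requests.send("deploy", now=0.0)
print(requests.on_reply(query_id, now=10.0))   # reply arrives within 30 s
print(requests.expire(now=200.0))              # deploy request timed out
```

Keeping the timeout table keyed by pattern means a slow pattern (such as a deployment) can wait longer than a quick query without changing the matching logic.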

Site Runtime Actor Hierarchy

Deployment Manager Singleton (Cluster Singleton)
├── Instance Actor (one per deployed, enabled instance)
│   ├── Script Actor (coordinator, one per instance script)
│   │   └── Script Execution Actor (short-lived, per invocation)
│   ├── Alarm Actor (coordinator, one per alarm definition)
│   │   └── Alarm Execution Actor (short-lived, per on-trigger invocation)
│   └── ... (more Script/Alarm Actors)
├── Instance Actor
│   └── ...
└── ... (more Instance Actors)

Site-Wide Akka Stream (attribute + alarm state changes)
├── All Instance Actors publish to the stream
└── Debug view subscribes with instance-level filtering
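
The publish/subscribe shape described above, combined with the per-subscriber backpressure mentioned in the Site Runtime row, can be sketched as follows. This is plain Python rather than Akka Streams, and it is only an illustration: the `SiteStream`, `Subscription`, and `StateChange` names are hypothetical, and the bounded buffer with drop-oldest overflow is an assumed stand-in for whatever backpressure strategy the real stream uses.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class StateChange:
    instance: str  # name of the publishing instance
    kind: str      # "attribute" or "alarm"
    name: str
    value: object


class Subscription:
    """One subscriber with its own bounded buffer (assumed policy)."""

    def __init__(self, instance, capacity):
        self.instance = instance
        # deque(maxlen=...) silently drops the oldest item when full,
        # so a slow subscriber never blocks publishers or other subscribers.
        self.buffer = deque(maxlen=capacity)

    def drain(self):
        """Return and clear everything currently buffered."""
        items = list(self.buffer)
        self.buffer.clear()
        return items


class SiteStream:
    """Hypothetical site-wide stream with instance-level filtering."""

    def __init__(self):
        self._subscriptions = []

    def subscribe(self, instance, capacity=100):
        sub = Subscription(instance, capacity)
        self._subscriptions.append(sub)
        return sub

    def publish(self, change):
        for sub in self._subscriptions:
            if sub.instance == change.instance:  # instance-level filter
                sub.buffer.append(change)


stream = SiteStream()
debug_view = stream.subscribe("Pump1", capacity=2)
stream.publish(StateChange("Pump1", "attribute", "Speed", 10))
stream.publish(StateChange("Pump2", "attribute", "Speed", 99))  # filtered out
stream.publish(StateChange("Pump1", "attribute", "Speed", 20))
stream.publish(StateChange("Pump1", "alarm", "HighSpeed", True))
drained = debug_view.drain()
print([(c.name, c.value) for c in drained])
```

With a capacity of 2, the debug view ends up holding only the two most recent Pump1 changes; the Pump2 change never reaches it.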