# ScadaLink Design Documentation Project

This project contains design documentation for a distributed SCADA system built on Akka.NET. The documents describe a hub-and-spoke architecture with a central cluster and multiple site clusters.

## Project Structure

- `README.md` — Master index with component table and architecture diagrams.
- `HighLevelReqs.md` — Complete high-level requirements covering all functional areas.
- `Component-*.md` — Individual component design documents (one per component).
- `docs/plans/` — Design decision documents from refinement sessions.
- `AkkaDotNet/` — Akka.NET reference documentation and best practices notes.
- `test_infra.md` — Master test infrastructure doc (OPC UA, LDAP, MS SQL).
- `infra/` — Docker Compose and config files for local test services.
- `docker/` — Docker infrastructure for the 8-node cluster topology (2 central nodes + 3 two-node site clusters). See [`docker/README.md`](docker/README.md) for cluster setup, port allocation, and management commands.

## Document Conventions

- All documents are markdown files in the project root directory.
- Component documents are named `Component-.md` (PascalCase, hyphen-separated).
- Each component document follows a consistent structure: Purpose, Location, Responsibilities, detailed design sections, Dependencies, and Interactions.
- The README.md component table must stay in sync with actual component documents. When a component is added, removed, or renamed, update the table.
- Cross-component references in Dependencies and Interactions sections must be kept accurate across all documents. When a component's role changes, update references in all affected documents.

## Refinement Process

- When refining requirements, use the Socratic method — ask clarifying questions before making changes.
- Probe for implications of decisions across components before updating documents.
- When a decision is made, identify all affected documents and update them together for consistency.
- After updates, verify no stale cross-references remain (e.g., references to removed or renamed components).

## Editing Rules

- Edit documents in place. Do not create copies or backup files.
- When a change affects multiple documents, update all affected documents in the same session.
- Use `git diff` to review changes before committing.
- Commit related changes together with a descriptive message summarizing the design decision.

## Current Component List (19 components)

1. Template Engine — Template modeling, inheritance, composition, validation, flattening, diffs.
2. Deployment Manager — Central-side deployment pipeline, system-wide artifact deployment, instance lifecycle.
3. Site Runtime — Site-side actor hierarchy (Deployment Manager singleton, Instance/Script/Alarm Actors), script compilation, site-wide Akka stream.
4. Data Connection Layer — Protocol abstraction (OPC UA, custom), subscription management, clean data pipe.
5. Central–Site Communication — Akka.NET ClusterClient/ClusterClientReceptionist, message patterns, debug streaming.
6. Store-and-Forward Engine — Buffering, fixed-interval retry, parking, SQLite persistence, replication.
7. External System Gateway — External system definitions, API method invocation, database connections.
8. Notification Service — Notification lists, email delivery, store-and-forward integration.
9. Central UI — Web-based management interface, all workflows.
10. Security & Auth — LDAP/AD authentication, role-based authorization, site-scoped permissions.
11. Health Monitoring — Site health metrics collection and central reporting.
12. Site Event Logging — Local operational event logs at sites with central query access.
13. Cluster Infrastructure — Akka.NET cluster setup, active/standby failover, singleton support.
14. Inbound API — Web API for external systems, API key auth, script-based implementations.
15. Host — Single deployable binary, role-based component registration, Akka.NET bootstrap.
16. Commons — Shared types, POCO entity classes, repository interfaces, message contracts.
17. Configuration Database — EF Core data access layer, repositories, unit-of-work, audit logging (IAuditService), migrations.
18. Management Service — Akka.NET actor providing programmatic access to all admin operations, ClusterClientReceptionist registration.
19. CLI — Command-line tool using ClusterClient to interact with Management Service, System.CommandLine, JSON/table output.

## Key Design Decisions (for context across sessions)

### Architecture & Runtime

- Instance modeled as an Akka actor (Instance Actor) — single source of truth for runtime state.
- Site Runtime actor hierarchy: Deployment Manager singleton → Instance Actors → Script Actors + Alarm Actors.
- Script Actors spawn short-lived Script Execution Actors on a dedicated blocking I/O dispatcher.
- Alarm Actors are a separate peer subsystem alongside scripts (not inside the Script Engine).
- Shared scripts execute inline as compiled code (no separate actors).
- Site-wide Akka stream for attribute value and alarm state changes with per-subscriber buffering.
- Instance Actors serialize all state mutations (Akka actor model); concurrent scripts produce interleaved side effects.
- Staggered Instance Actor startup on failover to prevent reconnection storms.
- Explicit supervision strategies: Resume for coordinator actors, Stop for short-lived execution actors.

### Data & Communication

- Data Connection Layer is a clean data pipe — publishes to Instance Actors only.
- DCL connection actor uses Become/Stash pattern for lifecycle state machine.
- DCL auto-reconnect at fixed interval; immediate bad quality on disconnect; transparent re-subscribe.
- DCL write failures returned synchronously to the calling script.
- Tag path resolution retried periodically for devices still booting.
- Static attribute writes persisted to local SQLite (survive restart/failover, reset on redeployment).
- All timestamps are UTC throughout the system.
- Inter-cluster communication uses ClusterClient/ClusterClientReceptionist. Both CentralCommunicationActor and SiteCommunicationActor are registered with the receptionist. Central creates one ClusterClient per site, using that site's NodeA/NodeB as contact points. Sites configure multiple central contact points for failover. Site addresses are cached in CentralCommunicationActor and refreshed periodically (60s) and on admin changes. Heartbeats serve health monitoring only.

### External Integrations

- External System Gateway: HTTP/REST only, JSON serialization, API key + Basic Auth.
- Dual call modes: `ExternalSystem.Call()` (synchronous) and `ExternalSystem.CachedCall()` (store-and-forward on transient failure).
- Error classification: HTTP 5xx/408/429/connection errors = transient; other 4xx = permanent (returned to script).
- Notification Service: SMTP with OAuth2 Client Credentials (Microsoft 365) or Basic Auth. BCC delivery, plain text.
- Inbound API: `POST /api/{methodName}`, `X-API-Key` header, flat JSON, extended type system (Object, List).

### Templates & Deployment

- Pre-deployment validation includes semantic checks (call targets, argument types, trigger operand types).
- Composed member addressing uses path-qualified canonical names: `[ModuleInstanceName].[MemberName]`.
- Override granularity defined per entity type and per field.
- Template graph acyclicity enforced on save.
- Flattened configs include a revision hash for staleness detection.
- Deployment identity: unique deployment ID + revision hash for idempotency.
- Per-instance operation lock covers all mutating commands (deploy, disable, enable, delete).
- Site-side apply is all-or-nothing per instance.
- System-wide artifact version skew across sites is supported.
- Last-write-wins for concurrent template editing (no optimistic concurrency on templates).
- Optimistic concurrency on deployment status records.
- Naming collisions in composed feature modules are design-time errors.
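The revision-hash and deployment-ID rules above can be sketched in Python. This is a minimal illustration under stated assumptions, not the Deployment Manager's implementation: hashing with SHA-256 over key-sorted, compact JSON is an assumption (the hash algorithm is not specified here), and `revision_hash` and `SiteInstanceState` are hypothetical names. It shows why hashing a canonical serialization makes the hash independent of key order, how a repeated deployment ID is treated as a no-op, and how a differing hash flags a stale site.

```python
import hashlib
import json
from dataclasses import dataclass, field
from typing import Optional, Set


def revision_hash(flattened_config: dict) -> str:
    """Hash a flattened config (assumed: SHA-256 over key-sorted, compact JSON)."""
    canonical = json.dumps(flattened_config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


@dataclass
class SiteInstanceState:
    """Hypothetical site-side record of what has been applied for one instance."""

    applied_deployment_ids: Set[str] = field(default_factory=set)
    current_revision: Optional[str] = None

    def apply(self, deployment_id: str, flattened_config: dict) -> str:
        # A redelivered deployment (same ID) is acknowledged without re-applying it.
        if deployment_id in self.applied_deployment_ids:
            return "duplicate"
        # Compute the hash before mutating state so a failure leaves nothing half-applied.
        new_revision = revision_hash(flattened_config)
        self.current_revision = new_revision
        self.applied_deployment_ids.add(deployment_id)
        return "applied"

    def is_stale(self, latest_config: dict) -> bool:
        # Stale when central's latest flattened config hashes differently from what the site runs.
        return self.current_revision != revision_hash(latest_config)
```

In this sketch the deployment ID answers "has this exact request already been applied?" (safe retries), while the revision hash answers the separate question "is the site running the latest flattened config?", which is why the two are kept as distinct fields.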
### Store-and-Forward

- Fixed retry interval, no max buffer size. Only transient failures are buffered.
- Async best-effort replication to standby (no ack wait).
- Messages are not cleared on instance deletion.
- CachedCall idempotency is the caller's responsibility.

### Security & Auth

- Authentication: direct LDAP bind (username/password), no Kerberos/NTLM. LDAPS/StartTLS required.
- Cookie+JWT hybrid sessions: HttpOnly/Secure cookie carries an embedded JWT (HMAC-SHA256 shared symmetric key), 15-minute expiry with sliding refresh, 30-minute idle timeout. Cookies are the correct transport for Blazor Server (SignalR circuits).
- LDAP failure: new logins fail; active sessions continue with current roles.
- Load balancer in front of central UI; cookie-embedded JWT + shared Data Protection keys for failover transparency.

### Cluster & Failover

- Keep-oldest split-brain resolver with `down-if-alone = on`, 15s stable-after.
- Both nodes are seed nodes. `min-nr-of-members = 1`.
- Failure detection: 2s heartbeat, 10s threshold. Total failover ~25s (10s detection + 15s stable-after).
- CoordinatedShutdown for graceful singleton handover.
- Automatic dual-node recovery from persistent storage.

### UI & Monitoring

- Central UI: Blazor Server (ASP.NET Core + SignalR) with Bootstrap CSS. No third-party component frameworks (no Blazorise, MudBlazor, Radzen, etc.). Build custom Blazor components for tables, grids, forms, etc.
- UI design: Clean, corporate, internal-use aesthetic. Not flashy. Use the `frontend-design` skill when designing UI pages/components.
- Debug view: 2s polling timer. Health dashboard: 10s polling timer. Deployment status: real-time push via SignalR.
- Health reports: 30s interval, 60s offline threshold, monotonic sequence numbers, raw error counts per interval.
- Dead letter monitoring as a health metric.
- Site Event Logging: 30-day retention, 1GB storage cap, daily purge, paginated queries with keyword search.
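The health-report rules above (30s interval, 60s offline threshold, monotonic sequence numbers) can be sketched as a central-side tracker. This is a hypothetical illustration, not the Health Monitoring component's code: `SiteHealthTracker` and its methods are invented names, times are plain UTC epoch seconds passed in by the caller, and how sequence numbers behave across a site restart or failover is not modeled.

```python
from dataclasses import dataclass
from typing import Dict

REPORT_INTERVAL_S = 30                       # sites send a health report every 30 s
OFFLINE_THRESHOLD_S = 2 * REPORT_INTERVAL_S  # 60 s without a report => site shown offline


@dataclass
class SiteEntry:
    last_sequence: int       # highest sequence number accepted so far
    last_received_at: float  # central receive time, UTC epoch seconds
    error_count: int         # raw error count reported for the latest interval


class SiteHealthTracker:
    """Hypothetical central-side view of the latest accepted health report per site."""

    def __init__(self) -> None:
        self._sites: Dict[str, SiteEntry] = {}

    def on_report(self, site_id: str, sequence: int, error_count: int, now: float) -> bool:
        """Record a report; return False if it is a duplicate or arrived out of order."""
        entry = self._sites.get(site_id)
        # Monotonic sequence numbers let central discard replayed or reordered reports.
        if entry is not None and sequence <= entry.last_sequence:
            return False
        self._sites[site_id] = SiteEntry(sequence, now, error_count)
        return True

    def is_online(self, site_id: str, now: float) -> bool:
        """A site is online if its last accepted report is at most 60 s old."""
        entry = self._sites.get(site_id)
        return entry is not None and now - entry.last_received_at <= OFFLINE_THRESHOLD_S
```

Making the offline threshold twice the report interval means one delayed or lost report does not flip a site to offline; only a second consecutive miss does.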
### Code Organization

- Entity classes are persistence-ignorant POCOs in Commons; EF mappings in Configuration Database.
- Repository interfaces defined in Commons; implementations in Configuration Database.
- Commons namespace hierarchy: Types/, Interfaces/, Entities/, Messages/ with domain area subfolders.
- Message contracts follow additive-only evolution rules for version compatibility.
- Per-component configuration via `appsettings.json` sections bound to options classes (Options pattern).
- Options classes owned by component projects, not Commons.
- Host readiness gating: `/health/ready` endpoint, no traffic until operational.
- EF Core migrations: auto-apply in dev, manual SQL scripts for production.
- Audit logging absorbed into Configuration Database component (IAuditService).

### Akka.NET Conventions

- Tell for hot-path internal communication; Ask reserved for system boundaries.
- ClusterClient for cross-cluster communication; ClusterClientReceptionist for service discovery across cluster boundaries.
- Script trust model: forbidden APIs (System.IO, Process, Threading, Reflection, raw network).
- Application-level correlation IDs on all request/response messages.

## Tool Usage

- When consulting with the Codex MCP tool, use model `gpt-5.4`.
- When a task requires setting up or controlling system state (sites, templates, instances, data connections, deployments, security, etc.) and the Central UI is not needed, prefer the ScadaLink CLI over manual DB edits or UI navigation. See [`src/ScadaLink.CLI/README.md`](src/ScadaLink.CLI/README.md) for the full command reference.

### CLI Quick Reference (Docker / OrbStack)

- **Contact point**: `akka.tcp://scadalink@scadalink-central-a:8081` — the hostname must match the container's Akka `NodeHostname` config. Do NOT use `localhost:9011`; Akka remoting requires the hostname in the URI to match what the node advertises.
- **Test user**: `--username multi-role --password password` — has Admin, Design, and Deployment roles. The `admin` user only has the Admin role and cannot create templates, data connections, or deploy.
- **Config file**: `~/.scadalink/config.json` — stores contact points, LDAP settings (including `searchBase`, `serviceAccountDn`, `serviceAccountPassword`), and default format. See `docker/README.md` for a ready-to-use test config.
- **Rebuild cluster**: `bash docker/deploy.sh` — builds the `scadalink:latest` image and recreates all containers. Run this after code changes to ManagementActor, Host, or any server-side component.
- **Infrastructure services**: `cd infra && docker compose up -d` — starts LDAP, MS SQL, OPC UA, SMTP, REST API, and LmxFakeProxy. These are separate from the cluster containers in `docker/`.
- **All test LDAP passwords**: `password` (see `infra/glauth/config.toml` for users and groups).
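Because the contact point's host must match the node's advertised `NodeHostname`, a quick pre-flight check can catch the `localhost:9011` mistake before the CLI tries to connect. The helper below is a hypothetical Python sketch, not part of the ScadaLink CLI: `check_contact_point` is an invented name, and it only parses the URI with the standard library and compares its parts against expected values you supply.

```python
from typing import List
from urllib.parse import urlparse


def check_contact_point(uri: str, system_name: str, node_hostname: str, port: int) -> List[str]:
    """Return the problems found in an Akka.NET contact point URI (empty list if none)."""
    parsed = urlparse(uri)  # akka.tcp://<system>@<host>:<port>
    problems = []
    if parsed.scheme != "akka.tcp":
        problems.append(f"scheme is {parsed.scheme!r}, expected 'akka.tcp'")
    if parsed.username != system_name:
        problems.append(f"actor system is {parsed.username!r}, expected {system_name!r}")
    # The host must be the name the node advertises, not an alias such as localhost.
    if parsed.hostname != node_hostname.lower():
        problems.append(f"host is {parsed.hostname!r}, expected {node_hostname!r}")
    if parsed.port != port:
        problems.append(f"port is {parsed.port}, expected {port}")
    return problems
```

With the values from the quick reference, `check_contact_point("akka.tcp://scadalink@scadalink-central-a:8081", "scadalink", "scadalink-central-a", 8081)` returns an empty list, while the `localhost:9011` form reports both a host and a port mismatch.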