# ScadaLink Design Documentation Project

This project contains design documentation for a distributed SCADA system built on Akka.NET. The documents describe a hub-and-spoke architecture with a central cluster and multiple site clusters.

## Project Structure

- `README.md` — Master index with component table and architecture diagrams.
- `HighLevelReqs.md` — Complete high-level requirements covering all functional areas.
- `Component-*.md` — Individual component design documents (one per component).
- `docs/plans/` — Design decision documents from refinement sessions.
- `AkkaDotNet/` — Akka.NET reference documentation and best practices notes.
- `test_infra.md` — Master test infrastructure doc (OPC UA, LDAP, MS SQL).
- `infra/` — Docker Compose and config files for local test services.

There is no source code in this project — only design documentation in markdown.

## Document Conventions

- All documents are markdown files in the project root directory.
- Component documents are named `Component-<Name>.md` (PascalCase, hyphen-separated).
- Each component document follows a consistent structure: Purpose, Location, Responsibilities, detailed design sections, Dependencies, and Interactions.
- The README.md component table must stay in sync with actual component documents. When a component is added, removed, or renamed, update the table.
- Cross-component references in Dependencies and Interactions sections must be kept accurate across all documents. When a component's role changes, update references in all affected documents.

## Refinement Process

- When refining requirements, use the Socratic method — ask clarifying questions before making changes.
- Probe for implications of decisions across components before updating documents.
- When a decision is made, identify all affected documents and update them together for consistency.
- After updates, verify no stale cross-references remain (e.g., references to removed or renamed components).
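The stale cross-reference check described above could be partly automated. The following is a minimal sketch, not part of the project: it assumes references appear as literal file names of the form `Component-Something.md`, and the function name `find_stale_references` is hypothetical. It only detects mentions of component files that no longer exist; it does not verify that the README table lists every component.

```python
import re
from pathlib import Path

# Assumption: component references appear as literal file names that follow
# the naming convention in this document, e.g. "Component-Host.md".
# The glob form "Component-*.md" is deliberately not matched.
COMPONENT_REF = re.compile(r"Component-[A-Za-z0-9-]+\.md")


def find_stale_references(root: Path) -> dict[str, set[str]]:
    """Map each markdown file in root to the component files it mentions
    that do not exist in root."""
    existing = {p.name for p in root.glob("Component-*.md")}
    stale: dict[str, set[str]] = {}
    for doc in sorted(root.glob("*.md")):
        mentioned = set(COMPONENT_REF.findall(doc.read_text(encoding="utf-8")))
        missing = mentioned - existing
        if missing:
            stale[doc.name] = missing
    return stale
```

Calling `find_stale_references(Path("."))` from the project root returns an empty dictionary when every mentioned component file exists. Documents under `docs/plans/` are not scanned in this sketch.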

## Editing Rules

- Edit documents in place. Do not create copies or backup files.
- When a change affects multiple documents, update all affected documents in the same session.
- Use `git diff` to review changes before committing.
- Commit related changes together with a descriptive message summarizing the design decision.

## Current Component List (17 components)

1. Template Engine — Template modeling, inheritance, composition, validation, flattening, diffs.
2. Deployment Manager — Central-side deployment pipeline, system-wide artifact deployment, instance lifecycle.
3. Site Runtime — Site-side actor hierarchy (Deployment Manager singleton, Instance/Script/Alarm Actors), script compilation, Akka stream.
4. Data Connection Layer — Protocol abstraction (OPC UA, custom), subscription management, clean data pipe.
5. Central–Site Communication — Akka.NET remoting, message patterns, debug streaming.
6. Store-and-Forward Engine — Buffering, fixed-interval retry, parking, SQLite persistence, replication.
7. External System Gateway — External system definitions, API method invocation, database connections.
8. Notification Service — Notification lists, email delivery, store-and-forward integration.
9. Central UI — Web-based management interface, all workflows.
10. Security & Auth — LDAP/AD authentication, role-based authorization, site-scoped permissions.
11. Health Monitoring — Site health metrics collection and central reporting.
12. Site Event Logging — Local operational event logs at sites with central query access.
13. Cluster Infrastructure — Akka.NET cluster setup, active/standby failover, singleton support.
14. Inbound API — Web API for external systems, API key auth, script-based implementations.
15. Host — Single deployable binary, role-based component registration, Akka.NET bootstrap.
16. Commons — Shared types, POCO entity classes, repository interfaces, message contracts.
17. Configuration Database — EF Core data access layer, repositories, unit-of-work, audit logging (IAuditService), migrations.

## Key Design Decisions (for context across sessions)

### Architecture & Runtime

- Instance modeled as Akka actor (Instance Actor) — single source of truth for runtime state.
- Site Runtime actor hierarchy: Deployment Manager singleton → Instance Actors → Script Actors + Alarm Actors.
- Script Actors spawn short-lived Script Execution Actors on a dedicated blocking I/O dispatcher.
- Alarm Actors are separate peer subsystem from scripts (not inside Script Engine).
- Shared scripts execute inline as compiled code (no separate actors).
- Site-wide Akka stream for attribute value and alarm state changes with per-subscriber buffering.
- Instance Actors serialize all state mutations (Akka actor model); concurrent scripts produce interleaved side effects.
- Staggered Instance Actor startup on failover to prevent reconnection storms.
- Explicit supervision strategies: Resume for coordinator actors, Stop for short-lived execution actors.

### Data & Communication

- Data Connection Layer is a clean data pipe — publishes to Instance Actors only.
- DCL connection actor uses Become/Stash pattern for lifecycle state machine.
- DCL auto-reconnect at fixed interval; immediate bad quality on disconnect; transparent re-subscribe.
- DCL write failures returned synchronously to calling script.
- Tag path resolution retried periodically for devices still booting.
- Static attribute writes persisted to local SQLite (survive restart/failover, reset on redeployment).
- All timestamps are UTC throughout the system.

### External Integrations

- External System Gateway: HTTP/REST only, JSON serialization, API key + Basic Auth.
- Dual call modes: `ExternalSystem.Call()` (synchronous) and `ExternalSystem.CachedCall()` (store-and-forward on transient failure).
- Error classification: HTTP 5xx/408/429/connection errors = transient; other 4xx = permanent (returned to script).
- Notification Service: SMTP with OAuth2 Client Credentials (Microsoft 365) or Basic Auth. BCC delivery, plain text.
- Inbound API: `POST /api/{methodName}`, `X-API-Key` header, flat JSON, extended type system (Object, List).

### Templates & Deployment

- Pre-deployment validation includes semantic checks (call targets, argument types, trigger operand types).
- Composed member addressing uses path-qualified canonical names: `[ModuleInstanceName].[MemberName]`.
- Override granularity defined per entity type and per field.
- Template graph acyclicity enforced on save.
- Flattened configs include a revision hash for staleness detection.
- Deployment identity: unique deployment ID + revision hash for idempotency.
- Per-instance operation lock covers all mutating commands (deploy, disable, enable, delete).
- Site-side apply is all-or-nothing per instance.
- System-wide artifact version skew across sites is supported.
- Last-write-wins for concurrent template editing (no optimistic concurrency on templates).
- Optimistic concurrency on deployment status records.
- Naming collisions in composed feature modules are design-time errors.

### Store-and-Forward

- Fixed retry interval, no max buffer size. Only transient failures buffered.
- Async best-effort replication to standby (no ack wait).
- Messages not cleared on instance deletion.
- CachedCall idempotency is the caller's responsibility.

### Security & Auth

- Authentication: direct LDAP bind (username/password), no Kerberos/NTLM. LDAPS/StartTLS required.
- JWT sessions: HMAC-SHA256 shared symmetric key, 15-minute expiry with sliding refresh, 30-minute idle timeout.
- LDAP failure: new logins fail; active sessions continue with current roles.
- Load balancer in front of central UI; JWT + shared Data Protection keys for failover transparency.

### Cluster & Failover

- Keep-oldest split-brain resolver with `down-if-alone = on`, 15s stable-after.
- Both nodes are seed nodes. `min-nr-of-members = 1`.
- Failure detection: 2s heartbeat, 10s threshold. Total failover ~25s.
- CoordinatedShutdown for graceful singleton handover.
- Automatic dual-node recovery from persistent storage.

### UI & Monitoring

- Central UI: Blazor Server (ASP.NET Core + SignalR) with Bootstrap CSS. No third-party component frameworks (no Blazorise, MudBlazor, Radzen, etc.). Build custom Blazor components for tables, grids, forms, etc.
- UI design: Clean, corporate, internal-use aesthetic. Not flashy. Use the `frontend-design` skill when designing UI pages/components.
- Real-time push for debug view, health dashboard, deployment status.
- Health reports: 30s interval, 60s offline threshold, monotonic sequence numbers, raw error counts per interval.
- Dead letter monitoring as a health metric.
- Site Event Logging: 30-day retention, 1GB storage cap, daily purge, paginated queries with keyword search.

### Code Organization

- Entity classes are persistence-ignorant POCOs in Commons; EF mappings in Configuration Database.
- Repository interfaces defined in Commons; implementations in Configuration Database.
- Commons namespace hierarchy: Types/, Interfaces/, Entities/, Messages/ with domain area subfolders.
- Message contracts follow additive-only evolution rules for version compatibility.
- Per-component configuration via `appsettings.json` sections bound to options classes (Options pattern).
- Options classes owned by component projects, not Commons.
- Host readiness gating: `/health/ready` endpoint, no traffic until operational.
- EF Core migrations: auto-apply in dev, manual SQL scripts for production.
- Audit logging absorbed into Configuration Database component (IAuditService).

### Akka.NET Conventions

- Tell for hot-path internal communication; Ask reserved for system boundaries.
- Script trust model: forbidden APIs (System.IO, Process, Threading, Reflection, raw network).
- Application-level correlation IDs on all request/response messages.
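
For quick reference when reviewing component documents, the External System Gateway error classification listed under External Integrations (HTTP 5xx/408/429/connection errors are transient; other 4xx are permanent) can be sketched as a small decision function. This is an illustrative Python sketch only, not the project's implementation: the name `classify_failure`, the use of `None` to represent a connection error, and the handling of non-failure status codes are all assumptions.

```python
from typing import Optional

# Status codes outside the 5xx range that the rule treats as transient.
TRANSIENT_STATUS_CODES = {408, 429}


def classify_failure(status_code: Optional[int]) -> str:
    """Return "transient" or "permanent" for a failed external call.

    Assumption: status_code is None when no HTTP response was received
    (a connection error).
    """
    if status_code is None:
        return "transient"
    if 500 <= status_code <= 599 or status_code in TRANSIENT_STATUS_CODES:
        return "transient"
    if 400 <= status_code <= 499:
        return "permanent"
    # Assumption: anything else (1xx/2xx/3xx) is not a failure covered by the rule.
    raise ValueError(f"status {status_code} is not covered by the classification rule")
```

Under the stated decisions, only a "transient" result would make `ExternalSystem.CachedCall()` buffer the message for store-and-forward; a "permanent" result is returned to the calling script.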

## Tool Usage

- When consulting with the Codex MCP tool, use model `gpt-5.4`.