CLAUDE.md: reorganize Key Design Decisions into categorized sections covering architecture, data, integrations, templates, S&F, security, cluster, UI, monitoring, code organization, and Akka.NET conventions. Add docs/plans and AkkaDotNet to project structure. README.md: add technology stack table and scale summary. Update all 17 component descriptions to reflect refined designs. Update architecture diagram with load balancer, 2-node annotations, protocol connections, and component details. Add links to AkkaDotNet reference docs and design plans.
ScadaLink Design Documentation Project
This project contains design documentation for a distributed SCADA system built on Akka.NET. The documents describe a hub-and-spoke architecture with a central cluster and multiple site clusters.
Project Structure
- `README.md` — Master index with component table and architecture diagrams.
- `HighLevelReqs.md` — Complete high-level requirements covering all functional areas.
- `Component-*.md` — Individual component design documents (one per component).
- `docs/plans/` — Design decision documents from refinement sessions.
- `AkkaDotNet/` — Akka.NET reference documentation and best practices notes.
There is no source code in this project — only design documentation in markdown.
Document Conventions
- All documents are markdown files in the project root directory.
- Component documents are named `Component-<Name>.md` (PascalCase, hyphen-separated).
- Each component document follows a consistent structure: Purpose, Location, Responsibilities, detailed design sections, Dependencies, and Interactions.
- The README.md component table must stay in sync with actual component documents. When a component is added, removed, or renamed, update the table.
- Cross-component references in Dependencies and Interactions sections must be kept accurate across all documents. When a component's role changes, update references in all affected documents.
Refinement Process
- When refining requirements, use the Socratic method — ask clarifying questions before making changes.
- Probe for implications of decisions across components before updating documents.
- When a decision is made, identify all affected documents and update them together for consistency.
- After updates, verify no stale cross-references remain (e.g., references to removed or renamed components).
Editing Rules
- Edit documents in place. Do not create copies or backup files.
- When a change affects multiple documents, update all affected documents in the same session.
- Use `git diff` to review changes before committing.
- Commit related changes together with a descriptive message summarizing the design decision.
Current Component List (17 components)
- Template Engine — Template modeling, inheritance, composition, validation, flattening, diffs.
- Deployment Manager — Central-side deployment pipeline, system-wide artifact deployment, instance lifecycle.
- Site Runtime — Site-side actor hierarchy (Deployment Manager singleton, Instance/Script/Alarm Actors), script compilation, Akka stream.
- Data Connection Layer — Protocol abstraction (OPC UA, custom), subscription management, clean data pipe.
- Central–Site Communication — Akka.NET remoting, message patterns, debug streaming.
- Store-and-Forward Engine — Buffering, fixed-interval retry, parking, SQLite persistence, replication.
- External System Gateway — External system definitions, API method invocation, database connections.
- Notification Service — Notification lists, email delivery, store-and-forward integration.
- Central UI — Web-based management interface, all workflows.
- Security & Auth — LDAP/AD authentication, role-based authorization, site-scoped permissions.
- Health Monitoring — Site health metrics collection and central reporting.
- Site Event Logging — Local operational event logs at sites with central query access.
- Cluster Infrastructure — Akka.NET cluster setup, active/standby failover, singleton support.
- Inbound API — Web API for external systems, API key auth, script-based implementations.
- Host — Single deployable binary, role-based component registration, Akka.NET bootstrap.
- Commons — Shared types, POCO entity classes, repository interfaces, message contracts.
- Configuration Database — EF Core data access layer, repositories, unit-of-work, audit logging (IAuditService), migrations.
Key Design Decisions (for context across sessions)
Architecture & Runtime
- Instance modeled as Akka actor (Instance Actor) — single source of truth for runtime state.
- Site Runtime actor hierarchy: Deployment Manager singleton → Instance Actors → Script Actors + Alarm Actors.
- Script Actors spawn short-lived Script Execution Actors on a dedicated blocking I/O dispatcher.
- Alarm Actors are separate peer subsystem from scripts (not inside Script Engine).
- Shared scripts execute inline as compiled code (no separate actors).
- Site-wide Akka stream for attribute value and alarm state changes with per-subscriber buffering.
- Instance Actors serialize all state mutations (Akka actor model); concurrent scripts produce interleaved side effects.
- Staggered Instance Actor startup on failover to prevent reconnection storms.
- Explicit supervision strategies: Resume for coordinator actors, Stop for short-lived execution actors.
Data & Communication
- Data Connection Layer is a clean data pipe — publishes to Instance Actors only.
- DCL connection actor uses Become/Stash pattern for lifecycle state machine.
- DCL auto-reconnect at fixed interval; immediate bad quality on disconnect; transparent re-subscribe.
- DCL write failures returned synchronously to calling script.
- Tag path resolution retried periodically for devices still booting.
- Static attribute writes persisted to local SQLite (survive restart/failover, reset on redeployment).
- All timestamps are UTC throughout the system.
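The Become/Stash lifecycle mentioned above can be illustrated with a toy model. This is a minimal sketch in plain Python, not Akka.NET code: the class name, message strings, and fields are all hypothetical, and the real actor would use Akka.NET's `Become` and `IWithUnboundedStash` facilities instead of the explicit queue shown here.

```python
from collections import deque


class ConnectionActor:
    """Toy model of a DCL connection actor (hypothetical names throughout)."""

    def __init__(self):
        self.state = "connecting"  # name of the current behavior
        self.stash = deque()       # requests deferred until connected
        self.sent = []             # requests forwarded to the device
        self.quality = "bad"       # quality reported to Instance Actors

    def receive(self, message):
        # Dispatch to the handler for the current behavior.
        if self.state == "connecting":
            self._connecting(message)
        else:
            self._connected(message)

    def _connecting(self, message):
        if message == "connected":
            self.state = "connected"  # corresponds to Become(connected)
            self.quality = "good"
            while self.stash:         # corresponds to UnstashAll(), in arrival order
                self._connected(self.stash.popleft())
        elif message == "disconnected":
            pass                      # already disconnected, nothing to do
        else:
            self.stash.append(message)  # corresponds to Stash()

    def _connected(self, message):
        if message == "disconnected":
            self.state = "connecting"  # corresponds to Become(connecting)
            self.quality = "bad"       # immediate bad quality on disconnect
        else:
            self.sent.append(message)
```

Requests that arrive while the connection is down are held and replayed once the link is up, which is what makes re-subscription transparent to callers in this model.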
External Integrations
- External System Gateway: HTTP/REST only, JSON serialization, API key + Basic Auth.
- Dual call modes: `ExternalSystem.Call()` (synchronous) and `ExternalSystem.CachedCall()` (store-and-forward on transient failure).
- Error classification: HTTP 5xx/408/429/connection errors = transient; other 4xx = permanent (returned to script).
- Notification Service: SMTP with OAuth2 Client Credentials (Microsoft 365) or Basic Auth. BCC delivery, plain text.
- Inbound API: `POST /api/{methodName}`, `X-API-Key` header, flat JSON, extended type system (Object, List).
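The error classification rule for the External System Gateway can be expressed as a small function. This is a sketch only: the function name, its parameters, and the `"not_an_error"` label for status codes below 400 are assumptions made for illustration, since the rule above only covers failures.

```python
def classify_failure(status_code=None, connection_error=False):
    """Classify a call outcome as 'transient', 'permanent', or 'not_an_error'.

    Hypothetical helper: transient failures are candidates for
    store-and-forward, permanent ones are returned to the calling script.
    """
    if connection_error:
        return "transient"          # connection errors are always transient
    if status_code is None:
        raise ValueError("status_code is required when there is no connection error")
    if 500 <= status_code <= 599 or status_code in (408, 429):
        return "transient"          # 5xx, request timeout, too many requests
    if 400 <= status_code <= 499:
        return "permanent"          # every other 4xx
    return "not_an_error"           # assumption: codes below 400 are not failures
```

Under this reading, a `CachedCall()` would buffer only when the result is `"transient"`, while a `"permanent"` result goes straight back to the script.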
Templates & Deployment
- Pre-deployment validation includes semantic checks (call targets, argument types, trigger operand types).
- Composed member addressing uses path-qualified canonical names: `[ModuleInstanceName].[MemberName]`.
- Override granularity defined per entity type and per field.
- Template graph acyclicity enforced on save.
- Flattened configs include a revision hash for staleness detection.
- Deployment identity: unique deployment ID + revision hash for idempotency.
- Per-instance operation lock covers all mutating commands (deploy, disable, enable, delete).
- Site-side apply is all-or-nothing per instance.
- System-wide artifact version skew across sites is supported.
- Last-write-wins for concurrent template editing (no optimistic concurrency on templates).
- Optimistic concurrency on deployment status records.
- Naming collisions in composed feature modules are design-time errors.
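The revision hash and deployment identity decisions above could work along the following lines. This sketch assumes SHA-256 over a canonical JSON encoding, which the design notes do not specify; the function names and the shape of the config dictionary are also hypothetical.

```python
import hashlib
import json


def revision_hash(flattened_config):
    """Hash a flattened config so that key order does not affect the result.

    Assumption: canonical JSON (sorted keys, compact separators) plus SHA-256.
    """
    canonical = json.dumps(flattened_config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def is_stale(deployed_hash, current_config):
    """A deployed instance is stale when its hash no longer matches the current config."""
    return deployed_hash != revision_hash(current_config)


def should_apply(applied, deployment_id, rev_hash):
    """Idempotency check: skip a (deployment ID, revision hash) pair already applied.

    `applied` is a set of pairs the site has processed (hypothetical bookkeeping).
    """
    return (deployment_id, rev_hash) not in applied
```

A canonical encoding matters here because two semantically identical configs must hash to the same value, otherwise staleness detection would report false positives.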
Store-and-Forward
- Fixed retry interval, no max buffer size. Only transient failures buffered.
- Async best-effort replication to standby (no ack wait).
- Messages not cleared on instance deletion.
- CachedCall idempotency is the caller's responsibility.
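A fixed-interval retry buffer with no size limit can be sketched as follows. The class and method names are hypothetical, time is passed in explicitly to keep the example deterministic, and parking and SQLite persistence are left out because their rules are not stated in this section.

```python
class StoreAndForwardBuffer:
    """Unbounded buffer that retries each message at a fixed interval (illustrative)."""

    def __init__(self, retry_interval_s):
        self.retry_interval_s = retry_interval_s
        self.pending = []  # list of (due_at, message); deliberately unbounded

    def buffer(self, message, now):
        """Store a message after a transient failure; first retry is one interval away."""
        self.pending.append((now + self.retry_interval_s, message))

    def retry_due(self, now, send):
        """Retry every message whose time has come.

        `send(message)` returns True on delivery and False on another
        transient failure. Returns the list of delivered messages.
        """
        delivered = []
        remaining = []
        for due_at, message in self.pending:
            if due_at > now:
                remaining.append((due_at, message))   # not due yet
            elif send(message):
                delivered.append(message)             # delivered, drop from buffer
            else:
                # Same fixed interval every time: no exponential backoff.
                remaining.append((now + self.retry_interval_s, message))
        self.pending = remaining
        return delivered
```

Because a message may be sent more than once if an acknowledgement is lost, the target of a `CachedCall` has to tolerate duplicates, which is why idempotency is left to the caller.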
Security & Auth
- Authentication: direct LDAP bind (username/password), no Kerberos/NTLM. LDAPS/StartTLS required.
- JWT sessions: HMAC-SHA256 shared symmetric key, 15-minute expiry with sliding refresh, 30-minute idle timeout.
- LDAP failure: new logins fail; active sessions continue with current roles.
- Load balancer in front of central UI; JWT + shared Data Protection keys for failover transparency.
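To make the JWT decision concrete, here is a minimal HS256 sign-and-verify sketch using only the Python standard library. The claim names `roles`, the function names, and the explicit `now` parameter are assumptions for illustration; sliding refresh and the 30-minute idle timeout are not modeled.

```python
import base64
import hashlib
import hmac
import json

TOKEN_LIFETIME_S = 15 * 60  # 15-minute expiry from the decision above


def _b64url(data):
    """Base64url without padding, as used by JWT."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text):
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _sign(signing_input, key):
    return _b64url(hmac.new(key, signing_input.encode("utf-8"), hashlib.sha256).digest())


def issue_token(username, roles, key, now):
    """Build a JWT signed with HMAC-SHA256 using a shared symmetric key."""
    header = {"alg": "HS256", "typ": "JWT"}
    claims = {"sub": username, "roles": roles, "iat": now, "exp": now + TOKEN_LIFETIME_S}
    signing_input = (_b64url(json.dumps(header).encode("utf-8")) + "."
                     + _b64url(json.dumps(claims).encode("utf-8")))
    return signing_input + "." + _sign(signing_input, key)


def verify_token(token, key, now):
    """Return the claims if the signature is valid and the token is unexpired, else None."""
    parts = token.split(".")
    if len(parts) != 3:
        return None
    header_b64, claims_b64, signature_b64 = parts
    expected = _sign(header_b64 + "." + claims_b64, key)
    # Constant-time comparison to avoid leaking signature bytes via timing.
    if not hmac.compare_digest(expected.encode("utf-8"), signature_b64.encode("utf-8")):
        return None
    claims = json.loads(_b64url_decode(claims_b64))
    if now >= claims["exp"]:
        return None
    return claims
```

Because the key is symmetric and shared, any central node holding it can validate a token issued by the other node, which is what makes failover behind the load balancer transparent to the session.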
Cluster & Failover
- Keep-oldest split-brain resolver with `down-if-alone = on`, 15s stable-after.
- Both nodes are seed nodes. `min-nr-of-members = 1`.
- Failure detection: 2s heartbeat, 10s threshold. Total failover ~25s.
- CoordinatedShutdown for graceful singleton handover.
- Automatic dual-node recovery from persistent storage.
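One plausible reading of the ~25s failover figure is that it is the failure detection threshold plus the split-brain resolver's stable-after period. The sketch below encodes that assumption; it ignores singleton handover and actor startup time, and the constant and function names are hypothetical.

```python
HEARTBEAT_INTERVAL_S = 2   # heartbeat interval from the decision above
FAILURE_THRESHOLD_S = 10   # time without heartbeats before a node is suspected
STABLE_AFTER_S = 15        # split-brain resolver stable-after period


def missed_heartbeats_before_detection():
    """Number of heartbeat intervals that fit inside the detection threshold."""
    return FAILURE_THRESHOLD_S // HEARTBEAT_INTERVAL_S


def estimated_failover_s():
    """Assumed budget: detect the failure, then wait for the resolver to act."""
    return FAILURE_THRESHOLD_S + STABLE_AFTER_S
```

With the values above, the estimate comes to 25 seconds after about five missed heartbeats, which matches the stated total.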
UI & Monitoring
- Central UI: Blazor Server (ASP.NET Core + SignalR). Real-time push for debug view, health dashboard, deployment status.
- Health reports: 30s interval, 60s offline threshold, monotonic sequence numbers, raw error counts per interval.
- Dead letter monitoring as a health metric.
- Site Event Logging: 30-day retention, 1GB storage cap, daily purge, paginated queries with keyword search.
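The health report timing can be sketched as a small tracker on the central side. This is illustrative only: the class name is hypothetical, and it assumes that sequence numbers are used to discard duplicate or out-of-order reports and that a site with no report yet counts as offline.

```python
REPORT_INTERVAL_S = 30    # sites report every 30 seconds
OFFLINE_THRESHOLD_S = 60  # a site is offline after 60 seconds of silence


class SiteHealthTracker:
    """Tracks the most recent health report from one site (illustrative)."""

    def __init__(self):
        self.last_sequence = None
        self.last_seen = None

    def on_report(self, sequence, received_at):
        """Accept a report only if its sequence number is newer than the last one.

        Assumption: a sequence reset after a site restart is not handled here.
        """
        if self.last_sequence is not None and sequence <= self.last_sequence:
            return False  # duplicate or out-of-order report
        self.last_sequence = sequence
        self.last_seen = received_at
        return True

    def is_offline(self, now):
        """Offline when no report has arrived within the threshold."""
        if self.last_seen is None:
            return True
        return now - self.last_seen > OFFLINE_THRESHOLD_S
```

With a 30-second interval and a 60-second threshold, a site can miss one report without being flagged, and is marked offline once a second report fails to arrive.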
Code Organization
- Entity classes are persistence-ignorant POCOs in Commons; EF mappings in Configuration Database.
- Repository interfaces defined in Commons; implementations in Configuration Database.
- Commons namespace hierarchy: Types/, Interfaces/, Entities/, Messages/ with domain area subfolders.
- Message contracts follow additive-only evolution rules for version compatibility.
- Per-component configuration via `appsettings.json` sections bound to options classes (Options pattern).
- Options classes owned by component projects, not Commons.
- Host readiness gating: `/health/ready` endpoint, no traffic until operational.
- EF Core migrations: auto-apply in dev, manual SQL scripts for production.
- Audit logging absorbed into Configuration Database component (IAuditService).
Akka.NET Conventions
- Tell for hot-path internal communication; Ask reserved for system boundaries.
- Script trust model: forbidden APIs (System.IO, Process, Threading, Reflection, raw network).
- Application-level correlation IDs on all request/response messages.
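Application-level correlation IDs let a sender match a response to its request when communication uses Tell instead of Ask. The following is a minimal sketch with hypothetical message and class names; the actual message contracts live in Commons and are not shown in this document.

```python
import uuid
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Request:
    payload: str
    # Each request gets a fresh ID unless the caller supplies one.
    correlation_id: str = field(default_factory=lambda: str(uuid.uuid4()))


@dataclass(frozen=True)
class Response:
    correlation_id: str  # copied from the request being answered
    result: str


class PendingRequests:
    """Tracks in-flight requests so responses can be matched by correlation ID."""

    def __init__(self):
        self._pending = {}

    def track(self, request):
        self._pending[request.correlation_id] = request

    def resolve(self, response):
        """Return the matching request, or None for an unknown or duplicate response."""
        return self._pending.pop(response.correlation_id, None)
```

Carrying the ID in the message itself, as opposed to relying on the transport, also lets logs on the central and site sides be joined on the same value.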
Tool Usage
- When consulting with the Codex MCP tool, use model `gpt-5.4`.