# ScadaLink Design Documentation Project

This project contains design documentation for a distributed SCADA system built on Akka.NET. The documents describe a hub-and-spoke architecture with a central cluster and multiple site clusters.

## Project Structure

- `README.md` — Master index with component table and architecture diagrams.
- `HighLevelReqs.md` — Complete high-level requirements covering all functional areas.
- `Component-*.md` — Individual component design documents (one per component).
- `docs/plans/` — Design decision documents from refinement sessions.
- `AkkaDotNet/` — Akka.NET reference documentation and best practices notes.
- `test_infra.md` — Master test infrastructure doc (OPC UA, LDAP, MS SQL).
- `infra/` — Docker Compose and config files for local test services.

There is no source code in this project — only design documentation in markdown.

## Document Conventions

- Core documents (`README.md`, `HighLevelReqs.md`, `test_infra.md`, and `Component-*.md`) are markdown files in the project root directory; supporting material lives in the subdirectories listed above.
- Component documents are named `Component-<Name>.md` (PascalCase, hyphen-separated).
- Each component document follows a consistent structure: Purpose, Location, Responsibilities, detailed design sections, Dependencies, and Interactions.
- The `README.md` component table must stay in sync with actual component documents. When a component is added, removed, or renamed, update the table.
- Cross-component references in Dependencies and Interactions sections must be kept accurate across all documents. When a component's role changes, update references in all affected documents.

## Refinement Process

- When refining requirements, use the Socratic method — ask clarifying questions before making changes.
- Probe for implications of decisions across components before updating documents.
- When a decision is made, identify all affected documents and update them together for consistency.
- After updates, verify no stale cross-references remain (e.g., references to removed or renamed components).

## Editing Rules

- Edit documents in place. Do not create copies or backup files.
- When a change affects multiple documents, update all affected documents in the same session.
- Use `git diff` to review changes before committing.
- Commit related changes together with a descriptive message summarizing the design decision.

## Current Component List (17 components)

1. Template Engine — Template modeling, inheritance, composition, validation, flattening, diffs.
2. Deployment Manager — Central-side deployment pipeline, system-wide artifact deployment, instance lifecycle.
3. Site Runtime — Site-side actor hierarchy (Deployment Manager singleton, Instance/Script/Alarm Actors), script compilation, site-wide Akka stream.
4. Data Connection Layer — Protocol abstraction (OPC UA, custom), subscription management, clean data pipe.
5. Central–Site Communication — Akka.NET remoting, message patterns, debug streaming.
6. Store-and-Forward Engine — Buffering, fixed-interval retry, parking, SQLite persistence, replication.
7. External System Gateway — External system definitions, API method invocation, database connections.
8. Notification Service — Notification lists, email delivery, store-and-forward integration.
9. Central UI — Web-based management interface covering all workflows.
10. Security & Auth — LDAP/AD authentication, role-based authorization, site-scoped permissions.
11. Health Monitoring — Site health metrics collection and central reporting.
12. Site Event Logging — Local operational event logs at sites with central query access.
13. Cluster Infrastructure — Akka.NET cluster setup, active/standby failover, singleton support.
14. Inbound API — Web API for external systems, API key auth, script-based implementations.
15. Host — Single deployable binary, role-based component registration, Akka.NET bootstrap.
16. Commons — Shared types, POCO entity classes, repository interfaces, message contracts.
17. Configuration Database — EF Core data access layer, repositories, unit-of-work, audit logging (IAuditService), migrations.

## Key Design Decisions (for context across sessions)

### Architecture & Runtime
- Instance modeled as Akka actor (Instance Actor) — single source of truth for runtime state.
- Site Runtime actor hierarchy: Deployment Manager singleton → Instance Actors → Script Actors + Alarm Actors.
- Script Actors spawn short-lived Script Execution Actors on a dedicated blocking I/O dispatcher.
- Alarm Actors are a separate peer subsystem alongside Script Actors (not inside the Script Engine).
- Shared scripts execute inline as compiled code (no separate actors).
- Site-wide Akka stream for attribute value and alarm state changes with per-subscriber buffering.
- Instance Actors serialize all state mutations (Akka actor model); concurrent scripts produce interleaved side effects.
- Staggered Instance Actor startup on failover to prevent reconnection storms.
- Explicit supervision strategies: Resume for coordinator actors, Stop for short-lived execution actors.

### Data & Communication
- Data Connection Layer is a clean data pipe — publishes to Instance Actors only.
- DCL connection actor uses the Become/Stash pattern for its lifecycle state machine.
- DCL auto-reconnects at a fixed interval; immediate bad quality on disconnect; transparent re-subscribe after reconnect.
- DCL write failures are returned synchronously to the calling script.
- Tag path resolution is retried periodically for devices still booting.
- Static attribute writes persisted to local SQLite (survive restart/failover, reset on redeployment).
- All timestamps are UTC throughout the system.
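
The connection lifecycle above can be sketched as a small state machine. This is a language-neutral illustration written in Python, not Akka.NET code: the swappable `behavior` method stands in for `Become`, the `stash` list stands in for `Stash.Stash()`/`Stash.UnstashAll()`, and the message types, tag names, and `device_calls` log are hypothetical. The fixed-interval reconnect timer is only noted in a comment.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical lifecycle and request messages (not the real Commons contracts).
@dataclass(frozen=True)
class Connected:
    pass

@dataclass(frozen=True)
class Disconnected:
    pass

@dataclass(frozen=True)
class Subscribe:
    tag: str

class ConnectionActorSketch:
    """Mimics Become/Stash: one active handler at a time plus a stash of deferred messages."""

    def __init__(self) -> None:
        self.subscriptions: set = set()   # tags to restore after every reconnect
        self.quality: dict = {}           # last known quality per tag
        self.stash: list = []             # requests deferred while not connected
        self.device_calls: list = []      # stands in for calls to the device protocol client
        self.behavior: Callable[[object], None] = self.connecting

    def tell(self, msg: object) -> None:
        self.behavior(msg)

    def become(self, behavior: Callable[[object], None]) -> None:
        self.behavior = behavior

    def unstash_all(self) -> None:
        pending, self.stash = self.stash, []
        for msg in pending:
            self.tell(msg)

    def connecting(self, msg: object) -> None:
        if isinstance(msg, Connected):
            # Transparent re-subscribe: restore existing subscriptions first.
            for tag in sorted(self.subscriptions):
                self.device_calls.append(f"subscribe {tag}")
            self.become(self.connected)
            self.unstash_all()
        elif isinstance(msg, Subscribe):
            self.stash.append(msg)  # defer until the connection is up

    def connected(self, msg: object) -> None:
        if isinstance(msg, Subscribe):
            self.subscriptions.add(msg.tag)
            self.device_calls.append(f"subscribe {msg.tag}")
        elif isinstance(msg, Disconnected):
            # Immediate bad quality for every subscribed tag.
            for tag in self.subscriptions:
                self.quality[tag] = "Bad"
            self.become(self.connecting)
            # A real actor would schedule the next reconnect attempt at the fixed interval here.
```

In this sketch a subscription requested while the connection is still coming up is stashed rather than lost, and each reconnect re-issues existing subscriptions before replaying stashed requests.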
### External Integrations
- External System Gateway: HTTP/REST only, JSON serialization, API key + Basic Auth.
- Dual call modes: `ExternalSystem.Call()` (synchronous) and `ExternalSystem.CachedCall()` (store-and-forward on transient failure).
- Error classification: HTTP 5xx, 408, 429, and connection errors are transient; other 4xx responses are permanent (returned to script).
- Notification Service: SMTP with OAuth2 Client Credentials (Microsoft 365) or Basic Auth. BCC delivery, plain text.
- Inbound API: `POST /api/{methodName}`, `X-API-Key` header, flat JSON, extended type system (Object, List).
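
The error-classification rule can be expressed as a small pure function. This Python sketch is illustrative only: the function names are hypothetical, and it assumes `None` represents a connection-level failure where no HTTP response arrived. It also shows how `CachedCall()` would route each class: transient failures go to store-and-forward, permanent ones are returned to the script.

```python
from typing import Optional

# Hypothetical helper names; sketch only.
# Status codes outside the 5xx range that are still treated as transient.
TRANSIENT_4XX = {408, 429}

def classify_failure(status: Optional[int]) -> str:
    """Return "transient" or "permanent" for a failed outbound HTTP call.

    status is None when no HTTP response was received at all (for example,
    connection refused or reset), which counts as a connection error.
    """
    if status is None:
        return "transient"
    if 500 <= status <= 599 or status in TRANSIENT_4XX:
        return "transient"
    if 400 <= status <= 499:
        return "permanent"
    raise ValueError(f"HTTP {status} is not a failure status")

def cached_call_outcome(status: Optional[int]) -> str:
    """How CachedCall() would handle a failure: buffer it, or hand it back to the script."""
    if classify_failure(status) == "transient":
        return "buffer for retry"
    return "return error to script"
```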
### Templates & Deployment
- Pre-deployment validation includes semantic checks (call targets, argument types, trigger operand types).
- Composed member addressing uses path-qualified canonical names: `[ModuleInstanceName].[MemberName]`.
- Override granularity defined per entity type and per field.
- Template graph acyclicity enforced on save.
- Flattened configs include a revision hash for staleness detection.
- Deployment identity: unique deployment ID + revision hash for idempotency.
- Per-instance operation lock covers all mutating commands (deploy, disable, enable, delete).
- Site-side apply is all-or-nothing per instance.
- System-wide artifact version skew across sites is supported.
- Last-write-wins for concurrent template editing (no optimistic concurrency on templates).
- Optimistic concurrency on deployment status records.
- Naming collisions in composed feature modules are design-time errors.
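
One way to realize revision hashes and idempotent deployment is sketched below in Python. Everything here is an assumption for illustration: SHA-256 over canonical JSON (sorted keys) as the hash, the hypothetical `SiteInstanceSketch` class, and the choice to reject a reused deployment ID that arrives with a different hash. Canonicalizing before hashing makes the hash independent of key order, so identical flattened configs always produce the same revision hash.

```python
import hashlib
import json

def revision_hash(flattened_config: dict) -> str:
    """Hash a flattened config; sorted keys make the result independent of key order."""
    canonical = json.dumps(flattened_config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

def is_stale(deployed_revision: str, current_config: dict) -> bool:
    """Stale when the deployed hash no longer matches the current flattened config."""
    return deployed_revision != revision_hash(current_config)

class SiteInstanceSketch:
    """Hypothetical site-side record of applied deployments (in memory only)."""

    def __init__(self) -> None:
        self.applied: dict = {}  # deployment ID -> revision hash

    def apply(self, deployment_id: str, revision: str) -> str:
        previous = self.applied.get(deployment_id)
        if previous == revision:
            return "duplicate-ignored"  # redelivered command: safe no-op
        if previous is not None:
            return "rejected-conflict"  # same ID, different content: an assumed error case
        self.applied[deployment_id] = revision
        return "applied"
```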
### Store-and-Forward
- Fixed retry interval, no max buffer size. Only transient failures buffered.
- Async best-effort replication to standby (no ack wait).
- Messages not cleared on instance deletion.
- CachedCall idempotency is the caller's responsibility.
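
The retry policy above (fixed interval, unbounded buffer, transient failures only) can be sketched as follows. This Python version is a simplification under stated assumptions: the 30-second interval is a placeholder rather than the configured value, `send` is a hypothetical delivery callback returning `True` on success and `False` on a transient failure, and SQLite persistence, standby replication, and parking are omitted.

```python
from dataclasses import dataclass
from typing import Callable, List

RETRY_INTERVAL_S = 30.0  # placeholder; the real interval comes from configuration

@dataclass
class BufferedMessage:
    message_id: str
    payload: str
    next_attempt: float

class StoreAndForwardSketch:
    def __init__(self, send: Callable[[str], bool], interval: float = RETRY_INTERVAL_S) -> None:
        self.send = send                          # hypothetical: True = delivered, False = transient failure
        self.interval = interval
        self.buffer: List[BufferedMessage] = []   # no maximum size

    def enqueue(self, message_id: str, payload: str, now: float) -> None:
        """Called only after a transient failure; permanent failures never reach the buffer."""
        self.buffer.append(BufferedMessage(message_id, payload, now + self.interval))

    def retry_due(self, now: float) -> List[str]:
        """Attempt every message whose retry time has arrived; return the IDs delivered."""
        delivered = []
        for item in list(self.buffer):
            if item.next_attempt > now:
                continue
            if self.send(item.payload):
                self.buffer.remove(item)
                delivered.append(item.message_id)
            else:
                item.next_attempt = now + self.interval  # fixed interval, no backoff
        return delivered
```

Retries mean a message can be delivered more than once if a request succeeds but its response is lost, which is consistent with leaving CachedCall idempotency to the caller.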
### Security & Auth
- Authentication: direct LDAP bind (username/password), no Kerberos/NTLM. LDAPS/StartTLS required.
- JWT sessions: HMAC-SHA256 shared symmetric key, 15-minute expiry with sliding refresh, 30-minute idle timeout.
- LDAP failure: new logins fail; active sessions continue with current roles.
- Load balancer in front of central UI; JWT + shared Data Protection keys for failover transparency.
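
The session rules can be illustrated with a minimal HS256 token built from the Python standard library. This is a sketch, not the production design: the `last_activity` claim, the `refresh` helper, and the assumption that sliding refresh can renew `exp` without user activity (so the 30-minute idle timeout is enforced separately from the 15-minute expiry) are all hypothetical. A production system would use a maintained JWT library rather than hand-rolled encoding.

```python
import base64
import hashlib
import hmac
import json
from typing import Optional

TOKEN_LIFETIME_S = 15 * 60  # 15-minute expiry
IDLE_TIMEOUT_S = 30 * 60    # 30-minute idle timeout

def _b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")

def _b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))

def _sign(key: bytes, signing_input: str) -> bytes:
    # HMAC-SHA256 with the shared symmetric key (the JWT "HS256" algorithm).
    return hmac.new(key, signing_input.encode("ascii"), hashlib.sha256).digest()

def issue(key: bytes, user: str, now: int, last_activity: int) -> str:
    header = {"alg": "HS256", "typ": "JWT"}
    # "last_activity" is a hypothetical claim used only by this sketch.
    claims = {"sub": user, "exp": now + TOKEN_LIFETIME_S, "last_activity": last_activity}
    signing_input = _b64url(json.dumps(header).encode()) + "." + _b64url(json.dumps(claims).encode())
    return signing_input + "." + _b64url(_sign(key, signing_input))

def verify(key: bytes, token: str, now: int) -> Optional[dict]:
    """Return the claims if the signature, expiry, and idle window all check out."""
    signing_input, _, signature = token.rpartition(".")
    if not hmac.compare_digest(_sign(key, signing_input), _b64url_decode(signature)):
        return None
    claims = json.loads(_b64url_decode(signing_input.split(".")[1]))
    if now >= claims["exp"] or now - claims["last_activity"] > IDLE_TIMEOUT_S:
        return None
    return claims

def refresh(key: bytes, claims: dict, now: int, user_active: bool) -> str:
    """Sliding refresh: always extend exp, but only move last_activity on real user activity."""
    last = now if user_active else claims["last_activity"]
    return issue(key, claims["sub"], now, last)
```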
### Cluster & Failover
- Keep-oldest split-brain resolver with `down-if-alone = on`, 15s stable-after.
- Both nodes are seed nodes. `min-nr-of-members = 1`.
- Failure detection: 2s heartbeat, 10s threshold. Total failover ~25s.
- CoordinatedShutdown for graceful singleton handover.
- Automatic dual-node recovery from persistent storage.
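
One reading of the ~25s figure, assuming failover time is dominated by failure detection followed by the split-brain resolver's stable-after window (singleton handover and actor restart time are ignored):

```math
t_{\text{failover}} \approx t_{\text{detect}} + t_{\text{stable-after}} = 10\,\text{s} + 15\,\text{s} = 25\,\text{s}
```

With a 2 s heartbeat, the 10 s threshold spans roughly five heartbeat intervals before a node is treated as unreachable.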
### UI & Monitoring
- Central UI: Blazor Server (ASP.NET Core + SignalR) with Bootstrap CSS. No third-party component frameworks (no Blazorise, MudBlazor, Radzen, etc.). Build custom Blazor components for tables, grids, forms, etc.
- UI design: Clean, corporate, internal-use aesthetic. Not flashy. Use the `frontend-design` skill when designing UI pages/components.
- Real-time push for debug view, health dashboard, deployment status.
- Health reports: 30s interval, 60s offline threshold, monotonic sequence numbers, raw error counts per interval.
- Dead letter monitoring as a health metric.
- Site Event Logging: 30-day retention, 1 GB storage cap, daily purge, paginated queries with keyword search.
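
The central-side handling of health reports can be sketched as below. The Python class and field names are hypothetical; the sketch assumes each report carries a per-site sequence number, discards any report whose number is not greater than the last one accepted, and treats a site as offline once 60 s pass without an accepted report (two missed 30 s intervals). It does not cover what happens if a site's sequence counter restarts.

```python
from dataclasses import dataclass
from typing import Dict, List

OFFLINE_THRESHOLD_S = 60  # two missed 30 s report intervals

@dataclass(frozen=True)
class HealthReport:
    site_id: str
    sequence: int      # monotonic per site
    error_count: int   # raw errors during this interval, not a running total

class HealthTrackerSketch:
    """Hypothetical central tracker; not the Health Monitoring component's real API."""

    def __init__(self) -> None:
        self.last_sequence: Dict[str, int] = {}
        self.last_received: Dict[str, float] = {}
        self.error_counts: Dict[str, List[int]] = {}

    def receive(self, report: HealthReport, now: float) -> bool:
        """Accept a report only if its sequence number is newer than any seen for that site."""
        if report.sequence <= self.last_sequence.get(report.site_id, -1):
            return False  # duplicate or reordered report
        self.last_sequence[report.site_id] = report.sequence
        self.last_received[report.site_id] = now
        self.error_counts.setdefault(report.site_id, []).append(report.error_count)
        return True

    def is_online(self, site_id: str, now: float) -> bool:
        last = self.last_received.get(site_id)
        return last is not None and now - last <= OFFLINE_THRESHOLD_S
```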
### Code Organization
- Entity classes are persistence-ignorant POCOs in Commons; EF mappings in Configuration Database.
- Repository interfaces defined in Commons; implementations in Configuration Database.
- Commons namespace hierarchy: Types/, Interfaces/, Entities/, Messages/ with domain area subfolders.
- Message contracts follow additive-only evolution rules for version compatibility.
- Per-component configuration via `appsettings.json` sections bound to options classes (Options pattern).
- Options classes owned by component projects, not Commons.
- Host readiness gating: `/health/ready` endpoint, no traffic until operational.
- EF Core migrations: auto-apply in dev, manual SQL scripts for production.
- Audit logging absorbed into Configuration Database component (IAuditService).

### Akka.NET Conventions
- Tell for hot-path internal communication; Ask reserved for system boundaries.
- Script trust model: forbidden APIs (System.IO, Process, Threading, Reflection, raw network).
- Application-level correlation IDs on all request/response messages.

## Tool Usage

- When consulting with the Codex MCP tool, use model `gpt-5.4`.