Update CLAUDE.md and README.md with all design decisions from refinement
CLAUDE.md: reorganize Key Design Decisions into categorized sections covering architecture, data, integrations, templates, S&F, security, cluster, UI, monitoring, code organization, and Akka.NET conventions. Add docs/plans and AkkaDotNet to project structure.

README.md: add technology stack table and scale summary. Update all 17 component descriptions to reflect refined designs. Update architecture diagram with load balancer, 2-node annotations, protocol connections, and component details. Add links to AkkaDotNet reference docs and design plans.
88
CLAUDE.md
@@ -7,6 +7,8 @@ This project contains design documentation for a distributed SCADA system built
- `README.md` — Master index with component table and architecture diagrams.
- `HighLevelReqs.md` — Complete high-level requirements covering all functional areas.
- `Component-*.md` — Individual component design documents (one per component).
- `docs/plans/` — Design decision documents from refinement sessions.
- `AkkaDotNet/` — Akka.NET reference documentation and best practices notes.

There is no source code in this project — only design documentation in markdown.
@@ -54,29 +56,87 @@ There is no source code in this project — only design documentation in markdow
## Key Design Decisions (for context across sessions)

### Architecture & Runtime
- Instance modeled as Akka actor (Instance Actor) — single source of truth for runtime state.
- Site Runtime actor hierarchy: Deployment Manager singleton → Instance Actors → Script Actors + Alarm Actors.
- Script Actors spawn short-lived Script Execution Actors for concurrent execution.
- Script Actors spawn short-lived Script Execution Actors on a dedicated blocking I/O dispatcher.
- Alarm Actors are separate peer subsystem from scripts (not inside Script Engine).
- Shared scripts execute inline as compiled code (no separate actors).
- Site-wide Akka stream for attribute value and alarm state changes (debug view subscribes to this).
- Script Engine and Alarm Actors subscribe directly to Instance Actors (not via stream).
- Site-wide Akka stream for attribute value and alarm state changes with per-subscriber buffering.
- Instance Actors serialize all state mutations (Akka actor model); concurrent scripts produce interleaved side effects.
- Staggered Instance Actor startup on failover to prevent reconnection storms.
- Explicit supervision strategies: Resume for coordinator actors, Stop for short-lived execution actors.

### Data & Communication
- Data Connection Layer is a clean data pipe — publishes to Instance Actors only.
- Pre-deployment validation is comprehensive (flattening, script compilation, trigger refs, binding completeness).
- Data source references are relative paths; connection binding is per-attribute at instance level.
- System-wide artifact deployment requires explicit Deployment role action.
- Store-and-forward: fixed retry interval, no max buffer size.
- Instance lifecycle: enabled/disabled states, deletion supported.
- Template deletion blocked if instances or child templates reference it.
- DCL connection actor uses Become/Stash pattern for lifecycle state machine.
- DCL auto-reconnect at fixed interval; immediate bad quality on disconnect; transparent re-subscribe.
- DCL write failures returned synchronously to calling script.
- Tag path resolution retried periodically for devices still booting.
- Static attribute writes persisted to local SQLite (survive restart/failover, reset on redeployment).
- All timestamps are UTC throughout the system.

### External Integrations
- External System Gateway: HTTP/REST only, JSON serialization, API key + Basic Auth.
- Dual call modes: `ExternalSystem.Call()` (synchronous) and `ExternalSystem.CachedCall()` (store-and-forward on transient failure).
- Error classification: HTTP 5xx/408/429/connection errors = transient; other 4xx = permanent (returned to script).
- Notification Service: SMTP with OAuth2 Client Credentials (Microsoft 365) or Basic Auth. BCC delivery, plain text.
- Inbound API: `POST /api/{methodName}`, `X-API-Key` header, flat JSON, extended type system (Object, List).
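
The error classification rule in the External Integrations list can be illustrated with a small sketch. This is not project code (the project targets Akka.NET and contains no source); `Outcome` and `classify_status` are hypothetical names, a connection error is assumed to be represented by a status of `None`, and treating every status below 400 as success is a simplification.

```python
from enum import Enum
from typing import Optional


class Outcome(Enum):
    SUCCESS = "success"
    TRANSIENT = "transient"   # candidate for store-and-forward via CachedCall
    PERMANENT = "permanent"   # returned to the calling script


def classify_status(status: Optional[int]) -> Outcome:
    """Classify an HTTP result; None stands for a connection error (assumption)."""
    if status is None:
        return Outcome.TRANSIENT
    # 5xx plus 408 (Request Timeout) and 429 (Too Many Requests) are retryable.
    if status >= 500 or status in (408, 429):
        return Outcome.TRANSIENT
    # Every other 4xx is treated as a permanent failure.
    if 400 <= status < 500:
        return Outcome.PERMANENT
    return Outcome.SUCCESS
```

Under this sketch, a `CachedCall()` would buffer only results classified as `TRANSIENT`, while `PERMANENT` results go straight back to the script.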

### Templates & Deployment
- Pre-deployment validation includes semantic checks (call targets, argument types, trigger operand types).
- Composed member addressing uses path-qualified canonical names: `[ModuleInstanceName].[MemberName]`.
- Override granularity defined per entity type and per field.
- Template graph acyclicity enforced on save.
- Flattened configs include a revision hash for staleness detection.
- Deployment identity: unique deployment ID + revision hash for idempotency.
- Per-instance operation lock covers all mutating commands (deploy, disable, enable, delete).
- Site-side apply is all-or-nothing per instance.
- System-wide artifact version skew across sites is supported.
- Last-write-wins for concurrent template editing (no optimistic concurrency on templates).
- Optimistic concurrency on deployment status records.
- Naming collisions in composed feature modules are design-time errors.
- Last-write-wins for concurrent template editing.
- Scripts compiled at site; pre-validated (test compiled) at central.
- Max recursion depth for script-to-script calls.
- Alarm on-trigger scripts can call instance scripts; instance scripts cannot call alarm scripts.
- Audit logging absorbed into Configuration Database component (IAuditService).
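
The deployment identity and revision hash bullets in the Templates & Deployment list could work together roughly as sketched below. Everything here is an assumption made for illustration: the hash algorithm (SHA-256 over canonical JSON), the class `InstanceDeploymentState`, and the result strings are hypothetical and not taken from the design documents.

```python
import hashlib
import json


def revision_hash(flattened_config: dict) -> str:
    """Hash a flattened config; sorted keys make the hash order-independent."""
    canonical = json.dumps(flattened_config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class InstanceDeploymentState:
    """Site-side record of what has been applied to one instance."""

    def __init__(self) -> None:
        self.applied_ids: set = set()
        self.current_hash = None

    def apply(self, deployment_id: str, config: dict) -> str:
        # A redelivered deployment ID is a no-op, which makes retries safe.
        if deployment_id in self.applied_ids:
            return "duplicate"
        self.applied_ids.add(deployment_id)
        new_hash = revision_hash(config)
        if new_hash == self.current_hash:
            return "unchanged"
        self.current_hash = new_hash
        return "applied"

    def is_stale(self, central_config: dict) -> bool:
        """True when central's flattened config differs from what is deployed."""
        return revision_hash(central_config) != self.current_hash
```

In this sketch the deployment ID guards against repeated delivery of the same command, while the revision hash answers the separate question of whether the deployed configuration still matches central.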

### Store-and-Forward
- Fixed retry interval, no max buffer size. Only transient failures buffered.
- Async best-effort replication to standby (no ack wait).
- Messages not cleared on instance deletion.
- CachedCall idempotency is the caller's responsibility.

### Security & Auth
- Authentication: direct LDAP bind (username/password), no Kerberos/NTLM. LDAPS/StartTLS required.
- JWT sessions: HMAC-SHA256 shared symmetric key, 15-minute expiry with sliding refresh, 30-minute idle timeout.
- LDAP failure: new logins fail; active sessions continue with current roles.
- Load balancer in front of central UI; JWT + shared Data Protection keys for failover transparency.

### Cluster & Failover
- Keep-oldest split-brain resolver with `down-if-alone = on`, 15s stable-after.
- Both nodes are seed nodes. `min-nr-of-members = 1`.
- Failure detection: 2s heartbeat, 10s threshold. Total failover ~25s.
- CoordinatedShutdown for graceful singleton handover.
- Automatic dual-node recovery from persistent storage.
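
One plausible reading of the ~25s failover figure in the Cluster & Failover list is the failure detection threshold plus the split-brain resolver's stable-after delay. That breakdown is an assumption, not something the list states, and the function name below is hypothetical. Singleton startup time on the surviving node would come on top.

```python
HEARTBEAT_INTERVAL_S = 2
FAILURE_DETECTION_THRESHOLD_S = 10
STABLE_AFTER_S = 15


def estimated_failover_seconds(
    detection_s: int = FAILURE_DETECTION_THRESHOLD_S,
    stable_after_s: int = STABLE_AFTER_S,
) -> int:
    """Assumed model: detect the failure, then wait for membership to stabilize."""
    return detection_s + stable_after_s


# Number of heartbeats that must be missed before the threshold is reached.
missed_heartbeats = FAILURE_DETECTION_THRESHOLD_S // HEARTBEAT_INTERVAL_S
```

With the values from the list, `estimated_failover_seconds()` returns 25 and `missed_heartbeats` is 5.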

### UI & Monitoring
- Central UI: Blazor Server (ASP.NET Core + SignalR). Real-time push for debug view, health dashboard, deployment status.
- Health reports: 30s interval, 60s offline threshold, monotonic sequence numbers, raw error counts per interval.
- Dead letter monitoring as a health metric.
- Site Event Logging: 30-day retention, 1GB storage cap, daily purge, paginated queries with keyword search.
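
The health report parameters in the UI & Monitoring list (60s offline threshold, monotonic sequence numbers) suggest central-side bookkeeping along the following lines. This is a minimal sketch under stated assumptions: `SiteHealth` is a hypothetical name, timestamps are plain seconds, a site with no report yet counts as offline, and a sequence reset after a site restart is not handled.

```python
from dataclasses import dataclass
from typing import Optional

OFFLINE_THRESHOLD_S = 60.0  # two missed 30s report intervals


@dataclass
class SiteHealth:
    last_sequence: int = -1
    last_seen: Optional[float] = None

    def accept(self, sequence: int, now: float) -> bool:
        """Record a report; reject duplicates and out-of-order deliveries."""
        if sequence <= self.last_sequence:
            return False
        self.last_sequence = sequence
        self.last_seen = now
        return True

    def is_offline(self, now: float) -> bool:
        """A site is offline if no report arrived within the threshold."""
        if self.last_seen is None:
            return True
        return now - self.last_seen > OFFLINE_THRESHOLD_S
```

The monotonic sequence check means a delayed report cannot overwrite a newer one, and it does not refresh `last_seen` either.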

### Code Organization
- Entity classes are persistence-ignorant POCOs in Commons; EF mappings in Configuration Database.
- Repository interfaces defined in Commons; implementations in Configuration Database.
- Commons namespace hierarchy: Types/, Interfaces/, Entities/, Messages/ with domain area subfolders.
- Message contracts follow additive-only evolution rules for version compatibility.
- Per-component configuration via `appsettings.json` sections bound to options classes (Options pattern).
- Options classes owned by component projects, not Commons.
- Host readiness gating: `/health/ready` endpoint, no traffic until operational.
- EF Core migrations: auto-apply in dev, manual SQL scripts for production.
- Audit logging absorbed into Configuration Database component (IAuditService).

### Akka.NET Conventions
- Tell for hot-path internal communication; Ask reserved for system boundaries.
- Script trust model: forbidden APIs (System.IO, Process, Threading, Reflection, raw network).
- Application-level correlation IDs on all request/response messages.

## Tool Usage