commit 1944f94fedf5bae376dbdcd103763c5d1debf654
Author: Joseph Doherty
Date: Mon Mar 16 07:39:26 2026 -0400

    Initial design docs from claude.ai refinement sessions

diff --git a/.DS_Store b/.DS_Store
new file mode 100644
index 0000000..b1f8195
Binary files /dev/null and b/.DS_Store differ
diff --git a/CLAUDE.md b/CLAUDE.md
new file mode 100644
index 0000000..e88974e
--- /dev/null
+++ b/CLAUDE.md
@@ -0,0 +1,79 @@
+# ScadaLink Design Documentation Project
+
+This project contains design documentation for a distributed SCADA system built on Akka.NET. The documents describe a hub-and-spoke architecture with a central cluster and multiple site clusters.
+
+## Project Structure
+
+- `README.md` — Master index with component table and architecture diagrams.
+- `HighLevelReqs.md` — Complete high-level requirements covering all functional areas.
+- `Component-*.md` — Individual component design documents (one per component).
+
+There is no source code in this project — only design documentation in markdown.
+
+## Document Conventions
+
+- All documents are markdown files in the project root directory.
+- Component documents are named `Component-.md` (PascalCase, hyphen-separated).
+- Each component document follows a consistent structure: Purpose, Location, Responsibilities, detailed design sections, Dependencies, and Interactions.
+- The README.md component table must stay in sync with actual component documents. When a component is added, removed, or renamed, update the table.
+- Cross-component references in Dependencies and Interactions sections must be kept accurate across all documents. When a component's role changes, update references in all affected documents.
+
+## Refinement Process
+
+- When refining requirements, use the Socratic method — ask clarifying questions before making changes.
+- Probe for implications of decisions across components before updating documents.
+- When a decision is made, identify all affected documents and update them together for consistency.
+- After updates, verify no stale cross-references remain (e.g., references to removed or renamed components).
+
+## Editing Rules
+
+- Edit documents in place. Do not create copies or backup files.
+- When a change affects multiple documents, update all affected documents in the same session.
+- Use `git diff` to review changes before committing.
+- Commit related changes together with a descriptive message summarizing the design decision.
+
+## Current Component List (17 components)
+
+1. Template Engine — Template modeling, inheritance, composition, validation, flattening, diffs.
+2. Deployment Manager — Central-side deployment pipeline, system-wide artifact deployment, instance lifecycle.
+3. Site Runtime — Site-side actor hierarchy (Deployment Manager singleton, Instance/Script/Alarm Actors), script compilation, Akka stream.
+4. Data Connection Layer — Protocol abstraction (OPC UA, custom), subscription management, clean data pipe.
+5. Central–Site Communication — Akka.NET remoting, message patterns, debug streaming.
+6. Store-and-Forward Engine — Buffering, fixed-interval retry, parking, SQLite persistence, replication.
+7. External System Gateway — External system definitions, API method invocation, database connections.
+8. Notification Service — Notification lists, email delivery, store-and-forward integration.
+9. Central UI — Web-based management interface, all workflows.
+10. Security & Auth — LDAP/AD authentication, role-based authorization, site-scoped permissions.
+11. Health Monitoring — Site health metrics collection and central reporting.
+12. Site Event Logging — Local operational event logs at sites with central query access.
+13. Cluster Infrastructure — Akka.NET cluster setup, active/standby failover, singleton support.
+14. Inbound API — Web API for external systems, API key auth, script-based implementations.
+15. Host — Single deployable binary, role-based component registration, Akka.NET bootstrap.
+16. Commons — Shared types, POCO entity classes, repository interfaces, message contracts.
+17. Configuration Database — EF Core data access layer, repositories, unit-of-work, audit logging (IAuditService), migrations.
+
+## Key Design Decisions (for context across sessions)
+
+- Instance modeled as Akka actor (Instance Actor) — single source of truth for runtime state.
+- Site Runtime actor hierarchy: Deployment Manager singleton → Instance Actors → Script Actors + Alarm Actors.
+- Script Actors spawn short-lived Script Execution Actors for concurrent execution.
+- Alarm Actors are separate peer subsystem from scripts (not inside Script Engine).
+- Shared scripts execute inline as compiled code (no separate actors).
+- Site-wide Akka stream for attribute value and alarm state changes (debug view subscribes to this).
+- Script Engine and Alarm Actors subscribe directly to Instance Actors (not via stream).
+- Data Connection Layer is a clean data pipe — publishes to Instance Actors only.
+- Pre-deployment validation is comprehensive (flattening, script compilation, trigger refs, binding completeness).
+- Data source references are relative paths; connection binding is per-attribute at instance level.
+- System-wide artifact deployment requires explicit Deployment role action.
+- Store-and-forward: fixed retry interval, no max buffer size.
+- Instance lifecycle: enabled/disabled states, deletion supported.
+- Template deletion blocked if instances or child templates reference it.
+- Naming collisions in composed feature modules are design-time errors.
+- Last-write-wins for concurrent template editing.
+- Scripts compiled at site; pre-validated (test compiled) at central.
+- Max recursion depth for script-to-script calls.
+- Alarm on-trigger scripts can call instance scripts; instance scripts cannot call alarm scripts.
+- Audit logging absorbed into Configuration Database component (IAuditService).
+- Entity classes are persistence-ignorant POCOs in Commons; EF mappings in Configuration Database.
+- Repository interfaces defined in Commons; implementations in Configuration Database.
+- EF Core migrations: auto-apply in dev, manual SQL scripts for production.
diff --git a/Component-AuditLogging.md b/Component-AuditLogging.md
new file mode 100644
index 0000000..b0e903a
--- /dev/null
+++ b/Component-AuditLogging.md
@@ -0,0 +1,143 @@
+# Component: Audit Logging
+
+## Purpose
+
+The Audit Logging component records all configuration and administrative changes in the system with the resulting state after each change, providing a queryable trail of who changed what and when.
+
+## Location
+
+Central cluster. All auditable actions occur through central (sites receive configurations but do not originate changes).
+
+## Responsibilities
+
+- Provide a service interface (`IAuditService`) for components to log changes after successful operations.
+- Serialize entity state as JSON and store audit entries in the configuration MS SQL database.
+- Write audit entries synchronously within the same database transaction as the change (via the unit-of-work).
+- Provide query capabilities for the Central UI audit log viewer.
+
+---
+
+## Integration Pattern
+
+Audit logging uses a **direct service call** pattern. Components call `IAuditService` after a successful operation:
+
+```
+IAuditService.LogAsync(user, action, entityType, entityId, entityName, afterState)
+```
+
+- **`user`**: The authenticated AD user who performed the action (provided by Security & Auth).
+- **`action`**: The type of operation (create, update, delete, deploy, disable, enable).
+- **`entityType`**: What was changed (template, instance, alarm, shared script, etc.).
+- **`entityId`**: Unique identifier of the specific entity.
+- **`entityName`**: Human-readable name of the entity.
+- **`afterState`**: The entity's state after the change, serialized as JSON. Null for deletes.
+
+The `IAuditService` interface is defined in **Commons** (alongside the other shared interfaces). The implementation is provided by the Audit Logging component and registered in the DI container.
+
+### Transactional Guarantee
+
+Audit entries are written **synchronously** within the same database transaction as the change. Since all central components use the unit-of-work pattern (EF Core's DbContext), the audit entry is added to the same DbContext and committed in the same `SaveChangesAsync()` call. This guarantees:
+
+- If the change succeeds, the audit entry is always recorded.
+- If the change fails and rolls back, the audit entry is also rolled back.
+- No audit entries are lost due to process crashes between the change and the audit write.
+
+### Integration Example
+
+```
+Template Engine: Update Template
+    │
+    ├── repository.UpdateTemplate(template)
+    ├── auditService.LogAsync(user, "update", "Template", template.Id,
+    │                         template.Name, serialize(template))
+    └── repository.SaveChangesAsync() ← both the change and audit entry commit together
+```
+
+---
+
+## Audited Actions
+
+| Category | Actions |
+|----------|---------|
+| Templates | Create, edit, delete templates |
+| Scripts | Create, edit, delete template scripts and shared scripts |
+| Alarms | Create, edit, delete alarm definitions |
+| Instances | Create, override values, bind connections, area assignment, disable, enable, delete |
+| Deployments | Deploy to instance (who, what, which instance, success/failure) |
+| System-Wide Artifact Deployments | Deploy shared scripts / external system definitions / DB connections / notification lists to sites (who, what, result) |
+| External Systems | Create, edit, delete definitions |
+| Database Connections | Create, edit, delete definitions |
+| Notification Lists | Create, edit, delete lists and recipients |
+| Inbound API | API key create, enable/disable, delete. API method create, edit, delete |
+| Areas | Create, edit, delete area definitions |
+| Sites & Data Connections | Create, edit, delete sites. Define and assign data connections to sites |
+| Security/Admin | Role mapping changes, site permission changes |
+
+---
+
+## Audit Entry Schema
+
+Each audit log entry contains:
+
+| Field | Type | Description |
+|-------|------|-------------|
+| **Id** | Long / GUID | Unique identifier for the audit entry. |
+| **Timestamp** | DateTimeOffset | When the action occurred (UTC). |
+| **User** | String | Authenticated AD username who performed the action. |
+| **Action** | String (enum) | The type of operation: `Create`, `Update`, `Delete`, `Deploy`, `Disable`, `Enable`. |
+| **EntityType** | String | What was changed: `Template`, `Instance`, `SharedScript`, `Alarm`, `ExternalSystem`, `DatabaseConnection`, `NotificationList`, `ApiKey`, `ApiMethod`, `Area`, `Site`, `DataConnection`, `LdapGroupMapping`. |
+| **EntityId** | String | Unique identifier of the specific entity. |
+| **EntityName** | String | Human-readable name of the entity (for display without needing to deserialize state). |
+| **State** | JSON (nvarchar(max)) | The entity's state after the change, serialized as JSON. Null for deletes. |
+
+### State Serialization
+
+- Entity state is serialized as **JSON** using the standard .NET JSON serializer.
+- JSON is stored in an `nvarchar(max)` column in SQL Server.
+- The UI can render the JSON state for inspection. SQL Server's built-in JSON functions (`JSON_VALUE`, `OPENJSON`) can be used for ad-hoc queries against the state if needed.
+- For delete operations, the state is **null** (the entity no longer exists). The previous state can be found by querying the most recent prior audit entry for the same entity.
+
+### Granularity
+
+- **One audit entry per save operation**. When a user edits a template and changes multiple attributes in a single save, one audit entry is created containing the full template state after the save.
+- This captures the complete picture of what the entity looked like after each change without requiring per-field tracking.
+
+### Reconstructing Change History
+
+Since only the "after" state is stored, a full change history for an entity can be reconstructed by querying all audit entries for that entity ordered by timestamp. Comparing consecutive entries (entry N-1's state vs. entry N's state) reveals what changed at each step. This is a query-time concern handled by the Central UI, not a write-time concern.
+
+---
+
+## Storage
+
+- Stored in the **configuration MS SQL database** in a dedicated audit table, accessed via the `IAuditLoggingRepository`.
+- Entries are **append-only** — audit records are never modified or deleted.
+- No retention policy — audit logs are retained **indefinitely**.
+- Indexes on: `Timestamp`, `User`, `EntityType`, `EntityId`, `Action` for efficient filtering.
+
+---
+
+## Query Capabilities
+
+The Central UI audit log viewer can filter by:
+- **User**: Who made the change.
+- **Entity type**: What kind of entity was changed.
+- **Action type**: What kind of operation was performed.
+- **Time range**: When the change occurred.
+- **Specific entity ID/name**: Changes to a particular entity.
+
+Results are returned in reverse chronological order (most recent first) with pagination support.
+
+---
+
+## Dependencies
+
+- **Configuration Database**: Storage for audit entries via `IAuditLoggingRepository`. Audit entries participate in the same unit-of-work transaction as the changes they record.
+- **Security & Auth**: Provides the authenticated user identity for each entry.
+- **Commons**: Defines the `IAuditService` interface and `AuditLogEntry` entity class.
+
+## Interactions
+
+- **All central components that modify state**: Call `IAuditService.LogAsync()` after successful operations. This includes Template Engine, Deployment Manager, Security & Auth, Inbound API, External System Gateway, and Notification Service.
+- **Central UI**: Provides the audit log viewer for querying and displaying entries.
+- **Configuration Database**: Audit entries are committed in the same transaction as the changes they record.
diff --git a/Component-CentralUI.md b/Component-CentralUI.md
new file mode 100644
index 0000000..3f4b658
--- /dev/null
+++ b/Component-CentralUI.md
@@ -0,0 +1,116 @@
+# Component: Central UI
+
+## Purpose
+
+The Central UI is a web-based management interface hosted on the central cluster. It provides all configuration, deployment, monitoring, and troubleshooting workflows for the SCADA system. There is no live machine data visualization — the UI is focused on system management, with the exception of on-demand debug views.
+
+## Location
+
+Central cluster only. Sites have no user interface.
+
+## Responsibilities
+
+- Provide authenticated access to all management workflows.
+- Enforce role-based access control in the UI (Admin, Design, Deployment with site scoping).
+- Present data from the configuration database, and from site clusters via remote queries.
+
+## Workflows / Pages
+
+### Template Authoring (Design Role)
+- Create, edit, and delete templates.
+- **Template deletion** is blocked if any instances or child templates reference the template. The UI displays the references preventing deletion.
+- Manage template hierarchy (inheritance) — visual tree of parent/child relationships.
+- Manage composition — add/remove feature module instances within templates. **Naming collision detection** provides immediate feedback if composed modules introduce duplicate attribute, alarm, or script names.
+- Define and edit attributes, alarms, and scripts on templates.
+- Set lock flags on attributes, alarms, and scripts.
+- Visual indicator showing inherited vs. locally defined vs. overridden members.
+- **On-demand validation**: A "Validate" action allows Design users to run comprehensive pre-deployment validation (flattening, naming collisions, script compilation, trigger references) without triggering a deployment. Provides early feedback during authoring.
+- **Last-write-wins** editing — no pessimistic locks or conflict detection on templates.
+
+### Shared Script Management (Design Role)
+- Create, edit, and delete shared (global) scripts.
+- Shared scripts are not associated with any template.
+- On-demand validation (compilation check) available.
+
+### External System Management (Design Role)
+- Define external system contracts: connection details, API method definitions (parameters, return types).
+- Define retry settings per external system (max retry count, fixed time between retries).
+
+### Database Connection Management (Design Role)
+- Define named database connections: server, database, credentials.
+- Define retry settings per connection (max retry count, fixed time between retries).
+
+### Notification List Management (Design Role)
+- Create, edit, and delete notification lists.
+- Manage recipients (name + email) within each list.
+- Configure SMTP settings.
+
+### Site & Data Connection Management (Admin Role)
+- Create, edit, and delete site definitions.
+- Define data connections and assign them to sites (name, protocol type, connection details).
+
+### Area Management (Admin Role)
+- Define hierarchical area structures per site.
+- Parent-child area relationships.
+- Assign areas when managing instances.
+
+### Instance Management (Deployment Role)
+- Create instances from templates at a specific site.
+- Assign instances to areas.
+- Bind data connections — **per-attribute binding** where each attribute with a data source reference individually selects its data connection from the site's available connections. **Bulk assignment** supported: select multiple attributes and assign a data connection to all of them at once.
+- Set instance-level attribute overrides (non-locked attributes only).
+- Filter/search instances by site, area, template, or status.
+- **Disable** instances — stops data collection, script triggers, and alarm evaluation at the site while retaining the deployed configuration.
+- **Enable** instances — re-activates a disabled instance.
+- **Delete** instances — removes the running configuration from the site. Blocked if the site is unreachable. Store-and-forward messages are not cleared.
+
+### Deployment (Deployment Role)
+- View list of instances with staleness indicators (deployed config differs from template-derived config).
+- Filter by site, area, template.
+- View diff between deployed and current template-derived configuration.
+- Deploy updated configuration to individual instances. **Pre-deployment validation** runs automatically before any deployment is sent — validation errors are displayed and block deployment.
+- Track deployment status (pending, in-progress, success, failed).
+
+### System-Wide Artifact Deployment (Deployment Role)
+- Explicitly deploy shared scripts, external system definitions, database connection definitions, and notification lists to all sites.
+- This is a **separate action** from instance deployment — system-wide artifacts are not automatically pushed when definitions change.
+- Track per-site deployment status.
+
+### Debug View (Deployment Role)
+- Select a deployed instance and open a live debug view.
+- Real-time streaming of all attribute values (with quality and timestamp) and alarm states for that instance.
+- Initial snapshot of current state followed by streaming updates via the site-wide Akka stream.
+- Stream includes attribute values formatted as `[InstanceUniqueName].[AttributePath].[AttributeName]` and alarm states formatted as `[InstanceUniqueName].[AlarmName]`.
+- Subscribe-on-demand — stream starts when opened, stops when closed.
+
+### Parked Message Management (Deployment Role)
+- Query sites for parked messages (external system calls, notifications, cached DB writes).
+- View message details (target, payload, retry count, timestamps).
+- Retry or discard individual parked messages.
+
+### Health Monitoring Dashboard (All Roles)
+- Overview of all sites with online/offline status.
+- Per-site detail: active/standby node status, data connection health, script error rates, alarm evaluation error rates, store-and-forward buffer depths.
+
+### Site Event Log Viewer (Deployment Role)
+- Query site event logs remotely.
+- Filter by event type, time range, instance.
+- View script executions, alarm events (activations, clears, evaluation errors), deployment events (including script compilation results), connection status changes, store-and-forward activity, instance lifecycle events (enable, disable, delete).
+
+### Audit Log Viewer (Admin Role)
+- Query the central audit log.
+- Filter by user, entity type, action type, time range.
+- View before/after state for each change.
+
+### LDAP Group Mapping (Admin Role)
+- Map LDAP groups to system roles (Admin, Design, Deployment).
+- Configure site-scoping for Deployment role groups.
+
+## Dependencies
+
+- **Template Engine**: Provides template and instance data models, flattening, diff calculation, and validation.
+- **Deployment Manager**: Triggers deployments, system-wide artifact deployments, and instance lifecycle commands. Provides deployment status.
+- **Communication Layer**: Routes debug view subscriptions, remote queries to sites.
+- **Security & Auth**: Authenticates users and enforces role-based access.
+- **Configuration Database**: All central data, including audit log data for the audit log viewer. Accessed via `ICentralUiRepository`.
+- **Health Monitoring**: Provides site health data for the dashboard.
diff --git a/Component-ClusterInfrastructure.md b/Component-ClusterInfrastructure.md
new file mode 100644
index 0000000..13e7ac3
--- /dev/null
+++ b/Component-ClusterInfrastructure.md
@@ -0,0 +1,89 @@
+# Component: Cluster Infrastructure
+
+## Purpose
+
+The Cluster Infrastructure component manages the Akka.NET cluster setup, active/standby node roles, failover detection, and the foundational runtime environment on which all other components run. It provides the base layer for both central and site clusters.
+
+## Location
+
+Both central and site clusters.
+
+## Responsibilities
+
+- Bootstrap the Akka.NET actor system on each node.
+- Form a two-node cluster (active/standby) using Akka.NET Cluster.
+- Manage leader election and role assignment (active vs. standby).
+- Detect node failures and trigger failover.
+- Provide the Akka.NET remoting infrastructure for inter-cluster communication.
+- Support cluster singleton hosting (used by the Site Runtime Deployment Manager singleton on site clusters).
+- Manage Windows service lifecycle (start, stop, restart) on each node.
+
+## Cluster Topology
+
+### Central Cluster
+- Two nodes forming an Akka.NET cluster.
+- One active node runs all central components (Template Engine, Deployment Manager, Central UI, etc.).
+- One standby node is ready to take over on failover.
+- Connected to MS SQL databases (Config DB, Machine Data DB).
+
+### Site Cluster (per site)
+- Two nodes forming an Akka.NET cluster.
+- One active node runs all site components (Site Runtime, Data Connection Layer, Store-and-Forward Engine, etc.).
+- The Site Runtime Deployment Manager runs as an **Akka.NET cluster singleton** on the active node, owning the full Instance Actor hierarchy.
+- One standby node receives replicated store-and-forward data and is ready to take over.
+- Connected to local SQLite databases (store-and-forward buffer, event logs, deployed configurations).
+- Connected to machines via data connections (OPC UA, custom protocol).
+
+## Failover Behavior
+
+### Detection
+- Akka.NET Cluster monitors node health via heartbeat.
+- If the active node becomes unreachable, the standby node detects the failure and promotes itself to active.
+
+### Central Failover
+- The new active node takes over all central responsibilities.
+- In-progress deployments are treated as **failed** — engineers must retry.
+- The UI session may be interrupted — users reconnect to the new active node.
+- No message buffering at central — no state to recover beyond what's in MS SQL.
+
+### Site Failover
+- The new active node takes over:
+  - The Deployment Manager singleton restarts and re-creates the full Instance Actor hierarchy by reading deployed configurations from local SQLite. Each Instance Actor spawns its child Script and Alarm Actors.
+  - Data collection (Data Connection Layer re-establishes subscriptions as Instance Actors register their data source references).
+  - Store-and-forward delivery (buffer is already replicated locally).
+- Active debug view streams from central are interrupted — the engineer must re-open them.
+- Health reporting resumes from the new active node.
+- Alarm states are re-evaluated from incoming values (alarm state is in-memory only).
+
+## Node Configuration
+
+Each node is configured with:
+- **Cluster seed nodes**: Addresses of both nodes in the cluster.
+- **Cluster role**: Central or Site (plus site identifier for site clusters).
+- **Akka.NET remoting**: Hostname/port for inter-node and inter-cluster communication.
+- **Local storage paths**: SQLite database locations (site nodes only).
+
+## Windows Service
+
+- Each node runs as a **Windows service** for automatic startup and recovery.
+- Service configuration includes Akka.NET cluster settings and component-specific configuration.
+
+## Platform
+
+- **OS**: Windows Server.
+- **Runtime**: .NET (Akka.NET).
+- **Cluster**: Akka.NET Cluster (application-level, not Windows Server Failover Clustering).
+
+## Dependencies
+
+- **Akka.NET**: Core actor system, cluster, remoting, and cluster singleton libraries.
+- **Windows**: Service hosting, networking.
+- **MS SQL** (central only): Database connectivity.
+- **SQLite** (sites only): Local storage.
+
+## Interactions
+
+- **All components**: Every component runs within the Akka.NET actor system managed by this infrastructure.
+- **Site Runtime**: The Deployment Manager singleton relies on Akka.NET cluster singleton support provided by this infrastructure.
+- **Communication Layer**: Built on top of the Akka.NET remoting provided here.
+- **Health Monitoring**: Reports node status (active/standby) as a health metric.
diff --git a/Component-Commons.md b/Component-Commons.md
new file mode 100644
index 0000000..270eaaf
--- /dev/null
+++ b/Component-Commons.md
@@ -0,0 +1,142 @@
+# Component: Commons
+
+## Purpose
+
+The Commons component provides the shared foundation of data types, interfaces, enums, message contracts, data transfer objects, and persistence-ignorant domain entity classes used across all other ScadaLink components. It ensures consistent type definitions for cross-component communication, data access, and eliminates duplication of common abstractions.
+
+## Location
+
+Referenced by all component libraries and the Host.
+
+## Responsibilities
+
+- Define shared data types (enums, value objects, result types) used across multiple components.
+- Define **persistence-ignorant domain entity classes** (POCOs) representing all configuration database entities. These classes have no dependency on Entity Framework or any persistence framework — EF mapping is handled entirely by the Configuration Database component via Fluent API.
+- Define **per-component repository interfaces** that consuming components use for data access. Repository implementations are owned by the Configuration Database component.
+- Define protocol abstraction interfaces for the Data Connection Layer.
+- Define cross-component message contracts and DTOs for deployment, health, communication, instance lifecycle, and other inter-component data flows.
+- Contain **no business logic** — only data structures, interfaces, and enums.
+- Maintain **minimal dependencies** — only core .NET libraries; no Akka.NET, no ASP.NET, no Entity Framework.
+
+---
+
+## Requirements
+
+### REQ-COM-1: Shared Data Type System
+
+Commons must define shared primitive and utility types used across multiple components, including but not limited to:
+
+- **`DataType` enum**: Enumerates the data types supported by the system (e.g., Boolean, Int32, Float, Double, String, DateTime, Binary).
+- **`RetryPolicy`**: A record or immutable class describing retry behavior (max retries, fixed delay between retries).
+- **`Result`**: A discriminated result type that represents either a success value or an error, enabling consistent error handling across component boundaries without exceptions.
+- **`InstanceState` enum**: Enabled, Disabled.
+- **`DeploymentStatus` enum**: Pending, InProgress, Success, Failed.
+- **`AlarmState` enum**: Active, Normal.
+- **`AlarmTriggerType` enum**: ValueMatch, RangeViolation, RateOfChange.
+- **`ConnectionHealth` enum**: Connected, Disconnected, Connecting, Error.
+
+Types defined here must be immutable and thread-safe.
+
+### REQ-COM-2: Protocol Abstraction
+
+Commons must define the protocol abstraction interfaces that the Data Connection Layer implements and other components consume:
+
+- **`IDataConnection`**: The common interface for reading, writing, and subscribing to device data regardless of the underlying protocol (OPC UA, custom legacy, etc.).
+- **Related types**: Tag identifiers, read/write results, subscription callbacks, connection status enums, and quality codes.
+
+These interfaces must not reference any specific protocol implementation.
+
+### REQ-COM-3: Domain Entity Classes (POCOs)
+
+Commons must define persistence-ignorant POCO entity classes for all configuration database entities. These classes:
+
+- Are plain C# classes with properties — no EF attributes, no base classes from EF, no navigation property annotations.
+- May include navigation properties (e.g., `Template.Attributes` as `ICollection`) defined as plain collections. The Configuration Database component configures the relationships via Fluent API.
+- May include constructors that enforce invariants (e.g., required fields).
+- Must have **no dependency** on Entity Framework Core or any persistence library.
+
+Entity classes are organized by domain area:
+
+- **Template & Modeling**: `Template`, `TemplateAttribute`, `TemplateAlarm`, `TemplateScript`, `TemplateComposition`, `Instance`, `InstanceAttributeOverride`, `InstanceConnectionBinding`, `Area`.
+- **Shared Scripts**: `SharedScript`.
+- **Sites & Data Connections**: `Site`, `DataConnection`, `SiteDataConnectionAssignment`.
+- **External Systems & Database Connections**: `ExternalSystemDefinition`, `ExternalSystemMethod`, `DatabaseConnectionDefinition`.
+- **Notifications**: `NotificationList`, `NotificationRecipient`, `SmtpConfiguration`.
+- **Inbound API**: `ApiKey`, `ApiMethod`.
+- **Security**: `LdapGroupMapping`, `SiteScopeRule`.
+- **Deployment**: `DeploymentRecord`, `SystemArtifactDeploymentRecord`.
+- **Audit**: `AuditLogEntry`.
+
+### REQ-COM-4: Per-Component Repository Interfaces
+
+Commons must define repository interfaces that consuming components use for data access. Each interface is tailored to the data needs of its consuming component:
+
+- `ITemplateEngineRepository` — Templates, attributes, alarms, scripts, compositions, instances, overrides, connection bindings, areas.
+- `IDeploymentManagerRepository` — Deployment records, deployed configuration snapshots, system-wide artifact deployment records.
+- `ISecurityRepository` — LDAP group mappings, site scoping rules.
+- `IInboundApiRepository` — API keys, API method definitions.
+- `IExternalSystemRepository` — External system definitions, method definitions, database connection definitions.
+- `INotificationRepository` — Notification lists, recipients, SMTP configuration.
+- `ICentralUiRepository` — Read-oriented queries spanning multiple domain areas for display purposes.
+
+All repository interfaces must:
+- Accept and return the POCO entity classes defined in Commons.
+- Include a `SaveChangesAsync()` method (or equivalent) to support unit-of-work commit.
+- Have **no dependency** on Entity Framework Core — they are pure interfaces.
+
+Implementations of these interfaces are owned by the Configuration Database component.
+
+### REQ-COM-4a: Cross-Cutting Service Interfaces
+
+Commons must define service interfaces for cross-cutting concerns that multiple components consume:
+
+- **`IAuditService`**: Provides a single method for components to log audit entries: `LogAsync(user, action, entityType, entityId, entityName, afterState)`. The implementation (owned by the Audit Logging component) serializes the state as JSON and adds the audit entry to the current unit-of-work transaction. Defined in Commons so any central component can call it without depending on the Audit Logging component directly.
+
+### REQ-COM-5: Cross-Component Message Contracts
+
+Commons must define the shared DTOs and message contracts used for inter-component communication, including:
+
+- **Deployment DTOs**: Configuration snapshots, deployment commands, deployment status, validation results.
+- **Instance Lifecycle DTOs**: Disable, enable, delete commands and responses.
+- **Health DTOs**: Health check results, site status reports, heartbeat messages. Includes script error rates and alarm evaluation error rates.
+- **Communication DTOs**: Site identity, connection state, routing metadata.
+- **Attribute Stream DTOs**: Attribute value change messages (instance name, attribute path, value, quality, timestamp) and alarm state change messages (instance name, alarm name, state, priority, timestamp) for the site-wide Akka stream. +- **Debug View DTOs**: Subscribe/unsubscribe requests, initial snapshot, stream filter criteria. +- **Script Execution DTOs**: Script call requests (with recursion depth), return values, error results. +- **System-Wide Artifact DTOs**: Shared script packages, external system definitions, database connection definitions, notification list definitions. + +All message types must be `record` types or immutable classes suitable for use as Akka.NET messages (though Commons itself must not depend on Akka.NET). + +### REQ-COM-6: No Business Logic + +Commons must contain only: + +- Data structures (records, classes, structs) +- Interfaces +- Enums +- Constants + +It must **not** contain any business logic, service implementations, actor definitions, or orchestration code. Any method bodies must be limited to trivial data-access logic (e.g., factory methods, validation of invariants in constructors). + +### REQ-COM-7: Minimal Dependencies + +Commons must depend only on core .NET libraries (`System.*`, `Microsoft.Extensions.Primitives` if needed). It must **not** reference: + +- Akka.NET or any Akka.* packages +- ASP.NET Core or any Microsoft.AspNetCore.* packages +- Entity Framework Core or any Microsoft.EntityFrameworkCore.* packages +- Any third-party libraries requiring paid licenses + +This ensures Commons can be referenced by all components without introducing transitive dependency conflicts. + +--- + +## Dependencies + +- **None** — only core .NET SDK. + +## Interactions + +- **All component libraries**: Reference Commons for shared types, interfaces, entity classes, and contracts. +- **Configuration Database**: Implements the repository interfaces defined in Commons. 
Maps the POCO entity classes to the database via EF Core Fluent API. +- **Host**: References Commons transitively through the component libraries. diff --git a/Component-Communication.md b/Component-Communication.md new file mode 100644 index 0000000..68bb74f --- /dev/null +++ b/Component-Communication.md @@ -0,0 +1,102 @@ +# Component: Central–Site Communication + +## Purpose + +The Communication component manages all messaging between the central cluster and site clusters using Akka.NET. It provides the transport layer for deployments, instance lifecycle commands, integration routing, debug streaming, health reporting, and remote queries (parked messages, event logs). + +## Location + +Both central and site clusters. Each side has communication actors that handle message routing. + +## Responsibilities + +- Establish and maintain Akka.NET remoting connections between central and each site cluster. +- Route messages between central and site clusters in a hub-and-spoke topology. +- Broker requests from external systems (via central) to sites and return responses. +- Support multiple concurrent message patterns (request/response, fire-and-forget, streaming). +- Detect site connectivity status for health monitoring. + +## Communication Patterns + +### 1. Deployment (Central → Site) +- **Pattern**: Request/Response. +- Central sends a flattened configuration to a site. +- Site Runtime receives, compiles scripts, creates/updates Instance Actors, and responds with success/failure. +- No buffering at central — if the site is unreachable, the deployment fails immediately. + +### 2. Instance Lifecycle Commands (Central → Site) +- **Pattern**: Request/Response. +- Central sends disable, enable, or delete commands for specific instances. +- Site Runtime processes the command and responds with success/failure. +- If the site is unreachable, the command fails immediately (no buffering). + +### 3. 
System-Wide Artifact Deployment (Central → All Sites) +- **Pattern**: Broadcast with per-site acknowledgment. +- When shared scripts, external system definitions, database connections, or notification lists are explicitly deployed, central sends them to all sites. +- Each site acknowledges receipt and reports success/failure independently. + +### 4. Integration Routing (External System → Central → Site → Central → External System) +- **Pattern**: Request/Response (brokered). +- External system sends a request to central (e.g., MES requests machine values). +- Central routes the request to the appropriate site. +- Site reads values from the Instance Actor and responds. +- Central returns the response to the external system. + +### 5. Recipe/Command Delivery (External System → Central → Site) +- **Pattern**: Fire-and-forget with acknowledgment. +- External system sends a command to central (e.g., recipe manager sends recipe). +- Central routes to the site. +- Site applies and acknowledges. + +### 6. Debug Streaming (Site → Central) +- **Pattern**: Subscribe/stream with initial snapshot. +- Central sends a subscribe request for a specific instance (identified by unique name). +- Site requests a **snapshot** of all current attribute values and alarm states from the Instance Actor and sends it to central. +- Site then subscribes to the **site-wide Akka stream** filtered by the instance's unique name and forwards attribute value changes and alarm state changes to central. +- Attribute value stream messages: `[InstanceUniqueName].[AttributePath].[AttributeName]`, value, quality, timestamp. +- Alarm state stream messages: `[InstanceUniqueName].[AlarmName]`, state (active/normal), priority, timestamp. +- Central sends an unsubscribe request when the debug view closes. The site removes its stream subscription. +- The stream is session-based and temporary. + +### 7. Health Reporting (Site → Central) +- **Pattern**: Periodic push. 
+- Sites periodically send health metrics (connection status, node status, buffer depth, script error rates, alarm evaluation error rates) to central. + +### 8. Remote Queries (Central → Site) +- **Pattern**: Request/Response. +- Central queries sites for: + - Parked messages (store-and-forward dead letters). + - Site event logs. +- Central can also send management commands: + - Retry or discard parked messages. + +## Topology + +``` +Central Cluster + ├── Akka.NET Remoting → Site A Cluster + ├── Akka.NET Remoting → Site B Cluster + └── Akka.NET Remoting → Site N Cluster +``` + +- Sites do **not** communicate with each other. +- All inter-cluster communication flows through central. + +## Failover Behavior + +- **Central failover**: The standby node takes over the Akka.NET cluster role. In-progress deployments are treated as failed. Sites reconnect to the new active central node. +- **Site failover**: The standby node takes over. The Deployment Manager singleton restarts and re-creates the Instance Actor hierarchy. Central detects the node change and reconnects. Ongoing debug streams are interrupted and must be re-established by the engineer. + +## Dependencies + +- **Akka.NET Remoting**: Provides the transport layer. +- **Cluster Infrastructure**: Manages node roles and failover detection. + +## Interactions + +- **Deployment Manager (central)**: Uses communication to deliver configurations, lifecycle commands, and system-wide artifacts, and receive status. +- **Site Runtime**: Receives deployments, lifecycle commands, and artifact updates. Provides debug view data. +- **Central UI**: Debug view requests and remote queries flow through communication. +- **Health Monitoring**: Receives periodic health reports from sites. +- **Store-and-Forward Engine (site)**: Parked message queries/commands are routed through communication. +- **Site Event Logging**: Event log queries are routed through communication. 
diff --git a/Component-ConfigurationDatabase.md b/Component-ConfigurationDatabase.md new file mode 100644 index 0000000..a39078a --- /dev/null +++ b/Component-ConfigurationDatabase.md @@ -0,0 +1,292 @@ +# Component: Configuration Database + +## Purpose + +The Configuration Database component provides the centralized data access layer for all system configuration data stored in MS SQL. It owns the database schema, Entity Framework DbContext, repository implementations, unit-of-work support, migration management, and audit logging. All central components access configuration data through this component — no other component interacts with the configuration database directly. + +## Location + +Central cluster only. Site clusters do not access the configuration database (they receive deployed configurations via the Communication Layer). + +## Responsibilities + +- Define and own the complete database schema for the configuration MS SQL database via EF Core Fluent API mappings. +- Provide the Entity Framework Core DbContext as the single point of access to the configuration database. +- **Implement** the per-component repository interfaces defined in Commons. The interfaces and POCO entity classes live in Commons (persistence-ignorant); this component provides the EF Core implementations. +- **Implement** the `IAuditService` interface defined in Commons. Handles JSON serialization of entity state and writes audit entries within the same unit-of-work transaction as the change being audited. +- Provide unit-of-work support via EF Core's DbContext for transactional multi-entity operations. +- Manage schema migrations via EF Core Migrations with support for generating SQL scripts for manual execution in production. +- Support seed data for initial system setup. +- Manage connection pooling and connection lifecycle for the configuration database. + +**Note**: This component does **not** manage the Machine Data Database. 
The Machine Data Database is a separate concern with different access patterns (direct ADO.NET connections from scripts via `Database.Connection()`). + +--- + +## Database Schema + +The configuration database stores all central system data, organized by domain area: + +### Template & Modeling +- **Templates**: Template definitions (name, parent template reference, description). +- **Template Attributes**: Attribute definitions per template (name, value, data type, lock flag, description, data source reference). +- **Template Alarms**: Alarm definitions per template (name, description, priority, lock flag, trigger type, trigger configuration, on-trigger script reference). +- **Template Scripts**: Script definitions per template (name, lock flag, C# source code, trigger type, trigger configuration, minimum time between runs, parameter definitions, return value definitions). +- **Template Compositions**: Feature module composition relationships (composing template, composed template, module instance name). +- **Instances**: Instance definitions (template reference, site reference, area reference, enabled/disabled state). +- **Instance Attribute Overrides**: Per-instance attribute value overrides. +- **Instance Connection Bindings**: Per-attribute data connection binding for each instance. +- **Areas**: Hierarchical area definitions per site (name, parent area reference, site reference). + +### Shared Scripts +- **Shared Scripts**: System-wide reusable script definitions (name, C# source code, parameter definitions, return value definitions). + +### Sites & Data Connections +- **Sites**: Site definitions (name, identifier, description). +- **Data Connections**: Data connection definitions (name, protocol type, connection details) with site assignments. + +### External Systems & Database Connections +- **External System Definitions**: External system contracts (name, connection details, retry settings). 
+- **External System Methods**: API method definitions per external system (method name, parameter definitions, return type definitions). +- **Database Connection Definitions**: Named database connections (name, connection details, retry settings). + +### Notifications +- **Notification Lists**: List definitions (name). +- **Notification Recipients**: Recipients per list (name, email address). +- **SMTP Configuration**: Email server settings. + +### Inbound API +- **API Keys**: Key definitions (name/label, key value, enabled flag). +- **API Methods**: Method definitions (name, approved key references, parameter definitions, return value definitions, implementation script, timeout). + +### Security +- **LDAP Group Mappings**: Mappings between LDAP group names and system roles (Admin, Design, Deployment). +- **Site Scoping Rules**: Per-mapping site scope restrictions for Deployment role. + +### Deployment +- **Deployment Records**: Deployment history per instance (timestamp, user, status, deployed configuration snapshot). +- **System-Wide Artifact Deployment Records**: Deployment history for shared artifacts (timestamp, user, artifact type, status). + +### Audit Logging +- **Audit Log Entries**: Append-only audit trail (timestamp, user, action, entity type, entity ID, entity name, state as JSON). Stores only the after-state — change history is reconstructed by comparing consecutive entries. Entries are never modified or deleted. No retention policy — retained indefinitely. Indexed on timestamp, user, entity type, entity ID, and action for efficient filtering. + +--- + +## Data Access Architecture + +### DbContext + +A single `ScadaLinkDbContext` (or a small number of bounded DbContexts if warranted) serves as the EF Core entry point. The DbContext: + +- Maps the POCO entity classes defined in Commons to the database using **Fluent API only** — no data annotations on the entity classes. +- Configures relationships, indexes, constraints, and value conversions. 
+- Provides `SaveChangesAsync()` as the unit-of-work commit mechanism. + +### Per-Component Repository Implementations + +Repository interfaces are defined in **Commons** alongside the POCO entity classes (see Component-Commons.md, REQ-COM-4). This component provides the **EF Core implementations** of those interfaces. + +| Repository Interface (in Commons) | Consuming Component | Scope | +|---|---|---| +| `ITemplateEngineRepository` | Template Engine | Templates, attributes, alarms, scripts, compositions, instances, overrides, connection bindings, areas | +| `IDeploymentManagerRepository` | Deployment Manager | Deployment records, deployed configuration snapshots, system-wide artifact deployment records | +| `ISecurityRepository` | Security & Auth | LDAP group mappings, site scoping rules | +| `IInboundApiRepository` | Inbound API | API keys, API method definitions | +| `IExternalSystemRepository` | External System Gateway | External system definitions, method definitions, database connection definitions | +| `INotificationRepository` | Notification Service | Notification lists, recipients, SMTP configuration | +| `IHealthMonitoringRepository` | Health Monitoring | (Minimal — health data is in-memory; repository needed only if connectivity history is persisted in the future) | +| `ICentralUiRepository` | Central UI | Read-oriented queries spanning multiple domain areas for display purposes | + +Each implementation class uses the DbContext internally and works with the POCO entity classes from Commons. Consuming components depend only on Commons (for interfaces and entities) — they never reference this component or EF Core directly. The DI container in the Host wires the implementations to the interfaces. + +### Unit of Work + +EF Core's DbContext naturally provides unit-of-work semantics: + +- Multiple entity modifications within a single request are tracked by the DbContext. +- `SaveChangesAsync()` commits all pending changes in a single database transaction. 
+- If any part fails, the entire transaction rolls back. +- For operations that span multiple repository calls (e.g., creating a template with attributes, alarms, and scripts), the consuming component uses a single DbContext instance (via DI scoping) to ensure atomicity. + +### Example Transactional Flow + +``` +Template Engine: Create Template + │ + ├── repository.AddTemplate(template) // template is a Commons POCO + ├── repository.AddAttributes(attributes) // attributes are Commons POCOs + ├── repository.AddAlarms(alarms) // alarms are Commons POCOs + ├── repository.AddScripts(scripts) // scripts are Commons POCOs + └── repository.SaveChangesAsync() // single transaction commits all +``` + +--- + +## Audit Logging + +The Configuration Database component implements the `IAuditService` interface (defined in Commons), providing audit logging as a built-in capability of the data access layer. + +### IAuditService Implementation + +Components call `IAuditService` after a successful operation: + +``` +IAuditService.LogAsync(user, action, entityType, entityId, entityName, afterState) +``` + +- **`user`**: The authenticated AD user who performed the action. +- **`action`**: The type of operation (`Create`, `Update`, `Delete`, `Deploy`, `Disable`, `Enable`). +- **`entityType`**: What was changed (`Template`, `Instance`, `SharedScript`, `Alarm`, `ExternalSystem`, `DatabaseConnection`, `NotificationList`, `ApiKey`, `ApiMethod`, `Area`, `Site`, `DataConnection`, `LdapGroupMapping`). +- **`entityId`**: Unique identifier of the specific entity. +- **`entityName`**: Human-readable name of the entity. +- **`afterState`**: The entity's state after the change, which the implementation serializes as JSON. Null for deletes. + +### Transactional Guarantee + +Audit entries are written **synchronously** within the same database transaction as the change. The `IAuditService` implementation adds an `AuditLogEntry` to the current DbContext. 
When the calling component calls `SaveChangesAsync()`, both the change and the audit entry commit together. This guarantees:
+
+- If the change succeeds, the audit entry is always recorded.
+- If the change fails and rolls back, the audit entry is also rolled back.
+- No audit entries are lost due to process crashes between the change and the audit write.
+
+### Integration Example
+
+```
+Template Engine: Update Template
+  │
+  ├── repository.UpdateTemplate(template)
+  ├── auditService.LogAsync(user, "Update", "Template", template.Id,
+  │                          template.Name, template)
+  └── repository.SaveChangesAsync()  ← both the change and audit entry commit together
+```
+
+### Audit Entry Schema
+
+| Field | Type | Description |
+|-------|------|-------------|
+| **Id** | Long / GUID | Unique identifier for the audit entry. |
+| **Timestamp** | DateTimeOffset | When the action occurred (UTC). |
+| **User** | String | Authenticated AD username. |
+| **Action** | String | The type of operation. |
+| **EntityType** | String | What was changed. |
+| **EntityId** | String | Unique identifier of the entity. |
+| **EntityName** | String | Human-readable name (for display without deserializing state). |
+| **State** | nvarchar(max) | Entity state after the change, serialized as JSON. Null for deletes. |
+
+### State Serialization
+
+- Entity state is serialized as **JSON** using the standard .NET JSON serializer.
+- JSON is stored in `nvarchar(max)` and is queryable via SQL Server's `JSON_VALUE` and `OPENJSON` functions.
+- For deletes, the state is null. The previous state can be found by querying the most recent prior entry for the same entity.
+
+### Granularity
+
+- **One audit entry per save operation**. When a user edits a template and changes multiple attributes in a single save, one entry is created with the full entity state after the save.
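
The serialization and granularity rules above can be illustrated with a minimal C# sketch. It is not the component's actual implementation: the `LogAudit` helper and the tuple shape are hypothetical, an in-memory list stands in for the `AuditLogEntry` set tracked by the DbContext, and `System.Text.Json` is assumed to be the "standard .NET JSON serializer" the document refers to.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical stand-in for the AuditLogEntry rows tracked by the DbContext.
var pendingEntries = new List<(DateTimeOffset Timestamp, string User, string Action,
    string EntityType, string EntityId, string EntityName, string? State)>();

// Hypothetical helper: one entry per save, holding the full after-state as JSON.
// A null afterState (used for deletes) is stored as a null State value.
void LogAudit(string user, string action, string entityType,
    string entityId, string entityName, object? afterState)
{
    string? state = afterState is null ? null : JsonSerializer.Serialize(afterState);
    pendingEntries.Add((DateTimeOffset.UtcNow, user, action,
        entityType, entityId, entityName, state));
}

// A single save that changed several fields still yields exactly one entry.
var template = new { Id = 42, Name = "Pump", Description = "Updated", ParentId = (int?)null };
LogAudit("jdoe", "Update", "Template", "42", template.Name, template);

// Deletes carry no after-state.
LogAudit("jdoe", "Delete", "Template", "42", template.Name, null);

foreach (var e in pendingEntries)
{
    var state = e.State ?? "(null)";
    Console.WriteLine($"{e.Action} {e.EntityType} {e.EntityId}: {state}");
}
```

In the design described here, the entry would be added to the scoped DbContext rather than to a list, so that it commits or rolls back together with the audited change when the caller invokes `SaveChangesAsync()`.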
+
+### Reconstructing Change History
+
+Since only the after-state is stored, change history for an entity is reconstructed by querying all entries for that entity ordered by timestamp. Comparing consecutive entries reveals what changed at each step. This is a query-time concern handled by the Central UI.
+
+### Audited Actions
+
+| Category | Actions |
+|----------|---------|
+| Templates | Create, edit, delete templates |
+| Scripts | Create, edit, delete template scripts and shared scripts |
+| Alarms | Create, edit, delete alarm definitions |
+| Instances | Create, override values, bind connections, area assignment, disable, enable, delete |
+| Deployments | Deploy to instance (who, what, which instance, success/failure) |
+| System-Wide Artifact Deployments | Deploy shared scripts / external system definitions / DB connections / notification lists to sites (who, what, result) |
+| External Systems | Create, edit, delete definitions |
+| Database Connections | Create, edit, delete definitions |
+| Notification Lists | Create, edit, delete lists and recipients |
+| Inbound API | API key create, enable/disable, delete. API method create, edit, delete |
+| Areas | Create, edit, delete area definitions |
+| Sites & Data Connections | Create, edit, delete sites. Define and assign data connections to sites |
+| Security/Admin | Role mapping changes, site permission changes |
+
+### Query Capabilities
+
+The Central UI audit log viewer can filter by:
+- **User**: Who made the change.
+- **Entity type**: What kind of entity was changed.
+- **Action type**: What kind of operation was performed.
+- **Time range**: When the change occurred.
+- **Specific entity ID/name**: Changes to a particular entity.
+
+Results are returned in reverse chronological order (most recent first) with pagination support.
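
As a rough illustration of the filter, ordering, and pagination behaviour listed under Query Capabilities, the following C# sketch runs the same logic over an in-memory list. Everything in it is an assumption made for illustration: the `QueryAudit` helper, its parameter names, the 1-based page index, and the sample data are hypothetical, and the real viewer would issue the query through `ICentralUiRepository` against the database rather than use LINQ to Objects.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory stand-in for the audit table.
var t0 = new DateTimeOffset(2026, 3, 1, 0, 0, 0, TimeSpan.Zero);
var entries = new List<(DateTimeOffset Timestamp, string User, string Action, string EntityType, string EntityId)>
{
    (t0.AddHours(1), "alice", "Create", "Template", "7"),
    (t0.AddHours(2), "bob",   "Update", "Template", "7"),
    (t0.AddHours(3), "alice", "Update", "Instance", "12"),
    (t0.AddHours(4), "alice", "Update", "Template", "7"),
    (t0.AddHours(5), "alice", "Delete", "Template", "7"),
};

// Hypothetical query helper: every filter is optional, results are returned
// newest first, and pages are 1-based.
List<(DateTimeOffset Timestamp, string User, string Action, string EntityType, string EntityId)> QueryAudit(
    string? user = null, string? entityType = null, string? action = null,
    DateTimeOffset? since = null, DateTimeOffset? until = null, string? entityId = null,
    int page = 1, int pageSize = 2)
{
    return entries
        .Where(e => user is null || e.User == user)
        .Where(e => entityType is null || e.EntityType == entityType)
        .Where(e => action is null || e.Action == action)
        .Where(e => since is null || e.Timestamp >= since)
        .Where(e => until is null || e.Timestamp <= until)
        .Where(e => entityId is null || e.EntityId == entityId)
        .OrderByDescending(e => e.Timestamp)   // reverse chronological order
        .Skip((page - 1) * pageSize)           // skip earlier pages
        .Take(pageSize)
        .ToList();
}

var page1 = QueryAudit(user: "alice", entityType: "Template", page: 1);
var page2 = QueryAudit(user: "alice", entityType: "Template", page: 2);

foreach (var e in page1.Concat(page2))
    Console.WriteLine($"{e.Timestamp:u} {e.User} {e.Action} {e.EntityType} {e.EntityId}");
```

The same ordered result for a single entity ID is also the input for change-history reconstruction, since consecutive after-states can then be compared pairwise.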
+ +--- + +## Migration Management + +### Entity Framework Core Migrations + +- Schema changes are managed via EF Core Migrations (`dotnet ef migrations add`, `dotnet ef migrations script`). +- Each migration is a versioned, incremental schema change. + +### Development Environment +- Migrations are **auto-applied** at application startup using `dbContext.Database.MigrateAsync()`. +- This allows rapid iteration without manual SQL execution. + +### Production Environment +- Migrations are **never auto-applied**. +- SQL scripts are generated via `dotnet ef migrations script --idempotent` and reviewed by a DBA or engineer. +- Scripts are executed manually in SQL Server Management Studio (SSMS) or equivalent tooling. +- The Host startup in production validates that the database schema version matches the expected migration level and fails fast with a clear error if not. + +### Migration Script Generation + +```bash +# Generate idempotent SQL script for all pending migrations +dotnet ef migrations script --idempotent --output migration.sql --project + +# Generate script from a specific migration to another +dotnet ef migrations script FromMigration ToMigration --output migration.sql +``` + +Generated scripts are idempotent — they can be safely re-run without causing errors or duplicate changes. + +--- + +## Seed Data + +The Configuration Database supports seeding initial data required for the system to be usable after a fresh installation. Seed data is applied as part of the migration pipeline. + +### Seed Data Includes +- Default system configuration values. +- Any baseline reference data required by the application. + +### Mechanism +- Seed data is defined using EF Core's `HasData()` in entity configurations or in dedicated seed migrations. +- Seed data is included in the generated SQL scripts, so it is applied alongside schema changes in both development and production. 
+ +--- + +## Connection Management + +- Connection strings are provided via the Host's `DatabaseConfiguration` options (bound from `appsettings.json`). +- EF Core manages connection pooling via the underlying ADO.NET SQL Server provider. +- The DbContext is registered as a **scoped** service in the DI container, ensuring each request/operation gets its own instance. +- No connection management for the Machine Data Database — that is handled separately by consumers (Inbound API scripts, external system gateway). + +--- + +## Dependencies + +- **Entity Framework Core**: ORM, DbContext, migrations, change tracking. +- **Microsoft.EntityFrameworkCore.SqlServer**: SQL Server database provider. +- **MS SQL Server**: The configuration database instance. +- **Commons**: POCO entity classes and repository interfaces that this component maps and implements. + +## Interactions + +- **Template Engine**: Uses `ITemplateEngineRepository` for all template, instance, and area data operations. +- **Deployment Manager**: Uses `IDeploymentManagerRepository` for deployment records and status tracking. +- **Security & Auth**: Uses `ISecurityRepository` for LDAP group mappings and site scoping. +- **Inbound API**: Uses `IInboundApiRepository` for API keys and method definitions. +- **External System Gateway**: Uses `IExternalSystemRepository` for external system and database connection definitions. +- **Notification Service**: Uses `INotificationRepository` for notification lists and SMTP configuration. +- **Central UI**: Uses `ICentralUiRepository` for read-oriented queries across domain areas, including audit log queries for the audit log viewer. +- **All central components that modify state**: Call `IAuditService.LogAsync()` after successful operations to record audit entries within the same transaction. +- **Host**: Provides database connection configuration. Registers DbContext, repository implementations, and `IAuditService` implementation in the DI container. 
Triggers auto-migration in development or validates schema version in production. diff --git a/Component-DataConnectionLayer.md b/Component-DataConnectionLayer.md new file mode 100644 index 0000000..3e21ee7 --- /dev/null +++ b/Component-DataConnectionLayer.md @@ -0,0 +1,80 @@ +# Component: Data Connection Layer + +## Purpose + +The Data Connection Layer provides a uniform interface for reading from and writing to physical machines at site clusters. It abstracts protocol-specific details behind a common interface, manages subscriptions, and delivers live tag value updates to Instance Actors. It is a **clean data pipe** — it performs no evaluation of triggers, alarm conditions, or business logic. + +## Location + +Site clusters only. Central does not interact with machines directly. + +## Responsibilities + +- Manage data connections defined at the site level (OPC UA servers, custom protocol endpoints). +- Establish and maintain connections to data sources based on deployed instance configurations. +- Subscribe to tag paths as requested by Instance Actors (based on attribute data source references in the flattened configuration). +- Deliver tag value updates to the requesting Instance Actors. +- Support writing values to machines (when Instance Actors forward `SetAttribute` write requests for data-connected attributes). +- Report data connection health status to the Health Monitoring component. + +## Common Interface + +Both OPC UA and the custom protocol implement the same interface: + +``` +IDataConnection +├── Connect(connectionDetails) → void +├── Disconnect() → void +├── Subscribe(tagPath, callback) → subscriptionId +├── Unsubscribe(subscriptionId) → void +├── Read(tagPath) → value +├── Write(tagPath, value) → void +└── Status → ConnectionHealth +``` + +Additional protocols can be added by implementing this interface. + +## Supported Protocols + +### OPC UA +- Standard OPC UA client implementation. 
+- Supports subscriptions (monitored items) and read/write operations. + +### Custom Protocol +- Proprietary protocol adapter. +- Implements the same subscription-based model as OPC UA. + +## Subscription Management + +- When an Instance Actor is created (as part of the Site Runtime actor hierarchy), it registers its data source references with the Data Connection Layer. +- The DCL subscribes to the tag paths using the concrete connection details from the flattened configuration. +- Tag value updates are delivered directly to the requesting Instance Actor. +- When an Instance Actor is stopped (due to disable, delete, or redeployment), the DCL cleans up the associated subscriptions. +- When a new Instance Actor is created for a redeployment, subscriptions are established fresh based on the new configuration. + +## Write-Back Support + +- When a script calls `Instance.SetAttribute` for an attribute with a data source reference, the Instance Actor sends a write request to the DCL. +- The DCL writes the value to the physical device via the appropriate protocol. +- The existing subscription picks up the confirmed new value from the device and delivers it back to the Instance Actor as a standard value update. +- The Instance Actor's in-memory value is **not** updated until the device confirms the write. + +## Value Update Message Format + +Each value update delivered to an Instance Actor includes: +- **Tag path**: The relative path of the attribute's data source reference. +- **Value**: The new value from the device. +- **Quality**: Data quality indicator (good, bad, uncertain). +- **Timestamp**: When the value was read from the device. + +## Dependencies + +- **Site Runtime (Instance Actors)**: Receives subscription registrations and delivers value updates. Receives write requests. +- **Health Monitoring**: Reports connection status. +- **Site Event Logging**: Logs connection status changes. 
+ +## Interactions + +- **Site Runtime (Instance Actors)**: Bidirectional — delivers value updates, receives subscription registrations and write-back commands. +- **Health Monitoring**: Reports connection health periodically. +- **Site Event Logging**: Logs connection/disconnection events. diff --git a/Component-DeploymentManager.md b/Component-DeploymentManager.md new file mode 100644 index 0000000..f4c945b --- /dev/null +++ b/Component-DeploymentManager.md @@ -0,0 +1,96 @@ +# Component: Deployment Manager + +## Purpose + +The Deployment Manager orchestrates the process of deploying configurations from the central cluster to site clusters. It coordinates between the Template Engine (which produces flattened and validated configs), the Communication Layer (which delivers them), and tracks deployment status. It also manages system-wide artifact deployment and instance lifecycle commands (disable, enable, delete). + +## Location + +Central cluster only. The site-side deployment responsibilities (receiving configs, spawning Instance Actors) are handled by the Site Runtime component. + +## Responsibilities + +- Accept deployment requests from the Central UI for individual instances. +- Request flattened and validated configurations from the Template Engine. +- Request diffs between currently deployed and template-derived configurations from the Template Engine. +- Send flattened configurations to site clusters via the Communication Layer. +- Track deployment status (pending, in-progress, success, failed). +- Handle deployment failures gracefully — if a site is unreachable or the deployment fails, report the failure. No retry or buffering at central. +- If a central failover occurs during deployment, the deployment is treated as failed and must be re-initiated. +- Deploy system-wide artifacts (shared scripts, external system definitions, database connection definitions, notification lists) to all sites on explicit request. 
+- Send instance lifecycle commands (disable, enable, delete) to sites via the Communication Layer.
+
+## Deployment Flow
+
+```
+Engineer (UI) → Deployment Manager (Central)
+    │
+    ├── 1. Request validated + flattened config from Template Engine
+    │      (validation includes flattening, script compilation,
+    │       trigger references, connection binding completeness)
+    ├── 2. If validation fails → return errors to UI, stop
+    ├── 3. Send config to site via Communication Layer
+    │       │
+    │       ▼
+    │   Site Runtime (Deployment Manager Singleton)
+    │       ├── 4. Store new flattened config locally (SQLite)
+    │       ├── 5. Compile scripts at site
+    │       ├── 6. Create/update Instance Actor (with child Script + Alarm Actors)
+    │       └── 7. Report success/failure back to central
+    │
+    └── 8. Update deployment status in config DB
+```
+
+## Deployment Scope
+
+- Deployment is performed at the **individual instance level**.
+- The UI may provide convenience operations (e.g., "deploy all out-of-date instances at Site A"), but these decompose into individual instance deployments.
+
+## Diff View
+
+Before deploying, the Deployment Manager can request a diff from the Template Engine showing:
+- **Added** attributes, alarms, or scripts (new in the template since last deploy).
+- **Removed** members (removed from template since last deploy).
+- **Changed** values (attribute values, alarm thresholds, script code that differ).
+- **Connection binding changes** (data connection references that changed).
+
+## Deployed vs. Template-Derived State
+
+The system maintains two views per instance:
+- **Deployed Configuration**: What is currently running at the site, as of the last successful deployment.
+- **Template-Derived Configuration**: What the instance would look like if deployed now, based on the current state of its template hierarchy and instance overrides.
+
+These are compared to determine staleness and generate diffs.
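
The comparison between the two views can be sketched in C# as a key-by-key diff. This is only an illustration under stated assumptions: the document assigns diff generation to the Template Engine and does not specify its data model, so the flat `name → value` dictionaries, the key naming scheme, and the `Diff` helper are all hypothetical.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical flattened views of one instance: member name -> serialized value.
var deployed = new Dictionary<string, string>
{
    ["Attr.Speed"] = "100",
    ["Attr.Mode"] = "Auto",
    ["Alarm.HighTemp.Threshold"] = "80",
    ["Binding.Speed"] = "OpcUa-Line1",
};
var templateDerived = new Dictionary<string, string>
{
    ["Attr.Speed"] = "120",                 // changed value
    ["Alarm.HighTemp.Threshold"] = "80",    // unchanged
    ["Binding.Speed"] = "OpcUa-Line2",      // connection binding change
    ["Script.OnStart"] = "return 1;",       // added since last deploy
};

// Hypothetical helper: classifies every key as added, removed, or changed.
(List<string> Added, List<string> Removed, List<string> Changed) Diff(
    IReadOnlyDictionary<string, string> current,
    IReadOnlyDictionary<string, string> candidate)
{
    var added = candidate.Keys.Where(k => !current.ContainsKey(k))
        .OrderBy(k => k, StringComparer.Ordinal).ToList();
    var removed = current.Keys.Where(k => !candidate.ContainsKey(k))
        .OrderBy(k => k, StringComparer.Ordinal).ToList();
    var changed = current.Keys
        .Where(k => candidate.TryGetValue(k, out var v) && v != current[k])
        .OrderBy(k => k, StringComparer.Ordinal).ToList();
    return (added, removed, changed);
}

var diff = Diff(deployed, templateDerived);

// An instance is stale when the template-derived view differs in any way.
bool isStale = diff.Added.Count + diff.Removed.Count + diff.Changed.Count > 0;

Console.WriteLine($"Added:   {string.Join(", ", diff.Added)}");
Console.WriteLine($"Removed: {string.Join(", ", diff.Removed)}");
Console.WriteLine($"Changed: {string.Join(", ", diff.Changed)}");
Console.WriteLine($"Stale:   {isStale}");
```

In this sketch a connection binding change is just a changed value under a `Binding.` key; a real diff would likely report it as its own category, as the Diff View list does.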
+ +## Deployable Artifacts + +The artifacts that can be deployed to a site are the flattened instance configuration and the following system-wide artifacts: +- Shared scripts +- External system definitions +- Database connection definitions +- Notification lists (and SMTP configuration) + +System-wide artifact deployment is a **separate action** from instance deployment, triggered explicitly by a user with the Deployment role. + +## Instance Lifecycle Commands + +The Deployment Manager sends the following commands to sites via the Communication Layer: + +- **Disable**: Instructs the site to stop the Instance Actor's data subscriptions, script triggers, and alarm evaluation. The deployed configuration is retained for re-enablement. +- **Enable**: Instructs the site to re-activate a disabled instance. +- **Delete**: Instructs the site to remove the running configuration and destroy the Instance Actor and its children. Store-and-forward messages are not cleared. If the site is unreachable, the delete command **fails**: the central side does not mark the instance as deleted until the site confirms. + +## Dependencies + +- **Template Engine**: Produces flattened configurations, diffs, and validation results. +- **Communication Layer**: Delivers configurations and lifecycle commands to sites. +- **Configuration Database (MS SQL)**: Stores deployment status and deployed configuration snapshots. +- **Security & Auth**: Enforces Deployment role (with optional site scoping). +- **Configuration Database (via IAuditService)**: Logs all deployment actions, system-wide artifact deployments, and instance lifecycle changes. + +## Interactions + +- **Central UI**: Engineers trigger deployments, view diffs/status, manage instance lifecycle, and deploy system-wide artifacts. +- **Template Engine**: Provides resolved and validated configurations. +- **Site Runtime**: Receives and applies configurations and lifecycle commands.
+- **Health Monitoring**: Deployment failures contribute to site health status. diff --git a/Component-ExternalSystemGateway.md b/Component-ExternalSystemGateway.md new file mode 100644 index 0000000..80d989e --- /dev/null +++ b/Component-ExternalSystemGateway.md @@ -0,0 +1,73 @@ +# Component: External System Gateway + +## Purpose + +The External System Gateway manages predefined integrations with external systems (e.g., MES, recipe managers) and database connections. It provides the runtime for invoking external API methods and executing database operations from scripts at site clusters. + +## Location + +Site clusters (executes calls directly to external systems). Central cluster (stores definitions, brokers inbound requests from external systems to sites). + +## Responsibilities + +### Definitions (Central) +- Store external system definitions in the configuration database: connection details, API method signatures (parameters and return types). +- Store database connection definitions: server, database, credentials. +- Deploy definitions uniformly to all sites (no per-site overrides). Deployment requires **explicit action** by a user with the Deployment role. +- Managed by users with the Design role. + +### Execution (Site) +- Invoke external system API methods as requested by scripts (via Script Execution Actors and Alarm Execution Actors). +- Provide raw MS SQL client connections (ADO.NET) by name for synchronous database access. +- Submit cached database writes to the Store-and-Forward Engine for reliable delivery. +- Sites communicate with external systems **directly** (not routed through central). + +### Integration Brokering (Central) +- Receive inbound requests from external systems (e.g., MES querying machine values). +- Route requests to the appropriate site via the Communication Layer. +- Return responses to the external system. 
+ +## External System Definition + +Each external system definition includes: +- **Name**: Unique identifier (e.g., "MES", "RecipeManager"). +- **Connection Details**: Endpoint URL, authentication, protocol. +- **Retry Settings**: Max retry count, fixed time between retries (used by Store-and-Forward Engine). +- **Method Definitions**: List of available API methods, each with: + - Method name. + - Parameter definitions (name, type). + - Return type definition. + +## Database Connection Definition + +Each database connection definition includes: +- **Name**: Unique identifier (e.g., "MES_DB", "HistorianDB"). +- **Connection Details**: Server address, database name, credentials. +- **Retry Settings**: Max retry count, fixed time between retries (for cached writes). + +## Database Access Modes + +### Synchronous (Real-time) +- Script calls `Database.Connection("name")` and receives a raw ADO.NET `SqlConnection`. +- Full control: queries, updates, transactions, stored procedures. +- Failures are immediate — no buffering. + +### Cached Write (Store-and-Forward) +- Script calls `Database.CachedWrite("name", "sql", parameters)`. +- The write is submitted to the Store-and-Forward Engine. +- Payload includes: connection name, SQL statement, serialized parameter values. +- If the database is unavailable, the write is buffered and retried per the connection's retry settings. + +## Dependencies + +- **Configuration Database (MS SQL)**: Stores external system and database connection definitions. +- **Store-and-Forward Engine**: Handles buffering for failed external system calls and cached database writes. +- **Communication Layer**: Routes inbound external system requests from central to sites. +- **Security & Auth**: Design role manages definitions. +- **Configuration Database (via IAuditService)**: Definition changes are audit logged. 
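
The retry settings carried by external system and database connection definitions (max retry count and a fixed time between retries) could drive a decision like the one sketched here. This is only an illustration: the `RetrySettings` and `FixedIntervalRetry` names are hypothetical, and the rule that a message is parked once the configured retries are used up is an assumption about how the Store-and-Forward Engine applies these settings.

```csharp
using System;

// Hypothetical holder for the two retry settings named in the definitions.
public sealed record RetrySettings(int MaxRetryCount, TimeSpan RetryInterval);

public static class FixedIntervalRetry
{
    // Returns when the next delivery attempt is due, or null when the
    // message should be parked (assumed behavior once retries are exhausted).
    public static DateTimeOffset? NextAttempt(
        RetrySettings settings,
        int retriesAlreadyMade,
        DateTimeOffset lastAttemptAt)
    {
        if (retriesAlreadyMade >= settings.MaxRetryCount)
        {
            return null; // no retries left: park the message
        }

        // Fixed interval: the delay does not grow between attempts.
        return lastAttemptAt + settings.RetryInterval;
    }
}
```

Keeping the decision a pure function of the settings, the retry count, and the last attempt time makes it easy to reason about independently of how buffered messages are persisted.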
+ +## Interactions + +- **Site Runtime (Script/Alarm Execution Actors)**: Scripts invoke external system methods and database operations through this component. +- **Store-and-Forward Engine**: Failed calls and cached writes are routed here for reliable delivery. +- **Deployment Manager**: Receives updated definitions as part of system-wide artifact deployment (triggered explicitly by Deployment role). diff --git a/Component-HealthMonitoring.md b/Component-HealthMonitoring.md new file mode 100644 index 0000000..4125b40 --- /dev/null +++ b/Component-HealthMonitoring.md @@ -0,0 +1,61 @@ +# Component: Health Monitoring + +## Purpose + +The Health Monitoring component collects and reports operational health metrics from site clusters to the central cluster, providing engineers with visibility into the status of the distributed system. + +## Location + +Site clusters (metric collection and reporting). Central cluster (aggregation and display). + +## Responsibilities + +### Site Side +- Collect health metrics from local subsystems. +- Periodically report metrics to the central cluster via the Communication Layer. + +### Central Side +- Receive and store health metrics from all sites. +- Detect site connectivity status (online/offline) based on heartbeat presence. +- Present health data in the Central UI dashboard. 
+ +## Monitored Metrics + +| Metric | Source | Description | +|--------|--------|-------------| +| Site online/offline | Communication Layer | Whether the site is reachable (based on heartbeat) | +| Active/standby node status | Cluster Infrastructure | Which node is active, which is standby | +| Data connection health | Data Connection Layer | Connected/disconnected per data connection | +| Script error rates | Site Runtime (Script Actors) | Frequency of script failures | +| Alarm evaluation error rates | Site Runtime (Alarm Actors) | Frequency of alarm evaluation failures | +| Store-and-forward buffer depth | Store-and-Forward Engine | Pending messages by category (external, notification, DB write) | + +## Reporting Protocol + +- Sites send a **health report message** to central at a configurable interval (e.g., every 30 seconds). +- Each report contains the current values of all monitored metrics. +- If central does not receive a report within a timeout window, the site is marked as **offline**. + +## Central Storage + +- Health metrics are held **in memory** at the central cluster for display in the UI. +- No historical health data is persisted — the dashboard shows current/latest status only. +- Site connectivity history (online/offline transitions) may optionally be logged via the Audit Log or a separate mechanism if needed in the future. + +## No Alerting + +- Health monitoring is **display-only** for now — no automated notifications or alerts are triggered by health status changes. +- This can be extended in the future. + +## Dependencies + +- **Communication Layer**: Transports health reports from sites to central. +- **Data Connection Layer (site)**: Provides connection health metrics. +- **Site Runtime (site)**: Provides script error rate and alarm evaluation error rate metrics. +- **Store-and-Forward Engine (site)**: Provides buffer depth metrics. +- **Cluster Infrastructure (site)**: Provides node role status. 
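
The offline rule in the reporting protocol above (no report within a timeout window means the site is offline) can be sketched with a small in-memory tracker. The `SiteConnectivityTracker` name and its members are hypothetical, and the sketch is not thread-safe; in an actor-based design the state would presumably be owned by a single actor.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical in-memory tracker of the last health report seen per site.
public sealed class SiteConnectivityTracker
{
    private readonly TimeSpan _timeout;
    private readonly Dictionary<string, DateTimeOffset> _lastReport = new();

    public SiteConnectivityTracker(TimeSpan timeout) => _timeout = timeout;

    // Called whenever a health report message arrives from a site.
    public void RecordReport(string siteId, DateTimeOffset receivedAt) =>
        _lastReport[siteId] = receivedAt;

    // A site is online only if a report was received within the timeout window.
    // Sites that have never reported are treated as offline.
    public bool IsOnline(string siteId, DateTimeOffset now) =>
        _lastReport.TryGetValue(siteId, out var last) && now - last <= _timeout;
}
```

Because only the latest timestamp per site is kept, this matches the in-memory, current-status-only storage described for the central side.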
+ +## Interactions + +- **Central UI**: Health Monitoring Dashboard displays aggregated metrics. +- **Communication Layer**: Health reports flow as periodic messages. diff --git a/Component-Host.md b/Component-Host.md new file mode 100644 index 0000000..05b58bf --- /dev/null +++ b/Component-Host.md @@ -0,0 +1,144 @@ +# Component: Host + +## Purpose + +The Host component is the single deployable executable for the entire ScadaLink system. The same binary runs on every node — central and site alike. The node's role is determined entirely by configuration (`appsettings.json`), not by which binary is deployed. On central nodes the Host additionally bootstraps ASP.NET Core to serve the Central UI and Inbound API web endpoints. + +## Location + +All nodes (central and site). + +## Responsibilities + +- Serve as the single entry point (`Program.cs`) for the ScadaLink process. +- Read and validate node configuration at startup before any actor system is created. +- Register the correct set of component services and actors based on the configured node role. +- Bootstrap the Akka.NET actor system with Remoting, Clustering, Persistence, and split-brain resolution via Akka.Hosting. +- Host ASP.NET Core web endpoints on central nodes only. +- Configure structured logging (Serilog) with environment-specific enrichment. +- Support running as a Windows Service in production and as a console application during development. +- Perform graceful shutdown via Akka.NET CoordinatedShutdown when the service is stopped. + +--- + +## Requirements + +### REQ-HOST-1: Single Binary Deployment + +The same compiled binary must be deployable to both central and site nodes. The node's role (Central or Site) is determined solely by configuration values in `appsettings.json` (or environment-specific overrides). There must be no separate build targets, projects, or conditional compilation symbols for central vs. site. 
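
To illustrate how one binary can serve both roles, the sketch below turns a configured role string into a component set, using the component groupings listed under REQ-HOST-2. The `ComponentSelector` type and its methods are hypothetical; a real Host would call each component's registration extension methods rather than return names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public enum NodeRole { Central, Site }

// Hypothetical helper: maps the configured role to the components to register.
public static class ComponentSelector
{
    private static readonly string[] Shared =
    {
        "ClusterInfrastructure", "Communication", "HealthMonitoring",
        "ExternalSystemGateway", "NotificationService"
    };

    private static readonly string[] CentralOnly =
    {
        "TemplateEngine", "DeploymentManager", "Security",
        "AuditLogging", "CentralUI", "InboundAPI"
    };

    private static readonly string[] SiteOnly =
    {
        "SiteRuntime", "DataConnectionLayer", "StoreAndForward", "SiteEventLogging"
    };

    // Parses the role from configuration and rejects anything that is not a
    // named NodeRole value (Enum.TryParse alone also accepts numeric strings).
    public static NodeRole ParseRole(string configuredRole) =>
        Enum.TryParse<NodeRole>(configuredRole, ignoreCase: true, out var role)
        && Enum.IsDefined(typeof(NodeRole), role)
            ? role
            : throw new InvalidOperationException($"Unknown node role '{configuredRole}'.");

    // Shared components plus the role-specific ones; nothing else is registered.
    public static IReadOnlyList<string> ComponentsFor(NodeRole role) =>
        Shared.Concat(role == NodeRole.Central ? CentralOnly : SiteOnly).ToList();
}
```

The point of the sketch is that the role is data, not a build-time switch: the same code path runs on every node and only the configuration value differs.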
+ +### REQ-HOST-2: Role-Based Service Registration + +At startup the Host must inspect the configured node role and register only the component services appropriate for that role: + +- **Shared** (both Central and Site): ClusterInfrastructure, Communication, HealthMonitoring, ExternalSystemGateway, NotificationService. +- **Central only**: TemplateEngine, DeploymentManager, Security, AuditLogging, CentralUI, InboundAPI. +- **Site only**: SiteRuntime, DataConnectionLayer, StoreAndForward, SiteEventLogging. + +Components not applicable to the current role must not be registered in the DI container or the Akka.NET actor system. + +### REQ-HOST-3: Configuration Binding + +The Host must bind configuration sections from `appsettings.json` to strongly-typed options classes using the .NET Options pattern: + +- `ScadaLink:Node` section bound to `NodeConfiguration` (Role, NodeHostname, SiteId, RemotingPort). +- `ScadaLink:Cluster` section bound to `ClusterConfiguration` (SeedNodes, SplitBrainResolverStrategy, StableAfter). +- `ScadaLink:Database` section bound to `DatabaseConfiguration` (Central: ConfigurationDb, MachineDataDb connection strings; Site: SQLite paths). + +### REQ-HOST-4: Startup Validation + +Before the Akka.NET actor system is created, the Host must validate all required configuration values and fail fast with a clear error message if any are missing or invalid. Validation rules include: + +- `NodeConfiguration.Role` must be a valid `NodeRole` value. +- `NodeConfiguration.NodeHostname` must not be null or empty. +- `NodeConfiguration.RemotingPort` must be in valid port range (1–65535). +- Site nodes must have a non-empty `SiteId`. +- Central nodes must have non-empty `ConfigurationDb` and `MachineDataDb` connection strings. +- Site nodes must have non-empty SQLite path values. +- At least two seed nodes must be configured. + +### REQ-HOST-5: Windows Service Hosting + +The Host must support running as a Windows Service via `UseWindowsService()`. 
When launched outside of a Windows Service context (e.g., during development), it must run as a standard console application. No code changes or conditional compilation are required to switch between the two modes. + +### REQ-HOST-6: Akka.NET Bootstrap + +The Host must configure the Akka.NET actor system using Akka.Hosting with: + +- **Remoting**: Configured with the node's hostname and port from `NodeConfiguration`. +- **Clustering**: Configured with seed nodes and the node's cluster role from configuration. +- **Persistence**: Configured with the appropriate journal and snapshot store (SQL for central, SQLite for site). +- **Split-Brain Resolver**: Configured with the strategy and stable-after duration from `ClusterConfiguration`. +- **Actor registration**: Each component's actors registered via its `AddXxxActors()` extension method, conditional on the node's role. + +### REQ-HOST-7: ASP.NET Web Endpoints (Central Only) + +On central nodes, the Host must use `WebApplication.CreateBuilder` to produce a full ASP.NET Core host with Kestrel, and must map web endpoints for: + +- Central UI (via `MapCentralUI()` extension method). +- Inbound API (via `MapInboundAPI()` extension method). + +On site nodes, the Host must use `Host.CreateDefaultBuilder` to produce a generic `IHost` — **not** a `WebApplication`. This ensures no Kestrel server is started, no HTTP port is opened, and no web endpoint or middleware pipeline is configured. Site nodes are headless and must never accept inbound HTTP connections. + +### REQ-HOST-8: Structured Logging + +The Host must configure Serilog as the logging provider with: + +- Configuration-driven sink setup (console and file sinks at minimum). +- Automatic enrichment of every log entry with `SiteId`, `NodeHostname`, and `NodeRole` properties sourced from `NodeConfiguration`. +- Structured (machine-parseable) output format. 
+ +### REQ-HOST-9: Graceful Shutdown + +When the Host process receives a stop signal (Windows Service stop, `Ctrl+C`, or SIGTERM), it must trigger Akka.NET CoordinatedShutdown to allow actors to drain in-flight work before the process exits. The Host must not call `Environment.Exit()` or forcibly terminate the actor system without coordinated shutdown. + +### REQ-HOST-10: Extension Method Convention + +Each component library must expose its services to the Host via a consistent set of extension methods: + +- `IServiceCollection.AddXxx()` — registers the component's DI services. +- `AkkaConfigurationBuilder.AddXxxActors()` — registers the component's actors with the Akka.NET actor system (for components that have actors). +- `WebApplication.MapXxx()` — maps the component's web endpoints (only for CentralUI and InboundAPI). + +The Host's `Program.cs` calls these extension methods; the component libraries own the registration logic. This keeps the Host thin and each component self-contained. 
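
As an illustration of the fail-fast validation in REQ-HOST-4, the following sketch checks a subset of the listed rules and collects every problem before throwing. `NodeSettings` and `StartupValidator` are hypothetical stand-ins for the bound options classes; the database connection string and SQLite path checks are omitted to keep the example short, and accepting the role case-insensitively is an assumption.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical flattened view of the node and cluster settings.
public sealed record NodeSettings(
    string Role,
    string NodeHostname,
    string SiteId,
    int RemotingPort,
    IReadOnlyList<string> SeedNodes);

public static class StartupValidator
{
    // Returns every violated rule so the operator sees all problems at once.
    public static IReadOnlyList<string> Validate(NodeSettings s)
    {
        var errors = new List<string>();
        bool isSite = string.Equals(s.Role, "Site", StringComparison.OrdinalIgnoreCase);
        bool isCentral = string.Equals(s.Role, "Central", StringComparison.OrdinalIgnoreCase);

        if (!isSite && !isCentral)
            errors.Add("Role must be Central or Site.");
        if (string.IsNullOrEmpty(s.NodeHostname))
            errors.Add("NodeHostname must not be null or empty.");
        if (s.RemotingPort < 1 || s.RemotingPort > 65535)
            errors.Add("RemotingPort must be between 1 and 65535.");
        if (isSite && string.IsNullOrEmpty(s.SiteId))
            errors.Add("Site nodes must have a non-empty SiteId.");
        if (s.SeedNodes is null || s.SeedNodes.Count < 2)
            errors.Add("At least two seed nodes must be configured.");

        return errors;
    }

    // Intended to run before the actor system is created.
    public static void EnsureValid(NodeSettings s)
    {
        IReadOnlyList<string> errors = Validate(s);
        if (errors.Count > 0)
            throw new InvalidOperationException(
                "Invalid node configuration: " + string.Join(" ", errors));
    }
}
```

Collecting all errors rather than stopping at the first is a design choice of this sketch, not a stated requirement; it tends to shorten the fix-and-restart loop when several values are wrong.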
+ +--- + +## Component Registration Matrix + +| Component | Central | Site | DI (`AddXxx`) | Actors (`AddXxxActors`) | Endpoints (`MapXxx`) | +|---|---|---|---|---|---| +| ClusterInfrastructure | Yes | Yes | Yes | Yes | No | +| Communication | Yes | Yes | Yes | Yes | No | +| HealthMonitoring | Yes | Yes | Yes | Yes | No | +| ExternalSystemGateway | Yes | Yes | Yes | Yes | No | +| NotificationService | Yes | Yes | Yes | Yes | No | +| TemplateEngine | Yes | No | Yes | Yes | No | +| DeploymentManager | Yes | No | Yes | Yes | No | +| Security | Yes | No | Yes | Yes | No | +| CentralUI | Yes | No | Yes | No | Yes | +| InboundAPI | Yes | No | Yes | No | Yes | +| SiteRuntime | No | Yes | Yes | Yes | No | +| DataConnectionLayer | No | Yes | Yes | Yes | No | +| StoreAndForward | No | Yes | Yes | Yes | No | +| SiteEventLogging | No | Yes | Yes | Yes | No | +| ConfigurationDatabase | Yes | No | Yes | No | No | + +--- + +## Dependencies + +- **All 15 component libraries**: The Host references every component project to call their extension methods. +- **Akka.Hosting**: For `AddAkka()` and the hosting configuration builder. +- **Akka.Remote.Hosting, Akka.Cluster.Hosting, Akka.Persistence.Hosting**: For Akka subsystem configuration. +- **Serilog.AspNetCore**: For structured logging integration. +- **Microsoft.Extensions.Hosting.WindowsServices**: For Windows Service support. +- **ASP.NET Core** (central only): For web endpoint hosting. + +## Interactions + +- **All components**: The Host is the composition root — it wires every component into the DI container and actor system. +- **Configuration Database**: The Host registers the DbContext and wires repository implementations to their interfaces. In development, triggers auto-migration; in production, validates schema version. +- **ClusterInfrastructure**: The Host configures the underlying Akka.NET cluster that ClusterInfrastructure manages at runtime. 
+- **CentralUI / InboundAPI**: The Host maps their web endpoints into the ASP.NET Core pipeline on central nodes. +- **HealthMonitoring**: The Host's startup validation and logging configuration provide the foundation for health reporting. diff --git a/Component-InboundAPI.md b/Component-InboundAPI.md new file mode 100644 index 0000000..cdccfdf --- /dev/null +++ b/Component-InboundAPI.md @@ -0,0 +1,128 @@ +# Component: Inbound API + +## Purpose + +The Inbound API exposes a web API on the central cluster that external systems can call into. This is the reverse of the External System Gateway — where that component handles the SCADA system calling out to external systems, this component handles external systems calling in. It provides API key authentication, method-level authorization, and script-based method implementations. + +## Location + +Central cluster only (active node). Not available at site clusters. + +## Responsibilities + +- Host a web API endpoint on the central cluster. +- Authenticate inbound requests via API keys. +- Route requests to the appropriate API method definition. +- Enforce per-method API key authorization (only approved keys can call a given method). +- Execute the C# script implementation for the called method. +- Return structured responses to the caller. +- Failover: API becomes available on the new active node after central failover. + +## API Key Management + +### Storage +- API keys are stored in the **configuration database (MS SQL)**. + +### Key Properties +- **Name/Label**: Human-readable identifier for the key (e.g., "MES-Production", "RecipeManager-Dev"). +- **Key Value**: The secret key string used for authentication. +- **Enabled/Disabled Flag**: Keys can be disabled without deletion. + +### Management +- Managed by users with the **Admin** role via the Central UI. +- All key changes (create, enable/disable, delete) are audit logged. 
+ +## API Method Definition + +### Properties +Each API method definition includes: +- **Method Name**: Unique identifier and URL path segment for the endpoint. +- **Approved API Keys**: List of API keys authorized to invoke this method. Requests from non-approved keys are rejected. +- **Parameter Definitions**: Ordered list of input parameters, each with: + - Parameter name. + - Data type (Boolean, Integer, Float, String — same fixed set as template attributes). +- **Return Value Definition**: Structure of the response, with: + - Field names and data types. Supports returning **lists of objects**. +- **Implementation Script**: C# script that executes when the method is called. Stored **inline** in the method definition. Follows standard C# authoring patterns but has no template inheritance — it is a standalone script tied to this method. +- **Timeout**: Configurable per method. Defines the maximum time the method is allowed to execute (including any routed calls to sites) before returning a timeout error to the caller. + +### Management +- Managed by users with the **Design** role via the Central UI. +- All method definition changes are audit logged. + +## Request Flow + +``` +External System + │ + ▼ +Inbound API (Central) + ├── 1. Extract API key from request + ├── 2. Validate key exists and is enabled + ├── 3. Resolve method by name + ├── 4. Check API key is in method's approved list + ├── 5. Validate and deserialize parameters + ├── 6. Execute implementation script (subject to method timeout) + ├── 7. Serialize return value + └── 8. Return response +``` + +## Implementation Script Capabilities + +The C# script that implements an API method executes on the central cluster. Unlike instance scripts at sites, inbound API scripts run on central and can interact with **any instance at any site** through a routing API. + +Inbound API scripts **cannot** call shared scripts directly — shared scripts are deployed to sites only and execute inline in Script Actors. 
To execute logic on a site, use `Route.To().Call()`. + +### Script Runtime API + +#### Instance Routing +- `Route.To("instanceUniqueCode").Call("scriptName", parameters)` — Invoke a script on a specific instance at any site. Central routes the call to the appropriate site via the Communication Layer. The call reaches the target Instance Actor's Script Actor, which spawns a Script Execution Actor to execute the script. The return value flows back to the calling API script. +- `Route.To("instanceUniqueCode").GetAttribute("attributeName")` — Read a single attribute value from a specific instance at any site. +- `Route.To("instanceUniqueCode").GetAttributes("attr1", "attr2", ...)` — Read multiple attribute values in a **single call**, returned as a dictionary of name-value pairs. +- `Route.To("instanceUniqueCode").SetAttribute("attributeName", value)` — Write a single attribute value on a specific instance at any site. +- `Route.To("instanceUniqueCode").SetAttributes(dictionary)` — Write multiple attribute values in a **single call**, accepting a dictionary of name-value pairs. + +#### Input/Output +- **Input parameters** are available as defined in the method definition. +- **Return value** construction matching the defined return structure. + +#### Database Access +- `Database.Connection("connectionName")` — Obtain a raw MS SQL client connection for querying the configuration or machine data databases directly from central. + +### Routing Behavior +- The `Route.To()` helper resolves the instance's site assignment from the configuration database and routes the request to the correct site cluster via the Communication Layer. +- The call is **synchronous from the API caller's perspective** — the API method blocks until the site responds or the **method-level timeout** is reached. +- If the target site is unreachable or the call times out, the call fails and the API returns an error to the caller. No store-and-forward buffering is used for inbound API calls. 
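
The routing behavior above (wait for the site's reply or fail once the method-level timeout is reached) can be sketched with standard .NET task primitives. `RoutedCall.WithTimeout` is a hypothetical helper, and the `siteCall` delegate stands in for whatever the Communication Layer actually exposes.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class RoutedCall
{
    // Awaits a routed site call and throws TimeoutException if the
    // method-level timeout elapses first. No buffering or retry is attempted.
    public static async Task<T> WithTimeout<T>(
        Func<CancellationToken, Task<T>> siteCall,
        TimeSpan timeout)
    {
        using var cts = new CancellationTokenSource();

        Task<T> call = siteCall(cts.Token);
        Task timer = Task.Delay(timeout, cts.Token);

        Task finished = await Task.WhenAny(call, timer);

        // Cancel whichever side is still pending: the timer if the call
        // finished, or the abandoned call if the timer fired.
        cts.Cancel();

        if (finished == call)
        {
            return await call; // propagates the result or the call's exception
        }

        throw new TimeoutException(
            $"Routed call did not complete within {timeout.TotalMilliseconds} ms.");
    }
}
```

The API layer could then translate the `TimeoutException` into the error response returned to the external caller.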
+ +## Authentication Details + +- API key is passed in the request (e.g., via HTTP header such as `X-API-Key`). +- The system validates: + 1. The key exists in the configuration database. + 2. The key is enabled. + 3. The key is in the approved list for the requested method. +- Failed authentication returns an appropriate HTTP error (401 Unauthorized or 403 Forbidden). + +## Error Handling + +- Invalid API key → 401 Unauthorized. +- Valid key but not approved for method → 403 Forbidden. +- Invalid parameters → 400 Bad Request. +- Script execution failure → 500 Internal Server Error (with safe error message, no internal details exposed). +- Script errors are logged in the central audit/event system. + +## Dependencies + +- **Configuration Database (MS SQL)**: Stores API keys and method definitions. +- **Communication Layer**: Routes requests to sites when method implementations need site data. +- **Security & Auth**: API key validation (separate from LDAP/AD — API uses key-based auth). +- **Configuration Database (via IAuditService)**: All API key and method definition changes are audit logged. Optionally, API call activity can be logged. +- **Cluster Infrastructure**: API is hosted on the active central node and fails over with it. + +## Interactions + +- **External Systems**: Call the API with API keys. +- **Communication Layer**: API method scripts use this to reach sites. +- **Site Runtime (Instance Actors, Script Actors)**: Routed calls execute on site Instance Actors via their Script Actors. +- **Central UI**: Admin manages API keys; Design manages method definitions. +- **Configuration Database (via IAuditService)**: Configuration changes are audited. 
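
The three authentication checks and their status codes from the Authentication Details and Error Handling sections can be summarized in a small decision function. The `ApiKey` record and `InboundApiAuth` class are hypothetical, and identifying approved keys by name is an assumption about how the approved list is stored.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical model of a stored API key.
public sealed record ApiKey(string Name, string Value, bool Enabled);

public static class InboundApiAuth
{
    public const int Ok = 200;
    public const int Unauthorized = 401;
    public const int Forbidden = 403;

    // Applies the checks in order: key exists, key is enabled,
    // key is approved for the requested method.
    public static int Check(
        string presentedKey,
        IEnumerable<ApiKey> storedKeys,
        ISet<string> approvedKeyNames)
    {
        var match = storedKeys.FirstOrDefault(k => k.Value == presentedKey);

        // Unknown or disabled keys are both treated as invalid keys.
        if (match is null || !match.Enabled)
        {
            return Unauthorized;
        }

        // A valid key that is not on the method's approved list.
        if (!approvedKeyNames.Contains(match.Name))
        {
            return Forbidden;
        }

        return Ok;
    }
}
```

Parameter validation (400) and script failures (500) would be handled after this gate, once the request is known to be authorized.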
diff --git a/Component-NotificationService.md b/Component-NotificationService.md new file mode 100644 index 0000000..ed92b37 --- /dev/null +++ b/Component-NotificationService.md @@ -0,0 +1,59 @@ +# Component: Notification Service + +## Purpose + +The Notification Service provides email notification capabilities to scripts running at site clusters. It manages notification lists, handles email delivery, and integrates with the Store-and-Forward Engine for reliable delivery when the email server is unavailable. + +## Location + +Central cluster (definition management). Site clusters (email delivery). + +## Responsibilities + +### Definitions (Central) +- Store notification lists in the configuration database: list name, recipients (name + email address). +- Store email server configuration (SMTP settings). +- Deploy notification lists and SMTP configuration uniformly to all sites. Deployment requires **explicit action** by a user with the Deployment role. +- Managed by users with the Design role. + +### Delivery (Site) +- Resolve notification list names to recipient lists. +- Compose and send emails via SMTP. +- On delivery failure, submit the notification to the Store-and-Forward Engine for buffered retry. + +## Notification List Definition + +Each notification list includes: +- **Name**: Unique identifier (e.g., "Maintenance-Team", "Shift-Supervisors"). +- **Recipients**: One or more entries, each with: + - Recipient name. + - Email address. + +## Email Server Configuration + +- SMTP server address, port, authentication credentials, TLS settings. +- Retry settings: Max retry count, fixed time between retries (used by Store-and-Forward Engine). +- Defined centrally, deployed to all sites. + +## Script API + +```csharp +Notify.To("listName").Send("subject", "message") +``` + +- Available to instance scripts (via Script Execution Actors), alarm on-trigger scripts (via Alarm Execution Actors), and shared scripts (executing inline). 
+- Resolves the list name to recipients, composes the email, and attempts delivery. +- On failure, the notification is handed to the Store-and-Forward Engine. + +## Dependencies + +- **Configuration Database (MS SQL)**: Stores notification list definitions and SMTP config. +- **Store-and-Forward Engine**: Handles buffering for failed email deliveries. +- **Security & Auth**: Design role manages notification lists. +- **Configuration Database (via IAuditService)**: Notification list changes are audit logged. + +## Interactions + +- **Site Runtime (Script/Alarm Execution Actors)**: Scripts invoke `Notify.To().Send()` through this component. +- **Store-and-Forward Engine**: Failed notifications are buffered here. +- **Deployment Manager**: Receives updated notification lists and SMTP config as part of system-wide artifact deployment (triggered explicitly by Deployment role). diff --git a/Component-Security.md b/Component-Security.md new file mode 100644 index 0000000..fa095ae --- /dev/null +++ b/Component-Security.md @@ -0,0 +1,96 @@ +# Component: Security & Auth + +## Purpose + +The Security & Auth component handles user authentication via LDAP/Active Directory and enforces role-based authorization across the system. It maps LDAP group memberships to system roles and applies permission checks to all operations. + +## Location + +Central cluster. Sites do not have user-facing interfaces and do not perform independent authentication. + +## Responsibilities + +- Authenticate users against LDAP/Active Directory using Windows Integrated Authentication. +- Map LDAP group memberships to system roles. +- Enforce role-based access control on all API and UI operations. +- Support site-scoped permissions for the Deployment role. + +## Authentication + +- **Mechanism**: Windows Integrated Authentication (Kerberos/NTLM) against Active Directory. +- **Session**: Authenticated user identity is maintained for the duration of the UI session. 
+- **No local user store**: All identity and group information comes from AD. + +## Roles + +### Admin +- **Scope**: System-wide (always). +- **Permissions**: + - Manage site definitions. + - Manage site-level data connections (define and assign to sites). + - Manage area definitions per site. + - Manage LDAP group-to-role mappings. + - Manage API keys (create, enable/disable, delete). + - System-level configuration. + - View audit logs. + +### Design +- **Scope**: System-wide (always). +- **Permissions**: + - Create, edit, delete templates (including attributes, alarms, scripts). + - Manage shared scripts. + - Manage external system definitions. + - Manage database connection definitions. + - Manage notification lists and SMTP configuration. + - Manage inbound API method definitions. + - Run on-demand validation (template flattening, script compilation). + +### Deployment +- **Scope**: System-wide or site-scoped. +- **Permissions**: + - Create and manage instances (overrides, connection bindings, area assignment). + - Disable, enable, and delete instances. + - Deploy configurations to instances. + - Deploy system-wide artifacts (shared scripts, external system definitions, DB connections, notification lists) to all sites. + - View deployment diffs and status. + - Use debug view. + - Manage parked messages. + - View site event logs. +- **Site scoping**: A user with site-scoped Deployment role can only perform these actions for instances at their permitted sites. + +## Multi-Role Support + +- A user can hold **multiple roles simultaneously** by being a member of multiple LDAP groups. +- Roles are **independent** — there is no implied hierarchy between roles. +- For example, a user who is a member of both `SCADA-Designers` and `SCADA-Deploy-All` holds both the Design and Deployment roles, allowing them to author templates and also deploy configurations. + +## LDAP Group Mapping + +- System administrators configure mappings between LDAP groups and roles. 
+- Examples: + - `SCADA-Admins` → Admin role + - `SCADA-Designers` → Design role + - `SCADA-Deploy-All` → Deployment role (all sites) + - `SCADA-Deploy-SiteA` → Deployment role (Site A only) + - `SCADA-Deploy-SiteB` → Deployment role (Site B only) +- A user can be a member of multiple groups, granting multiple independent roles. +- Group mappings are stored in the configuration database and managed via the Central UI (Admin role). + +## Permission Enforcement + +- Every API endpoint and UI action checks the authenticated user's roles before proceeding. +- Site-scoped checks additionally verify the target site is within the user's permitted sites. +- Unauthorized actions return an appropriate error and are not logged as audit events (only successful changes are audited). + +## Dependencies + +- **Active Directory / LDAP**: Source of user identity and group memberships. +- **Configuration Database (MS SQL)**: Stores LDAP group-to-role mappings and site scoping rules. +- **Configuration Database (via IAuditService)**: Security/admin changes (role mapping updates) are audit logged. + +## Interactions + +- **Central UI**: All UI requests pass through authentication and authorization. +- **Template Engine**: Design role enforcement. +- **Deployment Manager**: Deployment role enforcement with site scoping. +- **All central components**: Role checks are a cross-cutting concern applied at the API layer. diff --git a/Component-SiteEventLogging.md b/Component-SiteEventLogging.md new file mode 100644 index 0000000..854b7df --- /dev/null +++ b/Component-SiteEventLogging.md @@ -0,0 +1,69 @@ +# Component: Site Event Logging + +## Purpose + +The Site Event Logging component records operational events at each site cluster, providing a local audit trail of runtime activity. Events are queryable from the central UI for remote troubleshooting. + +## Location + +Site clusters (event recording and storage). Central cluster (remote query access via UI). 
+ +## Responsibilities + +- Record operational events from all site subsystems. +- Persist events to local SQLite. +- Enforce 30-day retention policy with automatic purging. +- Respond to remote queries from central for event log data. + +## Events Logged + +| Category | Events | +|----------|--------| +| Script Executions | Script started, completed, failed (with error details), recursion limit exceeded | +| Alarm Events | Alarm activated, alarm cleared (which alarm, which instance), alarm evaluation error | +| Deployment Events | Configuration received from central, scripts compiled, applied successfully, apply failed | +| Data Connection Status | Connected, disconnected, reconnected (per connection) | +| Store-and-Forward | Message queued, delivered, retried, parked | +| Instance Lifecycle | Instance enabled, disabled, deleted | + +## Event Entry Schema + +Each event entry contains: +- **Timestamp**: When the event occurred. +- **Event Type**: Category of the event (script, alarm, deployment, connection, store-and-forward, instance-lifecycle). +- **Severity**: Info, Warning, or Error. +- **Instance ID** *(optional)*: The instance associated with the event (if applicable). +- **Source**: The subsystem that generated the event (e.g., "ScriptActor:MonitorSpeed", "AlarmActor:OverTemp", "DataConnection:PLC1"). +- **Message**: Human-readable description of the event. +- **Details** *(optional)*: Additional structured data (e.g., exception stack trace, alarm name, message ID, compilation errors). + +## Storage + +- Events are stored in **local SQLite** on each site node. +- Each node maintains its own event log (the active node generates events; the standby node generates minimal events related to replication). +- **Retention**: 30 days. A background job automatically purges events older than 30 days. + +## Central Access + +- The central UI can query site event logs remotely via the Communication Layer. 
+- Queries support filtering by: + - Event type / category + - Time range + - Instance ID + - Severity +- The site processes the query locally and returns matching results to central. + +## Dependencies + +- **SQLite**: Local storage on each site node. +- **Communication Layer**: Handles remote query requests from central. +- **Site Runtime**: Generates script execution events, alarm events, deployment application events, and instance lifecycle events. +- **Data Connection Layer**: Generates connection status events. +- **Store-and-Forward Engine**: Generates buffer activity events. + +## Interactions + +- **All site subsystems**: Event logging is a cross-cutting concern — any subsystem that produces notable events calls the Event Logging service. +- **Communication Layer**: Receives remote queries from central and returns results. +- **Central UI**: Site Event Log Viewer displays queried events. +- **Health Monitoring**: Script error rates and alarm evaluation error rates can be derived from event log data. diff --git a/Component-SiteRuntime.md b/Component-SiteRuntime.md new file mode 100644 index 0000000..70ce731 --- /dev/null +++ b/Component-SiteRuntime.md @@ -0,0 +1,270 @@ +# Component: Site Runtime + +## Purpose + +The Site Runtime component manages the execution of deployed machine instances at site clusters. It encompasses the actor hierarchy that represents running instances, their scripts, and their alarms. It owns the site-side deployment lifecycle (receiving configs from central, compiling scripts, creating actors), script execution, alarm evaluation, and the site-wide Akka stream for attribute and alarm state changes. + +This component replaces the previously separate Script Engine and Alarm Engine concepts, unifying them under a single actor hierarchy rooted at the Deployment Manager singleton. + +## Location + +Site clusters only. + +## Responsibilities + +- Run the Deployment Manager singleton (Akka.NET cluster singleton) on the active site node. 
+- On startup (or failover), read all deployed configurations from local SQLite and re-create the full actor hierarchy. +- Receive deployment commands from central: new/updated instance configurations, instance lifecycle commands (disable, enable, delete), and system-wide artifact updates. +- Compile C# scripts when deployments are received. +- Manage the Instance Actor hierarchy (Instance Actors, Script Actors, Alarm Actors). +- Execute scripts via Script Actors with support for concurrent execution. +- Evaluate alarm conditions via Alarm Actors and manage alarm state. +- Maintain the site-wide Akka stream for attribute value and alarm state changes. +- Execute shared scripts inline as compiled code libraries (no separate actors). +- Enforce script call recursion limits. + +--- + +## Actor Hierarchy + +``` +Deployment Manager Singleton (Cluster Singleton) +├── Instance Actor ("MachineA-001") +│ ├── Script Actor ("MonitorSpeed") — coordinator +│ │ └── Script Execution Actor — short-lived, per invocation +│ ├── Script Actor ("CalculateOEE") — coordinator +│ │ └── Script Execution Actor — short-lived, per invocation +│ ├── Alarm Actor ("OverTemp") — coordinator +│ │ └── Alarm Execution Actor — short-lived, per on-trigger invocation +│ └── Alarm Actor ("LowPressure") — coordinator +├── Instance Actor ("MachineA-002") +│ └── ... +└── ... +``` + +--- + +## Deployment Manager Singleton + +### Role +- Akka.NET **cluster singleton** — guaranteed to run on exactly one node in the site cluster (the active node). +- On failover, Akka.NET restarts the singleton on the new active node. + +### Startup Behavior +1. Read all deployed configurations from local SQLite. +2. Read all shared scripts from local storage. +3. Compile all scripts (instance scripts, alarm on-trigger scripts, shared scripts). +4. Create Instance Actors for all deployed, **enabled** instances as child actors. +5. Make compiled shared script code available to all Script Actors. 
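The startup steps listed above can be sketched roughly as follows. This is a language-neutral illustration in Python, not the Akka.NET implementation: `DeployedInstance`, `compile_script`, and `start_site_runtime` are hypothetical names, Python's built-in `compile()` stands in for C# script compilation, and plain dictionaries stand in for Instance Actors. It follows the listed order literally (compile everything, then create actors only for enabled instances); whether scripts of disabled instances are really compiled at startup is an assumption.

```python
from dataclasses import dataclass


@dataclass
class DeployedInstance:
    """Hypothetical stand-in for one deployed configuration row read from SQLite."""
    name: str
    enabled: bool
    scripts: dict  # script name -> source text


def compile_script(source):
    # Assumption: Python's compile() is only a placeholder for C# compilation.
    return compile(source, "<script>", "exec")


def start_site_runtime(stored_instances, shared_sources):
    # Steps 1-2: the caller has already read configurations and shared scripts.
    # Step 3: compile shared scripts and all instance scripts.
    shared = {name: compile_script(src) for name, src in shared_sources.items()}
    compiled = {
        inst.name: {n: compile_script(s) for n, s in inst.scripts.items()}
        for inst in stored_instances
    }
    # Steps 4-5: create an "actor" only for enabled instances and hand each
    # one a reference to the compiled shared script library.
    return {
        inst.name: {"scripts": compiled[inst.name], "shared": shared}
        for inst in stored_instances
        if inst.enabled
    }
```

Disabled instances keep their stored configuration but get no entry in the returned mapping, which mirrors the rule that only enabled instances receive an Instance Actor.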
+ +### Deployment Handling +- Receives flattened instance configurations from central via the Communication Layer. +- Stores the new configuration in local SQLite. +- Compiles all scripts in the configuration. +- Creates a new Instance Actor (for new instances) or updates an existing one (for redeployments). +- For redeployments: the existing Instance Actor and all its children are stopped, then a new Instance Actor is created with the updated configuration. Subscriptions are re-established. +- Reports deployment result (success/failure) back to central. + +### System-Wide Artifact Handling +- Receives updated shared scripts, external system definitions, database connection definitions, and notification lists from central. +- Stores artifacts in local SQLite/filesystem. +- Recompiles shared scripts and makes updated code available to all Script Actors. + +### Instance Lifecycle Commands +- **Disable**: Stops the Instance Actor and its children. Retains the deployed configuration in SQLite so the instance can be re-enabled without redeployment. +- **Enable**: Creates a new Instance Actor from the stored configuration (same as startup). +- **Delete**: Stops the Instance Actor and its children, removes the deployed configuration from local SQLite. Does **not** clear store-and-forward messages. + +--- + +## Instance Actor + +### Role +- **Single source of truth** for all runtime state of a deployed instance. +- Holds all attribute values (both static configuration values and live values from data connections). +- Holds current alarm states (active/normal), updated by child Alarm Actors. +- Publishes attribute value changes and alarm state changes to the site-wide Akka stream. + +### Initialization +1. Load all attribute values from the flattened configuration (static defaults). +2. Register data source references with the Data Connection Layer for subscriptions. +3. Create child Script Actors (one per script defined on the instance). +4. 
Create child Alarm Actors (one per alarm defined on the instance). + +### Attribute Value Updates +- Receives tag value updates from the Data Connection Layer for attributes with data source references. +- Updates the in-memory attribute value. +- Notifies subscribed child Script Actors and Alarm Actors of the change. +- Publishes the change to the site-wide Akka stream. + +### Stream Message Format +- **Attribute changes**: `[InstanceUniqueName].[AttributePath].[AttributeName]`, attribute value, attribute quality, attribute change timestamp. +- **Alarm state changes**: `[InstanceUniqueName].[AlarmName]`, alarm state (active/normal), priority, timestamp. + +### GetAttribute / SetAttribute +- **GetAttribute**: Returns the current in-memory value for the requested attribute. +- **SetAttribute** (for attributes with data source reference): Sends a write request to the Data Connection Layer. The DCL writes to the physical device. The existing subscription picks up the confirmed value from the device and sends it back as a value update, which then updates the in-memory value. The in-memory value is **not** optimistically updated. +- **SetAttribute** (for static attributes): Updates the in-memory value directly. This change is ephemeral — it is lost on restart and resets to the deployed configuration value. + +### Debug View Support +- On request from central (via Communication Layer), the Instance Actor provides a **snapshot** of all current attribute values and alarm states. +- Subsequent changes are delivered via the site-wide Akka stream, filtered by instance unique name. + +### Supervision +- The Instance Actor supervises all child Script and Alarm Actors. +- When the Instance Actor is stopped (due to disable, delete, or redeployment), Akka.NET automatically stops all child actors. + +--- + +## Script Actor + +### Role +- **Coordinator** for a single script definition on an instance. +- Holds the compiled script code and trigger configuration. 
+- Manages trigger evaluation (interval timer, value change detection, conditional evaluation). +- Spawns short-lived Script Execution Actors for each invocation. + +### Trigger Management +- **Interval**: The Script Actor manages an internal timer. When the timer fires, it spawns a Script Execution Actor. +- **Value Change**: The Script Actor subscribes to attribute change notifications from its parent Instance Actor for the specific monitored attribute. When the attribute changes, it spawns a Script Execution Actor. +- **Conditional**: The Script Actor subscribes to attribute change notifications for the monitored attribute. On each update, it evaluates the condition (equals or not-equals a value). If the condition is met, it spawns a Script Execution Actor. +- **Minimum time between runs**: If configured, the Script Actor tracks the last execution time and skips trigger invocations that fire before the minimum interval has elapsed. + +### Concurrent Execution +- Each invocation spawns a **new Script Execution Actor** as a child. +- Multiple Script Execution Actors can run concurrently (e.g., a trigger fires while a previous `Instance.CallScript` invocation is still running). +- The Script Actor coordinates but does not block on child completion. + +### Script Execution Actor +- **Short-lived** child actor created per invocation. +- Receives: compiled script code, input parameters, reference to the parent Instance Actor, current call depth. +- Executes the script in the Akka actor context. +- Has access to the full Script Runtime API (see below). +- Returns the script's return value (if defined) to the caller, then stops. + +### Handling `Instance.CallScript` +- When an external caller (another Script Execution Actor, an Alarm Execution Actor, or a routed call from the Inbound API) sends a `CallScript` message to the Script Actor, it spawns a Script Execution Actor to handle the call. 
+- The caller uses the **Akka ask pattern** and receives the return value when the execution completes. + +--- + +## Alarm Actor + +### Role +- **Coordinator** for a single alarm definition on an instance. +- Evaluates alarm trigger conditions against attribute value updates. +- Manages alarm state (active/normal) in memory. +- Executes on-trigger scripts when the alarm activates. + +### Alarm Evaluation +- Subscribes to attribute change notifications from its parent Instance Actor for the attribute(s) referenced by its trigger definition. +- On each value update, evaluates the trigger condition: + - **Value Match**: Incoming value equals the predefined target. + - **Range Violation**: Value is outside the allowed min/max range. + - **Rate of Change**: Value change rate exceeds the defined threshold over time. +- When the condition is met and the alarm is currently in **normal** state, the alarm transitions to **active**: + - Updates the alarm state on the parent Instance Actor (which publishes to the Akka stream). + - If an on-trigger script is defined, spawns an Alarm Execution Actor to execute it. +- When the condition clears and the alarm is in **active** state, the alarm transitions to **normal**: + - Updates the alarm state on the parent Instance Actor. + - No script execution on clear. + +### Alarm State +- Held **in memory** only — not persisted to SQLite. +- On restart (or failover), alarm states are re-evaluated from incoming values. All alarms start in normal state and transition to active when conditions are detected. + +### Alarm Execution Actor +- **Short-lived** child actor created when an on-trigger script needs to execute. +- Same pattern as Script Execution Actor — receives compiled code, executes, returns, and stops. +- Has access to the Instance Actor for `GetAttribute`/`SetAttribute`. +- **Can** call instance scripts via `Instance.CallScript()` — sends an ask message to the appropriate sibling Script Actor. 
+- Instance scripts **cannot** call alarm on-trigger scripts — the call direction is one-way. + +--- + +## Shared Script Library + +- Shared scripts are compiled at the site when received from central. +- Compiled code is stored in memory and made available to all Script Actors. +- When a Script Execution Actor calls `Scripts.CallShared("scriptName", params)`, the shared script code executes **inline** in the Script Execution Actor's context — it is a direct method invocation, not an actor message. +- This avoids serialization bottlenecks since there is no shared script actor to contend for. +- Shared scripts have access to the same runtime API as instance scripts (GetAttribute, SetAttribute, external systems, notifications, databases). + +--- + +## Script Runtime API + +Available to all Script Execution Actors and Alarm Execution Actors: + +### Instance Attributes +- `Instance.GetAttribute("name")` — Read an attribute value from the parent Instance Actor. +- `Instance.SetAttribute("name", value)` — Write an attribute value. For data-connected attributes, writes to the DCL; for static attributes, updates in-memory directly. + +### Other Scripts +- `Instance.CallScript("scriptName", parameters)` — Send an ask message to a sibling Script Actor. The target Script Actor spawns a Script Execution Actor, executes, and returns the result. The call includes the current recursion depth. +- `Scripts.CallShared("scriptName", parameters)` — Execute shared script code inline (direct method invocation). The call includes the current recursion depth. + +### External Systems +- Access to predefined external system API methods (see External System Gateway component). + +### Notifications +- `Notify.To("listName").Send("subject", "message")` — Send an email notification via a named notification list. + +### Database Access +- `Database.Connection("connectionName")` — Obtain a raw MS SQL client connection (ADO.NET) for synchronous read/write. 
+- `Database.CachedWrite("connectionName", "sql", parameters)` — Submit a write operation for store-and-forward delivery. + +### Recursion Limit +- Every script call (`Instance.CallScript` and `Scripts.CallShared`) increments a call depth counter. +- If the counter exceeds the maximum recursion depth (default: 10), the call fails with an error. +- The error is logged to the site event log. + +--- + +## Script Scoping Rules + +- Scripts can only read/write attributes on **their own instance** (via the parent Instance Actor). +- Scripts can call other scripts on **their own instance** (via sibling Script Actors). +- Scripts can call **shared scripts** (inline execution). +- Scripts **cannot** access other instances' attributes or scripts. +- Alarm on-trigger scripts **can** call instance scripts; instance scripts **cannot** call alarm on-trigger scripts. + +--- + +## Error Handling + +### Script Errors +- Unhandled exceptions and timeouts in Script Execution Actors are **logged locally** to the site event log. +- The Script Actor (coordinator) is **not affected** — it remains active for future trigger events. +- Script failures are **not reported to central** (except as aggregated error rate metrics via Health Monitoring). + +### Alarm Evaluation Errors +- Errors during alarm condition evaluation are **logged locally** to the site event log. +- The Alarm Actor remains active and continues evaluating on subsequent value updates. +- Alarm evaluation error rates are reported to central via Health Monitoring. + +### Script Compilation Errors +- If script compilation fails when a deployment is received, the entire deployment for that instance is **rejected**. No partial state is applied. +- The failure is reported back to central as a failed deployment. +- Note: Pre-deployment validation at central should catch compilation errors before they reach the site. Site-side compilation failures indicate an unexpected issue. 
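The "no partial state" rule for compilation failures can be illustrated with a small sketch. It is written in Python purely for illustration; `apply_deployment` and `compile_script` are hypothetical names, a dictionary stands in for the site's stored configurations, and the approach shown (compile every script first, commit only if all succeed) is one assumed way to satisfy the rule. The document does not specify the actual mechanism or ordering.

```python
def compile_script(source):
    # Assumption: Python's compile() is only a placeholder for C# compilation.
    return compile(source, "<script>", "exec")


def apply_deployment(current, instance_name, new_config):
    """Apply new_config for one instance, or leave `current` untouched.

    Returns (True, None) on success or (False, error_text) on rejection.
    """
    compiled = {}
    for script_name, source in new_config["scripts"].items():
        try:
            compiled[script_name] = compile_script(source)
        except SyntaxError as exc:
            # Reject the whole deployment; nothing has been committed yet.
            return False, f"{script_name}: {exc.msg}"
    # Every script compiled, so the new state replaces the old in one step.
    current[instance_name] = {"config": new_config, "compiled": compiled}
    return True, None
```

The returned tuple corresponds to the success or failure result that the site reports back to central.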
+ +--- + +## Dependencies + +- **Data Connection Layer**: Provides tag value updates to Instance Actors. Receives write requests from Instance Actors. +- **Store-and-Forward Engine**: Handles reliable delivery for external system calls, notifications, and cached database writes submitted by scripts. +- **External System Gateway**: Provides external system method invocations for scripts. +- **Notification Service**: Handles email delivery for scripts. +- **Communication Layer**: Receives deployments and lifecycle commands from central. Handles debug view requests. Reports deployment results. +- **Site Event Logging**: Records script executions, alarm events, deployment events, instance lifecycle events. +- **Health Monitoring**: Reports script error rates and alarm evaluation error rates. +- **Local SQLite**: Persists deployed configurations. + +## Interactions + +- **Deployment Manager (central)**: Receives flattened configurations, system-wide artifact updates, and instance lifecycle commands. +- **Data Connection Layer**: Bidirectional — receives value updates, sends write-back commands. +- **Communication Layer**: Receives commands from central, sends deployment results, serves debug view data. +- **Store-and-Forward Engine**: Scripts route cached writes, notifications, and external system calls here. +- **Health Monitoring**: Periodically reports error rate metrics. diff --git a/Component-StoreAndForward.md b/Component-StoreAndForward.md new file mode 100644 index 0000000..cfeb1a6 --- /dev/null +++ b/Component-StoreAndForward.md @@ -0,0 +1,99 @@ +# Component: Store-and-Forward Engine + +## Purpose + +The Store-and-Forward Engine provides reliable message delivery for outbound communications from site clusters. It buffers messages when the target system is unavailable, retries them according to configured policies, and parks messages that exhaust retries for manual review. + +## Location + +Site clusters only. The central cluster does not buffer messages. 
+ +## Responsibilities + +- Buffer outbound messages when the target system is unavailable. +- Manage three categories of buffered messages: + - External system API calls. + - Email notifications. + - Cached database writes. +- Retry delivery per message according to the configured retry policy. +- Park messages that exhaust their retry limit (dead-letter). +- Persist buffered messages to local SQLite for durability. +- Replicate buffered messages to the standby node via application-level replication over Akka.NET remoting. +- On failover, the standby node takes over delivery from its replicated copy. +- Respond to remote queries from central for parked message management (list, retry, discard). + +## Message Lifecycle + +``` +Script submits message + │ + ▼ +Attempt immediate delivery + │ + ├── Success → Remove from buffer + │ + └── Failure → Buffer message + │ + ▼ + Retry loop (per retry policy) + │ + ├── Success → Remove from buffer + notify standby + │ + └── Max retries exhausted → Park message +``` + +## Retry Policy + +Retry settings are defined on the **source entity** (not per-message): +- **External systems**: Each external system definition includes max retry count and time between retries. +- **Notifications**: Email/SMTP configuration includes max retry count and time between retries. +- **Cached database writes**: Each database connection definition includes max retry count and time between retries. + +The retry interval is **fixed** (not exponential backoff). Fixed interval is sufficient for the expected use cases. + +## Buffer Size + +There is **no maximum buffer size**. Messages accumulate in the buffer until delivery succeeds or retries are exhausted and the message is parked. Storage is bounded only by available disk space on the site node. + +## Persistence + +- Buffered messages are persisted to a **local SQLite database** on each site node. 
+- The active node persists locally and forwards each buffer operation (add, remove, park) to the standby node via Akka.NET remoting. +- The standby node applies the same operations to its own local SQLite database. +- On failover, the new active node has a complete copy of the buffer and resumes delivery. + +## Parked Message Management + +- Parked messages remain stored at the site in SQLite. +- The central UI can query sites for parked messages via the Communication Layer. +- Operators can: + - **Retry** a parked message (moves it back to the retry queue). + - **Discard** a parked message (removes it permanently). +- Store-and-forward messages are **not** automatically cleared when an instance is deleted. Pending and parked messages continue to exist and can be managed via the central UI. + +## Message Format + +Each buffered message stores: +- **Message ID**: Unique identifier. +- **Category**: External system call, notification, or cached database write. +- **Target**: External system name, notification list name, or database connection name. +- **Payload**: Serialized message content (API method + parameters, email subject + body, SQL + parameters). +- **Retry Count**: Number of attempts so far. +- **Created At**: Timestamp when the message was first queued. +- **Last Attempt At**: Timestamp of the most recent delivery attempt. +- **Status**: Pending, retrying, or parked. + +## Dependencies + +- **SQLite**: Local persistence on each node. +- **Communication Layer**: Application-level replication to standby node; remote query handling from central. +- **External System Gateway**: Delivers external system API calls. +- **Notification Service**: Delivers email notifications. +- **Database Connections**: Delivers cached database writes. +- **Site Event Logging**: Logs store-and-forward activity (queued, delivered, retried, parked). 
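The message format and fixed-interval retry policy described above for the Store-and-Forward Engine can be sketched as follows. This is a Python illustration only: `BufferedMessage` and `retry_once` are hypothetical names, timestamps are plain numbers supplied by the caller, and the rule that a message is parked once its retry count reaches the configured maximum is an assumption (the document does not say whether the initial delivery attempt counts toward the limit).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BufferedMessage:
    """Fields mirror the Message Format list; types are assumptions."""
    message_id: str
    category: str          # external system call, notification, or cached DB write
    target: str            # external system, notification list, or connection name
    payload: str           # serialized message content
    created_at: float
    retry_count: int = 0
    last_attempt_at: Optional[float] = None
    status: str = "pending"


def retry_once(msg, deliver, max_retries, now):
    """Run one retry attempt and return the resulting state.

    `deliver` is a callable returning True on success. On "delivered" the
    caller is expected to remove the message from the buffer.
    """
    msg.last_attempt_at = now
    if deliver(msg):
        return "delivered"
    msg.retry_count += 1
    # Assumed rule: park once the retry count reaches the configured maximum.
    msg.status = "parked" if msg.retry_count >= max_retries else "retrying"
    return msg.status
```

Because the retry interval is fixed, a scheduler would simply call `retry_once` at that interval until it returns `"delivered"` or `"parked"`; `max_retries` would come from the source entity (external system, SMTP configuration, or database connection definition).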
+ +## Interactions + +- **Site Runtime (Script Actors)**: Scripts submit messages to the buffer (external calls, notifications, cached DB writes). +- **Communication Layer**: Handles parked message queries/commands from central. +- **Health Monitoring**: Reports buffer depth metrics. diff --git a/Component-TemplateEngine.md b/Component-TemplateEngine.md new file mode 100644 index 0000000..27965d3 --- /dev/null +++ b/Component-TemplateEngine.md @@ -0,0 +1,124 @@ +# Component: Template Engine + +## Purpose + +The Template Engine is the core modeling component that lives on the central cluster. It manages the definition, inheritance, composition, and resolution of machine templates — the blueprints from which all machine instances are created. It handles flattening templates into deployable configurations, calculating diffs between deployed and current states, and performing comprehensive pre-deployment validation. + +## Location + +Central cluster only. Sites receive flattened output and have no awareness of templates. + +## Responsibilities + +- Store and manage template definitions (attributes, alarms, scripts) in the configuration database. +- Enforce inheritance (is-a) relationships between templates. +- Enforce composition (has-a) relationships, including recursive nesting of feature modules. +- Detect and reject naming collisions when composing feature modules (design-time error). +- Resolve the attribute chain: Instance → Child Template → Parent Template → Composing Template → Composed Module. +- Enforce locking rules — locked members cannot be overridden downstream, intermediate levels can lock previously unlocked members, and nothing can unlock what's locked above. +- Support adding new attributes, alarms, and scripts in child templates. +- Prevent removal of inherited members. 
+- Flatten a fully resolved template + instance overrides into a deployable configuration (no template structure, just concrete attribute values with resolved data connection bindings). +- Calculate diffs between deployed and template-derived configurations. +- Perform comprehensive pre-deployment validation (see Validation section). +- Provide on-demand validation for Design users during template authoring. +- Enforce template deletion constraints — templates cannot be deleted if any instances or child templates reference them. + +## Key Entities + +### Template +- Has a unique name/ID. +- Optionally extends a parent template (inheritance). +- Contains zero or more composed feature modules (composition). +- Defines attributes, alarms, and scripts as first-class members. +- Cannot be deleted if referenced by instances or child templates. +- Concurrent editing uses **last-write-wins** — no pessimistic locking or conflict detection. + +### Attribute +- Name, Value, Data Type (Boolean, Integer, Float, String), Lock Flag, Description. +- Optional Data Source Reference — a **relative path** within a data connection (e.g., `/Motor/Speed`). The template defines *what* to read but not *where* to read it from. The connection binding is an instance-level concern. +- Value may be empty if intended to be set at instance level or via data connection binding. + +### Alarm +- Name, Description, Priority Level (0–1000), Lock Flag. +- Trigger Definition: Value Match, Range Violation, or Rate of Change. +- Optional On-Trigger Script reference. + +### Script (Template-Level) +- Name, Lock Flag, C# source code. +- Trigger configuration: Interval, Value Change, Conditional, or invoked by alarm/other script. +- Optional minimum time between runs. +- **Parameter Definition** *(optional)*: Defines input parameters (name and data type per parameter). Scripts without parameters accept no arguments. 
+- **Return Value Definition** *(optional)*: Defines the structure of the script's return value (field names and data types). Supports single objects and lists of objects. Scripts without a return definition return void. + +### Instance +- Associated with a specific template and a specific site. +- Assigned to an area within the site. +- Can override non-locked attribute values (no adding/removing attributes). +- Bound to data connections at instance creation — **per-attribute binding** where each attribute with a data source reference individually selects its data connection. +- Can be in **enabled** or **disabled** state. +- Can be **deleted** — deletion is blocked if the site is unreachable. + +### Area +- Hierarchical groupings per site (parent-child). +- Stored in the configuration database. +- Used for filtering/organizing instances in the UI. + +## Naming Collision Detection + +When a template composes two or more feature modules, the system must check for naming collisions across: +- Attribute names +- Alarm names +- Script names + +If any composed module introduces a name that already exists (from another composed module or from the composing template itself), this is a **design-time error**. The template cannot be saved until the conflict is resolved. Collision detection is performed recursively for nested module compositions. + +## Flattening Process + +When an instance is deployed, the Template Engine resolves the full configuration: + +1. Start with the base template's attributes, alarms, and scripts. +2. Walk the inheritance chain, applying overrides at each level (respecting locks). +3. Resolve composed feature modules, applying overrides from composing templates (respecting locks). +4. Apply instance-level overrides (respecting locks). +5. Resolve data connection bindings — replace connection name references with concrete connection details from the site. +6. 
Output a flat structure: list of attributes with resolved values and data source addresses, list of alarms with resolved trigger definitions, list of scripts with resolved code and triggers. + +## Diff Calculation + +The Template Engine can compare: +- The **currently deployed** flat configuration of an instance. +- The **current template-derived** flat configuration (what the instance would look like if redeployed now). + +The diff output identifies added, removed, and changed attributes/alarms/scripts. + +## Pre-Deployment Validation + +Before a deployment is sent to a site, the Template Engine performs comprehensive validation: + +- **Flattening**: The full template hierarchy resolves and flattens without errors. +- **Naming collision detection**: No duplicate attribute, alarm, or script names in the flattened configuration. +- **Script compilation**: All instance scripts and alarm on-trigger scripts are test-compiled and must compile without errors. +- **Alarm trigger references**: Alarm trigger definitions reference attributes that exist in the flattened configuration. +- **Script trigger references**: Script triggers (value change, conditional) reference attributes that exist in the flattened configuration. +- **Data connection binding completeness**: Every attribute with a data source reference has a data connection binding assigned on the instance, and the bound data connection name exists as a defined connection at the instance's site. +- **Exception**: Validation does **not** verify that data source relative paths resolve to real tags on physical devices — that is a runtime concern. + +### On-Demand Validation + +The same validation logic is available to Design users in the Central UI without triggering a deployment. This allows template authors to check their work for errors during authoring. + +### Shared Script Validation + +For shared scripts, pre-compilation validation is performed before deployment. 
Since shared scripts have no instance context, validation is limited to C# syntax and structural correctness. + +## Dependencies + +- **Configuration Database (MS SQL)**: Stores all templates, instances, areas, and their relationships. +- **Security & Auth**: Enforces Design role for template authoring, Deployment role for instance management. +- **Configuration Database (via IAuditService)**: All template and instance changes are audit logged. + +## Interactions + +- **Deployment Manager**: Requests flattened configurations, diffs, and validation results from the Template Engine. +- **Central UI**: Provides the data model for template authoring, instance management, and on-demand validation. diff --git a/HighLevelReqs.md b/HighLevelReqs.md new file mode 100644 index 0000000..f93e2bd --- /dev/null +++ b/HighLevelReqs.md @@ -0,0 +1,464 @@ +# SCADA System - High Level Requirements + +## 1. Deployment Architecture + +- **Site Clusters**: 2-node failover clusters deployed at each site, running on Windows. +- **Central Cluster**: A single 2-node failover cluster serving as the central hub. +- **Communication Topology**: Hub-and-spoke. Central cluster communicates with each site cluster. Site clusters do **not** communicate with one another. + +### 1.1 Central vs. Site Responsibilities +- **Central cluster** is the single source of truth for all template authoring, configuration, and deployment decisions. +- **Site clusters** receive **flattened configurations** — fully resolved attribute sets with no template structure. Sites do not need to understand templates, inheritance, or composition. +- Sites **do not** support local/emergency configuration overrides. All configuration changes originate from central. + +### 1.2 Failover +- Failover is managed at the **application level** using **Akka.NET** (not Windows Server Failover Clustering). +- Each cluster (central and site) runs an **active/standby** pair where Akka.NET manages node roles and failover detection. 
+- **Site failover**: The standby node takes over data collection and script execution seamlessly, including responsibility for the store-and-forward buffers. The Site Runtime Deployment Manager singleton is restarted on the new active node, which reads deployed configurations from local SQLite and re-creates the full Instance Actor hierarchy. +- **Central failover**: The standby node takes over central responsibilities. Deployments that are in-progress during a failover are treated as **failed** and must be re-initiated by the engineer. + +### 1.3 Store-and-Forward Persistence (Site Clusters Only) +- Store-and-forward applies **only at site clusters** — the central cluster does **not** buffer messages. If a site is unreachable, operations from central fail and must be retried by the engineer. +- All site-level store-and-forward buffers (external system calls, notifications, and cached database writes) are **replicated between the two site cluster nodes** using **application-level replication** over Akka.NET remoting. +- The **active node** persists buffered messages to a **local SQLite database** and forwards them to the standby node, which maintains its own local SQLite copy. +- On failover, the standby node already has a replicated copy of the buffer and takes over delivery seamlessly. +- Successfully delivered messages are removed from both nodes' local stores. +- There is **no maximum buffer size** — messages accumulate until they either succeed or exhaust retries and are parked. +- Retry intervals are **fixed** (not exponential backoff). The fixed interval is sufficient for the expected use cases. + +### 1.4 Deployment Behavior +- When central deploys a new configuration to a site instance, the site **applies it immediately** upon receipt — no local operator confirmation is required. +- If a site loses connectivity to central, it **continues operating** with its last received deployed configuration. 
+- The site reports back to central whether deployment was successfully applied. +- **Pre-deployment validation**: Before any deployment is sent to a site, the central cluster performs comprehensive validation including flattening the configuration, test-compiling all scripts, verifying alarm trigger references, verifying script trigger references, and checking data connection binding completeness (see Section 3.11). + +### 1.5 System-Wide Artifact Deployment +- Changes to shared scripts, external system definitions, database connection definitions, and notification lists are **not automatically propagated** to sites. +- Deployment of system-wide artifacts requires **explicit action** by a user with the **Deployment** role. +- The Design role manages the definitions; the Deployment role triggers deployment to sites. A user may hold both roles. + +## 2. Data Storage & Data Flow + +### 2.1 Central Databases (MS SQL) +- **Configuration Database**: A dedicated database for system-specific configuration data (e.g., templates, site definitions, instance configurations, system settings). +- **Machine Data Database**: A separate database for collected machine data (e.g., telemetry, measurements, events). + +### 2.2 Communication: Central ↔ Site +- Central-to-site and site-to-central communication uses **Akka.NET** (remoting/cluster). +- **Central as integration hub**: Central brokers requests between external systems and sites. For example, a recipe manager sends a recipe to central, which routes it to the appropriate site. MES requests machine values from central, which routes the request to the site and returns the response. +- **Real-time data streaming** is not continuous for all machine data. The only real-time stream is an **on-demand debug view** — an engineer in the central UI can open a live view of a specific instance's tag values and alarm states for troubleshooting purposes. This is session-based and temporary. 
The debug view subscribes to the site-wide Akka stream filtered by instance (see Section 8.1). + +### 2.3 Site-Level Storage & Interface +- Sites have **no user interface** — they are headless collectors, forwarders, and script executors. +- Sites require local storage for: the current deployed (flattened) configurations, deployed scripts, shared scripts, external system definitions, database connection definitions, and notification lists. +- Store-and-forward buffers are persisted to a **local SQLite database on each node** and replicated between nodes via application-level replication (see 1.3). + +### 2.4 Data Connection Protocols +- The system supports **OPC UA** and a **custom protocol**. +- Both protocols implement a **common interface** supporting: connect, subscribe to tag paths, receive value updates, and write values. +- Additional protocols can be added by implementing the common interface. +- The Data Connection Layer is a **clean data pipe** — it publishes tag value updates to Instance Actors but performs no evaluation of triggers or alarm conditions. + +### 2.5 Scale +- Approximately **10 sites**. +- **50–500 machines per site**. +- **25–75 live data point tags per machine**. + +## 3. Template & Machine Modeling + +### 3.1 Template Structure +- Machines are modeled as **instances of templates**. +- Templates define a set of **attributes**. +- Each attribute has a **lock flag** that controls whether it can be overridden downstream. + +### 3.2 Attribute Definition +Each attribute carries the following metadata: +- **Name**: Identifier for the attribute. +- **Value**: The default or configured value. May be empty if intended to be set at the instance level. +- **Data Type**: The value's type. Fixed set: Boolean, Integer, Float, String. +- **Lock Flag**: Controls whether the attribute can be overridden downstream. +- **Description**: Human-readable explanation of the attribute's purpose. 
+- **Data Source Reference** *(optional)*: A **relative path** within a data connection (e.g., `/Motor/Speed`). The template defines *what* to read — the path relative to a data connection. The template does **not** specify which data connection to use; that is an instance-level concern (see Section 3.3). Attributes without a data source reference are static configuration values. + +### 3.3 Data Connections +- **Data connections** are reusable, named resources defined centrally and then **assigned to specific sites** (e.g., an OPC server, a PLC endpoint). +- A data connection encapsulates the details needed to communicate with a data source (protocol, address, credentials, etc.). +- Attributes with a data source reference must be **bound to a data connection at instance creation** — the template defines *what* to read (the relative path), and the instance specifies *where* to read it from (the data connection assigned to the site). +- **Binding is per-attribute**: Each attribute with a data source reference individually selects its data connection. Different attributes on the same instance may use different data connections. The Central UI supports bulk assignment (selecting multiple attributes and assigning a data connection to all of them at once) to reduce tedium. +- Templates do **not** specify a default connection. The connection binding is an instance-level concern. +- The flattened configuration sent to a site resolves connection references into concrete connection details paired with attribute relative paths. +- Data connection names are **not** standardized across sites — different sites may have different data connection names for equivalent devices. + +### 3.4 Alarm Definitions +Alarms are **first-class template members** alongside attributes and scripts, following the same **inheritance, override, and lock rules**. + +Each alarm has: +- **Name**: Identifier for the alarm. +- **Description**: Human-readable explanation of the alarm condition. 
+- **Priority Level**: Numeric value from 0–1000. +- **Lock Flag**: Controls whether the alarm can be overridden downstream. +- **Trigger Definition**: One of the following trigger types: + - **Value Match**: Triggers when a monitored attribute equals a predefined value. + - **Range Violation**: Triggers when a monitored attribute value falls outside an allowed range. + - **Rate of Change**: Triggers when a monitored attribute value changes faster than a defined threshold. +- **On-Trigger Script** *(optional)*: A script to execute when the alarm triggers. The alarm on-trigger script executes in the context of the instance and can call instance scripts, but instance scripts **cannot** call alarm on-trigger scripts. The call direction is one-way. + +### 3.4.1 Alarm State +- Alarm state (active/normal) is **managed at the site level** per instance, held **in memory** by the Alarm Actor. +- When the alarm condition clears, the alarm **automatically returns to normal state** — no acknowledgment workflow is required. +- Alarm state is **not persisted** — on restart, alarm states are re-evaluated from incoming values. +- Alarm state changes are published to the site-wide Akka stream as `[InstanceUniqueName].[AlarmName]`, alarm state (active/normal), priority, timestamp. + +### 3.5 Template Relationships + +Templates participate in two distinct relationship types: + +- **Inheritance (is-a)**: A child template extends a parent template. The child inherits all attributes, alarms, scripts, and composed feature modules from the parent. The child can: + - Override the **values** of non-locked inherited attributes, alarms, and scripts. + - **Add** new attributes, alarms, or scripts not present in the parent. + - **Not** remove attributes, alarms, or scripts defined by the parent. +- **Composition (has-a)**: A template can nest an instance of another template as a **feature module** (e.g., embedding a RecipeSystem module inside a base machine template). 
Feature modules can themselves compose other feature modules **recursively**. +- **Naming collisions**: If a template composes two feature modules that each define an attribute, alarm, or script with the same name, this is a **design-time error**. The system must detect and report the collision, and the template cannot be saved until the conflict is resolved. + +### 3.6 Locking +- Locking applies to **attributes, alarms, and scripts** uniformly. +- Any of these can be **locked** at the level where it is defined or overridden. +- A locked attribute **cannot** be overridden by any downstream level (child templates, composing templates, or instances). +- An unlocked attribute **can** be overridden by any downstream level. +- **Intermediate locking**: Any level in the chain can lock an attribute that was unlocked upstream. Once locked, it remains locked for all levels below — a downstream level **cannot** unlock an attribute locked above it. + +### 3.6.1 Attribute Resolution Order +Attributes are resolved from most-specific to least-specific. The first value encountered wins: + +1. **Instance** (site-deployed machine) +2. **Child Template** (most derived first, walking up the inheritance chain) +3. **Composing Template** (the template that embeds a feature module can override the module's attributes) +4. **Composed Module** (the original feature module definition, recursively resolved if modules nest other modules) + +At any level, an override is only permitted if the attribute has **not been locked** at an upstream (less specific) level. + +### 3.7 Override Scope +- **Inheritance**: Child templates can override non-locked attributes from their parent, including attributes originating from composed feature modules. +- **Composition**: A template that composes a feature module can override non-locked attributes within that module. +- Overrides can "pierce" into composed modules — a child template can override attributes inside a feature module it inherited from its parent.
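The locking and resolution rules above can be illustrated with a small sketch. It is written in Python for brevity and is not the system's implementation (the system itself is specified as C#/Akka.NET). The `Definition` type, the level names, and the choice to collect rejected overrides rather than raise an error are assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Definition:
    """One level's definition or override of a single attribute (hypothetical type)."""
    level: str          # e.g. "module", "composing", "child", "instance"
    value: object
    lock: bool = False  # lock flag set at this level


def resolve(chain: list[Definition]) -> tuple[object, str, list[str]]:
    """Resolve one attribute from an upstream-to-downstream chain.

    `chain` is ordered least specific first: composed module, composing
    template, parent-to-child templates, then the instance.
    Returns (resolved value, winning level, levels whose override was rejected).
    """
    if not chain:
        raise ValueError("attribute has no definition")
    value, winner, locked = chain[0].value, chain[0].level, chain[0].lock
    rejected = []
    for d in chain[1:]:
        if locked:
            # Locked at an upstream level: the downstream override is not permitted.
            rejected.append(d.level)
            continue
        value, winner = d.value, d.level
        locked = d.lock  # intermediate locking: any level may lock for those below it
    return value, winner, rejected
```

For example, a chain of `Definition("module", 10)`, `Definition("composing", 20, lock=True)`, and `Definition("instance", 30)` resolves to the composing template's value, with the instance override reported as rejected. Without the lock, the instance value wins, which matches the most-specific-first order.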
+ +### 3.8 Instance Rules +- An instance is a deployed occurrence of a template at a site. +- Instances **can** override the values of non-locked attributes. +- Instances **cannot** add new attributes. +- Instances **cannot** remove attributes. +- The instance's structure (which attributes exist, which feature modules are composed) is strictly defined by its template. +- Each instance is **assigned to an area** within its site (see 3.10). + +### 3.8.1 Instance Lifecycle +- Instances can be in one of two states: **enabled** or **disabled**. +- **Enabled**: The instance is active at the site — data subscriptions, script triggers, and alarm evaluation are all running. +- **Disabled**: The site **stops** script triggers, data subscriptions (no live data collection), and alarm evaluation. The deployed configuration is **retained** on the site so the instance can be re-enabled without redeployment. Store-and-forward messages for a disabled instance **continue to drain** (deliver pending messages). +- **Deletion**: Instances can be deleted. Deletion removes the running configuration from the site, stops subscriptions, and destroys the Instance Actor and its children. Store-and-forward messages are **not** cleared on deletion — they continue to be delivered or can be managed (retried/discarded) via parked message management. If the site is unreachable when a delete is triggered, the deletion **fails** (same behavior as a failed deployment). The central side does not mark it as deleted until the site confirms. +- Templates **cannot** be deleted if any instances or child templates reference them. The user must remove all references first. + +### 3.9 Template Deployment & Change Propagation +- Template changes are **not** automatically propagated to deployed instances. +- The system maintains two views of each instance: + - **Deployed Configuration**: The currently active configuration on the instance, as it was last explicitly deployed. 
+ - **Template-Derived Configuration**: The configuration the instance *would* have based on the current state of its template (including resolved inheritance, composition, and overrides). +- Deployment is performed at the **individual instance level** — an engineer explicitly commands the system to update a specific instance. +- The system must be able to **show differences** between the deployed configuration and the current template-derived configuration, allowing engineers to see what would change before deploying. +- **No rollback** support is required. The system only needs to track the current deployed state, not a history of prior deployments. +- **Concurrent editing**: Template editing uses a **last-write-wins** model. No pessimistic locking or optimistic concurrency conflict detection is required. + +### 3.10 Areas +- Areas are **predefined hierarchical groupings** associated with a site, stored in the configuration database. +- Areas support **parent-child relationships** (e.g., Plant → Building → Production Line → Cell). +- Each instance is assigned to an area within its site. +- Areas are used for **filtering and finding instances** in the central UI. +- Area definitions are managed by users with the **Admin** role. + +### 3.11 Pre-Deployment Validation + +Before any deployment is sent to a site, the central cluster performs **comprehensive validation**. Validation covers: + +- **Flattening**: The full template hierarchy is resolved and flattened successfully. +- **Naming collision detection**: No duplicate attribute, alarm, or script names exist in the flattened configuration. +- **Script compilation**: All instance scripts and alarm on-trigger scripts are test-compiled and must compile without errors. +- **Alarm trigger references**: Alarm trigger definitions reference attributes that exist in the flattened configuration. 
+- **Script trigger references**: Script triggers (value change, conditional) reference attributes that exist in the flattened configuration. +- **Data connection binding completeness**: Every attribute with a data source reference has a data connection binding assigned on the instance, and the bound data connection name exists as a defined connection at the instance's site. +- **Exception**: Validation does **not** verify that data source relative paths resolve to real tags on physical devices — that is a runtime concern that can only be determined at the site. + +Validation is also available **on demand in the Central UI** for Design users during template authoring, providing early feedback without requiring a deployment attempt. + +For **shared scripts**, pre-compilation validation is performed before deployment to sites. Since shared scripts have no instance context, validation is limited to C# syntax and structural correctness. + +## 4. Scripting + +### 4.1 Script Definitions +- Scripts are **C#** and are defined at the **template level** as first-class template members. +- Scripts follow the same **inheritance, override, and lock rules** as attributes. A parent template can define a script, a child template can override it (if not locked), and any level can lock a script to prevent downstream changes. +- Scripts are deployed to sites as part of the flattened instance configuration. +- Scripts are **compiled at the site** when a deployment is received. Pre-compilation validation occurs at central before deployment (see Section 3.11), but the site performs the actual compilation for execution. +- Scripts can optionally define **input parameters** (name and data type per parameter). Scripts without parameter definitions accept no arguments. +- Scripts can optionally define a **return value definition** (field names and data types). Return values support **single objects** and **lists of objects**. Scripts without a return definition return void. 
+- Return values are used when scripts are called explicitly by other scripts (via `Instance.CallScript()` or `Scripts.CallShared()`) or by the Inbound API (via `Route.To().Call()`). When invoked by a trigger (interval, value change, conditional, alarm), any return value is discarded. + +### 4.2 Script Triggers +Scripts can be triggered by: +- **Interval**: Execute on a recurring time schedule. +- **Value Change**: Execute when a specific instance attribute value changes. +- **Conditional**: Execute when an instance attribute value equals or does not equal a given value. + +Scripts have an optional **minimum time between runs** setting. If a trigger fires before the minimum interval has elapsed since the last execution, the invocation is skipped. + +### 4.3 Script Error Handling +- If a script fails (unhandled exception, timeout, etc.), the failure is **logged locally** at the site. +- The script is **not disabled** — it remains active and will fire on the next qualifying trigger event. +- Script failures are **not reported to central**. Diagnostics are local only. +- For external system call failures within scripts, store-and-forward handling (Section 5.3) applies independently of script error handling. + +### 4.4 Script Capabilities +Scripts executing on a site for a given instance can: +- **Read** attribute values on that instance (live data points and static config). +- **Write** attribute values on that instance. For attributes with a data source reference, the write goes to the Data Connection Layer which writes to the physical device; the in-memory value updates when the device confirms the new value via the existing subscription. For static attributes, the write updates the in-memory value directly. +- **Call other scripts** on that instance via `Instance.CallScript("scriptName", params)`. Calls use the Akka ask pattern and return the called script's return value. Script-to-script calls support concurrent execution. 
+- **Call shared scripts** via `Scripts.CallShared("scriptName", params)`. Shared scripts execute **inline** in the calling Script Actor's context — they are compiled code libraries, not separate actors. +- **Call external system API methods** (see Section 5). +- **Send notifications** (see Section 6). +- **Access databases** by requesting an MS SQL client connection by name (see Section 5.5). + +Scripts **cannot** access other instances' attributes or scripts. + +### 4.4.1 Script Call Recursion Limit +- Script-to-script calls (via `Instance.CallScript` and `Scripts.CallShared`) enforce a **maximum recursion depth** to prevent infinite loops. +- The default maximum depth is a reasonable limit (e.g., 10 levels). +- The current call depth is tracked and incremented with each nested call. If the limit is reached, the call fails with an error logged to the site event log. +- This applies to all script call chains including alarm on-trigger scripts calling instance scripts. + +### 4.5 Shared Scripts +- Shared scripts are **not associated with any template** — they are a **system-wide library** of reusable C# scripts. +- Shared scripts can optionally define **input parameters** and **return value definitions**, following the same rules as template-level scripts. +- Managed by users with the **Design** role. +- Deployed to **all sites** for use by any instance script (deployment requires explicit action by a user with the Deployment role). +- Shared scripts execute **inline** in the calling Script Actor's context as compiled code. They are not separate actors. This avoids serialization bottlenecks and messaging overhead. +- Shared scripts are **not available on the central cluster** — Inbound API scripts cannot call them directly. To execute shared script logic, route to a site instance via `Route.To().Call()`. + +### 4.6 Alarm On-Trigger Scripts +- Alarm on-trigger scripts are defined as part of the alarm definition and execute when the alarm activates. 
+- They execute directly in the Alarm Actor's context (via a short-lived Alarm Execution Actor), similar to how shared scripts execute inline. +- Alarm on-trigger scripts **can** call instance scripts via `Instance.CallScript()`, which sends an ask message to the appropriate sibling Script Actor. +- Instance scripts **cannot** call alarm on-trigger scripts — the call direction is one-way. +- The recursion depth limit applies to alarm-to-instance script call chains. + +## 5. External System Integrations + +### 5.1 External System Definitions +- External systems are **predefined contracts** created by users with the **Design** role. +- Each definition includes: + - **Connection details**: Endpoint URL, authentication, protocol information. + - **Method definitions**: Available API methods with defined parameters and return types. +- Definitions are deployed **uniformly to all sites** — no per-site connection detail overrides. +- Deployment of definition changes requires **explicit action** by a user with the Deployment role. + +### 5.2 Site-to-External-System Communication +- Sites communicate with external systems **directly** (not routed through central). +- Scripts invoke external system methods by referencing the predefined definitions. + +### 5.3 Store-and-Forward for External Calls +- If an external system is unavailable when a script invokes a method, the message is **buffered locally at the site**. +- Retry is performed **per message** — individual failed messages retry independently. +- Each external system definition includes configurable retry settings: + - **Max retry count**: Maximum number of retry attempts before giving up. + - **Time between retries**: Fixed interval between retry attempts (no exponential backoff). +- After max retries are exhausted, the message is **parked** (dead-lettered) for manual review. +- There is **no maximum buffer size** — messages accumulate until delivery succeeds or retries are exhausted. 
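The per-message retry and parking behavior described above can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the Store-and-Forward Engine's implementation. The names `BufferedMessage` and `process_due` are hypothetical, and the sketch assumes the initial delivery attempt does not count toward the max retry count. SQLite persistence and replication to the standby node are omitted.

```python
from dataclasses import dataclass


@dataclass
class BufferedMessage:
    """A buffered external-system call (hypothetical type)."""
    payload: str
    failures: int = 0             # failed delivery attempts so far
    next_attempt_at: float = 0.0  # earliest time of the next attempt
    parked: bool = False          # True once retries are exhausted


def process_due(buffer, send, now, max_retries, retry_interval):
    """Attempt each due, un-parked message once and return the delivered ones.

    `send` is a callable returning True on success. Each message retries
    independently at a fixed interval (no exponential backoff).
    """
    delivered = []
    for msg in list(buffer):
        if msg.parked or msg.next_attempt_at > now:
            continue
        if send(msg.payload):
            buffer.remove(msg)  # delivered messages leave the buffer
            delivered.append(msg)
            continue
        msg.failures += 1
        if msg.failures > max_retries:
            # Initial attempt plus max_retries retries have all failed.
            msg.parked = True   # kept at the site for manual retry or discard
        else:
            msg.next_attempt_at = now + retry_interval
    return delivered
```

With `max_retries=2` and `retry_interval=5`, a message that always fails is attempted at times 0, 5, and 10 and is parked after the third failure. It stays in the buffer so it can later be retried or discarded.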
+ +### 5.4 Parked Message Management +- Parked messages are **stored at the site** where they originated. +- The **central UI** can **query sites** for parked messages and manage them remotely. +- Operators can **retry** or **discard** parked messages from the central UI. +- Parked message management covers **external system calls**, **notifications**, and **cached database writes**. + +### 5.5 Database Connections +- Database connections are **predefined, named resources** created by users with the **Design** role. +- Each definition includes the connection details needed to connect to an MS SQL database (server, database name, credentials, etc.). +- Each definition includes configurable retry settings (same pattern as external systems): **max retry count** and **time between retries** (fixed interval). +- Definitions are deployed **uniformly to all sites** — no per-site overrides. +- Deployment of definition changes requires **explicit action** by a user with the Deployment role. + +### 5.6 Database Access Modes +Scripts can interact with databases in two modes: + +- **Real-time (synchronous)**: Scripts request a **raw MS SQL client connection by name** (e.g., `Database.Connection("MES_DB")`), giving script authors full ADO.NET-level control for immediate queries and updates. +- **Cached write (store-and-forward)**: Scripts submit a write operation for deferred, reliable delivery. The cached entry stores the **database connection name**, the **SQL statement to execute**, and **parameter values**. If the database is unavailable, the write is buffered locally at the site and retried per the connection's retry settings. After max retries are exhausted, the write is **parked** for manual review (managed via central UI alongside other parked messages). + +## 6. Notifications + +### 6.1 Notification Lists +- Notification lists are **system-wide**, managed by users with the **Design** role. +- Each list has a **name** and contains one or more **recipients**. 
+- Each recipient has a **name** and an **email address**. +- Notification lists are deployed to **all sites** (deployment requires explicit action by a user with the Deployment role). + +### 6.2 Email Support +- The system has **predefined support for sending email** as the notification delivery mechanism. +- Email server configuration (SMTP settings) is defined centrally and deployed to all sites. + +### 6.3 Script API +- Scripts send notifications using a simplified API: `Notify.To("list name").Send("subject", "message")` +- This API is available to instance scripts, alarm on-trigger scripts, and shared scripts. + +### 6.4 Store-and-Forward for Notifications +- If the email server is unavailable, notifications are **buffered locally at the site**. +- Follows the same retry pattern as external system calls: configurable **max retry count** and **time between retries** (fixed interval). +- After max retries are exhausted, the notification is **parked** for manual review (managed via central UI alongside external system parked messages). +- There is **no maximum buffer size** for notification messages. + +## 7. Inbound API (Central) + +### 7.1 Purpose +The system exposes a **web API on the central cluster** for external systems to call into the SCADA system. This is the counterpart to the outbound External System Integrations (Section 5) — where Section 5 defines how the system calls out, this section defines how external systems call in. + +### 7.2 API Key Management +- API keys are stored in the **configuration database**. +- Each API key has a **name/label** (for identification), the **key value**, and an **enabled/disabled** flag. +- API keys are managed by users with the **Admin** role. + +### 7.3 Authentication +- Inbound API requests are authenticated via **API key** (not LDAP/AD). +- The API key must be included with each request. +- Invalid or disabled keys are rejected. 
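The key check described in 7.2 and 7.3 might look like the following sketch. It is Python for illustration only and is not the Inbound API's actual code. The `ApiKey` type and `authenticate` function are hypothetical names, and the constant-time comparison is an added precaution that the requirements do not mandate.

```python
import hmac
from dataclasses import dataclass


@dataclass(frozen=True)
class ApiKey:
    """An API key record as described in 7.2 (hypothetical type)."""
    name: str             # name/label for identification
    value: str            # the key value presented by callers
    enabled: bool = True  # enabled/disabled flag


def authenticate(presented, keys):
    """Return the matching enabled ApiKey, or None if the request must be rejected."""
    if not presented:
        return None  # the key must be included with each request
    for key in keys:
        # compare_digest avoids leaking key contents through timing differences
        if hmac.compare_digest(key.value.encode(), presented.encode()):
            return key if key.enabled else None  # disabled keys are rejected
    return None  # unknown (invalid) key
```

A caller would reject the request whenever `authenticate` returns `None`. Checking whether a key is allowed to call a given method is a separate authorization step and is not shown here.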
+ +### 7.4 API Method Definitions +- API methods are **predefined** and managed by users with the **Design** role. +- Each method definition includes: + - **Method name**: Unique identifier for the endpoint. + - **Approved API keys**: List of API keys authorized to call this method. + - **Parameter definitions**: Name and data type for each input parameter. + - **Return value definition**: Data type and structure of the response. Supports **single objects** and **lists of objects**. + - **Timeout**: Configurable per method. Maximum execution time including routed calls to sites. +- The implementation of each method is a **C# script stored inline** in the method definition. It executes on the central cluster. No template inheritance — API scripts are standalone. +- API scripts can route calls to any instance at any site via `Route.To("instanceCode").Call("scriptName", parameters)`, read/write attributes in batch, and access databases directly. +- API scripts **cannot** call shared scripts directly (shared scripts are site-only). To invoke site logic, use `Route.To().Call()`. + +### 7.5 Availability +- The inbound API is hosted **only on the central cluster** (active node). +- On central failover, the API becomes available on the new active node. + +## 8. Central UI + +The central cluster hosts a **configuration and management UI** (no live machine data visualization, except on-demand debug views). The UI supports the following workflows: + +- **Template Authoring**: Create, edit, and manage templates including hierarchy (inheritance) and composition (feature modules). Author and manage scripts within templates. **Design-time validation** available on demand to check flattening, naming collisions, and script compilation without deploying. +- **Shared Script Management**: Create, edit, and manage the system-wide shared script library. +- **Notification List Management**: Create, edit, and manage notification lists and recipients. 
+- **External System Management**: Define external system contracts (connection details, API method definitions).
+- **Database Connection Management**: Define named database connections for script use.
+- **Inbound API Management**: Manage API keys (create, enable/disable, delete). Define API methods (name, parameters, return values, approved keys, implementation script). *(Admin role for keys, Design role for methods.)*
+- **Instance Management**: Create instances from templates, bind data connections (per-attribute, with **bulk assignment** UI for selecting multiple attributes and assigning a data connection at once), set instance-level attribute overrides, assign instances to areas. **Disable** or **delete** instances.
+- **Site & Data Connection Management**: Define sites, manage data connections and assign them to sites.
+- **Area Management**: Define hierarchical area structures per site for organizing instances.
+- **Deployment**: View diffs between deployed and current template-derived configurations, deploy updates to individual instances. Filter instances by area. Pre-deployment validation runs automatically before any deployment is sent.
+- **System-Wide Artifact Deployment**: Explicitly deploy shared scripts, external system definitions, database connection definitions, and notification lists to all sites (requires Deployment role).
+- **Deployment Status Monitoring**: Track whether deployments were successfully applied at site level.
+- **Debug View**: On-demand real-time view of a specific instance's tag values and alarm states for troubleshooting (see 8.1).
+- **Parked Message Management**: Query sites for parked messages (external system calls, notifications, and cached database writes), retry or discard them.
+- **Health Monitoring Dashboard**: View site cluster health, node status, data connection health, script error rates, alarm evaluation errors, and store-and-forward buffer depths (see Section 11).
+- **Site Event Log Viewer**: Query and view operational event logs from site clusters (see Section 12).
+
+### 8.1 Debug View
+- **Subscribe-on-demand**: When an engineer opens a debug view for an instance, central subscribes to the **site-wide Akka stream** filtered by instance unique name. The site first provides a **snapshot** of all current attribute values and alarm states from the Instance Actor, then streams subsequent changes from the Akka stream.
+- Attribute value stream messages are structured as: `[InstanceUniqueName].[AttributePath].[AttributeName]`, attribute value, attribute quality, attribute change timestamp.
+- Alarm state stream messages are structured as: `[InstanceUniqueName].[AlarmName]`, alarm state (active/normal), priority, timestamp.
+- The stream continues until the engineer **closes the debug view**, at which point central unsubscribes and the site stops streaming.
+- No attribute/alarm selection: the debug view always shows all tag values and alarm states for the instance.
+- No special concurrency limits are required.
+
+## 9. Security & Access Control
+
+### 9.1 Authentication
+- **UI users** authenticate via **LDAP/Active Directory** directly (Windows Integrated Authentication).
+- **External system API callers** authenticate via **API key** (see Section 7).
+
+### 9.2 Authorization
+- Authorization is **role-based**, with roles assigned by **LDAP group membership**.
+- Roles are **independent**: they can be mixed and matched per user (via group membership). There is no implied hierarchy between roles.
+- A user may hold multiple roles simultaneously (e.g., both Design and Deployment) by being a member of the corresponding LDAP groups.
+- Inbound API authorization is per-method, based on **approved API key lists** (see Section 7.4).
+
+### 9.3 Roles
+- **Admin**: System-wide permission to manage sites, data connections, LDAP group-to-role mappings, API keys, and system-level configuration.
+- **Design**: System-wide permission to author and edit templates, scripts, shared scripts, external system definitions, notification lists, and inbound API method definitions.
+- **Deployment**: Permission to manage instances (create, set overrides, bind connections, disable, delete) and deploy configurations to sites. Also triggers system-wide artifact deployment. Can be scoped **per site**.
+
+### 9.4 Role Scoping
+- Admin is always **system-wide**.
+- Design is always **system-wide**.
+- Deployment can be **system-wide** or **site-scoped**, controlled by LDAP group membership (e.g., `Deploy-SiteA`, `Deploy-SiteB`, or `Deploy-All`).
+
+## 10. Audit Logging
+
+Audit logging is implemented as part of the **Configuration Database** component via the `IAuditService` interface.
+
+### 10.1 Storage
+- Audit logs are stored in the **configuration MS SQL database** alongside system config data, enabling direct querying.
+- Entries are **append-only**: never modified or deleted. There is no retention policy; entries are retained indefinitely.
+
+### 10.2 Scope
+All system-modifying actions are logged, including:
+- **Template changes**: Create, edit, delete templates.
+- **Script changes**: Template script and shared script create, edit, delete.
+- **Alarm changes**: Create, edit, delete alarm definitions.
+- **Instance changes**: Create, override values, bind connections, area assignment, disable, enable, delete.
+- **Deployments**: Who deployed what to which instance, and the result (success/failure).
+- **System-wide artifact deployments**: Who deployed shared scripts / external system definitions / DB connections / notification lists, and the result.
+- **External system definition changes**: Create, edit, delete.
+- **Database connection changes**: Create, edit, delete.
+- **Notification list changes**: Create, edit, delete lists and recipients.
+- **Inbound API changes**: API key create, enable/disable, delete. API method create, edit, delete.
+- **Area changes**: Create, edit, delete area definitions.
+- **Site & data connection changes**: Create, edit, delete.
+- **Security/admin changes**: Role mapping changes, site permission changes.
+
+### 10.3 Detail Level
+- Each audit log entry records the **state of the entity after the change**, serialized as JSON. Only the after-state is stored; change history is reconstructed by comparing consecutive entries for the same entity at query time.
+- Each entry includes: **who** (authenticated user), **what** (action, entity type, entity ID, entity name), **when** (timestamp), and **state** (JSON after-state, null for deletes).
+- **One entry per save operation**: when a user edits a template and changes multiple attributes in one save, a single entry captures the full entity state.
+
+### 10.4 Transactional Guarantee
+- Audit entries are written **synchronously** within the same database transaction as the change (via the unit-of-work pattern). If the change succeeds, the audit entry is guaranteed to be recorded. If the change rolls back, the audit entry rolls back too.
+
+## 11. Health Monitoring
+
+### 11.1 Monitored Metrics
+The central cluster monitors the health of each site cluster, including:
+- **Site cluster online/offline status**: Whether the site is reachable.
+- **Active vs. standby node status**: Which node is active and which is standby.
+- **Data connection health**: Connected/disconnected status per data connection at the site.
+- **Script error rates**: Frequency of script failures at the site.
+- **Alarm evaluation errors**: Frequency of alarm evaluation failures at the site.
+- **Store-and-forward buffer depth**: Number of messages currently queued (broken down by external system calls, notifications, and cached database writes).
+
+### 11.2 Reporting
+- Site clusters **report health metrics to central** periodically.
+- Health status is **visible in the central UI**; there is no automated alerting/notifications for now.
+
+## 12. Site-Level Event Logging
+
+### 12.1 Events Logged
+Sites log operational events locally, including:
+- **Script executions**: Start, complete, error (with error details).
+- **Alarm events**: Alarm activated, alarm cleared (which alarm, which instance, when). Alarm evaluation errors.
+- **Deployment applications**: Configuration received from central, applied successfully or failed. Script compilation results.
+- **Data connection status changes**: Connected, disconnected, reconnected per connection.
+- **Store-and-forward activity**: Message queued, delivered, retried, parked.
+- **Instance lifecycle**: Instance enabled, disabled, deleted.
+
+### 12.2 Storage
+- Event logs are stored in **local SQLite** on each site node.
+- **Retention policy**: 30 days. Events older than 30 days are automatically purged.
+
+### 12.3 Central Access
+- The central UI can **query site event logs remotely**, following the same pattern as parked message management: central requests data from the site over Akka.NET remoting.
+
+---
+
+*All initial high-level requirements have been captured. This document will continue to be updated as the design evolves.*
diff --git a/README.md b/README.md
new file mode 100644
index 0000000..76a6578
--- /dev/null
+++ b/README.md
@@ -0,0 +1,95 @@
+# SCADA System: Design Documentation
+
+## Overview
+
+This document serves as the master index for the SCADA system design. The system is a centrally-managed, distributed SCADA configuration and deployment platform built on Akka.NET, running across a central cluster and multiple site clusters in a hub-and-spoke topology.
+
+## Document Map
+
+### Requirements
+- [HighLevelReqs.md](HighLevelReqs.md): Complete high-level requirements covering all functional areas.
+
+### Component Design Documents
+
+| # | Component | Document | Description |
+|---|-----------|----------|-------------|
+| 1 | Template Engine | [Component-TemplateEngine.md](Component-TemplateEngine.md) | Template modeling, inheritance, composition, attribute resolution, locking, alarms, flattening, validation, and diff calculation. |
+| 2 | Deployment Manager | [Component-DeploymentManager.md](Component-DeploymentManager.md) | Central-side deployment pipeline: requesting configs, sending to sites, tracking status, system-wide artifact deployment, instance disable/delete. |
+| 3 | Site Runtime | [Component-SiteRuntime.md](Component-SiteRuntime.md) | Site-side actor hierarchy: Deployment Manager singleton, Instance Actors, Script Actors, Alarm Actors, script compilation, shared script library, and the site-wide attribute/alarm Akka stream. |
+| 4 | Data Connection Layer | [Component-DataConnectionLayer.md](Component-DataConnectionLayer.md) | Common data connection interface, OPC UA and custom protocol adapters, subscription management. Publishes tag value updates to Instance Actors. |
+| 5 | Central–Site Communication | [Component-Communication.md](Component-Communication.md) | Akka.NET remoting/cluster topology, message patterns, request routing, and debug streaming. |
+| 6 | Store-and-Forward Engine | [Component-StoreAndForward.md](Component-StoreAndForward.md) | Buffering, retry, parking, application-level replication, and SQLite persistence at sites. |
+| 7 | External System Gateway | [Component-ExternalSystemGateway.md](Component-ExternalSystemGateway.md) | External system definitions, API method invocation, and database connection management. |
+| 8 | Notification Service | [Component-NotificationService.md](Component-NotificationService.md) | Notification lists, email delivery, script API, and store-and-forward integration. |
+| 9 | Central UI | [Component-CentralUI.md](Component-CentralUI.md) | Web-based management interface, workflows, and pages. |
+| 10 | Security & Auth | [Component-Security.md](Component-Security.md) | LDAP/AD authentication, role-based authorization, and site-scoped permissions. |
+| 11 | Health Monitoring | [Component-HealthMonitoring.md](Component-HealthMonitoring.md) | Site health metrics collection (including alarm evaluation errors) and central reporting. |
+| 12 | Site Event Logging | [Component-SiteEventLogging.md](Component-SiteEventLogging.md) | Local operational event logs at sites with central query access. |
+| 13 | Cluster Infrastructure | [Component-ClusterInfrastructure.md](Component-ClusterInfrastructure.md) | Akka.NET cluster setup, active/standby failover, and node management. |
+| 14 | Inbound API | [Component-InboundAPI.md](Component-InboundAPI.md) | Web API for external systems to call in, API key auth, method definitions, script-based implementations. |
+| 15 | Host | [Component-Host.md](Component-Host.md) | Single deployable binary, role-based component registration, Akka.NET bootstrap, and ASP.NET Core hosting for central nodes. |
+| 16 | Commons | [Component-Commons.md](Component-Commons.md) | Shared data types, interfaces, domain entity POCOs, repository interfaces, and message contracts used across all components. |
+| 17 | Configuration Database | [Component-ConfigurationDatabase.md](Component-ConfigurationDatabase.md) | EF Core data access layer, schema ownership, per-component repositories, unit-of-work, audit logging (IAuditService), and migration management for the central MS SQL configuration database. |
+
+### Architecture Diagram (Logical)
+
+```
+┌─────────────────────────────────────────────────────┐
+│                   CENTRAL CLUSTER                   │
+│  ┌──────────┐ ┌──────────┐ ┌──────────┐             │
+│  │ Template │ │Deployment│ │ Central  │             │
+│  │  Engine  │ │ Manager  │ │    UI    │             │
+│  └──────────┘ └──────────┘ └──────────┘             │
+│  ┌──────────┐ ┌──────────┐ ┌──────────┐             │
+│  │ Security │ │  Audit   │ │  Health  │             │
+│  │  & Auth  │ │ Logging  │ │ Monitor  │             │
+│  └──────────┘ └──────────┘ └──────────┘             │
+│  ┌──────────┐                                       │
+│  │ Inbound  │ ◄── External Systems (API key auth)   │
+│  │   API    │                                       │
+│  └──────────┘                                       │
+│  ┌───────────────────────────────────┐              │
+│  │   Akka.NET Communication Layer    │              │
+│  └──────────────┬────────────────────┘              │
+│  ┌───────────────────────────────────┐              │
+│  │    Configuration Database (EF)    │──► MS SQL    │
+│  └───────────────────────────────────┘  (Config DB) │
+│                 │                    Machine Data DB│
+└─────────────────┼───────────────────────────────────┘
+                  │ Akka.NET Remoting
+     ┌────────────┼────────────┐
+     ▼            ▼            ▼
+┌─────────┐  ┌─────────┐  ┌─────────┐
+│ SITE A  │  │ SITE B  │  │ SITE N  │
+│ ┌─────┐ │  │ ┌─────┐ │  │ ┌─────┐ │
+│ │Data │ │  │ │Data │ │  │ │Data │ │
+│ │Conn │ │  │ │Conn │ │  │ │Conn │ │
+│ ├─────┤ │  │ ├─────┤ │  │ ├─────┤ │
+│ │Site │ │  │ │Site │ │  │ │Site │ │
+│ │Runtm│ │  │ │Runtm│ │  │ │Runtm│ │
+│ ├─────┤ │  │ ├─────┤ │  │ ├─────┤ │
+│ │S&F  │ │  │ │S&F  │ │  │ │S&F  │ │
+│ │Engine│ │ │ │Engine│ │ │ │Engine│ │
+│ └─────┘ │  │ └─────┘ │  │ └─────┘ │
+│ SQLite  │  │ SQLite  │  │ SQLite  │
+└─────────┘  └─────────┘  └─────────┘
+```
+
+### Site Runtime Actor Hierarchy
+
+```
+Deployment Manager Singleton (Cluster Singleton)
+├── Instance Actor (one per deployed, enabled instance)
+│   ├── Script Actor (coordinator, one per instance script)
+│   │   └── Script Execution Actor (short-lived, per invocation)
+│   ├── Alarm Actor (coordinator, one per alarm definition)
+│   │   └── Alarm Execution Actor (short-lived, per on-trigger invocation)
+│   └── ... (more Script/Alarm Actors)
+├── Instance Actor
+│   └── ...
+└── ... (more Instance Actors)
+
+Site-Wide Akka Stream (attribute + alarm state changes)
+├── All Instance Actors publish to the stream
+└── Debug view subscribes with instance-level filtering
+```
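
The instance-level filtering shown in the hierarchy above, combined with the snapshot-then-stream behavior described in section 8.1 of the requirements, can be illustrated with a small sketch. This is not Akka.NET code and is not part of the committed design documents. It is a plain in-memory Python stand-in, and the names `SiteStream`, `StreamMessage`, and `open_debug_view` are hypothetical. It also assumes that instance unique names contain no dots, so a prefix match on `InstanceUniqueName.` is enough to select one instance's messages.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class StreamMessage:
    # Key follows the documented shapes, e.g. "Pump1.Motor.Speed"
    # (attribute) or "Pump1.HighTemp" (alarm). Quality and priority
    # fields are omitted to keep the sketch short.
    key: str
    value: object
    timestamp: float


Callback = Callable[[StreamMessage], None]


class SiteStream:
    """Hypothetical in-memory stand-in for the site-wide stream."""

    def __init__(self) -> None:
        self._subs: Dict[int, Tuple[str, Callback]] = {}
        self._next_id = 0

    def subscribe(self, instance_name: str, callback: Callback) -> int:
        # Store the prefix "<InstanceUniqueName>." used for filtering.
        self._next_id += 1
        self._subs[self._next_id] = (instance_name + ".", callback)
        return self._next_id

    def unsubscribe(self, sub_id: int) -> None:
        self._subs.pop(sub_id, None)

    def publish(self, msg: StreamMessage) -> None:
        # Every instance publishes here; each subscriber only receives
        # messages whose key starts with its instance prefix.
        for prefix, callback in list(self._subs.values()):
            if msg.key.startswith(prefix):
                callback(msg)


def open_debug_view(stream: SiteStream, instance_name: str,
                    snapshot: List[StreamMessage], sink: Callback) -> int:
    # Deliver the current-state snapshot first, then attach to the stream.
    for msg in snapshot:
        sink(msg)
    return stream.subscribe(instance_name, sink)


stream = SiteStream()
seen: List[StreamMessage] = []
snapshot = [StreamMessage("Pump1.Motor.Speed", 1450, 0.0)]
sub_id = open_debug_view(stream, "Pump1", snapshot, seen.append)

stream.publish(StreamMessage("Pump1.HighTemp", "active", 1.0))
stream.publish(StreamMessage("Pump2.Motor.Speed", 900, 1.5))   # other instance
stream.unsubscribe(sub_id)                                      # view closed
stream.publish(StreamMessage("Pump1.Motor.Speed", 1500, 2.0))  # not delivered
```

After this runs, `seen` holds only the snapshot entry and the `Pump1.HighTemp` change. The `Pump2` message is filtered out, and nothing arrives once the view has unsubscribed. A real implementation would also need to handle changes that occur between taking the snapshot and attaching to the stream, which this sketch ignores.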