# SCADA System - High Level Requirements

## 1. Deployment Architecture

- **Site Clusters**: 2-node failover clusters deployed at each site, running on Windows.
- **Central Cluster**: A single 2-node failover cluster serving as the central hub.
- **Communication Topology**: Hub-and-spoke. Central cluster communicates with each site cluster. Site clusters do **not** communicate with one another.

### 1.1 Central vs. Site Responsibilities

- **Central cluster** is the single source of truth for all template authoring, configuration, and deployment decisions.
- **Site clusters** receive **flattened configurations** — fully resolved attribute sets with no template structure. Sites do not need to understand templates, inheritance, or composition.
- Sites **do not** support local/emergency configuration overrides. All configuration changes originate from central.

### 1.2 Failover

- Failover is managed at the **application level** using **Akka.NET** (not Windows Server Failover Clustering).
- Each cluster (central and site) runs an **active/standby** pair where Akka.NET manages node roles and failover detection.
- **Site failover**: The standby node takes over data collection and script execution seamlessly, including responsibility for the store-and-forward buffers. The Site Runtime Deployment Manager singleton is restarted on the new active node, which reads deployed configurations from local SQLite and re-creates the full Instance Actor hierarchy.
- **Central failover**: The standby node takes over central responsibilities. Deployments that are in-progress during a failover are treated as **failed** and must be re-initiated by the engineer.

### 1.3 Store-and-Forward Persistence (Site Clusters Only)

- Store-and-forward applies **only at site clusters** — the central cluster does **not** buffer messages. If a site is unreachable, operations from central fail and must be retried by the engineer.
- All site-level store-and-forward buffers (external system calls, notifications, and cached database writes) are **replicated between the two site cluster nodes** using **application-level replication** over Akka.NET remoting.
- The **active node** persists buffered messages to a **local SQLite database** and forwards them to the standby node, which maintains its own local SQLite copy.
- On failover, the standby node already has a replicated copy of the buffer and takes over delivery seamlessly.
- Successfully delivered messages are removed from both nodes' local stores.
- There is **no maximum buffer size** — messages accumulate until they either succeed or exhaust retries and are parked.
- Retry intervals are **fixed** (not exponential backoff). The fixed interval is sufficient for the expected use cases.
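
The replication contract above can be sketched as follows. This is an illustrative model only — the class name `ReplicatedBuffer` is an assumption, and plain dictionaries stand in for the per-node SQLite stores; the real system implements this in C# over Akka.NET remoting.

```python
class ReplicatedBuffer:
    """Illustrative sketch of Section 1.3: the active node persists each
    buffered message locally, replicates it to the standby node, and both
    copies are removed once delivery succeeds."""

    def __init__(self):
        self.active_store = {}   # message_id -> payload (active node's SQLite)
        self.standby_store = {}  # message_id -> payload (standby node's SQLite)

    def enqueue(self, message_id, payload):
        # Active node persists locally first, then replicates to the standby.
        self.active_store[message_id] = payload
        self.standby_store[message_id] = payload  # application-level replication

    def mark_delivered(self, message_id):
        # Successful delivery removes the message from BOTH nodes' stores.
        self.active_store.pop(message_id, None)
        self.standby_store.pop(message_id, None)

    def failover(self):
        # The standby already holds a full copy and takes over delivery.
        return dict(self.standby_store)
```

On failover, whatever remains in the standby's store is exactly the set of undelivered messages, which is why delivery can resume seamlessly.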

### 1.4 Deployment Behavior

- When central deploys a new configuration to a site instance, the site **applies it immediately** upon receipt — no local operator confirmation is required.
- If a site loses connectivity to central, it **continues operating** with its last received deployed configuration.
- The site reports back to central whether deployment was successfully applied.
- **Pre-deployment validation**: Before any deployment is sent to a site, the central cluster performs comprehensive validation including flattening the configuration, test-compiling all scripts, verifying alarm trigger references, verifying script trigger references, and checking data connection binding completeness (see Section 3.11).

### 1.5 System-Wide Artifact Deployment

- Changes to shared scripts, external system definitions, database connection definitions, data connection definitions, notification lists, and SMTP configuration are **not automatically propagated** to sites.
- Deployment of system-wide artifacts requires **explicit action** by a user with the **Deployment** role.
- Artifacts can be deployed to **all sites at once** or to an **individual site** (per-site deployment).
- The Design role manages the definitions; the Deployment role triggers deployment to sites. A user may hold both roles.

## 2. Data Storage & Data Flow

### 2.1 Central Databases (MS SQL)

- **Configuration Database**: A dedicated database for system-specific configuration data (e.g., templates, site definitions, instance configurations, system settings).
- **Machine Data Database**: A separate database for collected machine data (e.g., telemetry, measurements, events).

### 2.2 Communication: Central ↔ Site

- Central-to-site and site-to-central communication uses **Akka.NET ClusterClient/ClusterClientReceptionist** for cross-cluster messaging with automatic failover.
- **Site addressing**: Site Akka base addresses (NodeA and NodeB) are stored in the **Sites database table** and configured via the Central UI. Central creates a ClusterClient per site using both addresses as contact points (cached in memory, refreshed periodically and on admin changes) rather than relying on runtime registration messages from sites.
- **Central contact points**: Sites configure **multiple central contact points** (both central node addresses) for redundancy. ClusterClient handles failover between central nodes automatically.
- **Central as integration hub**: Central brokers requests between external systems and sites. For example, a recipe manager sends a recipe to central, which routes it to the appropriate site. MES requests machine values from central, which routes the request to the site and returns the response.
- **Real-time data streaming** is not continuous for all machine data. The only real-time stream is an **on-demand debug view** — an engineer in the central UI can open a live view of a specific instance's tag values and alarm states for troubleshooting purposes. This is session-based and temporary. The debug view subscribes to the site-wide Akka stream filtered by instance (see Section 8.1).

### 2.3 Site-Level Storage & Interface

- Sites have **no user interface** — they are headless collectors, forwarders, and script executors.
- Sites require local storage for: the current deployed (flattened) configurations, deployed scripts, shared scripts, external system definitions, database connection definitions, data connection definitions, notification lists, and SMTP configuration.
- After artifact deployment, sites are **fully self-contained** — all runtime configuration is read from local SQLite. Sites do **not** access the central configuration database at runtime.
- Store-and-forward buffers are persisted to a **local SQLite database on each node** and replicated between nodes via application-level replication (see 1.3).

### 2.4 Data Connection Protocols

- The system supports **OPC UA** and **LmxProxy** (a gRPC-based custom protocol with an existing client SDK).
- Both protocols implement a **common interface** supporting: connect, subscribe to tag paths, receive value updates, and write values.
- Additional protocols can be added by implementing the common interface.
- The Data Connection Layer is a **clean data pipe** — it publishes tag value updates to Instance Actors but performs no evaluation of triggers or alarm conditions.
- **Initial attribute quality**: Attributes bound to a data connection start with **uncertain** quality when the Instance Actor initializes. The quality remains uncertain until the first value update is received from the Data Connection Layer. This distinguishes "never received a value" from "received a known-good value" or "connection lost" (bad quality).
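
The common interface and the initial-quality rule can be sketched as below. This is an illustrative model (the names `DataConnection`, `BoundAttribute`, and the three-state `Quality` enum are assumptions drawn from the prose); the real implementations are C# protocol adapters for OPC UA and LmxProxy.

```python
from abc import ABC, abstractmethod
from enum import Enum

class Quality(Enum):
    UNCERTAIN = "uncertain"  # never received a value yet
    GOOD = "good"            # last update was known-good
    BAD = "bad"              # connection lost

class DataConnection(ABC):
    """Common interface each protocol (OPC UA, LmxProxy, future ones)
    must implement: connect, subscribe to tag paths, write values."""

    @abstractmethod
    def connect(self) -> None: ...

    @abstractmethod
    def subscribe(self, tag_path: str, on_update) -> None: ...

    @abstractmethod
    def write(self, tag_path: str, value) -> None: ...

class BoundAttribute:
    """An attribute bound to a data connection starts UNCERTAIN and stays
    that way until the first value update arrives."""

    def __init__(self, relative_path: str):
        self.relative_path = relative_path
        self.value = None
        self.quality = Quality.UNCERTAIN

    def on_update(self, value) -> None:
        # First update from the Data Connection Layer promotes the quality.
        self.value = value
        self.quality = Quality.GOOD
```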

### 2.5 Scale

- Approximately **10 sites**.
- **50–500 machines per site**.
- **25–75 live data point tags per machine**.

## 3. Template & Machine Modeling

### 3.1 Template Structure

- Machines are modeled as **instances of templates**.
- Templates define a set of **attributes**.
- Each attribute has a **lock flag** that controls whether it can be overridden downstream.

### 3.2 Attribute Definition

Each attribute carries the following metadata:

- **Name**: Identifier for the attribute.
- **Value**: The default or configured value. May be empty if intended to be set at the instance level.
- **Data Type**: The value's type. Fixed set: Boolean, Integer, Float, String.
- **Lock Flag**: Controls whether the attribute can be overridden downstream.
- **Description**: Human-readable explanation of the attribute's purpose.
- **Data Source Reference** *(optional)*: A **relative path** within a data connection (e.g., `/Motor/Speed`). The template defines *what* to read — the path relative to a data connection. The template does **not** specify which data connection to use; that is an instance-level concern (see Section 3.3). Attributes without a data source reference are static configuration values.
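
The metadata above can be modeled as a small record. The dataclass below is an illustrative sketch, not the real schema; field names mirror the list above, and the validation of the fixed data-type set is an assumption about where that check would live.

```python
from dataclasses import dataclass
from typing import Optional

# Fixed data-type set from Section 3.2.
DATA_TYPES = {"Boolean", "Integer", "Float", "String"}

@dataclass
class AttributeDefinition:
    """Illustrative model of one template attribute (Section 3.2)."""
    name: str
    value: object                          # may be None if set at instance level
    data_type: str                         # one of DATA_TYPES
    locked: bool                           # lock flag: blocks downstream overrides
    description: str
    data_source_ref: Optional[str] = None  # e.g. "/Motor/Speed"; None = static

    def __post_init__(self):
        if self.data_type not in DATA_TYPES:
            raise ValueError(f"unknown data type: {self.data_type}")
```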

### 3.3 Data Connections

- **Data connections** are reusable, named resources defined centrally and then **assigned to specific sites** (e.g., an OPC server, a PLC endpoint). Data connection definitions are deployed to sites as part of **artifact deployment** (see Section 1.5) and stored in local SQLite.
- A data connection encapsulates the details needed to communicate with a data source (protocol, address, credentials, etc.).
- Attributes with a data source reference must be **bound to a data connection at instance creation** — the template defines *what* to read (the relative path), and the instance specifies *where* to read it from (the data connection assigned to the site).
- **Binding is per-attribute**: Each attribute with a data source reference individually selects its data connection. Different attributes on the same instance may use different data connections. The Central UI supports bulk assignment (selecting multiple attributes and assigning a data connection to all of them at once) to reduce tedium.
- Templates do **not** specify a default connection. The connection binding is an instance-level concern.
- The flattened configuration sent to a site resolves connection references into concrete connection details paired with attribute relative paths.
- Data connection names are **not** standardized across sites — different sites may have different data connection names for equivalent devices.

### 3.4 Alarm Definitions

Alarms are **first-class template members** alongside attributes and scripts, following the same **inheritance, override, and lock rules**.

Each alarm has:

- **Name**: Identifier for the alarm.
- **Description**: Human-readable explanation of the alarm condition.
- **Priority Level**: Numeric value from 0–1000.
- **Lock Flag**: Controls whether the alarm can be overridden downstream.
- **Trigger Definition**: One of the following trigger types:
  - **Value Match**: Triggers when a monitored attribute equals a predefined value.
  - **Range Violation**: Triggers when a monitored attribute value falls outside an allowed range.
  - **Rate of Change**: Triggers when a monitored attribute value changes faster than a defined threshold.
- **On-Trigger Script** *(optional)*: A script to execute when the alarm triggers. The alarm on-trigger script executes in the context of the instance and can call instance scripts, but instance scripts **cannot** call alarm on-trigger scripts. The call direction is one-way.
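
The three trigger types reduce to simple predicates. The sketch below is illustrative only — the function names and the `(previous, current, dt)` signature for rate-of-change are assumptions; the real evaluation happens in the site's Alarm Actors.

```python
def value_match(current, target) -> bool:
    """Value Match: triggers when the monitored attribute equals a
    predefined value."""
    return current == target

def range_violation(current, low, high) -> bool:
    """Range Violation: triggers when the value falls outside the
    allowed [low, high] range."""
    return not (low <= current <= high)

def rate_of_change(previous, current, dt_seconds, max_rate) -> bool:
    """Rate of Change: triggers when the value changes faster than the
    defined threshold (units per second)."""
    return abs(current - previous) / dt_seconds > max_rate
```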

### 3.4.1 Alarm State

- Alarm state (active/normal) is **managed at the site level** per instance, held **in memory** by the Alarm Actor.
- When the alarm condition clears, the alarm **automatically returns to normal state** — no acknowledgment workflow is required.
- Alarm state is **not persisted** — on restart, alarm states are re-evaluated from incoming values.
- Alarm state changes are published to the site-wide Akka stream, keyed as `[InstanceUniqueName].[AlarmName]` and carrying the alarm state (active/normal), priority, and timestamp.

### 3.5 Template Relationships

Templates participate in two distinct relationship types:

- **Inheritance (is-a)**: A child template extends a parent template. The child inherits all attributes, alarms, scripts, and composed feature modules from the parent. The child can:
  - Override the **values** of non-locked inherited attributes, alarms, and scripts.
  - **Add** new attributes, alarms, or scripts not present in the parent.
  - **Not** remove attributes, alarms, or scripts defined by the parent.
- **Composition (has-a)**: A template can nest an instance of another template as a **feature module** (e.g., embedding a RecipeSystem module inside a base machine template). Feature modules can themselves compose other feature modules **recursively**.
- **Naming collisions**: If a template composes two feature modules that each define an attribute, alarm, or script with the same name, this is a **design-time error**. The system must detect and report the collision, and the template cannot be saved until the conflict is resolved.
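
The naming-collision check is a straightforward duplicate scan over the composed modules' member names. A minimal sketch, assuming modules are represented as name-to-member-set mappings (the representation is hypothetical):

```python
from collections import Counter

def find_collisions(feature_modules: dict) -> set:
    """Design-time collision check (Section 3.5): feature_modules maps
    module name -> set of member names (attributes, alarms, scripts).
    Returns the member names defined by more than one module; a non-empty
    result blocks saving the template."""
    counts = Counter(
        name
        for members in feature_modules.values()
        for name in members
    )
    return {name for name, n in counts.items() if n > 1}
```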

### 3.6 Locking

- Locking applies to **attributes, alarms, and scripts** uniformly.
- Any of these can be **locked** at the level where it is defined or overridden.
- A locked attribute **cannot** be overridden by any downstream level (child templates, composing templates, or instances).
- An unlocked attribute **can** be overridden by any downstream level.
- **Intermediate locking**: Any level in the chain can lock an attribute that was unlocked upstream. Once locked, it remains locked for all levels below — a downstream level **cannot** unlock an attribute locked above it.

### 3.6.1 Attribute Resolution Order

Attributes are resolved from most-specific to least-specific. The first value encountered wins:

1. **Instance** (site-deployed machine)
2. **Child Template** (most derived first, walking up the inheritance chain)
3. **Composing Template** (the template that embeds a feature module can override the module's attributes)
4. **Composed Module** (the original feature module definition, recursively resolved if modules nest other modules)

At any level, an override is only permitted if the attribute has **not been locked** further up the chain — at the level where it was defined or at any intermediate level (see 3.6).
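
The resolution walk can be sketched as follows. This is an illustrative model under assumed representations (each level is a dict mapping attribute name to a `(value, locked)` pair, ordered most-specific first); in practice illegal overrides are rejected at design time, so the "skip past the lock" step mostly documents the semantics.

```python
def resolve_attribute(name, levels):
    """Resolve one attribute per Section 3.6.1. levels[0] is the instance
    (most specific), levels[-1] the base module (least specific). Any value
    set more specifically than the outermost lock is an illegal override
    and is ignored; among the remaining levels, the most specific wins."""
    # Find the least-specific level that locks the attribute (Section 3.6:
    # once locked, everything downstream of it cannot override).
    lock_index = max(
        (i for i, lvl in enumerate(levels) if name in lvl and lvl[name][1]),
        default=-1,
    )
    start = lock_index if lock_index >= 0 else 0
    for lvl in levels[start:]:
        if name in lvl:
            return lvl[name][0]  # first value encountered wins
    return None
```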

### 3.7 Override Scope

- **Inheritance**: Child templates can override non-locked attributes from their parent, including attributes originating from composed feature modules.
- **Composition**: A template that composes a feature module can override non-locked attributes within that module.
- Overrides can "pierce" into composed modules — a child template can override attributes inside a feature module it inherited from its parent.

### 3.8 Instance Rules

- An instance is a deployed occurrence of a template at a site.
- Instances **can** override the values of non-locked attributes.
- Instances **cannot** add new attributes.
- Instances **cannot** remove attributes.
- The instance's structure (which attributes exist, which feature modules are composed) is strictly defined by its template.
- Each instance is **assigned to an area** within its site (see 3.10).

### 3.8.1 Instance Lifecycle

- Instances can be in one of two states: **enabled** or **disabled**.
- **Enabled**: The instance is active at the site — data subscriptions, script triggers, and alarm evaluation are all running.
- **Disabled**: The site **stops** script triggers, data subscriptions (no live data collection), and alarm evaluation. The deployed configuration is **retained** on the site so the instance can be re-enabled without redeployment. Store-and-forward messages for a disabled instance **continue to drain** (deliver pending messages).
- **Deletion**: Instances can be deleted. Deletion removes the running configuration from the site, stops subscriptions, and destroys the Instance Actor and its children. Store-and-forward messages are **not** cleared on deletion — they continue to be delivered or can be managed (retried/discarded) via parked message management. If the site is unreachable when a delete is triggered, the deletion **fails** (same behavior as a failed deployment). The central side does not mark it as deleted until the site confirms.
- Templates **cannot** be deleted if any instances or child templates reference them. The user must remove all references first.

### 3.9 Template Deployment & Change Propagation

- Template changes are **not** automatically propagated to deployed instances.
- The system maintains two views of each instance:
  - **Deployed Configuration**: The currently active configuration on the instance, as it was last explicitly deployed.
  - **Template-Derived Configuration**: The configuration the instance *would* have based on the current state of its template (including resolved inheritance, composition, and overrides).
- Deployment is performed at the **individual instance level** — an engineer explicitly commands the system to update a specific instance.
- The system must be able to **show differences** between the deployed configuration and the current template-derived configuration, allowing engineers to see what would change before deploying.
- **No rollback** support is required. The system only needs to track the current deployed state, not a history of prior deployments.
- **Concurrent editing**: Template editing uses a **last-write-wins** model. No pessimistic locking or optimistic concurrency conflict detection is required.
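
The "show differences" requirement is a comparison of the two views. A minimal sketch, treating each configuration as a flat mapping of attribute name to resolved value (the three-bucket output shape is an assumption, not a specified UI format):

```python
def diff_configs(deployed: dict, derived: dict) -> dict:
    """Compare the deployed configuration against the current
    template-derived configuration (Section 3.9) so an engineer can see
    what a deployment would change."""
    return {
        "added":   sorted(k for k in derived if k not in deployed),
        "removed": sorted(k for k in deployed if k not in derived),
        "changed": sorted(k for k in deployed
                          if k in derived and deployed[k] != derived[k]),
    }
```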

### 3.10 Areas

- Areas are **predefined hierarchical groupings** associated with a site, stored in the configuration database.
- Areas support **parent-child relationships** (e.g., Plant → Building → Production Line → Cell).
- Each instance is assigned to an area within its site.
- Areas are used for **filtering and finding instances** in the central UI.
- Area definitions are managed by users with the **Admin** role.

### 3.11 Pre-Deployment Validation

Before any deployment is sent to a site, the central cluster performs **comprehensive validation**. Validation covers:

- **Flattening**: The full template hierarchy is resolved and flattened successfully.
- **Naming collision detection**: No duplicate attribute, alarm, or script names exist in the flattened configuration.
- **Script compilation**: All instance scripts and alarm on-trigger scripts are test-compiled and must compile without errors.
- **Alarm trigger references**: Alarm trigger definitions reference attributes that exist in the flattened configuration.
- **Script trigger references**: Script triggers (value change, conditional) reference attributes that exist in the flattened configuration.
- **Data connection binding completeness**: Every attribute with a data source reference has a data connection binding assigned on the instance, and the bound data connection name exists as a defined connection at the instance's site.
- **Exception**: Validation does **not** verify that data source relative paths resolve to real tags on physical devices — that is a runtime concern that can only be determined at the site.

Validation is also available **on demand in the Central UI** for Design users during template authoring, providing early feedback without requiring a deployment attempt.

For **shared scripts**, pre-compilation validation is performed before deployment to sites. Since shared scripts have no instance context, validation is limited to C# syntax and structural correctness.
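
The reference and binding checks lend themselves to a simple error-collecting pass. The sketch below covers three of the checks (trigger references and binding completeness) under an assumed flattened-config shape; flattening, collision detection, and script compilation are omitted, and all names in the input structure are hypothetical.

```python
def validate(config: dict) -> list:
    """Collect validation errors for a flattened configuration (Section 3.11).
    Assumed shape: config['attributes'] maps name -> {'data_source_ref',
    'connection'}; trigger maps point at attribute names; 'site_connections'
    lists the connections defined at the target site."""
    errors = []
    attrs = config.get("attributes", {})
    # Alarm and script trigger references must point at existing attributes.
    for alarm, ref in config.get("alarm_triggers", {}).items():
        if ref not in attrs:
            errors.append(f"alarm '{alarm}' references missing attribute '{ref}'")
    for script, ref in config.get("script_triggers", {}).items():
        if ref not in attrs:
            errors.append(f"script '{script}' references missing attribute '{ref}'")
    # Binding completeness: every attribute with a data source reference needs
    # a connection binding, and the binding must exist at the instance's site.
    site_connections = set(config.get("site_connections", []))
    for name, meta in attrs.items():
        if meta.get("data_source_ref"):
            bound = meta.get("connection")
            if not bound:
                errors.append(f"attribute '{name}' has no connection binding")
            elif bound not in site_connections:
                errors.append(f"attribute '{name}' bound to unknown connection '{bound}'")
    return errors
```

Deployment proceeds only when the returned list is empty; the same pass backs the on-demand validation in the Central UI.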

## 4. Scripting

### 4.1 Script Definitions

- Scripts are **C#** and are defined at the **template level** as first-class template members.
- Scripts follow the same **inheritance, override, and lock rules** as attributes. A parent template can define a script, a child template can override it (if not locked), and any level can lock a script to prevent downstream changes.
- Scripts are deployed to sites as part of the flattened instance configuration.
- Scripts are **compiled at the site** when a deployment is received. Pre-compilation validation occurs at central before deployment (see Section 3.11), but the site performs the actual compilation for execution.
- Scripts can optionally define **input parameters** (name and data type per parameter). Scripts without parameter definitions accept no arguments.
- Scripts can optionally define a **return value definition** (field names and data types). Return values support **single objects** and **lists of objects**. Scripts without a return definition return void.
- Return values are used when scripts are called explicitly by other scripts (via `Instance.CallScript()` or `Scripts.CallShared()`) or by the Inbound API (via `Route.To().Call()`). When invoked by a trigger (interval, value change, conditional, alarm), any return value is discarded.

### 4.2 Script Triggers

Scripts can be triggered by:

- **Interval**: Execute on a recurring time schedule.
- **Value Change**: Execute when a specific instance attribute value changes.
- **Conditional**: Execute when an instance attribute value equals or does not equal a given value.

Scripts have an optional **minimum time between runs** setting. If a trigger fires before the minimum interval has elapsed since the last execution, the invocation is skipped.
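
The minimum-interval rule is a gate, not a queue: an early trigger is dropped outright. A minimal sketch (the `MinIntervalGate` class and explicit `now` parameter are illustrative choices, not part of the specification):

```python
class MinIntervalGate:
    """Enforces the 'minimum time between runs' setting (Section 4.2):
    a trigger firing before the minimum interval has elapsed since the
    last execution is skipped, not deferred."""

    def __init__(self, min_seconds: float):
        self.min_seconds = min_seconds
        self.last_run = None  # timestamp of the last accepted execution

    def should_run(self, now: float) -> bool:
        if self.last_run is not None and now - self.last_run < self.min_seconds:
            return False  # fired too soon: skip this invocation entirely
        self.last_run = now
        return True
```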

### 4.3 Script Error Handling

- If a script fails (unhandled exception, timeout, etc.), the failure is **logged locally** at the site.
- The script is **not disabled** — it remains active and will fire on the next qualifying trigger event.
- Script failures are **not reported to central**. Diagnostics are local only.
- For external system call failures within scripts, store-and-forward handling (Section 5.3) applies independently of script error handling.

### 4.4 Script Capabilities

Scripts executing on a site for a given instance can:

- **Read** attribute values on that instance (live data points and static config).
- **Write** attribute values on that instance. For attributes with a data source reference, the write goes to the Data Connection Layer which writes to the physical device; the in-memory value updates when the device confirms the new value via the existing subscription. For static attributes, the write updates the in-memory value and **persists the override to local SQLite** — the value survives restart and failover. Persisted overrides are reset when the instance is redeployed.
- **Call other scripts** on that instance via `Instance.CallScript("scriptName", params)`. Calls use the Akka ask pattern and return the called script's return value. Script-to-script calls support concurrent execution.
- **Call shared scripts** via `Scripts.CallShared("scriptName", params)`. Shared scripts execute **inline** in the calling Script Actor's context — they are compiled code libraries, not separate actors.
- **Call external system API methods** in two modes: `ExternalSystem.Call()` for synchronous request/response, or `ExternalSystem.CachedCall()` for fire-and-forget with store-and-forward on transient failure (see Section 5).
- **Send notifications** (see Section 6).
- **Access databases** by requesting an MS SQL client connection by name (see Section 5.5).

Scripts **cannot** access other instances' attributes or scripts.

### 4.4.1 Script Call Recursion Limit

- Script-to-script calls (via `Instance.CallScript` and `Scripts.CallShared`) enforce a **maximum recursion depth** to prevent infinite loops.
- The default maximum depth is a reasonable limit (e.g., 10 levels).
- The current call depth is tracked and incremented with each nested call. If the limit is reached, the call fails with an error logged to the site event log.
- This applies to all script call chains including alarm on-trigger scripts calling instance scripts.
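
The depth-tracking mechanism can be sketched as a counter that travels with each nested call. This is an illustrative model (`call_script`, `RecursionLimitError`, and depth-as-parameter are assumptions); in the real system the depth would ride along with the Akka ask messages between Script Actors.

```python
MAX_DEPTH = 10  # spec suggests "a reasonable limit (e.g., 10 levels)"

class RecursionLimitError(Exception):
    pass

def call_script(script, *args, depth: int = 0):
    """Invoke a script callable, incrementing the call depth so nested
    calls through this guard are counted (Section 4.4.1)."""
    if depth >= MAX_DEPTH:
        # In the real system this failure is logged to the site event log.
        raise RecursionLimitError(f"call depth {depth} exceeds limit {MAX_DEPTH}")
    return script(depth + 1, *args)

def runaway(depth):
    # A script that keeps calling itself: the guard stops the loop.
    return call_script(runaway, depth=depth)
```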

### 4.5 Shared Scripts

- Shared scripts are **not associated with any template** — they are a **system-wide library** of reusable C# scripts.
- Shared scripts can optionally define **input parameters** and **return value definitions**, following the same rules as template-level scripts.
- Managed by users with the **Design** role.
- Deployed to **all sites** for use by any instance script (deployment requires explicit action by a user with the Deployment role).
- Shared scripts execute **inline** in the calling Script Actor's context as compiled code. They are not separate actors. This avoids serialization bottlenecks and messaging overhead.
- Shared scripts are **not available on the central cluster** — Inbound API scripts cannot call them directly. To execute shared script logic, route to a site instance via `Route.To().Call()`.

### 4.6 Alarm On-Trigger Scripts

- Alarm on-trigger scripts are defined as part of the alarm definition and execute when the alarm activates.
- They execute directly in the Alarm Actor's context (via a short-lived Alarm Execution Actor), similar to how shared scripts execute inline.
- Alarm on-trigger scripts **can** call instance scripts via `Instance.CallScript()`, which sends an ask message to the appropriate sibling Script Actor.
- Instance scripts **cannot** call alarm on-trigger scripts — the call direction is one-way.
- The recursion depth limit applies to alarm-to-instance script call chains.

## 5. External System Integrations

### 5.1 External System Definitions

- External systems are **predefined contracts** created by users with the **Design** role.
- Each definition includes:
  - **Connection details**: Endpoint URL, authentication, protocol information.
  - **Method definitions**: Available API methods with defined parameters and return types.
- Definitions are deployed **uniformly to all sites** — no per-site connection detail overrides.
- Deployment of definition changes requires **explicit action** by a user with the Deployment role.
- At the site, external system definitions are read from **local SQLite** (populated by artifact deployment), not from the central config DB.

### 5.2 Site-to-External-System Communication

- Sites communicate with external systems **directly** (not routed through central).
- Scripts invoke external system methods by referencing the predefined definitions.

### 5.3 Store-and-Forward for External Calls

- If an external system is unavailable when a script invokes a method, the message is **buffered locally at the site**.
- Retry is performed **per message** — individual failed messages retry independently.
- Each external system definition includes configurable retry settings:
  - **Max retry count**: Maximum number of retry attempts before giving up.
  - **Time between retries**: Fixed interval between retry attempts (no exponential backoff).
- After max retries are exhausted, the message is **parked** (dead-lettered) for manual review.
- There is **no maximum buffer size** — messages accumulate until delivery succeeds or retries are exhausted.
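
The per-message retry policy can be sketched as below. This is an illustrative model only (the class names and the synchronous `deliver_with_retry` loop are assumptions; the real system schedules retries asynchronously per buffered message and actually waits out the fixed interval).

```python
from dataclasses import dataclass

@dataclass
class BufferedMessage:
    payload: str
    attempts: int = 0

@dataclass
class RetryPolicy:
    """Retry settings from Section 5.3: a max retry count and a FIXED
    interval between attempts (no exponential backoff)."""
    max_retries: int
    seconds_between: float

    def next_delay(self, msg: BufferedMessage):
        """Fixed delay before the next attempt, or None once the message
        must be parked (dead-lettered) for manual review."""
        if msg.attempts >= self.max_retries:
            return None
        return self.seconds_between

def deliver_with_retry(msg, policy, send) -> str:
    """send(payload) returns True on success. Returns 'delivered' or 'parked'."""
    while True:
        if send(msg.payload):
            return "delivered"
        msg.attempts += 1
        if policy.next_delay(msg) is None:
            return "parked"  # manual retry/discard via the central UI (5.4)
        # Real system: wait policy.seconds_between, then attempt again.
```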

### 5.4 Parked Message Management

- Parked messages are **stored at the site** where they originated.
- The **central UI** can **query sites** for parked messages and manage them remotely.
- Operators can **retry** or **discard** parked messages from the central UI.
- Parked message management covers **external system calls**, **notifications**, and **cached database writes**.

### 5.5 Database Connections

- Database connections are **predefined, named resources** created by users with the **Design** role.
- Each definition includes the connection details needed to connect to an MS SQL database (server, database name, credentials, etc.).
- Each definition includes configurable retry settings (same pattern as external systems): **max retry count** and **time between retries** (fixed interval).
- Definitions are deployed **uniformly to all sites** — no per-site overrides.
- Deployment of definition changes requires **explicit action** by a user with the Deployment role.
- At the site, database connection definitions are read from **local SQLite** (populated by artifact deployment), not from the central config DB.

### 5.6 Database Access Modes

Scripts can interact with databases in two modes:

- **Real-time (synchronous)**: Scripts request a **raw MS SQL client connection by name** (e.g., `Database.Connection("MES_DB")`), giving script authors full ADO.NET-level control for immediate queries and updates.
- **Cached write (store-and-forward)**: Scripts submit a write operation for deferred, reliable delivery. The cached entry stores the **database connection name**, the **SQL statement to execute**, and **parameter values**. If the database is unavailable, the write is buffered locally at the site and retried per the connection's retry settings. After max retries are exhausted, the write is **parked** for manual review (managed via central UI alongside other parked messages).

## 6. Notifications

### 6.1 Notification Lists

- Notification lists are **system-wide**, managed by users with the **Design** role.
- Each list has a **name** and contains one or more **recipients**.
- Each recipient has a **name** and an **email address**.
- Notification lists are deployed to **all sites** (deployment requires explicit action by a user with the Deployment role).
- At the site, notification lists and recipients are read from **local SQLite** (populated by artifact deployment), not from the central config DB.

### 6.2 Email Support

- The system has **predefined support for sending email** as the notification delivery mechanism.
- Email server configuration (SMTP settings) is defined centrally and deployed to all sites as part of **artifact deployment** (see Section 1.5). Sites read SMTP configuration from **local SQLite**.

### 6.3 Script API

- Scripts send notifications using a simplified API: `Notify.To("list name").Send("subject", "message")`
- This API is available to instance scripts, alarm on-trigger scripts, and shared scripts.

### 6.4 Store-and-Forward for Notifications

- If the email server is unavailable, notifications are **buffered locally at the site**.
- Follows the same retry pattern as external system calls: configurable **max retry count** and **time between retries** (fixed interval).
- After max retries are exhausted, the notification is **parked** for manual review (managed via central UI alongside external system parked messages).
- There is **no maximum buffer size** for notification messages.

## 7. Inbound API (Central)

### 7.1 Purpose

The system exposes a **web API on the central cluster** for external systems to call into the SCADA system. This is the counterpart to the outbound External System Integrations (Section 5) — where Section 5 defines how the system calls out, this section defines how external systems call in.

### 7.2 API Key Management

- API keys are stored in the **configuration database**.
- Each API key has a **name/label** (for identification), the **key value**, and an **enabled/disabled** flag.
- API keys are managed by users with the **Admin** role.

### 7.3 Authentication

- Inbound API requests are authenticated via **API key** (not LDAP/AD).
- The API key must be included with each request.
- Invalid or disabled keys are rejected.

### 7.4 API Method Definitions

- API methods are **predefined** and managed by users with the **Design** role.
- Each method definition includes:
  - **Method name**: Unique identifier for the endpoint.
  - **Approved API keys**: List of API keys authorized to call this method.
  - **Parameter definitions**: Name and data type for each input parameter.
  - **Return value definition**: Data type and structure of the response. Supports **single objects** and **lists of objects**.
  - **Timeout**: Configurable per method. Maximum execution time including routed calls to sites.
- The implementation of each method is a **C# script stored inline** in the method definition. It executes on the central cluster. No template inheritance — API scripts are standalone.
- API scripts can route calls to any instance at any site via `Route.To("instanceCode").Call("scriptName", parameters)`, read/write attributes in batch, and access databases directly.
- API scripts **cannot** call shared scripts directly (shared scripts are site-only). To invoke site logic, use `Route.To().Call()`.
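
A method implementation script might look like the following sketch. Only `Route.To(...).Call(...)` is defined above; the `Params` accessor, the `GetPumpState` site script, and the response shape are illustrative assumptions:

```csharp
// Hypothetical inbound API method script: return a pump's current state.
// Route.To(...).Call(...) is the routing API from this section; Params,
// "GetPumpState", and the returned fields are assumptions for illustration.
var instanceCode = (string)Params["instanceCode"];

// Routed to the owning site; subject to this method's configured timeout.
var state = await Route.To(instanceCode).Call("GetPumpState", new { });

return new
{
    Instance     = instanceCode,
    Running      = state.Running,
    FlowRate     = state.FlowRate,
    TimestampUtc = state.TimestampUtc   // UTC throughout, per Section 14.1
};
```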

### 7.5 Availability

- The inbound API is hosted **only on the central cluster** (active node).
- On central failover, the API becomes available on the new active node.

## 8. Central UI

The central cluster hosts a **configuration and management UI** (no live machine data visualization, except on-demand debug views). The UI supports the following workflows:

- **Template Authoring**: Create, edit, and manage templates including hierarchy (inheritance) and composition (feature modules). Author and manage scripts within templates. **Design-time validation** available on demand to check flattening, naming collisions, and script compilation without deploying.
- **Shared Script Management**: Create, edit, and manage the system-wide shared script library.
- **Notification List Management**: Create, edit, and manage notification lists and recipients.
- **External System Management**: Define external system contracts (connection details, API method definitions).
- **Database Connection Management**: Define named database connections for script use.
- **Inbound API Management**: Manage API keys (create, enable/disable, delete). Define API methods (name, parameters, return values, approved keys, implementation script). *(Admin role for keys, Design role for methods.)*
- **Instance Management**: Create instances from templates, bind data connections (per-attribute, with **bulk assignment** UI for selecting multiple attributes and assigning a data connection at once), set instance-level attribute overrides, assign instances to areas. **Disable** or **delete** instances.
- **Site & Data Connection Management**: Define sites (including optional NodeAAddress and NodeBAddress fields used as ClusterClient contact points), manage data connections and assign them to sites.
- **Area Management**: Define hierarchical area structures per site for organizing instances.
- **Deployment**: View diffs between deployed and current template-derived configurations, deploy updates to individual instances. Filter instances by area. Pre-deployment validation runs automatically before any deployment is sent.
- **System-Wide Artifact Deployment**: Explicitly deploy shared scripts, external system definitions, database connection definitions, data connection definitions, notification lists, and SMTP configuration to all sites or to an individual site (requires Deployment role). Per-site deployment is available via the Sites admin page.
- **Deployment Status Monitoring**: Track whether deployments were successfully applied at site level.
- **Debug View**: On-demand real-time view of a specific instance's tag values and alarm states for troubleshooting (see Section 8.1).
- **Parked Message Management**: Query sites for parked messages (external system calls, notifications, and cached database writes), retry or discard them.
- **Health Monitoring Dashboard**: View site cluster health, node status, data connection health, script error rates, alarm evaluation errors, and store-and-forward buffer depths (see Section 11).
- **Site Event Log Viewer**: Query and view operational event logs from site clusters (see Section 12).

### 8.1 Debug View

- **Subscribe-on-demand**: When an engineer opens a debug view for an instance, central subscribes to the **site-wide Akka stream** filtered by instance unique name. The site first provides a **snapshot** of all current attribute values and alarm states from the Instance Actor, then streams subsequent changes from the Akka stream.
- Attribute value stream messages are structured as: `[InstanceUniqueName].[AttributePath].[AttributeName]`, attribute value, attribute quality, attribute change timestamp.
- Alarm state stream messages are structured as: `[InstanceUniqueName].[AlarmName]`, alarm state (active/normal), priority, timestamp.
- The stream continues until the engineer **closes the debug view**, at which point central unsubscribes and the site stops streaming.
- No attribute/alarm selection — the debug view always shows all tag values and alarm states for the instance.
- No special concurrency limits are required.
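
The two stream message shapes above could be modeled roughly as the following records. The type and field names are illustrative assumptions; the payload contents match the bullets above:

```csharp
// Sketch of the two debug-view stream payloads. Record and field names are
// assumptions; only the dotted-path structure and fields come from this section.
public sealed record AttributeValueChanged(
    string Path,            // "[InstanceUniqueName].[AttributePath].[AttributeName]"
    object Value,           // attribute value
    string Quality,         // attribute quality
    DateTime TimestampUtc); // attribute change timestamp (UTC, per Section 14.1)

public sealed record AlarmStateChanged(
    string Path,            // "[InstanceUniqueName].[AlarmName]"
    bool IsActive,          // alarm state: active/normal
    int Priority,
    DateTime TimestampUtc);
```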

## 9. Security & Access Control

### 9.1 Authentication

- **UI users** authenticate via **username/password** validated directly against **LDAP/Active Directory**. Sessions are maintained via JWT tokens.
- **External system API callers** authenticate via **API key** (see Section 7).

### 9.2 Authorization

- Authorization is **role-based**, with roles assigned by **LDAP group membership**.
- Roles are **independent** — they can be mixed and matched per user (via group membership). There is no implied hierarchy between roles.
- A user may hold multiple roles simultaneously (e.g., both Design and Deployment) by being a member of the corresponding LDAP groups.
- Inbound API authorization is per-method, based on **approved API key lists** (see Section 7.4).

### 9.3 Roles

- **Admin**: System-wide permission to manage sites, data connections, LDAP group-to-role mappings, API keys, and system-level configuration.
- **Design**: System-wide permission to author and edit templates, scripts, shared scripts, external system definitions, notification lists, and inbound API method definitions.
- **Deployment**: Permission to manage instances (create, set overrides, bind connections, disable, delete) and deploy configurations to sites. Also covers triggering system-wide artifact deployment. Can be scoped **per site**.

### 9.4 Role Scoping

- Admin is always **system-wide**.
- Design is always **system-wide**.
- Deployment can be **system-wide** or **site-scoped**, controlled by LDAP group membership (e.g., `Deploy-SiteA`, `Deploy-SiteB`, or `Deploy-All`).
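
Resolving a user's deployment scope from group names could work roughly as follows. The `Deploy-SiteA` / `Deploy-All` naming pattern comes from the bullet above; the function itself is an illustrative sketch, not the real implementation:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Sketch: derive the set of deployable sites from LDAP group names such as
// "Deploy-SiteA" or "Deploy-All". The parsing code is illustrative only.
static IReadOnlyCollection<string> DeployableSites(
    IEnumerable<string> ldapGroups, IReadOnlyCollection<string> allSites)
{
    var groups = ldapGroups.ToHashSet(StringComparer.OrdinalIgnoreCase);

    if (groups.Contains("Deploy-All"))
        return allSites;                                  // system-wide scope

    return allSites
        .Where(site => groups.Contains($"Deploy-{site}")) // site-scoped
        .ToList();
}
```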

## 10. Audit Logging

Audit logging is implemented as part of the **Configuration Database** component via the `IAuditService` interface.

### 10.1 Storage

- Audit logs are stored in the **configuration MS SQL database** alongside system config data, enabling direct querying.
- Entries are **append-only** — never modified or deleted. No retention policy — retained indefinitely.

### 10.2 Scope

All system-modifying actions are logged, including:

- **Template changes**: Create, edit, delete templates.
- **Script changes**: Template script and shared script create, edit, delete.
- **Alarm changes**: Create, edit, delete alarm definitions.
- **Instance changes**: Create, override values, bind connections, area assignment, disable, enable, delete.
- **Deployments**: Who deployed what to which instance, and the result (success/failure).
- **System-wide artifact deployments**: Who deployed shared scripts / external system definitions / DB connections / data connections / notification lists / SMTP config, to which site(s), and the result.
- **External system definition changes**: Create, edit, delete.
- **Database connection changes**: Create, edit, delete.
- **Notification list changes**: Create, edit, delete lists and recipients.
- **Inbound API changes**: API key create, enable/disable, delete. API method create, edit, delete.
- **Area changes**: Create, edit, delete area definitions.
- **Site & data connection changes**: Create, edit, delete.
- **Security/admin changes**: Role mapping changes, site permission changes.

### 10.3 Detail Level

- Each audit log entry records the **state of the entity after the change**, serialized as JSON. Only the after-state is stored — change history is reconstructed by comparing consecutive entries for the same entity at query time.
- Each entry includes: **who** (authenticated user), **what** (action, entity type, entity ID, entity name), **when** (timestamp), and **state** (JSON after-state, null for deletes).
- **One entry per save operation** — when a user edits a template and changes multiple attributes in one save, a single entry captures the full entity state.
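
A single audit entry might therefore look like the following. This is an illustrative example of the who/what/when/state fields described above, not a schema definition; all names and values are assumptions:

```json
{
  "user": "DOMAIN\\jsmith",
  "action": "Update",
  "entityType": "Template",
  "entityId": 42,
  "entityName": "PumpStation",
  "timestampUtc": "2024-03-15T09:21:07Z",
  "state": {
    "name": "PumpStation",
    "attributes": [
      { "name": "FlowRate", "dataType": "Double" }
    ]
  }
}
```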

### 10.4 Transactional Guarantee

- Audit entries are written **synchronously** within the same database transaction as the change (via the unit-of-work pattern). If the change succeeds, the audit entry is guaranteed to be recorded. If the change rolls back, the audit entry rolls back too.
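
In unit-of-work terms, the guarantee looks roughly like this. `IAuditService` is the interface named in this section; the unit-of-work and repository members are illustrative assumptions:

```csharp
// Sketch: the audit write shares the change's transaction. IAuditService is
// named in Section 10; UnitOfWork, repository, and method shapes are assumptions.
using var uow = await unitOfWorkFactory.BeginAsync();   // opens one DB transaction

await templateRepository.SaveAsync(template, uow);      // the actual change
await auditService.RecordAsync(                         // audit entry, same transaction
    user, "Update", "Template", template.Id, template.Name, ToJson(template), uow);

await uow.CommitAsync();   // both persist together; a rollback discards both
```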

## 11. Health Monitoring

### 11.1 Monitored Metrics

The central cluster monitors the health of each site cluster, including:

- **Site cluster online/offline status**: Whether the site is reachable.
- **Active vs. standby node status**: Which node is active and which is standby.
- **Data connection health**: Connected/disconnected status per data connection at the site.
- **Script error rates**: Frequency of script failures at the site.
- **Alarm evaluation errors**: Frequency of alarm evaluation failures at the site.
- **Store-and-forward buffer depth**: Number of messages currently queued (broken down by external system calls, notifications, and cached database writes).

### 11.2 Reporting

- Site clusters **report health metrics to central** periodically.
- Health status is **visible in the central UI** — no automated alerting/notifications for now.

## 12. Site-Level Event Logging

### 12.1 Events Logged

Sites log operational events locally, including:

- **Script executions**: Start, complete, error (with error details).
- **Alarm events**: Alarm activated, alarm cleared (which alarm, which instance, when). Alarm evaluation errors.
- **Deployment applications**: Configuration received from central, applied successfully or failed. Script compilation results.
- **Data connection status changes**: Connected, disconnected, reconnected per connection.
- **Store-and-forward activity**: Message queued, delivered, retried, parked.
- **Instance lifecycle**: Instance enabled, disabled, deleted.

### 12.2 Storage

- Event logs are stored in **local SQLite** on each site node.
- **Retention policy**: 30 days. Events older than 30 days are automatically purged.

### 12.3 Central Access

- The central UI can **query site event logs remotely**, following the same pattern as parked message management — central requests data from the site over Akka.NET remoting.

## 13. Management Service & CLI

### 13.1 Management Service

- The central cluster exposes a **ManagementActor** that provides programmatic access to all administrative operations — the same operations available through the Central UI.
- The ManagementActor registers with the Akka.NET **ClusterClientReceptionist**, allowing external tools to communicate with it via ClusterClient without joining the cluster.
- The ManagementActor enforces the **same role-based authorization** as the Central UI. Every incoming message carries the authenticated user's identity and roles.
- All mutating operations performed through the Management Service are **audit logged** via IAuditService, identical to operations performed through the Central UI.
- The ManagementActor runs on the **active central node** and fails over with it. ClusterClient handles reconnection transparently.

### 13.2 CLI

- The system provides a standalone **command-line tool** (`scadalink`) for scripting and automating administrative operations.
- The CLI connects to the ManagementActor via Akka.NET **ClusterClient** — it does not join the cluster as a full member and does not use HTTP/REST.
- The CLI authenticates the user against **LDAP/AD** (direct bind, same mechanism as the Central UI) and includes the authenticated identity in every message sent to the ManagementActor.
- CLI commands mirror all Management Service operations: templates, instances, sites, data connections, deployments, external systems, notifications, security (API keys and role mappings), audit log queries, and health status.
- Output is **JSON by default** (machine-readable, suitable for scripting) with an optional `--format table` flag for human-readable tabular output.
- Configuration is resolved from command-line options, **environment variables** (`SCADALINK_CONTACT_POINTS`, `SCADALINK_LDAP_SERVER`, etc.), or a **configuration file** (`~/.scadalink/config.json`).
- The CLI is a separate executable from the Host binary — it is deployed on any Windows machine with network access to the central cluster.
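
Typical invocations might look like the following. Only `scadalink`, the `--format table` flag, and the two environment variable names are documented above; the subcommand names and the contact-point address format are illustrative assumptions:

```shell
# Contact points and LDAP server via the documented environment variables
# (the Akka address format shown is an assumption for illustration).
export SCADALINK_CONTACT_POINTS="akka.tcp://scada@central-a:4053,akka.tcp://scada@central-b:4053"
export SCADALINK_LDAP_SERVER="ldap.corp.example"

# JSON output by default (hypothetical subcommand names):
scadalink instances list --site SiteA

# Human-readable output via the documented flag:
scadalink deployments status --format table
```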

## 14. General Conventions

### 14.1 Timestamps

- All timestamps throughout the system are stored, transmitted, and processed in **UTC**.
- This applies to: attribute value timestamps, alarm state change timestamps, audit log entries, event log entries, deployment records, health reports, store-and-forward message timestamps, and all inter-node messages.
- Local time conversion for display is a **Central UI concern only** — no other component performs timezone conversion.
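
In .NET terms, the convention amounts to the following sketch (not prescribed implementation code; the Windows time zone ID is an arbitrary example):

```csharp
// Everywhere: record and transmit UTC only.
var stamp = DateTime.UtcNow;

// Central UI only: convert to local time at the display boundary.
// (Time zone ID is an example; the system runs on Windows, so Windows IDs apply.)
var local = TimeZoneInfo.ConvertTimeFromUtc(
    stamp, TimeZoneInfo.FindSystemTimeZoneById("W. Europe Standard Time"));
```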

---

*All initial high-level requirements have been captured. This document will continue to be updated as the design evolves.*