scadalink-design/Component-SiteRuntime.md
Joseph Doherty 19c7e6880f Refine Data Connection Layer: error handling, reconnection, write failures, health reporting
Add connection lifecycle (fixed-interval auto-reconnect, immediate bad quality on
disconnect, transparent re-subscribe), synchronous write failure errors to scripts,
periodic tag path resolution retry, and enhanced health reporting with tag resolution
counts. Update cross-references in Health Monitoring and Site Runtime.
2026-03-16 07:51:37 -04:00


# Component: Site Runtime
## Purpose
The Site Runtime component manages the execution of deployed machine instances at site clusters. It encompasses the actor hierarchy that represents running instances, their scripts, and their alarms. It owns the site-side deployment lifecycle (receiving configs from central, compiling scripts, creating actors), script execution, alarm evaluation, and the site-wide Akka stream for attribute and alarm state changes.
This component replaces the previously separate Script Engine and Alarm Engine concepts, unifying them under a single actor hierarchy rooted at the Deployment Manager singleton.
## Location
Site clusters only.
## Responsibilities
- Run the Deployment Manager singleton (Akka.NET cluster singleton) on the active site node.
- On startup (or failover), read all deployed configurations from local SQLite and re-create the full actor hierarchy.
- Receive deployment commands from central: new/updated instance configurations, instance lifecycle commands (disable, enable, delete), and system-wide artifact updates.
- Compile C# scripts when deployments are received.
- Manage the Instance Actor hierarchy (Instance Actors, Script Actors, Alarm Actors).
- Execute scripts via Script Actors with support for concurrent execution.
- Evaluate alarm conditions via Alarm Actors and manage alarm state.
- Maintain the site-wide Akka stream for attribute value and alarm state changes.
- Execute shared scripts inline as compiled code libraries (no separate actors).
- Enforce script call recursion limits.
---
## Actor Hierarchy
```
Deployment Manager Singleton (Cluster Singleton)
├── Instance Actor ("MachineA-001")
│   ├── Script Actor ("MonitorSpeed") — coordinator
│   │   └── Script Execution Actor — short-lived, per invocation
│   ├── Script Actor ("CalculateOEE") — coordinator
│   │   └── Script Execution Actor — short-lived, per invocation
│   ├── Alarm Actor ("OverTemp") — coordinator
│   │   └── Alarm Execution Actor — short-lived, per on-trigger invocation
│   └── Alarm Actor ("LowPressure") — coordinator
├── Instance Actor ("MachineA-002")
│   └── ...
└── ...
```
---
## Deployment Manager Singleton
### Role
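- Akka.NET **cluster singleton**: at most one instance runs in the site cluster at any time, on the active node.
- On failover, Akka.NET starts a new singleton instance on the new active node. No in-memory state carries over, which is why startup rebuilds everything from local storage.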
### Startup Behavior
1. Read all deployed configurations from local SQLite.
2. Read all shared scripts from local storage.
3. Compile all scripts (instance scripts, alarm on-trigger scripts, shared scripts).
4. Create Instance Actors for all deployed, **enabled** instances as child actors.
5. Make compiled shared script code available to all Script Actors.
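The startup sequence can be sketched in Python as neutral pseudocode; the actual site runs C# on Akka.NET. `DeployedConfig`, `compile_script`, and `create_instance_actor` are hypothetical stand-ins for the SQLite rows, the C# script compiler, and Akka.NET child-actor creation. The sketch shows only the ordering and the enabled-only filter. It is not the real implementation.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class DeployedConfig:
    """Hypothetical shape of one flattened instance configuration read from SQLite."""
    unique_name: str
    enabled: bool
    scripts: Dict[str, str] = field(default_factory=dict)  # script name -> C# source


def start_deployment_manager(
    configs: List[DeployedConfig],
    shared_sources: Dict[str, str],
    compile_script: Callable[[str], Any],
    create_instance_actor: Callable[[DeployedConfig, Dict[str, Any], Dict[str, Any]], Any],
) -> Dict[str, Any]:
    # Step 3: compile all scripts, shared scripts first, then each instance's scripts.
    shared = {name: compile_script(src) for name, src in shared_sources.items()}
    compiled = {
        cfg.unique_name: {name: compile_script(src) for name, src in cfg.scripts.items()}
        for cfg in configs
    }
    # Steps 4-5: only enabled instances get an Instance Actor; each one is
    # handed the compiled shared library alongside its own compiled scripts.
    return {
        cfg.unique_name: create_instance_actor(cfg, compiled[cfg.unique_name], shared)
        for cfg in configs
        if cfg.enabled
    }
```

Disabled instances keep their stored configuration but get no actor, so a later Enable command can rebuild them without a redeployment.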
### Deployment Handling
- Receives flattened instance configurations from central via the Communication Layer.
- Stores the new configuration in local SQLite.
- Compiles all scripts in the configuration.
- Creates a new Instance Actor (for new instances) or replaces the existing one (for redeployments).
- For redeployments: the existing Instance Actor and all its children are stopped, then a new Instance Actor is created with the updated configuration. Subscriptions are re-established.
- Reports deployment result (success/failure) back to central.
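The following is a minimal Python sketch of this deployment handling, using in-memory dictionaries in place of SQLite and live actors. `DeploymentHandler`, `ScriptCompileError`, and `DeploymentResult` are hypothetical names. The sketch compiles before persisting. That ordering is an assumption made here so that a failed compile leaves no partial state, as the Script Compilation Errors section requires.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


class ScriptCompileError(Exception):
    """Hypothetical error raised by the compile callback."""


@dataclass
class DeploymentResult:
    instance: str
    success: bool
    error: str = ""


class DeploymentHandler:
    """Hypothetical model of site-side deployment handling."""

    def __init__(self, compile_script: Callable[[str], object]) -> None:
        self.compile_script = compile_script
        self.store: Dict[str, Dict[str, str]] = {}       # stand-in for local SQLite
        self.running: Dict[str, Dict[str, object]] = {}  # stand-in for live Instance Actors
        self.stopped: List[str] = []                     # stop log, for illustration only

    def deploy(self, instance: str, scripts: Dict[str, str]) -> DeploymentResult:
        # Compile every script first: one failure rejects the whole deployment,
        # leaving the stored config and any running actor untouched.
        try:
            compiled = {name: self.compile_script(src) for name, src in scripts.items()}
        except ScriptCompileError as exc:
            return DeploymentResult(instance, False, str(exc))
        self.store[instance] = dict(scripts)
        if instance in self.running:
            # Redeployment: stop the old actor tree before creating the new one.
            del self.running[instance]
            self.stopped.append(instance)
        self.running[instance] = compiled
        return DeploymentResult(instance, True)  # reported back to central
```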
### System-Wide Artifact Handling
- Receives updated shared scripts, external system definitions, database connection definitions, and notification lists from central.
- Stores artifacts in local SQLite/filesystem.
- Recompiles shared scripts and makes updated code available to all Script Actors.
### Instance Lifecycle Commands
- **Disable**: Stops the Instance Actor and its children. Retains the deployed configuration in SQLite so the instance can be re-enabled without redeployment.
- **Enable**: Creates a new Instance Actor from the stored configuration (same as startup).
- **Delete**: Stops the Instance Actor and its children, removes the deployed configuration from local SQLite. Does **not** clear store-and-forward messages.
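The three lifecycle commands can be modeled as operations on two collections: stored configurations and running actors. The Python sketch below uses a hypothetical `InstanceLifecycle` class, with a plain list standing in for the store-and-forward queue. It shows what each command keeps and what it discards.

```python
from typing import Any, Dict, List, Set


class InstanceLifecycle:
    """Hypothetical model of Disable / Enable / Delete on one site."""

    def __init__(self, stored: Dict[str, Any], pending_messages: List[str]) -> None:
        self.stored = dict(stored)                 # stand-in for deployed configs in SQLite
        self.running: Set[str] = set(stored)       # assume every stored instance starts enabled
        self.pending_messages = pending_messages   # stand-in for the store-and-forward queue

    def disable(self, name: str) -> None:
        # Stop the actor tree, but keep the config so Enable needs no redeployment.
        self.running.discard(name)

    def enable(self, name: str) -> None:
        # Re-create the actor from the stored config, as on startup.
        if name not in self.stored:
            raise KeyError(f"{name} has no deployed configuration")
        self.running.add(name)

    def delete(self, name: str) -> None:
        # Stop the actor tree and drop the config. Queued store-and-forward
        # messages are deliberately left alone.
        self.running.discard(name)
        self.stored.pop(name, None)
```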
---
## Instance Actor
### Role
- **Single source of truth** for all runtime state of a deployed instance.
- Holds all attribute values (both static configuration values and live values from data connections).
- Holds current alarm states (active/normal), updated by child Alarm Actors.
- Publishes attribute value changes and alarm state changes to the site-wide Akka stream.
### Initialization
1. Load all attribute values from the flattened configuration (static defaults).
2. Register data source references with the Data Connection Layer for subscriptions.
3. Create child Script Actors (one per script defined on the instance).
4. Create child Alarm Actors (one per alarm defined on the instance).
### Attribute Value Updates
- Receives tag value updates from the Data Connection Layer for attributes with data source references.
- Updates the in-memory attribute value.
- Notifies subscribed child Script Actors and Alarm Actors of the change.
- Publishes the change to the site-wide Akka stream.
### Stream Message Format
- **Attribute changes**: `[InstanceUniqueName].[AttributePath].[AttributeName]`, attribute value, attribute quality, attribute change timestamp.
- **Alarm state changes**: `[InstanceUniqueName].[AlarmName]`, alarm state (active/normal), priority, timestamp.
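The two stream message shapes can be sketched as immutable Python records. The class and field names are hypothetical. The string encodings for quality ("Good"/"Bad") and alarm state are also assumptions. Skipping the separator when `attribute_path` is empty (a top-level attribute) is an assumption as well; the format above does not say how that case is rendered.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any


@dataclass(frozen=True)
class AttributeChanged:
    instance: str          # InstanceUniqueName
    attribute_path: str    # may be "" for a top-level attribute (assumption)
    attribute_name: str
    value: Any
    quality: str           # e.g. "Good" / "Bad" (hypothetical encoding)
    timestamp: datetime

    @property
    def key(self) -> str:
        # [InstanceUniqueName].[AttributePath].[AttributeName], skipping an empty path
        parts = (self.instance, self.attribute_path, self.attribute_name)
        return ".".join(p for p in parts if p)


@dataclass(frozen=True)
class AlarmStateChanged:
    instance: str
    alarm_name: str
    state: str             # "active" or "normal"
    priority: int
    timestamp: datetime

    @property
    def key(self) -> str:
        # [InstanceUniqueName].[AlarmName]
        return f"{self.instance}.{self.alarm_name}"
```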
### GetAttribute / SetAttribute
- **GetAttribute**: Returns the current in-memory value for the requested attribute.
- **SetAttribute** (for attributes with data source reference): Sends a write request to the Data Connection Layer. The DCL writes to the physical device. If the write fails (connection down, device rejection, timeout), the error is returned synchronously to the calling script for handling. On success, the existing subscription picks up the confirmed value from the device and sends it back as a value update, which then updates the in-memory value. The in-memory value is **not** optimistically updated.
- **SetAttribute** (for static attributes): Updates the in-memory value directly. This change is ephemeral — it is lost on restart and resets to the deployed configuration value.
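The Python sketch below models the two SetAttribute paths. A hypothetical `dcl_write` callback stands in for the Data Connection Layer; it returns `None` on success or an error message on failure. A failed write raises immediately to the calling script. A successful write changes nothing locally until the subscription delivers the confirmed value.

```python
from typing import Any, Callable, Dict, Optional


class WriteFailed(Exception):
    """Hypothetical error surfaced synchronously to the calling script."""


class InstanceAttributes:
    """Hypothetical model of an Instance Actor's attribute handling."""

    def __init__(self, defaults: Dict[str, Any], tag_paths: Dict[str, str],
                 dcl_write: Callable[[str, Any], Optional[str]]) -> None:
        self.values = dict(defaults)       # in-memory values, seeded from the deployed config
        self.tag_paths = dict(tag_paths)   # data-connected attribute -> tag path
        self.dcl_write = dcl_write         # None on success, else an error message

    def get_attribute(self, name: str) -> Any:
        return self.values[name]

    def set_attribute(self, name: str, value: Any) -> None:
        tag = self.tag_paths.get(name)
        if tag is None:
            # Static attribute: in-memory update only, lost on restart.
            self.values[name] = value
            return
        error = self.dcl_write(tag, value)
        if error is not None:
            raise WriteFailed(f"write to {tag} failed: {error}")
        # Success: deliberately no optimistic update. The subscription
        # delivers the confirmed device value via on_value_update().

    def on_value_update(self, name: str, value: Any) -> None:
        self.values[name] = value
```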
### Debug View Support
- On request from central (via Communication Layer), the Instance Actor provides a **snapshot** of all current attribute values and alarm states.
- Subsequent changes are delivered via the site-wide Akka stream, filtered by instance unique name.
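The snapshot-then-filter pattern can be sketched as a Python generator. The `debug_view` function is hypothetical. Keys are assumed to be the stream keys described under Stream Message Format, and filtering on the instance name followed by a dot is a choice made here so that one instance name cannot match another that merely shares a prefix.

```python
from typing import Any, Dict, Iterable, Iterator, Tuple

Change = Tuple[str, Any]  # (stream key, payload)


def debug_view(snapshot: Dict[str, Any], stream: Iterable[Change], instance: str) -> Iterator[Change]:
    """Yield the snapshot first, then only stream changes for `instance`."""
    yield from snapshot.items()
    # Match on "name." so "MachineA-001" does not also match "MachineA-0010".
    prefix = instance + "."
    for key, payload in stream:
        if key.startswith(prefix):
            yield key, payload
```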
### Supervision
- The Instance Actor supervises all child Script and Alarm Actors.
- When the Instance Actor is stopped (due to disable, delete, or redeployment), Akka.NET automatically stops all child actors.
---
## Script Actor
### Role
- **Coordinator** for a single script definition on an instance.
- Holds the compiled script code and trigger configuration.
- Manages trigger evaluation (interval timer, value change detection, conditional evaluation).
- Spawns short-lived Script Execution Actors for each invocation.
### Trigger Management
- **Interval**: The Script Actor manages an internal timer. When the timer fires, it spawns a Script Execution Actor.
- **Value Change**: The Script Actor subscribes to attribute change notifications from its parent Instance Actor for the specific monitored attribute. When the attribute changes, it spawns a Script Execution Actor.
- **Conditional**: The Script Actor subscribes to attribute change notifications for the monitored attribute. On each update, it evaluates the condition (whether the value equals, or does not equal, a configured value). If the condition is met, it spawns a Script Execution Actor.
- **Minimum time between runs**: If configured, the Script Actor tracks the last execution time and skips trigger invocations that fire before the minimum interval has elapsed.
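The trigger decision itself can be separated from timers and subscriptions, as in the hypothetical Python `TriggerGate` below. Each handler returns `True` when a Script Execution Actor should be spawned. Times are plain seconds. Recording the last run at trigger time, rather than at completion, is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class Trigger:
    kind: str                        # "interval", "value_change", or "conditional"
    attribute: Optional[str] = None  # monitored attribute (value_change / conditional)
    operator: str = "eq"             # conditional only: "eq" or "ne"
    target: Any = None               # conditional only
    min_interval_s: float = 0.0      # minimum time between runs; 0 disables the check


class TriggerGate:
    """Hypothetical decision logic of a Script Actor's trigger handling."""

    def __init__(self, trigger: Trigger) -> None:
        self.trigger = trigger
        self.last_run: Optional[float] = None

    def _throttle(self, now: float) -> bool:
        # Skip invocations that arrive before the minimum interval has elapsed.
        gap = self.trigger.min_interval_s
        if gap > 0 and self.last_run is not None and now - self.last_run < gap:
            return False
        self.last_run = now
        return True

    def on_timer(self, now: float) -> bool:
        return self.trigger.kind == "interval" and self._throttle(now)

    def on_attribute_changed(self, name: str, value: Any, now: float) -> bool:
        tr = self.trigger
        if tr.attribute != name:
            return False
        if tr.kind == "value_change":
            return self._throttle(now)
        if tr.kind == "conditional":
            met = (value == tr.target) if tr.operator == "eq" else (value != tr.target)
            # Short-circuit: last_run only advances when the condition holds.
            return met and self._throttle(now)
        return False
```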
### Concurrent Execution
- Each invocation spawns a **new Script Execution Actor** as a child.
- Multiple Script Execution Actors can run concurrently (e.g., a trigger fires while a previous `Instance.CallScript` invocation is still running).
- The Script Actor coordinates but does not block on child completion.
### Script Execution Actor
- **Short-lived** child actor created per invocation.
- Receives: compiled script code, input parameters, reference to the parent Instance Actor, current call depth.
- Executes the script in the Akka actor context.
- Has access to the full Script Runtime API (see below).
- Returns the script's return value (if defined) to the caller, then stops.
### Handling `Instance.CallScript`
- When an external caller (another Script Execution Actor, an Alarm Execution Actor, or a routed call from the Inbound API) sends a `CallScript` message to the Script Actor, it spawns a Script Execution Actor to handle the call.
- The caller uses the **Akka ask pattern** and receives the return value when the execution completes.
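The concurrency model can be approximated with Python `asyncio`. In this analogy a task stands in for a Script Execution Actor, and awaiting it stands in for the Akka ask. `ScriptCoordinator` and `call_script` are hypothetical. `asyncio.shield` is used so that a timed-out caller stops waiting without cancelling the execution, which mirrors how an Akka ask timeout affects only the asker.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, Set

ScriptBody = Callable[[Dict[str, Any]], Awaitable[Any]]


class ScriptCoordinator:
    """Hypothetical asyncio stand-in for a Script Actor (one task per invocation)."""

    def __init__(self, script: ScriptBody) -> None:
        self.script = script
        self.running: Set["asyncio.Task[Any]"] = set()

    def invoke(self, params: Dict[str, Any]) -> "asyncio.Task[Any]":
        # Spawn a fresh "execution actor" (here, a task) and return at once.
        # The coordinator never awaits it, so later triggers are not blocked.
        task = asyncio.get_running_loop().create_task(self.script(params))
        self.running.add(task)
        task.add_done_callback(self.running.discard)
        return task


async def call_script(coordinator: ScriptCoordinator, params: Dict[str, Any], timeout: float) -> Any:
    # Ask-style call: the caller awaits the result. shield() means a timeout
    # only abandons the wait; it does not cancel the running execution.
    return await asyncio.wait_for(asyncio.shield(coordinator.invoke(params)), timeout)
```

A trigger-driven run and a `CallScript` can overlap. The call below returns while the earlier, slower invocation is still in flight.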
---
## Alarm Actor
### Role
- **Coordinator** for a single alarm definition on an instance.
- Evaluates alarm trigger conditions against attribute value updates.
- Manages alarm state (active/normal) in memory.
- Executes on-trigger scripts when the alarm activates.
### Alarm Evaluation
- Subscribes to attribute change notifications from its parent Instance Actor for the attribute(s) referenced by its trigger definition.
- On each value update, evaluates the trigger condition:
  - **Value Match**: Incoming value equals the predefined target.
  - **Range Violation**: Value is outside the allowed min/max range.
  - **Rate of Change**: Value change rate exceeds the defined threshold over time.
- When the condition is met and the alarm is currently in **normal** state, the alarm transitions to **active**:
  - Updates the alarm state on the parent Instance Actor (which publishes to the Akka stream).
  - If an on-trigger script is defined, spawns an Alarm Execution Actor to execute it.
- When the condition clears and the alarm is in **active** state, the alarm transitions to **normal**:
  - Updates the alarm state on the parent Instance Actor.
  - No script execution on clear.
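The evaluation and state transitions can be sketched in Python as a small state machine. `AlarmTrigger` and `AlarmEvaluator` are hypothetical, and the `on_trigger` callback stands in for spawning an Alarm Execution Actor. The design does not define how rate of change is measured. This sketch assumes the rate is measured between consecutive samples, in units per second.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional, Tuple


@dataclass
class AlarmTrigger:
    kind: str                    # "value_match", "range", or "rate"
    target: Any = None           # value_match
    low: float = float("-inf")   # range
    high: float = float("inf")   # range
    max_rate: float = 0.0        # rate: allowed |change| per second (assumed unit)


class AlarmEvaluator:
    """Hypothetical model of an Alarm Actor's evaluation and state."""

    def __init__(self, trigger: AlarmTrigger, on_trigger: Optional[Callable[[], None]] = None) -> None:
        self.trigger = trigger
        self.on_trigger = on_trigger
        self.active = False                           # every alarm starts normal
        self._prev: Optional[Tuple[float, float]] = None  # (value, time) for rate checks

    def _condition(self, value: Any, t: float) -> bool:
        tr = self.trigger
        if tr.kind == "value_match":
            return value == tr.target
        if tr.kind == "range":
            return value < tr.low or value > tr.high
        if tr.kind == "rate":
            prev, self._prev = self._prev, (value, t)
            if prev is None or t <= prev[1]:
                return False
            return abs(value - prev[0]) / (t - prev[1]) > tr.max_rate
        raise ValueError(f"unknown trigger kind {tr.kind!r}")

    def update(self, value: Any, t: float) -> Optional[str]:
        """Return "active" or "normal" on a state transition, else None."""
        met = self._condition(value, t)
        if met and not self.active:
            self.active = True
            if self.on_trigger is not None:
                self.on_trigger()  # stands in for spawning an Alarm Execution Actor
            return "active"
        if not met and self.active:
            self.active = False    # no script runs on clear
            return "normal"
        return None
```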
### Alarm State
- Held **in memory** only — not persisted to SQLite.
- On restart (or failover), alarm states are re-evaluated from incoming values. All alarms start in normal state and transition to active when conditions are detected.
### Alarm Execution Actor
- **Short-lived** child actor created when an on-trigger script needs to execute.
- Same pattern as Script Execution Actor — receives compiled code, executes, returns, and stops.
- Has access to the Instance Actor for `GetAttribute`/`SetAttribute`.
- **Can** call instance scripts via `Instance.CallScript()` — sends an ask message to the appropriate sibling Script Actor.
- Instance scripts **cannot** call alarm on-trigger scripts — the call direction is one-way.
---
## Shared Script Library
- Shared scripts are compiled at the site when received from central.
- Compiled code is stored in memory and made available to all Script Actors.
- When a Script Execution Actor calls `Scripts.CallShared("scriptName", params)`, the shared script code executes **inline** in the Script Execution Actor's context — it is a direct method invocation, not an actor message.
- This avoids a throughput bottleneck: because there is no shared script actor, concurrent callers are never queued behind one another in a single actor's mailbox.
- Shared scripts have access to the same runtime API as instance scripts (GetAttribute, SetAttribute, external systems, notifications, databases).
---
## Script Runtime API
Available to all Script Execution Actors and Alarm Execution Actors:
### Instance Attributes
- `Instance.GetAttribute("name")` — Read an attribute value from the parent Instance Actor.
- `Instance.SetAttribute("name", value)` — Write an attribute value. For data-connected attributes, writes to the DCL; for static attributes, updates in-memory directly.
### Other Scripts
- `Instance.CallScript("scriptName", parameters)` — Send an ask message to a sibling Script Actor. The target Script Actor spawns a Script Execution Actor, executes, and returns the result. The call includes the current recursion depth.
- `Scripts.CallShared("scriptName", parameters)` — Execute shared script code inline (direct method invocation). The call includes the current recursion depth.
### External Systems
- Access to predefined external system API methods (see External System Gateway component).
### Notifications
- `Notify.To("listName").Send("subject", "message")` — Send an email notification via a named notification list.
### Database Access
- `Database.Connection("connectionName")` — Obtain a raw MS SQL client connection (ADO.NET) for synchronous read/write.
- `Database.CachedWrite("connectionName", "sql", parameters)` — Submit a write operation for store-and-forward delivery.
### Recursion Limit
- Every script call (`Instance.CallScript` and `Scripts.CallShared`) increments a call depth counter.
- If the counter exceeds the maximum recursion depth (default: 10), the call fails with an error.
- The error is logged to the site event log.
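The depth rule can be sketched in Python with a per-execution context that carries the depth counter. `ScriptContext`, `RecursionLimitExceeded`, and the `event_log` callback are hypothetical names. `call_script` is modeled as a direct call, although the runtime sends an ask to a sibling Script Actor. `call_shared` runs inline, as the Shared Script Library section describes. With the default of 10, a chain of ten nested calls succeeds and the eleventh fails.

```python
from typing import Any, Callable, Dict

MAX_CALL_DEPTH = 10  # design default

Script = Callable[..., Any]


class RecursionLimitExceeded(Exception):
    pass


class ScriptContext:
    """Hypothetical per-execution context that carries the current call depth."""

    def __init__(self, instance_scripts: Dict[str, Script], shared_scripts: Dict[str, Script],
                 event_log: Callable[[str], None], depth: int = 0) -> None:
        self.instance_scripts = instance_scripts
        self.shared_scripts = shared_scripts
        self.event_log = event_log
        self.depth = depth

    def _deeper(self, name: str) -> "ScriptContext":
        depth = self.depth + 1
        if depth > MAX_CALL_DEPTH:
            self.event_log(f"call to {name!r} rejected: depth {depth} > {MAX_CALL_DEPTH}")
            raise RecursionLimitExceeded(name)
        return ScriptContext(self.instance_scripts, self.shared_scripts, self.event_log, depth)

    def call_script(self, name: str, **params: Any) -> Any:
        # Instance.CallScript: an ask in the runtime, a direct call here,
        # passing a context with the incremented depth.
        return self.instance_scripts[name](self._deeper(name), **params)

    def call_shared(self, name: str, **params: Any) -> Any:
        # Scripts.CallShared: inline execution, i.e. a plain function call.
        return self.shared_scripts[name](self._deeper(name), **params)
```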
---
## Script Scoping Rules
- Scripts can only read/write attributes on **their own instance** (via the parent Instance Actor).
- Scripts can call other scripts on **their own instance** (via sibling Script Actors).
- Scripts can call **shared scripts** (inline execution).
- Scripts **cannot** access other instances' attributes or scripts.
- Alarm on-trigger scripts **can** call instance scripts; instance scripts **cannot** call alarm on-trigger scripts.
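In the actor design, these rules largely follow from which actor references an execution holds. The hypothetical Python checks below state them explicitly. Treating alarm on-trigger scripts as never being a call target goes slightly beyond the text, which only rules out calls from instance scripts, so it is an assumption of the sketch.

```python
from typing import Optional


def may_call(caller_instance: str, target_kind: str, target_instance: Optional[str] = None) -> bool:
    """Scoping check for a call made by an instance script or an alarm on-trigger script."""
    if target_kind == "shared":
        return True  # shared scripts are callable from any script
    if target_kind == "instance_script":
        return target_instance == caller_instance  # own instance only
    if target_kind == "alarm_script":
        return False  # assumption: alarm on-trigger scripts are never call targets
    raise ValueError(f"unknown target kind {target_kind!r}")


def may_access_attribute(caller_instance: str, target_instance: str) -> bool:
    # Attribute reads and writes are limited to the script's own instance.
    return target_instance == caller_instance
```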
---
## Error Handling
### Script Errors
- Unhandled exceptions and timeouts in Script Execution Actors are **logged locally** to the site event log.
- The Script Actor (coordinator) is **not affected** — it remains active for future trigger events.
- Script failures are **not reported to central** (except as aggregated error rate metrics via Health Monitoring).
### Alarm Evaluation Errors
- Errors during alarm condition evaluation are **logged locally** to the site event log.
- The Alarm Actor remains active and continues evaluating on subsequent value updates.
- Alarm evaluation error rates are reported to central via Health Monitoring.
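The observable error behavior can be sketched in Python. A failed execution is logged locally and counted for Health Monitoring, and the coordinator remains usable. `IsolatingCoordinator`, the `site_event_log` logger name, and the metric keys are hypothetical. Timeouts are not modeled.

```python
import logging
from collections import Counter
from typing import Any, Callable, Optional

site_event_log = logging.getLogger("site_event_log")  # hypothetical logger name


class IsolatingCoordinator:
    """Hypothetical model: a failed execution is logged and counted; the coordinator carries on."""

    def __init__(self, name: str, metrics: Counter, metric_key: str = "script_errors") -> None:
        self.name = name
        self.metrics = metrics
        self.metric_key = metric_key  # e.g. "alarm_eval_errors" for an Alarm Actor

    def run(self, execution: Callable[[], Any]) -> Optional[Any]:
        try:
            return execution()
        except Exception:
            # Logged locally only; central sees just the aggregated count.
            site_event_log.exception("%s failed", self.name)
            self.metrics[self.metric_key] += 1
            return None
```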
### Script Compilation Errors
- If script compilation fails when a deployment is received, the entire deployment for that instance is **rejected**. No partial state is applied.
- The failure is reported back to central as a failed deployment.
- Note: Pre-deployment validation at central should catch compilation errors before they reach the site. Site-side compilation failures indicate an unexpected issue.
---
## Dependencies
- **Data Connection Layer**: Provides tag value updates to Instance Actors. Receives write requests from Instance Actors.
- **Store-and-Forward Engine**: Handles reliable delivery for external system calls, notifications, and cached database writes submitted by scripts.
- **External System Gateway**: Provides external system method invocations for scripts.
- **Notification Service**: Handles email delivery for scripts.
- **Communication Layer**: Receives deployments and lifecycle commands from central. Handles debug view requests. Reports deployment results.
- **Site Event Logging**: Records script executions, alarm events, deployment events, instance lifecycle events.
- **Health Monitoring**: Reports script error rates and alarm evaluation error rates.
- **Local SQLite**: Persists deployed configurations.
## Interactions
- **Deployment Manager (central)**: Receives flattened configurations, system-wide artifact updates, and instance lifecycle commands.
- **Data Connection Layer**: Bidirectional — receives value updates, sends write-back commands.
- **Communication Layer**: Receives commands from central, sends deployment results, serves debug view data.
- **Store-and-Forward Engine**: Scripts route cached writes, notifications, and external system calls here.
- **Health Monitoring**: Periodically reports error rate metrics.