scadalink-design/docs/requirements/Component-SiteRuntime.md
Joseph Doherty 416a03b782 feat: complete gRPC streaming channel — site host, docker config, docs, integration tests
Switch site host to WebApplicationBuilder with Kestrel HTTP/2 gRPC server,
add GrpcPort/keepalive config, wire SiteStreamManager as ISiteStreamSubscriber,
expose gRPC ports in docker-compose, add site seed script, update all 10
requirement docs + CLAUDE.md + README.md for the new dual-transport architecture.
2026-03-21 12:38:33 -04:00

Component: Site Runtime

Purpose

The Site Runtime component manages the execution of deployed machine instances at site clusters. It encompasses the actor hierarchy that represents running instances, their scripts, and their alarms. It owns the site-side deployment lifecycle (receiving configs from central, compiling scripts, creating actors), script execution, alarm evaluation, and the site-wide Akka stream for attribute and alarm state changes.

This component replaces the previously separate Script Engine and Alarm Engine concepts, unifying them under a single actor hierarchy rooted at the Deployment Manager singleton.

Location

Site clusters only.

Responsibilities

  • Run the Deployment Manager singleton (Akka.NET cluster singleton) on the active site node.
  • On startup (or failover), read all deployed configurations from local SQLite and re-create the full actor hierarchy.
  • Receive deployment commands from central: new/updated instance configurations, instance lifecycle commands (disable, enable, delete), and system-wide artifact updates.
  • Compile C# scripts when deployments are received.
  • Manage the Instance Actor hierarchy (Instance Actors, Script Actors, Alarm Actors).
  • Execute scripts via Script Actors with support for concurrent execution.
  • Evaluate alarm conditions via Alarm Actors and manage alarm state.
  • Maintain the site-wide Akka stream for attribute value and alarm state changes.
  • Execute shared scripts inline as compiled code libraries (no separate actors).
  • Enforce script call recursion limits.

Actor Hierarchy

Deployment Manager Singleton (Cluster Singleton)
├── Instance Actor ("MachineA-001")
│   ├── Script Actor ("MonitorSpeed") — coordinator
│   │   └── Script Execution Actor — short-lived, per invocation
│   ├── Script Actor ("CalculateOEE") — coordinator
│   │   └── Script Execution Actor — short-lived, per invocation
│   ├── Alarm Actor ("OverTemp") — coordinator
│   │   └── Alarm Execution Actor — short-lived, per on-trigger invocation
│   └── Alarm Actor ("LowPressure") — coordinator
├── Instance Actor ("MachineA-002")
│   └── ...
└── ...

Deployment Manager Singleton

Role

  • Akka.NET cluster singleton — guaranteed to run on exactly one node in the site cluster (the active node).
  • On failover, Akka.NET restarts the singleton on the new active node.

Startup Behavior

  1. Read all deployed configurations from local SQLite.
  2. Read all shared scripts from local storage.
  3. Compile all scripts (instance scripts, alarm on-trigger scripts, shared scripts).
  4. Create Instance Actors for all deployed, enabled instances as child actors. Instance Actors are created in staggered batches (e.g., 20 at a time with a short delay between batches) to prevent a reconnection storm — 500 Instance Actors all registering data subscriptions simultaneously would overwhelm OPC UA servers and network capacity.
  5. Make compiled shared script code available to all Script Actors.
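
The staggered creation in step 4 can be sketched as follows. This is a minimal, language-neutral illustration in Python rather than Akka.NET code. The names `staggered_batches` and `create_in_batches`, the 0.5 second delay, and the injectable `sleep` parameter are assumptions made for the example; a real singleton would more likely schedule the next batch as a message to itself instead of sleeping.

```python
import time
from typing import Callable, Iterator, List, Sequence


def staggered_batches(names: Sequence[str], batch_size: int = 20) -> Iterator[List[str]]:
    """Yield consecutive slices of at most batch_size instance names."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(names), batch_size):
        yield list(names[start:start + batch_size])


def create_in_batches(
    names: Sequence[str],
    create: Callable[[str], None],
    batch_size: int = 20,
    delay_seconds: float = 0.5,  # hypothetical value; the source only says "a short delay"
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Create instances batch by batch, pausing between batches.

    Returns the number of batches processed.
    """
    batches = 0
    for batch in staggered_batches(names, batch_size):
        if batches > 0:
            sleep(delay_seconds)  # pause only between batches, not before the first
        for name in batch:
            create(name)
        batches += 1
    return batches
```

With 500 instances and a batch size of 20, this produces 25 batches separated by 24 pauses, so data subscriptions are registered in small groups rather than all at once.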

Deployment Handling

  • Receives flattened instance configurations from central via the Communication Layer.
  • Stores the new configuration in local SQLite.
  • Compiles all scripts in the configuration.
  • Creates a new Instance Actor (for new instances) or updates an existing one (for redeployments).
  • For redeployments: the existing Instance Actor and all its children are stopped, then a new Instance Actor is created with the updated configuration. Subscriptions are re-established.
  • Reports deployment result (success/failure) back to central.

System-Wide Artifact Handling

  • Receives updated shared scripts, external system definitions, database connection definitions, data connection definitions, notification lists, and SMTP configuration from central.
  • Stores all artifacts in local SQLite. After artifact deployment, the site is fully self-contained — all runtime configuration is read from local SQLite with no access to the central configuration database.
  • Recompiles shared scripts and makes updated code available to all Script Actors.

Instance Lifecycle Commands

  • Disable: Stops the Instance Actor and its children. Retains the deployed configuration in SQLite so the instance can be re-enabled without redeployment.
  • Enable: Creates a new Instance Actor from the stored configuration (same as startup).
  • Delete: Stops the Instance Actor and its children, removes the deployed configuration from local SQLite. Does not clear store-and-forward messages.
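
The difference between the three commands comes down to what happens to the stored configuration. The toy model below illustrates this in Python; `LifecycleModel` is a hypothetical name, the `configs` dict stands in for local SQLite, and the `running` set stands in for live Instance Actors.

```python
from typing import Any, Dict, Set


class LifecycleModel:
    """Toy model of disable / enable / delete (not the real Deployment Manager)."""

    def __init__(self) -> None:
        self.configs: Dict[str, Any] = {}  # stands in for deployed configs in SQLite
        self.running: Set[str] = set()     # stands in for live Instance Actors

    def deploy(self, name: str, config: Any) -> None:
        self.configs[name] = config
        self.running.add(name)

    def disable(self, name: str) -> None:
        # Stop the actor but keep the configuration for a later enable.
        self.running.discard(name)

    def enable(self, name: str) -> None:
        # Re-create the actor from the stored configuration.
        if name not in self.configs:
            raise KeyError(f"no deployed configuration for {name}")
        self.running.add(name)

    def delete(self, name: str) -> None:
        # Stop the actor and remove the stored configuration.
        self.running.discard(name)
        self.configs.pop(name, None)
```

After `disable`, a call to `enable` succeeds without redeployment; after `delete`, it fails because no configuration remains.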

Debug Snapshot Routing

  • Receives DebugSnapshotRequest from the Communication Layer and forwards to the Instance Actor by unique name (same lookup as SubscribeDebugViewRequest).
  • Returns an error response if no Instance Actor exists for the requested unique name (instance not deployed or not enabled).

Instance Actor

Role

  • Single source of truth for all runtime state of a deployed instance.
  • Holds all attribute values (both static configuration values and live values from data connections).
  • Holds current alarm states (active/normal), updated by child Alarm Actors.
  • Publishes attribute value changes and alarm state changes to the site-wide Akka stream.

Initialization

  1. Load all attribute values from the flattened configuration (static defaults).
  2. Set quality to uncertain for all attributes that have a data source reference. Static attributes (no data source reference) have quality good. The uncertain quality persists until the first value update arrives from the Data Connection Layer, distinguishing "not yet received" from "known good" or "connection lost."
  3. Register data source references with the Data Connection Layer for subscriptions.
  4. Create child Script Actors (one per script defined on the instance).
  5. Create child Alarm Actors (one per alarm defined on the instance).
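
Steps 1 and 2 can be illustrated with a short Python sketch. The configuration shape (a dict with `default` and `data_source` keys) and the names `AttributeState` and `initialize_attributes` are assumptions made for this example, not the actual flattened configuration format.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass
class AttributeState:
    value: Any
    quality: str  # "good" or "uncertain" at initialization time


def initialize_attributes(config: Dict[str, dict]) -> Dict[str, AttributeState]:
    """Load static defaults and assign the initial quality for each attribute."""
    states: Dict[str, AttributeState] = {}
    for name, definition in config.items():
        has_source = definition.get("data_source") is not None
        states[name] = AttributeState(
            value=definition.get("default"),
            # Data-sourced attributes stay "uncertain" until the first update arrives.
            quality="uncertain" if has_source else "good",
        )
    return states
```

For example, an attribute bound to a tag starts as `uncertain`, while a purely static attribute starts as `good`.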

Attribute Value Updates

  • Receives tag value updates from the Data Connection Layer for attributes with data source references.
  • Updates the in-memory attribute value.
  • Notifies subscribed child Script Actors and Alarm Actors of the change.
  • Publishes the change to the site-wide Akka stream.

Stream Message Format

  • Attribute changes: [InstanceUniqueName].[AttributePath].[AttributeName], attribute value, attribute quality, attribute change timestamp.
  • Alarm state changes: [InstanceUniqueName].[AlarmName], alarm state (active/normal), priority, timestamp.
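
As an illustration of these two message shapes, the following Python sketch builds the dotted keys and carries the listed fields. The class and function names are hypothetical, the numeric `priority` and float `timestamp` types are assumptions, and omitting an empty attribute path from the key is also an assumption.

```python
from dataclasses import dataclass
from typing import Any


def attribute_key(instance: str, path: str, name: str) -> str:
    # Assumption: an empty attribute path is simply left out of the key.
    return ".".join(part for part in (instance, path, name) if part)


def alarm_key(instance: str, alarm: str) -> str:
    return f"{instance}.{alarm}"


@dataclass(frozen=True)
class AttributeChanged:
    key: str          # [InstanceUniqueName].[AttributePath].[AttributeName]
    value: Any
    quality: str
    timestamp: float


@dataclass(frozen=True)
class AlarmStateChanged:
    key: str          # [InstanceUniqueName].[AlarmName]
    state: str        # "active" or "normal"
    priority: int
    timestamp: float
```

For instance, `attribute_key("MachineA-001", "Drive.Motor", "Speed")` yields `MachineA-001.Drive.Motor.Speed`.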

GetAttribute / SetAttribute

  • GetAttribute: Returns the current in-memory value for the requested attribute.
  • SetAttribute (for attributes with data source reference): Sends a write request to the Data Connection Layer. The DCL writes to the physical device. If the write fails (connection down, device rejection, timeout), the error is returned synchronously to the calling script for handling. On success, the existing subscription picks up the confirmed value from the device and sends it back as a value update, which then updates the in-memory value. The in-memory value is not optimistically updated.
  • SetAttribute (for static attributes): Updates the in-memory value and persists the override to local SQLite. On restart (or failover), the Instance Actor loads persisted overrides on top of the deployed configuration, preserving runtime-modified values. A redeployment of the instance resets all persisted overrides to the new deployed configuration values.
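
The two SetAttribute paths behave quite differently, which a small Python model can make concrete. `InstanceState` is a hypothetical name, the `overrides` dict stands in for the SQLite override table, and `write_to_device` stands in for the Data Connection Layer; none of these reflect the real implementation.

```python
from typing import Any, Callable, Dict, Iterable


class InstanceState:
    """Toy model of GetAttribute / SetAttribute semantics."""

    def __init__(
        self,
        deployed: Dict[str, Any],
        data_sourced: Iterable[str],
        overrides: Dict[str, Any],
        write_to_device: Callable[[str, Any], None],
    ) -> None:
        self.data_sourced = set(data_sourced)
        self.overrides = overrides              # stands in for persisted overrides
        self.write_to_device = write_to_device  # stands in for the DCL write path
        # Persisted overrides are layered on top of the deployed defaults.
        self.values: Dict[str, Any] = {**deployed, **overrides}

    def get_attribute(self, name: str) -> Any:
        return self.values[name]

    def set_attribute(self, name: str, value: Any) -> None:
        if name in self.data_sourced:
            # Any exception propagates to the caller. The in-memory value is
            # NOT updated here; it changes only when the device reports back.
            self.write_to_device(name, value)
            return
        self.values[name] = value
        self.overrides[name] = value  # survives restart, reset on redeployment

    def on_value_update(self, name: str, value: Any) -> None:
        # Confirmed value arriving through the existing subscription.
        self.values[name] = value
```

A redeployment corresponds to constructing the state with the new deployed values and an empty `overrides` store.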

Debug View Support

  • On request from central (via Communication Layer), the Instance Actor provides a snapshot of all current attribute values and alarm states.
  • Subsequent changes are delivered via the SiteStreamManager → SiteStreamGrpcServer → gRPC stream to central. The Instance Actor publishes attribute value and alarm state changes to the SiteStreamManager; it does not forward events directly to the Communication Layer.
  • The Instance Actor also handles one-shot DebugSnapshotRequest messages: it builds the same snapshot (attribute values and alarm states) and replies directly to the sender. Unlike SubscribeDebugViewRequest, no subscriber is registered and no stream is established.

Supervision Strategy

The Instance Actor supervises all child Script and Alarm Actors with explicit strategies:

| Child Actor | Exception Type | Strategy | Rationale |
|---|---|---|---|
| Script Actor | Any exception | Resume | Script Actor is a coordinator; its state (trigger timers, last execution time) should survive child failures. Script Execution Actor failures are isolated. |
| Alarm Actor | Any exception | Resume | Alarm Actor holds alarm state. Resume preserves state and continues evaluation on next value update. |
| Script Execution Actor | Unhandled exception | Stop | Short-lived, per-invocation. Failure is logged; the Script Actor coordinator remains active for future triggers. |
| Alarm Execution Actor | Unhandled exception | Stop | Short-lived, per on-trigger invocation. Same as Script Execution Actor. |

The Deployment Manager singleton supervises Instance Actors with a OneForOneStrategy — one Instance Actor's failure does not affect other instances.

When the Instance Actor is stopped (due to disable, delete, or redeployment), Akka.NET automatically stops all child actors.


Script Actor

Role

  • Coordinator for a single script definition on an instance.
  • Holds the compiled script code and trigger configuration.
  • Manages trigger evaluation (interval timer, value change detection, conditional evaluation).
  • Spawns short-lived Script Execution Actors for each invocation.

Trigger Management

  • Interval: The Script Actor manages an internal timer. When the timer fires, it spawns a Script Execution Actor.
  • Value Change: The Script Actor subscribes to attribute change notifications from its parent Instance Actor for the specific monitored attribute. When the attribute changes, it spawns a Script Execution Actor.
  • Conditional: The Script Actor subscribes to attribute change notifications for the monitored attribute. On each update, it evaluates the condition (equals or not-equals a value). If the condition is met, it spawns a Script Execution Actor.
  • Minimum time between runs: If configured, the Script Actor tracks the last execution time and skips trigger invocations that fire before the minimum interval has elapsed.
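
The conditional trigger and the minimum-time gate can be sketched in Python as below. `condition_met` and `RunGate` are hypothetical names. Recording the last execution time when the run is admitted, rather than when it finishes, is an assumption; the source does not say which is used.

```python
from typing import Any, Optional


def condition_met(value: Any, operator: str, target: Any) -> bool:
    """Evaluate a conditional trigger (equals or not-equals a value)."""
    if operator == "equals":
        return value == target
    if operator == "not-equals":
        return value != target
    raise ValueError(f"unsupported operator: {operator}")


class RunGate:
    """Skips trigger invocations that fire before the minimum interval has elapsed."""

    def __init__(self, min_interval_seconds: Optional[float] = None) -> None:
        self.min_interval_seconds = min_interval_seconds
        self.last_run: Optional[float] = None

    def try_run(self, now: float) -> bool:
        too_soon = (
            self.min_interval_seconds is not None
            and self.last_run is not None
            and now - self.last_run < self.min_interval_seconds
        )
        if too_soon:
            return False
        self.last_run = now  # assumption: measured from the start of the admitted run
        return True
```

A Script Actor would spawn a Script Execution Actor only when the trigger condition holds and `try_run` returns `True`.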

Concurrent Execution

  • Each invocation spawns a new Script Execution Actor as a child.
  • Multiple Script Execution Actors can run concurrently (e.g., a trigger fires while a previous Instance.CallScript invocation is still running).
  • The Script Actor coordinates but does not block on child completion.

Script Execution Actor

  • Short-lived child actor created per invocation.
  • Receives: compiled script code, input parameters, reference to the parent Instance Actor, current call depth.
  • Executes the script in the Akka actor context.
  • Has access to the full Script Runtime API (see below).
  • Returns the script's return value (if defined) to the caller, then stops.

Handling Instance.CallScript

  • When an external caller (another Script Execution Actor, an Alarm Execution Actor, or a routed call from the Inbound API) sends a CallScript message to the Script Actor, it spawns a Script Execution Actor to handle the call.
  • The caller uses the Akka ask pattern and receives the return value when the execution completes.

Alarm Actor

Role

  • Coordinator for a single alarm definition on an instance.
  • Evaluates alarm trigger conditions against attribute value updates.
  • Manages alarm state (active/normal) in memory.
  • Executes on-trigger scripts when the alarm activates.

Alarm Evaluation

  • Subscribes to attribute change notifications from its parent Instance Actor for the attribute(s) referenced by its trigger definition.
  • On each value update, evaluates the trigger condition:
    • Value Match: Incoming value equals the predefined target.
    • Range Violation: Value is outside the allowed min/max range.
    • Rate of Change: Value change rate exceeds the defined threshold over time.
  • When the condition is met and the alarm is currently in normal state, the alarm transitions to active:
    • Updates the alarm state on the parent Instance Actor (which publishes to the Akka stream).
    • If an on-trigger script is defined, spawns an Alarm Execution Actor to execute it.
  • When the condition clears and the alarm is in active state, the alarm transitions to normal:
    • Updates the alarm state on the parent Instance Actor.
    • No script execution on clear.
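
The transition rules above amount to a two-state machine. The Python sketch below shows it for the Range Violation trigger type only; `RangeAlarm` is a hypothetical name, and the `on_trigger` callback stands in for spawning an Alarm Execution Actor.

```python
from typing import Callable, Optional


class RangeAlarm:
    """Toy two-state alarm for a Range Violation trigger."""

    def __init__(
        self,
        low: float,
        high: float,
        on_trigger: Optional[Callable[[float], None]] = None,
    ) -> None:
        self.low = low
        self.high = high
        self.on_trigger = on_trigger
        self.state = "normal"  # alarm state is in memory only and starts as normal

    def on_value(self, value: float) -> Optional[str]:
        """Return the new state on a transition, or None if nothing changed."""
        violated = value < self.low or value > self.high
        if violated and self.state == "normal":
            self.state = "active"
            if self.on_trigger is not None:
                self.on_trigger(value)  # runs only on the normal -> active edge
            return "active"
        if not violated and self.state == "active":
            self.state = "normal"       # no script execution on clear
            return "normal"
        return None
```

Repeated violating values while the alarm is already active produce no further transitions and do not re-run the on-trigger script.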

Alarm State

  • Held in memory only — not persisted to SQLite.
  • On restart (or failover), alarm states are re-evaluated from incoming values. All alarms start in normal state and transition to active when conditions are detected.

Alarm Execution Actor

  • Short-lived child actor created when an on-trigger script needs to execute.
  • Same pattern as Script Execution Actor — receives compiled code, executes, returns, and stops.
  • Has access to the Instance Actor for GetAttribute/SetAttribute.
  • Can call instance scripts via Instance.CallScript() — sends an ask message to the appropriate sibling Script Actor.
  • Instance scripts cannot call alarm on-trigger scripts — the call direction is one-way.

Shared Script Library

  • Shared scripts are compiled at the site when received from central.
  • Compiled code is stored in memory and made available to all Script Actors.
  • When a Script Execution Actor calls Scripts.CallShared("scriptName", params), the shared script code executes inline in the Script Execution Actor's context — it is a direct method invocation, not an actor message.
  • Because there is no shared script actor whose mailbox callers would queue behind, concurrent calls to the same shared script do not serialize on one another.
  • Shared scripts have access to the same runtime API as instance scripts (GetAttribute, SetAttribute, external systems, notifications, databases).

Script Runtime API

Available to all Script Execution Actors and Alarm Execution Actors:

Instance Attributes

  • Instance.GetAttribute("name") — Read an attribute value from the parent Instance Actor.
  • Instance.SetAttribute("name", value) — Write an attribute value. For data-connected attributes, writes to the DCL; for static attributes, updates in-memory and persists to local SQLite (survives restart/failover, reset on redeployment).

Other Scripts

  • Instance.CallScript("scriptName", parameters) — Send an ask message to a sibling Script Actor. The target Script Actor spawns a Script Execution Actor, executes, and returns the result. The call includes the current recursion depth.
  • Scripts.CallShared("scriptName", parameters) — Execute shared script code inline (direct method invocation). The call includes the current recursion depth.

External Systems

  • ExternalSystem.Call("systemName", "methodName", params) — Synchronous HTTP call. Blocks until response or timeout. All failures return to script. Use when the script needs the result.
  • ExternalSystem.CachedCall("systemName", "methodName", params) — Fire-and-forget with store-and-forward on transient failure. Use for outbound data pushes where deferred delivery is acceptable.

Notifications

  • Notify.To("listName").Send("subject", "message") — Send an email notification via a named notification list.

Database Access

  • Database.Connection("connectionName") — Obtain a raw MS SQL client connection (ADO.NET) for synchronous read/write.
  • Database.CachedWrite("connectionName", "sql", parameters) — Submit a write operation for store-and-forward delivery.

Recursion Limit

  • Every script call (Instance.CallScript and Scripts.CallShared) increments a call depth counter.
  • If the counter exceeds the maximum recursion depth (default: 10), the call fails with an error.
  • The error is logged to the site event log.
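
A depth counter of this kind can be sketched in Python as follows. `call_script` and `RecursionLimitError` are hypothetical names, and starting the counter at zero for a trigger-initiated run is an assumption. Each script in the sketch receives a `call` function that carries the incremented depth forward.

```python
from typing import Any, Callable, Dict

MAX_DEPTH = 10  # default from the text above


class RecursionLimitError(RuntimeError):
    pass


Script = Callable[[Callable[[str], Any]], Any]


def call_script(
    scripts: Dict[str, Script],
    name: str,
    depth: int = 0,
    max_depth: int = MAX_DEPTH,
) -> Any:
    """Invoke a script, passing the incremented call depth to nested calls."""
    next_depth = depth + 1  # every script call increments the counter
    if next_depth > max_depth:
        raise RecursionLimitError(
            f"call to '{name}' exceeds maximum recursion depth {max_depth}"
        )
    return scripts[name](lambda target: call_script(scripts, target, next_depth, max_depth))
```

Under these assumptions, a script that calls itself unconditionally executes ten times before the eleventh call is rejected.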

Script Trust Model

Scripts execute in-process with constrained access. The following restrictions are enforced at compilation and runtime:

  • Allowed: Access to the Script Runtime API (GetAttribute, SetAttribute, CallScript, CallShared, ExternalSystem, Notify, Database), standard C# language features, basic .NET types (collections, string manipulation, math, date/time).
  • Forbidden: File system access (System.IO), process spawning (System.Diagnostics.Process), threading (System.Threading — except async/await), reflection (System.Reflection), raw network access (System.Net.Sockets, System.Net.Http — must use ExternalSystem.Call), assembly loading, unsafe code.
  • Execution timeout: Configurable per-script maximum execution time. Exceeding the timeout cancels the script and logs an error.
  • Memory: Scripts share the host process memory. There is no per-script memory limit; the execution timeout only bounds how long a runaway script can keep allocating.

These constraints are enforced by restricting the set of assemblies and namespaces available to the script compilation context.
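
To make the forbidden list concrete, here is a small Python sketch of a prefix check over namespace or type names. It is a deny-list illustration only: the mechanism described above restricts what the compilation context can reference, which is a different approach. Treating `System.Threading.Tasks` as the allowed async/await exception is an assumption.

```python
FORBIDDEN_PREFIXES = (
    "System.IO",
    "System.Diagnostics.Process",
    "System.Threading",
    "System.Reflection",
    "System.Net.Sockets",
    "System.Net.Http",
)

# Assumption: async/await support is provided through this namespace.
ALLOWED_EXCEPTIONS = ("System.Threading.Tasks",)


def _matches(name: str, prefix: str) -> bool:
    # Match on whole dotted segments so "System.IOx" does not match "System.IO".
    return name == prefix or name.startswith(prefix + ".")


def is_forbidden(name: str) -> bool:
    """Return True if a namespace or type name falls under a forbidden prefix."""
    if any(_matches(name, allowed) for allowed in ALLOWED_EXCEPTIONS):
        return False
    return any(_matches(name, prefix) for prefix in FORBIDDEN_PREFIXES)
```

For example, `System.IO.Compression` is rejected while `System.Collections.Generic` and `System.Threading.Tasks` pass.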

Script Scoping Rules

  • Scripts can only read/write attributes on their own instance (via the parent Instance Actor).
  • Scripts can call other scripts on their own instance (via sibling Script Actors).
  • Scripts can call shared scripts (inline execution).
  • Scripts cannot access other instances' attributes or scripts.
  • Alarm on-trigger scripts can call instance scripts; instance scripts cannot call alarm on-trigger scripts.

Tell vs. Ask Usage

Per Akka.NET best practices, internal actor communication uses Tell (fire-and-forget with reply-to) for the hot path:

  • Tag value updates (DCL → Instance Actor): Tell. High-frequency, no response needed.
  • Attribute change notifications (Instance Actor → Script/Alarm Actors): Tell. Fan-out notifications.
  • Stream publishing (Instance Actor → Akka stream): Tell. Fire-and-forget.

Ask is reserved for system boundaries where a synchronous response is needed:

  • Instance.CallScript(): Ask pattern from Script Execution Actor to sibling Script Actor. The caller needs the return value. Acceptable because script calls are infrequent relative to tag updates.
  • Route.To().Call(): Ask from Inbound API to site Instance Actor via Communication Layer. External caller needs a response.
  • Debug view snapshot: Ask from Communication Layer to Instance Actor for initial state.

Concurrency & Serialization

  • The Instance Actor processes messages sequentially (standard Akka actor model). This means SetAttribute calls from concurrent Script Execution Actors are serialized at the Instance Actor, preventing race conditions on attribute state.
  • Script Execution Actors may run concurrently, but all state mutations (attribute reads/writes, alarm state updates) are mediated through the parent Instance Actor's message queue.
  • External side effects (external system calls, notifications, database writes) are not serialized — concurrent scripts may produce interleaved side effects. This is acceptable because each side effect is independent.

SiteStreamManager and gRPC Integration

  • The SiteStreamManager implements the ISiteStreamSubscriber interface, allowing the Communication Layer's SiteStreamGrpcServer to subscribe to the stream for cross-cluster delivery via gRPC.
  • When a gRPC SubscribeInstance call arrives, the SiteStreamGrpcServer creates a StreamRelayActor and subscribes it to SiteStreamManager for the requested instance. Events flow from SiteStreamManager → StreamRelayActor → Channel<SiteStreamEvent> → gRPC response stream to central.
  • The SiteStreamManager filters events by instance unique name and forwards matching events to all registered subscribers (both local debug consumers and gRPC relay actors).

Site-Wide Stream Backpressure

  • The site-wide Akka stream uses per-subscriber buffering with bounded buffers. Each subscriber (gRPC relay actors, future consumers) gets an independent buffer.
  • If a subscriber falls behind (e.g., slow network on gRPC stream), its buffer fills and oldest events are dropped. This does not affect other subscribers or the publishing Instance Actors.
  • Instance Actors publish to the stream with fire-and-forget semantics — publishing never blocks the actor.
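
The drop-oldest behavior with independent per-subscriber buffers can be sketched with Python's `collections.deque`, which discards items from the opposite end when a bounded deque is full. `SiteStream`, its method names, and the default buffer size are assumptions made for this illustration; the real stream is an Akka stream, not a dictionary of deques.

```python
from collections import deque
from typing import Any, Deque, Dict, Tuple


class SiteStream:
    """Toy fan-out with a bounded, independent buffer per subscriber."""

    def __init__(self, buffer_size: int = 1000) -> None:  # hypothetical default
        self.buffer_size = buffer_size
        self.subscribers: Dict[str, Tuple[str, Deque[Any]]] = {}

    def subscribe(self, subscriber_id: str, instance_name: str) -> Deque[Any]:
        buffer: Deque[Any] = deque(maxlen=self.buffer_size)
        self.subscribers[subscriber_id] = (instance_name, buffer)
        return buffer

    def publish(self, instance_name: str, event: Any) -> None:
        # Never blocks: a full buffer silently drops its oldest event.
        for wanted, buffer in self.subscribers.values():
            if wanted == instance_name:
                buffer.append(event)
```

A subscriber that never drains its buffer only loses its own oldest events; other subscribers and the publisher are unaffected.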

Error Handling

Script Errors

  • Unhandled exceptions and timeouts in Script Execution Actors are logged locally to the site event log.
  • The Script Actor (coordinator) is not affected — it remains active for future trigger events.
  • Script failures are not reported to central (except as aggregated error rate metrics via Health Monitoring).

Alarm Evaluation Errors

  • Errors during alarm condition evaluation are logged locally to the site event log.
  • The Alarm Actor remains active and continues evaluating on subsequent value updates.
  • Alarm evaluation error rates are reported to central via Health Monitoring.

Script Compilation Errors

  • If script compilation fails when a deployment is received, the entire deployment for that instance is rejected. No partial state is applied.
  • The failure is reported back to central as a failed deployment.
  • Note: Pre-deployment validation at central should catch compilation errors before they reach the site. Site-side compilation failures indicate an unexpected issue.
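
One way to get the all-or-nothing behavior is to compile everything first and commit only if every script succeeds. The Python sketch below shows that ordering; `apply_deployment`, `CompilationError`, the result dict shape, and the `store` dict standing in for local state are all hypothetical.

```python
from typing import Any, Callable, Dict


class CompilationError(Exception):
    pass


def apply_deployment(
    store: Dict[str, Dict[str, Any]],
    instance_name: str,
    scripts: Dict[str, str],
    compile_script: Callable[[str], Any],
) -> Dict[str, Any]:
    """Compile every script, then commit; reject the whole deployment on any failure."""
    compiled: Dict[str, Any] = {}
    try:
        for name, source in scripts.items():
            compiled[name] = compile_script(source)
    except CompilationError as exc:
        # Nothing has been written to the store, so no partial state exists.
        return {"instance": instance_name, "success": False, "error": str(exc)}
    store[instance_name] = compiled
    return {"instance": instance_name, "success": True, "error": None}
```

Because the store is touched only after the loop completes, a failed redeployment also leaves the previously deployed version in place in this sketch.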

Dependencies

  • Data Connection Layer: Provides tag value updates to Instance Actors. Receives write requests from Instance Actors.
  • Store-and-Forward Engine: Handles reliable delivery for external system calls, notifications, and cached database writes submitted by scripts.
  • External System Gateway: Provides external system method invocations for scripts.
  • Notification Service: Handles email delivery for scripts.
  • Communication Layer: Receives deployments and lifecycle commands from central. Handles debug view requests. Reports deployment results.
  • Site Event Logging: Records script executions, alarm events, deployment events, instance lifecycle events.
  • Health Monitoring: Reports script error rates and alarm evaluation error rates.
  • Local SQLite: Persists deployed configurations, static attribute overrides, and system-wide artifacts (shared scripts, external system definitions, database connection definitions, data connection definitions, notification lists, SMTP configuration).

Interactions

  • Deployment Manager (central): Receives flattened configurations, system-wide artifact updates, and instance lifecycle commands.
  • Data Connection Layer: Bidirectional — receives value updates, sends write-back commands.
  • Communication Layer: Receives commands from central, sends deployment results, serves debug view data.
  • Store-and-Forward Engine: Scripts route cached writes, notifications, and external system calls here.
  • Health Monitoring: Periodically reports error rate metrics.