Verify component designs against Akka.NET best practices documentation

Cluster Infrastructure: add min-nr-of-members=1 requirement for single-node
operation after failover. Add graceful shutdown / CoordinatedShutdown section
for fast singleton handover during planned maintenance.

Site Runtime: add explicit supervision strategies per actor type (Resume for
coordinators, Stop for short-lived execution actors). Stagger Instance Actor
startup to prevent reconnection storms. Add Tell-vs-Ask usage guidance per
Akka.NET best practices (Tell for hot path, Ask for system boundaries only).

Data Connection Layer: add Connection Actor Model section documenting the
Become/Stash pattern for connection lifecycle state machine.

Health Monitoring: add dead letter count as a monitored metric.

Host: add REQ-HOST-8a for dead letter monitoring (subscribe to EventStream,
log at Warning level, report as health metric).
Joseph Doherty
2026-03-16 09:12:36 -04:00
parent de636b908b
commit 409cc62309
5 changed files with 58 additions and 4 deletions


@@ -54,7 +54,7 @@ Deployment Manager Singleton (Cluster Singleton)
1. Read all deployed configurations from local SQLite.
2. Read all shared scripts from local storage.
3. Compile all scripts (instance scripts, alarm on-trigger scripts, shared scripts).
-4. Create Instance Actors for all deployed, **enabled** instances as child actors.
+4. Create Instance Actors for all deployed, **enabled** instances as child actors. Instance Actors are created in **staggered batches** (e.g., 20 at a time with a short delay between batches) to prevent a reconnection storm: 500 Instance Actors all registering data subscriptions simultaneously would overwhelm OPC UA servers and network capacity.
5. Make compiled shared script code available to all Script Actors.
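The staggered startup in step 4 can be sketched as follows. This is a language-neutral illustration in Python, not Akka.NET code; the names `batches`, `start_staggered`, and `start_one`, and the 0.5-second default delay, are assumptions made for the example. Only the batch size of 20 comes from the text.

```python
import time
from typing import Callable, Iterator, List


def batches(items: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def start_staggered(
    instance_ids: List[str],
    start_one: Callable[[str], None],
    batch_size: int = 20,          # batch size taken from the text
    delay_seconds: float = 0.5,    # assumed value; the text only says "short delay"
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Start instances batch by batch, pausing between batches.

    Returns the number of batches started.
    """
    count = 0
    for batch in batches(instance_ids, batch_size):
        if count > 0:
            sleep(delay_seconds)  # pause before every batch except the first
        for instance_id in batch:
            start_one(instance_id)
        count += 1
    return count
```

With 500 instances and a batch size of 20, this performs 25 batches and 24 pauses. Inside an actor, the pause would more likely be a scheduled message to self than a blocking sleep; the blocking call here only keeps the sketch short.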
### Deployment Handling
@@ -110,9 +110,20 @@ Deployment Manager Singleton (Cluster Singleton)
- On request from central (via Communication Layer), the Instance Actor provides a **snapshot** of all current attribute values and alarm states.
- Subsequent changes are delivered via the site-wide Akka stream, filtered by instance unique name.
-### Supervision
-- The Instance Actor supervises all child Script and Alarm Actors.
-- When the Instance Actor is stopped (due to disable, delete, or redeployment), Akka.NET automatically stops all child actors.
+### Supervision Strategy
+The Instance Actor supervises all child Script and Alarm Actors with explicit strategies:
+| Child Actor | Exception Type | Strategy | Rationale |
+|-------------|---------------|----------|-----------|
+| Script Actor | Any exception | Resume | Script Actor is a coordinator; its state (trigger timers, last execution time) should survive child failures. Script Execution Actor failures are isolated. |
+| Alarm Actor | Any exception | Resume | Alarm Actor holds alarm state. Resume preserves state and continues evaluation on next value update. |
+| Script Execution Actor | Unhandled exception | Stop | Short-lived, per-invocation. Failure is logged; the Script Actor coordinator remains active for future triggers. |
+| Alarm Execution Actor | Unhandled exception | Stop | Short-lived, per on-trigger invocation. Same as Script Execution Actor. |
+The Deployment Manager singleton supervises Instance Actors with a **OneForOneStrategy**, so one Instance Actor's failure does not affect other instances.
+When the Instance Actor is stopped (due to disable, delete, or redeployment), Akka.NET automatically stops all child actors.
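The practical difference between Resume and Stop in the table can be illustrated with a toy model. This is a minimal Python sketch, not Akka.NET's supervision API; the `Directive` enum, the `STRATEGY` lookup, and the `Supervisor` class are hypothetical names invented for the example, and a real supervisor decides per exception rather than per child kind.

```python
from enum import Enum


class Directive(Enum):
    RESUME = "resume"  # keep the child and its state, drop the failed message
    STOP = "stop"      # terminate the child permanently


# Hypothetical lookup mirroring the table: coordinators resume,
# short-lived execution actors stop.
STRATEGY = {
    "ScriptActor": Directive.RESUME,
    "AlarmActor": Directive.RESUME,
    "ScriptExecutionActor": Directive.STOP,
    "AlarmExecutionActor": Directive.STOP,
}


class Supervisor:
    """Toy parent that tracks child state and applies a directive on failure."""

    def __init__(self):
        self.children = {}  # name -> (kind, state dict)

    def spawn(self, name, kind, state=None):
        self.children[name] = (kind, dict(state or {}))

    def on_failure(self, name):
        kind, _state = self.children[name]
        directive = STRATEGY[kind]
        if directive is Directive.STOP:
            del self.children[name]  # short-lived child is discarded
        # RESUME: nothing to do, the child and its state stay in place
        return directive
```

A failing coordinator keeps its accumulated state (for example a last-execution timestamp), while a failing execution actor simply disappears and a fresh one is created on the next trigger.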
---
@@ -243,6 +254,20 @@ These constraints are enforced by restricting the set of assemblies and namespac
---
+## Tell vs. Ask Usage
+Per Akka.NET best practices, internal actor communication uses **Tell** (fire-and-forget; the sender remains available as the reply-to address) for the hot path:
+- **Tag value updates** (DCL → Instance Actor): Tell. High-frequency, no response needed.
+- **Attribute change notifications** (Instance Actor → Script/Alarm Actors): Tell. Fan-out notifications.
+- **Stream publishing** (Instance Actor → Akka stream): Tell. Fire-and-forget.
+**Ask** is reserved for system boundaries where the caller needs a response before it can continue:
+- **`Instance.CallScript()`**: Ask pattern from Script Execution Actor to sibling Script Actor. The caller needs the return value. Acceptable because script calls are infrequent relative to tag updates.
+- **`Route.To().Call()`**: Ask from Inbound API to site Instance Actor via Communication Layer. External caller needs a response.
+- **Debug view snapshot**: Ask from Communication Layer to Instance Actor for initial state.
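The Tell/Ask distinction above can be sketched with a toy mailbox. This is a Python `asyncio` illustration under stated assumptions, not Akka.NET's `Tell`/`Ask` API: the `InstanceActor` class, the message tuples, and the one-second default timeout are all hypothetical. It shows only the shape of the two patterns: Tell enqueues and returns, Ask enqueues with a reply slot and awaits it with a timeout.

```python
import asyncio


class InstanceActor:
    """Toy actor: one mailbox, messages handled one at a time."""

    def __init__(self):
        self.mailbox = asyncio.Queue()
        self.attributes = {}

    def tell(self, message):
        # Fire-and-forget: enqueue and return immediately.
        self.mailbox.put_nowait((message, None))

    async def ask(self, message, timeout=1.0):
        # Request-response: enqueue with a reply slot, then await it.
        reply = asyncio.get_running_loop().create_future()
        self.mailbox.put_nowait((message, reply))
        return await asyncio.wait_for(reply, timeout)

    async def run(self):
        while True:
            (kind, payload), reply = await self.mailbox.get()
            if kind == "tag_update":
                name, value = payload
                self.attributes[name] = value
            elif kind == "snapshot":
                # Skip the reply if the asker already timed out.
                if reply is not None and not reply.done():
                    reply.set_result(dict(self.attributes))
            elif kind == "stop":
                return


async def demo():
    actor = InstanceActor()
    runner = asyncio.create_task(actor.run())
    # Hot path: high-frequency updates, no response needed.
    actor.tell(("tag_update", ("temperature", 21.5)))
    actor.tell(("tag_update", ("pressure", 3.2)))
    # Boundary: the caller needs the current state back.
    snapshot = await actor.ask(("snapshot", None))
    actor.tell(("stop", None))
    await runner
    return snapshot
```

Running `asyncio.run(demo())` returns a snapshot containing both tag values, because the mailbox is drained in order and the snapshot request is queued after the two updates. The cost of Ask is visible in the sketch: each call allocates a reply future and a timeout, which is why it is kept off the high-frequency path.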
## Concurrency & Serialization
- The Instance Actor processes messages **sequentially** (standard Akka actor model). This means `SetAttribute` calls from concurrent Script Execution Actors are serialized at the Instance Actor, preventing race conditions on attribute state.