# Component: Cluster Infrastructure

## Purpose

The Cluster Infrastructure component manages the Akka.NET cluster setup, active/standby node roles, failover detection, and the foundational runtime environment on which all other components run. It provides the base layer for both central and site clusters.

## Location

Both central and site clusters.

## Responsibilities

- Bootstrap the Akka.NET actor system on each node.
- Form a two-node cluster (active/standby) using Akka.NET Cluster.
- Manage leader election and role assignment (active vs. standby).
- Detect node failures and trigger failover.
- Provide the Akka.NET remoting infrastructure for inter-cluster communication.
- Support cluster singleton hosting (used by the Site Runtime Deployment Manager singleton on site clusters).
- Manage Windows service lifecycle (start, stop, restart) on each node.

## Cluster Topology

### Central Cluster
- Two nodes forming an Akka.NET cluster.
- One active node runs all central components (Template Engine, Deployment Manager, Central UI, etc.).
- One standby node is ready to take over on failover.
- Connected to MS SQL databases (Config DB, Machine Data DB).

### Site Cluster (per site)
- Two nodes forming an Akka.NET cluster.
- One active node runs all site components (Site Runtime, Data Connection Layer, Store-and-Forward Engine, etc.).
- The Site Runtime Deployment Manager runs as an **Akka.NET cluster singleton** on the active node, owning the full Instance Actor hierarchy.
- One standby node receives replicated store-and-forward data and is ready to take over.
- Connected to local SQLite databases (store-and-forward buffer, event logs, deployed configurations).
- Connected to machines via data connections (OPC UA, custom protocol).

## Failover Behavior

### Detection
- Akka.NET Cluster monitors node health via heartbeat.
- If the active node becomes unreachable, the standby node detects the failure and promotes itself to active.
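
As a rough illustration of the detection step, the sketch below models a standby node that records heartbeats from its peer and promotes itself once the peer has been silent for longer than a fixed pause. This is a deliberately simplified deadline-based check, not Akka.NET's own mechanism (Akka.NET Cluster uses a phi accrual failure detector). The `StandbyMonitor` class, its method names, and the 10-second pause are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class StandbyMonitor:
    """Hypothetical model of a standby node watching its active peer."""

    acceptable_pause: float      # seconds of silence tolerated (assumed value)
    last_heartbeat: float = 0.0  # timestamp of the most recent heartbeat
    role: str = "standby"

    def on_heartbeat(self, now: float) -> None:
        # Record the time of the most recent heartbeat from the active node.
        self.last_heartbeat = now

    def peer_unreachable(self, now: float) -> bool:
        # The peer counts as unreachable once the silence exceeds the pause.
        return (now - self.last_heartbeat) > self.acceptable_pause

    def tick(self, now: float) -> str:
        # Promote to active when the peer is unreachable; never demote here.
        if self.role == "standby" and self.peer_unreachable(now):
            self.role = "active"
        return self.role


monitor = StandbyMonitor(acceptable_pause=10.0)
monitor.on_heartbeat(now=100.0)
role_while_healthy = monitor.tick(now=105.0)  # peer silent for 5 s
role_after_failure = monitor.tick(now=111.0)  # peer silent for 11 s
```

Timestamps are passed in explicitly so the behavior is easy to reason about; a real node would rely on the cluster's own failure detector and membership events instead of polling a clock.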

### Central Failover
- The new active node takes over all central responsibilities.
- In-progress deployments are treated as **failed**; engineers must retry them.
- The UI session may be interrupted; users reconnect to the new active node.
- Central performs no message buffering, so there is no state to recover beyond what is stored in MS SQL.

### Site Failover
- The new active node takes over:
  - The Deployment Manager singleton restarts and re-creates the full Instance Actor hierarchy by reading deployed configurations from local SQLite. Each Instance Actor spawns its child Script and Alarm Actors.
  - Data collection (the Data Connection Layer re-establishes subscriptions as Instance Actors register their data source references).
  - Store-and-forward delivery (the buffer is already replicated locally).
- Active debug view streams from central are interrupted; the engineer must re-open them.
- Health reporting resumes from the new active node.
- Alarm states are re-evaluated from incoming values (alarm state is in-memory only).
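
The recovery sequence above can be sketched in a few lines. This is only a model under stated assumptions: the `deployed_config` table, its columns, and the `rebuild_hierarchy` and `on_value` helpers are hypothetical, and actors are represented as plain dictionary entries instead of Akka.NET actors. It shows the two points that matter after a site failover: the hierarchy is rebuilt from local SQLite, and alarm state starts unknown until a new value arrives.

```python
import sqlite3


def rebuild_hierarchy(conn: sqlite3.Connection) -> dict:
    """Re-create one in-memory entry per deployed instance (hypothetical schema)."""
    hierarchy = {}
    rows = conn.execute(
        "SELECT instance_id, scripts, alarms FROM deployed_config ORDER BY instance_id"
    )
    for instance_id, scripts, alarms in rows:
        alarm_names = alarms.split(",") if alarms else []
        hierarchy[instance_id] = {
            # Child Script Actors are modelled as plain names here.
            "script_actors": scripts.split(",") if scripts else [],
            # Alarm state is in-memory only, so it is unknown right after failover.
            "alarm_states": {name: None for name in alarm_names},
        }
    return hierarchy


def on_value(hierarchy: dict, instance_id: str, alarm: str,
             value: float, limit: float) -> None:
    # Re-evaluate the alarm from the incoming value instead of restoring old state.
    hierarchy[instance_id]["alarm_states"][alarm] = value > limit


# In-memory database standing in for the site node's local SQLite file.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE deployed_config (instance_id TEXT, scripts TEXT, alarms TEXT)")
conn.execute("INSERT INTO deployed_config VALUES ('pump-1', 'startup,poll', 'high_temp')")
conn.commit()

hierarchy = rebuild_hierarchy(conn)
state_before = hierarchy["pump-1"]["alarm_states"]["high_temp"]
on_value(hierarchy, "pump-1", "high_temp", value=95.0, limit=90.0)
state_after = hierarchy["pump-1"]["alarm_states"]["high_temp"]
```

Keeping alarm state out of the database means there is nothing stale to reconcile on takeover, at the cost of a short window in which an alarm's state is unknown.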

## Node Configuration

Each node is configured with:
- **Cluster seed nodes**: Addresses of both nodes in the cluster.
- **Cluster role**: Central or Site (plus site identifier for site clusters).
- **Akka.NET remoting**: Hostname/port for inter-node and inter-cluster communication.
- **Local storage paths**: SQLite database locations (site nodes only).
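
To make the constraints in this list concrete, the sketch below models the per-node settings as a small data class with a validation step. The `NodeConfig` type, its field names, the host names, the port, and the file path are all hypothetical; the real settings would live in the service's Akka.NET configuration. The seed node addresses use the `akka.tcp://system@host:port` form that Akka.NET remoting expects.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class NodeConfig:
    """Hypothetical shape of the per-node settings listed above."""

    seed_nodes: List[str]          # addresses of both cluster nodes
    role: str                      # "central" or "site"
    hostname: str                  # remoting hostname
    port: int                      # remoting port
    site_id: Optional[str] = None  # required for site nodes
    sqlite_paths: List[str] = field(default_factory=list)  # site nodes only

    def validate(self) -> List[str]:
        # Collect every problem instead of stopping at the first one.
        errors = []
        if len(self.seed_nodes) != 2:
            errors.append("expected exactly two seed nodes")
        if self.role not in ("central", "site"):
            errors.append("role must be 'central' or 'site'")
        if self.role == "site" and not self.site_id:
            errors.append("site nodes need a site identifier")
        if self.role == "site" and not self.sqlite_paths:
            errors.append("site nodes need local storage paths")
        if self.role == "central" and self.sqlite_paths:
            errors.append("local storage paths apply to site nodes only")
        if not (0 < self.port < 65536):
            errors.append("port out of range")
        return errors


good = NodeConfig(
    seed_nodes=["akka.tcp://cluster@site1-a:4053", "akka.tcp://cluster@site1-b:4053"],
    role="site",
    hostname="site1-a",
    port=4053,
    site_id="site1",
    sqlite_paths=["C:/data/store-and-forward.db"],
)
bad = NodeConfig(
    seed_nodes=["akka.tcp://cluster@site1-a:4053"],
    role="site",
    hostname="site1-a",
    port=4053,
)
good_errors = good.validate()
bad_errors = bad.validate()
```

Here `good_errors` is empty, while `bad_errors` reports the missing second seed node, the missing site identifier, and the missing storage paths.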

## Windows Service

- Each node runs as a **Windows service** for automatic startup and recovery.
- Service configuration includes Akka.NET cluster settings and component-specific configuration.

## Platform

- **OS**: Windows Server.
- **Runtime**: .NET (Akka.NET).
- **Cluster**: Akka.NET Cluster (application-level, not Windows Server Failover Clustering).

## Dependencies

- **Akka.NET**: Core actor system, cluster, remoting, and cluster singleton libraries.
- **Windows**: Service hosting, networking.
- **MS SQL** (central only): Database connectivity.
- **SQLite** (sites only): Local storage.

## Interactions

- **All components**: Every component runs within the Akka.NET actor system managed by this infrastructure.
- **Site Runtime**: The Deployment Manager singleton relies on Akka.NET cluster singleton support provided by this infrastructure.
- **Communication Layer**: Built on top of the Akka.NET remoting provided here.
- **Health Monitoring**: Reports node status (active/standby) as a health metric.