feat: replace ActorSelection with ClusterClient for inter-cluster communication
Central and site clusters now communicate via ClusterClient/ ClusterClientReceptionist instead of direct ActorSelection. Both CentralCommunicationActor and SiteCommunicationActor are registered with their cluster's receptionist. Central creates one ClusterClient per site using NodeA/NodeB contact points from the DB. Sites configure multiple CentralContactPoints for automatic failover between central nodes. ISiteClientFactory enables test injection.
This commit is contained in:
@@ -11,7 +11,7 @@ Both central and site clusters. Each side has communication actors that handle m
|
||||
## Responsibilities
|
||||
|
||||
- Resolve site addresses from the configuration database and maintain a cached address map.
|
||||
- Establish and maintain Akka.NET remoting connections between central and each site cluster.
|
||||
- Establish and maintain cross-cluster connections using Akka.NET ClusterClient/ClusterClientReceptionist.
|
||||
- Route messages between central and site clusters in a hub-and-spoke topology.
|
||||
- Broker requests from external systems (via central) to sites and return responses.
|
||||
- Support multiple concurrent message patterns (request/response, fire-and-forget, streaming).
|
||||
@@ -75,25 +75,35 @@ Both central and site clusters. Each side has communication actors that handle m
|
||||
|
||||
```
|
||||
Central Cluster
|
||||
├── Akka.NET Remoting → Site A Cluster
|
||||
├── Akka.NET Remoting → Site B Cluster
|
||||
└── Akka.NET Remoting → Site N Cluster
|
||||
├── ClusterClient → Site A Cluster (SiteCommunicationActor via Receptionist)
|
||||
├── ClusterClient → Site B Cluster (SiteCommunicationActor via Receptionist)
|
||||
└── ClusterClient → Site N Cluster (SiteCommunicationActor via Receptionist)
|
||||
|
||||
Site Clusters
|
||||
└── ClusterClient → Central Cluster (CentralCommunicationActor via Receptionist)
|
||||
```
|
||||
|
||||
- Sites do **not** communicate with each other.
|
||||
- All inter-cluster communication flows through central.
|
||||
- Both **CentralCommunicationActor** and **SiteCommunicationActor** are registered with their cluster's **ClusterClientReceptionist** for cross-cluster discovery.
|
||||
|
||||
## Site Address Resolution
|
||||
|
||||
Central discovers site addresses through the **configuration database**, not runtime registration:
|
||||
|
||||
- Each site record in the Sites table includes optional **NodeAAddress** and **NodeBAddress** fields containing the Akka remoting paths of the site's cluster nodes.
|
||||
- The **CentralCommunicationActor** loads all site addresses from the database at startup and caches them in memory.
|
||||
- The cache is **refreshed every 60 seconds** and **on-demand** when site records are added, edited, or deleted via the Central UI or CLI.
|
||||
- When routing a message to a site, the actor **prefers NodeA** and **falls back to NodeB** if NodeA is unreachable.
|
||||
- Each site record in the Sites table includes optional **NodeAAddress** and **NodeBAddress** fields containing base Akka addresses of the site's cluster nodes (e.g., `akka.tcp://scadalink@host:port`).
|
||||
- The **CentralCommunicationActor** loads all site addresses from the database at startup and creates one **ClusterClient per site**, configured with both NodeA and NodeB as contact points.
|
||||
- The address cache is **refreshed every 60 seconds** and **on-demand** when site records are added, edited, or deleted via the Central UI or CLI. ClusterClient instances are recreated when contact points change.
|
||||
- When routing a message to a site, central sends via `ClusterClient.Send("/user/site-communication", msg)`. **ClusterClient handles failover between NodeA and NodeB internally** — there is no application-level NodeA preference/NodeB fallback logic.
|
||||
- **Heartbeats** from sites serve **health monitoring only** — they do not serve as a registration or address discovery mechanism.
|
||||
- If no addresses are configured for a site, messages to that site are **dropped** and the caller's Ask times out.
|
||||
|
||||
### Site → Central Communication
|
||||
|
||||
- Site nodes configure a list of **CentralContactPoints** (both central node addresses) instead of a single `CentralActorPath`.
|
||||
- The site creates a **ClusterClient** using the central contact points and sends heartbeats, health reports, and other messages via `ClusterClient.Send("/user/central-communication", msg)`.
|
||||
- ClusterClient handles automatic failover between central nodes — if the active central node goes down, the site's ClusterClient reconnects to the standby node transparently.
|
||||
|
||||
## Message Timeouts
|
||||
|
||||
Each request/response pattern has a default timeout that can be overridden in configuration:
|
||||
@@ -111,11 +121,11 @@ Timeouts use the Akka.NET **ask pattern**. If no response is received within the
|
||||
|
||||
## Transport Configuration
|
||||
|
||||
Akka.NET remoting provides built-in connection management and failure detection. The following transport-level settings are **explicitly configured** (not left to framework defaults) for predictable behavior:
|
||||
Akka.NET remoting provides the underlying transport for both intra-cluster communication and ClusterClient connections. The following transport-level settings are **explicitly configured** (not left to framework defaults) for predictable behavior:
|
||||
|
||||
- **Transport heartbeat interval**: Configurable interval at which heartbeat messages are sent over remoting connections (e.g., every 5 seconds).
|
||||
- **Failure detection threshold**: Number of missed heartbeats before the connection is considered lost (e.g., 3 missed heartbeats = 15 seconds with a 5-second interval).
|
||||
- **Reconnection**: Akka.NET remoting handles reconnection automatically. No custom reconnection logic is required.
|
||||
- **Reconnection**: ClusterClient handles reconnection and failover between contact points automatically for cross-cluster communication. No custom reconnection logic is required.
|
||||
|
||||
These settings should be tuned for the expected network conditions between central and site clusters.
|
||||
|
||||
@@ -135,7 +145,7 @@ Akka.NET guarantees message ordering between a specific sender/receiver actor pa
|
||||
|
||||
## ManagementActor and ClusterClient
|
||||
|
||||
The ManagementActor is registered at the well-known path `/user/management` on central nodes and advertised via **ClusterClientReceptionist**. External tools (primarily the CLI) connect using Akka.NET ClusterClient, which contacts the receptionist to discover the ManagementActor. ClusterClient is a separate communication channel from the inter-cluster remoting used for central-site messaging — it does not participate in cluster membership or affect the hub-and-spoke topology.
|
||||
The ManagementActor is registered at the well-known path `/user/management` on central nodes and advertised via **ClusterClientReceptionist**. External tools (primarily the CLI) connect using Akka.NET ClusterClient, which contacts the receptionist to discover the ManagementActor. This is a separate ClusterClient usage from the inter-cluster ClusterClient connections used for central-site messaging — the CLI does not participate in cluster membership or affect the hub-and-spoke topology.
|
||||
|
||||
## Connection Failure Behavior
|
||||
|
||||
@@ -144,12 +154,12 @@ The ManagementActor is registered at the well-known path `/user/management` on c
|
||||
|
||||
## Failover Behavior
|
||||
|
||||
- **Central failover**: The standby node takes over the Akka.NET cluster role. In-progress deployments are treated as failed. Sites reconnect to the new active central node.
|
||||
- **Site failover**: The standby node takes over. The Deployment Manager singleton restarts and re-creates the Instance Actor hierarchy. Central detects the node change and reconnects. Ongoing debug streams are interrupted and must be re-established by the engineer.
|
||||
- **Central failover**: The standby node takes over the Akka.NET cluster role. In-progress deployments are treated as failed. Site ClusterClients automatically reconnect to the standby central node via their configured contact points.
|
||||
- **Site failover**: The standby node takes over. The Deployment Manager singleton restarts and re-creates the Instance Actor hierarchy. Central's per-site ClusterClient automatically reconnects to the surviving site node. Ongoing debug streams are interrupted and must be re-established by the engineer.
|
||||
|
||||
## Dependencies
|
||||
|
||||
- **Akka.NET Remoting**: Provides the transport layer.
|
||||
- **Akka.NET Remoting + ClusterClient**: Provides the transport layer. ClusterClient/ClusterClientReceptionist used for all cross-cluster messaging.
|
||||
- **Cluster Infrastructure**: Manages node roles and failover detection.
|
||||
- **Configuration Database**: Provides site node addresses (NodeAAddress, NodeBAddress) for address resolution.
|
||||
|
||||
|
||||
Reference in New Issue
Block a user