feat: replace site registration with database-driven site addressing

Central now resolves site Akka remoting addresses from the Sites DB table
(NodeAAddress/NodeBAddress) instead of relying on runtime RegisterSite
messages. Eliminates the race condition where sites starting before central
had their registration dead-lettered. Addresses are cached in
CentralCommunicationActor with 60s periodic refresh and on-demand refresh
when sites are added/edited/deleted via UI or CLI.
This commit is contained in:
Joseph Doherty
2026-03-17 23:13:10 -04:00
parent eb8d5ca2c0
commit 9e97c1acd2
21 changed files with 1641 additions and 92 deletions

View File

@@ -10,6 +10,7 @@ Both central and site clusters. Each side has communication actors that handle m
## Responsibilities
- Resolve site addresses from the configuration database and maintain a cached address map.
- Establish and maintain Akka.NET remoting connections between central and each site cluster.
- Route messages between central and site clusters in a hub-and-spoke topology.
- Broker requests from external systems (via central) to sites and return responses.
@@ -82,6 +83,17 @@ Central Cluster
- Sites do **not** communicate with each other.
- All inter-cluster communication flows through central.
## Site Address Resolution
Central discovers site addresses through the **configuration database**, not runtime registration:
- Each site record in the Sites table includes optional **NodeAAddress** and **NodeBAddress** fields containing the Akka remoting paths of the site's cluster nodes.
- The **CentralCommunicationActor** loads all site addresses from the database at startup and caches them in memory.
- The cache is **refreshed every 60 seconds** and **on-demand** when site records are added, edited, or deleted via the Central UI or CLI.
- When routing a message to a site, the actor **prefers NodeA** and **falls back to NodeB** if NodeA is unreachable.
- **Heartbeats** from sites serve **health monitoring only** — they do not serve as a registration or address discovery mechanism.
- If no addresses are configured for a site, messages to that site are **dropped** and the caller's Ask times out.
## Message Timeouts
Each request/response pattern has a default timeout that can be overridden in configuration:
@@ -139,6 +151,7 @@ The ManagementActor is registered at the well-known path `/user/management` on c
- **Akka.NET Remoting**: Provides the transport layer.
- **Cluster Infrastructure**: Manages node roles and failover detection.
- **Configuration Database**: Provides site node addresses (NodeAAddress, NodeBAddress) for address resolution.
## Interactions