Add Management Service and CLI components (design docs)

New components 18-19: ManagementService (Akka.NET actor on Central exposing
all admin operations via ClusterClientReceptionist) and CLI (console app using
ClusterClient for scripting). Updated HighLevelReqs, CLAUDE.md, README,
Component-Host, Component-Communication, Component-Security.
Joseph Doherty
2026-03-17 14:28:02 -04:00
parent 7dcdcc46c7
commit 50dad61e72
8 changed files with 410 additions and 6 deletions

@@ -121,6 +121,10 @@ This provides protocol-level safety beyond Akka.NET's transport guarantees, whic
Akka.NET guarantees message ordering between a specific sender/receiver actor pair. The Communication Layer relies on this guarantee — messages to a given site are processed in the order they are sent. Callers do not need to handle out-of-order delivery.
## ManagementActor and ClusterClient
The ManagementActor is registered at the well-known path `/user/management` on central nodes and advertised via **ClusterClientReceptionist**. External tools (primarily the CLI) connect using Akka.NET ClusterClient, which contacts the receptionist to discover the ManagementActor. ClusterClient is a separate communication channel from the inter-cluster remoting used for central-site messaging — it does not participate in cluster membership or affect the hub-and-spoke topology.
## Connection Failure Behavior
- **In-flight messages**: When a connection drops while a request is in flight (e.g., deployment sent but no response received), the Akka ask pattern times out and the caller receives a failure. There is **no automatic retry or buffering at central** — the engineer sees the failure in the UI and re-initiates the action. This is consistent with the design principle that central does not buffer messages.
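
The in-flight behavior above can be sketched as a small model. This is an illustrative Python stand-in, not the project's Akka.NET code: `ask`, `AskTimeout`, `deploy`, and the two connection functions are hypothetical names, and the 50 ms timeout is an arbitrary value chosen for the example. It shows a single request that either gets a reply or surfaces a failure to the caller, with no retry and no buffering.

```python
import asyncio


class AskTimeout(Exception):
    """Raised to the caller when no reply arrives within the ask timeout."""


async def ask(send, message, timeout_s):
    """Send one request and await one reply; never retries or buffers."""
    try:
        return await asyncio.wait_for(send(message), timeout=timeout_s)
    except asyncio.TimeoutError:
        raise AskTimeout(f"no reply to {message!r} within {timeout_s}s") from None


async def dropped_connection(message):
    """Hypothetical site link whose reply never arrives."""
    await asyncio.Event().wait()  # blocks until the timeout cancels it


async def healthy_connection(message):
    """Hypothetical site link that acknowledges immediately."""
    return f"ack:{message}"


def deploy(send, timeout_s=0.05):
    """Return (ok, detail) so a UI could show the outcome to the engineer."""
    try:
        return True, asyncio.run(ask(send, "deploy", timeout_s))
    except AskTimeout as exc:
        return False, str(exc)
```

Calling `deploy(dropped_connection)` returns a failure tuple after the timeout rather than retrying. In this model, as in the design described above, re-initiating the action is left to the engineer.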
@@ -144,3 +148,4 @@ Akka.NET guarantees message ordering between a specific sender/receiver actor pa
- **Health Monitoring**: Receives periodic health reports from sites.
- **Store-and-Forward Engine (site)**: Parked message queries/commands are routed through communication.
- **Site Event Logging**: Event log queries are routed through communication.
- **Management Service**: The ManagementActor is registered with ClusterClientReceptionist on central nodes. The CLI communicates with the ManagementActor via ClusterClient, which is a separate channel from inter-cluster remoting.