# Component: Central–Site Communication

## Purpose

The Communication component manages all messaging between the central cluster and site clusters using Akka.NET. It provides the transport layer for deployments, instance lifecycle commands, integration routing, debug streaming, health reporting, and remote queries (parked messages, event logs).

## Location

Both central and site clusters. Each side has communication actors that handle message routing.

## Responsibilities

- Establish and maintain Akka.NET remoting connections between central and each site cluster.
- Route messages between central and site clusters in a hub-and-spoke topology.
- Broker requests from external systems (via central) to sites and return responses.
- Support multiple concurrent message patterns (request/response, fire-and-forget, streaming).
- Detect site connectivity status for health monitoring.

## Communication Patterns

### 1. Deployment (Central → Site)

- **Pattern**: Request/Response.
- Central sends a flattened configuration to a site.
- Site Runtime receives it, compiles scripts, creates or updates Instance Actors, and responds with success/failure.
- No buffering at central: if the site is unreachable, the deployment fails immediately.
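
The no-buffering rule can be sketched as follows. This is a minimal, language-neutral Python sketch rather than Akka.NET code (in Akka.NET this exchange would typically be an `Ask` from a central actor to a remote site actor). The message names `DeployConfiguration` and `DeploymentResult`, the `is_reachable` check, and the in-process handler standing in for the site are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


# Hypothetical message types; illustrative names, not the project's actual API.
@dataclass(frozen=True)
class DeployConfiguration:
    site_id: str
    instance_name: str
    flattened_config: dict


@dataclass(frozen=True)
class DeploymentResult:
    site_id: str
    instance_name: str
    success: bool
    error: Optional[str] = None


SiteHandler = Callable[[DeployConfiguration], DeploymentResult]


class CentralDeploymentSender:
    """Request/response with no buffering: an unreachable site fails at once."""

    def __init__(self, sites: Dict[str, SiteHandler],
                 is_reachable: Callable[[str], bool]) -> None:
        self._sites = sites                # site_id -> stand-in for the remote site actor
        self._is_reachable = is_reachable  # assumed connectivity check (e.g. cluster membership)

    def deploy(self, msg: DeployConfiguration) -> DeploymentResult:
        handler = self._sites.get(msg.site_id)
        if handler is None or not self._is_reachable(msg.site_id):
            # Nothing is queued for later delivery; the caller sees an immediate failure.
            return DeploymentResult(msg.site_id, msg.instance_name, False, "site unreachable")
        return handler(msg)  # site compiles scripts, updates actors, and replies
```

In this sketch a failed deployment is simply reported; sending it again later is left to the caller.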

### 2. Instance Lifecycle Commands (Central → Site)

- **Pattern**: Request/Response.
- Central sends disable, enable, or delete commands for specific instances.
- Site Runtime processes the command and responds with success/failure.
- If the site is unreachable, the command fails immediately (no buffering).
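
A site-side handler for the three lifecycle commands might look like the sketch below. It assumes, hypothetically, that the Site Runtime tracks an enabled flag per instance unique name; `LifecycleCommand`, `LifecycleResult`, and `SiteInstanceRegistry` are illustrative names, and a real handler would presumably also act on the Instance Actors themselves, which this sketch omits.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict


class LifecycleAction(Enum):
    DISABLE = "disable"
    ENABLE = "enable"
    DELETE = "delete"


# Hypothetical command/reply pair exchanged as a request/response.
@dataclass(frozen=True)
class LifecycleCommand:
    instance_name: str
    action: LifecycleAction


@dataclass(frozen=True)
class LifecycleResult:
    instance_name: str
    success: bool
    error: str = ""


class SiteInstanceRegistry:
    """Stand-in for the Site Runtime: tracks whether each instance is enabled."""

    def __init__(self) -> None:
        self.instances: Dict[str, bool] = {}  # instance unique name -> enabled flag

    def handle(self, cmd: LifecycleCommand) -> LifecycleResult:
        if cmd.instance_name not in self.instances:
            return LifecycleResult(cmd.instance_name, False, "unknown instance")
        if cmd.action is LifecycleAction.DISABLE:
            self.instances[cmd.instance_name] = False
        elif cmd.action is LifecycleAction.ENABLE:
            self.instances[cmd.instance_name] = True
        else:  # LifecycleAction.DELETE
            del self.instances[cmd.instance_name]
        return LifecycleResult(cmd.instance_name, True)
```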

### 3. System-Wide Artifact Deployment (Central → All Sites)

- **Pattern**: Broadcast with per-site acknowledgment.
- When shared scripts, external system definitions, database connections, or notification lists are explicitly deployed, central sends them to all sites.
- Each site acknowledges receipt and reports success/failure independently.
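
Broadcast with per-site acknowledgment means one site's failure must neither block nor hide the other sites' results. The sketch below assumes an async send function per site and a per-site timeout (`timeout_s` is an assumed parameter; the document does not specify one), and uses `asyncio.gather` so every site is contacted concurrently and reported independently. `ArtifactAck` and `broadcast_artifacts` are hypothetical names.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, Dict


@dataclass(frozen=True)
class ArtifactAck:
    success: bool
    error: str = ""


# Hypothetical per-site send function: delivers the bundle and awaits that site's ack.
SiteSender = Callable[[dict], Awaitable[ArtifactAck]]


async def broadcast_artifacts(artifacts: dict, sites: Dict[str, SiteSender],
                              timeout_s: float = 5.0) -> Dict[str, ArtifactAck]:
    """Send one artifact bundle to every site and return one result per site."""

    async def send_one(send: SiteSender) -> ArtifactAck:
        try:
            return await asyncio.wait_for(send(artifacts), timeout_s)
        except asyncio.TimeoutError:
            return ArtifactAck(False, "timed out")
        except Exception as exc:  # one site's failure must not abort the others
            return ArtifactAck(False, str(exc))

    # gather() preserves input order, so results line up with sites.keys().
    results = await asyncio.gather(*(send_one(send) for send in sites.values()))
    return dict(zip(sites.keys(), results))
```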

### 4. Integration Routing (External System → Central → Site → Central → External System)

- **Pattern**: Request/Response (brokered).
- External system sends a request to central (e.g., MES requests machine values).
- Central routes the request to the appropriate site.
- Site reads values from the Instance Actor and responds.
- Central returns the response to the external system.
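
The brokered flow is sketched below under stated assumptions: central keeps a hypothetical `instance_to_site` lookup to find the site that owns an instance, and each site is modeled as a synchronous handler that reads the requested attributes. `ReadValuesRequest`, `ReadValuesResponse`, and `IntegrationBroker` are illustrative names only.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class ReadValuesRequest:
    instance_name: str
    attributes: List[str]


@dataclass(frozen=True)
class ReadValuesResponse:
    instance_name: str
    values: Dict[str, object]
    error: str = ""


# Stand-in for the site: reads values from the Instance Actor and replies.
SiteHandler = Callable[[ReadValuesRequest], ReadValuesResponse]


class IntegrationBroker:
    """Central-side broker: external request -> owning site -> response to caller."""

    def __init__(self, instance_to_site: Dict[str, str],
                 sites: Dict[str, SiteHandler]) -> None:
        self._instance_to_site = instance_to_site  # hypothetical ownership lookup
        self._sites = sites

    def handle_external(self, req: ReadValuesRequest) -> ReadValuesResponse:
        site_id = self._instance_to_site.get(req.instance_name)
        if site_id is None or site_id not in self._sites:
            return ReadValuesResponse(req.instance_name, {}, "no route to instance")
        return self._sites[site_id](req)  # the site's reply is returned unchanged
```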

### 5. Recipe/Command Delivery (External System → Central → Site)

- **Pattern**: Fire-and-forget with acknowledgment.
- External system sends a command to central (e.g., recipe manager sends recipe).
- Central routes the command to the appropriate site.
- Site applies the command and acknowledges it.
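
"Fire-and-forget with acknowledgment" can be read as: central sends without waiting for a reply, and the site's acknowledgment arrives later as a separate message. The sketch below assumes a correlation ID to match acknowledgments to commands; `RecipeDispatcher`, `RecipeCommand`, and `RecipeAck` are hypothetical names, and in Akka.NET the send would typically be a `Tell` rather than an `Ask`.

```python
import itertools
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class RecipeCommand:
    correlation_id: int  # assumed: lets central match the later ack
    site_id: str
    payload: dict


@dataclass(frozen=True)
class RecipeAck:
    correlation_id: int
    applied: bool


class RecipeDispatcher:
    """Central side: send without waiting, then match acknowledgments as they arrive."""

    def __init__(self, send: Callable[[RecipeCommand], None]) -> None:
        self._send = send  # non-blocking send to the site
        self._ids = itertools.count(1)
        self.pending: Dict[int, RecipeCommand] = {}
        self.acked: Dict[int, bool] = {}

    def dispatch(self, site_id: str, payload: dict) -> int:
        cmd = RecipeCommand(next(self._ids), site_id, payload)
        self.pending[cmd.correlation_id] = cmd
        self._send(cmd)  # returns immediately; no reply is awaited here
        return cmd.correlation_id

    def on_ack(self, ack: RecipeAck) -> None:
        # Acks for unknown or already-settled IDs are ignored.
        if self.pending.pop(ack.correlation_id, None) is not None:
            self.acked[ack.correlation_id] = ack.applied
```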

### 6. Debug Streaming (Site → Central)

- **Pattern**: Subscribe/stream with initial snapshot.
- Central sends a subscribe request for a specific instance (identified by unique name).
- Site requests a **snapshot** of all current attribute values and alarm states from the Instance Actor and sends it to central.
- Site then subscribes to the **site-wide Akka stream** filtered by the instance's unique name and forwards attribute value changes and alarm state changes to central.
- Attribute value stream messages: `[InstanceUniqueName].[AttributePath].[AttributeName]`, value, quality, timestamp.
- Alarm state stream messages: `[InstanceUniqueName].[AlarmName]`, state (active/normal), priority, timestamp.
- Central sends an unsubscribe request when the debug view closes. The site removes its stream subscription.
- The stream is session-based and temporary.
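
The site side of a debug session (snapshot first, then filtered live changes until unsubscribe) is sketched below. The site-wide stream is modeled as a plain callback, and the bracketed names in the message formats above are assumed to be placeholders, so a key looks like `Pump1.Motor.Speed`. `DebugSession`, `AttributeChange`, and `AlarmChange` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, List, Union


@dataclass(frozen=True)
class AttributeChange:
    key: str  # InstanceUniqueName.AttributePath.AttributeName
    value: object
    quality: str
    timestamp: float


@dataclass(frozen=True)
class AlarmChange:
    key: str  # InstanceUniqueName.AlarmName
    state: str  # "active" or "normal"
    priority: int
    timestamp: float


Event = Union[AttributeChange, AlarmChange]


class DebugSession:
    """Site side: send a snapshot first, then forward only this instance's events."""

    def __init__(self, instance_name: str, snapshot: Callable[[], List[Event]],
                 to_central: Callable[[Event], None]) -> None:
        self.instance_name = instance_name
        self._to_central = to_central
        self.active = True
        for event in snapshot():  # current state, sent before any live changes
            to_central(event)

    def on_stream_event(self, event: Event) -> None:
        # The site-wide stream carries every instance; the trailing "." keeps
        # "Pump10.*" events out of a "Pump1" session.
        if self.active and event.key.startswith(self.instance_name + "."):
            self._to_central(event)

    def unsubscribe(self) -> None:
        self.active = False  # debug view closed; stop forwarding
```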

### 7. Health Reporting (Site → Central)

- **Pattern**: Periodic push.
- Sites periodically send health metrics (connection status, node status, buffer depth, script error rates, alarm evaluation error rates) to central.
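
The document does not say how central derives site connectivity from these reports. One plausible approach, assumed here, is to treat a site as offline once no report has arrived for several reporting intervals. `SiteHealthReport`, `SiteHealthTracker`, and the `missed_allowed` threshold are hypothetical; the report fields mirror the metrics listed above.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class SiteHealthReport:
    site_id: str
    connection_status: Dict[str, bool]  # e.g. data connection name -> connected
    active_node: str                    # node status
    buffer_depth: int
    script_error_rate: float
    alarm_eval_error_rate: float


class SiteHealthTracker:
    """Central side: keep the latest report per site and flag sites that go quiet."""

    def __init__(self, report_interval_s: float, missed_allowed: int = 3) -> None:
        self._stale_after = report_interval_s * missed_allowed  # assumed threshold
        self.latest: Dict[str, SiteHealthReport] = {}
        self._last_seen: Dict[str, float] = {}

    def on_report(self, report: SiteHealthReport, received_at: float) -> None:
        self.latest[report.site_id] = report
        # Central's receive time is used so site clock skew does not affect the check.
        self._last_seen[report.site_id] = received_at

    def is_online(self, site_id: str, now: float) -> bool:
        seen = self._last_seen.get(site_id)
        return seen is not None and now - seen <= self._stale_after
```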

### 8. Remote Queries (Central → Site)

- **Pattern**: Request/Response.
- Central queries sites for:
  - Parked messages (store-and-forward dead letters).
  - Site event logs.
- Central can also send management commands:
  - Retry or discard parked messages.
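
Site-side handling of the parked-message query and the retry/discard commands might look like this minimal sketch. `ParkedMessage`, `ParkedMessageStore`, and the `deliver` callback are hypothetical stand-ins for the Store-and-Forward Engine; in this sketch a retry that fails again leaves the message parked.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class ParkedMessage:
    message_id: str
    target: str  # e.g. external system name
    payload: dict
    last_error: str


class ParkedMessageStore:
    """Site-side stand-in for the store-and-forward dead-letter area."""

    def __init__(self, deliver: Callable[[ParkedMessage], bool]) -> None:
        self._deliver = deliver  # hypothetical delivery attempt; True on success
        self._parked: Dict[str, ParkedMessage] = {}

    def park(self, msg: ParkedMessage) -> None:
        self._parked[msg.message_id] = msg

    def query(self) -> List[ParkedMessage]:
        return list(self._parked.values())  # answers central's remote query

    def retry(self, message_id: str) -> bool:
        msg = self._parked.get(message_id)
        if msg is None or not self._deliver(msg):
            return False  # unknown ID, or delivery failed again: stays parked
        del self._parked[message_id]
        return True

    def discard(self, message_id: str) -> bool:
        return self._parked.pop(message_id, None) is not None
```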

## Topology

```
Central Cluster
├── Akka.NET Remoting → Site A Cluster
├── Akka.NET Remoting → Site B Cluster
└── Akka.NET Remoting → Site N Cluster
```

- Sites do **not** communicate with each other.
- All inter-cluster communication flows through central.
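
The hub-and-spoke rule can be expressed as a routing check: every message must have central as its source or its destination. This is a hypothetical sketch (`HubRouter`, `Envelope`, and `CENTRAL` are illustrative names) in which direct site-to-site traffic is rejected.

```python
from dataclasses import dataclass
from typing import Dict, List

CENTRAL = "central"


@dataclass(frozen=True)
class Envelope:
    source: str
    destination: str
    body: object


class HubRouter:
    """Hub-and-spoke: central is at one end of every hop; site-to-site sends are refused."""

    def __init__(self, site_ids: List[str]) -> None:
        self.outboxes: Dict[str, List[Envelope]] = {s: [] for s in site_ids}
        self.outboxes[CENTRAL] = []

    def route(self, env: Envelope) -> None:
        if env.destination not in self.outboxes:
            raise KeyError(f"unknown destination {env.destination!r}")
        if CENTRAL not in (env.source, env.destination):
            raise PermissionError("sites do not communicate with each other")
        self.outboxes[env.destination].append(env)
```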

## Failover Behavior

- **Central failover**: The standby node takes over the Akka.NET cluster role. In-progress deployments are treated as failed. Sites reconnect to the new active central node.
- **Site failover**: The standby node takes over. The Deployment Manager singleton restarts and re-creates the Instance Actor hierarchy. Central detects the node change and reconnects. Ongoing debug streams are interrupted and must be re-established by the engineer.
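
The rule that in-progress deployments are treated as failed on central failover can be sketched as a recovery step run by the newly active node. It assumes, hypothetically, that deployment records are persisted somewhere the standby can read; `DeploymentRecord`, `DeploymentStatus`, and `on_central_failover` are illustrative names.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict


class DeploymentStatus(Enum):
    IN_PROGRESS = "in_progress"
    SUCCEEDED = "succeeded"
    FAILED = "failed"


@dataclass
class DeploymentRecord:
    deployment_id: str
    site_id: str
    status: DeploymentStatus
    reason: str = ""


def on_central_failover(records: Dict[str, DeploymentRecord]) -> int:
    """Run by the newly active central node: no in-flight deployment is resumed."""
    failed = 0
    for rec in records.values():
        if rec.status is DeploymentStatus.IN_PROGRESS:
            rec.status = DeploymentStatus.FAILED
            rec.reason = "central failover"
            failed += 1
    return failed  # number of deployments that were marked failed
```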

## Dependencies

- **Akka.NET Remoting**: Provides the transport layer.
- **Cluster Infrastructure**: Manages node roles and failover detection.

## Interactions

- **Deployment Manager (central)**: Uses communication to deliver configurations, lifecycle commands, and system-wide artifacts, and to receive status.
- **Site Runtime**: Receives deployments, lifecycle commands, and artifact updates. Provides debug view data.
- **Central UI**: Debug view requests and remote queries flow through communication.
- **Health Monitoring**: Receives periodic health reports from sites.
- **Store-and-Forward Engine (site)**: Parked message queries/commands are routed through communication.
- **Site Event Logging**: Event log queries are routed through communication.