feat: complete gRPC streaming channel — site host, docker config, docs, integration tests
Switch site host to WebApplicationBuilder with Kestrel HTTP/2 gRPC server, add GrpcPort/keepalive config, wire SiteStreamManager as ISiteStreamSubscriber, expose gRPC ports in docker-compose, add site seed script, update all 10 requirement docs + CLAUDE.md + README.md for the new dual-transport architecture.
@@ -88,7 +88,7 @@ scadalink instance delete <code>
 ```
 scadalink site list [--format json|table]
 scadalink site get <site-id> [--format json|table]
-scadalink site create --name <name> --id <site-id>
+scadalink site create --name <name> --id <site-id> [--node-a-address <addr>] [--node-b-address <addr>] [--grpc-node-a-address <addr>] [--grpc-node-b-address <addr>]
 scadalink site update <site-id> --file <path>
 scadalink site delete <site-id>
 scadalink site area list <site-id>

@@ -24,7 +24,7 @@ Central cluster only. Sites have no user interface.

 ## Real-Time Updates

-- **Debug view**: Real-time display of attribute values and alarm states via **streaming**. When the user opens a debug view, a `DebugStreamBridgeActor` on the central side subscribes to the site's Akka stream for the selected instance. The bridge actor delivers an initial `DebugViewSnapshot` followed by ongoing `AttributeValueChanged` and `AlarmStateChanged` events to the Blazor component via callbacks, which call `InvokeAsync(StateHasChanged)` to push UI updates through the built-in SignalR circuit.
+- **Debug view**: Real-time display of attribute values and alarm states via **gRPC streaming**. When the user opens a debug view, a `DebugStreamBridgeActor` on the central side opens a gRPC server-streaming subscription to the site's `SiteStreamGrpcServer` for the selected instance, then requests an initial `DebugViewSnapshot` via ClusterClient. Ongoing `AttributeValueChanged` and `AlarmStateChanged` events flow via the gRPC stream (not through ClusterClient) to the bridge actor, which delivers them to the Blazor component via callbacks that call `InvokeAsync(StateHasChanged)` to push UI updates through the built-in SignalR circuit.
 - **Health dashboard**: Site status, connection health, error rates, and buffer depths update via a **10-second auto-refresh timer**. Since health reports arrive from sites every 30 seconds, a 10s poll interval catches updates within one reporting cycle without unnecessary overhead.
 - **Deployment status**: Pending/in-progress/success/failed transitions **push to the UI immediately** via SignalR (built into Blazor Server). No polling required for deployment tracking.

@@ -66,7 +66,7 @@ Central cluster only. Sites have no user interface.
 - Configure SMTP settings.

 ### Site & Data Connection Management (Admin Role)
-- Create, edit, and delete site definitions.
+- Create, edit, and delete site definitions, including Akka node addresses (NodeA/NodeB) and gRPC node addresses (GrpcNodeA/GrpcNodeB).
 - Define data connections and assign them to sites (name, protocol type, connection details).

 ### Area Management (Admin Role)

@@ -101,8 +101,8 @@ Central cluster only. Sites have no user interface.
 ### Debug View (Deployment Role)
 - Select a deployed instance and open a live debug view.
 - Real-time streaming of all attribute values (with quality and timestamp) and alarm states for that instance.
-- The `DebugStreamService` creates a `DebugStreamBridgeActor` on the central side that subscribes to the site's Akka stream for the selected instance.
-- The bridge actor receives an initial `DebugViewSnapshot` followed by ongoing `AttributeValueChanged` and `AlarmStateChanged` events from the site.
+- The `DebugStreamService` creates a `DebugStreamBridgeActor` on the central side. The bridge actor opens a **gRPC server-streaming subscription** to the site's `SiteStreamGrpcServer` for the selected instance, then requests an initial `DebugViewSnapshot` via ClusterClient.
+- Ongoing events (`AttributeValueChanged`, `AlarmStateChanged`) flow via the gRPC stream directly to the bridge actor — they do not pass through ClusterClient.
 - Events are delivered to the Blazor component via callbacks, which call `InvokeAsync(StateHasChanged)` to push UI updates through the built-in SignalR circuit.
 - A pulsing "Live" indicator replaces the static "Connected" badge when streaming is active.
 - Stream includes attribute values formatted as `[InstanceUniqueName].[AttributePath].[AttributeName]` and alarm states formatted as `[InstanceUniqueName].[AlarmName]`.

@@ -106,7 +106,8 @@ The Host component wires CoordinatedShutdown into the Windows Service lifecycle
 Each node is configured with:
 - **Cluster seed nodes**: **Both nodes** are seed nodes — each node lists both itself and its partner. Either node can start first and form the cluster; the other joins when it starts. No startup ordering dependency.
 - **Cluster role**: Central or Site (plus site identifier for site clusters).
-- **Akka.NET remoting**: Hostname/port for inter-node and inter-cluster communication.
+- **Akka.NET remoting**: Hostname/port for inter-node and inter-cluster communication (default 8081 central, 8082 site).
+- **gRPC port** (site nodes only): Dedicated HTTP/2 port for the SiteStreamGrpcServer (default 8083). Separate from the Akka remoting port — gRPC uses Kestrel, Akka uses its own TCP transport.
 - **Local storage paths**: SQLite database locations (site nodes only).

 ## Windows Service

@@ -2,7 +2,7 @@

 ## Purpose

-The Communication component manages all messaging between the central cluster and site clusters using Akka.NET. It provides the transport layer for deployments, instance lifecycle commands, integration routing, debug streaming, health reporting, and remote queries (parked messages, event logs).
+The Communication component manages all messaging between the central cluster and site clusters. It provides the transport layer for deployments, instance lifecycle commands, integration routing, debug streaming, health reporting, and remote queries (parked messages, event logs). Two transports are used: **Akka.NET ClusterClient** for command/control messaging and **gRPC server-streaming** for real-time data (attribute values, alarm states).

 ## Location

@@ -10,12 +10,15 @@ Both central and site clusters. Each side has communication actors that handle m

 ## Responsibilities

-- Resolve site addresses from the configuration database and maintain a cached address map.
-- Establish and maintain cross-cluster connections using Akka.NET ClusterClient/ClusterClientReceptionist.
+- Resolve site addresses (Akka remoting and gRPC) from the configuration database and maintain a cached address map.
+- Establish and maintain cross-cluster connections using Akka.NET ClusterClient/ClusterClientReceptionist for command/control.
+- Establish and maintain per-site gRPC streaming connections for real-time data delivery (site→central).
 - Route messages between central and site clusters in a hub-and-spoke topology.
 - Broker requests from external systems (via central) to sites and return responses.
 - Support multiple concurrent message patterns (request/response, fire-and-forget, streaming).
 - Detect site connectivity status for health monitoring.
+- Host the **SiteStreamGrpcServer** on site nodes (Kestrel HTTP/2) to serve real-time event streams.
+- Manage per-site **SiteStreamGrpcClient** instances on central nodes via **SiteStreamGrpcClientFactory**.

 ## Communication Patterns

@@ -50,22 +53,55 @@ Both central and site clusters. Each side has communication actors that handle m
 - Site applies and acknowledges.

 ### 6. Debug Streaming (Site → Central)
-- **Pattern**: Subscribe/push with initial snapshot (no polling).
-- A **DebugStreamBridgeActor** (one per active debug session) is created on the central cluster by the **DebugStreamService**. The bridge actor sends a `SubscribeDebugViewRequest` to the site via `CentralCommunicationActor`. The site's `InstanceActor` stores the subscription's correlation ID and replies with an initial snapshot via the ClusterClient reply path.
-- Site requests a **snapshot** of all current attribute values and alarm states from the Instance Actor and sends it back to the bridge actor (via the ClusterClient reply path, which works for immediate responses).
-- For ongoing events, the InstanceActor wraps `AttributeValueChanged` and `AlarmStateChanged` in a `DebugStreamEvent(correlationId, event)` message and sends it to the local `SiteCommunicationActor`. The SiteCommunicationActor forwards it to central via its own ClusterClient (`ClusterClient.Send("/user/central-communication", event)`). The `CentralCommunicationActor` looks up the bridge actor by correlation ID and delivers the event. This follows the same site→central pattern as health reports.
+- **Pattern**: Subscribe/push with initial snapshot. Two transports: **ClusterClient** for the subscribe/unsubscribe handshake and initial snapshot, **gRPC server-streaming** for ongoing real-time events.
+- A **DebugStreamBridgeActor** (one per active debug session) is created on the central cluster by the **DebugStreamService**. The bridge actor first opens a **gRPC server-streaming subscription** to the site via `SiteStreamGrpcClient`, then sends a `SubscribeDebugViewRequest` to the site via `CentralCommunicationActor` (ClusterClient). The site's `InstanceActor` replies with an initial snapshot via the ClusterClient reply path.
+- **gRPC stream (real-time events)**: The site's **SiteStreamGrpcServer** receives the gRPC `SubscribeInstance` call and creates a **StreamRelayActor** that subscribes to **SiteStreamManager** for the requested instance. Events (`AttributeValueChanged`, `AlarmStateChanged`) flow from `SiteStreamManager` → `StreamRelayActor` → `Channel<SiteStreamEvent>` (bounded, 1000, DropOldest) → gRPC response stream → `SiteStreamGrpcClient` on central → `DebugStreamBridgeActor`.
+- The `DebugStreamEvent` message type no longer exists — events are not routed through ClusterClient. `SiteCommunicationActor` and `CentralCommunicationActor` have no role in streaming event delivery.
+- The bridge actor forwards received events to the consumer via callbacks (Blazor component or SignalR hub).
+- **Snapshot-to-stream handoff**: The gRPC stream is opened **before** the snapshot request to avoid missing events. The consumer applies the snapshot as baseline, then replays buffered gRPC events with timestamps newer than the snapshot (timestamp-based dedup).
 - Attribute value stream messages: `[InstanceUniqueName].[AttributePath].[AttributeName]`, value, quality, timestamp.
 - Alarm state stream messages: `[InstanceUniqueName].[AlarmName]`, state (active/normal), priority, timestamp.
-- Central sends an unsubscribe request when the debug session ends. The site removes its stream subscription and the bridge actor is stopped.
+- Central sends an unsubscribe request via ClusterClient when the debug session ends. The gRPC stream is cancelled. The site's `StreamRelayActor` is stopped and the SiteStreamManager subscription is removed.
 - The stream is session-based and temporary.
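
The snapshot-to-stream handoff above can be sketched as follows. This is an illustrative C# sketch, not code from this repository — the `DebugEvent` record and `Merge` helper are hypothetical names standing in for the actual snapshot/event types:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical flattened event: Key is the formatted name
// ([InstanceUniqueName].[AttributePath].[AttributeName] or [InstanceUniqueName].[AlarmName]).
record DebugEvent(string Key, string Value, DateTimeOffset Timestamp);

static class SnapshotHandoff
{
    // Apply the snapshot as the baseline, then replay buffered gRPC events,
    // keeping an event only if it is newer than the snapshot's value for the
    // same key (the timestamp-based dedup described above).
    public static Dictionary<string, DebugEvent> Merge(
        IEnumerable<DebugEvent> snapshot, IEnumerable<DebugEvent> buffered)
    {
        var state = snapshot.ToDictionary(e => e.Key);
        foreach (var ev in buffered)
            if (!state.TryGetValue(ev.Key, out var current) || ev.Timestamp > current.Timestamp)
                state[ev.Key] = ev;   // newer event wins; stale buffered events are dropped
        return state;
    }
}
```

Because the gRPC stream is opened first, every event is either older than the snapshot (dropped by the comparison) or newer (applied), so no change is lost in the gap.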

+#### Site-Side gRPC Streaming Components
+
+- **SiteStreamGrpcServer**: gRPC service (`SiteStreamService.SiteStreamServiceBase`) hosted on each site node via Kestrel HTTP/2 on a dedicated port (default 8083). Implements the `SubscribeInstance` RPC. For each subscription, creates a `StreamRelayActor` that subscribes to `SiteStreamManager` and bridges events through a `Channel<SiteStreamEvent>` to the gRPC response stream. Tracks active subscriptions by `correlation_id` — duplicate IDs cancel the old stream. Enforces a max concurrent stream limit (default 100). Rejects streams with `StatusCode.Unavailable` before the actor system is ready.
+- **StreamRelayActor**: Short-lived actor created per gRPC subscription. Receives domain events (`AttributeValueChanged`, `AlarmStateChanged`) from `SiteStreamManager`, converts them to protobuf `SiteStreamEvent` messages, and writes them to the `Channel<SiteStreamEvent>` writer. Stopped when the gRPC stream is cancelled or the client disconnects.
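
A minimal sketch of the channel bridge these two components share, assuming a protobuf-generated `SiteStreamEvent` type; the `InstanceSubscription` class is a hypothetical name and the actor plumbing is elided:

```csharp
using System.Threading;
using System.Threading.Channels;
using System.Threading.Tasks;
using Grpc.Core;

public class InstanceSubscription
{
    // Bounded channel (1000, DropOldest): if the gRPC writer falls behind,
    // the oldest events drop instead of back-pressuring the publishing actors.
    private readonly Channel<SiteStreamEvent> _channel =
        Channel.CreateBounded<SiteStreamEvent>(new BoundedChannelOptions(1000)
        {
            FullMode = BoundedChannelFullMode.DropOldest,
            SingleReader = true
        });

    // Called by the StreamRelayActor for each converted domain event.
    // TryWrite never blocks, preserving fire-and-forget publishing.
    public void Publish(SiteStreamEvent ev) => _channel.Writer.TryWrite(ev);

    // Pump loop run inside the SubscribeInstance RPC handler:
    // channel reader → gRPC response stream.
    public async Task PumpAsync(
        IServerStreamWriter<SiteStreamEvent> responseStream, CancellationToken ct)
    {
        await foreach (var ev in _channel.Reader.ReadAllAsync(ct))
            await responseStream.WriteAsync(ev);  // throws when the client is gone (TCP RST)
    }
}
```

The `DropOldest` mode is what gives each subscriber an independent bounded buffer, as the backpressure section later describes.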

 #### Central-Side Debug Stream Components

-- **DebugStreamService**: Singleton service that manages debug stream sessions. Resolves instance ID to unique name and site, creates and tears down `DebugStreamBridgeActor` instances, and provides a clean API for both Blazor components and the SignalR hub.
-- **DebugStreamBridgeActor**: One per active debug session. Acts as the Akka-level subscriber registered with the site's `InstanceActor`. Receives real-time `AttributeValueChanged` and `AlarmStateChanged` events from the site and forwards them to the consumer via callbacks.
+- **DebugStreamService**: Singleton service that manages debug stream sessions. Resolves instance ID to unique name and site, creates and tears down `DebugStreamBridgeActor` instances, and provides a clean API for both Blazor components and the SignalR hub. Uses an injected `SiteStreamGrpcClientFactory` for gRPC stream creation.
+- **DebugStreamBridgeActor**: One per active debug session. Opens a gRPC streaming subscription via `SiteStreamGrpcClient` and receives real-time events via callback. Also receives the initial `DebugViewSnapshot` via ClusterClient. Forwards all events to the consumer via callbacks. Handles gRPC stream errors with reconnection logic: tries the other site node endpoint, retries with backoff (max 3 retries), terminates the session if all retries fail.
+- **SiteStreamGrpcClient**: Per-site gRPC client that manages `GrpcChannel` instances and streaming subscriptions. Reads from the gRPC response stream in a background task, converts protobuf messages to domain events, and invokes the `onEvent` callback.
+- **SiteStreamGrpcClientFactory**: Caches per-site `SiteStreamGrpcClient` instances. Reads `GrpcNodeAAddress` / `GrpcNodeBAddress` from the `Site` entity (loaded by `CentralCommunicationActor`). Falls back to NodeB if NodeA connection fails. Disposes clients on site removal or address change.
 - **DebugStreamHub**: SignalR hub at `/hubs/debug-stream` for external consumers (e.g., CLI). Authenticates via Basic Auth + LDAP and requires the **Deployment** role. Server-to-client methods: `OnSnapshot`, `OnAttributeChanged`, `OnAlarmChanged`, `OnStreamTerminated`.
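
The central-side read loop can be sketched like this, assuming the client generated from `sitestream.proto`; the method shape and the `onEvent` signature are illustrative, and reconnection is elided:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using Grpc.Core;
using Grpc.Net.Client;

public static class SiteStreamSubscription
{
    public static async Task SubscribeAsync(
        string grpcAddress, string correlationId, string instanceUniqueName,
        Action<SiteStreamEvent> onEvent, CancellationToken ct)
    {
        // One channel per site endpoint; in the real design this is cached
        // by SiteStreamGrpcClientFactory rather than created per call.
        using var channel = GrpcChannel.ForAddress(grpcAddress);
        var client = new SiteStreamService.SiteStreamServiceClient(channel);

        using var call = client.SubscribeInstance(
            new InstanceStreamRequest
            {
                CorrelationId = correlationId,
                InstanceUniqueName = instanceUniqueName
            },
            cancellationToken: ct);

        // Background read loop: each protobuf event is handed to the callback,
        // which the DebugStreamBridgeActor converts to a domain event.
        await foreach (var ev in call.ResponseStream.ReadAllAsync(ct))
            onEvent(ev);
    }
}
```

Cancelling `ct` ends the `await foreach`, which is how the unsubscribe path tears the stream down from the central side.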
+#### gRPC Proto Definition
+
+The streaming protocol is defined in `sitestream.proto` (`src/ScadaLink.Communication/Protos/sitestream.proto`):
+
+- **Service**: `SiteStreamService` with a single RPC `SubscribeInstance(InstanceStreamRequest) returns (stream SiteStreamEvent)`.
+- **Messages**: `InstanceStreamRequest` (correlation_id, instance_unique_name), `SiteStreamEvent` (correlation_id, oneof event: `AttributeValueUpdate`, `AlarmStateUpdate`).
+- The `oneof event` pattern is extensible — future event types (health metrics, connection state changes) are added as new fields without breaking existing consumers.
+- Proto field numbers are never reused. Old clients ignore unknown `oneof` variants.
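
A proto3 sketch consistent with the service and messages listed above. Only the service name, RPC, message names, and the fields called out in this section come from the doc; the field numbers and the payload fields of the two update messages are assumptions:

```proto
syntax = "proto3";

service SiteStreamService {
  rpc SubscribeInstance (InstanceStreamRequest) returns (stream SiteStreamEvent);
}

message InstanceStreamRequest {
  string correlation_id = 1;
  string instance_unique_name = 2;
}

message SiteStreamEvent {
  string correlation_id = 1;
  oneof event {
    AttributeValueUpdate attribute_value_update = 2;
    AlarmStateUpdate alarm_state_update = 3;
    // Future event types (health metrics, connection state changes)
    // are added here with new field numbers.
  }
}

// Payload fields below are illustrative, matching the stream message
// formats described earlier in this section.
message AttributeValueUpdate {
  string name = 1;       // [InstanceUniqueName].[AttributePath].[AttributeName]
  string value = 2;
  string quality = 3;
  int64 timestamp = 4;
}

message AlarmStateUpdate {
  string name = 1;       // [InstanceUniqueName].[AlarmName]
  bool active = 2;       // active/normal
  int32 priority = 3;
  int64 timestamp = 4;
}
```

Because consumers switch on the `oneof` case, adding a new variant is backward compatible as long as field numbers are never reused.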
+#### gRPC Connection Keepalive
+
+Three layers of dead-client detection prevent orphan streams on site nodes:
+
+| Layer | Detects | Timeline | Mechanism |
+|-------|---------|----------|-----------|
+| TCP RST | Clean process death, connection close | 1–5s | OS-level TCP, `WriteAsync` throws |
+| gRPC keepalive PING | Network partition, silent crash, firewall drop | ~25s | HTTP/2 PING frames, `CancellationToken` fires |
+| Session timeout | Misconfigured keepalive, long-lived zombie streams | 4 hours | `CancellationTokenSource.CancelAfter` |
+
+Keepalive settings are configurable via `CommunicationOptions`:
+- `GrpcKeepAlivePingDelay`: 15 seconds (default)
+- `GrpcKeepAlivePingTimeout`: 10 seconds (default)
+- `GrpcMaxStreamLifetime`: 4 hours (default)
+- `GrpcMaxConcurrentStreams`: 100 (default)
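
One plausible mapping of these options onto the standard .NET keepalive knobs — a sketch, not the actual implementation. The `Grpc*` option names are from `CommunicationOptions` above; the `SocketsHttpHandler` and Kestrel `Http2` limit properties are real .NET APIs:

```csharp
using System;
using System.Net.Http;
using Grpc.Net.Client;
using Microsoft.AspNetCore.Server.Kestrel.Core;

static class GrpcKeepAlive
{
    // Client side (central): send an HTTP/2 PING every 15s, and let the
    // stream's CancellationToken fire after 10s without a PING ACK.
    public static GrpcChannel CreateChannel(string address) =>
        GrpcChannel.ForAddress(address, new GrpcChannelOptions
        {
            HttpHandler = new SocketsHttpHandler
            {
                KeepAlivePingDelay = TimeSpan.FromSeconds(15),    // GrpcKeepAlivePingDelay
                KeepAlivePingTimeout = TimeSpan.FromSeconds(10),  // GrpcKeepAlivePingTimeout
                KeepAlivePingPolicy = HttpKeepAlivePingPolicy.Always
            }
        });

    // Server side (site): matching Kestrel HTTP/2 limits. Note that
    // MaxStreamsPerConnection is per connection, so it only approximates
    // the doc's server-wide GrpcMaxConcurrentStreams limit; the 4-hour
    // GrpcMaxStreamLifetime is enforced per stream via
    // CancellationTokenSource.CancelAfter in the RPC handler.
    public static void Configure(KestrelServerOptions kestrel)
    {
        kestrel.Limits.Http2.KeepAlivePingDelay = TimeSpan.FromSeconds(15);
        kestrel.Limits.Http2.KeepAlivePingTimeout = TimeSpan.FromSeconds(10);
        kestrel.Limits.Http2.MaxStreamsPerConnection = 100;
    }
}
```

With 15s delay + 10s timeout, a silently dead peer is detected in roughly 25s, matching the ~25s row in the table.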
 ### 6a. Debug Snapshot (Central → Site)
 - **Pattern**: Request/Response (one-shot, no subscription).
 - Central sends a `DebugSnapshotRequest` (identified by instance unique name) to the site.
@@ -91,12 +127,17 @@ Both central and site clusters. Each side has communication actors that handle m

 ```
 Central Cluster
-├── ClusterClient → Site A Cluster (SiteCommunicationActor via Receptionist)
-├── ClusterClient → Site B Cluster (SiteCommunicationActor via Receptionist)
-└── ClusterClient → Site N Cluster (SiteCommunicationActor via Receptionist)
+├── ClusterClient → Site A Cluster (SiteCommunicationActor via Receptionist) [command/control]
+├── ClusterClient → Site B Cluster (SiteCommunicationActor via Receptionist) [command/control]
+└── ClusterClient → Site N Cluster (SiteCommunicationActor via Receptionist) [command/control]
+│
+├── SiteStreamGrpcClient ◄── gRPC stream ── Site A (SiteStreamGrpcServer) [real-time data]
+├── SiteStreamGrpcClient ◄── gRPC stream ── Site B (SiteStreamGrpcServer) [real-time data]
+└── SiteStreamGrpcClient ◄── gRPC stream ── Site N (SiteStreamGrpcServer) [real-time data]

 Site Clusters
-└── ClusterClient → Central Cluster (CentralCommunicationActor via Receptionist)
+├── ClusterClient → Central Cluster (CentralCommunicationActor via Receptionist) [command/control]
+└── SiteStreamGrpcServer (Kestrel HTTP/2, port 8083) → serves gRPC streams [real-time data]
 ```

 - Sites do **not** communicate with each other.
@@ -107,8 +148,8 @@ Site Clusters

 Central discovers site addresses through the **configuration database**, not runtime registration:

-- Each site record in the Sites table includes optional **NodeAAddress** and **NodeBAddress** fields containing base Akka addresses of the site's cluster nodes (e.g., `akka.tcp://scadalink@host:port`).
-- The **CentralCommunicationActor** loads all site addresses from the database at startup and creates one **ClusterClient per site**, configured with both NodeA and NodeB as contact points.
+- Each site record in the Sites table includes optional **NodeAAddress** and **NodeBAddress** fields containing base Akka addresses of the site's cluster nodes (e.g., `akka.tcp://scadalink@host:port`), and optional **GrpcNodeAAddress** and **GrpcNodeBAddress** fields containing gRPC endpoints (e.g., `http://host:8083`).
+- The **CentralCommunicationActor** loads all site addresses from the database at startup and creates one **ClusterClient per site**, configured with both NodeA and NodeB as contact points. The **SiteStreamGrpcClientFactory** uses `GrpcNodeAAddress` / `GrpcNodeBAddress` to create per-site gRPC channels for streaming.
 - The address cache is **refreshed every 60 seconds** and **on-demand** when site records are added, edited, or deleted via the Central UI or CLI. ClusterClient instances are recreated when contact points change.
 - When routing a message to a site, central sends via `ClusterClient.Send("/user/site-communication", msg)`. **ClusterClient handles failover between NodeA and NodeB internally** — there is no application-level NodeA preference/NodeB fallback logic.
 - **Heartbeats** from sites serve **health monitoring only** — they do not serve as a registration or address discovery mechanism.

@@ -166,7 +207,7 @@ The ManagementActor is registered at the well-known path `/user/management` on c
 ## Connection Failure Behavior

 - **In-flight messages**: When a connection drops while a request is in flight (e.g., deployment sent but no response received), the Akka ask pattern times out and the caller receives a failure. There is **no automatic retry or buffering at central** — the engineer sees the failure in the UI and re-initiates the action. This is consistent with the design principle that central does not buffer messages.
-- **Debug streams**: Any connection interruption (failover or network blip) kills the debug stream. The `DebugStreamBridgeActor` is stopped and the consumer is notified via `OnStreamTerminated`. The engineer must reopen the debug view to re-establish the subscription with a fresh snapshot. There is no auto-resume.
+- **Debug streams**: Any gRPC stream interruption triggers reconnection logic in the `DebugStreamBridgeActor`. The bridge actor attempts to reconnect to the other site node endpoint (NodeB if NodeA failed, or vice versa), with up to 3 retries and 5-second backoff. If all retries fail, the consumer is notified via `OnStreamTerminated` and the bridge actor is stopped. Events during the reconnection gap are lost (acceptable for a real-time debug view). On successful reconnection, the consumer can request a fresh snapshot to re-sync state.

 ## Failover Behavior

@@ -175,9 +216,11 @@ The ManagementActor is registered at the well-known path `/user/management` on c

 ## Dependencies

-- **Akka.NET Remoting + ClusterClient**: Provides the transport layer. ClusterClient/ClusterClientReceptionist used for all cross-cluster messaging.
+- **Akka.NET Remoting + ClusterClient**: Provides the command/control transport layer. ClusterClient/ClusterClientReceptionist used for cross-cluster command/control messaging (deployments, lifecycle, subscribe/unsubscribe handshake, snapshots).
+- **gRPC (Grpc.AspNetCore + Grpc.Net.Client)**: Provides the real-time data streaming transport. Site nodes host a gRPC server (SiteStreamGrpcServer); central nodes create per-site gRPC clients (SiteStreamGrpcClient).
 - **Cluster Infrastructure**: Manages node roles and failover detection.
-- **Configuration Database**: Provides site node addresses (NodeAAddress, NodeBAddress) for address resolution.
+- **Configuration Database**: Provides site node addresses (NodeAAddress, NodeBAddress for Akka remoting; GrpcNodeAAddress, GrpcNodeBAddress for gRPC streaming) for address resolution.
+- **Site Runtime (SiteStreamManager)**: The SiteStreamGrpcServer subscribes to SiteStreamManager to receive real-time events for gRPC delivery.

 ## Interactions

@@ -42,7 +42,7 @@ The configuration database stores all central system data, organized by domain a
 - **Shared Scripts**: System-wide reusable script definitions (name, C# source code, parameter definitions, return value definitions).

 ### Sites & Data Connections
-- **Sites**: Site definitions (name, identifier, description).
+- **Sites**: Site definitions (name, identifier, description, NodeAAddress, NodeBAddress, GrpcNodeAAddress, GrpcNodeBAddress).
 - **Data Connections**: Data connection definitions (name, protocol type, connection details) with site assignments.

 ### External Systems & Database Connections

@@ -45,7 +45,7 @@ The Host must bind configuration sections from `appsettings.json` to strongly-ty

 | Section | Options Class | Owner | Contents |
 |---------|--------------|-------|----------|
-| `ScadaLink:Node` | `NodeOptions` | Host | Role, NodeHostname, SiteId, RemotingPort |
+| `ScadaLink:Node` | `NodeOptions` | Host | Role, NodeHostname, SiteId, RemotingPort, GrpcPort (site only, default 8083) |
 | `ScadaLink:Cluster` | `ClusterOptions` | ClusterInfrastructure | SeedNodes, SplitBrainResolverStrategy, StableAfter, HeartbeatInterval, FailureDetectionThreshold, MinNrOfMembers |
 | `ScadaLink:Database` | `DatabaseOptions` | Host | Central: ConfigurationDb, MachineDataDb connection strings; Site: SQLite paths |
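
A site-node `appsettings.json` consistent with this table might look as follows. Only the section names and the `Node` keys come from the table; the nesting of the remaining keys, the `SeedNodes`/`SqlitePath` property names, and all example values are assumptions:

```json
{
  "ScadaLink": {
    "Node": {
      "Role": "Site",
      "NodeHostname": "site-a-node1",
      "SiteId": "site-a",
      "RemotingPort": 8082,
      "GrpcPort": 8083
    },
    "Cluster": {
      "SeedNodes": [
        "akka.tcp://scadalink@site-a-node1:8082",
        "akka.tcp://scadalink@site-a-node2:8082"
      ]
    },
    "Database": {
      "SqlitePath": "C:\\ScadaLink\\data"
    }
  }
}
```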
@@ -79,6 +79,7 @@ Before the Akka.NET actor system is created, the Host must validate all required

 - `NodeConfiguration.Role` must be a valid `NodeRole` value.
 - `NodeConfiguration.NodeHostname` must not be null or empty.
 - `NodeConfiguration.RemotingPort` must be in valid port range (1–65535).
+- Site nodes must have `GrpcPort` in valid port range (1–65535) and different from `RemotingPort`.
 - Site nodes must have a non-empty `SiteId`.
 - Central nodes must have non-empty `ConfigurationDb` and `MachineDataDb` connection strings.
 - Site nodes must have non-empty SQLite path values. Site nodes do **not** require a `ConfigurationDb` connection string — all configuration is received via artifact deployment and read from local SQLite.
@@ -112,14 +113,24 @@ The Host must configure the Akka.NET actor system using Akka.Hosting with:

 On central nodes, the Host must configure the Akka.NET **ClusterClientReceptionist** and register the ManagementActor with it. This allows external processes (e.g., the CLI) to discover and communicate with the ManagementActor via ClusterClient without joining the cluster as full members. The receptionist is started as part of the Akka.NET bootstrap (REQ-HOST-6) on central nodes only.

-### REQ-HOST-7: ASP.NET Web Endpoints (Central Only)
+### REQ-HOST-7: ASP.NET Web Endpoints

 On central nodes, the Host must use `WebApplication.CreateBuilder` to produce a full ASP.NET Core host with Kestrel, and must map web endpoints for:

 - Central UI (via `MapCentralUI()` extension method).
 - Inbound API (via `MapInboundAPI()` extension method).

-On site nodes, the Host must use `Host.CreateDefaultBuilder` to produce a generic `IHost` — **not** a `WebApplication`. This ensures no Kestrel server is started, no HTTP port is opened, and no web endpoint or middleware pipeline is configured. Site nodes are headless and must never accept inbound HTTP connections.
+On site nodes, the Host must also use `WebApplication.CreateBuilder` (not `Host.CreateDefaultBuilder`) to host the **SiteStreamGrpcServer** via Kestrel HTTP/2 on the configured `GrpcPort` (default 8083). Kestrel is configured with `HttpProtocols.Http2` on the gRPC port only — no HTTP/1.1 web endpoints are exposed. The gRPC service is mapped via `MapGrpcService<SiteStreamGrpcServer>()`.
+
+**Startup ordering (site nodes)**:
+1. Actor system and SiteStreamManager must be initialized before gRPC begins accepting connections.
+2. The gRPC server rejects streams with `StatusCode.Unavailable` until the actor system is ready.
+
+**Shutdown ordering (site nodes)**:
+1. On `CoordinatedShutdown`, stop accepting new gRPC streams first; the gRPC server observes `IHostApplicationLifetime.ApplicationStopping` for this signal.
+2. Cancel all active gRPC streams (triggering client-side reconnect).
+3. Tear down actors.
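
The site-node bootstrap described above can be sketched as a top-level program under the ASP.NET Core Web SDK. This assumes the `SiteStreamGrpcServer` type and the `ScadaLink:Node:GrpcPort` key named in this doc; everything else is standard Grpc.AspNetCore wiring:

```csharp
using Microsoft.AspNetCore.Builder;
using Microsoft.AspNetCore.Hosting;
using Microsoft.AspNetCore.Server.Kestrel.Core;
using Microsoft.Extensions.Configuration;
using Microsoft.Extensions.DependencyInjection;

var builder = WebApplication.CreateBuilder(args);

// gRPC-only Kestrel: a single HTTP/2 listener on GrpcPort,
// so no HTTP/1.1 web endpoint is ever opened on site nodes.
var grpcPort = builder.Configuration.GetValue<int>("ScadaLink:Node:GrpcPort", 8083);
builder.WebHost.ConfigureKestrel(kestrel =>
    kestrel.ListenAnyIP(grpcPort, listen => listen.Protocols = HttpProtocols.Http2));

builder.Services.AddGrpc();
// ... register actor system, SiteStreamManager, Windows Service lifetime, etc.

var app = builder.Build();
app.MapGrpcService<SiteStreamGrpcServer>();  // the only mapped endpoint
app.Run();
```

Mapping only the gRPC service keeps site nodes effectively headless: the middleware pipeline contains nothing a browser could reach.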

 ### REQ-HOST-8: Structured Logging

@@ -113,7 +113,7 @@ Deployment Manager Singleton (Cluster Singleton)

 ### Debug View Support
 - On request from central (via Communication Layer), the Instance Actor provides a **snapshot** of all current attribute values and alarm states.
-- Subsequent changes are delivered via the site-wide Akka stream, filtered by instance unique name.
+- Subsequent changes are delivered via the **SiteStreamManager** → **SiteStreamGrpcServer** → gRPC stream to central. The Instance Actor publishes attribute value and alarm state changes to the SiteStreamManager; it does not forward events directly to the Communication Layer.
 - The Instance Actor also handles one-shot `DebugSnapshotRequest` messages: it builds the same snapshot (attribute values and alarm states) and replies directly to the sender. Unlike `SubscribeDebugViewRequest`, no subscriber is registered and no stream is established.

 ### Supervision Strategy

@@ -280,10 +280,16 @@ Per Akka.NET best practices, internal actor communication uses **Tell** (fire-an

 - Script Execution Actors may run concurrently, but all state mutations (attribute reads/writes, alarm state updates) are mediated through the parent Instance Actor's message queue.
 - External side effects (external system calls, notifications, database writes) are not serialized — concurrent scripts may produce interleaved side effects. This is acceptable because each side effect is independent.

+## SiteStreamManager and gRPC Integration
+
+- The `SiteStreamManager` implements the `ISiteStreamSubscriber` interface, allowing the Communication Layer's `SiteStreamGrpcServer` to subscribe to the stream for cross-cluster delivery via gRPC.
+- When a gRPC `SubscribeInstance` call arrives, the `SiteStreamGrpcServer` creates a `StreamRelayActor` and subscribes it to `SiteStreamManager` for the requested instance. Events flow from `SiteStreamManager` → `StreamRelayActor` → `Channel<SiteStreamEvent>` → gRPC response stream to central.
+- The `SiteStreamManager` filters events by instance unique name and forwards matching events to all registered subscribers (both local debug consumers and gRPC relay actors).
+
 ## Site-Wide Stream Backpressure

-- The site-wide Akka stream uses **per-subscriber buffering** with bounded buffers. Each subscriber (debug view, future consumers) gets an independent buffer.
-- If a subscriber falls behind (e.g., slow network on debug view), its buffer fills and oldest events are dropped. This does not affect other subscribers or the publishing Instance Actors.
+- The site-wide Akka stream uses **per-subscriber buffering** with bounded buffers. Each subscriber (gRPC relay actors, future consumers) gets an independent buffer.
+- If a subscriber falls behind (e.g., slow network on a gRPC stream), its buffer fills and oldest events are dropped. This does not affect other subscribers or the publishing Instance Actors.
 - Instance Actors publish to the stream with **fire-and-forget** semantics — publishing never blocks the actor.

 ## Error Handling

@@ -45,11 +45,13 @@
|
||||
- **Machine Data Database**: A separate database for collected machine data (e.g., telemetry, measurements, events).
|
||||
|
||||
### 2.2 Communication: Central ↔ Site
|
||||
- Central-to-site and site-to-central communication uses **Akka.NET ClusterClient/ClusterClientReceptionist** for cross-cluster messaging with automatic failover.
|
||||
- **Site addressing**: Site Akka base addresses (NodeA and NodeB) are stored in the **Sites database table** and configured via the Central UI. Central creates a ClusterClient per site using both addresses as contact points (cached in memory, refreshed periodically and on admin changes) rather than relying on runtime registration messages from sites.
|
||||
- Two transport layers are used for central-site communication:
|
||||
- **Akka.NET ClusterClient/ClusterClientReceptionist**: Handles **command/control** messaging — deployments, instance lifecycle commands, subscribe/unsubscribe handshake, debug snapshots, health reports, remote queries, and integration routing. Provides automatic failover between contact points.
|
||||
- **gRPC server-streaming (site→central)**: Handles **real-time data streaming** — attribute value updates and alarm state changes. Each site node hosts a **SiteStreamGrpcServer** on a dedicated HTTP/2 port (Kestrel, default port 8083). Central creates per-site **SiteStreamGrpcClient** instances to subscribe to site streams. gRPC provides HTTP/2 flow control and per-stream backpressure that ClusterClient lacks.
- **Site addressing**: Site Akka base addresses (NodeA and NodeB) and gRPC endpoints (GrpcNodeAAddress and GrpcNodeBAddress) are stored in the **Sites database table** and configured via the Central UI or CLI. Central creates a ClusterClient per site using both Akka addresses as contact points, and per-site gRPC clients using the gRPC addresses.
- **Central contact points**: Sites configure **multiple central contact points** (both central node addresses) for redundancy. ClusterClient handles failover between central nodes automatically.
- **Central as integration hub**: Central brokers requests between external systems and sites. For example, a recipe manager sends a recipe to central, which routes it to the appropriate site. MES requests machine values from central, which routes the request to the site and returns the response.
- **Real-time data streaming** is not continuous for all machine data. The only real-time stream is an **on-demand debug view** — an engineer in the central UI can open a live view of a specific instance's tag values and alarm states for troubleshooting purposes. This is session-based and temporary. The debug view subscribes to the site-wide Akka stream filtered by instance (see Section 8.1).
- **Real-time data streaming** is not continuous for all machine data. The only real-time stream is an **on-demand debug view** — an engineer in the central UI can open a live view of a specific instance's tag values and alarm states for troubleshooting purposes. This is session-based and temporary. The debug view subscribes via gRPC to the site's SiteStreamManager filtered by instance (see Section 8.1).
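
The dual-address scheme can be sketched as a small Python model. All names here (`SiteRecord`, `open_stream`, `connect`) are illustrative assumptions; only the Sites-table fields they mirror come from the description above:

```python
from dataclasses import dataclass


@dataclass
class SiteRecord:
    """Illustrative mirror of the Sites table fields described above."""
    site_id: str
    node_a_address: str        # Akka contact points used by ClusterClient
    node_b_address: str
    grpc_node_a_address: str   # gRPC streaming endpoints
    grpc_node_b_address: str


def contact_points(site):
    """Both Akka addresses are given to ClusterClient as contact points."""
    return [site.node_a_address, site.node_b_address]


def grpc_endpoints(site):
    """Both gRPC endpoints are candidates for the streaming subscription."""
    return [site.grpc_node_a_address, site.grpc_node_b_address]


def open_stream(endpoints, connect):
    """Try each gRPC endpoint in order; return the first stream that opens.

    `connect` stands in for creating a gRPC channel and starting the
    server-streaming call; here it raises ConnectionError on failure.
    """
    last_err = None
    for ep in endpoints:
        try:
            return connect(ep)
        except ConnectionError as err:
            last_err = err
    raise last_err
```

ClusterClient handles failover across the Akka contact points itself; for the gRPC stream, trying endpoints in order, as `open_stream` does, is one simple approach.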
### 2.3 Site-Level Storage & Interface
- Sites have **no user interface** — they are headless collectors, forwarders, and script executors.
@@ -362,7 +364,7 @@ The central cluster hosts a **configuration and management UI** (no live machine
- **Database Connection Management**: Define named database connections for script use.
- **Inbound API Management**: Manage API keys (create, enable/disable, delete). Define API methods (name, parameters, return values, approved keys, implementation script). *(Admin role for keys, Design role for methods.)*
- **Instance Management**: Create instances from templates, bind data connections (per-attribute, with **bulk assignment** UI for selecting multiple attributes and assigning a data connection at once), set instance-level attribute overrides, assign instances to areas. **Disable** or **delete** instances.
- **Site & Data Connection Management**: Define sites (including optional NodeAAddress and NodeBAddress fields for Akka remoting paths), manage data connections and assign them to sites.
- **Site & Data Connection Management**: Define sites (including optional NodeAAddress and NodeBAddress fields for Akka remoting paths, and optional GrpcNodeAAddress and GrpcNodeBAddress fields for gRPC streaming endpoints), manage data connections and assign them to sites.
- **Area Management**: Define hierarchical area structures per site for organizing instances.
- **Deployment**: View diffs between deployed and current template-derived configurations, deploy updates to individual instances. Filter instances by area. Pre-deployment validation runs automatically before any deployment is sent.
- **System-Wide Artifact Deployment**: Explicitly deploy shared scripts, external system definitions, database connection definitions, data connection definitions, notification lists, and SMTP configuration to all sites or to an individual site (requires Deployment role). Per-site deployment is available via the Sites admin page.
@@ -373,7 +375,7 @@ The central cluster hosts a **configuration and management UI** (no live machine
- **Site Event Log Viewer**: Query and view operational event logs from site clusters (see Section 12).
### 8.1 Debug View
- **Subscribe-on-demand**: When an engineer opens a debug view for an instance, central subscribes to the **site-wide Akka stream** filtered by instance unique name. The site first provides a **snapshot** of all current attribute values and alarm states from the Instance Actor, then streams subsequent changes from the Akka stream.
- **Subscribe-on-demand**: When an engineer opens a debug view for an instance, central opens a **gRPC server-streaming subscription** to the site's `SiteStreamGrpcServer` for the instance, then requests a **snapshot** of all current attribute values and alarm states via ClusterClient. The gRPC stream delivers subsequent attribute value and alarm state changes directly from the site's `SiteStreamManager`.
- Attribute value stream messages are structured as: `[InstanceUniqueName].[AttributePath].[AttributeName]`, attribute value, attribute quality, attribute change timestamp.
- Alarm state stream messages are structured as: `[InstanceUniqueName].[AlarmName]`, alarm state (active/normal), priority, timestamp.
- The stream continues until the engineer **closes the debug view**, at which point central unsubscribes and the site stops streaming.
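
The message shapes above lend themselves to a short sketch. This is a hypothetical illustration: the key helpers and the snapshot-then-events merge are assumptions about how a client might assemble debug-view state, not the actual implementation.

```python
def attribute_key(instance, path, name):
    """Key for attribute value messages: [InstanceUniqueName].[AttributePath].[AttributeName]."""
    return f"{instance}.{path}.{name}"


def alarm_key(instance, alarm):
    """Key for alarm state messages: [InstanceUniqueName].[AlarmName]."""
    return f"{instance}.{alarm}"


def apply_events(snapshot, events):
    """Start from the initial snapshot, then overlay streamed changes in order."""
    state = dict(snapshot)  # copy, so the original snapshot is untouched
    for key, value in events:
        state[key] = value
    return state


# A debug-view session: snapshot first, then gRPC stream events on top.
snapshot = {attribute_key("Press01", "Hydraulics", "Pressure"): 41.8}
events = [
    (attribute_key("Press01", "Hydraulics", "Pressure"), 42.5),
    (alarm_key("Press01", "HighPressure"), "active"),
]
state = apply_events(snapshot, events)
print(state["Press01.Hydraulics.Pressure"])  # 42.5
print(state["Press01.HighPressure"])         # active
```

Applying events strictly after the snapshot keeps the view consistent: any change that raced with the snapshot request is simply re-applied on top of it.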