feat: replace debug view polling with real-time SignalR streaming
The debug view polled every 2s by re-subscribing for full snapshots. Now a persistent DebugStreamBridgeActor on central subscribes once and receives incremental Akka stream events from the site, forwarding them to the Blazor component via callbacks and to the CLI via a new SignalR hub at /hubs/debug-stream. Adds `debug stream` CLI command with auto-reconnect.
This commit is contained in:
@@ -168,8 +168,21 @@ scadalink health parked-messages --site-identifier <site-id> [--page <n>] [--pag
|
||||
### Debug Commands
|
||||
```
|
||||
scadalink debug snapshot --id <id> [--format json|table]
|
||||
scadalink debug stream --id <instanceId> [--url ...] [--username ...] [--password ...]
|
||||
```
|
||||
|
||||
The `debug snapshot` command retrieves a point-in-time snapshot via the HTTP Management API.
|
||||
|
||||
The `debug stream` command streams live attribute values and alarm state changes in real-time using a SignalR WebSocket connection. The CLI connects to the `/hubs/debug-stream` SignalR hub on the central server, authenticates with Basic Auth, and subscribes to the specified instance. Events are printed as they arrive — JSON format (default) outputs one NDJSON object per event; table format shows streaming rows. Press Ctrl+C to disconnect.
|
||||
|
||||
Key behaviors:
|
||||
- **Automatic reconnection**: Uses SignalR's `.WithAutomaticReconnect()` to re-establish the connection on loss.
|
||||
- **Re-subscription**: Automatically re-subscribes to the instance after reconnection.
|
||||
- **Traefik compatible**: Works through the Traefik reverse proxy — WebSocket upgrade is proxied natively.
|
||||
- **Required role**: `Deployment`.
|
||||
|
||||
Unlike `debug snapshot` (which uses the HTTP Management API), `debug stream` uses `Microsoft.AspNetCore.SignalR.Client` as a dependency for its WebSocket transport.
|
||||
|
||||
### Shared Script Commands
|
||||
```
|
||||
scadalink shared-script list [--format json|table]
|
||||
@@ -239,8 +252,10 @@ Configuration is resolved in the following priority order (highest wins):
|
||||
|
||||
- **Commons**: Message contracts (`Messages/Management/`) for command type definitions and registry.
|
||||
- **System.CommandLine**: Command-line argument parsing.
|
||||
- **Microsoft.AspNetCore.SignalR.Client**: SignalR client for the `debug stream` command's WebSocket connection.
|
||||
|
||||
## Interactions
|
||||
|
||||
- **Management Service (via HTTP)**: The CLI's sole runtime dependency. All operations are sent as HTTP POST requests to the Management API endpoint on a central node, which dispatches to the ManagementActor.
|
||||
- **Central Host**: Serves the Management API at `POST /management`. Handles LDAP authentication, role resolution, and ManagementActor dispatch.
|
||||
- **Management Service (via HTTP)**: The primary runtime dependency. All operations except `debug stream` are sent as HTTP POST requests to the Management API endpoint on a central node, which dispatches to the ManagementActor.
|
||||
- **Central Host**: Serves the Management API at `POST /management` and the debug stream SignalR hub at `/hubs/debug-stream`. Handles LDAP authentication, role resolution, and ManagementActor dispatch.
|
||||
- **Debug Stream Hub (via SignalR WebSocket)**: The `debug stream` command connects to the `/hubs/debug-stream` hub on the central server for real-time event streaming. This is the only CLI command that uses a persistent connection rather than request/response.
|
||||
|
||||
@@ -19,12 +19,12 @@ Central cluster only. Sites have no user interface.
|
||||
- A **load balancer** sits in front of the central cluster and routes to the active node.
|
||||
- On central failover, the Blazor Server SignalR circuit is interrupted. The browser automatically attempts to reconnect via SignalR's built-in reconnection logic.
|
||||
- Since sessions use **authentication cookies** carrying an embedded JWT (not server-side state), the user's authentication survives failover — the new active node validates the same cookie-embedded JWT. No re-login required if the token is still valid.
|
||||
- Active debug view polling and in-progress deployment status subscriptions are lost on failover and must be re-opened by the user.
|
||||
- Active debug view streams and in-progress deployment status subscriptions are lost on failover and must be re-opened by the user.
|
||||
- Both central nodes share the same **ASP.NET Data Protection keys** (stored in the configuration database or shared configuration) so that tokens and anti-forgery tokens remain valid across failover.
|
||||
|
||||
## Real-Time Updates
|
||||
|
||||
- **Debug view**: Near-real-time display of attribute values and alarm states, updated via a **2-second polling timer**. This avoids the complexity of cross-cluster streaming while providing responsive feedback — 2s latency is imperceptible for debugging purposes.
|
||||
- **Debug view**: Real-time display of attribute values and alarm states via **streaming**. When the user opens a debug view, a `DebugStreamBridgeActor` on the central side subscribes to the site's Akka stream for the selected instance. The bridge actor delivers an initial `DebugViewSnapshot` followed by ongoing `AttributeValueChanged` and `AlarmStateChanged` events to the Blazor component via callbacks, which call `InvokeAsync(StateHasChanged)` to push UI updates through the built-in SignalR circuit.
|
||||
- **Health dashboard**: Site status, connection health, error rates, and buffer depths update via a **10-second auto-refresh timer**. Since health reports arrive from sites every 30 seconds, a 10s poll interval catches updates within one reporting cycle without unnecessary overhead.
|
||||
- **Deployment status**: Pending/in-progress/success/failed transitions **push to the UI immediately** via SignalR (built into Blazor Server). No polling required for deployment tracking.
|
||||
|
||||
@@ -100,8 +100,11 @@ Central cluster only. Sites have no user interface.
|
||||
|
||||
### Debug View (Deployment Role)
|
||||
- Select a deployed instance and open a live debug view.
|
||||
- Near-real-time polling (2s interval) of all attribute values (with quality and timestamp) and alarm states for that instance.
|
||||
- Initial snapshot of current state followed by periodic polling for updates.
|
||||
- Real-time streaming of all attribute values (with quality and timestamp) and alarm states for that instance.
|
||||
- The `DebugStreamService` creates a `DebugStreamBridgeActor` on the central side that subscribes to the site's Akka stream for the selected instance.
|
||||
- The bridge actor receives an initial `DebugViewSnapshot` followed by ongoing `AttributeValueChanged` and `AlarmStateChanged` events from the site.
|
||||
- Events are delivered to the Blazor component via callbacks, which call `InvokeAsync(StateHasChanged)` to push UI updates through the built-in SignalR circuit.
|
||||
- A pulsing "Live" indicator replaces the static "Connected" badge when streaming is active.
|
||||
- Stream includes attribute values formatted as `[InstanceUniqueName].[AttributePath].[AttributeName]` and alarm states formatted as `[InstanceUniqueName].[AlarmName]`.
|
||||
- Subscribe-on-demand — stream starts when opened, stops when closed.
|
||||
|
||||
|
||||
@@ -50,15 +50,22 @@ Both central and site clusters. Each side has communication actors that handle m
|
||||
- Site applies and acknowledges.
|
||||
|
||||
### 6. Debug Streaming (Site → Central)
|
||||
- **Pattern**: Subscribe/stream with initial snapshot.
|
||||
- Central sends a subscribe request for a specific instance (identified by unique name).
|
||||
- Site requests a **snapshot** of all current attribute values and alarm states from the Instance Actor and sends it to central.
|
||||
- Site then subscribes to the **site-wide Akka stream** filtered by the instance's unique name and forwards attribute value changes and alarm state changes to central.
|
||||
- **Pattern**: Subscribe/push with initial snapshot (no polling).
|
||||
- A **DebugStreamBridgeActor** (one per active debug session) is created on the central cluster by the **DebugStreamService**. The bridge actor sends a `SubscribeDebugViewRequest` to the site via `CentralCommunicationActor`, with itself as the `Sender`. The site's `InstanceActor` registers the bridge actor as the debug subscriber.
|
||||
- Site requests a **snapshot** of all current attribute values and alarm states from the Instance Actor and sends it to the bridge actor.
|
||||
- Site then subscribes to the **site-wide Akka stream** filtered by the instance's unique name and forwards `AttributeValueChanged` and `AlarmStateChanged` events to the bridge actor in real time via Akka remoting.
|
||||
- The bridge actor forwards received events to the consumer via callbacks (Blazor component or SignalR hub).
|
||||
- Attribute value stream messages: `[InstanceUniqueName].[AttributePath].[AttributeName]`, value, quality, timestamp.
|
||||
- Alarm state stream messages: `[InstanceUniqueName].[AlarmName]`, state (active/normal), priority, timestamp.
|
||||
- Central sends an unsubscribe request when the debug view closes. The site removes its stream subscription.
|
||||
- Central sends an unsubscribe request when the debug session ends. The site removes its stream subscription and the bridge actor is stopped.
|
||||
- The stream is session-based and temporary.
|
||||
|
||||
#### Central-Side Debug Stream Components
|
||||
|
||||
- **DebugStreamService**: Singleton service that manages debug stream sessions. Resolves instance ID to unique name and site, creates and tears down `DebugStreamBridgeActor` instances, and provides a clean API for both Blazor components and the SignalR hub.
|
||||
- **DebugStreamBridgeActor**: One per active debug session. Acts as the Akka-level subscriber registered with the site's `InstanceActor`. Receives real-time `AttributeValueChanged` and `AlarmStateChanged` events from the site and forwards them to the consumer via callbacks.
|
||||
- **DebugStreamHub**: SignalR hub at `/hubs/debug-stream` for external consumers (e.g., CLI). Authenticates via Basic Auth + LDAP and requires the **Deployment** role. Server-to-client methods: `OnSnapshot`, `OnAttributeChanged`, `OnAlarmChanged`, `OnStreamTerminated`.
|
||||
|
||||
### 6a. Debug Snapshot (Central → Site)
|
||||
- **Pattern**: Request/Response (one-shot, no subscription).
|
||||
- Central sends a `DebugSnapshotRequest` (identified by instance unique name) to the site.
|
||||
@@ -159,7 +166,7 @@ The ManagementActor is registered at the well-known path `/user/management` on c
|
||||
## Connection Failure Behavior
|
||||
|
||||
- **In-flight messages**: When a connection drops while a request is in flight (e.g., deployment sent but no response received), the Akka ask pattern times out and the caller receives a failure. There is **no automatic retry or buffering at central** — the engineer sees the failure in the UI and re-initiates the action. This is consistent with the design principle that central does not buffer messages.
|
||||
- **Debug streams**: Any connection interruption (failover or network blip) kills the debug stream. The engineer must reopen the debug view in the Central UI to re-establish the subscription with a fresh snapshot. There is no auto-resume.
|
||||
- **Debug streams**: Any connection interruption (failover or network blip) kills the debug stream. The `DebugStreamBridgeActor` is stopped and the consumer is notified via `OnStreamTerminated`. The engineer must reopen the debug view to re-establish the subscription with a fresh snapshot. There is no auto-resume.
|
||||
|
||||
## Failover Behavior
|
||||
|
||||
|
||||
Reference in New Issue
Block a user