Files
lmxopcua/docs/StatusDashboard.md
Joseph Doherty 965e430f48 Add component-level documentation for all 14 server subsystems
Provides technical documentation covering OPC UA server, address space,
Galaxy repository, MXAccess bridge, data types, read/write, subscriptions,
alarms, historian, incremental sync, configuration, dashboard, service
hosting, and CLI tool. Updates README with component documentation table.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 15:47:59 -04:00

123 lines
5.6 KiB
Markdown

# Status Dashboard
## Overview
The service hosts an embedded HTTP status dashboard that surfaces real-time health, connection state, subscription counts, data change throughput, and Galaxy metadata. Operators access it through a browser to verify the bridge is functioning without needing an OPC UA client. The dashboard is enabled by default on port 8081 and can be disabled via configuration.
## HTTP Server
`StatusWebServer` wraps a `System.Net.HttpListener` bound to `http://+:{port}/`. It starts a background task that accepts requests in a loop and dispatches them by path. Only `GET` requests are accepted; all other methods return `405 Method Not Allowed`. Responses include `Cache-Control: no-cache` headers to prevent stale data in the browser.
### Endpoints
| Path | Content-Type | Description |
|------|-------------|-------------|
| `/` | `text/html` | HTML dashboard with auto-refresh |
| `/api/status` | `application/json` | Full status snapshot as JSON |
| `/api/health` | `application/json` | Health check: returns `200` with `{"status":"healthy"}` or `503` with `{"status":"unhealthy"}` |
Any other path returns `404 Not Found`.
## Health Check Logic
`HealthCheckService` evaluates bridge health using two rules applied in order:
1. **Unhealthy** -- MXAccess connection state is not `Connected`. Returns a red banner with the current state.
2. **Degraded** -- Any recorded operation has more than 100 total invocations and a success rate below 50%. Returns a yellow banner identifying the failing operation.
3. **Healthy** -- All checks pass. Returns a green banner with "All systems operational."
The `/api/health` endpoint returns `200` for both Healthy and Degraded states, and `503` only for Unhealthy. This allows load balancers or monitoring tools to distinguish between a service that is running but degraded and one that has lost its runtime connection.
## Status Data Model
`StatusReportService` aggregates data from all bridge components into a `StatusData` DTO, which is then rendered as HTML or serialized to JSON. The DTO contains the following sections:
### Connection
| Field | Type | Description |
|-------|------|-------------|
| `State` | `string` | Current MXAccess connection state (Connected, Disconnected, Connecting) |
| `ReconnectCount` | `int` | Number of reconnect attempts since startup |
| `ActiveSessions` | `int` | Number of active OPC UA client sessions |
### Health
| Field | Type | Description |
|-------|------|-------------|
| `Status` | `string` | Healthy, Degraded, or Unhealthy |
| `Message` | `string` | Operator-facing explanation |
| `Color` | `string` | CSS color token (green, yellow, red, gray) |
### Subscriptions
| Field | Type | Description |
|-------|------|-------------|
| `ActiveCount` | `int` | Number of active MXAccess tag subscriptions |
### Galaxy
| Field | Type | Description |
|-------|------|-------------|
| `GalaxyName` | `string` | Name of the Galaxy being bridged |
| `DbConnected` | `bool` | Whether the Galaxy repository database is reachable |
| `LastDeployTime` | `DateTime?` | Most recent deploy timestamp from the Galaxy |
| `ObjectCount` | `int` | Number of Galaxy objects in the address space |
| `AttributeCount` | `int` | Number of Galaxy attributes as OPC UA variables |
| `LastRebuildTime` | `DateTime?` | UTC timestamp of the last completed address-space rebuild |
### Data change
| Field | Type | Description |
|-------|------|-------------|
| `EventsPerSecond` | `double` | Rate of MXAccess data change events per second |
| `AvgBatchSize` | `double` | Average items processed per dispatch cycle |
| `PendingItems` | `int` | Items waiting in the dispatch queue |
| `TotalEvents` | `long` | Total MXAccess data change events since startup |
### Operations
A dictionary of `MetricsStatistics` keyed by operation name. Each entry contains:
- `TotalCount` -- total invocations
- `SuccessRate` -- fraction of successful operations
- `AverageMilliseconds`, `MinMilliseconds`, `MaxMilliseconds`, `Percentile95Milliseconds` -- latency distribution
### Footer
| Field | Type | Description |
|-------|------|-------------|
| `Timestamp` | `DateTime` | UTC time when the snapshot was generated |
| `Version` | `string` | Service assembly version |
## HTML Dashboard
The HTML dashboard uses a monospace font on a dark background with color-coded panels. Each status section renders as a bordered panel whose border color reflects the component state (green, yellow, red, or gray). The operations table shows per-operation latency and success rate statistics.
The page includes a `<meta http-equiv='refresh'>` tag set to the configured `RefreshIntervalSeconds` (default 10 seconds), so the browser polls automatically without JavaScript.
## Configuration
The dashboard is configured through the `Dashboard` section in `appsettings.json`:
```json
{
"Dashboard": {
"Enabled": true,
"Port": 8081,
"RefreshIntervalSeconds": 10
}
}
```
Setting `Enabled` to `false` prevents the `StatusWebServer` from starting. The `StatusReportService` is still created so that other components can query health programmatically, but no HTTP listener is opened.
## Component Wiring
`StatusReportService` is initialized after all other service components are created. `OpcUaService.Start()` calls `SetComponents()` to supply the live references:
```csharp
_statusReport.SetComponents(effectiveMxClient, _metrics, _galaxyStats, _serverHost, _nodeManager);
```
This deferred wiring allows the report service to be constructed before the MXAccess client or node manager are fully initialized. If a component is `null`, the report service falls back to default values (e.g., `ConnectionState.Disconnected`, zero counts).