Status Dashboard
Overview
The service hosts an embedded HTTP status dashboard that surfaces real-time health, connection state, subscription counts, data change throughput, and Galaxy metadata. Operators access it through a browser to verify the bridge is functioning without needing an OPC UA client. The dashboard is enabled by default on port 8081 and can be disabled via configuration.
HTTP Server
StatusWebServer wraps a System.Net.HttpListener bound to http://+:{port}/. It starts a background task that accepts requests in a loop and dispatches them by path. Only GET requests are accepted; all other methods return 405 Method Not Allowed. Responses include Cache-Control: no-cache headers to prevent stale data in the browser.
Endpoints
| Path | Content-Type | Description |
|---|---|---|
| / | text/html | Operator dashboard with auto-refresh |
| /health | text/html | Focused health page with service-level badge and component cards |
| /api/status | application/json | Full status snapshot as JSON (StatusData) |
| /api/health | application/json | Health endpoint (HealthEndpointData) -- returns 503 when status is Unhealthy, 200 otherwise |
Any other path returns 404 Not Found.
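The dispatch rules above can be modelled as a small lookup. This is a minimal Python sketch rather than the StatusWebServer code: the `route` helper and `ROUTES` table are hypothetical names, and checking the method before the path is an assumption, since the text does not say which check runs first.

```python
from http import HTTPStatus
from typing import Optional, Tuple

# Path -> Content-Type, mirroring the endpoint table.
ROUTES = {
    "/": "text/html",
    "/health": "text/html",
    "/api/status": "application/json",
    "/api/health": "application/json",
}


def route(method: str, path: str) -> Tuple[int, Optional[str]]:
    """Return (HTTP status, Content-Type) for a request.

    Hypothetical helper: it models only the method and path checks, not page
    rendering and not the 503 that /api/health returns when Unhealthy.
    """
    # Assumption: the method is checked before the path.
    if method.upper() != "GET":
        return HTTPStatus.METHOD_NOT_ALLOWED.value, None
    content_type = ROUTES.get(path)
    if content_type is None:
        return HTTPStatus.NOT_FOUND.value, None
    return HTTPStatus.OK.value, content_type
```

For example, `route("POST", "/api/status")` yields 405 and `route("GET", "/metrics")` yields 404, because `/metrics` is not in the table.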
Health Check Logic
HealthCheckService.CheckHealth evaluates bridge health using the following rules applied in order. The first rule that matches wins; rules 2b, 2c, and 2d only fire when the corresponding integration is enabled and a non-null snapshot is passed:
- Rule 1 -- Unhealthy: MXAccess connection state is not Connected. Returns a red banner with the current state.
- Rule 2b -- Degraded: Historian.Enabled=true but the plugin load outcome is not Loaded. Returns a yellow banner citing the plugin status (NotFound, LoadFailed) and the error message if one is available.
- Rule 2 / 2c -- Degraded: Any recorded operation has a low success rate. The sample threshold depends on the operation category:
  - Regular operations (Read, Write, Subscribe, AlarmAcknowledge): >100 invocations and <50% success rate.
  - Historian operations (HistoryReadRaw, HistoryReadProcessed, HistoryReadAtTime, HistoryReadEvents): >10 invocations and <50% success rate. The lower threshold surfaces a stuck historian quickly, since history reads are rare relative to live reads.
- Rule 2d -- Degraded (latched): AlarmTrackingEnabled=true and any alarm acknowledge MXAccess write has failed since startup. Latched on purpose -- an ack write failure is a durable MXAccess write problem that should stay visible until the operator restarts.
- Rule 3 -- Healthy: All checks pass. Returns a green banner with "All systems operational."
The /api/health endpoint returns 200 for both Healthy and Degraded states, and 503 only for Unhealthy. This allows load balancers or monitoring tools to distinguish between a service that is running but degraded and one that has lost its runtime connection.
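The rule order and the status-code mapping can be sketched as follows. This is an illustrative Python model, not HealthCheckService itself: `Snapshot`, `OpStats`, `check_health` and `http_status_for` are hypothetical names. It assumes SuccessRate is a fraction in [0, 1], so "<50%" becomes `< 0.5`, and it skips historian operations when the historian is disabled, which is one reading of "rules 2b, 2c, and 2d only fire when the corresponding integration is enabled".

```python
from dataclasses import dataclass, field
from typing import Dict

HISTORIAN_OPS = {
    "HistoryReadRaw", "HistoryReadProcessed",
    "HistoryReadAtTime", "HistoryReadEvents",
}


@dataclass
class OpStats:
    total_count: int
    success_rate: float  # assumed to be a fraction in [0, 1]


@dataclass
class Snapshot:
    connection_state: str = "Connected"
    historian_enabled: bool = False
    plugin_status: str = "Disabled"
    alarm_tracking_enabled: bool = False
    ack_write_failures: int = 0
    operations: Dict[str, OpStats] = field(default_factory=dict)


def check_health(s: Snapshot) -> str:
    """Return Healthy, Degraded or Unhealthy; the first matching rule wins."""
    # Rule 1: no MXAccess connection.
    if s.connection_state != "Connected":
        return "Unhealthy"
    # Rule 2b: historian enabled but the plugin did not load.
    if s.historian_enabled and s.plugin_status != "Loaded":
        return "Degraded"
    # Rule 2 / 2c: low success rate once enough samples exist.
    for name, op in s.operations.items():
        is_historian_op = name in HISTORIAN_OPS
        if is_historian_op and not s.historian_enabled:
            continue  # assumption: 2c is gated on the historian being enabled
        min_samples = 10 if is_historian_op else 100
        if op.total_count > min_samples and op.success_rate < 0.5:
            return "Degraded"
    # Rule 2d: latched on any alarm ack write failure since startup.
    if s.alarm_tracking_enabled and s.ack_write_failures > 0:
        return "Degraded"
    # Rule 3: nothing matched.
    return "Healthy"


def http_status_for(status: str) -> int:
    """Status code for /api/health: 503 only when Unhealthy."""
    return 503 if status == "Unhealthy" else 200
```

Because Rule 1 is evaluated first, a disconnected bridge reports Unhealthy even when the historian plugin is also missing. Both thresholds are strict, so exactly 100 regular invocations at a 10% success rate still reads as Healthy.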
Status Data Model
StatusReportService aggregates data from all bridge components into a StatusData DTO, which is then rendered as HTML or serialized to JSON. The DTO contains the following sections:
Connection
| Field | Type | Description |
|---|---|---|
| State | string | Current MXAccess connection state (Connected, Disconnected, Connecting) |
| ReconnectCount | int | Number of reconnect attempts since startup |
| ActiveSessions | int | Number of active OPC UA client sessions |
Health
| Field | Type | Description |
|---|---|---|
| Status | string | Healthy, Degraded, or Unhealthy |
| Message | string | Operator-facing explanation |
| Color | string | CSS color token (green, yellow, red, gray) |
Subscriptions
| Field | Type | Description |
|---|---|---|
| ActiveCount | int | Number of active MXAccess tag subscriptions |
Galaxy
| Field | Type | Description |
|---|---|---|
| GalaxyName | string | Name of the Galaxy being bridged |
| DbConnected | bool | Whether the Galaxy repository database is reachable |
| LastDeployTime | DateTime? | Most recent deploy timestamp from the Galaxy |
| ObjectCount | int | Number of Galaxy objects in the address space |
| AttributeCount | int | Number of Galaxy attributes exposed as OPC UA variables |
| LastRebuildTime | DateTime? | UTC timestamp of the last completed address-space rebuild |
Data change
| Field | Type | Description |
|---|---|---|
| EventsPerSecond | double | Rate of MXAccess data change events per second |
| AvgBatchSize | double | Average items processed per dispatch cycle |
| PendingItems | int | Items waiting in the dispatch queue |
| TotalEvents | long | Total MXAccess data change events since startup |
Operations
A dictionary of MetricsStatistics keyed by operation name. Each entry contains:
- TotalCount -- total invocations
- SuccessRate -- fraction of successful operations
- AverageMilliseconds, MinMilliseconds, MaxMilliseconds, Percentile95Milliseconds -- latency distribution
The instrumented operation names are:
| Name | Source |
|---|---|
| Read | MXAccess live tag reads (MxAccessClient.ReadWrite.cs) |
| Write | MXAccess live tag writes |
| Subscribe | MXAccess subscription attach |
| HistoryReadRaw | LmxNodeManager.HistoryReadRawModified -> historian plugin |
| HistoryReadProcessed | LmxNodeManager.HistoryReadProcessed -> historian plugin (aggregates) |
| HistoryReadAtTime | LmxNodeManager.HistoryReadAtTime -> historian plugin (interpolated) |
| HistoryReadEvents | LmxNodeManager.HistoryReadEvents -> historian plugin (alarm/event history) |
| AlarmAcknowledge | LmxNodeManager.OnAlarmAcknowledge -> MXAccess AckMsg write |
New operation names are auto-registered on first use, so the Operations dictionary only contains entries for features that have actually been exercised since startup.
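To illustrate the auto-registration behaviour and the per-operation fields, here is a minimal Python sketch. The `Metrics` class is hypothetical and is not the service's metrics type. The nearest-rank 95th percentile is an assumption, since the estimator behind Percentile95Milliseconds is not documented here.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple


class Metrics:
    """Hypothetical in-memory recorder keyed by operation name."""

    def __init__(self) -> None:
        # defaultdict creates an entry the first time a name is recorded,
        # so operations that never ran never appear in the output.
        self._samples: Dict[str, List[Tuple[bool, float]]] = defaultdict(list)

    def record(self, name: str, success: bool, elapsed_ms: float) -> None:
        self._samples[name].append((success, elapsed_ms))

    def statistics(self) -> Dict[str, dict]:
        result = {}
        for name, samples in self._samples.items():
            latencies = sorted(ms for _, ms in samples)
            # Nearest-rank percentile (assumed estimator).
            rank = max(1, math.ceil(0.95 * len(latencies)))
            result[name] = {
                "TotalCount": len(samples),
                "SuccessRate": sum(1 for ok, _ in samples if ok) / len(samples),
                "AverageMilliseconds": sum(latencies) / len(latencies),
                "MinMilliseconds": latencies[0],
                "MaxMilliseconds": latencies[-1],
                "Percentile95Milliseconds": latencies[rank - 1],
            }
        return result
```

A fresh `Metrics()` reports an empty dictionary. After one `record("Read", ...)` call, only the `Read` key is present, which matches the note that unexercised features do not show up.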
Historian
HistorianStatusInfo -- reflects the outcome of the runtime-loaded historian plugin and the runtime query-health counters. See Historical Data Access for the plugin architecture and the Runtime Health Counters section for the data source instrumentation.
| Field | Type | Description |
|---|---|---|
| Enabled | bool | Whether Historian.Enabled is set in configuration |
| PluginStatus | string | Disabled, NotFound, LoadFailed, or Loaded -- load-time outcome from HistorianPluginLoader.LastOutcome |
| PluginError | string? | Exception message from the last load attempt when PluginStatus=LoadFailed; otherwise null |
| PluginPath | string | Absolute path the loader probed for the plugin assembly |
| ServerName | string | Legacy single-node hostname from Historian.ServerName; ignored when ServerNames is non-empty |
| Port | int | Configured historian TCP port |
| QueryTotal | long | Total historian read queries attempted since startup (raw + aggregate + at-time + events) |
| QuerySuccesses | long | Queries that completed without an exception |
| QueryFailures | long | Queries that raised an exception -- each failure also triggers the plugin's reconnect path |
| ConsecutiveFailures | int | Failures since the last success. Resets to zero on any successful query. Drives the Degraded health rule at threshold 3 |
| LastSuccessTime | DateTime? | UTC timestamp of the most recent successful query, or null when no query has succeeded since startup |
| LastFailureTime | DateTime? | UTC timestamp of the most recent failure |
| LastQueryError | string? | Exception message from the most recent failure. Prefixed with the read-path name (raw:, aggregate:, at-time:, events:) so operators can tell which SDK call failed |
| ProcessConnectionOpen | bool | Whether the plugin currently holds an open SDK connection for the process silo (historical value queries -- ReadRaw, ReadAggregate, ReadAtTime). See Two SDK connection silos |
| EventConnectionOpen | bool | Whether the plugin currently holds an open SDK connection for the event silo (alarm history queries -- ReadEvents). Separate from the process connection because the SDK requires distinct query channels |
| ActiveProcessNode | string? | Cluster node currently serving the process silo, or null when no process connection is open |
| ActiveEventNode | string? | Cluster node currently serving the event silo, or null when no event connection is open |
| NodeCount | int | Total configured historian cluster nodes. 1 for a legacy single-node deployment |
| HealthyNodeCount | int | Nodes currently eligible for new connections (not in failure cooldown) |
| Nodes | `List<HistorianClusterNodeState>` | Per-node cluster state in configuration order. Each entry carries Name, IsHealthy, CooldownUntil, FailureCount, LastError, LastFailureTime |
The operator dashboard renders a cluster table inside the Historian panel when NodeCount > 1. Legacy single-node deployments render a compact Node: <hostname> line and no table. Panel color reflects combined load-time + runtime health: green when everything is fine, yellow when any cluster node is in cooldown or 1-4 consecutive query failures are accumulated, red when the plugin is unloaded / all cluster nodes are failed / 5+ consecutive failures.
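The panel colour rule can be written as a small decision function. This is a Python sketch under stated assumptions, not the dashboard's rendering code. `historian_panel_color` is a hypothetical name, "any node in cooldown" is read as HealthyNodeCount < NodeCount, and "all cluster nodes are failed" is read as HealthyNodeCount == 0. The colour used when the historian is disabled is not described above and is not modelled.

```python
def historian_panel_color(plugin_loaded: bool,
                          node_count: int,
                          healthy_node_count: int,
                          consecutive_failures: int) -> str:
    """Combine load-time and runtime health into green, yellow or red."""
    # Red conditions are checked first so they win over yellow ones.
    if not plugin_loaded:
        return "red"
    if node_count > 0 and healthy_node_count == 0:
        return "red"
    if consecutive_failures >= 5:
        return "red"
    # Yellow: at least one node in cooldown, or 1-4 consecutive failures.
    if healthy_node_count < node_count or consecutive_failures >= 1:
        return "yellow"
    return "green"
```

With three configured nodes and one in cooldown, the panel is yellow. It turns red only once no node is eligible, the plugin is not loaded, or the failure streak reaches five.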
Alarms
AlarmStatusInfo -- surfaces alarm-condition tracking health and dispatch counters.
| Field | Type | Description |
|---|---|---|
| TrackingEnabled | bool | Whether OpcUa.AlarmTrackingEnabled is set in configuration |
| ConditionCount | int | Number of distinct alarm conditions currently tracked |
| ActiveAlarmCount | int | Number of alarms currently in the InAlarm=true state |
| TransitionCount | long | Total InAlarm transitions observed in the dispatch loop since startup |
| AckEventCount | long | Total alarm acknowledgement transitions observed since startup |
| AckWriteFailures | long | Total MXAccess AckMsg writes that have failed while processing alarm acknowledges. Any non-zero value latches the service into Degraded (see Rule 2d). |
| FilterEnabled | bool | Whether OpcUa.AlarmFilter.ObjectFilters has any patterns configured |
| FilterPatternCount | int | Number of compiled filter patterns (after comma-splitting and trimming) |
| FilterIncludedObjectCount | int | Number of Galaxy objects included by the filter during the most recent address-space build. Zero when the filter is disabled. |
When the filter is active, the operator dashboard's Alarms panel renders an extra line Filter: N pattern(s), M object(s) included so operators can verify scope at a glance. See Alarm Tracking for the matching rules and resolution algorithm.
Redundancy
RedundancyInfo -- only populated when Redundancy.Enabled=true in configuration. Shows mode, role, computed service level, application URI, and the set of peer server URIs. See Redundancy for the full guide.
Footer
| Field | Type | Description |
|---|---|---|
| Timestamp | DateTime | UTC time when the snapshot was generated |
| Version | string | Service assembly version |
/api/health Payload
The health endpoint returns a HealthEndpointData document distinct from the full dashboard snapshot. It is designed for load balancers and external monitoring probes that only need an up/down signal plus component-level detail:
| Field | Type | Description |
|---|---|---|
| Status | string | Healthy, Degraded, or Unhealthy (drives the HTTP status code) |
| ServiceLevel | byte | OPC UA-style 0-255 service level. 255 when healthy non-redundant; 0 when MXAccess is down; redundancy-adjusted otherwise |
| RedundancyEnabled | bool | Whether redundancy is configured |
| RedundancyRole | string? | Primary or Secondary when redundancy is enabled; null otherwise |
| RedundancyMode | string? | Warm or Hot when redundancy is enabled; null otherwise |
| Components.MxAccess | string | Connected or Disconnected |
| Components.Database | string | Connected or Disconnected |
| Components.OpcUaServer | string | Running or Stopped |
| Components.Historian | string | Disabled, NotFound, LoadFailed, or Loaded -- matches HistorianStatusInfo.PluginStatus |
| Components.Alarms | string | Disabled or Enabled -- mirrors OpcUa.AlarmTrackingEnabled |
| Uptime | string | Formatted service uptime (e.g., 3d 5h 20m) |
| Timestamp | DateTime | UTC time the snapshot was generated |
Monitoring tools should:
- Alert on Status=Unhealthy (HTTP 503) for hard outages.
- Alert on Status=Degraded (HTTP 200) for latched or cumulative failures -- a degraded status means the server is still operating but a subsystem needs attention (historian plugin missing, alarm ack writes failing, history read error rate too high, etc.).
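A monitoring probe along these lines might look like the Python sketch below. The `classify` and `probe` helpers and the alert level names are hypothetical. The sketch assumes the 503 response still carries the HealthEndpointData body, and it falls back to an empty document if that body is not valid JSON.

```python
import json
import urllib.error
import urllib.request
from typing import Tuple


def classify(http_status: int, payload: dict) -> str:
    """Map a /api/health response to 'ok', 'warn' or 'critical'."""
    status = payload.get("Status")
    if http_status == 503 or status == "Unhealthy":
        return "critical"  # hard outage
    if status == "Degraded":
        return "warn"      # still serving, but a subsystem needs attention
    if http_status == 200 and status == "Healthy":
        return "ok"
    return "critical"      # unexpected response: fail loudly


def probe(url: str, timeout: float = 5.0) -> Tuple[int, dict]:
    """Fetch the health document and return (HTTP status, parsed body)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status, json.load(resp)
    except urllib.error.HTTPError as err:
        # urlopen raises on 503; the error object still exposes code and body.
        try:
            body = json.loads(err.read().decode("utf-8"))
        except ValueError:
            body = {}
        return err.code, body
```

Usage would be `classify(*probe("http://localhost:8081/api/health"))`, where the host name is an assumption and 8081 is the default port. A refused connection surfaces as `urllib.error.URLError`, which the caller should treat as an outage as well.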
HTML Dashboards
/ -- Operator dashboard
Monospace, dark background, color-coded panels. Panels: Connection, Health, Redundancy (when enabled), Subscriptions, Data Change Dispatch, Galaxy Info, Historian, Alarms, Operations (table), Footer. Each panel border color reflects component state (green, yellow, red, or gray).
The page includes a <meta http-equiv='refresh'> tag set to the configured RefreshIntervalSeconds (default 10 seconds), so the browser polls automatically without JavaScript.
/health -- Focused health view
Large status badge, computed ServiceLevel value, redundancy summary (when enabled), and a row of component cards: MXAccess, Galaxy Database, OPC UA Server, Historian, Alarm Tracking. Each card turns red when its component is in a failure state and grey when disabled. Best for wallboards and quick at-a-glance monitoring.
Configuration
The dashboard is configured through the Dashboard section in appsettings.json:
{
"Dashboard": {
"Enabled": true,
"Port": 8081,
"RefreshIntervalSeconds": 10
}
}
Setting Enabled to false prevents the StatusWebServer from starting. The StatusReportService is still created so that other components can query health programmatically, but no HTTP listener is opened.
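For tooling that needs to read the same settings, for example a deployment check that verifies the dashboard port, a minimal Python sketch follows. `DashboardConfig` and `load_dashboard_config` are hypothetical names. The fallback values mirror the defaults stated in this document (enabled, port 8081, 10-second refresh), and the service's own configuration binding may behave differently.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class DashboardConfig:
    enabled: bool = True
    port: int = 8081
    refresh_interval_seconds: int = 10


def load_dashboard_config(appsettings_text: str) -> DashboardConfig:
    """Parse the Dashboard section, filling documented defaults for missing keys."""
    section = json.loads(appsettings_text).get("Dashboard", {})
    return DashboardConfig(
        enabled=section.get("Enabled", True),
        port=section.get("Port", 8081),
        refresh_interval_seconds=section.get("RefreshIntervalSeconds", 10),
    )
```

Passing `'{"Dashboard": {"Enabled": false}}'` yields a config with `enabled=False` and the default port and refresh interval.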
Component Wiring
StatusReportService is initialized after all other service components are created. OpcUaService.Start() calls SetComponents() to supply the live references, including the historian configuration so the dashboard can label the plugin target and evaluate Rule 2b:
StatusReportInstance.SetComponents(
effectiveMxClient,
Metrics,
GalaxyStatsInstance,
ServerHost,
NodeManagerInstance,
_config.Redundancy,
_config.OpcUa.ApplicationUri,
_config.Historian);
This deferred wiring allows the report service to be constructed before the MXAccess client or node manager are fully initialized. If a component is null, the report service falls back to default values (e.g., ConnectionState.Disconnected, zero counts, HistorianPluginStatus.Disabled).
The historian plugin status is sourced from HistorianPluginLoader.LastOutcome, which is updated on every load attempt. OpcUaService explicitly calls HistorianPluginLoader.MarkDisabled() when Historian.Enabled=false so the dashboard can distinguish "feature off" from "load failed" without ambiguity.