Instrument the historian plugin with runtime query health counters and read-only cluster failover so operators can detect silent query degradation and keep serving history when a single cluster node goes down
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This commit is contained in:
@@ -104,16 +104,32 @@ New operation names are auto-registered on first use, so the `Operations` dictio
|
||||
|
||||
### Historian
|
||||
|
||||
`HistorianStatusInfo` -- reflects the outcome of the runtime-loaded historian plugin. See [Historical Data Access](HistoricalDataAccess.md) for the plugin architecture.
|
||||
`HistorianStatusInfo` -- reflects the outcome of the runtime-loaded historian plugin and the runtime query-health counters. See [Historical Data Access](HistoricalDataAccess.md) for the plugin architecture and the [Runtime Health Counters](HistoricalDataAccess.md#runtime-health-counters) section for the data source instrumentation.
|
||||
|
||||
| Field | Type | Description |
|
||||
|-------|------|-------------|
|
||||
| `Enabled` | `bool` | Whether `Historian.Enabled` is set in configuration |
|
||||
| `PluginStatus` | `string` | `Disabled`, `NotFound`, `LoadFailed`, or `Loaded` |
|
||||
| `PluginStatus` | `string` | `Disabled`, `NotFound`, `LoadFailed`, or `Loaded` — load-time outcome from `HistorianPluginLoader.LastOutcome` |
|
||||
| `PluginError` | `string?` | Exception message from the last load attempt when `PluginStatus=LoadFailed`; otherwise `null` |
|
||||
| `PluginPath` | `string` | Absolute path the loader probed for the plugin assembly |
|
||||
| `ServerName` | `string` | Configured historian hostname |
|
||||
| `ServerName` | `string` | Legacy single-node hostname from `Historian.ServerName`; ignored when `ServerNames` is non-empty |
|
||||
| `Port` | `int` | Configured historian TCP port |
|
||||
| `QueryTotal` | `long` | Total historian read queries attempted since startup (raw + aggregate + at-time + events) |
|
||||
| `QuerySuccesses` | `long` | Queries that completed without an exception |
|
||||
| `QueryFailures` | `long` | Queries that raised an exception — each failure also triggers the plugin's reconnect path |
|
||||
| `ConsecutiveFailures` | `int` | Failures since the last success. Resets to zero on any successful query. Drives the `Degraded` health rule at threshold 3 |
|
||||
| `LastSuccessTime` | `DateTime?` | UTC timestamp of the most recent successful query, or `null` when no query has succeeded since startup |
|
||||
| `LastFailureTime` | `DateTime?` | UTC timestamp of the most recent failure |
|
||||
| `LastQueryError` | `string?` | Exception message from the most recent failure. Prefixed with the read-path name (`raw:`, `aggregate:`, `at-time:`, `events:`) so operators can tell which SDK call failed |
|
||||
| `ProcessConnectionOpen` | `bool` | Whether the plugin currently holds an open SDK connection for the **process** silo (historical value queries — `ReadRaw`, `ReadAggregate`, `ReadAtTime`). See [Two SDK connection silos](HistoricalDataAccess.md#two-sdk-connection-silos) |
|
||||
| `EventConnectionOpen` | `bool` | Whether the plugin currently holds an open SDK connection for the **event** silo (alarm history queries — `ReadEvents`). Separate from the process connection because the SDK requires distinct query channels |
|
||||
| `ActiveProcessNode` | `string?` | Cluster node currently serving the process silo, or `null` when no process connection is open |
|
||||
| `ActiveEventNode` | `string?` | Cluster node currently serving the event silo, or `null` when no event connection is open |
|
||||
| `NodeCount` | `int` | Total configured historian cluster nodes. 1 for a legacy single-node deployment |
|
||||
| `HealthyNodeCount` | `int` | Nodes currently eligible for new connections (not in failure cooldown) |
|
||||
| `Nodes` | `List<HistorianClusterNodeState>` | Per-node cluster state in configuration order. Each entry carries `Name`, `IsHealthy`, `CooldownUntil`, `FailureCount`, `LastError`, `LastFailureTime` |
|
||||
|
||||
The operator dashboard renders a cluster table inside the Historian panel when `NodeCount > 1`. Legacy single-node deployments render a compact `Node: <hostname>` line and no table. Panel color reflects combined load-time + runtime health: green when everything is fine, yellow when any cluster node is in cooldown or 1-4 consecutive query failures are accumulated, red when the plugin is unloaded / all cluster nodes are failed / 5+ consecutive failures.
|
||||
|
||||
### Alarms
|
||||
|
||||
|
||||
Reference in New Issue
Block a user