Surface historian plugin and alarm-tracking health in the status dashboard so operators can detect misconfiguration and runtime degradation that previously showed as fully healthy
Wraps the 4 HistoryRead overrides and OnAlarmAcknowledge with PerformanceMetrics.BeginOperation, adds alarm counters to LmxNodeManager, publishes a structured HistorianPluginOutcome from HistorianPluginLoader, and extends HealthCheckService with plugin-load, history-read, and alarm-ack-failure degradation rules.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@@ -12,19 +12,24 @@ The service hosts an embedded HTTP status dashboard that surfaces real-time heal

| Path | Content-Type | Description |
|------|-------------|-------------|
| `/` | `text/html` | HTML dashboard with auto-refresh |
| `/api/status` | `application/json` | Full status snapshot as JSON |
| `/api/health` | `application/json` | Health check: returns `200` with `{"status":"healthy"}` or `503` with `{"status":"unhealthy"}` |
| `/` | `text/html` | Operator dashboard with auto-refresh |
| `/health` | `text/html` | Focused health page with service-level badge and component cards |
| `/api/status` | `application/json` | Full status snapshot as JSON (`StatusData`) |
| `/api/health` | `application/json` | Health endpoint (`HealthEndpointData`) -- returns `503` when status is `Unhealthy`, `200` otherwise |

Any other path returns `404 Not Found`.

## Health Check Logic

`HealthCheckService` evaluates bridge health using two rules applied in order:
`HealthCheckService.CheckHealth` evaluates bridge health using the following rules applied in order. The first rule that matches wins; rules 2b, 2c, and 2d only fire when the corresponding integration is enabled and a non-null snapshot is passed:

1. **Unhealthy** -- MXAccess connection state is not `Connected`. Returns a red banner with the current state.
2. **Degraded** -- Any recorded operation has more than 100 total invocations and a success rate below 50%. Returns a yellow banner identifying the failing operation.
3. **Healthy** -- All checks pass. Returns a green banner with "All systems operational."
1. **Rule 1 -- Unhealthy**: MXAccess connection state is not `Connected`. Returns a red banner with the current state.
2. **Rule 2b -- Degraded**: `Historian.Enabled=true` but the plugin load outcome is not `Loaded`. Returns a yellow banner citing the plugin status (`NotFound`, `LoadFailed`) and the error message if one is available.
3. **Rule 2 / 2c -- Degraded**: Any recorded operation has a low success rate. The sample threshold depends on the operation category:
   - Regular operations (`Read`, `Write`, `Subscribe`, `AlarmAcknowledge`): >100 invocations and <50% success rate.
   - Historian operations (`HistoryReadRaw`, `HistoryReadProcessed`, `HistoryReadAtTime`, `HistoryReadEvents`): >10 invocations and <50% success rate. The lower threshold surfaces a stuck historian quickly, since history reads are rare relative to live reads.
4. **Rule 2d -- Degraded (latched)**: `AlarmTrackingEnabled=true` and any alarm acknowledge MXAccess write has failed since startup. Latched on purpose -- an ack write failure is a durable MXAccess write problem that should stay visible until the operator restarts.
5. **Rule 3 -- Healthy**: All checks pass. Returns a green banner with "All systems operational."

The `/api/health` endpoint returns `200` for both Healthy and Degraded states, and `503` only for Unhealthy. This allows load balancers or monitoring tools to distinguish between a service that is running but degraded and one that has lost its runtime connection.
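
The rule ordering and per-category sample thresholds described above can be condensed into a small, self-contained sketch. The type and method names here (`HealthRulesSketch`, `Evaluate`, `ToHttpStatus`) and the tuple-based statistics are hypothetical simplifications for illustration; the real logic lives in `HealthCheckService.CheckHealth` and works on the project's own models.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical, simplified restatement of the rule order; not the project's HealthCheckService.
public static class HealthRulesSketch
{
    public static string Evaluate(
        bool mxConnected,
        bool historianEnabled,
        string pluginStatus,
        IReadOnlyDictionary<string, (long TotalCount, double SuccessRate)> operations,
        bool alarmTrackingEnabled,
        long ackWriteFailures)
    {
        // Rule 1: no MXAccess connection means Unhealthy, regardless of anything else.
        if (!mxConnected) return "Unhealthy";

        // Rule 2b: historian is enabled but the plugin did not load.
        if (historianEnabled && pluginStatus != "Loaded") return "Degraded";

        // Rule 2 / 2c: low success rate, with a lower sample threshold for history reads.
        foreach (var kvp in operations)
        {
            var isHistoryOp = kvp.Key.StartsWith("HistoryRead", StringComparison.OrdinalIgnoreCase);
            var sampleThreshold = isHistoryOp ? 10 : 100;
            if (kvp.Value.TotalCount > sampleThreshold && kvp.Value.SuccessRate < 0.5)
                return "Degraded";
        }

        // Rule 2d: any alarm ack write failure since startup latches Degraded.
        if (alarmTrackingEnabled && ackWriteFailures > 0) return "Degraded";

        // Rule 3: nothing matched.
        return "Healthy";
    }

    // Only Unhealthy maps to 503; Healthy and Degraded both map to 200.
    public static int ToHttpStatus(string status) => status == "Unhealthy" ? 503 : 200;
}
```

With these thresholds, 12 `HistoryReadRaw` calls at a 25% success rate already evaluate to Degraded, while 12 `Read` calls at the same rate do not, because the regular threshold of more than 100 invocations has not been reached.
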
@@ -82,6 +87,51 @@ A dictionary of `MetricsStatistics` keyed by operation name. Each entry contains
- `SuccessRate` -- fraction of successful operations
- `AverageMilliseconds`, `MinMilliseconds`, `MaxMilliseconds`, `Percentile95Milliseconds` -- latency distribution

The instrumented operation names are:

| Name | Source |
|---|---|
| `Read` | MXAccess live tag reads (`MxAccessClient.ReadWrite.cs`) |
| `Write` | MXAccess live tag writes |
| `Subscribe` | MXAccess subscription attach |
| `HistoryReadRaw` | `LmxNodeManager.HistoryReadRawModified` -> historian plugin |
| `HistoryReadProcessed` | `LmxNodeManager.HistoryReadProcessed` -> historian plugin (aggregates) |
| `HistoryReadAtTime` | `LmxNodeManager.HistoryReadAtTime` -> historian plugin (interpolated) |
| `HistoryReadEvents` | `LmxNodeManager.HistoryReadEvents` -> historian plugin (alarm/event history) |
| `AlarmAcknowledge` | `LmxNodeManager.OnAlarmAcknowledge` -> MXAccess AckMsg write |

New operation names are auto-registered on first use, so the `Operations` dictionary only contains entries for features that have actually been exercised since startup.
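
The scope-based instrumentation and first-use registration described above can be illustrated with a minimal sketch. `SketchMetrics` is a hypothetical stand-in written for this example, not the project's `PerformanceMetrics`; only the `BeginOperation` / `SetSuccess(false)` / dispose shape is taken from the diff, and the internals are an assumption.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

// Hypothetical stand-in for illustration; not the project's PerformanceMetrics class.
public sealed class SketchMetrics
{
    // Per-operation counters, created lazily the first time a name is used.
    internal sealed class Counter
    {
        public long Total;
        public long Failed;
    }

    private readonly ConcurrentDictionary<string, Counter> _operations =
        new ConcurrentDictionary<string, Counter>();

    // Registers the operation name on first use and returns a disposable scope.
    public Scope BeginOperation(string name) =>
        new Scope(_operations.GetOrAdd(name, _ => new Counter()));

    public bool IsRegistered(string name) => _operations.ContainsKey(name);

    public double SuccessRate(string name)
    {
        var counter = _operations[name];
        var total = Interlocked.Read(ref counter.Total);
        var failed = Interlocked.Read(ref counter.Failed);
        return total == 0 ? 1.0 : (double)(total - failed) / total;
    }

    public sealed class Scope : IDisposable
    {
        private readonly Counter _counter;
        private bool _success = true; // success is assumed unless the caller says otherwise

        internal Scope(Counter counter) => _counter = counter;

        public void SetSuccess(bool success) => _success = success;

        // The outcome is recorded once, when the scope is disposed.
        public void Dispose()
        {
            Interlocked.Increment(ref _counter.Total);
            if (!_success) Interlocked.Increment(ref _counter.Failed);
        }
    }
}
```

A caller would wrap the work in `using var scope = metrics.BeginOperation("HistoryReadRaw");` and call `scope.SetSuccess(false)` from its `catch` block, which mirrors the pattern the diff adds to the HistoryRead overrides.
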

### Historian

`HistorianStatusInfo` -- reflects the outcome of the runtime-loaded historian plugin. See [Historical Data Access](HistoricalDataAccess.md) for the plugin architecture.

| Field | Type | Description |
|-------|------|-------------|
| `Enabled` | `bool` | Whether `Historian.Enabled` is set in configuration |
| `PluginStatus` | `string` | `Disabled`, `NotFound`, `LoadFailed`, or `Loaded` |
| `PluginError` | `string?` | Exception message from the last load attempt when `PluginStatus=LoadFailed`; otherwise `null` |
| `PluginPath` | `string` | Absolute path the loader probed for the plugin assembly |
| `ServerName` | `string` | Configured historian hostname |
| `Port` | `int` | Configured historian TCP port |

### Alarms

`AlarmStatusInfo` -- surfaces alarm-condition tracking health and dispatch counters.

| Field | Type | Description |
|-------|------|-------------|
| `TrackingEnabled` | `bool` | Whether `OpcUa.AlarmTrackingEnabled` is set in configuration |
| `ConditionCount` | `int` | Number of distinct alarm conditions currently tracked |
| `ActiveAlarmCount` | `int` | Number of alarms currently in the `InAlarm=true` state |
| `TransitionCount` | `long` | Total `InAlarm` transitions observed in the dispatch loop since startup |
| `AckEventCount` | `long` | Total alarm acknowledgement transitions observed since startup |
| `AckWriteFailures` | `long` | Total MXAccess AckMsg writes that have failed while processing alarm acknowledges. Any non-zero value latches the service into Degraded (see Rule 2d). |
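
The latching behaviour of `AckWriteFailures` follows from the counter only ever increasing. A minimal sketch, assuming a hypothetical `AckFailureLatchSketch` class (the real counter is a private field on `LmxNodeManager`, and the check itself lives in `HealthCheckService`):

```csharp
using System.Threading;

// Hypothetical illustration of a monotonic, thread-safe failure counter used as a latch.
public sealed class AckFailureLatchSketch
{
    private long _ackWriteFailures;

    // Called from the failure path of an acknowledge write; safe from any thread.
    public void RecordAckWriteFailure() => Interlocked.Increment(ref _ackWriteFailures);

    // Interlocked.Read gives an atomic 64-bit read even on 32-bit processes.
    public long AckWriteFailures => Interlocked.Read(ref _ackWriteFailures);

    // The counter is never reset, so once this returns true it stays true until restart.
    public bool IsDegraded(bool trackingEnabled) => trackingEnabled && AckWriteFailures > 0;
}
```

Because nothing decrements or clears the counter, later successful acknowledges do not clear the Degraded state; only a restart does, which matches the intent stated for Rule 2d.
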

### Redundancy

`RedundancyInfo` -- only populated when `Redundancy.Enabled=true` in configuration. Shows mode, role, computed service level, application URI, and the set of peer server URIs. See [Redundancy](Redundancy.md) for the full guide.

### Footer

| Field | Type | Description |
@@ -89,12 +139,42 @@ A dictionary of `MetricsStatistics` keyed by operation name. Each entry contains
| `Timestamp` | `DateTime` | UTC time when the snapshot was generated |
| `Version` | `string` | Service assembly version |

## HTML Dashboard
## `/api/health` Payload

The HTML dashboard uses a monospace font on a dark background with color-coded panels. Each status section renders as a bordered panel whose border color reflects the component state (green, yellow, red, or gray). The operations table shows per-operation latency and success rate statistics.
The health endpoint returns a `HealthEndpointData` document distinct from the full dashboard snapshot. It is designed for load balancers and external monitoring probes that only need an up/down signal plus component-level detail:

| Field | Type | Description |
|-------|------|-------------|
| `Status` | `string` | `Healthy`, `Degraded`, or `Unhealthy` (drives the HTTP status code) |
| `ServiceLevel` | `byte` | OPC UA-style 0-255 service level. 255 when healthy non-redundant; 0 when MXAccess is down; redundancy-adjusted otherwise |
| `RedundancyEnabled` | `bool` | Whether redundancy is configured |
| `RedundancyRole` | `string?` | `Primary` or `Secondary` when redundancy is enabled; `null` otherwise |
| `RedundancyMode` | `string?` | `Warm` or `Hot` when redundancy is enabled; `null` otherwise |
| `Components.MxAccess` | `string` | `Connected` or `Disconnected` |
| `Components.Database` | `string` | `Connected` or `Disconnected` |
| `Components.OpcUaServer` | `string` | `Running` or `Stopped` |
| `Components.Historian` | `string` | `Disabled`, `NotFound`, `LoadFailed`, or `Loaded` -- matches `HistorianStatusInfo.PluginStatus` |
| `Components.Alarms` | `string` | `Disabled` or `Enabled` -- mirrors `OpcUa.AlarmTrackingEnabled` |
| `Uptime` | `string` | Formatted service uptime (e.g., `3d 5h 20m`) |
| `Timestamp` | `DateTime` | UTC time the snapshot was generated |

Monitoring tools should:

- Alert on `Status=Unhealthy` (HTTP 503) for hard outages.
- Alert on `Status=Degraded` (HTTP 200) for latched or cumulative failures -- a degraded status means the server is still operating but a subsystem needs attention (historian plugin missing, alarm ack writes failing, history read error rate too high, etc.).
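
On the monitoring side, the two alerting cases above could be separated roughly as follows. This is a sketch under stated assumptions: `ProbeAction` and `HealthProbeSketch` are hypothetical names, the response body is assumed to be a JSON object, and because the exact JSON casing of the `Status` property is not shown here, the lookup is case-insensitive.

```csharp
using System;
using System.Text.Json;

// Hypothetical alert levels for an external probe.
public enum ProbeAction { None, Warn, Page }

public static class HealthProbeSketch
{
    // Classifies one /api/health response from its HTTP status code and JSON body.
    public static ProbeAction Classify(int httpStatusCode, string jsonBody)
    {
        // 503 is only returned for Unhealthy, so the body does not need to be parsed.
        if (httpStatusCode == 503) return ProbeAction.Page;

        // Degraded still returns 200, so it has to be detected from the payload.
        using var doc = JsonDocument.Parse(jsonBody);
        foreach (var property in doc.RootElement.EnumerateObject())
        {
            if (string.Equals(property.Name, "Status", StringComparison.OrdinalIgnoreCase)
                && property.Value.ValueKind == JsonValueKind.String
                && string.Equals(property.Value.GetString(), "Degraded", StringComparison.OrdinalIgnoreCase))
            {
                return ProbeAction.Warn;
            }
        }

        return ProbeAction.None;
    }
}
```

A probe that only checks the HTTP status code would never see the Degraded state, which is why the body check is needed for the second alert.
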

## HTML Dashboards

### `/` -- Operator dashboard

Monospace, dark background, color-coded panels. Panels: Connection, Health, Redundancy (when enabled), Subscriptions, Data Change Dispatch, Galaxy Info, **Historian**, **Alarms**, Operations (table), Footer. Each panel border color reflects component state (green, yellow, red, or gray).

The page includes a `<meta http-equiv='refresh'>` tag set to the configured `RefreshIntervalSeconds` (default 10 seconds), so the browser polls automatically without JavaScript.

### `/health` -- Focused health view

Large status badge, computed `ServiceLevel` value, redundancy summary (when enabled), and a row of component cards: MXAccess, Galaxy Database, OPC UA Server, **Historian**, **Alarm Tracking**. Each card turns red when its component is in a failure state and grey when disabled. Best for wallboards and quick at-a-glance monitoring.

## Configuration

The dashboard is configured through the `Dashboard` section in `appsettings.json`:
@@ -113,10 +193,20 @@ Setting `Enabled` to `false` prevents the `StatusWebServer` from starting. The `

## Component Wiring

`StatusReportService` is initialized after all other service components are created. `OpcUaService.Start()` calls `SetComponents()` to supply the live references:
`StatusReportService` is initialized after all other service components are created. `OpcUaService.Start()` calls `SetComponents()` to supply the live references, including the historian configuration so the dashboard can label the plugin target and evaluate Rule 2b:

```csharp
_statusReport.SetComponents(effectiveMxClient, _metrics, _galaxyStats, _serverHost, _nodeManager);
StatusReportInstance.SetComponents(
    effectiveMxClient,
    Metrics,
    GalaxyStatsInstance,
    ServerHost,
    NodeManagerInstance,
    _config.Redundancy,
    _config.OpcUa.ApplicationUri,
    _config.Historian);
```

This deferred wiring allows the report service to be constructed before the MXAccess client or node manager are fully initialized. If a component is `null`, the report service falls back to default values (e.g., `ConnectionState.Disconnected`, zero counts).
This deferred wiring allows the report service to be constructed before the MXAccess client or node manager are fully initialized. If a component is `null`, the report service falls back to default values (e.g., `ConnectionState.Disconnected`, zero counts, `HistorianPluginStatus.Disabled`).

The historian plugin status is sourced from `HistorianPluginLoader.LastOutcome`, which is updated on every load attempt. `OpcUaService` explicitly calls `HistorianPluginLoader.MarkDisabled()` when `Historian.Enabled=false` so the dashboard can distinguish "feature off" from "load failed" without ambiguity.
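
How a dashboard model might be derived from the loader outcome can be sketched as below. How `StatusReportService` actually builds `HistorianStatusInfo` is not shown here, so every type in this example (`PluginStatusSketch`, `PluginOutcomeSketch`, `HistorianStatusSketch`, `HistorianStatusMapperSketch`) is a hypothetical stand-in that only mirrors the shapes of the corresponding real types.

```csharp
#nullable enable

// Hypothetical mirror of the four plugin load states.
public enum PluginStatusSketch { Disabled, NotFound, LoadFailed, Loaded }

// Hypothetical mirror of the structured load outcome published by the loader.
public sealed class PluginOutcomeSketch
{
    public PluginOutcomeSketch(PluginStatusSketch status, string pluginPath, string? error)
    {
        Status = status;
        PluginPath = pluginPath;
        Error = error;
    }

    public PluginStatusSketch Status { get; }
    public string PluginPath { get; }
    public string? Error { get; }
}

// Hypothetical mirror of the dashboard model (a subset of its fields).
public sealed class HistorianStatusSketch
{
    public bool Enabled { get; set; }
    public string PluginStatus { get; set; } = "Disabled";
    public string? PluginError { get; set; }
    public string PluginPath { get; set; } = "";
}

public static class HistorianStatusMapperSketch
{
    // Combines the configuration flag with the last load outcome.
    public static HistorianStatusSketch Build(bool historianEnabled, PluginOutcomeSketch lastOutcome) =>
        new HistorianStatusSketch
        {
            Enabled = historianEnabled,
            PluginStatus = lastOutcome.Status.ToString(),
            PluginError = lastOutcome.Error,
            PluginPath = lastOutcome.PluginPath
        };

    // Same condition as Rule 2b: enabled in configuration, but not loaded.
    public static bool TriggersRule2b(HistorianStatusSketch status) =>
        status.Enabled && status.PluginStatus != "Loaded";
}
```

In this sketch a disabled historian reports `Disabled` with `Enabled=false` and does not trigger Rule 2b, whereas an enabled historian with a `NotFound` or `LoadFailed` outcome does, which is the distinction the explicit `MarkDisabled()` call is meant to preserve.
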
@@ -6,6 +6,39 @@ using ZB.MOM.WW.LmxOpcUa.Host.Configuration;

namespace ZB.MOM.WW.LmxOpcUa.Host.Historian
{
    /// <summary>
    /// Result of the most recent historian plugin load attempt.
    /// </summary>
    public enum HistorianPluginStatus
    {
        /// <summary>Historian.Enabled is false; TryLoad was not called.</summary>
        Disabled,
        /// <summary>Plugin DLL was not present in the Historian/ subfolder.</summary>
        NotFound,
        /// <summary>Plugin file exists but could not be loaded or instantiated.</summary>
        LoadFailed,
        /// <summary>Plugin loaded and an IHistorianDataSource was constructed.</summary>
        Loaded
    }

    /// <summary>
    /// Structured outcome of a <see cref="HistorianPluginLoader.TryLoad"/> or
    /// <see cref="HistorianPluginLoader.MarkDisabled"/> call, used by the status dashboard.
    /// </summary>
    public sealed class HistorianPluginOutcome
    {
        public HistorianPluginOutcome(HistorianPluginStatus status, string pluginPath, string? error)
        {
            Status = status;
            PluginPath = pluginPath;
            Error = error;
        }

        public HistorianPluginStatus Status { get; }
        public string PluginPath { get; }
        public string? Error { get; }
    }

    /// <summary>
    /// Loads the Wonderware historian plugin assembly from the Historian/ subfolder next to
    /// the host executable. Used so the aahClientManaged SDK is not needed on hosts that run
@@ -23,9 +56,28 @@ namespace ZB.MOM.WW.LmxOpcUa.Host.Historian
        private static bool _resolverInstalled;
        private static string? _resolvedProbeDirectory;

        /// <summary>
        /// Gets the outcome of the most recent load attempt (or <see cref="HistorianPluginStatus.Disabled"/>
        /// if the loader has never been invoked). The dashboard reads this to distinguish "disabled",
        /// "plugin missing", and "plugin crashed".
        /// </summary>
        public static HistorianPluginOutcome LastOutcome { get; private set; }
            = new HistorianPluginOutcome(HistorianPluginStatus.Disabled, string.Empty, null);

        /// <summary>
        /// Records that the historian plugin is disabled by configuration. Called by
        /// <c>OpcUaService</c> when <c>Historian.Enabled=false</c> so the status dashboard can
        /// report the exact reason history is unavailable.
        /// </summary>
        public static void MarkDisabled()
        {
            LastOutcome = new HistorianPluginOutcome(HistorianPluginStatus.Disabled, string.Empty, null);
        }

        /// <summary>
        /// Attempts to load the historian plugin and construct an <see cref="IHistorianDataSource"/>.
        /// Returns null on any failure so the server can continue with history unsupported.
        /// Returns null on any failure so the server can continue with history unsupported. The
        /// specific reason is published on <see cref="LastOutcome"/>.
        /// </summary>
        public static IHistorianDataSource? TryLoad(HistorianConfiguration config)
        {
@@ -37,6 +89,7 @@ namespace ZB.MOM.WW.LmxOpcUa.Host.Historian
                Log.Warning(
                    "Historian plugin not found at {PluginPath} — history read operations will return BadHistoryOperationUnsupported",
                    pluginPath);
                LastOutcome = new HistorianPluginOutcome(HistorianPluginStatus.NotFound, pluginPath, null);
                return null;
            }

@@ -49,6 +102,9 @@ namespace ZB.MOM.WW.LmxOpcUa.Host.Historian
                if (entryType == null)
                {
                    Log.Warning("Historian plugin {PluginPath} does not expose {EntryType}", pluginPath, PluginEntryType);
                    LastOutcome = new HistorianPluginOutcome(
                        HistorianPluginStatus.LoadFailed, pluginPath,
                        $"Plugin assembly does not expose entry type {PluginEntryType}");
                    return null;
                }

@@ -56,6 +112,9 @@ namespace ZB.MOM.WW.LmxOpcUa.Host.Historian
                if (create == null)
                {
                    Log.Warning("Historian plugin entry type {EntryType} missing static {Method}", PluginEntryType, PluginEntryMethod);
                    LastOutcome = new HistorianPluginOutcome(
                        HistorianPluginStatus.LoadFailed, pluginPath,
                        $"Plugin entry type {PluginEntryType} is missing a public static {PluginEntryMethod} method");
                    return null;
                }

@@ -63,15 +122,22 @@ namespace ZB.MOM.WW.LmxOpcUa.Host.Historian
                if (result is IHistorianDataSource dataSource)
                {
                    Log.Information("Historian plugin loaded from {PluginPath}", pluginPath);
                    LastOutcome = new HistorianPluginOutcome(HistorianPluginStatus.Loaded, pluginPath, null);
                    return dataSource;
                }

                Log.Warning("Historian plugin {PluginPath} returned an object that does not implement IHistorianDataSource", pluginPath);
                LastOutcome = new HistorianPluginOutcome(
                    HistorianPluginStatus.LoadFailed, pluginPath,
                    "Plugin entry method returned an object that does not implement IHistorianDataSource");
                return null;
            }
            catch (Exception ex)
            {
                Log.Warning(ex, "Failed to load historian plugin from {PluginPath} — history disabled", pluginPath);
                LastOutcome = new HistorianPluginOutcome(
                    HistorianPluginStatus.LoadFailed, pluginPath,
                    ex.GetBaseException().Message);
                return null;
            }
        }

@@ -73,6 +73,11 @@ namespace ZB.MOM.WW.LmxOpcUa.Host.OpcUa
        // Dispatch queue metrics
        private long _totalMxChangeEvents;

        // Alarm instrumentation counters
        private long _alarmTransitionCount;
        private long _alarmAckEventCount;
        private long _alarmAckWriteFailures;

        /// <summary>
        /// Initializes a new node manager for the Galaxy-backed OPC UA namespace.
        /// </summary>
@@ -151,6 +156,47 @@ namespace ZB.MOM.WW.LmxOpcUa.Host.OpcUa
        /// </summary>
        public double AverageDispatchBatchSize { get; private set; }

        /// <summary>
        /// Gets a value indicating whether alarm condition tracking is enabled for this node manager.
        /// </summary>
        public bool AlarmTrackingEnabled => _alarmTrackingEnabled;

        /// <summary>
        /// Gets the number of distinct alarm conditions currently tracked (one per alarm attribute).
        /// </summary>
        public int AlarmConditionCount => _alarmInAlarmTags.Count;

        /// <summary>
        /// Gets the number of alarms currently in the InAlarm=true state.
        /// </summary>
        public int ActiveAlarmCount => CountActiveAlarms();

        /// <summary>
        /// Gets the total number of InAlarm transition events observed in the dispatch loop since startup.
        /// </summary>
        public long AlarmTransitionCount => Interlocked.Read(ref _alarmTransitionCount);

        /// <summary>
        /// Gets the total number of alarm acknowledgement transition events observed since startup.
        /// </summary>
        public long AlarmAckEventCount => Interlocked.Read(ref _alarmAckEventCount);

        /// <summary>
        /// Gets the total number of MXAccess AckMsg writes that failed while processing alarm acknowledges.
        /// </summary>
        public long AlarmAckWriteFailures => Interlocked.Read(ref _alarmAckWriteFailures);

        private int CountActiveAlarms()
        {
            var count = 0;
            lock (Lock)
            {
                foreach (var info in _alarmInAlarmTags.Values)
                    if (info.LastInAlarm) count++;
            }
            return count;
        }

        /// <inheritdoc />
        public override void CreateAddressSpace(IDictionary<NodeId, IList<IReference>> externalReferences)
        {
@@ -421,6 +467,7 @@ namespace ZB.MOM.WW.LmxOpcUa.Host.OpcUa
            if (alarmInfo == null)
                return new ServiceResult(StatusCodes.BadNodeIdUnknown);

            using var scope = _metrics.BeginOperation("AlarmAcknowledge");
            try
            {
                var ackMessage = comment?.Text ?? "";
@@ -432,6 +479,8 @@ namespace ZB.MOM.WW.LmxOpcUa.Host.OpcUa
            }
            catch (Exception ex)
            {
                scope.SetSuccess(false);
                Interlocked.Increment(ref _alarmAckWriteFailures);
                Log.Warning(ex, "Failed to write AckMsg for {Source}", alarmInfo.SourceName);
                return new ServiceResult(StatusCodes.BadInternalError);
            }
@@ -1522,6 +1571,7 @@ namespace ZB.MOM.WW.LmxOpcUa.Host.OpcUa
                    continue;
                }

                using var historyScope = _metrics.BeginOperation("HistoryReadRaw");
                try
                {
                    var maxValues = details.NumValuesPerNode > 0 ? (int)details.NumValuesPerNode : 0;
@@ -1536,6 +1586,7 @@ namespace ZB.MOM.WW.LmxOpcUa.Host.OpcUa
                }
                catch (Exception ex)
                {
                    historyScope.SetSuccess(false);
                    Log.Warning(ex, "HistoryRead raw failed for {TagRef}", tagRef);
                    errors[idx] = new ServiceResult(StatusCodes.BadInternalError);
                }
@@ -1598,6 +1649,7 @@ namespace ZB.MOM.WW.LmxOpcUa.Host.OpcUa
                    continue;
                }

                using var historyScope = _metrics.BeginOperation("HistoryReadProcessed");
                try
                {
                    var dataValues = _historianDataSource.ReadAggregateAsync(
@@ -1609,6 +1661,7 @@ namespace ZB.MOM.WW.LmxOpcUa.Host.OpcUa
                }
                catch (Exception ex)
                {
                    historyScope.SetSuccess(false);
                    Log.Warning(ex, "HistoryRead processed failed for {TagRef}", tagRef);
                    errors[idx] = new ServiceResult(StatusCodes.BadInternalError);
                }
@@ -1648,6 +1701,7 @@ namespace ZB.MOM.WW.LmxOpcUa.Host.OpcUa
                    continue;
                }

                using var historyScope = _metrics.BeginOperation("HistoryReadAtTime");
                try
                {
                    var timestamps = new DateTime[details.ReqTimes.Count];
@@ -1669,6 +1723,7 @@ namespace ZB.MOM.WW.LmxOpcUa.Host.OpcUa
                }
                catch (Exception ex)
                {
                    historyScope.SetSuccess(false);
                    Log.Warning(ex, "HistoryRead at-time failed for {TagRef}", tagRef);
                    errors[idx] = new ServiceResult(StatusCodes.BadInternalError);
                }
@@ -1714,6 +1769,7 @@ namespace ZB.MOM.WW.LmxOpcUa.Host.OpcUa
                    }
                }

                using var historyScope = _metrics.BeginOperation("HistoryReadEvents");
                try
                {
                    var maxEvents = details.NumValuesPerNode > 0 ? (int)details.NumValuesPerNode : 0;
@@ -1751,6 +1807,7 @@ namespace ZB.MOM.WW.LmxOpcUa.Host.OpcUa
                }
                catch (Exception ex)
                {
                    historyScope.SetSuccess(false);
                    Log.Warning(ex, "HistoryRead events failed for {NodeId}", nodeIdStr);
                    errors[idx] = new ServiceResult(StatusCodes.BadInternalError);
                }
@@ -2107,7 +2164,10 @@ namespace ZB.MOM.WW.LmxOpcUa.Host.OpcUa
                        if (ackedAlarmInfo.LastAcked.HasValue && newAcked == ackedAlarmInfo.LastAcked.Value)
                            ackedAlarmInfo = null; // No transition → skip
                        else
                        {
                            pendingAckedEvents.Add((ackedAlarmInfo, newAcked));
                            Interlocked.Increment(ref _alarmAckEventCount);
                        }
                    }
                }

@@ -2127,6 +2187,7 @@ namespace ZB.MOM.WW.LmxOpcUa.Host.OpcUa
                    }

                    pendingAlarmEvents.Add((address, alarmInfo, newInAlarm, severity, message));
                    Interlocked.Increment(ref _alarmTransitionCount);
                }

                // Apply under Lock so ClearChangeMasks propagates to monitored items.

@@ -215,9 +215,15 @@ namespace ZB.MOM.WW.LmxOpcUa.Host
            // Step 8: Create OPC UA server host + node manager
            var effectiveMxClient = (IMxAccessClient?)_mxAccessClient ??
                                    _mxAccessClientForWiring ?? new NullMxAccessClient();
            _historianDataSource = _config.Historian.Enabled
                ? HistorianPluginLoader.TryLoad(_config.Historian)
                : null;
            if (_config.Historian.Enabled)
            {
                _historianDataSource = HistorianPluginLoader.TryLoad(_config.Historian);
            }
            else
            {
                HistorianPluginLoader.MarkDisabled();
                _historianDataSource = null;
            }
            IUserAuthenticationProvider? authProvider = null;
            if (_hasAuthProviderOverride)
            {
@@ -286,7 +292,7 @@ namespace ZB.MOM.WW.LmxOpcUa.Host
            StatusReportInstance = new StatusReportService(_healthCheck, _config.Dashboard.RefreshIntervalSeconds);
            StatusReportInstance.SetComponents(effectiveMxClient, Metrics, GalaxyStatsInstance, ServerHost,
                NodeManagerInstance,
                _config.Redundancy, _config.OpcUa.ApplicationUri);
                _config.Redundancy, _config.OpcUa.ApplicationUri, _config.Historian);

            if (_config.Dashboard.Enabled)
            {

@@ -9,12 +9,19 @@ namespace ZB.MOM.WW.LmxOpcUa.Host.Status
    public class HealthCheckService
    {
        /// <summary>
        /// Evaluates bridge health from runtime connectivity and recorded performance metrics.
        /// Evaluates bridge health from runtime connectivity, recorded performance metrics, and optional
        /// historian/alarm integration state.
        /// </summary>
        /// <param name="connectionState">The current MXAccess connection state.</param>
        /// <param name="metrics">The recorded performance metrics, if available.</param>
        /// <param name="historian">Optional historian integration snapshot; pass <c>null</c> to skip historian health rules.</param>
        /// <param name="alarms">Optional alarm integration snapshot; pass <c>null</c> to skip alarm health rules.</param>
        /// <returns>A dashboard health snapshot describing the current service condition.</returns>
        public HealthInfo CheckHealth(ConnectionState connectionState, PerformanceMetrics? metrics)
        public HealthInfo CheckHealth(
            ConnectionState connectionState,
            PerformanceMetrics? metrics,
            HistorianStatusInfo? historian = null,
            AlarmStatusInfo? alarms = null)
        {
            // Rule 1: Not connected → Unhealthy
            if (connectionState != ConnectionState.Connected)
@@ -25,12 +32,26 @@ namespace ZB.MOM.WW.LmxOpcUa.Host.Status
                    Color = "red"
                };

            // Rule 2: Success rate < 50% with > 100 ops → Degraded
            // Rule 2b: Historian enabled but plugin did not load → Degraded
            if (historian != null && historian.Enabled && historian.PluginStatus != "Loaded")
                return new HealthInfo
                {
                    Status = "Degraded",
                    Message =
                        $"Historian enabled but plugin status is {historian.PluginStatus}: {historian.PluginError ?? "(no error)"}",
                    Color = "yellow"
                };

            // Rule 2 / 2c: Success rate too low for any recorded operation
            if (metrics != null)
            {
                var stats = metrics.GetStatistics();
                foreach (var kvp in stats)
                    if (kvp.Value.TotalCount > 100 && kvp.Value.SuccessRate < 0.5)
                {
                    var isHistoryOp = kvp.Key.StartsWith("HistoryRead", System.StringComparison.OrdinalIgnoreCase);
                    // History reads are rare; drop the sample threshold so a stuck historian surfaces quickly.
                    var sampleThreshold = isHistoryOp ? 10 : 100;
                    if (kvp.Value.TotalCount > sampleThreshold && kvp.Value.SuccessRate < 0.5)
                        return new HealthInfo
                        {
                            Status = "Degraded",
@@ -38,8 +59,18 @@ namespace ZB.MOM.WW.LmxOpcUa.Host.Status
                                $"{kvp.Key} success rate is {kvp.Value.SuccessRate:P0} ({kvp.Value.TotalCount} ops)",
                            Color = "yellow"
                        };
                }
            }

            // Rule 2d: Any alarm acknowledge write has failed since startup → Degraded (latched)
            if (alarms != null && alarms.TrackingEnabled && alarms.AckWriteFailures > 0)
                return new HealthInfo
                {
                    Status = "Degraded",
                    Message = $"Alarm acknowledge writes have failed ({alarms.AckWriteFailures} total)",
                    Color = "yellow"
                };

            // Rule 3: All good
            return new HealthInfo
            {
@@ -61,4 +92,4 @@ namespace ZB.MOM.WW.LmxOpcUa.Host.Status
            return health.Status != "Unhealthy";
        }
    }
}
}

@@ -39,6 +39,16 @@ namespace ZB.MOM.WW.LmxOpcUa.Host.Status
        /// </summary>
        public Dictionary<string, MetricsStatistics> Operations { get; set; } = new();

        /// <summary>
        /// Gets or sets the historian integration status (plugin load outcome, server target).
        /// </summary>
        public HistorianStatusInfo Historian { get; set; } = new();

        /// <summary>
        /// Gets or sets the alarm integration status and event counters.
        /// </summary>
        public AlarmStatusInfo Alarms { get; set; } = new();

        /// <summary>
        /// Gets or sets the redundancy state when redundancy is enabled.
        /// </summary>
@@ -165,6 +175,79 @@ namespace ZB.MOM.WW.LmxOpcUa.Host.Status
        public long TotalEvents { get; set; }
    }

    /// <summary>
    /// Dashboard model for the Wonderware historian integration (runtime-loaded plugin).
    /// </summary>
    public class HistorianStatusInfo
    {
        /// <summary>
        /// Gets or sets a value indicating whether historian support is enabled in configuration.
        /// </summary>
        public bool Enabled { get; set; }

        /// <summary>
        /// Gets or sets the most recent plugin load outcome as a string.
        /// Values: <c>Disabled</c>, <c>NotFound</c>, <c>LoadFailed</c>, <c>Loaded</c>.
        /// </summary>
        public string PluginStatus { get; set; } = "Disabled";

        /// <summary>
        /// Gets or sets the error message from the last load attempt when <see cref="PluginStatus"/> is <c>LoadFailed</c>.
        /// </summary>
        public string? PluginError { get; set; }

        /// <summary>
        /// Gets or sets the absolute path the loader probed for the plugin assembly.
        /// </summary>
        public string PluginPath { get; set; } = "";

        /// <summary>
        /// Gets or sets the configured historian server hostname.
        /// </summary>
        public string ServerName { get; set; } = "";

        /// <summary>
        /// Gets or sets the configured historian TCP port.
        /// </summary>
        public int Port { get; set; }
    }

    /// <summary>
    /// Dashboard model for alarm integration health and event counters.
    /// </summary>
    public class AlarmStatusInfo
    {
        /// <summary>
        /// Gets or sets a value indicating whether alarm condition tracking is enabled in configuration.
        /// </summary>
        public bool TrackingEnabled { get; set; }

        /// <summary>
        /// Gets or sets the number of distinct alarm conditions currently tracked.
        /// </summary>
        public int ConditionCount { get; set; }

        /// <summary>
        /// Gets or sets the number of alarms currently in the InAlarm=true state.
        /// </summary>
        public int ActiveAlarmCount { get; set; }

        /// <summary>
        /// Gets or sets the total number of InAlarm transitions observed since startup.
        /// </summary>
        public long TransitionCount { get; set; }

        /// <summary>
        /// Gets or sets the total number of alarm acknowledgement transitions observed since startup.
        /// </summary>
        public long AckEventCount { get; set; }

        /// <summary>
        /// Gets or sets the total number of alarm acknowledgement MXAccess writes that have failed since startup.
        /// </summary>
        public long AckWriteFailures { get; set; }
    }

    /// <summary>
    /// Dashboard model for redundancy state. Only populated when redundancy is enabled.
    /// </summary>
@@ -266,6 +349,18 @@ namespace ZB.MOM.WW.LmxOpcUa.Host.Status
|
||||
/// Gets or sets OPC UA server status.
|
||||
/// </summary>
|
||||
public string OpcUaServer { get; set; } = "Stopped";
|
||||
|
||||
/// <summary>
|
||||
/// Gets or sets the historian plugin status.
|
||||
/// Values: <c>Disabled</c>, <c>NotFound</c>, <c>LoadFailed</c>, <c>Loaded</c>.
|
||||
/// </summary>
|
||||
public string Historian { get; set; } = "Disabled";
|
||||
|
||||
/// <summary>
|
||||
/// Gets or sets whether alarm condition tracking is enabled.
|
||||
/// Values: <c>Disabled</c>, <c>Enabled</c>.
|
||||
/// </summary>
|
||||
public string Alarms { get; set; } = "Disabled";
|
||||
}
|
||||
|
||||
/// <summary>
|
||||
|
||||
@@ -1,10 +1,12 @@
|
||||
using System;
|
||||
using System.Collections.Generic;
|
||||
using System.Net;
|
||||
using System.Text;
|
||||
using System.Text.Json;
|
||||
using ZB.MOM.WW.LmxOpcUa.Host.Configuration;
|
||||
using ZB.MOM.WW.LmxOpcUa.Host.Domain;
|
||||
using ZB.MOM.WW.LmxOpcUa.Host.GalaxyRepository;
|
||||
using ZB.MOM.WW.LmxOpcUa.Host.Historian;
|
||||
using ZB.MOM.WW.LmxOpcUa.Host.Metrics;
|
||||
using ZB.MOM.WW.LmxOpcUa.Host.OpcUa;
|
||||
|
||||
@@ -22,6 +24,7 @@ namespace ZB.MOM.WW.LmxOpcUa.Host.Status
|
||||
private GalaxyRepositoryStats? _galaxyStats;
|
||||
private PerformanceMetrics? _metrics;
|
||||
|
||||
private HistorianConfiguration? _historianConfig;
|
||||
private IMxAccessClient? _mxAccessClient;
|
||||
private LmxNodeManager? _nodeManager;
|
||||
private RedundancyConfiguration? _redundancyConfig;
|
||||
@@ -53,7 +56,8 @@ namespace ZB.MOM.WW.LmxOpcUa.Host.Status
|
||||
public void SetComponents(IMxAccessClient? mxAccessClient, PerformanceMetrics? metrics,
|
||||
GalaxyRepositoryStats? galaxyStats, OpcUaServerHost? serverHost,
|
||||
LmxNodeManager? nodeManager = null,
|
||||
RedundancyConfiguration? redundancyConfig = null, string? applicationUri = null)
|
||||
RedundancyConfiguration? redundancyConfig = null, string? applicationUri = null,
|
||||
HistorianConfiguration? historianConfig = null)
|
||||
{
|
||||
_mxAccessClient = mxAccessClient;
|
||||
_metrics = metrics;
|
||||
@@ -62,6 +66,7 @@ namespace ZB.MOM.WW.LmxOpcUa.Host.Status
|
||||
_nodeManager = nodeManager;
|
||||
_redundancyConfig = redundancyConfig;
|
||||
_applicationUri = applicationUri;
|
||||
_historianConfig = historianConfig;
|
||||
}
|
||||
|
||||
/// <summary>
|
||||
@@ -71,6 +76,8 @@ namespace ZB.MOM.WW.LmxOpcUa.Host.Status
|
||||
public StatusData GetStatusData()
|
||||
{
|
||||
var connectionState = _mxAccessClient?.State ?? ConnectionState.Disconnected;
|
||||
var historianInfo = BuildHistorianStatusInfo();
|
||||
var alarmInfo = BuildAlarmStatusInfo();
|
||||
|
||||
return new StatusData
|
||||
{
|
||||
@@ -80,7 +87,7 @@ namespace ZB.MOM.WW.LmxOpcUa.Host.Status
|
||||
ReconnectCount = _mxAccessClient?.ReconnectCount ?? 0,
|
||||
ActiveSessions = _serverHost?.ActiveSessionCount ?? 0
|
||||
},
|
||||
Health = _healthCheck.CheckHealth(connectionState, _metrics),
|
||||
Health = _healthCheck.CheckHealth(connectionState, _metrics, historianInfo, alarmInfo),
|
||||
Subscriptions = new SubscriptionInfo
|
||||
{
|
||||
ActiveCount = _mxAccessClient?.ActiveSubscriptionCount ?? 0
|
||||
@@ -102,6 +109,8 @@ namespace ZB.MOM.WW.LmxOpcUa.Host.Status
|
||||
TotalEvents = _nodeManager?.TotalMxChangeEvents ?? 0
|
||||
},
|
||||
Operations = _metrics?.GetStatistics() ?? new Dictionary<string, MetricsStatistics>(),
|
||||
Historian = historianInfo,
|
||||
Alarms = alarmInfo,
|
||||
Redundancy = BuildRedundancyInfo(),
|
||||
Footer = new FooterInfo
|
||||
{
|
||||
@@ -111,6 +120,33 @@ namespace ZB.MOM.WW.LmxOpcUa.Host.Status
|
||||
};
|
||||
}
|
||||
|
||||
private HistorianStatusInfo BuildHistorianStatusInfo()
|
||||
{
|
||||
var outcome = HistorianPluginLoader.LastOutcome;
|
||||
return new HistorianStatusInfo
|
||||
{
|
||||
Enabled = _historianConfig?.Enabled ?? false,
|
||||
PluginStatus = outcome.Status.ToString(),
|
||||
PluginError = outcome.Error,
|
||||
PluginPath = outcome.PluginPath,
|
||||
ServerName = _historianConfig?.ServerName ?? "",
|
||||
Port = _historianConfig?.Port ?? 0
|
||||
};
|
||||
}
|
||||
|
||||
private AlarmStatusInfo BuildAlarmStatusInfo()
|
||||
{
|
||||
return new AlarmStatusInfo
|
||||
{
|
||||
TrackingEnabled = _nodeManager?.AlarmTrackingEnabled ?? false,
|
||||
ConditionCount = _nodeManager?.AlarmConditionCount ?? 0,
|
||||
ActiveAlarmCount = _nodeManager?.ActiveAlarmCount ?? 0,
|
||||
TransitionCount = _nodeManager?.AlarmTransitionCount ?? 0,
|
||||
AckEventCount = _nodeManager?.AlarmAckEventCount ?? 0,
|
||||
AckWriteFailures = _nodeManager?.AlarmAckWriteFailures ?? 0
|
||||
};
|
||||
}
|
||||
|
||||
private RedundancyInfo? BuildRedundancyInfo()
|
||||
{
|
||||
if (_redundancyConfig == null || !_redundancyConfig.Enabled)
|
||||
@@ -204,6 +240,26 @@ namespace ZB.MOM.WW.LmxOpcUa.Host.Status
|
||||
sb.AppendLine($"<p>Last Rebuild: {data.Galaxy.LastRebuildTime:O}</p>");
|
||||
sb.AppendLine("</div>");
|
||||
|
||||
// Historian panel
|
||||
var histColor = data.Historian.PluginStatus == "Loaded" ? "green"
|
||||
: !data.Historian.Enabled ? "gray" : "red";
|
||||
sb.AppendLine($"<div class='panel {histColor}'><h2>Historian</h2>");
|
||||
sb.AppendLine(
|
||||
$"<p>Enabled: <b>{data.Historian.Enabled}</b> | Plugin: <b>{data.Historian.PluginStatus}</b> | Server: {WebUtility.HtmlEncode(data.Historian.ServerName)}:{data.Historian.Port}</p>");
|
||||
if (!string.IsNullOrEmpty(data.Historian.PluginError))
|
||||
sb.AppendLine($"<p>Error: {WebUtility.HtmlEncode(data.Historian.PluginError)}</p>");
|
||||
sb.AppendLine("</div>");
|
||||
|
||||
// Alarms panel
|
||||
var alarmPanelColor = !data.Alarms.TrackingEnabled ? "gray"
|
||||
: data.Alarms.AckWriteFailures > 0 ? "yellow" : "green";
|
||||
sb.AppendLine($"<div class='panel {alarmPanelColor}'><h2>Alarms</h2>");
|
||||
sb.AppendLine(
|
||||
$"<p>Tracking: <b>{data.Alarms.TrackingEnabled}</b> | Conditions: {data.Alarms.ConditionCount} | Active: <b>{data.Alarms.ActiveAlarmCount}</b></p>");
|
||||
sb.AppendLine(
|
||||
$"<p>Transitions: {data.Alarms.TransitionCount:N0} | Ack Events: {data.Alarms.AckEventCount:N0} | Ack Write Failures: {data.Alarms.AckWriteFailures}</p>");
|
||||
sb.AppendLine("</div>");
|
||||
|
||||
// Operations table
|
||||
sb.AppendLine("<div class='panel gray'><h2>Operations</h2>");
|
||||
sb.AppendLine(
|
||||
@@ -254,7 +310,9 @@ namespace ZB.MOM.WW.LmxOpcUa.Host.Status
|
||||
var connectionState = _mxAccessClient?.State ?? ConnectionState.Disconnected;
|
||||
var mxConnected = connectionState == ConnectionState.Connected;
|
||||
var dbConnected = _galaxyStats?.DbConnected ?? false;
|
||||
var health = _healthCheck.CheckHealth(connectionState, _metrics);
|
||||
var historianInfo = BuildHistorianStatusInfo();
|
||||
var alarmInfo = BuildAlarmStatusInfo();
|
||||
var health = _healthCheck.CheckHealth(connectionState, _metrics, historianInfo, alarmInfo);
|
||||
var uptime = DateTime.UtcNow - _startTime;
|
||||
|
||||
var data = new HealthEndpointData
|
||||
@@ -265,7 +323,9 @@ namespace ZB.MOM.WW.LmxOpcUa.Host.Status
|
||||
{
|
||||
MxAccess = connectionState.ToString(),
|
||||
Database = dbConnected ? "Connected" : "Disconnected",
|
||||
OpcUaServer = _serverHost?.IsRunning ?? false ? "Running" : "Stopped"
|
||||
OpcUaServer = _serverHost?.IsRunning ?? false ? "Running" : "Stopped",
|
||||
Historian = historianInfo.PluginStatus,
|
||||
Alarms = alarmInfo.TrackingEnabled ? "Enabled" : "Disabled"
|
||||
},
|
||||
Uptime = FormatUptime(uptime),
|
||||
Timestamp = DateTime.UtcNow
|
||||
@@ -354,6 +414,10 @@ namespace ZB.MOM.WW.LmxOpcUa.Host.Status
|
||||
sb.AppendLine(
|
||||
$"<div class='redundancy'>Role: <b>{data.RedundancyRole}</b> | Mode: <b>{data.RedundancyMode}</b></div>");
|
||||
|
||||
var historianColor = data.Components.Historian == "Loaded" ? "#00cc66"
|
||||
: data.Components.Historian == "Disabled" ? "#666" : "#cc3333";
|
||||
var alarmColor = data.Components.Alarms == "Enabled" ? "#00cc66" : "#666";
|
||||
|
||||
// Component health cards
|
||||
sb.AppendLine("<div class='components'>");
|
||||
sb.AppendLine(
|
||||
@@ -362,6 +426,10 @@ namespace ZB.MOM.WW.LmxOpcUa.Host.Status
|
||||
$"<div class='component' style='border-color: {dbColor};'><div class='name'>Galaxy Database</div><div class='value' style='color: {dbColor};'>{data.Components.Database}</div></div>");
|
||||
sb.AppendLine(
|
||||
$"<div class='component' style='border-color: {uaColor};'><div class='name'>OPC UA Server</div><div class='value' style='color: {uaColor};'>{data.Components.OpcUaServer}</div></div>");
|
||||
sb.AppendLine(
|
||||
$"<div class='component' style='border-color: {historianColor};'><div class='name'>Historian</div><div class='value' style='color: {historianColor};'>{data.Components.Historian}</div></div>");
|
||||
sb.AppendLine(
|
||||
$"<div class='component' style='border-color: {alarmColor};'><div class='name'>Alarm Tracking</div><div class='value' style='color: {alarmColor};'>{data.Components.Alarms}</div></div>");
|
||||
sb.AppendLine("</div>");
|
||||
|
||||
// Footer
|
||||
|
||||
@@ -0,0 +1,45 @@
|
||||
using Shouldly;
|
||||
using Xunit;
|
||||
using ZB.MOM.WW.LmxOpcUa.Host.Configuration;
|
||||
using ZB.MOM.WW.LmxOpcUa.Host.Historian;
|
||||
|
||||
namespace ZB.MOM.WW.LmxOpcUa.Tests.Historian
|
||||
{
|
||||
/// <summary>
|
||||
/// Verifies the load-outcome state machine of <see cref="HistorianPluginLoader"/>.
|
||||
/// </summary>
|
||||
public class HistorianPluginLoaderTests
|
||||
{
|
||||
/// <summary>
|
||||
/// MarkDisabled publishes a Disabled outcome so the dashboard can distinguish
|
||||
/// "feature off" from "load failed."
|
||||
/// </summary>
|
||||
[Fact]
|
||||
public void MarkDisabled_PublishesDisabledOutcome()
|
||||
{
|
||||
HistorianPluginLoader.MarkDisabled();
|
||||
|
||||
HistorianPluginLoader.LastOutcome.Status.ShouldBe(HistorianPluginStatus.Disabled);
|
||||
HistorianPluginLoader.LastOutcome.Error.ShouldBeNull();
|
||||
}
|
||||
|
||||
/// <summary>
|
||||
/// When the plugin directory is missing, TryLoad reports NotFound — not LoadFailed —
|
||||
/// and returns null so the server can start with history disabled.
|
||||
/// </summary>
|
||||
[Fact]
|
||||
public void TryLoad_PluginMissing_ReturnsNullWithNotFoundOutcome()
|
||||
{
|
||||
// The test process runs from a bin directory that does not contain a Historian/
|
||||
// subfolder, so TryLoad will take the file-missing branch.
|
||||
var config = new HistorianConfiguration { Enabled = true };
|
||||
|
||||
var result = HistorianPluginLoader.TryLoad(config);
|
||||
|
||||
result.ShouldBeNull();
|
||||
HistorianPluginLoader.LastOutcome.Status.ShouldBe(HistorianPluginStatus.NotFound);
|
||||
HistorianPluginLoader.LastOutcome.PluginPath.ShouldContain("ZB.MOM.WW.LmxOpcUa.Historian.Aveva.dll");
|
||||
HistorianPluginLoader.LastOutcome.Error.ShouldBeNull();
|
||||
}
|
||||
}
|
||||
}
|
||||
@@ -105,5 +105,108 @@ namespace ZB.MOM.WW.LmxOpcUa.Tests.Status
|
||||
var result = _sut.CheckHealth(ConnectionState.Reconnecting, null);
|
||||
result.Status.ShouldBe("Unhealthy");
|
||||
}
|
||||
|
||||
/// <summary>
|
||||
/// Historian enabled but plugin failed to load → Degraded with the plugin error in the message.
|
||||
/// </summary>
|
||||
[Fact]
|
||||
public void HistorianEnabled_PluginLoadFailed_ReturnsDegraded()
|
||||
{
|
||||
var historian = new HistorianStatusInfo
|
||||
{
|
||||
Enabled = true,
|
||||
PluginStatus = "LoadFailed",
|
||||
PluginError = "aahClientManaged.dll could not be loaded"
|
||||
};
|
||||
|
||||
var result = _sut.CheckHealth(ConnectionState.Connected, null, historian);
|
||||
|
||||
result.Status.ShouldBe("Degraded");
|
||||
result.Color.ShouldBe("yellow");
|
||||
result.Message.ShouldContain("LoadFailed");
|
||||
result.Message.ShouldContain("aahClientManaged.dll");
|
||||
}
|
||||
|
||||
/// <summary>
|
||||
/// Historian disabled is healthy regardless of plugin status string.
|
||||
/// </summary>
|
||||
[Fact]
|
||||
public void HistorianDisabled_ReturnsHealthy()
|
||||
{
|
||||
var historian = new HistorianStatusInfo
|
||||
{
|
||||
Enabled = false,
|
||||
PluginStatus = "Disabled"
|
||||
};
|
||||
|
||||
_sut.CheckHealth(ConnectionState.Connected, null, historian).Status.ShouldBe("Healthy");
|
||||
}
|
||||
|
||||
/// <summary>
|
||||
/// Historian enabled and plugin loaded is healthy.
|
||||
/// </summary>
|
||||
[Fact]
|
||||
public void HistorianEnabled_PluginLoaded_ReturnsHealthy()
|
||||
{
|
||||
var historian = new HistorianStatusInfo { Enabled = true, PluginStatus = "Loaded" };
|
||||
_sut.CheckHealth(ConnectionState.Connected, null, historian).Status.ShouldBe("Healthy");
|
||||
}
|
||||
|
||||
/// <summary>
|
||||
/// HistoryRead operations degrade after only 11 samples with a success rate below 50%
|
||||
/// (lower threshold than the regular 100-sample rule).
|
||||
/// </summary>
|
||||
[Fact]
|
||||
public void HistoryReadLowSuccessRate_WithLowSampleCount_ReturnsDegraded()
|
||||
{
|
||||
using var metrics = new PerformanceMetrics();
|
||||
for (var i = 0; i < 4; i++)
|
||||
metrics.RecordOperation("HistoryReadRaw", TimeSpan.FromMilliseconds(10));
|
||||
for (var i = 0; i < 8; i++)
|
||||
metrics.RecordOperation("HistoryReadRaw", TimeSpan.FromMilliseconds(10), false);
|
||||
|
||||
var result = _sut.CheckHealth(ConnectionState.Connected, metrics);
|
||||
|
||||
result.Status.ShouldBe("Degraded");
|
||||
result.Message.ShouldContain("HistoryReadRaw");
|
||||
}
|
||||
|
||||
/// <summary>
|
||||
/// A HistoryRead sample count under the 10-sample threshold does not degrade the service.
|
||||
/// </summary>
|
||||
[Fact]
|
||||
public void HistoryReadLowSuccessRate_BelowThreshold_ReturnsHealthy()
|
||||
{
|
||||
using var metrics = new PerformanceMetrics();
|
||||
for (var i = 0; i < 5; i++)
|
||||
metrics.RecordOperation("HistoryReadRaw", TimeSpan.FromMilliseconds(10), false);
|
||||
|
||||
_sut.CheckHealth(ConnectionState.Connected, metrics).Status.ShouldBe("Healthy");
|
||||
}
|
||||
|
||||
/// <summary>
|
||||
/// Alarm acknowledge write failures are latched — any non-zero count degrades the service.
|
||||
/// </summary>
|
||||
[Fact]
|
||||
public void AlarmAckWriteFailures_AnyCount_ReturnsDegraded()
|
||||
{
|
||||
var alarms = new AlarmStatusInfo { TrackingEnabled = true, AckWriteFailures = 1 };
|
||||
|
||||
var result = _sut.CheckHealth(ConnectionState.Connected, null, null, alarms);
|
||||
|
||||
result.Status.ShouldBe("Degraded");
|
||||
result.Message.ShouldContain("Alarm acknowledge");
|
||||
}
|
||||
|
||||
/// <summary>
|
||||
/// Alarm tracking disabled ignores any failure count.
|
||||
/// </summary>
|
||||
[Fact]
|
||||
public void AlarmAckWriteFailures_TrackingDisabled_ReturnsHealthy()
|
||||
{
|
||||
var alarms = new AlarmStatusInfo { TrackingEnabled = false, AckWriteFailures = 99 };
|
||||
|
||||
_sut.CheckHealth(ConnectionState.Connected, null, null, alarms).Status.ShouldBe("Healthy");
|
||||
}
|
||||
}
|
||||
}
|
||||
@@ -108,9 +108,65 @@ namespace ZB.MOM.WW.LmxOpcUa.Tests.Status
|
||||
json.ShouldContain("Subscriptions");
|
||||
json.ShouldContain("Galaxy");
|
||||
json.ShouldContain("Operations");
|
||||
json.ShouldContain("Historian");
|
||||
json.ShouldContain("Alarms");
|
||||
json.ShouldContain("Footer");
|
||||
}
|
||||
|
||||
/// <summary>
|
||||
/// The dashboard JSON exposes the historian plugin status so operators can distinguish
|
||||
/// "disabled by config" from "plugin crashed on load."
|
||||
/// </summary>
|
||||
[Fact]
|
||||
public void GenerateJson_Historian_IncludesPluginStatus()
|
||||
{
|
||||
var sut = CreateService();
|
||||
var json = sut.GenerateJson();
|
||||
|
||||
json.ShouldContain("PluginStatus");
|
||||
json.ShouldContain("PluginPath");
|
||||
}
|
||||
|
||||
/// <summary>
|
||||
/// The dashboard JSON exposes alarm counters so operators can see transition/ack activity.
|
||||
/// </summary>
|
||||
[Fact]
|
||||
public void GenerateJson_Alarms_IncludesCounters()
|
||||
{
|
||||
var sut = CreateService();
|
||||
var json = sut.GenerateJson();
|
||||
|
||||
json.ShouldContain("TrackingEnabled");
|
||||
json.ShouldContain("TransitionCount");
|
||||
json.ShouldContain("AckWriteFailures");
|
||||
}
|
||||
|
||||
/// <summary>
|
||||
/// The Historian and Alarms panels render in the HTML dashboard.
|
||||
/// </summary>
|
||||
[Fact]
|
||||
public void GenerateHtml_IncludesHistorianAndAlarmPanels()
|
||||
{
|
||||
var sut = CreateService();
|
||||
var html = sut.GenerateHtml();
|
||||
|
||||
html.ShouldContain("<h2>Historian</h2>");
|
||||
html.ShouldContain("<h2>Alarms</h2>");
|
||||
}
|
||||
|
||||
/// <summary>
|
||||
/// The /api/health payload exposes Historian and Alarms component status.
|
||||
/// </summary>
|
||||
[Fact]
|
||||
public void GetHealthData_Components_IncludeHistorianAndAlarms()
|
||||
{
|
||||
var sut = CreateService();
|
||||
var data = sut.GetHealthData();
|
||||
|
||||
data.Components.Historian.ShouldNotBeNullOrEmpty();
|
||||
data.Components.Alarms.ShouldNotBeNullOrEmpty();
|
||||
}
|
||||
|
||||
/// <summary>
|
||||
/// Confirms that the report service reports healthy when the runtime connection is up.
|
||||
/// </summary>
|
||||
|
||||