OpenTelemetry redundancy metrics + RoleChanged SignalR push. Closes instrumentation + live-push slices of task #198; the exporter wiring (OTLP vs Prometheus package decision) is split to new task #201 because the collector/scrape-endpoint choice is a fleet-ops decision that deserves its own PR rather than hardcoded here. New RedundancyMetrics class (Singleton-registered in DI) owning a System.Diagnostics.Metrics.Meter("ZB.MOM.WW.OtOpcUa.Redundancy", "1.0.0"). Three ObservableGauge instruments — otopcua.redundancy.primary_count / secondary_count / stale_count — all tagged by cluster.id, populated by SetClusterCounts(clusterId, primary, secondary, stale) which the poller calls at the tail of every tick; ObservableGauge callbacks snapshot the last value set under a lock so the reader (OTel collector, dotnet-counters) sees consistent tuples. One Counter — otopcua.redundancy.role_transition — tagged cluster.id, node.id, from_role, to_role; ideal for tracking "how often does Cluster-X failover" + "which node transitions most" aggregate queries. In-box Metrics API means zero NuGet dep here — the exporter PR adds OpenTelemetry.Extensions.Hosting + OpenTelemetry.Exporter.OpenTelemetryProtocol or OpenTelemetry.Exporter.Prometheus.AspNetCore to actually ship the data somewhere. FleetStatusPoller extended with role-change detection. Its PollOnceAsync now pulls ClusterNode rows alongside the existing ClusterNodeGenerationState scan, and a new PollRolesAsync walks every node comparing RedundancyRole to the _lastRole cache. On change: records the transition to RedundancyMetrics + emits a RoleChanged SignalR message to both FleetStatusHub.GroupName(cluster) + FleetStatusHub.FleetGroup so cluster-scoped + fleet-wide subscribers both see it. First observation per node is a bootstrap (cache fill) + NOT a transition — avoids spurious churn on service startup or pod restart. UpdateClusterGauges groups nodes by cluster + sets the three gauge values, using ClusterNodeService.StaleThreshold (shared 30s convention) for staleness so the /hosts page + the gauge agree. RoleChangedMessage record lives alongside NodeStateChangedMessage in FleetStatusPoller.cs. RedundancyTab.razor subscribes to the fleet-status hub on first parameters-set, filters RoleChanged events to the current cluster, reloads the node list + paints a blue info banner ("Role changed on node-a: Primary → Secondary at HH:mm:ss UTC") so operators see the transition without needing to poll-refresh the page. IAsyncDisposable closes the connection on tab swap-away. Two new RedundancyMetricsTests covering RecordRoleTransition tag emission (cluster.id + node.id + from_role + to_role all flow through the MeterListener callback) + ObservableGauge snapshot for two clusters (assert primary_count=1 for c1, stale_count=1 for c2). Existing FleetStatusPollerTests ctor-line updated to pass a RedundancyMetrics instance; all tests still pass. Full Admin.Tests suite 87/87 passing (was 85, +2). Admin project builds 0 errors. Task #201 captures the exporter-wiring follow-up — OpenTelemetry.Extensions.Hosting + OTLP vs Prometheus + /metrics endpoint decision, driven by fleet-ops infra direction.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
@@ -1,9 +1,17 @@
|
||||
@using Microsoft.AspNetCore.SignalR.Client
|
||||
@using ZB.MOM.WW.OtOpcUa.Admin.Hubs
|
||||
@using ZB.MOM.WW.OtOpcUa.Admin.Services
|
||||
@using ZB.MOM.WW.OtOpcUa.Configuration.Entities
|
||||
@using ZB.MOM.WW.OtOpcUa.Configuration.Enums
|
||||
@inject ClusterNodeService NodeSvc
|
||||
@inject NavigationManager Nav
|
||||
@implements IAsyncDisposable
|
||||
|
||||
<h4>Redundancy topology</h4>
|
||||
@if (_roleChangedBanner is not null)
|
||||
{
|
||||
<div class="alert alert-info small mb-2">@_roleChangedBanner</div>
|
||||
}
|
||||
<p class="text-muted small">
|
||||
One row per <code>ClusterNode</code> in this cluster. Role, <code>ApplicationUri</code>,
|
||||
and <code>ServiceLevelBase</code> are authored separately; the Admin UI shows them read-only
|
||||
@@ -107,10 +115,41 @@ else
|
||||
[Parameter] public string ClusterId { get; set; } = string.Empty;
|
||||
|
||||
private List<ClusterNode>? _nodes;
|
||||
private HubConnection? _hub;
|
||||
private string? _roleChangedBanner;
|
||||
|
||||
protected override async Task OnParametersSetAsync()
|
||||
{
|
||||
_nodes = await NodeSvc.ListByClusterAsync(ClusterId, CancellationToken.None);
|
||||
if (_hub is null) await ConnectHubAsync();
|
||||
}
|
||||
|
||||
private async Task ConnectHubAsync()
|
||||
{
|
||||
_hub = new HubConnectionBuilder()
|
||||
.WithUrl(Nav.ToAbsoluteUri("/hubs/fleet-status"))
|
||||
.WithAutomaticReconnect()
|
||||
.Build();
|
||||
|
||||
_hub.On<RoleChangedMessage>("RoleChanged", async msg =>
|
||||
{
|
||||
if (msg.ClusterId != ClusterId) return;
|
||||
_roleChangedBanner = $"Role changed on {msg.NodeId}: {msg.FromRole} → {msg.ToRole} at {msg.ObservedAtUtc:HH:mm:ss 'UTC'}";
|
||||
_nodes = await NodeSvc.ListByClusterAsync(ClusterId, CancellationToken.None);
|
||||
await InvokeAsync(StateHasChanged);
|
||||
});
|
||||
|
||||
await _hub.StartAsync();
|
||||
await _hub.SendAsync("SubscribeCluster", ClusterId);
|
||||
}
|
||||
|
||||
public async ValueTask DisposeAsync()
|
||||
{
|
||||
if (_hub is not null)
|
||||
{
|
||||
await _hub.DisposeAsync();
|
||||
_hub = null;
|
||||
}
|
||||
}
|
||||
|
||||
private static string RowClass(ClusterNode n) =>
|
||||
|
||||
Reference in New Issue
Block a user