Files
lmxopcua/tests/ZB.MOM.WW.OtOpcUa.Admin.Tests/RedundancyMetricsTests.cs
Joseph Doherty 1f3343e61f OpenTelemetry redundancy metrics + RoleChanged SignalR push. Closes instrumentation + live-push slices of task #198; the exporter wiring (OTLP vs Prometheus package decision) is split to new task #201 because the collector/scrape-endpoint choice is a fleet-ops decision that deserves its own PR rather than hardcoded here. New RedundancyMetrics class (Singleton-registered in DI) owning a System.Diagnostics.Metrics.Meter("ZB.MOM.WW.OtOpcUa.Redundancy", "1.0.0"). Three ObservableGauge instruments — otopcua.redundancy.primary_count / secondary_count / stale_count — all tagged by cluster.id, populated by SetClusterCounts(clusterId, primary, secondary, stale) which the poller calls at the tail of every tick; ObservableGauge callbacks snapshot the last value set under a lock so the reader (OTel collector, dotnet-counters) sees consistent tuples. One Counter — otopcua.redundancy.role_transition — tagged cluster.id, node.id, from_role, to_role; ideal for tracking "how often does Cluster-X failover" + "which node transitions most" aggregate queries. In-box Metrics API means zero NuGet dep here — the exporter PR adds OpenTelemetry.Extensions.Hosting + OpenTelemetry.Exporter.OpenTelemetryProtocol or OpenTelemetry.Exporter.Prometheus.AspNetCore to actually ship the data somewhere. FleetStatusPoller extended with role-change detection. Its PollOnceAsync now pulls ClusterNode rows alongside the existing ClusterNodeGenerationState scan, and a new PollRolesAsync walks every node comparing RedundancyRole to the _lastRole cache. On change: records the transition to RedundancyMetrics + emits a RoleChanged SignalR message to both FleetStatusHub.GroupName(cluster) + FleetStatusHub.FleetGroup so cluster-scoped + fleet-wide subscribers both see it. First observation per node is a bootstrap (cache fill) + NOT a transition — avoids spurious churn on service startup or pod restart. UpdateClusterGauges groups nodes by cluster + sets the three gauge values, using ClusterNodeService.StaleThreshold (shared 30s convention) for staleness so the /hosts page + the gauge agree. RoleChangedMessage record lives alongside NodeStateChangedMessage in FleetStatusPoller.cs. RedundancyTab.razor subscribes to the fleet-status hub on first parameters-set, filters RoleChanged events to the current cluster, reloads the node list + paints a blue info banner ("Role changed on node-a: Primary → Secondary at HH:mm:ss UTC") so operators see the transition without needing to poll-refresh the page. IAsyncDisposable closes the connection on tab swap-away. Two new RedundancyMetricsTests covering RecordRoleTransition tag emission (cluster.id + node.id + from_role + to_role all flow through the MeterListener callback) + ObservableGauge snapshot for two clusters (assert primary_count=1 for c1, stale_count=1 for c2). Existing FleetStatusPollerTests ctor-line updated to pass a RedundancyMetrics instance; all tests still pass. Full Admin.Tests suite 87/87 passing (was 85, +2). Admin project builds 0 errors. Task #201 captures the exporter-wiring follow-up — OpenTelemetry.Extensions.Hosting + OTLP vs Prometheus + /metrics endpoint decision, driven by fleet-ops infra direction.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 23:16:09 -04:00

71 lines
2.8 KiB
C#

using System.Diagnostics.Metrics;
using Shouldly;
using Xunit;
using ZB.MOM.WW.OtOpcUa.Admin.Services;
namespace ZB.MOM.WW.OtOpcUa.Admin.Tests;
[Trait("Category", "Unit")]
public sealed class RedundancyMetricsTests
{
[Fact]
public void RecordRoleTransition_Increments_Counter_WithExpectedTags()
{
using var metrics = new RedundancyMetrics();
using var listener = new MeterListener();
var observed = new List<(long Value, Dictionary<string, object?> Tags)>();
listener.InstrumentPublished = (instrument, l) =>
{
if (instrument.Meter.Name == RedundancyMetrics.MeterName &&
instrument.Name == "otopcua.redundancy.role_transition")
{
l.EnableMeasurementEvents(instrument);
}
};
listener.SetMeasurementEventCallback<long>((_, value, tags, _) =>
{
var dict = new Dictionary<string, object?>();
foreach (var tag in tags) dict[tag.Key] = tag.Value;
observed.Add((value, dict));
});
listener.Start();
metrics.RecordRoleTransition("c1", "node-a", "Primary", "Secondary");
observed.Count.ShouldBe(1);
observed[0].Value.ShouldBe(1);
observed[0].Tags["cluster.id"].ShouldBe("c1");
observed[0].Tags["node.id"].ShouldBe("node-a");
observed[0].Tags["from_role"].ShouldBe("Primary");
observed[0].Tags["to_role"].ShouldBe("Secondary");
}
[Fact]
public void SetClusterCounts_Observed_Via_ObservableGauges()
{
using var metrics = new RedundancyMetrics();
metrics.SetClusterCounts("c1", primary: 1, secondary: 2, stale: 0);
metrics.SetClusterCounts("c2", primary: 0, secondary: 1, stale: 1);
var observations = new List<(string Name, long Value, string Cluster)>();
using var listener = new MeterListener();
listener.InstrumentPublished = (instrument, l) =>
{
if (instrument.Meter.Name == RedundancyMetrics.MeterName)
l.EnableMeasurementEvents(instrument);
};
listener.SetMeasurementEventCallback<long>((instrument, value, tags, _) =>
{
string? cluster = null;
foreach (var t in tags) if (t.Key == "cluster.id") cluster = t.Value as string;
observations.Add((instrument.Name, value, cluster ?? "?"));
});
listener.Start();
listener.RecordObservableInstruments();
observations.ShouldContain(o => o.Name == "otopcua.redundancy.primary_count" && o.Cluster == "c1" && o.Value == 1);
observations.ShouldContain(o => o.Name == "otopcua.redundancy.secondary_count" && o.Cluster == "c1" && o.Value == 2);
observations.ShouldContain(o => o.Name == "otopcua.redundancy.stale_count" && o.Cluster == "c2" && o.Value == 1);
}
}