Files
lmxopcua/src/ZB.MOM.WW.OtOpcUa.Admin/Services/ClusterNodeService.cs
Joseph Doherty 13d5a7968b Admin RedundancyTab — per-cluster read-only topology view. Closes the UI slice of task #149 (Phase 6.3 Stream E — Admin UI RedundancyTab + OpenTelemetry metrics + SignalR); the OpenTelemetry metrics + RoleChanged SignalR push are split into new follow-up task #198 because each is a structural add that deserves its own test matrix + NuGet-dep decision rather than riding this UI PR. New /clusters/{ClusterId} Redundancy tab slotted between ACLs and Audit in the existing ClusterDetail tab bar. Shows each ClusterNode row in the cluster with columns Node / Role (Primary green, Secondary blue, Standalone primary-blue badge) / Host / OPC UA port / ServiceLevel base / ApplicationUri (text-break so the long urn: doesn't blow out the table) / Enabled badge / Last seen (relative age via the same FormatAge helper as Hosts.razor, with a yellow "Stale" chip once LastSeenAt crosses the 30s threshold shared with HostStatusService.StaleThreshold — a missed heartbeat plus clock-skew buffer). Four summary cards above the table — total Nodes, Primary count, Secondary count, Stale count. Two guard-rail alerts: (a) red "No Primary or Standalone" when the cluster has no authoritative write target (all rows are Secondaries — read-only until one is promoted by the server-side RedundancyCoordinator apply-lease flow); (b) red "Split-brain" when >1 Primary exists — apply-lease enforcement at the coordinator level should have made this impossible, so the alert implies a hand-edited DB row + an investigation. New ClusterNodeService with ListByClusterAsync (ordered by ServiceLevelBase descending so Primary rows with higher base float to the top) + a static IsStale predicate matching HostStatusService's 30s convention. DI-registered alongside the existing scoped services in Program.cs. Writes (role swap, enable/disable) are deliberately absent from the service — they go through the RedundancyCoordinator apply-lease flow on the server side + direct DB mutation from Admin would race with it. New ClusterNodeServiceTests covering IsStale across null/recent/old LastSeenAt + ListByClusterAsync ordering + cluster filter. 4/4 new tests passing; full Admin.Tests suite 76/76 (was 72 before this PR, +4). Admin project builds 0 errors. Task #198 captures the deferred work: (1) OpenTelemetry Meter for primary/secondary/stale counts + role_transition counter with from/to/node tags + OTLP exporter config; (2) RoleChanged SignalR push — extend FleetStatusPoller to detect RedundancyRole changes on ClusterNode rows + emit a RoleChanged hub message so the RedundancyTab refreshes instantly instead of on-page-load polling.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 22:14:25 -04:00

29 lines
1.3 KiB
C#

using Microsoft.EntityFrameworkCore;
using ZB.MOM.WW.OtOpcUa.Configuration;
using ZB.MOM.WW.OtOpcUa.Configuration.Entities;
namespace ZB.MOM.WW.OtOpcUa.Admin.Services;
/// <summary>
/// Read-side service for ClusterNode rows + their cluster-scoped redundancy view. Consumed
/// by the RedundancyTab on the cluster detail page. Writes (role swap, node enable/disable)
/// are not supported here — role swap happens through the RedundancyCoordinator apply-lease
/// flow on the server side and would conflict with any direct DB mutation from Admin.
/// </summary>
public sealed class ClusterNodeService(OtOpcUaConfigDbContext db)
{
/// <summary>Stale-threshold matching <c>HostStatusService.StaleThreshold</c> — 30s of clock
/// tolerance covers a missed heartbeat plus publisher GC pauses.</summary>
public static readonly TimeSpan StaleThreshold = TimeSpan.FromSeconds(30);
public Task<List<ClusterNode>> ListByClusterAsync(string clusterId, CancellationToken ct) =>
db.ClusterNodes.AsNoTracking()
.Where(n => n.ClusterId == clusterId)
.OrderByDescending(n => n.ServiceLevelBase)
.ThenBy(n => n.NodeId)
.ToListAsync(ct);
public static bool IsStale(ClusterNode node) =>
node.LastSeenAt is null || DateTime.UtcNow - node.LastSeenAt.Value > StaleThreshold;
}