Files
lmxopcua/src/ZB.MOM.WW.OtOpcUa.Admin/Program.cs
Joseph Doherty 1f3343e61f OpenTelemetry redundancy metrics + RoleChanged SignalR push. Closes instrumentation + live-push slices of task #198; the exporter wiring (OTLP vs Prometheus package decision) is split to new task #201 because the collector/scrape-endpoint choice is a fleet-ops decision that deserves its own PR rather than hardcoded here. New RedundancyMetrics class (Singleton-registered in DI) owning a System.Diagnostics.Metrics.Meter("ZB.MOM.WW.OtOpcUa.Redundancy", "1.0.0"). Three ObservableGauge instruments — otopcua.redundancy.primary_count / secondary_count / stale_count — all tagged by cluster.id, populated by SetClusterCounts(clusterId, primary, secondary, stale) which the poller calls at the tail of every tick; ObservableGauge callbacks snapshot the last value set under a lock so the reader (OTel collector, dotnet-counters) sees consistent tuples. One Counter — otopcua.redundancy.role_transition — tagged cluster.id, node.id, from_role, to_role; ideal for tracking "how often does Cluster-X failover" + "which node transitions most" aggregate queries. In-box Metrics API means zero NuGet dep here — the exporter PR adds OpenTelemetry.Extensions.Hosting + OpenTelemetry.Exporter.OpenTelemetryProtocol or OpenTelemetry.Exporter.Prometheus.AspNetCore to actually ship the data somewhere. FleetStatusPoller extended with role-change detection. Its PollOnceAsync now pulls ClusterNode rows alongside the existing ClusterNodeGenerationState scan, and a new PollRolesAsync walks every node comparing RedundancyRole to the _lastRole cache. On change: records the transition to RedundancyMetrics + emits a RoleChanged SignalR message to both FleetStatusHub.GroupName(cluster) + FleetStatusHub.FleetGroup so cluster-scoped + fleet-wide subscribers both see it. First observation per node is a bootstrap (cache fill) + NOT a transition — avoids spurious churn on service startup or pod restart. UpdateClusterGauges groups nodes by cluster + sets the three gauge values, using ClusterNodeService.StaleThreshold (shared 30s convention) for staleness so the /hosts page + the gauge agree. RoleChangedMessage record lives alongside NodeStateChangedMessage in FleetStatusPoller.cs. RedundancyTab.razor subscribes to the fleet-status hub on first parameters-set, filters RoleChanged events to the current cluster, reloads the node list + paints a blue info banner ("Role changed on node-a: Primary → Secondary at HH:mm:ss UTC") so operators see the transition without needing to poll-refresh the page. IAsyncDisposable closes the connection on tab swap-away. Two new RedundancyMetricsTests covering RecordRoleTransition tag emission (cluster.id + node.id + from_role + to_role all flow through the MeterListener callback) + ObservableGauge snapshot for two clusters (assert primary_count=1 for c1, stale_count=1 for c2). Existing FleetStatusPollerTests ctor-line updated to pass a RedundancyMetrics instance; all tests still pass. Full Admin.Tests suite 87/87 passing (was 85, +2). Admin project builds 0 errors. Task #201 captures the exporter-wiring follow-up — OpenTelemetry.Extensions.Hosting + OTLP vs Prometheus + /metrics endpoint decision, driven by fleet-ops infra direction.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 23:16:09 -04:00

93 lines
3.5 KiB
C#

using Microsoft.AspNetCore.Authentication;
using Microsoft.AspNetCore.Authentication.Cookies;
using Microsoft.EntityFrameworkCore;
using Serilog;
using ZB.MOM.WW.OtOpcUa.Admin.Components;
using ZB.MOM.WW.OtOpcUa.Admin.Hubs;
using ZB.MOM.WW.OtOpcUa.Admin.Security;
using ZB.MOM.WW.OtOpcUa.Admin.Services;
using ZB.MOM.WW.OtOpcUa.Configuration;
var builder = WebApplication.CreateBuilder(args);
builder.Host.UseSerilog((ctx, cfg) => cfg
.MinimumLevel.Information()
.WriteTo.Console()
.WriteTo.File("logs/otopcua-admin-.log", rollingInterval: RollingInterval.Day));
builder.Services.AddRazorComponents().AddInteractiveServerComponents();
builder.Services.AddHttpContextAccessor();
builder.Services.AddSignalR();
builder.Services.AddAuthentication(CookieAuthenticationDefaults.AuthenticationScheme)
.AddCookie(o =>
{
o.Cookie.Name = "OtOpcUa.Admin";
o.LoginPath = "/login";
o.ExpireTimeSpan = TimeSpan.FromHours(8);
});
builder.Services.AddAuthorizationBuilder()
.AddPolicy("CanEdit", p => p.RequireRole(AdminRoles.ConfigEditor, AdminRoles.FleetAdmin))
.AddPolicy("CanPublish", p => p.RequireRole(AdminRoles.FleetAdmin));
builder.Services.AddCascadingAuthenticationState();
builder.Services.AddDbContext<OtOpcUaConfigDbContext>(opt =>
opt.UseSqlServer(builder.Configuration.GetConnectionString("ConfigDb")
?? throw new InvalidOperationException("ConnectionStrings:ConfigDb not configured")));
builder.Services.AddScoped<ClusterService>();
builder.Services.AddScoped<GenerationService>();
builder.Services.AddScoped<EquipmentService>();
builder.Services.AddScoped<UnsService>();
builder.Services.AddScoped<NamespaceService>();
builder.Services.AddScoped<DriverInstanceService>();
builder.Services.AddScoped<NodeAclService>();
builder.Services.AddScoped<ReservationService>();
builder.Services.AddScoped<DraftValidationService>();
builder.Services.AddScoped<AuditLogService>();
builder.Services.AddScoped<HostStatusService>();
builder.Services.AddScoped<ClusterNodeService>();
builder.Services.AddSingleton<RedundancyMetrics>();
builder.Services.AddScoped<EquipmentImportBatchService>();
builder.Services.AddScoped<ZB.MOM.WW.OtOpcUa.Configuration.Services.ILdapGroupRoleMappingService,
ZB.MOM.WW.OtOpcUa.Configuration.Services.LdapGroupRoleMappingService>();
// Cert-trust management — reads the OPC UA server's PKI store root so rejected client certs
// can be promoted to trusted via the Admin UI. Singleton: no per-request state, just
// filesystem operations.
builder.Services.Configure<CertTrustOptions>(builder.Configuration.GetSection(CertTrustOptions.SectionName));
builder.Services.AddSingleton<CertTrustService>();
// LDAP auth — parity with ScadaLink's LdapAuthService (decision #102).
builder.Services.Configure<LdapOptions>(
builder.Configuration.GetSection("Authentication:Ldap"));
builder.Services.AddScoped<ILdapAuthService, LdapAuthService>();
// SignalR real-time fleet status + alerts (admin-ui.md §"Real-Time Updates").
builder.Services.AddHostedService<FleetStatusPoller>();
var app = builder.Build();
app.UseSerilogRequestLogging();
app.UseStaticFiles();
app.UseAuthentication();
app.UseAuthorization();
app.UseAntiforgery();
app.MapPost("/auth/logout", async (HttpContext ctx) =>
{
await ctx.SignOutAsync(CookieAuthenticationDefaults.AuthenticationScheme);
ctx.Response.Redirect("/");
});
app.MapHub<FleetStatusHub>("/hubs/fleet");
app.MapHub<AlertHub>("/hubs/alerts");
app.MapRazorComponents<App>().AddInteractiveServerRenderMode();
await app.RunAsync();
public partial class Program;