fix(adminui): wire Test Connect probes + live panels on admin-only nodes

Both bugs surfaced only on split-role deployments (the MAIN cluster's
admin-only nodes), where the AdminUI runs without the driver role.

- Test Connect returned "No probe registered" for every driver: the
  IDriverProbe set was registered only under the driver role, but the
  admin-operations singleton that consumes it is pinned to admin. Extract
  AddOtOpcUaDriverProbes() (idempotent via TryAddEnumerable) and call it
  in the hasAdmin path too.

- Live driver-status/alerts/script-log panels showed "SignalR error:
  Connection refused": these Blazor Server components opened a HubConnection
  to their own hub via the browser's public URL, which server-side code
  can't reach behind Traefik (host :9200 -> container :9000). Read the
  in-process source directly instead -- DriverStatus via
  IDriverStatusSnapshotStore.SnapshotChanged, Alerts/ScriptLog via a new
  IInProcessBroadcaster<T>. Fleet status was unaffected (reads DB/ActorSystem).
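
The first fix relies on TryAddEnumerable skipping a descriptor whose service/implementation pair is already present. A minimal sketch of that behaviour follows. The probe classes and the DriverType member are hypothetical stand-ins, since the real IDriverProbe shape is not shown in this commit view; only the AddOtOpcUaDriverProbes name comes from the message above.

```csharp
using System;
using System.Linq;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.DependencyInjection.Extensions;

// Hypothetical stand-ins for the real probe contract and implementations.
public interface IDriverProbe { string DriverType { get; } }
public sealed class ExampleProbeA : IDriverProbe { public string DriverType => "example-a"; }
public sealed class ExampleProbeB : IDriverProbe { public string DriverType => "example-b"; }

public static class ProbeRegistration
{
    // TryAddEnumerable adds a descriptor only if the same
    // (service type, implementation type) pair is not registered yet,
    // so this method can be called from both the driver and admin paths.
    public static IServiceCollection AddOtOpcUaDriverProbes(this IServiceCollection services)
    {
        services.TryAddEnumerable(ServiceDescriptor.Singleton<IDriverProbe, ExampleProbeA>());
        services.TryAddEnumerable(ServiceDescriptor.Singleton<IDriverProbe, ExampleProbeB>());
        return services;
    }
}

public static class Demo
{
    public static void Main()
    {
        var services = new ServiceCollection();
        services.AddOtOpcUaDriverProbes(); // driver-role path
        services.AddOtOpcUaDriverProbes(); // admin-role path on a combined node

        // Each implementation is registered once, so this prints 2.
        Console.WriteLine(services.Count(d => d.ServiceType == typeof(IDriverProbe)));
    }
}
```

On a node that holds both roles the second call is a no-op, which is why the extraction does not need a guard flag.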

Adds unit tests for probe registration, the snapshot-store event, and the
broadcaster.
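
For readers without the full diff, one possible shape for an in-process broadcaster is sketched below. Only the IInProcessBroadcaster<T> name comes from this commit; the Subscribe/Publish members, the IDisposable subscription, and the swallow-on-throw policy are assumptions made for illustration and may differ from the real interface.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Assumed contract: the actual members are not visible in this commit view.
public interface IInProcessBroadcaster<T>
{
    IDisposable Subscribe(Action<T> handler);
    void Publish(T item);
}

public sealed class InProcessBroadcaster<T> : IInProcessBroadcaster<T>
{
    private readonly object _gate = new object();
    private readonly List<Action<T>> _handlers = new List<Action<T>>();

    public IDisposable Subscribe(Action<T> handler)
    {
        lock (_gate) _handlers.Add(handler);
        return new Subscription(this, handler);
    }

    public void Publish(T item)
    {
        // Copy under the lock, invoke outside it, so a handler that
        // subscribes or unsubscribes during delivery cannot deadlock.
        Action<T>[] snapshot;
        lock (_gate) snapshot = _handlers.ToArray();

        foreach (var handler in snapshot)
        {
            try { handler(item); }
            catch { /* assumption: one faulty subscriber must not block the rest */ }
        }
    }

    private sealed class Subscription : IDisposable
    {
        private readonly InProcessBroadcaster<T> _owner;
        private Action<T>? _handler;

        public Subscription(InProcessBroadcaster<T> owner, Action<T> handler)
        {
            _owner = owner;
            _handler = handler;
        }

        public void Dispose()
        {
            // Exchange makes a second Dispose call a no-op.
            var handler = Interlocked.Exchange(ref _handler, null);
            if (handler is null) return;
            lock (_owner._gate) _owner._handlers.Remove(handler);
        }
    }
}
```

A component would subscribe during initialisation, marshal each item onto its render context with InvokeAsync(StateHasChanged), and dispose the subscription in DisposeAsync, mirroring the SnapshotChanged handling in the diff.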
Joseph Doherty
2026-05-29 16:38:32 -04:00
parent e3a27422a1
commit 61193629b6
14 changed files with 388 additions and 106 deletions
@@ -4,14 +4,14 @@
DriverOperator-gated Reconnect/Restart buttons appear for authorised users. *@
@implements IAsyncDisposable
@using Microsoft.AspNetCore.Authorization
-@using Microsoft.AspNetCore.SignalR.Client
@using ZB.MOM.WW.OtOpcUa.AdminUI.Hubs
@using ZB.MOM.WW.OtOpcUa.Commons.Interfaces
@using ZB.MOM.WW.OtOpcUa.Commons.Messages.Admin
@using ZB.MOM.WW.OtOpcUa.Commons.Messages.Drivers
@inject NavigationManager Nav
@inject AuthenticationStateProvider AuthState
@inject IAuthorizationService AuthorizationService
@inject IAdminOperationsClient AdminOps
+@inject IDriverStatusSnapshotStore StatusStore
<section class="panel rise mt-3" style="animation-delay:.04s; @(_stale ? "opacity:0.5;" : "")">
<div class="panel-head d-flex align-items-center gap-2">
@@ -139,7 +139,6 @@
[Parameter] public string ClusterId { get; set; } = "";
[Parameter] public bool Enabled { get; set; } = true;
-private HubConnection? _hub;
private DriverHealthChanged? _snapshot;
private DateTime _lastUpdateUtc = DateTime.MinValue;
private bool _stale;
@@ -180,30 +179,44 @@
InvokeAsync(StateHasChanged);
}, null, TimeSpan.FromSeconds(5), TimeSpan.FromSeconds(5));
-_hub = new HubConnectionBuilder()
-.WithUrl(Nav.ToAbsoluteUri("/hubs/driverstatus"))
-.WithAutomaticReconnect()
-.Build();
-_hub.On<DriverHealthChanged>("status", snap =>
-{
-_snapshot = snap;
-_lastUpdateUtc = DateTime.UtcNow;
-_stale = false;
-InvokeAsync(StateHasChanged);
-});
+// Read live status straight from the in-process snapshot store rather than opening a
+// self-targeted SignalR connection. This component runs server-side (Blazor
+// InteractiveServer), so a HubConnection to the browser's public URL (e.g.
+// http://localhost:9200 behind Traefik) would dial that port from *inside* the container —
+// where only Kestrel's :9000 listens — and fail with "Connection refused". The store is fed
+// on every admin node by DriverStatusSignalRBridge (a per-node DistributedPubSub
+// subscriber), so the local singleton is always current regardless of which replica serves
+// this circuit.
try
{
-await _hub.StartAsync();
-_connecting = false;
-await _hub.InvokeAsync("JoinDriver", DriverInstanceId);
+StatusStore.SnapshotChanged += OnSnapshotChanged;
+if (StatusStore.TryGet(DriverInstanceId, out var snap))
+{
+_snapshot = snap;
+_lastUpdateUtc = DateTime.UtcNow;
+}
}
catch (Exception ex)
{
-_connecting = false;
_error = ex.Message;
}
+finally
+{
+_connecting = false;
+}
}
+// Invoked by the snapshot store (on the bridge actor's thread) for every driver instance;
+// ignore snapshots for other instances and marshal onto the render sync context.
+private void OnSnapshotChanged(DriverHealthChanged snap)
+{
+if (!string.Equals(snap.DriverInstanceId, DriverInstanceId, StringComparison.Ordinal))
+return;
+_snapshot = snap;
+_lastUpdateUtc = DateTime.UtcNow;
+_stale = false;
+InvokeAsync(StateHasChanged);
+}
private async Task ReconnectAsync()
@@ -285,12 +298,13 @@
public async ValueTask DisposeAsync()
{
-// Drain BOTH timers first so an in-flight callback can't invoke StateHasChanged on
-// a component whose hub has already been released. System.Threading.Timer's async
-// dispose awaits any in-flight callback (.NET 6+).
+// Unsubscribe first so the singleton store can't invoke a handler on a disposed component.
+StatusStore.SnapshotChanged -= OnSnapshotChanged;
+// Drain BOTH timers so an in-flight callback can't invoke StateHasChanged on a component
+// that's already gone. System.Threading.Timer's async dispose awaits any in-flight
+// callback (.NET 6+).
if (_timer is not null) await _timer.DisposeAsync();
if (_opResultClearTimer is not null) await _opResultClearTimer.DisposeAsync();
-if (_hub is not null) await _hub.DisposeAsync();
}
// Map DriverState string → chip CSS class using the 4 defined theme variants.