Phase 3 PR 34 — Host-status publisher (Server) + /hosts drill-down page (Admin). Closes LMX follow-up #7 by wiring together the data layer from PR 33. Server.HostStatusPublisher is a BackgroundService that walks every driver registered in DriverHost every 10 seconds, skips drivers that don't implement IHostConnectivityProbe, calls GetHostStatuses() on each probe-capable driver, and upserts one DriverHostStatus row per (NodeId, DriverInstanceId, HostName) into the central config DB. Upsert path: SingleOrDefaultAsync on the composite PK; if no row exists, Add a new one; if a row exists, LastSeenUtc advances unconditionally (heartbeat) and State + StateChangedUtc update only on transitions so Admin UI can distinguish 'still reporting, still Running' from 'freshly transitioned to Running'. MapState translates Core.Abstractions.HostState to Configuration.Enums.DriverHostState (intentional duplicate enum — Configuration project stays free of driver-runtime deps per PR 33's choice). If a driver's GetHostStatuses throws, log warning and skip that driver this tick — never take down the Server on a publisher failure. If the DB is unreachable, log warning + retry next heartbeat (no buffering — next tick's current-state snapshot is more useful than replaying stale transitions after a long outage). 2-second startup delay so NodeBootstrap's RegisterAsync calls land before the first publish tick, then tick runs immediately so a freshly-started Server surfaces its host topology in the Admin UI without waiting a full interval.
Polling chosen over event-driven for initial scope: simpler, matches Admin UI consumer cadence, avoids DriverHost lifecycle-event plumbing that doesn't exist today. Event-driven push for sub-heartbeat latency is a straightforward follow-up. Admin.Services.HostStatusService left-joins DriverHostStatus against ClusterNode on NodeId so rows persist even when the ClusterNode entry doesn't exist yet (first-boot bootstrap case). StaleThreshold = 30s — covers one missed publisher heartbeat plus a generous buffer for clock skew and GC pauses. Admin Components/Pages/Hosts.razor — FleetAdmin-visible page grouped by cluster (handles the '(unassigned)' case for rows without a matching ClusterNode). Four summary cards (Hosts / Running / Stale / Faulted); per-cluster table with Node / Driver / Host / State + Stale-badge / Last-transition / Last-seen / Detail columns; 10s auto-refresh via IServiceScopeFactory timer pattern matching FleetStatusPoller + Fleet dashboard (PR 27). Row-class highlighting: Faulted → table-danger, Stale → table-warning, else default. State badge maps DriverHostState enum to bootstrap color classes. Sidebar link added between 'Fleet status' and 'Clusters'. Server csproj adds Microsoft.EntityFrameworkCore.SqlServer 10.0.0 + registers OtOpcUaConfigDbContext in Program.cs scoped via NodeOptions.ConfigDbConnectionString (no Admin-style manual SQL raw — the DbContext is the only access path, keeps migrations owner-of-record). Tests — HostStatusPublisherTests (4 new Integration cases, uses per-run throwaway DB matching the FleetStatusPollerTests pattern): publisher upserts one row per host from each probe-capable driver and skips non-probe drivers; second tick advances LastSeenUtc without creating duplicate rows (upsert pattern verified end-to-end); state change between ticks updates State AND StateChangedUtc (datetime2(3) rounds to millisecond precision so comparison uses 1ms tolerance — documented inline); MapState translates every HostState enum member. Server.Tests Integration: 4 new tests pass. Admin build clean, Admin.Tests Unit still 23 / 0. docs/v2/lmx-followups.md item #7 marked DONE with three explicit deferred items (event-driven push, failure-count column, SignalR fan-out). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
@@ -6,6 +6,7 @@
|
||||
<ul class="nav flex-column">
|
||||
<li class="nav-item"><a class="nav-link text-light" href="/">Overview</a></li>
|
||||
<li class="nav-item"><a class="nav-link text-light" href="/fleet">Fleet status</a></li>
|
||||
<li class="nav-item"><a class="nav-link text-light" href="/hosts">Host status</a></li>
|
||||
<li class="nav-item"><a class="nav-link text-light" href="/clusters">Clusters</a></li>
|
||||
<li class="nav-item"><a class="nav-link text-light" href="/reservations">Reservations</a></li>
|
||||
<li class="nav-item"><a class="nav-link text-light" href="/certificates">Certificates</a></li>
|
||||
|
||||
160
src/ZB.MOM.WW.OtOpcUa.Admin/Components/Pages/Hosts.razor
Normal file
160
src/ZB.MOM.WW.OtOpcUa.Admin/Components/Pages/Hosts.razor
Normal file
@@ -0,0 +1,160 @@
|
||||
@page "/hosts"
|
||||
@using Microsoft.EntityFrameworkCore
|
||||
@using ZB.MOM.WW.OtOpcUa.Admin.Services
|
||||
@using ZB.MOM.WW.OtOpcUa.Configuration.Enums
|
||||
@inject IServiceScopeFactory ScopeFactory
|
||||
@implements IDisposable
|
||||
|
||||
<h1 class="mb-4">Driver host status</h1>
|
||||
|
||||
<div class="d-flex align-items-center mb-3 gap-2">
|
||||
<button class="btn btn-sm btn-outline-primary" @onclick="RefreshAsync" disabled="@_refreshing">
|
||||
@if (_refreshing) { <span class="spinner-border spinner-border-sm me-1" /> }
|
||||
Refresh
|
||||
</button>
|
||||
<span class="text-muted small">
|
||||
Auto-refresh every @RefreshIntervalSeconds s. Last updated: @(_lastRefreshUtc?.ToString("HH:mm:ss 'UTC'") ?? "—")
|
||||
</span>
|
||||
</div>
|
||||
|
||||
<div class="alert alert-info small mb-4">
|
||||
Each row is one host reported by a driver instance on a server node. Galaxy drivers report
|
||||
per-Platform / per-AppEngine entries; Modbus drivers report the PLC endpoint. Rows age out
|
||||
of the Server's publisher on every 10-second heartbeat — rows whose LastSeen is older than
|
||||
30s are flagged Stale, which usually means the owning Server process has crashed or lost
|
||||
its DB connection.
|
||||
</div>
|
||||
|
||||
@if (_rows is null)
|
||||
{
|
||||
<p>Loading…</p>
|
||||
}
|
||||
else if (_rows.Count == 0)
|
||||
{
|
||||
<div class="alert alert-secondary">
|
||||
No host-status rows yet. The Server publishes its first tick 2s after startup; if this list stays empty, check that the Server is running and the driver implements <code>IHostConnectivityProbe</code>.
|
||||
</div>
|
||||
}
|
||||
else
|
||||
{
|
||||
<div class="row g-3 mb-4">
|
||||
<div class="col-md-3"><div class="card"><div class="card-body">
|
||||
<h6 class="text-muted mb-1">Hosts</h6>
|
||||
<div class="fs-3">@_rows.Count</div>
|
||||
</div></div></div>
|
||||
<div class="col-md-3"><div class="card border-success"><div class="card-body">
|
||||
<h6 class="text-muted mb-1">Running</h6>
|
||||
<div class="fs-3 text-success">@_rows.Count(r => r.State == DriverHostState.Running && !HostStatusService.IsStale(r))</div>
|
||||
</div></div></div>
|
||||
<div class="col-md-3"><div class="card border-warning"><div class="card-body">
|
||||
<h6 class="text-muted mb-1">Stale</h6>
|
||||
<div class="fs-3 text-warning">@_rows.Count(HostStatusService.IsStale)</div>
|
||||
</div></div></div>
|
||||
<div class="col-md-3"><div class="card border-danger"><div class="card-body">
|
||||
<h6 class="text-muted mb-1">Faulted</h6>
|
||||
<div class="fs-3 text-danger">@_rows.Count(r => r.State == DriverHostState.Faulted)</div>
|
||||
</div></div></div>
|
||||
</div>
|
||||
|
||||
@foreach (var cluster in _rows.GroupBy(r => r.ClusterId ?? "(unassigned)").OrderBy(g => g.Key))
|
||||
{
|
||||
<h2 class="h5 mt-4">Cluster: <code>@cluster.Key</code></h2>
|
||||
<table class="table table-sm table-hover align-middle">
|
||||
<thead>
|
||||
<tr>
|
||||
<th>Node</th>
|
||||
<th>Driver</th>
|
||||
<th>Host</th>
|
||||
<th>State</th>
|
||||
<th>Last transition</th>
|
||||
<th>Last seen</th>
|
||||
<th>Detail</th>
|
||||
</tr>
|
||||
</thead>
|
||||
<tbody>
|
||||
@foreach (var r in cluster)
|
||||
{
|
||||
<tr class="@RowClass(r)">
|
||||
<td><code>@r.NodeId</code></td>
|
||||
<td><code>@r.DriverInstanceId</code></td>
|
||||
<td>@r.HostName</td>
|
||||
<td>
|
||||
<span class="badge @StateBadge(r.State)">@r.State</span>
|
||||
@if (HostStatusService.IsStale(r))
|
||||
{
|
||||
<span class="badge bg-warning text-dark ms-1">Stale</span>
|
||||
}
|
||||
</td>
|
||||
<td class="small">@FormatAge(r.StateChangedUtc)</td>
|
||||
<td class="small @(HostStatusService.IsStale(r) ? "text-warning" : "")">@FormatAge(r.LastSeenUtc)</td>
|
||||
<td class="text-truncate small" style="max-width: 320px;" title="@r.Detail">@r.Detail</td>
|
||||
</tr>
|
||||
}
|
||||
</tbody>
|
||||
</table>
|
||||
}
|
||||
}
|
||||
|
||||
@code {
|
||||
// Mirrors HostStatusPublisher.HeartbeatInterval — polling ahead of the broadcaster
|
||||
// produces stale-looking rows mid-cycle.
|
||||
private const int RefreshIntervalSeconds = 10;
|
||||
|
||||
private List<HostStatusRow>? _rows;
|
||||
private bool _refreshing;
|
||||
private DateTime? _lastRefreshUtc;
|
||||
private Timer? _timer;
|
||||
|
||||
protected override async Task OnInitializedAsync()
|
||||
{
|
||||
await RefreshAsync();
|
||||
_timer = new Timer(async _ => await InvokeAsync(RefreshAsync),
|
||||
state: null,
|
||||
dueTime: TimeSpan.FromSeconds(RefreshIntervalSeconds),
|
||||
period: TimeSpan.FromSeconds(RefreshIntervalSeconds));
|
||||
}
|
||||
|
||||
private async Task RefreshAsync()
|
||||
{
|
||||
if (_refreshing) return;
|
||||
_refreshing = true;
|
||||
try
|
||||
{
|
||||
using var scope = ScopeFactory.CreateScope();
|
||||
var svc = scope.ServiceProvider.GetRequiredService<HostStatusService>();
|
||||
_rows = (await svc.ListAsync()).ToList();
|
||||
_lastRefreshUtc = DateTime.UtcNow;
|
||||
}
|
||||
finally
|
||||
{
|
||||
_refreshing = false;
|
||||
StateHasChanged();
|
||||
}
|
||||
}
|
||||
|
||||
private static string RowClass(HostStatusRow r) => r.State switch
|
||||
{
|
||||
DriverHostState.Faulted => "table-danger",
|
||||
_ when HostStatusService.IsStale(r) => "table-warning",
|
||||
_ => "",
|
||||
};
|
||||
|
||||
private static string StateBadge(DriverHostState s) => s switch
|
||||
{
|
||||
DriverHostState.Running => "bg-success",
|
||||
DriverHostState.Stopped => "bg-secondary",
|
||||
DriverHostState.Faulted => "bg-danger",
|
||||
_ => "bg-secondary",
|
||||
};
|
||||
|
||||
private static string FormatAge(DateTime t)
|
||||
{
|
||||
var age = DateTime.UtcNow - t;
|
||||
if (age.TotalSeconds < 60) return $"{(int)age.TotalSeconds}s ago";
|
||||
if (age.TotalMinutes < 60) return $"{(int)age.TotalMinutes}m ago";
|
||||
if (age.TotalHours < 24) return $"{(int)age.TotalHours}h ago";
|
||||
return t.ToString("yyyy-MM-dd HH:mm 'UTC'");
|
||||
}
|
||||
|
||||
public void Dispose() => _timer?.Dispose();
|
||||
}
|
||||
Reference in New Issue
Block a user