Phase 3 PR 34 — Host-status publisher (Server) + /hosts drill-down page (Admin) #33

Merged
dohertj2 merged 1 commits from phase-3-pr34-host-status-publisher-page into v2 2026-04-18 16:04:21 -04:00

Closes LMX follow-up #7 by wiring together the data layer from PR 33. Two halves: a Server-side publisher that writes per-host status rows, and an Admin-side page that renders them with per-cluster grouping and stale-row flagging.

Server half — HostStatusPublisher

BackgroundService that every 10 seconds walks every registered driver, skips drivers without IHostConnectivityProbe, calls GetHostStatuses(), and upserts one DriverHostStatus row per (NodeId, DriverInstanceId, HostName).

  • Upsert semantics — LastSeenUtc advances unconditionally every tick (heartbeat); State + StateChangedUtc only change on an actual transition. That lets the Admin distinguish "still Running" from "freshly transitioned to Running".
  • Failure isolation — if a driver's GetHostStatuses throws, the publisher logs a warning and skips that driver for this tick. If the DB is unreachable, it logs a warning and retries on the next heartbeat (no buffering — the next snapshot is more useful than replaying stale transitions).
  • Startup — 2s delay so NodeBootstrap.RegisterAsync calls land first, then immediate tick so a fresh Server surfaces its topology without waiting a full interval.
  • Polling vs event-driven — polling chosen for this PR; sub-heartbeat latency via OnHostStatusChanged subscription is an optional follow-up (DriverHost doesn't expose registration lifecycle events today).
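
The tick and upsert rules above can be sketched in a few lines. This is a language-neutral illustration in Python, not the PR's C# code: the in-memory dict stands in for the DriverHostStatus table, a `get_host_statuses` method stands in for IHostConnectivityProbe, and the HostState members other than Running and Faulted are assumptions.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum


class HostState(Enum):
    # Only Running and Faulted appear in the PR text; the rest are assumed.
    UNKNOWN = "Unknown"
    RUNNING = "Running"
    STOPPED = "Stopped"
    FAULTED = "Faulted"


@dataclass
class HostStatusRow:
    state: HostState
    state_changed_utc: datetime
    last_seen_utc: datetime


def upsert(table, key, state, now):
    """Insert a row for key, or refresh the existing one."""
    row = table.get(key)
    if row is None:
        table[key] = HostStatusRow(state, now, now)
        return
    row.last_seen_utc = now  # heartbeat: advances on every tick
    if row.state != state:  # only a real transition touches these two
        row.state = state
        row.state_changed_utc = now


def publish_tick(table, node_id, drivers, now, log=print):
    """One publisher tick over all registered drivers."""
    for driver_id, driver in drivers.items():
        get_statuses = getattr(driver, "get_host_statuses", None)
        if get_statuses is None:
            continue  # driver is not probe-capable
        try:
            statuses = get_statuses()
        except Exception as exc:
            # Failure isolation: one bad driver never aborts the tick.
            log(f"driver {driver_id} probe failed, skipping this tick: {exc}")
            continue
        for host_name, state in statuses:
            # Composite key: (NodeId, DriverInstanceId, HostName).
            upsert(table, (node_id, driver_id, host_name), state, now)
```

Calling `publish_tick` twice with an unchanged state moves only `last_seen_utc`, which is how the Admin side can tell "still Running" from "freshly transitioned to Running".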

Requires Microsoft.EntityFrameworkCore.SqlServer on the Server project + AddDbContext<OtOpcUaConfigDbContext> in Program.cs (first direct SQL access from the Server — LocalCache is LiteDB-only).

Admin half — /hosts page

  • HostStatusService left-joins DriverHostStatus against ClusterNode on NodeId so rows persist even when the node row doesn't exist yet (first-boot bootstrap case — rendered under (unassigned)).
  • StaleThreshold = 30s covers one missed heartbeat plus clock skew + GC pauses.
  • Page layout: four summary cards (Hosts / Running / Stale / Faulted), then one table per cluster with Node / Driver / Host / State + Stale-badge / Last-transition / Last-seen / Detail columns.
  • 10s auto-refresh via IServiceScopeFactory timer pattern matching FleetStatusPoller + the Fleet dashboard.
  • Sidebar link added between Fleet status and Clusters.
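
As a rough illustration of the Admin-side logic, the sketch below groups rows by cluster with left-join behavior, flags stale rows, and computes the four summary counts. It is Python for illustration only, not the page's actual C#. The field names, the strict `>` comparison against the threshold, and the choice to count a stale Running row under both Running and Stale are assumptions; the row-class precedence (Faulted before Stale) follows the order given in the commit message.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

STALE_THRESHOLD = timedelta(seconds=30)
UNASSIGNED = "(unassigned)"


@dataclass
class HostRow:
    node_id: str
    host_name: str
    state: str
    last_seen_utc: datetime


def is_stale(row, now):
    # Assumed comparison: strictly older than the threshold.
    return now - row.last_seen_utc > STALE_THRESHOLD


def group_by_cluster(rows, node_to_cluster):
    """Left join: rows with no matching node entry land under UNASSIGNED."""
    groups = {}
    for row in rows:
        cluster = node_to_cluster.get(row.node_id, UNASSIGNED)
        groups.setdefault(cluster, []).append(row)
    return groups


def summarize(rows, now):
    """Counts for the four summary cards."""
    return {
        "Hosts": len(rows),
        "Running": sum(1 for r in rows if r.state == "Running"),
        "Stale": sum(1 for r in rows if is_stale(r, now)),
        "Faulted": sum(1 for r in rows if r.state == "Faulted"),
    }


def row_class(row, now):
    """Bootstrap row class: Faulted wins over Stale, otherwise default."""
    if row.state == "Faulted":
        return "table-danger"
    if is_stale(row, now):
        return "table-warning"
    return ""
```

With a 10-second publisher interval, a row crosses the 30-second threshold only after more than one heartbeat has been missed, so a single late tick does not flip the badge.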

Tests

HostStatusPublisherTests (4 new Integration cases, throwaway DB per run):

  • Publisher upserts one row per host from each probe-capable driver, skips non-probe drivers.
  • Second tick advances LastSeenUtc without creating duplicate rows (upsert end-to-end).
  • State change between ticks updates State AND StateChangedUtc (1ms tolerance for datetime2(3) rounding — documented inline).
  • MapState translates every HostState enum member.
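
Two of these checks follow patterns that are easy to show in isolation: an exhaustiveness check over a duplicated enum, and a timestamp comparison that tolerates millisecond rounding. The Python sketch below is illustrative only. The enum members are hypothetical, and mapping by member name is one possible strategy, not necessarily what MapState does.

```python
from datetime import datetime, timedelta
from enum import Enum


class HostState(Enum):  # hypothetical runtime-side members
    UNKNOWN = 0
    RUNNING = 1
    STOPPED = 2
    FAULTED = 3


class DriverHostState(Enum):  # hypothetical persisted-side duplicate
    UNKNOWN = 0
    RUNNING = 1
    STOPPED = 2
    FAULTED = 3


def map_state(state):
    # Map by name so the two enums may differ in numeric value.
    return DriverHostState[state.name]


def unmapped_members():
    """Return every HostState member that map_state cannot translate."""
    missing = []
    for member in HostState:
        try:
            map_state(member)
        except KeyError:
            missing.append(member)
    return missing


def close_enough(a, b, tolerance=timedelta(milliseconds=1)):
    """Compare timestamps after a round trip through a millisecond column."""
    return abs(a - b) <= tolerance
```

A test that asserts `unmapped_members()` is empty fails as soon as someone adds a member to one enum without the other, which is the main risk of keeping an intentional duplicate.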

Test posture

  • Server.Tests Integration: 4 new pass.
  • Admin.Tests Unit: 23 / 0 (unchanged — new page is pure display, tested via its service).
  • Admin + Server build clean.

Deferred (explicit in lmx-followups.md)

  • Event-driven push for sub-heartbeat latency.
  • Failure-count column (needs publisher to track per-host transition history).
  • SignalR fan-out from Server → Admin, replacing the DB-polled refresh.
dohertj2 added 1 commit 2026-04-18 16:04:19 -04:00
Phase 3 PR 34 — Host-status publisher (Server) + /hosts drill-down page (Admin). Closes LMX follow-up #7 by wiring together the data layer from PR 33.
Server.HostStatusPublisher is a BackgroundService that walks every driver registered in DriverHost every 10 seconds, skips drivers that don't implement IHostConnectivityProbe, calls GetHostStatuses() on each probe-capable driver, and upserts one DriverHostStatus row per (NodeId, DriverInstanceId, HostName) into the central config DB.
Upsert path: SingleOrDefaultAsync on the composite PK; if no row exists, Add a new one; if a row exists, LastSeenUtc advances unconditionally (heartbeat) and State + StateChangedUtc update only on transitions so Admin UI can distinguish 'still reporting, still Running' from 'freshly transitioned to Running'.
MapState translates Core.Abstractions.HostState to Configuration.Enums.DriverHostState (intentional duplicate enum — Configuration project stays free of driver-runtime deps per PR 33's choice).
If a driver's GetHostStatuses throws, log warning and skip that driver this tick — never take down the Server on a publisher failure. If the DB is unreachable, log warning + retry next heartbeat (no buffering — next tick's current-state snapshot is more useful than replaying stale transitions after a long outage).
2-second startup delay so NodeBootstrap's RegisterAsync calls land before the first publish tick, then tick runs immediately so a freshly-started Server surfaces its host topology in the Admin UI without waiting a full interval.
ef2a810b2d
Polling chosen over event-driven for initial scope: simpler, matches Admin UI consumer cadence, avoids DriverHost lifecycle-event plumbing that doesn't exist today. Event-driven push for sub-heartbeat latency is a straightforward follow-up.
Admin.Services.HostStatusService left-joins DriverHostStatus against ClusterNode on NodeId so rows persist even when the ClusterNode entry doesn't exist yet (first-boot bootstrap case). StaleThreshold = 30s — covers one missed publisher heartbeat plus a generous buffer for clock skew and GC pauses. Admin Components/Pages/Hosts.razor — FleetAdmin-visible page grouped by cluster (handles the '(unassigned)' case for rows without a matching ClusterNode). Four summary cards (Hosts / Running / Stale / Faulted); per-cluster table with Node / Driver / Host / State + Stale-badge / Last-transition / Last-seen / Detail columns; 10s auto-refresh via IServiceScopeFactory timer pattern matching FleetStatusPoller + Fleet dashboard (PR 27). Row-class highlighting: Faulted → table-danger, Stale → table-warning, else default. State badge maps DriverHostState enum to bootstrap color classes. Sidebar link added between 'Fleet status' and 'Clusters'.
Server csproj adds Microsoft.EntityFrameworkCore.SqlServer 10.0.0 + registers OtOpcUaConfigDbContext in Program.cs, configured from NodeOptions.ConfigDbConnectionString (no Admin-style raw SQL; the DbContext is the only access path, which keeps it the owner of record for migrations).
Tests — HostStatusPublisherTests (4 new Integration cases, uses per-run throwaway DB matching the FleetStatusPollerTests pattern): publisher upserts one row per host from each probe-capable driver and skips non-probe drivers; second tick advances LastSeenUtc without creating duplicate rows (upsert pattern verified end-to-end); state change between ticks updates State AND StateChangedUtc (datetime2(3) rounds to millisecond precision so comparison uses 1ms tolerance — documented inline); MapState translates every HostState enum member. Server.Tests Integration: 4 new tests pass. Admin build clean, Admin.Tests Unit still 23 / 0. docs/v2/lmx-followups.md item #7 marked DONE with three explicit deferred items (event-driven push, failure-count column, SignalR fan-out).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
dohertj2 merged commit 2584379e75 into v2 2026-04-18 16:04:21 -04:00
Reference: dohertj2/lmxopcua#33