From ef2a810b2d4b92e8cb1f83a77912798d961b7a5b Mon Sep 17 00:00:00 2001
From: Joseph Doherty
Date: Sat, 18 Apr 2026 15:51:55 -0400
Subject: [PATCH] Phase 3 PR 34 — Host-status publisher (Server) + /hosts drill-down page (Admin).
 Closes LMX follow-up #7 by wiring together the data layer from PR 33.
 Server.HostStatusPublisher is a BackgroundService that walks every driver registered in DriverHost every 10 seconds, skips drivers that don't implement IHostConnectivityProbe, calls GetHostStatuses() on each probe-capable driver, and upserts one DriverHostStatus row per (NodeId, DriverInstanceId, HostName) into the central config DB.
 Upsert path: SingleOrDefaultAsync on the composite PK; if no row exists, Add a new one; if a row exists, LastSeenUtc advances unconditionally (heartbeat) and State + StateChangedUtc update only on transitions so Admin UI can distinguish 'still reporting, still Running' from 'freshly transitioned to Running'.
 MapState translates Core.Abstractions.HostState to Configuration.Enums.DriverHostState (intentional duplicate enum — Configuration project stays free of driver-runtime deps per PR 33's choice).
 If a driver's GetHostStatuses throws, log warning and skip that driver this tick — never take down the Server on a publisher failure.
 If the DB is unreachable, log warning + retry next heartbeat (no buffering — next tick's current-state snapshot is more useful than replaying stale transitions after a long outage).
 2-second startup delay so NodeBootstrap's RegisterAsync calls land before the first publish tick, then tick runs immediately so a freshly-started Server surfaces its host topology in the Admin UI without waiting a full interval.
 Polling chosen over event-driven for initial scope: simpler, matches Admin UI consumer cadence, avoids DriverHost lifecycle-event plumbing that doesn't exist today.
 Event-driven push for sub-heartbeat latency is a straightforward follow-up.
 Admin.Services.HostStatusService left-joins DriverHostStatus against ClusterNode on NodeId so rows persist even when the ClusterNode entry doesn't exist yet (first-boot bootstrap case).
 StaleThreshold = 30s — covers one missed publisher heartbeat plus a generous buffer for clock skew and GC pauses.
 Admin Components/Pages/Hosts.razor — FleetAdmin-visible page grouped by cluster (handles the '(unassigned)' case for rows without a matching ClusterNode).
 Four summary cards (Hosts / Running / Stale / Faulted); per-cluster table with Node / Driver / Host / State + Stale-badge / Last-transition / Last-seen / Detail columns; 10s auto-refresh via IServiceScopeFactory timer pattern matching FleetStatusPoller + Fleet dashboard (PR 27).
 Row-class highlighting: Faulted → table-danger, Stale → table-warning, else default.
 State badge maps DriverHostState enum to bootstrap color classes.
 Sidebar link added between 'Fleet status' and 'Clusters'.
 Server csproj adds Microsoft.EntityFrameworkCore.SqlServer 10.0.0 + registers OtOpcUaConfigDbContext in Program.cs scoped via NodeOptions.ConfigDbConnectionString (no Admin-style manual SQL raw — the DbContext is the only access path, keeps migrations owner-of-record).
 Tests — HostStatusPublisherTests (4 new Integration cases, uses per-run throwaway DB matching the FleetStatusPollerTests pattern): publisher upserts one row per host from each probe-capable driver and skips non-probe drivers; second tick advances LastSeenUtc without creating duplicate rows (upsert pattern verified end-to-end); state change between ticks updates State AND StateChangedUtc (datetime2(3) rounds to millisecond precision so comparison uses 1ms tolerance — documented inline); MapState translates every HostState enum member.
 Server.Tests Integration: 4 new tests pass.
 Admin build clean, Admin.Tests Unit still 23 / 0.
 docs/v2/lmx-followups.md item #7 marked DONE with three explicit deferred items (event-driven push, failure-count column, SignalR fan-out).
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Co-Authored-By: Claude Opus 4.7 (1M context)
---
 docs/v2/lmx-followups.md                |  33 ++-
 .../Components/Layout/MainLayout.razor  |   1 +
 .../Components/Pages/Hosts.razor        | 160 ++++++++++++++
 src/ZB.MOM.WW.OtOpcUa.Admin/Program.cs  |   1 +
 .../Services/HostStatusService.cs       |  63 ++++++
 .../HostStatusPublisher.cs              | 143 +++++++++++++
 src/ZB.MOM.WW.OtOpcUa.Server/Program.cs |   8 +
 .../ZB.MOM.WW.OtOpcUa.Server.csproj     |   1 +
 .../HostStatusPublisherTests.cs         | 197 ++++++++++++++++++
 9 files changed, 599 insertions(+), 8 deletions(-)
 create mode 100644 src/ZB.MOM.WW.OtOpcUa.Admin/Components/Pages/Hosts.razor
 create mode 100644 src/ZB.MOM.WW.OtOpcUa.Admin/Services/HostStatusService.cs
 create mode 100644 src/ZB.MOM.WW.OtOpcUa.Server/HostStatusPublisher.cs
 create mode 100644 tests/ZB.MOM.WW.OtOpcUa.Server.Tests/HostStatusPublisherTests.cs

diff --git a/docs/v2/lmx-followups.md b/docs/v2/lmx-followups.md
index 11ca514..d5b91eb 100644
--- a/docs/v2/lmx-followups.md
+++ b/docs/v2/lmx-followups.md
@@ -108,13 +108,30 @@ condition node). Alarm tracking already has its own integration test
 (`AlarmSubscription*`); the multi-driver alarm case would need a stub
 `IAlarmSource` that's worth its own focused PR.
 
-## 7. Host-status per-AppEngine granularity → Admin UI dashboard
+## 7. Host-status per-AppEngine granularity → Admin UI dashboard — **DONE (PRs 33 + 34)**
 
-**Status**: PR 13 ships per-platform/per-AppEngine `ScanState` probing; PR 17
-surfaces the resulting `OnHostStatusChanged` events through OPC UA. Admin
-UI doesn't render a per-host dashboard yet.
+**PR 33** landed the data layer: `DriverHostStatus` entity + migration with
+composite key `(NodeId, DriverInstanceId, HostName)` and two query-supporting
+indexes (per-cluster drill-down on `NodeId`, stale-row detection on
+`LastSeenUtc`).
 
-**To do**:
-- SignalR hub push of `HostStatusChangedEventArgs` to the Admin UI.
-- Dashboard page showing each tracked host, current state, last transition
-  time, failure count.
+**PR 34** wired the publisher + consumer. `HostStatusPublisher` is a
+`BackgroundService` in the Server process that walks every registered
+`IHostConnectivityProbe`-capable driver every 10s, calls
+`GetHostStatuses()`, and upserts rows (`LastSeenUtc` advances each tick;
+`State` + `StateChangedUtc` update on transitions). Admin UI `/hosts` page
+groups by cluster, shows four summary cards (Hosts / Running / Stale /
+Faulted), and flags rows whose `LastSeenUtc` is older than 30s as Stale so
+operators see crashed Servers without waiting for a state change.
+
+Deferred as follow-ups:
+
+- Event-driven push (subscribe to `OnHostStatusChanged` per driver for
+  sub-heartbeat latency). Adds DriverHost lifecycle-event plumbing;
+  10s polling is fine for operator-scale use.
+- Failure-count column — needs the publisher to track a transition history
+  per host, not just current-state.
+- SignalR fan-out to the Admin page (currently the page polls the DB, not
+  a hub). The DB-polled version is fine at current cadence but a hub push
+  would eliminate the 10s race where a new row sits in the DB before the
+  Admin page notices.
diff --git a/src/ZB.MOM.WW.OtOpcUa.Admin/Components/Layout/MainLayout.razor b/src/ZB.MOM.WW.OtOpcUa.Admin/Components/Layout/MainLayout.razor
index 03540ed..90687dc 100644
--- a/src/ZB.MOM.WW.OtOpcUa.Admin/Components/Layout/MainLayout.razor
+++ b/src/ZB.MOM.WW.OtOpcUa.Admin/Components/Layout/MainLayout.razor
@@ -6,6 +6,7 @@