fix(health): route site heartbeats into the aggregator

CentralCommunicationActor.HandleHeartbeat was forwarding each incoming HeartbeatMessage to Context.Parent, which resolves to the /user guardian, not to an actor that handles heartbeats. Every site heartbeat went straight to dead letters (~1026 per central node per 30 minutes at the default ~2s interval across three sites).

The aggregator now exposes MarkHeartbeat(siteId, receivedAt), which bumps LastReportReceivedAt on already-known sites (and restores IsOnline if it had flipped to offline) without touching LatestReport. Heartbeats from unregistered sites are dropped; first registration still happens on the first full report. CentralCommunicationActor calls MarkHeartbeat in place of the dead-lettered Tell.

The result: heartbeats now serve their stated health-monitoring purpose (per CLAUDE.md) by keeping a site marked online between the 30s full reports when a single report is briefly delayed, and the dead-letter noise disappears entirely.
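
For reviewers, the MarkHeartbeat behavior described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the committed implementation: SketchAggregator and SiteState are hypothetical stand-ins for the real aggregator and SiteHealthState, LatestReport is reduced to a string, and both the per-site lock and the "never move the timestamp backwards" check are choices made for this sketch only.

```csharp
#nullable enable
using System;
using System.Collections.Concurrent;

// Hypothetical, simplified stand-in for SiteHealthState.
public sealed class SiteState
{
    public DateTimeOffset LastReportReceivedAt { get; set; }
    public bool IsOnline { get; set; }
    public string? LatestReport { get; set; } // placeholder for a SiteHealthReport
}

// Hypothetical aggregator; only the heartbeat-related behavior is modeled.
public sealed class SketchAggregator
{
    private readonly ConcurrentDictionary<string, SiteState> _sites = new();

    // A full report registers the site if needed and replaces LatestReport.
    public void ProcessReport(string siteId, string report, DateTimeOffset receivedAt)
    {
        var state = _sites.GetOrAdd(siteId, _ => new SiteState());
        lock (state)
        {
            state.LatestReport = report;
            state.LastReportReceivedAt = receivedAt;
            state.IsOnline = true;
        }
    }

    // A heartbeat only refreshes liveness for sites that are already known.
    public void MarkHeartbeat(string siteId, DateTimeOffset receivedAt)
    {
        // Unregistered site: drop the heartbeat; registration waits for a full report.
        if (!_sites.TryGetValue(siteId, out var state)) return;

        lock (state)
        {
            // Assumption of this sketch: the timestamp only ever moves forward.
            if (receivedAt > state.LastReportReceivedAt)
                state.LastReportReceivedAt = receivedAt;

            state.IsOnline = true; // bring the site back online if it had flipped
            // LatestReport is intentionally left untouched.
        }
    }

    public SiteState? GetSiteState(string siteId) =>
        _sites.TryGetValue(siteId, out var state) ? state : null;
}

public static class Demo
{
    public static void Main()
    {
        var aggregator = new SketchAggregator();
        var t0 = DateTimeOffset.UtcNow;

        aggregator.MarkHeartbeat("site-a", t0); // dropped: not registered yet
        Console.WriteLine(aggregator.GetSiteState("site-a") is null); // True

        aggregator.ProcessReport("site-a", "report-1", t0);
        aggregator.MarkHeartbeat("site-a", t0.AddSeconds(2));

        var state = aggregator.GetSiteState("site-a")!;
        Console.WriteLine(state.LatestReport); // report-1
        Console.WriteLine(state.LastReportReceivedAt == t0.AddSeconds(2)); // True
    }
}
```

The point of the split is that a heartbeat can extend a known site's liveness window but can never create a site entry or overwrite report data, so registration and LatestReport remain owned by ProcessReport.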
2026-05-13 08:11:43 -04:00
parent 7bba48a14a
commit f66dc031a4
5 changed files with 41 additions and 8 deletions
@@ -9,6 +9,15 @@ namespace ScadaLink.HealthMonitoring;
 public interface ICentralHealthAggregator
 {
    void ProcessReport(SiteHealthReport report);
+
+    /// <summary>
+    /// Bumps the last-seen timestamp for a site already known via a prior
+    /// SiteHealthReport. Used to keep a site marked online between full
+    /// 30s reports when ~2s heartbeats are arriving — protects against the
+    /// 60s offline threshold firing on a transiently delayed report.
+    /// </summary>
+    void MarkHeartbeat(string siteId, DateTimeOffset receivedAt);
+
    IReadOnlyDictionary<string, SiteHealthState> GetAllSiteStates();
    SiteHealthState? GetSiteState(string siteId);
 }