fix(health-monitoring): resolve HealthMonitoring-004,006,010,011,012 — heartbeat-doc accuracy, testable sequence seeding, logged failures, dead-code removal

Joseph Doherty
2026-05-16 22:14:23 -04:00
parent e57ccd78b7
commit 2d7ac5b57f
9 changed files with 260 additions and 35 deletions

@@ -210,10 +210,12 @@ public class CentralHealthAggregator : BackgroundService, ICentralHealthAggregat
var state = kvp.Value;
if (!state.IsOnline) continue;
- // Use LastHeartbeatAt — heartbeats arrive frequently from any
+ // Use LastHeartbeatAt — heartbeats arrive every ~5s from any
  // healthy site node (cadence owned by Cluster Infrastructure /
- // SiteCommunicationActor), so OfflineTimeout only fires when no
- // node can reach central, not during single-node failovers.
+ // SiteCommunicationActor — CommunicationOptions.TransportHeartbeatInterval),
+ // so the 60s OfflineTimeout tolerates several missed heartbeats and
+ // only fires when no node can reach central, not during single-node
+ // failovers.
//
// The synthetic "central" site has no heartbeat source — its only
// signal is the 30s CentralHealthReportLoop self-report — so it gets