fix: only active singleton node sends health reports

Both nodes of a site cluster were sending health reports. The standby
node (without the DeploymentManager singleton) reported 0 instances and
no connections, overwriting the active node's data in the aggregator.

Added an IsActiveNode flag to ISiteHealthCollector, set by
DeploymentManagerActor in PreStart/PostStop. HealthReportSender skips
sending while the node is not active. EnsureDclConnections is now also
called during startup batch creation so data connections survive
container restarts.
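
A minimal sketch of the gating described above, not the code in this
commit. ActiveNodeFlag, ReportSenderSketch and TrySend are hypothetical
stand-ins; only the SetActiveNode/IsActiveNode pair mirrors the diff
below. It assumes the singleton owner sets the flag to true on start
and false on stop, and that the sender checks it before each send.

```csharp
using System;

// Hypothetical stand-in for the flag portion of ISiteHealthCollector.
public sealed class ActiveNodeFlag
{
    // volatile so a write from the actor thread is visible to the sender thread.
    private volatile bool _isActiveNode;

    public void SetActiveNode(bool isActive) => _isActiveNode = isActive;
    public bool IsActiveNode => _isActiveNode;
}

// Hypothetical stand-in for HealthReportSender: forwards a report only
// while this node holds the singleton.
public sealed class ReportSenderSketch
{
    private readonly ActiveNodeFlag _flag;
    private readonly Action<string> _send;

    public ReportSenderSketch(ActiveNodeFlag flag, Action<string> send)
    {
        _flag = flag;
        _send = send;
    }

    // Returns true if the report was sent, false if it was skipped.
    public bool TrySend(string report)
    {
        if (!_flag.IsActiveNode)
        {
            return false; // standby node: do not overwrite the active node's data
        }

        _send(report);
        return true;
    }
}
```

With this shape the standby node never reaches the aggregator, so its
empty report cannot replace the active node's data.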
Joseph Doherty
2026-03-18 01:44:57 -04:00
parent 213ca2698a
commit 8095c8efbe
6 changed files with 40 additions and 3 deletions

@@ -17,6 +17,7 @@ public class SiteHealthCollector : ISiteHealthCollector
     private readonly ConcurrentDictionary<string, TagResolutionStatus> _tagResolutionCounts = new();
     private IReadOnlyDictionary<string, int> _sfBufferDepths = new Dictionary<string, int>();
     private int _deployedInstanceCount, _enabledInstanceCount, _disabledInstanceCount;
+    private volatile bool _isActiveNode;
     /// <summary>
     /// Increment the script error counter. Covers unhandled exceptions,
@@ -90,6 +91,10 @@ public class SiteHealthCollector : ISiteHealthCollector
         Interlocked.Exchange(ref _disabledInstanceCount, disabled);
     }
+    public void SetActiveNode(bool isActive) => _isActiveNode = isActive;
+    public bool IsActiveNode => _isActiveNode;
     /// <summary>
     /// Collect the current health report for the site and reset interval counters.
     /// Connection statuses and tag resolution counts are NOT reset (they reflect current state).