feat(host): readiness gates on required cluster singletons (#28, M2.14)
REQ-HOST-4a lists "required cluster singletons running (if applicable)" as a readiness criterion, but /health/ready only checked database + akka-cluster. Add a third Ready-tagged check, RequiredSingletonsHealthCheck, registered in the Central-role AddHealthChecks() chain (so it is naturally role-scoped — site nodes never run it). Probe: for each required central singleton, Ask its local ClusterSingletonProxy an Identify with a short bounded per-singleton timeout (~2s, probes run concurrently via Task.WhenAll). A non-null ActorIdentity.Subject within the timeout means the singleton is running and reachable through the proxy; a null subject or a timeout means unreachable → Unhealthy, naming the unreachable singleton(s). The check never throws (catch-all → Unhealthy) and resolves ActorSystem lazily from DI per probe (Unhealthy if Akka not yet up). Required-always set = the five singleton proxies created unconditionally in AkkaHostedService.RegisterCentralActors: notification-outbox, audit-log-ingest, site-call-audit, audit-log-purge, site-audit-reconciliation. There are no feature/config-gated central singletons today; any future gated singleton is the "if applicable" case and must NOT be added to the required set. Leadership-agnostic: the proxy reaches the singleton from either central node, so a ready standby still reports ready (readiness must not require cluster leadership — that is the Active tier's job). During a brief singleton handover the probe may time out and the node flaps to not-ready, which is correct (a node mid-handover is legitimately not fully ready); no retries, to keep the probe fast. Tests (TDD): RequiredSingletonsHealthCheckTests exercises the probe against a TestKit ActorSystem — all proxies present+reachable → Healthy; one missing → Unhealthy naming it; ActorSystem absent → Unhealthy, no throw. HealthCheckTests regression-guards the Ready tag + absence of the Active tag on the new check.
This commit is contained in:
@@ -202,6 +202,18 @@ try
|
||||
failureStatus: null,
|
||||
tags: new[] { ZbHealthTags.Ready },
|
||||
args: AkkaClusterStatusPolicy.Default)
|
||||
// M2.14 (#28): readiness ALSO reflects "required cluster singletons running"
|
||||
// (REQ-HOST-4a). Probes each central singleton's local ClusterSingletonProxy
|
||||
// with a bounded Identify and degrades to Unhealthy if any required singleton
|
||||
// is unreachable. Registered inside the Central-role branch (this is it) so the
|
||||
// check is naturally role-scoped — site nodes never run it. It resolves
|
||||
// ActorSystem from DI per probe, like the akka-cluster check above, and is
|
||||
// leadership-agnostic so a ready standby still reports ready (the proxy reaches
|
||||
// the singleton from either node).
|
||||
.AddTypeActivatedCheck<RequiredSingletonsHealthCheck>(
|
||||
"required-singletons",
|
||||
failureStatus: null,
|
||||
tags: new[] { ZbHealthTags.Ready })
|
||||
.AddTypeActivatedCheck<ActiveNodeHealthCheck>(
|
||||
"active-node",
|
||||
failureStatus: null,
|
||||
|
||||
Reference in New Issue
Block a user