fix(host): register ActorSystem as DI singleton so health-probe scopes don't dispose it (HOST-021)

Per-probe health-check child scopes were disposing the AddTransient-bridged
ActorSystem (IDisposable), terminating the live cluster node ~4s after boot.
Every singleton-proxy Ask then hung for the full 30s QueryTimeout, so the central
report pages (/notifications, /site-calls, /monitoring/health) took ~30s to load.
Bridge it as a singleton via a new lazy AkkaHostedService.GetOrCreateActorSystem()
so child-scope disposal never touches it. Verified: 0 post-startup terminates,
healthy active/standby, report pages ~0.05s, Playwright 68 passed / 0 failed.
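
Illustration (not part of the change): a minimal sketch of the container
behaviour described above, assuming Microsoft.Extensions.DependencyInjection.
FakeSystem, Owner, and Demo are hypothetical stand-ins for ActorSystem,
AkkaHostedService, and a health probe; none of them are ScadaBridge or Akka.NET
types. It resolves the disposable once through a child scope and reports whether
leaving that scope disposed it.

```csharp
using System;
using Microsoft.Extensions.DependencyInjection;

// Hypothetical stand-in for an IDisposable resource such as ActorSystem.
sealed class FakeSystem : IDisposable
{
    public bool Disposed { get; private set; }
    public void Dispose() => Disposed = true;
}

// Hypothetical owner, playing the role of AkkaHostedService.
sealed class Owner
{
    private readonly object _gate = new();
    private FakeSystem? _system;

    // Lazily creates the instance, so the first caller never observes null.
    public FakeSystem GetOrCreate()
    {
        lock (_gate) { return _system ??= new FakeSystem(); }
    }
}

static class Demo
{
    // Resolves FakeSystem inside a child scope (as a per-probe health check
    // would) and returns whether it was disposed when that scope ended.
    internal static bool DisposedAfterProbe(Action<IServiceCollection> register)
    {
        var services = new ServiceCollection();
        services.AddSingleton<Owner>();
        register(services);
        using var root = services.BuildServiceProvider();

        FakeSystem resolved;
        using (var scope = root.CreateScope())
        {
            resolved = scope.ServiceProvider.GetRequiredService<FakeSystem>();
        }
        // Read the flag while the root provider is still alive.
        return resolved.Disposed;
    }

    static void Main()
    {
        bool transient = DisposedAfterProbe(s =>
            s.AddTransient<FakeSystem>(sp => sp.GetRequiredService<Owner>().GetOrCreate()));
        bool singleton = DisposedAfterProbe(s =>
            s.AddSingleton<FakeSystem>(sp => sp.GetRequiredService<Owner>().GetOrCreate()));

        Console.WriteLine($"transient bridge disposed by child scope: {transient}");
        Console.WriteLine($"singleton bridge disposed by child scope: {singleton}");
    }
}
```

With the transient registration the child scope tracks the returned IDisposable
and disposes it on exit, even though the factory handed back an instance owned
elsewhere. With the singleton registration the instance is tracked by the root
provider and should survive the child scope.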
Joseph Doherty
2026-06-05 08:26:09 -04:00
parent 0783547a2d
commit d33617d65d
4 changed files with 328 additions and 39 deletions
+11 -6
@@ -204,12 +204,17 @@ try
builder.Services.AddSingleton<AkkaHostedService>();
builder.Services.AddHostedService(sp => sp.GetRequiredService<AkkaHostedService>());
-// The shared ZB.MOM.WW.Health Akka checks resolve ActorSystem from DI. ScadaBridge owns the
-// ActorSystem inside AkkaHostedService (not a DI singleton), so bridge it as TRANSIENT: each
-// resolve re-reads the current value — null while warming up (checks → Degraded), live after.
-// The factory must NOT throw: GetService<ActorSystem>() must return null (not raise) pre-start.
-builder.Services.AddTransient<Akka.Actor.ActorSystem>(sp =>
-sp.GetRequiredService<AkkaHostedService>().ActorSystem!);
+// HOST-021: bridge the AkkaHostedService-owned ActorSystem to DI as a SINGLETON via
+// GetOrCreateActorSystem(). The shared ZB.MOM.WW.Health Akka checks resolve ActorSystem
+// from DI, per probe, inside a child scope. ActorSystem is IDisposable, so a TRANSIENT
+// (or scoped) bridge is captured-and-disposed by each probe's scope — disposing the live
+// system mid-flight (CoordinatedShutdown/ActorSystemTerminateReason) and wedging the
+// central report pages at the 30s Ask timeout. A singleton is resolved from the root and
+// never disposed by a child scope; routing through GetOrCreateActorSystem (instead of a
+// plain singleton factory over .ActorSystem) means the first resolve CREATES the system
+// rather than caching a null if a probe wins the startup race.
+builder.Services.AddSingleton<Akka.Actor.ActorSystem>(sp =>
+sp.GetRequiredService<AkkaHostedService>().GetOrCreateActorSystem());
// InboundAPI-022: register the production IActiveNodeGate implementation so
// standby-node gating is actually enforced (the InboundApiEndpointFilter