Resolve Server-002, -004, -005, -006 code-review findings

Server-002: the gateway never terminated leftover MxGateway.Worker.exe
processes at startup, contradicting gateway.md and CLAUDE.md. Added
IRunningProcessInspector/SystemRunningProcessInspector, OrphanWorkerTerminator,
and OrphanWorkerCleanupHostedService (best-effort, runs before sessions are
accepted); updated gateway.md to describe the implemented behavior.

Server-004: API-key scopes were persisted verbatim with no validation. Added
GatewayScopes.All/IsKnown; the CLI parser and dashboard create path now
reject unknown scope strings.

Server-005: a non-SqlException/InvalidOperationException fault on the initial
Galaxy hierarchy load faulted the BackgroundService. ExecuteAsync now catches
all non-cancellation exceptions on first load and RefreshCoreAsync broadens
its catch so the cache records Stale/Unavailable instead.

Server-006: OpenSessionAsync incremented the open-sessions gauge before
alarm auto-subscribe; an auto-subscribe failure leaked the gauge. The catch
path now calls SessionRemoved() when the gauge was incremented.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
Joseph Doherty
2026-05-18 21:31:10 -04:00
parent 5e795aeeb8
commit 1d9e3afadd
18 changed files with 676 additions and 15 deletions
@@ -68,6 +68,7 @@ public sealed class SessionManager : ISessionManager
EnsureSessionCapacity();
GatewaySession? session = null;
bool sessionOpenedRecorded = false;
try
{
session = CreateSession(request, clientIdentity);
@@ -86,6 +87,7 @@ public sealed class SessionManager : ISessionManager
session.AttachWorkerClient(workerClient);
session.MarkReady();
_metrics.SessionOpened();
sessionOpenedRecorded = true;
await TryAutoSubscribeAlarmsAsync(session, cancellationToken).ConfigureAwait(false);
@@ -100,6 +102,14 @@ public sealed class SessionManager : ISessionManager
await session.DisposeAsync().ConfigureAwait(false);
}
// If SessionOpened() already incremented the open-session gauge,
// a failure after that point (e.g. auto-subscribe rejection) must
// decrement it again so mxgateway.sessions.open does not leak.
if (sessionOpenedRecorded)
{
_metrics.SessionRemoved();
}
ReleaseSessionSlot();
_metrics.Fault(SessionManagerErrorCode.OpenFailed.ToString());
_logger.LogWarning(