fix(driver-galaxy): wire event-stream faults to the reconnect supervisor (Driver.Galaxy-001)

The ReconnectSupervisor was constructed but its trigger
ReportTransportFailure was never called. When the gateway StreamEvents
stream faulted, EventPump just logged and exited — the supervisor was
never notified, so a transient gateway drop permanently stopped
data-change notifications while GetHealth() still reported Healthy.

EventPump gains an optional onStreamFault callback invoked from its
stream-fault catch block (not on clean shutdown). GalaxyDriver wires it
to ReconnectSupervisor.ReportTransportFailure so a transport drop drives
reopen → replay.
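
A minimal sketch of the hand-off described above, under stated assumptions:
SketchEventPump, SketchSupervisor and SketchStreams are hypothetical
stand-ins rather than the real EventPump and ReconnectSupervisor, and the
int event type is purely illustrative. It shows only the shape of the
change: the optional callback fires from the fault catch block and stays
silent on a cancellation-driven shutdown.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.CompilerServices;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for EventPump; only the fault hand-off is modelled.
public sealed class SketchEventPump
{
    private readonly Func<CancellationToken, IAsyncEnumerable<int>> _openStream;
    private readonly Action<Exception>? _onStreamFault;

    public SketchEventPump(
        Func<CancellationToken, IAsyncEnumerable<int>> openStream,
        Action<Exception>? onStreamFault = null)
    {
        _openStream = openStream;
        _onStreamFault = onStreamFault;
    }

    public async Task RunAsync(CancellationToken ct)
    {
        try
        {
            await foreach (var evt in _openStream(ct).WithCancellation(ct))
            {
                // A real pump would dispatch evt to its subscribers here.
            }
        }
        catch (OperationCanceledException) when (ct.IsCancellationRequested)
        {
            // Clean shutdown: the caller cancelled, so nothing is reported.
        }
        catch (Exception ex)
        {
            // Stream fault: notify the optional callback, then let the loop exit.
            _onStreamFault?.Invoke(ex);
        }
    }
}

// Hypothetical stand-in for ReconnectSupervisor; it only records what it was told.
public sealed class SketchSupervisor
{
    public int FailureCount { get; private set; }
    public Exception? LastCause { get; private set; }

    public void ReportTransportFailure(Exception cause)
    {
        FailureCount++;
        LastCause = cause;
    }
}

// Hypothetical event sources used to exercise both exit paths.
public static class SketchStreams
{
    // Yields one event, then fails the way a dropped transport might.
    public static async IAsyncEnumerable<int> Faulting(
        [EnumeratorCancellation] CancellationToken ct = default)
    {
        yield return 1;
        await Task.Yield();
        throw new InvalidOperationException("simulated gateway drop");
    }

    // Yields one event, then blocks until the token is cancelled.
    public static async IAsyncEnumerable<int> UntilCancelled(
        [EnumeratorCancellation] CancellationToken ct = default)
    {
        yield return 1;
        await Task.Delay(Timeout.Infinite, ct);
    }
}
```

With these stand-ins, the wiring would amount to passing
supervisor.ReportTransportFailure as onStreamFault when the pump is
constructed. Keeping the callback optional lets callers that have no
supervisor construct the pump unchanged.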

This is the minimal fix for Driver.Galaxy-001; the pump-restart-on-reopen
gap remains tracked as Driver.Galaxy-008. Regression tests cover the
callback being invoked on fault, the end-to-end supervisor reopen/replay,
and that a clean shutdown does not fire it. Driver.Galaxy suite: 206/206
pass.

Resolves code-review finding Driver.Galaxy-001 (Critical).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Joseph Doherty
2026-05-22 05:54:33 -04:00
parent 796871c210
commit 4df8737c86
4 changed files with 175 additions and 9 deletions

@@ -732,13 +732,43 @@ public sealed class GalaxyDriver
_eventPump = new EventPump(
_subscriber!, _subscriptions, _logger,
channelCapacity: _options.MxAccess.EventPumpChannelCapacity,
- clientName: _options.MxAccess.ClientName);
+ clientName: _options.MxAccess.ClientName,
+ onStreamFault: OnEventPumpStreamFault);
_eventPump.OnDataChange += OnPumpDataChange;
_eventPump.Start();
return _eventPump;
}
}
/// <summary>
/// Stream-fault callback for the <see cref="EventPump"/>. The gw StreamEvents
/// stream faulted (transient gateway drop, network blip, gw restart). Forward
/// the cause to the <see cref="ReconnectSupervisor"/> so it drives reopen →
/// replay; without this hand-off a transient transport drop permanently kills
/// the event stream and <c>GetHealth()</c> keeps reporting Healthy.
/// </summary>
private void OnEventPumpStreamFault(Exception cause)
{
var supervisor = _supervisor;
if (supervisor is null)
{
// No production runtime (skeleton / injected-seam path) — nothing to drive.
_logger.LogWarning(cause,
"GalaxyDriver {InstanceId} event stream faulted but no reconnect supervisor is wired.",
_driverInstanceId);
return;
}
try
{
supervisor.ReportTransportFailure(cause);
}
catch (ObjectDisposedException)
{
// Driver is being disposed — the stream fault is just shutdown noise.
}
}
// ===== IAlarmSource =====
/// <summary>