State machine that drives GalaxyDriver's recovery from gw transport failure. Healthy → TransportLost → Reopening → Replaying → Healthy. Drivers report failure signals; the supervisor runs reopen + replay with capped exponential backoff (default 500ms → 30s) until both succeed. Files: - Runtime/ReconnectSupervisor.cs — state machine with snapshot, change event, last-error tracking, and a one-attempt-at-a-time recovery loop. Idempotent ReportTransportFailure: repeated failure reports during an in-flight recovery do not spawn parallel loops. Reopen + replay are caller-supplied callbacks (the driver injects them in the wire-up PR); reopen re-Registers the gw session, replay re-establishes every active subscription via gw's ReplaySubscriptionsCommand (mxaccessgw issue gw-3) or the SubscribeBulk fallback. Dispose cancels the loop cleanly. - Public StateTransition record + IsDegraded predicate the driver maps to DriverState.Degraded for health snapshots. Wiring (GalaxyDriver subscribes the supervisor to its EventPump's transport-failure signal, exposes IsDegraded through GetHealth(), routes reopen/replay callbacks through GalaxyMxSession + SubscriptionRegistry) lands in PR 4.W to avoid conflict with the parallel host-probe track (PR 4.7) and align the wire-up with the rest of Phase 4's plumbing. 9 supervisor tests (full state-machine traversal, retry-until-success on both reopen and replay failures, idempotent failure reports, last-error propagation, Dispose mid-recovery, post-dispose throws, fast-path Healthy WaitForHealthy). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
6.6 KiB
6.6 KiB