Files
lmxopcua/tests/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Tests/Runtime
Joseph Doherty 123e3e48b9 PR 4.5 — ReconnectSupervisor
State machine that drives GalaxyDriver's recovery from gw transport
failure. Healthy → TransportLost → Reopening → Replaying → Healthy. Drivers
report failure signals; the supervisor runs reopen + replay with capped
exponential backoff (default 500ms → 30s) until both succeed.

Files:
- Runtime/ReconnectSupervisor.cs — state machine with snapshot, change
  event, last-error tracking, and a one-attempt-at-a-time recovery loop.
  Idempotent ReportTransportFailure: repeated failure reports during an
  in-flight recovery do not spawn parallel loops. Reopen + replay are
  caller-supplied callbacks (the driver injects them in the wire-up PR);
  reopen re-Registers the gw session, replay re-establishes every active
  subscription via gw's ReplaySubscriptionsCommand (mxaccessgw issue gw-3)
  or the SubscribeBulk fallback. Dispose cancels the loop cleanly.
- Public StateTransition record + IsDegraded predicate the driver maps
  to DriverState.Degraded for health snapshots.

Wiring (GalaxyDriver subscribes the supervisor to its EventPump's
transport-failure signal, exposes IsDegraded through GetHealth(), routes
reopen/replay callbacks through GalaxyMxSession + SubscriptionRegistry)
lands in PR 4.W to avoid conflict with the parallel host-probe track
(PR 4.7) and align the wire-up with the rest of Phase 4's plumbing.

9 supervisor tests (full state-machine traversal, retry-until-success on
both reopen and replay failures, idempotent failure reports, last-error
propagation, Dispose mid-recovery, post-dispose throws, fast-path Healthy
WaitForHealthy).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 15:39:21 -04:00
..