Files
mxaccess/rust/crates
Joseph Doherty ff4ea4d5a9
rust / build / test / clippy / fmt (push) Has been cancelled
[F16] mxaccess: real Session::recover_connection (re-bind + re-advise)
Closes F16. Replaces the wave-2 no-op recover_connection with the
full .NET-equivalent shape (MxNativeSession.cs:399-474). Three
pieces:

1. Subscription registry on SessionInner.
   New subscriptions: Mutex<HashMap<[u8; 16], SubscriptionEntry>>
   tracks every active advise. subscribe() inserts after a successful
   AdviseSupervisory; unsubscribe() removes on the success path only
   (failed UnAdvises stay registered so next recovery replays them).
   The consumer's Subscription handle still holds the BroadcastStream;
   the registry is purely for AdviseSupervisory replay.

2. Pluggable RebuildFactory.
   New public typedef:
     pub type RebuildFactory = Arc<
         dyn Fn() -> Pin<Box<dyn Future<Output = Result<NmxClient,
                                                        NmxClientError>>
                            + Send>>
             + Send + Sync,
     >;
   Installed via Session::set_recovery_factory(factory);
   queryable via has_recovery_factory(). Kept separate from
   connect_nmx / connect_nmx_auto so existing constructors stay
   non-breaking — consumers opt in by calling the setter
   after-the-fact.

3. Real recover_connection + recover_connection_core.
   recover_connection is the retry loop (mirrors cs:399-440): for
   attempt in 1..=policy.max_attempts, emit RecoveryEvent::Started
   → call recover_connection_core → on Ok emit Recovered + return,
   on Err emit Failed{will_retry, error}, sleep policy.delay, retry,
   or bubble the last error.

   recover_connection_core mirrors cs:442-474: rebuild NMX via the
   factory → RegisterEngine2 with the saved callback_obj_ref → optional
   SetHeartbeatSendInterval → snapshot the registry under the lock,
   replay AdviseSupervisory(correlation_id) for each entry → atomically
   swap *nmx_lock = replacement. Old NmxClient drops at end of scope,
   closing its TCP transport.

Subscription correlation ids are preserved across the swap so the
consumer's Subscription stream continues to receive on its existing
broadcast filter. The CallbackExporter stays bound across recoveries
— no TCP listener re-bind.

R15's "long-lived connection task" was listed as a hard prereq, but
the existing Mutex<NmxClient> already serialises concurrent ops
during the rebuild — recover_connection_core holds the inner mutex
during the swap, concurrent ops just wait. Functionally equivalent
to the long-lived-task design.

New ConfigError::RecoveryNotConfigured returned when
recover_connection is called without a factory installed. New
public re-export: RebuildFactory.

Tests (mxaccess 65 → 67):
  - recover_connection_without_factory_returns_recovery_not_configured
  - recover_connection_with_always_failing_factory_exhausts_attempts
    (pins (Started, Failed)×3 + final will_retry=false + bubbled
    TransportFailure)
  - subscribe_populates_registry_unsubscribe_clears_it
  - recovery_events_supports_multiple_subscribers (updated for the
    new factory-required path)

connect_nmx_auto-side auto-population of the factory (capturing the
ntlm_factory + discovered (addr, service_ipid) so consumers don't
re-author the closure) is a future polish — not required to close
F16.

design/followups.md: F16 moved to Resolved.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 01:57:43 -04:00
..