Files
mxaccess/design
Joseph Doherty ff4ea4d5a9
rust / build / test / clippy / fmt (push) Has been cancelled
[F16] mxaccess: real Session::recover_connection (re-bind + re-advise)
Closes F16. Replaces the wave-2 no-op recover_connection with the
full .NET-equivalent shape (MxNativeSession.cs:399-474). Three
pieces:

1. Subscription registry on SessionInner.
   New subscriptions: Mutex<HashMap<[u8; 16], SubscriptionEntry>>
   tracks every active advise. subscribe() inserts after a successful
   AdviseSupervisory; unsubscribe() removes on the success path only
   (failed UnAdvises stay registered so next recovery replays them).
   The consumer's Subscription handle still holds the BroadcastStream;
   the registry is purely for AdviseSupervisory replay.

2. Pluggable RebuildFactory.
   New public typedef:
     pub type RebuildFactory = Arc<
         dyn Fn() -> Pin<Box<dyn Future<Output = Result<NmxClient,
                                                        NmxClientError>>
                            + Send>>
             + Send + Sync,
     >;
   Installed via Session::set_recovery_factory(factory);
   queryable via has_recovery_factory(). Kept separate from
   connect_nmx / connect_nmx_auto so existing constructors stay
   non-breaking — consumers opt in by calling the setter
   after-the-fact.

3. Real recover_connection + recover_connection_core.
   recover_connection is the retry loop (mirrors cs:399-440): for
   attempt in 1..=policy.max_attempts, emit RecoveryEvent::Started
   → call recover_connection_core → on Ok emit Recovered + return,
   on Err emit Failed{will_retry, error}, sleep policy.delay, retry,
   or bubble the last error.

   recover_connection_core mirrors cs:442-474: rebuild NMX via the
   factory → RegisterEngine2 with the saved callback_obj_ref → optional
   SetHeartbeatSendInterval → snapshot the registry under the lock,
   replay AdviseSupervisory(correlation_id) for each entry → atomically
   swap *nmx_lock = replacement. Old NmxClient drops at end of scope,
   closing its TCP transport.

Subscription correlation ids are preserved across the swap so the
consumer's Subscription stream continues to receive on its existing
broadcast filter. The CallbackExporter stays bound across recoveries
— no TCP listener re-bind.

R15's "long-lived connection task" was listed as a hard prereq, but
the existing Mutex<NmxClient> already serialises concurrent ops
during the rebuild — recover_connection_core holds the inner mutex
during the swap, concurrent ops just wait. Functionally equivalent
to the long-lived-task design.

New ConfigError::RecoveryNotConfigured returned when
recover_connection is called without a factory installed. New
public re-export: RebuildFactory.

Tests (mxaccess 65 → 67):
  - recover_connection_without_factory_returns_recovery_not_configured
  - recover_connection_with_always_failing_factory_exhausts_attempts
    (pins (Started, Failed)×3 + final will_retry=false + bubbled
    TransportFailure)
  - subscribe_populates_registry_unsubscribe_clears_it
  - recovery_events_supports_multiple_subscribers (updated for the
    new factory-required path)

connect_nmx_auto-side auto-population of the factory (capturing the
ntlm_factory + discovered (addr, service_ipid) so consumers don't
re-author the closure) is a future polish — not required to close
F16.

design/followups.md: F16 moved to Resolved.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 01:57:43 -04:00
..

design/ — Rust port architectural plan

This folder is the design contract for the Rust replacement of AVEVA/Wonderware MXAccess. It is the gap between the .NET reference in src/ and the Rust crates that will be written under a sibling rust/ workspace (per CLAUDE.md).

The folder is structured as a small set of focused documents. Read in order; each builds on the previous.

File Purpose
00-overview.md Mission, two-layer goal, architectural principles, non-goals
10-raw-layer.md Byte-accurate raw MXAccess layer (codec + transport + session)
20-async-layer.md Idiomatic Tokio async layer on top of the raw layer
30-crate-topology.md Cargo workspace, crates, dependencies, build/test commands
40-protocol-invariants.md Bill of materials: IIDs, opnums, envelope/handle bytes
50-error-model.md MxStatus, error types, panic/cancellation policy
60-roadmap.md Milestones M0..M6, validation strategy
70-risks-and-open-questions.md Parity gaps, unproven flows, cross-platform constraints
dependencies.md Cross- and within-milestone parallelism map; agent budget per phase
review.md Adversarial review log (BLOCKER/MAJOR/MINOR/NIT findings, all resolved)
prompt.md /loop driver prompt for autonomous M2M6 execution
followups.md Open / resolved deferred work items; auto-triaged by prompt.md Step 0 (created on first /loop run if missing)

The design is grounded in the .NET reference at src/ and the protocol artifacts in docs/, analysis/, and captures/. Do not introduce protocol behavior in these documents that is not already proven in the reference. When adding a new claim about wire format, cite either:

  • a .cs file path in src/MxNativeCodec/, src/MxNativeClient/, or src/MxAsbClient/, or
  • a docs/*.md spec file, or
  • a captures/0NN-frida-* directory or analysis/frida/*.tsv row.

This folder is documentation, not code. When the Rust workspace is created, the design here is the contract it must satisfy. When evidence in captures/ invalidates a design decision here, update the design first, then the code.

Reading order

  • New contributor: 00 → 30 → 10 → 40 → 20 → 50 → 60 → 70.
  • Protocol question: 40 first, then the relevant section of 10.
  • API question: 20 first, then 50.
  • Planning a milestone: 60 first, cross-reference 70 for blockers.
  • Scheduling concurrent work: dependencies.md for the per-phase parallelism map.
  • Driving M2M6 autonomously via /loop: prompt.md (and the followups.md triage log it maintains).