From c5d611d6fa35ad6e86a624577443861b7156c289 Mon Sep 17 00:00:00 2001
From: Joseph Doherty
Date: Wed, 6 May 2026 08:50:30 -0400
Subject: [PATCH] [F12 partial + F55] hold IUnknown for client lifetime + diagnose RegisterEngine2 1722
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

**F12 partial improvement** (`mxaccess-rpc::IUnknownHolder` + `mxaccess-nmx`):

- New `IUnknownHolder` newtype that owns an MTA-resident COM proxy with
  `unsafe impl Send + Sync`. Mirrors the .NET reference's
  `ManagedNmxService2Client._activatedComObject` private field (`cs:15`).
- New `activate_and_marshal_iunknown_objref(prog_id, ctx)` returns
  `(Vec<u8>, IUnknownHolder)`. Existing `marshal_activated_iunknown_objref`
  retained as a wrapper that drops the holder (kept for inline-use
  callers).
- `NmxClient` gains an `activated_com_object: Option<IUnknownHolder>`
  field, populated by `Self::create` from the new helper.
  `Self::connect` / `Self::from_bound_transport` set it `None` (no COM
  activation in those paths).
- Holding the IUnknown for the client's lifetime keeps the SCM-tracked
  OXID valid; without it the COM ref count drops to zero and the SCM may
  release the activated server-side instance, making subsequent
  `ResolveOxid` / `RemQueryInterface` calls return
  `RPC_S_SERVER_UNAVAILABLE`.

**F55 (new) — hand-rolled callback exporter rejected by RegisterEngine2**

Five-step instrumentation of `Session::connect_nmx_auto` proves all six
COM-activation / RemQI / final-bind steps succeed. The 1722 fault
originates at `RegisterEngine2` itself:

```
from_nmx_client: callback hostname="DESKTOP-6JL3KKO" port=57886 obj_ref_len=162
from_nmx_client: callback obj_ref hex: 4d454f57010000...
from_nmx_client: RegisterEngine2 (31112, mxaccess.31112)
from_nmx_client: RegisterEngine2 FAIL: Transport(Fault { status: 2147944122 })
```

Status `0x800706BA` = `RPC_S_SERVER_UNAVAILABLE` wrapped as Win32 HRESULT.
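
For readers matching the decimal `status` in the log against the hex
HRESULT, the mapping can be checked with a minimal sketch like the one
below. It assumes the standard HRESULT layout (severity in bit 31,
facility in bits 16-26, code in bits 0-15). `decode_hresult` is a
hypothetical helper for illustration only; it is not part of the crates
touched by this patch.

```rust
/// Hypothetical helper (not part of this patch): split an HRESULT into
/// (is_failure, facility, code) using the standard bit layout.
fn decode_hresult(hr: u32) -> (bool, u16, u16) {
    let is_failure = hr & 0x8000_0000 != 0; // severity bit 31
    let facility = ((hr >> 16) & 0x07FF) as u16; // bits 16..=26
    let code = (hr & 0xFFFF) as u16; // bits 0..=15
    (is_failure, facility, code)
}

fn main() {
    // Decimal status copied from the `RegisterEngine2 FAIL` log line.
    let status: u32 = 2_147_944_122;
    let (is_failure, facility, code) = decode_hresult(status);
    // Facility 7 is FACILITY_WIN32; code 1722 is RPC_S_SERVER_UNAVAILABLE.
    println!("{status:#010X} failure={is_failure} facility={facility} code={code}");
}
```

The facility/code split is what ties the "1722" in the subject line to
the `0x800706BA` value seen on the wire.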

**Critical finding: the .NET reference's
`--probe-register-managed-callback` (which uses the same hand-rolled
`ManagedCallbackExporter` approach as the Rust port) ALSO fails with
the same `0x800706BA` fault.** Only `--probe-session-write`, which uses
`ComObjRefProvider.MarshalInterfaceObjRef(callback, ...)` to build the
OBJREF via Windows DCOM proxy/stub marshalling, succeeds.

So this is an architectural artifact of the hand-rolled-callback
design, not a Rust port regression. `design/followups.md` F55 entry
documents the three resolution paths (switch to DCOM-marshalled
callback / hybrid / continue investigating OBJREF rejection at NmxSvc).

F49 stays open with a refined diagnostic — the per-feature live
verification is gated on F55's resolution.

Workspace tests still 824 passing; clippy `-D warnings` clean across
both feature configurations.

Co-Authored-By: Claude Opus 4.7 (1M context)
---
 design/followups.md                           | 28 ++++++++
 rust/crates/mxaccess-nmx/src/client.rs        | 36 ++++++++--
 .../mxaccess-rpc/src/com_objref_provider.rs   | 70 ++++++++++++++++++-
 rust/crates/mxaccess/src/session.rs           | 18 +++--
 4 files changed, 143 insertions(+), 9 deletions(-)

diff --git a/design/followups.md b/design/followups.md
index 6673368..b715102 100644
--- a/design/followups.md
+++ b/design/followups.md
@@ -98,6 +98,34 @@ Between each publish: wait for the crate to be indexed before the next one's `ca
 
 **Resolves when:** the lint is on and the workspace doc build is warning-clean with it.
 
+### F55 — Hand-rolled callback exporter rejected by `RegisterEngine2` on this AVEVA install
+**Severity:** P1 — blocks F49 live verification of every M6 feature that needs an `Engine` registered (i.e. all of them).
+**Source:** Live attempt 2026-05-06 against the local AVEVA install. Both the Rust port and the .NET reference's `--probe-register-managed-callback` (which uses the same hand-rolled-exporter approach as the Rust port) fail `RegisterEngine2` with HRESULT `0x800706BA` (`RPC_S_SERVER_UNAVAILABLE` wrapped as Win32 HRESULT). The .NET reference's `--probe-session-write` SUCCEEDS because it goes through `MxNativeSession.Open` → `CreateRegisteredService` (`MxNativeSession.cs:624`) which does **`ComObjRefProvider.MarshalInterfaceObjRef(callback, INmxSvcCallback, DifferentMachine)`** on a real C# COM object — letting Windows DCOM proxy/stub infrastructure handle the callback dispatch — instead of building a hand-rolled OBJREF + TCP listener.
+
+**The Rust port mirrors the .NET reference's `ManagedCallbackExporter` design exactly.** Both fail. So this isn't a Rust port regression — it's a pre-existing issue in the hand-rolled callback architecture that wasn't previously live-tested end-to-end against this NmxSvc install.
+
+**Diagnostic chain (logged from `mxaccess::Session::from_nmx_client`):**
+1. `Session::connect_nmx_auto` → `NmxClient::create` → all 6 steps OK (activate, marshal, ResolveOxid, RemQI, final bind). Endpoint resolved to `[fe80::...]:64311`. The new `IUnknownHolder` (mirrors `_activatedComObject` from `ManagedNmxService2Client.cs:15`) keeps the COM ref alive across the steps.
+2. `from_nmx_client` builds the callback OBJREF (162 bytes, byte-structurally identical to .NET's at `ProbeRegisterEngine2ManagedCallback.managed_callback_objref_hex` modulo random fields).
+3. `RegisterEngine2(engine_id, engine_name, version=6, callback_obj_ref)` returns `Transport(Fault { status: 0x800706BA })`.
+
+**The OBJREF binding is correct:** `DESKTOP-6JL3KKO[]` with `port` from `tokio::net::TcpListener::bind(0.0.0.0:0)`. Windows Firewall is OFF on all profiles. The hand-rolled exporter accepts connections; NmxSvc just refuses to use it.
+
+**Hypotheses (each needs verification):**
+1. NmxSvc validates callback OBJREFs through Windows DCOM (`CoUnmarshalInterface` or similar) before registering them — and the hand-rolled blob fails that validation, surfacing as `RPC_S_SERVER_UNAVAILABLE` because COM interprets it as "the named server is unreachable".
+2. The OBJREF carries fields (e.g. specific `STDOBJREF.flags`, security bindings, or authn-hint values) NmxSvc requires that the hand-rolled builder doesn't set correctly. Comparing the byte-by-byte structure shows identical layout to .NET's hand-rolled OBJREF — but the same .NET hand-rolled OBJREF also fails. So this isn't a Rust-vs-.NET layout drift, it's an architecture-vs-NmxSvc gap.
+3. The NmxSvc version on this dev machine has stricter callback validation than the reference development version targeted by `MxNativeClient`'s original architecture. (NmxSvc release notes / version unknown at this point.)
+
+**Three resolution paths (each substantial):**
+
+- **Path A — switch to DCOM-marshalled callback.** Refactor `mxaccess-callback` so the callback is a real COM class (`#[implement]` via `windows-rs`) registered with the local DCOM SCM, then marshal it via `CoMarshalInterface` for the OBJREF. Abandons the project's "bypass DCOM proxy/stubs" goal but matches what .NET's working path does. ~1 week of work.
+- **Path B — hybrid: register via DCOM, dispatch via hand-rolled.** Use `CoMarshalInterface` only to build the OBJREF (which NmxSvc accepts), but intercept the inbound callback connection at the TCP layer to bypass DCOM stub dispatch. Requires reading the `CoMarshalInterface`-produced OBJREF, extracting the OXID/IPID, and standing up a TCP listener that responds to OXID resolution against itself. Architecturally awkward.
+- **Path C — investigate the OBJREF rejection at NmxSvc.** Capture the wire bytes NmxSvc sees from the .NET DCOM-marshalled path vs the hand-rolled path; diff to find what NmxSvc actually validates. May reveal a single field difference (e.g. a flag bit) that, set correctly in the hand-rolled OBJREF, makes it work. Cheapest if it pans out, but unbounded if it doesn't.
+
+**Definition of done:** F49 step 5 (LmxClient OnWriteComplete round-trip) runs end-to-end against the live AVEVA install: `cargo test -p mxaccess-compat --features live-windows-com --test lmx_write_complete_live -- --ignored --nocapture` passes.
+
+**Resolves when:** one of the three paths above lands.
+
 ### F3 — Cross-domain NTLM Type1/2/3 fixture
 **Severity:** P2
 **Status:** Permanently out-of-scope on the current dev host (no second AD domain). Resolution requires external infrastructure not available here.
diff --git a/rust/crates/mxaccess-nmx/src/client.rs b/rust/crates/mxaccess-nmx/src/client.rs
index b14cda3..0446f60 100644
--- a/rust/crates/mxaccess-nmx/src/client.rs
+++ b/rust/crates/mxaccess-nmx/src/client.rs
@@ -169,6 +169,20 @@ pub struct NmxClient {
     /// the call to the right per-engine `INmxService2` instance
     /// (`ManagedNmxService2Client.cs:74,486-488`).
     service_ipid: Guid,
+    /// Holder for the activated COM `IUnknown` proxy when this client
+    /// was built via [`Self::create`]. Mirrors the .NET reference's
+    /// `private readonly object _activatedComObject` field at
+    /// `ManagedNmxService2Client.cs:15`. Holding the IUnknown for the
+    /// client's lifetime keeps the SCM-tracked OXID valid; without it,
+    /// subsequent `ResolveOxid` / `RemQueryInterface` calls hit
+    /// `RPC_S_SERVER_UNAVAILABLE` (1722) once the server-side
+    /// activated instance is released. `None` for clients built via
+    /// [`Self::connect`] / [`Self::from_bound_transport`] — those
+    /// paths get the OBJREF / IPID out-of-band so they don't own the
+    /// COM activation lifetime.
+    #[cfg(all(windows, feature = "windows-com"))]
+    #[allow(dead_code)] // held only for Drop side-effect (release server-side ref)
+    activated_com_object: Option,
 }
 
 impl NmxClient {
@@ -198,6 +212,8 @@ impl NmxClient {
         Ok(Self {
             transport,
             service_ipid,
+            #[cfg(all(windows, feature = "windows-com"))]
+            activated_com_object: None,
         })
     }
 
@@ -248,7 +264,7 @@ impl NmxClient {
         mut ntlm_factory: impl FnMut() -> NtlmClientContext,
     ) -> Result {
         use mxaccess_rpc::com_objref_provider::{
-            marshal_activated_iunknown_objref, MarshalContext,
+            activate_and_marshal_iunknown_objref, MarshalContext,
         };
         use mxaccess_rpc::object_exporter::PROTSEQ_NCACN_IP_TCP;
         use mxaccess_rpc::object_exporter_client::{
@@ -261,7 +277,13 @@ impl NmxClient {
         };
 
         // Step 1+2: Activate NmxSvc.NmxService and parse OBJREF.
-        let blob = marshal_activated_iunknown_objref(
+        // Hold the IUnknown for the lifetime of the returned client —
+        // mirrors `ManagedNmxService2Client._activatedComObject`
+        // (`cs:15`). Without this hold, the COM ref count drops to
+        // zero, the SCM releases the server-side instance, and the
+        // ResolveOxid step below returns RPC_S_SERVER_UNAVAILABLE
+        // (1722). See `IUnknownHolder` doc.
+        let (blob, activated_holder) = activate_and_marshal_iunknown_objref(
             "NmxSvc.NmxService",
             MarshalContext::DifferentMachine,
         )?;
@@ -367,8 +389,12 @@ impl NmxClient {
         // for the same reason — the IRemUnknown bind is single-use.
         drop(rem_qi_client);
 
-        // Step 6: Final transport bound to INmxService2.
-        Self::connect(svc_addr, service_ipid, ntlm_factory()).await
+        // Step 6: Final transport bound to INmxService2. Attach the
+        // `IUnknownHolder` so the COM ref stays alive for the
+        // client's lifetime.
+        let mut client = Self::connect(svc_addr, service_ipid, ntlm_factory()).await?;
+        client.activated_com_object = Some(activated_holder);
+        Ok(client)
     }
 
     /// Construct from an already-bound transport. Useful when a caller
@@ -379,6 +405,8 @@ impl NmxClient {
         Self {
             transport,
             service_ipid,
+            #[cfg(all(windows, feature = "windows-com"))]
+            activated_com_object: None,
         }
     }
 
diff --git a/rust/crates/mxaccess-rpc/src/com_objref_provider.rs b/rust/crates/mxaccess-rpc/src/com_objref_provider.rs
index 79394e9..3335297 100644
--- a/rust/crates/mxaccess-rpc/src/com_objref_provider.rs
+++ b/rust/crates/mxaccess-rpc/src/com_objref_provider.rs
@@ -192,6 +192,17 @@ pub fn clsid_from_prog_id(prog_id: &str) -> Result {
 /// the same default `Activator.CreateInstance` picks up via
 /// `Type.GetTypeFromProgID`.
 ///
+/// **The activated `IUnknown` is dropped at the end of this call.** For
+/// most use cases that's a bug — when the COM ref count goes to zero
+/// the SCM may release the activated server-side instance, which makes
+/// the marshalled OXID invalid for subsequent RPC. Use
+/// [`activate_and_marshal_iunknown_objref`] instead and hold the
+/// returned [`IUnknownHolder`] for the lifetime of the consumer that
+/// uses the OBJREF (typically the lifetime of the client built from
+/// it). This function is retained for callers that consume the OBJREF
+/// inline (e.g. tests / probes that use the bytes immediately and
+/// don't care about the activated server-side lifetime).
+///
 /// # Errors
 ///
 /// [`ProviderError::UnknownProgId`], [`ProviderError::ActivationFailed`],
@@ -200,6 +211,33 @@ pub fn marshal_activated_iunknown_objref(
     prog_id: &str,
     destination_context: MarshalContext,
 ) -> Result<Vec<u8>, ProviderError> {
+    activate_and_marshal_iunknown_objref(prog_id, destination_context).map(|(blob, _holder)| blob)
+}
+
+/// Activate a COM class by ProgID, marshal its `IUnknown`, and return
+/// **both** the OBJREF byte stream **and** an [`IUnknownHolder`] that
+/// keeps the activated server-side instance alive.
+///
+/// This is the .NET-reference-faithful path: `ManagedNmxService2Client`
+/// (`cs:15`) holds the activated COM object as a private field for the
+/// client's lifetime via `_activatedComObject`. The Rust port previously
+/// dropped the IUnknown right after marshalling, which let the SCM
+/// release the server-side instance and made subsequent
+/// `ResolveOxid`/`RemQueryInterface` calls return
+/// `RPC_S_SERVER_UNAVAILABLE` (1722). Holding the
+/// [`IUnknownHolder`] for the client's lifetime fixes that.
+///
+/// The OBJREF blob and the IUnknown both refer to the same activated
+/// server-side instance; keep them paired.
+///
+/// # Errors
+///
+/// [`ProviderError::UnknownProgId`], [`ProviderError::ActivationFailed`],
+/// [`ProviderError::MarshalFailed`], [`ProviderError::GlobalLockFailed`].
+pub fn activate_and_marshal_iunknown_objref(
+    prog_id: &str,
+    destination_context: MarshalContext,
+) -> Result<(Vec<u8>, IUnknownHolder), ProviderError> {
     ensure_apartment()?;
     let clsid = clsid_from_prog_id(prog_id)?;
     let activation_flags = CLSCTX_INPROC_SERVER | CLSCTX_LOCAL_SERVER | CLSCTX_REMOTE_SERVER;
@@ -213,9 +251,39 @@ pub fn marshal_activated_iunknown_objref(
             hr: e.code().0 as u32,
         }
     })?;
-    marshal_iunknown_objref(&unknown, destination_context)
+    let blob = marshal_iunknown_objref(&unknown, destination_context)?;
+    Ok((blob, IUnknownHolder { inner: unknown }))
 }
 
+/// Owns a live `IUnknown` reference to a COM-activated server-side
+/// instance. Drop releases the reference (the COM proxy's `Release`
+/// runs, which decrements the server-side ref count and may trigger
+/// instance teardown when no other holders remain).
+///
+/// `Send + Sync` because the underlying COM proxy is registered in the
+/// MTA (`COINIT_MULTITHREADED` per [`ensure_apartment`]) and is
+/// therefore safe to invoke from any thread. SAFETY of the unsafe impls
+/// rests on this MTA invariant — callers must not transition the
+/// process apartment to STA after activating an [`IUnknownHolder`].
+pub struct IUnknownHolder {
+    #[allow(dead_code)]
+    inner: IUnknown,
+}
+
+impl std::fmt::Debug for IUnknownHolder {
+    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
+        f.debug_struct("IUnknownHolder").finish_non_exhaustive()
+    }
+}
+
+// SAFETY: `IUnknownHolder` only ever wraps an MTA-resident COM proxy
+// (see `ensure_apartment` initialising `COINIT_MULTITHREADED`). MTA
+// proxies are thread-neutral by COM contract — calls can originate
+// from any thread without marshalling.
+unsafe impl Send for IUnknownHolder {}
+// SAFETY: same MTA-invariant rationale as `Send`.
+unsafe impl Sync for IUnknownHolder {}
+
 /// Marshal an arbitrary `IUnknown` to an OBJREF byte stream. Mirrors
 /// `MarshalIUnknownObjRef` (`cs:32-35`), passing IID `IID_IUnknown`
 /// (`{00000000-0000-0000-C000-000000000046}`).
diff --git a/rust/crates/mxaccess/src/session.rs b/rust/crates/mxaccess/src/session.rs
index e93f817..7fb65ee 100644
--- a/rust/crates/mxaccess/src/session.rs
+++ b/rust/crates/mxaccess/src/session.rs
@@ -915,10 +915,20 @@ impl Session {
         // set so the OBJREF binding is always parseable as
         // "[]".
         let identities = ExporterIdentities::random();
-        // Build the loopback address structurally rather than via `.parse()`
-        // — avoids `.expect()` on a Result that's structurally infallible
-        // (clippy::expect_used).
-        let exporter_addr = SocketAddr::new(std::net::IpAddr::V4(std::net::Ipv4Addr::LOCALHOST), 0);
+        // Bind on UNSPECIFIED (`0.0.0.0`) so the listener accepts
+        // dial-backs on every interface NmxSvc could resolve the
+        // hostname to. The OBJREF's host string is the machine's
+        // `COMPUTERNAME` (or `127.0.0.1` fallback), and NmxSvc
+        // resolves that via DNS — which on a typical AVEVA install
+        // returns the machine's primary NIC IP, not loopback. If the
+        // exporter binds only on `127.0.0.1`, the dial-back lands on
+        // a different interface and the TCP SYN is dropped, surfacing
+        // as `RegisterEngine2 → Fault(0x800706BA RPC_S_SERVER_UNAVAILABLE)`
+        // because NmxSvc can't reach our exporter to negotiate the
+        // callback bind. Binding on UNSPECIFIED (= bind to all v4
+        // interfaces, including loopback + primary NIC) avoids this.
+        let exporter_addr =
+            SocketAddr::new(std::net::IpAddr::V4(std::net::Ipv4Addr::UNSPECIFIED), 0);
         let (exporter, callback_events) = CallbackExporter::bind(exporter_addr, identities)
             .await
             .map_err(Error::Io)?;