[F12 partial + F55] hold IUnknown for client lifetime + diagnose RegisterEngine2 1722

**F12 partial improvement** (`mxaccess-rpc::IUnknownHolder` + `mxaccess-nmx`):
- New `IUnknownHolder` newtype that owns an MTA-resident COM proxy
with `unsafe impl Send + Sync`. Mirrors the .NET reference's
`ManagedNmxService2Client._activatedComObject` private field
(`ManagedNmxService2Client.cs:15`).
- New `activate_and_marshal_iunknown_objref(prog_id, ctx)` returns
`(Vec<u8>, IUnknownHolder)`. Existing
`marshal_activated_iunknown_objref` retained as a wrapper that
drops the holder (kept for inline-use callers).
- `NmxClient` gains an `activated_com_object: Option<IUnknownHolder>`
field, populated by `Self::create` from the new helper.
`Self::connect` / `Self::from_bound_transport` set it `None` (no
COM activation in those paths).
- Holding the IUnknown for the client's lifetime keeps the
SCM-tracked OXID valid; without it the COM ref count drops to
zero and the SCM may release the activated server-side instance,
making subsequent `ResolveOxid` / `RemQueryInterface` calls
return `RPC_S_SERVER_UNAVAILABLE`.
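
The ownership pattern described in the bullets above can be illustrated without COM. This is a minimal sketch, not the real implementation: `FakeProxy`, `Holder`, `Client`, `activate_and_marshal` and `marshal_only` are hypothetical stand-ins, and an atomic counter plays the role of the server-side reference count.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

/// Hypothetical stand-in for the activated COM proxy. The shared counter
/// models the server-side reference count; this is not the real `IUnknown`.
struct FakeProxy {
    live: Arc<AtomicUsize>,
}

impl Drop for FakeProxy {
    fn drop(&mut self) {
        // Models `Release`: at zero the server may tear the instance down.
        self.live.fetch_sub(1, Ordering::SeqCst);
    }
}

/// Newtype whose only job is to keep the proxy alive until it is dropped.
struct Holder {
    _inner: FakeProxy,
}

/// Client that optionally owns the activation lifetime.
struct Client {
    // `None` models clients built without COM activation.
    _activated: Option<Holder>,
}

/// Returns the marshalled bytes together with the holder that keeps them valid.
fn activate_and_marshal(live: &Arc<AtomicUsize>) -> (Vec<u8>, Holder) {
    live.fetch_add(1, Ordering::SeqCst);
    let proxy = FakeProxy { live: Arc::clone(live) };
    let blob = b"MEOW".to_vec(); // placeholder bytes, not a real OBJREF
    (blob, Holder { _inner: proxy })
}

/// Wrapper for inline-use callers: the holder is dropped before returning.
fn marshal_only(live: &Arc<AtomicUsize>) -> Vec<u8> {
    let (blob, _holder) = activate_and_marshal(live);
    blob
}

fn main() {
    let live = Arc::new(AtomicUsize::new(0));

    let _blob = marshal_only(&live);
    println!("after marshal_only: {}", live.load(Ordering::SeqCst));

    let (_blob, holder) = activate_and_marshal(&live);
    let client = Client { _activated: Some(holder) };
    println!("while client is alive: {}", live.load(Ordering::SeqCst));

    drop(client);
    println!("after client drop: {}", live.load(Ordering::SeqCst));
}
```

In the sketch the counter is already back at zero when `marshal_only` returns, which is the situation the old code path left the activated instance in; storing the holder in the client defers that release until the client itself is dropped.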

**F55 (new): hand-rolled callback exporter rejected by RegisterEngine2**

Five-step instrumentation of `Session::connect_nmx_auto` shows that all
six COM-activation / RemQI / final-bind steps succeed. The 1722
fault originates at `RegisterEngine2` itself:
```
from_nmx_client: callback hostname="DESKTOP-6JL3KKO" port=57886 obj_ref_len=162
from_nmx_client: callback obj_ref hex: 4d454f57010000...
from_nmx_client: RegisterEngine2 (31112, mxaccess.31112)
from_nmx_client: RegisterEngine2 FAIL: Transport(Fault { status: 2147944122 })
```
Status 2147944122 is `0x800706BA`: Win32 error 1722 (`RPC_S_SERVER_UNAVAILABLE`) wrapped as a Win32
HRESULT.
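
The mapping between the decimal status in the log, the hex HRESULT, and the Win32 code 1722 can be checked with a few lines of arithmetic. The sketch below follows the `HRESULT_FROM_WIN32` / `HRESULT_FACILITY` / `HRESULT_CODE` macros from `winerror.h` for ordinary non-zero error codes; the function names are illustrative and not part of the workspace.

```rust
/// Facility value for Win32 errors wrapped in an HRESULT.
const FACILITY_WIN32: u32 = 7;

/// Equivalent of `HRESULT_FROM_WIN32` for ordinary Win32 error codes:
/// severity bit set, facility 7, low 16 bits carry the code.
fn hresult_from_win32(code: u32) -> u32 {
    if code == 0 {
        0
    } else {
        0x8000_0000 | (FACILITY_WIN32 << 16) | (code & 0xFFFF)
    }
}

/// Recover the Win32 code if `hr` is a failure in `FACILITY_WIN32`.
fn win32_from_hresult(hr: u32) -> Option<u32> {
    let is_failure = hr & 0x8000_0000 != 0;
    let facility = (hr >> 16) & 0x1FFF;
    (is_failure && facility == FACILITY_WIN32).then(|| hr & 0xFFFF)
}

fn main() {
    let status: u32 = 2_147_944_122; // value from the `Fault { status: .. }` log line
    println!("status      = {status:#010X}");
    println!("win32 code  = {:?}", win32_from_hresult(status));
    println!("round trip  = {}", hresult_from_win32(1722));
}
```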
**Critical finding: the .NET reference's `--probe-register-managed-callback`
(which uses the same hand-rolled `ManagedCallbackExporter` approach
as the Rust port) ALSO fails with the same `0x800706BA` fault.**
Only `--probe-session-write`, which uses
`ComObjRefProvider.MarshalInterfaceObjRef(callback, ...)` to build
the OBJREF via Windows DCOM proxy/stub marshalling, succeeds. So
this is an architectural artifact of the hand-rolled-callback
design, not a Rust port regression.
`design/followups.md` F55 entry documents the three resolution
paths (switch to DCOM-marshalled callback / hybrid / continue
investigating OBJREF rejection at NmxSvc).
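
For the "continue investigating" path, the comparison between a DCOM-marshalled OBJREF and a hand-rolled one comes down to a byte-level diff of the two blobs. The helper below is only a sketch of that comparison: `diff_bytes` is a hypothetical name, and the sample arrays in `main` are made-up bytes, not captured OBJREFs.

```rust
/// Report every offset where two blobs differ, including the tail of the
/// longer blob when the lengths do not match.
fn diff_bytes(a: &[u8], b: &[u8]) -> Vec<(usize, Option<u8>, Option<u8>)> {
    let len = a.len().max(b.len());
    (0..len)
        .filter_map(|i| {
            let (x, y) = (a.get(i).copied(), b.get(i).copied());
            (x != y).then_some((i, x, y))
        })
        .collect()
}

fn main() {
    // Made-up sample data for illustration only.
    let dcom_marshalled = [0x4d, 0x45, 0x4f, 0x57, 0x01, 0x00, 0x00, 0x00];
    let hand_rolled = [0x4d, 0x45, 0x4f, 0x57, 0x01, 0x00, 0x02, 0x00, 0xff];

    for (offset, left, right) in diff_bytes(&dcom_marshalled, &hand_rolled) {
        println!("offset {offset:#06x}: {left:02x?} vs {right:02x?}");
    }
}
```

Fields that are expected to differ between runs (the commit notes the blobs match "modulo random fields") would need to be masked out before a diff like this is meaningful.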
F49 stays open with a refined diagnostic — the per-feature live
verification is gated on F55's resolution.
Workspace tests still 824 passing; clippy `-D warnings` clean
across both feature configurations.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@@ -98,6 +98,34 @@ Between each publish: wait for the crate to be indexed before the next one's `ca
 
 **Resolves when:** the lint is on and the workspace doc build is warning-clean with it.
 
+### F55 — Hand-rolled callback exporter rejected by `RegisterEngine2` on this AVEVA install
+**Severity:** P1 — blocks F49 live verification of every M6 feature that needs an `Engine` registered (i.e. all of them).
+**Source:** Live attempt 2026-05-06 against the local AVEVA install. Both the Rust port and the .NET reference's `--probe-register-managed-callback` (which uses the same hand-rolled-exporter approach as the Rust port) fail `RegisterEngine2` with HRESULT `0x800706BA` (`RPC_S_SERVER_UNAVAILABLE` wrapped as Win32 HRESULT). The .NET reference's `--probe-session-write` SUCCEEDS because it goes through `MxNativeSession.Open` → `CreateRegisteredService` (`MxNativeSession.cs:624`) which does **`ComObjRefProvider.MarshalInterfaceObjRef(callback, INmxSvcCallback, DifferentMachine)`** on a real C# COM object — letting Windows DCOM proxy/stub infrastructure handle the callback dispatch — instead of building a hand-rolled OBJREF + TCP listener.
+
+**The Rust port mirrors the .NET reference's `ManagedCallbackExporter` design exactly.** Both fail. So this isn't a Rust port regression — it's a pre-existing issue in the hand-rolled callback architecture that wasn't previously live-tested end-to-end against this NmxSvc install.
+
+**Diagnostic chain (logged from `mxaccess::Session::from_nmx_client`):**
+1. `Session::connect_nmx_auto` → `NmxClient::create` → all 6 steps OK (activate, marshal, ResolveOxid, RemQI, final bind). Endpoint resolved to `[fe80::...]:64311`. The new `IUnknownHolder` (mirrors `_activatedComObject` from `ManagedNmxService2Client.cs:15`) keeps the COM ref alive across the steps.
+2. `from_nmx_client` builds the callback OBJREF (162 bytes, byte-structurally identical to .NET's at `ProbeRegisterEngine2ManagedCallback.managed_callback_objref_hex` modulo random fields).
+3. `RegisterEngine2(engine_id, engine_name, version=6, callback_obj_ref)` returns `Transport(Fault { status: 0x800706BA })`.
+
+**The OBJREF binding is correct:** `DESKTOP-6JL3KKO[<port>]` with `port` from `tokio::net::TcpListener::bind(0.0.0.0:0)`. Windows Firewall is OFF on all profiles. The hand-rolled exporter accepts connections; NmxSvc just refuses to use it.
+
+**Hypotheses (each needs verification):**
+1. NmxSvc validates callback OBJREFs through Windows DCOM (`CoUnmarshalInterface` or similar) before registering them — and the hand-rolled blob fails that validation, surfacing as `RPC_S_SERVER_UNAVAILABLE` because COM interprets it as "the named server is unreachable".
+2. The OBJREF carries fields (e.g. specific `STDOBJREF.flags`, security bindings, or authn-hint values) NmxSvc requires that the hand-rolled builder doesn't set correctly. Comparing the byte-by-byte structure shows identical layout to .NET's hand-rolled OBJREF — but the same .NET hand-rolled OBJREF also fails. So this isn't a Rust-vs-.NET layout drift, it's an architecture-vs-NmxSvc gap.
+3. The NmxSvc version on this dev machine has stricter callback validation than the reference development version targeted by `MxNativeClient`'s original architecture. (NmxSvc release notes / version unknown at this point.)
+
+**Three resolution paths (each substantial):**
+
+- **Path A — switch to DCOM-marshalled callback.** Refactor `mxaccess-callback` so the callback is a real COM class (`#[implement]` via `windows-rs`) registered with the local DCOM SCM, then marshal it via `CoMarshalInterface` for the OBJREF. Abandons the project's "bypass DCOM proxy/stubs" goal but matches what .NET's working path does. ~1 week of work.
+- **Path B — hybrid: register via DCOM, dispatch via hand-rolled.** Use `CoMarshalInterface` only to build the OBJREF (which NmxSvc accepts), but intercept the inbound callback connection at the TCP layer to bypass DCOM stub dispatch. Requires reading the `CoMarshalInterface`-produced OBJREF, extracting the OXID/IPID, and standing up a TCP listener that responds to OXID resolution against itself. Architecturally awkward.
+- **Path C — investigate the OBJREF rejection at NmxSvc.** Capture the wire bytes NmxSvc sees from the .NET DCOM-marshalled path vs the hand-rolled path; diff to find what NmxSvc actually validates. May reveal a single field difference (e.g. a flag bit) that, set correctly in the hand-rolled OBJREF, makes it work. Cheapest if it pans out, but unbounded if it doesn't.
+
+**Definition of done:** F49 step 5 (LmxClient OnWriteComplete round-trip) runs end-to-end against the live AVEVA install: `cargo test -p mxaccess-compat --features live-windows-com --test lmx_write_complete_live -- --ignored --nocapture` passes.
+
+**Resolves when:** one of the three paths above lands.
+
 ### F3 — Cross-domain NTLM Type1/2/3 fixture
 **Severity:** P2
 **Status:** Permanently out-of-scope on the current dev host (no second AD domain). Resolution requires external infrastructure not available here.
@@ -169,6 +169,20 @@ pub struct NmxClient {
     /// the call to the right per-engine `INmxService2` instance
     /// (`ManagedNmxService2Client.cs:74,486-488`).
     service_ipid: Guid,
+    /// Holder for the activated COM `IUnknown` proxy when this client
+    /// was built via [`Self::create`]. Mirrors the .NET reference's
+    /// `private readonly object _activatedComObject` field at
+    /// `ManagedNmxService2Client.cs:15`. Holding the IUnknown for the
+    /// client's lifetime keeps the SCM-tracked OXID valid; without it,
+    /// subsequent `ResolveOxid` / `RemQueryInterface` calls hit
+    /// `RPC_S_SERVER_UNAVAILABLE` (1722) once the server-side
+    /// activated instance is released. `None` for clients built via
+    /// [`Self::connect`] / [`Self::from_bound_transport`] — those
+    /// paths get the OBJREF / IPID out-of-band so they don't own the
+    /// COM activation lifetime.
+    #[cfg(all(windows, feature = "windows-com"))]
+    #[allow(dead_code)] // held only for Drop side-effect (release server-side ref)
+    activated_com_object: Option<mxaccess_rpc::com_objref_provider::IUnknownHolder>,
 }
 
 impl NmxClient {
@@ -198,6 +212,8 @@ impl NmxClient {
         Ok(Self {
             transport,
             service_ipid,
+            #[cfg(all(windows, feature = "windows-com"))]
+            activated_com_object: None,
         })
     }
 
@@ -248,7 +264,7 @@ impl NmxClient {
         mut ntlm_factory: impl FnMut() -> NtlmClientContext,
     ) -> Result<Self, NmxClientError> {
         use mxaccess_rpc::com_objref_provider::{
-            marshal_activated_iunknown_objref, MarshalContext,
+            activate_and_marshal_iunknown_objref, MarshalContext,
         };
         use mxaccess_rpc::object_exporter::PROTSEQ_NCACN_IP_TCP;
         use mxaccess_rpc::object_exporter_client::{
@@ -261,7 +277,13 @@ impl NmxClient {
         };
 
         // Step 1+2: Activate NmxSvc.NmxService and parse OBJREF.
-        let blob = marshal_activated_iunknown_objref(
+        // Hold the IUnknown for the lifetime of the returned client —
+        // mirrors `ManagedNmxService2Client._activatedComObject`
+        // (`cs:15`). Without this hold, the COM ref count drops to
+        // zero, the SCM releases the server-side instance, and the
+        // ResolveOxid step below returns RPC_S_SERVER_UNAVAILABLE
+        // (1722). See `IUnknownHolder` doc.
+        let (blob, activated_holder) = activate_and_marshal_iunknown_objref(
             "NmxSvc.NmxService",
             MarshalContext::DifferentMachine,
         )?;
@@ -367,8 +389,12 @@ impl NmxClient {
         // for the same reason — the IRemUnknown bind is single-use.
         drop(rem_qi_client);
 
-        // Step 6: Final transport bound to INmxService2.
-        Self::connect(svc_addr, service_ipid, ntlm_factory()).await
+        // Step 6: Final transport bound to INmxService2. Attach the
+        // `IUnknownHolder` so the COM ref stays alive for the
+        // client's lifetime.
+        let mut client = Self::connect(svc_addr, service_ipid, ntlm_factory()).await?;
+        client.activated_com_object = Some(activated_holder);
+        Ok(client)
     }
 
     /// Construct from an already-bound transport. Useful when a caller
@@ -379,6 +405,8 @@ impl NmxClient {
         Self {
             transport,
             service_ipid,
+            #[cfg(all(windows, feature = "windows-com"))]
+            activated_com_object: None,
         }
     }
 
@@ -192,6 +192,17 @@ pub fn clsid_from_prog_id(prog_id: &str) -> Result<GUID, ProviderError> {
 /// the same default `Activator.CreateInstance` picks up via
 /// `Type.GetTypeFromProgID`.
 ///
+/// **The activated `IUnknown` is dropped at the end of this call.** For
+/// most use cases that's a bug — when the COM ref count goes to zero
+/// the SCM may release the activated server-side instance, which makes
+/// the marshalled OXID invalid for subsequent RPC. Use
+/// [`activate_and_marshal_iunknown_objref`] instead and hold the
+/// returned [`IUnknownHolder`] for the lifetime of the consumer that
+/// uses the OBJREF (typically the lifetime of the client built from
+/// it). This function is retained for callers that consume the OBJREF
+/// inline (e.g. tests / probes that use the bytes immediately and
+/// don't care about the activated server-side lifetime).
+///
 /// # Errors
 ///
 /// [`ProviderError::UnknownProgId`], [`ProviderError::ActivationFailed`],
@@ -200,6 +211,33 @@ pub fn marshal_activated_iunknown_objref(
     prog_id: &str,
     destination_context: MarshalContext,
 ) -> Result<Vec<u8>, ProviderError> {
+    activate_and_marshal_iunknown_objref(prog_id, destination_context).map(|(blob, _holder)| blob)
+}
+
+/// Activate a COM class by ProgID, marshal its `IUnknown`, and return
+/// **both** the OBJREF byte stream **and** an [`IUnknownHolder`] that
+/// keeps the activated server-side instance alive.
+///
+/// This is the .NET-reference-faithful path: `ManagedNmxService2Client`
+/// (`cs:15`) holds the activated COM object as a private field for the
+/// client's lifetime via `_activatedComObject`. The Rust port previously
+/// dropped the IUnknown right after marshalling, which let the SCM
+/// release the server-side instance and made subsequent
+/// `ResolveOxid`/`RemQueryInterface` calls return
+/// `RPC_S_SERVER_UNAVAILABLE` (1722). Holding the
+/// [`IUnknownHolder`] for the client's lifetime fixes that.
+///
+/// The OBJREF blob and the IUnknown both refer to the same activated
+/// server-side instance; keep them paired.
+///
+/// # Errors
+///
+/// [`ProviderError::UnknownProgId`], [`ProviderError::ActivationFailed`],
+/// [`ProviderError::MarshalFailed`], [`ProviderError::GlobalLockFailed`].
+pub fn activate_and_marshal_iunknown_objref(
+    prog_id: &str,
+    destination_context: MarshalContext,
+) -> Result<(Vec<u8>, IUnknownHolder), ProviderError> {
     ensure_apartment()?;
     let clsid = clsid_from_prog_id(prog_id)?;
     let activation_flags = CLSCTX_INPROC_SERVER | CLSCTX_LOCAL_SERVER | CLSCTX_REMOTE_SERVER;
@@ -213,9 +251,39 @@ pub fn marshal_activated_iunknown_objref(
             hr: e.code().0 as u32,
         }
     })?;
-    marshal_iunknown_objref(&unknown, destination_context)
+    let blob = marshal_iunknown_objref(&unknown, destination_context)?;
+    Ok((blob, IUnknownHolder { inner: unknown }))
 }
 
+/// Owns a live `IUnknown` reference to a COM-activated server-side
+/// instance. Drop releases the reference (the COM proxy's `Release`
+/// runs, which decrements the server-side ref count and may trigger
+/// instance teardown when no other holders remain).
+///
+/// `Send + Sync` because the underlying COM proxy is registered in the
+/// MTA (`COINIT_MULTITHREADED` per [`ensure_apartment`]) and is
+/// therefore safe to invoke from any thread. SAFETY of the unsafe impls
+/// rests on this MTA invariant — callers must not transition the
+/// process apartment to STA after activating an [`IUnknownHolder`].
+pub struct IUnknownHolder {
+    #[allow(dead_code)]
+    inner: IUnknown,
+}
+
+impl std::fmt::Debug for IUnknownHolder {
+    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
+        f.debug_struct("IUnknownHolder").finish_non_exhaustive()
+    }
+}
+
+// SAFETY: `IUnknownHolder` only ever wraps an MTA-resident COM proxy
+// (see `ensure_apartment` initialising `COINIT_MULTITHREADED`). MTA
+// proxies are thread-neutral by COM contract — calls can originate
+// from any thread without marshalling.
+unsafe impl Send for IUnknownHolder {}
+// SAFETY: same MTA-invariant rationale as `Send`.
+unsafe impl Sync for IUnknownHolder {}
+
 /// Marshal an arbitrary `IUnknown` to an OBJREF byte stream. Mirrors
 /// `MarshalIUnknownObjRef` (`cs:32-35`), passing IID `IID_IUnknown`
 /// (`{00000000-0000-0000-C000-000000000046}`).
@@ -915,10 +915,20 @@ impl Session {
         // set so the OBJREF binding is always parseable as
         // "<host>[<port>]".
         let identities = ExporterIdentities::random();
-        // Build the loopback address structurally rather than via `.parse()`
-        // — avoids `.expect()` on a Result that's structurally infallible
-        // (clippy::expect_used).
-        let exporter_addr = SocketAddr::new(std::net::IpAddr::V4(std::net::Ipv4Addr::LOCALHOST), 0);
+        // Bind on UNSPECIFIED (`0.0.0.0`) so the listener accepts
+        // dial-backs on every interface NmxSvc could resolve the
+        // hostname to. The OBJREF's host string is the machine's
+        // `COMPUTERNAME` (or `127.0.0.1` fallback), and NmxSvc
+        // resolves that via DNS — which on a typical AVEVA install
+        // returns the machine's primary NIC IP, not loopback. If the
+        // exporter binds only on `127.0.0.1`, the dial-back lands on
+        // a different interface and the TCP SYN is dropped, surfacing
+        // as `RegisterEngine2 → Fault(0x800706BA RPC_S_SERVER_UNAVAILABLE)`
+        // because NmxSvc can't reach our exporter to negotiate the
+        // callback bind. Binding on UNSPECIFIED (= bind to all v4
+        // interfaces, including loopback + primary NIC) avoids this.
+        let exporter_addr =
+            SocketAddr::new(std::net::IpAddr::V4(std::net::Ipv4Addr::UNSPECIFIED), 0);
         let (exporter, callback_events) = CallbackExporter::bind(exporter_addr, identities)
             .await
             .map_err(Error::Io)?;