[F12 partial + F55] hold IUnknown for client lifetime + diagnose RegisterEngine2 1722
**F12 partial improvement** (`mxaccess-rpc::IUnknownHolder` + `mxaccess-nmx`):
- New `IUnknownHolder` newtype that owns an MTA-resident COM proxy
with `unsafe impl Send + Sync`. Mirrors the .NET reference's
`ManagedNmxService2Client._activatedComObject` private field
(`cs:15`).
- New `activate_and_marshal_iunknown_objref(prog_id, ctx)` returns
`(Vec<u8>, IUnknownHolder)`. Existing
`marshal_activated_iunknown_objref` retained as a wrapper that
drops the holder (kept for inline-use callers).
- `NmxClient` gains an `activated_com_object: Option<IUnknownHolder>`
field, populated by `Self::create` from the new helper.
`Self::connect` / `Self::from_bound_transport` set it `None` (no
COM activation in those paths).
- Holding the IUnknown for the client's lifetime keeps the
SCM-tracked OXID valid; without it the COM ref count drops to
zero and the SCM may release the activated server-side instance,
making subsequent `ResolveOxid` / `RemQueryInterface` calls
return `RPC_S_SERVER_UNAVAILABLE`.
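
The lifetime argument above can be illustrated without COM. This is a minimal sketch, not the crate's code: `RefGuard`, `Client`, `activate_and_marshal`, and `marshal_only` are hypothetical stand-ins, and a shared counter plays the role of the server-side ref count.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

/// Hypothetical stand-in for a COM reference: increments a shared counter
/// when acquired and decrements it on drop, the way AddRef/Release would.
struct RefGuard {
    live_refs: Arc<AtomicUsize>,
}

impl RefGuard {
    fn acquire(live_refs: Arc<AtomicUsize>) -> Self {
        live_refs.fetch_add(1, Ordering::SeqCst);
        Self { live_refs }
    }
}

impl Drop for RefGuard {
    fn drop(&mut self) {
        self.live_refs.fetch_sub(1, Ordering::SeqCst);
    }
}

/// Hypothetical stand-in for `NmxClient`: the guard is never read, only held.
struct Client {
    #[allow(dead_code)]
    activated: Option<RefGuard>,
}

/// Returns the marshalled bytes together with the guard that keeps the
/// "server-side instance" alive. The two belong together.
fn activate_and_marshal(live_refs: Arc<AtomicUsize>) -> (Vec<u8>, RefGuard) {
    let guard = RefGuard::acquire(live_refs);
    (b"MEOW".to_vec(), guard)
}

/// Wrapper that keeps only the bytes. The guard is dropped when this
/// function returns, so the count is back to zero for the caller.
fn marshal_only(live_refs: Arc<AtomicUsize>) -> Vec<u8> {
    let (blob, _guard) = activate_and_marshal(live_refs);
    blob
}
```

Storing the guard in `Client { activated: Some(guard) }` keeps the count at one until the client is dropped, whereas `marshal_only` hands back bytes whose backing reference is already gone.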
**F55 (new) — hand-rolled callback exporter rejected by RegisterEngine2**
Five-step instrumentation of `Session::connect_nmx_auto` proves all
six COM-activation / RemQI / final-bind steps succeed. The 1722
fault originates at `RegisterEngine2` itself:
```
from_nmx_client: callback hostname="DESKTOP-6JL3KKO" port=57886 obj_ref_len=162
from_nmx_client: callback obj_ref hex: 4d454f57010000...
from_nmx_client: RegisterEngine2 (31112, mxaccess.31112)
from_nmx_client: RegisterEngine2 FAIL: Transport(Fault { status: 2147944122 })
```
Status `2147944122` is `0x800706BA`: Win32 error 1722
(`RPC_S_SERVER_UNAVAILABLE`) wrapped as an HRESULT.
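
The decimal status in the log can be decoded by hand. A minimal sketch, assuming the standard HRESULT bit layout (failure bit 31, facility in bits 16 and up, code in the low 16 bits); `win32_from_hresult` is a hypothetical helper, not part of the workspace:

```rust
/// Facility value Windows uses when a Win32 error is wrapped in an HRESULT.
const FACILITY_WIN32: u32 = 7;

/// If `hr` is a failure HRESULT in FACILITY_WIN32, return the wrapped
/// Win32 error code; otherwise return `None`.
fn win32_from_hresult(hr: u32) -> Option<u32> {
    let is_failure = (hr & 0x8000_0000) != 0;
    // Same mask as the HRESULT_FACILITY macro in winerror.h.
    let facility = (hr >> 16) & 0x1FFF;
    if is_failure && facility == FACILITY_WIN32 {
        Some(hr & 0xFFFF)
    } else {
        None
    }
}
```

Feeding it the logged `status: 2147944122` yields `Some(1722)`, which is `RPC_S_SERVER_UNAVAILABLE`.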
**Critical finding: the .NET reference's `--probe-register-managed-callback`
(which uses the same hand-rolled `ManagedCallbackExporter` approach
as the Rust port) ALSO fails with the same `0x800706BA` fault.**
Only `--probe-session-write`, which uses
`ComObjRefProvider.MarshalInterfaceObjRef(callback, ...)` to build
the OBJREF via Windows DCOM proxy/stub marshalling, succeeds. So
this is an architectural artifact of the hand-rolled-callback
design, not a Rust port regression.
`design/followups.md` F55 entry documents the three resolution
paths (switch to DCOM-marshalled callback / hybrid / continue
investigating OBJREF rejection at NmxSvc).
F49 stays open with a refined diagnostic — the per-feature live
verification is gated on F55's resolution.
Workspace tests still 824 passing; clippy `-D warnings` clean
across both feature configurations.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@@ -98,6 +98,34 @@ Between each publish: wait for the crate to be indexed before the next one's `ca
 
 **Resolves when:** the lint is on and the workspace doc build is warning-clean with it.
 
+### F55 — Hand-rolled callback exporter rejected by `RegisterEngine2` on this AVEVA install
+**Severity:** P1 — blocks F49 live verification of every M6 feature that needs an `Engine` registered (i.e. all of them).
+**Source:** Live attempt 2026-05-06 against the local AVEVA install. Both the Rust port and the .NET reference's `--probe-register-managed-callback` (which uses the same hand-rolled-exporter approach as the Rust port) fail `RegisterEngine2` with HRESULT `0x800706BA` (`RPC_S_SERVER_UNAVAILABLE` wrapped as Win32 HRESULT). The .NET reference's `--probe-session-write` SUCCEEDS because it goes through `MxNativeSession.Open` → `CreateRegisteredService` (`MxNativeSession.cs:624`) which does **`ComObjRefProvider.MarshalInterfaceObjRef(callback, INmxSvcCallback, DifferentMachine)`** on a real C# COM object — letting Windows DCOM proxy/stub infrastructure handle the callback dispatch — instead of building a hand-rolled OBJREF + TCP listener.
+
+**The Rust port mirrors the .NET reference's `ManagedCallbackExporter` design exactly.** Both fail. So this isn't a Rust port regression — it's a pre-existing issue in the hand-rolled callback architecture that wasn't previously live-tested end-to-end against this NmxSvc install.
+
+**Diagnostic chain (logged from `mxaccess::Session::from_nmx_client`):**
+1. `Session::connect_nmx_auto` → `NmxClient::create` → all 6 steps OK (activate, marshal, ResolveOxid, RemQI, final bind). Endpoint resolved to `[fe80::...]:64311`. The new `IUnknownHolder` (mirrors `_activatedComObject` from `ManagedNmxService2Client.cs:15`) keeps the COM ref alive across the steps.
+2. `from_nmx_client` builds the callback OBJREF (162 bytes, byte-structurally identical to .NET's at `ProbeRegisterEngine2ManagedCallback.managed_callback_objref_hex` modulo random fields).
+3. `RegisterEngine2(engine_id, engine_name, version=6, callback_obj_ref)` returns `Transport(Fault { status: 0x800706BA })`.
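
For step 2, the fixed header of an OBJREF can be checked before any deeper comparison. A minimal sketch, assuming the standard MS-DCOM layout (little-endian signature `0x574F454D`, i.e. the bytes `MEOW`, then a 32-bit flags word, then a 16-byte IID); `ObjRefHeader` and `parse_objref_header` are hypothetical names, not the workspace's parser:

```rust
/// Hypothetical view of the first 24 bytes of an OBJREF.
#[derive(Debug, PartialEq)]
struct ObjRefHeader {
    /// 1 = OBJREF_STANDARD, 2 = HANDLER, 4 = CUSTOM, 8 = EXTENDED.
    flags: u32,
    /// Raw IID bytes exactly as they appear on the wire.
    iid: [u8; 16],
}

/// Little-endian encoding of the ASCII bytes "MEOW".
const OBJREF_SIGNATURE: u32 = 0x574F_454D;

/// Parse the fixed OBJREF header, returning `None` if the buffer is too
/// short or does not start with the expected signature.
fn parse_objref_header(blob: &[u8]) -> Option<ObjRefHeader> {
    if blob.len() < 24 {
        return None;
    }
    let signature = u32::from_le_bytes([blob[0], blob[1], blob[2], blob[3]]);
    if signature != OBJREF_SIGNATURE {
        return None;
    }
    let flags = u32::from_le_bytes([blob[4], blob[5], blob[6], blob[7]]);
    let mut iid = [0u8; 16];
    iid.copy_from_slice(&blob[8..24]);
    Some(ObjRefHeader { flags, iid })
}
```

The logged prefix `4d454f5701...` is consistent with this layout: the `MEOW` signature followed by a flags word whose low byte is 1.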
+
+**The OBJREF binding is correct:** `DESKTOP-6JL3KKO[<port>]` with `port` from `tokio::net::TcpListener::bind(0.0.0.0:0)`. Windows Firewall is OFF on all profiles. The hand-rolled exporter accepts connections; NmxSvc just refuses to use it.
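
How such a binding string comes about can be sketched as follows. This is an illustration under assumptions: it uses the blocking `std::net::TcpListener` rather than the tokio listener the entry names, and `bind_exporter` and `string_binding` are hypothetical helpers.

```rust
use std::net::{Ipv4Addr, SocketAddr, TcpListener};

/// Bind on all IPv4 interfaces with an OS-assigned port and report the
/// port that was actually chosen.
fn bind_exporter() -> std::io::Result<(TcpListener, u16)> {
    let addr = SocketAddr::from((Ipv4Addr::UNSPECIFIED, 0));
    let listener = TcpListener::bind(addr)?;
    let port = listener.local_addr()?.port();
    Ok((listener, port))
}

/// Format the "<host>[<port>]" form used in the OBJREF string binding.
fn string_binding(host: &str, port: u16) -> String {
    format!("{host}[{port}]")
}
```

With the host and port from the log, `string_binding("DESKTOP-6JL3KKO", 57886)` gives `DESKTOP-6JL3KKO[57886]`.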
+
+**Hypotheses (each needs verification):**
+1. NmxSvc validates callback OBJREFs through Windows DCOM (`CoUnmarshalInterface` or similar) before registering them — and the hand-rolled blob fails that validation, surfacing as `RPC_S_SERVER_UNAVAILABLE` because COM interprets it as "the named server is unreachable".
+2. The OBJREF carries fields (e.g. specific `STDOBJREF.flags`, security bindings, or authn-hint values) NmxSvc requires that the hand-rolled builder doesn't set correctly. Comparing the byte-by-byte structure shows identical layout to .NET's hand-rolled OBJREF — but the same .NET hand-rolled OBJREF also fails. So this isn't a Rust-vs-.NET layout drift, it's an architecture-vs-NmxSvc gap.
+3. The NmxSvc version on this dev machine has stricter callback validation than the reference development version targeted by `MxNativeClient`'s original architecture. (NmxSvc release notes / version unknown at this point.)
+
+**Three resolution paths (each substantial):**
+
+- **Path A — switch to DCOM-marshalled callback.** Refactor `mxaccess-callback` so the callback is a real COM class (`#[implement]` via `windows-rs`) registered with the local DCOM SCM, then marshal it via `CoMarshalInterface` for the OBJREF. Abandons the project's "bypass DCOM proxy/stubs" goal but matches what .NET's working path does. ~1 week of work.
+- **Path B — hybrid: register via DCOM, dispatch via hand-rolled.** Use `CoMarshalInterface` only to build the OBJREF (which NmxSvc accepts), but intercept the inbound callback connection at the TCP layer to bypass DCOM stub dispatch. Requires reading the `CoMarshalInterface`-produced OBJREF, extracting the OXID/IPID, and standing up a TCP listener that responds to OXID resolution against itself. Architecturally awkward.
+- **Path C — investigate the OBJREF rejection at NmxSvc.** Capture the wire bytes NmxSvc sees from the .NET DCOM-marshalled path vs the hand-rolled path; diff to find what NmxSvc actually validates. May reveal a single field difference (e.g. a flag bit) that, set correctly in the hand-rolled OBJREF, makes it work. Cheapest if it pans out, but unbounded if it doesn't.
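
The diff step in Path C could start from something as small as the helper below. It is a sketch only: `differing_offsets` is a hypothetical name, and it compares two captured buffers byte by byte without knowing anything about OBJREF fields, so random fields (OXID, OID, IPID) will show up as differences and need to be filtered by offset afterwards.

```rust
/// Return every offset at which the two captures differ. Offsets past the
/// end of the shorter buffer are reported too, since a length mismatch is
/// itself a difference worth seeing.
fn differing_offsets(a: &[u8], b: &[u8]) -> Vec<usize> {
    let common = a.len().min(b.len());
    let mut offsets: Vec<usize> = (0..common).filter(|&i| a[i] != b[i]).collect();
    offsets.extend(common..a.len().max(b.len()));
    offsets
}
```

For example, comparing `[1, 2, 3]` with `[1, 9, 3, 4]` reports offsets 1 and 3: one changed byte and one byte that exists only in the longer capture.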
+
+**Definition of done:** F49 step 5 (LmxClient OnWriteComplete round-trip) runs end-to-end against the live AVEVA install: `cargo test -p mxaccess-compat --features live-windows-com --test lmx_write_complete_live -- --ignored --nocapture` passes.
+
+**Resolves when:** one of the three paths above lands.
+
 ### F3 — Cross-domain NTLM Type1/2/3 fixture
 **Severity:** P2
 **Status:** Permanently out-of-scope on the current dev host (no second AD domain). Resolution requires external infrastructure not available here.
@@ -169,6 +169,20 @@ pub struct NmxClient {
     /// the call to the right per-engine `INmxService2` instance
     /// (`ManagedNmxService2Client.cs:74,486-488`).
     service_ipid: Guid,
+    /// Holder for the activated COM `IUnknown` proxy when this client
+    /// was built via [`Self::create`]. Mirrors the .NET reference's
+    /// `private readonly object _activatedComObject` field at
+    /// `ManagedNmxService2Client.cs:15`. Holding the IUnknown for the
+    /// client's lifetime keeps the SCM-tracked OXID valid; without it,
+    /// subsequent `ResolveOxid` / `RemQueryInterface` calls hit
+    /// `RPC_S_SERVER_UNAVAILABLE` (1722) once the server-side
+    /// activated instance is released. `None` for clients built via
+    /// [`Self::connect`] / [`Self::from_bound_transport`] — those
+    /// paths get the OBJREF / IPID out-of-band so they don't own the
+    /// COM activation lifetime.
+    #[cfg(all(windows, feature = "windows-com"))]
+    #[allow(dead_code)] // held only for Drop side-effect (release server-side ref)
+    activated_com_object: Option<mxaccess_rpc::com_objref_provider::IUnknownHolder>,
 }
 
 impl NmxClient {
@@ -198,6 +212,8 @@ impl NmxClient {
         Ok(Self {
             transport,
             service_ipid,
+            #[cfg(all(windows, feature = "windows-com"))]
+            activated_com_object: None,
         })
     }
 
@@ -248,7 +264,7 @@ impl NmxClient {
         mut ntlm_factory: impl FnMut() -> NtlmClientContext,
     ) -> Result<Self, NmxClientError> {
         use mxaccess_rpc::com_objref_provider::{
-            marshal_activated_iunknown_objref, MarshalContext,
+            activate_and_marshal_iunknown_objref, MarshalContext,
         };
         use mxaccess_rpc::object_exporter::PROTSEQ_NCACN_IP_TCP;
         use mxaccess_rpc::object_exporter_client::{
@@ -261,7 +277,13 @@ impl NmxClient {
         };
 
         // Step 1+2: Activate NmxSvc.NmxService and parse OBJREF.
-        let blob = marshal_activated_iunknown_objref(
+        // Hold the IUnknown for the lifetime of the returned client —
+        // mirrors `ManagedNmxService2Client._activatedComObject`
+        // (`cs:15`). Without this hold, the COM ref count drops to
+        // zero, the SCM releases the server-side instance, and the
+        // ResolveOxid step below returns RPC_S_SERVER_UNAVAILABLE
+        // (1722). See `IUnknownHolder` doc.
+        let (blob, activated_holder) = activate_and_marshal_iunknown_objref(
             "NmxSvc.NmxService",
             MarshalContext::DifferentMachine,
         )?;
@@ -367,8 +389,12 @@ impl NmxClient {
         // for the same reason — the IRemUnknown bind is single-use.
         drop(rem_qi_client);
 
-        // Step 6: Final transport bound to INmxService2.
-        Self::connect(svc_addr, service_ipid, ntlm_factory()).await
+        // Step 6: Final transport bound to INmxService2. Attach the
+        // `IUnknownHolder` so the COM ref stays alive for the
+        // client's lifetime.
+        let mut client = Self::connect(svc_addr, service_ipid, ntlm_factory()).await?;
+        client.activated_com_object = Some(activated_holder);
+        Ok(client)
     }
 
     /// Construct from an already-bound transport. Useful when a caller
@@ -379,6 +405,8 @@ impl NmxClient {
         Self {
             transport,
             service_ipid,
+            #[cfg(all(windows, feature = "windows-com"))]
+            activated_com_object: None,
         }
     }
 
@@ -192,6 +192,17 @@ pub fn clsid_from_prog_id(prog_id: &str) -> Result<GUID, ProviderError> {
 /// the same default `Activator.CreateInstance` picks up via
 /// `Type.GetTypeFromProgID`.
 ///
+/// **The activated `IUnknown` is dropped at the end of this call.** For
+/// most use cases that's a bug — when the COM ref count goes to zero
+/// the SCM may release the activated server-side instance, which makes
+/// the marshalled OXID invalid for subsequent RPC. Use
+/// [`activate_and_marshal_iunknown_objref`] instead and hold the
+/// returned [`IUnknownHolder`] for the lifetime of the consumer that
+/// uses the OBJREF (typically the lifetime of the client built from
+/// it). This function is retained for callers that consume the OBJREF
+/// inline (e.g. tests / probes that use the bytes immediately and
+/// don't care about the activated server-side lifetime).
+///
 /// # Errors
 ///
 /// [`ProviderError::UnknownProgId`], [`ProviderError::ActivationFailed`],
@@ -200,6 +211,33 @@ pub fn marshal_activated_iunknown_objref(
     prog_id: &str,
     destination_context: MarshalContext,
 ) -> Result<Vec<u8>, ProviderError> {
+    activate_and_marshal_iunknown_objref(prog_id, destination_context).map(|(blob, _holder)| blob)
+}
+
+/// Activate a COM class by ProgID, marshal its `IUnknown`, and return
+/// **both** the OBJREF byte stream **and** an [`IUnknownHolder`] that
+/// keeps the activated server-side instance alive.
+///
+/// This is the .NET-reference-faithful path: `ManagedNmxService2Client`
+/// (`cs:15`) holds the activated COM object as a private field for the
+/// client's lifetime via `_activatedComObject`. The Rust port previously
+/// dropped the IUnknown right after marshalling, which let the SCM
+/// release the server-side instance and made subsequent
+/// `ResolveOxid`/`RemQueryInterface` calls return
+/// `RPC_S_SERVER_UNAVAILABLE` (1722). Holding the
+/// [`IUnknownHolder`] for the client's lifetime fixes that.
+///
+/// The OBJREF blob and the IUnknown both refer to the same activated
+/// server-side instance; keep them paired.
+///
+/// # Errors
+///
+/// [`ProviderError::UnknownProgId`], [`ProviderError::ActivationFailed`],
+/// [`ProviderError::MarshalFailed`], [`ProviderError::GlobalLockFailed`].
+pub fn activate_and_marshal_iunknown_objref(
+    prog_id: &str,
+    destination_context: MarshalContext,
+) -> Result<(Vec<u8>, IUnknownHolder), ProviderError> {
     ensure_apartment()?;
     let clsid = clsid_from_prog_id(prog_id)?;
     let activation_flags = CLSCTX_INPROC_SERVER | CLSCTX_LOCAL_SERVER | CLSCTX_REMOTE_SERVER;
@@ -213,9 +251,39 @@ pub fn marshal_activated_iunknown_objref(
             hr: e.code().0 as u32,
         }
     })?;
-    marshal_iunknown_objref(&unknown, destination_context)
+    let blob = marshal_iunknown_objref(&unknown, destination_context)?;
+    Ok((blob, IUnknownHolder { inner: unknown }))
 }
 
+/// Owns a live `IUnknown` reference to a COM-activated server-side
+/// instance. Drop releases the reference (the COM proxy's `Release`
+/// runs, which decrements the server-side ref count and may trigger
+/// instance teardown when no other holders remain).
+///
+/// `Send + Sync` because the underlying COM proxy is registered in the
+/// MTA (`COINIT_MULTITHREADED` per [`ensure_apartment`]) and is
+/// therefore safe to invoke from any thread. SAFETY of the unsafe impls
+/// rests on this MTA invariant — callers must not transition the
+/// process apartment to STA after activating an [`IUnknownHolder`].
+pub struct IUnknownHolder {
+    #[allow(dead_code)]
+    inner: IUnknown,
+}
+
+impl std::fmt::Debug for IUnknownHolder {
+    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
+        f.debug_struct("IUnknownHolder").finish_non_exhaustive()
+    }
+}
+
+// SAFETY: `IUnknownHolder` only ever wraps an MTA-resident COM proxy
+// (see `ensure_apartment` initialising `COINIT_MULTITHREADED`). MTA
+// proxies are thread-neutral by COM contract — calls can originate
+// from any thread without marshalling.
+unsafe impl Send for IUnknownHolder {}
+// SAFETY: same MTA-invariant rationale as `Send`.
+unsafe impl Sync for IUnknownHolder {}
+
 /// Marshal an arbitrary `IUnknown` to an OBJREF byte stream. Mirrors
 /// `MarshalIUnknownObjRef` (`cs:32-35`), passing IID `IID_IUnknown`
 /// (`{00000000-0000-0000-C000-000000000046}`).
@@ -915,10 +915,20 @@ impl Session {
         // set so the OBJREF binding is always parseable as
         // "<host>[<port>]".
         let identities = ExporterIdentities::random();
-        // Build the loopback address structurally rather than via `.parse()`
-        // — avoids `.expect()` on a Result that's structurally infallible
-        // (clippy::expect_used).
-        let exporter_addr = SocketAddr::new(std::net::IpAddr::V4(std::net::Ipv4Addr::LOCALHOST), 0);
+        // Bind on UNSPECIFIED (`0.0.0.0`) so the listener accepts
+        // dial-backs on every interface NmxSvc could resolve the
+        // hostname to. The OBJREF's host string is the machine's
+        // `COMPUTERNAME` (or `127.0.0.1` fallback), and NmxSvc
+        // resolves that via DNS — which on a typical AVEVA install
+        // returns the machine's primary NIC IP, not loopback. If the
+        // exporter binds only on `127.0.0.1`, the dial-back lands on
+        // a different interface and the TCP SYN is dropped, surfacing
+        // as `RegisterEngine2 → Fault(0x800706BA RPC_S_SERVER_UNAVAILABLE)`
+        // because NmxSvc can't reach our exporter to negotiate the
+        // callback bind. Binding on UNSPECIFIED (= bind to all v4
+        // interfaces, including loopback + primary NIC) avoids this.
+        let exporter_addr =
+            SocketAddr::new(std::net::IpAddr::V4(std::net::Ipv4Addr::UNSPECIFIED), 0);
         let (exporter, callback_events) = CallbackExporter::bind(exporter_addr, identities)
             .await
             .map_err(Error::Io)?;