[F12 partial + F55] hold IUnknown for client lifetime + diagnose RegisterEngine2 1722
rust / build / test / clippy / fmt (push) Has been cancelled
rust / cargo public-api drift check (F41) (push) Has been cancelled

**F12 partial improvement** (`mxaccess-rpc::IUnknownHolder` + `mxaccess-nmx`):

- New `IUnknownHolder` newtype that owns an MTA-resident COM proxy
  with `unsafe impl Send + Sync`. Mirrors the .NET reference's
  `ManagedNmxService2Client._activatedComObject` private field
  (`cs:15`).
- New `activate_and_marshal_iunknown_objref(prog_id, ctx)` returns
  `(Vec<u8>, IUnknownHolder)`. Existing
  `marshal_activated_iunknown_objref` retained as a wrapper that
  drops the holder (kept for inline-use callers).
- `NmxClient` gains an `activated_com_object: Option<IUnknownHolder>`
  field, populated by `Self::create` from the new helper.
  `Self::connect` / `Self::from_bound_transport` set it `None` (no
  COM activation in those paths).
- Holding the IUnknown for the client's lifetime keeps the
  SCM-tracked OXID valid; without it the COM ref count drops to
  zero and the SCM may release the activated server-side instance,
  making subsequent `ResolveOxid` / `RemQueryInterface` calls
  return `RPC_S_SERVER_UNAVAILABLE`.

**F55 (new) — hand-rolled callback exporter rejected by RegisterEngine2**

Five-step instrumentation of `Session::connect_nmx_auto` proves all
six COM-activation / RemQI / final-bind steps succeed. The 1722
fault originates at `RegisterEngine2` itself:

```
from_nmx_client: callback hostname="DESKTOP-6JL3KKO" port=57886 obj_ref_len=162
from_nmx_client: callback obj_ref hex: 4d454f57010000...
from_nmx_client: RegisterEngine2 (31112, mxaccess.31112)
from_nmx_client: RegisterEngine2 FAIL: Transport(Fault { status: 2147944122 })
```

Status `0x800706BA` = `RPC_S_SERVER_UNAVAILABLE` wrapped as Win32
HRESULT.

**Critical finding: the .NET reference's `--probe-register-managed-callback`
(which uses the same hand-rolled `ManagedCallbackExporter` approach
as the Rust port) ALSO fails with the same `0x800706BA` fault.**
Only `--probe-session-write`, which uses
`ComObjRefProvider.MarshalInterfaceObjRef(callback, ...)` to build
the OBJREF via Windows DCOM proxy/stub marshalling, succeeds. So
this is an architectural artifact of the hand-rolled-callback
design, not a Rust port regression.

`design/followups.md` F55 entry documents the three resolution
paths (switch to DCOM-marshalled callback / hybrid / continue
investigating OBJREF rejection at NmxSvc).

F49 stays open with a refined diagnostic — the per-feature live
verification is gated on F55's resolution.

Workspace tests still 824 passing; clippy `-D warnings` clean
across both feature configurations.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
Joseph Doherty
2026-05-06 08:50:30 -04:00
parent e5b31fadb1
commit c5d611d6fa
4 changed files with 143 additions and 9 deletions
+28
View File
@@ -98,6 +98,34 @@ Between each publish: wait for the crate to be indexed before the next one's `ca
**Resolves when:** the lint is on and the workspace doc build is warning-clean with it.
### F55 — Hand-rolled callback exporter rejected by `RegisterEngine2` on this AVEVA install
**Severity:** P1 — blocks F49 live verification of every M6 feature that needs an `Engine` registered (i.e. all of them).
**Source:** Live attempt 2026-05-06 against the local AVEVA install. Both the Rust port and the .NET reference's `--probe-register-managed-callback` (which uses the same hand-rolled-exporter approach as the Rust port) fail `RegisterEngine2` with HRESULT `0x800706BA` (`RPC_S_SERVER_UNAVAILABLE` wrapped as Win32 HRESULT). The .NET reference's `--probe-session-write` SUCCEEDS because it goes through `MxNativeSession.Open``CreateRegisteredService` (`MxNativeSession.cs:624`) which does **`ComObjRefProvider.MarshalInterfaceObjRef(callback, INmxSvcCallback, DifferentMachine)`** on a real C# COM object — letting Windows DCOM proxy/stub infrastructure handle the callback dispatch — instead of building a hand-rolled OBJREF + TCP listener.
**The Rust port mirrors the .NET reference's `ManagedCallbackExporter` design exactly.** Both fail. So this isn't a Rust port regression — it's a pre-existing issue in the hand-rolled callback architecture that wasn't previously live-tested end-to-end against this NmxSvc install.
**Diagnostic chain (logged from `mxaccess::Session::from_nmx_client`):**
1. `Session::connect_nmx_auto``NmxClient::create` → all 6 steps OK (activate, marshal, ResolveOxid, RemQI, final bind). Endpoint resolved to `[fe80::...]:64311`. The new `IUnknownHolder` (mirrors `_activatedComObject` from `ManagedNmxService2Client.cs:15`) keeps the COM ref alive across the steps.
2. `from_nmx_client` builds the callback OBJREF (162 bytes, byte-structurally identical to .NET's at `ProbeRegisterEngine2ManagedCallback.managed_callback_objref_hex` modulo random fields).
3. `RegisterEngine2(engine_id, engine_name, version=6, callback_obj_ref)` returns `Transport(Fault { status: 0x800706BA })`.
**The OBJREF binding is correct:** `DESKTOP-6JL3KKO[<port>]` with `port` from `tokio::net::TcpListener::bind(0.0.0.0:0)`. Windows Firewall is OFF on all profiles. The hand-rolled exporter accepts connections; NmxSvc just refuses to use it.
**Hypotheses (each needs verification):**
1. NmxSvc validates callback OBJREFs through Windows DCOM (`CoUnmarshalInterface` or similar) before registering them — and the hand-rolled blob fails that validation, surfacing as `RPC_S_SERVER_UNAVAILABLE` because COM interprets it as "the named server is unreachable".
2. The OBJREF carries fields (e.g. specific `STDOBJREF.flags`, security bindings, or authn-hint values) NmxSvc requires that the hand-rolled builder doesn't set correctly. Comparing the byte-by-byte structure shows identical layout to .NET's hand-rolled OBJREF — but the same .NET hand-rolled OBJREF also fails. So this isn't a Rust-vs-.NET layout drift, it's an architecture-vs-NmxSvc gap.
3. The NmxSvc version on this dev machine has stricter callback validation than the reference development version targeted by `MxNativeClient`'s original architecture. (NmxSvc release notes / version unknown at this point.)
**Three resolution paths (each substantial):**
- **Path A — switch to DCOM-marshalled callback.** Refactor `mxaccess-callback` so the callback is a real COM class (`#[implement]` via `windows-rs`) registered with the local DCOM SCM, then marshal it via `CoMarshalInterface` for the OBJREF. Abandons the project's "bypass DCOM proxy/stubs" goal but matches what .NET's working path does. ~1 week of work.
- **Path B — hybrid: register via DCOM, dispatch via hand-rolled.** Use `CoMarshalInterface` only to build the OBJREF (which NmxSvc accepts), but intercept the inbound callback connection at the TCP layer to bypass DCOM stub dispatch. Requires reading the `CoMarshalInterface`-produced OBJREF, extracting the OXID/IPID, and standing up a TCP listener that responds to OXID resolution against itself. Architecturally awkward.
- **Path C — investigate the OBJREF rejection at NmxSvc.** Capture the wire bytes NmxSvc sees from the .NET DCOM-marshalled path vs the hand-rolled path; diff to find what NmxSvc actually validates. May reveal a single field difference (e.g. a flag bit) that, set correctly in the hand-rolled OBJREF, makes it work. Cheapest if it pans out, but unbounded if it doesn't.
**Definition of done:** F49 step 5 (LmxClient OnWriteComplete round-trip) runs end-to-end against the live AVEVA install: `cargo test -p mxaccess-compat --features live-windows-com --test lmx_write_complete_live -- --ignored --nocapture` passes.
**Resolves when:** one of the three paths above lands.
### F3 — Cross-domain NTLM Type1/2/3 fixture
**Severity:** P2
**Status:** Permanently out-of-scope on the current dev host (no second AD domain). Resolution requires external infrastructure not available here.