From 8e695b9347f062a77c0f3809cbd91e8208effa41 Mon Sep 17 00:00:00 2001 From: Joseph Doherty Date: Tue, 5 May 2026 22:30:25 -0400 Subject: [PATCH] [F12 wrapper + F32 close] Session::connect_nmx_auto + close M5 type-matrix DoD MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Two related closures in one commit: 1. Session-level wrapper around F12: new `mxaccess::Session::connect_nmx_auto(ntlm_factory, options, resolver, recovery)` gated on a new `mxaccess/windows-com` feature (which propagates `mxaccess-nmx/windows-com`). Drives `NmxClient::create` (the F12 COM-activation factory) for the `(host, port, service_ipid)` discovery, then funnels into the shared post-NMX-bind orchestration. Refactored `connect_nmx` to extract steps 1+2+4+5 into a private `from_nmx_client` helper — both `connect_nmx` and `connect_nmx_auto` reuse it so the `CallbackExporter` + router + `RegisterEngine2` + heartbeat policy stays in one place. The .NET `MxNativeSession.Open` shape (`MxNativeSession.cs:127-147`) is now reproduced end-to-end on Windows with `windows-com` on — callers no longer pre-resolve `(addr, service_ipid)` by hand. `connect_nmx`'s doc comment updated to drop the stale "F12 not yet wired" note. `parse_bracketed_host_port` in mxaccess-nmx gets a `cfg_attr(not(...), allow(dead_code))` so the default-feature build stays warning-clean. 2. F32 closed via option (b) of its own resolve criterion: the four missing types (Float / Double / DateTime / Duration) are gated on Galaxy-side template provisioning that's outside the Rust port's scope. The deployed test Galaxy on this host only has mx_data_type ∈ {1=Bool, 2=Int32, 5=String}; we cannot exercise the missing types without authoring new template attributes in the Aveva console (a manual platform-engineering task). The three-type live verification at commit 9063f10 satisfies the M5 DoD bullet for what is deployable. F18's M5 status block updated to reflect F32-resolved. Workspace: 718 tests pass on default features (was 712 before F12, +6 from new parse_bracketed_host_port tests). Default-feature clippy + windows-com clippy both clean. Closes F32 in design/followups.md and extends F12's resolution note with the Session-level wrapper. Co-Authored-By: Claude Opus 4.7 (1M context) --- design/followups.md | 22 ++----- rust/crates/mxaccess-nmx/src/client.rs | 6 +- rust/crates/mxaccess/Cargo.toml | 3 + rust/crates/mxaccess/src/session.rs | 88 +++++++++++++++++++++----- 4 files changed, 87 insertions(+), 32 deletions(-) diff --git a/design/followups.md b/design/followups.md index cf7d996..e9b1fef 100644 --- a/design/followups.md +++ b/design/followups.md @@ -53,7 +53,7 @@ move to `## Resolved` with a date + commit hash. 4. ✅ `cargo build --workspace` + `cargo test --workspace` (711 tests) + `cargo clippy --workspace -- -D warnings` all green. **Remaining open work for full M5 closeout** (none are P0 blockers anymore): -- **F32**: live type-matrix coverage beyond Int32. +- ~~F32~~: resolved (commit ``) via option (b) — three-type live coverage is the deployable maximum; missing types are Galaxy-provisioning-gated. - **F28**: canonical-XML signing currently covers only the `[XmlSerializerFormat]` ops (AuthenticateMe / Disconnect / KeepAlive / RegisterItems / UnregisterItems). Read / Write / CreateSubscription / AddMonitoredItems / Publish / etc. still sign over NBFX wire bytes via the legacy fallback. Live Read works by virtue of those ops not requiring HMAC validation server-side under the empty `hashAlgorithm` setting (registry default), so this is latent rather than blocking. Promote to P0 once a deployment with non-empty `hashAlgorithm` is in scope. - ~~F29~~: resolved (commit ``) — `nbfs.rs` re-aligned to the canonical `[MC-NBFS]` table from `dotnet/wcf` `ServiceModelStringsVersion1`. - **F26 stream subscription**: `Stream` over a publish-loop is still stubbed in `AsbSession`. Tracked under F25 step 8 / F26 step 3 in the cumulative log. @@ -146,21 +146,6 @@ move to `## Resolved` with a date + commit hash. F25 (`mxaccess-asb` IASBIDataV2 client) and F26 (`mxaccess::Session` over `AsbTransport`) remain open. With F19-F24 landed, the M5 framing/encoder layer (streams A+B+C+D and the codec stream) is complete; F25 composes them into the `IASBIDataV2` wire client. F22's static dictionary subset is intentionally curated; expand entries as wire captures show new IDs. F27 (constant-time DH) is filed as a separate follow-up below. -### F32 — Live type-matrix coverage for `asb-subscribe` -**Severity:** P1 — final M5 DoD bullet (#3). -**Source:** F18 M5 status block. - -**Live coverage so far (commits `9063f10`, ``):** three types round-trip end-to-end against the live MxDataProvider on this Windows host: -- ✅ **Int32** (type_id 4) — `TestChildObject.TestInt = 99` -- ✅ **String** (type_id 10) — `TestChildObject.TestString = "mxaccesscli verified 17778523775"` (UTF-16LE on the wire) -- ✅ **Bool** (type_id 17) — `DelmiaReceiver_001.TestAttribute = 0` - -A SQL probe of the Galaxy DB (`SELECT mx_data_type, MIN(...) FROM gobject g INNER JOIN package p ON p.package_id = g.deployed_package_id INNER JOIN dynamic_attribute da ON da.package_id = p.package_id WHERE g.is_template = 0 GROUP BY da.mx_data_type`) shows the live Galaxy only has tags of `mx_data_type ∈ {1=Bool, 2=Int32, 5=String}`. Float (3), Double (4), DateTime (6), Duration (7), and array shapes are not deployed in this Galaxy, so we cannot exercise them without provisioning new attributes. - -**Transient flakiness observed:** the `InvalidConnectionId` race after one-way `AuthenticateMe` is not deterministic — even with the `MAX_ATTEMPTS=10`, `BACKOFF_BASE_MS=200` retry loop in `register_items` (commit ``) and a 250 ms post-auth settle in `connect`, individual runs occasionally exhaust the retry budget. The server appears to enter a degraded mode after many test runs (presumably pending-connection table fills), and a 30-second cool-down restores reliability. Each tag works in some runs and fails in others; the failure is not tag-specific. Production deployments with a single long-lived session are unlikely to hit this. - -**Resolves when:** Either (a) the Galaxy is augmented with sample tags for the four missing types and an `asb-typematrix.rs` integration test loops over all seven proven types, OR (b) the four-missing-types coverage is acknowledged as gated on Galaxy-side provisioning that's outside the Rust port's scope and the followup is closed with the three-type live verification as the M5 DoD ✓. - ### F28 — Canonical XML serialiser for `ConnectedRequest` signing (matches `XmlSerializer.Serialize` byte-for-byte) **Status: PARTIALLY RESOLVED.** The five `[XmlSerializerFormat]` ops (AuthenticateMe, Disconnect, KeepAlive, RegisterItems, UnregisterItems) plus the per-action `ValidatorWireFormat` selector + DH-params-from-registry + dynamic-dict id management all landed in commits `f14580e` / `104efc4`. Live AuthenticateMe + RegisterItems work end-to-end (commit `9063f10`). Read / Write / CreateSubscription / AddMonitoredItems / Publish / DeleteMonitored / DeleteSubscription / PublishWriteComplete still sign over NBFX wire bytes via the legacy fallback; works in practice because the live registry has empty `hashAlgorithm` (no HMAC required for the unforced-MAC path), but will break under any deployment that sets a real algorithm. **Severity now P2** — promote back to P0 if a hashAlgorithm-non-empty environment is in scope. **Severity:** P0 — blocks every signed ASB operation (AuthenticateMe, RegisterItems, all data-plane RPCs). @@ -230,6 +215,11 @@ The fixture is captured by `MxAsbClient.Probe --dump-deterministic-hmac` (`src/M ### F12 — `NmxClient::create` (auto-resolving COM-activation factory) **Resolved:** 2026-05-05 (commit ``). Builds on F6: new `NmxClient::create(ntlm_factory)` constructor in `crates/mxaccess-nmx/src/client.rs`, gated on `cfg(all(windows, feature = "windows-com"))`. New crate-level feature `mxaccess-nmx/windows-com` propagates to `mxaccess-rpc/windows-com`. Mirrors `ManagedNmxService2Client.Create()` (`cs:30-64`) + `ResolveService` (`cs:491-523`) — six steps: (1) `com_objref_provider::marshal_activated_iunknown_objref("NmxSvc.NmxService", MarshalContext::DifferentMachine)` activates the COM class and emits an OBJREF blob; (2) `ComObjRef::parse` extracts `oxid` + `ipid` (the activated server's `IUnknown` IPID); (3) `resolve_oxid_with_managed_ntlm_packet_integrity` against `127.0.0.1:135` (RPCSS endpoint mapper) returns the server's `(host, port)` bindings + `IRemUnknown` IPID; (4) the `ncacn_ip_tcp` non-security binding's `host[port]` text is parsed via the new `parse_bracketed_host_port` helper (mirrors the .NET `ParseBracketedHost` / `ParseBracketedPort` pair, using `rfind` so FQDNs with `.` round-trip — matches `cs:540-561`); (5) a fresh transport binds to `IRemUnknown` and calls `RemQueryInterface(iunknown_ipid, INmxService2_IID, fresh_causality_id, public_refs=5)` — the `RemQiResult` carries the new `INmxService2` IPID; (6) a second fresh transport binds to `INmxService2` via `Self::connect`. The `ntlm_factory: impl FnMut() -> NtlmClientContext` closure is invoked **three times** (one per bind); callers are responsible for fresh contexts each call. New error variants: `NmxClientError::Activation(ProviderError)` (only with `windows-com`) and `NmxClientError::EndpointResolution { reason }` (covers no binding / parse failure / non-zero RemQI HRESULT). 6 offline tests on the host/port parser pin: extracts FQDN host + port, uses `rfind` for the rightmost brackets, rejects missing `[` / missing `]` / non-numeric port / port overflow. 1 live test (`#[ignore]`'d, gated on `MX_LIVE` + the `MX_TEST_*` Setup-LiveProbeEnv env triple) round-trips end-to-end against the AVEVA install — activates `NmxSvc.NmxService`, drives the full chain, asserts the resolved `service_ipid` is non-zero. Live verification: passes. Workspace tests went 17 → 23 in mxaccess-nmx (+6). +**Session-level wrapper (same commit):** `mxaccess::Session::connect_nmx_auto(ntlm_factory, options, resolver, recovery)` — gated on the new `mxaccess/windows-com` feature (which propagates to `mxaccess-nmx/windows-com`). Refactored `connect_nmx` to extract the post-NMX-bind orchestration into a private `from_nmx_client` helper; both `connect_nmx` and `connect_nmx_auto` funnel through it so the `CallbackExporter` + router-task + `RegisterEngine2` + heartbeat policy stays in one place. `connect_nmx`'s doc comment updated — the prior "F12 not yet wired" note is gone. With both layers landed, the .NET `MxNativeSession.Open` surface (`cs:127-147`) is reproduced end-to-end on the Rust side: callers no longer need to pre-resolve `(host, port, service_ipid)` by hand on Windows. + +### F32 — Live type-matrix coverage for `asb-subscribe` +**Resolved:** 2026-05-05 (commit ``). Closed via option (b) of the followup's own resolve criterion: the four missing types (Float / Double / DateTime / Duration) are gated on Galaxy-side provisioning that's outside the Rust port's scope. The deployed test Galaxy on this host only has `mx_data_type ∈ {1=Bool, 2=Int32, 5=String}` (verified via direct SQL probe of `dbo.dynamic_attribute`); we cannot exercise the missing types without authoring new template attributes in the Aveva console — a manual platform-engineering task, not a Rust port issue. The three-type live verification (Int32 = 99, String = `"mxaccesscli verified 17778523775"`, Bool = 0) at commit `9063f10` therefore satisfies the **type-matrix DoD bullet for what is deployable**. M5 DoD bullet #3 closes ✓ for the deployed shape; if a future deployment provisions the remaining four types, an `asb-typematrix.rs` integration test that loops over all seven types would make a clean follow-on. **Transient `InvalidConnectionId` race** noted in the original block remains as a known characteristic of the live MxDataProvider after many test cycles (settles after a 30-second cool-down); production deployments with a single long-lived session are unlikely to hit it. + ### F6 — Port `ComObjRefProvider.cs` (OBJREF emitter via Win32 `CoMarshalInterface`) **Resolved:** 2026-05-05 (commit ``). New module `crates/mxaccess-rpc/src/com_objref_provider.rs` (~330 LoC including tests) gated on `cfg(all(windows, feature = "windows-com"))`. Pulls `windows = "0.59"` (features `Win32_Foundation` + `Win32_System_Com` + `Win32_System_Com_Marshal` + `Win32_System_Com_StructuredStorage` + `Win32_System_Memory`) as an optional dep behind the existing `windows-com` feature; default footprint stays slim. Public API mirrors `ComObjRefProvider.cs` 1:1: `MarshalContext` enum (InProcess / Local / DifferentMachine — wraps the `MSHCTX_*` newtype constants), `clsid_from_prog_id(&str) -> Result` (wraps `CLSIDFromProgID`), `marshal_activated_iunknown_objref(prog_id, ctx)` (activates via `CoCreateInstance(CLSCTX_INPROC_SERVER | CLSCTX_LOCAL_SERVER | CLSCTX_REMOTE_SERVER)` then marshals), `marshal_iunknown_objref(unknown, ctx)` (uses `IUnknown::IID`), `marshal_interface_objref(unknown, iid, ctx)` (the underlying `CoMarshalInterface` over an HGlobal-backed `IStream`). All `unsafe` is internal to the module — public API exposes only typed Rust values, no raw pointers / HRESULTs / lifetime-bound interface pointers. Each `unsafe` block carries an inline SAFETY comment. `ProviderError` enumerates the four documented failure modes (UnknownProgId, ActivationFailed, MarshalFailed, GlobalLockFailed) plus the apartment-init pre-check (ApartmentInitFailed). Per-thread COM init via `OnceLock<()>` thread-local: lazy `CoInitializeEx(MULTITHREADED)` on first call; `S_FALSE` (already initialised) and `RPC_E_CHANGED_MODE` (thread is STA) treated as success — matches the .NET runtime's tolerant apartment behaviour. 4 offline tests pin: `MarshalContext` → `MSHCTX_*` mapping, `ensure_apartment` idempotence, `clsid_from_prog_id` returns `UnknownProgId` for fake ProgIDs, `marshal_activated_*` short-circuits at the resolution stage. 1 live test (`#[ignore]`'d, gated on `MX_LIVE`) round-trips the real `NmxSvc.NmxService`: activates, marshals, then parses the blob via `ComObjRef::parse` and asserts non-zero OXID + IPID. Live verification: passes against the AVEVA install on this host. Workspace tests went 183 → was 179 in mxaccess-rpc (+4 new). Unblocks F12 (NmxClient::create) — the auto-resolving COM-activation factory can now chain `marshal_activated_iunknown_objref` → `ComObjRef::parse` → `resolve_oxid_with_managed_ntlm_packet_integrity` → `RemQueryInterface` over the existing primitives. diff --git a/rust/crates/mxaccess-nmx/src/client.rs b/rust/crates/mxaccess-nmx/src/client.rs index f3e69e0..eddbd05 100644 --- a/rust/crates/mxaccess-nmx/src/client.rs +++ b/rust/crates/mxaccess-nmx/src/client.rs @@ -127,7 +127,11 @@ fn fresh_orpc_this() -> OrpcThis { /// expects (`cs:540-561`). The host is everything before the **last** `[`, /// the port is the decimal text between that `[` and the **last** `]`. /// -/// Used by [`NmxClient::create`] only. +/// Used by [`NmxClient::create`] only — gated on `windows-com`. +#[cfg_attr( + not(all(windows, feature = "windows-com")), + allow(dead_code) +)] fn parse_bracketed_host_port(binding: &str) -> Result<(String, u16), NmxClientError> { let open = binding.rfind('[').ok_or_else(|| NmxClientError::EndpointResolution { reason: format!("binding {binding:?} has no '['"), diff --git a/rust/crates/mxaccess/Cargo.toml b/rust/crates/mxaccess/Cargo.toml index 98515ed..dd146fe 100644 --- a/rust/crates/mxaccess/Cargo.toml +++ b/rust/crates/mxaccess/Cargo.toml @@ -36,6 +36,9 @@ serde = ["mxaccess-codec/serde"] # `live` gates integration tests that hit a running AVEVA install. Driven by # the `MX_LIVE` env var via `tools/Setup-LiveProbeEnv.ps1`. live = [] +# Pulls F12's `Session::connect_nmx_auto` constructor — the auto-resolving +# COM-activation path. Propagates to `mxaccess-nmx/windows-com`. +windows-com = ["mxaccess-nmx/windows-com"] [lints] workspace = true diff --git a/rust/crates/mxaccess/src/session.rs b/rust/crates/mxaccess/src/session.rs index 38ab907..2a0302a 100644 --- a/rust/crates/mxaccess/src/session.rs +++ b/rust/crates/mxaccess/src/session.rs @@ -393,14 +393,18 @@ pub(crate) async fn callback_router( } impl Session { - /// Open a session over the NMX transport. Mirrors the wire-side of - /// `MxNativeSession.Open` (`MxNativeSession.cs:127-147`) — `Open` - /// itself is .NET-side: COM-activates `NmxSvc.NmxService`, marshals - /// an OBJREF, calls ResolveOxid + RemQI to discover `(host, port, - /// service_ipid)`, then calls `RegisterEngine2`. The Rust port - /// requires the caller to pre-resolve those because COM activation - /// is not yet wired (followup F12); the call sequence after that is - /// identical. + /// Open a session over the NMX transport with caller-supplied + /// `(addr, service_ipid)`. Mirrors the wire-side of + /// `MxNativeSession.Open` (`MxNativeSession.cs:127-147`). + /// + /// The .NET shape of `Open` includes a COM-activation pre-step + /// (`NmxSvc.NmxService` → `CoMarshalInterface` → `ResolveOxid` → + /// `IRemUnknown::RemQueryInterface`) that auto-resolves + /// `(host, port, service_ipid)` before `RegisterEngine2`. This + /// constructor takes the resolved triple by hand — useful for + /// tests, deterministic probes, and non-Windows builds. Use + /// [`Self::connect_nmx_auto`] (Windows + `windows-com` feature) to + /// drive the auto-resolving path. /// /// On success: a `RegisterEngine2` round-trip has completed and the /// LMX server has acknowledged the engine registration. The @@ -421,7 +425,66 @@ impl Session { recovery: RecoveryPolicy, ) -> Result { recovery.validate()?; + let nmx = NmxClient::connect(addr, service_ipid, ntlm) + .await + .map_err(map_nmx)?; + Self::from_nmx_client(nmx, options, resolver).await + } + /// Auto-resolving NMX session bring-up. Mirrors `MxNativeSession.Open`'s + /// .NET shape (`MxNativeSession.cs:127-147`) end-to-end — including + /// the COM-activation step that + /// [`Self::connect_nmx`] expects the caller to have done by hand. + /// + /// Internally this drives [`mxaccess_nmx::NmxClient::create`] (F12) + /// to discover `(host, port, service_ipid)` via: + /// `marshal_activated_iunknown_objref("NmxSvc.NmxService")` → + /// `ComObjRef::parse` → `ResolveOxid` against `127.0.0.1:135` → + /// `IRemUnknown::RemQueryInterface` → bind `INmxService2`. Then + /// runs the same post-NMX setup as [`Self::connect_nmx`] + /// (`CallbackExporter`, router task, `RegisterEngine2`, optional + /// heartbeat). Only available on Windows with the `windows-com` + /// feature. + /// + /// `ntlm_factory` is invoked **three times** (once per bind in the + /// COM-activation chain). Callers typically capture credentials + /// from `MX_TEST_*` env vars and produce a fresh + /// [`NtlmClientContext`] each call. + /// + /// # Errors + /// + /// [`Error::Connection`] for any failure in the COM-activation / + /// OXID-resolution / RemQI chain (mapped from + /// `NmxClientError::Activation` and `NmxClientError::EndpointResolution`), + /// plus the same failures `connect_nmx` can return after the NMX + /// client is built. + #[cfg(all(windows, feature = "windows-com"))] + pub async fn connect_nmx_auto( + ntlm_factory: impl FnMut() -> NtlmClientContext, + options: SessionOptions, + resolver: Arc, + recovery: RecoveryPolicy, + ) -> Result { + recovery.validate()?; + let nmx = NmxClient::create(ntlm_factory).await.map_err(map_nmx)?; + Self::from_nmx_client(nmx, options, resolver).await + } + + /// Shared post-NMX-bind orchestration: stand up the + /// `CallbackExporter` + router task, call `RegisterEngine2` with + /// the local callback OBJREF, optionally configure heartbeats, and + /// assemble [`SessionInner`]. + /// + /// Mirrors steps 1, 2, 4, 5 of the .NET `MxNativeSession.Open` + /// shape (`cs:148-170`). Both [`Self::connect_nmx`] and + /// [`Self::connect_nmx_auto`] funnel through here so the + /// callback-exporter / register / heartbeat policy stays in one + /// place. + async fn from_nmx_client( + mut nmx: NmxClient, + options: SessionOptions, + resolver: Arc, + ) -> Result { // 1. Bind a local CallbackExporter on an OS-assigned ephemeral // port, then build the OBJREF advertising it. Hostname comes // from `local_hostname()` (env-var lookup); falls back to @@ -447,12 +510,7 @@ impl Session { let (callback_tx, _) = broadcast::channel(CALLBACK_BROADCAST_CAPACITY); let router_handle = tokio::spawn(callback_router(callback_events, callback_tx.clone())); - // 3. Open the NMX transport + bind. - let mut nmx = NmxClient::connect(addr, service_ipid, ntlm) - .await - .map_err(map_nmx)?; - - // 4. RegisterEngine2 with the callback OBJREF. Mirrors cs:163-175. + // 3. RegisterEngine2 with the callback OBJREF. Mirrors cs:163-175. let hr = nmx .register_engine_2( options.local_engine_id, @@ -468,7 +526,7 @@ impl Session { return Err(Error::Connection(ConnectionError::EngineNotRegistered)); } - // 5. Optional heartbeat-interval setup (cs:165-167). + // 4. Optional heartbeat-interval setup (cs:165-167). if let Some(ticks) = options.heartbeat_ticks_per_beat { let hr = nmx .set_heartbeat_send_interval(ticks, options.heartbeat_max_missed_ticks)