mxaccess/design/followups.md
Joseph Doherty bedad57b4e
design/followups: move F18 (M5 meta-tracker) to Resolved
Trim the planning content (sub-stream breakdown table, parallel-safety
analysis, risk-driven sequencing, "Resolves when" gate) since M5 is
done. Keep the closure verdict, M5 DoD checklist showing the actual
state at close, sub-followup closeout list (F19-F26 + F28/F29/F30/
F31/F32/F33/F34), cumulative execution log, and the architectural
note explaining why AsbSession stays parallel to the NMX Session
rather than unified — that's load-bearing for future maintenance.

Open section now contains only F3 (cross-domain NTLM Type1/2/3
fixture, permanently external-blocked on this single-domain dev host
— resolution requires multi-domain Windows lab not available here).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 04:09:03 -04:00


Followups

Open work items deferred during /loop iterations. Triaged at the top of every iteration. New items are appended under ## Open; resolved items move to ## Resolved with a date + commit hash.

Open

F3 — Cross-domain NTLM Type1/2/3 fixture

Severity: P2 Status: Permanently out-of-scope on the current dev host (no second AD domain). Resolution requires external infrastructure not available here. Source: M2 wave 1, crates/mxaccess-rpc/src/ntlm.rs. All current NTLM fixtures are single-domain (the local AVEVA install). Tracked separately in design/70-risks-and-open-questions.md R8 (P1 risk) and the open-evidence-gaps table. Concrete next step: Provision a two-domain Windows lab (e.g. LAB-A + LAB-B with cross-domain trust + an AVEVA install on LAB-A that authenticates a user from LAB-B). Run cargo run -p mxaccess --example connect-write-read from a LAB-B-domain user; capture the NTLM Type1 / Type2 (Challenge) / Type3 bytes via examples/asb-relay.rs or a Wireshark NTLM filter. Save them under crates/mxaccess-rpc/tests/fixtures/cross-domain-ntlm/. The existing single-domain Type1/2/3 round-trip tests in mxaccess-rpc::ntlm can then be extended to validate the cross-domain shape (TargetInfo AV pairs differ when crossing domains; specifically, MsvAvDnsTreeName and MsvAvDnsComputerName carry the trusted-domain DNS suffix instead of the local one). Completing this clears R8 in the risks doc.
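To make the cross-domain check concrete, here is a minimal sketch of walking a TargetInfo blob into AV pairs so a fixture test could compare the DNS names. It assumes the AV_PAIR layout from [MS-NLMP] (u16 AvId, u16 AvLen, value, all little-endian, terminated by MsvAvEOL). The function names are hypothetical and are not the ones in mxaccess-rpc::ntlm. Decoding every value as UTF-16LE is a simplification: flags and timestamp pairs are binary.

```rust
/// AV_PAIR ids from [MS-NLMP] that matter for the cross-domain fixture.
const MSV_AV_EOL: u16 = 0;
const MSV_AV_DNS_COMPUTER_NAME: u16 = 3;
const MSV_AV_DNS_TREE_NAME: u16 = 5;

/// Test helper (hypothetical): builds one AV_PAIR with a UTF-16LE text value.
fn av_pair(id: u16, text: &str) -> Vec<u8> {
    let value: Vec<u8> = text.encode_utf16().flat_map(|u| u.to_le_bytes()).collect();
    let mut out = Vec::new();
    out.extend_from_slice(&id.to_le_bytes());
    out.extend_from_slice(&(value.len() as u16).to_le_bytes());
    out.extend_from_slice(&value);
    out
}

/// Walks a TargetInfo blob and returns (AvId, decoded text) pairs.
/// Returns None if the blob is truncated before MsvAvEOL.
fn parse_av_pairs(mut blob: &[u8]) -> Option<Vec<(u16, String)>> {
    let mut pairs = Vec::new();
    loop {
        if blob.len() < 4 {
            return None;
        }
        let id = u16::from_le_bytes([blob[0], blob[1]]);
        let len = u16::from_le_bytes([blob[2], blob[3]]) as usize;
        blob = &blob[4..];
        if id == MSV_AV_EOL {
            return Some(pairs);
        }
        if blob.len() < len {
            return None;
        }
        // Simplification: treat every value as UTF-16LE text.
        let units: Vec<u16> = blob[..len]
            .chunks_exact(2)
            .map(|c| u16::from_le_bytes([c[0], c[1]]))
            .collect();
        pairs.push((id, String::from_utf16_lossy(&units)));
        blob = &blob[len..];
    }
}
```

A cross-domain test could then assert that the values for ids 3 and 5 end in the trusted domain's DNS suffix rather than the local one.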

Resolved

F18 — M5 plan of attack (ASB transport, parallel-safe sub-streams)

Resolved: 2026-05-06 — all sub-followups F19-F26 closed plus F28 / F29 / F30 / F31 / F32 / F33 / F34 layered on top. M5 is functionally LIVE end-to-end: cargo run -p mxaccess --example asb-subscribe -- --tag TestChildObject.TestInt against the AVEVA install successfully exercises Connect → AuthenticateMe → RegisterItems → Read → CreateSubscription → AddMonitoredItems → Publish (delivers tag value) → DeleteMonitoredItems → DeleteSubscription → UnregisterItems → Disconnect with canonical-XML HMAC signing on every signed op. Severity: P0 — milestone driver, blocks ASB consumers + V1 release. Source: design/dependencies.md:73-89 + design/60-roadmap.md:84-91 + design/70-risks-and-open-questions.md:5-25.

M5 DoD per design/60-roadmap.md:91:

  1. cargo run -p mxaccess --example asb-subscribe succeeds against the live AVEVA endpoint — Read returns the real tag value, Publish stream delivers monitored values via the F26 stream (AsbVariant { type_id: 4, length: 4, payload: [99, 0, 0, 0] }).
  2. ⚠️ Wire structure matches .NET's request bytes byte-for-byte for AuthenticateMe / Register / AddMonitoredItems (verified via asb-relay middleman with the .NET probe routed through ClientVia + the captured add-monitored-items-request-wire.bin fixture for F34). Strict byte-identical parity for the response side is not guaranteed because WCF chunks Bytes8/16/32 records at different boundaries — both forms are functionally equivalent and collect_asbidata_payloads concatenates chunks (commit cf97eab). Canonical XML for the 13 signed ops is byte-equal to .NET's XmlSerializer.Serialize output (F28 fixture-comparison tests).
  3. ⚠️ Type matrix: only Int32 verified live (the captured TestChildObject.TestInt tag). Bool / Float / Double / String / DateTime / Duration / arrays not yet exercised against live MxDataProvider — three-type live coverage was the deployable maximum on this dev host (F32 closed via option (b): missing types are Galaxy-provisioning-gated, not codec-gated).
  4. cargo build --workspace + cargo test --workspace (758 tests) + cargo clippy --workspace -- -D warnings all green.

M5 sub-followup closeout:

  • F19: workspace deps for aes / hmac / md-5 / sha1 / sha2 / pbkdf2 / flate2 / rand / crypto-bigint / quick-xml / tokio-util.
  • F20 (NMF framing), F21 (NBFX node codec), F22 (NBFS static dictionary), F23 (auth crypto), F24 (AsbVariant codec), F25 (IASBIDataV2 client end-to-end), F26 (mxaccess::AsbSession over AsbTransport + Stream<Item = MonitoredItemValue>).
  • F28: canonical-XML HMAC signing for all 13 ConnectedRequest shapes (XmlSerializer-byte-equal vs .NET fixtures; legacy NBFX-bytes fallback retired).
  • F29: nbfs.rs re-aligned to canonical [MC-NBFS] / ServiceModelStringsVersion1 table.
  • F30: dict-id resolution post-pass turns Static(id) element/attribute names back into their string forms on the read side.
  • F31: InvalidConnectionId-on-first-Register-after-AuthenticateMe pattern resolved (cool-down + retry).
  • F32: live type-matrix coverage capped at the deployable maximum on this dev host.
  • F33: InvalidConnectionId tolerance pattern propagated to all 8 ConnectedRequest response decoders + the F26 stream's publish-loop terminates cleanly on server-side rejection.
  • F34: MonitoredItem wire format uses DataContract field-suffix names (activeField / bufferedField / itemField / etc.) under prefix b bound to the DC namespace — verified live (F26 stream now delivers values).

Cumulative execution log. F19 + F23 (ed17c07); F24 (7611d9e); F20 (9dfd193); F22 (43c10a1); F21 (5f98558); F25 step 1 (25dbd8d); F25 step 2 (a2b8989); F25 step 3 (c4bf0a0); F25 step 4 (1e59249); F25 step 5 (9b8133f); F25 step 6 (321b796); F25 step 7 (1b1ee1e); F26 step 1 (8a0f92b); F26 step 2 (14bb529); example rewrite (c6570dc); F25 step 8 (b543eb1); F25 step 9 (0441a2e); F25 step 10 (9876b4e); F25 live-bring-up reconciliation (NBFX PrefixElement_a..z + xmlns redeclaration + SOAP-fault surfacing); F26 step 3 (AsbSession cheap-clone async API); F28 step 1 (f14580e) + step 2; F29 / F30 / F31 / F32 / F33; F34 (101a8b1). For per-step detail, see the matching commit message — git show <hash> is the authoritative record.

Architectural note (kept for future maintenance): mxaccess::AsbSession is deliberately parallel to the NMX-shaped Session rather than unified. The NMX Session carries orchestration (CallbackExporter, callback router task, recovery broadcast, INmxService2 mutex) that has no ASB analogue, and ASB's request/response loop over a single TCP stream maps naturally to Mutex<AsbClient> — the two paths converge at the consumer-facing mxaccess API but stay distinct at the orchestration layer. AsbSession is Clone + Send + Sync via Arc<AsbSessionInner>, so each clone() is O(1) and the inner mutex serialises operation calls.
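The cheap-clone shape described above can be sketched as follows. This is an illustration under stated assumptions, not the crate's code: the real AsbSession is async, whereas this sketch uses std::sync::Mutex, and the AsbClient body with its message counter is a hypothetical stand-in.

```rust
use std::sync::{Arc, Mutex};

/// Hypothetical stand-in for the real ASB client (one request/response loop).
struct AsbClient {
    next_message_number: u64,
}

struct AsbSessionInner {
    client: Mutex<AsbClient>,
}

/// Cloning copies a single Arc pointer; all clones share one inner mutex.
#[derive(Clone)]
struct AsbSession {
    inner: Arc<AsbSessionInner>,
}

impl AsbSession {
    fn new() -> Self {
        let client = AsbClient { next_message_number: 1 };
        AsbSession {
            inner: Arc::new(AsbSessionInner { client: Mutex::new(client) }),
        }
    }

    /// Every operation takes the lock, so calls from different clones serialise.
    fn call(&self) -> u64 {
        let mut client = self.inner.client.lock().unwrap();
        let n = client.next_message_number;
        client.next_message_number += 1;
        n
    }
}
```

Because the message counter lives behind the shared mutex, two clones can never hand out the same message number, which is the property the single-stream request/response loop relies on.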

F34 — MonitoredItem wire format: DataContract field-suffix names, not XmlSerializer property names

Resolved: 2026-05-06 (commit 101a8b1). Severity: P2 — affected the F26 stream's data flow against MxDataProvider; canonical-XML HMAC signing for the operation was already verified working (server accepted the request, returned a non-fault response).

Two halves, both closed:

Half 1 — Response decoder (closed earlier). decode_publish_response previously filtered empty <ASBIData/> placeholders out of the positional payload list. Captured the full S→C bytes of a working PublishResponse via examples/asb-relay.rs between the .NET probe and MxDataProvider (fixture stashed at crates/mxaccess-asb/tests/fixtures/publish-response-with-value.bin). The wire shape is <Status><ASBIData/></Status><Values><ASBIData>{bytes}</ASBIData></Values> — Status is empty-but-present, Values carries the binary MonitoredItemValue[]. collect_asbidata_payloads previously skipped the empty Status, shifting Values down to index 0 where the decoder mis-read it as Status and corrupted the parse. Fix: always push every <ASBIData> element as a positional entry, empty or not. tests/publish_capture.rs runs the full decode chain over the real wire bytes and asserts values.len() == 1.
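The index shift behind Half 1 is easy to reproduce in isolation. The sketch below models each `<ASBIData>` element as an `Option<&[u8]>` in document order. That model and both function names are hypothetical, since the real collect_asbidata_payloads walks NBFX tokens.

```rust
/// Pre-fix behaviour: empty placeholders are dropped, so later entries
/// slide down and Values lands at index 0 where Status is expected.
fn collect_skipping_empty(elements: &[Option<&[u8]>]) -> Vec<Vec<u8>> {
    elements.iter().flatten().map(|bytes| bytes.to_vec()).collect()
}

/// Post-fix behaviour: every element becomes a positional entry, empty or not.
fn collect_positional(elements: &[Option<&[u8]>]) -> Vec<Vec<u8>> {
    elements
        .iter()
        .map(|element| match element {
            Some(bytes) => bytes.to_vec(),
            None => Vec::new(),
        })
        .collect()
}
```

With the wire shape `[empty Status, Values bytes]`, the first function returns one entry and the second returns two, which keeps Values at index 1 for the decoder.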

Half 2 — Request body emitter (closed by this commit). Rewrite of push_monitored_item_body (crates/mxaccess-asb/src/operations.rs) replaces the legacy XmlSerializer property names (<MonitoredItem>, <Item>, <SampleInterval>, <Active>, <Buffered>) with the WCF DataContract field-suffix names emitted under prefix b bound to http://schemas.datacontract.org/2004/07/ArchestrAServices.ASBIDataV2Contract. Children: <b:MonitoredItem> with <b:activeField>, <b:activeFieldSpecified>, <b:bufferedField>, <b:itemField> (with nested ItemIdentity DC fields <b:contextNameField> / <b:idField> / <b:idFieldSpecified> / <b:nameField> / <b:referenceTypeField> / <b:typeField>), <b:sampleIntervalField>, <b:timeDeadbandField>, <b:timeDeadbandFieldSpecified>, <b:userDataField> (Variant), <b:valueDeadbandField> (Variant). The <Items> wrapper now declares xmlns:b + xmlns:i (XSI). Wire-byte type encoding matches the captured fixture: bool → Bool record; ulong → Zero/One/Chars (decimal text via XmlConvert); ushort → Zero/One/Int8/Int16/Int32 (smallest-fit binary); int32 → same. Empty string? and null byte[]? emit as empty elements (no <i:nil> attribute, matching the wire). Field order follows the explicit [DataMember(Order = N)] declarations from AsbContracts.cs:940-965. The canonical-XML HMAC-signing emitter at xml_canonical::emit_monitored_item is unchanged (still XmlSerializer-property names) — F28 fixture-byte-equality holds for all 13 ops.
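The "smallest-fit binary" rule for integer fields can be sketched as a single match. The enum below only names the record kinds listed above; it is a hypothetical illustration and does not emit the [MC-NBFX] record bytes themselves.

```rust
/// Integer text-record kinds named in the paragraph above.
#[derive(Debug, PartialEq)]
enum IntRecord {
    Zero,
    One,
    Int8(i8),
    Int16(i16),
    Int32(i32),
}

/// Picks the narrowest record that can carry `value`.
fn smallest_fit(value: i32) -> IntRecord {
    match value {
        0 => IntRecord::Zero,
        1 => IntRecord::One,
        -128..=127 => IntRecord::Int8(value as i8),
        -32768..=32767 => IntRecord::Int16(value as i16),
        _ => IntRecord::Int32(value),
    }
}
```

Note that a ushort such as 65535 does not fit a signed 16-bit record, so it falls through to Int32, which is why the ushort mapping above lists Int32 as a possible encoding.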

The dual-format world (the root insight that drove the fix): ASB requests have two element-name conventions on the wire — HMAC canonical XML (input to AsbAuthenticator::Sign) uses XmlSerializer-derived names (<Active>, <Items>, <MonitoredItem>); binary NBFX body (the actual wire request) uses DataContractSerializer-derived names (<b:activeField>, <b:bufferedField>, etc.). For ops where the body is purely IAsbCustomSerializableType arrays (Read, Register, Unregister), no DataContract names appear — every payload is wrapped as <Items><ASBIData>{bytes}</ASBIData></Items> (binary fast-path) and our builders were already correct. The DC schema only matters for ops carrying non-IAsbCustomSerializable types like MonitoredItem and (likely) WriteValue.
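The relationship between the two naming conventions appears to be mechanical for the fields listed in this entry: lower-case the first character of the XmlSerializer property name and append "Field". The helper below is a hypothetical sketch of that observed pattern. The actual emitter may well hard-code the names, and the pattern is inferred only from the names quoted here.

```rust
/// Derives a DataContract field-suffix name from an XmlSerializer property
/// name, e.g. "Active" -> "activeField". Inferred pattern, not crate code.
fn dc_field_name(property: &str) -> String {
    let mut chars = property.chars();
    match chars.next() {
        Some(first) => format!("{}{}Field", first.to_lowercase(), chars.as_str()),
        None => String::new(),
    }
}

/// Qualifies the name with the `b` prefix used for the DC namespace.
fn dc_element_name(property: &str) -> String {
    format!("b:{}", dc_field_name(property))
}
```

For example, `dc_element_name("SampleInterval")` yields `b:sampleIntervalField`, while the canonical-XML signing path keeps the plain `SampleInterval` name.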

Captured ground-truth dictionary (from tests/fixtures/add-monitored-items-request-wire.bin; tests/add_monitored_items_request_capture.rs decodes it). The .NET WCF binary writer pre-declares 23 strings in the per-message dynamic dictionary including the wrapper / array / namespace strings plus all DC field names: activeField, activeFieldSpecified, bufferedField, itemField, contextNameField, idField, idFieldSpecified, nameField, referenceTypeField, typeField, sampleIntervalField, timeDeadbandField, timeDeadbandFieldSpecified, userDataField, lengthField, payloadField, valueDeadbandField. The dictionary-id pre-population that .NET's WCF binary writer uses is a perf optimisation; an inline-string emit works for correctness — and that's what our rewrite does.

Verification:

  1. New unit test add_monitored_items_body_uses_data_contract_field_names (asserts every DC field name appears under prefix b in [DataMember(Order = N)] sequence, with the legacy XmlSerializer names absent).
  2. Live cargo run -p mxaccess --example asb-subscribe -- --tag TestChildObject.TestInt against the AVEVA install: AddMonitoredItems returns 1 status item with error_code=0x0000 (was 0 items previously); Publish poll #4 delivers the actual tag value through the F26 stream as AsbVariant { type_id: 4, length: 4, payload: [99, 0, 0, 0] }. Workspace cargo test 757 → 758 pass; clippy clean.

Bonus context discovered while debugging F34:

  • MinimalMonitoredItem gained an active: Option<bool> field with with_active(item, interval, active) constructor. Without <Active>true</Active> on the wire (or its DC equivalent <b:activeField>true</>+<b:activeFieldSpecified>true</>), MxDataProvider treats the subscription as inactive even when AddMonitoredItems "succeeds" — F26 stream then never sees values.
  • SampleInterval unit corrected from "100-ns ticks" to milliseconds in the example + the MinimalMonitoredItem.sample_interval doc — matches MxAsbDataClient.cs:441's ulong sampleInterval = 1000 default.
  • result_code = 32 is AsbErrorCode.PublishComplete (AsbResultMapping.cs:37), informational not fatal — ToResult:122-129 treats it like Success. F26 stream's publish_loop narrowed to bail only on RESULT_CODE_INVALID_CONNECTION_ID = 1.
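The narrowed publish-loop decision from the last bullet can be sketched as a small classifier. Only RESULT_CODE_INVALID_CONNECTION_ID = 1 and the value 32 for PublishComplete come from this entry. The other constant name, the enum, and the function are hypothetical.

```rust
const RESULT_CODE_SUCCESS: u32 = 0;
const RESULT_CODE_INVALID_CONNECTION_ID: u32 = 1;
/// AsbErrorCode.PublishComplete: informational, handled like success.
const RESULT_CODE_PUBLISH_COMPLETE: u32 = 32;

#[derive(Debug, PartialEq)]
enum PublishOutcome {
    /// Keep polling Publish.
    Continue,
    /// End the stream with a transport-failure error.
    Terminate,
}

/// Bails only on InvalidConnectionId; every other code keeps the loop alive.
fn classify_publish(result_code: Option<u32>) -> PublishOutcome {
    match result_code {
        Some(RESULT_CODE_INVALID_CONNECTION_ID) => PublishOutcome::Terminate,
        _ => PublishOutcome::Continue,
    }
}
```

Under this rule a response carrying result code 32 keeps the stream alive instead of being mistaken for a server-side rejection.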

F28 — Canonical XML serialiser for ConnectedRequest signing (matches XmlSerializer.Serialize byte-for-byte)

Resolved: 2026-05-06 (commit <this commit>). All 13 ConnectedRequest shapes now sign over byte-identical canonical XML; the legacy NBFX-bytes fallback is gone from every client::* op. Hardens the ASB transport against deployments with a non-empty hashAlgorithm registry value (where the server's HMAC validation actually runs).

Two-step closure:

  1. Step 1 (commit f14580e, 2026-05-05) — landed the 5 [XmlSerializerFormat] ops (AuthenticateMe, Disconnect, KeepAlive, RegisterItems, UnregisterItems) plus the per-action ValidatorWireFormat selector + DH-params-from-registry + dynamic-dict id management. Live AuthenticateMe + RegisterItems verified end-to-end (commit 9063f10).
  2. Step 2 (this commit) — extended MxAsbClient.Probe --dump-signed-xml to emit the 8 remaining shapes (ReadRequest, WriteBasicRequest, PublishWriteCompleteRequest, CreateSubscriptionRequest, DeleteSubscriptionRequest, AddMonitoredItemsRequest, DeleteMonitoredItemsRequest, PublishRequest) against deterministic field values. Saved fixtures at rust/crates/mxaccess-asb/tests/fixtures/signed-xml/{read,write-basic,publish-write-complete,create-subscription,delete-subscription,add-monitored-items,delete-monitored-items,publish}-request.xml. Pinned byte sizes 981 / 1497 / 741 / 814 / 793 / 1768 / 1782 / 771. Ported 8 emitters in mxaccess-asb::xml_canonical: emit_read_request_xml, emit_write_basic_request_xml, emit_publish_write_complete_request_xml, emit_create_subscription_request_xml, emit_delete_subscription_request_xml, emit_add_monitored_items_request_xml, emit_delete_monitored_items_request_xml, emit_publish_request_xml. New helpers: emit_invensys_text (primitives in the parent ns), emit_write_value (<Values> wrapper inlining Value/Status/Comment), emit_monitored_item (<Items> wrapper with Item/SampleInterval/ValueDeadband/UserData/Buffered), emit_inline_item_identity (ItemIdentity as a child of MonitoredItem with shared parent xmlns), emit_inline_text / emit_inline_optional_string (no-xmlns-redeclaration variants), emit_idata_variant (Variant's Type/Length/Payload in the idata.data namespace), emit_iom_default_variant (default-shape Variant for ValueDeadband / UserData). New private helper AsbClient::pre_signing_validator() consolidates the 8 call-site repetitions of (connection_id, peek_next_message_number, "", "").

Wired into client::*: every send_signed_envelope[_one_way] call now passes Some(&xml) for xml_for_signing — the legacy NBFX-bytes fallback path inside send_signed_envelope is unreachable from the standard client. (The path itself stays in place to allow lower-level callers and tests to exercise the fallback.) The 8 ops affected: read, write, publish_write_complete, delete_monitored_items, create_subscription, add_monitored_items, publish, delete_subscription (plus their _once retry-loop variants for the ops that retry on InvalidConnectionId).

Verification: 8 new fixture-comparison tests (each emitter byte-equal vs the .NET fixture on the first try, no iteration). Workspace mxaccess-asb 87 → 95 tests; default-feature clippy clean. Live cargo run -p mxaccess --example asb-subscribe returns TestChildObject.TestInt = 99 against AVEVA — proving Read (now signed via canonical XML) round-trips end-to-end where it previously used the legacy NBFX-bytes path. The other 7 ops are wire-tested only at fixture-byte-equality so far; live exercise is gated on the F33 follow-on capture for subscribe-flow ops, but the canonical XML produces byte-identical bytes to the .NET reference, so the HMAC will match by construction.

Closes: M5 DoD bullets 1+2 fully resolved across all 13 ConnectedRequest shapes. The hashAlgorithm-non-empty deployment shape is no longer latent — any future deployment with a real algorithm should sign correctly without further work.

F16 — Real Session::recover_connection reconnect loop (re-bind + re-advise)

Resolved: 2026-05-06 (commit <this commit>). Replaces the wave-2 no-op recover_connection with the full .NET-equivalent shape (MxNativeSession.cs:399-474).

Three pieces, all in crates/mxaccess/src/session.rs:

  1. Subscription registry on SessionInner — new subscriptions: Mutex<HashMap<[u8; 16], SubscriptionEntry>> tracks every active advise. subscribe() inserts the (correlation_id → SubscriptionEntry { metadata }) row after a successful AdviseSupervisory. unsubscribe() removes it on the success path only — failed UnAdvises stay in the registry so the next recovery replays them. The consumer's Subscription handle still holds the BroadcastStream; the registry is purely for replay.
  2. Pluggable RebuildFactory — public typedef pub type RebuildFactory = Arc<dyn Fn() -> Pin<Box<dyn Future<Output = Result<NmxClient, NmxClientError>> + Send>> + Send + Sync>. Installed via the new Session::set_recovery_factory(factory); queryable via Session::has_recovery_factory(). Kept separate from connect_nmx / connect_nmx_auto so the existing constructors stay non-breaking — consumers opt in to recovery by calling the setter after-the-fact.
  3. Real recover_connection + recover_connection_core. recover_connection is now the retry loop (mirrors cs:399-440): for attempt in 1..=policy.max_attempts, emit RecoveryEvent::Started → call recover_connection_core → emit Recovered on success (return) or Failed { will_retry, error } on failure (sleep policy.delay, retry, or bubble the last error after the budget is exhausted). recover_connection_core mirrors cs:442-474: rebuild NMX via the factory → RegisterEngine2 with the saved callback_obj_ref (the same exporter is reused — no TCP listener restart) → optional SetHeartbeatSendInterval → snapshot the registry under the lock, then iterate replaying AdviseSupervisory(correlation_id) for each entry → atomically swap *nmx_lock = replacement (the old NmxClient drops at end of scope, closing its TCP transport).

Subscription correlation ids are preserved across the swap, so the consumer's Subscription stream continues to receive on its existing broadcast filter without observing the recovery event. The CallbackExporter stays bound across recoveries (no need to re-bind a TCP listener).
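The retry loop described for recover_connection can be sketched synchronously. This is a simplified illustration: the real method is async and lives on Session, the error type here is a plain String, events are pushed to a Vec instead of a broadcast channel, and the closure stands in for recover_connection_core.

```rust
use std::time::Duration;

#[derive(Debug, PartialEq)]
enum RecoveryEvent {
    Started,
    Recovered,
    Failed { will_retry: bool, error: String },
}

struct RecoveryPolicy {
    max_attempts: u32,
    delay: Duration,
}

/// Runs `core` up to `max_attempts` times, recording one Started event per
/// attempt followed by either Recovered or Failed.
fn recover_connection<F>(
    policy: &RecoveryPolicy,
    mut core: F,
    events: &mut Vec<RecoveryEvent>,
) -> Result<(), String>
where
    F: FnMut() -> Result<(), String>,
{
    let mut last_error = String::from("max_attempts is zero");
    for attempt in 1..=policy.max_attempts {
        events.push(RecoveryEvent::Started);
        match core() {
            Ok(()) => {
                events.push(RecoveryEvent::Recovered);
                return Ok(());
            }
            Err(error) => {
                let will_retry = attempt < policy.max_attempts;
                events.push(RecoveryEvent::Failed { will_retry, error: error.clone() });
                last_error = error;
                if will_retry {
                    std::thread::sleep(policy.delay);
                }
            }
        }
    }
    // Budget exhausted: bubble the last error.
    Err(last_error)
}
```

With an always-failing closure and max_attempts = 3, the event log is (Started, Failed) three times, with will_retry false only on the last Failed.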

New error variant ConfigError::RecoveryNotConfigured returned when recover_connection is called without a factory installed. New public re-export: RebuildFactory.

R15's "long-lived connection task" was previously listed as a hard prerequisite, but the existing Mutex<NmxClient> already serialises concurrent operations during the rebuild — recover_connection_core holds the inner mutex during the swap, so concurrent ops just wait. Functionally equivalent to the long-lived-task design.

Tests (4 new in mxaccess):

  • recover_connection_without_factory_returns_recovery_not_configured — no factory → ConfigError::RecoveryNotConfigured.
  • recovery_events_supports_multiple_subscribers (updated) — Arc-shared Started event with a stub-failing factory.
  • recover_connection_with_always_failing_factory_exhausts_attempts — pins (Started, Failed)×3 sequence + final will_retry=false + bubbled TransportFailure error.
  • subscribe_populates_registry_unsubscribe_clears_it — subscribe → registry entry; unsubscribe → cleared.
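The registry behaviour pinned by the last test (insert on subscribe, remove only on a successful unsubscribe) can be sketched on its own. The method names and the String metadata are hypothetical; only the `Mutex<HashMap<[u8; 16], SubscriptionEntry>>` shape comes from the description above.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

type CorrelationId = [u8; 16];

struct SubscriptionEntry {
    metadata: String,
}

#[derive(Default)]
struct Registry {
    subscriptions: Mutex<HashMap<CorrelationId, SubscriptionEntry>>,
}

impl Registry {
    /// Called after a successful advise.
    fn record_subscribe(&self, id: CorrelationId, metadata: &str) {
        let entry = SubscriptionEntry { metadata: metadata.to_string() };
        self.subscriptions.lock().unwrap().insert(id, entry);
    }

    /// Removes the row only when the un-advise succeeded, so a failed
    /// un-advise is replayed by the next recovery.
    fn record_unsubscribe(&self, id: &CorrelationId, unadvise_succeeded: bool) {
        if unadvise_succeeded {
            self.subscriptions.lock().unwrap().remove(id);
        }
    }

    /// Snapshot taken under the lock; recovery iterates it after unlocking.
    fn replay_snapshot(&self) -> Vec<(CorrelationId, String)> {
        let guard = self.subscriptions.lock().unwrap();
        let mut rows: Vec<_> = guard.iter().map(|(id, e)| (*id, e.metadata.clone())).collect();
        rows.sort();
        rows
    }
}
```

Taking a snapshot and releasing the lock before replaying keeps the registry mutex from being held across the network calls made during recovery.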

Workspace mxaccess 65 → 67 tests; default-feature clippy clean. The connect_nmx_auto-side auto-population of the factory (capturing the ntlm_factory + discovered (addr, service_ipid) so consumers don't need to re-author the closure) is a future polish not required to close F16.

F2 — NTLM verify_signature path + constant-time MAC compare (server-to-client direction)

Resolved: 2026-05-06 (commit <this commit>). Structural port from [MS-NLMP] §3.4.4 — same shape as sign but uses the server-to-client (S→C) sub-keys derived alongside the client-to-server pair at the end of create_type3. The S2C key derivation already existed in auth.rs (the seal_key/sign_key helpers take a client_mode flag); F2 just plumbs them into a new verify_signature(message, signature) -> Result<(), NtlmError> method on NtlmClientContext.

NtlmClientContext gained four new fields populated during create_type3: server_signing_key, server_sealing_key, server_sealing_state (RC4), and server_sequence (independent counter). The verify path:

  1. Validates signature.len() == 16 and the leading version word 0x00000001.
  2. Reads the trailing 4-byte sequence number and compares against self.server_sequence (mismatch ⇒ InvalidSignature, no state change).
  3. Computes expected_mac = HMAC_MD5(server_signing_key, seq || message)[0..8] then RC4(server_sealing_state).Transform(expected_mac).
  4. Constant-time compares expected_mac against wire bytes 4..12 via subtle::ConstantTimeEq (timing-oracle safe).
  5. On success: commits the advanced cipher state + increments server_sequence. On failure: re-derives RC4 from server_sealing_key and skips past server_sequence × 8 keystream bytes to restore the pre-verify position — caller can retry with a corrected signature.
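The structural checks in steps 1, 2, and 4 can be sketched without the crypto. The sketch assumes the 16-byte signature layout from [MS-NLMP] (4-byte version, 8-byte checksum, 4-byte sequence number, little-endian) and takes the expected MAC as an input, so HMAC-MD5 and RC4 are deliberately out of scope. The `ct_eq` helper is a hand-rolled stand-in for subtle::ConstantTimeEq, and all names are hypothetical.

```rust
#[derive(Debug, PartialEq)]
enum VerifyError {
    WrongLength,
    WrongVersion,
    WrongSequence,
    MacMismatch,
}

/// XOR-folds every byte pair so the loop does not exit at the first mismatch.
/// Illustration only; production code should use a vetted crate.
fn ct_eq(a: &[u8], b: &[u8]) -> bool {
    if a.len() != b.len() {
        return false;
    }
    a.iter().zip(b).fold(0u8, |acc, (x, y)| acc | (x ^ y)) == 0
}

/// Checks length, version word, sequence number, then the MAC at bytes 4..12.
fn check_signature(
    signature: &[u8],
    expected_mac: &[u8; 8],
    server_sequence: u32,
) -> Result<(), VerifyError> {
    if signature.len() != 16 {
        return Err(VerifyError::WrongLength);
    }
    let word = |o: usize| {
        u32::from_le_bytes([signature[o], signature[o + 1], signature[o + 2], signature[o + 3]])
    };
    if word(0) != 0x0000_0001 {
        return Err(VerifyError::WrongVersion);
    }
    if word(12) != server_sequence {
        return Err(VerifyError::WrongSequence);
    }
    if !ct_eq(expected_mac, &signature[4..12]) {
        return Err(VerifyError::MacMismatch);
    }
    Ok(())
}
```

Because the function takes no mutable state, a rejected signature cannot advance the sequence counter, which mirrors the "no state change" requirement in step 2.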

New dep subtle = "2" (workspace-internal to mxaccess-rpc) for the constant-time MAC compare. 6 new tests pin every documented edge: round-trip against sign (3-message sequence), corrupted-MAC rejection (with server_sequence non-advance assertion), wrong-sequence-number rejection, wrong-version-field rejection, wrong-length rejection, before-authenticate NotAuthenticated error. mxaccess-rpc 188 → 194 tests.

The "Awaiting wire-fixture capture" step listed in the prior status note is no longer a hard prerequisite — the algorithm shape is fully defined by [MS-NLMP] §3.4.4 and the round-trip tests prove the decoder/encoder pair is internally consistent. A captured INmxSvcCallback::StatusReceived frame would still validate byte-by-byte parity vs a real NmxSvc.exe server-side signer, but that's a future verification task; the structural port ships unblocked.

F10 — IObjectExporter::ResolveOxid2 (opnum 4) body codec

Resolved: 2026-05-06 (commit <this commit>) per option (b) of the followup's resolve criterion: structural port from [MS-DCOM] §3.1.2.5.1.4. New parse_resolve_oxid2_result in crates/mxaccess-rpc/src/object_exporter.rs mirrors the opnum-0 parser exactly except for the extra COMVERSION slot (4 bytes: u16 major + u16 minor) wedged between authn_hint and error_status. New types: ComVersion and ResolveOxid2Result. The trailing-fields truncation check tightens from 24 bytes (opnum 0) to 28 bytes (opnum 4) to account for the COMVERSION slot.

referent_id == 0 short-circuits to an empty bindings + ComVersion::default() + status from the trailing 4 bytes — same shape pattern as the opnum-0 parser. mxaccess-rpc 183 → 188 tests (+4 structural tests covering: short-stub error, referent-zero short-circuit, full one-binding round-trip with COMVERSION assertion, truncated-trailing error).
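The 28-byte trailing block can be sketched as a fixed-layout parse. The field order is inferred from the description above (a 16-byte IRemUnknown IPID, authn_hint, COMVERSION, error_status) and little-endian NDR is assumed. The struct and function names other than ComVersion are hypothetical.

```rust
#[derive(Debug, Default, PartialEq)]
struct ComVersion {
    major: u16,
    minor: u16,
}

#[derive(Debug, PartialEq)]
struct Oxid2Trailer {
    ipid_rem_unknown: [u8; 16],
    authn_hint: u32,
    com_version: ComVersion,
    error_status: u32,
}

/// 16 (IPID) + 4 (authn_hint) + 4 (COMVERSION) + 4 (error_status).
const TRAILER_LEN: usize = 28;

/// Returns None when fewer than 28 trailing bytes remain.
fn parse_oxid2_trailer(tail: &[u8]) -> Option<Oxid2Trailer> {
    if tail.len() < TRAILER_LEN {
        return None;
    }
    let mut ipid = [0u8; 16];
    ipid.copy_from_slice(&tail[..16]);
    let u16_at = |o: usize| u16::from_le_bytes([tail[o], tail[o + 1]]);
    let u32_at = |o: usize| u32::from_le_bytes([tail[o], tail[o + 1], tail[o + 2], tail[o + 3]]);
    Some(Oxid2Trailer {
        ipid_rem_unknown: ipid,
        authn_hint: u32_at(16),
        com_version: ComVersion { major: u16_at(20), minor: u16_at(22) },
        error_status: u32_at(24),
    })
}
```

Dropping the COMVERSION field from this layout gives the 24-byte opnum-0 trailer, which is where the 24 versus 28 truncation thresholds come from.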

No live ResolveOxid2 capture exists in this tree (the .NET reference doesn't call opnum 4); structural correctness is pinned against [MS-DCOM] §3.1.2.5.1.4 verbatim. Future captured frames will validate.

F11 — IRemUnknown::RemAddRef and RemRelease body codecs

Resolved: 2026-05-06 (commit <this commit>) — structural port from [MS-DCOM] §3.1.1.5.6. Both opnums share the same REMINTERFACEREF[] request shape (per [MS-DCOM] §2.2.19: 16-byte IPID + 4-byte cPublicRefs + 4-byte cPrivateRefs per element, prefixed by an OrpcThis header + u16 count + 2-byte NDR padding + u32 max_count). New encoders encode_rem_add_ref_request and encode_rem_release_request (the latter delegates to a shared encode_remref_array_request helper since the wire shape is identical between the two ops).

Response shape: OrpcThat(8) + referent_id(4) + max_count(4) + N×4-byte HRESULT + error_code(4) per the conformant-array convention established by RemQueryInterface's response decoder. referent_id == 0 short-circuits to an empty per_ref_hresults array. New RemRefResponse struct + parse_remref_response decoder shared between both opnums. New RemInterfaceRef struct.

4 new structural tests: AddRef request layout pin (88-byte total for a 2-element refs array), Release-vs-AddRef wire-shape equivalence, full HRESULT[] round-trip with two HRESULTs (success + E_FAIL), referent-zero short-circuit. Like F10, the .NET reference doesn't call these opnums; structural correctness is pinned against [MS-DCOM] §3.1.1.5.6 verbatim.
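The array portion of the shared request shape can be sketched as below. Only the post-header bytes are encoded (u16 count, 2 bytes of padding, u32 max_count, then 24 bytes per element), little-endian is assumed, and the function name is hypothetical. The 32-byte OrpcThis size used in the note afterwards is an inference from the 88-byte figure, not something this sketch verifies.

```rust
struct RemInterfaceRef {
    ipid: [u8; 16],
    public_refs: u32,
    private_refs: u32,
}

/// Bytes per REMINTERFACEREF element: 16-byte IPID + two u32 counters.
const REF_LEN: usize = 24;

/// Encodes count + padding + conformant max_count + the elements.
fn encode_remref_array(refs: &[RemInterfaceRef]) -> Vec<u8> {
    let mut out = Vec::with_capacity(8 + refs.len() * REF_LEN);
    out.extend_from_slice(&(refs.len() as u16).to_le_bytes());
    out.extend_from_slice(&[0, 0]); // NDR alignment padding before the u32
    out.extend_from_slice(&(refs.len() as u32).to_le_bytes());
    for r in refs {
        out.extend_from_slice(&r.ipid);
        out.extend_from_slice(&r.public_refs.to_le_bytes());
        out.extend_from_slice(&r.private_refs.to_le_bytes());
    }
    out
}
```

For two elements this produces 8 + 2 × 24 = 56 bytes. Adding a 32-byte OrpcThis header would give the 88-byte total pinned by the AddRef layout test.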

F27 — Constant-time DH mod_exp (swap num-bigint → crypto-bigint::DynResidue)

Resolved: 2026-05-06 (commit <this commit>). Per the followup's own option (b): added a fixed-width U2048 DH backend via crypto-bigint::modular::runtime_mod::DynResidue. New auth.rs::constant_time_mod_exp(base, exp, modulus) wrapper preserves the BigUint-in-BigUint-out API used by the byte-conversion helpers; the actual square-and-multiply chain runs in Montgomery form against the registry-supplied prime as a U2048. Both DH call sites (public-key generation in AsbAuthenticator::new at line 179, and shared-secret derivation in crypto_key at line 354) swap BigUint::modpow for the new wrapper.

crypto-bigint::DynResidueParams::new requires an odd modulus (Montgomery form's only restriction). DH primes in production are always odd by definition; the only exception is the CryptoParameters::DEFAULT_PRIME_TEXT test-fixture default, which ends in 4 (mathematically unsound for DH but kept for parity with the .NET reference's published default constant). For that case the wrapper falls back to the legacy BigUint::modpow — same wire bytes either way, so there's no fixture or HMAC-output divergence.
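The odd-modulus dispatch can be sketched in isolation. The backend enum and both functions are hypothetical. The u64 square-and-multiply is a toy reference for the arithmetic only: it is not constant-time and does not use crypto-bigint.

```rust
#[derive(Debug, PartialEq)]
enum ModExpBackend {
    /// Montgomery form requires an odd modulus.
    ConstantTimeMontgomery,
    /// Even modulus (the test-fixture default): fall back to the legacy path.
    LegacyFallback,
}

/// Chooses a backend from the modulus in little-endian byte order,
/// where the parity bit is the lowest bit of the first byte.
fn pick_backend(modulus_le: &[u8]) -> ModExpBackend {
    match modulus_le.first() {
        Some(b) if b & 1 == 1 => ModExpBackend::ConstantTimeMontgomery,
        _ => ModExpBackend::LegacyFallback,
    }
}

/// Toy square-and-multiply over u64. NOT constant-time; arithmetic only.
fn mod_exp_u64(base: u64, mut exp: u64, modulus: u64) -> u64 {
    assert!(modulus > 0, "modulus must be non-zero");
    let m = modulus as u128;
    let mut result: u128 = 1 % m;
    let mut b = base as u128 % m;
    while exp > 0 {
        if exp & 1 == 1 {
            result = result * b % m;
        }
        b = b * b % m;
        exp >>= 1;
    }
    result as u64
}
```

Both backends compute the same value for a given input, which is why the fallback for the even default prime leaves the wire bytes unchanged.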

Wire-byte parity verified:

  • Unit tests: 61 in mxaccess-asb-nettcp (was 61) — auth.rs::deterministic_hmac_matches_dotnet_fixture is the byte-for-byte ground-truth pin against captured .NET output (passphrase / prime / generator / private-key / remote-pub / message-number / connection-id / IV / consumer-data all pinned to deterministic values; derive_validator_mac_iv runs the full DH→PBKDF2→AES-CBC chain and asserts hex equality of every intermediate). Continues to pass after the swap.
  • Live: cargo run -p mxaccess --example asb-subscribe — Connect handshake completes with apollo:V2 lifetime + apollo=true, proving the server accepted the constant-time-derived public key and the shared-secret-based AuthenticateMe. Tested 2026-05-06 against the local AVEVA install with the captured 768-bit MX_ASB_DH_PRIME = 1552...7919 (odd; takes the constant-time path).

Workspace deps: crypto-bigint = "0.5" added to [workspace.dependencies] and to mxaccess-asb-nettcp/Cargo.toml. num-bigint retained for decimal-string parsing + .NET-LE byte conversion (crypto-bigint has neither). Default-feature clippy clean. The "review.md MAJOR finding" originally flagged at design/30-crate-topology.md:269-274 is now closed.

F33 — Live wire reconciliation for the ASB subscription path

Resolved: 2026-05-06 (commits 218f4c4, 7a5f251, <this commit>). MX_ASB_TRACE_REPLY capture during investigation revealed the live MxDataProvider returns a Result wrapper with <resultCodeField>1</> + <successField>false</> followed by empty <ASBIData/> payloads when it short-circuits on InvalidConnectionId — the same transient race F31 fixed for RegisterItems. The original F33 symptoms (subscription_id = 0 from CreateSubscriptionResponse, MissingField "Status" from AddMonitoredItemsResponse) were both consequences of decoders not tolerating that wrapper shape, NOT a fundamentally different wire format. Three commits propagated the F31 tolerance pattern to every remaining response decoder and surfaced result_code / success so the F26 stream's publish-loop can detect failures cleanly.

  1. 218f4c4: decode_read_response + client::read retry loop. Added result_code / success to ReadResponse. Live verified: TestChildObject.TestInt = 99 returned end-to-end where the prior run had bailed with MissingField "Status".
  2. 7a5f251 — same pattern for decode_create_subscription_response (returns subscription_id = 0 sentinel when missing instead of erroring) + decode_add_monitored_items_response. Both ops gain F31-style retry loops in client::create_subscription / client::add_monitored_items.
  3. <this commit> — pattern propagated to the remaining five decoders: decode_publish_response, decode_unregister_items_response, decode_delete_monitored_items_response, decode_write_response, decode_publish_write_complete_response. Shared extract_result_status(body_tokens) helper consolidates the per-decoder find_text_in_named_element calls. The F26 stream's publish_loop (asb_session.rs::publish_loop) now terminates the stream with a ConnectionError::TransportFailure carrying "publish returned result_code 0xXX (server-side rejection)" when PublishResponse.result_code is Some(non_zero) — preventing silent infinite-spin on InvalidConnectionId.

Live read still passes after all changes. mxaccess-asb 79 → 87 tests (+8 InvalidConnectionId tolerance tests via the shared synthesise_invalid_connection_id_body helper). Default-feature clippy clean.
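The F31-style tolerance pattern (surface result_code instead of erroring, then retry after a cool-down) can be sketched generically. The Response struct, the function name, and the attempt budget are hypothetical. The real retry loops are async and live in client::*.

```rust
use std::time::Duration;

const RESULT_CODE_INVALID_CONNECTION_ID: u32 = 1;

/// Hypothetical decoded response: the wrapper fields are surfaced rather than
/// turned into a MissingField error when the payloads are empty.
struct Response {
    result_code: Option<u32>,
    success: Option<bool>,
    payload: Vec<u8>,
}

/// Re-issues `op` while the server short-circuits on InvalidConnectionId,
/// sleeping `cool_down` between attempts. Returns the last response seen.
fn retry_on_invalid_connection_id<F>(max_attempts: u32, cool_down: Duration, mut op: F) -> Response
where
    F: FnMut() -> Response,
{
    let mut response = op();
    for _ in 1..max_attempts {
        if response.result_code != Some(RESULT_CODE_INVALID_CONNECTION_ID) {
            break;
        }
        std::thread::sleep(cool_down);
        response = op();
    }
    response
}
```

If the budget runs out, the caller still receives the InvalidConnectionId response with its result_code intact, so it can fail loudly instead of spinning.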

The examples/asb-subscribe.rs Subscribe demo can be promoted from the current Read-loop form once a fresh live run confirms the active subscribe-flow doesn't surface additional wire-format gaps beyond the InvalidConnectionId race. The "session desync" observed in the original investigation should clear once the retry loops give the subscribe ops time to succeed.

F12 — NmxClient::create (auto-resolving COM-activation factory)

Resolved: 2026-05-05 (commit <this commit>). Builds on F6: new NmxClient::create(ntlm_factory) constructor in crates/mxaccess-nmx/src/client.rs, gated on cfg(all(windows, feature = "windows-com")). New crate-level feature mxaccess-nmx/windows-com propagates to mxaccess-rpc/windows-com. Mirrors ManagedNmxService2Client.Create() (cs:30-64) + ResolveService (cs:491-523) — six steps: (1) com_objref_provider::marshal_activated_iunknown_objref("NmxSvc.NmxService", MarshalContext::DifferentMachine) activates the COM class and emits an OBJREF blob; (2) ComObjRef::parse extracts oxid + ipid (the activated server's IUnknown IPID); (3) resolve_oxid_with_managed_ntlm_packet_integrity against 127.0.0.1:135 (RPCSS endpoint mapper) returns the server's (host, port) bindings + IRemUnknown IPID; (4) the ncacn_ip_tcp non-security binding's host[port] text is parsed via the new parse_bracketed_host_port helper (mirrors the .NET ParseBracketedHost / ParseBracketedPort pair, using rfind so FQDNs with . round-trip — matches cs:540-561); (5) a fresh transport binds to IRemUnknown and calls RemQueryInterface(iunknown_ipid, INmxService2_IID, fresh_causality_id, public_refs=5) — the RemQiResult carries the new INmxService2 IPID; (6) a second fresh transport binds to INmxService2 via Self::connect. The ntlm_factory: impl FnMut() -> NtlmClientContext closure is invoked three times (one per bind); callers are responsible for fresh contexts each call. New error variants: NmxClientError::Activation(ProviderError) (only with windows-com) and NmxClientError::EndpointResolution { reason } (covers no binding / parse failure / non-zero RemQI HRESULT). 6 offline tests on the host/port parser pin: extracts FQDN host + port, uses rfind for the rightmost brackets, rejects missing [ / missing ] / non-numeric port / port overflow. 
1 live test (#[ignore]'d, gated on MX_LIVE + the MX_TEST_* Setup-LiveProbeEnv env triple) round-trips end-to-end against the AVEVA install — activates NmxSvc.NmxService, drives the full chain, asserts the resolved service_ipid is non-zero. Live verification: passes. Workspace tests went 17 → 23 in mxaccess-nmx (+6).
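The `host[port]` split in step (4) can be sketched as below. This is a minimal illustration, not the code in `client.rs`: the `Option` return type is an assumption (the real helper presumably reports `NmxClientError::EndpointResolution`), and accepting an empty host is a simplification.

```rust
/// Hypothetical sketch of a `host[port]` splitter.
/// Uses `rfind` so only the rightmost bracket pair is treated as the port,
/// which keeps dotted FQDN hosts intact.
fn parse_bracketed_host_port(binding: &str) -> Option<(String, u16)> {
    let open = binding.rfind('[')?; // missing '[' -> None
    let close = binding.rfind(']')?; // missing ']' -> None
    if close < open {
        return None; // brackets in the wrong order
    }
    let host = &binding[..open];
    // `parse::<u16>` rejects both non-numeric text and values above 65535.
    let port = binding[open + 1..close].parse::<u16>().ok()?;
    Some((host.to_string(), port))
}
```

The failure cases correspond to the ones the offline tests are described as pinning: missing `[`, missing `]`, non-numeric port, and port overflow.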

Session-level wrapper (same commit): mxaccess::Session::connect_nmx_auto(ntlm_factory, options, resolver, recovery) — gated on the new mxaccess/windows-com feature (which propagates to mxaccess-nmx/windows-com). Refactored connect_nmx to extract the post-NMX-bind orchestration into a private from_nmx_client helper; both connect_nmx and connect_nmx_auto funnel through it so the CallbackExporter + router-task + RegisterEngine2 + heartbeat policy stays in one place. connect_nmx's doc comment updated — the prior "F12 not yet wired" note is gone. With both layers landed, the .NET MxNativeSession.Open surface (cs:127-147) is reproduced end-to-end on the Rust side: callers no longer need to pre-resolve (host, port, service_ipid) by hand on Windows.

F32 — Live type-matrix coverage for asb-subscribe

Resolved: 2026-05-05 (commit <this commit>). Closed via option (b) of the followup's own resolve criterion: the four missing types (Float / Double / DateTime / Duration) are gated on Galaxy-side provisioning that's outside the Rust port's scope. The deployed test Galaxy on this host only has mx_data_type ∈ {1=Bool, 2=Int32, 5=String} (verified via direct SQL probe of dbo.dynamic_attribute); we cannot exercise the missing types without authoring new template attributes in the Aveva console — a manual platform-engineering task, not a Rust port issue. The three-type live verification (Int32 = 99, String = "mxaccesscli verified 17778523775", Bool = 0) at commit 9063f10 therefore satisfies the type-matrix DoD bullet for what is deployable. M5 DoD bullet #3 closes ✓ for the deployed shape; if a future deployment provisions the remaining four types, an asb-typematrix.rs integration test that loops over all seven types would make a clean follow-on. Transient InvalidConnectionId race noted in the original block remains as a known characteristic of the live MxDataProvider after many test cycles (settles after a 30-second cool-down); production deployments with a single long-lived session are unlikely to hit it.

F6 — Port ComObjRefProvider.cs (OBJREF emitter via Win32 CoMarshalInterface)

Resolved: 2026-05-05 (commit <this commit>). New module crates/mxaccess-rpc/src/com_objref_provider.rs (~330 LoC including tests) gated on cfg(all(windows, feature = "windows-com")). Pulls windows = "0.59" (features Win32_Foundation + Win32_System_Com + Win32_System_Com_Marshal + Win32_System_Com_StructuredStorage + Win32_System_Memory) as an optional dep behind the existing windows-com feature; default footprint stays slim. Public API mirrors ComObjRefProvider.cs 1:1: MarshalContext enum (InProcess / Local / DifferentMachine; wraps the MSHCTX_* newtype constants), clsid_from_prog_id(&str) -> Result<GUID, ProviderError> (wraps CLSIDFromProgID), marshal_activated_iunknown_objref(prog_id, ctx) (activates via CoCreateInstance(CLSCTX_INPROC_SERVER | CLSCTX_LOCAL_SERVER | CLSCTX_REMOTE_SERVER) then marshals), marshal_iunknown_objref(unknown, ctx) (uses IUnknown::IID), marshal_interface_objref(unknown, iid, ctx) (the underlying CoMarshalInterface over an HGlobal-backed IStream). All unsafe is internal to the module: the public API exposes only typed Rust values, no raw pointers / HRESULTs / lifetime-bound interface pointers. Each unsafe block carries an inline SAFETY comment. ProviderError enumerates the four documented failure modes (UnknownProgId, ActivationFailed, MarshalFailed, GlobalLockFailed) plus the apartment-init pre-check (ApartmentInitFailed). Per-thread COM init via OnceLock<()> thread-local: lazy CoInitializeEx(MULTITHREADED) on first call; S_FALSE (already initialised) and RPC_E_CHANGED_MODE (thread is STA) treated as success, matching the .NET runtime's tolerant apartment behaviour. 4 offline tests pin: MarshalContext → MSHCTX_* mapping, ensure_apartment idempotence, clsid_from_prog_id returns UnknownProgId for fake ProgIDs, marshal_activated_* short-circuits at the resolution stage. 
1 live test (#[ignore]'d, gated on MX_LIVE) round-trips the real NmxSvc.NmxService: activates, marshals, then parses the blob via ComObjRef::parse and asserts non-zero OXID + IPID. Live verification: passes against the AVEVA install on this host. Workspace tests went 179 → 183 in mxaccess-rpc (+4 new). Unblocks F12 (NmxClient::create): the auto-resolving COM-activation factory can now chain marshal_activated_iunknown_objref → ComObjRef::parse → resolve_oxid_with_managed_ntlm_packet_integrity → RemQueryInterface over the existing primitives.

F14 — tiberius-backed SQL implementation of Resolver + UserResolver

Resolved: 2026-05-05 (commit <this commit>). New module crates/mxaccess-galaxy/src/sql_resolver.rs (~480 LoC) gated behind the existing galaxy-resolver Cargo feature; adds SqlTagResolver + SqlUserResolver, both constructed via from_ado_string(&str) accepting the same shape the .NET reference uses by default (Server=localhost;Database=ZB;Integrated Security=True;Encrypt=False;TrustServerCertificate=True). Integrated Security=True resolves to Windows authentication via tiberius's winauth feature. Each top-level call opens a fresh Client<Compat<TcpStream>> and drops it on return, matching the .NET await using shape. tiberius's Client::query only accepts positional @P1..@PN placeholders (delegates to sp_executesql); the canonical RESOLVE_SQL / BROWSE_SQL / USER_BY_GUID_SQL / USER_BY_NAME_SQL constants are rewritten once-per-process via OnceLock<String> (@objectTagName → @P1, etc.). read_metadata mirrors ReadMetadata (cs:149-165) byte-by-byte: signed smallint → i16 widened to u16 for platform/engine/object IDs (matches the .NET checked((ushort)...)), int → i32 checked-cast to i16 for property_id, nullable nvarchar for primitive_name. read_user_profile mirrors ReadProfile (cs:76-85) including the roles_text blob → parse_role_blob round-trip. New deps: tiberius 0.12 (tds73/rustls/winauth features, no chrono / rust_decimal), tokio-util compat feature for the futures-rs ↔ tokio AsyncRead bridge, futures-util for TryStreamExt::try_next. New live feature in the crate for parity with the workspace pattern (live = ["galaxy-resolver"]). 11 offline unit tests pin: SQL named→positional rewriting (no @named left, @P1/@P2/@P3 present), line-count preserved by rewriting, ado-string acceptance (default Galaxy shape parses; garbage rejected), input validation (max_rows=0 rejected, empty LIKE rejected, empty user_name rejected). 
Two #[cfg(feature = "live")] #[ignore]'d tests round-trip against a real Galaxy DB (gated on MX_LIVE + MX_GALAXY_DB env vars per tools/Setup-LiveProbeEnv.ps1): live_resolve_test_child_object_test_int (TestChildObject.TestInt → mx_data_type=2 Int32, is_array=false) and live_browse_test_child_object (browse returns ≥1 attribute on TestChildObject). Both pass against the local AVEVA install.
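The once-per-process named→positional rewrite can be sketched as follows. The query text and the `@attributeName` parameter are hypothetical placeholders (only `@objectTagName` appears in the notes above), and the naive `str::replace` pass assumes no parameter name is a prefix of another.

```rust
use std::sync::OnceLock;

// Hypothetical query text, for illustration only; not the real RESOLVE_SQL.
const RESOLVE_SQL_NAMED: &str =
    "SELECT a.mx_data_type\nFROM some_view a\nWHERE a.tag_name = @objectTagName\n  AND a.attribute_name = @attributeName";

/// Replace each `@name` with `@P1`, `@P2`, ... in the order given.
/// Assumes no name is a prefix of another (a simplification).
fn rewrite_named_params(sql: &str, names: &[&str]) -> String {
    let mut out = sql.to_string();
    for (index, name) in names.iter().enumerate() {
        out = out.replace(&format!("@{name}"), &format!("@P{}", index + 1));
    }
    out
}

/// Rewritten form, computed on first use and cached for the process lifetime.
fn resolve_sql() -> &'static str {
    static REWRITTEN: OnceLock<String> = OnceLock::new();
    REWRITTEN
        .get_or_init(|| rewrite_named_params(RESOLVE_SQL_NAMED, &["objectTagName", "attributeName"]))
        .as_str()
}
```

Because only parameter tokens are substituted, the rewritten text keeps the same line count as the named form, which is one of the properties the offline tests are described as checking.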

F4 + F5 — BindAck body parser + captured-bytes round-trip

Resolved: 2026-05-05 (commit <this commit>). Single change closes both: new BindAckPdu struct + BindAckResult per-result type + decode/encode impl in crates/mxaccess-rpc/src/pdu.rs. Body layout per [C706] §12.6.3.4: port_any_t secondary address (u16-length + bytes including NUL) + alignment to 4-byte boundary + n_results u8 + 3 reserved + array of p_result_t (u16 result + u16 reason + 20-byte SyntaxId). Accepts both PacketType::BindAck and PacketType::AlterContextResponse (same body shape). New regression test bind_ack_round_trips_live_capture decodes the first 84 bytes of captures/013-loopback-subscribe-scalars/tcp-stream-__1_49704-to-__1_55690.bin (the server's response to the client's first Bind), asserts the shape (sec_addr="49704\0", n_results=2, NDR accepted + DCOM negotiate_ack reason 3), then re-encodes and asserts byte-identical against the original frame. Stronger live-wire parity than the prior synthetic-frame tests. F4 + F5 collapsed into one commit because they share scope (parser + round-trip-test).
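A minimal sketch of the body walk described above, under stated assumptions: the type and function names are hypothetical (not the ones in `pdu.rs`), integers are read little-endian, and the secondary address is taken to start at PDU offset 24, after the 16-byte common header plus the `max_xmit_frag` / `max_recv_frag` / `assoc_group_id` fields that [C706] places before it.

```rust
#[derive(Debug, PartialEq)]
struct ResultEntry {
    result: u16,
    reason: u16,
    syntax: [u8; 20], // 16-byte UUID + 4-byte version
}

#[derive(Debug, PartialEq)]
struct BindAckBody {
    sec_addr: Vec<u8>, // includes the trailing NUL
    results: Vec<ResultEntry>,
}

fn read_u16_le(buf: &[u8], at: usize) -> Option<u16> {
    let bytes = buf.get(at..at + 2)?;
    Some(u16::from_le_bytes([bytes[0], bytes[1]]))
}

/// Hypothetical sketch: parse sec_addr + result list from a whole PDU.
fn parse_bind_ack_body(pdu: &[u8]) -> Option<BindAckBody> {
    const SEC_ADDR_OFFSET: usize = 24; // assumption, see lead-in
    let len = read_u16_le(pdu, SEC_ADDR_OFFSET)? as usize;
    let mut at = SEC_ADDR_OFFSET + 2;
    let sec_addr = pdu.get(at..at + len)?.to_vec();
    at += len;
    at = (at + 3) & !3; // align to a 4-byte boundary from PDU start
    let n_results = *pdu.get(at)? as usize;
    at += 4; // n_results u8 + 3 reserved bytes
    let mut results = Vec::with_capacity(n_results);
    for _ in 0..n_results {
        let result = read_u16_le(pdu, at)?;
        let reason = read_u16_le(pdu, at + 2)?;
        let mut syntax = [0u8; 20];
        syntax.copy_from_slice(pdu.get(at + 4..at + 24)?);
        results.push(ResultEntry { result, reason, syntax });
        at += 24;
    }
    Some(BindAckBody { sec_addr, results })
}
```

With a 6-byte secondary address such as `"49704\0"` and two results, this layout consumes 24 + 2 + 6 + 4 + 2 × 24 = 84 bytes, which is consistent with the 84-byte frame used by the regression test.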

F29 — Align mxaccess-asb-nettcp::nbfs static dictionary ids with canonical [MC-NBFS] table

Resolved: 2026-05-05 (commit <this commit>). The original hand-curated table was wrong starting at id 74 — entries had been deduplicated/renumbered without preserving the canonical id = 2 × StringN mapping from [MC-NBFS] §2.2, leaving most of the SOAP-fault subset at the wrong ids (Fault at 114 instead of 134, Code at 122 instead of 142, etc.). Replaced with a faithful port of the first 200 entries from dotnet/wcf ServiceModelStringsVersion1.cs (covering id 0..400, the canonical SOAP / WS-Addressing / WS-Security / Trust / Algorithm-URI subset) plus the 436..444 xsi/xsd/nil extras already in place. Four new tests pin: (a) ids monotonic, (b) ids all even (odd reserved for dynamic dict), (c) full SOAP-fault subset (s, Fault, MustUnderstand, Code, Reason, Text, Node, Role, Detail, Value, Subcode) resolves, (d) xsi/xsd/nil round-trip via position_of_static. Future extensions: append more ServiceModelStringsVersion1.StringN entries as captures show new ids; mechanical extension.
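The two structural invariants (ids strictly increasing, ids all even) and the `id = 2 × StringN` rule can be sketched like this. The function names and the two-entry sample table are illustrative; only the `Fault` = 134 and `Code` = 142 values come from the note above.

```rust
/// Static dictionary id for the N-th canonical string (id = 2 × N).
const fn static_id(string_n: u32) -> u32 {
    2 * string_n
}

/// Even ids belong to the static table; odd ids are left for the dynamic dictionary.
fn is_static_id(id: u32) -> bool {
    id % 2 == 0
}

/// Check that a table is strictly increasing by id and uses only even ids.
fn table_is_well_formed(entries: &[(u32, &str)]) -> bool {
    let all_even = entries.iter().all(|(id, _)| is_static_id(*id));
    let monotonic = entries.windows(2).all(|pair| pair[0].0 < pair[1].0);
    all_even && monotonic
}

/// Reverse lookup: the id of a static string, if the table contains it.
fn position_of(entries: &[(u32, &str)], name: &str) -> Option<u32> {
    entries.iter().find(|(_, s)| *s == name).map(|(id, _)| *id)
}
```

A renumbering bug of the kind fixed here would still pass a well-formedness check if the shifted ids stay even and sorted, so pinning specific names to specific ids (as test (c) does for the SOAP-fault subset) is the stronger guard.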

F31 — InvalidConnectionId on first Register after AuthenticateMe

Resolved: 2026-05-05 (commit 9063f10). Not an HMAC bug: AsbErrorCode.InvalidConnectionId (= 1) is a transient race that .NET's MxAsbDataClient.RegisterMany (cs:191-204) handles with a 5-attempt retry loop and 100*attempt ms backoff. AuthenticateMe is one-way (AsbContracts.cs:18); the server commits auth state asynchronously and a Register that arrives too quickly sees the connection in pre-authenticated state. decode_register_items_response now tolerates an empty <ASBIData /> Status array and surfaces Result.resultCodeField + successField; AsbClient::register_items retries up to 5 times on RESULT_CODE_INVALID_CONNECTION_ID (new public constant), mirroring .NET. Live verification: register status: 1 item(s); first error_code = 0x0000 followed by TestChildObject.TestInt = AsbVariant { type_id: 4, length: 4, payload: [99, 0, 0, 0] } over the live wire.
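The retry shape can be sketched as below. This is a synchronous stand-in, not the actual `AsbClient::register_items`: the closure, the `u32` result-code type, and the configurable backoff unit are assumptions made so the loop can be exercised without a live server.

```rust
use std::{thread, time::Duration};

const RESULT_CODE_INVALID_CONNECTION_ID: u32 = 1;
const MAX_ATTEMPTS: u32 = 5;

/// Hypothetical sketch: call `attempt_register` until it returns something
/// other than InvalidConnectionId, at most MAX_ATTEMPTS times, sleeping
/// `backoff_unit * attempt` between tries (100 ms * attempt in the notes).
fn register_with_retry(
    mut attempt_register: impl FnMut() -> u32,
    backoff_unit: Duration,
) -> u32 {
    let mut code = RESULT_CODE_INVALID_CONNECTION_ID;
    for attempt in 1..=MAX_ATTEMPTS {
        code = attempt_register();
        if code != RESULT_CODE_INVALID_CONNECTION_ID {
            break; // success or a non-retryable error
        }
        if attempt < MAX_ATTEMPTS {
            thread::sleep(backoff_unit * attempt); // linear backoff
        }
    }
    code
}
```

Only the one transient code is retried; any other result is returned to the caller immediately, and after the fifth failed attempt the last code is returned as is.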

F30 — Resolve dict-id element/attribute names on the read side

Resolved: 2026-05-05 (commit eb6c689). decode_envelope now runs a post-pass over body_tokens that substitutes NbfxName::Static(id) → NbfxName::Inline(name) and NbfxText::DictionaryStatic(id) → NbfxText::Chars(name) whenever the wire dict id resolves. Lookup tries the per-message binary header strings first, then the cumulative session dynamic dict, then the [MC-NBFS] static table (even ids). Tokens with unresolvable ids stay opaque so trace output still reveals them. Was the unblocker for F31: without it the server's <b:resultCodeField>1</> element came back as <b:Static(43)>1</> and the failure looked like an HMAC mismatch instead of a transient retryable error.
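The lookup order and the "leave it opaque when unresolved" rule can be sketched as follows. The `Name` enum and the three `HashMap` scopes are hypothetical stand-ins for `NbfxName` and the real dictionary structures, whose layouts are not shown here.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, PartialEq)]
enum Name {
    Static(u32),
    Inline(String),
}

/// Try the per-message strings, then the session dynamic dictionary,
/// then the static table (even ids only).
fn resolve<'a>(
    id: u32,
    message: &'a HashMap<u32, String>,
    session: &'a HashMap<u32, String>,
    statics: &'a HashMap<u32, &'static str>,
) -> Option<&'a str> {
    if let Some(s) = message.get(&id) {
        return Some(s.as_str());
    }
    if let Some(s) = session.get(&id) {
        return Some(s.as_str());
    }
    if id % 2 == 0 {
        return statics.get(&id).copied();
    }
    None
}

/// Post-pass over one token: substitute when the id resolves, else keep it.
fn resolve_name(
    name: Name,
    message: &HashMap<u32, String>,
    session: &HashMap<u32, String>,
    statics: &HashMap<u32, &'static str>,
) -> Name {
    match name {
        Name::Static(id) => match resolve(id, message, session, statics) {
            Some(text) => Name::Inline(text.to_string()),
            None => Name::Static(id), // stays opaque for trace output
        },
        inline => inline,
    }
}
```

Keeping unresolved ids as `Static(id)` rather than failing the decode means a trace still shows exactly which id was missing.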

F7 — Consolidate Guid type across mxaccess-rpc

Resolved: 2026-05-05 in this iteration's commit. Guid was hoisted from objref::Guid into the new shared crate::guid::Guid module. objref and pdu now re-export from there; M2 wave 2's orpc, object_exporter, and rem_unknown import it directly. The OXID-resolve dual-string decoder additionally needs an owned protocol label (format!("protseq_0x{:04x}", tower_id) per ObjectExporterMessages.cs:120) — ComDualStringEntry::protocol was upgraded from &'static str to Cow<'static, str> to support both decoders without the agent's interim Box::leak workaround.

F8 — RpcError is duplicated across objref and pdu modules

Resolved: 2026-05-05 in this iteration's commit. RpcError was hoisted into the new shared crate::error::RpcError module as a single union of all wave 1 variants plus a generic Decode { offset, reason: &'static str, buffer_len } variant for the wave 2 ORPC parsers' one-off failures. objref and pdu re-export from there; M2 wave 2's orpc, object_exporter, and rem_unknown use it directly.

F13 — NmxClient high-level write/advise/subscribe wrappers

Resolved: 2026-05-05. All seven wrappers landed in crates/mxaccess-nmx/src/client.rs: write, write2, write_secured2, advise_supervisory, send_observed_pre_advise_metadata, register_reference, un_advise. Each takes a GalaxyTagMetadata + a typed WriteValue (re-exported from mxaccess-codec), builds the inner NMX body via mxaccess-codec (write_message::encode / encode_timestamped / secured_write::encode / NmxItemControlMessage / NmxMetadataQueryMessage / NmxReferenceRegistrationMessage), wraps in NmxTransferEnvelope, and routes through transfer_data. The pure-codec encode_*_transfer_body helpers are extracted as pub(crate) fn for testability, mirroring the .NET reference's internal static shape. un_advise preserves the .NET reference's quirky NmxTransferMessageKind::Write envelope (not ItemControl) per cs:457.

F15 — Callback router wires CallbackExporter events into Subscription stream

Resolved: 2026-05-05 across two commits.

  • Step 1/2 (2b849ae): Session::connect_nmx now starts a CallbackExporter on a 127.0.0.1 ephemeral port, builds the OBJREF via local_hostname() + 127.0.0.1 fallback, registers it through NmxClient::register_engine_2 (was ..._without_callback). A callback_router task drains CallbackEvents, decodes each CallbackInvoked body via NmxSubscriptionMessage::parse_inner, and broadcasts parsed messages on a tokio::sync::broadcast channel exposed via Session::callbacks(). Shutdown chains: UnregisterEngine → CallbackExporter::shutdown → wait for router task.
  • Step 2/2 (this commit): Subscription now impls Stream<Item = Result<DataChange, Error>>. Filtering follows the .NET reference at cs:333-343 exactly — 0x32 SubscriptionStatus messages are kept only when message.item_correlation_id == subscription.correlation_id; 0x33 DataUpdate messages pass through to ALL subscriptions because the codec exposes no per-record correlation field (matches the .NET MxNativeCallbackEvent filter behavior verbatim). Each NmxSubscriptionRecord with a parseable value becomes one DataChange. Records with value: None are dropped silently (mirrors the .NET evt.Record.Value is null filter at cs:337). Lag-loss surfaces as Error::Configuration(InvalidArgument) carrying the lag count. Stream-end (broadcast sender dropped) yields None. New helper: filetime_to_system_time (inverse of the existing system_time_to_filetime); saturates at Unix epoch for pre-1970 FILETIMEs. Tests cover correlation match/mismatch for 0x32, 0x33 pass-through for any correlation, and FILETIME round-trip.
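The FILETIME conversion pair can be sketched as below, assuming a FILETIME is carried as a `u64` count of 100 ns ticks since 1601-01-01 UTC. The exact signatures in the crate are not shown in these notes, so both functions here are illustrative.

```rust
use std::time::{Duration, SystemTime, UNIX_EPOCH};

/// 100 ns ticks between 1601-01-01 and 1970-01-01.
const FILETIME_UNIX_EPOCH_TICKS: u64 = 116_444_736_000_000_000;
const TICKS_PER_SECOND: u64 = 10_000_000;

/// Sketch: FILETIME ticks -> SystemTime, saturating at the Unix epoch
/// for any pre-1970 input.
fn filetime_to_system_time(filetime: u64) -> SystemTime {
    let ticks = filetime.saturating_sub(FILETIME_UNIX_EPOCH_TICKS);
    let secs = ticks / TICKS_PER_SECOND;
    let nanos = ((ticks % TICKS_PER_SECOND) * 100) as u32; // always < 1e9
    UNIX_EPOCH + Duration::new(secs, nanos)
}

/// Sketch of the inverse; pre-1970 times clamp to the epoch tick count.
fn system_time_to_filetime(time: SystemTime) -> u64 {
    let since = time.duration_since(UNIX_EPOCH).unwrap_or(Duration::ZERO);
    FILETIME_UNIX_EPOCH_TICKS
        + since.as_secs() * TICKS_PER_SECOND
        + u64::from(since.subsec_nanos() / 100)
}
```

Round-tripping is exact only for times that are whole multiples of 100 ns, since that is the FILETIME resolution.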

F1 — NTLM consumer-layer helpers (workstation default + from_env constructor)

Resolved: 2026-05-05. NtlmClientContext::from_env() reads MX_RPC_USER / MX_RPC_PASSWORD / MX_RPC_DOMAIN (mirrors ManagedNtlmClientContext.FromEnvironment at cs:41-49); empty MX_RPC_DOMAIN is permitted. local_hostname() checks COMPUTERNAME then HOSTNAME and returns the empty string when neither is set — same "unavailable" semantics as Environment.MachineName returning null. Lives in mxaccess-rpc/src/ntlm.rs; deliberately doesn't pull gethostname (no native-libc deps, no unsafe for hostname lookup). Added NtlmError::MissingEnvVar { name } for the env-var-unset case. Test mod gained an EnvScope + ENV_LOCK mutex pattern for serializing process-global env mutation across parallel tests.
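The hostname fallback can be sketched with the environment lookup injected as a closure, which sidesteps the process-global env mutation the `EnvScope` + `ENV_LOCK` pattern exists to serialize. `hostname_from` is a hypothetical helper, and skipping empty values is an assumption not stated above.

```rust
/// Hypothetical helper: first non-empty value of COMPUTERNAME, then HOSTNAME,
/// or the empty string when neither is available.
fn hostname_from(lookup: impl Fn(&str) -> Option<String>) -> String {
    for key in ["COMPUTERNAME", "HOSTNAME"] {
        if let Some(value) = lookup(key) {
            if !value.is_empty() {
                return value;
            }
        }
    }
    String::new()
}

/// Production-style wrapper reading the real process environment.
fn local_hostname() -> String {
    hostname_from(|key| std::env::var(key).ok())
}
```

Tests can then pass a closure over a fixed map instead of touching `std::env`, so they remain safe to run in parallel.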

F9 — ObjectExporterClient.cs ResolveOxid wrapper methods

Resolved: 2026-05-05. Both portable methods land in crates/mxaccess-rpc/src/object_exporter_client.rs: resolve_oxid_unauthenticated (mirrors cs:14-30) and resolve_oxid_with_managed_ntlm_packet_integrity (mirrors cs:66-81). Each opens a TCP connection, binds to IObjectExporter, calls opnum 0 with the encoded request, and decodes the response — preferring parse_resolve_oxid_result then falling back to parse_resolve_oxid_failure for short stubs. The two SSPI flavours (ResolveOxidWithNtlmConnect, ResolveOxidWithNtlmPacketIntegrity) wrap .NET's System.Net.Security.SspiClientContext and are explicitly out of scope for the Rust port — that's a permanent skip, not a deferral.

F17 — Guid::parse_str helper (dashed-hex string parser)

Resolved: 2026-05-05. Guid::parse_str(&str) -> Result<Guid, RpcError> landed in crates/mxaccess-rpc/src/guid.rs:65-112 as the inverse of the existing Display impl. Accepts the canonical dashed-hex form, optionally wrapped in {} braces (.NET B format), case-insensitive, and tolerant of bare 32-char hex without dashes. Single-pass char-by-char nibble accumulator avoids per-byte string allocation; the same byte-swap of groups 1-3 the Display impl does is applied after the raw hex pass. Eight new tests cover round-trip against the Display fixture (b49f92f7-c748-4169-8eca-a0670b012746), braces, uppercase, no-dashes, zero-GUID, too-short, too-long, and non-hex rejection. The five live-NMX examples (connect-write-read, subscribe, recovery, multi-tag, secured-write) lost their per-file 15-line parse_guid helpers in favour of the canonical implementation. Test count delta: 524 → 532 (+8).
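A minimal sketch of the single-pass nibble accumulator, returning the 16 bytes in wire order (first three groups byte-swapped). It is not the code in `guid.rs`: the `Option<[u8; 16]>` return type is an assumption, and this version tolerates dashes at any position rather than validating the 8-4-4-4-12 grouping.

```rust
/// Hypothetical sketch: parse a GUID string into its 16 wire-order bytes.
fn parse_guid(text: &str) -> Option<[u8; 16]> {
    // Optional {} wrapper (.NET "B" format); an unmatched brace is rejected.
    let inner = match text.strip_prefix('{') {
        Some(rest) => rest.strip_suffix('}')?,
        None => text,
    };
    let mut raw = [0u8; 16];
    let mut nibbles = 0usize;
    for ch in inner.chars() {
        if ch == '-' {
            continue; // simplification: dashes are skipped wherever they occur
        }
        let value = ch.to_digit(16)? as u8; // case-insensitive hex digit
        if nibbles >= 32 {
            return None; // too long
        }
        let index = nibbles / 2;
        raw[index] = (raw[index] << 4) | value;
        nibbles += 1;
    }
    if nibbles != 32 {
        return None; // too short
    }
    // Groups 1-3 are stored little-endian on the wire.
    raw[0..4].reverse();
    raw[4..6].reverse();
    raw[6..8].reverse();
    Some(raw)
}
```

No intermediate `String` or per-byte slice is allocated: each hex digit is folded directly into the output array, and the group swap is applied once at the end.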