From 9063f10b1b3b464fb2cacec29439d0f43df309a1 Mon Sep 17 00:00:00 2001 From: Joseph Doherty Date: Tue, 5 May 2026 21:02:38 -0400 Subject: [PATCH] =?UTF-8?q?[M5]=20mxaccess-asb:=20register=5Fitems=20retry?= =?UTF-8?q?=20on=20InvalidConnectionId=20=E2=80=94=20LIVE=20PATH=20WORKS?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit End-to-end live path now functional: Connect → AuthenticateMe → RegisterItems → Read → Disconnect. The example reads back the live TestChildObject.TestInt value (99) over the wire on the first run. Root-cause of the previous "InvalidConnectionId" mystery: it was never an HMAC verification failure. `AuthenticateMe` is one-way (`AsbContracts.cs:18`) and the server commits auth state asynchronously after the request lands. A Register that follows too quickly sees the connection in pre-authenticated state and returns `AsbErrorCode.InvalidConnectionId` (= 1). .NET's `MxAsbDataClient.RegisterMany` (`cs:191-204`) handles this explicitly with a retry loop: for (int attempt = 1; attempt < 5 && response.Result.ErrorCode == InvalidConnectionId; attempt++) { Thread.Sleep(TimeSpan.FromMilliseconds(100 * attempt)); response = RegisterOnce(items); } We now mirror the same pattern in `AsbClient::register_items_once` followed by a retry loop in `register_items` — up to 5 attempts with `100 * attempt` ms backoff. Supporting changes: - `RegisterItemsResponse` gains `result_code: Option` + `success: Option` so callers can read `Result.resultCodeField` + `successField` from the response. `decode_register_items_response` now tolerates an empty `` Status array (server returns empty when the operation fails server-side) instead of erroring with `MissingField`. New helper `find_text_in_named_element` walks the body token stream. - New public constant `RESULT_CODE_INVALID_CONNECTION_ID = 1` for callers that want to detect this status outside the retry path. - The previously-failing test `decode_register_items_response_returns_ missing_field_when_status_absent` was renamed and rewritten as `decode_register_items_response_returns_empty_status_when_absent` to match the new tolerant decode contract. F31 closed. F30 (read-side dict-id resolution, landed in `eb6c689`) was the unblocker — without it we couldn't see the `1` element in the response and the failure mode looked like a HMAC mismatch instead of a transient retryable error. Workspace: 711 unit tests pass. Clippy clean. Co-Authored-By: Claude Opus 4.7 (1M context) --- design/followups.md | 16 +--- rust/crates/mxaccess-asb/src/client.rs | 33 +++++++- rust/crates/mxaccess-asb/src/operations.rs | 97 +++++++++++++++++++--- 3 files changed, 120 insertions(+), 26 deletions(-) diff --git a/design/followups.md b/design/followups.md index 2d09dc5..0cffce1 100644 --- a/design/followups.md +++ b/design/followups.md @@ -141,20 +141,10 @@ F25 (`mxaccess-asb` IASBIDataV2 client) and F26 (`mxaccess::Session` over `AsbTr **Resolves when:** `decode_tokens` (or a post-pass over the token stream) substitutes `NbfxName::Static(id)` with `NbfxName::Inline(name)` whenever the dict id resolves to a known string. The dynamic dict (`read_dictionary`) accumulates session strings via `intern`; the read-path needs the parallel session counter to map wire ids to slots — wire ids are odd and session-cumulative across messages, mirroring the F28 fix on the write side. **Resolves**: F25 live data path (Read/Write/Subscribe responses are all dict-encoded too). -### F31 — `AuthenticateMe` HMAC silently invalid on the server (resultCode = `InvalidConnectionId`) -**Severity:** P1 — gates every signed and unsigned operation after Connect. -**Source:** Live capture + F30 dict-id resolution exposing the response `1` (= `AsbErrorCode.InvalidConnectionId` per `AsbResultMapping.cs:6`) plus `false`. +### F30 — Resolve dict-id element/attribute names on the read side (RESOLVED, commit `eb6c689`) -**Why this is mysterious:** the entire crypto stack is proven byte-equal to .NET (commit `ce27b63` deterministic HMAC fixture covers DH, crypto_key, HMAC-SHA1, PBKDF2-SHA1, AES-CBC PKCS7), the canonical XML emitter is fixture-validated against `request.ToXml()` (commit `f14580e`), the registry DH params are honoured (commit `f14580e`), and the wire-level `` now carries the same four xmlns declarations .NET emits (`xmlns:h`, default `xmlns`, `xmlns:xsi`, `xmlns:xsd` all in this commit). Yet the server reports `InvalidConnectionId` on Register, indicating that AuthenticateMe's HMAC failed to verify and the server discarded the connection state. - -**Investigation done:** side-by-side `MX_ASB_TRACE_DERIVE` confirms passphrase bytes [96..176] of the crypto_key match .NET (commit `fd38189`); shared_secret bytes diverge per session because each peer chooses its own DH random, but the client+server pair derives the same value by construction. - -**Hypotheses still standing:** -- The server's canonical-XML reconstruction uses `new XmlSerializer(type)` without the `"urn:invensys.schemas"` default namespace that the client passes in `AsbSerialization.cs:27` — would produce different bytes, mismatching HMAC. Untestable from outside the server. -- A subtle byte-level wire difference that affects deserialization (e.g. an attribute the server's XmlSerializer requires but XmlBinaryReader normalizes differently). Hard to find without server logs. -- Some other state the server tracks per-connection that we're not setting (e.g. a session token from `ServiceAuthenticationData` we ignore). The `ConnectResponse.ServiceAuthenticationData` is currently parsed but not fed back into anything; .NET's `AsbSystemAuthenticator` may use it for a downstream verification we're missing. - -**Resolves when:** Either (a) the server is instrumented (`IncludeExceptionDetailInFaults` on the WCF service config, or a TraceListener on `System.ServiceModel.MessageLogging`) to surface the actual deserialization / HMAC mismatch reason; or (b) we capture .NET probe HMAC bytes alongside Rust HMAC bytes for a controlled scenario (fixed DH private key on both ends) and identify the byte-level divergence. +### F31 — InvalidConnectionId on first Register after AuthenticateMe — RESOLVED via retry +**Resolved:** ``. Not a HMAC bug after all — `AsbErrorCode.InvalidConnectionId` (= 1) is a **transient race** condition that .NET's `MxAsbDataClient.RegisterMany` (`cs:191-204`) explicitly handles with a retry loop (`for (int attempt = 1; attempt < 5 && response.Result.ErrorCode == InvalidConnectionId; attempt++)` with `100*attempt` ms backoff). `AuthenticateMe` is one-way (`AsbContracts.cs:18`); the server commits auth state asynchronously after the request lands, and a Register that arrives too quickly sees the connection in pre-authenticated state. `decode_register_items_response` now tolerates an empty `` Status array and surfaces `Result.resultCodeField` + `successField`; `AsbClient::register_items` retries up to 5 times on `RESULT_CODE_INVALID_CONNECTION_ID`, mirroring .NET. **Live verification**: `register status: 1 item(s); first error_code = 0x0000` followed by `TestChildObject.TestInt = AsbVariant { type_id: 4, length: 4, payload: [99, 0, 0, 0] }` — the real tag value `99` over the live wire, end-to-end. ### F28 — Canonical XML serialiser for `ConnectedRequest` signing (matches `XmlSerializer.Serialize` byte-for-byte) **Severity:** P0 — blocks every signed ASB operation (AuthenticateMe, RegisterItems, all data-plane RPCs). diff --git a/rust/crates/mxaccess-asb/src/client.rs b/rust/crates/mxaccess-asb/src/client.rs index a69a65c..7c8b1de 100644 --- a/rust/crates/mxaccess-asb/src/client.rs +++ b/rust/crates/mxaccess-asb/src/client.rs @@ -547,12 +547,43 @@ impl AsbClient { } /// `RegisterItems` operation — sends a signed `RegisterItemsIn` - /// SOAP envelope and decodes the `RegisterItemsResponse`. + /// SOAP envelope and decodes the `RegisterItemsResponse`. Retries + /// up to 5 times with `100 * attempt` ms backoff on + /// `InvalidConnectionId` (`AsbErrorCode = 1`), mirroring .NET's + /// `MxAsbDataClient.RegisterMany` (`cs:191-204`). The retry exists + /// because `AuthenticateMe` is one-way: the server may not have + /// finished processing it before our first Register lands, in + /// which case it sees an unauthenticated connection and returns + /// `InvalidConnectionId`. A short backoff lets the auth state + /// commit on the server side. pub async fn register_items( &mut self, items: &[ItemIdentity], require_id: bool, register_only: bool, + ) -> Result { + let mut response = self + .register_items_once(items, require_id, register_only) + .await?; + let mut attempt = 1u32; + while attempt < 5 + && response.result_code + == Some(crate::operations::RESULT_CODE_INVALID_CONNECTION_ID) + { + tokio::time::sleep(std::time::Duration::from_millis(100 * u64::from(attempt))).await; + response = self + .register_items_once(items, require_id, register_only) + .await?; + attempt += 1; + } + Ok(response) + } + + async fn register_items_once( + &mut self, + items: &[ItemIdentity], + require_id: bool, + register_only: bool, ) -> Result { let pre_signing = ConnectionValidator { connection_id: self.authenticator.connection_id(), diff --git a/rust/crates/mxaccess-asb/src/operations.rs b/rust/crates/mxaccess-asb/src/operations.rs index cba9fb2..9bffcff 100644 --- a/rust/crates/mxaccess-asb/src/operations.rs +++ b/rust/crates/mxaccess-asb/src/operations.rs @@ -1080,8 +1080,24 @@ pub struct RegisterItemsResponse { /// Whether the `` element appeared. Decoding the /// individual `ItemRegistration` records is a future iteration. pub item_capabilities_present: bool, + /// `Result.resultCodeField` from the response — `0` is success, + /// `1` is `InvalidConnectionId` (transient race with the one-way + /// AuthenticateMe), see `AsbResultMapping.cs:6` for the full enum. + /// `None` if the field wasn't found in the response. + pub result_code: Option, + /// `Result.successField` — `false` means the operation failed + /// server-side and the per-item Status array is empty. + pub success: Option, } +/// `AsbErrorCode.InvalidConnectionId` per `AsbResultMapping.cs:6`. +/// Surfaces as `Result.resultCodeField=1` when the server has not +/// (yet) processed our one-way AuthenticateMe and treats the +/// connection as unauthenticated. .NET's `MxAsbDataClient.RegisterMany` +/// (`cs:191-204`) retries up to 5 times with a 100*N ms backoff per +/// attempt — we mirror that pattern in `AsbClient::register_items`. +pub const RESULT_CODE_INVALID_CONNECTION_ID: u32 = 1; + /// Decoded `UnregisterItemsResponse`. Single field: the per-item /// `Status` array (`AsbContracts.cs:153-159`). #[derive(Debug, Clone, PartialEq)] @@ -1090,23 +1106,75 @@ pub struct UnregisterItemsResponse { } /// Decode a `RegisterItemsResponse` SOAP body from the NBFX token -/// stream returned by [`crate::decode_envelope`]. +/// stream returned by [`crate::decode_envelope`]. Tolerates an empty +/// or missing `` (Status array) — that's how the server +/// signals an operation-level failure (e.g. `successField=false` + +/// non-zero `resultCodeField`). Caller is expected to inspect +/// `result_code` for transient failures like InvalidConnectionId +/// and retry where appropriate. pub fn decode_register_items_response( body_tokens: &[NbfxToken], ) -> Result { let payloads = collect_asbidata_payloads(body_tokens); - let status_payload = payloads - .into_iter() - .next() - .ok_or(OperationError::MissingField { field: "Status" })?; - let status = decode_item_status_array(&status_payload)?; + let status = match payloads.into_iter().next() { + Some(payload) if !payload.is_empty() => decode_item_status_array(&payload)?, + _ => Vec::new(), + }; let item_capabilities_present = find_element_named(body_tokens, "ItemCapabilities").is_some(); + let result_code = find_text_in_named_element(body_tokens, "resultCodeField") + .and_then(|s| s.parse().ok()); + let success = find_text_in_named_element(body_tokens, "successField") + .map(|s| s.eq_ignore_ascii_case("true")); Ok(RegisterItemsResponse { status, item_capabilities_present, + result_code, + success, }) } +/// Walk the token stream looking for an element with the given local +/// name (inline match) and return its first text child as a string. +/// Used to extract `Result.resultCodeField`, `successField`, etc. +/// from the structured RegisterItemsResponse body. +fn find_text_in_named_element(tokens: &[NbfxToken], name: &str) -> Option { + let mut idx = 0; + while let Some(tok) = tokens.get(idx) { + if let NbfxToken::Element { + name: NbfxName::Inline(local), + .. + } = tok + { + if local == name { + let mut inner = idx + 1; + while matches!( + tokens.get(inner), + Some(NbfxToken::Attribute { .. }) + | Some(NbfxToken::DefaultNamespace { .. }) + | Some(NbfxToken::NamespaceDeclaration { .. }) + ) { + inner += 1; + } + if let Some(NbfxToken::Text(text)) = tokens.get(inner) { + return Some(match text { + NbfxText::Chars(s) => s.clone(), + NbfxText::Zero => "0".to_string(), + NbfxText::One => "1".to_string(), + NbfxText::Bool(b) => b.to_string(), + NbfxText::Int8(n) => n.to_string(), + NbfxText::Int16(n) => n.to_string(), + NbfxText::Int32(n) => n.to_string(), + NbfxText::Int64(n) => n.to_string(), + _ => return None, + }); + } + } + } + idx += 1; + } + None +} + /// Decode an `UnregisterItemsResponse` SOAP body. pub fn decode_unregister_items_response( body_tokens: &[NbfxToken], @@ -1610,13 +1678,18 @@ mod tests { } #[test] - fn decode_register_items_response_returns_missing_field_when_status_absent() { + fn decode_register_items_response_returns_empty_status_when_absent() { + // Per the live wire capture, the server returns an empty + // `` (Status array) when an operation fails (e.g. + // `successField=false` + `resultCodeField=1`). Decode now + // tolerates this rather than erroring with `MissingField` — + // callers inspect `result_code` for the failure reason. let body = asbidata_request_body("RegisterItemsResponse", &[]); - let err = decode_register_items_response(&body).unwrap_err(); - assert!(matches!( - err, - OperationError::MissingField { field: "Status" } - )); + let response = decode_register_items_response(&body).unwrap(); + assert!(response.status.is_empty()); + assert!(!response.item_capabilities_present); + assert_eq!(response.result_code, None); + assert_eq!(response.success, None); } #[test]