[M5] mxaccess-asb: register_items retry on InvalidConnectionId — LIVE PATH WORKS

End-to-end live path now functional: Connect → AuthenticateMe →
RegisterItems → Read → Disconnect. The example reads back the live
TestChildObject.TestInt value (99) over the wire on the first run.

Root cause of the previous "InvalidConnectionId" mystery: it was
never an HMAC verification failure. `AuthenticateMe` is one-way
(`AsbContracts.cs:18`) and the server commits auth state
asynchronously after the request lands. A Register that follows too
quickly sees the connection in pre-authenticated state and returns
`AsbErrorCode.InvalidConnectionId` (= 1).

.NET's `MxAsbDataClient.RegisterMany` (`cs:191-204`) handles this
explicitly with a retry loop:

  for (int attempt = 1; attempt < 5
       && response.Result.ErrorCode == InvalidConnectionId; attempt++)
  {
      Thread.Sleep(TimeSpan.FromMilliseconds(100 * attempt));
      response = RegisterOnce(items);
  }

We now mirror the same pattern: `AsbClient::register_items_once`
performs a single attempt, and `register_items` wraps it in a retry
loop of up to 5 attempts in total (the initial call plus up to 4
retries), sleeping `100 * attempt` ms before each retry.
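
The backoff schedule described above can be sketched in isolation. This is a minimal, synchronous illustration and not the crate's code: `retry_on_invalid_connection_id`, `backoff`, `MAX_ATTEMPTS` and the injected `sleep` closure are hypothetical names, and the operation is reduced to a closure returning an optional result code.

```rust
use std::time::Duration;

/// Hypothetical stand-in for the crate's `RESULT_CODE_INVALID_CONNECTION_ID`.
const INVALID_CONNECTION_ID: u32 = 1;
/// Total attempts, counting the initial call (mirrors `attempt < 5`).
const MAX_ATTEMPTS: u32 = 5;

/// Delay before retry number `attempt` (1-based): 100 ms, 200 ms, ...
fn backoff(attempt: u32) -> Duration {
    Duration::from_millis(100 * u64::from(attempt))
}

/// Calls `op` once, then retries while it keeps returning the transient
/// code. `sleep` is injected so the schedule can be observed without
/// actually waiting.
fn retry_on_invalid_connection_id<F, S>(mut op: F, mut sleep: S) -> Option<u32>
where
    F: FnMut() -> Option<u32>,
    S: FnMut(Duration),
{
    let mut result = op();
    let mut attempt = 1;
    while attempt < MAX_ATTEMPTS && result == Some(INVALID_CONNECTION_ID) {
        sleep(backoff(attempt));
        result = op();
        attempt += 1;
    }
    result
}

fn main() {
    let mut calls = 0u32;
    let mut waited = Duration::ZERO;
    // Simulated server: rejects the first two calls, then succeeds.
    let result = retry_on_invalid_connection_id(
        || {
            calls += 1;
            if calls <= 2 { Some(INVALID_CONNECTION_ID) } else { Some(0) }
        },
        |d| waited += d,
    );
    println!("result={result:?} calls={calls} waited={waited:?}");
}
```

Under this schedule a connection that never authenticates costs five calls and 100 + 200 + 300 + 400 = 1000 ms of sleeping before the last `InvalidConnectionId` response is handed back to the caller.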

Supporting changes:
- `RegisterItemsResponse` gains `result_code: Option<u32>` +
  `success: Option<bool>` so callers can read `Result.resultCodeField`
  + `successField` from the response. `decode_register_items_response`
  now tolerates an empty `<ASBIData />` Status array (server returns
  empty when the operation fails server-side) instead of erroring
  with `MissingField`. New helper `find_text_in_named_element` walks
  the body token stream.
- New public constant `RESULT_CODE_INVALID_CONNECTION_ID = 1` for
  callers that want to detect this status outside the retry path.
- The test `decode_register_items_response_returns_missing_field_when_status_absent`,
  which asserted the old `MissingField` error, was renamed and rewritten
  as `decode_register_items_response_returns_empty_status_when_absent`
  to match the new tolerant decode contract.
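
As a rough illustration of how a caller outside the retry path might interpret the two new fields, here is a hedged sketch. `Outcome`, `Verdict` and `classify` are hypothetical names invented for this example; only the meaning of `0` (success) and `1` (`InvalidConnectionId`) is taken from this commit.

```rust
/// Simplified, hypothetical mirror of the two new `RegisterItemsResponse`
/// fields; the real struct carries more data.
struct Outcome {
    result_code: Option<u32>,
    success: Option<bool>,
}

/// Same value as the new public constant introduced by this commit.
const RESULT_CODE_INVALID_CONNECTION_ID: u32 = 1;

#[derive(Debug, PartialEq)]
enum Verdict {
    /// Result code 0 and no explicit `successField=false`.
    Ok,
    /// Transient race with the one-way AuthenticateMe; worth retrying.
    RetryableAuthRace,
    /// Any other operation-level result, carrying the raw code.
    Failed(u32),
    /// The response carried no result code at all.
    Unknown,
}

fn classify(o: &Outcome) -> Verdict {
    match (o.result_code, o.success) {
        (Some(RESULT_CODE_INVALID_CONNECTION_ID), _) => Verdict::RetryableAuthRace,
        (Some(0), Some(true)) | (Some(0), None) => Verdict::Ok,
        (Some(code), _) => Verdict::Failed(code),
        (None, _) => Verdict::Unknown,
    }
}

fn main() {
    let race = Outcome { result_code: Some(1), success: Some(false) };
    let ok = Outcome { result_code: Some(0), success: Some(true) };
    println!("{:?} {:?}", classify(&race), classify(&ok));
}
```

Keeping both fields as `Option` lets the caller distinguish "the server reported a failure" from "the field was not found in the response", which is the case the rewritten test exercises.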

F31 closed. F30 (read-side dict-id resolution, landed in `eb6c689`)
was the unblocker — without it we couldn't see the
`<resultCodeField>1</>` element in the response and the failure mode
looked like an HMAC mismatch instead of a transient retryable error.

Workspace: 711 unit tests pass. Clippy clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Joseph Doherty
2026-05-05 21:02:38 -04:00
parent eb6c689f09
commit 9063f10b1b
3 changed files with 120 additions and 26 deletions
+3 -13
@@ -141,20 +141,10 @@ F25 (`mxaccess-asb` IASBIDataV2 client) and F26 (`mxaccess::Session` over `AsbTr
**Resolves when:** `decode_tokens` (or a post-pass over the token stream) substitutes `NbfxName::Static(id)` with `NbfxName::Inline(name)` whenever the dict id resolves to a known string. The dynamic dict (`read_dictionary`) accumulates session strings via `intern`; the read-path needs the parallel session counter to map wire ids to slots — wire ids are odd and session-cumulative across messages, mirroring the F28 fix on the write side. **Resolves**: F25 live data path (Read/Write/Subscribe responses are all dict-encoded too).
### F31 `AuthenticateMe` HMAC silently invalid on the server (resultCode = `InvalidConnectionId`)
**Severity:** P1 — gates every signed and unsigned operation after Connect.
**Source:** Live capture + F30 dict-id resolution exposing the response `<b:resultCodeField>1</b:resultCodeField>` (= `AsbErrorCode.InvalidConnectionId` per `AsbResultMapping.cs:6`) plus `<b:successField>false</b:successField>`.
### F30 Resolve dict-id element/attribute names on the read side (RESOLVED, commit `eb6c689`)
**Why this is mysterious:** the entire crypto stack is proven byte-equal to .NET (commit `ce27b63` deterministic HMAC fixture covers DH, crypto_key, HMAC-SHA1, PBKDF2-SHA1, AES-CBC PKCS7), the canonical XML emitter is fixture-validated against `request.ToXml()` (commit `f14580e`), the registry DH params are honoured (commit `f14580e`), and the wire-level `<h:ConnectionValidator>` now carries the same four xmlns declarations .NET emits (`xmlns:h`, default `xmlns`, `xmlns:xsi`, `xmlns:xsd` all in this commit). Yet the server reports `InvalidConnectionId` on Register, indicating that AuthenticateMe's HMAC failed to verify and the server discarded the connection state.
**Investigation done:** side-by-side `MX_ASB_TRACE_DERIVE` confirms passphrase bytes [96..176] of the crypto_key match .NET (commit `fd38189`); shared_secret bytes diverge per session because each peer chooses its own DH random, but the client+server pair derives the same value by construction.
**Hypotheses still standing:**
- The server's canonical-XML reconstruction uses `new XmlSerializer(type)` without the `"urn:invensys.schemas"` default namespace that the client passes in `AsbSerialization.cs:27` — would produce different bytes, mismatching HMAC. Untestable from outside the server.
- A subtle byte-level wire difference that affects deserialization (e.g. an attribute the server's XmlSerializer requires but XmlBinaryReader normalizes differently). Hard to find without server logs.
- Some other state the server tracks per-connection that we're not setting (e.g. a session token from `ServiceAuthenticationData` we ignore). The `ConnectResponse.ServiceAuthenticationData` is currently parsed but not fed back into anything; .NET's `AsbSystemAuthenticator` may use it for a downstream verification we're missing.
**Resolves when:** Either (a) the server is instrumented (`IncludeExceptionDetailInFaults` on the WCF service config, or a TraceListener on `System.ServiceModel.MessageLogging`) to surface the actual deserialization / HMAC mismatch reason; or (b) we capture .NET probe HMAC bytes alongside Rust HMAC bytes for a controlled scenario (fixed DH private key on both ends) and identify the byte-level divergence.
### F31 — InvalidConnectionId on first Register after AuthenticateMe — RESOLVED via retry
**Resolved:** `<this commit>`. Not an HMAC bug after all: `AsbErrorCode.InvalidConnectionId` (= 1) is a **transient race** condition that .NET's `MxAsbDataClient.RegisterMany` (`cs:191-204`) explicitly handles with a retry loop (`for (int attempt = 1; attempt < 5 && response.Result.ErrorCode == InvalidConnectionId; attempt++)` with `100*attempt` ms backoff). `AuthenticateMe` is one-way (`AsbContracts.cs:18`); the server commits auth state asynchronously after the request lands, and a Register that arrives too quickly sees the connection in pre-authenticated state. `decode_register_items_response` now tolerates an empty `<ASBIData />` Status array and surfaces `Result.resultCodeField` + `successField`; `AsbClient::register_items` makes up to 5 attempts (the initial call plus up to 4 retries) while the result is `RESULT_CODE_INVALID_CONNECTION_ID`, mirroring .NET. **Live verification**: `register status: 1 item(s); first error_code = 0x0000` followed by `TestChildObject.TestInt = AsbVariant { type_id: 4, length: 4, payload: [99, 0, 0, 0] }`, i.e. the real tag value `99` read over the live wire, end-to-end.
### F28 — Canonical XML serialiser for `ConnectedRequest` signing (matches `XmlSerializer.Serialize` byte-for-byte)
**Severity:** P0 — blocks every signed ASB operation (AuthenticateMe, RegisterItems, all data-plane RPCs).
+32 -1
@@ -547,12 +547,43 @@ impl<T: AsyncRead + AsyncWrite + Unpin + Send> AsbClient<T> {
}
/// `RegisterItems` operation — sends a signed `RegisterItemsIn`
/// SOAP envelope and decodes the `RegisterItemsResponse`.
/// SOAP envelope and decodes the `RegisterItemsResponse`. Makes up
/// to 5 attempts (the initial call plus up to 4 retries, sleeping
/// `100 * attempt` ms before each retry) while the server returns
/// `InvalidConnectionId` (`AsbErrorCode = 1`), mirroring .NET's
/// `MxAsbDataClient.RegisterMany` (`cs:191-204`). The retry exists
/// because `AuthenticateMe` is one-way: the server may not have
/// finished processing it before our first Register lands, in
/// which case it sees an unauthenticated connection and returns
/// `InvalidConnectionId`. A short backoff lets the auth state
/// commit on the server side.
pub async fn register_items(
&mut self,
items: &[ItemIdentity],
require_id: bool,
register_only: bool,
) -> Result<RegisterItemsResponse, ClientError> {
let mut response = self
.register_items_once(items, require_id, register_only)
.await?;
let mut attempt = 1u32;
while attempt < 5
&& response.result_code
== Some(crate::operations::RESULT_CODE_INVALID_CONNECTION_ID)
{
tokio::time::sleep(std::time::Duration::from_millis(100 * u64::from(attempt))).await;
response = self
.register_items_once(items, require_id, register_only)
.await?;
attempt += 1;
}
Ok(response)
}
async fn register_items_once(
&mut self,
items: &[ItemIdentity],
require_id: bool,
register_only: bool,
) -> Result<RegisterItemsResponse, ClientError> {
let pre_signing = ConnectionValidator {
connection_id: self.authenticator.connection_id(),
+85 -12
@@ -1080,8 +1080,24 @@ pub struct RegisterItemsResponse {
/// Whether the `<ItemCapabilities>` element appeared. Decoding the
/// individual `ItemRegistration` records is a future iteration.
pub item_capabilities_present: bool,
/// `Result.resultCodeField` from the response — `0` is success,
/// `1` is `InvalidConnectionId` (transient race with the one-way
/// AuthenticateMe), see `AsbResultMapping.cs:6` for the full enum.
/// `None` if the field wasn't found in the response.
pub result_code: Option<u32>,
/// `Result.successField` — `false` means the operation failed
/// server-side and the per-item Status array is empty.
pub success: Option<bool>,
}
/// `AsbErrorCode.InvalidConnectionId` per `AsbResultMapping.cs:6`.
/// Surfaces as `Result.resultCodeField=1` when the server has not
/// (yet) processed our one-way AuthenticateMe and treats the
/// connection as unauthenticated. .NET's `MxAsbDataClient.RegisterMany`
/// (`cs:191-204`) makes up to 5 attempts, sleeping 100*N ms before
/// retry N; we mirror that pattern in `AsbClient::register_items`.
pub const RESULT_CODE_INVALID_CONNECTION_ID: u32 = 1;
/// Decoded `UnregisterItemsResponse`. Single field: the per-item
/// `Status` array (`AsbContracts.cs:153-159`).
#[derive(Debug, Clone, PartialEq)]
@@ -1090,23 +1106,75 @@ pub struct UnregisterItemsResponse {
}
/// Decode a `RegisterItemsResponse` SOAP body from the NBFX token
/// stream returned by [`crate::decode_envelope`].
/// stream returned by [`crate::decode_envelope`]. Tolerates an empty
/// or missing `<ASBIData>` (Status array) — that's how the server
/// signals an operation-level failure (e.g. `successField=false` +
/// non-zero `resultCodeField`). Caller is expected to inspect
/// `result_code` for transient failures like InvalidConnectionId
/// and retry where appropriate.
pub fn decode_register_items_response(
body_tokens: &[NbfxToken],
) -> Result<RegisterItemsResponse, OperationError> {
let payloads = collect_asbidata_payloads(body_tokens);
let status_payload = payloads
.into_iter()
.next()
.ok_or(OperationError::MissingField { field: "Status" })?;
let status = decode_item_status_array(&status_payload)?;
let status = match payloads.into_iter().next() {
Some(payload) if !payload.is_empty() => decode_item_status_array(&payload)?,
_ => Vec::new(),
};
let item_capabilities_present = find_element_named(body_tokens, "ItemCapabilities").is_some();
let result_code = find_text_in_named_element(body_tokens, "resultCodeField")
.and_then(|s| s.parse().ok());
let success = find_text_in_named_element(body_tokens, "successField")
.map(|s| s.eq_ignore_ascii_case("true"));
Ok(RegisterItemsResponse {
status,
item_capabilities_present,
result_code,
success,
})
}
/// Walk the token stream looking for an element with the given local
/// name (inline match) and return its first text child as a string.
/// Used to extract `Result.resultCodeField`, `successField`, etc.
/// from the structured RegisterItemsResponse body.
fn find_text_in_named_element(tokens: &[NbfxToken], name: &str) -> Option<String> {
    let mut idx = 0;
    while let Some(tok) = tokens.get(idx) {
        if let NbfxToken::Element {
            name: NbfxName::Inline(local),
            ..
        } = tok
        {
            if local == name {
                let mut inner = idx + 1;
                while matches!(
                    tokens.get(inner),
                    Some(NbfxToken::Attribute { .. })
                        | Some(NbfxToken::DefaultNamespace { .. })
                        | Some(NbfxToken::NamespaceDeclaration { .. })
                ) {
                    inner += 1;
                }
                if let Some(NbfxToken::Text(text)) = tokens.get(inner) {
                    return Some(match text {
                        NbfxText::Chars(s) => s.clone(),
                        NbfxText::Zero => "0".to_string(),
                        NbfxText::One => "1".to_string(),
                        NbfxText::Bool(b) => b.to_string(),
                        NbfxText::Int8(n) => n.to_string(),
                        NbfxText::Int16(n) => n.to_string(),
                        NbfxText::Int32(n) => n.to_string(),
                        NbfxText::Int64(n) => n.to_string(),
                        _ => return None,
                    });
                }
            }
        }
        idx += 1;
    }
    None
}
/// Decode an `UnregisterItemsResponse` SOAP body.
pub fn decode_unregister_items_response(
body_tokens: &[NbfxToken],
@@ -1610,13 +1678,18 @@ mod tests {
}
#[test]
fn decode_register_items_response_returns_missing_field_when_status_absent() {
fn decode_register_items_response_returns_empty_status_when_absent() {
// Per the live wire capture, the server returns an empty
// `<ASBIData />` (Status array) when an operation fails (e.g.
// `successField=false` + `resultCodeField=1`). Decode now
// tolerates this rather than erroring with `MissingField` —
// callers inspect `result_code` for the failure reason.
let body = asbidata_request_body("RegisterItemsResponse", &[]);
let err = decode_register_items_response(&body).unwrap_err();
assert!(matches!(
err,
OperationError::MissingField { field: "Status" }
));
let response = decode_register_items_response(&body).unwrap();
assert!(response.status.is_empty());
assert!(!response.item_capabilities_present);
assert_eq!(response.result_code, None);
assert_eq!(response.success, None);
}
#[test]