[M5] mxaccess-asb: register_items retry on InvalidConnectionId — LIVE PATH WORKS
rust / build / test / clippy / fmt (push) Has been cancelled
rust / build / test / clippy / fmt (push) Has been cancelled
End-to-end live path now functional: Connect → AuthenticateMe →
RegisterItems → Read → Disconnect. The example reads back the live
TestChildObject.TestInt value (99) over the wire on the first run.
Root-cause of the previous "InvalidConnectionId" mystery: it was
never an HMAC verification failure. `AuthenticateMe` is one-way
(`AsbContracts.cs:18`) and the server commits auth state
asynchronously after the request lands. A Register that follows too
quickly sees the connection in pre-authenticated state and returns
`AsbErrorCode.InvalidConnectionId` (= 1).
.NET's `MxAsbDataClient.RegisterMany` (`cs:191-204`) handles this
explicitly with a retry loop:
for (int attempt = 1; attempt < 5
&& response.Result.ErrorCode == InvalidConnectionId; attempt++)
{
Thread.Sleep(TimeSpan.FromMilliseconds(100 * attempt));
response = RegisterOnce(items);
}
We now mirror the same pattern in `AsbClient::register_items_once`
followed by a retry loop in `register_items` — up to 5 attempts with
`100 * attempt` ms backoff.
Supporting changes:
- `RegisterItemsResponse` gains `result_code: Option<u32>` +
`success: Option<bool>` so callers can read `Result.resultCodeField`
+ `successField` from the response. `decode_register_items_response`
now tolerates an empty `<ASBIData />` Status array (server returns
empty when the operation fails server-side) instead of erroring
with `MissingField`. New helper `find_text_in_named_element` walks
the body token stream.
- New public constant `RESULT_CODE_INVALID_CONNECTION_ID = 1` for
callers that want to detect this status outside the retry path.
- The previously-failing test `decode_register_items_response_returns_
missing_field_when_status_absent` was renamed and rewritten as
`decode_register_items_response_returns_empty_status_when_absent`
to match the new tolerant decode contract.
F31 closed. F30 (read-side dict-id resolution, landed in `eb6c689`)
was the unblocker — without it we couldn't see the
`<resultCodeField>1</>` element in the response and the failure mode
looked like a HMAC mismatch instead of a transient retryable error.
Workspace: 711 unit tests pass. Clippy clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
+3
-13
@@ -141,20 +141,10 @@ F25 (`mxaccess-asb` IASBIDataV2 client) and F26 (`mxaccess::Session` over `AsbTr
|
||||
|
||||
**Resolves when:** `decode_tokens` (or a post-pass over the token stream) substitutes `NbfxName::Static(id)` with `NbfxName::Inline(name)` whenever the dict id resolves to a known string. The dynamic dict (`read_dictionary`) accumulates session strings via `intern`; the read-path needs the parallel session counter to map wire ids to slots — wire ids are odd and session-cumulative across messages, mirroring the F28 fix on the write side. **Resolves**: F25 live data path (Read/Write/Subscribe responses are all dict-encoded too).
|
||||
|
||||
### F31 — `AuthenticateMe` HMAC silently invalid on the server (resultCode = `InvalidConnectionId`)
|
||||
**Severity:** P1 — gates every signed and unsigned operation after Connect.
|
||||
**Source:** Live capture + F30 dict-id resolution exposing the response `<b:resultCodeField>1</b:resultCodeField>` (= `AsbErrorCode.InvalidConnectionId` per `AsbResultMapping.cs:6`) plus `<b:successField>false</b:successField>`.
|
||||
### F30 — Resolve dict-id element/attribute names on the read side (RESOLVED, commit `eb6c689`)
|
||||
|
||||
**Why this is mysterious:** the entire crypto stack is proven byte-equal to .NET (commit `ce27b63` deterministic HMAC fixture covers DH, crypto_key, HMAC-SHA1, PBKDF2-SHA1, AES-CBC PKCS7), the canonical XML emitter is fixture-validated against `request.ToXml()` (commit `f14580e`), the registry DH params are honoured (commit `f14580e`), and the wire-level `<h:ConnectionValidator>` now carries the same four xmlns declarations .NET emits (`xmlns:h`, default `xmlns`, `xmlns:xsi`, `xmlns:xsd` all in this commit). Yet the server reports `InvalidConnectionId` on Register, indicating that AuthenticateMe's HMAC failed to verify and the server discarded the connection state.
|
||||
|
||||
**Investigation done:** side-by-side `MX_ASB_TRACE_DERIVE` confirms passphrase bytes [96..176] of the crypto_key match .NET (commit `fd38189`); shared_secret bytes diverge per session because each peer chooses its own DH random, but the client+server pair derives the same value by construction.
|
||||
|
||||
**Hypotheses still standing:**
|
||||
- The server's canonical-XML reconstruction uses `new XmlSerializer(type)` without the `"urn:invensys.schemas"` default namespace that the client passes in `AsbSerialization.cs:27` — would produce different bytes, mismatching HMAC. Untestable from outside the server.
|
||||
- A subtle byte-level wire difference that affects deserialization (e.g. an attribute the server's XmlSerializer requires but XmlBinaryReader normalizes differently). Hard to find without server logs.
|
||||
- Some other state the server tracks per-connection that we're not setting (e.g. a session token from `ServiceAuthenticationData` we ignore). The `ConnectResponse.ServiceAuthenticationData` is currently parsed but not fed back into anything; .NET's `AsbSystemAuthenticator` may use it for a downstream verification we're missing.
|
||||
|
||||
**Resolves when:** Either (a) the server is instrumented (`IncludeExceptionDetailInFaults` on the WCF service config, or a TraceListener on `System.ServiceModel.MessageLogging`) to surface the actual deserialization / HMAC mismatch reason; or (b) we capture .NET probe HMAC bytes alongside Rust HMAC bytes for a controlled scenario (fixed DH private key on both ends) and identify the byte-level divergence.
|
||||
### F31 — InvalidConnectionId on first Register after AuthenticateMe — RESOLVED via retry
|
||||
**Resolved:** `<this commit>`. Not a HMAC bug after all — `AsbErrorCode.InvalidConnectionId` (= 1) is a **transient race** condition that .NET's `MxAsbDataClient.RegisterMany` (`cs:191-204`) explicitly handles with a retry loop (`for (int attempt = 1; attempt < 5 && response.Result.ErrorCode == InvalidConnectionId; attempt++)` with `100*attempt` ms backoff). `AuthenticateMe` is one-way (`AsbContracts.cs:18`); the server commits auth state asynchronously after the request lands, and a Register that arrives too quickly sees the connection in pre-authenticated state. `decode_register_items_response` now tolerates an empty `<ASBIData />` Status array and surfaces `Result.resultCodeField` + `successField`; `AsbClient::register_items` retries up to 5 times on `RESULT_CODE_INVALID_CONNECTION_ID`, mirroring .NET. **Live verification**: `register status: 1 item(s); first error_code = 0x0000` followed by `TestChildObject.TestInt = AsbVariant { type_id: 4, length: 4, payload: [99, 0, 0, 0] }` — the real tag value `99` over the live wire, end-to-end.
|
||||
|
||||
### F28 — Canonical XML serialiser for `ConnectedRequest` signing (matches `XmlSerializer.Serialize` byte-for-byte)
|
||||
**Severity:** P0 — blocks every signed ASB operation (AuthenticateMe, RegisterItems, all data-plane RPCs).
|
||||
|
||||
@@ -547,12 +547,43 @@ impl<T: AsyncRead + AsyncWrite + Unpin + Send> AsbClient<T> {
|
||||
}
|
||||
|
||||
/// `RegisterItems` operation — sends a signed `RegisterItemsIn`
|
||||
/// SOAP envelope and decodes the `RegisterItemsResponse`.
|
||||
/// SOAP envelope and decodes the `RegisterItemsResponse`. Retries
|
||||
/// up to 5 times with `100 * attempt` ms backoff on
|
||||
/// `InvalidConnectionId` (`AsbErrorCode = 1`), mirroring .NET's
|
||||
/// `MxAsbDataClient.RegisterMany` (`cs:191-204`). The retry exists
|
||||
/// because `AuthenticateMe` is one-way: the server may not have
|
||||
/// finished processing it before our first Register lands, in
|
||||
/// which case it sees an unauthenticated connection and returns
|
||||
/// `InvalidConnectionId`. A short backoff lets the auth state
|
||||
/// commit on the server side.
|
||||
pub async fn register_items(
|
||||
&mut self,
|
||||
items: &[ItemIdentity],
|
||||
require_id: bool,
|
||||
register_only: bool,
|
||||
) -> Result<RegisterItemsResponse, ClientError> {
|
||||
let mut response = self
|
||||
.register_items_once(items, require_id, register_only)
|
||||
.await?;
|
||||
let mut attempt = 1u32;
|
||||
while attempt < 5
|
||||
&& response.result_code
|
||||
== Some(crate::operations::RESULT_CODE_INVALID_CONNECTION_ID)
|
||||
{
|
||||
tokio::time::sleep(std::time::Duration::from_millis(100 * u64::from(attempt))).await;
|
||||
response = self
|
||||
.register_items_once(items, require_id, register_only)
|
||||
.await?;
|
||||
attempt += 1;
|
||||
}
|
||||
Ok(response)
|
||||
}
|
||||
|
||||
async fn register_items_once(
|
||||
&mut self,
|
||||
items: &[ItemIdentity],
|
||||
require_id: bool,
|
||||
register_only: bool,
|
||||
) -> Result<RegisterItemsResponse, ClientError> {
|
||||
let pre_signing = ConnectionValidator {
|
||||
connection_id: self.authenticator.connection_id(),
|
||||
|
||||
@@ -1080,8 +1080,24 @@ pub struct RegisterItemsResponse {
|
||||
/// Whether the `<ItemCapabilities>` element appeared. Decoding the
|
||||
/// individual `ItemRegistration` records is a future iteration.
|
||||
pub item_capabilities_present: bool,
|
||||
/// `Result.resultCodeField` from the response — `0` is success,
|
||||
/// `1` is `InvalidConnectionId` (transient race with the one-way
|
||||
/// AuthenticateMe), see `AsbResultMapping.cs:6` for the full enum.
|
||||
/// `None` if the field wasn't found in the response.
|
||||
pub result_code: Option<u32>,
|
||||
/// `Result.successField` — `false` means the operation failed
|
||||
/// server-side and the per-item Status array is empty.
|
||||
pub success: Option<bool>,
|
||||
}
|
||||
|
||||
/// `AsbErrorCode.InvalidConnectionId` per `AsbResultMapping.cs:6`.
|
||||
/// Surfaces as `Result.resultCodeField=1` when the server has not
|
||||
/// (yet) processed our one-way AuthenticateMe and treats the
|
||||
/// connection as unauthenticated. .NET's `MxAsbDataClient.RegisterMany`
|
||||
/// (`cs:191-204`) retries up to 5 times with a 100*N ms backoff per
|
||||
/// attempt — we mirror that pattern in `AsbClient::register_items`.
|
||||
pub const RESULT_CODE_INVALID_CONNECTION_ID: u32 = 1;
|
||||
|
||||
/// Decoded `UnregisterItemsResponse`. Single field: the per-item
|
||||
/// `Status` array (`AsbContracts.cs:153-159`).
|
||||
#[derive(Debug, Clone, PartialEq)]
|
||||
@@ -1090,23 +1106,75 @@ pub struct UnregisterItemsResponse {
|
||||
}
|
||||
|
||||
/// Decode a `RegisterItemsResponse` SOAP body from the NBFX token
|
||||
/// stream returned by [`crate::decode_envelope`].
|
||||
/// stream returned by [`crate::decode_envelope`]. Tolerates an empty
|
||||
/// or missing `<ASBIData>` (Status array) — that's how the server
|
||||
/// signals an operation-level failure (e.g. `successField=false` +
|
||||
/// non-zero `resultCodeField`). Caller is expected to inspect
|
||||
/// `result_code` for transient failures like InvalidConnectionId
|
||||
/// and retry where appropriate.
|
||||
pub fn decode_register_items_response(
|
||||
body_tokens: &[NbfxToken],
|
||||
) -> Result<RegisterItemsResponse, OperationError> {
|
||||
let payloads = collect_asbidata_payloads(body_tokens);
|
||||
let status_payload = payloads
|
||||
.into_iter()
|
||||
.next()
|
||||
.ok_or(OperationError::MissingField { field: "Status" })?;
|
||||
let status = decode_item_status_array(&status_payload)?;
|
||||
let status = match payloads.into_iter().next() {
|
||||
Some(payload) if !payload.is_empty() => decode_item_status_array(&payload)?,
|
||||
_ => Vec::new(),
|
||||
};
|
||||
let item_capabilities_present = find_element_named(body_tokens, "ItemCapabilities").is_some();
|
||||
let result_code = find_text_in_named_element(body_tokens, "resultCodeField")
|
||||
.and_then(|s| s.parse().ok());
|
||||
let success = find_text_in_named_element(body_tokens, "successField")
|
||||
.map(|s| s.eq_ignore_ascii_case("true"));
|
||||
Ok(RegisterItemsResponse {
|
||||
status,
|
||||
item_capabilities_present,
|
||||
result_code,
|
||||
success,
|
||||
})
|
||||
}
|
||||
|
||||
/// Walk the token stream looking for an element with the given local
|
||||
/// name (inline match) and return its first text child as a string.
|
||||
/// Used to extract `Result.resultCodeField`, `successField`, etc.
|
||||
/// from the structured RegisterItemsResponse body.
|
||||
fn find_text_in_named_element(tokens: &[NbfxToken], name: &str) -> Option<String> {
|
||||
let mut idx = 0;
|
||||
while let Some(tok) = tokens.get(idx) {
|
||||
if let NbfxToken::Element {
|
||||
name: NbfxName::Inline(local),
|
||||
..
|
||||
} = tok
|
||||
{
|
||||
if local == name {
|
||||
let mut inner = idx + 1;
|
||||
while matches!(
|
||||
tokens.get(inner),
|
||||
Some(NbfxToken::Attribute { .. })
|
||||
| Some(NbfxToken::DefaultNamespace { .. })
|
||||
| Some(NbfxToken::NamespaceDeclaration { .. })
|
||||
) {
|
||||
inner += 1;
|
||||
}
|
||||
if let Some(NbfxToken::Text(text)) = tokens.get(inner) {
|
||||
return Some(match text {
|
||||
NbfxText::Chars(s) => s.clone(),
|
||||
NbfxText::Zero => "0".to_string(),
|
||||
NbfxText::One => "1".to_string(),
|
||||
NbfxText::Bool(b) => b.to_string(),
|
||||
NbfxText::Int8(n) => n.to_string(),
|
||||
NbfxText::Int16(n) => n.to_string(),
|
||||
NbfxText::Int32(n) => n.to_string(),
|
||||
NbfxText::Int64(n) => n.to_string(),
|
||||
_ => return None,
|
||||
});
|
||||
}
|
||||
}
|
||||
}
|
||||
idx += 1;
|
||||
}
|
||||
None
|
||||
}
|
||||
|
||||
/// Decode an `UnregisterItemsResponse` SOAP body.
|
||||
pub fn decode_unregister_items_response(
|
||||
body_tokens: &[NbfxToken],
|
||||
@@ -1610,13 +1678,18 @@ mod tests {
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn decode_register_items_response_returns_missing_field_when_status_absent() {
|
||||
fn decode_register_items_response_returns_empty_status_when_absent() {
|
||||
// Per the live wire capture, the server returns an empty
|
||||
// `<ASBIData />` (Status array) when an operation fails (e.g.
|
||||
// `successField=false` + `resultCodeField=1`). Decode now
|
||||
// tolerates this rather than erroring with `MissingField` —
|
||||
// callers inspect `result_code` for the failure reason.
|
||||
let body = asbidata_request_body("RegisterItemsResponse", &[]);
|
||||
let err = decode_register_items_response(&body).unwrap_err();
|
||||
assert!(matches!(
|
||||
err,
|
||||
OperationError::MissingField { field: "Status" }
|
||||
));
|
||||
let response = decode_register_items_response(&body).unwrap();
|
||||
assert!(response.status.is_empty());
|
||||
assert!(!response.item_capabilities_present);
|
||||
assert_eq!(response.result_code, None);
|
||||
assert_eq!(response.success, None);
|
||||
}
|
||||
|
||||
#[test]
|
||||
|
||||
Reference in New Issue
Block a user