[M5] mxaccess-asb: register_items retry on InvalidConnectionId — LIVE PATH WORKS

End-to-end live path now functional: Connect → AuthenticateMe →
RegisterItems → Read → Disconnect. The example reads back the live
TestChildObject.TestInt value (99) over the wire on the first run.

Root cause of the previous "InvalidConnectionId" mystery: it was
never an HMAC verification failure. `AuthenticateMe` is one-way
(`AsbContracts.cs:18`) and the server commits auth state
asynchronously after the request lands. A Register that follows too
quickly sees the connection in pre-authenticated state and returns
`AsbErrorCode.InvalidConnectionId` (= 1).

.NET's `MxAsbDataClient.RegisterMany` (`cs:191-204`) handles this
explicitly with a retry loop:

  for (int attempt = 1; attempt < 5
       && response.Result.ErrorCode == InvalidConnectionId; attempt++)
  {
      Thread.Sleep(TimeSpan.FromMilliseconds(100 * attempt));
      response = RegisterOnce(items);
  }

We now mirror the same pattern: `AsbClient::register_items_once`
performs a single Register call, and `register_items` wraps it in a
retry loop with up to 5 attempts and `100 * attempt` ms backoff.
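
A minimal sketch of that retry shape, for illustration only. It is not
the code in this commit: `register_with_retry`, the closure parameter,
and the trimmed-down `RegisterItemsResponse` are hypothetical stand-ins,
and the real `register_items` talks to the server rather than calling a
closure.

```rust
use std::thread;
use std::time::Duration;

/// Result code the server returns while auth state is not yet committed.
const RESULT_CODE_INVALID_CONNECTION_ID: u32 = 1;

/// Hypothetical, trimmed-down stand-in for the real response type.
#[derive(Debug, Clone, PartialEq)]
struct RegisterItemsResponse {
    result_code: Option<u32>,
    success: Option<bool>,
}

/// Calls `register_once` up to `max_attempts` times. Before retry number
/// `n` it sleeps `base_delay * n`, and it only retries while the response
/// carries `RESULT_CODE_INVALID_CONNECTION_ID`.
/// Returns the last response and the number of calls made.
fn register_with_retry<F>(
    mut register_once: F,
    max_attempts: u32,
    base_delay: Duration,
) -> (RegisterItemsResponse, u32)
where
    F: FnMut() -> RegisterItemsResponse,
{
    let mut response = register_once();
    let mut attempts = 1;
    while attempts < max_attempts
        && response.result_code == Some(RESULT_CODE_INVALID_CONNECTION_ID)
    {
        // Linear backoff: 1x, 2x, 3x, ... the base delay.
        thread::sleep(base_delay * attempts);
        response = register_once();
        attempts += 1;
    }
    (response, attempts)
}

fn main() {
    // Simulated server: rejects the first two calls, then accepts.
    let mut calls = 0;
    let (response, attempts) = register_with_retry(
        || {
            calls += 1;
            if calls < 3 {
                RegisterItemsResponse {
                    result_code: Some(RESULT_CODE_INVALID_CONNECTION_ID),
                    success: Some(false),
                }
            } else {
                RegisterItemsResponse { result_code: Some(0), success: Some(true) }
            }
        },
        5,
        Duration::from_millis(100),
    );
    println!("success={:?} after {} attempt(s)", response.success, attempts);
}
```

Any other result code, including a genuine failure, is returned to the
caller on the first attempt, so only the transient race is retried.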

Supporting changes:
- `RegisterItemsResponse` gains `result_code: Option<u32>` +
  `success: Option<bool>` so callers can read `Result.resultCodeField`
  + `successField` from the response. `decode_register_items_response`
  now tolerates an empty `<ASBIData />` Status array (server returns
  empty when the operation fails server-side) instead of erroring
  with `MissingField`. New helper `find_text_in_named_element` walks
  the body token stream.
- New public constant `RESULT_CODE_INVALID_CONNECTION_ID = 1` for
  callers that want to detect this status outside the retry path.
- The previously failing test
  `decode_register_items_response_returns_missing_field_when_status_absent`
  was renamed and rewritten as
  `decode_register_items_response_returns_empty_status_when_absent`
  to match the new tolerant decode contract.
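
The tolerant decode described in the list above can be sketched roughly
as follows. This is an illustration under assumptions, not the crate's
code: the `Token` enum is a simplified stand-in for the real body token
stream, element names are matched without their namespace prefix, and
per-item Status parsing is left out.

```rust
/// Simplified, hypothetical stand-in for the decoded body token stream.
#[derive(Debug, Clone, PartialEq)]
enum Token {
    Start(String),
    Text(String),
    End,
}

const RESULT_CODE_INVALID_CONNECTION_ID: u32 = 1;

/// Trimmed-down response: both fields are optional, so a body that
/// lacks them decodes to `None` instead of an error.
#[derive(Debug, Default, PartialEq)]
struct RegisterItemsResponse {
    result_code: Option<u32>,
    success: Option<bool>,
}

/// Returns the first text token that directly follows a start element
/// named `name`, or `None` if no such element carries text.
fn find_text_in_named_element<'a>(tokens: &'a [Token], name: &str) -> Option<&'a str> {
    tokens.windows(2).find_map(|pair| match pair {
        [Token::Start(n), Token::Text(t)] if n.as_str() == name => Some(t.as_str()),
        _ => None,
    })
}

/// Tolerant decode: missing or unparsable fields become `None`.
fn decode_register_items_response(tokens: &[Token]) -> RegisterItemsResponse {
    RegisterItemsResponse {
        result_code: find_text_in_named_element(tokens, "resultCodeField")
            .and_then(|t| t.trim().parse().ok()),
        success: find_text_in_named_element(tokens, "successField")
            .and_then(|t| t.trim().parse().ok()),
    }
}

fn main() {
    // A failed Register: empty status element, then the result fields.
    let tokens = vec![
        Token::Start("ASBIData".to_string()),
        Token::End,
        Token::Start("resultCodeField".to_string()),
        Token::Text("1".to_string()),
        Token::End,
        Token::Start("successField".to_string()),
        Token::Text("false".to_string()),
        Token::End,
    ];
    let response = decode_register_items_response(&tokens);
    if response.result_code == Some(RESULT_CODE_INVALID_CONNECTION_ID) {
        println!("transient InvalidConnectionId, safe to retry: {:?}", response);
    }
}
```

Callers outside the retry path can compare `result_code` against
`RESULT_CODE_INVALID_CONNECTION_ID` in the same way.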

F31 closed. F30 (read-side dict-id resolution, landed in `eb6c689`)
was the unblocker — without it we couldn't see the
`<resultCodeField>1</>` element in the response and the failure mode
looked like an HMAC mismatch instead of a transient retryable error.

Workspace: 711 unit tests pass. Clippy clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Joseph Doherty
2026-05-05 21:02:38 -04:00
parent eb6c689f09
commit 9063f10b1b
3 changed files with 120 additions and 26 deletions
+3 -13
@@ -141,20 +141,10 @@ F25 (`mxaccess-asb` IASBIDataV2 client) and F26 (`mxaccess::Session` over `AsbTr
**Resolves when:** `decode_tokens` (or a post-pass over the token stream) substitutes `NbfxName::Static(id)` with `NbfxName::Inline(name)` whenever the dict id resolves to a known string. The dynamic dict (`read_dictionary`) accumulates session strings via `intern`; the read-path needs the parallel session counter to map wire ids to slots — wire ids are odd and session-cumulative across messages, mirroring the F28 fix on the write side. **Resolves**: F25 live data path (Read/Write/Subscribe responses are all dict-encoded too).
### F31 — `AuthenticateMe` HMAC silently invalid on the server (resultCode = `InvalidConnectionId`)
**Severity:** P1 — gates every signed and unsigned operation after Connect.
**Source:** Live capture + F30 dict-id resolution exposing the response `<b:resultCodeField>1</b:resultCodeField>` (= `AsbErrorCode.InvalidConnectionId` per `AsbResultMapping.cs:6`) plus `<b:successField>false</b:successField>`.
### F30 — Resolve dict-id element/attribute names on the read side (RESOLVED, commit `eb6c689`)
**Why this is mysterious:** the entire crypto stack is proven byte-equal to .NET (commit `ce27b63` deterministic HMAC fixture covers DH, crypto_key, HMAC-SHA1, PBKDF2-SHA1, AES-CBC PKCS7), the canonical XML emitter is fixture-validated against `request.ToXml()` (commit `f14580e`), the registry DH params are honoured (commit `f14580e`), and the wire-level `<h:ConnectionValidator>` now carries the same four xmlns declarations .NET emits (`xmlns:h`, default `xmlns`, `xmlns:xsi`, `xmlns:xsd` all in this commit). Yet the server reports `InvalidConnectionId` on Register, indicating that AuthenticateMe's HMAC failed to verify and the server discarded the connection state.
**Investigation done:** side-by-side `MX_ASB_TRACE_DERIVE` confirms passphrase bytes [96..176] of the crypto_key match .NET (commit `fd38189`); shared_secret bytes diverge per session because each peer chooses its own DH random, but the client+server pair derives the same value by construction.
**Hypotheses still standing:**
- The server's canonical-XML reconstruction uses `new XmlSerializer(type)` without the `"urn:invensys.schemas"` default namespace that the client passes in `AsbSerialization.cs:27` — would produce different bytes, mismatching HMAC. Untestable from outside the server.
- A subtle byte-level wire difference that affects deserialization (e.g. an attribute the server's XmlSerializer requires but XmlBinaryReader normalizes differently). Hard to find without server logs.
- Some other state the server tracks per-connection that we're not setting (e.g. a session token from `ServiceAuthenticationData` we ignore). The `ConnectResponse.ServiceAuthenticationData` is currently parsed but not fed back into anything; .NET's `AsbSystemAuthenticator` may use it for a downstream verification we're missing.
**Resolves when:** Either (a) the server is instrumented (`IncludeExceptionDetailInFaults` on the WCF service config, or a TraceListener on `System.ServiceModel.MessageLogging`) to surface the actual deserialization / HMAC mismatch reason; or (b) we capture .NET probe HMAC bytes alongside Rust HMAC bytes for a controlled scenario (fixed DH private key on both ends) and identify the byte-level divergence.
### F31 — InvalidConnectionId on first Register after AuthenticateMe — RESOLVED via retry
**Resolved:** `<this commit>`. Not an HMAC bug after all: `AsbErrorCode.InvalidConnectionId` (= 1) is a **transient race** condition that .NET's `MxAsbDataClient.RegisterMany` (`cs:191-204`) explicitly handles with a retry loop (`for (int attempt = 1; attempt < 5 && response.Result.ErrorCode == InvalidConnectionId; attempt++)` with `100*attempt` ms backoff). `AuthenticateMe` is one-way (`AsbContracts.cs:18`); the server commits auth state asynchronously after the request lands, and a Register that arrives too quickly sees the connection in pre-authenticated state. `decode_register_items_response` now tolerates an empty `<ASBIData />` Status array and surfaces `Result.resultCodeField` + `successField`; `AsbClient::register_items` makes up to 5 attempts while the server returns `RESULT_CODE_INVALID_CONNECTION_ID`, mirroring .NET. **Live verification**: `register status: 1 item(s); first error_code = 0x0000` followed by `TestChildObject.TestInt = AsbVariant { type_id: 4, length: 4, payload: [99, 0, 0, 0] }`, which is the real tag value `99` read over the live wire, end-to-end.
### F28 — Canonical XML serialiser for `ConnectedRequest` signing (matches `XmlSerializer.Serialize` byte-for-byte)
**Severity:** P0 — blocks every signed ASB operation (AuthenticateMe, RegisterItems, all data-plane RPCs).