[M5] mxaccess-asb-nettcp/asb: Connect handshake live + SOAP fault detection

Live-bring-up reconciliation against AVEVA's MxDataProvider on Windows.
Connect now completes end-to-end (real DH key exchange, apollo:V2
encryption, ServicePublicKey/ServiceAuthenticationData populated). Five
fixes land:

1. NBFX `PrefixElement_a..z` (0x5E-0x77) and `PrefixAttribute_a..z`
   (0x26-0x3F) decode + encode arms. The server's ConnectResponse hit
   `0x65 = PrefixElement_h` for a dynamically-named element and our
   decoder bailed with `unknown NBFX record byte 0x65`. Both directions
   now round-trip; encoder picks short-form when prefix is a single
   lowercase ASCII letter.

2. xmlns redeclaration on `<Data>` AND `<InitializationVector>` inside
   `AuthenticationData` / `PublicKey`. `[XmlType(Namespace = ...)]` on
   AuthenticationData / PublicKey (`AsbContracts.cs:350-381`) means
   XmlSerializer emits `xmlns="..."` on each direct child. The default-
   ns scope ends at `</Data>`, so `<InitializationVector>` needs its own
   redeclaration to stay in the data namespace; without it the server
   fell back to messages-namespace and the deserialiser threw an
   `InternalServiceFault`.

3. SOAP-fault detection in `AsbClient::send_envelope`. New
   `ClientError::SoapFault { action, code, reason }` surfaces when the
   response Action header matches the canonical `dispatcher/fault`
   template; previously body decoders blindly ran and surfaced
   `MissingField { field: "Status" }` masking the actual fault. Reason
   text is extracted as the longest `NbfxText::Chars` in the body —
   robust against the `nbfs.rs` static-dictionary id mismatches.

4. Identified blocker (filed as F28): signed-request HMAC currently
   covers the NBFX wire bytes, but .NET's `AsbSystemAuthenticator.Sign`
   HMACs `Encoding.UTF8.GetBytes(request.ToXml())` — the canonical XML
   serialisation via `XmlSerializer` with namespace
   `urn:invensys.schemas` (`AsbSerialization.cs:12-48`). Until the Rust
   port emits identical XML bytes for `ConnectedRequest` subclasses,
   AuthenticateMe, RegisterItems, and every other signed RPC will fault
   on the server. Connect itself is unsigned (`ServiceMessage`, not
   `ConnectedRequest`), which is why it works today.

5. Identified `nbfs.rs` static-dictionary id drift (filed as F29): wire
   uses Fault=134/Code=142/Reason=144/Text=146/Value=154/Subcode=156
   but our table has them at 114/122/124/126/134/136. Off by 20 from
   id 114+ — 10 missing entries between `s` (id 112) and `Fault`. No
   request-side impact (we only encode IDs ≤44, all correct); the SOAP
   fault decode walks text records directly so it sidesteps the issue.

Workspace: 702 tests pass (no test count delta — wire-only fixes).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Joseph Doherty
2026-05-05 16:29:12 -04:00
parent 4c4177050c
commit d1e887b91b
4 changed files with 172 additions and 1 deletions
+24 -1
@@ -46,7 +46,17 @@ move to `## Resolved` with a date + commit hash.
**Resolves when:** F19-F26 are all closed and the four DoD bullets above pass.
**Cumulative execution log.** F19 + F23 (`ed17c07`); F24 (`7611d9e`); F20 (`9dfd193`); F22 (`43c10a1`); F21 (`5f98558`); F25 step 1 (`25dbd8d`); F25 step 2 (`a2b8989`); F25 step 3 (`c4bf0a0`); F25 step 4 (`1e59249`); F25 step 5 (`9b8133f`); F25 step 6 (`321b796`); F25 step 7 (`1b1ee1e`); F26 step 1 (`8a0f92b`); F26 step 2 (`14bb529`); example rewrite (`c6570dc`); F25 step 8 (`b543eb1`); F25 step 9 (`0441a2e`); F25 step 10 (`9876b4e`); F26 step 3 landed in this commit:
**Cumulative execution log.** F19 + F23 (`ed17c07`); F24 (`7611d9e`); F20 (`9dfd193`); F22 (`43c10a1`); F21 (`5f98558`); F25 step 1 (`25dbd8d`); F25 step 2 (`a2b8989`); F25 step 3 (`c4bf0a0`); F25 step 4 (`1e59249`); F25 step 5 (`9b8133f`); F25 step 6 (`321b796`); F25 step 7 (`1b1ee1e`); F26 step 1 (`8a0f92b`); F26 step 2 (`14bb529`); example rewrite (`c6570dc`); F25 step 8 (`b543eb1`); F25 step 9 (`0441a2e`); F25 step 10 (`9876b4e`); F26 step 3 (`<previous>`); **F25 live-bring-up reconciliation** (this commit):
- F25 live-bring-up reconciliation: live `asb-subscribe` + `asb-relay` (TCP middleman) capture-and-diff against AVEVA's MxDataProvider on Windows. Five concrete fixes landed:
1. **NBFX `PrefixElement_a..z` (0x5E-0x77) and `PrefixAttribute_a..z` (0x26-0x3F) decode + encode arms** — single-letter-prefix records that WCF emits in responses but our codec only recognised the dictionary-named cousins (`PrefixDictionaryElement_a..z` 0x44-0x5D, `PrefixDictionaryAttribute_a..z` 0x0C-0x25). The server's ConnectResponse hit `0x65 = PrefixElement_h` for a dynamically-named element (e.g. `<h:Foo>`) and our decoder bailed with `unknown NBFX record byte 0x65`. Both directions now round-trip; the encoder picks the short-form arm whenever `prefix_letter_offset(prefix).is_some()`.
2. **xmlns redeclaration on `<Data>` and `<InitializationVector>` inside `AuthenticationData` / `PublicKey`**: `[XmlType(Namespace = "http://asb.contracts.data/20111111")]` on the AuthenticationData / PublicKey classes (`AsbContracts.cs:350-381`) means XmlSerializer emits an `xmlns="..."` redeclaration on each direct child. The default-ns scope ends at `</Data>`, so `<InitializationVector>` needs its own redeclaration to stay in the data namespace; without it the server fell back to messages-namespace and the deserialiser threw an `InternalServiceFault`. Connect handshake now completes end-to-end with the apollo:V2 ConnectionLifetime and a real ServicePublicKey.
3. **SOAP-fault detection on the response path**: `ClientError::SoapFault { action, code, reason }` surfaces when the response Action header matches the canonical `dispatcher/fault` template; we previously let body decoders blindly run and hit `MissingField { field: "Status" }`, which masked the fact that the wire was a fault. The reason text is extracted as the longest `NbfxText::Chars` in the body, which is robust against the `nbfs.rs` static-dictionary id mismatches noted below.
4. **Identified blocker**: `ConnectedRequest` signing currently HMACs the **NBFX wire bytes** of the unsigned envelope. .NET's `AsbSystemAuthenticator.Sign` (`AsbSystemAuthenticator.cs:79`) HMACs `Encoding.UTF8.GetBytes(request.ToXml())` — the **canonical XML serialisation** of the message contract via `XmlSerializer` with namespace `"urn:invensys.schemas"` (`AsbSerialization.cs:12-48`). Until the Rust port emits identical XML bytes, the HMAC mismatches and the server rejects every signed request (`AuthenticateMe`, `RegisterItems`, etc.) with a generic `dispatcher/fault` InternalServiceFault. Connect itself is unsigned (extends `ServiceMessage`, no `ConnectionValidator` header) which is why it works today. The fault's `a:RelatesTo` UniqueId in our captures matches the AuthenticateMe `MessageID`, confirming the failure point. **New followup F28** captures the XML-canonicaliser scope.
5. **`nbfs.rs` static dictionary ids drift** at id 114+ vs. the canonical `[MC-NBFS]` table (`Fault`/`Code`/`Reason`/`Text`/`Value` are 20 IDs higher on the wire than what we encode). Doesn't affect requests we send (we only encode IDs ≤44 = `ReplyTo`, all correct), but breaks `decode_envelope`'s element-by-name matching for fault bodies. Tracked as **F29**.
Workspace: 702 tests pass (no test count delta — wire-only fixes). Live status: Connect handshake working with real DH key + apollo encryption; AuthenticateMe and onwards blocked on F28. Companion diagnostic example `asb-relay.rs` (TCP middleman that hex-dumps both directions to stderr) lands as a permanent debugging aid.
- F26 step 3: `mxaccess::AsbSession` — high-level cheap-clone async API on top of `AsbTransport`. Parallel to the NMX-shaped `Session` rather than unified, because NMX's `Session` carries orchestration (`CallbackExporter`, callback router task, recovery broadcast, `INmxService2` mutex) that has no ASB analogue, and ASB's request/response loop over a single TCP stream maps naturally to a `Mutex<AsbClient>` that would be foreign to NMX. The struct is `Clone + Send + Sync` (compile-time `assert_clone_send_sync` test guards the contract) — clones share inner state through `Arc<AsbSessionInner { transport: Mutex<AsbTransport<TcpStream>>, connect_response }>`, so each `clone()` is `O(1)` and the lock serialises operation calls. API surface: `AsbSession::connect(endpoint, passphrase, crypto_parameters, via_uri, connection_id)` runs the full bring-up; `from_transport(transport, connect_response)` builds from an existing transport for tests; `connect_response()` exposes the negotiated lifetime / Apollo flag. Operation methods forward to AsbClient: `register_items`/`unregister_items`/`read`/`write`/`keep_alive`/`disconnect`/`create_subscription`/`add_monitored_items`/`publish`/`delete_monitored_items`/`delete_subscription`/`publish_write_complete`. ClientError → mxaccess::Error mapping via `ConnectionError::TransportFailure` (consistent with F26 step 2). 1 new test (compile-time Clone+Send+Sync assertion). **Stubbed for next F26 iteration**: `Stream<Item = MonitoredItemValue>` subscription handle that internally drives a publish-loop, recovery/reconnect policy, and full live-probe wire-byte reconciliation. Workspace: 702 tests pass.
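The short-form record selection from fix 1 of the F25 reconciliation can be sketched in isolation. This is a minimal illustration, not the codec itself: `prefix_letter_offset` here is a stand-in written for this sketch (the real helper's signature is assumed), and the three wrapper functions are hypothetical names. Only the record ranges (0x5E-0x77 for elements, 0x26-0x3F for attributes) come from the notes above.

```rust
/// Stand-in for the codec's `prefix_letter_offset` (signature assumed):
/// returns 0..=25 when `prefix` is exactly one lowercase ASCII letter.
fn prefix_letter_offset(prefix: &str) -> Option<u8> {
    match prefix.as_bytes() {
        [b @ b'a'..=b'z'] => Some(*b - b'a'),
        _ => None,
    }
}

/// Hypothetical helper: record byte for PrefixElement_a..z (0x5E..=0x77).
fn prefix_element_record(prefix: &str) -> Option<u8> {
    prefix_letter_offset(prefix).map(|off| 0x5E + off)
}

/// Hypothetical helper: record byte for PrefixAttribute_a..z (0x26..=0x3F).
fn prefix_attribute_record(prefix: &str) -> Option<u8> {
    prefix_letter_offset(prefix).map(|off| 0x26 + off)
}

/// Hypothetical helper: decode-side inverse for the element range.
fn element_prefix_for_record(byte: u8) -> Option<char> {
    (0x5E..=0x77)
        .contains(&byte)
        .then(|| char::from(b'a' + (byte - 0x5E)))
}
```

With these definitions `prefix_element_record("h")` yields `Some(0x65)`, the byte the ConnectResponse tripped on, while a multi-letter or uppercase prefix yields `None` and would fall through to the long-form record.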
**Earlier slices:**
@@ -124,6 +134,19 @@ move to `## Resolved` with a date + commit hash.
F25 (`mxaccess-asb` IASBIDataV2 client) and F26 (`mxaccess::Session` over `AsbTransport`) remain open. With F19-F24 landed, the M5 framing/encoder layer (streams A+B+C+D and the codec stream) is complete; F25 composes them into the `IASBIDataV2` wire client. F22's static dictionary subset is intentionally curated; expand entries as wire captures show new IDs. F27 (constant-time DH) is filed as a separate follow-up below.
### F28 — Canonical XML serialiser for `ConnectedRequest` signing (matches `XmlSerializer.Serialize` byte-for-byte)
**Severity:** P0 — blocks every signed ASB operation (AuthenticateMe, RegisterItems, all data-plane RPCs).
**Source:** F25 live-bring-up; `AsbSystemAuthenticator.cs:79` + `AsbSerialization.cs:12-48`.
**Why deferred:** `AsbSystemAuthenticator.Sign` HMACs `Encoding.UTF8.GetBytes(request.ToXml())` — the XML text produced by .NET's `XmlSerializer.Serialize(writer, value)` with `XmlSerializerNamespaces` = `"urn:invensys.schemas"`, then re-parsed via `XDocument.Load` and re-saved to normalise xmlns attribute ordering (xsi before xsd; see `AsbSerialization.cs:36-47`). The HMAC must match the server's recomputation, which uses the same XmlSerializer on the deserialised request — so the Rust port has to produce byte-identical XML. We currently HMAC the NBFX wire bytes of the unsigned envelope, which never matches.
**Resolves when:** A canonical XmlSerializer-compatible emitter lands in `mxaccess-asb` (probably `crates/mxaccess-asb/src/xml_canonical.rs`). Scope per request type: `AuthenticateMe`, `Disconnect`, `KeepAlive`, `RegisterItemsRequest`, `UnregisterItemsRequest`, `ReadRequest`, `WriteBasicRequest`, `PublishWriteCompleteRequest`, `CreateSubscriptionRequest`, `DeleteSubscriptionRequest`, `AddMonitoredItemsRequest`, `DeleteMonitoredItemsRequest`, `PublishRequest`. Each derives its XML form from the `[MessageContract] / [MessageBodyMember(Order = N, Namespace = ...)]` attributes plus per-type `[XmlType(Namespace = ...)]` on `AuthenticationData` / `PublicKey`. Validation: capture .NET probe's `request.ToXml()` output for each operation (instrument `AsbSystemAuthenticator.Sign` with a hex+UTF-8 trace, run `MxAsbClient.Probe`) and assert byte-equal vs the Rust emitter. The `request_xml_utf8` argument to `AsbAuthenticator::sign` is already wired correctly — only the producer is missing. Once HMAC matches, the existing `ConnectionValidator` header path (`mac` + `iv` base64 round-trip) is already validated by the F23 unit tests. **Resolves**: F25 live AuthenticateMe + RegisterItems + every signed operation; M5 DoD bullets 1+2 unblocked.
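For the byte-equality validation described in F28, a small diff helper makes a failing assertion easier to act on than a bare `assert_eq!` over two long buffers. The sketch below is an assumption about how such a check might look; `first_mismatch` is a hypothetical name and is not part of `mxaccess-asb`.

```rust
/// Hypothetical test helper: index of the first byte at which the captured
/// .NET `request.ToXml()` output and the Rust emitter's output diverge,
/// or `None` when the two buffers are byte-identical.
fn first_mismatch(expected: &[u8], actual: &[u8]) -> Option<usize> {
    // Compare the overlapping prefix byte by byte.
    let in_common = expected.iter().zip(actual).position(|(e, a)| e != a);
    match in_common {
        Some(i) => Some(i),
        // Same prefix but different lengths: the shorter length is the mismatch point.
        None if expected.len() != actual.len() => Some(expected.len().min(actual.len())),
        None => None,
    }
}
```

Reporting the offset points straight at the diverging attribute or element, which is useful when the difference is as small as xmlns attribute ordering.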
### F29 — Align `mxaccess-asb-nettcp::nbfs` static dictionary ids with canonical `[MC-NBFS]` table
**Severity:** P2 — diagnostic-only today; blocks future fault-body decoding.
**Source:** F25 live-bring-up; observed wire ids (Fault=134, Code=142, Reason=144, Text=146, Value=154, Subcode=156) vs `nbfs.rs` (Fault=114, Code=122, Reason=124, Text=126, Value=134, Subcode=136). Off by 20 starting at the SOAP-fault subset.
**Why deferred:** Doesn't affect request encoding — every dict id we emit is ≤44 (`ReplyTo`) and those IDs are correct. The SOAP-fault element-by-name decode in `detect_soap_fault` was sidestepped by walking text records directly rather than relying on dict-resolved element names, so the user-facing fault reason still surfaces correctly. The dictionary mismatch is a latent issue that will bite when (a) we want richer fault decoding (parsing `<Code><Value>s:Receiver</Value></Code>` to surface the SOAP fault role) or (b) we encode anything in the upper id range (none of our current encoders do).
**Resolves when:** The 10 missing `[MC-NBFS]` §2.2 entries between `s` (id 112) and `Fault` (id 134) are inserted, and existing 114+ entries are renumbered by +20. The canonical reference is the `[MC-NBFS]` PDF (Microsoft Open Specifications) or the `XD.cs` / `ServiceModelStringsVersion1` table inside `System.ServiceModel`. Add a regression test that hands a captured fault envelope to `decode_envelope` and asserts both Code and Reason text resolve via dict lookup.
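The drift described in F29 can be written down as a single mapping, which may be handy as a regression check while the table is being renumbered. This sketch assumes the offset is a uniform +20 for every id from 114 upward, as stated above; the constant and function names are hypothetical.

```rust
/// First `nbfs.rs` table id affected by the drift (per the F29 notes).
const DRIFT_START: u32 = 114;
/// Ids at or above `DRIFT_START` are this much higher on the wire
/// (assumed uniform across the whole upper range).
const DRIFT_OFFSET: u32 = 20;

/// Hypothetical helper: the id observed on the wire for a given id in the
/// current (pre-fix) `nbfs.rs` table.
fn wire_id_for_table_id(table_id: u32) -> u32 {
    if table_id >= DRIFT_START {
        table_id + DRIFT_OFFSET
    } else {
        table_id
    }
}
```

Applied to the current table values, this reproduces the observed wire ids (for example 114 maps to 134 for `Fault`) and leaves the request-side range up to 44 untouched.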
### F27 — Constant-time DH `mod_exp` (swap `num-bigint` → `crypto-bigint::BoxedUint`)
**Severity:** P2 (security regression vs the long-term Rust target — but at parity with the .NET reference today, so not a release-blocker)
**Source:** F23 (`crates/mxaccess-asb-nettcp/src/auth.rs:179,303`); originally flagged in `design/30-crate-topology.md:269-274` and the project's `review.md` MAJOR finding.
@@ -432,6 +432,13 @@ fn encode_element(
out.push(0x44 + off);
encode_multibyte_int31_to_nbfx(out, *id)
}
// Short-form: single-letter prefix + inline name. Records
// 0x5E..0x77 (PrefixElement_a..z).
(Some(prefix), NbfxName::Inline(s)) if prefix_letter_offset(prefix).is_some() => {
let off = prefix_letter_offset(prefix).unwrap_or(0);
out.push(0x5E + off);
encode_string(s.as_bytes(), out)
}
(Some(prefix), NbfxName::Inline(s)) => {
out.push(REC_ELEMENT);
encode_string(prefix.as_bytes(), out)?;
@@ -470,6 +477,13 @@ fn encode_attribute(
out.push(0x0C + off);
encode_multibyte_int31_to_nbfx(out, *id)?;
}
// Short-form: single-letter prefix + inline name. Records
// 0x26..0x3F (PrefixAttribute_a..z).
(Some(prefix), NbfxName::Inline(s)) if prefix_letter_offset(prefix).is_some() => {
let off = prefix_letter_offset(prefix).unwrap_or(0);
out.push(0x26 + off);
encode_string(s.as_bytes(), out)?;
}
(Some(prefix), NbfxName::Inline(s)) => {
out.push(REC_ATTRIBUTE);
encode_string(prefix.as_bytes(), out)?;
@@ -647,6 +661,18 @@ pub fn decode_tokens(
name: NbfxName::Static(id),
});
}
// PrefixElement_a..z: 0x5E..0x77 — single-letter prefix +
// inline element name. WCF emits these on the response side
// when the element name is not in either dictionary (e.g.
// dynamically-named DataContract members).
byte if (0x5E..=0x77).contains(&byte) => {
let prefix_letter = char::from(b'a' + (byte - 0x5E));
let name = decode_string(input, &mut cursor, "prefix-element-name")?;
tokens.push(NbfxToken::Element {
prefix: Some(prefix_letter.to_string()),
name: NbfxName::Inline(name),
});
}
REC_SHORT_ATTRIBUTE => {
let name = decode_string(input, &mut cursor, "short-attribute")?;
let value = decode_text_record(input, &mut cursor)?;
@@ -697,6 +723,18 @@ pub fn decode_tokens(
value,
});
}
// PrefixAttribute_a..z: 0x26..0x3F — single-letter prefix +
// inline attribute name + text-record value.
byte if (0x26..=0x3F).contains(&byte) => {
let prefix_letter = char::from(b'a' + (byte - 0x26));
let name = decode_string(input, &mut cursor, "prefix-attribute-name")?;
let value = decode_text_record(input, &mut cursor)?;
tokens.push(NbfxToken::Attribute {
prefix: Some(prefix_letter.to_string()),
name: NbfxName::Inline(name),
value,
});
}
REC_SHORT_XMLNS_ATTRIBUTE => {
let value_str = decode_string(input, &mut cursor, "default-xmlns-value")?;
tokens.push(NbfxToken::DefaultNamespace {
+69
@@ -171,6 +171,9 @@ impl<T: AsyncRead + AsyncWrite + Unpin + Send> AsbClient<T> {
match record {
NmfRecord::SizedEnvelope(reply_bytes) => {
let decoded = decode_envelope(&reply_bytes, &mut self.read_dictionary)?;
if let Some(fault) = detect_soap_fault(&decoded) {
return Err(fault);
}
Ok(decoded)
}
NmfRecord::Fault(message) => Err(ClientError::Fault(message)),
@@ -604,6 +607,59 @@ async fn read_multibyte_int31_async<T: AsyncRead + Unpin>(
usize::try_from(value).map_err(|_| ClientError::Nmf(NmfError::NegativeLength(value)))
}
/// Inspect a `DecodedEnvelope` for a SOAP-1.2 `<s:Fault>` body and
/// return a typed `ClientError::SoapFault` if found. Returns `None`
/// for non-fault responses so the normal decode path runs.
///
/// WCF surfaces server-side exceptions as a `dispatcher/fault` action
/// envelope wrapping `<s:Fault>`. The fault structure uses static dict
/// ids (Reason=144, Text=146, Value=154 per `[MC-NBFS]`) which our
/// `nbfs.rs` static table partially mismatches; rather than relying
/// on element-name lookup, we accept any envelope whose Action header
/// matches the canonical fault action template AND extract the
/// human-readable reason as the longest `Chars` text in the body.
/// The fault code is the first `Chars` value that looks like a qname
/// (contains a colon or ends with `Fault`), typically `s:Receiver` or
/// `s:Sender`.
fn detect_soap_fault(decoded: &crate::DecodedEnvelope) -> Option<ClientError> {
use mxaccess_asb_nettcp::nbfx::{NbfxText, NbfxToken};
let action_is_fault = decoded
.action
.as_deref()
.is_some_and(|a| a.contains("/fault") || a.ends_with(":fault"));
if !action_is_fault {
return None;
}
// Walk the body's text records. The fault Reason text is by far
// the longest free-form Chars in a fault body; the Code/Subcode
// values are shorter qname-style strings ("s:Receiver", "...:.
// InternalServiceFault"), so the longest Chars is taken as the reason.
let mut all_chars: Vec<&str> = Vec::new();
for tok in &decoded.body_tokens {
if let NbfxToken::Text(NbfxText::Chars(s)) = tok {
all_chars.push(s);
}
}
let reason = all_chars
.iter()
.max_by_key(|s| s.len())
.map(|s| (*s).to_string())
.unwrap_or_else(|| "(no reason text)".to_string());
// First Chars that looks like a SOAP fault code qname (contains a
// colon or ends with "Fault").
let code = all_chars
.iter()
.find(|s| s.contains(':') || s.ends_with("Fault"))
.map(|s| (*s).to_string());
let action = decoded.action.clone().unwrap_or_default();
Some(ClientError::SoapFault {
action,
code,
reason,
})
}
// ---- error type ----------------------------------------------------------
#[derive(Debug, thiserror::Error)]
@@ -627,6 +683,19 @@ pub enum ClientError {
AlreadyClosed,
#[error("peer reported NMF fault: {0}")]
Fault(String),
/// SOAP-level fault inside a SizedEnvelope. WCF's
/// `dispatcher/fault` action wraps a SOAP 1.2 `<s:Fault>` body
/// when the service throws an unhandled exception. The action is
/// preserved so callers can correlate (e.g.
/// `.../dispatcher/fault` is the generic catch-all;
/// `.../addressing/fault` indicates AddressFilterMismatch). The
/// `reason` is the human-readable `<s:Reason><s:Text>` text.
#[error("SOAP fault from peer (action={action}): {reason}")]
SoapFault {
action: String,
code: Option<String>,
reason: String,
},
#[error("peer closed the channel before sending a response")]
PeerClosed,
#[error("unexpected NMF record on response path: {0}")]
@@ -229,23 +229,64 @@ fn public_key_data_field(data: &[u8]) -> Vec<NbfxToken> {
prefix: None,
name: NbfxName::Inline("Data".to_string()),
},
// .NET's `PublicKey` class has
// `[XmlType(Namespace = "http://asb.contracts.data/20111111")]`
// (`AsbContracts.cs:350-362`). XmlSerializer emits an
// `xmlns="..."` redeclaration on `<Data>` to switch from the
// outer messages namespace into the data namespace. Without
// this, the server's deserialiser fails and dispatches a
// generic InternalServiceFault. Verified against .NET probe
// wire capture.
NbfxToken::DefaultNamespace {
value: NbfxText::Chars("http://asb.contracts.data/20111111".to_string()),
},
NbfxToken::Text(NbfxText::Bytes(data.to_vec())),
NbfxToken::EndElement,
]
}
/// `AuthenticationData` per `AsbContracts.cs:364-381`:
///
/// ```csharp
/// [XmlType(Namespace = "http://asb.contracts.data/20111111")]
/// public sealed class AuthenticationData {
/// public byte[]? Data { get; set; }
/// public byte[]? InitializationVector { get; set; }
/// }
/// ```
///
/// Same data-namespace switch as `<PublicKey><Data>`: the `<Data>`
/// element gets the `xmlns="...data/20111111"` redeclaration. The
/// `<InitializationVector>` element is in the same data namespace but
/// needs its own redeclaration, because the default-namespace scope
/// opened on `<Data>` ends at `</Data>`.
fn authentication_data_fields(data: &[u8], iv: &[u8]) -> Vec<NbfxToken> {
// The default-namespace declaration on `<Data>` only stays in
// scope until `</Data>` closes. `<InitializationVector>` opens
// afterwards and therefore needs its OWN xmlns redeclaration to
// stay in the `http://asb.contracts.data/20111111` namespace
// (matching `[XmlType]` on the `AuthenticationData` class). Without
// the second redeclaration the IV element falls back to the parent
// (messages) namespace and the server's XmlSerializer rejects the
// request with a generic InternalServiceFault.
let data_ns = "http://asb.contracts.data/20111111".to_string();
vec![
NbfxToken::Element {
prefix: None,
name: NbfxName::Inline("Data".to_string()),
},
NbfxToken::DefaultNamespace {
value: NbfxText::Chars(data_ns.clone()),
},
NbfxToken::Text(NbfxText::Bytes(data.to_vec())),
NbfxToken::EndElement,
NbfxToken::Element {
prefix: None,
name: NbfxName::Inline("InitializationVector".to_string()),
},
NbfxToken::DefaultNamespace {
value: NbfxText::Chars(data_ns),
},
NbfxToken::Text(NbfxText::Bytes(iv.to_vec())),
NbfxToken::EndElement,
]