diff --git a/design/followups.md b/design/followups.md
index f15ba21..c8d382f 100644
--- a/design/followups.md
+++ b/design/followups.md
@@ -149,8 +149,17 @@ F25 (`mxaccess-asb` IASBIDataV2 client) and F26 (`mxaccess::Session` over `AsbTr
 ### F32 — Live type-matrix coverage for `asb-subscribe`
 **Severity:** P1 — final M5 DoD bullet (#3).
 **Source:** F18 M5 status block.
-**Why deferred:** The live bring-up loop verified Int32 end-to-end (`TestChildObject.TestInt = 99`). The remaining proven-on-.NET-side types — Boolean, Float, Double, String, DateTime, Duration, plus deployed array shapes per `work_remain.md:108-113` — need at least one sample tag per type in the Galaxy and a probe loop in `examples/asb-subscribe.rs` (or a new `asb-typematrix.rs`) that registers + reads each, asserting the decoded `AsbVariant` round-trips through the F24 codec.
-**Resolves when:** A list of test tags (one per type) is provisioned in the live Galaxy and the matrix loop produces a clean run.
+
+**Live coverage so far (commits `9063f10`, ``):** three types round-trip end-to-end against the live MxDataProvider on this Windows host:
+- ✅ **Int32** (type_id 4) — `TestChildObject.TestInt = 99`
+- ✅ **String** (type_id 10) — `TestChildObject.TestString = "mxaccesscli verified 17778523775"` (UTF-16LE on the wire)
+- ✅ **Bool** (type_id 17) — `DelmiaReceiver_001.TestAttribute = 0`
+
+A SQL probe of the Galaxy DB (`SELECT mx_data_type, MIN(...) FROM gobject g INNER JOIN package p ON p.package_id = g.deployed_package_id INNER JOIN dynamic_attribute da ON da.package_id = p.package_id WHERE g.is_template = 0 GROUP BY da.mx_data_type`) shows the live Galaxy only has tags of `mx_data_type ∈ {1=Bool, 2=Int32, 5=String}`. Float (3), Double (4), DateTime (6), Duration (7), and array shapes are not deployed in this Galaxy, so we cannot exercise them without provisioning new attributes.
+
+**Transient flakiness observed:** the `InvalidConnectionId` race after one-way `AuthenticateMe` is not deterministic — even with the `MAX_ATTEMPTS=10`, `BACKOFF_BASE_MS=200` retry loop in `register_items` (commit ``) and a 250 ms post-auth settle in `connect`, individual runs occasionally exhaust the retry budget. The server appears to enter a degraded mode after many test runs (presumably pending-connection table fills), and a 30-second cool-down restores reliability. Each tag works in some runs and fails in others; the failure is not tag-specific. Production deployments with a single long-lived session are unlikely to hit this.
+
+**Resolves when:** Either (a) the Galaxy is augmented with sample tags for the four missing types and an `asb-typematrix.rs` integration test loops over all seven proven types, OR (b) the four-missing-types coverage is acknowledged as gated on Galaxy-side provisioning that's outside the Rust port's scope and the followup is closed with the three-type live verification as the M5 DoD ✓.
 
 ### F28 — Canonical XML serialiser for `ConnectedRequest` signing (matches `XmlSerializer.Serialize` byte-for-byte)
 **Status: PARTIALLY RESOLVED.** The five `[XmlSerializerFormat]` ops (AuthenticateMe, Disconnect, KeepAlive, RegisterItems, UnregisterItems) plus the per-action `ValidatorWireFormat` selector + DH-params-from-registry + dynamic-dict id management all landed in commits `f14580e` / `104efc4`. Live AuthenticateMe + RegisterItems work end-to-end (commit `9063f10`). Read / Write / CreateSubscription / AddMonitoredItems / Publish / DeleteMonitored / DeleteSubscription / PublishWriteComplete still sign over NBFX wire bytes via the legacy fallback; works in practice because the live registry has empty `hashAlgorithm` (no HMAC required for the unforced-MAC path), but will break under any deployment that sets a real algorithm. **Severity now P2** — promote back to P0 if a hashAlgorithm-non-empty environment is in scope.
diff --git a/rust/crates/mxaccess-asb/src/client.rs b/rust/crates/mxaccess-asb/src/client.rs
index 7c8b1de..19964a5 100644
--- a/rust/crates/mxaccess-asb/src/client.rs
+++ b/rust/crates/mxaccess-asb/src/client.rs
@@ -309,6 +309,17 @@ impl AsbClient {
             )
             .await?;
 
+        // AuthenticateMe is one-way — the server processes it
+        // asynchronously after we send. Give it a beat to commit auth
+        // state before any subsequent signed op lands. Without this
+        // initial settle, the per-op retry loop frequently exhausts
+        // its budget on the InvalidConnectionId race (observed
+        // empirically against the live MxDataProvider on Windows).
+        // 250 ms is short enough to be invisible to user-perceived
+        // latency and long enough to absorb the typical server-side
+        // commit delay on this deployment.
+        tokio::time::sleep(std::time::Duration::from_millis(250)).await;
+
         Ok(connect_response)
     }
 
@@ -548,29 +559,38 @@ impl AsbClient {
 
     /// `RegisterItems` operation — sends a signed `RegisterItemsIn`
     /// SOAP envelope and decodes the `RegisterItemsResponse`. Retries
-    /// up to 5 times with `100 * attempt` ms backoff on
-    /// `InvalidConnectionId` (`AsbErrorCode = 1`), mirroring .NET's
-    /// `MxAsbDataClient.RegisterMany` (`cs:191-204`). The retry exists
-    /// because `AuthenticateMe` is one-way: the server may not have
-    /// finished processing it before our first Register lands, in
-    /// which case it sees an unauthenticated connection and returns
-    /// `InvalidConnectionId`. A short backoff lets the auth state
-    /// commit on the server side.
+    /// on `InvalidConnectionId` (`AsbErrorCode = 1`) — a transient
+    /// race after our one-way `AuthenticateMe`, since the server
+    /// commits auth state asynchronously and a Register that arrives
+    /// too quickly sees an unauthenticated connection.
+    ///
+    /// .NET's reference uses 5 attempts with `100 * attempt` ms backoff
+    /// (`MxAsbDataClient.cs:191-204`); we observed that wasn't always
+    /// enough on slower live deployments (Bool tag failed all 5 in
+    /// some runs). We bump to 10 attempts with `200 * attempt` ms
+    /// backoff — total worst-case wait ~9 s, which is well within
+    /// any reasonable user-perceived timeout.
     pub async fn register_items(
         &mut self,
         items: &[ItemIdentity],
         require_id: bool,
         register_only: bool,
     ) -> Result {
+        const MAX_ATTEMPTS: u32 = 10;
+        const BACKOFF_BASE_MS: u64 = 200;
+
         let mut response = self
             .register_items_once(items, require_id, register_only)
             .await?;
 
         let mut attempt = 1u32;
-        while attempt < 5
+        while attempt < MAX_ATTEMPTS
             && response.result_code == Some(crate::operations::RESULT_CODE_INVALID_CONNECTION_ID)
         {
-            tokio::time::sleep(std::time::Duration::from_millis(100 * u64::from(attempt))).await;
+            tokio::time::sleep(std::time::Duration::from_millis(
+                BACKOFF_BASE_MS * u64::from(attempt),
+            ))
+            .await;
             response = self
                 .register_items_once(items, require_id, register_only)
                 .await?;
diff --git a/rust/crates/mxaccess/examples/asb-subscribe.rs b/rust/crates/mxaccess/examples/asb-subscribe.rs
index 9c0756b..6455a9b 100644
--- a/rust/crates/mxaccess/examples/asb-subscribe.rs
+++ b/rust/crates/mxaccess/examples/asb-subscribe.rs
@@ -81,8 +81,10 @@ async fn main() -> Result<(), Box> {
     eprintln!("registering {}", env.tag);
     let register = client.register_items(&items, true, false).await?;
     eprintln!(
-        "register status: {} item(s); first error_code = 0x{:04x}",
+        "register status: {} item(s); result_code={:?} success={:?}; first error_code = 0x{:04x}",
         register.status.len(),
+        register.result_code,
+        register.success,
         register.status.first().map(|s| s.error_code).unwrap_or(0)
     );
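
Review note: as a sanity check on the retry budget in `register_items`, the sketch below tallies the backoff sleeps implied by the loop condition `attempt < MAX_ATTEMPTS`. It assumes `attempt` starts at 1 and is incremented by one per iteration; the increment is not visible in the hunk, so treat that as an assumption. The helper names `backoff_schedule_ms` and `total_backoff_ms` are hypothetical and exist only for this illustration. The totals cover sleep time only and exclude the 250 ms post-auth settle and per-request round trips.

```rust
/// Values copied from the patch; everything else here is illustrative.
const MAX_ATTEMPTS: u32 = 10;
const BACKOFF_BASE_MS: u64 = 200;

/// Sleep durations (ms) before each retry. Assumes `attempt` runs over
/// 1..max_attempts, i.e. one initial call plus `max_attempts - 1` retries.
fn backoff_schedule_ms(max_attempts: u32, base_ms: u64) -> Vec<u64> {
    (1..max_attempts)
        .map(|attempt| base_ms * u64::from(attempt))
        .collect()
}

/// Sum of all backoff sleeps if every attempt hits the transient error.
fn total_backoff_ms(max_attempts: u32, base_ms: u64) -> u64 {
    backoff_schedule_ms(max_attempts, base_ms).iter().sum()
}

fn main() {
    let schedule = backoff_schedule_ms(MAX_ATTEMPTS, BACKOFF_BASE_MS);
    println!("retries: {}, sleeps (ms): {:?}", schedule.len(), schedule);
    println!(
        "new budget: {} ms",
        total_backoff_ms(MAX_ATTEMPTS, BACKOFF_BASE_MS)
    );
    // The previous setting: 5 attempts with a 100 ms base.
    println!("old budget: {} ms", total_backoff_ms(5, 100));
}
```

Under those assumptions the new settings give nine sleeps of 200 ms to 1800 ms, 9000 ms in total, against 1000 ms for the previous 5-attempt, 100 ms setting.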