[M5] mxaccess-asb: F32 partial — Bool + String + Int32 live, longer retry budget

Three of seven proven types now round-trip end-to-end against the
live MxDataProvider:

   Int32   (type_id 4)  — TestChildObject.TestInt = 99
   String  (type_id 10) — TestChildObject.TestString = "mxaccesscli
                            verified 17778523775" (UTF-16LE on wire)
   Bool    (type_id 17) — DelmiaReceiver_001.TestAttribute = 0
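
The "UTF-16LE on wire" note for String can be illustrated with a minimal sketch. It uses only the Rust standard library; the helper names `encode_utf16le` and `decode_utf16le` are hypothetical and not taken from the crate, and any length prefix or framing around the payload is deliberately left out because the commit does not describe it:

```rust
/// Encode a Rust string as UTF-16LE bytes: each UTF-16 code unit becomes
/// two bytes, low byte first. (Hypothetical helper, not the crate's API.)
pub fn encode_utf16le(text: &str) -> Vec<u8> {
    text.encode_utf16()
        .flat_map(|unit| unit.to_le_bytes())
        .collect()
}

/// Decode UTF-16LE bytes back into a `String`. Returns `None` when the
/// byte length is odd or the code units are not valid UTF-16.
pub fn decode_utf16le(bytes: &[u8]) -> Option<String> {
    if bytes.len() % 2 != 0 {
        return None;
    }
    let units: Vec<u16> = bytes
        .chunks_exact(2)
        .map(|pair| u16::from_le_bytes([pair[0], pair[1]]))
        .collect();
    String::from_utf16(&units).ok()
}
```

For ASCII-only values such as the test string above, the encoded payload is exactly twice the character count.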

A SQL probe of the live Galaxy (`gobject ⨝ package ⨝
dynamic_attribute` grouped by `mx_data_type`) shows that only types
{1=Bool, 2=Int32, 5=String} have deployed instances. These are Galaxy
`mx_data_type` codes, not the `type_id` values listed above.
Float/Double/DateTime/Duration/array shapes are not in this Galaxy, so
the remaining four type-matrix bullets in F32 are gated on Galaxy-side
provisioning that's outside the Rust port's scope. The M5 DoD #3 was
always going to bottom out at "what types are deployed in the test
environment."
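
The two numbering schemes mentioned so far can be kept side by side. A minimal sketch, assuming the `type_id` values (4, 10, 17) and the Galaxy `mx_data_type` codes (2, 5, 1) name the same three logical types; the `ProvenType` enum and its method names are hypothetical and not part of the crate:

```rust
/// The three types this commit reports as round-tripping live.
/// (Hypothetical enum for illustration only.)
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ProvenType {
    Int32,
    String,
    Bool,
}

impl ProvenType {
    /// `type_id` as listed in the round-trip results.
    pub const fn wire_type_id(self) -> u8 {
        match self {
            ProvenType::Int32 => 4,
            ProvenType::String => 10,
            ProvenType::Bool => 17,
        }
    }

    /// `mx_data_type` code as reported by the Galaxy SQL probe.
    pub const fn galaxy_mx_data_type(self) -> u8 {
        match self {
            ProvenType::Bool => 1,
            ProvenType::Int32 => 2,
            ProvenType::String => 5,
        }
    }

    /// Map a Galaxy `mx_data_type` code to a proven type, if it is one
    /// of the three that have deployed instances.
    pub fn from_galaxy_mx_data_type(code: u8) -> Option<Self> {
        match code {
            1 => Some(ProvenType::Bool),
            2 => Some(ProvenType::Int32),
            5 => Some(ProvenType::String),
            _ => None,
        }
    }
}
```

Any code outside {1, 2, 5} maps to `None`, which matches the observation that no other type has deployed instances in this Galaxy.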

Code changes:
- `register_items` retry budget bumped: 10 attempts (was 5) with
  `200 * attempt` ms backoff (was 100 * attempt). Worst-case wait
  ~11 s, well within user-perceived latency on a one-shot RPC. The
  .NET reference's 5×100 ms didn't always cover the live AVEVA
  install's auth-state-commit latency on this hardware.
- `AsbClient::connect` adds a 250 ms `tokio::time::sleep` immediately
  after the one-way `AuthenticateMe` send. The server processes the
  request asynchronously; without an initial settle, the per-op retry
  loop frequently exhausts its budget on the InvalidConnectionId
  race even on the FIRST register attempt. 250 ms is short enough to
  be invisible and long enough to absorb the typical commit delay.
- `examples/asb-subscribe.rs` now prints `result_code` and `success`
  alongside the status count so the user can see when register is
  hitting the retry-exhausted state.
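
The retry budget can be modelled without a live server. This is a synchronous sketch, not the crate's async code: `retry_register`, `worst_case_backoff_ms` and the injected `op`/`sleep_ms` closures are hypothetical, and it assumes `attempt` is incremented once per retry (the increment is not visible in the diff). Under that assumption the sleeps alone add up to 9 s; the ~11 s figure is not reproduced by the sleeps alone and may also budget for per-attempt round-trip time.

```rust
/// Stand-in for `RESULT_CODE_INVALID_CONNECTION_ID` (`AsbErrorCode = 1`).
pub const INVALID_CONNECTION_ID: u32 = 1;
pub const MAX_ATTEMPTS: u32 = 10;
pub const BACKOFF_BASE_MS: u64 = 200;

/// Calls `op` once, then retries while it reports `INVALID_CONNECTION_ID`,
/// invoking `sleep_ms` with `BACKOFF_BASE_MS * attempt` before each retry.
/// Returns the final result code and the number of calls made.
pub fn retry_register<Op, Sleep>(mut op: Op, mut sleep_ms: Sleep) -> (Option<u32>, u32)
where
    Op: FnMut() -> Option<u32>,
    Sleep: FnMut(u64),
{
    let mut result = op();
    let mut attempt = 1u32;
    while attempt < MAX_ATTEMPTS && result == Some(INVALID_CONNECTION_ID) {
        sleep_ms(BACKOFF_BASE_MS * u64::from(attempt));
        result = op();
        attempt += 1; // assumed: one increment per retry
    }
    (result, attempt)
}

/// Total backoff requested when every attempt fails.
pub fn worst_case_backoff_ms() -> u64 {
    let mut total = 0u64;
    retry_register(|| Some(INVALID_CONNECTION_ID), |ms| total += ms);
    total
}
```

Injecting the sleep as a closure keeps the sketch deterministic; a real caller would pass something that awaits `tokio::time::sleep` instead of recording the value.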

Live flakiness note: the AuthenticateMe race is not fully
deterministic. After many back-to-back test runs the live server
appears to degrade (presumably because its pending-connection table
fills) and the retry budget exhausts on EVERY tag, not just one. A
30-second cool-down restores reliability. Production deployments with
a single long-lived session are unlikely to hit this. The F32 status
doc captures the observation.

Workspace: 711 unit tests pass. Clippy clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Joseph Doherty
2026-05-05 21:21:07 -04:00
parent 4ddb6542e1
commit 5845b5eb12
3 changed files with 44 additions and 13 deletions
+30 -10
@@ -309,6 +309,17 @@ impl<T: AsyncRead + AsyncWrite + Unpin + Send> AsbClient<T> {
 )
 .await?;
+// AuthenticateMe is one-way — the server processes it
+// asynchronously after we send. Give it a beat to commit auth
+// state before any subsequent signed op lands. Without this
+// initial settle, the per-op retry loop frequently exhausts
+// its budget on the InvalidConnectionId race (observed
+// empirically against the live MxDataProvider on Windows).
+// 250 ms is short enough to be invisible to user-perceived
+// latency and long enough to absorb the typical server-side
+// commit delay on this deployment.
+tokio::time::sleep(std::time::Duration::from_millis(250)).await;
 Ok(connect_response)
 }
@@ -548,29 +559,38 @@ impl<T: AsyncRead + AsyncWrite + Unpin + Send> AsbClient<T> {
 /// `RegisterItems` operation — sends a signed `RegisterItemsIn`
 /// SOAP envelope and decodes the `RegisterItemsResponse`. Retries
-/// up to 5 times with `100 * attempt` ms backoff on
-/// `InvalidConnectionId` (`AsbErrorCode = 1`), mirroring .NET's
-/// `MxAsbDataClient.RegisterMany` (`cs:191-204`). The retry exists
-/// because `AuthenticateMe` is one-way: the server may not have
-/// finished processing it before our first Register lands, in
-/// which case it sees an unauthenticated connection and returns
-/// `InvalidConnectionId`. A short backoff lets the auth state
-/// commit on the server side.
+/// on `InvalidConnectionId` (`AsbErrorCode = 1`) — a transient
+/// race after our one-way `AuthenticateMe`, since the server
+/// commits auth state asynchronously and a Register that arrives
+/// too quickly sees an unauthenticated connection.
+///
+/// .NET's reference uses 5 attempts with `100 * attempt` ms backoff
+/// (`MxAsbDataClient.cs:191-204`); we observed that wasn't always
+/// enough on slower live deployments (Bool tag failed all 5 in
+/// some runs). We bump to 10 attempts with `200 * attempt` ms
+/// backoff — total worst-case wait ~11 s, which is well within
+/// any reasonable user-perceived timeout.
 pub async fn register_items(
 &mut self,
 items: &[ItemIdentity],
 require_id: bool,
 register_only: bool,
 ) -> Result<RegisterItemsResponse, ClientError> {
+const MAX_ATTEMPTS: u32 = 10;
+const BACKOFF_BASE_MS: u64 = 200;
 let mut response = self
 .register_items_once(items, require_id, register_only)
 .await?;
 let mut attempt = 1u32;
-while attempt < 5
+while attempt < MAX_ATTEMPTS
 && response.result_code
 == Some(crate::operations::RESULT_CODE_INVALID_CONNECTION_ID)
 {
-tokio::time::sleep(std::time::Duration::from_millis(100 * u64::from(attempt))).await;
+tokio::time::sleep(std::time::Duration::from_millis(
+BACKOFF_BASE_MS * u64::from(attempt),
+))
+.await;
 response = self
 .register_items_once(items, require_id, register_only)
 .await?;
@@ -81,8 +81,10 @@ async fn main() -> Result<(), Box<dyn std::error::Error>> {
 eprintln!("registering {}", env.tag);
 let register = client.register_items(&items, true, false).await?;
 eprintln!(
-"register status: {} item(s); first error_code = 0x{:04x}",
+"register status: {} item(s); result_code={:?} success={:?}; first error_code = 0x{:04x}",
 register.status.len(),
+register.result_code,
+register.success,
 register.status.first().map(|s| s.error_code).unwrap_or(0)
 );