[M5] mxaccess-asb: F32 partial — Bool + String + Int32 live, longer retry budget
rust / build / test / clippy / fmt (push) Has been cancelled
Three of seven proven types now round-trip end-to-end against the live
MxDataProvider:

✅ Int32 (type_id 4) — TestChildObject.TestInt = 99
✅ String (type_id 10) — TestChildObject.TestString = "mxaccesscli verified 17778523775" (UTF-16LE on wire)
✅ Bool (type_id 17) — DelmiaReceiver_001.TestAttribute = 0

A SQL probe of the live Galaxy (`gobject ⨝ package ⨝ dynamic_attribute`
grouped by `mx_data_type`) shows only types {1=Bool, 2=Int32, 5=String}
have deployed instances. Float/Double/DateTime/Duration/array shapes are
not in this Galaxy, so the remaining four type-matrix bullets in F32 are
gated on Galaxy-side provisioning that's outside the Rust port's scope.
The M5 DoD #3 was always going to bottom out at "what types are deployed
in the test environment."

Code changes:

- `register_items` retry budget bumped: 10 attempts (was 5) with
  `200 * attempt` ms backoff (was 100 * attempt). Worst-case wait ~11 s,
  well within user-perceived latency on a one-shot RPC. The .NET
  reference's 5×100 ms didn't always cover the live AVEVA install's
  auth-state-commit latency on this hardware.
- `AsbClient::connect` adds a 250 ms `tokio::time::sleep` immediately
  after the one-way `AuthenticateMe` send. The server processes the
  request asynchronously; without an initial settle, the per-op retry
  loop frequently exhausts its budget on the InvalidConnectionId race
  even on the FIRST register attempt. 250 ms is short enough to be
  invisible and long enough to absorb the typical commit delay.
- `examples/asb-subscribe.rs` now prints `result_code` and `success`
  alongside the status count so the user can see when register is
  hitting the retry-exhausted state.

Live flakiness note: the AuthenticateMe race is not fully deterministic —
after many back-to-back test runs the live server appears to degrade
(presumably pending-connection table fills) and the retry budget exhausts
on EVERY tag, not just one. A 30-second cool-down restores reliability.
Production deployments with a single long-lived session are unlikely to
hit this. F32 status doc captures the observation.

Workspace: 711 unit tests pass. Clippy clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
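
The retry-budget figures in the commit message can be checked with a little arithmetic. The sketch below is not part of the commit: `total_backoff_ms` and `total_backoff_ms_inclusive` are hypothetical helper names, and the first one assumes the loop increments `attempt` by one per retry, a step the diff as displayed on this page does not show. Under that assumption, a loop of the form `while attempt < MAX_ATTEMPTS` with `attempt` starting at 1 sleeps nine times, for 9 000 ms in total. The quoted ~11 s matches the sum over all ten multipliers (11 000 ms), and RPC round-trip time adds to either figure.

```rust
/// Total sleep time of a retry loop shaped like
/// `while attempt < max_attempts { sleep(base_ms * attempt); retry; attempt += 1 }`
/// with `attempt` starting at 1.
/// ASSUMPTION: the `attempt += 1` step is not visible in the diff.
fn total_backoff_ms(max_attempts: u32, base_ms: u64) -> u64 {
    (1..max_attempts)
        .map(|attempt| base_ms * u64::from(attempt))
        .sum()
}

/// Upper bound if every one of the `max_attempts` attempts were
/// followed by a sleep (multipliers 1 through `max_attempts` inclusive).
fn total_backoff_ms_inclusive(max_attempts: u32, base_ms: u64) -> u64 {
    (1..=max_attempts)
        .map(|attempt| base_ms * u64::from(attempt))
        .sum()
}
```

With the .NET reference's parameters (5 attempts, 100 ms base) the same two functions give 1 000 ms and 1 500 ms respectively, which illustrates how much smaller the previous budget was.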
@@ -149,8 +149,17 @@ F25 (`mxaccess-asb` IASBIDataV2 client) and F26 (`mxaccess::Session` over `AsbTr
 ### F32 — Live type-matrix coverage for `asb-subscribe`
 **Severity:** P1 — final M5 DoD bullet (#3).
 **Source:** F18 M5 status block.
 **Why deferred:** The live bring-up loop verified Int32 end-to-end (`TestChildObject.TestInt = 99`). The remaining proven-on-.NET-side types — Boolean, Float, Double, String, DateTime, Duration, plus deployed array shapes per `work_remain.md:108-113` — need at least one sample tag per type in the Galaxy and a probe loop in `examples/asb-subscribe.rs` (or a new `asb-typematrix.rs`) that registers + reads each, asserting the decoded `AsbVariant` round-trips through the F24 codec.
-**Resolves when:** A list of test tags (one per type) is provisioned in the live Galaxy and the matrix loop produces a clean run.
-
+**Live coverage so far (commits `9063f10`, `<this commit>`):** three types round-trip end-to-end against the live MxDataProvider on this Windows host:
+- ✅ **Int32** (type_id 4) — `TestChildObject.TestInt = 99`
+- ✅ **String** (type_id 10) — `TestChildObject.TestString = "mxaccesscli verified 17778523775"` (UTF-16LE on the wire)
+- ✅ **Bool** (type_id 17) — `DelmiaReceiver_001.TestAttribute = 0`
+
+A SQL probe of the Galaxy DB (`SELECT mx_data_type, MIN(...) FROM gobject g INNER JOIN package p ON p.package_id = g.deployed_package_id INNER JOIN dynamic_attribute da ON da.package_id = p.package_id WHERE g.is_template = 0 GROUP BY da.mx_data_type`) shows the live Galaxy only has tags of `mx_data_type ∈ {1=Bool, 2=Int32, 5=String}`. Float (3), Double (4), DateTime (6), Duration (7), and array shapes are not deployed in this Galaxy, so we cannot exercise them without provisioning new attributes.
+
+**Transient flakiness observed:** the `InvalidConnectionId` race after one-way `AuthenticateMe` is not deterministic — even with the `MAX_ATTEMPTS=10`, `BACKOFF_BASE_MS=200` retry loop in `register_items` (commit `<this commit>`) and a 250 ms post-auth settle in `connect`, individual runs occasionally exhaust the retry budget. The server appears to enter a degraded mode after many test runs (presumably pending-connection table fills), and a 30-second cool-down restores reliability. Each tag works in some runs and fails in others; the failure is not tag-specific. Production deployments with a single long-lived session are unlikely to hit this.
+
+**Resolves when:** Either (a) the Galaxy is augmented with sample tags for the four missing types and an `asb-typematrix.rs` integration test loops over all seven proven types, OR (b) the four-missing-types coverage is acknowledged as gated on Galaxy-side provisioning that's outside the Rust port's scope and the followup is closed with the three-type live verification as the M5 DoD ✓.
+
 ### F28 — Canonical XML serialiser for `ConnectedRequest` signing (matches `XmlSerializer.Serialize` byte-for-byte)
 **Status: PARTIALLY RESOLVED.** The five `[XmlSerializerFormat]` ops (AuthenticateMe, Disconnect, KeepAlive, RegisterItems, UnregisterItems) plus the per-action `ValidatorWireFormat` selector + DH-params-from-registry + dynamic-dict id management all landed in commits `f14580e` / `104efc4`. Live AuthenticateMe + RegisterItems work end-to-end (commit `9063f10`). Read / Write / CreateSubscription / AddMonitoredItems / Publish / DeleteMonitored / DeleteSubscription / PublishWriteComplete still sign over NBFX wire bytes via the legacy fallback; works in practice because the live registry has empty `hashAlgorithm` (no HMAC required for the unforced-MAC path), but will break under any deployment that sets a real algorithm. **Severity now P2** — promote back to P0 if a hashAlgorithm-non-empty environment is in scope.
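
The F32 coverage gap described in the hunk above amounts to a set difference between the seven proven types and the types actually deployed in the Galaxy. The following is a minimal sketch of that bookkeeping, not code from the repository: `MxDataType`, `PROVEN_TYPES` and `missing_types` are hypothetical names, and the numeric codes are the Galaxy `mx_data_type` values quoted in the SQL-probe paragraph (not the `type_id` values seen on the wire).

```rust
/// Galaxy `mx_data_type` codes as quoted in the F32 notes.
/// Hypothetical enum for illustration only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum MxDataType {
    Bool = 1,
    Int32 = 2,
    Float = 3,
    Double = 4,
    String = 5,
    DateTime = 6,
    Duration = 7,
}

/// The seven scalar types the notes call "proven" on the .NET side.
const PROVEN_TYPES: [MxDataType; 7] = [
    MxDataType::Bool,
    MxDataType::Int32,
    MxDataType::Float,
    MxDataType::Double,
    MxDataType::String,
    MxDataType::DateTime,
    MxDataType::Duration,
];

/// Returns the proven types that have no deployed instance,
/// in ascending `mx_data_type` order.
fn missing_types(deployed: &[MxDataType]) -> Vec<MxDataType> {
    PROVEN_TYPES
        .iter()
        .copied()
        .filter(|t| !deployed.contains(t))
        .collect()
}
```

Feeding it the three deployed types reported by the probe (`Bool`, `Int32`, `String`) yields the four types the notes list as gated on Galaxy-side provisioning.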
@@ -309,6 +309,17 @@ impl<T: AsyncRead + AsyncWrite + Unpin + Send> AsbClient<T> {
             )
             .await?;
 
+        // AuthenticateMe is one-way — the server processes it
+        // asynchronously after we send. Give it a beat to commit auth
+        // state before any subsequent signed op lands. Without this
+        // initial settle, the per-op retry loop frequently exhausts
+        // its budget on the InvalidConnectionId race (observed
+        // empirically against the live MxDataProvider on Windows).
+        // 250 ms is short enough to be invisible to user-perceived
+        // latency and long enough to absorb the typical server-side
+        // commit delay on this deployment.
+        tokio::time::sleep(std::time::Duration::from_millis(250)).await;
+
         Ok(connect_response)
     }
 
@@ -548,29 +559,38 @@ impl<T: AsyncRead + AsyncWrite + Unpin + Send> AsbClient<T> {
 
     /// `RegisterItems` operation — sends a signed `RegisterItemsIn`
     /// SOAP envelope and decodes the `RegisterItemsResponse`. Retries
-    /// up to 5 times with `100 * attempt` ms backoff on
-    /// `InvalidConnectionId` (`AsbErrorCode = 1`), mirroring .NET's
-    /// `MxAsbDataClient.RegisterMany` (`cs:191-204`). The retry exists
-    /// because `AuthenticateMe` is one-way: the server may not have
-    /// finished processing it before our first Register lands, in
-    /// which case it sees an unauthenticated connection and returns
-    /// `InvalidConnectionId`. A short backoff lets the auth state
-    /// commit on the server side.
+    /// on `InvalidConnectionId` (`AsbErrorCode = 1`) — a transient
+    /// race after our one-way `AuthenticateMe`, since the server
+    /// commits auth state asynchronously and a Register that arrives
+    /// too quickly sees an unauthenticated connection.
+    ///
+    /// .NET's reference uses 5 attempts with `100 * attempt` ms backoff
+    /// (`MxAsbDataClient.cs:191-204`); we observed that wasn't always
+    /// enough on slower live deployments (Bool tag failed all 5 in
+    /// some runs). We bump to 10 attempts with `200 * attempt` ms
+    /// backoff — total worst-case wait ~11 s, which is well within
+    /// any reasonable user-perceived timeout.
     pub async fn register_items(
         &mut self,
         items: &[ItemIdentity],
         require_id: bool,
         register_only: bool,
     ) -> Result<RegisterItemsResponse, ClientError> {
+        const MAX_ATTEMPTS: u32 = 10;
+        const BACKOFF_BASE_MS: u64 = 200;
+
         let mut response = self
             .register_items_once(items, require_id, register_only)
             .await?;
         let mut attempt = 1u32;
-        while attempt < 5
+        while attempt < MAX_ATTEMPTS
             && response.result_code
                 == Some(crate::operations::RESULT_CODE_INVALID_CONNECTION_ID)
         {
-            tokio::time::sleep(std::time::Duration::from_millis(100 * u64::from(attempt))).await;
+            tokio::time::sleep(std::time::Duration::from_millis(
+                BACKOFF_BASE_MS * u64::from(attempt),
+            ))
+            .await;
             response = self
                 .register_items_once(items, require_id, register_only)
                 .await?;
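
Because the hunk above is cut off before the end of the loop body, the overall shape of the retry may be easier to see in isolation. What follows is a synchronous, dependency-free sketch of the same linear-backoff pattern, not the crate's implementation: `Response`, `INVALID_CONNECTION_ID` and `retry_on_invalid_connection` are hypothetical names, the `u32` result-code type is assumed, the value 1 is taken from the doc comment's `AsbErrorCode = 1`, and the `attempt += 1` step is assumed rather than shown in the diff. The sleep is injected as a closure so the schedule can be observed without waiting.

```rust
use std::time::Duration;

/// Hypothetical stand-in for the real response type; it carries only
/// the field the retry loop inspects.
#[derive(Debug, Clone, PartialEq)]
struct Response {
    result_code: Option<u32>,
}

/// ASSUMPTION: mirrors `AsbErrorCode = 1` from the doc comment.
const INVALID_CONNECTION_ID: u32 = 1;

/// Calls `op` once, then retries while it keeps returning
/// `INVALID_CONNECTION_ID`, sleeping `base_ms * attempt` before each
/// retry, for at most `max_attempts` calls in total.
fn retry_on_invalid_connection<F, S>(
    max_attempts: u32,
    base_ms: u64,
    mut op: F,
    mut sleep: S,
) -> Response
where
    F: FnMut() -> Response,
    S: FnMut(Duration),
{
    let mut response = op();
    let mut attempt = 1u32;
    while attempt < max_attempts && response.result_code == Some(INVALID_CONNECTION_ID) {
        sleep(Duration::from_millis(base_ms * u64::from(attempt)));
        response = op();
        attempt += 1; // assumed; this step falls outside the visible hunk
    }
    response
}
```

When the budget is exhausted the last `InvalidConnectionId` response is returned rather than an error, which is why printing `result_code` in the example program is useful for spotting that state.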
@@ -81,8 +81,10 @@ async fn main() -> Result<(), Box<dyn std::error::Error>> {
     eprintln!("registering {}", env.tag);
     let register = client.register_items(&items, true, false).await?;
     eprintln!(
-        "register status: {} item(s); first error_code = 0x{:04x}",
+        "register status: {} item(s); result_code={:?} success={:?}; first error_code = 0x{:04x}",
         register.status.len(),
+        register.result_code,
+        register.success,
         register.status.first().map(|s| s.error_code).unwrap_or(0)
     );
 
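
To show what the extended status line looks like, here is a small self-contained sketch that builds the same format string against stand-in types. `ItemStatus`, `RegisterSummary` and `format_register_status` are hypothetical names, and the field types (`Option<u32>`, `Option<bool>`, `u16`) are assumptions made so the example compiles; the real response types may differ.

```rust
/// Hypothetical per-item status; only the field the example prints.
struct ItemStatus {
    error_code: u16,
}

/// Hypothetical, trimmed-down view of the register response.
/// Field types are assumed, not taken from the crate.
struct RegisterSummary {
    status: Vec<ItemStatus>,
    result_code: Option<u32>,
    success: Option<bool>,
}

/// Builds the status line using the format string from the diff above.
fn format_register_status(register: &RegisterSummary) -> String {
    format!(
        "register status: {} item(s); result_code={:?} success={:?}; first error_code = 0x{:04x}",
        register.status.len(),
        register.result_code,
        register.success,
        // Falls back to 0 when the server returned no per-item status.
        register.status.first().map(|s| s.error_code).unwrap_or(0)
    )
}
```

Under these assumptions, a retry-exhausted response with no per-item status would render as `register status: 0 item(s); result_code=Some(1) success=Some(false); first error_code = 0x0000`, which the previous format could not distinguish from a response that simply carried no items.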