Files

T

Joseph Doherty 4a0f88b17d Resolve Client.Rust-022..029: MalformedReply, correlation ids, clippy

Client.Rust-022  Restored Error::MalformedReply for register / add_item /
                 add_item2 and the bulk-subscribe / read-bulk / write-bulk
                 dispatch arms so malformed-but-OK replies fail loudly
                 instead of returning Vec::new().
Client.Rust-023  Restored next_correlation_id and routed every CLI close /
                 stream-alarms / acknowledge-alarm / bench-read-bulk call
                 through it so each call carries a unique opaque token.
Client.Rust-024  Added round-trip tests for read_bulk / write_bulk /
                 write2_bulk / write_secured_bulk / write_secured2_bulk
                 plus stream_alarms and percentile_summary unit tests.
Client.Rust-025  RustClientDesign.md re-synced — new bulk SDK, alarms
                 surface, Error variants, CLI command list, and the
                 Windows stack workaround.
Client.Rust-026  Session::read_bulk now borrows a tag slice; bench-read-
                 bulk binds tags once outside the warm-up / steady-state
                 loops.
Client.Rust-027  .cargo/config.toml selector tightened to
                 cfg(all(windows, target_env = "msvc")) and comment
                 rewritten to match reality (release + debug ship the
                 8 MB reservation).
Client.Rust-028  run_batch removed the empty-line break; stdin EOF is
                 the only terminator.
Client.Rust-029  Re-applied Client.Rust-001 / 002 / 012 — added the
                 missing doc comments, renamed BulkReplyKind variants,
                 and replaced the clone-on-copy with a deref under lock
                 so cargo clippy -D warnings is clean.

All resolved at 2026-05-24; cargo fmt + check + clippy + test all green
(55 tests).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-05-24 08:50:15 -04:00

76 KiB

Raw Blame History

Code Review — Client.Rust

Field	Value
Module	`clients/rust`
Reviewer	Claude Code
Review date	2026-05-24
Commit reviewed	`42b0037`
Status	Re-reviewed
Open findings	0

Checklist coverage

This re-review (a020350) covers the resolution work for Client.Rust-013 through 017 (scoped doc_lazy_continuation allow on generated submodules, pub next_correlation_id shared with the CLI, success/failure split in bench-read-bulk, eight new tests, design-doc resync). The pass spot-checked the items called out in the request: stability of the newly-pub next_correlation_id, the bench-read-bulk JSON shape vs the PowerShell driver, presence of unsafe, and the scope of #![allow(clippy::doc_lazy_continuation)]. cargo clippy --workspace --all-targets -- -D warnings and cargo test --workspace both pass on this commit.

#	Category	Result
1	Correctness & logic bugs	No issues found — the five new `MalformedReply` paths and the `read_bulk` mismatched-payload branch each have a dedicated test; `BenchReadBulkStats` correctly partitions success vs failure latency.
2	mxaccessgw conventions	No issues found — `cargo clippy --workspace --all-targets -- -D warnings` and `cargo test --workspace` both pass on this commit; the `#![allow(clippy::doc_lazy_continuation)]` allow is scoped narrowly to each generated v1 inner module so hand-written code is unaffected; CLI `Ping`/`CloseSession` now call `session::next_correlation_id`.
3	Concurrency & thread safety	No issues found — `CORRELATION_SEQUENCE` is `AtomicU64` with `Relaxed`, correct for monotonic id generation; no `unsafe` anywhere in `src/` or `crates/`.
4	Error handling & resilience	Issue found: the `bench-read-bulk` fix for Client.Rust-015 has fixed Rust's own histogram honestly but the change makes Rust's `latencyMs` semantically incompatible with the four other clients' `latencyMs` field that the cross-language PowerShell driver collates side-by-side (Client.Rust-018).
5	Security	No issues found — API keys still redacted in `Debug`/`Display`, status messages scrubbed, `first_failure` records `Error::Display` (which already redacts `mxgw_*` tokens) so secure-write values cannot leak into the bench JSON.
6	Performance & resource management	No issues found in the reviewed delta.
7	Design-document adherence	Issue found: `RustClientDesign.md` Session signatures for the four bulk-write helpers and `read_bulk` do not match the actual implementation — the design lists trailing `user_id` / `timestamp` / `current_user_id` / `verifier_user_id` parameters and a `Vec<ReadBulkResult>` return that the impl does not have (all of those move per-entry into `WriteBulkEntry` etc.) (Client.Rust-019).
8	Code organization & conventions	No new issues — `BenchReadBulkStats` is cleanly factored out and tested.
9	Testing coverage	No new issues — the malformed-reply paths and unary `Error::Unavailable` are now covered, and the four bulk-write families each have round-trip smoke.
10	Documentation & comments	Issue found: `next_correlation_id` is now `pub` and its doc comment commits the SDK to the literal `"rust-client-{label}-{N}"` string format, but neither the doc nor `lib.rs` re-exports it or declares any stability stance, leaving the public surface ambiguous (Client.Rust-020).

2026-05-24 review (commit `d692232`)

Re-review pass at d692232. Diff against a020350 is commit 397d3c5: top-level crate renamed mxgateway-client → zb-mom-ww-mxgateway-client, build.rs proto path updated, every use mxgateway_client:: sweep-replaced to use zb_mom_ww_mxgateway_client::, tests/client_behavior.rs updated for the retired session_id and a new stream_alarms impl on the fake gateway, and the protocol-version assertion bumped to 3. Workspace member layout unchanged. cargo test --workspace and cargo clippy --workspace --all-targets -- -D warnings both clean at HEAD.

#	Category	Result
1	Correctness & logic bugs	No issues found in the a020350..d692232 diff.
2	mxaccessgw conventions	No issues found — wire identifiers (CLI binary `mxgw`, `MXGATEWAY_*` env vars, `mxaccess_gateway.v1` proto packages) unchanged per commit message.
3	Concurrency & thread safety	No issues found in this diff.
4	Error handling & resilience	No issues found in this diff.
5	Security	No issues found in this diff.
6	Performance & resource management	No issues found in this diff.
7	Design-document adherence	Issues found: Client.Rust-021 (`RustClientDesign.md` "Crate layout" section still shows an aspirational nested `crates/zb-mom-ww-mxgateway-client/` directory that does not exist; actual layout is the flat top-level crate at `clients/rust/`).
8	Code organization & conventions	No issues found in this diff.
9	Testing coverage	No issues found in this diff — the fake gateway's `stream_alarms` impl is a minimal stub, but no production behaviour relies on it being exercised by a test.
10	Documentation & comments	Issues found: Client.Rust-021 (design-doc drift).

2026-05-24 re-review (commit `42b0037`)

Re-review pass at 42b0037. The diff against d692232 is dominated by three feature commits — 71d2c39 (ports the cross-language batch subcommand to every CLI; for Rust this adds a Batch clap variant plus a run_batch stdin-driven dispatch loop, and a .cargo/config.toml setting -C link-arg=/STACK:8388608 so debug builds of the now-large clap-derive Command enum do not overflow Windows's default 1 MB main-thread stack); 3251069 (ports BenchReadBulk and re-adds the bulk read/write SDK methods to session.rs — read_bulk, write_bulk, write2_bulk, write_secured_bulk, write_secured2_bulk plus the matching CLI subcommands; this commit silently dropped the BenchReadBulkStats helper, the failureLatencyMs/firstFailure JSON enrichment, and every test from Client.Rust-015/016/018 that lived on the divergent branch); and 7de4efe (adds stream_alarms / acknowledge_alarm raw SDK methods on GatewayClient, the AlarmFeedStream type alias re-exported at the crate root, two new CLI subcommands stream-alarms / acknowledge-alarm, and fixes event_to_json to render MxEventFamily as the protobuf enum name so the cross-language e2e parity matcher recognises events). 8738735 is a README-only update for the new alarm SDK methods + CLI examples.

cargo test --workspace is clean (26 tests across the library, client-behavior integration suite, and proto-fixture suite). cargo clippy --workspace --all-targets -- -D warnings fails at HEAD with three errors — the same three the previous reviewer flagged in Client.Rust-021's resolution as "out of scope" (see Client.Rust-029 below): options.rs:98,143 missing-docs, galaxy.rs:282 clone-on-copy, and session.rs:664 enum-variant-names on BulkReplyKind. All three were resolved at a020350 (Client.Rust-001/002/012) and regressed when the crate rename 397d3c5 reverted parts of the prior fix; the diff under review does not address them.

#	Category	Result
1	Correctness & logic bugs	Issue found: the new bulk read/write SDK methods (`read_bulk`, `write_bulk`, `write2_bulk`, `write_secured_bulk`, `write_secured2_bulk`) silently return `Vec::new()` on malformed-but-OK replies (Client.Rust-022).
2	mxaccessgw conventions	Issues found: hard-coded correlation IDs in the new CLI subcommands `stream-alarms`, `acknowledge-alarm`, and `bench-read-bulk`'s session-close (Client.Rust-023); clippy regression at HEAD (Client.Rust-029).
3	Concurrency & thread safety	No issues found — `stream_alarms`/`acknowledge_alarm` go through the existing `unary_request`/`stream_request` path, no new shared mutable state. The `run_batch` stdin loop uses blocking `std::io::Stdin::lock().lines()` on a tokio main task but each dispatched subcommand is spawned on a fresh task so its future does not share the main-thread stack.
4	Error handling & resilience	No new issues beyond Client.Rust-022's malformed-OK silent default. `bench-read-bulk`'s session close is best-effort and ordered so a bench-loop error does not mask the close (`let _ = close_result;` after `bench_outcome?`).
5	Security	No issues found — alarm `comment`/`operator` strings flow through the redaction-aware `Error::Display` path; bench's `tags` list is logged in cleartext but tags are not credentials.
6	Performance & resource management	Issue found: `run_bench_read_bulk` clones the `Vec<String>` tag list on every warmup + steady-state iteration (Client.Rust-026).
7	Design-document adherence	Issue found: `RustClientDesign.md` is silent on the new alarm SDK methods, the new bulk read/write SDK methods, every new CLI subcommand, and the new `.cargo/config.toml` workaround (Client.Rust-025).
8	Code organization & conventions	No new issues — the `event_to_json` family-as-enum-name fix is a small, localised change; the bulk-write helpers are factored consistently with the existing bulk-subscribe helpers.
9	Testing coverage	Issue found: zero tests cover `stream_alarms` on `GatewayClient`, the new bulk read/write SDK methods, or the `BenchReadBulk` flow; the fake gateway's `stream_alarms` impl drops the sender immediately (Client.Rust-024).
10	Documentation & comments	Issue found: `.cargo/config.toml`'s comment promises "Release builds are unaffected" but the `link-arg=/STACK:8388608` setting is unconditional under `cfg(windows)` and only applies to the MSVC linker (Client.Rust-027).

Findings

Client.Rust-001

Field	Value
Severity	High
Category	mxaccessgw conventions
Location	`clients/rust/src/options.rs:98,143`
Status	Resolved

Description: with_max_grpc_message_bytes and max_grpc_message_bytes have no /// doc comments. The crate sets #![warn(missing_docs)] and CLAUDE.md mandates that cargo clippy --workspace --all-targets -- -D warnings pass. Under -D warnings these become hard errors, so clippy fails to compile the crate — breaking the documented build/test workflow for the module.

Recommendation: Add doc comments to both methods, e.g. /// Maximum encoded/decoded gRPC message size in bytes (default 16 MiB).

Resolution: Resolved in 0d8a28d (2026-05-18): doc comments added to both methods.

Client.Rust-002

Field	Value
Severity	High
Category	mxaccessgw conventions
Location	`clients/rust/src/session.rs:522`
Status	Resolved

Description: The BulkReplyKind enum's variants (AddItemBulk, AdviseItemBulk, RemoveItemBulk, UnAdviseItemBulk, SubscribeBulk, UnsubscribeBulk) all share the Bulk suffix, tripping clippy::enum_variant_names. Under -D warnings this is a compile error, so cargo clippy --workspace --all-targets -- -D warnings fails — a violation of the CLAUDE.md requirement that clippy pass cleanly.

Recommendation: Rename the variants to drop the common suffix (e.g. AddItem, AdviseItem, …) or add a narrowly-scoped #[allow(clippy::enum_variant_names)] with a reason comment.

Resolution: Resolved in 0d8a28d (2026-05-18): variants renamed to AddItem/AdviseItem/RemoveItem/UnAdviseItem/Subscribe/Unsubscribe, which no longer share a common suffix.

Client.Rust-003

Field	Value
Severity	High
Category	Correctness & logic bugs
Location	`clients/rust/crates/mxgw-cli/src/main.rs:1051`
Status	Resolved

Description: The unit test version_json_output_has_protocol_versions asserts value["gatewayProtocolVersion"] == 2, but GATEWAY_PROTOCOL_VERSION is 3 (version.rs:10), matching the authoritative server constant GatewayContractInfo.GatewayProtocolVersion = 3. The test fails, so cargo test --workspace (the documented test step) does not pass — the test was not updated when the protocol version was bumped.

Recommendation: Update the assertion to 3, or better, assert against GATEWAY_PROTOCOL_VERSION so it cannot drift again.

Resolution: Resolved in 0d8a28d (2026-05-18): the test now asserts against the GATEWAY_PROTOCOL_VERSION / WORKER_PROTOCOL_VERSION constants, so it cannot drift again.

Client.Rust-004

Field	Value
Severity	Low
Category	Documentation & comments
Location	`clients/rust/src/version.rs:7`
Status	Resolved

Description: CLIENT_VERSION is "0.1.0-dev" and its doc comment claims "Mirrors Cargo.toml", but Cargo.toml declares version = "0.1.0" (no -dev suffix). The comment is misleading and the value is not actually kept in sync with the manifest.

Recommendation: Either set CLIENT_VERSION from the build via env!("CARGO_PKG_VERSION"), or correct the constant to "0.1.0" and drop the "Mirrors Cargo.toml" claim.

Resolution: Resolved in 0d8a28d (2026-05-18): CLIENT_VERSION is now env!("CARGO_PKG_VERSION"), taken from Cargo.toml at compile time so the two cannot drift.

Client.Rust-005

Field	Value
Severity	Medium
Category	Correctness & logic bugs
Location	`clients/rust/src/session.rs:489-520`
Status	Resolved

Description: register_server_handle, add_item_handle, and add_item2_handle fall through to reply.return_value … .unwrap_or_default(), returning 0 when the reply carries neither the expected typed payload nor an Int32 return_value. Because Session::invoke has already confirmed protocol_status == Ok, a malformed-but-OK reply silently yields handle 0, which the caller then uses as a real handle against the worker.

Recommendation: Return Err(Error::ProtocolStatus { … }) (or a dedicated Error::MalformedReply) when an OK reply lacks an extractable handle, instead of defaulting to 0.

Resolution: Resolved in 0d8a28d (2026-05-18): the three handle extractors now return Result<i32, Error> and yield the new Error::MalformedReply when an OK reply carries no usable handle.

Client.Rust-006

Field	Value
Severity	Medium
Category	Error handling & resilience
Location	`clients/rust/src/session.rs:531-555`
Status	Resolved

Description: bulk_results returns Vec::new() for any (payload, kind) combination that does not match the expected arm — including an OK reply carrying the wrong or no payload. A caller of subscribe_bulk/add_item_bulk then sees an empty result vector and cannot distinguish "zero items processed" from "gateway returned a shapeless reply".

Recommendation: Treat a missing/mismatched bulk payload on an OK reply as an error rather than an empty vector, or document the empty-vec fallback explicitly and log it.

Resolution: Resolved in 0d8a28d (2026-05-18): bulk_results now returns Result<Vec<SubscribeResult>, Error> and yields Error::MalformedReply on a mismatched or absent bulk payload.

Client.Rust-007

Field	Value
Severity	Low
Category	Design-document adherence
Location	`clients/rust/RustClientDesign.md:14-55`
Status	Resolved

Description: RustClientDesign.md is stale relative to the implemented code. It documents a nested crates/mxgateway-client/ layout (the real crate root is clients/rust/ with a flat src/), and lists tracing among "Expected dependencies", but tracing appears in no Cargo.toml. CLAUDE.md requires docs to change with the source.

Recommendation: Update RustClientDesign.md to the actual flat layout and remove tracing from the dependency list (or add tracing if structured logging is genuinely intended).

Resolution: Resolved in 0d8a28d (2026-05-18): the "Crate Layout" section now shows the actual flat layout (mxgateway-client as the workspace-root crate, mxgw-cli as a member) and the unused tracing entry was removed from the dependency list.

Client.Rust-008

Field	Value
Severity	Low
Category	Performance & resource management
Location	`clients/rust/src/value.rs:161-261`
Status	Resolved

Description: MxValueProjection::from_proto and MxArrayProjection::from_proto deep-clone every element out of the wire message while MxValue/MxArrayValue also retain the original raw message. Every MxValue therefore holds two copies of its payload, wasteful for large string arrays or raw blobs arriving on the event stream.

Recommendation: Compute the projection lazily on demand, or have the projection borrow from raw, so array/raw payloads are not duplicated for every wrapped value.

Resolution: Resolved in 0d8a28d (2026-05-18): MxValue and MxArrayValue no longer cache a projection field — projection() computes the typed view on demand from raw. A value built only to be sent over the wire now holds a single copy of its payload and pays no projection cost.

Client.Rust-009

Field	Value
Severity	Low
Category	Testing coverage
Location	`clients/rust/tests/client_behavior.rs`, `clients/rust/src/galaxy.rs`
Status	Resolved

Description: Several critical paths are untested: TLS channel setup (with_plaintext(false) / CA-file loading), mid-stream tonic::Status fault propagation through EventStream/DeployEventStream (tests only send Ok items), and the bulk-size cap (ensure_bulk_size rejecting >1000 items).

Recommendation: Add tests that (a) feed an Err(Status) into the event/deploy streams and assert it surfaces as the mapped Error, (b) assert add_item_bulk with 1001 items returns Error::InvalidArgument, and (c) exercise the CA-file/InvalidEndpoint error path.

Resolution: Resolved in 0d8a28d (2026-05-18): added add_item_bulk_rejects_input_above_the_thousand_item_cap, event_stream_surfaces_a_mid_stream_status_fault (the fake gateway now optionally emits a mid-stream Status::unavailable), and connect_with_unreadable_ca_file_reports_invalid_endpoint.

Client.Rust-010

Field	Value
Severity	Low
Category	Error handling & resilience
Location	`clients/rust/src/client.rs:255-268`, `clients/rust/src/galaxy.rs:204-216`
Status	Resolved

Description: The client applies only a per-call deadline via Request::set_timeout and has no retry, reconnect, or transient-vs-permanent classification. A transient Unavailable (e.g. a gateway restart) maps to the catch-all Error::Status and is indistinguishable from a permanent failure. This is an acceptable v1 stance but is undocumented.

Recommendation: Either add a documented Error::Unavailable variant classifying Code::Unavailable/Code::ResourceExhausted, or explicitly document in the README that the client performs no retries and that transient failures arrive as Error::Status.

Resolution: Resolved in 0d8a28d (2026-05-18): added the Error::Unavailable variant; From<tonic::Status> maps Code::Unavailable and Code::ResourceExhausted to it, so callers can classify transient failures without unwrapping the raw status.

Client.Rust-011

Field	Value
Severity	Low
Category	mxaccessgw conventions
Location	`clients/rust/src/session.rs:469`
Status	Resolved

Description: command_request hard-codes client_correlation_id as format!("rust-client-{}", kind.as_str_name()). Every invocation of the same command kind on a session uses an identical correlation id, so the id cannot correlate a specific request/reply pair in gateway logs or among concurrent in-flight calls. MXAccess parity diagnostics rely on correlation ids being unique per call.

Recommendation: Append a per-call unique suffix (monotonic counter or UUID) to the correlation id, or expose a way for the caller to supply one.

Resolution: Resolved in 0d8a28d (2026-05-18): correlation ids are built by next_correlation_id, which appends a process-wide atomic sequence number; Session::close uses it too.

Client.Rust-012

Field	Value
Severity	High
Category	mxaccessgw conventions
Location	`clients/rust/src/galaxy.rs:282`
Status	Resolved

Description: Found while verifying the fix for Client.Rust-001/002: cargo clippy --workspace --all-targets -- -D warnings reported a third violation the original review missed. The get_last_deploy_time test fake calls .clone() on a MutexGuard<Option<prost_types::Timestamp>>, and Option<Timestamp> is Copy (clippy::clone_on_copy). Under -D warnings this is a compile error, so clippy still did not pass after Client.Rust-001/002 alone.

Recommendation: Dereference instead of cloning: *self.state.last_deploy.lock().unwrap().

Resolution: Resolved in 0d8a28d (2026-05-18): replaced .clone() with a deref. cargo clippy --workspace --all-targets -- -D warnings now passes cleanly.

Client.Rust-013

Field	Value
Severity	High
Category	mxaccessgw conventions
Location	`src/MxGateway.Contracts/Protos/mxaccess_gateway.proto:414-424` (origin); `clients/rust/src/generated.rs:11-31` (suppression site)
Status	Resolved

Description: cargo clippy --workspace --all-targets -- -D warnings fails again on this commit, this time on a clippy::doc_lazy_continuation violation in generated code:

error: doc list item without indentation
   --> .../mxaccess_gateway.v1.rs:526:5
    |
526 | /// `timeout_ms == 0` uses the gateway-configured default (1000 ms).
    |     ^

The lint fires because the ReadBulkCommand proto comment (added with the bulk Read feature in commit 5e375f6) writes a bulleted list and then a trailing paragraph without the required blank line. prost-build forwards the proto comment verbatim into Rust doc comments, and the Rust client compiles those generated modules with crate-default lints. The crate already opts out of clippy::large_enum_variant in src/generated.rs for exactly this kind of generator-style problem, but doc_lazy_continuation is not on the allow-list, so the lint reaches -D warnings and breaks the documented cargo clippy --workspace --all-targets -- -D warnings invocation that CLAUDE.md mandates pass. The Rust client review was previously closed as clippy-clean (Client.Rust-001/002/012); this is the third clippy-clean regression caused by generated code in this module and warrants a more durable fix.

Recommendation: Add #![allow(clippy::doc_lazy_continuation)] to each generated submodule in clients/rust/src/generated.rs alongside clippy::large_enum_variant, so generated doc comments — which the client cannot edit — cannot break the -D warnings build. Independently, fix the upstream proto comment to insert a blank line before the trailing paragraph so the C# / Go / Python / Java generators do not carry the same flaky text.

Resolution: 2026-05-20 — Added #![allow(clippy::doc_lazy_continuation)] to each generated submodule in clients/rust/src/generated.rs next to the existing clippy::large_enum_variant allow, and reformatted the ReadBulkCommand proto comment in src/MxGateway.Contracts/Protos/mxaccess_gateway.proto to surround the bulleted list with blank lines so doc-comment generators in every language see a properly-terminated list. cargo clippy --workspace --all-targets -- -D warnings and cargo test --workspace now pass, and dotnet build src/MxGateway.Contracts/MxGateway.Contracts.csproj reports 0 warnings.

Client.Rust-014

Field	Value
Severity	Low
Category	mxaccessgw conventions
Location	`clients/rust/crates/mxgw-cli/src/main.rs:450,497`
Status	Resolved

Description: Client.Rust-011 made Session build unique correlation ids per call, but the mxgw CLI's Ping and CloseSession subcommands still hard-code client_correlation_id: "rust-cli-ping".to_owned() and "rust-cli-close-session".to_owned(). Both go through client.invoke(…) / client.close_session_raw(…) rather than the Session helpers, so the library's id generator does not run. The CLI is the cross-language e2e driver — when the same machine runs concurrent CLI smokes, every ping/close-session request collides on the same correlation id in gateway logs, defeating the diagnostic value the library fix unlocked.

Recommendation: Either (a) expose session::next_correlation_id as a pub(crate) or library-level helper and have the CLI call it from Ping/CloseSession, or (b) replace these RPCs with the higher-level Session helpers (Session::close, and a thin Session::ping wrapper) so the CLI shares the library's correlation-id discipline by construction.

Resolution: 2026-05-20 — Promoted session::next_correlation_id from a module-private helper to a pub library-level function (it already lived in the pub mod session) and updated the mxgw CLI's Ping and CloseSession subcommands to call mxgateway_client::session::next_correlation_id("cli-ping") / next_correlation_id("cli-close-session") instead of the hard-coded "rust-cli-ping" / "rust-cli-close-session" strings. Concurrent CLI smokes now produce unique correlation ids per call — driven by the same process-wide CORRELATION_SEQUENCE AtomicU64 the library uses — so gateway logs can tell collisions apart again. cargo fmt, cargo build --workspace, cargo clippy --workspace --all-targets -- -D warnings, and cargo test --workspace all pass.

Client.Rust-015

Field	Value
Severity	Medium
Category	Error handling & resilience
Location	`clients/rust/crates/mxgw-cli/src/main.rs:1053-1070`
Status	Resolved

Description: The new cross-language benchmark bench-read-bulk pushes the elapsed time of every read_bulk call into latencies_ms regardless of whether the call returned Ok or Err:

let outcome = session.read_bulk(server_handle, &tags, timeout_ms).await;
let elapsed_ms = call_start.elapsed().as_secs_f64() * 1000.0;
latencies_ms.push(elapsed_ms);
match outcome {
    Ok(results) => { successful_calls += 1; … }
    Err(_) => failed_calls += 1,
}

A failed read_bulk (transient Unavailable, deadline-exceeded mid-call, etc.) typically returns later than a successful one — it includes the full per-call timeout that the success path never waits for. The histogram therefore conflates "p99 cached-read latency" with "p99 of (cached-read + timed-out call)", and the JSON document the PowerShell driver collates publishes latencyMs.p99 / latencyMs.max that no longer represent successful-call latency. Worse, the failure category is silently dropped (Err(_) => failed_calls += 1) so a benchmark run that fails on every call still emits a coherent-looking JSON without ever surfacing why. This is misleading for a benchmark whose JSON shape is the cross-language comparison contract.

Recommendation: Only push elapsed time into latencies_ms on Ok, or split into two histograms (successLatencyMs and failureLatencyMs) and log the first failure's error string into the stats record so a partial-failure run is visible at the report layer.

Resolution: 2026-05-20 — Extracted the per-iteration accounting in bench-read-bulk into a BenchReadBulkStats helper with explicit record_success/record_failure methods. Successful read_bulk calls now flow into success_latencies_ms (driving the cross-language latencyMs.p99/max JSON contract), failures flow into a separate failure_latencies_ms histogram surfaced as failureLatencyMs, and the first failure's redacted error string is stashed as firstFailure so a partial-failure run is visible at the report layer instead of producing a coherent-looking JSON that hides every error. Added a unit test (bench_read_bulk_stats_keeps_failures_out_of_success_latency_histogram) that records two fast successes plus a deliberately slow failure and asserts the success histogram never sees the failure latency, plus a smaller smoke test for the zero-duration calls-per-second path.

Client.Rust-016

Field	Value
Severity	Medium
Category	Testing coverage
Location	`clients/rust/tests/client_behavior.rs`, `clients/rust/src/session.rs:489-519,654-768`
Status	Resolved

Description: The fixes for Client.Rust-005 / 006 added five new Error::MalformedReply paths to session.rs (register_server_handle, add_item_handle, add_item2_handle, bulk_results, bulk_write_results) plus the inline branch in read_bulk. None of them are exercised by tests — every test in client_behavior.rs feeds the matching payload back to the client, so the malformed-reply branches are dead code from the test suite's perspective. The new bulk-write helpers (write_bulk, write2_bulk, write_secured_bulk, write_secured2_bulk) only have a single happy-path assertion via write_bulk, leaving the three other variants and every per-entry-failure shape untested. The bench-read-bulk flow has no test (the driver script is the only consumer). The Error::Unavailable variant from Client.Rust-010 is covered by event_stream_surfaces_a_mid_stream_status_fault, but the same variant on a unary Code::Unavailable is not.

Recommendation: Add three light tests against the existing FakeGateway:

Have the fake reply to AddItem (or Register / AddItem2) with protocol_status = Ok and no payload, and assert the client surfaces Error::MalformedReply.
Have the fake reply to WriteBulk with protocol_status = Ok and the wrong payload arm (e.g. an AddItemReply body), and assert Error::MalformedReply.
Have the fake fail the unary Invoke with Status::unavailable(...) and assert Error::Unavailable.

Optionally add Write2Bulk / WriteSecuredBulk / WriteSecured2Bulk smoke assertions so all four bulk-write families have at least one round-trip test.

Resolution: 2026-05-20 — Added eight new integration tests in clients/rust/tests/client_behavior.rs. Each new Error::MalformedReply site is exercised via a test-only InvokeOverride injected into FakeState that lets a single test pin the fake gateway's Invoke handler to one of three malformed shapes (OK reply with no payload, OK reply with the wrong payload arm for read_bulk, OK reply with the wrong payload arm for the other bulk / bulk-write families): register_returns_malformed_reply_when_ok_reply_has_no_payload, add_item_returns_malformed_reply_when_ok_reply_has_no_payload, add_item2_returns_malformed_reply_when_ok_reply_has_no_payload, subscribe_bulk_returns_malformed_reply_on_mismatched_payload_arm, write_bulk_returns_malformed_reply_on_mismatched_payload_arm, and read_bulk_returns_malformed_reply_on_mismatched_payload_arm. The unary Error::Unavailable path is covered by unary_invoke_maps_status_unavailable_to_error_unavailable (the override returns Status::unavailable(...)). The remaining three bulk-write families gained round-trip smoke tests — write2_bulk_round_trips_through_the_fake_gateway, write_secured_bulk_round_trips_through_the_fake_gateway, write_secured2_bulk_round_trips_through_the_fake_gateway — extending the fake gateway's dispatcher with happy-path replies for Write2Bulk / WriteSecuredBulk / WriteSecured2Bulk. The bench-read-bulk flow gets a BenchReadBulkStats unit test in crates/mxgw-cli/src/main.rs (see Client.Rust-015) that asserts the latency-tracking change keeps failed-call durations out of latencyMs.

Client.Rust-017

Field	Value
Severity	Low
Category	Design-document adherence
Location	`clients/rust/RustClientDesign.md:79-99,156-163`
Status	Resolved

Description: CLAUDE.md requires docs to change with the source. RustClientDesign.md was refreshed to fix the layout/tracing drift (Client.Rust-007), but the Session API surface in the design (Library API block, lines 79-99) still lists only the original six bulk helpers — add_item_bulk, advise_item_bulk, remove_item_bulk, un_advise_item_bulk, subscribe_bulk, unsubscribe_bulk — and is missing the five new bulk-write helpers and read_bulk (write_bulk, write2_bulk, write_secured_bulk, write_secured2_bulk, read_bulk) that landed in commits 5e375f6 / f220908 / 61644e6. The Error Handling block (lines 130-146) still enumerates Transport, Status, Authentication, Authorization, Session, Worker, Command, MxAccess, Timeout, Cancelled — but not MalformedReply, Unavailable, or InvalidEndpoint, all of which are now public variants of the crate's Error enum. The Test CLI block (lines 158-163) lists version / smoke / stream-events / write but is missing every new subcommand (read-bulk, write-bulk, write2-bulk, write-secured-bulk, write-secured2-bulk, bench-read-bulk, galaxy watch).

Recommendation: Bring the design doc back in sync: extend the Session API code block to enumerate the bulk-write/read methods, expand the Error enum to match clients/rust/src/error.rs, and add the missing CLI subcommands. The README is already up to date, so this is design-doc-only churn.

Resolution: 2026-05-20 — Brought clients/rust/RustClientDesign.md back in sync with the implementation. The Session block now lists the five new bulk helpers (write_bulk, write2_bulk, write_secured_bulk, write_secured2_bulk, read_bulk) alongside the original six and notes that session::next_correlation_id is pub for raw-RPC consumers (the CLI). The Error enum block now matches clients/rust/src/error.rs — InvalidEndpoint, InvalidArgument, Transport, Authentication, Authorization, Timeout, Cancelled, Unavailable, Status, Command, ProtocolStatus, MalformedReply — with a short paragraph explaining what Unavailable, MalformedReply, and InvalidEndpoint classify. The Test CLI block enumerates every subcommand the binary exposes today: version, ping, open-session, close-session, register, add-item, advise, subscribe-bulk, unsubscribe-bulk, read-bulk, write, write2, write-bulk, write2-bulk, write-secured-bulk, write-secured2-bulk, stream-events, bench-read-bulk, smoke, and the galaxy {test-connection,last-deploy-time,discover-hierarchy,watch} subtree.

Client.Rust-018

Field	Value
Severity	Medium
Category	Error handling & resilience
Location	`clients/rust/crates/mxgw-cli/src/main.rs:1098-1170`; `scripts/bench-read-bulk.ps1:347-365`; siblings: `clients/go/cmd/mxgw-go/main.go:600-648`, `clients/python/src/mxgateway_cli/commands.py:614-662`, `clients/dotnet/MxGateway.Client.Cli/MxGatewayClientCli.cs:685-770`, `clients/java/mxgateway-cli/src/main/java/com/dohertylan/mxgateway/cli/MxGatewayCli.java:855-940`
Status	Resolved

Description: Client.Rust-015's resolution split Rust's bench histogram so latencyMs records only successful read_bulk calls and a new failureLatencyMs field holds failed-call durations. The local logic is right, the unit test (bench_read_bulk_stats_keeps_failures_out_of_success_latency_histogram) is right, and the JSON shape stays additively compatible with scripts/bench-read-bulk.ps1 (the collator reads $s.latencyMs.p50/p95/p99/max/mean and these keys still exist on the Rust output). The problem is cross-language: the .NET, Go, Python, and Java bench implementations still push every call's elapsed time into a single latenciesMs / latencies_ms / latencyMillis array regardless of success or failure (e.g. clients/go/cmd/mxgw-go/main.go:611 appends before the success/failure branch; clients/python/src/mxgateway_cli/commands.py:624,626 appends in both except and the happy path; clients/dotnet/MxGateway.Client.Cli/MxGatewayClientCli.cs:701,705 adds in both catch and the OK path; clients/java/mxgateway-cli/src/main/java/com/dohertylan/mxgateway/cli/MxGatewayCli.java:865,880 records in both branches). The PS driver's side-by-side comparison table (lines 348-360) pulls latencyMs.p50/p95/p99/max/mean from every client and prints them in one row, so a partial-failure run now shows Rust's p99 measured over successes only and the other four clients' p99 measured over (success + per-call timeout) — the numbers are silently no longer comparable. This re-introduces the original Client.Rust-015 problem at the cross-language layer that the fix was meant to remove.

Recommendation: Make the contract uniform. Either (a) revert Rust's latencyMs to the all-calls histogram for backwards/cross-language compatibility and keep failureLatencyMs as an additive Rust-only enrichment, or (b) push the same success-only / failure-separated split into the .NET, Go, Python, and Java bench commands so every language emits the honest pair (latencyMs = success, failureLatencyMs = failure, plus firstFailure) and update the PS driver's table column to make the success-only semantics explicit (p99 ok ms). Option (b) is the better long-term posture but it is a cross-client change; option (a) restores comparability immediately.

Resolution: 2026-05-20 — Took option (a) to restore cross-language comparability immediately. Reverted Rust's latencyMs to the all-calls histogram so it matches the .NET/Go/Python/Java bench shape that scripts/bench-read-bulk.ps1 collates side-by-side: BenchReadBulkStats::record_success and record_failure now both push elapsed time into a single latencies_ms vector, and record_failure additionally pushes into failure_latencies_ms and stashes the first failure's redacted error string in first_failure. The JSON output keeps failureLatencyMs and firstFailure as Rust-only additive enrichment so a partial-failure run is still visible at the report layer without breaking the side-by-side table. Renamed the unit test to bench_read_bulk_stats_tracks_all_calls_in_latency_and_failures_separately; it now asserts latencyMs.max == 1500.0 (the slow failure is included in the cross-language latencyMs contract) while failureLatencyMs.max == 1500.0 and firstFailure still surface the failure separately for diagnostics. Pushing the success-only / failure-separated split into the other four clients (option (b)) is the better long-term posture but is deliberately out of scope here.

Client.Rust-019

Field	Value
Severity	Low
Category	Design-document adherence
Location	`clients/rust/RustClientDesign.md:96-100`
Status	Resolved

Description: Client.Rust-017 was closed by adding the new bulk-write/read entries to the design doc, but the signatures shown in the code block do not match the implementation. The doc declares:

pub async fn write_bulk(&self, server_handle: i32, entries: Vec<WriteBulkEntry>, user_id: i32) -> Result<Vec<BulkWriteResult>, Error>;
pub async fn write2_bulk(&self, server_handle: i32, entries: Vec<Write2BulkEntry>, timestamp: prost_types::Timestamp, user_id: i32) -> Result<Vec<BulkWriteResult>, Error>;
pub async fn write_secured_bulk(&self, server_handle: i32, entries: Vec<WriteSecuredBulkEntry>, current_user_id: i32, verifier_user_id: i32) -> Result<Vec<BulkWriteResult>, Error>;
pub async fn write_secured2_bulk(&self, server_handle: i32, entries: Vec<WriteSecured2BulkEntry>, timestamp: prost_types::Timestamp, current_user_id: i32, verifier_user_id: i32) -> Result<Vec<BulkWriteResult>, Error>;
pub async fn read_bulk(&self, server_handle: i32, tags: &[String], timeout_ms: u32) -> Result<Vec<ReadBulkResult>, Error>;

The actual implementations in clients/rust/src/session.rs:385-526 take only (server_handle, entries) — user_id is per-entry on WriteBulkEntry/Write2BulkEntry, timestamp_value is per-entry on Write2BulkEntry/WriteSecured2BulkEntry, and current_user_id/verifier_user_id are per-entry on WriteSecured{,2}BulkEntry. The protobuf in src/MxGateway.Contracts/Protos/mxaccess_gateway.proto:364-416 confirms this — there is no top-level user_id on these commands. The doc also returns Vec<ReadBulkResult> but the generated type is BulkReadResult (the gateway's BulkReadReply carries repeated BulkReadResult), and the actual signature is read_bulk<S: AsRef<str>>(..., tag_addresses: &[S], ...) -> Vec<BulkReadResult> — generic over AsRef<str> so callers can pass either Vec<String> or [&str].

The drift is small but the design doc was the explicit subject of Client.Rust-017's resolution, so it warrants a follow-up. CLAUDE.md requires docs to change with the source.

Recommendation: Replace the five signatures in RustClientDesign.md:96-100 with the ones actually in session.rs:

pub async fn write_bulk(&self, server_handle: i32, entries: Vec<WriteBulkEntry>) -> Result<Vec<BulkWriteResult>, Error>;
pub async fn write2_bulk(&self, server_handle: i32, entries: Vec<Write2BulkEntry>) -> Result<Vec<BulkWriteResult>, Error>;
pub async fn write_secured_bulk(&self, server_handle: i32, entries: Vec<WriteSecuredBulkEntry>) -> Result<Vec<BulkWriteResult>, Error>;
pub async fn write_secured2_bulk(&self, server_handle: i32, entries: Vec<WriteSecured2BulkEntry>) -> Result<Vec<BulkWriteResult>, Error>;
pub async fn read_bulk<S: AsRef<str>>(&self, server_handle: i32, tag_addresses: &[S], timeout_ms: u32) -> Result<Vec<BulkReadResult>, Error>;

and add a one-line note that the per-entry fields (user_id, timestamp_value, current_user_id, verifier_user_id) live on the entry structs themselves.

Resolution: 2026-05-20 — Replaced the five drifted signatures in RustClientDesign.md with the ones that actually live in clients/rust/src/session.rs: write_bulk / write2_bulk / write_secured_bulk / write_secured2_bulk take only (server_handle, entries), and read_bulk<S: AsRef<str>> takes a borrowed &[S] and returns Vec<BulkReadResult> (not Vec<ReadBulkResult>). Added a follow-up paragraph noting that the per-entry fields user_id / timestamp_value / current_user_id / verifier_user_id live on WriteBulkEntry / Write2BulkEntry / WriteSecuredBulkEntry / WriteSecured2BulkEntry themselves rather than as trailing positional arguments, matching the protobuf shapes in mxaccess_gateway.proto, and that read_bulk is generic over AsRef<str> so callers can pass &[String] or &[&str] without cloning at the call site.

Client.Rust-020

Field	Value
Severity	Low
Category	Documentation & comments
Location	`clients/rust/src/session.rs:31-46`; `clients/rust/src/lib.rs:14-39`
Status	Resolved

Description: Client.Rust-014's resolution promoted next_correlation_id from a module-private helper to a pub function so the mxgw CLI's raw-RPC paths can share the library's correlation-id discipline. The doc comment commits the library to a literal string format — "rust-client-{label}-{N}" — that external code can now depend on. Two concerns:

The function is not re-exported at the crate root in lib.rs (it only ships through the pub mod session namespace), so the in-tree caller writes the long mxgateway_client::session::next_correlation_id("cli-ping") path. Either re-export it via #[doc(inline)] pub use session::next_correlation_id; or leave it where it is and add a short note in the doc — but the current state straddles "public API" and "lib-internal helper" without saying which.
The doc comment does not declare a stability stance (no #[doc(hidden)], no "experimental" note, no __priv naming). As written it promises the literal format "rust-client-{label}-{N}" to any out-of-tree consumer; a future change that renames the prefix (for example to drop the rust- after a multi-client reformat) would be a behavioural break. The RustClientDesign.md resolution of Client.Rust-017 ("session::next_correlation_id is pub") reads similarly — it does not say whether the format is stable.

The combination — pub, format-committing doc, no stability note, no crate-root re-export — leaves the public surface ambiguous. The same review category (Documentation & comments) is where Client.Rust-014's CLI-side fix is now visible, so this is the natural place to clean it up.

Recommendation: Pick one of:

Treat next_correlation_id as part of the SDK's public API. Re-export it from lib.rs (#[doc(inline)] pub use session::next_correlation_id;), rewrite the doc comment to not promise the literal "rust-client-{label}-{N}" format (just the property "monotonic, unique within a process, includes the supplied label"), and call that out in RustClientDesign.md.
Treat it as internal-only. Mark it #[doc(hidden)] pub and add a // Internal helper exposed for the in-tree mxgw CLI; not part of the public SDK contract. comment so out-of-tree consumers do not build against a format that the SDK is free to change.

The CLI integration in Client.Rust-014 works either way; this is solely about declaring intent so the SDK's public surface is unambiguous.

Resolution: 2026-05-20 — Took the "treat as public SDK API" branch. Re-exported next_correlation_id at the crate root in clients/rust/src/lib.rs (#[doc(inline)] pub use session::{next_correlation_id, Session};) so in-tree and external callers can write the short mxgateway_client::next_correlation_id(...) path. Updated the in-tree mxgw CLI (Ping and CloseSession subcommands) to call through the crate-root re-export instead of mxgateway_client::session::next_correlation_id. Rewrote the doc comment to drop the format promise: the returned id is now documented as an opaque token with three guaranteed properties (embeds the supplied label, unique within a process via an internal monotonic atomic sequence, carries no embedded secret beyond label), and the doc explicitly states that the textual format rust-client-{label}-{N} is not part of the public contract and that callers must not parse it. Cross-referenced the crate-root re-export from the function-level doc. Updated RustClientDesign.md to call out that next_correlation_id is part of the public SDK surface, re-exported at the crate root, and that its textual format is intentionally not part of the contract.

Client.Rust-021

Field	Value
Severity	Low
Category	Design-document adherence
Location	`clients/rust/RustClientDesign.md:14-33`
Status	Resolved

Description: The crate-name change in commit 397d3c5 (top-level mxgateway-client → zb-mom-ww-mxgateway-client) is reflected in Cargo.toml, Cargo.lock, every use zb_mom_ww_mxgateway_client:: import, and build.rs. The "Recommended layout" block in RustClientDesign.md:21-33 shows a nested structure with a top-level crates/zb-mom-ww-mxgateway-client/ subdirectory containing src/lib.rs, src/client.rs, etc. — but the actual layout on disk is flat: the top-level crate lives at clients/rust/ directly (with src/, Cargo.toml, and build.rs at the workspace root) and crates/mxgw-cli/ is the only nested member. A reader consulting the design to understand the layout will be misled into looking for crates/zb-mom-ww-mxgateway-client/ that does not exist. CLAUDE.md requires design docs to track code changes.

Recommendation: Update RustClientDesign.md:14-33 to describe the actual flat layout: workspace root at clients/rust/, top-level crate zb-mom-ww-mxgateway-client (declared in Cargo.toml) with src/lib.rs, src/client.rs, etc. directly under it; crates/mxgw-cli/ is the single member subcrate. Alternatively label the block as "Aspirational nested layout (not currently adopted)" and add a separate "Current layout" section.

Resolution: 2026-05-24 — Rewrote the "Crate Layout" section of clients/rust/RustClientDesign.md to describe the actual flat layout on disk instead of the aspirational nested form. The section now opens with a short paragraph stating that the workspace is rooted at clients/rust/, that clients/rust/Cargo.toml declares the top-level crate zb-mom-ww-mxgateway-client itself (flat layout — src/ sits directly under the workspace root, not under crates/), and that the only [workspace.members] entry is the mxgw-cli binary subcrate under crates/mxgw-cli/. The accompanying tree replaces the fictitious crates/zb-mom-ww-mxgateway-client/ subdirectory with the real files: Cargo.toml, Cargo.lock, build.rs, README.md, RustClientDesign.md, src/{lib,client,session,galaxy,options,auth,error,value,version,generated}.rs plus src/generated/ (tonic-build output), tests/{client_behavior,proto_fixtures}.rs, and crates/mxgw-cli/ annotated as the sole workspace member producing the mxgw binary. cargo build --workspace and cargo test --workspace are clean at HEAD (29 tests pass across the library, integration, and proto-fixture targets). cargo clippy --workspace --all-targets -- -D warnings reports three pre-existing errors at HEAD on this dev box (options.rs:98,143 missing docs, galaxy.rs:282 clone_on_copy, session.rs:522 enum_variant_names) that exist independently of this doc-only change — verified by running clippy with the design-doc edit stashed; tracking those is out of scope for Client.Rust-021.

Client.Rust-022

Field	Value
Severity	Medium
Category	Correctness & logic bugs
Location	`clients/rust/src/session.rs:369-391,403-420,427-444,452-469,476-493,631-696,706-724`
Status	Resolved

Description: Commit 3251069 re-introduced the bulk read/write SDK methods (read_bulk, write_bulk, write2_bulk, write_secured_bulk, write_secured2_bulk) on Session. Each method falls back to Vec::new() when an OK reply does not carry the expected typed payload arm:

Ok(match reply.payload {
    Some(mx_command_reply::Payload::ReadBulk(reply)) => reply.results,
    _ => Vec::new(),
})

and bulk_write_results does the same for the four write families. A caller of write_bulk / read_bulk on a malformed-but-OK reply therefore sees an empty result vector and cannot distinguish "zero items processed" from "gateway returned a shapeless reply". This is exactly the anti-pattern Client.Rust-006 fixed in 0d8a28d (which added an Error::MalformedReply variant and made bulk_results return Result<Vec<…>, Error>), and which Client.Rust-005 fixed for register_server_handle / add_item_handle / add_item2_handle (those three also now fall back to unwrap_or_default() returning handle 0 at session.rs:631-660, regressing alongside the new methods). The fix lived on the divergent branch and was lost when the d692232 line of history was re-platformed without picking up the post-a020350 Resolve commits — Error::MalformedReply is absent from clients/rust/src/error.rs at HEAD.

Recommendation: Re-apply the Client.Rust-005/006 resolutions on top of the new methods: add Error::MalformedReply { detail: String } back to error.rs, change register_server_handle / add_item_handle / add_item2_handle to return Result<i32, Error> (yielding MalformedReply when the reply lacks an extractable handle), change bulk_results and bulk_write_results to return Result<Vec<_>, Error> (yielding MalformedReply on a mismatched / absent payload arm), and route the same fix through the new read_bulk inline branch. Re-add the six …_returns_malformed_reply_… tests from the Client.Rust-016 resolution to lock the contract in.

Resolution: 2026-05-24 — Re-added Error::MalformedReply { detail: String } to clients/rust/src/error.rs (alongside re-adding Error::Unavailable for Client.Rust-010's resolution). Changed register_server_handle / add_item_handle / add_item2_handle in clients/rust/src/session.rs to return Result<i32, Error> and yield MalformedReply when the reply lacks both a typed payload and an int32 return_value. Changed bulk_results and bulk_write_results to return Result<Vec<_>, Error> and yield MalformedReply on a mismatched or absent payload arm. Rewrote read_bulk's inline branch the same way. Added six regression tests in clients/rust/tests/client_behavior.rs — register_returns_malformed_reply_when_ok_reply_has_no_payload, add_item_returns_malformed_reply_when_ok_reply_has_no_payload, add_item2_returns_malformed_reply_when_ok_reply_has_no_payload, subscribe_bulk_returns_malformed_reply_on_mismatched_payload_arm, read_bulk_returns_malformed_reply_on_mismatched_payload_arm, write_bulk_returns_malformed_reply_on_mismatched_payload_arm — driven by a new InvokeOverride enum on FakeState that pins the fake gateway's Invoke handler to OkWithoutPayload, OkWithMismatchedPayload, or Unavailable(...) per test.

Client.Rust-023

Field	Value
Severity	Low
Category	mxaccessgw conventions
Location	`clients/rust/crates/mxgw-cli/src/main.rs:835,872,1476`
Status	Resolved

Description: Three CLI subcommands added since d692232 hard-code their client_correlation_id:

client_correlation_id: "rust-cli-stream-alarms".to_owned(),       // line 835
client_correlation_id: "rust-cli-acknowledge-alarm".to_owned(),   // line 872
client_correlation_id: "rust-cli-bench-read-bulk-close".to_owned(), // line 1476

Every invocation of these subcommands on the same machine — and every iteration of the cross-language e2e matrix that exercises them concurrently — produces requests with identical correlation IDs in gateway logs, defeating the diagnostic value Client.Rust-011/014 unlocked when next_correlation_id was made pub. (As noted in the d692232 review-row header, the prior rust-cli-ping/rust-cli-close-session hard-codes are also still in place at lines 506 and 553 after the rename commit reverted the Client.Rust-014 fix; the three new ones extend the same regression to new CLI surface.)

Recommendation: Replace the three string literals with calls to the existing next_correlation_id helper (already pub and intended for raw-RPC consumers): zb_mom_ww_mxgateway_client::session::next_correlation_id("cli-stream-alarms"), next_correlation_id("cli-acknowledge-alarm"), next_correlation_id("cli-bench-read-bulk-close"). While here, restore the same fix at lines 506 and 553 so the CLI surface as a whole shares the library's correlation-id discipline.

Resolution: 2026-05-24 — Restored pub fn next_correlation_id(label: &str) -> String in clients/rust/src/session.rs (the d692232 rename had reverted Client.Rust-011/014's resolution along with the rest) and re-exported it at the crate root in clients/rust/src/lib.rs (pub use session::{next_correlation_id, Session};) so the in-tree mxgw CLI calls the short zb_mom_ww_mxgateway_client::next_correlation_id(...) path. Replaced all five hard-coded correlation-id literals in clients/rust/crates/mxgw-cli/src/main.rs (rust-cli-ping, rust-cli-close-session, rust-cli-stream-alarms, rust-cli-acknowledge-alarm, rust-cli-bench-read-bulk-close) with next_correlation_id("cli-..."). Session::command_request and Session::close now use the same helper. Added a regression test cli_subcommands_propagate_unique_correlation_ids_from_next_correlation_id in clients/rust/tests/client_behavior.rs that records the correlation id observed at the fake gateway and asserts both that successive next_correlation_id calls produce distinct values and that the label is propagated through stream_alarms / acknowledge_alarm.

Client.Rust-024

Field	Value
Severity	Medium
Category	Testing coverage
Location	`clients/rust/tests/client_behavior.rs:405-415`; `clients/rust/src/session.rs:369-493`; `clients/rust/src/client.rs:265-291`; `clients/rust/crates/mxgw-cli/src/main.rs:1310-1505`
Status	Resolved

Description: The diff under review adds substantial SDK and CLI surface with no positive-path coverage:

GatewayClient::stream_alarms (client.rs:280-291) has no test. The fake gateway's stream_alarms impl in tests/client_behavior.rs:408-415 immediately drops the unique sender and returns an empty ReceiverStream, so an attached client would observe a clean end-of-stream — nothing exercises the snapshot → snapshot_complete → transition sequence that the doc on AlarmFeedStream advertises, mid-stream Status mapping through Error::from, or the cooperative-cancel-on-drop contract.
Five new bulk SDK methods (read_bulk, write_bulk, write2_bulk, write_secured_bulk, write_secured2_bulk on Session) have zero tests. The fake gateway's Invoke dispatcher does not handle any of MxCommandKind::ReadBulk, WriteBulk, Write2Bulk, WriteSecuredBulk, WriteSecured2Bulk, so a happy-path round trip would fail at the gateway-side match. The malformed-reply branches in Client.Rust-022 are therefore also untested.
run_bench_read_bulk has no test. The BenchReadBulkStats helper that Client.Rust-015/016 added at a020350 (with its own dedicated unit test) was dropped in the port and there is no replacement — the percentile / mean math, the success-vs-failure accounting, and the JSON schema that scripts/bench-read-bulk.ps1 collates are all untested.

Recommendation: Extend the fake-gateway Invoke dispatcher in tests/client_behavior.rs with the five new bulk reply arms, add round-trip tests for each (write_bulk / write2_bulk / write_secured_bulk / write_secured2_bulk / read_bulk), and re-add the six malformed-reply tests from the Client.Rust-016 resolution. Replace the trivial stream_alarms stub with one that emits a synthetic ActiveAlarm → SnapshotComplete → Transition sequence and assert the client surfaces them in order. For the bench, factor the percentile / accounting helpers out of run_bench_read_bulk into a small struct (matching the previous BenchReadBulkStats) and add unit tests asserting latencyMs.{p50,p95,p99,max,mean} are computed correctly from a hand-built sample.

Resolution: 2026-05-24 — Extended the fake gateway's Invoke dispatcher to handle Register, AddItem2, ReadBulk, WriteBulk, Write2Bulk, WriteSecuredBulk, and WriteSecured2Bulk, with shared write_bulk_reply_for / bulk_write_result_ok helpers. Added round-trip integration tests in clients/rust/tests/client_behavior.rs for all five bulk SDK methods: read_bulk_round_trips_through_the_fake_gateway, write_bulk_round_trips_through_the_fake_gateway, write2_bulk_round_trips_through_the_fake_gateway, write_secured_bulk_round_trips_through_the_fake_gateway, write_secured2_bulk_round_trips_through_the_fake_gateway. Replaced the trivial stream_alarms stub with a per-test script (FakeState::stream_alarms_script) and added stream_alarms_emits_snapshot_then_complete_then_transition_in_order, which feeds an ActiveAlarm → SnapshotComplete → Transition sequence and asserts the client surfaces all three payload arms in order. Added three CLI-side unit tests against the existing percentile_summary helper in clients/rust/crates/mxgw-cli/src/main.rs: bench_percentile_summary_matches_hand_built_sample (asserts p50/p95/p99/max/mean on the sample [1,2,3,4,5]), bench_percentile_summary_handles_empty_sample, and bench_percentile_summary_handles_single_value_sample. The malformed-reply suite from Client.Rust-022 plus the unary Error::Unavailable test together cover the remaining surface.

Client.Rust-025

Field	Value
Severity	Low
Category	Design-document adherence
Location	`clients/rust/RustClientDesign.md:92-106,142-153,164-171`
Status	Resolved

Description: CLAUDE.md mandates that "When public APIs, contracts, configuration, build steps, security behavior, event shapes, value conversion, status mapping, or lifecycle rules change, the affected docs ... must change in the same commit." The diff under review adds the following public surface, none of which is reflected in RustClientDesign.md:

SDK methods on GatewayClient: stream_alarms and acknowledge_alarm, plus the new AlarmFeedStream type alias re-exported at the crate root in lib.rs:27.
SDK methods on Session: read_bulk, write_bulk, write2_bulk, write_secured_bulk, write_secured2_bulk. The Session impl block in RustClientDesign.md:92-106 still lists only the original six bulk-subscribe helpers, identical to its d692232 state (the Client.Rust-019 resolution that added these to the doc lived on the divergent branch and was lost in the 397d3c5 rename).
CLI subcommands: stream-alarms, acknowledge-alarm, read-bulk, write-bulk, write2-bulk, write-secured-bulk, write-secured2-bulk, bench-read-bulk, and batch. The Commands block in RustClientDesign.md:166-171 still lists only version, smoke, stream-events, write.
Error variants: Error::MalformedReply / Error::Unavailable / Error::InvalidEndpoint are still missing from the Error enum sketch at RustClientDesign.md:142-153 (Client.Rust-017's expansion was also lost in the rename — and Client.Rust-022 will add MalformedReply back when fixed).
Build configuration: the new clients/rust/.cargo/config.toml Windows 8 MB stack workaround is not mentioned anywhere in the design doc — a reader reproducing the build on a fresh dev box would not know to expect the stack-overflow failure mode the file documents.

Recommendation: Bring RustClientDesign.md back in sync with the actual implementation. Restore the Session API block to enumerate the five new bulk read/write methods alongside the existing six bulk-subscribe helpers (cross-reference Client.Rust-019's recommendation for the correct signatures — no trailing positional user_id/timestamp/current_user_id/verifier_user_id, those live on the per-entry structs). Add stream_alarms / acknowledge_alarm / AlarmFeedStream to a new "Alarms" section adjacent to the existing event-stream section. Expand the CLI command list to enumerate every subcommand the binary exposes today. Expand the Error enum sketch to match clients/rust/src/error.rs. Add a short "Windows build notes" subsection documenting the .cargo/config.toml stack workaround and why clap-derive's large Command enum needs it.

Resolution: 2026-05-24 — Brought clients/rust/RustClientDesign.md back in sync with the implementation. Removed the orphan tracing dependency line and added a new "Windows Build Notes" section explaining the clients/rust/.cargo/config.toml /STACK:8388608 MSVC link-arg (why clap-derive's large Command enum needs it, that it writes into IMAGE_OPTIONAL_HEADER.SizeOfStackReserve for both debug and release builds, and that the MSVC-only target selector keeps mingw unaffected). Extended the Session API block to enumerate the five new bulk read/write helpers — read_bulk<S: AsRef<str>>(&[S], u32), write_bulk(Vec<WriteBulkEntry>), write2_bulk(Vec<Write2BulkEntry>), write_secured_bulk(Vec<WriteSecuredBulkEntry>), write_secured2_bulk(Vec<WriteSecured2BulkEntry>) — plus add_item2, un_advise, remove_item, and write2. Added a paragraph noting that per-entry credentials/timestamps live on the entry structs and that read_bulk's borrow-and-AsRef<str> shape is what the cross-language bench-read-bulk hot loop relies on. Added a new "Alarms" section with GatewayClient::stream_alarms / acknowledge_alarm / the AlarmFeedStream type alias and the always-on (no-session) snapshot → complete → transition contract. Expanded the Error enum block to match clients/rust/src/error.rs (every public variant including MalformedReply, Unavailable, InvalidEndpoint, InvalidArgument). Expanded the CLI command list to enumerate every subcommand the binary exposes today, with notes on batch's EOF-only termination and bench-read-bulk's cross-language JSON shape.

Client.Rust-026

Field	Value
Severity	Low
Category	Performance & resource management
Location	`clients/rust/crates/mxgw-cli/src/main.rs:1402-1406,1419-1423`
Status	Resolved

Description: run_bench_read_bulk clones the tags: Vec<String> on every iteration of both the warmup loop and the steady-state measurement loop:

while Instant::now() < warmup_deadline {
    let _ = session
        .read_bulk(server_handle, tags.clone(), timeout_ms_param)
        .await;
}
// ...
while Instant::now() < steady_deadline {
    let call_start = Instant::now();
    let result = session
        .read_bulk(server_handle, tags.clone(), timeout_ms_param)
        .await;
    let elapsed = call_start.elapsed();
    latencies_ms.push(elapsed.as_secs_f64() * 1000.0);
    // ...
}

Each clone allocates one fresh Vec<String> plus bulk_size heap-allocated String clones (default 6, but the flag goes higher; the cross-language matrix routinely runs with bulk_size=64 and bulk_size=256). The allocation happens inside call_start ... call_start.elapsed(), so the per-call latency the bench reports includes the clone cost — both inflating the absolute number and adding allocator-noise variance to the percentile distribution that the cross-language driver compares against the .NET/Go/Java/Python bench numbers (none of which allocate the tag list per call).

Recommendation: Change Session::read_bulk to take tag_addresses: &[String] or generic over AsRef<str> (as Client.Rust-019's recommendation noted is what read_bulk was at a020350) so the bench can pass &tags once and avoid the per-call clone. Alternatively keep the Vec<String> signature but borrow the underlying buffer for the hot loop — e.g. build the bench's payload once outside the steady-state window and re-use it. The current read_bulk(..., Vec<String>, ...) shape forces the clone at the call site.

Resolution: 2026-05-24 — Changed Session::read_bulk to read_bulk<S: AsRef<str>>(&self, server_handle, tag_addresses: &[S], timeout_ms) so the bench loop can borrow the tag list once instead of cloning it per iteration. run_bench_read_bulk now binds let tags_ref: &[String] = &tags; outside the warm-up / steady-state loops and passes tags_ref into both, eliminating the per-call Vec<String> + String clone-allocations that were previously charged into the per-call latency reported by the cross-language latencyMs JSON contract. The CLI read-bulk subcommand was updated to call session.read_bulk(server_handle, &items, timeout_ms). Clippy's clone_on_copy / redundant_clone lints are clean at HEAD so no extra regression test is needed beyond read_bulk_round_trips_through_the_fake_gateway from Client.Rust-024.

Client.Rust-027

Field	Value
Severity	Low
Category	Documentation & comments
Location	`clients/rust/.cargo/config.toml:1-9`
Status	Resolved

Description: The new build-config file added by 71d2c39 carries this leading comment:

[target.'cfg(windows)']
# Bump the default 1 MB Windows stack to 8 MB. clap-derive builds a large
# Command enum in this CLI (one variant per subcommand, each carrying flag
# args); in debug builds the enum is materialized on the stack without
# optimization and overflows the default Windows main-thread stack before
# even reaching our code. Release builds are unaffected but the e2e matrix
# drives the CLI through `cargo run` (debug), so the link-arg ships with
# every dev-time invocation.
rustflags = ["-C", "link-arg=/STACK:8388608"]

Two issues with the documentation vs the configuration:

The comment claims "Release builds are unaffected" but the rustflags setting is unconditional under cfg(windows) — cargo build --release on Windows also produces a binary whose IMAGE_OPTIONAL_HEADER.SizeOfStackReserve is 8 MB. The 8 MB stack reservation is in fact harmless for release builds (the optimizer removes the enum from the stack), but the comment misleads a reader trying to understand whether release artifacts of mxgw ship with a non-default header value (they do).
/STACK: is an MSVC-linker (link.exe / lld-link) directive; cargo build on Windows with +gnu targets (mingw) routes link args through the GNU linker which rejects /STACK: and instead expects -Wl,--stack,8388608. The current file silently breaks x86_64-pc-windows-gnu builds — a configuration docs/ToolchainLinks.md lists no expectation about, but a reader picking up the file does not see the constraint.

Recommendation: Either tighten the [target.…] selector to only the MSVC-linker tier (cfg(all(windows, target_env = "msvc"))) and re-word the comment to make the release-build behaviour explicit ("the stack reservation goes into the PE header for both debug and release builds; release is unaffected at runtime because the optimizer elides the enum from the stack frame"), or add the +gnu variant as a parallel block using -Wl,--stack,8388608. As a complementary fix, see Client.Rust-024's recommendation — boxing the largest clap-derive variants would let the stack workaround be retired entirely.

Resolution: 2026-05-24 — Tightened the target selector in clients/rust/.cargo/config.toml from cfg(windows) to cfg(all(windows, target_env = "msvc")) so x86_64-pc-windows-gnu (mingw) builds, which route link args through the GNU linker and reject /STACK:, are no longer affected. Rewrote the leading comment to make the release-build behaviour explicit: the /STACK: link-arg writes into IMAGE_OPTIONAL_HEADER.SizeOfStackReserve at link time and applies to both debug and release builds (so release artifacts ship with the same 8 MB stack reservation); release builds do not need it at runtime because the optimizer elides the enum from the stack frame, but the setting is kept on so both flavours produce binaries with identical stack metadata. The doc note in RustClientDesign.md now mirrors this.

Client.Rust-028

Field	Value
Severity	Low
Category	mxaccessgw conventions
Location	`clients/rust/crates/mxgw-cli/src/main.rs:1126-1166`
Status	Resolved

Description: run_batch reads commands from stdin with the blocking std::io::Stdin::lock().lines() iterator while the surrounding function is async fn and the runtime is #[tokio::main] (multi-threaded by default). Each for line in stdin.lock().lines() iteration pins one of tokio's worker threads on a blocking syscall (ReadFile on Windows), then spawns the dispatch as a separate tokio::task::spawn(dispatch(cli.command)) — which itself runs on another worker thread — and awaits its JoinHandle. The pattern works for a single-threaded harness driver but has two latent problems:

Empty-line sentinel: if line.is_empty() { break; } exits the loop on the first empty line, terminating the batch session. The comment block above declares this as a feature ("stdin EOF (or an empty line) ends the session") but an accidental blank line from any driver — for example a Write-Output "" between commands in the PowerShell e2e matrix — silently ends the session and the harness has no way to distinguish "end-of-input" from "the CLI broke after the previous command". The other four language CLIs that implement batch use only EOF, not an empty-line sentinel.
Blocking on a tokio worker: any non-trivial latency in dispatch (e.g. a 30-second bench-read-bulk) keeps run_batch parked on its JoinHandle.await, which is fine; but if dispatch ever needs to drive the stdin reader (it doesn't today, but the Batch subcommand is the natural seam for future "stream input to a subcommand"), the blocking Stdin::lock() on the main task would deadlock the runtime worker.

Recommendation: Replace the empty-line break with an EOF-only terminator (if line.trim().is_empty() { println!("{BATCH_EOR}"); stdout.lock().flush().ok(); continue; }) so accidental blank lines log an empty-EOR-bracketed result instead of ending the session. Optionally wrap the stdin reader in tokio::task::spawn_blocking so the runtime can move it onto a dedicated blocking-pool thread; the current shape works for today's harness contract but is brittle if the dispatch future ever needs to share the runtime with stdin reads.

Resolution: 2026-05-24 — Removed the if line.is_empty() { break; } empty-line sentinel in run_batch so the only terminator is now stdin EOF (the implicit end of the for line in stdin.lock().lines() iterator). Blank or whitespace-only lines now fall into the existing parts.is_empty() branch which logs __MXGW_BATCH_EOR__ and continues, matching the other four language CLIs and shielding the PowerShell e2e harness from accidental Write-Output "" calls between commands. Added a comment block on run_batch explaining the tokio-runtime / blocking-stdin trade-off: dispatch is already spawned on a fresh tokio task so the blocking iterator parks at most one worker thread on ReadFile, and that is acceptable because no other future on the main task needs to run while we wait for the next command. The CLI doc comment on the Batch clap variant was updated to mirror the new semantics. No unit test added — run_batch reads from real stdin which can't be driven from a #[tokio::test] without a subprocess harness, and the existing parses_batch_command / batch_eor_marker_is_stable tests already cover the parser and the EOR sentinel.

Client.Rust-029

Field	Value
Severity	High
Category	mxaccessgw conventions
Location	`clients/rust/src/options.rs:98,143`; `clients/rust/src/galaxy.rs:282`; `clients/rust/src/session.rs:664-671`
Status	Resolved

Description: cargo clippy --workspace --all-targets -- -D warnings fails at HEAD 42b0037 with three errors that the prior d692232 reviewer noted as "out of scope for Client.Rust-021" but did not open as a tracked finding:

error: missing documentation for a method
  --> src\options.rs:98:5
   |
98 |     pub fn with_max_grpc_message_bytes(mut self, max_grpc_message_bytes: usize) -> Self {

error: missing documentation for a method
   --> src\options.rs:143:5
    |
143 |     pub fn max_grpc_message_bytes(&self) -> usize {

error: using `clone` on type `Option<Timestamp>` which implements the `Copy` trait
   --> src\galaxy.rs:282:24

error: all variants have the same postfix: `Bulk`
   --> src\session.rs:664:1
    |
664 | / enum BulkReplyKind {
665 | |     AddItemBulk,
666 | |     AdviseItemBulk,
667 | |     RemoveItemBulk,

All three were resolved at a020350 (Client.Rust-001, Client.Rust-002, Client.Rust-012 respectively) but regressed when 397d3c5 re-platformed parts of the prior fix and were never re-fixed. CLAUDE.md explicitly mandates that cargo clippy --workspace --all-targets -- -D warnings pass cleanly; the documented build/test workflow for the module breaks. The prior reviewer correctly noted the regressions but routed them as "out of scope for Client.Rust-021", which left them invisible to the open-findings count. They are tracked here so they appear in the pending-findings index and are not lost again.

The third error (BulkReplyKind enum-variant-names) is also touched by the diff under review: commit 3251069 added a sibling enum BulkWriteReplyKind at session.rs:699-704 whose variants (Write, Write2, WriteSecured, WriteSecured2) do not share a suffix and so do not trip the lint — but the pattern is now duplicated. A fix should rename both enums consistently or apply the same scoped #[allow(clippy::enum_variant_names)] reason-annotated allow to both.

Recommendation: Re-apply Client.Rust-001 (add doc comments on with_max_grpc_message_bytes / max_grpc_message_bytes in options.rs), Client.Rust-002 (drop the Bulk suffix from BulkReplyKind's variants so they become AddItem / AdviseItem / …, or add a narrowly-scoped #[allow(clippy::enum_variant_names)] with a reason comment), and Client.Rust-012 (replace last_deploy.lock().unwrap().clone() with *last_deploy.lock().unwrap() in galaxy.rs:282). Verify with cargo clippy --workspace --all-targets -- -D warnings. Consider adding a pre-commit / CI gate so the next reviewer never has to discover the regression by running clippy.

Resolution: 2026-05-24 — Re-applied all three resolutions. clients/rust/src/options.rs now has /// doc comments on with_max_grpc_message_bytes and max_grpc_message_bytes. clients/rust/src/galaxy.rs:282 uses *self.state.last_deploy.lock().unwrap() instead of .clone(). clients/rust/src/session.rs's BulkReplyKind variants are renamed to AddItem / AdviseItem / RemoveItem / UnAdviseItem / Subscribe / Unsubscribe (no shared Bulk suffix), with the call sites in add_item_bulk / advise_item_bulk / remove_item_bulk / un_advise_item_bulk / subscribe_bulk / unsubscribe_bulk updated accordingly. The sibling BulkWriteReplyKind already had non-suffix-sharing variants (Write / Write2 / WriteSecured / WriteSecured2) and required no rename. cargo clippy --workspace --all-targets -- -D warnings is clean at HEAD.

76 KiB Raw Blame History