mxaccessgw/code-reviews/Client.Rust/findings.md

# Code Review — Client.Rust

| Field | Value |
|---|---|
| Module | `clients/rust` |
| Reviewer | Claude Code |
| Review date | 2026-05-20 |
| Commit reviewed | `a020350` |
| Status | Reviewed |
| Open findings | 0 |

## Checklist coverage

This re-review (`a020350`) covers the resolution work for Client.Rust-013 through 017 (scoped `doc_lazy_continuation` allow on generated submodules, `pub` `next_correlation_id` shared with the CLI, success/failure split in `bench-read-bulk`, eight new tests, design-doc resync). The pass spot-checked the items called out in the request: stability of the newly-`pub` `next_correlation_id`, the `bench-read-bulk` JSON shape vs the PowerShell driver, presence of `unsafe`, and the scope of `#![allow(clippy::doc_lazy_continuation)]`. `cargo clippy --workspace --all-targets -- -D warnings` and `cargo test --workspace` both pass on this commit.

| # | Category | Result |
|---|---|---|
| 1 | Correctness & logic bugs | No issues found — the five new `MalformedReply` paths and the `read_bulk` mismatched-payload branch each have a dedicated test; `BenchReadBulkStats` correctly partitions success vs failure latency. |
| 2 | mxaccessgw conventions | No issues found — `cargo clippy --workspace --all-targets -- -D warnings` and `cargo test --workspace` both pass on this commit; the `#![allow(clippy::doc_lazy_continuation)]` allow is scoped narrowly to each generated v1 inner module so hand-written code is unaffected; CLI `Ping`/`CloseSession` now call `session::next_correlation_id`. |
| 3 | Concurrency & thread safety | No issues found — `CORRELATION_SEQUENCE` is `AtomicU64` with `Relaxed`, correct for monotonic id generation; no `unsafe` anywhere in `src/` or `crates/`. |
| 4 | Error handling & resilience | Issue found: the `bench-read-bulk` fix for Client.Rust-015 has fixed Rust's own histogram honestly but the change makes Rust's `latencyMs` semantically incompatible with the four other clients' `latencyMs` field that the cross-language PowerShell driver collates side-by-side (Client.Rust-018). |
| 5 | Security | No issues found — API keys still redacted in `Debug`/`Display`, status messages scrubbed, `first_failure` records `Error::Display` (which already redacts `mxgw_*` tokens) so secure-write values cannot leak into the bench JSON. |
| 6 | Performance & resource management | No issues found in the reviewed delta. |
| 7 | Design-document adherence | Issue found: `RustClientDesign.md` Session signatures for the four bulk-write helpers and `read_bulk` do not match the actual implementation — the design lists trailing `user_id` / `timestamp` / `current_user_id` / `verifier_user_id` parameters and a `Vec<ReadBulkResult>` return that the impl does not have (all of those move per-entry into `WriteBulkEntry` etc.) (Client.Rust-019). |
| 8 | Code organization & conventions | No new issues — `BenchReadBulkStats` is cleanly factored out and tested. |
| 9 | Testing coverage | No new issues — the malformed-reply paths and unary `Error::Unavailable` are now covered, and the four bulk-write families each have round-trip smoke. |
| 10 | Documentation & comments | Issue found: `next_correlation_id` is now `pub` and its doc comment commits the SDK to the literal `"rust-client-{label}-{N}"` string format, but neither the doc nor `lib.rs` re-exports it or declares any stability stance, leaving the public surface ambiguous (Client.Rust-020). |

## Findings

### Client.Rust-001

| Field | Value |
|---|---|
| Severity | High |
| Category | mxaccessgw conventions |
| Location | `clients/rust/src/options.rs:98,143` |
| Status | Resolved |

**Description:** `with_max_grpc_message_bytes` and `max_grpc_message_bytes` have no `///` doc comments. The crate sets `#![warn(missing_docs)]` and CLAUDE.md mandates that `cargo clippy --workspace --all-targets -- -D warnings` pass. Under `-D warnings` these become hard errors, so clippy fails to compile the crate — breaking the documented build/test workflow for the module.

**Recommendation:** Add doc comments to both methods, e.g. `/// Maximum encoded/decoded gRPC message size in bytes (default 16 MiB).`

**Resolution:** Resolved in `0d8a28d` (2026-05-18): doc comments added to both methods.

### Client.Rust-002

| Field | Value |
|---|---|
| Severity | High |
| Category | mxaccessgw conventions |
| Location | `clients/rust/src/session.rs:522` |
| Status | Resolved |

**Description:** The `BulkReplyKind` enum's variants (`AddItemBulk`, `AdviseItemBulk`, `RemoveItemBulk`, `UnAdviseItemBulk`, `SubscribeBulk`, `UnsubscribeBulk`) all share the `Bulk` suffix, tripping `clippy::enum_variant_names`. Under `-D warnings` this is a compile error, so `cargo clippy --workspace --all-targets -- -D warnings` fails — a violation of the CLAUDE.md requirement that clippy pass cleanly.

**Recommendation:** Rename the variants to drop the common suffix (e.g. `AddItem`, `AdviseItem`, …) or add a narrowly-scoped `#[allow(clippy::enum_variant_names)]` with a reason comment.

**Resolution:** Resolved in `0d8a28d` (2026-05-18): variants renamed to `AddItem`/`AdviseItem`/`RemoveItem`/`UnAdviseItem`/`Subscribe`/`Unsubscribe`, which no longer share a common suffix.

### Client.Rust-003

| Field | Value |
|---|---|
| Severity | High |
| Category | Correctness & logic bugs |
| Location | `clients/rust/crates/mxgw-cli/src/main.rs:1051` |
| Status | Resolved |

**Description:** The unit test `version_json_output_has_protocol_versions` asserts `value["gatewayProtocolVersion"] == 2`, but `GATEWAY_PROTOCOL_VERSION` is `3` (version.rs:10), matching the authoritative server constant `GatewayContractInfo.GatewayProtocolVersion = 3`. The test fails, so `cargo test --workspace` (the documented test step) does not pass — the test was not updated when the protocol version was bumped.

**Recommendation:** Update the assertion to `3`, or better, assert against `GATEWAY_PROTOCOL_VERSION` so it cannot drift again.

**Resolution:** Resolved in `0d8a28d` (2026-05-18): the test now asserts against the `GATEWAY_PROTOCOL_VERSION` / `WORKER_PROTOCOL_VERSION` constants, so it cannot drift again.

### Client.Rust-004

| Field | Value |
|---|---|
| Severity | Low |
| Category | Documentation & comments |
| Location | `clients/rust/src/version.rs:7` |
| Status | Resolved |

**Description:** `CLIENT_VERSION` is `"0.1.0-dev"` and its doc comment claims "Mirrors `Cargo.toml`", but `Cargo.toml` declares `version = "0.1.0"` (no `-dev` suffix). The comment is misleading and the value is not actually kept in sync with the manifest.

**Recommendation:** Either set `CLIENT_VERSION` from the build via `env!("CARGO_PKG_VERSION")`, or correct the constant to `"0.1.0"` and drop the "Mirrors Cargo.toml" claim.

**Resolution:** Resolved in `0d8a28d` (2026-05-18): `CLIENT_VERSION` is now `env!("CARGO_PKG_VERSION")`, taken from `Cargo.toml` at compile time so the two cannot drift.

### Client.Rust-005

| Field | Value |
|---|---|
| Severity | Medium |
| Category | Correctness & logic bugs |
| Location | `clients/rust/src/session.rs:489-520` |
| Status | Resolved |

**Description:** `register_server_handle`, `add_item_handle`, and `add_item2_handle` fall through to `reply.return_value … .unwrap_or_default()`, returning `0` when the reply carries neither the expected typed payload nor an `Int32` `return_value`. Because `Session::invoke` has already confirmed `protocol_status == Ok`, a malformed-but-OK reply silently yields handle `0`, which the caller then uses as a real handle against the worker.

**Recommendation:** Return `Err(Error::ProtocolStatus { … })` (or a dedicated `Error::MalformedReply`) when an OK reply lacks an extractable handle, instead of defaulting to `0`.

**Resolution:** Resolved in `0d8a28d` (2026-05-18): the three handle extractors now return `Result<i32, Error>` and yield the new `Error::MalformedReply` when an OK reply carries no usable handle.

### Client.Rust-006

| Field | Value |
|---|---|
| Severity | Medium |
| Category | Error handling & resilience |
| Location | `clients/rust/src/session.rs:531-555` |
| Status | Resolved |

**Description:** `bulk_results` returns `Vec::new()` for any `(payload, kind)` combination that does not match the expected arm — including an OK reply carrying the wrong or no payload. A caller of `subscribe_bulk`/`add_item_bulk` then sees an empty result vector and cannot distinguish "zero items processed" from "gateway returned a shapeless reply".

**Recommendation:** Treat a missing/mismatched bulk payload on an OK reply as an error rather than an empty vector, or document the empty-vec fallback explicitly and log it.

**Resolution:** Resolved in `0d8a28d` (2026-05-18): `bulk_results` now returns `Result<Vec<SubscribeResult>, Error>` and yields `Error::MalformedReply` on a mismatched or absent bulk payload.

### Client.Rust-007

| Field | Value |
|---|---|
| Severity | Low |
| Category | Design-document adherence |
| Location | `clients/rust/RustClientDesign.md:14-55` |
| Status | Resolved |

**Description:** `RustClientDesign.md` is stale relative to the implemented code. It documents a nested `crates/mxgateway-client/` layout (the real crate root is `clients/rust/` with a flat `src/`), and lists `tracing` among "Expected dependencies", but `tracing` appears in no `Cargo.toml`. CLAUDE.md requires docs to change with the source.

**Recommendation:** Update `RustClientDesign.md` to the actual flat layout and remove `tracing` from the dependency list (or add `tracing` if structured logging is genuinely intended).

**Resolution:** Resolved in `0d8a28d` (2026-05-18): the "Crate Layout" section now shows the actual flat layout (`mxgateway-client` as the workspace-root crate, `mxgw-cli` as a member) and the unused `tracing` entry was removed from the dependency list.

### Client.Rust-008

| Field | Value |
|---|---|
| Severity | Low |
| Category | Performance & resource management |
| Location | `clients/rust/src/value.rs:161-261` |
| Status | Resolved |

**Description:** `MxValueProjection::from_proto` and `MxArrayProjection::from_proto` deep-clone every element out of the wire message while `MxValue`/`MxArrayValue` also retain the original `raw` message. Every `MxValue` therefore holds two copies of its payload, wasteful for large string arrays or raw blobs arriving on the event stream.

**Recommendation:** Compute the projection lazily on demand, or have the projection borrow from `raw`, so array/raw payloads are not duplicated for every wrapped value.

**Resolution:** Resolved in `0d8a28d` (2026-05-18): `MxValue` and `MxArrayValue` no longer cache a `projection` field — `projection()` computes the typed view on demand from `raw`. A value built only to be sent over the wire now holds a single copy of its payload and pays no projection cost.

### Client.Rust-009

| Field | Value |
|---|---|
| Severity | Low |
| Category | Testing coverage |
| Location | `clients/rust/tests/client_behavior.rs`, `clients/rust/src/galaxy.rs` |
| Status | Resolved |

**Description:** Several critical paths are untested: TLS channel setup (`with_plaintext(false)` / CA-file loading), mid-stream `tonic::Status` fault propagation through `EventStream`/`DeployEventStream` (tests only send `Ok` items), and the bulk-size cap (`ensure_bulk_size` rejecting >1000 items).

**Recommendation:** Add tests that (a) feed an `Err(Status)` into the event/deploy streams and assert it surfaces as the mapped `Error`, (b) assert `add_item_bulk` with 1001 items returns `Error::InvalidArgument`, and (c) exercise the CA-file/`InvalidEndpoint` error path.

**Resolution:** Resolved in `0d8a28d` (2026-05-18): added `add_item_bulk_rejects_input_above_the_thousand_item_cap`, `event_stream_surfaces_a_mid_stream_status_fault` (the fake gateway now optionally emits a mid-stream `Status::unavailable`), and `connect_with_unreadable_ca_file_reports_invalid_endpoint`.

### Client.Rust-010

| Field | Value |
|---|---|
| Severity | Low |
| Category | Error handling & resilience |
| Location | `clients/rust/src/client.rs:255-268`, `clients/rust/src/galaxy.rs:204-216` |
| Status | Resolved |

**Description:** The client applies only a per-call deadline via `Request::set_timeout` and has no retry, reconnect, or transient-vs-permanent classification. A transient `Unavailable` (e.g. a gateway restart) maps to the catch-all `Error::Status` and is indistinguishable from a permanent failure. This is an acceptable v1 stance but is undocumented.

**Recommendation:** Either add a documented `Error::Unavailable` variant classifying `Code::Unavailable`/`Code::ResourceExhausted`, or explicitly document in the README that the client performs no retries and that transient failures arrive as `Error::Status`.

**Resolution:** Resolved in `0d8a28d` (2026-05-18): added the `Error::Unavailable` variant; `From<tonic::Status>` maps `Code::Unavailable` and `Code::ResourceExhausted` to it, so callers can classify transient failures without unwrapping the raw status.

### Client.Rust-011

| Field | Value |
|---|---|
| Severity | Low |
| Category | mxaccessgw conventions |
| Location | `clients/rust/src/session.rs:469` |
| Status | Resolved |

**Description:** `command_request` hard-codes `client_correlation_id` as `format!("rust-client-{}", kind.as_str_name())`. Every invocation of the same command kind on a session uses an identical correlation id, so the id cannot correlate a specific request/reply pair in gateway logs or among concurrent in-flight calls. MXAccess parity diagnostics rely on correlation ids being unique per call.

**Recommendation:** Append a per-call unique suffix (monotonic counter or UUID) to the correlation id, or expose a way for the caller to supply one.

**Resolution:** Resolved in `0d8a28d` (2026-05-18): correlation ids are built by `next_correlation_id`, which appends a process-wide atomic sequence number; `Session::close` uses it too.

### Client.Rust-012

| Field | Value |
|---|---|
| Severity | High |
| Category | mxaccessgw conventions |
| Location | `clients/rust/src/galaxy.rs:282` |
| Status | Resolved |

**Description:** Found while verifying the fix for Client.Rust-001/002: `cargo clippy --workspace --all-targets -- -D warnings` reported a third violation the original review missed. The `get_last_deploy_time` test fake calls `.clone()` on a `MutexGuard<Option<prost_types::Timestamp>>`, and `Option<Timestamp>` is `Copy` (`clippy::clone_on_copy`). Under `-D warnings` this is a compile error, so clippy still did not pass after Client.Rust-001/002 alone.

**Recommendation:** Dereference instead of cloning: `*self.state.last_deploy.lock().unwrap()`.

**Resolution:** Resolved in `0d8a28d` (2026-05-18): replaced `.clone()` with a deref. `cargo clippy --workspace --all-targets -- -D warnings` now passes cleanly.

### Client.Rust-013

| Field | Value |
|---|---|
| Severity | High |
| Category | mxaccessgw conventions |
| Location | `src/MxGateway.Contracts/Protos/mxaccess_gateway.proto:414-424` (origin); `clients/rust/src/generated.rs:11-31` (suppression site) |
| Status | Resolved |

**Description:** `cargo clippy --workspace --all-targets -- -D warnings` fails again on this commit, this time on a `clippy::doc_lazy_continuation` violation in generated code:

```
error: doc list item without indentation
   --> .../mxaccess_gateway.v1.rs:526:5
    |
526 | /// `timeout_ms == 0` uses the gateway-configured default (1000 ms).
    |     ^
```

The lint fires because the `ReadBulkCommand` proto comment (added with the bulk Read feature in commit `5e375f6`) writes a bulleted list and then a trailing paragraph without the required blank line. prost-build forwards the proto comment verbatim into Rust doc comments, and the Rust client compiles those generated modules with crate-default lints. The crate already opts out of `clippy::large_enum_variant` in `src/generated.rs` for exactly this kind of generator-style problem, but `doc_lazy_continuation` is not on the allow-list, so the lint reaches `-D warnings` and breaks the documented `cargo clippy --workspace --all-targets -- -D warnings` invocation that CLAUDE.md mandates pass. The Rust client review was previously closed as clippy-clean (Client.Rust-001/002/012); this is the third clippy-clean regression caused by generated code in this module and warrants a more durable fix.

**Recommendation:** Add `#![allow(clippy::doc_lazy_continuation)]` to each generated submodule in `clients/rust/src/generated.rs` alongside `clippy::large_enum_variant`, so generated doc comments — which the client cannot edit — cannot break the `-D warnings` build. Independently, fix the upstream proto comment to insert a blank line before the trailing paragraph so the C# / Go / Python / Java generators do not carry the same flaky text.

**Resolution:** 2026-05-20 — Added `#![allow(clippy::doc_lazy_continuation)]` to each generated submodule in `clients/rust/src/generated.rs` next to the existing `clippy::large_enum_variant` allow, and reformatted the `ReadBulkCommand` proto comment in `src/MxGateway.Contracts/Protos/mxaccess_gateway.proto` to surround the bulleted list with blank lines so doc-comment generators in every language see a properly-terminated list. `cargo clippy --workspace --all-targets -- -D warnings` and `cargo test --workspace` now pass, and `dotnet build src/MxGateway.Contracts/MxGateway.Contracts.csproj` reports 0 warnings.

### Client.Rust-014

| Field | Value |
|---|---|
| Severity | Low |
| Category | mxaccessgw conventions |
| Location | `clients/rust/crates/mxgw-cli/src/main.rs:450,497` |
| Status | Resolved |

**Description:** Client.Rust-011 made `Session` build unique correlation ids per call, but the `mxgw` CLI's `Ping` and `CloseSession` subcommands still hard-code `client_correlation_id: "rust-cli-ping".to_owned()` and `"rust-cli-close-session".to_owned()`. Both go through `client.invoke(…)` / `client.close_session_raw(…)` rather than the `Session` helpers, so the library's id generator does not run. The CLI is the cross-language e2e driver — when the same machine runs concurrent CLI smokes, every `ping`/`close-session` request collides on the same correlation id in gateway logs, defeating the diagnostic value the library fix unlocked.

**Recommendation:** Either (a) expose `session::next_correlation_id` as a `pub(crate)` or library-level helper and have the CLI call it from `Ping`/`CloseSession`, or (b) replace these RPCs with the higher-level `Session` helpers (`Session::close`, and a thin `Session::ping` wrapper) so the CLI shares the library's correlation-id discipline by construction.

**Resolution:** 2026-05-20 — Promoted `session::next_correlation_id` from a module-private helper to a `pub` library-level function (it already lived in the `pub mod session`) and updated the `mxgw` CLI's `Ping` and `CloseSession` subcommands to call `mxgateway_client::session::next_correlation_id("cli-ping")` / `next_correlation_id("cli-close-session")` instead of the hard-coded `"rust-cli-ping"` / `"rust-cli-close-session"` strings. Concurrent CLI smokes now produce unique correlation ids per call — driven by the same process-wide `CORRELATION_SEQUENCE` `AtomicU64` the library uses — so gateway logs can tell collisions apart again. `cargo fmt`, `cargo build --workspace`, `cargo clippy --workspace --all-targets -- -D warnings`, and `cargo test --workspace` all pass.

### Client.Rust-015

| Field | Value |
|---|---|
| Severity | Medium |
| Category | Error handling & resilience |
| Location | `clients/rust/crates/mxgw-cli/src/main.rs:1053-1070` |
| Status | Resolved |

**Description:** The new cross-language benchmark `bench-read-bulk` pushes the elapsed time of every `read_bulk` call into `latencies_ms` regardless of whether the call returned `Ok` or `Err`:

```rust
let outcome = session.read_bulk(server_handle, &tags, timeout_ms).await;
let elapsed_ms = call_start.elapsed().as_secs_f64() * 1000.0;
latencies_ms.push(elapsed_ms);
match outcome {
    Ok(results) => { successful_calls += 1; … }
    Err(_) => failed_calls += 1,
}
```

A failed `read_bulk` (transient `Unavailable`, deadline-exceeded mid-call, etc.) typically returns *later* than a successful one — it includes the full per-call timeout that the success path never waits for. The histogram therefore conflates "p99 cached-read latency" with "p99 of (cached-read + timed-out call)", and the JSON document the PowerShell driver collates publishes `latencyMs.p99` / `latencyMs.max` that no longer represent successful-call latency. Worse, the failure category is silently dropped (`Err(_) => failed_calls += 1`) so a benchmark run that fails on every call still emits a coherent-looking JSON without ever surfacing why. This is misleading for a benchmark whose JSON shape is the cross-language comparison contract.

**Recommendation:** Only push elapsed time into `latencies_ms` on `Ok`, or split into two histograms (`successLatencyMs` and `failureLatencyMs`) and log the first failure's error string into the stats record so a partial-failure run is visible at the report layer.

**Resolution:** 2026-05-20 — Extracted the per-iteration accounting in `bench-read-bulk` into a `BenchReadBulkStats` helper with explicit `record_success`/`record_failure` methods. Successful `read_bulk` calls now flow into `success_latencies_ms` (driving the cross-language `latencyMs.p99`/`max` JSON contract), failures flow into a separate `failure_latencies_ms` histogram surfaced as `failureLatencyMs`, and the first failure's redacted error string is stashed as `firstFailure` so a partial-failure run is visible at the report layer instead of producing a coherent-looking JSON that hides every error. Added a unit test (`bench_read_bulk_stats_keeps_failures_out_of_success_latency_histogram`) that records two fast successes plus a deliberately slow failure and asserts the success histogram never sees the failure latency, plus a smaller smoke test for the zero-duration calls-per-second path.

### Client.Rust-016

| Field | Value |
|---|---|
| Severity | Medium |
| Category | Testing coverage |
| Location | `clients/rust/tests/client_behavior.rs`, `clients/rust/src/session.rs:489-519,654-768` |
| Status | Resolved |

**Description:** The fixes for Client.Rust-005 / 006 added five new `Error::MalformedReply` paths to `session.rs` (`register_server_handle`, `add_item_handle`, `add_item2_handle`, `bulk_results`, `bulk_write_results`) plus the inline branch in `read_bulk`. None of them are exercised by tests — every test in `client_behavior.rs` feeds the matching payload back to the client, so the malformed-reply branches are dead code from the test suite's perspective. The new bulk-write helpers (`write_bulk`, `write2_bulk`, `write_secured_bulk`, `write_secured2_bulk`) only have a single happy-path assertion via `write_bulk`, leaving the three other variants and every per-entry-failure shape untested. The bench-read-bulk flow has no test (the driver script is the only consumer). The `Error::Unavailable` variant from Client.Rust-010 is covered by `event_stream_surfaces_a_mid_stream_status_fault`, but the same variant on a unary `Code::Unavailable` is not.

**Recommendation:** Add three light tests against the existing `FakeGateway`:

1. Have the fake reply to `AddItem` (or `Register` / `AddItem2`) with `protocol_status = Ok` and no payload, and assert the client surfaces `Error::MalformedReply`.
2. Have the fake reply to `WriteBulk` with `protocol_status = Ok` and the wrong payload arm (e.g. an `AddItemReply` body), and assert `Error::MalformedReply`.
3. Have the fake fail the unary `Invoke` with `Status::unavailable(...)` and assert `Error::Unavailable`.

Optionally add Write2Bulk / WriteSecuredBulk / WriteSecured2Bulk smoke assertions so all four bulk-write families have at least one round-trip test.

**Resolution:** 2026-05-20 — Added eight new integration tests in `clients/rust/tests/client_behavior.rs`. Each new `Error::MalformedReply` site is exercised via a test-only `InvokeOverride` injected into `FakeState` that lets a single test pin the fake gateway's `Invoke` handler to one of three malformed shapes (OK reply with no payload, OK reply with the wrong payload arm for `read_bulk`, OK reply with the wrong payload arm for the other bulk / bulk-write families): `register_returns_malformed_reply_when_ok_reply_has_no_payload`, `add_item_returns_malformed_reply_when_ok_reply_has_no_payload`, `add_item2_returns_malformed_reply_when_ok_reply_has_no_payload`, `subscribe_bulk_returns_malformed_reply_on_mismatched_payload_arm`, `write_bulk_returns_malformed_reply_on_mismatched_payload_arm`, and `read_bulk_returns_malformed_reply_on_mismatched_payload_arm`. The unary `Error::Unavailable` path is covered by `unary_invoke_maps_status_unavailable_to_error_unavailable` (the override returns `Status::unavailable(...)`). The remaining three bulk-write families gained round-trip smoke tests — `write2_bulk_round_trips_through_the_fake_gateway`, `write_secured_bulk_round_trips_through_the_fake_gateway`, `write_secured2_bulk_round_trips_through_the_fake_gateway` — extending the fake gateway's dispatcher with happy-path replies for `Write2Bulk` / `WriteSecuredBulk` / `WriteSecured2Bulk`. The `bench-read-bulk` flow gets a `BenchReadBulkStats` unit test in `crates/mxgw-cli/src/main.rs` (see Client.Rust-015) that asserts the latency-tracking change keeps failed-call durations out of `latencyMs`.

### Client.Rust-017

| Field | Value |
|---|---|
| Severity | Low |
| Category | Design-document adherence |
| Location | `clients/rust/RustClientDesign.md:79-99,156-163` |
| Status | Resolved |

**Description:** CLAUDE.md requires docs to change with the source. `RustClientDesign.md` was refreshed to fix the layout/`tracing` drift (Client.Rust-007), but the Session API surface in the design (`Library API` block, lines 79-99) still lists only the original six bulk helpers — `add_item_bulk`, `advise_item_bulk`, `remove_item_bulk`, `un_advise_item_bulk`, `subscribe_bulk`, `unsubscribe_bulk` — and is missing the five new bulk-write helpers and `read_bulk` (`write_bulk`, `write2_bulk`, `write_secured_bulk`, `write_secured2_bulk`, `read_bulk`) that landed in commits `5e375f6` / `f220908` / `61644e6`. The `Error Handling` block (lines 130-146) still enumerates `Transport`, `Status`, `Authentication`, `Authorization`, `Session`, `Worker`, `Command`, `MxAccess`, `Timeout`, `Cancelled` — but not `MalformedReply`, `Unavailable`, or `InvalidEndpoint`, all of which are now public variants of the crate's `Error` enum. The `Test CLI` block (lines 158-163) lists `version` / `smoke` / `stream-events` / `write` but is missing every new subcommand (`read-bulk`, `write-bulk`, `write2-bulk`, `write-secured-bulk`, `write-secured2-bulk`, `bench-read-bulk`, `galaxy watch`).

**Recommendation:** Bring the design doc back in sync: extend the `Session` API code block to enumerate the bulk-write/read methods, expand the `Error` enum to match `clients/rust/src/error.rs`, and add the missing CLI subcommands. The README is already up to date, so this is design-doc-only churn.

**Resolution:** 2026-05-20 — Brought `clients/rust/RustClientDesign.md` back in sync with the implementation. The `Session` block now lists the five new bulk helpers (`write_bulk`, `write2_bulk`, `write_secured_bulk`, `write_secured2_bulk`, `read_bulk`) alongside the original six and notes that `session::next_correlation_id` is `pub` for raw-RPC consumers (the CLI). The `Error` enum block now matches `clients/rust/src/error.rs` — `InvalidEndpoint`, `InvalidArgument`, `Transport`, `Authentication`, `Authorization`, `Timeout`, `Cancelled`, `Unavailable`, `Status`, `Command`, `ProtocolStatus`, `MalformedReply` — with a short paragraph explaining what `Unavailable`, `MalformedReply`, and `InvalidEndpoint` classify. The `Test CLI` block enumerates every subcommand the binary exposes today: `version`, `ping`, `open-session`, `close-session`, `register`, `add-item`, `advise`, `subscribe-bulk`, `unsubscribe-bulk`, `read-bulk`, `write`, `write2`, `write-bulk`, `write2-bulk`, `write-secured-bulk`, `write-secured2-bulk`, `stream-events`, `bench-read-bulk`, `smoke`, and the `galaxy {test-connection,last-deploy-time,discover-hierarchy,watch}` subtree.

### Client.Rust-018

| Field | Value |
|---|---|
| Severity | Medium |
| Category | Error handling & resilience |
| Location | `clients/rust/crates/mxgw-cli/src/main.rs:1098-1170`; `scripts/bench-read-bulk.ps1:347-365`; siblings: `clients/go/cmd/mxgw-go/main.go:600-648`, `clients/python/src/mxgateway_cli/commands.py:614-662`, `clients/dotnet/MxGateway.Client.Cli/MxGatewayClientCli.cs:685-770`, `clients/java/mxgateway-cli/src/main/java/com/dohertylan/mxgateway/cli/MxGatewayCli.java:855-940` |
| Status | Resolved |

**Description:** Client.Rust-015's resolution split Rust's bench histogram so `latencyMs` records only successful `read_bulk` calls and a new `failureLatencyMs` field holds failed-call durations. The local logic is right, the unit test (`bench_read_bulk_stats_keeps_failures_out_of_success_latency_histogram`) is right, and the JSON shape stays additively compatible with `scripts/bench-read-bulk.ps1` (the collator reads `$s.latencyMs.p50`/`p95`/`p99`/`max`/`mean` and these keys still exist on the Rust output). The problem is cross-language: the .NET, Go, Python, and Java bench implementations still push every call's elapsed time into a single `latenciesMs` / `latencies_ms` / `latencyMillis` array regardless of success or failure (e.g. `clients/go/cmd/mxgw-go/main.go:611` appends before the success/failure branch; `clients/python/src/mxgateway_cli/commands.py:624,626` appends in both `except` and the happy path; `clients/dotnet/MxGateway.Client.Cli/MxGatewayClientCli.cs:701,705` adds in both `catch` and the OK path; `clients/java/mxgateway-cli/src/main/java/com/dohertylan/mxgateway/cli/MxGatewayCli.java:865,880` records in both branches). The PS driver's side-by-side comparison table (lines 348-360) pulls `latencyMs.p50/p95/p99/max/mean` from every client and prints them in one row, so a partial-failure run now shows Rust's p99 measured over successes only and the other four clients' p99 measured over (success + per-call timeout) — the numbers are silently no longer comparable. This re-introduces the original Client.Rust-015 problem at the cross-language layer that the fix was meant to remove.

**Recommendation:** Make the contract uniform. Either (a) revert Rust's `latencyMs` to the all-calls histogram for backwards/cross-language compatibility and keep `failureLatencyMs` as an additive Rust-only enrichment, or (b) push the same success-only / failure-separated split into the .NET, Go, Python, and Java bench commands so every language emits the honest pair (`latencyMs` = success, `failureLatencyMs` = failure, plus `firstFailure`) and update the PS driver's table column to make the success-only semantics explicit (`p99 ok ms`). Option (b) is the better long-term posture but it is a cross-client change; option (a) restores comparability immediately.

**Resolution:** 2026-05-20 — Took option (a) to restore cross-language comparability immediately. Reverted Rust's `latencyMs` to the all-calls histogram so it matches the .NET/Go/Python/Java bench shape that `scripts/bench-read-bulk.ps1` collates side-by-side: `BenchReadBulkStats::record_success` and `record_failure` now both push elapsed time into a single `latencies_ms` vector, and `record_failure` additionally pushes into `failure_latencies_ms` and stashes the first failure's redacted error string in `first_failure`. The JSON output keeps `failureLatencyMs` and `firstFailure` as Rust-only additive enrichment so a partial-failure run is still visible at the report layer without breaking the side-by-side table. Renamed the unit test to `bench_read_bulk_stats_tracks_all_calls_in_latency_and_failures_separately`; it now asserts `latencyMs.max == 1500.0` (the slow failure is included in the cross-language `latencyMs` contract) while `failureLatencyMs.max == 1500.0` and `firstFailure` still surface the failure separately for diagnostics. Pushing the success-only / failure-separated split into the other four clients (option (b)) is the better long-term posture but is deliberately out of scope here.

### Client.Rust-019

| Field | Value |
|---|---|
| Severity | Low |
| Category | Design-document adherence |
| Location | `clients/rust/RustClientDesign.md:96-100` |
| Status | Resolved |

**Description:** Client.Rust-017 was closed by adding the new bulk-write/read entries to the design doc, but the signatures shown in the code block do not match the implementation. The doc declares:

```rust
pub async fn write_bulk(&self, server_handle: i32, entries: Vec<WriteBulkEntry>, user_id: i32) -> Result<Vec<BulkWriteResult>, Error>;
pub async fn write2_bulk(&self, server_handle: i32, entries: Vec<Write2BulkEntry>, timestamp: prost_types::Timestamp, user_id: i32) -> Result<Vec<BulkWriteResult>, Error>;
pub async fn write_secured_bulk(&self, server_handle: i32, entries: Vec<WriteSecuredBulkEntry>, current_user_id: i32, verifier_user_id: i32) -> Result<Vec<BulkWriteResult>, Error>;
pub async fn write_secured2_bulk(&self, server_handle: i32, entries: Vec<WriteSecured2BulkEntry>, timestamp: prost_types::Timestamp, current_user_id: i32, verifier_user_id: i32) -> Result<Vec<BulkWriteResult>, Error>;
pub async fn read_bulk(&self, server_handle: i32, tags: &[String], timeout_ms: u32) -> Result<Vec<ReadBulkResult>, Error>;
```

The actual implementations in `clients/rust/src/session.rs:385-526` take only `(server_handle, entries)` — `user_id` is per-entry on `WriteBulkEntry`/`Write2BulkEntry`, `timestamp_value` is per-entry on `Write2BulkEntry`/`WriteSecured2BulkEntry`, and `current_user_id`/`verifier_user_id` are per-entry on `WriteSecured{,2}BulkEntry`. The protobuf in `src/MxGateway.Contracts/Protos/mxaccess_gateway.proto:364-416` confirms this — there is no top-level `user_id` on these commands. The doc also returns `Vec<ReadBulkResult>` but the generated type is `BulkReadResult` (the gateway's `BulkReadReply` carries `repeated BulkReadResult`), and the actual signature is `read_bulk<S: AsRef<str>>(..., tag_addresses: &[S], ...) -> Vec<BulkReadResult>` — generic over `AsRef<str>` so callers can pass either `Vec<String>` or `[&str]`.

The drift is small but the design doc was the explicit subject of Client.Rust-017's resolution, so it warrants a follow-up. CLAUDE.md requires docs to change with the source.

**Recommendation:** Replace the five signatures in `RustClientDesign.md:96-100` with the ones actually in `session.rs`:

```rust
pub async fn write_bulk(&self, server_handle: i32, entries: Vec<WriteBulkEntry>) -> Result<Vec<BulkWriteResult>, Error>;
pub async fn write2_bulk(&self, server_handle: i32, entries: Vec<Write2BulkEntry>) -> Result<Vec<BulkWriteResult>, Error>;
pub async fn write_secured_bulk(&self, server_handle: i32, entries: Vec<WriteSecuredBulkEntry>) -> Result<Vec<BulkWriteResult>, Error>;
pub async fn write_secured2_bulk(&self, server_handle: i32, entries: Vec<WriteSecured2BulkEntry>) -> Result<Vec<BulkWriteResult>, Error>;
pub async fn read_bulk<S: AsRef<str>>(&self, server_handle: i32, tag_addresses: &[S], timeout_ms: u32) -> Result<Vec<BulkReadResult>, Error>;
```

and add a one-line note that the per-entry fields (`user_id`, `timestamp_value`, `current_user_id`, `verifier_user_id`) live on the entry structs themselves.

**Resolution:** 2026-05-20 — Replaced the five drifted signatures in `RustClientDesign.md` with the ones that actually live in `clients/rust/src/session.rs`: `write_bulk` / `write2_bulk` / `write_secured_bulk` / `write_secured2_bulk` take only `(server_handle, entries)`, and `read_bulk<S: AsRef<str>>` takes a borrowed `&[S]` and returns `Vec<BulkReadResult>` (not `Vec<ReadBulkResult>`). Added a follow-up paragraph noting that the per-entry fields `user_id` / `timestamp_value` / `current_user_id` / `verifier_user_id` live on `WriteBulkEntry` / `Write2BulkEntry` / `WriteSecuredBulkEntry` / `WriteSecured2BulkEntry` themselves rather than as trailing positional arguments, matching the protobuf shapes in `mxaccess_gateway.proto`, and that `read_bulk` is generic over `AsRef<str>` so callers can pass `&[String]` or `&[&str]` without cloning at the call site.

### Client.Rust-020

| Field | Value |
|---|---|
| Severity | Low |
| Category | Documentation & comments |
| Location | `clients/rust/src/session.rs:31-46`; `clients/rust/src/lib.rs:14-39` |
| Status | Resolved |

**Description:** Client.Rust-014's resolution promoted `next_correlation_id` from a module-private helper to a `pub` function so the `mxgw` CLI's raw-RPC paths can share the library's correlation-id discipline. The doc comment commits the library to a literal string format — `"rust-client-{label}-{N}"` — that external code can now depend on. Two concerns:

1. The function is not re-exported at the crate root in `lib.rs` (it only ships through the `pub mod session` namespace), so the in-tree caller writes the long `mxgateway_client::session::next_correlation_id("cli-ping")` path. Either re-export it via `#[doc(inline)] pub use session::next_correlation_id;` or leave it where it is and add a short note in the doc — but the current state straddles "public API" and "lib-internal helper" without saying which.

2. The doc comment does not declare a stability stance (no `#[doc(hidden)]`, no "experimental" note, no `__priv` naming). As written it promises the literal format `"rust-client-{label}-{N}"` to any out-of-tree consumer; a future change that renames the prefix (for example to drop the `rust-` after a multi-client reformat) would be a behavioural break. The `RustClientDesign.md` resolution of Client.Rust-017 ("`session::next_correlation_id` is `pub`") reads similarly — it does not say whether the format is stable.

The combination — `pub`, format-committing doc, no stability note, no crate-root re-export — leaves the public surface ambiguous. The same review category (Documentation & comments) is where Client.Rust-014's CLI-side fix is now visible, so this is the natural place to clean it up.

**Recommendation:** Pick one of:

- Treat `next_correlation_id` as part of the SDK's public API. Re-export it from `lib.rs` (`#[doc(inline)] pub use session::next_correlation_id;`), rewrite the doc comment to *not* promise the literal `"rust-client-{label}-{N}"` format (just the property "monotonic, unique within a process, includes the supplied label"), and call that out in `RustClientDesign.md`.
- Treat it as internal-only. Mark it `#[doc(hidden)] pub` and add a `// Internal helper exposed for the in-tree `mxgw` CLI; not part of the public SDK contract.` comment so out-of-tree consumers do not build against a format that the SDK is free to change.

The CLI integration in Client.Rust-014 works either way; this is solely about declaring intent so the SDK's public surface is unambiguous.

**Resolution:** 2026-05-20 — Took the "treat as public SDK API" branch. Re-exported `next_correlation_id` at the crate root in `clients/rust/src/lib.rs` (`#[doc(inline)] pub use session::{next_correlation_id, Session};`) so in-tree and external callers can write the short `mxgateway_client::next_correlation_id(...)` path. Updated the in-tree `mxgw` CLI (`Ping` and `CloseSession` subcommands) to call through the crate-root re-export instead of `mxgateway_client::session::next_correlation_id`. Rewrote the doc comment to drop the format promise: the returned id is now documented as an opaque token with three guaranteed properties (embeds the supplied `label`, unique within a process via an internal monotonic atomic sequence, carries no embedded secret beyond `label`), and the doc explicitly states that the textual format `rust-client-{label}-{N}` is *not* part of the public contract and that callers must not parse it. Cross-referenced the crate-root re-export from the function-level doc. Updated `RustClientDesign.md` to call out that `next_correlation_id` is part of the public SDK surface, re-exported at the crate root, and that its textual format is intentionally not part of the contract.