Files
mxaccessgw/code-reviews/Client.Rust/findings.md
T
Joseph Doherty a0203503a7 Code-review 2026-05-20 sweep: re-review at 1cd51bb, resolve 72 findings across all 11 modules
Re-reviewed every module/client against the 10-category checklist
(REVIEW-PROCESS.md) at commit 1cd51bb, filed 72 new findings, and
fixed them in three priority waves (3 High, 17 Medium, 52 Low).

Highs
- Server-017: enumerate AcknowledgeAlarm / QueryActiveAlarms in
  GatewayGrpcScopeResolver so non-admin keys can use them; document
  the mapping in docs/Authorization.md; add interceptor tests.
- Client.Java-013: add the five missing bulk-method stubs to the
  CLI FakeSession so the test module compiles on a clean tree.
- Client.Rust-013: fix the clippy::doc_lazy_continuation regression
  in generated tonic code by reformatting the ReadBulkCommand proto
  comment and scoping a #![allow(...)] to the generated submodules.

Mediums (highlights)
- Server: unify GatewaySession state-lock discipline (-015) and
  make DisposeAsync race-safe against in-flight CloseAsync (-016);
  add constraint-enforcement test coverage for the bulk-plan path
  (-021).
- Worker: introduce StaRuntimeShutdownException so RunAlarmPollLoop
  can distinguish graceful shutdown from a real STA-affinity
  violation (-016); have the watchdog skip StaHung while
  CurrentCommandCorrelationId is non-empty so a legitimate slow
  ReadBulk no longer self-faults (-017).
- Tests: add per-method round-trip + cancellation coverage for the
  11 GatewaySession bulk methods (-013); replace the real TCP probe
  in GalaxyHierarchyCacheTests with an IGalaxyRepository fake
  (-016).
- IntegrationTests: drive the StreamEvents writer in the live Write
  test and assert OnWriteComplete (-012); add live tests for
  Unadvise/RemoveItem/Unregister ordering, WriteSecured, and
  abnormal worker exit (-014).
- Worker.Tests: replace MxAccessSession reflection with an internal
  CreateForTesting factory (-016); cover WorkerCancel and
  unexpected-body envelope branches (-017).
- Client.Java: cancel MxEventStream when close() races
  beforeStart() (-014); return a CancellingCompletableFuture that
  actually forwards cancellation through .thenApply chains (-015).
- Client.Python: drop the silent localhost-plaintext downgrade in
  the CLI; require explicit --plaintext (-013).
- Client.Rust: stop bench-read-bulk from polluting success-latency
  histograms with failed-call durations (-015); add coverage for
  the five MalformedReply paths, the bulk-write helpers, the
  Error::Unavailable mapping, and the unary-fault path (-016).
- Contracts: extend docs/Contracts.md with the bulk read/write
  command family (-009).

Lows (highlights)
- Server: cap GalaxyGlobMatcher.RegexCache; align
  WorkerAlarmRpcDispatcher missing-session handling; drop the
  duplicate dashboard @page routes; refresh IAlarmRpcDispatcher
  XML doc.
- Worker: surface SetXmlAlarmQuery COM failures; remove dead
  subscriptionExpression / ExecutingCommand arms; preserve
  factory-supplied runtime sessions; split MxAlarmSnapshot.cs into
  three files.
- Tests: dispose the WebApplication in seven test classes; rebuild
  FakeWorkerProcess.WaitForExitAsync against a real TaskCompletion
  source; switch the heartbeat-expires test to ManualTimeProvider;
  add InvariantCulture to the remaining DateTimeOffset.Parse sites;
  document GalaxyFilterInputSafetyTests in GatewayTesting.md.
- IntegrationTests: comment fixes, RecordingServerStreamWriter
  IDisposable, class-level [Trait], single-source ZB default
  connection string.
- Worker.Tests: replace silent-return gating with LiveMxAccessFact
  so absent env vars SKIP not pass; PascalCase rename of probe
  [Fact]s; deterministic deadline test; new frame-protocol error
  tests; ComputeTransitions diff-coverage; relocate dev-rig probes
  to Probes/.
- Contracts: add round-trip coverage and per-field redaction /
  Galaxy-identifier comments to the protos.
- Client.Dotnet: introduce clients/dotnet/Directory.Build.props so
  TreatWarningsAsErrors / analysers apply; document
  DiscoverHierarchyOptions and IMxGatewayCliClient; require typed
  bulk-read handles in CLI; surface AcknowledgeAlarm transport
  faults through Translate().
- Client.Go: kill dead code in alarms_test / fakeGalaxyServer /
  runWriteBulkVariant; document the six new subcommands in
  writeUsage; drain galaxy-watch events on limit; switch io.EOF
  comparisons to errors.Is.
- Client.Java: shared shutdown helpers + new shutdownTimeout
  option; regex-based credential redaction; Long.toUnsignedString
  for uint64 sequence; doc fixes.
- Client.Python: combine duplicate imports; add coverage for
  _percentile / bench-read-bulk / MAX_AGGREGATE_EVENTS /
  _api_key_from_env; populate pyproject metadata and ship py.typed.
- Client.Rust: expose next_correlation_id() so CLI ping/close
  stop hard-coding correlation IDs; resync RustClientDesign.md
  with the current Session / Error surface and CLI subcommand set.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-20 09:46:47 -04:00

313 lines
27 KiB
Markdown

# Code Review — Client.Rust
| Field | Value |
|---|---|
| Module | `clients/rust` |
| Reviewer | Claude Code |
| Review date | 2026-05-20 |
| Commit reviewed | `1cd51bb` |
| Status | Reviewed |
| Open findings | 0 |
## Checklist coverage
This re-review (`1cd51bb`) covers the changes added since `3cc53a8`: the new bulk-write/read methods on `Session`, the `read_bulk` borrowed-slice signature, `MalformedReply` / `Unavailable` error variants, the projection-on-demand `MxValue`/`MxArrayValue`, the `next_correlation_id` rework, the new ReadBulk and bulk-write CLI subcommands, and the cross-language `bench-read-bulk` driver.
| # | Category | Result |
|---|---|---|
| 1 | Correctness & logic bugs | Issue found: `read_bulk` is missing the OK-but-shapeless `MalformedReply` symmetry of the other bulk helpers, but the bigger issue is no test exercises any of the new `MalformedReply` paths (Client.Rust-016). |
| 2 | mxaccessgw conventions | Issue found: `cargo clippy --workspace --all-targets -- -D warnings` still fails — a fresh `clippy::doc_lazy_continuation` violation in `ReadBulkCommand`'s generated doc comment trips the lint that the prior fixes did not anticipate (Client.Rust-013). CLI subcommands still emit hard-coded `client_correlation_id` strings on the `raw` paths (Client.Rust-014). |
| 3 | Concurrency & thread safety | No issues found — `CORRELATION_SEQUENCE` is `AtomicU64` with `Relaxed`, which is correct for monotonic id generation; clients remain cheaply cloneable; streams are `Send`. |
| 4 | Error handling & resilience | Issue found: `bench-read-bulk` records every `read_bulk` failure into the latency histogram as if it succeeded, skewing p99/max upward (Client.Rust-015). The new `Error::Unavailable` mapping looks correct. |
| 5 | Security | No issues found — API keys still redacted in `Debug`/`Display`, status messages scrubbed, secret arguments unchanged. |
| 6 | Performance & resource management | No issues found in the changed code — `read_bulk` is honest about the unavoidable owned-Vec materialisation; projection-on-demand is now lazy. |
| 7 | Design-document adherence | Issue found: `RustClientDesign.md` was refreshed but never grew the new bulk-write/read methods, the `Unavailable`/`MalformedReply` error variants, or the `bench-read-bulk` CLI command on its current surface (Client.Rust-017). |
| 8 | Code organization & conventions | No new issues — `BulkWriteReplyKind` follows the renamed `BulkReplyKind` shape. |
| 9 | Testing coverage | Issue found: none of the new code paths (bulk-write helpers, `read_bulk`, `MalformedReply`, `Error::Unavailable`, the `bench-read-bulk` flow) are covered by client-side tests (Client.Rust-016). |
| 10 | Documentation & comments | No new issues beyond Client.Rust-017. |
## Findings
### Client.Rust-001
| Field | Value |
|---|---|
| Severity | High |
| Category | mxaccessgw conventions |
| Location | `clients/rust/src/options.rs:98,143` |
| Status | Resolved |
**Description:** `with_max_grpc_message_bytes` and `max_grpc_message_bytes` have no `///` doc comments. The crate sets `#![warn(missing_docs)]` and CLAUDE.md mandates that `cargo clippy --workspace --all-targets -- -D warnings` pass. Under `-D warnings` these become hard errors, so clippy fails to compile the crate — breaking the documented build/test workflow for the module.
**Recommendation:** Add doc comments to both methods, e.g. `/// Maximum encoded/decoded gRPC message size in bytes (default 16 MiB).`
**Resolution:** Resolved in `0d8a28d` (2026-05-18): doc comments added to both methods.
### Client.Rust-002
| Field | Value |
|---|---|
| Severity | High |
| Category | mxaccessgw conventions |
| Location | `clients/rust/src/session.rs:522` |
| Status | Resolved |
**Description:** The `BulkReplyKind` enum's variants (`AddItemBulk`, `AdviseItemBulk`, `RemoveItemBulk`, `UnAdviseItemBulk`, `SubscribeBulk`, `UnsubscribeBulk`) all share the `Bulk` suffix, tripping `clippy::enum_variant_names`. Under `-D warnings` this is a compile error, so `cargo clippy --workspace --all-targets -- -D warnings` fails — a violation of the CLAUDE.md requirement that clippy pass cleanly.
**Recommendation:** Rename the variants to drop the common suffix (e.g. `AddItem`, `AdviseItem`, …) or add a narrowly-scoped `#[allow(clippy::enum_variant_names)]` with a reason comment.
**Resolution:** Resolved in `0d8a28d` (2026-05-18): variants renamed to `AddItem`/`AdviseItem`/`RemoveItem`/`UnAdviseItem`/`Subscribe`/`Unsubscribe`, which no longer share a common suffix.
### Client.Rust-003
| Field | Value |
|---|---|
| Severity | High |
| Category | Correctness & logic bugs |
| Location | `clients/rust/crates/mxgw-cli/src/main.rs:1051` |
| Status | Resolved |
**Description:** The unit test `version_json_output_has_protocol_versions` asserts `value["gatewayProtocolVersion"] == 2`, but `GATEWAY_PROTOCOL_VERSION` is `3` (version.rs:10), matching the authoritative server constant `GatewayContractInfo.GatewayProtocolVersion = 3`. The test fails, so `cargo test --workspace` (the documented test step) does not pass — the test was not updated when the protocol version was bumped.
**Recommendation:** Update the assertion to `3`, or better, assert against `GATEWAY_PROTOCOL_VERSION` so it cannot drift again.
**Resolution:** Resolved in `0d8a28d` (2026-05-18): the test now asserts against the `GATEWAY_PROTOCOL_VERSION` / `WORKER_PROTOCOL_VERSION` constants, so it cannot drift again.
### Client.Rust-004
| Field | Value |
|---|---|
| Severity | Low |
| Category | Documentation & comments |
| Location | `clients/rust/src/version.rs:7` |
| Status | Resolved |
**Description:** `CLIENT_VERSION` is `"0.1.0-dev"` and its doc comment claims "Mirrors `Cargo.toml`", but `Cargo.toml` declares `version = "0.1.0"` (no `-dev` suffix). The comment is misleading and the value is not actually kept in sync with the manifest.
**Recommendation:** Either set `CLIENT_VERSION` from the build via `env!("CARGO_PKG_VERSION")`, or correct the constant to `"0.1.0"` and drop the "Mirrors Cargo.toml" claim.
**Resolution:** Resolved in `0d8a28d` (2026-05-18): `CLIENT_VERSION` is now `env!("CARGO_PKG_VERSION")`, taken from `Cargo.toml` at compile time so the two cannot drift.
### Client.Rust-005
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Correctness & logic bugs |
| Location | `clients/rust/src/session.rs:489-520` |
| Status | Resolved |
**Description:** `register_server_handle`, `add_item_handle`, and `add_item2_handle` fall through to `reply.return_value … .unwrap_or_default()`, returning `0` when the reply carries neither the expected typed payload nor an `Int32` `return_value`. Because `Session::invoke` has already confirmed `protocol_status == Ok`, a malformed-but-OK reply silently yields handle `0`, which the caller then uses as a real handle against the worker.
**Recommendation:** Return `Err(Error::ProtocolStatus { … })` (or a dedicated `Error::MalformedReply`) when an OK reply lacks an extractable handle, instead of defaulting to `0`.
**Resolution:** Resolved in `0d8a28d` (2026-05-18): the three handle extractors now return `Result<i32, Error>` and yield the new `Error::MalformedReply` when an OK reply carries no usable handle.
### Client.Rust-006
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Error handling & resilience |
| Location | `clients/rust/src/session.rs:531-555` |
| Status | Resolved |
**Description:** `bulk_results` returns `Vec::new()` for any `(payload, kind)` combination that does not match the expected arm — including an OK reply carrying the wrong or no payload. A caller of `subscribe_bulk`/`add_item_bulk` then sees an empty result vector and cannot distinguish "zero items processed" from "gateway returned a shapeless reply".
**Recommendation:** Treat a missing/mismatched bulk payload on an OK reply as an error rather than an empty vector, or document the empty-vec fallback explicitly and log it.
**Resolution:** Resolved in `0d8a28d` (2026-05-18): `bulk_results` now returns `Result<Vec<SubscribeResult>, Error>` and yields `Error::MalformedReply` on a mismatched or absent bulk payload.
### Client.Rust-007
| Field | Value |
|---|---|
| Severity | Low |
| Category | Design-document adherence |
| Location | `clients/rust/RustClientDesign.md:14-55` |
| Status | Resolved |
**Description:** `RustClientDesign.md` is stale relative to the implemented code. It documents a nested `crates/mxgateway-client/` layout (the real crate root is `clients/rust/` with a flat `src/`), and lists `tracing` among "Expected dependencies", but `tracing` appears in no `Cargo.toml`. CLAUDE.md requires docs to change with the source.
**Recommendation:** Update `RustClientDesign.md` to the actual flat layout and remove `tracing` from the dependency list (or add `tracing` if structured logging is genuinely intended).
**Resolution:** Resolved in `0d8a28d` (2026-05-18): the "Crate Layout" section now shows the actual flat layout (`mxgateway-client` as the workspace-root crate, `mxgw-cli` as a member) and the unused `tracing` entry was removed from the dependency list.
### Client.Rust-008
| Field | Value |
|---|---|
| Severity | Low |
| Category | Performance & resource management |
| Location | `clients/rust/src/value.rs:161-261` |
| Status | Resolved |
**Description:** `MxValueProjection::from_proto` and `MxArrayProjection::from_proto` deep-clone every element out of the wire message while `MxValue`/`MxArrayValue` also retain the original `raw` message. Every `MxValue` therefore holds two copies of its payload, wasteful for large string arrays or raw blobs arriving on the event stream.
**Recommendation:** Compute the projection lazily on demand, or have the projection borrow from `raw`, so array/raw payloads are not duplicated for every wrapped value.
**Resolution:** Resolved in `0d8a28d` (2026-05-18): `MxValue` and `MxArrayValue` no longer cache a `projection` field — `projection()` computes the typed view on demand from `raw`. A value built only to be sent over the wire now holds a single copy of its payload and pays no projection cost.
### Client.Rust-009
| Field | Value |
|---|---|
| Severity | Low |
| Category | Testing coverage |
| Location | `clients/rust/tests/client_behavior.rs`, `clients/rust/src/galaxy.rs` |
| Status | Resolved |
**Description:** Several critical paths are untested: TLS channel setup (`with_plaintext(false)` / CA-file loading), mid-stream `tonic::Status` fault propagation through `EventStream`/`DeployEventStream` (tests only send `Ok` items), and the bulk-size cap (`ensure_bulk_size` rejecting >1000 items).
**Recommendation:** Add tests that (a) feed an `Err(Status)` into the event/deploy streams and assert it surfaces as the mapped `Error`, (b) assert `add_item_bulk` with 1001 items returns `Error::InvalidArgument`, and (c) exercise the CA-file/`InvalidEndpoint` error path.
**Resolution:** Resolved in `0d8a28d` (2026-05-18): added `add_item_bulk_rejects_input_above_the_thousand_item_cap`, `event_stream_surfaces_a_mid_stream_status_fault` (the fake gateway now optionally emits a mid-stream `Status::unavailable`), and `connect_with_unreadable_ca_file_reports_invalid_endpoint`.
### Client.Rust-010
| Field | Value |
|---|---|
| Severity | Low |
| Category | Error handling & resilience |
| Location | `clients/rust/src/client.rs:255-268`, `clients/rust/src/galaxy.rs:204-216` |
| Status | Resolved |
**Description:** The client applies only a per-call deadline via `Request::set_timeout` and has no retry, reconnect, or transient-vs-permanent classification. A transient `Unavailable` (e.g. a gateway restart) maps to the catch-all `Error::Status` and is indistinguishable from a permanent failure. This is an acceptable v1 stance but is undocumented.
**Recommendation:** Either add a documented `Error::Unavailable` variant classifying `Code::Unavailable`/`Code::ResourceExhausted`, or explicitly document in the README that the client performs no retries and that transient failures arrive as `Error::Status`.
**Resolution:** Resolved in `0d8a28d` (2026-05-18): added the `Error::Unavailable` variant; `From<tonic::Status>` maps `Code::Unavailable` and `Code::ResourceExhausted` to it, so callers can classify transient failures without unwrapping the raw status.
### Client.Rust-011
| Field | Value |
|---|---|
| Severity | Low |
| Category | mxaccessgw conventions |
| Location | `clients/rust/src/session.rs:469` |
| Status | Resolved |
**Description:** `command_request` hard-codes `client_correlation_id` as `format!("rust-client-{}", kind.as_str_name())`. Every invocation of the same command kind on a session uses an identical correlation id, so the id cannot correlate a specific request/reply pair in gateway logs or among concurrent in-flight calls. MXAccess parity diagnostics rely on correlation ids being unique per call.
**Recommendation:** Append a per-call unique suffix (monotonic counter or UUID) to the correlation id, or expose a way for the caller to supply one.
**Resolution:** Resolved in `0d8a28d` (2026-05-18): correlation ids are built by `next_correlation_id`, which appends a process-wide atomic sequence number; `Session::close` uses it too.
### Client.Rust-012
| Field | Value |
|---|---|
| Severity | High |
| Category | mxaccessgw conventions |
| Location | `clients/rust/src/galaxy.rs:282` |
| Status | Resolved |
**Description:** Found while verifying the fix for Client.Rust-001/002: `cargo clippy --workspace --all-targets -- -D warnings` reported a third violation the original review missed. The `get_last_deploy_time` test fake calls `.clone()` on a `MutexGuard<Option<prost_types::Timestamp>>`, and `Option<Timestamp>` is `Copy` (`clippy::clone_on_copy`). Under `-D warnings` this is a compile error, so clippy still did not pass after Client.Rust-001/002 alone.
**Recommendation:** Dereference instead of cloning: `*self.state.last_deploy.lock().unwrap()`.
**Resolution:** Resolved in `0d8a28d` (2026-05-18): replaced `.clone()` with a deref. `cargo clippy --workspace --all-targets -- -D warnings` now passes cleanly.
### Client.Rust-013
| Field | Value |
|---|---|
| Severity | High |
| Category | mxaccessgw conventions |
| Location | `src/MxGateway.Contracts/Protos/mxaccess_gateway.proto:414-424` (origin); `clients/rust/src/generated.rs:11-31` (suppression site) |
| Status | Resolved |
**Description:** `cargo clippy --workspace --all-targets -- -D warnings` fails again on this commit, this time on a `clippy::doc_lazy_continuation` violation in generated code:
```
error: doc list item without indentation
--> .../mxaccess_gateway.v1.rs:526:5
|
526 | /// `timeout_ms == 0` uses the gateway-configured default (1000 ms).
| ^
```
The lint fires because the `ReadBulkCommand` proto comment (added with the bulk Read feature in commit `5e375f6`) writes a bulleted list and then a trailing paragraph without the required blank line. prost-build forwards the proto comment verbatim into Rust doc comments, and the Rust client compiles those generated modules with crate-default lints. The crate already opts out of `clippy::large_enum_variant` in `src/generated.rs` for exactly this kind of generator-style problem, but `doc_lazy_continuation` is not on the allow-list, so the lint reaches `-D warnings` and breaks the documented `cargo clippy --workspace --all-targets -- -D warnings` invocation that CLAUDE.md mandates pass. The Rust client review was previously closed as clippy-clean (Client.Rust-001/002/012); this is the third clippy-clean regression caused by generated code in this module and warrants a more durable fix.
**Recommendation:** Add `#![allow(clippy::doc_lazy_continuation)]` to each generated submodule in `clients/rust/src/generated.rs` alongside `clippy::large_enum_variant`, so generated doc comments — which the client cannot edit — cannot break the `-D warnings` build. Independently, fix the upstream proto comment to insert a blank line before the trailing paragraph so the C# / Go / Python / Java generators do not carry the same flaky text.
**Resolution:** 2026-05-20 — Added `#![allow(clippy::doc_lazy_continuation)]` to each generated submodule in `clients/rust/src/generated.rs` next to the existing `clippy::large_enum_variant` allow, and reformatted the `ReadBulkCommand` proto comment in `src/MxGateway.Contracts/Protos/mxaccess_gateway.proto` to surround the bulleted list with blank lines so doc-comment generators in every language see a properly-terminated list. `cargo clippy --workspace --all-targets -- -D warnings` and `cargo test --workspace` now pass, and `dotnet build src/MxGateway.Contracts/MxGateway.Contracts.csproj` reports 0 warnings.
### Client.Rust-014
| Field | Value |
|---|---|
| Severity | Low |
| Category | mxaccessgw conventions |
| Location | `clients/rust/crates/mxgw-cli/src/main.rs:450,497` |
| Status | Resolved |
**Description:** Client.Rust-011 made `Session` build unique correlation ids per call, but the `mxgw` CLI's `Ping` and `CloseSession` subcommands still hard-code `client_correlation_id: "rust-cli-ping".to_owned()` and `"rust-cli-close-session".to_owned()`. Both go through `client.invoke(…)` / `client.close_session_raw(…)` rather than the `Session` helpers, so the library's id generator does not run. The CLI is the cross-language e2e driver — when the same machine runs concurrent CLI smokes, every `ping`/`close-session` request collides on the same correlation id in gateway logs, defeating the diagnostic value the library fix unlocked.
**Recommendation:** Either (a) expose `session::next_correlation_id` as a `pub(crate)` or library-level helper and have the CLI call it from `Ping`/`CloseSession`, or (b) replace these RPCs with the higher-level `Session` helpers (`Session::close`, and a thin `Session::ping` wrapper) so the CLI shares the library's correlation-id discipline by construction.
**Resolution:** 2026-05-20 — Promoted `session::next_correlation_id` from a module-private helper to a `pub` library-level function (it already lived in the `pub mod session`) and updated the `mxgw` CLI's `Ping` and `CloseSession` subcommands to call `mxgateway_client::session::next_correlation_id("cli-ping")` / `next_correlation_id("cli-close-session")` instead of the hard-coded `"rust-cli-ping"` / `"rust-cli-close-session"` strings. Concurrent CLI smokes now produce unique correlation ids per call — driven by the same process-wide `CORRELATION_SEQUENCE` `AtomicU64` the library uses — so gateway logs can tell collisions apart again. `cargo fmt`, `cargo build --workspace`, `cargo clippy --workspace --all-targets -- -D warnings`, and `cargo test --workspace` all pass.
### Client.Rust-015
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Error handling & resilience |
| Location | `clients/rust/crates/mxgw-cli/src/main.rs:1053-1070` |
| Status | Resolved |
**Description:** The new cross-language benchmark `bench-read-bulk` pushes the elapsed time of every `read_bulk` call into `latencies_ms` regardless of whether the call returned `Ok` or `Err`:
```rust
let outcome = session.read_bulk(server_handle, &tags, timeout_ms).await;
let elapsed_ms = call_start.elapsed().as_secs_f64() * 1000.0;
latencies_ms.push(elapsed_ms);
match outcome {
Ok(results) => { successful_calls += 1; }
Err(_) => failed_calls += 1,
}
```
A failed `read_bulk` (transient `Unavailable`, deadline-exceeded mid-call, etc.) typically returns *later* than a successful one — it includes the full per-call timeout that the success path never waits for. The histogram therefore conflates "p99 cached-read latency" with "p99 of (cached-read + timed-out call)", and the JSON document the PowerShell driver collates publishes `latencyMs.p99` / `latencyMs.max` that no longer represent successful-call latency. Worse, the failure category is silently dropped (`Err(_) => failed_calls += 1`) so a benchmark run that fails on every call still emits a coherent-looking JSON without ever surfacing why. This is misleading for a benchmark whose JSON shape is the cross-language comparison contract.
**Recommendation:** Only push elapsed time into `latencies_ms` on `Ok`, or split into two histograms (`successLatencyMs` and `failureLatencyMs`) and log the first failure's error string into the stats record so a partial-failure run is visible at the report layer.
**Resolution:** 2026-05-20 — Extracted the per-iteration accounting in `bench-read-bulk` into a `BenchReadBulkStats` helper with explicit `record_success`/`record_failure` methods. Successful `read_bulk` calls now flow into `success_latencies_ms` (driving the cross-language `latencyMs.p99`/`max` JSON contract), failures flow into a separate `failure_latencies_ms` histogram surfaced as `failureLatencyMs`, and the first failure's redacted error string is stashed as `firstFailure` so a partial-failure run is visible at the report layer instead of producing a coherent-looking JSON that hides every error. Added a unit test (`bench_read_bulk_stats_keeps_failures_out_of_success_latency_histogram`) that records two fast successes plus a deliberately slow failure and asserts the success histogram never sees the failure latency, plus a smaller smoke test for the zero-duration calls-per-second path.
### Client.Rust-016
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Testing coverage |
| Location | `clients/rust/tests/client_behavior.rs`, `clients/rust/src/session.rs:489-519,654-768` |
| Status | Resolved |
**Description:** The fixes for Client.Rust-005 / 006 added five new `Error::MalformedReply` paths to `session.rs` (`register_server_handle`, `add_item_handle`, `add_item2_handle`, `bulk_results`, `bulk_write_results`) plus the inline branch in `read_bulk`. None of them are exercised by tests — every test in `client_behavior.rs` feeds the matching payload back to the client, so the malformed-reply branches are dead code from the test suite's perspective. The new bulk-write helpers (`write_bulk`, `write2_bulk`, `write_secured_bulk`, `write_secured2_bulk`) only have a single happy-path assertion via `write_bulk`, leaving the three other variants and every per-entry-failure shape untested. The bench-read-bulk flow has no test (the driver script is the only consumer). The `Error::Unavailable` variant from Client.Rust-010 is covered by `event_stream_surfaces_a_mid_stream_status_fault`, but the same variant on a unary `Code::Unavailable` is not.
**Recommendation:** Add three light tests against the existing `FakeGateway`:
1. Have the fake reply to `AddItem` (or `Register` / `AddItem2`) with `protocol_status = Ok` and no payload, and assert the client surfaces `Error::MalformedReply`.
2. Have the fake reply to `WriteBulk` with `protocol_status = Ok` and the wrong payload arm (e.g. an `AddItemReply` body), and assert `Error::MalformedReply`.
3. Have the fake fail the unary `Invoke` with `Status::unavailable(...)` and assert `Error::Unavailable`.
Optionally add Write2Bulk / WriteSecuredBulk / WriteSecured2Bulk smoke assertions so all four bulk-write families have at least one round-trip test.
**Resolution:** 2026-05-20 — Added eight new integration tests in `clients/rust/tests/client_behavior.rs`. Each new `Error::MalformedReply` site is exercised via a test-only `InvokeOverride` injected into `FakeState` that lets a single test pin the fake gateway's `Invoke` handler to one of three malformed shapes (OK reply with no payload, OK reply with the wrong payload arm for `read_bulk`, OK reply with the wrong payload arm for the other bulk / bulk-write families): `register_returns_malformed_reply_when_ok_reply_has_no_payload`, `add_item_returns_malformed_reply_when_ok_reply_has_no_payload`, `add_item2_returns_malformed_reply_when_ok_reply_has_no_payload`, `subscribe_bulk_returns_malformed_reply_on_mismatched_payload_arm`, `write_bulk_returns_malformed_reply_on_mismatched_payload_arm`, and `read_bulk_returns_malformed_reply_on_mismatched_payload_arm`. The unary `Error::Unavailable` path is covered by `unary_invoke_maps_status_unavailable_to_error_unavailable` (the override returns `Status::unavailable(...)`). The remaining three bulk-write families gained round-trip smoke tests — `write2_bulk_round_trips_through_the_fake_gateway`, `write_secured_bulk_round_trips_through_the_fake_gateway`, `write_secured2_bulk_round_trips_through_the_fake_gateway` — extending the fake gateway's dispatcher with happy-path replies for `Write2Bulk` / `WriteSecuredBulk` / `WriteSecured2Bulk`. The `bench-read-bulk` flow gets a `BenchReadBulkStats` unit test in `crates/mxgw-cli/src/main.rs` (see Client.Rust-015) that asserts the latency-tracking change keeps failed-call durations out of `latencyMs`.
### Client.Rust-017
| Field | Value |
|---|---|
| Severity | Low |
| Category | Design-document adherence |
| Location | `clients/rust/RustClientDesign.md:79-99,156-163` |
| Status | Resolved |
**Description:** CLAUDE.md requires docs to change with the source. `RustClientDesign.md` was refreshed to fix the layout/`tracing` drift (Client.Rust-007), but the Session API surface in the design (`Library API` block, lines 79-99) still lists only the original six bulk helpers — `add_item_bulk`, `advise_item_bulk`, `remove_item_bulk`, `un_advise_item_bulk`, `subscribe_bulk`, `unsubscribe_bulk` — and is missing the five new bulk-write helpers and `read_bulk` (`write_bulk`, `write2_bulk`, `write_secured_bulk`, `write_secured2_bulk`, `read_bulk`) that landed in commits `5e375f6` / `f220908` / `61644e6`. The `Error Handling` block (lines 130-146) still enumerates `Transport`, `Status`, `Authentication`, `Authorization`, `Session`, `Worker`, `Command`, `MxAccess`, `Timeout`, `Cancelled` — but not `MalformedReply`, `Unavailable`, or `InvalidEndpoint`, all of which are now public variants of the crate's `Error` enum. The `Test CLI` block (lines 158-163) lists `version` / `smoke` / `stream-events` / `write` but is missing every new subcommand (`read-bulk`, `write-bulk`, `write2-bulk`, `write-secured-bulk`, `write-secured2-bulk`, `bench-read-bulk`, `galaxy watch`).
**Recommendation:** Bring the design doc back in sync: extend the `Session` API code block to enumerate the bulk-write/read methods, expand the `Error` enum to match `clients/rust/src/error.rs`, and add the missing CLI subcommands. The README is already up to date, so this is design-doc-only churn.
**Resolution:** 2026-05-20 — Brought `clients/rust/RustClientDesign.md` back in sync with the implementation. The `Session` block now lists the five new bulk helpers (`write_bulk`, `write2_bulk`, `write_secured_bulk`, `write_secured2_bulk`, `read_bulk`) alongside the original six and notes that `session::next_correlation_id` is `pub` for raw-RPC consumers (the CLI). The `Error` enum block now matches `clients/rust/src/error.rs``InvalidEndpoint`, `InvalidArgument`, `Transport`, `Authentication`, `Authorization`, `Timeout`, `Cancelled`, `Unavailable`, `Status`, `Command`, `ProtocolStatus`, `MalformedReply` — with a short paragraph explaining what `Unavailable`, `MalformedReply`, and `InvalidEndpoint` classify. The `Test CLI` block enumerates every subcommand the binary exposes today: `version`, `ping`, `open-session`, `close-session`, `register`, `add-item`, `advise`, `subscribe-bulk`, `unsubscribe-bulk`, `read-bulk`, `write`, `write2`, `write-bulk`, `write2-bulk`, `write-secured-bulk`, `write-secured2-bulk`, `stream-events`, `bench-read-bulk`, `smoke`, and the `galaxy {test-connection,last-deploy-time,discover-hierarchy,watch}` subtree.