Files
mxaccessgw/code-reviews/Client.Rust/findings.md
T

765 lines
91 KiB
Markdown

# Code Review — Client.Rust
| Field | Value |
|---|---|
| Module | `clients/rust` |
| Reviewer | Claude Code |
| Review date | 2026-06-15 |
| Commit reviewed | `410acc9` |
| Status | Re-reviewed |
| Open findings | 0 |
## Checklist coverage
This re-review (`a020350`) covers the resolution work for Client.Rust-013 through 017 (scoped `doc_lazy_continuation` allow on generated submodules, `pub` `next_correlation_id` shared with the CLI, success/failure split in `bench-read-bulk`, eight new tests, design-doc resync). The pass spot-checked the items called out in the request: stability of the newly-`pub` `next_correlation_id`, the `bench-read-bulk` JSON shape vs the PowerShell driver, presence of `unsafe`, and the scope of `#![allow(clippy::doc_lazy_continuation)]`. `cargo clippy --workspace --all-targets -- -D warnings` and `cargo test --workspace` both pass on this commit.
| # | Category | Result |
|---|---|---|
| 1 | Correctness & logic bugs | No issues found — the five new `MalformedReply` paths and the `read_bulk` mismatched-payload branch each have a dedicated test; `BenchReadBulkStats` correctly partitions success vs failure latency. |
| 2 | mxaccessgw conventions | No issues found — `cargo clippy --workspace --all-targets -- -D warnings` and `cargo test --workspace` both pass on this commit; the `#![allow(clippy::doc_lazy_continuation)]` allow is scoped narrowly to each generated v1 inner module so hand-written code is unaffected; CLI `Ping`/`CloseSession` now call `session::next_correlation_id`. |
| 3 | Concurrency & thread safety | No issues found — `CORRELATION_SEQUENCE` is `AtomicU64` with `Relaxed`, correct for monotonic id generation; no `unsafe` anywhere in `src/` or `crates/`. |
| 4 | Error handling & resilience | Issue found: the `bench-read-bulk` fix for Client.Rust-015 has fixed Rust's own histogram honestly but the change makes Rust's `latencyMs` semantically incompatible with the four other clients' `latencyMs` field that the cross-language PowerShell driver collates side-by-side (Client.Rust-018). |
| 5 | Security | No issues found — API keys still redacted in `Debug`/`Display`, status messages scrubbed, `first_failure` records `Error::Display` (which already redacts `mxgw_*` tokens) so secure-write values cannot leak into the bench JSON. |
| 6 | Performance & resource management | No issues found in the reviewed delta. |
| 7 | Design-document adherence | Issue found: `RustClientDesign.md` Session signatures for the four bulk-write helpers and `read_bulk` do not match the actual implementation — the design lists trailing `user_id` / `timestamp` / `current_user_id` / `verifier_user_id` parameters and a `Vec<ReadBulkResult>` return that the impl does not have (all of those move per-entry into `WriteBulkEntry` etc.) (Client.Rust-019). |
| 8 | Code organization & conventions | No new issues — `BenchReadBulkStats` is cleanly factored out and tested. |
| 9 | Testing coverage | No new issues — the malformed-reply paths and unary `Error::Unavailable` are now covered, and the four bulk-write families each have round-trip smoke. |
| 10 | Documentation & comments | Issue found: `next_correlation_id` is now `pub` and its doc comment commits the SDK to the literal `"rust-client-{label}-{N}"` string format, but neither the doc nor `lib.rs` re-exports it or declares any stability stance, leaving the public surface ambiguous (Client.Rust-020). |
### 2026-05-24 review (commit d692232)
Re-review pass at `d692232`. Diff against `a020350` is commit `397d3c5`:
top-level crate renamed `mxgateway-client``zb-mom-ww-mxgateway-client`,
`build.rs` proto path updated, every `use mxgateway_client::` sweep-replaced
to `use zb_mom_ww_mxgateway_client::`, `tests/client_behavior.rs` updated
for the retired `session_id` and a new `stream_alarms` impl on the fake
gateway, and the protocol-version assertion bumped to `3`. Workspace
member layout unchanged. `cargo test --workspace` and `cargo clippy
--workspace --all-targets -- -D warnings` both clean at HEAD.
| # | Category | Result |
|---|---|---|
| 1 | Correctness & logic bugs | No issues found in the a020350..d692232 diff. |
| 2 | mxaccessgw conventions | No issues found — wire identifiers (CLI binary `mxgw`, `MXGATEWAY_*` env vars, `mxaccess_gateway.v1` proto packages) unchanged per commit message. |
| 3 | Concurrency & thread safety | No issues found in this diff. |
| 4 | Error handling & resilience | No issues found in this diff. |
| 5 | Security | No issues found in this diff. |
| 6 | Performance & resource management | No issues found in this diff. |
| 7 | Design-document adherence | Issues found: Client.Rust-021 (`RustClientDesign.md` "Crate layout" section still shows an aspirational nested `crates/zb-mom-ww-mxgateway-client/` directory that does not exist; actual layout is the flat top-level crate at `clients/rust/`). |
| 8 | Code organization & conventions | No issues found in this diff. |
| 9 | Testing coverage | No issues found in this diff — the fake gateway's `stream_alarms` impl is a minimal stub, but no production behaviour relies on it being exercised by a test. |
| 10 | Documentation & comments | Issues found: Client.Rust-021 (design-doc drift). |
### 2026-05-24 re-review (commit 42b0037)
Re-review pass at `42b0037`. The diff against `d692232` is dominated by
three feature commits — `71d2c39` (ports the cross-language `batch`
subcommand to every CLI; for Rust this adds a `Batch` clap variant plus a
`run_batch` stdin-driven dispatch loop, and a `.cargo/config.toml` setting
`-C link-arg=/STACK:8388608` so debug builds of the now-large clap-derive
Command enum do not overflow Windows's default 1 MB main-thread stack);
`3251069` (ports `BenchReadBulk` and re-adds the bulk read/write SDK
methods to `session.rs``read_bulk`, `write_bulk`, `write2_bulk`,
`write_secured_bulk`, `write_secured2_bulk` plus the matching CLI
subcommands; this commit silently dropped the `BenchReadBulkStats`
helper, the `failureLatencyMs`/`firstFailure` JSON enrichment, and every
test from Client.Rust-015/016/018 that lived on the divergent branch);
and `7de4efe` (adds `stream_alarms` / `acknowledge_alarm` raw SDK methods
on `GatewayClient`, the `AlarmFeedStream` type alias re-exported at the
crate root, two new CLI subcommands `stream-alarms` /
`acknowledge-alarm`, and fixes `event_to_json` to render `MxEventFamily`
as the protobuf enum name so the cross-language e2e parity matcher
recognises events). `8738735` is a README-only update for the new alarm
SDK methods + CLI examples.
`cargo test --workspace` is clean (26 tests across the library,
client-behavior integration suite, and proto-fixture suite). `cargo
clippy --workspace --all-targets -- -D warnings` **fails at HEAD** with
three errors — the same three the previous reviewer flagged in
Client.Rust-021's resolution as "out of scope" (see Client.Rust-029
below): `options.rs:98,143` missing-docs, `galaxy.rs:282` clone-on-copy,
and `session.rs:664` enum-variant-names on `BulkReplyKind`. All three
were resolved at `a020350` (Client.Rust-001/002/012) and regressed when
the crate rename `397d3c5` reverted parts of the prior fix; the diff
under review does not address them.
| # | Category | Result |
|---|---|---|
| 1 | Correctness & logic bugs | Issue found: the new bulk read/write SDK methods (`read_bulk`, `write_bulk`, `write2_bulk`, `write_secured_bulk`, `write_secured2_bulk`) silently return `Vec::new()` on malformed-but-OK replies (Client.Rust-022). |
| 2 | mxaccessgw conventions | Issues found: hard-coded correlation IDs in the new CLI subcommands `stream-alarms`, `acknowledge-alarm`, and `bench-read-bulk`'s session-close (Client.Rust-023); clippy regression at HEAD (Client.Rust-029). |
| 3 | Concurrency & thread safety | No issues found — `stream_alarms`/`acknowledge_alarm` go through the existing `unary_request`/`stream_request` path, no new shared mutable state. The `run_batch` stdin loop uses blocking `std::io::Stdin::lock().lines()` on a tokio main task but each dispatched subcommand is spawned on a fresh task so its future does not share the main-thread stack. |
| 4 | Error handling & resilience | No new issues beyond Client.Rust-022's malformed-OK silent default. `bench-read-bulk`'s session close is best-effort and ordered so a bench-loop error does not mask the close (`let _ = close_result;` after `bench_outcome?`). |
| 5 | Security | No issues found — alarm `comment`/`operator` strings flow through the redaction-aware `Error::Display` path; bench's `tags` list is logged in cleartext but tags are not credentials. |
| 6 | Performance & resource management | Issue found: `run_bench_read_bulk` clones the `Vec<String>` tag list on every warmup + steady-state iteration (Client.Rust-026). |
| 7 | Design-document adherence | Issue found: `RustClientDesign.md` is silent on the new alarm SDK methods, the new bulk read/write SDK methods, every new CLI subcommand, and the new `.cargo/config.toml` workaround (Client.Rust-025). |
| 8 | Code organization & conventions | No new issues — the `event_to_json` family-as-enum-name fix is a small, localised change; the bulk-write helpers are factored consistently with the existing bulk-subscribe helpers. |
| 9 | Testing coverage | Issue found: zero tests cover `stream_alarms` on `GatewayClient`, the new bulk read/write SDK methods, or the `BenchReadBulk` flow; the fake gateway's `stream_alarms` impl drops the sender immediately (Client.Rust-024). |
| 10 | Documentation & comments | Issue found: `.cargo/config.toml`'s comment promises "Release builds are unaffected" but the `link-arg=/STACK:8388608` setting is unconditional under `cfg(windows)` and only applies to the MSVC linker (Client.Rust-027). |
### 2026-06-15 re-review (commit 410acc9)
Re-review pass at `410acc9`. The diff against `42b0037` (`git diff 42b0037..HEAD -- clients/rust/`) covers: Cargo metadata + Gitea alternative-registry config (`Cargo.toml`, `.cargo/config.toml`, README install section); a `[registries.dohertj2-gitea]` index entry and `publish = ["dohertj2-gitea"]` with `mxgw-cli` set `publish = false`; the resolution work for Client.Rust-022..029 (malformed-reply `Result` plumbing, `next_correlation_id` re-export, clippy fixes, `read_bulk<S: AsRef<str>>`); a **new** Galaxy lazy-browse walker (`browse`, `browse_children_raw`, `browse_children_inner`, `BrowseChildrenOptions`, `LazyBrowseNode`) with six unit tests; a **new** TLS pin-only guard (`build_tls_config` + `ClientOptions::with_require_certificate_validation` + `--require-certificate-validation` CLI flag) with a new `tests/tls.rs`; and the alarm-provider-fallback proto surface (`AlarmFeedMessage.provider_status`, added contracts-side in `1d85db7`).
`cargo fmt --check` is clean. `cargo check -p zb-mom-ww-mxgateway-client`, `cargo test -p zb-mom-ww-mxgateway-client` (24 lib + integration, 4 proto-fixture, 4 tls — all pass), and the library half of the workspace are clean. **`cargo clippy --workspace --all-targets -- -D warnings` and `cargo check --workspace` both FAIL at HEAD** — not on a lint but on a hard `E0004` compile error: the `mxgw-cli` binary's two `match &message.payload` blocks (`crates/mxgw-cli/src/main.rs:1731,1757`) are non-exhaustive after the proto added `AlarmFeedMessage.payload::ProviderStatus` (Client.Rust-030). The library crate compiles and all its tests pass; the break is confined to the CLI binary. No committed registry tokens — `.cargo/config.toml` carries only the sparse index URL; the README documents the token living in `~/.cargo/credentials.toml`.
| # | Category | Result |
|---|---|---|
| 1 | Correctness & logic bugs | Issue found: `mxgw-cli` fails to compile at HEAD — non-exhaustive `AlarmFeedMessage.payload` match missing the new `ProviderStatus` arm (Client.Rust-030). The library `read_bulk`/galaxy-walker/TLS-guard logic is correct and tested. |
| 2 | mxaccessgw conventions | Issue found: `cargo clippy --workspace --all-targets -- -D warnings` / `cargo check --workspace` do not pass — CLAUDE.md mandates they do (Client.Rust-030). The prior 029 clippy regressions are resolved; this is a new build break from the alarm-provider proto change. |
| 3 | Concurrency & thread safety | No issues found — `LazyBrowseNode` shares state via `Arc<…AsyncMutex<…>>`; `expand()` holds the mutex across the `browse_children_inner` await so concurrent expanders serialize and the idempotency check is race-free. `CORRELATION_SEQUENCE` is still `AtomicU64`/`Relaxed`. No `unsafe`. |
| 4 | Error handling & resilience | Issue found: the strict TLS path (`require_certificate_validation(true)` with no CA) builds a `ClientTlsConfig` with zero trust roots (no `tls-native-roots`/`tls-webpki-roots` feature, no `.with_*_roots()` call), so it cannot validate any certificate — contradicting the documented "verify against the system trust roots" behaviour (Client.Rust-031). The galaxy page-token loop has a correct repeated-token guard. |
| 5 | Security | No issues found in the registry/secret surface — `.cargo/config.toml` holds only the sparse index URL, no token; README puts the Bearer token in `~/.cargo/credentials.toml` (uncommitted). (See Client.Rust-031 for the strict-TLS validation gap, classified under error handling.) |
| 6 | Performance & resource management | No issues found — `read_bulk` is now borrow-based (`&[S]`), the bench loop reuses `tags_ref` (Client.Rust-026 resolved). The walker clones the `GalaxyClient` channel handle per node, which is the intended cheap `Channel` clone. |
| 7 | Design-document adherence | Issue found: `RustClientDesign.md` is not updated for the new Galaxy lazy-browse SDK surface (`browse` / `browse_children_raw` / `LazyBrowseNode` / `BrowseChildrenOptions`); CLAUDE.md requires docs to change with the source (Client.Rust-032). The TLS pin-only section pre-dates this diff but repeats the inaccurate "system trust roots" claim (cross-referenced from Client.Rust-031). |
| 8 | Code organization & conventions | No issues found — Cargo metadata (name/version/license/repository/keywords/categories) is well-formed; `publish = ["dohertj2-gitea"]` on the library and `publish = false` on `mxgw-cli` is the right split. `license = "Proprietary"` is non-SPDX but cargo accepts it and it is a deliberate closed-source marker. |
| 9 | Testing coverage | No issues found in the new surface — the walker has six unit tests (roots, expand, idempotency, NotFound, multi-page, filter-forwarding) and TLS has four. Gap noted: `tls_with_require_certificate_validation_does_not_short_circuit` connects to a dead address, so it only asserts the guard does not fire and never exercises a real handshake — which is why the no-trust-roots defect in Client.Rust-031 is not caught by a test. |
| 10 | Documentation & comments | Issue found: the `alarm_feed_message_summary` / `alarm_feed_message_to_json` doc comments still say "three `payload` oneof cases" (`main.rs:1729,1755`) although the proto now has four; folded into Client.Rust-030's fix. The TLS doc inaccuracy is Client.Rust-031. |
## Findings
### Client.Rust-001
| Field | Value |
|---|---|
| Severity | High |
| Category | mxaccessgw conventions |
| Location | `clients/rust/src/options.rs:98,143` |
| Status | Resolved |
**Description:** `with_max_grpc_message_bytes` and `max_grpc_message_bytes` have no `///` doc comments. The crate sets `#![warn(missing_docs)]` and CLAUDE.md mandates that `cargo clippy --workspace --all-targets -- -D warnings` pass. Under `-D warnings` these become hard errors, so clippy fails to compile the crate — breaking the documented build/test workflow for the module.
**Recommendation:** Add doc comments to both methods, e.g. `/// Maximum encoded/decoded gRPC message size in bytes (default 16 MiB).`
**Resolution:** Resolved in `0d8a28d` (2026-05-18): doc comments added to both methods.
### Client.Rust-002
| Field | Value |
|---|---|
| Severity | High |
| Category | mxaccessgw conventions |
| Location | `clients/rust/src/session.rs:522` |
| Status | Resolved |
**Description:** The `BulkReplyKind` enum's variants (`AddItemBulk`, `AdviseItemBulk`, `RemoveItemBulk`, `UnAdviseItemBulk`, `SubscribeBulk`, `UnsubscribeBulk`) all share the `Bulk` suffix, tripping `clippy::enum_variant_names`. Under `-D warnings` this is a compile error, so `cargo clippy --workspace --all-targets -- -D warnings` fails — a violation of the CLAUDE.md requirement that clippy pass cleanly.
**Recommendation:** Rename the variants to drop the common suffix (e.g. `AddItem`, `AdviseItem`, …) or add a narrowly-scoped `#[allow(clippy::enum_variant_names)]` with a reason comment.
**Resolution:** Resolved in `0d8a28d` (2026-05-18): variants renamed to `AddItem`/`AdviseItem`/`RemoveItem`/`UnAdviseItem`/`Subscribe`/`Unsubscribe`, which no longer share a common suffix.
### Client.Rust-003
| Field | Value |
|---|---|
| Severity | High |
| Category | Correctness & logic bugs |
| Location | `clients/rust/crates/mxgw-cli/src/main.rs:1051` |
| Status | Resolved |
**Description:** The unit test `version_json_output_has_protocol_versions` asserts `value["gatewayProtocolVersion"] == 2`, but `GATEWAY_PROTOCOL_VERSION` is `3` (version.rs:10), matching the authoritative server constant `GatewayContractInfo.GatewayProtocolVersion = 3`. The test fails, so `cargo test --workspace` (the documented test step) does not pass — the test was not updated when the protocol version was bumped.
**Recommendation:** Update the assertion to `3`, or better, assert against `GATEWAY_PROTOCOL_VERSION` so it cannot drift again.
**Resolution:** Resolved in `0d8a28d` (2026-05-18): the test now asserts against the `GATEWAY_PROTOCOL_VERSION` / `WORKER_PROTOCOL_VERSION` constants, so it cannot drift again.
### Client.Rust-004
| Field | Value |
|---|---|
| Severity | Low |
| Category | Documentation & comments |
| Location | `clients/rust/src/version.rs:7` |
| Status | Resolved |
**Description:** `CLIENT_VERSION` is `"0.1.0-dev"` and its doc comment claims "Mirrors `Cargo.toml`", but `Cargo.toml` declares `version = "0.1.0"` (no `-dev` suffix). The comment is misleading and the value is not actually kept in sync with the manifest.
**Recommendation:** Either set `CLIENT_VERSION` from the build via `env!("CARGO_PKG_VERSION")`, or correct the constant to `"0.1.0"` and drop the "Mirrors Cargo.toml" claim.
**Resolution:** Resolved in `0d8a28d` (2026-05-18): `CLIENT_VERSION` is now `env!("CARGO_PKG_VERSION")`, taken from `Cargo.toml` at compile time so the two cannot drift.
### Client.Rust-005
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Correctness & logic bugs |
| Location | `clients/rust/src/session.rs:489-520` |
| Status | Resolved |
**Description:** `register_server_handle`, `add_item_handle`, and `add_item2_handle` fall through to `reply.return_value … .unwrap_or_default()`, returning `0` when the reply carries neither the expected typed payload nor an `Int32` `return_value`. Because `Session::invoke` has already confirmed `protocol_status == Ok`, a malformed-but-OK reply silently yields handle `0`, which the caller then uses as a real handle against the worker.
**Recommendation:** Return `Err(Error::ProtocolStatus { … })` (or a dedicated `Error::MalformedReply`) when an OK reply lacks an extractable handle, instead of defaulting to `0`.
**Resolution:** Resolved in `0d8a28d` (2026-05-18): the three handle extractors now return `Result<i32, Error>` and yield the new `Error::MalformedReply` when an OK reply carries no usable handle.
### Client.Rust-006
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Error handling & resilience |
| Location | `clients/rust/src/session.rs:531-555` |
| Status | Resolved |
**Description:** `bulk_results` returns `Vec::new()` for any `(payload, kind)` combination that does not match the expected arm — including an OK reply carrying the wrong or no payload. A caller of `subscribe_bulk`/`add_item_bulk` then sees an empty result vector and cannot distinguish "zero items processed" from "gateway returned a shapeless reply".
**Recommendation:** Treat a missing/mismatched bulk payload on an OK reply as an error rather than an empty vector, or document the empty-vec fallback explicitly and log it.
**Resolution:** Resolved in `0d8a28d` (2026-05-18): `bulk_results` now returns `Result<Vec<SubscribeResult>, Error>` and yields `Error::MalformedReply` on a mismatched or absent bulk payload.
### Client.Rust-007
| Field | Value |
|---|---|
| Severity | Low |
| Category | Design-document adherence |
| Location | `clients/rust/RustClientDesign.md:14-55` |
| Status | Resolved |
**Description:** `RustClientDesign.md` is stale relative to the implemented code. It documents a nested `crates/mxgateway-client/` layout (the real crate root is `clients/rust/` with a flat `src/`), and lists `tracing` among "Expected dependencies", but `tracing` appears in no `Cargo.toml`. CLAUDE.md requires docs to change with the source.
**Recommendation:** Update `RustClientDesign.md` to the actual flat layout and remove `tracing` from the dependency list (or add `tracing` if structured logging is genuinely intended).
**Resolution:** Resolved in `0d8a28d` (2026-05-18): the "Crate Layout" section now shows the actual flat layout (`mxgateway-client` as the workspace-root crate, `mxgw-cli` as a member) and the unused `tracing` entry was removed from the dependency list.
### Client.Rust-008
| Field | Value |
|---|---|
| Severity | Low |
| Category | Performance & resource management |
| Location | `clients/rust/src/value.rs:161-261` |
| Status | Resolved |
**Description:** `MxValueProjection::from_proto` and `MxArrayProjection::from_proto` deep-clone every element out of the wire message while `MxValue`/`MxArrayValue` also retain the original `raw` message. Every `MxValue` therefore holds two copies of its payload, wasteful for large string arrays or raw blobs arriving on the event stream.
**Recommendation:** Compute the projection lazily on demand, or have the projection borrow from `raw`, so array/raw payloads are not duplicated for every wrapped value.
**Resolution:** Resolved in `0d8a28d` (2026-05-18): `MxValue` and `MxArrayValue` no longer cache a `projection` field — `projection()` computes the typed view on demand from `raw`. A value built only to be sent over the wire now holds a single copy of its payload and pays no projection cost.
### Client.Rust-009
| Field | Value |
|---|---|
| Severity | Low |
| Category | Testing coverage |
| Location | `clients/rust/tests/client_behavior.rs`, `clients/rust/src/galaxy.rs` |
| Status | Resolved |
**Description:** Several critical paths are untested: TLS channel setup (`with_plaintext(false)` / CA-file loading), mid-stream `tonic::Status` fault propagation through `EventStream`/`DeployEventStream` (tests only send `Ok` items), and the bulk-size cap (`ensure_bulk_size` rejecting >1000 items).
**Recommendation:** Add tests that (a) feed an `Err(Status)` into the event/deploy streams and assert it surfaces as the mapped `Error`, (b) assert `add_item_bulk` with 1001 items returns `Error::InvalidArgument`, and (c) exercise the CA-file/`InvalidEndpoint` error path.
**Resolution:** Resolved in `0d8a28d` (2026-05-18): added `add_item_bulk_rejects_input_above_the_thousand_item_cap`, `event_stream_surfaces_a_mid_stream_status_fault` (the fake gateway now optionally emits a mid-stream `Status::unavailable`), and `connect_with_unreadable_ca_file_reports_invalid_endpoint`.
### Client.Rust-010
| Field | Value |
|---|---|
| Severity | Low |
| Category | Error handling & resilience |
| Location | `clients/rust/src/client.rs:255-268`, `clients/rust/src/galaxy.rs:204-216` |
| Status | Resolved |
**Description:** The client applies only a per-call deadline via `Request::set_timeout` and has no retry, reconnect, or transient-vs-permanent classification. A transient `Unavailable` (e.g. a gateway restart) maps to the catch-all `Error::Status` and is indistinguishable from a permanent failure. This is an acceptable v1 stance but is undocumented.
**Recommendation:** Either add a documented `Error::Unavailable` variant classifying `Code::Unavailable`/`Code::ResourceExhausted`, or explicitly document in the README that the client performs no retries and that transient failures arrive as `Error::Status`.
**Resolution:** Resolved in `0d8a28d` (2026-05-18): added the `Error::Unavailable` variant; `From<tonic::Status>` maps `Code::Unavailable` and `Code::ResourceExhausted` to it, so callers can classify transient failures without unwrapping the raw status.
### Client.Rust-011
| Field | Value |
|---|---|
| Severity | Low |
| Category | mxaccessgw conventions |
| Location | `clients/rust/src/session.rs:469` |
| Status | Resolved |
**Description:** `command_request` hard-codes `client_correlation_id` as `format!("rust-client-{}", kind.as_str_name())`. Every invocation of the same command kind on a session uses an identical correlation id, so the id cannot correlate a specific request/reply pair in gateway logs or among concurrent in-flight calls. MXAccess parity diagnostics rely on correlation ids being unique per call.
**Recommendation:** Append a per-call unique suffix (monotonic counter or UUID) to the correlation id, or expose a way for the caller to supply one.
**Resolution:** Resolved in `0d8a28d` (2026-05-18): correlation ids are built by `next_correlation_id`, which appends a process-wide atomic sequence number; `Session::close` uses it too.
### Client.Rust-012
| Field | Value |
|---|---|
| Severity | High |
| Category | mxaccessgw conventions |
| Location | `clients/rust/src/galaxy.rs:282` |
| Status | Resolved |
**Description:** Found while verifying the fix for Client.Rust-001/002: `cargo clippy --workspace --all-targets -- -D warnings` reported a third violation the original review missed. The `get_last_deploy_time` test fake calls `.clone()` on a `MutexGuard<Option<prost_types::Timestamp>>`, and `Option<Timestamp>` is `Copy` (`clippy::clone_on_copy`). Under `-D warnings` this is a compile error, so clippy still did not pass after Client.Rust-001/002 alone.
**Recommendation:** Dereference instead of cloning: `*self.state.last_deploy.lock().unwrap()`.
**Resolution:** Resolved in `0d8a28d` (2026-05-18): replaced `.clone()` with a deref. `cargo clippy --workspace --all-targets -- -D warnings` now passes cleanly.
### Client.Rust-013
| Field | Value |
|---|---|
| Severity | High |
| Category | mxaccessgw conventions |
| Location | `src/MxGateway.Contracts/Protos/mxaccess_gateway.proto:414-424` (origin); `clients/rust/src/generated.rs:11-31` (suppression site) |
| Status | Resolved |
**Description:** `cargo clippy --workspace --all-targets -- -D warnings` fails again on this commit, this time on a `clippy::doc_lazy_continuation` violation in generated code:
```
error: doc list item without indentation
--> .../mxaccess_gateway.v1.rs:526:5
|
526 | /// `timeout_ms == 0` uses the gateway-configured default (1000 ms).
| ^
```
The lint fires because the `ReadBulkCommand` proto comment (added with the bulk Read feature in commit `5e375f6`) writes a bulleted list and then a trailing paragraph without the required blank line. prost-build forwards the proto comment verbatim into Rust doc comments, and the Rust client compiles those generated modules with crate-default lints. The crate already opts out of `clippy::large_enum_variant` in `src/generated.rs` for exactly this kind of generator-style problem, but `doc_lazy_continuation` is not on the allow-list, so the lint reaches `-D warnings` and breaks the documented `cargo clippy --workspace --all-targets -- -D warnings` invocation that CLAUDE.md mandates pass. The Rust client review was previously closed as clippy-clean (Client.Rust-001/002/012); this is the third clippy-clean regression caused by generated code in this module and warrants a more durable fix.
**Recommendation:** Add `#![allow(clippy::doc_lazy_continuation)]` to each generated submodule in `clients/rust/src/generated.rs` alongside `clippy::large_enum_variant`, so generated doc comments — which the client cannot edit — cannot break the `-D warnings` build. Independently, fix the upstream proto comment to insert a blank line before the trailing paragraph so the C# / Go / Python / Java generators do not carry the same flaky text.
**Resolution:** 2026-05-20 — Added `#![allow(clippy::doc_lazy_continuation)]` to each generated submodule in `clients/rust/src/generated.rs` next to the existing `clippy::large_enum_variant` allow, and reformatted the `ReadBulkCommand` proto comment in `src/MxGateway.Contracts/Protos/mxaccess_gateway.proto` to surround the bulleted list with blank lines so doc-comment generators in every language see a properly-terminated list. `cargo clippy --workspace --all-targets -- -D warnings` and `cargo test --workspace` now pass, and `dotnet build src/MxGateway.Contracts/MxGateway.Contracts.csproj` reports 0 warnings.
### Client.Rust-014
| Field | Value |
|---|---|
| Severity | Low |
| Category | mxaccessgw conventions |
| Location | `clients/rust/crates/mxgw-cli/src/main.rs:450,497` |
| Status | Resolved |
**Description:** Client.Rust-011 made `Session` build unique correlation ids per call, but the `mxgw` CLI's `Ping` and `CloseSession` subcommands still hard-code `client_correlation_id: "rust-cli-ping".to_owned()` and `"rust-cli-close-session".to_owned()`. Both go through `client.invoke(…)` / `client.close_session_raw(…)` rather than the `Session` helpers, so the library's id generator does not run. The CLI is the cross-language e2e driver — when the same machine runs concurrent CLI smokes, every `ping`/`close-session` request collides on the same correlation id in gateway logs, defeating the diagnostic value the library fix unlocked.
**Recommendation:** Either (a) expose `session::next_correlation_id` as a `pub(crate)` or library-level helper and have the CLI call it from `Ping`/`CloseSession`, or (b) replace these RPCs with the higher-level `Session` helpers (`Session::close`, and a thin `Session::ping` wrapper) so the CLI shares the library's correlation-id discipline by construction.
**Resolution:** 2026-05-20 — Promoted `session::next_correlation_id` from a module-private helper to a `pub` library-level function (it already lived in the `pub mod session`) and updated the `mxgw` CLI's `Ping` and `CloseSession` subcommands to call `mxgateway_client::session::next_correlation_id("cli-ping")` / `next_correlation_id("cli-close-session")` instead of the hard-coded `"rust-cli-ping"` / `"rust-cli-close-session"` strings. Concurrent CLI smokes now produce unique correlation ids per call — driven by the same process-wide `CORRELATION_SEQUENCE` `AtomicU64` the library uses — so gateway logs can tell collisions apart again. `cargo fmt`, `cargo build --workspace`, `cargo clippy --workspace --all-targets -- -D warnings`, and `cargo test --workspace` all pass.
### Client.Rust-015
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Error handling & resilience |
| Location | `clients/rust/crates/mxgw-cli/src/main.rs:1053-1070` |
| Status | Resolved |
**Description:** The new cross-language benchmark `bench-read-bulk` pushes the elapsed time of every `read_bulk` call into `latencies_ms` regardless of whether the call returned `Ok` or `Err`:
```rust
let outcome = session.read_bulk(server_handle, &tags, timeout_ms).await;
let elapsed_ms = call_start.elapsed().as_secs_f64() * 1000.0;
latencies_ms.push(elapsed_ms);
match outcome {
Ok(results) => { successful_calls += 1; }
Err(_) => failed_calls += 1,
}
```
A failed `read_bulk` (transient `Unavailable`, deadline-exceeded mid-call, etc.) typically returns *later* than a successful one — it includes the full per-call timeout that the success path never waits for. The histogram therefore conflates "p99 cached-read latency" with "p99 of (cached-read + timed-out call)", and the JSON document the PowerShell driver collates publishes `latencyMs.p99` / `latencyMs.max` that no longer represent successful-call latency. Worse, the failure category is silently dropped (`Err(_) => failed_calls += 1`) so a benchmark run that fails on every call still emits a coherent-looking JSON without ever surfacing why. This is misleading for a benchmark whose JSON shape is the cross-language comparison contract.
**Recommendation:** Only push elapsed time into `latencies_ms` on `Ok`, or split into two histograms (`successLatencyMs` and `failureLatencyMs`) and log the first failure's error string into the stats record so a partial-failure run is visible at the report layer.
**Resolution:** 2026-05-20 — Extracted the per-iteration accounting in `bench-read-bulk` into a `BenchReadBulkStats` helper with explicit `record_success`/`record_failure` methods. Successful `read_bulk` calls now flow into `success_latencies_ms` (driving the cross-language `latencyMs.p99`/`max` JSON contract), failures flow into a separate `failure_latencies_ms` histogram surfaced as `failureLatencyMs`, and the first failure's redacted error string is stashed as `firstFailure` so a partial-failure run is visible at the report layer instead of producing a coherent-looking JSON that hides every error. Added a unit test (`bench_read_bulk_stats_keeps_failures_out_of_success_latency_histogram`) that records two fast successes plus a deliberately slow failure and asserts the success histogram never sees the failure latency, plus a smaller smoke test for the zero-duration calls-per-second path.
### Client.Rust-016
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Testing coverage |
| Location | `clients/rust/tests/client_behavior.rs`, `clients/rust/src/session.rs:489-519,654-768` |
| Status | Resolved |
**Description:** The fixes for Client.Rust-005 / 006 added five new `Error::MalformedReply` paths to `session.rs` (`register_server_handle`, `add_item_handle`, `add_item2_handle`, `bulk_results`, `bulk_write_results`) plus the inline branch in `read_bulk`. None of them are exercised by tests — every test in `client_behavior.rs` feeds the matching payload back to the client, so the malformed-reply branches are dead code from the test suite's perspective. The new bulk-write helpers (`write_bulk`, `write2_bulk`, `write_secured_bulk`, `write_secured2_bulk`) only have a single happy-path assertion via `write_bulk`, leaving the three other variants and every per-entry-failure shape untested. The bench-read-bulk flow has no test (the driver script is the only consumer). The `Error::Unavailable` variant from Client.Rust-010 is covered by `event_stream_surfaces_a_mid_stream_status_fault`, but the same variant on a unary `Code::Unavailable` is not.
**Recommendation:** Add three light tests against the existing `FakeGateway`:
1. Have the fake reply to `AddItem` (or `Register` / `AddItem2`) with `protocol_status = Ok` and no payload, and assert the client surfaces `Error::MalformedReply`.
2. Have the fake reply to `WriteBulk` with `protocol_status = Ok` and the wrong payload arm (e.g. an `AddItemReply` body), and assert `Error::MalformedReply`.
3. Have the fake fail the unary `Invoke` with `Status::unavailable(...)` and assert `Error::Unavailable`.
Optionally add Write2Bulk / WriteSecuredBulk / WriteSecured2Bulk smoke assertions so all four bulk-write families have at least one round-trip test.
**Resolution:** 2026-05-20 — Added eight new integration tests in `clients/rust/tests/client_behavior.rs`. Each new `Error::MalformedReply` site is exercised via a test-only `InvokeOverride` injected into `FakeState` that lets a single test pin the fake gateway's `Invoke` handler to one of three malformed shapes (OK reply with no payload, OK reply with the wrong payload arm for `read_bulk`, OK reply with the wrong payload arm for the other bulk / bulk-write families): `register_returns_malformed_reply_when_ok_reply_has_no_payload`, `add_item_returns_malformed_reply_when_ok_reply_has_no_payload`, `add_item2_returns_malformed_reply_when_ok_reply_has_no_payload`, `subscribe_bulk_returns_malformed_reply_on_mismatched_payload_arm`, `write_bulk_returns_malformed_reply_on_mismatched_payload_arm`, and `read_bulk_returns_malformed_reply_on_mismatched_payload_arm`. The unary `Error::Unavailable` path is covered by `unary_invoke_maps_status_unavailable_to_error_unavailable` (the override returns `Status::unavailable(...)`). The remaining three bulk-write families gained round-trip smoke tests — `write2_bulk_round_trips_through_the_fake_gateway`, `write_secured_bulk_round_trips_through_the_fake_gateway`, `write_secured2_bulk_round_trips_through_the_fake_gateway` — extending the fake gateway's dispatcher with happy-path replies for `Write2Bulk` / `WriteSecuredBulk` / `WriteSecured2Bulk`. The `bench-read-bulk` flow gets a `BenchReadBulkStats` unit test in `crates/mxgw-cli/src/main.rs` (see Client.Rust-015) that asserts the latency-tracking change keeps failed-call durations out of `latencyMs`.
### Client.Rust-017
| Field | Value |
|---|---|
| Severity | Low |
| Category | Design-document adherence |
| Location | `clients/rust/RustClientDesign.md:79-99,156-163` |
| Status | Resolved |
**Description:** CLAUDE.md requires docs to change with the source. `RustClientDesign.md` was refreshed to fix the layout/`tracing` drift (Client.Rust-007), but the Session API surface in the design (`Library API` block, lines 79-99) still lists only the original six bulk helpers — `add_item_bulk`, `advise_item_bulk`, `remove_item_bulk`, `un_advise_item_bulk`, `subscribe_bulk`, `unsubscribe_bulk` — and is missing the five new bulk-write helpers and `read_bulk` (`write_bulk`, `write2_bulk`, `write_secured_bulk`, `write_secured2_bulk`, `read_bulk`) that landed in commits `5e375f6` / `f220908` / `61644e6`. The `Error Handling` block (lines 130-146) still enumerates `Transport`, `Status`, `Authentication`, `Authorization`, `Session`, `Worker`, `Command`, `MxAccess`, `Timeout`, `Cancelled` — but not `MalformedReply`, `Unavailable`, or `InvalidEndpoint`, all of which are now public variants of the crate's `Error` enum. The `Test CLI` block (lines 158-163) lists `version` / `smoke` / `stream-events` / `write` but is missing every new subcommand (`read-bulk`, `write-bulk`, `write2-bulk`, `write-secured-bulk`, `write-secured2-bulk`, `bench-read-bulk`, `galaxy watch`).
**Recommendation:** Bring the design doc back in sync: extend the `Session` API code block to enumerate the bulk-write/read methods, expand the `Error` enum to match `clients/rust/src/error.rs`, and add the missing CLI subcommands. The README is already up to date, so this is design-doc-only churn.
**Resolution:** 2026-05-20 — Brought `clients/rust/RustClientDesign.md` back in sync with the implementation. The `Session` block now lists the five new bulk helpers (`write_bulk`, `write2_bulk`, `write_secured_bulk`, `write_secured2_bulk`, `read_bulk`) alongside the original six and notes that `session::next_correlation_id` is `pub` for raw-RPC consumers (the CLI). The `Error` enum block now matches `clients/rust/src/error.rs``InvalidEndpoint`, `InvalidArgument`, `Transport`, `Authentication`, `Authorization`, `Timeout`, `Cancelled`, `Unavailable`, `Status`, `Command`, `ProtocolStatus`, `MalformedReply` — with a short paragraph explaining what `Unavailable`, `MalformedReply`, and `InvalidEndpoint` classify. The `Test CLI` block enumerates every subcommand the binary exposes today: `version`, `ping`, `open-session`, `close-session`, `register`, `add-item`, `advise`, `subscribe-bulk`, `unsubscribe-bulk`, `read-bulk`, `write`, `write2`, `write-bulk`, `write2-bulk`, `write-secured-bulk`, `write-secured2-bulk`, `stream-events`, `bench-read-bulk`, `smoke`, and the `galaxy {test-connection,last-deploy-time,discover-hierarchy,watch}` subtree.
### Client.Rust-018
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Error handling & resilience |
| Location | `clients/rust/crates/mxgw-cli/src/main.rs:1098-1170`; `scripts/bench-read-bulk.ps1:347-365`; siblings: `clients/go/cmd/mxgw-go/main.go:600-648`, `clients/python/src/mxgateway_cli/commands.py:614-662`, `clients/dotnet/MxGateway.Client.Cli/MxGatewayClientCli.cs:685-770`, `clients/java/mxgateway-cli/src/main/java/com/dohertylan/mxgateway/cli/MxGatewayCli.java:855-940` |
| Status | Resolved |
**Description:** Client.Rust-015's resolution split Rust's bench histogram so `latencyMs` records only successful `read_bulk` calls and a new `failureLatencyMs` field holds failed-call durations. The local logic is right, the unit test (`bench_read_bulk_stats_keeps_failures_out_of_success_latency_histogram`) is right, and the JSON shape stays additively compatible with `scripts/bench-read-bulk.ps1` (the collator reads `$s.latencyMs.p50`/`p95`/`p99`/`max`/`mean` and these keys still exist on the Rust output). The problem is cross-language: the .NET, Go, Python, and Java bench implementations still push every call's elapsed time into a single `latenciesMs` / `latencies_ms` / `latencyMillis` array regardless of success or failure (e.g. `clients/go/cmd/mxgw-go/main.go:611` appends before the success/failure branch; `clients/python/src/mxgateway_cli/commands.py:624,626` appends in both `except` and the happy path; `clients/dotnet/MxGateway.Client.Cli/MxGatewayClientCli.cs:701,705` adds in both `catch` and the OK path; `clients/java/mxgateway-cli/src/main/java/com/dohertylan/mxgateway/cli/MxGatewayCli.java:865,880` records in both branches). The PS driver's side-by-side comparison table (lines 348-360) pulls `latencyMs.p50/p95/p99/max/mean` from every client and prints them in one row, so a partial-failure run now shows Rust's p99 measured over successes only and the other four clients' p99 measured over (success + per-call timeout) — the numbers are silently no longer comparable. This re-introduces the original Client.Rust-015 problem at the cross-language layer that the fix was meant to remove.
**Recommendation:** Make the contract uniform. Either (a) revert Rust's `latencyMs` to the all-calls histogram for backwards/cross-language compatibility and keep `failureLatencyMs` as an additive Rust-only enrichment, or (b) push the same success-only / failure-separated split into the .NET, Go, Python, and Java bench commands so every language emits the honest pair (`latencyMs` = success, `failureLatencyMs` = failure, plus `firstFailure`) and update the PS driver's table column to make the success-only semantics explicit (`p99 ok ms`). Option (b) is the better long-term posture but it is a cross-client change; option (a) restores comparability immediately.
**Resolution:** 2026-05-20 — Took option (a) to restore cross-language comparability immediately. Reverted Rust's `latencyMs` to the all-calls histogram so it matches the .NET/Go/Python/Java bench shape that `scripts/bench-read-bulk.ps1` collates side-by-side: `BenchReadBulkStats::record_success` and `record_failure` now both push elapsed time into a single `latencies_ms` vector, and `record_failure` additionally pushes into `failure_latencies_ms` and stashes the first failure's redacted error string in `first_failure`. The JSON output keeps `failureLatencyMs` and `firstFailure` as Rust-only additive enrichment so a partial-failure run is still visible at the report layer without breaking the side-by-side table. Renamed the unit test to `bench_read_bulk_stats_tracks_all_calls_in_latency_and_failures_separately`; it now asserts `latencyMs.max == 1500.0` (the slow failure is included in the cross-language `latencyMs` contract) while `failureLatencyMs.max == 1500.0` and `firstFailure` still surface the failure separately for diagnostics. Pushing the success-only / failure-separated split into the other four clients (option (b)) is the better long-term posture but is deliberately out of scope here.
### Client.Rust-019
| Field | Value |
|---|---|
| Severity | Low |
| Category | Design-document adherence |
| Location | `clients/rust/RustClientDesign.md:96-100` |
| Status | Resolved |
**Description:** Client.Rust-017 was closed by adding the new bulk-write/read entries to the design doc, but the signatures shown in the code block do not match the implementation. The doc declares:
```rust
pub async fn write_bulk(&self, server_handle: i32, entries: Vec<WriteBulkEntry>, user_id: i32) -> Result<Vec<BulkWriteResult>, Error>;
pub async fn write2_bulk(&self, server_handle: i32, entries: Vec<Write2BulkEntry>, timestamp: prost_types::Timestamp, user_id: i32) -> Result<Vec<BulkWriteResult>, Error>;
pub async fn write_secured_bulk(&self, server_handle: i32, entries: Vec<WriteSecuredBulkEntry>, current_user_id: i32, verifier_user_id: i32) -> Result<Vec<BulkWriteResult>, Error>;
pub async fn write_secured2_bulk(&self, server_handle: i32, entries: Vec<WriteSecured2BulkEntry>, timestamp: prost_types::Timestamp, current_user_id: i32, verifier_user_id: i32) -> Result<Vec<BulkWriteResult>, Error>;
pub async fn read_bulk(&self, server_handle: i32, tags: &[String], timeout_ms: u32) -> Result<Vec<ReadBulkResult>, Error>;
```
The actual implementations in `clients/rust/src/session.rs:385-526` take only `(server_handle, entries)``user_id` is per-entry on `WriteBulkEntry`/`Write2BulkEntry`, `timestamp_value` is per-entry on `Write2BulkEntry`/`WriteSecured2BulkEntry`, and `current_user_id`/`verifier_user_id` are per-entry on `WriteSecured{,2}BulkEntry`. The protobuf in `src/MxGateway.Contracts/Protos/mxaccess_gateway.proto:364-416` confirms this — there is no top-level `user_id` on these commands. The doc also returns `Vec<ReadBulkResult>` but the generated type is `BulkReadResult` (the gateway's `BulkReadReply` carries `repeated BulkReadResult`), and the actual signature is `read_bulk<S: AsRef<str>>(..., tag_addresses: &[S], ...) -> Vec<BulkReadResult>` — generic over `AsRef<str>` so callers can pass either `Vec<String>` or `[&str]`.
The drift is small but the design doc was the explicit subject of Client.Rust-017's resolution, so it warrants a follow-up. CLAUDE.md requires docs to change with the source.
**Recommendation:** Replace the five signatures in `RustClientDesign.md:96-100` with the ones actually in `session.rs`:
```rust
pub async fn write_bulk(&self, server_handle: i32, entries: Vec<WriteBulkEntry>) -> Result<Vec<BulkWriteResult>, Error>;
pub async fn write2_bulk(&self, server_handle: i32, entries: Vec<Write2BulkEntry>) -> Result<Vec<BulkWriteResult>, Error>;
pub async fn write_secured_bulk(&self, server_handle: i32, entries: Vec<WriteSecuredBulkEntry>) -> Result<Vec<BulkWriteResult>, Error>;
pub async fn write_secured2_bulk(&self, server_handle: i32, entries: Vec<WriteSecured2BulkEntry>) -> Result<Vec<BulkWriteResult>, Error>;
pub async fn read_bulk<S: AsRef<str>>(&self, server_handle: i32, tag_addresses: &[S], timeout_ms: u32) -> Result<Vec<BulkReadResult>, Error>;
```
and add a one-line note that the per-entry fields (`user_id`, `timestamp_value`, `current_user_id`, `verifier_user_id`) live on the entry structs themselves.
**Resolution:** 2026-05-20 — Replaced the five drifted signatures in `RustClientDesign.md` with the ones that actually live in `clients/rust/src/session.rs`: `write_bulk` / `write2_bulk` / `write_secured_bulk` / `write_secured2_bulk` take only `(server_handle, entries)`, and `read_bulk<S: AsRef<str>>` takes a borrowed `&[S]` and returns `Vec<BulkReadResult>` (not `Vec<ReadBulkResult>`). Added a follow-up paragraph noting that the per-entry fields `user_id` / `timestamp_value` / `current_user_id` / `verifier_user_id` live on `WriteBulkEntry` / `Write2BulkEntry` / `WriteSecuredBulkEntry` / `WriteSecured2BulkEntry` themselves rather than as trailing positional arguments, matching the protobuf shapes in `mxaccess_gateway.proto`, and that `read_bulk` is generic over `AsRef<str>` so callers can pass `&[String]` or `&[&str]` without cloning at the call site.
### Client.Rust-020
| Field | Value |
|---|---|
| Severity | Low |
| Category | Documentation & comments |
| Location | `clients/rust/src/session.rs:31-46`; `clients/rust/src/lib.rs:14-39` |
| Status | Resolved |
**Description:** Client.Rust-014's resolution promoted `next_correlation_id` from a module-private helper to a `pub` function so the `mxgw` CLI's raw-RPC paths can share the library's correlation-id discipline. The doc comment commits the library to a literal string format — `"rust-client-{label}-{N}"` — that external code can now depend on. Two concerns:
1. The function is not re-exported at the crate root in `lib.rs` (it only ships through the `pub mod session` namespace), so the in-tree caller writes the long `mxgateway_client::session::next_correlation_id("cli-ping")` path. Either re-export it via `#[doc(inline)] pub use session::next_correlation_id;` or leave it where it is and add a short note in the doc — but the current state straddles "public API" and "lib-internal helper" without saying which.
2. The doc comment does not declare a stability stance (no `#[doc(hidden)]`, no "experimental" note, no `__priv` naming). As written it promises the literal format `"rust-client-{label}-{N}"` to any out-of-tree consumer; a future change that renames the prefix (for example to drop the `rust-` after a multi-client reformat) would be a behavioural break. The `RustClientDesign.md` resolution of Client.Rust-017 ("`session::next_correlation_id` is `pub`") reads similarly — it does not say whether the format is stable.
The combination — `pub`, format-committing doc, no stability note, no crate-root re-export — leaves the public surface ambiguous. The same review category (Documentation & comments) is where Client.Rust-014's CLI-side fix is now visible, so this is the natural place to clean it up.
**Recommendation:** Pick one of:
- Treat `next_correlation_id` as part of the SDK's public API. Re-export it from `lib.rs` (`#[doc(inline)] pub use session::next_correlation_id;`), rewrite the doc comment to *not* promise the literal `"rust-client-{label}-{N}"` format (just the property "monotonic, unique within a process, includes the supplied label"), and call that out in `RustClientDesign.md`.
- Treat it as internal-only. Mark it `#[doc(hidden)] pub` and add a `// Internal helper exposed for the in-tree `mxgw` CLI; not part of the public SDK contract.` comment so out-of-tree consumers do not build against a format that the SDK is free to change.
The CLI integration in Client.Rust-014 works either way; this is solely about declaring intent so the SDK's public surface is unambiguous.
**Resolution:** 2026-05-20 — Took the "treat as public SDK API" branch. Re-exported `next_correlation_id` at the crate root in `clients/rust/src/lib.rs` (`#[doc(inline)] pub use session::{next_correlation_id, Session};`) so in-tree and external callers can write the short `mxgateway_client::next_correlation_id(...)` path. Updated the in-tree `mxgw` CLI (`Ping` and `CloseSession` subcommands) to call through the crate-root re-export instead of `mxgateway_client::session::next_correlation_id`. Rewrote the doc comment to drop the format promise: the returned id is now documented as an opaque token with three guaranteed properties (embeds the supplied `label`, unique within a process via an internal monotonic atomic sequence, carries no embedded secret beyond `label`), and the doc explicitly states that the textual format `rust-client-{label}-{N}` is *not* part of the public contract and that callers must not parse it. Cross-referenced the crate-root re-export from the function-level doc. Updated `RustClientDesign.md` to call out that `next_correlation_id` is part of the public SDK surface, re-exported at the crate root, and that its textual format is intentionally not part of the contract.
### Client.Rust-021
| Field | Value |
|---|---|
| Severity | Low |
| Category | Design-document adherence |
| Location | `clients/rust/RustClientDesign.md:14-33` |
| Status | Resolved |
**Description:** The crate-name change in commit `397d3c5` (top-level `mxgateway-client``zb-mom-ww-mxgateway-client`) is reflected in `Cargo.toml`, `Cargo.lock`, every `use zb_mom_ww_mxgateway_client::` import, and `build.rs`. The "Recommended layout" block in `RustClientDesign.md:21-33` shows a nested structure with a top-level `crates/zb-mom-ww-mxgateway-client/` subdirectory containing `src/lib.rs`, `src/client.rs`, etc. — but the actual layout on disk is flat: the top-level crate lives at `clients/rust/` directly (with `src/`, `Cargo.toml`, and `build.rs` at the workspace root) and `crates/mxgw-cli/` is the only nested member. A reader consulting the design to understand the layout will be misled into looking for `crates/zb-mom-ww-mxgateway-client/` that does not exist. `CLAUDE.md` requires design docs to track code changes.
**Recommendation:** Update `RustClientDesign.md:14-33` to describe the actual flat layout: workspace root at `clients/rust/`, top-level crate `zb-mom-ww-mxgateway-client` (declared in `Cargo.toml`) with `src/lib.rs`, `src/client.rs`, etc. directly under it; `crates/mxgw-cli/` is the single member subcrate. Alternatively label the block as "Aspirational nested layout (not currently adopted)" and add a separate "Current layout" section.
**Resolution:** 2026-05-24 — Rewrote the "Crate Layout" section of `clients/rust/RustClientDesign.md` to describe the actual flat layout on disk instead of the aspirational nested form. The section now opens with a short paragraph stating that the workspace is rooted at `clients/rust/`, that `clients/rust/Cargo.toml` declares the top-level crate `zb-mom-ww-mxgateway-client` itself (flat layout — `src/` sits directly under the workspace root, not under `crates/`), and that the only `[workspace.members]` entry is the `mxgw-cli` binary subcrate under `crates/mxgw-cli/`. The accompanying tree replaces the fictitious `crates/zb-mom-ww-mxgateway-client/` subdirectory with the real files: `Cargo.toml`, `Cargo.lock`, `build.rs`, `README.md`, `RustClientDesign.md`, `src/{lib,client,session,galaxy,options,auth,error,value,version,generated}.rs` plus `src/generated/` (tonic-build output), `tests/{client_behavior,proto_fixtures}.rs`, and `crates/mxgw-cli/` annotated as the sole workspace member producing the `mxgw` binary. `cargo build --workspace` and `cargo test --workspace` are clean at HEAD (29 tests pass across the library, integration, and proto-fixture targets). `cargo clippy --workspace --all-targets -- -D warnings` reports three pre-existing errors at HEAD on this dev box (`options.rs:98,143` missing docs, `galaxy.rs:282` `clone_on_copy`, `session.rs:522` `enum_variant_names`) that exist independently of this doc-only change — verified by running clippy with the design-doc edit stashed; tracking those is out of scope for Client.Rust-021.
### Client.Rust-022
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Correctness & logic bugs |
| Location | `clients/rust/src/session.rs:369-391,403-420,427-444,452-469,476-493,631-696,706-724` |
| Status | Resolved |
**Description:** Commit `3251069` re-introduced the bulk read/write SDK methods (`read_bulk`, `write_bulk`, `write2_bulk`, `write_secured_bulk`, `write_secured2_bulk`) on `Session`. Each method falls back to `Vec::new()` when an OK reply does not carry the expected typed payload arm:
```rust
Ok(match reply.payload {
Some(mx_command_reply::Payload::ReadBulk(reply)) => reply.results,
_ => Vec::new(),
})
```
and `bulk_write_results` does the same for the four write families. A caller of `write_bulk` / `read_bulk` on a malformed-but-OK reply therefore sees an empty result vector and cannot distinguish "zero items processed" from "gateway returned a shapeless reply". This is exactly the anti-pattern Client.Rust-006 fixed in `0d8a28d` (which added an `Error::MalformedReply` variant and made `bulk_results` return `Result<Vec<…>, Error>`), and which Client.Rust-005 fixed for `register_server_handle` / `add_item_handle` / `add_item2_handle` (those three also now fall back to `unwrap_or_default()` returning handle `0` at `session.rs:631-660`, regressing alongside the new methods). The fix lived on the divergent branch and was lost when the d692232 line of history was re-platformed without picking up the post-`a020350` Resolve commits — `Error::MalformedReply` is absent from `clients/rust/src/error.rs` at HEAD.
**Recommendation:** Re-apply the Client.Rust-005/006 resolutions on top of the new methods: add `Error::MalformedReply { detail: String }` back to `error.rs`, change `register_server_handle` / `add_item_handle` / `add_item2_handle` to return `Result<i32, Error>` (yielding `MalformedReply` when the reply lacks an extractable handle), change `bulk_results` and `bulk_write_results` to return `Result<Vec<_>, Error>` (yielding `MalformedReply` on a mismatched / absent payload arm), and route the same fix through the new `read_bulk` inline branch. Re-add the six `…_returns_malformed_reply_…` tests from the Client.Rust-016 resolution to lock the contract in.
**Resolution:** 2026-05-24 — Re-added `Error::MalformedReply { detail: String }` to `clients/rust/src/error.rs` (alongside re-adding `Error::Unavailable` for Client.Rust-010's resolution). Changed `register_server_handle` / `add_item_handle` / `add_item2_handle` in `clients/rust/src/session.rs` to return `Result<i32, Error>` and yield `MalformedReply` when the reply lacks both a typed payload and an int32 `return_value`. Changed `bulk_results` and `bulk_write_results` to return `Result<Vec<_>, Error>` and yield `MalformedReply` on a mismatched or absent payload arm. Rewrote `read_bulk`'s inline branch the same way. Added six regression tests in `clients/rust/tests/client_behavior.rs``register_returns_malformed_reply_when_ok_reply_has_no_payload`, `add_item_returns_malformed_reply_when_ok_reply_has_no_payload`, `add_item2_returns_malformed_reply_when_ok_reply_has_no_payload`, `subscribe_bulk_returns_malformed_reply_on_mismatched_payload_arm`, `read_bulk_returns_malformed_reply_on_mismatched_payload_arm`, `write_bulk_returns_malformed_reply_on_mismatched_payload_arm` — driven by a new `InvokeOverride` enum on `FakeState` that pins the fake gateway's `Invoke` handler to `OkWithoutPayload`, `OkWithMismatchedPayload`, or `Unavailable(...)` per test.
### Client.Rust-023
| Field | Value |
|---|---|
| Severity | Low |
| Category | mxaccessgw conventions |
| Location | `clients/rust/crates/mxgw-cli/src/main.rs:835,872,1476` |
| Status | Resolved |
**Description:** Three CLI subcommands added since `d692232` hard-code their `client_correlation_id`:
```rust
client_correlation_id: "rust-cli-stream-alarms".to_owned(), // line 835
client_correlation_id: "rust-cli-acknowledge-alarm".to_owned(), // line 872
client_correlation_id: "rust-cli-bench-read-bulk-close".to_owned(), // line 1476
```
Every invocation of these subcommands on the same machine — and every iteration of the cross-language e2e matrix that exercises them concurrently — produces requests with identical correlation IDs in gateway logs, defeating the diagnostic value Client.Rust-011/014 unlocked when `next_correlation_id` was made `pub`. (As noted in the d692232 review-row header, the prior `rust-cli-ping`/`rust-cli-close-session` hard-codes are also still in place at lines 506 and 553 after the rename commit reverted the Client.Rust-014 fix; the three new ones extend the same regression to new CLI surface.)
**Recommendation:** Replace the three string literals with calls to the existing `next_correlation_id` helper (already `pub` and intended for raw-RPC consumers): `zb_mom_ww_mxgateway_client::session::next_correlation_id("cli-stream-alarms")`, `next_correlation_id("cli-acknowledge-alarm")`, `next_correlation_id("cli-bench-read-bulk-close")`. While here, restore the same fix at lines 506 and 553 so the CLI surface as a whole shares the library's correlation-id discipline.
**Resolution:** 2026-05-24 — Restored `pub fn next_correlation_id(label: &str) -> String` in `clients/rust/src/session.rs` (the `d692232` rename had reverted Client.Rust-011/014's resolution along with the rest) and re-exported it at the crate root in `clients/rust/src/lib.rs` (`pub use session::{next_correlation_id, Session};`) so the in-tree `mxgw` CLI calls the short `zb_mom_ww_mxgateway_client::next_correlation_id(...)` path. Replaced all five hard-coded correlation-id literals in `clients/rust/crates/mxgw-cli/src/main.rs` (`rust-cli-ping`, `rust-cli-close-session`, `rust-cli-stream-alarms`, `rust-cli-acknowledge-alarm`, `rust-cli-bench-read-bulk-close`) with `next_correlation_id("cli-...")`. `Session::command_request` and `Session::close` now use the same helper. Added a regression test `cli_subcommands_propagate_unique_correlation_ids_from_next_correlation_id` in `clients/rust/tests/client_behavior.rs` that records the correlation id observed at the fake gateway and asserts both that successive `next_correlation_id` calls produce distinct values and that the label is propagated through `stream_alarms` / `acknowledge_alarm`.
### Client.Rust-024
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Testing coverage |
| Location | `clients/rust/tests/client_behavior.rs:405-415`; `clients/rust/src/session.rs:369-493`; `clients/rust/src/client.rs:265-291`; `clients/rust/crates/mxgw-cli/src/main.rs:1310-1505` |
| Status | Resolved |
**Description:** The diff under review adds substantial SDK and CLI surface with no positive-path coverage:
1. **`GatewayClient::stream_alarms`** (client.rs:280-291) has no test. The fake gateway's `stream_alarms` impl in `tests/client_behavior.rs:408-415` immediately drops the unique sender and returns an empty `ReceiverStream`, so an attached client would observe a clean end-of-stream — nothing exercises the snapshot → `snapshot_complete``transition` sequence that the doc on `AlarmFeedStream` advertises, mid-stream `Status` mapping through `Error::from`, or the cooperative-cancel-on-drop contract.
2. **Five new bulk SDK methods** (`read_bulk`, `write_bulk`, `write2_bulk`, `write_secured_bulk`, `write_secured2_bulk` on `Session`) have **zero** tests. The fake gateway's `Invoke` dispatcher does not handle any of `MxCommandKind::ReadBulk`, `WriteBulk`, `Write2Bulk`, `WriteSecuredBulk`, `WriteSecured2Bulk`, so a happy-path round trip would fail at the gateway-side match. The malformed-reply branches in Client.Rust-022 are therefore also untested.
3. **`run_bench_read_bulk`** has no test. The `BenchReadBulkStats` helper that Client.Rust-015/016 added at `a020350` (with its own dedicated unit test) was dropped in the port and there is no replacement — the percentile / mean math, the success-vs-failure accounting, and the JSON schema that `scripts/bench-read-bulk.ps1` collates are all untested.
**Recommendation:** Extend the fake-gateway `Invoke` dispatcher in `tests/client_behavior.rs` with the five new bulk reply arms, add round-trip tests for each (`write_bulk` / `write2_bulk` / `write_secured_bulk` / `write_secured2_bulk` / `read_bulk`), and re-add the six malformed-reply tests from the Client.Rust-016 resolution. Replace the trivial `stream_alarms` stub with one that emits a synthetic `ActiveAlarm``SnapshotComplete``Transition` sequence and assert the client surfaces them in order. For the bench, factor the percentile / accounting helpers out of `run_bench_read_bulk` into a small struct (matching the previous `BenchReadBulkStats`) and add unit tests asserting `latencyMs.{p50,p95,p99,max,mean}` are computed correctly from a hand-built sample.
**Resolution:** 2026-05-24 — Extended the fake gateway's `Invoke` dispatcher to handle `Register`, `AddItem2`, `ReadBulk`, `WriteBulk`, `Write2Bulk`, `WriteSecuredBulk`, and `WriteSecured2Bulk`, with shared `write_bulk_reply_for` / `bulk_write_result_ok` helpers. Added round-trip integration tests in `clients/rust/tests/client_behavior.rs` for all five bulk SDK methods: `read_bulk_round_trips_through_the_fake_gateway`, `write_bulk_round_trips_through_the_fake_gateway`, `write2_bulk_round_trips_through_the_fake_gateway`, `write_secured_bulk_round_trips_through_the_fake_gateway`, `write_secured2_bulk_round_trips_through_the_fake_gateway`. Replaced the trivial `stream_alarms` stub with a per-test script (`FakeState::stream_alarms_script`) and added `stream_alarms_emits_snapshot_then_complete_then_transition_in_order`, which feeds an `ActiveAlarm``SnapshotComplete``Transition` sequence and asserts the client surfaces all three payload arms in order. Added three CLI-side unit tests against the existing `percentile_summary` helper in `clients/rust/crates/mxgw-cli/src/main.rs`: `bench_percentile_summary_matches_hand_built_sample` (asserts p50/p95/p99/max/mean on the sample `[1,2,3,4,5]`), `bench_percentile_summary_handles_empty_sample`, and `bench_percentile_summary_handles_single_value_sample`. The malformed-reply suite from Client.Rust-022 plus the unary `Error::Unavailable` test together cover the remaining surface.
### Client.Rust-025
| Field | Value |
|---|---|
| Severity | Low |
| Category | Design-document adherence |
| Location | `clients/rust/RustClientDesign.md:92-106,142-153,164-171` |
| Status | Resolved |
**Description:** CLAUDE.md mandates that "When public APIs, contracts, configuration, build steps, security behavior, event shapes, value conversion, status mapping, or lifecycle rules change, the affected docs ... must change in the same commit." The diff under review adds the following public surface, none of which is reflected in `RustClientDesign.md`:
- **SDK methods** on `GatewayClient`: `stream_alarms` and `acknowledge_alarm`, plus the new `AlarmFeedStream` type alias re-exported at the crate root in `lib.rs:27`.
- **SDK methods** on `Session`: `read_bulk`, `write_bulk`, `write2_bulk`, `write_secured_bulk`, `write_secured2_bulk`. The `Session` impl block in `RustClientDesign.md:92-106` still lists only the original six bulk-subscribe helpers, identical to its d692232 state (the Client.Rust-019 resolution that added these to the doc lived on the divergent branch and was lost in the `397d3c5` rename).
- **CLI subcommands**: `stream-alarms`, `acknowledge-alarm`, `read-bulk`, `write-bulk`, `write2-bulk`, `write-secured-bulk`, `write-secured2-bulk`, `bench-read-bulk`, and `batch`. The `Commands` block in `RustClientDesign.md:166-171` still lists only `version`, `smoke`, `stream-events`, `write`.
- **Error variants**: `Error::MalformedReply` / `Error::Unavailable` / `Error::InvalidEndpoint` are still missing from the `Error` enum sketch at `RustClientDesign.md:142-153` (Client.Rust-017's expansion was also lost in the rename — and Client.Rust-022 will add `MalformedReply` back when fixed).
- **Build configuration**: the new `clients/rust/.cargo/config.toml` Windows 8 MB stack workaround is not mentioned anywhere in the design doc — a reader reproducing the build on a fresh dev box would not know to expect the stack-overflow failure mode the file documents.
**Recommendation:** Bring `RustClientDesign.md` back in sync with the actual implementation. Restore the `Session` API block to enumerate the five new bulk read/write methods alongside the existing six bulk-subscribe helpers (cross-reference Client.Rust-019's recommendation for the correct signatures — no trailing positional `user_id`/`timestamp`/`current_user_id`/`verifier_user_id`, those live on the per-entry structs). Add `stream_alarms` / `acknowledge_alarm` / `AlarmFeedStream` to a new "Alarms" section adjacent to the existing event-stream section. Expand the CLI command list to enumerate every subcommand the binary exposes today. Expand the `Error` enum sketch to match `clients/rust/src/error.rs`. Add a short "Windows build notes" subsection documenting the `.cargo/config.toml` stack workaround and why clap-derive's large `Command` enum needs it.
**Resolution:** 2026-05-24 — Brought `clients/rust/RustClientDesign.md` back in sync with the implementation. Removed the orphan `tracing` dependency line and added a new "Windows Build Notes" section explaining the `clients/rust/.cargo/config.toml` `/STACK:8388608` MSVC link-arg (why clap-derive's large `Command` enum needs it, that it writes into `IMAGE_OPTIONAL_HEADER.SizeOfStackReserve` for both debug and release builds, and that the MSVC-only target selector keeps mingw unaffected). Extended the `Session` API block to enumerate the five new bulk read/write helpers — `read_bulk<S: AsRef<str>>(&[S], u32)`, `write_bulk(Vec<WriteBulkEntry>)`, `write2_bulk(Vec<Write2BulkEntry>)`, `write_secured_bulk(Vec<WriteSecuredBulkEntry>)`, `write_secured2_bulk(Vec<WriteSecured2BulkEntry>)` — plus `add_item2`, `un_advise`, `remove_item`, and `write2`. Added a paragraph noting that per-entry credentials/timestamps live on the entry structs and that `read_bulk`'s borrow-and-`AsRef<str>` shape is what the cross-language bench-read-bulk hot loop relies on. Added a new "Alarms" section with `GatewayClient::stream_alarms` / `acknowledge_alarm` / the `AlarmFeedStream` type alias and the always-on (no-session) snapshot → complete → transition contract. Expanded the `Error` enum block to match `clients/rust/src/error.rs` (every public variant including `MalformedReply`, `Unavailable`, `InvalidEndpoint`, `InvalidArgument`). Expanded the CLI command list to enumerate every subcommand the binary exposes today, with notes on `batch`'s EOF-only termination and `bench-read-bulk`'s cross-language JSON shape.
### Client.Rust-026
| Field | Value |
|---|---|
| Severity | Low |
| Category | Performance & resource management |
| Location | `clients/rust/crates/mxgw-cli/src/main.rs:1402-1406,1419-1423` |
| Status | Resolved |
**Description:** `run_bench_read_bulk` clones the `tags: Vec<String>` on every iteration of both the warmup loop and the steady-state measurement loop:
```rust
while Instant::now() < warmup_deadline {
let _ = session
.read_bulk(server_handle, tags.clone(), timeout_ms_param)
.await;
}
// ...
while Instant::now() < steady_deadline {
let call_start = Instant::now();
let result = session
.read_bulk(server_handle, tags.clone(), timeout_ms_param)
.await;
let elapsed = call_start.elapsed();
latencies_ms.push(elapsed.as_secs_f64() * 1000.0);
// ...
}
```
Each clone allocates one fresh `Vec<String>` plus `bulk_size` heap-allocated `String` clones (default 6, but the flag goes higher; the cross-language matrix routinely runs with `bulk_size=64` and `bulk_size=256`). The allocation happens **inside** `call_start ... call_start.elapsed()`, so the per-call latency the bench reports includes the clone cost — both inflating the absolute number and adding allocator-noise variance to the percentile distribution that the cross-language driver compares against the .NET/Go/Java/Python bench numbers (none of which allocate the tag list per call).
**Recommendation:** Change `Session::read_bulk` to take `tag_addresses: &[String]` or generic over `AsRef<str>` (as Client.Rust-019's recommendation noted is what `read_bulk` was at `a020350`) so the bench can pass `&tags` once and avoid the per-call clone. Alternatively keep the `Vec<String>` signature but borrow the underlying buffer for the hot loop — e.g. build the bench's payload once outside the steady-state window and re-use it. The current `read_bulk(..., Vec<String>, ...)` shape forces the clone at the call site.
**Resolution:** 2026-05-24 — Changed `Session::read_bulk` to `read_bulk<S: AsRef<str>>(&self, server_handle, tag_addresses: &[S], timeout_ms)` so the bench loop can borrow the tag list once instead of cloning it per iteration. `run_bench_read_bulk` now binds `let tags_ref: &[String] = &tags;` outside the warm-up / steady-state loops and passes `tags_ref` into both, eliminating the per-call `Vec<String>` + `String` clone-allocations that were previously charged into the per-call latency reported by the cross-language `latencyMs` JSON contract. The CLI `read-bulk` subcommand was updated to call `session.read_bulk(server_handle, &items, timeout_ms)`. Clippy's `clone_on_copy` / `redundant_clone` lints are clean at HEAD so no extra regression test is needed beyond `read_bulk_round_trips_through_the_fake_gateway` from Client.Rust-024.
### Client.Rust-027
| Field | Value |
|---|---|
| Severity | Low |
| Category | Documentation & comments |
| Location | `clients/rust/.cargo/config.toml:1-9` |
| Status | Resolved |
**Description:** The new build-config file added by `71d2c39` carries this leading comment:
```
[target.'cfg(windows)']
# Bump the default 1 MB Windows stack to 8 MB. clap-derive builds a large
# Command enum in this CLI (one variant per subcommand, each carrying flag
# args); in debug builds the enum is materialized on the stack without
# optimization and overflows the default Windows main-thread stack before
# even reaching our code. Release builds are unaffected but the e2e matrix
# drives the CLI through `cargo run` (debug), so the link-arg ships with
# every dev-time invocation.
rustflags = ["-C", "link-arg=/STACK:8388608"]
```
Two issues with the documentation vs the configuration:
1. The comment claims "Release builds are unaffected" but the `rustflags` setting is unconditional under `cfg(windows)``cargo build --release` on Windows also produces a binary whose `IMAGE_OPTIONAL_HEADER.SizeOfStackReserve` is 8 MB. The 8 MB stack reservation is in fact harmless for release builds (the optimizer removes the enum from the stack), but the comment misleads a reader trying to understand whether release artifacts of `mxgw` ship with a non-default header value (they do).
2. `/STACK:` is an MSVC-linker (`link.exe` / `lld-link`) directive; `cargo build` on Windows with `+gnu` targets (mingw) routes link args through the GNU linker which rejects `/STACK:` and instead expects `-Wl,--stack,8388608`. The current file silently breaks `x86_64-pc-windows-gnu` builds — a configuration `docs/ToolchainLinks.md` lists no expectation about, but a reader picking up the file does not see the constraint.
**Recommendation:** Either tighten the `[target.…]` selector to only the MSVC-linker tier (`cfg(all(windows, target_env = "msvc"))`) and re-word the comment to make the release-build behaviour explicit ("the stack reservation goes into the PE header for both debug and release builds; release is unaffected at runtime because the optimizer elides the enum from the stack frame"), or add the `+gnu` variant as a parallel block using `-Wl,--stack,8388608`. As a complementary fix, see Client.Rust-024's recommendation — boxing the largest clap-derive variants would let the stack workaround be retired entirely.
**Resolution:** 2026-05-24 — Tightened the target selector in `clients/rust/.cargo/config.toml` from `cfg(windows)` to `cfg(all(windows, target_env = "msvc"))` so `x86_64-pc-windows-gnu` (mingw) builds, which route link args through the GNU linker and reject `/STACK:`, are no longer affected. Rewrote the leading comment to make the release-build behaviour explicit: the `/STACK:` link-arg writes into `IMAGE_OPTIONAL_HEADER.SizeOfStackReserve` at link time and applies to both debug and release builds (so release artifacts ship with the same 8 MB stack reservation); release builds do not need it at runtime because the optimizer elides the enum from the stack frame, but the setting is kept on so both flavours produce binaries with identical stack metadata. The doc note in `RustClientDesign.md` now mirrors this.
### Client.Rust-028
| Field | Value |
|---|---|
| Severity | Low |
| Category | mxaccessgw conventions |
| Location | `clients/rust/crates/mxgw-cli/src/main.rs:1126-1166` |
| Status | Resolved |
**Description:** `run_batch` reads commands from stdin with the blocking `std::io::Stdin::lock().lines()` iterator while the surrounding function is `async fn` and the runtime is `#[tokio::main]` (multi-threaded by default). Each `for line in stdin.lock().lines()` iteration pins one of tokio's worker threads on a blocking syscall (`ReadFile` on Windows), then spawns the dispatch as a separate `tokio::task::spawn(dispatch(cli.command))` — which itself runs on another worker thread — and awaits its `JoinHandle`. The pattern works for a single-threaded harness driver but has two latent problems:
1. **Empty-line sentinel**: `if line.is_empty() { break; }` exits the loop on the first empty line, terminating the batch session. The comment block above declares this as a feature ("stdin EOF (or an empty line) ends the session") but an accidental blank line from any driver — for example a `Write-Output ""` between commands in the PowerShell e2e matrix — silently ends the session and the harness has no way to distinguish "end-of-input" from "the CLI broke after the previous command". The other four language CLIs that implement `batch` use only EOF, not an empty-line sentinel.
2. **Blocking on a tokio worker**: any non-trivial latency in `dispatch` (e.g. a 30-second `bench-read-bulk`) keeps `run_batch` parked on its `JoinHandle.await`, which is fine; but if `dispatch` ever needs to drive the stdin reader (it doesn't today, but the `Batch` subcommand is the natural seam for future "stream input to a subcommand"), the blocking `Stdin::lock()` on the main task would deadlock the runtime worker.
**Recommendation:** Replace the empty-line break with an EOF-only terminator (`if line.trim().is_empty() { println!("{BATCH_EOR}"); stdout.lock().flush().ok(); continue; }`) so accidental blank lines log an empty-EOR-bracketed result instead of ending the session. Optionally wrap the stdin reader in `tokio::task::spawn_blocking` so the runtime can move it onto a dedicated blocking-pool thread; the current shape works for today's harness contract but is brittle if the dispatch future ever needs to share the runtime with stdin reads.
**Resolution:** 2026-05-24 — Removed the `if line.is_empty() { break; }` empty-line sentinel in `run_batch` so the only terminator is now stdin EOF (the implicit end of the `for line in stdin.lock().lines()` iterator). Blank or whitespace-only lines now fall into the existing `parts.is_empty()` branch which logs `__MXGW_BATCH_EOR__` and continues, matching the other four language CLIs and shielding the PowerShell e2e harness from accidental `Write-Output ""` calls between commands. Added a comment block on `run_batch` explaining the tokio-runtime / blocking-stdin trade-off: dispatch is already spawned on a fresh tokio task so the blocking iterator parks at most one worker thread on `ReadFile`, and that is acceptable because no other future on the main task needs to run while we wait for the next command. The CLI doc comment on the `Batch` clap variant was updated to mirror the new semantics. No unit test added — `run_batch` reads from real stdin which can't be driven from a `#[tokio::test]` without a subprocess harness, and the existing `parses_batch_command` / `batch_eor_marker_is_stable` tests already cover the parser and the EOR sentinel.
### Client.Rust-029
| Field | Value |
|---|---|
| Severity | High |
| Category | mxaccessgw conventions |
| Location | `clients/rust/src/options.rs:98,143`; `clients/rust/src/galaxy.rs:282`; `clients/rust/src/session.rs:664-671` |
| Status | Resolved |
**Description:** `cargo clippy --workspace --all-targets -- -D warnings` fails at HEAD `42b0037` with three errors that the prior d692232 reviewer noted as "out of scope for Client.Rust-021" but did not open as a tracked finding:
```
error: missing documentation for a method
--> src\options.rs:98:5
|
98 | pub fn with_max_grpc_message_bytes(mut self, max_grpc_message_bytes: usize) -> Self {
error: missing documentation for a method
--> src\options.rs:143:5
|
143 | pub fn max_grpc_message_bytes(&self) -> usize {
error: using `clone` on type `Option<Timestamp>` which implements the `Copy` trait
--> src\galaxy.rs:282:24
error: all variants have the same postfix: `Bulk`
--> src\session.rs:664:1
|
664 | / enum BulkReplyKind {
665 | | AddItemBulk,
666 | | AdviseItemBulk,
667 | | RemoveItemBulk,
```
All three were resolved at `a020350` (Client.Rust-001, Client.Rust-002, Client.Rust-012 respectively) but regressed when `397d3c5` re-platformed parts of the prior fix and were never re-fixed. CLAUDE.md explicitly mandates that `cargo clippy --workspace --all-targets -- -D warnings` pass cleanly; the documented build/test workflow for the module breaks. The prior reviewer correctly noted the regressions but routed them as "out of scope for Client.Rust-021", which left them invisible to the open-findings count. They are tracked here so they appear in the pending-findings index and are not lost again.
The third error (`BulkReplyKind` enum-variant-names) is also touched by the diff under review: commit `3251069` added a sibling enum `BulkWriteReplyKind` at `session.rs:699-704` whose variants (`Write`, `Write2`, `WriteSecured`, `WriteSecured2`) do not share a suffix and so do not trip the lint — but the pattern is now duplicated. A fix should rename both enums consistently or apply the same scoped `#[allow(clippy::enum_variant_names)]` reason-annotated allow to both.
**Recommendation:** Re-apply Client.Rust-001 (add doc comments on `with_max_grpc_message_bytes` / `max_grpc_message_bytes` in `options.rs`), Client.Rust-002 (drop the `Bulk` suffix from `BulkReplyKind`'s variants so they become `AddItem` / `AdviseItem` / …, or add a narrowly-scoped `#[allow(clippy::enum_variant_names)]` with a reason comment), and Client.Rust-012 (replace `last_deploy.lock().unwrap().clone()` with `*last_deploy.lock().unwrap()` in `galaxy.rs:282`). Verify with `cargo clippy --workspace --all-targets -- -D warnings`. Consider adding a pre-commit / CI gate so the next reviewer never has to discover the regression by running clippy.
**Resolution:** 2026-05-24 — Re-applied all three resolutions. `clients/rust/src/options.rs` now has `///` doc comments on `with_max_grpc_message_bytes` and `max_grpc_message_bytes`. `clients/rust/src/galaxy.rs:282` uses `*self.state.last_deploy.lock().unwrap()` instead of `.clone()`. `clients/rust/src/session.rs`'s `BulkReplyKind` variants are renamed to `AddItem` / `AdviseItem` / `RemoveItem` / `UnAdviseItem` / `Subscribe` / `Unsubscribe` (no shared `Bulk` suffix), with the call sites in `add_item_bulk` / `advise_item_bulk` / `remove_item_bulk` / `un_advise_item_bulk` / `subscribe_bulk` / `unsubscribe_bulk` updated accordingly. The sibling `BulkWriteReplyKind` already had non-suffix-sharing variants (`Write` / `Write2` / `WriteSecured` / `WriteSecured2`) and required no rename. `cargo clippy --workspace --all-targets -- -D warnings` is clean at HEAD.
### Client.Rust-030
| Field | Value |
|---|---|
| Severity | High |
| Category | Correctness & logic bugs |
| Location | `clients/rust/crates/mxgw-cli/src/main.rs:1731,1757` (origin: `src/ZB.MOM.WW.MxGateway.Contracts/Protos/mxaccess_gateway.proto:909-924`, added in commit `1d85db7`) |
| Status | Resolved |
**Description:** The `mxgw-cli` binary does not compile at HEAD `410acc9`. `cargo check --workspace`, `cargo clippy --workspace --all-targets -- -D warnings`, `cargo build --workspace`, and `cargo test --workspace` all fail with a hard `E0004` (non-exhaustive patterns), so the entire documented Rust build/test/clippy workflow that CLAUDE.md mandates is broken:
```
error[E0004]: non-exhaustive patterns: `&Some(...alarm_feed_message::Payload::ProviderStatus(_))` not covered
--> crates/mxgw-cli/src/main.rs:1731:11
error[E0004]: non-exhaustive patterns: `&Some(...alarm_feed_message::Payload::ProviderStatus(_))` not covered
--> crates/mxgw-cli/src/main.rs:1757:11
```
The alarm-provider-fallback contract change (`1d85db7`, within the reviewed range) added a fourth `AlarmFeedMessage.payload` oneof arm — `AlarmProviderStatus provider_status = 4`. tonic-build regenerates the Rust enum with the new `ProviderStatus` variant, but `alarm_feed_message_summary` (`main.rs:1731`) and `alarm_feed_message_to_json` (`main.rs:1757`) each `match &message.payload` exhaustively over only `ActiveAlarm` / `SnapshotComplete` / `Transition` / `None` with no wildcard arm. Because they are exhaustive matches on a now-larger enum, the binary fails to compile rather than silently mishandling the new variant. The library crate (`zb-mom-ww-mxgateway-client`) itself compiles cleanly and all 32 of its tests pass; the break is confined to the CLI — but the CLI is the cross-language e2e matrix driver, so the whole `clients/rust` workspace is unbuildable and no Rust e2e smoke can run against the gateway at this commit. This is the alarm-surface gap the review request asked to check: the `ProviderStatus` payload is unhandled in the only place the Rust client renders the alarm feed.
**Recommendation:** Add a `Some(alarm_feed_message::Payload::ProviderStatus(status))` arm to both `alarm_feed_message_summary` and `alarm_feed_message_to_json` (render the provider-status fields — mode, degraded/provenance, reference — consistent with how the .NET/Go/Java/Python CLIs serialise it so the cross-language parity matcher recognises the payload). While there, update the two doc comments that still say "three `payload` oneof cases" (`main.rs:1729,1755`) to four. Verify with `cargo clippy --workspace --all-targets -- -D warnings` and `cargo test --workspace`. Consider a CI gate so a contract change that adds a oneof arm cannot leave the Rust CLI unbuildable again.
**Resolution:** 2026-06-15 — Root cause confirmed: the contract's new fourth `AlarmFeedMessage.payload` oneof arm (`AlarmProviderStatus provider_status`, proto fields `mode`/`degraded`/`reason`/`since`) left both `match &message.payload` blocks non-exhaustive (`E0004`). Added a `Some(alarm_feed_message::Payload::ProviderStatus(status))` arm to both `alarm_feed_message_summary` (`mode`/`degraded`/`reason` one-liner) and `alarm_feed_message_to_json` (a `providerStatus` object with `mode`/`degraded`/`reason`/`since`), added an `AlarmEnumName::provider_mode` enum-name helper consistent with the existing `condition_state`/`transition_kind` renderers, and updated the summary doc comment to "four payload oneof cases". No `_ => {}` wildcard. Test: `alarm_feed_provider_status_renders_in_summary_and_json` (in `crates/mxgw-cli/src/main.rs`). All four cargo commands now pass.
### Client.Rust-031
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Error handling & resilience |
| Location | `clients/rust/src/options.rs:196-240` (`build_tls_config`); `clients/rust/Cargo.toml:40` (tonic features); docs: `clients/rust/src/options.rs:76-101`, `clients/rust/README.md` (TLS trust section), `clients/rust/crates/mxgw-cli/src/main.rs:429-431`, `clients/rust/RustClientDesign.md:202` |
| Status | Resolved |
**Description:** The new strict-verification escape hatch does not do what it documents. `build_tls_config` only configures trust roots when a CA file is pinned: with `require_certificate_validation(true)` and no `ca_file`, it returns a bare `ClientTlsConfig::new()` and never calls `.with_native_roots()`, `.with_webpki_roots()`, or `.with_enabled_roots()`. The crate also enables only `tonic` feature `tls-ring` (`Cargo.toml:40`) — neither `tls-native-roots` nor `tls-webpki-roots` is on, so even if the code wanted to enable system roots the methods are feature-gated out. A `ClientTlsConfig` with zero trust anchors rejects every server certificate during the rustls handshake, so the strict path cannot connect to any TLS gateway — not even one whose certificate is genuinely chained to a system root. Yet `with_require_certificate_validation`'s doc comment (`options.rs:80-89`), the README "TLS trust (pin-only)" section, the `--require-certificate-validation` CLI flag help (`main.rs:429-431`), and `RustClientDesign.md:202` all tell the user this option will "verify against the system trust roots." The documented behaviour is unreachable; the only working TLS path is CA pinning (`with_ca_file`).
This is masked by the tests: `tls_with_require_certificate_validation_does_not_short_circuit` (`tests/tls.rs`) dials a dead address (`https://127.0.0.1:1`) and only asserts the no-CA guard error does *not* fire — it never reaches a handshake, so the absent-roots defect is invisible to the suite.
**Recommendation:** Either (a) make the strict path actually load system roots — add the `tls-native-roots` (and/or `tls-webpki-roots`) feature to the `tonic` dependency and call `tls = tls.with_native_roots()` (or `.with_enabled_roots()`) in the `require_certificate_validation == true && ca_file.is_none()` branch of `build_tls_config` — and add a test that pins a self-signed cert as a CA and asserts a system-root-only connection to that same server is *rejected* (proving roots are actually consulted); or (b) if loading system roots is intentionally out of scope for v1, correct every doc site (the `with_require_certificate_validation` doc comment, README, CLI flag help, and `RustClientDesign.md`) to state that the strict flag does not currently enable any trust roots and that CA pinning is the only supported TLS path. Option (a) is the better fix because the flag otherwise has no working effect.
**Resolution:** 2026-06-15 — Took option (a). Root cause confirmed: strict-on/no-CA returned a bare `ClientTlsConfig::new()` with zero trust anchors and the crate only enabled tonic `tls-ring`, so the documented "verify against the system trust roots" path could never validate any certificate. Added `tls-native-roots` to the `tonic` features in `Cargo.toml` and refactored `build_tls_config` to compute the trust posture via a new pure `tls_trust_decision` helper returning `TlsTrustDecision::{None,PinnedCa,SystemRoots,RejectNoCa}`; the `SystemRoots` branch now calls `ClientTlsConfig::with_native_roots()` so a cert chaining to an OS-trusted root validates. Corrected every doc site to state the strict flag verifies against OS roots (not a bare self-signed cert, which still needs `with_ca_file`): the `with_require_certificate_validation` doc comment and `build_tls_config` docs (`options.rs`), README "TLS trust" section, and `RustClientDesign.md` "Trust posture"; the CLI flag help was already accurate. TDD: added failing-first unit tests then the fix — `strict_without_ca_uses_system_roots`, `lenient_without_ca_is_rejected`, `pinned_ca_uses_pinned_trust`, `plaintext_needs_no_tls` (in `src/options.rs`). All four cargo commands pass.
### Client.Rust-032
| Field | Value |
|---|---|
| Severity | Low |
| Category | Design-document adherence |
| Location | `clients/rust/RustClientDesign.md`; surface in `clients/rust/src/galaxy.rs:281-379` |
| Status | Resolved |
**Description:** The diff under review adds substantial new public Galaxy SDK surface — `GalaxyClient::browse`, `GalaxyClient::browse_children_raw`, the `BrowseChildrenOptions` filter struct, and the `LazyBrowseNode` lazy walker (`object`, `has_children_hint`, `children`, `is_expanded`, `expand`) — none of which is described in `RustClientDesign.md`. The README was updated with a "Browsing lazily" / "High-level walker" section, but CLAUDE.md requires the design docs to change in the same change as the public API. A reader consulting the detailed design to understand the Galaxy client surface will not learn that lazy browsing, sibling pagination, the `child_has_children` hint, or the idempotent `expand` contract exist.
**Recommendation:** Add a "Lazy browse" subsection to the Galaxy section of `RustClientDesign.md` enumerating `browse`, `browse_children_raw`, `BrowseChildrenOptions` (its filter fields and AND semantics), and `LazyBrowseNode` (the `Arc`-shared clone semantics, the idempotent single-RPC `expand`, the `has_children_hint`, and the internal paged `BrowseChildren` loop with its repeated-page-token guard). Cross-reference `docs/GalaxyRepository.md#browsechildren` for the wire-level request/filter semantics the README already links.
**Resolution:** 2026-06-15 — Confirmed by inspection that `RustClientDesign.md` had no Galaxy library-API coverage at all. Added a new "Galaxy Repository" section documenting `browse`, `browse_children_raw`, the `BrowseChildrenOptions` filter struct (all six fields, AND combination semantics, `include_attributes` tri-state), and `LazyBrowseNode` (`Arc`-shared clone semantics, `has_children_hint`, the idempotent single-RPC `expand` under an async mutex with page size 500, and the repeated-page-token `Error::InvalidArgument` guard), cross-referencing `docs/GalaxyRepository.md#browsechildren`. Also noted the fourth alarm `provider_status` oneof case in the Alarms section while resolving Client.Rust-030. Doc-only change verified by inspection; design-doc anchor target confirmed present.