Files

T

Joseph Doherty d2d2e5f68f code-review 2026-05-24: re-review at d692232 across all 11 modules

Restores the `code-reviews/` tree (was unwritten on this working copy)
and re-reviews every module per `REVIEW-PROCESS.md` against HEAD
`d692232`. The diff in scope is the five commits since the last sweep:
`dc9c0c9` (ZB.MOM.WW gateway-side rename + slnx migrate),
`397d3c5` (client SDK rename + the missing alarm-RPC proto types and
the .NET DiscoverHierarchyOptions POCO), `27ed651` (role-based LDAP
auth + HubToken bearer, drop PathBase), `6594359` (sidebar layout +
three SignalR push hubs), and `d692232` (EventsHub publisher + doc
refresh).

Module status

| Module | Open | Total | Delta this pass |
|---|---|---|---|
| Server           | 8 | 43 | +6 |
| Contracts        | 2 | 17 | +2 |
| Tests            | 2 | 26 | +2 |
| IntegrationTests | 3 | 24 | +3 |
| Client.Java      | 5 | 31 | +5 |
| Client.Rust      | 1 | 21 | +1 |
| Worker           | 0 | 25 |  0 (rename-only diff, clean) |
| Worker.Tests     | 0 | 30 |  0 (rename-only diff, clean) |
| Client.Dotnet    | 0 | 17 |  0 (rename + alarm-fix diff, clean) |
| Client.Python    | 0 | 21 |  0 (rename + alarm-fix diff, clean) |
| Client.Go        | 0 | 21 |  0 (rename + alarm-fix diff, clean) |

Total new findings: 19. Severity breakdown: 1 Medium-security
(Server-038), 4 Medium-documentation/coverage, 14 Low.

New findings

  * Server-038 (Medium / Security) — EventsHub.SubscribeSession accepts
    any session id from any Viewer; no per-session ACL guards the
    EventsHub group fan-out.
  * Server-039 (Low / Error handling) — HubTokenService.Validate
    accepts a payload with null Name/NameIdentifier.
  * Server-040 (Low / Conventions) — MapGroupsToRoles undocumented
    full-vs-RDN lookup precedence.
  * Server-041 (Low / Design adherence) — EventStreamService calls
    IDashboardEventBroadcaster.Publish without a try/catch — fragile
    seam relying on the never-throw contract.
  * Server-042 (Low / Performance) — DashboardSnapshotPublisher tight
    retry loop with no backoff (vs AlarmsHubPublisher 5s delay).
  * Server-043 (Low / Documentation) — HubTokenService singleton
    sharing across login + hub-token validation undocumented.

  * Contracts-016 (Low / Conventions) — QueryActiveAlarmsRequest.session_id
    reserved-for-future-use ambiguity.
  * Contracts-017 (Low / Documentation) — rpc QueryActiveAlarms doc
    omits the alarm_filter_prefix filter description.

  * Tests-025 (Low / Conventions) — duplicate NullDashboardEventBroadcaster
    fakes in EventStreamServiceTests and GatewayEndToEndFakeWorkerSmokeTests.
  * Tests-026 (Medium / Testing coverage) — no test proves
    EventStreamService actually calls IDashboardEventBroadcaster.Publish.

  * IntegrationTests-022 (Low / Conventions) — ResolveRepositoryRoot
    silent fallback to Directory.GetCurrentDirectory().
  * IntegrationTests-023 (Low / Testing coverage) — DashboardLdapLiveTests
    success-path asserts ldap_group but not the Role claim.
  * IntegrationTests-024 (Low / Conventions) — inline
    NullDashboardEventBroadcaster fake duplicates Tests-side copies.

  * Client.Java-027 (Medium / Documentation) — README + JavaClientDesign
    Gradle task names still use the old short project names.
  * Client.Java-028 (Medium / Design adherence) — JavaClientDesign
    build-layout shows the old `com/dohertylan/mxgateway/` package paths.
  * Client.Java-029 (Low / Documentation) — README installDist path
    cites the wrong directory.
  * Client.Java-030 (Low / Testing coverage) — no Java test exercises
    the regenerated QueryActiveAlarmsRequest RPC.
  * Client.Java-031 (Low / Conventions) — README prose uses old short
    project names instead of canonical prefixed ones.

  * Client.Rust-021 (Low / Design adherence) — RustClientDesign.md
    "Crate layout" shows an aspirational nested `crates/zb-mom-ww-mxgateway-client/`
    that does not exist; actual layout is the flat top-level crate.

Two pre-existing pending findings (Server-031 lock-contention,
Server-032 bounded event channel) remain unchanged — neither was
touched by this wave of commits.

Process notes

- The `code-reviews/` tree was not in this working copy's git
  history (the local extract pre-dates the divergent branch that
  carried the reviews). Restored from `dd7ca16` via
  `git checkout dd7ca16 -- code-reviews/` before the re-review.
- Some "Resolved" entries in the restored findings.md reference
  fixes that landed on the divergent branch (the same one that
  carried the reviews) and are not present on the current main
  lineage. The re-review treats those statuses as historical;
  the new pass only files findings against HEAD's actual state.
- `python code-reviews/regen-readme.py --check` is green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-05-24 02:34:30 -04:00

42 KiB

Raw Blame History

Code Review — Client.Rust

Field	Value
Module	`clients/rust`
Reviewer	Claude Code
Review date	2026-05-24
Commit reviewed	`d692232`
Status	Reviewed
Open findings	1

Checklist coverage

This re-review (a020350) covers the resolution work for Client.Rust-013 through 017 (scoped doc_lazy_continuation allow on generated submodules, pub next_correlation_id shared with the CLI, success/failure split in bench-read-bulk, eight new tests, design-doc resync). The pass spot-checked the items called out in the request: stability of the newly-pub next_correlation_id, the bench-read-bulk JSON shape vs the PowerShell driver, presence of unsafe, and the scope of #![allow(clippy::doc_lazy_continuation)]. cargo clippy --workspace --all-targets -- -D warnings and cargo test --workspace both pass on this commit.

#	Category	Result
1	Correctness & logic bugs	No issues found — the five new `MalformedReply` paths and the `read_bulk` mismatched-payload branch each have a dedicated test; `BenchReadBulkStats` correctly partitions success vs failure latency.
2	mxaccessgw conventions	No issues found — `cargo clippy --workspace --all-targets -- -D warnings` and `cargo test --workspace` both pass on this commit; the `#![allow(clippy::doc_lazy_continuation)]` allow is scoped narrowly to each generated v1 inner module so hand-written code is unaffected; CLI `Ping`/`CloseSession` now call `session::next_correlation_id`.
3	Concurrency & thread safety	No issues found — `CORRELATION_SEQUENCE` is `AtomicU64` with `Relaxed`, correct for monotonic id generation; no `unsafe` anywhere in `src/` or `crates/`.
4	Error handling & resilience	Issue found: the `bench-read-bulk` fix for Client.Rust-015 has fixed Rust's own histogram honestly but the change makes Rust's `latencyMs` semantically incompatible with the four other clients' `latencyMs` field that the cross-language PowerShell driver collates side-by-side (Client.Rust-018).
5	Security	No issues found — API keys still redacted in `Debug`/`Display`, status messages scrubbed, `first_failure` records `Error::Display` (which already redacts `mxgw_*` tokens) so secure-write values cannot leak into the bench JSON.
6	Performance & resource management	No issues found in the reviewed delta.
7	Design-document adherence	Issue found: `RustClientDesign.md` Session signatures for the four bulk-write helpers and `read_bulk` do not match the actual implementation — the design lists trailing `user_id` / `timestamp` / `current_user_id` / `verifier_user_id` parameters and a `Vec<ReadBulkResult>` return that the impl does not have (all of those move per-entry into `WriteBulkEntry` etc.) (Client.Rust-019).
8	Code organization & conventions	No new issues — `BenchReadBulkStats` is cleanly factored out and tested.
9	Testing coverage	No new issues — the malformed-reply paths and unary `Error::Unavailable` are now covered, and the four bulk-write families each have round-trip smoke.
10	Documentation & comments	Issue found: `next_correlation_id` is now `pub` and its doc comment commits the SDK to the literal `"rust-client-{label}-{N}"` string format, but neither the doc nor `lib.rs` re-exports it or declares any stability stance, leaving the public surface ambiguous (Client.Rust-020).

2026-05-24 review (commit `d692232`)

Re-review pass at d692232. Diff against a020350 is commit 397d3c5: top-level crate renamed mxgateway-client → zb-mom-ww-mxgateway-client, build.rs proto path updated, every use mxgateway_client:: sweep-replaced to use zb_mom_ww_mxgateway_client::, tests/client_behavior.rs updated for the retired session_id and a new stream_alarms impl on the fake gateway, and the protocol-version assertion bumped to 3. Workspace member layout unchanged. cargo test --workspace and cargo clippy --workspace --all-targets -- -D warnings both clean at HEAD.

#	Category	Result
1	Correctness & logic bugs	No issues found in the a020350..d692232 diff.
2	mxaccessgw conventions	No issues found — wire identifiers (CLI binary `mxgw`, `MXGATEWAY_*` env vars, `mxaccess_gateway.v1` proto packages) unchanged per commit message.
3	Concurrency & thread safety	No issues found in this diff.
4	Error handling & resilience	No issues found in this diff.
5	Security	No issues found in this diff.
6	Performance & resource management	No issues found in this diff.
7	Design-document adherence	Issues found: Client.Rust-021 (`RustClientDesign.md` "Crate layout" section still shows an aspirational nested `crates/zb-mom-ww-mxgateway-client/` directory that does not exist; actual layout is the flat top-level crate at `clients/rust/`).
8	Code organization & conventions	No issues found in this diff.
9	Testing coverage	No issues found in this diff — the fake gateway's `stream_alarms` impl is a minimal stub, but no production behaviour relies on it being exercised by a test.
10	Documentation & comments	Issues found: Client.Rust-021 (design-doc drift).

Findings

Client.Rust-001

Field	Value
Severity	High
Category	mxaccessgw conventions
Location	`clients/rust/src/options.rs:98,143`
Status	Resolved

Description: with_max_grpc_message_bytes and max_grpc_message_bytes have no /// doc comments. The crate sets #![warn(missing_docs)] and CLAUDE.md mandates that cargo clippy --workspace --all-targets -- -D warnings pass. Under -D warnings these become hard errors, so clippy fails to compile the crate — breaking the documented build/test workflow for the module.

Recommendation: Add doc comments to both methods, e.g. /// Maximum encoded/decoded gRPC message size in bytes (default 16 MiB).

Resolution: Resolved in 0d8a28d (2026-05-18): doc comments added to both methods.

Client.Rust-002

Field	Value
Severity	High
Category	mxaccessgw conventions
Location	`clients/rust/src/session.rs:522`
Status	Resolved

Description: The BulkReplyKind enum's variants (AddItemBulk, AdviseItemBulk, RemoveItemBulk, UnAdviseItemBulk, SubscribeBulk, UnsubscribeBulk) all share the Bulk suffix, tripping clippy::enum_variant_names. Under -D warnings this is a compile error, so cargo clippy --workspace --all-targets -- -D warnings fails — a violation of the CLAUDE.md requirement that clippy pass cleanly.

Recommendation: Rename the variants to drop the common suffix (e.g. AddItem, AdviseItem, …) or add a narrowly-scoped #[allow(clippy::enum_variant_names)] with a reason comment.

Resolution: Resolved in 0d8a28d (2026-05-18): variants renamed to AddItem/AdviseItem/RemoveItem/UnAdviseItem/Subscribe/Unsubscribe, which no longer share a common suffix.

Client.Rust-003

Field	Value
Severity	High
Category	Correctness & logic bugs
Location	`clients/rust/crates/mxgw-cli/src/main.rs:1051`
Status	Resolved

Description: The unit test version_json_output_has_protocol_versions asserts value["gatewayProtocolVersion"] == 2, but GATEWAY_PROTOCOL_VERSION is 3 (version.rs:10), matching the authoritative server constant GatewayContractInfo.GatewayProtocolVersion = 3. The test fails, so cargo test --workspace (the documented test step) does not pass — the test was not updated when the protocol version was bumped.

Recommendation: Update the assertion to 3, or better, assert against GATEWAY_PROTOCOL_VERSION so it cannot drift again.

Resolution: Resolved in 0d8a28d (2026-05-18): the test now asserts against the GATEWAY_PROTOCOL_VERSION / WORKER_PROTOCOL_VERSION constants, so it cannot drift again.

Client.Rust-004

Field	Value
Severity	Low
Category	Documentation & comments
Location	`clients/rust/src/version.rs:7`
Status	Resolved

Description: CLIENT_VERSION is "0.1.0-dev" and its doc comment claims "Mirrors Cargo.toml", but Cargo.toml declares version = "0.1.0" (no -dev suffix). The comment is misleading and the value is not actually kept in sync with the manifest.

Recommendation: Either set CLIENT_VERSION from the build via env!("CARGO_PKG_VERSION"), or correct the constant to "0.1.0" and drop the "Mirrors Cargo.toml" claim.

Resolution: Resolved in 0d8a28d (2026-05-18): CLIENT_VERSION is now env!("CARGO_PKG_VERSION"), taken from Cargo.toml at compile time so the two cannot drift.

Client.Rust-005

Field	Value
Severity	Medium
Category	Correctness & logic bugs
Location	`clients/rust/src/session.rs:489-520`
Status	Resolved

Description: register_server_handle, add_item_handle, and add_item2_handle fall through to reply.return_value … .unwrap_or_default(), returning 0 when the reply carries neither the expected typed payload nor an Int32 return_value. Because Session::invoke has already confirmed protocol_status == Ok, a malformed-but-OK reply silently yields handle 0, which the caller then uses as a real handle against the worker.

Recommendation: Return Err(Error::ProtocolStatus { … }) (or a dedicated Error::MalformedReply) when an OK reply lacks an extractable handle, instead of defaulting to 0.

Resolution: Resolved in 0d8a28d (2026-05-18): the three handle extractors now return Result<i32, Error> and yield the new Error::MalformedReply when an OK reply carries no usable handle.

Client.Rust-006

Field	Value
Severity	Medium
Category	Error handling & resilience
Location	`clients/rust/src/session.rs:531-555`
Status	Resolved

Description: bulk_results returns Vec::new() for any (payload, kind) combination that does not match the expected arm — including an OK reply carrying the wrong or no payload. A caller of subscribe_bulk/add_item_bulk then sees an empty result vector and cannot distinguish "zero items processed" from "gateway returned a shapeless reply".

Recommendation: Treat a missing/mismatched bulk payload on an OK reply as an error rather than an empty vector, or document the empty-vec fallback explicitly and log it.

Resolution: Resolved in 0d8a28d (2026-05-18): bulk_results now returns Result<Vec<SubscribeResult>, Error> and yields Error::MalformedReply on a mismatched or absent bulk payload.

Client.Rust-007

Field	Value
Severity	Low
Category	Design-document adherence
Location	`clients/rust/RustClientDesign.md:14-55`
Status	Resolved

Description: RustClientDesign.md is stale relative to the implemented code. It documents a nested crates/mxgateway-client/ layout (the real crate root is clients/rust/ with a flat src/), and lists tracing among "Expected dependencies", but tracing appears in no Cargo.toml. CLAUDE.md requires docs to change with the source.

Recommendation: Update RustClientDesign.md to the actual flat layout and remove tracing from the dependency list (or add tracing if structured logging is genuinely intended).

Resolution: Resolved in 0d8a28d (2026-05-18): the "Crate Layout" section now shows the actual flat layout (mxgateway-client as the workspace-root crate, mxgw-cli as a member) and the unused tracing entry was removed from the dependency list.

Client.Rust-008

Field	Value
Severity	Low
Category	Performance & resource management
Location	`clients/rust/src/value.rs:161-261`
Status	Resolved

Description: MxValueProjection::from_proto and MxArrayProjection::from_proto deep-clone every element out of the wire message while MxValue/MxArrayValue also retain the original raw message. Every MxValue therefore holds two copies of its payload, wasteful for large string arrays or raw blobs arriving on the event stream.

Recommendation: Compute the projection lazily on demand, or have the projection borrow from raw, so array/raw payloads are not duplicated for every wrapped value.

Resolution: Resolved in 0d8a28d (2026-05-18): MxValue and MxArrayValue no longer cache a projection field — projection() computes the typed view on demand from raw. A value built only to be sent over the wire now holds a single copy of its payload and pays no projection cost.

Client.Rust-009

Field	Value
Severity	Low
Category	Testing coverage
Location	`clients/rust/tests/client_behavior.rs`, `clients/rust/src/galaxy.rs`
Status	Resolved

Description: Several critical paths are untested: TLS channel setup (with_plaintext(false) / CA-file loading), mid-stream tonic::Status fault propagation through EventStream/DeployEventStream (tests only send Ok items), and the bulk-size cap (ensure_bulk_size rejecting >1000 items).

Recommendation: Add tests that (a) feed an Err(Status) into the event/deploy streams and assert it surfaces as the mapped Error, (b) assert add_item_bulk with 1001 items returns Error::InvalidArgument, and (c) exercise the CA-file/InvalidEndpoint error path.

Resolution: Resolved in 0d8a28d (2026-05-18): added add_item_bulk_rejects_input_above_the_thousand_item_cap, event_stream_surfaces_a_mid_stream_status_fault (the fake gateway now optionally emits a mid-stream Status::unavailable), and connect_with_unreadable_ca_file_reports_invalid_endpoint.

Client.Rust-010

Field	Value
Severity	Low
Category	Error handling & resilience
Location	`clients/rust/src/client.rs:255-268`, `clients/rust/src/galaxy.rs:204-216`
Status	Resolved

Description: The client applies only a per-call deadline via Request::set_timeout and has no retry, reconnect, or transient-vs-permanent classification. A transient Unavailable (e.g. a gateway restart) maps to the catch-all Error::Status and is indistinguishable from a permanent failure. This is an acceptable v1 stance but is undocumented.

Recommendation: Either add a documented Error::Unavailable variant classifying Code::Unavailable/Code::ResourceExhausted, or explicitly document in the README that the client performs no retries and that transient failures arrive as Error::Status.

Resolution: Resolved in 0d8a28d (2026-05-18): added the Error::Unavailable variant; From<tonic::Status> maps Code::Unavailable and Code::ResourceExhausted to it, so callers can classify transient failures without unwrapping the raw status.

Client.Rust-011

Field	Value
Severity	Low
Category	mxaccessgw conventions
Location	`clients/rust/src/session.rs:469`
Status	Resolved

Description: command_request hard-codes client_correlation_id as format!("rust-client-{}", kind.as_str_name()). Every invocation of the same command kind on a session uses an identical correlation id, so the id cannot correlate a specific request/reply pair in gateway logs or among concurrent in-flight calls. MXAccess parity diagnostics rely on correlation ids being unique per call.

Recommendation: Append a per-call unique suffix (monotonic counter or UUID) to the correlation id, or expose a way for the caller to supply one.

Resolution: Resolved in 0d8a28d (2026-05-18): correlation ids are built by next_correlation_id, which appends a process-wide atomic sequence number; Session::close uses it too.

Client.Rust-012

Field	Value
Severity	High
Category	mxaccessgw conventions
Location	`clients/rust/src/galaxy.rs:282`
Status	Resolved

Description: Found while verifying the fix for Client.Rust-001/002: cargo clippy --workspace --all-targets -- -D warnings reported a third violation the original review missed. The get_last_deploy_time test fake calls .clone() on a MutexGuard<Option<prost_types::Timestamp>>, and Option<Timestamp> is Copy (clippy::clone_on_copy). Under -D warnings this is a compile error, so clippy still did not pass after Client.Rust-001/002 alone.

Recommendation: Dereference instead of cloning: *self.state.last_deploy.lock().unwrap().

Resolution: Resolved in 0d8a28d (2026-05-18): replaced .clone() with a deref. cargo clippy --workspace --all-targets -- -D warnings now passes cleanly.

Client.Rust-013

Field	Value
Severity	High
Category	mxaccessgw conventions
Location	`src/MxGateway.Contracts/Protos/mxaccess_gateway.proto:414-424` (origin); `clients/rust/src/generated.rs:11-31` (suppression site)
Status	Resolved

Description: cargo clippy --workspace --all-targets -- -D warnings fails again on this commit, this time on a clippy::doc_lazy_continuation violation in generated code:

error: doc list item without indentation
   --> .../mxaccess_gateway.v1.rs:526:5
    |
526 | /// `timeout_ms == 0` uses the gateway-configured default (1000 ms).
    |     ^

The lint fires because the ReadBulkCommand proto comment (added with the bulk Read feature in commit 5e375f6) writes a bulleted list and then a trailing paragraph without the required blank line. prost-build forwards the proto comment verbatim into Rust doc comments, and the Rust client compiles those generated modules with crate-default lints. The crate already opts out of clippy::large_enum_variant in src/generated.rs for exactly this kind of generator-style problem, but doc_lazy_continuation is not on the allow-list, so the lint reaches -D warnings and breaks the documented cargo clippy --workspace --all-targets -- -D warnings invocation that CLAUDE.md mandates pass. The Rust client review was previously closed as clippy-clean (Client.Rust-001/002/012); this is the third clippy-clean regression caused by generated code in this module and warrants a more durable fix.

Recommendation: Add #![allow(clippy::doc_lazy_continuation)] to each generated submodule in clients/rust/src/generated.rs alongside clippy::large_enum_variant, so generated doc comments — which the client cannot edit — cannot break the -D warnings build. Independently, fix the upstream proto comment to insert a blank line before the trailing paragraph so the C# / Go / Python / Java generators do not carry the same flaky text.

Resolution: 2026-05-20 — Added #![allow(clippy::doc_lazy_continuation)] to each generated submodule in clients/rust/src/generated.rs next to the existing clippy::large_enum_variant allow, and reformatted the ReadBulkCommand proto comment in src/MxGateway.Contracts/Protos/mxaccess_gateway.proto to surround the bulleted list with blank lines so doc-comment generators in every language see a properly-terminated list. cargo clippy --workspace --all-targets -- -D warnings and cargo test --workspace now pass, and dotnet build src/MxGateway.Contracts/MxGateway.Contracts.csproj reports 0 warnings.

Client.Rust-014

Field	Value
Severity	Low
Category	mxaccessgw conventions
Location	`clients/rust/crates/mxgw-cli/src/main.rs:450,497`
Status	Resolved

Description: Client.Rust-011 made Session build unique correlation ids per call, but the mxgw CLI's Ping and CloseSession subcommands still hard-code client_correlation_id: "rust-cli-ping".to_owned() and "rust-cli-close-session".to_owned(). Both go through client.invoke(…) / client.close_session_raw(…) rather than the Session helpers, so the library's id generator does not run. The CLI is the cross-language e2e driver — when the same machine runs concurrent CLI smokes, every ping/close-session request collides on the same correlation id in gateway logs, defeating the diagnostic value the library fix unlocked.

Recommendation: Either (a) expose session::next_correlation_id as a pub(crate) or library-level helper and have the CLI call it from Ping/CloseSession, or (b) replace these RPCs with the higher-level Session helpers (Session::close, and a thin Session::ping wrapper) so the CLI shares the library's correlation-id discipline by construction.

Resolution: 2026-05-20 — Promoted session::next_correlation_id from a module-private helper to a pub library-level function (it already lived in the pub mod session) and updated the mxgw CLI's Ping and CloseSession subcommands to call mxgateway_client::session::next_correlation_id("cli-ping") / next_correlation_id("cli-close-session") instead of the hard-coded "rust-cli-ping" / "rust-cli-close-session" strings. Concurrent CLI smokes now produce unique correlation ids per call — driven by the same process-wide CORRELATION_SEQUENCE AtomicU64 the library uses — so gateway logs can tell collisions apart again. cargo fmt, cargo build --workspace, cargo clippy --workspace --all-targets -- -D warnings, and cargo test --workspace all pass.

Client.Rust-015

Field	Value
Severity	Medium
Category	Error handling & resilience
Location	`clients/rust/crates/mxgw-cli/src/main.rs:1053-1070`
Status	Resolved

Description: The new cross-language benchmark bench-read-bulk pushes the elapsed time of every read_bulk call into latencies_ms regardless of whether the call returned Ok or Err:

let outcome = session.read_bulk(server_handle, &tags, timeout_ms).await;
let elapsed_ms = call_start.elapsed().as_secs_f64() * 1000.0;
latencies_ms.push(elapsed_ms);
match outcome {
    Ok(results) => { successful_calls += 1; … }
    Err(_) => failed_calls += 1,
}

A failed read_bulk (transient Unavailable, deadline-exceeded mid-call, etc.) typically returns later than a successful one — it includes the full per-call timeout that the success path never waits for. The histogram therefore conflates "p99 cached-read latency" with "p99 of (cached-read + timed-out call)", and the JSON document the PowerShell driver collates publishes latencyMs.p99 / latencyMs.max that no longer represent successful-call latency. Worse, the failure category is silently dropped (Err(_) => failed_calls += 1) so a benchmark run that fails on every call still emits a coherent-looking JSON without ever surfacing why. This is misleading for a benchmark whose JSON shape is the cross-language comparison contract.

Recommendation: Only push elapsed time into latencies_ms on Ok, or split into two histograms (successLatencyMs and failureLatencyMs) and log the first failure's error string into the stats record so a partial-failure run is visible at the report layer.

Resolution: 2026-05-20 — Extracted the per-iteration accounting in bench-read-bulk into a BenchReadBulkStats helper with explicit record_success/record_failure methods. Successful read_bulk calls now flow into success_latencies_ms (driving the cross-language latencyMs.p99/max JSON contract), failures flow into a separate failure_latencies_ms histogram surfaced as failureLatencyMs, and the first failure's redacted error string is stashed as firstFailure so a partial-failure run is visible at the report layer instead of producing a coherent-looking JSON that hides every error. Added a unit test (bench_read_bulk_stats_keeps_failures_out_of_success_latency_histogram) that records two fast successes plus a deliberately slow failure and asserts the success histogram never sees the failure latency, plus a smaller smoke test for the zero-duration calls-per-second path.

Client.Rust-016

Field	Value
Severity	Medium
Category	Testing coverage
Location	`clients/rust/tests/client_behavior.rs`, `clients/rust/src/session.rs:489-519,654-768`
Status	Resolved

Description: The fixes for Client.Rust-005 / 006 added five new Error::MalformedReply paths to session.rs (register_server_handle, add_item_handle, add_item2_handle, bulk_results, bulk_write_results) plus the inline branch in read_bulk. None of them are exercised by tests — every test in client_behavior.rs feeds the matching payload back to the client, so the malformed-reply branches are dead code from the test suite's perspective. The new bulk-write helpers (write_bulk, write2_bulk, write_secured_bulk, write_secured2_bulk) only have a single happy-path assertion via write_bulk, leaving the three other variants and every per-entry-failure shape untested. The bench-read-bulk flow has no test (the driver script is the only consumer). The Error::Unavailable variant from Client.Rust-010 is covered by event_stream_surfaces_a_mid_stream_status_fault, but the same variant on a unary Code::Unavailable is not.

Recommendation: Add three light tests against the existing FakeGateway:

Have the fake reply to AddItem (or Register / AddItem2) with protocol_status = Ok and no payload, and assert the client surfaces Error::MalformedReply.
Have the fake reply to WriteBulk with protocol_status = Ok and the wrong payload arm (e.g. an AddItemReply body), and assert Error::MalformedReply.
Have the fake fail the unary Invoke with Status::unavailable(...) and assert Error::Unavailable.

Optionally add Write2Bulk / WriteSecuredBulk / WriteSecured2Bulk smoke assertions so all four bulk-write families have at least one round-trip test.

Resolution: 2026-05-20 — Added eight new integration tests in clients/rust/tests/client_behavior.rs. Each new Error::MalformedReply site is exercised via a test-only InvokeOverride injected into FakeState that lets a single test pin the fake gateway's Invoke handler to one of three malformed shapes (OK reply with no payload, OK reply with the wrong payload arm for read_bulk, OK reply with the wrong payload arm for the other bulk / bulk-write families): register_returns_malformed_reply_when_ok_reply_has_no_payload, add_item_returns_malformed_reply_when_ok_reply_has_no_payload, add_item2_returns_malformed_reply_when_ok_reply_has_no_payload, subscribe_bulk_returns_malformed_reply_on_mismatched_payload_arm, write_bulk_returns_malformed_reply_on_mismatched_payload_arm, and read_bulk_returns_malformed_reply_on_mismatched_payload_arm. The unary Error::Unavailable path is covered by unary_invoke_maps_status_unavailable_to_error_unavailable (the override returns Status::unavailable(...)). The remaining three bulk-write families gained round-trip smoke tests — write2_bulk_round_trips_through_the_fake_gateway, write_secured_bulk_round_trips_through_the_fake_gateway, write_secured2_bulk_round_trips_through_the_fake_gateway — extending the fake gateway's dispatcher with happy-path replies for Write2Bulk / WriteSecuredBulk / WriteSecured2Bulk. The bench-read-bulk flow gets a BenchReadBulkStats unit test in crates/mxgw-cli/src/main.rs (see Client.Rust-015) that asserts the latency-tracking change keeps failed-call durations out of latencyMs.

Client.Rust-017

Field	Value
Severity	Low
Category	Design-document adherence
Location	`clients/rust/RustClientDesign.md:79-99,156-163`
Status	Resolved

Description: CLAUDE.md requires docs to change with the source. RustClientDesign.md was refreshed to fix the layout/tracing drift (Client.Rust-007), but the Session API surface in the design (Library API block, lines 79-99) still lists only the original six bulk helpers — add_item_bulk, advise_item_bulk, remove_item_bulk, un_advise_item_bulk, subscribe_bulk, unsubscribe_bulk — and is missing the five new bulk-write helpers and read_bulk (write_bulk, write2_bulk, write_secured_bulk, write_secured2_bulk, read_bulk) that landed in commits 5e375f6 / f220908 / 61644e6. The Error Handling block (lines 130-146) still enumerates Transport, Status, Authentication, Authorization, Session, Worker, Command, MxAccess, Timeout, Cancelled — but not MalformedReply, Unavailable, or InvalidEndpoint, all of which are now public variants of the crate's Error enum. The Test CLI block (lines 158-163) lists version / smoke / stream-events / write but is missing every new subcommand (read-bulk, write-bulk, write2-bulk, write-secured-bulk, write-secured2-bulk, bench-read-bulk, galaxy watch).

Recommendation: Bring the design doc back in sync: extend the Session API code block to enumerate the bulk-write/read methods, expand the Error enum to match clients/rust/src/error.rs, and add the missing CLI subcommands. The README is already up to date, so this is design-doc-only churn.

Resolution: 2026-05-20 — Brought clients/rust/RustClientDesign.md back in sync with the implementation. The Session block now lists the five new bulk helpers (write_bulk, write2_bulk, write_secured_bulk, write_secured2_bulk, read_bulk) alongside the original six and notes that session::next_correlation_id is pub for raw-RPC consumers (the CLI). The Error enum block now matches clients/rust/src/error.rs — InvalidEndpoint, InvalidArgument, Transport, Authentication, Authorization, Timeout, Cancelled, Unavailable, Status, Command, ProtocolStatus, MalformedReply — with a short paragraph explaining what Unavailable, MalformedReply, and InvalidEndpoint classify. The Test CLI block enumerates every subcommand the binary exposes today: version, ping, open-session, close-session, register, add-item, advise, subscribe-bulk, unsubscribe-bulk, read-bulk, write, write2, write-bulk, write2-bulk, write-secured-bulk, write-secured2-bulk, stream-events, bench-read-bulk, smoke, and the galaxy {test-connection,last-deploy-time,discover-hierarchy,watch} subtree.

Client.Rust-018

Field	Value
Severity	Medium
Category	Error handling & resilience
Location	`clients/rust/crates/mxgw-cli/src/main.rs:1098-1170`; `scripts/bench-read-bulk.ps1:347-365`; siblings: `clients/go/cmd/mxgw-go/main.go:600-648`, `clients/python/src/mxgateway_cli/commands.py:614-662`, `clients/dotnet/MxGateway.Client.Cli/MxGatewayClientCli.cs:685-770`, `clients/java/mxgateway-cli/src/main/java/com/dohertylan/mxgateway/cli/MxGatewayCli.java:855-940`
Status	Resolved

Description: Client.Rust-015's resolution split Rust's bench histogram so latencyMs records only successful read_bulk calls and a new failureLatencyMs field holds failed-call durations. The local logic is right, the unit test (bench_read_bulk_stats_keeps_failures_out_of_success_latency_histogram) is right, and the JSON shape stays additively compatible with scripts/bench-read-bulk.ps1 (the collator reads $s.latencyMs.p50/p95/p99/max/mean and these keys still exist on the Rust output). The problem is cross-language: the .NET, Go, Python, and Java bench implementations still push every call's elapsed time into a single latenciesMs / latencies_ms / latencyMillis array regardless of success or failure (e.g. clients/go/cmd/mxgw-go/main.go:611 appends before the success/failure branch; clients/python/src/mxgateway_cli/commands.py:624,626 appends in both except and the happy path; clients/dotnet/MxGateway.Client.Cli/MxGatewayClientCli.cs:701,705 adds in both catch and the OK path; clients/java/mxgateway-cli/src/main/java/com/dohertylan/mxgateway/cli/MxGatewayCli.java:865,880 records in both branches). The PS driver's side-by-side comparison table (lines 348-360) pulls latencyMs.p50/p95/p99/max/mean from every client and prints them in one row, so a partial-failure run now shows Rust's p99 measured over successes only and the other four clients' p99 measured over (success + per-call timeout) — the numbers are silently no longer comparable. This re-introduces the original Client.Rust-015 problem at the cross-language layer that the fix was meant to remove.

Recommendation: Make the contract uniform. Either (a) revert Rust's latencyMs to the all-calls histogram for backwards/cross-language compatibility and keep failureLatencyMs as an additive Rust-only enrichment, or (b) push the same success-only / failure-separated split into the .NET, Go, Python, and Java bench commands so every language emits the honest pair (latencyMs = success, failureLatencyMs = failure, plus firstFailure) and update the PS driver's table column to make the success-only semantics explicit (p99 ok ms). Option (b) is the better long-term posture but it is a cross-client change; option (a) restores comparability immediately.

Resolution: 2026-05-20 — Took option (a) to restore cross-language comparability immediately. Reverted Rust's latencyMs to the all-calls histogram so it matches the .NET/Go/Python/Java bench shape that scripts/bench-read-bulk.ps1 collates side-by-side: BenchReadBulkStats::record_success and record_failure now both push elapsed time into a single latencies_ms vector, and record_failure additionally pushes into failure_latencies_ms and stashes the first failure's redacted error string in first_failure. The JSON output keeps failureLatencyMs and firstFailure as Rust-only additive enrichment so a partial-failure run is still visible at the report layer without breaking the side-by-side table. Renamed the unit test to bench_read_bulk_stats_tracks_all_calls_in_latency_and_failures_separately; it now asserts latencyMs.max == 1500.0 (the slow failure is included in the cross-language latencyMs contract) while failureLatencyMs.max == 1500.0 and firstFailure still surface the failure separately for diagnostics. Pushing the success-only / failure-separated split into the other four clients (option (b)) is the better long-term posture but is deliberately out of scope here.

Client.Rust-019

Field	Value
Severity	Low
Category	Design-document adherence
Location	`clients/rust/RustClientDesign.md:96-100`
Status	Resolved

Description: Client.Rust-017 was closed by adding the new bulk-write/read entries to the design doc, but the signatures shown in the code block do not match the implementation. The doc declares:

pub async fn write_bulk(&self, server_handle: i32, entries: Vec<WriteBulkEntry>, user_id: i32) -> Result<Vec<BulkWriteResult>, Error>;
pub async fn write2_bulk(&self, server_handle: i32, entries: Vec<Write2BulkEntry>, timestamp: prost_types::Timestamp, user_id: i32) -> Result<Vec<BulkWriteResult>, Error>;
pub async fn write_secured_bulk(&self, server_handle: i32, entries: Vec<WriteSecuredBulkEntry>, current_user_id: i32, verifier_user_id: i32) -> Result<Vec<BulkWriteResult>, Error>;
pub async fn write_secured2_bulk(&self, server_handle: i32, entries: Vec<WriteSecured2BulkEntry>, timestamp: prost_types::Timestamp, current_user_id: i32, verifier_user_id: i32) -> Result<Vec<BulkWriteResult>, Error>;
pub async fn read_bulk(&self, server_handle: i32, tags: &[String], timeout_ms: u32) -> Result<Vec<ReadBulkResult>, Error>;

The actual implementations in clients/rust/src/session.rs:385-526 take only (server_handle, entries) — user_id is per-entry on WriteBulkEntry/Write2BulkEntry, timestamp_value is per-entry on Write2BulkEntry/WriteSecured2BulkEntry, and current_user_id/verifier_user_id are per-entry on WriteSecured{,2}BulkEntry. The protobuf in src/MxGateway.Contracts/Protos/mxaccess_gateway.proto:364-416 confirms this — there is no top-level user_id on these commands. The doc also returns Vec<ReadBulkResult> but the generated type is BulkReadResult (the gateway's BulkReadReply carries repeated BulkReadResult), and the actual signature is read_bulk<S: AsRef<str>>(..., tag_addresses: &[S], ...) -> Vec<BulkReadResult> — generic over AsRef<str> so callers can pass either Vec<String> or [&str].

The drift is small but the design doc was the explicit subject of Client.Rust-017's resolution, so it warrants a follow-up. CLAUDE.md requires docs to change with the source.

Recommendation: Replace the five signatures in RustClientDesign.md:96-100 with the ones actually in session.rs:

pub async fn write_bulk(&self, server_handle: i32, entries: Vec<WriteBulkEntry>) -> Result<Vec<BulkWriteResult>, Error>;
pub async fn write2_bulk(&self, server_handle: i32, entries: Vec<Write2BulkEntry>) -> Result<Vec<BulkWriteResult>, Error>;
pub async fn write_secured_bulk(&self, server_handle: i32, entries: Vec<WriteSecuredBulkEntry>) -> Result<Vec<BulkWriteResult>, Error>;
pub async fn write_secured2_bulk(&self, server_handle: i32, entries: Vec<WriteSecured2BulkEntry>) -> Result<Vec<BulkWriteResult>, Error>;
pub async fn read_bulk<S: AsRef<str>>(&self, server_handle: i32, tag_addresses: &[S], timeout_ms: u32) -> Result<Vec<BulkReadResult>, Error>;

and add a one-line note that the per-entry fields (user_id, timestamp_value, current_user_id, verifier_user_id) live on the entry structs themselves.

Resolution: 2026-05-20 — Replaced the five drifted signatures in RustClientDesign.md with the ones that actually live in clients/rust/src/session.rs: write_bulk / write2_bulk / write_secured_bulk / write_secured2_bulk take only (server_handle, entries), and read_bulk<S: AsRef<str>> takes a borrowed &[S] and returns Vec<BulkReadResult> (not Vec<ReadBulkResult>). Added a follow-up paragraph noting that the per-entry fields user_id / timestamp_value / current_user_id / verifier_user_id live on WriteBulkEntry / Write2BulkEntry / WriteSecuredBulkEntry / WriteSecured2BulkEntry themselves rather than as trailing positional arguments, matching the protobuf shapes in mxaccess_gateway.proto, and that read_bulk is generic over AsRef<str> so callers can pass &[String] or &[&str] without cloning at the call site.

Client.Rust-020

Field	Value
Severity	Low
Category	Documentation & comments
Location	`clients/rust/src/session.rs:31-46`; `clients/rust/src/lib.rs:14-39`
Status	Resolved

Description: Client.Rust-014's resolution promoted next_correlation_id from a module-private helper to a pub function so the mxgw CLI's raw-RPC paths can share the library's correlation-id discipline. The doc comment commits the library to a literal string format — "rust-client-{label}-{N}" — that external code can now depend on. Two concerns:

The function is not re-exported at the crate root in lib.rs (it only ships through the pub mod session namespace), so the in-tree caller writes the long mxgateway_client::session::next_correlation_id("cli-ping") path. Either re-export it via #[doc(inline)] pub use session::next_correlation_id; or leave it where it is and add a short note in the doc — but the current state straddles "public API" and "lib-internal helper" without saying which.
The doc comment does not declare a stability stance (no #[doc(hidden)], no "experimental" note, no __priv naming). As written it promises the literal format "rust-client-{label}-{N}" to any out-of-tree consumer; a future change that renames the prefix (for example to drop the rust- after a multi-client reformat) would be a behavioural break. The RustClientDesign.md resolution of Client.Rust-017 ("session::next_correlation_id is pub") reads similarly — it does not say whether the format is stable.

The combination — pub, format-committing doc, no stability note, no crate-root re-export — leaves the public surface ambiguous. The same review category (Documentation & comments) is where Client.Rust-014's CLI-side fix is now visible, so this is the natural place to clean it up.

Recommendation: Pick one of:

Treat next_correlation_id as part of the SDK's public API. Re-export it from lib.rs (#[doc(inline)] pub use session::next_correlation_id;), rewrite the doc comment to not promise the literal "rust-client-{label}-{N}" format (just the property "monotonic, unique within a process, includes the supplied label"), and call that out in RustClientDesign.md.
Treat it as internal-only. Mark it #[doc(hidden)] pub and add a // Internal helper exposed for the in-tree mxgw CLI; not part of the public SDK contract. comment so out-of-tree consumers do not build against a format that the SDK is free to change.

The CLI integration in Client.Rust-014 works either way; this is solely about declaring intent so the SDK's public surface is unambiguous.

Resolution: 2026-05-20 — Took the "treat as public SDK API" branch. Re-exported next_correlation_id at the crate root in clients/rust/src/lib.rs (#[doc(inline)] pub use session::{next_correlation_id, Session};) so in-tree and external callers can write the short mxgateway_client::next_correlation_id(...) path. Updated the in-tree mxgw CLI (Ping and CloseSession subcommands) to call through the crate-root re-export instead of mxgateway_client::session::next_correlation_id. Rewrote the doc comment to drop the format promise: the returned id is now documented as an opaque token with three guaranteed properties (embeds the supplied label, unique within a process via an internal monotonic atomic sequence, carries no embedded secret beyond label), and the doc explicitly states that the textual format rust-client-{label}-{N} is not part of the public contract and that callers must not parse it. Cross-referenced the crate-root re-export from the function-level doc. Updated RustClientDesign.md to call out that next_correlation_id is part of the public SDK surface, re-exported at the crate root, and that its textual format is intentionally not part of the contract.

Client.Rust-021

Field	Value
Severity	Low
Category	Design-document adherence
Location	`clients/rust/RustClientDesign.md:14-33`
Status	Open

Description: The crate-name change in commit 397d3c5 (top-level mxgateway-client → zb-mom-ww-mxgateway-client) is reflected in Cargo.toml, Cargo.lock, every use zb_mom_ww_mxgateway_client:: import, and build.rs. The "Recommended layout" block in RustClientDesign.md:21-33 shows a nested structure with a top-level crates/zb-mom-ww-mxgateway-client/ subdirectory containing src/lib.rs, src/client.rs, etc. — but the actual layout on disk is flat: the top-level crate lives at clients/rust/ directly (with src/, Cargo.toml, and build.rs at the workspace root) and crates/mxgw-cli/ is the only nested member. A reader consulting the design to understand the layout will be misled into looking for crates/zb-mom-ww-mxgateway-client/ that does not exist. CLAUDE.md requires design docs to track code changes.

Recommendation: Update RustClientDesign.md:14-33 to describe the actual flat layout: workspace root at clients/rust/, top-level crate zb-mom-ww-mxgateway-client (declared in Cargo.toml) with src/lib.rs, src/client.rs, etc. directly under it; crates/mxgw-cli/ is the single member subcrate. Alternatively label the block as "Aspirational nested layout (not currently adopted)" and add a separate "Current layout" section.

Resolution: (empty until closed)

42 KiB Raw Blame History