Code-review 2026-05-20 sweep: re-review at 1cd51bb, resolve 72 findings across all 11 modules

Re-reviewed every module/client against the 10-category checklist
(REVIEW-PROCESS.md) at commit 1cd51bb, filed 72 new findings, and fixed
them in three priority waves (3 High, 17 Medium, 52 Low).

Highs
- Server-017: enumerate AcknowledgeAlarm / QueryActiveAlarms in
  GatewayGrpcScopeResolver so non-admin keys can use them; document the
  mapping in docs/Authorization.md; add interceptor tests.
- Client.Java-013: add the five missing bulk-method stubs to the CLI
  FakeSession so the test module compiles on a clean tree.
- Client.Rust-013: fix the clippy::doc_lazy_continuation regression in
  generated tonic code by reformatting the ReadBulkCommand proto comment
  and scoping a #![allow(...)] to the generated submodules.

Mediums (highlights)
- Server: unify GatewaySession state-lock discipline (-015) and make
  DisposeAsync race-safe against in-flight CloseAsync (-016); add
  constraint-enforcement test coverage for the bulk-plan path (-021).
- Worker: introduce StaRuntimeShutdownException so RunAlarmPollLoop can
  distinguish graceful shutdown from a real STA-affinity violation (-016);
  have the watchdog skip StaHung while CurrentCommandCorrelationId is
  non-empty so a legitimate slow ReadBulk no longer self-faults (-017).
- Tests: add per-method round-trip + cancellation coverage for the 11
  GatewaySession bulk methods (-013); replace the real TCP probe in
  GalaxyHierarchyCacheTests with an IGalaxyRepository fake (-016).
- IntegrationTests: drive the StreamEvents writer in the live Write test
  and assert OnWriteComplete (-012); add live tests for
  Unadvise/RemoveItem/Unregister ordering, WriteSecured, and abnormal
  worker exit (-014).
- Worker.Tests: replace MxAccessSession reflection with an internal
  CreateForTesting factory (-016); cover WorkerCancel and unexpected-body
  envelope branches (-017).
- Client.Java: cancel MxEventStream when close() races beforeStart()
  (-014); return a CancellingCompletableFuture that actually forwards
  cancellation through .thenApply chains (-015).
- Client.Python: drop the silent localhost-plaintext downgrade in the CLI;
  require explicit --plaintext (-013).
- Client.Rust: stop bench-read-bulk from polluting success-latency
  histograms with failed-call durations (-015); add coverage for the five
  MalformedReply paths, the bulk-write helpers, the Error::Unavailable
  mapping, and the unary-fault path (-016).
- Contracts: extend docs/Contracts.md with the bulk read/write command
  family (-009).

Lows (highlights)
- Server: cap GalaxyGlobMatcher.RegexCache; align WorkerAlarmRpcDispatcher
  missing-session handling; drop the duplicate dashboard @page routes;
  refresh IAlarmRpcDispatcher XML doc.
- Worker: surface SetXmlAlarmQuery COM failures; remove dead
  subscriptionExpression / ExecutingCommand arms; preserve
  factory-supplied runtime sessions; split MxAlarmSnapshot.cs into three
  files.
- Tests: dispose the WebApplication in seven test classes; rebuild
  FakeWorkerProcess.WaitForExitAsync against a real TaskCompletion source;
  switch the heartbeat-expires test to ManualTimeProvider; add
  InvariantCulture to the remaining DateTimeOffset.Parse sites; document
  GalaxyFilterInputSafetyTests in GatewayTesting.md.
- IntegrationTests: comment fixes, RecordingServerStreamWriter
  IDisposable, class-level [Trait], single-source ZB default connection
  string.
- Worker.Tests: replace silent-return gating with LiveMxAccessFact so
  absent env vars SKIP not pass; PascalCase rename of probe [Fact]s;
  deterministic deadline test; new frame-protocol error tests;
  ComputeTransitions diff-coverage; relocate dev-rig probes to Probes/.
- Contracts: add round-trip coverage and per-field redaction /
  Galaxy-identifier comments to the protos.
- Client.Dotnet: introduce clients/dotnet/Directory.Build.props so
  TreatWarningsAsErrors / analysers apply; document
  DiscoverHierarchyOptions and IMxGatewayCliClient; require typed
  bulk-read handles in CLI; surface AcknowledgeAlarm transport faults
  through Translate().
- Client.Go: kill dead code in alarms_test / fakeGalaxyServer /
  runWriteBulkVariant; document the six new subcommands in writeUsage;
  drain galaxy-watch events on limit; switch io.EOF comparisons to
  errors.Is.
- Client.Java: shared shutdown helpers + new shutdownTimeout option;
  regex-based credential redaction; Long.toUnsignedString for uint64
  sequence; doc fixes.
- Client.Python: combine duplicate imports; add coverage for _percentile /
  bench-read-bulk / MAX_AGGREGATE_EVENTS / _api_key_from_env; populate
  pyproject metadata and ship py.typed.
- Client.Rust: expose next_correlation_id() so CLI ping/close stop
  hard-coding correlation IDs; resync RustClientDesign.md with the current
  Session / Error surface and CLI subcommand set.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-20 09:46:47 -04:00
parent 1cd51bbda3
commit a0203503a7
122 changed files with 8723 additions and 757 deletions
@@ -93,11 +93,24 @@ impl Session {
    pub async fn subscribe_bulk(&self, server_handle: i32, tag_addresses: Vec<String>) -> Result<Vec<SubscribeResult>, Error>;
    pub async fn unsubscribe_bulk(&self, server_handle: i32, item_handles: Vec<i32>) -> Result<Vec<SubscribeResult>, Error>;
    pub async fn write(&self, server_handle: i32, item_handle: i32, value: MxValue, user_id: i32) -> Result<(), Error>;
+    pub async fn write_bulk(&self, server_handle: i32, entries: Vec<WriteBulkEntry>, user_id: i32) -> Result<Vec<BulkWriteResult>, Error>;
+    pub async fn write2_bulk(&self, server_handle: i32, entries: Vec<Write2BulkEntry>, timestamp: prost_types::Timestamp, user_id: i32) -> Result<Vec<BulkWriteResult>, Error>;
+    pub async fn write_secured_bulk(&self, server_handle: i32, entries: Vec<WriteSecuredBulkEntry>, current_user_id: i32, verifier_user_id: i32) -> Result<Vec<BulkWriteResult>, Error>;
+    pub async fn write_secured2_bulk(&self, server_handle: i32, entries: Vec<WriteSecured2BulkEntry>, timestamp: prost_types::Timestamp, current_user_id: i32, verifier_user_id: i32) -> Result<Vec<BulkWriteResult>, Error>;
+    pub async fn read_bulk(&self, server_handle: i32, tags: &[String], timeout_ms: u32) -> Result<Vec<ReadBulkResult>, Error>;
    pub async fn events(&self) -> Result<impl Stream<Item = Result<MxEvent, Error>>, Error>;
    pub async fn close(&self) -> Result<(), Error>;
 }
 ```

+The four bulk-write helpers (`write_bulk`, `write2_bulk`, `write_secured_bulk`,
+`write_secured2_bulk`) and `read_bulk` mirror the worker's bulk command shapes
+in `mxaccess_gateway.proto` and use the same correlation-id discipline as the
+unary helpers. `session::next_correlation_id` is `pub` so that consumers
+constructing raw `MxCommandRequest`/`CloseSessionRequest` payloads outside
+the `Session` helpers (notably the `mxgw` test CLI's `ping` and
+`close-session` subcommands) share the same id generation.
+
 ## Authentication

 Use a `tonic` interceptor or request extension layer to add:
@@ -132,19 +145,29 @@ Use `thiserror`:

 ```rust
 pub enum Error {
+    InvalidEndpoint { endpoint: String, detail: String },
+    InvalidArgument { name: String, detail: String },
    Transport(tonic::transport::Error),
-    Status(tonic::Status),
-    Authentication(String),
-    Authorization(String),
-    Session(SessionError),
-    Worker(WorkerError),
-    Command(CommandError),
-    MxAccess(MxAccessError),
-    Timeout,
-    Cancelled,
+    Authentication { message: String, status: Box<tonic::Status> },
+    Authorization { message: String, status: Box<tonic::Status> },
+    Timeout { message: String, status: Box<tonic::Status> },
+    Cancelled { message: String, status: Box<tonic::Status> },
+    Unavailable { message: String, status: Box<tonic::Status> },
+    Status(Box<tonic::Status>),
+    Command(Box<CommandError>),
+    ProtocolStatus { operation: &'static str, code: ProtocolStatusCode, message: String },
+    MalformedReply { detail: String },
 }
 ```

+`Unavailable` classifies the transient `Code::Unavailable` /
+`Code::ResourceExhausted` statuses so callers can decide whether to retry
+without unwrapping the raw status. `MalformedReply` surfaces OK replies
+whose payload does not carry the data the command contract requires (for
+example, an `AddItem` reply missing the item handle, or a `WriteBulk` reply
+carrying the wrong payload arm). `InvalidEndpoint` is returned when the
+endpoint URL fails to parse or its TLS material cannot be loaded.
+
 Preserve raw command replies in `CommandError` where applicable.
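
As a rough illustration of how a caller might act on the `Unavailable` classification described above, the sketch below retries only on that variant. Everything in it is hypothetical: `GatewayError` is a trimmed-down stand-in for the client's `Error` enum (the real variants carry a boxed `tonic::Status`), and `is_retryable` and `call_with_retry` are not part of `mxgateway_client`. Treating `Unavailable` as the only retryable variant is an assumption, not a documented policy.

```rust
/// Hypothetical stand-in for the client's `Error` enum. Plain strings
/// replace the boxed `tonic::Status` so the sketch needs only `std`.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum GatewayError {
    Unavailable { message: String },
    Timeout { message: String },
    MalformedReply { detail: String },
}

/// Assumed policy: only `Unavailable` is treated as transient.
fn is_retryable(error: &GatewayError) -> bool {
    matches!(error, GatewayError::Unavailable { .. })
}

/// Calls `operation` with a 1-based attempt number until it succeeds,
/// returns a non-retryable error, or `max_attempts` is reached.
/// The operation always runs at least once.
fn call_with_retry<T, F>(max_attempts: u32, mut operation: F) -> Result<T, GatewayError>
where
    F: FnMut(u32) -> Result<T, GatewayError>,
{
    let mut attempt = 1;
    loop {
        match operation(attempt) {
            Ok(value) => return Ok(value),
            // Retry only transient failures, and only while attempts remain.
            Err(error) if is_retryable(&error) && attempt < max_attempts => attempt += 1,
            Err(error) => return Err(error),
        }
    }
}
```

The sketch has no delay between attempts; a real caller would probably add a backoff before retrying. A `MalformedReply` is returned immediately because repeating the call is unlikely to change a reply that violates the command contract.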

 ## Test CLI
@@ -153,13 +176,32 @@ Binary: `mxgw`.

 Use `clap` derive.

-Commands:
+Commands (see `clients/rust/README.md` for full argument lists):

 ```text
 mxgw version
-mxgw smoke --endpoint http://localhost:5000 --api-key-env MXGATEWAY_API_KEY --item TestChildObject.TestInt
+mxgw ping
+mxgw open-session
+mxgw close-session --session-id <id>
+mxgw register --session-id <id> --client-name <name>
+mxgw add-item --session-id <id> --server-handle <h> --item <tag>
+mxgw advise --session-id <id> --server-handle <h> --item-handle <h>
+mxgw subscribe-bulk --session-id <id> --server-handle <h> --items <a,b,c>
+mxgw unsubscribe-bulk --session-id <id> --server-handle <h> --item-handles <1,2,3>
+mxgw read-bulk --session-id <id> --server-handle <h> --items <a,b,c> --timeout-ms 1500
+mxgw write --session-id <id> --server-handle 1 --item-handle 1 --value-type int32 --value 123
+mxgw write2 --session-id <id> --server-handle 1 --item-handle 1 --value-type int32 --value 123 --timestamp <rfc3339>
+mxgw write-bulk --session-id <id> --server-handle <h> --item-handles <1,2> --value-type int32 --values <1,2>
+mxgw write2-bulk --session-id <id> --server-handle <h> --item-handles <1,2> --value-type int32 --values <1,2> --timestamp <rfc3339>
+mxgw write-secured-bulk --session-id <id> --server-handle <h> --item-handles <1,2> --value-type int32 --values <1,2>
+mxgw write-secured2-bulk --session-id <id> --server-handle <h> --item-handles <1,2> --value-type int32 --values <1,2> --timestamp <rfc3339>
 mxgw stream-events --session-id <id> --json
-mxgw write --session-id <id> --server-handle 1 --item-handle 1 --type int32 --value 123
+mxgw bench-read-bulk --duration-seconds 30 --bulk-size 6 --json
+mxgw smoke --endpoint http://localhost:5000 --api-key-env MXGATEWAY_API_KEY --item TestChildObject.TestInt
+mxgw galaxy test-connection
+mxgw galaxy last-deploy-time
+mxgw galaxy discover-hierarchy
+mxgw galaxy watch [--last-seen-deploy-time <rfc3339>] [--max-events N]
 ```

 JSON output should use `serde_json`.
@@ -447,7 +447,9 @@ async fn run(cli: Cli) -> Result<(), Error> {
            let client = connect(connection).await?;
            let reply = client
                .invoke(MxCommandRequest {
-                    client_correlation_id: "rust-cli-ping".to_owned(),
+                    client_correlation_id: mxgateway_client::session::next_correlation_id(
+                        "cli-ping",
+                    ),
                    command: Some(MxCommand {
                        kind: MxCommandKind::Ping as i32,
                        payload: Some(mxgateway_client::generated::mxaccess_gateway::v1::mx_command::Payload::Ping(
@@ -494,7 +496,9 @@ async fn run(cli: Cli) -> Result<(), Error> {
            let reply = client
                .close_session_raw(CloseSessionRequest {
                    session_id,
-                    client_correlation_id: "rust-cli-close-session".to_owned(),
+                    client_correlation_id: mxgateway_client::session::next_correlation_id(
+                        "cli-close-session",
+                    ),
                })
                .await?;
            if json {
@@ -1034,19 +1038,13 @@ async fn run_bench_read_bulk(
            .map(|r| r.item_handle)
            .collect();

-        let warmup_deadline = std::time::Instant::now()
-            + std::time::Duration::from_secs(warmup_seconds);
+        let warmup_deadline =
+            std::time::Instant::now() + std::time::Duration::from_secs(warmup_seconds);
        while std::time::Instant::now() < warmup_deadline {
-            let _ = session
-                .read_bulk(server_handle, &tags, timeout_ms)
-                .await;
+            let _ = session.read_bulk(server_handle, &tags, timeout_ms).await;
        }

-        let mut latencies_ms: Vec<f64> = Vec::with_capacity(65_536);
-        let mut total_read_results: u64 = 0;
-        let mut cached_read_results: u64 = 0;
-        let mut successful_calls: u64 = 0;
-        let mut failed_calls: u64 = 0;
+        let mut stats = BenchReadBulkStats::default();
        let steady_start = std::time::Instant::now();
        let steady_deadline = steady_start + std::time::Duration::from_secs(duration_seconds);

@@ -1054,18 +1052,9 @@ async fn run_bench_read_bulk(
            let call_start = std::time::Instant::now();
            let outcome = session.read_bulk(server_handle, &tags, timeout_ms).await;
            let elapsed_ms = call_start.elapsed().as_secs_f64() * 1000.0;
-            latencies_ms.push(elapsed_ms);
            match outcome {
-                Ok(results) => {
-                    successful_calls += 1;
-                    for r in &results {
-                        total_read_results += 1;
-                        if r.was_cached {
-                            cached_read_results += 1;
-                        }
-                    }
-                }
-                Err(_) => failed_calls += 1,
+                Ok(results) => stats.record_success(elapsed_ms, &results),
+                Err(error) => stats.record_failure(elapsed_ms, &error),
            }
        }
        let steady_elapsed = steady_start.elapsed();
@@ -1074,36 +1063,20 @@ async fn run_bench_read_bulk(
            let _ = session.unsubscribe_bulk(server_handle, item_handles).await;
        }

-        let total_calls = successful_calls + failed_calls;
-        let calls_per_second = if steady_elapsed.as_secs_f64() > 0.0 {
-            total_calls as f64 / steady_elapsed.as_secs_f64()
-        } else {
-            0.0
+        let context = BenchReadBulkContext {
+            endpoint: &endpoint,
+            client_name: &client_name,
+            bulk_size,
+            duration_seconds,
+            warmup_seconds,
+            steady_elapsed,
+            tags: &tags,
        };
-
-        let summary = percentile_summary(&latencies_ms);
-        let stats = serde_json::json!({
-            "language": "rust",
-            "command": "bench-read-bulk",
-            "endpoint": endpoint,
-            "clientName": client_name,
-            "bulkSize": bulk_size,
-            "durationSeconds": duration_seconds,
-            "warmupSeconds": warmup_seconds,
-            "durationMs": steady_elapsed.as_millis() as u64,
-            "tags": tags,
-            "totalCalls": total_calls,
-            "successfulCalls": successful_calls,
-            "failedCalls": failed_calls,
-            "totalReadResults": total_read_results,
-            "cachedReadResults": cached_read_results,
-            "callsPerSecond": round_to(calls_per_second, 2),
-            "latencyMs": summary,
-        });
+        let json_stats = stats.to_json(&context);
        if use_json {
-            println!("{}", stats);
+            println!("{}", json_stats);
        } else {
-            println!("{calls_per_second}");
+            println!("{}", stats.calls_per_second(steady_elapsed));
        }
        Ok::<(), Error>(())
    }
@@ -1113,6 +1086,102 @@ async fn run_bench_read_bulk(
    bench_outcome
 }

+/// Per-iteration accounting for `bench-read-bulk`.
+///
+/// Only successful `read_bulk` calls contribute to the success-latency
+/// histogram (`success_latencies_ms`). Failures are tracked separately in
+/// `failure_latencies_ms` and the first failure's redacted error string is
+/// stashed in `first_failure` so a partial-failure run is visible in the
+/// emitted JSON. This keeps the cross-language `latencyMs.p99`/`max`
+/// contract honest: it reports successful-call latency only and never
+/// folds in a per-call timeout from a failed RPC.
+#[derive(Default)]
+struct BenchReadBulkStats {
+    success_latencies_ms: Vec<f64>,
+    failure_latencies_ms: Vec<f64>,
+    total_read_results: u64,
+    cached_read_results: u64,
+    successful_calls: u64,
+    failed_calls: u64,
+    first_failure: Option<String>,
+}
+
+impl BenchReadBulkStats {
+    fn record_success(
+        &mut self,
+        elapsed_ms: f64,
+        results: &[mxgateway_client::generated::mxaccess_gateway::v1::BulkReadResult],
+    ) {
+        self.success_latencies_ms.push(elapsed_ms);
+        self.successful_calls += 1;
+        for result in results {
+            self.total_read_results += 1;
+            if result.was_cached {
+                self.cached_read_results += 1;
+            }
+        }
+    }
+
+    fn record_failure(&mut self, elapsed_ms: f64, error: &Error) {
+        self.failure_latencies_ms.push(elapsed_ms);
+        self.failed_calls += 1;
+        if self.first_failure.is_none() {
+            self.first_failure = Some(error.to_string());
+        }
+    }
+
+    fn total_calls(&self) -> u64 {
+        self.successful_calls + self.failed_calls
+    }
+
+    fn calls_per_second(&self, elapsed: std::time::Duration) -> f64 {
+        let seconds = elapsed.as_secs_f64();
+        if seconds > 0.0 {
+            self.total_calls() as f64 / seconds
+        } else {
+            0.0
+        }
+    }
+
+    fn to_json(&self, context: &BenchReadBulkContext<'_>) -> serde_json::Value {
+        let calls_per_second = self.calls_per_second(context.steady_elapsed);
+        let success_summary = percentile_summary(&self.success_latencies_ms);
+        let failure_summary = percentile_summary(&self.failure_latencies_ms);
+        serde_json::json!({
+            "language": "rust",
+            "command": "bench-read-bulk",
+            "endpoint": context.endpoint,
+            "clientName": context.client_name,
+            "bulkSize": context.bulk_size,
+            "durationSeconds": context.duration_seconds,
+            "warmupSeconds": context.warmup_seconds,
+            "durationMs": context.steady_elapsed.as_millis() as u64,
+            "tags": context.tags,
+            "totalCalls": self.total_calls(),
+            "successfulCalls": self.successful_calls,
+            "failedCalls": self.failed_calls,
+            "totalReadResults": self.total_read_results,
+            "cachedReadResults": self.cached_read_results,
+            "callsPerSecond": round_to(calls_per_second, 2),
+            "latencyMs": success_summary,
+            "failureLatencyMs": failure_summary,
+            "firstFailure": self.first_failure,
+        })
+    }
+}
+
+/// Static configuration for one `bench-read-bulk` run, packaged so the
+/// JSON serialiser can quote it back without taking eight positional args.
+struct BenchReadBulkContext<'a> {
+    endpoint: &'a str,
+    client_name: &'a str,
+    bulk_size: usize,
+    duration_seconds: u64,
+    warmup_seconds: u64,
+    steady_elapsed: std::time::Duration,
+    tags: &'a [String],
+}
+
 fn percentile_summary(sample: &[f64]) -> serde_json::Value {
    if sample.is_empty() {
        return serde_json::json!({ "p50": 0.0, "p95": 0.0, "p99": 0.0, "max": 0.0, "mean": 0.0 });
@@ -1294,7 +1363,13 @@ fn build_write_bulk_entries(
    item_handles: &[i32],
    value_type: CliValueType,
    values: &[String],
-) -> Result<Vec<(i32, mxgateway_client::generated::mxaccess_gateway::v1::MxValue)>, Error> {
+) -> Result<
+    Vec<(
+        i32,
+        mxgateway_client::generated::mxaccess_gateway::v1::MxValue,
+    )>,
+    Error,
+> {
    if item_handles.len() != values.len() {
        return Err(Error::InvalidArgument {
            name: "values".to_owned(),
@@ -1660,4 +1735,77 @@ mod tests {
        assert_eq!(frac.seconds, utc.seconds);
        assert_eq!(frac.nanos, 250_000_000);
    }
+
+    #[test]
+    fn bench_read_bulk_stats_keeps_failures_out_of_success_latency_histogram() {
+        use mxgateway_client::generated::mxaccess_gateway::v1::BulkReadResult;
+        use mxgateway_client::Error;
+
+        let mut stats = super::BenchReadBulkStats::default();
+        let cached = BulkReadResult {
+            was_cached: true,
+            was_successful: true,
+            ..BulkReadResult::default()
+        };
+        let uncached = BulkReadResult {
+            was_cached: false,
+            was_successful: true,
+            ..BulkReadResult::default()
+        };
+
+        // Two fast successes and one slow failure: the slow failure must
+        // not pollute the success p99/max histogram.
+        stats.record_success(1.5, std::slice::from_ref(&cached));
+        stats.record_success(2.0, std::slice::from_ref(&uncached));
+        let failure = Error::MalformedReply {
+            detail: "synthetic failure for the bench test".to_owned(),
+        };
+        stats.record_failure(1_500.0, &failure);
+
+        assert_eq!(stats.success_latencies_ms, vec![1.5, 2.0]);
+        assert_eq!(stats.failure_latencies_ms, vec![1_500.0]);
+        assert_eq!(stats.successful_calls, 2);
+        assert_eq!(stats.failed_calls, 1);
+        assert_eq!(stats.total_calls(), 3);
+        assert_eq!(stats.total_read_results, 2);
+        assert_eq!(stats.cached_read_results, 1);
+        assert!(stats
+            .first_failure
+            .as_deref()
+            .unwrap()
+            .contains("synthetic failure"));
+
+        let elapsed = std::time::Duration::from_secs(1);
+        let context = super::BenchReadBulkContext {
+            endpoint: "http://fake",
+            client_name: "client",
+            bulk_size: 2,
+            duration_seconds: 1,
+            warmup_seconds: 0,
+            steady_elapsed: elapsed,
+            tags: &[],
+        };
+        let payload = stats.to_json(&context);
+        // The success-latency histogram must never see the 1_500 ms failure.
+        assert_eq!(payload["latencyMs"]["max"].as_f64().unwrap(), 2.0);
+        assert!(payload["latencyMs"]["p99"].as_f64().unwrap() <= 2.0);
+        // The failure-latency histogram must own it instead.
+        assert_eq!(
+            payload["failureLatencyMs"]["max"].as_f64().unwrap(),
+            1_500.0
+        );
+        assert_eq!(payload["failedCalls"].as_u64().unwrap(), 1);
+        assert_eq!(payload["successfulCalls"].as_u64().unwrap(), 2);
+        assert!(payload["firstFailure"]
+            .as_str()
+            .unwrap()
+            .contains("synthetic failure"));
+    }
+
+    #[test]
+    fn bench_read_bulk_stats_calls_per_second_handles_zero_duration() {
+        let stats = super::BenchReadBulkStats::default();
+
+        assert_eq!(stats.calls_per_second(std::time::Duration::ZERO), 0.0);
+    }
 }
@@ -14,6 +14,7 @@ pub mod mxaccess_gateway {
    /// gateway to language clients.
    pub mod v1 {
        #![allow(clippy::large_enum_variant)]
+        #![allow(clippy::doc_lazy_continuation)]

        tonic::include_proto!("mxaccess_gateway.v1");
    }
@@ -25,6 +26,7 @@ pub mod mxaccess_worker {
    /// the named-pipe transport between gateway and worker.
    pub mod v1 {
        #![allow(clippy::large_enum_variant)]
+        #![allow(clippy::doc_lazy_continuation)]

        tonic::include_proto!("mxaccess_worker.v1");
    }
@@ -36,6 +38,7 @@ pub mod galaxy_repository {
    /// discovery and deploy-event watch RPCs.
    pub mod v1 {
        #![allow(clippy::large_enum_variant)]
+        #![allow(clippy::doc_lazy_continuation)]

        tonic::include_proto!("galaxy_repository.v1");
    }
@@ -33,7 +33,14 @@ static CORRELATION_SEQUENCE: AtomicU64 = AtomicU64::new(0);

 /// Build a unique `client_correlation_id` for a request so concurrent or
 /// repeated calls of the same command kind can be told apart in gateway logs.
-fn next_correlation_id(label: &str) -> String {
+///
+/// Exposed so consumers that construct raw [`MxCommandRequest`] /
+/// [`CloseSessionRequest`] payloads outside the `Session` helpers — notably
+/// the `mxgw` test CLI — share the same correlation-id discipline as the
+/// library. The returned id is `rust-client-{label}-{N}` where `N` comes
+/// from a process-wide atomic sequence.
+#[must_use]
+pub fn next_correlation_id(label: &str) -> String {
    let sequence = CORRELATION_SEQUENCE.fetch_add(1, Ordering::Relaxed);
    format!("rust-client-{label}-{sequence}")
 }
@@ -761,8 +768,7 @@ fn bulk_write_results(
            BulkWriteReplyKind::WriteSecured2,
        ) => Ok(reply.results),
        _ => Err(Error::MalformedReply {
-            detail: "bulk write reply did not carry the expected BulkWriteReply payload"
-                .to_owned(),
+            detail: "bulk write reply did not carry the expected BulkWriteReply payload".to_owned(),
        }),
    }
 }
@@ -20,7 +20,8 @@ use mxgateway_client::generated::mxaccess_gateway::v1::{
    CloseSessionReply, CloseSessionRequest, MxCommandKind, MxCommandReply, MxDataType, MxEvent,
    MxEventFamily, MxStatusCategory, MxStatusProxy, MxStatusSource, MxValue, OpenSessionReply,
    OpenSessionRequest, ProtocolStatus, ProtocolStatusCode, QueryActiveAlarmsRequest, SessionState,
-    StreamEventsRequest, SubscribeResult, WriteBulkEntry,
+    StreamEventsRequest, SubscribeResult, Write2BulkEntry, WriteBulkEntry, WriteSecured2BulkEntry,
+    WriteSecuredBulkEntry,
 };
 use mxgateway_client::{
    ApiKey, ClientOptions, CommandError, Error, GatewayClient, MxStatus, MxValue as ClientMxValue,
@@ -160,7 +161,10 @@ async fn read_bulk_forwards_timeout_and_unpacks_cached_flag() {

    let entry = &results[0];
    assert!(entry.was_cached);
-    assert_eq!(entry.value.as_ref().and_then(|v| v.kind.as_ref()), Some(&Kind::Int32Value(99)));
+    assert_eq!(
+        entry.value.as_ref().and_then(|v| v.kind.as_ref()),
+        Some(&Kind::Int32Value(99))
+    );
    assert_eq!(*state.last_read_bulk_timeout_ms.lock().await, Some(750));
 }

@@ -393,6 +397,238 @@ async fn connect_with_unreadable_ca_file_reports_invalid_endpoint() {
    );
 }

+#[tokio::test]
+async fn register_returns_malformed_reply_when_ok_reply_has_no_payload() {
+    let state = Arc::new(FakeState::default());
+    *state.invoke_override.lock().await = Some(InvokeOverride::OkReplyNoPayload);
+    let endpoint = spawn_fake_gateway(state.clone()).await;
+    let client = GatewayClient::connect(ClientOptions::new(endpoint))
+        .await
+        .unwrap();
+    let session = client.session("session-fixture");
+
+    let error = session.register("client-name").await.unwrap_err();
+
+    assert!(
+        matches!(&error, Error::MalformedReply { detail } if detail.contains("Register")),
+        "expected MalformedReply for register, got {error:?}"
+    );
+}
+
+#[tokio::test]
+async fn add_item_returns_malformed_reply_when_ok_reply_has_no_payload() {
+    let state = Arc::new(FakeState::default());
+    *state.invoke_override.lock().await = Some(InvokeOverride::OkReplyNoPayload);
+    let endpoint = spawn_fake_gateway(state.clone()).await;
+    let client = GatewayClient::connect(ClientOptions::new(endpoint))
+        .await
+        .unwrap();
+    let session = client.session("session-fixture");
+
+    let error = session.add_item(12, "Plant.Area.Tag").await.unwrap_err();
+
+    assert!(
+        matches!(&error, Error::MalformedReply { detail } if detail.contains("AddItem")),
+        "expected MalformedReply for add_item, got {error:?}"
+    );
+}
+
+#[tokio::test]
+async fn add_item2_returns_malformed_reply_when_ok_reply_has_no_payload() {
+    let state = Arc::new(FakeState::default());
+    *state.invoke_override.lock().await = Some(InvokeOverride::OkReplyNoPayload);
+    let endpoint = spawn_fake_gateway(state.clone()).await;
+    let client = GatewayClient::connect(ClientOptions::new(endpoint))
+        .await
+        .unwrap();
+    let session = client.session("session-fixture");
+
+    let error = session
+        .add_item2(12, "Plant.Area.Tag", "ctx")
+        .await
+        .unwrap_err();
+
+    assert!(
+        matches!(&error, Error::MalformedReply { detail } if detail.contains("AddItem2")),
+        "expected MalformedReply for add_item2, got {error:?}"
+    );
+}
+
+#[tokio::test]
+async fn subscribe_bulk_returns_malformed_reply_on_mismatched_payload_arm() {
+    let state = Arc::new(FakeState::default());
+    *state.invoke_override.lock().await = Some(InvokeOverride::OkReplyWrongPayloadForBulk);
+    let endpoint = spawn_fake_gateway(state.clone()).await;
+    let client = GatewayClient::connect(ClientOptions::new(endpoint))
+        .await
+        .unwrap();
+    let session = client.session("session-fixture");
+
+    let error = session
+        .subscribe_bulk(12, vec!["Tank01.Level".to_owned()])
+        .await
+        .unwrap_err();
+
+    assert!(
+        matches!(&error, Error::MalformedReply { detail } if detail.contains("bulk")),
+        "expected MalformedReply for subscribe_bulk, got {error:?}"
+    );
+}
+
+#[tokio::test]
+async fn write_bulk_returns_malformed_reply_on_mismatched_payload_arm() {
+    let state = Arc::new(FakeState::default());
+    *state.invoke_override.lock().await = Some(InvokeOverride::OkReplyWrongPayloadForBulkWrite);
+    let endpoint = spawn_fake_gateway(state.clone()).await;
+    let client = GatewayClient::connect(ClientOptions::new(endpoint))
+        .await
+        .unwrap();
+    let session = client.session("session-fixture");
+
+    let error = session
+        .write_bulk(
+            12,
+            vec![WriteBulkEntry {
+                item_handle: 901,
+                value: Some(int_value(11)),
+                user_id: 5,
+            }],
+        )
+        .await
+        .unwrap_err();
+
+    assert!(
+        matches!(&error, Error::MalformedReply { detail } if detail.contains("bulk write")),
+        "expected MalformedReply for write_bulk, got {error:?}"
+    );
+}
+
+#[tokio::test]
+async fn read_bulk_returns_malformed_reply_on_mismatched_payload_arm() {
+    let state = Arc::new(FakeState::default());
+    *state.invoke_override.lock().await = Some(InvokeOverride::OkReplyWrongPayloadForReadBulk);
+    let endpoint = spawn_fake_gateway(state.clone()).await;
+    let client = GatewayClient::connect(ClientOptions::new(endpoint))
+        .await
+        .unwrap();
+    let session = client.session("session-fixture");
+
+    let error = session
+        .read_bulk(12, &["Tank01.Level"], 500)
+        .await
+        .unwrap_err();
+
+    assert!(
+        matches!(&error, Error::MalformedReply { detail } if detail.contains("ReadBulk")),
+        "expected MalformedReply for read_bulk, got {error:?}"
+    );
+}
+
+#[tokio::test]
+async fn unary_invoke_maps_status_unavailable_to_error_unavailable() {
+    let state = Arc::new(FakeState::default());
+    *state.invoke_override.lock().await =
+        Some(InvokeOverride::Unavailable("gateway restarting".to_owned()));
+    let endpoint = spawn_fake_gateway(state.clone()).await;
+    let client = GatewayClient::connect(ClientOptions::new(endpoint))
+        .await
+        .unwrap();
+    let session = client.session("session-fixture");
+
+    let error = session.add_item(12, "Plant.Area.Tag").await.unwrap_err();
+
+    assert!(
+        matches!(&error, Error::Unavailable { .. }),
+        "expected Error::Unavailable for unary unavailable, got {error:?}"
+    );
+}
+
+#[tokio::test]
+async fn write2_bulk_round_trips_through_the_fake_gateway() {
+    let state = Arc::new(FakeState::default());
+    let endpoint = spawn_fake_gateway(state.clone()).await;
+    let client = GatewayClient::connect(ClientOptions::new(endpoint))
+        .await
+        .unwrap();
+    let session = client.session("session-fixture");
+
+    let results = session
+        .write2_bulk(
+            12,
+            vec![Write2BulkEntry {
+                item_handle: 901,
+                value: Some(int_value(11)),
+                timestamp_value: Some(int_value(0)),
+                user_id: 5,
+            }],
+        )
+        .await
+        .unwrap();
+
+    assert_eq!(results.len(), 2);
+    assert!(results[0].was_successful);
+    assert!(!results[1].was_successful);
+    let last_command = state.last_command_kind.lock().await;
+    assert_eq!(*last_command, Some(MxCommandKind::Write2Bulk as i32));
+}
+
+#[tokio::test]
+async fn write_secured_bulk_round_trips_through_the_fake_gateway() {
+    let state = Arc::new(FakeState::default());
+    let endpoint = spawn_fake_gateway(state.clone()).await;
+    let client = GatewayClient::connect(ClientOptions::new(endpoint))
+        .await
+        .unwrap();
+    let session = client.session("session-fixture");
+
+    let results = session
+        .write_secured_bulk(
+            12,
+            vec![WriteSecuredBulkEntry {
+                item_handle: 901,
+                current_user_id: 7,
+                verifier_user_id: 9,
+                value: Some(int_value(11)),
+            }],
+        )
+        .await
+        .unwrap();
+
+    assert_eq!(results.len(), 2);
+    assert!(results[0].was_successful);
+    let last_command = state.last_command_kind.lock().await;
+    assert_eq!(*last_command, Some(MxCommandKind::WriteSecuredBulk as i32));
+}
+
+#[tokio::test]
+async fn write_secured2_bulk_round_trips_through_the_fake_gateway() {
+    let state = Arc::new(FakeState::default());
+    let endpoint = spawn_fake_gateway(state.clone()).await;
+    let client = GatewayClient::connect(ClientOptions::new(endpoint))
+        .await
+        .unwrap();
+    let session = client.session("session-fixture");
+
+    let results = session
+        .write_secured2_bulk(
+            12,
+            vec![WriteSecured2BulkEntry {
+                item_handle: 901,
+                current_user_id: 7,
+                verifier_user_id: 9,
+                value: Some(int_value(11)),
+                timestamp_value: Some(int_value(0)),
+            }],
+        )
+        .await
+        .unwrap();
+
+    assert_eq!(results.len(), 2);
+    assert!(results[0].was_successful);
+    let last_command = state.last_command_kind.lock().await;
+    assert_eq!(*last_command, Some(MxCommandKind::WriteSecured2Bulk as i32));
+}
+
 #[derive(Default)]
 struct FakeState {
    authorization: Mutex<Option<String>>,
@@ -400,6 +636,39 @@ struct FakeState {
    last_read_bulk_timeout_ms: Mutex<Option<u32>>,
    stream_dropped: Arc<AtomicBool>,
    emit_stream_fault: AtomicBool,
+    /// Test-injected override for the next (and all subsequent) `Invoke`
+    /// calls. When `Some`, the fake gateway returns the override's response
+    /// instead of its default per-kind reply. Used by the malformed-reply
+    /// and unary-Unavailable tests; default `None` preserves existing
+    /// happy-path test behaviour.
+    invoke_override: Mutex<Option<InvokeOverride>>,
+}
+
+/// Test-injected override for the fake gateway's `Invoke` handler.
+///
+/// Each variant short-circuits the per-kind dispatch in `FakeGateway::invoke`
+/// and reproduces one of the wire shapes the Rust client's error paths must
+/// handle. The `OkReply*` variants model an OK envelope whose payload is
+/// missing or of the wrong arm, which is the exact condition the new
+/// `Error::MalformedReply` paths in `session.rs` are designed to catch.
+#[derive(Clone)]
+enum InvokeOverride {
+    /// Return `Status::unavailable(message)` from the unary Invoke RPC, so
+    /// the client maps it to `Error::Unavailable`.
+    Unavailable(String),
+    /// Return an OK `MxCommandReply` whose `payload` field is `None`. Used
+    /// to exercise `register_server_handle` / `add_item_handle` /
+    /// `add_item2_handle` falling through to the `MalformedReply` arm.
+    OkReplyNoPayload,
+    /// Return an OK reply whose payload arm does not match the bulk-read
+    /// command, so `read_bulk` falls through to its `MalformedReply` arm.
+    OkReplyWrongPayloadForReadBulk,
+    /// Return an OK reply whose payload arm does not match the requested
+    /// bulk command, so `bulk_results` falls through to `MalformedReply`.
+    OkReplyWrongPayloadForBulk,
+    /// Return an OK reply whose payload arm does not match the requested
+    /// bulk-write command, so `bulk_write_results` returns `MalformedReply`.
+    OkReplyWrongPayloadForBulkWrite,
 }

 #[derive(Clone)]
@@ -453,6 +722,58 @@ impl MxAccessGateway for FakeGateway {
            .unwrap_or_default();
        *self.state.last_command_kind.lock().await = Some(kind);

+        if let Some(override_) = self.state.invoke_override.lock().await.clone() {
+            return match override_ {
+                InvokeOverride::Unavailable(message) => Err(Status::unavailable(message)),
+                InvokeOverride::OkReplyNoPayload => Ok(Response::new(MxCommandReply {
+                    session_id: request.session_id,
+                    correlation_id: "fake-correlation".to_owned(),
+                    kind,
+                    protocol_status: Some(ok_status("command ok but payload omitted")),
+                    payload: None,
+                    ..MxCommandReply::default()
+                })),
+                InvokeOverride::OkReplyWrongPayloadForReadBulk => {
+                    Ok(Response::new(MxCommandReply {
+                        session_id: request.session_id,
+                        correlation_id: "fake-correlation".to_owned(),
+                        kind,
+                        protocol_status: Some(ok_status("read-bulk wrong payload arm")),
+                        // AddItem payload arm against a ReadBulk request:
+                        // the client's `read_bulk` matcher must reject it.
+                        payload: Some(mx_command_reply::Payload::AddItem(AddItemReply {
+                            item_handle: 0,
+                        })),
+                        ..MxCommandReply::default()
+                    }))
+                }
+                InvokeOverride::OkReplyWrongPayloadForBulk => Ok(Response::new(MxCommandReply {
+                    session_id: request.session_id,
+                    correlation_id: "fake-correlation".to_owned(),
+                    kind,
+                    protocol_status: Some(ok_status("bulk wrong payload arm")),
+                    // AddItem payload arm against a SubscribeBulk request.
+                    payload: Some(mx_command_reply::Payload::AddItem(AddItemReply {
+                        item_handle: 0,
+                    })),
+                    ..MxCommandReply::default()
+                })),
+                InvokeOverride::OkReplyWrongPayloadForBulkWrite => {
+                    Ok(Response::new(MxCommandReply {
+                        session_id: request.session_id,
+                        correlation_id: "fake-correlation".to_owned(),
+                        kind,
+                        protocol_status: Some(ok_status("bulk-write wrong payload arm")),
+                        // AddItem payload arm against a WriteBulk request.
+                        payload: Some(mx_command_reply::Payload::AddItem(AddItemReply {
+                            item_handle: 0,
+                        })),
+                        ..MxCommandReply::default()
+                    }))
+                }
+            };
+        }
+
        if kind == MxCommandKind::Write as i32 {
            return Ok(Response::new(mxaccess_failure_reply()));
        }
@@ -478,36 +799,41 @@ impl MxAccessGateway for FakeGateway {
            }));
        }

+        // All four bulk-write families return `BulkWriteReply` over the
+        // wire and differ only in which `payload` arm carries it. The
+        // round-trip tests earlier in this file need one branch per family,
+        // so route them all to the same canned reply (one success + one
+        // failure) and pick the matching payload arm by kind.
        if kind == MxCommandKind::WriteBulk as i32 {
-            // Echo one success and one failure so the test can assert the per-entry
-            // shape and verify the call did not throw on per-entry failure.
-            return Ok(Response::new(MxCommandReply {
-                session_id: request.session_id,
-                correlation_id: "fake-correlation".to_owned(),
+            return Ok(Response::new(bulk_write_envelope(
+                request.session_id,
                kind,
-                protocol_status: Some(ok_status("command ok")),
-                payload: Some(mx_command_reply::Payload::WriteBulk(BulkWriteReply {
-                    results: vec![
-                        BulkWriteResult {
-                            server_handle: 12,
-                            item_handle: 901,
-                            was_successful: true,
-                            hresult: None,
-                            statuses: vec![],
-                            error_message: String::new(),
-                        },
-                        BulkWriteResult {
-                            server_handle: 12,
-                            item_handle: 902,
-                            was_successful: false,
-                            hresult: None,
-                            statuses: vec![],
-                            error_message: "invalid handle".to_owned(),
-                        },
-                    ],
-                })),
-                ..MxCommandReply::default()
-            }));
+                mx_command_reply::Payload::WriteBulk(canned_bulk_write_reply()),
+            )));
+        }
+
+        if kind == MxCommandKind::Write2Bulk as i32 {
+            return Ok(Response::new(bulk_write_envelope(
+                request.session_id,
+                kind,
+                mx_command_reply::Payload::Write2Bulk(canned_bulk_write_reply()),
+            )));
+        }
+
+        if kind == MxCommandKind::WriteSecuredBulk as i32 {
+            return Ok(Response::new(bulk_write_envelope(
+                request.session_id,
+                kind,
+                mx_command_reply::Payload::WriteSecuredBulk(canned_bulk_write_reply()),
+            )));
+        }
+
+        if kind == MxCommandKind::WriteSecured2Bulk as i32 {
+            return Ok(Response::new(bulk_write_envelope(
+                request.session_id,
+                kind,
+                mx_command_reply::Payload::WriteSecured2Bulk(canned_bulk_write_reply()),
+            )));
        }

        if kind == MxCommandKind::ReadBulk as i32 {
@@ -699,6 +1025,44 @@ fn mxaccess_failure_reply() -> MxCommandReply {
    }
 }

+fn canned_bulk_write_reply() -> BulkWriteReply {
+    BulkWriteReply {
+        results: vec![
+            BulkWriteResult {
+                server_handle: 12,
+                item_handle: 901,
+                was_successful: true,
+                hresult: None,
+                statuses: vec![],
+                error_message: String::new(),
+            },
+            BulkWriteResult {
+                server_handle: 12,
+                item_handle: 902,
+                was_successful: false,
+                hresult: None,
+                statuses: vec![],
+                error_message: "invalid handle".to_owned(),
+            },
+        ],
+    }
+}
+
+fn bulk_write_envelope(
+    session_id: String,
+    kind: i32,
+    payload: mx_command_reply::Payload,
+) -> MxCommandReply {
+    MxCommandReply {
+        session_id,
+        correlation_id: "fake-correlation".to_owned(),
+        kind,
+        protocol_status: Some(ok_status("command ok")),
+        payload: Some(payload),
+        ..MxCommandReply::default()
+    }
+}
+
 fn event(sequence: u64) -> MxEvent {
    MxEvent {
        family: MxEventFamily::OnDataChange as i32,