Code-review 2026-05-20 sweep #2: re-review at a020350, resolve 48 findings

A second re-review pass at commit a020350 caught 48 new findings, including
one High-severity regression I introduced in the prior sweep. All 48 were
fixed in one parallel wave.

High (1)
- Client.Python-018: prior sweep set `license = "Proprietary"` in
  pyproject.toml. setuptools >= 77 enforces PEP 639 and rejects the
  string (it must be a valid SPDX expression), so `pip wheel .` and
  `pip install -e .` both fail before any source compiles. Tests
  still pass because pytest bypasses the build backend via
  `pythonpath`. Dropped the invalid license string, kept the
  `License :: Other/Proprietary License` classifier, and added
  `tests/test_packaging.py` so a future regression of the same shape
  is caught in CI.

Mediums (6)
- Worker-023: `HeartbeatStuckCeiling` (default 75s = 5x HeartbeatGrace)
  on WorkerPipeSessionOptions bounds the in-flight-command watchdog
  suppression so a truly stuck COM call still triggers StaHung
  instead of permanently defeating the watchdog.
- Client.Rust-018: reverted Rust's `latencyMs` split so the
  cross-language bench comparison is apples-to-apples again;
  `failureLatencyMs` kept as Rust-only enrichment.
- Client.Java-021: applied Client.Java-002's terminal-state
  serialisation pattern to DeployEventStream so close() arriving
  after queue-overflow can't erase the overflow exception.
- IntegrationTests-017: teardown-parity test now uses a two-window
  stability check after UnAdvise instead of strict equality against
  the pre-UnAdvise count (which raced against in-flight events).
- IntegrationTests-019: new RecordingTestOutputHelper wraps every
  log sink the WriteSecured live test owns (worker stdout/stderr,
  gateway logs, direct WriteLine) so the credential is proven
  absent from the full output buffer, not just the diagnostic
  message.
- Tests-020: added MxAccessGatewayServiceConstraintTests coverage
  for the previously-uncovered Write2Bulk and WriteSecured2Bulk
  arms of WriteBulkConstraintPlan.SetPayload.

Lows (41 — highlights)
- Server: Galaxy glob cache eviction is race-free (Server-024);
  GalaxyRepositoryGrpcService takes IGalaxyRepository (Server-025);
  AlarmsOptions validated at startup (Server-026); Authorization.md
  Constraint Enforcement snippet/prose enumerate the bulk write/read
  family (Server-027); bulk-read-commands and bulk-write-commands
  capability tokens added to OpenSession (Server-029);
  NotWiredAlarmRpcDispatcher XML doc and missing scope-resolver and
  state-machine tests cleaned up (023, 028).
- Worker: AlarmCommandHandler now invokes the same STA-affinity
  guard the poll path uses, at every command entry (Worker-024);
  RunAsync null-checks the runtime-session factory result
  (Worker-025).
- Worker.Tests: shared LiveMxAccessOptInVariableName lives on
  GatewayContractInfo (Worker.Tests-025); MxAccessSession.CreateForTesting
  rejects production sinks (Worker.Tests-026); FakeRuntimeSession's
  CancelCommandReturnValue serialised under lock (Worker.Tests-027);
  Probes namespace lifted to MxGateway.Worker.Tests.Probes
  (Worker.Tests-029); cancel-envelope sequence numbers monotonised
  (Worker.Tests-030); docs/GatewayTesting.md gains a "Dev-rig Probes"
  section (Worker.Tests-028).
- Tests: ManualTimeProvider consolidated into one TestSupport/ copy
  (Tests-021); SessionManagerBulkTests adds a mid-flight cancellation
  test backed by a TaskCompletionSource fake (Tests-022); companion
  FakeWorkerProcess.WaitForExitAsync no longer fakes its exit signal
  (Tests-023); constraint plan reply-count divergence pinned
  (Tests-024).
- IntegrationTests: TryGetSession chain carries [MaybeNullWhen(false)]
  end-to-end (IntegrationTests-018); abnormal-exit keyword set
  tightened to pipe-disconnected/end-of-stream and the test now
  asserts streamTask.IsFaulted (020, 021).
- Client.Dotnet: bench commands added to isLongRunning so the
  default 30s wall-clock budget doesn't kill them (015);
  BenchStreamEventsAsync observes the inner stream task on every
  exit path (016).
- Client.Go: parseValue wraps strconv errors with flag context and
  %w (017); bench loops honour ctx.Done() (018); galaxy-watch parses
  RFC3339Nano with fractional seconds (019); runStreamEvents installs
  signal.NotifyContext like runGalaxyWatch (020); five new CLI-level
  table-driven tests cover the bulk/bench subcommands (021).
- Client.Java: toCompletable Javadoc rewritten to match the actual
  cancellation contract Client.Java-015 established (022); stream-events
  text path uses Long.toUnsignedString for worker_sequence (023);
  bench-read-bulk no longer pollutes success-latency histogram with
  failure durations (024); --shutdown-timeout CLI option propagates
  through to ClientOptions (025); seven new MxGatewayCliTests cover
  the bulk and bench commands (026).
- Client.Python: mxgateway_cli ships its own py.typed marker (019);
  wheel-build smoke test added under tests/test_packaging.py (020);
  README documents the Galaxy CLI parity gap explicitly (021).
- Client.Rust: RustClientDesign.md signatures match session.rs and
  document the AsRef<str> read_bulk genericism (019);
  next_correlation_id re-exported at the crate root, with a
  property-style doc contract and an explicit disclaimer that the
  literal textual format is not part of the contract (020).
- Contracts: BulkWriteResult comment names the actual
  IConstraintEnforcer mechanism instead of "tag-allowlist filter"
  (014); BulkReadResult gains explicit per-arm payload-population
  documentation for the success vs failure cases (015).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Joseph Doherty
2026-05-20 10:28:54 -04:00
parent a0203503a7
commit 1aafd6bde4
74 changed files with 3349 additions and 395 deletions
+102 -12
@@ -5,28 +5,28 @@
| Module | `clients/java` |
| Reviewer | Claude Code |
| Review date | 2026-05-20 |
| Commit reviewed | `1cd51bb` |
| Commit reviewed | `a020350` |
| Status | Reviewed |
| Open findings | 0 |
## Checklist coverage
A second-pass review against commit `1cd51bb`. Client.Java-001 through
Client.Java-012 are unchanged from the prior pass; the table below records the
new findings raised in this pass against the same checklist categories.
A third-pass review against commit `a020350` (the sweep that resolved
Client.Java-013 through Client.Java-020). Prior findings are unchanged; new
findings raised in this pass are numbered Client.Java-021 onward.
| # | Category | Result |
|---|---|---|
| 1 | Correctness & logic bugs | Issues found: CLI `MxEventStream(1024)` capacity contradicts Javadoc/README "16-element buffer" claim (Client.Java-017); CLI `DeployEvent.sequence` printed with `%d` as signed `long` (Client.Java-020). |
| 1 | Correctness & logic bugs | Issues found: `stream-events` CLI text path still prints the proto `uint64 worker_sequence` with `%d` (Client.Java-023), the same bug Client.Java-020 fixed for `galaxy-watch`; `bench-read-bulk` includes failed-call durations in its success-latency histogram (Client.Java-024), mirroring the bug Client.Rust-015 fixed in Rust. |
| 2 | mxaccessgw conventions | No new issues found in this pass. |
| 3 | Concurrency & thread safety | Issues found: `MxEventStream.beforeStart` does not honour pre-start `close()` and leaks the gRPC call (Client.Java-014); `MxGatewayChannels.toCompletable` cancellation propagation is broken once the future is wrapped in `thenApply` (Client.Java-015). |
| 4 | Error handling & resilience | Issue found: `MxGatewaySecrets.redactCredentials` only inspects whitespace-delimited tokens, so colon/comma/quote-embedded `mxgw_` credentials leak through (Client.Java-018). |
| 5 | Security | Issue found: same `redactCredentials` leak — see Client.Java-018. |
| 6 | Performance & resource management | Issue found: client `close()` uses the *connect* timeout as its shutdown deadline (Client.Java-019). |
| 3 | Concurrency & thread safety | Issue found: `DeployEventStream` did not receive the deterministic terminal-state serialisation that Client.Java-002 added to `MxEventStream`, so a concurrent queue-overflow + `close()` race can still erase the overflow signal (Client.Java-021). |
| 4 | Error handling & resilience | No new issues found in this pass. |
| 5 | Security | No new issues found in this pass. The Client.Java-018 regex correctly handles colon/comma/quote/paren/URL embeddings and is verified by the existing fixture tests. |
| 6 | Performance & resource management | No new issues found in this pass. `shutdownTimeout` is consistently honoured everywhere `ownedChannel.shutdown()` is called — both clients delegate to the shared `MxGatewayChannels.shutdown` / `shutdownAndAwaitTermination` helpers. |
| 7 | Design-document adherence | No new issues found in this pass. |
| 8 | Code organization & conventions | Issue found: channel `close()` / `closeAndAwaitTermination()` are still duplicated verbatim across `MxGatewayClient` and `GalaxyRepositoryClient` despite Client.Java-009's stated resolution (Client.Java-016). |
| 9 | Testing coverage | Issue found: CLI `FakeSession` does not implement the five bulk methods added to `MxGatewayCliSession`, so the CLI test module fails to compile against the current source (Client.Java-013). |
| 10 | Documentation & comments | Issue found: docs claim a 16-element event-stream buffer that is actually 1024 in production (Client.Java-017). |
| 8 | Code organization & conventions | Issue found: the CLI `CommonOptions.toClientOptions()` does not propagate `shutdownTimeout` to the underlying `MxGatewayClientOptions`, so CLI users have no way to override the new option introduced by Client.Java-019 (Client.Java-025). |
| 9 | Testing coverage | Issue found: there is no CLI-level test coverage for the `read-bulk`, `write-bulk`, `write2-bulk`, `write-secured-bulk`, `write-secured2-bulk`, or `bench-read-bulk` subcommands — Client.Java-013 noted this as out-of-scope but never filed a follow-up (Client.Java-026). |
| 10 | Documentation & comments | Issue found: `MxGatewayChannels.toCompletable` Javadoc claims chained `thenApply` futures forward `cancel()` upstream to `CancellingCompletableFuture`, which is not true of `CompletableFuture.thenApply`; the implementation works only because all validator chains are inlined into the new `toCompletable(source, operation, validator)` overload (Client.Java-022). |
## Findings
@@ -329,3 +329,93 @@ new findings raised in this pass against the same checklist categories.
**Recommendation:** Print the sequence with `Long.toUnsignedString(event.getSequence())` (or switch the text format to `%s` and pass the unsigned-string conversion). The same rule should apply to any other `uint64` proto fields that surface in CLI text output.
**Resolution:** 2026-05-20 — Updated the `galaxy-watch` text-mode `out.printf` in `MxGatewayCli.GalaxyWatchCommand.call()` to use `%s` for the sequence field and pass `Long.toUnsignedString(event.getSequence())`, so deploy sequences past `2^63` render as their correct unsigned decimal string instead of a negative signed long. The JSON path through `protoJson(event)` was already correct (proto `JsonFormat` emits unsigned longs as decimal strings) and was left unchanged. An inline comment near the printf documents the unsigned-uint64 contract so the next person editing the format string knows not to switch back to `%d`. Regression test: `MxGatewayCliTests.deployEventSequenceRendersAsUnsignedForHighUint64` exercises the format string with the max-uint64 bit pattern (`-1L`) and asserts the output contains `seq=18446744073709551615` and does not contain `seq=-1`.
### Client.Java-021
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Concurrency & thread safety |
| Location | `clients/java/mxgateway-client/src/main/java/com/dohertylan/mxgateway/client/DeployEventStream.java:96-135` |
| Status | Resolved |
**Description:** Client.Java-002 fixed a deterministic terminal-state race in `MxEventStream` by introducing a `terminate(MxGatewayException)` method, a `terminalLock`, and a `terminated` flag so a `close()` arriving after a queue-overflow `offer()` cannot wipe the overflow exception. `DeployEventStream` — added later and structurally a copy of `MxEventStream` — never received the same fix. Its current `close()` does `closed.set(true); stream.cancel(...); offer(END);`, and its `offer()` overflow branch does `queue.clear(); queue.offer(new MxGatewayException("...queue overflowed")); queue.offer(END);` (lines 117-135). With these two paths running concurrently, the same sequence Client.Java-002 documented can repeat: the overflow branch enqueues `[overflowException, END]`, `close()` then calls `offer(END)` which sees the queue full and falls into the END branch (`queue.clear(); queue.offer(value);`), wiping the overflow exception and leaving a clean end-of-stream. The CLI `galaxy-watch` (and any `WatchDeployEvents` consumer) loses the overflow signal it was supposed to surface, defeating the fail-fast backpressure contract. The 16-element buffer on `DeployEventStream` makes overflow far less likely than on `MxEventStream` in practice, but the race is identical.
**Recommendation:** Mirror the `MxEventStream` fix: add a `terminated` flag and `terminalLock`, route `close()`, `onCompleted`, and the overflow branch through a single `terminate(MxGatewayException)` method that wins on first arrival, and add the regression analogous to `MxGatewayMediumFindingsTests.eventStreamOverflowExceptionSurvivesASubsequentClose`. Given the two stream classes are now structural copies of each other, consider extracting the queue/terminate plumbing into a shared base or helper so the next fix lands once.
**Resolution:** 2026-05-20 — Mirrored the `MxEventStream` terminal-state serialisation in `DeployEventStream`: replaced the `AtomicBoolean closed` field with a `volatile boolean closed`, added a `terminalLock`/`terminated` pair, and routed all terminal paths (`close()`, `onCompleted()`, the overflow branch in `offer()`) through a single private `terminate(MxGatewayException fault)` method guarded by `synchronized (terminalLock) { if (terminated) return; terminated = true; ... }`. The first terminal condition wins: an overflow that publishes `[exception, END]` is no longer wiped by a subsequent `close()`/`onCompleted()` that previously took the "queue full → clear + offer(END)" branch. The class-level Javadoc now documents the single-consumer-thread iterator contract and the deterministic terminal transition, matching `MxEventStream`. Behavior outside the terminal path is unchanged: `beforeStart` still resolves the close-before-beforeStart race (Client.Java-014's deploy-stream counterpart, already in place), `take()` still surfaces interrupts, and the request stream is still cancelled on overflow/close. Regression tests in `GalaxyRepositoryClientTests`: `deployEventStreamOverflowExceptionSurvivesASubsequentClose` (deterministic — capacity-2 stream, force overflow, then close, assert the overflow exception is surfaced) and `deployEventStreamConcurrentOverflowAndCloseAlwaysTerminate` (300-iteration concurrent race stress, mirrors `MxGatewayMediumFindingsTests.eventStreamConcurrentOverflowAndCloseAlwaysTerminate`).
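The first-arrival-wins transition described in this resolution can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual `DeployEventStream` code: `TerminalStateSketch`, its nested `OverflowException`, and the `take()` signature are hypothetical stand-ins, and the whole producer path is placed under the lock purely to keep the sketch easy to reason about.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

/** Hypothetical stand-in for a bounded event stream; not the real DeployEventStream. */
final class TerminalStateSketch {
    /** Fault published when the bounded queue overflows (hypothetical type). */
    static final class OverflowException extends RuntimeException {
        OverflowException(String message) { super(message); }
    }

    private static final Object END = new Object();

    private final BlockingQueue<Object> queue;
    private final Object terminalLock = new Object();
    private boolean terminated; // guarded by terminalLock

    TerminalStateSketch(int capacity) {
        if (capacity < 2) {
            throw new IllegalArgumentException("capacity must fit [fault, END]");
        }
        this.queue = new ArrayBlockingQueue<>(capacity);
    }

    /** Producer side: enqueue an event, or fail fast on overflow. */
    void offer(Object event) {
        synchronized (terminalLock) {
            if (terminated) {
                return; // events arriving after the terminal transition are dropped
            }
            if (!queue.offer(event)) {
                terminate(new OverflowException("event queue overflowed"));
            }
        }
    }

    /** Consumer-initiated shutdown: a clean end-of-stream unless a fault already won. */
    void close() {
        terminate(null);
    }

    /** Single terminal transition; the first caller wins and later callers are no-ops. */
    private void terminate(OverflowException fault) {
        synchronized (terminalLock) { // reentrant, so offer() may call this while holding the lock
            if (terminated) {
                return;
            }
            terminated = true;
            if (fault != null) {
                queue.clear();
                queue.offer(fault);
                queue.offer(END);
            } else if (!queue.offer(END)) {
                queue.clear();
                queue.offer(END);
            }
        }
    }

    /** Returns the next event, or null at end-of-stream; rethrows a published fault. */
    Object take() {
        final Object next;
        try {
            next = queue.take();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for an event", e);
        }
        if (next instanceof OverflowException) {
            throw (OverflowException) next;
        }
        return next == END ? null : next;
    }
}
```

With a capacity of 2, offering three events overflows on the third. A later `close()` is then a no-op, so the consumer's next `take()` throws the overflow fault instead of observing a clean end-of-stream.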
### Client.Java-022
| Field | Value |
|---|---|
| Severity | Low |
| Category | Documentation & comments |
| Location | `clients/java/mxgateway-client/src/main/java/com/dohertylan/mxgateway/client/MxGatewayChannels.java:161-172` |
| Status | Resolved |
**Description:** The Javadoc on the no-validator `toCompletable(source, operation)` overload claims: "calling `cancel(true)` on either the direct return value or the user-facing chained future ultimately invokes `source.cancel(true)` (chained futures forward to the upstream stage they were derived from, which is this future)." This is not how `CompletableFuture.thenApply` (or `thenCompose`, `whenComplete`, etc.) actually behaves: a downstream stage's `cancel()` only marks that derived stage as cancelled; it does NOT propagate cancellation upstream to the originating `CancellingCompletableFuture`. The Client.Java-015 resolution actually fixes the bug by inlining the validator into the new `toCompletable(source, operation, validator)` overload (lines 224-252) so users never need a downstream stage, and by having `GalaxyRepositoryClient.discoverHierarchyAsync` use an explicit `AtomicReference`-based override (which has a correct comment at lines 218-221 acknowledging exactly this `thenCompose` limitation). The contradiction between the two adjacent comments will mislead the next maintainer who decides to add a convenience `.thenApply` on top of a `*Async` return value: they will assume cancellation still flows through and re-introduce the Client.Java-015 leak.
**Recommendation:** Rewrite the `toCompletable` Javadoc to state the actual contract: `cancel(...)` on the direct return value (the `CancellingCompletableFuture` instance) forwards to the source RPC, but `cancel(...)` on a `thenApply`/`thenCompose`/`thenAccept` *of* that future does not — the cancellation is captured at the derived stage and the upstream RPC continues until its deadline. Callers that need cancellation through a chained pipeline must follow the `discoverHierarchyAsync` pattern (custom `CompletableFuture` subclass tracking the current in-flight stage). The underlying `CancellingCompletableFuture` class doc (lines 254-258) is already correct; only the `toCompletable` paragraph is misleading.
**Resolution:** 2026-05-20 — Rewrote the `toCompletable(source, operation)` Javadoc in `MxGatewayChannels` to reflect the actual `CompletableFuture` contract. The doc now states unambiguously: cancelling the direct return value (the `CancellingCompletableFuture`) forwards to the source `ListenableFuture` and aborts the underlying gRPC call (the Client.Java-015 fix), but cancelling a derived `thenApply`/`thenCompose`/`thenAccept`/`whenComplete` stage of that future does NOT propagate cancellation upstream — the derived stage is marked cancelled while the source RPC continues until its deadline. The Javadoc explicitly directs callers that need cancellation through a chained pipeline to either the `toCompletable(source, operation, validator)` overload (which inlines the validator into the `FutureCallback.onSuccess` path so the user-visible future is the same future cancellation is bound to) or the `GalaxyRepositoryClient.discoverHierarchyAsync` `AtomicReference`-based pattern (for `thenCompose` across paged calls). The `CancellingCompletableFuture` class Javadoc was already correct and is unchanged. Doc-only change; no behavior change and no new test required.
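The JDK behaviour this finding turns on can be checked with plain `CompletableFuture` instances. The sketch below does not use the project's `CancellingCompletableFuture`; the class and method names (`DerivedCancelDemo`, `cancelDerivedStage`, `cancelSource`) are hypothetical and exist only for illustration.

```java
import java.util.concurrent.CompletableFuture;

/** Illustrates cancellation direction between a source future and a thenApply stage. */
final class DerivedCancelDemo {
    /** Cancels only the derived stage; returns {sourceCancelled, derivedCancelled}. */
    static boolean[] cancelDerivedStage() {
        CompletableFuture<String> source = new CompletableFuture<>();
        CompletableFuture<Integer> derived = source.thenApply(String::length);
        derived.cancel(true); // marks the derived stage only; the source is untouched
        return new boolean[] { source.isCancelled(), derived.isCancelled() };
    }

    /** Cancels the source; returns {sourceCancelled, derivedCompletedExceptionally}. */
    static boolean[] cancelSource() {
        CompletableFuture<String> source = new CompletableFuture<>();
        CompletableFuture<Integer> derived = source.thenApply(String::length);
        source.cancel(true); // dependents complete exceptionally as a consequence
        return new boolean[] { source.isCancelled(), derived.isCompletedExceptionally() };
    }
}
```

Cancellation flows downstream from a source to its dependents, never upstream, which is why a convenience `.thenApply` layered on a `*Async` return value would leave the underlying call running after the caller cancels the derived stage.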
### Client.Java-023
| Field | Value |
|---|---|
| Severity | Low |
| Category | Correctness & logic bugs |
| Location | `clients/java/mxgateway-cli/src/main/java/com/dohertylan/mxgateway/cli/MxGatewayCli.java:1054`, `src/MxGateway.Contracts/Protos/mxaccess_gateway.proto:634` |
| Status | Resolved |
**Description:** `MxEvent.worker_sequence` is a proto `uint64` (line 634 of `mxaccess_gateway.proto`). The `stream-events` CLI text path prints it with `%d` (`client.out().printf("%d %s%n", event.getWorkerSequence(), event.getFamily());`), which interprets the underlying signed `long` value — sequences past `2^63` would render as a negative number. This is the exact same `uint64`-with-`%d` bug that Client.Java-020 fixed for the `galaxy-watch` `DeployEvent.sequence` field; the resolution's stated rule ("The same rule should apply to any other `uint64` proto fields that surface in CLI text output") was never extended to this site. In practice worker sequences will not reach `2^63` so this is latent rather than active, but the same fix and the same regression-test pattern apply.
**Recommendation:** Replace the `%d` with `%s` plus `Long.toUnsignedString(event.getWorkerSequence())` (matching the Client.Java-020 fix in `GalaxyWatchCommand`), and add a regression test analogous to `MxGatewayCliTests.deployEventSequenceRendersAsUnsignedForHighUint64` covering the `stream-events` text-mode format string with `-1L`. The `--after-worker-sequence` CLI option (line 1035) is also typed as a `long`, which means the user cannot pass an unsigned value above `2^63 - 1` from the command line; that is a related but separate ergonomic gap worth noting in the same change.
**Resolution:** 2026-05-20 — Updated the `stream-events` text-mode `client.out().printf` in `MxGatewayCli.StreamEventsCommand.call()` to use `%s` for the sequence and pass `Long.toUnsignedString(event.getWorkerSequence())`, mirroring the Client.Java-020 fix in `GalaxyWatchCommand`. Worker sequences past `2^63` now render as their correct unsigned decimal string instead of a negative signed long. An inline comment near the `printf` documents the unsigned-uint64 contract so the next person editing the format string knows not to switch back to `%d`. The JSON path through `protoJson(event)` was already correct (proto `JsonFormat` emits unsigned longs as decimal strings) and is unchanged. The `--after-worker-sequence` `long` ergonomic gap is a separate v2 concern and intentionally out of scope. Regression test: `MxGatewayCliTests.streamEventsWorkerSequenceRendersAsUnsignedForHighUint64` exercises the format string with the max-uint64 bit pattern (`-1L`) and asserts the output starts with `18446744073709551615 ` and does not start with `-1 `, mirroring `deployEventSequenceRendersAsUnsignedForHighUint64`.
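The difference between the two format strings is easy to reproduce in isolation. The following sketch assumes a simplified line format with a plain `String` family; `UnsignedSequenceFormat` and its two methods are hypothetical helpers, not code from `MxGatewayCli`.

```java
/** Contrasts signed and unsigned rendering of a proto uint64 carried in a Java long. */
final class UnsignedSequenceFormat {
    /** The pre-fix shape: %d interprets the long as signed. */
    static String signedLine(long workerSequence, String family) {
        return String.format("%d %s", workerSequence, family);
    }

    /** The post-fix shape: %s with an explicit unsigned conversion. */
    static String unsignedLine(long workerSequence, String family) {
        return String.format("%s %s", Long.toUnsignedString(workerSequence), family);
    }
}
```

For the max-uint64 bit pattern (`-1L`), `signedLine` yields a line starting with `-1 ` while `unsignedLine` yields one starting with `18446744073709551615 `, which is the distinction the regression test asserts.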
### Client.Java-024
| Field | Value |
|---|---|
| Severity | Low |
| Category | Correctness & logic bugs |
| Location | `clients/java/mxgateway-cli/src/main/java/com/dohertylan/mxgateway/cli/MxGatewayCli.java:855-883` |
| Status | Resolved |
**Description:** `BenchReadBulkCommand` records per-call latency in `latenciesNanos[latencyCount++] = elapsed;` inside *both* the success branch (line 865) and the `catch (Exception ex)` failure branch (line 880). The failed-call durations are then fed into the `percentileSummaryMs` p50/p95/p99 calculation alongside successful calls, producing misleading latency stats when even a few transport errors occur during the bench window. Client.Rust-015 fixed exactly this pattern in `clients/rust/src/bin/bench-read-bulk.rs` ("stop bench-read-bulk from polluting success-latency histograms with failed-call durations"); the equivalent fix was not applied to the Java implementation. The cross-language matrix runner (`scripts/run-client-e2e-tests.ps1`) compares numbers across all five clients, so the Java numbers will be silently inconsistent with the Rust numbers on the same fault profile.
**Recommendation:** Drop the failure-branch latency record (only count `failed++`), or alternatively maintain a separate `failedLatenciesNanos` array and report it as a distinct stat in the JSON output. Either way, the success histogram must not include failed-call latencies. Cross-check the .NET, Go, and Python `bench-read-bulk` drivers in the same change to make sure all five clients use the same success-latency definition; the cross-language matrix is only useful if the metric is uniform.
**Resolution:** 2026-05-20 — Dropped the failure-branch latency record in `BenchReadBulkCommand.call()`: the `catch (Exception ex)` block no longer appends `elapsed` to `latenciesNanos` and no longer grows the array — it only increments `failed++`. The success-latency histogram fed into `percentileSummaryMs` (p50/p95/p99/max/mean) is now success-call-only, matching the Client.Rust-015 fix. The JSON output still surfaces `failedCalls` as a distinct top-level count so observers see fault rates separately from latency. An inline comment on the catch block documents the contract so the next maintainer doesn't reinstate the record. New CLI test `MxGatewayCliTests.benchReadBulkCommandEmitsJsonSchemaKeys` (added under Client.Java-026 below) covers the JSON schema produced by the corrected path. The .NET / Go / Python bench drivers were intentionally left out of scope for this Java-focused finding — that cross-client audit is its own follow-up and tracked separately.
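A minimal sketch of the success-only recording contract is shown below. It is not the `BenchReadBulkCommand` implementation: `SuccessOnlyLatencySketch`, its `Result` holder, and the nearest-rank `percentileNanos` helper are hypothetical, and the real `percentileSummaryMs` may compute percentiles differently.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.Callable;

/** Records latency for successful calls only; failures are counted separately. */
final class SuccessOnlyLatencySketch {
    static final class Result {
        final List<Long> successLatenciesNanos = new ArrayList<>();
        int failedCalls;
    }

    static Result run(List<? extends Callable<?>> calls) {
        Result result = new Result();
        for (Callable<?> call : calls) {
            long start = System.nanoTime();
            try {
                call.call();
                result.successLatenciesNanos.add(System.nanoTime() - start);
            } catch (Exception ex) {
                // Deliberately no latency sample here: a failed call must not
                // shift the success-latency percentiles.
                result.failedCalls++;
            }
        }
        return result;
    }

    /** Nearest-rank percentile over a sorted copy; returns 0 when there are no samples. */
    static long percentileNanos(List<Long> samples, double percentile) {
        if (samples.isEmpty()) {
            return 0L;
        }
        List<Long> sorted = new ArrayList<>(samples);
        Collections.sort(sorted);
        int rank = (int) Math.ceil(percentile / 100.0 * sorted.size());
        return sorted.get(Math.max(rank, 1) - 1);
    }
}
```

Keeping the failure count as its own field mirrors the `failedCalls` key in the JSON output, so fault rate and latency stay separately observable.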
### Client.Java-025
| Field | Value |
|---|---|
| Severity | Low |
| Category | Code organization & conventions |
| Location | `clients/java/mxgateway-cli/src/main/java/com/dohertylan/mxgateway/cli/MxGatewayCli.java:1176-1185` |
| Status | Resolved |
**Description:** `CommonOptions.toClientOptions()` populates the `MxGatewayClientOptions` builder with `endpoint`, `apiKey`, `plaintext`, `caCertificatePath`, `serverNameOverride`, and `callTimeout`, but never sets `shutdownTimeout` even though Client.Java-019 introduced it as a first-class option. CLI users therefore always inherit the 10-second default and have no way to override it from the command line, which makes the new option effectively client-library-only. CLI users running long-lived operations (a big `discover-hierarchy` page-chain, a streaming `galaxy-watch` session that needs to drain on Ctrl+C) cannot tune the shutdown deadline up; users running short health probes who want a small `connectTimeout` *and* a small `shutdownTimeout` to keep the CLI snappy on failure also cannot.
**Recommendation:** Add a `--shutdown-timeout` option to `CommonOptions` (parsed via the existing `parseDuration` helper, default unset → use the 10-second library default) and propagate it into `toClientOptions()` so the CLI surface tracks the library surface. Include the resolved value in `redactedJsonMap()` so `--json` output shows the effective shutdown deadline.
**Resolution:** 2026-05-20 — Added a `--shutdown-timeout` option to `CommonOptions` in `MxGatewayCli.java`, parsed via the existing `parseDuration` helper (so it accepts `10s`, `500ms`, ISO-8601 `PT10S`, etc.). A new lazy accessor `resolvedShutdownTimeout()` returns the parsed `Duration` when the user passed `--shutdown-timeout`, or `null` when unset so the `MxGatewayClientOptions` builder default (10s, established by Client.Java-019) applies. `toClientOptions()` now conditionally calls `builder.shutdownTimeout(resolvedShutdownTimeout)` only when the user opted in, preserving the library default for the common case. `redactedJsonMap()` includes the resolved value under key `"shutdownTimeout"` (empty string when unset) so `--json` output shows the effective shutdown deadline. The CLI surface now tracks the library surface so a user running a long page-chain can pass `--shutdown-timeout 60s`, and a user running a short health probe can pair `--timeout 500ms` with `--shutdown-timeout 500ms` to keep the CLI snappy on failure. Behavior for callers who do not pass the new flag is unchanged.
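The opt-in propagation described here can be sketched as follows. Everything in the block is an assumption made for illustration: `ShutdownTimeoutSketch`, its `OptionsBuilder`, and this `parseDuration` are hypothetical stand-ins for `CommonOptions`, the `MxGatewayClientOptions` builder, and the CLI's existing parser, whose accepted grammar may differ.

```java
import java.time.Duration;

/** Shows a CLI value overriding a builder default only when the user supplied one. */
final class ShutdownTimeoutSketch {
    static final Duration LIBRARY_DEFAULT = Duration.ofSeconds(10);

    /** Hypothetical options builder; only the field relevant here is modelled. */
    static final class OptionsBuilder {
        private Duration shutdownTimeout = LIBRARY_DEFAULT;

        OptionsBuilder shutdownTimeout(Duration value) {
            this.shutdownTimeout = value;
            return this;
        }

        Duration build() {
            return shutdownTimeout;
        }
    }

    /** Hypothetical parser accepting "500ms", "10s", or ISO-8601 such as "PT10S". */
    static Duration parseDuration(String text) {
        String trimmed = text.trim();
        if (trimmed.regionMatches(true, 0, "P", 0, 1)) {
            return Duration.parse(trimmed);
        }
        if (trimmed.endsWith("ms")) {
            return Duration.ofMillis(Long.parseLong(trimmed.substring(0, trimmed.length() - 2)));
        }
        if (trimmed.endsWith("s")) {
            return Duration.ofSeconds(Long.parseLong(trimmed.substring(0, trimmed.length() - 1)));
        }
        throw new IllegalArgumentException("unrecognised duration: " + text);
    }

    /** A null cliValue means the flag was not passed, so the builder default applies. */
    static Duration effectiveShutdownTimeout(String cliValue) {
        OptionsBuilder builder = new OptionsBuilder();
        Duration resolved = cliValue == null ? null : parseDuration(cliValue);
        if (resolved != null) {
            builder.shutdownTimeout(resolved); // only override when the user opted in
        }
        return builder.build();
    }
}
```

Leaving the builder untouched when the flag is absent keeps the default in one place (the library), so a later change to that default does not need a matching CLI edit.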
### Client.Java-026
| Field | Value |
|---|---|
| Severity | Low |
| Category | Testing coverage |
| Location | `clients/java/mxgateway-cli/src/test/java/com/dohertylan/mxgateway/cli/MxGatewayCliTests.java` |
| Status | Resolved |
**Description:** Client.Java-013 explicitly deferred adding CLI-level test coverage for the `read-bulk`, `write-bulk`, and `bench-read-bulk` subcommands ("Optionally also add at least one CLI-level test for `read-bulk`, `write-bulk`, and the `bench-read-bulk` subcommands to keep parity with the .NET / Go / Rust CLI smoke matrix"), and the resolution explicitly stated that "follow-up is tracked separately and out of scope for this unblock-compilation fix." That follow-up was never filed. The current `MxGatewayCliTests` only covers `version`, `open-session` (JSON redaction), `write`, `smoke`, `subscribe-bulk`, `unsubscribe-bulk`, and the Client.Java-020 unsigned-uint64 format string — six of the thirteen non-trivial subcommands the CLI ships are completely untested at the CLI layer (`read-bulk`, `write-bulk`, `write2-bulk`, `write-secured-bulk`, `write-secured2-bulk`, `bench-read-bulk`), as are `stream-events`, the four `galaxy-*` commands, and `close-session`. The `FakeSession` stubs all return empty lists, so an end-to-end CLI test would catch JSON-shape regressions, argument-parsing bugs, and option contract breaks that the bulk Session unit tests on the library side do not exercise. This same coverage gap is what made Client.Java-013 itself only surface on a clean Gradle build.
**Recommendation:** Add at least one round-trip CLI test per bulk subcommand (`read-bulk`, `write-bulk`, `write2-bulk`, `write-secured-bulk`, `write-secured2-bulk`) that exercises the JSON output shape and the value parser (`parseValue(type, text)` is shared across all five and the only `write*-bulk` path that catches typos in the type switch). Extending the `FakeSession` stubs to return at least one result row makes the assertions meaningful. The `bench-read-bulk` test can run with a 1-second `--duration-seconds` and a 0-second `--warmup-seconds` and assert the JSON schema keys (`totalCalls`, `latencyMs.p50`, `callsPerSecond`) rather than the numeric values.
**Resolution:** 2026-05-20 — Added round-trip CLI tests for all six bulk-family subcommands plus the new Client.Java-023 unsigned-uint64 regression to `MxGatewayCliTests`. The `FakeSession` stubs were upgraded from empty-list returns to per-call recorders that publish the parsed entries (e.g. `lastWriteBulkEntries`, `lastReadBulkTimeoutMs`) and synthesise one `BulkReadResult`/`BulkWriteResult` per requested handle so the JSON output assertions exercise the `bulkReadResultMap` and `bulkWriteResultMap` serialisers. New tests:

- (a) `readBulkCommandForwardsTimeoutAndPrintsResults`: asserts `--timeout-ms 750` reaches the session and the JSON output carries the per-tag `tagAddress`, `itemHandle`, `wasCached`, and `quality` fields.
- (b) `writeBulkCommandParsesTypedValuesAndPrintsResults`: asserts `--type int32 --values 111,222 --user-id 5` parses through the shared `parseValue` switch and the entries are constructed with the expected typed `MxValue` and `userId`.
- (c) `write2BulkCommandForwardsTimestampAndPrintsResults`: asserts the `--timestamp 2026-05-20T00:00:00Z` reaches the entry as a `timestampValue` (`hasTimestampValue()` is true).
- (d) `writeSecuredBulkCommandForwardsUserIdsAndPrintsResults`: asserts `--current-user-id 7 --verifier-user-id 8` are both propagated.
- (e) `writeSecured2BulkCommandForwardsTimestampAndUserIdsAndPrintsResults`: combination of (c) and (d).
- (f) `benchReadBulkCommandEmitsJsonSchemaKeys`: runs the bench in a 1s steady / 0s warmup window and asserts the JSON output contains the cross-language schema keys (`language=java`, `command=bench-read-bulk`, `bulkSize=2`, `totalCalls`, `successfulCalls`, `failedCalls`, `callsPerSecond`, `latencyMs.p50/p95/p99`, `tags` including the synthesised `TestMachine_001.TestChangingInt`/`TestMachine_002.TestChangingInt` pair).
- (g) `streamEventsWorkerSequenceRendersAsUnsignedForHighUint64`: Client.Java-023 regression.

The `stream-events` and `galaxy-*` CLI tests (gaps noted in the description) were intentionally not added in this round: they require either an in-process gateway/galaxy server or package-private `MxEventStream`/`DeployEventStream` constructor access from the CLI test module, which is its own infrastructure work. The library-side tests in `GalaxyRepositoryClientTests` already cover the streaming wire behaviour.