Files
Joseph Doherty d3cb311aae Resolve Client.Java-032..036: shared subscription base, batch tokenizer
Client.Java-032  README CLI examples for stream-alarms and
                 acknowledge-alarm now use the correct picocli flags
                 (--filter-prefix and --reference); two regression
                 tests parse each documented invocation.
Client.Java-033  StreamAlarmsCommand publishes an
                 AtomicReference<MxGatewayAlarmFeedSubscription> and
                 mirrors MxEventStream's overflow branch: a failed
                 queue.offer cancels the subscription, queues an
                 IllegalStateException, then queues the END sentinel
                 — preserving the fail-fast contract.
Client.Java-034  BatchCommand routes through a new
                 MxGatewayCli.tokenizeBatchLine POSIX-style shell
                 tokenizer that respects double-quoted, single-quoted,
                 and backslash-escaped arguments.
Client.Java-035  Added streamAlarmsForwardsRequestAndStreamsAlarmFeedMessages
                 to MxGatewayClientSessionTests; asserts request shape,
                 message ordering, and cancellation propagation.
Client.Java-036  Extracted MxGatewayStreamSubscription<TRequest,TResponse>
                 abstract base; the four subscription classes
                 (MxGatewayEventSubscription, MxGatewayAlarmFeedSubscription,
                 MxGatewayActiveAlarmsSubscription, DeployEventSubscription)
                 collapse to ~10-line subclasses. A new contract test
                 runs identical lifecycle / cancellation assertions
                 across all four subclasses.

All resolved at 2026-05-24; gradle build + gradle test BUILD SUCCESSFUL.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-24 09:29:27 -04:00

666 lines
98 KiB
Markdown
Raw Permalink Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# Code Review — Client.Java
| Field | Value |
|---|---|
| Module | `clients/java` |
| Reviewer | Claude Code |
| Review date | 2026-05-24 |
| Commit reviewed | `42b0037` |
| Status | Re-reviewed |
| Open findings | 0 |
## Checklist coverage
A third-pass review against commit `a020350` (the sweep that resolved
Client.Java-013 through Client.Java-020). Prior findings are unchanged; new
findings raised in this pass are numbered Client.Java-021 onward.
| # | Category | Result |
|---|---|---|
| 1 | Correctness & logic bugs | Issues found: `stream-events` CLI text path still prints the proto `uint64 worker_sequence` with `%d` (Client.Java-023), the same bug Client.Java-020 fixed for `galaxy-watch`; `bench-read-bulk` includes failed-call durations in its success-latency histogram (Client.Java-024), mirroring the bug Client.Rust-015 fixed in Rust. |
| 2 | mxaccessgw conventions | No new issues found in this pass. |
| 3 | Concurrency & thread safety | Issue found: `DeployEventStream` did not receive the deterministic terminal-state serialisation that Client.Java-002 added to `MxEventStream`, so a concurrent queue-overflow + `close()` race can still erase the overflow signal (Client.Java-021). |
| 4 | Error handling & resilience | No new issues found in this pass. |
| 5 | Security | No new issues found in this pass. The Client.Java-018 regex correctly handles colon/comma/quote/paren/URL embeddings and is verified by the existing fixture tests. |
| 6 | Performance & resource management | No new issues found in this pass. `shutdownTimeout` is consistently honoured everywhere `ownedChannel.shutdown()` is called — both clients delegate to the shared `MxGatewayChannels.shutdown` / `shutdownAndAwaitTermination` helpers. |
| 7 | Design-document adherence | No new issues found in this pass. |
| 8 | Code organization & conventions | Issue found: the CLI `CommonOptions.toClientOptions()` does not propagate `shutdownTimeout` to the underlying `MxGatewayClientOptions`, so CLI users have no way to override the new option introduced by Client.Java-019 (Client.Java-025). |
| 9 | Testing coverage | Issue found: there is no CLI-level test coverage for the `read-bulk`, `write-bulk`, `write2-bulk`, `write-secured-bulk`, `write-secured2-bulk`, or `bench-read-bulk` subcommands — Client.Java-013 noted this as out-of-scope but never filed a follow-up (Client.Java-026). |
| 10 | Documentation & comments | Issue found: `MxGatewayChannels.toCompletable` Javadoc claims chained `thenApply` futures forward `cancel()` upstream to `CancellingCompletableFuture`, which is not true of `CompletableFuture.thenApply`; the implementation works only because all validator chains are inlined into the new `toCompletable(source, operation, validator)` overload (Client.Java-022). |
### 2026-05-24 review (commit d692232)
Re-review pass at `d692232`. Diff against `a020350` is commit `397d3c5`:
gradle subprojects renamed `mxgateway-client``zb-mom-ww-mxgateway-client`,
`mxgateway-cli``zb-mom-ww-mxgateway-cli`. Java package change
`com.dohertylan.mxgateway.*``com.zb.mom.ww.mxgateway.*` (source directories
moved from `com/dohertylan/mxgateway/` to `com/zb/mom/ww/mxgateway/`).
`settings.gradle` and root `build.gradle` updated for the new project /
group names. CLI mainClass updated. The `gradle build` task at HEAD is green;
documentation updates lag — see the doc-side findings below.
| # | Category | Result |
|---|---|---|
| 1 | Correctness & logic bugs | No issues found in the a020350..d692232 diff. |
| 2 | mxaccessgw conventions | Issues found: Client.Java-031 (README prose still uses the old short project names in plain text — the actual subproject names that drive Gradle / IDE imports are the prefixed ones). |
| 3 | Concurrency & thread safety | No issues found in this diff. |
| 4 | Error handling & resilience | No issues found in this diff. |
| 5 | Security | No issues found in this diff. |
| 6 | Performance & resource management | No issues found in this diff. |
| 7 | Design-document adherence | Issues found: Client.Java-028 (`JavaClientDesign.md` build-layout example still cites the old `com/dohertylan/mxgateway/` package paths). |
| 8 | Code organization & conventions | Issues found: Client.Java-030 (no new test exercises the regenerated `QueryActiveAlarmsRequest` RPC path). |
| 9 | Testing coverage | Cross-reference to Client.Java-030; the bulk-command gaps tracked under Client.Java-026 remain. |
| 10 | Documentation & comments | Issues found: Client.Java-027 (Gradle task names in README and JavaClientDesign still reference the old `:mxgateway-client:` and `:mxgateway-cli:` paths — every command in the README breaks if copy-pasted); Client.Java-029 (`README.md:209` cites `zb-mom-ww-mxgateway-cli/build/install/mxgateway-cli` but the actual install path contains a doubled directory `zb-mom-ww-mxgateway-cli/build/install/zb-mom-ww-mxgateway-cli/`). |
### 2026-05-24 re-review (commit 42b0037)
Re-review pass at `42b0037`. Diff against `d692232` is five commits touching
`clients/java`: `10bd0c0` (Client.Java-027..031 fixes), `71d2c39` (port
`batch` subcommand), `f90bff0` (port bulk read/write SDK + CLI subcommands),
`8a0c59d` (port `stream-alarms` + `acknowledge-alarm` to the Java CLI plus a
new `MxGatewayClient.streamAlarms` SDK method returning
`MxGatewayAlarmFeedSubscription`), and `8738735` (README updates for the
alarm surface). `gradle build` from `clients/java` is green; new findings
focus on the alarm surface and the `batch` subcommand. Prior findings
Client.Java-001..031 are unchanged.
| # | Category | Result |
|---|---|---|
| 1 | Correctness & logic bugs | Issues found: `BatchCommand` tokenises stdin with `line.trim().split("\\s+")`, so any argument with embedded whitespace (e.g. `--comment "needs verification"`) is shredded mid-string (Client.Java-034); `StreamAlarmsCommand` uses `queue.offer(value)` on a bounded queue with no overflow detection, silently dropping alarm messages when the gateway pushes faster than the consumer drains (Client.Java-033). |
| 2 | mxaccessgw conventions | Cross-reference to Client.Java-033 — silently-dropping events violates `JavaStyleGuide.md` ("Do not reorder, coalesce, or drop events in client code") and the fail-fast event-backpressure contract. No new finding raised separately. |
| 3 | Concurrency & thread safety | No issues found in this diff. `MxGatewayAlarmFeedSubscription` correctly mirrors `MxGatewayEventSubscription`'s `beforeStart`-vs-`cancel` race fix (Client.Java-014). |
| 4 | Error handling & resilience | No issues found in this diff. |
| 5 | Security | No issues found in this diff. |
| 6 | Performance & resource management | No issues found in this diff. |
| 7 | Design-document adherence | No issues found in this diff. The new `streamAlarms` SDK surface matches `gateway.md`'s always-on alarm-monitor design (no worker session, multi-subscriber). |
| 8 | Code organization & conventions | Issue found: `MxGatewayAlarmFeedSubscription` is a structural copy of `MxGatewayEventSubscription` (same fields, same `beforeStart`/`cancel` shape) — the subscription-class family should share a common base or helper (Client.Java-036). |
| 9 | Testing coverage | Issue found: the new `MxGatewayClient.streamAlarms` SDK method has no library-side test in `zb-mom-ww-mxgateway-client/src/test/...` — only the CLI test exercises the path via a `FakeClient.streamAlarms` override that bypasses the production `subscription.wrap(observer)` glue (Client.Java-035). |
| 10 | Documentation & comments | Issue found: README (`clients/java/README.md:182-183`) documents the new `stream-alarms` and `acknowledge-alarm` commands with `--session-id <id>` (neither command has that option) and `acknowledge-alarm --alarm-reference …` (actual flag is `--reference`) — every documented invocation fails at picocli parse time (Client.Java-032). |
## Findings
### Client.Java-001
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Security |
| Location | `clients/java/mxgateway-client/src/main/java/com/dohertylan/mxgateway/client/MxGatewaySecrets.java:30-32` |
| Status | Resolved |
**Description:** `redactApiKey` preserves the leading and trailing four characters of the key. A gateway API key has the form `mxgw_<key-id>_<secret>`; the last four characters belong to the secret portion, so the "redacted" form leaks 4 characters of the actual secret into logs, CLI JSON output (`CommonOptions.redactedJsonMap`), and `MxGatewayClientOptions.toString()`. CLAUDE.md states API keys must never reach logs.
**Recommendation:** Redact the secret entirely. Show only a stable non-secret prefix (e.g. the `mxgw_<key-id>_` portion) and mask everything after it, or emit a fixed `mxgw_***` form. Do not echo any trailing characters of the secret.
**Resolution:** (2026-05-18) Confirmed against source: the old `substring(0,4) + stars + substring(len-4)` echoed the last four secret characters. `redactApiKey` now masks the secret entirely: for gateway-shaped keys it returns the non-secret `mxgw_<key-id>_` prefix followed by `***` (locating the secret separator as the first `_` after `mxgw_`); any non-gateway-shaped token returns `<redacted>`. No leading/trailing secret characters are ever emitted. The pre-existing `MxGatewayCliTests.openSessionJsonRedactsApiKey` assertion that hardcoded the leaky `mxgw***********cret` form was corrected to assert the masked `mxgw_visible_***` form. Regression tests: `MxGatewayMediumFindingsTests.redactApiKeyDoesNotLeakAnyCharacterOfTheSecret`, `redactApiKeyForNonGatewayShapedKeyRevealsNothing`, `redactApiKeyStillHandlesNullAndShortInput`.
### Client.Java-002
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Concurrency & thread safety |
| Location | `clients/java/mxgateway-client/src/main/java/com/dohertylan/mxgateway/client/MxEventStream.java:31,66-92` |
| Status | Resolved |
**Description:** The `next` field is a plain (non-volatile) instance field, and `MxEventStream` exposes no thread-confinement guarantee. More concretely, a queue-overflow `offer()` and a `close()` `offer(END)` can interleave so the overflow exception is enqueued after `END` and never observed — the contract that "next() throws after overflow" is not guaranteed once `close()` has been called.
**Recommendation:** Document single-consumer-thread usage explicitly in the Javadoc, and serialise terminal state transitions (overflow vs END vs close) behind a single guarded flag so the first terminal condition wins deterministically.
**Resolution:** (2026-05-18) Confirmed against source: the old `offer()` END-branch did `queue.clear(); queue.offer(END)` when full, so a `close()` arriving after an overflow wiped the already-enqueued overflow exception, leaving the consumer with a clean end-of-stream and the overflow silently lost. Terminal transitions are now serialised through a single `terminate(MxGatewayException)` method guarded by a `terminated` flag and a `terminalLock`; the first terminal condition wins and a later `close()`/`END` cannot overwrite a published overflow fault. The Javadoc now explicitly documents that the iterator methods are single-consumer-only while `close()` is safe from any thread. Regression tests: `MxGatewayMediumFindingsTests.eventStreamOverflowExceptionSurvivesASubsequentClose` (deterministic) and `eventStreamConcurrentOverflowAndCloseAlwaysTerminate` (300-iteration race stress).
### Client.Java-003
| Field | Value |
|---|---|
| Severity | Medium |
| Category | mxaccessgw conventions |
| Location | `clients/java/mxgateway-client/src/main/java/com/dohertylan/mxgateway/client/MxGatewayClient.java:119-140` |
| Status | Resolved |
**Description:** `OpenSessionReply` carries `gateway_protocol_version` (proto field 8), and `MxGatewayClientVersion.GATEWAY_PROTOCOL_VERSION` exists so the client can reject incompatible generated-code inputs. The client never reads `reply.getGatewayProtocolVersion()` nor compares it against the compiled-in version. A client built against an older/newer contract issues commands blindly and fails with confusing downstream errors instead of a clear version-mismatch failure.
**Recommendation:** In `openSession`/`openSessionRaw`, compare `reply.getGatewayProtocolVersion()` with `MxGatewayClientVersion.gatewayProtocolVersion()` and throw a typed `MxGatewayException` on mismatch.
**Resolution:** (2026-05-18) Confirmed against source: neither `openSessionRaw` nor `openSessionAsync` read `getGatewayProtocolVersion()`. Added a private `ensureGatewayProtocolCompatible` helper, called from both `openSessionRaw` and `openSessionAsync`, that throws `MxGatewayException` with a clear mismatch message when the gateway reports a non-zero version differing from `MxGatewayClientVersion.gatewayProtocolVersion()`. A gateway that leaves the field unset (value 0, e.g. an older gateway) is accepted unchanged for backward compatibility. `clients/java/README.md` documents the new fail-fast check. Regression tests: `MxGatewayMediumFindingsTests.openSessionRejectsIncompatibleGatewayProtocolVersion` and `openSessionAcceptsMatchingOrUnsetGatewayProtocolVersion`.
### Client.Java-004
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Correctness & logic bugs |
| Location | `clients/java/mxgateway-client/src/main/java/com/dohertylan/mxgateway/client/MxGatewaySession.java:114-120,157-163,191-197` |
| Status | Resolved |
**Description:** `register`, `addItem`, and `addItem2` check `reply.hasRegister()`/`hasAddItem()` and otherwise fall back to `reply.getReturnValue().getInt32Value()`. If the gateway returns a reply with neither the typed payload nor a `return_value` set, the method silently returns `0` — indistinguishable from a legitimate handle of 0. This masks a contract violation rather than surfacing it.
**Recommendation:** If the expected typed payload is absent and no `return_value` is present, throw `MxGatewayException` (protocol violation) instead of returning `0`.
**Resolution:** (2026-05-18) Confirmed against source: all three methods returned `reply.getReturnValue().getInt32Value()` (which yields `0` for an unset message field) when the typed payload was absent. Each method now guards the fallback with `reply.hasReturnValue()` and throws `MxGatewayException` describing the protocol violation when neither the typed payload nor a `return_value` is present. The legitimate `return_value` fallback is preserved. Regression tests: `MxGatewayMediumFindingsTests.registerThrowsWhenReplyHasNeitherTypedPayloadNorReturnValue`, `addItemThrowsWhenReplyHasNeitherTypedPayloadNorReturnValue`, and `addItemStillHonoursReturnValueFallback`.
### Client.Java-005
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Error handling & resilience |
| Location | `clients/java/mxgateway-client/src/main/java/com/dohertylan/mxgateway/client/MxGatewaySession.java:92-105` |
| Status | Resolved |
**Description:** `close()` delegates to `closeRaw()`, which performs a network RPC. When `MxGatewaySession` is used in try-with-resources and the body throws, a failure inside `closeSession` (e.g. `WORKER_UNAVAILABLE`) throws from `close()` and replaces the original exception as the propagated throwable (the body exception becomes a suppressed exception) — a known try-with-resources footgun for I/O-performing `close()`.
**Recommendation:** Either make `close()` swallow/log close-time failures (keeping `closeRaw()` for callers who want the result), or document clearly that `close()` performs a network call that can throw.
**Resolution:** (2026-05-18) Confirmed against source: `close()` called `closeRaw()` directly, so a `CloseSession` RPC failure propagated out of try-with-resources and replaced the body exception. `close()` now catches `MxGatewayException` from `closeRaw()` and logs it at WARNING via `System.Logger` instead of rethrowing, so a close-time failure never masks the body exception. `closeRaw()` is unchanged and still throws for callers who want to observe the close result. The behavior change and the recommendation to use `closeRaw()` for explicit close handling are documented in `clients/java/README.md` and the `close()` Javadoc. Regression tests: `MxGatewayMediumFindingsTests.closeSuppressesCloseTimeFailureInsteadOfMaskingBodyException` and `closeRawStillSurfacesCloseTimeFailureForCallersWhoWantIt`.
### Client.Java-006
| Field | Value |
|---|---|
| Severity | Low |
| Category | Performance & resource management |
| Location | `clients/java/mxgateway-client/src/main/java/com/dohertylan/mxgateway/client/MxGatewayClient.java:323-328`, `clients/java/mxgateway-client/src/main/java/com/dohertylan/mxgateway/client/GalaxyRepositoryClient.java:279-284` |
| Status | Resolved |
**Description:** `close()` (the `AutoCloseable` method invoked by try-with-resources) calls only `ownedChannel.shutdown()` and returns immediately without awaiting termination. In-flight calls and Netty event-loop threads may still be running when the caller assumes the resource is released. `closeAndAwaitTermination()` does it correctly but is not the method try-with-resources uses, and the README examples all rely on try-with-resources.
**Recommendation:** Have `close()` await termination for a bounded time and `shutdownNow()` on timeout (the logic already in `closeAndAwaitTermination()`), or document that try-with-resources callers should call `closeAndAwaitTermination()`.
**Resolution:** (2026-05-18) Confirmed against source: both `MxGatewayClient.close()` and `GalaxyRepositoryClient.close()` called only `ownedChannel.shutdown()`. `close()` in both clients now performs the bounded-wait logic previously only in `closeAndAwaitTermination()`: it shuts the channel down, waits up to the configured connect timeout for graceful termination, and calls `shutdownNow()` on timeout. Because `close()` cannot throw a checked exception, an `InterruptedException` while awaiting is handled by forcibly shutting the channel down and restoring the thread interrupt flag. `closeAndAwaitTermination()` is retained unchanged for callers who want the checked, blocking-aware variant. `clients/java/README.md` documents the new try-with-resources `close()` semantics.
### Client.Java-007
| Field | Value |
|---|---|
| Severity | Low |
| Category | Testing coverage |
| Location | `clients/java/mxgateway-client/src/test/java/com/dohertylan/mxgateway/client/` |
| Status | Resolved |
**Description:** The alarm surface — `acknowledgeAlarm`/`acknowledgeAlarmAsync`/`queryActiveAlarms` and `MxGatewayActiveAlarmsSubscription` — has zero test coverage. TLS channel construction, the async `streamEventsAsync` path, `MxGatewayEventSubscription` pre-start cancellation, and `MxEventStream` queue overflow are likewise untested. `JavaClientDesign.md` explicitly lists async stream-observer cancellation and status/error mapping as required tests.
**Recommendation:** Add in-process gRPC tests for the alarm RPCs, the async streaming/subscription cancellation paths, and at least one TLS-config construction test.
**Resolution:** (2026-05-18) Confirmed against source: no test referenced `acknowledgeAlarm`, `queryActiveAlarms`, `streamEventsAsync`, TLS construction, or `MxEventStream` overflow. Added `MxGatewayLowFindingsTests` (12 tests) covering: `acknowledgeAlarm`/`acknowledgeAlarmAsync` (success, typed protocol-failure, async transport-failure normalisation), `queryActiveAlarms` observer delivery, `MxGatewayActiveAlarmsSubscription` and `MxGatewayEventSubscription` pre-start cancellation, `streamEventsAsync` observer delivery, `MxEventStream` queue overflow surfacing `MxGatewayException`, TLS channel construction (missing CA file rejected with a typed exception, system-trust path builds cleanly), and the Client.Java-008 async-validator normalisation. While writing the TLS test a latent bug was found: a missing/unreadable CA file makes `GrpcSslContexts` throw `IllegalArgumentException` (not `SSLException`), which the old `catch (SSLException)` let escape unwrapped — the catch in the shared channel builder was broadened to also wrap `RuntimeException` so callers always see one typed `MxGatewayException`.
### Client.Java-008
| Field | Value |
|---|---|
| Severity | Low |
| Category | Error handling & resilience |
| Location | `clients/java/mxgateway-client/src/main/java/com/dohertylan/mxgateway/client/MxGatewayClient.java:298-304` |
| Status | Resolved |
**Description:** `acknowledgeAlarmAsync` and `openSessionAsync` apply `ensureProtocolSuccess` inside `thenApply`. If that validator throws a non-`MxGatewayException` `RuntimeException` it is wrapped by `CompletionException` with no `fromGrpc` normalisation, unlike the synchronous paths which normalise via `try/catch`. The async and sync error surfaces are therefore inconsistent.
**Recommendation:** Wrap the `thenApply` body so any non-`MxGatewayException` is routed through `MxGatewayErrors.fromGrpc`, matching the synchronous methods.
**Resolution:** (2026-05-18) Confirmed against source: the `thenApply` validators in `openSessionAsync`, `invokeAsync`, and `acknowledgeAlarmAsync` were not normalised — in practice the gateway's own validators (`ensureProtocolSuccess`, `ensureMxAccessSuccess`, `ensureGatewayProtocolCompatible`) only ever throw `MxGatewayException`, but a stray non-`MxGatewayException` `RuntimeException` (e.g. an NPE from a malformed reply) would surface raw inside `CompletionException`. Added `MxGatewayChannels.normalisingValidator(operation, fn)`: it rethrows `MxGatewayException` unchanged and routes any other `RuntimeException` through `MxGatewayErrors.fromGrpc`, matching the synchronous `try/catch` paths. All three async `thenApply` sites now use it. Regression test: `MxGatewayLowFindingsTests.openSessionAsyncNormalisesNonGatewayRuntimeExceptionFromValidator`.
### Client.Java-009
| Field | Value |
|---|---|
| Severity | Low |
| Category | Code organization & conventions |
| Location | `clients/java/mxgateway-client/src/main/java/com/dohertylan/mxgateway/client/GalaxyRepositoryClient.java:310-391`, `clients/java/mxgateway-client/src/main/java/com/dohertylan/mxgateway/client/MxGatewayClient.java:346-413` |
| Status | Resolved |
**Description:** `createChannel`, `withDeadline`, `withStreamDeadline`, and `toCompletable` are duplicated nearly verbatim across `MxGatewayClient` and `GalaxyRepositoryClient` (~80 lines). A fix to one will not propagate to the other.
**Recommendation:** Extract the channel-builder and future-adaptor helpers into a shared package-private utility class.
**Resolution:** (2026-05-18) Confirmed against source: the four helpers were duplicated near-verbatim. Added a package-private `MxGatewayChannels` utility class holding `createChannel(options, tlsErrorPrefix)`, `withDeadline(stub, options)`, `withStreamDeadline(stub, options)`, `toCompletable(future, operation)`, and the new `normalisingValidator` helper (Client.Java-008). Both `MxGatewayClient` and `GalaxyRepositoryClient` now delegate to it and their private copies were deleted, so a future fix lives in one place. Behavior is unchanged except the operation-name carried into `MxGatewayErrors.fromGrpc` is now the specific RPC name instead of the generic `"async call"`/`"galaxy async call"`. Verified by the full existing async test suite plus the new `MxGatewayLowFindingsTests`.
### Client.Java-010
| Field | Value |
|---|---|
| Severity | Low |
| Category | Documentation & comments |
| Location | `clients/java/mxgateway-client/src/main/java/com/dohertylan/mxgateway/client/MxGatewayClient.java:269-272`, `clients/java/README.md:76` |
| Status | Resolved |
**Description:** The `acknowledgeAlarm` Javadoc states the gateway authenticates against an `invoke:alarm-ack` scope, and the README states the Galaxy Repository requires a `metadata:read` scope. CLAUDE.md's documented scope set names neither — the Javadoc/README assert a scope contract the project's own auth documentation does not corroborate.
**Recommendation:** Reconcile the scope names with `src/MxGateway.Server/Security/` and CLAUDE.md; correct the Javadoc/README to the actual scope strings, or fix CLAUDE.md if sub-scopes were genuinely added.
**Resolution:** (2026-05-18) Partially re-triaged. Verified against `src/MxGateway.Server/Security/Authorization/GatewayScopes.cs` and `GatewayGrpcScopeResolver.cs`: the canonical scope catalog is `session:open`, `session:close`, `invoke:read`, `invoke:write`, `invoke:secure`, `events:read`, `metadata:read`, `admin`. (a) The README's `metadata:read` for the Galaxy Repository is **correct**`TestConnectionRequest`/`GetLastDeployTimeRequest`/`DiscoverHierarchyRequest`/`WatchDeployEventsRequest` all resolve to `GatewayScopes.MetadataRead`; no change needed. CLAUDE.md's prose lists only coarse scope groups, but the canonical resolver does define `metadata:read`. (b) The `acknowledgeAlarm` Javadoc's `invoke:alarm-ack` is **wrong** — no such scope exists. `AcknowledgeAlarmRequest` and `QueryActiveAlarmsRequest` are not special-cased in `GatewayGrpcScopeResolver`, so they fall through the `_ => GatewayScopes.Admin` default and require the `admin` scope. The Javadoc was corrected to state the `admin` scope; `queryActiveAlarms` did not assert a scope and was left unchanged. The README does not mention alarms, so no README change was required.
### Client.Java-011
| Field | Value |
|---|---|
| Severity | Low |
| Category | Performance & resource management |
| Location | `clients/java/mxgateway-client/src/main/java/com/dohertylan/mxgateway/client/MxEventStream.java:37-63` |
| Status | Resolved |
**Description:** The event stream relies on default gRPC auto-inbound flow control: the async stub auto-requests messages, so the server can push faster than the 16-element bounded queue drains. A momentarily slow consumer triggers queue overflow and an immediate stream-fault cancel. This is consistent with the documented fail-fast event-backpressure design, but the client never applies real flow control, so even brief consumer stalls kill the subscription.
**Recommendation:** Confirm fail-fast is intended (it appears to be); if so, document it on `MxEventStream` so callers know a slow consumer terminates the stream. Optionally expose the queue capacity or opt-in flow control.
**Resolution:** (2026-05-18) Confirmed fail-fast is intended — CLAUDE.md ("fail-fast event backpressure") and `docs/DesignDecisions.md` make a slow consumer losing its subscription a deliberate v1 design choice, so this is documentation-only, not a behavior bug. Added an explicit "Backpressure (fail-fast)" section to the `MxEventStream` class Javadoc explaining that the adaptor uses gRPC auto-inbound flow control with a fixed 16-element buffer and no client flow control, that a consumer stall long enough to fill the buffer triggers an overflow that cancels the subscription and surfaces an `MxGatewayException`, and that consumers must drain promptly and be ready to resubscribe with a resume cursor. `clients/java/README.md` carries the same caveat. The queue capacity was intentionally left non-configurable to keep the v1 surface aligned with the gateway design; overflow behavior is covered by `MxGatewayLowFindingsTests.eventStreamQueueOverflowSurfacesExceptionFromNext`.
### Client.Java-012
| Field | Value |
|---|---|
| Severity | Low |
| Category | Correctness & logic bugs |
| Location | `clients/java/mxgateway-cli/src/main/java/com/dohertylan/mxgateway/cli/MxGatewayCli.java:667-674` |
| Status | Resolved |
**Description:** `CommonOptions.resolved()` mutates `this` (`resolvedApiKey`, `resolvedTimeout`) and returns `this`, but `toClientOptions()` and `redactedJsonMap()` read those mutated fields. If `redactedJsonMap()` is ever called before `resolved()`, it silently emits empty-string defaults. The "return this after mutating" pattern is fragile and surprising.
**Recommendation:** Make `resolved()` return an immutable resolved value object, or compute `resolvedApiKey`/`resolvedTimeout` lazily in their getters so call ordering cannot produce stale output.
**Resolution:** (2026-05-18) Confirmed against source: `resolved()` populated the `resolvedApiKey`/`resolvedTimeout` mutable fields and `toClientOptions()`/`redactedJsonMap()` read them, so calling either before `resolved()` emitted stale empty/30s defaults. The two mutable fields were removed and replaced with side-effect-free accessor methods `resolvedApiKey()` and `resolvedTimeout()` that compute their value on each call (API key from `--api-key` or the `--api-key-env` variable; timeout via `parseDuration`). `toClientOptions()` and `redactedJsonMap()` now call those accessors directly, so call ordering can no longer produce stale output. `resolved()` is retained as a no-op returning `this` purely for call-site readability (`common.resolved()`), with its Javadoc updated to state resolution is now lazy. Pure-refactor with no runtime-behavior change for the existing call order, so no new test was added; covered by the existing `MxGatewayCliTests` JSON-redaction and option-parsing tests.
### Client.Java-013
| Field | Value |
|---|---|
| Severity | High |
| Category | Testing coverage |
| Location | `clients/java/mxgateway-cli/src/test/java/com/dohertylan/mxgateway/cli/MxGatewayCliTests.java:212-304`, `clients/java/mxgateway-cli/src/main/java/com/dohertylan/mxgateway/cli/MxGatewayCli.java:1214-1244` |
| Status | Resolved |
**Description:** `MxGatewayCliSession` in `MxGatewayCli.java:1214` was extended in commit `f220908` (the "bulk read/write CLI subcommands" change) with five new abstract methods — `readBulk`, `writeBulk`, `write2Bulk`, `writeSecuredBulk`, `writeSecured2Bulk`. The test-only `FakeSession` in `MxGatewayCliTests.java:212` still only implements the original set (register/addItem/advise/writeRaw/subscribeBulk/unsubscribeBulk/streamEventsAfter) and is declared a concrete (non-abstract) class. A clean compile of `mxgateway-cli`'s test source set therefore fails: a concrete implementer that omits abstract interface methods is a compile error. The stale `.class` files under `build/classes/java/test/` predate the interface change (dated 2026-05-20 03:38 vs CLI source dated 2026-05-20 05:06), which is why the issue is not visible until the next clean build. `gradle test` (or any CI pipeline that does not retain incremental state) will fail to build the CLI test module. The `CLAUDE.md` source-update workflow row "When source code changes, build and test the affected component" was not honoured for this CLI contract change.
**Recommendation:** Add the five missing `@Override` implementations to `FakeSession` (stubs returning empty lists are fine — only `subscribeBulk`/`unsubscribeBulk` are exercised by the existing tests, and the new bulk subcommands have no dedicated CLI tests yet). Optionally also add at least one CLI-level test for `read-bulk`, `write-bulk`, and the `bench-read-bulk` subcommands to keep parity with the .NET / Go / Rust CLI smoke matrix.
**Resolution:** 2026-05-20 — Added the five missing `@Override` stubs (`readBulk`, `writeBulk`, `write2Bulk`, `writeSecuredBulk`, `writeSecured2Bulk`) to `FakeSession` in `clients/java/mxgateway-cli/src/test/java/com/dohertylan/mxgateway/cli/MxGatewayCliTests.java`, each returning an empty `ArrayList<>` to match the interface return types (`List<BulkReadResult>` / `List<BulkWriteResult>`) without throwing. Imported `BulkReadResult`, `BulkWriteResult`, `WriteBulkEntry`, `Write2BulkEntry`, `WriteSecuredBulkEntry`, `WriteSecured2BulkEntry` from `mxaccess_gateway.v1.MxaccessGateway`. `GrpcMxGatewayCliSession` in `MxGatewayCli.java` is the only other implementer and already provides the methods (the source change that introduced the contract added them there). Verified with `gradle clean` followed by `gradle :mxgateway-cli:compileTestJava` and `gradle :mxgateway-cli:test` from `clients/java`, both BUILD SUCCESSFUL. No new CLI-level tests for the bulk subcommands were added — that follow-up is tracked separately and out of scope for this unblock-compilation fix.
### Client.Java-014
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Concurrency & thread safety |
| Location | `clients/java/mxgateway-client/src/main/java/com/dohertylan/mxgateway/client/MxEventStream.java:59-65,117-124` |
| Status | Resolved |
**Description:** `MxEventStream.observer().beforeStart` simply assigns `requestStream` without checking the `closed` flag, while `close()` reads `requestStream` after setting `closed = true`. If `close()` runs *before* the gRPC call has attached its `ClientCallStreamObserver` (a real race when callers cancel immediately after subscribing — e.g. construct, then close in a `finally` block when an unrelated setup step throws), then at close time `requestStream` is `null`, so `stream.cancel(...)` is skipped. `beforeStart` then fires later, stores the live `requestStream`, and never observes `closed` — the underlying gRPC call leaks open and continues delivering events to a `MxEventStream` whose consumer has stopped iterating. The sibling `DeployEventStream.beforeStart` already does the correct thing (`if (closed.get()) { requestStream.cancel(...); }`); the two adaptors should behave identically.
**Recommendation:** Mirror `DeployEventStream`'s pattern in `MxEventStream.beforeStart`: after storing `requestStream`, check the `closed` flag and cancel the stream eagerly if a prior `close()` has already fired. Add a regression test analogous to `GalaxyRepositoryClientTests.deployEventStreamCloseBeforeBeforeStartCancelsStream` to lock in the behavior.
**Resolution:** 2026-05-20 — Mirrored `DeployEventStream.beforeStart` in `MxEventStream.beforeStart`: after storing the `ClientCallStreamObserver`, the observer now reads the `closed` flag and calls `requestStream.cancel("client cancelled event stream", null)` when a prior `close()` already fired, closing the close/beforeStart race that previously leaked the underlying gRPC call. The fix uses the existing `volatile boolean closed` field (already established as a happens-before publisher by `close()` setting it before reading `requestStream`); no field shape changes were needed. `clients/java/README.md` documents the new safe-close-before-beforeStart contract. Regression test: `MxGatewayMediumFindingsTests.mxEventStreamCloseBeforeBeforeStartCancelsStream` (mirrors `GalaxyRepositoryClientTests.deployEventStreamCloseBeforeBeforeStartCancelsStream`).
### Client.Java-015
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Concurrency & thread safety |
| Location | `clients/java/mxgateway-client/src/main/java/com/dohertylan/mxgateway/client/MxGatewayChannels.java:112-138`, `MxGatewayClient.java:183-191,224-232,322-329`, `GalaxyRepositoryClient.java:164-170,212-214` |
| Status | Resolved |
**Description:** `MxGatewayChannels.toCompletable` registers a `whenComplete` on the local `target` future to forward cancellation to the source gRPC `ListenableFuture`. Every caller — `openSessionAsync`, `invokeAsync`, `acknowledgeAlarmAsync`, `discoverHierarchyPageAsync`, `getLastDeployTimeAsync` — then chains `.thenApply(normalisingValidator(...))` or `.thenApply(::getOk)` and returns the *chained* future to the user. `CompletableFuture.thenApply` returns a new future whose cancellation does **not** propagate back to the source `target`. Cancelling the user-facing future therefore never sets `target.isCancelled() == true`, so `source.cancel(true)` is never invoked and the underlying gRPC call continues until its deadline expires. The `JavaClientDesign.md` "Streaming" section explicitly says "Stream cancellation should call `ClientCall.cancel`" — the same expectation reasonably applies to the unary `*Async` surface.
**Recommendation:** Either return `target` directly from each `*Async` method (and inline the validator into the `FutureCallback.onSuccess` path so no `thenApply` is needed), or attach the cancellation listener to the *final* returned future. The cleanest fix is to have `MxGatewayChannels.toCompletable` return a future that wraps the validator internally and registers `whenComplete` on the final future. Add a regression test that cancels the user-facing future and verifies the gRPC call was cancelled (e.g. via a `ServerCallStreamObserver.setOnCancelHandler` latch).
**Resolution:** 2026-05-20 — Fixed by inlining the reply validator into `MxGatewayChannels.toCompletable` so the user-visible future is the same future cancellation is bound to: added a new `toCompletable(source, operation, validator)` overload that runs the validator inside the `FutureCallback.onSuccess` path (normalising non-`MxGatewayException` `RuntimeException`s through `MxGatewayErrors.fromGrpc`, matching the existing synchronous `try/catch`). Replaced the previous `whenComplete`-based cancellation listener with a small `CancellingCompletableFuture<T>` subclass whose `cancel(boolean)` forwards to the source `ListenableFuture.cancel(...)` unconditionally, so even the no-validator overload propagates cancellation deterministically (the `whenComplete` listener only fired when `target.isCancelled()` was already true, which is exactly the case `thenApply` broke). Updated `MxGatewayClient.openSessionAsync`, `MxGatewayClient.invokeAsync`, `MxGatewayClient.acknowledgeAlarmAsync`, `GalaxyRepositoryClient.testConnectionAsync`, and `GalaxyRepositoryClient.getLastDeployTimeAsync` to use the new validator overload directly (no `.thenApply` chain). `GalaxyRepositoryClient.discoverHierarchyAsync` is paged via `thenCompose`, so it now publishes the current in-flight page future via an `AtomicReference` and returns a top-level `CompletableFuture` whose overridden `cancel(boolean)` cancels whichever page is currently outstanding. `clients/java/README.md` documents the new cancellation contract: cancelling any `*Async` future aborts the underlying gRPC call. Regression tests: `MxGatewayMediumFindingsTests.invokeAsyncCancellationCancelsUnderlyingGrpcCall` (full in-process gRPC test using `ServerCallStreamObserver.setOnCancelHandler` to latch when the server observes RPC cancellation), `toCompletableValidatorOverloadForwardsCancellationToSource`, and `toCompletableNoValidatorOverloadForwardsCancellationToSource` (unit-level proofs that both `MxGatewayChannels.toCompletable` overloads forward `cancel(true)` to the source `ListenableFuture`).
### Client.Java-016
| Field | Value |
|---|---|
| Severity | Low |
| Category | Code organization & conventions |
| Location | `clients/java/mxgateway-client/src/main/java/com/dohertylan/mxgateway/client/MxGatewayClient.java:361-391`, `GalaxyRepositoryClient.java:285-315` |
| Status | Resolved |
**Description:** Client.Java-009 introduced `MxGatewayChannels` to deduplicate `createChannel`, `withDeadline`, `withStreamDeadline`, and `toCompletable`. The two `close()` / `closeAndAwaitTermination()` methods — added shortly after to fix Client.Java-006 — were not extracted along with them. The 30-line bodies of `MxGatewayClient.close()` + `closeAndAwaitTermination()` and `GalaxyRepositoryClient.close()` + `closeAndAwaitTermination()` are now duplicated verbatim, including the `awaitTermination(connectTimeout)` semantic (see Client.Java-019), the `InterruptedException` handling, and the `ownedChannel == null` guard. A fix to one path (e.g. introducing a dedicated `shutdownTimeout` option) will silently miss the other.
**Recommendation:** Move the shutdown logic into `MxGatewayChannels.shutdown(ManagedChannel channel, MxGatewayClientOptions options)` and `MxGatewayChannels.shutdownAndAwaitTermination(...)`. Have both clients delegate to it. Same recommendation applies to the duplicated `MxGatewayAuthInterceptor` construction in the two constructors (`MxGatewayClient(Channel, ...)` and `GalaxyRepositoryClient(Channel, ...)`).
**Resolution:** 2026-05-20 — Extracted the duplicated shutdown logic into `MxGatewayChannels.shutdown(ManagedChannel, MxGatewayClientOptions)` and `MxGatewayChannels.shutdownAndAwaitTermination(ManagedChannel, MxGatewayClientOptions)`. Both helpers handle the `ownedChannel == null` no-op, the orderly-shutdown / `awaitTermination` / `shutdownNow`-on-timeout escalation, and the `InterruptedException`-restoring-the-interrupt-flag path. `MxGatewayClient.close()`/`closeAndAwaitTermination()` and `GalaxyRepositoryClient.close()`/`closeAndAwaitTermination()` are now one-liners that delegate to the shared helpers, so a future change (such as Client.Java-019's `shutdownTimeout`) lives in one place. Unused `java.util.concurrent.TimeUnit` imports were removed from both clients. The constructor-level `MxGatewayAuthInterceptor` duplication noted in the recommendation was left in place — it is a single intercept call per constructor (2 lines) versus the 30-line shutdown duplication that was the actual maintenance hazard. Regression tests: `MxGatewayLowFindingsIITests.sharedShutdownHelperIsNoOpForNullChannel` (covers the null-channel guard), `shutdownAndAwaitTerminationHonoursShutdownTimeoutNotConnectTimeout`, and `shutdownEscalatesToShutdownNowWhenTimeoutExceeded` (cover the shared shutdown semantics; the second is also the Client.Java-019 regression).
### Client.Java-017
| Field | Value |
|---|---|
| Severity | Low |
| Category | Documentation & comments |
| Location | `clients/java/mxgateway-client/src/main/java/com/dohertylan/mxgateway/client/MxEventStream.java:25-36`, `clients/java/README.md:99-107` |
| Status | Resolved |
**Description:** `MxEventStream.streamEvents` was recently widened from a 16-element buffer to a 1024-element buffer (`MxGatewayClient.streamEvents` at line 268: `new MxEventStream(1024)`). The class-level Javadoc on `MxEventStream` still says "the gateway can push events faster than the consumer drains the bounded 16-element buffer", and `clients/java/README.md` line 103 says "uses gRPC's default auto-inbound flow control with a fixed 16-element buffer". The fail-fast event-backpressure contract (Client.Java-011 resolution) was written against the older capacity. The `MxGatewayClient.streamEvents` inline comment even acknowledges the change ("A small queue overflows on any moderately active session; 1024 covers a realistic backlog"). Users of this surface will reason about realistic backpressure budgets using the wrong number.
**Recommendation:** Update the `MxEventStream` Javadoc and the README to say "1024-element buffer" (or, since the capacity is a passed parameter, document it as a parameter rather than a constant). Consider exposing the capacity through `MxGatewayClientOptions` so callers can tune it per session.
**Resolution:** 2026-05-20 — Updated the `MxEventStream` class Javadoc and `clients/java/README.md` so both say "1024-element buffer" instead of the obsolete "16-element buffer". The Javadoc also notes that capacity is a constructor parameter and that the production caller (`MxGatewayClient.streamEvents`) passes `1024` to absorb the session-backlog replay burst, so readers understand the value is a deliberate choice rather than a constant. Exposing the capacity through `MxGatewayClientOptions` was intentionally left out of scope — the v1 design keeps the event-stream surface minimal and `MxGatewayClient.streamEvents` is the only caller; if a tuning need arises in v2 the existing constructor already accepts the capacity.
### Client.Java-018
| Field | Value |
|---|---|
| Severity | Low |
| Category | Security |
| Location | `clients/java/mxgateway-client/src/main/java/com/dohertylan/mxgateway/client/MxGatewaySecrets.java:54-66` |
| Status | Resolved |
**Description:** `redactCredentials(value)` splits its input on `\\s+` (whitespace) and only redacts whitespace-delimited tokens that start with `mxgw_` or equal `bearer` (case-insensitive). gRPC `Status.getDescription()` strings, log lines, and proto error messages can carry credentials separated by colons (`Bearer:mxgw_id_secret`), commas (`token=mxgw_id_secret,scope=...`), single quotes (`'mxgw_id_secret'`), parentheses (`(mxgw_id_secret)`), or embedded in URLs/paths — all of which leave the `mxgw_` token attached to a non-whitespace neighbour and survive redaction. `MxGatewayErrors.fromGrpc` is the primary consumer; a gateway error description like `authentication failed: 'mxgw_id_secret'` would round-trip the secret into the resulting `MxGatewayAuthenticationException` message.
**Recommendation:** Replace the whitespace-split scrub with a regex-based pass that matches `mxgw_[A-Za-z0-9_-]+` anywhere in the string and substitutes `<redacted>`; also redact `Bearer\s+\S+` as a unit so the token after `Bearer` is masked regardless of the surrounding punctuation. Cover with a fixture-style test alongside `MxGatewayFixtureTests.grpcAuthErrorsAreClassifiedAndRedacted` that asserts a quoted or comma-delimited credential is fully masked.
**Resolution:** 2026-05-20 — Replaced the whitespace-split scrub with two compiled `Pattern` regexes: `mxgw_[A-Za-z0-9_-]+` matches any gateway-shaped credential anywhere in the string regardless of surrounding punctuation, and `(?i)bearer\s+\S+` masks an authorization-header style `Bearer <token>` as a unit so a non-mxgw bearer token cannot leak either. The mxgw pass runs first, so the bearer pass observes `Bearer <redacted>` for the common combined case and renders it idempotently. Regression tests in `MxGatewayFixtureTests`: `redactCredentialsHandlesNonWhitespaceDelimitedTokens` exercises single-quoted, double-quoted, comma-delimited, colon-delimited, parenthesised, URL-embedded, and bearer-header credentials; `redactCredentialsLeavesBenignContentAlone` confirms strings without credentials and a `null` input are unchanged.
### Client.Java-019
| Field | Value |
|---|---|
| Severity | Low |
| Category | Performance & resource management |
| Location | `clients/java/mxgateway-client/src/main/java/com/dohertylan/mxgateway/client/MxGatewayClient.java:362-391`, `GalaxyRepositoryClient.java:286-315` |
| Status | Resolved |
**Description:** Both clients' `close()` / `closeAndAwaitTermination()` use `options.connectTimeout()` as the upper bound on `awaitTermination`. The `connectTimeout` semantically describes how long the client will wait to *establish* the channel, not how long it should wait for in-flight calls and the Netty event loop to drain after `shutdown()`. With the default 10s connect timeout, shutting down a client with a long-running unary call already in flight will silently escalate to `shutdownNow()` and forcibly cancel it before the call's own deadline expires, defeating the deadline contract on `withDeadline`. Conversely, a caller who sets a small `connectTimeout` (e.g. 500 ms for a health probe) inherits an aggressively short shutdown deadline they probably did not intend.
**Recommendation:** Introduce a dedicated `shutdownTimeout` on `MxGatewayClientOptions` (defaulting to e.g. 510 s independent of `connectTimeout`) and use it in `close()` and `closeAndAwaitTermination()`. Document the precedence in the Javadoc. This pairs naturally with the Client.Java-016 deduplication fix.
**Resolution:** 2026-05-20 — Added a dedicated `shutdownTimeout` `Duration` on `MxGatewayClientOptions` (builder method `shutdownTimeout(Duration)`, accessor `shutdownTimeout()`, default 10 s), independent of `connectTimeout`. Both shared shutdown helpers introduced for Client.Java-016 (`MxGatewayChannels.shutdown` and `shutdownAndAwaitTermination`) call `options.shutdownTimeout()` as the `awaitTermination` upper bound, so a small `connectTimeout` (e.g. a 500 ms health-probe timeout) no longer forces a premature `shutdownNow()` on in-flight calls. The new option is reflected in `toString()` and documented on both helpers and the `close()`/`closeAndAwaitTermination()` Javadoc on both clients; `clients/java/README.md` notes the default and the independence from `connectTimeout`. Regression tests in `MxGatewayLowFindingsIITests`: `shutdownAndAwaitTerminationHonoursShutdownTimeoutNotConnectTimeout` (a 50 ms connect timeout + 1 s shutdown timeout + 200 ms graceful-termination channel never escalates to `shutdownNow()`), `shutdownEscalatesToShutdownNowWhenTimeoutExceeded` (a stuck channel beyond the shutdown timeout is forcibly shut down), and `shutdownTimeoutDefaultIsTenSecondsIndependentOfConnectTimeout` (the default holds even when `connectTimeout` is small).
### Client.Java-020
| Field | Value |
|---|---|
| Severity | Low |
| Category | Correctness & logic bugs |
| Location | `clients/java/mxgateway-cli/src/main/java/com/dohertylan/mxgateway/cli/MxGatewayCli.java:244-254`, `galaxy_repository.proto:94` |
| Status | Resolved |
**Description:** `galaxy_repository.proto` defines `DeployEvent.sequence` as `uint64`; the protobuf Java mapping projects that to a signed `long`. The CLI's text-mode `galaxy-watch` output prints it as `"seq=%d ..."`, which interprets the value as signed. For genuine wraparound this is implausible (deploy sequences will not reach `2^63`), but the broader pattern is brittle: any unsigned proto field printed via `%d` will display incorrectly past the signed boundary. The JSON path uses `protoJson(event)` which formats unsigned longs as numeric strings via `JsonFormat`, so JSON output is correct; only the text mode is at risk.
**Recommendation:** Print the sequence with `Long.toUnsignedString(event.getSequence())` (or switch the text format to `%s` and pass the unsigned-string conversion). The same rule should apply to any other `uint64` proto fields that surface in CLI text output.
**Resolution:** 2026-05-20 — Updated the `galaxy-watch` text-mode `out.printf` in `MxGatewayCli.GalaxyWatchCommand.call()` to use `%s` for the sequence field and pass `Long.toUnsignedString(event.getSequence())`, so deploy sequences past `2^63` render as their correct unsigned decimal string instead of a negative signed long. The JSON path through `protoJson(event)` was already correct (proto `JsonFormat` emits unsigned longs as decimal strings) and was left unchanged. An inline comment near the printf documents the unsigned-uint64 contract so the next person editing the format string knows not to switch back to `%d`. Regression test: `MxGatewayCliTests.deployEventSequenceRendersAsUnsignedForHighUint64` exercises the format string with the max-uint64 bit pattern (`-1L`) and asserts the output contains `seq=18446744073709551615` and does not contain `seq=-1`.
### Client.Java-021
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Concurrency & thread safety |
| Location | `clients/java/mxgateway-client/src/main/java/com/dohertylan/mxgateway/client/DeployEventStream.java:96-135` |
| Status | Resolved |
**Description:** Client.Java-002 fixed a deterministic terminal-state race in `MxEventStream` by introducing a `terminate(MxGatewayException)` method, a `terminalLock`, and a `terminated` flag so a `close()` arriving after a queue-overflow `offer()` cannot wipe the overflow exception. `DeployEventStream` — added later and structurally a copy of `MxEventStream` — never received the same fix. Its current `close()` does `closed.set(true); stream.cancel(...); offer(END);`, and its `offer()` overflow branch does `queue.clear(); queue.offer(new MxGatewayException("...queue overflowed")); queue.offer(END);` (lines 117-135). With these two paths running concurrently, the same sequence Client.Java-002 documented can repeat: the overflow branch enqueues `[overflowException, END]`, `close()` then calls `offer(END)` which sees the queue full and falls into the END branch (`queue.clear(); queue.offer(value);`), wiping the overflow exception and leaving a clean end-of-stream. The CLI `galaxy-watch` (and any `WatchDeployEvents` consumer) loses the overflow signal it was supposed to surface, defeating the fail-fast backpressure contract. The 16-element buffer on `DeployEventStream` makes overflow far less likely than on `MxEventStream` in practice, but the race is identical.
**Recommendation:** Mirror the `MxEventStream` fix: add a `terminated` flag and `terminalLock`, route `close()`, `onCompleted`, and the overflow branch through a single `terminate(MxGatewayException)` method that wins on first arrival, and add the regression analogous to `MxGatewayMediumFindingsTests.eventStreamOverflowExceptionSurvivesASubsequentClose`. Given the two stream classes are now structural copies of each other, consider extracting the queue/terminate plumbing into a shared base or helper so the next fix lands once.
**Resolution:** 2026-05-20 — Mirrored the `MxEventStream` terminal-state serialisation in `DeployEventStream`: replaced the `AtomicBoolean closed` field with a `volatile boolean closed`, added a `terminalLock`/`terminated` pair, and routed all terminal paths (`close()`, `onCompleted()`, the overflow branch in `offer()`) through a single private `terminate(MxGatewayException fault)` method guarded by `synchronized (terminalLock) { if (terminated) return; terminated = true; ... }`. The first terminal condition wins: an overflow that publishes `[exception, END]` is no longer wiped by a subsequent `close()`/`onCompleted()` that previously took the "queue full → clear + offer(END)" branch. The class-level Javadoc now documents the single-consumer-thread iterator contract and the deterministic terminal transition, matching `MxEventStream`. Behavior outside the terminal path is unchanged: `beforeStart` still resolves the close-before-beforeStart race (Client.Java-014's deploy-stream counterpart, already in place), `take()` still surfaces interrupts, and the request stream is still cancelled on overflow/close. Regression tests in `GalaxyRepositoryClientTests`: `deployEventStreamOverflowExceptionSurvivesASubsequentClose` (deterministic — capacity-2 stream, force overflow, then close, assert the overflow exception is surfaced) and `deployEventStreamConcurrentOverflowAndCloseAlwaysTerminate` (300-iteration concurrent race stress, mirrors `MxGatewayMediumFindingsTests.eventStreamConcurrentOverflowAndCloseAlwaysTerminate`).
### Client.Java-022
| Field | Value |
|---|---|
| Severity | Low |
| Category | Documentation & comments |
| Location | `clients/java/mxgateway-client/src/main/java/com/dohertylan/mxgateway/client/MxGatewayChannels.java:161-172` |
| Status | Resolved |
**Description:** The Javadoc on the no-validator `toCompletable(source, operation)` overload claims: "calling `cancel(true)` on either the direct return value or the user-facing chained future ultimately invokes `source.cancel(true)` (chained futures forward to the upstream stage they were derived from, which is this future)." This is not how `CompletableFuture.thenApply` (or `thenCompose`, `whenComplete`, etc.) actually behaves: a downstream stage's `cancel()` only marks that derived stage as cancelled, it does NOT propagate cancellation upstream to the originating `CancellingCompletableFuture`. The Client.Java-015 resolution actually fixes the bug by inlining the validator into the new `toCompletable(source, operation, validator)` overload (lines 224-252) so users never need a downstream stage, and by `GalaxyRepositoryClient.discoverHierarchyAsync` using an explicit `AtomicReference`-based override (which has a correct comment at line 218-221 acknowledging exactly this `thenCompose` limitation). The contradiction between the two adjacent comments will mislead the next maintainer who decides to add a convenience `.thenApply` on top of a `*Async` return value — they will assume cancellation still flows through and re-introduce the Client.Java-015 leak.
**Recommendation:** Rewrite the `toCompletable` Javadoc to state the actual contract: `cancel(...)` on the direct return value (the `CancellingCompletableFuture` instance) forwards to the source RPC, but `cancel(...)` on a `thenApply`/`thenCompose`/`thenAccept` *of* that future does not — the cancellation is captured at the derived stage and the upstream RPC continues until its deadline. Callers that need cancellation through a chained pipeline must follow the `discoverHierarchyAsync` pattern (custom `CompletableFuture` subclass tracking the current in-flight stage). The underlying `CancellingCompletableFuture` class doc (lines 254-258) is already correct; only the `toCompletable` paragraph is misleading.
**Resolution:** 2026-05-20 — Rewrote the `toCompletable(source, operation)` Javadoc in `MxGatewayChannels` to reflect the actual `CompletableFuture` contract. The doc now states unambiguously: cancelling the direct return value (the `CancellingCompletableFuture`) forwards to the source `ListenableFuture` and aborts the underlying gRPC call (the Client.Java-015 fix), but cancelling a derived `thenApply`/`thenCompose`/`thenAccept`/`whenComplete` stage of that future does NOT propagate cancellation upstream — the derived stage is marked cancelled while the source RPC continues until its deadline. The Javadoc explicitly directs callers that need cancellation through a chained pipeline to either the `toCompletable(source, operation, validator)` overload (which inlines the validator into the `FutureCallback.onSuccess` path so the user-visible future is the same future cancellation is bound to) or the `GalaxyRepositoryClient.discoverHierarchyAsync` `AtomicReference`-based pattern (for `thenCompose` across paged calls). The `CancellingCompletableFuture` class Javadoc was already correct and is unchanged. Doc-only change; no behavior change and no new test required.
### Client.Java-023
| Field | Value |
|---|---|
| Severity | Low |
| Category | Correctness & logic bugs |
| Location | `clients/java/mxgateway-cli/src/main/java/com/dohertylan/mxgateway/cli/MxGatewayCli.java:1054`, `src/MxGateway.Contracts/Protos/mxaccess_gateway.proto:634` |
| Status | Resolved |
**Description:** `MxEvent.worker_sequence` is a proto `uint64` (line 634 of `mxaccess_gateway.proto`). The `stream-events` CLI text path prints it with `%d` (`client.out().printf("%d %s%n", event.getWorkerSequence(), event.getFamily());`), which interprets the underlying signed `long` value — sequences past `2^63` would render as a negative number. This is the exact same `uint64`-with-`%d` bug that Client.Java-020 fixed for the `galaxy-watch` `DeployEvent.sequence` field; the resolution's stated rule ("The same rule should apply to any other `uint64` proto fields that surface in CLI text output") was never extended to this site. In practice worker sequences will not reach `2^63` so this is latent rather than active, but the same fix and the same regression-test pattern apply.
**Recommendation:** Replace the `%d` with `%s` plus `Long.toUnsignedString(event.getWorkerSequence())` (matching the Client.Java-020 fix in `GalaxyWatchCommand`), and add a regression test analogous to `MxGatewayCliTests.deployEventSequenceRendersAsUnsignedForHighUint64` covering the `stream-events` text-mode format string with `-1L`. The `--after-worker-sequence` CLI option (line 1035) is also typed as a `long`, which means the user cannot pass an unsigned value above `2^63 - 1` from the command line; that is a related but separate ergonomic gap worth noting in the same change.
**Resolution:** 2026-05-20 — Updated the `stream-events` text-mode `client.out().printf` in `MxGatewayCli.StreamEventsCommand.call()` to use `%s` for the sequence and pass `Long.toUnsignedString(event.getWorkerSequence())`, mirroring the Client.Java-020 fix in `GalaxyWatchCommand`. Worker sequences past `2^63` now render as their correct unsigned decimal string instead of a negative signed long. An inline comment near the `printf` documents the unsigned-uint64 contract so the next person editing the format string knows not to switch back to `%d`. The JSON path through `protoJson(event)` was already correct (proto `JsonFormat` emits unsigned longs as decimal strings) and is unchanged. The `--after-worker-sequence` `long` ergonomic gap is a separate v2 concern and intentionally out of scope. Regression test: `MxGatewayCliTests.streamEventsWorkerSequenceRendersAsUnsignedForHighUint64` exercises the format string with the max-uint64 bit pattern (`-1L`) and asserts the output starts with `18446744073709551615 ` and does not start with `-1 `, mirroring `deployEventSequenceRendersAsUnsignedForHighUint64`.
### Client.Java-024
| Field | Value |
|---|---|
| Severity | Low |
| Category | Correctness & logic bugs |
| Location | `clients/java/mxgateway-cli/src/main/java/com/dohertylan/mxgateway/cli/MxGatewayCli.java:855-883` |
| Status | Resolved |
**Description:** `BenchReadBulkCommand` records per-call latency in `latenciesNanos[latencyCount++] = elapsed;` inside *both* the success branch (line 865) and the `catch (Exception ex)` failure branch (line 880). The failed-call durations are then fed into the `percentileSummaryMs` p50/p95/p99 calculation alongside successful calls, producing misleading latency stats when even a few transport errors occur during the bench window. Client.Rust-015 fixed exactly this pattern in `clients/rust/src/bin/bench-read-bulk.rs` ("stop bench-read-bulk from polluting success-latency histograms with failed-call durations"); the equivalent fix was not applied to the Java implementation. The cross-language matrix runner (`scripts/run-client-e2e-tests.ps1`) compares numbers across all five clients, so the Java numbers will be silently inconsistent with the Rust numbers on the same fault profile.
**Recommendation:** Drop the failure-branch latency record (only count `failed++`), or alternately maintain a separate `failedLatenciesNanos` array and report it as a distinct stat in the JSON output — but the success histogram must not include failed-call latencies. Cross-check the .NET, Go, and Python `bench-read-bulk` drivers in the same change to make sure all five clients use the same success-latency definition; the cross-language matrix is only useful if the metric is uniform.
**Resolution:** 2026-05-20 — Dropped the failure-branch latency record in `BenchReadBulkCommand.call()`: the `catch (Exception ex)` block no longer appends `elapsed` to `latenciesNanos` and no longer grows the array — it only increments `failed++`. The success-latency histogram fed into `percentileSummaryMs` (p50/p95/p99/max/mean) is now success-call-only, matching the Client.Rust-015 fix. The JSON output still surfaces `failedCalls` as a distinct top-level count so observers see fault rates separately from latency. An inline comment on the catch block documents the contract so the next maintainer doesn't reinstate the record. New CLI test `MxGatewayCliTests.benchReadBulkCommandEmitsJsonSchemaKeys` (added under Client.Java-026 below) covers the JSON schema produced by the corrected path. The .NET / Go / Python bench drivers were intentionally left out of scope for this Java-focused finding — that cross-client audit is its own follow-up and tracked separately.
### Client.Java-025
| Field | Value |
|---|---|
| Severity | Low |
| Category | Code organization & conventions |
| Location | `clients/java/mxgateway-cli/src/main/java/com/dohertylan/mxgateway/cli/MxGatewayCli.java:1176-1185` |
| Status | Resolved |
**Description:** `CommonOptions.toClientOptions()` populates the `MxGatewayClientOptions` builder with `endpoint`, `apiKey`, `plaintext`, `caCertificatePath`, `serverNameOverride`, and `callTimeout`, but never sets `shutdownTimeout` even though Client.Java-019 introduced it as a first-class option. CLI users therefore always inherit the 10-second default and have no way to override it from the command line, which makes the new option effectively client-library-only. CLI users running long-lived operations (a big `discover-hierarchy` page-chain, a streaming `galaxy-watch` session that needs to drain on Ctrl+C) cannot tune the shutdown deadline up; users running short health probes who want a small `connectTimeout` *and* a small `shutdownTimeout` to keep the CLI snappy on failure also cannot.
**Recommendation:** Add a `--shutdown-timeout` option to `CommonOptions` (parsed via the existing `parseDuration` helper, default unset → use the 10-second library default) and propagate it into `toClientOptions()` so the CLI surface tracks the library surface. Include the resolved value in `redactedJsonMap()` so `--json` output shows the effective shutdown deadline.
**Resolution:** 2026-05-20 — Added a `--shutdown-timeout` option to `CommonOptions` in `MxGatewayCli.java`, parsed via the existing `parseDuration` helper (so it accepts `10s`, `500ms`, ISO-8601 `PT10S`, etc.). A new lazy accessor `resolvedShutdownTimeout()` returns the parsed `Duration` when the user passed `--shutdown-timeout`, or `null` when unset so the `MxGatewayClientOptions` builder default (10s, established by Client.Java-019) applies. `toClientOptions()` now conditionally calls `builder.shutdownTimeout(resolvedShutdownTimeout)` only when the user opted in, preserving the library default for the common case. `redactedJsonMap()` includes the resolved value under key `"shutdownTimeout"` (empty string when unset) so `--json` output shows the effective shutdown deadline. The CLI surface now tracks the library surface so a user running a long page-chain can pass `--shutdown-timeout 60s`, and a user running a short health probe can pair `--timeout 500ms` with `--shutdown-timeout 500ms` to keep the CLI snappy on failure. Behavior for callers who do not pass the new flag is unchanged.
### Client.Java-026
| Field | Value |
|---|---|
| Severity | Low |
| Category | Testing coverage |
| Location | `clients/java/mxgateway-cli/src/test/java/com/dohertylan/mxgateway/cli/MxGatewayCliTests.java` |
| Status | Resolved |
**Description:** Client.Java-013 explicitly deferred adding CLI-level test coverage for the `read-bulk`, `write-bulk`, and `bench-read-bulk` subcommands ("Optionally also add at least one CLI-level test for `read-bulk`, `write-bulk`, and the `bench-read-bulk` subcommands to keep parity with the .NET / Go / Rust CLI smoke matrix"), and the resolution explicitly stated that "follow-up is tracked separately and out of scope for this unblock-compilation fix." That follow-up was never filed. The current `MxGatewayCliTests` only covers `version`, `open-session` (JSON redaction), `write`, `smoke`, `subscribe-bulk`, `unsubscribe-bulk`, and the Client.Java-020 unsigned-uint64 format string — six of the thirteen non-trivial subcommands the CLI ships are completely untested at the CLI layer (`read-bulk`, `write-bulk`, `write2-bulk`, `write-secured-bulk`, `write-secured2-bulk`, `bench-read-bulk`), as are `stream-events`, the four `galaxy-*` commands, and `close-session`. The `FakeSession` stubs all return empty lists, so an end-to-end CLI test would catch JSON-shape regressions, argument-parsing bugs, and option contract breaks that the bulk Session unit tests on the library side do not exercise. This same coverage gap is what made Client.Java-013 itself only surface on a clean Gradle build.
**Recommendation:** Add at least one round-trip CLI test per bulk subcommand (`read-bulk`, `write-bulk`, `write2-bulk`, `write-secured-bulk`, `write-secured2-bulk`) that exercises the JSON output shape and the value parser (`parseValue(type, text)` is shared across all five and the only `write*-bulk` path that catches typos in the type switch). Extending the `FakeSession` stubs to return at least one result row makes the assertions meaningful. The `bench-read-bulk` test can run with a 1-second `--duration-seconds` and a 0-second `--warmup-seconds` and assert the JSON schema keys (`totalCalls`, `latencyMs.p50`, `callsPerSecond`) rather than the numeric values.
**Resolution:** 2026-05-20 — Added round-trip CLI tests for all six bulk-family subcommands plus the new Client.Java-023 unsigned-uint64 regression to `MxGatewayCliTests`. The `FakeSession` stubs were upgraded from empty-list returns to per-call recorders that publish the parsed entries (e.g. `lastWriteBulkEntries`, `lastReadBulkTimeoutMs`) and synthesise one `BulkReadResult`/`BulkWriteResult` per requested handle so the JSON output assertions exercise the `bulkReadResultMap` and `bulkWriteResultMap` serialisers. New tests: (a) `readBulkCommandForwardsTimeoutAndPrintsResults` — asserts `--timeout-ms 750` reaches the session and the JSON output carries the per-tag `tagAddress`, `itemHandle`, `wasCached`, and `quality` fields; (b) `writeBulkCommandParsesTypedValuesAndPrintsResults` — asserts `--type int32 --values 111,222 --user-id 5` parses through the shared `parseValue` switch and the entries are constructed with the expected typed `MxValue` and `userId`; (c) `write2BulkCommandForwardsTimestampAndPrintsResults` — asserts the `--timestamp 2026-05-20T00:00:00Z` reaches the entry as a `timestampValue` (`hasTimestampValue()` is true); (d) `writeSecuredBulkCommandForwardsUserIdsAndPrintsResults` — asserts `--current-user-id 7 --verifier-user-id 8` are both propagated; (e) `writeSecured2BulkCommandForwardsTimestampAndUserIdsAndPrintsResults` — combination of (c) and (d); (f) `benchReadBulkCommandEmitsJsonSchemaKeys` — runs the bench in a 1s steady / 0s warmup window and asserts the JSON output contains the cross-language schema keys (`language=java`, `command=bench-read-bulk`, `bulkSize=2`, `totalCalls`, `successfulCalls`, `failedCalls`, `callsPerSecond`, `latencyMs.p50/p95/p99`, `tags` including the synthesised `TestMachine_001.TestChangingInt`/`TestMachine_002.TestChangingInt` pair); (g) `streamEventsWorkerSequenceRendersAsUnsignedForHighUint64` — Client.Java-023 regression. The recommendation's stream-events and galaxy-* CLI tests were intentionally not added in this round — they require either an in-process gateway/galaxy server or package-private `MxEventStream`/`DeployEventStream` constructor access from the CLI test module, which is its own infrastructure work; the library-side tests in `GalaxyRepositoryClientTests` already cover the streaming wire behaviour.
### Client.Java-027
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Documentation & comments |
| Location | `clients/java/README.md:36,107-175,185,205,220`, `clients/java/JavaClientDesign.md:195-211` |
| Status | Resolved |
**Description:** Commit `397d3c5` renamed the gradle subprojects to `zb-mom-ww-mxgateway-client` and `zb-mom-ww-mxgateway-cli` in `settings.gradle`, but did not propagate that rename into the README's documented gradle commands or into `JavaClientDesign.md`. Every documented gradle invocation still uses the old short names — `gradle :mxgateway-client:generateProto`, `gradle :mxgateway-cli:run --args=...`, `gradle :mxgateway-client:jar :mxgateway-cli:installDist` — and every one fails with `project 'mxgateway-client' not found in root project 'zb-mom-ww-mxaccessgw-java'`. A user copy-pasting from the README or design doc will hit this on the very first command.
**Recommendation:** Find-and-replace `:mxgateway-client:``:zb-mom-ww-mxgateway-client:` and `:mxgateway-cli:``:zb-mom-ww-mxgateway-cli:` across `clients/java/README.md` and `clients/java/JavaClientDesign.md`. (Roughly 17 occurrences in the README, 3 in the design doc.) Test by running one updated command end-to-end.
**Resolution:** 2026-05-24 — Replaced every stale gradle task path in `clients/java/README.md` (line 37 `:mxgateway-client:generateProto`, lines 108110 `:mxgateway-cli:run` galaxy-* invocations, lines 160161 `:mxgateway-cli:run` galaxy-watch invocations, lines 169176 `:mxgateway-cli:run` gateway-command invocations, line 186 `:mxgateway-cli:run` TLS smoke invocation, line 206 `:mxgateway-client:jar :mxgateway-cli:installDist` packaging command, line 221 `:mxgateway-cli:run` integration-check invocation) and in `clients/java/JavaClientDesign.md` (lines 195196 packaging bullets, lines 209212 "Current Build" prose) with their prefixed `:zb-mom-ww-mxgateway-client:` / `:zb-mom-ww-mxgateway-cli:` equivalents. Verified by running `gradle :zb-mom-ww-mxgateway-client:test --tests …` (now the updated documented form) end-to-end and `gradle build`, both BUILD SUCCESSFUL. Doc-only change; no production code touched.
### Client.Java-028
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Documentation & comments |
| Location | `clients/java/JavaClientDesign.md:23-27` |
| Status | Resolved |
**Description:** The build-layout block in `JavaClientDesign.md` still shows the old Java package paths `com/dohertylan/mxgateway/client/` and `com/dohertylan/mxgateway/cli/`. The actual source tree was moved to `com/zb/mom/ww/mxgateway/{client,cli}/` in commit `397d3c5`. Anyone using the design doc to locate or navigate code will look in the wrong place.
**Recommendation:** Update the layout block to reflect the new paths: `com/zb/mom/ww/mxgateway/client/` and `com/zb/mom/ww/mxgateway/cli/`. Comment-only change.
**Resolution:** 2026-05-24 — Updated the Build Layout block in `clients/java/JavaClientDesign.md:23-27` to show `com/zb/mom/ww/mxgateway/client/` and `com/zb/mom/ww/mxgateway/cli/` instead of the old `com/dohertylan/mxgateway/{client,cli}/` package paths, matching the actual on-disk source tree under `clients/java/zb-mom-ww-mxgateway-{client,cli}/src/{main,test}/java/`. Doc-only change; no production code touched.
### Client.Java-029
| Field | Value |
|---|---|
| Severity | Low |
| Category | Documentation & comments |
| Location | `clients/java/README.md:208-209` |
| Status | Resolved |
**Description:** The packaging section states "The library jar is under `zb-mom-ww-mxgateway-client/build/libs`. The installed CLI distribution is under `zb-mom-ww-mxgateway-cli/build/install/mxgateway-cli`." The library-jar path is correct, but the install-distribution path is wrong — gradle's `installDist` produces a directory whose name matches the project name, not the (now-retired) short name, so the actual path is `zb-mom-ww-mxgateway-cli/build/install/zb-mom-ww-mxgateway-cli/`. The e2e script (`scripts/run-client-e2e-tests.ps1`) uses the correct path, so the script works; only the README is wrong.
**Recommendation:** Correct the README to `zb-mom-ww-mxgateway-cli/build/install/zb-mom-ww-mxgateway-cli`. Comment-only.
**Resolution:** 2026-05-24 — Corrected the install-distribution path in `clients/java/README.md:210` from `zb-mom-ww-mxgateway-cli/build/install/mxgateway-cli` to `zb-mom-ww-mxgateway-cli/build/install/zb-mom-ww-mxgateway-cli`, matching what gradle's `installDist` actually produces (the directory name matches the subproject name). The library-jar path on the preceding line was already correct and is unchanged. Doc-only change; the e2e script `scripts/run-client-e2e-tests.ps1` was already using the correct path.
### Client.Java-030
| Field | Value |
|---|---|
| Severity | Low |
| Category | Testing coverage |
| Location | `clients/java/zb-mom-ww-mxgateway-client/src/test/java/com/zb/mom/ww/mxgateway/client/` |
| Status | Resolved |
**Description:** Commit `397d3c5` added the missing `QueryActiveAlarmsRequest` proto message and the corresponding `rpc QueryActiveAlarms` to `mxaccess_gateway.proto`. The Java client now generates the request type and the gRPC stub method, and `MxGatewayClient.queryActiveAlarms` correctly references both. No unit test exercises the new RPC end-to-end on the Java side — the proto compiles, the import resolves, the method is callable, but the absence of a `queryActiveAlarmsForwardsRequestAndStreamsSnapshots` (or similar) fixture means a serialisation regression in `QueryActiveAlarmsRequest` or the streaming reply would not surface in the Java unit tests.
**Recommendation:** Add a unit test in `MxGatewayFixtureTests` (or wherever the alarm fixtures live) that pushes a `QueryActiveAlarmsRequest` through a `FakeMxAccessGateway` and asserts the `ActiveAlarmSnapshot` stream is consumed correctly — mirror the existing `acknowledgeAlarm` test shape.
**Resolution:** 2026-05-24 — Re-triaged the recommendation: the suite uses the `InProcessGateway` + `TestGatewayService` fixture in `MxGatewayClientSessionTests` (no separate `FakeMxAccessGateway`), and there is in fact no existing `acknowledgeAlarm` test in the current tree to mirror — the alarm RPC surface has zero coverage. Added `queryActiveAlarmsForwardsRequestAndStreamsSnapshots` to `MxGatewayClientSessionTests` (the same file/fixture style as the other unary/streaming RPC tests): the test overrides `TestGatewayService.queryActiveAlarms` to capture the inbound `QueryActiveAlarmsRequest` and emit two `ActiveAlarmSnapshot` messages (one `ACTIVE`, one `ACTIVE_ACKED` with an operator/comment populated), then calls `MxGatewayClient.queryActiveAlarms` with a request carrying `session_id`, `client_correlation_id`, and `alarm_filter_prefix`. It awaits `onCompleted` via a `CountDownLatch`, closes the subscription, and asserts (a) the server observed all three inbound request fields, (b) both snapshots arrived in order with the expected `alarm_full_reference`/`current_state`/`severity`/`operator_user`, and (c) the observer never received an error. TDD red phase confirmed by temporarily changing the `session_id` assertion to `"WRONG-SESSION-ID"``gradle :zb-mom-ww-mxgateway-client:test --tests "…queryActiveAlarmsForwardsRequestAndStreamsSnapshots"` failed with `AssertionFailedError at MxGatewayClientSessionTests.java:252`. Restoring the assertion turned the build green. Full `gradle build` from `clients/java` is BUILD SUCCESSFUL.
### Client.Java-031
| Field | Value |
|---|---|
| Severity | Low |
| Category | mxaccessgw conventions |
| Location | `clients/java/README.md:13,17,26` |
| Status | Resolved |
**Description:** The README prose at lines 1326 introduces the subprojects as `mxgateway-client` and `mxgateway-cli` (the old short names) when discussing the layout. Those are no longer the actual subproject names — `settings.gradle` declares `zb-mom-ww-mxgateway-client` / `zb-mom-ww-mxgateway-cli`. The prose works as a naming mnemonic, but it confuses anyone trying to map README descriptions to actual gradle output, IDE project trees, or the e2e script.
**Recommendation:** Either (a) update the prose to the full prefixed names, or (b) clarify in a one-line note: "The subprojects are `zb-mom-ww-mxgateway-client` and `zb-mom-ww-mxgateway-cli`; this README refers to them by their short suffixes below for readability." (a) is more honest; (b) preserves readability at the cost of one extra concept.
**Resolution:** 2026-05-24 — Took option (a): updated the README layout-section prose in `clients/java/README.md:17,22,26` (the `mxgateway-client` generates… paragraph, the `mxgateway-client` exposes… paragraph, and the `mxgateway-cli` depends on `mxgateway-client`… paragraph) to use the full prefixed `zb-mom-ww-mxgateway-client` and `zb-mom-ww-mxgateway-cli` names, matching the layout block at lines 1314 and `settings.gradle`. Reflows the final paragraph slightly because the prefixed names push past the existing 80-column wrap; content is unchanged otherwise. Doc-only change.
### Client.Java-032
| Field | Value |
|---|---|
| Severity | High |
| Category | Documentation & comments |
| Location | `clients/java/README.md:182-183` |
| Status | Resolved |
**Description:** Commit `8738735` ("clients: document StreamAlarms + AcknowledgeAlarm in each README") added two new gradle invocations to the CLI Usage block:
```
gradle :zb-mom-ww-mxgateway-cli:run --args="stream-alarms --endpoint localhost:5000 --api-key-env MXGATEWAY_API_KEY --plaintext --session-id <id> --limit 1 --json"
gradle :zb-mom-ww-mxgateway-cli:run --args="acknowledge-alarm --endpoint localhost:5000 --api-key-env MXGATEWAY_API_KEY --plaintext --session-id <id> --alarm-reference \"\\Galaxy\Area001.Pump001.PumpFault\" --json"
```
Both are wrong against the actual CLI surface in `MxGatewayCli.java`:
- `StreamAlarmsCommand` (line 1060) has no `--session-id` option — the gateway's alarm feed is session-less ("Served by the gateway's always-on alarm monitor — no worker session is opened" per the SDK Javadoc at `MxGatewayClient.java:331`). picocli will reject the unknown option with a non-zero exit before the command body runs.
- `AcknowledgeAlarmCommand` (line 1135) also has no `--session-id` option, AND the option is named `--reference` (line 1136), not `--alarm-reference`. Two unknown-option errors per copy-paste.
A user copying either invocation from the README hits a picocli parse error immediately. The other clients' README updates in the same commit (commit `8738735`) likely have the same issue but are out of scope for this Java review.
**Recommendation:** Drop the `--session-id <id>` token from both documented invocations, and change `--alarm-reference` to `--reference` in the `acknowledge-alarm` line. Optionally also add `--filter-prefix` to the `stream-alarms` example so readers see the scoping option, and align README option names with the actual CLI by either renaming the CLI option `--reference``--alarm-reference` (matches the proto `alarm_full_reference` field semantically) or leaving as is and only fixing the README. Add a small `MxGatewayCliTests` parse-only assertion for both subcommands that exercises every option flag to prevent the same drift the next time the CLI surface or README is touched.
**Resolution:** 2026-05-24 — Confirmed root cause against `clients/java/zb-mom-ww-mxgateway-cli/src/main/java/com/zb/mom/ww/mxgateway/cli/MxGatewayCli.java:1174-1182,1248-1258`: `StreamAlarmsCommand` exposes only `--filter-prefix` / `--limit` and `AcknowledgeAlarmCommand` exposes `--reference` / `--comment` / `--operator` — neither has a `--session-id` option and `acknowledge-alarm` has no `--alarm-reference` option, so both documented invocations failed picocli parse at the first unknown option. Fixed `clients/java/README.md:182-183` by dropping the `--session-id <id>` token from both lines, replacing it with `--filter-prefix Galaxy` on the `stream-alarms` example so readers see the actual scoping flag, and changing `--alarm-reference` to `--reference` on the `acknowledge-alarm` example. Added `MxGatewayCli.commandLine(...)` to package-private visibility (was `private`) so the test can drive the production picocli `CommandLine` directly without executing the command body. Regression tests in `MxGatewayCliTests`: `readmeDocumentedStreamAlarmsExampleParsesCleanly` and `readmeDocumentedAcknowledgeAlarmExampleParsesCleanly` pin the exact token list documented in the README and assert `commandLine.parseArgs(...)` returns without throwing a `picocli.CommandLine.ParameterException`. TDD red phase: before the README fix the previously-documented tokens (`--session-id <id>` + `--alarm-reference ...`) would have thrown `Unknown option: '--session-id'` / `Unknown option: '--alarm-reference'` at parse time; the new tests pass against the corrected README and would fail the next time someone drifts the documented surface from the actual CLI options.
### Client.Java-033
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Correctness & logic bugs |
| Location | `clients/java/zb-mom-ww-mxgateway-cli/src/main/java/com/zb/mom/ww/mxgateway/cli/MxGatewayCli.java:1078-1098` |
| Status | Resolved |
**Description:** `StreamAlarmsCommand.call()` allocates a bounded `ArrayBlockingQueue<Object>(1024)` and the gRPC observer publishes each `AlarmFeedMessage` via `queue.offer(value)`:
```
BlockingQueue<Object> queue = new ArrayBlockingQueue<>(1024);
@Override public void onNext(AlarmFeedMessage value) { queue.offer(value); }
```
`BlockingQueue.offer` returns `false` when the queue is full and the message is dropped silently — no exception, no log, no signal to the consumer that data was lost. Combined with gRPC's default auto-inbound flow control (the same regime that Client.Java-011 documented for `MxEventStream`), any consumer stall long enough to fill 1024 slots will silently lose alarms past the boundary. The active-alarm snapshot replay at session open is exactly the burst case that pushes a moderately busy alarm feed close to that limit, and the failure mode is invisible — the CLI prints a truncated feed and exits 0.
The library-side `MxEventStream` (Client.Java-002 resolution) and `DeployEventStream` (Client.Java-021 resolution) explicitly surface overflow as `MxGatewayException` so the consumer can resubscribe with a resume cursor. The new `stream-alarms` CLI path regresses that contract.
`JavaStyleGuide.md` ("Streaming" section, line 57) explicitly prohibits this pattern: "Do not reorder, coalesce, or drop events in client code." `CLAUDE.md` ("fail-fast event backpressure") makes overflow a deliberate cancel-and-surface signal, not a silent drop.
**Recommendation:** Either (a) wrap the gRPC observer in the existing `MxEventStream`-style adaptor that calls `subscription.cancel()` and queues an exception on `queue.offer` returning `false`, then surface that exception from the drain loop — mirroring `MxEventStream.observer().onNext`'s overflow branch; or (b) reuse the library-side fail-fast plumbing by promoting `MxEventStream` (or extracting its terminal-state base) into a public `MxAlarmFeedStream` and have `MxGatewayClient.streamAlarms` return that instead of a bare subscription handle. Option (b) lines up with Client.Java-036 (deduplicate the subscription class family). Add a CLI regression test that overflows the bounded queue and asserts a non-zero exit / overflow exception, mirroring `MxGatewayMediumFindingsTests.eventStreamOverflowExceptionSurvivesASubsequentClose`.
**Resolution:** 2026-05-24 — Confirmed root cause at `MxGatewayCli.java` `StreamAlarmsCommand.call()`: the observer's `onNext` did `queue.offer(value)` and ignored the boolean return, so a 1024-element queue would silently drop messages past capacity. The same silent-drop affected the `onCompleted` branch (which `offer`s `ALARM_FEED_END`) once the queue was full, deadlocking the consumer since the drain loop never sees END. Took option (a) — minimal change that matches `MxEventStream`'s overflow branch. The fix: detect a failed `offer` inside `onNext`, call `subscription.cancel()` (via an `AtomicReference<MxGatewayAlarmFeedSubscription>` published immediately after `client.streamAlarms` returns), `queue.clear()`, then `queue.offer(IllegalStateException("stream-alarms queue overflowed (capacity 1024); consumer too slow"))` followed by `queue.offer(ALARM_FEED_END)`. The existing drain-loop `Throwable`-branch then surfaces the overflow as a thrown `IllegalStateException` from `call()`, which picocli reports as a non-zero CLI exit. Option (b) (promoting `MxEventStream` to a public alarm-feed stream) was considered and rejected for this change — it would change the public SDK surface; Client.Java-036's refactor handles deduplication at the subscription layer instead. Regression test: `MxGatewayCliTests.streamAlarmsCommandFailsFastOnQueueOverflow` — drives an `OverflowingFakeClient` whose `streamAlarms` synchronously pushes 2000 messages to the observer (exceeding the 1024 buffer), then asserts `run.exitCode() != 0`. TDD red phase confirmed deterministically: before the fix the test deadlocked (the buggy `offer` silently dropped both the overflowing alarms AND the `ALARM_FEED_END` sentinel that arrived after the queue filled, so the drain loop's `queue.take()` blocked forever); the background gradle run had to be killed with `TaskStop`. After the fix the same test exits in <1 second with the overflow exception propagating through picocli.
### Client.Java-034
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Correctness & logic bugs |
| Location | `clients/java/zb-mom-ww-mxgateway-cli/src/main/java/com/zb/mom/ww/mxgateway/cli/MxGatewayCli.java:182-198` |
| Status | Resolved |
**Description:** `BatchCommand.call()` reads one CLI invocation per stdin line and tokenises with:
```
String[] args = line.trim().split("\\s+");
int exitCode = cmd.execute(args);
```
`split("\\s+")` does no shell-quoting parsing — it just splits on whitespace runs. Any argument with embedded whitespace gets shredded mid-string. Examples that break:
- `acknowledge-alarm --comment "needs verification" --operator op1` becomes `[acknowledge-alarm, --comment, "needs, verification", --operator, op1]` — picocli sees `"needs` as the comment value and `verification"` as an unknown positional argument.
- `galaxy-watch --json --last-seen-deploy-time "2026-04-28 18:30:00Z"` (any value with a space) fails the same way.
The commit message for `71d2c39` says the protocol expects "one line of stdin = one full CLI invocation" without specifying whether quoted arguments are honoured, but the cross-language matrix (`scripts/run-client-e2e-tests.ps1` per the prior reviews) compares output across all five clients — if other clients (.NET, Go, Rust, Python) implement shell-like tokenisation, the Java CLI will diverge on any command-line value containing whitespace, and the e2e harness will fail or skip those cases inconsistently.
The current `MxGatewayCliTests` test set (`batchCommandExecutesVersionAndEmitsEorMarker`, `batchCommandEmitsEorAfterFailedCommandAndContinues`) only covers whitespace-free args, so the bug is invisible in the test suite.
**Recommendation:** Replace `line.trim().split("\\s+")` with a real shell-style tokeniser that honours single and double quotes and backslash escapes — `picocli.CommandLine.ArgumentParser` doesn't ship one, but Apache Commons Exec's `CommandLine.translateCommandline(String)`, JDK 21's `java.util.spi.ToolProvider` argument parsing, or a small hand-written state machine all work. Cross-check the .NET / Go / Rust / Python `batch` implementations in the same change so all five clients use the same tokenisation; document the contract in the protocol comment in `MxGatewayCli.java` and in `scripts/run-client-e2e-tests.ps1`. Add a CLI test that feeds `acknowledge-alarm --comment "with spaces"` through `batch` and asserts the `--comment` value reaches the gateway as `"with spaces"`.
**Resolution:** 2026-05-24 — Confirmed root cause: `BatchCommand.call()` at the per-line loop used `line.trim().split("\\s+")` which has no quote handling. Replaced with a new package-private `MxGatewayCli.tokenizeBatchLine(String)` static helper — a hand-rolled POSIX-style shell tokenizer (no new dependency added) that honours: (a) double-quoted runs `"..."` with `\\`, `\"`, and `\n` escapes inside; (b) single-quoted runs `'...'` taken literally with no escapes (POSIX rule); (c) backslash escapes for any single character outside quotes (so `needs\ verification` is one token); (d) whitespace runs outside quotes separate tokens; (e) explicit `IllegalArgumentException` on unterminated quote or trailing backslash so the batch loop surfaces it as a JSON error instead of emitting wrong args. The `BatchCommand` per-line tokenisation now calls `tokenizeBatchLine(line)` and treats an empty-array result as a blank line (skip). Behaviour for whitespace-only input is unchanged. The cross-client `batch` audit (.NET / Go / Rust / Python) is out of scope for this Java-focused finding and tracked separately. Regression tests in `MxGatewayCliTests`: (a) `batchCommandTokenisesDoubleQuotedArgumentWithEmbeddedSpaces``--comment "needs verification"` round-trips intact; (b) `batchCommandTokenisesSingleQuotedArgumentWithEmbeddedSpaces` — single-quoted variant; (c) `batchCommandTokenisesBackslashEscapedSpaceOutsideQuotes``needs\ verification` outside quotes; (d) `batchCommandPreservesEmptyQuotedArgument``""` parses to an empty-string argument; (e) `batchCommandSupportsBackslashEscapedQuoteInsideDoubleQuotes``\"inner\"` survives the inner quotes. TDD red phase confirmed: all five tests failed against the original `split("\\s+")` implementation; after the fix all five pass.
### Client.Java-035
| Field | Value |
|---|---|
| Severity | Low |
| Category | Testing coverage |
| Location | `clients/java/zb-mom-ww-mxgateway-client/src/test/java/com/zb/mom/ww/mxgateway/client/MxGatewayClientSessionTests.java` |
| Status | Resolved |
**Description:** Commit `8a0c59d` added `MxGatewayClient.streamAlarms(StreamAlarmsRequest, StreamObserver<AlarmFeedMessage>)` and a new public `MxGatewayAlarmFeedSubscription` class. No library-side test exercises either: a grep for `streamAlarms` across `zb-mom-ww-mxgateway-client/src/test/...` returns zero matches. The CLI tests (`MxGatewayCliTests.streamAlarmsCommand*`) exercise the path end-to-end, but they route through a `FakeClient.streamAlarms` override that bypasses the production `subscription.wrap(observer)` glue and the `withStreamDeadline(rawAsyncStub()).streamAlarms(...)` call. A regression to either — forgetting `.wrap(observer)`, dropping the deadline interceptor, misnaming the request — would compile and pass the CLI tests but break against a real gateway.
This is the same coverage gap pattern as Client.Java-030 (no fixture test for `QueryActiveAlarmsRequest`) which was resolved by adding `queryActiveAlarmsForwardsRequestAndStreamsSnapshots` to `MxGatewayClientSessionTests`. The new alarm-feed RPC deserves the same shape.
**Recommendation:** Add `streamAlarmsForwardsRequestAndStreamsAlarmFeedMessages` to `MxGatewayClientSessionTests` (in-process gRPC via the existing `InProcessGateway` + `TestGatewayService` fixture): override `TestGatewayService.streamAlarms` to capture the inbound `StreamAlarmsRequest` and emit one `active_alarm` snapshot, one `snapshot_complete`, and one `transition`, then complete. Call `MxGatewayClient.streamAlarms`, drain the observer via a `CountDownLatch`, and assert (a) the server observed the `alarm_filter_prefix`, (b) all three messages arrived in order with the expected payload-case, and (c) `MxGatewayAlarmFeedSubscription.cancel()` aborts the call (latch via `ServerCallStreamObserver.setOnCancelHandler`, mirroring the Client.Java-015 cancellation regression). Optionally also cover the cancel-before-beforeStart race that `MxGatewayAlarmFeedSubscription.wrap` handles, mirroring `mxEventStreamCloseBeforeBeforeStartCancelsStream`.
**Resolution:** 2026-05-24 — Confirmed the coverage gap: a grep across `zb-mom-ww-mxgateway-client/src/test/...` for `streamAlarms` returned zero matches; the CLI-only test routed through `FakeClient.streamAlarms` which bypassed both the production `subscription.wrap(observer)` and the `withStreamDeadline(rawAsyncStub()).streamAlarms(...)` gRPC call. Added `streamAlarmsForwardsRequestAndStreamsAlarmFeedMessages` to `MxGatewayClientSessionTests` in the same shape as `queryActiveAlarmsForwardsRequestAndStreamsSnapshots` (Client.Java-030 resolved this way). The test overrides `TestGatewayService.streamAlarms` to capture the inbound `StreamAlarmsRequest`, register a `serverCancelled` latch via `(ServerCallStreamObserver<AlarmFeedMessage>) responseObserver).setOnCancelHandler(...)`, then emit three messages: an `active_alarm` snapshot, a `snapshot_complete` sentinel, and a `transition`. It deliberately does NOT call `onCompleted()` so the call remains open for the cancellation assertion. The test then calls `MxGatewayClient.streamAlarms` against the in-process gateway, drains the wrapped observer via a `threeReceived` `CountDownLatch`, and asserts (a) the server observed `alarm_filter_prefix=Tank01`, (b) all three messages arrived in order with the expected payload-case (`ACTIVE_ALARM`, `SNAPSHOT_COMPLETE`, `TRANSITION`) and payload values (`Tank01.Level.HiHi`, transition kind `ACKNOWLEDGE`), and (c) `subscription.cancel()` causes the server's on-cancel handler to fire within 5 s (proves cancellation propagates through the production `subscription.wrap(observer)` glue, not just the CLI fake). TDD red phase: temporarily replaced the production `MxGatewayClient.streamAlarms` body with `withStreamDeadline(rawAsyncStub()).streamAlarms(request, observer);` (dropping the `subscription.wrap(observer)` indirection); the test failed at the `serverCancelled.await` assertion because cancellation was no longer wired to the underlying gRPC call. Restoring the production glue turned the build green.
### Client.Java-036
| Field | Value |
|---|---|
| Severity | Low |
| Category | Code organization & conventions |
| Location | `clients/java/zb-mom-ww-mxgateway-client/src/main/java/com/zb/mom/ww/mxgateway/client/MxGatewayAlarmFeedSubscription.java`, `MxGatewayEventSubscription.java`, `MxGatewayActiveAlarmsSubscription.java`, `DeployEventSubscription.java` |
| Status | Resolved |
**Description:** `MxGatewayAlarmFeedSubscription` is a structural near-copy of `MxGatewayEventSubscription` — same `AtomicReference<ClientCallStreamObserver<…>>` + `AtomicBoolean cancelled` field shape, the same `wrap(observer)` returning a `ClientResponseObserver` that stores `requestStream` in `beforeStart`, the same close-before-beforeStart race handling that Client.Java-014 originally fixed for `MxEventStream`, and the same `cancel()`+`close()` idempotency contract. The four subscription classes (`MxGatewayEventSubscription`, `MxGatewayActiveAlarmsSubscription`, `MxGatewayAlarmFeedSubscription`, `DeployEventSubscription`) are now ~60-line near-clones differing only in the request/response generic parameters and the `cancel` message string.
This is the same maintenance-hazard pattern Client.Java-009 / Client.Java-016 identified for `createChannel` / `withDeadline` / `shutdown` and which was resolved by extracting `MxGatewayChannels`. A future fix to one subscription class (e.g. propagating Client.Java-014's race fix to a future fifth subscription type, or hardening idempotent `cancel`) will silently miss the others — exactly what happened with Client.Java-021 (the `MxEventStream` terminal-state fix that didn't reach `DeployEventStream` for months).
**Recommendation:** Extract a package-private abstract base, e.g. `MxGatewayStreamSubscription<TRequest>`, holding the `AtomicReference` / `AtomicBoolean` pair, the `cancel()` / `close()` implementation, and a `ClientResponseObserver` factory parameterised by the cancel-message string and the response observer. Have all four subscription classes extend it. Behaviour-only refactor — no public API change, existing tests cover the contract.
**Resolution:** 2026-05-24 — Extracted a package-private abstract base `MxGatewayStreamSubscription<TRequest, TResponse> implements AutoCloseable` (new file `clients/java/zb-mom-ww-mxgateway-client/src/main/java/com/zb/mom/ww/mxgateway/client/MxGatewayStreamSubscription.java`). It holds the shared `AtomicReference<ClientCallStreamObserver<TRequest>>` and `AtomicBoolean cancelled` pair, the `wrap(StreamObserver<TResponse>)` factory that returns a `ClientResponseObserver` with the Client.Java-014 close-before-beforeStart fix baked in, the `cancel()` / `close()` implementation, and an immutable `cancelMessage` injected by the subclass constructor. The four prior 60-line near-clones (`MxGatewayEventSubscription`, `MxGatewayAlarmFeedSubscription`, `MxGatewayActiveAlarmsSubscription`, `DeployEventSubscription`) collapse to ~10-line subclasses that only declare their `<Request, Response>` type parameters and supply the cancel-message string to `super(...)`. Public API surface is preserved: each subclass remains a `public final class` with a public no-arg constructor (the constructor was implicit on the original classes; I made it explicit `public` on the subclasses so the existing CLI `FakeClient.streamAlarms` in a different package can still `new MxGatewayAlarmFeedSubscription()`). The `wrap(...)` method is `final` and package-private on the base — same accessibility the four subclasses had before — so production callers in `MxGatewayClient`/`GalaxyRepositoryClient` see no change. New test file `MxGatewayStreamSubscriptionContractTests` exercises the lifecycle/cancellation contract identically across all four subclasses (16 tests, four per scenario): (a) cancel-before-beforeStart eagerly cancels the stream once it attaches with the subclass-specific message, (b) cancel-after-beforeStart forwards directly to the stream, (c) `close()` delegates to `cancel()`, (d) the wrapped observer forwards `onNext`/`onError`/`onCompleted` verbatim, and a compile-time `typeBoundsCheck` helper that asserts each subclass still binds its `<Req, Resp>` parameters to the right proto types. TDD red phase confirmed: temporarily breaking one subclass's `super(...)` message to `"BROKEN MESSAGE"` made the contract test for that subclass fail with `expected: <client cancelled alarm feed> but was: <BROKEN MESSAGE>`; restoring the correct value turned all 16 contract tests green. Future fixes to the shared lifecycle now live in one place — the next Client.Java-014/021-style race fix cannot drift across the four classes.