Files
mxaccessgw/code-reviews/Client.Java/findings.md
T
Joseph Doherty d3cb311aae Resolve Client.Java-032..036: shared subscription base, batch tokenizer
Client.Java-032  README CLI examples for stream-alarms and
                 acknowledge-alarm now use the correct picocli flags
                 (--filter-prefix and --reference); two regression
                 tests parse each documented invocation.
Client.Java-033  StreamAlarmsCommand publishes an
                 AtomicReference<MxGatewayAlarmFeedSubscription> and
                 mirrors MxEventStream's overflow branch: a failed
                 queue.offer cancels the subscription, queues an
                 IllegalStateException, then queues the END sentinel
                 — preserving the fail-fast contract.
Client.Java-034  BatchCommand routes through a new
                 MxGatewayCli.tokenizeBatchLine POSIX-style shell
                 tokenizer that respects double-quoted, single-quoted,
                 and backslash-escaped arguments.
Client.Java-035  Added streamAlarmsForwardsRequestAndStreamsAlarmFeedMessages
                 to MxGatewayClientSessionTests; asserts request shape,
                 message ordering, and cancellation propagation.
Client.Java-036  Extracted MxGatewayStreamSubscription<TRequest,TResponse>
                 abstract base; the four subscription classes
                 (MxGatewayEventSubscription, MxGatewayAlarmFeedSubscription,
                 MxGatewayActiveAlarmsSubscription, DeployEventSubscription)
                 collapse to ~10-line subclasses. A new contract test
                 runs identical lifecycle / cancellation assertions
                 across all four subclasses.

All resolved at 2026-05-24; gradle build + gradle test BUILD SUCCESSFUL.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-24 09:29:27 -04:00

98 KiB
Raw Blame History

Code Review — Client.Java

Field Value
Module clients/java
Reviewer Claude Code
Review date 2026-05-24
Commit reviewed 42b0037
Status Re-reviewed
Open findings 0

Checklist coverage

A third-pass review against commit a020350 (the sweep that resolved Client.Java-013 through Client.Java-020). Prior findings are unchanged; new findings raised in this pass are numbered Client.Java-021 onward.

# Category Result
1 Correctness & logic bugs Issues found: stream-events CLI text path still prints the proto uint64 worker_sequence with %d (Client.Java-023), the same bug Client.Java-020 fixed for galaxy-watch; bench-read-bulk includes failed-call durations in its success-latency histogram (Client.Java-024), mirroring the bug Client.Rust-015 fixed in Rust.
2 mxaccessgw conventions No new issues found in this pass.
3 Concurrency & thread safety Issue found: DeployEventStream did not receive the deterministic terminal-state serialisation that Client.Java-002 added to MxEventStream, so a concurrent queue-overflow + close() race can still erase the overflow signal (Client.Java-021).
4 Error handling & resilience No new issues found in this pass.
5 Security No new issues found in this pass. The Client.Java-018 regex correctly handles colon/comma/quote/paren/URL embeddings and is verified by the existing fixture tests.
6 Performance & resource management No new issues found in this pass. shutdownTimeout is consistently honoured everywhere ownedChannel.shutdown() is called — both clients delegate to the shared MxGatewayChannels.shutdown / shutdownAndAwaitTermination helpers.
7 Design-document adherence No new issues found in this pass.
8 Code organization & conventions Issue found: the CLI CommonOptions.toClientOptions() does not propagate shutdownTimeout to the underlying MxGatewayClientOptions, so CLI users have no way to override the new option introduced by Client.Java-019 (Client.Java-025).
9 Testing coverage Issue found: there is no CLI-level test coverage for the read-bulk, write-bulk, write2-bulk, write-secured-bulk, write-secured2-bulk, or bench-read-bulk subcommands — Client.Java-013 noted this as out-of-scope but never filed a follow-up (Client.Java-026).
10 Documentation & comments Issue found: MxGatewayChannels.toCompletable Javadoc claims chained thenApply futures forward cancel() upstream to CancellingCompletableFuture, which is not true of CompletableFuture.thenApply; the implementation works only because all validator chains are inlined into the new toCompletable(source, operation, validator) overload (Client.Java-022).

2026-05-24 review (commit d692232)

Re-review pass at d692232. Diff against a020350 is commit 397d3c5: gradle subprojects renamed mxgateway-clientzb-mom-ww-mxgateway-client, mxgateway-clizb-mom-ww-mxgateway-cli. Java package change com.dohertylan.mxgateway.*com.zb.mom.ww.mxgateway.* (source directories moved from com/dohertylan/mxgateway/ to com/zb/mom/ww/mxgateway/). settings.gradle and root build.gradle updated for the new project / group names. CLI mainClass updated. The gradle build task at HEAD is green; documentation updates lag — see the doc-side findings below.

# Category Result
1 Correctness & logic bugs No issues found in the a020350..d692232 diff.
2 mxaccessgw conventions Issues found: Client.Java-031 (README prose still uses the old short project names in plain text — the actual subproject names that drive Gradle / IDE imports are the prefixed ones).
3 Concurrency & thread safety No issues found in this diff.
4 Error handling & resilience No issues found in this diff.
5 Security No issues found in this diff.
6 Performance & resource management No issues found in this diff.
7 Design-document adherence Issues found: Client.Java-028 (JavaClientDesign.md build-layout example still cites the old com/dohertylan/mxgateway/ package paths).
8 Code organization & conventions Issues found: Client.Java-030 (no new test exercises the regenerated QueryActiveAlarmsRequest RPC path).
9 Testing coverage Cross-reference to Client.Java-030; the bulk-command gaps tracked under Client.Java-026 remain.
10 Documentation & comments Issues found: Client.Java-027 (Gradle task names in README and JavaClientDesign still reference the old :mxgateway-client: and :mxgateway-cli: paths — every command in the README breaks if copy-pasted); Client.Java-029 (README.md:209 cites zb-mom-ww-mxgateway-cli/build/install/mxgateway-cli but the actual install path contains a doubled directory zb-mom-ww-mxgateway-cli/build/install/zb-mom-ww-mxgateway-cli/).

2026-05-24 re-review (commit 42b0037)

Re-review pass at 42b0037. Diff against d692232 is five commits touching clients/java: 10bd0c0 (Client.Java-027..031 fixes), 71d2c39 (port batch subcommand), f90bff0 (port bulk read/write SDK + CLI subcommands), 8a0c59d (port stream-alarms + acknowledge-alarm to the Java CLI plus a new MxGatewayClient.streamAlarms SDK method returning MxGatewayAlarmFeedSubscription), and 8738735 (README updates for the alarm surface). gradle build from clients/java is green; new findings focus on the alarm surface and the batch subcommand. Prior findings Client.Java-001..031 are unchanged.

# Category Result
1 Correctness & logic bugs Issues found: BatchCommand tokenises stdin with line.trim().split("\\s+"), so any argument with embedded whitespace (e.g. --comment "needs verification") is shredded mid-string (Client.Java-034); StreamAlarmsCommand uses queue.offer(value) on a bounded queue with no overflow detection, silently dropping alarm messages when the gateway pushes faster than the consumer drains (Client.Java-033).
2 mxaccessgw conventions Cross-reference to Client.Java-033 — silently-dropping events violates JavaStyleGuide.md ("Do not reorder, coalesce, or drop events in client code") and the fail-fast event-backpressure contract. No new finding raised separately.
3 Concurrency & thread safety No issues found in this diff. MxGatewayAlarmFeedSubscription correctly mirrors MxGatewayEventSubscription's beforeStart-vs-cancel race fix (Client.Java-014).
4 Error handling & resilience No issues found in this diff.
5 Security No issues found in this diff.
6 Performance & resource management No issues found in this diff.
7 Design-document adherence No issues found in this diff. The new streamAlarms SDK surface matches gateway.md's always-on alarm-monitor design (no worker session, multi-subscriber).
8 Code organization & conventions Issue found: MxGatewayAlarmFeedSubscription is a structural copy of MxGatewayEventSubscription (same fields, same beforeStart/cancel shape) — the subscription-class family should share a common base or helper (Client.Java-036).
9 Testing coverage Issue found: the new MxGatewayClient.streamAlarms SDK method has no library-side test in zb-mom-ww-mxgateway-client/src/test/... — only the CLI test exercises the path via a FakeClient.streamAlarms override that bypasses the production subscription.wrap(observer) glue (Client.Java-035).
10 Documentation & comments Issue found: README (clients/java/README.md:182-183) documents the new stream-alarms and acknowledge-alarm commands with --session-id <id> (neither command has that option) and acknowledge-alarm --alarm-reference … (actual flag is --reference) — every documented invocation fails at picocli parse time (Client.Java-032).

Findings

Client.Java-001

Field Value
Severity Medium
Category Security
Location clients/java/mxgateway-client/src/main/java/com/dohertylan/mxgateway/client/MxGatewaySecrets.java:30-32
Status Resolved

Description: redactApiKey preserves the leading and trailing four characters of the key. A gateway API key has the form mxgw_<key-id>_<secret>; the last four characters belong to the secret portion, so the "redacted" form leaks 4 characters of the actual secret into logs, CLI JSON output (CommonOptions.redactedJsonMap), and MxGatewayClientOptions.toString(). CLAUDE.md states API keys must never reach logs.

Recommendation: Redact the secret entirely. Show only a stable non-secret prefix (e.g. the mxgw_<key-id>_ portion) and mask everything after it, or emit a fixed mxgw_*** form. Do not echo any trailing characters of the secret.

Resolution: (2026-05-18) Confirmed against source: the old substring(0,4) + stars + substring(len-4) echoed the last four secret characters. redactApiKey now masks the secret entirely: for gateway-shaped keys it returns the non-secret mxgw_<key-id>_ prefix followed by *** (locating the secret separator as the first _ after mxgw_); any non-gateway-shaped token returns <redacted>. No leading/trailing secret characters are ever emitted. The pre-existing MxGatewayCliTests.openSessionJsonRedactsApiKey assertion that hardcoded the leaky mxgw***********cret form was corrected to assert the masked mxgw_visible_*** form. Regression tests: MxGatewayMediumFindingsTests.redactApiKeyDoesNotLeakAnyCharacterOfTheSecret, redactApiKeyForNonGatewayShapedKeyRevealsNothing, redactApiKeyStillHandlesNullAndShortInput.

Client.Java-002

Field Value
Severity Medium
Category Concurrency & thread safety
Location clients/java/mxgateway-client/src/main/java/com/dohertylan/mxgateway/client/MxEventStream.java:31,66-92
Status Resolved

Description: The next field is a plain (non-volatile) instance field, and MxEventStream exposes no thread-confinement guarantee. More concretely, a queue-overflow offer() and a close() offer(END) can interleave so the overflow exception is enqueued after END and never observed — the contract that "next() throws after overflow" is not guaranteed once close() has been called.

Recommendation: Document single-consumer-thread usage explicitly in the Javadoc, and serialise terminal state transitions (overflow vs END vs close) behind a single guarded flag so the first terminal condition wins deterministically.

Resolution: (2026-05-18) Confirmed against source: the old offer() END-branch did queue.clear(); queue.offer(END) when full, so a close() arriving after an overflow wiped the already-enqueued overflow exception, leaving the consumer with a clean end-of-stream and the overflow silently lost. Terminal transitions are now serialised through a single terminate(MxGatewayException) method guarded by a terminated flag and a terminalLock; the first terminal condition wins and a later close()/END cannot overwrite a published overflow fault. The Javadoc now explicitly documents that the iterator methods are single-consumer-only while close() is safe from any thread. Regression tests: MxGatewayMediumFindingsTests.eventStreamOverflowExceptionSurvivesASubsequentClose (deterministic) and eventStreamConcurrentOverflowAndCloseAlwaysTerminate (300-iteration race stress).

Client.Java-003

Field Value
Severity Medium
Category mxaccessgw conventions
Location clients/java/mxgateway-client/src/main/java/com/dohertylan/mxgateway/client/MxGatewayClient.java:119-140
Status Resolved

Description: OpenSessionReply carries gateway_protocol_version (proto field 8), and MxGatewayClientVersion.GATEWAY_PROTOCOL_VERSION exists so the client can reject incompatible generated-code inputs. The client never reads reply.getGatewayProtocolVersion() nor compares it against the compiled-in version. A client built against an older/newer contract issues commands blindly and fails with confusing downstream errors instead of a clear version-mismatch failure.

Recommendation: In openSession/openSessionRaw, compare reply.getGatewayProtocolVersion() with MxGatewayClientVersion.gatewayProtocolVersion() and throw a typed MxGatewayException on mismatch.

Resolution: (2026-05-18) Confirmed against source: neither openSessionRaw nor openSessionAsync read getGatewayProtocolVersion(). Added a private ensureGatewayProtocolCompatible helper, called from both openSessionRaw and openSessionAsync, that throws MxGatewayException with a clear mismatch message when the gateway reports a non-zero version differing from MxGatewayClientVersion.gatewayProtocolVersion(). A gateway that leaves the field unset (value 0, e.g. an older gateway) is accepted unchanged for backward compatibility. clients/java/README.md documents the new fail-fast check. Regression tests: MxGatewayMediumFindingsTests.openSessionRejectsIncompatibleGatewayProtocolVersion and openSessionAcceptsMatchingOrUnsetGatewayProtocolVersion.

Client.Java-004

Field Value
Severity Medium
Category Correctness & logic bugs
Location clients/java/mxgateway-client/src/main/java/com/dohertylan/mxgateway/client/MxGatewaySession.java:114-120,157-163,191-197
Status Resolved

Description: register, addItem, and addItem2 check reply.hasRegister()/hasAddItem() and otherwise fall back to reply.getReturnValue().getInt32Value(). If the gateway returns a reply with neither the typed payload nor a return_value set, the method silently returns 0 — indistinguishable from a legitimate handle of 0. This masks a contract violation rather than surfacing it.

Recommendation: If the expected typed payload is absent and no return_value is present, throw MxGatewayException (protocol violation) instead of returning 0.

Resolution: (2026-05-18) Confirmed against source: all three methods returned reply.getReturnValue().getInt32Value() (which yields 0 for an unset message field) when the typed payload was absent. Each method now guards the fallback with reply.hasReturnValue() and throws MxGatewayException describing the protocol violation when neither the typed payload nor a return_value is present. The legitimate return_value fallback is preserved. Regression tests: MxGatewayMediumFindingsTests.registerThrowsWhenReplyHasNeitherTypedPayloadNorReturnValue, addItemThrowsWhenReplyHasNeitherTypedPayloadNorReturnValue, and addItemStillHonoursReturnValueFallback.

Client.Java-005

Field Value
Severity Medium
Category Error handling & resilience
Location clients/java/mxgateway-client/src/main/java/com/dohertylan/mxgateway/client/MxGatewaySession.java:92-105
Status Resolved

Description: close() delegates to closeRaw(), which performs a network RPC. When MxGatewaySession is used in try-with-resources and the body throws, a failure inside closeSession (e.g. WORKER_UNAVAILABLE) throws from close() and replaces the original exception as the propagated throwable (the body exception becomes a suppressed exception) — a known try-with-resources footgun for I/O-performing close().

Recommendation: Either make close() swallow/log close-time failures (keeping closeRaw() for callers who want the result), or document clearly that close() performs a network call that can throw.

Resolution: (2026-05-18) Confirmed against source: close() called closeRaw() directly, so a CloseSession RPC failure propagated out of try-with-resources and replaced the body exception. close() now catches MxGatewayException from closeRaw() and logs it at WARNING via System.Logger instead of rethrowing, so a close-time failure never masks the body exception. closeRaw() is unchanged and still throws for callers who want to observe the close result. The behavior change and the recommendation to use closeRaw() for explicit close handling are documented in clients/java/README.md and the close() Javadoc. Regression tests: MxGatewayMediumFindingsTests.closeSuppressesCloseTimeFailureInsteadOfMaskingBodyException and closeRawStillSurfacesCloseTimeFailureForCallersWhoWantIt.

Client.Java-006

Field Value
Severity Low
Category Performance & resource management
Location clients/java/mxgateway-client/src/main/java/com/dohertylan/mxgateway/client/MxGatewayClient.java:323-328, clients/java/mxgateway-client/src/main/java/com/dohertylan/mxgateway/client/GalaxyRepositoryClient.java:279-284
Status Resolved

Description: close() (the AutoCloseable method invoked by try-with-resources) calls only ownedChannel.shutdown() and returns immediately without awaiting termination. In-flight calls and Netty event-loop threads may still be running when the caller assumes the resource is released. closeAndAwaitTermination() does it correctly but is not the method try-with-resources uses, and the README examples all rely on try-with-resources.

Recommendation: Have close() await termination for a bounded time and shutdownNow() on timeout (the logic already in closeAndAwaitTermination()), or document that try-with-resources callers should call closeAndAwaitTermination().

Resolution: (2026-05-18) Confirmed against source: both MxGatewayClient.close() and GalaxyRepositoryClient.close() called only ownedChannel.shutdown(). close() in both clients now performs the bounded-wait logic previously only in closeAndAwaitTermination(): it shuts the channel down, waits up to the configured connect timeout for graceful termination, and calls shutdownNow() on timeout. Because close() cannot throw a checked exception, an InterruptedException while awaiting is handled by forcibly shutting the channel down and restoring the thread interrupt flag. closeAndAwaitTermination() is retained unchanged for callers who want the checked, blocking-aware variant. clients/java/README.md documents the new try-with-resources close() semantics.

Client.Java-007

Field Value
Severity Low
Category Testing coverage
Location clients/java/mxgateway-client/src/test/java/com/dohertylan/mxgateway/client/
Status Resolved

Description: The alarm surface — acknowledgeAlarm/acknowledgeAlarmAsync/queryActiveAlarms and MxGatewayActiveAlarmsSubscription — has zero test coverage. TLS channel construction, the async streamEventsAsync path, MxGatewayEventSubscription pre-start cancellation, and MxEventStream queue overflow are likewise untested. JavaClientDesign.md explicitly lists async stream-observer cancellation and status/error mapping as required tests.

Recommendation: Add in-process gRPC tests for the alarm RPCs, the async streaming/subscription cancellation paths, and at least one TLS-config construction test.

Resolution: (2026-05-18) Confirmed against source: no test referenced acknowledgeAlarm, queryActiveAlarms, streamEventsAsync, TLS construction, or MxEventStream overflow. Added MxGatewayLowFindingsTests (12 tests) covering: acknowledgeAlarm/acknowledgeAlarmAsync (success, typed protocol-failure, async transport-failure normalisation), queryActiveAlarms observer delivery, MxGatewayActiveAlarmsSubscription and MxGatewayEventSubscription pre-start cancellation, streamEventsAsync observer delivery, MxEventStream queue overflow surfacing MxGatewayException, TLS channel construction (missing CA file rejected with a typed exception, system-trust path builds cleanly), and the Client.Java-008 async-validator normalisation. While writing the TLS test a latent bug was found: a missing/unreadable CA file makes GrpcSslContexts throw IllegalArgumentException (not SSLException), which the old catch (SSLException) let escape unwrapped — the catch in the shared channel builder was broadened to also wrap RuntimeException so callers always see one typed MxGatewayException.

Client.Java-008

Field Value
Severity Low
Category Error handling & resilience
Location clients/java/mxgateway-client/src/main/java/com/dohertylan/mxgateway/client/MxGatewayClient.java:298-304
Status Resolved

Description: acknowledgeAlarmAsync and openSessionAsync apply ensureProtocolSuccess inside thenApply. If that validator throws a non-MxGatewayException RuntimeException it is wrapped by CompletionException with no fromGrpc normalisation, unlike the synchronous paths which normalise via try/catch. The async and sync error surfaces are therefore inconsistent.

Recommendation: Wrap the thenApply body so any non-MxGatewayException is routed through MxGatewayErrors.fromGrpc, matching the synchronous methods.

Resolution: (2026-05-18) Confirmed against source: the thenApply validators in openSessionAsync, invokeAsync, and acknowledgeAlarmAsync were not normalised — in practice the gateway's own validators (ensureProtocolSuccess, ensureMxAccessSuccess, ensureGatewayProtocolCompatible) only ever throw MxGatewayException, but a stray non-MxGatewayException RuntimeException (e.g. an NPE from a malformed reply) would surface raw inside CompletionException. Added MxGatewayChannels.normalisingValidator(operation, fn): it rethrows MxGatewayException unchanged and routes any other RuntimeException through MxGatewayErrors.fromGrpc, matching the synchronous try/catch paths. All three async thenApply sites now use it. Regression test: MxGatewayLowFindingsTests.openSessionAsyncNormalisesNonGatewayRuntimeExceptionFromValidator.

Client.Java-009

Field Value
Severity Low
Category Code organization & conventions
Location clients/java/mxgateway-client/src/main/java/com/dohertylan/mxgateway/client/GalaxyRepositoryClient.java:310-391, clients/java/mxgateway-client/src/main/java/com/dohertylan/mxgateway/client/MxGatewayClient.java:346-413
Status Resolved

Description: createChannel, withDeadline, withStreamDeadline, and toCompletable are duplicated nearly verbatim across MxGatewayClient and GalaxyRepositoryClient (~80 lines). A fix to one will not propagate to the other.

Recommendation: Extract the channel-builder and future-adaptor helpers into a shared package-private utility class.

Resolution: (2026-05-18) Confirmed against source: the four helpers were duplicated near-verbatim. Added a package-private MxGatewayChannels utility class holding createChannel(options, tlsErrorPrefix), withDeadline(stub, options), withStreamDeadline(stub, options), toCompletable(future, operation), and the new normalisingValidator helper (Client.Java-008). Both MxGatewayClient and GalaxyRepositoryClient now delegate to it and their private copies were deleted, so a future fix lives in one place. Behavior is unchanged except the operation-name carried into MxGatewayErrors.fromGrpc is now the specific RPC name instead of the generic "async call"/"galaxy async call". Verified by the full existing async test suite plus the new MxGatewayLowFindingsTests.

Client.Java-010

Field Value
Severity Low
Category Documentation & comments
Location clients/java/mxgateway-client/src/main/java/com/dohertylan/mxgateway/client/MxGatewayClient.java:269-272, clients/java/README.md:76
Status Resolved

Description: The acknowledgeAlarm Javadoc states the gateway authenticates against an invoke:alarm-ack scope, and the README states the Galaxy Repository requires a metadata:read scope. CLAUDE.md's documented scope set names neither — the Javadoc/README assert a scope contract the project's own auth documentation does not corroborate.

Recommendation: Reconcile the scope names with src/MxGateway.Server/Security/ and CLAUDE.md; correct the Javadoc/README to the actual scope strings, or fix CLAUDE.md if sub-scopes were genuinely added.

Resolution: (2026-05-18) Partially re-triaged. Verified against src/MxGateway.Server/Security/Authorization/GatewayScopes.cs and GatewayGrpcScopeResolver.cs: the canonical scope catalog is session:open, session:close, invoke:read, invoke:write, invoke:secure, events:read, metadata:read, admin. (a) The README's metadata:read for the Galaxy Repository is correctTestConnectionRequest/GetLastDeployTimeRequest/DiscoverHierarchyRequest/WatchDeployEventsRequest all resolve to GatewayScopes.MetadataRead; no change needed. CLAUDE.md's prose lists only coarse scope groups, but the canonical resolver does define metadata:read. (b) The acknowledgeAlarm Javadoc's invoke:alarm-ack is wrong — no such scope exists. AcknowledgeAlarmRequest and QueryActiveAlarmsRequest are not special-cased in GatewayGrpcScopeResolver, so they fall through the _ => GatewayScopes.Admin default and require the admin scope. The Javadoc was corrected to state the admin scope; queryActiveAlarms did not assert a scope and was left unchanged. The README does not mention alarms, so no README change was required.

Client.Java-011

Field Value
Severity Low
Category Performance & resource management
Location clients/java/mxgateway-client/src/main/java/com/dohertylan/mxgateway/client/MxEventStream.java:37-63
Status Resolved

Description: The event stream relies on default gRPC auto-inbound flow control: the async stub auto-requests messages, so the server can push faster than the 16-element bounded queue drains. A momentarily slow consumer triggers queue overflow and an immediate stream-fault cancel. This is consistent with the documented fail-fast event-backpressure design, but the client never applies real flow control, so even brief consumer stalls kill the subscription.

Recommendation: Confirm fail-fast is intended (it appears to be); if so, document it on MxEventStream so callers know a slow consumer terminates the stream. Optionally expose the queue capacity or opt-in flow control.

Resolution: (2026-05-18) Confirmed fail-fast is intended — CLAUDE.md ("fail-fast event backpressure") and docs/DesignDecisions.md make a slow consumer losing its subscription a deliberate v1 design choice, so this is documentation-only, not a behavior bug. Added an explicit "Backpressure (fail-fast)" section to the MxEventStream class Javadoc explaining that the adaptor uses gRPC auto-inbound flow control with a fixed 16-element buffer and no client flow control, that a consumer stall long enough to fill the buffer triggers an overflow that cancels the subscription and surfaces an MxGatewayException, and that consumers must drain promptly and be ready to resubscribe with a resume cursor. clients/java/README.md carries the same caveat. The queue capacity was intentionally left non-configurable to keep the v1 surface aligned with the gateway design; overflow behavior is covered by MxGatewayLowFindingsTests.eventStreamQueueOverflowSurfacesExceptionFromNext.

Client.Java-012

Field Value
Severity Low
Category Correctness & logic bugs
Location clients/java/mxgateway-cli/src/main/java/com/dohertylan/mxgateway/cli/MxGatewayCli.java:667-674
Status Resolved

Description: CommonOptions.resolved() mutates this (resolvedApiKey, resolvedTimeout) and returns this, but toClientOptions() and redactedJsonMap() read those mutated fields. If redactedJsonMap() is ever called before resolved(), it silently emits empty-string defaults. The "return this after mutating" pattern is fragile and surprising.

Recommendation: Make resolved() return an immutable resolved value object, or compute resolvedApiKey/resolvedTimeout lazily in their getters so call ordering cannot produce stale output.

Resolution: (2026-05-18) Confirmed against source: resolved() populated the resolvedApiKey/resolvedTimeout mutable fields and toClientOptions()/redactedJsonMap() read them, so calling either before resolved() emitted stale empty/30s defaults. The two mutable fields were removed and replaced with side-effect-free accessor methods resolvedApiKey() and resolvedTimeout() that compute their value on each call (API key from --api-key or the --api-key-env variable; timeout via parseDuration). toClientOptions() and redactedJsonMap() now call those accessors directly, so call ordering can no longer produce stale output. resolved() is retained as a no-op returning this purely for call-site readability (common.resolved()), with its Javadoc updated to state resolution is now lazy. Pure-refactor with no runtime-behavior change for the existing call order, so no new test was added; covered by the existing MxGatewayCliTests JSON-redaction and option-parsing tests.

Client.Java-013

Field Value
Severity High
Category Testing coverage
Location clients/java/mxgateway-cli/src/test/java/com/dohertylan/mxgateway/cli/MxGatewayCliTests.java:212-304, clients/java/mxgateway-cli/src/main/java/com/dohertylan/mxgateway/cli/MxGatewayCli.java:1214-1244
Status Resolved

Description: MxGatewayCliSession in MxGatewayCli.java:1214 was extended in commit f220908 (the "bulk read/write CLI subcommands" change) with five new abstract methods — readBulk, writeBulk, write2Bulk, writeSecuredBulk, writeSecured2Bulk. The test-only FakeSession in MxGatewayCliTests.java:212 still only implements the original set (register/addItem/advise/writeRaw/subscribeBulk/unsubscribeBulk/streamEventsAfter) and is declared a concrete (non-abstract) class. A clean compile of mxgateway-cli's test source set therefore fails: a concrete implementer that omits abstract interface methods is a compile error. The stale .class files under build/classes/java/test/ predate the interface change (dated 2026-05-20 03:38 vs CLI source dated 2026-05-20 05:06), which is why the issue is not visible until the next clean build. gradle test (or any CI pipeline that does not retain incremental state) will fail to build the CLI test module. The CLAUDE.md source-update workflow row "When source code changes, build and test the affected component" was not honoured for this CLI contract change.

Recommendation: Add the five missing @Override implementations to FakeSession (stubs returning empty lists are fine — only subscribeBulk/unsubscribeBulk are exercised by the existing tests, and the new bulk subcommands have no dedicated CLI tests yet). Optionally also add at least one CLI-level test for read-bulk, write-bulk, and the bench-read-bulk subcommands to keep parity with the .NET / Go / Rust CLI smoke matrix.

Resolution: 2026-05-20 — Added the five missing @Override stubs (readBulk, writeBulk, write2Bulk, writeSecuredBulk, writeSecured2Bulk) to FakeSession in clients/java/mxgateway-cli/src/test/java/com/dohertylan/mxgateway/cli/MxGatewayCliTests.java, each returning an empty ArrayList<> to match the interface return types (List<BulkReadResult> / List<BulkWriteResult>) without throwing. Imported BulkReadResult, BulkWriteResult, WriteBulkEntry, Write2BulkEntry, WriteSecuredBulkEntry, WriteSecured2BulkEntry from mxaccess_gateway.v1.MxaccessGateway. GrpcMxGatewayCliSession in MxGatewayCli.java is the only other implementer and already provides the methods (the source change that introduced the contract added them there). Verified with gradle clean followed by gradle :mxgateway-cli:compileTestJava and gradle :mxgateway-cli:test from clients/java, both BUILD SUCCESSFUL. No new CLI-level tests for the bulk subcommands were added — that follow-up is tracked separately and out of scope for this unblock-compilation fix.

Client.Java-014

Field Value
Severity Medium
Category Concurrency & thread safety
Location clients/java/mxgateway-client/src/main/java/com/dohertylan/mxgateway/client/MxEventStream.java:59-65,117-124
Status Resolved

Description: MxEventStream.observer().beforeStart simply assigns requestStream without checking the closed flag, while close() reads requestStream after setting closed = true. If close() runs before the gRPC call has attached its ClientCallStreamObserver (a real race when callers cancel immediately after subscribing — e.g. construct, then close in a finally block when an unrelated setup step throws), then at close time requestStream is null, so stream.cancel(...) is skipped. beforeStart then fires later, stores the live requestStream, and never observes closed — the underlying gRPC call leaks open and continues delivering events to a MxEventStream whose consumer has stopped iterating. The sibling DeployEventStream.beforeStart already does the correct thing (if (closed.get()) { requestStream.cancel(...); }); the two adaptors should behave identically.

Recommendation: Mirror DeployEventStream's pattern in MxEventStream.beforeStart: after storing requestStream, check the closed flag and cancel the stream eagerly if a prior close() has already fired. Add a regression test analogous to GalaxyRepositoryClientTests.deployEventStreamCloseBeforeBeforeStartCancelsStream to lock in the behavior.

Resolution: 2026-05-20 — Mirrored DeployEventStream.beforeStart in MxEventStream.beforeStart: after storing the ClientCallStreamObserver, the observer now reads the closed flag and calls requestStream.cancel("client cancelled event stream", null) when a prior close() already fired, closing the close/beforeStart race that previously leaked the underlying gRPC call. The fix uses the existing volatile boolean closed field (already established as a happens-before publisher by close() setting it before reading requestStream); no field shape changes were needed. clients/java/README.md documents the new safe-close-before-beforeStart contract. Regression test: MxGatewayMediumFindingsTests.mxEventStreamCloseBeforeBeforeStartCancelsStream (mirrors GalaxyRepositoryClientTests.deployEventStreamCloseBeforeBeforeStartCancelsStream).

Client.Java-015

Field Value
Severity Medium
Category Concurrency & thread safety
Location clients/java/mxgateway-client/src/main/java/com/dohertylan/mxgateway/client/MxGatewayChannels.java:112-138, MxGatewayClient.java:183-191,224-232,322-329, GalaxyRepositoryClient.java:164-170,212-214
Status Resolved

Description: MxGatewayChannels.toCompletable registers a whenComplete on the local target future to forward cancellation to the source gRPC ListenableFuture. Every caller — openSessionAsync, invokeAsync, acknowledgeAlarmAsync, discoverHierarchyPageAsync, getLastDeployTimeAsync — then chains .thenApply(normalisingValidator(...)) or .thenApply(::getOk) and returns the chained future to the user. CompletableFuture.thenApply returns a new future whose cancellation does not propagate back to the source target. Cancelling the user-facing future therefore never sets target.isCancelled() == true, so source.cancel(true) is never invoked and the underlying gRPC call continues until its deadline expires. The JavaClientDesign.md "Streaming" section explicitly says "Stream cancellation should call ClientCall.cancel" — the same expectation reasonably applies to the unary *Async surface.

Recommendation: Either return target directly from each *Async method (and inline the validator into the FutureCallback.onSuccess path so no thenApply is needed), or attach the cancellation listener to the final returned future. The cleanest fix is to have MxGatewayChannels.toCompletable return a future that wraps the validator internally and registers whenComplete on the final future. Add a regression test that cancels the user-facing future and verifies the gRPC call was cancelled (e.g. via a ServerCallStreamObserver.setOnCancelHandler latch).

Resolution: 2026-05-20 — Fixed by inlining the reply validator into MxGatewayChannels.toCompletable so the user-visible future is the same future cancellation is bound to: added a new toCompletable(source, operation, validator) overload that runs the validator inside the FutureCallback.onSuccess path (normalising non-MxGatewayException RuntimeExceptions through MxGatewayErrors.fromGrpc, matching the existing synchronous try/catch). Replaced the previous whenComplete-based cancellation listener with a small CancellingCompletableFuture<T> subclass whose cancel(boolean) forwards to the source ListenableFuture.cancel(...) unconditionally, so even the no-validator overload propagates cancellation deterministically (the whenComplete listener only fired when target.isCancelled() was already true, which is exactly the case thenApply broke). Updated MxGatewayClient.openSessionAsync, MxGatewayClient.invokeAsync, MxGatewayClient.acknowledgeAlarmAsync, GalaxyRepositoryClient.testConnectionAsync, and GalaxyRepositoryClient.getLastDeployTimeAsync to use the new validator overload directly (no .thenApply chain). GalaxyRepositoryClient.discoverHierarchyAsync is paged via thenCompose, so it now publishes the current in-flight page future via an AtomicReference and returns a top-level CompletableFuture whose overridden cancel(boolean) cancels whichever page is currently outstanding. clients/java/README.md documents the new cancellation contract: cancelling any *Async future aborts the underlying gRPC call. Regression tests: MxGatewayMediumFindingsTests.invokeAsyncCancellationCancelsUnderlyingGrpcCall (full in-process gRPC test using ServerCallStreamObserver.setOnCancelHandler to latch when the server observes RPC cancellation), toCompletableValidatorOverloadForwardsCancellationToSource, and toCompletableNoValidatorOverloadForwardsCancellationToSource (unit-level proofs that both MxGatewayChannels.toCompletable overloads forward cancel(true) to the source ListenableFuture).

Client.Java-016

Field Value
Severity Low
Category Code organization & conventions
Location clients/java/mxgateway-client/src/main/java/com/dohertylan/mxgateway/client/MxGatewayClient.java:361-391, GalaxyRepositoryClient.java:285-315
Status Resolved

Description: Client.Java-009 introduced MxGatewayChannels to deduplicate createChannel, withDeadline, withStreamDeadline, and toCompletable. The two close() / closeAndAwaitTermination() methods — added shortly after to fix Client.Java-006 — were not extracted along with them. The 30-line bodies of MxGatewayClient.close() + closeAndAwaitTermination() and GalaxyRepositoryClient.close() + closeAndAwaitTermination() are now duplicated verbatim, including the awaitTermination(connectTimeout) semantic (see Client.Java-019), the InterruptedException handling, and the ownedChannel == null guard. A fix to one path (e.g. introducing a dedicated shutdownTimeout option) will silently miss the other.

Recommendation: Move the shutdown logic into MxGatewayChannels.shutdown(ManagedChannel channel, MxGatewayClientOptions options) and MxGatewayChannels.shutdownAndAwaitTermination(...). Have both clients delegate to it. Same recommendation applies to the duplicated MxGatewayAuthInterceptor construction in the two constructors (MxGatewayClient(Channel, ...) and GalaxyRepositoryClient(Channel, ...)).

Resolution: 2026-05-20 — Extracted the duplicated shutdown logic into MxGatewayChannels.shutdown(ManagedChannel, MxGatewayClientOptions) and MxGatewayChannels.shutdownAndAwaitTermination(ManagedChannel, MxGatewayClientOptions). Both helpers handle the ownedChannel == null no-op, the orderly-shutdown / awaitTermination / shutdownNow-on-timeout escalation, and the InterruptedException-restoring-the-interrupt-flag path. MxGatewayClient.close()/closeAndAwaitTermination() and GalaxyRepositoryClient.close()/closeAndAwaitTermination() are now one-liners that delegate to the shared helpers, so a future change (such as Client.Java-019's shutdownTimeout) lives in one place. Unused java.util.concurrent.TimeUnit imports were removed from both clients. The constructor-level MxGatewayAuthInterceptor duplication noted in the recommendation was left in place — it is a single intercept call per constructor (2 lines) versus the 30-line shutdown duplication that was the actual maintenance hazard. Regression tests: MxGatewayLowFindingsIITests.sharedShutdownHelperIsNoOpForNullChannel (covers the null-channel guard), shutdownAndAwaitTerminationHonoursShutdownTimeoutNotConnectTimeout, and shutdownEscalatesToShutdownNowWhenTimeoutExceeded (cover the shared shutdown semantics; the second is also the Client.Java-019 regression).

Client.Java-017

Field Value
Severity Low
Category Documentation & comments
Location clients/java/mxgateway-client/src/main/java/com/dohertylan/mxgateway/client/MxEventStream.java:25-36, clients/java/README.md:99-107
Status Resolved

Description: MxEventStream.streamEvents was recently widened from a 16-element buffer to a 1024-element buffer (MxGatewayClient.streamEvents at line 268: new MxEventStream(1024)). The class-level Javadoc on MxEventStream still says "the gateway can push events faster than the consumer drains the bounded 16-element buffer", and clients/java/README.md line 103 says "uses gRPC's default auto-inbound flow control with a fixed 16-element buffer". The fail-fast event-backpressure contract (Client.Java-011 resolution) was written against the older capacity. The MxGatewayClient.streamEvents inline comment even acknowledges the change ("A small queue overflows on any moderately active session; 1024 covers a realistic backlog"). Users of this surface will reason about realistic backpressure budgets using the wrong number.

Recommendation: Update the MxEventStream Javadoc and the README to say "1024-element buffer" (or, since the capacity is a passed parameter, document it as a parameter rather than a constant). Consider exposing the capacity through MxGatewayClientOptions so callers can tune it per session.

Resolution: 2026-05-20 — Updated the MxEventStream class Javadoc and clients/java/README.md so both say "1024-element buffer" instead of the obsolete "16-element buffer". The Javadoc also notes that capacity is a constructor parameter and that the production caller (MxGatewayClient.streamEvents) passes 1024 to absorb the session-backlog replay burst, so readers understand the value is a deliberate choice rather than a constant. Exposing the capacity through MxGatewayClientOptions was intentionally left out of scope — the v1 design keeps the event-stream surface minimal and MxGatewayClient.streamEvents is the only caller; if a tuning need arises in v2 the existing constructor already accepts the capacity.

Client.Java-018

Field Value
Severity Low
Category Security
Location clients/java/mxgateway-client/src/main/java/com/dohertylan/mxgateway/client/MxGatewaySecrets.java:54-66
Status Resolved

Description: redactCredentials(value) splits its input on \\s+ (whitespace) and only redacts whitespace-delimited tokens that start with mxgw_ or equal bearer (case-insensitive). gRPC Status.getDescription() strings, log lines, and proto error messages can carry credentials separated by colons (Bearer:mxgw_id_secret), commas (token=mxgw_id_secret,scope=...), single quotes ('mxgw_id_secret'), parentheses ((mxgw_id_secret)), or embedded in URLs/paths — all of which leave the mxgw_ token attached to a non-whitespace neighbour and survive redaction. MxGatewayErrors.fromGrpc is the primary consumer; a gateway error description like authentication failed: 'mxgw_id_secret' would round-trip the secret into the resulting MxGatewayAuthenticationException message.

Recommendation: Replace the whitespace-split scrub with a regex-based pass that matches mxgw_[A-Za-z0-9_-]+ anywhere in the string and substitutes <redacted>; also redact Bearer\s+\S+ as a unit so the token after Bearer is masked regardless of the surrounding punctuation. Cover with a fixture-style test alongside MxGatewayFixtureTests.grpcAuthErrorsAreClassifiedAndRedacted that asserts a quoted or comma-delimited credential is fully masked.

Resolution: 2026-05-20 — Replaced the whitespace-split scrub with two compiled Pattern regexes: mxgw_[A-Za-z0-9_-]+ matches any gateway-shaped credential anywhere in the string regardless of surrounding punctuation, and (?i)bearer\s+\S+ masks an authorization-header style Bearer <token> as a unit so a non-mxgw bearer token cannot leak either. The mxgw pass runs first, so the bearer pass observes Bearer <redacted> for the common combined case and renders it idempotently. Regression tests in MxGatewayFixtureTests: redactCredentialsHandlesNonWhitespaceDelimitedTokens exercises single-quoted, double-quoted, comma-delimited, colon-delimited, parenthesised, URL-embedded, and bearer-header credentials; redactCredentialsLeavesBenignContentAlone confirms strings without credentials and a null input are unchanged.

Client.Java-019

Field Value
Severity Low
Category Performance & resource management
Location clients/java/mxgateway-client/src/main/java/com/dohertylan/mxgateway/client/MxGatewayClient.java:362-391, GalaxyRepositoryClient.java:286-315
Status Resolved

Description: Both clients' close() / closeAndAwaitTermination() use options.connectTimeout() as the upper bound on awaitTermination. The connectTimeout semantically describes how long the client will wait to establish the channel, not how long it should wait for in-flight calls and the Netty event loop to drain after shutdown(). With the default 10s connect timeout, shutting down a client with a long-running unary call already in flight will silently escalate to shutdownNow() and forcibly cancel it before the call's own deadline expires, defeating the deadline contract on withDeadline. Conversely, a caller who sets a small connectTimeout (e.g. 500 ms for a health probe) inherits an aggressively short shutdown deadline they probably did not intend.

Recommendation: Introduce a dedicated shutdownTimeout on MxGatewayClientOptions (defaulting to e.g. 510 s independent of connectTimeout) and use it in close() and closeAndAwaitTermination(). Document the precedence in the Javadoc. This pairs naturally with the Client.Java-016 deduplication fix.

Resolution: 2026-05-20 — Added a dedicated shutdownTimeout Duration on MxGatewayClientOptions (builder method shutdownTimeout(Duration), accessor shutdownTimeout(), default 10 s), independent of connectTimeout. Both shared shutdown helpers introduced for Client.Java-016 (MxGatewayChannels.shutdown and shutdownAndAwaitTermination) call options.shutdownTimeout() as the awaitTermination upper bound, so a small connectTimeout (e.g. a 500 ms health-probe timeout) no longer forces a premature shutdownNow() on in-flight calls. The new option is reflected in toString() and documented on both helpers and the close()/closeAndAwaitTermination() Javadoc on both clients; clients/java/README.md notes the default and the independence from connectTimeout. Regression tests in MxGatewayLowFindingsIITests: shutdownAndAwaitTerminationHonoursShutdownTimeoutNotConnectTimeout (a 50 ms connect timeout + 1 s shutdown timeout + 200 ms graceful-termination channel never escalates to shutdownNow()), shutdownEscalatesToShutdownNowWhenTimeoutExceeded (a stuck channel beyond the shutdown timeout is forcibly shut down), and shutdownTimeoutDefaultIsTenSecondsIndependentOfConnectTimeout (the default holds even when connectTimeout is small).

Client.Java-020

Field Value
Severity Low
Category Correctness & logic bugs
Location clients/java/mxgateway-cli/src/main/java/com/dohertylan/mxgateway/cli/MxGatewayCli.java:244-254, galaxy_repository.proto:94
Status Resolved

Description: galaxy_repository.proto defines DeployEvent.sequence as uint64; the protobuf Java mapping projects that to a signed long. The CLI's text-mode galaxy-watch output prints it as "seq=%d ...", which interprets the value as signed. For genuine wraparound this is implausible (deploy sequences will not reach 2^63), but the broader pattern is brittle: any unsigned proto field printed via %d will display incorrectly past the signed boundary. The JSON path uses protoJson(event) which formats unsigned longs as numeric strings via JsonFormat, so JSON output is correct; only the text mode is at risk.

Recommendation: Print the sequence with Long.toUnsignedString(event.getSequence()) (or switch the text format to %s and pass the unsigned-string conversion). The same rule should apply to any other uint64 proto fields that surface in CLI text output.

Resolution: 2026-05-20 — Updated the galaxy-watch text-mode out.printf in MxGatewayCli.GalaxyWatchCommand.call() to use %s for the sequence field and pass Long.toUnsignedString(event.getSequence()), so deploy sequences past 2^63 render as their correct unsigned decimal string instead of a negative signed long. The JSON path through protoJson(event) was already correct (proto JsonFormat emits unsigned longs as decimal strings) and was left unchanged. An inline comment near the printf documents the unsigned-uint64 contract so the next person editing the format string knows not to switch back to %d. Regression test: MxGatewayCliTests.deployEventSequenceRendersAsUnsignedForHighUint64 exercises the format string with the max-uint64 bit pattern (-1L) and asserts the output contains seq=18446744073709551615 and does not contain seq=-1.

Client.Java-021

Field Value
Severity Medium
Category Concurrency & thread safety
Location clients/java/mxgateway-client/src/main/java/com/dohertylan/mxgateway/client/DeployEventStream.java:96-135
Status Resolved

Description: Client.Java-002 fixed a deterministic terminal-state race in MxEventStream by introducing a terminate(MxGatewayException) method, a terminalLock, and a terminated flag so a close() arriving after a queue-overflow offer() cannot wipe the overflow exception. DeployEventStream — added later and structurally a copy of MxEventStream — never received the same fix. Its current close() does closed.set(true); stream.cancel(...); offer(END);, and its offer() overflow branch does queue.clear(); queue.offer(new MxGatewayException("...queue overflowed")); queue.offer(END); (lines 117-135). With these two paths running concurrently, the same sequence Client.Java-002 documented can repeat: the overflow branch enqueues [overflowException, END], close() then calls offer(END) which sees the queue full and falls into the END branch (queue.clear(); queue.offer(value);), wiping the overflow exception and leaving a clean end-of-stream. The CLI galaxy-watch (and any WatchDeployEvents consumer) loses the overflow signal it was supposed to surface, defeating the fail-fast backpressure contract. The 16-element buffer on DeployEventStream makes overflow far less likely than on MxEventStream in practice, but the race is identical.

Recommendation: Mirror the MxEventStream fix: add a terminated flag and terminalLock, route close(), onCompleted, and the overflow branch through a single terminate(MxGatewayException) method that wins on first arrival, and add the regression analogous to MxGatewayMediumFindingsTests.eventStreamOverflowExceptionSurvivesASubsequentClose. Given the two stream classes are now structural copies of each other, consider extracting the queue/terminate plumbing into a shared base or helper so the next fix lands once.

Resolution: 2026-05-20 — Mirrored the MxEventStream terminal-state serialisation in DeployEventStream: replaced the AtomicBoolean closed field with a volatile boolean closed, added a terminalLock/terminated pair, and routed all terminal paths (close(), onCompleted(), the overflow branch in offer()) through a single private terminate(MxGatewayException fault) method guarded by synchronized (terminalLock) { if (terminated) return; terminated = true; ... }. The first terminal condition wins: an overflow that publishes [exception, END] is no longer wiped by a subsequent close()/onCompleted() that previously took the "queue full → clear + offer(END)" branch. The class-level Javadoc now documents the single-consumer-thread iterator contract and the deterministic terminal transition, matching MxEventStream. Behavior outside the terminal path is unchanged: beforeStart still resolves the close-before-beforeStart race (Client.Java-014's deploy-stream counterpart, already in place), take() still surfaces interrupts, and the request stream is still cancelled on overflow/close. Regression tests in GalaxyRepositoryClientTests: deployEventStreamOverflowExceptionSurvivesASubsequentClose (deterministic — capacity-2 stream, force overflow, then close, assert the overflow exception is surfaced) and deployEventStreamConcurrentOverflowAndCloseAlwaysTerminate (300-iteration concurrent race stress, mirrors MxGatewayMediumFindingsTests.eventStreamConcurrentOverflowAndCloseAlwaysTerminate).

Client.Java-022

Field Value
Severity Low
Category Documentation & comments
Location clients/java/mxgateway-client/src/main/java/com/dohertylan/mxgateway/client/MxGatewayChannels.java:161-172
Status Resolved

Description: The Javadoc on the no-validator toCompletable(source, operation) overload claims: "calling cancel(true) on either the direct return value or the user-facing chained future ultimately invokes source.cancel(true) (chained futures forward to the upstream stage they were derived from, which is this future)." This is not how CompletableFuture.thenApply (or thenCompose, whenComplete, etc.) actually behaves: a downstream stage's cancel() only marks that derived stage as cancelled, it does NOT propagate cancellation upstream to the originating CancellingCompletableFuture. The Client.Java-015 resolution actually fixes the bug by inlining the validator into the new toCompletable(source, operation, validator) overload (lines 224-252) so users never need a downstream stage, and by GalaxyRepositoryClient.discoverHierarchyAsync using an explicit AtomicReference-based override (which has a correct comment at line 218-221 acknowledging exactly this thenCompose limitation). The contradiction between the two adjacent comments will mislead the next maintainer who decides to add a convenience .thenApply on top of a *Async return value — they will assume cancellation still flows through and re-introduce the Client.Java-015 leak.

Recommendation: Rewrite the toCompletable Javadoc to state the actual contract: cancel(...) on the direct return value (the CancellingCompletableFuture instance) forwards to the source RPC, but cancel(...) on a thenApply/thenCompose/thenAccept of that future does not — the cancellation is captured at the derived stage and the upstream RPC continues until its deadline. Callers that need cancellation through a chained pipeline must follow the discoverHierarchyAsync pattern (custom CompletableFuture subclass tracking the current in-flight stage). The underlying CancellingCompletableFuture class doc (lines 254-258) is already correct; only the toCompletable paragraph is misleading.

Resolution: 2026-05-20 — Rewrote the toCompletable(source, operation) Javadoc in MxGatewayChannels to reflect the actual CompletableFuture contract. The doc now states unambiguously: cancelling the direct return value (the CancellingCompletableFuture) forwards to the source ListenableFuture and aborts the underlying gRPC call (the Client.Java-015 fix), but cancelling a derived thenApply/thenCompose/thenAccept/whenComplete stage of that future does NOT propagate cancellation upstream — the derived stage is marked cancelled while the source RPC continues until its deadline. The Javadoc explicitly directs callers that need cancellation through a chained pipeline to either the toCompletable(source, operation, validator) overload (which inlines the validator into the FutureCallback.onSuccess path so the user-visible future is the same future cancellation is bound to) or the GalaxyRepositoryClient.discoverHierarchyAsync AtomicReference-based pattern (for thenCompose across paged calls). The CancellingCompletableFuture class Javadoc was already correct and is unchanged. Doc-only change; no behavior change and no new test required.

Client.Java-023

Field Value
Severity Low
Category Correctness & logic bugs
Location clients/java/mxgateway-cli/src/main/java/com/dohertylan/mxgateway/cli/MxGatewayCli.java:1054, src/MxGateway.Contracts/Protos/mxaccess_gateway.proto:634
Status Resolved

Description: MxEvent.worker_sequence is a proto uint64 (line 634 of mxaccess_gateway.proto). The stream-events CLI text path prints it with %d (client.out().printf("%d %s%n", event.getWorkerSequence(), event.getFamily());), which interprets the underlying signed long value — sequences past 2^63 would render as a negative number. This is the exact same uint64-with-%d bug that Client.Java-020 fixed for the galaxy-watch DeployEvent.sequence field; the resolution's stated rule ("The same rule should apply to any other uint64 proto fields that surface in CLI text output") was never extended to this site. In practice worker sequences will not reach 2^63 so this is latent rather than active, but the same fix and the same regression-test pattern apply.

Recommendation: Replace the %d with %s plus Long.toUnsignedString(event.getWorkerSequence()) (matching the Client.Java-020 fix in GalaxyWatchCommand), and add a regression test analogous to MxGatewayCliTests.deployEventSequenceRendersAsUnsignedForHighUint64 covering the stream-events text-mode format string with -1L. The --after-worker-sequence CLI option (line 1035) is also typed as a long, which means the user cannot pass an unsigned value above 2^63 - 1 from the command line; that is a related but separate ergonomic gap worth noting in the same change.

Resolution: 2026-05-20 — Updated the stream-events text-mode client.out().printf in MxGatewayCli.StreamEventsCommand.call() to use %s for the sequence and pass Long.toUnsignedString(event.getWorkerSequence()), mirroring the Client.Java-020 fix in GalaxyWatchCommand. Worker sequences past 2^63 now render as their correct unsigned decimal string instead of a negative signed long. An inline comment near the printf documents the unsigned-uint64 contract so the next person editing the format string knows not to switch back to %d. The JSON path through protoJson(event) was already correct (proto JsonFormat emits unsigned longs as decimal strings) and is unchanged. The --after-worker-sequence long ergonomic gap is a separate v2 concern and intentionally out of scope. Regression test: MxGatewayCliTests.streamEventsWorkerSequenceRendersAsUnsignedForHighUint64 exercises the format string with the max-uint64 bit pattern (-1L) and asserts the output starts with 18446744073709551615 and does not start with -1 , mirroring deployEventSequenceRendersAsUnsignedForHighUint64.

Client.Java-024

Field Value
Severity Low
Category Correctness & logic bugs
Location clients/java/mxgateway-cli/src/main/java/com/dohertylan/mxgateway/cli/MxGatewayCli.java:855-883
Status Resolved

Description: BenchReadBulkCommand records per-call latency in latenciesNanos[latencyCount++] = elapsed; inside both the success branch (line 865) and the catch (Exception ex) failure branch (line 880). The failed-call durations are then fed into the percentileSummaryMs p50/p95/p99 calculation alongside successful calls, producing misleading latency stats when even a few transport errors occur during the bench window. Client.Rust-015 fixed exactly this pattern in clients/rust/src/bin/bench-read-bulk.rs ("stop bench-read-bulk from polluting success-latency histograms with failed-call durations"); the equivalent fix was not applied to the Java implementation. The cross-language matrix runner (scripts/run-client-e2e-tests.ps1) compares numbers across all five clients, so the Java numbers will be silently inconsistent with the Rust numbers on the same fault profile.

Recommendation: Drop the failure-branch latency record (only count failed++), or alternately maintain a separate failedLatenciesNanos array and report it as a distinct stat in the JSON output — but the success histogram must not include failed-call latencies. Cross-check the .NET, Go, and Python bench-read-bulk drivers in the same change to make sure all five clients use the same success-latency definition; the cross-language matrix is only useful if the metric is uniform.

Resolution: 2026-05-20 — Dropped the failure-branch latency record in BenchReadBulkCommand.call(): the catch (Exception ex) block no longer appends elapsed to latenciesNanos and no longer grows the array — it only increments failed++. The success-latency histogram fed into percentileSummaryMs (p50/p95/p99/max/mean) is now success-call-only, matching the Client.Rust-015 fix. The JSON output still surfaces failedCalls as a distinct top-level count so observers see fault rates separately from latency. An inline comment on the catch block documents the contract so the next maintainer doesn't reinstate the record. New CLI test MxGatewayCliTests.benchReadBulkCommandEmitsJsonSchemaKeys (added under Client.Java-026 below) covers the JSON schema produced by the corrected path. The .NET / Go / Python bench drivers were intentionally left out of scope for this Java-focused finding — that cross-client audit is its own follow-up and tracked separately.

Client.Java-025

Field Value
Severity Low
Category Code organization & conventions
Location clients/java/mxgateway-cli/src/main/java/com/dohertylan/mxgateway/cli/MxGatewayCli.java:1176-1185
Status Resolved

Description: CommonOptions.toClientOptions() populates the MxGatewayClientOptions builder with endpoint, apiKey, plaintext, caCertificatePath, serverNameOverride, and callTimeout, but never sets shutdownTimeout even though Client.Java-019 introduced it as a first-class option. CLI users therefore always inherit the 10-second default and have no way to override it from the command line, which makes the new option effectively client-library-only. CLI users running long-lived operations (a big discover-hierarchy page-chain, a streaming galaxy-watch session that needs to drain on Ctrl+C) cannot tune the shutdown deadline up; users running short health probes who want a small connectTimeout and a small shutdownTimeout to keep the CLI snappy on failure also cannot.

Recommendation: Add a --shutdown-timeout option to CommonOptions (parsed via the existing parseDuration helper, default unset → use the 10-second library default) and propagate it into toClientOptions() so the CLI surface tracks the library surface. Include the resolved value in redactedJsonMap() so --json output shows the effective shutdown deadline.

Resolution: 2026-05-20 — Added a --shutdown-timeout option to CommonOptions in MxGatewayCli.java, parsed via the existing parseDuration helper (so it accepts 10s, 500ms, ISO-8601 PT10S, etc.). A new lazy accessor resolvedShutdownTimeout() returns the parsed Duration when the user passed --shutdown-timeout, or null when unset so the MxGatewayClientOptions builder default (10s, established by Client.Java-019) applies. toClientOptions() now conditionally calls builder.shutdownTimeout(resolvedShutdownTimeout) only when the user opted in, preserving the library default for the common case. redactedJsonMap() includes the resolved value under key "shutdownTimeout" (empty string when unset) so --json output shows the effective shutdown deadline. The CLI surface now tracks the library surface so a user running a long page-chain can pass --shutdown-timeout 60s, and a user running a short health probe can pair --timeout 500ms with --shutdown-timeout 500ms to keep the CLI snappy on failure. Behavior for callers who do not pass the new flag is unchanged.

Client.Java-026

Field Value
Severity Low
Category Testing coverage
Location clients/java/mxgateway-cli/src/test/java/com/dohertylan/mxgateway/cli/MxGatewayCliTests.java
Status Resolved

Description: Client.Java-013 explicitly deferred adding CLI-level test coverage for the read-bulk, write-bulk, and bench-read-bulk subcommands ("Optionally also add at least one CLI-level test for read-bulk, write-bulk, and the bench-read-bulk subcommands to keep parity with the .NET / Go / Rust CLI smoke matrix"), and the resolution explicitly stated that "follow-up is tracked separately and out of scope for this unblock-compilation fix." That follow-up was never filed. The current MxGatewayCliTests only covers version, open-session (JSON redaction), write, smoke, subscribe-bulk, unsubscribe-bulk, and the Client.Java-020 unsigned-uint64 format string — six of the thirteen non-trivial subcommands the CLI ships are completely untested at the CLI layer (read-bulk, write-bulk, write2-bulk, write-secured-bulk, write-secured2-bulk, bench-read-bulk), as are stream-events, the four galaxy-* commands, and close-session. The FakeSession stubs all return empty lists, so an end-to-end CLI test would catch JSON-shape regressions, argument-parsing bugs, and option contract breaks that the bulk Session unit tests on the library side do not exercise. This same coverage gap is what made Client.Java-013 itself only surface on a clean Gradle build.

Recommendation: Add at least one round-trip CLI test per bulk subcommand (read-bulk, write-bulk, write2-bulk, write-secured-bulk, write-secured2-bulk) that exercises the JSON output shape and the value parser (parseValue(type, text) is shared across all five and the only write*-bulk path that catches typos in the type switch). Extending the FakeSession stubs to return at least one result row makes the assertions meaningful. The bench-read-bulk test can run with a 1-second --duration-seconds and a 0-second --warmup-seconds and assert the JSON schema keys (totalCalls, latencyMs.p50, callsPerSecond) rather than the numeric values.

Resolution: 2026-05-20 — Added round-trip CLI tests for all six bulk-family subcommands plus the new Client.Java-023 unsigned-uint64 regression to MxGatewayCliTests. The FakeSession stubs were upgraded from empty-list returns to per-call recorders that publish the parsed entries (e.g. lastWriteBulkEntries, lastReadBulkTimeoutMs) and synthesise one BulkReadResult/BulkWriteResult per requested handle so the JSON output assertions exercise the bulkReadResultMap and bulkWriteResultMap serialisers. New tests: (a) readBulkCommandForwardsTimeoutAndPrintsResults — asserts --timeout-ms 750 reaches the session and the JSON output carries the per-tag tagAddress, itemHandle, wasCached, and quality fields; (b) writeBulkCommandParsesTypedValuesAndPrintsResults — asserts --type int32 --values 111,222 --user-id 5 parses through the shared parseValue switch and the entries are constructed with the expected typed MxValue and userId; (c) write2BulkCommandForwardsTimestampAndPrintsResults — asserts the --timestamp 2026-05-20T00:00:00Z reaches the entry as a timestampValue (hasTimestampValue() is true); (d) writeSecuredBulkCommandForwardsUserIdsAndPrintsResults — asserts --current-user-id 7 --verifier-user-id 8 are both propagated; (e) writeSecured2BulkCommandForwardsTimestampAndUserIdsAndPrintsResults — combination of (c) and (d); (f) benchReadBulkCommandEmitsJsonSchemaKeys — runs the bench in a 1s steady / 0s warmup window and asserts the JSON output contains the cross-language schema keys (language=java, command=bench-read-bulk, bulkSize=2, totalCalls, successfulCalls, failedCalls, callsPerSecond, latencyMs.p50/p95/p99, tags including the synthesised TestMachine_001.TestChangingInt/TestMachine_002.TestChangingInt pair); (g) streamEventsWorkerSequenceRendersAsUnsignedForHighUint64 — Client.Java-023 regression. The recommendation's stream-events and galaxy-* CLI tests were intentionally not added in this round — they require either an in-process gateway/galaxy server or package-private MxEventStream/DeployEventStream constructor access from the CLI test module, which is its own infrastructure work; the library-side tests in GalaxyRepositoryClientTests already cover the streaming wire behaviour.

Client.Java-027

Field Value
Severity Medium
Category Documentation & comments
Location clients/java/README.md:36,107-175,185,205,220, clients/java/JavaClientDesign.md:195-211
Status Resolved

Description: Commit 397d3c5 renamed the gradle subprojects to zb-mom-ww-mxgateway-client and zb-mom-ww-mxgateway-cli in settings.gradle, but did not propagate that rename into the README's documented gradle commands or into JavaClientDesign.md. Every documented gradle invocation still uses the old short names — gradle :mxgateway-client:generateProto, gradle :mxgateway-cli:run --args=..., gradle :mxgateway-client:jar :mxgateway-cli:installDist — and every one fails with project 'mxgateway-client' not found in root project 'zb-mom-ww-mxaccessgw-java'. A user copy-pasting from the README or design doc will hit this on the very first command.

Recommendation: Find-and-replace :mxgateway-client::zb-mom-ww-mxgateway-client: and :mxgateway-cli::zb-mom-ww-mxgateway-cli: across clients/java/README.md and clients/java/JavaClientDesign.md. (Roughly 17 occurrences in the README, 3 in the design doc.) Test by running one updated command end-to-end.

Resolution: 2026-05-24 — Replaced every stale gradle task path in clients/java/README.md (line 37 :mxgateway-client:generateProto, lines 108110 :mxgateway-cli:run galaxy-* invocations, lines 160161 :mxgateway-cli:run galaxy-watch invocations, lines 169176 :mxgateway-cli:run gateway-command invocations, line 186 :mxgateway-cli:run TLS smoke invocation, line 206 :mxgateway-client:jar :mxgateway-cli:installDist packaging command, line 221 :mxgateway-cli:run integration-check invocation) and in clients/java/JavaClientDesign.md (lines 195196 packaging bullets, lines 209212 "Current Build" prose) with their prefixed :zb-mom-ww-mxgateway-client: / :zb-mom-ww-mxgateway-cli: equivalents. Verified by running gradle :zb-mom-ww-mxgateway-client:test --tests … (now the updated documented form) end-to-end and gradle build, both BUILD SUCCESSFUL. Doc-only change; no production code touched.

Client.Java-028

Field Value
Severity Medium
Category Documentation & comments
Location clients/java/JavaClientDesign.md:23-27
Status Resolved

Description: The build-layout block in JavaClientDesign.md still shows the old Java package paths com/dohertylan/mxgateway/client/ and com/dohertylan/mxgateway/cli/. The actual source tree was moved to com/zb/mom/ww/mxgateway/{client,cli}/ in commit 397d3c5. Anyone using the design doc to locate or navigate code will look in the wrong place.

Recommendation: Update the layout block to reflect the new paths: com/zb/mom/ww/mxgateway/client/ and com/zb/mom/ww/mxgateway/cli/. Comment-only change.

Resolution: 2026-05-24 — Updated the Build Layout block in clients/java/JavaClientDesign.md:23-27 to show com/zb/mom/ww/mxgateway/client/ and com/zb/mom/ww/mxgateway/cli/ instead of the old com/dohertylan/mxgateway/{client,cli}/ package paths, matching the actual on-disk source tree under clients/java/zb-mom-ww-mxgateway-{client,cli}/src/{main,test}/java/. Doc-only change; no production code touched.

Client.Java-029

Field Value
Severity Low
Category Documentation & comments
Location clients/java/README.md:208-209
Status Resolved

Description: The packaging section states "The library jar is under zb-mom-ww-mxgateway-client/build/libs. The installed CLI distribution is under zb-mom-ww-mxgateway-cli/build/install/mxgateway-cli." The library-jar path is correct, but the install-distribution path is wrong — gradle's installDist produces a directory whose name matches the project name, not the (now-retired) short name, so the actual path is zb-mom-ww-mxgateway-cli/build/install/zb-mom-ww-mxgateway-cli/. The e2e script (scripts/run-client-e2e-tests.ps1) uses the correct path, so the script works; only the README is wrong.

Recommendation: Correct the README to zb-mom-ww-mxgateway-cli/build/install/zb-mom-ww-mxgateway-cli. Comment-only.

Resolution: 2026-05-24 — Corrected the install-distribution path in clients/java/README.md:210 from zb-mom-ww-mxgateway-cli/build/install/mxgateway-cli to zb-mom-ww-mxgateway-cli/build/install/zb-mom-ww-mxgateway-cli, matching what gradle's installDist actually produces (the directory name matches the subproject name). The library-jar path on the preceding line was already correct and is unchanged. Doc-only change; the e2e script scripts/run-client-e2e-tests.ps1 was already using the correct path.

Client.Java-030

Field Value
Severity Low
Category Testing coverage
Location clients/java/zb-mom-ww-mxgateway-client/src/test/java/com/zb/mom/ww/mxgateway/client/
Status Resolved

Description: Commit 397d3c5 added the missing QueryActiveAlarmsRequest proto message and the corresponding rpc QueryActiveAlarms to mxaccess_gateway.proto. The Java client now generates the request type and the gRPC stub method, and MxGatewayClient.queryActiveAlarms correctly references both. No unit test exercises the new RPC end-to-end on the Java side — the proto compiles, the import resolves, the method is callable, but the absence of a queryActiveAlarmsForwardsRequestAndStreamsSnapshots (or similar) fixture means a serialisation regression in QueryActiveAlarmsRequest or the streaming reply would not surface in the Java unit tests.

Recommendation: Add a unit test in MxGatewayFixtureTests (or wherever the alarm fixtures live) that pushes a QueryActiveAlarmsRequest through a FakeMxAccessGateway and asserts the ActiveAlarmSnapshot stream is consumed correctly — mirror the existing acknowledgeAlarm test shape.

Resolution: 2026-05-24 — Re-triaged the recommendation: the suite uses the InProcessGateway + TestGatewayService fixture in MxGatewayClientSessionTests (no separate FakeMxAccessGateway), and there is in fact no existing acknowledgeAlarm test in the current tree to mirror — the alarm RPC surface has zero coverage. Added queryActiveAlarmsForwardsRequestAndStreamsSnapshots to MxGatewayClientSessionTests (the same file/fixture style as the other unary/streaming RPC tests): the test overrides TestGatewayService.queryActiveAlarms to capture the inbound QueryActiveAlarmsRequest and emit two ActiveAlarmSnapshot messages (one ACTIVE, one ACTIVE_ACKED with an operator/comment populated), then calls MxGatewayClient.queryActiveAlarms with a request carrying session_id, client_correlation_id, and alarm_filter_prefix. It awaits onCompleted via a CountDownLatch, closes the subscription, and asserts (a) the server observed all three inbound request fields, (b) both snapshots arrived in order with the expected alarm_full_reference/current_state/severity/operator_user, and (c) the observer never received an error. TDD red phase confirmed by temporarily changing the session_id assertion to "WRONG-SESSION-ID"gradle :zb-mom-ww-mxgateway-client:test --tests "…queryActiveAlarmsForwardsRequestAndStreamsSnapshots" failed with AssertionFailedError at MxGatewayClientSessionTests.java:252. Restoring the assertion turned the build green. Full gradle build from clients/java is BUILD SUCCESSFUL.

Client.Java-031

Field Value
Severity Low
Category mxaccessgw conventions
Location clients/java/README.md:13,17,26
Status Resolved

Description: The README prose at lines 1326 introduces the subprojects as mxgateway-client and mxgateway-cli (the old short names) when discussing the layout. Those are no longer the actual subproject names — settings.gradle declares zb-mom-ww-mxgateway-client / zb-mom-ww-mxgateway-cli. The prose works as a naming mnemonic, but it confuses anyone trying to map README descriptions to actual gradle output, IDE project trees, or the e2e script.

Recommendation: Either (a) update the prose to the full prefixed names, or (b) clarify in a one-line note: "The subprojects are zb-mom-ww-mxgateway-client and zb-mom-ww-mxgateway-cli; this README refers to them by their short suffixes below for readability." (a) is more honest; (b) preserves readability at the cost of one extra concept.

Resolution: 2026-05-24 — Took option (a): updated the README layout-section prose in clients/java/README.md:17,22,26 (the mxgateway-client generates… paragraph, the mxgateway-client exposes… paragraph, and the mxgateway-cli depends on mxgateway-client… paragraph) to use the full prefixed zb-mom-ww-mxgateway-client and zb-mom-ww-mxgateway-cli names, matching the layout block at lines 1314 and settings.gradle. Reflows the final paragraph slightly because the prefixed names push past the existing 80-column wrap; content is unchanged otherwise. Doc-only change.

Client.Java-032

Field Value
Severity High
Category Documentation & comments
Location clients/java/README.md:182-183
Status Resolved

Description: Commit 8738735 ("clients: document StreamAlarms + AcknowledgeAlarm in each README") added two new gradle invocations to the CLI Usage block:

gradle :zb-mom-ww-mxgateway-cli:run --args="stream-alarms --endpoint localhost:5000 --api-key-env MXGATEWAY_API_KEY --plaintext --session-id <id> --limit 1 --json"
gradle :zb-mom-ww-mxgateway-cli:run --args="acknowledge-alarm --endpoint localhost:5000 --api-key-env MXGATEWAY_API_KEY --plaintext --session-id <id> --alarm-reference \"\\Galaxy\Area001.Pump001.PumpFault\" --json"

Both are wrong against the actual CLI surface in MxGatewayCli.java:

  • StreamAlarmsCommand (line 1060) has no --session-id option — the gateway's alarm feed is session-less ("Served by the gateway's always-on alarm monitor — no worker session is opened" per the SDK Javadoc at MxGatewayClient.java:331). picocli will reject the unknown option with a non-zero exit before the command body runs.
  • AcknowledgeAlarmCommand (line 1135) also has no --session-id option, AND the option is named --reference (line 1136), not --alarm-reference. Two unknown-option errors per copy-paste.

A user copying either invocation from the README hits a picocli parse error immediately. The other clients' README updates in the same commit (commit 8738735) likely have the same issue but are out of scope for this Java review.

Recommendation: Drop the --session-id <id> token from both documented invocations, and change --alarm-reference to --reference in the acknowledge-alarm line. Optionally also add --filter-prefix to the stream-alarms example so readers see the scoping option, and align README option names with the actual CLI by either renaming the CLI option --reference--alarm-reference (matches the proto alarm_full_reference field semantically) or leaving as is and only fixing the README. Add a small MxGatewayCliTests parse-only assertion for both subcommands that exercises every option flag to prevent the same drift the next time the CLI surface or README is touched.

Resolution: 2026-05-24 — Confirmed root cause against clients/java/zb-mom-ww-mxgateway-cli/src/main/java/com/zb/mom/ww/mxgateway/cli/MxGatewayCli.java:1174-1182,1248-1258: StreamAlarmsCommand exposes only --filter-prefix / --limit and AcknowledgeAlarmCommand exposes --reference / --comment / --operator — neither has a --session-id option and acknowledge-alarm has no --alarm-reference option, so both documented invocations failed picocli parse at the first unknown option. Fixed clients/java/README.md:182-183 by dropping the --session-id <id> token from both lines, replacing it with --filter-prefix Galaxy on the stream-alarms example so readers see the actual scoping flag, and changing --alarm-reference to --reference on the acknowledge-alarm example. Added MxGatewayCli.commandLine(...) to package-private visibility (was private) so the test can drive the production picocli CommandLine directly without executing the command body. Regression tests in MxGatewayCliTests: readmeDocumentedStreamAlarmsExampleParsesCleanly and readmeDocumentedAcknowledgeAlarmExampleParsesCleanly pin the exact token list documented in the README and assert commandLine.parseArgs(...) returns without throwing a picocli.CommandLine.ParameterException. TDD red phase: before the README fix the previously-documented tokens (--session-id <id> + --alarm-reference ...) would have thrown Unknown option: '--session-id' / Unknown option: '--alarm-reference' at parse time; the new tests pass against the corrected README and would fail the next time someone drifts the documented surface from the actual CLI options.

Client.Java-033

Field Value
Severity Medium
Category Correctness & logic bugs
Location clients/java/zb-mom-ww-mxgateway-cli/src/main/java/com/zb/mom/ww/mxgateway/cli/MxGatewayCli.java:1078-1098
Status Resolved

Description: StreamAlarmsCommand.call() allocates a bounded ArrayBlockingQueue<Object>(1024) and the gRPC observer publishes each AlarmFeedMessage via queue.offer(value):

BlockingQueue<Object> queue = new ArrayBlockingQueue<>(1024);
…
@Override public void onNext(AlarmFeedMessage value) { queue.offer(value); }

BlockingQueue.offer returns false when the queue is full and the message is dropped silently — no exception, no log, no signal to the consumer that data was lost. Combined with gRPC's default auto-inbound flow control (the same regime that Client.Java-011 documented for MxEventStream), any consumer stall long enough to fill 1024 slots will silently lose alarms past the boundary. The active-alarm snapshot replay at session open is exactly the burst case that pushes a moderately busy alarm feed close to that limit, and the failure mode is invisible — the CLI prints a truncated feed and exits 0.

The library-side MxEventStream (Client.Java-002 resolution) and DeployEventStream (Client.Java-021 resolution) explicitly surface overflow as MxGatewayException so the consumer can resubscribe with a resume cursor. The new stream-alarms CLI path regresses that contract.

JavaStyleGuide.md ("Streaming" section, line 57) explicitly prohibits this pattern: "Do not reorder, coalesce, or drop events in client code." CLAUDE.md ("fail-fast event backpressure") makes overflow a deliberate cancel-and-surface signal, not a silent drop.

Recommendation: Either (a) wrap the gRPC observer in the existing MxEventStream-style adaptor that calls subscription.cancel() and queues an exception on queue.offer returning false, then surface that exception from the drain loop — mirroring MxEventStream.observer().onNext's overflow branch; or (b) reuse the library-side fail-fast plumbing by promoting MxEventStream (or extracting its terminal-state base) into a public MxAlarmFeedStream and have MxGatewayClient.streamAlarms return that instead of a bare subscription handle. Option (b) lines up with Client.Java-036 (deduplicate the subscription class family). Add a CLI regression test that overflows the bounded queue and asserts a non-zero exit / overflow exception, mirroring MxGatewayMediumFindingsTests.eventStreamOverflowExceptionSurvivesASubsequentClose.

Resolution: 2026-05-24 — Confirmed root cause at MxGatewayCli.java StreamAlarmsCommand.call(): the observer's onNext did queue.offer(value) and ignored the boolean return, so a 1024-element queue would silently drop messages past capacity. The same silent-drop affected the onCompleted branch (which offers ALARM_FEED_END) once the queue was full, deadlocking the consumer since the drain loop never sees END. Took option (a) — minimal change that matches MxEventStream's overflow branch. The fix: detect a failed offer inside onNext, call subscription.cancel() (via an AtomicReference<MxGatewayAlarmFeedSubscription> published immediately after client.streamAlarms returns), queue.clear(), then queue.offer(IllegalStateException("stream-alarms queue overflowed (capacity 1024); consumer too slow")) followed by queue.offer(ALARM_FEED_END). The existing drain-loop Throwable-branch then surfaces the overflow as a thrown IllegalStateException from call(), which picocli reports as a non-zero CLI exit. Option (b) (promoting MxEventStream to a public alarm-feed stream) was considered and rejected for this change — it would change the public SDK surface; Client.Java-036's refactor handles deduplication at the subscription layer instead. Regression test: MxGatewayCliTests.streamAlarmsCommandFailsFastOnQueueOverflow — drives an OverflowingFakeClient whose streamAlarms synchronously pushes 2000 messages to the observer (exceeding the 1024 buffer), then asserts run.exitCode() != 0. TDD red phase confirmed deterministically: before the fix the test deadlocked (the buggy offer silently dropped both the overflowing alarms AND the ALARM_FEED_END sentinel that arrived after the queue filled, so the drain loop's queue.take() blocked forever); the background gradle run had to be killed with TaskStop. After the fix the same test exits in <1 second with the overflow exception propagating through picocli.

Client.Java-034

Field Value
Severity Medium
Category Correctness & logic bugs
Location clients/java/zb-mom-ww-mxgateway-cli/src/main/java/com/zb/mom/ww/mxgateway/cli/MxGatewayCli.java:182-198
Status Resolved

Description: BatchCommand.call() reads one CLI invocation per stdin line and tokenises with:

String[] args = line.trim().split("\\s+");
…
int exitCode = cmd.execute(args);

split("\\s+") does no shell-quoting parsing — it just splits on whitespace runs. Any argument with embedded whitespace gets shredded mid-string. Examples that break:

  • acknowledge-alarm --comment "needs verification" --operator op1 becomes [acknowledge-alarm, --comment, "needs, verification", --operator, op1] — picocli sees "needs as the comment value and verification" as an unknown positional argument.
  • galaxy-watch --json --last-seen-deploy-time "2026-04-28 18:30:00Z" (any value with a space) fails the same way.

The commit message for 71d2c39 says the protocol expects "one line of stdin = one full CLI invocation" without specifying whether quoted arguments are honoured, but the cross-language matrix (scripts/run-client-e2e-tests.ps1 per the prior reviews) compares output across all five clients — if other clients (.NET, Go, Rust, Python) implement shell-like tokenisation, the Java CLI will diverge on any command-line value containing whitespace, and the e2e harness will fail or skip those cases inconsistently.

The current MxGatewayCliTests test set (batchCommandExecutesVersionAndEmitsEorMarker, batchCommandEmitsEorAfterFailedCommandAndContinues) only covers whitespace-free args, so the bug is invisible in the test suite.

Recommendation: Replace line.trim().split("\\s+") with a real shell-style tokeniser that honours single and double quotes and backslash escapes — picocli.CommandLine.ArgumentParser doesn't ship one, but Apache Commons Exec's CommandLine.translateCommandline(String), JDK 21's java.util.spi.ToolProvider argument parsing, or a small hand-written state machine all work. Cross-check the .NET / Go / Rust / Python batch implementations in the same change so all five clients use the same tokenisation; document the contract in the protocol comment in MxGatewayCli.java and in scripts/run-client-e2e-tests.ps1. Add a CLI test that feeds acknowledge-alarm --comment "with spaces" through batch and asserts the --comment value reaches the gateway as "with spaces".

Resolution: 2026-05-24 — Confirmed root cause: BatchCommand.call() at the per-line loop used line.trim().split("\\s+") which has no quote handling. Replaced with a new package-private MxGatewayCli.tokenizeBatchLine(String) static helper — a hand-rolled POSIX-style shell tokenizer (no new dependency added) that honours: (a) double-quoted runs "..." with \\, \", and \n escapes inside; (b) single-quoted runs '...' taken literally with no escapes (POSIX rule); (c) backslash escapes for any single character outside quotes (so needs\ verification is one token); (d) whitespace runs outside quotes separate tokens; (e) explicit IllegalArgumentException on unterminated quote or trailing backslash so the batch loop surfaces it as a JSON error instead of emitting wrong args. The BatchCommand per-line tokenisation now calls tokenizeBatchLine(line) and treats an empty-array result as a blank line (skip). Behaviour for whitespace-only input is unchanged. The cross-client batch audit (.NET / Go / Rust / Python) is out of scope for this Java-focused finding and tracked separately. Regression tests in MxGatewayCliTests: (a) batchCommandTokenisesDoubleQuotedArgumentWithEmbeddedSpaces--comment "needs verification" round-trips intact; (b) batchCommandTokenisesSingleQuotedArgumentWithEmbeddedSpaces — single-quoted variant; (c) batchCommandTokenisesBackslashEscapedSpaceOutsideQuotesneeds\ verification outside quotes; (d) batchCommandPreservesEmptyQuotedArgument"" parses to an empty-string argument; (e) batchCommandSupportsBackslashEscapedQuoteInsideDoubleQuotes\"inner\" survives the inner quotes. TDD red phase confirmed: all five tests failed against the original split("\\s+") implementation; after the fix all five pass.

Client.Java-035

Field Value
Severity Low
Category Testing coverage
Location clients/java/zb-mom-ww-mxgateway-client/src/test/java/com/zb/mom/ww/mxgateway/client/MxGatewayClientSessionTests.java
Status Resolved

Description: Commit 8a0c59d added MxGatewayClient.streamAlarms(StreamAlarmsRequest, StreamObserver<AlarmFeedMessage>) and a new public MxGatewayAlarmFeedSubscription class. No library-side test exercises either: a grep for streamAlarms across zb-mom-ww-mxgateway-client/src/test/... returns zero matches. The CLI tests (MxGatewayCliTests.streamAlarmsCommand*) exercise the path end-to-end, but they route through a FakeClient.streamAlarms override that bypasses the production subscription.wrap(observer) glue and the withStreamDeadline(rawAsyncStub()).streamAlarms(...) call. A regression to either — forgetting .wrap(observer), dropping the deadline interceptor, misnaming the request — would compile and pass the CLI tests but break against a real gateway.

This is the same coverage gap pattern as Client.Java-030 (no fixture test for QueryActiveAlarmsRequest) which was resolved by adding queryActiveAlarmsForwardsRequestAndStreamsSnapshots to MxGatewayClientSessionTests. The new alarm-feed RPC deserves the same shape.

Recommendation: Add streamAlarmsForwardsRequestAndStreamsAlarmFeedMessages to MxGatewayClientSessionTests (in-process gRPC via the existing InProcessGateway + TestGatewayService fixture): override TestGatewayService.streamAlarms to capture the inbound StreamAlarmsRequest and emit one active_alarm snapshot, one snapshot_complete, and one transition, then complete. Call MxGatewayClient.streamAlarms, drain the observer via a CountDownLatch, and assert (a) the server observed the alarm_filter_prefix, (b) all three messages arrived in order with the expected payload-case, and (c) MxGatewayAlarmFeedSubscription.cancel() aborts the call (latch via ServerCallStreamObserver.setOnCancelHandler, mirroring the Client.Java-015 cancellation regression). Optionally also cover the cancel-before-beforeStart race that MxGatewayAlarmFeedSubscription.wrap handles, mirroring mxEventStreamCloseBeforeBeforeStartCancelsStream.

Resolution: 2026-05-24 — Confirmed the coverage gap: a grep across zb-mom-ww-mxgateway-client/src/test/... for streamAlarms returned zero matches; the CLI-only test routed through FakeClient.streamAlarms which bypassed both the production subscription.wrap(observer) and the withStreamDeadline(rawAsyncStub()).streamAlarms(...) gRPC call. Added streamAlarmsForwardsRequestAndStreamsAlarmFeedMessages to MxGatewayClientSessionTests in the same shape as queryActiveAlarmsForwardsRequestAndStreamsSnapshots (Client.Java-030 resolved this way). The test overrides TestGatewayService.streamAlarms to capture the inbound StreamAlarmsRequest, register a serverCancelled latch via (ServerCallStreamObserver<AlarmFeedMessage>) responseObserver).setOnCancelHandler(...), then emit three messages: an active_alarm snapshot, a snapshot_complete sentinel, and a transition. It deliberately does NOT call onCompleted() so the call remains open for the cancellation assertion. The test then calls MxGatewayClient.streamAlarms against the in-process gateway, drains the wrapped observer via a threeReceived CountDownLatch, and asserts (a) the server observed alarm_filter_prefix=Tank01, (b) all three messages arrived in order with the expected payload-case (ACTIVE_ALARM, SNAPSHOT_COMPLETE, TRANSITION) and payload values (Tank01.Level.HiHi, transition kind ACKNOWLEDGE), and (c) subscription.cancel() causes the server's on-cancel handler to fire within 5 s (proves cancellation propagates through the production subscription.wrap(observer) glue, not just the CLI fake). TDD red phase: temporarily replaced the production MxGatewayClient.streamAlarms body with withStreamDeadline(rawAsyncStub()).streamAlarms(request, observer); (dropping the subscription.wrap(observer) indirection); the test failed at the serverCancelled.await assertion because cancellation was no longer wired to the underlying gRPC call. Restoring the production glue turned the build green.

Client.Java-036

Field Value
Severity Low
Category Code organization & conventions
Location clients/java/zb-mom-ww-mxgateway-client/src/main/java/com/zb/mom/ww/mxgateway/client/MxGatewayAlarmFeedSubscription.java, MxGatewayEventSubscription.java, MxGatewayActiveAlarmsSubscription.java, DeployEventSubscription.java
Status Resolved

Description: MxGatewayAlarmFeedSubscription is a structural near-copy of MxGatewayEventSubscription — same AtomicReference<ClientCallStreamObserver<…>> + AtomicBoolean cancelled field shape, the same wrap(observer) returning a ClientResponseObserver that stores requestStream in beforeStart, the same close-before-beforeStart race handling that Client.Java-014 originally fixed for MxEventStream, and the same cancel()+close() idempotency contract. The four subscription classes (MxGatewayEventSubscription, MxGatewayActiveAlarmsSubscription, MxGatewayAlarmFeedSubscription, DeployEventSubscription) are now ~60-line near-clones differing only in the request/response generic parameters and the cancel message string.

This is the same maintenance-hazard pattern Client.Java-009 / Client.Java-016 identified for createChannel / withDeadline / shutdown and which was resolved by extracting MxGatewayChannels. A future fix to one subscription class (e.g. propagating Client.Java-014's race fix to a future fifth subscription type, or hardening idempotent cancel) will silently miss the others — exactly what happened with Client.Java-021 (the MxEventStream terminal-state fix that didn't reach DeployEventStream for months).

Recommendation: Extract a package-private abstract base, e.g. MxGatewayStreamSubscription<TRequest>, holding the AtomicReference / AtomicBoolean pair, the cancel() / close() implementation, and a ClientResponseObserver factory parameterised by the cancel-message string and the response observer. Have all four subscription classes extend it. Behaviour-only refactor — no public API change, existing tests cover the contract.

Resolution: 2026-05-24 — Extracted a package-private abstract base MxGatewayStreamSubscription<TRequest, TResponse> implements AutoCloseable (new file clients/java/zb-mom-ww-mxgateway-client/src/main/java/com/zb/mom/ww/mxgateway/client/MxGatewayStreamSubscription.java). It holds the shared AtomicReference<ClientCallStreamObserver<TRequest>> and AtomicBoolean cancelled pair, the wrap(StreamObserver<TResponse>) factory that returns a ClientResponseObserver with the Client.Java-014 close-before-beforeStart fix baked in, the cancel() / close() implementation, and an immutable cancelMessage injected by the subclass constructor. The four prior 60-line near-clones (MxGatewayEventSubscription, MxGatewayAlarmFeedSubscription, MxGatewayActiveAlarmsSubscription, DeployEventSubscription) collapse to ~10-line subclasses that only declare their <Request, Response> type parameters and supply the cancel-message string to super(...). Public API surface is preserved: each subclass remains a public final class with a public no-arg constructor (the constructor was implicit on the original classes; I made it explicit public on the subclasses so the existing CLI FakeClient.streamAlarms in a different package can still new MxGatewayAlarmFeedSubscription()). The wrap(...) method is final and package-private on the base — same accessibility the four subclasses had before — so production callers in MxGatewayClient/GalaxyRepositoryClient see no change. New test file MxGatewayStreamSubscriptionContractTests exercises the lifecycle/cancellation contract identically across all four subclasses (16 tests, four per scenario): (a) cancel-before-beforeStart eagerly cancels the stream once it attaches with the subclass-specific message, (b) cancel-after-beforeStart forwards directly to the stream, (c) close() delegates to cancel(), (d) the wrapped observer forwards onNext/onError/onCompleted verbatim, and a compile-time typeBoundsCheck helper that asserts each subclass still binds its <Req, Resp> parameters to the right proto types. TDD red phase confirmed: temporarily breaking one subclass's super(...) message to "BROKEN MESSAGE" made the contract test for that subclass fail with expected: <client cancelled alarm feed> but was: <BROKEN MESSAGE>; restoring the correct value turned all 16 contract tests green. Future fixes to the shared lifecycle now live in one place — the next Client.Java-014/021-style race fix cannot drift across the four classes.