Client.Java-027 (Documentation): Updated 17 Gradle task references in
clients/java/README.md (lines 37, 108-110, 160-161, 169-176, 186, 206,
221) and 3 in clients/java/JavaClientDesign.md from the retired short
subproject names to the canonical zb-mom-ww-mxgateway-client /
zb-mom-ww-mxgateway-cli names. Copy-pasting any documented command now
matches the subproject names declared in settings.gradle.
Client.Java-028 (Design adherence): Build-layout block in
JavaClientDesign.md lines 23-27 updated to show the actual package
paths com/zb/mom/ww/mxgateway/{client,cli}/ instead of the retired
com/dohertylan/mxgateway/{client,cli}/ paths.
Client.Java-029 (Documentation): README.md line 210 corrected from
"zb-mom-ww-mxgateway-cli/build/install/mxgateway-cli" to
"zb-mom-ww-mxgateway-cli/build/install/zb-mom-ww-mxgateway-cli" — Gradle
installDist produces a directory whose name matches the project name,
not the short suffix. The e2e script already used the correct path.
Client.Java-030 (Testing coverage): Added
queryActiveAlarmsForwardsRequestAndStreamsSnapshots to
MxGatewayClientSessionTests. The test pushes a QueryActiveAlarmsRequest
carrying session_id / client_correlation_id / alarm_filter_prefix
through an InProcessGateway + TestGatewayService and asserts the server
observed all three request fields, two ActiveAlarmSnapshots stream in
order, and onError is never called. TDD red→green confirmed via a
deliberately-wrong session_id assertion. The re-triage note in
Client.Java-030's resolution clarifies that the finding's reference to
"the existing acknowledgeAlarm test" was aspirational — the alarm RPC
surface had zero coverage before this commit.
Client.Java-031 (Conventions): README.md prose lines 17, 22, 26 updated
to use the canonical zb-mom-ww-mxgateway-client / zb-mom-ww-mxgateway-cli
names so the layout description matches Gradle / IDE project names.
Verification: gradle build BUILD SUCCESSFUL; all Java unit tests pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
74 KiB
Code Review — Client.Java
| Field | Value |
|---|---|
| Module | clients/java |
| Reviewer | Claude Code |
| Review date | 2026-05-24 |
| Commit reviewed | d692232 |
| Status | Reviewed |
| Open findings | 5 |
Checklist coverage
A third-pass review against commit a020350 (the sweep that resolved
Client.Java-013 through Client.Java-020). Prior findings are unchanged; new
findings raised in this pass are numbered Client.Java-021 onward.
| # | Category | Result |
|---|---|---|
| 1 | Correctness & logic bugs | Issues found: stream-events CLI text path still prints the proto uint64 worker_sequence with %d (Client.Java-023), the same bug Client.Java-020 fixed for galaxy-watch; bench-read-bulk includes failed-call durations in its success-latency histogram (Client.Java-024), mirroring the bug Client.Rust-015 fixed in Rust. |
| 2 | mxaccessgw conventions | No new issues found in this pass. |
| 3 | Concurrency & thread safety | Issue found: DeployEventStream did not receive the deterministic terminal-state serialisation that Client.Java-002 added to MxEventStream, so a concurrent queue-overflow + close() race can still erase the overflow signal (Client.Java-021). |
| 4 | Error handling & resilience | No new issues found in this pass. |
| 5 | Security | No new issues found in this pass. The Client.Java-018 regex correctly handles colon/comma/quote/paren/URL embeddings and is verified by the existing fixture tests. |
| 6 | Performance & resource management | No new issues found in this pass. shutdownTimeout is consistently honoured everywhere ownedChannel.shutdown() is called — both clients delegate to the shared MxGatewayChannels.shutdown / shutdownAndAwaitTermination helpers. |
| 7 | Design-document adherence | No new issues found in this pass. |
| 8 | Code organization & conventions | Issue found: the CLI CommonOptions.toClientOptions() does not propagate shutdownTimeout to the underlying MxGatewayClientOptions, so CLI users have no way to override the new option introduced by Client.Java-019 (Client.Java-025). |
| 9 | Testing coverage | Issue found: there is no CLI-level test coverage for the read-bulk, write-bulk, write2-bulk, write-secured-bulk, write-secured2-bulk, or bench-read-bulk subcommands — Client.Java-013 noted this as out-of-scope but never filed a follow-up (Client.Java-026). |
| 10 | Documentation & comments | Issue found: MxGatewayChannels.toCompletable Javadoc claims chained thenApply futures forward cancel() upstream to CancellingCompletableFuture, which is not true of CompletableFuture.thenApply; the implementation works only because all validator chains are inlined into the new toCompletable(source, operation, validator) overload (Client.Java-022). |
2026-05-24 review (commit d692232)
Re-review pass at d692232. Diff against a020350 is commit 397d3c5:
gradle subprojects renamed mxgateway-client → zb-mom-ww-mxgateway-client,
mxgateway-cli → zb-mom-ww-mxgateway-cli. Java package change
com.dohertylan.mxgateway.* → com.zb.mom.ww.mxgateway.* (source directories
moved from com/dohertylan/mxgateway/ to com/zb/mom/ww/mxgateway/).
settings.gradle and root build.gradle updated for the new project /
group names. CLI mainClass updated. The gradle build task at HEAD is green;
documentation updates lag — see the doc-side findings below.
| # | Category | Result |
|---|---|---|
| 1 | Correctness & logic bugs | No issues found in the a020350..d692232 diff. |
| 2 | mxaccessgw conventions | Issues found: Client.Java-031 (README prose still uses the old short project names in plain text — the actual subproject names that drive Gradle / IDE imports are the prefixed ones). |
| 3 | Concurrency & thread safety | No issues found in this diff. |
| 4 | Error handling & resilience | No issues found in this diff. |
| 5 | Security | No issues found in this diff. |
| 6 | Performance & resource management | No issues found in this diff. |
| 7 | Design-document adherence | Issues found: Client.Java-028 (JavaClientDesign.md build-layout example still cites the old com/dohertylan/mxgateway/ package paths). |
| 8 | Code organization & conventions | Issues found: Client.Java-030 (no new test exercises the regenerated QueryActiveAlarmsRequest RPC path). |
| 9 | Testing coverage | Cross-reference to Client.Java-030; the bulk-command gaps tracked under Client.Java-026 remain. |
| 10 | Documentation & comments | Issues found: Client.Java-027 (Gradle task names in README and JavaClientDesign still reference the old :mxgateway-client: and :mxgateway-cli: paths — every command in the README breaks if copy-pasted); Client.Java-029 (README.md:209 cites zb-mom-ww-mxgateway-cli/build/install/mxgateway-cli but the actual install path contains a doubled directory zb-mom-ww-mxgateway-cli/build/install/zb-mom-ww-mxgateway-cli/). |
Findings
Client.Java-001
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Security |
| Location | clients/java/mxgateway-client/src/main/java/com/dohertylan/mxgateway/client/MxGatewaySecrets.java:30-32 |
| Status | Resolved |
Description: redactApiKey preserves the leading and trailing four characters of the key. A gateway API key has the form mxgw_<key-id>_<secret>; the last four characters belong to the secret portion, so the "redacted" form leaks 4 characters of the actual secret into logs, CLI JSON output (CommonOptions.redactedJsonMap), and MxGatewayClientOptions.toString(). CLAUDE.md states API keys must never reach logs.
Recommendation: Redact the secret entirely. Show only a stable non-secret prefix (e.g. the mxgw_<key-id>_ portion) and mask everything after it, or emit a fixed mxgw_*** form. Do not echo any trailing characters of the secret.
Resolution: (2026-05-18) Confirmed against source: the old substring(0,4) + stars + substring(len-4) echoed the last four secret characters. redactApiKey now masks the secret entirely: for gateway-shaped keys it returns the non-secret mxgw_<key-id>_ prefix followed by *** (locating the secret separator as the first _ after mxgw_); any non-gateway-shaped token returns <redacted>. No leading/trailing secret characters are ever emitted. The pre-existing MxGatewayCliTests.openSessionJsonRedactsApiKey assertion that hardcoded the leaky mxgw***********cret form was corrected to assert the masked mxgw_visible_*** form. Regression tests: MxGatewayMediumFindingsTests.redactApiKeyDoesNotLeakAnyCharacterOfTheSecret, redactApiKeyForNonGatewayShapedKeyRevealsNothing, redactApiKeyStillHandlesNullAndShortInput.
Client.Java-002
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Concurrency & thread safety |
| Location | clients/java/mxgateway-client/src/main/java/com/dohertylan/mxgateway/client/MxEventStream.java:31,66-92 |
| Status | Resolved |
Description: The next field is a plain (non-volatile) instance field, and MxEventStream exposes no thread-confinement guarantee. More concretely, a queue-overflow offer() and a close() offer(END) can interleave so the overflow exception is enqueued after END and never observed — the contract that "next() throws after overflow" is not guaranteed once close() has been called.
Recommendation: Document single-consumer-thread usage explicitly in the Javadoc, and serialise terminal state transitions (overflow vs END vs close) behind a single guarded flag so the first terminal condition wins deterministically.
Resolution: (2026-05-18) Confirmed against source: the old offer() END-branch did queue.clear(); queue.offer(END) when full, so a close() arriving after an overflow wiped the already-enqueued overflow exception, leaving the consumer with a clean end-of-stream and the overflow silently lost. Terminal transitions are now serialised through a single terminate(MxGatewayException) method guarded by a terminated flag and a terminalLock; the first terminal condition wins and a later close()/END cannot overwrite a published overflow fault. The Javadoc now explicitly documents that the iterator methods are single-consumer-only while close() is safe from any thread. Regression tests: MxGatewayMediumFindingsTests.eventStreamOverflowExceptionSurvivesASubsequentClose (deterministic) and eventStreamConcurrentOverflowAndCloseAlwaysTerminate (300-iteration race stress).
Client.Java-003
| Field | Value |
|---|---|
| Severity | Medium |
| Category | mxaccessgw conventions |
| Location | clients/java/mxgateway-client/src/main/java/com/dohertylan/mxgateway/client/MxGatewayClient.java:119-140 |
| Status | Resolved |
Description: OpenSessionReply carries gateway_protocol_version (proto field 8), and MxGatewayClientVersion.GATEWAY_PROTOCOL_VERSION exists so the client can reject incompatible generated-code inputs. The client never reads reply.getGatewayProtocolVersion() nor compares it against the compiled-in version. A client built against an older/newer contract issues commands blindly and fails with confusing downstream errors instead of a clear version-mismatch failure.
Recommendation: In openSession/openSessionRaw, compare reply.getGatewayProtocolVersion() with MxGatewayClientVersion.gatewayProtocolVersion() and throw a typed MxGatewayException on mismatch.
Resolution: (2026-05-18) Confirmed against source: neither openSessionRaw nor openSessionAsync read getGatewayProtocolVersion(). Added a private ensureGatewayProtocolCompatible helper, called from both openSessionRaw and openSessionAsync, that throws MxGatewayException with a clear mismatch message when the gateway reports a non-zero version differing from MxGatewayClientVersion.gatewayProtocolVersion(). A gateway that leaves the field unset (value 0, e.g. an older gateway) is accepted unchanged for backward compatibility. clients/java/README.md documents the new fail-fast check. Regression tests: MxGatewayMediumFindingsTests.openSessionRejectsIncompatibleGatewayProtocolVersion and openSessionAcceptsMatchingOrUnsetGatewayProtocolVersion.
Client.Java-004
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Correctness & logic bugs |
| Location | clients/java/mxgateway-client/src/main/java/com/dohertylan/mxgateway/client/MxGatewaySession.java:114-120,157-163,191-197 |
| Status | Resolved |
Description: register, addItem, and addItem2 check reply.hasRegister()/hasAddItem() and otherwise fall back to reply.getReturnValue().getInt32Value(). If the gateway returns a reply with neither the typed payload nor a return_value set, the method silently returns 0 — indistinguishable from a legitimate handle of 0. This masks a contract violation rather than surfacing it.
Recommendation: If the expected typed payload is absent and no return_value is present, throw MxGatewayException (protocol violation) instead of returning 0.
Resolution: (2026-05-18) Confirmed against source: all three methods returned reply.getReturnValue().getInt32Value() (which yields 0 for an unset message field) when the typed payload was absent. Each method now guards the fallback with reply.hasReturnValue() and throws MxGatewayException describing the protocol violation when neither the typed payload nor a return_value is present. The legitimate return_value fallback is preserved. Regression tests: MxGatewayMediumFindingsTests.registerThrowsWhenReplyHasNeitherTypedPayloadNorReturnValue, addItemThrowsWhenReplyHasNeitherTypedPayloadNorReturnValue, and addItemStillHonoursReturnValueFallback.
Client.Java-005
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Error handling & resilience |
| Location | clients/java/mxgateway-client/src/main/java/com/dohertylan/mxgateway/client/MxGatewaySession.java:92-105 |
| Status | Resolved |
Description: close() delegates to closeRaw(), which performs a network RPC. When MxGatewaySession is used in try-with-resources and the body throws, a failure inside closeSession (e.g. WORKER_UNAVAILABLE) throws from close() and replaces the original exception as the propagated throwable (the body exception becomes a suppressed exception) — a known try-with-resources footgun for I/O-performing close().
Recommendation: Either make close() swallow/log close-time failures (keeping closeRaw() for callers who want the result), or document clearly that close() performs a network call that can throw.
Resolution: (2026-05-18) Confirmed against source: close() called closeRaw() directly, so a CloseSession RPC failure propagated out of try-with-resources and replaced the body exception. close() now catches MxGatewayException from closeRaw() and logs it at WARNING via System.Logger instead of rethrowing, so a close-time failure never masks the body exception. closeRaw() is unchanged and still throws for callers who want to observe the close result. The behavior change and the recommendation to use closeRaw() for explicit close handling are documented in clients/java/README.md and the close() Javadoc. Regression tests: MxGatewayMediumFindingsTests.closeSuppressesCloseTimeFailureInsteadOfMaskingBodyException and closeRawStillSurfacesCloseTimeFailureForCallersWhoWantIt.
Client.Java-006
| Field | Value |
|---|---|
| Severity | Low |
| Category | Performance & resource management |
| Location | clients/java/mxgateway-client/src/main/java/com/dohertylan/mxgateway/client/MxGatewayClient.java:323-328, clients/java/mxgateway-client/src/main/java/com/dohertylan/mxgateway/client/GalaxyRepositoryClient.java:279-284 |
| Status | Resolved |
Description: close() (the AutoCloseable method invoked by try-with-resources) calls only ownedChannel.shutdown() and returns immediately without awaiting termination. In-flight calls and Netty event-loop threads may still be running when the caller assumes the resource is released. closeAndAwaitTermination() does it correctly but is not the method try-with-resources uses, and the README examples all rely on try-with-resources.
Recommendation: Have close() await termination for a bounded time and shutdownNow() on timeout (the logic already in closeAndAwaitTermination()), or document that try-with-resources callers should call closeAndAwaitTermination().
Resolution: (2026-05-18) Confirmed against source: both MxGatewayClient.close() and GalaxyRepositoryClient.close() called only ownedChannel.shutdown(). close() in both clients now performs the bounded-wait logic previously only in closeAndAwaitTermination(): it shuts the channel down, waits up to the configured connect timeout for graceful termination, and calls shutdownNow() on timeout. Because close() cannot throw a checked exception, an InterruptedException while awaiting is handled by forcibly shutting the channel down and restoring the thread interrupt flag. closeAndAwaitTermination() is retained unchanged for callers who want the checked, blocking-aware variant. clients/java/README.md documents the new try-with-resources close() semantics.
Client.Java-007
| Field | Value |
|---|---|
| Severity | Low |
| Category | Testing coverage |
| Location | clients/java/mxgateway-client/src/test/java/com/dohertylan/mxgateway/client/ |
| Status | Resolved |
Description: The alarm surface — acknowledgeAlarm/acknowledgeAlarmAsync/queryActiveAlarms and MxGatewayActiveAlarmsSubscription — has zero test coverage. TLS channel construction, the async streamEventsAsync path, MxGatewayEventSubscription pre-start cancellation, and MxEventStream queue overflow are likewise untested. JavaClientDesign.md explicitly lists async stream-observer cancellation and status/error mapping as required tests.
Recommendation: Add in-process gRPC tests for the alarm RPCs, the async streaming/subscription cancellation paths, and at least one TLS-config construction test.
Resolution: (2026-05-18) Confirmed against source: no test referenced acknowledgeAlarm, queryActiveAlarms, streamEventsAsync, TLS construction, or MxEventStream overflow. Added MxGatewayLowFindingsTests (12 tests) covering: acknowledgeAlarm/acknowledgeAlarmAsync (success, typed protocol-failure, async transport-failure normalisation), queryActiveAlarms observer delivery, MxGatewayActiveAlarmsSubscription and MxGatewayEventSubscription pre-start cancellation, streamEventsAsync observer delivery, MxEventStream queue overflow surfacing MxGatewayException, TLS channel construction (missing CA file rejected with a typed exception, system-trust path builds cleanly), and the Client.Java-008 async-validator normalisation. While writing the TLS test a latent bug was found: a missing/unreadable CA file makes GrpcSslContexts throw IllegalArgumentException (not SSLException), which the old catch (SSLException) let escape unwrapped — the catch in the shared channel builder was broadened to also wrap RuntimeException so callers always see one typed MxGatewayException.
Client.Java-008
| Field | Value |
|---|---|
| Severity | Low |
| Category | Error handling & resilience |
| Location | clients/java/mxgateway-client/src/main/java/com/dohertylan/mxgateway/client/MxGatewayClient.java:298-304 |
| Status | Resolved |
Description: acknowledgeAlarmAsync and openSessionAsync apply ensureProtocolSuccess inside thenApply. If that validator throws a non-MxGatewayException RuntimeException it is wrapped by CompletionException with no fromGrpc normalisation, unlike the synchronous paths which normalise via try/catch. The async and sync error surfaces are therefore inconsistent.
Recommendation: Wrap the thenApply body so any non-MxGatewayException is routed through MxGatewayErrors.fromGrpc, matching the synchronous methods.
Resolution: (2026-05-18) Confirmed against source: the thenApply validators in openSessionAsync, invokeAsync, and acknowledgeAlarmAsync were not normalised — in practice the gateway's own validators (ensureProtocolSuccess, ensureMxAccessSuccess, ensureGatewayProtocolCompatible) only ever throw MxGatewayException, but a stray non-MxGatewayException RuntimeException (e.g. an NPE from a malformed reply) would surface raw inside CompletionException. Added MxGatewayChannels.normalisingValidator(operation, fn): it rethrows MxGatewayException unchanged and routes any other RuntimeException through MxGatewayErrors.fromGrpc, matching the synchronous try/catch paths. All three async thenApply sites now use it. Regression test: MxGatewayLowFindingsTests.openSessionAsyncNormalisesNonGatewayRuntimeExceptionFromValidator.
Client.Java-009
| Field | Value |
|---|---|
| Severity | Low |
| Category | Code organization & conventions |
| Location | clients/java/mxgateway-client/src/main/java/com/dohertylan/mxgateway/client/GalaxyRepositoryClient.java:310-391, clients/java/mxgateway-client/src/main/java/com/dohertylan/mxgateway/client/MxGatewayClient.java:346-413 |
| Status | Resolved |
Description: createChannel, withDeadline, withStreamDeadline, and toCompletable are duplicated nearly verbatim across MxGatewayClient and GalaxyRepositoryClient (~80 lines). A fix to one will not propagate to the other.
Recommendation: Extract the channel-builder and future-adaptor helpers into a shared package-private utility class.
Resolution: (2026-05-18) Confirmed against source: the four helpers were duplicated near-verbatim. Added a package-private MxGatewayChannels utility class holding createChannel(options, tlsErrorPrefix), withDeadline(stub, options), withStreamDeadline(stub, options), toCompletable(future, operation), and the new normalisingValidator helper (Client.Java-008). Both MxGatewayClient and GalaxyRepositoryClient now delegate to it and their private copies were deleted, so a future fix lives in one place. Behavior is unchanged except the operation-name carried into MxGatewayErrors.fromGrpc is now the specific RPC name instead of the generic "async call"/"galaxy async call". Verified by the full existing async test suite plus the new MxGatewayLowFindingsTests.
Client.Java-010
| Field | Value |
|---|---|
| Severity | Low |
| Category | Documentation & comments |
| Location | clients/java/mxgateway-client/src/main/java/com/dohertylan/mxgateway/client/MxGatewayClient.java:269-272, clients/java/README.md:76 |
| Status | Resolved |
Description: The acknowledgeAlarm Javadoc states the gateway authenticates against an invoke:alarm-ack scope, and the README states the Galaxy Repository requires a metadata:read scope. CLAUDE.md's documented scope set names neither — the Javadoc/README assert a scope contract the project's own auth documentation does not corroborate.
Recommendation: Reconcile the scope names with src/MxGateway.Server/Security/ and CLAUDE.md; correct the Javadoc/README to the actual scope strings, or fix CLAUDE.md if sub-scopes were genuinely added.
Resolution: (2026-05-18) Partially re-triaged. Verified against src/MxGateway.Server/Security/Authorization/GatewayScopes.cs and GatewayGrpcScopeResolver.cs: the canonical scope catalog is session:open, session:close, invoke:read, invoke:write, invoke:secure, events:read, metadata:read, admin. (a) The README's metadata:read for the Galaxy Repository is correct — TestConnectionRequest/GetLastDeployTimeRequest/DiscoverHierarchyRequest/WatchDeployEventsRequest all resolve to GatewayScopes.MetadataRead; no change needed. CLAUDE.md's prose lists only coarse scope groups, but the canonical resolver does define metadata:read. (b) The acknowledgeAlarm Javadoc's invoke:alarm-ack is wrong — no such scope exists. AcknowledgeAlarmRequest and QueryActiveAlarmsRequest are not special-cased in GatewayGrpcScopeResolver, so they fall through the _ => GatewayScopes.Admin default and require the admin scope. The Javadoc was corrected to state the admin scope; queryActiveAlarms did not assert a scope and was left unchanged. The README does not mention alarms, so no README change was required.
Client.Java-011
| Field | Value |
|---|---|
| Severity | Low |
| Category | Performance & resource management |
| Location | clients/java/mxgateway-client/src/main/java/com/dohertylan/mxgateway/client/MxEventStream.java:37-63 |
| Status | Resolved |
Description: The event stream relies on default gRPC auto-inbound flow control: the async stub auto-requests messages, so the server can push faster than the 16-element bounded queue drains. A momentarily slow consumer triggers queue overflow and an immediate stream-fault cancel. This is consistent with the documented fail-fast event-backpressure design, but the client never applies real flow control, so even brief consumer stalls kill the subscription.
Recommendation: Confirm fail-fast is intended (it appears to be); if so, document it on MxEventStream so callers know a slow consumer terminates the stream. Optionally expose the queue capacity or opt-in flow control.
Resolution: (2026-05-18) Confirmed fail-fast is intended — CLAUDE.md ("fail-fast event backpressure") and docs/DesignDecisions.md make a slow consumer losing its subscription a deliberate v1 design choice, so this is documentation-only, not a behavior bug. Added an explicit "Backpressure (fail-fast)" section to the MxEventStream class Javadoc explaining that the adaptor uses gRPC auto-inbound flow control with a fixed 16-element buffer and no client flow control, that a consumer stall long enough to fill the buffer triggers an overflow that cancels the subscription and surfaces an MxGatewayException, and that consumers must drain promptly and be ready to resubscribe with a resume cursor. clients/java/README.md carries the same caveat. The queue capacity was intentionally left non-configurable to keep the v1 surface aligned with the gateway design; overflow behavior is covered by MxGatewayLowFindingsTests.eventStreamQueueOverflowSurfacesExceptionFromNext.
Client.Java-012
| Field | Value |
|---|---|
| Severity | Low |
| Category | Correctness & logic bugs |
| Location | clients/java/mxgateway-cli/src/main/java/com/dohertylan/mxgateway/cli/MxGatewayCli.java:667-674 |
| Status | Resolved |
Description: CommonOptions.resolved() mutates this (resolvedApiKey, resolvedTimeout) and returns this, but toClientOptions() and redactedJsonMap() read those mutated fields. If redactedJsonMap() is ever called before resolved(), it silently emits empty-string defaults. The "return this after mutating" pattern is fragile and surprising.
Recommendation: Make resolved() return an immutable resolved value object, or compute resolvedApiKey/resolvedTimeout lazily in their getters so call ordering cannot produce stale output.
Resolution: (2026-05-18) Confirmed against source: resolved() populated the resolvedApiKey/resolvedTimeout mutable fields and toClientOptions()/redactedJsonMap() read them, so calling either before resolved() emitted stale empty/30s defaults. The two mutable fields were removed and replaced with side-effect-free accessor methods resolvedApiKey() and resolvedTimeout() that compute their value on each call (API key from --api-key or the --api-key-env variable; timeout via parseDuration). toClientOptions() and redactedJsonMap() now call those accessors directly, so call ordering can no longer produce stale output. resolved() is retained as a no-op returning this purely for call-site readability (common.resolved()), with its Javadoc updated to state resolution is now lazy. Pure-refactor with no runtime-behavior change for the existing call order, so no new test was added; covered by the existing MxGatewayCliTests JSON-redaction and option-parsing tests.
Client.Java-013
| Field | Value |
|---|---|
| Severity | High |
| Category | Testing coverage |
| Location | clients/java/mxgateway-cli/src/test/java/com/dohertylan/mxgateway/cli/MxGatewayCliTests.java:212-304, clients/java/mxgateway-cli/src/main/java/com/dohertylan/mxgateway/cli/MxGatewayCli.java:1214-1244 |
| Status | Resolved |
Description: MxGatewayCliSession in MxGatewayCli.java:1214 was extended in commit f220908 (the "bulk read/write CLI subcommands" change) with five new abstract methods — readBulk, writeBulk, write2Bulk, writeSecuredBulk, writeSecured2Bulk. The test-only FakeSession in MxGatewayCliTests.java:212 still only implements the original set (register/addItem/advise/writeRaw/subscribeBulk/unsubscribeBulk/streamEventsAfter) and is declared a concrete (non-abstract) class. A clean compile of mxgateway-cli's test source set therefore fails: a concrete implementer that omits abstract interface methods is a compile error. The stale .class files under build/classes/java/test/ predate the interface change (dated 2026-05-20 03:38 vs CLI source dated 2026-05-20 05:06), which is why the issue is not visible until the next clean build. gradle test (or any CI pipeline that does not retain incremental state) will fail to build the CLI test module. The CLAUDE.md source-update workflow row "When source code changes, build and test the affected component" was not honoured for this CLI contract change.
Recommendation: Add the five missing @Override implementations to FakeSession (stubs returning empty lists are fine — only subscribeBulk/unsubscribeBulk are exercised by the existing tests, and the new bulk subcommands have no dedicated CLI tests yet). Optionally also add at least one CLI-level test for read-bulk, write-bulk, and the bench-read-bulk subcommands to keep parity with the .NET / Go / Rust CLI smoke matrix.
Resolution: 2026-05-20 — Added the five missing @Override stubs (readBulk, writeBulk, write2Bulk, writeSecuredBulk, writeSecured2Bulk) to FakeSession in clients/java/mxgateway-cli/src/test/java/com/dohertylan/mxgateway/cli/MxGatewayCliTests.java, each returning an empty ArrayList<> to match the interface return types (List<BulkReadResult> / List<BulkWriteResult>) without throwing. Imported BulkReadResult, BulkWriteResult, WriteBulkEntry, Write2BulkEntry, WriteSecuredBulkEntry, WriteSecured2BulkEntry from mxaccess_gateway.v1.MxaccessGateway. GrpcMxGatewayCliSession in MxGatewayCli.java is the only other implementer and already provides the methods (the source change that introduced the contract added them there). Verified with gradle clean followed by gradle :mxgateway-cli:compileTestJava and gradle :mxgateway-cli:test from clients/java, both BUILD SUCCESSFUL. No new CLI-level tests for the bulk subcommands were added — that follow-up is tracked separately and out of scope for this unblock-compilation fix.
Client.Java-014
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Concurrency & thread safety |
| Location | clients/java/mxgateway-client/src/main/java/com/dohertylan/mxgateway/client/MxEventStream.java:59-65,117-124 |
| Status | Resolved |
Description: MxEventStream.observer().beforeStart simply assigns requestStream without checking the closed flag, while close() reads requestStream after setting closed = true. If close() runs before the gRPC call has attached its ClientCallStreamObserver (a real race when callers cancel immediately after subscribing — e.g. construct, then close in a finally block when an unrelated setup step throws), then at close time requestStream is null, so stream.cancel(...) is skipped. beforeStart then fires later, stores the live requestStream, and never observes closed — the underlying gRPC call leaks open and continues delivering events to a MxEventStream whose consumer has stopped iterating. The sibling DeployEventStream.beforeStart already does the correct thing (if (closed.get()) { requestStream.cancel(...); }); the two adaptors should behave identically.
Recommendation: Mirror DeployEventStream's pattern in MxEventStream.beforeStart: after storing requestStream, check the closed flag and cancel the stream eagerly if a prior close() has already fired. Add a regression test analogous to GalaxyRepositoryClientTests.deployEventStreamCloseBeforeBeforeStartCancelsStream to lock in the behavior.
Resolution: 2026-05-20 — Mirrored DeployEventStream.beforeStart in MxEventStream.beforeStart: after storing the ClientCallStreamObserver, the observer now reads the closed flag and calls requestStream.cancel("client cancelled event stream", null) when a prior close() already fired, closing the close/beforeStart race that previously leaked the underlying gRPC call. The fix uses the existing volatile boolean closed field (already established as a happens-before publisher by close() setting it before reading requestStream); no field shape changes were needed. clients/java/README.md documents the new safe-close-before-beforeStart contract. Regression test: MxGatewayMediumFindingsTests.mxEventStreamCloseBeforeBeforeStartCancelsStream (mirrors GalaxyRepositoryClientTests.deployEventStreamCloseBeforeBeforeStartCancelsStream).
Client.Java-015
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Concurrency & thread safety |
| Location | clients/java/mxgateway-client/src/main/java/com/dohertylan/mxgateway/client/MxGatewayChannels.java:112-138, MxGatewayClient.java:183-191,224-232,322-329, GalaxyRepositoryClient.java:164-170,212-214 |
| Status | Resolved |
Description: MxGatewayChannels.toCompletable registers a whenComplete on the local target future to forward cancellation to the source gRPC ListenableFuture. Every caller — openSessionAsync, invokeAsync, acknowledgeAlarmAsync, discoverHierarchyPageAsync, getLastDeployTimeAsync — then chains .thenApply(normalisingValidator(...)) or .thenApply(::getOk) and returns the chained future to the user. CompletableFuture.thenApply returns a new future whose cancellation does not propagate back to the source target. Cancelling the user-facing future therefore never sets target.isCancelled() == true, so source.cancel(true) is never invoked and the underlying gRPC call continues until its deadline expires. The JavaClientDesign.md "Streaming" section explicitly says "Stream cancellation should call ClientCall.cancel" — the same expectation reasonably applies to the unary *Async surface.
Recommendation: Either return target directly from each *Async method (and inline the validator into the FutureCallback.onSuccess path so no thenApply is needed), or attach the cancellation listener to the final returned future. The cleanest fix is to have MxGatewayChannels.toCompletable return a future that wraps the validator internally and registers whenComplete on the final future. Add a regression test that cancels the user-facing future and verifies the gRPC call was cancelled (e.g. via a ServerCallStreamObserver.setOnCancelHandler latch).
Resolution: 2026-05-20 — Fixed by inlining the reply validator into MxGatewayChannels.toCompletable so the user-visible future is the same future cancellation is bound to: added a new toCompletable(source, operation, validator) overload that runs the validator inside the FutureCallback.onSuccess path (normalising non-MxGatewayException RuntimeExceptions through MxGatewayErrors.fromGrpc, matching the existing synchronous try/catch). Replaced the previous whenComplete-based cancellation listener with a small CancellingCompletableFuture<T> subclass whose cancel(boolean) forwards to the source ListenableFuture.cancel(...) unconditionally, so even the no-validator overload propagates cancellation deterministically (the whenComplete listener only fired when target.isCancelled() was already true, which is exactly the case thenApply broke). Updated MxGatewayClient.openSessionAsync, MxGatewayClient.invokeAsync, MxGatewayClient.acknowledgeAlarmAsync, GalaxyRepositoryClient.testConnectionAsync, and GalaxyRepositoryClient.getLastDeployTimeAsync to use the new validator overload directly (no .thenApply chain). GalaxyRepositoryClient.discoverHierarchyAsync is paged via thenCompose, so it now publishes the current in-flight page future via an AtomicReference and returns a top-level CompletableFuture whose overridden cancel(boolean) cancels whichever page is currently outstanding. clients/java/README.md documents the new cancellation contract: cancelling any *Async future aborts the underlying gRPC call. Regression tests: MxGatewayMediumFindingsTests.invokeAsyncCancellationCancelsUnderlyingGrpcCall (full in-process gRPC test using ServerCallStreamObserver.setOnCancelHandler to latch when the server observes RPC cancellation), toCompletableValidatorOverloadForwardsCancellationToSource, and toCompletableNoValidatorOverloadForwardsCancellationToSource (unit-level proofs that both MxGatewayChannels.toCompletable overloads forward cancel(true) to the source ListenableFuture).
Client.Java-016
| Field | Value |
|---|---|
| Severity | Low |
| Category | Code organization & conventions |
| Location | clients/java/mxgateway-client/src/main/java/com/dohertylan/mxgateway/client/MxGatewayClient.java:361-391, GalaxyRepositoryClient.java:285-315 |
| Status | Resolved |
Description: Client.Java-009 introduced MxGatewayChannels to deduplicate createChannel, withDeadline, withStreamDeadline, and toCompletable. The two close() / closeAndAwaitTermination() methods — added shortly after to fix Client.Java-006 — were not extracted along with them. The 30-line bodies of MxGatewayClient.close() + closeAndAwaitTermination() and GalaxyRepositoryClient.close() + closeAndAwaitTermination() are now duplicated verbatim, including the awaitTermination(connectTimeout) semantic (see Client.Java-019), the InterruptedException handling, and the ownedChannel == null guard. A fix to one path (e.g. introducing a dedicated shutdownTimeout option) will silently miss the other.
Recommendation: Move the shutdown logic into MxGatewayChannels.shutdown(ManagedChannel channel, MxGatewayClientOptions options) and MxGatewayChannels.shutdownAndAwaitTermination(...). Have both clients delegate to it. Same recommendation applies to the duplicated MxGatewayAuthInterceptor construction in the two constructors (MxGatewayClient(Channel, ...) and GalaxyRepositoryClient(Channel, ...)).
Resolution: 2026-05-20 — Extracted the duplicated shutdown logic into MxGatewayChannels.shutdown(ManagedChannel, MxGatewayClientOptions) and MxGatewayChannels.shutdownAndAwaitTermination(ManagedChannel, MxGatewayClientOptions). Both helpers handle the ownedChannel == null no-op, the orderly-shutdown / awaitTermination / shutdownNow-on-timeout escalation, and the InterruptedException-restoring-the-interrupt-flag path. MxGatewayClient.close()/closeAndAwaitTermination() and GalaxyRepositoryClient.close()/closeAndAwaitTermination() are now one-liners that delegate to the shared helpers, so a future change (such as Client.Java-019's shutdownTimeout) lives in one place. Unused java.util.concurrent.TimeUnit imports were removed from both clients. The constructor-level MxGatewayAuthInterceptor duplication noted in the recommendation was left in place — it is a single intercept call per constructor (2 lines) versus the 30-line shutdown duplication that was the actual maintenance hazard. Regression tests: MxGatewayLowFindingsIITests.sharedShutdownHelperIsNoOpForNullChannel (covers the null-channel guard), shutdownAndAwaitTerminationHonoursShutdownTimeoutNotConnectTimeout, and shutdownEscalatesToShutdownNowWhenTimeoutExceeded (cover the shared shutdown semantics; the second is also the Client.Java-019 regression).
Client.Java-017
| Field | Value |
|---|---|
| Severity | Low |
| Category | Documentation & comments |
| Location | clients/java/mxgateway-client/src/main/java/com/dohertylan/mxgateway/client/MxEventStream.java:25-36, clients/java/README.md:99-107 |
| Status | Resolved |
Description: MxEventStream.streamEvents was recently widened from a 16-element buffer to a 1024-element buffer (MxGatewayClient.streamEvents at line 268: new MxEventStream(1024)). The class-level Javadoc on MxEventStream still says "the gateway can push events faster than the consumer drains the bounded 16-element buffer", and clients/java/README.md line 103 says "uses gRPC's default auto-inbound flow control with a fixed 16-element buffer". The fail-fast event-backpressure contract (Client.Java-011 resolution) was written against the older capacity. The MxGatewayClient.streamEvents inline comment even acknowledges the change ("A small queue overflows on any moderately active session; 1024 covers a realistic backlog"). Users of this surface will reason about realistic backpressure budgets using the wrong number.
Recommendation: Update the MxEventStream Javadoc and the README to say "1024-element buffer" (or, since the capacity is a passed parameter, document it as a parameter rather than a constant). Consider exposing the capacity through MxGatewayClientOptions so callers can tune it per session.
Resolution: 2026-05-20 — Updated the MxEventStream class Javadoc and clients/java/README.md so both say "1024-element buffer" instead of the obsolete "16-element buffer". The Javadoc also notes that capacity is a constructor parameter and that the production caller (MxGatewayClient.streamEvents) passes 1024 to absorb the session-backlog replay burst, so readers understand the value is a deliberate choice rather than a constant. Exposing the capacity through MxGatewayClientOptions was intentionally left out of scope — the v1 design keeps the event-stream surface minimal and MxGatewayClient.streamEvents is the only caller; if a tuning need arises in v2 the existing constructor already accepts the capacity.
Client.Java-018
| Field | Value |
|---|---|
| Severity | Low |
| Category | Security |
| Location | clients/java/mxgateway-client/src/main/java/com/dohertylan/mxgateway/client/MxGatewaySecrets.java:54-66 |
| Status | Resolved |
Description: redactCredentials(value) splits its input on \\s+ (whitespace) and only redacts whitespace-delimited tokens that start with mxgw_ or equal bearer (case-insensitive). gRPC Status.getDescription() strings, log lines, and proto error messages can carry credentials separated by colons (Bearer:mxgw_id_secret), commas (token=mxgw_id_secret,scope=...), single quotes ('mxgw_id_secret'), parentheses ((mxgw_id_secret)), or embedded in URLs/paths — all of which leave the mxgw_ token attached to a non-whitespace neighbour and survive redaction. MxGatewayErrors.fromGrpc is the primary consumer; a gateway error description like authentication failed: 'mxgw_id_secret' would round-trip the secret into the resulting MxGatewayAuthenticationException message.
Recommendation: Replace the whitespace-split scrub with a regex-based pass that matches mxgw_[A-Za-z0-9_-]+ anywhere in the string and substitutes <redacted>; also redact Bearer\s+\S+ as a unit so the token after Bearer is masked regardless of the surrounding punctuation. Cover with a fixture-style test alongside MxGatewayFixtureTests.grpcAuthErrorsAreClassifiedAndRedacted that asserts a quoted or comma-delimited credential is fully masked.
Resolution: 2026-05-20 — Replaced the whitespace-split scrub with two compiled Pattern regexes: mxgw_[A-Za-z0-9_-]+ matches any gateway-shaped credential anywhere in the string regardless of surrounding punctuation, and (?i)bearer\s+\S+ masks an authorization-header style Bearer <token> as a unit so a non-mxgw bearer token cannot leak either. The mxgw pass runs first, so the bearer pass observes Bearer <redacted> for the common combined case and renders it idempotently. Regression tests in MxGatewayFixtureTests: redactCredentialsHandlesNonWhitespaceDelimitedTokens exercises single-quoted, double-quoted, comma-delimited, colon-delimited, parenthesised, URL-embedded, and bearer-header credentials; redactCredentialsLeavesBenignContentAlone confirms strings without credentials and a null input are unchanged.
Client.Java-019
| Field | Value |
|---|---|
| Severity | Low |
| Category | Performance & resource management |
| Location | clients/java/mxgateway-client/src/main/java/com/dohertylan/mxgateway/client/MxGatewayClient.java:362-391, GalaxyRepositoryClient.java:286-315 |
| Status | Resolved |
Description: Both clients' close() / closeAndAwaitTermination() use options.connectTimeout() as the upper bound on awaitTermination. The connectTimeout semantically describes how long the client will wait to establish the channel, not how long it should wait for in-flight calls and the Netty event loop to drain after shutdown(). With the default 10s connect timeout, shutting down a client with a long-running unary call already in flight will silently escalate to shutdownNow() and forcibly cancel it before the call's own deadline expires, defeating the deadline contract on withDeadline. Conversely, a caller who sets a small connectTimeout (e.g. 500 ms for a health probe) inherits an aggressively short shutdown deadline they probably did not intend.
Recommendation: Introduce a dedicated shutdownTimeout on MxGatewayClientOptions (defaulting to e.g. 5–10 s independent of connectTimeout) and use it in close() and closeAndAwaitTermination(). Document the precedence in the Javadoc. This pairs naturally with the Client.Java-016 deduplication fix.
Resolution: 2026-05-20 — Added a dedicated shutdownTimeout Duration on MxGatewayClientOptions (builder method shutdownTimeout(Duration), accessor shutdownTimeout(), default 10 s), independent of connectTimeout. Both shared shutdown helpers introduced for Client.Java-016 (MxGatewayChannels.shutdown and shutdownAndAwaitTermination) call options.shutdownTimeout() as the awaitTermination upper bound, so a small connectTimeout (e.g. a 500 ms health-probe timeout) no longer forces a premature shutdownNow() on in-flight calls. The new option is reflected in toString() and documented on both helpers and the close()/closeAndAwaitTermination() Javadoc on both clients; clients/java/README.md notes the default and the independence from connectTimeout. Regression tests in MxGatewayLowFindingsIITests: shutdownAndAwaitTerminationHonoursShutdownTimeoutNotConnectTimeout (a 50 ms connect timeout + 1 s shutdown timeout + 200 ms graceful-termination channel never escalates to shutdownNow()), shutdownEscalatesToShutdownNowWhenTimeoutExceeded (a stuck channel beyond the shutdown timeout is forcibly shut down), and shutdownTimeoutDefaultIsTenSecondsIndependentOfConnectTimeout (the default holds even when connectTimeout is small).
Client.Java-020
| Field | Value |
|---|---|
| Severity | Low |
| Category | Correctness & logic bugs |
| Location | clients/java/mxgateway-cli/src/main/java/com/dohertylan/mxgateway/cli/MxGatewayCli.java:244-254, galaxy_repository.proto:94 |
| Status | Resolved |
Description: galaxy_repository.proto defines DeployEvent.sequence as uint64; the protobuf Java mapping projects that to a signed long. The CLI's text-mode galaxy-watch output prints it as "seq=%d ...", which interprets the value as signed. For genuine wraparound this is implausible (deploy sequences will not reach 2^63), but the broader pattern is brittle: any unsigned proto field printed via %d will display incorrectly past the signed boundary. The JSON path uses protoJson(event) which formats unsigned longs as numeric strings via JsonFormat, so JSON output is correct; only the text mode is at risk.
Recommendation: Print the sequence with Long.toUnsignedString(event.getSequence()) (or switch the text format to %s and pass the unsigned-string conversion). The same rule should apply to any other uint64 proto fields that surface in CLI text output.
Resolution: 2026-05-20 — Updated the galaxy-watch text-mode out.printf in MxGatewayCli.GalaxyWatchCommand.call() to use %s for the sequence field and pass Long.toUnsignedString(event.getSequence()), so deploy sequences past 2^63 render as their correct unsigned decimal string instead of a negative signed long. The JSON path through protoJson(event) was already correct (proto JsonFormat emits unsigned longs as decimal strings) and was left unchanged. An inline comment near the printf documents the unsigned-uint64 contract so the next person editing the format string knows not to switch back to %d. Regression test: MxGatewayCliTests.deployEventSequenceRendersAsUnsignedForHighUint64 exercises the format string with the max-uint64 bit pattern (-1L) and asserts the output contains seq=18446744073709551615 and does not contain seq=-1.
Client.Java-021
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Concurrency & thread safety |
| Location | clients/java/mxgateway-client/src/main/java/com/dohertylan/mxgateway/client/DeployEventStream.java:96-135 |
| Status | Resolved |
Description: Client.Java-002 fixed a deterministic terminal-state race in MxEventStream by introducing a terminate(MxGatewayException) method, a terminalLock, and a terminated flag so a close() arriving after a queue-overflow offer() cannot wipe the overflow exception. DeployEventStream — added later and structurally a copy of MxEventStream — never received the same fix. Its current close() does closed.set(true); stream.cancel(...); offer(END);, and its offer() overflow branch does queue.clear(); queue.offer(new MxGatewayException("...queue overflowed")); queue.offer(END); (lines 117-135). With these two paths running concurrently, the same sequence Client.Java-002 documented can repeat: the overflow branch enqueues [overflowException, END], close() then calls offer(END) which sees the queue full and falls into the END branch (queue.clear(); queue.offer(value);), wiping the overflow exception and leaving a clean end-of-stream. The CLI galaxy-watch (and any WatchDeployEvents consumer) loses the overflow signal it was supposed to surface, defeating the fail-fast backpressure contract. The 16-element buffer on DeployEventStream makes overflow far less likely than on MxEventStream in practice, but the race is identical.
Recommendation: Mirror the MxEventStream fix: add a terminated flag and terminalLock, route close(), onCompleted, and the overflow branch through a single terminate(MxGatewayException) method that wins on first arrival, and add the regression analogous to MxGatewayMediumFindingsTests.eventStreamOverflowExceptionSurvivesASubsequentClose. Given the two stream classes are now structural copies of each other, consider extracting the queue/terminate plumbing into a shared base or helper so the next fix lands once.
Resolution: 2026-05-20 — Mirrored the MxEventStream terminal-state serialisation in DeployEventStream: replaced the AtomicBoolean closed field with a volatile boolean closed, added a terminalLock/terminated pair, and routed all terminal paths (close(), onCompleted(), the overflow branch in offer()) through a single private terminate(MxGatewayException fault) method guarded by synchronized (terminalLock) { if (terminated) return; terminated = true; ... }. The first terminal condition wins: an overflow that publishes [exception, END] is no longer wiped by a subsequent close()/onCompleted() that previously took the "queue full → clear + offer(END)" branch. The class-level Javadoc now documents the single-consumer-thread iterator contract and the deterministic terminal transition, matching MxEventStream. Behavior outside the terminal path is unchanged: beforeStart still resolves the close-before-beforeStart race (Client.Java-014's deploy-stream counterpart, already in place), take() still surfaces interrupts, and the request stream is still cancelled on overflow/close. Regression tests in GalaxyRepositoryClientTests: deployEventStreamOverflowExceptionSurvivesASubsequentClose (deterministic — capacity-2 stream, force overflow, then close, assert the overflow exception is surfaced) and deployEventStreamConcurrentOverflowAndCloseAlwaysTerminate (300-iteration concurrent race stress, mirrors MxGatewayMediumFindingsTests.eventStreamConcurrentOverflowAndCloseAlwaysTerminate).
Client.Java-022
| Field | Value |
|---|---|
| Severity | Low |
| Category | Documentation & comments |
| Location | clients/java/mxgateway-client/src/main/java/com/dohertylan/mxgateway/client/MxGatewayChannels.java:161-172 |
| Status | Resolved |
Description: The Javadoc on the no-validator toCompletable(source, operation) overload claims: "calling cancel(true) on either the direct return value or the user-facing chained future ultimately invokes source.cancel(true) (chained futures forward to the upstream stage they were derived from, which is this future)." This is not how CompletableFuture.thenApply (or thenCompose, whenComplete, etc.) actually behaves: a downstream stage's cancel() only marks that derived stage as cancelled, it does NOT propagate cancellation upstream to the originating CancellingCompletableFuture. The Client.Java-015 resolution actually fixes the bug by inlining the validator into the new toCompletable(source, operation, validator) overload (lines 224-252) so users never need a downstream stage, and by GalaxyRepositoryClient.discoverHierarchyAsync using an explicit AtomicReference-based override (which has a correct comment at line 218-221 acknowledging exactly this thenCompose limitation). The contradiction between the two adjacent comments will mislead the next maintainer who decides to add a convenience .thenApply on top of a *Async return value — they will assume cancellation still flows through and re-introduce the Client.Java-015 leak.
Recommendation: Rewrite the toCompletable Javadoc to state the actual contract: cancel(...) on the direct return value (the CancellingCompletableFuture instance) forwards to the source RPC, but cancel(...) on a thenApply/thenCompose/thenAccept of that future does not — the cancellation is captured at the derived stage and the upstream RPC continues until its deadline. Callers that need cancellation through a chained pipeline must follow the discoverHierarchyAsync pattern (custom CompletableFuture subclass tracking the current in-flight stage). The underlying CancellingCompletableFuture class doc (lines 254-258) is already correct; only the toCompletable paragraph is misleading.
Resolution: 2026-05-20 — Rewrote the toCompletable(source, operation) Javadoc in MxGatewayChannels to reflect the actual CompletableFuture contract. The doc now states unambiguously: cancelling the direct return value (the CancellingCompletableFuture) forwards to the source ListenableFuture and aborts the underlying gRPC call (the Client.Java-015 fix), but cancelling a derived thenApply/thenCompose/thenAccept/whenComplete stage of that future does NOT propagate cancellation upstream — the derived stage is marked cancelled while the source RPC continues until its deadline. The Javadoc explicitly directs callers that need cancellation through a chained pipeline to either the toCompletable(source, operation, validator) overload (which inlines the validator into the FutureCallback.onSuccess path so the user-visible future is the same future cancellation is bound to) or the GalaxyRepositoryClient.discoverHierarchyAsync AtomicReference-based pattern (for thenCompose across paged calls). The CancellingCompletableFuture class Javadoc was already correct and is unchanged. Doc-only change; no behavior change and no new test required.
Client.Java-023
| Field | Value |
|---|---|
| Severity | Low |
| Category | Correctness & logic bugs |
| Location | clients/java/mxgateway-cli/src/main/java/com/dohertylan/mxgateway/cli/MxGatewayCli.java:1054, src/MxGateway.Contracts/Protos/mxaccess_gateway.proto:634 |
| Status | Resolved |
Description: MxEvent.worker_sequence is a proto uint64 (line 634 of mxaccess_gateway.proto). The stream-events CLI text path prints it with %d (client.out().printf("%d %s%n", event.getWorkerSequence(), event.getFamily());), which interprets the underlying signed long value — sequences past 2^63 would render as a negative number. This is the exact same uint64-with-%d bug that Client.Java-020 fixed for the galaxy-watch DeployEvent.sequence field; the resolution's stated rule ("The same rule should apply to any other uint64 proto fields that surface in CLI text output") was never extended to this site. In practice worker sequences will not reach 2^63 so this is latent rather than active, but the same fix and the same regression-test pattern apply.
Recommendation: Replace the %d with %s plus Long.toUnsignedString(event.getWorkerSequence()) (matching the Client.Java-020 fix in GalaxyWatchCommand), and add a regression test analogous to MxGatewayCliTests.deployEventSequenceRendersAsUnsignedForHighUint64 covering the stream-events text-mode format string with -1L. The --after-worker-sequence CLI option (line 1035) is also typed as a long, which means the user cannot pass an unsigned value above 2^63 - 1 from the command line; that is a related but separate ergonomic gap worth noting in the same change.
Resolution: 2026-05-20 — Updated the stream-events text-mode client.out().printf in MxGatewayCli.StreamEventsCommand.call() to use %s for the sequence and pass Long.toUnsignedString(event.getWorkerSequence()), mirroring the Client.Java-020 fix in GalaxyWatchCommand. Worker sequences past 2^63 now render as their correct unsigned decimal string instead of a negative signed long. An inline comment near the printf documents the unsigned-uint64 contract so the next person editing the format string knows not to switch back to %d. The JSON path through protoJson(event) was already correct (proto JsonFormat emits unsigned longs as decimal strings) and is unchanged. The --after-worker-sequence long ergonomic gap is a separate v2 concern and intentionally out of scope. Regression test: MxGatewayCliTests.streamEventsWorkerSequenceRendersAsUnsignedForHighUint64 exercises the format string with the max-uint64 bit pattern (-1L) and asserts the output starts with 18446744073709551615 and does not start with -1 , mirroring deployEventSequenceRendersAsUnsignedForHighUint64.
Client.Java-024
| Field | Value |
|---|---|
| Severity | Low |
| Category | Correctness & logic bugs |
| Location | clients/java/mxgateway-cli/src/main/java/com/dohertylan/mxgateway/cli/MxGatewayCli.java:855-883 |
| Status | Resolved |
Description: BenchReadBulkCommand records per-call latency in latenciesNanos[latencyCount++] = elapsed; inside both the success branch (line 865) and the catch (Exception ex) failure branch (line 880). The failed-call durations are then fed into the percentileSummaryMs p50/p95/p99 calculation alongside successful calls, producing misleading latency stats when even a few transport errors occur during the bench window. Client.Rust-015 fixed exactly this pattern in clients/rust/src/bin/bench-read-bulk.rs ("stop bench-read-bulk from polluting success-latency histograms with failed-call durations"); the equivalent fix was not applied to the Java implementation. The cross-language matrix runner (scripts/run-client-e2e-tests.ps1) compares numbers across all five clients, so the Java numbers will be silently inconsistent with the Rust numbers on the same fault profile.
Recommendation: Drop the failure-branch latency record (only count failed++), or alternately maintain a separate failedLatenciesNanos array and report it as a distinct stat in the JSON output — but the success histogram must not include failed-call latencies. Cross-check the .NET, Go, and Python bench-read-bulk drivers in the same change to make sure all five clients use the same success-latency definition; the cross-language matrix is only useful if the metric is uniform.
Resolution: 2026-05-20 — Dropped the failure-branch latency record in BenchReadBulkCommand.call(): the catch (Exception ex) block no longer appends elapsed to latenciesNanos and no longer grows the array — it only increments failed++. The success-latency histogram fed into percentileSummaryMs (p50/p95/p99/max/mean) is now success-call-only, matching the Client.Rust-015 fix. The JSON output still surfaces failedCalls as a distinct top-level count so observers see fault rates separately from latency. An inline comment on the catch block documents the contract so the next maintainer doesn't reinstate the record. New CLI test MxGatewayCliTests.benchReadBulkCommandEmitsJsonSchemaKeys (added under Client.Java-026 below) covers the JSON schema produced by the corrected path. The .NET / Go / Python bench drivers were intentionally left out of scope for this Java-focused finding — that cross-client audit is its own follow-up and tracked separately.
Client.Java-025
| Field | Value |
|---|---|
| Severity | Low |
| Category | Code organization & conventions |
| Location | clients/java/mxgateway-cli/src/main/java/com/dohertylan/mxgateway/cli/MxGatewayCli.java:1176-1185 |
| Status | Resolved |
Description: CommonOptions.toClientOptions() populates the MxGatewayClientOptions builder with endpoint, apiKey, plaintext, caCertificatePath, serverNameOverride, and callTimeout, but never sets shutdownTimeout even though Client.Java-019 introduced it as a first-class option. CLI users therefore always inherit the 10-second default and have no way to override it from the command line, which makes the new option effectively client-library-only. CLI users running long-lived operations (a big discover-hierarchy page-chain, a streaming galaxy-watch session that needs to drain on Ctrl+C) cannot tune the shutdown deadline up; users running short health probes who want a small connectTimeout and a small shutdownTimeout to keep the CLI snappy on failure also cannot.
Recommendation: Add a --shutdown-timeout option to CommonOptions (parsed via the existing parseDuration helper, default unset → use the 10-second library default) and propagate it into toClientOptions() so the CLI surface tracks the library surface. Include the resolved value in redactedJsonMap() so --json output shows the effective shutdown deadline.
Resolution: 2026-05-20 — Added a --shutdown-timeout option to CommonOptions in MxGatewayCli.java, parsed via the existing parseDuration helper (so it accepts 10s, 500ms, ISO-8601 PT10S, etc.). A new lazy accessor resolvedShutdownTimeout() returns the parsed Duration when the user passed --shutdown-timeout, or null when unset so the MxGatewayClientOptions builder default (10s, established by Client.Java-019) applies. toClientOptions() now conditionally calls builder.shutdownTimeout(resolvedShutdownTimeout) only when the user opted in, preserving the library default for the common case. redactedJsonMap() includes the resolved value under key "shutdownTimeout" (empty string when unset) so --json output shows the effective shutdown deadline. The CLI surface now tracks the library surface so a user running a long page-chain can pass --shutdown-timeout 60s, and a user running a short health probe can pair --timeout 500ms with --shutdown-timeout 500ms to keep the CLI snappy on failure. Behavior for callers who do not pass the new flag is unchanged.
Client.Java-026
| Field | Value |
|---|---|
| Severity | Low |
| Category | Testing coverage |
| Location | clients/java/mxgateway-cli/src/test/java/com/dohertylan/mxgateway/cli/MxGatewayCliTests.java |
| Status | Resolved |
Description: Client.Java-013 explicitly deferred adding CLI-level test coverage for the read-bulk, write-bulk, and bench-read-bulk subcommands ("Optionally also add at least one CLI-level test for read-bulk, write-bulk, and the bench-read-bulk subcommands to keep parity with the .NET / Go / Rust CLI smoke matrix"), and the resolution explicitly stated that "follow-up is tracked separately and out of scope for this unblock-compilation fix." That follow-up was never filed. The current MxGatewayCliTests only covers version, open-session (JSON redaction), write, smoke, subscribe-bulk, unsubscribe-bulk, and the Client.Java-020 unsigned-uint64 format string — six of the thirteen non-trivial subcommands the CLI ships are completely untested at the CLI layer (read-bulk, write-bulk, write2-bulk, write-secured-bulk, write-secured2-bulk, bench-read-bulk), as are stream-events, the four galaxy-* commands, and close-session. The FakeSession stubs all return empty lists, so an end-to-end CLI test would catch JSON-shape regressions, argument-parsing bugs, and option contract breaks that the bulk Session unit tests on the library side do not exercise. This same coverage gap is what made Client.Java-013 itself only surface on a clean Gradle build.
Recommendation: Add at least one round-trip CLI test per bulk subcommand (read-bulk, write-bulk, write2-bulk, write-secured-bulk, write-secured2-bulk) that exercises the JSON output shape and the value parser (parseValue(type, text) is shared across all five and the only write*-bulk path that catches typos in the type switch). Extending the FakeSession stubs to return at least one result row makes the assertions meaningful. The bench-read-bulk test can run with a 1-second --duration-seconds and a 0-second --warmup-seconds and assert the JSON schema keys (totalCalls, latencyMs.p50, callsPerSecond) rather than the numeric values.
Resolution: 2026-05-20 — Added round-trip CLI tests for all six bulk-family subcommands plus the new Client.Java-023 unsigned-uint64 regression to MxGatewayCliTests. The FakeSession stubs were upgraded from empty-list returns to per-call recorders that publish the parsed entries (e.g. lastWriteBulkEntries, lastReadBulkTimeoutMs) and synthesise one BulkReadResult/BulkWriteResult per requested handle so the JSON output assertions exercise the bulkReadResultMap and bulkWriteResultMap serialisers. New tests: (a) readBulkCommandForwardsTimeoutAndPrintsResults — asserts --timeout-ms 750 reaches the session and the JSON output carries the per-tag tagAddress, itemHandle, wasCached, and quality fields; (b) writeBulkCommandParsesTypedValuesAndPrintsResults — asserts --type int32 --values 111,222 --user-id 5 parses through the shared parseValue switch and the entries are constructed with the expected typed MxValue and userId; (c) write2BulkCommandForwardsTimestampAndPrintsResults — asserts the --timestamp 2026-05-20T00:00:00Z reaches the entry as a timestampValue (hasTimestampValue() is true); (d) writeSecuredBulkCommandForwardsUserIdsAndPrintsResults — asserts --current-user-id 7 --verifier-user-id 8 are both propagated; (e) writeSecured2BulkCommandForwardsTimestampAndUserIdsAndPrintsResults — combination of (c) and (d); (f) benchReadBulkCommandEmitsJsonSchemaKeys — runs the bench in a 1s steady / 0s warmup window and asserts the JSON output contains the cross-language schema keys (language=java, command=bench-read-bulk, bulkSize=2, totalCalls, successfulCalls, failedCalls, callsPerSecond, latencyMs.p50/p95/p99, tags including the synthesised TestMachine_001.TestChangingInt/TestMachine_002.TestChangingInt pair); (g) streamEventsWorkerSequenceRendersAsUnsignedForHighUint64 — Client.Java-023 regression. The recommendation's stream-events and galaxy-* CLI tests were intentionally not added in this round — they require either an in-process gateway/galaxy server or package-private MxEventStream/DeployEventStream constructor access from the CLI test module, which is its own infrastructure work; the library-side tests in GalaxyRepositoryClientTests already cover the streaming wire behaviour.
Client.Java-027
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Documentation & comments |
| Location | clients/java/README.md:36,107-175,185,205,220, clients/java/JavaClientDesign.md:195-211 |
| Status | Resolved |
Description: Commit 397d3c5 renamed the gradle subprojects to zb-mom-ww-mxgateway-client and zb-mom-ww-mxgateway-cli in settings.gradle, but did not propagate that rename into the README's documented gradle commands or into JavaClientDesign.md. Every documented gradle invocation still uses the old short names — gradle :mxgateway-client:generateProto, gradle :mxgateway-cli:run --args=..., gradle :mxgateway-client:jar :mxgateway-cli:installDist — and every one fails with project 'mxgateway-client' not found in root project 'zb-mom-ww-mxaccessgw-java'. A user copy-pasting from the README or design doc will hit this on the very first command.
Recommendation: Find-and-replace :mxgateway-client: → :zb-mom-ww-mxgateway-client: and :mxgateway-cli: → :zb-mom-ww-mxgateway-cli: across clients/java/README.md and clients/java/JavaClientDesign.md. (Roughly 17 occurrences in the README, 3 in the design doc.) Test by running one updated command end-to-end.
Resolution: 2026-05-24 — Replaced every stale gradle task path in clients/java/README.md (line 37 :mxgateway-client:generateProto, lines 108–110 :mxgateway-cli:run galaxy-* invocations, lines 160–161 :mxgateway-cli:run galaxy-watch invocations, lines 169–176 :mxgateway-cli:run gateway-command invocations, line 186 :mxgateway-cli:run TLS smoke invocation, line 206 :mxgateway-client:jar :mxgateway-cli:installDist packaging command, line 221 :mxgateway-cli:run integration-check invocation) and in clients/java/JavaClientDesign.md (lines 195–196 packaging bullets, lines 209–212 "Current Build" prose) with their prefixed :zb-mom-ww-mxgateway-client: / :zb-mom-ww-mxgateway-cli: equivalents. Verified by running gradle :zb-mom-ww-mxgateway-client:test --tests … (now the updated documented form) end-to-end and gradle build, both BUILD SUCCESSFUL. Doc-only change; no production code touched.
Client.Java-028
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Documentation & comments |
| Location | clients/java/JavaClientDesign.md:23-27 |
| Status | Resolved |
Description: The build-layout block in JavaClientDesign.md still shows the old Java package paths com/dohertylan/mxgateway/client/ and com/dohertylan/mxgateway/cli/. The actual source tree was moved to com/zb/mom/ww/mxgateway/{client,cli}/ in commit 397d3c5. Anyone using the design doc to locate or navigate code will look in the wrong place.
Recommendation: Update the layout block to reflect the new paths: com/zb/mom/ww/mxgateway/client/ and com/zb/mom/ww/mxgateway/cli/. Comment-only change.
Resolution: 2026-05-24 — Updated the Build Layout block in clients/java/JavaClientDesign.md:23-27 to show com/zb/mom/ww/mxgateway/client/ and com/zb/mom/ww/mxgateway/cli/ instead of the old com/dohertylan/mxgateway/{client,cli}/ package paths, matching the actual on-disk source tree under clients/java/zb-mom-ww-mxgateway-{client,cli}/src/{main,test}/java/. Doc-only change; no production code touched.
Client.Java-029
| Field | Value |
|---|---|
| Severity | Low |
| Category | Documentation & comments |
| Location | clients/java/README.md:208-209 |
| Status | Resolved |
Description: The packaging section states "The library jar is under zb-mom-ww-mxgateway-client/build/libs. The installed CLI distribution is under zb-mom-ww-mxgateway-cli/build/install/mxgateway-cli." The library-jar path is correct, but the install-distribution path is wrong — gradle's installDist produces a directory whose name matches the project name, not the (now-retired) short name, so the actual path is zb-mom-ww-mxgateway-cli/build/install/zb-mom-ww-mxgateway-cli/. The e2e script (scripts/run-client-e2e-tests.ps1) uses the correct path, so the script works; only the README is wrong.
Recommendation: Correct the README to zb-mom-ww-mxgateway-cli/build/install/zb-mom-ww-mxgateway-cli. Comment-only.
Resolution: 2026-05-24 — Corrected the install-distribution path in clients/java/README.md:210 from zb-mom-ww-mxgateway-cli/build/install/mxgateway-cli to zb-mom-ww-mxgateway-cli/build/install/zb-mom-ww-mxgateway-cli, matching what gradle's installDist actually produces (the directory name matches the subproject name). The library-jar path on the preceding line was already correct and is unchanged. Doc-only change; the e2e script scripts/run-client-e2e-tests.ps1 was already using the correct path.
Client.Java-030
| Field | Value |
|---|---|
| Severity | Low |
| Category | Testing coverage |
| Location | clients/java/zb-mom-ww-mxgateway-client/src/test/java/com/zb/mom/ww/mxgateway/client/ |
| Status | Resolved |
Description: Commit 397d3c5 added the missing QueryActiveAlarmsRequest proto message and the corresponding rpc QueryActiveAlarms to mxaccess_gateway.proto. The Java client now generates the request type and the gRPC stub method, and MxGatewayClient.queryActiveAlarms correctly references both. No unit test exercises the new RPC end-to-end on the Java side — the proto compiles, the import resolves, the method is callable, but the absence of a queryActiveAlarmsForwardsRequestAndStreamsSnapshots (or similar) fixture means a serialisation regression in QueryActiveAlarmsRequest or the streaming reply would not surface in the Java unit tests.
Recommendation: Add a unit test in MxGatewayFixtureTests (or wherever the alarm fixtures live) that pushes a QueryActiveAlarmsRequest through a FakeMxAccessGateway and asserts the ActiveAlarmSnapshot stream is consumed correctly — mirror the existing acknowledgeAlarm test shape.
Resolution: 2026-05-24 — Re-triaged the recommendation: the suite uses the InProcessGateway + TestGatewayService fixture in MxGatewayClientSessionTests (no separate FakeMxAccessGateway), and there is in fact no existing acknowledgeAlarm test in the current tree to mirror — the alarm RPC surface has zero coverage. Added queryActiveAlarmsForwardsRequestAndStreamsSnapshots to MxGatewayClientSessionTests (the same file/fixture style as the other unary/streaming RPC tests): the test overrides TestGatewayService.queryActiveAlarms to capture the inbound QueryActiveAlarmsRequest and emit two ActiveAlarmSnapshot messages (one ACTIVE, one ACTIVE_ACKED with an operator/comment populated), then calls MxGatewayClient.queryActiveAlarms with a request carrying session_id, client_correlation_id, and alarm_filter_prefix. It awaits onCompleted via a CountDownLatch, closes the subscription, and asserts (a) the server observed all three inbound request fields, (b) both snapshots arrived in order with the expected alarm_full_reference/current_state/severity/operator_user, and (c) the observer never received an error. TDD red phase confirmed by temporarily changing the session_id assertion to "WRONG-SESSION-ID" — gradle :zb-mom-ww-mxgateway-client:test --tests "…queryActiveAlarmsForwardsRequestAndStreamsSnapshots" failed with AssertionFailedError at MxGatewayClientSessionTests.java:252. Restoring the assertion turned the build green. Full gradle build from clients/java is BUILD SUCCESSFUL.
Client.Java-031
| Field | Value |
|---|---|
| Severity | Low |
| Category | mxaccessgw conventions |
| Location | clients/java/README.md:13,17,26 |
| Status | Resolved |
Description: The README prose at lines 13–26 introduces the subprojects as mxgateway-client and mxgateway-cli (the old short names) when discussing the layout. Those are no longer the actual subproject names — settings.gradle declares zb-mom-ww-mxgateway-client / zb-mom-ww-mxgateway-cli. The prose works as a naming mnemonic, but it confuses anyone trying to map README descriptions to actual gradle output, IDE project trees, or the e2e script.
Recommendation: Either (a) update the prose to the full prefixed names, or (b) clarify in a one-line note: "The subprojects are zb-mom-ww-mxgateway-client and zb-mom-ww-mxgateway-cli; this README refers to them by their short suffixes below for readability." (a) is more honest; (b) preserves readability at the cost of one extra concept.
Resolution: 2026-05-24 — Took option (a): updated the README layout-section prose in clients/java/README.md:17,22,26 (the mxgateway-client generates… paragraph, the mxgateway-client exposes… paragraph, and the mxgateway-cli depends on mxgateway-client… paragraph) to use the full prefixed zb-mom-ww-mxgateway-client and zb-mom-ww-mxgateway-cli names, matching the layout block at lines 13–14 and settings.gradle. Reflows the final paragraph slightly because the prefixed names push past the existing 80-column wrap; content is unchanged otherwise. Doc-only change.