mxaccessgw/code-reviews/Client.Java/findings.md
Joseph Doherty a0203503a7 Code-review 2026-05-20 sweep: re-review at 1cd51bb, resolve 72 findings across all 11 modules
Re-reviewed every module/client against the 10-category checklist
(REVIEW-PROCESS.md) at commit 1cd51bb, filed 72 new findings, and
fixed them in three priority waves (3 High, 17 Medium, 52 Low).

Highs
- Server-017: enumerate AcknowledgeAlarm / QueryActiveAlarms in
  GatewayGrpcScopeResolver so non-admin keys can use them; document
  the mapping in docs/Authorization.md; add interceptor tests.
- Client.Java-013: add the five missing bulk-method stubs to the
  CLI FakeSession so the test module compiles on a clean tree.
- Client.Rust-013: fix the clippy::doc_lazy_continuation regression
  in generated tonic code by reformatting the ReadBulkCommand proto
  comment and scoping a #![allow(...)] to the generated submodules.

Mediums (highlights)
- Server: unify GatewaySession state-lock discipline (-015) and
  make DisposeAsync race-safe against in-flight CloseAsync (-016);
  add constraint-enforcement test coverage for the bulk-plan path
  (-021).
- Worker: introduce StaRuntimeShutdownException so RunAlarmPollLoop
  can distinguish graceful shutdown from a real STA-affinity
  violation (-016); have the watchdog skip StaHung while
  CurrentCommandCorrelationId is non-empty so a legitimate slow
  ReadBulk no longer self-faults (-017).
- Tests: add per-method round-trip + cancellation coverage for the
  11 GatewaySession bulk methods (-013); replace the real TCP probe
  in GalaxyHierarchyCacheTests with an IGalaxyRepository fake
  (-016).
- IntegrationTests: drive the StreamEvents writer in the live Write
  test and assert OnWriteComplete (-012); add live tests for
  Unadvise/RemoveItem/Unregister ordering, WriteSecured, and
  abnormal worker exit (-014).
- Worker.Tests: replace MxAccessSession reflection with an internal
  CreateForTesting factory (-016); cover WorkerCancel and
  unexpected-body envelope branches (-017).
- Client.Java: cancel MxEventStream when close() races
  beforeStart() (-014); return a CancellingCompletableFuture that
  actually forwards cancellation through .thenApply chains (-015).
- Client.Python: drop the silent localhost-plaintext downgrade in
  the CLI; require explicit --plaintext (-013).
- Client.Rust: stop bench-read-bulk from polluting success-latency
  histograms with failed-call durations (-015); add coverage for
  the five MalformedReply paths, the bulk-write helpers, the
  Error::Unavailable mapping, and the unary-fault path (-016).
- Contracts: extend docs/Contracts.md with the bulk read/write
  command family (-009).

Lows (highlights)
- Server: cap GalaxyGlobMatcher.RegexCache; align
  WorkerAlarmRpcDispatcher missing-session handling; drop the
  duplicate dashboard @page routes; refresh IAlarmRpcDispatcher
  XML doc.
- Worker: surface SetXmlAlarmQuery COM failures; remove dead
  subscriptionExpression / ExecutingCommand arms; preserve
  factory-supplied runtime sessions; split MxAlarmSnapshot.cs into
  three files.
- Tests: dispose the WebApplication in seven test classes; rebuild
  FakeWorkerProcess.WaitForExitAsync against a real TaskCompletionSource;
  switch the heartbeat-expires test to ManualTimeProvider;
  add InvariantCulture to the remaining DateTimeOffset.Parse sites;
  document GalaxyFilterInputSafetyTests in GatewayTesting.md.
- IntegrationTests: comment fixes, RecordingServerStreamWriter
  IDisposable, class-level [Trait], single-source ZB default
  connection string.
- Worker.Tests: replace silent-return gating with LiveMxAccessFact
  so absent env vars SKIP not pass; PascalCase rename of probe
  [Fact]s; deterministic deadline test; new frame-protocol error
  tests; ComputeTransitions diff-coverage; relocate dev-rig probes
  to Probes/.
- Contracts: add round-trip coverage and per-field redaction /
  Galaxy-identifier comments to the protos.
- Client.Dotnet: introduce clients/dotnet/Directory.Build.props so
  TreatWarningsAsErrors / analysers apply; document
  DiscoverHierarchyOptions and IMxGatewayCliClient; require typed
  bulk-read handles in CLI; surface AcknowledgeAlarm transport
  faults through Translate().
- Client.Go: kill dead code in alarms_test / fakeGalaxyServer /
  runWriteBulkVariant; document the six new subcommands in
  writeUsage; drain galaxy-watch events on limit; switch io.EOF
  comparisons to errors.Is.
- Client.Java: shared shutdown helpers + new shutdownTimeout
  option; regex-based credential redaction; Long.toUnsignedString
  for uint64 sequence; doc fixes.
- Client.Python: combine duplicate imports; add coverage for
  _percentile / bench-read-bulk / MAX_AGGREGATE_EVENTS /
  _api_key_from_env; populate pyproject metadata and ship py.typed.
- Client.Rust: expose next_correlation_id() so CLI ping/close
  stop hard-coding correlation IDs; resync RustClientDesign.md
  with the current Session / Error surface and CLI subcommand set.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-20 09:46:47 -04:00


Code Review — Client.Java

Field Value
Module clients/java
Reviewer Claude Code
Review date 2026-05-20
Commit reviewed 1cd51bb
Status Reviewed
Open findings 0

Checklist coverage

A second-pass review against commit 1cd51bb. Client.Java-001 through Client.Java-012 are unchanged from the prior pass; the table below records the new findings raised in this pass against the same checklist categories.

# Category Result
1 Correctness & logic bugs Issues found: CLI MxEventStream(1024) capacity contradicts Javadoc/README "16-element buffer" claim (Client.Java-017); CLI DeployEvent.sequence printed with %d as signed long (Client.Java-020).
2 mxaccessgw conventions No new issues found in this pass.
3 Concurrency & thread safety Issues found: MxEventStream.beforeStart does not honour pre-start close() and leaks the gRPC call (Client.Java-014); MxGatewayChannels.toCompletable cancellation propagation is broken once the future is wrapped in thenApply (Client.Java-015).
4 Error handling & resilience Issue found: MxGatewaySecrets.redactCredentials only inspects whitespace-delimited tokens, so colon/comma/quote-embedded mxgw_ credentials leak through (Client.Java-018).
5 Security Issue found: same redactCredentials leak — see Client.Java-018.
6 Performance & resource management Issue found: client close() uses the connect timeout as its shutdown deadline (Client.Java-019).
7 Design-document adherence No new issues found in this pass.
8 Code organization & conventions Issue found: channel close() / closeAndAwaitTermination() are still duplicated verbatim across MxGatewayClient and GalaxyRepositoryClient despite Client.Java-009's stated resolution (Client.Java-016).
9 Testing coverage Issue found: CLI FakeSession does not implement the five bulk methods added to MxGatewayCliSession, so the CLI test module fails to compile against the current source (Client.Java-013).
10 Documentation & comments Issue found: docs claim a 16-element event-stream buffer that is actually 1024 in production (Client.Java-017).

Findings

Client.Java-001

Field Value
Severity Medium
Category Security
Location clients/java/mxgateway-client/src/main/java/com/dohertylan/mxgateway/client/MxGatewaySecrets.java:30-32
Status Resolved

Description: redactApiKey preserves the leading and trailing four characters of the key. A gateway API key has the form mxgw_<key-id>_<secret>; the last four characters belong to the secret portion, so the "redacted" form leaks 4 characters of the actual secret into logs, CLI JSON output (CommonOptions.redactedJsonMap), and MxGatewayClientOptions.toString(). CLAUDE.md states API keys must never reach logs.

Recommendation: Redact the secret entirely. Show only a stable non-secret prefix (e.g. the mxgw_<key-id>_ portion) and mask everything after it, or emit a fixed mxgw_*** form. Do not echo any trailing characters of the secret.

Resolution: (2026-05-18) Confirmed against source: the old substring(0,4) + stars + substring(len-4) echoed the last four secret characters. redactApiKey now masks the secret entirely: for gateway-shaped keys it returns the non-secret mxgw_<key-id>_ prefix followed by *** (locating the secret separator as the first _ after mxgw_); any non-gateway-shaped token returns <redacted>. No leading/trailing secret characters are ever emitted. The pre-existing MxGatewayCliTests.openSessionJsonRedactsApiKey assertion that hardcoded the leaky mxgw***********cret form was corrected to assert the masked mxgw_visible_*** form. Regression tests: MxGatewayMediumFindingsTests.redactApiKeyDoesNotLeakAnyCharacterOfTheSecret, redactApiKeyForNonGatewayShapedKeyRevealsNothing, redactApiKeyStillHandlesNullAndShortInput.
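A minimal sketch of the masking rule in the resolution. It assumes keys follow the mxgw_<key-id>_<secret> shape and that the secret begins after the first _ following mxgw_. ApiKeyRedaction is a hypothetical class. Returning <redacted> for null, empty, or key-id-less input is an assumption, not the client's confirmed behaviour:

```java
// Hypothetical sketch of the redaction rule; not the client's actual source.
final class ApiKeyRedaction {
    private static final String PREFIX = "mxgw_";

    private ApiKeyRedaction() {
    }

    /** Returns the non-secret mxgw_<key-id>_ prefix followed by ***, or <redacted>. */
    static String redactApiKey(String key) {
        if (key == null || key.isEmpty() || !key.startsWith(PREFIX)) {
            return "<redacted>"; // nothing gateway-shaped to show: reveal nothing
        }
        // The secret starts after the first '_' following the mxgw_ prefix.
        int separator = key.indexOf('_', PREFIX.length());
        if (separator <= PREFIX.length()) {
            // No separator, or an empty key-id segment: treat as non-gateway-shaped.
            return "<redacted>";
        }
        return key.substring(0, separator + 1) + "***"; // no secret character is ever echoed
    }
}
```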

Client.Java-002

Field Value
Severity Medium
Category Concurrency & thread safety
Location clients/java/mxgateway-client/src/main/java/com/dohertylan/mxgateway/client/MxEventStream.java:31,66-92
Status Resolved

Description: The next field is a plain (non-volatile) instance field, and MxEventStream exposes no thread-confinement guarantee. More concretely, a queue-overflow offer() and a close() offer(END) can interleave so the overflow exception is enqueued after END and never observed — the contract that "next() throws after overflow" is not guaranteed once close() has been called.

Recommendation: Document single-consumer-thread usage explicitly in the Javadoc, and serialise terminal state transitions (overflow vs END vs close) behind a single guarded flag so the first terminal condition wins deterministically.

Resolution: (2026-05-18) Confirmed against source: the old offer() END-branch did queue.clear(); queue.offer(END) when full, so a close() arriving after an overflow wiped the already-enqueued overflow exception, leaving the consumer with a clean end-of-stream and the overflow silently lost. Terminal transitions are now serialised through a single terminate(MxGatewayException) method guarded by a terminated flag and a terminalLock; the first terminal condition wins and a later close()/END cannot overwrite a published overflow fault. The Javadoc now explicitly documents that the iterator methods are single-consumer-only while close() is safe from any thread. Regression tests: MxGatewayMediumFindingsTests.eventStreamOverflowExceptionSurvivesASubsequentClose (deterministic) and eventStreamConcurrentOverflowAndCloseAlwaysTerminate (300-iteration race stress).
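The first-terminal-wins rule can be shown in isolation. The sketch below is hypothetical. TerminalState is an invented name, and RuntimeException stands in for MxGatewayException. The event queue and END marker that the real MxEventStream manages are omitted, so only the guarded transition remains:

```java
// Hypothetical sketch of a guarded terminal transition; not the client's actual source.
final class TerminalState {
    private final Object terminalLock = new Object();
    private boolean terminated;     // guarded by terminalLock
    private RuntimeException fault; // guarded by terminalLock; null after a clean END

    /** Publishes the first terminal condition; returns false if one was already published. */
    boolean terminate(RuntimeException faultOrNull) {
        synchronized (terminalLock) {
            if (terminated) {
                return false; // a later close()/END cannot overwrite an earlier fault
            }
            terminated = true;
            fault = faultOrNull;
            return true;
        }
    }

    /** close() is simply a clean terminal condition and is safe from any thread. */
    void close() {
        terminate(null);
    }

    /** Consumer side: rethrows a published fault, otherwise reports whether the stream ended. */
    boolean checkTerminated() {
        synchronized (terminalLock) {
            if (fault != null) {
                throw fault;
            }
            return terminated;
        }
    }
}
```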

Client.Java-003

Field Value
Severity Medium
Category mxaccessgw conventions
Location clients/java/mxgateway-client/src/main/java/com/dohertylan/mxgateway/client/MxGatewayClient.java:119-140
Status Resolved

Description: OpenSessionReply carries gateway_protocol_version (proto field 8), and MxGatewayClientVersion.GATEWAY_PROTOCOL_VERSION exists so the client can reject incompatible generated-code inputs. The client never reads reply.getGatewayProtocolVersion() nor compares it against the compiled-in version. A client built against an older/newer contract issues commands blindly and fails with confusing downstream errors instead of a clear version-mismatch failure.

Recommendation: In openSession/openSessionRaw, compare reply.getGatewayProtocolVersion() with MxGatewayClientVersion.gatewayProtocolVersion() and throw a typed MxGatewayException on mismatch.

Resolution: (2026-05-18) Confirmed against source: neither openSessionRaw nor openSessionAsync read getGatewayProtocolVersion(). Added a private ensureGatewayProtocolCompatible helper, called from both openSessionRaw and openSessionAsync, that throws MxGatewayException with a clear mismatch message when the gateway reports a non-zero version differing from MxGatewayClientVersion.gatewayProtocolVersion(). A gateway that leaves the field unset (value 0, e.g. an older gateway) is accepted unchanged for backward compatibility. clients/java/README.md documents the new fail-fast check. Regression tests: MxGatewayMediumFindingsTests.openSessionRejectsIncompatibleGatewayProtocolVersion and openSessionAcceptsMatchingOrUnsetGatewayProtocolVersion.
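A sketch of the compatibility rule as described: accept an unset 0 or an exact match, and fail fast otherwise. The int parameters and the IllegalStateException (in place of MxGatewayException) are assumptions that let the example stand alone. The real helper reads the reported version from OpenSessionReply:

```java
// Hypothetical sketch of the version check; not the client's actual source.
final class ProtocolCheck {
    private ProtocolCheck() {
    }

    /**
     * Accepts an unset version (0, e.g. an older gateway) or an exact match;
     * anything else fails fast with a clear mismatch message.
     */
    static void ensureGatewayProtocolCompatible(int reported, int compiledIn) {
        if (reported == 0 || reported == compiledIn) {
            return;
        }
        throw new IllegalStateException(
            "Gateway protocol version mismatch: gateway reports " + reported
                + ", client was built for " + compiledIn);
    }
}
```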

Client.Java-004

Field Value
Severity Medium
Category Correctness & logic bugs
Location clients/java/mxgateway-client/src/main/java/com/dohertylan/mxgateway/client/MxGatewaySession.java:114-120,157-163,191-197
Status Resolved

Description: register, addItem, and addItem2 check reply.hasRegister()/hasAddItem() and otherwise fall back to reply.getReturnValue().getInt32Value(). If the gateway returns a reply with neither the typed payload nor a return_value set, the method silently returns 0 — indistinguishable from a legitimate handle of 0. This masks a contract violation rather than surfacing it.

Recommendation: If the expected typed payload is absent and no return_value is present, throw MxGatewayException (protocol violation) instead of returning 0.

Resolution: (2026-05-18) Confirmed against source: all three methods returned reply.getReturnValue().getInt32Value() (which yields 0 for an unset message field) when the typed payload was absent. Each method now guards the fallback with reply.hasReturnValue() and throws MxGatewayException describing the protocol violation when neither the typed payload nor a return_value is present. The legitimate return_value fallback is preserved. Regression tests: MxGatewayMediumFindingsTests.registerThrowsWhenReplyHasNeitherTypedPayloadNorReturnValue, addItemThrowsWhenReplyHasNeitherTypedPayloadNorReturnValue, and addItemStillHonoursReturnValueFallback.
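The guarded fallback can be expressed as follows. OptionalInt models the presence checks: reply.hasRegister() or reply.hasAddItem() for the typed payload, and reply.hasReturnValue() for the fallback. HandleResolution and resolveHandle are hypothetical names, and IllegalStateException stands in for MxGatewayException:

```java
import java.util.OptionalInt;

// Hypothetical sketch of the guarded handle fallback; not the client's actual source.
final class HandleResolution {
    private HandleResolution() {
    }

    /** typedHandle models the typed payload; returnValue models return_value.int32_value. */
    static int resolveHandle(String operation, OptionalInt typedHandle, OptionalInt returnValue) {
        if (typedHandle.isPresent()) {
            return typedHandle.getAsInt();
        }
        if (returnValue.isPresent()) {
            return returnValue.getAsInt(); // legitimate return_value fallback is preserved
        }
        // Neither payload is present: surface the protocol violation instead of returning 0.
        throw new IllegalStateException(
            operation + " reply carried neither a typed payload nor a return_value");
    }
}
```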

Client.Java-005

Field Value
Severity Medium
Category Error handling & resilience
Location clients/java/mxgateway-client/src/main/java/com/dohertylan/mxgateway/client/MxGatewaySession.java:92-105
Status Resolved

Description: close() delegates to closeRaw(), which performs a network RPC. When MxGatewaySession is used in try-with-resources and the body throws, a failure inside closeSession (e.g. WORKER_UNAVAILABLE) throws from close() and replaces the original exception as the propagated throwable (the body exception becomes a suppressed exception) — a known try-with-resources footgun for I/O-performing close().

Recommendation: Either make close() swallow/log close-time failures (keeping closeRaw() for callers who want the result), or document clearly that close() performs a network call that can throw.

Resolution: (2026-05-18) Confirmed against source: close() called closeRaw() directly, so a CloseSession RPC failure propagated out of try-with-resources and replaced the body exception. close() now catches MxGatewayException from closeRaw() and logs it at WARNING via System.Logger instead of rethrowing, so a close-time failure never masks the body exception. closeRaw() is unchanged and still throws for callers who want to observe the close result. The behavior change and the recommendation to use closeRaw() for explicit close handling are documented in clients/java/README.md and the close() Javadoc. Regression tests: MxGatewayMediumFindingsTests.closeSuppressesCloseTimeFailureInsteadOfMaskingBodyException and closeRawStillSurfacesCloseTimeFailureForCallersWhoWantIt.
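A minimal sketch of the close()/closeRaw() split, with the CloseSession RPC replaced by a stub that always fails. SketchSession is hypothetical, and IllegalStateException stands in for MxGatewayException. The logging uses the JDK's System.Logger, as the resolution describes:

```java
// Hypothetical sketch of close() vs closeRaw(); the RPC is stubbed to always fail.
class SketchSession implements AutoCloseable {
    private static final System.Logger LOG = System.getLogger(SketchSession.class.getName());

    /** Stand-in for the CloseSession RPC; throws to simulate a close-time failure. */
    void closeRaw() {
        throw new IllegalStateException("WORKER_UNAVAILABLE");
    }

    /** Never throws, so a try-with-resources body exception is not replaced. */
    @Override
    public void close() {
        try {
            closeRaw();
        } catch (IllegalStateException e) {
            LOG.log(System.Logger.Level.WARNING, "CloseSession failed during close()", e);
        }
    }
}
```

With this shape, a body exception propagates unchanged and carries no suppressed close-time exception. Callers who need the close result call closeRaw() directly.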

Client.Java-006

Field Value
Severity Low
Category Performance & resource management
Location clients/java/mxgateway-client/src/main/java/com/dohertylan/mxgateway/client/MxGatewayClient.java:323-328, clients/java/mxgateway-client/src/main/java/com/dohertylan/mxgateway/client/GalaxyRepositoryClient.java:279-284
Status Resolved

Description: close() (the AutoCloseable method invoked by try-with-resources) calls only ownedChannel.shutdown() and returns immediately without awaiting termination. In-flight calls and Netty event-loop threads may still be running when the caller assumes the resource is released. closeAndAwaitTermination() does it correctly but is not the method try-with-resources uses, and the README examples all rely on try-with-resources.

Recommendation: Have close() await termination for a bounded time and shutdownNow() on timeout (the logic already in closeAndAwaitTermination()), or document that try-with-resources callers should call closeAndAwaitTermination().

Resolution: (2026-05-18) Confirmed against source: both MxGatewayClient.close() and GalaxyRepositoryClient.close() called only ownedChannel.shutdown(). close() in both clients now performs the bounded-wait logic previously only in closeAndAwaitTermination(): it shuts the channel down, waits up to the configured connect timeout for graceful termination, and calls shutdownNow() on timeout. Because close() cannot throw a checked exception, an InterruptedException while awaiting is handled by forcibly shutting the channel down and restoring the thread interrupt flag. closeAndAwaitTermination() is retained unchanged for callers who want the checked, blocking-aware variant. clients/java/README.md documents the new try-with-resources close() semantics.
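The bounded-wait logic follows a standard shutdown pattern. In the sketch below, ExecutorService stands in for gRPC's ManagedChannel, which exposes the same shutdown(), awaitTermination(long, TimeUnit), and shutdownNow() trio. shutdownBounded is a hypothetical helper name, and the timeout is whatever the caller passes in:

```java
import java.time.Duration;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical helper; ExecutorService stands in for ManagedChannel.
final class BoundedShutdown {
    private BoundedShutdown() {
    }

    /** Graceful shutdown bounded by timeout; returns true if termination completed in time. */
    static boolean shutdownBounded(ExecutorService resource, Duration timeout) {
        resource.shutdown();
        try {
            if (resource.awaitTermination(timeout.toNanos(), TimeUnit.NANOSECONDS)) {
                return true;
            }
            resource.shutdownNow(); // timed out: force termination
            return false;
        } catch (InterruptedException e) {
            // close() cannot throw a checked exception: force shutdown and restore the flag.
            resource.shutdownNow();
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```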

Client.Java-007

Field Value
Severity Low
Category Testing coverage
Location clients/java/mxgateway-client/src/test/java/com/dohertylan/mxgateway/client/
Status Resolved

Description: The alarm surface — acknowledgeAlarm/acknowledgeAlarmAsync/queryActiveAlarms and MxGatewayActiveAlarmsSubscription — has zero test coverage. TLS channel construction, the async streamEventsAsync path, MxGatewayEventSubscription pre-start cancellation, and MxEventStream queue overflow are likewise untested. JavaClientDesign.md explicitly lists async stream-observer cancellation and status/error mapping as required tests.

Recommendation: Add in-process gRPC tests for the alarm RPCs, the async streaming/subscription cancellation paths, and at least one TLS-config construction test.

Resolution: (2026-05-18) Confirmed against source: no test referenced acknowledgeAlarm, queryActiveAlarms, streamEventsAsync, TLS construction, or MxEventStream overflow. Added MxGatewayLowFindingsTests (12 tests) covering: acknowledgeAlarm/acknowledgeAlarmAsync (success, typed protocol-failure, async transport-failure normalisation), queryActiveAlarms observer delivery, MxGatewayActiveAlarmsSubscription and MxGatewayEventSubscription pre-start cancellation, streamEventsAsync observer delivery, MxEventStream queue overflow surfacing MxGatewayException, TLS channel construction (missing CA file rejected with a typed exception, system-trust path builds cleanly), and the Client.Java-008 async-validator normalisation. While writing the TLS test a latent bug was found: a missing/unreadable CA file makes GrpcSslContexts throw IllegalArgumentException (not SSLException), which the old catch (SSLException) let escape unwrapped — the catch in the shared channel builder was broadened to also wrap RuntimeException so callers always see one typed MxGatewayException.

Client.Java-008

Field Value
Severity Low
Category Error handling & resilience
Location clients/java/mxgateway-client/src/main/java/com/dohertylan/mxgateway/client/MxGatewayClient.java:298-304
Status Resolved

Description: acknowledgeAlarmAsync and openSessionAsync apply ensureProtocolSuccess inside thenApply. If that validator throws a non-MxGatewayException RuntimeException it is wrapped by CompletionException with no fromGrpc normalisation, unlike the synchronous paths which normalise via try/catch. The async and sync error surfaces are therefore inconsistent.

Recommendation: Wrap the thenApply body so any non-MxGatewayException is routed through MxGatewayErrors.fromGrpc, matching the synchronous methods.

Resolution: (2026-05-18) Confirmed against source: the thenApply validators in openSessionAsync, invokeAsync, and acknowledgeAlarmAsync were not normalised — in practice the gateway's own validators (ensureProtocolSuccess, ensureMxAccessSuccess, ensureGatewayProtocolCompatible) only ever throw MxGatewayException, but a stray non-MxGatewayException RuntimeException (e.g. an NPE from a malformed reply) would surface raw inside CompletionException. Added MxGatewayChannels.normalisingValidator(operation, fn): it rethrows MxGatewayException unchanged and routes any other RuntimeException through MxGatewayErrors.fromGrpc, matching the synchronous try/catch paths. All three async thenApply sites now use it. Regression test: MxGatewayLowFindingsTests.openSessionAsyncNormalisesNonGatewayRuntimeExceptionFromValidator.
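A sketch of the wrapper's contract, assuming it is a plain Function decorator applied inside thenApply. GatewayException stands in for MxGatewayException. The wrapping constructor stands in for MxGatewayErrors.fromGrpc, whose real signature is not reproduced here:

```java
import java.util.function.Function;

// Hypothetical stand-in for MxGatewayException.
class GatewayException extends RuntimeException {
    GatewayException(String message, Throwable cause) {
        super(message, cause);
    }
}

// Hypothetical sketch of normalisingValidator; not the client's actual source.
final class Validators {
    private Validators() {
    }

    /** Wraps a reply validator so every failure surfaces as a GatewayException. */
    static <T, R> Function<T, R> normalisingValidator(String operation, Function<T, R> fn) {
        return reply -> {
            try {
                return fn.apply(reply);
            } catch (GatewayException e) {
                throw e; // already typed: pass through unchanged
            } catch (RuntimeException e) {
                // Stand-in for routing through MxGatewayErrors.fromGrpc.
                throw new GatewayException(operation + " failed: " + e, e);
            }
        };
    }
}
```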

Client.Java-009

Field Value
Severity Low
Category Code organization & conventions
Location clients/java/mxgateway-client/src/main/java/com/dohertylan/mxgateway/client/GalaxyRepositoryClient.java:310-391, clients/java/mxgateway-client/src/main/java/com/dohertylan/mxgateway/client/MxGatewayClient.java:346-413
Status Resolved

Description: createChannel, withDeadline, withStreamDeadline, and toCompletable are duplicated nearly verbatim across MxGatewayClient and GalaxyRepositoryClient (~80 lines). A fix to one will not propagate to the other.

Recommendation: Extract the channel-builder and future-adaptor helpers into a shared package-private utility class.

Resolution: (2026-05-18) Confirmed against source: the four helpers were duplicated near-verbatim. Added a package-private MxGatewayChannels utility class holding createChannel(options, tlsErrorPrefix), withDeadline(stub, options), withStreamDeadline(stub, options), toCompletable(future, operation), and the new normalisingValidator helper (Client.Java-008). Both MxGatewayClient and GalaxyRepositoryClient now delegate to it and their private copies were deleted, so a future fix lives in one place. Behavior is unchanged except the operation-name carried into MxGatewayErrors.fromGrpc is now the specific RPC name instead of the generic "async call"/"galaxy async call". Verified by the full existing async test suite plus the new MxGatewayLowFindingsTests.

Client.Java-010

Field Value
Severity Low
Category Documentation & comments
Location clients/java/mxgateway-client/src/main/java/com/dohertylan/mxgateway/client/MxGatewayClient.java:269-272, clients/java/README.md:76
Status Resolved

Description: The acknowledgeAlarm Javadoc states the gateway authenticates against an invoke:alarm-ack scope, and the README states the Galaxy Repository requires a metadata:read scope. CLAUDE.md's documented scope set names neither — the Javadoc/README assert a scope contract the project's own auth documentation does not corroborate.

Recommendation: Reconcile the scope names with src/MxGateway.Server/Security/ and CLAUDE.md; correct the Javadoc/README to the actual scope strings, or fix CLAUDE.md if sub-scopes were genuinely added.

Resolution: (2026-05-18) Partially re-triaged. Verified against src/MxGateway.Server/Security/Authorization/GatewayScopes.cs and GatewayGrpcScopeResolver.cs: the canonical scope catalog is session:open, session:close, invoke:read, invoke:write, invoke:secure, events:read, metadata:read, and admin. (a) The README's metadata:read for the Galaxy Repository is correct: TestConnectionRequest, GetLastDeployTimeRequest, DiscoverHierarchyRequest, and WatchDeployEventsRequest all resolve to GatewayScopes.MetadataRead, so no change is needed. CLAUDE.md's prose lists only coarse scope groups, but the canonical resolver does define metadata:read. (b) The acknowledgeAlarm Javadoc's invoke:alarm-ack is wrong — no such scope exists. AcknowledgeAlarmRequest and QueryActiveAlarmsRequest are not special-cased in GatewayGrpcScopeResolver, so they fall through to the _ => GatewayScopes.Admin default arm and require the admin scope. The Javadoc was corrected to state the admin scope; queryActiveAlarms did not assert a scope and was left unchanged. The README does not mention alarms, so no README change was required.

Client.Java-011

Field Value
Severity Low
Category Performance & resource management
Location clients/java/mxgateway-client/src/main/java/com/dohertylan/mxgateway/client/MxEventStream.java:37-63
Status Resolved

Description: The event stream relies on default gRPC auto-inbound flow control: the async stub auto-requests messages, so the server can push faster than the 16-element bounded queue drains. A momentarily slow consumer triggers queue overflow and an immediate stream-fault cancel. This is consistent with the documented fail-fast event-backpressure design, but the client never applies real flow control, so even brief consumer stalls kill the subscription.

Recommendation: Confirm fail-fast is intended (it appears to be); if so, document it on MxEventStream so callers know a slow consumer terminates the stream. Optionally expose the queue capacity or opt-in flow control.

Resolution: (2026-05-18) Confirmed fail-fast is intended — CLAUDE.md ("fail-fast event backpressure") and docs/DesignDecisions.md make a slow consumer losing its subscription a deliberate v1 design choice, so this is documentation-only, not a behavior bug. Added an explicit "Backpressure (fail-fast)" section to the MxEventStream class Javadoc explaining that the adaptor uses gRPC auto-inbound flow control with a fixed 16-element buffer and no client flow control, that a consumer stall long enough to fill the buffer triggers an overflow that cancels the subscription and surfaces an MxGatewayException, and that consumers must drain promptly and be ready to resubscribe with a resume cursor. clients/java/README.md carries the same caveat. The queue capacity was intentionally left non-configurable to keep the v1 surface aligned with the gateway design; overflow behavior is covered by MxGatewayLowFindingsTests.eventStreamQueueOverflowSurfacesExceptionFromNext.
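The fail-fast shape can be modelled without gRPC as a fixed-capacity queue whose producer cancels the upstream call the first time offer() fails. Several details are simplifying assumptions: the FailFastBuffer name, the Runnable cancel hook, the non-blocking next(), and draining already-buffered events before the overflow is reported. In MxEventStream the producer is the gRPC observer and next() blocks:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical fail-fast buffer: a full queue cancels upstream instead of applying flow control.
final class FailFastBuffer<T> {
    private final BlockingQueue<T> queue;
    private final Runnable cancelUpstream;
    private volatile boolean overflowed;

    FailFastBuffer(int capacity, Runnable cancelUpstream) {
        this.queue = new ArrayBlockingQueue<>(capacity);
        this.cancelUpstream = cancelUpstream;
    }

    /** Producer side (the gRPC observer's onNext in the real adaptor). */
    void onNext(T event) {
        if (overflowed) {
            return; // already failed: drop late events
        }
        if (!queue.offer(event)) {
            overflowed = true;
            cancelUpstream.run(); // the consumer stalled long enough to fill the buffer
        }
    }

    /** Consumer side: returns buffered events, then reports the overflow; null if empty. */
    T next() {
        T event = queue.poll();
        if (event != null) {
            return event;
        }
        if (overflowed) {
            throw new IllegalStateException("event stream overflowed; resubscribe with a resume cursor");
        }
        return null; // nothing buffered yet (the real adaptor blocks here instead)
    }
}
```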

Client.Java-012

Field Value
Severity Low
Category Correctness & logic bugs
Location clients/java/mxgateway-cli/src/main/java/com/dohertylan/mxgateway/cli/MxGatewayCli.java:667-674
Status Resolved

Description: CommonOptions.resolved() mutates this (resolvedApiKey, resolvedTimeout) and returns this, but toClientOptions() and redactedJsonMap() read those mutated fields. If redactedJsonMap() is ever called before resolved(), it silently emits empty-string defaults. The "return this after mutating" pattern is fragile and surprising.

Recommendation: Make resolved() return an immutable resolved value object, or compute resolvedApiKey/resolvedTimeout lazily in their getters so call ordering cannot produce stale output.

Resolution: (2026-05-18) Confirmed against source: resolved() populated the resolvedApiKey/resolvedTimeout mutable fields and toClientOptions()/redactedJsonMap() read them, so calling either before resolved() emitted stale empty/30s defaults. The two mutable fields were removed and replaced with side-effect-free accessor methods resolvedApiKey() and resolvedTimeout() that compute their value on each call (API key from --api-key or the --api-key-env variable; timeout via parseDuration). toClientOptions() and redactedJsonMap() now call those accessors directly, so call ordering can no longer produce stale output. resolved() is retained as a no-op returning this purely for call-site readability (common.resolved()), with its Javadoc updated to state resolution is now lazy. Pure-refactor with no runtime-behavior change for the existing call order, so no new test was added; covered by the existing MxGatewayCliTests JSON-redaction and option-parsing tests.
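A sketch of the lazy-accessor shape. It assumes --api-key takes precedence over --api-key-env, and it injects the environment lookup so the example is testable. SketchOptions is hypothetical. resolvedTimeout() uses a deliberately simplified parser in place of the CLI's parseDuration:

```java
import java.time.Duration;
import java.util.function.UnaryOperator;

// Hypothetical option holder: resolved values are computed on every call, never cached.
final class SketchOptions {
    String apiKey;          // --api-key
    String apiKeyEnv;       // --api-key-env (name of an environment variable)
    String timeout = "30s"; // --timeout
    private final UnaryOperator<String> env; // environment lookup, injectable for tests

    SketchOptions(UnaryOperator<String> env) {
        this.env = env;
    }

    /** Side-effect free: call order can no longer produce stale output. */
    String resolvedApiKey() {
        if (apiKey != null && !apiKey.isEmpty()) {
            return apiKey;
        }
        String fromEnv = apiKeyEnv == null ? null : env.apply(apiKeyEnv);
        return fromEnv == null ? "" : fromEnv;
    }

    Duration resolvedTimeout() {
        // Simplified stand-in for parseDuration: accepts only whole seconds like "30s".
        if (!timeout.endsWith("s")) {
            throw new IllegalArgumentException("unsupported duration: " + timeout);
        }
        return Duration.ofSeconds(Long.parseLong(timeout.substring(0, timeout.length() - 1)));
    }
}
```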

Client.Java-013

Field Value
Severity High
Category Testing coverage
Location clients/java/mxgateway-cli/src/test/java/com/dohertylan/mxgateway/cli/MxGatewayCliTests.java:212-304, clients/java/mxgateway-cli/src/main/java/com/dohertylan/mxgateway/cli/MxGatewayCli.java:1214-1244
Status Resolved

Description: MxGatewayCliSession in MxGatewayCli.java:1214 was extended in commit f220908 (the "bulk read/write CLI subcommands" change) with five new abstract methods — readBulk, writeBulk, write2Bulk, writeSecuredBulk, writeSecured2Bulk. The test-only FakeSession in MxGatewayCliTests.java:212 still only implements the original set (register/addItem/advise/writeRaw/subscribeBulk/unsubscribeBulk/streamEventsAfter) and is declared a concrete (non-abstract) class. A clean compile of mxgateway-cli's test source set therefore fails: a concrete implementer that omits abstract interface methods is a compile error. The stale .class files under build/classes/java/test/ predate the interface change (dated 2026-05-20 03:38 vs CLI source dated 2026-05-20 05:06), which is why the issue is not visible until the next clean build. gradle test (or any CI pipeline that does not retain incremental state) will fail to build the CLI test module. The CLAUDE.md source-update workflow row "When source code changes, build and test the affected component" was not honoured for this CLI contract change.

Recommendation: Add the five missing @Override implementations to FakeSession (stubs returning empty lists are fine — only subscribeBulk/unsubscribeBulk are exercised by the existing tests, and the new bulk subcommands have no dedicated CLI tests yet). Optionally also add at least one CLI-level test for read-bulk, write-bulk, and the bench-read-bulk subcommands to keep parity with the .NET / Go / Rust CLI smoke matrix.

Resolution: 2026-05-20 — Added the five missing @Override stubs (readBulk, writeBulk, write2Bulk, writeSecuredBulk, writeSecured2Bulk) to FakeSession in clients/java/mxgateway-cli/src/test/java/com/dohertylan/mxgateway/cli/MxGatewayCliTests.java, each returning an empty ArrayList<> to match the interface return types (List<BulkReadResult> / List<BulkWriteResult>) without throwing. Imported BulkReadResult, BulkWriteResult, WriteBulkEntry, Write2BulkEntry, WriteSecuredBulkEntry, WriteSecured2BulkEntry from mxaccess_gateway.v1.MxaccessGateway. GrpcMxGatewayCliSession in MxGatewayCli.java is the only other implementer and already provides the methods (the source change that introduced the contract added them there). Verified with gradle clean followed by gradle :mxgateway-cli:compileTestJava and gradle :mxgateway-cli:test from clients/java, both BUILD SUCCESSFUL. No new CLI-level tests for the bulk subcommands were added — that follow-up is tracked separately and out of scope for this unblock-compilation fix.

Client.Java-014

Field Value
Severity Medium
Category Concurrency & thread safety
Location clients/java/mxgateway-client/src/main/java/com/dohertylan/mxgateway/client/MxEventStream.java:59-65,117-124
Status Resolved

Description: MxEventStream.observer().beforeStart simply assigns requestStream without checking the closed flag, while close() reads requestStream after setting closed = true. If close() runs before the gRPC call has attached its ClientCallStreamObserver (a real race when callers cancel immediately after subscribing — e.g. construct, then close in a finally block when an unrelated setup step throws), then at close time requestStream is null, so stream.cancel(...) is skipped. beforeStart then fires later, stores the live requestStream, and never observes closed — the underlying gRPC call leaks open and continues delivering events to an MxEventStream whose consumer has stopped iterating. The sibling DeployEventStream.beforeStart already does the correct thing (if (closed.get()) { requestStream.cancel(...); }); the two adaptors should behave identically.

Recommendation: Mirror DeployEventStream's pattern in MxEventStream.beforeStart: after storing requestStream, check the closed flag and cancel the stream eagerly if a prior close() has already fired. Add a regression test analogous to GalaxyRepositoryClientTests.deployEventStreamCloseBeforeBeforeStartCancelsStream to lock in the behavior.

Resolution: 2026-05-20 — Mirrored DeployEventStream.beforeStart in MxEventStream.beforeStart: after storing the ClientCallStreamObserver, the observer now reads the closed flag and calls requestStream.cancel("client cancelled event stream", null) when a prior close() already fired, closing the close/beforeStart race that previously leaked the underlying gRPC call. The fix uses the existing volatile boolean closed field (already established as a happens-before publisher by close() setting it before reading requestStream); no field shape changes were needed. clients/java/README.md documents the new safe-close-before-beforeStart contract. Regression test: MxGatewayMediumFindingsTests.mxEventStreamCloseBeforeBeforeStartCancelsStream (mirrors GalaxyRepositoryClientTests.deployEventStreamCloseBeforeBeforeStartCancelsStream).
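The close()/beforeStart() handshake can be sketched with two volatile fields. Each side writes its own field before reading the other's, so at least one side observes the other and cancels. A double cancel is possible and is assumed to be harmless. CancellableCall and StreamHandle are hypothetical; CancellableCall stands in for ClientCallStreamObserver.cancel(String, Throwable):

```java
// Hypothetical stand-in for ClientCallStreamObserver.cancel(String, Throwable).
interface CancellableCall {
    void cancel(String message, Throwable cause);
}

// Hypothetical sketch of the close()/beforeStart() handshake; not the client's actual source.
final class StreamHandle {
    private volatile boolean closed;
    private volatile CancellableCall requestStream;

    /** Called once the call is attached, possibly after close() has already run. */
    void beforeStart(CancellableCall call) {
        requestStream = call;
        if (closed) {
            // close() already ran and found no call to cancel: cancel it eagerly now.
            call.cancel("client cancelled event stream", null);
        }
    }

    /** Safe from any thread, before or after beforeStart(). */
    void close() {
        closed = true;
        CancellableCall call = requestStream;
        if (call != null) {
            call.cancel("client cancelled event stream", null);
        }
    }
}
```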

Client.Java-015

| Field | Value |
|---|---|
| Severity | Medium |
| Category | Concurrency & thread safety |
| Location | clients/java/mxgateway-client/src/main/java/com/dohertylan/mxgateway/client/MxGatewayChannels.java:112-138, MxGatewayClient.java:183-191,224-232,322-329, GalaxyRepositoryClient.java:164-170,212-214 |
| Status | Resolved |

Description: MxGatewayChannels.toCompletable registers a whenComplete on the local target future to forward cancellation to the source gRPC ListenableFuture. Every caller — openSessionAsync, invokeAsync, acknowledgeAlarmAsync, discoverHierarchyPageAsync, getLastDeployTimeAsync — then chains .thenApply(normalisingValidator(...)) or .thenApply(::getOk) and returns the chained future to the user. CompletableFuture.thenApply returns a new future whose cancellation does not propagate back to the source target. Cancelling the user-facing future therefore never sets target.isCancelled() == true, so source.cancel(true) is never invoked and the underlying gRPC call continues until its deadline expires. The JavaClientDesign.md "Streaming" section explicitly says "Stream cancellation should call ClientCall.cancel" — the same expectation reasonably applies to the unary *Async surface.

Recommendation: Either return target directly from each *Async method (and inline the validator into the FutureCallback.onSuccess path so no thenApply is needed), or attach the cancellation listener to the final returned future. The cleanest fix is to have MxGatewayChannels.toCompletable return a future that wraps the validator internally and registers whenComplete on the final future. Add a regression test that cancels the user-facing future and verifies the gRPC call was cancelled (e.g. via a ServerCallStreamObserver.setOnCancelHandler latch).

Resolution: 2026-05-20 — Fixed by inlining the reply validator into MxGatewayChannels.toCompletable so the user-visible future is the same future cancellation is bound to: added a new toCompletable(source, operation, validator) overload that runs the validator inside the FutureCallback.onSuccess path (normalising non-MxGatewayException RuntimeExceptions through MxGatewayErrors.fromGrpc, matching the existing synchronous try/catch). Replaced the previous whenComplete-based cancellation listener with a small CancellingCompletableFuture<T> subclass whose cancel(boolean) forwards to the source ListenableFuture.cancel(...) unconditionally, so even the no-validator overload propagates cancellation deterministically (the whenComplete listener only fired when target.isCancelled() was already true, which is exactly the case thenApply broke). Updated MxGatewayClient.openSessionAsync, MxGatewayClient.invokeAsync, MxGatewayClient.acknowledgeAlarmAsync, GalaxyRepositoryClient.testConnectionAsync, and GalaxyRepositoryClient.getLastDeployTimeAsync to use the new validator overload directly (no .thenApply chain). GalaxyRepositoryClient.discoverHierarchyAsync is paged via thenCompose, so it now publishes the current in-flight page future via an AtomicReference and returns a top-level CompletableFuture whose overridden cancel(boolean) cancels whichever page is currently outstanding. clients/java/README.md documents the new cancellation contract: cancelling any *Async future aborts the underlying gRPC call. Regression tests: MxGatewayMediumFindingsTests.invokeAsyncCancellationCancelsUnderlyingGrpcCall (full in-process gRPC test using ServerCallStreamObserver.setOnCancelHandler to latch when the server observes RPC cancellation), toCompletableValidatorOverloadForwardsCancellationToSource, and toCompletableNoValidatorOverloadForwardsCancellationToSource (unit-level proofs that both MxGatewayChannels.toCompletable overloads forward cancel(true) to the source ListenableFuture).
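The cancellation-forwarding shape described above can be sketched with the standard library alone. CancellingFuture and FutureBridgeSketch are hypothetical names. A plain CompletableFuture stands in for the Guava ListenableFuture returned by the gRPC stub. The real overload also normalises validator failures through MxGatewayErrors.fromGrpc, which is omitted here.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Future;
import java.util.function.Function;

// A CompletableFuture whose cancel() is forwarded to the source call.
// Any java.util.concurrent.Future stands in for the gRPC ListenableFuture.
final class CancellingFuture<T> extends CompletableFuture<T> {
    private final Future<?> source;

    CancellingFuture(Future<?> source) {
        this.source = source;
    }

    @Override
    public boolean cancel(boolean mayInterruptIfRunning) {
        // Forward unconditionally instead of relying on a listener that
        // only fires once this future is already observed as cancelled.
        source.cancel(mayInterruptIfRunning);
        return super.cancel(mayInterruptIfRunning);
    }
}

final class FutureBridgeSketch {
    private FutureBridgeSketch() {}

    // Runs the reply validator on the success path, so the future handed to the
    // caller is the same one that forwards cancellation (no thenApply hop).
    static <R, T> CancellingFuture<T> toCompletable(CompletableFuture<R> source,
                                                    Function<? super R, ? extends T> validator) {
        CancellingFuture<T> target = new CancellingFuture<>(source);
        source.whenComplete((reply, error) -> {
            if (error != null) {
                target.completeExceptionally(error); // no-op if target was cancelled first
                return;
            }
            try {
                target.complete(validator.apply(reply));
            } catch (RuntimeException e) {
                target.completeExceptionally(e);
            }
        });
        return target;
    }
}
```

The finding itself is easy to reproduce with this sketch. Cancelling a future derived through thenApply leaves the source running, while cancelling the returned CancellingFuture cancels the source.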

Client.Java-016

| Field | Value |
|---|---|
| Severity | Low |
| Category | Code organization & conventions |
| Location | clients/java/mxgateway-client/src/main/java/com/dohertylan/mxgateway/client/MxGatewayClient.java:361-391, GalaxyRepositoryClient.java:285-315 |
| Status | Resolved |

Description: Client.Java-009 introduced MxGatewayChannels to deduplicate createChannel, withDeadline, withStreamDeadline, and toCompletable. The two close() / closeAndAwaitTermination() methods — added shortly after to fix Client.Java-006 — were not extracted along with them. The 30-line bodies of MxGatewayClient.close() + closeAndAwaitTermination() and GalaxyRepositoryClient.close() + closeAndAwaitTermination() are now duplicated verbatim, including the awaitTermination(connectTimeout) semantic (see Client.Java-019), the InterruptedException handling, and the ownedChannel == null guard. A fix to one path (e.g. introducing a dedicated shutdownTimeout option) will silently miss the other.

Recommendation: Move the shutdown logic into MxGatewayChannels.shutdown(ManagedChannel channel, MxGatewayClientOptions options) and MxGatewayChannels.shutdownAndAwaitTermination(...). Have both clients delegate to it. Same recommendation applies to the duplicated MxGatewayAuthInterceptor construction in the two constructors (MxGatewayClient(Channel, ...) and GalaxyRepositoryClient(Channel, ...)).

Resolution: 2026-05-20 — Extracted the duplicated shutdown logic into MxGatewayChannels.shutdown(ManagedChannel, MxGatewayClientOptions) and MxGatewayChannels.shutdownAndAwaitTermination(ManagedChannel, MxGatewayClientOptions). Both helpers handle the ownedChannel == null no-op, the orderly-shutdown / awaitTermination / shutdownNow-on-timeout escalation, and the InterruptedException-restoring-the-interrupt-flag path. MxGatewayClient.close()/closeAndAwaitTermination() and GalaxyRepositoryClient.close()/closeAndAwaitTermination() are now one-liners that delegate to the shared helpers, so a future change (such as Client.Java-019's shutdownTimeout) lives in one place. Unused java.util.concurrent.TimeUnit imports were removed from both clients. The constructor-level MxGatewayAuthInterceptor duplication noted in the recommendation was left in place — it is a single intercept call per constructor (2 lines) versus the 30-line shutdown duplication that was the actual maintenance hazard. Regression tests: MxGatewayLowFindingsIITests.sharedShutdownHelperIsNoOpForNullChannel (covers the null-channel guard), shutdownAndAwaitTerminationHonoursShutdownTimeoutNotConnectTimeout, and shutdownEscalatesToShutdownNowWhenTimeoutExceeded (cover the shared shutdown semantics; the second is also the Client.Java-019 regression).
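The shared escalation path can be sketched as below. The sketch uses an ExecutorService as a stand-in for io.grpc.ManagedChannel, since both expose shutdown(), awaitTermination(long, TimeUnit) and shutdownNow(). ShutdownSketch is a hypothetical name. The boolean return value is an illustrative addition, not the helpers' documented signature.

```java
import java.time.Duration;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.TimeUnit;

// Sketch of a single shared shutdown helper; ExecutorService stands in for ManagedChannel.
final class ShutdownSketch {
    private ShutdownSketch() {}

    /**
     * Orderly shutdown, wait up to shutdownTimeout, then escalate to shutdownNow().
     * Returns true if the resource terminated without escalation (illustrative only).
     */
    static boolean shutdownAndAwaitTermination(ExecutorService ownedChannel, Duration shutdownTimeout) {
        if (ownedChannel == null) {
            return true; // client was built over a caller-owned channel: nothing to do
        }
        ownedChannel.shutdown();
        try {
            if (ownedChannel.awaitTermination(shutdownTimeout.toMillis(), TimeUnit.MILLISECONDS)) {
                return true;
            }
            ownedChannel.shutdownNow(); // timeout exceeded: cancel whatever is still running
            return false;
        } catch (InterruptedException e) {
            ownedChannel.shutdownNow();
            Thread.currentThread().interrupt(); // restore the interrupt flag for the caller
            return false;
        }
    }
}
```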

Client.Java-017

| Field | Value |
|---|---|
| Severity | Low |
| Category | Documentation & comments |
| Location | clients/java/mxgateway-client/src/main/java/com/dohertylan/mxgateway/client/MxEventStream.java:25-36, clients/java/README.md:99-107 |
| Status | Resolved |

Description: The MxEventStream buffer allocated by MxGatewayClient.streamEvents was recently widened from 16 to 1024 elements (MxGatewayClient.streamEvents at line 268: new MxEventStream(1024)). The class-level Javadoc on MxEventStream still says "the gateway can push events faster than the consumer drains the bounded 16-element buffer", and clients/java/README.md line 103 says "uses gRPC's default auto-inbound flow control with a fixed 16-element buffer". The fail-fast event-backpressure contract (Client.Java-011 resolution) was written against the older capacity. The inline comment in MxGatewayClient.streamEvents even acknowledges the change ("A small queue overflows on any moderately active session; 1024 covers a realistic backlog"). Users of this surface will therefore reason about realistic backpressure budgets using the wrong number.

Recommendation: Update the MxEventStream Javadoc and the README to say "1024-element buffer" (or, since the capacity is a passed parameter, document it as a parameter rather than a constant). Consider exposing the capacity through MxGatewayClientOptions so callers can tune it per session.

Resolution: 2026-05-20 — Updated the MxEventStream class Javadoc and clients/java/README.md so both say "1024-element buffer" instead of the obsolete "16-element buffer". The Javadoc also notes that capacity is a constructor parameter and that the production caller (MxGatewayClient.streamEvents) passes 1024 to absorb the session-backlog replay burst, so readers understand the value is a deliberate choice rather than a constant. Exposing the capacity through MxGatewayClientOptions was intentionally left out of scope — the v1 design keeps the event-stream surface minimal and MxGatewayClient.streamEvents is the only caller; if a tuning need arises in v2 the existing constructor already accepts the capacity.
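Why the documented capacity matters can be shown with a hypothetical fail-fast bounded buffer whose capacity is a constructor parameter, as the updated Javadoc describes. The sketch assumes that "fail-fast" means the producer never blocks and the consumer gets an error once the buffer has overflowed. BoundedEventBuffer is not the real MxEventStream, and the actual Client.Java-011 signalling may differ.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical fail-fast buffer: capacity is supplied by the caller, not a constant.
final class BoundedEventBuffer<E> {
    private final BlockingQueue<E> queue;
    private volatile boolean overflowed;

    BoundedEventBuffer(int capacity) {
        this.queue = new ArrayBlockingQueue<>(capacity);
    }

    /** Producer side (e.g. a transport callback): never blocks, fails fast when full. */
    boolean offer(E event) {
        if (overflowed || !queue.offer(event)) {
            overflowed = true; // remember the overflow instead of stalling the producer
            return false;
        }
        return true;
    }

    /** Consumer side: drains buffered events, then reports the overflow. */
    E poll() {
        E next = queue.poll();
        if (next == null && overflowed) {
            throw new IllegalStateException("event buffer overflowed; consumer fell behind");
        }
        return next; // null simply means "nothing buffered yet"
    }

    int remainingCapacity() {
        return queue.remainingCapacity();
    }
}
```

With this shape, the number of events a slow consumer can fall behind before failing is exactly the constructor argument. That is why the Javadoc and README have to quote the value the production caller actually passes.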

Client.Java-018

| Field | Value |
|---|---|
| Severity | Low |
| Category | Security |
| Location | clients/java/mxgateway-client/src/main/java/com/dohertylan/mxgateway/client/MxGatewaySecrets.java:54-66 |
| Status | Resolved |

Description: redactCredentials(value) splits its input on \\s+ (whitespace) and only redacts whitespace-delimited tokens that start with mxgw_ or equal bearer (case-insensitive). gRPC Status.getDescription() strings, log lines, and proto error messages can carry credentials separated by colons (Bearer:mxgw_id_secret), commas (token=mxgw_id_secret,scope=...), single quotes ('mxgw_id_secret'), parentheses ((mxgw_id_secret)), or embedded in URLs/paths — all of which leave the mxgw_ token attached to a non-whitespace neighbour and survive redaction. MxGatewayErrors.fromGrpc is the primary consumer; a gateway error description like authentication failed: 'mxgw_id_secret' would round-trip the secret into the resulting MxGatewayAuthenticationException message.

Recommendation: Replace the whitespace-split scrub with a regex-based pass that matches mxgw_[A-Za-z0-9_-]+ anywhere in the string and substitutes <redacted>; also redact Bearer\s+\S+ as a unit so the token after Bearer is masked regardless of the surrounding punctuation. Cover with a fixture-style test alongside MxGatewayFixtureTests.grpcAuthErrorsAreClassifiedAndRedacted that asserts a quoted or comma-delimited credential is fully masked.

Resolution: 2026-05-20 — Replaced the whitespace-split scrub with two compiled Pattern regexes: mxgw_[A-Za-z0-9_-]+ matches any gateway-shaped credential anywhere in the string regardless of surrounding punctuation, and (?i)bearer\s+\S+ masks an authorization-header style Bearer <token> as a unit so a non-mxgw bearer token cannot leak either. The mxgw pass runs first, so the bearer pass observes Bearer <redacted> for the common combined case and renders it idempotently. Regression tests in MxGatewayFixtureTests: redactCredentialsHandlesNonWhitespaceDelimitedTokens exercises single-quoted, double-quoted, comma-delimited, colon-delimited, parenthesised, URL-embedded, and bearer-header credentials; redactCredentialsLeavesBenignContentAlone confirms strings without credentials and a null input are unchanged.
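The two-pass scrub can be sketched with java.util.regex using the two patterns named above. RedactionSketch is a hypothetical holder class. Normalising the keyword to "Bearer" in the replacement text is an assumption of this sketch.

```java
import java.util.regex.Pattern;

// Sketch of the two-pass credential scrub; names are illustrative.
final class RedactionSketch {
    private static final Pattern MXGW_TOKEN = Pattern.compile("mxgw_[A-Za-z0-9_-]+");
    private static final Pattern BEARER = Pattern.compile("(?i)bearer\\s+\\S+");
    private static final String REDACTED = "<redacted>";

    private RedactionSketch() {}

    static String redactCredentials(String value) {
        if (value == null) {
            return null;
        }
        // Pass 1: mask gateway-shaped credentials regardless of neighbouring punctuation.
        String scrubbed = MXGW_TOKEN.matcher(value).replaceAll(REDACTED);
        // Pass 2: mask whatever follows "Bearer <whitespace>", including non-mxgw tokens.
        return BEARER.matcher(scrubbed).replaceAll("Bearer " + REDACTED);
    }
}
```

Because the mxgw pass runs first, a combined "Bearer mxgw_..." header is already "Bearer <redacted>" when the bearer pass sees it. The bearer pass then rewrites it to the same text.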

Client.Java-019

| Field | Value |
|---|---|
| Severity | Low |
| Category | Performance & resource management |
| Location | clients/java/mxgateway-client/src/main/java/com/dohertylan/mxgateway/client/MxGatewayClient.java:362-391, GalaxyRepositoryClient.java:286-315 |
| Status | Resolved |

Description: Both clients' close() / closeAndAwaitTermination() use options.connectTimeout() as the upper bound on awaitTermination. The connectTimeout semantically describes how long the client will wait to establish the channel, not how long it should wait for in-flight calls and the Netty event loop to drain after shutdown(). With the default 10s connect timeout, shutting down a client with a long-running unary call already in flight will silently escalate to shutdownNow() and forcibly cancel it before the call's own deadline expires, defeating the deadline contract on withDeadline. Conversely, a caller who sets a small connectTimeout (e.g. 500 ms for a health probe) inherits an aggressively short shutdown deadline they probably did not intend.

Recommendation: Introduce a dedicated shutdownTimeout on MxGatewayClientOptions (defaulting to, e.g., 5 to 10 s, independent of connectTimeout) and use it in close() and closeAndAwaitTermination(). Document the precedence in the Javadoc. This pairs naturally with the Client.Java-016 deduplication fix.

Resolution: 2026-05-20 — Added a dedicated shutdownTimeout Duration on MxGatewayClientOptions (builder method shutdownTimeout(Duration), accessor shutdownTimeout(), default 10 s), independent of connectTimeout. Both shared shutdown helpers introduced for Client.Java-016 (MxGatewayChannels.shutdown and shutdownAndAwaitTermination) call options.shutdownTimeout() as the awaitTermination upper bound, so a small connectTimeout (e.g. a 500 ms health-probe timeout) no longer forces a premature shutdownNow() on in-flight calls. The new option is reflected in toString() and documented on both helpers and the close()/closeAndAwaitTermination() Javadoc on both clients; clients/java/README.md notes the default and the independence from connectTimeout. Regression tests in MxGatewayLowFindingsIITests: shutdownAndAwaitTerminationHonoursShutdownTimeoutNotConnectTimeout (a 50 ms connect timeout + 1 s shutdown timeout + 200 ms graceful-termination channel never escalates to shutdownNow()), shutdownEscalatesToShutdownNowWhenTimeoutExceeded (a stuck channel beyond the shutdown timeout is forcibly shut down), and shutdownTimeoutDefaultIsTenSecondsIndependentOfConnectTimeout (the default holds even when connectTimeout is small).
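An options holder whose shutdownTimeout defaults independently of connectTimeout can be sketched as follows. OptionsSketch is hypothetical and models only these two fields; the real MxGatewayClientOptions carries more settings. Both 10 s defaults follow the values stated in this finding.

```java
import java.time.Duration;
import java.util.Objects;

// Hypothetical, trimmed-down options holder with two independent timeouts.
final class OptionsSketch {
    private final Duration connectTimeout;
    private final Duration shutdownTimeout;

    private OptionsSketch(Builder builder) {
        this.connectTimeout = builder.connectTimeout;
        this.shutdownTimeout = builder.shutdownTimeout;
    }

    static Builder builder() {
        return new Builder();
    }

    Duration connectTimeout() {
        return connectTimeout;
    }

    /** Upper bound for awaitTermination during close; unrelated to connectTimeout. */
    Duration shutdownTimeout() {
        return shutdownTimeout;
    }

    @Override
    public String toString() {
        return "OptionsSketch{connectTimeout=" + connectTimeout + ", shutdownTimeout=" + shutdownTimeout + "}";
    }

    static final class Builder {
        private Duration connectTimeout = Duration.ofSeconds(10);
        private Duration shutdownTimeout = Duration.ofSeconds(10); // own default, not derived

        Builder connectTimeout(Duration value) {
            connectTimeout = Objects.requireNonNull(value, "connectTimeout");
            return this;
        }

        Builder shutdownTimeout(Duration value) {
            shutdownTimeout = Objects.requireNonNull(value, "shutdownTimeout");
            return this;
        }

        OptionsSketch build() {
            return new OptionsSketch(this);
        }
    }
}
```

For example, a health probe that sets connectTimeout(Duration.ofMillis(500)) still gets the 10 s shutdown budget unless it sets shutdownTimeout explicitly.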

Client.Java-020

| Field | Value |
|---|---|
| Severity | Low |
| Category | Correctness & logic bugs |
| Location | clients/java/mxgateway-cli/src/main/java/com/dohertylan/mxgateway/cli/MxGatewayCli.java:244-254, galaxy_repository.proto:94 |
| Status | Resolved |

Description: galaxy_repository.proto defines DeployEvent.sequence as uint64; the protobuf Java mapping projects that to a signed long. The CLI's text-mode galaxy-watch output prints it as "seq=%d ...", which interprets the value as signed. For genuine wraparound this is implausible (deploy sequences will not reach 2^63), but the broader pattern is brittle: any unsigned proto field printed via %d will display incorrectly past the signed boundary. The JSON path uses protoJson(event) which formats unsigned longs as numeric strings via JsonFormat, so JSON output is correct; only the text mode is at risk.

Recommendation: Print the sequence with Long.toUnsignedString(event.getSequence()) (or switch the text format to %s and pass the unsigned-string conversion). The same rule should apply to any other uint64 proto fields that surface in CLI text output.

Resolution: 2026-05-20 — Updated the galaxy-watch text-mode out.printf in MxGatewayCli.GalaxyWatchCommand.call() to use %s for the sequence field and pass Long.toUnsignedString(event.getSequence()), so deploy sequences past 2^63 render as their correct unsigned decimal string instead of a negative signed long. The JSON path through protoJson(event) was already correct (proto JsonFormat emits unsigned longs as decimal strings) and was left unchanged. An inline comment near the printf documents the unsigned-uint64 contract so the next person editing the format string knows not to switch back to %d. Regression test: MxGatewayCliTests.deployEventSequenceRendersAsUnsignedForHighUint64 exercises the format string with the max-uint64 bit pattern (-1L) and asserts the output contains seq=18446744073709551615 and does not contain seq=-1.
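The rendering rule can be checked in isolation with the sketch below. UnsignedFormatSketch is a hypothetical helper, not the CLI's code. Long.toUnsignedString reinterprets the 64-bit pattern as unsigned, so -1L prints as 2^64 - 1, while %d keeps the signed reading.

```java
import java.util.Locale;

// Hypothetical helper contrasting signed and unsigned rendering of a proto uint64.
final class UnsignedFormatSketch {
    private UnsignedFormatSketch() {}

    // uint64 contract: format as an unsigned decimal string via %s.
    static String formatSequence(long sequence) {
        return String.format(Locale.ROOT, "seq=%s", Long.toUnsignedString(sequence));
    }

    // The brittle variant: %d treats values at or above 2^63 as negative.
    static String formatSequenceSigned(long sequence) {
        return String.format(Locale.ROOT, "seq=%d", sequence);
    }
}
```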