Re-reviewed every module/client against the 10-category checklist
(REVIEW-PROCESS.md) at commit 1cd51bb, filed 72 new findings, and
fixed them in three priority waves (3 High, 17 Medium, 52 Low).
Highs
- Server-017: enumerate AcknowledgeAlarm / QueryActiveAlarms in
GatewayGrpcScopeResolver so non-admin keys can use them; document
the mapping in docs/Authorization.md; add interceptor tests.
- Client.Java-013: add the five missing bulk-method stubs to the
CLI FakeSession so the test module compiles on a clean tree.
- Client.Rust-013: fix the clippy::doc_lazy_continuation regression
in generated tonic code by reformatting the ReadBulkCommand proto
comment and scoping a #![allow(...)] to the generated submodules.
Mediums (highlights)
- Server: unify GatewaySession state-lock discipline (-015) and
make DisposeAsync race-safe against in-flight CloseAsync (-016);
add constraint-enforcement test coverage for the bulk-plan path
(-021).
- Worker: introduce StaRuntimeShutdownException so RunAlarmPollLoop
can distinguish graceful shutdown from a real STA-affinity
violation (-016); have the watchdog skip StaHung while
CurrentCommandCorrelationId is non-empty so a legitimate slow
ReadBulk no longer self-faults (-017).
- Tests: add per-method round-trip + cancellation coverage for the
11 GatewaySession bulk methods (-013); replace the real TCP probe
in GalaxyHierarchyCacheTests with an IGalaxyRepository fake
(-016).
- IntegrationTests: drive the StreamEvents writer in the live Write
test and assert OnWriteComplete (-012); add live tests for
Unadvise/RemoveItem/Unregister ordering, WriteSecured, and
abnormal worker exit (-014).
- Worker.Tests: replace MxAccessSession reflection with an internal
CreateForTesting factory (-016); cover WorkerCancel and
unexpected-body envelope branches (-017).
- Client.Java: cancel MxEventStream when close() races
beforeStart() (-014); return a CancellingCompletableFuture that
actually forwards cancellation through .thenApply chains (-015).
- Client.Python: drop the silent localhost-plaintext downgrade in
the CLI; require explicit --plaintext (-013).
- Client.Rust: stop bench-read-bulk from polluting success-latency
histograms with failed-call durations (-015); add coverage for
the five MalformedReply paths, the bulk-write helpers, the
Error::Unavailable mapping, and the unary-fault path (-016).
- Contracts: extend docs/Contracts.md with the bulk read/write
command family (-009).
Lows (highlights)
- Server: cap GalaxyGlobMatcher.RegexCache; align
WorkerAlarmRpcDispatcher missing-session handling; drop the
duplicate dashboard @page routes; refresh IAlarmRpcDispatcher
XML doc.
- Worker: surface SetXmlAlarmQuery COM failures; remove dead
subscriptionExpression / ExecutingCommand arms; preserve
factory-supplied runtime sessions; split MxAlarmSnapshot.cs into
three files.
- Tests: dispose the WebApplication in seven test classes; rebuild
FakeWorkerProcess.WaitForExitAsync against a real
TaskCompletionSource; switch the heartbeat-expires test to
ManualTimeProvider; add InvariantCulture to the remaining
DateTimeOffset.Parse sites; document GalaxyFilterInputSafetyTests in
GatewayTesting.md.
- IntegrationTests: comment fixes, RecordingServerStreamWriter
IDisposable, class-level [Trait], single-source ZB default
connection string.
- Worker.Tests: replace silent-return gating with LiveMxAccessFact
so absent env vars SKIP not pass; PascalCase rename of probe
[Fact]s; deterministic deadline test; new frame-protocol error
tests; ComputeTransitions diff-coverage; relocate dev-rig probes
to Probes/.
- Contracts: add round-trip coverage and per-field redaction /
Galaxy-identifier comments to the protos.
- Client.Dotnet: introduce clients/dotnet/Directory.Build.props so
TreatWarningsAsErrors / analysers apply; document
DiscoverHierarchyOptions and IMxGatewayCliClient; require typed
bulk-read handles in CLI; surface AcknowledgeAlarm transport
faults through Translate().
- Client.Go: kill dead code in alarms_test / fakeGalaxyServer /
runWriteBulkVariant; document the six new subcommands in
writeUsage; drain galaxy-watch events on limit; switch io.EOF
comparisons to errors.Is.
- Client.Java: shared shutdown helpers + new shutdownTimeout
option; regex-based credential redaction; Long.toUnsignedString
for uint64 sequence; doc fixes.
- Client.Python: combine duplicate imports; add coverage for
_percentile / bench-read-bulk / MAX_AGGREGATE_EVENTS /
_api_key_from_env; populate pyproject metadata and ship py.typed.
- Client.Rust: expose next_correlation_id() so CLI ping/close
stop hard-coding correlation IDs; resync RustClientDesign.md
with the current Session / Error surface and CLI subcommand set.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Code Review — Client.Java
| Field | Value |
|---|---|
| Module | clients/java |
| Reviewer | Claude Code |
| Review date | 2026-05-20 |
| Commit reviewed | 1cd51bb |
| Status | Reviewed |
| Open findings | 0 |
Checklist coverage
A second-pass review against commit 1cd51bb. Client.Java-001 through
Client.Java-012 are unchanged from the prior pass; the table below records the
new findings raised in this pass against the same checklist categories.
| # | Category | Result |
|---|---|---|
| 1 | Correctness & logic bugs | Issues found: CLI MxEventStream(1024) capacity contradicts Javadoc/README "16-element buffer" claim (Client.Java-017); CLI DeployEvent.sequence printed with %d as signed long (Client.Java-020). |
| 2 | mxaccessgw conventions | No new issues found in this pass. |
| 3 | Concurrency & thread safety | Issues found: MxEventStream.beforeStart does not honour pre-start close() and leaks the gRPC call (Client.Java-014); MxGatewayChannels.toCompletable cancellation propagation is broken once the future is wrapped in thenApply (Client.Java-015). |
| 4 | Error handling & resilience | Issue found: MxGatewaySecrets.redactCredentials only inspects whitespace-delimited tokens, so colon/comma/quote-embedded mxgw_ credentials leak through (Client.Java-018). |
| 5 | Security | Issue found: same redactCredentials leak — see Client.Java-018. |
| 6 | Performance & resource management | Issue found: client close() uses the connect timeout as its shutdown deadline (Client.Java-019). |
| 7 | Design-document adherence | No new issues found in this pass. |
| 8 | Code organization & conventions | Issue found: channel close() / closeAndAwaitTermination() are still duplicated verbatim across MxGatewayClient and GalaxyRepositoryClient despite Client.Java-009's stated resolution (Client.Java-016). |
| 9 | Testing coverage | Issue found: CLI FakeSession does not implement the five bulk methods added to MxGatewayCliSession, so the CLI test module fails to compile against the current source (Client.Java-013). |
| 10 | Documentation & comments | Issue found: docs claim a 16-element event-stream buffer that is actually 1024 in production (Client.Java-017). |
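The signed-print issue in row 1 (Client.Java-020) comes from how Java carries a protobuf uint64: it arrives as a `long`, so `%d` renders the top half of the range as negative numbers. A minimal sketch of the difference follows; the class name `SequenceFormatSketch` and the sample values are illustrative and not taken from the CLI source.

```java
import java.util.Locale;

final class SequenceFormatSketch {
    /** Renders the value the way a %d format does: as a signed long. */
    static String signed(long sequence) {
        return String.format(Locale.ROOT, "%d", sequence);
    }

    /** Renders the same 64 bits as an unsigned decimal. */
    static String unsigned(long sequence) {
        return Long.toUnsignedString(sequence);
    }
}
```

For values below 2^63 the two renderings agree, which is presumably why the defect is easy to miss in tests that use small sequence numbers.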
Findings
Client.Java-001
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Security |
| Location | clients/java/mxgateway-client/src/main/java/com/dohertylan/mxgateway/client/MxGatewaySecrets.java:30-32 |
| Status | Resolved |
Description: redactApiKey preserves the leading and trailing four characters of the key. A gateway API key has the form mxgw_<key-id>_<secret>; the last four characters belong to the secret portion, so the "redacted" form leaks 4 characters of the actual secret into logs, CLI JSON output (CommonOptions.redactedJsonMap), and MxGatewayClientOptions.toString(). CLAUDE.md states API keys must never reach logs.
Recommendation: Redact the secret entirely. Show only a stable non-secret prefix (e.g. the mxgw_<key-id>_ portion) and mask everything after it, or emit a fixed mxgw_*** form. Do not echo any trailing characters of the secret.
Resolution: (2026-05-18) Confirmed against source: the old substring(0,4) + stars + substring(len-4) echoed the last four secret characters. redactApiKey now masks the secret entirely: for gateway-shaped keys it returns the non-secret mxgw_<key-id>_ prefix followed by *** (locating the secret separator as the first _ after mxgw_); any non-gateway-shaped token returns <redacted>. No leading/trailing secret characters are ever emitted. The pre-existing MxGatewayCliTests.openSessionJsonRedactsApiKey assertion that hardcoded the leaky mxgw***********cret form was corrected to assert the masked mxgw_visible_*** form. Regression tests: MxGatewayMediumFindingsTests.redactApiKeyDoesNotLeakAnyCharacterOfTheSecret, redactApiKeyForNonGatewayShapedKeyRevealsNothing, redactApiKeyStillHandlesNullAndShortInput.
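The masking rule described in this resolution could look roughly like the sketch below. It is not the actual `MxGatewaySecrets` code: the class name is hypothetical, and returning `<redacted>` for null or truncated input is an assumption, since the resolution only states that those inputs are still handled.

```java
final class RedactSketch {
    private static final String PREFIX = "mxgw_";

    /** Keeps only the non-secret mxgw_<key-id>_ prefix; everything else is masked. */
    static String redactApiKey(String key) {
        // Assumption: null and non-gateway-shaped tokens reveal nothing at all.
        if (key == null || !key.startsWith(PREFIX)) {
            return "<redacted>";
        }
        // The secret separator is the first '_' after the mxgw_ prefix.
        int separator = key.indexOf('_', PREFIX.length());
        if (separator < 0) {
            return "<redacted>";
        }
        return key.substring(0, separator + 1) + "***";
    }
}
```

Because the separator is located from the left, underscores inside the secret portion do not widen the visible prefix.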
Client.Java-002
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Concurrency & thread safety |
| Location | clients/java/mxgateway-client/src/main/java/com/dohertylan/mxgateway/client/MxEventStream.java:31,66-92 |
| Status | Resolved |
Description: The next field is a plain (non-volatile) instance field, and MxEventStream exposes no thread-confinement guarantee. More concretely, a queue-overflow offer() and a close() offer(END) can interleave so the overflow exception is enqueued after END and never observed — the contract that "next() throws after overflow" is not guaranteed once close() has been called.
Recommendation: Document single-consumer-thread usage explicitly in the Javadoc, and serialise terminal state transitions (overflow vs END vs close) behind a single guarded flag so the first terminal condition wins deterministically.
Resolution: (2026-05-18) Confirmed against source: the old offer() END-branch did queue.clear(); queue.offer(END) when full, so a close() arriving after an overflow wiped the already-enqueued overflow exception, leaving the consumer with a clean end-of-stream and the overflow silently lost. Terminal transitions are now serialised through a single terminate(MxGatewayException) method guarded by a terminated flag and a terminalLock; the first terminal condition wins and a later close()/END cannot overwrite a published overflow fault. The Javadoc now explicitly documents that the iterator methods are single-consumer-only while close() is safe from any thread. Regression tests: MxGatewayMediumFindingsTests.eventStreamOverflowExceptionSurvivesASubsequentClose (deterministic) and eventStreamConcurrentOverflowAndCloseAlwaysTerminate (300-iteration race stress).
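The "first terminal condition wins" rule can be sketched as a single guarded transition. The field names `terminated` and `terminalLock` come from the resolution above; the class name, the `fault()` accessor, and the use of `RuntimeException` in place of `MxGatewayException` are assumptions made to keep the example self-contained.

```java
final class TerminalStateSketch {
    private final Object terminalLock = new Object();
    private boolean terminated;      // guarded by terminalLock
    private RuntimeException fault;  // guarded by terminalLock; null means clean end-of-stream

    /** Returns true only for the caller that published the terminal state. */
    boolean terminate(RuntimeException cause) {
        synchronized (terminalLock) {
            if (terminated) {
                return false; // an earlier terminal condition already won; keep it
            }
            terminated = true;
            fault = cause;
            return true;
        }
    }

    /** The fault recorded by the winning terminate call, or null for a clean end. */
    RuntimeException fault() {
        synchronized (terminalLock) {
            return fault;
        }
    }
}
```

With this shape, a `close()` that calls `terminate(null)` after an overflow has been published is a no-op, so the overflow fault stays observable.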
Client.Java-003
| Field | Value |
|---|---|
| Severity | Medium |
| Category | mxaccessgw conventions |
| Location | clients/java/mxgateway-client/src/main/java/com/dohertylan/mxgateway/client/MxGatewayClient.java:119-140 |
| Status | Resolved |
Description: OpenSessionReply carries gateway_protocol_version (proto field 8), and MxGatewayClientVersion.GATEWAY_PROTOCOL_VERSION exists so the client can reject incompatible generated-code inputs. The client never reads reply.getGatewayProtocolVersion() nor compares it against the compiled-in version. A client built against an older/newer contract issues commands blindly and fails with confusing downstream errors instead of a clear version-mismatch failure.
Recommendation: In openSession/openSessionRaw, compare reply.getGatewayProtocolVersion() with MxGatewayClientVersion.gatewayProtocolVersion() and throw a typed MxGatewayException on mismatch.
Resolution: (2026-05-18) Confirmed against source: neither openSessionRaw nor openSessionAsync read getGatewayProtocolVersion(). Added a private ensureGatewayProtocolCompatible helper, called from both openSessionRaw and openSessionAsync, that throws MxGatewayException with a clear mismatch message when the gateway reports a non-zero version differing from MxGatewayClientVersion.gatewayProtocolVersion(). A gateway that leaves the field unset (value 0, e.g. an older gateway) is accepted unchanged for backward compatibility. clients/java/README.md documents the new fail-fast check. Regression tests: MxGatewayMediumFindingsTests.openSessionRejectsIncompatibleGatewayProtocolVersion and openSessionAcceptsMatchingOrUnsetGatewayProtocolVersion.
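The compatibility rule in this resolution (reject a non-zero mismatch, accept an unset value of 0) reduces to a small check. The sketch below assumes the version is carried as an `int` and uses `IllegalStateException` as a stand-in for `MxGatewayException`; neither detail is confirmed by the review.

```java
final class ProtocolCheckSketch {
    /**
     * Throws when the gateway reports a version that differs from the compiled-in one.
     * A reported value of 0 means the field was left unset and is accepted.
     */
    static void ensureGatewayProtocolCompatible(int reported, int compiledIn) {
        if (reported != 0 && reported != compiledIn) {
            throw new IllegalStateException(
                "gateway protocol version " + reported
                    + " is incompatible with client protocol version " + compiledIn);
        }
    }
}
```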
Client.Java-004
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Correctness & logic bugs |
| Location | clients/java/mxgateway-client/src/main/java/com/dohertylan/mxgateway/client/MxGatewaySession.java:114-120,157-163,191-197 |
| Status | Resolved |
Description: register, addItem, and addItem2 check reply.hasRegister()/hasAddItem() and otherwise fall back to reply.getReturnValue().getInt32Value(). If the gateway returns a reply with neither the typed payload nor a return_value set, the method silently returns 0 — indistinguishable from a legitimate handle of 0. This masks a contract violation rather than surfacing it.
Recommendation: If the expected typed payload is absent and no return_value is present, throw MxGatewayException (protocol violation) instead of returning 0.
Resolution: (2026-05-18) Confirmed against source: all three methods returned reply.getReturnValue().getInt32Value() (which yields 0 for an unset message field) when the typed payload was absent. Each method now guards the fallback with reply.hasReturnValue() and throws MxGatewayException describing the protocol violation when neither the typed payload nor a return_value is present. The legitimate return_value fallback is preserved. Regression tests: MxGatewayMediumFindingsTests.registerThrowsWhenReplyHasNeitherTypedPayloadNorReturnValue, addItemThrowsWhenReplyHasNeitherTypedPayloadNorReturnValue, and addItemStillHonoursReturnValueFallback.
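The guarded fallback can be illustrated without the generated protobuf types. In this sketch, two `OptionalInt` values stand in for "typed payload present" and "`return_value` present"; the class and method names are hypothetical, and `IllegalStateException` substitutes for `MxGatewayException`.

```java
import java.util.OptionalInt;

final class HandleReplySketch {
    /**
     * Prefers the typed payload, falls back to return_value, and treats
     * "neither present" as a protocol violation instead of a handle of 0.
     */
    static int handleFrom(OptionalInt typedPayload, OptionalInt returnValue) {
        if (typedPayload.isPresent()) {
            return typedPayload.getAsInt();
        }
        if (returnValue.isPresent()) {
            return returnValue.getAsInt(); // legitimate fallback, may genuinely be 0
        }
        throw new IllegalStateException(
            "protocol violation: reply has neither a typed payload nor a return_value");
    }
}
```

The point of the guard is that an explicit `return_value` of 0 and a missing `return_value` are no longer conflated.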
Client.Java-005
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Error handling & resilience |
| Location | clients/java/mxgateway-client/src/main/java/com/dohertylan/mxgateway/client/MxGatewaySession.java:92-105 |
| Status | Resolved |
Description: close() delegates to closeRaw(), which performs a network RPC. When MxGatewaySession is used in try-with-resources and the body throws, a failure inside closeSession (e.g. WORKER_UNAVAILABLE) throws from close() and replaces the original exception as the propagated throwable (the body exception becomes a suppressed exception) — a known try-with-resources footgun for I/O-performing close().
Recommendation: Either make close() swallow/log close-time failures (keeping closeRaw() for callers who want the result), or document clearly that close() performs a network call that can throw.
Resolution: (2026-05-18) Confirmed against source: close() called closeRaw() directly, so a CloseSession RPC failure propagated out of try-with-resources and replaced the body exception. close() now catches MxGatewayException from closeRaw() and logs it at WARNING via System.Logger instead of rethrowing, so a close-time failure never masks the body exception. closeRaw() is unchanged and still throws for callers who want to observe the close result. The behavior change and the recommendation to use closeRaw() for explicit close handling are documented in clients/java/README.md and the close() Javadoc. Regression tests: MxGatewayMediumFindingsTests.closeSuppressesCloseTimeFailureInsteadOfMaskingBodyException and closeRawStillSurfacesCloseTimeFailureForCallersWhoWantIt.
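The split between a quiet `close()` and a throwing `closeRaw()` might be sketched as follows. This is an illustration only: the class is hypothetical, `closeRaw()` always fails here to model a close-time error, and `IllegalStateException` stands in for `MxGatewayException`.

```java
final class QuietCloseSketch implements AutoCloseable {
    private static final System.Logger LOG =
        System.getLogger(QuietCloseSketch.class.getName());

    /** Stand-in for the network close; always fails to model WORKER_UNAVAILABLE. */
    void closeRaw() {
        throw new IllegalStateException("WORKER_UNAVAILABLE");
    }

    /** Logs a close-time failure at WARNING instead of rethrowing it. */
    @Override
    public void close() {
        try {
            closeRaw();
        } catch (IllegalStateException e) {
            LOG.log(System.Logger.Level.WARNING, "close failed; continuing", e);
        }
    }
}
```

A caller that needs to observe the close result calls `closeRaw()` directly; a caller using try-with-resources sees only whatever the body threw.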
Client.Java-006
| Field | Value |
|---|---|
| Severity | Low |
| Category | Performance & resource management |
| Location | clients/java/mxgateway-client/src/main/java/com/dohertylan/mxgateway/client/MxGatewayClient.java:323-328, clients/java/mxgateway-client/src/main/java/com/dohertylan/mxgateway/client/GalaxyRepositoryClient.java:279-284 |
| Status | Resolved |
Description: close() (the AutoCloseable method invoked by try-with-resources) calls only ownedChannel.shutdown() and returns immediately without awaiting termination. In-flight calls and Netty event-loop threads may still be running when the caller assumes the resource is released. closeAndAwaitTermination() does it correctly but is not the method try-with-resources uses, and the README examples all rely on try-with-resources.
Recommendation: Have close() await termination for a bounded time and shutdownNow() on timeout (the logic already in closeAndAwaitTermination()), or document that try-with-resources callers should call closeAndAwaitTermination().
Resolution: (2026-05-18) Confirmed against source: both MxGatewayClient.close() and GalaxyRepositoryClient.close() called only ownedChannel.shutdown(). close() in both clients now performs the bounded-wait logic previously only in closeAndAwaitTermination(): it shuts the channel down, waits up to the configured connect timeout for graceful termination, and calls shutdownNow() on timeout. Because close() cannot throw a checked exception, an InterruptedException while awaiting is handled by forcibly shutting the channel down and restoring the thread interrupt flag. closeAndAwaitTermination() is retained unchanged for callers who want the checked, blocking-aware variant. clients/java/README.md documents the new try-with-resources close() semantics.
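The bounded-wait shutdown described here follows a common pattern. The sketch below uses a `java.util.concurrent.ExecutorService` as a stand-in for the gRPC `ManagedChannel`, which exposes the same `shutdown()` / `awaitTermination(...)` / `shutdownNow()` trio; the class and method names are hypothetical.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.TimeUnit;

final class BoundedCloseSketch {
    /** Graceful shutdown with a deadline, falling back to a forced shutdown. */
    static void closeBounded(ExecutorService channel, long timeout, TimeUnit unit) {
        channel.shutdown();
        try {
            if (!channel.awaitTermination(timeout, unit)) {
                channel.shutdownNow(); // deadline passed: stop waiting politely
            }
        } catch (InterruptedException e) {
            // close() cannot throw a checked exception, so force the shutdown
            // and restore the interrupt flag for the caller to observe.
            channel.shutdownNow();
            Thread.currentThread().interrupt();
        }
    }
}
```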
Client.Java-007
| Field | Value |
|---|---|
| Severity | Low |
| Category | Testing coverage |
| Location | clients/java/mxgateway-client/src/test/java/com/dohertylan/mxgateway/client/ |
| Status | Resolved |
Description: The alarm surface — acknowledgeAlarm/acknowledgeAlarmAsync/queryActiveAlarms and MxGatewayActiveAlarmsSubscription — has zero test coverage. TLS channel construction, the async streamEventsAsync path, MxGatewayEventSubscription pre-start cancellation, and MxEventStream queue overflow are likewise untested. JavaClientDesign.md explicitly lists async stream-observer cancellation and status/error mapping as required tests.
Recommendation: Add in-process gRPC tests for the alarm RPCs, the async streaming/subscription cancellation paths, and at least one TLS-config construction test.
Resolution: (2026-05-18) Confirmed against source: no test referenced acknowledgeAlarm, queryActiveAlarms, streamEventsAsync, TLS construction, or MxEventStream overflow. Added MxGatewayLowFindingsTests (12 tests) covering: acknowledgeAlarm/acknowledgeAlarmAsync (success, typed protocol-failure, async transport-failure normalisation), queryActiveAlarms observer delivery, MxGatewayActiveAlarmsSubscription and MxGatewayEventSubscription pre-start cancellation, streamEventsAsync observer delivery, MxEventStream queue overflow surfacing MxGatewayException, TLS channel construction (missing CA file rejected with a typed exception, system-trust path builds cleanly), and the Client.Java-008 async-validator normalisation. While writing the TLS test a latent bug was found: a missing/unreadable CA file makes GrpcSslContexts throw IllegalArgumentException (not SSLException), which the old catch (SSLException) let escape unwrapped — the catch in the shared channel builder was broadened to also wrap RuntimeException so callers always see one typed MxGatewayException.
Client.Java-008
| Field | Value |
|---|---|
| Severity | Low |
| Category | Error handling & resilience |
| Location | clients/java/mxgateway-client/src/main/java/com/dohertylan/mxgateway/client/MxGatewayClient.java:298-304 |
| Status | Resolved |
Description: acknowledgeAlarmAsync and openSessionAsync apply ensureProtocolSuccess inside thenApply. If that validator throws a non-MxGatewayException RuntimeException it is wrapped by CompletionException with no fromGrpc normalisation, unlike the synchronous paths which normalise via try/catch. The async and sync error surfaces are therefore inconsistent.
Recommendation: Wrap the thenApply body so any non-MxGatewayException is routed through MxGatewayErrors.fromGrpc, matching the synchronous methods.
Resolution: (2026-05-18) Confirmed against source: the thenApply validators in openSessionAsync, invokeAsync, and acknowledgeAlarmAsync were not normalised — in practice the gateway's own validators (ensureProtocolSuccess, ensureMxAccessSuccess, ensureGatewayProtocolCompatible) only ever throw MxGatewayException, but a stray non-MxGatewayException RuntimeException (e.g. an NPE from a malformed reply) would surface raw inside CompletionException. Added MxGatewayChannels.normalisingValidator(operation, fn): it rethrows MxGatewayException unchanged and routes any other RuntimeException through MxGatewayErrors.fromGrpc, matching the synchronous try/catch paths. All three async thenApply sites now use it. Regression test: MxGatewayLowFindingsTests.openSessionAsyncNormalisesNonGatewayRuntimeExceptionFromValidator.
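A wrapper in the spirit of `normalisingValidator(operation, fn)` could be sketched like this. `GatewayException` is a hypothetical stand-in for `MxGatewayException`, and the wrapping constructor replaces the real `MxGatewayErrors.fromGrpc` call, whose signature the review does not show.

```java
import java.util.function.Function;

final class NormalisingValidatorSketch {
    /** Hypothetical stand-in for the client's typed exception. */
    static final class GatewayException extends RuntimeException {
        GatewayException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    /** Wraps fn so callers only ever see GatewayException from the validator. */
    static <T, R> Function<T, R> normalisingValidator(String operation, Function<T, R> fn) {
        return reply -> {
            try {
                return fn.apply(reply);
            } catch (GatewayException e) {
                throw e; // already typed: rethrow unchanged
            } catch (RuntimeException e) {
                // Stand-in for routing through MxGatewayErrors.fromGrpc.
                throw new GatewayException(operation + " failed: " + e, e);
            }
        };
    }
}
```

Passing the result to `thenApply` means a stray NPE from a malformed reply reaches the caller as the typed exception inside `CompletionException`, matching the synchronous paths.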
Client.Java-009
| Field | Value |
|---|---|
| Severity | Low |
| Category | Code organization & conventions |
| Location | clients/java/mxgateway-client/src/main/java/com/dohertylan/mxgateway/client/GalaxyRepositoryClient.java:310-391, clients/java/mxgateway-client/src/main/java/com/dohertylan/mxgateway/client/MxGatewayClient.java:346-413 |
| Status | Resolved |
Description: createChannel, withDeadline, withStreamDeadline, and toCompletable are duplicated nearly verbatim across MxGatewayClient and GalaxyRepositoryClient (~80 lines). A fix to one will not propagate to the other.
Recommendation: Extract the channel-builder and future-adaptor helpers into a shared package-private utility class.
Resolution: (2026-05-18) Confirmed against source: the four helpers were duplicated near-verbatim. Added a package-private MxGatewayChannels utility class holding createChannel(options, tlsErrorPrefix), withDeadline(stub, options), withStreamDeadline(stub, options), toCompletable(future, operation), and the new normalisingValidator helper (Client.Java-008). Both MxGatewayClient and GalaxyRepositoryClient now delegate to it and their private copies were deleted, so a future fix lives in one place. Behavior is unchanged except the operation-name carried into MxGatewayErrors.fromGrpc is now the specific RPC name instead of the generic "async call"/"galaxy async call". Verified by the full existing async test suite plus the new MxGatewayLowFindingsTests.
Client.Java-010
| Field | Value |
|---|---|
| Severity | Low |
| Category | Documentation & comments |
| Location | clients/java/mxgateway-client/src/main/java/com/dohertylan/mxgateway/client/MxGatewayClient.java:269-272, clients/java/README.md:76 |
| Status | Resolved |
Description: The acknowledgeAlarm Javadoc states the gateway authenticates against an invoke:alarm-ack scope, and the README states the Galaxy Repository requires a metadata:read scope. CLAUDE.md's documented scope set names neither — the Javadoc/README assert a scope contract the project's own auth documentation does not corroborate.
Recommendation: Reconcile the scope names with src/MxGateway.Server/Security/ and CLAUDE.md; correct the Javadoc/README to the actual scope strings, or fix CLAUDE.md if sub-scopes were genuinely added.
Resolution: (2026-05-18) Partially re-triaged. Verified against src/MxGateway.Server/Security/Authorization/GatewayScopes.cs and GatewayGrpcScopeResolver.cs: the canonical scope catalog is session:open, session:close, invoke:read, invoke:write, invoke:secure, events:read, metadata:read, admin. (a) The README's metadata:read for the Galaxy Repository is correct — TestConnectionRequest/GetLastDeployTimeRequest/DiscoverHierarchyRequest/WatchDeployEventsRequest all resolve to GatewayScopes.MetadataRead; no change needed. CLAUDE.md's prose lists only coarse scope groups, but the canonical resolver does define metadata:read. (b) The acknowledgeAlarm Javadoc's invoke:alarm-ack is wrong — no such scope exists. AcknowledgeAlarmRequest and QueryActiveAlarmsRequest are not special-cased in GatewayGrpcScopeResolver, so they fall through the _ => GatewayScopes.Admin default and require the admin scope. The Javadoc was corrected to state the admin scope; queryActiveAlarms did not assert a scope and was left unchanged. The README does not mention alarms, so no README change was required.
Client.Java-011
| Field | Value |
|---|---|
| Severity | Low |
| Category | Performance & resource management |
| Location | clients/java/mxgateway-client/src/main/java/com/dohertylan/mxgateway/client/MxEventStream.java:37-63 |
| Status | Resolved |
Description: The event stream relies on default gRPC auto-inbound flow control: the async stub auto-requests messages, so the server can push faster than the 16-element bounded queue drains. A momentarily slow consumer triggers queue overflow and an immediate stream-fault cancel. This is consistent with the documented fail-fast event-backpressure design, but the client never applies real flow control, so even brief consumer stalls kill the subscription.
Recommendation: Confirm fail-fast is intended (it appears to be); if so, document it on MxEventStream so callers know a slow consumer terminates the stream. Optionally expose the queue capacity or opt-in flow control.
Resolution: (2026-05-18) Confirmed fail-fast is intended — CLAUDE.md ("fail-fast event backpressure") and docs/DesignDecisions.md make a slow consumer losing its subscription a deliberate v1 design choice, so this is documentation-only, not a behavior bug. Added an explicit "Backpressure (fail-fast)" section to the MxEventStream class Javadoc explaining that the adaptor uses gRPC auto-inbound flow control with a fixed 16-element buffer and no client flow control, that a consumer stall long enough to fill the buffer triggers an overflow that cancels the subscription and surfaces an MxGatewayException, and that consumers must drain promptly and be ready to resubscribe with a resume cursor. clients/java/README.md carries the same caveat. The queue capacity was intentionally left non-configurable to keep the v1 surface aligned with the gateway design; overflow behavior is covered by MxGatewayLowFindingsTests.eventStreamQueueOverflowSurfacesExceptionFromNext.
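The fail-fast behavior can be illustrated with a bounded queue whose producer never blocks. This is a simplified sketch rather than the `MxEventStream` implementation: the class name is hypothetical, and where the real adaptor would cancel the gRPC call and surface an `MxGatewayException`, the sketch only records an `overflowed` flag.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

final class FailFastBufferSketch<E> {
    private final BlockingQueue<E> queue;
    private volatile boolean overflowed;

    FailFastBufferSketch(int capacity) {
        this.queue = new ArrayBlockingQueue<>(capacity);
    }

    /** Producer side: never blocks. A full buffer faults the stream permanently. */
    boolean offer(E event) {
        if (overflowed) {
            return false;
        }
        if (!queue.offer(event)) {
            overflowed = true; // the real adaptor would cancel the subscription here
            return false;
        }
        return true;
    }

    /** True once any offer has been rejected because the buffer was full. */
    boolean overflowed() {
        return overflowed;
    }
}
```

With a capacity of 16, the seventeenth undrained event is the one that trips the fault, which is why even a brief consumer stall can end the subscription.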
Client.Java-012
| Field | Value |
|---|---|
| Severity | Low |
| Category | Correctness & logic bugs |
| Location | clients/java/mxgateway-cli/src/main/java/com/dohertylan/mxgateway/cli/MxGatewayCli.java:667-674 |
| Status | Resolved |
Description: CommonOptions.resolved() mutates this (resolvedApiKey, resolvedTimeout) and returns this, but toClientOptions() and redactedJsonMap() read those mutated fields. If redactedJsonMap() is ever called before resolved(), it silently emits empty-string defaults. The "return this after mutating" pattern is fragile and surprising.
Recommendation: Make resolved() return an immutable resolved value object, or compute resolvedApiKey/resolvedTimeout lazily in their getters so call ordering cannot produce stale output.
Resolution: (2026-05-18) Confirmed against source: resolved() populated the resolvedApiKey/resolvedTimeout mutable fields and toClientOptions()/redactedJsonMap() read them, so calling either before resolved() emitted stale empty/30s defaults. The two mutable fields were removed and replaced with side-effect-free accessor methods resolvedApiKey() and resolvedTimeout() that compute their value on each call (API key from --api-key or the --api-key-env variable; timeout via parseDuration). toClientOptions() and redactedJsonMap() now call those accessors directly, so call ordering can no longer produce stale output. resolved() is retained as a no-op returning this purely for call-site readability (common.resolved()), with its Javadoc updated to state resolution is now lazy. Pure-refactor with no runtime-behavior change for the existing call order, so no new test was added; covered by the existing MxGatewayCliTests JSON-redaction and option-parsing tests.
Client.Java-013
| Field | Value |
|---|---|
| Severity | High |
| Category | Testing coverage |
| Location | clients/java/mxgateway-cli/src/test/java/com/dohertylan/mxgateway/cli/MxGatewayCliTests.java:212-304, clients/java/mxgateway-cli/src/main/java/com/dohertylan/mxgateway/cli/MxGatewayCli.java:1214-1244 |
| Status | Resolved |
Description: MxGatewayCliSession in MxGatewayCli.java:1214 was extended in commit f220908 (the "bulk read/write CLI subcommands" change) with five new abstract methods — readBulk, writeBulk, write2Bulk, writeSecuredBulk, writeSecured2Bulk. The test-only FakeSession in MxGatewayCliTests.java:212 still only implements the original set (register/addItem/advise/writeRaw/subscribeBulk/unsubscribeBulk/streamEventsAfter) and is declared a concrete (non-abstract) class. A clean compile of mxgateway-cli's test source set therefore fails: a concrete implementer that omits abstract interface methods is a compile error. The stale .class files under build/classes/java/test/ predate the interface change (dated 2026-05-20 03:38 vs CLI source dated 2026-05-20 05:06), which is why the issue is not visible until the next clean build. gradle test (or any CI pipeline that does not retain incremental state) will fail to build the CLI test module. The CLAUDE.md source-update workflow row "When source code changes, build and test the affected component" was not honoured for this CLI contract change.
Recommendation: Add the five missing @Override implementations to FakeSession (stubs returning empty lists are fine — only subscribeBulk/unsubscribeBulk are exercised by the existing tests, and the new bulk subcommands have no dedicated CLI tests yet). Optionally also add at least one CLI-level test for read-bulk, write-bulk, and the bench-read-bulk subcommands to keep parity with the .NET / Go / Rust CLI smoke matrix.
Resolution: 2026-05-20 — Added the five missing @Override stubs (readBulk, writeBulk, write2Bulk, writeSecuredBulk, writeSecured2Bulk) to FakeSession in clients/java/mxgateway-cli/src/test/java/com/dohertylan/mxgateway/cli/MxGatewayCliTests.java, each returning an empty ArrayList<> to match the interface return types (List<BulkReadResult> / List<BulkWriteResult>) without throwing. Imported BulkReadResult, BulkWriteResult, WriteBulkEntry, Write2BulkEntry, WriteSecuredBulkEntry, WriteSecured2BulkEntry from mxaccess_gateway.v1.MxaccessGateway. GrpcMxGatewayCliSession in MxGatewayCli.java is the only other implementer and already provides the methods (the source change that introduced the contract added them there). Verified with gradle clean followed by gradle :mxgateway-cli:compileTestJava and gradle :mxgateway-cli:test from clients/java, both BUILD SUCCESSFUL. No new CLI-level tests for the bulk subcommands were added — that follow-up is tracked separately and out of scope for this unblock-compilation fix.
Client.Java-014
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Concurrency & thread safety |
| Location | clients/java/mxgateway-client/src/main/java/com/dohertylan/mxgateway/client/MxEventStream.java:59-65,117-124 |
| Status | Resolved |
Description: MxEventStream.observer().beforeStart simply assigns requestStream without checking the closed flag, while close() reads requestStream after setting closed = true. If close() runs before the gRPC call has attached its ClientCallStreamObserver (a real race when callers cancel immediately after subscribing — e.g. construct, then close in a finally block when an unrelated setup step throws), then at close time requestStream is null, so stream.cancel(...) is skipped. beforeStart then fires later, stores the live requestStream, and never observes closed — the underlying gRPC call leaks open and continues delivering events to a MxEventStream whose consumer has stopped iterating. The sibling DeployEventStream.beforeStart already does the correct thing (if (closed.get()) { requestStream.cancel(...); }); the two adaptors should behave identically.
Recommendation: Mirror DeployEventStream's pattern in MxEventStream.beforeStart: after storing requestStream, check the closed flag and cancel the stream eagerly if a prior close() has already fired. Add a regression test analogous to GalaxyRepositoryClientTests.deployEventStreamCloseBeforeBeforeStartCancelsStream to lock in the behavior.
Resolution: 2026-05-20 — Mirrored DeployEventStream.beforeStart in MxEventStream.beforeStart: after storing the ClientCallStreamObserver, the observer now reads the closed flag and calls requestStream.cancel("client cancelled event stream", null) when a prior close() already fired, closing the close/beforeStart race that previously leaked the underlying gRPC call. The fix uses the existing volatile boolean closed field (already established as a happens-before publisher by close() setting it before reading requestStream); no field shape changes were needed. clients/java/README.md documents the new safe-close-before-beforeStart contract. Regression test: MxGatewayMediumFindingsTests.mxEventStreamCloseBeforeBeforeStartCancelsStream (mirrors GalaxyRepositoryClientTests.deployEventStreamCloseBeforeBeforeStartCancelsStream).
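The close/beforeStart race and its fix can be reproduced without gRPC. In the sketch below, `Cancellable` is a hypothetical stand-in for `ClientCallStreamObserver`, and both fields are declared `volatile` as an assumption (the resolution only confirms that `closed` is). Either ordering of the two calls ends with the stream cancelled; if they genuinely overlap, both sides may cancel, so the sketch assumes a second cancel is harmless.

```java
final class CloseBeforeStartSketch {
    /** Hypothetical stand-in for gRPC's ClientCallStreamObserver. */
    interface Cancellable {
        void cancel(String message, Throwable cause);
    }

    private volatile boolean closed;
    private volatile Cancellable requestStream;

    /** Called by the transport once the call is attached. */
    void beforeStart(Cancellable stream) {
        requestStream = stream;
        if (closed) {
            // close() ran before the stream existed; cancel eagerly.
            stream.cancel("client cancelled event stream", null);
        }
    }

    /** Safe to call before or after beforeStart. */
    void close() {
        closed = true;
        Cancellable stream = requestStream;
        if (stream != null) {
            stream.cancel("client cancelled event stream", null);
        }
    }
}
```

Each side writes its own field before reading the other's, so at least one of them always observes the other's write.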
Client.Java-015
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Concurrency & thread safety |
| Location | clients/java/mxgateway-client/src/main/java/com/dohertylan/mxgateway/client/MxGatewayChannels.java:112-138, MxGatewayClient.java:183-191,224-232,322-329, GalaxyRepositoryClient.java:164-170,212-214 |
| Status | Resolved |
Description: MxGatewayChannels.toCompletable registers a whenComplete on the local target future to forward cancellation to the source gRPC ListenableFuture. Every caller — openSessionAsync, invokeAsync, acknowledgeAlarmAsync, discoverHierarchyPageAsync, getLastDeployTimeAsync — then chains .thenApply(normalisingValidator(...)) or .thenApply(::getOk) and returns the chained future to the user. CompletableFuture.thenApply returns a new future whose cancellation does not propagate back to the source target. Cancelling the user-facing future therefore never sets target.isCancelled() == true, so source.cancel(true) is never invoked and the underlying gRPC call continues until its deadline expires. The JavaClientDesign.md "Streaming" section explicitly says "Stream cancellation should call ClientCall.cancel" — the same expectation reasonably applies to the unary *Async surface.
Recommendation: Either return target directly from each *Async method (and inline the validator into the FutureCallback.onSuccess path so no thenApply is needed), or attach the cancellation listener to the final returned future. The cleanest fix is to have MxGatewayChannels.toCompletable return a future that wraps the validator internally and registers whenComplete on the final future. Add a regression test that cancels the user-facing future and verifies the gRPC call was cancelled (e.g. via a ServerCallStreamObserver.setOnCancelHandler latch).
Resolution: 2026-05-20 — Fixed by inlining the reply validator into MxGatewayChannels.toCompletable so the user-visible future is the same future cancellation is bound to: added a new toCompletable(source, operation, validator) overload that runs the validator inside the FutureCallback.onSuccess path (normalising non-MxGatewayException RuntimeExceptions through MxGatewayErrors.fromGrpc, matching the existing synchronous try/catch). Replaced the previous whenComplete-based cancellation listener with a small CancellingCompletableFuture<T> subclass whose cancel(boolean) forwards to the source ListenableFuture.cancel(...) unconditionally, so even the no-validator overload propagates cancellation deterministically (the whenComplete listener only fired when target.isCancelled() was already true, which is exactly the case thenApply broke). Updated MxGatewayClient.openSessionAsync, MxGatewayClient.invokeAsync, MxGatewayClient.acknowledgeAlarmAsync, GalaxyRepositoryClient.testConnectionAsync, and GalaxyRepositoryClient.getLastDeployTimeAsync to use the new validator overload directly (no .thenApply chain). GalaxyRepositoryClient.discoverHierarchyAsync is paged via thenCompose, so it now publishes the current in-flight page future via an AtomicReference and returns a top-level CompletableFuture whose overridden cancel(boolean) cancels whichever page is currently outstanding. clients/java/README.md documents the new cancellation contract: cancelling any *Async future aborts the underlying gRPC call. Regression tests: MxGatewayMediumFindingsTests.invokeAsyncCancellationCancelsUnderlyingGrpcCall (full in-process gRPC test using ServerCallStreamObserver.setOnCancelHandler to latch when the server observes RPC cancellation), toCompletableValidatorOverloadForwardsCancellationToSource, and toCompletableNoValidatorOverloadForwardsCancellationToSource (unit-level proofs that both MxGatewayChannels.toCompletable overloads forward cancel(true) to the source ListenableFuture).
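The difference between the old and new cancellation shapes can be reproduced with the JDK alone. The sketch below is illustrative only: `CancellingFutureSketch` and `CancellationDemo` are hypothetical names, and a never-run `FutureTask` stands in for the gRPC `ListenableFuture`, which is an assumption made to keep the example self-contained.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Future;
import java.util.concurrent.FutureTask;

/** Sketch: a CompletableFuture whose cancel() always forwards to the source future. */
final class CancellingFutureSketch<T> extends CompletableFuture<T> {
    private final Future<?> source; // stands in for the gRPC ListenableFuture (assumption)

    CancellingFutureSketch(Future<?> source) {
        this.source = source;
    }

    @Override
    public boolean cancel(boolean mayInterruptIfRunning) {
        source.cancel(mayInterruptIfRunning);   // forward unconditionally
        return super.cancel(mayInterruptIfRunning);
    }
}

final class CancellationDemo {
    private CancellationDemo() {}

    /** Old shape: whenComplete on the target, then a thenApply result handed to the caller. */
    static boolean chainedCancelReachesSource() {
        FutureTask<String> source = new FutureTask<>(() -> "reply");
        CompletableFuture<String> target = new CompletableFuture<>();
        target.whenComplete((value, error) -> {
            if (target.isCancelled()) {
                source.cancel(true);
            }
        });
        CompletableFuture<Integer> userFacing = target.thenApply(String::length);
        userFacing.cancel(true);                // cancels only the derived future
        return source.isCancelled();
    }

    /** New shape: the caller holds the future that owns the cancellation hook. */
    static boolean directCancelReachesSource() {
        FutureTask<String> source = new FutureTask<>(() -> "reply");
        CompletableFuture<String> userFacing = new CancellingFutureSketch<>(source);
        userFacing.cancel(true);
        return source.isCancelled();
    }
}
```

In the chained case the listener never fires because `target` itself is never completed or cancelled; in the direct case the override runs on the very future the caller cancels.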
Client.Java-016
| Field | Value |
|---|---|
| Severity | Low |
| Category | Code organization & conventions |
| Location | clients/java/mxgateway-client/src/main/java/com/dohertylan/mxgateway/client/MxGatewayClient.java:361-391, GalaxyRepositoryClient.java:285-315 |
| Status | Resolved |
Description: Client.Java-009 introduced MxGatewayChannels to deduplicate createChannel, withDeadline, withStreamDeadline, and toCompletable. The two close() / closeAndAwaitTermination() methods — added shortly after to fix Client.Java-006 — were not extracted along with them. The 30-line bodies of MxGatewayClient.close() + closeAndAwaitTermination() and GalaxyRepositoryClient.close() + closeAndAwaitTermination() are now duplicated verbatim, including the awaitTermination(connectTimeout) semantic (see Client.Java-019), the InterruptedException handling, and the ownedChannel == null guard. A fix to one path (e.g. introducing a dedicated shutdownTimeout option) will silently miss the other.
Recommendation: Move the shutdown logic into MxGatewayChannels.shutdown(ManagedChannel channel, MxGatewayClientOptions options) and MxGatewayChannels.shutdownAndAwaitTermination(...). Have both clients delegate to it. Same recommendation applies to the duplicated MxGatewayAuthInterceptor construction in the two constructors (MxGatewayClient(Channel, ...) and GalaxyRepositoryClient(Channel, ...)).
Resolution: 2026-05-20 — Extracted the duplicated shutdown logic into MxGatewayChannels.shutdown(ManagedChannel, MxGatewayClientOptions) and MxGatewayChannels.shutdownAndAwaitTermination(ManagedChannel, MxGatewayClientOptions). Both helpers handle the ownedChannel == null no-op, the orderly-shutdown / awaitTermination / shutdownNow-on-timeout escalation, and the InterruptedException-restoring-the-interrupt-flag path. MxGatewayClient.close()/closeAndAwaitTermination() and GalaxyRepositoryClient.close()/closeAndAwaitTermination() are now one-liners that delegate to the shared helpers, so a future change (such as Client.Java-019's shutdownTimeout) lives in one place. Unused java.util.concurrent.TimeUnit imports were removed from both clients. The constructor-level MxGatewayAuthInterceptor duplication noted in the recommendation was left in place — it is a single intercept call per constructor (2 lines) versus the 30-line shutdown duplication that was the actual maintenance hazard. Regression tests: MxGatewayLowFindingsIITests.sharedShutdownHelperIsNoOpForNullChannel (covers the null-channel guard), shutdownAndAwaitTerminationHonoursShutdownTimeoutNotConnectTimeout, and shutdownEscalatesToShutdownNowWhenTimeoutExceeded (cover the shared shutdown semantics; the second is also the Client.Java-019 regression).
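A shared shutdown helper along these lines might look like the sketch below. It is a hedged illustration rather than the real `MxGatewayChannels` code: `ShutdownSketch` is a hypothetical name, and a JDK `ExecutorService` stands in for gRPC's `ManagedChannel` because both expose `shutdown()`, `awaitTermination(...)`, and `shutdownNow()`. Calling `shutdownNow()` on interruption is also an assumption; the finding only states that the interrupt flag is restored.

```java
import java.time.Duration;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.TimeUnit;

final class ShutdownSketch {
    private ShutdownSketch() {}

    /**
     * Returns true if the resource terminated gracefully (or was null),
     * false if it had to be forced or the wait was interrupted.
     */
    static boolean shutdownAndAwaitTermination(ExecutorService owned, Duration timeout) {
        if (owned == null) {
            return true;                        // nothing owned, nothing to do
        }
        owned.shutdown();                       // orderly: no new work, in-flight work continues
        try {
            if (owned.awaitTermination(timeout.toMillis(), TimeUnit.MILLISECONDS)) {
                return true;
            }
            owned.shutdownNow();                // escalate once the timeout is exceeded
            return false;
        } catch (InterruptedException e) {
            owned.shutdownNow();                // assumption: force shutdown when interrupted
            Thread.currentThread().interrupt(); // restore the interrupt flag for the caller
            return false;
        }
    }
}
```

With the logic in one place, each client's `close()` can reduce to a single delegating call, so a later change to the timeout source only has to be made once.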
Client.Java-017
| Field | Value |
|---|---|
| Severity | Low |
| Category | Documentation & comments |
| Location | clients/java/mxgateway-client/src/main/java/com/dohertylan/mxgateway/client/MxEventStream.java:25-36, clients/java/README.md:99-107 |
| Status | Resolved |
Description: The MxEventStream buffer was recently widened from 16 to 1024 elements (MxGatewayClient.streamEvents at line 268: new MxEventStream(1024)). The class-level Javadoc on MxEventStream still says "the gateway can push events faster than the consumer drains the bounded 16-element buffer", and clients/java/README.md line 103 says "uses gRPC's default auto-inbound flow control with a fixed 16-element buffer". The fail-fast event-backpressure contract (Client.Java-011 resolution) was written against the older capacity. The MxGatewayClient.streamEvents inline comment even acknowledges the change ("A small queue overflows on any moderately active session; 1024 covers a realistic backlog"). Users of this surface will therefore reason about backpressure budgets using the wrong number.
Recommendation: Update the MxEventStream Javadoc and the README to say "1024-element buffer" (or, since the capacity is a passed parameter, document it as a parameter rather than a constant). Consider exposing the capacity through MxGatewayClientOptions so callers can tune it per session.
Resolution: 2026-05-20 — Updated the MxEventStream class Javadoc and clients/java/README.md so both say "1024-element buffer" instead of the obsolete "16-element buffer". The Javadoc also notes that capacity is a constructor parameter and that the production caller (MxGatewayClient.streamEvents) passes 1024 to absorb the session-backlog replay burst, so readers understand the value is a deliberate choice rather than a constant. Exposing the capacity through MxGatewayClientOptions was intentionally left out of scope — the v1 design keeps the event-stream surface minimal and MxGatewayClient.streamEvents is the only caller; if a tuning need arises in v2 the existing constructor already accepts the capacity.
Client.Java-018
| Field | Value |
|---|---|
| Severity | Low |
| Category | Security |
| Location | clients/java/mxgateway-client/src/main/java/com/dohertylan/mxgateway/client/MxGatewaySecrets.java:54-66 |
| Status | Resolved |
Description: redactCredentials(value) splits its input on \\s+ (whitespace) and only redacts whitespace-delimited tokens that start with mxgw_ or equal bearer (case-insensitive). gRPC Status.getDescription() strings, log lines, and proto error messages can carry credentials separated by colons (Bearer:mxgw_id_secret), commas (token=mxgw_id_secret,scope=...), single quotes ('mxgw_id_secret'), parentheses ((mxgw_id_secret)), or embedded in URLs/paths — all of which leave the mxgw_ token attached to a non-whitespace neighbour and survive redaction. MxGatewayErrors.fromGrpc is the primary consumer; a gateway error description like authentication failed: 'mxgw_id_secret' would round-trip the secret into the resulting MxGatewayAuthenticationException message.
Recommendation: Replace the whitespace-split scrub with a regex-based pass that matches mxgw_[A-Za-z0-9_-]+ anywhere in the string and substitutes <redacted>; also redact Bearer\s+\S+ as a unit so the token after Bearer is masked regardless of the surrounding punctuation. Cover with a fixture-style test alongside MxGatewayFixtureTests.grpcAuthErrorsAreClassifiedAndRedacted that asserts a quoted or comma-delimited credential is fully masked.
Resolution: 2026-05-20 — Replaced the whitespace-split scrub with two compiled Pattern regexes: mxgw_[A-Za-z0-9_-]+ matches any gateway-shaped credential anywhere in the string regardless of surrounding punctuation, and (?i)bearer\s+\S+ masks an authorization-header style Bearer <token> as a unit so a non-mxgw bearer token cannot leak either. The mxgw pass runs first, so the bearer pass observes Bearer <redacted> for the common combined case and renders it idempotently. Regression tests in MxGatewayFixtureTests: redactCredentialsHandlesNonWhitespaceDelimitedTokens exercises single-quoted, double-quoted, comma-delimited, colon-delimited, parenthesised, URL-embedded, and bearer-header credentials; redactCredentialsLeavesBenignContentAlone confirms strings without credentials and a null input are unchanged.
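The two-pass scrub can be sketched as follows. The two regexes are the ones quoted in the resolution; everything else is an assumption made for illustration, including the class name `RedactionSketch` and the choice to render the bearer replacement as `Bearer <redacted>`.

```java
import java.util.regex.Pattern;

final class RedactionSketch {
    // Matches a gateway-shaped credential anywhere, regardless of surrounding punctuation.
    private static final Pattern GATEWAY_TOKEN = Pattern.compile("mxgw_[A-Za-z0-9_-]+");
    // Matches an authorization-header style "Bearer <token>" as a unit, case-insensitively.
    private static final Pattern BEARER = Pattern.compile("(?i)bearer\\s+\\S+");
    private static final String MASK = "<redacted>";

    private RedactionSketch() {}

    static String redactCredentials(String value) {
        if (value == null) {
            return null;
        }
        // Pass 1: mask gateway tokens first, so pass 2 sees "Bearer <redacted>".
        String scrubbed = GATEWAY_TOKEN.matcher(value).replaceAll(MASK);
        // Pass 2: mask whatever follows "Bearer", covering non-mxgw tokens too.
        // Assumption: the replacement is normalised to "Bearer <redacted>".
        return BEARER.matcher(scrubbed).replaceAll("Bearer " + MASK);
    }
}
```

For example, `redactCredentials("authentication failed: 'mxgw_id_secret'")` yields `authentication failed: '<redacted>'`, because the token pattern stops at the closing quote rather than relying on whitespace boundaries.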
Client.Java-019
| Field | Value |
|---|---|
| Severity | Low |
| Category | Performance & resource management |
| Location | clients/java/mxgateway-client/src/main/java/com/dohertylan/mxgateway/client/MxGatewayClient.java:362-391, GalaxyRepositoryClient.java:286-315 |
| Status | Resolved |
Description: Both clients' close() / closeAndAwaitTermination() use options.connectTimeout() as the upper bound on awaitTermination. The connectTimeout semantically describes how long the client will wait to establish the channel, not how long it should wait for in-flight calls and the Netty event loop to drain after shutdown(). With the default 10s connect timeout, shutting down a client with a long-running unary call already in flight will silently escalate to shutdownNow() and forcibly cancel it before the call's own deadline expires, defeating the deadline contract on withDeadline. Conversely, a caller who sets a small connectTimeout (e.g. 500 ms for a health probe) inherits an aggressively short shutdown deadline they probably did not intend.
Recommendation: Introduce a dedicated shutdownTimeout on MxGatewayClientOptions (defaulting to e.g. 5–10 s independent of connectTimeout) and use it in close() and closeAndAwaitTermination(). Document the precedence in the Javadoc. This pairs naturally with the Client.Java-016 deduplication fix.
Resolution: 2026-05-20 — Added a dedicated shutdownTimeout Duration on MxGatewayClientOptions (builder method shutdownTimeout(Duration), accessor shutdownTimeout(), default 10 s), independent of connectTimeout. Both shared shutdown helpers introduced for Client.Java-016 (MxGatewayChannels.shutdown and shutdownAndAwaitTermination) call options.shutdownTimeout() as the awaitTermination upper bound, so a small connectTimeout (e.g. a 500 ms health-probe timeout) no longer forces a premature shutdownNow() on in-flight calls. The new option is reflected in toString() and documented on both helpers and the close()/closeAndAwaitTermination() Javadoc on both clients; clients/java/README.md notes the default and the independence from connectTimeout. Regression tests in MxGatewayLowFindingsIITests: shutdownAndAwaitTerminationHonoursShutdownTimeoutNotConnectTimeout (a 50 ms connect timeout + 1 s shutdown timeout + 200 ms graceful-termination channel never escalates to shutdownNow()), shutdownEscalatesToShutdownNowWhenTimeoutExceeded (a stuck channel beyond the shutdown timeout is forcibly shut down), and shutdownTimeoutDefaultIsTenSecondsIndependentOfConnectTimeout (the default holds even when connectTimeout is small).
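The independence of the two timeouts can be illustrated with a small options builder. This is a sketch under stated assumptions, not the real `MxGatewayClientOptions`: the class name `ClientOptionsSketch` is hypothetical, and only the two fields discussed in this finding are modelled, each with the 10 s default mentioned above.

```java
import java.time.Duration;

final class ClientOptionsSketch {
    private final Duration connectTimeout;
    private final Duration shutdownTimeout;

    private ClientOptionsSketch(Builder builder) {
        this.connectTimeout = builder.connectTimeout;
        this.shutdownTimeout = builder.shutdownTimeout;
    }

    /** How long to wait while establishing the channel. */
    Duration connectTimeout() {
        return connectTimeout;
    }

    /** Upper bound for awaitTermination after shutdown(); never derived from connectTimeout. */
    Duration shutdownTimeout() {
        return shutdownTimeout;
    }

    static Builder builder() {
        return new Builder();
    }

    static final class Builder {
        // Each field carries its own default, so setting one never changes the other.
        private Duration connectTimeout = Duration.ofSeconds(10);
        private Duration shutdownTimeout = Duration.ofSeconds(10);

        Builder connectTimeout(Duration value) {
            this.connectTimeout = value;
            return this;
        }

        Builder shutdownTimeout(Duration value) {
            this.shutdownTimeout = value;
            return this;
        }

        ClientOptionsSketch build() {
            return new ClientOptionsSketch(this);
        }
    }
}
```

A health probe built with `connectTimeout(Duration.ofMillis(500))` therefore still gets the 10 s shutdown bound unless it sets `shutdownTimeout` explicitly.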
Client.Java-020
| Field | Value |
|---|---|
| Severity | Low |
| Category | Correctness & logic bugs |
| Location | clients/java/mxgateway-cli/src/main/java/com/dohertylan/mxgateway/cli/MxGatewayCli.java:244-254, galaxy_repository.proto:94 |
| Status | Resolved |
Description: galaxy_repository.proto defines DeployEvent.sequence as uint64; the protobuf Java mapping projects that to a signed long. The CLI's text-mode galaxy-watch output prints it as "seq=%d ...", which interprets the value as signed. Genuine wraparound is implausible (deploy sequences will not reach 2^63), but the broader pattern is brittle: any unsigned proto field printed via %d will display as a negative number past the signed boundary. The JSON path uses protoJson(event), which formats unsigned longs as numeric strings via JsonFormat, so JSON output is correct; only the text mode is at risk.
Recommendation: Print the sequence with Long.toUnsignedString(event.getSequence()) (or switch the text format to %s and pass the unsigned-string conversion). The same rule should apply to any other uint64 proto fields that surface in CLI text output.
Resolution: 2026-05-20 — Updated the galaxy-watch text-mode out.printf in MxGatewayCli.GalaxyWatchCommand.call() to use %s for the sequence field and pass Long.toUnsignedString(event.getSequence()), so deploy sequences past 2^63 render as their correct unsigned decimal string instead of a negative signed long. The JSON path through protoJson(event) was already correct (proto JsonFormat emits unsigned longs as decimal strings) and was left unchanged. An inline comment near the printf documents the unsigned-uint64 contract so the next person editing the format string knows not to switch back to %d. Regression test: MxGatewayCliTests.deployEventSequenceRendersAsUnsignedForHighUint64 exercises the format string with the max-uint64 bit pattern (-1L) and asserts the output contains seq=18446744073709551615 and does not contain seq=-1.
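The signed versus unsigned rendering is easy to reproduce in isolation. The sketch below uses only JDK calls; `UnsignedFormatSketch` and its two helper methods are hypothetical names, and the format strings are shortened to the `seq=` field for illustration.

```java
import java.util.Locale;

final class UnsignedFormatSketch {
    private UnsignedFormatSketch() {}

    /** Old behaviour: %d treats the uint64 bit pattern as a signed long. */
    static String signedText(long sequence) {
        return String.format(Locale.ROOT, "seq=%d", sequence);
    }

    /** New behaviour: convert to an unsigned decimal string and print it with %s. */
    static String unsignedText(long sequence) {
        return String.format(Locale.ROOT, "seq=%s", Long.toUnsignedString(sequence));
    }
}
```

With the max-uint64 bit pattern (`-1L`), `signedText` returns `seq=-1` while `unsignedText` returns `seq=18446744073709551615`; for values below 2^63 the two agree.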