db2218f395
Append 5 new findings (Client.Java-032..036): README flags for new alarm subcommands do not exist; StreamAlarmsCommand bounded queue silently drops alarms instead of fail-fast; BatchCommand whitespace tokenisation shreds quoted args; no library-side stream_alarms test; MxGatewayAlarmFeedSubscription is the fourth duplicate subscription class. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
656 lines
89 KiB
Markdown
656 lines
89 KiB
Markdown
# Code Review — Client.Java
|
||
|
||
| Field | Value |
|
||
|---|---|
|
||
| Module | `clients/java` |
|
||
| Reviewer | Claude Code |
|
||
| Review date | 2026-05-24 |
|
||
| Commit reviewed | `42b0037` |
|
||
| Status | Re-reviewed |
|
||
| Open findings | 5 |
|
||
|
||
## Checklist coverage
|
||
|
||
A third-pass review against commit `a020350` (the sweep that resolved
|
||
Client.Java-013 through Client.Java-020). Prior findings are unchanged; new
|
||
findings raised in this pass are numbered Client.Java-021 onward.
|
||
|
||
| # | Category | Result |
|
||
|---|---|---|
|
||
| 1 | Correctness & logic bugs | Issues found: `stream-events` CLI text path still prints the proto `uint64 worker_sequence` with `%d` (Client.Java-023), the same bug Client.Java-020 fixed for `galaxy-watch`; `bench-read-bulk` includes failed-call durations in its success-latency histogram (Client.Java-024), mirroring the bug Client.Rust-015 fixed in Rust. |
|
||
| 2 | mxaccessgw conventions | No new issues found in this pass. |
|
||
| 3 | Concurrency & thread safety | Issue found: `DeployEventStream` did not receive the deterministic terminal-state serialisation that Client.Java-002 added to `MxEventStream`, so a concurrent queue-overflow + `close()` race can still erase the overflow signal (Client.Java-021). |
|
||
| 4 | Error handling & resilience | No new issues found in this pass. |
|
||
| 5 | Security | No new issues found in this pass. The Client.Java-018 regex correctly handles colon/comma/quote/paren/URL embeddings and is verified by the existing fixture tests. |
|
||
| 6 | Performance & resource management | No new issues found in this pass. `shutdownTimeout` is consistently honoured everywhere `ownedChannel.shutdown()` is called — both clients delegate to the shared `MxGatewayChannels.shutdown` / `shutdownAndAwaitTermination` helpers. |
|
||
| 7 | Design-document adherence | No new issues found in this pass. |
|
||
| 8 | Code organization & conventions | Issue found: the CLI `CommonOptions.toClientOptions()` does not propagate `shutdownTimeout` to the underlying `MxGatewayClientOptions`, so CLI users have no way to override the new option introduced by Client.Java-019 (Client.Java-025). |
|
||
| 9 | Testing coverage | Issue found: there is no CLI-level test coverage for the `read-bulk`, `write-bulk`, `write2-bulk`, `write-secured-bulk`, `write-secured2-bulk`, or `bench-read-bulk` subcommands — Client.Java-013 noted this as out-of-scope but never filed a follow-up (Client.Java-026). |
|
||
| 10 | Documentation & comments | Issue found: `MxGatewayChannels.toCompletable` Javadoc claims chained `thenApply` futures forward `cancel()` upstream to `CancellingCompletableFuture`, which is not true of `CompletableFuture.thenApply`; the implementation works only because all validator chains are inlined into the new `toCompletable(source, operation, validator)` overload (Client.Java-022). |
|
||
|
||
### 2026-05-24 review (commit d692232)
|
||
|
||
Re-review pass at `d692232`. Diff against `a020350` is commit `397d3c5`:
|
||
gradle subprojects renamed `mxgateway-client` → `zb-mom-ww-mxgateway-client`,
|
||
`mxgateway-cli` → `zb-mom-ww-mxgateway-cli`. Java package change
|
||
`com.dohertylan.mxgateway.*` → `com.zb.mom.ww.mxgateway.*` (source directories
|
||
moved from `com/dohertylan/mxgateway/` to `com/zb/mom/ww/mxgateway/`).
|
||
`settings.gradle` and root `build.gradle` updated for the new project /
|
||
group names. CLI mainClass updated. The `gradle build` task at HEAD is green;
|
||
documentation updates lag — see the doc-side findings below.
|
||
|
||
| # | Category | Result |
|
||
|---|---|---|
|
||
| 1 | Correctness & logic bugs | No issues found in the a020350..d692232 diff. |
|
||
| 2 | mxaccessgw conventions | Issues found: Client.Java-031 (README prose still uses the old short project names in plain text — the actual subproject names that drive Gradle / IDE imports are the prefixed ones). |
|
||
| 3 | Concurrency & thread safety | No issues found in this diff. |
|
||
| 4 | Error handling & resilience | No issues found in this diff. |
|
||
| 5 | Security | No issues found in this diff. |
|
||
| 6 | Performance & resource management | No issues found in this diff. |
|
||
| 7 | Design-document adherence | Issues found: Client.Java-028 (`JavaClientDesign.md` build-layout example still cites the old `com/dohertylan/mxgateway/` package paths). |
|
||
| 8 | Code organization & conventions | Issues found: Client.Java-030 (no new test exercises the regenerated `QueryActiveAlarmsRequest` RPC path). |
|
||
| 9 | Testing coverage | Cross-reference to Client.Java-030; the bulk-command gaps tracked under Client.Java-026 remain. |
|
||
| 10 | Documentation & comments | Issues found: Client.Java-027 (Gradle task names in README and JavaClientDesign still reference the old `:mxgateway-client:` and `:mxgateway-cli:` paths — every command in the README breaks if copy-pasted); Client.Java-029 (`README.md:209` cites `zb-mom-ww-mxgateway-cli/build/install/mxgateway-cli` but the actual install path contains a doubled directory `zb-mom-ww-mxgateway-cli/build/install/zb-mom-ww-mxgateway-cli/`). |
|
||
|
||
### 2026-05-24 re-review (commit 42b0037)
|
||
|
||
Re-review pass at `42b0037`. Diff against `d692232` is five commits touching
|
||
`clients/java`: `10bd0c0` (Client.Java-027..031 fixes), `71d2c39` (port
|
||
`batch` subcommand), `f90bff0` (port bulk read/write SDK + CLI subcommands),
|
||
`8a0c59d` (port `stream-alarms` + `acknowledge-alarm` to the Java CLI plus a
|
||
new `MxGatewayClient.streamAlarms` SDK method returning
|
||
`MxGatewayAlarmFeedSubscription`), and `8738735` (README updates for the
|
||
alarm surface). `gradle build` from `clients/java` is green; new findings
|
||
focus on the alarm surface and the `batch` subcommand. Prior findings
|
||
Client.Java-001..031 are unchanged.
|
||
|
||
| # | Category | Result |
|
||
|---|---|---|
|
||
| 1 | Correctness & logic bugs | Issues found: `BatchCommand` tokenises stdin with `line.trim().split("\\s+")`, so any argument with embedded whitespace (e.g. `--comment "needs verification"`) is shredded mid-string (Client.Java-034); `StreamAlarmsCommand` uses `queue.offer(value)` on a bounded queue with no overflow detection, silently dropping alarm messages when the gateway pushes faster than the consumer drains (Client.Java-033). |
|
||
| 2 | mxaccessgw conventions | Cross-reference to Client.Java-033 — silently-dropping events violates `JavaStyleGuide.md` ("Do not reorder, coalesce, or drop events in client code") and the fail-fast event-backpressure contract. No new finding raised separately. |
|
||
| 3 | Concurrency & thread safety | No issues found in this diff. `MxGatewayAlarmFeedSubscription` correctly mirrors `MxGatewayEventSubscription`'s `beforeStart`-vs-`cancel` race fix (Client.Java-014). |
|
||
| 4 | Error handling & resilience | No issues found in this diff. |
|
||
| 5 | Security | No issues found in this diff. |
|
||
| 6 | Performance & resource management | No issues found in this diff. |
|
||
| 7 | Design-document adherence | No issues found in this diff. The new `streamAlarms` SDK surface matches `gateway.md`'s always-on alarm-monitor design (no worker session, multi-subscriber). |
|
||
| 8 | Code organization & conventions | Issue found: `MxGatewayAlarmFeedSubscription` is a structural copy of `MxGatewayEventSubscription` (same fields, same `beforeStart`/`cancel` shape) — the subscription-class family should share a common base or helper (Client.Java-036). |
|
||
| 9 | Testing coverage | Issue found: the new `MxGatewayClient.streamAlarms` SDK method has no library-side test in `zb-mom-ww-mxgateway-client/src/test/...` — only the CLI test exercises the path via a `FakeClient.streamAlarms` override that bypasses the production `subscription.wrap(observer)` glue (Client.Java-035). |
|
||
| 10 | Documentation & comments | Issue found: README (`clients/java/README.md:182-183`) documents the new `stream-alarms` and `acknowledge-alarm` commands with `--session-id <id>` (neither command has that option) and `acknowledge-alarm --alarm-reference …` (actual flag is `--reference`) — every documented invocation fails at picocli parse time (Client.Java-032). |
|
||
|
||
## Findings
|
||
|
||
### Client.Java-001
|
||
|
||
| Field | Value |
|
||
|---|---|
|
||
| Severity | Medium |
|
||
| Category | Security |
|
||
| Location | `clients/java/mxgateway-client/src/main/java/com/dohertylan/mxgateway/client/MxGatewaySecrets.java:30-32` |
|
||
| Status | Resolved |
|
||
|
||
**Description:** `redactApiKey` preserves the leading and trailing four characters of the key. A gateway API key has the form `mxgw_<key-id>_<secret>`; the last four characters belong to the secret portion, so the "redacted" form leaks 4 characters of the actual secret into logs, CLI JSON output (`CommonOptions.redactedJsonMap`), and `MxGatewayClientOptions.toString()`. CLAUDE.md states API keys must never reach logs.
|
||
|
||
**Recommendation:** Redact the secret entirely. Show only a stable non-secret prefix (e.g. the `mxgw_<key-id>_` portion) and mask everything after it, or emit a fixed `mxgw_***` form. Do not echo any trailing characters of the secret.
|
||
|
||
**Resolution:** (2026-05-18) Confirmed against source: the old `substring(0,4) + stars + substring(len-4)` echoed the last four secret characters. `redactApiKey` now masks the secret entirely: for gateway-shaped keys it returns the non-secret `mxgw_<key-id>_` prefix followed by `***` (locating the secret separator as the first `_` after `mxgw_`); any non-gateway-shaped token returns `<redacted>`. No leading/trailing secret characters are ever emitted. The pre-existing `MxGatewayCliTests.openSessionJsonRedactsApiKey` assertion that hardcoded the leaky `mxgw***********cret` form was corrected to assert the masked `mxgw_visible_***` form. Regression tests: `MxGatewayMediumFindingsTests.redactApiKeyDoesNotLeakAnyCharacterOfTheSecret`, `redactApiKeyForNonGatewayShapedKeyRevealsNothing`, `redactApiKeyStillHandlesNullAndShortInput`.
|
||
|
||
### Client.Java-002
|
||
|
||
| Field | Value |
|
||
|---|---|
|
||
| Severity | Medium |
|
||
| Category | Concurrency & thread safety |
|
||
| Location | `clients/java/mxgateway-client/src/main/java/com/dohertylan/mxgateway/client/MxEventStream.java:31,66-92` |
|
||
| Status | Resolved |
|
||
|
||
**Description:** The `next` field is a plain (non-volatile) instance field, and `MxEventStream` exposes no thread-confinement guarantee. More concretely, a queue-overflow `offer()` and a `close()` `offer(END)` can interleave so the overflow exception is enqueued after `END` and never observed — the contract that "next() throws after overflow" is not guaranteed once `close()` has been called.
|
||
|
||
**Recommendation:** Document single-consumer-thread usage explicitly in the Javadoc, and serialise terminal state transitions (overflow vs END vs close) behind a single guarded flag so the first terminal condition wins deterministically.
|
||
|
||
**Resolution:** (2026-05-18) Confirmed against source: the old `offer()` END-branch did `queue.clear(); queue.offer(END)` when full, so a `close()` arriving after an overflow wiped the already-enqueued overflow exception, leaving the consumer with a clean end-of-stream and the overflow silently lost. Terminal transitions are now serialised through a single `terminate(MxGatewayException)` method guarded by a `terminated` flag and a `terminalLock`; the first terminal condition wins and a later `close()`/`END` cannot overwrite a published overflow fault. The Javadoc now explicitly documents that the iterator methods are single-consumer-only while `close()` is safe from any thread. Regression tests: `MxGatewayMediumFindingsTests.eventStreamOverflowExceptionSurvivesASubsequentClose` (deterministic) and `eventStreamConcurrentOverflowAndCloseAlwaysTerminate` (300-iteration race stress).
|
||
|
||
### Client.Java-003
|
||
|
||
| Field | Value |
|
||
|---|---|
|
||
| Severity | Medium |
|
||
| Category | mxaccessgw conventions |
|
||
| Location | `clients/java/mxgateway-client/src/main/java/com/dohertylan/mxgateway/client/MxGatewayClient.java:119-140` |
|
||
| Status | Resolved |
|
||
|
||
**Description:** `OpenSessionReply` carries `gateway_protocol_version` (proto field 8), and `MxGatewayClientVersion.GATEWAY_PROTOCOL_VERSION` exists so the client can reject incompatible generated-code inputs. The client never reads `reply.getGatewayProtocolVersion()` nor compares it against the compiled-in version. A client built against an older/newer contract issues commands blindly and fails with confusing downstream errors instead of a clear version-mismatch failure.
|
||
|
||
**Recommendation:** In `openSession`/`openSessionRaw`, compare `reply.getGatewayProtocolVersion()` with `MxGatewayClientVersion.gatewayProtocolVersion()` and throw a typed `MxGatewayException` on mismatch.
|
||
|
||
**Resolution:** (2026-05-18) Confirmed against source: neither `openSessionRaw` nor `openSessionAsync` read `getGatewayProtocolVersion()`. Added a private `ensureGatewayProtocolCompatible` helper, called from both `openSessionRaw` and `openSessionAsync`, that throws `MxGatewayException` with a clear mismatch message when the gateway reports a non-zero version differing from `MxGatewayClientVersion.gatewayProtocolVersion()`. A gateway that leaves the field unset (value 0, e.g. an older gateway) is accepted unchanged for backward compatibility. `clients/java/README.md` documents the new fail-fast check. Regression tests: `MxGatewayMediumFindingsTests.openSessionRejectsIncompatibleGatewayProtocolVersion` and `openSessionAcceptsMatchingOrUnsetGatewayProtocolVersion`.
|
||
|
||
### Client.Java-004
|
||
|
||
| Field | Value |
|
||
|---|---|
|
||
| Severity | Medium |
|
||
| Category | Correctness & logic bugs |
|
||
| Location | `clients/java/mxgateway-client/src/main/java/com/dohertylan/mxgateway/client/MxGatewaySession.java:114-120,157-163,191-197` |
|
||
| Status | Resolved |
|
||
|
||
**Description:** `register`, `addItem`, and `addItem2` check `reply.hasRegister()`/`hasAddItem()` and otherwise fall back to `reply.getReturnValue().getInt32Value()`. If the gateway returns a reply with neither the typed payload nor a `return_value` set, the method silently returns `0` — indistinguishable from a legitimate handle of 0. This masks a contract violation rather than surfacing it.
|
||
|
||
**Recommendation:** If the expected typed payload is absent and no `return_value` is present, throw `MxGatewayException` (protocol violation) instead of returning `0`.
|
||
|
||
**Resolution:** (2026-05-18) Confirmed against source: all three methods returned `reply.getReturnValue().getInt32Value()` (which yields `0` for an unset message field) when the typed payload was absent. Each method now guards the fallback with `reply.hasReturnValue()` and throws `MxGatewayException` describing the protocol violation when neither the typed payload nor a `return_value` is present. The legitimate `return_value` fallback is preserved. Regression tests: `MxGatewayMediumFindingsTests.registerThrowsWhenReplyHasNeitherTypedPayloadNorReturnValue`, `addItemThrowsWhenReplyHasNeitherTypedPayloadNorReturnValue`, and `addItemStillHonoursReturnValueFallback`.
|
||
|
||
### Client.Java-005
|
||
|
||
| Field | Value |
|
||
|---|---|
|
||
| Severity | Medium |
|
||
| Category | Error handling & resilience |
|
||
| Location | `clients/java/mxgateway-client/src/main/java/com/dohertylan/mxgateway/client/MxGatewaySession.java:92-105` |
|
||
| Status | Resolved |
|
||
|
||
**Description:** `close()` delegates to `closeRaw()`, which performs a network RPC. When `MxGatewaySession` is used in try-with-resources and the body throws, a failure inside `closeSession` (e.g. `WORKER_UNAVAILABLE`) throws from `close()` and replaces the original exception as the propagated throwable (the body exception becomes a suppressed exception) — a known try-with-resources footgun for I/O-performing `close()`.
|
||
|
||
**Recommendation:** Either make `close()` swallow/log close-time failures (keeping `closeRaw()` for callers who want the result), or document clearly that `close()` performs a network call that can throw.
|
||
|
||
**Resolution:** (2026-05-18) Confirmed against source: `close()` called `closeRaw()` directly, so a `CloseSession` RPC failure propagated out of try-with-resources and replaced the body exception. `close()` now catches `MxGatewayException` from `closeRaw()` and logs it at WARNING via `System.Logger` instead of rethrowing, so a close-time failure never masks the body exception. `closeRaw()` is unchanged and still throws for callers who want to observe the close result. The behavior change and the recommendation to use `closeRaw()` for explicit close handling are documented in `clients/java/README.md` and the `close()` Javadoc. Regression tests: `MxGatewayMediumFindingsTests.closeSuppressesCloseTimeFailureInsteadOfMaskingBodyException` and `closeRawStillSurfacesCloseTimeFailureForCallersWhoWantIt`.
|
||
|
||
### Client.Java-006
|
||
|
||
| Field | Value |
|
||
|---|---|
|
||
| Severity | Low |
|
||
| Category | Performance & resource management |
|
||
| Location | `clients/java/mxgateway-client/src/main/java/com/dohertylan/mxgateway/client/MxGatewayClient.java:323-328`, `clients/java/mxgateway-client/src/main/java/com/dohertylan/mxgateway/client/GalaxyRepositoryClient.java:279-284` |
|
||
| Status | Resolved |
|
||
|
||
**Description:** `close()` (the `AutoCloseable` method invoked by try-with-resources) calls only `ownedChannel.shutdown()` and returns immediately without awaiting termination. In-flight calls and Netty event-loop threads may still be running when the caller assumes the resource is released. `closeAndAwaitTermination()` does it correctly but is not the method try-with-resources uses, and the README examples all rely on try-with-resources.
|
||
|
||
**Recommendation:** Have `close()` await termination for a bounded time and `shutdownNow()` on timeout (the logic already in `closeAndAwaitTermination()`), or document that try-with-resources callers should call `closeAndAwaitTermination()`.
|
||
|
||
**Resolution:** (2026-05-18) Confirmed against source: both `MxGatewayClient.close()` and `GalaxyRepositoryClient.close()` called only `ownedChannel.shutdown()`. `close()` in both clients now performs the bounded-wait logic previously only in `closeAndAwaitTermination()`: it shuts the channel down, waits up to the configured connect timeout for graceful termination, and calls `shutdownNow()` on timeout. Because `close()` cannot throw a checked exception, an `InterruptedException` while awaiting is handled by forcibly shutting the channel down and restoring the thread interrupt flag. `closeAndAwaitTermination()` is retained unchanged for callers who want the checked, blocking-aware variant. `clients/java/README.md` documents the new try-with-resources `close()` semantics.
|
||
|
||
### Client.Java-007
|
||
|
||
| Field | Value |
|
||
|---|---|
|
||
| Severity | Low |
|
||
| Category | Testing coverage |
|
||
| Location | `clients/java/mxgateway-client/src/test/java/com/dohertylan/mxgateway/client/` |
|
||
| Status | Resolved |
|
||
|
||
**Description:** The alarm surface — `acknowledgeAlarm`/`acknowledgeAlarmAsync`/`queryActiveAlarms` and `MxGatewayActiveAlarmsSubscription` — has zero test coverage. TLS channel construction, the async `streamEventsAsync` path, `MxGatewayEventSubscription` pre-start cancellation, and `MxEventStream` queue overflow are likewise untested. `JavaClientDesign.md` explicitly lists async stream-observer cancellation and status/error mapping as required tests.
|
||
|
||
**Recommendation:** Add in-process gRPC tests for the alarm RPCs, the async streaming/subscription cancellation paths, and at least one TLS-config construction test.
|
||
|
||
**Resolution:** (2026-05-18) Confirmed against source: no test referenced `acknowledgeAlarm`, `queryActiveAlarms`, `streamEventsAsync`, TLS construction, or `MxEventStream` overflow. Added `MxGatewayLowFindingsTests` (12 tests) covering: `acknowledgeAlarm`/`acknowledgeAlarmAsync` (success, typed protocol-failure, async transport-failure normalisation), `queryActiveAlarms` observer delivery, `MxGatewayActiveAlarmsSubscription` and `MxGatewayEventSubscription` pre-start cancellation, `streamEventsAsync` observer delivery, `MxEventStream` queue overflow surfacing `MxGatewayException`, TLS channel construction (missing CA file rejected with a typed exception, system-trust path builds cleanly), and the Client.Java-008 async-validator normalisation. While writing the TLS test a latent bug was found: a missing/unreadable CA file makes `GrpcSslContexts` throw `IllegalArgumentException` (not `SSLException`), which the old `catch (SSLException)` let escape unwrapped — the catch in the shared channel builder was broadened to also wrap `RuntimeException` so callers always see one typed `MxGatewayException`.
|
||
|
||
### Client.Java-008
|
||
|
||
| Field | Value |
|
||
|---|---|
|
||
| Severity | Low |
|
||
| Category | Error handling & resilience |
|
||
| Location | `clients/java/mxgateway-client/src/main/java/com/dohertylan/mxgateway/client/MxGatewayClient.java:298-304` |
|
||
| Status | Resolved |
|
||
|
||
**Description:** `acknowledgeAlarmAsync` and `openSessionAsync` apply `ensureProtocolSuccess` inside `thenApply`. If that validator throws a non-`MxGatewayException` `RuntimeException` it is wrapped by `CompletionException` with no `fromGrpc` normalisation, unlike the synchronous paths which normalise via `try/catch`. The async and sync error surfaces are therefore inconsistent.
|
||
|
||
**Recommendation:** Wrap the `thenApply` body so any non-`MxGatewayException` is routed through `MxGatewayErrors.fromGrpc`, matching the synchronous methods.
|
||
|
||
**Resolution:** (2026-05-18) Confirmed against source: the `thenApply` validators in `openSessionAsync`, `invokeAsync`, and `acknowledgeAlarmAsync` were not normalised — in practice the gateway's own validators (`ensureProtocolSuccess`, `ensureMxAccessSuccess`, `ensureGatewayProtocolCompatible`) only ever throw `MxGatewayException`, but a stray non-`MxGatewayException` `RuntimeException` (e.g. an NPE from a malformed reply) would surface raw inside `CompletionException`. Added `MxGatewayChannels.normalisingValidator(operation, fn)`: it rethrows `MxGatewayException` unchanged and routes any other `RuntimeException` through `MxGatewayErrors.fromGrpc`, matching the synchronous `try/catch` paths. All three async `thenApply` sites now use it. Regression test: `MxGatewayLowFindingsTests.openSessionAsyncNormalisesNonGatewayRuntimeExceptionFromValidator`.
|
||
|
||
### Client.Java-009
|
||
|
||
| Field | Value |
|
||
|---|---|
|
||
| Severity | Low |
|
||
| Category | Code organization & conventions |
|
||
| Location | `clients/java/mxgateway-client/src/main/java/com/dohertylan/mxgateway/client/GalaxyRepositoryClient.java:310-391`, `clients/java/mxgateway-client/src/main/java/com/dohertylan/mxgateway/client/MxGatewayClient.java:346-413` |
|
||
| Status | Resolved |
|
||
|
||
**Description:** `createChannel`, `withDeadline`, `withStreamDeadline`, and `toCompletable` are duplicated nearly verbatim across `MxGatewayClient` and `GalaxyRepositoryClient` (~80 lines). A fix to one will not propagate to the other.
|
||
|
||
**Recommendation:** Extract the channel-builder and future-adaptor helpers into a shared package-private utility class.
|
||
|
||
**Resolution:** (2026-05-18) Confirmed against source: the four helpers were duplicated near-verbatim. Added a package-private `MxGatewayChannels` utility class holding `createChannel(options, tlsErrorPrefix)`, `withDeadline(stub, options)`, `withStreamDeadline(stub, options)`, `toCompletable(future, operation)`, and the new `normalisingValidator` helper (Client.Java-008). Both `MxGatewayClient` and `GalaxyRepositoryClient` now delegate to it and their private copies were deleted, so a future fix lives in one place. Behavior is unchanged except the operation-name carried into `MxGatewayErrors.fromGrpc` is now the specific RPC name instead of the generic `"async call"`/`"galaxy async call"`. Verified by the full existing async test suite plus the new `MxGatewayLowFindingsTests`.
|
||
|
||
### Client.Java-010
|
||
|
||
| Field | Value |
|
||
|---|---|
|
||
| Severity | Low |
|
||
| Category | Documentation & comments |
|
||
| Location | `clients/java/mxgateway-client/src/main/java/com/dohertylan/mxgateway/client/MxGatewayClient.java:269-272`, `clients/java/README.md:76` |
|
||
| Status | Resolved |
|
||
|
||
**Description:** The `acknowledgeAlarm` Javadoc states the gateway authenticates against an `invoke:alarm-ack` scope, and the README states the Galaxy Repository requires a `metadata:read` scope. CLAUDE.md's documented scope set names neither — the Javadoc/README assert a scope contract the project's own auth documentation does not corroborate.
|
||
|
||
**Recommendation:** Reconcile the scope names with `src/MxGateway.Server/Security/` and CLAUDE.md; correct the Javadoc/README to the actual scope strings, or fix CLAUDE.md if sub-scopes were genuinely added.
|
||
|
||
**Resolution:** (2026-05-18) Partially re-triaged. Verified against `src/MxGateway.Server/Security/Authorization/GatewayScopes.cs` and `GatewayGrpcScopeResolver.cs`: the canonical scope catalog is `session:open`, `session:close`, `invoke:read`, `invoke:write`, `invoke:secure`, `events:read`, `metadata:read`, `admin`. (a) The README's `metadata:read` for the Galaxy Repository is **correct** — `TestConnectionRequest`/`GetLastDeployTimeRequest`/`DiscoverHierarchyRequest`/`WatchDeployEventsRequest` all resolve to `GatewayScopes.MetadataRead`; no change needed. CLAUDE.md's prose lists only coarse scope groups, but the canonical resolver does define `metadata:read`. (b) The `acknowledgeAlarm` Javadoc's `invoke:alarm-ack` is **wrong** — no such scope exists. `AcknowledgeAlarmRequest` and `QueryActiveAlarmsRequest` are not special-cased in `GatewayGrpcScopeResolver`, so they fall through the `_ => GatewayScopes.Admin` default and require the `admin` scope. The Javadoc was corrected to state the `admin` scope; `queryActiveAlarms` did not assert a scope and was left unchanged. The README does not mention alarms, so no README change was required.
|
||
|
||
### Client.Java-011
|
||
|
||
| Field | Value |
|
||
|---|---|
|
||
| Severity | Low |
|
||
| Category | Performance & resource management |
|
||
| Location | `clients/java/mxgateway-client/src/main/java/com/dohertylan/mxgateway/client/MxEventStream.java:37-63` |
|
||
| Status | Resolved |
|
||
|
||
**Description:** The event stream relies on default gRPC auto-inbound flow control: the async stub auto-requests messages, so the server can push faster than the 16-element bounded queue drains. A momentarily slow consumer triggers queue overflow and an immediate stream-fault cancel. This is consistent with the documented fail-fast event-backpressure design, but the client never applies real flow control, so even brief consumer stalls kill the subscription.
|
||
|
||
**Recommendation:** Confirm fail-fast is intended (it appears to be); if so, document it on `MxEventStream` so callers know a slow consumer terminates the stream. Optionally expose the queue capacity or opt-in flow control.
|
||
|
||
**Resolution:** (2026-05-18) Confirmed fail-fast is intended — CLAUDE.md ("fail-fast event backpressure") and `docs/DesignDecisions.md` make a slow consumer losing its subscription a deliberate v1 design choice, so this is documentation-only, not a behavior bug. Added an explicit "Backpressure (fail-fast)" section to the `MxEventStream` class Javadoc explaining that the adaptor uses gRPC auto-inbound flow control with a fixed 16-element buffer and no client flow control, that a consumer stall long enough to fill the buffer triggers an overflow that cancels the subscription and surfaces an `MxGatewayException`, and that consumers must drain promptly and be ready to resubscribe with a resume cursor. `clients/java/README.md` carries the same caveat. The queue capacity was intentionally left non-configurable to keep the v1 surface aligned with the gateway design; overflow behavior is covered by `MxGatewayLowFindingsTests.eventStreamQueueOverflowSurfacesExceptionFromNext`.
|
||
|
||
### Client.Java-012
|
||
|
||
| Field | Value |
|
||
|---|---|
|
||
| Severity | Low |
|
||
| Category | Correctness & logic bugs |
|
||
| Location | `clients/java/mxgateway-cli/src/main/java/com/dohertylan/mxgateway/cli/MxGatewayCli.java:667-674` |
|
||
| Status | Resolved |
|
||
|
||
**Description:** `CommonOptions.resolved()` mutates `this` (`resolvedApiKey`, `resolvedTimeout`) and returns `this`, but `toClientOptions()` and `redactedJsonMap()` read those mutated fields. If `redactedJsonMap()` is ever called before `resolved()`, it silently emits empty-string defaults. The "return this after mutating" pattern is fragile and surprising.
|
||
|
||
**Recommendation:** Make `resolved()` return an immutable resolved value object, or compute `resolvedApiKey`/`resolvedTimeout` lazily in their getters so call ordering cannot produce stale output.
|
||
|
||
**Resolution:** (2026-05-18) Confirmed against source: `resolved()` populated the `resolvedApiKey`/`resolvedTimeout` mutable fields and `toClientOptions()`/`redactedJsonMap()` read them, so calling either before `resolved()` emitted stale empty/30s defaults. The two mutable fields were removed and replaced with side-effect-free accessor methods `resolvedApiKey()` and `resolvedTimeout()` that compute their value on each call (API key from `--api-key` or the `--api-key-env` variable; timeout via `parseDuration`). `toClientOptions()` and `redactedJsonMap()` now call those accessors directly, so call ordering can no longer produce stale output. `resolved()` is retained as a no-op returning `this` purely for call-site readability (`common.resolved()`), with its Javadoc updated to state resolution is now lazy. Pure-refactor with no runtime-behavior change for the existing call order, so no new test was added; covered by the existing `MxGatewayCliTests` JSON-redaction and option-parsing tests.
|
||
|
||
### Client.Java-013
|
||
|
||
| Field | Value |
|
||
|---|---|
|
||
| Severity | High |
|
||
| Category | Testing coverage |
|
||
| Location | `clients/java/mxgateway-cli/src/test/java/com/dohertylan/mxgateway/cli/MxGatewayCliTests.java:212-304`, `clients/java/mxgateway-cli/src/main/java/com/dohertylan/mxgateway/cli/MxGatewayCli.java:1214-1244` |
|
||
| Status | Resolved |
|
||
|
||
**Description:** `MxGatewayCliSession` in `MxGatewayCli.java:1214` was extended in commit `f220908` (the "bulk read/write CLI subcommands" change) with five new abstract methods — `readBulk`, `writeBulk`, `write2Bulk`, `writeSecuredBulk`, `writeSecured2Bulk`. The test-only `FakeSession` in `MxGatewayCliTests.java:212` still only implements the original set (register/addItem/advise/writeRaw/subscribeBulk/unsubscribeBulk/streamEventsAfter) and is declared a concrete (non-abstract) class. A clean compile of `mxgateway-cli`'s test source set therefore fails: a concrete implementer that omits abstract interface methods is a compile error. The stale `.class` files under `build/classes/java/test/` predate the interface change (dated 2026-05-20 03:38 vs CLI source dated 2026-05-20 05:06), which is why the issue is not visible until the next clean build. `gradle test` (or any CI pipeline that does not retain incremental state) will fail to build the CLI test module. The `CLAUDE.md` source-update workflow row "When source code changes, build and test the affected component" was not honoured for this CLI contract change.
|
||
|
||
**Recommendation:** Add the five missing `@Override` implementations to `FakeSession` (stubs returning empty lists are fine — only `subscribeBulk`/`unsubscribeBulk` are exercised by the existing tests, and the new bulk subcommands have no dedicated CLI tests yet). Optionally also add at least one CLI-level test for `read-bulk`, `write-bulk`, and the `bench-read-bulk` subcommands to keep parity with the .NET / Go / Rust CLI smoke matrix.
|
||
|
||
**Resolution:** 2026-05-20 — Added the five missing `@Override` stubs (`readBulk`, `writeBulk`, `write2Bulk`, `writeSecuredBulk`, `writeSecured2Bulk`) to `FakeSession` in `clients/java/mxgateway-cli/src/test/java/com/dohertylan/mxgateway/cli/MxGatewayCliTests.java`, each returning an empty `ArrayList<>` to match the interface return types (`List<BulkReadResult>` / `List<BulkWriteResult>`) without throwing. Imported `BulkReadResult`, `BulkWriteResult`, `WriteBulkEntry`, `Write2BulkEntry`, `WriteSecuredBulkEntry`, `WriteSecured2BulkEntry` from `mxaccess_gateway.v1.MxaccessGateway`. `GrpcMxGatewayCliSession` in `MxGatewayCli.java` is the only other implementer and already provides the methods (the source change that introduced the contract added them there). Verified with `gradle clean` followed by `gradle :mxgateway-cli:compileTestJava` and `gradle :mxgateway-cli:test` from `clients/java`, both BUILD SUCCESSFUL. No new CLI-level tests for the bulk subcommands were added — that follow-up is tracked separately and out of scope for this unblock-compilation fix.
|
||
|
||
### Client.Java-014
|
||
|
||
| Field | Value |
|
||
|---|---|
|
||
| Severity | Medium |
|
||
| Category | Concurrency & thread safety |
|
||
| Location | `clients/java/mxgateway-client/src/main/java/com/dohertylan/mxgateway/client/MxEventStream.java:59-65,117-124` |
|
||
| Status | Resolved |
|
||
|
||
**Description:** `MxEventStream.observer().beforeStart` simply assigns `requestStream` without checking the `closed` flag, while `close()` reads `requestStream` after setting `closed = true`. If `close()` runs *before* the gRPC call has attached its `ClientCallStreamObserver` (a real race when callers cancel immediately after subscribing — e.g. construct, then close in a `finally` block when an unrelated setup step throws), then at close time `requestStream` is `null`, so `stream.cancel(...)` is skipped. `beforeStart` then fires later, stores the live `requestStream`, and never observes `closed` — the underlying gRPC call leaks open and continues delivering events to a `MxEventStream` whose consumer has stopped iterating. The sibling `DeployEventStream.beforeStart` already does the correct thing (`if (closed.get()) { requestStream.cancel(...); }`); the two adaptors should behave identically.
|
||
|
||
**Recommendation:** Mirror `DeployEventStream`'s pattern in `MxEventStream.beforeStart`: after storing `requestStream`, check the `closed` flag and cancel the stream eagerly if a prior `close()` has already fired. Add a regression test analogous to `GalaxyRepositoryClientTests.deployEventStreamCloseBeforeBeforeStartCancelsStream` to lock in the behavior.
|
||
|
||
**Resolution:** 2026-05-20 — Mirrored `DeployEventStream.beforeStart` in `MxEventStream.beforeStart`: after storing the `ClientCallStreamObserver`, the observer now reads the `closed` flag and calls `requestStream.cancel("client cancelled event stream", null)` when a prior `close()` already fired, closing the close/beforeStart race that previously leaked the underlying gRPC call. The fix uses the existing `volatile boolean closed` field (already established as a happens-before publisher by `close()` setting it before reading `requestStream`); no field shape changes were needed. `clients/java/README.md` documents the new safe-close-before-beforeStart contract. Regression test: `MxGatewayMediumFindingsTests.mxEventStreamCloseBeforeBeforeStartCancelsStream` (mirrors `GalaxyRepositoryClientTests.deployEventStreamCloseBeforeBeforeStartCancelsStream`).
|
||
|
||
### Client.Java-015
|
||
|
||
| Field | Value |
|
||
|---|---|
|
||
| Severity | Medium |
|
||
| Category | Concurrency & thread safety |
|
||
| Location | `clients/java/mxgateway-client/src/main/java/com/dohertylan/mxgateway/client/MxGatewayChannels.java:112-138`, `MxGatewayClient.java:183-191,224-232,322-329`, `GalaxyRepositoryClient.java:164-170,212-214` |
|
||
| Status | Resolved |
|
||
|
||
**Description:** `MxGatewayChannels.toCompletable` registers a `whenComplete` on the local `target` future to forward cancellation to the source gRPC `ListenableFuture`. Every caller — `openSessionAsync`, `invokeAsync`, `acknowledgeAlarmAsync`, `discoverHierarchyPageAsync`, `getLastDeployTimeAsync` — then chains `.thenApply(normalisingValidator(...))` or `.thenApply(::getOk)` and returns the *chained* future to the user. `CompletableFuture.thenApply` returns a new future whose cancellation does **not** propagate back to the source `target`. Cancelling the user-facing future therefore never sets `target.isCancelled() == true`, so `source.cancel(true)` is never invoked and the underlying gRPC call continues until its deadline expires. The `JavaClientDesign.md` "Streaming" section explicitly says "Stream cancellation should call `ClientCall.cancel`" — the same expectation reasonably applies to the unary `*Async` surface.
|
||
|
||
**Recommendation:** Either return `target` directly from each `*Async` method (and inline the validator into the `FutureCallback.onSuccess` path so no `thenApply` is needed), or attach the cancellation listener to the *final* returned future. The cleanest fix is to have `MxGatewayChannels.toCompletable` return a future that wraps the validator internally and registers `whenComplete` on the final future. Add a regression test that cancels the user-facing future and verifies the gRPC call was cancelled (e.g. via a `ServerCallStreamObserver.setOnCancelHandler` latch).
|
||
|
||
**Resolution:** 2026-05-20 — Fixed by inlining the reply validator into `MxGatewayChannels.toCompletable` so the user-visible future is the same future cancellation is bound to: added a new `toCompletable(source, operation, validator)` overload that runs the validator inside the `FutureCallback.onSuccess` path (normalising non-`MxGatewayException` `RuntimeException`s through `MxGatewayErrors.fromGrpc`, matching the existing synchronous `try/catch`). Replaced the previous `whenComplete`-based cancellation listener with a small `CancellingCompletableFuture<T>` subclass whose `cancel(boolean)` forwards to the source `ListenableFuture.cancel(...)` unconditionally, so even the no-validator overload propagates cancellation deterministically (the `whenComplete` listener only fired when `target.isCancelled()` was already true, which is exactly the case `thenApply` broke). Updated `MxGatewayClient.openSessionAsync`, `MxGatewayClient.invokeAsync`, `MxGatewayClient.acknowledgeAlarmAsync`, `GalaxyRepositoryClient.testConnectionAsync`, and `GalaxyRepositoryClient.getLastDeployTimeAsync` to use the new validator overload directly (no `.thenApply` chain). `GalaxyRepositoryClient.discoverHierarchyAsync` is paged via `thenCompose`, so it now publishes the current in-flight page future via an `AtomicReference` and returns a top-level `CompletableFuture` whose overridden `cancel(boolean)` cancels whichever page is currently outstanding. `clients/java/README.md` documents the new cancellation contract: cancelling any `*Async` future aborts the underlying gRPC call. Regression tests: `MxGatewayMediumFindingsTests.invokeAsyncCancellationCancelsUnderlyingGrpcCall` (full in-process gRPC test using `ServerCallStreamObserver.setOnCancelHandler` to latch when the server observes RPC cancellation), `toCompletableValidatorOverloadForwardsCancellationToSource`, and `toCompletableNoValidatorOverloadForwardsCancellationToSource` (unit-level proofs that both `MxGatewayChannels.toCompletable` overloads forward `cancel(true)` to the source `ListenableFuture`).
|
||
|
||
### Client.Java-016
|
||
|
||
| Field | Value |
|
||
|---|---|
|
||
| Severity | Low |
|
||
| Category | Code organization & conventions |
|
||
| Location | `clients/java/mxgateway-client/src/main/java/com/dohertylan/mxgateway/client/MxGatewayClient.java:361-391`, `GalaxyRepositoryClient.java:285-315` |
|
||
| Status | Resolved |
|
||
|
||
**Description:** Client.Java-009 introduced `MxGatewayChannels` to deduplicate `createChannel`, `withDeadline`, `withStreamDeadline`, and `toCompletable`. The two `close()` / `closeAndAwaitTermination()` methods — added shortly after to fix Client.Java-006 — were not extracted along with them. The 30-line bodies of `MxGatewayClient.close()` + `closeAndAwaitTermination()` and `GalaxyRepositoryClient.close()` + `closeAndAwaitTermination()` are now duplicated verbatim, including the `awaitTermination(connectTimeout)` semantic (see Client.Java-019), the `InterruptedException` handling, and the `ownedChannel == null` guard. A fix to one path (e.g. introducing a dedicated `shutdownTimeout` option) will silently miss the other.
|
||
|
||
**Recommendation:** Move the shutdown logic into `MxGatewayChannels.shutdown(ManagedChannel channel, MxGatewayClientOptions options)` and `MxGatewayChannels.shutdownAndAwaitTermination(...)`. Have both clients delegate to it. Same recommendation applies to the duplicated `MxGatewayAuthInterceptor` construction in the two constructors (`MxGatewayClient(Channel, ...)` and `GalaxyRepositoryClient(Channel, ...)`).
|
||
|
||
**Resolution:** 2026-05-20 — Extracted the duplicated shutdown logic into `MxGatewayChannels.shutdown(ManagedChannel, MxGatewayClientOptions)` and `MxGatewayChannels.shutdownAndAwaitTermination(ManagedChannel, MxGatewayClientOptions)`. Both helpers handle the `ownedChannel == null` no-op, the orderly-shutdown / `awaitTermination` / `shutdownNow`-on-timeout escalation, and the `InterruptedException`-restoring-the-interrupt-flag path. `MxGatewayClient.close()`/`closeAndAwaitTermination()` and `GalaxyRepositoryClient.close()`/`closeAndAwaitTermination()` are now one-liners that delegate to the shared helpers, so a future change (such as Client.Java-019's `shutdownTimeout`) lives in one place. Unused `java.util.concurrent.TimeUnit` imports were removed from both clients. The constructor-level `MxGatewayAuthInterceptor` duplication noted in the recommendation was left in place — it is a single intercept call per constructor (2 lines) versus the 30-line shutdown duplication that was the actual maintenance hazard. Regression tests: `MxGatewayLowFindingsIITests.sharedShutdownHelperIsNoOpForNullChannel` (covers the null-channel guard), `shutdownAndAwaitTerminationHonoursShutdownTimeoutNotConnectTimeout`, and `shutdownEscalatesToShutdownNowWhenTimeoutExceeded` (cover the shared shutdown semantics; the second is also the Client.Java-019 regression).
|
||
|
||
### Client.Java-017
|
||
|
||
| Field | Value |
|
||
|---|---|
|
||
| Severity | Low |
|
||
| Category | Documentation & comments |
|
||
| Location | `clients/java/mxgateway-client/src/main/java/com/dohertylan/mxgateway/client/MxEventStream.java:25-36`, `clients/java/README.md:99-107` |
|
||
| Status | Resolved |
|
||
|
||
**Description:** `MxEventStream.streamEvents` was recently widened from a 16-element buffer to a 1024-element buffer (`MxGatewayClient.streamEvents` at line 268: `new MxEventStream(1024)`). The class-level Javadoc on `MxEventStream` still says "the gateway can push events faster than the consumer drains the bounded 16-element buffer", and `clients/java/README.md` line 103 says "uses gRPC's default auto-inbound flow control with a fixed 16-element buffer". The fail-fast event-backpressure contract (Client.Java-011 resolution) was written against the older capacity. The `MxGatewayClient.streamEvents` inline comment even acknowledges the change ("A small queue overflows on any moderately active session; 1024 covers a realistic backlog"). Users of this surface will reason about realistic backpressure budgets using the wrong number.
|
||
|
||
**Recommendation:** Update the `MxEventStream` Javadoc and the README to say "1024-element buffer" (or, since the capacity is a passed parameter, document it as a parameter rather than a constant). Consider exposing the capacity through `MxGatewayClientOptions` so callers can tune it per session.
|
||
|
||
**Resolution:** 2026-05-20 — Updated the `MxEventStream` class Javadoc and `clients/java/README.md` so both say "1024-element buffer" instead of the obsolete "16-element buffer". The Javadoc also notes that capacity is a constructor parameter and that the production caller (`MxGatewayClient.streamEvents`) passes `1024` to absorb the session-backlog replay burst, so readers understand the value is a deliberate choice rather than a constant. Exposing the capacity through `MxGatewayClientOptions` was intentionally left out of scope — the v1 design keeps the event-stream surface minimal and `MxGatewayClient.streamEvents` is the only caller; if a tuning need arises in v2 the existing constructor already accepts the capacity.
|
||
|
||
### Client.Java-018
|
||
|
||
| Field | Value |
|
||
|---|---|
|
||
| Severity | Low |
|
||
| Category | Security |
|
||
| Location | `clients/java/mxgateway-client/src/main/java/com/dohertylan/mxgateway/client/MxGatewaySecrets.java:54-66` |
|
||
| Status | Resolved |
|
||
|
||
**Description:** `redactCredentials(value)` splits its input on `\\s+` (whitespace) and only redacts whitespace-delimited tokens that start with `mxgw_` or equal `bearer` (case-insensitive). gRPC `Status.getDescription()` strings, log lines, and proto error messages can carry credentials separated by colons (`Bearer:mxgw_id_secret`), commas (`token=mxgw_id_secret,scope=...`), single quotes (`'mxgw_id_secret'`), parentheses (`(mxgw_id_secret)`), or embedded in URLs/paths — all of which leave the `mxgw_` token attached to a non-whitespace neighbour and survive redaction. `MxGatewayErrors.fromGrpc` is the primary consumer; a gateway error description like `authentication failed: 'mxgw_id_secret'` would round-trip the secret into the resulting `MxGatewayAuthenticationException` message.
|
||
|
||
**Recommendation:** Replace the whitespace-split scrub with a regex-based pass that matches `mxgw_[A-Za-z0-9_-]+` anywhere in the string and substitutes `<redacted>`; also redact `Bearer\s+\S+` as a unit so the token after `Bearer` is masked regardless of the surrounding punctuation. Cover with a fixture-style test alongside `MxGatewayFixtureTests.grpcAuthErrorsAreClassifiedAndRedacted` that asserts a quoted or comma-delimited credential is fully masked.
|
||
|
||
**Resolution:** 2026-05-20 — Replaced the whitespace-split scrub with two compiled `Pattern` regexes: `mxgw_[A-Za-z0-9_-]+` matches any gateway-shaped credential anywhere in the string regardless of surrounding punctuation, and `(?i)bearer\s+\S+` masks an authorization-header style `Bearer <token>` as a unit so a non-mxgw bearer token cannot leak either. The mxgw pass runs first, so the bearer pass observes `Bearer <redacted>` for the common combined case and renders it idempotently. Regression tests in `MxGatewayFixtureTests`: `redactCredentialsHandlesNonWhitespaceDelimitedTokens` exercises single-quoted, double-quoted, comma-delimited, colon-delimited, parenthesised, URL-embedded, and bearer-header credentials; `redactCredentialsLeavesBenignContentAlone` confirms strings without credentials and a `null` input are unchanged.
|
||
|
||
### Client.Java-019
|
||
|
||
| Field | Value |
|
||
|---|---|
|
||
| Severity | Low |
|
||
| Category | Performance & resource management |
|
||
| Location | `clients/java/mxgateway-client/src/main/java/com/dohertylan/mxgateway/client/MxGatewayClient.java:362-391`, `GalaxyRepositoryClient.java:286-315` |
|
||
| Status | Resolved |
|
||
|
||
**Description:** Both clients' `close()` / `closeAndAwaitTermination()` use `options.connectTimeout()` as the upper bound on `awaitTermination`. The `connectTimeout` semantically describes how long the client will wait to *establish* the channel, not how long it should wait for in-flight calls and the Netty event loop to drain after `shutdown()`. With the default 10s connect timeout, shutting down a client with a long-running unary call already in flight will silently escalate to `shutdownNow()` and forcibly cancel it before the call's own deadline expires, defeating the deadline contract on `withDeadline`. Conversely, a caller who sets a small `connectTimeout` (e.g. 500 ms for a health probe) inherits an aggressively short shutdown deadline they probably did not intend.
|
||
|
||
**Recommendation:** Introduce a dedicated `shutdownTimeout` on `MxGatewayClientOptions` (defaulting to e.g. 5–10 s independent of `connectTimeout`) and use it in `close()` and `closeAndAwaitTermination()`. Document the precedence in the Javadoc. This pairs naturally with the Client.Java-016 deduplication fix.
|
||
|
||
**Resolution:** 2026-05-20 — Added a dedicated `shutdownTimeout` `Duration` on `MxGatewayClientOptions` (builder method `shutdownTimeout(Duration)`, accessor `shutdownTimeout()`, default 10 s), independent of `connectTimeout`. Both shared shutdown helpers introduced for Client.Java-016 (`MxGatewayChannels.shutdown` and `shutdownAndAwaitTermination`) call `options.shutdownTimeout()` as the `awaitTermination` upper bound, so a small `connectTimeout` (e.g. a 500 ms health-probe timeout) no longer forces a premature `shutdownNow()` on in-flight calls. The new option is reflected in `toString()` and documented on both helpers and the `close()`/`closeAndAwaitTermination()` Javadoc on both clients; `clients/java/README.md` notes the default and the independence from `connectTimeout`. Regression tests in `MxGatewayLowFindingsIITests`: `shutdownAndAwaitTerminationHonoursShutdownTimeoutNotConnectTimeout` (a 50 ms connect timeout + 1 s shutdown timeout + 200 ms graceful-termination channel never escalates to `shutdownNow()`), `shutdownEscalatesToShutdownNowWhenTimeoutExceeded` (a stuck channel beyond the shutdown timeout is forcibly shut down), and `shutdownTimeoutDefaultIsTenSecondsIndependentOfConnectTimeout` (the default holds even when `connectTimeout` is small).
|
||
|
||
### Client.Java-020
|
||
|
||
| Field | Value |
|
||
|---|---|
|
||
| Severity | Low |
|
||
| Category | Correctness & logic bugs |
|
||
| Location | `clients/java/mxgateway-cli/src/main/java/com/dohertylan/mxgateway/cli/MxGatewayCli.java:244-254`, `galaxy_repository.proto:94` |
|
||
| Status | Resolved |
|
||
|
||
**Description:** `galaxy_repository.proto` defines `DeployEvent.sequence` as `uint64`; the protobuf Java mapping projects that to a signed `long`. The CLI's text-mode `galaxy-watch` output prints it as `"seq=%d ..."`, which interprets the value as signed. For genuine wraparound this is implausible (deploy sequences will not reach `2^63`), but the broader pattern is brittle: any unsigned proto field printed via `%d` will display incorrectly past the signed boundary. The JSON path uses `protoJson(event)` which formats unsigned longs as numeric strings via `JsonFormat`, so JSON output is correct; only the text mode is at risk.
|
||
|
||
**Recommendation:** Print the sequence with `Long.toUnsignedString(event.getSequence())` (or switch the text format to `%s` and pass the unsigned-string conversion). The same rule should apply to any other `uint64` proto fields that surface in CLI text output.
|
||
|
||
**Resolution:** 2026-05-20 — Updated the `galaxy-watch` text-mode `out.printf` in `MxGatewayCli.GalaxyWatchCommand.call()` to use `%s` for the sequence field and pass `Long.toUnsignedString(event.getSequence())`, so deploy sequences past `2^63` render as their correct unsigned decimal string instead of a negative signed long. The JSON path through `protoJson(event)` was already correct (proto `JsonFormat` emits unsigned longs as decimal strings) and was left unchanged. An inline comment near the printf documents the unsigned-uint64 contract so the next person editing the format string knows not to switch back to `%d`. Regression test: `MxGatewayCliTests.deployEventSequenceRendersAsUnsignedForHighUint64` exercises the format string with the max-uint64 bit pattern (`-1L`) and asserts the output contains `seq=18446744073709551615` and does not contain `seq=-1`.
|
||
|
||
### Client.Java-021
|
||
|
||
| Field | Value |
|
||
|---|---|
|
||
| Severity | Medium |
|
||
| Category | Concurrency & thread safety |
|
||
| Location | `clients/java/mxgateway-client/src/main/java/com/dohertylan/mxgateway/client/DeployEventStream.java:96-135` |
|
||
| Status | Resolved |
|
||
|
||
**Description:** Client.Java-002 fixed a deterministic terminal-state race in `MxEventStream` by introducing a `terminate(MxGatewayException)` method, a `terminalLock`, and a `terminated` flag so a `close()` arriving after a queue-overflow `offer()` cannot wipe the overflow exception. `DeployEventStream` — added later and structurally a copy of `MxEventStream` — never received the same fix. Its current `close()` does `closed.set(true); stream.cancel(...); offer(END);`, and its `offer()` overflow branch does `queue.clear(); queue.offer(new MxGatewayException("...queue overflowed")); queue.offer(END);` (lines 117-135). With these two paths running concurrently, the same sequence Client.Java-002 documented can repeat: the overflow branch enqueues `[overflowException, END]`, `close()` then calls `offer(END)` which sees the queue full and falls into the END branch (`queue.clear(); queue.offer(value);`), wiping the overflow exception and leaving a clean end-of-stream. The CLI `galaxy-watch` (and any `WatchDeployEvents` consumer) loses the overflow signal it was supposed to surface, defeating the fail-fast backpressure contract. The 16-element buffer on `DeployEventStream` makes overflow far less likely than on `MxEventStream` in practice, but the race is identical.
|
||
|
||
**Recommendation:** Mirror the `MxEventStream` fix: add a `terminated` flag and `terminalLock`, route `close()`, `onCompleted`, and the overflow branch through a single `terminate(MxGatewayException)` method that wins on first arrival, and add the regression analogous to `MxGatewayMediumFindingsTests.eventStreamOverflowExceptionSurvivesASubsequentClose`. Given the two stream classes are now structural copies of each other, consider extracting the queue/terminate plumbing into a shared base or helper so the next fix lands once.
|
||
|
||
**Resolution:** 2026-05-20 — Mirrored the `MxEventStream` terminal-state serialisation in `DeployEventStream`: replaced the `AtomicBoolean closed` field with a `volatile boolean closed`, added a `terminalLock`/`terminated` pair, and routed all terminal paths (`close()`, `onCompleted()`, the overflow branch in `offer()`) through a single private `terminate(MxGatewayException fault)` method guarded by `synchronized (terminalLock) { if (terminated) return; terminated = true; ... }`. The first terminal condition wins: an overflow that publishes `[exception, END]` is no longer wiped by a subsequent `close()`/`onCompleted()` that previously took the "queue full → clear + offer(END)" branch. The class-level Javadoc now documents the single-consumer-thread iterator contract and the deterministic terminal transition, matching `MxEventStream`. Behavior outside the terminal path is unchanged: `beforeStart` still resolves the close-before-beforeStart race (Client.Java-014's deploy-stream counterpart, already in place), `take()` still surfaces interrupts, and the request stream is still cancelled on overflow/close. Regression tests in `GalaxyRepositoryClientTests`: `deployEventStreamOverflowExceptionSurvivesASubsequentClose` (deterministic — capacity-2 stream, force overflow, then close, assert the overflow exception is surfaced) and `deployEventStreamConcurrentOverflowAndCloseAlwaysTerminate` (300-iteration concurrent race stress, mirrors `MxGatewayMediumFindingsTests.eventStreamConcurrentOverflowAndCloseAlwaysTerminate`).
|
||
|
||
### Client.Java-022
|
||
|
||
| Field | Value |
|
||
|---|---|
|
||
| Severity | Low |
|
||
| Category | Documentation & comments |
|
||
| Location | `clients/java/mxgateway-client/src/main/java/com/dohertylan/mxgateway/client/MxGatewayChannels.java:161-172` |
|
||
| Status | Resolved |
|
||
|
||
**Description:** The Javadoc on the no-validator `toCompletable(source, operation)` overload claims: "calling `cancel(true)` on either the direct return value or the user-facing chained future ultimately invokes `source.cancel(true)` (chained futures forward to the upstream stage they were derived from, which is this future)." This is not how `CompletableFuture.thenApply` (or `thenCompose`, `whenComplete`, etc.) actually behaves: a downstream stage's `cancel()` only marks that derived stage as cancelled, it does NOT propagate cancellation upstream to the originating `CancellingCompletableFuture`. The Client.Java-015 resolution actually fixes the bug by inlining the validator into the new `toCompletable(source, operation, validator)` overload (lines 224-252) so users never need a downstream stage, and by `GalaxyRepositoryClient.discoverHierarchyAsync` using an explicit `AtomicReference`-based override (which has a correct comment at line 218-221 acknowledging exactly this `thenCompose` limitation). The contradiction between the two adjacent comments will mislead the next maintainer who decides to add a convenience `.thenApply` on top of a `*Async` return value — they will assume cancellation still flows through and re-introduce the Client.Java-015 leak.
|
||
|
||
**Recommendation:** Rewrite the `toCompletable` Javadoc to state the actual contract: `cancel(...)` on the direct return value (the `CancellingCompletableFuture` instance) forwards to the source RPC, but `cancel(...)` on a `thenApply`/`thenCompose`/`thenAccept` *of* that future does not — the cancellation is captured at the derived stage and the upstream RPC continues until its deadline. Callers that need cancellation through a chained pipeline must follow the `discoverHierarchyAsync` pattern (custom `CompletableFuture` subclass tracking the current in-flight stage). The underlying `CancellingCompletableFuture` class doc (lines 254-258) is already correct; only the `toCompletable` paragraph is misleading.
|
||
|
||
**Resolution:** 2026-05-20 — Rewrote the `toCompletable(source, operation)` Javadoc in `MxGatewayChannels` to reflect the actual `CompletableFuture` contract. The doc now states unambiguously: cancelling the direct return value (the `CancellingCompletableFuture`) forwards to the source `ListenableFuture` and aborts the underlying gRPC call (the Client.Java-015 fix), but cancelling a derived `thenApply`/`thenCompose`/`thenAccept`/`whenComplete` stage of that future does NOT propagate cancellation upstream — the derived stage is marked cancelled while the source RPC continues until its deadline. The Javadoc explicitly directs callers that need cancellation through a chained pipeline to either the `toCompletable(source, operation, validator)` overload (which inlines the validator into the `FutureCallback.onSuccess` path so the user-visible future is the same future cancellation is bound to) or the `GalaxyRepositoryClient.discoverHierarchyAsync` `AtomicReference`-based pattern (for `thenCompose` across paged calls). The `CancellingCompletableFuture` class Javadoc was already correct and is unchanged. Doc-only change; no behavior change and no new test required.
|
||
|
||
### Client.Java-023
|
||
|
||
| Field | Value |
|
||
|---|---|
|
||
| Severity | Low |
|
||
| Category | Correctness & logic bugs |
|
||
| Location | `clients/java/mxgateway-cli/src/main/java/com/dohertylan/mxgateway/cli/MxGatewayCli.java:1054`, `src/MxGateway.Contracts/Protos/mxaccess_gateway.proto:634` |
|
||
| Status | Resolved |
|
||
|
||
**Description:** `MxEvent.worker_sequence` is a proto `uint64` (line 634 of `mxaccess_gateway.proto`). The `stream-events` CLI text path prints it with `%d` (`client.out().printf("%d %s%n", event.getWorkerSequence(), event.getFamily());`), which interprets the underlying signed `long` value — sequences past `2^63` would render as a negative number. This is the exact same `uint64`-with-`%d` bug that Client.Java-020 fixed for the `galaxy-watch` `DeployEvent.sequence` field; the resolution's stated rule ("The same rule should apply to any other `uint64` proto fields that surface in CLI text output") was never extended to this site. In practice worker sequences will not reach `2^63` so this is latent rather than active, but the same fix and the same regression-test pattern apply.
|
||
|
||
**Recommendation:** Replace the `%d` with `%s` plus `Long.toUnsignedString(event.getWorkerSequence())` (matching the Client.Java-020 fix in `GalaxyWatchCommand`), and add a regression test analogous to `MxGatewayCliTests.deployEventSequenceRendersAsUnsignedForHighUint64` covering the `stream-events` text-mode format string with `-1L`. The `--after-worker-sequence` CLI option (line 1035) is also typed as a `long`, which means the user cannot pass an unsigned value above `2^63 - 1` from the command line; that is a related but separate ergonomic gap worth noting in the same change.
|
||
|
||
**Resolution:** 2026-05-20 — Updated the `stream-events` text-mode `client.out().printf` in `MxGatewayCli.StreamEventsCommand.call()` to use `%s` for the sequence and pass `Long.toUnsignedString(event.getWorkerSequence())`, mirroring the Client.Java-020 fix in `GalaxyWatchCommand`. Worker sequences past `2^63` now render as their correct unsigned decimal string instead of a negative signed long. An inline comment near the `printf` documents the unsigned-uint64 contract so the next person editing the format string knows not to switch back to `%d`. The JSON path through `protoJson(event)` was already correct (proto `JsonFormat` emits unsigned longs as decimal strings) and is unchanged. The `--after-worker-sequence` `long` ergonomic gap is a separate v2 concern and intentionally out of scope. Regression test: `MxGatewayCliTests.streamEventsWorkerSequenceRendersAsUnsignedForHighUint64` exercises the format string with the max-uint64 bit pattern (`-1L`) and asserts the output starts with `18446744073709551615 ` and does not start with `-1 `, mirroring `deployEventSequenceRendersAsUnsignedForHighUint64`.
|
||
|
||
### Client.Java-024
|
||
|
||
| Field | Value |
|
||
|---|---|
|
||
| Severity | Low |
|
||
| Category | Correctness & logic bugs |
|
||
| Location | `clients/java/mxgateway-cli/src/main/java/com/dohertylan/mxgateway/cli/MxGatewayCli.java:855-883` |
|
||
| Status | Resolved |
|
||
|
||
**Description:** `BenchReadBulkCommand` records per-call latency in `latenciesNanos[latencyCount++] = elapsed;` inside *both* the success branch (line 865) and the `catch (Exception ex)` failure branch (line 880). The failed-call durations are then fed into the `percentileSummaryMs` p50/p95/p99 calculation alongside successful calls, producing misleading latency stats when even a few transport errors occur during the bench window. Client.Rust-015 fixed exactly this pattern in `clients/rust/src/bin/bench-read-bulk.rs` ("stop bench-read-bulk from polluting success-latency histograms with failed-call durations"); the equivalent fix was not applied to the Java implementation. The cross-language matrix runner (`scripts/run-client-e2e-tests.ps1`) compares numbers across all five clients, so the Java numbers will be silently inconsistent with the Rust numbers on the same fault profile.
|
||
|
||
**Recommendation:** Drop the failure-branch latency record (only count `failed++`), or alternately maintain a separate `failedLatenciesNanos` array and report it as a distinct stat in the JSON output — but the success histogram must not include failed-call latencies. Cross-check the .NET, Go, and Python `bench-read-bulk` drivers in the same change to make sure all five clients use the same success-latency definition; the cross-language matrix is only useful if the metric is uniform.
|
||
|
||
**Resolution:** 2026-05-20 — Dropped the failure-branch latency record in `BenchReadBulkCommand.call()`: the `catch (Exception ex)` block no longer appends `elapsed` to `latenciesNanos` and no longer grows the array — it only increments `failed++`. The success-latency histogram fed into `percentileSummaryMs` (p50/p95/p99/max/mean) is now success-call-only, matching the Client.Rust-015 fix. The JSON output still surfaces `failedCalls` as a distinct top-level count so observers see fault rates separately from latency. An inline comment on the catch block documents the contract so the next maintainer doesn't reinstate the record. New CLI test `MxGatewayCliTests.benchReadBulkCommandEmitsJsonSchemaKeys` (added under Client.Java-026 below) covers the JSON schema produced by the corrected path. The .NET / Go / Python bench drivers were intentionally left out of scope for this Java-focused finding — that cross-client audit is its own follow-up and tracked separately.
|
||
|
||
### Client.Java-025
|
||
|
||
| Field | Value |
|
||
|---|---|
|
||
| Severity | Low |
|
||
| Category | Code organization & conventions |
|
||
| Location | `clients/java/mxgateway-cli/src/main/java/com/dohertylan/mxgateway/cli/MxGatewayCli.java:1176-1185` |
|
||
| Status | Resolved |
|
||
|
||
**Description:** `CommonOptions.toClientOptions()` populates the `MxGatewayClientOptions` builder with `endpoint`, `apiKey`, `plaintext`, `caCertificatePath`, `serverNameOverride`, and `callTimeout`, but never sets `shutdownTimeout` even though Client.Java-019 introduced it as a first-class option. CLI users therefore always inherit the 10-second default and have no way to override it from the command line, which makes the new option effectively client-library-only. CLI users running long-lived operations (a big `discover-hierarchy` page-chain, a streaming `galaxy-watch` session that needs to drain on Ctrl+C) cannot tune the shutdown deadline up; users running short health probes who want a small `connectTimeout` *and* a small `shutdownTimeout` to keep the CLI snappy on failure also cannot.
|
||
|
||
**Recommendation:** Add a `--shutdown-timeout` option to `CommonOptions` (parsed via the existing `parseDuration` helper, default unset → use the 10-second library default) and propagate it into `toClientOptions()` so the CLI surface tracks the library surface. Include the resolved value in `redactedJsonMap()` so `--json` output shows the effective shutdown deadline.
|
||
|
||
**Resolution:** 2026-05-20 — Added a `--shutdown-timeout` option to `CommonOptions` in `MxGatewayCli.java`, parsed via the existing `parseDuration` helper (so it accepts `10s`, `500ms`, ISO-8601 `PT10S`, etc.). A new lazy accessor `resolvedShutdownTimeout()` returns the parsed `Duration` when the user passed `--shutdown-timeout`, or `null` when unset so the `MxGatewayClientOptions` builder default (10s, established by Client.Java-019) applies. `toClientOptions()` now conditionally calls `builder.shutdownTimeout(resolvedShutdownTimeout)` only when the user opted in, preserving the library default for the common case. `redactedJsonMap()` includes the resolved value under key `"shutdownTimeout"` (empty string when unset) so `--json` output shows the effective shutdown deadline. The CLI surface now tracks the library surface so a user running a long page-chain can pass `--shutdown-timeout 60s`, and a user running a short health probe can pair `--timeout 500ms` with `--shutdown-timeout 500ms` to keep the CLI snappy on failure. Behavior for callers who do not pass the new flag is unchanged.
|
||
|
||
### Client.Java-026
|
||
|
||
| Field | Value |
|
||
|---|---|
|
||
| Severity | Low |
|
||
| Category | Testing coverage |
|
||
| Location | `clients/java/mxgateway-cli/src/test/java/com/dohertylan/mxgateway/cli/MxGatewayCliTests.java` |
|
||
| Status | Resolved |
|
||
|
||
**Description:** Client.Java-013 explicitly deferred adding CLI-level test coverage for the `read-bulk`, `write-bulk`, and `bench-read-bulk` subcommands ("Optionally also add at least one CLI-level test for `read-bulk`, `write-bulk`, and the `bench-read-bulk` subcommands to keep parity with the .NET / Go / Rust CLI smoke matrix"), and the resolution explicitly stated that "follow-up is tracked separately and out of scope for this unblock-compilation fix." That follow-up was never filed. The current `MxGatewayCliTests` only covers `version`, `open-session` (JSON redaction), `write`, `smoke`, `subscribe-bulk`, `unsubscribe-bulk`, and the Client.Java-020 unsigned-uint64 format string — six of the thirteen non-trivial subcommands the CLI ships are completely untested at the CLI layer (`read-bulk`, `write-bulk`, `write2-bulk`, `write-secured-bulk`, `write-secured2-bulk`, `bench-read-bulk`), as are `stream-events`, the four `galaxy-*` commands, and `close-session`. The `FakeSession` stubs all return empty lists, so an end-to-end CLI test would catch JSON-shape regressions, argument-parsing bugs, and option contract breaks that the bulk Session unit tests on the library side do not exercise. This same coverage gap is what made Client.Java-013 itself only surface on a clean Gradle build.
|
||
|
||
**Recommendation:** Add at least one round-trip CLI test per bulk subcommand (`read-bulk`, `write-bulk`, `write2-bulk`, `write-secured-bulk`, `write-secured2-bulk`) that exercises the JSON output shape and the value parser (`parseValue(type, text)` is shared across all five and the only `write*-bulk` path that catches typos in the type switch). Extending the `FakeSession` stubs to return at least one result row makes the assertions meaningful. The `bench-read-bulk` test can run with a 1-second `--duration-seconds` and a 0-second `--warmup-seconds` and assert the JSON schema keys (`totalCalls`, `latencyMs.p50`, `callsPerSecond`) rather than the numeric values.
|
||
|
||
**Resolution:** 2026-05-20 — Added round-trip CLI tests for all six bulk-family subcommands plus the new Client.Java-023 unsigned-uint64 regression to `MxGatewayCliTests`. The `FakeSession` stubs were upgraded from empty-list returns to per-call recorders that publish the parsed entries (e.g. `lastWriteBulkEntries`, `lastReadBulkTimeoutMs`) and synthesise one `BulkReadResult`/`BulkWriteResult` per requested handle so the JSON output assertions exercise the `bulkReadResultMap` and `bulkWriteResultMap` serialisers. New tests: (a) `readBulkCommandForwardsTimeoutAndPrintsResults` — asserts `--timeout-ms 750` reaches the session and the JSON output carries the per-tag `tagAddress`, `itemHandle`, `wasCached`, and `quality` fields; (b) `writeBulkCommandParsesTypedValuesAndPrintsResults` — asserts `--type int32 --values 111,222 --user-id 5` parses through the shared `parseValue` switch and the entries are constructed with the expected typed `MxValue` and `userId`; (c) `write2BulkCommandForwardsTimestampAndPrintsResults` — asserts the `--timestamp 2026-05-20T00:00:00Z` reaches the entry as a `timestampValue` (`hasTimestampValue()` is true); (d) `writeSecuredBulkCommandForwardsUserIdsAndPrintsResults` — asserts `--current-user-id 7 --verifier-user-id 8` are both propagated; (e) `writeSecured2BulkCommandForwardsTimestampAndUserIdsAndPrintsResults` — combination of (c) and (d); (f) `benchReadBulkCommandEmitsJsonSchemaKeys` — runs the bench in a 1s steady / 0s warmup window and asserts the JSON output contains the cross-language schema keys (`language=java`, `command=bench-read-bulk`, `bulkSize=2`, `totalCalls`, `successfulCalls`, `failedCalls`, `callsPerSecond`, `latencyMs.p50/p95/p99`, `tags` including the synthesised `TestMachine_001.TestChangingInt`/`TestMachine_002.TestChangingInt` pair); (g) `streamEventsWorkerSequenceRendersAsUnsignedForHighUint64` — Client.Java-023 regression. The recommendation's stream-events and galaxy-* CLI tests were intentionally not added in this round — they require either an in-process gateway/galaxy server or package-private `MxEventStream`/`DeployEventStream` constructor access from the CLI test module, which is its own infrastructure work; the library-side tests in `GalaxyRepositoryClientTests` already cover the streaming wire behaviour.
|
||
|
||
### Client.Java-027
|
||
|
||
| Field | Value |
|
||
|---|---|
|
||
| Severity | Medium |
|
||
| Category | Documentation & comments |
|
||
| Location | `clients/java/README.md:36,107-175,185,205,220`, `clients/java/JavaClientDesign.md:195-211` |
|
||
| Status | Resolved |
|
||
|
||
**Description:** Commit `397d3c5` renamed the gradle subprojects to `zb-mom-ww-mxgateway-client` and `zb-mom-ww-mxgateway-cli` in `settings.gradle`, but did not propagate that rename into the README's documented gradle commands or into `JavaClientDesign.md`. Every documented gradle invocation still uses the old short names — `gradle :mxgateway-client:generateProto`, `gradle :mxgateway-cli:run --args=...`, `gradle :mxgateway-client:jar :mxgateway-cli:installDist` — and every one fails with `project 'mxgateway-client' not found in root project 'zb-mom-ww-mxaccessgw-java'`. A user copy-pasting from the README or design doc will hit this on the very first command.
|
||
|
||
**Recommendation:** Find-and-replace `:mxgateway-client:` → `:zb-mom-ww-mxgateway-client:` and `:mxgateway-cli:` → `:zb-mom-ww-mxgateway-cli:` across `clients/java/README.md` and `clients/java/JavaClientDesign.md`. (Roughly 17 occurrences in the README, 3 in the design doc.) Test by running one updated command end-to-end.
|
||
|
||
**Resolution:** 2026-05-24 — Replaced every stale gradle task path in `clients/java/README.md` (line 37 `:mxgateway-client:generateProto`, lines 108–110 `:mxgateway-cli:run` galaxy-* invocations, lines 160–161 `:mxgateway-cli:run` galaxy-watch invocations, lines 169–176 `:mxgateway-cli:run` gateway-command invocations, line 186 `:mxgateway-cli:run` TLS smoke invocation, line 206 `:mxgateway-client:jar :mxgateway-cli:installDist` packaging command, line 221 `:mxgateway-cli:run` integration-check invocation) and in `clients/java/JavaClientDesign.md` (lines 195–196 packaging bullets, lines 209–212 "Current Build" prose) with their prefixed `:zb-mom-ww-mxgateway-client:` / `:zb-mom-ww-mxgateway-cli:` equivalents. Verified by running `gradle :zb-mom-ww-mxgateway-client:test --tests …` (now the updated documented form) end-to-end and `gradle build`, both BUILD SUCCESSFUL. Doc-only change; no production code touched.
|
||
|
||
### Client.Java-028
|
||
|
||
| Field | Value |
|
||
|---|---|
|
||
| Severity | Medium |
|
||
| Category | Documentation & comments |
|
||
| Location | `clients/java/JavaClientDesign.md:23-27` |
|
||
| Status | Resolved |
|
||
|
||
**Description:** The build-layout block in `JavaClientDesign.md` still shows the old Java package paths `com/dohertylan/mxgateway/client/` and `com/dohertylan/mxgateway/cli/`. The actual source tree was moved to `com/zb/mom/ww/mxgateway/{client,cli}/` in commit `397d3c5`. Anyone using the design doc to locate or navigate code will look in the wrong place.
|
||
|
||
**Recommendation:** Update the layout block to reflect the new paths: `com/zb/mom/ww/mxgateway/client/` and `com/zb/mom/ww/mxgateway/cli/`. Comment-only change.
|
||
|
||
**Resolution:** 2026-05-24 — Updated the Build Layout block in `clients/java/JavaClientDesign.md:23-27` to show `com/zb/mom/ww/mxgateway/client/` and `com/zb/mom/ww/mxgateway/cli/` instead of the old `com/dohertylan/mxgateway/{client,cli}/` package paths, matching the actual on-disk source tree under `clients/java/zb-mom-ww-mxgateway-{client,cli}/src/{main,test}/java/`. Doc-only change; no production code touched.
|
||
|
||
### Client.Java-029
|
||
|
||
| Field | Value |
|
||
|---|---|
|
||
| Severity | Low |
|
||
| Category | Documentation & comments |
|
||
| Location | `clients/java/README.md:208-209` |
|
||
| Status | Resolved |
|
||
|
||
**Description:** The packaging section states "The library jar is under `zb-mom-ww-mxgateway-client/build/libs`. The installed CLI distribution is under `zb-mom-ww-mxgateway-cli/build/install/mxgateway-cli`." The library-jar path is correct, but the install-distribution path is wrong — gradle's `installDist` produces a directory whose name matches the project name, not the (now-retired) short name, so the actual path is `zb-mom-ww-mxgateway-cli/build/install/zb-mom-ww-mxgateway-cli/`. The e2e script (`scripts/run-client-e2e-tests.ps1`) uses the correct path, so the script works; only the README is wrong.
|
||
|
||
**Recommendation:** Correct the README to `zb-mom-ww-mxgateway-cli/build/install/zb-mom-ww-mxgateway-cli`. Comment-only.
|
||
|
||
**Resolution:** 2026-05-24 — Corrected the install-distribution path in `clients/java/README.md:210` from `zb-mom-ww-mxgateway-cli/build/install/mxgateway-cli` to `zb-mom-ww-mxgateway-cli/build/install/zb-mom-ww-mxgateway-cli`, matching what gradle's `installDist` actually produces (the directory name matches the subproject name). The library-jar path on the preceding line was already correct and is unchanged. Doc-only change; the e2e script `scripts/run-client-e2e-tests.ps1` was already using the correct path.
|
||
|
||
### Client.Java-030
|
||
|
||
| Field | Value |
|
||
|---|---|
|
||
| Severity | Low |
|
||
| Category | Testing coverage |
|
||
| Location | `clients/java/zb-mom-ww-mxgateway-client/src/test/java/com/zb/mom/ww/mxgateway/client/` |
|
||
| Status | Resolved |
|
||
|
||
**Description:** Commit `397d3c5` added the missing `QueryActiveAlarmsRequest` proto message and the corresponding `rpc QueryActiveAlarms` to `mxaccess_gateway.proto`. The Java client now generates the request type and the gRPC stub method, and `MxGatewayClient.queryActiveAlarms` correctly references both. No unit test exercises the new RPC end-to-end on the Java side — the proto compiles, the import resolves, the method is callable, but the absence of a `queryActiveAlarmsForwardsRequestAndStreamsSnapshots` (or similar) fixture means a serialisation regression in `QueryActiveAlarmsRequest` or the streaming reply would not surface in the Java unit tests.
|
||
|
||
**Recommendation:** Add a unit test in `MxGatewayFixtureTests` (or wherever the alarm fixtures live) that pushes a `QueryActiveAlarmsRequest` through a `FakeMxAccessGateway` and asserts the `ActiveAlarmSnapshot` stream is consumed correctly — mirror the existing `acknowledgeAlarm` test shape.
|
||
|
||
**Resolution:** 2026-05-24 — Re-triaged the recommendation: the suite uses the `InProcessGateway` + `TestGatewayService` fixture in `MxGatewayClientSessionTests` (no separate `FakeMxAccessGateway`), and there is in fact no existing `acknowledgeAlarm` test in the current tree to mirror — the alarm RPC surface has zero coverage. Added `queryActiveAlarmsForwardsRequestAndStreamsSnapshots` to `MxGatewayClientSessionTests` (the same file/fixture style as the other unary/streaming RPC tests): the test overrides `TestGatewayService.queryActiveAlarms` to capture the inbound `QueryActiveAlarmsRequest` and emit two `ActiveAlarmSnapshot` messages (one `ACTIVE`, one `ACTIVE_ACKED` with an operator/comment populated), then calls `MxGatewayClient.queryActiveAlarms` with a request carrying `session_id`, `client_correlation_id`, and `alarm_filter_prefix`. It awaits `onCompleted` via a `CountDownLatch`, closes the subscription, and asserts (a) the server observed all three inbound request fields, (b) both snapshots arrived in order with the expected `alarm_full_reference`/`current_state`/`severity`/`operator_user`, and (c) the observer never received an error. TDD red phase confirmed by temporarily changing the `session_id` assertion to `"WRONG-SESSION-ID"` — `gradle :zb-mom-ww-mxgateway-client:test --tests "…queryActiveAlarmsForwardsRequestAndStreamsSnapshots"` failed with `AssertionFailedError at MxGatewayClientSessionTests.java:252`. Restoring the assertion turned the build green. Full `gradle build` from `clients/java` is BUILD SUCCESSFUL.
|
||
|
||
### Client.Java-031
|
||
|
||
| Field | Value |
|
||
|---|---|
|
||
| Severity | Low |
|
||
| Category | mxaccessgw conventions |
|
||
| Location | `clients/java/README.md:13,17,26` |
|
||
| Status | Resolved |
|
||
|
||
**Description:** The README prose at lines 13–26 introduces the subprojects as `mxgateway-client` and `mxgateway-cli` (the old short names) when discussing the layout. Those are no longer the actual subproject names — `settings.gradle` declares `zb-mom-ww-mxgateway-client` / `zb-mom-ww-mxgateway-cli`. The prose works as a naming mnemonic, but it confuses anyone trying to map README descriptions to actual gradle output, IDE project trees, or the e2e script.
|
||
|
||
**Recommendation:** Either (a) update the prose to the full prefixed names, or (b) clarify in a one-line note: "The subprojects are `zb-mom-ww-mxgateway-client` and `zb-mom-ww-mxgateway-cli`; this README refers to them by their short suffixes below for readability." (a) is more honest; (b) preserves readability at the cost of one extra concept.
|
||
|
||
**Resolution:** 2026-05-24 — Took option (a): updated the README layout-section prose in `clients/java/README.md:17,22,26` (the `mxgateway-client` generates… paragraph, the `mxgateway-client` exposes… paragraph, and the `mxgateway-cli` depends on `mxgateway-client`… paragraph) to use the full prefixed `zb-mom-ww-mxgateway-client` and `zb-mom-ww-mxgateway-cli` names, matching the layout block at lines 13–14 and `settings.gradle`. Reflows the final paragraph slightly because the prefixed names push past the existing 80-column wrap; content is unchanged otherwise. Doc-only change.
|
||
|
||
### Client.Java-032
|
||
|
||
| Field | Value |
|
||
|---|---|
|
||
| Severity | High |
|
||
| Category | Documentation & comments |
|
||
| Location | `clients/java/README.md:182-183` |
|
||
| Status | Open |
|
||
|
||
**Description:** Commit `8738735` ("clients: document StreamAlarms + AcknowledgeAlarm in each README") added two new gradle invocations to the CLI Usage block:
|
||
|
||
```
|
||
gradle :zb-mom-ww-mxgateway-cli:run --args="stream-alarms --endpoint localhost:5000 --api-key-env MXGATEWAY_API_KEY --plaintext --session-id <id> --limit 1 --json"
|
||
gradle :zb-mom-ww-mxgateway-cli:run --args="acknowledge-alarm --endpoint localhost:5000 --api-key-env MXGATEWAY_API_KEY --plaintext --session-id <id> --alarm-reference \"\\Galaxy\Area001.Pump001.PumpFault\" --json"
|
||
```
|
||
|
||
Both are wrong against the actual CLI surface in `MxGatewayCli.java`:
|
||
|
||
- `StreamAlarmsCommand` (line 1060) has no `--session-id` option — the gateway's alarm feed is session-less ("Served by the gateway's always-on alarm monitor — no worker session is opened" per the SDK Javadoc at `MxGatewayClient.java:331`). picocli will reject the unknown option with a non-zero exit before the command body runs.
|
||
- `AcknowledgeAlarmCommand` (line 1135) also has no `--session-id` option, AND the option is named `--reference` (line 1136), not `--alarm-reference`. Two unknown-option errors per copy-paste.
|
||
|
||
A user copying either invocation from the README hits a picocli parse error immediately. The other clients' README updates in the same commit (commit `8738735`) likely have the same issue but are out of scope for this Java review.
|
||
|
||
**Recommendation:** Drop the `--session-id <id>` token from both documented invocations, and change `--alarm-reference` to `--reference` in the `acknowledge-alarm` line. Optionally also add `--filter-prefix` to the `stream-alarms` example so readers see the scoping option, and align README option names with the actual CLI by either renaming the CLI option `--reference` → `--alarm-reference` (matches the proto `alarm_full_reference` field semantically) or leaving as is and only fixing the README. Add a small `MxGatewayCliTests` parse-only assertion for both subcommands that exercises every option flag to prevent the same drift the next time the CLI surface or README is touched.
|
||
|
||
### Client.Java-033
|
||
|
||
| Field | Value |
|
||
|---|---|
|
||
| Severity | Medium |
|
||
| Category | Correctness & logic bugs |
|
||
| Location | `clients/java/zb-mom-ww-mxgateway-cli/src/main/java/com/zb/mom/ww/mxgateway/cli/MxGatewayCli.java:1078-1098` |
|
||
| Status | Open |
|
||
|
||
**Description:** `StreamAlarmsCommand.call()` allocates a bounded `ArrayBlockingQueue<Object>(1024)` and the gRPC observer publishes each `AlarmFeedMessage` via `queue.offer(value)`:
|
||
|
||
```
|
||
BlockingQueue<Object> queue = new ArrayBlockingQueue<>(1024);
|
||
…
|
||
@Override public void onNext(AlarmFeedMessage value) { queue.offer(value); }
|
||
```
|
||
|
||
`BlockingQueue.offer` returns `false` when the queue is full and the message is dropped silently — no exception, no log, no signal to the consumer that data was lost. Combined with gRPC's default auto-inbound flow control (the same regime that Client.Java-011 documented for `MxEventStream`), any consumer stall long enough to fill 1024 slots will silently lose alarms past the boundary. The active-alarm snapshot replay at session open is exactly the burst case that pushes a moderately busy alarm feed close to that limit, and the failure mode is invisible — the CLI prints a truncated feed and exits 0.
|
||
|
||
The library-side `MxEventStream` (Client.Java-002 resolution) and `DeployEventStream` (Client.Java-021 resolution) explicitly surface overflow as `MxGatewayException` so the consumer can resubscribe with a resume cursor. The new `stream-alarms` CLI path regresses that contract.
|
||
|
||
`JavaStyleGuide.md` ("Streaming" section, line 57) explicitly prohibits this pattern: "Do not reorder, coalesce, or drop events in client code." `CLAUDE.md` ("fail-fast event backpressure") makes overflow a deliberate cancel-and-surface signal, not a silent drop.
|
||
|
||
**Recommendation:** Either (a) wrap the gRPC observer in the existing `MxEventStream`-style adaptor that calls `subscription.cancel()` and queues an exception on `queue.offer` returning `false`, then surface that exception from the drain loop — mirroring `MxEventStream.observer().onNext`'s overflow branch; or (b) reuse the library-side fail-fast plumbing by promoting `MxEventStream` (or extracting its terminal-state base) into a public `MxAlarmFeedStream` and have `MxGatewayClient.streamAlarms` return that instead of a bare subscription handle. Option (b) lines up with Client.Java-036 (deduplicate the subscription class family). Add a CLI regression test that overflows the bounded queue and asserts a non-zero exit / overflow exception, mirroring `MxGatewayMediumFindingsTests.eventStreamOverflowExceptionSurvivesASubsequentClose`.
|
||
|
||
### Client.Java-034
|
||
|
||
| Field | Value |
|
||
|---|---|
|
||
| Severity | Medium |
|
||
| Category | Correctness & logic bugs |
|
||
| Location | `clients/java/zb-mom-ww-mxgateway-cli/src/main/java/com/zb/mom/ww/mxgateway/cli/MxGatewayCli.java:182-198` |
|
||
| Status | Open |
|
||
|
||
**Description:** `BatchCommand.call()` reads one CLI invocation per stdin line and tokenises with:
|
||
|
||
```
|
||
String[] args = line.trim().split("\\s+");
|
||
…
|
||
int exitCode = cmd.execute(args);
|
||
```
|
||
|
||
`split("\\s+")` does no shell-quoting parsing — it just splits on whitespace runs. Any argument with embedded whitespace gets shredded mid-string. Examples that break:
|
||
|
||
- `acknowledge-alarm --comment "needs verification" --operator op1` becomes `[acknowledge-alarm, --comment, "needs, verification", --operator, op1]` — picocli sees `"needs` as the comment value and `verification"` as an unknown positional argument.
|
||
- `galaxy-watch --json --last-seen-deploy-time "2026-04-28 18:30:00Z"` (any value with a space) fails the same way.
|
||
|
||
The commit message for `71d2c39` says the protocol expects "one line of stdin = one full CLI invocation" without specifying whether quoted arguments are honoured, but the cross-language matrix (`scripts/run-client-e2e-tests.ps1` per the prior reviews) compares output across all five clients — if other clients (.NET, Go, Rust, Python) implement shell-like tokenisation, the Java CLI will diverge on any command-line value containing whitespace, and the e2e harness will fail or skip those cases inconsistently.
|
||
|
||
The current `MxGatewayCliTests` test set (`batchCommandExecutesVersionAndEmitsEorMarker`, `batchCommandEmitsEorAfterFailedCommandAndContinues`) only covers whitespace-free args, so the bug is invisible in the test suite.
|
||
|
||
**Recommendation:** Replace `line.trim().split("\\s+")` with a real shell-style tokeniser that honours single and double quotes and backslash escapes — `picocli.CommandLine.ArgumentParser` doesn't ship one, but Apache Commons Exec's `CommandLine.translateCommandline(String)`, JDK 21's `java.util.spi.ToolProvider` argument parsing, or a small hand-written state machine all work. Cross-check the .NET / Go / Rust / Python `batch` implementations in the same change so all five clients use the same tokenisation; document the contract in the protocol comment in `MxGatewayCli.java` and in `scripts/run-client-e2e-tests.ps1`. Add a CLI test that feeds `acknowledge-alarm --comment "with spaces"` through `batch` and asserts the `--comment` value reaches the gateway as `"with spaces"`.
|
||
|
||
### Client.Java-035
|
||
|
||
| Field | Value |
|
||
|---|---|
|
||
| Severity | Low |
|
||
| Category | Testing coverage |
|
||
| Location | `clients/java/zb-mom-ww-mxgateway-client/src/test/java/com/zb/mom/ww/mxgateway/client/MxGatewayClientSessionTests.java` |
|
||
| Status | Open |
|
||
|
||
**Description:** Commit `8a0c59d` added `MxGatewayClient.streamAlarms(StreamAlarmsRequest, StreamObserver<AlarmFeedMessage>)` and a new public `MxGatewayAlarmFeedSubscription` class. No library-side test exercises either: a grep for `streamAlarms` across `zb-mom-ww-mxgateway-client/src/test/...` returns zero matches. The CLI tests (`MxGatewayCliTests.streamAlarmsCommand*`) exercise the path end-to-end, but they route through a `FakeClient.streamAlarms` override that bypasses the production `subscription.wrap(observer)` glue and the `withStreamDeadline(rawAsyncStub()).streamAlarms(...)` call. A regression to either — forgetting `.wrap(observer)`, dropping the deadline interceptor, misnaming the request — would compile and pass the CLI tests but break against a real gateway.
|
||
|
||
This is the same coverage gap pattern as Client.Java-030 (no fixture test for `QueryActiveAlarmsRequest`) which was resolved by adding `queryActiveAlarmsForwardsRequestAndStreamsSnapshots` to `MxGatewayClientSessionTests`. The new alarm-feed RPC deserves the same shape.
|
||
|
||
**Recommendation:** Add `streamAlarmsForwardsRequestAndStreamsAlarmFeedMessages` to `MxGatewayClientSessionTests` (in-process gRPC via the existing `InProcessGateway` + `TestGatewayService` fixture): override `TestGatewayService.streamAlarms` to capture the inbound `StreamAlarmsRequest` and emit one `active_alarm` snapshot, one `snapshot_complete`, and one `transition`, then complete. Call `MxGatewayClient.streamAlarms`, drain the observer via a `CountDownLatch`, and assert (a) the server observed the `alarm_filter_prefix`, (b) all three messages arrived in order with the expected payload-case, and (c) `MxGatewayAlarmFeedSubscription.cancel()` aborts the call (latch via `ServerCallStreamObserver.setOnCancelHandler`, mirroring the Client.Java-015 cancellation regression). Optionally also cover the cancel-before-beforeStart race that `MxGatewayAlarmFeedSubscription.wrap` handles, mirroring `mxEventStreamCloseBeforeBeforeStartCancelsStream`.
|
||
|
||
### Client.Java-036
|
||
|
||
| Field | Value |
|
||
|---|---|
|
||
| Severity | Low |
|
||
| Category | Code organization & conventions |
|
||
| Location | `clients/java/zb-mom-ww-mxgateway-client/src/main/java/com/zb/mom/ww/mxgateway/client/MxGatewayAlarmFeedSubscription.java`, `MxGatewayEventSubscription.java`, `MxGatewayActiveAlarmsSubscription.java`, `DeployEventSubscription.java` |
|
||
| Status | Open |
|
||
|
||
**Description:** `MxGatewayAlarmFeedSubscription` is a structural near-copy of `MxGatewayEventSubscription` — same `AtomicReference<ClientCallStreamObserver<…>>` + `AtomicBoolean cancelled` field shape, the same `wrap(observer)` returning a `ClientResponseObserver` that stores `requestStream` in `beforeStart`, the same close-before-beforeStart race handling that Client.Java-014 originally fixed for `MxEventStream`, and the same `cancel()`+`close()` idempotency contract. The four subscription classes (`MxGatewayEventSubscription`, `MxGatewayActiveAlarmsSubscription`, `MxGatewayAlarmFeedSubscription`, `DeployEventSubscription`) are now ~60-line near-clones differing only in the request/response generic parameters and the `cancel` message string.
|
||
|
||
This is the same maintenance-hazard pattern Client.Java-009 / Client.Java-016 identified for `createChannel` / `withDeadline` / `shutdown` and which was resolved by extracting `MxGatewayChannels`. A future fix to one subscription class (e.g. propagating Client.Java-014's race fix to a future fifth subscription type, or hardening idempotent `cancel`) will silently miss the others — exactly what happened with Client.Java-021 (the `MxEventStream` terminal-state fix that didn't reach `DeployEventStream` for months).
|
||
|
||
**Recommendation:** Extract a package-private abstract base, e.g. `MxGatewayStreamSubscription<TRequest>`, holding the `AtomicReference` / `AtomicBoolean` pair, the `cancel()` / `close()` implementation, and a `ClientResponseObserver` factory parameterised by the cancel-message string and the response observer. Have all four subscription classes extend it. Behaviour-only refactor — no public API change, existing tests cover the contract.
|
||
|
||
|