Code-review 2026-05-20 sweep: re-review at 1cd51bb, resolve 72 findings across all 11 modules

Re-reviewed every module/client against the 10-category checklist
(REVIEW-PROCESS.md) at commit 1cd51bb, filed 72 new findings, and
fixed them in three priority waves (3 High, 17 Medium, 52 Low).

Highs
- Server-017: enumerate AcknowledgeAlarm / QueryActiveAlarms in
  GatewayGrpcScopeResolver so non-admin keys can use them; document
  the mapping in docs/Authorization.md; add interceptor tests.
- Client.Java-013: add the five missing bulk-method stubs to the
  CLI FakeSession so the test module compiles on a clean tree.
- Client.Rust-013: fix the clippy::doc_lazy_continuation regression
  in generated tonic code by reformatting the ReadBulkCommand proto
  comment and scoping a #![allow(...)] to the generated submodules.

Mediums (highlights)
- Server: unify GatewaySession state-lock discipline (-015) and
  make DisposeAsync race-safe against in-flight CloseAsync (-016);
  add constraint-enforcement test coverage for the bulk-plan path
  (-021).
- Worker: introduce StaRuntimeShutdownException so RunAlarmPollLoop
  can distinguish graceful shutdown from a real STA-affinity
  violation (-016); have the watchdog skip StaHung while
  CurrentCommandCorrelationId is non-empty so a legitimate slow
  ReadBulk no longer self-faults (-017).
- Tests: add per-method round-trip + cancellation coverage for the
  11 GatewaySession bulk methods (-013); replace the real TCP probe
  in GalaxyHierarchyCacheTests with an IGalaxyRepository fake
  (-016).
- IntegrationTests: drive the StreamEvents writer in the live Write
  test and assert OnWriteComplete (-012); add live tests for
  Unadvise/RemoveItem/Unregister ordering, WriteSecured, and
  abnormal worker exit (-014).
- Worker.Tests: replace MxAccessSession reflection with an internal
  CreateForTesting factory (-016); cover WorkerCancel and
  unexpected-body envelope branches (-017).
- Client.Java: cancel MxEventStream when close() races
  beforeStart() (-014); return a CancellingCompletableFuture from
  toCompletable and apply validators inline instead of via
  .thenApply, so cancel() reaches the source RPC (-015).
- Client.Python: drop the silent localhost-plaintext downgrade in
  the CLI; require explicit --plaintext (-013).
- Client.Rust: stop bench-read-bulk from polluting success-latency
  histograms with failed-call durations (-015); add coverage for
  the five MalformedReply paths, the bulk-write helpers, the
  Error::Unavailable mapping, and the unary-fault path (-016).
- Contracts: extend docs/Contracts.md with the bulk read/write
  command family (-009).

Lows (highlights)
- Server: cap GalaxyGlobMatcher.RegexCache; align
  WorkerAlarmRpcDispatcher missing-session handling; drop the
  duplicate dashboard @page routes; refresh IAlarmRpcDispatcher
  XML doc.
- Worker: surface SetXmlAlarmQuery COM failures; remove dead
  subscriptionExpression / ExecutingCommand arms; preserve
  factory-supplied runtime sessions; split MxAlarmSnapshot.cs into
  three files.
- Tests: dispose the WebApplication in seven test classes; rebuild
  FakeWorkerProcess.WaitForExitAsync against a real
  TaskCompletionSource; switch the heartbeat-expires test to
  ManualTimeProvider;
  add InvariantCulture to the remaining DateTimeOffset.Parse sites;
  document GalaxyFilterInputSafetyTests in GatewayTesting.md.
- IntegrationTests: comment fixes, RecordingServerStreamWriter
  IDisposable, class-level [Trait], single-source ZB default
  connection string.
- Worker.Tests: replace silent-return gating with LiveMxAccessFact
  so absent env vars skip the test instead of passing it;
  PascalCase rename of probe
  [Fact]s; deterministic deadline test; new frame-protocol error
  tests; ComputeTransitions diff-coverage; relocate dev-rig probes
  to Probes/.
- Contracts: add round-trip coverage and per-field redaction /
  Galaxy-identifier comments to the protos.
- Client.Dotnet: introduce clients/dotnet/Directory.Build.props so
  TreatWarningsAsErrors / analysers apply; document
  DiscoverHierarchyOptions and IMxGatewayCliClient; require typed
  bulk-read handles in CLI; surface AcknowledgeAlarm transport
  faults through Translate().
- Client.Go: kill dead code in alarms_test / fakeGalaxyServer /
  runWriteBulkVariant; document the six new subcommands in
  writeUsage; drain galaxy-watch events on limit; switch io.EOF
  comparisons to errors.Is.
- Client.Java: shared shutdown helpers + new shutdownTimeout
  option; regex-based credential redaction; Long.toUnsignedString
  for uint64 sequence; doc fixes.
- Client.Python: combine duplicate imports; add coverage for
  _percentile / bench-read-bulk / MAX_AGGREGATE_EVENTS /
  _api_key_from_env; populate pyproject metadata and ship py.typed.
- Client.Rust: expose next_correlation_id() so CLI ping/close
  stop hard-coding correlation IDs; resync RustClientDesign.md
  with the current Session / Error surface and CLI subcommand set.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Joseph Doherty
2026-05-20 09:46:47 -04:00
parent 1cd51bbda3
commit a0203503a7
122 changed files with 8723 additions and 757 deletions
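
The Client.Java-015 item in the message above rests on a JDK behaviour that is easy to check in isolation: cancelling a stage derived with thenApply does not cancel the future it was derived from. The following is a minimal sketch using only java.util.concurrent; the class name CancelPropagationDemo and the nested Forwarding type are hypothetical stand-ins for illustration, not code from this commit.

```java
import java.util.concurrent.CompletableFuture;

final class CancelPropagationDemo {
    /** Hypothetical stand-in: a future whose cancel() is forwarded to a backing future. */
    static final class Forwarding<T> extends CompletableFuture<T> {
        private final CompletableFuture<?> source;

        Forwarding(CompletableFuture<?> source) {
            this.source = source;
        }

        @Override
        public boolean cancel(boolean mayInterruptIfRunning) {
            boolean cancelled = super.cancel(mayInterruptIfRunning);
            source.cancel(mayInterruptIfRunning);
            return cancelled;
        }
    }

    /** Cancels a stage derived with thenApply and reports whether the upstream was cancelled. */
    static boolean upstreamCancelledViaThenApply() {
        CompletableFuture<String> upstream = new CompletableFuture<>();
        CompletableFuture<Integer> derived = upstream.thenApply(String::length);
        derived.cancel(true);
        return upstream.isCancelled();
    }

    /** Cancels the forwarding future directly and reports whether the source was cancelled. */
    static boolean sourceCancelledViaOverride() {
        CompletableFuture<String> source = new CompletableFuture<>();
        Forwarding<Integer> userFuture = new Forwarding<>(source);
        userFuture.cancel(true);
        return source.isCancelled();
    }

    /** Cancels a thenApply stage derived from the forwarding future; the override is bypassed. */
    static boolean sourceCancelledViaDerivedStage() {
        CompletableFuture<String> source = new CompletableFuture<>();
        Forwarding<String> bridged = new Forwarding<>(source);
        bridged.thenApply(String::length).cancel(true);
        return source.isCancelled();
    }

    public static void main(String[] args) {
        System.out.println("thenApply stage cancels upstream: " + upstreamCancelledViaThenApply());
        System.out.println("override cancels source: " + sourceCancelledViaOverride());
        System.out.println("derived stage cancels source: " + sourceCancelledViaDerivedStage());
    }
}
```

The third case is the reason a cancel() override alone is not sufficient: the future the caller actually holds has to be the overriding one, which is what applying the validator inline achieves.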
@@ -21,7 +21,7 @@ import java.util.List;
import java.util.Objects;
import java.util.Optional;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;
/**
* Thin wrapper around the generated {@link GalaxyRepositoryGrpc} stubs that
@@ -128,10 +128,14 @@ public final class GalaxyRepositoryClient implements AutoCloseable {
* exceptionally with {@link MxGatewayException} on failure
*/
public CompletableFuture<Boolean> testConnectionAsync() {
// Apply the projection inside toCompletable rather than via .thenApply
// so the user-visible future is the same future cancellation is bound
// to; a downstream .thenApply stage would not forward cancel() to the
// source RPC.
return MxGatewayChannels.toCompletable(
rawFutureStub().testConnection(TestConnectionRequest.getDefaultInstance()),
"galaxy test connection")
.thenApply(TestConnectionReply::getOk);
rawFutureStub().testConnection(TestConnectionRequest.getDefaultInstance()),
"galaxy test connection",
TestConnectionReply::getOk);
}
/**
@@ -163,10 +167,9 @@ public final class GalaxyRepositoryClient implements AutoCloseable {
*/
public CompletableFuture<Optional<Instant>> getLastDeployTimeAsync() {
return MxGatewayChannels.toCompletable(
rawFutureStub().getLastDeployTime(GetLastDeployTimeRequest.getDefaultInstance()),
"galaxy get last deploy time")
.thenApply(MxGatewayChannels.normalisingValidator(
"galaxy get last deploy time", GalaxyRepositoryClient::mapDeployTime));
rawFutureStub().getLastDeployTime(GetLastDeployTimeRequest.getDefaultInstance()),
"galaxy get last deploy time",
GalaxyRepositoryClient::mapDeployTime);
}
/**
@@ -210,7 +213,33 @@ public final class GalaxyRepositoryClient implements AutoCloseable {
* exceptionally with {@link MxGatewayException} on failure
*/
public CompletableFuture<List<GalaxyObject>> discoverHierarchyAsync() {
return discoverHierarchyPageAsync("", new java.util.ArrayList<>(), new java.util.HashSet<>());
// The recursive page chain produces a fresh in-flight RPC per page.
// Track the current in-flight stage in an AtomicReference and return a
// user-facing future whose cancel() forwards to that current stage —
// otherwise cancelling the chained CompletableFuture would not abort
// the in-flight gRPC call. Without this, .thenCompose creates new
// stages whose cancel() does not propagate upstream.
AtomicReference<CompletableFuture<?>> currentStage = new AtomicReference<>();
CompletableFuture<List<GalaxyObject>> userFuture = new CompletableFuture<>() {
@Override
public boolean cancel(boolean mayInterruptIfRunning) {
boolean cancelled = super.cancel(mayInterruptIfRunning);
CompletableFuture<?> stage = currentStage.get();
if (stage != null) {
stage.cancel(mayInterruptIfRunning);
}
return cancelled;
}
};
discoverHierarchyPageAsync("", new java.util.ArrayList<>(), new java.util.HashSet<>(), currentStage)
.whenComplete((result, error) -> {
if (error != null) {
userFuture.completeExceptionally(error);
} else {
userFuture.complete(result);
}
});
return userFuture;
}
/**
@@ -275,43 +304,30 @@ public final class GalaxyRepositoryClient implements AutoCloseable {
* callers do not leave in-flight calls or Netty event-loop threads running
* after the block exits.
*
* <p>Waits up to the configured connect timeout for graceful termination
* and forcibly shuts the channel down on timeout. If the calling thread is
* interrupted while waiting, the channel is forcibly shut down and the
* thread's interrupt flag is restored. No-op for clients that do not own
* their channel. For an explicitly checked, blocking-aware shutdown call
* {@link #closeAndAwaitTermination()}.
* <p>Waits up to {@link MxGatewayClientOptions#shutdownTimeout()} for
* graceful termination and forcibly shuts the channel down on timeout. If
* the calling thread is interrupted while waiting, the channel is forcibly
* shut down and the thread's interrupt flag is restored. No-op for clients
* that do not own their channel. For an explicitly checked, blocking-aware
* shutdown, call {@link #closeAndAwaitTermination()}. Delegates to the
* shared {@link MxGatewayChannels#shutdown} so behavior stays in lockstep
* with {@link MxGatewayClient}.
*/
@Override
public void close() {
if (ownedChannel == null) {
return;
}
ownedChannel.shutdown();
try {
if (!ownedChannel.awaitTermination(options.connectTimeout().toMillis(), TimeUnit.MILLISECONDS)) {
ownedChannel.shutdownNow();
}
} catch (InterruptedException error) {
ownedChannel.shutdownNow();
Thread.currentThread().interrupt();
}
MxGatewayChannels.shutdown(ownedChannel, options);
}
/**
* Shuts the owned channel down and waits up to the configured connect
* timeout for termination, forcibly shutting it down on timeout. No-op
* for clients that do not own their channel.
* Shuts the owned channel down and waits up to
* {@link MxGatewayClientOptions#shutdownTimeout()} for termination,
* forcibly shutting it down on timeout. No-op for clients that do not own
* their channel.
*
* @throws InterruptedException if the calling thread is interrupted while waiting
*/
public void closeAndAwaitTermination() throws InterruptedException {
if (ownedChannel != null) {
ownedChannel.shutdown();
if (!ownedChannel.awaitTermination(options.connectTimeout().toMillis(), TimeUnit.MILLISECONDS)) {
ownedChannel.shutdownNow();
}
}
MxGatewayChannels.shutdownAndAwaitTermination(ownedChannel, options);
}
private static Optional<Instant> mapDeployTime(GetLastDeployTimeReply reply) {
@@ -323,25 +339,33 @@ public final class GalaxyRepositoryClient implements AutoCloseable {
}
private CompletableFuture<List<GalaxyObject>> discoverHierarchyPageAsync(
String pageToken, java.util.ArrayList<GalaxyObject> objects, java.util.HashSet<String> seenPageTokens) {
String pageToken,
java.util.ArrayList<GalaxyObject> objects,
java.util.HashSet<String> seenPageTokens,
AtomicReference<CompletableFuture<?>> currentStage) {
DiscoverHierarchyRequest request = DiscoverHierarchyRequest.newBuilder()
.setPageSize(DISCOVER_HIERARCHY_PAGE_SIZE)
.setPageToken(pageToken)
.build();
return MxGatewayChannels.toCompletable(rawFutureStub().discoverHierarchy(request), "galaxy discover hierarchy")
.thenCompose(reply -> {
objects.addAll(reply.getObjectsList());
if (reply.getNextPageToken().isBlank()) {
return CompletableFuture.completedFuture(objects);
}
if (!seenPageTokens.add(reply.getNextPageToken())) {
CompletableFuture<List<GalaxyObject>> failed = new CompletableFuture<>();
failed.completeExceptionally(new MxGatewayException(
"galaxy discover hierarchy returned repeated page token: "
+ reply.getNextPageToken()));
return failed;
}
return discoverHierarchyPageAsync(reply.getNextPageToken(), objects, seenPageTokens);
});
CompletableFuture<DiscoverHierarchyReply> pageFuture = MxGatewayChannels.toCompletable(
rawFutureStub().discoverHierarchy(request), "galaxy discover hierarchy");
// Publish the in-flight page future so a user cancellation can abort
// the current outstanding RPC (the recursion replaces this reference
// before each subsequent page).
currentStage.set(pageFuture);
return pageFuture.thenCompose(reply -> {
objects.addAll(reply.getObjectsList());
if (reply.getNextPageToken().isBlank()) {
return CompletableFuture.completedFuture(objects);
}
if (!seenPageTokens.add(reply.getNextPageToken())) {
CompletableFuture<List<GalaxyObject>> failed = new CompletableFuture<>();
failed.completeExceptionally(new MxGatewayException(
"galaxy discover hierarchy returned repeated page token: "
+ reply.getNextPageToken()));
return failed;
}
return discoverHierarchyPageAsync(reply.getNextPageToken(), objects, seenPageTokens, currentStage);
});
}
}
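
The paged discoverHierarchyAsync change above can be exercised without gRPC. The sketch below reproduces the same shape (an AtomicReference that tracks the in-flight page, plus a user-facing future whose cancel() forwards to it) with hand-completed futures standing in for RPCs. PagedCancelSketch, fetchAll, and the inFlight list are hypothetical names for illustration only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicReference;

final class PagedCancelSketch {
    /** Page futures handed out so far; each one stands in for an in-flight RPC. */
    final List<CompletableFuture<Integer>> inFlight = new ArrayList<>();

    /** Hypothetical page fetch: the caller of this sketch completes the future by hand. */
    private CompletableFuture<Integer> fetchPage() {
        CompletableFuture<Integer> page = new CompletableFuture<>();
        inFlight.add(page);
        return page;
    }

    /** Collects the given number of pages; cancelling the result cancels the current page. */
    CompletableFuture<List<Integer>> fetchAll(int pages) {
        AtomicReference<CompletableFuture<?>> current = new AtomicReference<>();
        CompletableFuture<List<Integer>> user = new CompletableFuture<>() {
            @Override
            public boolean cancel(boolean mayInterruptIfRunning) {
                boolean cancelled = super.cancel(mayInterruptIfRunning);
                CompletableFuture<?> stage = current.get();
                if (stage != null) {
                    stage.cancel(mayInterruptIfRunning);
                }
                return cancelled;
            }
        };
        fetchFrom(pages, new ArrayList<>(), current).whenComplete((result, error) -> {
            if (error != null) {
                user.completeExceptionally(error);
            } else {
                user.complete(result);
            }
        });
        return user;
    }

    private CompletableFuture<List<Integer>> fetchFrom(
            int remaining, List<Integer> collected, AtomicReference<CompletableFuture<?>> current) {
        CompletableFuture<Integer> page = fetchPage();
        // Publish the in-flight page before chaining so a cancel can find it.
        current.set(page);
        return page.thenCompose(value -> {
            collected.add(value);
            if (remaining == 1) {
                return CompletableFuture.completedFuture(collected);
            }
            return fetchFrom(remaining - 1, collected, current);
        });
    }

    public static void main(String[] args) {
        PagedCancelSketch sketch = new PagedCancelSketch();
        CompletableFuture<List<Integer>> all = sketch.fetchAll(3);
        sketch.inFlight.get(0).complete(10); // page 1 arrives, page 2 is now in flight
        all.cancel(true);
        System.out.println("page 2 cancelled: " + sketch.inFlight.get(1).isCancelled());
    }
}
```

One caveat that applies to the sketch: a cancel() that lands between one page completing and the next page being published can miss the new page, so the reference is best treated as a best-effort handle.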
@@ -25,14 +25,17 @@ import mxaccess_gateway.v1.MxaccessGateway.StreamEventsRequest;
* <p><strong>Backpressure (fail-fast):</strong> this adaptor relies on gRPC's
* default auto-inbound flow control — the async stub auto-requests messages, so
* the gateway can push events faster than the consumer drains the bounded
* 16-element buffer. There is intentionally <em>no</em> real client flow
* control: a consumer that stalls long enough to let the buffer fill triggers
* an immediate overflow that cancels the subscription and surfaces an
* {@link MxGatewayException} on the next {@link #next()} call. This matches the
* gateway's documented fail-fast event-backpressure design — a slow consumer
* loses its subscription rather than silently dropping events. Consumers that
* cannot keep up must drain {@link #next()} promptly (e.g. hand events to their
* own larger queue) and be prepared to resubscribe with a resume cursor.
* 1024-element buffer (the buffer capacity is a constructor parameter; the
* production caller {@code MxGatewayClient.streamEvents} passes {@code 1024} to
* absorb the gateway's session-backlog replay burst). There is intentionally
* <em>no</em> real client flow control: a consumer that stalls long enough to
* let the buffer fill triggers an immediate overflow that cancels the
* subscription and surfaces an {@link MxGatewayException} on the next
* {@link #next()} call. This matches the gateway's documented fail-fast
* event-backpressure design — a slow consumer loses its subscription rather
* than silently dropping events. Consumers that cannot keep up must drain
* {@link #next()} promptly (e.g. hand events to their own larger queue) and be
* prepared to resubscribe with a resume cursor.
*
* <p><strong>Threading:</strong> the iterator methods ({@link #hasNext()} and
* {@link #next()}) are <em>not</em> thread-safe and must be driven by a single
@@ -60,7 +63,16 @@ public final class MxEventStream implements Iterator<MxEvent>, AutoCloseable {
return new ClientResponseObserver<>() {
@Override
public void beforeStart(ClientCallStreamObserver<StreamEventsRequest> requestStream) {
// Resolve the close()/beforeStart() race the same way DeployEventStream does:
// store the request stream first, then check the close flag and cancel the
// call if a prior close() already fired. Without this, a close() that ran
// before the gRPC call attached its ClientCallStreamObserver would skip
// stream.cancel() (because requestStream is still null) and beforeStart()
// arriving afterwards would leak the underlying call open.
MxEventStream.this.requestStream = requestStream;
if (closed) {
requestStream.cancel("client cancelled event stream", null);
}
}
@Override
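
The ordering described in the beforeStart() comment above (store the handle, then check the close flag) can be sketched independently of gRPC. The version below assumes close() does the mirror image, setting the flag first and then cancelling whatever handle is visible; that half is an assumption, since close() is not part of this hunk. CloseRaceSketch and Call are hypothetical names.

```java
import java.util.concurrent.atomic.AtomicInteger;

final class CloseRaceSketch {
    /** Hypothetical stand-in for the call handle; counts cancel() invocations. */
    static final class Call {
        final AtomicInteger cancels = new AtomicInteger();

        void cancel() {
            cancels.incrementAndGet();
        }
    }

    private volatile Call call;
    private volatile boolean closed;

    /** Mirrors beforeStart(): publish the handle first, then re-check the close flag. */
    void attach(Call started) {
        call = started;
        if (closed) {
            started.cancel();
        }
    }

    /** Assumed close() ordering: set the flag first, then cancel the visible handle. */
    void close() {
        closed = true;
        Call current = call;
        if (current != null) {
            current.cancel();
        }
    }

    public static void main(String[] args) {
        CloseRaceSketch stream = new CloseRaceSketch();
        Call late = new Call();
        stream.close();      // close() runs before the call is attached
        stream.attach(late); // attach() sees the flag and cancels
        System.out.println("cancels after late attach: " + late.cancels.get());
    }
}
```

Because each side writes its own field before reading the other's, and both fields are volatile, at least one side observes the other under any interleaving. Under a true race both may call cancel(), so the handle's cancel needs to tolerate repeated calls.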
@@ -98,19 +98,86 @@ final class MxGatewayChannels {
return stub.withDeadlineAfter(options.streamTimeout().toNanos(), TimeUnit.NANOSECONDS);
}
/**
* Shuts a client-owned channel down and waits up to the configured
* {@link MxGatewayClientOptions#shutdownTimeout()} for graceful
* termination, forcing {@code shutdownNow()} on timeout. If the calling
* thread is interrupted while waiting, the channel is forcibly shut down
* and the thread's interrupt flag is restored — this matches the
* try-with-resources {@code close()} contract that cannot throw a checked
* exception.
*
* <p>No-op when {@code ownedChannel} is {@code null} (i.e. the caller owns
* the channel lifecycle on a borrowed channel).
*
* @param ownedChannel the channel to shut down, may be {@code null}
* @param options the client options carrying the shutdown timeout
*/
static void shutdown(ManagedChannel ownedChannel, MxGatewayClientOptions options) {
if (ownedChannel == null) {
return;
}
ownedChannel.shutdown();
try {
if (!ownedChannel.awaitTermination(options.shutdownTimeout().toMillis(), TimeUnit.MILLISECONDS)) {
ownedChannel.shutdownNow();
}
} catch (InterruptedException error) {
ownedChannel.shutdownNow();
Thread.currentThread().interrupt();
}
}
/**
* Shuts a client-owned channel down and waits up to the configured
* {@link MxGatewayClientOptions#shutdownTimeout()} for termination,
* forcing {@code shutdownNow()} on timeout. Throws
* {@link InterruptedException} when the calling thread is interrupted —
* for callers that want a checked, blocking-aware shutdown.
*
* <p>No-op when {@code ownedChannel} is {@code null}.
*
* @param ownedChannel the channel to shut down, may be {@code null}
* @param options the client options carrying the shutdown timeout
* @throws InterruptedException if the calling thread is interrupted while waiting
*/
static void shutdownAndAwaitTermination(ManagedChannel ownedChannel, MxGatewayClientOptions options)
throws InterruptedException {
if (ownedChannel == null) {
return;
}
ownedChannel.shutdown();
if (!ownedChannel.awaitTermination(options.shutdownTimeout().toMillis(), TimeUnit.MILLISECONDS)) {
ownedChannel.shutdownNow();
}
}
/**
* Bridges a Guava {@link ListenableFuture} to a {@link CompletableFuture},
* normalising any failure through {@link MxGatewayErrors#fromGrpc} so the
* async error surface matches the synchronous methods. Cancelling the
* returned future cancels the source RPC.
*
* <p><strong>Cancellation contract:</strong> the returned future is a
* {@link CancellingCompletableFuture} that overrides
* {@link CompletableFuture#cancel(boolean)} so that calling
* {@code cancel} on the returned future itself forwards to the source
* {@link ListenableFuture}. Stages derived from it through
* {@code thenApply}/{@code thenCompose} are new
* {@code CompletableFuture} instances whose cancellation does
* <em>not</em> propagate back to this future, which is why the historical
* {@code whenComplete}-based forwarder missed cancellations issued on a
* chained stage. Callers that need a transformed result and working
* cancellation should use the three-argument overload, which applies
* the transformation inline and returns the cancellation-bound future.
*
* @param source the gRPC future-stub result
* @param operation the operation name used in normalised error messages
* @param <T> the reply type
* @return a completable future mirroring the source
*/
static <T> CompletableFuture<T> toCompletable(ListenableFuture<T> source, String operation) {
CompletableFuture<T> target = new CompletableFuture<>();
CancellingCompletableFuture<T> target = new CancellingCompletableFuture<>(source);
Futures.addCallback(
source,
new FutureCallback<>() {
@@ -129,14 +196,83 @@ final class MxGatewayChannels {
}
},
MoreExecutors.directExecutor());
target.whenComplete((ignoredResult, ignoredError) -> {
if (target.isCancelled()) {
source.cancel(true);
}
});
return target;
}
/**
* Bridges a Guava {@link ListenableFuture} to a {@link CompletableFuture}
* and applies {@code validator} to the reply inline (i.e. without a
* downstream {@code thenApply}), so the user-visible future is the same
* future cancellation is bound to. Any non-{@link MxGatewayException}
* {@link RuntimeException} thrown by {@code validator} is routed through
* {@link MxGatewayErrors#fromGrpc} to match the synchronous error surface.
*
* <p>This overload exists because the prior {@code toCompletable(...)
* .thenApply(validator)} pattern broke cancellation propagation: the
* future returned by {@code thenApply} is a new stage whose cancellation
* does not propagate to the underlying gRPC call. Using this overload, the
* single returned future is the one users hold, so calling {@code cancel}
* on it forwards to the source RPC.
*
* @param source the gRPC future-stub result
* @param operation the operation name used in normalised error messages
* @param validator the validating/transforming function applied to the reply
* @param <T> the reply type
* @param <R> the validated/transformed result type
* @return a completable future mirroring the validated source
*/
static <T, R> CompletableFuture<R> toCompletable(
ListenableFuture<T> source, String operation, Function<T, R> validator) {
CancellingCompletableFuture<R> target = new CancellingCompletableFuture<>(source);
Futures.addCallback(
source,
new FutureCallback<>() {
@Override
public void onSuccess(T result) {
try {
target.complete(validator.apply(result));
} catch (MxGatewayException error) {
target.completeExceptionally(error);
} catch (RuntimeException error) {
target.completeExceptionally(MxGatewayErrors.fromGrpc(operation, error));
}
}
@Override
public void onFailure(Throwable error) {
if (error instanceof RuntimeException runtimeException) {
target.completeExceptionally(MxGatewayErrors.fromGrpc(operation, runtimeException));
return;
}
target.completeExceptionally(error);
}
},
MoreExecutors.directExecutor());
return target;
}
/**
* {@link CompletableFuture} subclass that forwards {@link #cancel(boolean)}
* to a backing {@link ListenableFuture}. Used by {@link #toCompletable} so
* cancelling the user-visible future cancels the underlying gRPC call.
*/
static final class CancellingCompletableFuture<T> extends CompletableFuture<T> {
private final ListenableFuture<?> source;
CancellingCompletableFuture(ListenableFuture<?> source) {
this.source = source;
}
@Override
public boolean cancel(boolean mayInterruptIfRunning) {
boolean cancelled = super.cancel(mayInterruptIfRunning);
// Always forward; the source future is idempotent on cancel and the
// user contract is that cancelling the future cancels the RPC.
source.cancel(mayInterruptIfRunning);
return cancelled;
}
}
/**
* Adapts a reply-validating function for use inside {@code thenApply} so
* any non-{@link MxGatewayException} {@link RuntimeException} it raises is
@@ -7,7 +7,6 @@ import io.grpc.ManagedChannel;
import io.grpc.stub.StreamObserver;
import java.util.Objects;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import mxaccess_gateway.v1.MxAccessGatewayGrpc;
import mxaccess_gateway.v1.MxaccessGateway.AcknowledgeAlarmReply;
import mxaccess_gateway.v1.MxaccessGateway.AcknowledgeAlarmRequest;
@@ -181,13 +180,16 @@ public final class MxGatewayClient implements AutoCloseable {
* with {@link MxGatewayException} on failure
*/
public CompletableFuture<OpenSessionReply> openSessionAsync(OpenSessionRequest request) {
CompletableFuture<OpenSessionReply> future =
MxGatewayChannels.toCompletable(rawFutureStub().openSession(request), "open session");
return future.thenApply(MxGatewayChannels.normalisingValidator("open session", reply -> {
MxGatewayErrors.ensureProtocolSuccess("open session", reply.getProtocolStatus(), null);
ensureGatewayProtocolCompatible(reply);
return reply;
}));
// Apply the validator inside toCompletable rather than via .thenApply so
// cancellation on the returned future forwards to the source RPC (a
// .thenApply stage returns a fresh CompletableFuture whose cancel()
// does not propagate back to the upstream stage).
return MxGatewayChannels.toCompletable(
rawFutureStub().openSession(request), "open session", reply -> {
MxGatewayErrors.ensureProtocolSuccess("open session", reply.getProtocolStatus(), null);
ensureGatewayProtocolCompatible(reply);
return reply;
});
}
/**
@@ -222,13 +224,11 @@ public final class MxGatewayClient implements AutoCloseable {
* on failure
*/
public CompletableFuture<MxCommandReply> invokeAsync(MxCommandRequest request) {
CompletableFuture<MxCommandReply> future =
MxGatewayChannels.toCompletable(rawFutureStub().invoke(request), "invoke");
return future.thenApply(MxGatewayChannels.normalisingValidator("invoke", reply -> {
return MxGatewayChannels.toCompletable(rawFutureStub().invoke(request), "invoke", reply -> {
MxGatewayErrors.ensureProtocolSuccess("invoke", reply.getProtocolStatus(), reply);
MxGatewayErrors.ensureMxAccessSuccess("invoke", reply);
return reply;
}));
});
}
/**
@@ -320,12 +320,11 @@ public final class MxGatewayClient implements AutoCloseable {
* with {@link MxGatewayException} on failure
*/
public CompletableFuture<AcknowledgeAlarmReply> acknowledgeAlarmAsync(AcknowledgeAlarmRequest request) {
CompletableFuture<AcknowledgeAlarmReply> future =
MxGatewayChannels.toCompletable(rawFutureStub().acknowledgeAlarm(request), "acknowledge alarm");
return future.thenApply(MxGatewayChannels.normalisingValidator("acknowledge alarm", reply -> {
MxGatewayErrors.ensureProtocolSuccess("acknowledge alarm", reply.getProtocolStatus(), null);
return reply;
}));
return MxGatewayChannels.toCompletable(
rawFutureStub().acknowledgeAlarm(request), "acknowledge alarm", reply -> {
MxGatewayErrors.ensureProtocolSuccess("acknowledge alarm", reply.getProtocolStatus(), null);
return reply;
});
}
/**
@@ -351,43 +350,30 @@ public final class MxGatewayClient implements AutoCloseable {
* callers do not leave in-flight calls or Netty event-loop threads running
* after the block exits.
*
* <p>Waits up to the configured connect timeout for graceful termination
* and forcibly shuts the channel down on timeout. If the calling thread is
* interrupted while waiting, the channel is forcibly shut down and the
* thread's interrupt flag is restored. No-op for clients that do not own
* their channel. For an explicitly checked, blocking-aware shutdown call
* {@link #closeAndAwaitTermination()}.
* <p>Waits up to {@link MxGatewayClientOptions#shutdownTimeout()} for
* graceful termination and forcibly shuts the channel down on timeout. If
* the calling thread is interrupted while waiting, the channel is forcibly
* shut down and the thread's interrupt flag is restored. No-op for clients
* that do not own their channel. For an explicitly checked, blocking-aware
* shutdown, call {@link #closeAndAwaitTermination()}. Delegates to the
* shared {@link MxGatewayChannels#shutdown} so behavior stays in lockstep
* with {@link GalaxyRepositoryClient}.
*/
@Override
public void close() {
if (ownedChannel == null) {
return;
}
ownedChannel.shutdown();
try {
if (!ownedChannel.awaitTermination(options.connectTimeout().toMillis(), TimeUnit.MILLISECONDS)) {
ownedChannel.shutdownNow();
}
} catch (InterruptedException error) {
ownedChannel.shutdownNow();
Thread.currentThread().interrupt();
}
MxGatewayChannels.shutdown(ownedChannel, options);
}
/**
* Shuts the owned channel down and waits up to the configured connect
* timeout for termination, forcibly shutting it down on timeout. No-op
* for clients that do not own their channel.
* Shuts the owned channel down and waits up to
* {@link MxGatewayClientOptions#shutdownTimeout()} for termination,
* forcibly shutting it down on timeout. No-op for clients that do not own
* their channel.
*
* @throws InterruptedException if the calling thread is interrupted while waiting
*/
public void closeAndAwaitTermination() throws InterruptedException {
if (ownedChannel != null) {
ownedChannel.shutdown();
if (!ownedChannel.awaitTermination(options.connectTimeout().toMillis(), TimeUnit.MILLISECONDS)) {
ownedChannel.shutdownNow();
}
}
MxGatewayChannels.shutdownAndAwaitTermination(ownedChannel, options);
}
static ProtocolStatusCode okStatusCode() {
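
The shared shutdown helper that both clients now delegate to follows the usual shutdown / awaitTermination / shutdownNow sequence. As a rough sketch that runs without gRPC on the classpath, the same sequence is shown here against java.util.concurrent.ExecutorService, which exposes methods of the same names. ShutdownSketch is a hypothetical name, and the boolean return value is an addition for the sketch (the helper in the diff returns void).

```java
import java.time.Duration;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

final class ShutdownSketch {
    /**
     * Unchecked shutdown: waits up to the timeout, forces shutdownNow() on timeout
     * or interrupt, and restores the interrupt flag. Returns true only when
     * termination was graceful (sketch-only return value).
     */
    static boolean shutdown(ExecutorService owned, Duration timeout) {
        if (owned == null) {
            return true; // nothing owned, nothing to shut down
        }
        owned.shutdown();
        try {
            if (owned.awaitTermination(timeout.toMillis(), TimeUnit.MILLISECONDS)) {
                return true;
            }
            owned.shutdownNow();
            return false;
        } catch (InterruptedException error) {
            owned.shutdownNow();
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        ExecutorService busy = Executors.newSingleThreadExecutor();
        busy.submit(() -> {
            Thread.sleep(60_000); // simulated in-flight work
            return null;
        });
        // A short timeout forces shutdownNow(), which interrupts the sleeping task.
        System.out.println("graceful: " + shutdown(busy, Duration.ofMillis(50)));
    }
}
```

Passing the timeout separately from any connect timeout is the point of the new option: a short connect timeout no longer shortens the grace period given to in-flight work.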
@@ -14,6 +14,7 @@ import java.util.Objects;
public final class MxGatewayClientOptions {
private static final Duration DEFAULT_CONNECT_TIMEOUT = Duration.ofSeconds(10);
private static final Duration DEFAULT_CALL_TIMEOUT = Duration.ofSeconds(30);
private static final Duration DEFAULT_SHUTDOWN_TIMEOUT = Duration.ofSeconds(10);
private static final int DEFAULT_MAX_GRPC_MESSAGE_BYTES = 16 * 1024 * 1024;
private final String endpoint;
@@ -24,6 +25,7 @@ public final class MxGatewayClientOptions {
private final Duration connectTimeout;
private final Duration callTimeout;
private final Duration streamTimeout;
private final Duration shutdownTimeout;
private final int maxGrpcMessageBytes;
private MxGatewayClientOptions(Builder builder) {
@@ -35,6 +37,7 @@ public final class MxGatewayClientOptions {
connectTimeout = builder.connectTimeout == null ? DEFAULT_CONNECT_TIMEOUT : builder.connectTimeout;
callTimeout = builder.callTimeout == null ? DEFAULT_CALL_TIMEOUT : builder.callTimeout;
streamTimeout = builder.streamTimeout;
shutdownTimeout = builder.shutdownTimeout == null ? DEFAULT_SHUTDOWN_TIMEOUT : builder.shutdownTimeout;
maxGrpcMessageBytes = builder.maxGrpcMessageBytes <= 0
? DEFAULT_MAX_GRPC_MESSAGE_BYTES
: builder.maxGrpcMessageBytes;
@@ -131,6 +134,18 @@ public final class MxGatewayClientOptions {
return streamTimeout;
}
/**
* Returns the upper bound on graceful shutdown waiting, applied by
* {@code close()} and {@code closeAndAwaitTermination()}. Independent of
* {@link #connectTimeout()}; a small connect timeout no longer forces an
* aggressive {@code shutdownNow()} on in-flight calls.
*
* @return the shutdown timeout duration
*/
public Duration shutdownTimeout() {
return shutdownTimeout;
}
public int maxGrpcMessageBytes() {
return maxGrpcMessageBytes;
}
@@ -157,6 +172,8 @@ public final class MxGatewayClientOptions {
+ callTimeout
+ ", streamTimeout="
+ streamTimeout
+ ", shutdownTimeout="
+ shutdownTimeout
+ ", maxGrpcMessageBytes="
+ maxGrpcMessageBytes
+ '}';
@@ -181,6 +198,7 @@ public final class MxGatewayClientOptions {
private Duration connectTimeout;
private Duration callTimeout;
private Duration streamTimeout;
private Duration shutdownTimeout;
private int maxGrpcMessageBytes;
private Builder() {
@@ -277,6 +295,20 @@ public final class MxGatewayClientOptions {
return this;
}
/**
* Sets the upper bound on graceful shutdown waiting (applied by
* {@code close()} and {@code closeAndAwaitTermination()}). Defaults to
* 10 s and is independent of the connect timeout.
*
* @param value the shutdown timeout, must be non-{@code null}
* @return this builder
* @throws NullPointerException if {@code value} is {@code null}
*/
public Builder shutdownTimeout(Duration value) {
shutdownTimeout = Objects.requireNonNull(value, "shutdownTimeout");
return this;
}
public Builder maxGrpcMessageBytes(int value) {
maxGrpcMessageBytes = value;
return this;
@@ -1,5 +1,7 @@
package com.dohertylan.mxgateway.client;
import java.util.regex.Pattern;
/**
* Helpers for redacting secrets such as gateway API keys from log output.
*
@@ -7,6 +9,16 @@ package com.dohertylan.mxgateway.client;
* produce shortened, masked forms safe for diagnostic messages.
*/
public final class MxGatewaySecrets {
// Match any gateway-shaped credential anywhere in the string, regardless of
// surrounding punctuation: quoted, colon/comma-delimited, embedded in URLs
// or parens. The character class covers the underscore separators and also
// admits hyphens, in case a future key format introduces one.
private static final Pattern MXGW_TOKEN = Pattern.compile("mxgw_[A-Za-z0-9_-]+");
// Mask the token after a Bearer marker as a unit so callers cannot
// accidentally leak the secret when the surrounding text is a header-style
// string (e.g. "Bearer mxgw_id_secret").
private static final Pattern BEARER_TOKEN = Pattern.compile("(?i)bearer\\s+\\S+");
private MxGatewaySecrets() {
}
@@ -43,9 +55,15 @@ public final class MxGatewaySecrets {
}
/**
* Replaces gateway-style credential tokens (the {@code mxgw_} prefix and
* any {@code Bearer} marker) inside a free-form string with a redaction
* placeholder.
* Replaces gateway-style credential tokens inside a free-form string with a
* redaction placeholder.
*
* <p>Matches any {@code mxgw_<...>} token anywhere in the string,
* irrespective of surrounding punctuation (whitespace, colons, commas,
* single/double quotes, parentheses, embedded URL paths). Also masks the
* argument of an authorization-header style {@code Bearer <token>} marker
* as a unit so the token cannot leak through when the surrounding string
* is a raw header value.
*
* @param value the string to scrub, may be {@code null}
* @return an empty string for {@code null}, the original value when blank,
@@ -56,12 +74,8 @@ public final class MxGatewaySecrets {
return value == null ? "" : value;
}
String[] parts = value.split("\\s+");
for (int index = 0; index < parts.length; index++) {
if (parts[index].startsWith("mxgw_") || parts[index].equalsIgnoreCase("bearer")) {
parts[index] = "<redacted>";
}
}
return String.join(" ", parts);
String scrubbed = MXGW_TOKEN.matcher(value).replaceAll("<redacted>");
scrubbed = BEARER_TOKEN.matcher(scrubbed).replaceAll("Bearer <redacted>");
return scrubbed;
}
}
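For reviewers who want to exercise the new scrub outside the test suite, the sketch below copies the two patterns from this hunk into a standalone class. `RedactionSketch` and `redact` are hypothetical names used only for illustration, and the blank-input guard is assumed to be `value == null || value.isBlank()`, which the hunk does not show.

```java
import java.util.regex.Pattern;

/** Standalone copy of the two-pass scrub for experimentation; not the shipped class. */
final class RedactionSketch {
    // Pass 1: any mxgw_-prefixed token, wherever it sits in the string.
    private static final Pattern MXGW_TOKEN = Pattern.compile("mxgw_[A-Za-z0-9_-]+");
    // Pass 2: the non-whitespace run that follows a case-insensitive "bearer" marker.
    private static final Pattern BEARER_TOKEN = Pattern.compile("(?i)bearer\\s+\\S+");

    static String redact(String value) {
        // Assumed guard: null becomes "", blank input is returned unchanged.
        if (value == null || value.isBlank()) {
            return value == null ? "" : value;
        }
        String scrubbed = MXGW_TOKEN.matcher(value).replaceAll("<redacted>");
        return BEARER_TOKEN.matcher(scrubbed).replaceAll("Bearer <redacted>");
    }

    public static void main(String[] args) {
        // prints: token=<redacted>,scope=admin
        System.out.println(redact("token=mxgw_keyid_secret,scope=admin"));
        // prints: auth('<redacted>')
        System.out.println(redact("auth('mxgw_keyid_secret')"));
        // prints: Authorization: Bearer <redacted>
        System.out.println(redact("Authorization: bearer opaque-token-123"));
    }
}
```

The second pass also normalises the marker's case, so `bearer x` comes back as `Bearer <redacted>`, and it masks non-`mxgw_` tokens that follow a Bearer marker.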
@@ -106,6 +106,37 @@ final class MxGatewayFixtureTests {
assertFalse(authError.getMessage().contains("visible_secret"));
}
@Test
void redactCredentialsHandlesNonWhitespaceDelimitedTokens() {
// Client.Java-018 regression: the previous whitespace-split scrub left
// mxgw_ credentials attached to quotes, commas, colons, parens, and
// URL paths intact. The strengthened pattern matches mxgw_<...>
// anywhere in the string regardless of surrounding punctuation.
String singleQuoted = MxGatewaySecrets.redactCredentials("authentication failed: 'mxgw_keyid_secret'");
String doubleQuoted = MxGatewaySecrets.redactCredentials("Bearer:\"mxgw_keyid_secret\"");
String commaDelimited = MxGatewaySecrets.redactCredentials("token=mxgw_keyid_secret,scope=admin");
String colonDelimited = MxGatewaySecrets.redactCredentials("Bearer:mxgw_keyid_secret");
String parenthesised = MxGatewaySecrets.redactCredentials("auth(mxgw_keyid_secret)");
String urlEmbedded = MxGatewaySecrets.redactCredentials("https://gw/api?key=mxgw_keyid_secret&x=1");
String bearerHeader = MxGatewaySecrets.redactCredentials("Bearer mxgw_keyid_secret");
for (String redacted : new String[] {
singleQuoted, doubleQuoted, commaDelimited, colonDelimited, parenthesised, urlEmbedded, bearerHeader
}) {
assertFalse(redacted.contains("mxgw_keyid_secret"), "expected redaction, got: " + redacted);
assertFalse(redacted.contains("keyid_secret"), "tail leaked: " + redacted);
assertTrue(redacted.contains("<redacted>"), "expected <redacted>, got: " + redacted);
}
}
@Test
void redactCredentialsLeavesBenignContentAlone() {
assertEquals(
"no credentials here",
MxGatewaySecrets.redactCredentials("no credentials here"));
assertEquals("", MxGatewaySecrets.redactCredentials(null));
}
private static JsonObject readFixture(String relativePath) throws Exception {
return JsonParser.parseString(Files.readString(fixtureRoot().resolve(relativePath))).getAsJsonObject();
}
@@ -0,0 +1,182 @@
package com.dohertylan.mxgateway.client;
import static org.junit.jupiter.api.Assertions.assertEquals;
import static org.junit.jupiter.api.Assertions.assertFalse;
import static org.junit.jupiter.api.Assertions.assertTrue;
import io.grpc.CallOptions;
import io.grpc.ClientCall;
import io.grpc.ConnectivityState;
import io.grpc.ManagedChannel;
import io.grpc.MethodDescriptor;
import java.time.Duration;
import java.util.concurrent.TimeUnit;
import org.junit.jupiter.api.Test;
/**
* Regression tests for the second-pass Low-severity Client.Java findings
* (Client.Java-016 and Client.Java-019), covering the shared shutdown helpers
* extracted to {@link MxGatewayChannels}.
*/
final class MxGatewayLowFindingsIITests {
// --- Client.Java-019: shutdown timeout is independent of connect timeout ---
@Test
void shutdownAndAwaitTerminationHonoursShutdownTimeoutNotConnectTimeout() throws Exception {
// The historical bug: close() used connectTimeout as the awaitTermination
// deadline, so a small connectTimeout forced a premature shutdownNow()
// on in-flight calls. The fix uses a dedicated shutdownTimeout. This
// test verifies the helper waits up to shutdownTimeout (1s) even when
// connectTimeout is set to a tiny value (50ms).
RecordingChannel channel = new RecordingChannel(/* terminatesAfterMillis = */ 200);
MxGatewayClientOptions options = MxGatewayClientOptions.builder()
.endpoint("in-process")
.plaintext(true)
.connectTimeout(Duration.ofMillis(50))
.shutdownTimeout(Duration.ofSeconds(1))
.build();
long start = System.nanoTime();
MxGatewayChannels.shutdownAndAwaitTermination(channel, options);
long elapsedMillis = (System.nanoTime() - start) / 1_000_000L;
// The channel finished orderly termination within the shutdown timeout
// window, so shutdownNow() must NOT have been called. With the old
// implementation a 50ms connect-timeout-as-shutdown-deadline would
// have escalated to shutdownNow() before the channel's 200ms graceful
// termination completed.
assertTrue(channel.shutdownCalled, "shutdown() must be called");
assertFalse(
channel.shutdownNowCalled,
"graceful termination finished within shutdownTimeout; shutdownNow() must not have been called");
// Allow ample slack for build-machine variance, but assert we waited for
// most of the channel's 200ms graceful-termination window.
assertTrue(elapsedMillis >= 150, "should have waited for graceful termination, elapsed=" + elapsedMillis);
}
@Test
void shutdownEscalatesToShutdownNowWhenTimeoutExceeded() {
// The other half of the contract: a channel that does not terminate
// within the shutdownTimeout window must be forcibly shut down.
RecordingChannel channel = new RecordingChannel(/* terminatesAfterMillis = */ 5_000);
MxGatewayClientOptions options = MxGatewayClientOptions.builder()
.endpoint("in-process")
.plaintext(true)
.shutdownTimeout(Duration.ofMillis(100))
.build();
MxGatewayChannels.shutdown(channel, options);
assertTrue(channel.shutdownCalled);
assertTrue(channel.shutdownNowCalled, "stuck channel must be forcibly shut down");
}
@Test
void shutdownTimeoutDefaultIsTenSecondsIndependentOfConnectTimeout() {
MxGatewayClientOptions defaults = MxGatewayClientOptions.builder()
.endpoint("in-process")
.build();
// Default is 10s. The connectTimeout default happens to be 10s as well,
// but the two are now independent options.
assertEquals(Duration.ofSeconds(10), defaults.shutdownTimeout());
MxGatewayClientOptions tinyConnect = MxGatewayClientOptions.builder()
.endpoint("in-process")
.connectTimeout(Duration.ofMillis(500))
.build();
assertEquals(Duration.ofSeconds(10), tinyConnect.shutdownTimeout(),
"shutdownTimeout default is independent of connectTimeout");
}
// --- Client.Java-016: shared shutdown helpers behave identically for both clients ---
@Test
void sharedShutdownHelperIsNoOpForNullChannel() throws Exception {
MxGatewayClientOptions options = MxGatewayClientOptions.builder()
.endpoint("in-process")
.plaintext(true)
.shutdownTimeout(Duration.ofMillis(50))
.build();
// Both helpers must tolerate a null owned-channel (caller-managed channel case).
MxGatewayChannels.shutdown(null, options);
MxGatewayChannels.shutdownAndAwaitTermination(null, options);
}
/**
* Test double for {@link ManagedChannel} that records {@code shutdown}/
* {@code shutdownNow} invocations and simulates an orderly termination
* after a configurable delay. Avoids the heavy in-process gRPC machinery —
* the shutdown helpers only touch the three lifecycle methods.
*/
private static final class RecordingChannel extends ManagedChannel {
private final long terminatesAfterMillis;
private volatile long shutdownAtNanos;
private volatile boolean shutdownCalled;
private volatile boolean shutdownNowCalled;
RecordingChannel(long terminatesAfterMillis) {
this.terminatesAfterMillis = terminatesAfterMillis;
}
@Override
public ManagedChannel shutdown() {
// Start the simulated termination clock on the first shutdown() call rather
// than at construction, so test setup time cannot eat into the delay.
if (!shutdownCalled) {
shutdownAtNanos = System.nanoTime();
}
shutdownCalled = true;
return this;
}
@Override
public boolean isShutdown() {
return shutdownCalled || shutdownNowCalled;
}
@Override
public boolean isTerminated() {
if (shutdownNowCalled) {
return true;
}
if (!shutdownCalled) {
return false;
}
long elapsed = (System.nanoTime() - shutdownAtNanos) / 1_000_000L;
return elapsed >= terminatesAfterMillis;
}
@Override
public ManagedChannel shutdownNow() {
shutdownNowCalled = true;
return this;
}
@Override
public boolean awaitTermination(long timeout, TimeUnit unit) throws InterruptedException {
long deadlineNanos = System.nanoTime() + unit.toNanos(timeout);
while (System.nanoTime() < deadlineNanos) {
if (isTerminated()) {
return true;
}
long remaining = Math.max(1, (deadlineNanos - System.nanoTime()) / 1_000_000L);
Thread.sleep(Math.min(remaining, 10));
}
return isTerminated();
}
@Override
public <RequestT, ResponseT> ClientCall<RequestT, ResponseT> newCall(
MethodDescriptor<RequestT, ResponseT> methodDescriptor, CallOptions callOptions) {
throw new UnsupportedOperationException("no RPCs are issued in shutdown tests");
}
@Override
public String authority() {
return "in-process";
}
@Override
public ConnectivityState getState(boolean requestConnection) {
return ConnectivityState.IDLE;
}
}
}
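The contract these tests pin down (shut down gracefully, wait up to the shutdown timeout, escalate only if the wait expires, tolerate a null owner) can be sketched without gRPC on the classpath. The body of `MxGatewayChannels` is not part of this hunk, so the following is an assumption about its general shape, not a copy of it. It uses `java.util.concurrent.ExecutorService` as a stand-in for `ManagedChannel`, since both expose `shutdown()`, `awaitTermination(...)`, and `shutdownNow()`. `GracefulShutdownSketch`, `shutdownGracefully`, and `sleepQuietly` are hypothetical names.

```java
import java.time.Duration;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

final class GracefulShutdownSketch {
    /** Returns true when termination was orderly, false when it had to be forced. */
    static boolean shutdownGracefully(ExecutorService executor, Duration shutdownTimeout) {
        if (executor == null) {
            return true; // caller-managed resource: nothing owned, nothing to shut down
        }
        executor.shutdown();
        try {
            // The deadline is the dedicated shutdown timeout, not any connect timeout.
            if (executor.awaitTermination(shutdownTimeout.toMillis(), TimeUnit.MILLISECONDS)) {
                return true;
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt, then escalate
        }
        executor.shutdownNow();
        return false;
    }

    static void sleepQuietly(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        ExecutorService quick = Executors.newSingleThreadExecutor();
        quick.submit(() -> sleepQuietly(100));
        // prints: true (work finished inside the 2s window)
        System.out.println(shutdownGracefully(quick, Duration.ofSeconds(2)));

        ExecutorService stuck = Executors.newSingleThreadExecutor();
        stuck.submit(() -> sleepQuietly(5_000));
        // prints: false (100ms window expired, shutdownNow() interrupted the task)
        System.out.println(shutdownGracefully(stuck, Duration.ofMillis(100)));
    }
}
```

Returning a boolean is a choice made for the sketch so that the two outcomes are observable; the tests above observe the same distinction through the `shutdownNowCalled` flag on `RecordingChannel`.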
@@ -13,6 +13,7 @@ import io.grpc.inprocess.InProcessServerBuilder;
import io.grpc.stub.StreamObserver;
import java.time.Duration;
import java.util.UUID;
import java.util.concurrent.CompletableFuture;
import mxaccess_gateway.v1.MxAccessGatewayGrpc;
import mxaccess_gateway.v1.MxaccessGateway.CloseSessionReply;
import mxaccess_gateway.v1.MxaccessGateway.CloseSessionRequest;
@@ -27,7 +28,7 @@ import org.junit.jupiter.api.Test;
/**
* Regression tests for the Medium-severity Client.Java code-review findings
* (Client.Java-001 through Client.Java-005).
* (Client.Java-001 through Client.Java-005, and Client.Java-014/015).
*/
final class MxGatewayMediumFindingsTests {
@@ -323,6 +324,138 @@ final class MxGatewayMediumFindingsTests {
}
}
// --- Client.Java-014: MxEventStream.close() before beforeStart must cancel the call ---
@Test
void mxEventStreamCloseBeforeBeforeStartCancelsStream() {
// Mirrors GalaxyRepositoryClientTests.deployEventStreamCloseBeforeBeforeStartCancelsStream:
// if close() runs before the gRPC call has attached its ClientCallStreamObserver,
// beforeStart() must observe the prior close and cancel the underlying call so the
// gRPC subscription does not leak open after the consumer has stopped iterating.
MxEventStream stream = new MxEventStream(4);
io.grpc.stub.ClientResponseObserver<
mxaccess_gateway.v1.MxaccessGateway.StreamEventsRequest,
mxaccess_gateway.v1.MxaccessGateway.MxEvent>
observer = stream.observer();
RecordingEventsRequestStream requestStream = new RecordingEventsRequestStream();
stream.close();
observer.beforeStart(requestStream);
assertTrue(requestStream.cancelled, "beforeStart must cancel the underlying call after a prior close()");
assertEquals("client cancelled event stream", requestStream.cancelMessage);
assertFalse(stream.hasNext());
}
// --- Client.Java-015: cancelling the user-visible *Async future cancels the gRPC call ---
@Test
void invokeAsyncCancellationCancelsUnderlyingGrpcCall() throws Exception {
// Set up a gateway service that never completes the invoke call so cancellation is
// the only way the call terminates. Hook ServerCallStreamObserver.setOnCancelHandler
// to latch when the server observes cancellation.
java.util.concurrent.CountDownLatch serverCancelled = new java.util.concurrent.CountDownLatch(1);
TestService service = new TestService() {
@Override
public void invoke(MxCommandRequest request, StreamObserver<MxCommandReply> responseObserver) {
io.grpc.stub.ServerCallStreamObserver<MxCommandReply> serverObserver =
(io.grpc.stub.ServerCallStreamObserver<MxCommandReply>) responseObserver;
serverObserver.setOnCancelHandler(serverCancelled::countDown);
// Intentionally never complete — the call must be terminated by the client
// cancelling its future, which must propagate to the gRPC cancellation.
}
};
try (Harness harness = Harness.start(service)) {
CompletableFuture<MxCommandReply> future = harness.client().invokeAsync(MxCommandRequest.newBuilder()
.setSessionId("s-cancel")
.setCommand(mxaccess_gateway.v1.MxaccessGateway.MxCommand.newBuilder()
.setKind(MxCommandKind.MX_COMMAND_KIND_REGISTER))
.build());
// Cancellation of the user-visible future must propagate to the gRPC call.
assertTrue(future.cancel(true), "cancel(true) should return true on a pending future");
assertTrue(
serverCancelled.await(5, java.util.concurrent.TimeUnit.SECONDS),
"server must observe RPC cancellation after future.cancel(true)");
}
}
@Test
void toCompletableValidatorOverloadForwardsCancellationToSource() {
// Unit-level proof: cancel() on the future returned by the validator-aware
// toCompletable overload must call cancel(true) on the source ListenableFuture.
// This is the core fix for Client.Java-015 — the validator runs inside
// toCompletable instead of via .thenApply, so the user holds the future
// that is bound to the source.
com.google.common.util.concurrent.SettableFuture<String> source =
com.google.common.util.concurrent.SettableFuture.create();
java.util.concurrent.CompletableFuture<Integer> target =
MxGatewayChannels.toCompletable(source, "noop", String::length);
assertFalse(source.isCancelled());
assertTrue(target.cancel(true));
assertTrue(source.isCancelled(), "source ListenableFuture must be cancelled");
}
@Test
void toCompletableNoValidatorOverloadForwardsCancellationToSource() {
// Regression for the no-validator overload (the historic toCompletable shape).
com.google.common.util.concurrent.SettableFuture<String> source =
com.google.common.util.concurrent.SettableFuture.create();
java.util.concurrent.CompletableFuture<String> target = MxGatewayChannels.toCompletable(source, "noop");
assertFalse(source.isCancelled());
assertTrue(target.cancel(true));
assertTrue(source.isCancelled(), "source ListenableFuture must be cancelled");
}
private static final class RecordingEventsRequestStream
extends io.grpc.stub.ClientCallStreamObserver<
mxaccess_gateway.v1.MxaccessGateway.StreamEventsRequest> {
private boolean cancelled;
private String cancelMessage;
@Override
public void cancel(String message, Throwable cause) {
cancelled = true;
cancelMessage = message;
}
@Override
public boolean isReady() {
return true;
}
@Override
public void setOnReadyHandler(Runnable onReadyHandler) {
}
@Override
public void request(int count) {
}
@Override
public void setMessageCompression(boolean enable) {
}
@Override
public void disableAutoInboundFlowControl() {
}
@Override
public void onNext(mxaccess_gateway.v1.MxaccessGateway.StreamEventsRequest value) {
}
@Override
public void onError(Throwable t) {
}
@Override
public void onCompleted() {
}
}
private static mxaccess_gateway.v1.MxaccessGateway.MxEvent testEvent(int sequence) {
return mxaccess_gateway.v1.MxaccessGateway.MxEvent.newBuilder()
.setWorkerSequence(sequence)