Code-review 2026-05-20 sweep #2: re-review at a020350, resolve 48 findings

The second re-review pass at commit a020350 caught 48 new findings,
including one High-severity regression I introduced in the prior sweep.
All 48 were fixed in one parallel wave.

High (1)
- Client.Python-018: prior sweep set `license = "Proprietary"` in
  pyproject.toml. setuptools >= 77 enforces PEP 639 and rejects the
  string (it must be a valid SPDX expression), so `pip wheel .` and
  `pip install -e .` both fail before any source compiles. Tests
  still pass because pytest bypasses the build backend via
  `pythonpath`. Dropped the invalid license string, kept the
  `License :: Other/Proprietary License` classifier, and added
  `tests/test_packaging.py` so a future regression of the same shape
  is caught in CI.

Mediums (6)
- Worker-023: `HeartbeatStuckCeiling` (default 75s = 5x HeartbeatGrace)
  on WorkerPipeSessionOptions bounds the in-flight-command watchdog
  suppression so a truly stuck COM call still triggers StaHung
  instead of permanently defeating the watchdog.
- Client.Rust-018: reverted Rust's `latencyMs` split so the
  cross-language bench comparison is apples-to-apples again;
  `failureLatencyMs` kept as Rust-only enrichment.
- Client.Java-021: applied Client.Java-002's terminal-state
  serialisation pattern to DeployEventStream so close() arriving
  after queue-overflow can't erase the overflow exception.
- IntegrationTests-017: teardown-parity test now uses a two-window
  stability check after UnAdvise instead of strict equality against
  the pre-UnAdvise count (which raced against in-flight events).
- IntegrationTests-019: new RecordingTestOutputHelper wraps every
  log sink the WriteSecured live test owns (worker stdout/stderr,
  gateway logs, direct WriteLine) so the credential is proven
  absent from the full output buffer, not just the diagnostic
  message.
- Tests-020: added MxAccessGatewayServiceConstraintTests coverage
  for the previously-uncovered Write2Bulk and WriteSecured2Bulk
  arms of WriteBulkConstraintPlan.SetPayload.
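
The two-window stability check from IntegrationTests-017 can be sketched
roughly as below. This is an illustration in Java, not the project's actual
integration test: `StabilityCheck`, `isStable`, and its parameters are
hypothetical names, and reading "two-window" as "the count sampled at the
end of two consecutive windows is unchanged" is an assumption.

```java
import java.util.function.LongSupplier;

/** Hypothetical helper; not part of the gateway test suite. */
final class StabilityCheck {
    private StabilityCheck() {}

    /**
     * Samples the counter once per window and reports whether two consecutive
     * samples agree within maxWindows attempts. Unlike strict equality against
     * a pre-UnAdvise count, this tolerates events that were already in flight.
     */
    static boolean isStable(LongSupplier counter, long windowMillis, int maxWindows) {
        long previous = counter.getAsLong();
        for (int window = 0; window < maxWindows; window++) {
            try {
                Thread.sleep(windowMillis); // give in-flight events time to land
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false; // treat interruption as "not proven stable"
            }
            long current = counter.getAsLong();
            if (current == previous) {
                return true; // nothing arrived during this window
            }
            previous = current; // still draining; compare against the next window
        }
        return false; // the count kept changing in every window
    }
}
```

A test would call `isStable(eventCount::get, 250, 8)` after the unsubscribe
and assert the result, instead of comparing against a count captured before
the unsubscribe. The window length and attempt count here are placeholders.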

Lows (41 — highlights)
- Server: Galaxy glob cache eviction is race-free (Server-024);
  GalaxyRepositoryGrpcService takes IGalaxyRepository (Server-025);
  AlarmsOptions validated at startup (Server-026); Authorization.md
  Constraint Enforcement snippet/prose enumerate the bulk write/read
  family (Server-027); bulk-read-commands and bulk-write-commands
  capability tokens added to OpenSession (Server-029);
  NotWiredAlarmRpcDispatcher XML doc cleaned up and the missing
  scope-resolver and state-machine tests added (023, 028).
- Worker: AlarmCommandHandler now invokes the same STA-affinity
  guard the poll path uses, at every command entry (Worker-024);
  RunAsync null-checks the runtime-session factory result
  (Worker-025).
- Worker.Tests: shared LiveMxAccessOptInVariableName lives on
  GatewayContractInfo (Worker.Tests-025); MxAccessSession.CreateForTesting
  rejects production sinks (Worker.Tests-026); FakeRuntimeSession's
  CancelCommandReturnValue serialised under lock (Worker.Tests-027);
  Probes namespace lifted to MxGateway.Worker.Tests.Probes
  (Worker.Tests-029); cancel-envelope sequence numbers monotonised
  (Worker.Tests-030); docs/GatewayTesting.md gains a "Dev-rig Probes"
  section (Worker.Tests-028).
- Tests: ManualTimeProvider consolidated into one TestSupport/ copy
  (Tests-021); SessionManagerBulkTests adds a mid-flight cancellation
  test backed by a TaskCompletionSource fake (Tests-022); companion
  FakeWorkerProcess.WaitForExitAsync no longer fakes its exit signal
  (Tests-023); constraint plan reply-count divergence pinned
  (Tests-024).
- IntegrationTests: TryGetSession chain carries [MaybeNullWhen(false)]
  end-to-end (IntegrationTests-018); abnormal-exit keyword set
  tightened to pipe-disconnected/end-of-stream and the test now
  asserts streamTask.IsFaulted (020, 021).
- Client.Dotnet: bench commands added to isLongRunning so the
  default 30s wall-clock budget doesn't kill them (015);
  BenchStreamEventsAsync observes the inner stream task on every
  exit path (016).
- Client.Go: parseValue wraps strconv errors with flag context and
  %w (017); bench loops honour ctx.Done() (018); galaxy-watch parses
  RFC3339Nano with fractional seconds (019); runStreamEvents installs
  signal.NotifyContext like runGalaxyWatch (020); five new CLI-level
  table-driven tests cover the bulk/bench subcommands (021).
- Client.Java: toCompletable Javadoc rewritten to match the actual
  cancellation contract Client.Java-015 established (022); stream-events
  text path uses Long.toUnsignedString for worker_sequence (023);
  bench-read-bulk no longer pollutes success-latency histogram with
  failure durations (024); --shutdown-timeout CLI option propagates
  through to ClientOptions (025); seven new MxGatewayCliTests cover
  the bulk and bench commands (026).
- Client.Python: mxgateway_cli ships its own py.typed marker (019);
  wheel-build smoke test added under tests/test_packaging.py (020);
  README documents the Galaxy CLI parity gap explicitly (021).
- Client.Rust: RustClientDesign.md signatures match session.rs and
  document the AsRef<str> read_bulk genericism (019);
  next_correlation_id re-exported at the crate root, with a
  property-style doc contract and an explicit disclaimer that the
  literal textual format is not part of the contract (020).
- Contracts: BulkWriteResult comment names the actual
  IConstraintEnforcer mechanism instead of "tag-allowlist filter"
  (014); BulkReadResult gains explicit per-arm payload-population
  documentation for the success vs failure cases (015).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Joseph Doherty
2026-05-20 10:28:54 -04:00
parent a0203503a7
commit 1aafd6bde4
74 changed files with 3349 additions and 395 deletions
@@ -857,6 +857,10 @@ public final class MxGatewayCli implements Callable<Integer> {
try {
List<BulkReadResult> results = session.readBulk(serverHandle, tags, timeoutMs);
long elapsed = System.nanoTime() - callStart;
// Only record successful-call latencies — including failed-call
// durations would pollute the p50/p95/p99 percentile summary
// (Client.Java-024, mirrors Client.Rust-015). The cross-language
// bench matrix expects success-only latency histograms.
if (latencyCount >= latenciesNanos.length) {
long[] grown = new long[latenciesNanos.length * 2];
System.arraycopy(latenciesNanos, 0, grown, 0, latencyCount);
@@ -871,13 +875,9 @@ public final class MxGatewayCli implements Callable<Integer> {
}
}
} catch (Exception ex) {
long elapsed = System.nanoTime() - callStart;
if (latencyCount >= latenciesNanos.length) {
long[] grown = new long[latenciesNanos.length * 2];
System.arraycopy(latenciesNanos, 0, grown, 0, latencyCount);
latenciesNanos = grown;
}
latenciesNanos[latencyCount++] = elapsed;
// Failed-call duration is intentionally NOT recorded into
// the success-latency histogram — only count the failure so
// the failedCalls JSON field reflects it.
failed++;
}
}
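
The success-only recording rule in the hunk above can be modelled in
isolation. The sketch below is a minimal stand-in, not the CLI's code:
`SuccessLatencyRecorder` and its methods are hypothetical names, and the
nearest-rank percentile is an assumption, since the diff does not show how
the CLI computes p50/p95/p99.

```java
import java.util.Arrays;

/** Hypothetical recorder; the real CLI keeps a growable long[] inline. */
final class SuccessLatencyRecorder {
    private long[] latenciesNanos = new long[4];
    private int latencyCount;
    private int failed;

    /** Records the duration of a successful call, doubling the buffer when full. */
    void recordSuccess(long elapsedNanos) {
        if (latencyCount >= latenciesNanos.length) {
            latenciesNanos = Arrays.copyOf(latenciesNanos, latenciesNanos.length * 2);
        }
        latenciesNanos[latencyCount++] = elapsedNanos;
    }

    /** Counts a failed call without adding its duration to the samples. */
    void recordFailure() {
        failed++;
    }

    int successfulCalls() {
        return latencyCount;
    }

    int failedCalls() {
        return failed;
    }

    /** Nearest-rank percentile in milliseconds (assumed method, chosen for brevity). */
    double percentileMs(double percentile) {
        if (latencyCount == 0) {
            return 0.0;
        }
        long[] sorted = Arrays.copyOf(latenciesNanos, latencyCount);
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(percentile / 100.0 * latencyCount);
        int index = Math.max(0, Math.min(latencyCount - 1, rank - 1));
        return sorted[index] / 1_000_000.0;
    }
}
```

With five successes of 1 to 5 ms and any number of failures, `percentileMs(50)`
stays at 3.0 ms: failures only move `failedCalls()`, never the percentiles.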
@@ -1051,7 +1051,13 @@ public final class MxGatewayCli implements Callable<Integer> {
if (json) {
client.out().println(protoJson(event));
} else {
client.out().printf("%d %s%n", event.getWorkerSequence(), event.getFamily());
// worker_sequence is a proto uint64 — print as unsigned so
// values past 2^63 do not render as negative signed longs.
// JSON path goes through JsonFormat which already does this.
client.out().printf(
"%s %s%n",
Long.toUnsignedString(event.getWorkerSequence()),
event.getFamily());
}
count++;
if (limit > 0 && count >= limit) {
@@ -1134,6 +1140,12 @@ public final class MxGatewayCli implements Callable<Integer> {
@Option(names = "--timeout", defaultValue = "30s", description = "Per-call timeout.")
String timeout;
@Option(
names = "--shutdown-timeout",
description =
"Channel shutdown timeout (e.g. 10s, 500ms). When unset, the library default applies.")
String shutdownTimeout;
/**
* Returns this options object unchanged.
*
@@ -1173,15 +1185,35 @@ public final class MxGatewayCli implements Callable<Integer> {
return parseDuration(timeout);
}
/**
* Resolves the effective channel-shutdown timeout from the
* {@code --shutdown-timeout} option, or {@code null} when the user did
* not pass one (in which case the {@link MxGatewayClientOptions}
* default applies). Computed on each call so there is no stale cached
* state.
*
* @return the resolved shutdown timeout, or {@code null} when unset
*/
Duration resolvedShutdownTimeout() {
if (shutdownTimeout == null || shutdownTimeout.isBlank()) {
return null;
}
return parseDuration(shutdownTimeout);
}
MxGatewayClientOptions toClientOptions() {
return MxGatewayClientOptions.builder()
MxGatewayClientOptions.Builder builder = MxGatewayClientOptions.builder()
.endpoint(endpoint)
.apiKey(resolvedApiKey())
.plaintext(plaintext)
.caCertificatePath(caFile)
.serverNameOverride(serverNameOverride)
.callTimeout(resolvedTimeout())
.build();
.callTimeout(resolvedTimeout());
Duration resolvedShutdownTimeout = resolvedShutdownTimeout();
if (resolvedShutdownTimeout != null) {
builder.shutdownTimeout(resolvedShutdownTimeout);
}
return builder.build();
}
Map<String, Object> redactedJsonMap() {
@@ -1193,6 +1225,8 @@ public final class MxGatewayCli implements Callable<Integer> {
values.put("caFile", caFile == null ? "" : caFile.toString());
values.put("serverNameOverride", serverNameOverride);
values.put("timeout", timeout);
Duration resolvedShutdownTimeout = resolvedShutdownTimeout();
values.put("shutdownTimeout", resolvedShutdownTimeout == null ? "" : resolvedShutdownTimeout.toString());
return values;
}
}
@@ -149,6 +149,21 @@ final class MxGatewayCliTests {
assertFalse(text.contains("seq=-1"), "must not render as signed -1");
}
@Test
void streamEventsWorkerSequenceRendersAsUnsignedForHighUint64() {
// Client.Java-023 regression: stream-events text output now uses
// Long.toUnsignedString to format the proto uint64 worker_sequence
// field, mirroring the Client.Java-020 fix for DeployEvent.sequence.
long highUnsigned = -1L; // bit-pattern for 2^64 - 1, i.e. 18446744073709551615 unsigned
String text = String.format(
"%s %s",
Long.toUnsignedString(highUnsigned),
"MX_EVENT_FAMILY_DATA_CHANGE");
assertTrue(text.startsWith("18446744073709551615 "), "expected unsigned rendering, got: " + text);
assertFalse(text.startsWith("-1 "), "must not render as signed -1");
}
@Test
void unsubscribeBulkCommandPrintsResults() {
CliRun run = execute(
@@ -168,6 +183,209 @@ final class MxGatewayCliTests {
assertTrue(run.output().contains("\"wasSuccessful\":true"));
}
// ---- Client.Java-026: CLI-level coverage for bulk subcommands ----
@Test
void readBulkCommandForwardsTimeoutAndPrintsResults() {
FakeClientFactory factory = new FakeClientFactory();
CliRun run = execute(
factory,
"read-bulk",
"--session-id",
"session-cli",
"--server-handle",
"42",
"--items",
"TestMachine_001.TestChangingInt,TestMachine_002.TestChangingInt",
"--timeout-ms",
"750",
"--json");
assertEquals(0, run.exitCode());
assertEquals(750, factory.client.session.lastReadBulkTimeoutMs);
assertEquals(2, factory.client.session.lastReadBulkItems.size());
assertTrue(run.output().contains("\"command\":\"read-bulk\""));
assertTrue(run.output().contains("\"tagAddress\":\"TestMachine_001.TestChangingInt\""));
assertTrue(run.output().contains("\"itemHandle\":200"));
assertTrue(run.output().contains("\"wasCached\":true"));
assertTrue(run.output().contains("\"quality\":192"));
}
@Test
void writeBulkCommandParsesTypedValuesAndPrintsResults() {
FakeClientFactory factory = new FakeClientFactory();
CliRun run = execute(
factory,
"write-bulk",
"--session-id",
"session-cli",
"--server-handle",
"42",
"--item-handles",
"100,101",
"--type",
"int32",
"--values",
"111,222",
"--user-id",
"5",
"--json");
assertEquals(0, run.exitCode());
assertEquals(2, factory.client.session.lastWriteBulkEntries.size());
assertEquals(111, factory.client.session.lastWriteBulkEntries.get(0).getValue().getInt32Value());
assertEquals(222, factory.client.session.lastWriteBulkEntries.get(1).getValue().getInt32Value());
assertEquals(5, factory.client.session.lastWriteBulkEntries.get(0).getUserId());
assertTrue(run.output().contains("\"command\":\"write-bulk\""));
assertTrue(run.output().contains("\"itemHandle\":100"));
assertTrue(run.output().contains("\"wasSuccessful\":true"));
}
@Test
void write2BulkCommandForwardsTimestampAndPrintsResults() {
FakeClientFactory factory = new FakeClientFactory();
CliRun run = execute(
factory,
"write2-bulk",
"--session-id",
"session-cli",
"--server-handle",
"42",
"--item-handles",
"100",
"--type",
"string",
"--values",
"hello",
"--timestamp",
"2026-05-20T00:00:00Z",
"--json");
assertEquals(0, run.exitCode());
assertEquals(1, factory.client.session.lastWrite2BulkEntries.size());
assertEquals(
"hello",
factory.client.session.lastWrite2BulkEntries.get(0).getValue().getStringValue());
assertTrue(
factory.client.session.lastWrite2BulkEntries.get(0).hasTimestampValue(),
"expected timestampValue to be forwarded");
assertTrue(run.output().contains("\"command\":\"write2-bulk\""));
assertTrue(run.output().contains("\"itemHandle\":100"));
assertTrue(run.output().contains("\"wasSuccessful\":true"));
}
@Test
void writeSecuredBulkCommandForwardsUserIdsAndPrintsResults() {
FakeClientFactory factory = new FakeClientFactory();
CliRun run = execute(
factory,
"write-secured-bulk",
"--session-id",
"session-cli",
"--server-handle",
"42",
"--item-handles",
"100",
"--type",
"int32",
"--values",
"9",
"--current-user-id",
"7",
"--verifier-user-id",
"8",
"--json");
assertEquals(0, run.exitCode());
assertEquals(1, factory.client.session.lastWriteSecuredBulkEntries.size());
assertEquals(7, factory.client.session.lastWriteSecuredBulkEntries.get(0).getCurrentUserId());
assertEquals(8, factory.client.session.lastWriteSecuredBulkEntries.get(0).getVerifierUserId());
assertEquals(9, factory.client.session.lastWriteSecuredBulkEntries.get(0).getValue().getInt32Value());
assertTrue(run.output().contains("\"command\":\"write-secured-bulk\""));
assertTrue(run.output().contains("\"wasSuccessful\":true"));
}
@Test
void writeSecured2BulkCommandForwardsTimestampAndUserIdsAndPrintsResults() {
FakeClientFactory factory = new FakeClientFactory();
CliRun run = execute(
factory,
"write-secured2-bulk",
"--session-id",
"session-cli",
"--server-handle",
"42",
"--item-handles",
"100",
"--type",
"string",
"--values",
"value",
"--timestamp",
"2026-05-20T00:00:00Z",
"--current-user-id",
"7",
"--verifier-user-id",
"8",
"--json");
assertEquals(0, run.exitCode());
assertEquals(1, factory.client.session.lastWriteSecured2BulkEntries.size());
assertEquals(7, factory.client.session.lastWriteSecured2BulkEntries.get(0).getCurrentUserId());
assertEquals(8, factory.client.session.lastWriteSecured2BulkEntries.get(0).getVerifierUserId());
assertTrue(
factory.client.session.lastWriteSecured2BulkEntries.get(0).hasTimestampValue(),
"expected timestampValue to be forwarded");
assertTrue(run.output().contains("\"command\":\"write-secured2-bulk\""));
assertTrue(run.output().contains("\"wasSuccessful\":true"));
}
@Test
void benchReadBulkCommandEmitsJsonSchemaKeys() {
// Short bench window (1 s steady, 0 s warmup) keeps the test fast; we assert
// the JSON schema rather than numeric values so the cross-language matrix
// (.NET / Go / Rust / Python) and the Java path agree on the output shape.
FakeClientFactory factory = new FakeClientFactory();
CliRun run = execute(
factory,
"bench-read-bulk",
"--duration-seconds",
"1",
"--warmup-seconds",
"0",
"--bulk-size",
"2",
"--tag-start",
"1",
"--tag-prefix",
"TestMachine_",
"--tag-attribute",
"TestChangingInt",
"--timeout-ms",
"100",
"--json");
assertEquals(0, run.exitCode());
String output = run.output();
assertTrue(output.contains("\"language\":\"java\""), output);
assertTrue(output.contains("\"command\":\"bench-read-bulk\""), output);
assertTrue(output.contains("\"bulkSize\":2"), output);
assertTrue(output.contains("\"durationSeconds\":1"), output);
assertTrue(output.contains("\"warmupSeconds\":0"), output);
assertTrue(output.contains("\"totalCalls\":"), output);
assertTrue(output.contains("\"successfulCalls\":"), output);
assertTrue(output.contains("\"failedCalls\":"), output);
assertTrue(output.contains("\"callsPerSecond\":"), output);
assertTrue(output.contains("\"latencyMs\":"), output);
assertTrue(output.contains("\"p50\":"), output);
assertTrue(output.contains("\"p95\":"), output);
assertTrue(output.contains("\"p99\":"), output);
assertTrue(output.contains("\"tags\":"), output);
// Bench tag synthesis: TestMachine_001.TestChangingInt, TestMachine_002.TestChangingInt.
assertTrue(output.contains("TestMachine_001.TestChangingInt"), output);
assertTrue(output.contains("TestMachine_002.TestChangingInt"), output);
}
private static CliRun execute(MxGatewayCli.MxGatewayCliClientFactory factory, String... args) {
StringWriter output = new StringWriter();
StringWriter errors = new StringWriter();
@@ -322,29 +540,89 @@ final class MxGatewayCliTests {
return results;
}
// Recorded so tests can assert the CLI forwarded the parsed options through to
// the session interface. The bulk subcommands return at least one result so the
// JSON output assertions exercise the *Map serialisers in MxGatewayCli.
private int lastReadBulkTimeoutMs;
private List<String> lastReadBulkItems = new ArrayList<>();
private List<WriteBulkEntry> lastWriteBulkEntries = new ArrayList<>();
private List<Write2BulkEntry> lastWrite2BulkEntries = new ArrayList<>();
private List<WriteSecuredBulkEntry> lastWriteSecuredBulkEntries = new ArrayList<>();
private List<WriteSecured2BulkEntry> lastWriteSecured2BulkEntries = new ArrayList<>();
@Override
public List<BulkReadResult> readBulk(int serverHandle, List<String> items, int timeoutMs) {
return new ArrayList<>();
lastReadBulkTimeoutMs = timeoutMs;
lastReadBulkItems = items;
List<BulkReadResult> results = new ArrayList<>();
for (int index = 0; index < items.size(); index++) {
results.add(BulkReadResult.newBuilder()
.setServerHandle(serverHandle)
.setTagAddress(items.get(index))
.setItemHandle(200 + index)
.setWasSuccessful(true)
.setWasCached(index % 2 == 0)
.setQuality(192)
.build());
}
return results;
}
@Override
public List<BulkWriteResult> writeBulk(int serverHandle, List<WriteBulkEntry> entries) {
return new ArrayList<>();
lastWriteBulkEntries = entries;
List<BulkWriteResult> results = new ArrayList<>();
for (WriteBulkEntry entry : entries) {
results.add(BulkWriteResult.newBuilder()
.setServerHandle(serverHandle)
.setItemHandle(entry.getItemHandle())
.setWasSuccessful(true)
.build());
}
return results;
}
@Override
public List<BulkWriteResult> write2Bulk(int serverHandle, List<Write2BulkEntry> entries) {
return new ArrayList<>();
lastWrite2BulkEntries = entries;
List<BulkWriteResult> results = new ArrayList<>();
for (Write2BulkEntry entry : entries) {
results.add(BulkWriteResult.newBuilder()
.setServerHandle(serverHandle)
.setItemHandle(entry.getItemHandle())
.setWasSuccessful(true)
.build());
}
return results;
}
@Override
public List<BulkWriteResult> writeSecuredBulk(int serverHandle, List<WriteSecuredBulkEntry> entries) {
return new ArrayList<>();
lastWriteSecuredBulkEntries = entries;
List<BulkWriteResult> results = new ArrayList<>();
for (WriteSecuredBulkEntry entry : entries) {
results.add(BulkWriteResult.newBuilder()
.setServerHandle(serverHandle)
.setItemHandle(entry.getItemHandle())
.setWasSuccessful(true)
.build());
}
return results;
}
@Override
public List<BulkWriteResult> writeSecured2Bulk(int serverHandle, List<WriteSecured2BulkEntry> entries) {
return new ArrayList<>();
lastWriteSecured2BulkEntries = entries;
List<BulkWriteResult> results = new ArrayList<>();
for (WriteSecured2BulkEntry entry : entries) {
results.add(BulkWriteResult.newBuilder()
.setServerHandle(serverHandle)
.setItemHandle(entry.getItemHandle())
.setWasSuccessful(true)
.build());
}
return results;
}
@Override
@@ -11,20 +11,29 @@ import java.util.NoSuchElementException;
import java.util.Objects;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicBoolean;
/**
* Iterator-style adaptor over the {@code WatchDeployEvents} server-streaming
* RPC. Mirrors {@link MxEventStream}: events arrive on a background gRPC thread
* and are buffered in a bounded blocking queue; the iterator drains them.
* Closing the stream cancels the underlying gRPC call.
*
* <p><strong>Threading:</strong> the iterator methods ({@link #hasNext()} and
* {@link #next()}) are <em>not</em> thread-safe and must be driven by a single
* consumer thread. {@link #close()} may be called from any thread. Terminal
* state transitions (queue overflow, server completion, and {@code close()})
* are serialised so that the first terminal condition wins deterministically:
* once an overflow exception has been observed it is never silently replaced
* by an end-of-stream marker.
*/
public final class DeployEventStream implements Iterator<DeployEvent>, AutoCloseable {
private static final Object END = new Object();
private final BlockingQueue<Object> queue;
private final AtomicBoolean closed = new AtomicBoolean();
private final Object terminalLock = new Object();
private volatile ClientCallStreamObserver<WatchDeployEventsRequest> requestStream;
private volatile boolean closed;
private boolean terminated;
private Object next;
DeployEventStream(int capacity) {
@@ -36,7 +45,7 @@ public final class DeployEventStream implements Iterator<DeployEvent>, AutoClose
@Override
public void beforeStart(ClientCallStreamObserver<WatchDeployEventsRequest> requestStream) {
DeployEventStream.this.requestStream = requestStream;
if (closed.get()) {
if (closed) {
requestStream.cancel("client cancelled deploy event stream", null);
}
}
@@ -48,7 +57,7 @@ public final class DeployEventStream implements Iterator<DeployEvent>, AutoClose
@Override
public void onError(Throwable error) {
if (Status.fromThrowable(error).getCode() == Status.Code.CANCELLED && closed.get()) {
if (Status.fromThrowable(error).getCode() == Status.Code.CANCELLED && closed) {
offer(END);
return;
}
@@ -94,12 +103,12 @@ public final class DeployEventStream implements Iterator<DeployEvent>, AutoClose
@Override
public void close() {
closed.set(true);
closed = true;
ClientCallStreamObserver<WatchDeployEventsRequest> stream = requestStream;
if (stream != null) {
stream.cancel("client cancelled deploy event stream", null);
}
offer(END);
terminate(null);
}
private Object take() {
@@ -117,10 +126,7 @@ public final class DeployEventStream implements Iterator<DeployEvent>, AutoClose
private void offer(Object value) {
Objects.requireNonNull(value, "value");
if (value == END) {
if (!queue.offer(value)) {
queue.clear();
queue.offer(value);
}
terminate(null);
return;
}
if (!queue.offer(value)) {
@@ -128,9 +134,40 @@ public final class DeployEventStream implements Iterator<DeployEvent>, AutoClose
if (stream != null) {
stream.cancel("client deploy event stream queue overflowed", null);
}
queue.clear();
queue.offer(new MxGatewayException("galaxy watch deploy events queue overflowed"));
queue.offer(END);
terminate(new MxGatewayException("galaxy watch deploy events queue overflowed"));
}
}
/**
* Drives the single terminal transition. The first caller wins: a later
* end-of-stream or {@code close()} cannot overwrite or discard an overflow
* exception that has already been published to the consumer. Mirrors the
* {@link MxEventStream#terminate} contract — see Client.Java-002 for the
* race this guards against.
*
* @param fault the fault to surface to the consumer, or {@code null} for a
* clean end-of-stream
*/
private void terminate(MxGatewayException fault) {
synchronized (terminalLock) {
if (terminated) {
return;
}
terminated = true;
if (fault != null) {
// Make room for the fault marker; the consumer only needs the
// terminal signal, queued data events are no longer relevant.
queue.clear();
queue.offer(fault);
queue.offer(END);
return;
}
// Clean end-of-stream: ensure the END marker is delivered even when
// the queue is currently full of undrained data events.
if (!queue.offer(END)) {
queue.clear();
queue.offer(END);
}
}
}
}
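
For readers without the gRPC types at hand, the "first terminal condition
wins" rule can be exercised with a stripped-down model. This is a sketch
under stated assumptions, not `DeployEventStream` itself: `TerminalOnceQueue`,
`publish`, and `poll` are hypothetical names, `IllegalStateException` stands
in for `MxGatewayException`, and, unlike the diff, `publish` also takes the
terminal lock to keep the model simple.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

/** Simplified stand-in for the stream; all names are illustrative only. */
final class TerminalOnceQueue {
    static final Object END = new Object();

    private final BlockingQueue<Object> queue;
    private final Object terminalLock = new Object();
    private boolean terminated;

    TerminalOnceQueue(int capacity) {
        // Assumption: at least two slots, so a fault marker plus END always fit.
        queue = new ArrayBlockingQueue<>(Math.max(2, capacity));
    }

    /** Offers a data event; a full queue becomes a terminal overflow fault. */
    void publish(Object event) {
        synchronized (terminalLock) {
            if (terminated) {
                return; // events after the terminal transition are dropped
            }
            if (!queue.offer(event)) {
                terminate(new IllegalStateException("queue overflowed"));
            }
        }
    }

    /** Clean shutdown; a no-op when a fault has already terminated the queue. */
    void close() {
        terminate(null);
    }

    /** Single terminal transition: the first caller decides the outcome. */
    private void terminate(RuntimeException fault) {
        synchronized (terminalLock) {
            if (terminated) {
                return;
            }
            terminated = true;
            if (fault != null) {
                queue.clear(); // queued data no longer matters once faulted
                queue.offer(fault);
                queue.offer(END);
                return;
            }
            if (!queue.offer(END)) {
                queue.clear(); // make room for END when the queue is full
                queue.offer(END);
            }
        }
    }

    /** Non-blocking drain helper for the demo; returns null when empty. */
    Object poll() {
        return queue.poll();
    }
}
```

Publishing three events into a capacity-2 instance and then calling `close()`
leaves the fault followed by `END` in the queue, so the consumer still sees
the overflow, which is the ordering Client.Java-021 is about.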
@@ -160,16 +160,37 @@ final class MxGatewayChannels {
*
* <p><strong>Cancellation contract:</strong> the returned future is a
* {@link CancellingCompletableFuture} that overrides
* {@link CompletableFuture#cancel(boolean)} so cancellation always forwards
* to the source {@link ListenableFuture}, even when callers wrap the
* future in additional {@code thenApply}/{@code thenCompose} stages. The
* historical {@code whenComplete}-based forwarder was buggy because
* {@code thenApply} returns a new {@code CompletableFuture} whose
* cancellation does <em>not</em> propagate back to this future; with the
* override-based design, calling {@code cancel(true)} on either the
* direct return value or the user-facing chained future ultimately
* invokes {@code source.cancel(true)} (chained futures forward to the
* upstream stage they were derived from, which is this future).
* {@link CompletableFuture#cancel(boolean)} so cancelling the
* <em>direct return value</em> forwards to the source
* {@link ListenableFuture}, aborting the underlying gRPC call. This is the
* fix for Client.Java-015.
*
* <p><strong>Important — derived stages do <em>not</em> propagate
* cancellation upstream.</strong> Calling
* {@code cancel(...)} on a future obtained via
* {@code thenApply}/{@code thenCompose}/{@code thenAccept}/{@code whenComplete}
* of the value returned by this method only marks <em>that</em> derived stage
* as cancelled; it does <strong>not</strong> propagate back to this
* {@code CancellingCompletableFuture}, so the source RPC continues until its
* deadline expires. {@link CompletableFuture#thenApply} (and the other
* chaining methods) deliberately do not forward cancellation to the upstream
* stage they were derived from.
*
* <p>If a caller needs cancellation through a chained pipeline, either:
* <ul>
* <li>use the {@link #toCompletable(ListenableFuture, String, Function)}
* overload below, which inlines a validator into the
* {@code FutureCallback} so the user-visible future is the same
* future cancellation is bound to (this is what the {@code *Async}
* methods on {@link MxGatewayClient} and the unary methods on
* {@link GalaxyRepositoryClient} do); or</li>
* <li>follow {@link GalaxyRepositoryClient#discoverHierarchyAsync}'s
* pattern of returning a custom {@link CompletableFuture} subclass
* that tracks the current in-flight stage via an
* {@link java.util.concurrent.atomic.AtomicReference} and forwards
* {@code cancel(...)} to it (necessary when chaining
* {@code thenCompose} stages across paged calls).</li>
* </ul>
*
* @param source the gRPC future-stub result
* @param operation the operation name used in normalised error messages
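
The cancellation contract described in this Javadoc follows from standard
`CompletableFuture` behaviour and can be checked without gRPC. In the sketch
below, `ForwardingFuture` is a hypothetical stand-in for
`CancellingCompletableFuture`, and the `sourceCancelled` flag stands in for
the call to `source.cancel(true)`; neither is the library's real code.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicBoolean;

final class CancellationDemo {
    private CancellationDemo() {}

    /** Hypothetical stand-in for a future whose cancel() forwards to an RPC. */
    static final class ForwardingFuture<T> extends CompletableFuture<T> {
        final AtomicBoolean sourceCancelled = new AtomicBoolean();

        @Override
        public boolean cancel(boolean mayInterruptIfRunning) {
            sourceCancelled.set(true); // placeholder for source.cancel(true)
            return super.cancel(mayInterruptIfRunning);
        }
    }

    public static void main(String[] args) {
        ForwardingFuture<String> direct = new ForwardingFuture<>();
        // thenApply returns a plain CompletableFuture, not a ForwardingFuture.
        CompletableFuture<Integer> derived = direct.thenApply(String::length);

        // Cancelling the derived stage only marks that stage as cancelled.
        derived.cancel(true);
        System.out.println(derived.isCancelled());        // true
        System.out.println(direct.isDone());              // false
        System.out.println(direct.sourceCancelled.get()); // false

        // Cancelling the direct return value reaches the source.
        direct.cancel(true);
        System.out.println(direct.sourceCancelled.get()); // true
    }
}
```

The first three lines printed show why a caller holding only a chained stage
cannot abort the RPC, and the last shows the path that does.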
@@ -175,6 +175,64 @@ final class GalaxyRepositoryClientTests {
assertFalse(stream.hasNext());
}
@Test
void deployEventStreamOverflowExceptionSurvivesASubsequentClose() {
// Client.Java-021 regression: mirror Client.Java-002's terminal-state
// serialisation in DeployEventStream — an overflow enqueues the overflow
// exception, and a later close() must NOT discard it. The first terminal
// condition (overflow) must win and stay observable by next().
DeployEventStream stream = new DeployEventStream(2);
ClientResponseObserver<WatchDeployEventsRequest, DeployEvent> observer = stream.observer();
observer.beforeStart(new RecordingClientCallStreamObserver());
// Force a queue overflow on a capacity-2 stream.
for (int i = 0; i < 8; i++) {
observer.onNext(DeployEvent.newBuilder().setSequence(i).build());
}
// A close() arriving after the overflow must not erase the overflow signal.
stream.close();
MxGatewayException error = assertThrows(MxGatewayException.class, () -> {
while (stream.hasNext()) {
stream.next();
}
});
assertTrue(error.getMessage().contains("overflow"), error::getMessage);
}
@Test
void deployEventStreamConcurrentOverflowAndCloseAlwaysTerminate() throws Exception {
// Client.Java-021 regression: the terminal-state transition must be
// serialised so whatever the interleaving of overflow and close,
// hasNext() always reaches a terminal state (no stuck consumer).
for (int iteration = 0; iteration < 300; iteration++) {
DeployEventStream stream = new DeployEventStream(2);
ClientResponseObserver<WatchDeployEventsRequest, DeployEvent> observer = stream.observer();
observer.beforeStart(new RecordingClientCallStreamObserver());
Thread filler = new Thread(() -> {
for (int i = 0; i < 8; i++) {
observer.onNext(DeployEvent.newBuilder().setSequence(i).build());
}
});
Thread closer = new Thread(stream::close);
filler.start();
closer.start();
filler.join();
closer.join();
try {
while (stream.hasNext()) {
stream.next();
}
} catch (MxGatewayException expected) {
assertTrue(expected.getMessage().contains("overflow"), expected::getMessage);
}
assertFalse(stream.hasNext());
}
}
@Test
void discoverHierarchyRejectsRepeatedPageToken() throws Exception {
TestService service = new TestService() {
@@ -59287,9 +59287,11 @@ public final class MxaccessGateway extends com.google.protobuf.GeneratedFile {
* <pre>
* Per-item result for the four bulk write families. `item_handle` mirrors the
* request entry's item_handle so callers can correlate inputs to outputs even
* when the gateway's tag-allowlist filter dropped some entries before reaching
* the worker. Per-item failures populate `error_message` + `hresult` and never
* raise; callers iterate and inspect each entry.
* when the gateway's per-entry `IConstraintEnforcer.CheckWriteHandleAsync`
* filter (see `MxAccessGatewayService.ReplaceWriteBulkEntries` and
* `docs/Authorization.md`) dropped some entries before reaching the worker.
* Per-item failures populate `error_message` + `hresult` and never raise;
* callers iterate and inspect each entry.
* </pre>
*
* Protobuf type {@code mxaccess_gateway.v1.BulkWriteResult}
@@ -59686,9 +59688,11 @@ public final class MxaccessGateway extends com.google.protobuf.GeneratedFile {
* <pre>
* Per-item result for the four bulk write families. `item_handle` mirrors the
* request entry's item_handle so callers can correlate inputs to outputs even
* when the gateway's tag-allowlist filter dropped some entries before reaching
* the worker. Per-item failures populate `error_message` + `hresult` and never
* raise; callers iterate and inspect each entry.
* when the gateway's per-entry `IConstraintEnforcer.CheckWriteHandleAsync`
* filter (see `MxAccessGatewayService.ReplaceWriteBulkEntries` and
* `docs/Authorization.md`) dropped some entries before reaching the worker.
* Per-item failures populate `error_message` + `hresult` and never raise;
* callers iterate and inspect each entry.
* </pre>
*
* Protobuf type {@code mxaccess_gateway.v1.BulkWriteResult}
@@ -61295,6 +61299,20 @@ public final class MxaccessGateway extends com.google.protobuf.GeneratedFile {
* an existing live subscription's last OnDataChange (the worker did not touch
* the subscription); false when the worker took the AddItem + Advise + wait +
* UnAdvise + RemoveItem snapshot lifecycle itself.
*
* On `was_successful = true`, `value`, `quality`, `source_timestamp`, and
* `statuses` carry the read data (from the cached subscription or the snapshot
* lifecycle, depending on `was_cached`) and `error_message` is empty. On
* `was_successful = false`, only `server_handle`, `tag_address`, `item_handle`
* (when allocated), `was_cached`, and `error_message` are populated; `value`,
* `quality`, `source_timestamp`, and `statuses` are left at their proto3
* defaults (null / 0 / null / empty) and must not be read as data; they are
* wire-indistinguishable from "value is null with quality bad" data and serve
* only as absent markers. ReadBulk has no `hresult` field by design (its
* outcomes are timeout / cache / lifecycle states, not MXAccess COM return
* codes; see `docs/DesignDecisions.md` "Bulk Command Family"). Per-tag
* failures populate `error_message` and never raise; callers iterate and
* inspect each entry.
* </pre>
*
* Protobuf type {@code mxaccess_gateway.v1.BulkReadResult}
@@ -61837,6 +61855,20 @@ public final class MxaccessGateway extends com.google.protobuf.GeneratedFile {
* an existing live subscription's last OnDataChange (the worker did not touch
* the subscription); false when the worker took the AddItem + Advise + wait +
* UnAdvise + RemoveItem snapshot lifecycle itself.
*
* On `was_successful = true`, `value`, `quality`, `source_timestamp`, and
* `statuses` carry the read data (from the cached subscription or the snapshot
* lifecycle, depending on `was_cached`) and `error_message` is empty. On
* `was_successful = false`, only `server_handle`, `tag_address`, `item_handle`
* (when allocated), `was_cached`, and `error_message` are populated; `value`,
* `quality`, `source_timestamp`, and `statuses` are left at their proto3
* defaults (null / 0 / null / empty) and must not be read as data; they are
* wire-indistinguishable from "value is null with quality bad" data and serve
* only as absent markers. ReadBulk has no `hresult` field by design (its
* outcomes are timeout / cache / lifecycle states, not MXAccess COM return
* codes; see `docs/DesignDecisions.md` "Bulk Command Family"). Per-tag
* failures populate `error_message` and never raise; callers iterate and
* inspect each entry.
* </pre>
*
* Protobuf type {@code mxaccess_gateway.v1.BulkReadResult}