Second re-review pass at commit a020350 caught 48 new findings — including
one High-severity regression I introduced in the prior sweep — and fixed
them all in one parallel wave.
High (1)
- Client.Python-018: prior sweep set `license = "Proprietary"` in
pyproject.toml. setuptools >= 77 enforces PEP 639 and rejects the
string (it must be a valid SPDX expression), so `pip wheel .` and
`pip install -e .` both fail before any source compiles. Tests
still pass because pytest bypasses the build backend via
`pythonpath`. Dropped the invalid license string, kept the
`License :: Other/Proprietary License` classifier, and added
`tests/test_packaging.py` so a future regression of the same shape
is caught in CI.
Mediums (6)
- Worker-023: `HeartbeatStuckCeiling` (default 75s = 5x HeartbeatGrace)
on WorkerPipeSessionOptions bounds the in-flight-command watchdog
suppression so a truly stuck COM call still triggers StaHung
instead of permanently defeating the watchdog.
- Client.Rust-018: reverted Rust's `latencyMs` split so the
cross-language bench comparison is apples-to-apples again;
`failureLatencyMs` kept as Rust-only enrichment.
- Client.Java-021: applied Client.Java-002's terminal-state
serialisation pattern to DeployEventStream so close() arriving
after queue-overflow can't erase the overflow exception.
- IntegrationTests-017: teardown-parity test now uses a two-window
stability check after UnAdvise instead of strict equality against
the pre-UnAdvise count (which raced against in-flight events).
- IntegrationTests-019: new RecordingTestOutputHelper wraps every
log sink the WriteSecured live test owns (worker stdout/stderr,
gateway logs, direct WriteLine) so the credential is proven
absent from the full output buffer, not just the diagnostic
message.
- Tests-020: added MxAccessGatewayServiceConstraintTests coverage
for the previously-uncovered Write2Bulk and WriteSecured2Bulk
arms of WriteBulkConstraintPlan.SetPayload.
Lows (41 — highlights)
- Server: Galaxy glob cache eviction is race-free (Server-024);
GalaxyRepositoryGrpcService takes IGalaxyRepository (Server-025);
AlarmsOptions validated at startup (Server-026); Authorization.md
Constraint Enforcement snippet/prose enumerate the bulk write/read
family (Server-027); bulk-read-commands and bulk-write-commands
capability tokens added to OpenSession (Server-029);
NotWiredAlarmRpcDispatcher XML doc and missing scope-resolver and
state-machine tests cleaned up (023, 028).
- Worker: AlarmCommandHandler now invokes the same STA-affinity
guard the poll path uses, at every command entry (Worker-024);
RunAsync null-checks the runtime-session factory result
(Worker-025).
- Worker.Tests: shared LiveMxAccessOptInVariableName lives on
GatewayContractInfo (Worker.Tests-025); MxAccessSession.CreateForTesting
rejects production sinks (Worker.Tests-026); FakeRuntimeSession's
CancelCommandReturnValue serialised under lock (Worker.Tests-027);
Probes namespace lifted to MxGateway.Worker.Tests.Probes
(Worker.Tests-029); cancel-envelope sequence numbers monotonised
(Worker.Tests-030); docs/GatewayTesting.md gains a "Dev-rig Probes"
section (Worker.Tests-028).
- Tests: ManualTimeProvider consolidated into one TestSupport/ copy
(Tests-021); SessionManagerBulkTests adds a mid-flight cancellation
test backed by a TaskCompletionSource fake (Tests-022); companion
FakeWorkerProcess.WaitForExitAsync no longer fakes its exit signal
(Tests-023); constraint plan reply-count divergence pinned
(Tests-024).
- IntegrationTests: TryGetSession chain carries [MaybeNullWhen(false)]
end-to-end (IntegrationTests-018); abnormal-exit keyword set
tightened to pipe-disconnected/end-of-stream and the test now
asserts streamTask.IsFaulted (020, 021).
- Client.Dotnet: bench commands added to isLongRunning so the
default 30s wall-clock budget doesn't kill them (015);
BenchStreamEventsAsync observes the inner stream task on every
exit path (016).
- Client.Go: parseValue wraps strconv errors with flag context and
%w (017); bench loops honour ctx.Done() (018); galaxy-watch parses
RFC3339Nano with fractional seconds (019); runStreamEvents installs
signal.NotifyContext like runGalaxyWatch (020); five new CLI-level
table-driven tests cover the bulk/bench subcommands (021).
- Client.Java: toCompletable Javadoc rewritten to match the actual
cancellation contract Client.Java-015 established (022); stream-events
text path uses Long.toUnsignedString for worker_sequence (023);
bench-read-bulk no longer pollutes success-latency histogram with
failure durations (024); --shutdown-timeout CLI option propagates
through to ClientOptions (025); seven new MxGatewayCliTests cover
the bulk and bench commands (026).
- Client.Python: mxgateway_cli ships its own py.typed marker (019);
wheel-build smoke test added under tests/test_packaging.py (020);
README documents the Galaxy CLI parity gap explicitly (021).
- Client.Rust: RustClientDesign.md signatures match session.rs and
document the AsRef<str> read_bulk genericism (019);
next_correlation_id re-exported at the crate root, with a
property-style doc contract and an explicit disclaimer that the
literal textual format is not part of the contract (020).
- Contracts: BulkWriteResult comment names the actual
IConstraintEnforcer mechanism instead of "tag-allowlist filter"
(014); BulkReadResult gains explicit per-arm payload-population
documentation for the success vs failure cases (015).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
+50
-13
@@ -11,20 +11,29 @@ import java.util.NoSuchElementException;
|
||||
import java.util.Objects;
|
||||
import java.util.concurrent.ArrayBlockingQueue;
|
||||
import java.util.concurrent.BlockingQueue;
|
||||
import java.util.concurrent.atomic.AtomicBoolean;
|
||||
|
||||
/**
|
||||
* Iterator-style adaptor over the {@code WatchDeployEvents} server-streaming
|
||||
* RPC. Mirrors {@link MxEventStream}: events arrive on a background gRPC thread
|
||||
* and are buffered in a bounded blocking queue; the iterator drains them.
|
||||
* Closing the stream cancels the underlying gRPC call.
|
||||
*
|
||||
* <p><strong>Threading:</strong> the iterator methods ({@link #hasNext()} and
|
||||
* {@link #next()}) are <em>not</em> thread-safe and must be driven by a single
|
||||
* consumer thread. {@link #close()} may be called from any thread. Terminal
|
||||
* state transitions (queue overflow, server completion, and {@code close()})
|
||||
* are serialised so that the first terminal condition wins deterministically:
|
||||
* once an overflow exception has been observed it is never silently replaced
|
||||
* by an end-of-stream marker.
|
||||
*/
|
||||
public final class DeployEventStream implements Iterator<DeployEvent>, AutoCloseable {
|
||||
private static final Object END = new Object();
|
||||
|
||||
private final BlockingQueue<Object> queue;
|
||||
private final AtomicBoolean closed = new AtomicBoolean();
|
||||
private final Object terminalLock = new Object();
|
||||
private volatile ClientCallStreamObserver<WatchDeployEventsRequest> requestStream;
|
||||
private volatile boolean closed;
|
||||
private boolean terminated;
|
||||
private Object next;
|
||||
|
||||
DeployEventStream(int capacity) {
|
||||
@@ -36,7 +45,7 @@ public final class DeployEventStream implements Iterator<DeployEvent>, AutoClose
|
||||
@Override
|
||||
public void beforeStart(ClientCallStreamObserver<WatchDeployEventsRequest> requestStream) {
|
||||
DeployEventStream.this.requestStream = requestStream;
|
||||
if (closed.get()) {
|
||||
if (closed) {
|
||||
requestStream.cancel("client cancelled deploy event stream", null);
|
||||
}
|
||||
}
|
||||
@@ -48,7 +57,7 @@ public final class DeployEventStream implements Iterator<DeployEvent>, AutoClose
|
||||
|
||||
@Override
|
||||
public void onError(Throwable error) {
|
||||
if (Status.fromThrowable(error).getCode() == Status.Code.CANCELLED && closed.get()) {
|
||||
if (Status.fromThrowable(error).getCode() == Status.Code.CANCELLED && closed) {
|
||||
offer(END);
|
||||
return;
|
||||
}
|
||||
@@ -94,12 +103,12 @@ public final class DeployEventStream implements Iterator<DeployEvent>, AutoClose
|
||||
|
||||
@Override
|
||||
public void close() {
|
||||
closed.set(true);
|
||||
closed = true;
|
||||
ClientCallStreamObserver<WatchDeployEventsRequest> stream = requestStream;
|
||||
if (stream != null) {
|
||||
stream.cancel("client cancelled deploy event stream", null);
|
||||
}
|
||||
offer(END);
|
||||
terminate(null);
|
||||
}
|
||||
|
||||
private Object take() {
|
||||
@@ -117,10 +126,7 @@ public final class DeployEventStream implements Iterator<DeployEvent>, AutoClose
|
||||
private void offer(Object value) {
|
||||
Objects.requireNonNull(value, "value");
|
||||
if (value == END) {
|
||||
if (!queue.offer(value)) {
|
||||
queue.clear();
|
||||
queue.offer(value);
|
||||
}
|
||||
terminate(null);
|
||||
return;
|
||||
}
|
||||
if (!queue.offer(value)) {
|
||||
@@ -128,9 +134,40 @@ public final class DeployEventStream implements Iterator<DeployEvent>, AutoClose
|
||||
if (stream != null) {
|
||||
stream.cancel("client deploy event stream queue overflowed", null);
|
||||
}
|
||||
queue.clear();
|
||||
queue.offer(new MxGatewayException("galaxy watch deploy events queue overflowed"));
|
||||
queue.offer(END);
|
||||
terminate(new MxGatewayException("galaxy watch deploy events queue overflowed"));
|
||||
}
|
||||
}
|
||||
|
||||
/**
|
||||
* Drives the single terminal transition. The first caller wins: a later
|
||||
* end-of-stream or {@code close()} cannot overwrite or discard an overflow
|
||||
* exception that has already been published to the consumer. Mirrors the
|
||||
* {@link MxEventStream#terminate} contract — see Client.Java-002 for the
|
||||
* race this guards against.
|
||||
*
|
||||
* @param fault the fault to surface to the consumer, or {@code null} for a
|
||||
* clean end-of-stream
|
||||
*/
|
||||
private void terminate(MxGatewayException fault) {
|
||||
synchronized (terminalLock) {
|
||||
if (terminated) {
|
||||
return;
|
||||
}
|
||||
terminated = true;
|
||||
if (fault != null) {
|
||||
// Make room for the fault marker; the consumer only needs the
|
||||
// terminal signal, queued data events are no longer relevant.
|
||||
queue.clear();
|
||||
queue.offer(fault);
|
||||
queue.offer(END);
|
||||
return;
|
||||
}
|
||||
// Clean end-of-stream: ensure the END marker is delivered even when
|
||||
// the queue is currently full of undrained data events.
|
||||
if (!queue.offer(END)) {
|
||||
queue.clear();
|
||||
queue.offer(END);
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
+31
-10
@@ -160,16 +160,37 @@ final class MxGatewayChannels {
|
||||
*
|
||||
* <p><strong>Cancellation contract:</strong> the returned future is a
|
||||
* {@link CancellingCompletableFuture} that overrides
|
||||
* {@link CompletableFuture#cancel(boolean)} so cancellation always forwards
|
||||
* to the source {@link ListenableFuture}, even when callers wrap the
|
||||
* future in additional {@code thenApply}/{@code thenCompose} stages. The
|
||||
* historical {@code whenComplete}-based forwarder was buggy because
|
||||
* {@code thenApply} returns a new {@code CompletableFuture} whose
|
||||
* cancellation does <em>not</em> propagate back to this future; with the
|
||||
* override-based design, calling {@code cancel(true)} on either the
|
||||
* direct return value or the user-facing chained future ultimately
|
||||
* invokes {@code source.cancel(true)} (chained futures forward to the
|
||||
* upstream stage they were derived from, which is this future).
|
||||
* {@link CompletableFuture#cancel(boolean)} so cancelling the
|
||||
* <em>direct return value</em> forwards to the source
|
||||
* {@link ListenableFuture}, aborting the underlying gRPC call. This is the
|
||||
* fix for Client.Java-015.
|
||||
*
|
||||
* <p><strong>Important — derived stages do <em>not</em> propagate
|
||||
* cancellation upstream.</strong> Calling
|
||||
* {@code cancel(...)} on a future obtained via
|
||||
* {@code thenApply}/{@code thenCompose}/{@code thenAccept}/{@code whenComplete}
|
||||
* of the value returned by this method only marks <em>that</em> derived stage
|
||||
* as cancelled; it does <strong>not</strong> propagate back to this
|
||||
* {@code CancellingCompletableFuture}, so the source RPC continues until its
|
||||
* deadline expires. {@link CompletableFuture#thenApply} (and the other
|
||||
* chaining methods) deliberately do not forward cancellation to the upstream
|
||||
* stage they were derived from.
|
||||
*
|
||||
* <p>If a caller needs cancellation through a chained pipeline, either:
|
||||
* <ul>
|
||||
* <li>use the {@link #toCompletable(ListenableFuture, String, Function)}
|
||||
* overload below, which inlines a validator into the
|
||||
* {@code FutureCallback} so the user-visible future is the same
|
||||
* future cancellation is bound to (this is what the {@code *Async}
|
||||
* methods on {@link MxGatewayClient} and the unary methods on
|
||||
* {@link GalaxyRepositoryClient} do); or</li>
|
||||
* <li>follow {@link GalaxyRepositoryClient#discoverHierarchyAsync}'s
|
||||
* pattern of returning a custom {@link CompletableFuture} subclass
|
||||
* that tracks the current in-flight stage via an
|
||||
* {@link java.util.concurrent.atomic.AtomicReference} and forwards
|
||||
* {@code cancel(...)} to it (necessary when chaining
|
||||
* {@code thenCompose} stages across paged calls).</li>
|
||||
* </ul>
|
||||
*
|
||||
* @param source the gRPC future-stub result
|
||||
* @param operation the operation name used in normalised error messages
|
||||
|
||||
+58
@@ -175,6 +175,64 @@ final class GalaxyRepositoryClientTests {
|
||||
assertFalse(stream.hasNext());
|
||||
}
|
||||
|
||||
@Test
|
||||
void deployEventStreamOverflowExceptionSurvivesASubsequentClose() {
|
||||
// Client.Java-021 regression: mirror Client.Java-002's terminal-state
|
||||
// serialisation in DeployEventStream — an overflow enqueues the overflow
|
||||
// exception, and a later close() must NOT discard it. The first terminal
|
||||
// condition (overflow) must win and stay observable by next().
|
||||
DeployEventStream stream = new DeployEventStream(2);
|
||||
ClientResponseObserver<WatchDeployEventsRequest, DeployEvent> observer = stream.observer();
|
||||
observer.beforeStart(new RecordingClientCallStreamObserver());
|
||||
|
||||
// Force a queue overflow on a capacity-2 stream.
|
||||
for (int i = 0; i < 8; i++) {
|
||||
observer.onNext(DeployEvent.newBuilder().setSequence(i).build());
|
||||
}
|
||||
|
||||
// A close() arriving after the overflow must not erase the overflow signal.
|
||||
stream.close();
|
||||
|
||||
MxGatewayException error = assertThrows(MxGatewayException.class, () -> {
|
||||
while (stream.hasNext()) {
|
||||
stream.next();
|
||||
}
|
||||
});
|
||||
assertTrue(error.getMessage().contains("overflow"), error::getMessage);
|
||||
}
|
||||
|
||||
@Test
|
||||
void deployEventStreamConcurrentOverflowAndCloseAlwaysTerminate() throws Exception {
|
||||
// Client.Java-021 regression: the terminal-state transition must be
|
||||
// serialised so whatever the interleaving of overflow and close,
|
||||
// hasNext() always reaches a terminal state (no stuck consumer).
|
||||
for (int iteration = 0; iteration < 300; iteration++) {
|
||||
DeployEventStream stream = new DeployEventStream(2);
|
||||
ClientResponseObserver<WatchDeployEventsRequest, DeployEvent> observer = stream.observer();
|
||||
observer.beforeStart(new RecordingClientCallStreamObserver());
|
||||
|
||||
Thread filler = new Thread(() -> {
|
||||
for (int i = 0; i < 8; i++) {
|
||||
observer.onNext(DeployEvent.newBuilder().setSequence(i).build());
|
||||
}
|
||||
});
|
||||
Thread closer = new Thread(stream::close);
|
||||
filler.start();
|
||||
closer.start();
|
||||
filler.join();
|
||||
closer.join();
|
||||
|
||||
try {
|
||||
while (stream.hasNext()) {
|
||||
stream.next();
|
||||
}
|
||||
} catch (MxGatewayException expected) {
|
||||
assertTrue(expected.getMessage().contains("overflow"), expected::getMessage);
|
||||
}
|
||||
assertFalse(stream.hasNext());
|
||||
}
|
||||
}
|
||||
|
||||
@Test
|
||||
void discoverHierarchyRejectsRepeatedPageToken() throws Exception {
|
||||
TestService service = new TestService() {
|
||||
|
||||
Reference in New Issue
Block a user