File and fix Server-030 and Client.Dotnet-017 surfaced by the e2e matrix

Both findings surfaced when running the cross-language e2e matrix
(scripts/run-client-e2e-tests.ps1) against the redeployed gateway at
commit 84d36b7. Filed in code-reviews/Server/findings.md and
code-reviews/Client.Dotnet/findings.md and fixed in the same change.

Server-030 (Medium / Error handling): GatewaySession.GetReadyWorkerClient
gated on `_state == Ready && _workerClient.State == Ready` but only
formatted `_state` into the SessionManagerException message. Under load
the gateway-driven `_state` and the worker-driven `WorkerClient.State`
can diverge, producing a self-contradictory diagnostic ("Session ... is
not ready. Current state is Ready."). The Java e2e client hit this on
the 56th item after 55 successful add-items. Rewrote the message to
include both states ("Session state is X; worker state is Y"), added
an XML doc explaining the two-state contract and that this branch is
the fail-fast for a divergence race, and added regression test
SessionManagerTests.InvokeAsync_WhenWorkerNotReadyButSessionReady_DiagnosticIncludesBothStates
which asserts that both states appear in the message. The deeper race
(should the gateway briefly wait for worker-Ready before failing?)
remains open as a follow-up.
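
The two-state gate described above can be sketched as follows. This is a
minimal illustration, not the gateway's code: `ReadyGate`, `SessionState`,
`WorkerState`, and the exact message wording are hypothetical stand-ins,
and `InvalidOperationException` stands in for SessionManagerException.

```csharp
using System;

// Usage: the session reports Ready while the worker is still starting.
try
{
    ReadyGate.EnsureReady("session-1", SessionState.Ready, WorkerState.Starting);
}
catch (InvalidOperationException ex)
{
    // Prints: Session session-1 is not ready. Session state is Ready; worker state is Starting.
    Console.WriteLine(ex.Message);
}

// Hypothetical stand-ins for the gateway-driven and worker-driven states.
public enum SessionState { Starting, Ready, Closed }
public enum WorkerState { Starting, Ready, Faulted }

public static class ReadyGate
{
    // Names both states so the diagnostic cannot contradict itself.
    public static string Describe(string sessionId, SessionState session, WorkerState worker) =>
        $"Session {sessionId} is not ready. Session state is {session}; worker state is {worker}.";

    // Fails fast unless both the session and the worker report Ready.
    public static void EnsureReady(string sessionId, SessionState session, WorkerState worker)
    {
        if (session == SessionState.Ready && worker == WorkerState.Ready)
        {
            return;
        }

        throw new InvalidOperationException(Describe(sessionId, session, worker));
    }
}
```

Because the message is built from the same two values the gate tests, a
divergence between them shows up in the diagnostic instead of being hidden.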

Client.Dotnet-017 (Low / Error handling): stream-events CLI threw
OperationCanceledException as an unhandled exception when the user's
--timeout expired before --max-events was reached. The process exited
with code -532462766 (0xE0434352, the CLR unhandled-exception code) and
wrote no aggregate JSON. The other client CLIs (Go, Rust, Python, Java)
exit 0 in this case. Wrapped the `await foreach` in a try block with
`catch (OperationCanceledException) when (cancellationToken.IsCancellationRequested)`
so the supplied token's cancellation (--timeout, Ctrl+C, or parent
CTS) becomes graceful completion; the aggregate `{ "events": [...] }`
JSON is still written after the catch. Added regression test
RunAsync_StreamEvents_WhenTimeoutFiresAfterEvents_EmitsCollectedEventsAndExitsZero
backed by a new FakeCliClient.StreamHangAfterEvents hook that yields
the configured events then parks on the cancellation token.
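
The token-aware catch can be sketched in isolation as below. This is a
simplified illustration under stated assumptions: `WindowCollector`,
`TwoThenHangAsync`, and the `int` payload are hypothetical, and a 200 ms
CancellationTokenSource stands in for the CLI's --timeout budget.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.CompilerServices;
using System.Threading;
using System.Threading.Tasks;

// Usage: the window closes after 200 ms, long before maxEvents (200) is reached.
using var cts = new CancellationTokenSource(TimeSpan.FromMilliseconds(200));
List<int> collected = await WindowCollector.CollectAsync(
    WindowCollector.TwoThenHangAsync(cts.Token), 200, cts.Token);
Console.WriteLine($"collected {collected.Count} events");

public static class WindowCollector
{
    // Hypothetical source: yields two items, then parks until the token is cancelled.
    public static async IAsyncEnumerable<int> TwoThenHangAsync(
        [EnumeratorCancellation] CancellationToken token = default)
    {
        yield return 1;
        yield return 2;
        await Task.Delay(Timeout.InfiniteTimeSpan, token);
    }

    // Collects until maxEvents is reached or the supplied token is cancelled.
    public static async Task<List<int>> CollectAsync(
        IAsyncEnumerable<int> source, int maxEvents, CancellationToken token)
    {
        var events = new List<int>();
        try
        {
            await foreach (int item in source.WithCancellation(token))
            {
                events.Add(item);
                if (maxEvents > 0 && events.Count >= maxEvents)
                {
                    break;
                }
            }
        }
        catch (OperationCanceledException) when (token.IsCancellationRequested)
        {
            // Cancellation of the supplied token closes the window; keep what arrived.
        }

        return events;
    }
}
```

The exception filter is the design choice that matters: an
OperationCanceledException raised while the supplied token is not
cancelled (for example from an unrelated internal timeout) still
propagates as a failure.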

Side cleanup: the GatewayApplicationTests test added under Server-020
was asserting an invariant (`/dashboard/dashboard/X` doesn't exist)
that I broke by reverting Server-020 in 84d36b7. The doubled endpoint
shapes do exist now (MapGroup("/dashboard") prefixing an already
"/dashboard/X" @page directive) but they're harmless: no client
requests `/dashboard/dashboard/X`. Replaced the test with a positive
assertion (`/dashboard/X` routes ARE registered) and rewrote the XML
doc to record the actual contract.

Verified: dotnet test src/MxGateway.Tests passes 480/480, dotnet test
clients/dotnet/MxGateway.Client.Tests passes 77/77, gateway redeployed
at this commit and GET http://localhost:5130/dashboard returns 200.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Joseph Doherty
2026-05-20 13:07:39 -04:00
parent 84d36b7638
commit b794c46bc7
8 changed files with 219 additions and 45 deletions
@@ -1216,29 +1216,43 @@ public static class MxGatewayClientCli
AfterWorkerSequence = arguments.GetUInt64("after-worker-sequence", 0),
};
await foreach (MxEvent gatewayEvent in client.StreamEventsAsync(request, cancellationToken)
.WithCancellation(cancellationToken)
.ConfigureAwait(false))
try
{
if (jsonLines)
await foreach (MxEvent gatewayEvent in client.StreamEventsAsync(request, cancellationToken)
.WithCancellation(cancellationToken)
.ConfigureAwait(false))
{
output.WriteLine(ProtobufJsonFormatter.Format(gatewayEvent));
}
else if (json)
{
events.Add(gatewayEvent);
}
else
{
output.WriteLine(ProtobufJsonFormatter.Format(gatewayEvent));
}
if (jsonLines)
{
output.WriteLine(ProtobufJsonFormatter.Format(gatewayEvent));
}
else if (json)
{
events.Add(gatewayEvent);
}
else
{
output.WriteLine(ProtobufJsonFormatter.Format(gatewayEvent));
}
eventCount++;
if (maxEvents > 0 && eventCount >= maxEvents)
{
break;
eventCount++;
if (maxEvents > 0 && eventCount >= maxEvents)
{
break;
}
}
}
catch (OperationCanceledException) when (cancellationToken.IsCancellationRequested)
{
// Client.Dotnet-017: the supplied cancellation token covers both the
// user's --timeout wall-clock budget (via CreateCancellation's
// CancelAfter) and external Ctrl+C / parent CTS cancellation. All
// three are graceful completion modes for a finite-window event
// collector: emit the events that arrived before the window closed
// and exit 0. The events list is well-formed at this point; the
// aggregate JSON below still runs. This matches how the Go, Rust,
// Python, and Java CLIs treat their equivalent timeouts.
}
if (json && !jsonLines)
{
@@ -184,6 +184,69 @@ public sealed class MxGatewayClientCliTests
Assert.DoesNotContain("ON_WRITE_COMPLETE", output.ToString());
}
/// <summary>
/// Client.Dotnet-017 regression: a finite-window event collector
/// (<c>stream-events --timeout</c>) must exit 0 and emit the events
/// that arrived before the timeout fired, instead of propagating the
/// timeout-driven <see cref="OperationCanceledException"/> as an
/// unhandled exception (exit code -532462766). The fix wraps the
/// <c>await foreach</c> in a token-aware catch so the cancellation
/// ends the foreach gracefully; the aggregated JSON output still runs.
/// </summary>
[Fact]
public async Task RunAsync_StreamEvents_WhenTimeoutFiresAfterEvents_EmitsCollectedEventsAndExitsZero()
{
using var output = new StringWriter();
using var error = new StringWriter();
FakeCliClient fakeClient = new();
fakeClient.Events.Add(new MxEvent
{
SessionId = "session-fixture",
Family = MxEventFamily.OnDataChange,
WorkerSequence = 1,
});
fakeClient.Events.Add(new MxEvent
{
SessionId = "session-fixture",
Family = MxEventFamily.OnDataChange,
WorkerSequence = 2,
});
// Park forever after yielding the configured events so the CLI's
// --timeout drives the cancellation path.
fakeClient.StreamHangAfterEvents = async token =>
{
await Task.Delay(Timeout.InfiniteTimeSpan, token).ConfigureAwait(false);
};
int exitCode = await MxGatewayClientCli.RunAsync(
[
"stream-events",
"--endpoint",
"http://localhost:5000",
"--api-key",
"test-api-key",
"--session-id",
"session-fixture",
"--json",
"--max-events",
"200",
"--timeout",
"1s",
],
output,
error,
_ => fakeClient);
Assert.Equal(0, exitCode);
string json = output.ToString();
// Aggregate JSON output must run even though the foreach exited via
// cancellation, and it must contain both events that arrived first.
Assert.Contains("\"events\"", json);
Assert.Contains("\"workerSequence\":\"1\"", json);
Assert.Contains("\"workerSequence\":\"2\"", json);
Assert.Equal(string.Empty, error.ToString());
}
/// <summary>Verifies that smoke command closes opened session when a command fails.</summary>
[Fact]
@@ -423,6 +486,14 @@ public sealed class MxGatewayClientCliTests
/// <summary>Exception to throw on invoke, if any.</summary>
public Exception? InvokeFailure { get; init; }
/// <summary>
/// When set, after yielding all <see cref="Events"/> the stream
/// awaits the provided handle and then throws
/// <see cref="OperationCanceledException"/> — used to simulate the
/// CLI timeout / Ctrl+C cancellation path (Client.Dotnet-017).
/// </summary>
public Func<CancellationToken, Task>? StreamHangAfterEvents { get; set; }
/// <inheritdoc />
public ValueTask DisposeAsync()
{
@@ -482,6 +553,11 @@ public sealed class MxGatewayClientCliTests
await Task.Yield();
yield return gatewayEvent;
}
if (StreamHangAfterEvents is not null)
{
await StreamHangAfterEvents(cancellationToken).ConfigureAwait(false);
}
}
/// <summary>Galaxy test connection reply to return.</summary>