e2e: drive each client CLI through one long-lived batch process

The cross-language e2e matrix spawned one CLI process per operation —
~250 per client — paying a process (and, for the Java CLI, a full JVM)
cold-start every time. The Java leg alone ran ~16 minutes.

Each client CLI (dotnet, go, rust, python, java) gains a `batch`
subcommand: a single process that reads one command line from stdin,
runs it through the normal subcommand dispatch, writes the JSON result,
then a line containing exactly `__MXGW_BATCH_EOR__`. A failing command
writes its `{"error":...}` envelope and the loop continues.
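
A minimal Python sketch of the loop described above, for illustration
only. `dispatch` and its `echo` subcommand are hypothetical stand-ins for
a CLI's real subcommand dispatch, and tokenizing with `shlex` and
skipping blank lines are assumptions; only the sentinel line and the
`{"error":...}` envelope come from this change:

```python
import io
import json
import shlex

EOR = "__MXGW_BATCH_EOR__"  # end-of-record sentinel named in this commit


def dispatch(argv):
    """Hypothetical stand-in for the CLI's normal subcommand dispatch."""
    if not argv:
        raise ValueError("empty command")
    if argv[0] == "echo":
        return {"args": argv[1:]}
    raise ValueError(f"unknown subcommand: {argv[0]}")


def run_batch(stdin, stdout):
    """Read one command per line; write the JSON result, then the EOR line."""
    for line in stdin:
        line = line.strip()
        if not line:
            continue  # assumption: blank lines are ignored
        try:
            result = dispatch(shlex.split(line))
        except Exception as exc:
            # A failing command emits the error envelope; the loop continues.
            result = {"error": str(exc)}
        stdout.write(json.dumps(result) + "\n")
        stdout.write(EOR + "\n")
        stdout.flush()  # the harness blocks on the sentinel, so flush each record
```

Flushing after every record matters in this sketch because the caller
waits for the sentinel before sending the next command.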

run-client-e2e-tests.ps1 now launches one batch process per client and
pipes every operation through its stdin/stdout, so startup is paid once
per client. The orchestration and assertions are unchanged; the parity
and auth phases now read the `{"error":...}` envelope instead of a
process exit code.
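
The harness side of the same protocol can be sketched as below. The real
harness is the PowerShell script; `BatchClient` and its methods are
hypothetical names, and treating any JSON object with an `error` key as
the operation failure is an assumption based on the envelope described
here:

```python
import json
import subprocess

EOR = "__MXGW_BATCH_EOR__"  # end-of-record sentinel named in this commit


class BatchClient:
    """Hypothetical wrapper around one long-lived batch process."""

    def __init__(self, argv):
        # argv is whatever launches the client's batch subcommand.
        self.proc = subprocess.Popen(
            argv,
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
            text=True,
            bufsize=1,  # line-buffered pipes in text mode
        )

    def call(self, command):
        """Send one command line; return (ok, payload)."""
        self.proc.stdin.write(command + "\n")
        self.proc.stdin.flush()
        body = []
        while True:
            line = self.proc.stdout.readline()
            if line == "":
                raise RuntimeError("batch process exited before end-of-record")
            line = line.rstrip("\r\n")
            if line == EOR:
                break
            body.append(line)
        payload = json.loads("\n".join(body))
        # The error envelope replaces the per-process exit code as the signal.
        ok = not (isinstance(payload, dict) and "error" in payload)
        return ok, payload

    def close(self):
        """Close stdin so the batch loop sees EOF, then wait for exit."""
        self.proc.stdin.close()
        self.proc.wait()
        self.proc.stdout.close()
```

One `BatchClient` per client CLI mirrors the "startup is paid once per
client" behaviour: every later `call` reuses the already-running process.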

Full 5-client matrix with -VerifyWrite: ~15 min, down from ~35; the Java
leg dropped from ~16 min to ~2-3.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Joseph Doherty
2026-05-21 06:20:13 -04:00
parent c1ff8c94e8
commit 6126099cdb
10 changed files with 970 additions and 47 deletions
+14 -7
@@ -294,15 +294,22 @@ path and writes a JSON report under `artifacts/e2e/`:
write support (`MxAccessCommandExecutor` returning `InvalidRequest` for
`Write`/`Write2`/`WriteSecured`/`WriteSecured2`).
+Each client CLI is driven through one long-lived `batch` process. Every CLI
+exposes a `batch` subcommand: a process that reads one command line from stdin,
+runs it through the normal subcommand dispatch, writes the JSON result, then a
+line containing exactly `__MXGW_BATCH_EOR__`. The harness launches one such
+process per client and pings the ~250 operations of the flow through it, so the
+process — and, for the JVM, the runtime — cold-start is paid once per client
+instead of once per operation. A command that fails inside the batch process
+writes its `{"error":...}` envelope and the loop continues; the harness treats
+that envelope as the operation failure (used by the parity and auth phases).
Before the per-client phases run, the script builds the .NET CLI
(`dotnet build`) and installs the Java CLI (`gradle :mxgateway-cli:installDist`)
-once, then invokes the compiled artifacts directly. The matrix issues several
-hundred CLI calls per client; invoking `dotnet run` / `gradle
-:mxgateway-cli:run` per call rebuilds and cold-starts the toolchain every time,
-which stretches the add-item/advise loop long enough for the worker event
-channel to overflow under `FailFast` backpressure. The Go, Rust, and Python
-clients still build on demand (`go run` / `cargo run` / `python -m`) because
-their per-call startup is already sub-second.
+once, so the `batch` process launches straight from the compiled exe / the
+installed launcher. The Go, Rust, and Python batch processes are launched via
+`go run` / `cargo run` / `python -m`, which compile-or-start once when that
+single per-client process starts.
Build the gateway and worker, start the gateway, and provide a valid API key
before running the client e2e script: