e2e: drive each client CLI through one long-lived batch process

The cross-language e2e matrix spawned one CLI process per operation —
~250 per client — paying a process (and, for the Java CLI, a full JVM)
cold-start every time. The Java leg alone ran ~16 minutes.

Each client CLI (dotnet, go, rust, python, java) gains a `batch`
subcommand: a single process that reads one command line from stdin,
runs it through the normal subcommand dispatch, writes the JSON result,
then a line containing exactly `__MXGW_BATCH_EOR__`. A failing command
writes its `{"error":...}` envelope and the loop continues.
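
For illustration, a minimal sketch of how a consumer might frame that
stream into records. It assumes output arrives as text lines;
`split_records` is a hypothetical helper and not part of this commit.

```python
from collections.abc import Iterable, Iterator

BATCH_EOR = "__MXGW_BATCH_EOR__"  # sentinel named in the commit message


def split_records(lines: Iterable[str]) -> Iterator[str]:
    """Yield one record per sentinel line (hypothetical helper, not from the commit)."""
    buffer: list[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == BATCH_EOR:
            # Sentinel reached: everything buffered so far is one command's output.
            yield "\n".join(buffer)
            buffer = []
        else:
            buffer.append(line)
    # Trailing lines with no sentinel form an incomplete record and are dropped.


stream = ['{"ok": true}\n', BATCH_EOR + "\n", '{"error": "boom"}\n', BATCH_EOR + "\n"]
records = list(split_records(stream))
```

Framing on a sentinel line rather than on JSON boundaries keeps the
reader independent of how many lines each command prints.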

run-client-e2e-tests.ps1 now launches one batch process per client and
pipes every operation through its stdin/stdout, so startup is paid once
per client. The orchestration and assertions are unchanged; the parity
and auth phases now read the `{"error":...}` envelope instead of a
process exit code.
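
A rough sketch of what detecting failure from the envelope, rather than
from an exit code, could look like. `command_failed` is a hypothetical
name, and treating non-JSON output as "not an error" is an assumption;
the real harness is PowerShell and may decide differently.

```python
import json


def command_failed(record: str) -> bool:
    """Return True when a batch record is an error envelope (hypothetical helper)."""
    try:
        payload = json.loads(record)
    except json.JSONDecodeError:
        # Assumption: output that is not JSON is not an error envelope.
        return False
    # An envelope is a JSON object carrying an "error" key.
    return isinstance(payload, dict) and "error" in payload
```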

Full 5-client matrix with -VerifyWrite: ~15 min, down from ~35; the Java
leg dropped from ~16 min to ~2-3.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Joseph Doherty
2026-05-21 06:20:13 -04:00
parent c1ff8c94e8
commit 6126099cdb
10 changed files with 970 additions and 47 deletions
@@ -5,11 +5,13 @@ from __future__ import annotations
import asyncio
import json
import os
import sys
from collections.abc import Awaitable, Callable
from datetime import datetime, timezone
from typing import Any
import click
from click.testing import CliRunner
from google.protobuf.json_format import MessageToDict
from mxgateway import __version__
@@ -23,6 +25,8 @@ from mxgateway.values import MxValueInput, to_mx_value
MAX_AGGREGATE_EVENTS = 10_000
_BATCH_EOR = "__MXGW_BATCH_EOR__"
@click.group()
def main() -> None:
@@ -42,6 +46,80 @@ def version(output_json: bool) -> None:
    _emit(payload, output_json=output_json, text=f"mxgw-py {__version__}")


@main.command()
def batch() -> None:
    """Read commands from stdin and execute each, writing output + __MXGW_BATCH_EOR__ after each.

    Each non-empty line of stdin is a complete argument string (no quoting support; the
    harness never passes whitespace-containing arguments). Lines are split on runs of ASCII
    whitespace and dispatched through the normal CLI parser. On EOF or an empty line, exit 0.

    Errors do NOT terminate the loop. Each command's output (including any error JSON) is
    written to stdout followed by a line containing exactly ``__MXGW_BATCH_EOR__``, then
    stdout is flushed. Error output is formatted as ``{"error": "...", "type": "..."}``.
    """
    runner = CliRunner()
    for raw_line in sys.stdin:
        line = raw_line.rstrip("\n").rstrip("\r")
        if not line:
            # Empty line signals clean exit (matches the spec and .NET behaviour).
            break
        args = line.split()
        try:
            result = runner.invoke(main, args, catch_exceptions=True)
        except Exception as exc:  # noqa: BLE001 - be safe; never let batch loop die
            _batch_write_error(exc.__class__.__name__, str(exc))
            _batch_flush_eor()
            continue
        if result.exit_code == 0:
            # Normal success: write captured output as-is.
            sys.stdout.write(result.output)
        else:
            # Something went wrong. If the command already emitted a JSON object
            # (e.g. the output starts with '{'), trust that and relay it verbatim.
            # Otherwise synthesise the standard {"error": ..., "type": ...} shape.
            output = result.output or ""
            exc = result.exception
            if output.lstrip().startswith("{"):
                # Already JSON: relay verbatim (may or may not end with newline).
                sys.stdout.write(output)
                if not output.endswith("\n"):
                    sys.stdout.write("\n")
            elif exc is not None and not isinstance(exc, SystemExit):
                _batch_write_error(type(exc).__name__, str(exc))
            else:
                # Click's default error format is "Error: <message>\n"; extract the
                # message so the harness gets clean JSON.
                msg = output.strip()
                if msg.startswith("Error: "):
                    msg = msg[len("Error: "):]
                exc_type = (
                    type(exc).__name__
                    if exc is not None and not isinstance(exc, SystemExit)
                    else "CliError"
                )
                _batch_write_error(exc_type, msg)
        _batch_flush_eor()


def _batch_write_error(exc_type: str, message: str) -> None:
    """Write a JSON error record to stdout in the standard batch error shape."""
    sys.stdout.write(json.dumps({"error": message, "type": exc_type}) + "\n")


def _batch_flush_eor() -> None:
    """Write the end-of-record sentinel and flush stdout."""
    sys.stdout.write(_BATCH_EOR + "\n")
    sys.stdout.flush()


def gateway_options(command: Callable[..., Any]) -> Callable[..., Any]:
    """Apply the shared gateway connection options to a Click command."""
    command = click.option("--endpoint", default="localhost:5000", show_default=True)(command)