e2e: drive each client CLI through one long-lived batch process

The cross-language e2e matrix spawned one CLI process per operation —
~250 per client — paying a process (and, for the Java CLI, a full JVM)
cold-start every time. The Java leg alone ran ~16 minutes.

Each client CLI (dotnet, go, rust, python, java) gains a `batch`
subcommand: a single long-lived process that loops over stdin, reading
one command line at a time, running it through the normal subcommand
dispatch, and writing the JSON result followed by a line containing
exactly `__MXGW_BATCH_EOR__`. A failing command writes its
`{"error":...}` envelope as the result and the loop continues.
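
The loop described above can be sketched in Python as follows. This is
a minimal illustration under assumptions, not the code in any of the
five CLIs: `dispatch` is a hypothetical stand-in for the real subcommand
dispatch, and tokenizing the line with `shlex.split` is an assumption
about how a command line is split into arguments.

```python
import io
import json
import shlex

BATCH_TERMINATOR = "__MXGW_BATCH_EOR__"


def dispatch(argv):
    """Hypothetical stand-in for the CLI's normal subcommand dispatch."""
    if argv[0] == "open-session":
        return {"sessionId": "demo-session"}
    raise ValueError(f"unknown operation: {argv[0]}")


def run_batch(stdin, stdout):
    """Read one command per line until EOF, framing every reply."""
    for raw in stdin:
        line = raw.strip()
        if not line:
            continue  # assumption: blank lines are skipped
        try:
            result = dispatch(shlex.split(line))
        except Exception as exc:
            # A failing command yields the error envelope; the loop goes on.
            result = {"error": str(exc)}
        stdout.write(json.dumps(result) + "\n")
        stdout.write(BATCH_TERMINATOR + "\n")
        stdout.flush()  # the caller blocks until it sees the terminator


# Demo with in-memory streams instead of a real stdin/stdout pair.
demo_out = io.StringIO()
run_batch(io.StringIO("open-session\nbogus --flag\n"), demo_out)
print(demo_out.getvalue(), end="")
```

Flushing after the terminator matters in this sketch: the reader on the
other end of the pipe waits for that line before sending the next
command.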

run-client-e2e-tests.ps1 now launches one batch process per client and
pipes every operation through its stdin/stdout, so startup is paid once
per client. The orchestration and assertions are unchanged; the parity
and auth phases now read the `{"error":...}` envelope instead of a
process exit code.
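
The harness side of the same framing can be sketched in Python. The real
harness is the PowerShell script; `start_batch`, `invoke`, `stop_batch`,
and the `FAKE_CLI` child below are hypothetical names used only for
illustration, and the child is a stand-in so the sketch runs without any
of the actual CLIs.

```python
import json
import subprocess
import sys

BATCH_TERMINATOR = "__MXGW_BATCH_EOR__"

# Stand-in child process: replies with an error envelope when the command
# starts with "fail", otherwise echoes the command back as JSON.
FAKE_CLI = r'''
import json, sys
for line in sys.stdin:
    op = line.strip()
    reply = {"error": "boom"} if op.startswith("fail") else {"ok": op}
    print(json.dumps(reply))
    print("__MXGW_BATCH_EOR__", flush=True)
'''


def start_batch(argv):
    """Launch one long-lived child with piped stdin/stdout."""
    return subprocess.Popen(
        argv, stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True
    )


def invoke(proc, command_line):
    """Send one command; return (parsed reply, failed flag)."""
    proc.stdin.write(command_line + "\n")
    proc.stdin.flush()
    collected = []
    while True:
        line = proc.stdout.readline()
        if line == "":
            raise RuntimeError("batch process closed stdout before the terminator")
        if line.rstrip("\r\n") == BATCH_TERMINATOR:
            break
        collected.append(line)
    reply = json.loads("".join(collected))
    # Failure is read from the envelope, not from a process exit code.
    return reply, bool(reply.get("error"))


def stop_batch(proc, timeout=15):
    """Close stdin to signal end-of-input, then wait (or kill) the child."""
    proc.stdin.close()
    try:
        proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        proc.kill()
        proc.wait()
    finally:
        proc.stdout.close()


proc = start_batch([sys.executable, "-c", FAKE_CLI])
try:
    print(invoke(proc, "open-session"))
    print(invoke(proc, "fail-me"))
finally:
    stop_batch(proc)
```

Because the child stays alive across calls, an expected-failure check
can only look at the reply; there is no per-operation exit code left to
inspect.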

Full 5-client matrix with -VerifyWrite: ~15 min, down from ~35; the Java
leg dropped from ~16 min to ~2-3.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Joseph Doherty
2026-05-21 06:20:13 -04:00
parent c1ff8c94e8
commit 6126099cdb
10 changed files with 970 additions and 47 deletions
+200 -30
@@ -9,6 +9,11 @@ register, bulk subscribe/unsubscribe, per-tag add-item/advise, event
streaming, a write round-trip with value assertion, error-path (parity)
checks, and API-key auth rejection.
Each client CLI is driven through one long-lived `batch` process: the harness
writes one command line to its stdin and reads the JSON result back, so the
~250 operations per client pay the process (and JVM/runtime) cold-start once
instead of once per operation.
The gateway and worker are assumed to be already running at -Endpoint; the
script does not start or stop them.
#>
@@ -467,12 +472,11 @@ function Assert-BulkResults {
}
# Builds the dotnet and Java client CLIs once up front and records the path to
-# each compiled artifact. The e2e matrix issues ~250 CLI calls per client;
-# invoking `dotnet run` / `gradle :mxgateway-cli:run` per call rebuilds and
-# cold-starts the toolchain every time, stretching the per-tag advise loop long
-# enough for the worker event channel to overflow under the FailFast
-# backpressure policy. Running the compiled artifact keeps per-call latency
-# sub-second, matching the Go/Rust/Python paths.
+# each compiled artifact, so the long-lived `batch` process is launched from
+# the compiled exe / installed launcher without paying a `dotnet build` or
+# `gradle` step at flow time. The Go, Rust, and Python batch processes are
+# launched via `go run` / `cargo run` / `python -m`, which compile-or-start
+# once when that single per-client process starts.
function Initialize-ClientBuilds {
if ($Clients -contains "dotnet") {
$cliProject = Join-Path $repoRoot "clients/dotnet/MxGateway.Client.Cli/MxGateway.Client.Cli.csproj"
@@ -801,6 +805,161 @@ function Get-DryRunReply {
}
}
# --- Batch-mode client process ---------------------------------------------
# The e2e flow issues ~250 operations per client. Spawning one CLI process per
# operation pays a process — and, for the JVM, a runtime — cold-start every
# time. Instead each client CLI exposes a `batch` subcommand: a single
# long-lived process that reads one command line from stdin, runs it, writes
# the JSON result, then a line containing exactly $batchTerminator. The harness
# drives that one process per client, so startup is paid once.
$script:batchTerminator = "__MXGW_BATCH_EOR__"
$script:currentBatchClient = $null
# A redirected child's StandardInput writer is created with Console.InputEncoding,
# which is UTF-8 *with a BOM* on this host. The writer then prepends that BOM to
# the first bytes it sends, and the CLIs parse it into their first argument.
# Switching the console input encoding to a BOM-less encoding before any batch
# process starts makes that writer BOM-free. The e2e command lines are ASCII.
try {
[Console]::InputEncoding = [System.Text.Encoding]::ASCII
} catch {
Write-Warning "Could not set a BOM-less console input encoding: $($_.Exception.Message)"
}
# Derives the `batch`-process launch spec from Get-ClientCommand: the launch
# prefix is whatever precedes the operation token (e.g. `run -p mxgw-cli --`),
# with the operation itself replaced by `batch`.
function Get-BatchLaunchSpec {
param([string]$Client)
$command = Get-ClientCommand -Client $Client -Operation "open-session" -Values @{} -ApiKeyEnvName $ApiKeyEnv
$argList = [object[]]$command.args
$operationIndex = [Array]::IndexOf($argList, "open-session")
if ($operationIndex -lt 0) {
throw "Cannot locate the operation token in the '$Client' command line."
}
$prefix = if ($operationIndex -gt 0) { @($argList[0..($operationIndex - 1)]) } else { @() }
return [pscustomobject]@{
file = $command.file
args = @($prefix + "batch")
cwd = $command.cwd
env = $command.env
}
}
# Returns just the operation arguments (operation token + flags) for a client
# command, stripping the launch prefix — this is the line written to the batch
# process for one operation.
function Get-ClientOperationArgs {
param(
[string]$Client,
[string]$Operation,
[hashtable]$Values,
[string]$ApiKeyEnvName = $ApiKeyEnv
)
$command = Get-ClientCommand -Client $Client -Operation $Operation -Values $Values -ApiKeyEnvName $ApiKeyEnvName
$argList = [object[]]$command.args
$operationIndex = [Array]::IndexOf($argList, $Operation)
if ($operationIndex -le 0) {
return @($argList)
}
return @($argList[$operationIndex..($argList.Count - 1)])
}
# True when a parsed command reply is the CLI's failure envelope rather than a
# normal result. All five CLIs emit a top-level `error` field on failure.
function Test-OperationFailed {
param([object]$Json)
if ($null -eq $Json) {
return $true
}
$errorValue = Get-PropertyValue -Object $Json -Names @("error")
return -not [string]::IsNullOrEmpty([string]$errorValue)
}
# Starts the long-lived `batch` process for a client and returns a handle
# carrying the process and its redirected stdin writer; stdout is read from
# the process object itself.
function Start-BatchClient {
param([string]$Client)
$spec = Get-BatchLaunchSpec -Client $Client
$startInfo = [System.Diagnostics.ProcessStartInfo]::new()
$startInfo.FileName = $spec.file
$startInfo.Arguments = ($spec.args | ForEach-Object { ConvertTo-NativeArgument -Value $_ }) -join " "
$startInfo.WorkingDirectory = $spec.cwd
$startInfo.RedirectStandardInput = $true
$startInfo.RedirectStandardOutput = $true
# stderr is left attached to the console: the CLIs only log diagnostics
# there, and not redirecting it removes any risk of the child blocking on a
# full stderr pipe while the harness reads stdout.
$startInfo.RedirectStandardError = $false
$startInfo.UseShellExecute = $false
foreach ($entry in $spec.env.GetEnumerator()) {
$startInfo.Environment[$entry.Key] = [string]$entry.Value
}
$process = [System.Diagnostics.Process]::new()
$process.StartInfo = $startInfo
[void]$process.Start()
return [pscustomobject]@{ client = $Client; process = $process; input = $process.StandardInput }
}
# Sends one operation to a batch process and returns its raw JSON output text
# (everything written before the terminator line).
function Invoke-BatchOperation {
param(
[pscustomobject]$BatchClient,
[string]$Client,
[string]$Operation,
[hashtable]$Values,
[string]$ApiKeyEnvName
)
$operationArgs = Get-ClientOperationArgs -Client $Client -Operation $Operation `
-Values $Values -ApiKeyEnvName $ApiKeyEnvName
$process = $BatchClient.process
$BatchClient.input.WriteLine(($operationArgs -join " "))
$BatchClient.input.Flush()
$builder = [System.Text.StringBuilder]::new()
while ($true) {
$line = $process.StandardOutput.ReadLine()
if ($null -eq $line) {
throw ("Batch client '$Client' closed its output before terminating operation " +
"'$Operation' (process exited: $($process.HasExited)).")
}
if ($line -eq $script:batchTerminator) {
break
}
[void]$builder.AppendLine($line)
}
return $builder.ToString()
}
# Signals end-of-input to a batch process and waits for it to exit.
function Stop-BatchClient {
param([pscustomobject]$BatchClient)
if ($null -eq $BatchClient) {
return
}
$process = $BatchClient.process
try {
if (-not $process.HasExited) {
$BatchClient.input.Close()
if (-not $process.WaitForExit(15000)) {
$process.Kill($true)
}
}
} catch {
try { $process.Kill($true) } catch { }
} finally {
$process.Dispose()
}
}
function Invoke-ClientOperation {
param(
[string]$Client,
@@ -809,21 +968,27 @@ function Invoke-ClientOperation {
[string]$ApiKeyEnvName = $ApiKeyEnv
)
- $command = Get-ClientCommand -Client $Client -Operation $Operation -Values $Values -ApiKeyEnvName $ApiKeyEnvName
- $result = Invoke-NativeCommand `
- -FilePath $command.file `
- -Arguments $command.args `
- -WorkingDirectory $command.cwd `
- -Environment $command.env
if ($DryRun) {
+ $operationArgs = Get-ClientOperationArgs -Client $Client -Operation $Operation `
+ -Values $Values -ApiKeyEnvName $ApiKeyEnvName
+ Write-Host "[dry-run] (batch:$Client) $($operationArgs -join ' ')"
return Get-DryRunReply -Client $Client -Operation $Operation -Values $Values
}
- return Read-JsonObject -Text $result.stdout
+ $stdout = Invoke-BatchOperation -BatchClient $script:currentBatchClient -Client $Client `
+ -Operation $Operation -Values $Values -ApiKeyEnvName $ApiKeyEnvName
+ $json = Read-JsonObject -Text $stdout
+ if (Test-OperationFailed -Json $json) {
+ $errorValue = Get-PropertyValue -Object $json -Names @("error")
+ throw "$Client $Operation failed: $errorValue"
+ }
+ return $json
}
-# Runs a client operation that is expected to fail, returning the raw process
-# result (exit code + stderr) without throwing. Under -DryRun a synthetic
-# failure is returned so the parity and auth phases can be exercised offline.
+# Runs a client operation that is expected to fail. Returns a record whose
+# `failed` flag is true when the CLI reported its failure envelope. Under
+# -DryRun a synthetic failure is returned so the parity and auth phases can be
+# exercised offline.
function Invoke-ClientOperationExpectingFailure {
param(
[string]$Client,
@@ -833,18 +998,16 @@ function Invoke-ClientOperationExpectingFailure {
)
if ($DryRun) {
- $command = Get-ClientCommand -Client $Client -Operation $Operation -Values $Values -ApiKeyEnvName $ApiKeyEnvName
- Write-Host "[dry-run] $(Join-CommandLine -FilePath $command.file -Arguments $command.args)"
- return [pscustomobject]@{ exitCode = 1; stdout = ""; stderr = "[dry-run] synthetic expected failure" }
+ $operationArgs = Get-ClientOperationArgs -Client $Client -Operation $Operation `
+ -Values $Values -ApiKeyEnvName $ApiKeyEnvName
+ Write-Host "[dry-run] (batch:$Client) $($operationArgs -join ' ')"
+ return [pscustomobject]@{ failed = $true; json = $null }
}
- $command = Get-ClientCommand -Client $Client -Operation $Operation -Values $Values -ApiKeyEnvName $ApiKeyEnvName
- return Invoke-NativeCommand `
- -FilePath $command.file `
- -Arguments $command.args `
- -WorkingDirectory $command.cwd `
- -Environment $command.env `
- -AllowFailure
+ $stdout = Invoke-BatchOperation -BatchClient $script:currentBatchClient -Client $Client `
+ -Operation $Operation -Values $Values -ApiKeyEnvName $ApiKeyEnvName
+ $json = Read-JsonObject -Text $stdout
+ return [pscustomobject]@{ failed = (Test-OperationFailed -Json $json); json = $json }
}
# Connects a short-lived StreamEvents consumer so the gateway empties the worker
@@ -897,6 +1060,10 @@ function Invoke-ClientFlow {
}
try {
if (-not $DryRun) {
$script:currentBatchClient = Start-BatchClient -Client $Client
}
$openJson = Invoke-ClientOperation -Client $Client -Operation "open-session"
$sessionId = Get-OpenSessionId -Client $Client -Json $openJson
if ([string]::IsNullOrWhiteSpace($sessionId)) {
@@ -1138,11 +1305,10 @@ function Invoke-ClientFlow {
foreach ($parityCheck in $parityChecks) {
$parityResult = Invoke-ClientOperationExpectingFailure `
-Client $Client -Operation $parityCheck.operation -Values $parityCheck.values
- $passed = $parityResult.exitCode -ne 0
+ $passed = [bool]$parityResult.failed
$clientResult.parity += [ordered]@{
check = $parityCheck.check
operation = $parityCheck.operation
- exitCode = $parityResult.exitCode
passed = $passed
}
if (-not $passed) {
@@ -1165,10 +1331,9 @@ function Invoke-ClientFlow {
foreach ($authCheck in $authChecks) {
$authResult = Invoke-ClientOperationExpectingFailure `
-Client $Client -Operation "open-session" -ApiKeyEnvName $authCheck.apiKeyEnv
- $passed = $authResult.exitCode -ne 0
+ $passed = [bool]$authResult.failed
$clientResult.auth += [ordered]@{
check = $authCheck.check
- exitCode = $authResult.exitCode
passed = $passed
}
if (-not $passed) {
@@ -1190,6 +1355,11 @@ function Invoke-ClientFlow {
$clientResult.error = "$($clientResult.error) close-session failed: $($_.Exception.Message)"
}
}
if ($null -ne $script:currentBatchClient) {
Stop-BatchClient -BatchClient $script:currentBatchClient
$script:currentBatchClient = $null
}
}
return $clientResult