e2e: build client CLIs once and drain events so dotnet/java pass

The cross-language client e2e matrix failed for dotnet and Java. Both
failures were in the harness, not the client code.

1. Per-call toolchain cold-start. The matrix issues ~250 CLI calls per
   client; it invoked `dotnet run` / `gradle :mxgateway-cli:run` every
   time, rebuilding and cold-starting the toolchain per call. Build each
   CLI once up front (`dotnet build`, `gradle :mxgateway-cli:installDist`)
   and invoke the compiled artifact directly. This alone fixes dotnet.

2. Worker event-channel overflow. The per-tag advise loop advises every
   discovered tag with no StreamEvents consumer attached, so change
   events accumulate in the worker event channel
   (MxGateway:Events:QueueCapacity) until FailFast faults the worker.
   With fix 1, dotnet's faster loop finishes before the channel fills;
   the Java CLI still pays a JVM cold-start on every call and does not.
   Every -DrainEveryTags advised tags (default 15) the loop now connects a
   short StreamEvents drain; the gateway's per-stream producer empties the
   channel the instant a subscriber attaches, so a small bounded read
   suffices.
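
A rough way to see why the periodic drain helps is the toy model below.
It is only a sketch under stated assumptions, not the gateway's real
behavior: advising one tag takes one step, every already-advised tag
enqueues a fixed number of events per step, and a drain empties the
queue completely. `peak_queue_depth` and its parameters are hypothetical
names used for illustration.

```python
def peak_queue_depth(tag_count, events_per_tag_per_step, drain_every_tags):
    """Return the peak queue depth of a simplified advise loop.

    Assumed model (not measured): one step per advised tag, each advised
    tag enqueues events_per_tag_per_step events per step, and a drain
    empties the queue. drain_every_tags == 0 disables the drain.
    """
    depth = 0
    peak = 0
    for advised in range(1, tag_count + 1):
        # Every tag advised so far contributes events during this step.
        depth += advised * events_per_tag_per_step
        peak = max(peak, depth)
        # Periodic drain: empty the queue every drain_every_tags tags.
        if drain_every_tags > 0 and advised % drain_every_tags == 0:
            depth = 0
    return peak
```

In this model the undrained queue grows with the whole run, while the
drained queue is bounded by what accumulates inside one drain window.
Whether that bound stays below MxGateway:Events:QueueCapacity depends on
real event rates, which the model does not capture.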

Full 5-client matrix (dotnet, go, rust, python, java) now passes with
-VerifyWrite against a live gateway.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Joseph Doherty
2026-05-21 05:24:24 -04:00
parent b794c46bc7
commit c1ff8c94e8
2 changed files with 120 additions and 13 deletions
+18 -1
@@ -261,7 +261,14 @@ path and writes a JSON report under `artifacts/e2e/`:
1. **Session + register** — opens one session and registers.
2. **Bulk** — verifies `SubscribeBulk` / `UnsubscribeBulk` on a bounded tag
subset (skip with `-SkipBulk`).
3. **Add-item / advise** — adds and advises every discovered test tag. The
loop has no `StreamEvents` consumer attached, so advised tags accumulate
MXAccess change events in the worker event channel
(`MxGateway:Events:QueueCapacity`); left undrained, the channel overflows and,
under `FailFast` backpressure, faults the worker. Every `-DrainEveryTags`
advised tags (default 15) the loop connects a short-lived `StreamEvents`
drain so the gateway pumps that channel empty. `-DrainEveryTags 0` disables
the drain.
4. **Stream** — asserts a bounded event stream delivers at least one event
(skip with `-SkipStream`).
5. **Parity** — asserts MXAccess error paths are rejected rather than silently
@@ -287,6 +294,16 @@ path and writes a JSON report under `artifacts/e2e/`:
write support (`MxAccessCommandExecutor` returning `InvalidRequest` for
`Write`/`Write2`/`WriteSecured`/`WriteSecured2`).
Before the per-client phases run, the script builds the .NET CLI
(`dotnet build`) and installs the Java CLI (`gradle :mxgateway-cli:installDist`)
once, then invokes the compiled artifacts directly. The matrix issues several
hundred CLI calls per client; invoking `dotnet run` / `gradle
:mxgateway-cli:run` per call rebuilds and cold-starts the toolchain every time,
which stretches the add-item/advise loop long enough for the worker event
channel to overflow under `FailFast` backpressure. The Go, Rust, and Python
clients still build on demand (`go run` / `cargo run` / `python -m`) because
their per-call startup is already sub-second.
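
The build-once pattern described above can be sketched as follows. This is an illustrative Python sketch, not the e2e script itself: `BuiltCli`, `count_toolchain_runs`, the `./built-cli` artifact path, and the `advise` argument are hypothetical names, and the runner is injectable so the example can be exercised without a real toolchain.

```python
import subprocess


class BuiltCli:
    """Build a CLI once, then run the compiled artifact for every call."""

    def __init__(self, build_cmd, artifact_cmd, runner=subprocess.run):
        self._build_cmd = list(build_cmd)        # e.g. ["dotnet", "build"]
        self._artifact_cmd = list(artifact_cmd)  # hypothetical path to the built CLI
        self._runner = runner                    # injectable for testing
        self._built = False

    def invoke(self, *args):
        # Pay the build cost on the first call only.
        if not self._built:
            self._runner(self._build_cmd, check=True)
            self._built = True
        # Every call, including the first, runs the compiled artifact directly.
        return self._runner([*self._artifact_cmd, *args], check=True)


def count_toolchain_runs(call_count):
    """Return (build runs, artifact runs) for call_count calls via a fake runner."""
    log = []
    cli = BuiltCli(["dotnet", "build"], ["./built-cli"],
                   runner=lambda cmd, check: log.append(cmd[0]))
    for _ in range(call_count):
        cli.invoke("advise")  # "advise" is a placeholder argument
    return log.count("dotnet"), log.count("./built-cli")
```

With this shape, the toolchain runs once however many CLI calls follow, whereas a `dotnet run` per call would run it once per call.
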
Build the gateway and worker, start the gateway, and provide a valid API key
before running the client e2e script: