7db4bffa30
Rust's debug profile costs the bench heavily against release: release is ~45% faster on solo throughput (267 vs 184 calls/sec) and has ~3x lower p99 latency (5.7 vs 16 ms). Debug disables inlining, runs overflow checks on every arithmetic op, keeps Future state machines un-collapsed, and leaves every Vec allocation path unoptimized.

The other clients in the matrix don't see this gap: Go always builds optimized, Python is interpreted, and the JIT-tiered runtimes (HotSpot for Java, CoreCLR Tier 1 for .NET) close most of the gap during the warmup window.

The driver now requests `cargo run --release` for Rust and `dotnet run -c Release --no-build` for .NET, so the two clients with separate debug and release build configurations race under their production-equivalent profiles. Callers must run `cargo build --release -p mxgw-cli` and `dotnet build ... -c Release` once before running the bench; pre-building (plus `--no-build` on the .NET side) then keeps each measurement window free of compilation overhead.

Live re-run (5-way concurrent, 30s, bulkSize 6) after the switch:

  rust:   145.35 calls/sec (was 123.26 in debug; 18% gain under contention)
  go:     185.59 calls/sec
  java:   171.80 calls/sec
  dotnet: 172.31 calls/sec
  python: 140.52 calls/sec

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
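
The ratios quoted in this message can be rechecked from the raw figures. The following is a minimal sketch in Python; the constant and function names are illustrative only and are not part of the bench driver.

```python
# Figures copied from this commit message (calls/sec, p99 in ms).
# All names below are hypothetical; nothing here comes from the driver.
SOLO_RELEASE_CPS = 267.0
SOLO_DEBUG_CPS = 184.0
P99_RELEASE_MS = 5.7
P99_DEBUG_MS = 16.0
CONTENDED_RELEASE_CPS = 145.35
CONTENDED_DEBUG_CPS = 123.26


def pct_gain(new: float, old: float) -> float:
    """Percentage by which `new` exceeds `old`."""
    return (new / old - 1.0) * 100.0


def pct_loss(new: float, old: float) -> float:
    """Percentage by which `new` falls short of `old`."""
    return (1.0 - new / old) * 100.0


# Release measured against debug (how much faster release is).
solo_gain = pct_gain(SOLO_RELEASE_CPS, SOLO_DEBUG_CPS)
# Debug measured against release (how much throughput debug gives up).
solo_loss = pct_loss(SOLO_DEBUG_CPS, SOLO_RELEASE_CPS)
# p99 latency ratio, debug over release.
p99_ratio = P99_DEBUG_MS / P99_RELEASE_MS
# Gain from the switch in the 5-way concurrent run.
contended_gain = pct_gain(CONTENDED_RELEASE_CPS, CONTENDED_DEBUG_CPS)

if __name__ == "__main__":
    print(f"solo: release +{solo_gain:.1f}% over debug")
    print(f"solo: debug -{solo_loss:.1f}% under release")
    print(f"p99: debug is {p99_ratio:.1f}x release")
    print(f"contended: release +{contended_gain:.1f}% over debug")
```

Note that the two solo percentages differ because they use different baselines: release is about 45% faster than debug, which is the same as debug giving up about 31% of release throughput.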