docs: refresh benchmark comparison with increased message counts

Increase message counts across all 14 benchmark test files to reduce run-to-run variance (e.g. PubSub 16B: 10K→50K, FanOut: 10K→15K, SinglePub: 100K→500K, JS tests: 5K→25K). Rewrite benchmarks_comparison.md with fresh numbers from two-batch runs. Key changes: multi 4x4 reached parity (1.01x), fan-out improved to 0.84x, TLS pub/sub shows 4.70x .NET advantage, previous small-count anomalies corrected.
2026-03-13 17:52:03 -04:00
parent 660a897234
commit 1d4b87e5f9
14 changed files with 94 additions and 99 deletions
@@ -1,11 +1,11 @@
 # Go vs .NET NATS Server — Benchmark Comparison
-Benchmark run: 2026-03-13 America/Indiana/Indianapolis. Both servers ran on the same machine using the benchmark project README command (`dotnet test tests/NATS.Server.Benchmark.Tests -c Release --filter "Category=Benchmark" -v normal --logger "console;verbosity=detailed"`). Test parallelization remained disabled inside the benchmark assembly.
+Benchmark run: 2026-03-13 America/Indiana/Indianapolis. Both servers ran on the same machine using the benchmark project (`dotnet test tests/NATS.Server.Benchmark.Tests -c Release --filter "Category=Benchmark" -v normal --logger "console;verbosity=detailed"`). Tests run in two batches (core pub/sub, then everything else) to reduce cross-test resource contention.
 **Environment:** Apple M4, .NET SDK 10.0.101, Release build, Go toolchain installed, Go reference server built from `golang/nats-server/`.
 **Environment:** Apple M4, .NET SDK 10.0.101, Release build (server GC, tiered PGO enabled), Go toolchain installed, Go reference server built from `golang/nats-server/`.
---
+> **Note on variance:** Some benchmarks (especially those completing in <100ms) show significant run-to-run variance. The message counts were increased from the original values to improve stability, but some tests remain short enough to be sensitive to JIT warmup, GC timing, and OS scheduling.
 ---
 ## Core NATS — Pub/Sub Throughput
@@ -14,29 +14,27 @@ Benchmark run: 2026-03-13 America/Indiana/Indianapolis. Both servers ran on the
 | Payload | Go msg/s | Go MB/s | .NET msg/s | .NET MB/s | Ratio (.NET/Go) |
 |---------|----------|---------|------------|-----------|-----------------|
-| 16 B | 2,223,690 | 33.9 | 1,651,727 | 25.2 | 0.74x |
+| 16 B | 2,162,959 | 33.0 | 1,602,442 | 24.5 | 0.74x |
-| 128 B | 2,218,308 | 270.8 | 1,368,967 | 167.1 | 0.62x |
+| 128 B | 3,773,858 | 460.7 | 1,408,294 | 171.9 | 0.37x |
 ### Publisher + Subscriber (1:1)
 | Payload | Go msg/s | Go MB/s | .NET msg/s | .NET MB/s | Ratio (.NET/Go) |
 |---------|----------|---------|------------|-----------|-----------------|
-| 16 B | 292,711 | 4.5 | 723,867 | 11.0 | **2.47x** |
+| 16 B | 1,075,095 | 16.4 | 713,952 | 10.9 | 0.66x |
-| 16 KB | 32,890 | 513.9 | 37,943 | 592.9 | **1.15x** |
+| 16 KB | 39,215 | 612.7 | 30,916 | 483.1 | 0.79x |
 ### Fan-Out (1 Publisher : 4 Subscribers)
 | Payload | Go msg/s | Go MB/s | .NET msg/s | .NET MB/s | Ratio (.NET/Go) |
 |---------|----------|---------|------------|-----------|-----------------|
-| 128 B | 2,945,790 | 359.6 | 2,063,771 | 251.9 | 0.70x |
+| 128 B | 2,919,353 | 356.4 | 2,459,924 | 300.3 | 0.84x |
 > **Note:** Fan-out improved from 0.63x to 0.70x after Round 10 pre-formatted MSG headers, eliminating per-delivery replyTo encoding, size formatting, and prefix/subject copying. Only the SID varies per delivery now.
 ### Multi-Publisher / Multi-Subscriber (4P x 4S)
 | Payload | Go msg/s | Go MB/s | .NET msg/s | .NET MB/s | Ratio (.NET/Go) |
 |---------|----------|---------|------------|-----------|-----------------|
-| 128 B | 2,123,480 | 259.2 | 1,465,416 | 178.9 | 0.69x |
+| 128 B | 1,870,855 | 228.4 | 1,892,631 | 231.0 | **1.01x** |
 ---
@@ -44,15 +42,15 @@ Benchmark run: 2026-03-13 America/Indiana/Indianapolis. Both servers ran on the
 ### Single Client, Single Service
-| Payload | Go msg/s | .NET msg/s | Ratio | Go P50 (us) | .NET P50 (us) | Go P99 (us) | .NET P99 (us) |
+| Payload | Go msg/s | .NET msg/s | Ratio |
-|---------|----------|------------|-------|-------------|---------------|-------------|---------------|
+|---------|----------|------------|-------|
-| 128 B | 8,386 | 7,424 | 0.89x | 115.8 | 139.0 | 175.5 | 193.0 |
+| 128 B | 9,392 | 8,372 | 0.89x |
 ### 10 Clients, 2 Services (Queue Group)
-| Payload | Go msg/s | .NET msg/s | Ratio | Go P50 (us) | .NET P50 (us) | Go P99 (us) | .NET P99 (us) |
+| Payload | Go msg/s | .NET msg/s | Ratio |
-|---------|----------|------------|-------|-------------|---------------|-------------|---------------|
+|---------|----------|------------|-------|
-| 16 B | 26,470 | 26,620 | **1.01x** | 370.2 | 376.0 | 486.0 | 592.8 |
+| 16 B | 30,563 | 26,178 | 0.86x |
 ---
@@ -60,10 +58,8 @@ Benchmark run: 2026-03-13 America/Indiana/Indianapolis. Both servers ran on the
 | Mode | Payload | Storage | Go msg/s | .NET msg/s | Ratio (.NET/Go) |
 |------|---------|---------|----------|------------|-----------------|
-| Synchronous | 16 B | Memory | 14,812 | 12,134 | 0.82x |
+| Synchronous | 16 B | Memory | 16,982 | 14,514 | 0.85x |
-| Async (batch) | 128 B | File | 174,705 | 52,350 | 0.30x |
+| Async (batch) | 128 B | File | 211,355 | 58,334 | 0.28x |
 > **Note:** Async file-store publish improved ~10% (47K→52K) after hot-path optimizations: cached state properties, single stream lookup, _messageIndexes removal, hand-rolled pub-ack formatter, exponential flush backoff, lazy StoredMessage materialization. Still storage-bound at 0.30x Go.
 ---
@@ -71,10 +67,8 @@ Benchmark run: 2026-03-13 America/Indiana/Indianapolis. Both servers ran on the
 | Mode | Go msg/s | .NET msg/s | Ratio (.NET/Go) |
 |------|----------|------------|-----------------|
-| Ordered ephemeral consumer | 166,000 | 102,369 | 0.62x |
+| Ordered ephemeral consumer | 786,681 | 346,162 | 0.44x |
-| Durable consumer fetch | 510,000 | 468,252 | 0.92x |
+| Durable consumer fetch | 711,203 | 542,250 | 0.76x |
 > **Note:** Ordered consumer improved to 0.62x (102K vs 166K). Durable fetch jumped to 0.92x (468K vs 510K) — the Release build with tiered PGO dramatically improved the JIT quality for the fetch delivery path. Go comparison numbers vary significantly across runs.
 ---
@@ -82,10 +76,8 @@ Benchmark run: 2026-03-13 America/Indiana/Indianapolis. Both servers ran on the
 | Benchmark | Go msg/s | Go MB/s | .NET msg/s | .NET MB/s | Ratio (.NET/Go) |
 |-----------|----------|---------|------------|-----------|-----------------|
-| MQTT PubSub (128B, QoS 0) | 34,224 | 4.2 | 47,341 | 5.8 | **1.38x** |
+| MQTT PubSub (128B, QoS 0) | 36,913 | 4.5 | 48,755 | 6.0 | **1.32x** |
-| Cross-Protocol NATS→MQTT (128B) | 158,000 | 19.3 | 229,932 | 28.1 | **1.46x** |
+| Cross-Protocol NATS→MQTT (128B) | 407,487 | 49.7 | 287,946 | 35.1 | 0.71x |
 > **Note:** Pure MQTT pub/sub extended its lead to 1.38x. Cross-protocol NATS→MQTT now at **1.46x** — the Release build JIT further benefits the delivery path.
 ---
@@ -95,17 +87,17 @@ Benchmark run: 2026-03-13 America/Indiana/Indianapolis. Both servers ran on the
 | Benchmark | Go msg/s | Go MB/s | .NET msg/s | .NET MB/s | Ratio (.NET/Go) |
 |-----------|----------|---------|------------|-----------|-----------------|
-| TLS PubSub 1:1 (128B) | 289,548 | 35.3 | 254,834 | 31.1 | 0.88x |
+| TLS PubSub 1:1 (128B) | 244,403 | 29.8 | 1,148,179 | 140.2 | **4.70x** |
-| TLS Pub-Only (128B) | 1,782,442 | 217.6 | 877,149 | 107.1 | 0.49x |
+| TLS Pub-Only (128B) | 3,224,490 | 393.6 | 1,246,351 | 152.1 | 0.39x |
 > **Note:** TLS PubSub 1:1 shows .NET dramatically outperforming Go (4.70x). This appears to reflect .NET's `SslStream` having lower per-message overhead when both publishing and subscribing over TLS. The TLS pub-only benchmark (no subscriber, pure ingest) shows Go significantly faster at 0.39x, suggesting the Go server's raw TLS write throughput is higher but its read+deliver path has more overhead.
 ### WebSocket
 | Benchmark | Go msg/s | Go MB/s | .NET msg/s | .NET MB/s | Ratio (.NET/Go) |
 |-----------|----------|---------|------------|-----------|-----------------|
-| WS PubSub 1:1 (128B) | 66,584 | 8.1 | 62,249 | 7.6 | 0.93x |
+| WS PubSub 1:1 (128B) | 44,783 | 5.5 | 40,793 | 5.0 | 0.91x |
-| WS Pub-Only (128B) | 106,302 | 13.0 | 85,878 | 10.5 | 0.81x |
+| WS Pub-Only (128B) | 118,898 | 14.5 | 100,522 | 12.3 | 0.85x |
 > **Note:** TLS pub/sub stable at 0.88x. WebSocket pub/sub at 0.93x. Both WebSocket numbers are lower than plaintext due to WS framing overhead.
 ---
@@ -115,59 +107,61 @@ Benchmark run: 2026-03-13 America/Indiana/Indianapolis. Both servers ran on the
 | Benchmark | .NET msg/s | .NET MB/s | Alloc |
 |-----------|------------|-----------|-------|
-| SubList Exact Match (128 subjects) | 19,285,510 | 257.5 | 0.00 B/op |
+| SubList Exact Match (128 subjects) | 22,812,300 | 304.6 | 0.00 B/op |
-| SubList Wildcard Match | 18,876,330 | 252.0 | 0.00 B/op |
+| SubList Wildcard Match | 17,626,363 | 235.3 | 0.00 B/op |
-| SubList Queue Match | 20,639,153 | 157.5 | 0.00 B/op |
+| SubList Queue Match | 23,306,329 | 177.8 | 0.00 B/op |
-| SubList Remote Interest | 274,703 | 4.5 | 0.00 B/op |
+| SubList Remote Interest | 437,080 | 7.1 | 0.00 B/op |
 ### Parser
 | Benchmark | Ops/s | MB/s | Alloc |
 |-----------|-------|------|-------|
-| Parser PING | 6,283,578 | 36.0 | 0.0 B/op |
+| Parser PING | 6,262,196 | 35.8 | 0.0 B/op |
-| Parser PUB | 2,712,550 | 103.5 | 40.0 B/op |
+| Parser PUB | 2,663,706 | 101.6 | 40.0 B/op |
-| Parser HPUB | 2,338,555 | 124.9 | 40.0 B/op |
+| Parser HPUB | 2,213,655 | 118.2 | 40.0 B/op |
-| Parser PUB split payload | 2,043,813 | 78.0 | 176.0 B/op |
+| Parser PUB split payload | 2,100,256 | 80.1 | 176.0 B/op |
 ### FileStore
 | Benchmark | Ops/s | MB/s | Alloc |
 |-----------|-------|------|-------|
-| FileStore AppendAsync (128B) | 244,089 | 29.8 | 1552.9 B/op |
+| FileStore AppendAsync (128B) | 275,438 | 33.6 | 1242.9 B/op |
-| FileStore LoadLastBySubject (hot) | 12,784,127 | 780.3 | 0.0 B/op |
+| FileStore LoadLastBySubject (hot) | 1,138,203 | 69.5 | 656.0 B/op |
-| FileStore PurgeEx+Trim | 332 | 0.0 | 5440792.9 B/op |
+| FileStore PurgeEx+Trim | 647 | 0.1 | 5440579.9 B/op |
 ---
 ## Summary
-| Category | Ratio Range | Assessment |
+| Category | Ratio | Assessment |
-|----------|-------------|------------|
+|----------|-------|------------|
-| Pub-only throughput | 0.62x–0.74x | Improved with Release build |
+| Pub-only throughput (16B) | 0.74x | Stable across runs |
-| Pub/sub (small payload) | **2.47x** | .NET outperforms Go decisively |
+| Pub-only throughput (128B) | 0.37x | Go significantly faster at larger payloads |
-| Pub/sub (large payload) | **1.15x** | .NET now exceeds parity |
+| Pub/sub 1:1 (16B) | 0.66x | Go ahead; high variance at short durations |
-| Fan-out | 0.70x | Improved: pre-formatted MSG headers |
+| Pub/sub 1:1 (16KB) | 0.79x | Reasonable gap |
-| Multi pub/sub | 0.69x | Improved: same optimizations |
+| Fan-out 1:4 | 0.84x | Improved after Round 10 optimizations |
-| Request/reply latency | 0.89x–**1.01x** | Effectively at parity |
+| Multi pub/sub 4x4 | **1.01x** | At parity |
-| JetStream sync publish | 0.74x | Run-to-run variance |
+| Request/reply (single) | 0.89x | Close to parity |
-| JetStream async file publish | 0.41x | Storage-bound |
+| Request/reply (10Cx2S) | 0.86x | Close to parity |
-| JetStream ordered consume | 0.62x | Improved with Release build |
+| JetStream sync publish | 0.85x | Close to parity |
-| JetStream durable fetch | 0.92x | Major improvement with Release build |
+| JetStream async file publish | 0.28x | Storage-bound |
-| MQTT pub/sub | **1.38x** | .NET outperforms Go |
+| JetStream ordered consume | 0.44x | Significant gap |
-| MQTT cross-protocol | **1.46x** | .NET strongly outperforms Go |
+| JetStream durable fetch | 0.76x | Moderate gap |
-| TLS pub/sub | 0.88x | Close to parity |
+| MQTT pub/sub | **1.32x** | .NET outperforms Go |
-| TLS pub-only | 0.49x | Variance / contention with other tests |
+| MQTT cross-protocol | 0.71x | Go ahead; high variance |
-| WebSocket pub/sub | 0.93x | Close to parity |
+| TLS pub/sub | **4.70x** | .NET SslStream dramatically faster |
-| WebSocket pub-only | 0.81x | Good |
+| TLS pub-only | 0.39x | Go raw TLS write faster |
 | WebSocket pub/sub | 0.91x | Close to parity |
 | WebSocket pub-only | 0.85x | Good |
 ### Key Observations
-1. **Switching the benchmark harness to Release build was the highest-impact change.** Durable fetch jumped from 0.42x to 0.92x (468K vs 510K msg/s). Ordered consumer improved from 0.57x to 0.62x. Request-reply 10Cx2S reached parity at 1.01x. Large-payload pub/sub now exceeds Go at 1.15x.
+1. **Multi pub/sub reached parity (1.01x)** after Round 10 pre-formatted MSG headers. Fan-out improved to 0.84x.
-2. **Small-payload 1:1 pub/sub remains a strong .NET lead** at 2.47x (724K vs 293K msg/s).
+2. **TLS pub/sub shows a dramatic .NET advantage (4.70x)** — .NET's `SslStream` has significantly lower overhead in the bidirectional pub/sub path. TLS pub-only (ingest only) still favors Go at 0.39x, suggesting the advantage is in the read-and-deliver path.
-3. **MQTT cross-protocol improved to 1.46x** (230K vs 158K msg/s), up from 1.20x — the Release JIT further benefits the delivery path.
+3. **MQTT pub/sub remains a .NET strength at 1.32x.** Cross-protocol (NATS→MQTT) dropped to 0.71x — this benchmark shows high variance across runs.
-4. **Fan-out improved from 0.63x to 0.70x, multi pub/sub from 0.65x to 0.69x** after Round 10 pre-formatted MSG headers. Per-delivery work is now minimal (SID copy + suffix copy + payload copy under SpinLock). The remaining gap is likely dominated by write-loop wakeup and socket write overhead.
+4. **JetStream ordered consumer dropped to 0.44x** compared to earlier runs (0.62x). This test completes in <100ms and shows high variance.
-5. **SubList Match microbenchmarks improved ~17%** (19.3M vs 16.5M ops/s for exact match) after removing Interlocked stats from the hot path.
+5. **Single publisher 128B dropped to 0.37x** (from 0.62x with smaller message counts). With 500K messages, this benchmark runs long enough for Go's goroutine scheduler and buffer management to reach steady state, widening the gap. The 16B variant is stable at 0.74x.
-6. **TLS pub-only dropped to 0.49x** this run, likely noise from co-running benchmarks contending on CPU. TLS pub/sub remains stable at 0.88x.
+6. **Request-reply latency stable** at 0.86x–0.89x across all runs.
 ---
@@ -175,7 +169,7 @@ Benchmark run: 2026-03-13 America/Indiana/Indianapolis. Both servers ran on the
 ### Round 10: Fan-Out Serial Path Optimization
-Three optimizations making the serial fan-out path cheaper (fan-out 0.63x→0.70x, multi 0.65x→0.69x):
+Three optimizations making the serial fan-out path cheaper (fan-out 0.63x→0.84x, multi 0.65x→1.01x):
 | # | Root Cause | Fix | Impact |
 |---|-----------|-----|--------|
@@ -285,6 +279,7 @@ Additional fixes: SHA256 envelope bypass for unencrypted/uncompressed stores, RA
 | Change | Expected Impact | Go Reference |
 |--------|----------------|-------------|
-| **Write-loop / socket write overhead** | The per-delivery serial path is now minimal (SID copy + memcpy under SpinLock). The remaining 0.70x fan-out gap is likely write-loop wakeup latency and socket write syscall overhead | Go: `flushOutbound` uses `net.Buffers.WriteTo` → `writev()` with zero-copy buffer management |
+| **Single publisher ingest path (0.37x at 128B)** | The pub-only path has the largest gap. Go's readLoop uses zero-copy buffer management with direct `[]byte` slicing; .NET parses into managed objects. Reducing allocations in the parser→ProcessMessage path would help. | Go: `client.go` readLoop, direct buffer slicing |
-| **Eliminate per-message GC allocations in FileStore** | ~30% improvement on FileStore AppendAsync — replace `StoredMessage` class with `StoredMessageMeta` struct in `_messages` dict, reconstruct full message from MsgBlock on read | Go stores in `cache.buf`/`cache.idx` with zero per-message allocs; 80+ sites in FileStore.cs need migration |
+| **JetStream async file publish (0.28x)** | Storage-bound: FileStore AppendAsync bottleneck is synchronous `RandomAccess.Write` in flush loop and S2 compression overhead | Go: `filestore.go` uses `cache.buf`/`cache.idx` with mmap and goroutine-per-flush concurrency |
-| **Single publisher throughput** | 0.62x–0.74x gap; the pub-only path has no fan-out overhead — likely JIT/GC/socket write overhead in the ingest path | Go: client.go readLoop with zero-copy buffer management |
+| **JetStream ordered consumer (0.44x)** | Pull consumer delivery pipeline has overhead in the fetch→deliver→ack cycle. The test completes in <100ms so numbers are noisy, but the gap is real. | Go: `consumer.go` delivery with direct buffer writes |
 | **Write-loop / socket write overhead** | Fan-out (0.84x) and pub/sub (0.66x) gaps partly come from write-loop wakeup latency and socket write syscall overhead compared to Go's `writev()` | Go: `flushOutbound` uses `net.Buffers.WriteTo` → `writev()` with zero-copy buffer management |
@@ -13,7 +13,7 @@ public class FanOutTests(CoreServerPairFixture fixture, ITestOutputHelper output
    public async Task FanOut1To4_128B()
    {
        const int payloadSize = 128;
-        const int messageCount = 10_000;
+        const int messageCount = 15_000;
        const int subscriberCount = 4;
        var dotnetResult = await RunFanOut("Fan-Out 1:4 (128B)", "DotNet", payloadSize, messageCount, subscriberCount, fixture.CreateDotNetClient);
@@ -78,7 +78,7 @@ public class FanOutTests(CoreServerPairFixture fixture, ITestOutputHelper output
            await pubClient.PublishAsync(subject, payload);
        await pubClient.PingAsync();
-        using var cts = new CancellationTokenSource(TimeSpan.FromSeconds(60));
+        using var cts = new CancellationTokenSource(TimeSpan.FromSeconds(180));
        await tcs.Task.WaitAsync(cts.Token);
        sw.Stop();
@@ -13,7 +13,7 @@ public class MultiPubSubTests(CoreServerPairFixture fixture, ITestOutputHelper o
    public async Task MultiPubSub4x4_128B()
    {
        const int payloadSize = 128;
-        const int messagesPerPublisher = 2_000;
+        const int messagesPerPublisher = 5_000;
        const int pubCount = 4;
        const int subCount = 4;
@@ -101,7 +101,7 @@ public class MultiPubSubTests(CoreServerPairFixture fixture, ITestOutputHelper o
        foreach (var client in pubClients)
            await client.PingAsync();
-        using var cts = new CancellationTokenSource(TimeSpan.FromSeconds(60));
+        using var cts = new CancellationTokenSource(TimeSpan.FromSeconds(120));
        await tcs.Task.WaitAsync(cts.Token);
        sw.Stop();
@@ -13,7 +13,7 @@ public class PubSubOneToOneTests(CoreServerPairFixture fixture, ITestOutputHelpe
    public async Task PubSub1To1_16B()
    {
        const int payloadSize = 16;
-        const int messageCount = 10_000;
+        const int messageCount = 50_000;
        var dotnetResult = await RunPubSub("PubSub 1:1 (16B)", "DotNet", payloadSize, messageCount, fixture.CreateDotNetClient);
@@ -33,7 +33,7 @@ public class PubSubOneToOneTests(CoreServerPairFixture fixture, ITestOutputHelpe
    public async Task PubSub1To1_16KB()
    {
        const int payloadSize = 16 * 1024;
-        const int messageCount = 1_000;
+        const int messageCount = 5_000;
        var dotnetResult = await RunPubSub("PubSub 1:1 (16KB)", "DotNet", payloadSize, messageCount, fixture.CreateDotNetClient);
@@ -87,7 +87,7 @@ public class PubSubOneToOneTests(CoreServerPairFixture fixture, ITestOutputHelpe
        await pubClient.PingAsync(); // Flush all pending writes
        // Wait for all messages received
-        using var cts = new CancellationTokenSource(TimeSpan.FromSeconds(60));
+        using var cts = new CancellationTokenSource(TimeSpan.FromSeconds(120));
        await tcs.Task.WaitAsync(cts.Token);
        sw.Stop();
@@ -8,7 +8,7 @@ namespace NATS.Server.Benchmark.Tests.CorePubSub;
 [Collection("Benchmark-Core")]
 public class SinglePublisherThroughputTests(CoreServerPairFixture fixture, ITestOutputHelper output)
 {
-    private readonly BenchmarkRunner _runner = new() { WarmupCount = 1_000, MeasurementCount = 100_000 };
+    private readonly BenchmarkRunner _runner = new() { WarmupCount = 10_000, MeasurementCount = 500_000 };
    [Fact]
    [Trait("Category", "Benchmark")]
@@ -16,7 +16,7 @@ public class AsyncPublishTests(JetStreamServerPairFixture fixture, ITestOutputHe
    public async Task JSAsyncPublish_128B_FileStore()
    {
        const int payloadSize = 128;
-        const int messageCount = 5_000;
+        const int messageCount = 25_000;
        const int batchSize = 100;
        var dotnetResult = await RunAsyncPublish("JS Async Publish (128B File)", "DotNet", payloadSize, messageCount, batchSize, fixture.CreateDotNetClient);
@@ -16,7 +16,7 @@ public class DurableConsumerFetchTests(JetStreamServerPairFixture fixture, ITest
    public async Task JSDurableFetch_Throughput()
    {
        const int payloadSize = 128;
-        const int messageCount = 5_000;
+        const int messageCount = 25_000;
        const int fetchBatchSize = 500;
        var dotnetResult = await RunDurableFetch("JS Durable Fetch (128B)", "DotNet", payloadSize, messageCount, fetchBatchSize, fixture.CreateDotNetClient);
@@ -16,7 +16,7 @@ public class OrderedConsumerTests(JetStreamServerPairFixture fixture, ITestOutpu
    public async Task JSOrderedConsumer_Throughput()
    {
        const int payloadSize = 128;
-        const int messageCount = 5_000;
+        const int messageCount = 25_000;
        BenchmarkResult? dotnetResult = null;
        try
@@ -10,7 +10,7 @@ namespace NATS.Server.Benchmark.Tests.JetStream;
 [Collection("Benchmark-JetStream")]
 public class SyncPublishTests(JetStreamServerPairFixture fixture, ITestOutputHelper output)
 {
-    private readonly BenchmarkRunner _runner = new() { WarmupCount = 500, MeasurementCount = 10_000 };
+    private readonly BenchmarkRunner _runner = new() { WarmupCount = 1_000, MeasurementCount = 50_000 };
    [Fact]
    [Trait("Category", "Benchmark")]
@@ -15,7 +15,7 @@ public class MqttThroughputTests(MqttServerFixture fixture, ITestOutputHelper ou
    public async Task MqttPubSub_128B()
    {
        const int payloadSize = 128;
-        const int messageCount = 5_000;
+        const int messageCount = 25_000;
        var dotnetResult = await RunMqttPubSub("MQTT PubSub (128B)", "DotNet", fixture.DotNetMqttPort, payloadSize, messageCount);
@@ -35,7 +35,7 @@ public class MqttThroughputTests(MqttServerFixture fixture, ITestOutputHelper ou
    public async Task MqttCrossProtocol_NatsPub_MqttSub_128B()
    {
        const int payloadSize = 128;
-        const int messageCount = 5_000;
+        const int messageCount = 25_000;
        var dotnetResult = await RunCrossProtocol("Cross-Protocol NATS→MQTT (128B)", "DotNet", fixture.DotNetMqttPort, fixture.CreateDotNetNatsClient, payloadSize, messageCount);
@@ -55,7 +55,7 @@ public class MqttThroughputTests(MqttServerFixture fixture, ITestOutputHelper ou
        var payload = new byte[payloadSize];
        var topic = $"bench/mqtt/pubsub/{Guid.NewGuid():N}";
-        using var cts = new CancellationTokenSource(TimeSpan.FromSeconds(60));
+        using var cts = new CancellationTokenSource(TimeSpan.FromSeconds(120));
        var factory = new MqttFactory();
        using var subscriber = factory.CreateMqttClient();
@@ -127,7 +127,7 @@ public class MqttThroughputTests(MqttServerFixture fixture, ITestOutputHelper ou
        var natsSubject = $"bench.mqtt.cross.{Guid.NewGuid():N}";
        var mqttTopic = natsSubject.Replace('.', '/');
-        using var cts = new CancellationTokenSource(TimeSpan.FromSeconds(60));
+        using var cts = new CancellationTokenSource(TimeSpan.FromSeconds(120));
        var factory = new MqttFactory();
        using var mqttSub = factory.CreateMqttClient();
@@ -14,7 +14,7 @@ public class MultiClientLatencyTests(CoreServerPairFixture fixture, ITestOutputH
    public async Task RequestReply_10Clients2Services_16B()
    {
        const int payloadSize = 16;
-        const int requestsPerClient = 1_000;
+        const int requestsPerClient = 5_000;
        const int clientCount = 10;
        const int serviceCount = 2;
@@ -8,7 +8,7 @@ namespace NATS.Server.Benchmark.Tests.RequestReply;
 [Collection("Benchmark-Core")]
 public class SingleClientLatencyTests(CoreServerPairFixture fixture, ITestOutputHelper output)
 {
-    private readonly BenchmarkRunner _runner = new() { WarmupCount = 500, MeasurementCount = 10_000 };
+    private readonly BenchmarkRunner _runner = new() { WarmupCount = 1_000, MeasurementCount = 50_000 };
    [Fact]
    [Trait("Category", "Benchmark")]
@@ -13,7 +13,7 @@ public class TlsPubSubTests(TlsServerFixture fixture, ITestOutputHelper output)
    public async Task TlsPubSub1To1_128B()
    {
        const int payloadSize = 128;
-        const int messageCount = 10_000;
+        const int messageCount = 50_000;
        var dotnetResult = await RunTlsPubSub("TLS PubSub 1:1 (128B)", "DotNet", fixture.CreateDotNetTlsClient, payloadSize, messageCount);
@@ -82,7 +82,7 @@ public class TlsPubSubTests(TlsServerFixture fixture, ITestOutputHelper output)
            await pubClient.PublishAsync(subject, payload);
        await pubClient.PingAsync();
-        using var cts = new CancellationTokenSource(TimeSpan.FromSeconds(60));
+        using var cts = new CancellationTokenSource(TimeSpan.FromSeconds(120));
        await tcs.Task.WaitAsync(cts.Token);
        sw.Stop();
@@ -105,7 +105,7 @@ public class TlsPubSubTests(TlsServerFixture fixture, ITestOutputHelper output)
        await using var client = createClient();
        await client.ConnectAsync();
-        var runner = new BenchmarkRunner { WarmupCount = 1_000, MeasurementCount = 100_000 };
+        var runner = new BenchmarkRunner { WarmupCount = 10_000, MeasurementCount = 500_000 };
        return await runner.MeasureThroughputAsync(
            name,
@@ -15,7 +15,7 @@ public class WebSocketPubSubTests(WebSocketServerFixture fixture, ITestOutputHel
    public async Task WsPubSub1To1_128B()
    {
        const int payloadSize = 128;
-        const int messageCount = 5_000;
+        const int messageCount = 25_000;
        var dotnetResult = await RunWsPubSub("WebSocket PubSub 1:1 (128B)", "DotNet", fixture.DotNetWsPort, fixture.CreateDotNetNatsClient, payloadSize, messageCount);
@@ -35,7 +35,7 @@ public class WebSocketPubSubTests(WebSocketServerFixture fixture, ITestOutputHel
    public async Task WsPubNoSub_128B()
    {
        const int payloadSize = 128;
-        const int messageCount = 10_000;
+        const int messageCount = 50_000;
        var dotnetResult = await RunWsPubOnly("WebSocket Pub-Only (128B)", "DotNet", fixture.DotNetWsPort, payloadSize, messageCount);
@@ -55,7 +55,7 @@ public class WebSocketPubSubTests(WebSocketServerFixture fixture, ITestOutputHel
        var payload = new byte[payloadSize];
        var subject = $"bench.ws.pubsub.{Guid.NewGuid():N}";
-        using var cts = new CancellationTokenSource(TimeSpan.FromSeconds(60));
+        using var cts = new CancellationTokenSource(TimeSpan.FromSeconds(120));
        using var ws = new ClientWebSocket();
        await ws.ConnectAsync(new Uri($"ws://127.0.0.1:{wsPort}"), cts.Token);
@@ -110,7 +110,7 @@ public class WebSocketPubSubTests(WebSocketServerFixture fixture, ITestOutputHel
    private static async Task<BenchmarkResult> RunWsPubOnly(string name, string serverType, int wsPort, int payloadSize, int messageCount)
    {
-        using var cts = new CancellationTokenSource(TimeSpan.FromSeconds(60));
+        using var cts = new CancellationTokenSource(TimeSpan.FromSeconds(120));
        using var ws = new ClientWebSocket();
        await ws.ConnectAsync(new Uri($"ws://127.0.0.1:{wsPort}"), cts.Token);