diff --git a/benchmarks_comparison.md b/benchmarks_comparison.md
index 9911a7c..4ed9883 100644
--- a/benchmarks_comparison.md
+++ b/benchmarks_comparison.md
@@ -1,6 +1,6 @@
 # Go vs .NET NATS Server — Benchmark Comparison

-Benchmark run: 2026-03-13 11:37 AM America/Indiana/Indianapolis. Both servers ran on the same machine using the benchmark project README command (`dotnet test tests/NATS.Server.Benchmark.Tests --filter "Category=Benchmark" -v normal --logger "console;verbosity=detailed"`). Test parallelization remained disabled inside the benchmark assembly.
+Benchmark run: 2026-03-13 11:41 AM America/Indiana/Indianapolis. Both servers ran on the same machine using the benchmark project README command (`dotnet test tests/NATS.Server.Benchmark.Tests --filter "Category=Benchmark" -v normal --logger "console;verbosity=detailed"`). Test parallelization remained disabled inside the benchmark assembly.

 **Environment:** Apple M4, .NET SDK 10.0.101, benchmark README command run in the benchmark project's default `Debug` configuration, Go toolchain installed, Go reference server built from `golang/nats-server/`.

@@ -13,27 +13,27 @@ Benchmark run: 2026-03-13 11:37 AM America/Indiana/Indianapolis. Both servers ra

 | Payload | Go msg/s | Go MB/s | .NET msg/s | .NET MB/s | Ratio (.NET/Go) |
 |---------|----------|---------|------------|-----------|-----------------|
-| 16 B | 2,837,040 | 43.3 | 1,856,572 | 28.3 | 0.65x |
-| 128 B | 2,778,511 | 339.2 | 1,542,298 | 188.3 | 0.56x |
+| 16 B | 2,223,690 | 33.9 | 1,341,067 | 20.5 | 0.60x |
+| 128 B | 2,218,308 | 270.8 | 1,577,523 | 192.6 | 0.71x |

 ### Publisher + Subscriber (1:1)

 | Payload | Go msg/s | Go MB/s | .NET msg/s | .NET MB/s | Ratio (.NET/Go) |
 |---------|----------|---------|------------|-----------|-----------------|
-| 16 B | 1,442,273 | 22.0 | 888,155 | 13.6 | 0.62x |
-| 16 KB | 33,013 | 515.8 | 31,068 | 485.4 | 0.94x |
+| 16 B | 292,711 | 4.5 | 862,381 | 13.2 | **2.95x** |
+| 16 KB | 32,890 | 513.9 | 28,906 | 451.7 | 0.88x |

 ### Fan-Out (1 Publisher : 4 Subscribers)

 | Payload | Go msg/s | Go MB/s | .NET msg/s | .NET MB/s | Ratio (.NET/Go) |
 |---------|----------|---------|------------|-----------|-----------------|
-| 128 B | 2,981,804 | 364.0 | 1,729,483 | 211.1 | 0.58x |
+| 128 B | 2,945,790 | 359.6 | 1,858,235 | 226.8 | 0.63x |

 ### Multi-Publisher / Multi-Subscriber (4P x 4S)

 | Payload | Go msg/s | Go MB/s | .NET msg/s | .NET MB/s | Ratio (.NET/Go) |
 |---------|----------|---------|------------|-----------|-----------------|
-| 128 B | 1,567,030 | 191.3 | 1,371,131 | 167.4 | 0.87x |
+| 128 B | 2,123,480 | 259.2 | 1,392,249 | 170.0 | 0.66x |

 ---

@@ -43,13 +43,13 @@ Benchmark run: 2026-03-13 11:37 AM America/Indiana/Indianapolis. Both servers ra

 | Payload | Go msg/s | .NET msg/s | Ratio | Go P50 (us) | .NET P50 (us) | Go P99 (us) | .NET P99 (us) |
 |---------|----------|------------|-------|-------------|---------------|-------------|---------------|
-| 128 B | 8,316 | 7,128 | 0.86x | 116.7 | 136.4 | 165.8 | 203.5 |
+| 128 B | 8,386 | 7,014 | 0.84x | 115.8 | 139.0 | 175.5 | 193.0 |

 ### 10 Clients, 2 Services (Queue Group)

 | Payload | Go msg/s | .NET msg/s | Ratio | Go P50 (us) | .NET P50 (us) | Go P99 (us) | .NET P99 (us) |
 |---------|----------|------------|-------|-------------|---------------|-------------|---------------|
-| 16 B | 26,409 | 23,024 | 0.87x | 369.2 | 416.5 | 527.5 | 603.8 |
+| 16 B | 26,470 | 23,478 | 0.89x | 370.2 | 410.6 | 486.0 | 592.8 |

 ---

@@ -57,10 +57,10 @@ Benchmark run: 2026-03-13 11:37 AM America/Indiana/Indianapolis. Both servers ra

 | Mode | Payload | Storage | Go msg/s | .NET msg/s | Ratio (.NET/Go) |
 |------|---------|---------|----------|------------|-----------------|
-| Synchronous | 16 B | Memory | 13,090 | 9,368 | 0.72x |
-| Async (batch) | 128 B | File | 132,869 | 54,750 | 0.41x |
+| Synchronous | 16 B | Memory | 14,812 | 12,134 | 0.82x |
+| Async (batch) | 128 B | File | 148,156 | 57,479 | 0.39x |

-> **Note:** Async file-store publish improved to 0.41x in this run, but the storage write path is still the largest publication gap after the FileStore changes.
+> **Note:** Async file-store publish remains well below parity at 0.39x, but it is still materially better than the older 0.30x snapshot that motivated this FileStore round.

 ---

@@ -68,10 +68,10 @@ Benchmark run: 2026-03-13 11:37 AM America/Indiana/Indianapolis. Both servers ra

 | Mode | Go msg/s | .NET msg/s | Ratio (.NET/Go) |
 |------|----------|------------|-----------------|
-| Ordered ephemeral consumer | 564,226 | 62,192 | 0.11x |
-| Durable consumer fetch | 478,634 | 317,563 | 0.66x |
+| Ordered ephemeral consumer | 572,941 | 101,944 | 0.18x |
+| Durable consumer fetch | 599,204 | 338,265 | 0.56x |

-> **Note:** Ordered-consumer throughput regressed materially in this run. The merged FileStore work helped publish and subject-lookup paths, but ordered consumption remains the clearest JetStream hotspot after this round.
+> **Note:** Ordered-consumer throughput remains the clearest JetStream hotspot after this round. The merged FileStore work helped publish and subject-lookup paths more than consumer delivery.

 ---

@@ -81,27 +81,27 @@ Benchmark run: 2026-03-13 11:37 AM America/Indiana/Indianapolis. Both servers ra

 | Benchmark | .NET msg/s | .NET MB/s | Alloc |
 |-----------|------------|-----------|-------|
-| SubList Exact Match (128 subjects) | 18,472,815 | 246.6 | 0.00 B/op |
-| SubList Wildcard Match | 18,647,671 | 249.0 | 0.00 B/op |
-| SubList Queue Match | 19,313,073 | 147.3 | 0.00 B/op |
-| SubList Remote Interest | 270,082 | 4.4 | 0.00 B/op |
+| SubList Exact Match (128 subjects) | 16,497,186 | 220.3 | 0.00 B/op |
+| SubList Wildcard Match | 16,147,367 | 215.6 | 0.00 B/op |
+| SubList Queue Match | 15,582,052 | 118.9 | 0.00 B/op |
+| SubList Remote Interest | 259,940 | 4.2 | 0.00 B/op |

 ### Parser

 | Benchmark | Ops/s | MB/s | Alloc |
 |-----------|-------|------|-------|
-| Parser PING | 5,765,742 | 33.0 | 0.0 B/op |
-| Parser PUB | 2,542,120 | 97.0 | 40.0 B/op |
-| Parser HPUB | 2,151,468 | 114.9 | 40.0 B/op |
-| Parser PUB split payload | 1,876,479 | 71.6 | 176.0 B/op |
+| Parser PING | 6,283,578 | 36.0 | 0.0 B/op |
+| Parser PUB | 2,712,550 | 103.5 | 40.0 B/op |
+| Parser HPUB | 2,338,555 | 124.9 | 40.0 B/op |
+| Parser PUB split payload | 2,043,813 | 78.0 | 176.0 B/op |

 ### FileStore

 | Benchmark | Ops/s | MB/s | Alloc |
 |-----------|-------|------|-------|
-| FileStore AppendAsync (128B) | 250,964 | 30.6 | 1550.9 B/op |
-| FileStore LoadLastBySubject (hot) | 12,057,199 | 735.9 | 0.0 B/op |
-| FileStore PurgeEx+Trim | 328 | 0.0 | 5440792.9 B/op |
+| FileStore AppendAsync (128B) | 244,089 | 29.8 | 1552.9 B/op |
+| FileStore LoadLastBySubject (hot) | 12,784,127 | 780.3 | 0.0 B/op |
+| FileStore PurgeEx+Trim | 332 | 0.0 | 5440792.9 B/op |

 ---

@@ -109,25 +109,25 @@ Benchmark run: 2026-03-13 11:37 AM America/Indiana/Indianapolis. Both servers ra

 | Category | Ratio Range | Assessment |
 |----------|-------------|------------|
-| Pub-only throughput | 0.56x–0.65x | Still behind Go on both payload sizes |
-| Pub/sub (small payload) | 0.62x | Regression versus the prior run; no longer ahead of Go |
-| Pub/sub (large payload) | 0.94x | Near parity |
-| Fan-out | 0.58x | Fan-out remains materially behind Go |
-| Multi pub/sub | 0.87x | Close to parity |
-| Request/reply latency | 0.86x–0.87x | Good |
-| JetStream sync publish | 0.72x | Good |
-| JetStream async file publish | 0.41x | Improved, but still storage-bound |
-| JetStream ordered consume | 0.11x | Major regression / highest-priority JetStream gap |
-| JetStream durable fetch | 0.66x | Good, but slightly down from the prior run |
+| Pub-only throughput | 0.60x–0.71x | Mixed; still behind Go |
+| Pub/sub (small payload) | **2.95x** | .NET outperforms Go decisively |
+| Pub/sub (large payload) | 0.88x | Close, but below parity |
+| Fan-out | 0.63x | Still materially behind Go |
+| Multi pub/sub | 0.66x | Meaningful gap remains |
+| Request/reply latency | 0.84x–0.89x | Good |
+| JetStream sync publish | 0.82x | Strong |
+| JetStream async file publish | 0.39x | Improved versus older snapshots, still storage-bound |
+| JetStream ordered consume | 0.18x | Highest-priority JetStream gap |
+| JetStream durable fetch | 0.56x | Regressed from prior snapshot |

 ### Key Observations

-1. **Async file-store publish improved from the prior 0.30x snapshot to 0.41x** (54.8K vs 132.9K msg/s). That is directionally consistent with the FileStore metadata and payload-ownership work landing in this round.
-2. **The new FileStore direct benchmarks show the shape of the remaining storage cost clearly**: hot last-by-subject lookup is effectively allocation-free and very fast, append is still around 1551 B/op, and repeated `PurgeEx+Trim` is still extremely allocation-heavy at roughly 5.4 MB/op.
-3. **Ordered consumer throughput is now the dominant JetStream problem at 0.11x** (62K vs 564K msg/s). Whatever helped publish and fetch paths did not carry over to ordered-consumer delivery in this run.
-4. **Core pub/sub is no longer showing the earlier small-payload outlier win over Go**. 1:1 16 B came in at 0.62x, fan-out at 0.58x, and multi pub/sub at 0.87x, which is a much more uniform profile.
-5. **Durable fetch remains respectable at 0.66x**, but it is slightly softer than the last snapshot and still trails Go by a meaningful margin on the same merged build.
-6. **SubList and parser microbenchmarks remain strong and stable**. Exact, wildcard, queue, and remote-interest lookups still allocate essentially nothing, and parser contiguous hot paths remain well below the FileStore and consumer-path gaps.
+1. **Small-payload 1:1 pub/sub is back to a large `.NET` lead in this final run** at 2.95x (862K vs 293K msg/s). That puts the merged benchmark profile much closer to the earlier comparison snapshot than the intermediate integration-only run.
+2. **Async file-store publish is still materially better than the older 0.30x baseline** at 0.39x (57.5K vs 148.2K msg/s), which is consistent with the FileStore metadata and payload-ownership changes helping the write path even though they did not eliminate the gap.
+3. **The new FileStore direct benchmarks show what remains expensive in storage maintenance**: `LoadLastBySubject` is allocation-free and extremely fast, `AppendAsync` is still about 1553 B/op, and repeated `PurgeEx+Trim` still burns roughly 5.4 MB/op.
+4. **Ordered consumer throughput remains the largest JetStream gap at 0.18x** (102K vs 573K msg/s). That is better than the intermediate 0.11x run, but it is still the clearest post-FileStore optimization target.
+5. **Durable fetch regressed to 0.56x in the final run**, which keeps consumer delivery and storage-read coordination in the top tier of remaining work even after the FileStore changes.
+6. **Parser and SubList microbenchmarks remain stable and low-allocation**. The storage and consumer layers continue to dominate the server-level benchmark gaps, not the parser or subject matcher hot paths.

 ---
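As a sanity check on the derived columns in the diff above: the `MB/s` figures in the throughput tables are consistent with `msg/s × payload bytes ÷ 2^20` (i.e. mebibytes per second), and the ratio column is `.NET msg/s ÷ Go msg/s`. A minimal sketch of that relationship, using the 128 B pub-only row from the new run (the helper names are illustrative, not part of the benchmark project):

```python
# Sketch: how the derived table columns relate to the raw msg/s figures.
# Assumption: the report's "MB/s" is mebibytes per second (msg/s * bytes / 2**20),
# which matches every throughput row in the tables above.

def mb_per_s(msgs_per_s: float, payload_bytes: int) -> float:
    """Throughput in MiB/s from a msg/s rate and a fixed payload size."""
    return msgs_per_s * payload_bytes / 2**20

def ratio(dotnet_msgs_per_s: float, go_msgs_per_s: float) -> float:
    """The report's 'Ratio (.NET/Go)' column."""
    return dotnet_msgs_per_s / go_msgs_per_s

# 128 B pub-only row, final run:
assert round(mb_per_s(2_218_308, 128), 1) == 270.8   # Go MB/s
assert round(mb_per_s(1_577_523, 128), 1) == 192.6   # .NET MB/s
assert round(ratio(1_577_523, 2_218_308), 2) == 0.71  # Ratio (.NET/Go)
```

This also explains why the 16 B rows show small MB/s numbers despite multi-million msg/s rates: at 16 bytes per message, 2.22M msg/s is only about 34 MiB/s of payload.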