From a470e0bcdbda1669b35d3f0c466a01b35a30c2e5 Mon Sep 17 00:00:00 2001
From: Joseph Doherty
Date: Fri, 13 Mar 2026 11:39:54 -0400
Subject: [PATCH] docs: refresh benchmark comparison

---
 benchmarks_comparison.md | 82 ++++++++++++++++++++++------------------
 1 file changed, 45 insertions(+), 37 deletions(-)

diff --git a/benchmarks_comparison.md b/benchmarks_comparison.md
index b821999..9911a7c 100644
--- a/benchmarks_comparison.md
+++ b/benchmarks_comparison.md
@@ -1,6 +1,6 @@
 # Go vs .NET NATS Server — Benchmark Comparison
 
-Benchmark run: 2026-03-13 10:16 AM America/Indiana/Indianapolis. Both servers ran on the same machine using the benchmark project README command (`dotnet test tests/NATS.Server.Benchmark.Tests --filter "Category=Benchmark" -v normal --logger "console;verbosity=detailed"`). Test parallelization remained disabled inside the benchmark assembly.
+Benchmark run: 2026-03-13 11:37 AM America/Indiana/Indianapolis. Both servers ran on the same machine using the benchmark project README command (`dotnet test tests/NATS.Server.Benchmark.Tests --filter "Category=Benchmark" -v normal --logger "console;verbosity=detailed"`). Test parallelization remained disabled inside the benchmark assembly.
 
 **Environment:** Apple M4, .NET SDK 10.0.101, benchmark README command run in the benchmark project's default `Debug` configuration, Go toolchain installed, Go reference server built from `golang/nats-server/`.
 
@@ -13,27 +13,27 @@ Benchmark run: 2026-03-13 10:16 AM America/Indiana/Indianapolis. Both servers ra
 
 | Payload | Go msg/s | Go MB/s | .NET msg/s | .NET MB/s | Ratio (.NET/Go) |
 |---------|----------|---------|------------|-----------|-----------------|
-| 16 B | 2,258,647 | 34.5 | 1,275,230 | 19.5 | 0.56x |
-| 128 B | 2,251,274 | 274.8 | 1,661,668 | 202.8 | 0.74x |
+| 16 B | 2,837,040 | 43.3 | 1,856,572 | 28.3 | 0.65x |
+| 128 B | 2,778,511 | 339.2 | 1,542,298 | 188.3 | 0.56x |
 
 ### Publisher + Subscriber (1:1)
 
 | Payload | Go msg/s | Go MB/s | .NET msg/s | .NET MB/s | Ratio (.NET/Go) |
 |---------|----------|---------|------------|-----------|-----------------|
-| 16 B | 296,374 | 4.5 | 875,105 | 13.4 | **2.95x** |
-| 16 KB | 32,111 | 501.7 | 30,030 | 469.2 | 0.94x |
+| 16 B | 1,442,273 | 22.0 | 888,155 | 13.6 | 0.62x |
+| 16 KB | 33,013 | 515.8 | 31,068 | 485.4 | 0.94x |
 
 ### Fan-Out (1 Publisher : 4 Subscribers)
 
 | Payload | Go msg/s | Go MB/s | .NET msg/s | .NET MB/s | Ratio (.NET/Go) |
 |---------|----------|---------|------------|-----------|-----------------|
-| 128 B | 2,387,889 | 291.5 | 1,780,888 | 217.4 | 0.75x |
+| 128 B | 2,981,804 | 364.0 | 1,729,483 | 211.1 | 0.58x |
 
 ### Multi-Publisher / Multi-Subscriber (4P x 4S)
 
 | Payload | Go msg/s | Go MB/s | .NET msg/s | .NET MB/s | Ratio (.NET/Go) |
 |---------|----------|---------|------------|-----------|-----------------|
-| 128 B | 1,079,112 | 131.7 | 953,596 | 116.4 | 0.88x |
+| 128 B | 1,567,030 | 191.3 | 1,371,131 | 167.4 | 0.87x |
 
 ---
 
@@ -43,13 +43,13 @@ Benchmark run: 2026-03-13 10:16 AM America/Indiana/Indianapolis. Both servers ra
 
 | Payload | Go msg/s | .NET msg/s | Ratio | Go P50 (us) | .NET P50 (us) | Go P99 (us) | .NET P99 (us) |
 |---------|----------|------------|-------|-------------|---------------|-------------|---------------|
-| 128 B | 8,506 | 7,182 | 0.84x | 114.9 | 135.2 | 161.2 | 189.8 |
+| 128 B | 8,316 | 7,128 | 0.86x | 116.7 | 136.4 | 165.8 | 203.5 |
 
 ### 10 Clients, 2 Services (Queue Group)
 
 | Payload | Go msg/s | .NET msg/s | Ratio | Go P50 (us) | .NET P50 (us) | Go P99 (us) | .NET P99 (us) |
 |---------|----------|------------|-------|-------------|---------------|-------------|---------------|
-| 16 B | 26,610 | 22,533 | 0.85x | 367.7 | 425.3 | 487.4 | 622.5 |
+| 16 B | 26,409 | 23,024 | 0.87x | 369.2 | 416.5 | 527.5 | 603.8 |
 
 ---
 
@@ -57,10 +57,10 @@ Benchmark run: 2026-03-13 10:16 AM America/Indiana/Indianapolis. Both servers ra
 
 | Mode | Payload | Storage | Go msg/s | .NET msg/s | Ratio (.NET/Go) |
 |------|---------|---------|----------|------------|-----------------|
-| Synchronous | 16 B | Memory | 13,756 | 9,954 | 0.72x |
-| Async (batch) | 128 B | File | 171,761 | 50,711 | 0.30x |
+| Synchronous | 16 B | Memory | 13,090 | 9,368 | 0.72x |
+| Async (batch) | 128 B | File | 132,869 | 54,750 | 0.41x |
 
-> **Note:** Async file-store publish remains the largest JetStream gap at 0.30x. The bottleneck is still the storage write path and the remaining managed allocation pressure around persisted message state.
+> **Note:** Async file-store publish improved to 0.41x in this run, but the storage write path remains the largest publish-side gap after the FileStore changes.
 
 ---
 
@@ -68,10 +68,10 @@ Benchmark run: 2026-03-13 10:16 AM America/Indiana/Indianapolis. Both servers ra
 
 | Mode | Go msg/s | .NET msg/s | Ratio (.NET/Go) |
 |------|----------|------------|-----------------|
-| Ordered ephemeral consumer | 135,704 | 107,168 | 0.79x |
-| Durable consumer fetch | 533,441 | 375,652 | 0.70x |
+| Ordered ephemeral consumer | 564,226 | 62,192 | 0.11x |
+| Durable consumer fetch | 478,634 | 317,563 | 0.66x |
 
-> **Note:** Ordered-consumer results in this run are much closer to parity than earlier snapshots. That suggests prior Go-side variance was material; `.NET` throughput is still clustered around ~107K msg/s.
+> **Note:** Ordered-consumer throughput regressed materially in this run. The merged FileStore work helped the publish and subject-lookup paths, but ordered consumption remains the clearest JetStream hotspot after this round.
 
 ---
 
@@ -81,19 +81,27 @@ Benchmark run: 2026-03-13 10:16 AM America/Indiana/Indianapolis. Both servers ra
 
 | Benchmark | .NET msg/s | .NET MB/s | Alloc |
 |-----------|------------|-----------|-------|
-| SubList Exact Match (128 subjects) | 17,746,607 | 236.9 | 0.00 B/op |
-| SubList Wildcard Match | 18,811,278 | 251.2 | 0.00 B/op |
-| SubList Queue Match | 20,624,510 | 157.4 | 0.00 B/op |
-| SubList Remote Interest | 264,725 | 4.3 | 0.00 B/op |
+| SubList Exact Match (128 subjects) | 18,472,815 | 246.6 | 0.00 B/op |
+| SubList Wildcard Match | 18,647,671 | 249.0 | 0.00 B/op |
+| SubList Queue Match | 19,313,073 | 147.3 | 0.00 B/op |
+| SubList Remote Interest | 270,082 | 4.4 | 0.00 B/op |
 
 ### Parser
 
 | Benchmark | Ops/s | MB/s | Alloc |
 |-----------|-------|------|-------|
-| Parser PING | 5,598,176 | 32.0 | 0.0 B/op |
-| Parser PUB | 2,701,645 | 103.1 | 40.0 B/op |
-| Parser HPUB | 2,177,745 | 116.3 | 40.0 B/op |
-| Parser PUB split payload | 1,702,439 | 64.9 | 176.0 B/op |
+| Parser PING | 5,765,742 | 33.0 | 0.0 B/op |
+| Parser PUB | 2,542,120 | 97.0 | 40.0 B/op |
+| Parser HPUB | 2,151,468 | 114.9 | 40.0 B/op |
+| Parser PUB split payload | 1,876,479 | 71.6 | 176.0 B/op |
+
+### FileStore
+
+| Benchmark | Ops/s | MB/s | Alloc |
+|-----------|-------|------|-------|
+| FileStore AppendAsync (128 B) | 250,964 | 30.6 | 1550.9 B/op |
+| FileStore LoadLastBySubject (hot) | 12,057,199 | 735.9 | 0.0 B/op |
+| FileStore PurgeEx+Trim | 328 | 0.0 | 5440792.9 B/op |
 
 ---
 
@@ -101,25 +109,25 @@ Benchmark run: 2026-03-13 10:16 AM America/Indiana/Indianapolis. Both servers ra
 
 | Category | Ratio Range | Assessment |
 |----------|-------------|------------|
-| Pub-only throughput | 0.56x–0.74x | Mixed — 128 B is solid, 16 B still trails materially |
-| Pub/sub (small payload) | **2.95x** | .NET outperforms Go decisively |
+| Pub-only throughput | 0.56x–0.65x | Still behind Go on both payload sizes |
+| Pub/sub (small payload) | 0.62x | Regression versus the prior run; no longer ahead of Go |
 | Pub/sub (large payload) | 0.94x | Near parity |
-| Fan-out | 0.75x | Good improvement; still limited by serial delivery |
-| Multi pub/sub | 0.88x | Close to parity in this run |
-| Request/reply latency | 0.84x–0.85x | Good |
+| Fan-out | 0.58x | Fan-out remains materially behind Go |
+| Multi pub/sub | 0.87x | Close to parity |
+| Request/reply latency | 0.86x–0.87x | Good |
 | JetStream sync publish | 0.72x | Good |
-| JetStream async file publish | 0.30x | Storage write path still dominates |
-| JetStream ordered consume | 0.79x | Much closer to parity in this run |
-| JetStream durable fetch | 0.70x | Good |
+| JetStream async file publish | 0.41x | Improved, but still storage-bound |
+| JetStream ordered consume | 0.11x | Major regression / highest-priority JetStream gap |
+| JetStream durable fetch | 0.66x | Good, but slightly down from the prior run |
 
 ### Key Observations
 
-1. **Small-payload 1:1 pub/sub still beats Go by ~3x** (875K vs 296K msg/s). The direct write path continues to pay off when message fanout is simple and payloads are tiny.
-2. **Fan-out and multi pub/sub both improved in this run** to 0.75x and 0.88x respectively. The remaining gap is still consistent with Go's more naturally parallel fanout model.
-3. **Ordered consumer moved up to 0.79x** (107K vs 136K msg/s). That is materially stronger than earlier runs and suggests previous Go-side variance was distorting the comparison more than the `.NET` consumer path itself.
-4. **Durable fetch remains solid at 0.70x**. The Round 6 fetch-path work is still holding, but there is room left in consumer dispatch and storage reads.
-5. **Async file-store publish is still the largest server-level gap at 0.30x**. The storage layer remains the highest-value runtime target after parser and SubList hot-path cleanup.
-6. **The new SubList microbenchmarks show effectively zero temporary allocation per operation** for exact, wildcard, queue, and remote-interest lookups in the current implementation. Parser contiguous hot paths also remain small and stable, while split-payload `PUB` still pays a higher copy cost.
+1. **Async file-store publish improved from the prior 0.30x snapshot to 0.41x** (54.8K vs 132.9K msg/s). That is directionally consistent with the FileStore metadata and payload-ownership work landing in this round.
+2. **The new FileStore direct benchmarks show the shape of the remaining storage cost clearly**: hot last-by-subject lookup is effectively allocation-free and very fast, append still costs roughly 1,551 B/op, and repeated `PurgeEx+Trim` remains extremely allocation-heavy at roughly 5.4 MB/op.
+3. **Ordered-consumer throughput is now the dominant JetStream problem at 0.11x** (62K vs 564K msg/s). Whatever helped the publish and fetch paths did not carry over to ordered-consumer delivery in this run.
+4. **Core pub/sub no longer shows the earlier small-payload outlier win over Go**. 1:1 16 B came in at 0.62x, fan-out at 0.58x, and multi pub/sub at 0.87x, a much more uniform profile.
+5. **Durable fetch remains respectable at 0.66x**, but it is slightly softer than the last snapshot and still trails Go by a meaningful margin on the same merged build.
+6. **SubList and parser microbenchmarks remain strong and stable**. Exact, wildcard, queue, and remote-interest lookups still allocate essentially nothing, and the parser's contiguous hot paths remain well below the FileStore and consumer-path gaps.
 
 ---
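For review purposes, the `Ratio (.NET/Go)` and MB/s columns in the tables above can be re-derived from the raw msg/s figures. A minimal sketch, assuming the tables define 1 MB as 2^20 bytes (which matches the published 16 B pub-only row):

```python
# Re-derive the derived columns of the benchmark tables from raw msg/s.
# Assumption: MB = 2**20 bytes, consistent with the published numbers.

def mbps(msg_per_sec: float, payload_bytes: int) -> float:
    """Throughput in MB/s for a given payload size."""
    return msg_per_sec * payload_bytes / 2**20

def ratio(dotnet: float, go: float) -> float:
    """.NET-to-Go throughput ratio, as used in the Ratio column."""
    return dotnet / go

# Pub-only, 16 B row from this run (Go vs .NET msg/s).
go_msgs, dotnet_msgs, payload = 2_837_040, 1_856_572, 16

print(f"Go    {mbps(go_msgs, payload):.1f} MB/s")      # 43.3
print(f".NET  {mbps(dotnet_msgs, payload):.1f} MB/s")  # 28.3
print(f"Ratio {ratio(dotnet_msgs, go_msgs):.2f}x")     # 0.65
```

Applied to any other row, the same two formulas reproduce the table's MB/s and ratio values to the printed precision.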