docs: refresh benchmark comparison

Joseph Doherty
2026-03-13 11:42:39 -04:00
parent 4b15f643f6
commit 497aa227af


@@ -1,6 +1,6 @@
# Go vs .NET NATS Server — Benchmark Comparison

Benchmark run: 2026-03-13 11:41 AM America/Indiana/Indianapolis. Both servers ran on the same machine using the benchmark project README command (`dotnet test tests/NATS.Server.Benchmark.Tests --filter "Category=Benchmark" -v normal --logger "console;verbosity=detailed"`). Test parallelization remained disabled inside the benchmark assembly.

**Environment:** Apple M4, .NET SDK 10.0.101, benchmark README command run in the benchmark project's default `Debug` configuration, Go toolchain installed, Go reference server built from `golang/nats-server/`.
@@ -13,27 +13,27 @@ Benchmark run: 2026-03-13 11:37 AM America/Indiana/Indianapolis. Both servers ra
| Payload | Go msg/s | Go MB/s | .NET msg/s | .NET MB/s | Ratio (.NET/Go) |
|---------|----------|---------|------------|-----------|-----------------|
| 16 B | 2,223,690 | 33.9 | 1,341,067 | 20.5 | 0.60x |
| 128 B | 2,218,308 | 270.8 | 1,577,523 | 192.6 | 0.71x |
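As a quick sanity check on the derived columns: the reported MB/s figures are consistent with binary megabytes (msg/s × payload ÷ 1024²), and the ratio column is .NET msg/s divided by Go msg/s. A minimal sketch (helper names are illustrative, not part of the benchmark harness):

```python
# Sanity-check the pub-only table's derived columns.
# Assumption: "MB/s" means binary megabytes (MiB), which matches the numbers.
MIB = 1024 * 1024

def mb_per_s(msgs_per_s: int, payload_bytes: int) -> float:
    """Message rate and payload size -> MiB/s."""
    return msgs_per_s * payload_bytes / MIB

def ratio(dotnet_rate: int, go_rate: int) -> float:
    """Ratio column: .NET msg/s divided by Go msg/s."""
    return dotnet_rate / go_rate

# 16 B row: Go 2,223,690 msg/s and .NET 1,341,067 msg/s.
print(round(mb_per_s(2_223_690, 16), 1))      # 33.9
print(round(ratio(1_341_067, 2_223_690), 2))  # 0.6
```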
### Publisher + Subscriber (1:1)

| Payload | Go msg/s | Go MB/s | .NET msg/s | .NET MB/s | Ratio (.NET/Go) |
|---------|----------|---------|------------|-----------|-----------------|
| 16 B | 292,711 | 4.5 | 862,381 | 13.2 | **2.95x** |
| 16 KB | 32,890 | 513.9 | 28,906 | 451.7 | 0.88x |

### Fan-Out (1 Publisher : 4 Subscribers)

| Payload | Go msg/s | Go MB/s | .NET msg/s | .NET MB/s | Ratio (.NET/Go) |
|---------|----------|---------|------------|-----------|-----------------|
| 128 B | 2,945,790 | 359.6 | 1,858,235 | 226.8 | 0.63x |

### Multi-Publisher / Multi-Subscriber (4P x 4S)

| Payload | Go msg/s | Go MB/s | .NET msg/s | .NET MB/s | Ratio (.NET/Go) |
|---------|----------|---------|------------|-----------|-----------------|
| 128 B | 2,123,480 | 259.2 | 1,392,249 | 170.0 | 0.66x |

---
@@ -43,13 +43,13 @@ Benchmark run: 2026-03-13 11:37 AM America/Indiana/Indianapolis. Both servers ra
| Payload | Go msg/s | .NET msg/s | Ratio | Go P50 (us) | .NET P50 (us) | Go P99 (us) | .NET P99 (us) |
|---------|----------|------------|-------|-------------|---------------|-------------|---------------|
| 128 B | 8,386 | 7,014 | 0.84x | 115.8 | 139.0 | 175.5 | 193.0 |

### 10 Clients, 2 Services (Queue Group)

| Payload | Go msg/s | .NET msg/s | Ratio | Go P50 (us) | .NET P50 (us) | Go P99 (us) | .NET P99 (us) |
|---------|----------|------------|-------|-------------|---------------|-------------|---------------|
| 16 B | 26,470 | 23,478 | 0.89x | 370.2 | 410.6 | 486.0 | 592.8 |

---
@@ -57,10 +57,10 @@ Benchmark run: 2026-03-13 11:37 AM America/Indiana/Indianapolis. Both servers ra
| Mode | Payload | Storage | Go msg/s | .NET msg/s | Ratio (.NET/Go) |
|------|---------|---------|----------|------------|-----------------|
| Synchronous | 16 B | Memory | 14,812 | 12,134 | 0.82x |
| Async (batch) | 128 B | File | 148,156 | 57,479 | 0.39x |

> **Note:** Async file-store publish remains well below parity at 0.39x, but it is still materially better than the older 0.30x snapshot that motivated this FileStore round.

---
@@ -68,10 +68,10 @@ Benchmark run: 2026-03-13 11:37 AM America/Indiana/Indianapolis. Both servers ra
| Mode | Go msg/s | .NET msg/s | Ratio (.NET/Go) |
|------|----------|------------|-----------------|
| Ordered ephemeral consumer | 572,941 | 101,944 | 0.18x |
| Durable consumer fetch | 599,204 | 338,265 | 0.56x |

> **Note:** Ordered-consumer throughput remains the clearest JetStream hotspot after this round. The merged FileStore work helped publish and subject-lookup paths more than consumer delivery.

---
@@ -81,27 +81,27 @@ Benchmark run: 2026-03-13 11:37 AM America/Indiana/Indianapolis. Both servers ra
| Benchmark | .NET msg/s | .NET MB/s | Alloc |
|-----------|------------|-----------|-------|
| SubList Exact Match (128 subjects) | 16,497,186 | 220.3 | 0.00 B/op |
| SubList Wildcard Match | 16,147,367 | 215.6 | 0.00 B/op |
| SubList Queue Match | 15,582,052 | 118.9 | 0.00 B/op |
| SubList Remote Interest | 259,940 | 4.2 | 0.00 B/op |

### Parser

| Benchmark | Ops/s | MB/s | Alloc |
|-----------|-------|------|-------|
| Parser PING | 6,283,578 | 36.0 | 0.0 B/op |
| Parser PUB | 2,712,550 | 103.5 | 40.0 B/op |
| Parser HPUB | 2,338,555 | 124.9 | 40.0 B/op |
| Parser PUB split payload | 2,043,813 | 78.0 | 176.0 B/op |

### FileStore

| Benchmark | Ops/s | MB/s | Alloc |
|-----------|-------|------|-------|
| FileStore AppendAsync (128B) | 244,089 | 29.8 | 1552.9 B/op |
| FileStore LoadLastBySubject (hot) | 12,784,127 | 780.3 | 0.0 B/op |
| FileStore PurgeEx+Trim | 332 | 0.0 | 5440792.9 B/op |
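To put the `PurgeEx+Trim` allocation figure in perspective, multiplying B/op by ops/s gives an implied steady-state allocation rate. A rough back-of-envelope sketch (assuming the Alloc column reports mean bytes allocated per operation):

```python
# Implied allocation rate of the PurgeEx+Trim row: B/op * ops/s.
# Treat this as an order-of-magnitude check, not a harness-reported metric.
bytes_per_op = 5_440_792.9
ops_per_s = 332

alloc_bytes_per_s = bytes_per_op * ops_per_s
print(round(alloc_bytes_per_s / 1024**3, 2))  # ~1.68 GiB/s allocated
```

That is roughly 1.7 GiB of garbage per second sustained while purging, which is why the note below the JetStream tables keeps storage maintenance on the optimization list.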
---
@@ -109,25 +109,25 @@ Benchmark run: 2026-03-13 11:37 AM America/Indiana/Indianapolis. Both servers ra
| Category | Ratio Range | Assessment |
|----------|-------------|------------|
| Pub-only throughput | 0.60x–0.71x | Mixed; still behind Go |
| Pub/sub (small payload) | **2.95x** | .NET outperforms Go decisively |
| Pub/sub (large payload) | 0.88x | Close, but below parity |
| Fan-out | 0.63x | Still materially behind Go |
| Multi pub/sub | 0.66x | Meaningful gap remains |
| Request/reply latency | 0.84x–0.89x | Good |
| JetStream sync publish | 0.82x | Strong |
| JetStream async file publish | 0.39x | Improved versus older snapshots, still storage-bound |
| JetStream ordered consume | 0.18x | Highest-priority JetStream gap |
| JetStream durable fetch | 0.56x | Regressed from prior snapshot |

### Key Observations

1. **Small-payload 1:1 pub/sub is back to a large .NET lead in this final run** at 2.95x (862K vs 293K msg/s). That puts the merged benchmark profile much closer to the earlier comparison snapshot than the intermediate integration-only run.
2. **Async file-store publish is still materially better than the older 0.30x baseline** at 0.39x (57.5K vs 148.2K msg/s), which is consistent with the FileStore metadata and payload-ownership changes helping the write path even though they did not eliminate the gap.
3. **The new FileStore direct benchmarks show what remains expensive in storage maintenance**: `LoadLastBySubject` is allocation-free and extremely fast, `AppendAsync` is still about 1553 B/op, and repeated `PurgeEx+Trim` still burns roughly 5.4 MB/op.
4. **Ordered consumer throughput remains the largest JetStream gap at 0.18x** (102K vs 573K msg/s). That is better than the intermediate 0.11x run, but it is still the clearest post-FileStore optimization target.
5. **Durable fetch regressed to 0.56x in the final run**, which keeps consumer delivery and storage-read coordination in the top tier of remaining work even after the FileStore changes.
6. **Parser and SubList microbenchmarks remain stable and low-allocation.** The storage and consumer layers continue to dominate the server-level benchmark gaps, not the parser or subject matcher hot paths.

---