docs: refresh benchmark comparison
@@ -1,6 +1,6 @@
 # Go vs .NET NATS Server — Benchmark Comparison
 
-Benchmark run: 2026-03-13 10:16 AM America/Indiana/Indianapolis. Both servers ran on the same machine using the benchmark project README command (`dotnet test tests/NATS.Server.Benchmark.Tests --filter "Category=Benchmark" -v normal --logger "console;verbosity=detailed"`). Test parallelization remained disabled inside the benchmark assembly.
+Benchmark run: 2026-03-13 11:37 AM America/Indiana/Indianapolis. Both servers ran on the same machine using the benchmark project README command (`dotnet test tests/NATS.Server.Benchmark.Tests --filter "Category=Benchmark" -v normal --logger "console;verbosity=detailed"`). Test parallelization remained disabled inside the benchmark assembly.
 
 **Environment:** Apple M4, .NET SDK 10.0.101, benchmark README command run in the benchmark project's default `Debug` configuration, Go toolchain installed, Go reference server built from `golang/nats-server/`.
 
@@ -13,27 +13,27 @@ Benchmark run: 2026-03-13 10:16 AM America/Indiana/Indianapolis. Both servers ra
 
 | Payload | Go msg/s | Go MB/s | .NET msg/s | .NET MB/s | Ratio (.NET/Go) |
 |---------|----------|---------|------------|-----------|-----------------|
-| 16 B | 2,258,647 | 34.5 | 1,275,230 | 19.5 | 0.56x |
-| 128 B | 2,251,274 | 274.8 | 1,661,668 | 202.8 | 0.74x |
+| 16 B | 2,837,040 | 43.3 | 1,856,572 | 28.3 | 0.65x |
+| 128 B | 2,778,511 | 339.2 | 1,542,298 | 188.3 | 0.56x |
 
 ### Publisher + Subscriber (1:1)
 
 | Payload | Go msg/s | Go MB/s | .NET msg/s | .NET MB/s | Ratio (.NET/Go) |
 |---------|----------|---------|------------|-----------|-----------------|
-| 16 B | 296,374 | 4.5 | 875,105 | 13.4 | **2.95x** |
-| 16 KB | 32,111 | 501.7 | 30,030 | 469.2 | 0.94x |
+| 16 B | 1,442,273 | 22.0 | 888,155 | 13.6 | 0.62x |
+| 16 KB | 33,013 | 515.8 | 31,068 | 485.4 | 0.94x |
 
 ### Fan-Out (1 Publisher : 4 Subscribers)
 
 | Payload | Go msg/s | Go MB/s | .NET msg/s | .NET MB/s | Ratio (.NET/Go) |
 |---------|----------|---------|------------|-----------|-----------------|
-| 128 B | 2,387,889 | 291.5 | 1,780,888 | 217.4 | 0.75x |
+| 128 B | 2,981,804 | 364.0 | 1,729,483 | 211.1 | 0.58x |
 
 ### Multi-Publisher / Multi-Subscriber (4P x 4S)
 
 | Payload | Go msg/s | Go MB/s | .NET msg/s | .NET MB/s | Ratio (.NET/Go) |
 |---------|----------|---------|------------|-----------|-----------------|
-| 128 B | 1,079,112 | 131.7 | 953,596 | 116.4 | 0.88x |
+| 128 B | 1,567,030 | 191.3 | 1,371,131 | 167.4 | 0.87x |
 
 ---
 
@@ -43,13 +43,13 @@ Benchmark run: 2026-03-13 10:16 AM America/Indiana/Indianapolis. Both servers ra
 
 | Payload | Go msg/s | .NET msg/s | Ratio | Go P50 (us) | .NET P50 (us) | Go P99 (us) | .NET P99 (us) |
 |---------|----------|------------|-------|-------------|---------------|-------------|---------------|
-| 128 B | 8,506 | 7,182 | 0.84x | 114.9 | 135.2 | 161.2 | 189.8 |
+| 128 B | 8,316 | 7,128 | 0.86x | 116.7 | 136.4 | 165.8 | 203.5 |
 
 ### 10 Clients, 2 Services (Queue Group)
 
 | Payload | Go msg/s | .NET msg/s | Ratio | Go P50 (us) | .NET P50 (us) | Go P99 (us) | .NET P99 (us) |
 |---------|----------|------------|-------|-------------|---------------|-------------|---------------|
-| 16 B | 26,610 | 22,533 | 0.85x | 367.7 | 425.3 | 487.4 | 622.5 |
+| 16 B | 26,409 | 23,024 | 0.87x | 369.2 | 416.5 | 527.5 | 603.8 |
 
 ---
 
@@ -57,10 +57,10 @@ Benchmark run: 2026-03-13 10:16 AM America/Indiana/Indianapolis. Both servers ra
 
 | Mode | Payload | Storage | Go msg/s | .NET msg/s | Ratio (.NET/Go) |
 |------|---------|---------|----------|------------|-----------------|
-| Synchronous | 16 B | Memory | 13,756 | 9,954 | 0.72x |
-| Async (batch) | 128 B | File | 171,761 | 50,711 | 0.30x |
+| Synchronous | 16 B | Memory | 13,090 | 9,368 | 0.72x |
+| Async (batch) | 128 B | File | 132,869 | 54,750 | 0.41x |
 
-> **Note:** Async file-store publish remains the largest JetStream gap at 0.30x. The bottleneck is still the storage write path and the remaining managed allocation pressure around persisted message state.
+> **Note:** Async file-store publish improved to 0.41x in this run, but the storage write path is still the largest publish-side gap after the FileStore changes.
 
 ---
 
@@ -68,10 +68,10 @@ Benchmark run: 2026-03-13 10:16 AM America/Indiana/Indianapolis. Both servers ra
 
 | Mode | Go msg/s | .NET msg/s | Ratio (.NET/Go) |
 |------|----------|------------|-----------------|
-| Ordered ephemeral consumer | 135,704 | 107,168 | 0.79x |
-| Durable consumer fetch | 533,441 | 375,652 | 0.70x |
+| Ordered ephemeral consumer | 564,226 | 62,192 | 0.11x |
+| Durable consumer fetch | 478,634 | 317,563 | 0.66x |
 
-> **Note:** Ordered-consumer results in this run are much closer to parity than earlier snapshots. That suggests prior Go-side variance was material; `.NET` throughput is still clustered around ~107K msg/s.
+> **Note:** Ordered-consumer throughput regressed materially in this run. The merged FileStore work helped publish and subject-lookup paths, but ordered consumption remains the clearest JetStream hotspot after this round.
 
 ---
 
@@ -81,19 +81,27 @@ Benchmark run: 2026-03-13 10:16 AM America/Indiana/Indianapolis. Both servers ra
 
 | Benchmark | .NET msg/s | .NET MB/s | Alloc |
 |-----------|------------|-----------|-------|
-| SubList Exact Match (128 subjects) | 17,746,607 | 236.9 | 0.00 B/op |
-| SubList Wildcard Match | 18,811,278 | 251.2 | 0.00 B/op |
-| SubList Queue Match | 20,624,510 | 157.4 | 0.00 B/op |
-| SubList Remote Interest | 264,725 | 4.3 | 0.00 B/op |
+| SubList Exact Match (128 subjects) | 18,472,815 | 246.6 | 0.00 B/op |
+| SubList Wildcard Match | 18,647,671 | 249.0 | 0.00 B/op |
+| SubList Queue Match | 19,313,073 | 147.3 | 0.00 B/op |
+| SubList Remote Interest | 270,082 | 4.4 | 0.00 B/op |
 
 ### Parser
 
 | Benchmark | Ops/s | MB/s | Alloc |
 |-----------|-------|------|-------|
-| Parser PING | 5,598,176 | 32.0 | 0.0 B/op |
-| Parser PUB | 2,701,645 | 103.1 | 40.0 B/op |
-| Parser HPUB | 2,177,745 | 116.3 | 40.0 B/op |
-| Parser PUB split payload | 1,702,439 | 64.9 | 176.0 B/op |
+| Parser PING | 5,765,742 | 33.0 | 0.0 B/op |
+| Parser PUB | 2,542,120 | 97.0 | 40.0 B/op |
+| Parser HPUB | 2,151,468 | 114.9 | 40.0 B/op |
+| Parser PUB split payload | 1,876,479 | 71.6 | 176.0 B/op |
+
+### FileStore
+
+| Benchmark | Ops/s | MB/s | Alloc |
+|-----------|-------|------|-------|
+| FileStore AppendAsync (128B) | 250,964 | 30.6 | 1550.9 B/op |
+| FileStore LoadLastBySubject (hot) | 12,057,199 | 735.9 | 0.0 B/op |
+| FileStore PurgeEx+Trim | 328 | 0.0 | 5440792.9 B/op |
 
 ---
 
@@ -101,25 +109,25 @@ Benchmark run: 2026-03-13 10:16 AM America/Indiana/Indianapolis. Both servers ra
 
 | Category | Ratio Range | Assessment |
 |----------|-------------|------------|
-| Pub-only throughput | 0.56x–0.74x | Mixed — 128 B is solid, 16 B still trails materially |
-| Pub/sub (small payload) | **2.95x** | .NET outperforms Go decisively |
+| Pub-only throughput | 0.56x–0.65x | Still behind Go on both payload sizes |
+| Pub/sub (small payload) | 0.62x | Regression versus the prior run; no longer ahead of Go |
 | Pub/sub (large payload) | 0.94x | Near parity |
-| Fan-out | 0.75x | Good improvement; still limited by serial delivery |
-| Multi pub/sub | 0.88x | Close to parity in this run |
-| Request/reply latency | 0.84x–0.85x | Good |
+| Fan-out | 0.58x | Fan-out remains materially behind Go |
+| Multi pub/sub | 0.87x | Close to parity |
+| Request/reply latency | 0.86x–0.87x | Good |
 | JetStream sync publish | 0.72x | Good |
-| JetStream async file publish | 0.30x | Storage write path still dominates |
-| JetStream ordered consume | 0.79x | Much closer to parity in this run |
-| JetStream durable fetch | 0.70x | Good |
+| JetStream async file publish | 0.41x | Improved, but still storage-bound |
+| JetStream ordered consume | 0.11x | Major regression / highest-priority JetStream gap |
+| JetStream durable fetch | 0.66x | Good, but slightly down from the prior run |
 
 ### Key Observations
 
-1. **Small-payload 1:1 pub/sub still beats Go by ~3x** (875K vs 296K msg/s). The direct write path continues to pay off when message fanout is simple and payloads are tiny.
-2. **Fan-out and multi pub/sub both improved in this run** to 0.75x and 0.88x respectively. The remaining gap is still consistent with Go's more naturally parallel fanout model.
-3. **Ordered consumer moved up to 0.79x** (107K vs 136K msg/s). That is materially stronger than earlier runs and suggests previous Go-side variance was distorting the comparison more than the `.NET` consumer path itself.
-4. **Durable fetch remains solid at 0.70x**. The Round 6 fetch-path work is still holding, but there is room left in consumer dispatch and storage reads.
-5. **Async file-store publish is still the largest server-level gap at 0.30x**. The storage layer remains the highest-value runtime target after parser and SubList hot-path cleanup.
-6. **The new SubList microbenchmarks show effectively zero temporary allocation per operation** for exact, wildcard, queue, and remote-interest lookups in the current implementation. Parser contiguous hot paths also remain small and stable, while split-payload `PUB` still pays a higher copy cost.
+1. **Async file-store publish improved from the prior 0.30x snapshot to 0.41x** (54.8K vs 132.9K msg/s). That is directionally consistent with the FileStore metadata and payload-ownership work landing in this round.
+2. **The new FileStore direct benchmarks show the shape of the remaining storage cost clearly**: hot last-by-subject lookup is effectively allocation-free and very fast, append is still around 1551 B/op, and repeated `PurgeEx+Trim` is still extremely allocation-heavy at roughly 5.4 MB/op.
+3. **Ordered consumer throughput is now the dominant JetStream problem at 0.11x** (62K vs 564K msg/s). Whatever helped publish and fetch paths did not carry over to ordered-consumer delivery in this run.
+4. **Core pub/sub is no longer showing the earlier small-payload outlier win over Go**. 1:1 16 B came in at 0.62x, fan-out at 0.58x, and multi pub/sub at 0.87x, which is a much more uniform profile.
+5. **Durable fetch remains respectable at 0.66x**, but it is slightly softer than the last snapshot and still trails Go by a meaningful margin on the same merged build.
+6. **SubList and parser microbenchmarks remain strong and stable**. Exact, wildcard, queue, and remote-interest lookups still allocate essentially nothing, and parser contiguous hot paths remain well below the FileStore and consumer-path gaps.
 
 ---
 
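The `Ratio (.NET/Go)` and `MB/s` columns in the tables above follow simple arithmetic: the ratio is .NET msg/s divided by Go msg/s, and MB/s is msg/s multiplied by the payload size, expressed in MiB (2^20 bytes). A minimal Python sketch of that arithmetic (the helper names are illustrative, not part of the benchmark project), reproducing two figures from the new publish-only 16 B row:

```python
# Reproduce the derived columns of the benchmark tables.
# New-run publish-only figures: Go 2,837,040 msg/s, .NET 1,856,572 msg/s at 16 B.

def ratio(dotnet_msgs_per_s: float, go_msgs_per_s: float) -> float:
    """Ratio (.NET/Go) column: .NET throughput relative to Go."""
    return round(dotnet_msgs_per_s / go_msgs_per_s, 2)

def mb_per_s(msgs_per_s: float, payload_bytes: int) -> float:
    """MB/s column: payload throughput in MiB (2**20 bytes) per second."""
    return round(msgs_per_s * payload_bytes / 2**20, 1)

print(ratio(1_856_572, 2_837_040))  # 0.65, matching the 16 B ratio cell
print(mb_per_s(2_837_040, 16))      # 43.3, matching the Go MB/s cell
```

The same two helpers can be used to spot-check any throughput row when the tables are refreshed, which makes transcription errors easy to catch.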