natsdotnet/benchmarks_comparison.md
Joseph Doherty 4de691c9c5 perf: add FileStore buffered writes, O(1) state tracking, and eliminate redundant per-publish work
Implement Go-parity background flush loop (coalesce 16KB/8ms) in MsgBlock/FileStore,
replace O(n) GetStateAsync with incremental counters, skip PruneExpired/LoadAsync/
PrunePerSubject when not needed, and bypass RAFT for single-replica streams. Fix counter
tracking bugs in RemoveMsg/EraseMsg/TTL expiry and ObjectDisposedException races in
flush loop disposal. FileStore optimizations verified with 3112/3112 JetStream tests
passing; async publish benchmark remains at ~174 msg/s due to E2E protocol path bottleneck.
2026-03-13 03:11:11 -04:00


Go vs .NET NATS Server — Benchmark Comparison

Benchmark run: 2026-03-13. Both servers running on the same machine, tested with identical NATS.Client.Core workloads. Test parallelization disabled to avoid resource contention.

Environment: Apple M4, .NET 10, Go nats-server (latest from golang/nats-server/).


Core NATS — Pub/Sub Throughput

Single Publisher (no subscribers)

| Payload | Go msg/s | Go MB/s | .NET msg/s | .NET MB/s | Ratio (.NET/Go) |
|---|---|---|---|---|---|
| 16 B | 2,436,416 | 37.2 | 1,425,767 | 21.8 | 0.59x |
| 128 B | 2,143,434 | 261.6 | 1,654,692 | 202.0 | 0.77x |
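The MB/s and Ratio columns can be re-derived from the msg/s figures and the payload size. The published numbers match a megabyte of 1,048,576 bytes (that is, MiB/s). The sketch below is only a sanity check on the table; the helper names are hypothetical and not part of the benchmark harness.

```python
BYTES_PER_MB = 1024 * 1024  # the table's MB/s values match this definition


def mb_per_s(msgs_per_s, payload_bytes):
    """Payload throughput in MB/s for a given message rate and payload size."""
    return msgs_per_s * payload_bytes / BYTES_PER_MB


def ratio(dotnet_msgs_per_s, go_msgs_per_s):
    """.NET/Go throughput ratio; values below 1.0 mean .NET is slower."""
    return dotnet_msgs_per_s / go_msgs_per_s


# 16 B single-publisher row from the table above.
go_mb = round(mb_per_s(2_436_416, 16), 1)          # 37.2
dotnet_mb = round(mb_per_s(1_425_767, 16), 1)      # 21.8
row_ratio = round(ratio(1_425_767, 2_436_416), 2)  # 0.59
```

The same two functions reproduce every throughput row in this report, which is a quick way to catch transcription errors when the tables are updated.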

Publisher + Subscriber (1:1)

| Payload | Go msg/s | Go MB/s | .NET msg/s | .NET MB/s | Ratio (.NET/Go) |
|---|---|---|---|---|---|
| 16 B | 1,140,225 | 17.4 | 207,654 | 3.2 | 0.18x |
| 16 KB | 41,762 | 652.5 | 34,429 | 538.0 | 0.82x |

Fan-Out (1 Publisher : 4 Subscribers)

| Payload | Go msg/s | Go MB/s | .NET msg/s | .NET MB/s | Ratio (.NET/Go) |
|---|---|---|---|---|---|
| 128 B | 3,192,313 | 389.7 | 581,284 | 71.0 | 0.18x |

Multi-Publisher / Multi-Subscriber (4P x 4S)

| Payload | Go msg/s | Go MB/s | .NET msg/s | .NET MB/s | Ratio (.NET/Go) |
|---|---|---|---|---|---|
| 128 B | 269,445 | 32.9 | 529,808 | 64.7 | 1.97x |
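The Key Observations attribute this result to the small message count (2,000 per publisher). A rough way to see why is to estimate how long each run lasted. The sketch assumes the msg/s figure is total published messages divided by wall-clock time; that is an assumption, since the harness is not shown, and the function name is hypothetical.

```python
def run_duration_ms(publishers, msgs_per_publisher, msgs_per_s):
    """Approximate wall-clock length of a run, in milliseconds."""
    total_msgs = publishers * msgs_per_publisher
    return total_msgs / msgs_per_s * 1000


# 4 publishers x 2,000 messages each, at the rates in the table above.
dotnet_ms = round(run_duration_ms(4, 2_000, 529_808), 1)  # 15.1
go_ms = round(run_duration_ms(4, 2_000, 269_445), 1)      # 29.7
```

Under that assumption both runs finish in a few tens of milliseconds, so connection setup, scheduler jitter, and warm-up can plausibly dominate the measurement.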

Core NATS — Request/Reply Latency

Single Client, Single Service

| Payload | Go msg/s | .NET msg/s | Ratio | Go P50 (µs) | .NET P50 (µs) | Go P99 (µs) | .NET P99 (µs) |
|---|---|---|---|---|---|---|---|
| 128 B | 9,347 | 7,215 | 0.77x | 104.5 | 134.7 | 146.2 | 190.5 |

10 Clients, 2 Services (Queue Group)

| Payload | Go msg/s | .NET msg/s | Ratio | Go P50 (µs) | .NET P50 (µs) | Go P99 (µs) | .NET P99 (µs) |
|---|---|---|---|---|---|---|---|
| 16 B | 30,893 | 25,861 | 0.84x | 315.0 | 370.2 | 451.1 | 595.0 |
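The throughput and latency columns can be cross-checked against each other. If each client keeps exactly one request in flight (an assumption about the harness, which is not shown here), the mean round-trip time is the number of clients divided by the aggregate request rate. The helper name below is hypothetical.

```python
def implied_latency_us(msgs_per_s, clients=1):
    """Mean round-trip time in microseconds implied by the aggregate request rate,
    assuming each client has one outstanding request at a time."""
    return clients * 1_000_000 / msgs_per_s


go_single = round(implied_latency_us(9_347), 1)                   # 107.0
dotnet_single = round(implied_latency_us(7_215), 1)               # 138.6
go_queue = round(implied_latency_us(30_893, clients=10), 1)       # 323.7
dotnet_queue = round(implied_latency_us(25_861, clients=10), 1)   # 386.7
```

Each implied mean sits slightly above the reported P50 and below the P99, which is what a right-skewed latency distribution would produce, so the two sets of columns look mutually consistent.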

JetStream — Publication

| Mode | Payload | Storage | Go msg/s | .NET msg/s | Ratio (.NET/Go) |
|---|---|---|---|---|---|
| Synchronous | 16 B | Memory | 16,783 | 13,815 | 0.82x |
| Async (batch) | 128 B | File | 210,387 | 174 | 0.00x |

Note: Async file store publish remains extremely slow after FileStore-level optimizations (buffered writes, O(1) state tracking, redundant work elimination). The bottleneck is in the E2E network/protocol processing path (synchronous .GetAwaiter().GetResult() calls in the client read loop), not storage I/O.
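For readers unfamiliar with the "buffered writes" optimization mentioned above, the following is a minimal sketch of a coalescing writer that flushes when either a size or an age threshold is reached. The 16 KB and 8 ms thresholds come from the commit message; the class, its methods, and the injected clock are hypothetical and do not reflect the actual MsgBlock/FileStore code.

```python
class CoalescingWriter:
    """Buffers small writes and hands them to `sink` in larger batches."""

    def __init__(self, sink, max_bytes=16 * 1024, max_delay_s=0.008):
        self.sink = sink                # callable taking bytes (stand-in for a file write)
        self.max_bytes = max_bytes      # flush once this many bytes are pending
        self.max_delay_s = max_delay_s  # or once the oldest pending write is this old
        self.pending = bytearray()
        self.first_write_at = None      # timestamp of the oldest unflushed write

    def write(self, data, now):
        if not self.pending:
            self.first_write_at = now
        self.pending += data
        if len(self.pending) >= self.max_bytes:
            self.flush()

    def tick(self, now):
        """Intended to be called periodically by a background flush loop."""
        if self.pending and now - self.first_write_at >= self.max_delay_s:
            self.flush()

    def flush(self):
        if self.pending:
            self.sink(bytes(self.pending))
            self.pending.clear()
            self.first_write_at = None


flushed = []
writer = CoalescingWriter(flushed.append)
writer.write(b"x" * 128, now=0.000)
writer.write(b"y" * 128, now=0.001)
writer.tick(now=0.004)  # oldest write is 4 ms old: nothing is flushed yet
writer.tick(now=0.009)  # oldest write is 9 ms old: both writes go out as one batch
```

Passing the clock in as a parameter keeps the sketch deterministic; a real flush loop would read the time itself and would also need to handle disposal races of the kind the commit message describes.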


JetStream — Consumption

| Mode | Go msg/s | .NET msg/s | Ratio (.NET/Go) |
|---|---|---|---|
| Ordered ephemeral consumer | 109,519 | N/A | N/A |
| Durable consumer fetch | 639,247 | 80,792 | 0.13x |

Note: Ordered ephemeral consumer is not yet fully supported on the .NET server (API timeout during consumer creation).


Summary

| Category | Ratio Range | Assessment |
|---|---|---|
| Pub-only throughput | 0.59x–0.77x | Good (within 2x) |
| Pub/sub (large payload) | 0.82x | Good |
| Pub/sub (small payload) | 0.18x | Needs optimization |
| Fan-out | 0.18x | Needs optimization |
| Multi pub/sub | 1.97x | .NET faster (likely measurement artifact at low counts) |
| Request/reply latency | 0.77x–0.84x | Good |
| JetStream sync publish | 0.82x | Good |
| JetStream async file publish | ~0x | Broken (E2E protocol path bottleneck) |
| JetStream durable fetch | 0.13x | Needs optimization |

Key Observations

  1. Pub-only and request/reply are within striking distance (0.6x–0.85x), suggesting the core message path is reasonably well ported.
  2. Small-payload pub/sub and fan-out are about 5.5x slower (0.18x ratio). The bottleneck is likely in the subscription dispatch / message delivery hot path: the SubList.Match() → MSG write loop.
  3. JetStream file store async publish remains at ~174 msg/s despite FileStore-level optimizations (buffered writes with background flush loop, O(1) state tracking, eliminating redundant per-publish work). The bottleneck is in the E2E network/protocol processing path: synchronous .GetAwaiter().GetResult() calls in the client read loop block the async pipeline.
  4. JetStream consumption (durable fetch) is about 8x slower than Go (0.13x ratio). Ordered ephemeral consumers don't work yet (consumer creation times out).
  5. The multi-pub/sub result showing .NET faster is likely a measurement artifact from the small message count (2,000 per publisher) and is not representative at scale.
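Observation 3 attributes the async publish bottleneck to blocking waits inside an otherwise asynchronous read loop. The sketch below is a loose analogy in Python's asyncio, not the .NET server code: a handler that blocks while waiting forces concurrent work to run one item after another, while an awaiting handler lets the waits overlap. All names are hypothetical.

```python
import asyncio
import time


async def handle_blocking(delay_s):
    # Blocks the whole event loop while waiting, loosely comparable to a
    # sync-over-async wait such as .GetAwaiter().GetResult() in a read loop.
    time.sleep(delay_s)


async def handle_async(delay_s):
    # Yields to the event loop, so other handlers make progress during the wait.
    await asyncio.sleep(delay_s)


async def process(handler, count, delay_s):
    """Run `count` handlers concurrently and return elapsed wall-clock seconds."""
    start = time.perf_counter()
    await asyncio.gather(*(handler(delay_s) for _ in range(count)))
    return time.perf_counter() - start


blocking_s = asyncio.run(process(handle_blocking, 10, 0.02))
overlapped_s = asyncio.run(process(handle_async, 10, 0.02))
print(f"blocking: {blocking_s:.3f}s, overlapped: {overlapped_s:.3f}s")
```

With the blocking handler the ten waits add up; with the awaiting handler they overlap. In .NET the mechanism differs (a blocked thread rather than a blocked event loop), but the serializing effect on a per-connection read loop is similar in spirit.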