natsdotnet/benchmarks.md
Joseph Doherty 37575dc41c feat: add benchmark test project for Go vs .NET server comparison
Side-by-side performance benchmarks using NATS.Client.Core against both
servers on ephemeral ports. Includes core pub/sub, request/reply latency,
and JetStream throughput tests with comparison output and
benchmarks_comparison.md results. Also fixes timestamp flakiness in
StoreInterfaceTests by using explicit timestamps.
2026-03-13 01:23:31 -04:00


# NATS Go Server — Reference Benchmark Numbers

Typical throughput and latency figures for the Go NATS server, collected from official documentation and community benchmarks. These serve as performance targets for the .NET port.

## Test Environment

The official NATS docs benchmarks were run on an Apple M4 (10 cores: 4P + 6E) with 16 GB RAM. Numbers will vary by hardware — treat these as order-of-magnitude targets, not exact goals.


## Core NATS — Pub/Sub Throughput

All figures use the nats bench tool with default 16-byte messages unless noted.
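These runs can be reproduced against a local server with the `nats bench` CLI. A minimal sketch — the subject name is arbitrary, and the flag spellings follow older natscli releases (recent versions restructure `nats bench` into subcommands), so check `nats bench --help` on your build:

```shell
# Start a throwaway local server (assumes nats-server is on PATH).
nats-server -p 4222 &

# Publish-only: one publisher, 16-byte messages, no subscribers.
nats bench bench.test --pub 1 --msgs 10000000 --size 16

# 1:1 pub/sub: one publisher and one subscriber measured together.
nats bench bench.test --pub 1 --sub 1 --msgs 1000000 --size 16
```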

### Single Publisher (no subscribers)

| Messages | Payload | Throughput                       | Latency |
|----------|---------|----------------------------------|---------|
| 1M       | 16 B    | 14,786,683 msgs/sec (~226 MiB/s) | 0.07 µs |

### Publisher + Subscriber (1:1)

| Messages | Payload | Throughput (each side)          | Latency |
|----------|---------|---------------------------------|---------|
| 1M       | 16 B    | ~4,927,000 msgs/sec (~75 MiB/s) | 0.20 µs |
| 100K     | 16 KB   | ~228,000 msgs/sec (~3.5 GiB/s)  | 4.3 µs  |

### Fan-Out (1 Publisher : N Subscribers)

| Subscribers | Payload | Per-Subscriber Rate | Aggregate                       | Latency |
|-------------|---------|---------------------|---------------------------------|---------|
| 4           | 128 B   | ~1,010,000 msgs/sec | 4,015,923 msgs/sec (~490 MiB/s) | ~1.0 µs |

### Multi-Publisher / Multi-Subscriber (N:M)

| Config  | Payload | Pub Aggregate                   | Sub Aggregate                   | Pub Latency | Sub Latency |
|---------|---------|---------------------------------|---------------------------------|-------------|-------------|
| 4P × 4S | 128 B   | 1,080,144 msgs/sec (~132 MiB/s) | 4,323,201 msgs/sec (~528 MiB/s) | 3.7 µs      | 0.93 µs     |
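The fan-out and N:M scenarios map directly onto the publisher and subscriber counts passed to `nats bench`. A sketch, again using the older single-command syntax (verify flag names against your CLI version):

```shell
# Fan-out: 1 publisher to 4 subscribers, 128-byte payloads.
nats bench bench.test --pub 1 --sub 4 --msgs 1000000 --size 128

# N:M: 4 publishers and 4 subscribers (the 4P x 4S row).
nats bench bench.test --pub 4 --sub 4 --msgs 1000000 --size 128
```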

## Core NATS — Request/Reply Latency

| Config                 | Payload | Throughput       | Avg Latency |
|------------------------|---------|------------------|-------------|
| 1 client, 1 service    | 128 B   | 19,659 msgs/sec  | 50.9 µs     |
| 50 clients, 2 services | 16 B    | 132,438 msgs/sec | ~370 µs     |
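Request/reply numbers come from pairing responder processes with requesters. In older natscli this was the `--reply`/`--request` flag pair — a sketch under that assumption, run in two terminals:

```shell
# Terminal 1: one service replying to requests on the bench subject.
nats bench bench.svc --sub 1 --reply

# Terminal 2: one client issuing 128-byte requests and timing replies.
nats bench bench.svc --pub 1 --request --msgs 100000 --size 128
```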

## Core NATS — Tail Latency Under Load

From the Brave New Geek latency benchmarks (loopback, request/reply):

| Payload | Rate                     | p99.99   | p99.9999 | Notes              |
|---------|--------------------------|----------|----------|--------------------|
| 256 B   | 3,000 req/s              | sub-ms   | sub-ms   | Minimal load       |
| 1 KB    | 3,000 req/s              | sub-ms   | ~1.2 ms  |                    |
| 5 KB    | 2,000 req/s              | sub-ms   | ~1.2 ms  |                    |
| 1 KB    | 20,000 req/s (25 conns)  | elevated | ~90 ms   | Concurrent load    |
| 1 MB    | 100 req/s                |          | ~214 ms  | Large payload tail |

A protocol parser optimization improved 5 KB latencies by ~30% and 1 MB latencies by ~90% up to p90.


## JetStream — Publication

| Mode              | Payload | Storage | Throughput                     | Latency |
|-------------------|---------|---------|--------------------------------|---------|
| Synchronous       | 16 B    | Memory  | 35,734 msgs/sec (~558 KiB/s)   | 28.0 µs |
| Batch (1000 msgs) | 16 B    | Memory  | 627,430 msgs/sec (~9.6 MiB/s)  | 1.6 µs  |
| Async             | 128 B   | File    | 403,828 msgs/sec (~49 MiB/s)   | 2.5 µs  |
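A sketch of the three publication modes using the older natscli JetStream flags (`--js`, `--syncpub`, `--pubbatch`, `--storage`); these names are from memory of that flag set and may differ in current releases:

```shell
# Synchronous publish (one ack round-trip per message) to a memory stream.
nats bench bench.js --js --pub 1 --msgs 100000 --size 16 --storage memory --syncpub

# Async publish with acks awaited in batches of 1000.
nats bench bench.js --js --pub 1 --msgs 1000000 --size 16 --storage memory --pubbatch 1000

# Async publish to a file-backed stream, 128-byte payloads.
nats bench bench.js --js --pub 1 --msgs 1000000 --size 128 --storage file
```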

## JetStream — Consumption

| Mode                            | Clients | Throughput                      | Latency |
|---------------------------------|---------|---------------------------------|---------|
| Ordered ephemeral consumer      | 1       | 1,201,540 msgs/sec (~147 MiB/s) | 0.83 µs |
| Durable consumer (callback)     | 4       | 290,438 msgs/sec (~36 MiB/s)    | 13.7 µs |
| Durable consumer fetch (no ack) | 2       | 1,128,932 msgs/sec (~138 MiB/s) | 1.76 µs |
| Direct sync get                 | 1       | 33,244 msgs/sec (~4.1 MiB/s)    | 30.1 µs |
| Batched get                     | 2       | 1,000,898 msgs/sec (~122 MiB/s) |         |
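Consumption benchmarks replay a pre-filled stream. A minimal sketch under the same older-natscli flag assumptions (`--sub` for an ordered push consume, `--pull`/`--pullbatch` for pull-consumer fetches):

```shell
# Fill the stream once, then drain it with a single ordered consumer.
nats bench bench.js --js --pub 1 --msgs 1000000 --size 16
nats bench bench.js --js --sub 1 --msgs 1000000

# Two clients fetching from a pull consumer in batches of 100.
nats bench bench.js --js --sub 2 --pull --pullbatch 100 --msgs 1000000
```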

## JetStream — Key-Value Store

| Operation             | Clients | Payload | Throughput                     | Latency |
|-----------------------|---------|---------|--------------------------------|---------|
| Sync put              | 1       | 128 B   | 30,067 msgs/sec (~3.7 MiB/s)   | 33.3 µs |
| Get (randomized keys) | 16      | 128 B   | 102,844 msgs/sec (~13 MiB/s)   | ~153 µs |

## Resource Usage

| Scenario                                   | Footprint              |
|--------------------------------------------|------------------------|
| Core NATS at 2M msgs/sec (1 pub + 1 sub)   | ~11 MB RSS             |
| JetStream production (recommended minimum) | 4 CPU cores, 8 GiB RAM |

## Sources