# NATS Go Server — Reference Benchmark Numbers
Typical throughput and latency figures for the Go NATS server, collected from official documentation and community benchmarks. These serve as performance targets for the .NET port.
## Test Environment
Official NATS docs benchmarks were run on an Apple M4 (10 cores: 4P + 6E, 16 GB RAM). Numbers will vary by hardware — treat these as order-of-magnitude targets, not exact goals.
---
|
|
|
|
## Core NATS — Pub/Sub Throughput
|
|
|
|
All figures use the `nats bench` tool with default 16-byte messages unless noted.
### Single Publisher (no subscribers)
|
|
|
|
| Messages | Payload | Throughput | Latency |
|----------|---------|------------|---------|
| 1M | 16 B | 14,786,683 msgs/sec (~226 MiB/s) | 0.07 µs |
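As a quick sanity check, the quoted bandwidth follows from rate × payload (payload bytes only, ignoring protocol framing overhead) — a minimal sketch:

```python
def bandwidth_mib_s(msgs_per_sec: float, payload_bytes: int) -> float:
    """Convert a message rate and payload size to MiB/s of payload bandwidth."""
    return msgs_per_sec * payload_bytes / (1024 * 1024)

# Single-publisher row: 14,786,683 msgs/sec at 16 B payloads
print(round(bandwidth_mib_s(14_786_683, 16)))  # → 226, matching ~226 MiB/s
```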
### Publisher + Subscriber (1:1)
| Messages | Payload | Throughput (each side) | Latency |
|----------|---------|------------------------|---------|
| 1M | 16 B | ~4,927,000 msgs/sec (~75 MiB/s) | 0.20 µs |
| 100K | 16 KB | ~228,000 msgs/sec (~3.5 GiB/s) | 4.3 µs |
### Fan-Out (1 Publisher : N Subscribers)
| Subscribers | Payload | Per-Subscriber Rate | Aggregate | Latency |
|-------------|---------|---------------------|-----------|---------|
| 4 | 128 B | ~1,010,000 msgs/sec | 4,015,923 msgs/sec (~490 MiB/s) | ~1.0 µs |
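In a fan-out the server delivers each published message to every subscriber, so the aggregate delivery rate is roughly subscribers × per-subscriber rate — consistent with the measured numbers:

```python
# Fan-out: aggregate delivery rate ≈ subscribers × per-subscriber rate.
subscribers = 4
per_sub_rate = 1_010_000  # approximate per-subscriber msgs/sec from the table
aggregate = subscribers * per_sub_rate
print(aggregate)  # → 4040000, close to the measured 4,015,923 msgs/sec
```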
### Multi-Publisher / Multi-Subscriber (N:M)
| Config | Payload | Pub Aggregate | Sub Aggregate | Pub Latency | Sub Latency |
|--------|---------|---------------|---------------|-------------|-------------|
| 4P × 4S | 128 B | 1,080,144 msgs/sec (~132 MiB/s) | 4,323,201 msgs/sec (~528 MiB/s) | 3.7 µs | 0.93 µs |
---
## Core NATS — Request/Reply Latency
| Config | Payload | Throughput | Avg Latency |
|--------|---------|------------|-------------|
| 1 client, 1 service | 128 B | 19,659 msgs/sec | 50.9 µs |
| 50 clients, 2 services | 16 B | 132,438 msgs/sec | ~370 µs |
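Both rows are internally consistent with Little's law: for closed-loop requesters, in-flight requests ≈ throughput × latency, so average latency ≈ clients / throughput. A quick check against the table:

```python
# Little's law for closed-loop request/reply clients:
# avg latency ≈ clients / throughput.
def expected_latency_us(clients: int, msgs_per_sec: float) -> float:
    return clients / msgs_per_sec * 1e6

print(round(expected_latency_us(1, 19_659), 1))    # → 50.9 µs (table: 50.9 µs)
print(round(expected_latency_us(50, 132_438), 1))  # → 377.5 µs (table: ~370 µs)
```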
---
## Core NATS — Tail Latency Under Load
From the Brave New Geek latency benchmarks (loopback, request/reply):
| Payload | Rate | p99.99 | p99.9999 | Notes |
|---------|------|--------|----------|-------|
| 256 B | 3,000 req/s | sub-ms | sub-ms | Minimal load |
| 1 KB | 3,000 req/s | sub-ms | ~1.2 ms | |
| 5 KB | 2,000 req/s | sub-ms | ~1.2 ms | |
| 1 KB | 20,000 req/s (25 conns) | elevated | ~90 ms | Concurrent load |
| 1 MB | 100 req/s | ~214 ms | — | Large payload tail |
A protocol parser optimization improved 5 KB latencies by ~30% and 1 MB latencies by ~90% up to p90.
---
## JetStream — Publication
| Mode | Payload | Storage | Throughput | Latency |
|------|---------|---------|------------|---------|
| Synchronous | 16 B | Memory | 35,734 msgs/sec (~558 KiB/s) | 28.0 µs |
| Batch (1000 msgs) | 16 B | Memory | 627,430 msgs/sec (~9.6 MiB/s) | 1.6 µs |
| Async | 128 B | File | 403,828 msgs/sec (~49 MiB/s) | 2.5 µs |
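Synchronous publishes wait for a per-message ack, so throughput is roughly the reciprocal of publish latency, and batching amortizes that round trip. A sketch checking both against the table:

```python
# Sync JetStream publish: each message waits for its ack, so
# latency ≈ 1 / throughput; batching amortizes the round trip.
sync_rate = 35_734
print(round(1e6 / sync_rate, 1))         # → 28.0 µs per publish (table: 28.0 µs)

batch_rate = 627_430
print(round(batch_rate / sync_rate, 1))  # → 17.6x speedup from batching
```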
---
## JetStream — Consumption
| Mode | Clients | Throughput | Latency |
|------|---------|------------|---------|
| Ordered ephemeral consumer | 1 | 1,201,540 msgs/sec (~147 MiB/s) | 0.83 µs |
| Durable consumer (callback) | 4 | 290,438 msgs/sec (~36 MiB/s) | 13.7 µs |
| Durable consumer fetch (no ack) | 2 | 1,128,932 msgs/sec (~138 MiB/s) | 1.76 µs |
| Direct sync get | 1 | 33,244 msgs/sec (~4.1 MiB/s) | 30.1 µs |
| Batched get | 2 | 1,000,898 msgs/sec (~122 MiB/s) | — |
---
## JetStream — Key-Value Store
| Operation | Clients | Payload | Throughput | Latency |
|-----------|---------|---------|------------|---------|
| Sync put | 1 | 128 B | 30,067 msgs/sec (~3.7 MiB/s) | 33.3 µs |
| Get (randomized keys) | 16 | 128 B | 102,844 msgs/sec (~13 MiB/s) | ~153 µs |
---
## Resource Usage
| Scenario | Resources |
|----------|-----------|
| Core NATS at 2M msgs/sec (1 pub + 1 sub) | ~11 MB RSS |
| JetStream production (recommended minimum) | 4 CPU cores, 8 GiB RAM |
---
## Sources
- [NATS Bench — Official Docs](https://docs.nats.io/using-nats/nats-tools/nats_cli/natsbench)
- [NATS Latency Test Framework](https://github.com/nats-io/latency-tests)
- [Benchmarking Message Queue Latency — Brave New Geek](https://bravenewgeek.com/benchmarking-message-queue-latency/)
- [NATS CLI Benchmark Blog](https://nats.io/blog/cli-benchmark/)