# NATS.Server.Benchmark.Tests
Side-by-side performance comparison of the Go and .NET NATS server implementations. Both servers are launched as child processes on ephemeral ports and exercised with identical workloads using the `NATS.Client.Core` / `NATS.Client.JetStream` NuGet client libraries.
## Prerequisites
- .NET 10 SDK
- Go toolchain (for Go server comparison; benchmarks still run .NET-only if Go is unavailable)
- The Go NATS server source at `golang/nats-server/` (the Go binary is built automatically on first run)
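You can check the prerequisites before the first run. The following is a minimal bash sketch, not a script that ships with the repository. `check_prereqs` is a hypothetical name. It assumes you run it from the repository root, so that `golang/nats-server/` resolves, and it assumes any SDK with major version 10 or higher is acceptable.

```bash
# Hypothetical pre-flight check; not part of the repository.
# Assumes the current directory is the repository root.
# Fails only when a suitable .NET SDK is missing; a missing Go toolchain or
# Go source tree just means the Go side of each comparison is skipped.
check_prereqs() {
  local sdk major
  # `dotnet --version` prints the version of the SDK in use, e.g. 10.0.100.
  if ! sdk=$(dotnet --version 2>/dev/null); then
    echo "dotnet: missing (.NET 10 SDK required)"
    return 1
  fi
  major=${sdk%%.*}
  case "$major" in
    ''|*[!0-9]*) echo "dotnet: unrecognized version '$sdk'"; return 1 ;;
  esac
  if [ "$major" -lt 10 ]; then
    echo "dotnet: $sdk (assumed minimum: 10.x)"
    return 1
  fi
  echo "dotnet: $sdk"
  if command -v go >/dev/null 2>&1 && [ -d golang/nats-server ]; then
    echo "go: available, Go comparison enabled"
  else
    echo "go: unavailable, .NET-only results"
  fi
}
```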
## Running Benchmarks
All benchmark tests are tagged with `[Trait("Category", "Benchmark")]`, so a plain `dotnet test` against the solution will **not** run them. Use the `--filter` flag.
```bash
# Run all benchmarks
dotnet test tests/NATS.Server.Benchmark.Tests --filter "Category=Benchmark" -v normal
# Core pub/sub only
dotnet test tests/NATS.Server.Benchmark.Tests --filter "Category=Benchmark&FullyQualifiedName~CorePubSub" -v normal
# Request/reply only
dotnet test tests/NATS.Server.Benchmark.Tests --filter "Category=Benchmark&FullyQualifiedName~RequestReply" -v normal
# JetStream only
dotnet test tests/NATS.Server.Benchmark.Tests --filter "Category=Benchmark&FullyQualifiedName~JetStream" -v normal
# MQTT benchmarks
dotnet test tests/NATS.Server.Benchmark.Tests --filter "Category=Benchmark&FullyQualifiedName~Mqtt" -v normal
# Transport benchmarks (TLS + WebSocket)
dotnet test tests/NATS.Server.Benchmark.Tests --filter "Category=Benchmark&FullyQualifiedName~Transport" -v normal
# A single benchmark by name
dotnet test tests/NATS.Server.Benchmark.Tests --filter "FullyQualifiedName=NATS.Server.Benchmark.Tests.CorePubSub.SinglePublisherThroughputTests.PubNoSub_16B" -v normal
```
Use `-v normal` or `--logger "console;verbosity=detailed"` to see the comparison output in the console. At the default verbosity, `ITestOutputHelper` output from passing tests is not displayed.
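Every filter above follows the same pattern. It starts with `Category=Benchmark` and can add a `FullyQualifiedName~` substring match, joined with `&`. A minimal bash wrapper could build the filter for you. `run_bench` and `DRY_RUN` are hypothetical names and are not part of the repository.

```bash
# Hypothetical convenience wrapper; not part of the repository.
#   run_bench          -> every benchmark
#   run_bench Mqtt     -> only tests whose fully qualified name contains "Mqtt"
# Set DRY_RUN=1 to print the command instead of running it.
run_bench() {
  local filter="Category=Benchmark"
  if [ -n "${1:-}" ]; then
    # '&' ANDs two conditions; '~' is a "contains" match.
    filter="${filter}&FullyQualifiedName~$1"
  fi
  local -a cmd
  cmd=(dotnet test tests/NATS.Server.Benchmark.Tests
       --filter "$filter" -v normal
       --logger "console;verbosity=detailed")
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}
```

Because the wrapper passes the filter as a single array element, the `&` and `;` characters never reach the shell unquoted.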
## Benchmark List
| Test Class | Test Method | What It Measures |
|---|---|---|
| `SinglePublisherThroughputTests` | `PubNoSub_16B` | Publish-only throughput, 16-byte payload |
| `SinglePublisherThroughputTests` | `PubNoSub_128B` | Publish-only throughput, 128-byte payload |
| `PubSubOneToOneTests` | `PubSub1To1_16B` | 1 publisher, 1 subscriber, 16-byte payload |
| `PubSubOneToOneTests` | `PubSub1To1_16KB` | 1 publisher, 1 subscriber, 16 KB payload |
| `FanOutTests` | `FanOut1To4_128B` | 1 publisher, 4 subscribers, 128-byte payload |
| `MultiPubSubTests` | `MultiPubSub4x4_128B` | 4 publishers, 4 subscribers, 128-byte payload |
| `SingleClientLatencyTests` | `RequestReply_SingleClient_128B` | Request/reply round-trip latency, 1 client, 1 service, 128-byte payload |
| `MultiClientLatencyTests` | `RequestReply_10Clients2Services_16B` | Request/reply latency, 10 concurrent clients, 2 queue-group services, 16-byte payload |
| `SyncPublishTests` | `JSSyncPublish_16B_MemoryStore` | JetStream synchronous publish, 16-byte payload, memory-backed stream |
| `AsyncPublishTests` | `JSAsyncPublish_128B_FileStore` | JetStream async batch publish, 128-byte payload, file-backed stream |
| `FileStoreAppendBenchmarks` | `FileStore_AppendAsync_128B_Throughput` | FileStore direct append throughput, 128-byte payload |
| `FileStoreAppendBenchmarks` | `FileStore_LoadLastBySubject_Throughput` | FileStore hot-path subject index lookup throughput |
| `FileStoreAppendBenchmarks` | `FileStore_PurgeEx_Trim_Overhead` | FileStore purge/trim maintenance overhead under repeated updates |
| `OrderedConsumerTests` | `JSOrderedConsumer_Throughput` | JetStream ordered ephemeral consumer read throughput |
| `DurableConsumerFetchTests` | `JSDurableFetch_Throughput` | JetStream durable consumer fetch-in-batches throughput |
| `MqttThroughputTests` | `MqttPubSub_128B` | MQTT pub/sub throughput, 128-byte payload, QoS 0 |
| `MqttThroughputTests` | `MqttCrossProtocol_NatsPub_MqttSub_128B` | Cross-protocol NATS→MQTT routing throughput, 128-byte payload |
| `TlsPubSubTests` | `TlsPubSub1To1_128B` | TLS pub/sub 1:1 throughput, 128-byte payload |
| `TlsPubSubTests` | `TlsPubNoSub_128B` | TLS publish-only throughput, 128-byte payload |
| `WebSocketPubSubTests` | `WsPubSub1To1_128B` | WebSocket pub/sub 1:1 throughput, 128-byte payload |
| `WebSocketPubSubTests` | `WsPubNoSub_128B` | WebSocket publish-only throughput, 128-byte payload |
## Output Format
Each test writes a side-by-side comparison to xUnit's test output:
```
=== Single Publisher (16B) ===
Go: 2,436,416 msg/s | 37.2 MB/s | 41 ms
.NET: 1,425,767 msg/s | 21.8 MB/s | 70 ms
Ratio: 0.59x (.NET / Go)
```
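You can check the numbers in a block by hand. The sample above is consistent with MB/s = msg/s × payload bytes ÷ 2^20 (2,436,416 × 16 ÷ 1,048,576 ≈ 37.2). It is also consistent with the ratio being .NET msg/s ÷ Go msg/s. Both formulas are inferred from the sample, not read from `BenchmarkResult.cs`. Here is a minimal bash/awk sketch; the helper names are hypothetical.

```bash
# Hypothetical helpers for sanity-checking a comparison block.
# mb_per_sec MSGS_PER_SEC PAYLOAD_BYTES -> MB/s, assuming 1 MB = 2^20 bytes
mb_per_sec() {
  awk -v r="$1" -v p="$2" 'BEGIN { printf "%.1f\n", r * p / 1048576 }'
}
# ratio DOTNET_MSGS_PER_SEC GO_MSGS_PER_SEC -> .NET / Go, two decimals
ratio() {
  awk -v n="$1" -v g="$2" 'BEGIN { printf "%.2f\n", n / g }'
}
mb_per_sec 2436416 16    # 37.2 (Go row)
mb_per_sec 1425767 16    # 21.8 (.NET row)
ratio 1425767 2436416    # 0.59
```

Pass plain numbers without thousands separators. awk stops parsing `2,436,416` at the first comma.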
Request/reply tests also include latency percentiles:
```
Latency (us):
         P50     P95     P99     Min     Max
Go     104.5   124.2   146.2    82.8  1204.5
.NET   134.7   168.0   190.5    91.6  3469.5
```
If Go is not available, only the .NET result is printed.
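`LatencyTracker.cs` does the percentile math, but this README does not document its exact percentile definition. For ad hoc analysis of raw samples, you can sketch a nearest-rank summary with the same columns in bash and awk. `latency_summary` is a hypothetical helper, and nearest-rank is an assumption. The harness may interpolate between samples instead, which gives slightly different values.

```bash
# Hypothetical helper: reads one latency sample (in microseconds) per line on
# stdin and prints P50/P95/P99/Min/Max using the nearest-rank method,
# i.e. the ceil(p/100 * n)-th smallest sample. Exits non-zero on empty input.
latency_summary() {
  sort -n | awk '
    { v[NR] = $1 }
    function rank(p,    x, i) {
      x = p * NR / 100          # 1-based rank before rounding up
      i = int(x)
      if (i < x) i++            # ceiling
      if (i < 1) i = 1
      return v[i]
    }
    END {
      if (NR == 0) exit 1
      printf "P50=%s P95=%s P99=%s Min=%s Max=%s\n", rank(50), rank(95), rank(99), v[1], v[NR]
    }'
}
```

For example: `latency_summary < samples.txt`.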
## Updating benchmarks_comparison.md
After running the full benchmark suite, update `benchmarks_comparison.md` in the repository root with the new numbers:
1. Run all benchmarks with detailed output:
   ```bash
   dotnet test tests/NATS.Server.Benchmark.Tests \
     --filter "Category=Benchmark" \
     -v normal \
     --logger "console;verbosity=detailed" 2>&1 | tee /tmp/bench-output.txt
   ```
2. Open `/tmp/bench-output.txt` and extract the comparison blocks from the "Standard Output Messages" sections.
3. Update the tables in `benchmarks_comparison.md` with the new msg/s, MB/s, ratio, and latency values. Update the date on the first line and the environment description.
4. Review the Summary table and Key Observations, and update the assessments if ratios have changed significantly.
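You can partly automate step 2. The bash/awk sketch below assumes the comparison blocks appear in the log in the format shown under Output Format, possibly indented by the console logger. `extract_comparisons` is a hypothetical helper. It also keeps any unrelated log line that happens to start with `Go:`, `.NET:` or `Ratio:`, so review the result before pasting it into `benchmarks_comparison.md`.

```bash
# Hypothetical helper: print only the comparison blocks from a dotnet test log.
# Keeps "=== ... ===" headers, "Go:"/".NET:"/"Ratio:" lines and the latency
# table; leading/trailing whitespace added by the console logger is stripped.
extract_comparisons() {
  awk '
    {
      line = $0
      sub(/^[ \t]+/, "", line)
      sub(/[ \t\r]+$/, "", line)
    }
    line ~ /^=== .* ===$/ { if (n++) print ""; print line; next }
    line ~ /^(Go|\.NET):/ || line ~ /^Ratio:/ { print line; next }
    line ~ /^Latency \(us\):$/ || line ~ /^P50[ \t]/ { print line; next }
    line ~ /^(Go|\.NET)[ \t]+[0-9]/ { print line }
  ' "$1"
}
```

For example: `extract_comparisons /tmp/bench-output.txt > /tmp/bench-blocks.txt`.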
## Architecture
```
Infrastructure/
  PortAllocator.cs               # Ephemeral port allocation (bind to port 0)
  DotNetServerProcess.cs         # Builds + launches NATS.Server.Host
  GoServerProcess.cs             # Builds + launches golang/nats-server
  CoreServerPairFixture.cs       # IAsyncLifetime: Go + .NET servers for core tests
  JetStreamServerPairFixture.cs  # IAsyncLifetime: Go + .NET servers with JetStream
  MqttServerFixture.cs           # IAsyncLifetime: .NET server with MQTT + JetStream
  TlsServerFixture.cs            # IAsyncLifetime: .NET server with TLS
  WebSocketServerFixture.cs      # IAsyncLifetime: .NET server with WebSocket
  Collections.cs                 # xUnit collection definitions
Harness/
  BenchmarkResult.cs             # Result record (msg/s, MB/s, latencies)
  LatencyTracker.cs              # Pre-allocated sample buffer, percentile math
  BenchmarkRunner.cs             # Warmup + timed measurement loop
  BenchmarkResultWriter.cs       # Formats side-by-side comparison output
```
Server pair fixtures start both servers once per xUnit collection and expose `CreateGoClient()` / `CreateDotNetClient()` factory methods. Test parallelization is disabled at the assembly level (`AssemblyInfo.cs`) to prevent resource contention between collections.
## Notes
- The Go binary at `golang/nats-server/nats-server` is built automatically on first run via `go build`. Subsequent runs reuse the cached binary.
- If Go is not installed or `golang/nats-server/` does not exist, Go benchmarks are skipped and only .NET results are reported.
- JetStream tests create temporary store directories under `$TMPDIR` and clean them up on teardown.
- Message counts are intentionally conservative (2K to 10K per test) to keep the full suite under 2 minutes while still producing meaningful throughput ratios. For higher-fidelity numbers, increase the counts in individual test methods or the `BenchmarkRunner` configuration.
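The first two notes describe a build-once-then-reuse behavior, which you can approximate in bash. This is a hypothetical sketch: `ensure_go_server` is not in the repository, and the real logic lives in `GoServerProcess.cs` and may differ. It assumes you run it from the repository root. It also assumes that checking for the source directory before the cached binary is acceptable.

```bash
# Hypothetical approximation of the build-or-skip logic; the real harness
# implements this in GoServerProcess.cs. Run from the repository root.
# Returns 0 when a usable binary exists, 1 when Go benchmarks should be skipped.
ensure_go_server() {
  local src="golang/nats-server"
  local bin="$src/nats-server"
  # No Go source checkout: skip Go benchmarks.
  [ -d "$src" ] || return 1
  # Cached binary from an earlier run: reuse it.
  [ -x "$bin" ] && return 0
  # No Go toolchain: skip Go benchmarks.
  command -v go >/dev/null 2>&1 || return 1
  # First run: build the server binary inside the source tree.
  (cd "$src" && go build -o nats-server .) || return 1
}
```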