feat: add benchmark test project for Go vs .NET server comparison

Side-by-side performance benchmarks using NATS.Client.Core against both
servers on ephemeral ports. Includes core pub/sub, request/reply latency,
and JetStream throughput tests with comparison output and
benchmarks_comparison.md results. Also fixes timestamp flakiness in
StoreInterfaceTests by using explicit timestamps.
This commit is contained in:
Joseph Doherty
2026-03-13 01:23:31 -04:00
parent e9c86c51c3
commit 37575dc41c
28 changed files with 2264 additions and 12 deletions

View File

@@ -0,0 +1,113 @@
# NATS.Server.Benchmark.Tests
Side-by-side performance comparison of the Go and .NET NATS server implementations. Both servers are launched as child processes on ephemeral ports and exercised with identical workloads using the `NATS.Client.Core` / `NATS.Client.JetStream` NuGet client libraries.
## Prerequisites
- .NET 10 SDK
- Go toolchain (for Go server comparison; benchmarks still run .NET-only if Go is unavailable)
- The Go NATS server source at `golang/nats-server/` (the Go binary is built automatically on first run)
## Running Benchmarks
All benchmark tests are tagged with `[Trait("Category", "Benchmark")]`, so a plain `dotnet test` against the solution will **not** run them. Use the `--filter` flag.
```bash
# Run all benchmarks
dotnet test tests/NATS.Server.Benchmark.Tests --filter "Category=Benchmark" -v normal
# Core pub/sub only
dotnet test tests/NATS.Server.Benchmark.Tests --filter "Category=Benchmark&FullyQualifiedName~CorePubSub" -v normal
# Request/reply only
dotnet test tests/NATS.Server.Benchmark.Tests --filter "Category=Benchmark&FullyQualifiedName~RequestReply" -v normal
# JetStream only
dotnet test tests/NATS.Server.Benchmark.Tests --filter "Category=Benchmark&FullyQualifiedName~JetStream" -v normal
# A single benchmark by name
dotnet test tests/NATS.Server.Benchmark.Tests --filter "FullyQualifiedName=NATS.Server.Benchmark.Tests.CorePubSub.SinglePublisherThroughputTests.PubNoSub_16B" -v normal
```
Use `-v normal` or `--logger "console;verbosity=detailed"` to see the comparison output in the console. Without verbosity, xUnit suppresses `ITestOutputHelper` output for passing tests.
## Benchmark List
| Test Class | Test Method | What It Measures |
|---|---|---|
| `SinglePublisherThroughputTests` | `PubNoSub_16B` | Publish-only throughput, 16-byte payload |
| `SinglePublisherThroughputTests` | `PubNoSub_128B` | Publish-only throughput, 128-byte payload |
| `PubSubOneToOneTests` | `PubSub1To1_16B` | 1 publisher, 1 subscriber, 16-byte payload |
| `PubSubOneToOneTests` | `PubSub1To1_16KB` | 1 publisher, 1 subscriber, 16 KB payload |
| `FanOutTests` | `FanOut1To4_128B` | 1 publisher, 4 subscribers, 128-byte payload |
| `MultiPubSubTests` | `MultiPubSub4x4_128B` | 4 publishers, 4 subscribers, 128-byte payload |
| `SingleClientLatencyTests` | `RequestReply_SingleClient_128B` | Request/reply round-trip latency, 1 client, 1 service |
| `MultiClientLatencyTests` | `RequestReply_10Clients2Services_16B` | Request/reply latency, 10 concurrent clients, 2 queue-group services |
| `SyncPublishTests` | `JSSyncPublish_16B_MemoryStore` | JetStream synchronous publish, memory-backed stream |
| `AsyncPublishTests` | `JSAsyncPublish_128B_FileStore` | JetStream async batch publish, file-backed stream |
| `OrderedConsumerTests` | `JSOrderedConsumer_Throughput` | JetStream ordered ephemeral consumer read throughput |
| `DurableConsumerFetchTests` | `JSDurableFetch_Throughput` | JetStream durable consumer fetch-in-batches throughput |
## Output Format
Each test writes a side-by-side comparison to xUnit's test output:
```
=== Single Publisher (16B) ===
Go: 2,436,416 msg/s | 37.2 MB/s | 41 ms
.NET: 1,425,767 msg/s | 21.8 MB/s | 70 ms
Ratio: 0.59x (.NET / Go)
```
Request/reply tests also include latency percentiles:
```
Latency (us):
P50 P95 P99 Min Max
Go 104.5 124.2 146.2 82.8 1204.5
.NET 134.7 168.0 190.5 91.6 3469.5
```
If Go is not available, only the .NET result is printed.
## Updating benchmarks_comparison.md
After running the full benchmark suite, update `benchmarks_comparison.md` in the repository root with the new numbers:
1. Run all benchmarks with detailed output:
```bash
dotnet test tests/NATS.Server.Benchmark.Tests \
--filter "Category=Benchmark" \
-v normal \
--logger "console;verbosity=detailed" 2>&1 | tee /tmp/bench-output.txt
```
2. Open `/tmp/bench-output.txt` and extract the comparison blocks from the "Standard Output Messages" sections.
3. Update the tables in `benchmarks_comparison.md` with the new msg/s, MB/s, ratio, and latency values. Update the date on the first line and the environment description.
4. Review the Summary table and Key Observations — update assessments if ratios have changed significantly.
## Architecture
```
Infrastructure/
PortAllocator.cs # Ephemeral port allocation (bind to port 0)
DotNetServerProcess.cs # Builds + launches NATS.Server.Host
GoServerProcess.cs # Builds + launches golang/nats-server
CoreServerPairFixture.cs # IAsyncLifetime: Go + .NET servers for core tests
JetStreamServerPairFixture # IAsyncLifetime: Go + .NET servers with JetStream
Collections.cs # xUnit collection definitions
Harness/
BenchmarkResult.cs # Result record (msg/s, MB/s, latencies)
LatencyTracker.cs # Pre-allocated sample buffer, percentile math
BenchmarkRunner.cs # Warmup + timed measurement loop
BenchmarkResultWriter.cs # Formats side-by-side comparison output
```
Server pair fixtures start both servers once per xUnit collection and expose `CreateGoClient()` / `CreateDotNetClient()` factory methods. Test parallelization is disabled at the assembly level (`AssemblyInfo.cs`) to prevent resource contention between collections.
## Notes
- The Go binary at `golang/nats-server/nats-server` is built automatically on first run via `go build`. Subsequent runs reuse the cached binary.
- If Go is not installed or `golang/nats-server/` does not exist, Go benchmarks are skipped and only .NET results are reported.
- JetStream tests create temporary store directories under `$TMPDIR` and clean them up on teardown.
- Message counts are intentionally conservative (2K10K per test) to keep the full suite under 2 minutes while still producing meaningful throughput ratios. For higher-fidelity numbers, increase the counts in individual test methods or the `BenchmarkRunner` configuration.