Joseph Doherty 4de691c9c5 perf: add FileStore buffered writes, O(1) state tracking, and eliminate redundant per-publish work

Implement Go-parity background flush loop (coalesce 16KB/8ms) in MsgBlock/FileStore,
replace O(n) GetStateAsync with incremental counters, skip PruneExpired/LoadAsync/
PrunePerSubject when not needed, and bypass RAFT for single-replica streams. Fix counter
tracking bugs in RemoveMsg/EraseMsg/TTL expiry and ObjectDisposedException races in
flush loop disposal. FileStore optimizations verified with 3112/3112 JetStream tests
passing; async publish benchmark remains at ~174 msg/s due to E2E protocol path bottleneck.

2026-03-13 03:11:11 -04:00


Go vs .NET NATS Server — Benchmark Comparison

Benchmark run: 2026-03-13. Both servers ran on the same machine and were tested with identical NATS.Client.Core workloads. Test parallelization was disabled to avoid resource contention.

Environment: Apple M4, .NET 10, Go nats-server (latest from golang/nats-server/).

Core NATS — Pub/Sub Throughput

Single Publisher (no subscribers)

Payload	Go msg/s	Go MB/s	.NET msg/s	.NET MB/s	Ratio (.NET/Go)
16 B	2,436,416	37.2	1,425,767	21.8	0.59x
128 B	2,143,434	261.6	1,654,692	202.0	0.77x
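The MB/s and ratio columns follow directly from the msg/s column and the payload size. As a hedged sketch (the function names are illustrative and not part of the benchmark harness), the figures in the table are reproduced when one MB is taken as 1,048,576 bytes, which is an inference from the numbers rather than something the report states:

```python
def throughput_mb_per_s(msgs_per_s, payload_bytes):
    # Assumption: "MB" means 1024 * 1024 bytes; this matches the table values.
    return msgs_per_s * payload_bytes / (1024 * 1024)


def ratio(dotnet_msgs_per_s, go_msgs_per_s):
    # Ratio (.NET/Go): values below 1.0 mean the .NET server is slower.
    return dotnet_msgs_per_s / go_msgs_per_s


# 16 B row of the single-publisher table.
go_mb = round(throughput_mb_per_s(2_436_416, 16), 1)
dotnet_mb = round(throughput_mb_per_s(1_425_767, 16), 1)
row_ratio = round(ratio(1_425_767, 2_436_416), 2)
print(go_mb, dotnet_mb, row_ratio)
```

The same two functions apply to every throughput table in this report.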

Publisher + Subscriber (1:1)

Payload	Go msg/s	Go MB/s	.NET msg/s	.NET MB/s	Ratio (.NET/Go)
16 B	1,140,225	17.4	207,654	3.2	0.18x
16 KB	41,762	652.5	34,429	538.0	0.82x

Fan-Out (1 Publisher : 4 Subscribers)

Payload	Go msg/s	Go MB/s	.NET msg/s	.NET MB/s	Ratio (.NET/Go)
128 B	3,192,313	389.7	581,284	71.0	0.18x

Multi-Publisher / Multi-Subscriber (4P x 4S)

Payload	Go msg/s	Go MB/s	.NET msg/s	.NET MB/s	Ratio (.NET/Go)
128 B	269,445	32.9	529,808	64.7	1.97x

Core NATS — Request/Reply Latency

Single Client, Single Service

Payload	Go msg/s	.NET msg/s	Ratio	Go P50 (us)	.NET P50 (us)	Go P99 (us)	.NET P99 (us)
128 B	9,347	7,215	0.77x	104.5	134.7	146.2	190.5

10 Clients, 2 Services (Queue Group)

Payload	Go msg/s	.NET msg/s	Ratio	Go P50 (us)	.NET P50 (us)	Go P99 (us)	.NET P99 (us)
16 B	30,893	25,861	0.84x	315.0	370.2	451.1	595.0
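The throughput and latency columns can be cross-checked against each other. If each client waits for a reply before sending its next request (an assumption about the harness, not something the report states), throughput should be close to the number of clients divided by the typical latency. The sketch below uses P50 as a stand-in for mean latency, which is also an assumption:

```python
def closed_loop_throughput(clients, latency_us):
    # Each client completes roughly one request per latency interval.
    return clients * 1_000_000 / latency_us


# (clients, P50 latency in microseconds, measured msg/s) from the two tables.
rows = {
    "go_single": (1, 104.5, 9_347),
    "dotnet_single": (1, 134.7, 7_215),
    "go_queue_group": (10, 315.0, 30_893),
    "dotnet_queue_group": (10, 370.2, 25_861),
}

relative_error = {}
for name, (clients, p50_us, measured) in rows.items():
    estimate = closed_loop_throughput(clients, p50_us)
    relative_error[name] = abs(estimate - measured) / measured
    print(f"{name}: estimated {estimate:.0f} msg/s, measured {measured}")
```

Under these assumptions every estimate lands within 5% of the measured rate, which suggests the request/reply numbers are latency-bound rather than limited by server throughput.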

JetStream — Publication

Mode	Payload	Storage	Go msg/s	.NET msg/s	Ratio (.NET/Go)
Synchronous	16 B	Memory	16,783	13,815	0.82x
Async (batch)	128 B	File	210,387	174	<0.01x

Note: Async file store publish remains extremely slow after FileStore-level optimizations (buffered writes, O(1) state tracking, redundant work elimination). The bottleneck is in the E2E network/protocol processing path (synchronous .GetAwaiter().GetResult() calls in the client read loop), not storage I/O.
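To put the async file publish rates in perspective, it helps to convert them into a time budget per message. This is plain arithmetic on the table values; the helper name is illustrative, and the conversion assumes publishes are handled one at a time:

```python
def per_message_us(msgs_per_s):
    # Average wall-clock time spent per message, in microseconds.
    return 1_000_000 / msgs_per_s


dotnet_ms = round(per_message_us(174) / 1000, 2)   # milliseconds per message
go_us = round(per_message_us(210_387), 2)          # microseconds per message
print(dotnet_ms, go_us)
```

A budget of several milliseconds per message on the .NET side, against a few microseconds on the Go side, is consistent with each publish waiting on something much slower than a buffered write.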

JetStream — Consumption

Mode	Go msg/s	.NET msg/s	Ratio (.NET/Go)
Ordered ephemeral consumer	109,519	N/A	N/A
Durable consumer fetch	639,247	80,792	0.13x

Note: Ordered ephemeral consumer is not yet fully supported on the .NET server (API timeout during consumer creation).

Summary

Category	Ratio Range	Assessment
Pub-only throughput	0.59x–0.77x	Good — within 2x
Pub/sub (large payload)	0.82x	Good
Pub/sub (small payload)	0.18x	Needs optimization
Fan-out	0.18x	Needs optimization
Multi pub/sub	1.97x	.NET faster (likely measurement artifact at low counts)
Request/reply latency	0.77x–0.84x	Good
JetStream sync publish	0.82x	Good
JetStream async file publish	~0x	Broken — E2E protocol path bottleneck
JetStream durable fetch	0.13x	Needs optimization
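The ratios in the summary can also be read as slowdown factors, which is how the observations in this report phrase them ("5x slower", "8x slower"). A minimal sketch of the conversion, using the raw msg/s values from the tables (the function name is illustrative):

```python
def slowdown(go_msgs_per_s, dotnet_msgs_per_s):
    # How many times slower the .NET server is; the inverse of Ratio (.NET/Go).
    return go_msgs_per_s / dotnet_msgs_per_s


pubsub_small = round(slowdown(1_140_225, 207_654), 1)   # 1:1 pub/sub, 16 B
fan_out = round(slowdown(3_192_313, 581_284), 1)        # 1:4 fan-out, 128 B
durable_fetch = round(slowdown(639_247, 80_792), 1)     # JetStream durable fetch
print(pubsub_small, fan_out, durable_fetch)
```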

Key Observations

Pub-only and request/reply are within striking distance (0.59x–0.84x), suggesting the core message path is reasonably well ported.
Small-payload pub/sub and fan-out are about 5.5x slower (0.18x ratio). The bottleneck is likely in the subscription dispatch / message delivery hot path, that is, the SubList.Match() → MSG write loop.
JetStream file store async publish remains at ~174 msg/s despite FileStore-level optimizations (buffered writes with background flush loop, O(1) state tracking, eliminating redundant per-publish work). The bottleneck is in the E2E network/protocol processing path — synchronous .GetAwaiter().GetResult() calls in the client read loop block the async pipeline.
JetStream consumption (durable fetch) is about 8x slower than Go (0.13x ratio). Ordered ephemeral consumers do not work yet on the .NET server: consumer creation fails with an API timeout.
The multi-pub/sub result showing .NET faster is likely a measurement artifact from the small message count (2,000 per publisher) — not representative at scale.
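The async publish observation above describes a read loop that waits for each publish to finish before reading the next message. The following Python sketch illustrates that serialization effect in general terms. It is not the server's code: Python has no direct equivalent of .GetAwaiter().GetResult(), so awaiting each store inline is used as a stand-in, and store, serialized_read_loop, and pipelined_read_loop are hypothetical names.

```python
import asyncio


async def store(seq, log):
    # Stand-in for a storage append that yields once before completing.
    await asyncio.sleep(0)
    log.append(f"stored {seq}")


async def serialized_read_loop(count):
    # Each message is fully stored before the next one is read.
    log = []
    for seq in range(1, count + 1):
        log.append(f"read {seq}")
        await store(seq, log)
    return log


async def pipelined_read_loop(count):
    # Reads continue while earlier stores are still in flight.
    log = []
    pending = []
    for seq in range(1, count + 1):
        log.append(f"read {seq}")
        pending.append(asyncio.create_task(store(seq, log)))
    await asyncio.gather(*pending)
    return log


serialized = asyncio.run(serialized_read_loop(3))
pipelined = asyncio.run(pipelined_read_loop(3))
print(serialized)
print(pipelined)
```

In the serialized variant every read is followed by its store, so per-message latency adds up directly. In the pipelined variant all three reads are logged before any store completes, which is the behavior an async batch publisher relies on to reach high throughput.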
