.NET 10 Optimization Opportunities for NATS.Server
This document identifies the highest-value places in the current .NET port that are still leaving performance on the table relative to what modern .NET 10 can do well. The focus is runtime behavior in the current codebase, not generic style guidance.
The ranking is based on likely payoff in NATS workloads:
- Protocol parsing and per-message delivery paths
- Subscription matching and routing fanout
- JetStream storage hot paths
- Route, leaf, MQTT, and monitoring paths with avoidable allocation churn
Several areas already use Span<T>, ReadOnlyMemory<byte>, SequenceReader<byte>, and stack allocation correctly. The remaining gaps are mostly where the code falls back to string, byte[], List<T>, ToArray(), LINQ, or repeated serialization work on hot paths.
Detailed Implementation Plans
Highest ROI
1. Keep parser state in bytes/spans longer
- Files:
  - `src/NATS.Server/Protocol/NatsParser.cs`
  - `src/NATS.Server/NatsClient.cs`
- Current issue:
  - `NatsParser` tokenizes control lines with spans, but then converts subjects, reply subjects, queue names, SIDs, and JSON payloads into `string` and `byte[]` immediately.
  - `TryReadPayload()` always allocates a new `byte[]` and copies the payload, even when the underlying `ReadOnlySequence<byte>` is already usable.
  - `ParseConnect()` and `ParseInfo()` call `ToArray()` on the JSON portion.
- Why it matters:
  - This runs for every client protocol command.
  - The parser sits directly on the publish/subscribe hot path, so small per-command allocations scale badly under fan-in.
- Recommended optimization:
  - Introduce a split parsed representation:
    - a hot-path `ref struct` or `readonly struct` view carrying `ReadOnlySpan<byte>` / `ReadOnlySequence<byte>` slices for subject, reply, SID, queue, and payload
    - a slower materialization path only when code actually needs `string`
  - Store pending parser state as byte slices or pooled byte segments instead of `_pendingSubject` / `_pendingReplyTo` strings.
  - For single-segment payloads, hand through a `ReadOnlyMemory<byte>` slice rather than copying to a new array.
  - Only copy multi-segment payloads when required.
  - Use `SearchValues<byte>` for whitespace scanning and command detection instead of manual per-byte branching where it simplifies repeated searches.
- .NET 10 techniques:
  - `ref struct`
  - `ReadOnlySpan<byte>`
  - `ReadOnlySequence<byte>`
  - `SearchValues<byte>`
  - `Encoding.ASCII.GetString(ReadOnlySpan<byte>)` only at materialization boundaries
- Risk / complexity:
  - Medium to high. This touches command parsing contracts and downstream consumers.
  - Worth doing first because it reduces allocations before messages enter the rest of the server.
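
The split representation above could look roughly like the following. This is a minimal sketch, not the current `NatsParser` contract: `PubArgsView` and `PubArgsParser` are hypothetical names, and the sketch assumes the caller has already stripped the `PUB ` verb and the trailing CRLF from the control line.

```csharp
using System;
using System.Buffers;
using System.Buffers.Text;
using System.Text;

// Hypothetical hot-path view: slices into the receive buffer, no strings.
public readonly ref struct PubArgsView
{
    public PubArgsView(ReadOnlySpan<byte> subject, ReadOnlySpan<byte> replyTo, int payloadLength)
    {
        Subject = subject;
        ReplyTo = replyTo;
        PayloadLength = payloadLength;
    }

    public ReadOnlySpan<byte> Subject { get; }
    public ReadOnlySpan<byte> ReplyTo { get; } // empty when no reply subject was sent
    public int PayloadLength { get; }

    // Materialization boundary: a string is allocated only when a caller asks for one.
    public string GetSubjectString() => Encoding.ASCII.GetString(Subject);
}

public static class PubArgsParser
{
    private static readonly SearchValues<byte> Whitespace = SearchValues.Create(" \t"u8);

    // Parses the arguments that follow "PUB ": "<subject> [reply-to] <#bytes>".
    public static bool TryParse(ReadOnlySpan<byte> args, out PubArgsView view)
    {
        view = default;
        ReadOnlySpan<byte> first = NextToken(args, out ReadOnlySpan<byte> rest);
        ReadOnlySpan<byte> second = NextToken(rest, out rest);
        ReadOnlySpan<byte> third = NextToken(rest, out rest);
        if (second.IsEmpty || !NextToken(rest, out _).IsEmpty)
            return false; // too few or too many tokens

        ReadOnlySpan<byte> replyTo = third.IsEmpty ? ReadOnlySpan<byte>.Empty : second;
        ReadOnlySpan<byte> size = third.IsEmpty ? second : third;
        if (!Utf8Parser.TryParse(size, out int length, out int consumed)
            || consumed != size.Length || length < 0)
            return false;

        view = new PubArgsView(first, replyTo, length);
        return true;
    }

    // Returns the next whitespace-delimited token and the unparsed remainder.
    private static ReadOnlySpan<byte> NextToken(ReadOnlySpan<byte> input, out ReadOnlySpan<byte> rest)
    {
        int start = input.IndexOfAnyExcept(Whitespace);
        if (start < 0)
        {
            rest = ReadOnlySpan<byte>.Empty;
            return ReadOnlySpan<byte>.Empty;
        }
        input = input[start..];
        int end = input.IndexOfAny(Whitespace);
        if (end < 0) end = input.Length;
        rest = input[end..];
        return input[..end];
    }
}
```

Because the view is a `ref struct`, it cannot be stored on the heap or carried across an `await`. Code that needs to keep a subject beyond the current parse step would have to copy it or materialize a string at that point.
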
2. Remove string-heavy trie traversal in SubList
- Files:
  - `src/NATS.Server/Subscriptions/SubList.cs`
  - `src/NATS.Server/Subscriptions/SubjectMatch.cs`
- Current issue:
  - Insert/remove paths repeatedly call `token.ToString()`.
  - Routed subscription keys are synthesized as `"route|account|subject|queue"` strings and later split back with `Split('|')`.
  - Match path tokenization and cache population allocate arrays/lists and depend on string tokens.
  - `RemoveRemoteSubs()` and `RemoveRemoteSubsForAccount()` call `_remoteSubs.ToArray()` and re-parse keys on every sweep.
- Why it matters:
  - `SubList.Match()` is one of the most performance-sensitive operations in the server.
  - Remote interest tracking becomes more expensive as the route/leaf topology grows.
- Recommended optimization:
  - Replace composite routed-sub string keys with a dedicated value key:
    - `readonly record struct RoutedSubKey(string RouteId, string Account, string Subject, string? Queue)`
    - or a plain `readonly struct` with a custom comparer if profiling shows hash/comparison cost matters
  - Keep tokenized subjects in a pooled or cached token form for exact subjects.
  - Investigate a span-based token walker for matching so exact-subject lookups avoid `string[]` creation entirely.
  - Replace temporary `List<Subscription>` / `List<List<Subscription>>` creation in `Match()` with pooled builders or `ArrayBufferWriter<T>`.
  - For remote-sub cleanup, iterate dictionary entries without `ToArray()` and avoid `Split`.
- .NET 10 techniques:
  - `readonly struct` / `readonly record struct` for composite keys
  - `ReadOnlySpan<char>` token parsing
  - pooled builders via `ArrayPool<T>` or `ArrayBufferWriter<T>`
- Risk / complexity:
  - Medium. The data model change is straightforward; changing trie matching internals requires careful parity testing.
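
A sketch of the value-key approach for remote-sub cleanup follows. `RemoteSubTable` is a hypothetical stand-in, and it assumes the remote subs live in a plain `Dictionary<TKey, TValue>` accessed under the caller's lock; the real `_remoteSubs` collection type may differ. It relies on the documented behavior (.NET Core 3.0 and later) that `Dictionary.Remove` does not invalidate an active enumerator.

```csharp
using System;
using System.Collections.Generic;

// Composite key as a value type: no string concatenation, no Split('|').
public readonly record struct RoutedSubKey(string RouteId, string Account, string Subject, string? Queue);

// Hypothetical container used only for illustration.
public sealed class RemoteSubTable
{
    private readonly Dictionary<RoutedSubKey, int> _remoteSubs = new();

    public int Count => _remoteSubs.Count;

    public void Add(string routeId, string account, string subject, string? queue)
    {
        var key = new RoutedSubKey(routeId, account, subject, queue);
        _remoteSubs.TryGetValue(key, out int refs);
        _remoteSubs[key] = refs + 1;
    }

    // Sweeps every entry for one route without ToArray() and without re-parsing keys.
    public int RemoveForRoute(string routeId)
    {
        int removed = 0;
        foreach (KeyValuePair<RoutedSubKey, int> entry in _remoteSubs)
        {
            if (string.Equals(entry.Key.RouteId, routeId, StringComparison.Ordinal)
                && _remoteSubs.Remove(entry.Key))
            {
                removed++;
            }
        }
        return removed;
    }
}
```

The generated record-struct equality compares the string fields ordinally, which matches how the concatenated string key would have compared. A custom comparer only becomes interesting if profiling shows the default hash is a bottleneck.
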
3. Eliminate avoidable message duplication in FileStore
- Files:
  - `src/NATS.Server/JetStream/Storage/FileStore.cs`
  - `src/NATS.Server/JetStream/Storage/MsgBlock.cs`
  - `src/NATS.Server/JetStream/Storage/StoredMessage.cs`
- Current issue:
  - `AppendAsync()` transforms payload for persistence and often also keeps another managed copy in `_messages`.
  - `StoreMsg()` creates a combined `byte[]` for headers + payload.
  - Many maintenance operations (`TrimToMaxMessages`, `PurgeEx`, `LoadLastBySubjectAsync`, `ListAsync`) use LINQ over `_messages.Values`, causing iterator allocations and repeated scans.
  - Snapshot creation base64-encodes transformed payloads, forcing extra copies.
- Why it matters:
  - JetStream storage code runs continuously under persistence-heavy workloads.
  - It is both allocation-sensitive and memory-residency-sensitive.
- Recommended optimization:
  - Split stored payload representation into:
    - persisted payload bytes
    - logical payload view
    - optional headers view
  - Avoid constructing concatenated header+payload arrays when the record format can encode both spans directly.
  - Rework `StoredMessage` so hot metadata stays compact; consider a smaller `readonly struct` for indexes/metadata while payload storage remains reference-based.
  - Replace LINQ scans in hot maintenance paths with explicit loops.
  - Add per-subject indexes or rolling pointers for operations currently implemented as full scans when those operations are expected to be common.
- .NET 10 techniques:
  - `ReadOnlyMemory<byte>` slices over shared buffers
  - `readonly struct` for compact metadata/index entries
  - explicit loops over LINQ in storage hot paths
  - `CollectionsMarshal` where safe for dictionary/list access in tight loops
- Risk / complexity:
  - High. This area needs careful correctness validation for retention, snapshots, and recovery.
  - High payoff for persistent streams.
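
To illustrate writing headers and payload without a concatenated intermediate array, here is a small sketch that encodes both spans directly into an `IBufferWriter<byte>`. The record layout (two little-endian `int32` lengths followed by the bytes) is invented for the example and is not the actual `MsgBlock` format; `RecordEncoder` is a hypothetical name.

```csharp
using System;
using System.Buffers;
using System.Buffers.Binary;

public static class RecordEncoder
{
    private const int PrefixSize = 8; // two int32 length fields (assumed layout)

    // Hypothetical layout: [int32 headerLen][int32 payloadLen][headers][payload].
    public static void WriteRecord(IBufferWriter<byte> writer, ReadOnlySpan<byte> headers, ReadOnlySpan<byte> payload)
    {
        int total = PrefixSize + headers.Length + payload.Length;
        Span<byte> dest = writer.GetSpan(total); // at least 'total' bytes

        BinaryPrimitives.WriteInt32LittleEndian(dest, headers.Length);
        BinaryPrimitives.WriteInt32LittleEndian(dest[4..], payload.Length);
        headers.CopyTo(dest[PrefixSize..]);
        payload.CopyTo(dest[(PrefixSize + headers.Length)..]);

        writer.Advance(total);
    }
}
```

Each span is copied once into the destination buffer, so the only allocation left is whatever the writer itself needs to grow.
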
4. Reduce formatting and copy overhead in route and leaf message sends
- Files:
  - `src/NATS.Server/Routes/RouteConnection.cs`
  - `src/NATS.Server/LeafNodes/LeafConnection.cs`
- Current issue:
  - Control lines are built with string interpolation, converted with `Encoding.ASCII.GetBytes`, then written separately from payload and trailer.
  - `"\r\n"u8.ToArray()` allocates for every send.
  - Batch protocol send methods build a `StringBuilder`, then allocate one big ASCII byte array.
- Why it matters:
  - Cluster routes and leaf nodes are high-throughput transport paths in real deployments.
  - This code is not as hot as client publish fanout, but it is hot enough to matter under clustered load.
- Recommended optimization:
  - Mirror the client path:
    - encode control lines into stackalloc or pooled byte buffers with span formatting
    - write control + payload + CRLF via scatter-gather (`ReadOnlyMemory<byte>[]`) or a reusable outbound buffer
  - Replace repeated CRLF arrays with a static `ReadOnlyMemory<byte>` / `ReadOnlySpan<byte>`.
  - For route sub protocol batches, encode directly into an `ArrayBufferWriter<byte>` instead of `StringBuilder` -> string -> bytes.
- .NET 10 techniques:
  - `string.Create` or direct span formatting into pooled buffers
  - `ArrayBufferWriter<byte>`
  - scatter-gather writes where transport permits
- Risk / complexity:
  - Medium. Mostly localized refactoring with low semantic risk.
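
The direct-encoding idea can be sketched as below. `ControlLineWriter` is a hypothetical helper, and the `<verb> <subject> <#bytes>` frame shape is a simplification chosen for the example; real route and leaf frames carry more fields. The subject is assumed to be ASCII.

```csharp
using System;
using System.Buffers;
using System.Buffers.Text;
using System.Text;

public static class ControlLineWriter
{
    // Backed by static data in the assembly; no per-send allocation.
    private static ReadOnlySpan<byte> Crlf => "\r\n"u8;

    // Writes "<verb> <subject> <payloadLength>\r\n<payload>\r\n" in one pass.
    public static void WriteMessage(IBufferWriter<byte> writer, ReadOnlySpan<byte> verb, string subject, ReadOnlySpan<byte> payload)
    {
        // Upper bound: verb, two spaces, subject, up to 10 digits, payload, two CRLFs.
        int max = verb.Length + 1 + subject.Length + 1 + 10 + 2 + payload.Length + 2;
        Span<byte> dest = writer.GetSpan(max);
        int pos = 0;

        verb.CopyTo(dest);
        pos += verb.Length;
        dest[pos++] = (byte)' ';
        pos += Encoding.ASCII.GetBytes(subject.AsSpan(), dest[pos..]);
        dest[pos++] = (byte)' ';

        if (!Utf8Formatter.TryFormat(payload.Length, dest[pos..], out int digits))
            throw new InvalidOperationException("Destination too small for length field.");
        pos += digits;

        Crlf.CopyTo(dest[pos..]);
        pos += Crlf.Length;
        payload.CopyTo(dest[pos..]);
        pos += payload.Length;
        Crlf.CopyTo(dest[pos..]);
        pos += Crlf.Length;

        writer.Advance(pos);
    }
}
```

For large payloads it may be preferable to encode only the control line this way and pass the payload as a separate segment in a scatter-gather write, so the payload is not copied at all.
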
Medium ROI
5. Stop using LINQ-heavy materialization in monitoring endpoints
- Files:
  - `src/NATS.Server/Monitoring/ConnzHandler.cs`
  - `src/NATS.Server/Monitoring/SubszHandler.cs`
- Current issue:
  - Monitoring paths repeatedly call `ToArray()`, `Select()`, `Where()`, `OrderBy()`, `Skip()`, and `Take()`.
  - `SubszHandler` builds full subscription lists even when the request only needs counts.
  - `ConnzHandler` repeatedly rematerializes arrays while filtering and sorting.
- Why it matters:
  - Monitoring endpoints are not the publish hot path, but they can become disruptive on busy servers with many clients/subscriptions.
  - These allocations are easy to avoid.
- Recommended optimization:
  - Separate count-only and detail-request paths.
  - Use single-pass loops and pooled temporary lists.
  - Delay expensive subscription detail expansion until after paging when possible.
  - Consider returning immutable snapshots generated incrementally by the server for common monitor queries.
- .NET 10 techniques:
  - explicit loops
  - pooled arrays/lists
  - `CollectionsMarshal.AsSpan()` for internal list traversal where safe
- Risk / complexity:
  - Low to medium.
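
A count-only path might look like the sketch below. `ConnSnapshot` and `ConnzCounts` are hypothetical types for illustration; the real handlers work against the server's own connection model.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.InteropServices;

// Hypothetical per-connection snapshot; the real monitoring model may differ.
public sealed record ConnSnapshot(ulong Cid, string Account, int SubscriptionCount);

public static class ConnzCounts
{
    // Count-only path: one pass, no Where/Select/ToArray intermediates.
    public static (int Connections, int Subscriptions) CountForAccount(List<ConnSnapshot> connections, string account)
    {
        int matched = 0;
        int subscriptions = 0;

        // AsSpan avoids the List<T> enumerator. The list must not be
        // resized while the span is in use.
        foreach (ConnSnapshot conn in CollectionsMarshal.AsSpan(connections))
        {
            if (string.Equals(conn.Account, account, StringComparison.Ordinal))
            {
                matched++;
                subscriptions += conn.SubscriptionCount;
            }
        }

        return (matched, subscriptions);
    }
}
```
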
6. Modernize MQTT packet writing and text parsing
- Files:
  - `src/NATS.Server/Mqtt/MqttPacketWriter.cs`
  - `src/NATS.Server/Mqtt/MqttProtocolParser.cs`
- Current issue:
  - `MqttPacketWriter` returns fresh `byte[]` instances for every string/packet write.
  - Remaining-length encoding returns `scratch[..index].ToArray()`.
  - `ParseLine()` uses `Trim()`, `StartsWith()`, `Split()`, slicing, and string-based parsing throughout.
- Why it matters:
  - MQTT is a side protocol, so this is not the top optimization target.
  - Still worth fixing because the code is currently allocation-heavy and straightforward to improve.
- Recommended optimization:
  - Add `TryWrite...` APIs that write into caller-provided `Span<byte>` / `IBufferWriter<byte>`.
  - Keep remaining-length bytes on the stack and copy directly into the final destination buffer.
  - Rework `ParseLine()` to operate on `ReadOnlySpan<char>` and avoid `Split`.
- .NET 10 techniques:
  - `Span<byte>`
  - `IBufferWriter<byte>`
  - `ReadOnlySpan<char>`
  - `SearchValues<char>` for token separators if useful
- Risk / complexity:
  - Low.
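
As an example of the `TryWrite...` shape, the MQTT remaining-length field (a variable-length integer of 1 to 4 bytes, 7 data bits per byte plus a continuation bit) can be written straight into a caller-provided span. `MqttRemainingLength` is a hypothetical helper name, not the existing `MqttPacketWriter` API.

```csharp
using System;

public static class MqttRemainingLength
{
    // Largest value representable in the 4-byte variable-length encoding.
    public const int MaxValue = 268_435_455;

    // Encodes 'value' into 'destination' without allocating.
    // Returns false if the value is out of range or the destination is too small.
    public static bool TryWrite(int value, Span<byte> destination, out int bytesWritten)
    {
        bytesWritten = 0;
        if (value < 0 || value > MaxValue)
            return false;

        do
        {
            if (bytesWritten == destination.Length)
            {
                bytesWritten = 0;
                return false;
            }

            byte encoded = (byte)(value & 0x7F);
            value >>= 7;
            if (value > 0)
                encoded |= 0x80; // continuation bit: more bytes follow

            destination[bytesWritten++] = encoded;
        }
        while (value > 0);

        return true;
    }
}
```

A caller can reserve the maximum of four bytes with `stackalloc byte[4]`, or write directly into the span obtained from an `IBufferWriter<byte>`, and then advance by `bytesWritten`.
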
7. Replace full scans over _messages with maintained indexes where operations are common
- Files:
  - `src/NATS.Server/JetStream/Storage/FileStore.cs`
- Current issue:
  - `LoadLastBySubjectAsync()` scans all messages, filters, sorts descending, and picks the first result.
  - `TrimToMaxMessages()` repeatedly calls `_messages.Keys.Min()`.
  - `PurgeEx()` materializes candidate lists before deletion.
- Why it matters:
  - These are algorithmic inefficiencies, not just allocation issues.
  - They become more visible as streams grow.
- Recommended optimization:
  - Maintain lightweight indexes:
    - last sequence by subject
    - first/last live sequence tracking without `Min()` / `Max()` scans
    - optionally per-subject linked or sorted sequence sets for purge/retention operations
  - If full indexing is too large a change, replace repeated LINQ scans with single-pass loops immediately.
- .NET 10 techniques:
  - compact metadata structs
  - tighter dictionary usage
  - fewer transient enumerators
- Risk / complexity:
  - Medium.
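
The index idea can be sketched as a small side structure. `StreamSequenceIndex` is hypothetical and deliberately narrow: it assumes sequences start above 0, are appended in increasing order, and are removed only from the front (the trim case). Mid-stream deletes and purge-by-subject would need extra bookkeeping.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical side index; not the FileStore's actual data model.
public sealed class StreamSequenceIndex
{
    private readonly Dictionary<ulong, string> _subjectBySeq = new();
    private readonly Dictionary<string, ulong> _lastSeqBySubject = new(StringComparer.Ordinal);
    private ulong _firstSeq; // 0 means "no live messages"

    public ulong FirstSeq => _firstSeq;
    public int Count => _subjectBySeq.Count;

    public void OnAppend(ulong seq, string subject)
    {
        _subjectBySeq.Add(seq, subject);
        _lastSeqBySubject[subject] = seq;
        if (_firstSeq == 0)
            _firstSeq = seq;
    }

    // O(1) replacement for "filter by subject, sort descending, take first".
    public bool TryGetLastSeq(string subject, out ulong seq) =>
        _lastSeqBySubject.TryGetValue(subject, out seq);

    // Removes the oldest live message and advances the first-sequence pointer
    // instead of recomputing Keys.Min().
    public bool TryRemoveFirst(out ulong removedSeq)
    {
        removedSeq = _firstSeq;
        if (!_subjectBySeq.Remove(removedSeq, out string? subject))
            return false;

        if (_lastSeqBySubject.TryGetValue(subject, out ulong last) && last == removedSeq)
            _lastSeqBySubject.Remove(subject);

        if (_subjectBySeq.Count == 0)
        {
            _firstSeq = 0;
            return true;
        }

        // Skip over gaps until the next live sequence.
        do { _firstSeq++; } while (!_subjectBySeq.ContainsKey(_firstSeq));
        return true;
    }
}
```

With this in place, a trim-to-max loop reduces to calling `TryRemoveFirst` while `Count` exceeds the limit.
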
8. Reduce repeated small allocations in protocol constants and control frames
- Files:
  - `src/NATS.Server/Protocol/NatsParser.cs`
  - `src/NATS.Server/Routes/RouteConnection.cs`
  - `src/NATS.Server/LeafNodes/LeafConnection.cs`
  - other transport helpers
- Current issue:
  - Some constants are still materialized via `ToArray()` rather than held as static `byte[]` or `ReadOnlyMemory<byte>`.
  - Control frames repeatedly build temporary arrays for tiny literals.
- Why it matters:
  - These are cheap wins and remove noisy allocation churn.
- Recommended optimization:
  - Standardize on shared static byte literals for CRLF and fixed protocol tokens.
  - Audit for repeated `u8.ToArray()` or `Encoding.ASCII.GetBytes` on invariant text.
- .NET 10 techniques:
  - static cached buffers
  - span-based concatenation into reusable destinations
- Risk / complexity:
  - Low.
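
One possible shape for shared protocol literals is shown below. `ProtocolBytes` is a hypothetical class name. The span properties point at static data and never allocate; the `ReadOnlyMemory<byte>` field allocates its backing array once, for call sites that need memory rather than a span (for example async writes).

```csharp
using System;

public static class ProtocolBytes
{
    // u8 literals returned as ReadOnlySpan<byte> are backed by static data.
    public static ReadOnlySpan<byte> Crlf => "\r\n"u8;
    public static ReadOnlySpan<byte> Ping => "PING\r\n"u8;
    public static ReadOnlySpan<byte> Pong => "PONG\r\n"u8;

    // Allocated exactly once at type initialization, then shared.
    public static readonly ReadOnlyMemory<byte> CrlfMemory = "\r\n"u8.ToArray();
}
```
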
Lower ROI Or Caution Areas
9. Be selective about introducing more structs
- Files:
  - cross-cutting
- Current issue:
  - Some parts of the code would benefit from value types, but others already contain references (`string`, `byte[]`, `ReadOnlyMemory<byte>`, dictionaries) where converting whole models to structs would increase copying and call-site complexity.
- Recommendation:
  - Good struct candidates:
    - composite dictionary keys
    - compact metadata/index entries
    - parser token views
    - queue or routing bookkeeping records
  - Poor struct candidates:
    - large mutable models
    - objects with many reference-type fields
    - stateful connection objects
- Why it matters:
  - “Use more structs” is only a win when the values are small, immutable, and heavily allocated.
10. Avoid premature replacement of already-good memory APIs
- Files:
  - `src/NATS.Server/NatsClient.cs`
  - `src/NATS.Server/IO/OutboundBufferPool.cs`
  - several JetStream codecs
- Current issue:
  - There has already been meaningful optimization work in direct write buffering and pooled outbound paths.
  - Replacing these with more exotic abstractions without profiling could regress behavior.
- Recommendation:
  - Prefer extending the current buffer-pool and direct-write patterns into routes, leaves, and parser payload handling before redesigning the client write path again.
Suggested Implementation Order
1. `NatsParser` hot-path byte retention and reduced payload copying
2. `SubList` key/token allocation cleanup and remote-sub key redesign
3. Route/leaf outbound buffer encoding cleanup
4. `FileStore` hot-path de-LINQ and payload/index refactoring
5. Monitoring endpoint de-materialization
6. MQTT writer/parser span-based cleanup
What To Measure Before And After
Use the benchmark project and targeted microbenchmarks to measure:
- allocations per `PUB`, `SUB`, `UNSUB`, `CONNECT`
- allocations per delivered message under fanout
- `SubList.Match()` throughput and allocations for:
  - exact subjects
  - wildcard subjects
  - queue subscriptions
  - remote interest present
- JetStream append throughput and bytes allocated per append
- route/leaf forwarded-message allocations
- monitoring endpoint allocations for large client/subscription sets
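
For quick before/after checks outside the benchmark project, a rough per-operation allocation probe can be built on `GC.GetAllocatedBytesForCurrentThread()`. `AllocationProbe` is a hypothetical helper, and it only sees allocations made on the calling thread, so it suits synchronous parser or matcher calls rather than async pipelines. A dedicated benchmark harness remains the better tool for final numbers.

```csharp
using System;

public static class AllocationProbe
{
    // Returns the average managed bytes allocated per call on the current thread.
    public static long MeasureBytesPerOp(Action action, int iterations = 1000)
    {
        if (iterations <= 0)
            throw new ArgumentOutOfRangeException(nameof(iterations));

        action(); // warm-up so one-time initialization is not counted

        long before = GC.GetAllocatedBytesForCurrentThread();
        for (int i = 0; i < iterations; i++)
            action();
        long after = GC.GetAllocatedBytesForCurrentThread();

        return (after - before) / iterations;
    }
}
```
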
Summary
The best remaining gains are not from sprinkling Span<T> everywhere. They come from carrying byte-oriented data further through the hot paths, removing composite-string bookkeeping, reducing duplicate payload ownership in JetStream storage, and eliminating materialization-heavy helper code around routing and monitoring.
If you only do three things, do these first:
- Rework `NatsParser` to avoid early `string` / `byte[]` creation.
- Replace `SubList` composite string keys and string-heavy token handling.
- Refactor `FileStore` and route/leaf send paths to reduce duplicate buffers and transient formatting allocations.