natsdotnet/optimizations.md
2026-03-13 10:19:56 -04:00

.NET 10 Optimization Opportunities for NATS.Server

This document identifies the highest-value places in the current .NET port that still leave performance on the table relative to what modern .NET 10 can do. The focus is runtime behavior in the current codebase, not generic style guidance.

The ranking is based on likely payoff in NATS workloads:

  1. Protocol parsing and per-message delivery paths
  2. Subscription matching and routing fanout
  3. JetStream storage hot paths
  4. Route, leaf, MQTT, and monitoring paths with avoidable allocation churn

Several areas already use Span<T>, ReadOnlyMemory<byte>, SequenceReader<byte>, and stack allocation correctly. The remaining gaps are mostly where the code falls back to string, byte[], List<T>, ToArray(), LINQ, or repeated serialization work on hot paths.

Detailed Implementation Plans

Highest ROI

1. Keep parser state in bytes/spans longer

  • Files:
    • src/NATS.Server/Protocol/NatsParser.cs
    • src/NATS.Server/NatsClient.cs
  • Current issue:
    • NatsParser tokenizes control lines with spans, but then converts subjects, reply subjects, queue names, SIDs, and JSON payloads into string and byte[] immediately.
    • TryReadPayload() always allocates a new byte[] and copies the payload, even when the underlying ReadOnlySequence<byte> is already usable.
    • ParseConnect() and ParseInfo() call ToArray() on the JSON portion.
  • Why it matters:
    • This runs for every client protocol command.
    • The parser sits directly on the publish/subscribe hot path, so small per-command allocations scale badly under fan-in.
  • Recommended optimization:
    • Introduce a split parsed representation:
      • a hot-path ref struct or readonly struct view carrying ReadOnlySpan<byte> / ReadOnlySequence<byte> slices for subject, reply, SID, queue, and payload
      • a slower materialization path only when code actually needs string
    • Store pending parser state as byte slices or pooled byte segments instead of _pendingSubject / _pendingReplyTo strings.
    • For single-segment payloads, hand through a ReadOnlyMemory<byte> slice rather than copying to a new array.
    • Only copy multi-segment payloads when required.
    • Use SearchValues<byte> for whitespace scanning and command detection instead of manual per-byte branching where it simplifies repeated searches.
  • .NET 10 techniques:
    • ref struct
    • ReadOnlySpan<byte>
    • ReadOnlySequence<byte>
    • SearchValues<byte>
    • Encoding.ASCII.GetString(ReadOnlySpan<byte>) only at materialization boundaries
  • Risk / complexity:
    • Medium to high. This touches command parsing contracts and downstream consumers.
    • Worth doing first because it reduces allocations before messages enter the rest of the server.
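A minimal sketch of the split representation described above. All type and member names here are hypothetical, not the existing NatsParser contract; the point is that the hot path carries byte slices and only materializes a string at the boundary:

```csharp
using System;
using System.Buffers;
using System.Text;

// Hypothetical hot-path view over "<subject> <sid> [reply]" arguments.
public readonly ref struct ParsedMsgView
{
    // SearchValues drives the whitespace scan instead of per-byte branching.
    private static readonly SearchValues<byte> Whitespace = SearchValues.Create(" \t"u8);

    public readonly ReadOnlySpan<byte> Subject;
    public readonly ReadOnlySpan<byte> Sid;
    public readonly ReadOnlySpan<byte> ReplyTo;

    private ParsedMsgView(ReadOnlySpan<byte> subject, ReadOnlySpan<byte> sid, ReadOnlySpan<byte> replyTo)
    {
        Subject = subject;
        Sid = sid;
        ReplyTo = replyTo;
    }

    // Materialize a string only where one is actually required.
    public string SubjectString() => Encoding.ASCII.GetString(Subject);

    // Tokenize without allocating: every token stays a slice of the input.
    public static bool TryParse(ReadOnlySpan<byte> args, out ParsedMsgView view)
    {
        int sp = args.IndexOfAny(Whitespace);
        if (sp <= 0) { view = default; return false; }
        ReadOnlySpan<byte> subject = args[..sp];
        args = args[(sp + 1)..].TrimStart((byte)' ');

        sp = args.IndexOfAny(Whitespace);
        ReadOnlySpan<byte> sid = sp < 0 ? args : args[..sp];
        ReadOnlySpan<byte> reply = sp < 0 ? default : args[(sp + 1)..].TrimStart((byte)' ');
        if (sid.IsEmpty) { view = default; return false; }

        view = new ParsedMsgView(subject, sid, reply);
        return true;
    }
}
```

Because this is a ref struct over the receive buffer, it cannot outlive the read loop iteration; the slower materialization path (strings, pooled copies) covers anything that must be retained.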

2. Remove string-heavy trie traversal in SubList

  • Files:
    • src/NATS.Server/Subscriptions/SubList.cs
    • src/NATS.Server/Subscriptions/SubjectMatch.cs
  • Current issue:
    • Insert/remove paths repeatedly call token.ToString().
    • Routed subscription keys are synthesized as "route|account|subject|queue" strings and later split back with Split('|').
    • Match path tokenization and cache population allocate arrays/lists and depend on string tokens.
    • RemoveRemoteSubs() and RemoveRemoteSubsForAccount() call _remoteSubs.ToArray() and re-parse keys on every sweep.
  • Why it matters:
    • SubList.Match() is one of the most performance-sensitive operations in the server.
    • Remote interest tracking becomes more expensive as the route/leaf topology grows.
  • Recommended optimization:
    • Replace composite routed-sub string keys with a dedicated value key:
      • readonly record struct RoutedSubKey(string RouteId, string Account, string Subject, string? Queue)
      • or a plain readonly struct with a custom comparer if profiling shows hash/comparison cost matters
    • Keep tokenized subjects in a pooled or cached token form for exact subjects.
    • Investigate a span-based token walker for matching so exact-subject lookups avoid string[] creation entirely.
    • Replace temporary List<Subscription> / List<List<Subscription>> creation in Match() with pooled builders or ArrayBufferWriter<T>.
    • For remote-sub cleanup, iterate dictionary entries without ToArray() and avoid Split.
  • .NET 10 techniques:
    • readonly struct / readonly record struct for composite keys
    • ReadOnlySpan<char> token parsing
    • pooled builders via ArrayPool<T> or ArrayBufferWriter<T>
  • Risk / complexity:
    • Medium. The data model change is straightforward; changing trie matching internals requires careful parity testing.
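A sketch of the value-key replacement, assuming a hypothetical RoutedSubKey shape (the real field set may differ). The record struct provides value equality and a generated hash code, so insertion needs no concatenation and cleanup needs no Split('|'):

```csharp
using System;
using System.Collections.Generic;

// Illustrative replacement for "route|account|subject|queue" string keys.
public readonly record struct RoutedSubKey(string RouteId, string Account, string Subject, string? Queue);

public static class RemoteSubSweep
{
    // Since .NET Core 3.0, Dictionary<TKey,TValue>.Remove may be called while
    // enumerating the dictionary, so no ToArray() snapshot is needed.
    public static int RemoveForRoute<TValue>(Dictionary<RoutedSubKey, TValue> remoteSubs, string routeId)
    {
        int removed = 0;
        foreach (var (key, _) in remoteSubs)
        {
            if (key.RouteId == routeId && remoteSubs.Remove(key))
                removed++;
        }
        return removed;
    }
}
```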

3. Eliminate avoidable message duplication in FileStore

  • Files:
    • src/NATS.Server/JetStream/Storage/FileStore.cs
    • src/NATS.Server/JetStream/Storage/MsgBlock.cs
    • src/NATS.Server/JetStream/Storage/StoredMessage.cs
  • Current issue:
    • AppendAsync() transforms payload for persistence and often also keeps another managed copy in _messages.
    • StoreMsg() creates a combined byte[] for headers + payload.
    • Many maintenance operations (TrimToMaxMessages, PurgeEx, LoadLastBySubjectAsync, ListAsync) use LINQ over _messages.Values, causing iterator allocations and repeated scans.
    • Snapshot creation base64-encodes transformed payloads, forcing extra copies.
  • Why it matters:
    • JetStream storage code runs continuously under persistence-heavy workloads.
    • It is both allocation-sensitive and memory-residency-sensitive.
  • Recommended optimization:
    • Split stored payload representation into:
      • persisted payload bytes
      • logical payload view
      • optional headers view
    • Avoid constructing concatenated header+payload arrays when the record format can encode both spans directly.
    • Rework StoredMessage so hot metadata stays compact; consider a smaller readonly struct for indexes/metadata while payload storage remains reference-based.
    • Replace LINQ scans in hot maintenance paths with explicit loops.
    • Where full-scan operations are expected to be common, add per-subject indexes or rolling pointers instead of scanning every message.
  • .NET 10 techniques:
    • ReadOnlyMemory<byte> slices over shared buffers
    • readonly struct for compact metadata/index entries
    • explicit loops over LINQ in storage hot paths
    • CollectionsMarshal where safe for dictionary/list access in tight loops
  • Risk / complexity:
    • High. This area needs careful correctness validation for retention, snapshots, and recovery.
    • High payoff for persistent streams.
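A sketch of the split payload representation, under the assumption that each record can own a single backing buffer. The names are hypothetical, not the existing StoredMessage shape:

```csharp
using System;

// One backing buffer per stored record; headers and payload are slices
// of it rather than separately allocated arrays.
public sealed class StoredRecord
{
    private readonly byte[] _buffer;    // persisted record bytes
    private readonly int _headerLen;

    public StoredRecord(byte[] buffer, int headerLen)
    {
        _buffer = buffer;
        _headerLen = headerLen;
    }

    public ReadOnlyMemory<byte> Headers => _buffer.AsMemory(0, _headerLen);
    public ReadOnlyMemory<byte> Payload => _buffer.AsMemory(_headerLen);
}

// Compact metadata kept apart from payload ownership, so maintenance
// scans over index entries never touch record buffers.
public readonly struct MsgIndexEntry
{
    public readonly ulong Seq;
    public readonly long Timestamp;
    public readonly int Size;

    public MsgIndexEntry(ulong seq, long timestamp, int size)
    {
        Seq = seq;
        Timestamp = timestamp;
        Size = size;
    }
}
```

Retention and trimming loops would iterate MsgIndexEntry values (explicit loops, no LINQ) and only dereference a StoredRecord when a message is actually delivered or snapshotted.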

4. Reduce formatting and copy overhead in route and leaf message sends

  • Files:
    • src/NATS.Server/Routes/RouteConnection.cs
    • src/NATS.Server/LeafNodes/LeafConnection.cs
  • Current issue:
    • Control lines are built with string interpolation, converted with Encoding.ASCII.GetBytes, then written separately from payload and trailer.
    • "\r\n"u8.ToArray() allocates for every send.
    • Batch protocol send methods build a StringBuilder, then allocate one big ASCII byte array.
  • Why it matters:
    • Cluster routes and leaf nodes are high-throughput transport paths in real deployments.
    • This code is not as hot as client publish fanout, but it is hot enough to matter under clustered load.
  • Recommended optimization:
    • Mirror the client path:
      • encode control lines into stackalloc or pooled byte buffers with span formatting
      • write control + payload + CRLF via scatter-gather (ReadOnlyMemory<byte>[]) or a reusable outbound buffer
    • Replace repeated CRLF arrays with a static ReadOnlyMemory<byte> / ReadOnlySpan<byte>.
    • For route sub protocol batches, encode directly into an ArrayBufferWriter<byte> instead of StringBuilder -> string -> bytes.
  • .NET 10 techniques:
    • string.Create or direct span formatting into pooled buffers
    • ArrayBufferWriter<byte>
    • scatter-gather writes where transport permits
  • Risk / complexity:
    • Medium. Mostly localized refactoring with low semantic risk.
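A sketch of span formatting for route control lines. The frame shape and names are illustrative, not the actual route protocol code; the pattern is "format into a caller-provided buffer, fail cleanly if it does not fit":

```csharp
using System;
using System.Buffers.Text;

public static class RouteFraming
{
    // Format "RMSG <account> <subject> <size>\r\n" straight into a
    // stackalloc or pooled buffer supplied by the caller.
    public static bool TryWriteRmsgHeader(Span<byte> dest, ReadOnlySpan<byte> account,
        ReadOnlySpan<byte> subject, int payloadSize, out int written)
    {
        written = 0;
        if (!Append(dest, "RMSG "u8, ref written)) return false;
        if (!Append(dest, account, ref written)) return false;
        if (!Append(dest, " "u8, ref written)) return false;
        if (!Append(dest, subject, ref written)) return false;
        if (!Append(dest, " "u8, ref written)) return false;
        if (!Utf8Formatter.TryFormat(payloadSize, dest[written..], out int n)) return false;
        written += n;
        return Append(dest, "\r\n"u8, ref written); // u8 literal: static data, no ToArray()
    }

    private static bool Append(Span<byte> dest, ReadOnlySpan<byte> src, ref int written)
    {
        if (dest.Length - written < src.Length) return false;
        src.CopyTo(dest[written..]);
        written += src.Length;
        return true;
    }
}
```

A caller would typically do `Span<byte> buf = stackalloc byte[256];`, write the header, then hand header + payload + CRLF to the transport as separate segments so the payload is never copied into the control buffer.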

Medium ROI

5. Stop using LINQ-heavy materialization in monitoring endpoints

  • Files:
    • src/NATS.Server/Monitoring/ConnzHandler.cs
    • src/NATS.Server/Monitoring/SubszHandler.cs
  • Current issue:
    • Monitoring paths repeatedly call ToArray(), Select(), Where(), OrderBy(), Skip(), and Take().
    • SubszHandler builds full subscription lists even when the request only needs counts.
    • ConnzHandler repeatedly rematerializes arrays while filtering and sorting.
  • Why it matters:
    • Monitoring endpoints are not the publish hot path, but they can become disruptive on busy servers with many clients/subscriptions.
    • These allocations are easy to avoid.
  • Recommended optimization:
    • Separate count-only and detail-request paths.
    • Use single-pass loops and pooled temporary lists.
    • Delay expensive subscription detail expansion until after paging when possible.
    • Consider returning immutable snapshots generated incrementally by the server for common monitor queries.
  • .NET 10 techniques:
    • explicit loops
    • pooled arrays/lists
    • CollectionsMarshal.AsSpan() for internal list traversal where safe
  • Risk / complexity:
    • Low to medium.
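A single-pass filter-and-page sketch in the spirit of the recommendation above. It assumes the source enumeration is already in the desired order (when a sort is requested, the sort still has to happen before paging), and it counts the full filtered total while materializing only the requested page:

```csharp
using System;
using System.Collections.Generic;

public static class MonitoringPaging
{
    // One pass: total is exact, but only `limit` items are materialized.
    // Passing limit = 0 gives a count-only path with no detail expansion.
    public static (int Total, List<T> Page) FilterAndPage<T>(
        IEnumerable<T> items, Func<T, bool> filter, int offset, int limit)
    {
        int total = 0;
        var page = new List<T>(limit);
        foreach (var item in items)
        {
            if (!filter(item)) continue;
            if (total >= offset && page.Count < limit)
                page.Add(item);
            total++;
        }
        return (total, page);
    }
}
```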

6. Modernize MQTT packet writing and text parsing

  • Files:
    • src/NATS.Server/Mqtt/MqttPacketWriter.cs
    • src/NATS.Server/Mqtt/MqttProtocolParser.cs
  • Current issue:
    • MqttPacketWriter returns fresh byte[] instances for every string/packet write.
    • Remaining-length encoding returns scratch[..index].ToArray().
    • ParseLine() uses Trim(), StartsWith(), Split(), slicing, and string-based parsing throughout.
  • Why it matters:
    • MQTT is a side protocol, so this is not the top optimization target.
    • Still worth fixing because the code is currently allocation-heavy and straightforward to improve.
  • Recommended optimization:
    • Add TryWrite... APIs that write into caller-provided Span<byte> / IBufferWriter<byte>.
    • Keep remaining-length bytes on the stack and copy directly into the final destination buffer.
    • Rework ParseLine() to operate on ReadOnlySpan<char> and avoid Split.
  • .NET 10 techniques:
    • Span<byte>
    • IBufferWriter<byte>
    • ReadOnlySpan<char>
    • SearchValues<char> for token separators if useful
  • Risk / complexity:
    • Low.
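A sketch of the TryWrite shape for the MQTT writer. This assumes the standard MQTT encodings (2-byte big-endian length prefix for UTF-8 strings, base-128 varint for the remaining length); the method names are illustrative:

```csharp
using System;
using System.Buffers.Binary;
using System.Text;

public static class MqttSpanWriter
{
    // Encode an MQTT UTF-8 string (2-byte big-endian length + bytes) into
    // a caller-provided buffer instead of returning a fresh byte[].
    public static bool TryWriteString(Span<byte> dest, string value, out int written)
    {
        written = 0;
        int byteCount = Encoding.UTF8.GetByteCount(value);
        if (byteCount > ushort.MaxValue || dest.Length < byteCount + 2) return false;
        BinaryPrimitives.WriteUInt16BigEndian(dest, (ushort)byteCount);
        Encoding.UTF8.GetBytes(value, dest[2..]);
        written = byteCount + 2;
        return true;
    }

    // Remaining-length varint built on the stack and copied straight into
    // dest; no scratch[..index].ToArray().
    public static bool TryWriteRemainingLength(Span<byte> dest, int length, out int written)
    {
        Span<byte> scratch = stackalloc byte[5];
        int i = 0;
        do
        {
            byte b = (byte)(length % 128);
            length /= 128;
            if (length > 0) b |= 0x80;  // continuation bit
            scratch[i++] = b;
        } while (length > 0);
        if (dest.Length < i) { written = 0; return false; }
        scratch[..i].CopyTo(dest);
        written = i;
        return true;
    }
}
```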

7. Replace full scans over _messages with maintained indexes where operations are common

  • Files:
    • src/NATS.Server/JetStream/Storage/FileStore.cs
  • Current issue:
    • LoadLastBySubjectAsync() scans all messages, filters, sorts descending, and picks the first result.
    • TrimToMaxMessages() repeatedly calls _messages.Keys.Min().
    • PurgeEx() materializes candidate lists before deletion.
  • Why it matters:
    • These are algorithmic inefficiencies, not just allocation issues.
    • They become more visible as streams grow.
  • Recommended optimization:
    • Maintain lightweight indexes:
      • last sequence by subject
      • first/last live sequence tracking without Min() / Max() scans
      • optionally per-subject linked or sorted sequence sets for purge/retention operations
    • If full indexing is too large a change, replace repeated LINQ scans with single-pass loops immediately.
  • .NET 10 techniques:
    • compact metadata structs
    • tighter dictionary usage
    • fewer transient enumerators
  • Risk / complexity:
    • Medium.
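A sketch of the maintained-index idea. The class and field names are hypothetical stand-ins for state that would live inside FileStore and be updated on every append, delete, and trim:

```csharp
using System;
using System.Collections.Generic;

// Lightweight indexes updated at write time, so last-by-subject lookups
// and first/last tracking never scan _messages.
public sealed class StreamIndexes
{
    private readonly Dictionary<string, ulong> _lastSeqBySubject = new();

    public ulong FirstSeq { get; private set; } = ulong.MaxValue;
    public ulong LastSeq { get; private set; }

    public void OnAppend(string subject, ulong seq)
    {
        _lastSeqBySubject[subject] = seq;   // overwrite: last write wins
        if (seq < FirstSeq) FirstSeq = seq;
        if (seq > LastSeq) LastSeq = seq;
    }

    // O(1) replacement for scan + descending sort + First().
    public bool TryGetLastSeq(string subject, out ulong seq) =>
        _lastSeqBySubject.TryGetValue(subject, out seq);

    // Trimming advances a rolling pointer instead of calling Keys.Min().
    public void OnTrimTo(ulong newFirstSeq) => FirstSeq = newFirstSeq;
}
```

The harder part, noted above, is keeping these indexes correct through purge, per-subject retention, and recovery, which is why the single-pass-loop fallback is the low-risk first step.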

8. Reduce repeated small allocations in protocol constants and control frames

  • Files:
    • src/NATS.Server/Protocol/NatsParser.cs
    • src/NATS.Server/Routes/RouteConnection.cs
    • src/NATS.Server/LeafNodes/LeafConnection.cs
    • other transport helpers
  • Current issue:
    • Some constants are still materialized via ToArray() rather than held as static byte[] or ReadOnlyMemory<byte>.
    • Control frames repeatedly build temporary arrays for tiny literals.
  • Why it matters:
    • These are cheap wins and remove noisy allocation churn.
  • Recommended optimization:
    • Standardize on shared static byte literals for CRLF and fixed protocol tokens.
    • Audit for repeated u8.ToArray() or Encoding.ASCII.GetBytes on invariant text.
  • .NET 10 techniques:
    • static cached buffers
    • span-based concatenation into reusable destinations
  • Risk / complexity:
    • Low.
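The standardized-literal pattern is small enough to show directly. Token names here are examples of the kind of fixed protocol text involved:

```csharp
using System;

public static class ProtocolLiterals
{
    // u8 literals reference static data; returning them as spans means
    // zero allocation per use, unlike "\r\n"u8.ToArray() on every send.
    public static ReadOnlySpan<byte> CrLf => "\r\n"u8;
    public static ReadOnlySpan<byte> Ok   => "+OK\r\n"u8;
    public static ReadOnlySpan<byte> Pong => "PONG\r\n"u8;

    // When an array reference is genuinely required (e.g. a
    // ReadOnlyMemory<byte> segment for scatter-gather writes),
    // allocate it exactly once.
    public static readonly ReadOnlyMemory<byte> CrLfMemory = "\r\n"u8.ToArray();
}
```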

Lower ROI Or Caution Areas

9. Be selective about introducing more structs

  • Files:
    • cross-cutting
  • Current issue:
    • Some parts of the code would benefit from value types. Others already hold reference-type fields (string, byte[], ReadOnlyMemory<byte>, dictionaries), so converting whole models to structs would increase copying and call-site complexity without saving allocations.
  • Recommendation:
    • Good struct candidates:
      • composite dictionary keys
      • compact metadata/index entries
      • parser token views
      • queue or routing bookkeeping records
    • Poor struct candidates:
      • large mutable models
      • objects with many reference-type fields
      • stateful connection objects
  • Why it matters:
    • “Use more structs” is only a win when the values are small, immutable, and heavily allocated.

10. Avoid premature replacement of already-good memory APIs

  • Files:
    • src/NATS.Server/NatsClient.cs
    • src/NATS.Server/IO/OutboundBufferPool.cs
    • several JetStream codecs
  • Current issue:
    • There has already been meaningful optimization work in direct write buffering and pooled outbound paths.
    • Replacing these with more exotic abstractions without profiling could regress behavior.
  • Recommendation:
    • Prefer extending the current buffer-pool and direct-write patterns into routes, leaves, and parser payload handling before redesigning the client write path again.

Suggested Implementation Order

  1. NatsParser hot-path byte retention and reduced payload copying
  2. SubList key/token allocation cleanup and remote-sub key redesign
  3. Route/leaf outbound buffer encoding cleanup
  4. FileStore hot-path de-LINQ and payload/index refactoring
  5. Monitoring endpoint de-materialization
  6. MQTT writer/parser span-based cleanup

What To Measure Before And After

Use the benchmark project and targeted microbenchmarks to measure:

  • allocations per PUB, SUB, UNSUB, CONNECT
  • allocations per delivered message under fanout
  • SubList.Match() throughput and allocations for:
    • exact subjects
    • wildcard subjects
    • queue subscriptions
    • remote interest present
  • JetStream append throughput and bytes allocated per append
  • route/leaf forwarded-message allocations
  • monitoring endpoint allocations for large client/subscription sets

Summary

The best remaining gains are not from sprinkling Span<T> everywhere. They come from carrying byte-oriented data further through the hot paths, removing composite-string bookkeeping, reducing duplicate payload ownership in JetStream storage, and eliminating materialization-heavy helper code around routing and monitoring.

If you only do three things, do these first:

  1. Rework NatsParser to avoid early string / byte[] creation.
  2. Replace SubList composite string keys and string-heavy token handling.
  3. Refactor FileStore and route/leaf send paths to reduce duplicate buffers and transient formatting allocations.