# .NET 10 Optimization Opportunities for `NATS.Server`

This document identifies the highest-value places in the current .NET port that are still leaving performance on the table relative to what modern .NET 10 can do well. The focus is runtime behavior in the current codebase, not generic style guidance.

The ranking is based on likely payoff in NATS workloads:

1. Protocol parsing and per-message delivery paths
2. Subscription matching and routing fanout
3. JetStream storage hot paths
4. Route, leaf, MQTT, and monitoring paths with avoidable allocation churn

Several areas already use `Span<T>`, `ReadOnlyMemory<byte>`, `SequenceReader<byte>`, and stack allocation correctly. The remaining gaps are mostly where the code falls back to `string`, `byte[]`, `List<T>`, `ToArray()`, LINQ, or repeated serialization work on hot paths.

## Detailed Implementation Plans

- [Parser span-retention plan](docs/plans/2026-03-13-optimizations_parser-plan.md)
- [SubList allocation-reduction plan](docs/plans/2026-03-13-optimizations_sublist-plan.md)
- [FileStore payload-and-index plan](docs/plans/2026-03-13-optimizations_filestore-plan.md)

## Highest ROI

### 1. Keep parser state in bytes/spans longer

- Files:
  - `src/NATS.Server/Protocol/NatsParser.cs`
  - `src/NATS.Server/NatsClient.cs`
- Current issue:
  - `NatsParser` tokenizes control lines with spans, but then converts subjects, reply subjects, queue names, SIDs, and JSON payloads into `string` and `byte[]` immediately.
  - `TryReadPayload()` always allocates a new `byte[]` and copies the payload, even when the underlying `ReadOnlySequence<byte>` is already usable.
  - `ParseConnect()` and `ParseInfo()` call `ToArray()` on the JSON portion.
- Why it matters:
  - This runs for every client protocol command.
  - The parser sits directly on the publish/subscribe hot path, so small per-command allocations scale badly under fan-in.
- Recommended optimization:
  - Introduce a split parsed representation:
    - a hot-path `ref struct` or `readonly struct` view carrying `ReadOnlySpan<byte>` / `ReadOnlySequence<byte>` slices for subject, reply, SID, queue, and payload
    - a slower materialization path only when code actually needs `string`
  - Store pending parser state as byte slices or pooled byte segments instead of `_pendingSubject` / `_pendingReplyTo` strings.
  - For single-segment payloads, hand through a `ReadOnlyMemory<byte>` slice rather than copying to a new array.
  - Only copy multi-segment payloads when required.
  - Use `SearchValues<byte>` for whitespace scanning and command detection instead of manual per-byte branching where it simplifies repeated searches.
- .NET 10 techniques:
  - `ref struct`
  - `ReadOnlySpan<byte>`
  - `ReadOnlySequence<byte>`
  - `SearchValues<byte>`
  - `Encoding.ASCII.GetString(ReadOnlySpan<byte>)` only at materialization boundaries
- Risk / complexity:
  - Medium to high. This touches command parsing contracts and downstream consumers.
  - Worth doing first because it reduces allocations before messages enter the rest of the server.
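
A minimal sketch of the split representation. The names here (`ParsedMsgView`, `ControlLineScanner`) are illustrative, not the existing parser API; the point is that token slices stay as bytes until a string is genuinely needed:

```csharp
using System;
using System.Buffers;
using System.Text;

// Byte slices are only valid while the underlying read buffer is alive,
// which is why this is a ref struct: it cannot escape to the heap.
public readonly ref struct ParsedMsgView
{
    public readonly ReadOnlySpan<byte> Subject;
    public readonly ReadOnlySpan<byte> ReplyTo;
    public readonly ReadOnlySpan<byte> Sid;

    public ParsedMsgView(ReadOnlySpan<byte> subject, ReadOnlySpan<byte> replyTo, ReadOnlySpan<byte> sid)
    {
        Subject = subject;
        ReplyTo = replyTo;
        Sid = sid;
    }

    // Slow path: called only when downstream code truly needs a string.
    public string MaterializeSubject() => Encoding.ASCII.GetString(Subject);
}

public static class ControlLineScanner
{
    // SearchValues precomputes the separator lookup once, process-wide.
    private static readonly SearchValues<byte> Separators = SearchValues.Create(" \t"u8);

    // Splits off the next space/tab-delimited token without allocating.
    public static ReadOnlySpan<byte> NextToken(ref ReadOnlySpan<byte> line)
    {
        int idx = line.IndexOfAny(Separators);
        ReadOnlySpan<byte> token = idx < 0 ? line : line[..idx];
        line = idx < 0 ? default : line[(idx + 1)..].TrimStart((byte)' ');
        return token;
    }
}
```

Tokenizing `SUB foo 1` this way yields three byte slices over the original buffer with zero intermediate allocations; only `MaterializeSubject()` pays for a string.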

### 2. Remove string-heavy trie traversal in `SubList`

- Files:
  - `src/NATS.Server/Subscriptions/SubList.cs`
  - `src/NATS.Server/Subscriptions/SubjectMatch.cs`
- Current issue:
  - Insert/remove paths repeatedly call `token.ToString()`.
  - Routed subscription keys are synthesized as `"route|account|subject|queue"` strings and later split back with `Split('|')`.
  - Match path tokenization and cache population allocate arrays/lists and depend on string tokens.
  - `RemoveRemoteSubs()` and `RemoveRemoteSubsForAccount()` call `_remoteSubs.ToArray()` and re-parse keys on every sweep.
- Why it matters:
  - `SubList.Match()` is one of the most performance-sensitive operations in the server.
  - Remote interest tracking becomes more expensive as the route/leaf topology grows.
- Recommended optimization:
  - Replace composite routed-sub string keys with a dedicated value key:
    - `readonly record struct RoutedSubKey(string RouteId, string Account, string Subject, string? Queue)`
    - or a plain `readonly struct` with a custom comparer if profiling shows hash/comparison cost matters
  - Keep tokenized subjects in a pooled or cached token form for exact subjects.
  - Investigate a span-based token walker for matching so exact-subject lookups avoid `string[]` creation entirely.
  - Replace temporary `List<Subscription>` / `List<List<Subscription>>` creation in `Match()` with pooled builders or `ArrayBufferWriter<T>`.
  - For remote-sub cleanup, iterate dictionary entries without `ToArray()` and avoid `Split`.
- .NET 10 techniques:
  - `readonly struct` / `readonly record struct` for composite keys
  - `ReadOnlySpan<char>` token parsing
  - pooled builders via `ArrayPool<T>` or `ArrayBufferWriter<T>`
- Risk / complexity:
  - Medium. The data model change is straightforward; changing trie matching internals requires careful parity testing.
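
A sketch of the value-key replacement for the `"route|account|subject|queue"` strings. `RoutedSubKey` follows the suggestion above; the `RemoteInterest` wrapper and its dictionary shape are illustrative:

```csharp
using System;
using System.Collections.Generic;

// readonly record struct provides value equality and GetHashCode over
// all fields, with no per-lookup string concatenation or Split('|').
public readonly record struct RoutedSubKey(
    string RouteId,
    string Account,
    string Subject,
    string? Queue);

public sealed class RemoteInterest
{
    private readonly Dictionary<RoutedSubKey, int> _remoteSubs = new();

    public void Add(RoutedSubKey key) =>
        _remoteSubs[key] = _remoteSubs.TryGetValue(key, out var n) ? n + 1 : 1;

    // Sweeps without ToArray() or key re-parsing: doomed keys are
    // batched so the dictionary is never mutated mid-enumeration.
    public int RemoveForAccount(string account)
    {
        List<RoutedSubKey>? doomed = null;
        foreach (var key in _remoteSubs.Keys)
        {
            if (key.Account == account)
                (doomed ??= new List<RoutedSubKey>()).Add(key);
        }
        if (doomed is null) return 0;
        foreach (var key in doomed) _remoteSubs.Remove(key);
        return doomed.Count;
    }
}
```

Fields that are already interned strings (route id, account) hash quickly, so the composite key is usually cheaper than building and later splitting a pipe-delimited string on every sweep.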

### 3. Eliminate avoidable message duplication in `FileStore`

- Files:
  - `src/NATS.Server/JetStream/Storage/FileStore.cs`
  - `src/NATS.Server/JetStream/Storage/MsgBlock.cs`
  - `src/NATS.Server/JetStream/Storage/StoredMessage.cs`
- Current issue:
  - `AppendAsync()` transforms the payload for persistence and often also keeps another managed copy in `_messages`.
  - `StoreMsg()` creates a combined `byte[]` for headers + payload.
  - Many maintenance operations (`TrimToMaxMessages`, `PurgeEx`, `LoadLastBySubjectAsync`, `ListAsync`) use LINQ over `_messages.Values`, causing iterator allocations and repeated scans.
  - Snapshot creation base64-encodes transformed payloads, forcing extra copies.
- Why it matters:
  - JetStream storage code runs continuously under persistence-heavy workloads.
  - It is sensitive to both allocation rate and memory residency.
- Recommended optimization:
  - Split the stored payload representation into:
    - persisted payload bytes
    - a logical payload view
    - an optional headers view
  - Avoid constructing concatenated header+payload arrays when the record format can encode both spans directly.
  - Rework `StoredMessage` so hot metadata stays compact; consider a smaller `readonly struct` for indexes/metadata while payload storage remains reference-based.
  - Replace LINQ scans in hot maintenance paths with explicit loops.
  - Add per-subject indexes or rolling pointers for operations currently implemented as full scans when those operations are expected to be common.
- .NET 10 techniques:
  - `ReadOnlyMemory<byte>` slices over shared buffers
  - `readonly struct` for compact metadata/index entries
  - explicit loops over LINQ in storage hot paths
  - `CollectionsMarshal` where safe for dictionary/list access in tight loops
- Risk / complexity:
  - High. This area needs careful correctness validation for retention, snapshots, and recovery.
  - High payoff for persistent streams.
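
A sketch of the split representation, assuming a record layout where the header length is known up front. `StoredMsgIndex` and `StoredMsg` are hypothetical names, not the current `StoredMessage` shape:

```csharp
using System;

// Compact, struct-sized index entry for hot dictionaries; the payload
// buffer itself stays reference-based and shared.
public readonly struct StoredMsgIndex
{
    public readonly ulong Sequence;
    public readonly long Timestamp;
    public readonly int HeaderLen;

    public StoredMsgIndex(ulong seq, long ts, int headerLen)
    {
        Sequence = seq;
        Timestamp = ts;
        HeaderLen = headerLen;
    }
}

public sealed class StoredMsg
{
    // One buffer holds headers followed by payload: no concatenated
    // header+payload array and no second managed copy.
    private readonly ReadOnlyMemory<byte> _record;

    public StoredMsgIndex Index { get; }

    public StoredMsg(StoredMsgIndex index, ReadOnlyMemory<byte> record)
    {
        Index = index;
        _record = record;
    }

    // Logical views are zero-copy slices over the single stored buffer.
    public ReadOnlyMemory<byte> Headers => _record[..Index.HeaderLen];
    public ReadOnlyMemory<byte> Payload => _record[Index.HeaderLen..];
}
```

With this shape the persisted record, the headers view, and the payload view share one allocation, and maintenance loops can walk the compact `StoredMsgIndex` entries without touching payload bytes.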

### 4. Reduce formatting and copy overhead in route and leaf message sends

- Files:
  - `src/NATS.Server/Routes/RouteConnection.cs`
  - `src/NATS.Server/LeafNodes/LeafConnection.cs`
- Current issue:
  - Control lines are built with string interpolation, converted with `Encoding.ASCII.GetBytes`, then written separately from payload and trailer.
  - `"\r\n"u8.ToArray()` allocates for every send.
  - Batch protocol send methods build a `StringBuilder`, then allocate one big ASCII byte array.
- Why it matters:
  - Cluster routes and leaf nodes are high-throughput transport paths in real deployments.
  - This code is not as hot as client publish fanout, but it is hot enough to matter under clustered load.
- Recommended optimization:
  - Mirror the client path:
    - encode control lines into stackalloc or pooled byte buffers with span formatting
    - write control + payload + CRLF via scatter-gather (`ReadOnlyMemory<byte>[]`) or a reusable outbound buffer
  - Replace repeated CRLF arrays with a static `ReadOnlyMemory<byte>` / `ReadOnlySpan<byte>`.
  - For route sub protocol batches, encode directly into an `ArrayBufferWriter<byte>` instead of `StringBuilder` → `string` → bytes.
- .NET 10 techniques:
  - `string.Create` or direct span formatting into pooled buffers
  - `ArrayBufferWriter<byte>`
  - scatter-gather writes where transport permits
- Risk / complexity:
  - Medium. Mostly localized refactoring with low semantic risk.
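
A sketch of span-formatted control-line encoding. The `RMSG` verb and the `RouteWriter` name are illustrative; `Utf8.TryWrite` (available since .NET 8) formats the interpolated string straight into the byte span with no intermediate `string` or `char[]`:

```csharp
using System;
using System.Text.Unicode;

public static class RouteWriter
{
    // One process-wide CRLF buffer; never "\r\n"u8.ToArray() per send.
    public static readonly byte[] Crlf = "\r\n"u8.ToArray();

    // Formats the control line directly into a caller-provided (often
    // stackalloc'd) byte span.
    public static bool TryEncodeControlLine(Span<byte> dest, string subject, int payloadLen, out int written)
        => Utf8.TryWrite(dest, $"RMSG {subject} {payloadLen}\r\n", out written);
}
```

A send then becomes: `Span<byte> buf = stackalloc byte[64];` for the control line, followed by one gathered write of control bytes, payload, and the shared `Crlf` trailer instead of three separate string-backed writes.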

## Medium ROI

### 5. Stop using LINQ-heavy materialization in monitoring endpoints

- Files:
  - `src/NATS.Server/Monitoring/ConnzHandler.cs`
  - `src/NATS.Server/Monitoring/SubszHandler.cs`
- Current issue:
  - Monitoring paths repeatedly call `ToArray()`, `Select()`, `Where()`, `OrderBy()`, `Skip()`, and `Take()`.
  - `SubszHandler` builds full subscription lists even when the request only needs counts.
  - `ConnzHandler` repeatedly rematerializes arrays while filtering and sorting.
- Why it matters:
  - Monitoring endpoints are not on the publish hot path, but they can become disruptive on busy servers with many clients/subscriptions.
  - These allocations are easy to avoid.
- Recommended optimization:
  - Separate count-only and detail-request paths.
  - Use single-pass loops and pooled temporary lists.
  - Delay expensive subscription detail expansion until after paging when possible.
  - Consider returning immutable snapshots generated incrementally by the server for common monitor queries.
- .NET 10 techniques:
  - explicit loops
  - pooled arrays/lists
  - `CollectionsMarshal.AsSpan()` for internal list traversal where safe
- Risk / complexity:
  - Low to medium.
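
A sketch of the single-pass filter-and-page shape that replaces chained `Where().OrderBy().Skip().Take()` pipelines. The helper name and generic shape are illustrative:

```csharp
using System;
using System.Collections.Generic;

public static class Connz
{
    // One pass over the source: counts every match (monitoring responses
    // usually report a total) but materializes only the requested page,
    // so no iterator chain and no intermediate arrays are allocated.
    public static (int Total, List<T> Page) FilterAndPage<T>(
        IEnumerable<T> source, Predicate<T> filter, int offset, int limit)
    {
        int total = 0;
        var page = new List<T>(limit);
        foreach (var item in source)
        {
            if (!filter(item)) continue;
            if (total >= offset && page.Count < limit)
                page.Add(item);
            total++;
        }
        return (total, page);
    }
}
```

Detail expansion (per-subscription info, client metadata) then runs only over the `limit`-sized page, never over the full filtered set.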

### 6. Modernize MQTT packet writing and text parsing

- Files:
  - `src/NATS.Server/Mqtt/MqttPacketWriter.cs`
  - `src/NATS.Server/Mqtt/MqttProtocolParser.cs`
- Current issue:
  - `MqttPacketWriter` returns fresh `byte[]` instances for every string/packet write.
  - Remaining-length encoding returns `scratch[..index].ToArray()`.
  - `ParseLine()` uses `Trim()`, `StartsWith()`, `Split()`, slicing, and string-based parsing throughout.
- Why it matters:
  - MQTT is a side protocol, so this is not the top optimization target.
  - Still worth fixing because the code is currently allocation-heavy and straightforward to improve.
- Recommended optimization:
  - Add `TryWrite...` APIs that write into caller-provided `Span<byte>` / `IBufferWriter<byte>`.
  - Keep remaining-length bytes on the stack and copy directly into the final destination buffer.
  - Rework `ParseLine()` to operate on `ReadOnlySpan<char>` and avoid `Split`.
- .NET 10 techniques:
  - `Span<byte>`
  - `IBufferWriter<byte>`
  - `ReadOnlySpan<char>`
  - `SearchValues<char>` for token separators if useful
- Risk / complexity:
  - Low.
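
A sketch of a `TryWrite`-style remaining-length encoder that writes straight into the destination buffer, following the MQTT variable-length integer scheme (7 data bits per byte, `0x80` continuation bit, at most 4 bytes). The method name is an assumption about the new API shape:

```csharp
using System;

public static class MqttEncoding
{
    public static bool TryWriteRemainingLength(Span<byte> dest, int value, out int written)
    {
        written = 0;
        // MQTT caps the remaining length at 268,435,455 (four 7-bit groups).
        if (value < 0 || value > 268_435_455) return false;
        do
        {
            if (written >= dest.Length) { written = 0; return false; }
            int b = value & 0x7F;
            value >>= 7;
            if (value > 0) b |= 0x80; // continuation bit: more bytes follow
            dest[written++] = (byte)b;
        } while (value > 0);
        return true;
    }
}
```

No scratch array and no `ToArray()`: the encoder emits at most four bytes directly where the packet writer needs them, e.g. 321 encodes as `0xC1 0x02`.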

### 7. Replace full scans over `_messages` with maintained indexes where operations are common

- Files:
  - `src/NATS.Server/JetStream/Storage/FileStore.cs`
- Current issue:
  - `LoadLastBySubjectAsync()` scans all messages, filters, sorts descending, and picks the first result.
  - `TrimToMaxMessages()` repeatedly calls `_messages.Keys.Min()`.
  - `PurgeEx()` materializes candidate lists before deletion.
- Why it matters:
  - These are algorithmic inefficiencies, not just allocation issues.
  - They become more visible as streams grow.
- Recommended optimization:
  - Maintain lightweight indexes:
    - last sequence by subject
    - first/last live sequence tracking without `Min()` / `Max()` scans
    - optionally per-subject linked or sorted sequence sets for purge/retention operations
  - If full indexing is too large a change, replace repeated LINQ scans with single-pass loops immediately.
- .NET 10 techniques:
  - compact metadata structs
  - tighter dictionary usage
  - fewer transient enumerators
- Risk / complexity:
  - Medium.
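
A sketch of the bookkeeping that turns `LoadLastBySubjectAsync()` into a dictionary lookup and removes the repeated `Keys.Min()` calls. `SubjectIndex` is a hypothetical name, and sequences are assumed to be appended in increasing order:

```csharp
using System.Collections.Generic;

public sealed class SubjectIndex
{
    private readonly Dictionary<string, ulong> _lastSeqBySubject = new();
    private ulong _firstSeq = 1, _lastSeq;

    // O(1) bookkeeping on every append; no full scan or sort later.
    public void OnAppend(string subject, ulong seq)
    {
        _lastSeqBySubject[subject] = seq;
        _lastSeq = seq;
    }

    // Trimming from the head just advances the rolling pointer, so
    // TrimToMaxMessages never needs _messages.Keys.Min().
    public void OnTrimFirst() => _firstSeq++;

    public bool TryGetLast(string subject, out ulong seq) =>
        _lastSeqBySubject.TryGetValue(subject, out seq);

    public ulong FirstSeq => _firstSeq;
    public ulong LastSeq => _lastSeq;
}
```

One caveat worth noting: deleting a message that is currently the last for its subject needs either a fallback scan or a per-subject sequence set, which is why the section above lists those sets as an optional extension.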

### 8. Reduce repeated small allocations in protocol constants and control frames

- Files:
  - `src/NATS.Server/Protocol/NatsParser.cs`
  - `src/NATS.Server/Routes/RouteConnection.cs`
  - `src/NATS.Server/LeafNodes/LeafConnection.cs`
  - other transport helpers
- Current issue:
  - Some constants are still materialized via `ToArray()` rather than held as static `byte[]` or `ReadOnlyMemory<byte>`.
  - Control frames repeatedly build temporary arrays for tiny literals.
- Why it matters:
  - These are cheap wins and remove noisy allocation churn.
- Recommended optimization:
  - Standardize on shared static byte literals for CRLF and fixed protocol tokens.
  - Audit for repeated `u8.ToArray()` or `Encoding.ASCII.GetBytes` on invariant text.
- .NET 10 techniques:
  - static cached buffers
  - span-based concatenation into reusable destinations
- Risk / complexity:
  - Low.
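
A minimal sketch of the standardization. The class and member names are illustrative; the pattern is simply to pay the `ToArray()` cost once at type initialization and to use `u8` span properties where no array is needed at all:

```csharp
using System;

public static class ProtoBytes
{
    // Allocated exactly once, at type initialization, not per send.
    public static readonly byte[] Crlf = "\r\n"u8.ToArray();
    public static readonly byte[] PongFrame = "PONG\r\n"u8.ToArray();

    // Zero-allocation view for comparisons and span-based copies: a
    // u8 literal returned as ReadOnlySpan<byte> points at static data.
    public static ReadOnlySpan<byte> CrlfSpan => "\r\n"u8;
}
```

Call sites that currently write `"\r\n"u8.ToArray()` per frame switch to `ProtoBytes.Crlf` (when an array or `ReadOnlyMemory<byte>` is required) or `ProtoBytes.CrlfSpan` (for pure span paths).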

## Lower ROI Or Caution Areas

### 9. Be selective about introducing more structs

- Files:
  - cross-cutting
- Current issue:
  - Some parts of the code would benefit from value types, but others already contain references (`string`, `byte[]`, `ReadOnlyMemory<byte>`, dictionaries) where converting whole models to structs would increase copying and call-site complexity.
- Recommendation:
  - Good struct candidates:
    - composite dictionary keys
    - compact metadata/index entries
    - parser token views
    - queue or routing bookkeeping records
  - Poor struct candidates:
    - large mutable models
    - objects with many reference-type fields
    - stateful connection objects
- Why it matters:
  - “Use more structs” is only a win when the values are small, immutable, and heavily allocated.

### 10. Avoid premature replacement of already-good memory APIs

- Files:
  - `src/NATS.Server/NatsClient.cs`
  - `src/NATS.Server/IO/OutboundBufferPool.cs`
  - several JetStream codecs
- Current issue:
  - There has already been meaningful optimization work in direct write buffering and pooled outbound paths.
  - Replacing these with more exotic abstractions without profiling could regress behavior.
- Recommendation:
  - Prefer extending the current buffer-pool and direct-write patterns into routes, leaves, and parser payload handling before redesigning the client write path again.

## Suggested Implementation Order

1. `NatsParser` hot-path byte retention and reduced payload copying
2. `SubList` key/token allocation cleanup and remote-sub key redesign
3. Route/leaf outbound buffer encoding cleanup
4. `FileStore` hot-path de-LINQ and payload/index refactoring
5. Monitoring endpoint de-materialization
6. MQTT writer/parser span-based cleanup

## What To Measure Before And After

Use the benchmark project and targeted microbenchmarks to measure:

- allocations per `PUB`, `SUB`, `UNSUB`, `CONNECT`
- allocations per delivered message under fanout
- `SubList.Match()` throughput and allocations for:
  - exact subjects
  - wildcard subjects
  - queue subscriptions
  - remote interest present
- JetStream append throughput and bytes allocated per append
- route/leaf forwarded-message allocations
- monitoring endpoint allocations for large client/subscription sets

## Summary

The best remaining gains are not from sprinkling `Span<T>` everywhere. They come from carrying byte-oriented data further through the hot paths, removing composite-string bookkeeping, reducing duplicate payload ownership in JetStream storage, and eliminating materialization-heavy helper code around routing and monitoring.

If you only do three things, do these first:

1. Rework `NatsParser` to avoid early `string` / `byte[]` creation.
2. Replace `SubList` composite string keys and string-heavy token handling.
3. Refactor `FileStore` and route/leaf send paths to reduce duplicate buffers and transient formatting allocations.