docs: add optimization planning documents

2026-03-13 10:19:56 -04:00
parent fb0d31c615
commit a1fc600d84
4 changed files with 1172 additions and 0 deletions
--- a/docs/plans/2026-03-13-optimizations_filestore-plan.md
+++ b/docs/plans/2026-03-13-optimizations_filestore-plan.md
@@ -0,0 +1,331 @@
 # FileStore Payload And Index Optimization Implementation Plan
 > **For Codex:** REQUIRED SUB-SKILLS: Use `using-git-worktrees` to create an isolated workspace before Task 1, then use `executeplan` to implement this plan task-by-task. After verification is complete, merge the finished branch back into `main`.
 **Goal:** Reduce JetStream FileStore memory churn and repeated full scans by tightening payload ownership, splitting compact metadata from large payload buffers, and replacing LINQ-based maintenance work with explicit indexes and loops.
 **Architecture:** Start by freezing current behavior across `AppendAsync`, `StoreMsg`, retention, snapshots, and recovery. Then introduce compact metadata/index structures, remove avoidable duplicate payload buffers, replace repeated `_messages` scans with maintained indexes, and finish by updating recovery/snapshot paths plus benchmark coverage.
 **Tech Stack:** .NET 10, C#, JetStream storage stack, `ReadOnlyMemory<byte>`, pooled buffers where safe, xUnit, existing JetStream benchmark harness.
 ---
 ## Scope Anchors
 - Primary source: `src/NATS.Server/JetStream/Storage/FileStore.cs`
 - Supporting sources:
  - `src/NATS.Server/JetStream/Storage/MsgBlock.cs`
  - `src/NATS.Server/JetStream/Storage/StoredMessage.cs`
  - `src/NATS.Server/JetStream/Storage/MessageRecord.cs`
 - Existing contract tests:
  - `tests/NATS.Server.JetStream.Tests/StreamStoreContractTests.cs`
  - `tests/NATS.Server.JetStream.Tests/JetStream/Storage/StoreInterfaceTests.cs`
 - Existing FileStore coverage:
  - `tests/NATS.Server.JetStream.Tests/FileStoreTests.cs`
  - `tests/NATS.Server.JetStream.Tests/JetStreamStoreIndexTests.cs`
  - `tests/NATS.Server.JetStream.Tests/JetStream/Storage/FileStoreCompressionTests.cs`
  - `tests/NATS.Server.JetStream.Tests/JetStream/Storage/FileStoreCrashRecoveryTests.cs`
  - `tests/NATS.Server.JetStream.Tests/JetStream/Storage/FileStoreTombstoneTests.cs`
 - Documentation to update: `Documentation/JetStream/Overview.md`
 - Benchmark project: `tests/NATS.Server.Benchmark.Tests/NATS.Server.Benchmark.Tests.csproj`
 - Benchmark comparison doc: `benchmarks_comparison.md`
 ## Task 0: Create an isolated git worktree and verify the baseline
 **Files:**
 - Modify: `.gitignore` only if the chosen local worktree directory is not already ignored
 **Step 1: Choose the worktree location using the repo convention**
 - Check for an existing `.worktrees/` directory first, then `worktrees/`.
 - If neither exists, check repo guidance before creating one.
 - Prefer a project-local `.worktrees/` directory when available.
 **Step 2: Verify the worktree directory is ignored before creating anything**
 - Run:
 ```bash
 git check-ignore -q .worktrees || git check-ignore -q worktrees
 ```
 - Expected: the command exits with status 0, meaning at least one of the candidate worktree directories is ignored.
 - If neither directory is ignored, add the chosen directory to `.gitignore`, commit that change on `main`, and then continue.
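 - A minimal sketch of the directory choice from Steps 1 and 2, assuming it runs from the repository root. `pick_worktree_dir` and `worktree_dir_is_ignored` are hypothetical helper names, not existing repo scripts:
 ```bash
 # Print the worktree directory to use: prefer .worktrees/, then worktrees/.
 # Prints nothing and returns 1 when neither exists, so the caller can
 # consult repo guidance before creating one.
 pick_worktree_dir() {
   if [ -d .worktrees ]; then
     echo .worktrees
   elif [ -d worktrees ]; then
     echo worktrees
   else
     return 1
   fi
 }
 # Succeeds only when a directory was picked and git reports it as ignored.
 worktree_dir_is_ignored() {
   chosen_dir=$(pick_worktree_dir) || return 1
   git check-ignore -q "$chosen_dir"
 }
 ```
 - Usage: `worktree_dir_is_ignored || echo "add the chosen directory to .gitignore first"`.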
 **Step 3: Create a dedicated branch and worktree for this plan**
 - Run:
 ```bash
 git worktree add .worktrees/filestore-payload-index-optimization -b codex/filestore-payload-index-optimization
 ```
 - Expected: a new isolated checkout exists at `.worktrees/filestore-payload-index-optimization`.
 **Step 4: Move into the worktree and verify the starting baseline**
 - Run:
 ```bash
 cd .worktrees/filestore-payload-index-optimization
 dotnet test tests/NATS.Server.JetStream.Tests/NATS.Server.JetStream.Tests.csproj -c Release
 ```
 - Expected: PASS before implementation starts.
 - If the baseline fails, stop and confirm whether to proceed from a failing baseline before changing FileStore code.
 **Step 5: Commit only the worktree bootstrap change if one was required**
 - Run only if `.gitignore` had to change and that change was not already committed on `main` in Step 2:
 ```bash
 git add .gitignore
 git commit -m "chore: ignore local worktree directory"
 ```
 ## Task 1: Freeze store behavior and add scan/ownership regression tests
 **Files:**
 - Modify: `tests/NATS.Server.JetStream.Tests/JetStreamStoreIndexTests.cs`
 - Modify: `tests/NATS.Server.JetStream.Tests/JetStream/Storage/StoreInterfaceTests.cs`
 - Create: `tests/NATS.Server.JetStream.Tests/JetStream/Storage/FileStoreOptimizationGuardTests.cs`
 **Step 1: Add failing tests for the targeted optimization boundaries**
 - Cover:
  - `AppendAsync` retaining logical payload behavior
  - `StoreMsg` with headers + payload
  - `LoadLastBySubjectAsync`
  - `TrimToMaxMessages`
  - `PurgeEx`
  - snapshot/recovery round-trips
 **Step 2: Add tests that lock first/last sequence bookkeeping**
 - Ensure `_firstSeq`, `_last`, and subject-last lookup behavior remain correct after removes, purges, compaction, and recovery.
 **Step 3: Run focused JetStream tests to prove the new tests fail first**
 - Run: `dotnet test tests/NATS.Server.JetStream.Tests/NATS.Server.JetStream.Tests.csproj --filter "FullyQualifiedName~FileStoreOptimizationGuardTests|FullyQualifiedName~JetStreamStoreIndexTests|FullyQualifiedName~StoreInterfaceTests" -c Release`
 - Expected: FAIL only in the newly added optimization-guard tests.
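 - The `--filter` expressions in this plan all follow one pattern: `FullyQualifiedName~<ClassName>` clauses joined with `|` (logical OR). A small sketch that builds such an expression from class names; `build_filter` is a hypothetical helper name used only for illustration:
 ```bash
 # Join test class names into a `dotnet test --filter` expression.
 # Uses the plain shell variables `filter` and `name` (no `local`, for portability).
 build_filter() {
   filter=""
   for name in "$@"; do
     # Prefix every clause after the first with "|".
     filter="${filter:+$filter|}FullyQualifiedName~$name"
   done
   printf '%s\n' "$filter"
 }
 ```
 - Usage: `dotnet test tests/NATS.Server.JetStream.Tests/NATS.Server.JetStream.Tests.csproj --filter "$(build_filter FileStoreOptimizationGuardTests JetStreamStoreIndexTests StoreInterfaceTests)" -c Release`.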
 **Step 4: Commit the failing-test baseline**
 - Run:
 ```bash
 git add tests/NATS.Server.JetStream.Tests/JetStreamStoreIndexTests.cs tests/NATS.Server.JetStream.Tests/JetStream/Storage/StoreInterfaceTests.cs tests/NATS.Server.JetStream.Tests/JetStream/Storage/FileStoreOptimizationGuardTests.cs
 git commit -m "test: lock FileStore optimization boundaries"
 ```
 ## Task 2: Introduce compact metadata/index types and remove full-scan bookkeeping
 **Files:**
 - Create: `src/NATS.Server/JetStream/Storage/StoredMessageIndex.cs`
 - Modify: `src/NATS.Server/JetStream/Storage/FileStore.cs`
 - Modify: `src/NATS.Server/JetStream/Storage/StoredMessage.cs`
 **Step 1: Split compact indexing metadata from payload-bearing message objects**
 - Add a small immutable metadata/index type that tracks at least:
  - sequence
  - subject
  - logical payload length
  - timestamp
  - subject-local links or last-seen markers if needed
 **Step 2: Replace repeated `Min()` / `Max()` / full-value scans with maintained state**
 - Maintain first live sequence, last live sequence, and last-by-subject values incrementally rather than recomputing them with LINQ.
 **Step 3: Run targeted index tests**
 - Run: `dotnet test tests/NATS.Server.JetStream.Tests/NATS.Server.JetStream.Tests.csproj --filter "FullyQualifiedName~JetStreamStoreIndexTests|FullyQualifiedName~FileStoreOptimizationGuardTests" -c Release`
 - Expected: PASS.
 **Step 4: Commit the metadata/index layer**
 - Run:
 ```bash
 git add src/NATS.Server/JetStream/Storage/StoredMessageIndex.cs src/NATS.Server/JetStream/Storage/FileStore.cs src/NATS.Server/JetStream/Storage/StoredMessage.cs
 git commit -m "perf: add compact FileStore index metadata"
 ```
 ## Task 3: Remove duplicate payload ownership in append and store paths
 **Files:**
 - Modify: `src/NATS.Server/JetStream/Storage/FileStore.cs`
 - Modify: `src/NATS.Server/JetStream/Storage/MsgBlock.cs`
 - Modify: `src/NATS.Server/JetStream/Storage/MessageRecord.cs`
 - Modify: `tests/NATS.Server.JetStream.Tests/FileStoreTests.cs`
 **Step 1: Rework `AppendAsync` and `StoreMsg` payload flow**
 - Stop eagerly keeping both the transformed persisted payload and a second, fully duplicated managed copy when a single buffer/view can safely serve both roles.
 - Keep correctness for compression, encryption, and header-bearing records explicit.
 **Step 2: Remove concatenated header+payload arrays where possible**
 - Let record encoding paths consume header and payload spans directly instead of always building `combined = new byte[...]`.
 - Leave a copy in place only where the persistence or recovery contract actually requires one.
 **Step 3: Run targeted persistence tests**
 - Run: `dotnet test tests/NATS.Server.JetStream.Tests/NATS.Server.JetStream.Tests.csproj --filter "FullyQualifiedName~FileStoreTests|FullyQualifiedName~FileStoreCompressionTests|FullyQualifiedName~FileStoreEncryptionTests" -c Release`
 - Expected: PASS.
 **Step 4: Commit the payload-ownership refactor**
 - Run:
 ```bash
 git add src/NATS.Server/JetStream/Storage/FileStore.cs src/NATS.Server/JetStream/Storage/MsgBlock.cs src/NATS.Server/JetStream/Storage/MessageRecord.cs tests/NATS.Server.JetStream.Tests/FileStoreTests.cs
 git commit -m "perf: reduce FileStore duplicate payload buffers"
 ```
 ## Task 4: Replace LINQ-heavy maintenance operations with explicit indexed paths
 **Files:**
 - Modify: `src/NATS.Server/JetStream/Storage/FileStore.cs`
 - Modify: `tests/NATS.Server.JetStream.Tests/JetStream/Storage/StoreInterfaceTests.cs`
 - Modify: `tests/NATS.Server.JetStream.Tests/JetStream/Storage/FileStoreCrashRecoveryTests.cs`
 - Modify: `tests/NATS.Server.JetStream.Tests/JetStream/Storage/FileStoreTombstoneTests.cs`
 **Step 1: Rewrite hot maintenance methods**
 - Replace LINQ-based implementations in:
  - `LoadLastBySubjectAsync`
  - `TrimToMaxMessages`
  - `PurgeEx`
  - snapshot/recovery recomputation paths
 - Use explicit loops and maintained indexes first; only add more elaborate per-subject structures if profiling still demands them.
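 - One rough way to track progress on this step is to list the LINQ-style calls that remain in `FileStore.cs`. The sketch below is a text heuristic only: the operator list is an assumption, and it can both miss calls and report false positives. `find_linq_calls` is a hypothetical helper name:
 ```bash
 # Print "line:content" for lines that appear to call common LINQ operators.
 # Lines that mention Math.Min( or Math.Max( are skipped as likely false positives.
 find_linq_calls() {
   grep -nE '\.(Min|Max|Where|OrderBy|OrderByDescending|FirstOrDefault|LastOrDefault|First|Last|Any)\(' "$1" |
     grep -vE 'Math\.(Min|Max)\('
 }
 ```
 - Usage: `find_linq_calls src/NATS.Server/JetStream/Storage/FileStore.cs`, then review each hit by hand before and after the rewrite.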
 **Step 2: Preserve recovery and tombstone correctness**
 - Verify delete markers, TTL rebuilds, compaction, and sequence-gap handling still match the current parity tests.
 **Step 3: Run targeted JetStream storage suites**
 - Run: `dotnet test tests/NATS.Server.JetStream.Tests/NATS.Server.JetStream.Tests.csproj --filter "FullyQualifiedName~StoreInterfaceTests|FullyQualifiedName~FileStoreCrashRecoveryTests|FullyQualifiedName~FileStoreTombstoneTests" -c Release`
 - Expected: PASS.
 **Step 4: Commit the maintenance-path rewrite**
 - Run:
 ```bash
 git add src/NATS.Server/JetStream/Storage/FileStore.cs tests/NATS.Server.JetStream.Tests/JetStream/Storage/StoreInterfaceTests.cs tests/NATS.Server.JetStream.Tests/JetStream/Storage/FileStoreCrashRecoveryTests.cs tests/NATS.Server.JetStream.Tests/JetStream/Storage/FileStoreTombstoneTests.cs
 git commit -m "perf: replace FileStore full scans with indexed loops"
 ```
 ## Task 5: Add benchmark coverage, update docs, and run full verification
 **Files:**
 - Create: `tests/NATS.Server.Benchmark.Tests/JetStream/FileStoreAppendBenchmarks.cs`
 - Modify: `Documentation/JetStream/Overview.md`
 **Step 1: Add FileStore-focused benchmarks**
 - Cover:
  - append throughput
  - sync publish
  - load-last-by-subject
  - purge/trim maintenance overhead
 - Record allocation deltas before/after.
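 - A before/after delta is easiest to compare as a percentage of the baseline. A minimal sketch, assuming the two measurements are plain numbers in the same unit (for example bytes allocated per operation); `pct_change` is a hypothetical helper name:
 ```bash
 # Print the percentage change from a baseline value to a new value.
 # Negative output means the new value is lower than the baseline.
 pct_change() {
   awk -v before="$1" -v after="$2" 'BEGIN {
     if (before == 0) { print "n/a"; exit }   # avoid dividing by zero
     printf "%.1f\n", (after - before) / before * 100
   }'
 }
 ```
 - Usage: `pct_change 200 150` prints `-25.0`, that is, 25% fewer allocations than the baseline.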
 **Step 2: Update JetStream documentation**
 - Document how FileStore now separates metadata/index concerns from payload storage and where copies still remain by design.
 **Step 3: Run full verification**
 - Run: `dotnet test tests/NATS.Server.JetStream.Tests/NATS.Server.JetStream.Tests.csproj -c Release`
 - Run: `dotnet test tests/NATS.Server.Benchmark.Tests/NATS.Server.Benchmark.Tests.csproj --filter "FullyQualifiedName~FileStore|FullyQualifiedName~SyncPublish|FullyQualifiedName~AsyncPublish" -c Release`
 - Expected: PASS; benchmark output shows fewer allocations in append-heavy scenarios than the pre-optimization baseline.
 **Step 4: Commit docs and benchmarks**
 - Run:
 ```bash
 git add tests/NATS.Server.Benchmark.Tests/JetStream/FileStoreAppendBenchmarks.cs Documentation/JetStream/Overview.md
 git commit -m "docs: record FileStore payload and index strategy"
 ```
 ## Task 6: Merge the verified worktree branch back into `main`
 **Files:**
 - No source-file changes expected unless the merge surfaces conflicts that require a follow-up fix
 **Step 1: Confirm the worktree branch is clean and fully verified**
 - Re-run the Task 5 verification commands in the worktree if anything changed after the final commit.
 - Run:
 ```bash
 git status --short
 ```
 - Expected: no uncommitted changes.
 **Step 2: Update `main` before merging**
 - From the primary checkout, run:
 ```bash
 git switch main
 git pull --ff-only
 ```
 - Expected: local `main` matches the latest remote state.
 **Step 3: Merge the finished branch back to `main`**
 - Run:
 ```bash
 git merge --ff-only codex/filestore-payload-index-optimization
 ```
 - Expected: `main` fast-forwards to include the completed FileStore optimization commits.
 - If fast-forward is not possible, rebase `codex/filestore-payload-index-optimization` onto `main`, re-run verification, and then repeat this step.
 **Step 4: Confirm `main` still passes after the merge**
 - Run:
 ```bash
 dotnet test tests/NATS.Server.JetStream.Tests/NATS.Server.JetStream.Tests.csproj -c Release
 ```
 - Expected: PASS on `main`.
 **Step 5: Remove the temporary worktree after merge confirmation**
 - Run:
 ```bash
 git worktree remove .worktrees/filestore-payload-index-optimization
 git branch -d codex/filestore-payload-index-optimization
 ```
 - Expected: the temporary checkout is removed and the topic branch is no longer needed locally.
 ## Task 7: Run the benchmark suite per the benchmark README and update the comparison document
 **Files:**
 - Modify: `benchmarks_comparison.md`
 - Reference: `tests/NATS.Server.Benchmark.Tests/README.md`
 **Step 1: Run the full benchmark suite with the README-prescribed command**
 - From `main` after Task 6 succeeds, run:
 ```bash
 dotnet test tests/NATS.Server.Benchmark.Tests \
  --filter "Category=Benchmark" \
  -v normal \
  --logger "console;verbosity=detailed" 2>&1 | tee /tmp/bench-output.txt
 ```
 - Expected: the benchmark suite completes and `tee` captures the comparison blocks in `/tmp/bench-output.txt`.
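 - Note that a pipeline's exit status is normally that of its last command, so piping into `tee` can hide a failing `dotnet test` run. A portable sketch that keeps the log and still returns the original exit status; `run_and_capture` is a hypothetical helper name:
 ```bash
 # Run a command, mirror its combined stdout/stderr into a log file, and
 # return the command's own exit status instead of tee's.
 run_and_capture() {
   log_file=$1
   shift
   status_file=$(mktemp)
   # Record the command's status inside the pipeline, before tee finishes.
   { "$@" 2>&1; echo "$?" > "$status_file"; } | tee "$log_file"
   status=$(cat "$status_file")
   rm -f "$status_file"
   return "$status"
 }
 ```
 - Usage: `run_and_capture /tmp/bench-output.txt dotnet test tests/NATS.Server.Benchmark.Tests --filter "Category=Benchmark" -v normal --logger "console;verbosity=detailed"`.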
 **Step 2: Extract the benchmark results from the captured output**
 - Review the `Standard Output Messages` sections in `/tmp/bench-output.txt`.
 - Capture the updated values for:
  - core pub/sub throughput
  - request/reply latency
  - JetStream sync publish
  - JetStream async file publish
  - ordered consumer throughput
  - durable consumer fetch throughput
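 - The review can start from a filtered copy of the log. The sketch below assumes each block starts at a line containing `Standard Output Messages` and ends at the next blank line; check that assumption against the real output, since benchmark text that contains blank lines would be cut short. `extract_stdout_blocks` is a hypothetical helper name:
 ```bash
 # Print every "Standard Output Messages" block from a captured test log.
 extract_stdout_blocks() {
   awk '
     /Standard Output Messages/ { in_block = 1; print; next }
     in_block && /^[[:space:]]*$/ { in_block = 0; print ""; next }  # blank line ends the block
     in_block { print }
   ' "$1"
 }
 ```
 - Usage: `extract_stdout_blocks /tmp/bench-output.txt > /tmp/bench-blocks.txt`.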
 **Step 3: Update `benchmarks_comparison.md`**
 - Update:
  - the benchmark run date on the first line
  - environment details if they changed
  - all affected tables with the new msg/s, MB/s, ratio, and latency values
  - the Summary and Key Observations text if the new ratios materially change the assessment
 **Step 4: Verify the comparison document changes are the only remaining edits**
 - Run:
 ```bash
 git status --short
 ```
 - Expected: only `benchmarks_comparison.md` is modified at this point unless the benchmark run surfaced a legitimate follow-up issue to capture separately.
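 - The same check can be scripted. A minimal sketch that reads `git status --short` output on stdin and succeeds only when every listed path is the expected file; it assumes paths without spaces or renames, and `only_modified` is a hypothetical helper name:
 ```bash
 # Succeed only if stdin lists at least one path and all of them equal $1.
 # `git status --short` lines are "XY <path>", so the path starts at column 4.
 only_modified() {
   awk -v expected="$1" '
     { seen++; if (substr($0, 4) != expected) bad++ }
     END { if (seen > 0 && bad == 0) exit 0; exit 1 }
   '
 }
 ```
 - Usage: `git status --short | only_modified benchmarks_comparison.md && echo "ready to commit"`.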
 **Step 5: Commit the benchmark comparison refresh**
 - Run:
 ```bash
 git add benchmarks_comparison.md
 git commit -m "docs: update benchmark comparison after FileStore optimization"
 ```
 ## Completion Checklist
 - [ ] Implementation started from an isolated git worktree on `codex/filestore-payload-index-optimization`.
 - [ ] `AppendAsync` and `StoreMsg` avoid unnecessary duplicate payload ownership.
 - [ ] `LoadLastBySubjectAsync`, `TrimToMaxMessages`, and `PurgeEx` no longer rely on repeated LINQ full scans.
 - [ ] First/last/live-sequence bookkeeping is maintained incrementally.
 - [ ] JetStream storage, recovery, compression, encryption, and tombstone tests remain green.
 - [ ] FileStore-focused benchmark coverage exists in `tests/NATS.Server.Benchmark.Tests/JetStream/`.
 - [ ] `Documentation/JetStream/Overview.md` explains the updated storage/index model.
 - [ ] Verified work has been merged back into `main` and the temporary worktree has been removed.
 - [ ] Full benchmark suite has been run from `main` using the command in `tests/NATS.Server.Benchmark.Tests/README.md`.
 - [ ] `benchmarks_comparison.md` has been updated to reflect the new benchmark results.
 ## Concise Execution Checklist For The Current Codebase
 - [ ] Create `codex/filestore-payload-index-optimization` in `.worktrees/filestore-payload-index-optimization` and verify `tests/NATS.Server.JetStream.Tests/NATS.Server.JetStream.Tests.csproj` passes before changes.
 - [ ] Add optimization-guard coverage in `tests/NATS.Server.JetStream.Tests/JetStreamStoreIndexTests.cs`, `tests/NATS.Server.JetStream.Tests/JetStream/Storage/StoreInterfaceTests.cs`, and new `tests/NATS.Server.JetStream.Tests/JetStream/Storage/FileStoreOptimizationGuardTests.cs`.
 - [ ] Rework the current FileStore hot paths in `src/NATS.Server/JetStream/Storage/FileStore.cs`: `AppendAsync`, `LoadLastBySubjectAsync`, `TrimToMaxMessages`, `StoreMsg`, and `PurgeEx`.
 - [ ] Introduce compact FileStore indexing metadata in new `src/NATS.Server/JetStream/Storage/StoredMessageIndex.cs` and adjust `src/NATS.Server/JetStream/Storage/StoredMessage.cs` accordingly.
 - [ ] Remove avoidable payload duplication in `src/NATS.Server/JetStream/Storage/FileStore.cs`, `src/NATS.Server/JetStream/Storage/MsgBlock.cs`, and `src/NATS.Server/JetStream/Storage/MessageRecord.cs`.
 - [ ] Keep JetStream storage parity green by re-running the existing storage-focused suites under `tests/NATS.Server.JetStream.Tests/JetStream/Storage/`, especially compression, crash recovery, tombstones, and store interface coverage.
 - [ ] Add FileStore benchmark coverage alongside the existing JetStream benchmark classes in `tests/NATS.Server.Benchmark.Tests/JetStream/`.
 - [ ] Update `Documentation/JetStream/Overview.md` to describe the new payload/index split and the remaining intentional copy boundaries.
 - [ ] Merge the verified topic branch back into `main`, re-run JetStream tests on `main`, then remove the temporary worktree.
 - [ ] Run the full benchmark suite exactly as documented in `tests/NATS.Server.Benchmark.Tests/README.md` and update `benchmarks_comparison.md` with the new measurements.
--- a/docs/plans/2026-03-13-optimizations_parser-plan.md
+++ b/docs/plans/2026-03-13-optimizations_parser-plan.md
@@ -0,0 +1,244 @@
 # Parser Span Retention Implementation Plan
 > **For Codex:** REQUIRED SUB-SKILLS: Use `using-git-worktrees` to create an isolated worktree before making changes, `executeplan` to implement this plan task-by-task, and `finishing-a-development-branch` to merge the verified work back to `main` when implementation is complete.
 **Goal:** Reduce parser hot-path allocations by keeping protocol fields and payloads in byte-oriented views until a caller explicitly needs materialized `string` or copied `byte[]` values.
 **Architecture:** Introduce a byte-first parser representation alongside the current `ParsedCommand` contract, then migrate `NatsClient` and adjacent hot paths to consume the new representation without changing wire behavior. Preserve compatibility through an adapter layer so functional parity stays stable while allocation-heavy paths move to spans, pooled buffers, and sequence slices.
 **Tech Stack:** .NET 10, C#, `System.Buffers`, `System.IO.Pipelines`, `ReadOnlySequence<byte>`, `SequenceReader<byte>`, xUnit, existing benchmark test harness.
 ---
 ## Scope Anchors
 - Primary source: `src/NATS.Server/Protocol/NatsParser.cs`
 - Primary consumer: `src/NATS.Server/NatsClient.cs`
 - Existing parser tests: `tests/NATS.Server.Core.Tests/ParserTests.cs`
 - Existing snippet/parity tests: `tests/NATS.Server.Core.Tests/Protocol/ProtocolParserSnippetGapParityTests.cs`
 - Documentation to update: `Documentation/Protocol/Parser.md`
 - Benchmark project: `tests/NATS.Server.Benchmark.Tests/NATS.Server.Benchmark.Tests.csproj`
 - Benchmark run instructions: `tests/NATS.Server.Benchmark.Tests/README.md`
 - Benchmark comparison report: `benchmarks_comparison.md`
 ## Task 0: Create an isolated git worktree for the parser optimization work
 **Files:**
 - Verify: `.worktrees/`
 - Modify if needed: `.gitignore`
 **Step 1: Verify the preferred worktree directory is available and ignored**
 - Check that `.worktrees/` exists and is ignored by git before creating a project-local worktree.
 - If `.worktrees/` is not ignored, add it to `.gitignore`, then commit that repository hygiene fix before continuing.
 **Step 2: Create the feature worktree on a `codex/` branch**
 - Run:
 ```bash
 git worktree add .worktrees/codex-parser-span-retention -b codex/parser-span-retention
 cd .worktrees/codex-parser-span-retention
 ```
 - Expected: a new isolated worktree exists at `.worktrees/codex-parser-span-retention` on branch `codex/parser-span-retention`.
 **Step 3: Verify the worktree starts from a clean, passing baseline**
 - Run: `dotnet test tests/NATS.Server.Core.Tests/NATS.Server.Core.Tests.csproj -c Release`
 - Expected: PASS. If this fails, stop and resolve or explicitly confirm whether to proceed from a failing baseline.
 **Step 4: Commit any required worktree setup fix**
 - Only if `.gitignore` changed, run:
 ```bash
 git add .gitignore
 git commit -m "chore: ignore local worktree directories"
 ```
 ## Task 1: Freeze parser behavior and add allocation-focused tests
 **Files:**
 - Modify: `tests/NATS.Server.Core.Tests/ParserTests.cs`
 - Modify: `tests/NATS.Server.Core.Tests/Protocol/ProtocolParserSnippetGapParityTests.cs`
 - Create: `tests/NATS.Server.Core.Tests/Protocol/ParserSpanRetentionTests.cs`
 **Step 1: Add failing tests for byte-first parser behavior**
 - Cover `PUB`, `HPUB`, `CONNECT`, and `INFO` with assertions that the new parser path can expose field data without forcing immediate `string` materialization.
 - Add split-payload cases to prove the parser preserves pending payload state across reads.
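 - To make the split-payload case concrete: a NATS `PUB` frame is `PUB <subject> [reply-to] <#bytes>\r\n<payload>\r\n`, and a split read delivers those bytes in two or more chunks. The shell sketch below only illustrates the byte layout (the real tests would feed equivalent chunks to the parser in C#). It assumes ASCII payloads and no reply subject; `pub_frame`, `first_chunk`, and `rest_chunk` are hypothetical helper names:
 ```bash
 # Build a PUB frame without a reply subject.
 # Assumes an ASCII payload so ${#payload} equals the byte count.
 pub_frame() {
   subject=$1
   payload=$2
   printf 'PUB %s %d\r\n%s\r\n' "$subject" "${#payload}" "$payload"
 }
 # Split a byte stream on stdin after the first N bytes.
 first_chunk() { head -c "$1"; }
 rest_chunk() { tail -c "+$(($1 + 1))"; }
 ```
 - Usage: `pub_frame foo hello | first_chunk 13` yields the control line plus `he`, and `pub_frame foo hello | rest_chunk 13` yields `llo\r\n`, mimicking a payload that arrives across two reads.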
 **Step 2: Add compatibility tests for existing `ParsedCommand` behavior**
 - Keep current semantics for `Type`, `Subject`, `ReplyTo`, `Queue`, `Sid`, `HeaderSize`, and `Payload`.
 - Ensure malformed protocol inputs still throw `ProtocolViolationException` with existing snippets/messages.
 **Step 3: Run targeted tests to verify the new tests fail first**
 - Run: `dotnet test tests/NATS.Server.Core.Tests/NATS.Server.Core.Tests.csproj --filter "FullyQualifiedName~ParserTests|FullyQualifiedName~ParserSpanRetentionTests|FullyQualifiedName~ProtocolParserSnippetGapParityTests" -c Release`
 - Expected: FAIL in the newly added parser span-retention tests only.
 **Step 4: Commit the failing-test baseline**
 - Run:
 ```bash
 git add tests/NATS.Server.Core.Tests/ParserTests.cs tests/NATS.Server.Core.Tests/Protocol/ProtocolParserSnippetGapParityTests.cs tests/NATS.Server.Core.Tests/Protocol/ParserSpanRetentionTests.cs
 git commit -m "test: lock parser span-retention behavior"
 ```
 ## Task 2: Introduce byte-oriented parser view types
 **Files:**
 - Create: `src/NATS.Server/Protocol/ParsedCommandView.cs`
 - Modify: `src/NATS.Server/Protocol/NatsParser.cs`
 **Step 1: Add a hot-path parser view contract**
 - Create a `ref struct` or small `readonly struct` representation for command views that can carry:
  - operation kind
  - subject/reply/queue/SID as spans or sequence-backed views
  - payload as `ReadOnlySequence<byte>` or `ReadOnlyMemory<byte>` when contiguous
  - header size and max-messages metadata
 **Step 2: Add an adapter to the current `ParsedCommand` shape**
 - Keep the public/internal `ParsedCommand` entry point usable for existing consumers and tests.
 - Centralize materialization so `Encoding.ASCII.GetString(...)` and `ToArray()` happen in one adapter layer instead of inside every parse branch.
 **Step 3: Re-run parser tests**
 - Run: `dotnet test tests/NATS.Server.Core.Tests/NATS.Server.Core.Tests.csproj --filter "FullyQualifiedName~Parser" -c Release`
 - Expected: FAIL only in branches not yet migrated to the new view path.
 **Step 4: Commit the parser-view scaffolding**
 - Run:
 ```bash
 git add src/NATS.Server/Protocol/ParsedCommandView.cs src/NATS.Server/Protocol/NatsParser.cs
 git commit -m "feat: add byte-oriented parser view contract"
 ```
 ## Task 3: Rework control-line parsing and pending payload state
 **Files:**
 - Modify: `src/NATS.Server/Protocol/NatsParser.cs`
 **Step 1: Remove early string materialization from control-line parsing**
 - Change `ParsePub`, `ParseHPub`, `ParseSub`, `ParseUnsub`, `ParseConnect`, and `ParseInfo` to keep raw byte slices in the hot parser path.
 - Replace `_pendingSubject` and `_pendingReplyTo` string fields with byte-oriented pending state.
 **Step 2: Avoid unconditional payload copies**
 - Update `TryReadPayload()` so single-segment payloads can flow through as borrowed memory/slices.
 - Copy only when the payload is multi-segment or when the compatibility adapter explicitly requires a standalone buffer.
 **Step 3: Replace repeated tiny literal allocations**
 - Stop using per-call `u8.ToArray()`-style buffers for CRLF and other fixed protocol tokens inside this parser path.
 - Add shared static buffers where appropriate.
 **Step 4: Run targeted regression tests**
 - Run: `dotnet test tests/NATS.Server.Core.Tests/NATS.Server.Core.Tests.csproj --filter "FullyQualifiedName~ParserTests|FullyQualifiedName~ProtocolParserSnippetGapParityTests" -c Release`
 - Expected: PASS.
 **Step 5: Commit the parser hot-path rewrite**
 - Run:
 ```bash
 git add src/NATS.Server/Protocol/NatsParser.cs
 git commit -m "perf: keep parser state in bytes until materialization"
 ```
 ## Task 4: Migrate `NatsClient` to the new parser path without changing behavior
 **Files:**
 - Modify: `src/NATS.Server/NatsClient.cs`
 - Modify: `tests/NATS.Server.Core.Tests/ParserTests.cs`
 - Modify: `tests/NATS.Server.Core.Tests/Protocol/ClientProtocolGoParityTests.cs`
 **Step 1: Consume parser views first, materialize only at command handling boundaries**
 - Update `ProcessCommandsAsync` and any parser call sites so hot `PUB`/`HPUB` handling can read subject, reply, and payload from the byte-oriented representation.
 - Keep logging/tracing behavior intact, but materialize strings for tracing only when tracing is actually enabled, so tracing is the only reason the hot path ever creates strings.
 **Step 2: Preserve feature parity**
 - Verify header parsing, payload size checks, connect/info handling, and slow-consumer/error behavior still match current tests.
 **Step 3: Run consumer-facing protocol tests**
 - Run: `dotnet test tests/NATS.Server.Core.Tests/NATS.Server.Core.Tests.csproj --filter "FullyQualifiedName~ClientProtocolGoParityTests|FullyQualifiedName~ParserTests" -c Release`
 - Expected: PASS.
 **Step 4: Commit the consumer migration**
 - Run:
 ```bash
 git add src/NATS.Server/NatsClient.cs tests/NATS.Server.Core.Tests/ParserTests.cs tests/NATS.Server.Core.Tests/Protocol/ClientProtocolGoParityTests.cs
 git commit -m "perf: consume parser command views in client hot path"
 ```
 ## Task 5: Add benchmarks, document the change, run full verification, and refresh the benchmark comparison report
 **Files:**
 - Create: `tests/NATS.Server.Benchmark.Tests/Protocol/ParserHotPathBenchmarks.cs`
 - Modify: `Documentation/Protocol/Parser.md`
 - Modify: `benchmarks_comparison.md`
 **Step 1: Add parser-focused benchmark coverage**
 - Add microbenchmarks for:
  - `PING` / `PONG`
  - `PUB`
  - `HPUB`
  - split payload reads
 - Capture throughput and allocation deltas before/after.
 **Step 2: Update protocol documentation**
 - Document the new parser view + adapter split, why strings are deferred, and where payload copying is still intentionally required.
 **Step 3: Run full verification**
 - Run: `dotnet test tests/NATS.Server.Core.Tests/NATS.Server.Core.Tests.csproj -c Release`
 - Run: `dotnet test tests/NATS.Server.Benchmark.Tests/NATS.Server.Benchmark.Tests.csproj --filter "FullyQualifiedName~Parser" -c Release`
 - Expected: PASS; benchmark output shows reduced allocations relative to the baseline run.
 **Step 4: Run the full benchmark suite per the benchmark project README**
 - Run:
 ```bash
 dotnet test tests/NATS.Server.Benchmark.Tests \
  --filter "Category=Benchmark" \
  -v normal \
  --logger "console;verbosity=detailed" 2>&1 | tee /tmp/bench-output.txt
 ```
 - Expected: benchmark comparison output is captured in `/tmp/bench-output.txt`, including the "Standard Output Messages" blocks described in `tests/NATS.Server.Benchmark.Tests/README.md`.
 **Step 5: Update `benchmarks_comparison.md` with the new benchmark results**
 - Extract the comparison blocks from `/tmp/bench-output.txt`.
 - Update `benchmarks_comparison.md` with the latest msg/s, MB/s, ratio, and latency values.
 - Update the benchmark date, environment description, Summary table, and Key Observations so they match the new run.
 **Step 6: Commit the verification, docs, and benchmark report refresh**
 - Run:
 ```bash
 git add tests/NATS.Server.Benchmark.Tests/Protocol/ParserHotPathBenchmarks.cs Documentation/Protocol/Parser.md benchmarks_comparison.md
 git commit -m "docs: record parser hot-path allocation strategy"
 ```
 ## Task 6: Merge the verified parser work back to `main` and clean up the worktree
 **Files:**
 - No source changes expected
 **Step 1: Confirm the feature branch is fully verified before merge**
 - Reuse the verification from Task 5. Do not merge if the core tests, parser benchmark tests, or full benchmark suite run did not complete successfully.
 **Step 2: Merge `codex/parser-span-retention` back into `main`**
 - Return to the primary repository worktree and run:
 ```bash
 git checkout main
 git pull
 git merge codex/parser-span-retention
 ```
 - Expected: the parser optimization commits merge cleanly into `main`.
 **Step 3: Re-run verification on the merged `main` branch**
 - Run: `dotnet test tests/NATS.Server.Core.Tests/NATS.Server.Core.Tests.csproj -c Release`
 - Run: `dotnet test tests/NATS.Server.Benchmark.Tests/NATS.Server.Benchmark.Tests.csproj --filter "FullyQualifiedName~Parser" -c Release`
 - Confirm `benchmarks_comparison.md` reflects the results captured from the Task 5 full benchmark suite.
 - Expected: PASS on merged `main`.
 **Step 4: Delete the feature branch and remove the worktree**
 - Run:
 ```bash
 git worktree remove .worktrees/codex-parser-span-retention
 git branch -d codex/parser-span-retention
 ```
 - Expected: only `main` remains checked out in the primary workspace, and the temporary parser worktree is removed.
 ## Completion Checklist
 - [ ] Parser optimization work was implemented in an isolated `.worktrees/codex-parser-span-retention` worktree on branch `codex/parser-span-retention`.
 - [ ] `NatsParser` no longer materializes hot-path `string` values during parse unless the compatibility adapter requests them.
 - [ ] Single-segment payloads can pass through without an unconditional `byte[]` copy.
 - [ ] Existing parser and protocol behavior remains green in core tests.
 - [ ] Parser-focused benchmark coverage exists in `tests/NATS.Server.Benchmark.Tests/Protocol/`.
 - [ ] The full benchmark suite was run using the workflow from `tests/NATS.Server.Benchmark.Tests/README.md`.
 - [ ] `benchmarks_comparison.md` was updated to reflect the latest benchmark run.
 - [ ] `Documentation/Protocol/Parser.md` explains the byte-first parser architecture.
 - [ ] Verified parser optimization commits were merged back into `main`.
--- a/docs/plans/2026-03-13-optimizations_sublist-plan.md
+++ b/docs/plans/2026-03-13-optimizations_sublist-plan.md
@@ -0,0 +1,300 @@
 # SubList Allocation Reduction Implementation Plan
 > **For Codex:** REQUIRED SUB-SKILLS: Use `using-git-worktrees` to create an isolated worktree before making changes, `executeplan` to implement this plan task-by-task, and `finishing-a-development-branch` to merge the verified work back to `main` when implementation is complete.
 **Goal:** Reduce publish-path allocation and lookup overhead in `SubList` by removing composite string keys, minimizing `token.ToString()` churn, and tightening `Match()` result building without changing subscription semantics.
 **Architecture:** First lock the current trie, cache, and remote-interest behavior with targeted tests. Then replace routed-sub bookkeeping with a dedicated value key, remove string split/rebuild work from remote cleanup, and finally optimize trie traversal and match result construction with span-friendly helpers and pooled builders.
 **Tech Stack:** .NET 10, C#, `ReaderWriterLockSlim`, span-based token parsing, xUnit, existing clustering/gateway parity suites, benchmark test harness.
 ---
 ## Scope Anchors
 - Primary source: `src/NATS.Server/Subscriptions/SubList.cs`
 - Supporting source: `src/NATS.Server/Subscriptions/SubjectMatch.cs`
 - Existing core tests: `tests/NATS.Server.Core.Tests/Subscriptions/SubListGoParityTests.cs`
 - Existing ctor/notification tests: `tests/NATS.Server.Core.Tests/Subscriptions/SubListCtorAndNotificationParityTests.cs`
 - Route cleanup tests: `tests/NATS.Server.Clustering.Tests/Routes/RouteRemoteSubCleanupParityBatch2Tests.cs`
 - Gateway/route interest tests:
  - `tests/NATS.Server.Gateways.Tests/Gateways/GatewayInterestModeTests.cs`
  - `tests/NATS.Server.Clustering.Tests/Routes/RouteInterestIdempotencyTests.cs`
  - `tests/NATS.Server.Clustering.Tests/Routes/RouteSubscriptionTests.cs`
 - Documentation to update: `Documentation/Subscriptions/SubList.md`
 - Benchmark workflow reference: `tests/NATS.Server.Benchmark.Tests/README.md`
 - Benchmark comparison document: `benchmarks_comparison.md`
 ## Task 0: Create an isolated git worktree for the SubList optimization work
 **Files:**
 - Verify: `.worktrees/`
 - Modify if needed: `.gitignore`
 **Step 1: Verify the preferred worktree directory is available and ignored**
 - Check that `.worktrees/` exists and is ignored by git before creating a project-local worktree.
 - If `.worktrees/` is not ignored, add it to `.gitignore`, then commit that repository hygiene fix before continuing.
 **Step 2: Create the feature worktree on a `codex/` branch**
 - Run:
 ```bash
 git worktree add .worktrees/codex-sublist-allocation-reduction -b codex/sublist-allocation-reduction
 cd .worktrees/codex-sublist-allocation-reduction
 ```
 - Expected: a new isolated worktree exists at `.worktrees/codex-sublist-allocation-reduction` on branch `codex/sublist-allocation-reduction`.
 **Step 3: Verify the worktree starts from a clean, passing baseline**
 - Run: `dotnet test tests/NATS.Server.Core.Tests/NATS.Server.Core.Tests.csproj --filter "FullyQualifiedName~SubList" -c Release`
 - Run: `dotnet test tests/NATS.Server.Clustering.Tests/NATS.Server.Clustering.Tests.csproj --filter "FullyQualifiedName~RouteRemoteSubCleanupParityBatch2Tests|FullyQualifiedName~RouteInterestIdempotencyTests|FullyQualifiedName~RouteSubscriptionTests" -c Release`
 - Run: `dotnet test tests/NATS.Server.Gateways.Tests/NATS.Server.Gateways.Tests.csproj --filter "FullyQualifiedName~GatewayInterestModeTests|FullyQualifiedName~GatewayInterestIdempotencyTests|FullyQualifiedName~GatewayForwardingTests" -c Release`
 - Expected: PASS. If this fails, stop and resolve or explicitly confirm whether to proceed from a failing baseline.
 **Step 4: Commit any required worktree setup fix**
 - Only if `.gitignore` changed, run:
 ```bash
 git add .gitignore
 git commit -m "chore: ignore local worktree directories"
 ```
 ## Task 1: Lock behavior around remote interest, cleanup, and matching
 **Files:**
 - Modify: `tests/NATS.Server.Core.Tests/Subscriptions/SubListGoParityTests.cs`
 - Modify: `tests/NATS.Server.Clustering.Tests/Routes/RouteRemoteSubCleanupParityBatch2Tests.cs`
 - Create: `tests/NATS.Server.Core.Tests/Subscriptions/SubListAllocationGuardTests.cs`
 **Step 1: Add failing tests for routed-sub bookkeeping changes**
 - Cover:
  - applying the same remote subscription twice
  - removing remote subscriptions by route and by route/account
  - queue-weight updates
  - exact and wildcard remote-interest queries
 **Step 2: Add tests for match result stability**
 - Ensure `Match()` still returns correct plain and queue subscription sets for exact, `*`, and `>` subjects.
 - Add a test that specifically locks cache behavior across generation bumps.
 **Step 3: Run the focused tests to prove the new coverage fails first**
 - Run: `dotnet test tests/NATS.Server.Core.Tests/NATS.Server.Core.Tests.csproj --filter "FullyQualifiedName~SubList" -c Release`
 - Run: `dotnet test tests/NATS.Server.Clustering.Tests/NATS.Server.Clustering.Tests.csproj --filter "FullyQualifiedName~RouteRemoteSubCleanupParityBatch2Tests|FullyQualifiedName~RouteInterestIdempotencyTests" -c Release`
 - Expected: FAIL only in the newly added allocation-guard or key-behavior tests.
 **Step 4: Commit the failing-test baseline**
 - Run:
 ```bash
 git add tests/NATS.Server.Core.Tests/Subscriptions/SubListGoParityTests.cs tests/NATS.Server.Core.Tests/Subscriptions/SubListAllocationGuardTests.cs tests/NATS.Server.Clustering.Tests/Routes/RouteRemoteSubCleanupParityBatch2Tests.cs
 git commit -m "test: lock SubList remote-key and match behavior"
 ```
 ## Task 2: Replace composite routed-sub strings with a dedicated value key
 **Files:**
 - Create: `src/NATS.Server/Subscriptions/RoutedSubKey.cs`
 - Modify: `src/NATS.Server/Subscriptions/SubList.cs`
 - Modify: `tests/NATS.Server.Clustering.Tests/Routes/RouteRemoteSubCleanupParityBatch2Tests.cs`
 **Step 1: Introduce a strongly typed routed-sub key**
 - Add a small immutable value type for `(RouteId, Account, Subject, Queue)`.
 - Use it as the dictionary key for `_remoteSubs` instead of the `"route|account|subject|queue"` composite string.
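 A minimal sketch of the key model described above, assuming `_remoteSubs` maps each key to a queue weight. `RoutedSubKeyDemo` and its `Apply` helper are hypothetical names for illustration only; they are not existing `SubList` members.
 ```csharp
 using System.Collections.Generic;

 // Hypothetical shape: one field per part of the old "route|account|subject|queue" string.
 // Record structs get value equality and hashing generated by the compiler.
 public readonly record struct RoutedSubKey(string RouteId, string Account, string Subject, string? Queue);

 public static class RoutedSubKeyDemo
 {
     // Applies a remote subscription; returns true only when the key was not present before.
     public static bool Apply(Dictionary<RoutedSubKey, int> remoteSubs, RoutedSubKey key, int weight)
     {
         bool added = !remoteSubs.ContainsKey(key);
         remoteSubs[key] = weight; // a repeated apply only refreshes the queue weight
         return added;
     }
 }
 ```
 Because equality is field-by-field, applying the same `(route, account, subject, queue)` twice hits the same entry without building or splitting a string.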
 **Step 2: Remove string split/rebuild helpers from hot paths**
 - Replace `BuildRoutedSubKey(...)`, `GetAccNameFromRoutedSubKey(...)`, and `GetRoutedSubKeyInfo(...)` usage in runtime paths with the new value key.
 - Keep compatibility helper coverage only if other call sites still require string-facing helpers temporarily.
 **Step 3: Run remote-interest tests**
 - Run: `dotnet test tests/NATS.Server.Clustering.Tests/NATS.Server.Clustering.Tests.csproj --filter "FullyQualifiedName~RouteRemoteSubCleanupParityBatch2Tests|FullyQualifiedName~RouteSubscriptionTests" -c Release`
 - Run: `dotnet test tests/NATS.Server.Gateways.Tests/NATS.Server.Gateways.Tests.csproj --filter "FullyQualifiedName~GatewayInterestModeTests|FullyQualifiedName~GatewayInterestIdempotencyTests" -c Release`
 - Expected: PASS.
 **Step 4: Commit the key-model refactor**
 - Run:
 ```bash
 git add src/NATS.Server/Subscriptions/RoutedSubKey.cs src/NATS.Server/Subscriptions/SubList.cs tests/NATS.Server.Clustering.Tests/Routes/RouteRemoteSubCleanupParityBatch2Tests.cs
 git commit -m "perf: replace SubList routed-sub string keys"
 ```
 ## Task 3: Remove avoidable string churn from trie traversal
 **Files:**
 - Modify: `src/NATS.Server/Subscriptions/SubList.cs`
 - Modify: `src/NATS.Server/Subscriptions/SubjectMatch.cs`
 - Modify: `tests/NATS.Server.Core.Tests/Subscriptions/SubListGoParityTests.cs`
 **Step 1: Rework token traversal helpers**
 - Add a shared subject token walker that can expose tokens as spans and only allocate when a trie node insertion truly needs a durable string key.
 - Remove repeated `token.ToString()` in traversal paths where lookups can operate on a transient token view first.
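 One possible shape for the token walker, as a sketch rather than the final design. `SubjectTokenWalker` and `TokenLookupSketch` are hypothetical names, and the flat `Dictionary<string, int>` stands in for a trie node's child map. The span lookup relies on `Dictionary<TKey,TValue>.GetAlternateLookup`, which requires .NET 9 or later and a comparer that supports span keys, so the sketch assumes the dictionary was created with `StringComparer.Ordinal`.
 ```csharp
 using System;
 using System.Collections.Generic;

 // Hypothetical helper: walks '.'-separated subject tokens without allocating strings.
 public ref struct SubjectTokenWalker
 {
     private ReadOnlySpan<char> _remaining;
     private bool _done;

     public SubjectTokenWalker(ReadOnlySpan<char> subject)
     {
         _remaining = subject;
         _done = false;
         Current = default;
     }

     public ReadOnlySpan<char> Current { get; private set; }

     public bool MoveNext()
     {
         if (_done) return false;
         int dot = _remaining.IndexOf('.');
         if (dot < 0)
         {
             Current = _remaining; // last token (empty for an empty subject)
             _remaining = default;
             _done = true;
         }
         else
         {
             Current = _remaining[..dot];
             _remaining = _remaining[(dot + 1)..];
         }
         return true;
     }
 }

 public static class TokenLookupSketch
 {
     // Counts how many leading tokens already exist as keys, without any ToString() call.
     // Assumes `children` was constructed with StringComparer.Ordinal.
     public static int CountKnownLeadingTokens(Dictionary<string, int> children, ReadOnlySpan<char> subject)
     {
         var lookup = children.GetAlternateLookup<ReadOnlySpan<char>>();
         var walker = new SubjectTokenWalker(subject);
         int known = 0;
         while (walker.MoveNext())
         {
             if (!lookup.TryGetValue(walker.Current, out _)) break;
             known++;
         }
         return known;
     }
 }
 ```
 Only the insert path would then call `ToString()`, and only for a token that is genuinely missing from the node.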
 **Step 2: Keep exact-subject match paths allocation-lean**
 - Prefer span/token comparison helpers for exact-match and wildcard traversal logic.
 - Leave wildcard semantics unchanged.
 **Step 3: Run core subscription tests**
 - Run: `dotnet test tests/NATS.Server.Core.Tests/NATS.Server.Core.Tests.csproj --filter "FullyQualifiedName~SubListGoParityTests|FullyQualifiedName~SubListCtorAndNotificationParityTests|FullyQualifiedName~SubListParityBatch2Tests" -c Release`
 - Expected: PASS.
 **Step 4: Commit trie traversal cleanup**
 - Run:
 ```bash
 git add src/NATS.Server/Subscriptions/SubList.cs src/NATS.Server/Subscriptions/SubjectMatch.cs tests/NATS.Server.Core.Tests/Subscriptions/SubListGoParityTests.cs
 git commit -m "perf: reduce SubList token string churn"
 ```
 ## Task 4: Pool `Match()` result building and remove cleanup copies
 **Files:**
 - Modify: `src/NATS.Server/Subscriptions/SubList.cs`
 - Modify: `tests/NATS.Server.Core.Tests/Subscriptions/SubListAllocationGuardTests.cs`
 - Modify: `tests/NATS.Server.Clustering.Tests/Routes/RouteSubscriptionTests.cs`
 **Step 1: Replace temporary per-match collections**
 - Rework `Match()` so temporary result building uses pooled builders or `ArrayBufferWriter<T>` instead of fresh nested `List<T>` allocations on every call.
 - Preserve the current public `SubListResult` shape unless profiling proves a larger contract change is justified.
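 A sketch of what a pooled builder could look like, assuming the final `SubListResult` still needs exactly sized arrays. `PooledBuilder<T>` is a hypothetical type, not an existing one; it is not thread-safe and must not be used after `Dispose()`.
 ```csharp
 using System;
 using System.Buffers;

 // Hypothetical scratch builder: the backing array comes from ArrayPool<T>.Shared
 // instead of a fresh List<T> per Match() call.
 public sealed class PooledBuilder<T> : IDisposable
 {
     private T[] _buffer;
     private int _count;

     public PooledBuilder(int initialCapacity = 16)
     {
         _buffer = ArrayPool<T>.Shared.Rent(Math.Max(initialCapacity, 1));
     }

     public int Count => _count;

     public void Add(T item)
     {
         if (_count == _buffer.Length) Grow();
         _buffer[_count++] = item;
     }

     // Copies the collected items into an exactly sized array the caller may keep.
     public T[] ToArray() => _buffer.AsSpan(0, _count).ToArray();

     // Forgets the collected items so the same builder can serve the next call.
     public void Reset()
     {
         Array.Clear(_buffer, 0, _count); // drop references held by pooled slots
         _count = 0;
     }

     public void Dispose()
     {
         if (_buffer is null) return;
         ArrayPool<T>.Shared.Return(_buffer, clearArray: true);
         _buffer = null!;
         _count = 0;
     }

     private void Grow()
     {
         T[] larger = ArrayPool<T>.Shared.Rent(_buffer.Length * 2);
         Array.Copy(_buffer, larger, _count);
         ArrayPool<T>.Shared.Return(_buffer, clearArray: true);
         _buffer = larger;
     }
 }
 ```
 The builder object itself is still one allocation, so the benefit depends on reusing an instance across calls via `Reset()`; whether that beats `ArrayBufferWriter<T>` is something the allocation-guard tests and benchmarks should decide.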
 **Step 2: Remove `ToArray()` cleanup passes over `_remoteSubs`**
 - Update `RemoveRemoteSubs(...)` and `RemoveRemoteSubsForAccount(...)` to avoid eager dictionary array copies.
 - Ensure removal remains correct under the existing lock discipline.
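 The sweep can be sketched as follows, assuming `_remoteSubs` is a plain `Dictionary<RoutedSubKey, int>` guarded by the existing write lock. On .NET Core 3.0 and later, `Dictionary<TKey,TValue>.Remove` does not invalidate a running enumerator, so no snapshot copy is needed. `RemoteSubCleanupSketch` and the key shape are hypothetical; if the real field is a `ConcurrentDictionary`, its enumeration already tolerates concurrent removal and the same loop applies with `TryRemove`.
 ```csharp
 using System.Collections.Generic;

 // Hypothetical key shape, repeated here so the sketch is self-contained.
 public readonly record struct RoutedSubKey(string RouteId, string Account, string Subject, string? Queue);

 public static class RemoteSubCleanupSketch
 {
     // Removes every entry for a route, optionally restricted to one account.
     // The caller is assumed to hold the write lock for the whole sweep.
     public static int RemoveForRoute(Dictionary<RoutedSubKey, int> remoteSubs, string routeId, string? account = null)
     {
         int removed = 0;
         foreach (KeyValuePair<RoutedSubKey, int> entry in remoteSubs)
         {
             RoutedSubKey key = entry.Key;
             if (key.RouteId != routeId) continue;
             if (account is not null && key.Account != account) continue;
             if (remoteSubs.Remove(key)) removed++; // safe during enumeration on .NET Core 3.0+
         }
         return removed;
     }
 }
 ```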
 **Step 3: Run cross-module regression**
 - Run: `dotnet test tests/NATS.Server.Clustering.Tests/NATS.Server.Clustering.Tests.csproj --filter "FullyQualifiedName~RouteSubscriptionTests|FullyQualifiedName~RouteInterestIdempotencyTests" -c Release`
 - Run: `dotnet test tests/NATS.Server.Gateways.Tests/NATS.Server.Gateways.Tests.csproj --filter "FullyQualifiedName~GatewayInterestModeTests|FullyQualifiedName~GatewayForwardingTests" -c Release`
 - Expected: PASS.
 **Step 4: Commit match-builder changes**
 - Run:
 ```bash
 git add src/NATS.Server/Subscriptions/SubList.cs tests/NATS.Server.Core.Tests/Subscriptions/SubListAllocationGuardTests.cs tests/NATS.Server.Clustering.Tests/Routes/RouteSubscriptionTests.cs
 git commit -m "perf: pool SubList match builders and cleanup scans"
 ```
 ## Task 5: Add benchmark coverage, update docs, and run full verification
 **Files:**
 - Create: `tests/NATS.Server.Benchmark.Tests/CorePubSub/SubListMatchBenchmarks.cs`
 - Modify: `Documentation/Subscriptions/SubList.md`
 **Step 1: Add focused `SubList` benchmarks**
 - Measure exact-match, wildcard-match, queue-sub, and remote-interest scenarios.
 - Capture throughput and allocations before/after the refactor.
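 For the allocation side of the measurement, a small probe built on `GC.GetAllocatedBytesForCurrentThread()` is usually enough. This is a sketch; `AllocationProbe` is a hypothetical helper and is not part of the existing benchmark harness.
 ```csharp
 using System;

 public static class AllocationProbe
 {
     // Returns the bytes allocated on the calling thread while running `action` `iterations` times.
     public static long MeasureBytes(Action action, int iterations)
     {
         action(); // warm-up so one-time JIT and static-init allocations are not counted
         long before = GC.GetAllocatedBytesForCurrentThread();
         for (int i = 0; i < iterations; i++) action();
         return GC.GetAllocatedBytesForCurrentThread() - before;
     }
 }
 ```
 Dividing the result by `iterations` gives bytes per call, which can be compared before and after the refactor. The measured action should store or return its result so the JIT cannot elide the work being measured.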
 **Step 2: Update subscription documentation**
 - Document the new routed-sub key model, the allocation strategy for trie matching, and any remaining intentional copies.
 **Step 3: Run full verification**
 - Run: `dotnet test tests/NATS.Server.Core.Tests/NATS.Server.Core.Tests.csproj -c Release`
 - Run: `dotnet test tests/NATS.Server.Clustering.Tests/NATS.Server.Clustering.Tests.csproj -c Release`
 - Run: `dotnet test tests/NATS.Server.Gateways.Tests/NATS.Server.Gateways.Tests.csproj -c Release`
 - Run: `dotnet test tests/NATS.Server.Benchmark.Tests/NATS.Server.Benchmark.Tests.csproj --filter "FullyQualifiedName~SubList" -c Release`
 - Expected: PASS; benchmark output shows reduced per-match allocations.
 **Step 4: Commit the documentation and benchmark work**
 - Run:
 ```bash
 git add tests/NATS.Server.Benchmark.Tests/CorePubSub/SubListMatchBenchmarks.cs Documentation/Subscriptions/SubList.md
 git commit -m "docs: record SubList allocation strategy"
 ```
 ## Task 6: Merge the verified SubList work back to `main` and clean up the worktree
 **Files:**
 - No source changes expected
 **Step 1: Confirm the feature branch is fully verified before merge**
 - Reuse the verification from Task 5. Do not merge if the core, clustering, gateway, or benchmark test commands are failing.
 **Step 2: Merge `codex/sublist-allocation-reduction` back into `main`**
 - Return to the primary repository worktree and run:
 ```bash
 git checkout main
 git pull
 git merge codex/sublist-allocation-reduction
 ```
 - Expected: the SubList optimization commits merge cleanly into `main`.
 **Step 3: Re-run verification on the merged `main` branch**
 - Run: `dotnet test tests/NATS.Server.Core.Tests/NATS.Server.Core.Tests.csproj --filter "FullyQualifiedName~SubList" -c Release`
 - Run: `dotnet test tests/NATS.Server.Clustering.Tests/NATS.Server.Clustering.Tests.csproj --filter "FullyQualifiedName~RouteRemoteSubCleanupParityBatch2Tests|FullyQualifiedName~RouteInterestIdempotencyTests|FullyQualifiedName~RouteSubscriptionTests" -c Release`
 - Run: `dotnet test tests/NATS.Server.Gateways.Tests/NATS.Server.Gateways.Tests.csproj --filter "FullyQualifiedName~GatewayInterestModeTests|FullyQualifiedName~GatewayInterestIdempotencyTests|FullyQualifiedName~GatewayForwardingTests" -c Release`
 - Run: `dotnet test tests/NATS.Server.Benchmark.Tests/NATS.Server.Benchmark.Tests.csproj --filter "FullyQualifiedName~SubList" -c Release`
 - Expected: PASS on merged `main`.
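 **Step 4: Remove the worktree and delete the feature branch**
 - Run the commands in this order; `git branch -d` refuses to delete a branch that is still checked out in a worktree:
 ```bash
 git worktree remove .worktrees/codex-sublist-allocation-reduction
 git branch -d codex/sublist-allocation-reduction
 ```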
 - Expected: only `main` remains checked out in the primary workspace, and the temporary SubList worktree is removed.
 ## Task 7: Run the full benchmark suite per the benchmark README and update `benchmarks_comparison.md`
 **Files:**
 - Verify workflow against: `tests/NATS.Server.Benchmark.Tests/README.md`
 - Modify: `benchmarks_comparison.md`
 **Step 1: Run the benchmark test project using the repository-documented command**
 - From the primary repository worktree on verified `main`, run:
 ```bash
 dotnet test tests/NATS.Server.Benchmark.Tests \
  --filter "Category=Benchmark" \
  -v normal \
  --logger "console;verbosity=detailed" 2>&1 | tee /tmp/bench-output.txt
 ```
 - Expected: the full benchmark suite completes and writes detailed comparison output to `/tmp/bench-output.txt`.
 **Step 2: Extract the benchmark comparison blocks from the captured output**
 - Open `/tmp/bench-output.txt` and pull the side-by-side comparison blocks from the `Standard Output Messages` sections for:
  - core pub-only
  - core 1:1 pub/sub
  - core fan-out
  - core multi pub/sub
  - request/reply
  - JetStream publish
  - JetStream consumption
 **Step 3: Update `benchmarks_comparison.md`**
 - Refresh:
  - the benchmark run date on the first line
  - the environment description if toolchain or machine details changed
  - throughput, MB/s, ratio, and latency values in the benchmark tables
  - summary assessments and key observations if the ratios materially changed
 - Keep the narrative tied to measured results from `/tmp/bench-output.txt`; do not preserve stale claims that no longer match the numbers.
 **Step 4: Commit the benchmark comparison refresh**
 - Run:
 ```bash
 git add benchmarks_comparison.md
 git commit -m "docs: refresh benchmark comparison after SubList optimization"
 ```
 - Expected: `main` contains the merged SubList optimization work plus the refreshed benchmark comparison document.
 ## Completion Checklist
 - [ ] SubList optimization work was implemented in an isolated `.worktrees/codex-sublist-allocation-reduction` worktree on branch `codex/sublist-allocation-reduction`.
 - [ ] `_remoteSubs` no longer uses composite string keys in runtime paths.
 - [ ] Remote cleanup paths no longer depend on `Split('|')` or `_remoteSubs.ToArray()`.
 - [ ] Trie traversal calls `token.ToString()` only where a node insertion needs a durable string key.
 - [ ] `Match()` uses pooled or allocation-lean temporary builders.
 - [ ] Core, clustering, and gateway parity tests remain green.
 - [ ] `Documentation/Subscriptions/SubList.md` explains the new key and match strategy.
 - [ ] Verified SubList optimization commits were merged back into `main`.
 - [ ] The full benchmark test project was run on merged `main` per `tests/NATS.Server.Benchmark.Tests/README.md`.
 - [ ] `benchmarks_comparison.md` was updated to match the latest benchmark output.
 ## Concise Execution Checklist (Current Codebase)
 - [ ] Start from the current repo root and leave unrelated MQTT worktree changes untouched.
 - [ ] Create `.worktrees/codex-sublist-allocation-reduction` on `codex/sublist-allocation-reduction`.
 - [ ] Verify the current SubList baseline with:
  - `tests/NATS.Server.Core.Tests/Subscriptions/SubListGoParityTests.cs`
  - `tests/NATS.Server.Core.Tests/Subscriptions/SubListCtorAndNotificationParityTests.cs`
  - `tests/NATS.Server.Core.Tests/Subscriptions/SubListParityBatch2Tests.cs`
  - `tests/NATS.Server.Clustering.Tests/Routes/RouteRemoteSubCleanupParityBatch2Tests.cs`
  - `tests/NATS.Server.Clustering.Tests/Routes/RouteInterestIdempotencyTests.cs`
  - `tests/NATS.Server.Clustering.Tests/Routes/RouteSubscriptionTests.cs`
  - `tests/NATS.Server.Gateways.Tests/Gateways/GatewayInterestModeTests.cs`
  - `tests/NATS.Server.Gateways.Tests/Gateways/GatewayForwardingTests.cs`
 - [ ] Add new guard coverage in `tests/NATS.Server.Core.Tests/Subscriptions/SubListAllocationGuardTests.cs`.
 - [ ] Refactor `src/NATS.Server/Subscriptions/SubList.cs` to use a new `src/NATS.Server/Subscriptions/RoutedSubKey.cs`.
 - [ ] Reduce token string churn in `src/NATS.Server/Subscriptions/SubList.cs` and `src/NATS.Server/Subscriptions/SubjectMatch.cs`.
 - [ ] Add a new benchmark file at `tests/NATS.Server.Benchmark.Tests/CorePubSub/SubListMatchBenchmarks.cs` beside the existing CorePubSub benchmarks.
 - [ ] Update `Documentation/Subscriptions/SubList.md`.
 - [ ] Run full verification for core, clustering, gateway, and focused SubList benchmark tests.
 - [ ] Merge `codex/sublist-allocation-reduction` into `main` and re-run the merged verification.
 - [ ] Run the full benchmark suite using `tests/NATS.Server.Benchmark.Tests/README.md` guidance and refresh `benchmarks_comparison.md`.
--- a/optimizations.md
+++ b/optimizations.md
@@ -0,0 +1,297 @@
 # .NET 10 Optimization Opportunities for `NATS.Server`
 This document identifies the highest-value places in the current .NET port that are still leaving performance on the table relative to what modern .NET 10 can do well. The focus is runtime behavior in the current codebase, not generic style guidance.
 The ranking is based on likely payoff in NATS workloads:
 1. Protocol parsing and per-message delivery paths
 2. Subscription matching and routing fanout
 3. JetStream storage hot paths
 4. Route, leaf, MQTT, and monitoring paths with avoidable allocation churn
 Several areas already use `Span<T>`, `ReadOnlyMemory<byte>`, `SequenceReader<byte>`, and stack allocation correctly. The remaining gaps are mostly where the code falls back to `string`, `byte[]`, `List<T>`, `ToArray()`, LINQ, or repeated serialization work on hot paths.
 ## Detailed Implementation Plans
 - [Parser span-retention plan](docs/plans/2026-03-13-optimizations_parser-plan.md)
 - [SubList allocation-reduction plan](docs/plans/2026-03-13-optimizations_sublist-plan.md)
 - [FileStore payload-and-index plan](docs/plans/2026-03-13-optimizations_filestore-plan.md)
 ## Highest ROI
 ### 1. Keep parser state in bytes/spans longer
 - Files:
  - `src/NATS.Server/Protocol/NatsParser.cs`
  - `src/NATS.Server/NatsClient.cs`
 - Current issue:
  - `NatsParser` tokenizes control lines with spans, but then converts subjects, reply subjects, queue names, SIDs, and JSON payloads into `string` and `byte[]` immediately.
   - `TryReadPayload()` always allocates a new `byte[]` and copies the payload, even when the payload sits in a single contiguous segment of the underlying `ReadOnlySequence<byte>` and could be sliced instead.
  - `ParseConnect()` and `ParseInfo()` call `ToArray()` on the JSON portion.
 - Why it matters:
  - This runs for every client protocol command.
  - The parser sits directly on the publish/subscribe hot path, so small per-command allocations scale badly under fan-in.
 - Recommended optimization:
  - Introduce a split parsed representation:
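     - a hot-path view for subject, reply, SID, queue, and payload: either a `ref struct` carrying `ReadOnlySpan<byte>` slices, or a `readonly struct` carrying `ReadOnlyMemory<byte>` / `ReadOnlySequence<byte>` slices (span fields are only legal inside a `ref struct`)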
    - a slower materialization path only when code actually needs `string`
  - Store pending parser state as byte slices or pooled byte segments instead of `_pendingSubject` / `_pendingReplyTo` strings.
  - For single-segment payloads, hand through a `ReadOnlyMemory<byte>` slice rather than copying to a new array.
  - Only copy multi-segment payloads when required.
  - Use `SearchValues<byte>` for whitespace scanning and command detection instead of manual per-byte branching where it simplifies repeated searches.
 - .NET 10 techniques:
  - `ref struct`
  - `ReadOnlySpan<byte>`
  - `ReadOnlySequence<byte>`
  - `SearchValues<byte>`
  - `Encoding.ASCII.GetString(ReadOnlySpan<byte>)` only at materialization boundaries
 - Risk / complexity:
  - Medium to high. This touches command parsing contracts and downstream consumers.
  - Worth doing first because it reduces allocations before messages enter the rest of the server.
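 The single-segment pass-through can be sketched as below. `PayloadSliceSketch.ReadPayload` is a hypothetical helper, not the current `TryReadPayload()` signature. The key assumption is lifetime: when nothing is copied, the returned memory aliases the reader's buffer and is only valid until the caller advances the underlying reader.
 ```csharp
 using System;
 using System.Buffers;

 public static class PayloadSliceSketch
 {
     // Returns the first `length` bytes of `buffer` as the payload.
     // copied == false: the result aliases `buffer` and must be consumed before the reader advances.
     // copied == true: the payload spanned several segments and was flattened into a new array.
     public static ReadOnlyMemory<byte> ReadPayload(in ReadOnlySequence<byte> buffer, int length, out bool copied)
     {
         ReadOnlySequence<byte> payload = buffer.Slice(0, length);
         if (payload.IsSingleSegment)
         {
             copied = false;
             return payload.First;
         }
         copied = true;
         return payload.ToArray();
     }
 }
 ```
 Any consumer that needs the payload beyond the current read, for example a queued delivery, still has to copy or take ownership of a pooled buffer at that point.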
 ### 2. Remove string-heavy trie traversal in `SubList`
 - Files:
  - `src/NATS.Server/Subscriptions/SubList.cs`
  - `src/NATS.Server/Subscriptions/SubjectMatch.cs`
 - Current issue:
  - Insert/remove paths repeatedly call `token.ToString()`.
  - Routed subscription keys are synthesized as `"route|account|subject|queue"` strings and later split back with `Split('|')`.
  - Match path tokenization and cache population allocate arrays/lists and depend on string tokens.
  - `RemoveRemoteSubs()` and `RemoveRemoteSubsForAccount()` call `_remoteSubs.ToArray()` and re-parse keys on every sweep.
 - Why it matters:
  - `SubList.Match()` is one of the most performance-sensitive operations in the server.
  - Remote interest tracking becomes more expensive as the route/leaf topology grows.
 - Recommended optimization:
  - Replace composite routed-sub string keys with a dedicated value key:
    - `readonly record struct RoutedSubKey(string RouteId, string Account, string Subject, string? Queue)`
    - or a plain `readonly struct` with a custom comparer if profiling shows hash/comparison cost matters
  - Keep tokenized subjects in a pooled or cached token form for exact subjects.
  - Investigate a span-based token walker for matching so exact-subject lookups avoid `string[]` creation entirely.
  - Replace temporary `List<Subscription>` / `List<List<Subscription>>` creation in `Match()` with pooled builders or `ArrayBufferWriter<T>`.
  - For remote-sub cleanup, iterate dictionary entries without `ToArray()` and avoid `Split`.
 - .NET 10 techniques:
  - `readonly struct` / `readonly record struct` for composite keys
  - `ReadOnlySpan<char>` token parsing
  - pooled builders via `ArrayPool<T>` or `ArrayBufferWriter<T>`
 - Risk / complexity:
  - Medium. The data model change is straightforward; changing trie matching internals requires careful parity testing.
 ### 3. Eliminate avoidable message duplication in `FileStore`
 - Files:
  - `src/NATS.Server/JetStream/Storage/FileStore.cs`
  - `src/NATS.Server/JetStream/Storage/MsgBlock.cs`
  - `src/NATS.Server/JetStream/Storage/StoredMessage.cs`
 - Current issue:
  - `AppendAsync()` transforms payload for persistence and often also keeps another managed copy in `_messages`.
  - `StoreMsg()` creates a combined `byte[]` for headers + payload.
  - Many maintenance operations (`TrimToMaxMessages`, `PurgeEx`, `LoadLastBySubjectAsync`, `ListAsync`) use LINQ over `_messages.Values`, causing iterator allocations and repeated scans.
  - Snapshot creation base64-encodes transformed payloads, forcing extra copies.
 - Why it matters:
  - JetStream storage code runs continuously under persistence-heavy workloads.
  - It is both allocation-sensitive and memory-residency-sensitive.
 - Recommended optimization:
  - Split stored payload representation into:
    - persisted payload bytes
    - logical payload view
    - optional headers view
  - Avoid constructing concatenated header+payload arrays when the record format can encode both spans directly.
  - Rework `StoredMessage` so hot metadata stays compact; consider a smaller `readonly struct` for indexes/metadata while payload storage remains reference-based.
  - Replace LINQ scans in hot maintenance paths with explicit loops.
  - Add per-subject indexes or rolling pointers for operations currently implemented as full scans when those operations are expected to be common.
 - .NET 10 techniques:
  - `ReadOnlyMemory<byte>` slices over shared buffers
  - `readonly struct` for compact metadata/index entries
  - explicit loops over LINQ in storage hot paths
  - `CollectionsMarshal` where safe for dictionary/list access in tight loops
 - Risk / complexity:
  - High. This area needs careful correctness validation for retention, snapshots, and recovery.
  - High payoff for persistent streams.
 ### 4. Reduce formatting and copy overhead in route and leaf message sends
 - Files:
  - `src/NATS.Server/Routes/RouteConnection.cs`
  - `src/NATS.Server/LeafNodes/LeafConnection.cs`
 - Current issue:
  - Control lines are built with string interpolation, converted with `Encoding.ASCII.GetBytes`, then written separately from payload and trailer.
  - `"\r\n"u8.ToArray()` allocates for every send.
  - Batch protocol send methods build a `StringBuilder`, then allocate one big ASCII byte array.
 - Why it matters:
  - Cluster routes and leaf nodes are high-throughput transport paths in real deployments.
  - This code is not as hot as client publish fanout, but it is hot enough to matter under clustered load.
 - Recommended optimization:
  - Mirror the client path:
    - encode control lines into stackalloc or pooled byte buffers with span formatting
    - write control + payload + CRLF via scatter-gather (`ReadOnlyMemory<byte>[]`) or a reusable outbound buffer
   - Replace repeated CRLF arrays with a static `ReadOnlyMemory<byte>` field or a static `ReadOnlySpan<byte>` property over a `u8` literal.
  - For route sub protocol batches, encode directly into an `ArrayBufferWriter<byte>` instead of `StringBuilder` -> string -> bytes.
 - .NET 10 techniques:
  - `string.Create` or direct span formatting into pooled buffers
  - `ArrayBufferWriter<byte>`
  - scatter-gather writes where transport permits
 - Risk / complexity:
  - Medium. Mostly localized refactoring with low semantic risk.
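 As an illustration of direct encoding, the sketch below writes a control line, payload, and trailer into an `IBufferWriter<byte>` in one pass. `RouteFrameSketch` is hypothetical, and the frame layout (`RMSG <account> <subject> <size>`) is deliberately simplified; it omits reply and queue fields and should not be read as the exact route wire format.
 ```csharp
 using System;
 using System.Buffers;
 using System.Buffers.Text;
 using System.Text;

 public static class RouteFrameSketch
 {
     // A u8 literal exposed as a property is served from static data; nothing is allocated per send.
     private static ReadOnlySpan<byte> Crlf => "\r\n"u8;

     public static void WriteFrame(IBufferWriter<byte> writer, string account, string subject, ReadOnlySpan<byte> payload)
     {
         // Upper bound: prefix + account + space + subject + space + up to 10 size digits + CRLF + payload + CRLF.
         int max = 5 + account.Length + 1 + subject.Length + 1 + 10 + 2 + payload.Length + 2;
         Span<byte> dest = writer.GetSpan(max);
         int pos = 0;

         "RMSG "u8.CopyTo(dest);
         pos += 5;
         pos += Encoding.ASCII.GetBytes(account, dest[pos..]);
         dest[pos++] = (byte)' ';
         pos += Encoding.ASCII.GetBytes(subject, dest[pos..]);
         dest[pos++] = (byte)' ';
         if (!Utf8Formatter.TryFormat(payload.Length, dest[pos..], out int digits))
             throw new InvalidOperationException("Destination too small for the size field.");
         pos += digits;
         Crlf.CopyTo(dest[pos..]);
         pos += 2;
         payload.CopyTo(dest[pos..]);
         pos += payload.Length;
         Crlf.CopyTo(dest[pos..]);
         pos += 2;

         writer.Advance(pos);
     }
 }
 ```
 For large payloads a scatter-gather write of control line, payload, and CRLF may be preferable to copying the payload into the same buffer; the sketch only shows the formatting side.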
 ## Medium ROI
 ### 5. Stop using LINQ-heavy materialization in monitoring endpoints
 - Files:
  - `src/NATS.Server/Monitoring/ConnzHandler.cs`
  - `src/NATS.Server/Monitoring/SubszHandler.cs`
 - Current issue:
  - Monitoring paths repeatedly call `ToArray()`, `Select()`, `Where()`, `OrderBy()`, `Skip()`, and `Take()`.
  - `SubszHandler` builds full subscription lists even when the request only needs counts.
  - `ConnzHandler` repeatedly rematerializes arrays while filtering and sorting.
 - Why it matters:
  - Monitoring endpoints are not the publish hot path, but they can become disruptive on busy servers with many clients/subscriptions.
  - These allocations are easy to avoid.
 - Recommended optimization:
  - Separate count-only and detail-request paths.
  - Use single-pass loops and pooled temporary lists.
  - Delay expensive subscription detail expansion until after paging when possible.
  - Consider returning immutable snapshots generated incrementally by the server for common monitor queries.
 - .NET 10 techniques:
  - explicit loops
  - pooled arrays/lists
  - `CollectionsMarshal.AsSpan()` for internal list traversal where safe
 - Risk / complexity:
  - Low to medium.
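 A single-pass filter that counts every match but only stores the requested page might look like the sketch below. `PagingSketch` is a hypothetical helper; it covers filtering and paging only, so sorted output would still need a separate step.
 ```csharp
 using System;
 using System.Collections.Generic;

 public static class PagingSketch
 {
     // Returns the total number of matches; `page` receives at most `limit` matches starting at `offset`.
     public static int FilterPage<T>(IReadOnlyList<T> items, Func<T, bool> predicate, int offset, int limit, List<T> page)
     {
         page.Clear(); // the caller may pass a reused or pooled list
         int total = 0;
         for (int i = 0; i < items.Count; i++)
         {
             T item = items[i];
             if (!predicate(item)) continue;
             if (total >= offset && page.Count < limit) page.Add(item);
             total++;
         }
         return total;
     }
 }
 ```
 A count-only request can pass `limit: 0` and skip detail expansion entirely.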
 ### 6. Modernize MQTT packet writing and text parsing
 - Files:
  - `src/NATS.Server/Mqtt/MqttPacketWriter.cs`
  - `src/NATS.Server/Mqtt/MqttProtocolParser.cs`
 - Current issue:
  - `MqttPacketWriter` returns fresh `byte[]` instances for every string/packet write.
  - Remaining-length encoding returns `scratch[..index].ToArray()`.
  - `ParseLine()` uses `Trim()`, `StartsWith()`, `Split()`, slicing, and string-based parsing throughout.
 - Why it matters:
  - MQTT is a side protocol, so this is not the top optimization target.
  - Still worth fixing because the code is currently allocation-heavy and straightforward to improve.
 - Recommended optimization:
  - Add `TryWrite...` APIs that write into caller-provided `Span<byte>` / `IBufferWriter<byte>`.
  - Keep remaining-length bytes on the stack and copy directly into the final destination buffer.
  - Rework `ParseLine()` to operate on `ReadOnlySpan<char>` and avoid `Split`.
 - .NET 10 techniques:
  - `Span<byte>`
  - `IBufferWriter<byte>`
  - `ReadOnlySpan<char>`
  - `SearchValues<char>` for token separators if useful
 - Risk / complexity:
  - Low.
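 The remaining-length case is a good first `TryWrite...` candidate. The sketch below uses the standard MQTT variable-length encoding (seven value bits per byte, high bit as continuation flag, at most four bytes); `MqttRemainingLength.TryWrite` is a hypothetical name, not an existing `MqttPacketWriter` member.
 ```csharp
 using System;

 public static class MqttRemainingLength
 {
     public const int MaxValue = 268_435_455; // largest value that fits in four encoded bytes

     // Writes `value` into `destination` without allocating; returns false if it does not fit.
     public static bool TryWrite(Span<byte> destination, int value, out int bytesWritten)
     {
         bytesWritten = 0;
         if (value < 0 || value > MaxValue) return false;

         int index = 0;
         do
         {
             if (index == destination.Length) return false;
             byte digit = (byte)(value & 0x7F);
             value >>= 7;
             if (value > 0) digit |= 0x80; // more bytes follow
             destination[index++] = digit;
         } while (value > 0);

         bytesWritten = index;
         return true;
     }
 }
 ```
 A caller can `stackalloc` four bytes for the scratch space, or write straight into the final packet buffer once the body length is known.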
 ### 7. Replace full scans over `_messages` with maintained indexes where operations are common
 - Files:
  - `src/NATS.Server/JetStream/Storage/FileStore.cs`
 - Current issue:
  - `LoadLastBySubjectAsync()` scans all messages, filters, sorts descending, and picks the first result.
  - `TrimToMaxMessages()` repeatedly calls `_messages.Keys.Min()`.
  - `PurgeEx()` materializes candidate lists before deletion.
 - Why it matters:
  - These are algorithmic inefficiencies, not just allocation issues.
  - They become more visible as streams grow.
 - Recommended optimization:
  - Maintain lightweight indexes:
    - last sequence by subject
    - first/last live sequence tracking without `Min()` / `Max()` scans
    - optionally per-subject linked or sorted sequence sets for purge/retention operations
  - If full indexing is too large a change, replace repeated LINQ scans with single-pass loops immediately.
 - .NET 10 techniques:
  - compact metadata structs
  - tighter dictionary usage
  - fewer transient enumerators
 - Risk / complexity:
  - Medium.
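 A minimal sketch of the maintained indexes, assuming sequences start at 1, are appended in strictly increasing order, and are only removed from the front (as in retention trimming). `SequenceIndexSketch` is hypothetical; interior deletes and purge-by-subject would need the per-subject sequence sets mentioned above.
 ```csharp
 using System;
 using System.Collections.Generic;

 public sealed class SequenceIndexSketch
 {
     private readonly Dictionary<ulong, string> _subjectBySeq = new();
     private readonly Dictionary<string, ulong> _lastSeqBySubject = new(StringComparer.Ordinal);
     private ulong _firstSeq; // 0 means "no live messages"
     private ulong _lastSeq;

     public int Count => _subjectBySeq.Count;
     public ulong FirstSeq => _firstSeq;

     public void Append(ulong seq, string subject)
     {
         _subjectBySeq.Add(seq, subject);
         _lastSeqBySubject[subject] = seq; // O(1) replacement for the scan-filter-sort lookup
         if (_firstSeq == 0) _firstSeq = seq;
         _lastSeq = seq;
     }

     public bool TryGetLastSeq(string subject, out ulong seq) =>
         _lastSeqBySubject.TryGetValue(subject, out seq);

     // Removes the oldest live message and advances the first-sequence pointer.
     public bool RemoveFirst()
     {
         if (_firstSeq == 0) return false;
         ulong seq = _firstSeq;
         if (_subjectBySeq.Remove(seq, out string subject)
             && _lastSeqBySubject.TryGetValue(subject, out ulong last) && last == seq)
         {
             // The oldest live message was also the newest for its subject, so none remain.
             _lastSeqBySubject.Remove(subject);
         }
         ulong next = seq + 1;
         while (next <= _lastSeq && !_subjectBySeq.ContainsKey(next)) next++;
         _firstSeq = next <= _lastSeq ? next : 0;
         return true;
     }

     public void TrimToMaxMessages(int max)
     {
         while (_subjectBySeq.Count > max && RemoveFirst()) { }
     }
 }
 ```
 Trimming then costs one dictionary removal per evicted message instead of a `Min()` scan per eviction.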
 ### 8. Reduce repeated small allocations in protocol constants and control frames
 - Files:
  - `src/NATS.Server/Protocol/NatsParser.cs`
  - `src/NATS.Server/Routes/RouteConnection.cs`
  - `src/NATS.Server/LeafNodes/LeafConnection.cs`
  - other transport helpers
 - Current issue:
  - Some constants are still materialized via `ToArray()` rather than held as static `byte[]` or `ReadOnlyMemory<byte>`.
  - Control frames repeatedly build temporary arrays for tiny literals.
 - Why it matters:
  - These are cheap wins and remove noisy allocation churn.
 - Recommended optimization:
  - Standardize on shared static byte literals for CRLF and fixed protocol tokens.
  - Audit for repeated `u8.ToArray()` or `Encoding.ASCII.GetBytes` on invariant text.
 - .NET 10 techniques:
  - static cached buffers
  - span-based concatenation into reusable destinations
 - Risk / complexity:
  - Low.
 ## Lower ROI Or Caution Areas
 ### 9. Be selective about introducing more structs
 - Files:
  - cross-cutting
 - Current issue:
  - Some parts of the code would benefit from value types, but others already contain references (`string`, `byte[]`, `ReadOnlyMemory<byte>`, dictionaries) where converting whole models to structs would increase copying and call-site complexity.
 - Recommendation:
  - Good struct candidates:
    - composite dictionary keys
    - compact metadata/index entries
    - parser token views
    - queue or routing bookkeeping records
  - Poor struct candidates:
    - large mutable models
    - objects with many reference-type fields
    - stateful connection objects
 - Why it matters:
  - “Use more structs” is only a win when the values are small, immutable, and heavily allocated.
 ### 10. Avoid premature replacement of already-good memory APIs
 - Files:
  - `src/NATS.Server/NatsClient.cs`
  - `src/NATS.Server/IO/OutboundBufferPool.cs`
  - several JetStream codecs
 - Current issue:
  - There has already been meaningful optimization work in direct write buffering and pooled outbound paths.
  - Replacing these with more exotic abstractions without profiling could regress behavior.
 - Recommendation:
  - Prefer extending the current buffer-pool and direct-write patterns into routes, leaves, and parser payload handling before redesigning the client write path again.
 ## Suggested Implementation Order
 1. `NatsParser` hot-path byte retention and reduced payload copying
 2. `SubList` key/token allocation cleanup and remote-sub key redesign
 3. Route/leaf outbound buffer encoding cleanup
 4. `FileStore` hot-path de-LINQ and payload/index refactoring
 5. Monitoring endpoint de-materialization
 6. MQTT writer/parser span-based cleanup
 ## What To Measure Before And After
 Use the benchmark project and targeted microbenchmarks to measure:
 - allocations per `PUB`, `SUB`, `UNSUB`, `CONNECT`
 - allocations per delivered message under fanout
 - `SubList.Match()` throughput and allocations for:
  - exact subjects
  - wildcard subjects
  - queue subscriptions
  - remote interest present
 - JetStream append throughput and bytes allocated per append
 - route/leaf forwarded-message allocations
 - monitoring endpoint allocations for large client/subscription sets
 ## Summary
 The best remaining gains are not from sprinkling `Span<T>` everywhere. They come from carrying byte-oriented data further through the hot paths, removing composite-string bookkeeping, reducing duplicate payload ownership in JetStream storage, and eliminating materialization-heavy helper code around routing and monitoring.
 If you only do three things, do these first:
 1. Rework `NatsParser` to avoid early `string` / `byte[]` creation.
 2. Replace `SubList` composite string keys and string-heavy token handling.
 3. Refactor `FileStore` and route/leaf send paths to reduce duplicate buffers and transient formatting allocations.