FileStore Payload And Index Optimization Implementation Plan
For Codex: REQUIRED SUB-SKILLS: Use using-git-worktrees to create an isolated workspace before Task 1, then use executeplan to implement this plan task-by-task. After verification is complete, merge the finished branch back into main.
Goal: Reduce JetStream FileStore memory churn and repeated full scans by tightening payload ownership, splitting compact metadata from large payload buffers, and replacing LINQ-based maintenance work with explicit indexes and loops.
Architecture: Start by freezing current behavior across AppendAsync, StoreMsg, retention, snapshots, and recovery. Then introduce compact metadata/index structures, remove avoidable duplicate payload buffers, replace repeated _messages scans with maintained indexes, and finish by updating recovery/snapshot paths plus benchmark coverage.
Tech Stack: .NET 10, C#, JetStream storage stack, ReadOnlyMemory<byte>, pooled buffers where safe, xUnit, existing JetStream benchmark harness.
Scope Anchors
- Primary source:
src/NATS.Server/JetStream/Storage/FileStore.cs
- Supporting sources:
src/NATS.Server/JetStream/Storage/MsgBlock.cs
src/NATS.Server/JetStream/Storage/StoredMessage.cs
src/NATS.Server/JetStream/Storage/MessageRecord.cs
- Existing contract tests:
tests/NATS.Server.JetStream.Tests/StreamStoreContractTests.cs
tests/NATS.Server.JetStream.Tests/JetStream/Storage/StoreInterfaceTests.cs
- Existing FileStore coverage:
tests/NATS.Server.JetStream.Tests/FileStoreTests.cs
tests/NATS.Server.JetStream.Tests/JetStreamStoreIndexTests.cs
tests/NATS.Server.JetStream.Tests/JetStream/Storage/FileStoreCompressionTests.cs
tests/NATS.Server.JetStream.Tests/JetStream/Storage/FileStoreCrashRecoveryTests.cs
tests/NATS.Server.JetStream.Tests/JetStream/Storage/FileStoreTombstoneTests.cs
- Documentation to update:
Documentation/JetStream/Overview.md
- Benchmark project:
tests/NATS.Server.Benchmark.Tests/NATS.Server.Benchmark.Tests.csproj
- Benchmark comparison doc:
benchmarks_comparison.md
Task 0: Create an isolated git worktree and verify the baseline
Files:
- Modify:
.gitignore only if the chosen local worktree directory is not already ignored
Step 1: Choose the worktree location using the repo convention
- Check for an existing .worktrees/ directory first, then worktrees/.
- If neither exists, check repo guidance before creating one.
- Prefer a project-local .worktrees/ directory when available.
Step 2: Verify the worktree directory is ignored before creating anything
- Run:
git check-ignore -q .worktrees || git check-ignore -q worktrees
- Expected: the command exits with status 0, meaning at least one of the candidate worktree directories is ignored.
- If neither directory is ignored, add the chosen directory to .gitignore, commit that change on main, and then continue.
Step 3: Create a dedicated branch and worktree for this plan
- Run:
git worktree add .worktrees/filestore-payload-index-optimization -b codex/filestore-payload-index-optimization
- Expected: a new isolated checkout exists at .worktrees/filestore-payload-index-optimization.
Step 4: Move into the worktree and verify the starting baseline
- Run:
cd .worktrees/filestore-payload-index-optimization
dotnet test tests/NATS.Server.JetStream.Tests/NATS.Server.JetStream.Tests.csproj -c Release
- Expected: PASS before implementation starts.
- If the baseline fails, stop and resolve whether to proceed before changing FileStore code.
Step 5: Commit only the worktree bootstrap change if one was required
- Run only if .gitignore had to change:
git add .gitignore
git commit -m "chore: ignore local worktree directory"
Task 1: Freeze store behavior and add scan/ownership regression tests
Files:
- Modify:
tests/NATS.Server.JetStream.Tests/JetStreamStoreIndexTests.cs
- Modify:
tests/NATS.Server.JetStream.Tests/JetStream/Storage/StoreInterfaceTests.cs
- Create:
tests/NATS.Server.JetStream.Tests/JetStream/Storage/FileStoreOptimizationGuardTests.cs
Step 1: Add failing tests for the targeted optimization boundaries
- Cover:
- AppendAsync retaining logical payload behavior
- StoreMsg with headers + payload
- LoadLastBySubjectAsync
- TrimToMaxMessages
- PurgeEx
- snapshot/recovery round-trips
Step 2: Add tests that lock first/last sequence bookkeeping
- Ensure _firstSeq, _last, and subject-last lookup behavior remain correct after removes, purges, compaction, and recovery.
Step 3: Run focused JetStream tests to prove the new tests fail first
- Run:
dotnet test tests/NATS.Server.JetStream.Tests/NATS.Server.JetStream.Tests.csproj --filter "FullyQualifiedName~FileStoreOptimizationGuardTests|FullyQualifiedName~JetStreamStoreIndexTests|FullyQualifiedName~StoreInterfaceTests" -c Release
- Expected: FAIL only in the newly added optimization-guard tests.
Step 4: Commit the failing-test baseline
- Run:
git add tests/NATS.Server.JetStream.Tests/JetStreamStoreIndexTests.cs tests/NATS.Server.JetStream.Tests/JetStream/Storage/StoreInterfaceTests.cs tests/NATS.Server.JetStream.Tests/JetStream/Storage/FileStoreOptimizationGuardTests.cs
git commit -m "test: lock FileStore optimization boundaries"
Task 2: Introduce compact metadata/index types and remove full-scan bookkeeping
Files:
- Create:
src/NATS.Server/JetStream/Storage/StoredMessageIndex.cs
- Modify:
src/NATS.Server/JetStream/Storage/FileStore.cs
- Modify:
src/NATS.Server/JetStream/Storage/StoredMessage.cs
Step 1: Split compact indexing metadata from payload-bearing message objects
- Add a small immutable metadata/index type that tracks at least:
- sequence
- subject
- logical payload length
- timestamp
- subject-local links or last-seen markers if needed
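As a minimal sketch of the metadata split described above: the type below carries only the listed fields and records the payload length without holding a reference to the payload buffer. The member names, the field types, and the From helper are assumptions made for illustration; the real StoredMessageIndex may need a different shape (for example, subject-local links).

```csharp
using System;

// Hypothetical shape for the compact index entry. Names and types are
// illustrative assumptions, not the project's actual StoredMessageIndex.
public readonly record struct StoredMessageIndex(
    ulong Sequence,
    string Subject,
    int PayloadLength,
    DateTime TimestampUtc)
{
    // Builds an entry from a payload view. Only the length is kept, so the
    // index never extends the lifetime of the payload buffer.
    public static StoredMessageIndex From(
        ulong sequence,
        string subject,
        ReadOnlyMemory<byte> payload,
        DateTime timestampUtc)
        => new(sequence, subject, payload.Length, timestampUtc);
}
```

Because the entry is a small value type, a collection of entries can be scanned or copied without touching any large payload arrays.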
Step 2: Replace repeated Min() / Max() / full-value scans with maintained state
- Maintain first live sequence, last live sequence, and last-by-subject values incrementally rather than recomputing them with LINQ.
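One way to maintain these values incrementally is sketched below. LiveSequenceTracker and all of its members are hypothetical, and the sketch makes three assumptions that should be checked against the existing tests: sequences are appended in strictly increasing order, Last tracks the highest sequence ever appended and does not move back on removal, and an empty store reports First as Last + 1.

```csharp
using System.Collections.Generic;

// Hypothetical tracker; FileStore's real bookkeeping may be organized differently.
public sealed class LiveSequenceTracker
{
    private readonly Dictionary<ulong, string> _subjectBySeq = new();
    private readonly Dictionary<string, ulong> _lastBySubject = new();

    public ulong First { get; private set; }
    public ulong Last { get; private set; }
    public int Count => _subjectBySeq.Count;

    // Assumes seq is greater than every sequence added before it.
    public void Add(ulong seq, string subject)
    {
        if (_subjectBySeq.Count == 0) First = seq;
        _subjectBySeq.Add(seq, subject);
        _lastBySubject[subject] = seq;
        Last = seq;
    }

    public bool Remove(ulong seq)
    {
        if (!_subjectBySeq.Remove(seq, out var subject)) return false;

        if (_subjectBySeq.Count == 0)
        {
            First = Last + 1; // assumption: empty store reports first = last + 1
        }
        else if (seq == First)
        {
            // Step over gaps; a live sequence above First is guaranteed to exist.
            do { First++; } while (!_subjectBySeq.ContainsKey(First));
        }

        if (_lastBySubject[subject] == seq) RebuildLastFor(subject, seq);
        return true;
    }

    public bool TryGetLast(string subject, out ulong seq)
        => _lastBySubject.TryGetValue(subject, out seq);

    // Rare path: walk back from the removed sequence to find the previous
    // live message on the same subject, or drop the subject entirely.
    private void RebuildLastFor(string subject, ulong removedSeq)
    {
        ulong s = removedSeq;
        while (s > First)
        {
            s--;
            if (_subjectBySeq.TryGetValue(s, out var owner) && owner == subject)
            {
                _lastBySubject[subject] = s;
                return;
            }
        }
        _lastBySubject.Remove(subject);
    }
}
```

The backward walk only runs when the newest message of a subject is removed; if profiling shows that path is hot, per-subject links in the index entry would avoid it.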
Step 3: Run targeted index tests
- Run:
dotnet test tests/NATS.Server.JetStream.Tests/NATS.Server.JetStream.Tests.csproj --filter "FullyQualifiedName~JetStreamStoreIndexTests|FullyQualifiedName~FileStoreOptimizationGuardTests" -c Release
- Expected: PASS.
Step 4: Commit the metadata/index layer
- Run:
git add src/NATS.Server/JetStream/Storage/StoredMessageIndex.cs src/NATS.Server/JetStream/Storage/FileStore.cs src/NATS.Server/JetStream/Storage/StoredMessage.cs
git commit -m "perf: add compact FileStore index metadata"
Task 3: Remove duplicate payload ownership in append and store paths
Files:
- Modify:
src/NATS.Server/JetStream/Storage/FileStore.cs
- Modify:
src/NATS.Server/JetStream/Storage/MsgBlock.cs
- Modify:
src/NATS.Server/JetStream/Storage/MessageRecord.cs
- Modify:
tests/NATS.Server.JetStream.Tests/FileStoreTests.cs
Step 1: Rework AppendAsync and StoreMsg payload flow
- Stop eagerly keeping both a transformed persisted payload and a second fully duplicated managed payload when the same buffer/view can safely back both responsibilities.
- Keep correctness for compression, encryption, and header-bearing records explicit.
Step 2: Remove concatenated header+payload arrays where possible
- Let record encoding paths consume header and payload spans directly instead of always building combined = new byte[...].
- Leave a copy in place only where the persistence or recovery contract actually requires one.
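The span-based approach in this step could look roughly like the sketch below, which writes header and payload straight into the destination buffer without an intermediate concatenated array. The RecordEncoding class and the length-prefixed layout are invented for illustration and are not the actual MessageRecord wire format; only IBufferWriter<byte> and BinaryPrimitives are real .NET APIs.

```csharp
using System;
using System.Buffers;
using System.Buffers.Binary;

public static class RecordEncoding
{
    // Hypothetical layout, little-endian:
    // [int32 headerLength][int32 payloadLength][header bytes][payload bytes]
    public static void Write(
        IBufferWriter<byte> writer,
        ReadOnlySpan<byte> header,
        ReadOnlySpan<byte> payload)
    {
        int total = 8 + header.Length + payload.Length;

        // GetSpan returns a span of at least the requested size.
        Span<byte> dest = writer.GetSpan(total);

        BinaryPrimitives.WriteInt32LittleEndian(dest, header.Length);
        BinaryPrimitives.WriteInt32LittleEndian(dest.Slice(4), payload.Length);

        // Each source is copied exactly once, directly into the destination.
        header.CopyTo(dest.Slice(8));
        payload.CopyTo(dest.Slice(8 + header.Length));

        writer.Advance(total);
    }
}
```

A caller can pass an ArrayBufferWriter<byte> or any pooled writer; the encoder itself allocates nothing, so the only remaining copy is the one into the destination.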
Step 3: Run targeted persistence tests
- Run:
dotnet test tests/NATS.Server.JetStream.Tests/NATS.Server.JetStream.Tests.csproj --filter "FullyQualifiedName~FileStoreTests|FullyQualifiedName~FileStoreCompressionTests|FullyQualifiedName~FileStoreEncryptionTests" -c Release
- Expected: PASS.
Step 4: Commit the payload-ownership refactor
- Run:
git add src/NATS.Server/JetStream/Storage/FileStore.cs src/NATS.Server/JetStream/Storage/MsgBlock.cs src/NATS.Server/JetStream/Storage/MessageRecord.cs tests/NATS.Server.JetStream.Tests/FileStoreTests.cs
git commit -m "perf: reduce FileStore duplicate payload buffers"
Task 4: Replace LINQ-heavy maintenance operations with explicit indexed paths
Files:
- Modify:
src/NATS.Server/JetStream/Storage/FileStore.cs
- Modify:
tests/NATS.Server.JetStream.Tests/JetStream/Storage/StoreInterfaceTests.cs
- Modify:
tests/NATS.Server.JetStream.Tests/JetStream/Storage/FileStoreCrashRecoveryTests.cs
- Modify:
tests/NATS.Server.JetStream.Tests/JetStream/Storage/FileStoreTombstoneTests.cs
Step 1: Rewrite hot maintenance methods
- Replace LINQ-based implementations in:
- LoadLastBySubjectAsync
- TrimToMaxMessages
- PurgeEx
- snapshot/recovery recomputation paths
- Use explicit loops and maintained indexes first; only add more elaborate per-subject structures if profiling still demands them.
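To illustrate the kind of rewrite meant here, the sketch below trims a store to a maximum message count with one forward loop that starts at the maintained first sequence, rather than calling a LINQ Min() before every removal. TrimExample, its parameters, and the dictionary standing in for the store are all hypothetical; the real TrimToMaxMessages also has to deal with blocks, tombstones, and subject indexes.

```csharp
using System.Collections.Generic;

public static class TrimExample
{
    // Removes the oldest entries until at most maxMessages remain and returns
    // the new first live sequence (lastSeq + 1 when nothing is left).
    // lengthsBySeq is a stand-in for the store, keyed by sequence.
    public static ulong TrimToMax(
        Dictionary<ulong, int> lengthsBySeq,
        ulong firstSeq,
        ulong lastSeq,
        int maxMessages)
    {
        ulong seq = firstSeq;

        // Walk forward from the known lower bound; no Min() scan is needed.
        while (lengthsBySeq.Count > maxMessages && seq <= lastSeq)
        {
            lengthsBySeq.Remove(seq); // no-op for gaps left by earlier deletes
            seq++;
        }

        // Step over any remaining gap so the result points at a live message.
        while (seq <= lastSeq && !lengthsBySeq.ContainsKey(seq))
        {
            seq++;
        }

        return seq;
    }
}
```

Each sequence is visited at most once, so the cost is proportional to the number of removed messages and gaps rather than to the size of the store times the number of removals.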
Step 2: Preserve recovery and tombstone correctness
- Verify delete markers, TTL rebuilds, compaction, and sequence-gap handling still match the current parity tests.
Step 3: Run targeted JetStream storage suites
- Run:
dotnet test tests/NATS.Server.JetStream.Tests/NATS.Server.JetStream.Tests.csproj --filter "FullyQualifiedName~StoreInterfaceTests|FullyQualifiedName~FileStoreCrashRecoveryTests|FullyQualifiedName~FileStoreTombstoneTests" -c Release
- Expected: PASS.
Step 4: Commit the maintenance-path rewrite
- Run:
git add src/NATS.Server/JetStream/Storage/FileStore.cs tests/NATS.Server.JetStream.Tests/JetStream/Storage/StoreInterfaceTests.cs tests/NATS.Server.JetStream.Tests/JetStream/Storage/FileStoreCrashRecoveryTests.cs tests/NATS.Server.JetStream.Tests/JetStream/Storage/FileStoreTombstoneTests.cs
git commit -m "perf: replace FileStore full scans with indexed loops"
Task 5: Add benchmark coverage, update docs, and run full verification
Files:
- Create:
tests/NATS.Server.Benchmark.Tests/JetStream/FileStoreAppendBenchmarks.cs
- Modify:
Documentation/JetStream/Overview.md
Step 1: Add FileStore-focused benchmarks
- Cover:
- append throughput
- sync publish
- load-last-by-subject
- purge/trim maintenance overhead
- Record allocation deltas before/after.
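If the existing benchmark harness does not already report allocations, a small probe along these lines is one way to capture a before/after delta. AllocationProbe is a hypothetical helper; GC.GetAllocatedBytesForCurrentThread is a real .NET API, but it only counts allocations made on the calling thread, so work that hops threads (for example, async continuations) would be undercounted.

```csharp
using System;

public static class AllocationProbe
{
    // Returns the number of bytes allocated on the current thread while
    // the supplied action runs.
    public static long Measure(Action action)
    {
        long before = GC.GetAllocatedBytesForCurrentThread();
        action();
        return GC.GetAllocatedBytesForCurrentThread() - before;
    }
}
```

Running the same append workload through the probe on the baseline commit and on the optimized branch gives two numbers that can be recorded side by side.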
Step 2: Update JetStream documentation
- Document how FileStore now separates metadata/index concerns from payload storage and where copies still remain by design.
Step 3: Run full verification
- Run:
dotnet test tests/NATS.Server.JetStream.Tests/NATS.Server.JetStream.Tests.csproj -c Release
- Run:
dotnet test tests/NATS.Server.Benchmark.Tests/NATS.Server.Benchmark.Tests.csproj --filter "FullyQualifiedName~FileStore|FullyQualifiedName~SyncPublish|FullyQualifiedName~AsyncPublish" -c Release
- Expected: PASS; benchmark output shows fewer allocations in append-heavy scenarios.
Step 4: Commit docs and benchmarks
- Run:
git add tests/NATS.Server.Benchmark.Tests/JetStream/FileStoreAppendBenchmarks.cs Documentation/JetStream/Overview.md
git commit -m "docs: record FileStore payload and index strategy"
Task 6: Merge the verified worktree branch back into main
Files:
- No source-file changes expected unless the merge surfaces conflicts that require a follow-up fix
Step 1: Confirm the worktree branch is clean and fully verified
- Re-run the Task 5 verification commands in the worktree if anything changed after the final commit.
- Run:
git status --short
- Expected: no uncommitted changes.
Step 2: Update main before merging
- From the primary checkout, run:
git switch main
git pull --ff-only
- Expected: local main matches the latest remote state.
Step 3: Merge the finished branch back to main
- Run:
git merge --ff-only codex/filestore-payload-index-optimization
- Expected: main fast-forwards to include the completed FileStore optimization commits.
- If fast-forward is not possible, rebase codex/filestore-payload-index-optimization onto main, re-run verification, and then repeat this step.
Step 4: Confirm main still passes after the merge
- Run:
dotnet test tests/NATS.Server.JetStream.Tests/NATS.Server.JetStream.Tests.csproj -c Release
- Expected: PASS on main.
Step 5: Remove the temporary worktree after merge confirmation
- Run:
git worktree remove .worktrees/filestore-payload-index-optimization
git branch -d codex/filestore-payload-index-optimization
- Expected: the temporary checkout is removed and the topic branch is no longer needed locally.
Task 7: Run the benchmark suite per the benchmark README and update the comparison document
Files:
- Modify:
benchmarks_comparison.md
- Reference:
tests/NATS.Server.Benchmark.Tests/README.md
Step 1: Run the full benchmark suite with the README-prescribed command
- From main after Task 6 succeeds, run:
dotnet test tests/NATS.Server.Benchmark.Tests \
--filter "Category=Benchmark" \
-v normal \
--logger "console;verbosity=detailed" 2>&1 | tee /tmp/bench-output.txt
- Expected: the benchmark suite completes and writes comparison blocks to /tmp/bench-output.txt.
Step 2: Extract the benchmark results from the captured output
- Review the Standard Output Messages sections in /tmp/bench-output.txt.
- Capture the updated values for:
- core pub/sub throughput
- request/reply latency
- JetStream sync publish
- JetStream async file publish
- ordered consumer throughput
- durable consumer fetch throughput
Step 3: Update benchmarks_comparison.md
- Update:
- the benchmark run date on the first line
- environment details if they changed
- all affected tables with the new msg/s, MB/s, ratio, and latency values
- the Summary and Key Observations text if the new ratios materially change the assessment
Step 4: Verify the comparison document changes are the only remaining edits
- Run:
git status --short
- Expected: only benchmarks_comparison.md is modified at this point unless the benchmark run surfaced a legitimate follow-up issue to capture separately.
Step 5: Commit the benchmark comparison refresh
- Run:
git add benchmarks_comparison.md
git commit -m "docs: update benchmark comparison after FileStore optimization"
Completion Checklist
- Implementation started from an isolated git worktree on codex/filestore-payload-index-optimization.
- AppendAsync and StoreMsg avoid unnecessary duplicate payload ownership.
- LoadLastBySubjectAsync, TrimToMaxMessages, and PurgeEx no longer rely on repeated LINQ full scans.
- First/last/live-sequence bookkeeping is maintained incrementally.
- JetStream storage, recovery, compression, encryption, and tombstone tests remain green.
- FileStore-focused benchmark coverage exists in tests/NATS.Server.Benchmark.Tests/JetStream/.
- Documentation/JetStream/Overview.md explains the updated storage/index model.
- Verified work has been merged back into main and the temporary worktree has been removed.
- Full benchmark suite has been run from main using the command in tests/NATS.Server.Benchmark.Tests/README.md.
- benchmarks_comparison.md has been updated to reflect the new benchmark results.
Concise Execution Checklist For The Current Codebase
- Create codex/filestore-payload-index-optimization in .worktrees/filestore-payload-index-optimization and verify tests/NATS.Server.JetStream.Tests/NATS.Server.JetStream.Tests.csproj passes before changes.
- Add optimization-guard coverage in tests/NATS.Server.JetStream.Tests/JetStreamStoreIndexTests.cs, tests/NATS.Server.JetStream.Tests/JetStream/Storage/StoreInterfaceTests.cs, and new tests/NATS.Server.JetStream.Tests/JetStream/Storage/FileStoreOptimizationGuardTests.cs.
- Rework the current FileStore hot paths in src/NATS.Server/JetStream/Storage/FileStore.cs: AppendAsync, LoadLastBySubjectAsync, TrimToMaxMessages, StoreMsg, and PurgeEx.
- Introduce compact FileStore indexing metadata in new src/NATS.Server/JetStream/Storage/StoredMessageIndex.cs and adjust src/NATS.Server/JetStream/Storage/StoredMessage.cs accordingly.
- Remove avoidable payload duplication in src/NATS.Server/JetStream/Storage/FileStore.cs, src/NATS.Server/JetStream/Storage/MsgBlock.cs, and src/NATS.Server/JetStream/Storage/MessageRecord.cs.
- Keep JetStream storage parity green by re-running the existing storage-focused suites under tests/NATS.Server.JetStream.Tests/JetStream/Storage/, especially compression, crash recovery, tombstones, and store interface coverage.
- Add FileStore benchmark coverage alongside the existing JetStream benchmark classes in tests/NATS.Server.Benchmark.Tests/JetStream/.
- Update Documentation/JetStream/Overview.md to describe the new payload/index split and the remaining intentional copy boundaries.
- Merge the verified topic branch back into main, re-run JetStream tests on main, then remove the temporary worktree.
- Run the full benchmark suite exactly as documented in tests/NATS.Server.Benchmark.Tests/README.md and update benchmarks_comparison.md with the new measurements.