Files
natsdotnet/docs/plans/2026-03-13-optimizations_sublist-plan.md
2026-03-13 10:19:56 -04:00

17 KiB

SubList Allocation Reduction Implementation Plan

For Codex: REQUIRED SUB-SKILLS: Use using-git-worktrees to create an isolated worktree before making changes, executeplan to implement this plan task-by-task, and finishing-a-development-branch to merge the verified work back to main when implementation is complete.

Goal: Reduce publish-path allocation and lookup overhead in SubList by removing composite string keys, minimizing token.ToString() churn, and tightening Match() result building without changing subscription semantics.

Architecture: First lock the current trie, cache, and remote-interest behavior with targeted tests. Then replace routed-sub bookkeeping with a dedicated value key, remove string split/rebuild work from remote cleanup, and finally optimize trie traversal and match result construction with span-friendly helpers and pooled builders.

Tech Stack: .NET 10, C#, ReaderWriterLockSlim, span-based token parsing, xUnit, existing clustering/gateway parity suites, benchmark test harness.


Scope Anchors

  • Primary source: src/NATS.Server/Subscriptions/SubList.cs
  • Supporting source: src/NATS.Server/Subscriptions/SubjectMatch.cs
  • Existing core tests: tests/NATS.Server.Core.Tests/Subscriptions/SubListGoParityTests.cs
  • Existing ctor/notification tests: tests/NATS.Server.Core.Tests/Subscriptions/SubListCtorAndNotificationParityTests.cs
  • Route cleanup tests: tests/NATS.Server.Clustering.Tests/Routes/RouteRemoteSubCleanupParityBatch2Tests.cs
  • Gateway/route interest tests:
    • tests/NATS.Server.Gateways.Tests/Gateways/GatewayInterestModeTests.cs
    • tests/NATS.Server.Clustering.Tests/Routes/RouteInterestIdempotencyTests.cs
    • tests/NATS.Server.Clustering.Tests/Routes/RouteSubscriptionTests.cs
  • Documentation to update: Documentation/Subscriptions/SubList.md
  • Benchmark workflow reference: tests/NATS.Server.Benchmark.Tests/README.md
  • Benchmark comparison document: benchmarks_comparison.md

Task 0: Create an isolated git worktree for the SubList optimization work

Files:

  • Verify: .worktrees/
  • Modify if needed: .gitignore

Step 1: Verify the preferred worktree directory is available and ignored

  • Check that .worktrees/ exists and is ignored by git before creating a project-local worktree.
  • If .worktrees/ is not ignored, add it to .gitignore, then commit that repository hygiene fix before continuing.

Step 2: Create the feature worktree on a codex/ branch

  • Run:
git worktree add .worktrees/codex-sublist-allocation-reduction -b codex/sublist-allocation-reduction
cd .worktrees/codex-sublist-allocation-reduction
  • Expected: a new isolated worktree exists at .worktrees/codex-sublist-allocation-reduction on branch codex/sublist-allocation-reduction.

Step 3: Verify the worktree starts from a clean, passing baseline

  • Run: dotnet test tests/NATS.Server.Core.Tests/NATS.Server.Core.Tests.csproj --filter "FullyQualifiedName~SubList" -c Release
  • Run: dotnet test tests/NATS.Server.Clustering.Tests/NATS.Server.Clustering.Tests.csproj --filter "FullyQualifiedName~RouteRemoteSubCleanupParityBatch2Tests|FullyQualifiedName~RouteInterestIdempotencyTests|FullyQualifiedName~RouteSubscriptionTests" -c Release
  • Run: dotnet test tests/NATS.Server.Gateways.Tests/NATS.Server.Gateways.Tests.csproj --filter "FullyQualifiedName~GatewayInterestModeTests|FullyQualifiedName~GatewayInterestIdempotencyTests|FullyQualifiedName~GatewayForwardingTests" -c Release
  • Expected: PASS. If this fails, stop and resolve or explicitly confirm whether to proceed from a failing baseline.

Step 4: Commit any required worktree setup fix

  • Only if .gitignore changed, run:
git add .gitignore
git commit -m "chore: ignore local worktree directories"

Task 1: Lock behavior around remote interest, cleanup, and matching

Files:

  • Modify: tests/NATS.Server.Core.Tests/Subscriptions/SubListGoParityTests.cs
  • Modify: tests/NATS.Server.Clustering.Tests/Routes/RouteRemoteSubCleanupParityBatch2Tests.cs
  • Create: tests/NATS.Server.Core.Tests/Subscriptions/SubListAllocationGuardTests.cs

Step 1: Add failing tests for routed-sub bookkeeping changes

  • Cover:
    • applying the same remote subscription twice
    • removing remote subscriptions by route and by route/account
    • queue-weight updates
    • exact and wildcard remote-interest queries

Step 2: Add tests for match result stability

  • Ensure Match() still returns correct plain and queue subscription sets for exact, *, and > subjects.
  • Add a test that specifically locks cache behavior across generation bumps.

Step 3: Run the focused tests to prove the new coverage fails first

  • Run: dotnet test tests/NATS.Server.Core.Tests/NATS.Server.Core.Tests.csproj --filter "FullyQualifiedName~SubList" -c Release
  • Run: dotnet test tests/NATS.Server.Clustering.Tests/NATS.Server.Clustering.Tests.csproj --filter "FullyQualifiedName~RouteRemoteSubCleanupParityBatch2Tests|FullyQualifiedName~RouteInterestIdempotencyTests" -c Release
  • Expected: FAIL only in the newly added allocation-guard or key-behavior tests.

Step 4: Commit the failing-test baseline

  • Run:
git add tests/NATS.Server.Core.Tests/Subscriptions/SubListGoParityTests.cs tests/NATS.Server.Core.Tests/Subscriptions/SubListAllocationGuardTests.cs tests/NATS.Server.Clustering.Tests/Routes/RouteRemoteSubCleanupParityBatch2Tests.cs
git commit -m "test: lock SubList remote-key and match behavior"

Task 2: Replace composite routed-sub strings with a dedicated value key

Files:

  • Create: src/NATS.Server/Subscriptions/RoutedSubKey.cs
  • Modify: src/NATS.Server/Subscriptions/SubList.cs
  • Modify: tests/NATS.Server.Clustering.Tests/Routes/RouteRemoteSubCleanupParityBatch2Tests.cs

Step 1: Introduce a strongly typed routed-sub key

  • Add a small immutable value type for (RouteId, Account, Subject, Queue).
  • Use it as the dictionary key for _remoteSubs instead of the "route|account|subject|queue" composite string.

Step 2: Remove string split/rebuild helpers from hot paths

  • Replace BuildRoutedSubKey(...), GetAccNameFromRoutedSubKey(...), and GetRoutedSubKeyInfo(...) usage in runtime paths with the new value key.
  • Keep compatibility helper coverage only if other call sites still require string-facing helpers temporarily.

Step 3: Run remote-interest tests

  • Run: dotnet test tests/NATS.Server.Clustering.Tests/NATS.Server.Clustering.Tests.csproj --filter "FullyQualifiedName~RouteRemoteSubCleanupParityBatch2Tests|FullyQualifiedName~RouteSubscriptionTests" -c Release
  • Run: dotnet test tests/NATS.Server.Gateways.Tests/NATS.Server.Gateways.Tests.csproj --filter "FullyQualifiedName~GatewayInterestModeTests|FullyQualifiedName~GatewayInterestIdempotencyTests" -c Release
  • Expected: PASS.

Step 4: Commit the key-model refactor

  • Run:
git add src/NATS.Server/Subscriptions/RoutedSubKey.cs src/NATS.Server/Subscriptions/SubList.cs tests/NATS.Server.Clustering.Tests/Routes/RouteRemoteSubCleanupParityBatch2Tests.cs
git commit -m "perf: replace SubList routed-sub string keys"

Task 3: Remove avoidable string churn from trie traversal

Files:

  • Modify: src/NATS.Server/Subscriptions/SubList.cs
  • Modify: src/NATS.Server/Subscriptions/SubjectMatch.cs
  • Modify: tests/NATS.Server.Core.Tests/Subscriptions/SubListGoParityTests.cs

Step 1: Rework token traversal helpers

  • Add a shared subject token walker that can expose tokens as spans and only allocate when a trie node insertion truly needs a durable string key.
  • Remove repeated token.ToString() in traversal paths where lookups can operate on a transient token view first.

Step 2: Keep exact-subject match paths allocation-lean

  • Prefer span/token comparison helpers for exact-match and wildcard traversal logic.
  • Leave wildcard semantics unchanged.

Step 3: Run core subscription tests

  • Run: dotnet test tests/NATS.Server.Core.Tests/NATS.Server.Core.Tests.csproj --filter "FullyQualifiedName~SubListGoParityTests|FullyQualifiedName~SubListCtorAndNotificationParityTests|FullyQualifiedName~SubListParityBatch2Tests" -c Release
  • Expected: PASS.

Step 4: Commit trie traversal cleanup

  • Run:
git add src/NATS.Server/Subscriptions/SubList.cs src/NATS.Server/Subscriptions/SubjectMatch.cs tests/NATS.Server.Core.Tests/Subscriptions/SubListGoParityTests.cs
git commit -m "perf: reduce SubList token string churn"

Task 4: Pool Match() result building and remove cleanup copies

Files:

  • Modify: src/NATS.Server/Subscriptions/SubList.cs
  • Modify: tests/NATS.Server.Core.Tests/Subscriptions/SubListAllocationGuardTests.cs
  • Modify: tests/NATS.Server.Clustering.Tests/Routes/RouteSubscriptionTests.cs

Step 1: Replace temporary per-match collections

  • Rework Match() so temporary result building uses pooled builders or ArrayBufferWriter<T> instead of fresh nested List<T> allocations on every call.
  • Preserve the current public SubListResult shape unless profiling proves a larger contract change is justified.

Step 2: Remove ToArray() cleanup passes over _remoteSubs

  • Update RemoveRemoteSubs(...) and RemoveRemoteSubsForAccount(...) to avoid eager dictionary array copies.
  • Ensure removal remains correct under the existing lock discipline.

Step 3: Run cross-module regression

  • Run: dotnet test tests/NATS.Server.Clustering.Tests/NATS.Server.Clustering.Tests.csproj --filter "FullyQualifiedName~RouteSubscriptionTests|FullyQualifiedName~RouteInterestIdempotencyTests" -c Release
  • Run: dotnet test tests/NATS.Server.Gateways.Tests/NATS.Server.Gateways.Tests.csproj --filter "FullyQualifiedName~GatewayInterestModeTests|FullyQualifiedName~GatewayForwardingTests" -c Release
  • Expected: PASS.

Step 4: Commit match-builder changes

  • Run:
git add src/NATS.Server/Subscriptions/SubList.cs tests/NATS.Server.Core.Tests/Subscriptions/SubListAllocationGuardTests.cs tests/NATS.Server.Clustering.Tests/Routes/RouteSubscriptionTests.cs
git commit -m "perf: pool SubList match builders and cleanup scans"

Task 5: Add benchmark coverage, update docs, and run full verification

Files:

  • Create: tests/NATS.Server.Benchmark.Tests/CorePubSub/SubListMatchBenchmarks.cs
  • Modify: Documentation/Subscriptions/SubList.md

Step 1: Add focused SubList benchmarks

  • Measure exact-match, wildcard-match, queue-sub, and remote-interest scenarios.
  • Capture throughput and allocations before/after the refactor.

Step 2: Update subscription documentation

  • Document the new routed-sub key model, the allocation strategy for trie matching, and any remaining intentional copies.

Step 3: Run full verification

  • Run: dotnet test tests/NATS.Server.Core.Tests/NATS.Server.Core.Tests.csproj -c Release
  • Run: dotnet test tests/NATS.Server.Clustering.Tests/NATS.Server.Clustering.Tests.csproj -c Release
  • Run: dotnet test tests/NATS.Server.Gateways.Tests/NATS.Server.Gateways.Tests.csproj -c Release
  • Run: dotnet test tests/NATS.Server.Benchmark.Tests/NATS.Server.Benchmark.Tests.csproj --filter "FullyQualifiedName~SubList" -c Release
  • Expected: PASS; benchmark output shows reduced per-match allocations.

Step 4: Commit the documentation and benchmark work

  • Run:
git add tests/NATS.Server.Benchmark.Tests/CorePubSub/SubListMatchBenchmarks.cs Documentation/Subscriptions/SubList.md
git commit -m "docs: record SubList allocation strategy"

Task 6: Merge the verified SubList work back to main and clean up the worktree

Files:

  • No source changes expected

Step 1: Confirm the feature branch is fully verified before merge

  • Reuse the verification from Task 5. Do not merge if the core, clustering, gateway, or benchmark test commands are failing.

Step 2: Merge codex/sublist-allocation-reduction back into main

  • Return to the primary repository worktree and run:
git checkout main
git pull
git merge codex/sublist-allocation-reduction
  • Expected: the SubList optimization commits merge cleanly into main.

Step 3: Re-run verification on the merged main branch

  • Run: dotnet test tests/NATS.Server.Core.Tests/NATS.Server.Core.Tests.csproj --filter "FullyQualifiedName~SubList" -c Release
  • Run: dotnet test tests/NATS.Server.Clustering.Tests/NATS.Server.Clustering.Tests.csproj --filter "FullyQualifiedName~RouteRemoteSubCleanupParityBatch2Tests|FullyQualifiedName~RouteInterestIdempotencyTests|FullyQualifiedName~RouteSubscriptionTests" -c Release
  • Run: dotnet test tests/NATS.Server.Gateways.Tests/NATS.Server.Gateways.Tests.csproj --filter "FullyQualifiedName~GatewayInterestModeTests|FullyQualifiedName~GatewayInterestIdempotencyTests|FullyQualifiedName~GatewayForwardingTests" -c Release
  • Run: dotnet test tests/NATS.Server.Benchmark.Tests/NATS.Server.Benchmark.Tests.csproj --filter "FullyQualifiedName~SubList" -c Release
  • Expected: PASS on merged main.

Step 4: Delete the feature branch and remove the worktree

  • Run:
git branch -d codex/sublist-allocation-reduction
git worktree remove .worktrees/codex-sublist-allocation-reduction
  • Expected: only main remains checked out in the primary workspace, and the temporary SubList worktree is removed.

Task 7: Run the full benchmark suite per the benchmark README and update benchmarks_comparison.md

Files:

  • Verify workflow against: tests/NATS.Server.Benchmark.Tests/README.md
  • Modify: benchmarks_comparison.md

Step 1: Run the benchmark test project using the repository-documented command

  • From the primary repository worktree on verified main, run:
dotnet test tests/NATS.Server.Benchmark.Tests \
  --filter "Category=Benchmark" \
  -v normal \
  --logger "console;verbosity=detailed" 2>&1 | tee /tmp/bench-output.txt
  • Expected: the full benchmark suite completes and writes detailed comparison output to /tmp/bench-output.txt.

Step 2: Extract the benchmark comparison blocks from the captured output

  • Open /tmp/bench-output.txt and pull the side-by-side comparison blocks from the Standard Output Messages sections for:
    • core pub-only
    • core 1:1 pub/sub
    • core fan-out
    • core multi pub/sub
    • request/reply
    • JetStream publish
    • JetStream consumption

Step 3: Update benchmarks_comparison.md

  • Refresh:
    • the benchmark run date on the first line
    • the environment description if toolchain or machine details changed
    • throughput, MB/s, ratio, and latency values in the benchmark tables
    • summary assessments and key observations if the ratios materially changed
  • Keep the narrative tied to measured results from /tmp/bench-output.txt; do not preserve stale claims that no longer match the numbers.

Step 4: Commit the benchmark comparison refresh

  • Run:
git add benchmarks_comparison.md
git commit -m "docs: refresh benchmark comparison after SubList optimization"
  • Expected: main contains the merged SubList optimization work plus the refreshed benchmark comparison document.

Completion Checklist

  • SubList optimization work was implemented in an isolated .worktrees/codex-sublist-allocation-reduction worktree on branch codex/sublist-allocation-reduction.
  • _remoteSubs no longer uses composite string keys in runtime paths.
  • Remote cleanup paths no longer depend on Split('|') and _remoteSubs.ToArray().
  • Trie traversal materially reduces token.ToString() churn.
  • Match() uses pooled or allocation-lean temporary builders.
  • Core, clustering, and gateway parity tests remain green.
  • Documentation/Subscriptions/SubList.md explains the new key and match strategy.
  • Verified SubList optimization commits were merged back into main.
  • The full benchmark test project was run on merged main per tests/NATS.Server.Benchmark.Tests/README.md.
  • benchmarks_comparison.md was updated to match the latest benchmark output.

Concise Execution Checklist (Current Codebase)

  • Start from the current repo root and leave unrelated MQTT worktree changes untouched.
  • Create .worktrees/codex-sublist-allocation-reduction on codex/sublist-allocation-reduction.
  • Verify the current SubList baseline with:
    • tests/NATS.Server.Core.Tests/Subscriptions/SubListGoParityTests.cs
    • tests/NATS.Server.Core.Tests/Subscriptions/SubListCtorAndNotificationParityTests.cs
    • tests/NATS.Server.Core.Tests/Subscriptions/SubListParityBatch2Tests.cs
    • tests/NATS.Server.Clustering.Tests/Routes/RouteRemoteSubCleanupParityBatch2Tests.cs
    • tests/NATS.Server.Clustering.Tests/Routes/RouteInterestIdempotencyTests.cs
    • tests/NATS.Server.Clustering.Tests/Routes/RouteSubscriptionTests.cs
    • tests/NATS.Server.Gateways.Tests/Gateways/GatewayInterestModeTests.cs
    • tests/NATS.Server.Gateways.Tests/Gateways/GatewayForwardingTests.cs
  • Add new guard coverage in tests/NATS.Server.Core.Tests/Subscriptions/SubListAllocationGuardTests.cs.
  • Refactor src/NATS.Server/Subscriptions/SubList.cs to use a new src/NATS.Server/Subscriptions/RoutedSubKey.cs.
  • Reduce token string churn in src/NATS.Server/Subscriptions/SubList.cs and src/NATS.Server/Subscriptions/SubjectMatch.cs.
  • Add a new benchmark file at tests/NATS.Server.Benchmark.Tests/CorePubSub/SubListMatchBenchmarks.cs beside the existing CorePubSub benchmarks.
  • Update Documentation/Subscriptions/SubList.md.
  • Run full verification for core, clustering, gateway, and focused SubList benchmark tests.
  • Merge codex/sublist-allocation-reduction into main and re-run the merged verification.
  • Run the full benchmark suite using tests/NATS.Server.Benchmark.Tests/README.md guidance and refresh benchmarks_comparison.md.