docs: add optimization planning documents

This commit is contained in:
Joseph Doherty
2026-03-13 10:19:56 -04:00
parent fb0d31c615
commit a1fc600d84
4 changed files with 1172 additions and 0 deletions

View File

@@ -0,0 +1,300 @@
# SubList Allocation Reduction Implementation Plan
> **For Codex:** REQUIRED SUB-SKILLS: Use `using-git-worktrees` to create an isolated worktree before making changes, `executeplan` to implement this plan task-by-task, and `finishing-a-development-branch` to merge the verified work back to `main` when implementation is complete.
**Goal:** Reduce publish-path allocation and lookup overhead in `SubList` by removing composite string keys, minimizing `token.ToString()` churn, and tightening `Match()` result building without changing subscription semantics.
**Architecture:** First lock the current trie, cache, and remote-interest behavior with targeted tests. Then replace routed-sub bookkeeping with a dedicated value key, remove string split/rebuild work from remote cleanup, and finally optimize trie traversal and match result construction with span-friendly helpers and pooled builders.
**Tech Stack:** .NET 10, C#, `ReaderWriterLockSlim`, span-based token parsing, xUnit, existing clustering/gateway parity suites, benchmark test harness.
---
## Scope Anchors
- Primary source: `src/NATS.Server/Subscriptions/SubList.cs`
- Supporting source: `src/NATS.Server/Subscriptions/SubjectMatch.cs`
- Existing core tests: `tests/NATS.Server.Core.Tests/Subscriptions/SubListGoParityTests.cs`
- Existing ctor/notification tests: `tests/NATS.Server.Core.Tests/Subscriptions/SubListCtorAndNotificationParityTests.cs`
- Route cleanup tests: `tests/NATS.Server.Clustering.Tests/Routes/RouteRemoteSubCleanupParityBatch2Tests.cs`
- Gateway/route interest tests:
- `tests/NATS.Server.Gateways.Tests/Gateways/GatewayInterestModeTests.cs`
- `tests/NATS.Server.Clustering.Tests/Routes/RouteInterestIdempotencyTests.cs`
- `tests/NATS.Server.Clustering.Tests/Routes/RouteSubscriptionTests.cs`
- Documentation to update: `Documentation/Subscriptions/SubList.md`
- Benchmark workflow reference: `tests/NATS.Server.Benchmark.Tests/README.md`
- Benchmark comparison document: `benchmarks_comparison.md`
## Task 0: Create an isolated git worktree for the SubList optimization work
**Files:**
- Verify: `.worktrees/`
- Modify if needed: `.gitignore`
**Step 1: Verify the preferred worktree directory is available and ignored**
- Check that `.worktrees/` exists and is ignored by git before creating a project-local worktree.
- If `.worktrees/` is not ignored, add it to `.gitignore`, then commit that repository hygiene fix before continuing.
**Step 2: Create the feature worktree on a `codex/` branch**
- Run:
```bash
git worktree add .worktrees/codex-sublist-allocation-reduction -b codex/sublist-allocation-reduction
cd .worktrees/codex-sublist-allocation-reduction
```
- Expected: a new isolated worktree exists at `.worktrees/codex-sublist-allocation-reduction` on branch `codex/sublist-allocation-reduction`.
**Step 3: Verify the worktree starts from a clean, passing baseline**
- Run: `dotnet test tests/NATS.Server.Core.Tests/NATS.Server.Core.Tests.csproj --filter "FullyQualifiedName~SubList" -c Release`
- Run: `dotnet test tests/NATS.Server.Clustering.Tests/NATS.Server.Clustering.Tests.csproj --filter "FullyQualifiedName~RouteRemoteSubCleanupParityBatch2Tests|FullyQualifiedName~RouteInterestIdempotencyTests|FullyQualifiedName~RouteSubscriptionTests" -c Release`
- Run: `dotnet test tests/NATS.Server.Gateways.Tests/NATS.Server.Gateways.Tests.csproj --filter "FullyQualifiedName~GatewayInterestModeTests|FullyQualifiedName~GatewayInterestIdempotencyTests|FullyQualifiedName~GatewayForwardingTests" -c Release`
- Expected: PASS. If this fails, stop and resolve or explicitly confirm whether to proceed from a failing baseline.
**Step 4: Commit any required worktree setup fix**
- Only if `.gitignore` changed, run:
```bash
git add .gitignore
git commit -m "chore: ignore local worktree directories"
```
## Task 1: Lock behavior around remote interest, cleanup, and matching
**Files:**
- Modify: `tests/NATS.Server.Core.Tests/Subscriptions/SubListGoParityTests.cs`
- Modify: `tests/NATS.Server.Clustering.Tests/Routes/RouteRemoteSubCleanupParityBatch2Tests.cs`
- Create: `tests/NATS.Server.Core.Tests/Subscriptions/SubListAllocationGuardTests.cs`
**Step 1: Add failing tests for routed-sub bookkeeping changes**
- Cover:
- applying the same remote subscription twice
- removing remote subscriptions by route and by route/account
- queue-weight updates
- exact and wildcard remote-interest queries
**Step 2: Add tests for match result stability**
- Ensure `Match()` still returns correct plain and queue subscription sets for exact, `*`, and `>` subjects.
- Add a test that specifically locks cache behavior across generation bumps.
**Step 3: Run the focused tests to prove the new coverage fails first**
- Run: `dotnet test tests/NATS.Server.Core.Tests/NATS.Server.Core.Tests.csproj --filter "FullyQualifiedName~SubList" -c Release`
- Run: `dotnet test tests/NATS.Server.Clustering.Tests/NATS.Server.Clustering.Tests.csproj --filter "FullyQualifiedName~RouteRemoteSubCleanupParityBatch2Tests|FullyQualifiedName~RouteInterestIdempotencyTests" -c Release`
- Expected: FAIL only in the newly added allocation-guard or key-behavior tests.
**Step 4: Commit the failing-test baseline**
- Run:
```bash
git add tests/NATS.Server.Core.Tests/Subscriptions/SubListGoParityTests.cs tests/NATS.Server.Core.Tests/Subscriptions/SubListAllocationGuardTests.cs tests/NATS.Server.Clustering.Tests/Routes/RouteRemoteSubCleanupParityBatch2Tests.cs
git commit -m "test: lock SubList remote-key and match behavior"
```
## Task 2: Replace composite routed-sub strings with a dedicated value key
**Files:**
- Create: `src/NATS.Server/Subscriptions/RoutedSubKey.cs`
- Modify: `src/NATS.Server/Subscriptions/SubList.cs`
- Modify: `tests/NATS.Server.Clustering.Tests/Routes/RouteRemoteSubCleanupParityBatch2Tests.cs`
**Step 1: Introduce a strongly typed routed-sub key**
- Add a small immutable value type for `(RouteId, Account, Subject, Queue)`.
- Use it as the dictionary key for `_remoteSubs` instead of the `"route|account|subject|queue"` composite string.
**Step 2: Remove string split/rebuild helpers from hot paths**
- Replace `BuildRoutedSubKey(...)`, `GetAccNameFromRoutedSubKey(...)`, and `GetRoutedSubKeyInfo(...)` usage in runtime paths with the new value key.
- Keep compatibility helper coverage only if other call sites still require string-facing helpers temporarily.
**Step 3: Run remote-interest tests**
- Run: `dotnet test tests/NATS.Server.Clustering.Tests/NATS.Server.Clustering.Tests.csproj --filter "FullyQualifiedName~RouteRemoteSubCleanupParityBatch2Tests|FullyQualifiedName~RouteSubscriptionTests" -c Release`
- Run: `dotnet test tests/NATS.Server.Gateways.Tests/NATS.Server.Gateways.Tests.csproj --filter "FullyQualifiedName~GatewayInterestModeTests|FullyQualifiedName~GatewayInterestIdempotencyTests" -c Release`
- Expected: PASS.
**Step 4: Commit the key-model refactor**
- Run:
```bash
git add src/NATS.Server/Subscriptions/RoutedSubKey.cs src/NATS.Server/Subscriptions/SubList.cs tests/NATS.Server.Clustering.Tests/Routes/RouteRemoteSubCleanupParityBatch2Tests.cs
git commit -m "perf: replace SubList routed-sub string keys"
```
## Task 3: Remove avoidable string churn from trie traversal
**Files:**
- Modify: `src/NATS.Server/Subscriptions/SubList.cs`
- Modify: `src/NATS.Server/Subscriptions/SubjectMatch.cs`
- Modify: `tests/NATS.Server.Core.Tests/Subscriptions/SubListGoParityTests.cs`
**Step 1: Rework token traversal helpers**
- Add a shared subject token walker that can expose tokens as spans and only allocate when a trie node insertion truly needs a durable string key.
- Remove repeated `token.ToString()` in traversal paths where lookups can operate on a transient token view first.
**Step 2: Keep exact-subject match paths allocation-lean**
- Prefer span/token comparison helpers for exact-match and wildcard traversal logic.
- Leave wildcard semantics unchanged.
**Step 3: Run core subscription tests**
- Run: `dotnet test tests/NATS.Server.Core.Tests/NATS.Server.Core.Tests.csproj --filter "FullyQualifiedName~SubListGoParityTests|FullyQualifiedName~SubListCtorAndNotificationParityTests|FullyQualifiedName~SubListParityBatch2Tests" -c Release`
- Expected: PASS.
**Step 4: Commit trie traversal cleanup**
- Run:
```bash
git add src/NATS.Server/Subscriptions/SubList.cs src/NATS.Server/Subscriptions/SubjectMatch.cs tests/NATS.Server.Core.Tests/Subscriptions/SubListGoParityTests.cs
git commit -m "perf: reduce SubList token string churn"
```
## Task 4: Pool `Match()` result building and remove cleanup copies
**Files:**
- Modify: `src/NATS.Server/Subscriptions/SubList.cs`
- Modify: `tests/NATS.Server.Core.Tests/Subscriptions/SubListAllocationGuardTests.cs`
- Modify: `tests/NATS.Server.Clustering.Tests/Routes/RouteSubscriptionTests.cs`
**Step 1: Replace temporary per-match collections**
- Rework `Match()` so temporary result building uses pooled builders or `ArrayBufferWriter<T>` instead of fresh nested `List<T>` allocations on every call.
- Preserve the current public `SubListResult` shape unless profiling proves a larger contract change is justified.
**Step 2: Remove `ToArray()` cleanup passes over `_remoteSubs`**
- Update `RemoveRemoteSubs(...)` and `RemoveRemoteSubsForAccount(...)` to avoid eager dictionary array copies.
- Ensure removal remains correct under the existing lock discipline.
**Step 3: Run cross-module regression**
- Run: `dotnet test tests/NATS.Server.Clustering.Tests/NATS.Server.Clustering.Tests.csproj --filter "FullyQualifiedName~RouteSubscriptionTests|FullyQualifiedName~RouteInterestIdempotencyTests" -c Release`
- Run: `dotnet test tests/NATS.Server.Gateways.Tests/NATS.Server.Gateways.Tests.csproj --filter "FullyQualifiedName~GatewayInterestModeTests|FullyQualifiedName~GatewayForwardingTests" -c Release`
- Expected: PASS.
**Step 4: Commit match-builder changes**
- Run:
```bash
git add src/NATS.Server/Subscriptions/SubList.cs tests/NATS.Server.Core.Tests/Subscriptions/SubListAllocationGuardTests.cs tests/NATS.Server.Clustering.Tests/Routes/RouteSubscriptionTests.cs
git commit -m "perf: pool SubList match builders and cleanup scans"
```
## Task 5: Add benchmark coverage, update docs, and run full verification
**Files:**
- Create: `tests/NATS.Server.Benchmark.Tests/CorePubSub/SubListMatchBenchmarks.cs`
- Modify: `Documentation/Subscriptions/SubList.md`
**Step 1: Add focused `SubList` benchmarks**
- Measure exact-match, wildcard-match, queue-sub, and remote-interest scenarios.
- Capture throughput and allocations before/after the refactor.
**Step 2: Update subscription documentation**
- Document the new routed-sub key model, the allocation strategy for trie matching, and any remaining intentional copies.
**Step 3: Run full verification**
- Run: `dotnet test tests/NATS.Server.Core.Tests/NATS.Server.Core.Tests.csproj -c Release`
- Run: `dotnet test tests/NATS.Server.Clustering.Tests/NATS.Server.Clustering.Tests.csproj -c Release`
- Run: `dotnet test tests/NATS.Server.Gateways.Tests/NATS.Server.Gateways.Tests.csproj -c Release`
- Run: `dotnet test tests/NATS.Server.Benchmark.Tests/NATS.Server.Benchmark.Tests.csproj --filter "FullyQualifiedName~SubList" -c Release`
- Expected: PASS; benchmark output shows reduced per-match allocations.
**Step 4: Commit the documentation and benchmark work**
- Run:
```bash
git add tests/NATS.Server.Benchmark.Tests/CorePubSub/SubListMatchBenchmarks.cs Documentation/Subscriptions/SubList.md
git commit -m "docs: record SubList allocation strategy"
```
## Task 6: Merge the verified SubList work back to `main` and clean up the worktree
**Files:**
- No source changes expected
**Step 1: Confirm the feature branch is fully verified before merge**
- Reuse the verification from Task 5. Do not merge if the core, clustering, gateway, or benchmark test commands are failing.
**Step 2: Merge `codex/sublist-allocation-reduction` back into `main`**
- Return to the primary repository worktree and run:
```bash
git checkout main
git pull
git merge codex/sublist-allocation-reduction
```
- Expected: the SubList optimization commits merge cleanly into `main`.
**Step 3: Re-run verification on the merged `main` branch**
- Run: `dotnet test tests/NATS.Server.Core.Tests/NATS.Server.Core.Tests.csproj --filter "FullyQualifiedName~SubList" -c Release`
- Run: `dotnet test tests/NATS.Server.Clustering.Tests/NATS.Server.Clustering.Tests.csproj --filter "FullyQualifiedName~RouteRemoteSubCleanupParityBatch2Tests|FullyQualifiedName~RouteInterestIdempotencyTests|FullyQualifiedName~RouteSubscriptionTests" -c Release`
- Run: `dotnet test tests/NATS.Server.Gateways.Tests/NATS.Server.Gateways.Tests.csproj --filter "FullyQualifiedName~GatewayInterestModeTests|FullyQualifiedName~GatewayInterestIdempotencyTests|FullyQualifiedName~GatewayForwardingTests" -c Release`
- Run: `dotnet test tests/NATS.Server.Benchmark.Tests/NATS.Server.Benchmark.Tests.csproj --filter "FullyQualifiedName~SubList" -c Release`
- Expected: PASS on merged `main`.
**Step 4: Delete the feature branch and remove the worktree**
- Run:
```bash
git branch -d codex/sublist-allocation-reduction
git worktree remove .worktrees/codex-sublist-allocation-reduction
```
- Expected: only `main` remains checked out in the primary workspace, and the temporary SubList worktree is removed.
## Task 7: Run the full benchmark suite per the benchmark README and update `benchmarks_comparison.md`
**Files:**
- Verify workflow against: `tests/NATS.Server.Benchmark.Tests/README.md`
- Modify: `benchmarks_comparison.md`
**Step 1: Run the benchmark test project using the repository-documented command**
- From the primary repository worktree on verified `main`, run:
```bash
dotnet test tests/NATS.Server.Benchmark.Tests \
--filter "Category=Benchmark" \
-v normal \
--logger "console;verbosity=detailed" 2>&1 | tee /tmp/bench-output.txt
```
- Expected: the full benchmark suite completes and writes detailed comparison output to `/tmp/bench-output.txt`.
**Step 2: Extract the benchmark comparison blocks from the captured output**
- Open `/tmp/bench-output.txt` and pull the side-by-side comparison blocks from the `Standard Output Messages` sections for:
- core pub-only
- core 1:1 pub/sub
- core fan-out
- core multi pub/sub
- request/reply
- JetStream publish
- JetStream consumption
**Step 3: Update `benchmarks_comparison.md`**
- Refresh:
- the benchmark run date on the first line
- the environment description if toolchain or machine details changed
- throughput, MB/s, ratio, and latency values in the benchmark tables
- summary assessments and key observations if the ratios materially changed
- Keep the narrative tied to measured results from `/tmp/bench-output.txt`; do not preserve stale claims that no longer match the numbers.
**Step 4: Commit the benchmark comparison refresh**
- Run:
```bash
git add benchmarks_comparison.md
git commit -m "docs: refresh benchmark comparison after SubList optimization"
```
- Expected: `main` contains the merged SubList optimization work plus the refreshed benchmark comparison document.
## Completion Checklist
- [ ] SubList optimization work was implemented in an isolated `.worktrees/codex-sublist-allocation-reduction` worktree on branch `codex/sublist-allocation-reduction`.
- [ ] `_remoteSubs` no longer uses composite string keys in runtime paths.
- [ ] Remote cleanup paths no longer depend on `Split('|')` and `_remoteSubs.ToArray()`.
- [ ] Trie traversal materially reduces `token.ToString()` churn.
- [ ] `Match()` uses pooled or allocation-lean temporary builders.
- [ ] Core, clustering, and gateway parity tests remain green.
- [ ] `Documentation/Subscriptions/SubList.md` explains the new key and match strategy.
- [ ] Verified SubList optimization commits were merged back into `main`.
- [ ] The full benchmark test project was run on merged `main` per `tests/NATS.Server.Benchmark.Tests/README.md`.
- [ ] `benchmarks_comparison.md` was updated to match the latest benchmark output.
## Concise Execution Checklist (Current Codebase)
- [ ] Start from the current repo root and leave unrelated MQTT worktree changes untouched.
- [ ] Create `.worktrees/codex-sublist-allocation-reduction` on `codex/sublist-allocation-reduction`.
- [ ] Verify the current SubList baseline with:
- `tests/NATS.Server.Core.Tests/Subscriptions/SubListGoParityTests.cs`
- `tests/NATS.Server.Core.Tests/Subscriptions/SubListCtorAndNotificationParityTests.cs`
- `tests/NATS.Server.Core.Tests/Subscriptions/SubListParityBatch2Tests.cs`
- `tests/NATS.Server.Clustering.Tests/Routes/RouteRemoteSubCleanupParityBatch2Tests.cs`
- `tests/NATS.Server.Clustering.Tests/Routes/RouteInterestIdempotencyTests.cs`
- `tests/NATS.Server.Clustering.Tests/Routes/RouteSubscriptionTests.cs`
- `tests/NATS.Server.Gateways.Tests/Gateways/GatewayInterestModeTests.cs`
- `tests/NATS.Server.Gateways.Tests/Gateways/GatewayForwardingTests.cs`
- [ ] Add new guard coverage in `tests/NATS.Server.Core.Tests/Subscriptions/SubListAllocationGuardTests.cs`.
- [ ] Refactor `src/NATS.Server/Subscriptions/SubList.cs` to use a new `src/NATS.Server/Subscriptions/RoutedSubKey.cs`.
- [ ] Reduce token string churn in `src/NATS.Server/Subscriptions/SubList.cs` and `src/NATS.Server/Subscriptions/SubjectMatch.cs`.
- [ ] Add a new benchmark file at `tests/NATS.Server.Benchmark.Tests/CorePubSub/SubListMatchBenchmarks.cs` beside the existing CorePubSub benchmarks.
- [ ] Update `Documentation/Subscriptions/SubList.md`.
- [ ] Run full verification for core, clustering, gateway, and focused SubList benchmark tests.
- [ ] Merge `codex/sublist-allocation-reduction` into `main` and re-run the merged verification.
- [ ] Run the full benchmark suite using `tests/NATS.Server.Benchmark.Tests/README.md` guidance and refresh `benchmarks_comparison.md`.