natsnet/docs/plans/2026-02-27-batch-13-filestore-read-query-design.md
Joseph Doherty dc3e162608 Add batch plans for batches 13-15, 18-22 (rounds 8-11)
Generated design docs and implementation plans via Codex for:
- Batch 13: FileStore Read/Query
- Batch 14: FileStore Write/Lifecycle
- Batch 15: MsgBlock + ConsumerFileStore
- Batch 18: Server Core
- Batch 19: Accounts Core
- Batch 20: Accounts Resolvers
- Batch 21: Events + MsgTrace
- Batch 22: Monitoring

All plans include mandatory verification protocol and anti-stub guardrails.
Updated batches.md with file paths and planned status.
2026-02-27 15:43:14 -05:00


# Batch 13 FileStore Read/Query Design
## Context
- Batch: `13` (`FileStore Read/Query`)
- Dependency: Batch `12` (`FileStore Recovery`)
- Scope: 8 feature IDs, 0 mapped tests
- Go reference: `golang/nats-server/server/filestore.go` (primarily lines `3241-3571`)
- .NET target: `dotnet/src/ZB.MOM.NatsNet.Server/JetStream/FileStore.cs`
Features in scope:
- `1006` `fileStore.checkSkipFirstBlock`
- `1007` `fileStore.checkSkipFirstBlockMulti`
- `1008` `fileStore.selectSkipFirstBlock`
- `1009` `fileStore.numFilteredPending`
- `1010` `fileStore.numFilteredPendingNoLast`
- `1011` `fileStore.numFilteredPendingWithLast`
- `1014` `fileStore.allLastSeqsLocked`
- `1015` `fileStore.filterIsAll`
## Problem Statement
`JetStreamFileStore` currently delegates nearly all query behavior to `JetStreamMemStore`. Batch 13 requires file-store-native read/query helpers that depend on file-store indexes (`_psim`, `_bim`, `_blks`) and file-store locking patterns. Without these helpers, later file-store read/write batches cannot safely switch from delegation to true file-backed query paths.
## Approaches Considered
### Approach 1 (Recommended): Implement file-store-native helpers now, keep public delegation stable
Implement the 8 methods in `JetStreamFileStore` as internal/private helpers that operate on `_psim`, `_bim`, `_blks`, and `MessageBlock` metadata. Keep the current public `IStreamStore` delegation unchanged for now, but add focused tests for helper correctness and edge cases.
Pros:
- Preserves current behavior while enabling later batches.
- Aligns tightly with Go logic and lock semantics.
- Reduces risk when moving public query paths to file-store-native execution.
Cons:
- Adds methods that are not yet fully wired into every public API path.
- Requires test harness/setup for internal state.
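The shape of Approach 1 can be sketched as follows. This is a minimal illustration, not the actual `JetStreamFileStore`: `IPendingQuery`, `MemStoreSketch`, `FileStoreSketch`, and `NumFilteredPendingNative` are hypothetical names, and a plain dictionary stands in for `_psim`. The public method keeps delegating while the native helper stays internal and is exercised directly by tests.

```csharp
using System.Collections.Generic;

// Hypothetical stand-in for the public store contract; the real IStreamStore is larger.
public interface IPendingQuery
{
    ulong NumPending(string filter);
}

// Hypothetical in-memory store that currently answers public queries.
public sealed class MemStoreSketch : IPendingQuery
{
    private readonly Dictionary<string, ulong> _counts = new();

    public void Add(string subject)
    {
        _counts.TryGetValue(subject, out var n); // n is 0 when the subject is new
        _counts[subject] = n + 1;
    }

    public ulong NumPending(string filter) =>
        _counts.TryGetValue(filter, out var n) ? n : 0UL;
}

public sealed class FileStoreSketch : IPendingQuery
{
    private readonly MemStoreSketch _memStore = new();

    // Assumption: a flat per-subject counter standing in for _psim.
    private readonly Dictionary<string, ulong> _psim = new();

    public void Store(string subject)
    {
        _memStore.Add(subject);
        _psim.TryGetValue(subject, out var n);
        _psim[subject] = n + 1;
    }

    // Public path: still delegated, so callers see no behavior change.
    public ulong NumPending(string filter) => _memStore.NumPending(filter);

    // Native helper: internal, covered by focused tests until a later batch wires it in.
    internal ulong NumFilteredPendingNative(string filter) =>
        _psim.TryGetValue(filter, out var n) ? n : 0UL;
}
```

Comparing the delegated and native answers in a test gives an early signal that the helper agrees with current behavior before any public path is switched.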
### Approach 2: Keep delegation and defer helper implementation to Batch 14+
Do nothing in Batch 13 besides documentation/notes, then implement all helpers when public read paths are ported.
Pros:
- Lower immediate implementation effort.
Cons:
- Violates the intent of Batch 13 feature scope.
- Pushes complexity/risk into later larger batches.
- Prevents granular verification of these helpers.
### Approach 3: Replace delegation immediately and port full read/query path end-to-end
Implement helpers plus rewire all relevant public methods (`NumPending`, `MultiLastSeqs`, etc.) to native file-store logic in this batch.
Pros:
- End-state behavior reached sooner.
Cons:
- Scope explosion beyond 8 Batch 13 features.
- High regression risk and poor fit for dependency sequencing.
## Recommended Design
### 1) FileStore helper surface for Batch 13
Add/port the following methods in `JetStreamFileStore` with Go-equivalent behavior:
- `CheckSkipFirstBlock(string filter, bool wc, int bi)`
- `CheckSkipFirstBlockMulti(SimpleSublist sl, int bi)`
- `SelectSkipFirstBlock(int bi, uint start, uint stop)`
- `NumFilteredPending(string filter, ref SimpleState ss)`
- `NumFilteredPendingNoLast(string filter, ref SimpleState ss)`
- `NumFilteredPendingWithLast(string filter, bool includeLast, ref SimpleState ss)`
- `AllLastSeqsLocked()`
- `FilterIsAll(string[] filters)`
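As an illustration of the smallest helper in this list, here is a hedged sketch of a `FilterIsAll`-style check. It assumes the intent is "the filters cover every configured stream subject", passes the configured subjects as a parameter instead of reading store config, and uses a simplified token matcher (`IsSubsetMatch`, a hypothetical name) in place of the server's subject-matching code. The Go source remains the reference for exact behavior.

```csharp
using System;
using System.Linq;

public static class FilterSketch
{
    // Simplified subset check: true when everything matched by `subject`
    // is also matched by `test`. '*' matches one token, '>' matches the rest.
    public static bool IsSubsetMatch(string subject, string test)
    {
        var s = subject.Split('.');
        var t = test.Split('.');
        for (var i = 0; i < t.Length; i++)
        {
            if (i >= s.Length) return false;
            if (t[i].Length == 0 || s[i].Length == 0) return false; // reject empty tokens
            if (t[i] == ">") return true;       // test covers all remaining tokens
            if (s[i] == ">") return false;      // subject is wider than test here
            if (s[i] == "*")
            {
                if (t[i] != "*") return false;  // a wildcard cannot fit inside a literal
                continue;
            }
            if (t[i] != "*" && t[i] != s[i]) return false;
        }
        return s.Length == t.Length;
    }

    // Assumption: both lists are compared pairwise after an ordinal sort.
    public static bool FilterIsAll(string[] filters, string[] configuredSubjects)
    {
        if (filters.Length != configuredSubjects.Length) return false;
        var sortedFilters = filters.OrderBy(f => f, StringComparer.Ordinal).ToArray();
        var sortedSubjects = configuredSubjects.OrderBy(x => x, StringComparer.Ordinal).ToArray();
        for (var i = 0; i < sortedFilters.Length; i++)
        {
            if (!IsSubsetMatch(sortedSubjects[i], sortedFilters[i])) return false;
        }
        return true;
    }
}
```

Under these assumptions, a filter narrower than its configured subject (for example `foo.bar` against `foo.*`) does not count as "all", and neither does a filter list of a different length.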
### 2) State and indexing model
- Use `_psim` (`SubjectTree<Psi>`) for per-subject totals and first/last block index hints.
- Use `_bim` to dereference block index (`uint`) to `MessageBlock`.
- Use `_blks` ordering for forward/backward block scans.
- Preserve lazy correction behavior for stale `Psi.Fblk` when first-sequence lookup misses.
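To make the interplay between block positions and block indexes concrete, the sketch below models a `SelectSkipFirstBlock`-style decision. It is a simplified model under stated assumptions: `BlockSketch` and `SkipIndexSketch` are hypothetical types, the dictionary maps a block index straight to its position in the ordered list (the real `_bim` dereferences to a `MessageBlock`), and `-1` stands in for `StoreErrors.ErrStoreEOF`.

```csharp
using System.Collections.Generic;

// Hypothetical minimal block: Index is the block's persistent identifier.
public sealed record BlockSketch(uint Index);

public sealed class SkipIndexSketch
{
    private readonly List<BlockSketch> _blks = new();            // ordered scan list
    private readonly Dictionary<uint, int> _posByIndex = new();  // stand-in for _bim lookups

    public void AddBlock(uint index)
    {
        _posByIndex[index] = _blks.Count;
        _blks.Add(new BlockSketch(index));
    }

    // bi is a position in _blks; start/stop are the first and last block
    // indexes that hold a matching subject (as hinted by Psi.Fblk / Psi.Lblk).
    // Returns the next position worth scanning, or -1 as an EOF stand-in.
    public int SelectSkipFirstBlock(int bi, uint start, uint stop)
    {
        if (bi < 0 || bi >= _blks.Count) return -1;
        var current = _blks[bi].Index;
        if (stop <= current) return -1;   // every match is in this or an earlier block
        if (start > current && _posByIndex.TryGetValue(start, out var pos))
            return pos;                   // jump ahead to the first block with a match
        return bi + 1;                    // hint missing or stale: advance one block
    }
}
```

The fall-through to `bi + 1` is what keeps a stale or missing first-block hint harmless: the scan gets slower, not wrong.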
### 3) Locking and concurrency model
- Methods ending with `Locked` assume the caller already holds the `_mu` read lock.
- Do not upgrade a read lock to a write lock in place.
- For stale `Fblk` correction, schedule a deferred write-lock update (Go uses a goroutine). In C#, use a background `Task.Run` that acquires the `_mu` write lock and revalidates the hint before updating it.
- Avoid nested read-lock recursion patterns that can deadlock with pending writes.
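The deferred-correction rule can be sketched with `ReaderWriterLockSlim` and `Task.Run`. This is an illustrative pattern only: `PsiSketch` and `FblkCorrectionSketch` are hypothetical, a single `Psi`-like record replaces the subject tree, and the revalidation condition is an assumption about what "still stale" means.

```csharp
using System.Threading;
using System.Threading.Tasks;

public sealed class PsiSketch
{
    public uint Fblk;   // first-block hint; may go stale after removals
    public uint Lblk;   // last-block hint
}

public sealed class FblkCorrectionSketch
{
    private readonly ReaderWriterLockSlim _mu = new();
    private readonly PsiSketch _psi = new() { Fblk = 1, Lblk = 9 };

    public uint CurrentFblk()
    {
        _mu.EnterReadLock();
        try { return _psi.Fblk; }
        finally { _mu.ExitReadLock(); }
    }

    // Safe to call while holding the read lock: it never upgrades in place,
    // it only queues work that takes the write lock later on another thread.
    public Task ScheduleFblkCorrection(uint observedFblk, uint correctedFblk)
    {
        return Task.Run(() =>
        {
            _mu.EnterWriteLock();
            try
            {
                // Revalidate: another writer may have changed the hint meanwhile.
                if (_psi.Fblk == observedFblk && correctedFblk <= _psi.Lblk)
                    _psi.Fblk = correctedFblk;
            }
            finally { _mu.ExitWriteLock(); }
        });
    }

    // Reader path: notices a stale hint and schedules the fix without blocking.
    public Task ReadAndCorrect(uint correctedFblk)
    {
        _mu.EnterReadLock();
        try
        {
            var observed = _psi.Fblk;
            return observed < correctedFblk
                ? ScheduleFblkCorrection(observed, correctedFblk)
                : Task.CompletedTask;
        }
        finally { _mu.ExitReadLock(); }
    }
}
```

The reader must not wait on the returned task while it still holds the read lock; the queued write can only proceed once all readers have exited.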
### 4) Error and boundary behavior
- Return `StoreErrors.ErrStoreEOF` when skip/jump helpers find no matching future blocks.
- Return empty results (not exceptions) when there are no matching subjects for filtered pending.
- Guard defensively against null `_psim` or `_bim`, missing blocks, and empty `_blks`.
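The "empty result, not exception" rule might look like the following for a filtered-pending helper. The sketch is deliberately narrow: `SimpleStateSketch` and `PendingSketch` are hypothetical, the index is a flat dictionary keyed by literal subject, and wildcard filters are out of scope here.

```csharp
#nullable enable
using System.Collections.Generic;

// Hypothetical mirror of a SimpleState-like result.
public struct SimpleStateSketch
{
    public ulong Msgs;
    public ulong First;
    public ulong Last;
}

public sealed class PendingSketch
{
    // Assumption: the index may be null, for example before recovery has populated it.
    private readonly Dictionary<string, SimpleStateSketch>? _bySubject;

    public PendingSketch(Dictionary<string, SimpleStateSketch>? bySubject) =>
        _bySubject = bySubject;

    // Literal subjects only. On any miss the caller gets a zeroed state.
    public void NumFilteredPending(string filter, ref SimpleStateSketch ss)
    {
        ss = default;                              // start from an empty result
        if (_bySubject is null || _bySubject.Count == 0) return;
        if (_bySubject.TryGetValue(filter, out var found)) ss = found;
    }
}
```

Resetting `ss` first means a caller that reuses the same struct across calls never sees leftovers from a previous filter.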
### 5) Testing strategy
- Add focused tests for helper behavior in `dotnet/tests/ZB.MOM.NatsNet.Server.Tests/JetStream/JetStreamFileStoreReadQueryTests.cs`.
- Derive cases from Go tests around:
  - `TestFileStoreFilteredPendingPSIMFirstBlockUpdate`
  - `TestFileStoreWildcardFilteredPendingPSIMFirstBlockUpdate`
  - `TestFileStoreFilteredPendingPSIMFirstBlockUpdateNextBlock`
  - `TestJetStreamStoreFilterIsAll`
- Run existing related tests (`JetStreamFileStoreTests`, `StorageEngineTests` subset, `JetStreamMemoryStoreTests` subset) as regression gates.
## Non-Goals
- Full migration of public file-store query APIs away from `_memStore` delegation.
- Implementing unrelated file-store write/lifecycle features from Batch 14.
- Porting new PortTracker-mapped test IDs (Batch 13 has 0 mapped tests).
## Acceptance Criteria
- All 8 feature methods are implemented with behavior aligned to Go intent.
- No placeholder/stub patterns in touched source/tests.
- `dotnet build dotnet/` passes after each feature group.
- Related test gates pass before features are marked `verified`.
- Feature status updates are evidence-backed and chunked (max 15 IDs).