natsnet/docs/plans/2026-02-27-batch-15-msgblock-consumerfilestore-design.md
Joseph Doherty dc3e162608 Add batch plans for batches 13-15, 18-22 (rounds 8-11)
Generated design docs and implementation plans via Codex for:
- Batch 13: FileStore Read/Query
- Batch 14: FileStore Write/Lifecycle
- Batch 15: MsgBlock + ConsumerFileStore
- Batch 18: Server Core
- Batch 19: Accounts Core
- Batch 20: Accounts Resolvers
- Batch 21: Events + MsgTrace
- Batch 22: Monitoring

All plans include mandatory verification protocol and anti-stub guardrails.
Updated batches.md with file paths and planned status.
2026-02-27 15:43:14 -05:00

# Batch 15 MsgBlock + ConsumerFileStore Design
I'm using `brainstorm` to produce this design before implementation planning.
## Context
- Batch: `15` (`MsgBlock + ConsumerFileStore`)
- Dependency: Batch `14` (`FileStore Write/Lifecycle`)
- Scope: `121` features + `89` tests
- Go source: `golang/nats-server/server/filestore.go`
- Current batch status: `pending` (all mapped features/tests currently `deferred`)
Observed with PortTracker:
- `batch show 15`: full feature/test inventory loaded from `server/filestore.go` and related tests.
- `batch list`: Batch 15 depends on Batch 14.
- `batch ready`: Batch 15 is not startable yet (dependency gate not satisfied).
- `report summary`: overall porting progress is `1924/6942 (27.7%)`.
## Problem Statement
Batch 15 covers the deepest part of the FileStore internals: message block recovery/scanning, cache lifecycle, compaction/tombstones, disk rewrite/compression/encryption paths, and `ConsumerFileStore` state flushing. Without strict verification discipline in this batch, downstream stream/consumer lifecycle batches will inherit correctness and durability bugs that are expensive to debug.
## Constraints and Success Criteria
- Do not start Batch 15 execution until Batch 14 is complete.
- Keep implementation scoped to Batch 15 mapped IDs.
- Enforce non-stub behavior for both production code and tests.
- Promote statuses only with evidence-backed build/test proof.
- If blocked, keep IDs deferred with explicit reasons; do not fake-pass.
Success for this batch means:
1. All implementable Batch 15 features are real C# ports with parity-oriented behavior.
2. All implementable mapped tests are real behavioral tests and passing.
3. Blocked items remain deferred with concrete, auditable reasons.
4. The batch can pass the final `batch complete 15` validation once dependencies are met and evidence is complete.
## Approaches Considered
### Approach 1 (Recommended): Vertical slices by feature groups and test waves
Split 121 features into seven groups (`20/20/20/20/20/20/1`) and execute each group with immediate build/test/status gates before moving on.
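
As a quick check on the `20/20/20/20/20/20/1` split, here is a minimal sketch of the grouping. The integer IDs are hypothetical placeholders; the real feature IDs come from `batch show 15`, and the function name is illustrative only.

```python
def split_into_groups(feature_ids, group_size=20):
    """Split an ordered list of feature IDs into consecutive groups."""
    return [
        feature_ids[i:i + group_size]
        for i in range(0, len(feature_ids), group_size)
    ]


# Hypothetical placeholder IDs standing in for the 121 mapped features.
feature_ids = list(range(1, 122))
groups = split_into_groups(feature_ids)
group_sizes = [len(group) for group in groups]
print(group_sizes)  # [20, 20, 20, 20, 20, 20, 1]
```

Six full groups plus a final single-feature group account for all 121 features, so no ID is left outside a gated group.
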
Pros:
- Confines risk in the file-store concurrency code to one group at a time.
- Makes regressions local and debuggable.
- Supports strict anti-stub and evidence requirements.
Cons:
- More checkpoints and bookkeeping overhead.
### Approach 2: Feature-complete first, tests second
Implement all production features first, then port all tests.
Pros:
- Fewer context switches while coding.
Cons:
- Late detection of regressions.
- Weak traceability for status updates.
- Higher risk of last-minute stub behavior.
### Approach 3: Test-first broad wave with temporary placeholders
Attempt to satisfy test volume rapidly with minimal implementation placeholders.
Pros:
- Fast apparent progress.
Cons:
- Violates anti-stub requirement.
- Creates false confidence and rework.
- Unacceptable for durability-critical file store logic.
## Recommended Design
### 1) Implementation Topology
Primary code targets:
- `dotnet/src/ZB.MOM.NatsNet.Server/JetStream/MessageBlock.cs`
- `dotnet/src/ZB.MOM.NatsNet.Server/JetStream/FileStore.cs`
- `dotnet/src/ZB.MOM.NatsNet.Server/JetStream/FileStoreTypes.cs`
- `dotnet/src/ZB.MOM.NatsNet.Server/JetStream/StoreTypes.cs`
Scope focus:
- `MessageBlock` internals (95 mapped features)
- `ConsumerFileStore` internals (18 mapped features)
- Store enum/codec helpers and error wrappers
### 2) Verification-First Control Plane
Every feature group and test wave must pass:
- Per-feature read/port/build/test loop
- Stub scan gates (production + tests)
- Build gate after each feature group
- Related test gate before feature `verified`
- Chunked status updates (`<=15` IDs/update)
- Full checkpoint (`build + full test + commit`) between tasks
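
One way to picture the stub scan gate is a line-by-line pattern check over C# sources. The sketch below is an assumption, not the project's actual scan: the three marker patterns and the function names are hypothetical and would need to match whatever the implementation plan defines as a stub.

```python
import re

# Assumed stub markers for C# code. These patterns are illustrative and are
# not taken from the project's real scan configuration.
STUB_PATTERNS = [
    re.compile(r"throw\s+new\s+NotImplementedException"),
    re.compile(r"Assert\.True\(\s*true\s*\)"),
    re.compile(r"//\s*TODO", re.IGNORECASE),
]


def scan_for_stubs(source_text):
    """Return (line_number, stripped_line) pairs that match a stub marker."""
    hits = []
    for number, line in enumerate(source_text.splitlines(), start=1):
        if any(pattern.search(line) for pattern in STUB_PATTERNS):
            hits.append((number, line.strip()))
    return hits


def stub_gate_passes(sources):
    """Pass only when no file in `sources` (file name -> text) has a hit."""
    return all(not scan_for_stubs(text) for text in sources.values())


# Hypothetical C# snippet used only to exercise the scan.
sample = "public void Flush()\n{\n    throw new NotImplementedException();\n}\n"
print(scan_for_stubs(sample))  # [(3, 'throw new NotImplementedException();')]
```

Running the same scan over both production and test sources keeps the two stub gates symmetric; a group would only move on when `stub_gate_passes` returns `True` for both sets.
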
### 3) Test Strategy
Mapped tests are concentrated in:
- `JetStreamFileStoreTests.Impltests.cs` (58)
- `JwtProcessorTests.Impltests.cs` (24)
- `GatewayHandlerTests.Impltests.cs` (2)
- `ConcurrencyTests1.Impltests.cs` (1)
- `ConcurrencyTests2.Impltests.cs` (2)
- `EventsHandlerTests.Impltests.cs` (1)
- `ConfigReloaderTests.Impltests.cs` (1)
Design decision:
- Treat filestore/concurrency tests as primary verification for Batch 15 behavior.
- Treat cross-module JWT/gateway/events/reload tests as dependency-sensitive; keep deferred with reasons if blocked by non-Batch-15 surfaces.
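
The per-file counts can be cross-checked against the `89` mapped tests, and the design decision implies a split between primary and dependency-sensitive tests. The sketch below assumes that "primary" means exactly the filestore and concurrency files; the dictionary and function names are illustrative.

```python
# Mapped test counts per file, copied from the list above.
MAPPED_TESTS = {
    "JetStreamFileStoreTests.Impltests.cs": 58,
    "JwtProcessorTests.Impltests.cs": 24,
    "GatewayHandlerTests.Impltests.cs": 2,
    "ConcurrencyTests1.Impltests.cs": 1,
    "ConcurrencyTests2.Impltests.cs": 2,
    "EventsHandlerTests.Impltests.cs": 1,
    "ConfigReloaderTests.Impltests.cs": 1,
}

# Assumption: these files are the "primary" filestore/concurrency tests.
PRIMARY_FILES = {
    "JetStreamFileStoreTests.Impltests.cs",
    "ConcurrencyTests1.Impltests.cs",
    "ConcurrencyTests2.Impltests.cs",
}


def summarize(mapped, primary_files):
    """Total the mapped tests and split them into the two categories."""
    total = sum(mapped.values())
    primary = sum(n for name, n in mapped.items() if name in primary_files)
    return {
        "total": total,
        "primary": primary,
        "dependency_sensitive": total - primary,
    }


summary = summarize(MAPPED_TESTS, PRIMARY_FILES)
print(summary)  # {'total': 89, 'primary': 61, 'dependency_sensitive': 28}
```

Under that assumption, 61 of the 89 tests are primary verification for Batch 15, and up to 28 may stay deferred if they are blocked by surfaces outside this batch.
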
### 4) Data and Concurrency Parity Priorities
Highest-risk behavior areas to preserve from Go:
- Block rebuild and index/tombstone accounting integrity.
- Cache ownership transitions and forced expiration races.
- Flush loops and fsync boundaries for pending writes.
- Compression/encryption conversion and checksum correctness.
- Consumer state encode/encrypt/flush sequencing.
### 5) Status Governance
- Feature lifecycle: `deferred -> stub -> complete -> verified`
- Test lifecycle: `deferred -> stub -> verified` (or `deferred` with blocker note)
- Status updates require direct evidence references (Go line reviewed, build output, targeted test output, stub scan clean).
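
The lifecycles and the evidence rule can be sketched as a small transition check. This is a hedged illustration, not PortTracker behavior: the evidence keys are hypothetical names for the four evidence kinds listed above, and the sketch assumes every promotion needs all four, which the text does not spell out per transition. The chunk size reuses the `<=15` IDs per update limit.

```python
# Allowed forward transitions, taken from the two lifecycles listed above.
FEATURE_TRANSITIONS = {
    "deferred": {"stub"},
    "stub": {"complete"},
    "complete": {"verified"},
}
TEST_TRANSITIONS = {
    "deferred": {"stub"},
    "stub": {"verified"},
}

# Hypothetical keys for the four evidence kinds named in the text.
REQUIRED_EVIDENCE = {
    "go_lines_reviewed",
    "build_output",
    "test_output",
    "stub_scan_clean",
}

MAX_IDS_PER_UPDATE = 15


def can_promote(kind, current, target, evidence):
    """Allow a promotion only along a listed transition with full evidence."""
    transitions = FEATURE_TRANSITIONS if kind == "feature" else TEST_TRANSITIONS
    if target not in transitions.get(current, set()):
        return False
    return REQUIRED_EVIDENCE.issubset(evidence)


def chunk_updates(ids, max_ids=MAX_IDS_PER_UPDATE):
    """Split a list of IDs into status updates of at most `max_ids` each."""
    return [ids[i:i + max_ids] for i in range(0, len(ids), max_ids)]


full_evidence = set(REQUIRED_EVIDENCE)
print(can_promote("feature", "stub", "complete", full_evidence))  # True
print(can_promote("feature", "stub", "verified", full_evidence))  # False
print(can_promote("test", "stub", "verified", {"build_output"}))  # False
print([len(c) for c in chunk_updates(list(range(20)))])           # [15, 5]
```

A 20-feature group therefore needs two status updates, and a feature cannot jump from `stub` to `verified` without passing through `complete`.
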
## Non-Goals
- Pulling in work from future stream/consumer lifecycle batches beyond mapped Batch 15 IDs.
- Marking cross-module tests verified without executable evidence.
- Using placeholder methods/tests to satisfy status transitions.
## Acceptance Criteria
- Batch 15 feature groups fully implemented with non-stub logic.
- Mandatory verification protocol passed for each group/wave.
- Evidence-backed status transitions applied in chunks of max 15 IDs.
- Deferred items carry explicit blocker reasons.
- Batch closure commands are expected to pass once dependency Batch 14 is complete.