# Batch 12 FileStore Recovery Implementation Plan

> **For Codex:** REQUIRED SUB-SKILL: Use `executeplan` to implement this plan task-by-task.

**Goal:** Implement and verify all Batch 12 FileStore Recovery features from `server/filestore.go` with no stub logic and evidence-backed status transitions.

**Architecture:** Execute Batch 12 in two vertical feature groups (5 + 3). Implement recovery logic directly in `JetStream/FileStore.cs`, touching supporting JetStream types only when required. After each group, run strict stub scans, build, and related test gates before any status updates.

**Tech Stack:** .NET 10, C# latest, xUnit 3, Shouldly, NSubstitute, PortTracker CLI, SQLite (`porting.db`)

**Design doc:** `docs/plans/2026-02-27-batch-12-filestore-recovery-design.md`

---

I'm using `writeplan` to create the implementation plan.

## Batch Inputs

- Batch: `12` (`FileStore Recovery`)
- Depends on: Batch `11`
- Features: `8`
- Tests: `0` (batch-owned), with known related reverse dependencies:
  - test `#519` (`FileStoreRecoverFullStateDetectCorruptState_ShouldSucceed`)
  - test `#545` (`FileStoreNoPanicOnRecoverTTLWithCorruptBlocks_ShouldSucceed`)
- Go source scope: `golang/nats-server/server/filestore.go` lines ~1708-2580

Feature groups (max ~20 features each):

- **Group 1 (5):** `987,988,991,992,993`
- **Group 2 (3):** `995,996,997`

---

## MANDATORY VERIFICATION PROTOCOL

> **NON-NEGOTIABLE:** Every task and every status update in this plan must follow this protocol.

### Per-Feature Verification Loop (REQUIRED for every feature ID)

For each feature ID in the active group:

1. Read feature mapping and exact Go intent:

   ```bash
   /usr/local/share/dotnet/dotnet run --project tools/NatsNet.PortTracker -- feature show --db porting.db
   ```

2. Read corresponding Go method span in `golang/nats-server/server/filestore.go`.
3. Implement minimal real C# behavior (no placeholders).
4. Build immediately:

   ```bash
   /usr/local/share/dotnet/dotnet build dotnet/
   ```

5. Run related tests for the touched behavior (see Test Gate below).
6. Record evidence (command + summary output) before adding the ID to status-update candidates.

### Stub Detection Check (REQUIRED after each feature group)

Run all scans below. Any match is a hard blocker:

```bash
# Production placeholder detection
rg -n "NotImplementedException|TODO|PLACEHOLDER" \
  dotnet/src/ZB.MOM.NatsNet.Server/JetStream -g '*.cs'

# Empty method bodies (FileStore recovery surface)
rg -n "^\s*(public|private|internal|protected).*(Warn|Debug|RecoverFullState|RecoverTTLState|RecoverMsgSchedulingState|CleanupOldMeta|RecoverMsgs|ExpireMsgsOnRecover)\s*\([^)]*\)\s*\{\s*\}$" \
  dotnet/src/ZB.MOM.NatsNet.Server/JetStream/FileStore.cs

# Test placeholders in directly related classes
rg -n "NotImplementedException|Assert\.True\(true\)|Assert\.Pass|// TODO|// PLACEHOLDER" \
  dotnet/tests/ZB.MOM.NatsNet.Server.Tests/JetStream/JetStreamFileStoreTests.cs \
  dotnet/tests/ZB.MOM.NatsNet.Server.Tests/ImplBacklog/JetStreamFileStoreTests.Impltests.cs
```

### Build Gate (REQUIRED after each feature group)

This must pass before status updates and before moving to next group:

```bash
/usr/local/share/dotnet/dotnet build dotnet/
```

### Test Gate (REQUIRED before marking features `verified`)

All related tests must pass.
Run at least:

```bash
# Existing JetStream FileStore coverage
/usr/local/share/dotnet/dotnet test dotnet/tests/ZB.MOM.NatsNet.Server.Tests/ \
  --filter "FullyQualifiedName~ZB.MOM.NatsNet.Server.Tests.JetStream.JetStreamFileStoreTests" \
  --verbosity normal

# Backlog coverage for FileStore implementation
/usr/local/share/dotnet/dotnet test dotnet/tests/ZB.MOM.NatsNet.Server.Tests/ \
  --filter "FullyQualifiedName~ZB.MOM.NatsNet.Server.Tests.ImplBacklog.JetStreamFileStoreTests" \
  --verbosity normal

# Feature-linked methods from reverse dependencies
/usr/local/share/dotnet/dotnet test dotnet/tests/ZB.MOM.NatsNet.Server.Tests/ \
  --filter "FullyQualifiedName~FileStoreRecoverFullStateDetectCorruptState|FullyQualifiedName~FileStoreNoPanicOnRecoverTTLWithCorruptBlocks" \
  --verbosity normal
```

Gate rule:

- If related tests run and pass, eligible for `verified`.
- If related tests are unavailable/not yet implemented (0 discovered), feature may be set to `complete` only, with explicit note explaining why `verified` is deferred.

### Status Update Protocol (REQUIRED)

- Use max **15 IDs** per `feature batch-update` call.
- Required status progression: `deferred -> stub -> complete -> verified`.
- Do not mark `verified` without evidence from Build Gate + Test Gate.
- Keep an evidence log folder (example: `/tmp/batch12-evidence/`) with per-group command outputs.

Examples:

```bash
# Move active group to stub before editing
/usr/local/share/dotnet/dotnet run --project tools/NatsNet.PortTracker -- \
  feature batch-update --ids "987,988,991,992,993" --set-status stub --db porting.db --execute

# Move group to complete after successful implementation + build/test evidence
/usr/local/share/dotnet/dotnet run --project tools/NatsNet.PortTracker -- \
  feature batch-update --ids "987,988,991,992,993" --set-status complete --db porting.db --execute
```

### Checkpoint Protocol Between Tasks (REQUIRED)

At each group boundary:

1. Full build:

   ```bash
   /usr/local/share/dotnet/dotnet build dotnet/
   ```

2. Full unit test sweep (not just filtered):

   ```bash
   /usr/local/share/dotnet/dotnet test dotnet/tests/ZB.MOM.NatsNet.Server.Tests/ --verbosity normal
   ```

3. Commit checkpoint before next task:

   ```bash
   git add dotnet/src/ZB.MOM.NatsNet.Server/JetStream \
     dotnet/tests/ZB.MOM.NatsNet.Server.Tests \
     porting.db
   git commit -m "feat(batch12): complete group filestore recovery"
   ```

---

## ANTI-STUB GUARDRAILS (NON-NEGOTIABLE)

### Forbidden Patterns

The following are forbidden in Batch 12 feature or related test code:

- `throw new NotImplementedException(...)`
- Empty recovery method bodies (`{ }`)
- `// TODO` or `// PLACEHOLDER` in implemented recovery methods
- Fake test pass patterns (`Assert.True(true)`, `Assert.Pass()`, assertion-only smoke checks that do not exercise production behavior)
- Swallowing corruption/IO errors silently instead of preserving Go intent

### Hard Limits

- Max ~20 features per implementation group (fixed here as 5 and 3)
- Max 15 feature IDs per status-update command
- One feature group per verification/update cycle
- Zero stub-scan matches before `complete` or `verified` transitions
- No `verified` transition without explicit Build Gate + Test Gate evidence

### If You Get Stuck (MANDATORY)

1. Do **not** add a stub, placeholder, or no-op workaround.
2. Mark only blocked feature IDs as `deferred` with a concrete reason.
3. Continue with remaining IDs in the group.
4. Record blocker details in evidence log and PortTracker override reason.

Example:

```bash
/usr/local/share/dotnet/dotnet run --project tools/NatsNet.PortTracker -- \
  feature update --status deferred --db porting.db \
  --override "blocked: "
```

---

### Task 1: Batch Start and Group 1 Staging

**Files:**

- Modify: `porting.db`
- Create: `/tmp/batch12-evidence/` (evidence logs)

**Step 1: Confirm current batch state**

Run:

```bash
/usr/local/share/dotnet/dotnet run --project tools/NatsNet.PortTracker -- batch show 12 --db porting.db
/usr/local/share/dotnet/dotnet run --project tools/NatsNet.PortTracker -- batch list --db porting.db
/usr/local/share/dotnet/dotnet run --project tools/NatsNet.PortTracker -- report summary --db porting.db
```

Expected: Batch 12 pending, dependency 11, 8 features, 0 tests.

**Step 2: Start batch**

Run:

```bash
/usr/local/share/dotnet/dotnet run --project tools/NatsNet.PortTracker -- batch start 12 --db porting.db
```

Expected: batch marked in-progress.

**Step 3: Stage Group 1 IDs to `stub`**

Run:

```bash
/usr/local/share/dotnet/dotnet run --project tools/NatsNet.PortTracker -- \
  feature batch-update --ids "987,988,991,992,993" --set-status stub --db porting.db --execute
```

Expected: only Group 1 IDs set to `stub`.

**Step 4: Commit checkpoint**

```bash
git add porting.db
git commit -m "chore(batch12): start batch and stage group1 recovery ids"
```

### Task 2: Implement Group 1 Recovery Features (5 IDs)

**Files:**

- Modify: `dotnet/src/ZB.MOM.NatsNet.Server/JetStream/FileStore.cs`
- Modify (if needed): `dotnet/src/ZB.MOM.NatsNet.Server/JetStream/MessageBlock.cs`
- Modify (if needed): `dotnet/src/ZB.MOM.NatsNet.Server/JetStream/FileStoreTypes.cs`

**Feature IDs:** `987,988,991,992,993`

**Step 1: Implement logging helpers**

- ID `987` (`Warn`) and ID `988` (`Debug`) with FileStore-context prefixing and no-op behavior when logger/server is unavailable.

**Step 2: Implement full-state recovery**

- ID `991` (`RecoverFullState`): stream state file load, length/checksum validation, decode, stale/corrupt fallback signaling.

**Step 3: Implement TTL and schedule recovery**

- ID `992` (`RecoverTTLState`)
- ID `993` (`RecoverMsgSchedulingState`)
- Include stale-state linear scan fallback over recovered message blocks.

**Step 4: Run mandatory verification protocol for Group 1**

- Per-feature loop for all 5 IDs.
- Stub Detection Check.
- Build Gate.
- Test Gate.

**Step 5: Status updates (chunk <=15)**

- Set Group 1 IDs to `complete` after successful evidence.
- Promote to `verified` only if Test Gate evidence is sufficient for each feature.

### Task 3: Group 1 Checkpoint

**Files:**

- Modify: `porting.db`

**Step 1: Run Checkpoint Protocol**

- Full build + full unit tests.

**Step 2: Commit**

```bash
git add dotnet/src/ZB.MOM.NatsNet.Server/JetStream \
  dotnet/tests/ZB.MOM.NatsNet.Server.Tests \
  porting.db
git commit -m "feat(batch12): complete group1 filestore recovery"
```

### Task 4: Implement Group 2 Recovery Features (3 IDs)

**Files:**

- Modify: `dotnet/src/ZB.MOM.NatsNet.Server/JetStream/FileStore.cs`
- Modify (if needed): `dotnet/src/ZB.MOM.NatsNet.Server/JetStream/MessageBlock.cs`
- Modify (if needed): `dotnet/src/ZB.MOM.NatsNet.Server/JetStream/FileStoreTypes.cs`

**Feature IDs:** `995,996,997`

**Step 1: Implement metadata cleanup**

- ID `995` (`CleanupOldMeta`): remove stale metadata file types in message directory safely.

**Step 2: Implement ordered message block recovery**

- ID `996` (`RecoverMsgs`): enumerate/sort blocks, recover block state, reconcile stream accounting, prune orphan keys.

**Step 3: Implement startup expiration path**

- ID `997` (`ExpireMsgsOnRecover`): max-age pass at startup, per-subject updates, empty-block cleanup, tombstone continuity.

**Step 4: Run mandatory verification protocol for Group 2**

- Per-feature loop for all 3 IDs.
- Stub Detection Check.
- Build Gate.
- Test Gate.

**Step 5: Status updates (chunk <=15)**

- Set Group 2 IDs to `complete`, then `verified` only when test evidence criteria are met.
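
The gate sequence each group runs before status updates (stub scan, build, related tests) can be scripted so that every gate leaves a log in the evidence folder. The following is a minimal sketch under stated assumptions, not part of the plan's tooling: `run_gate`, `stub_gate`, and `group_gates` are hypothetical helper names, only the first stub scan and the first test filter are wired in, and the stub gate relies on `rg` returning exit code 0 when it finds a match and 1 when it finds none.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are hypothetical; paths and commands are copied from the plan.

EVIDENCE_DIR="${EVIDENCE_DIR:-/tmp/batch12-evidence}"   # example folder named in the plan
mkdir -p "$EVIDENCE_DIR"

# run_gate NAME CMD...: run CMD, write the command line, its combined output,
# and its exit code to NAME.log, then return CMD's exit code.
run_gate() {
  local name="$1"; shift
  local log="$EVIDENCE_DIR/$name.log"
  printf '$ %s\n' "$*" > "$log"
  "$@" >> "$log" 2>&1
  local rc=$?
  printf 'exit=%d\n' "$rc" >> "$log"
  return "$rc"
}

# stub_gate NAME SCAN_CMD...: a scan passes only when it finds nothing.
# Exit 1 from rg/grep means "no match"; 0 means a match (hard blocker);
# anything above 1 is a scan error and also fails the gate.
stub_gate() {
  local name="$1"; shift
  run_gate "$name" "$@"
  local rc=$?
  [ "$rc" -eq 1 ]
}

# group_gates: stub scan, then build, then related tests; stop at the first failure.
group_gates() {
  local dotnet=/usr/local/share/dotnet/dotnet
  stub_gate stub-scan rg -n "NotImplementedException|TODO|PLACEHOLDER" \
    dotnet/src/ZB.MOM.NatsNet.Server/JetStream -g '*.cs' || return 1
  run_gate build "$dotnet" build dotnet/ || return 1
  run_gate test-filestore "$dotnet" test dotnet/tests/ZB.MOM.NatsNet.Server.Tests/ \
    --filter "FullyQualifiedName~ZB.MOM.NatsNet.Server.Tests.JetStream.JetStreamFileStoreTests" \
    --verbosity normal || return 1
}

# Run the real gates only when asked, so the helpers can be sourced on their own.
if [ "${1:-}" = "run" ]; then
  group_gates && echo "all gates passed; evidence in $EVIDENCE_DIR"
fi
```

Invoking the script with `run` as its first argument executes the gates from the repository root; without it, only the helpers are defined. Each log starts with the command line and ends with an `exit=` line, which lines up with the "command + summary output" evidence the verification protocol asks for. The remaining scans and test filters from the protocol would be added to `group_gates` in the same way.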
### Task 5: Group 2 Checkpoint and Batch Closure

**Files:**

- Modify: `porting.db`
- Generate: `reports/current.md`

**Step 1: Final gates**

Run:

```bash
/usr/local/share/dotnet/dotnet build dotnet/
/usr/local/share/dotnet/dotnet test dotnet/tests/ZB.MOM.NatsNet.Server.Tests/ --verbosity normal
/usr/local/share/dotnet/dotnet test dotnet/tests/ZB.MOM.NatsNet.Server.IntegrationTests/ --verbosity normal
```

Expected: zero failures in executed suites.

**Step 2: Verify batch status and unblocked work**

Run:

```bash
/usr/local/share/dotnet/dotnet run --project tools/NatsNet.PortTracker -- batch show 12 --db porting.db
/usr/local/share/dotnet/dotnet run --project tools/NatsNet.PortTracker -- dependency ready --db porting.db
/usr/local/share/dotnet/dotnet run --project tools/NatsNet.PortTracker -- report summary --db porting.db
```

**Step 3: Complete batch**

Run:

```bash
/usr/local/share/dotnet/dotnet run --project tools/NatsNet.PortTracker -- batch complete 12 --db porting.db
```

Expected: completion succeeds only if all items meet allowed terminal states.

**Step 4: Generate report + commit**

```bash
./reports/generate-report.sh
git add dotnet/src/ZB.MOM.NatsNet.Server/JetStream \
  dotnet/tests/ZB.MOM.NatsNet.Server.Tests \
  porting.db reports/
git commit -m "feat(batch12): complete filestore recovery"
```

---

Plan complete and saved to `docs/plans/2026-02-27-batch-12-filestore-recovery-plan.md`. Two execution options:

**1. Subagent-Driven (this session)** - I dispatch fresh subagent per task, review between tasks, fast iteration

**2. Parallel Session (separate)** - Open new session with `executeplan`, batch execution with checkpoints

Which approach?