Generated design docs and implementation plans via Codex for:
- Batch 6: Opts package-level functions
- Batch 7: Opts class methods + Reload
- Batch 9: Auth, DirStore, OCSP foundations
- Batch 10: OCSP Cache + JS Events
- Batch 11: FileStore Init
- Batch 12: FileStore Recovery
- Batch 16: Client Core (first half)
- Batch 17: Client Core (second half)
All plans include mandatory verification protocol and anti-stub guardrails. Updated batches.md with file paths and planned status.
Batch 12 FileStore Recovery Implementation Plan
For Codex: REQUIRED SUB-SKILL: Use executeplan to implement this plan task-by-task.
Goal: Implement and verify all Batch 12 FileStore Recovery features from server/filestore.go with no stub logic and evidence-backed status transitions.
Architecture: Execute Batch 12 in two vertical feature groups (5 + 3). Implement recovery logic directly in JetStream/FileStore.cs, touching supporting JetStream types only when required. After each group, run strict stub scans, build, and related test gates before any status updates.
Tech Stack: .NET 10, C# latest, xUnit 3, Shouldly, NSubstitute, PortTracker CLI, SQLite (porting.db)
Design doc: docs/plans/2026-02-27-batch-12-filestore-recovery-design.md
I'm using writeplan to create the implementation plan.
Batch Inputs
- Batch: 12 (FileStore Recovery)
- Depends on: Batch 11
- Features: 8
- Tests: 0 (batch-owned), with known related reverse dependencies:
  - test #519 (FileStoreRecoverFullStateDetectCorruptState_ShouldSucceed)
  - test #545 (FileStoreNoPanicOnRecoverTTLWithCorruptBlocks_ShouldSucceed)
- Go source scope: golang/nats-server/server/filestore.go lines ~1708-2580
Feature groups (max ~20 features each):
- Group 1 (5): 987, 988, 991, 992, 993
- Group 2 (3): 995, 996, 997
MANDATORY VERIFICATION PROTOCOL
NON-NEGOTIABLE: Every task and every status update in this plan must follow this protocol.
Per-Feature Verification Loop (REQUIRED for every feature ID)
For each feature ID in the active group:
- Read feature mapping and exact Go intent:
/usr/local/share/dotnet/dotnet run --project tools/NatsNet.PortTracker -- feature show <FEATURE_ID> --db porting.db
- Read corresponding Go method span in golang/nats-server/server/filestore.go.
- Implement minimal real C# behavior (no placeholders).
- Build immediately:
/usr/local/share/dotnet/dotnet build dotnet/
- Run related tests for the touched behavior (see Test Gate below).
- Record evidence (command + summary output) before adding the ID to status-update candidates.
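One way to make the "record evidence" step mechanical is a small wrapper that writes the command line, its combined output, and its exit code to a log file. The sketch below is only an illustration: `record_evidence` and the `EVIDENCE_DIR` variable are hypothetical names, not part of PortTracker or this repository, and the default folder is the example path used in this plan.

```shell
# Hypothetical helper (not part of PortTracker): run a command and keep
# its command line, combined output, and exit code as evidence.
# EVIDENCE_DIR is an assumed variable; it defaults to this plan's example folder.
record_evidence() {
  name="$1"
  shift
  dir="${EVIDENCE_DIR:-/tmp/batch12-evidence}"
  mkdir -p "$dir"
  log="$dir/$name.log"
  rc=0
  printf '$ %s\n' "$*" > "$log"          # the exact command that was run
  "$@" >> "$log" 2>&1 || rc=$?           # stdout and stderr go to the log
  printf 'exit: %s\n' "$rc" >> "$log"    # exit code recorded last
  return "$rc"
}

# Example (commented out so the sketch has no side effects when sourced):
# record_evidence group1-build /usr/local/share/dotnet/dotnet build dotnet/
```

Because the wrapper returns the wrapped command's exit code, it can sit in front of the build and test commands without hiding failures.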
Stub Detection Check (REQUIRED after each feature group)
Run all scans below. Any match is a hard blocker:
# Production placeholder detection
rg -n "NotImplementedException|TODO|PLACEHOLDER" \
dotnet/src/ZB.MOM.NatsNet.Server/JetStream -g '*.cs'
# Empty method bodies (FileStore recovery surface)
rg -n "^\s*(public|private|internal|protected).*(Warn|Debug|RecoverFullState|RecoverTTLState|RecoverMsgSchedulingState|CleanupOldMeta|RecoverMsgs|ExpireMsgsOnRecover)\s*\([^)]*\)\s*\{\s*\}$" \
dotnet/src/ZB.MOM.NatsNet.Server/JetStream/FileStore.cs
# Test placeholders in directly related classes
rg -n "NotImplementedException|Assert\.True\(true\)|Assert\.Pass|// TODO|// PLACEHOLDER" \
dotnet/tests/ZB.MOM.NatsNet.Server.Tests/JetStream/JetStreamFileStoreTests.cs \
dotnet/tests/ZB.MOM.NatsNet.Server.Tests/ImplBacklog/JetStreamFileStoreTests.Impltests.cs
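Since any match is a hard blocker, the scans need their exit codes inverted before they can be chained into a gate: rg (like grep) exits 0 when it finds a match, 1 when it finds none, and a higher code on error. A minimal sketch, assuming a hypothetical wrapper named `stub_gate` that is not part of this repository:

```shell
# Hypothetical wrapper: run a search command and treat any match as a blocker.
# Relies on the rg/grep convention: exit 0 = match, 1 = no match, >1 = error.
stub_gate() {
  label="$1"
  shift
  rc=0
  "$@" || rc=$?
  case "$rc" in
    0) echo "BLOCKED: $label: matches found" >&2; return 1 ;;
    1) echo "ok: $label"; return 0 ;;
    *) echo "ERROR: $label: scan failed (exit $rc)" >&2; return 2 ;;
  esac
}

# Example (commented out; requires rg and the repository checkout):
# stub_gate "production placeholders" \
#   rg -n "NotImplementedException|TODO|PLACEHOLDER" \
#   dotnet/src/ZB.MOM.NatsNet.Server/JetStream -g '*.cs'
```

A scan that itself fails (exit code above 1) is reported separately from a clean result, so a mistyped path is not mistaken for "no stubs found".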
Build Gate (REQUIRED after each feature group)
This must pass before status updates and before moving to next group:
/usr/local/share/dotnet/dotnet build dotnet/
Test Gate (REQUIRED before marking features verified)
All related tests must pass. Run at least:
# Existing JetStream FileStore coverage
/usr/local/share/dotnet/dotnet test dotnet/tests/ZB.MOM.NatsNet.Server.Tests/ \
--filter "FullyQualifiedName~ZB.MOM.NatsNet.Server.Tests.JetStream.JetStreamFileStoreTests" \
--verbosity normal
# Backlog coverage for FileStore implementation
/usr/local/share/dotnet/dotnet test dotnet/tests/ZB.MOM.NatsNet.Server.Tests/ \
--filter "FullyQualifiedName~ZB.MOM.NatsNet.Server.Tests.ImplBacklog.JetStreamFileStoreTests" \
--verbosity normal
# Feature-linked methods from reverse dependencies
/usr/local/share/dotnet/dotnet test dotnet/tests/ZB.MOM.NatsNet.Server.Tests/ \
--filter "FullyQualifiedName~FileStoreRecoverFullStateDetectCorruptState|FullyQualifiedName~FileStoreNoPanicOnRecoverTTLWithCorruptBlocks" \
--verbosity normal
Gate rule:
- If related tests run and pass, eligible for verified.
- If related tests are unavailable/not yet implemented (0 discovered), feature may be set to complete only, with explicit note explaining why verified is deferred.
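The gate rule can be read as a small decision function over two numbers: how many related tests were discovered and how many failed. The sketch below is illustrative only. `gate_status` is a hypothetical name, and the "blocked" outcome for failing tests is an assumption drawn from the requirement that all related tests must pass; the plan does not name that state.

```shell
# Hypothetical helper: map test counts to the highest status the gate rule allows.
# $1 = number of related tests discovered, $2 = number of those that failed.
gate_status() {
  discovered="$1"
  failed="$2"
  if [ "$failed" -gt 0 ]; then
    echo "blocked"     # assumption: a failing related test allows no promotion
  elif [ "$discovered" -eq 0 ]; then
    echo "complete"    # verified is deferred; note the reason in the evidence log
  else
    echo "verified"    # related tests ran and all passed
  fi
}
```

The counts would have to be taken from the test run summary by hand or by a separate parser; this sketch does not attempt to read dotnet test output.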
Status Update Protocol (REQUIRED)
- Use max 15 IDs per feature batch-update call.
- Required status progression: deferred -> stub -> complete -> verified.
- Do not mark verified without evidence from Build Gate + Test Gate.
- Keep an evidence log folder (example: /tmp/batch12-evidence/) with per-group command outputs.
Examples:
# Move active group to stub before editing
/usr/local/share/dotnet/dotnet run --project tools/NatsNet.PortTracker -- \
feature batch-update --ids "987,988,991,992,993" --set-status stub --db porting.db --execute
# Move group to complete after successful implementation + build/test evidence
/usr/local/share/dotnet/dotnet run --project tools/NatsNet.PortTracker -- \
feature batch-update --ids "987,988,991,992,993" --set-status complete --db porting.db --execute
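Both groups in this batch fit within the 15-ID limit, but the limit is easier to honor if the split is done by a helper rather than by hand. A minimal sketch, assuming a hypothetical function `chunk_ids` (not a PortTracker command) that prints one comma-separated chunk per line:

```shell
# Hypothetical helper: split a comma-separated ID list into chunks of at most
# $2 IDs (default 15) and print one chunk per line.
chunk_ids() {
  ids="$1"
  size="${2:-15}"
  count=0
  line=""
  old_ifs="$IFS"
  IFS=','
  for id in $ids; do
    if [ -z "$line" ]; then line="$id"; else line="$line,$id"; fi
    count=$((count + 1))
    if [ "$count" -eq "$size" ]; then
      printf '%s\n' "$line"     # emit a full chunk
      line=""
      count=0
    fi
  done
  IFS="$old_ifs"
  if [ -n "$line" ]; then printf '%s\n' "$line"; fi   # emit the remainder
}

# Example (commented out; requires the repository and porting.db):
# chunk_ids "987,988,991,992,993" | while read -r chunk; do
#   /usr/local/share/dotnet/dotnet run --project tools/NatsNet.PortTracker -- \
#     feature batch-update --ids "$chunk" --set-status complete --db porting.db --execute
# done
```

With the default size, the five Group 1 IDs come back as a single line, so the loop issues exactly one batch-update call.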
Checkpoint Protocol Between Tasks (REQUIRED)
At each group boundary:
- Full build:
/usr/local/share/dotnet/dotnet build dotnet/
- Full unit test sweep (not just filtered):
/usr/local/share/dotnet/dotnet test dotnet/tests/ZB.MOM.NatsNet.Server.Tests/ --verbosity normal
- Commit checkpoint before next task:
git add dotnet/src/ZB.MOM.NatsNet.Server/JetStream \
dotnet/tests/ZB.MOM.NatsNet.Server.Tests \
porting.db
git commit -m "feat(batch12): complete group <N> filestore recovery"
ANTI-STUB GUARDRAILS (NON-NEGOTIABLE)
Forbidden Patterns
The following are forbidden in Batch 12 feature or related test code:
- throw new NotImplementedException(...)
- Empty recovery method bodies ({ })
- // TODO or // PLACEHOLDER in implemented recovery methods
- Fake test pass patterns (Assert.True(true), Assert.Pass(), assertion-only smoke checks that do not exercise production behavior)
- Swallowing corruption/IO errors silently instead of preserving Go intent
Hard Limits
- Max ~20 features per implementation group (fixed here as 5 and 3)
- Max 15 feature IDs per status-update command
- One feature group per verification/update cycle
- Zero stub-scan matches before complete or verified transitions
- No verified transition without explicit Build Gate + Test Gate evidence
If You Get Stuck (MANDATORY)
- Do not add a stub, placeholder, or no-op workaround.
- Mark only blocked feature IDs as deferred with a concrete reason.
- Continue with remaining IDs in the group.
- Record blocker details in evidence log and PortTracker override reason.
Example:
/usr/local/share/dotnet/dotnet run --project tools/NatsNet.PortTracker -- \
feature update <ID> --status deferred --db porting.db \
--override "blocked: <specific technical reason>"
Task 1: Batch Start and Group 1 Staging
Files:
- Modify: porting.db
- Create: /tmp/batch12-evidence/ (evidence logs)
Step 1: Confirm current batch state
Run:
/usr/local/share/dotnet/dotnet run --project tools/NatsNet.PortTracker -- batch show 12 --db porting.db
/usr/local/share/dotnet/dotnet run --project tools/NatsNet.PortTracker -- batch list --db porting.db
/usr/local/share/dotnet/dotnet run --project tools/NatsNet.PortTracker -- report summary --db porting.db
Expected: Batch 12 pending, dependency 11, 8 features, 0 tests.
Step 2: Start batch
Run:
/usr/local/share/dotnet/dotnet run --project tools/NatsNet.PortTracker -- batch start 12 --db porting.db
Expected: batch marked in-progress.
Step 3: Stage Group 1 IDs to stub
Run:
/usr/local/share/dotnet/dotnet run --project tools/NatsNet.PortTracker -- \
feature batch-update --ids "987,988,991,992,993" --set-status stub --db porting.db --execute
Expected: only Group 1 IDs set to stub.
Step 4: Commit checkpoint
git add porting.db
git commit -m "chore(batch12): start batch and stage group1 recovery ids"
Task 2: Implement Group 1 Recovery Features (5 IDs)
Files:
- Modify: dotnet/src/ZB.MOM.NatsNet.Server/JetStream/FileStore.cs
- Modify (if needed): dotnet/src/ZB.MOM.NatsNet.Server/JetStream/MessageBlock.cs
- Modify (if needed): dotnet/src/ZB.MOM.NatsNet.Server/JetStream/FileStoreTypes.cs
Feature IDs: 987,988,991,992,993
Step 1: Implement logging helpers
- ID 987 (Warn) and ID 988 (Debug) with FileStore-context prefixing and no-op behavior when logger/server is unavailable.
Step 2: Implement full-state recovery
- ID 991 (RecoverFullState): stream state file load, length/checksum validation, decode, stale/corrupt fallback signaling.
Step 3: Implement TTL and schedule recovery
- ID 992 (RecoverTTLState)
- ID 993 (RecoverMsgSchedulingState)
- Include stale-state linear scan fallback over recovered message blocks.
Step 4: Run mandatory verification protocol for Group 1
- Per-feature loop for all 5 IDs.
- Stub Detection Check.
- Build Gate.
- Test Gate.
Step 5: Status updates (chunk <=15)
- Set Group 1 IDs to complete after successful evidence.
- Promote to verified only if Test Gate evidence is sufficient for each feature.
Task 3: Group 1 Checkpoint
Files:
- Modify: porting.db
Step 1: Run Checkpoint Protocol
- Full build + full unit tests.
Step 2: Commit
git add dotnet/src/ZB.MOM.NatsNet.Server/JetStream \
dotnet/tests/ZB.MOM.NatsNet.Server.Tests \
porting.db
git commit -m "feat(batch12): complete group1 filestore recovery"
Task 4: Implement Group 2 Recovery Features (3 IDs)
Files:
- Modify: dotnet/src/ZB.MOM.NatsNet.Server/JetStream/FileStore.cs
- Modify (if needed): dotnet/src/ZB.MOM.NatsNet.Server/JetStream/MessageBlock.cs
- Modify (if needed): dotnet/src/ZB.MOM.NatsNet.Server/JetStream/FileStoreTypes.cs
Feature IDs: 995,996,997
Step 1: Implement metadata cleanup
- ID 995 (CleanupOldMeta): remove stale metadata file types in message directory safely.
Step 2: Implement ordered message block recovery
- ID 996 (RecoverMsgs): enumerate/sort blocks, recover block state, reconcile stream accounting, prune orphan keys.
Step 3: Implement startup expiration path
- ID 997 (ExpireMsgsOnRecover): max-age pass at startup, per-subject updates, empty-block cleanup, tombstone continuity.
Step 4: Run mandatory verification protocol for Group 2
- Per-feature loop for all 3 IDs.
- Stub Detection Check.
- Build Gate.
- Test Gate.
Step 5: Status updates (chunk <=15)
- Set Group 2 IDs to complete, then verified only when test evidence criteria are met.
Task 5: Group 2 Checkpoint and Batch Closure
Files:
- Modify: porting.db
- Generate: reports/current.md
Step 1: Final gates
Run:
/usr/local/share/dotnet/dotnet build dotnet/
/usr/local/share/dotnet/dotnet test dotnet/tests/ZB.MOM.NatsNet.Server.Tests/ --verbosity normal
/usr/local/share/dotnet/dotnet test dotnet/tests/ZB.MOM.NatsNet.Server.IntegrationTests/ --verbosity normal
Expected: zero failures in executed suites.
Step 2: Verify batch status and unblocked work
Run:
/usr/local/share/dotnet/dotnet run --project tools/NatsNet.PortTracker -- batch show 12 --db porting.db
/usr/local/share/dotnet/dotnet run --project tools/NatsNet.PortTracker -- dependency ready --db porting.db
/usr/local/share/dotnet/dotnet run --project tools/NatsNet.PortTracker -- report summary --db porting.db
Step 3: Complete batch
Run:
/usr/local/share/dotnet/dotnet run --project tools/NatsNet.PortTracker -- batch complete 12 --db porting.db
Expected: completion succeeds only if all items meet allowed terminal states.
Step 4: Generate report + commit
./reports/generate-report.sh
git add dotnet/src/ZB.MOM.NatsNet.Server/JetStream \
dotnet/tests/ZB.MOM.NatsNet.Server.Tests \
porting.db reports/
git commit -m "feat(batch12): complete filestore recovery"
Plan complete and saved to docs/plans/2026-02-27-batch-12-filestore-recovery-plan.md. Two execution options:
1. Subagent-Driven (this session) - I dispatch fresh subagent per task, review between tasks, fast iteration
2. Parallel Session (separate) - Open new session with executeplan, batch execution with checkpoints
Which approach?