natsnet/docs/plans/2026-02-27-batch-12-filestore-recovery-plan.md
Joseph Doherty f0455a1e45 Add batch plans for batches 6-7, 9-12, 16-17 (rounds 4-7)
Generated design docs and implementation plans via Codex for:
- Batch 6: Opts package-level functions
- Batch 7: Opts class methods + Reload
- Batch 9: Auth, DirStore, OCSP foundations
- Batch 10: OCSP Cache + JS Events
- Batch 11: FileStore Init
- Batch 12: FileStore Recovery
- Batch 16: Client Core (first half)
- Batch 17: Client Core (second half)

All plans include mandatory verification protocol and anti-stub guardrails.
Updated batches.md with file paths and planned status.
2026-02-27 14:56:19 -05:00


Batch 12 FileStore Recovery Implementation Plan

For Codex: REQUIRED SUB-SKILL: Use executeplan to implement this plan task-by-task.

Goal: Implement and verify all Batch 12 FileStore Recovery features from server/filestore.go with no stub logic and evidence-backed status transitions.

Architecture: Execute Batch 12 in two vertical feature groups (5 + 3). Implement recovery logic directly in JetStream/FileStore.cs, touching supporting JetStream types only when required. After each group, run strict stub scans, build, and related test gates before any status updates.

Tech Stack: .NET 10, C# latest, xUnit 3, Shouldly, NSubstitute, PortTracker CLI, SQLite (porting.db)

Design doc: docs/plans/2026-02-27-batch-12-filestore-recovery-design.md



Batch Inputs

  • Batch: 12 (FileStore Recovery)
  • Depends on: Batch 11
  • Features: 8
  • Tests: 0 (batch-owned), with known related reverse dependencies:
    • test #519 (FileStoreRecoverFullStateDetectCorruptState_ShouldSucceed)
    • test #545 (FileStoreNoPanicOnRecoverTTLWithCorruptBlocks_ShouldSucceed)
  • Go source scope: golang/nats-server/server/filestore.go lines ~1708-2580

Feature groups (max ~20 features each):

  • Group 1 (5): 987,988,991,992,993
  • Group 2 (3): 995,996,997

MANDATORY VERIFICATION PROTOCOL

NON-NEGOTIABLE: Every task and every status update in this plan must follow this protocol.

Per-Feature Verification Loop (REQUIRED for every feature ID)

For each feature ID in the active group:

  1. Read feature mapping and exact Go intent:
/usr/local/share/dotnet/dotnet run --project tools/NatsNet.PortTracker -- feature show <FEATURE_ID> --db porting.db
  2. Read corresponding Go method span in golang/nats-server/server/filestore.go.
  3. Implement minimal real C# behavior (no placeholders).
  4. Build immediately:
/usr/local/share/dotnet/dotnet build dotnet/
  5. Run related tests for the touched behavior (see Test Gate below).
  6. Record evidence (command + summary output) before adding the ID to status-update candidates.
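
The evidence step can be sketched as a small wrapper that captures a command line together with its output. This is a minimal sketch, not part of PortTracker or the repository: `record_evidence` and the `EVIDENCE_DIR` variable are hypothetical names, and the default directory is simply the example folder this plan suggests for evidence logs.

```shell
# Hypothetical helper (an assumption, not a repo or PortTracker tool).
# Default location follows the example evidence folder named in this plan.
EVIDENCE_DIR="${EVIDENCE_DIR:-/tmp/batch12-evidence}"

# record_evidence <label> <command> [args...]
# Runs the command, writes the command line, its combined stdout/stderr,
# and its exit status to $EVIDENCE_DIR/<label>.log, then returns the
# command's own exit status so a failing build or test still fails the caller.
record_evidence() {
  local label="$1"
  shift
  local log="$EVIDENCE_DIR/$label.log"
  local status=0
  mkdir -p "$EVIDENCE_DIR"
  printf '$ %s\n' "$*" > "$log"
  "$@" >> "$log" 2>&1 || status=$?
  printf 'exit status: %d\n' "$status" >> "$log"
  return "$status"
}
```

For example, `record_evidence group1-build /usr/local/share/dotnet/dotnet build dotnet/` would leave the build output in `group1-build.log` and still report a failed build through its exit status.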

Stub Detection Check (REQUIRED after each feature group)

Run all scans below. Any match is a hard blocker:

# Production placeholder detection
rg -n "NotImplementedException|TODO|PLACEHOLDER" \
  dotnet/src/ZB.MOM.NatsNet.Server/JetStream -g '*.cs'

# Empty method bodies (FileStore recovery surface); -U (multiline) also catches braces placed on separate lines
rg -n -U "^\s*(public|private|internal|protected).*(Warn|Debug|RecoverFullState|RecoverTTLState|RecoverMsgSchedulingState|CleanupOldMeta|RecoverMsgs|ExpireMsgsOnRecover)\s*\([^)]*\)\s*\{\s*\}$" \
  dotnet/src/ZB.MOM.NatsNet.Server/JetStream/FileStore.cs

# Test placeholders in directly related classes
rg -n "NotImplementedException|Assert\.True\(true\)|Assert\.Pass|// TODO|// PLACEHOLDER" \
  dotnet/tests/ZB.MOM.NatsNet.Server.Tests/JetStream/JetStreamFileStoreTests.cs \
  dotnet/tests/ZB.MOM.NatsNet.Server.Tests/ImplBacklog/JetStreamFileStoreTests.Impltests.cs
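
Because `rg` exits with 0 when it finds a match, 1 when it finds none, and 2 on an error, the "any match is a hard blocker" rule means the usual success/failure sense has to be inverted. The wrapper below is a hedged sketch of that inversion; `scan_clean` is a hypothetical name, not an existing tool in this repository.

```shell
# scan_clean <scanner> [args...]
# Hypothetical wrapper (an assumption, not a repo tool) for scanners that
# follow the rg/grep exit convention: 0 = match, 1 = no match, 2+ = error.
# Returns 0 when the scan is clean, 1 when any match was reported
# (hard blocker), and 2 when the scanner itself failed.
scan_clean() {
  local status=0
  "$@" || status=$?
  case "$status" in
    0) echo "BLOCKER: scan reported matches: $*" >&2; return 1 ;;
    1) return 0 ;;
    *) echo "ERROR: scan failed with status $status: $*" >&2; return 2 ;;
  esac
}
```

Each scan above could then be run as `scan_clean rg -n "..." <paths>`. Treating scanner errors separately from a clean result means a mistyped or missing path is not silently counted as zero matches.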

Build Gate (REQUIRED after each feature group)

This must pass before status updates and before moving to next group:

/usr/local/share/dotnet/dotnet build dotnet/

Test Gate (REQUIRED before marking features verified)

All related tests must pass. Run at least:

# Existing JetStream FileStore coverage
/usr/local/share/dotnet/dotnet test dotnet/tests/ZB.MOM.NatsNet.Server.Tests/ \
  --filter "FullyQualifiedName~ZB.MOM.NatsNet.Server.Tests.JetStream.JetStreamFileStoreTests" \
  --verbosity normal

# Backlog coverage for FileStore implementation
/usr/local/share/dotnet/dotnet test dotnet/tests/ZB.MOM.NatsNet.Server.Tests/ \
  --filter "FullyQualifiedName~ZB.MOM.NatsNet.Server.Tests.ImplBacklog.JetStreamFileStoreTests" \
  --verbosity normal

# Feature-linked methods from reverse dependencies
/usr/local/share/dotnet/dotnet test dotnet/tests/ZB.MOM.NatsNet.Server.Tests/ \
  --filter "FullyQualifiedName~FileStoreRecoverFullStateDetectCorruptState|FullyQualifiedName~FileStoreNoPanicOnRecoverTTLWithCorruptBlocks" \
  --verbosity normal

Gate rule:

  • If related tests run and pass, eligible for verified.
  • If related tests are unavailable or not yet implemented (0 discovered), the feature may be set to complete only, with an explicit note explaining why verified is deferred.
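
The gate rule can be written down as a small decision function. This is only a sketch: `gate_status` is a hypothetical name, the two counts are assumed to be read by hand from the test run summary, and `blocked` is a signal to stop, not a PortTracker status.

```shell
# gate_status <discovered> <failed>
# Hypothetical helper (an assumption, not a PortTracker command).
# Prints the highest status a feature is eligible for under the gate rule:
#   0 tests discovered        -> complete (verified deferred, note required)
#   tests ran, none failed    -> verified
#   tests ran, some failed    -> blocked (stop signal, not a tracker status)
gate_status() {
  local discovered="$1"
  local failed="$2"
  if [ "$discovered" -eq 0 ]; then
    echo "complete"
  elif [ "$failed" -eq 0 ]; then
    echo "verified"
  else
    echo "blocked"
  fi
}
```

For instance, `gate_status 2 0` prints `verified`, while `gate_status 0 0` prints `complete`, which matches the case where the filtered run discovers no tests.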

Status Update Protocol (REQUIRED)

  • Use max 15 IDs per feature batch-update call.
  • Required status progression: deferred -> stub -> complete -> verified.
  • Do not mark verified without evidence from Build Gate + Test Gate.
  • Keep an evidence log folder (example: /tmp/batch12-evidence/) with per-group command outputs.

Examples:

# Move active group to stub before editing
/usr/local/share/dotnet/dotnet run --project tools/NatsNet.PortTracker -- \
  feature batch-update --ids "987,988,991,992,993" --set-status stub --db porting.db --execute

# Move group to complete after successful implementation + build/test evidence
/usr/local/share/dotnet/dotnet run --project tools/NatsNet.PortTracker -- \
  feature batch-update --ids "987,988,991,992,993" --set-status complete --db porting.db --execute
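
Both groups in this batch fit within the 15-ID limit, but the limit can be enforced mechanically for longer lists. The sketch below splits a comma-separated ID list into chunks; `chunk_ids` is a hypothetical helper name, and only the `batch-update` invocation in the trailing comment comes from this plan.

```shell
# chunk_ids <comma-separated-ids> [max-per-chunk]
# Hypothetical helper (an assumption, not a PortTracker command).
# Prints one comma-separated chunk per line, each holding at most
# max-per-chunk IDs (default 15, the per-call limit set by this plan).
chunk_ids() {
  printf '%s\n' "$1" | tr ',' '\n' | xargs -n "${2:-15}" | tr ' ' ','
}

# Possible use with the batch-update command shown above:
#   chunk_ids "987,988,991,992,993" | while read -r ids; do
#     /usr/local/share/dotnet/dotnet run --project tools/NatsNet.PortTracker -- \
#       feature batch-update --ids "$ids" --set-status stub --db porting.db --execute
#   done
```

With the Group 1 list the function emits a single line, so the loop makes exactly one call.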

Checkpoint Protocol Between Tasks (REQUIRED)

At each group boundary:

  1. Full build:
/usr/local/share/dotnet/dotnet build dotnet/
  2. Full unit test sweep (not just filtered):
/usr/local/share/dotnet/dotnet test dotnet/tests/ZB.MOM.NatsNet.Server.Tests/ --verbosity normal
  3. Commit checkpoint before next task:
git add dotnet/src/ZB.MOM.NatsNet.Server/JetStream \
  dotnet/tests/ZB.MOM.NatsNet.Server.Tests \
  porting.db
git commit -m "feat(batch12): complete group <N> filestore recovery"

ANTI-STUB GUARDRAILS (NON-NEGOTIABLE)

Forbidden Patterns

The following are forbidden in Batch 12 feature or related test code:

  • throw new NotImplementedException(...)
  • Empty recovery method bodies ({ })
  • // TODO or // PLACEHOLDER in implemented recovery methods
  • Fake test pass patterns (Assert.True(true), Assert.Pass(), assertion-only smoke checks that do not exercise production behavior)
  • Swallowing corruption/IO errors silently instead of preserving Go intent

Hard Limits

  • Max ~20 features per implementation group (fixed here as 5 and 3)
  • Max 15 feature IDs per status-update command
  • One feature group per verification/update cycle
  • Zero stub-scan matches before complete or verified transitions
  • No verified transition without explicit Build Gate + Test Gate evidence

If You Get Stuck (MANDATORY)

  1. Do not add a stub, placeholder, or no-op workaround.
  2. Mark only blocked feature IDs as deferred with a concrete reason.
  3. Continue with remaining IDs in the group.
  4. Record blocker details in evidence log and PortTracker override reason.

Example:

/usr/local/share/dotnet/dotnet run --project tools/NatsNet.PortTracker -- \
  feature update <ID> --status deferred --db porting.db \
  --override "blocked: <specific technical reason>"

Task 1: Batch Start and Group 1 Staging

Files:

  • Modify: porting.db
  • Create: /tmp/batch12-evidence/ (evidence logs)

Step 1: Confirm current batch state

Run:

/usr/local/share/dotnet/dotnet run --project tools/NatsNet.PortTracker -- batch show 12 --db porting.db
/usr/local/share/dotnet/dotnet run --project tools/NatsNet.PortTracker -- batch list --db porting.db
/usr/local/share/dotnet/dotnet run --project tools/NatsNet.PortTracker -- report summary --db porting.db

Expected: Batch 12 is pending, depends on Batch 11, and lists 8 features and 0 tests.

Step 2: Start batch

Run:

/usr/local/share/dotnet/dotnet run --project tools/NatsNet.PortTracker -- batch start 12 --db porting.db

Expected: batch marked in-progress.

Step 3: Stage Group 1 IDs to stub

Run:

/usr/local/share/dotnet/dotnet run --project tools/NatsNet.PortTracker -- \
  feature batch-update --ids "987,988,991,992,993" --set-status stub --db porting.db --execute

Expected: only Group 1 IDs set to stub.

Step 4: Commit checkpoint

git add porting.db
git commit -m "chore(batch12): start batch and stage group1 recovery ids"

Task 2: Implement Group 1 Recovery Features (5 IDs)

Files:

  • Modify: dotnet/src/ZB.MOM.NatsNet.Server/JetStream/FileStore.cs
  • Modify (if needed): dotnet/src/ZB.MOM.NatsNet.Server/JetStream/MessageBlock.cs
  • Modify (if needed): dotnet/src/ZB.MOM.NatsNet.Server/JetStream/FileStoreTypes.cs

Feature IDs: 987,988,991,992,993

Step 1: Implement logging helpers

  • ID 987 (Warn) and ID 988 (Debug) with FileStore-context prefixing and no-op behavior when logger/server is unavailable.

Step 2: Implement full-state recovery

  • ID 991 (RecoverFullState): stream state file load, length/checksum validation, decode, stale/corrupt fallback signaling.

Step 3: Implement TTL and schedule recovery

  • ID 992 (RecoverTTLState)
  • ID 993 (RecoverMsgSchedulingState)
  • Include stale-state linear scan fallback over recovered message blocks.

Step 4: Run mandatory verification protocol for Group 1

  • Per-feature loop for all 5 IDs.
  • Stub Detection Check.
  • Build Gate.
  • Test Gate.

Step 5: Status updates (chunk <=15)

  • Set Group 1 IDs to complete after successful evidence.
  • Promote to verified only if Test Gate evidence is sufficient for each feature.

Task 3: Group 1 Checkpoint

Files:

  • Modify: porting.db

Step 1: Run Checkpoint Protocol

  • Full build + full unit tests.

Step 2: Commit

git add dotnet/src/ZB.MOM.NatsNet.Server/JetStream \
  dotnet/tests/ZB.MOM.NatsNet.Server.Tests \
  porting.db
git commit -m "feat(batch12): complete group1 filestore recovery"

Task 4: Implement Group 2 Recovery Features (3 IDs)

Files:

  • Modify: dotnet/src/ZB.MOM.NatsNet.Server/JetStream/FileStore.cs
  • Modify (if needed): dotnet/src/ZB.MOM.NatsNet.Server/JetStream/MessageBlock.cs
  • Modify (if needed): dotnet/src/ZB.MOM.NatsNet.Server/JetStream/FileStoreTypes.cs

Feature IDs: 995,996,997

Step 1: Implement metadata cleanup

  • ID 995 (CleanupOldMeta): remove stale metadata file types in message directory safely.

Step 2: Implement ordered message block recovery

  • ID 996 (RecoverMsgs): enumerate/sort blocks, recover block state, reconcile stream accounting, prune orphan keys.

Step 3: Implement startup expiration path

  • ID 997 (ExpireMsgsOnRecover): max-age pass at startup, per-subject updates, empty-block cleanup, tombstone continuity.

Step 4: Run mandatory verification protocol for Group 2

  • Per-feature loop for all 3 IDs.
  • Stub Detection Check.
  • Build Gate.
  • Test Gate.

Step 5: Status updates (chunk <=15)

  • Set Group 2 IDs to complete, then verified only when test evidence criteria are met.

Task 5: Group 2 Checkpoint and Batch Closure

Files:

  • Modify: porting.db
  • Generate: reports/current.md

Step 1: Final gates

Run:

/usr/local/share/dotnet/dotnet build dotnet/
/usr/local/share/dotnet/dotnet test dotnet/tests/ZB.MOM.NatsNet.Server.Tests/ --verbosity normal
/usr/local/share/dotnet/dotnet test dotnet/tests/ZB.MOM.NatsNet.Server.IntegrationTests/ --verbosity normal

Expected: zero failures in executed suites.

Step 2: Verify batch status and unblocked work

Run:

/usr/local/share/dotnet/dotnet run --project tools/NatsNet.PortTracker -- batch show 12 --db porting.db
/usr/local/share/dotnet/dotnet run --project tools/NatsNet.PortTracker -- dependency ready --db porting.db
/usr/local/share/dotnet/dotnet run --project tools/NatsNet.PortTracker -- report summary --db porting.db

Step 3: Complete batch

Run:

/usr/local/share/dotnet/dotnet run --project tools/NatsNet.PortTracker -- batch complete 12 --db porting.db

Expected: completion succeeds only if all items meet allowed terminal states.

Step 4: Generate report + commit

./reports/generate-report.sh
git add dotnet/src/ZB.MOM.NatsNet.Server/JetStream \
  dotnet/tests/ZB.MOM.NatsNet.Server.Tests \
  porting.db reports/
git commit -m "feat(batch12): complete filestore recovery"

Plan complete and saved to docs/plans/2026-02-27-batch-12-filestore-recovery-plan.md. Two execution options:

1. Subagent-Driven (this session) - I dispatch a fresh subagent per task and review between tasks, for fast iteration

2. Parallel Session (separate) - Open a new session with executeplan for batch execution with checkpoints

Which approach?