Generated design docs and implementation plans via Codex for:

- Batch 6: Opts package-level functions
- Batch 7: Opts class methods + Reload
- Batch 9: Auth, DirStore, OCSP foundations
- Batch 10: OCSP Cache + JS Events
- Batch 11: FileStore Init
- Batch 12: FileStore Recovery
- Batch 16: Client Core (first half)
- Batch 17: Client Core (second half)

All plans include mandatory verification protocol and anti-stub guardrails. Updated batches.md with file paths and planned status.

# Batch 12 FileStore Recovery Implementation Plan

> **For Codex:** REQUIRED SUB-SKILL: Use `executeplan` to implement this plan task-by-task.

**Goal:** Implement and verify all Batch 12 FileStore Recovery features from `server/filestore.go` with no stub logic and evidence-backed status transitions.

**Architecture:** Execute Batch 12 in two vertical feature groups (5 + 3). Implement recovery logic directly in `JetStream/FileStore.cs`, touching supporting JetStream types only when required. After each group, run strict stub scans, build, and related test gates before any status updates.

**Tech Stack:** .NET 10, C# latest, xUnit 3, Shouldly, NSubstitute, PortTracker CLI, SQLite (`porting.db`)

**Design doc:** `docs/plans/2026-02-27-batch-12-filestore-recovery-design.md`

---

I'm using `writeplan` to create the implementation plan.

## Batch Inputs

- Batch: `12` (`FileStore Recovery`)
- Depends on: Batch `11`
- Features: `8`
- Tests: `0` (batch-owned), with known related reverse dependencies:
  - test `#519` (`FileStoreRecoverFullStateDetectCorruptState_ShouldSucceed`)
  - test `#545` (`FileStoreNoPanicOnRecoverTTLWithCorruptBlocks_ShouldSucceed`)
- Go source scope: `golang/nats-server/server/filestore.go` lines ~1708-2580

Feature groups (max ~20 features each):
- **Group 1 (5):** `987,988,991,992,993`
- **Group 2 (3):** `995,996,997`

---

## MANDATORY VERIFICATION PROTOCOL

> **NON-NEGOTIABLE:** Every task and every status update in this plan must follow this protocol.

### Per-Feature Verification Loop (REQUIRED for every feature ID)

For each feature ID in the active group:

1. Read feature mapping and exact Go intent:
   ```bash
   /usr/local/share/dotnet/dotnet run --project tools/NatsNet.PortTracker -- feature show <FEATURE_ID> --db porting.db
   ```
2. Read corresponding Go method span in `golang/nats-server/server/filestore.go`.
3. Implement minimal real C# behavior (no placeholders).
4. Build immediately:
   ```bash
   /usr/local/share/dotnet/dotnet build dotnet/
   ```
5. Run related tests for the touched behavior (see Test Gate below).
6. Record evidence (command + summary output) before adding the ID to status-update candidates.

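Step 6 asks for the command and its output to be recorded as evidence. One way to do that consistently is a small wrapper, sketched below. The function name `run_logged`, the `EVIDENCE_DIR` variable, and the log layout are assumptions made for illustration; they are not part of this plan or of PortTracker.

```bash
# Hypothetical helper (not part of the plan or PortTracker): run a command and
# store the command line, its combined output, and its exit code as evidence.
EVIDENCE_DIR="${EVIDENCE_DIR:-/tmp/batch12-evidence}"

run_logged() {
  local label="$1"; shift
  local log="$EVIDENCE_DIR/$label.log"
  local status=0
  mkdir -p "$EVIDENCE_DIR"
  # First line of the log is the exact command that was run.
  printf '$ %s\n' "$*" > "$log"
  # Capture stdout and stderr; remember the exit code instead of aborting.
  "$@" >> "$log" 2>&1 || status=$?
  printf 'exit code: %s\n' "$status" >> "$log"
  return "$status"
}
```

For example, `run_logged feature-987-build /usr/local/share/dotnet/dotnet build dotnet/` would leave the build command, its output, and its exit code in `feature-987-build.log` under the evidence folder, and return the build's own exit code so the caller can still stop on failure.
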
### Stub Detection Check (REQUIRED after each feature group)

Run all scans below. Any match is a hard blocker:

```bash
# Production placeholder detection
rg -n "NotImplementedException|TODO|PLACEHOLDER" \
  dotnet/src/ZB.MOM.NatsNet.Server/JetStream -g '*.cs'

# Empty method bodies (FileStore recovery surface)
rg -n "^\s*(public|private|internal|protected).*(Warn|Debug|RecoverFullState|RecoverTTLState|RecoverMsgSchedulingState|CleanupOldMeta|RecoverMsgs|ExpireMsgsOnRecover)\s*\([^)]*\)\s*\{\s*\}$" \
  dotnet/src/ZB.MOM.NatsNet.Server/JetStream/FileStore.cs

# Test placeholders in directly related classes
rg -n "NotImplementedException|Assert\.True\(true\)|Assert\.Pass|// TODO|// PLACEHOLDER" \
  dotnet/tests/ZB.MOM.NatsNet.Server.Tests/JetStream/JetStreamFileStoreTests.cs \
  dotnet/tests/ZB.MOM.NatsNet.Server.Tests/ImplBacklog/JetStreamFileStoreTests.Impltests.cs
```

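Because any match is a blocker, a scripted gate has to invert the scanner's exit code: `rg` and `grep` both exit 0 when they find a match and 1 when they find none. The sketch below shows that inversion with a hypothetical `assert_no_matches` function. It uses `grep -rnE` purely so the example runs without ripgrep installed; the plan's actual scans remain the `rg` commands above.

```bash
# Hypothetical gate helper: succeed only when a forbidden pattern is absent.
# grep stands in for rg here; both exit 0 on a match and 1 on no match.
assert_no_matches() {
  local pattern="$1"; shift
  local hits status=0
  hits="$(grep -rnE -- "$pattern" "$@")" || status=$?
  case "$status" in
    0) printf 'BLOCKER: forbidden pattern found:\n%s\n' "$hits" >&2
       return 1 ;;
    1) return 0 ;;   # no match: the gate passes
    *) printf 'scan error (exit %s)\n' "$status" >&2
       return 2 ;;   # e.g. a missing path must not count as a clean scan
  esac
}
```

Treating scanner errors as failures is a deliberate choice in this sketch: a mistyped path would otherwise look like a clean scan.
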
### Build Gate (REQUIRED after each feature group)

This must pass before status updates and before moving to next group:

```bash
/usr/local/share/dotnet/dotnet build dotnet/
```

### Test Gate (REQUIRED before marking features `verified`)

All related tests must pass. Run at least:

```bash
# Existing JetStream FileStore coverage
/usr/local/share/dotnet/dotnet test dotnet/tests/ZB.MOM.NatsNet.Server.Tests/ \
  --filter "FullyQualifiedName~ZB.MOM.NatsNet.Server.Tests.JetStream.JetStreamFileStoreTests" \
  --verbosity normal

# Backlog coverage for FileStore implementation
/usr/local/share/dotnet/dotnet test dotnet/tests/ZB.MOM.NatsNet.Server.Tests/ \
  --filter "FullyQualifiedName~ZB.MOM.NatsNet.Server.Tests.ImplBacklog.JetStreamFileStoreTests" \
  --verbosity normal

# Feature-linked methods from reverse dependencies
/usr/local/share/dotnet/dotnet test dotnet/tests/ZB.MOM.NatsNet.Server.Tests/ \
  --filter "FullyQualifiedName~FileStoreRecoverFullStateDetectCorruptState|FullyQualifiedName~FileStoreNoPanicOnRecoverTTLWithCorruptBlocks" \
  --verbosity normal
```

Gate rule:
- If related tests run and pass, eligible for `verified`.
- If related tests are unavailable/not yet implemented (0 discovered), feature may be set to `complete` only, with explicit note explaining why `verified` is deferred.

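The gate rule reduces to a decision on two numbers: how many related tests were discovered and how many failed. A minimal sketch of that decision follows. The function name `gate_status` and the `blocked` result are hypothetical; `blocked` is not a PortTracker status, it only signals that no promotion should happen. Extracting the two counts from `dotnet test` output is left out.

```bash
# Hypothetical helper: map Test Gate results to the highest status allowed.
# $1 = number of related tests discovered, $2 = number of failed tests.
gate_status() {
  local discovered="$1" failed="$2"
  if [ "$failed" -gt 0 ]; then
    echo "blocked"     # failing related tests: no status promotion
  elif [ "$discovered" -eq 0 ]; then
    echo "complete"    # nothing ran: defer 'verified' with an explicit note
  else
    echo "verified"    # related tests ran and all passed
  fi
}
```
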
### Status Update Protocol (REQUIRED)

- Use max **15 IDs** per `feature batch-update` call.
- Required status progression: `deferred -> stub -> complete -> verified`.
- Do not mark `verified` without evidence from Build Gate + Test Gate.
- Keep an evidence log folder (example: `/tmp/batch12-evidence/`) with per-group command outputs.

Examples:

```bash
# Move active group to stub before editing
/usr/local/share/dotnet/dotnet run --project tools/NatsNet.PortTracker -- \
  feature batch-update --ids "987,988,991,992,993" --set-status stub --db porting.db --execute

# Move group to complete after successful implementation + build/test evidence
/usr/local/share/dotnet/dotnet run --project tools/NatsNet.PortTracker -- \
  feature batch-update --ids "987,988,991,992,993" --set-status complete --db porting.db --execute
```

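Both Batch 12 groups fit under the 15-ID limit, so no splitting is needed here. For larger groups, the limit could be enforced mechanically with something like the sketch below. `chunk_ids` is a hypothetical helper, not a PortTracker feature: it only prints comma-separated chunks, one per line, for use as `--ids` values.

```bash
# Hypothetical helper: split a comma-separated ID list into chunks of at most
# $2 IDs (default 15), printing one comma-separated chunk per line.
# The body runs in a subshell "( ... )" so the IFS change does not leak out.
chunk_ids() (
  max="${2:-15}"
  chunk=""
  count=0
  IFS=','
  for id in $1; do
    if [ "$count" -eq "$max" ]; then
      echo "$chunk"   # current chunk is full: emit it and start a new one
      chunk=""
      count=0
    fi
    chunk="${chunk:+$chunk,}$id"
    count=$((count + 1))
  done
  if [ -n "$chunk" ]; then echo "$chunk"; fi
)
```

Calling `chunk_ids "987,988,991,992,993"` prints the list back unchanged on a single line, since Group 1 has only five IDs.
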
### Checkpoint Protocol Between Tasks (REQUIRED)

At each group boundary:

1. Full build:
   ```bash
   /usr/local/share/dotnet/dotnet build dotnet/
   ```
2. Full unit test sweep (not just filtered):
   ```bash
   /usr/local/share/dotnet/dotnet test dotnet/tests/ZB.MOM.NatsNet.Server.Tests/ --verbosity normal
   ```
3. Commit checkpoint before next task:
   ```bash
   git add dotnet/src/ZB.MOM.NatsNet.Server/JetStream \
     dotnet/tests/ZB.MOM.NatsNet.Server.Tests \
     porting.db
   git commit -m "feat(batch12): complete group <N> filestore recovery"
   ```

---

## ANTI-STUB GUARDRAILS (NON-NEGOTIABLE)

### Forbidden Patterns

The following are forbidden in Batch 12 feature or related test code:

- `throw new NotImplementedException(...)`
- Empty recovery method bodies (`{ }`)
- `// TODO` or `// PLACEHOLDER` in implemented recovery methods
- Fake test pass patterns (`Assert.True(true)`, `Assert.Pass()`, assertion-only smoke checks that do not exercise production behavior)
- Swallowing corruption/IO errors silently instead of preserving Go intent

### Hard Limits

- Max ~20 features per implementation group (fixed here as 5 and 3)
- Max 15 feature IDs per status-update command
- One feature group per verification/update cycle
- Zero stub-scan matches before `complete` or `verified` transitions
- No `verified` transition without explicit Build Gate + Test Gate evidence

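The transition limits above can be paired with the required progression `deferred -> stub -> complete -> verified` in a small pre-flight check. The sketch below is one possible reading, not something the plan or PortTracker defines. It assumes that only single forward steps are allowed and that a blocked feature may fall back to `deferred`. The function name `next_status_allowed` is hypothetical.

```bash
# Hypothetical check, assuming only single forward steps along
# deferred -> stub -> complete -> verified are allowed, plus a fallback to
# deferred for blocked features. $1 = current status, $2 = requested status.
next_status_allowed() {
  case "$1:$2" in
    deferred:stub|stub:complete|complete:verified) return 0 ;;
    stub:deferred|complete:deferred) return 0 ;;   # blocked feature
    *) return 1 ;;                                 # skipped or backward step
  esac
}
```

The check says nothing about evidence. Zero stub-scan matches and Build Gate + Test Gate results still have to be confirmed separately before `complete` or `verified`.
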
### If You Get Stuck (MANDATORY)

1. Do **not** add a stub, placeholder, or no-op workaround.
2. Mark only blocked feature IDs as `deferred` with a concrete reason.
3. Continue with remaining IDs in the group.
4. Record blocker details in evidence log and PortTracker override reason.

Example:

```bash
/usr/local/share/dotnet/dotnet run --project tools/NatsNet.PortTracker -- \
  feature update <ID> --status deferred --db porting.db \
  --override "blocked: <specific technical reason>"
```

---

### Task 1: Batch Start and Group 1 Staging

**Files:**
- Modify: `porting.db`
- Create: `/tmp/batch12-evidence/` (evidence logs)

**Step 1: Confirm current batch state**

Run:
```bash
/usr/local/share/dotnet/dotnet run --project tools/NatsNet.PortTracker -- batch show 12 --db porting.db
/usr/local/share/dotnet/dotnet run --project tools/NatsNet.PortTracker -- batch list --db porting.db
/usr/local/share/dotnet/dotnet run --project tools/NatsNet.PortTracker -- report summary --db porting.db
```
Expected: Batch 12 pending, dependency 11, 8 features, 0 tests.

**Step 2: Start batch**

Run:
```bash
/usr/local/share/dotnet/dotnet run --project tools/NatsNet.PortTracker -- batch start 12 --db porting.db
```
Expected: batch marked in-progress.

**Step 3: Stage Group 1 IDs to `stub`**

Run:
```bash
/usr/local/share/dotnet/dotnet run --project tools/NatsNet.PortTracker -- \
  feature batch-update --ids "987,988,991,992,993" --set-status stub --db porting.db --execute
```
Expected: only Group 1 IDs set to `stub`.

**Step 4: Commit checkpoint**

```bash
git add porting.db
git commit -m "chore(batch12): start batch and stage group1 recovery ids"
```

### Task 2: Implement Group 1 Recovery Features (5 IDs)

**Files:**
- Modify: `dotnet/src/ZB.MOM.NatsNet.Server/JetStream/FileStore.cs`
- Modify (if needed): `dotnet/src/ZB.MOM.NatsNet.Server/JetStream/MessageBlock.cs`
- Modify (if needed): `dotnet/src/ZB.MOM.NatsNet.Server/JetStream/FileStoreTypes.cs`

**Feature IDs:** `987,988,991,992,993`

**Step 1: Implement logging helpers**

- ID `987` (`Warn`) and ID `988` (`Debug`) with FileStore-context prefixing and no-op behavior when logger/server is unavailable.

**Step 2: Implement full-state recovery**

- ID `991` (`RecoverFullState`): stream state file load, length/checksum validation, decode, stale/corrupt fallback signaling.

**Step 3: Implement TTL and schedule recovery**

- ID `992` (`RecoverTTLState`)
- ID `993` (`RecoverMsgSchedulingState`)
- Include stale-state linear scan fallback over recovered message blocks.

**Step 4: Run mandatory verification protocol for Group 1**

- Per-feature loop for all 5 IDs.
- Stub Detection Check.
- Build Gate.
- Test Gate.

**Step 5: Status updates (chunk <=15)**

- Set Group 1 IDs to `complete` after successful evidence.
- Promote to `verified` only if Test Gate evidence is sufficient for each feature.

### Task 3: Group 1 Checkpoint

**Files:**
- Modify: `porting.db`

**Step 1: Run Checkpoint Protocol**

- Full build + full unit tests.

**Step 2: Commit**

```bash
git add dotnet/src/ZB.MOM.NatsNet.Server/JetStream \
  dotnet/tests/ZB.MOM.NatsNet.Server.Tests \
  porting.db
git commit -m "feat(batch12): complete group1 filestore recovery"
```

### Task 4: Implement Group 2 Recovery Features (3 IDs)

**Files:**
- Modify: `dotnet/src/ZB.MOM.NatsNet.Server/JetStream/FileStore.cs`
- Modify (if needed): `dotnet/src/ZB.MOM.NatsNet.Server/JetStream/MessageBlock.cs`
- Modify (if needed): `dotnet/src/ZB.MOM.NatsNet.Server/JetStream/FileStoreTypes.cs`

**Feature IDs:** `995,996,997`

**Step 1: Implement metadata cleanup**

- ID `995` (`CleanupOldMeta`): remove stale metadata file types in message directory safely.

**Step 2: Implement ordered message block recovery**

- ID `996` (`RecoverMsgs`): enumerate/sort blocks, recover block state, reconcile stream accounting, prune orphan keys.

**Step 3: Implement startup expiration path**

- ID `997` (`ExpireMsgsOnRecover`): max-age pass at startup, per-subject updates, empty-block cleanup, tombstone continuity.

**Step 4: Run mandatory verification protocol for Group 2**

- Per-feature loop for all 3 IDs.
- Stub Detection Check.
- Build Gate.
- Test Gate.

**Step 5: Status updates (chunk <=15)**

- Set Group 2 IDs to `complete`, then `verified` only when test evidence criteria are met.

### Task 5: Group 2 Checkpoint and Batch Closure

**Files:**
- Modify: `porting.db`
- Generate: `reports/current.md`

**Step 1: Final gates**

Run:
```bash
/usr/local/share/dotnet/dotnet build dotnet/
/usr/local/share/dotnet/dotnet test dotnet/tests/ZB.MOM.NatsNet.Server.Tests/ --verbosity normal
/usr/local/share/dotnet/dotnet test dotnet/tests/ZB.MOM.NatsNet.Server.IntegrationTests/ --verbosity normal
```
Expected: zero failures in executed suites.

**Step 2: Verify batch status and unblocked work**

Run:
```bash
/usr/local/share/dotnet/dotnet run --project tools/NatsNet.PortTracker -- batch show 12 --db porting.db
/usr/local/share/dotnet/dotnet run --project tools/NatsNet.PortTracker -- dependency ready --db porting.db
/usr/local/share/dotnet/dotnet run --project tools/NatsNet.PortTracker -- report summary --db porting.db
```

**Step 3: Complete batch**

Run:
```bash
/usr/local/share/dotnet/dotnet run --project tools/NatsNet.PortTracker -- batch complete 12 --db porting.db
```
Expected: completion succeeds only if all items meet allowed terminal states.

**Step 4: Generate report + commit**

```bash
./reports/generate-report.sh
git add dotnet/src/ZB.MOM.NatsNet.Server/JetStream \
  dotnet/tests/ZB.MOM.NatsNet.Server.Tests \
  porting.db reports/
git commit -m "feat(batch12): complete filestore recovery"
```

---

Plan complete and saved to `docs/plans/2026-02-27-batch-12-filestore-recovery-plan.md`. Two execution options:

**1. Subagent-Driven (this session)** - I dispatch fresh subagent per task, review between tasks, fast iteration

**2. Parallel Session (separate)** - Open new session with `executeplan`, batch execution with checkpoints

Which approach?