
Joseph Doherty f8dce79ac0 Add batch plans for batches 31-36 (rounds 16-18)

Generated design docs and implementation plans via Codex for:
- Batch 31: Raft Part 2
- Batch 32: JS Cluster Meta
- Batch 33: JS Cluster Streams
- Batch 34: JS Cluster Consumers
- Batch 35: JS Cluster Remaining
- Batch 36: Stream Lifecycle

All plans include mandatory verification protocol and anti-stub guardrails.
Updated batches.md with file paths and planned status.

2026-02-27 17:01:31 -05:00


Batch 31 Raft Part 2 Design

Date: 2026-02-27
Batch: 31 (Raft Part 2)
Scope: 53 features + 19 unit tests
Dependencies: batch 30 (Raft Part 1)
Go source: golang/nats-server/server/raft.go

Problem

Batch 31 covers the second Raft tranche in raft.go (roughly lines 3239-5038), focused on catchup/snapshot transfer, append-entry processing, WAL consistency, quorum tracking, vote request/response handling, and leadership state transitions. The mapped test set (19 tests) is concentrated on candidate/leader transitions, quorum correctness, membership-change edge cases, and snapshot/catchup behavior.

The goal of this design is an execution-ready plan in which every status change is backed by evidence and no placeholder code is left behind in either production features or tests.

Context Findings

Required command results

batch show 31 --db porting.db
- Status: pending
- Features: 53 (currently deferred)
- Tests: 19 (currently deferred)
- Depends on: 30
- Go file: server/raft.go

batch list --db porting.db
- Batch 31 is directly gated by Batch 30 and itself gates Batch 32 (JS Cluster Meta).

report summary --db porting.db
- Overall progress: 1924/6942 (27.7%)
- Deferred backlog remains large; verification discipline is required.

Feature and source mapping findings

Batch 31 feature IDs map, in order, to three runs of raft.go methods. The numbers in parentheses are feature ID ranges, and not every ID inside a range belongs to Batch 31; the exact IDs are listed under Feature slicing.
- sendSnapshotToFollower through updateLeader (2733-2750)
- processAppendEntry through setWriteErrLocked (2751-2777)
- isClosed through switchToLeader (2778-2796)

Existing .NET Raft surface is in:
- dotnet/src/ZB.MOM.NatsNet.Server/JetStream/RaftTypes.cs

Current comments in RaftTypes.cs still describe algorithm methods as stubbed; Batch 31 must replace those gaps with concrete behavior and tests.

Test mapping findings

All 19 mapped tests are from server/raft_test.go and map to RaftNodeTests methods.
dotnet/tests/ZB.MOM.NatsNet.Server.Tests/ImplBacklog/RaftNodeTests.Impltests.cs does not currently exist, so Batch 31 planning should include creating it.
The mapped tests are behavior-heavy; they cannot be verified using placeholder assertions.

Approaches

Approach A: Monolithic implementation of all 53 features and 19 tests in one pass

Pros: single sweep.
Cons: high regression risk, weak traceability, hard to isolate failures.

Approach B (Recommended): Three feature groups (<=20 each) plus two test waves

Features are implemented in ordered method clusters, each with strict gates before status updates.
Tests are ported in two behavioral waves (state/quorum first, then snapshot/membership edge cases).
Pros: bounded scope, better failure isolation, cleaner status evidence.
Cons: more checkpoint overhead.

Approach C: Test-first across all 19 tests, then fill feature gaps

Pros: quickly exposes missing behavior.
Cons: expensive thrash because many tests depend on broad feature slices.

Decision: Approach B.

Proposed Design

1. Architecture and file strategy

Keep Raft runtime behavior in JetStream/RaftTypes.cs, with optional split into partials if file size hurts reviewability:
- RaftTypes.Catchup.cs
- RaftTypes.AppendProcessing.cs
- RaftTypes.Elections.cs
Keep the test implementation in the dedicated mapped backlog file:
- dotnet/tests/ZB.MOM.NatsNet.Server.Tests/ImplBacklog/RaftNodeTests.Impltests.cs
Reuse existing support types (IpQueue<T>, Channel<T>, lock + Interlocked) and avoid introducing new infrastructure unless it is required for deterministic testability.

2. Feature slicing (max ~20 per group)

Feature Group A (18): catchup/snapshot/commit foundations
2733,2734,2735,2736,2737,2738,2739,2740,2741,2742,2743,2744,2745,2746,2747,2748,2749,2750
Feature Group B (18): append-entry processing and peer/WAL state
2751,2752,2753,2754,2755,2756,2758,2759,2760,2761,2765,2766,2767,2768,2769,2772,2776,2777
Feature Group C (17): vote/RPC/state transitions
2778,2779,2780,2783,2784,2785,2786,2787,2788,2789,2790,2791,2792,2793,2794,2795,2796
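
The grouping above can be checked mechanically before any work starts. What follows is a minimal sketch, assuming the ID lists are copied verbatim from this design; the names GROUPS, MAX_GROUP_SIZE, check_groups, and skipped_ids are hypothetical and are not part of the porting tool.

```python
# Hypothetical sanity check for the Batch 31 feature grouping.
# The ID lists are copied from the design; every name here is illustrative.

MAX_GROUP_SIZE = 20   # cap stated in the design ("<=20 each")
EXPECTED_TOTAL = 53   # feature count reported for Batch 31

GROUPS = {
    "A": list(range(2733, 2751)),
    "B": [2751, 2752, 2753, 2754, 2755, 2756, 2758, 2759, 2760,
          2761, 2765, 2766, 2767, 2768, 2769, 2772, 2776, 2777],
    "C": [2778, 2779, 2780, 2783, 2784, 2785, 2786, 2787, 2788,
          2789, 2790, 2791, 2792, 2793, 2794, 2795, 2796],
}


def check_groups(groups, max_size, expected_total):
    """Return a list of problems; an empty list means the slicing is consistent."""
    problems = []
    seen = set()
    for name, ids in groups.items():
        if len(ids) > max_size:
            problems.append(f"group {name} has {len(ids)} IDs (max {max_size})")
        # IDs already claimed by an earlier group.
        dupes = seen.intersection(ids)
        if dupes:
            problems.append(f"group {name} repeats IDs {sorted(dupes)}")
        seen.update(ids)
    # A mismatch here also catches an ID repeated inside a single group.
    if len(seen) != expected_total:
        problems.append(f"{len(seen)} unique IDs, expected {expected_total}")
    return problems


def skipped_ids(groups):
    """IDs inside the overall numeric span that no group claims."""
    claimed = {i for ids in groups.values() for i in ids}
    return [i for i in range(min(claimed), max(claimed) + 1) if i not in claimed]


if __name__ == "__main__":
    print(check_groups(GROUPS, MAX_GROUP_SIZE, EXPECTED_TOTAL))
    print(skipped_ids(GROUPS))
```

With the lists as written, the groups hold 18, 18, and 17 IDs and the check reports no problems. The second function lists the eleven IDs between 2733 and 2796 that no group claims; this design does not say where those IDs are tracked.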

3. Test slicing

Test Wave T1 (10): state/quorum/election behavior
2626,2629,2635,2636,2663,2664,2667,2687,2690,2692
Test Wave T2 (9): snapshot/catchup/membership-vote edge cases
2650,2651,2693,2694,2702,2704,2705,2712,2714

4. Verification model

Enforce per-feature and per-test loops (red/green + stub scan + build/test gates).
Enforce status-update chunking (<=15 IDs per feature/test batch-update).
Enforce checkpoint protocol after every group/wave before proceeding.
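
The chunking rule can be illustrated with a small helper. This is a sketch only: the 15-ID cap comes from this design, while the function name chunk_ids and the comma-separated output format are assumptions made for illustration, not the porting tool's interface.

```python
# Hypothetical helper: split an ID list into status-update chunks.
# The 15-ID cap comes from the design; the function name is illustrative.

def chunk_ids(ids, max_per_update=15):
    """Yield consecutive slices of at most max_per_update IDs, preserving order."""
    if max_per_update < 1:
        raise ValueError("max_per_update must be at least 1")
    for start in range(0, len(ids), max_per_update):
        yield ids[start:start + max_per_update]


if __name__ == "__main__":
    group_a = list(range(2733, 2751))  # Feature Group A, 18 IDs
    for chunk in chunk_ids(group_a):
        # Print each chunk in the comma-separated form used by the ID lists here.
        print(",".join(str(i) for i in chunk))
```

For Feature Group A this yields one chunk of 15 IDs followed by one chunk of 3, so a group that is larger than the cap always takes at least two status updates.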

5. Stuck-item policy

A blocked item is never left in a pseudo-implemented state.
If an item is blocked, set its status to deferred immediately, with an explicit reason via --override, then continue with the next unblocked ID.
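
The policy amounts to a two-outcome decision with no middle state. The sketch below models it under stated assumptions: the Item type, its fields, and next_status are hypothetical names, and only the status words verified and deferred come from this design.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Item:
    """Hypothetical record for one feature or test ID."""
    item_id: int
    has_evidence: bool = False     # passing build/test output was captured
    blocker: Optional[str] = None  # why the item cannot be completed now


def next_status(item):
    """Return (status, reason) for an item.

    A non-empty blocker always wins and produces "deferred" with that reason.
    Otherwise evidence is required for "verified". An item with neither is
    rejected, so it cannot be promoted as pseudo-implemented.
    """
    if item.blocker:
        return ("deferred", item.blocker)
    if item.has_evidence:
        return ("verified", None)
    raise ValueError(f"item {item.item_id}: neither evidence nor a blocker reason")


if __name__ == "__main__":
    print(next_status(Item(2751, has_evidence=True)))
    print(next_status(Item(2752, blocker="depends on an unported type")))
```

Raising on the third case is a design choice for the sketch: it forces the caller to either gather evidence or write down a reason before any status changes.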

Risks and Mitigations

Risk: An incomplete Batch 30 dependency blocks execution.
Mitigation: A preflight dependency gate is mandatory; no Batch 31 status updates are made until Batch 30 is complete.

Risk: The large processAppendEntry method causes hidden regressions.
Mitigation: Isolate it with focused tests per behavior branch plus class-level gates.

Risk: Fake progress via placeholder methods or tests.
Mitigation: Mandatory anti-stub scans and hard promotion gates.
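
An anti-stub scan can be as simple as a line-by-line pattern match over touched files. The sketch below is illustrative only: this design does not list the actual patterns, so the three in STUB_PATTERNS are assumed examples of common placeholder markers in C# code, and scan_for_stubs and the sample IsClosed body are hypothetical.

```python
import re

# Assumed stub markers; the design does not list the real patterns.
STUB_PATTERNS = [
    re.compile(r"throw\s+new\s+NotImplementedException"),
    re.compile(r"\bTODO\b"),
    re.compile(r"Assert\.True\(\s*true\s*\)"),
]


def scan_for_stubs(source_text):
    """Return (line_number, stripped_line) pairs that match any stub pattern."""
    hits = []
    for number, line in enumerate(source_text.splitlines(), start=1):
        if any(p.search(line) for p in STUB_PATTERNS):
            hits.append((number, line.strip()))
    return hits


if __name__ == "__main__":
    # Hypothetical C# fragment used only to exercise the scan.
    sample = """public bool IsClosed()
{
    throw new NotImplementedException();
}
"""
    for number, line in scan_for_stubs(sample):
        print(f"{number}: {line}")
```

A promotion gate built on this would treat any non-empty result for a touched file as a failure, which keeps the check binary and easy to audit.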

Success Criteria

All 53 features are either verified with evidence or deferred with explicit blocker reason.
All 19 tests are either verified with execution evidence or deferred with explicit blocker reason.
No placeholder/stub patterns in touched production or test code.
Batch-completion readiness is auditable through build/test outputs and chunked status updates.

Non-Goals

Executing implementation in this design doc.
Implementing Batch 32+ scope.
Building new distributed integration infrastructure beyond deterministic unit-level needs.
