Joseph Doherty c05d93618e Add batch plans for batches 23-30 (rounds 12-15)

Generated design docs and implementation plans via Codex for:
- Batch 23: Routes
- Batch 24: Leaf Nodes
- Batch 25: Gateways
- Batch 26: WebSocket
- Batch 27: JetStream Core
- Batch 28: JetStream API
- Batch 29: JetStream Batching
- Batch 30: Raft Part 1

All plans include mandatory verification protocol and anti-stub guardrails.
Updated batches.md with file paths and planned status.

2026-02-27 16:33:10 -05:00

Batch 29 JetStream Batching Design

Date: 2026-02-27
Batch: 29 (JetStream Batching)
Scope: 12 features + 3 unit tests
Dependencies: batch 27 (JetStream Core)
Go source: golang/nats-server/server/jetstream_batching.go (+ mapped tests in server/raft_test.go)

Problem

Batch 29 ports JetStream atomic batch internals: batch lifecycle/store setup, staged consistency bookkeeping, apply-path rejection/cleanup, and pre-proposal header validation (checkMsgHeadersPreClusteredProposal). This batch also includes 3 Raft-node behavior tests that depend on batch cleanup correctness.

Context Findings

Required command outputs (captured)

batch show 29 --db porting.db
- Batch 29 is pending
- 12 features + 3 tests are all deferred
- Dependency: batch 27
- Go file: server/jetstream_batching.go
batch list --db porting.db
- Batch 29 sits after batch 28 and depends on 27
- Batch 40 (MQTT Server/JSA) also depends on batch 27; keeping Batch 29 high quality reduces later churn in JetStream/Raft behavior
report summary --db porting.db
- Features verified: 1271 / 3673
- Tests verified: 430 / 3257
- Deferred backlog remains dominant, so no-stub discipline is mandatory

Batch 29 mapped IDs

Features:

1508 batching.newBatchGroup
1509 getBatchStoreDir
1510 newBatchStore
1511 batchGroup.readyForCommit
1512 batchGroup.cleanup
1513 batchGroup.cleanupLocked
1514 batchGroup.stopLocked
1515 batchStagedDiff.commit
1516 batchApply.clearBatchStateLocked
1517 batchApply.rejectBatchStateLocked
1518 batchApply.rejectBatchState
1519 checkMsgHeadersPreClusteredProposal (largest surface, ~423 Go LOC)

Tests:

2654 TestNRGMultipleStopsDontPanic -> RaftNodeTests.NRGMultipleStopsDontPanic_ShouldSucceed
2674 TestNRGKeepRunningOnServerShutdown -> RaftNodeTests.NRGKeepRunningOnServerShutdown_ShouldSucceed
2718 TestNRGInitSingleMemRaftNodeDefaults -> RaftNodeTests.NRGInitSingleMemRaftNodeDefaults_ShouldSucceed

Existing .NET baseline

dotnet/src/ZB.MOM.NatsNet.Server/JetStream/JetStreamBatching.cs exists but is partial and contains stub-like behavior (ReadyForCommit comment indicates stub).
No current RaftNodeTests file exists under dotnet/tests/...; mapped test targets are not implemented yet.
JetStream batching test file currently contains deferred placeholders and does not cover Batch 29 mapped Raft tests.

Approaches

Approach A: Single massive `JetStreamBatching.cs` pass in one shot

Pros: fewer commits, direct throughput.
Cons: high defect risk around cross-cutting map/counter/header logic, hard to validate incrementally.

Approach B (Recommended): Two feature waves + one test wave with strict evidence gates

Wave 1: batch/store lifecycle primitives (1508-1514)
Wave 2: staged/apply/header semantics (1515-1519)
Wave 3: mapped Raft tests (2654,2674,2718)
Pros: manageable review units, easier causality between feature changes and tests, strongest anti-stub control.
Cons: more status updates/checkpoints.

Approach C: Signature-first fill (compile now, behavior later)

Pros: quick apparent progress.
Cons: violates anti-stub goals and creates false tracker progress.

Decision: Approach B.

Proposed Design

1. Component boundaries

Keep batching logic centered in JetStreamBatching.cs for mapped methods.
Add narrow helper methods/types only when required to preserve method-level mapping and testability.
Keep heavy validation (checkMsgHeadersPreClusteredProposal) behaviorally aligned with Go checks: pre-check ordering, duplicate/msg-id checks, counter increment path, expected sequence checks, scheduling/rollup constraints, and discard policy checks.

2. Data and concurrency model

Preserve lock expectations from Go comments by using existing Lock/ReaderWriterLockSlim conventions.
Preserve inflight/global counters and cleanup semantics as deterministic state transitions.
Ensure timer cleanup and commit readiness are race-safe and idempotent.

3. Feature grouping strategy (max ~20 features per group)

Group A (7 features): 1508-1514
- Batch group creation, store dir/store construction, commit readiness, cleanup and stop paths.
Group B (5 features): 1515-1519
- Staged diff commit state, batch apply clear/reject, and full header pre-check function.

4. Test strategy

Create/port mapped tests in RaftNodeTests (or equivalent mapped class file) with real Arrange/Act/Assert behavior.
Keep tests deterministic and non-networked where possible; if runtime infrastructure is missing, explicitly defer with reason.
Add focused unit tests for JetStreamBatching helpers as needed to verify feature behavior before promoting feature status.

5. Status and evidence design

Status transitions must be evidence-backed: stub -> complete -> verified.
Apply status updates in chunks of at most 15 IDs to prevent bulk, unverifiable promotion.
Checkpoint between tasks: stub scan + build + targeted tests + tracker updates.
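
The transition and chunking rules can be sketched in Go as follows. This is an illustration only: canTransition and chunkIDs are hypothetical helpers, not part of the porting tracker tool, and the transition table encodes just the stub -> complete -> verified chain stated above.

```go
package main

// nextStatus encodes the only promotions allowed by the design:
// stub -> complete -> verified. Skipping a step is rejected.
var nextStatus = map[string]string{
	"stub":     "complete",
	"complete": "verified",
}

// canTransition reports whether moving from one status to another
// is a single allowed step.
func canTransition(from, to string) bool {
	next, ok := nextStatus[from]
	return ok && next == to
}

// chunkIDs splits ids into consecutive chunks of at most max entries.
// It returns nil when max is not positive.
func chunkIDs(ids []int, max int) [][]int {
	if max <= 0 {
		return nil
	}
	var out [][]int
	for len(ids) > 0 {
		n := max
		if len(ids) < n {
			n = len(ids)
		}
		out = append(out, ids[:n])
		ids = ids[n:]
	}
	return out
}
```

With a cap of 15, the 12 features and 3 tests of Batch 29 fit in a single chunk, but any additional ID forces a second update and therefore a second round of evidence.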

Risks and Mitigations

Risk: 1519 complexity causes partial/placeholder implementation.
- Mitigation: isolate into dedicated task, require feature-level gates and explicit defer-if-blocked path.
Risk: Mapped Raft tests need runtime hooks not yet available.
- Mitigation: mark deferred with exact blocker reason; no fake-pass tests.
Risk: Tracker drift from actual behavior.
- Mitigation: per-ID evidence, max-15 update chunk, and post-task checkpoints.

Success Criteria

All Batch 29 features and tests are either:
- verified with captured verification evidence, or
- deferred with explicit blocker reason.
No new stub patterns introduced in touched production/test files.
Build and relevant test gates are green at each task checkpoint.
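
The first criterion can be expressed as a mechanical check. The Go sketch below assumes a hypothetical trackerEntry record with free-text evidence and blocker fields; the actual porting.db schema is not described in this document.

```go
package main

// trackerEntry is a hypothetical view of one tracked feature or test.
type trackerEntry struct {
	ID       int
	Status   string // "verified", "deferred", or any other state
	Evidence string // reference to captured verification output
	Blocker  string // reason recorded when the item is deferred
}

// unresolved returns the IDs that satisfy neither success condition:
// verified with evidence, or deferred with an explicit blocker reason.
func unresolved(entries []trackerEntry) []int {
	var ids []int
	for _, e := range entries {
		switch {
		case e.Status == "verified" && e.Evidence != "":
			// Accepted: verified and backed by evidence.
		case e.Status == "deferred" && e.Blocker != "":
			// Accepted: deferred with a stated reason.
		default:
			ids = append(ids, e.ID)
		}
	}
	return ids
}
```

An empty result means the batch meets the first criterion. A deferred entry without a reason, or a verified entry without evidence, is reported the same way as an entry still in progress.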

Non-Goals

Executing the implementation; this document covers design only.
Refactoring unrelated JetStream or Raft subsystems beyond mapped Batch 29 behavior.
Broad integration-test harness expansion beyond what Batch 29 requires.
