# Batch 29 JetStream Batching Design

**Date:** 2026-02-27
**Batch:** 29 (`JetStream Batching`)
**Scope:** 12 features + 3 unit tests
**Dependencies:** batch `27` (`JetStream Core`)
**Go source:** `golang/nats-server/server/jetstream_batching.go` (+ mapped tests in `server/raft_test.go`)

## Problem

Batch 29 ports JetStream atomic batch internals: batch lifecycle/store setup, staged consistency bookkeeping, apply-path rejection/cleanup, and pre-proposal header validation (`checkMsgHeadersPreClusteredProposal`). This batch also includes 3 Raft-node behavior tests that depend on batch cleanup correctness.

## Context Findings

### Required command outputs (captured)

- `batch show 29 --db porting.db`
  - Batch 29 is `pending`
  - 12 features + 3 tests are all `deferred`
  - Dependency: batch 27
  - Go file: `server/jetstream_batching.go`
- `batch list --db porting.db`
  - Batch 29 sits after batch 28 and depends on 27
  - Batch 40 (`MQTT Server/JSA`) also depends on 27; keeping batch 29 high quality prevents later churn in JetStream/Raft behavior
- `report summary --db porting.db`
  - Features verified: 1271 / 3673
  - Tests verified: 430 / 3257
  - Deferred backlog remains dominant, so no-stub discipline is mandatory

### Batch 29 mapped IDs

Features:

- `1508` `batching.newBatchGroup`
- `1509` `getBatchStoreDir`
- `1510` `newBatchStore`
- `1511` `batchGroup.readyForCommit`
- `1512` `batchGroup.cleanup`
- `1513` `batchGroup.cleanupLocked`
- `1514` `batchGroup.stopLocked`
- `1515` `batchStagedDiff.commit`
- `1516` `batchApply.clearBatchStateLocked`
- `1517` `batchApply.rejectBatchStateLocked`
- `1518` `batchApply.rejectBatchState`
- `1519` `checkMsgHeadersPreClusteredProposal` (largest surface, ~423 Go LOC)

Tests:

- `2654` `TestNRGMultipleStopsDontPanic` -> `RaftNodeTests.NRGMultipleStopsDontPanic_ShouldSucceed`
- `2674` `TestNRGKeepRunningOnServerShutdown` -> `RaftNodeTests.NRGKeepRunningOnServerShutdown_ShouldSucceed`
- `2718` `TestNRGInitSingleMemRaftNodeDefaults` -> `RaftNodeTests.NRGInitSingleMemRaftNodeDefaults_ShouldSucceed`

### Existing .NET baseline

- `dotnet/src/ZB.MOM.NatsNet.Server/JetStream/JetStreamBatching.cs` exists but is partial and contains stub-like behavior (`ReadyForCommit` comment indicates stub).
- No current `RaftNodeTests` file exists under `dotnet/tests/...`; mapped test targets are not implemented yet.
- The JetStream batching test file currently contains deferred placeholders and does not cover the Raft tests mapped to Batch 29.

## Approaches

### Approach A: Single massive `JetStreamBatching.cs` pass in one shot

- Pros: fewer commits, direct throughput.
- Cons: high defect risk around cross-cutting map/counter/header logic, hard to validate incrementally.

### Approach B (Recommended): Two feature waves + one test wave with strict evidence gates

- Wave 1: batch/store lifecycle primitives (`1508-1514`)
- Wave 2: staged/apply/header semantics (`1515-1519`)
- Wave 3: mapped Raft tests (`2654,2674,2718`)
- Pros: manageable review units, easier causality between feature changes and tests, strongest anti-stub control.
- Cons: more status updates/checkpoints.

### Approach C: Signature-first fill (compile now, behavior later)

- Pros: quick apparent progress.
- Cons: violates anti-stub goals and creates false tracker progress.

Decision: **Approach B**.

## Proposed Design

### 1. Component boundaries

- Keep batching logic centered in `JetStreamBatching.cs` for mapped methods.
- Add narrow helper methods/types only when required to preserve method-level mapping and testability.
- Keep heavy validation (`checkMsgHeadersPreClusteredProposal`) behaviorally aligned with Go checks: pre-check ordering, duplicate/msg-id checks, counter increment path, expected sequence checks, scheduling/rollup constraints, and discard policy checks.
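
To make "pre-check ordering" concrete, here is a minimal Go sketch of a fixed-order header pre-check that returns the first failure. It is not the body of `checkMsgHeadersPreClusteredProposal`: `streamState`, `preCheck`, the error values, and the order of the two checks are all hypothetical. Only the header names `Nats-Msg-Id` and `Nats-Expected-Last-Sequence` are real NATS publish headers.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
)

// Real NATS publish header names.
const (
	hdrMsgID       = "Nats-Msg-Id"
	hdrExpectedSeq = "Nats-Expected-Last-Sequence"
)

// streamState is a hypothetical, heavily reduced stand-in for the state
// that the real validation consults.
type streamState struct {
	lastSeq uint64
	seenIDs map[string]bool
}

// Hypothetical error values, for illustration only.
var (
	errDuplicate = errors.New("duplicate message id")
	errWrongSeq  = errors.New("wrong last sequence")
)

// preCheck runs its checks in a fixed order and returns the first failure.
// The order shown (duplicate check, then expected sequence) is an assumption.
func preCheck(st *streamState, hdr map[string]string) error {
	if id, ok := hdr[hdrMsgID]; ok && st.seenIDs[id] {
		return errDuplicate
	}
	if v, ok := hdr[hdrExpectedSeq]; ok {
		seq, err := strconv.ParseUint(v, 10, 64)
		if err != nil {
			return fmt.Errorf("invalid %s: %w", hdrExpectedSeq, err)
		}
		if seq != st.lastSeq {
			return errWrongSeq
		}
	}
	return nil
}

func main() {
	st := &streamState{lastSeq: 10, seenIDs: map[string]bool{"a": true}}
	// Both checks would fail here; the earlier one wins.
	fmt.Println(preCheck(st, map[string]string{hdrMsgID: "a", hdrExpectedSeq: "9"}))
	fmt.Println(preCheck(st, map[string]string{hdrMsgID: "b", hdrExpectedSeq: "9"}))
	fmt.Println(preCheck(st, map[string]string{hdrMsgID: "b", hdrExpectedSeq: "10"}))
}
```

Because the first failing check decides the returned error, a port that reorders checks can change observable behavior even when each check is individually correct. That is why the ordering should be compared against the Go source rather than reconstructed from memory.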

### 2. Data and concurrency model

- Preserve lock expectations from Go comments by using existing `Lock`/`ReaderWriterLockSlim` conventions.
- Preserve inflight/global counters and cleanup semantics as deterministic state transitions.
- Ensure timer cleanup and commit readiness are race-safe and idempotent.

### 3. Feature grouping strategy (max ~20 features per group)

- **Group A (7 features):** `1508-1514`
  - Batch group creation, store dir/store construction, commit readiness, cleanup and stop paths.
- **Group B (5 features):** `1515-1519`
  - Staged diff commit state, batch apply clear/reject, and full header pre-check function.

### 4. Test strategy

- Create/port mapped tests in `RaftNodeTests` (or equivalent mapped class file) with real Arrange/Act/Assert behavior.
- Keep tests deterministic and non-networked where possible; if runtime infrastructure is missing, explicitly defer with reason.
- Add focused unit tests for `JetStreamBatching` helpers as needed to verify feature behavior before promoting feature status.

### 5. Status and evidence design

- Status transitions must be evidence-backed: `stub -> complete -> verified`.
- Apply status updates in chunks of at most 15 IDs to prevent bulk, unverifiable promotion.
- Checkpoint between tasks: stub scan + build + targeted tests + tracker updates.
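
The chunking and transition rules above can be expressed as a short Go sketch. The helper names `chunkIDs` and `canTransition` are hypothetical and do not belong to the tracker tooling. The transition table encodes only the `stub -> complete -> verified` chain quoted above, and deliberately leaves out other statuses such as `deferred`.

```go
package main

import "fmt"

// maxChunk is the update chunk limit stated in this section.
const maxChunk = 15

// next encodes only the chain stub -> complete -> verified.
var next = map[string]string{
	"stub":     "complete",
	"complete": "verified",
}

// canTransition reports whether moving from one status to another is a
// single step along the chain; skipping a step is rejected.
func canTransition(from, to string) bool {
	n, ok := next[from]
	return ok && n == to
}

// chunkIDs splits ids into consecutive slices of at most size elements.
func chunkIDs(ids []int, size int) [][]int {
	if size <= 0 {
		return nil
	}
	var out [][]int
	for start := 0; start < len(ids); start += size {
		end := start + size
		if end > len(ids) {
			end = len(ids)
		}
		out = append(out, ids[start:end])
	}
	return out
}

func main() {
	// The 12 feature IDs and 3 test IDs mapped to Batch 29.
	var ids []int
	for id := 1508; id <= 1519; id++ {
		ids = append(ids, id)
	}
	ids = append(ids, 2654, 2674, 2718)

	for i, c := range chunkIDs(ids, maxChunk) {
		fmt.Printf("chunk %d: %d IDs\n", i+1, len(c))
	}
	fmt.Println("stub -> verified allowed:", canTransition("stub", "verified"))
}
```

With 15 mapped IDs, the whole batch fits the chunk limit exactly, so the cap only matters if the updates are not split by wave.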

## Risks and Mitigations

- **Risk:** `1519` complexity causes partial/placeholder implementation.
  - **Mitigation:** isolate it in a dedicated task, and require feature-level gates plus an explicit defer-if-blocked path.
- **Risk:** Mapped Raft tests need runtime hooks not yet available.
  - **Mitigation:** mark `deferred` with exact blocker reason; no fake-pass tests.
- **Risk:** Tracker drift from actual behavior.
  - **Mitigation:** per-ID evidence, max-15 update chunk, and post-task checkpoints.

## Success Criteria

- All Batch 29 features and tests are either:
  - `verified` with captured verification evidence, or
  - `deferred` with explicit blocker reason.
- No new stub patterns introduced in touched production/test files.
- Build and relevant test gates are green at each task checkpoint.
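
The first criterion is mechanical enough to check in code. The Go sketch below shows one way to do that: `trackerItem` and `unmetItems` are hypothetical names, the field layout is not the schema of `porting.db`, and the sample evidence and blocker strings are made-up placeholders.

```go
package main

import (
	"fmt"
	"strings"
)

// trackerItem is a hypothetical view of one tracked feature or test.
type trackerItem struct {
	ID       int
	Status   string
	Evidence string // required when Status is "verified"
	Blocker  string // required when Status is "deferred"
}

// unmetItems returns the IDs that fail the first success criterion:
// each item must be verified with evidence or deferred with a blocker reason.
func unmetItems(items []trackerItem) []int {
	var bad []int
	for _, it := range items {
		switch it.Status {
		case "verified":
			if strings.TrimSpace(it.Evidence) == "" {
				bad = append(bad, it.ID)
			}
		case "deferred":
			if strings.TrimSpace(it.Blocker) == "" {
				bad = append(bad, it.ID)
			}
		default:
			// Any other status means the item is not finished.
			bad = append(bad, it.ID)
		}
	}
	return bad
}

func main() {
	// Sample values are placeholders, not real tracker contents.
	items := []trackerItem{
		{ID: 1508, Status: "verified", Evidence: "targeted tests passed"},
		{ID: 2674, Status: "deferred", Blocker: "needs runtime hooks"},
		{ID: 1519, Status: "complete"},
		{ID: 2718, Status: "deferred"},
	}
	fmt.Println("unmet IDs:", unmetItems(items))
}
```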

## Non-Goals

- Executing implementation in this design doc.
- Refactoring unrelated JetStream or Raft subsystems beyond mapped Batch 29 behavior.
- Broad integration-test harness expansion beyond what Batch 29 requires.