natsnet/docs/plans/2026-02-27-batch-29-jetstream-batching-design.md
Joseph Doherty c05d93618e Add batch plans for batches 23-30 (rounds 12-15)
Generated design docs and implementation plans via Codex for:
- Batch 23: Routes
- Batch 24: Leaf Nodes
- Batch 25: Gateways
- Batch 26: WebSocket
- Batch 27: JetStream Core
- Batch 28: JetStream API
- Batch 29: JetStream Batching
- Batch 30: Raft Part 1

All plans include mandatory verification protocol and anti-stub guardrails.
Updated batches.md with file paths and planned status.
2026-02-27 16:33:10 -05:00

# Batch 29 JetStream Batching Design
**Date:** 2026-02-27
**Batch:** 29 (`JetStream Batching`)
**Scope:** 12 features + 3 unit tests
**Dependencies:** batch `27` (`JetStream Core`)
**Go source:** `golang/nats-server/server/jetstream_batching.go` (+ mapped tests in `server/raft_test.go`)
## Problem
Batch 29 ports JetStream atomic batch internals: batch lifecycle/store setup, staged consistency bookkeeping, apply-path rejection/cleanup, and pre-proposal header validation (`checkMsgHeadersPreClusteredProposal`). This batch also includes 3 Raft-node behavior tests that depend on batch cleanup correctness.
## Context Findings
### Required command outputs (captured)
- `batch show 29 --db porting.db`
  - Batch 29 is `pending`
  - 12 features + 3 tests are all `deferred`
  - Dependency: batch 27
  - Go file: `server/jetstream_batching.go`
- `batch list --db porting.db`
  - Batch 29 sits after batch 28 and depends on 27
  - Batch 40 (`MQTT Server/JSA`) depends on 27 as well; keeping 29 high quality prevents later churn in JetStream/Raft behavior
- `report summary --db porting.db`
  - Features verified: 1271 / 3673
  - Tests verified: 430 / 3257
  - Deferred backlog remains dominant, so no-stub discipline is mandatory
### Batch 29 mapped IDs
Features:
- `1508` `batching.newBatchGroup`
- `1509` `getBatchStoreDir`
- `1510` `newBatchStore`
- `1511` `batchGroup.readyForCommit`
- `1512` `batchGroup.cleanup`
- `1513` `batchGroup.cleanupLocked`
- `1514` `batchGroup.stopLocked`
- `1515` `batchStagedDiff.commit`
- `1516` `batchApply.clearBatchStateLocked`
- `1517` `batchApply.rejectBatchStateLocked`
- `1518` `batchApply.rejectBatchState`
- `1519` `checkMsgHeadersPreClusteredProposal` (largest surface, ~423 Go LOC)
Tests:
- `2654` `TestNRGMultipleStopsDontPanic` -> `RaftNodeTests.NRGMultipleStopsDontPanic_ShouldSucceed`
- `2674` `TestNRGKeepRunningOnServerShutdown` -> `RaftNodeTests.NRGKeepRunningOnServerShutdown_ShouldSucceed`
- `2718` `TestNRGInitSingleMemRaftNodeDefaults` -> `RaftNodeTests.NRGInitSingleMemRaftNodeDefaults_ShouldSucceed`
### Existing .NET baseline
- `dotnet/src/ZB.MOM.NatsNet.Server/JetStream/JetStreamBatching.cs` exists but is partial and contains stub-like behavior (`ReadyForCommit` comment indicates stub).
- No current `RaftNodeTests` file exists under `dotnet/tests/...`; mapped test targets are not implemented yet.
- The JetStream batching test file currently contains deferred placeholders and does not cover the Raft tests mapped to Batch 29.
## Approaches
### Approach A: Single massive `JetStreamBatching.cs` pass in one shot
- Pros: fewer commits, direct throughput.
- Cons: high defect risk around cross-cutting map/counter/header logic, hard to validate incrementally.
### Approach B (Recommended): Two feature waves + one test wave with strict evidence gates
- Wave 1: batch/store lifecycle primitives (`1508-1514`)
- Wave 2: staged/apply/header semantics (`1515-1519`)
- Wave 3: mapped Raft tests (`2654,2674,2718`)
- Pros: manageable review units, easier causality between feature changes and tests, strongest anti-stub control.
- Cons: more status updates/checkpoints.
### Approach C: Signature-first fill (compile now, behavior later)
- Pros: quick apparent progress.
- Cons: violates anti-stub goals and creates false tracker progress.
Decision: **Approach B**.
## Proposed Design
### 1. Component boundaries
- Keep batching logic centered in `JetStreamBatching.cs` for mapped methods.
- Add narrow helper methods/types only when required to preserve method-level mapping and testability.
- Keep heavy validation (`checkMsgHeadersPreClusteredProposal`) behaviorally aligned with Go checks: pre-check ordering, duplicate/msg-id checks, counter increment path, expected sequence checks, scheduling/rollup constraints, and discard policy checks.
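
To make the "pre-check ordering" requirement concrete, the following is a minimal Go sketch of an ordered check chain that stops at the first failure. It is not the nats-server implementation: `msgHeaders`, `streamState`, `check`, `preChecks`, and `runPreChecks` are hypothetical names, only two of the listed check families are modeled, and the order between them is an assumption made for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// msgHeaders is a hypothetical, simplified view of the headers under test.
type msgHeaders struct {
	MsgID           string
	ExpectedLastSeq uint64 // 0 means no expectation was sent
}

// streamState is a hypothetical snapshot of what the checks read.
type streamState struct {
	LastSeq uint64
	SeenIDs map[string]bool
}

// check pairs a name with one validation step; fn returns nil on success.
type check struct {
	name string
	fn   func(h msgHeaders, s streamState) error
}

// preChecks runs top to bottom. The order shown is an assumption made for
// illustration; the real order must be copied from the Go source.
var preChecks = []check{
	{"duplicate-msg-id", func(h msgHeaders, s streamState) error {
		if h.MsgID != "" && s.SeenIDs[h.MsgID] {
			return errors.New("message id already seen")
		}
		return nil
	}},
	{"expected-last-seq", func(h msgHeaders, s streamState) error {
		if h.ExpectedLastSeq != 0 && h.ExpectedLastSeq != s.LastSeq {
			return fmt.Errorf("expected last seq %d, stream is at %d",
				h.ExpectedLastSeq, s.LastSeq)
		}
		return nil
	}},
}

// runPreChecks returns the name of the first failing check and its error,
// or ("", nil) when every check passes.
func runPreChecks(h msgHeaders, s streamState) (string, error) {
	for _, c := range preChecks {
		if err := c.fn(h, s); err != nil {
			return c.name, err
		}
	}
	return "", nil
}
```

Keeping the steps in one ordered slice makes the order reviewable in a single place, so a port can be compared against the Go function step by step; the same shape carries over to an ordered list of delegates in the .NET code.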
### 2. Data and concurrency model
- Preserve lock expectations from Go comments by using existing `Lock`/`ReaderWriterLockSlim` conventions.
- Preserve inflight/global counters and cleanup semantics as deterministic state transitions.
- Ensure timer cleanup and commit readiness are race-safe and idempotent.
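
The idempotency requirement can be illustrated with a small Go sketch. It is a simplified model under stated assumptions, not the code in `jetstream_batching.go`: `batchRegistry`, `batchGroup`, `add`, and `cleanup` are hypothetical names, and a single mutex stands in for whatever lock hierarchy the real code uses.

```go
package main

import (
	"sync"
	"time"
)

// batchGroup is a hypothetical, simplified model of one in-flight batch.
type batchGroup struct {
	timer   *time.Timer
	stopped bool
}

// stopLocked stops the timer at most once. The caller must hold the
// registry mutex.
func (g *batchGroup) stopLocked() {
	if g.stopped {
		return
	}
	g.stopped = true
	g.timer.Stop()
}

// batchRegistry tracks in-flight groups and a matching counter.
type batchRegistry struct {
	mu       sync.Mutex
	groups   map[string]*batchGroup
	inflight int
}

func newBatchRegistry() *batchRegistry {
	return &batchRegistry{groups: make(map[string]*batchGroup)}
}

// add registers a group whose timer triggers cleanup after timeout.
// Adding an id that is already present returns the existing group.
func (r *batchRegistry) add(id string, timeout time.Duration) *batchGroup {
	r.mu.Lock()
	defer r.mu.Unlock()
	if g, ok := r.groups[id]; ok {
		return g
	}
	g := &batchGroup{}
	// The callback blocks on r.mu until add returns, so it always
	// observes a fully initialized group.
	g.timer = time.AfterFunc(timeout, func() { r.cleanup(id) })
	r.groups[id] = g
	r.inflight++
	return g
}

// cleanup removes a group and decrements the counter exactly once.
// It reports whether this call did the work; repeated or racing calls
// find the id gone and return false without touching the counter.
func (r *batchRegistry) cleanup(id string) bool {
	r.mu.Lock()
	defer r.mu.Unlock()
	g, ok := r.groups[id]
	if !ok {
		return false
	}
	g.stopLocked()
	delete(r.groups, id)
	r.inflight--
	return true
}
```

Because map membership is the single source of truth, an explicit cleanup and a timer-driven cleanup can race without double-decrementing `inflight`, which is the property the mapped multiple-stop test is meant to exercise.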
### 3. Feature grouping strategy (max ~20 features per group)
- **Group A (7 features):** `1508-1514`
  - Batch group creation, store dir/store construction, commit readiness, cleanup and stop paths.
- **Group B (5 features):** `1515-1519`
  - Staged diff commit state, batch apply clear/reject, and full header pre-check function.
### 4. Test strategy
- Create/port mapped tests in `RaftNodeTests` (or equivalent mapped class file) with real Arrange/Act/Assert behavior.
- Keep tests deterministic and non-networked where possible; if runtime infrastructure is missing, explicitly defer with reason.
- Add focused unit tests for `JetStreamBatching` helpers as needed to verify feature behavior before promoting feature status.
### 5. Status and evidence design
- Status transitions must be evidence-backed: `stub -> complete -> verified`.
- Apply status updates in chunks of at most 15 IDs to prevent bulk, unverifiable promotion.
- Checkpoint between tasks: stub scan + build + targeted tests + tracker updates.
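
The two rules above (no skipped stages, at most 15 IDs per update) can be sketched in Go as follows. This is an illustration only and not part of the tracker tooling: `chunkIDs`, `promote`, `nextStatus`, and `maxChunk` are hypothetical names, and representing evidence as a plain string is an assumption.

```go
package main

import "fmt"

// maxChunk is the per-update ID limit stated in the design.
const maxChunk = 15

// chunkIDs splits ids into consecutive slices of at most size elements.
func chunkIDs(ids []int, size int) [][]int {
	if size <= 0 {
		return nil
	}
	var out [][]int
	for len(ids) > 0 {
		n := size
		if len(ids) < n {
			n = len(ids)
		}
		out = append(out, ids[:n])
		ids = ids[n:]
	}
	return out
}

// nextStatus maps each status to the only status it may be promoted to,
// following the stub -> complete -> verified chain.
var nextStatus = map[string]string{
	"stub":     "complete",
	"complete": "verified",
}

// promote returns the new status, or the unchanged status and an error
// when evidence is missing or the step skips a stage.
func promote(from, to, evidence string) (string, error) {
	if evidence == "" {
		return from, fmt.Errorf("no evidence for %s -> %s", from, to)
	}
	if nextStatus[from] != to {
		return from, fmt.Errorf("illegal transition %s -> %s", from, to)
	}
	return to, nil
}
```

With 12 features and 3 tests, Batch 29 has 15 mapped IDs, so `chunkIDs` with `maxChunk` would return a single chunk if features and tests were promoted together; whether they are counted together is left to the implementation plan.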
## Risks and Mitigations
- **Risk:** `1519` complexity causes partial/placeholder implementation.
  - **Mitigation:** isolate into dedicated task, require feature-level gates and explicit defer-if-blocked path.
- **Risk:** Mapped Raft tests need runtime hooks not yet available.
  - **Mitigation:** mark `deferred` with exact blocker reason; no fake-pass tests.
- **Risk:** Tracker drift from actual behavior.
  - **Mitigation:** per-ID evidence, max-15 update chunk, and post-task checkpoints.
## Success Criteria
- All Batch 29 features and tests are either:
  - `verified` with captured verification evidence, or
  - `deferred` with explicit blocker reason.
- No new stub patterns introduced in touched production/test files.
- Build and relevant test gates are green at each task checkpoint.
## Non-Goals
- Executing implementation in this design doc.
- Refactoring unrelated JetStream or Raft subsystems beyond mapped Batch 29 behavior.
- Broad integration-test harness expansion beyond what Batch 29 requires.