natsnet/docs/plans/2026-02-27-batch-29-jetstream-batching-design.md
Joseph Doherty c05d93618e Add batch plans for batches 23-30 (rounds 12-15)
Generated design docs and implementation plans via Codex for:
- Batch 23: Routes
- Batch 24: Leaf Nodes
- Batch 25: Gateways
- Batch 26: WebSocket
- Batch 27: JetStream Core
- Batch 28: JetStream API
- Batch 29: JetStream Batching
- Batch 30: Raft Part 1

All plans include mandatory verification protocol and anti-stub guardrails.
Updated batches.md with file paths and planned status.
2026-02-27 16:33:10 -05:00

Batch 29 JetStream Batching Design

Date: 2026-02-27
Batch: 29 (JetStream Batching)
Scope: 12 features + 3 unit tests
Dependencies: batch 27 (JetStream Core)
Go source: golang/nats-server/server/jetstream_batching.go (+ mapped tests in server/raft_test.go)

Problem

Batch 29 ports JetStream atomic batch internals: batch lifecycle/store setup, staged consistency bookkeeping, apply-path rejection/cleanup, and pre-proposal header validation (checkMsgHeadersPreClusteredProposal). This batch also includes 3 Raft-node behavior tests that depend on batch cleanup correctness.

Context Findings

Required command outputs (captured)

  • batch show 29 --db porting.db
    • Batch 29 is pending
    • 12 features + 3 tests are all deferred
    • Dependency: batch 27
    • Go file: server/jetstream_batching.go
  • batch list --db porting.db
    • Batch 29 sits after batch 28 and depends on 27
    • Batch 40 (MQTT Server/JSA) also depends on 27; getting Batch 29 right the first time avoids later churn in JetStream/Raft behavior
  • report summary --db porting.db
    • Features verified: 1271 / 3673
    • Tests verified: 430 / 3257
    • Deferred backlog remains dominant, so no-stub discipline is mandatory

Batch 29 mapped IDs

Features:

  • 1508 batching.newBatchGroup
  • 1509 getBatchStoreDir
  • 1510 newBatchStore
  • 1511 batchGroup.readyForCommit
  • 1512 batchGroup.cleanup
  • 1513 batchGroup.cleanupLocked
  • 1514 batchGroup.stopLocked
  • 1515 batchStagedDiff.commit
  • 1516 batchApply.clearBatchStateLocked
  • 1517 batchApply.rejectBatchStateLocked
  • 1518 batchApply.rejectBatchState
  • 1519 checkMsgHeadersPreClusteredProposal (largest surface, ~423 Go LOC)

Tests:

  • 2654 TestNRGMultipleStopsDontPanic -> RaftNodeTests.NRGMultipleStopsDontPanic_ShouldSucceed
  • 2674 TestNRGKeepRunningOnServerShutdown -> RaftNodeTests.NRGKeepRunningOnServerShutdown_ShouldSucceed
  • 2718 TestNRGInitSingleMemRaftNodeDefaults -> RaftNodeTests.NRGInitSingleMemRaftNodeDefaults_ShouldSucceed
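
The three mapped rows above share a naming pattern: drop the Go `Test` prefix, qualify with the .NET class name, and append `_ShouldSucceed`. A minimal sketch of that rule follows. The helper name `mapTestName` is hypothetical and the rule is inferred only from these three rows, so it may not hold for other batches.

```go
package main

import (
	"fmt"
	"strings"
)

// mapTestName is a hypothetical helper (not part of the porting tool). It
// applies the naming pattern visible in the three mapped rows: drop the Go
// "Test" prefix, qualify with the .NET class, and append "_ShouldSucceed".
func mapTestName(goName, dotnetClass string) (string, error) {
	if !strings.HasPrefix(goName, "Test") || len(goName) == len("Test") {
		return "", fmt.Errorf("not a Go test function name: %q", goName)
	}
	return dotnetClass + "." + strings.TrimPrefix(goName, "Test") + "_ShouldSucceed", nil
}

func main() {
	goTests := []string{
		"TestNRGMultipleStopsDontPanic",
		"TestNRGKeepRunningOnServerShutdown",
		"TestNRGInitSingleMemRaftNodeDefaults",
	}
	for _, name := range goTests {
		target, err := mapTestName(name, "RaftNodeTests")
		if err != nil {
			fmt.Println("skip:", err)
			continue
		}
		fmt.Println(name, "->", target)
	}
}
```

A check like this can flag tracker rows whose .NET target drifts from the convention before any test code is written.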

Existing .NET baseline

  • dotnet/src/ZB.MOM.NatsNet.Server/JetStream/JetStreamBatching.cs exists but is partial and contains stub-like behavior (ReadyForCommit comment indicates stub).
  • No current RaftNodeTests file exists under dotnet/tests/...; mapped test targets are not implemented yet.
  • JetStream batching test file currently contains deferred placeholders and does not cover Batch 29 mapped Raft tests.

Approaches

Approach A: Implement all of JetStreamBatching.cs in a single pass

  • Pros: fewer commits, direct throughput.
  • Cons: high defect risk around cross-cutting map/counter/header logic, hard to validate incrementally.

Approach B: Incremental implementation in three waves

  • Wave 1: batch/store lifecycle primitives (1508-1514)
  • Wave 2: staged/apply/header semantics (1515-1519)
  • Wave 3: mapped Raft tests (2654,2674,2718)
  • Pros: manageable review units, easier causality between feature changes and tests, strongest anti-stub control.
  • Cons: more status updates/checkpoints.

Approach C: Signature-first fill (compile now, behavior later)

  • Pros: quick apparent progress.
  • Cons: violates anti-stub goals and creates false tracker progress.

Decision: Approach B.

Proposed Design

1. Component boundaries

  • Keep batching logic centered in JetStreamBatching.cs for mapped methods.
  • Add narrow helper methods/types only when required to preserve method-level mapping and testability.
  • Keep heavy validation (checkMsgHeadersPreClusteredProposal) behaviorally aligned with Go checks: pre-check ordering, duplicate/msg-id checks, counter increment path, expected sequence checks, scheduling/rollup constraints, and discard policy checks.
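
Preserving pre-check ordering matters because the first failing check decides which error the caller sees. The sketch below shows one way to make that order explicit, as a slice of named checks evaluated in sequence. Every type, field, and error here is hypothetical and heavily simplified, and the order shown is illustrative only: the authoritative order is whatever checkMsgHeadersPreClusteredProposal does in the Go source.

```go
package main

import (
	"errors"
	"fmt"
)

// msgHeaders and streamState are hypothetical, simplified stand-ins for the
// inputs of the real header pre-check, which reads far more state.
type msgHeaders struct {
	MsgID       string
	ExpectedSeq uint64 // 0 means no expectation was sent
}

type streamState struct {
	LastSeq uint64
	SeenIDs map[string]bool
}

var (
	errDuplicate = errors.New("duplicate message id")
	errWrongSeq  = errors.New("wrong last sequence")
)

// check pairs a name with a predicate so failures can be reported by name.
type check struct {
	name string
	run  func(h msgHeaders, s streamState) error
}

// orderedChecks: the slice order is the contract. Reordering entries changes
// which error wins when a message violates more than one rule.
var orderedChecks = []check{
	{"duplicate msg id", func(h msgHeaders, s streamState) error {
		if h.MsgID != "" && s.SeenIDs[h.MsgID] {
			return errDuplicate
		}
		return nil
	}},
	{"expected sequence", func(h msgHeaders, s streamState) error {
		if h.ExpectedSeq != 0 && h.ExpectedSeq != s.LastSeq {
			return errWrongSeq
		}
		return nil
	}},
}

// preCheck runs the checks in order and returns the name and error of the
// first failure, or ("", nil) when every check passes.
func preCheck(h msgHeaders, s streamState) (string, error) {
	for _, c := range orderedChecks {
		if err := c.run(h, s); err != nil {
			return c.name, err
		}
	}
	return "", nil
}

func main() {
	state := streamState{LastSeq: 10, SeenIDs: map[string]bool{"a": true}}
	// This message violates both rules; only the first failure is reported.
	name, err := preCheck(msgHeaders{MsgID: "a", ExpectedSeq: 3}, state)
	fmt.Println(name, err)
}
```

A table-driven layout like this also gives tests a natural shape: one case per pair of simultaneously violated rules, asserting which error wins.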

2. Data and concurrency model

  • Preserve lock expectations from Go comments by using existing Lock/ReaderWriterLockSlim conventions.
  • Preserve inflight/global counters and cleanup semantics as deterministic state transitions.
  • Ensure timer cleanup and commit readiness are race-safe and idempotent.
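
The "race-safe and idempotent" requirement can be sketched as a lock-guarded stopped flag: cleanup may be called from the timer callback and from the apply path, but its side effects run only once. This is a minimal illustration in Go under assumed names. The `batchGroup` type below is a hypothetical stand-in, not the nats-server type, and the `cleanups` counter exists only to make the behavior observable.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// demoTimeout is long enough that the timer never fires during the demo.
const demoTimeout = time.Hour

// batchGroup is a hypothetical stand-in, not the nats-server type.
type batchGroup struct {
	mu       sync.Mutex
	timer    *time.Timer
	stopped  bool
	cleanups int // how many times cleanup work actually ran
}

func newBatchGroup(timeout time.Duration) *batchGroup {
	b := &batchGroup{}
	// Hold the lock while arming the timer so a very short timeout cannot
	// run cleanup before b.timer is assigned.
	b.mu.Lock()
	b.timer = time.AfterFunc(timeout, b.cleanup)
	b.mu.Unlock()
	return b
}

// readyForCommit stops the timer and reports whether the batch may still
// commit. It returns false once cleanup has run or the timer has fired.
func (b *batchGroup) readyForCommit() bool {
	b.mu.Lock()
	defer b.mu.Unlock()
	if b.stopped {
		return false
	}
	return b.timer.Stop()
}

// cleanup acquires the lock and delegates to cleanupLocked.
func (b *batchGroup) cleanup() {
	b.mu.Lock()
	defer b.mu.Unlock()
	b.cleanupLocked()
}

// cleanupLocked must be called with b.mu held. The stopped flag makes
// repeated calls no-ops, so the side effects happen exactly once.
func (b *batchGroup) cleanupLocked() {
	if b.stopped {
		return
	}
	b.stopped = true
	b.timer.Stop()
	b.cleanups++
}

func main() {
	b := newBatchGroup(demoTimeout)
	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			b.cleanup()
		}()
	}
	wg.Wait()
	b.mu.Lock()
	n := b.cleanups
	b.mu.Unlock()
	fmt.Println("cleanup ran", n, "time(s)")
}
```

In the .NET port the same shape would map onto whichever lock type the surrounding code already uses; the invariant to keep is that the flag check and the state change happen under the same lock.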

3. Feature grouping strategy (max ~20 features per group)

  • Group A (7 features): 1508-1514
    • Batch group creation, store dir/store construction, commit readiness, cleanup and stop paths.
  • Group B (5 features): 1515-1519
    • Staged diff commit state, batch apply clear/reject, and full header pre-check function.

4. Test strategy

  • Create/port mapped tests in RaftNodeTests (or equivalent mapped class file) with real Arrange/Act/Assert behavior.
  • Keep tests deterministic and non-networked where possible; if runtime infrastructure is missing, explicitly defer with reason.
  • Add focused unit tests for JetStreamBatching helpers as needed to verify feature behavior before promoting feature status.

5. Status and evidence design

  • Status transitions must be evidence-backed: stub -> complete -> verified.
  • Apply status updates in chunks (max 15 IDs per update) to prevent bulk unverifiable promotion.
  • Checkpoint between tasks: stub scan + build + targeted tests + tracker updates.
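
The two rules above (evidence-backed transitions along stub -> complete -> verified, and updates of at most 15 IDs) can be sketched as a small guard. The function names, the string status values, and the idea of passing evidence as free text are assumptions for illustration; the real tracker CLI and its schema are not shown in this document.

```go
package main

import (
	"fmt"
	"strings"
)

// maxIDsPerUpdate mirrors the "max 15 IDs" rule from the design.
const maxIDsPerUpdate = 15

// nextStatus encodes the only promotions allowed: stub -> complete -> verified.
var nextStatus = map[string]string{
	"stub":     "complete",
	"complete": "verified",
}

// promote is a hypothetical guard: it advances a status by one step and
// refuses to do so when no evidence text is supplied.
func promote(current, evidence string) (string, error) {
	if strings.TrimSpace(evidence) == "" {
		return current, fmt.Errorf("refusing to promote %q without evidence", current)
	}
	next, ok := nextStatus[current]
	if !ok {
		return current, fmt.Errorf("no promotion defined from %q", current)
	}
	return next, nil
}

// chunkIDs splits ids into consecutive groups of at most size elements.
func chunkIDs(ids []int, size int) [][]int {
	if size <= 0 {
		return nil
	}
	var chunks [][]int
	for len(ids) > size {
		chunks = append(chunks, ids[:size])
		ids = ids[size:]
	}
	if len(ids) > 0 {
		chunks = append(chunks, ids)
	}
	return chunks
}

func main() {
	// Feature IDs 1508-1519 and test IDs 2654, 2674, 2718 from this batch,
	// treated as one list purely for illustration.
	ids := []int{1508, 1509, 1510, 1511, 1512, 1513, 1514, 1515, 1516, 1517, 1518, 1519, 2654, 2674, 2718}
	for i, c := range chunkIDs(ids, maxIDsPerUpdate) {
		fmt.Printf("update %d: %d IDs\n", i+1, len(c))
	}
	status, err := promote("stub", "build green, targeted tests passed")
	fmt.Println(status, err)
}
```

Allowing only single-step promotions means an ID cannot jump from stub to verified in one update, which keeps each transition tied to its own evidence.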

Risks and Mitigations

  • Risk: 1519 complexity causes partial/placeholder implementation.
    • Mitigation: isolate into dedicated task, require feature-level gates and explicit defer-if-blocked path.
  • Risk: Mapped Raft tests need runtime hooks not yet available.
    • Mitigation: mark deferred with exact blocker reason; no fake-pass tests.
  • Risk: Tracker drift from actual behavior.
    • Mitigation: per-ID evidence, max-15 update chunk, and post-task checkpoints.

Success Criteria

  • All Batch 29 features and tests are either:
    • verified with captured verification evidence, or
    • deferred with explicit blocker reason.
  • No new stub patterns introduced in touched production/test files.
  • Build and relevant test gates are green at each task checkpoint.

Non-Goals

  • Executing the implementation; this document covers design only.
  • Refactoring unrelated JetStream or Raft subsystems beyond mapped Batch 29 behavior.
  • Broad integration-test harness expansion beyond what Batch 29 requires.