Generated design docs and implementation plans via Codex for:
- Batch 23: Routes
- Batch 24: Leaf Nodes
- Batch 25: Gateways
- Batch 26: WebSocket
- Batch 27: JetStream Core
- Batch 28: JetStream API
- Batch 29: JetStream Batching
- Batch 30: Raft Part 1

All plans include the mandatory verification protocol and anti-stub guardrails. Updated batches.md with file paths and planned status.
Batch 29 JetStream Batching Design
Date: 2026-02-27
Batch: 29 (JetStream Batching)
Scope: 12 features + 3 unit tests
Dependencies: batch 27 (JetStream Core)
Go source: golang/nats-server/server/jetstream_batching.go (+ mapped tests in server/raft_test.go)
Problem
Batch 29 ports JetStream atomic batch internals: batch lifecycle/store setup, staged consistency bookkeeping, apply-path rejection/cleanup, and pre-proposal header validation (checkMsgHeadersPreClusteredProposal). This batch also includes 3 Raft-node behavior tests that depend on batch cleanup correctness.
Context Findings
Required command outputs (captured)
- batch show 29 --db porting.db
  - Batch 29 is pending
  - 12 features + 3 tests are all deferred
  - Dependency: batch 27
  - Go file: server/jetstream_batching.go
- batch list --db porting.db
  - Batch 29 sits after batch 28 and depends on 27
  - Batch 40 (MQTT Server/JSA) depends on 27 as well; keeping 29 high quality prevents later churn in JetStream/Raft behavior
- report summary --db porting.db
  - Features verified: 1271 / 3673
  - Tests verified: 430 / 3257
  - Deferred backlog remains dominant, so no-stub discipline is mandatory
Batch 29 mapped IDs
Features:
- 1508 batching.newBatchGroup
- 1509 getBatchStoreDir
- 1510 newBatchStore
- 1511 batchGroup.readyForCommit
- 1512 batchGroup.cleanup
- 1513 batchGroup.cleanupLocked
- 1514 batchGroup.stopLocked
- 1515 batchStagedDiff.commit
- 1516 batchApply.clearBatchStateLocked
- 1517 batchApply.rejectBatchStateLocked
- 1518 batchApply.rejectBatchState
- 1519 checkMsgHeadersPreClusteredProposal (largest surface, ~423 Go LOC)
Tests:
- 2654 TestNRGMultipleStopsDontPanic -> RaftNodeTests.NRGMultipleStopsDontPanic_ShouldSucceed
- 2674 TestNRGKeepRunningOnServerShutdown -> RaftNodeTests.NRGKeepRunningOnServerShutdown_ShouldSucceed
- 2718 TestNRGInitSingleMemRaftNodeDefaults -> RaftNodeTests.NRGInitSingleMemRaftNodeDefaults_ShouldSucceed
Existing .NET baseline
- dotnet/src/ZB.MOM.NatsNet.Server/JetStream/JetStreamBatching.cs exists but is partial and contains stub-like behavior (the ReadyForCommit comment indicates a stub).
- No RaftNodeTests file currently exists under dotnet/tests/...; mapped test targets are not implemented yet.
- The JetStream batching test file currently contains deferred placeholders and does not cover the Batch 29 mapped Raft tests.
Approaches
Approach A: Single massive JetStreamBatching.cs pass in one shot
- Pros: fewer commits, direct throughput.
- Cons: high defect risk around cross-cutting map/counter/header logic, hard to validate incrementally.
Approach B (Recommended): Two feature waves + one test wave with strict evidence gates
- Wave 1: batch/store lifecycle primitives (1508-1514)
- Wave 2: staged/apply/header semantics (1515-1519)
- Wave 3: mapped Raft tests (2654, 2674, 2718)
- Pros: manageable review units, easier causality between feature changes and tests, strongest anti-stub control.
- Cons: more status updates/checkpoints.
Approach C: Signature-first fill (compile now, behavior later)
- Pros: quick apparent progress.
- Cons: violates anti-stub goals and creates false tracker progress.
Decision: Approach B.
Proposed Design
1. Component boundaries
- Keep batching logic centered in JetStreamBatching.cs for mapped methods.
- Add narrow helper methods/types only when required to preserve method-level mapping and testability.
- Keep heavy validation (checkMsgHeadersPreClusteredProposal) behaviorally aligned with Go checks: pre-check ordering, duplicate/msg-id checks, counter increment path, expected sequence checks, scheduling/rollup constraints, and discard policy checks.
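The "pre-check ordering" requirement can be illustrated with a minimal Go sketch. It is not the nats-server implementation: the proposal and streamState types, the two checks, and the error values are all hypothetical simplifications. It shows only the general shape, where checks run in a fixed order, the first failure wins, and state is staged only after every check passes.

```go
package main

import (
	"errors"
	"fmt"
)

// proposal is a hypothetical, simplified stand-in for a message awaiting proposal.
type proposal struct {
	msgID       string
	expectedSeq uint64 // 0 means "no expectation"
}

// streamState is a hypothetical view of the state the checks consult.
type streamState struct {
	lastSeq uint64
	seenIDs map[string]bool
}

var (
	errDuplicate = errors.New("duplicate message id")
	errWrongSeq  = errors.New("wrong expected last sequence")
)

// check is one read-only validation step.
type check func(p proposal, s *streamState) error

func checkDuplicate(p proposal, s *streamState) error {
	if p.msgID != "" && s.seenIDs[p.msgID] {
		return errDuplicate
	}
	return nil
}

func checkExpectedSeq(p proposal, s *streamState) error {
	if p.expectedSeq != 0 && p.expectedSeq != s.lastSeq {
		return errWrongSeq
	}
	return nil
}

// preCheck runs the checks in a fixed order and returns the first failure.
// State is mutated only after all checks pass, so a rejection leaves it untouched.
func preCheck(p proposal, s *streamState) error {
	for _, c := range []check{checkDuplicate, checkExpectedSeq} {
		if err := c(p, s); err != nil {
			return err
		}
	}
	if p.msgID != "" {
		s.seenIDs[p.msgID] = true
	}
	s.lastSeq++
	return nil
}

func main() {
	s := &streamState{seenIDs: map[string]bool{}}
	fmt.Println(preCheck(proposal{msgID: "a"}, s))                 // accepted
	fmt.Println(preCheck(proposal{msgID: "a"}, s))                 // rejected as a duplicate
	fmt.Println(preCheck(proposal{msgID: "b", expectedSeq: 5}, s)) // rejected: last sequence is 1
}
```

Keeping each check read-only and staging state in one place makes it easier to compare the ported order against the Go source one check at a time.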
2. Data and concurrency model
- Preserve lock expectations from Go comments by using existing Lock/ReaderWriterLockSlim conventions.
- Preserve inflight/global counters and cleanup semantics as deterministic state transitions.
- Ensure timer cleanup and commit readiness are race-safe and idempotent.
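One way to picture "race-safe and idempotent" cleanup is the Go sketch below. It borrows the cleanup/cleanupLocked naming split from the feature list, but the group and inflight types, their fields, and demoTTL are hypothetical and are not taken from jetstream_batching.go. The idea is that a stopped flag, guarded by the group's lock, makes repeated cleanup calls (from a caller or from the timer) decrement the counter exactly once.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// demoTTL is an arbitrary timeout used only for this illustration.
const demoTTL = time.Hour

// inflight is a hypothetical global counter with its own lock.
type inflight struct {
	mu sync.Mutex
	n  int
}

func (c *inflight) add(d int) {
	c.mu.Lock()
	c.n += d
	c.mu.Unlock()
}

func (c *inflight) get() int {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.n
}

// group is a hypothetical stand-in for a batch group.
type group struct {
	mu      sync.Mutex
	timer   *time.Timer
	stopped bool
}

// newGroup registers the group in the counter and arms an abandonment timer.
// The timer field is assigned under the lock so the callback never observes
// a half-initialized group.
func newGroup(c *inflight, ttl time.Duration) *group {
	g := &group{}
	c.add(1)
	g.mu.Lock()
	g.timer = time.AfterFunc(ttl, func() { g.cleanup(c) })
	g.mu.Unlock()
	return g
}

// cleanup acquires the lock and delegates to the locked variant.
func (g *group) cleanup(c *inflight) {
	g.mu.Lock()
	defer g.mu.Unlock()
	g.cleanupLocked(c)
}

// cleanupLocked must be called with g.mu held. The stopped flag makes it
// idempotent: only the first call stops the timer and decrements the counter.
func (g *group) cleanupLocked(c *inflight) {
	if g.stopped {
		return
	}
	g.stopped = true
	if g.timer != nil {
		g.timer.Stop()
	}
	c.add(-1)
}

func main() {
	c := &inflight{}
	g := newGroup(c, demoTTL)
	g.cleanup(c)
	g.cleanup(c) // second call is a no-op
	fmt.Println("inflight after cleanup:", c.get())
}
```

A C# port could presumably follow the same pattern with its existing lock conventions; the property to test is that any number of cleanup or stop calls leaves the counters in the same state as one call.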
3. Feature grouping strategy (max ~20)
- Group A (7 features): 1508-1514
  - Batch group creation, store dir/store construction, commit readiness, cleanup and stop paths.
- Group B (5 features): 1515-1519
  - Staged diff commit state, batch apply clear/reject, and full header pre-check function.
4. Test strategy
- Create/port mapped tests in RaftNodeTests (or equivalent mapped class file) with real Arrange/Act/Assert behavior.
- Keep tests deterministic and non-networked where possible; if runtime infrastructure is missing, explicitly defer with reason.
- Add focused unit tests for JetStreamBatching helpers as needed to verify feature behavior before promoting feature status.
5. Status and evidence design
- Status transitions must be evidence-backed: stub -> complete -> verified.
- Chunked updates (max 15 IDs) to prevent bulk unverifiable promotion.
- Checkpoint between tasks: stub scan + build + targeted tests + tracker updates.
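The status rules above can be sketched as a small Go helper. This is an illustration only and does not describe the real tracker tooling: the status type, promote, and chunkIDs are hypothetical names. It encodes the stub -> complete -> verified order, refuses promotion without evidence, and splits ID lists into chunks of at most 15.

```go
package main

import (
	"errors"
	"fmt"
)

// status is a hypothetical tracker status value.
type status string

const (
	stub     status = "stub"
	complete status = "complete"
	verified status = "verified"
)

// maxChunk mirrors the "max 15 IDs" rule from the design.
const maxChunk = 15

var (
	errNoEvidence    = errors.New("promotion requires evidence")
	errBadTransition = errors.New("transition skips or reverses a step")
)

// next lists the only allowed forward transitions.
var next = map[status]status{stub: complete, complete: verified}

// promote validates a single-step, evidence-backed transition.
func promote(from, to status, evidence string) error {
	if evidence == "" {
		return errNoEvidence
	}
	want, ok := next[from]
	if !ok || want != to {
		return errBadTransition
	}
	return nil
}

// chunkIDs splits ids into consecutive slices of at most size elements.
func chunkIDs(ids []int, size int) [][]int {
	if size <= 0 {
		return nil
	}
	var out [][]int
	for start := 0; start < len(ids); start += size {
		end := start + size
		if end > len(ids) {
			end = len(ids)
		}
		out = append(out, ids[start:end])
	}
	return out
}

func main() {
	// Batch 29 IDs: features 1508-1519 plus the three mapped tests.
	ids := []int{2654, 2674, 2718}
	for id := 1508; id <= 1519; id++ {
		ids = append(ids, id)
	}
	fmt.Println("chunks:", len(chunkIDs(ids, maxChunk)))
	fmt.Println(promote(stub, verified, "build log"))  // skipping "complete" is rejected
	fmt.Println(promote(stub, complete, "build log"))  // allowed single step
}
```

Rejecting a direct stub -> verified jump in code is one way to make the evidence gate hard to bypass by accident during bulk updates.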
Risks and Mitigations
- Risk: 1519 complexity causes partial/placeholder implementation.
  - Mitigation: isolate into dedicated task, require feature-level gates and explicit defer-if-blocked path.
- Risk: Mapped Raft tests need runtime hooks not yet available.
  - Mitigation: mark deferred with exact blocker reason; no fake-pass tests.
- Risk: Tracker drift from actual behavior.
  - Mitigation: per-ID evidence, max-15 update chunk, and post-task checkpoints.
Success Criteria
- All Batch 29 features and tests are either:
  - verified with captured verification evidence, or
  - deferred with explicit blocker reason.
- No new stub patterns introduced in touched production/test files.
- Build and relevant test gates are green at each task checkpoint.
Non-Goals
- Executing implementation in this design doc.
- Refactoring unrelated JetStream or Raft subsystems beyond mapped Batch 29 behavior.
- Broad integration-test harness expansion beyond what Batch 29 requires.