# Batch 29 JetStream Batching Design

**Date:** 2026-02-27
**Batch:** 29 (`JetStream Batching`)
**Scope:** 12 features + 3 unit tests
**Dependencies:** batch `27` (`JetStream Core`)
**Go source:** `golang/nats-server/server/jetstream_batching.go` (+ mapped tests in `server/raft_test.go`)

## Problem

Batch 29 ports JetStream atomic batch internals: batch lifecycle/store setup, staged consistency bookkeeping, apply-path rejection/cleanup, and pre-proposal header validation (`checkMsgHeadersPreClusteredProposal`). This batch also includes 3 Raft-node behavior tests that depend on batch cleanup correctness.

## Context Findings

### Required command outputs (captured)

- `batch show 29 --db porting.db`
  - Batch 29 is `pending`
  - 12 features + 3 tests are all `deferred`
  - Dependency: batch 27
  - Go file: `server/jetstream_batching.go`
- `batch list --db porting.db`
  - Batch 29 sits after batch 28 and depends on 27
  - Batch 40 (`MQTT Server/JSA`) depends on 27 as well; keeping 29 high quality prevents later churn in JetStream/Raft behavior
- `report summary --db porting.db`
  - Features verified: 1271 / 3673
  - Tests verified: 430 / 3257
  - Deferred backlog remains dominant, so no-stub discipline is mandatory

### Batch 29 mapped IDs

Features:

- `1508` `batching.newBatchGroup`
- `1509` `getBatchStoreDir`
- `1510` `newBatchStore`
- `1511` `batchGroup.readyForCommit`
- `1512` `batchGroup.cleanup`
- `1513` `batchGroup.cleanupLocked`
- `1514` `batchGroup.stopLocked`
- `1515` `batchStagedDiff.commit`
- `1516` `batchApply.clearBatchStateLocked`
- `1517` `batchApply.rejectBatchStateLocked`
- `1518` `batchApply.rejectBatchState`
- `1519` `checkMsgHeadersPreClusteredProposal` (largest surface, ~423 Go LOC)

Tests:

- `2654` `TestNRGMultipleStopsDontPanic` -> `RaftNodeTests.NRGMultipleStopsDontPanic_ShouldSucceed`
- `2674` `TestNRGKeepRunningOnServerShutdown` -> `RaftNodeTests.NRGKeepRunningOnServerShutdown_ShouldSucceed`
- `2718` `TestNRGInitSingleMemRaftNodeDefaults` -> `RaftNodeTests.NRGInitSingleMemRaftNodeDefaults_ShouldSucceed`

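The three test mappings above appear to follow one naming rule: drop the Go `Test` prefix, append `_ShouldSucceed`, and place the method in `RaftNodeTests`. A minimal Go sketch of that rule follows. It is inferred only from the three pairs listed here, and `mapTestName` is a hypothetical helper, not part of the porting tool.

```go
package main

import "strings"

// mapTestName derives the mapped .NET test name from a Go test name.
// Assumption: the rule is inferred from the three pairs in this design;
// the real tracker mapping may contain exceptions.
func mapTestName(goName string) string {
	return "RaftNodeTests." + strings.TrimPrefix(goName, "Test") + "_ShouldSucceed"
}
```

A check like this could flag a mapped test whose .NET name drifts from the tracker entry during Wave 3.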
### Existing .NET baseline

- `dotnet/src/ZB.MOM.NatsNet.Server/JetStream/JetStreamBatching.cs` exists but is partial and contains stub-like behavior (a comment on `ReadyForCommit` marks it as a stub).
- No `RaftNodeTests` file currently exists under `dotnet/tests/...`; the mapped test targets are not implemented yet.
- The JetStream batching test file currently contains deferred placeholders and does not cover the Raft tests mapped to Batch 29.

## Approaches

### Approach A: Single massive `JetStreamBatching.cs` pass in one shot

- Pros: fewer commits, direct throughput.
- Cons: high defect risk around cross-cutting map/counter/header logic, hard to validate incrementally.

### Approach B (Recommended): Two feature waves + one test wave with strict evidence gates

- Wave 1: batch/store lifecycle primitives (`1508-1514`)
- Wave 2: staged/apply/header semantics (`1515-1519`)
- Wave 3: mapped Raft tests (`2654,2674,2718`)
- Pros: manageable review units, easier causality between feature changes and tests, strongest anti-stub control.
- Cons: more status updates/checkpoints.

### Approach C: Signature-first fill (compile now, behavior later)

- Pros: quick apparent progress.
- Cons: violates anti-stub goals and creates false tracker progress.

Decision: **Approach B**.

## Proposed Design

### 1. Component boundaries

- Keep batching logic centered in `JetStreamBatching.cs` for mapped methods.
- Add narrow helper methods/types only when required to preserve method-level mapping and testability.
- Keep heavy validation (`checkMsgHeadersPreClusteredProposal`) behaviorally aligned with Go checks: pre-check ordering, duplicate/msg-id checks, counter increment path, expected sequence checks, scheduling/rollup constraints, and discard policy checks.

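The pre-check ordering requirement can be pictured as a fixed list of checks that stops at the first failure, so the same input always produces the same rejection. The Go sketch below is illustrative only: the types, the error values, and the relative order of the two checks are assumptions and are not taken from `jetstream_batching.go`.

```go
package main

import "errors"

// msgInfo is a hypothetical, simplified view of an inbound batch message.
type msgInfo struct {
	MsgID       string
	ExpectedSeq uint64 // 0 means "no expectation set"
}

// streamState is a hypothetical snapshot of the state the checks consult.
type streamState struct {
	LastSeq uint64
	SeenIDs map[string]bool
}

var (
	errDuplicate = errors.New("duplicate msg id")
	errWrongSeq  = errors.New("wrong expected last sequence")
)

// preCheck pairs a check with a name so a rejection can be attributed.
type preCheck struct {
	name string
	fn   func(m msgInfo, s streamState) error
}

// orderedChecks runs in slice order. The order shown is an assumption.
var orderedChecks = []preCheck{
	{"duplicate-msg-id", func(m msgInfo, s streamState) error {
		if m.MsgID != "" && s.SeenIDs[m.MsgID] {
			return errDuplicate
		}
		return nil
	}},
	{"expected-sequence", func(m msgInfo, s streamState) error {
		if m.ExpectedSeq != 0 && m.ExpectedSeq != s.LastSeq {
			return errWrongSeq
		}
		return nil
	}},
}

// runPreChecks returns the name and error of the first failing check,
// or an empty name and nil when every check passes.
func runPreChecks(m msgInfo, s streamState) (string, error) {
	for _, c := range orderedChecks {
		if err := c.fn(m, s); err != nil {
			return c.name, err
		}
	}
	return "", nil
}
```

Keeping the checks in one ordered list makes it straightforward to compare the port's ordering against the Go function line by line during review.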
### 2. Data and concurrency model

- Preserve lock expectations from Go comments by using existing `Lock`/`ReaderWriterLockSlim` conventions.
- Preserve inflight/global counters and cleanup semantics as deterministic state transitions.
- Ensure timer cleanup and commit readiness are race-safe and idempotent.

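As a rough illustration of "race-safe and idempotent" cleanup, the Go sketch below guards a group map and an inflight counter with one mutex, and makes a second cleanup of the same batch a no-op. All names here (`batches`, `add`, `defaultBatchTimeout`) are hypothetical simplifications and do not reproduce the Go server's types; only the `cleanup`/`cleanupLocked` split echoes the mapped method names.

```go
package main

import (
	"sync"
	"time"
)

// defaultBatchTimeout is an assumed value for illustration only.
const defaultBatchTimeout = 10 * time.Second

// batchGroup holds the per-batch expiry timer.
type batchGroup struct {
	timer *time.Timer
}

// batches tracks live groups and the inflight counter under one lock.
type batches struct {
	mu       sync.Mutex
	inflight int
	groups   map[string]*batchGroup
}

// add registers a batch and arms its expiry timer. Re-adding an ID is a no-op.
func (b *batches) add(id string, timeout time.Duration) {
	b.mu.Lock()
	defer b.mu.Unlock()
	if b.groups == nil {
		b.groups = make(map[string]*batchGroup)
	}
	if _, exists := b.groups[id]; exists {
		return
	}
	g := &batchGroup{}
	g.timer = time.AfterFunc(timeout, func() {
		b.mu.Lock()
		defer b.mu.Unlock()
		// Only expire the group this timer was armed for, so a stale
		// timer cannot remove a newer batch that reuses the same ID.
		if b.groups[id] == g {
			b.cleanupLocked(id)
		}
	})
	b.groups[id] = g
	b.inflight++
}

// cleanup removes a batch; it reports false if the batch was already gone.
func (b *batches) cleanup(id string) bool {
	b.mu.Lock()
	defer b.mu.Unlock()
	return b.cleanupLocked(id)
}

// cleanupLocked does the work of cleanup. The caller must hold b.mu.
func (b *batches) cleanupLocked(id string) bool {
	g, ok := b.groups[id]
	if !ok {
		return false // already cleaned up: counter is left untouched
	}
	g.timer.Stop()
	delete(b.groups, id)
	b.inflight--
	return true
}
```

Because the counter only changes when an entry is actually removed, a manual cleanup racing with timer expiry decrements `inflight` exactly once.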
### 3. Feature grouping strategy (max ~20 features per group)

- **Group A (7 features):** `1508-1514`
  - Batch group creation, store dir/store construction, commit readiness, cleanup and stop paths.
- **Group B (5 features):** `1515-1519`
  - Staged diff commit state, batch apply clear/reject, and full header pre-check function.

### 4. Test strategy

- Create/port mapped tests in `RaftNodeTests` (or equivalent mapped class file) with real Arrange/Act/Assert behavior.
- Keep tests deterministic and non-networked where possible; if runtime infrastructure is missing, explicitly defer with reason.
- Add focused unit tests for `JetStreamBatching` helpers as needed to verify feature behavior before promoting feature status.

### 5. Status and evidence design

- Status transitions must be evidence-backed: `stub -> complete -> verified`.
- Update statuses in chunks (max 15 IDs per update) to prevent bulk, unverifiable promotion.
- Checkpoint between tasks: stub scan + build + targeted tests + tracker updates.

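The transition chain and the 15-ID chunk limit can be sketched as two small helpers. This is a minimal Go illustration under stated assumptions: `canPromote` and `chunkIDs` are hypothetical names, statuses are modeled as plain strings, and deferral with a blocker reason is deliberately left out.

```go
package main

// maxChunk mirrors the "max 15 IDs" rule from this design.
const maxChunk = 15

// next encodes the chain stub -> complete -> verified.
// Deferral is handled separately and is not modeled in this sketch.
var next = map[string]string{
	"stub":     "complete",
	"complete": "verified",
}

// canPromote reports whether a status may move directly from one value
// to another. Skipping a step (stub -> verified) is rejected.
func canPromote(from, to string) bool {
	n, ok := next[from]
	return ok && n == to
}

// chunkIDs splits ids into consecutive slices of at most size elements.
func chunkIDs(ids []int, size int) [][]int {
	if size <= 0 {
		return nil
	}
	var out [][]int
	for len(ids) > size {
		out = append(out, ids[:size])
		ids = ids[size:]
	}
	if len(ids) > 0 {
		out = append(out, ids)
	}
	return out
}
```

Batch 29 has 15 mapped IDs in total (12 features + 3 tests), so each of the three waves fits inside a single chunk.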
## Risks and Mitigations

- **Risk:** `1519` complexity causes partial/placeholder implementation.
  - **Mitigation:** isolate into dedicated task, require feature-level gates and explicit defer-if-blocked path.
- **Risk:** Mapped Raft tests need runtime hooks not yet available.
  - **Mitigation:** mark `deferred` with exact blocker reason; no fake-pass tests.
- **Risk:** Tracker drift from actual behavior.
  - **Mitigation:** per-ID evidence, max-15 update chunk, and post-task checkpoints.

## Success Criteria

- All Batch 29 features and tests are either:
  - `verified` with captured verification evidence, or
  - `deferred` with explicit blocker reason.
- No new stub patterns introduced in touched production/test files.
- Build and relevant test gates are green at each task checkpoint.

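The first criterion is mechanical enough to check in code. The Go sketch below assumes a hypothetical tracker row shape (`entry`) with free-text evidence and reason fields; the actual `porting.db` schema is not described in this document and may differ.

```go
package main

import "fmt"

// entry is a hypothetical tracker row; field names are illustrative only.
type entry struct {
	ID       int
	Status   string // e.g. "verified" or "deferred"
	Evidence string // captured verification evidence
	Reason   string // blocker reason for a deferral
}

// unmetCriteria returns one message per entry that is neither verified
// with evidence nor deferred with a reason. An empty result means the
// batch satisfies the first success criterion.
func unmetCriteria(entries []entry) []string {
	var problems []string
	for _, e := range entries {
		switch {
		case e.Status == "verified" && e.Evidence != "":
			// acceptable: verified and backed by evidence
		case e.Status == "deferred" && e.Reason != "":
			// acceptable: deferred with an explicit blocker
		default:
			problems = append(problems,
				fmt.Sprintf("ID %d: status %q lacks evidence or blocker reason", e.ID, e.Status))
		}
	}
	return problems
}
```

Running a check of this kind at the final checkpoint would surface any ID left in an intermediate status before the batch is closed.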
## Non-Goals

- Executing implementation in this design doc.
- Refactoring unrelated JetStream or Raft subsystems beyond mapped Batch 29 behavior.
- Broad integration-test harness expansion beyond what Batch 29 requires.