Batch 35 JS Cluster Remaining Design
Date: 2026-02-27
Batch: 35 (JS Cluster Remaining)
Scope: 57 features + 49 unit tests
Dependency: batch 32 (JS Cluster Meta)
Go source: golang/nats-server/server/jetstream_cluster.go
Problem
Batch 35 covers the remaining JetStream cluster behavior in server/jetstream_cluster.go (roughly lines 8766-10866): delete-range and assignment encoding/decoding, stream snapshot/catchup processing, cluster info assembly, catchup throttling counters, and sync subject helpers. It also includes 49 tests, mostly RaftNodeTests, that validate catchup/truncation and snapshot correctness.
The plan must prevent false progress: no placeholder feature ports and no fake-pass tests.
Context Findings
Required command outputs
- /usr/local/share/dotnet/dotnet run --project tools/NatsNet.PortTracker -- batch show 35 --db porting.db
  - Status: pending
  - Features: 57 (all deferred)
  - Tests: 49 (all deferred)
  - Depends on: 32
  - Go file: server/jetstream_cluster.go
- /usr/local/share/dotnet/dotnet run --project tools/NatsNet.PortTracker -- batch list --db porting.db
  - Confirms chain around this area: 32 -> 33/34/35.
- /usr/local/share/dotnet/dotnet run --project tools/NatsNet.PortTracker -- report summary --db porting.db
  - Overall progress: 1924/6942 (27.7%)
Environment note: dotnet was not on PATH in this shell; use /usr/local/share/dotnet/dotnet.
Feature ownership distribution (from porting.db)
- NatsStream: 27
- JetStreamCluster: 19
- NatsServer: 8
- JetStreamEngine: 3
Test distribution (from porting.db)
- RaftNodeTests: 42
- JetStreamClusterTests1: 6
- JetStreamBatchingTests: 1
Constraints and Success Criteria
- Planning only; no implementation execution in this session.
- Reuse Batch 0 rigor, but for features + tests.
- Feature tasks must be grouped in chunks of max ~20 features.
- Status updates must use batch-update chunks of at most 15 IDs.
- Blocked work must be marked deferred with explicit reason, never stubbed.
Success means all 57 feature IDs and 49 test IDs are either:
- promoted with verification evidence (complete/verified), or
- kept deferred with specific blocker notes.
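The 15-ID limit on status updates can be sketched as a simple chunking helper. This is a minimal illustration, not part of PortTracker: the function name chunk_ids and the constant MAX_IDS_PER_UPDATE are hypothetical, and the ID range 1694-1750 is the batch's 57 feature IDs as listed under Feature Grouping. The sketch only produces the ID lists; how they are passed to the batch-update command is left open because this document does not specify that command's arguments.

```python
from typing import Iterator, Sequence

# Limit taken from the constraint above; the constant name is hypothetical.
MAX_IDS_PER_UPDATE = 15


def chunk_ids(ids: Sequence[int], size: int = MAX_IDS_PER_UPDATE) -> Iterator[list[int]]:
    """Yield consecutive slices of ids, each holding at most `size` entries."""
    if size < 1:
        raise ValueError("chunk size must be at least 1")
    for start in range(0, len(ids), size):
        yield list(ids[start:start + size])


# The 57 feature IDs covered by this batch (1694 through 1750 inclusive).
feature_ids = list(range(1694, 1751))
chunks = list(chunk_ids(feature_ids))

for chunk in chunks:
    # Each printed line is one candidate status-update payload.
    print(",".join(str(i) for i in chunk))
```

In practice the chunks would be cut per feature group rather than across the whole batch, so that a status update never spans a checkpoint.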
Approaches
Approach A: Monolithic pass (all features, then all tests)
- Pros: simple sequencing.
- Cons: high risk, poor auditability, hard to isolate regressions.
Approach B (Recommended): Three feature groups plus four test waves with hard gates
- Pros: bounded scope, clearer rollback points, aligns with max-20 feature grouping and max-15 status updates.
- Cons: more command overhead.
Approach C: Test-first for all 49 tests before feature completion
- Pros: immediate behavior pressure.
- Cons: high churn because most tests depend on unported catchup/snapshot internals.
Decision: Approach B.
Proposed Design
1. Code Organization
Use behavior-focused files while keeping existing class ownership:
- dotnet/src/ZB.MOM.NatsNet.Server/JetStream/JetStreamClusterTypes.cs: assignment/sync subject encode/decode helpers and cluster utility functions.
- dotnet/src/ZB.MOM.NatsNet.Server/JetStream/NatsStream.cs: snapshot and catchup state transitions plus inbound cluster message processing.
- dotnet/src/ZB.MOM.NatsNet.Server/JetStream/JetStreamTypes.cs: offline/online cluster info and alternate-stream assembly.
- dotnet/src/ZB.MOM.NatsNet.Server/NatsServer.cs and/or a new partial NatsServer.JetStreamClusterRemaining.cs: clustered consumer request path and gcb accounting helpers.
2. Feature Grouping (max ~20)
- Group A (20 IDs): 1694-1713
  Delete-range, consumer assignment, stream/batch encode-decode, snapshot support/state capture, clustered inbound message entry point.
- Group B (20 IDs): 1714-1733
  Delete trace, sync request calculation, snapshot delete handling, catchup peer lifecycle, snapshot/catchup processing, stream sync handler, cluster info base methods.
- Group C (17 IDs): 1734-1750
  Cluster info checks, stream alternates/info request handling, gcb accounting/kick channel, run-catchup loop, sync subject helper family.
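The grouping above can be sanity-checked mechanically before any work starts. The sketch below is a planning aid under stated assumptions: the names FEATURE_GROUPS and validate_groups are hypothetical, and the ranges are copied from the three groups listed here. It confirms that no group exceeds 20 IDs, that groups do not overlap, and that together they cover exactly 57 features.

```python
# ID ranges copied from Groups A-C; range() excludes its upper bound.
FEATURE_GROUPS = {
    "A": range(1694, 1714),
    "B": range(1714, 1734),
    "C": range(1734, 1751),
}
MAX_GROUP_SIZE = 20   # "max ~20" from the constraints
EXPECTED_TOTAL = 57   # feature count stated in the batch scope


def validate_groups(groups, max_size, expected_total):
    """Return per-group sizes, raising ValueError on any grouping violation."""
    seen = set()
    for name, ids in groups.items():
        if len(ids) > max_size:
            raise ValueError(f"group {name} has {len(ids)} IDs, limit is {max_size}")
        overlap = seen.intersection(ids)
        if overlap:
            raise ValueError(f"group {name} repeats IDs: {sorted(overlap)}")
        seen.update(ids)
    if len(seen) != expected_total:
        raise ValueError(f"groups cover {len(seen)} IDs, expected {expected_total}")
    return {name: len(ids) for name, ids in groups.items()}


sizes = validate_groups(FEATURE_GROUPS, MAX_GROUP_SIZE, EXPECTED_TOTAL)
print(sizes)
```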
3. Test Wave Grouping
- Wave T1 (7 IDs): 730,846,847,848,890,891,893 (JetStreamBatchingTests + JetStreamClusterTests1)
- Wave T2 (14 IDs): 2640,2641,2643,2644,2645,2646,2647,2648,2649,2653,2655,2656,2658,2659
- Wave T3 (14 IDs): 2660,2661,2662,2665,2666,2668,2669,2673,2676,2677,2678,2679,2680,2681
- Wave T4 (14 IDs): 2682,2683,2684,2685,2686,2688,2691,2696,2697,2703,2715,2716,2717,2719
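Because the wave lists are hand-enumerated, a short consistency check is cheap insurance against a dropped or duplicated ID. The sketch below is illustrative only: TEST_WAVES and check_waves are hypothetical names, and the stated counts and IDs are copied from the four waves above. It verifies each stated count, rejects duplicates, confirms the total of 49, and checks that every wave fits in a single status update of at most 15 IDs.

```python
# (stated count, IDs) per wave, copied from the wave list above.
TEST_WAVES = {
    "T1": (7, [730, 846, 847, 848, 890, 891, 893]),
    "T2": (14, [2640, 2641, 2643, 2644, 2645, 2646, 2647,
                2648, 2649, 2653, 2655, 2656, 2658, 2659]),
    "T3": (14, [2660, 2661, 2662, 2665, 2666, 2668, 2669,
                2673, 2676, 2677, 2678, 2679, 2680, 2681]),
    "T4": (14, [2682, 2683, 2684, 2685, 2686, 2688, 2691,
                2696, 2697, 2703, 2715, 2716, 2717, 2719]),
}
EXPECTED_TOTAL = 49      # test count stated in the batch scope
MAX_IDS_PER_UPDATE = 15  # status-update chunk limit from the constraints


def check_waves(waves, expected_total, max_per_update):
    """Return the total number of test IDs, raising ValueError on any mismatch."""
    all_ids = []
    for name, (stated, ids) in waves.items():
        if len(ids) != stated:
            raise ValueError(f"wave {name}: stated {stated} IDs, found {len(ids)}")
        if len(ids) > max_per_update:
            raise ValueError(f"wave {name} does not fit in one status update")
        all_ids.extend(ids)
    if len(set(all_ids)) != len(all_ids):
        raise ValueError("a test ID appears in more than one wave")
    if len(all_ids) != expected_total:
        raise ValueError(f"waves cover {len(all_ids)} IDs, expected {expected_total}")
    return len(all_ids)


total = check_waves(TEST_WAVES, EXPECTED_TOTAL, MAX_IDS_PER_UPDATE)
print(total)
```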
4. Verification Strategy (Design-Level)
- Every feature follows a per-feature loop with focused test evidence.
- Every test follows method-level then class-level verification.
- Stub detection runs after each loop and before status promotions.
- Build and targeted/full test gates are mandatory before checkpoint status updates.
- Checkpoints occur at every task boundary.
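To make the stub-detection step concrete, here is one way such a scan could look. It is a sketch under assumptions: this design does not define the actual anti-stub rules, so the three patterns below (a thrown NotImplementedException, a TODO comment, and a trivially true assertion) are hypothetical examples of what a scan might flag, and the C# sample is invented for the demonstration.

```python
import re

# Hypothetical stub markers; the real guardrail patterns are not specified here.
STUB_PATTERNS = {
    "not-implemented": re.compile(r"throw\s+new\s+NotImplementedException"),
    "todo-comment": re.compile(r"//\s*TODO", re.IGNORECASE),
    "trivial-assert": re.compile(r"Assert\.True\(\s*true\s*\)"),
}


def find_stub_markers(source: str) -> list[tuple[int, str]]:
    """Return (line number, pattern name) for every stub marker in the source text."""
    hits = []
    for line_no, line in enumerate(source.splitlines(), start=1):
        for name, pattern in STUB_PATTERNS.items():
            if pattern.search(line):
                hits.append((line_no, name))
    return hits


# Invented C# fragment containing one stubbed feature and one fake-pass test.
SAMPLE = """\
public void ProcessSnapshot()
{
    throw new NotImplementedException();
}

[Fact]
public void Catchup_ShouldSucceed()
{
    Assert.True(true); // TODO: real assertion
}
"""

hits = find_stub_markers(SAMPLE)
for line_no, name in hits:
    print(f"line {line_no}: {name}")
```

A status promotion would proceed only when the scan returns no hits for the files touched in that loop.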
5. Deferral Strategy
If infrastructure or dependency behavior blocks a feature/test:
- Stop work on that ID.
- Do not add placeholder implementation/assertions.
- Mark deferred with explicit reason via --override.
- Continue with next unblocked item.
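The deferral rules can be read as a small decision procedure: an item is either promoted on evidence or deferred with a reason, and anything else is an error. The sketch below illustrates that reading only. WorkItem, its fields, and plan_updates are hypothetical names, and the evidence and blocker strings are made-up sample data rather than findings from this batch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class WorkItem:
    item_id: int
    evidence: Optional[str] = None  # hypothetical field: verification evidence, if any
    blocker: Optional[str] = None   # hypothetical field: why the item cannot proceed


def plan_updates(items: list[WorkItem]) -> tuple[list[int], dict[int, str]]:
    """Split items into IDs to promote and deferrals keyed by ID with their reason."""
    promote: list[int] = []
    deferred: dict[int, str] = {}
    for item in items:
        if item.blocker is not None:
            reason = item.blocker.strip()
            if not reason:
                # A deferral without a reason would hide the blocker.
                raise ValueError(f"item {item.item_id}: deferral needs an explicit reason")
            deferred[item.item_id] = reason
        elif item.evidence:
            promote.append(item.item_id)
        else:
            # Neither evidence nor blocker: promoting would be false progress.
            raise ValueError(f"item {item.item_id}: no evidence and no blocker recorded")
    return promote, deferred


# Sample data for illustration only.
items = [
    WorkItem(1694, evidence="focused tests passed"),
    WorkItem(1695, blocker="depends on unported catchup internals"),
    WorkItem(1696, evidence="focused tests passed"),
]
promote, deferred = plan_updates(items)
print(promote)
print(deferred)
```

Raising on an item with neither evidence nor blocker mirrors the rule that blocked work is never stubbed: there is no third path that quietly marks progress.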
Risks and Mitigations
- Dependency readiness risk (Batch 32): enforce preflight before batch start 35.
- Raft-heavy test concentration: split RaftNodeTests into three equal waves and checkpoint each wave.
- Stub regression under volume: hard anti-stub scans and strict status chunking (<=15).
- Class ownership drift: keep methods in mapped classes only (JetStreamCluster, NatsStream, JetStreamEngine, NatsServer).
Non-Goals
- Executing the implementation in this session.
- Expanding scope beyond Batch 35 mappings.
- Changing batch dependencies/order in PortTracker.