natsnet/docs/plans/2026-02-27-batch-35-js-cluster-remaining-design.md
Joseph Doherty f8dce79ac0 Add batch plans for batches 31-36 (rounds 16-18)
Generated design docs and implementation plans via Codex for:
- Batch 31: Raft Part 2
- Batch 32: JS Cluster Meta
- Batch 33: JS Cluster Streams
- Batch 34: JS Cluster Consumers
- Batch 35: JS Cluster Remaining
- Batch 36: Stream Lifecycle

All plans include mandatory verification protocol and anti-stub guardrails.
Updated batches.md with file paths and planned status.
2026-02-27 17:01:31 -05:00


# Batch 35 JS Cluster Remaining Design
**Date:** 2026-02-27
**Batch:** 35 (`JS Cluster Remaining`)
**Scope:** 57 features + 49 unit tests
**Dependency:** batch `32` (`JS Cluster Meta`)
**Go source:** `golang/nats-server/server/jetstream_cluster.go`
## Problem
Batch 35 covers the remaining JetStream cluster behavior in `server/jetstream_cluster.go` (roughly lines `8766-10866`): delete-range and assignment encoding/decoding, stream snapshot/catchup processing, cluster info assembly, catchup throttling counters, and sync subject helpers. It also includes 49 tests, mostly `RaftNodeTests`, that validate catchup/truncation and snapshot correctness.
The plan must prevent false progress: no placeholder feature ports and no fake-pass tests.
## Context Findings
### Required command outputs
- `/usr/local/share/dotnet/dotnet run --project tools/NatsNet.PortTracker -- batch show 35 --db porting.db`
  - Status: `pending`
  - Features: `57` (all `deferred`)
  - Tests: `49` (all `deferred`)
  - Depends on: `32`
  - Go file: `server/jetstream_cluster.go`
- `/usr/local/share/dotnet/dotnet run --project tools/NatsNet.PortTracker -- batch list --db porting.db`
  - Confirms chain around this area: `32 -> 33/34/35`.
- `/usr/local/share/dotnet/dotnet run --project tools/NatsNet.PortTracker -- report summary --db porting.db`
  - Overall progress: `1924/6942 (27.7%)`

Environment note: `dotnet` was not on `PATH` in this shell; use `/usr/local/share/dotnet/dotnet`.
### Feature ownership distribution (from `porting.db`)
- `NatsStream`: 27
- `JetStreamCluster`: 19
- `NatsServer`: 8
- `JetStreamEngine`: 3
### Test distribution (from `porting.db`)
- `RaftNodeTests`: 42
- `JetStreamClusterTests1`: 6
- `JetStreamBatchingTests`: 1
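As a quick consistency check, both distributions above should sum to the batch scope (57 features, 49 tests). The following is a minimal Python sketch with the counts copied from this section; the dictionary names are illustrative and not part of PortTracker.

```python
# Counts copied from the two distributions above; names are illustrative only.
feature_owners = {
    "NatsStream": 27,
    "JetStreamCluster": 19,
    "NatsServer": 8,
    "JetStreamEngine": 3,
}
test_classes = {
    "RaftNodeTests": 42,
    "JetStreamClusterTests1": 6,
    "JetStreamBatchingTests": 1,
}

feature_total = sum(feature_owners.values())
test_total = sum(test_classes.values())

# Both totals should match the batch scope stated in the header.
print(feature_total, test_total)
```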
## Constraints and Success Criteria
- Planning only; no implementation execution in this session.
- Reuse Batch 0 rigor, but for **features + tests**.
- Feature tasks must be grouped in chunks of max ~20 features.
- Status updates must use batch-update chunks of at most 15 IDs.
- Blocked work must be marked `deferred` with explicit reason, never stubbed.

Success means all 57 feature IDs and 49 test IDs are either:
- promoted with verification evidence (`complete/verified`), or
- kept `deferred` with specific blocker notes.
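The success criteria can be read as a predicate over every tracked ID. The sketch below is one possible way to express it in Python; the `Item` record and its field names (`evidence`, `blocker`) are hypothetical and do not reflect the actual `porting.db` schema.

```python
from dataclasses import dataclass


@dataclass
class Item:
    """Hypothetical record for one feature or test ID (not the porting.db schema)."""
    id: int
    status: str
    evidence: str = ""
    blocker: str = ""


def unresolved(items):
    """Return IDs that are neither promoted with evidence nor deferred with a reason."""
    bad = []
    for item in items:
        promoted = item.status in ("complete", "verified") and bool(item.evidence.strip())
        deferred = item.status == "deferred" and bool(item.blocker.strip())
        if not (promoted or deferred):
            bad.append(item.id)
    return bad


items = [
    Item(1694, "verified", evidence="focused test run attached"),
    Item(1695, "deferred", blocker="hypothetical: depends on unported helper"),
    Item(1696, "deferred"),  # deferred without a reason: fails the criteria
]
leftover = unresolved(items)
```

Under these assumptions the batch meets the criteria only when the returned list is empty.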
## Approaches
### Approach A: Monolithic pass (all features, then all tests)
- Pros: simple sequencing.
- Cons: high risk, poor auditability, hard to isolate regressions.
### Approach B (Recommended): Three feature groups plus four test waves with hard gates
- Pros: bounded scope, clearer rollback points, aligns with max-20 feature grouping and max-15 status updates.
- Cons: more command overhead.
### Approach C: Test-first for all 49 tests before feature completion
- Pros: immediate behavior pressure.
- Cons: high churn because most tests depend on unported catchup/snapshot internals.
**Decision:** Approach B.
## Proposed Design
### 1. Code Organization
Use behavior-focused files while keeping existing class ownership:
- `dotnet/src/ZB.MOM.NatsNet.Server/JetStream/JetStreamClusterTypes.cs`
  - assignment/sync subject encode/decode helpers and cluster utility functions.
- `dotnet/src/ZB.MOM.NatsNet.Server/JetStream/NatsStream.cs`
  - snapshot and catchup state transitions plus inbound cluster message processing.
- `dotnet/src/ZB.MOM.NatsNet.Server/JetStream/JetStreamTypes.cs`
  - offline/online cluster info and alternate-stream assembly.
- `dotnet/src/ZB.MOM.NatsNet.Server/NatsServer.cs` and/or new partial `NatsServer.JetStreamClusterRemaining.cs`
  - clustered consumer request path and gcb accounting helpers.
### 2. Feature Grouping (max ~20)
- **Group A (20 IDs):** `1694-1713`
  - Delete-range, consumer assignment, stream/batch encode-decode, snapshot support/state capture, clustered inbound message entry point.
- **Group B (20 IDs):** `1714-1733`
  - Delete trace, sync request calculation, snapshot delete handling, catchup peer lifecycle, snapshot/catchup processing, stream sync handler, cluster info base methods.
- **Group C (17 IDs):** `1734-1750`
  - Cluster info checks, stream alternates/info request handling, gcb accounting/kick channel, run-catchup loop, sync subject helper family.
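The group boundaries can be checked mechanically: the three ranges should be contiguous, non-overlapping, and cover exactly 57 IDs. A small Python sketch, with the ranges copied from the list above (the variable names are illustrative):

```python
# Inclusive ID ranges from the grouping above, expressed as Python ranges.
groups = {
    "A": range(1694, 1714),  # 1694-1713
    "B": range(1714, 1734),  # 1714-1733
    "C": range(1734, 1751),  # 1734-1750
}

sizes = {name: len(ids) for name, ids in groups.items()}
all_ids = [i for ids in groups.values() for i in ids]

# No ID appears twice, and the groups form one unbroken run of 57 IDs.
no_overlap = len(all_ids) == len(set(all_ids))
contiguous = all_ids == list(range(1694, 1751))
print(sizes, len(all_ids), no_overlap, contiguous)
```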
### 3. Test Wave Grouping
- **Wave T1 (7 IDs):** `730,846,847,848,890,891,893` (`JetStreamBatchingTests` + `JetStreamClusterTests1`)
- **Wave T2 (14 IDs):** `2640,2641,2643,2644,2645,2646,2647,2648,2649,2653,2655,2656,2658,2659`
- **Wave T3 (14 IDs):** `2660,2661,2662,2665,2666,2668,2669,2673,2676,2677,2678,2679,2680,2681`
- **Wave T4 (14 IDs):** `2682,2683,2684,2685,2686,2688,2691,2696,2697,2703,2715,2716,2717,2719`
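The same kind of check applies to the test waves: each wave should have the stated size, no ID should appear in two waves, and the total should be 49. A Python sketch using the IDs listed above (the `waves` name is illustrative):

```python
# Test IDs copied from the wave list above.
waves = {
    "T1": [730, 846, 847, 848, 890, 891, 893],
    "T2": [2640, 2641, 2643, 2644, 2645, 2646, 2647,
           2648, 2649, 2653, 2655, 2656, 2658, 2659],
    "T3": [2660, 2661, 2662, 2665, 2666, 2668, 2669,
           2673, 2676, 2677, 2678, 2679, 2680, 2681],
    "T4": [2682, 2683, 2684, 2685, 2686, 2688, 2691,
           2696, 2697, 2703, 2715, 2716, 2717, 2719],
}

wave_sizes = {name: len(ids) for name, ids in waves.items()}
all_test_ids = [i for ids in waves.values() for i in ids]
unique_count = len(set(all_test_ids))

# Waves T2-T4 together should account for the 42 RaftNodeTests entries.
raft_total = sum(wave_sizes[w] for w in ("T2", "T3", "T4"))
print(wave_sizes, len(all_test_ids), unique_count, raft_total)
```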
### 4. Verification Strategy (Design-Level)
- Every feature follows a per-feature loop with focused test evidence.
- Every test follows method-level then class-level verification.
- Stub detection runs after each loop and before status promotions.
- Build and targeted/full test gates are mandatory before checkpoint status updates.
- Checkpoints occur at every task boundary.
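This design does not define what the stub-detection scan looks for. As a rough illustration only, the Python sketch below flags lines in C# source text that match a few commonly used placeholder patterns; the pattern list and the `ProcessSnapshot` method name in the sample are assumptions, not part of the plan.

```python
import re

# Illustrative placeholder patterns (assumed, not taken from the plan).
STUB_PATTERNS = [
    re.compile(r"throw\s+new\s+NotImplementedException"),
    re.compile(r"Assert\.True\(\s*true\s*\)"),
    re.compile(r"//\s*TODO", re.IGNORECASE),
]


def find_stub_lines(source: str) -> list:
    """Return 1-based line numbers whose text matches any stub pattern."""
    hits = []
    for number, line in enumerate(source.splitlines(), start=1):
        if any(pattern.search(line) for pattern in STUB_PATTERNS):
            hits.append(number)
    return hits


# Hypothetical C# fragment used only to exercise the scan.
sample = """public void ProcessSnapshot()
{
    throw new NotImplementedException();
}
"""
stub_lines = find_stub_lines(sample)
```

A gate built on this idea would refuse a status promotion whenever the scan returns any line numbers for the files touched in that loop.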
### 5. Deferral Strategy
If infrastructure or dependency behavior blocks a feature/test:
1. Stop work on that ID.
2. Do not add placeholder implementation/assertions.
3. Mark `deferred` with explicit reason via `--override`.
4. Continue with next unblocked item.
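The four steps above amount to a loop that records a reason for each blocked ID and moves on without stopping the run. A minimal Python sketch follows; `run_items`, `blocker_for`, and the example blocker text are hypothetical, and the actual status update through PortTracker is not modeled here.

```python
def run_items(ids, blocker_for):
    """Walk `ids` in order; defer blocked IDs with their reason and keep going."""
    done = []
    deferred = {}
    for item_id in ids:
        reason = blocker_for(item_id)
        if reason:
            # Steps 1-3: stop work on this ID and record why, with no placeholder code.
            deferred[item_id] = reason
            continue
        # Step 4: unblocked items proceed normally.
        done.append(item_id)
    return done, deferred


# Hypothetical blocker table; dict.get returns None for unblocked IDs.
blockers = {2643: "hypothetical: needs unported snapshot helper"}
done, deferred = run_items([2640, 2641, 2643, 2644], blockers.get)
```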
## Risks and Mitigations
- **Dependency readiness risk (Batch 32):** enforce preflight before `batch start 35`.
- **Raft-heavy test concentration:** split the 42 `RaftNodeTests` into three equal waves of 14 (T2-T4) and checkpoint each wave.
- **Stub regression under volume:** hard anti-stub scans and strict status chunking (`<=15`).
- **Class ownership drift:** keep methods in mapped classes only (`JetStreamCluster`, `NatsStream`, `JetStreamEngine`, `NatsServer`).
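The `<=15` status-chunk limit is simple to enforce in code. As a sketch, the Python helper below slices any ID list into consecutive chunks of at most 15; the function name and constant are illustrative, and how each chunk is passed to the tracker is left out.

```python
from typing import Iterator, List, Sequence

MAX_STATUS_CHUNK = 15  # limit stated in this plan


def status_chunks(ids: Sequence[int], size: int = MAX_STATUS_CHUNK) -> Iterator[List[int]]:
    """Yield consecutive slices of `ids`, each holding at most `size` IDs."""
    if size < 1:
        raise ValueError("size must be positive")
    for start in range(0, len(ids), size):
        yield list(ids[start:start + size])


# Example: the 20 IDs of Group A split into one chunk of 15 and one of 5.
group_a = list(range(1694, 1714))
chunks = list(status_chunks(group_a))
```

Chunking by slice keeps the original ID order, so each status update maps to a contiguous, auditable run of IDs.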
## Non-Goals
- Executing the implementation in this session.
- Expanding scope beyond Batch 35 mappings.
- Changing batch dependencies/order in PortTracker.