natsnet/docs/plans/2026-02-27-batch-35-js-cluster-remaining-design.md
Joseph Doherty f8dce79ac0 Add batch plans for batches 31-36 (rounds 16-18)
Generated design docs and implementation plans via Codex for:
- Batch 31: Raft Part 2
- Batch 32: JS Cluster Meta
- Batch 33: JS Cluster Streams
- Batch 34: JS Cluster Consumers
- Batch 35: JS Cluster Remaining
- Batch 36: Stream Lifecycle

All plans include mandatory verification protocol and anti-stub guardrails.
Updated batches.md with file paths and planned status.
2026-02-27 17:01:31 -05:00

Batch 35 JS Cluster Remaining Design

Date: 2026-02-27
Batch: 35 (JS Cluster Remaining)
Scope: 57 features + 49 unit tests
Dependency: batch 32 (JS Cluster Meta)
Go source: golang/nats-server/server/jetstream_cluster.go

Problem

Batch 35 covers the remaining JetStream cluster behavior in server/jetstream_cluster.go (roughly lines 8766-10866): delete-range and assignment encoding/decoding, stream snapshot/catchup processing, cluster info assembly, catchup throttling counters, and sync subject helpers. It also includes 49 tests, mostly RaftNodeTests, that validate catchup/truncation and snapshot correctness.

The plan must prevent false progress: no placeholder feature ports and no fake-pass tests.

Context Findings

Required command outputs

  • /usr/local/share/dotnet/dotnet run --project tools/NatsNet.PortTracker -- batch show 35 --db porting.db
    • Status: pending
    • Features: 57 (all deferred)
    • Tests: 49 (all deferred)
    • Depends on: 32
    • Go file: server/jetstream_cluster.go
  • /usr/local/share/dotnet/dotnet run --project tools/NatsNet.PortTracker -- batch list --db porting.db
    • Confirms chain around this area: 32 -> 33/34/35.
  • /usr/local/share/dotnet/dotnet run --project tools/NatsNet.PortTracker -- report summary --db porting.db
    • Overall progress: 1924/6942 (27.7%)

Environment note: dotnet was not on PATH in this shell; use /usr/local/share/dotnet/dotnet.

Feature ownership distribution (from porting.db)

  • NatsStream: 27
  • JetStreamCluster: 19
  • NatsServer: 8
  • JetStreamEngine: 3

Test distribution (from porting.db)

  • RaftNodeTests: 42
  • JetStreamClusterTests1: 6
  • JetStreamBatchingTests: 1
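
As a quick sanity check, the two distributions above should add up to the batch scope (57 features, 49 tests). A minimal sketch, using only the counts listed in this section; the variable names are illustrative and not part of PortTracker:

```python
# Counts copied from the two distribution lists above.
feature_owners = {
    "NatsStream": 27,
    "JetStreamCluster": 19,
    "NatsServer": 8,
    "JetStreamEngine": 3,
}
test_classes = {
    "RaftNodeTests": 42,
    "JetStreamClusterTests1": 6,
    "JetStreamBatchingTests": 1,
}

feature_total = sum(feature_owners.values())
test_total = sum(test_classes.values())

# Both totals must match the declared batch scope.
assert feature_total == 57, feature_total
assert test_total == 49, test_total
```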

Constraints and Success Criteria

  • Planning only; no implementation execution in this session.
  • Apply the same rigor as Batch 0, extended to cover both features and tests.
  • Feature tasks must be grouped in chunks of max ~20 features.
  • Status updates must use batch-update chunks of at most 15 IDs.
  • Blocked work must be marked deferred with explicit reason, never stubbed.

Success means all 57 feature IDs and 49 test IDs are either:

  • promoted with verification evidence (complete/verified), or
  • kept deferred with specific blocker notes.
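
The success rule can be expressed as a small check over tracked items. This is only a sketch: the `Item` record and its `evidence` and `reason` fields are hypothetical stand-ins for whatever porting.db actually stores, and the status strings are taken from the wording above.

```python
from dataclasses import dataclass


@dataclass
class Item:
    item_id: int
    status: str         # "complete", "verified", or "deferred"
    evidence: str = ""  # hypothetical: pointer to build/test output
    reason: str = ""    # hypothetical: specific blocker note


def unresolved(items):
    """Return IDs that are neither promoted with evidence nor deferred with a reason."""
    bad = []
    for it in items:
        promoted = it.status in ("complete", "verified") and bool(it.evidence.strip())
        deferred = it.status == "deferred" and bool(it.reason.strip())
        if not (promoted or deferred):
            bad.append(it.item_id)
    return bad
```

Under this rule an item that is still deferred without a blocker note counts as unresolved, which matches the intent that deferral must always carry a specific reason.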

Approaches

Approach A: Monolithic pass (all features, then all tests)

  • Pros: simple sequencing.
  • Cons: high risk, poor auditability, hard to isolate regressions.

Approach B: Chunked feature groups followed by test waves

  • Pros: bounded scope, clearer rollback points, aligns with max-20 feature grouping and max-15 status updates.
  • Cons: more command overhead.

Approach C: Test-first for all 49 tests before feature completion

  • Pros: immediate behavior pressure.
  • Cons: high churn because most tests depend on unported catchup/snapshot internals.

Decision: Approach B.

Proposed Design

1. Code Organization

Use behavior-focused files while keeping existing class ownership:

  • dotnet/src/ZB.MOM.NatsNet.Server/JetStream/JetStreamClusterTypes.cs
    • assignment/sync subject encode/decode helpers and cluster utility functions.
  • dotnet/src/ZB.MOM.NatsNet.Server/JetStream/NatsStream.cs
    • snapshot and catchup state transitions plus inbound cluster message processing.
  • dotnet/src/ZB.MOM.NatsNet.Server/JetStream/JetStreamTypes.cs
    • offline/online cluster info and alternate-stream assembly.
  • dotnet/src/ZB.MOM.NatsNet.Server/NatsServer.cs and/or new partial NatsServer.JetStreamClusterRemaining.cs
    • clustered consumer request path and gcb accounting helpers.

2. Feature Grouping (max ~20)

  • Group A (20 IDs): 1694-1713
    Delete-range, consumer assignment, stream/batch encode-decode, snapshot support/state capture, clustered inbound message entry point.
  • Group B (20 IDs): 1714-1733
    Delete trace, sync request calculation, snapshot delete handling, catchup peer lifecycle, snapshot/catchup processing, stream sync handler, cluster info base methods.
  • Group C (17 IDs): 1734-1750
    Cluster info checks, stream alternates/info request handling, gcb accounting/kick channel, run-catchup loop, sync subject helper family.
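
Because each group holds more than 15 IDs, a status update for a whole group has to be split to respect the 15-ID limit. A minimal sketch of that split, assuming the ID ranges above are contiguous as written; `chunk` is an illustrative helper, not a PortTracker function:

```python
def chunk(ids, size=15):
    """Split a list of IDs into consecutive slices of at most `size` items."""
    return [ids[i:i + size] for i in range(0, len(ids), size)]


# Feature groups from the list above (range end is exclusive).
groups = {
    "A": list(range(1694, 1714)),
    "B": list(range(1714, 1734)),
    "C": list(range(1734, 1751)),
}

# Number of IDs in each status update, per group.
update_sizes = {name: [len(c) for c in chunk(ids)] for name, ids in groups.items()}
print(update_sizes)
```

With these ranges, Groups A and B each need two updates (15 and 5 IDs) and Group C needs two (15 and 2 IDs).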

3. Test Wave Grouping

  • Wave T1 (7 IDs): 730,846,847,848,890,891,893 (JetStreamBatchingTests + JetStreamClusterTests1)
  • Wave T2 (14 IDs): 2640,2641,2643,2644,2645,2646,2647,2648,2649,2653,2655,2656,2658,2659
  • Wave T3 (14 IDs): 2660,2661,2662,2665,2666,2668,2669,2673,2676,2677,2678,2679,2680,2681
  • Wave T4 (14 IDs): 2682,2683,2684,2685,2686,2688,2691,2696,2697,2703,2715,2716,2717,2719
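
The wave lists can be checked mechanically before work starts. The sketch below uses the IDs exactly as listed above and confirms that they total 49, contain no duplicates, and that each wave fits in a single status update of at most 15 IDs; `check_waves` is an illustrative helper name.

```python
WAVES = {
    "T1": [730, 846, 847, 848, 890, 891, 893],
    "T2": [2640, 2641, 2643, 2644, 2645, 2646, 2647,
           2648, 2649, 2653, 2655, 2656, 2658, 2659],
    "T3": [2660, 2661, 2662, 2665, 2666, 2668, 2669,
           2673, 2676, 2677, 2678, 2679, 2680, 2681],
    "T4": [2682, 2683, 2684, 2685, 2686, 2688, 2691,
           2696, 2697, 2703, 2715, 2716, 2717, 2719],
}


def check_waves(waves, expected_total=49, max_update=15):
    """Return a list of human-readable problems; an empty list means the plan is consistent."""
    problems = []
    all_ids = [i for ids in waves.values() for i in ids]
    if len(all_ids) != expected_total:
        problems.append(f"expected {expected_total} IDs, found {len(all_ids)}")
    if len(set(all_ids)) != len(all_ids):
        problems.append("duplicate test IDs across waves")
    for name, ids in waves.items():
        if len(ids) > max_update:
            problems.append(f"wave {name} exceeds {max_update} IDs")
    return problems


problems = check_waves(WAVES)
```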

4. Verification Strategy (Design-Level)

  • Every feature follows a per-feature loop with focused test evidence.
  • Every test follows method-level then class-level verification.
  • Stub detection runs after each loop and before status promotions.
  • Build and targeted/full test gates are mandatory before checkpoint status updates.
  • Checkpoints occur at every task boundary.
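
One way to picture the stub-detection step is a line scan over ported C# source for common placeholder patterns. The patterns below are assumptions chosen for illustration; this design does not specify which patterns the real scan uses.

```python
import re

# Assumed placeholder patterns; the actual anti-stub scan may use different ones.
STUB_PATTERNS = [
    re.compile(r"throw\s+new\s+NotImplementedException"),
    re.compile(r"Assert\.True\(\s*true\s*\)"),
    re.compile(r"//\s*TODO", re.IGNORECASE),
]


def find_stubs(source):
    """Return (line_number, stripped_line) for every line matching a stub pattern."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        if any(p.search(line) for p in STUB_PATTERNS):
            hits.append((lineno, line.strip()))
    return hits
```

A non-empty result would block status promotion for the affected IDs until the placeholder is replaced or the item is deferred.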

5. Deferral Strategy

If infrastructure or dependency behavior blocks a feature/test:

  1. Stop work on that ID.
  2. Do not add placeholder implementation/assertions.
  3. Mark deferred with explicit reason via --override.
  4. Continue with next unblocked item.
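
The four steps above amount to a loop that records a reason and moves on instead of stubbing. A minimal sketch, in which `Blocked` and `port_one` are hypothetical names standing in for the real per-item work:

```python
class Blocked(Exception):
    """Raised when infrastructure or a dependency prevents porting an item."""


def process(ids, port_one):
    """Port each ID in order; blocked IDs are deferred with their reason, never stubbed."""
    promoted = []
    deferred = {}
    for item_id in ids:
        try:
            port_one(item_id)                # real implementation work happens here
        except Blocked as exc:
            deferred[item_id] = str(exc)     # steps 1-3: stop, no placeholder, record reason
            continue                         # step 4: move to the next unblocked item
        promoted.append(item_id)
    return promoted, deferred
```

The recorded reasons are what would later be passed along when marking the items deferred.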

Risks and Mitigations

  • Dependency readiness risk (Batch 32): enforce preflight before batch start 35.
  • Raft-heavy test concentration: split RaftNodeTests into three equal waves and checkpoint each wave.
  • Stub regression under volume: hard anti-stub scans and strict status chunking (<=15).
  • Class ownership drift: keep methods in mapped classes only (JetStreamCluster, NatsStream, JetStreamEngine, NatsServer).

Non-Goals

  • Executing the implementation in this session.
  • Expanding scope beyond Batch 35 mappings.
  • Changing batch dependencies/order in PortTracker.