natsnet/docs/plans/2026-02-27-batch-33-js-cluster-streams-design.md
Joseph Doherty f8dce79ac0 Add batch plans for batches 31-36 (rounds 16-18)
Generated design docs and implementation plans via Codex for:
- Batch 31: Raft Part 2
- Batch 32: JS Cluster Meta
- Batch 33: JS Cluster Streams
- Batch 34: JS Cluster Consumers
- Batch 35: JS Cluster Remaining
- Batch 36: Stream Lifecycle

All plans include mandatory verification protocol and anti-stub guardrails.
Updated batches.md with file paths and planned status.
2026-02-27 17:01:31 -05:00

Batch 33 JS Cluster Streams Design

Date: 2026-02-27
Batch: 33 (JS Cluster Streams)
Scope: 58 features + 22 unit tests
Dependency: batch 32 (JS Cluster Meta)
Go source: golang/nats-server/server/jetstream_cluster.go

Problem

Batch 33 ports JetStream cluster stream/consumer assignment execution paths from server/jetstream_cluster.go, covering cluster monitoring loops, metadata snapshots, raft-group creation, stream-entry application, leader-change advisories, and stream/consumer create-update-delete flows.

The mapped tests are spread across JetStream cluster, monitor, JWT, concurrency, and raft suites. The design objective is to define a strict, auditable implementation path that avoids placeholder code and only advances tracker statuses with build/test evidence.

Context Findings

Required command outputs

  • batch show 33 --db porting.db
    • Status: pending
    • Features: 58 (all deferred)
    • Tests: 22 (all deferred)
    • Depends on: 32
    • Go file: server/jetstream_cluster.go
  • batch list --db porting.db
    • Batch chain includes 32 -> 33 -> 34 for JS cluster progression.
  • report summary --db porting.db
    • Overall progress: 1924/6942 (27.7%)

Note: in this environment, dotnet is not on PATH; use /usr/local/share/dotnet/dotnet when needed.

Current .NET state relevant to Batch 33

  • Cluster data structures exist in dotnet/src/ZB.MOM.NatsNet.Server/JetStream/JetStreamClusterTypes.cs.
  • Core types exist in:
    • dotnet/src/ZB.MOM.NatsNet.Server/JetStream/JetStreamTypes.cs
    • dotnet/src/ZB.MOM.NatsNet.Server/JetStream/NatsStream.cs
    • dotnet/src/ZB.MOM.NatsNet.Server/Accounts/Account.cs
    • dotnet/src/ZB.MOM.NatsNet.Server/NatsServer.*.cs (partial server files)
  • Backlog test coverage is mostly placeholder-level today; JetStreamClusterTests2.Impltests.cs is present, while several mapped classes (for example JetStreamClusterTests3, JetStreamClusterLongTests, RaftNodeTests) still need concrete batch coverage.

Clarified Constraints

  • Planning only in this session: no implementation execution.
  • Mandatory guardrails from Batch 0 must be carried forward and adapted to features + tests.
  • Feature work must be chunked into groups of at most ~20 features.
  • Status updates must use batch-update chunks of max 15 IDs.
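
The 15-ID cap on status updates is smaller than the ~20-feature group size, so a full group needs at least two update calls. A minimal sketch of that chunking follows, assuming the tracker accepts a plain list of IDs per call; the `chunk_ids` helper is hypothetical and not part of the porting tool.

```python
from typing import Iterator, Sequence

MAX_IDS_PER_UPDATE = 15  # cap taken from the constraint above


def chunk_ids(ids: Sequence[int], size: int = MAX_IDS_PER_UPDATE) -> Iterator[list[int]]:
    """Yield consecutive slices of `ids`, each holding at most `size` IDs."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(ids), size):
        yield list(ids[start:start + size])


# Example: a 20-feature group (IDs 1578-1597) splits into 15 + 5.
group_a = list(range(1578, 1598))
chunks = list(chunk_ids(group_a))
print([len(c) for c in chunks])  # [15, 5]
```

Each yielded chunk would map to one batch-update invocation, which keeps every status change traceable to a single command output.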

Approaches

Approach A: Monolithic pass (all 58 features + 22 tests)

  • Pros: fewer task boundaries.
  • Cons: weak traceability and high risk of hidden stubs/regressions.

Approach B: Chunked feature groups and test waves

  • Pros: bounded scope per task, stronger verification evidence, easier rollback/debug.
  • Cons: more command overhead and checkpoint ceremony.

Approach C: Test-heavy-first before major feature porting

  • Pros: early behavior signal.
  • Cons: high churn because many mapped tests depend on stream/consumer cluster plumbing not yet ported.

Decision: Approach B.

Proposed Design

1. File ownership model

  • JetStream cluster stream orchestration methods in JetStreamTypes.cs or a new focused partial file (JetStream.ClusterStreams.cs).
  • NatsStream raft/cluster helpers in NatsStream.cs or NatsStream.Cluster.cs.
  • RaftGroup, StreamAssignment, ConsumerAssignment, and cluster helpers in JetStreamClusterTypes.cs (or focused partials if split improves reviewability).
  • Server-facing operations and advisories in a new/updated server partial (NatsServer.JetStreamClusterStreams.cs).

2. Feature slicing (max ~20 each)

  • Feature Group A (20 IDs): cluster monitor + snapshot/recovery primitives
    IDs: 1578-1597
  • Feature Group B (20 IDs): meta-entry application + raft-group/stream monitoring + leader-change core
    IDs: 1598-1617
  • Feature Group C (18 IDs): advisory + stream assignment/process lifecycle + consumer assignment/process lifecycle
    IDs: 1618-1635

3. Test slicing

  • Test Wave T1 (5 IDs): cluster long-path + JWT/monitor/concurrency anchors
    IDs: 1118,1214,1402,2144,2504
  • Test Wave T2 (9 IDs): raft elections and term behavior (early raft set)
    IDs: 2616,2620,2622,2624,2627,2628,2630,2631,2634
  • Test Wave T3 (8 IDs): raft replay/catchup/chain-of-blocks paths
    IDs: 2637,2638,2652,2657,2670,2671,2698,2699
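
The slices in sections 2 and 3 can be checked mechanically before any work starts. The sketch below uses the ID ranges and lists given above and confirms that the groups stay within the size cap, do not overlap, and add up to 58 features and 22 tests. The `check_slices` helper is hypothetical, not part of the porting tool.

```python
FEATURE_GROUPS = {
    "A": range(1578, 1598),  # IDs 1578-1597
    "B": range(1598, 1618),  # IDs 1598-1617
    "C": range(1618, 1636),  # IDs 1618-1635
}
TEST_WAVES = {
    "T1": [1118, 1214, 1402, 2144, 2504],
    "T2": [2616, 2620, 2622, 2624, 2627, 2628, 2630, 2631, 2634],
    "T3": [2637, 2638, 2652, 2657, 2670, 2671, 2698, 2699],
}
MAX_GROUP_SIZE = 20


def check_slices(slices, expected_total, max_size=None):
    """Return a list of problems; an empty list means the slicing is consistent."""
    problems = []
    seen = {}  # maps each ID to the slice that first claimed it
    for name, ids in slices.items():
        ids = list(ids)
        if max_size is not None and len(ids) > max_size:
            problems.append(f"{name}: {len(ids)} IDs exceeds {max_size}")
        for item_id in ids:
            if item_id in seen:
                problems.append(f"ID {item_id} is in both {seen[item_id]} and {name}")
            seen[item_id] = name
    if len(seen) != expected_total:
        problems.append(f"expected {expected_total} unique IDs, found {len(seen)}")
    return problems


print(check_slices(FEATURE_GROUPS, expected_total=58, max_size=MAX_GROUP_SIZE))  # []
print(check_slices(TEST_WAVES, expected_total=22))  # []
```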

4. Verification architecture

  • Per-feature loop: feature show -> focused failing test -> minimal implementation -> stub scan -> build gate -> targeted test gate -> status transition.
  • Per-test loop: test show -> Go behavioral port -> single-test run evidence -> class-level run -> status transition.
  • Checkpoint after every feature group and test wave, including full unit suite run.
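
One way to read the per-feature loop is as an ordered list of gates in which the status transition is permitted only when every earlier gate has passed. The sketch below illustrates that reading. The gate names come from the loop above, while the check callables are hypothetical stand-ins for the real commands (build, test run, stub scan).

```python
from typing import Callable, Optional

# Gate names follow the per-feature loop; the callables supplied by the
# caller are hypothetical placeholders for the real tooling.
FEATURE_GATES = [
    "feature show",
    "focused failing test",
    "minimal implementation",
    "stub scan",
    "build gate",
    "targeted test gate",
]


def run_gates(checks: dict[str, Callable[[], bool]]) -> Optional[str]:
    """Run gates in order; return the first failing gate's name, or None if all pass."""
    for gate in FEATURE_GATES:
        check = checks.get(gate)
        if check is None or not check():
            return gate  # a missing check counts as a failure
    return None


def may_transition(checks: dict[str, Callable[[], bool]]) -> bool:
    """A status transition is allowed only when every gate has passed."""
    return run_gates(checks) is None


# Example: the build gate fails, so later gates never run and no transition happens.
checks = {gate: (lambda: True) for gate in FEATURE_GATES}
checks["build gate"] = lambda: False
print(run_gates(checks))       # build gate
print(may_transition(checks))  # False
```

Treating a missing check as a failure is a deliberate choice in this sketch: it prevents a skipped gate from being mistaken for a passed one.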

5. Deferred handling model

If an item is blocked by missing dependency behavior or infrastructure, mark it deferred immediately with an explicit reason via --override; do not leave stubs in source or tests.

Risks and Mitigations

  • Dependency risk: Batch 32 is a prerequisite.
    Mitigation: block all Batch 33 status transitions until dependency preflight confirms readiness.
  • Stub-risk in backlog tests: existing placeholder-style tests can produce false progress.
    Mitigation: required stub scan + assertion-quality checks + single-test execution evidence.
  • Ownership ambiguity risk: methods span JetStream, NatsStream, JetStreamCluster, NatsServer.
    Mitigation: explicit file ownership map and grouped tasking by domain.
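
The stub scan named in the mitigation above could be as simple as a line-by-line pattern match over touched C# files. The sketch below is illustrative only: this document does not list the forbidden patterns, so the three regular expressions are assumed examples and would need to be replaced with the actual guardrail list.

```python
import re

# Assumed examples of stub patterns in C# sources; the authoritative list
# is not given in this document and should replace these.
STUB_PATTERNS = [
    re.compile(r"throw\s+new\s+NotImplementedException"),
    re.compile(r"//\s*TODO", re.IGNORECASE),
    re.compile(r"Assert\.True\(\s*true\s*\)"),
]


def scan_for_stubs(source: str) -> list[tuple[int, str]]:
    """Return (line_number, stripped_line) pairs for lines matching a stub pattern."""
    hits = []
    for number, line in enumerate(source.splitlines(), start=1):
        if any(pattern.search(line) for pattern in STUB_PATTERNS):
            hits.append((number, line.strip()))
    return hits


# Hypothetical C# snippet used only to exercise the scanner.
sample = """public void Example()
{
    throw new NotImplementedException();
}
"""
print(scan_for_stubs(sample))  # [(3, 'throw new NotImplementedException();')]
```

A pattern scan only catches textual stubs, which is why the mitigation pairs it with assertion-quality checks and single-test execution evidence.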

Success Criteria

  • All 58 features are either verified with evidence or deferred with explicit blocker reason.
  • All 22 tests are either verified with evidence or deferred with explicit blocker reason.
  • No forbidden stub patterns in touched files.
  • Batch progress is auditable from command outputs and chunked status updates.
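
The first two criteria amount to an audit rule: every item ends either verified with evidence or deferred with a reason, and nothing else counts. A minimal sketch of such an audit follows, assuming a hypothetical in-memory record per item; the real tracker schema in porting.db is not described here.

```python
from dataclasses import dataclass


@dataclass
class Item:
    """Hypothetical record layout, for illustration only."""
    item_id: int
    status: str          # "verified" or "deferred"; other values fail the audit
    evidence: str = ""   # e.g. a reference to captured build/test output
    reason: str = ""     # blocker reason, required when deferred


def audit(items: list[Item]) -> list[int]:
    """Return IDs that are neither verified with evidence nor deferred with a reason."""
    failing = []
    for item in items:
        verified = item.status == "verified" and item.evidence.strip()
        deferred = item.status == "deferred" and item.reason.strip()
        if not (verified or deferred):
            failing.append(item.item_id)
    return failing


items = [
    Item(1578, "verified", evidence="build + targeted test output"),
    Item(1579, "deferred", reason="blocked on Batch 32 behavior"),
    Item(1580, "verified"),  # no evidence recorded, so it fails the audit
]
print(audit(items))  # [1580]
```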

Non-Goals

  • Executing the implementation; this document covers planning only.
  • Extending scope into Batch 34/35.
  • Building a full distributed integration harness beyond mapped unit/backlog verification needs.