natsnet/docs/plans/2026-02-27-batch-33-js-cluster-streams-design.md
Joseph Doherty f8dce79ac0 Add batch plans for batches 31-36 (rounds 16-18)
Generated design docs and implementation plans via Codex for:
- Batch 31: Raft Part 2
- Batch 32: JS Cluster Meta
- Batch 33: JS Cluster Streams
- Batch 34: JS Cluster Consumers
- Batch 35: JS Cluster Remaining
- Batch 36: Stream Lifecycle

All plans include mandatory verification protocol and anti-stub guardrails.
Updated batches.md with file paths and planned status.
2026-02-27 17:01:31 -05:00

# Batch 33 JS Cluster Streams Design
**Date:** 2026-02-27
**Batch:** 33 (`JS Cluster Streams`)
**Scope:** 58 features + 22 unit tests
**Dependency:** batch `32` (`JS Cluster Meta`)
**Go source:** `golang/nats-server/server/jetstream_cluster.go`
## Problem
Batch 33 ports JetStream cluster stream/consumer assignment execution paths from `server/jetstream_cluster.go`, covering cluster monitoring loops, metadata snapshots, raft-group creation, stream-entry application, leader-change advisories, and stream/consumer create-update-delete flows.
The mapped tests are spread across JetStream cluster, monitor, JWT, concurrency, and raft suites. The design objective is to define a strict, auditable implementation path that avoids placeholder code and only advances tracker statuses with build/test evidence.
## Context Findings
### Required command outputs
- `batch show 33 --db porting.db`
  - Status: `pending`
  - Features: `58` (all `deferred`)
  - Tests: `22` (all `deferred`)
  - Depends on: `32`
  - Go file: `server/jetstream_cluster.go`
- `batch list --db porting.db`
  - Batch chain includes `32 -> 33 -> 34` for JS cluster progression.
- `report summary --db porting.db`
  - Overall progress: `1924/6942 (27.7%)`

Note: in this environment, `dotnet` is not on `PATH`; use `/usr/local/share/dotnet/dotnet` when needed.
### Current .NET state relevant to Batch 33
- Cluster data structures exist in `dotnet/src/ZB.MOM.NatsNet.Server/JetStream/JetStreamClusterTypes.cs`.
- Core types exist in:
  - `dotnet/src/ZB.MOM.NatsNet.Server/JetStream/JetStreamTypes.cs`
  - `dotnet/src/ZB.MOM.NatsNet.Server/JetStream/NatsStream.cs`
  - `dotnet/src/ZB.MOM.NatsNet.Server/Accounts/Account.cs`
  - `dotnet/src/ZB.MOM.NatsNet.Server/NatsServer.*.cs` (partial server files)
- Backlog test coverage is mostly placeholder-level today; `JetStreamClusterTests2.Impltests.cs` is present, while several mapped classes (for example `JetStreamClusterTests3`, `JetStreamClusterLongTests`, `RaftNodeTests`) still need concrete batch coverage.
## Clarified Constraints
- Planning only in this session: no implementation execution.
- Mandatory guardrails from Batch 0 must be carried forward and adapted to features + tests.
- Feature work must be chunked into groups of at most ~20 features.
- Status updates must use `batch-update` chunks of max 15 IDs.
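
The update-size limit above can be illustrated with a small helper. This is a minimal sketch in Python, not part of the porting tooling: `chunk_ids` is a hypothetical name, the ID range is the 58 feature IDs this plan assigns to Batch 33, and how each chunk is handed to `batch-update` is deliberately left out.

```python
from typing import Iterator, List


def chunk_ids(ids: List[int], max_size: int) -> Iterator[List[int]]:
    """Yield consecutive slices of `ids`, each holding at most `max_size` items."""
    if max_size < 1:
        raise ValueError("max_size must be positive")
    for start in range(0, len(ids), max_size):
        yield ids[start:start + max_size]


# Feature IDs covered by this batch (58 in total, taken from the plan).
feature_ids = list(range(1578, 1636))

# Status updates go out in chunks of at most 15 IDs.
update_chunks = list(chunk_ids(feature_ids, 15))
for chunk in update_chunks:
    # One comma-separated line per status-update chunk.
    print(",".join(str(i) for i in chunk))
```

With 58 IDs and a limit of 15, this yields four chunks of sizes 15, 15, 15, and 13.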
## Approaches
### Approach A: Monolithic pass (all 58 features + 22 tests)
- Pros: fewer task boundaries.
- Cons: weak traceability and high risk of hidden stubs/regressions.
### Approach B (Recommended): Three feature groups + three test waves with hard checkpoints
- Pros: bounded scope per task, stronger verification evidence, easier rollback/debug.
- Cons: more command overhead and checkpoint ceremony.
### Approach C: Test-heavy-first before major feature porting
- Pros: early behavior signal.
- Cons: high churn because many mapped tests depend on stream/consumer cluster plumbing not yet ported.
**Decision:** Approach B.
## Proposed Design
### 1. File ownership model
- `JetStream` cluster stream orchestration methods in `JetStreamTypes.cs` or a new focused partial file (`JetStream.ClusterStreams.cs`).
- `NatsStream` raft/cluster helpers in `NatsStream.cs` or `NatsStream.Cluster.cs`.
- `RaftGroup`, `StreamAssignment`, `ConsumerAssignment`, and cluster helpers in `JetStreamClusterTypes.cs` (or focused partials if split improves reviewability).
- Server-facing operations and advisories in a new/updated server partial (`NatsServer.JetStreamClusterStreams.cs`).
### 2. Feature slicing (max ~20 each)
- **Feature Group A (20 IDs):** cluster monitor + snapshot/recovery primitives
  IDs: `1578-1597`
- **Feature Group B (20 IDs):** meta-entry application + raft-group/stream monitoring + leader-change core
  IDs: `1598-1617`
- **Feature Group C (18 IDs):** advisory + stream assignment/process lifecycle + consumer assignment/process lifecycle
  IDs: `1618-1635`
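
The slicing above can be sanity-checked mechanically. The following is a rough sketch, assuming the ranges are inclusive and meant to be contiguous; `check_groups` and the constant names are hypothetical and exist only for illustration.

```python
# Inclusive ID ranges copied from the feature slicing above.
FEATURE_GROUPS = {
    "A": (1578, 1597),
    "B": (1598, 1617),
    "C": (1618, 1635),
}
MAX_GROUP_SIZE = 20
EXPECTED_TOTAL = 58


def check_groups(groups, max_size, expected_total):
    """Return a list of problems; an empty list means the slicing is consistent."""
    problems = []
    total = 0
    previous_end = None
    # Walk the groups in ascending order of their first ID.
    for name, (first, last) in sorted(groups.items(), key=lambda kv: kv[1][0]):
        size = last - first + 1
        total += size
        if size > max_size:
            problems.append(f"group {name} has {size} IDs (max {max_size})")
        if previous_end is not None and first != previous_end + 1:
            problems.append(f"group {name} does not start right after {previous_end}")
        previous_end = last
    if total != expected_total:
        problems.append(f"groups cover {total} IDs, expected {expected_total}")
    return problems


problems = check_groups(FEATURE_GROUPS, MAX_GROUP_SIZE, EXPECTED_TOTAL)
print(problems or "feature slicing is consistent")
```

A check like this could run before any status update so that a mistyped range is caught before it reaches the tracker.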
### 3. Test slicing
- **Test Wave T1 (5 IDs):** cluster long-path + JWT/monitor/concurrency anchors
  IDs: `1118,1214,1402,2144,2504`
- **Test Wave T2 (9 IDs):** raft elections and term behavior (early raft set)
  IDs: `2616,2620,2622,2624,2627,2628,2630,2631,2634`
- **Test Wave T3 (8 IDs):** raft replay/catchup/chain-of-blocks paths
  IDs: `2637,2638,2652,2657,2670,2671,2698,2699`
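
Because the test waves list explicit IDs rather than ranges, a duplicate or a dropped ID is easy to miss by eye. A minimal sketch of a check, with the variable names chosen here purely for illustration:

```python
from collections import Counter

# Test IDs copied from the three waves above.
TEST_WAVES = {
    "T1": [1118, 1214, 1402, 2144, 2504],
    "T2": [2616, 2620, 2622, 2624, 2627, 2628, 2630, 2631, 2634],
    "T3": [2637, 2638, 2652, 2657, 2670, 2671, 2698, 2699],
}

# Count how often each ID appears across all waves.
counts = Counter(test_id for ids in TEST_WAVES.values() for test_id in ids)
duplicates = sorted(test_id for test_id, n in counts.items() if n > 1)
total = sum(len(ids) for ids in TEST_WAVES.values())

print(f"total={total} duplicates={duplicates}")
```

The total should match the 22 tests in the batch scope, and the duplicate list should be empty.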
### 4. Verification architecture
- Per-feature loop: `feature show` -> focused failing test -> minimal implementation -> stub scan -> build gate -> targeted test gate -> status transition.
- Per-test loop: `test show` -> Go behavioral port -> single-test run evidence -> class-level run -> status transition.
- Checkpoint after every feature group and test wave, including a full unit-suite run.
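
The ordering constraint in the per-feature loop (no status transition unless every earlier gate passed) can be sketched as a short-circuiting pipeline. This is an illustration only: `run_gates` is a hypothetical helper, and the lambdas stand in for the real stub scan, build, and test commands, which are not specified here.

```python
from typing import Callable, List, Tuple

# A gate is a display name plus a callable that reports pass/fail.
Gate = Tuple[str, Callable[[], bool]]


def run_gates(gates: List[Gate]) -> Tuple[bool, List[str]]:
    """Run gates in order and stop at the first failure.

    Returns (all_passed, names_of_gates_that_ran).
    """
    ran = []
    for name, check in gates:
        ran.append(name)
        if not check():
            return False, ran
    return True, ran


# Placeholder checks; real gates would invoke the scan, build, and test steps.
feature_gates = [
    ("stub scan", lambda: True),
    ("build gate", lambda: True),
    ("targeted test gate", lambda: False),  # simulate a failing test
]

passed, ran = run_gates(feature_gates)
print("status transition allowed" if passed else f"stopped at: {ran[-1]}")
```

Stopping at the first failed gate keeps the evidence trail unambiguous: the last gate name recorded is the one that blocked the transition.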
### 5. Deferred handling model
If an item is blocked by missing dependency behavior or infrastructure, mark it `deferred` immediately, with an explicit reason supplied via `--override`; do not leave stubs in source or tests.
## Risks and Mitigations
- **Dependency risk:** Batch 32 is prerequisite.
  **Mitigation:** block all Batch 33 status transitions until dependency preflight confirms readiness.
- **Stub-risk in backlog tests:** existing placeholder-style tests can produce false progress.
  **Mitigation:** required stub scan + assertion-quality checks + single-test execution evidence.
- **Ownership ambiguity risk:** methods span `JetStream`, `NatsStream`, `JetStreamCluster`, `NatsServer`.
  **Mitigation:** explicit file ownership map and grouped tasking by domain.
## Success Criteria
- All 58 features are either `verified` with evidence or `deferred` with explicit blocker reason.
- All 22 tests are either `verified` with evidence or `deferred` with explicit blocker reason.
- No forbidden stub patterns in touched files.
- Batch progress is auditable from command outputs and chunked status updates.
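
The "no forbidden stub patterns" criterion lends itself to an automated scan. The sketch below is an assumption-heavy illustration: this document does not list the forbidden patterns, so the three regular expressions are hypothetical examples of what such a list might contain, and `scan_for_stubs` is a made-up helper that scans a string rather than real files.

```python
import re

# Hypothetical patterns; the actual forbidden list is not given in this document.
STUB_PATTERNS = [
    re.compile(r"throw\s+new\s+NotImplementedException"),
    re.compile(r"//\s*TODO"),
    re.compile(r"Assert\.True\(\s*true\s*\)"),
]


def scan_for_stubs(source: str):
    """Return (line_number, stripped_line) pairs for lines matching any pattern."""
    hits = []
    for number, line in enumerate(source.splitlines(), start=1):
        if any(pattern.search(line) for pattern in STUB_PATTERNS):
            hits.append((number, line.strip()))
    return hits


# Illustrative C# fragment containing one stub.
sample = """public void ProcessStreamAssignment()
{
    throw new NotImplementedException();
}
"""

hits = scan_for_stubs(sample)
for number, line in hits:
    print(f"line {number}: {line}")
```

In practice the scan would be limited to files touched in the current feature group, so that pre-existing placeholders elsewhere do not block the checkpoint.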
## Non-Goals
- Executing the implementation described in this document (this session is planning only).
- Extending scope into Batch 34/35.
- Building a full distributed integration harness beyond mapped unit/backlog verification needs.