
# Batch 34 JS Cluster Consumers Design

**Date:** 2026-02-27
**Batch:** 34 (`JS Cluster Consumers`)
**Scope:** 58 features + 160 unit tests
**Dependency:** batch `33` (`JS Cluster Streams`)
**Go source:** `golang/nats-server/server/jetstream_cluster.go`

## Problem

Batch 34 ports JetStream cluster consumer operations from `server/jetstream_cluster.go` (lines ~5935-8744), including consumer assignment/inflight reconciliation, replicated ack processing, leader-change handling, peer-group placement logic, clustered stream request handling, and stream/consumer mutation encoding/decoding.

The mapped test set is broad (160 tests across 29 test classes), so the design must enforce strict evidence gates and avoid fake progress through placeholder implementations.

## Context Findings

### Required command outputs

- `batch show 34 --db porting.db`
  - Status: `pending`
  - Features: `58` (all `deferred`)
  - Tests: `160` (all `deferred`)
  - Depends on: `33`
  - Go file: `server/jetstream_cluster.go`
- `batch list --db porting.db`
  - Batch chain includes `33 -> 34 -> 38` for JS cluster consumer progression.
- `report summary --db porting.db`
  - Overall progress: `1924/6942 (27.7%)`

Environment note: `dotnet` was not on `PATH` in this shell, so commands must fall back to the full path `/usr/local/share/dotnet/dotnet`.

### Mapped feature ownership (from `porting.db`)

- `JetStreamCluster`: 19
- `JetStreamEngine`: 13
- `NatsServer`: 13
- `NatsConsumer`: 7
- `SelectPeerError`: 4
- `JsAccount`: 1
- `Account`: 1

### Mapped test distribution (top classes)

- `ServerOptionsTests` (28), `JwtProcessorTests` (20), `WebSocketHandlerTests` (14), `LeafNodeHandlerTests` (11), `JetStreamEngineTests` (11), `JetStreamClusterTests1` (10), plus 23 additional classes.

## Clarified Constraints

- Planning only in this session; no implementation execution.
- Batch 0 guardrail rigor is mandatory and must be adapted for **features + tests**.
- Feature work must be sliced into groups with max ~20 feature IDs.
- Status updates must go through `feature batch-update` / `test batch-update` calls with at most 15 IDs per call.
- If blocked, mark `deferred` with explicit reason; do not write stubs.
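The 15-ID cap on status updates is easy to apply mechanically. Below is a minimal Python sketch. It assumes feature and test IDs are plain integers, and `chunk_ids` is a hypothetical helper. The actual `batch-update` command lines are deliberately not generated, because this plan does not specify their argument syntax.

```python
from typing import Iterable, List

MAX_IDS_PER_UPDATE = 15  # hard limit from the constraints above


def chunk_ids(ids: Iterable[int], size: int = MAX_IDS_PER_UPDATE) -> List[List[int]]:
    """Split IDs into consecutive chunks of at most `size` IDs each."""
    if size <= 0:
        raise ValueError("size must be positive")
    ids = list(ids)
    return [ids[i:i + size] for i in range(0, len(ids), size)]


if __name__ == "__main__":
    group_a = range(1636, 1656)  # Group A feature IDs 1636-1655 (inclusive)
    for chunk in chunk_ids(group_a):
        # Each chunk would back exactly one batch-update call.
        print(f"{chunk[0]}-{chunk[-1]} ({len(chunk)} IDs)")
```

For Group A (`1636-1655`) this yields two update calls: `1636-1650` (15 IDs) and `1651-1655` (5 IDs).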

## Approaches

### Approach A: Single large implementation pass

- Pros: low planning overhead.
- Cons: poor auditability, high regression/stub risk, hard to isolate failures.

### Approach B (Recommended): Feature-first in 3 groups, then 5 test waves, each behind a hard checkpoint gate

- Pros: bounded scope, auditable status transitions, faster root-cause isolation.
- Cons: more CLI/test command overhead.

### Approach C: Test-first across all 160 before feature completion

- Pros: immediate behavior pressure.
- Cons: high churn because many tests depend on not-yet-ported consumer cluster paths.

**Decision:** Approach B.

## Proposed Design

### 1. Architecture and File Ownership

Production code is split by behavior boundary instead of one monolithic file:

- `JetStream` consumer orchestration:
  - expected: `JetStream/JetStream.ClusterConsumers.cs` (create) or `JetStreamTypes.cs` (modify)
- `NatsConsumer` cluster hooks:
  - expected: `JetStream/NatsConsumer.Cluster.cs` (create) or `NatsConsumer.cs` (modify)
- `JetStreamCluster` placement + encoding/decoding:
  - expected: `JetStream/JetStreamCluster.Consumers.cs` (create) or `JetStreamClusterTypes.cs` (modify)
- `NatsServer` clustered request/advisory endpoints:
  - expected: `NatsServer.JetStreamClusterConsumers.cs` (create) as a `partial` extension of `NatsServer`
- `Account` limits selection helper:
  - expected: `Accounts/Account.JetStream.cs` (create) or `Accounts/Account.cs` (modify)

### 2. Feature Slicing (max ~20 IDs each)

- **Group A (20 IDs):** `1636-1655`
  Consumer assignment/inflight lookup, consumer raft-node helpers, monitor/apply entries, ack decode, leader advisory primitives.
- **Group B (20 IDs):** `1656-1675`
  Assignment result processors, updates subscription lifecycle, leader-change flow, peer remap/selection foundation, tier/limits checks, base clustered stream request helpers.
- **Group C (18 IDs):** `1676-1693`
  Clustered stream update/delete/purge/restore/list, consumer/message delete requests, and assignment/purge/message encode-decode helpers.
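Before any status update, the group boundaries can be checked for gaps, overlaps, and size. Below is a minimal Python sketch. It treats the ~20 limit as a hard cap of 20 (an assumption) and reads the ranges above as contiguous, inclusive ID spans. `check_groups` is a hypothetical helper.

```python
GROUPS = {
    "A": (1636, 1655),
    "B": (1656, 1675),
    "C": (1676, 1693),
}
MAX_GROUP_SIZE = 20       # "~20" from the constraints, assumed to be a hard cap
EXPECTED_FEATURES = 58    # batch 34 feature count


def check_groups(groups, max_size=MAX_GROUP_SIZE, expected=EXPECTED_FEATURES):
    """Return a list of problems; an empty list means the slicing is consistent."""
    problems = []
    ranges = sorted(groups.values())
    total = 0
    for lo, hi in ranges:
        size = hi - lo + 1  # inclusive range
        total += size
        if size > max_size:
            problems.append(f"{lo}-{hi} has {size} IDs (> {max_size})")
    # Adjacent ranges must touch exactly: no gaps and no overlaps.
    for (_, prev_hi), (next_lo, _) in zip(ranges, ranges[1:]):
        if next_lo != prev_hi + 1:
            problems.append(f"gap or overlap between {prev_hi} and {next_lo}")
    if total != expected:
        problems.append(f"groups cover {total} IDs, expected {expected}")
    return problems


if __name__ == "__main__":
    print(check_groups(GROUPS) or "ok")
```

With the ranges above it prints `ok`, because 20 + 20 + 18 = 58 matches the batch's feature count.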

### 3. Test Slicing

- **Wave T1 (37 IDs):** JetStream cluster/consumer behavior core (`JetStreamClusterTests1/2/3/4`, `JetStreamEngineTests`, `NatsConsumerTests`)
- **Wave T2 (39 IDs):** config/reload/options surface (`ServerOptionsTests`, `ConfigCheckTests`, `ConfigReloaderTests`, `NatsServerTests`)
- **Wave T3 (33 IDs):** JWT/auth/cert/account validations (`JwtProcessorTests`, `JetStreamJwtTests`, `AuthCalloutTests`, `AuthHandlerTests`, `CertificateStoreWindowsTests`, `AccountTests`)
- **Wave T4 (32 IDs):** transport + route + leaf/websocket (`WebSocketHandlerTests`, `LeafNodeHandlerTests`, `LeafNodeProxyTests`, `RouteHandlerTests`, `GatewayHandlerTests`)
- **Wave T5 (19 IDs):** remaining integration-oriented regressions (`MqttHandlerTests`, `JetStreamLeafNodeTests`, `JetStreamSuperClusterTests`, `MessageTracerTests`, `MonitoringHandlerTests`, `EventsHandlerTests`, `JetStreamFileStoreTests`)
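The wave sizes add up to the full mapped set (37 + 39 + 33 + 32 + 19 = 160). Assume status updates are chunked per wave, so that no `batch-update` call spans a checkpoint. Under that assumption, each wave needs ceil(n / 15) calls per status transition. A short Python check of that arithmetic:

```python
import math

WAVES = {"T1": 37, "T2": 39, "T3": 33, "T4": 32, "T5": 19}
MAX_IDS_PER_UPDATE = 15


def update_calls_per_wave(waves, limit=MAX_IDS_PER_UPDATE):
    """Number of batch-update calls each wave needs for one status transition."""
    return {name: math.ceil(n / limit) for name, n in waves.items()}


if __name__ == "__main__":
    calls = update_calls_per_wave(WAVES)
    print(sum(WAVES.values()), calls, sum(calls.values()))
```

This prints `160 {'T1': 3, 'T2': 3, 'T3': 3, 'T4': 3, 'T5': 2} 14`. In other words, each status transition takes 14 update calls across all five waves.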

### 4. Verification Model

- A verification loop is mandatory for every feature and for every test.
- Every loop requires:
  - stub detection scan
  - build gate
  - targeted test gate
- A checkpoint is required between all tasks, and no item may be promoted to `verified` until that checkpoint passes.
- Status transitions are evidence-driven only:
  - `deferred/not_started -> stub -> complete -> verified`
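The status path and its evidence requirements can be encoded as a small state machine that rejects illegal jumps such as `stub -> verified`. Below is a minimal Python sketch. The gate names are hypothetical. The rule that all three gates are required only for promotions into `complete` and `verified`, plus a passed checkpoint for `verified`, is an interpretation of the rules above rather than something the plan spells out. Moves into `deferred` fall under the deferral strategy and are not modeled here.

```python
from enum import Enum
from typing import Iterable


class Status(Enum):
    NOT_STARTED = "not_started"
    DEFERRED = "deferred"
    STUB = "stub"
    COMPLETE = "complete"
    VERIFIED = "verified"


# The single legal forward step from each state; deferred/not_started are entry states.
NEXT = {
    Status.NOT_STARTED: Status.STUB,
    Status.DEFERRED: Status.STUB,
    Status.STUB: Status.COMPLETE,
    Status.COMPLETE: Status.VERIFIED,
}

# Hypothetical names for the three per-loop gates.
REQUIRED_GATES = frozenset({"stub_scan", "build", "targeted_tests"})


def promote(current: Status, target: Status,
            gates_passed: Iterable[str] = (), checkpoint_done: bool = False) -> Status:
    """Return `target` if the move is legal and backed by evidence; otherwise raise."""
    if NEXT.get(current) is not target:
        raise ValueError(f"illegal transition {current.value} -> {target.value}")
    if target in (Status.COMPLETE, Status.VERIFIED):
        missing = REQUIRED_GATES - set(gates_passed)
        if missing:
            raise ValueError(f"missing gate evidence: {sorted(missing)}")
    if target is Status.VERIFIED and not checkpoint_done:
        raise ValueError("checkpoint required before verified promotion")
    return target
```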

### 5. Failure and Deferral Strategy

If blocked by missing infra/dependency behavior:

1. Stop the current item.
2. Do not introduce placeholder logic or fake-pass tests.
3. Mark the item `deferred` with an explicit reason via `--override`.
4. Continue with next unblocked ID.
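Together, the deferral steps form a loop: when an item is blocked, record a reason and move on rather than write placeholder code. Below is a minimal Python sketch. `Blocked`, `run_items`, and the `port_item` callback are hypothetical stand-ins for whatever performs a single item's port and gates.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List


class Blocked(Exception):
    """Raised by a porting step when missing infra or dependency behavior blocks it."""


@dataclass
class RunResult:
    done: List[int] = field(default_factory=list)
    deferred: Dict[int, str] = field(default_factory=dict)


def run_items(ids: Iterable[int], port_item: Callable[[int], None]) -> RunResult:
    """Port items in order; defer blocked ones with their reason and keep going."""
    result = RunResult()
    for item_id in ids:
        try:
            port_item(item_id)  # attempt the item (port + gates)
        except Blocked as exc:
            reason = str(exc).strip()
            if not reason:
                # An unexplained deferral is not allowed.
                raise ValueError(f"item {item_id}: deferral needs an explicit reason") from exc
            # Steps 1-3: stop this item, write no placeholder, record why.
            result.deferred[item_id] = reason
            continue  # Step 4: move on to the next unblocked ID
        result.done.append(item_id)
    return result
```

Rejecting an empty reason enforces the explicit-reason rule at the point where the deferral is recorded.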

## Risks and Mitigations

- **Dependency readiness risk (Batch 33):**
  Mitigation: hard preflight gate before starting Batch 34.
- **Wide test blast radius (160 tests / 29 classes):**
  Mitigation: wave-based execution and strict checkpoints.
- **Stub regression risk in ported methods/tests:**
  Mitigation: non-negotiable anti-stub scans and hard size limits (about 20 feature IDs per group, at most 15 IDs per status-update call).
- **Ownership ambiguity across partial classes:**
  Mitigation: explicit file ownership map and method-to-class grouping.

## Success Criteria

- All 58 features are `verified` with evidence or `deferred` with explicit blocker reason.
- All 160 tests are `verified` with evidence or `deferred` with explicit blocker reason.
- No forbidden stub patterns remain in touched production or test files.
- Status updates are auditable and chunked (`<=15` IDs per `batch-update` call).
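The no-stub criterion can be enforced with a text scan over touched files. Below is a minimal Python sketch. The plan does not list the forbidden patterns, so the three regexes (`NotImplementedException` throws, `TODO` comments, `Assert.True(true)` fake passes) are illustrative assumptions, to be replaced by the project's actual rules.

```python
import re
from pathlib import Path
from typing import Iterable, Iterator, List, Tuple

# Illustrative only: the plan does not enumerate the forbidden patterns,
# so this list is an assumption to be replaced by the project's real rules.
STUB_PATTERNS = [
    re.compile(r"throw\s+new\s+NotImplementedException"),
    re.compile(r"//\s*TODO\b"),
    re.compile(r"Assert\.True\(\s*true\s*\)"),
]


def scan_text(text: str) -> Iterator[Tuple[int, str]]:
    """Yield (line_number, stripped_line) for every line matching a stub pattern."""
    for lineno, line in enumerate(text.splitlines(), start=1):
        if any(p.search(line) for p in STUB_PATTERNS):
            yield lineno, line.strip()


def scan_files(paths: Iterable[str]) -> List[str]:
    """Return 'path:line: text' findings across all given files."""
    hits = []
    for path in paths:
        for lineno, line in scan_text(Path(path).read_text(encoding="utf-8")):
            hits.append(f"{path}:{lineno}: {line}")
    return hits
```

A gate would call `scan_files` on the touched production and test files and fail if it returns any findings.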

## Non-Goals

- Executing implementation in this planning session.
- Expanding scope beyond Batch 34.
- Building new infrastructure outside existing batch-mapped feature/test needs.