# Batch 34 JS Cluster Consumers Design

**Date:** 2026-02-27
**Batch:** 34 (`JS Cluster Consumers`)
**Scope:** 58 features + 160 unit tests
**Dependency:** batch `33` (`JS Cluster Streams`)
**Go source:** `golang/nats-server/server/jetstream_cluster.go`

## Problem

Batch 34 ports JetStream cluster consumer operations from `server/jetstream_cluster.go` (lines ~5935-8744), including consumer assignment/inflight reconciliation, replicated ack processing, leader-change handling, peer-group placement logic, clustered stream request handling, and stream/consumer mutation encoding/decoding. The mapped test set is broad (160 tests across 29 test classes), so the design must enforce strict evidence gates and avoid fake progress through placeholder implementations.

## Context Findings

### Required command outputs

- `batch show 34 --db porting.db`
  - Status: `pending`
  - Features: `58` (all `deferred`)
  - Tests: `160` (all `deferred`)
  - Depends on: `33`
  - Go file: `server/jetstream_cluster.go`
- `batch list --db porting.db`
  - Batch chain includes `33 -> 34 -> 38` for JS cluster consumer progression.
- `report summary --db porting.db`
  - Overall progress: `1924/6942 (27.7%)`

Environment note: `dotnet` was not on `PATH` in this shell; commands need `/usr/local/share/dotnet/dotnet` fallback.

### Mapped feature ownership (from `porting.db`)

- `JetStreamCluster`: 19
- `JetStreamEngine`: 13
- `NatsServer`: 13
- `NatsConsumer`: 7
- `SelectPeerError`: 4
- `JsAccount`: 1
- `Account`: 1

### Mapped test distribution (top classes)

- `ServerOptionsTests` (28), `JwtProcessorTests` (20), `WebSocketHandlerTests` (14), `LeafNodeHandlerTests` (11), `JetStreamEngineTests` (11), `JetStreamClusterTests1` (10), plus 23 additional classes.

## Clarified Constraints

- Planning only in this session; no implementation execution.
- Batch 0 guardrail rigor is mandatory and must be adapted for **features + tests**.
- Feature work must be sliced into groups with max ~20 feature IDs.
- Status updates must use `feature/test batch-update` chunks of max 15 IDs.
- If blocked, mark `deferred` with explicit reason; do not write stubs.

## Approaches

### Approach A: Single large implementation pass

- Pros: low planning overhead.
- Cons: poor auditability, high regression/stub risk, hard to isolate failures.

### Approach B (Recommended): Feature-first 3 groups, then 5 test waves, each with hard checkpoint gates

- Pros: bounded scope, auditable status transitions, faster root-cause isolation.
- Cons: more CLI/test command overhead.

### Approach C: Test-first across all 160 before feature completion

- Pros: immediate behavior pressure.
- Cons: high churn because many tests depend on not-yet-ported consumer cluster paths.

**Decision:** Approach B.

## Proposed Design

### 1. Architecture and File Ownership

Production code is split by behavior boundary instead of one monolithic file:

- `JetStream` consumer orchestration:
  - expected: `JetStream/JetStream.ClusterConsumers.cs` (create) or `JetStreamTypes.cs` (modify)
- `NatsConsumer` cluster hooks:
  - expected: `JetStream/NatsConsumer.Cluster.cs` (create) or `NatsConsumer.cs` (modify)
- `JetStreamCluster` placement + encoding/decoding:
  - expected: `JetStream/JetStreamCluster.Consumers.cs` (create) or `JetStreamClusterTypes.cs` (modify)
- `NatsServer` clustered request/advisory endpoints:
  - expected: `NatsServer.JetStreamClusterConsumers.cs` (create) as partial server extension
- `Account` limits selection helper:
  - expected: `Accounts/Account.JetStream.cs` (create) or `Accounts/Account.cs` (modify)

### 2. Feature Slicing (max ~20 IDs each)

- **Group A (20 IDs):** `1636-1655`
  Consumer assignment/inflight lookup, consumer raft-node helpers, monitor/apply entries, ack decode, leader advisory primitives.
- **Group B (20 IDs):** `1656-1675`
  Assignment result processors, updates subscription lifecycle, leader-change flow, peer remap/selection foundation, tier/limits checks, base clustered stream request helpers.
- **Group C (18 IDs):** `1676-1693`
  Clustered stream update/delete/purge/restore/list, consumer/message delete requests, and assignment/purge/message encode-decode helpers.

### 3. Test Slicing

- **Wave T1 (37 IDs):** JetStream cluster/consumer behavior core (`JetStreamClusterTests1/2/3/4`, `JetStreamEngineTests`, `NatsConsumerTests`)
- **Wave T2 (39 IDs):** config/reload/options surface (`ServerOptionsTests`, `ConfigCheckTests`, `ConfigReloaderTests`, `NatsServerTests`)
- **Wave T3 (33 IDs):** JWT/auth/cert/account validations (`JwtProcessorTests`, `JetStreamJwtTests`, `AuthCalloutTests`, `AuthHandlerTests`, `CertificateStoreWindowsTests`, `AccountTests`)
- **Wave T4 (32 IDs):** transport + route + leaf/websocket (`WebSocketHandlerTests`, `LeafNodeHandlerTests`, `LeafNodeProxyTests`, `RouteHandlerTests`, `GatewayHandlerTests`)
- **Wave T5 (19 IDs):** remaining integration-oriented regressions (`MqttHandlerTests`, `JetStreamLeafNodeTests`, `JetStreamSuperClusterTests`, `MessageTracerTests`, `MonitoringHandlerTests`, `EventsHandlerTests`, `JetStreamFileStoreTests`)

### 4. Verification Model

- Per-feature loop and per-test loop are mandatory.
- Every loop requires:
  - stub detection scan
  - build gate
  - targeted test gate
- Checkpoint required between all tasks before any `verified` promotion.
- Status transitions are evidence-driven only:
  - `deferred/not_started -> stub -> complete -> verified`

### 5. Failure and Deferral Strategy

If blocked by missing infra/dependency behavior:

1. Stop the current item.
2. Do not introduce placeholder logic or fake-pass tests.
3. Mark item `deferred` with explicit reason via `--override`.
4. Continue with next unblocked ID.

## Risks and Mitigations

- **Dependency readiness risk (Batch 33):**
  Mitigation: hard preflight gate before starting Batch 34.
- **Wide test blast radius (160 tests / 29 classes):**
  Mitigation: wave-based execution and strict checkpoints.
- **Stub regression risk in ported methods/tests:**
  Mitigation: non-negotiable anti-stub scans and hard limits.
- **Ownership ambiguity across partial classes:**
  Mitigation: explicit file ownership map and method-to-class grouping.

## Success Criteria

- All 58 features are `verified` with evidence or `deferred` with explicit blocker reason.
- All 160 tests are `verified` with evidence or `deferred` with explicit blocker reason.
- No forbidden stub patterns remain in touched production or test files.
- Status updates are auditable and chunked (`<=15` IDs per `batch-update` call).

## Non-Goals

- Executing implementation in this planning session.
- Expanding scope beyond Batch 34.
- Building new infrastructure outside existing batch-mapped feature/test needs.
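
For reference, the two size limits used throughout this design (feature groups of at most ~20 IDs, and `batch-update` chunks of at most 15 IDs) can be illustrated with a small helper. This is only a sketch of the arithmetic, not part of the porting tooling: the `chunked` function and both constant names are hypothetical, and the example assumes the feature IDs are the contiguous range `1636-1693` given under Feature Slicing.

```python
from typing import Iterator, List, Sequence

# Hypothetical names; the limits themselves come from the design above.
MAX_GROUP_SIZE = 20    # feature slicing limit
MAX_UPDATE_CHUNK = 15  # IDs per batch-update call


def chunked(ids: Sequence[int], size: int) -> Iterator[List[int]]:
    """Yield consecutive slices of ids, each at most `size` long."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(ids), size):
        yield list(ids[start:start + size])


# Assumption: feature IDs are the contiguous range 1636-1693.
feature_ids = list(range(1636, 1694))

# Slice features into groups, then split one group into update chunks.
groups = list(chunked(feature_ids, MAX_GROUP_SIZE))
group_a_updates = list(chunked(groups[0], MAX_UPDATE_CHUNK))

print([len(g) for g in groups])           # [20, 20, 18]
print([len(c) for c in group_a_updates])  # [15, 5]
```

Under these assumptions the slices match Groups A, B, and C (20, 20, and 18 IDs), and a full 20-ID group needs two status-update calls to stay within the 15-ID limit.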