Batch 34 JS Cluster Consumers Design
Date: 2026-02-27
Batch: 34 (JS Cluster Consumers)
Scope: 58 features + 160 unit tests
Dependency: batch 33 (JS Cluster Streams)
Go source: golang/nats-server/server/jetstream_cluster.go
Problem
Batch 34 ports JetStream cluster consumer operations from server/jetstream_cluster.go (lines ~5935-8744), including consumer assignment/inflight reconciliation, replicated ack processing, leader-change handling, peer-group placement logic, clustered stream request handling, and stream/consumer mutation encoding/decoding.
The mapped test set is broad (160 tests across 29 test classes), so the design must enforce strict evidence gates and avoid fake progress through placeholder implementations.
Context Findings
Required command outputs
- batch show 34 --db porting.db
  - Status: pending
  - Features: 58 (all deferred)
  - Tests: 160 (all deferred)
  - Depends on: 33
  - Go file: server/jetstream_cluster.go
- batch list --db porting.db
  - Batch chain includes 33 -> 34 -> 38 for JS cluster consumer progression.
- report summary --db porting.db
  - Overall progress: 1924/6942 (27.7%)
Environment note: dotnet was not on PATH in this shell; commands need /usr/local/share/dotnet/dotnet fallback.
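The fallback in the environment note can be made explicit in tooling. The following is a minimal sketch, assuming a Python helper wraps the commands; the function name resolve_dotnet is hypothetical, and only the fallback path comes from the note above.

```python
import shutil
from typing import Callable, Optional

# Fallback path taken from the environment note above.
FALLBACK_DOTNET = "/usr/local/share/dotnet/dotnet"


def resolve_dotnet(which: Callable[[str], Optional[str]] = shutil.which) -> str:
    """Return the dotnet executable found on PATH, else the documented fallback."""
    found = which("dotnet")
    return found if found else FALLBACK_DOTNET


if __name__ == "__main__":
    # Prints whichever executable path this shell would use.
    print(resolve_dotnet())
```

The lookup function is injectable so the fallback branch can be exercised without changing PATH.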
Mapped feature ownership (from porting.db)
- JetStreamCluster: 19
- JetStreamEngine: 13
- NatsServer: 13
- NatsConsumer: 7
- SelectPeerError: 4
- JsAccount: 1
- Account: 1
Mapped test distribution (top classes)
ServerOptionsTests (28), JwtProcessorTests (20), WebSocketHandlerTests (14), LeafNodeHandlerTests (11), JetStreamEngineTests (11), JetStreamClusterTests1 (10), plus 23 additional classes.
Clarified Constraints
- Planning only in this session; no implementation execution.
- Batch 0 guardrail rigor is mandatory and must be adapted for features + tests.
- Feature work must be sliced into groups with max ~20 feature IDs.
- Status updates must use feature/test batch-update chunks of max 15 IDs.
- If blocked, mark deferred with explicit reason; do not write stubs.
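The 15-ID limit on status updates amounts to simple chunking. A minimal sketch, assuming IDs are plain integers; chunk_ids is a hypothetical helper, and the IDs in the usage example are placeholders rather than real feature IDs.

```python
from typing import Iterator, List, Sequence

MAX_IDS_PER_UPDATE = 15  # limit stated in the constraints above


def chunk_ids(ids: Sequence[int], size: int = MAX_IDS_PER_UPDATE) -> Iterator[List[int]]:
    """Yield consecutive slices of `ids`, each holding at most `size` entries."""
    if size < 1:
        raise ValueError("size must be positive")
    for start in range(0, len(ids), size):
        yield list(ids[start:start + size])


if __name__ == "__main__":
    placeholder_ids = list(range(100, 120))  # 20 placeholder IDs, not real ones
    for chunk in chunk_ids(placeholder_ids):
        print(len(chunk), chunk[0], chunk[-1])
```

A full 20-ID group therefore needs two batch-update calls: one with 15 IDs and one with 5.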
Approaches
Approach A: Single large implementation pass
- Pros: low planning overhead.
- Cons: poor auditability, high regression/stub risk, hard to isolate failures.
Approach B (Recommended): Feature-first 3 groups, then 5 test waves, each with hard checkpoint gates
- Pros: bounded scope, auditable status transitions, faster root-cause isolation.
- Cons: more CLI/test command overhead.
Approach C: Test-first across all 160 before feature completion
- Pros: immediate behavior pressure.
- Cons: high churn because many tests depend on not-yet-ported consumer cluster paths.
Decision: Approach B.
Proposed Design
1. Architecture and File Ownership
Production code is split by behavior boundary instead of one monolithic file:
- JetStream consumer orchestration:
  - expected: JetStream/JetStream.ClusterConsumers.cs (create) or JetStreamTypes.cs (modify)
- NatsConsumer cluster hooks:
  - expected: JetStream/NatsConsumer.Cluster.cs (create) or NatsConsumer.cs (modify)
- JetStreamCluster placement + encoding/decoding:
  - expected: JetStream/JetStreamCluster.Consumers.cs (create) or JetStreamClusterTypes.cs (modify)
- NatsServer clustered request/advisory endpoints:
  - expected: NatsServer.JetStreamClusterConsumers.cs (create) as partial server extension
- Account limits selection helper:
  - expected: Accounts/Account.JetStream.cs (create) or Accounts/Account.cs (modify)
2. Feature Slicing (max ~20 IDs each)
- Group A (20 IDs): 1636-1655
  Consumer assignment/inflight lookup, consumer raft-node helpers, monitor/apply entries, ack decode, leader advisory primitives.
- Group B (20 IDs): 1656-1675
  Assignment result processors, updates subscription lifecycle, leader-change flow, peer remap/selection foundation, tier/limits checks, base clustered stream request helpers.
- Group C (18 IDs): 1676-1693
  Clustered stream update/delete/purge/restore/list, consumer/message delete requests, and assignment/purge/message encode-decode helpers.
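The slicing above can be sanity-checked mechanically. This is a sketch only: it assumes the ID ranges are inclusive and contiguous, and it treats the "max ~20" guideline as a hard cap of 20. The names GROUPS, group_sizes, and validate are illustrative.

```python
from typing import Dict, Tuple

# Inclusive feature-ID ranges copied from the slicing above.
GROUPS: Dict[str, Tuple[int, int]] = {
    "A": (1636, 1655),
    "B": (1656, 1675),
    "C": (1676, 1693),
}
MAX_GROUP_SIZE = 20      # assumption: "max ~20" treated as a hard cap
EXPECTED_FEATURES = 58   # batch scope stated at the top of this design


def group_sizes(groups: Dict[str, Tuple[int, int]]) -> Dict[str, int]:
    """Return the number of IDs in each inclusive range."""
    return {name: hi - lo + 1 for name, (lo, hi) in groups.items()}


def validate(groups: Dict[str, Tuple[int, int]]) -> None:
    """Raise ValueError on an oversized group, a gap or overlap, or a wrong total."""
    sizes = group_sizes(groups)
    for name, count in sizes.items():
        if count < 1 or count > MAX_GROUP_SIZE:
            raise ValueError(f"group {name} has {count} IDs")
    ordered = sorted(groups.values())
    for (_, prev_hi), (lo, _) in zip(ordered, ordered[1:]):
        if lo != prev_hi + 1:
            raise ValueError(f"gap or overlap before ID {lo}")
    total = sum(sizes.values())
    if total != EXPECTED_FEATURES:
        raise ValueError(f"expected {EXPECTED_FEATURES} IDs, found {total}")


if __name__ == "__main__":
    validate(GROUPS)
    print(group_sizes(GROUPS))
```

With the ranges as listed, the groups come to 20, 20, and 18 IDs, matching the 58-feature scope.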
3. Test Slicing
- Wave T1 (37 IDs): JetStream cluster/consumer behavior core (JetStreamClusterTests1/2/3/4, JetStreamEngineTests, NatsConsumerTests)
- Wave T2 (39 IDs): config/reload/options surface (ServerOptionsTests, ConfigCheckTests, ConfigReloaderTests, NatsServerTests)
- Wave T3 (33 IDs): JWT/auth/cert/account validations (JwtProcessorTests, JetStreamJwtTests, AuthCalloutTests, AuthHandlerTests, CertificateStoreWindowsTests, AccountTests)
- Wave T4 (32 IDs): transport + route + leaf/websocket (WebSocketHandlerTests, LeafNodeHandlerTests, LeafNodeProxyTests, RouteHandlerTests, GatewayHandlerTests)
- Wave T5 (19 IDs): remaining integration-oriented regressions (MqttHandlerTests, JetStreamLeafNodeTests, JetStreamSuperClusterTests, MessageTracerTests, MonitoringHandlerTests, EventsHandlerTests, JetStreamFileStoreTests)
4. Verification Model
- Per-feature loop and per-test loop are mandatory.
- Every loop requires:
- stub detection scan
- build gate
- targeted test gate
- Checkpoint required between all tasks before any verified promotion.
- Status transitions are evidence-driven only:
  deferred/not_started -> stub -> complete -> verified
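The status path above behaves like a small state machine. The sketch below encodes it under two assumptions that are not stated in the plan: that every forward step requires all three gates, and that the gates carry the hypothetical names stub_scan, build, and targeted_tests. Deferral of blocked items is not modeled here.

```python
from typing import Dict, FrozenSet, Set

# Forward path taken from the verification model above.
ALLOWED: Dict[str, Set[str]] = {
    "deferred": {"stub"},
    "not_started": {"stub"},
    "stub": {"complete"},
    "complete": {"verified"},
    "verified": set(),
}

# Hypothetical names for the three checks every loop requires.
REQUIRED_GATES: FrozenSet[str] = frozenset({"stub_scan", "build", "targeted_tests"})


def can_transition(current: str, target: str, passed_gates: Set[str]) -> bool:
    """Allow only the next step on the forward path, and only with all gates passed."""
    if target not in ALLOWED.get(current, set()):
        return False
    return REQUIRED_GATES <= set(passed_gates)


if __name__ == "__main__":
    gates = {"stub_scan", "build", "targeted_tests"}
    print(can_transition("stub", "complete", gates))   # next step, all gates passed
    print(can_transition("stub", "verified", gates))   # skips a step
    print(can_transition("complete", "verified", {"build"}))  # missing evidence
```

Keeping the transition table as data makes it straightforward to reject skipped steps, such as promoting a stub directly to verified.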
5. Failure and Deferral Strategy
If blocked by missing infra/dependency behavior:
- Stop the current item.
- Do not introduce placeholder logic or fake-pass tests.
- Mark item deferred with explicit reason via --override.
- Continue with next unblocked ID.
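The steps above describe a loop that skips blocked items without faking progress. A minimal sketch of that control flow follows; Blocked, Outcome, and process are hypothetical names, and the real status update would go through the porting CLI rather than an in-memory record.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List


class Blocked(Exception):
    """Raised by a work function when missing infra or dependency behavior blocks an item."""


@dataclass
class Outcome:
    done: List[int] = field(default_factory=list)
    deferred: Dict[int, str] = field(default_factory=dict)  # ID -> explicit reason


def process(ids: Iterable[int], work: Callable[[int], None]) -> Outcome:
    """Run `work` per ID; on Blocked, record the reason and move on to the next ID."""
    outcome = Outcome()
    for item_id in ids:
        try:
            work(item_id)
        except Blocked as exc:
            reason = str(exc).strip()
            if not reason:
                # A deferral without a stated reason is rejected outright.
                raise ValueError(f"ID {item_id}: deferral needs an explicit reason") from exc
            outcome.deferred[item_id] = reason
            continue
        outcome.done.append(item_id)
    return outcome
```

Rejecting an empty reason keeps the "explicit reason" requirement enforceable rather than advisory.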
Risks and Mitigations
- Dependency readiness risk (Batch 33):
  Mitigation: hard preflight gate before starting Batch 34.
- Wide test blast radius (160 tests / 29 classes):
  Mitigation: wave-based execution and strict checkpoints.
- Stub regression risk in ported methods/tests:
  Mitigation: non-negotiable anti-stub scans and hard limits.
- Ownership ambiguity across partial classes:
  Mitigation: explicit file ownership map and method-to-class grouping.
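An anti-stub scan of the kind named in the mitigations could look roughly like the sketch below. The pattern list is an assumption: this design does not enumerate the forbidden patterns, so the three regexes are illustrative guesses at common C# placeholders, and scan_source is a hypothetical helper.

```python
import re
from typing import List, Pattern, Sequence, Tuple

# Hypothetical patterns; the actual forbidden set is defined by the plan, not here.
STUB_PATTERNS: Sequence[Pattern[str]] = (
    re.compile(r"throw\s+new\s+NotImplementedException"),
    re.compile(r"//\s*TODO", re.IGNORECASE),
    re.compile(r"Assert\.True\(\s*true\s*\)"),
)


def scan_source(text: str, patterns: Sequence[Pattern[str]] = STUB_PATTERNS) -> List[Tuple[int, str]]:
    """Return (line_number, stripped_line) for every line matching any pattern."""
    hits: List[Tuple[int, str]] = []
    for number, line in enumerate(text.splitlines(), start=1):
        if any(pattern.search(line) for pattern in patterns):
            hits.append((number, line.strip()))
    return hits


if __name__ == "__main__":
    sample = "public void Apply()\n{\n    throw new NotImplementedException();\n}\n"
    for number, line in scan_source(sample):
        print(f"{number}: {line}")
```

A non-empty result for any touched file would fail the gate before the build and test steps run.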
Success Criteria
- All 58 features are verified with evidence or deferred with explicit blocker reason.
- All 160 tests are verified with evidence or deferred with explicit blocker reason.
- No forbidden stub patterns remain in touched production or test files.
- Status updates are auditable and chunked (<=15 IDs per batch-update call).
Non-Goals
- Executing implementation in this planning session.
- Expanding scope beyond Batch 34.
- Building new infrastructure outside existing batch-mapped feature/test needs.