natsnet/docs/plans/2026-02-27-batch-34-js-cluster-consumers-design.md
Joseph Doherty f8dce79ac0 Add batch plans for batches 31-36 (rounds 16-18)
Generated design docs and implementation plans via Codex for:
- Batch 31: Raft Part 2
- Batch 32: JS Cluster Meta
- Batch 33: JS Cluster Streams
- Batch 34: JS Cluster Consumers
- Batch 35: JS Cluster Remaining
- Batch 36: Stream Lifecycle

All plans include mandatory verification protocol and anti-stub guardrails.
Updated batches.md with file paths and planned status.
2026-02-27 17:01:31 -05:00

Batch 34 JS Cluster Consumers Design

Date: 2026-02-27
Batch: 34 (JS Cluster Consumers)
Scope: 58 features + 160 unit tests
Dependency: batch 33 (JS Cluster Streams)
Go source: golang/nats-server/server/jetstream_cluster.go

Problem

Batch 34 ports JetStream cluster consumer operations from server/jetstream_cluster.go (lines ~5935-8744), including consumer assignment/inflight reconciliation, replicated ack processing, leader-change handling, peer-group placement logic, clustered stream request handling, and stream/consumer mutation encoding/decoding.

The mapped test set is broad (160 tests across 29 test classes), so the design must enforce strict evidence gates and avoid fake progress through placeholder implementations.

Context Findings

Required command outputs

  • batch show 34 --db porting.db
    • Status: pending
    • Features: 58 (all deferred)
    • Tests: 160 (all deferred)
    • Depends on: 33
    • Go file: server/jetstream_cluster.go
  • batch list --db porting.db
    • Batch chain includes 33 -> 34 -> 38 for JS cluster consumer progression.
  • report summary --db porting.db
    • Overall progress: 1924/6942 (27.7%)

Environment note: dotnet was not on PATH in this shell, so commands must fall back to the absolute path /usr/local/share/dotnet/dotnet.
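
The fallback described in the environment note can be sketched as a small resolver. This is an illustration only: the helper name `resolve_dotnet` and the injectable `which` parameter are assumptions made for the example, not part of the porting tooling.

```python
import shutil
from typing import Callable, Optional

# Fallback location taken from the environment note above.
DOTNET_FALLBACK = "/usr/local/share/dotnet/dotnet"


def resolve_dotnet(which: Callable[[str], Optional[str]] = shutil.which) -> str:
    """Return the dotnet executable found on PATH, else the fallback path.

    `which` is injectable (hypothetical design choice) so the lookup can be
    exercised without depending on the local machine's PATH.
    """
    return which("dotnet") or DOTNET_FALLBACK


if __name__ == "__main__":
    print(resolve_dotnet())
```

Resolving the path once and reusing it for every build and test command keeps the gates from failing for environment reasons unrelated to the port.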

Mapped feature ownership (from porting.db)

  • JetStreamCluster: 19
  • JetStreamEngine: 13
  • NatsServer: 13
  • NatsConsumer: 7
  • SelectPeerError: 4
  • JsAccount: 1
  • Account: 1

Mapped test distribution (top classes)

  • ServerOptionsTests (28), JwtProcessorTests (20), WebSocketHandlerTests (14), LeafNodeHandlerTests (11), JetStreamEngineTests (11), JetStreamClusterTests1 (10), plus 23 additional classes.

Clarified Constraints

  • Planning only in this session; no implementation execution.
  • Batch 0 guardrail rigor is mandatory and must be adapted for features + tests.
  • Feature work must be sliced into groups with max ~20 feature IDs.
  • Status updates must use feature/test batch-update chunks of max 15 IDs.
  • If blocked, mark deferred with explicit reason; do not write stubs.
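
The 15-ID limit on status updates amounts to splitting each ID list into consecutive chunks. A minimal sketch follows; the helper name `chunk_ids` is hypothetical, and the example range is Group A from the feature slicing in this document.

```python
from typing import Iterator, List, Sequence

MAX_IDS_PER_UPDATE = 15  # limit stated in the constraints above


def chunk_ids(ids: Sequence[int], size: int = MAX_IDS_PER_UPDATE) -> Iterator[List[int]]:
    """Yield consecutive chunks of at most `size` IDs, preserving order."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(ids), size):
        yield list(ids[start:start + size])


if __name__ == "__main__":
    group_a = list(range(1636, 1656))  # 20 feature IDs: 1636-1655
    for chunk in chunk_ids(group_a):
        # Each printed line is one comma-separated ID list for a single update call.
        print(",".join(str(i) for i in chunk))
```

For a 20-ID group this produces one chunk of 15 IDs and one of 5, so each feature group needs two update calls per status transition.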

Approaches

Approach A: Single large implementation pass

  • Pros: low planning overhead.
  • Cons: poor auditability, high regression/stub risk, hard to isolate failures.

Approach B: Sliced feature groups and test waves with verification gates

  • Pros: bounded scope, auditable status transitions, faster root-cause isolation.
  • Cons: more CLI/test command overhead.

Approach C: Test-first across all 160 before feature completion

  • Pros: immediate behavior pressure.
  • Cons: high churn because many tests depend on not-yet-ported consumer cluster paths.

Decision: Approach B.

Proposed Design

1. Architecture and File Ownership

Production code is split by behavior boundary instead of one monolithic file:

  • JetStream consumer orchestration:
    • expected: JetStream/JetStream.ClusterConsumers.cs (create) or JetStreamTypes.cs (modify)
  • NatsConsumer cluster hooks:
    • expected: JetStream/NatsConsumer.Cluster.cs (create) or NatsConsumer.cs (modify)
  • JetStreamCluster placement + encoding/decoding:
    • expected: JetStream/JetStreamCluster.Consumers.cs (create) or JetStreamClusterTypes.cs (modify)
  • NatsServer clustered request/advisory endpoints:
    • expected: NatsServer.JetStreamClusterConsumers.cs (create) as partial server extension
  • Account limits selection helper:
    • expected: Accounts/Account.JetStream.cs (create) or Accounts/Account.cs (modify)

2. Feature Slicing (max ~20 IDs each)

  • Group A (20 IDs): 1636-1655
    Consumer assignment/inflight lookup, consumer raft-node helpers, monitor/apply entries, ack decode, leader advisory primitives.
  • Group B (20 IDs): 1656-1675
    Assignment result processors, updates subscription lifecycle, leader-change flow, peer remap/selection foundation, tier/limits checks, base clustered stream request helpers.
  • Group C (18 IDs): 1676-1693
    Clustered stream update/delete/purge/restore/list, consumer/message delete requests, and assignment/purge/message encode-decode helpers.
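
The slicing above can be sanity-checked mechanically before work starts. The sketch below assumes the feature IDs are contiguous within each stated range; the function name `check_groups` is hypothetical.

```python
from typing import Dict, List

# ID ranges as listed above (range() end is exclusive).
FEATURE_GROUPS: Dict[str, range] = {
    "A": range(1636, 1656),
    "B": range(1656, 1676),
    "C": range(1676, 1694),
}
MAX_GROUP_SIZE = 20
EXPECTED_FEATURES = 58


def check_groups(groups: Dict[str, range], max_size: int, expected_total: int) -> List[str]:
    """Return a list of problems; an empty list means the slicing is consistent."""
    problems: List[str] = []
    seen = set()
    for name, ids in groups.items():
        if len(ids) > max_size:
            problems.append(f"group {name} has {len(ids)} IDs (max {max_size})")
        overlap = seen.intersection(ids)
        if overlap:
            problems.append(f"group {name} overlaps earlier groups: {sorted(overlap)}")
        seen.update(ids)
    if len(seen) != expected_total:
        problems.append(f"covered {len(seen)} IDs, expected {expected_total}")
    return problems


if __name__ == "__main__":
    print(check_groups(FEATURE_GROUPS, MAX_GROUP_SIZE, EXPECTED_FEATURES) or "slicing OK")
```

With the ranges as written, the groups contain 20, 20, and 18 IDs, do not overlap, and together cover all 58 features.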

3. Test Slicing

  • Wave T1 (37 IDs): JetStream cluster/consumer behavior core (JetStreamClusterTests1/2/3/4, JetStreamEngineTests, NatsConsumerTests)
  • Wave T2 (39 IDs): config/reload/options surface (ServerOptionsTests, ConfigCheckTests, ConfigReloaderTests, NatsServerTests)
  • Wave T3 (33 IDs): JWT/auth/cert/account validations (JwtProcessorTests, JetStreamJwtTests, AuthCalloutTests, AuthHandlerTests, CertificateStoreWindowsTests, AccountTests)
  • Wave T4 (32 IDs): transport + route + leaf/websocket (WebSocketHandlerTests, LeafNodeHandlerTests, LeafNodeProxyTests, RouteHandlerTests, GatewayHandlerTests)
  • Wave T5 (19 IDs): remaining integration-oriented regressions (MqttHandlerTests, JetStreamLeafNodeTests, JetStreamSuperClusterTests, MessageTracerTests, MonitoringHandlerTests, EventsHandlerTests, JetStreamFileStoreTests)
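
As a quick consistency check, the wave sizes should sum to the 160 mapped tests, and each wave implies a minimum number of 15-ID status updates. The sketch below is illustrative; `update_calls_needed` is a hypothetical helper name.

```python
import math
from typing import Dict

MAX_IDS_PER_UPDATE = 15  # chunk limit from the clarified constraints
TEST_WAVES: Dict[str, int] = {"T1": 37, "T2": 39, "T3": 33, "T4": 32, "T5": 19}


def update_calls_needed(wave_sizes: Dict[str, int],
                        chunk_size: int = MAX_IDS_PER_UPDATE) -> Dict[str, int]:
    """Return the minimum number of chunked update calls per wave."""
    return {name: math.ceil(size / chunk_size) for name, size in wave_sizes.items()}


if __name__ == "__main__":
    assert sum(TEST_WAVES.values()) == 160, "wave sizes must cover all mapped tests"
    for name, calls in update_calls_needed(TEST_WAVES).items():
        print(f"{name}: {calls} update call(s) per status transition")
```

Waves T1 through T4 each need three calls per status transition and T5 needs two, which gives a rough sense of the CLI overhead accepted under the chosen approach.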

4. Verification Model

  • Per-feature loop and per-test loop are mandatory.
  • Every loop requires:
    • stub detection scan
    • build gate
    • targeted test gate
  • A checkpoint is required between tasks, before any item is promoted to verified.
  • Status transitions are evidence-driven only:
    • deferred/not_started -> stub -> complete -> verified
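
The forward status chain can be modeled as a small transition table that rejects skipped steps and empty evidence. This is a sketch under assumptions: the function name `promote` and the free-text `evidence` argument are hypothetical, and deferral of blocked items (covered by the failure strategy) is intentionally not modeled here.

```python
from typing import Dict, Set

# Forward transitions exactly as listed above.
FORWARD: Dict[str, Set[str]] = {
    "deferred": {"stub"},
    "not_started": {"stub"},
    "stub": {"complete"},
    "complete": {"verified"},
    "verified": set(),
}


def promote(current: str, target: str, evidence: str) -> str:
    """Return `target` if the transition is allowed and evidence is non-empty."""
    if target not in FORWARD.get(current, set()):
        raise ValueError(f"transition {current} -> {target} is not allowed")
    if not evidence.strip():
        raise ValueError("a status transition requires recorded evidence")
    return target


if __name__ == "__main__":
    status = "not_started"
    for nxt, proof in [("stub", "signature added, build green"),
                       ("complete", "targeted tests pass"),
                       ("verified", "checkpoint review recorded")]:
        status = promote(status, nxt, proof)
        print(status)
```

Encoding the chain as data makes it straightforward to reject a direct jump such as deferred to verified.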

5. Failure and Deferral Strategy

If blocked by missing infra/dependency behavior:

  1. Stop the current item.
  2. Do not introduce placeholder logic or fake-pass tests.
  3. Mark item deferred with explicit reason via --override.
  4. Continue with next unblocked ID.
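
The four steps above can be sketched as a loop that records a blocker reason and moves on rather than forcing a result. The names `process_ids` and `attempt` are hypothetical; in this sketch `attempt` returns `None` on success or a non-empty blocker reason.

```python
from typing import Callable, Dict, Iterable, List, Optional, Tuple


def process_ids(ids: Iterable[int],
                attempt: Callable[[int], Optional[str]]) -> Tuple[List[int], Dict[int, str]]:
    """Run `attempt` per ID; collect completed IDs and deferred IDs with reasons."""
    completed: List[int] = []
    deferred: Dict[int, str] = {}
    for item_id in ids:
        reason = attempt(item_id)
        if reason is None:
            completed.append(item_id)
        elif not reason.strip():
            # A deferral without an explicit reason is not auditable.
            raise ValueError(f"item {item_id} was blocked without a reason")
        else:
            deferred[item_id] = reason  # stop this item, continue with the next
    return completed, deferred


if __name__ == "__main__":
    done, blocked = process_ids(
        [1636, 1637, 1638],
        lambda i: "depends on unported Batch 33 behavior" if i == 1637 else None,
    )
    print(done, blocked)
```

The deferred map is what would feed the explicit-reason status update; nothing in the loop fabricates a passing result for a blocked item.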

Risks and Mitigations

  • Dependency readiness risk (Batch 33):
    Mitigation: hard preflight gate before starting Batch 34.
  • Wide test blast radius (160 tests / 29 classes):
    Mitigation: wave-based execution and strict checkpoints.
  • Stub regression risk in ported methods/tests:
    Mitigation: non-negotiable anti-stub scans and hard limits.
  • Ownership ambiguity across partial classes:
    Mitigation: explicit file ownership map and method-to-class grouping.

Success Criteria

  • All 58 features are verified with evidence or deferred with explicit blocker reason.
  • All 160 tests are verified with evidence or deferred with explicit blocker reason.
  • No forbidden stub patterns remain in touched production or test files.
  • Status updates are auditable and chunked (<=15 IDs per batch-update call).
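
A stub detection scan of the kind required by these criteria can be approximated with a line-based pattern match. The patterns below are assumptions chosen for illustration (this document does not enumerate the forbidden patterns), and `find_stub_lines` is a hypothetical helper name.

```python
import re
from typing import List, Tuple

# Assumed examples of stub markers in C# sources; the real forbidden list
# is defined by the batch guardrails, not by this sketch.
STUB_PATTERNS = [
    re.compile(r"NotImplementedException"),
    re.compile(r"//\s*TODO", re.IGNORECASE),
    re.compile(r"Assert\.True\(\s*true\s*\)"),
]


def find_stub_lines(source: str) -> List[Tuple[int, str]]:
    """Return (line number, stripped line) for every line matching a stub pattern."""
    hits: List[Tuple[int, str]] = []
    for number, line in enumerate(source.splitlines(), start=1):
        if any(pattern.search(line) for pattern in STUB_PATTERNS):
            hits.append((number, line.strip()))
    return hits


if __name__ == "__main__":
    sample = "public void Apply()\n{\n    throw new NotImplementedException();\n}\n"
    for number, text in find_stub_lines(sample):
        print(f"line {number}: {text}")
```

A scan like this is only a first filter; an empty result does not prove that a method has real behavior, which is why the build and targeted test gates remain mandatory.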

Non-Goals

  • Executing implementation in this planning session.
  • Expanding scope beyond Batch 34.
  • Building new infrastructure outside existing batch-mapped feature/test needs.