# Batch 34 JS Cluster Consumers Design

**Date:** 2026-02-27
**Batch:** 34 (`JS Cluster Consumers`)
**Scope:** 58 features + 160 unit tests
**Dependency:** batch `33` (`JS Cluster Streams`)
**Go source:** `golang/nats-server/server/jetstream_cluster.go`

## Problem

Batch 34 ports JetStream cluster consumer operations from `server/jetstream_cluster.go` (lines ~5935-8744), including consumer assignment/inflight reconciliation, replicated ack processing, leader-change handling, peer-group placement logic, clustered stream request handling, and stream/consumer mutation encoding/decoding.

The mapped test set is broad (160 tests across 29 test classes), so the design must enforce strict evidence gates and avoid fake progress through placeholder implementations.

## Context Findings

### Required command outputs

- `batch show 34 --db porting.db`
  - Status: `pending`
  - Features: `58` (all `deferred`)
  - Tests: `160` (all `deferred`)
  - Depends on: `33`
  - Go file: `server/jetstream_cluster.go`
- `batch list --db porting.db`
  - Batch chain includes `33 -> 34 -> 38` for JS cluster consumer progression.
- `report summary --db porting.db`
  - Overall progress: `1924/6942 (27.7%)`

Environment note: `dotnet` was not on `PATH` in this shell; commands need the `/usr/local/share/dotnet/dotnet` fallback.

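The environment note above can be handled once in tooling instead of per command. The following is a minimal sketch, assuming a small Python helper script is acceptable for the plan's automation; `resolve_dotnet` is a hypothetical name, and only the fallback path comes from this document.

```python
import os
import shutil

# Fallback path taken from the environment note above.
FALLBACK_DOTNET = "/usr/local/share/dotnet/dotnet"


def resolve_dotnet(fallback: str = FALLBACK_DOTNET) -> str:
    """Return the dotnet executable to invoke: PATH first, then the fallback."""
    found = shutil.which("dotnet")
    if found:
        return found
    # PATH lookup failed, so accept the fallback only if it is an executable file.
    if os.path.isfile(fallback) and os.access(fallback, os.X_OK):
        return fallback
    raise FileNotFoundError(
        f"dotnet not found on PATH and fallback {fallback!r} is not executable"
    )
```

Failing loudly when neither location works keeps a missing toolchain from being mistaken for a build or test failure.
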
### Mapped feature ownership (from `porting.db`)

- `JetStreamCluster`: 19
- `JetStreamEngine`: 13
- `NatsServer`: 13
- `NatsConsumer`: 7
- `SelectPeerError`: 4
- `JsAccount`: 1
- `Account`: 1

### Mapped test distribution (top classes)

- `ServerOptionsTests` (28), `JwtProcessorTests` (20), `WebSocketHandlerTests` (14), `LeafNodeHandlerTests` (11), `JetStreamEngineTests` (11), `JetStreamClusterTests1` (10), plus 23 additional classes.

## Clarified Constraints

- Planning only in this session; no implementation execution.
- Batch 0 guardrail rigor is mandatory and must be adapted for **features + tests**.
- Feature work must be sliced into groups with max ~20 feature IDs.
- Status updates must use `feature/test batch-update` chunks of max 15 IDs.
- If blocked, mark `deferred` with explicit reason; do not write stubs.

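The 15-ID limit on status updates is easy to enforce mechanically. The sketch below is illustrative only: `chunk_ids` is a hypothetical helper, and it deliberately stops at producing the ID chunks rather than guessing at the `batch-update` command-line syntax. The example ID range is Group A from the feature slicing in this design.

```python
from typing import Iterator, List

MAX_IDS_PER_UPDATE = 15  # limit stated in the constraints above


def chunk_ids(ids: List[int], size: int = MAX_IDS_PER_UPDATE) -> Iterator[List[int]]:
    """Yield consecutive slices of `ids`, each holding at most `size` IDs."""
    if size < 1:
        raise ValueError("chunk size must be at least 1")
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


if __name__ == "__main__":
    group_a = list(range(1636, 1656))  # feature IDs 1636-1655 (20 IDs)
    for chunk in chunk_ids(group_a):
        # One comma-separated line per status update call.
        print(",".join(str(feature_id) for feature_id in chunk))
```

For a 20-ID group this yields one chunk of 15 IDs followed by one chunk of 5.
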
## Approaches

### Approach A: Single large implementation pass

- Pros: low planning overhead.
- Cons: poor auditability, high regression/stub risk, hard to isolate failures.

### Approach B (Recommended): Feature-first 3 groups, then 5 test waves, each with hard checkpoint gates

- Pros: bounded scope, auditable status transitions, faster root-cause isolation.
- Cons: more CLI/test command overhead.

### Approach C: Test-first across all 160 before feature completion

- Pros: immediate behavior pressure.
- Cons: high churn because many tests depend on not-yet-ported consumer cluster paths.

**Decision:** Approach B.

## Proposed Design

### 1. Architecture and File Ownership

Production code is split by behavior boundary instead of one monolithic file:

- `JetStream` consumer orchestration:
  - expected: `JetStream/JetStream.ClusterConsumers.cs` (create) or `JetStreamTypes.cs` (modify)
- `NatsConsumer` cluster hooks:
  - expected: `JetStream/NatsConsumer.Cluster.cs` (create) or `NatsConsumer.cs` (modify)
- `JetStreamCluster` placement + encoding/decoding:
  - expected: `JetStream/JetStreamCluster.Consumers.cs` (create) or `JetStreamClusterTypes.cs` (modify)
- `NatsServer` clustered request/advisory endpoints:
  - expected: `NatsServer.JetStreamClusterConsumers.cs` (create) as partial server extension
- `Account` limits selection helper:
  - expected: `Accounts/Account.JetStream.cs` (create) or `Accounts/Account.cs` (modify)

### 2. Feature Slicing (max ~20 IDs each)

- **Group A (20 IDs):** `1636-1655`
  Consumer assignment/inflight lookup, consumer raft-node helpers, monitor/apply entries, ack decode, leader advisory primitives.
- **Group B (20 IDs):** `1656-1675`
  Assignment result processors, updates subscription lifecycle, leader-change flow, peer remap/selection foundation, tier/limits checks, base clustered stream request helpers.
- **Group C (18 IDs):** `1676-1693`
  Clustered stream update/delete/purge/restore/list, consumer/message delete requests, and assignment/purge/message encode-decode helpers.

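The group boundaries above can be sanity-checked before any work starts. This is a minimal sketch with a hypothetical `check_groups` helper; it assumes the "~20" limit is treated as a hard cap of 20 and that the three ranges are meant to be contiguous, which matches the ranges listed but is not stated explicitly in this design.

```python
from typing import Dict, Tuple

# Inclusive feature ID ranges copied from the group list above.
GROUPS: Dict[str, Tuple[int, int]] = {
    "A": (1636, 1655),
    "B": (1656, 1675),
    "C": (1676, 1693),
}


def check_groups(
    groups: Dict[str, Tuple[int, int]],
    max_size: int = 20,        # assumption: "~20" treated as a hard cap
    expected_total: int = 58,  # feature count from the batch scope
) -> Dict[str, int]:
    """Return per-group sizes; raise ValueError on gaps, overlaps, or bad totals."""
    sizes: Dict[str, int] = {}
    previous_end = None
    for name, (first, last) in sorted(groups.items(), key=lambda kv: kv[1][0]):
        if last < first:
            raise ValueError(f"group {name}: range is reversed")
        if previous_end is not None and first != previous_end + 1:
            raise ValueError(f"group {name}: gap or overlap before ID {first}")
        size = last - first + 1
        if size > max_size:
            raise ValueError(f"group {name}: {size} IDs exceeds cap of {max_size}")
        sizes[name] = size
        previous_end = last
    if sum(sizes.values()) != expected_total:
        raise ValueError(f"groups cover {sum(sizes.values())} IDs, expected {expected_total}")
    return sizes


if __name__ == "__main__":
    print(check_groups(GROUPS))
```
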
### 3. Test Slicing

- **Wave T1 (37 IDs):** JetStream cluster/consumer behavior core (`JetStreamClusterTests1/2/3/4`, `JetStreamEngineTests`, `NatsConsumerTests`)
- **Wave T2 (39 IDs):** config/reload/options surface (`ServerOptionsTests`, `ConfigCheckTests`, `ConfigReloaderTests`, `NatsServerTests`)
- **Wave T3 (33 IDs):** JWT/auth/cert/account validations (`JwtProcessorTests`, `JetStreamJwtTests`, `AuthCalloutTests`, `AuthHandlerTests`, `CertificateStoreWindowsTests`, `AccountTests`)
- **Wave T4 (32 IDs):** transport + route + leaf/websocket (`WebSocketHandlerTests`, `LeafNodeHandlerTests`, `LeafNodeProxyTests`, `RouteHandlerTests`, `GatewayHandlerTests`)
- **Wave T5 (19 IDs):** remaining integration-oriented regressions (`MqttHandlerTests`, `JetStreamLeafNodeTests`, `JetStreamSuperClusterTests`, `MessageTracerTests`, `MonitoringHandlerTests`, `EventsHandlerTests`, `JetStreamFileStoreTests`)

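The wave assignment above is effectively a lookup from test class to wave. A small sketch of that lookup follows; `wave_for_class` is a hypothetical helper, and expanding `JetStreamClusterTests1/2/3/4` into four separate class names is an assumption about the shorthand used in the list.

```python
from typing import Dict, List, Tuple

# Wave -> (planned test ID count, test classes), copied from the list above.
WAVES: Dict[str, Tuple[int, List[str]]] = {
    "T1": (37, ["JetStreamClusterTests1", "JetStreamClusterTests2",
                "JetStreamClusterTests3", "JetStreamClusterTests4",
                "JetStreamEngineTests", "NatsConsumerTests"]),
    "T2": (39, ["ServerOptionsTests", "ConfigCheckTests",
                "ConfigReloaderTests", "NatsServerTests"]),
    "T3": (33, ["JwtProcessorTests", "JetStreamJwtTests", "AuthCalloutTests",
                "AuthHandlerTests", "CertificateStoreWindowsTests", "AccountTests"]),
    "T4": (32, ["WebSocketHandlerTests", "LeafNodeHandlerTests",
                "LeafNodeProxyTests", "RouteHandlerTests", "GatewayHandlerTests"]),
    "T5": (19, ["MqttHandlerTests", "JetStreamLeafNodeTests",
                "JetStreamSuperClusterTests", "MessageTracerTests",
                "MonitoringHandlerTests", "EventsHandlerTests",
                "JetStreamFileStoreTests"]),
}


def wave_for_class(test_class: str) -> str:
    """Return the wave that owns `test_class`, or raise KeyError if unmapped."""
    for wave, (_, classes) in WAVES.items():
        if test_class in classes:
            return wave
    raise KeyError(f"test class {test_class!r} is not assigned to any wave")


def total_planned_tests() -> int:
    """Sum the per-wave ID counts; this should match the batch scope of 160."""
    return sum(count for count, _ in WAVES.values())
```

Comparing `total_planned_tests()` against the batch scope gives an early warning if a wave is resized without rebalancing the others.
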
### 4. Verification Model

- Per-feature loop and per-test loop are mandatory.
- Every loop requires:
  - stub detection scan
  - build gate
  - targeted test gate
- A checkpoint is required between tasks before any `verified` promotion.
- Status transitions are evidence-driven only:
  - `deferred/not_started -> stub -> complete -> verified`

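The transition chain above can be read as a small state machine. The sketch below is one possible encoding, not the tracking tool's actual behavior: the `Evidence` type and `promote` function are hypothetical, and requiring all three gates for promotion to `complete` and `verified` is an assumption drawn from the "every loop requires" list.

```python
from dataclasses import dataclass

# Forward transitions from the chain above: deferred/not_started -> stub -> complete -> verified.
NEXT_STATUS = {
    "deferred": "stub",
    "not_started": "stub",
    "stub": "complete",
    "complete": "verified",
}

# Assumption: these targets require all three gates to have passed.
GATED_TARGETS = {"complete", "verified"}


@dataclass(frozen=True)
class Evidence:
    """Outcome of the three mandatory gates for one feature or test."""
    stub_scan_clean: bool = False
    build_passed: bool = False
    targeted_tests_passed: bool = False

    def all_gates_passed(self) -> bool:
        return self.stub_scan_clean and self.build_passed and self.targeted_tests_passed


def promote(current: str, evidence: Evidence) -> str:
    """Return the next status for `current`, refusing gated steps without evidence."""
    if current not in NEXT_STATUS:
        raise ValueError(f"no promotion is defined from status {current!r}")
    target = NEXT_STATUS[current]
    if target in GATED_TARGETS and not evidence.all_gates_passed():
        raise ValueError(f"cannot promote to {target!r}: gate evidence is incomplete")
    return target
```

Because `promote` only ever moves one step along the chain, an item cannot jump from `deferred` straight to `verified`, which keeps each status change individually auditable.
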
### 5. Failure and Deferral Strategy

If blocked by missing infra/dependency behavior:

1. Stop the current item.
2. Do not introduce placeholder logic or fake-pass tests.
3. Mark item `deferred` with explicit reason via `--override`.
4. Continue with next unblocked ID.

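The four steps above amount to a skip-and-record loop. The following sketch shows that control flow under stated assumptions: `Blocked`, `run_items`, and the `port_item` callback are hypothetical names, and recording the reason in a dictionary stands in for the real `deferred` status update.

```python
from typing import Callable, Dict, Iterable, List, Tuple


class Blocked(Exception):
    """Raised by a porting step when required infra or dependency behavior is missing."""


def run_items(
    ids: Iterable[int],
    port_item: Callable[[int], None],
) -> Tuple[List[int], Dict[int, str]]:
    """Process each ID; on Blocked, record the reason and move to the next ID."""
    completed: List[int] = []
    deferred: Dict[int, str] = {}
    for item_id in ids:
        try:
            port_item(item_id)
        except Blocked as exc:
            reason = str(exc).strip()
            if not reason:
                # Step 3 demands an explicit reason, so an empty one is an error.
                raise ValueError(f"item {item_id} was blocked without a reason")
            deferred[item_id] = reason  # stop this item, no placeholder logic
            continue                    # step 4: continue with the next ID
        completed.append(item_id)
    return completed, deferred
```

Only `Blocked` is caught, so genuine defects in a porting step still surface as failures instead of being silently deferred.
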
## Risks and Mitigations

- **Dependency readiness risk (Batch 33):**
  Mitigation: hard preflight gate before starting Batch 34.
- **Wide test blast radius (160 tests / 29 classes):**
  Mitigation: wave-based execution and strict checkpoints.
- **Stub regression risk in ported methods/tests:**
  Mitigation: non-negotiable anti-stub scans and hard limits.
- **Ownership ambiguity across partial classes:**
  Mitigation: explicit file ownership map and method-to-class grouping.

## Success Criteria

- All 58 features are `verified` with evidence or `deferred` with explicit blocker reason.
- All 160 tests are `verified` with evidence or `deferred` with explicit blocker reason.
- No forbidden stub patterns remain in touched production or test files.
- Status updates are auditable and chunked (`<=15` IDs per `batch-update` call).

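The "no forbidden stub patterns" criterion implies a line-level scan of touched files. This design does not list the forbidden patterns, so the three regular expressions below are illustrative assumptions about what a stub might look like in C# production and test code; `scan_source` is likewise a hypothetical helper.

```python
import re
from typing import List, Tuple

# Assumed examples of stub markers; the real forbidden list is defined elsewhere.
STUB_PATTERNS = [
    re.compile(r"throw\s+new\s+NotImplementedException"),
    re.compile(r"//\s*TODO", re.IGNORECASE),
    re.compile(r"Assert\.True\(\s*true\s*\)"),
]


def scan_source(text: str) -> List[Tuple[int, str]]:
    """Return (line number, stripped line) for every line matching a stub pattern."""
    hits: List[Tuple[int, str]] = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        if any(pattern.search(line) for pattern in STUB_PATTERNS):
            hits.append((line_no, line.strip()))
    return hits
```

An empty result from `scan_source` for every touched file would be the evidence attached to this criterion; any hit blocks promotion until the line is replaced with real logic or the item is deferred.
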
## Non-Goals

- Executing implementation in this planning session.
- Expanding scope beyond Batch 34.
- Building new infrastructure outside existing batch-mapped feature/test needs.