# Full JetStream and Cluster Prerequisite Parity Design

**Date:** 2026-02-23
**Status:** Approved
**Scope:** Port JetStream from Go with all prerequisite subsystems required for full Go JetStream test parity, including cluster route/gateway/leaf behaviors and RAFT/meta-cluster semantics.
**Verification Gate:** Go JetStream-focused test suites in `golang/nats-server/server/` plus new/updated .NET tests.
**Cutover Model:** Single end-to-end cutover (no interim acceptance gates).

## 1. Architecture

The implementation uses a full in-process .NET parity architecture that mirrors Go subsystem boundaries while keeping strict internal contracts.

1. Core Server Layer (`NatsServer`/`NatsClient`)
- Extend existing server/client runtime to support full client kinds and inter-server protocol paths.
- Preserve responsibility for socket lifecycle, parser integration, auth entry, and local dispatch.

2. Cluster Fabric Layer
- Add route mesh, gateway links, leafnode links, interest propagation, and remote subscription accounting.
- Provide transport-neutral contracts consumed by JetStream and RAFT replication services.

3. JetStream Control Plane
- Add account-scoped JetStream managers, API subject handlers (`$JS.API.*`), stream/consumer metadata lifecycle, advisories, and limit enforcement.
- Integrate with RAFT/meta services for replicated decisions.

4. JetStream Data Plane
- Add stream ingest path, retention/eviction logic, consumer delivery/ack/redelivery, mirror/source orchestration, and flow-control behavior.
- Use pluggable storage abstractions with parity-focused behavior.

5. RAFT and Replication Layer
- Implement meta-group plus per-asset replication groups, election/term logic, log replication, snapshots, and catchup.
- Expose deterministic commit/applied hooks to JetStream runtime layers.

6. Storage Layer
- Implement memstore and filestore with sequence indexing, subject indexing, compaction/snapshot support, and recovery semantics.

7. Observability Layer
- Upgrade `/jsz` and `/varz` JetStream blocks from placeholders to live runtime reporting with Go-compatible response shape.

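One way to picture the "deterministic commit/applied hooks" contract between the RAFT layer and the JetStream runtime is the minimal single-process sketch below. It is written in Python for brevity; the real implementation would be .NET. `ReplicatedLog`, `on_applied`, `propose`, and `advance_commit` are hypothetical names, and replication, elections, and persistence are deliberately left out. The only property it tries to show is that committed entries reach the hooks exactly once and in index order.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical callback shape: (log index, entry payload) -> None.
AppliedHook = Callable[[int, bytes], None]


@dataclass
class ReplicatedLog:
    """Toy stand-in for one RAFT group; no networking or elections."""

    entries: List[bytes] = field(default_factory=list)
    commit_index: int = 0   # highest index known to be committed (1-based)
    applied_index: int = 0  # highest index already handed to hooks
    hooks: List[AppliedHook] = field(default_factory=list)

    def on_applied(self, hook: AppliedHook) -> None:
        self.hooks.append(hook)

    def propose(self, payload: bytes) -> int:
        """Append an entry and return its 1-based index (not yet committed)."""
        self.entries.append(payload)
        return len(self.entries)

    def advance_commit(self, index: int) -> None:
        """Mark entries up to `index` committed and apply them in order."""
        # The commit index never moves backwards or past the end of the log.
        self.commit_index = max(self.commit_index, min(index, len(self.entries)))
        while self.applied_index < self.commit_index:
            self.applied_index += 1
            payload = self.entries[self.applied_index - 1]
            for hook in self.hooks:
                hook(self.applied_index, payload)


log = ReplicatedLog()
seen = []
log.on_applied(lambda idx, data: seen.append((idx, data)))
log.propose(b"create-stream ORDERS")
log.propose(b"create-consumer ORDERS/worker")
log.advance_commit(1)  # only the first entry is applied
log.advance_commit(1)  # a repeated commit is a no-op
log.advance_commit(5)  # clamped to the log length; the second entry is applied
```

Keeping the hook signature free of transport details is what would let the JetStream layers be tested against a stub like this before the cluster fabric exists.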
## 2. Components and Contracts

### 2.1 New component families

1. Cluster and interserver subsystem
- Add route/gateway/leaf and interserver protocol operations under `src/NATS.Server/`.
- Extend parser/dispatcher with route/leaf/account operations currently excluded.
- Expand client-kind model and command routing constraints.

2. JetStream API and domain model
- Add `src/NATS.Server/JetStream/` subtree for API payload models, stream/consumer models, and error templates/codes.

3. JetStream runtime
- Add stream manager, consumer manager, ack processor, delivery scheduler, mirror/source orchestration, and flow control handlers.
- Integrate publish path with stream capture/store/ack behavior.

4. RAFT subsystem
- Add `src/NATS.Server/Raft/` for replicated logs, elections, snapshots, and membership operations.

5. Storage subsystem
- Add `src/NATS.Server/JetStream/Storage/` for `MemStore` and `FileStore`, sequence/subject indexes, and restart recovery.

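To make the "sequence/subject indexes" requirement of the storage subsystem concrete, here is a small in-memory sketch in Python. `MemStoreSketch` and its method names are hypothetical and do not reflect the Go memstore's internal layout. The empty-store convention in `first_seq` (reporting `last_seq + 1`) is an assumption that should be checked against the Go behavior.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class StoredMsg:
    seq: int
    subject: str
    data: bytes


class MemStoreSketch:
    """Hypothetical in-memory store with a sequence index and a subject index."""

    def __init__(self) -> None:
        self._by_seq: Dict[int, StoredMsg] = {}      # sequence index
        self._by_subject: Dict[str, List[int]] = {}  # subject -> ascending seqs
        self._last_seq = 0

    def store(self, subject: str, data: bytes) -> int:
        """Append a message and return its assigned sequence."""
        self._last_seq += 1
        msg = StoredMsg(self._last_seq, subject, data)
        self._by_seq[msg.seq] = msg
        self._by_subject.setdefault(subject, []).append(msg.seq)
        return msg.seq

    def load(self, seq: int) -> Optional[StoredMsg]:
        return self._by_seq.get(seq)

    def load_last(self, subject: str) -> Optional[StoredMsg]:
        """Newest surviving message for an exact subject, if any."""
        seqs = self._by_subject.get(subject)
        return self._by_seq[seqs[-1]] if seqs else None

    def remove(self, seq: int) -> bool:
        """Delete one message; later sequences are never renumbered."""
        msg = self._by_seq.pop(seq, None)
        if msg is None:
            return False
        seqs = self._by_subject[msg.subject]
        seqs.remove(seq)
        if not seqs:
            del self._by_subject[msg.subject]
        return True

    def first_seq(self) -> int:
        """Lowest surviving sequence; last_seq + 1 when empty (assumption)."""
        return min(self._by_seq) if self._by_seq else self._last_seq + 1

    def last_seq(self) -> int:
        return self._last_seq


store = MemStoreSketch()
store.store("orders.new", b"a")   # seq 1
store.store("orders.paid", b"b")  # seq 2
store.store("orders.new", b"c")   # seq 3
store.remove(3)
next_seq = store.store("orders.new", b"d")  # gaps left by deletes are not reused
```

A `FileStore` would need the same two lookups to survive a restart, which is why the design lists restart recovery alongside the indexes.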
### 2.2 Existing components to upgrade

1. `src/NATS.Server/NatsOptions.cs`
- Add full config surface for clustering, JetStream, storage, placement, and parity-required limits.

2. `src/NATS.Server/Configuration/ConfigProcessor.cs`
- Replace silent ignore behavior for cluster/jetstream keys with parsing, mapping, and validation.

3. `src/NATS.Server/Protocol/NatsParser.cs` and `src/NATS.Server/NatsClient.cs`
- Add missing interserver operations and kind-aware dispatch paths needed for clustered JetStream behavior.

4. Monitoring components
- Upgrade `src/NATS.Server/Monitoring/MonitorServer.cs` and `src/NATS.Server/Monitoring/Varz.cs`.
- Add/extend JS monitoring handlers and models for `/jsz` and JetStream runtime fields.

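The move from silently ignoring `jetstream` keys to "parsing, mapping, and validation" could look roughly like the Python sketch below. `JetStreamOptions`, `ConfigError`, and `parse_jetstream_block` are hypothetical names. The key names `store_dir`, `max_memory_store`, and `max_file_store` follow the Go server's configuration as I understand it and should be verified against its config parser; treating every unknown key as an error is an assumption of this sketch, not something the design states.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class JetStreamOptions:
    store_dir: str = ""
    max_memory_store: int = 0  # bytes; 0 is treated here as "not set" (assumption)
    max_file_store: int = 0    # bytes; same assumption


class ConfigError(ValueError):
    """Carries every problem found, not just the first one."""

    def __init__(self, problems: List[str]) -> None:
        super().__init__("; ".join(problems))
        self.problems = problems


def parse_jetstream_block(block: Dict[str, Any]) -> JetStreamOptions:
    """Map an already-parsed `jetstream { ... }` block to options."""
    opts = JetStreamOptions()
    problems: List[str] = []
    for key, value in block.items():
        if key == "store_dir":
            if isinstance(value, str) and value:
                opts.store_dir = value
            else:
                problems.append("store_dir must be a non-empty string")
        elif key in ("max_memory_store", "max_file_store"):
            # bool is a subclass of int in Python, so exclude it explicitly.
            if isinstance(value, int) and not isinstance(value, bool) and value >= 0:
                setattr(opts, key, value)
            else:
                problems.append(f"{key} must be a non-negative integer")
        else:
            # Unknown keys are reported instead of being dropped silently.
            problems.append(f"unknown jetstream key: {key}")
    if problems:
        raise ConfigError(problems)
    return opts


ok = parse_jetstream_block({"store_dir": "/var/lib/nats", "max_file_store": 1 << 30})

try:
    parse_jetstream_block({"store_dir": "", "max_mem_store": 10})
    rejected: List[str] = []
except ConfigError as err:
    rejected = err.problems
```

Collecting all problems before failing keeps a config reload from turning into a fix-one-error-at-a-time loop.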
## 3. Data Flow and Behavioral Semantics

1. Inbound publish path
- Parse client publish commands, apply auth/permission checks, route to local subscribers and JetStream candidates.
- For JetStream subjects: apply preconditions, append to store, replicate via RAFT (as required), apply committed state, return Go-compatible pub ack.

2. Consumer delivery path
- Use shared push/pull state model for pending, ack floor, redelivery timers, flow control, and max ack pending.
- Enforce retention policy semantics (limits/interest/workqueue), filter subject behavior, replay policy, and eviction behavior.

3. Replication and control flow
- Meta RAFT governs replicated metadata decisions.
- Per-stream/per-consumer groups replicate state and snapshots.
- Leader changes preserve at-least-once delivery and consumer state invariants.

4. Recovery flow
- Reconstruct stream/consumer/store state on startup.
- In clustered mode, rejoin replication groups and catch up before serving full API/delivery workload.
- Preserve sequence continuity, subject indexes, delete markers, and pending/redelivery state.

5. Monitoring flow
- `/varz` JetStream fields and `/jsz` return live runtime state.
- Advisory and metric surfaces update from control-plane and data-plane events.

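The JetStream branch of the inbound publish path (preconditions, append, replicate, apply, ack) can be sketched for the single-replica case as follows. This is Python and purely illustrative: `StreamSketch` and `PreconditionFailed` are hypothetical names, the dedupe window never expires, and the order of the two precondition checks is an assumption to confirm against the Go server. The header names `Nats-Msg-Id` and `Nats-Expected-Last-Sequence` and the `stream`/`seq`/`duplicate` ack fields are the ones JetStream clients use.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


class PreconditionFailed(Exception):
    pass


@dataclass
class StreamSketch:
    """Hypothetical single-replica stream; dedupe window expiry is omitted."""

    name: str
    msgs: List[Tuple[str, bytes]] = field(default_factory=list)
    seen_ids: Dict[str, int] = field(default_factory=dict)  # msg id -> seq

    @property
    def last_seq(self) -> int:
        return len(self.msgs)

    def publish(self, subject: str, data: bytes,
                headers: Optional[Dict[str, str]] = None) -> Dict[str, object]:
        headers = headers or {}

        # 1. Preconditions: duplicate detection, then optimistic concurrency.
        msg_id = headers.get("Nats-Msg-Id")
        if msg_id is not None and msg_id in self.seen_ids:
            return {"stream": self.name, "seq": self.seen_ids[msg_id],
                    "duplicate": True}
        expected = headers.get("Nats-Expected-Last-Sequence")
        if expected is not None and int(expected) != self.last_seq:
            raise PreconditionFailed(f"wrong last sequence: {self.last_seq}")

        # 2. Append to the store.
        # 3. In clustered mode replication would happen here; with a single
        #    replica the entry is treated as committed immediately.
        self.msgs.append((subject, data))
        seq = self.last_seq

        # 4. Apply committed state that later publishes depend on.
        if msg_id is not None:
            self.seen_ids[msg_id] = seq

        # 5. Acknowledge with the assigned sequence.
        return {"stream": self.name, "seq": seq}


orders = StreamSketch("ORDERS")
first = orders.publish("orders.new", b"1", {"Nats-Msg-Id": "a1"})
again = orders.publish("orders.new", b"1", {"Nats-Msg-Id": "a1"})
second = orders.publish("orders.new", b"2", {"Nats-Expected-Last-Sequence": "1"})
try:
    orders.publish("orders.new", b"3", {"Nats-Expected-Last-Sequence": "1"})
    conflict = None
except PreconditionFailed as err:
    conflict = str(err)
```

In the replicated case, steps 4 and 5 would run from the RAFT applied hook rather than inline, so that a follower promoted to leader holds the same dedupe and sequence state.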
## 4. Error Handling and Operational Constraints

1. API error parity
- Match canonical JetStream codes/messages for validation failures, state conflicts, limits, leadership/quorum issues, and storage failures.

2. Protocol behavior
- Preserve normal client compatibility while adding interserver protocol and internal client-kind restrictions.

3. Storage and consistency failures
- Classify corruption/truncation/checksum/snapshot failures as recoverable vs non-recoverable.
- Avoid silent data loss and emit monitoring/advisory signals where parity requires.

4. Cluster and RAFT fault handling
- Explicitly handle no-quorum, stale leader, delayed apply, peer removal, catchup lag, and stepdown transitions.
- Return leadership-aware API errors.

5. Config/reload behavior
- Treat JetStream and cluster config as first-class with strict validation.
- Mirror Go-like reloadable vs restart-required change boundaries.

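As a rough illustration of "leadership-aware API errors", the Python sketch below builds a JetStream-style error response and decides when to send one. The `error` object with `code`, `err_code`, and `description` follows the JetStream API response shape; everything else is hypothetical, including `GroupState`, `check_leadership`, the status value, and the description strings. The numeric codes are placeholders only, and whether a non-leader should reply with an error or stay silent for a given request has to follow the Go server's behavior.

```python
import json
from dataclasses import dataclass
from typing import Optional

# Placeholders only: real code/err_code pairs must be taken from the Go
# server's error table; none are reproduced here.
STATUS_PLACEHOLDER = 503
ERR_CODE_PLACEHOLDER = 0


@dataclass(frozen=True)
class ApiError:
    code: int       # HTTP-like status class
    err_code: int   # JetStream-specific error identifier
    description: str

    def to_response(self, response_type: str) -> str:
        """Serialize as an API response that carries only an error."""
        return json.dumps({
            "type": response_type,
            "error": {
                "code": self.code,
                "err_code": self.err_code,
                "description": self.description,
            },
        })


@dataclass
class GroupState:
    """Hypothetical view of one replication group as seen by this server."""

    leader: Optional[str]  # None means no leader is currently known
    self_id: str


def check_leadership(group: GroupState) -> Optional[ApiError]:
    """Return an error to send instead of handling the request, or None."""
    if group.leader is None:
        return ApiError(STATUS_PLACEHOLDER, ERR_CODE_PLACEHOLDER,
                        "no leader available (placeholder text)")
    if group.leader != group.self_id:
        return ApiError(STATUS_PLACEHOLDER, ERR_CODE_PLACEHOLDER,
                        "not the leader (placeholder text)")
    return None


follower = GroupState(leader="n2", self_id="n1")
err = check_leadership(follower)
body = json.loads(err.to_response("io.nats.jetstream.api.v1.stream_create_response"))
```

Centralizing the check in one function makes it easier to compare each fault case (no quorum, stale leader, stepdown) against the Go responses in the parity tests.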
## 5. Testing and Verification Strategy

1. .NET unit tests
- Add focused tests for JetStream API validation, stream and consumer state, RAFT primitives, mem/file store invariants, and config parsing/validation.

2. .NET integration tests
- Add end-to-end tests for publish/store/consume/ack behavior, retention policies, restart recovery, and clustered prerequisites used by JetStream.

3. Parity harness
- Maintain a mapping of Go JetStream test categories to .NET feature areas.
- Execute JetStream-focused Go tests from `golang/nats-server/server/` as the acceptance benchmark.

4. `differences.md` policy
- Update only after the verification gate passes.
- Remove the opening JetStream exclusion scope statement and replace it with the updated parity scope.

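The parity harness mapping from Go test categories to .NET feature areas could start as something as small as the Python sketch below. The prefixes, area names, and test names are illustrative placeholders, not an inventory of the Go suite; the real table should be generated from the Go test files rather than typed by hand. `classify` and `coverage_report` are hypothetical helpers.

```python
from typing import Dict, List, Optional

# Illustrative prefixes and area names only (assumptions, not an inventory).
PREFIX_TO_AREA: Dict[str, str] = {
    "TestJetStreamCluster": "Raft + Cluster fabric",
    "TestJetStreamConsumer": "JetStream consumers",
    "TestJetStream": "JetStream core",
}


def classify(test_name: str) -> Optional[str]:
    """Return the feature area for the longest matching prefix, if any."""
    matches = [p for p in PREFIX_TO_AREA if test_name.startswith(p)]
    if not matches:
        return None
    return PREFIX_TO_AREA[max(matches, key=len)]


def coverage_report(test_names: List[str]) -> Dict[str, List[str]]:
    """Group test names by feature area; unmapped tests go under 'UNMAPPED'."""
    report: Dict[str, List[str]] = {}
    for name in sorted(test_names):
        report.setdefault(classify(name) or "UNMAPPED", []).append(name)
    return report


report = coverage_report([
    "TestJetStreamClusterExample",
    "TestJetStreamConsumerExample",
    "TestJetStreamExample",
    "TestSomethingElse",
])
```

Surfacing an explicit `UNMAPPED` bucket keeps Go tests without a .NET owner visible instead of letting them fall out of the report.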
## 6. Scope Decisions Captured

- Include all prerequisite non-JetStream subsystems required to satisfy full Go JetStream tests.
- Verification target is full Go JetStream-focused parity, not a narrowed subset.
- Delivery model is single end-to-end cutover.
- `differences.md` top-level scope statement will be updated to include JetStream and clustering parity coverage once verified.