Full JetStream and Cluster Prerequisite Parity Design

Date: 2026-02-23
Status: Approved
Scope: Port JetStream from Go with all prerequisite subsystems required for full Go JetStream test parity, including cluster route/gateway/leaf behaviors and RAFT/meta-cluster semantics.
Verification Gate: Go JetStream-focused test suites in golang/nats-server/server/ plus new/updated .NET tests.
Cutover Model: Single end-to-end cutover (no interim acceptance gates).

1. Architecture

The implementation uses a full in-process .NET parity architecture that mirrors Go subsystem boundaries while keeping strict internal contracts.

  1. Core Server Layer (NatsServer/NatsClient)
     • Extend the existing server/client runtime to support the full set of client kinds and interserver protocol paths.
     • Preserve responsibility for socket lifecycle, parser integration, auth entry, and local dispatch.
  2. Cluster Fabric Layer
     • Add route mesh, gateway links, leafnode links, interest propagation, and remote subscription accounting.
     • Provide transport-neutral contracts consumed by JetStream and RAFT replication services.
  3. JetStream Control Plane
     • Add account-scoped JetStream managers, API subject handlers ($JS.API.*), stream/consumer metadata lifecycle, advisories, and limit enforcement.
     • Integrate with RAFT/meta services for replicated decisions.
  4. JetStream Data Plane
     • Add stream ingest path, retention/eviction logic, consumer delivery/ack/redelivery, mirror/source orchestration, and flow-control behavior.
     • Use pluggable storage abstractions with parity-focused behavior.
  5. RAFT and Replication Layer
     • Implement meta-group plus per-asset replication groups, election/term logic, log replication, snapshots, and catchup.
     • Expose deterministic commit/applied hooks to JetStream runtime layers.
  6. Storage Layer
     • Implement memstore and filestore with sequence indexing, subject indexing, compaction/snapshot support, and recovery semantics.
  7. Observability Layer
     • Upgrade /jsz and /varz JetStream blocks from placeholders to live runtime reporting with Go-compatible response shape.
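
The "deterministic commit/applied hooks" in the RAFT layer can be illustrated with a small sketch. It assumes a hook that must see each log index exactly once and in increasing order, whatever order commit notifications arrive in. ICommitSink, OrderedApplier, and RecordingSink are hypothetical names introduced for illustration; they are not part of the design or of nats-server.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical contract: invoked once per log index, in strictly increasing order.
public interface ICommitSink
{
    void Apply(ulong index, byte[] entry);
}

// Hypothetical helper that turns possibly out-of-order commit
// notifications into an ordered, gap-free apply stream.
public sealed class OrderedApplier
{
    private readonly ICommitSink _sink;
    private readonly SortedDictionary<ulong, byte[]> _buffered = new();

    // Highest index already handed to the sink (0 means nothing applied yet).
    public ulong Applied { get; private set; }

    public OrderedApplier(ICommitSink sink) => _sink = sink;

    public void OnCommitted(ulong index, byte[] entry)
    {
        if (index <= Applied) return;          // duplicate notification, ignore
        _buffered[index] = entry;

        // Drain every entry that is now contiguous with the applied index.
        while (_buffered.TryGetValue(Applied + 1, out var next))
        {
            _buffered.Remove(Applied + 1);
            Applied++;
            _sink.Apply(Applied, next);
        }
    }
}

// Test double that records the order in which indexes were applied.
public sealed class RecordingSink : ICommitSink
{
    public List<ulong> Seen { get; } = new();
    public void Apply(ulong index, byte[] entry) => Seen.Add(index);
}
```

Keeping the ordering logic behind one small interface would let the stream and consumer runtimes be tested against a recording sink without a live RAFT group.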

2. Components and Contracts

2.1 New component families

  1. Cluster and interserver subsystem
     • Add route/gateway/leaf and interserver protocol operations under src/NATS.Server/.
     • Extend the parser/dispatcher with the route/leaf/account operations that are currently excluded.
     • Expand the client-kind model and command routing constraints.
  2. JetStream API and domain model
     • Add src/NATS.Server/JetStream/ subtree for API payload models, stream/consumer models, and error templates/codes.
  3. JetStream runtime
     • Add stream manager, consumer manager, ack processor, delivery scheduler, mirror/source orchestration, and flow control handlers.
     • Integrate the publish path with stream capture/store/ack behavior.
  4. RAFT subsystem
     • Add src/NATS.Server/Raft/ for replicated logs, elections, snapshots, and membership operations.
  5. Storage subsystem
     • Add src/NATS.Server/JetStream/Storage/ for MemStore and FileStore, sequence/subject indexes, and restart recovery.
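
To make the "sequence/subject indexes" in the storage subsystem concrete, the following is a minimal in-memory sketch. It assumes one monotonically increasing stream sequence and an exact-match subject index (no wildcards). MemStoreSketch and its members are hypothetical and do not describe the planned MemStore API.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical illustration of a store with a sequence index and a subject index.
public sealed class MemStoreSketch
{
    // Sequence index: stream sequence -> stored message.
    private readonly SortedDictionary<ulong, (string Subject, byte[] Data)> _bySeq = new();
    // Subject index: subject -> sequences currently stored for that subject.
    private readonly Dictionary<string, SortedSet<ulong>> _bySubject = new();

    // Never decreases, so sequences are not reused after a removal.
    public ulong LastSeq { get; private set; }

    // Lowest stored sequence, or 0 when the store is empty.
    public ulong FirstSeq => _bySeq.Count == 0 ? 0UL : _bySeq.Keys.First();

    public ulong Store(string subject, byte[] data)
    {
        LastSeq++;
        _bySeq[LastSeq] = (subject, data);
        if (!_bySubject.TryGetValue(subject, out var seqs))
        {
            seqs = new SortedSet<ulong>();
            _bySubject[subject] = seqs;
        }
        seqs.Add(LastSeq);
        return LastSeq;
    }

    public bool Remove(ulong seq)
    {
        if (!_bySeq.TryGetValue(seq, out var msg)) return false;
        _bySeq.Remove(seq);
        var seqs = _bySubject[msg.Subject];
        seqs.Remove(seq);
        if (seqs.Count == 0) _bySubject.Remove(msg.Subject);   // keep both indexes in step
        return true;
    }

    // Highest stored sequence for a subject, or null if none is stored.
    public ulong? LastSeqFor(string subject) =>
        _bySubject.TryGetValue(subject, out var seqs) ? seqs.Max : (ulong?)null;
}
```

A file-backed store would need the same two lookups plus persistence and recovery, which this sketch leaves out entirely.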

2.2 Existing components to upgrade

  1. src/NATS.Server/NatsOptions.cs
     • Add the full config surface for clustering, JetStream, storage, placement, and parity-required limits.
  2. src/NATS.Server/Configuration/ConfigProcessor.cs
     • Replace the silent-ignore behavior for cluster/jetstream keys with parsing, mapping, and validation.
  3. src/NATS.Server/Protocol/NatsParser.cs and src/NATS.Server/NatsClient.cs
     • Add missing interserver operations and kind-aware dispatch paths needed for clustered JetStream behavior.
  4. Monitoring components
     • Upgrade src/NATS.Server/Monitoring/MonitorServer.cs and src/NATS.Server/Monitoring/Varz.cs.
     • Add/extend JS monitoring handlers and models for /jsz and JetStream runtime fields.
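
The move from silently ignoring keys to "parsing, mapping, and validation" could look roughly like the sketch below. It assumes the jetstream block has already been parsed into a key/value dictionary and handles only two keys, store_dir and max_memory_store. The C# types and property names are hypothetical and do not reflect the current ConfigProcessor or NatsOptions.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical option holder; property names are assumptions for illustration.
public sealed class JetStreamOptionsSketch
{
    public string StoreDir { get; set; } = "";
    public long MaxMemoryBytes { get; set; } = -1;   // -1 means "not configured"
}

public static class JetStreamConfigSketch
{
    // Maps known keys onto options and reports everything else as an error
    // instead of dropping it silently.
    public static (JetStreamOptionsSketch Options, List<string> Errors) Parse(
        IReadOnlyDictionary<string, object> block)
    {
        var opts = new JetStreamOptionsSketch();
        var errors = new List<string>();

        foreach (var (key, value) in block)
        {
            var name = key.ToLowerInvariant();
            if (name == "store_dir")
            {
                if (value is string dir && dir.Length > 0) opts.StoreDir = dir;
                else errors.Add($"jetstream.{key}: expected a non-empty string");
            }
            else if (name == "max_memory_store")
            {
                if (value is long bytes && bytes >= 0) opts.MaxMemoryBytes = bytes;
                else errors.Add($"jetstream.{key}: expected a non-negative integer");
            }
            else
            {
                errors.Add($"jetstream.{key}: unknown field");
            }
        }
        return (opts, errors);
    }
}
```

Collecting errors in a list rather than throwing on the first one lets a single pass report every problem in the file.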

3. Data Flow and Behavioral Semantics

  1. Inbound publish path
     • Parse client publish commands, apply auth/permission checks, and route to local subscribers and JetStream candidates.
     • For JetStream subjects: apply preconditions, append to the store, replicate via RAFT where required, apply the committed state, and return a Go-compatible pub ack.
  2. Consumer delivery path
     • Use a shared push/pull state model for pending, ack floor, redelivery timers, flow control, and max ack pending.
     • Enforce retention policy semantics (limits/interest/workqueue), filter subject behavior, replay policy, and eviction behavior.
  3. Replication and control flow
     • Meta RAFT governs replicated metadata decisions.
     • Per-stream/per-consumer groups replicate state and snapshots.
     • Leader changes preserve at-least-once delivery and consumer state invariants.
  4. Recovery flow
     • Reconstruct stream/consumer/store state on startup.
     • In clustered mode, rejoin replication groups and catch up before serving full API/delivery workload.
     • Preserve sequence continuity, subject indexes, delete markers, and pending/redelivery state.
  5. Monitoring flow
     • /varz JetStream fields and /jsz return live runtime state.
     • Advisory and metric surfaces update from control-plane and data-plane events.
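
The pending, ack floor, and max ack pending state shared by push and pull consumers can be sketched as follows. The sketch assumes delivery sequences are contiguous and start at 1, defines the ack floor as the highest sequence below which everything delivered has been acknowledged, and leaves out redelivery timers and flow control. AckStateSketch is a hypothetical name, not a type from the design.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical consumer-side bookkeeping for unacknowledged deliveries.
public sealed class AckStateSketch
{
    private readonly SortedSet<ulong> _pending = new();
    private readonly int _maxAckPending;
    private ulong _lastDelivered;

    // Highest sequence such that it and everything below it has been acked.
    public ulong AckFloor { get; private set; }
    public int PendingCount => _pending.Count;

    public AckStateSketch(int maxAckPending) => _maxAckPending = maxAckPending;

    // Returns false when the max-ack-pending limit blocks a new delivery.
    public bool TryDeliver(ulong seq)
    {
        if (_pending.Count >= _maxAckPending) return false;
        _pending.Add(seq);
        if (seq > _lastDelivered) _lastDelivered = seq;
        return true;
    }

    public void Ack(ulong seq)
    {
        if (!_pending.Remove(seq)) return;     // unknown or already acked
        // With nothing pending, the floor catches up to the last delivery;
        // otherwise it sits just below the oldest unacked sequence.
        AckFloor = _pending.Count == 0 ? _lastDelivered : _pending.Min - 1;
    }
}
```

Acknowledging out of order leaves the floor where it is until the oldest pending sequence is acknowledged, which is the property recovery has to preserve across restarts and leader changes.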

4. Error Handling and Operational Constraints

  1. API error parity
     • Match canonical JetStream codes/messages for validation failures, state conflicts, limits, leadership/quorum issues, and storage failures.
  2. Protocol behavior
     • Preserve normal client compatibility while adding interserver protocol and internal client-kind restrictions.
  3. Storage and consistency failures
     • Classify corruption/truncation/checksum/snapshot failures as either recoverable or non-recoverable.
     • Avoid silent data loss and emit monitoring/advisory signals where parity requires.
  4. Cluster and RAFT fault handling
     • Explicitly handle no-quorum, stale leader, delayed apply, peer removal, catchup lag, and stepdown transitions.
     • Return leadership-aware API errors.
  5. Config/reload behavior
     • Treat JetStream and cluster config as first-class with strict validation.
     • Mirror Go's boundary between reloadable and restart-required changes.
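
A leadership-aware API error might be produced along these lines. The sketch assumes the JSON envelope uses a type field plus an error object with code, err_code, and description, which is how the Go server's responses are understood to be shaped; that should be verified against the Go source. All C# names are hypothetical, and the numeric codes and descriptions are placeholders, not values from the Go error table.

```csharp
#nullable enable
using System;
using System.Text.Json;
using System.Text.Json.Serialization;

// Hypothetical models for illustration only.
public sealed class ApiErrorSketch
{
    [JsonPropertyName("code")] public int Code { get; init; }
    [JsonPropertyName("err_code")] public int ErrCode { get; init; }
    [JsonPropertyName("description")] public string Description { get; init; } = "";
}

public sealed class ApiResponseSketch
{
    [JsonPropertyName("type")] public string Type { get; init; } = "";
    [JsonPropertyName("error")] public ApiErrorSketch? Error { get; init; }
}

// Two of the cluster faults listed above, as a hypothetical enum.
public enum RaftFaultSketch { NoQuorum, NotLeader }

public static class ApiErrorsSketch
{
    // PLACEHOLDER values: real codes and messages must come from the Go server.
    public static ApiErrorSketch From(RaftFaultSketch fault) => fault switch
    {
        RaftFaultSketch.NoQuorum => new ApiErrorSketch
            { Code = 503, ErrCode = 0, Description = "placeholder: no quorum" },
        RaftFaultSketch.NotLeader => new ApiErrorSketch
            { Code = 503, ErrCode = 0, Description = "placeholder: not leader" },
        _ => throw new ArgumentOutOfRangeException(nameof(fault)),
    };

    public static string ToJson(string type, ApiErrorSketch error) =>
        JsonSerializer.Serialize(new ApiResponseSketch { Type = type, Error = error });
}
```

Routing every fault through one mapping function keeps the wire format in a single place, which makes it easier to compare against the Go responses byte for byte.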

5. Testing and Verification Strategy

  1. .NET unit tests
     • Add focused tests for JetStream API validation, stream and consumer state, RAFT primitives, mem/file store invariants, and config parsing/validation.
  2. .NET integration tests
     • Add end-to-end tests for publish/store/consume/ack behavior, retention policies, restart recovery, and clustered prerequisites used by JetStream.
  3. Parity harness
     • Maintain a mapping of Go JetStream test categories to .NET feature areas.
     • Execute JetStream-focused Go tests from golang/nats-server/server/ as the acceptance benchmark.
  4. differences.md policy
     • Update only after the verification gate passes.
     • Remove the opening JetStream exclusion scope statement and replace it with the updated parity scope.
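
The category-to-feature-area mapping in the parity harness can be kept honest with a simple gap check. The sketch assumes categories and feature areas are plain strings. ParityMapSketch is a hypothetical name, and the category labels in the usage note are illustrative rather than an inventory of the Go test suites.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper for the parity harness.
public static class ParityMapSketch
{
    // Returns the Go test categories that have no .NET feature area assigned yet,
    // de-duplicated and in a stable order so reports are diffable.
    public static List<string> Unmapped(
        IEnumerable<string> goCategories,
        IReadOnlyDictionary<string, string> areaByCategory) =>
        goCategories
            .Where(category => !areaByCategory.ContainsKey(category))
            .Distinct()
            .OrderBy(category => category, StringComparer.Ordinal)
            .ToList();
}
```

For example, with categories "raft", "filestore", and "consumer" and a mapping that covers only "raft" (Raft/) and "filestore" (JetStream/Storage/), the helper returns a list containing just "consumer". A non-empty result could fail the build so the mapping cannot fall behind the Go suites.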

6. Scope Decisions Captured

  • Include all prerequisite non-JetStream subsystems required to satisfy full Go JetStream tests.
  • Verification target is full Go JetStream-focused parity, not a narrowed subset.
  • Delivery model is single end-to-end cutover.
  • differences.md top-level scope statement will be updated to include JetStream and clustering parity coverage once verified.