diff --git a/docs/plans/2026-02-23-jetstream-full-parity-design.md b/docs/plans/2026-02-23-jetstream-full-parity-design.md
new file mode 100644
index 0000000..c38a5bb
--- /dev/null
+++ b/docs/plans/2026-02-23-jetstream-full-parity-design.md
@@ -0,0 +1,141 @@
+# Full JetStream and Cluster Prerequisite Parity Design
+
+**Date:** 2026-02-23
+**Status:** Approved
+**Scope:** Port JetStream from Go with all prerequisite subsystems required for full Go JetStream test parity, including cluster route/gateway/leaf behaviors and RAFT/meta-cluster semantics.
+**Verification Gate:** Go JetStream-focused test suites in `golang/nats-server/server/` plus new/updated .NET tests.
+**Cutover Model:** Single end-to-end cutover (no interim acceptance gates).
+
+## 1. Architecture
+
+The implementation uses a full in-process .NET parity architecture that mirrors Go subsystem boundaries while keeping strict internal contracts.
+
+1. Core Server Layer (`NatsServer`/`NatsClient`)
+- Extend existing server/client runtime to support full client kinds and inter-server protocol paths.
+- Preserve responsibility for socket lifecycle, parser integration, auth entry, and local dispatch.
+
+2. Cluster Fabric Layer
+- Add route mesh, gateway links, leafnode links, interest propagation, and remote subscription accounting.
+- Provide transport-neutral contracts consumed by JetStream and RAFT replication services.
+
+3. JetStream Control Plane
+- Add account-scoped JetStream managers, API subject handlers (`$JS.API.*`), stream/consumer metadata lifecycle, advisories, and limit enforcement.
+- Integrate with RAFT/meta services for replicated decisions.
+
+4. JetStream Data Plane
+- Add stream ingest path, retention/eviction logic, consumer delivery/ack/redelivery, mirror/source orchestration, and flow-control behavior.
+- Use pluggable storage abstractions with parity-focused behavior.
+
+5. RAFT and Replication Layer
+- Implement meta-group plus per-asset replication groups, election/term logic, log replication, snapshots, and catchup.
+- Expose deterministic commit/applied hooks to JetStream runtime layers.
+
+6. Storage Layer
+- Implement memstore and filestore with sequence indexing, subject indexing, compaction/snapshot support, and recovery semantics.
+
+7. Observability Layer
+- Upgrade `/jsz` and `/varz` JetStream blocks from placeholders to live runtime reporting with Go-compatible response shape.
+
+## 2. Components and Contracts
+
+### 2.1 New component families
+
+1. Cluster and interserver subsystem
+- Add route/gateway/leaf and interserver protocol operations under `src/NATS.Server/`.
+- Extend parser/dispatcher with route/leaf/account operations currently excluded.
+- Expand client-kind model and command routing constraints.
+
+2. JetStream API and domain model
+- Add `src/NATS.Server/JetStream/` subtree for API payload models, stream/consumer models, and error templates/codes.
+
+3. JetStream runtime
+- Add stream manager, consumer manager, ack processor, delivery scheduler, mirror/source orchestration, and flow control handlers.
+- Integrate publish path with stream capture/store/ack behavior.
+
+4. RAFT subsystem
+- Add `src/NATS.Server/Raft/` for replicated logs, elections, snapshots, and membership operations.
+
+5. Storage subsystem
+- Add `src/NATS.Server/JetStream/Storage/` for `MemStore` and `FileStore`, sequence/subject indexes, and restart recovery.
+
+### 2.2 Existing components to upgrade
+
+1. `src/NATS.Server/NatsOptions.cs`
+- Add full config surface for clustering, JetStream, storage, placement, and parity-required limits.
+
+2. `src/NATS.Server/Configuration/ConfigProcessor.cs`
+- Replace silent ignore behavior for cluster/jetstream keys with parsing, mapping, and validation.
+
+3. `src/NATS.Server/Protocol/NatsParser.cs` and `src/NATS.Server/NatsClient.cs`
+- Add missing interserver operations and kind-aware dispatch paths needed for clustered JetStream behavior.
+
+4. Monitoring components
+- Upgrade `src/NATS.Server/Monitoring/MonitorServer.cs` and `src/NATS.Server/Monitoring/Varz.cs`.
+- Add/extend JS monitoring handlers and models for `/jsz` and JetStream runtime fields.
+
+## 3. Data Flow and Behavioral Semantics
+
+1. Inbound publish path
+- Parse client publish commands, apply auth/permission checks, route to local subscribers and JetStream candidates.
+- For JetStream subjects: apply preconditions, append to store, replicate via RAFT (as required), apply committed state, return Go-compatible pub ack.
+
+2. Consumer delivery path
+- Use shared push/pull state model for pending, ack floor, redelivery timers, flow control, and max ack pending.
+- Enforce retention policy semantics (limits/interest/workqueue), filter subject behavior, replay policy, and eviction behavior.
+
+3. Replication and control flow
+- Meta RAFT governs replicated metadata decisions.
+- Per-stream/per-consumer groups replicate state and snapshots.
+- Leader changes preserve at-least-once delivery and consumer state invariants.
+
+4. Recovery flow
+- Reconstruct stream/consumer/store state on startup.
+- In clustered mode, rejoin replication groups and catch up before serving full API/delivery workload.
+- Preserve sequence continuity, subject indexes, delete markers, and pending/redelivery state.
+
+5. Monitoring flow
+- `/varz` JetStream fields and `/jsz` return live runtime state.
+- Advisory and metric surfaces update from control-plane and data-plane events.
+
+## 4. Error Handling and Operational Constraints
+
+1. API error parity
+- Match canonical JetStream codes/messages for validation failures, state conflicts, limits, leadership/quorum issues, and storage failures.
+
+2. Protocol behavior
+- Preserve normal client compatibility while adding interserver protocol and internal client-kind restrictions.
+
+3. Storage and consistency failures
+- Classify corruption/truncation/checksum/snapshot failures as recoverable vs non-recoverable.
+- Avoid silent data loss and emit monitoring/advisory signals where parity requires.
+
+4. Cluster and RAFT fault handling
+- Explicitly handle no-quorum, stale leader, delayed apply, peer removal, catchup lag, and stepdown transitions.
+- Return leadership-aware API errors.
+
+5. Config/reload behavior
+- Treat JetStream and cluster config as first-class with strict validation.
+- Mirror Go-like reloadable vs restart-required change boundaries.
+
+## 5. Testing and Verification Strategy
+
+1. .NET unit tests
+- Add focused tests for JetStream API validation, stream and consumer state, RAFT primitives, mem/file store invariants, and config parsing/validation.
+
+2. .NET integration tests
+- Add end-to-end tests for publish/store/consume/ack behavior, retention policies, restart recovery, and clustered prerequisites used by JetStream.
+
+3. Parity harness
+- Maintain mapping of Go JetStream test categories to .NET feature areas.
+- Execute JetStream-focused Go tests from `golang/nats-server/server/` as acceptance benchmark.
+
+4. `differences.md` policy
+- Update only after verification gate passes.
+- Remove opening JetStream exclusion scope statement and replace with updated parity scope.
+
+## 6. Scope Decisions Captured
+
+- Include all prerequisite non-JetStream subsystems required to satisfy full Go JetStream tests.
+- Verification target is full Go JetStream-focused parity, not a narrowed subset.
+- Delivery model is single end-to-end cutover.
+- `differences.md` top-level scope statement will be updated to include JetStream and clustering parity coverage once verified.
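
The consumer delivery path in section 3 of the design refers to pending, ack floor, and max ack pending state. The following is a minimal Go sketch of how such state might be tracked, assuming delivery sequences are contiguous and start at 1. `ackTracker`, `newAckTracker`, `deliver`, and `ack` are hypothetical names introduced for illustration only; they are taken from neither nats-server nor the planned .NET code, and redelivery timers and flow control are left out.

```go
package main

import "fmt"

// ackTracker is a hypothetical, simplified model of per-consumer ack state.
// Assumption: sequences are handed out contiguously, starting at 1.
type ackTracker struct {
	floor         uint64              // every sequence <= floor has been acked
	delivered     uint64              // highest sequence handed out so far
	pending       map[uint64]struct{} // delivered but not yet acked
	maxAckPending int                 // cap on outstanding unacked messages
}

func newAckTracker(maxAckPending int) *ackTracker {
	return &ackTracker{
		pending:       make(map[uint64]struct{}),
		maxAckPending: maxAckPending,
	}
}

// deliver hands out the next sequence, or reports false when the
// max-ack-pending limit has been reached.
func (t *ackTracker) deliver() (uint64, bool) {
	if len(t.pending) >= t.maxAckPending {
		return 0, false
	}
	t.delivered++
	t.pending[t.delivered] = struct{}{}
	return t.delivered, true
}

// ack clears a pending sequence and advances the floor across every
// contiguous sequence that is no longer pending. Unknown or repeated
// acks are ignored.
func (t *ackTracker) ack(seq uint64) {
	if _, ok := t.pending[seq]; !ok {
		return
	}
	delete(t.pending, seq)
	for t.floor < t.delivered {
		if _, waiting := t.pending[t.floor+1]; waiting {
			break
		}
		t.floor++
	}
}

func main() {
	t := newAckTracker(3)
	t.deliver() // sequence 1
	t.deliver() // sequence 2
	t.deliver() // sequence 3
	_, ok := t.deliver()
	fmt.Println("fourth delivery allowed:", ok)

	t.ack(2) // out-of-order ack: sequence 1 is still pending
	fmt.Println("floor after ack(2):", t.floor)
	t.ack(1) // floor can now move past 1 and 2
	fmt.Println("floor after ack(1):", t.floor)
}
```

In this sketch an out-of-order ack leaves the floor in place until the gap below it closes, which is the property a shared push/pull state model would need to preserve across leader changes and restart recovery.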