diff --git a/docs/plans/2026-02-23-jetstream-post-baseline-parity-design.md b/docs/plans/2026-02-23-jetstream-post-baseline-parity-design.md
new file mode 100644
index 0000000..25e72b9
--- /dev/null
+++ b/docs/plans/2026-02-23-jetstream-post-baseline-parity-design.md
@@ -0,0 +1,177 @@
+# JetStream Post-Baseline Remaining Parity Design
+
+**Date:** 2026-02-23
+**Status:** Approved
+**Scope:** Port all remaining Go JetStream functionality still marked `Baseline` or `N` in `differences.md`, including required transport prerequisites (gateway/leaf/account protocol) needed for full JetStream parity.
+
+## 1. Architecture and Scope Boundary
+
+### Parity closure target
+The completion target is to eliminate JetStream and JetStream-required transport deltas from `differences.md` by moving remaining rows from `Baseline`/`N` to `Y` unless an explicit external blocker is documented with evidence.
+
+### In scope (remaining parity inventory)
+1. JetStream runtime stream semantics:
+- retention runtime behavior (`Limits`, `Interest`, `WorkQueue`)
+- `MaxAge` TTL pruning and `MaxMsgsPer` enforcement
+- `MaxMsgSize` reject path
+- dedupe-window semantics (bounded duplicate window, not unbounded dictionary)
+- stream config behavior for `Compression`, subject transform, republish, direct/KV toggles, sealed/delete/purge guards
+
+2. JetStream consumer semantics:
+- full deliver-policy behavior (`All`, `Last`, `New`, `ByStartSequence`, `ByStartTime`, `LastPerSubject`)
+- `AckPolicy.All` wire/runtime semantics parity
+- `MaxDeliver` + backoff schedule + redelivery deadlines
+- flow control frames, idle heartbeats, and rate limiting
+- replay policy timing parity
+
+3. Mirror/source advanced behavior:
+- mirror sync state tracking
+- source subject mapping
+- cross-account mirror/source behavior and auth checks
+
+4. JetStream storage parity layers:
+- block-backed file layout
+- time-based expiry/TTL index integration
+- optional compression/encryption plumbing
+- deterministic sequence index behavior for recovery and lookup
+
+5. RAFT/cluster semantics used by JetStream:
+- heartbeat / keepalive and election timeout behavior
+- `nextIndex` mismatch backtracking
+- snapshot transfer + install from leader
+- membership change semantics
+- durable meta/replica governance wiring for JetStream cluster control
+
+6. JetStream-required transport prerequisites:
+- inter-server account interest protocol (`A+`/`A-`) with account-aware propagation
+- gateway advanced semantics (`_GR_.` reply remap + full interest-only behavior)
+- leaf advanced semantics (`$LDS.` loop detection + account remap rules)
+- cross-cluster JetStream forwarding path over gateway once interest semantics are correct
+- internal `JETSTREAM` client lifecycle parity (`ClientKind.JetStream` usage in runtime wiring)
+
+### Out of scope
+Non-JetStream-only gaps that do not affect JetStream parity closure (for example route compression or non-JS auth callout features) remain out of scope for this plan.
+
+## 2. Component Plan
+
+### A. Transport/account prerequisite completion
+Primary files:
+- `src/NATS.Server/Gateways/GatewayConnection.cs`
+- `src/NATS.Server/Gateways/GatewayManager.cs`
+- `src/NATS.Server/LeafNodes/LeafConnection.cs`
+- `src/NATS.Server/LeafNodes/LeafNodeManager.cs`
+- `src/NATS.Server/Routes/RouteConnection.cs`
+- `src/NATS.Server/Protocol/ClientCommandMatrix.cs`
+- `src/NATS.Server/NatsServer.cs`
+- `src/NATS.Server/Subscriptions/RemoteSubscription.cs`
+- `src/NATS.Server/Subscriptions/SubList.cs`
+
+Implementation intent:
+- carry account-aware remote interest metadata end-to-end
+- implement gateway reply remap contract and de-remap path
+- implement leaf loop marker handling and account remap/validation
+
+### B. JetStream runtime semantic completion
+Primary files:
+- `src/NATS.Server/JetStream/StreamManager.cs`
+- `src/NATS.Server/JetStream/ConsumerManager.cs`
+- `src/NATS.Server/JetStream/Consumers/PullConsumerEngine.cs`
+- `src/NATS.Server/JetStream/Consumers/PushConsumerEngine.cs`
+- `src/NATS.Server/JetStream/Consumers/AckProcessor.cs`
+- `src/NATS.Server/JetStream/Publish/JetStreamPublisher.cs`
+- `src/NATS.Server/JetStream/Publish/PublishPreconditions.cs`
+- `src/NATS.Server/JetStream/Models/StreamConfig.cs`
+- `src/NATS.Server/JetStream/Models/ConsumerConfig.cs`
+- `src/NATS.Server/JetStream/Validation/JetStreamConfigValidator.cs`
+
+Implementation intent:
+- enforce configured policies at runtime, not just parse/model shape
+- preserve Go-aligned API error codes and state transition behavior
+
+### C. Storage and snapshot durability
+Primary files:
+- `src/NATS.Server/JetStream/Storage/FileStore.cs`
+- `src/NATS.Server/JetStream/Storage/FileStoreBlock.cs`
+- `src/NATS.Server/JetStream/Storage/FileStoreOptions.cs`
+- `src/NATS.Server/JetStream/Storage/MemStore.cs`
+- `src/NATS.Server/JetStream/Snapshots/StreamSnapshotService.cs`
+
+Implementation intent:
+- replace JSONL-only behavior with block-oriented store semantics
+- enforce TTL pruning in store read/write paths
+
+### D. RAFT and JetStream cluster governance
+Primary files:
+- `src/NATS.Server/Raft/RaftNode.cs`
+- `src/NATS.Server/Raft/RaftReplicator.cs`
+- `src/NATS.Server/Raft/RaftTransport.cs`
+- `src/NATS.Server/Raft/RaftLog.cs`
+- `src/NATS.Server/Raft/RaftSnapshotStore.cs`
+- `src/NATS.Server/JetStream/Cluster/JetStreamMetaGroup.cs`
+- `src/NATS.Server/JetStream/Cluster/StreamReplicaGroup.cs`
+- `src/NATS.Server/JetStream/Cluster/AssetPlacementPlanner.cs`
+
+Implementation intent:
+- transition from in-memory baseline consensus behavior to networked state-machine semantics needed by cluster APIs.
+
+### E. Internal JetStream client and observability
+Primary files:
+- `src/NATS.Server/NatsServer.cs`
+- `src/NATS.Server/InternalClient.cs`
+- `src/NATS.Server/Monitoring/JszHandler.cs`
+- `src/NATS.Server/Monitoring/VarzHandler.cs`
+- `differences.md`
+
+Implementation intent:
+- wire internal `ClientKind.JetStream` client lifecycle where Go uses internal JS messaging paths
+- ensure monitoring reflects newly enforced runtime behavior
+
+## 3. Data Flow and Behavioral Contracts
+
+1. Interest/account propagation:
+- local subscription updates publish account-scoped interest events to route/gateway/leaf peers
+- peers update per-account remote-interest state, not global-only state
+
+2. Gateway reply remap:
+- outbound cross-cluster reply subjects are rewritten with `_GR_.` metadata
+- inbound responses are de-remapped before local delivery
+- no remap leakage to end clients
+
+3. Leaf loop prevention:
+- loop marker (`$LDS.`) is injected/checked at leaf boundaries
+- looped deliveries are rejected before enqueue
+
+4. Stream publish lifecycle:
+- validate stream policy + preconditions
+- apply dedupe-window logic
+- append to store, prune by policy, then trigger mirror/source + consumer fanout
+
+5. Consumer delivery lifecycle:
+- compute start position from deliver policy
+- enforce max-ack-pending/rate/flow-control/backoff rules
+- track pending/acks/redelivery deterministically across pull/push engines
+
+6. Cluster lifecycle:
+- RAFT heartbeat/election drives leader state
+- append mismatch uses next-index backtracking
+- snapshots transfer over transport and compact follower logs
+- meta-group and stream-groups use durable consensus outputs for control APIs
+
+## 4. Error Handling, Testing, and Completion Gate
+
+### Error handling principles
+1. Keep JetStream API contract errors deterministic (validation vs state vs leadership vs storage).
+2. Avoid silent downgrades from strict policy semantics to baseline fallback behavior.
+3. Ensure cross-cluster remap/loop detection failures surface with protocol-safe errors and no partial state mutation.
+
+### Test strategy
+1. Unit tests for each runtime policy branch and protocol transformation.
+2. Integration tests for gateway/leaf/account propagation and cross-cluster message contracts.
+3. Contract tests for RAFT election, snapshot transfer, and membership transitions.
+4. Parity-map tests tying Go feature inventory rows to concrete .NET tests.
+
+### Strict completion criteria
+1. Remaining JetStream/prerequisite rows in `differences.md` are either `Y` or explicitly blocked with linked evidence.
+2. New behavior has deterministic test coverage at unit + integration level.
+3. Focused and full suite gates pass.
+4. `differences.md` and parity map are updated only after verified green evidence.
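
Note on the dedupe-window item in section 1 ("bounded duplicate window, not unbounded dictionary"): the sketch below shows one way such a window could be bounded by both age and entry count. It is a language-neutral illustration in Python, not the planned .NET code. `DedupeWindow`, `check_and_record`, `window_seconds`, and `max_entries` are hypothetical names that do not appear in the plan, and the sketch assumes timestamps are supplied in non-decreasing order.

```python
from collections import OrderedDict
from typing import Optional, Tuple


class DedupeWindow:
    """Hypothetical bounded duplicate window keyed by message ID.

    Entries are kept in insertion order, so the oldest entry is always
    first and eviction never scans the whole map.
    """

    def __init__(self, window_seconds: float, max_entries: int) -> None:
        self.window_seconds = window_seconds
        self.max_entries = max_entries
        self._seen: "OrderedDict[str, Tuple[float, int]]" = OrderedDict()

    def _evict(self, now: float) -> None:
        # Drop the oldest entries while they are expired or the map is over its cap.
        while self._seen:
            _, (ts, _seq) = next(iter(self._seen.items()))
            expired = now - ts >= self.window_seconds
            over_cap = len(self._seen) > self.max_entries
            if not expired and not over_cap:
                break
            self._seen.popitem(last=False)

    def check_and_record(self, msg_id: str, seq: int, now: float) -> Optional[int]:
        """Return the original sequence for a duplicate, else record and return None."""
        self._evict(now)
        hit = self._seen.get(msg_id)
        if hit is not None:
            return hit[1]
        self._seen[msg_id] = (now, seq)
        self._evict(now)  # enforce the entry cap after the insert
        return None
```

A publish path would call `check_and_record` before appending to the store and, on a duplicate, answer with the previously assigned sequence instead of storing the message again.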
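
Note on the "`nextIndex` mismatch backtracking" item in sections 1 and 3: the sketch below illustrates the general Raft technique with in-memory lists, where each log entry is a `(term, command)` tuple. It is a simplified, single-process illustration in Python. The function name `replicate_with_backtracking` is hypothetical, it decrements one index per rejected attempt, and it does not model the transport, batching, or commit rules that `RaftReplicator` would need.

```python
from typing import List, Tuple

Entry = Tuple[int, str]  # (term, command)


def replicate_with_backtracking(leader_log: List[Entry], follower_log: List[Entry]) -> int:
    """Make follower_log match leader_log; return the number of append attempts.

    The leader starts optimistically at the end of its own log and steps
    next_index back by one each time the follower rejects the consistency check.
    """
    next_index = len(leader_log)
    attempts = 0
    while True:
        attempts += 1
        prev_index = next_index - 1
        # Consistency check: the follower must hold an entry at prev_index
        # with the same term as the leader's entry at that index.
        matches = prev_index < 0 or (
            prev_index < len(follower_log)
            and follower_log[prev_index][0] == leader_log[prev_index][0]
        )
        if matches:
            # Discard any conflicting suffix, then append the leader's entries.
            del follower_log[next_index:]
            follower_log.extend(leader_log[next_index:])
            return attempts
        next_index -= 1
```

For example, a follower holding two stale term-1 entries after a shared two-entry prefix converges on the third attempt, once `next_index` has backed up to the first divergent index.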