natsdotnet/docs/plans/2026-02-23-jetstream-post-baseline-parity-design.md
2026-02-23 11:13:13 -05:00


# JetStream Post-Baseline Remaining Parity Design
**Date:** 2026-02-23

**Status:** Approved

**Scope:** Port all remaining Go JetStream functionality still marked `Baseline` or `N` in `differences.md`, including the transport prerequisites (gateway, leaf node, and account-interest protocol) that full JetStream parity depends on.
## 1. Architecture and Scope Boundary
### Parity closure target
The completion target is to eliminate JetStream and JetStream-required transport deltas from `differences.md` by moving the remaining rows from `Baseline`/`N` to `Y`, unless an explicit external blocker is documented with evidence.
### In scope (remaining parity inventory)
1. JetStream runtime stream semantics:
   - retention runtime behavior (`Limits`, `Interest`, `WorkQueue`)
   - `MaxAge` TTL pruning and `MaxMsgsPer` enforcement
   - `MaxMsgSize` reject path
   - dedupe-window semantics (a bounded duplicate window rather than an unbounded dictionary)
   - stream config behavior for `Compression`, subject transform, republish, direct/KV toggles, and sealed/delete/purge guards
2. JetStream consumer semantics:
   - full deliver-policy behavior (`All`, `Last`, `New`, `ByStartSequence`, `ByStartTime`, `LastPerSubject`)
   - `AckPolicy.All` wire/runtime semantics parity
   - `MaxDeliver` + backoff schedule + redelivery deadlines
   - flow control frames, idle heartbeats, and rate limiting
   - replay policy timing parity
3. Mirror/source advanced behavior:
   - mirror sync state tracking
   - source subject mapping
   - cross-account mirror/source behavior and auth checks
4. JetStream storage parity layers:
   - block-backed file layout
   - time-based expiry/TTL index integration
   - optional compression/encryption plumbing
   - deterministic sequence index behavior for recovery and lookup
5. RAFT/cluster semantics used by JetStream:
   - heartbeat/keepalive and election timeout behavior
   - `nextIndex` mismatch backtracking
   - snapshot transfer + install from leader
   - membership change semantics
   - durable meta/replica governance wiring for JetStream cluster control
6. JetStream-required transport prerequisites:
   - inter-server account interest protocol (`A+`/`A-`) with account-aware propagation
   - gateway advanced semantics (`_GR_.` reply remap + full interest-only behavior)
   - leaf advanced semantics (`$LDS.` loop detection + account remap rules)
   - cross-cluster JetStream forwarding over gateways once interest semantics are correct
   - internal `JETSTREAM` client lifecycle parity (`ClientKind.JetStream` usage in runtime wiring)
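
The bounded dedupe window in the first inventory item differs from an unbounded dictionary in one way that matters: entries age out. The sketch below is a minimal illustration, assuming message IDs taken from a `Nats-Msg-Id`-style header and a window expressed in seconds; `DedupeWindow` and its methods are hypothetical and are not the `StreamManager` or `JetStreamPublisher` API.

```python
from collections import OrderedDict


class DedupeWindow:
    """Hypothetical time-bounded duplicate tracker keyed by message ID.

    Entries older than `window_seconds` are evicted, so memory is bounded by
    the number of publishes inside the window instead of growing forever.
    Assumes the `now` values passed in are non-decreasing.
    """

    def __init__(self, window_seconds):
        self.window = window_seconds
        # msg_id -> (stream sequence, record time), oldest first
        self._entries = OrderedDict()

    def _evict(self, now):
        # The oldest entry is at the front; stop at the first one still in the window.
        while self._entries:
            _, (_, recorded) = next(iter(self._entries.items()))
            if now - recorded < self.window:
                break
            self._entries.popitem(last=False)

    def check(self, msg_id, now):
        """Return the original sequence if msg_id is a live duplicate, else None."""
        self._evict(now)
        entry = self._entries.get(msg_id)
        return entry[0] if entry is not None else None

    def record(self, msg_id, seq, now):
        self._evict(now)
        self._entries[msg_id] = (seq, now)

    def __len__(self):
        return len(self._entries)
```

Because records are appended in time order, only the front of the `OrderedDict` ever needs inspecting, so eviction cost is proportional to the number of expired entries.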
### Out of scope
Gaps that do not affect JetStream parity closure (for example, route compression or auth callout features unrelated to JetStream) remain out of scope for this plan.
## 2. Component Plan
### A. Transport/account prerequisite completion
Primary files:
- `src/NATS.Server/Gateways/GatewayConnection.cs`
- `src/NATS.Server/Gateways/GatewayManager.cs`
- `src/NATS.Server/LeafNodes/LeafConnection.cs`
- `src/NATS.Server/LeafNodes/LeafNodeManager.cs`
- `src/NATS.Server/Routes/RouteConnection.cs`
- `src/NATS.Server/Protocol/ClientCommandMatrix.cs`
- `src/NATS.Server/NatsServer.cs`
- `src/NATS.Server/Subscriptions/RemoteSubscription.cs`
- `src/NATS.Server/Subscriptions/SubList.cs`

Implementation intent:
- carry account-aware remote interest metadata end-to-end
- implement gateway reply remap contract and de-remap path
- implement leaf loop marker handling and account remap/validation
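
Carrying account scope end to end can be pictured as keying remote interest by (peer, account, subject) instead of by subject alone. A minimal sketch: `RemoteInterestTable` is hypothetical and does not reflect the `RemoteSubscription` or `SubList` signatures, it matches exact subjects only (no `*`/`>` wildcards), and it models `A+`/`A-` simply as a per-account on/off switch for one peer rather than reproducing the Go wire semantics.

```python
from collections import defaultdict


class RemoteInterestTable:
    """Hypothetical per-account remote interest, keyed by peer and account.

    A global-only table would collapse (account, subject) into subject and
    leak interest across accounts; keeping the account in the key avoids that.
    """

    def __init__(self):
        # (peer, account) -> {subject: refcount}
        self._subs = defaultdict(lambda: defaultdict(int))
        # (peer, account) pairs the peer has switched off (modeled A-)
        self._account_off = set()

    def add_sub(self, peer, account, subject):
        self._subs[(peer, account)][subject] += 1

    def remove_sub(self, peer, account, subject):
        subs = self._subs[(peer, account)]
        if subs.get(subject, 0) <= 1:
            subs.pop(subject, None)  # last reference gone
        else:
            subs[subject] -= 1

    def account_interest(self, peer, account, on):
        # Models A+ (on=True) / A- (on=False) for one account on one peer.
        if on:
            self._account_off.discard((peer, account))
        else:
            self._account_off.add((peer, account))

    def should_forward(self, peer, account, subject):
        """Exact-subject match only; wildcard matching is out of scope here."""
        if (peer, account) in self._account_off:
            return False
        return self._subs.get((peer, account), {}).get(subject, 0) > 0
```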
### B. JetStream runtime semantic completion
Primary files:
- `src/NATS.Server/JetStream/StreamManager.cs`
- `src/NATS.Server/JetStream/ConsumerManager.cs`
- `src/NATS.Server/JetStream/Consumers/PullConsumerEngine.cs`
- `src/NATS.Server/JetStream/Consumers/PushConsumerEngine.cs`
- `src/NATS.Server/JetStream/Consumers/AckProcessor.cs`
- `src/NATS.Server/JetStream/Publish/JetStreamPublisher.cs`
- `src/NATS.Server/JetStream/Publish/PublishPreconditions.cs`
- `src/NATS.Server/JetStream/Models/StreamConfig.cs`
- `src/NATS.Server/JetStream/Models/ConsumerConfig.cs`
- `src/NATS.Server/JetStream/Validation/JetStreamConfigValidator.cs`

Implementation intent:
- enforce configured policies at runtime, not just parse/model shape
- preserve Go-aligned API error codes and state transition behavior
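
Enforcing deliver policy at runtime, rather than only parsing it, comes down to deciding which stored messages a new consumer replays before live traffic. A minimal sketch over an in-memory list of stored messages, assuming no subject filter, timestamps that do not decrease with sequence, and hypothetical names (`DeliverPolicy`, `StoredMsg`, `initial_backlog`) that are not the `ConsumerConfig` model.

```python
from dataclasses import dataclass
from enum import Enum, auto


class DeliverPolicy(Enum):
    ALL = auto()
    LAST = auto()
    NEW = auto()
    BY_START_SEQUENCE = auto()
    BY_START_TIME = auto()
    LAST_PER_SUBJECT = auto()


@dataclass(frozen=True)
class StoredMsg:
    seq: int
    subject: str
    ts: float  # seconds; assumed non-decreasing with seq


def initial_backlog(policy, msgs, opt_start_seq=0, opt_start_time=0.0):
    """Return the stream sequences a new consumer delivers before live traffic.

    `msgs` must be sorted by seq. Every policy continues with newly
    published messages once this backlog has been delivered.
    """
    if policy is DeliverPolicy.ALL:
        return [m.seq for m in msgs]
    if policy is DeliverPolicy.LAST:
        return [msgs[-1].seq] if msgs else []
    if policy is DeliverPolicy.NEW:
        return []
    if policy is DeliverPolicy.BY_START_SEQUENCE:
        # Sequences no longer stored (e.g. purged) are simply skipped.
        return [m.seq for m in msgs if m.seq >= opt_start_seq]
    if policy is DeliverPolicy.BY_START_TIME:
        start = next((i for i, m in enumerate(msgs) if m.ts >= opt_start_time), len(msgs))
        return [m.seq for m in msgs[start:]]
    if policy is DeliverPolicy.LAST_PER_SUBJECT:
        last = {}
        for m in msgs:
            last[m.subject] = m.seq  # later sequences overwrite earlier ones
        return sorted(last.values())
    raise ValueError(f"unsupported deliver policy: {policy}")
```

`LastPerSubject` is the one policy that does not reduce to a single start sequence, which is why the sketch returns a list of sequences rather than a starting point.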
### C. Storage and snapshot durability
Primary files:
- `src/NATS.Server/JetStream/Storage/FileStore.cs`
- `src/NATS.Server/JetStream/Storage/FileStoreBlock.cs`
- `src/NATS.Server/JetStream/Storage/FileStoreOptions.cs`
- `src/NATS.Server/JetStream/Storage/MemStore.cs`
- `src/NATS.Server/JetStream/Snapshots/StreamSnapshotService.cs`

Implementation intent:
- replace JSONL-only behavior with block-oriented store semantics
- enforce TTL pruning in store read/write paths
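
Enforcing TTL pruning in both store paths means an expired message is never returned by a read, even before pruning has physically removed it, and is dropped no later than the next append. A minimal in-memory sketch, assuming an injected clock and applying the `MaxMsgsPer` cap on append; `TtlStore` is hypothetical and leaves out block files, compression, and encryption.

```python
from collections import OrderedDict, defaultdict


class TtlStore:
    """Hypothetical store enforcing MaxAge on reads/writes and MaxMsgsPer on writes."""

    def __init__(self, max_age, max_msgs_per, clock):
        self.max_age = max_age              # seconds; 0 disables age limits
        self.max_msgs_per = max_msgs_per    # 0 disables the per-subject cap
        self.clock = clock
        self._msgs = OrderedDict()          # seq -> (subject, payload, ts), seq order
        self._by_subject = defaultdict(list)  # subject -> ascending seqs
        self._next_seq = 1

    def _expired(self, ts):
        return self.max_age > 0 and self.clock() - ts >= self.max_age

    def _remove(self, seq):
        subject, _, _ = self._msgs.pop(seq)
        self._by_subject[subject].remove(seq)

    def prune_expired(self):
        # Seq order is also timestamp order, so stop at the first live message.
        while self._msgs:
            seq, (_, _, ts) = next(iter(self._msgs.items()))
            if not self._expired(ts):
                break
            self._remove(seq)

    def append(self, subject, payload):
        self.prune_expired()  # write path: drop expired messages first
        seq = self._next_seq
        self._next_seq += 1
        self._msgs[seq] = (subject, payload, self.clock())
        seqs = self._by_subject[subject]
        seqs.append(seq)
        # Drop the oldest messages on this subject beyond MaxMsgsPer.
        while self.max_msgs_per > 0 and len(seqs) > self.max_msgs_per:
            self._remove(seqs[0])
        return seq

    def get(self, seq):
        entry = self._msgs.get(seq)
        if entry is None or self._expired(entry[2]):
            return None  # read path: expired entries are invisible
        return entry[1]

    def __len__(self):
        return len(self._msgs)
```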
### D. RAFT and JetStream cluster governance
Primary files:
- `src/NATS.Server/Raft/RaftNode.cs`
- `src/NATS.Server/Raft/RaftReplicator.cs`
- `src/NATS.Server/Raft/RaftTransport.cs`
- `src/NATS.Server/Raft/RaftLog.cs`
- `src/NATS.Server/Raft/RaftSnapshotStore.cs`
- `src/NATS.Server/JetStream/Cluster/JetStreamMetaGroup.cs`
- `src/NATS.Server/JetStream/Cluster/StreamReplicaGroup.cs`
- `src/NATS.Server/JetStream/Cluster/AssetPlacementPlanner.cs`

Implementation intent:
- transition from in-memory baseline consensus behavior to the networked state-machine semantics that the cluster APIs require
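
The `nextIndex` mismatch backtracking listed under RAFT semantics follows the standard Raft rule: when a follower rejects AppendEntries because it has no entry at `prevLogIndex` with a matching term, the leader decrements `nextIndex` for that follower and retries. A minimal single-process sketch with logs modeled as lists of entry terms; it uses the plain one-step decrement, and the function names are hypothetical rather than `RaftNode` or `RaftReplicator` APIs.

```python
def follower_append(log, prev_index, prev_term, entries):
    """Apply an AppendEntries request to `log`, a list of entry terms (index 1 = log[0]).

    Returns False when the follower has no entry at prev_index whose term is
    prev_term (the Raft log-matching check); otherwise applies the entries.
    """
    if prev_index > len(log):
        return False
    if prev_index > 0 and log[prev_index - 1] != prev_term:
        return False
    for offset, term in enumerate(entries):
        pos = prev_index + offset  # 0-based slot for this entry
        if pos < len(log):
            if log[pos] == term:
                continue  # entry already present; keep it
            del log[pos:]  # conflict: drop this entry and everything after it
        log.append(term)
    return True


def replicate(leader_log, follower_log, next_index):
    """Retry AppendEntries, decrementing next_index on each mismatch.

    Returns (new next_index, attempts). Terminates because prev_index 0
    always passes the consistency check.
    """
    attempts = 0
    while True:
        attempts += 1
        prev_index = next_index - 1
        prev_term = leader_log[prev_index - 1] if prev_index > 0 else 0
        entries = leader_log[prev_index:]
        if follower_append(follower_log, prev_index, prev_term, entries):
            return len(leader_log) + 1, attempts
        next_index -= 1  # mismatch: back up one entry and retry
```

Many Raft implementations let the follower return a conflict hint so the leader can skip back a whole term at a time; that optimization is deliberately left out here.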
### E. Internal JetStream client and observability
Primary files:
- `src/NATS.Server/NatsServer.cs`
- `src/NATS.Server/InternalClient.cs`
- `src/NATS.Server/Monitoring/JszHandler.cs`
- `src/NATS.Server/Monitoring/VarzHandler.cs`
- `differences.md`

Implementation intent:
- wire internal `ClientKind.JetStream` client lifecycle where Go uses internal JS messaging paths
- ensure monitoring reflects newly enforced runtime behavior
## 3. Data Flow and Behavioral Contracts
1. Interest/account propagation:
   - local subscription updates publish account-scoped interest events to route/gateway/leaf peers
   - peers update per-account remote-interest state, not global-only state
2. Gateway reply remap:
   - outbound cross-cluster reply subjects are rewritten with `_GR_.` metadata
   - inbound responses are de-remapped before local delivery
   - no remap leakage to end clients
3. Leaf loop prevention:
   - loop marker (`$LDS.`) is injected/checked at leaf boundaries
   - looped deliveries are rejected before enqueue
4. Stream publish lifecycle:
   - validate stream policy + preconditions
   - apply dedupe-window logic
   - append to store, prune by policy, then trigger mirror/source + consumer fanout
5. Consumer delivery lifecycle:
   - compute start position from deliver policy
   - enforce max-ack-pending/rate/flow-control/backoff rules
   - track pending/acks/redelivery deterministically across pull/push engines
6. Cluster lifecycle:
   - RAFT heartbeat/election drives leader state
   - append mismatch uses next-index backtracking
   - snapshots transfer over transport and compact follower logs
   - meta-group and stream-groups use durable consensus outputs for control APIs
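
The gateway reply-remap contract from item 2 reduces to a remap on the way out and a de-remap on the way in, so the `_GR_.` prefix never reaches a local client. A minimal sketch, assuming a `_GR_.<cluster>.<server>.<reply>` layout with opaque tokens; the Go server's exact token encoding is not reproduced here, so the function names and token values are placeholders.

```python
GR_PREFIX = "_GR_."


def remap_reply(reply, cluster_token, server_token):
    """Wrap an outbound reply subject so responses route back to this server."""
    if not reply:
        return reply  # no reply subject: nothing to remap
    return f"{GR_PREFIX}{cluster_token}.{server_token}.{reply}"


def deremap_reply(subject):
    """Split a remapped subject into (cluster, server, original reply), or None."""
    if not subject.startswith(GR_PREFIX):
        return None
    parts = subject[len(GR_PREFIX):].split(".", 2)
    if len(parts) != 3 or not all(parts):
        return None  # malformed: reject rather than deliver a half-parsed subject
    return parts[0], parts[1], parts[2]


def route_inbound(subject, my_cluster, my_server):
    """Decide where an inbound message goes; the prefix is stripped before local delivery."""
    parsed = deremap_reply(subject)
    if parsed is None:
        return ("local", subject)
    cluster, server, original = parsed
    if cluster == my_cluster and server == my_server:
        return ("local", original)  # de-remapped before local delivery
    return ("forward", subject)     # belongs to another server; keep the prefix
```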
## 4. Error Handling, Testing, and Completion Gate
### Error handling principles
1. Keep JetStream API contract errors deterministic: the same failure always maps to the same category (validation, state, leadership, or storage) and the same error code.
2. Avoid silent downgrades from strict policy semantics to baseline fallback behavior.
3. Ensure cross-cluster remap and loop-detection failures surface as protocol-safe errors and leave no partial state mutation.
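
Principles 1 and 3 suggest a validate-then-commit shape: every check that can fail runs before any state changes, and each failure maps to exactly one category. A minimal sketch applied to the publish lifecycle from section 3, assuming a simplified stream; `ErrorKind`, `PublishError`, and `Stream` are hypothetical, the order of checks is an illustrative choice rather than the Go order, and dedupe-window eviction is omitted.

```python
from enum import Enum


class ErrorKind(Enum):
    VALIDATION = "validation"
    STATE = "state"
    LEADERSHIP = "leadership"
    STORAGE = "storage"  # raised by the store layer; not modeled here


class PublishError(Exception):
    def __init__(self, kind, reason):
        super().__init__(f"{kind.value}: {reason}")
        self.kind = kind
        self.reason = reason


class Stream:
    """Hypothetical stream with a few of the checks from the publish lifecycle."""

    def __init__(self, max_msg_size, is_leader=True, sealed=False):
        self.max_msg_size = max_msg_size
        self.is_leader = is_leader
        self.sealed = sealed
        self.msgs = []      # committed payloads; index + 1 is the sequence
        self.seen_ids = {}  # msg_id -> seq

    def publish(self, payload, msg_id=None, expected_last_seq=None):
        # Phase 1: checks only. Nothing mutates state until all of them pass.
        if not self.is_leader:
            raise PublishError(ErrorKind.LEADERSHIP, "not stream leader")
        if len(payload) > self.max_msg_size:
            raise PublishError(ErrorKind.VALIDATION, "message size exceeds maximum")
        if self.sealed:
            raise PublishError(ErrorKind.STATE, "stream is sealed")
        if expected_last_seq is not None and expected_last_seq != len(self.msgs):
            raise PublishError(ErrorKind.STATE, "wrong last sequence")
        if msg_id is not None and msg_id in self.seen_ids:
            return self.seen_ids[msg_id], True  # duplicate: ack the original seq
        # Phase 2: commit.
        self.msgs.append(payload)
        seq = len(self.msgs)
        if msg_id is not None:
            self.seen_ids[msg_id] = seq
        return seq, False
```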
### Test strategy
1. Unit tests for each runtime policy branch and protocol transformation.
2. Integration tests for gateway/leaf/account propagation and cross-cluster message contracts.
3. Contract tests for RAFT election, snapshot transfer, and membership transitions.
4. Parity-map tests tying Go feature inventory rows to concrete .NET tests.
### Strict completion criteria
1. Remaining JetStream/prerequisite rows in `differences.md` are either `Y` or explicitly blocked with linked evidence.
2. New behavior has deterministic test coverage at unit + integration level.
3. Focused test gates and the full-suite gate both pass.
4. `differences.md` and the parity map are updated only after green test runs have been verified.
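
Criterion 1 lends itself to a mechanical gate. A minimal sketch, assuming `differences.md` uses pipe tables with the feature name in the first column, a status cell containing `Y`, `N`, `Baseline`, or `Blocked`, and a Markdown link somewhere in the row as the evidence for a `Blocked` entry; none of that layout is confirmed by this plan, so `gate_failures`, the column index, and the markers are assumptions to adjust against the real file.

```python
import re

ROW = re.compile(r"^\|(.+)\|\s*$")
LINK = re.compile(r"\[[^\]]+\]\([^)]+\)")


def gate_failures(markdown, status_col):
    """Return feature names whose status is neither `Y` nor an evidenced `Blocked`.

    Assumes the feature name is in column 0 and the status in column
    `status_col` (0-based); header rows fall through because their status
    cell is not one of the recognized markers.
    """
    failures = []
    for line in markdown.splitlines():
        m = ROW.match(line.strip())
        if not m:
            continue
        cells = [c.strip() for c in m.group(1).split("|")]
        if len(cells) <= status_col or set(cells[0]) <= set("-: "):
            continue  # separator row or too few cells
        status = cells[status_col].strip("`*")
        if status == "Y":
            continue
        if status == "Blocked" and LINK.search(line):
            continue  # blocked with linked evidence counts as closed
        if status in ("Baseline", "N", "Blocked"):
            failures.append(cells[0])
    return failures
```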