From 47ab559ada305f4523789a8381ad3936d4434d7c Mon Sep 17 00:00:00 2001
From: Joseph Doherty
Date: Mon, 23 Feb 2026 13:17:46 -0500
Subject: [PATCH] docs: add jetstream deep operational parity design

---
 ...etstream-deep-operational-parity-design.md | 154 ++++++++++++++++++
 1 file changed, 154 insertions(+)
 create mode 100644 docs/plans/2026-02-23-jetstream-deep-operational-parity-design.md

diff --git a/docs/plans/2026-02-23-jetstream-deep-operational-parity-design.md b/docs/plans/2026-02-23-jetstream-deep-operational-parity-design.md
new file mode 100644
index 0000000..e1a86d0
--- /dev/null
+++ b/docs/plans/2026-02-23-jetstream-deep-operational-parity-design.md
@@ -0,0 +1,154 @@
+# JetStream Deep Operational Parity Design
+
+**Date:** 2026-02-23
+**Status:** Approved
+**Scope:** Identify and close remaining JetStream deep operational parity gaps versus Go, including behavior-level semantics, storage durability, RAFT/cluster behavior, and documentation drift reconciliation.
+
+## 1. Architecture and Scope Boundary
+
+### Scope definition
+This cycle is JetStream-focused and targets deep operational parity:
+
+1. Stream runtime semantics
+2. Consumer runtime/state machine semantics
+3. Storage durability semantics
+4. RAFT/network and JetStream clustering semantics
+5. Documentation/evidence reconciliation
+
+`JETSTREAM (internal)` is treated as implemented behavior (code + tests present). Any stale doc line stating it is unimplemented is handled as documentation drift, not a re-implementation target.
+
+### Parity control model
+Each feature area is tracked with a truth matrix:
+
+1. Behavior
+- Go-equivalent runtime behavior exists in observable server operation.
+
+2. Tests
+- Contract-positive plus negative/edge tests validate behavior and detect regressions beyond hook-level checks.
+
+3. Docs
+- `differences.md` and parity artifacts accurately reflect validated behavior.
+
+A feature closes only when Behavior + Tests + Docs are all complete.
+
+### Ordered implementation layers
+1. Stream runtime semantics
+2. Consumer state machine semantics
+3. Storage durability semantics
+4. RAFT and cluster governance semantics
+5. Documentation synchronization
+
+## 2. Component Plan
+
+### A. Stream runtime semantics
+Primary files:
+- `src/NATS.Server/JetStream/StreamManager.cs`
+- `src/NATS.Server/JetStream/Models/StreamConfig.cs`
+- `src/NATS.Server/JetStream/Publish/JetStreamPublisher.cs`
+- `src/NATS.Server/JetStream/Publish/PublishPreconditions.cs`
+- `src/NATS.Server/JetStream/Api/Handlers/StreamApiHandlers.cs`
+- `src/NATS.Server/JetStream/Validation/JetStreamConfigValidator.cs`
+
+Focus:
+- retention semantics (`Limits/Interest/WorkQueue`) under live publish/delete flows
+- `MaxAge`, `MaxMsgsPer`, `MaxMsgSize`, dedupe-window semantics under mixed workloads
+- guard behavior (`sealed`, `deny_delete`, `deny_purge`) with contract-accurate errors
+- runtime (not parse-only) behavior for transform/republish/direct-related features
+
+### B. Consumer runtime/state machine semantics
+Primary files:
+- `src/NATS.Server/JetStream/ConsumerManager.cs`
+- `src/NATS.Server/JetStream/Consumers/AckProcessor.cs`
+- `src/NATS.Server/JetStream/Consumers/PullConsumerEngine.cs`
+- `src/NATS.Server/JetStream/Consumers/PushConsumerEngine.cs`
+- `src/NATS.Server/JetStream/Models/ConsumerConfig.cs`
+- `src/NATS.Server/JetStream/Api/Handlers/ConsumerApiHandlers.cs`
+
+Focus:
+- deliver-policy start resolution and cursor transitions
+- ack floor and redelivery determinism (`AckPolicy.*`, backoff, max-deliver)
+- flow control, rate limiting, replay timing semantics across longer scenarios
+
+### C. Storage durability semantics
+Primary files:
+- `src/NATS.Server/JetStream/Storage/FileStore.cs`
+- `src/NATS.Server/JetStream/Storage/FileStoreBlock.cs`
+- `src/NATS.Server/JetStream/Storage/FileStoreOptions.cs`
+- `src/NATS.Server/JetStream/Storage/IStreamStore.cs`
+- `src/NATS.Server/JetStream/Storage/MemStore.cs`
+
+Focus:
+- durable block/index invariants under restart and prune/rewrite cycles
+- compression/encryption behavior from transform stubs to parity-meaningful persistence semantics
+- TTL and index consistency guarantees for large and long-running data sets
+
+### D. RAFT and JetStream cluster semantics
+Primary files:
+- `src/NATS.Server/Raft/RaftNode.cs`
+- `src/NATS.Server/Raft/RaftReplicator.cs`
+- `src/NATS.Server/Raft/RaftTransport.cs`
+- `src/NATS.Server/Raft/RaftRpcContracts.cs`
+- `src/NATS.Server/JetStream/Cluster/JetStreamMetaGroup.cs`
+- `src/NATS.Server/JetStream/Cluster/StreamReplicaGroup.cs`
+- `src/NATS.Server/JetStream/Cluster/AssetPlacementPlanner.cs`
+- integration touchpoints in `src/NATS.Server/NatsServer.cs`
+
+Focus:
+- move from hook-level consensus behaviors to term/quorum-driven outcomes
+- snapshot transfer and membership semantics affecting real commit/placement behavior
+- cross-cluster JetStream behavior validated beyond counter-style forwarding checks
+
+### E. Evidence and documentation reconciliation
+Primary files:
+- `differences.md`
+- `docs/plans/2026-02-23-jetstream-remaining-parity-map.md`
+- `docs/plans/2026-02-23-jetstream-remaining-parity-verification.md`
+
+Focus:
+- remove stale contradictory lines and align notes with verified implementation state
+- keep all parity claims traceable to tests and behavior evidence
+
+## 3. Data Flow and Behavioral Contracts
+
+1. Publish path contract
+- precondition checks occur before persistence mutation
+- stream policy outcomes are atomic from client perspective
+- no partial state exposure on failed publish paths
+
+2. Consumer path contract
+- deterministic cursor initialization and progression
+- ack/redelivery/backoff semantics form a single coherent state machine
+- push/pull engines preserve contract parity under sustained load and restart boundaries
+
+3. Storage contract
+- persisted data and indices roundtrip across restarts without sequence/index drift
+- pruning, ttl, and limit enforcement preserve state invariants (`first/last/messages/bytes`)
+- compression/encryption boundaries are reversible and version-safe
+
+4. RAFT/cluster contract
+- append/commit behavior is consensus-gated (term/quorum aware)
+- heartbeat and snapshot mechanics drive observable follower convergence
+- placement/governance decisions reflect committed cluster state
+
+5. Documentation contract
+- JetStream table rows and summary notes in `differences.md` must agree
+- `JETSTREAM (internal)` status remains `Y` with explicit verification evidence
+
+## 4. Error Handling, Testing Strategy, and Completion Gates
+
+### Error handling
+1. Keep JetStream-specific error semantics and codes intact.
+2. Fail closed on durability/consensus invariant breaches.
+3. Reject partial cluster mutations when consensus prerequisites fail.
+
+### Test strategy
+1. Per feature area: contract-positive + edge/failure test.
+2. Persistence features: restart/recovery tests are mandatory.
+3. Replace hook-level “counter” tests with behavior-real integration tests for deep semantics.
+4. Keep targeted suites per layer plus cross-layer integration scenarios.
+
+### Completion gates
+1. Behavior gate: deep JetStream operational parity gaps closed or explicitly blocked with evidence.
+2. Test gate: focused suites and full suite pass.
+3. Docs gate: parity docs reflect actual validated behavior; stale contradictions removed.
+4. Drift gate: explicit verification that internal JetStream client remains implemented and documented as `Y`.
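
The closure rule from the parity control model ("A feature closes only when Behavior + Tests + Docs are all complete") can be sketched in a few lines. This is a minimal illustration in Go, not code from the repository: `ParityRow`, `Closed`, and `OpenGaps` are hypothetical names, and the sample values are placeholders rather than actual project status.

```go
package main

import "fmt"

// ParityRow is a hypothetical record for one feature area in the truth matrix.
// The three booleans mirror the Behavior, Tests, and Docs columns of the design.
type ParityRow struct {
	Feature  string
	Behavior bool
	Tests    bool
	Docs     bool
}

// Closed reports whether the row satisfies the closure rule:
// all three columns must be complete.
func (r ParityRow) Closed() bool {
	return r.Behavior && r.Tests && r.Docs
}

// OpenGaps lists the columns that are still incomplete, in matrix order.
func (r ParityRow) OpenGaps() []string {
	var gaps []string
	if !r.Behavior {
		gaps = append(gaps, "Behavior")
	}
	if !r.Tests {
		gaps = append(gaps, "Tests")
	}
	if !r.Docs {
		gaps = append(gaps, "Docs")
	}
	return gaps
}

func main() {
	// Placeholder values for illustration only; they do not describe real status.
	rows := []ParityRow{
		{Feature: "Stream runtime semantics", Behavior: true, Tests: true, Docs: true},
		{Feature: "Storage durability semantics", Behavior: true, Tests: false, Docs: false},
	}
	for _, r := range rows {
		fmt.Printf("%s: closed=%t open=%v\n", r.Feature, r.Closed(), r.OpenGaps())
	}
}
```

Keeping the three columns as separate fields, rather than a single status value, makes it possible to report which gate is still open for a feature area, which is what the behavior, test, and docs completion gates ask for.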