natsdotnet/docs/plans/2026-02-23-jetstream-deep-operational-parity-design.md

JetStream Deep Operational Parity Design

Date: 2026-02-23
Status: Approved
Scope: Identify and close remaining JetStream deep operational parity gaps versus Go, including behavior-level semantics, storage durability, RAFT/cluster behavior, and documentation drift reconciliation.

1. Architecture and Scope Boundary

Scope definition

This cycle is JetStream-focused and targets deep operational parity:

  1. Stream runtime semantics
  2. Consumer runtime/state machine semantics
  3. Storage durability semantics
  4. RAFT/network and JetStream clustering semantics
  5. Documentation/evidence reconciliation

JETSTREAM (internal) is treated as implemented behavior (code + tests present). Any stale doc line stating it is unimplemented is handled as documentation drift, not a re-implementation target.

Parity control model

Each feature area is tracked with a truth matrix:

  1. Behavior
  • Go-equivalent runtime behavior exists in observable server operation.
  2. Tests
  • Contract-positive plus negative/edge tests validate behavior and detect regressions beyond hook-level checks.
  3. Docs
  • differences.md and parity artifacts accurately reflect validated behavior.

A feature closes only when Behavior + Tests + Docs are all complete.
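
The closing rule above can be sketched as a small data structure. This is an illustrative Python sketch only, not project code; `ParityRow`, its field names, and `open_features` are hypothetical names introduced here.

```python
from dataclasses import dataclass


@dataclass
class ParityRow:
    """One feature area in the truth matrix (hypothetical shape)."""
    feature: str
    behavior: bool = False  # Go-equivalent runtime behavior observed
    tests: bool = False     # positive plus negative/edge tests in place
    docs: bool = False      # differences.md and parity artifacts agree

    def is_closed(self) -> bool:
        # A feature closes only when all three columns are complete.
        return self.behavior and self.tests and self.docs


def open_features(rows):
    """Return the names of feature areas that are still open."""
    return [row.feature for row in rows if not row.is_closed()]


rows = [
    ParityRow("stream retention", behavior=True, tests=True, docs=True),
    ParityRow("consumer backoff", behavior=True, tests=True, docs=False),
]
still_open = open_features(rows)
```

Here `still_open` contains only the row whose Docs column is incomplete, which matches the rule that partial completion never closes a feature.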

Ordered implementation layers

  1. Stream runtime semantics
  2. Consumer state machine semantics
  3. Storage durability semantics
  4. RAFT and cluster governance semantics
  5. Documentation synchronization

2. Component Plan

A. Stream runtime semantics

Primary files:

  • src/NATS.Server/JetStream/StreamManager.cs
  • src/NATS.Server/JetStream/Models/StreamConfig.cs
  • src/NATS.Server/JetStream/Publish/JetStreamPublisher.cs
  • src/NATS.Server/JetStream/Publish/PublishPreconditions.cs
  • src/NATS.Server/JetStream/Api/Handlers/StreamApiHandlers.cs
  • src/NATS.Server/JetStream/Validation/JetStreamConfigValidator.cs

Focus:

  • retention semantics (Limits/Interest/WorkQueue) under live publish/delete flows
  • MaxAge, MaxMsgsPer, MaxMsgSize, dedupe-window semantics under mixed workloads
  • guard behavior (sealed, deny_delete, deny_purge) with contract-accurate errors
  • runtime (not parse-only) behavior for transform/republish/direct-related features
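
As one example of what runtime (rather than parse-only) semantics means for the dedupe window, the sketch below tracks message IDs with explicit timestamps. It is a simplified Python illustration, not the server's implementation: `DedupeWindow` and its methods are hypothetical, and expiring an entry exactly at the window boundary is an assumption that would need checking against Go behavior.

```python
class DedupeWindow:
    """Tracks message IDs seen within a sliding time window (hypothetical sketch)."""

    def __init__(self, window_seconds):
        self.window = window_seconds
        self._seen = {}  # msg_id -> (sequence, stored_at)

    def _expire(self, now):
        # Assumption: an entry expires once its age reaches the window length.
        expired = [k for k, (_, t) in self._seen.items() if now - t >= self.window]
        for key in expired:
            del self._seen[key]

    def check(self, msg_id, now):
        """Return the original sequence for a duplicate inside the window, else None."""
        self._expire(now)
        entry = self._seen.get(msg_id)
        return entry[0] if entry is not None else None

    def record(self, msg_id, seq, now):
        """Remember that msg_id was stored at sequence seq at time now."""
        self._seen[msg_id] = (seq, now)


window = DedupeWindow(window_seconds=120)
first = window.check("order-1", now=0)      # not seen yet
window.record("order-1", seq=1, now=0)
dup = window.check("order-1", now=60)       # inside the window
late = window.check("order-1", now=120)     # window elapsed
```

Passing `now` explicitly keeps the sketch deterministic, which is also a convenient shape for mixed-workload tests that need to control time.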

B. Consumer runtime/state machine semantics

Primary files:

  • src/NATS.Server/JetStream/ConsumerManager.cs
  • src/NATS.Server/JetStream/Consumers/AckProcessor.cs
  • src/NATS.Server/JetStream/Consumers/PullConsumerEngine.cs
  • src/NATS.Server/JetStream/Consumers/PushConsumerEngine.cs
  • src/NATS.Server/JetStream/Models/ConsumerConfig.cs
  • src/NATS.Server/JetStream/Api/Handlers/ConsumerApiHandlers.cs

Focus:

  • deliver-policy start resolution and cursor transitions
  • ack floor and redelivery determinism (AckPolicy.*, backoff, max-deliver)
  • flow control, rate limiting, replay timing semantics across longer scenarios
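
Redelivery determinism can be pinned down as a pure function of the delivery count. The Python sketch below assumes that the backoff list is indexed by delivery attempt and that its last entry repeats once the list is exhausted; `next_redelivery_delay` and its parameters are hypothetical names, and the indexing rule should be verified against Go behavior before being treated as a contract.

```python
def next_redelivery_delay(delivery_count, ack_wait, backoff=(), max_deliver=-1):
    """Return the delay before the next redelivery, or None if no more are allowed.

    delivery_count: deliveries already attempted for this message (>= 1).
    ack_wait:       fallback delay when no backoff list is configured.
    backoff:        per-attempt delays; the last entry repeats (assumption).
    max_deliver:    maximum total deliveries; values <= 0 mean unlimited.
    """
    if max_deliver > 0 and delivery_count >= max_deliver:
        return None  # delivery budget exhausted, stop redelivering
    if not backoff:
        return ack_wait
    index = min(delivery_count - 1, len(backoff) - 1)
    return backoff[index]


# Delays after the 1st, 2nd, 3rd and 4th delivery with a three-step backoff.
delays = [next_redelivery_delay(n, ack_wait=30, backoff=(1, 5, 10)) for n in (1, 2, 3, 4)]
```

Because the function has no hidden state, the same inputs always yield the same delay, which is the property the longer-scenario tests need to assert.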

C. Storage durability semantics

Primary files:

  • src/NATS.Server/JetStream/Storage/FileStore.cs
  • src/NATS.Server/JetStream/Storage/FileStoreBlock.cs
  • src/NATS.Server/JetStream/Storage/FileStoreOptions.cs
  • src/NATS.Server/JetStream/Storage/IStreamStore.cs
  • src/NATS.Server/JetStream/Storage/MemStore.cs

Focus:

  • durable block/index invariants under restart and prune/rewrite cycles
  • move compression/encryption behavior from transform stubs to parity-meaningful persistence semantics
  • TTL and index consistency guarantees for large and long-running data sets
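
A restart/recovery test for block invariants could follow the shape below. This is a minimal Python sketch under stated assumptions: the record layout (8-byte sequence, 4-byte length, payload) is invented for illustration and is not the FileStore format, and `bytes` counts payload only, which is a simplification.

```python
import os
import struct
import tempfile

_HEADER = struct.Struct("<QI")  # sequence (u64), payload length (u32); hypothetical layout


def append(path, seq, payload):
    """Append one record and force it to disk before returning."""
    with open(path, "ab") as f:
        f.write(_HEADER.pack(seq, len(payload)))
        f.write(payload)
        f.flush()
        os.fsync(f.fileno())


def recover(path):
    """Rebuild the state summary by scanning the log, ignoring a torn tail."""
    state = {"first": 0, "last": 0, "messages": 0, "bytes": 0}
    with open(path, "rb") as f:
        while True:
            header = f.read(_HEADER.size)
            if len(header) < _HEADER.size:
                break  # clean end of file or truncated header
            seq, length = _HEADER.unpack(header)
            payload = f.read(length)
            if len(payload) < length:
                break  # truncated payload: stop before the partial record
            if state["messages"] == 0:
                state["first"] = seq
            state["last"] = seq
            state["messages"] += 1
            state["bytes"] += length
    return state


# Usage: write two records, simulate a torn write, then recover as if after restart.
with tempfile.TemporaryDirectory() as directory:
    path = os.path.join(directory, "block.dat")
    append(path, 1, b"aa")
    append(path, 2, b"bbb")
    with open(path, "ab") as f:
        f.write(b"\x01\x02")  # partial header left behind by a crash
    state = recover(path)
```

The recovered `state` reflects only the two complete records, so sequence and byte counters do not drift when a crash leaves a partial record at the tail.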

D. RAFT and JetStream cluster semantics

Primary files:

  • src/NATS.Server/Raft/RaftNode.cs
  • src/NATS.Server/Raft/RaftReplicator.cs
  • src/NATS.Server/Raft/RaftTransport.cs
  • src/NATS.Server/Raft/RaftRpcContracts.cs
  • src/NATS.Server/JetStream/Cluster/JetStreamMetaGroup.cs
  • src/NATS.Server/JetStream/Cluster/StreamReplicaGroup.cs
  • src/NATS.Server/JetStream/Cluster/AssetPlacementPlanner.cs
  • integration touchpoints in src/NATS.Server/NatsServer.cs

Focus:

  • move from hook-level consensus behaviors to term/quorum-driven outcomes
  • snapshot transfer and membership semantics affecting real commit/placement behavior
  • cross-cluster JetStream behavior validated beyond counter-style forwarding checks
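
To make "term/quorum-driven" concrete, the sketch below applies the textbook Raft commit rule: the leader advances the commit index to the highest index stored on a majority, and only if that entry belongs to the current term. It is a generic Python illustration, not the RaftNode implementation; `quorum_commit_index` and its parameters are hypothetical names.

```python
def quorum_commit_index(match_index, log_terms, current_term, commit_index):
    """Return the new commit index under the textbook Raft rule.

    match_index:  highest replicated log index per member, leader included.
    log_terms:    mapping of log index -> term in the leader's log.
    current_term: the leader's current term.
    commit_index: the commit index before this evaluation.
    """
    quorum = len(match_index) // 2 + 1
    ranked = sorted(match_index, reverse=True)
    candidate = ranked[quorum - 1]  # highest index held by at least `quorum` members
    if candidate > commit_index and log_terms.get(candidate) == current_term:
        return candidate
    return commit_index  # never move backwards, never commit an old-term entry directly


# Three-member group: two members hold index 5 from the current term.
advanced = quorum_commit_index([5, 5, 3], {5: 2}, current_term=2, commit_index=3)
```

A counter-style test would only observe that a proposal was forwarded; a test built on this rule can assert that nothing commits until a majority actually holds the entry.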

E. Evidence and documentation reconciliation

Primary files:

  • differences.md
  • docs/plans/2026-02-23-jetstream-remaining-parity-map.md
  • docs/plans/2026-02-23-jetstream-remaining-parity-verification.md

Focus:

  • remove stale contradictory lines and align notes with verified implementation state
  • keep all parity claims traceable to tests and behavior evidence

3. Data Flow and Behavioral Contracts

  1. Publish path contract
  • precondition checks occur before persistence mutation
  • stream policy outcomes are atomic from client perspective
  • no partial state exposure on failed publish paths
  2. Consumer path contract
  • deterministic cursor initialization and progression
  • ack/redelivery/backoff semantics form a single coherent state machine
  • push/pull engines preserve contract parity under sustained load and restart boundaries
  3. Storage contract
  • persisted data and indices roundtrip across restarts without sequence/index drift
  • pruning, TTL, and limit enforcement preserve state invariants (first/last/messages/bytes)
  • compression/encryption boundaries are reversible and version-safe
  4. RAFT/cluster contract
  • append/commit behavior is consensus-gated (term/quorum aware)
  • heartbeat and snapshot mechanics drive observable follower convergence
  • placement/governance decisions reflect committed cluster state
  5. Documentation contract
  • JetStream table rows and summary notes in differences.md must agree
  • JETSTREAM (internal) status remains Y with explicit verification evidence
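
The publish path contract (checks before mutation, no partial state on failure) can be illustrated with a small in-memory stream. This is a hedged Python sketch, not project code: `SketchStream`, `PublishRejected`, and the two preconditions shown (a size limit and an expected-last-sequence check) are chosen for illustration.

```python
class PublishRejected(Exception):
    """Raised when a precondition fails; the stream is left untouched."""


class SketchStream:
    """In-memory stand-in for a stream (hypothetical, for illustration only)."""

    def __init__(self, max_msg_size=-1):
        self.max_msg_size = max_msg_size  # negative means unlimited
        self.messages = []                # list of (sequence, payload)
        self.last_seq = 0

    def publish(self, payload, expected_last_seq=None):
        # Phase 1: run every precondition before touching any state.
        if self.max_msg_size >= 0 and len(payload) > self.max_msg_size:
            raise PublishRejected("message exceeds maximum size")
        if expected_last_seq is not None and expected_last_seq != self.last_seq:
            raise PublishRejected("wrong last sequence")
        # Phase 2: mutate only after all checks have passed.
        seq = self.last_seq + 1
        self.messages.append((seq, payload))
        self.last_seq = seq
        return seq


stream = SketchStream(max_msg_size=4)
first_seq = stream.publish(b"ok")
try:
    stream.publish(b"too-large")
    rejected = False
except PublishRejected:
    rejected = True
```

After the rejected publish, `stream.last_seq` and `stream.messages` are unchanged, which is the observable form of "no partial state exposure".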

4. Error Handling, Testing Strategy, and Completion Gates

Error handling

  1. Keep JetStream-specific error semantics and codes intact.
  2. Fail closed on durability/consensus invariant breaches.
  3. Reject partial cluster mutations when consensus prerequisites fail.

Test strategy

  1. Per feature area: contract-positive plus edge/failure tests.
  2. Persistence features: restart/recovery tests are mandatory.
  3. Replace hook-level “counter” tests with behavior-real integration tests for deep semantics.
  4. Keep targeted suites per layer plus cross-layer integration scenarios.

Completion gates

  1. Behavior gate: deep JetStream operational parity gaps closed or explicitly blocked with evidence.
  2. Test gate: focused suites and full suite pass.
  3. Docs gate: parity docs reflect actual validated behavior; stale contradictions removed.
  4. Drift gate: explicit verification that internal JetStream client remains implemented and documented as Y.