natsdotnet/docs/plans/2026-02-24-full-production-parity-design.md
Joseph Doherty d445a9fae1 docs: add full production parity design
6-wave implementation plan covering RAFT consensus, FileStore block
engine, internal data structures, JetStream clustering, and remaining
subsystem test suites. Targets ~1,160 new tests for ~75% Go parity.
2026-02-23 20:31:57 -05:00


Full Production Parity Design

For Claude: REQUIRED SUB-SKILL: Use superpowers-extended-cc:executing-plans to implement this plan task-by-task.

Goal: Close all remaining gaps between the Go NATS server and the .NET port — implementation code and test coverage — achieving full production parity.

Current state: 1,081 tests passing. Core pub/sub, JetStream basics, MQTT packet parsing, and JWT claims are ported. Three major implementation gaps remain: RAFT consensus, the FileStore block engine, and internal data structures (AVL, subject tree, GSL, time hash wheel).

Approach: 6-wave slice-by-slice TDD, ordered by dependency. Waves 2-5 each build on the prior wave's production code and tests; Wave 6 depends only on the Wave 1 scaffolding. Within a wave, independent subsystems are assigned to parallel subagents.


Gap Analysis Summary

Implementation Gaps

| Gap | Go Source | .NET Status | Impact |
|---|---|---|---|
| RAFT consensus | server/raft.go (5,800 lines) | Missing entirely | Blocks clustered JetStream |
| FileStore block engine | server/filestore.go (337KB) | Flat JSONL stub | Blocks persistent JetStream |
| Internal data structures | server/avl/, server/stree/, server/gsl/, server/thw/ | Missing entirely | Blocks FileStore + RAFT |

Test Coverage Gap

  • Go server tests: ~2,937 test functions
  • .NET tests: 1,081 (~36.8% of the Go test count)
  • Gap: ~1,856 tests across all subsystems

Wave 1: Inventory + Scaffolding

Purpose: Establish project structure, create stub files, set up namespaces.

Deliverables:

  • Namespace scaffolding: NATS.Server.Internal.Avl, NATS.Server.Internal.SubjectTree, NATS.Server.Internal.Gsl, NATS.Server.Internal.TimeHashWheel
  • Stub interfaces for FileStore block engine
  • Stub interfaces for RAFT node, log, transport
  • Test project directory structure for all new subsystems

Tests: 0 (scaffolding only)


Wave 2: Internal Data Structures

Purpose: Port Go's internal data structures that FileStore and RAFT depend on.

AVL Tree (server/avl/)

  • Sparse sequence set backed by AVL-balanced binary tree
  • Used for JetStream ack tracking (consumer pending sets)
  • Key operations: Insert, Delete, Contains, Range, Size
  • Go reference: server/avl/seqset.go
  • Port as NATS.Server.Internal.Avl.SequenceSet
  • ~15 tests from Go's TestSequenceSet*
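
To make the set semantics concrete, here is a minimal sketch in Go (the reference implementation's language) of a sparse sequence set. It is not the seqset.go code: a plain map of 64-bit bitmap buckets stands in for the AVL-balanced tree, the bucket width is an assumption, and the Range operation is omitted. The names SeqSet and NewSeqSet are illustrative only.

```go
package main

import (
	"fmt"
	"math/bits"
)

// bucketBits is the number of sequences tracked per bucket (an assumption for this sketch).
const bucketBits = 64

// SeqSet is a sparse set of sequences stored as 64-bit bitmaps keyed by bucket number.
// A plain map stands in for the AVL-balanced tree described above.
type SeqSet struct {
	buckets map[uint64]uint64
}

func NewSeqSet() *SeqSet { return &SeqSet{buckets: make(map[uint64]uint64)} }

// Insert adds seq and reports whether it was newly added.
func (s *SeqSet) Insert(seq uint64) bool {
	base, bit := seq/bucketBits, uint64(1)<<(seq%bucketBits)
	if s.buckets[base]&bit != 0 {
		return false
	}
	s.buckets[base] |= bit
	return true
}

// Delete removes seq and reports whether it was present. Empty buckets are dropped
// so that memory tracks the number of populated ranges, not the largest sequence.
func (s *SeqSet) Delete(seq uint64) bool {
	base, bit := seq/bucketBits, uint64(1)<<(seq%bucketBits)
	cur, ok := s.buckets[base]
	if !ok || cur&bit == 0 {
		return false
	}
	cur &^= bit
	if cur == 0 {
		delete(s.buckets, base)
	} else {
		s.buckets[base] = cur
	}
	return true
}

// Contains reports whether seq is in the set.
func (s *SeqSet) Contains(seq uint64) bool {
	return s.buckets[seq/bucketBits]&(uint64(1)<<(seq%bucketBits)) != 0
}

// Size counts the set bits across all buckets.
func (s *SeqSet) Size() int {
	n := 0
	for _, b := range s.buckets {
		n += bits.OnesCount64(b)
	}
	return n
}

func main() {
	s := NewSeqSet()
	for _, seq := range []uint64{1, 2, 1_000_000} {
		s.Insert(seq)
	}
	s.Delete(2)
	fmt.Println(s.Contains(1), s.Contains(2), s.Size()) // prints "true false 2"
}
```

The widely spaced sequences in main occupy two buckets, which is the property that makes this shape suitable for consumer pending sets with gaps.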

Subject Tree (server/stree/)

  • Trie for per-subject state in streams (sequence tracking, last-by-subject)
  • Supports wildcard iteration (*, >)
  • Go reference: server/stree/stree.go
  • Port as NATS.Server.Internal.SubjectTree.SubjectTree<T>
  • ~15 tests from Go's TestSubjectTree*
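
The wildcard iteration above can be sketched as follows. This assumes the standard NATS token rules ("*" matches exactly one token, ">" matches one or more trailing tokens) and uses a linear scan over a map in place of the trie, so it shows the expected results rather than the data structure. SubjectState, matches, and matchAll are hypothetical names.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// SubjectState is hypothetical per-subject state: first and last sequence seen.
type SubjectState struct{ First, Last uint64 }

// matches reports whether a literal subject satisfies a filter that may contain
// "*" (exactly one token) or a trailing ">" (one or more remaining tokens).
func matches(subject, filter string) bool {
	st := strings.Split(subject, ".")
	ft := strings.Split(filter, ".")
	for i, f := range ft {
		if f == ">" {
			// ">" must be the last filter token and needs at least one token to consume.
			return i == len(ft)-1 && i < len(st)
		}
		if i >= len(st) {
			return false
		}
		if f != "*" && f != st[i] {
			return false
		}
	}
	return len(st) == len(ft)
}

// matchAll returns the stored subjects that satisfy filter, sorted for stable output.
// A real subject tree would walk trie nodes instead of scanning every key.
func matchAll(state map[string]SubjectState, filter string) []string {
	var out []string
	for subj := range state {
		if matches(subj, filter) {
			out = append(out, subj)
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	state := map[string]SubjectState{
		"orders.us.new":       {First: 1, Last: 9},
		"orders.eu.new":       {First: 2, Last: 7},
		"orders.us.cancelled": {First: 4, Last: 4},
	}
	fmt.Println(matchAll(state, "orders.*.new")) // prints "[orders.eu.new orders.us.new]"
	fmt.Println(matchAll(state, "orders.us.>"))  // prints "[orders.us.cancelled orders.us.new]"
}
```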

Generic Subject List (server/gsl/)

  • Optimized trie for subscription matching (alternative to SubList for specific paths)
  • Go reference: server/gsl/gsl.go
  • Port as NATS.Server.Internal.Gsl.GenericSubjectList<T>
  • ~15 tests from Go's TestGSL*
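
A subscription-matching trie inverts the subject tree's roles: the stored keys carry the wildcards and the query is a literal subject. The sketch below is a minimal illustration under that assumption, not the gsl.go implementation. It has no removal, caching, or locking, it assumes queries never contain wildcard tokens, and the names List, Insert, and Match are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// node is one token level of the trie.
type node[T any] struct {
	children map[string]*node[T]
	values   []T // values registered for the pattern that ends at this node
}

func newNode[T any]() *node[T] { return &node[T]{children: map[string]*node[T]{}} }

// List stores values under subject patterns that may contain "*" and ">".
type List[T any] struct{ root *node[T] }

func New[T any]() *List[T] { return &List[T]{root: newNode[T]()} }

// Insert registers v under pattern, creating one trie level per token.
func (l *List[T]) Insert(pattern string, v T) {
	n := l.root
	for _, tok := range strings.Split(pattern, ".") {
		next, ok := n.children[tok]
		if !ok {
			next = newNode[T]()
			n.children[tok] = next
		}
		n = next
	}
	n.values = append(n.values, v)
}

// Match returns every value whose pattern matches the literal subject.
func (l *List[T]) Match(subject string) []T {
	var out []T
	collect(l.root, strings.Split(subject, "."), &out)
	return out
}

func collect[T any](n *node[T], toks []string, out *[]T) {
	if len(toks) == 0 {
		*out = append(*out, n.values...)
		return
	}
	// A ">" child matches all remaining tokens (at least one remains here).
	if fwc, ok := n.children[">"]; ok {
		*out = append(*out, fwc.values...)
	}
	// A "*" child consumes exactly one token.
	if pwc, ok := n.children["*"]; ok {
		collect(pwc, toks[1:], out)
	}
	if lit, ok := n.children[toks[0]]; ok {
		collect(lit, toks[1:], out)
	}
}

func main() {
	l := New[string]()
	l.Insert("orders.*", "a")
	l.Insert("orders.>", "b")
	l.Insert("orders.us", "c")
	l.Insert("payments.>", "d")
	fmt.Println(len(l.Match("orders.us")), l.Match("payments.x.y"))
}
```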

Time Hash Wheel (server/thw/)

  • Efficient TTL expiration using hash wheel (O(1) insert/cancel, O(bucket) tick)
  • Used for message expiry in MemStore and FileStore
  • Go reference: server/thw/thw.go
  • Port as NATS.Server.Internal.TimeHashWheel.TimeHashWheel<T>
  • ~15 tests from Go's TestTimeHashWheel*
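
The complexity claims above follow from hashing each expiry time to a slot. A minimal sketch, assuming one-second ticks, 4096 slots, and non-negative timestamps (all three are assumptions, as are the names HashWheel, Add, Remove, and Expire), might look like this:

```go
package main

import "fmt"

const (
	tickNanos = int64(1_000_000_000) // one slot per second (assumption)
	wheelSize = 4096                 // number of slots (assumption)
)

// HashWheel maps each slot to the sequences expiring in it (seq -> expiry in nanoseconds).
type HashWheel struct {
	slots [wheelSize]map[uint64]int64
	count int
}

// slotFor hashes an expiry time to its slot; entries a full revolution apart share a slot.
func slotFor(expires int64) int { return int((expires / tickNanos) % wheelSize) }

// Add schedules seq to expire at the given time: one map write, O(1).
func (w *HashWheel) Add(seq uint64, expires int64) {
	i := slotFor(expires)
	if w.slots[i] == nil {
		w.slots[i] = make(map[uint64]int64)
	}
	if _, dup := w.slots[i][seq]; !dup {
		w.count++
	}
	w.slots[i][seq] = expires
}

// Remove cancels a scheduled expiry: one map delete, O(1).
func (w *HashWheel) Remove(seq uint64, expires int64) bool {
	i := slotFor(expires)
	if _, ok := w.slots[i][seq]; !ok {
		return false
	}
	delete(w.slots[i], seq)
	w.count--
	return true
}

// Expire visits only the slots between the last processed time and now, removing
// entries that are due. Entries for a later revolution stay in place.
func (w *HashWheel) Expire(last, now int64) []uint64 {
	var expired []uint64
	first, end := last/tickNanos, now/tickNanos
	if end-first >= wheelSize {
		first = end - wheelSize + 1 // one full revolution already covers every slot
	}
	for t := first; t <= end; t++ {
		slot := w.slots[t%wheelSize]
		for seq, exp := range slot {
			if exp <= now {
				delete(slot, seq)
				w.count--
				expired = append(expired, seq)
			}
		}
	}
	return expired
}

func main() {
	w := &HashWheel{}
	base := 100 * tickNanos
	w.Add(1, base+2*tickNanos)
	w.Add(2, base+10*tickNanos)
	w.Add(3, base+3*tickNanos)
	w.Remove(3, base+3*tickNanos)
	fmt.Println(w.Expire(base, base+5*tickNanos), w.count) // prints "[1] 1"
}
```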

Total tests: ~60


Wave 3: FileStore Block Engine

Purpose: Replace the flat JSONL FileStore stub with Go-compatible block-based storage.

Design Decisions

  • Behavioral equivalence — same 64MB block boundaries and semantics, not byte-level Go file compatibility
  • Block format: Each block is a separate file containing sequential messages with headers
  • Compression: S2 (Snappy variant) per-block, using IronSnappy or equivalent .NET library
  • Encryption: AES-GCM per-block (matching Go's encryption support)
  • Recovery: Block-level recovery on startup (scan for valid messages, rebuild index)

Components

  1. Block Manager — manages block files, rotation at 64MB, compaction
  2. Message Encoding — per-message header (sequence, timestamp, subject, data length) + payload
  3. Index Layer — in-memory index mapping sequence → block + offset
  4. Subject Index — per-subject first/last sequence tracking using SubjectTree (Wave 2)
  5. Purge/Compact — subject-based purge, sequence-based purge, compaction
  6. Recovery — startup block scanning, index rebuild
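
Components 2, 3, and 6 fit together roughly as sketched below. The record layout here is hypothetical (it is neither Go's on-disk format nor a decided .NET format): a fixed header of sequence, timestamp, subject length, and data length, followed by the subject and payload. The function names encode, decode, and rebuildIndex are likewise illustrative.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// Hypothetical header layout:
// seq uint64 | timestamp int64 | subject length uint16 | data length uint32
const headerSize = 8 + 8 + 2 + 4

// Location is an index entry: which block holds a sequence and at what offset.
type Location struct {
	Block  uint32
	Offset uint32
}

// encode appends one record to buf and returns the extended slice.
func encode(buf []byte, seq uint64, ts int64, subject string, data []byte) []byte {
	var hdr [headerSize]byte
	binary.LittleEndian.PutUint64(hdr[0:], seq)
	binary.LittleEndian.PutUint64(hdr[8:], uint64(ts))
	binary.LittleEndian.PutUint16(hdr[16:], uint16(len(subject)))
	binary.LittleEndian.PutUint32(hdr[18:], uint32(len(data)))
	buf = append(buf, hdr[:]...)
	buf = append(buf, subject...)
	return append(buf, data...)
}

// decode reads the record at off and returns its fields plus the offset of the next record.
func decode(block []byte, off uint32) (seq uint64, ts int64, subject string, data []byte, next uint32, err error) {
	if int(off)+headerSize > len(block) {
		return 0, 0, "", nil, 0, errors.New("truncated header")
	}
	h := block[off:]
	seq = binary.LittleEndian.Uint64(h[0:])
	ts = int64(binary.LittleEndian.Uint64(h[8:]))
	slen := int(binary.LittleEndian.Uint16(h[16:]))
	dlen := int(binary.LittleEndian.Uint32(h[18:]))
	end := int(off) + headerSize + slen + dlen
	if end > len(block) {
		return 0, 0, "", nil, 0, errors.New("truncated record")
	}
	body := h[headerSize:]
	return seq, ts, string(body[:slen]), body[slen : slen+dlen], uint32(end), nil
}

// rebuildIndex scans a block from the start, as a recovery pass would, and records
// where each sequence lives. It stops at the first record that fails to decode.
func rebuildIndex(blockID uint32, block []byte, index map[uint64]Location) {
	for off := uint32(0); int(off) < len(block); {
		seq, _, _, _, next, err := decode(block, off)
		if err != nil {
			return
		}
		index[seq] = Location{Block: blockID, Offset: off}
		off = next
	}
}

func main() {
	block := encode(nil, 1, 1000, "orders.us", []byte("a"))
	block = encode(block, 2, 2000, "orders.eu", []byte("bc"))
	index := map[uint64]Location{}
	rebuildIndex(0, block, index)
	loc := index[2]
	_, _, subj, data, _, _ := decode(block, loc.Offset)
	fmt.Println(loc.Offset, subj, string(data)) // prints "32 orders.eu bc"
}
```

A production format would also need a per-record checksum so that recovery can tell a torn write from valid data; that is left out here for brevity.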

Go Reference Files

  • server/filestore.go — main implementation
  • server/filestore_test.go — test suite

Total tests: ~80 (store/load, block rotation, compression, encryption, purge, recovery, subject filtering)


Wave 4: RAFT Consensus

Purpose: Faithful behavioral port of Go's RAFT implementation for clustered JetStream.

Design Decisions

  • Faithful Go port — not a third-party RAFT library; port Go's raft.go directly
  • Same state machine semantics — leader election, log replication, snapshots, membership changes
  • Transport abstraction — pluggable transport (in-process for tests, TCP for production)

Components

  1. RAFT Node — state machine (Follower → Candidate → Leader), term/vote tracking
  2. Log Storage — append-only log with compaction, backed by FileStore blocks (Wave 3)
  3. Election — randomized timeout, RequestVote RPC, majority quorum
  4. Log Replication — AppendEntries RPC, leader → follower catch-up, conflict resolution
  5. Snapshots — periodic state snapshots, snapshot transfer to lagging followers
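  6. Membership Changes — adding/removing peers through replicated log entries, following raft.go (which proposes peer add/remove entries rather than using joint consensus)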
  7. Transport — RPC abstraction with in-process and TCP implementations
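
The election component rests on three small rules: a majority quorum, a randomized timeout, and the vote-granting check. The sketch below follows the textbook Raft rules rather than the specifics of raft.go, whose timeouts and message shapes may differ. All type and function names are hypothetical, and electionTimeout assumes max > min.

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

// VoteRequest carries the candidate's term and the position of its last log entry.
type VoteRequest struct {
	Term         uint64
	Candidate    string
	LastLogIndex uint64
	LastLogTerm  uint64
}

// Node holds the minimum state needed to decide a vote.
type Node struct {
	Term         uint64
	VotedFor     string
	LastLogIndex uint64
	LastLogTerm  uint64
}

// quorum is the smallest majority of a cluster.
func quorum(clusterSize int) int { return clusterSize/2 + 1 }

// electionTimeout picks a random duration in [min, max) so that followers
// rarely time out at the same moment and split the vote.
func electionTimeout(min, max time.Duration) time.Duration {
	return min + time.Duration(rand.Int63n(int64(max-min)))
}

// HandleVote grants at most one vote per term, and only to a candidate whose log
// is at least as up to date as the local log.
func (n *Node) HandleVote(req VoteRequest) bool {
	if req.Term < n.Term {
		return false // stale candidate
	}
	if req.Term > n.Term {
		n.Term, n.VotedFor = req.Term, "" // newer term resets the vote
	}
	upToDate := req.LastLogTerm > n.LastLogTerm ||
		(req.LastLogTerm == n.LastLogTerm && req.LastLogIndex >= n.LastLogIndex)
	if (n.VotedFor == "" || n.VotedFor == req.Candidate) && upToDate {
		n.VotedFor = req.Candidate
		return true
	}
	return false
}

func main() {
	n := &Node{Term: 3, LastLogIndex: 10, LastLogTerm: 3}
	fmt.Println(quorum(3), quorum(5)) // prints "2 3"
	fmt.Println(n.HandleVote(VoteRequest{Term: 4, Candidate: "A", LastLogIndex: 10, LastLogTerm: 3}))
	fmt.Println(n.HandleVote(VoteRequest{Term: 4, Candidate: "B", LastLogIndex: 12, LastLogTerm: 3}))
	fmt.Println(electionTimeout(150*time.Millisecond, 300*time.Millisecond) >= 150*time.Millisecond)
}
```

In main, candidate A receives the vote for term 4 and candidate B is refused in the same term even though its log is longer, which is the one-vote-per-term rule the election tests would exercise.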

Go Reference Files

  • server/raft.go — main implementation (5,800 lines)
  • server/raft_test.go — test suite

Total tests: ~70 (election, log replication, snapshots, membership, split-brain, network partition simulation)


Wave 5: JetStream Clustering + Concurrency

Purpose: Wire RAFT into JetStream for clustered operation; add NORACE concurrency tests.

Components

  1. Meta-Controller — cluster-wide RAFT group for stream/consumer placement

    • Ports Go's jetStreamCluster struct
    • Routes $JS.API.* requests through meta-group leader
    • Tests from Go's TestJetStreamClusterCreate, TestJetStreamClusterStreamLeaderStepDown
  2. Per-Stream RAFT Groups — each R>1 stream gets its own RAFT group

    • Leader accepts publishes, proposes entries, followers apply
    • Tests: create R3 stream, publish, verify all replicas, step down, verify new leader
  3. Per-Consumer RAFT Groups — consumer ack state replicated via RAFT

    • Tests: ack on leader, verify ack floor propagation, consumer failover
  4. NORACE Concurrency Suite — tests from Go's norace_test.go (excluded from -race builds by the !race build tag), ported to Task.WhenAll patterns

    • Concurrent pub/sub on same stream
    • Concurrent consumer creates
    • Concurrent stream purge during publish
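
The shape of a concurrent-publish test, shown here in Go with goroutines and a WaitGroup, is what the .NET port would express with Task.WhenAll. This is a sketch against a hypothetical in-memory Stream type, not a real JetStream API. It checks that every publish gets a unique sequence and that none are lost.

```go
package main

import (
	"fmt"
	"sync"
)

// Stream is a hypothetical in-memory stand-in for a JetStream stream.
type Stream struct {
	mu   sync.Mutex
	last uint64
	msgs map[uint64]string
}

func NewStream() *Stream { return &Stream{msgs: make(map[uint64]string)} }

// Publish stores data under the next sequence and returns that sequence.
func (s *Stream) Publish(data string) uint64 {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.last++
	s.msgs[s.last] = data
	return s.last
}

// Purge drops all stored messages but keeps the sequence counter,
// and returns how many messages were removed.
func (s *Stream) Purge() int {
	s.mu.Lock()
	defer s.mu.Unlock()
	n := len(s.msgs)
	s.msgs = make(map[uint64]string)
	return n
}

// concurrentPublish starts the given number of publisher goroutines, each publishing
// perPublisher messages, waits for all of them, and returns the set of sequences handed out.
func concurrentPublish(s *Stream, publishers, perPublisher int) map[uint64]bool {
	var wg sync.WaitGroup
	var mu sync.Mutex
	seen := make(map[uint64]bool)
	for p := 0; p < publishers; p++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := 0; i < perPublisher; i++ {
				seq := s.Publish("msg")
				mu.Lock()
				seen[seq] = true
				mu.Unlock()
			}
		}()
	}
	wg.Wait()
	return seen
}

func main() {
	s := NewStream()
	seen := concurrentPublish(s, 8, 100)
	fmt.Println(len(seen)) // 800 unique sequences if no publish was lost or duplicated
	fmt.Println(s.Purge(), s.Publish("after purge"))
}
```

A purge-during-publish variant would run Purge in one more goroutine alongside the publishers and assert only on invariants that hold under any interleaving, such as sequences never repeating.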

Go Reference Files

  • server/jetstream_cluster.go, server/jetstream_cluster_test.go
  • server/norace_test.go

Total tests: ~100


Wave 6: Remaining Subsystem Test Suites

Purpose: Port remaining Go test functions across all subsystems not covered by Waves 2-5.

Subsystems

| Subsystem | Go Tests | Existing .NET | Gap | Files |
|---|---|---|---|---|
| Config reload | ~92 | 3 | ~89 | Configuration/ |
| MQTT bridge | ~123 | 50 | ~73 | Mqtt/ |
| Leaf nodes | ~110 | 2 | ~108 | LeafNodes/ |
| Accounts/auth | ~64 | 15 | ~49 | Accounts/ |
| Gateway | ~87 | 2 | ~85 | Gateways/ |
| Routes | ~73 | 2 | ~71 | Routes/ |
| Monitoring | ~45 | 7 | ~38 | Monitoring/ |
| Client protocol | ~120 | 30 | ~90 | root test dir |
| JetStream API | ~200 | 20 | ~180 | JetStream/ |

Approach

  • Each subsystem is an independent parallel subagent task
  • Tests organized by .NET namespace matching existing conventions
  • Each test file has header comment mapping to Go source test function names
  • Self-contained test helpers duplicated per file (no shared TestHelpers)
  • Gate verification between subsystem batches

Total tests: ~780-850


Dependency Graph

Wave 1 (Scaffolding) ──┬──► Wave 2 (Data Structures) ──► Wave 3 (FileStore) ──► Wave 4 (RAFT) ──► Wave 5 (Clustering)
                        │
                        └──► Wave 6 (Subsystem Suites) [parallel, independent of Waves 2-5]

Wave 6 subsystems are mutually independent and can execute in parallel. Waves 2-5 are sequential.
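
The graph can be turned into an execution schedule by repeatedly releasing every wave whose prerequisites are complete. The following sketch encodes the edges from the diagram; the function name stages is hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// stages groups waves into steps: each step holds every wave whose prerequisites
// were all completed in earlier steps. It returns nil if the graph has a cycle.
func stages(deps map[int][]int) [][]int {
	done := map[int]bool{}
	var out [][]int
	for len(done) < len(deps) {
		var ready []int
		for wave, prereqs := range deps {
			if done[wave] {
				continue
			}
			ok := true
			for _, p := range prereqs {
				if !done[p] {
					ok = false
					break
				}
			}
			if ok {
				ready = append(ready, wave)
			}
		}
		if len(ready) == 0 {
			return nil // nothing can proceed: the dependencies are circular
		}
		sort.Ints(ready) // map iteration order is random; sort for stable output
		for _, wave := range ready {
			done[wave] = true
		}
		out = append(out, ready)
	}
	return out
}

func main() {
	// wave -> prerequisite waves, taken from the diagram above
	deps := map[int][]int{1: nil, 2: {1}, 3: {2}, 4: {3}, 5: {4}, 6: {1}}
	fmt.Println(stages(deps)) // prints "[[1] [2 6] [3] [4] [5]]"
}
```

The second step shows the earliest point at which Wave 6 can start: alongside Wave 2, as soon as the scaffolding is in place.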


Estimated Totals

| Metric | Value |
|---|---|
| New implementation code | ~15,000-20,000 lines |
| New test code | ~12,000-15,000 lines |
| New tests | ~1,160 |
| Final test count | ~2,241 |
| Final Go parity | ~75% of Go test functions |
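
As a cross-check, the totals can be recomputed from the per-wave estimates and the counts in the gap analysis. In this short sketch, the Range type and helper names are illustrative. Summing the waves gives a range whose upper end matches the ~1,160 figure.

```go
package main

import "fmt"

// Range is an inclusive low/high estimate of a test count.
type Range struct{ Lo, Hi int }

// sum adds the low and high ends of each estimate independently.
func sum(rs []Range) Range {
	var total Range
	for _, r := range rs {
		total.Lo += r.Lo
		total.Hi += r.Hi
	}
	return total
}

// parity is the share of the Go test-function count reached, in percent.
func parity(tests, goTests int) float64 {
	return 100 * float64(tests) / float64(goTests)
}

func main() {
	// Per-wave estimates from Waves 1-6 above.
	waves := []Range{{0, 0}, {60, 60}, {80, 80}, {70, 70}, {100, 100}, {780, 850}}
	existing, goTests := 1081, 2937

	added := sum(waves)
	fmt.Printf("new tests: %d-%d\n", added.Lo, added.Hi)
	fmt.Printf("final count: %d-%d\n", existing+added.Lo, existing+added.Hi)
	fmt.Printf("parity: %.1f%%-%.1f%%\n",
		parity(existing+added.Lo, goTests), parity(existing+added.Hi, goTests))
}
```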

Key Conventions

  • xUnit 3 + Shouldly assertions (never Assert.*)
  • NSubstitute for mocking
  • Go reference comments on each ported test: // Go: TestFunctionName server/file.go:line
  • Self-contained helpers per test file
  • C# 14 idioms: primary constructors, collection expressions, file-scoped namespaces
  • TDD: write failing test first, then minimal implementation
  • Gated commits between waves