docs: add full production parity implementation plan
25 tasks across 6 waves targeting ~1,415 new tests: - Wave 2: Internal data structures (AVL, SubjectTree, GSL, TimeHashWheel) - Wave 3: FileStore block engine with 160 tests - Wave 4: RAFT consensus (election, replication, snapshots, membership) - Wave 5: JetStream clustering + NORACE concurrency - Wave 6: Remaining subsystem test suites (config, MQTT, leaf, accounts, gateway, routes, monitoring, client, JetStream API/cluster)
This commit is contained in:
889
docs/plans/2026-02-24-full-production-parity-plan.md
Normal file
889
docs/plans/2026-02-24-full-production-parity-plan.md
Normal file
@@ -0,0 +1,889 @@
|
||||
# Full Production Parity Implementation Plan
|
||||
|
||||
> **For Claude:** REQUIRED SUB-SKILL: Use superpowers-extended-cc:executing-plans to implement this plan task-by-task.
|
||||
|
||||
**Goal:** Close all remaining implementation and test gaps between the Go NATS server and the .NET port, achieving full production parity.
|
||||
|
||||
**Architecture:** 6-wave slice-by-slice TDD, ordered by dependency. Waves 2-5 are sequential (each depends on prior wave's production code). Wave 6 subsystem suites are parallel and can begin alongside Wave 2.
|
||||
|
||||
**Tech Stack:** .NET 10 / C# 14, xUnit 3, Shouldly, NSubstitute, System.IO.Pipelines
|
||||
|
||||
---
|
||||
|
||||
## Task 0: Inventory and Scaffolding
|
||||
|
||||
**Files:**
|
||||
- Create: `src/NATS.Server/Internal/Avl/SequenceSet.cs`
|
||||
- Create: `src/NATS.Server/Internal/SubjectTree/SubjectTree.cs`
|
||||
- Create: `src/NATS.Server/Internal/Gsl/GenericSubjectList.cs`
|
||||
- Create: `src/NATS.Server/Internal/TimeHashWheel/HashWheel.cs`
|
||||
- Create: `src/NATS.Server/JetStream/Storage/IStreamStore.cs`
|
||||
- Create: `src/NATS.Server/JetStream/Storage/IConsumerStore.cs`
|
||||
- Create: `src/NATS.Server/JetStream/Storage/FileStore/FileStore.cs`
|
||||
- Create: `src/NATS.Server/Raft/IRaftNode.cs`
|
||||
- Create: `src/NATS.Server/Raft/RaftNode.cs`
|
||||
- Create: `src/NATS.Server/Raft/RaftState.cs`
|
||||
- Create: `tests/NATS.Server.Tests/Internal/Avl/` (directory)
|
||||
- Create: `tests/NATS.Server.Tests/Internal/SubjectTree/` (directory)
|
||||
- Create: `tests/NATS.Server.Tests/Internal/Gsl/` (directory)
|
||||
- Create: `tests/NATS.Server.Tests/Internal/TimeHashWheel/` (directory)
|
||||
- Create: `tests/NATS.Server.Tests/Raft/` (directory)
|
||||
|
||||
**Step 1: Create namespace stub files with minimal type skeletons**
|
||||
|
||||
Each stub file contains the namespace declaration, a public class/interface with a `// TODO: Port from Go` comment, and the Go reference file path.
|
||||
|
||||
**Step 2: Create test directory structure**
|
||||
|
||||
```bash
|
||||
mkdir -p tests/NATS.Server.Tests/Internal/Avl
|
||||
mkdir -p tests/NATS.Server.Tests/Internal/SubjectTree
|
||||
mkdir -p tests/NATS.Server.Tests/Internal/Gsl
|
||||
mkdir -p tests/NATS.Server.Tests/Internal/TimeHashWheel
|
||||
mkdir -p tests/NATS.Server.Tests/Raft
|
||||
```
|
||||
|
||||
**Step 3: Verify build succeeds**
|
||||
|
||||
Run: `dotnet build NatsDotNet.slnx`
|
||||
Expected: Build succeeded, 0 errors
|
||||
|
||||
**Step 4: Commit**
|
||||
|
||||
```bash
|
||||
git add -A && git commit -m "feat: scaffold namespaces for data structures, FileStore, and RAFT"
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Task 1: AVL Tree / SequenceSet
|
||||
|
||||
**Files:**
|
||||
- Create: `src/NATS.Server/Internal/Avl/SequenceSet.cs`
|
||||
- Create: `tests/NATS.Server.Tests/Internal/Avl/SequenceSetTests.cs`
|
||||
- Test: `tests/NATS.Server.Tests/Internal/Avl/SequenceSetTests.cs`
|
||||
|
||||
**Go reference:** `server/avl/seqset.go` + `server/avl/seqset_test.go` (16 test functions)
|
||||
|
||||
**Public API to port:**
|
||||
|
||||
```csharp
|
||||
namespace NATS.Server.Internal.Avl;
|
||||
|
||||
public class SequenceSet
|
||||
{
|
||||
public void Insert(ulong seq);
|
||||
public bool Exists(ulong seq);
|
||||
public bool Delete(ulong seq);
|
||||
public void SetInitialMin(ulong min);
|
||||
public int Size { get; }
|
||||
public int Nodes { get; }
|
||||
public void Empty();
|
||||
public bool IsEmpty { get; }
|
||||
public void Range(Func<ulong, bool> callback);
|
||||
public (int Left, int Right) Heights();
|
||||
public (ulong Min, ulong Max, ulong Num) State();
|
||||
public (ulong Min, ulong Max) MinMax();
|
||||
public SequenceSet Clone();
|
||||
public void Union(params SequenceSet[] others);
|
||||
public static SequenceSet Union(params SequenceSet[] sets);
|
||||
public int EncodeLength();
|
||||
public byte[] Encode();
|
||||
public static (SequenceSet Set, int BytesRead) Decode(ReadOnlySpan<byte> buf);
|
||||
}
|
||||
```
|
||||
|
||||
**Step 1: Write failing tests**
|
||||
|
||||
Port all 16 Go test functions:
|
||||
- `TestSeqSetBasics` → `Basics_InsertExistsDelete`
|
||||
- `TestSeqSetLeftLean` → `LeftLean_TreeBalancesCorrectly`
|
||||
- `TestSeqSetRightLean` → `RightLean_TreeBalancesCorrectly`
|
||||
- `TestSeqSetCorrectness` → `Correctness_RandomInsertDelete`
|
||||
- `TestSeqSetRange` → `Range_IteratesInOrder`
|
||||
- `TestSeqSetDelete` → `Delete_VariousPatterns`
|
||||
- `TestSeqSetInsertAndDeletePedantic` → `InsertAndDelete_PedanticVerification`
|
||||
- `TestSeqSetMinMax` → `MinMax_TracksCorrectly`
|
||||
- `TestSeqSetClone` → `Clone_IndependentCopy`
|
||||
- `TestSeqSetUnion` → `Union_MergesSets`
|
||||
- `TestSeqSetFirst` → `First_ReturnsMinimum`
|
||||
- `TestSeqSetDistinctUnion` → `DistinctUnion_NoOverlap`
|
||||
- `TestSeqSetDecodeV1` → `DecodeV1_BackwardsCompatible`
|
||||
- `TestNoRaceSeqSetSizeComparison` → `SizeComparison_LargeSet`
|
||||
- `TestNoRaceSeqSetEncodeLarge` → `EncodeLarge_RoundTrips`
|
||||
- `TestNoRaceSeqSetRelativeSpeed` → `RelativeSpeed_Performance`
|
||||
|
||||
**Step 2: Run tests to verify they fail**
|
||||
|
||||
Run: `dotnet test tests/NATS.Server.Tests --filter "FullyQualifiedName~SequenceSetTests" -v quiet`
|
||||
Expected: FAIL (types not implemented)
|
||||
|
||||
**Step 3: Implement SequenceSet**
|
||||
|
||||
Port the AVL tree from `server/avl/seqset.go`. Key implementation details:
|
||||
- Internal `Node` class with `Value`, `Height`, `Left`, `Right`
|
||||
- AVL rotations: `RotateLeft`, `RotateRight`, `Balance`
|
||||
- Run-length encoding for `Encode`/`Decode` (sequences compress into ranges)
|
||||
- The tree stores ranges `[min, max]` in each node, not individual values
|
||||
|
||||
**Step 4: Run tests to verify they pass**
|
||||
|
||||
Run: `dotnet test tests/NATS.Server.Tests --filter "FullyQualifiedName~SequenceSetTests" -v quiet`
|
||||
Expected: PASS (16 tests)
|
||||
|
||||
**Step 5: Commit**
|
||||
|
||||
```bash
|
||||
git add src/NATS.Server/Internal/Avl/ tests/NATS.Server.Tests/Internal/Avl/
|
||||
git commit -m "feat: port AVL SequenceSet from Go with 16 tests"
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Task 2: Subject Tree (ART)
|
||||
|
||||
**Files:**
|
||||
- Create: `src/NATS.Server/Internal/SubjectTree/SubjectTree.cs`
|
||||
- Create: `src/NATS.Server/Internal/SubjectTree/Node.cs` (node4, node10, node16, node48, node256, leaf)
|
||||
- Create: `tests/NATS.Server.Tests/Internal/SubjectTree/SubjectTreeTests.cs`
|
||||
|
||||
**Go reference:** `server/stree/stree.go` + 8 node files + `server/stree/stree_test.go` (59 test functions)
|
||||
|
||||
**Public API to port:**
|
||||
|
||||
```csharp
|
||||
namespace NATS.Server.Internal.SubjectTree;
|
||||
|
||||
public class SubjectTree<T>
|
||||
{
|
||||
public int Size { get; }
|
||||
public SubjectTree<T> Empty();
|
||||
public (T? Value, bool Existed) Insert(ReadOnlySpan<byte> subject, T value);
|
||||
public (T? Value, bool Found) Find(ReadOnlySpan<byte> subject);
|
||||
public (T? Value, bool Found) Delete(ReadOnlySpan<byte> subject);
|
||||
public void Match(ReadOnlySpan<byte> filter, Action<byte[], T> callback);
|
||||
public bool MatchUntil(ReadOnlySpan<byte> filter, Func<byte[], T, bool> callback);
|
||||
public bool IterOrdered(Func<byte[], T, bool> callback);
|
||||
public bool IterFast(Func<byte[], T, bool> callback);
|
||||
}
|
||||
```
|
||||
|
||||
**Step 1: Write failing tests**
|
||||
|
||||
Port all 59 Go test functions. Key groupings:
|
||||
- Basic CRUD: `TestSubjectTreeBasics`, `TestSubjectTreeNoPrefix`, `TestSubjectTreeEmpty` → 5 tests
|
||||
- Node growth/shrink: `TestSubjectTreeNodeGrow`, `TestNode256Operations`, `TestNode256Shrink` → 8 tests
|
||||
- Matching: `TestSubjectTreeMatchLeafOnly`, `TestSubjectTreeMatchNodes`, `TestSubjectTreeMatchUntil` + 10 more → 15 tests
|
||||
- Iteration: `TestSubjectTreeIterOrdered`, `TestSubjectTreeIterFast` + edge cases → 8 tests
|
||||
- Delete: `TestSubjectTreeNodeDelete` + edge cases → 6 tests
|
||||
- Intersection: `TestSubjectTreeLazyIntersect`, `TestSubjectTreeGSLIntersection` → 3 tests
|
||||
- Edge cases and bug regression: remaining 14 tests
|
||||
|
||||
**Step 2: Run tests to verify they fail**
|
||||
|
||||
Run: `dotnet test tests/NATS.Server.Tests --filter "FullyQualifiedName~SubjectTreeTests" -v quiet`
|
||||
Expected: FAIL
|
||||
|
||||
**Step 3: Implement SubjectTree**
|
||||
|
||||
Port the Adaptive Radix Tree from `server/stree/`. Key implementation:
|
||||
- 5 node types: `Node4`, `Node10`, `Node16`, `Node48`, `Node256` (capacity-tiered)
|
||||
- Generic `Leaf<T>` for values
|
||||
- `Parts` helper for subject tokenization (split on `.`)
|
||||
- `MatchParts` for wildcard matching (`*` single, `>` multi)
|
||||
- Node interface: `AddChild`, `FindChild`, `DeleteChild`, `IsFull`, `Grow`, `Shrink`, `Iter`
|
||||
|
||||
**Step 4: Run tests to verify they pass**
|
||||
|
||||
Run: `dotnet test tests/NATS.Server.Tests --filter "FullyQualifiedName~SubjectTreeTests" -v quiet`
|
||||
Expected: PASS (59 tests)
|
||||
|
||||
**Step 5: Commit**
|
||||
|
||||
```bash
|
||||
git add src/NATS.Server/Internal/SubjectTree/ tests/NATS.Server.Tests/Internal/SubjectTree/
|
||||
git commit -m "feat: port ART SubjectTree from Go with 59 tests"
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Task 3: Generic Subject List (GSL)
|
||||
|
||||
**Files:**
|
||||
- Create: `src/NATS.Server/Internal/Gsl/GenericSubjectList.cs`
|
||||
- Create: `tests/NATS.Server.Tests/Internal/Gsl/GenericSubjectListTests.cs`
|
||||
|
||||
**Go reference:** `server/gsl/gsl.go` + `server/gsl/gsl_test.go` (21 test functions)
|
||||
|
||||
**Public API to port:**
|
||||
|
||||
```csharp
|
||||
namespace NATS.Server.Internal.Gsl;
|
||||
|
||||
public class GenericSubjectList<T> where T : IEquatable<T>
|
||||
{
|
||||
public void Insert(string subject, T value);
|
||||
public void Remove(string subject, T value);
|
||||
public void Match(string subject, Action<T> callback);
|
||||
public void MatchBytes(ReadOnlySpan<byte> subject, Action<T> callback);
|
||||
public bool HasInterest(string subject);
|
||||
public int NumInterest(string subject);
|
||||
public bool HasInterestStartingIn(string subject);
|
||||
public uint Count { get; }
|
||||
}
|
||||
|
||||
public class SimpleSubjectList : GenericSubjectList<int> { }
|
||||
```
|
||||
|
||||
**Step 1: Write failing tests**
|
||||
|
||||
Port all 21 Go test functions:
|
||||
- Init/count: `TestGenericSublistInit`, `TestGenericSublistInsertCount`
|
||||
- Matching: `TestGenericSublistSimple` through `TestGenericSublistFullWildcard` (5 tests)
|
||||
- Remove: `TestGenericSublistRemove` through `TestGenericSublistRemoveCleanupWildcards` (4 tests)
|
||||
- Invalid subjects: `TestGenericSublistInvalidSubjectsInsert`, `TestGenericSublistBadSubjectOnRemove`
|
||||
- Edge cases: `TestGenericSublistTwoTokenPubMatchSingleTokenSub` through `TestGenericSublistMatchWithEmptyTokens` (3 tests)
|
||||
- Interest: `TestGenericSublistHasInterest` through `TestGenericSublistNumInterest` (4 tests)
|
||||
|
||||
**Step 2: Run tests to verify they fail**
|
||||
|
||||
**Step 3: Implement GenericSubjectList**
|
||||
|
||||
Trie-based subject matcher with locking (`ReaderWriterLockSlim`):
|
||||
- Internal `Node<T>` with `Psubs` (plain), `Qsubs` (queue), `Children` level map
|
||||
- `*` and `>` wildcard child pointers
|
||||
- Thread-safe via `ReaderWriterLockSlim`
|
||||
|
||||
**Step 4: Run tests to verify they pass**
|
||||
|
||||
Expected: PASS (21 tests)
|
||||
|
||||
**Step 5: Commit**
|
||||
|
||||
```bash
|
||||
git add src/NATS.Server/Internal/Gsl/ tests/NATS.Server.Tests/Internal/Gsl/
|
||||
git commit -m "feat: port GenericSubjectList from Go with 21 tests"
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Task 4: Time Hash Wheel
|
||||
|
||||
**Files:**
|
||||
- Create: `src/NATS.Server/Internal/TimeHashWheel/HashWheel.cs`
|
||||
- Create: `tests/NATS.Server.Tests/Internal/TimeHashWheel/HashWheelTests.cs`
|
||||
|
||||
**Go reference:** `server/thw/thw.go` + `server/thw/thw_test.go` (8 test functions)
|
||||
|
||||
**Public API to port:**
|
||||
|
||||
```csharp
|
||||
namespace NATS.Server.Internal.TimeHashWheel;
|
||||
|
||||
public class HashWheel
|
||||
{
|
||||
public void Add(ulong seq, long expires);
|
||||
public bool Remove(ulong seq, long expires);
|
||||
public void Update(ulong seq, long oldExpires, long newExpires);
|
||||
public void ExpireTasks(Func<ulong, long, bool> callback);
|
||||
public long GetNextExpiration(long before);
|
||||
public ulong Count { get; }
|
||||
public byte[] Encode(ulong highSeq);
|
||||
public (ulong HighSeq, int BytesRead) Decode(ReadOnlySpan<byte> buf);
|
||||
}
|
||||
```
|
||||
|
||||
**Step 1: Write failing tests**
|
||||
|
||||
Port all 8 Go test functions:
|
||||
- `TestHashWheelBasics` → `Basics_AddRemoveCount`
|
||||
- `TestHashWheelUpdate` → `Update_ChangesExpiration`
|
||||
- `TestHashWheelExpiration` → `Expiration_FiresCallbackForExpired`
|
||||
- `TestHashWheelManualExpiration` → `ManualExpiration_SpecificTime`
|
||||
- `TestHashWheelExpirationLargerThanWheel` → `LargerThanWheel_HandlesWrapAround`
|
||||
- `TestHashWheelNextExpiration` → `NextExpiration_FindsEarliest`
|
||||
- `TestHashWheelStress` → `Stress_ConcurrentAddRemove`
|
||||
- `TestHashWheelEncodeDecode` → `EncodeDecode_RoundTrips`
|
||||
|
||||
**Step 2: Run tests to verify they fail**
|
||||
|
||||
**Step 3: Implement HashWheel**
|
||||
|
||||
Fixed-size array of slots. Each slot is a linked list of `(seq, expires)` entries:
|
||||
- Slot index = `(expires / tickResolution) % wheelSize`
|
||||
- `ExpireTasks` scans all slots whose time has passed
|
||||
- Encode/Decode for persistence (used by FileStore TTL state)
|
||||
|
||||
**Step 4: Run tests to verify they pass**
|
||||
|
||||
Expected: PASS (8 tests)
|
||||
|
||||
**Step 5: Commit**
|
||||
|
||||
```bash
|
||||
git add src/NATS.Server/Internal/TimeHashWheel/ tests/NATS.Server.Tests/Internal/TimeHashWheel/
|
||||
git commit -m "feat: port TimeHashWheel from Go with 8 tests"
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Task 5: StreamStore / ConsumerStore Interfaces
|
||||
|
||||
**Files:**
|
||||
- Create: `src/NATS.Server/JetStream/Storage/IStreamStore.cs`
|
||||
- Create: `src/NATS.Server/JetStream/Storage/IConsumerStore.cs`
|
||||
- Create: `src/NATS.Server/JetStream/Storage/StoreMsg.cs`
|
||||
- Create: `src/NATS.Server/JetStream/Storage/StreamState.cs`
|
||||
- Create: `src/NATS.Server/JetStream/Storage/ConsumerState.cs`
|
||||
- Create: `src/NATS.Server/JetStream/Storage/FileStoreConfig.cs`
|
||||
|
||||
**Go reference:** `server/store.go` (interfaces and types)
|
||||
|
||||
**Step 1: Define interfaces and value types**
|
||||
|
||||
Port `StreamStore`, `ConsumerStore` interfaces and all supporting types (`StoreMsg`, `StreamState`, `SimpleState`, `ConsumerState`, `FileStoreConfig`, `StoreCipher`, `StoreCompression`).
|
||||
|
||||
**Step 2: Verify build**
|
||||
|
||||
Run: `dotnet build NatsDotNet.slnx`
|
||||
Expected: Build succeeded
|
||||
|
||||
**Step 3: Commit**
|
||||
|
||||
```bash
|
||||
git add src/NATS.Server/JetStream/Storage/
|
||||
git commit -m "feat: define StreamStore and ConsumerStore interfaces from Go store.go"
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Task 6: FileStore Block Engine — Core
|
||||
|
||||
**Files:**
|
||||
- Create: `src/NATS.Server/JetStream/Storage/FileStore/FileStore.cs`
|
||||
- Create: `src/NATS.Server/JetStream/Storage/FileStore/MessageBlock.cs`
|
||||
- Create: `src/NATS.Server/JetStream/Storage/FileStore/BlockCache.cs`
|
||||
- Create: `tests/NATS.Server.Tests/JetStream/Storage/FileStoreTests.cs`
|
||||
|
||||
**Go reference:** `server/filestore.go` + `server/filestore_test.go` (200 test functions)
|
||||
|
||||
This is the largest single task. Split into sub-tasks:
|
||||
|
||||
### Task 6a: Basic CRUD (store/load/remove)
|
||||
Port: `TestFileStoreBasics`, `TestFileStoreMsgHeaders`, `TestFileStoreBasicWriteMsgsAndRestore`, `TestFileStoreWriteAndReadSameBlock`, `TestFileStoreAndRetrieveMultiBlock`
|
||||
Tests: ~15
|
||||
|
||||
### Task 6b: Limits Enforcement
|
||||
Port: `TestFileStoreMsgLimit`, `TestFileStoreBytesLimit`, `TestFileStoreAgeLimit`, `TestFileStoreMaxMsgsPerSubject` and variants
|
||||
Tests: ~20
|
||||
|
||||
### Task 6c: Purge / Compact / Truncate
|
||||
Port: `TestFileStorePurge`, `TestFileStoreCompact`, `TestFileStoreSparseCompaction`, `TestFileStorePurgeExWithSubject`, `TestFileStoreStreamTruncate` and variants
|
||||
Tests: ~25
|
||||
|
||||
### Task 6d: Recovery
|
||||
Port: `TestFileStoreAgeLimitRecovery`, `TestFileStoreRemovePartialRecovery`, `TestFileStoreFullStateBasics`, `TestFileStoreRecoverWithRemovesAndNoIndexDB` and variants
|
||||
Tests: ~20
|
||||
|
||||
### Task 6e: Subject Filtering
|
||||
Port: `TestFileStoreSubjectsTotals`, `TestFileStoreMultiLastSeqs`, `TestFileStoreLoadLastWildcard`, `TestFileStoreNumPendingMulti` and variants
|
||||
Tests: ~15
|
||||
|
||||
### Task 6f: Encryption
|
||||
Port: `TestFileStoreEncrypted`, `TestFileStoreRestoreEncryptedWithNoKeyFuncFails`, `TestFileStoreDoubleCompactWithWriteInBetweenEncryptedBug` and variants
|
||||
Tests: ~10
|
||||
|
||||
### Task 6g: Compression and TTL
|
||||
Port: `TestFileStoreMessageTTL`, `TestFileStoreMessageTTLRestart`, `TestFileStoreMessageSchedule`, `TestFileStoreCompressionAfterTruncate` and variants
|
||||
Tests: ~15
|
||||
|
||||
### Task 6h: Skip Messages and Consumer Store
|
||||
Port: `TestFileStoreSkipMsg`, `TestFileStoreSkipMsgs`, `TestFileStoreConsumer`, `TestFileStoreConsumerEncodeDecodeRedelivered`
|
||||
Tests: ~15
|
||||
|
||||
### Task 6i: Corruption and Edge Cases
|
||||
Port: `TestFileStoreBitRot`, `TestFileStoreSubjectCorruption`, `TestFileStoreWriteFullStateDetectCorruptState` and remaining edge case tests
|
||||
Tests: ~15
|
||||
|
||||
### Task 6j: Performance Tests
|
||||
Port: `TestFileStorePerf`, `TestFileStoreCompactPerf`, `TestFileStoreFetchPerf`, `TestFileStoreReadBackMsgPerf`
|
||||
Tests: ~10
|
||||
|
||||
**Total FileStore tests: ~160** (of 200 Go tests; remaining 40 are deep internal tests that depend on Go-specific internals)
|
||||
|
||||
**Commit after each sub-task passes:**
|
||||
|
||||
```bash
|
||||
git commit -m "feat: FileStore [sub-task] with N tests"
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Task 7: RAFT Consensus — Core Types
|
||||
|
||||
**Files:**
|
||||
- Create: `src/NATS.Server/Raft/IRaftNode.cs`
|
||||
- Create: `src/NATS.Server/Raft/RaftState.cs`
|
||||
- Create: `src/NATS.Server/Raft/RaftEntry.cs`
|
||||
- Create: `src/NATS.Server/Raft/RaftMessages.cs` (VoteRequest, VoteResponse, AppendEntry, AppendEntryResponse)
|
||||
- Create: `src/NATS.Server/Raft/Peer.cs`
|
||||
- Create: `src/NATS.Server/Raft/CommittedEntry.cs`
|
||||
|
||||
**Go reference:** `server/raft.go` lines 1-200 (types and interfaces)
|
||||
|
||||
**Step 1: Define types**
|
||||
|
||||
```csharp
|
||||
namespace NATS.Server.Raft;
|
||||
|
||||
public enum RaftState : byte
|
||||
{
|
||||
Follower = 0,
|
||||
Leader = 1, // Note: Go ordering — Leader before Candidate
|
||||
Candidate = 2,
|
||||
Closed = 3
|
||||
}
|
||||
|
||||
public enum EntryType : byte
|
||||
{
|
||||
Normal = 0,
|
||||
OldSnapshot = 1,
|
||||
PeerState = 2,
|
||||
AddPeer = 3,
|
||||
RemovePeer = 4,
|
||||
LeaderTransfer = 5,
|
||||
Snapshot = 6,
|
||||
Catchup = 7 // internal only
|
||||
}
|
||||
|
||||
public record struct Peer(string Id, bool Current, DateTime Last, ulong Lag);
|
||||
```
|
||||
|
||||
**Step 2: Verify build**
|
||||
|
||||
**Step 3: Commit**
|
||||
|
||||
```bash
|
||||
git add src/NATS.Server/Raft/
|
||||
git commit -m "feat: define RAFT core types from Go raft.go"
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Task 8: RAFT Consensus — Wire Format
|
||||
|
||||
**Files:**
|
||||
- Create: `src/NATS.Server/Raft/RaftWireFormat.cs`
|
||||
- Create: `tests/NATS.Server.Tests/Raft/RaftWireFormatTests.cs`
|
||||
|
||||
**Go reference:** `server/raft.go` encoding/decoding sections, `raft_test.go`:
|
||||
- `TestNRGAppendEntryEncode`, `TestNRGAppendEntryDecode`, `TestNRGVoteResponseEncoding`
|
||||
|
||||
**Step 1: Write failing tests for encode/decode**
|
||||
|
||||
Port the 3 wire format tests plus additional round-trip tests (~10 tests total).
|
||||
|
||||
Wire format details:
|
||||
- All little-endian binary (`BinaryPrimitives`)
|
||||
- Node IDs: exactly 8 bytes
|
||||
- VoteRequest: 32 bytes (3 × uint64 + 8-byte candidateId)
|
||||
- VoteResponse: 17 bytes (uint64 term + 8-byte peerId + 1 byte flags)
|
||||
- AppendEntry: 42-byte header + entries
|
||||
- AppendEntryResponse: 25 bytes
|
||||
|
||||
**Step 2: Implement wire format encode/decode**
|
||||
|
||||
**Step 3: Run tests, verify pass**
|
||||
|
||||
**Step 4: Commit**
|
||||
|
||||
```bash
|
||||
git commit -m "feat: RAFT wire format encode/decode with 10 tests"
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Task 9: RAFT Consensus — Election
|
||||
|
||||
**Files:**
|
||||
- Create: `src/NATS.Server/Raft/RaftNode.cs` (state machine core)
|
||||
- Create: `src/NATS.Server/Raft/IRaftTransport.cs`
|
||||
- Create: `src/NATS.Server/Raft/InProcessTransport.cs`
|
||||
- Create: `tests/NATS.Server.Tests/Raft/RaftElectionTests.cs`
|
||||
|
||||
**Go reference:** `raft_test.go` election tests (~25 functions)
|
||||
|
||||
Port tests:
|
||||
- `TestNRGSimpleElection` → single/3/5 node elections
|
||||
- `TestNRGSingleNodeElection` → single node auto-leader
|
||||
- `TestNRGLeaderTransfer` → leadership transfer
|
||||
- `TestNRGInlineStepdown` → step-down
|
||||
- `TestNRGObserverMode` → observer doesn't vote
|
||||
- `TestNRGCandidateDoesntRevertTermAfterOldAE`
|
||||
- `TestNRGAssumeHighTermAfterCandidateIsolation`
|
||||
- `TestNRGHeartbeatOnLeaderChange`
|
||||
- `TestNRGElectionTimerAfterObserver`
|
||||
- `TestNRGUnsuccessfulVoteRequestDoesntResetElectionTimer`
|
||||
- `TestNRGStepDownOnSameTermDoesntClearVote`
|
||||
- `TestNRGMustNotResetVoteOnStepDownOrLeaderTransfer`
|
||||
- And more...
|
||||
|
||||
Tests: ~25
|
||||
|
||||
**Commit:**
|
||||
|
||||
```bash
|
||||
git commit -m "feat: RAFT election with 25 tests"
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Task 10: RAFT Consensus — Log Replication
|
||||
|
||||
**Files:**
|
||||
- Modify: `src/NATS.Server/Raft/RaftNode.cs`
|
||||
- Create: `tests/NATS.Server.Tests/Raft/RaftLogReplicationTests.cs`
|
||||
|
||||
**Go reference:** `raft_test.go` log replication and catchup tests (~30 functions)
|
||||
|
||||
Port tests:
|
||||
- `TestNRGSimple` → basic propose and commit
|
||||
- `TestNRGAEFromOldLeader` → reject old leader entries
|
||||
- `TestNRGWALEntryWithoutQuorumMustTruncate`
|
||||
- `TestNRGCatchupDoesNotTruncateUncommittedEntriesWithQuorum`
|
||||
- `TestNRGCatchupCanTruncateMultipleEntriesWithoutQuorum`
|
||||
- `TestNRGSimpleCatchup` → follower catches up
|
||||
- `TestNRGChainOfBlocksRunInLockstep`
|
||||
- `TestNRGChainOfBlocksStopAndCatchUp`
|
||||
- `TestNRGAppendEntryCanEstablishQuorumAfterLeaderChange`
|
||||
- `TestNRGQuorumAccounting`
|
||||
- And more...
|
||||
|
||||
Tests: ~30
|
||||
|
||||
**Commit:**
|
||||
|
||||
```bash
|
||||
git commit -m "feat: RAFT log replication with 30 tests"
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Task 11: RAFT Consensus — Snapshots and Membership
|
||||
|
||||
**Files:**
|
||||
- Modify: `src/NATS.Server/Raft/RaftNode.cs`
|
||||
- Create: `tests/NATS.Server.Tests/Raft/RaftSnapshotTests.cs`
|
||||
- Create: `tests/NATS.Server.Tests/Raft/RaftMembershipTests.cs`
|
||||
|
||||
**Go reference:** `raft_test.go` snapshot + membership tests (~35 functions)
|
||||
|
||||
Port tests:
|
||||
- Snapshots: `TestNRGSnapshotAndRestart`, `TestNRGSnapshotCatchup`, `TestNRGSnapshotRecovery`, `TestNRGDontRemoveSnapshotIfTruncateToApplied`, `TestNRGInstallSnapshotFromCheckpoint`, `TestNRGInstallSnapshotForce`, etc.
|
||||
- Membership: `TestNRGProposeRemovePeer`, `TestNRGProposeRemovePeerConcurrent`, `TestNRGAddPeers`, `TestNRGDisjointMajorities`, `TestNRGLeaderResurrectsRemovedPeers`, etc.
|
||||
|
||||
Tests: ~35
|
||||
|
||||
**Commit:**
|
||||
|
||||
```bash
|
||||
git commit -m "feat: RAFT snapshots and membership with 35 tests"
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Task 12: JetStream Clustering — Meta Controller
|
||||
|
||||
**Files:**
|
||||
- Create: `src/NATS.Server/JetStream/Cluster/JetStreamCluster.cs`
|
||||
- Create: `src/NATS.Server/JetStream/Cluster/MetaController.cs`
|
||||
- Create: `tests/NATS.Server.Tests/JetStream/Cluster/MetaControllerTests.cs`
|
||||
|
||||
**Go reference:** `server/jetstream_cluster.go`, `server/jetstream_cluster_1_test.go`
|
||||
|
||||
Port: Stream/consumer placement, `$JS.API.*` routing through meta leader, cluster formation.
|
||||
|
||||
Tests: ~30
|
||||
|
||||
---
|
||||
|
||||
## Task 13: JetStream Clustering — Per-Stream/Consumer RAFT
|
||||
|
||||
**Files:**
|
||||
- Create: `src/NATS.Server/JetStream/Cluster/StreamRaftGroup.cs`
|
||||
- Create: `src/NATS.Server/JetStream/Cluster/ConsumerRaftGroup.cs`
|
||||
- Create: `tests/NATS.Server.Tests/JetStream/Cluster/StreamRaftGroupTests.cs`
|
||||
- Create: `tests/NATS.Server.Tests/JetStream/Cluster/ConsumerRaftGroupTests.cs`
|
||||
|
||||
**Go reference:** `server/stream.go` (raft field), `server/consumer.go` (raft field), `jetstream_cluster_*_test.go`
|
||||
|
||||
Port: Per-stream/consumer RAFT groups, leader/follower replication, failover.
|
||||
|
||||
Tests: ~40
|
||||
|
||||
---
|
||||
|
||||
## Task 14: NORACE Concurrency Suite
|
||||
|
||||
**Files:**
|
||||
- Create: `tests/NATS.Server.Tests/Concurrency/ConcurrencyTests.cs`
|
||||
|
||||
**Go reference:** `server/norace_1_test.go` (100), `server/norace_2_test.go` (41)
|
||||
|
||||
Port a representative subset of Go's `-race` tests using `Task.WhenAll` patterns:
|
||||
- Concurrent publish/subscribe on same stream
|
||||
- Concurrent consumer creates/deletes
|
||||
- Concurrent stream purge during publish
|
||||
- Concurrent RAFT proposals
|
||||
|
||||
Tests: ~30 (representative subset of 141 Go tests; full NORACE suite is deeply coupled to Go runtime internals)
|
||||
|
||||
---
|
||||
|
||||
## Task 15: Config Reload Tests
|
||||
|
||||
**Files:**
|
||||
- Modify: `tests/NATS.Server.Tests/Configuration/ConfigReloadParityTests.cs`
|
||||
|
||||
**Go reference:** `server/reload_test.go` (73 test functions)
|
||||
|
||||
Port remaining ~70 config reload tests (3 already exist). Key areas:
|
||||
- Max connections, payload, subscriptions reload
|
||||
- Auth changes (user/pass, token, NKey)
|
||||
- TLS reload
|
||||
- Cluster config reload
|
||||
- JetStream config reload
|
||||
- Preserve existing connections during reload
|
||||
|
||||
Tests: ~70
|
||||
|
||||
**Commit:**
|
||||
|
||||
```bash
|
||||
git commit -m "feat: port 70 config reload tests from Go"
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Task 16: MQTT Bridge Tests
|
||||
|
||||
**Files:**
|
||||
- Modify: `tests/NATS.Server.Tests/Mqtt/` (existing files)
|
||||
- Create: additional MQTT test files as needed
|
||||
|
||||
**Go reference:** `server/mqtt_test.go` (123 test functions)
|
||||
|
||||
Port remaining ~73 MQTT tests (50 already exist). Key areas:
|
||||
- Topic mapping (MQTT topics → NATS subjects)
|
||||
- Retained messages
|
||||
- Will messages
|
||||
- MQTT-over-WebSocket
|
||||
- QoS 2 semantics (if supported)
|
||||
- MQTT 5.0 properties
|
||||
|
||||
Tests: ~73
|
||||
|
||||
---
|
||||
|
||||
## Task 17: Leaf Node Tests
|
||||
|
||||
**Files:**
|
||||
- Modify: `tests/NATS.Server.Tests/LeafNodes/`
|
||||
- Create: additional leaf node test files
|
||||
|
||||
**Go reference:** `server/leafnode_test.go` (110 test functions)
|
||||
|
||||
Port remaining ~108 leaf node tests (2 exist). Key areas:
|
||||
- Hub-spoke forwarding
|
||||
- Subject filter propagation
|
||||
- Loop detection (`$LDS.` prefix)
|
||||
- Auth on leaf connections
|
||||
- Reconnection
|
||||
- JetStream over leaf nodes
|
||||
|
||||
Tests: ~108
|
||||
|
||||
---
|
||||
|
||||
## Task 18: Accounts/Auth Tests
|
||||
|
||||
**Files:**
|
||||
- Modify: `tests/NATS.Server.Tests/Accounts/`
|
||||
|
||||
**Go reference:** `server/accounts_test.go` (64 test functions)
|
||||
|
||||
Port remaining ~49 account tests (15 exist). Key areas:
|
||||
- Export/import between accounts
|
||||
- Service latency tracking
|
||||
- Account limits (connections, payload, subscriptions)
|
||||
- System account operations
|
||||
- Account revocations
|
||||
|
||||
Tests: ~49
|
||||
|
||||
---
|
||||
|
||||
## Task 19: Gateway Tests
|
||||
|
||||
**Files:**
|
||||
- Modify: `tests/NATS.Server.Tests/Gateways/`
|
||||
|
||||
**Go reference:** `server/gateway_test.go` (88 test functions)
|
||||
|
||||
Port remaining ~86 gateway tests (2 exist). Key areas:
|
||||
- Interest-only mode optimization
|
||||
- Reply subject mapping (`_GR_.` prefix)
|
||||
- Gateway reconnection
|
||||
- Cross-cluster pub/sub
|
||||
- Gateway auth
|
||||
|
||||
Tests: ~86
|
||||
|
||||
---
|
||||
|
||||
## Task 20: Route Tests
|
||||
|
||||
**Files:**
|
||||
- Modify: `tests/NATS.Server.Tests/Routes/`
|
||||
|
||||
**Go reference:** `server/routes_test.go` (70 test functions)
|
||||
|
||||
Port remaining ~68 route tests (2 exist). Key areas:
|
||||
- Route pooling (default 3 connections per peer)
|
||||
- Account-specific dedicated routes
|
||||
- `RS+`/`RS-` subscribe propagation
|
||||
- `RMSG` routed messages
|
||||
- Route reconnection
|
||||
- Cluster gossip
|
||||
|
||||
Tests: ~68
|
||||
|
||||
---
|
||||
|
||||
## Task 21: Monitoring Tests
|
||||
|
||||
**Files:**
|
||||
- Modify: `tests/NATS.Server.Tests/Monitoring/`
|
||||
|
||||
**Go reference:** `server/monitor_test.go` (100 test functions)
|
||||
|
||||
Port remaining ~93 monitoring tests (7 exist). Key areas:
|
||||
- `/varz` — server info, memory, connections, messages
|
||||
- `/connz` — connection listing, sorting, filtering
|
||||
- `/routez` — route information
|
||||
- `/gatewayz` — gateway information
|
||||
- `/subsz` — subscription statistics
|
||||
- `/jsz` — JetStream statistics
|
||||
- `/healthz` — health check
|
||||
- `/accountz` — account information
|
||||
|
||||
Tests: ~93
|
||||
|
||||
---
|
||||
|
||||
## Task 22: Client Protocol Tests
|
||||
|
||||
**Files:**
|
||||
- Modify: `tests/NATS.Server.Tests/ClientTests.cs` and related files
|
||||
|
||||
**Go reference:** `server/client_test.go` (82 test functions)
|
||||
|
||||
Port remaining ~52 client protocol tests (30 exist). Key areas:
|
||||
- Max payload enforcement
|
||||
- Slow consumer detection and eviction
|
||||
- Permission violations
|
||||
- Connection info parsing
|
||||
- Buffer management
|
||||
- Verbose mode
|
||||
- Pedantic mode
|
||||
|
||||
Tests: ~52
|
||||
|
||||
---
|
||||
|
||||
## Task 23: JetStream API Tests
|
||||
|
||||
**Files:**
|
||||
- Modify: `tests/NATS.Server.Tests/JetStream/` (multiple files)
|
||||
|
||||
**Go reference:** `server/jetstream_test.go` (312 test functions)
|
||||
|
||||
Port remaining ~292 JetStream API tests (20 exist). Key areas:
|
||||
- Stream CRUD lifecycle
|
||||
- Consumer CRUD lifecycle
|
||||
- Publish acknowledgment and dedup
|
||||
- Consumer delivery semantics (push, pull, deliver policies)
|
||||
- Retention policies (limits, interest, work queue)
|
||||
- Mirror and source streams
|
||||
- Subject transforms
|
||||
- Direct get API
|
||||
- Stream purge variants
|
||||
- Consumer pause/resume
|
||||
|
||||
Tests: ~292 (split across multiple test files by area)
|
||||
|
||||
---
|
||||
|
||||
## Task 24: JetStream Cluster Tests
|
||||
|
||||
**Files:**
|
||||
- Create: `tests/NATS.Server.Tests/JetStream/Cluster/` (multiple files)
|
||||
|
||||
**Go reference:** `server/jetstream_cluster_1_test.go` (151), `_2_test.go` (123), `_3_test.go` (97), `_4_test.go` (85), `_long_test.go` (7) — total 463
|
||||
|
||||
Port a representative subset (~100 tests). Many of these require full RAFT + clustering (Waves 4-5). Key areas:
|
||||
- Clustered stream create/delete
|
||||
- Leader election and step-down
|
||||
- Consumer failover
|
||||
- R1/R3 replication
|
||||
- Cross-cluster JetStream
|
||||
- Snapshot/restore in cluster
|
||||
|
||||
Tests: ~100 (of 463; remaining require deep cluster integration)
|
||||
|
||||
---
|
||||
|
||||
## Wave Gate Verification
|
||||
|
||||
After each wave completes, run the full test suite:
|
||||
|
||||
```bash
|
||||
dotnet test NatsDotNet.slnx --nologo -v quiet
|
||||
```
|
||||
|
||||
Verify: 0 failures, test count increased by expected amount.
|
||||
|
||||
---
|
||||
|
||||
## Summary
|
||||
|
||||
| Task | Wave | Description | Tests |
|
||||
|------|------|-------------|-------|
|
||||
| 0 | 1 | Scaffolding | 0 |
|
||||
| 1 | 2 | AVL SequenceSet | 16 |
|
||||
| 2 | 2 | Subject Tree (ART) | 59 |
|
||||
| 3 | 2 | Generic Subject List | 21 |
|
||||
| 4 | 2 | Time Hash Wheel | 8 |
|
||||
| 5 | 3 | StreamStore interfaces | 0 |
|
||||
| 6 | 3 | FileStore block engine | 160 |
|
||||
| 7 | 4 | RAFT core types | 0 |
|
||||
| 8 | 4 | RAFT wire format | 10 |
|
||||
| 9 | 4 | RAFT election | 25 |
|
||||
| 10 | 4 | RAFT log replication | 30 |
|
||||
| 11 | 4 | RAFT snapshots + membership | 35 |
|
||||
| 12 | 5 | JetStream meta controller | 30 |
|
||||
| 13 | 5 | JetStream per-stream/consumer RAFT | 40 |
|
||||
| 14 | 5 | NORACE concurrency | 30 |
|
||||
| 15 | 6 | Config reload | 70 |
|
||||
| 16 | 6 | MQTT bridge | 73 |
|
||||
| 17 | 6 | Leaf nodes | 108 |
|
||||
| 18 | 6 | Accounts/auth | 49 |
|
||||
| 19 | 6 | Gateway | 86 |
|
||||
| 20 | 6 | Routes | 68 |
|
||||
| 21 | 6 | Monitoring | 93 |
|
||||
| 22 | 6 | Client protocol | 52 |
|
||||
| 23 | 6 | JetStream API | 292 |
|
||||
| 24 | 6 | JetStream cluster | 100 |
|
||||
| **Total** | | | **~1,415** |
|
||||
|
||||
**Expected final test count:** 1,081 + 1,415 = **~2,496 tests**
|
||||
@@ -0,0 +1,31 @@
|
||||
{
|
||||
"planPath": "docs/plans/2026-02-24-full-production-parity-plan.md",
|
||||
"tasks": [
|
||||
{"id": 39, "subject": "Task 0: Inventory and Scaffolding", "status": "pending"},
|
||||
{"id": 40, "subject": "Task 1: AVL Tree / SequenceSet (16 tests)", "status": "pending", "blockedBy": [39]},
|
||||
{"id": 41, "subject": "Task 2: Subject Tree ART (59 tests)", "status": "pending", "blockedBy": [39]},
|
||||
{"id": 42, "subject": "Task 3: Generic Subject List (21 tests)", "status": "pending", "blockedBy": [39]},
|
||||
{"id": 43, "subject": "Task 4: Time Hash Wheel (8 tests)", "status": "pending", "blockedBy": [39]},
|
||||
{"id": 44, "subject": "Task 5: StreamStore/ConsumerStore Interfaces", "status": "pending", "blockedBy": [39]},
|
||||
{"id": 45, "subject": "Task 6: FileStore Block Engine (160 tests)", "status": "pending", "blockedBy": [40, 41, 42, 43, 44]},
|
||||
{"id": 46, "subject": "Task 7: RAFT Core Types", "status": "pending", "blockedBy": [45]},
|
||||
{"id": 47, "subject": "Task 8: RAFT Wire Format (10 tests)", "status": "pending", "blockedBy": [46]},
|
||||
{"id": 48, "subject": "Task 9: RAFT Election (25 tests)", "status": "pending", "blockedBy": [47]},
|
||||
{"id": 49, "subject": "Task 10: RAFT Log Replication (30 tests)", "status": "pending", "blockedBy": [48]},
|
||||
{"id": 50, "subject": "Task 11: RAFT Snapshots + Membership (35 tests)", "status": "pending", "blockedBy": [49]},
|
||||
{"id": 51, "subject": "Task 12: JetStream Meta Controller (30 tests)", "status": "pending", "blockedBy": [50]},
|
||||
{"id": 52, "subject": "Task 13: Per-Stream/Consumer RAFT Groups (40 tests)", "status": "pending", "blockedBy": [51]},
|
||||
{"id": 53, "subject": "Task 14: NORACE Concurrency Suite (30 tests)", "status": "pending", "blockedBy": [52]},
|
||||
{"id": 54, "subject": "Task 15: Config Reload Tests (70 tests)", "status": "pending", "blockedBy": [39]},
|
||||
{"id": 55, "subject": "Task 16: MQTT Bridge Tests (73 tests)", "status": "pending", "blockedBy": [39]},
|
||||
{"id": 56, "subject": "Task 17: Leaf Node Tests (108 tests)", "status": "pending", "blockedBy": [39]},
|
||||
{"id": 57, "subject": "Task 18: Accounts/Auth Tests (49 tests)", "status": "pending", "blockedBy": [39]},
|
||||
{"id": 58, "subject": "Task 19: Gateway Tests (86 tests)", "status": "pending", "blockedBy": [39]},
|
||||
{"id": 59, "subject": "Task 20: Route Tests (68 tests)", "status": "pending", "blockedBy": [39]},
|
||||
{"id": 60, "subject": "Task 21: Monitoring Tests (93 tests)", "status": "pending", "blockedBy": [39]},
|
||||
{"id": 61, "subject": "Task 22: Client Protocol Tests (52 tests)", "status": "pending", "blockedBy": [39]},
|
||||
{"id": 62, "subject": "Task 23: JetStream API Tests (292 tests)", "status": "pending", "blockedBy": [39]},
|
||||
{"id": 63, "subject": "Task 24: JetStream Cluster Tests (100 tests)", "status": "pending", "blockedBy": [39]}
|
||||
],
|
||||
"lastUpdated": "2026-02-24T00:00:00Z"
|
||||
}
|
||||
Reference in New Issue
Block a user