natsdotnet/docs/plans/2026-02-26-remaining-gaps-plan.md
Joseph Doherty 9ac29fc6f5 docs: add 93-gap implementation plan (8 phases, Tasks 1-93)
Bottom-up dependency ordering: FileStore/RAFT → Cluster/API → Consumer/Stream → Client/MQTT → Config/Gateway → Route/LeafNode → Account → Monitoring/WebSocket. Full test suite every 2 phases.
2026-02-25 07:47:11 -05:00

Remaining Gap Closure Implementation Plan (93 Gaps, 8 Phases)

For Claude: REQUIRED SUB-SKILL: Use superpowers-extended-cc:executing-plans to implement this plan task-by-task.

Goal: Close all 93 remaining implementation gaps between the Go NATS server and the .NET port, reaching feature parity across the FileStore, RAFT, JetStream Cluster, API, Consumer, Stream, Client, MQTT, Config, Gateway, Route, LeafNode, Account, Monitoring, and WebSocket subsystems.

Architecture: Bottom-up dependency approach — Phase 1 builds storage durability and RAFT completion, Phase 2 adds cluster coordination and API layer, Phase 3 adds consumer engines and stream lifecycle, Phase 4 adds client protocol and MQTT, Phase 5 adds config reload and gateway, Phase 6 adds route clustering and leaf nodes, Phase 7 adds account management, Phase 8 adds monitoring and WebSocket.

Tech Stack: .NET 10 / C# 14, xUnit 3, Shouldly, NSubstitute, System.IO.Hashing (XxHash64), System.IO.Pipelines, IronSnappy (S2), ChaCha20-Poly1305/AES-GCM, SQLite (test parity DB)

Test strategy: Only run targeted unit tests during implementation (dotnet test --filter). Run full test suite every 2 phases (after Phase 2, 4, 6, 8). Update docs/test_parity.db per phase.

Parity DB update pattern:

sqlite3 docs/test_parity.db "UPDATE go_tests SET status='mapped', dotnet_test='DotNetTestName', dotnet_file='TestFile.cs' WHERE go_test='GoTestName';"
sqlite3 docs/test_parity.db "INSERT INTO dotnet_tests (test_name, test_file, category) VALUES ('TestName', 'TestFile.cs', 'category');"

Phase 1: FileStore Durability + RAFT Completion (11 gaps)

Dependencies: None (pure infrastructure)

Exit gate: Checksums validated on read, atomic writes verified, tombstones persisted/recovered, cache expires correctly, filtered queries use SubjectTree, RAFT streams snapshots in chunks, transfers leadership, compacts with policies, checks quorum, jitters elections


Task 1: FileStore Checksum Validation (Gap 1.5)

Add per-block last-checksum tracking and read-path validation using existing XxHash64 in MessageRecord.

Files:

  • Modify: src/NATS.Server/JetStream/Storage/MsgBlock.cs:25-44 (add _lastChecksum field)
  • Modify: src/NATS.Server/JetStream/Storage/MsgBlock.cs:298 (Read method — add validation)
  • Modify: src/NATS.Server/JetStream/Storage/MessageRecord.cs:115 (Decode — expose checksum)
  • Test: tests/NATS.Server.Tests/JetStream/Storage/FileStoreChecksumTests.cs (create)
  • Go ref: filestore.go:2204 (lastChecksum), filestore.go:8180 (validation in msgFromBufEx)

Step 1: Write failing tests

Create tests/NATS.Server.Tests/JetStream/Storage/FileStoreChecksumTests.cs:

using NATS.Server.JetStream.Storage;

namespace NATS.Server.Tests.JetStream.Storage;

public class FileStoreChecksumTests : IDisposable
{
    private readonly DirectoryInfo _dir = Directory.CreateTempSubdirectory("checksum-");

    public void Dispose() => _dir.Delete(true);

    [Fact]
    public void MsgBlock_tracks_last_checksum()
    {
        using var block = MsgBlock.Create(1, _dir.FullName, 1024 * 1024);
        block.Write("test", ReadOnlyMemory<byte>.Empty, "hello"u8.ToArray());
        block.LastChecksum.ShouldNotBeNull();
        block.LastChecksum.Length.ShouldBe(8); // XxHash64 = 8 bytes
    }

    [Fact]
    public void MsgBlock_validates_checksum_on_read()
    {
        using var block = MsgBlock.Create(1, _dir.FullName, 1024 * 1024);
        block.Write("test", ReadOnlyMemory<byte>.Empty, "hello"u8.ToArray());
        block.Flush();

        // Read should succeed with valid checksum
        var record = block.Read(1);
        record.ShouldNotBeNull();
        record!.Subject.ShouldBe("test");
    }

    [Fact]
    public void MsgBlock_detects_corrupted_record()
    {
        using var block = MsgBlock.Create(1, _dir.FullName, 1024 * 1024);
        block.Write("test", ReadOnlyMemory<byte>.Empty, "hello"u8.ToArray());
        block.Flush();
        block.ClearCache();

        // Corrupt a byte in the block file
        var files = Directory.GetFiles(_dir.FullName, "*.blk");
        var bytes = File.ReadAllBytes(files[0]);
        bytes[^10] ^= 0xFF;
        File.WriteAllBytes(files[0], bytes);

        Should.Throw<InvalidDataException>(() => block.Read(1));
    }

    [Fact]
    public void MsgBlock_validates_checksum_flag_controls_behavior()
    {
        using var block = MsgBlock.Create(1, _dir.FullName, 1024 * 1024, validateOnRead: false);
        block.Write("test", ReadOnlyMemory<byte>.Empty, "hello"u8.ToArray());
        block.Flush();
        block.ClearCache();

        // Even with corruption, no exception when validation disabled
        var files = Directory.GetFiles(_dir.FullName, "*.blk");
        var bytes = File.ReadAllBytes(files[0]);
        bytes[^10] ^= 0xFF;
        File.WriteAllBytes(files[0], bytes);

        // May return null or corrupted data, but must not throw
        Should.NotThrow(() => block.Read(1));
    }
}

Step 2: Run tests to verify they fail

dotnet test tests/NATS.Server.Tests --filter "FullyQualifiedName~FileStoreChecksumTests" -v normal

Expected: Compilation errors (missing LastChecksum, validateOnRead parameter)

Step 3: Implement

  1. Add _lastChecksum: byte[]? field and LastChecksum property to MsgBlock.cs
  2. Add a validateOnRead: bool parameter (stored in a _validateOnRead field) to the Create and Recover factory methods
  3. Update Write to capture checksum from MessageRecord.Encode result
  4. Update Read to validate checksum when _validateOnRead is true and record is loaded from disk (not cache)
  5. Expose Checksum property on MessageRecord from the decoded trailer
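The read-path check in steps 3 to 5 can be sketched as follows. This is a minimal illustration, not the planned MsgBlock code: it assumes a hypothetical record layout in which an 8-byte XxHash64 trailer directly follows the payload, and the names ChecksumSketch, Frame, and Validate are invented for the example. It relies on the System.IO.Hashing NuGet package already listed in the tech stack.

```csharp
using System;
using System.IO;
using System.IO.Hashing;

// Hypothetical helper; the real MessageRecord trailer layout may differ.
public static class ChecksumSketch
{
    private const int ChecksumSize = 8; // XxHash64 produces 8 bytes

    // Appends an XxHash64 trailer to the payload and returns the framed record.
    public static byte[] Frame(ReadOnlySpan<byte> payload)
    {
        var framed = new byte[payload.Length + ChecksumSize];
        payload.CopyTo(framed);
        XxHash64.Hash(payload, framed.AsSpan(payload.Length));
        return framed;
    }

    // Recomputes the hash over the body and compares it with the stored trailer.
    // When validateOnRead is false the body is returned without any check.
    public static ReadOnlySpan<byte> Validate(ReadOnlySpan<byte> framed, bool validateOnRead = true)
    {
        if (framed.Length < ChecksumSize)
            throw new InvalidDataException("Record is shorter than the checksum trailer.");

        var body = framed[..^ChecksumSize];
        if (validateOnRead)
        {
            Span<byte> actual = stackalloc byte[ChecksumSize];
            XxHash64.Hash(body, actual);
            if (!actual.SequenceEqual(framed[^ChecksumSize..]))
                throw new InvalidDataException("Checksum mismatch: record is corrupted.");
        }
        return body;
    }
}
```

Throwing InvalidDataException mirrors what the failing tests above expect. Skipping the check for cache hits, as step 4 describes, would sit in the caller rather than in this helper.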

Step 4: Run tests to verify they pass

dotnet test tests/NATS.Server.Tests --filter "FullyQualifiedName~FileStoreChecksumTests" -v normal

Step 5: Commit

git add src/NATS.Server/JetStream/Storage/MsgBlock.cs src/NATS.Server/JetStream/Storage/MessageRecord.cs tests/NATS.Server.Tests/JetStream/Storage/FileStoreChecksumTests.cs
git commit -m "feat: add checksum validation on MsgBlock read path (Gap 1.5)"

Task 2: Atomic File Overwrites (Gap 1.6)

Add AtomicFileWriter helper and SemaphoreSlim write lock to FileStore for crash-safe state persistence.

Files:

  • Create: src/NATS.Server/JetStream/Storage/AtomicFileWriter.cs
  • Modify: src/NATS.Server/JetStream/Storage/FileStore.cs:1827 (WriteStreamState — use atomic writer)
  • Test: tests/NATS.Server.Tests/JetStream/Storage/AtomicFileWriterTests.cs (create)
  • Go ref: filestore.go:10599 (_writeFullState)

Step 1: Write failing tests

using NATS.Server.JetStream.Storage;

namespace NATS.Server.Tests.JetStream.Storage;

public class AtomicFileWriterTests : IDisposable
{
    private readonly DirectoryInfo _dir = Directory.CreateTempSubdirectory("atomic-");
    public void Dispose() => _dir.Delete(true);

    [Fact]
    public async Task WriteAtomicallyAsync_creates_file()
    {
        var path = Path.Combine(_dir.FullName, "state.json");
        await AtomicFileWriter.WriteAtomicallyAsync(path, "hello"u8.ToArray());
        File.Exists(path).ShouldBeTrue();
        (await File.ReadAllTextAsync(path)).ShouldBe("hello");
    }

    [Fact]
    public async Task WriteAtomicallyAsync_no_temp_file_remains()
    {
        var path = Path.Combine(_dir.FullName, "state.json");
        await AtomicFileWriter.WriteAtomicallyAsync(path, "data"u8.ToArray());
        Directory.GetFiles(_dir.FullName, "*.tmp").ShouldBeEmpty();
    }

    [Fact]
    public async Task WriteAtomicallyAsync_overwrites_existing()
    {
        var path = Path.Combine(_dir.FullName, "state.json");
        await AtomicFileWriter.WriteAtomicallyAsync(path, "old"u8.ToArray());
        await AtomicFileWriter.WriteAtomicallyAsync(path, "new"u8.ToArray());
        (await File.ReadAllTextAsync(path)).ShouldBe("new");
    }
}

Step 2: Run tests — expect compilation failure

Step 3: Implement

  1. Create AtomicFileWriter.cs with static WriteAtomicallyAsync(string path, byte[] data): write to {path}.tmp, flush to disk, File.Move(overwrite: true)
  2. Add SemaphoreSlim _stateWriteLock = new(1, 1) to FileStore
  3. Update WriteStreamState() to use _stateWriteLock and AtomicFileWriter
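A minimal sketch of the write-then-rename pattern from step 1, under the assumption that the temp file lives next to the target so the rename stays on one volume. The class name AtomicFileWriterSketch and the optional CancellationToken parameter are illustrative additions, not part of the plan.

```csharp
using System;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

// Illustrative stand-in for the planned AtomicFileWriter.
public static class AtomicFileWriterSketch
{
    public static async Task WriteAtomicallyAsync(string path, byte[] data, CancellationToken ct = default)
    {
        var tempPath = path + ".tmp";
        try
        {
            await using (var stream = new FileStream(
                tempPath, FileMode.Create, FileAccess.Write, FileShare.None,
                bufferSize: 4096, useAsync: true))
            {
                await stream.WriteAsync(data, ct);
                // Force the bytes to disk before the rename makes them visible.
                stream.Flush(flushToDisk: true);
            }

            // Replaces the destination in a single rename when it already exists.
            File.Move(tempPath, path, overwrite: true);
        }
        catch
        {
            // Best-effort cleanup; File.Delete does nothing if the temp file is absent.
            File.Delete(tempPath);
            throw;
        }
    }
}
```

Because every caller shares the same `{path}.tmp` name, two concurrent writers would collide on the temp file. That is the reason step 2 pairs the writer with a SemaphoreSlim in FileStore.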

Step 4: Run tests — expect pass

Step 5: Commit

git add src/NATS.Server/JetStream/Storage/AtomicFileWriter.cs src/NATS.Server/JetStream/Storage/FileStore.cs tests/NATS.Server.Tests/JetStream/Storage/AtomicFileWriterTests.cs
git commit -m "feat: add atomic file writer with SemaphoreSlim for crash-safe state writes (Gap 1.6)"

Task 3: Tombstone & Deletion Tracking (Gap 1.7)

Replace HashSet<ulong> _deleted with sparse SequenceSet and add secure erase support.

Files:

  • Create: src/NATS.Server/JetStream/Storage/SequenceSet.cs
  • Modify: src/NATS.Server/JetStream/Storage/MsgBlock.cs:30 (replace _deleted)
  • Modify: src/NATS.Server/JetStream/Storage/MsgBlock.cs:332 (Delete — add secure erase)
  • Modify: src/NATS.Server/JetStream/Storage/FileStore.cs (recover tombstones)
  • Test: tests/NATS.Server.Tests/JetStream/Storage/SequenceSetTests.cs (create)
  • Test: tests/NATS.Server.Tests/JetStream/Storage/FileStoreTombstoneTrackingTests.cs (create)
  • Go ref: filestore.go:5267 (removeMsg), filestore.go:5890 (eraseMsg)

Step 1: Write failing tests

SequenceSetTests.cs — tests for range-compressed sorted set: Add, Remove, Contains, Count, ranges collapse (e.g., adding 1,2,3 stores as range [1-3]).

FileStoreTombstoneTrackingTests.cs — tests for: tombstones survive MsgBlock recovery, secure erase overwrites data with random bytes, SequenceSet used instead of HashSet.

Step 2: Run tests — expect compilation failure

Step 3: Implement

  1. SequenceSet.cs: sorted list of (ulong Start, ulong End) ranges with range compression, binary search for Contains/Add/Remove
  2. Replace _deleted: HashSet<ulong> with _deleted: SequenceSet in MsgBlock.cs
  3. Add secureErase parameter to MsgBlock.Delete() — when true, overwrite payload bytes with RandomNumberGenerator.Fill
  4. Persist tombstone records using existing DeletedFlag = 0x80 in MessageRecord
  5. Recover tombstones during RebuildIndex() by checking the deleted flag
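The range-compressed set from step 1 could look roughly like the sketch below. It is one possible shape under stated assumptions: ranges are inclusive, kept sorted, and never overlap or touch. The name SequenceSetSketch and the RangeCount property are illustrative only.

```csharp
using System;
using System.Collections.Generic;

// Illustrative sparse set of sequence numbers stored as inclusive ranges.
public sealed class SequenceSetSketch
{
    // Invariant: sorted by Start, non-overlapping, non-adjacent.
    private readonly List<(ulong Start, ulong End)> _ranges = new();

    public int RangeCount => _ranges.Count;
    public ulong Count { get; private set; }

    // Binary search: index of the first range whose End >= seq, or _ranges.Count.
    private int FindIndex(ulong seq)
    {
        int lo = 0, hi = _ranges.Count;
        while (lo < hi)
        {
            int mid = lo + (hi - lo) / 2;
            if (_ranges[mid].End < seq) lo = mid + 1;
            else hi = mid;
        }
        return lo;
    }

    public bool Contains(ulong seq)
    {
        int i = FindIndex(seq);
        return i < _ranges.Count && _ranges[i].Start <= seq;
    }

    public bool Add(ulong seq)
    {
        int i = FindIndex(seq);
        if (i < _ranges.Count && _ranges[i].Start <= seq) return false; // already present

        // Here range i (if any) starts after seq and range i-1 (if any) ends before it.
        bool joinsPrev = i > 0 && _ranges[i - 1].End + 1 == seq;
        bool joinsNext = i < _ranges.Count && seq + 1 == _ranges[i].Start;

        if (joinsPrev && joinsNext)
        {
            // seq bridges two ranges: merge them into one.
            _ranges[i - 1] = (_ranges[i - 1].Start, _ranges[i].End);
            _ranges.RemoveAt(i);
        }
        else if (joinsPrev) _ranges[i - 1] = (_ranges[i - 1].Start, seq);
        else if (joinsNext) _ranges[i] = (seq, _ranges[i].End);
        else _ranges.Insert(i, (seq, seq));

        Count++;
        return true;
    }

    public bool Remove(ulong seq)
    {
        int i = FindIndex(seq);
        if (i == _ranges.Count || _ranges[i].Start > seq) return false; // not present

        var (start, end) = _ranges[i];
        if (start == end) _ranges.RemoveAt(i);
        else if (seq == start) _ranges[i] = (start + 1, end);
        else if (seq == end) _ranges[i] = (start, end - 1);
        else
        {
            // Removing from the middle splits the range in two.
            _ranges[i] = (start, seq - 1);
            _ranges.Insert(i + 1, (seq + 1, end));
        }

        Count--;
        return true;
    }
}
```

Adding 1, 2, 3 leaves a single range [1-3], matching the collapse behavior the SequenceSetTests are meant to cover. A run of consecutive deletions costs one tuple instead of one HashSet entry per sequence.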

Step 4: Run tests — expect pass

Step 5: Commit

git add src/NATS.Server/JetStream/Storage/SequenceSet.cs src/NATS.Server/JetStream/Storage/MsgBlock.cs src/NATS.Server/JetStream/Storage/FileStore.cs tests/NATS.Server.Tests/JetStream/Storage/SequenceSetTests.cs tests/NATS.Server.Tests/JetStream/Storage/FileStoreTombstoneTrackingTests.cs
git commit -m "feat: add SequenceSet for sparse deletion tracking with secure erase (Gap 1.7)"

Task 4: Multi-Block Write Cache (Gap 1.8)

Add WriteCacheManager to FileStore with bounded strong-reference cache, TTL eviction, and background flush.

Files:

  • Modify: src/NATS.Server/JetStream/Storage/FileStore.cs (add WriteCacheManager inner class)
  • Modify: src/NATS.Server/JetStream/Storage/MsgBlock.cs (integrate cache manager)
  • Test: tests/NATS.Server.Tests/JetStream/Storage/WriteCacheTests.cs (create)
  • Go ref: filestore.go:4443 (setupWriteCache), filestore.go:6148 (expireCache)

Step 1: Write failing tests

Tests for: cache entries evicted after TTL (2s default), cache bounded by size (64MB default), FlushPendingMsgsAsync flushes all cached entries, PeriodicTimer background worker.

Step 2-5: Implement, test, commit

Implement WriteCacheManager as inner class with PeriodicTimer (500ms tick), bounded Dictionary<int, CacheEntry> keyed by block ID, size tracking, TTL eviction. Integrate into FileStore.StoreMsg and FileStore.RotateBlock.

git commit -m "feat: add bounded write cache with TTL eviction and background flush (Gap 1.8)"

Task 5: Query/Filter Operations (Gap 1.10)

Optimize FilteredState, LoadMsg, and NumFiltered with block-aware binary search.

Files:

  • Modify: src/NATS.Server/JetStream/Storage/FileStore.cs:471 (FilteredState — use block index ranges)
  • Modify: src/NATS.Server/JetStream/Storage/FileStore.cs:1459 (LoadMsg — block-aware binary search)
  • Test: tests/NATS.Server.Tests/JetStream/Storage/FileStoreFilterQueryTests.cs (create)
  • Go ref: filestore.go:3191 (FilteredState), filestore.go:8308 (LoadMsg)

Step 1-5: TDD cycle

Tests for: FilteredState with wildcard subjects, LoadMsg with block boundary crossing, CheckSkipFirstBlock optimization for range queries, NumFiltered with caching. Use SubjectMatch.IsMatch() for token-based filter matching.

git commit -m "feat: optimize FilteredState and LoadMsg with block-aware search (Gap 1.10)"

Task 6: RAFT InstallSnapshot Streaming (Gap 8.3)

Add chunk-based snapshot streaming with CRC32 validation.

Files:

  • Create: src/NATS.Server/Raft/SnapshotChunkEnumerator.cs
  • Modify: src/NATS.Server/Raft/RaftNode.cs:414 (InstallSnapshotFromChunksAsync — add CRC validation)
  • Modify: src/NATS.Server/Raft/RaftWireFormat.cs (add RaftInstallSnapshotChunkWire)
  • Test: tests/NATS.Server.Tests/Raft/RaftSnapshotStreamingTests.cs (create)
  • Go ref: raft.go snapshot install with chunks

Step 1: Write failing tests

[Fact]
public void SnapshotChunkEnumerator_yields_fixed_size_chunks()
{
    var data = new byte[200_000]; // 3 full 64 KiB chunks plus a 3,392-byte remainder
    Random.Shared.NextBytes(data);
    var chunks = new SnapshotChunkEnumerator(data, chunkSize: 65536).ToList();
    chunks.Count.ShouldBe(4); // 64K + 64K + 64K + remainder
    chunks.Sum(c => c.Length).ShouldBe(200_000);
}

[Fact]
public async Task InstallSnapshot_validates_crc32_over_assembled_content()
{
    var node = new RaftNode("n1");
    var data = "snapshot-data"u8.ToArray();
    var chunks = new SnapshotChunkEnumerator(data, 8).ToList();

    // Corrupt one chunk
    chunks[0][0] ^= 0xFF;

    await Should.ThrowAsync<InvalidDataException>(
        () => node.InstallSnapshotFromChunksAsync(chunks, 1, 1, default));
}

Step 2-5: Implement, test, commit
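A rough sketch of the chunking and CRC32 check, assuming the System.IO.Hashing package. Detecting a corrupted chunk requires an expected CRC32 computed by the sender before transfer. How that value reaches InstallSnapshotFromChunksAsync is not specified in this plan, so the sketch takes it as an explicit parameter. The type names and the Assemble helper are hypothetical.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.IO;
using System.IO.Hashing;

// Illustrative chunker: yields copies of the snapshot in fixed-size pieces.
public sealed class SnapshotChunkEnumeratorSketch : IEnumerable<byte[]>
{
    private readonly byte[] _data;
    private readonly int _chunkSize;

    public SnapshotChunkEnumeratorSketch(byte[] data, int chunkSize = 65536)
    {
        ArgumentOutOfRangeException.ThrowIfNegativeOrZero(chunkSize);
        _data = data;
        _chunkSize = chunkSize;
    }

    public IEnumerator<byte[]> GetEnumerator()
    {
        for (int offset = 0; offset < _data.Length; offset += _chunkSize)
        {
            int length = Math.Min(_chunkSize, _data.Length - offset);
            yield return _data[offset..(offset + length)]; // the range operator copies
        }
    }

    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}

public static class SnapshotAssemblySketch
{
    // Reassembles chunks and verifies a CRC32 over the full content.
    // expectedCrc32 is assumed to be supplied by the sender (hypothetical parameter).
    public static byte[] Assemble(IEnumerable<byte[]> chunks, uint expectedCrc32)
    {
        var crc = new Crc32();
        using var buffer = new MemoryStream();
        foreach (var chunk in chunks)
        {
            crc.Append(chunk);
            buffer.Write(chunk, 0, chunk.Length);
        }

        if (crc.GetCurrentHashAsUInt32() != expectedCrc32)
            throw new InvalidDataException("Snapshot CRC32 mismatch.");

        return buffer.ToArray();
    }
}
```

The CRC is accumulated incrementally, so the check does not need a second pass over the assembled buffer.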

git commit -m "feat: add chunk-based snapshot streaming with CRC32 validation (Gap 8.3)"

Task 7: RAFT Leadership Transfer (Gap 8.4)

Add TransferLeadership with TimeoutNowRpc message type.

Files:

  • Modify: src/NATS.Server/Raft/RaftNode.cs (add TransferLeadershipAsync)
  • Modify: src/NATS.Server/Raft/RaftWireFormat.cs (add TimeoutNowRpc wire type)
  • Test: tests/NATS.Server.Tests/Raft/RaftLeadershipTransferTests.cs (create)
  • Go ref: raft.go leadership transfer

Step 1-5: TDD cycle

Tests for: target receives TimeoutNow and starts election immediately, leader stops accepting proposals during transfer, transfer times out after 2x election timeout if not completed. Implement TransferLeadershipAsync(string targetId) on RaftNode.

git commit -m "feat: add leadership transfer via TimeoutNow RPC (Gap 8.4)"

Task 8: RAFT Log Compaction Policies (Gap 8.5)

Add CompactionPolicy enum and configurable compaction thresholds.

Files:

  • Create: src/NATS.Server/Raft/CompactionPolicy.cs
  • Modify: src/NATS.Server/Raft/RaftNode.cs:400 (CompactLogAsync — use policy)
  • Test: tests/NATS.Server.Tests/Raft/RaftCompactionPolicyTests.cs (create)

Step 1-5: TDD cycle

Tests for: ByCount policy compacts when log exceeds N entries, BySize compacts when total size exceeds threshold, ByAge compacts entries older than duration. Implement CompactionPolicy enum with ByCount, BySize, ByAge variants and RaftOptions.CompactionPolicy/thresholds.

git commit -m "feat: add configurable log compaction policies (Gap 8.5)"

Task 9: RAFT Quorum Check Before Proposing (Gap 8.6)

Add HasQuorum() check using peer heartbeat timestamps.

Files:

  • Modify: src/NATS.Server/Raft/RaftNode.cs (add HasQuorum, update ProposeAsync)
  • Modify: src/NATS.Server/Raft/RaftPeerState.cs (ensure LastContact tracked)
  • Test: tests/NATS.Server.Tests/Raft/RaftQuorumCheckTests.cs (create)

Step 1-5: TDD cycle

Tests for: HasQuorum returns false when majority of peers have stale heartbeats, ProposeAsync returns ProposalResult.NoQuorum when check fails, heartbeat responses update LastContact.
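The quorum rule can be illustrated with a small sketch. It assumes the leader counts itself as one live member and that a peer is live when its last contact is no older than a staleness window. QuorumSketch, the parameter names, and the choice of window are illustrative, not taken from RaftNode.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class QuorumSketch
{
    // peerLastContactUtc holds one timestamp per remote peer (the local node is excluded).
    public static bool HasQuorum(
        IReadOnlyCollection<DateTime> peerLastContactUtc,
        DateTime nowUtc,
        TimeSpan staleAfter)
    {
        int clusterSize = peerLastContactUtc.Count + 1;   // peers plus self
        int required = clusterSize / 2 + 1;               // strict majority
        int live = 1 + peerLastContactUtc.Count(t => nowUtc - t <= staleAfter);
        return live >= required;
    }
}
```

In a three-node cluster the leader needs one fresh peer: with one fresh and one stale heartbeat the check passes, and with both stale it fails.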

git commit -m "feat: add quorum check before proposing entries (Gap 8.6)"

Task 10: RAFT ReadIndex Optimization (Gap 8.7)

Add ReadIndex() for linearizable reads without log growth.

Files:

  • Modify: src/NATS.Server/Raft/RaftNode.cs (add ReadIndexAsync)
  • Test: tests/NATS.Server.Tests/Raft/RaftReadIndexTests.cs (create)

Step 1-5: TDD cycle

Tests for: ReadIndexAsync returns current commit index after quorum heartbeat round, deposed leader's ReadIndexAsync fails (no quorum), follower can serve read when appliedIndex >= readIndex.

git commit -m "feat: add ReadIndex for linearizable reads via quorum confirmation (Gap 8.7)"

Task 11: RAFT Election Timeout Jitter (Gap 8.8)

Add RandomizedElectionTimeout() using TotalMilliseconds.

Files:

  • Modify: src/NATS.Server/Raft/RaftNode.cs:500 (ResetElectionTimeout — add jitter)
  • Test: tests/NATS.Server.Tests/Raft/RaftElectionJitterTests.cs (create)

Step 1-5: TDD cycle

Tests for: randomized timeout is within [base, 2*base) range, uses TotalMilliseconds (not Milliseconds), different nodes get different timeouts.
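A minimal sketch of the jitter calculation, assuming uniform jitter over one extra base interval. The class name and the optional Random parameter (useful for seeding in tests) are illustrative.

```csharp
using System;

public static class ElectionJitterSketch
{
    // Returns a timeout drawn from [base, 2 * base).
    public static TimeSpan RandomizedElectionTimeout(TimeSpan baseTimeout, Random? random = null)
    {
        random ??= Random.Shared;

        // TotalMilliseconds is the whole duration. Milliseconds is only the
        // 0-999 component, so 1.5 s would read as 500 instead of 1500.
        double baseMs = baseTimeout.TotalMilliseconds;
        double jitterMs = random.NextDouble() * baseMs; // NextDouble() is in [0, 1)

        return TimeSpan.FromMilliseconds(baseMs + jitterMs);
    }
}
```

The TotalMilliseconds point matters for any base timeout of one second or more, where the Milliseconds property silently drops the whole-second part.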

git commit -m "feat: add randomized election timeout jitter (Gap 8.8)"

Phase 1 Exit Gate

dotnet test tests/NATS.Server.Tests --filter "FullyQualifiedName~FileStoreChecksum|FullyQualifiedName~AtomicFileWriter|FullyQualifiedName~SequenceSet|FullyQualifiedName~Tombstone|FullyQualifiedName~WriteCache|FullyQualifiedName~FilterQuery|FullyQualifiedName~SnapshotStreaming|FullyQualifiedName~LeadershipTransfer|FullyQualifiedName~CompactionPolicy|FullyQualifiedName~QuorumCheck|FullyQualifiedName~ReadIndex|FullyQualifiedName~ElectionJitter" -v normal

Update test parity DB for all Phase 1 tests.


Phase 2: JetStream Cluster Coordination + API (13 gaps)

Dependencies: Phase 1 (RAFT completion)

Exit gate: Peers added/removed, RAFT entries applied to state machine, assignments encoded/decoded with golden fixtures, API requests forwarded to leader, rate limiting active, advisory events published


Task 12: Peer Management & Stream Moves (Gap 2.4)

Add ProcessAddPeer/ProcessRemovePeer to JetStreamMetaGroup for peer-driven stream reassignment.

Files:

  • Modify: src/NATS.Server/JetStream/Cluster/JetStreamMetaGroup.cs
  • Modify: src/NATS.Server/JetStream/Cluster/StreamReplicaGroup.cs
  • Test: tests/NATS.Server.Tests/JetStream/Cluster/PeerManagementTests.cs (create)
  • Go ref: jetstream_cluster.go:2290-2439

Step 1-5: TDD cycle

Tests for: ProcessAddPeer triggers re-replication of under-replicated streams, ProcessRemovePeer triggers reassignment away from removed peer, RemovePeerFromStream removes specific peer from replica group.

git commit -m "feat: add peer management with stream reassignment (Gap 2.4)"

Task 13: Entry Application Pipeline (Gap 2.7)

Add ApplyMetaEntries and ApplyStreamEntries dispatchers.

Files:

  • Modify: src/NATS.Server/JetStream/Cluster/JetStreamMetaGroup.cs:579 (extend ApplyEntry)
  • Modify: src/NATS.Server/JetStream/Cluster/StreamReplicaGroup.cs:193 (extend ApplyCommittedEntriesAsync)
  • Test: tests/NATS.Server.Tests/JetStream/Cluster/EntryApplicationTests.cs (create)
  • Go ref: jetstream_cluster.go:2474-4261

Step 1-5: TDD cycle

Tests for: meta entry dispatch (StreamCreate, StreamUpdate, StreamDelete, ConsumerCreate, ConsumerDelete, PeerAdd, PeerRemove), stream entry dispatch (store, remove, purge), consumer entry dispatch (ack, nak, deliver).

git commit -m "feat: add entry application pipeline for meta and stream RAFT groups (Gap 2.7)"

Task 14: Topology-Aware Placement (Gap 2.8)

Extend PlacementEngine.SelectPeerGroup with tag enforcement, HA limits, and weighted selection.

Files:

  • Modify: src/NATS.Server/JetStream/Cluster/PlacementEngine.cs:14
  • Modify: src/NATS.Server/JetStream/Cluster/PlacementEngine.cs:62 (PeerInfo — add tags, storage)
  • Test: tests/NATS.Server.Tests/JetStream/Cluster/TopologyPlacementTests.cs (create)
  • Go ref: jetstream_cluster.go:7524-7618 (selectPeerGroup)

Step 1-5: TDD cycle

Tests for: JetStreamUniqueTag enforcement (no two replicas on same-tagged node), HA asset limits per peer, tag include/exclude with prefix matching, weighted selection by available resources.

git commit -m "feat: add topology-aware placement with tag enforcement (Gap 2.8)"

Task 15: RAFT Group Creation & Lifecycle (Gap 2.9)

Flesh out RaftGroup with factory method and member helpers.

Files:

  • Modify: src/NATS.Server/JetStream/Cluster/ClusterAssignmentTypes.cs:9 (RaftGroup)
  • Test: tests/NATS.Server.Tests/JetStream/Cluster/RaftGroupLifecycleTests.cs (create)

Step 1-5: TDD cycle

Tests for: IsMember(peerId), SetPreferred(peerId), CreateRaftGroup factory uses PlacementEngine.

git commit -m "feat: add RaftGroup lifecycle methods (Gap 2.9)"

Task 16: Assignment Encoding/Decoding (Gap 2.10)

Add AssignmentCodec with binary serialization matching Go wire format, golden fixture tests.

Files:

  • Create: src/NATS.Server/JetStream/Cluster/AssignmentCodec.cs
  • Test: tests/NATS.Server.Tests/JetStream/Cluster/AssignmentCodecTests.cs (create)
  • Go ref: jetstream_cluster.go encode/decode functions

Step 1-5: TDD cycle

Tests for: encode/decode round-trip for stream assignments, consumer assignments, S2 compression for configs > 1KB, golden fixture tests with captured Go output bytes for format compatibility.

git commit -m "feat: add binary assignment codec with golden fixture tests (Gap 2.10)"

Task 17: Unsupported Asset Handling (Gap 2.11)

Add graceful handling for version-incompatible stream/consumer assignments.

Files:

  • Modify: src/NATS.Server/JetStream/Cluster/JetStreamMetaGroup.cs
  • Test: tests/NATS.Server.Tests/JetStream/Cluster/UnsupportedAssetTests.cs (create)

Step 1-5: TDD cycle

Tests for: unknown version assignment logged as warning and skipped, does not crash cluster.

git commit -m "feat: add unsupported asset handling for mixed-version clusters (Gap 2.11)"

Task 18: Clustered API Handlers (Gap 2.12)

Add JsClusteredStreamRequest and JsClusteredStreamUpdateRequest.

Files:

  • Modify: src/NATS.Server/JetStream/Api/Handlers/StreamApiHandlers.cs
  • Modify: src/NATS.Server/JetStream/Api/Handlers/ConsumerApiHandlers.cs
  • Test: tests/NATS.Server.Tests/JetStream/Api/ClusteredApiTests.cs (create)
  • Go ref: jetstream_cluster.go:7620-8265

Step 1-5: TDD cycle

Tests for: stream create proposes to meta RAFT, stream update proposes with validation, system subscriptions for result processing.

git commit -m "feat: add clustered stream/consumer API handlers (Gap 2.12)"

Task 19: Leader Forwarding (Gap 7.1)

Implement ForwardToLeader middleware in JetStreamApiRouter.

Files:

  • Modify: src/NATS.Server/JetStream/Api/JetStreamApiRouter.cs:97 (ForwardToLeader — implement)
  • Test: tests/NATS.Server.Tests/JetStream/Api/LeaderForwardingTests.cs (create)

Step 1-5: TDD cycle

Tests for: request forwarded when not meta leader, forwarded response returned to client, timeout after 5 seconds.

git commit -m "feat: implement leader forwarding for JetStream API (Gap 7.1)"

Task 20: Clustered API Request Handlers (Gap 7.2)

Add cluster-aware create/update/delete that propose to RAFT.

Files:

  • Modify: src/NATS.Server/JetStream/Api/Handlers/StreamApiHandlers.cs
  • Modify: src/NATS.Server/JetStream/Api/Handlers/ConsumerApiHandlers.cs
  • Test: tests/NATS.Server.Tests/JetStream/Api/ClusteredRequestTests.cs (create)
  • Go ref: jetstream_cluster.go:7620-7701

Step 1-5: TDD cycle

Tests for: cluster-aware create proposes to RAFT, waits for proposal result, returns error on proposal failure.

git commit -m "feat: add cluster-aware API request handlers (Gap 7.2)"

Task 21: API Rate Limiting & Deduplication (Gap 7.3)

Add ApiRateLimiter with SemaphoreSlim-based concurrency limiting and request deduplication.

Files:

  • Create: src/NATS.Server/JetStream/Api/ApiRateLimiter.cs
  • Modify: src/NATS.Server/JetStream/Api/JetStreamApiRouter.cs (integrate limiter)
  • Test: tests/NATS.Server.Tests/JetStream/Api/ApiRateLimiterTests.cs (create)

Step 1-5: TDD cycle

Tests for: concurrent requests capped at max (default 256), duplicate Nats-Request-Id returns cached response, TTL expiration of dedup entries (5 seconds).
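One way the limiter could be shaped, as a sketch only. It assumes saturated requests are rejected immediately (signalled here by a null result) rather than queued, and that responses are cached by request ID for the TTL. ApiRateLimiterSketch and ExecuteAsync are hypothetical names; the plan does not define the real API surface.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

public sealed class ApiRateLimiterSketch : IDisposable
{
    private readonly SemaphoreSlim _slots;
    private readonly TimeSpan _dedupTtl;
    private readonly ConcurrentDictionary<string, (byte[] Response, DateTime ExpiresUtc)> _dedup = new();

    public ApiRateLimiterSketch(int maxConcurrent = 256, TimeSpan? dedupTtl = null)
    {
        _slots = new SemaphoreSlim(maxConcurrent, maxConcurrent);
        _dedupTtl = dedupTtl ?? TimeSpan.FromSeconds(5);
    }

    // Returns the handler's response, a cached response for a repeated
    // request ID, or null when the concurrency cap is reached.
    public async Task<byte[]?> ExecuteAsync(string? requestId, Func<Task<byte[]>> handler)
    {
        if (requestId is not null
            && _dedup.TryGetValue(requestId, out var cached)
            && cached.ExpiresUtc > DateTime.UtcNow)
        {
            return cached.Response;
        }

        // Zero timeout: fail fast instead of waiting for a free slot.
        if (!await _slots.WaitAsync(TimeSpan.Zero))
            return null;

        try
        {
            var response = await handler();
            if (requestId is not null)
                _dedup[requestId] = (response, DateTime.UtcNow + _dedupTtl);
            return response;
        }
        finally
        {
            _slots.Release();
        }
    }

    public void Dispose() => _slots.Dispose();
}
```

Expired entries are ignored on lookup but never removed in this sketch, so a real implementation would also need periodic pruning. Two simultaneous first requests with the same ID would both run the handler here.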

git commit -m "feat: add API rate limiting and request deduplication (Gap 7.3)"

Task 22: Snapshot & Restore API Stub (Gap 7.4)

Wire $JS.API.STREAM.SNAPSHOT and $JS.API.STREAM.RESTORE endpoints.

Files:

  • Modify: src/NATS.Server/JetStream/Api/Handlers/StreamApiHandlers.cs:157 (complete HandleSnapshot)
  • Modify: src/NATS.Server/JetStream/Api/Handlers/StreamApiHandlers.cs:176 (complete HandleRestore)
  • Test: tests/NATS.Server.Tests/JetStream/Api/SnapshotApiTests.cs (create)

Note: Full snapshot behavior completed in Phase 3 Task 35 (Gap 4.7).

Step 1-5: TDD cycle

Tests for: endpoint responds to correct subject, validates stream exists, calls StreamSnapshotService.

git commit -m "feat: wire snapshot/restore API endpoints (Gap 7.4 stub)"

Task 23: Consumer Pause/Resume API (Gap 7.5)

Wire $JS.API.CONSUMER.PAUSE endpoint to existing ConsumerManager.Pause.

Files:

  • Modify: src/NATS.Server/JetStream/Api/Handlers/ConsumerApiHandlers.cs:79
  • Test: tests/NATS.Server.Tests/JetStream/Api/ConsumerPauseApiTests.cs (create)

Step 1-5: TDD cycle

Tests for: pause endpoint calls ConsumerManager.Pause with pauseUntil, returns pause state.

git commit -m "feat: wire consumer pause/resume API endpoint (Gap 7.5)"

Task 24: Advisory Event Publication (Gap 7.6)

Add PublishAdvisory calls in API handlers for stream/consumer lifecycle events.

Files:

  • Modify: src/NATS.Server/JetStream/Api/Handlers/StreamApiHandlers.cs
  • Modify: src/NATS.Server/JetStream/Api/Handlers/ConsumerApiHandlers.cs
  • Modify: src/NATS.Server/Events/InternalEventSystem.cs (add PublishAdvisoryAsync)
  • Modify: src/NATS.Server/Events/EventSubjects.cs (add $JS.EVENT.ADVISORY.* subjects)
  • Test: tests/NATS.Server.Tests/JetStream/Api/AdvisoryEventTests.cs (create)

Step 1-5: TDD cycle

Tests for: stream create publishes advisory, consumer delete publishes advisory, advisory includes correct event type and payload.

git commit -m "feat: add advisory event publication for API operations (Gap 7.6)"

Phase 2 Exit Gate

# Targeted tests for Phase 2
dotnet test tests/NATS.Server.Tests --filter "FullyQualifiedName~PeerManagement|FullyQualifiedName~EntryApplication|FullyQualifiedName~TopologyPlacement|FullyQualifiedName~RaftGroupLifecycle|FullyQualifiedName~AssignmentCodec|FullyQualifiedName~UnsupportedAsset|FullyQualifiedName~ClusteredApi|FullyQualifiedName~LeaderForwarding|FullyQualifiedName~ClusteredRequest|FullyQualifiedName~ApiRateLimiter|FullyQualifiedName~SnapshotApi|FullyQualifiedName~ConsumerPauseApi|FullyQualifiedName~AdvisoryEvent" -v normal

# FULL TEST SUITE CHECKPOINT (Phase 2 complete)
dotnet test

Update test parity DB for all Phase 2 tests.


Phase 3: Consumer Engines + Stream Lifecycle (13 gaps)

Dependencies: Phase 2 (cluster coordination)

Exit gate: Consumer delivery loop dispatches messages, heartbeats sent, interest tracked, max deliveries enforced, filter skipping works, rate limiting via token bucket, source consumers configured, snapshot/restore operational


Task 25: Core Message Delivery Loop (Gap 3.1 — CRITICAL)

Implement LoopAndGatherMsgs in PushConsumerEngine.

Files:

  • Modify: src/NATS.Server/JetStream/Consumers/PushConsumerEngine.cs
  • Test: tests/NATS.Server.Tests/JetStream/Consumers/DeliveryLoopTests.cs (create)
  • Go ref: consumer.go:1400-1700 (loopAndGatherMsgs)

Step 1: Write failing tests

Tests for: delivery loop polls store for new messages, checks redelivery tracker for expired entries, calculates num_pending from store state, dispatches messages via client write path, uses Channel<ConsumerSignal> for wake-up.

Step 2-5: Implement, test, commit

Add ConsumerSignal enum (NewMessage, AckEvent, ConfigChange). Add Channel<ConsumerSignal> _signalChannel to PushConsumerEngine. Implement background LoopAndGatherMsgs task that: polls IStreamStore.LoadNextMsg, checks RedeliveryTracker.GetDue(), dispatches via SendMessage.

git commit -m "feat: implement core message delivery loop for push consumers (Gap 3.1)"

Task 26: Idle Heartbeat & Flow Control (Gap 3.5)

Add SendIdleHeartbeat and SendFlowControl with pending count headers.

Files:

  • Modify: src/NATS.Server/JetStream/Consumers/PushConsumerEngine.cs:220 (extend heartbeat)
  • Test: tests/NATS.Server.Tests/JetStream/Consumers/IdleHeartbeatTests.cs (create)
  • Go ref: consumer.go:5222 (sendIdleHeartbeat), consumer.go:5495 (sendFlowControl)

Step 1-5: TDD cycle

Tests for: heartbeat sent with Nats-Pending-Messages/Nats-Pending-Bytes headers when no delivery within interval, flow control reply with stall detection.

git commit -m "feat: add idle heartbeat with pending count headers and flow control (Gap 3.5)"

Task 27: Delivery Interest Tracking (Gap 3.8)

Add DeliveryInterestTracker monitoring subscribe/unsubscribe events.

Files:

  • Create: src/NATS.Server/JetStream/Consumers/DeliveryInterestTracker.cs
  • Modify: src/NATS.Server/JetStream/Consumers/PushConsumerEngine.cs (integrate tracker)
  • Test: tests/NATS.Server.Tests/JetStream/Consumers/DeliveryInterestTests.cs (create)

Step 1-5: TDD cycle

Tests for: HasInterest reflects subscription state, DeleteNotActive cleanup after timeout, gateway interest checking.

git commit -m "feat: add delivery interest tracking with auto-cleanup (Gap 3.8)"

Task 28: Max Deliveries Enforcement (Gap 3.9)

Add advisory generation and delivery exceeded policy.

Files:

  • Modify: src/NATS.Server/JetStream/Consumers/AckProcessor.cs
  • Modify: src/NATS.Server/Events/InternalEventSystem.cs (add NotifyDeliveryExceeded event type)
  • Test: tests/NATS.Server.Tests/JetStream/Consumers/MaxDeliveriesTests.cs (create)

Step 1-5: TDD cycle

Tests for: advisory generated when delivery count exceeds MaxDeliver, DeliveryExceededPolicy enum (Drop, DeadLetter).

git commit -m "feat: add max delivery enforcement with advisory generation (Gap 3.9)"

Task 29: Filter Subject Skip Tracking (Gap 3.10)

Add FilterSkipTracker using SubjectMatch.IsMatch() for token-based filter matching.

Files:

  • Create: src/NATS.Server/JetStream/Consumers/FilterSkipTracker.cs
  • Test: tests/NATS.Server.Tests/JetStream/Consumers/FilterSkipTests.cs (create)

Step 1-5: TDD cycle

Tests for: filter matching uses SubjectMatch.IsMatch() (NOT Regex), SortedSet<ulong> tracks unmatched sequences, ShouldSkip returns true when a message's subject does not match the filter.
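For illustration, token-based matching and skip tracking might look like the sketch below. The plan calls the existing SubjectMatch.IsMatch(), whose signature is not shown here, so SubjectFilterSketch is a stand-in that applies the standard NATS wildcard rules: `*` matches exactly one token and `>` matches one or more trailing tokens. FilterSkipTrackerSketch and its members are hypothetical.

```csharp
using System;
using System.Collections.Generic;

// Stand-in for SubjectMatch.IsMatch(); not the server's implementation.
public static class SubjectFilterSketch
{
    public static bool IsMatch(string subject, string filter)
    {
        var s = subject.Split('.');
        var f = filter.Split('.');

        for (int i = 0; i < f.Length; i++)
        {
            // '>' must be the last filter token and needs at least one subject token.
            if (f[i] == ">") return i == f.Length - 1 && s.Length > i;
            if (i >= s.Length) return false;
            if (f[i] != "*" && f[i] != s[i]) return false;
        }

        return s.Length == f.Length;
    }
}

public sealed class FilterSkipTrackerSketch
{
    private readonly string _filter;
    private readonly SortedSet<ulong> _skipped = new();

    public FilterSkipTrackerSketch(string filter) => _filter = filter;

    public IReadOnlyCollection<ulong> Skipped => _skipped;

    // True when the message does not match the filter and should be skipped.
    public bool ShouldSkip(ulong sequence, string subject)
    {
        if (SubjectFilterSketch.IsMatch(subject, _filter)) return false;
        _skipped.Add(sequence);
        return true;
    }
}
```

Splitting on `.` and comparing tokens avoids the escaping and backtracking concerns of building a Regex from a subject filter.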

git commit -m "feat: add filter skip tracking using SubjectMatch (Gap 3.10)"

Task 30: Sample/Observe Mode (Gap 3.11)

Add sample frequency parsing and stochastic latency sampling.

Files:

  • Modify: src/NATS.Server/JetStream/Consumers/PushConsumerEngine.cs
  • Test: tests/NATS.Server.Tests/JetStream/Consumers/SampleModeTests.cs (create)

Step 1-5: TDD cycle

Tests for: "1%" → 0.01 parsing, ShouldSample() uses Random.Shared, latency measurement and advisory.
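The parsing and sampling rules can be sketched as follows. Treating invalid or out-of-range input as "never sample" and accepting fractional percentages are assumptions made for the example. The class and method names are illustrative.

```csharp
using System;
using System.Globalization;

public static class SampleFrequencySketch
{
    // Parses "1%", "1", or "12.5%" into a probability in (0, 1].
    // Returns 0 (never sample) for empty, invalid, or out-of-range input.
    public static double Parse(string? value)
    {
        if (string.IsNullOrWhiteSpace(value)) return 0;

        var text = value.Trim().TrimEnd('%');
        if (!double.TryParse(text, NumberStyles.Float, CultureInfo.InvariantCulture, out var percent))
            return 0;

        // Written as a positive check so NaN is rejected as well.
        if (!(percent > 0 && percent <= 100)) return 0;

        return percent / 100.0;
    }

    // Stochastic decision: true with the given probability.
    public static bool ShouldSample(double probability, Random? random = null)
        => probability > 0 && (random ?? Random.Shared).NextDouble() < probability;
}
```

Parsing with CultureInfo.InvariantCulture keeps "12.5%" from being misread on hosts whose culture uses a comma as the decimal separator.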

git commit -m "feat: add sample/observe mode with latency measurement (Gap 3.11)"

Task 31: Reset to Sequence (Gap 3.12)

Add ProcessResetRequest to ConsumerManager.

Files:

  • Modify: src/NATS.Server/JetStream/ConsumerManager.cs:218 (extend Reset)
  • Test: tests/NATS.Server.Tests/JetStream/Consumers/ConsumerResetTests.cs (create)
  • Go ref: consumer.go:4241 (processResetReq)

Step 1-5: TDD cycle

Tests for: reset to specific sequence updates NextSequence, clears pending acks, clears redelivery tracker, publishes advisory.

git commit -m "feat: add consumer reset to specific sequence (Gap 3.12)"

Task 32: Token Bucket Rate Limiting (Gap 3.13)

Add TokenBucketRateLimiter for accurate rate limiting.

Files:

  • Create: src/NATS.Server/JetStream/Consumers/TokenBucketRateLimiter.cs
  • Modify: src/NATS.Server/JetStream/Consumers/PushConsumerEngine.cs (integrate)
  • Test: tests/NATS.Server.Tests/JetStream/Consumers/TokenBucketTests.cs (create)

Step 1-5: TDD cycle

Tests for: configurable rate (bytes/sec) and burst size, WaitForTokenAsync blocks until tokens available, dynamic rate updates.

git commit -m "feat: add token bucket rate limiter for consumers (Gap 3.13)"
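The token bucket idea can be sketched as below. This is not the planned `TokenBucketRateLimiter`: `TokenBucketSketch` is a hypothetical class, time is passed in explicitly so tests stay deterministic, and instead of blocking it returns how long a caller would have to wait (an async `WaitForTokenAsync` could await that delay and retry).

```csharp
using System;

// Hypothetical sketch; rate is in bytes/sec and burst is the bucket capacity in bytes.
public sealed class TokenBucketSketch
{
    private readonly object _gate = new();
    private readonly double _burst;
    private double _ratePerSecond;
    private double _tokens;
    private TimeSpan _last;

    public TokenBucketSketch(double ratePerSecond, double burst, TimeSpan now)
    {
        if (ratePerSecond <= 0) throw new ArgumentOutOfRangeException(nameof(ratePerSecond));
        _ratePerSecond = ratePerSecond;
        _burst = burst;
        _tokens = burst; // start full
        _last = now;
    }

    // Dynamic rate update: credit tokens earned at the old rate first.
    public void UpdateRate(double ratePerSecond, TimeSpan now)
    {
        if (ratePerSecond <= 0) throw new ArgumentOutOfRangeException(nameof(ratePerSecond));
        lock (_gate) { Refill(now); _ratePerSecond = ratePerSecond; }
    }

    // Returns TimeSpan.Zero when the tokens were taken, otherwise the delay
    // after which the request could succeed. Requests larger than the burst
    // size can never succeed in this sketch.
    public TimeSpan TryConsume(double bytes, TimeSpan now)
    {
        lock (_gate)
        {
            Refill(now);
            if (_tokens >= bytes) { _tokens -= bytes; return TimeSpan.Zero; }
            return TimeSpan.FromSeconds((bytes - _tokens) / _ratePerSecond);
        }
    }

    private void Refill(TimeSpan now)
    {
        double elapsed = (now - _last).TotalSeconds;
        if (elapsed <= 0) return;
        _tokens = Math.Min(_burst, _tokens + elapsed * _ratePerSecond);
        _last = now;
    }
}
```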

Task 33: Cluster-Aware Pending Requests (Gap 3.14)

Add ProposeWaitingRequest for pull requests through consumer RAFT group.

Files:

  • Modify: src/NATS.Server/JetStream/Consumers/PullConsumerEngine.cs
  • Test: tests/NATS.Server.Tests/JetStream/Consumers/ClusterPendingRequestTests.cs (create)

Step 1-5: TDD cycle

Tests for: pull requests proposed through consumer RAFT group, cluster-wide pending tracking.

git commit -m "feat: add cluster-aware pending request tracking for pull consumers (Gap 3.14)"

Task 34: Source Consumer Setup (Gap 4.3)

Complete API request generation for source consumers.

Files:

  • Modify: src/NATS.Server/JetStream/Streams/SourceCoordinator.cs
  • Test: tests/NATS.Server.Tests/JetStream/Streams/SourceConsumerSetupTests.cs (create)

Step 1-5: TDD cycle

Tests for: consumer create request with FilterSubject, SubjectTransforms, OptStartSeq, flow control, account isolation verification.

git commit -m "feat: complete source consumer API request generation (Gap 4.3)"

Task 35: Stream Snapshot & Restore (Gap 4.7)

Implement TAR-based snapshot with S2 compression and deadline enforcement.

Files:

  • Modify: src/NATS.Server/JetStream/Snapshots/StreamSnapshotService.cs
  • Test: tests/NATS.Server.Tests/JetStream/Snapshots/StreamSnapshotTests.cs (create)

Step 1-5: TDD cycle

Tests for: TAR snapshot includes stream config, message blocks, consumer state. S2 compression applied. Deadline enforcement via CancellationTokenSource. Consumer inclusion/exclusion. Restore validates TAR, decompresses, rebuilds indices.

git commit -m "feat: implement TAR-based stream snapshot with S2 compression (Gap 4.7)"

Task 36: Stream Config Update Validation (Gap 4.8)

Add ValidateConfigUpdate with change restriction rules.

Files:

  • Modify: src/NATS.Server/JetStream/StreamManager.cs
  • Test: tests/NATS.Server.Tests/JetStream/Streams/ConfigUpdateValidationTests.cs (create)

Step 1-5: TDD cycle

Tests for: subjects overlap detection, mirror/source immutability, retention policy restrictions, MaxMsgs/MaxBytes/MaxAge monotonic decrease, discard policy compatibility.

git commit -m "feat: add stream config update validation (Gap 4.8)"

Task 37: Source/Mirror Info Reporting (Gap 4.10)

Add GetMirrorInfo/GetSourceInfo for monitoring.

Files:

  • Modify: src/NATS.Server/JetStream/Streams/MirrorCoordinator.cs
  • Modify: src/NATS.Server/JetStream/Streams/SourceCoordinator.cs
  • Test: tests/NATS.Server.Tests/JetStream/Streams/SourceMirrorInfoTests.cs (create)

Step 1-5: TDD cycle

Tests for: MirrorInfoResponse with lag/active/error, SourceInfoResponse[], wired into stream info API.

git commit -m "feat: add source/mirror info reporting for monitoring (Gap 4.10)"

Phase 3 Exit Gate

dotnet test tests/NATS.Server.Tests --filter "FullyQualifiedName~DeliveryLoop|FullyQualifiedName~IdleHeartbeat|FullyQualifiedName~DeliveryInterest|FullyQualifiedName~MaxDeliveries|FullyQualifiedName~FilterSkip|FullyQualifiedName~SampleMode|FullyQualifiedName~ConsumerReset|FullyQualifiedName~TokenBucket|FullyQualifiedName~ClusterPendingRequest|FullyQualifiedName~SourceConsumerSetup|FullyQualifiedName~StreamSnapshot|FullyQualifiedName~ConfigUpdateValidation|FullyQualifiedName~SourceMirrorInfo" -v normal

Update test parity DB for all Phase 3 tests.


Phase 4: Client Protocol + MQTT (12 gaps)

Dependencies: Phase 3 (consumer engines)

Exit gate: Route result cache active, slow consumers tracked, trace delivery works, SUB permission cached, will messages published, QoS 1/2 tracked, retained delivered on subscribe


Task 38: Per-Account Subscription Result Cache (Gap 5.4)

Add RouteResultCache with LRU eviction and generation-based invalidation.

Files:

  • Create: src/NATS.Server/Subscriptions/RouteResultCache.cs
  • Modify: src/NATS.Server/NatsClient.cs (integrate cache into message dispatch)
  • Test: tests/NATS.Server.Tests/Subscriptions/RouteResultCacheTests.cs (create)

Step 1-5: TDD cycle

Tests for: LRU eviction at 8192 entries, per-account partitioning, atomic generation invalidation, cache hit avoids SubList.Match call.

git commit -m "feat: add per-account subscription result cache with LRU (Gap 5.4)"
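One way to combine LRU eviction with generation-based invalidation is sketched below. The class name `ResultCacheSketch` and its API are hypothetical. The sketch assumes one cache instance per account and that the caller passes the subscription list's current generation, and it uses a lock rather than lock-free atomics for simplicity.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch: one instance per account gives per-account partitioning.
public sealed class ResultCacheSketch<TValue>
{
    private readonly int _capacity;
    private readonly object _gate = new();
    private readonly LinkedList<KeyValuePair<string, TValue>> _order = new();
    private readonly Dictionary<string, LinkedListNode<KeyValuePair<string, TValue>>> _map = new();
    private long _generation;

    public ResultCacheSketch(int capacity = 8192) => _capacity = capacity;

    public bool TryGet(string subject, long generation, out TValue value)
    {
        lock (_gate)
        {
            ResetIfStale(generation);
            if (_map.TryGetValue(subject, out var node))
            {
                _order.Remove(node);   // move to the front: most recently used
                _order.AddFirst(node);
                value = node.Value.Value;
                return true;
            }
            value = default!;
            return false;
        }
    }

    public void Set(string subject, long generation, TValue value)
    {
        lock (_gate)
        {
            ResetIfStale(generation);
            if (_map.Remove(subject, out var existing)) _order.Remove(existing);
            _map[subject] = _order.AddFirst(new KeyValuePair<string, TValue>(subject, value));
            if (_map.Count > _capacity)
            {
                var oldest = _order.Last!; // least recently used entry
                _order.RemoveLast();
                _map.Remove(oldest.Value.Key);
            }
        }
    }

    // A changed generation means subscriptions changed, so every cached result is stale.
    private void ResetIfStale(long generation)
    {
        if (generation == _generation) return;
        _map.Clear();
        _order.Clear();
        _generation = generation;
    }
}
```

A hit in `TryGet` is what lets the dispatch path skip the `SubList.Match` call; on a miss the caller would run the match and store the result with `Set`.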

Task 39: Slow Consumer Stall Gate (Gap 5.5)

Extend StallGate with per-kind slow consumer statistics.

Files:

  • Modify: src/NATS.Server/NatsClient.cs:1017 (StallGate — add SlowConsumerCount)
  • Modify: src/NATS.Server/Auth/Account.cs (add IncrementSlowConsumers)
  • Test: tests/NATS.Server.Tests/SlowConsumerStallGateTests.cs (create)

Step 1-5: TDD cycle

Tests for: SlowConsumerCount per ClientKind, account-level tracking, SlowConsumerEvent fired at threshold.

git commit -m "feat: add slow consumer per-kind tracking with account counters (Gap 5.5)"

Task 40: Dynamic Write Buffer Pooling (Gap 5.6)

Integrate OutboundBufferPool with flush coalescing and broadcast drain.

Files:

  • Modify: src/NATS.Server/NatsClient.cs:785 (RunWriteLoopAsync — add broadcast drain)
  • Modify: src/NATS.Server/IO/OutboundBufferPool.cs
  • Test: tests/NATS.Server.Tests/IO/DynamicBufferPoolTests.cs (create)

Step 1-5: TDD cycle

Tests for: broadcast flush drains multiple pending clients, reduces syscall count for fan-out.

git commit -m "feat: add dynamic write buffer pooling with broadcast drain (Gap 5.6)"

Task 41: Per-Client Trace Level (Gap 5.7)

Add TraceMsgDelivery and per-client echo control.

Files:

  • Modify: src/NATS.Server/NatsClient.cs (add TraceMsgDelivery, EchoSupported)
  • Test: tests/NATS.Server.Tests/ClientTraceTests.cs (create)

Step 1-5: TDD cycle

Tests for: trace message delivery logged at Trace level with subject/destination/size, echo flag controls routed message behavior.

git commit -m "feat: add per-client trace delivery and echo control (Gap 5.7)"

Task 42: Subscribe Permission Caching (Gap 5.8)

Extend PermissionLruCache with SUB permission entries and generation-based invalidation.

Files:

  • Modify: src/NATS.Server/Auth/PermissionLruCache.cs
  • Modify: src/NATS.Server/Auth/Account.cs (add GenerationId)
  • Test: tests/NATS.Server.Tests/Auth/SubPermissionCacheTests.cs (create)

Step 1-5: TDD cycle

Tests for: SUB permission cached alongside PUB, generation ID invalidation on permission changes.

git commit -m "feat: add SUB permission caching with generation invalidation (Gap 5.8)"

Task 43: Internal Client Kinds (Gap 5.9)

Already implemented — ClientKind.cs has System, JetStream, Account and IsInternal() extension. Verify and add tests if missing.

Files:

  • Verify: src/NATS.Server/ClientKind.cs
  • Test: tests/NATS.Server.Tests/ClientKindTests.cs (create if missing)

git commit -m "test: verify internal client kinds (Gap 5.9)"

Task 44: Adaptive Read Buffer Short-Read Counter (Gap 5.10)

Add _consecutiveShortReads counter with 4-read threshold.

Files:

  • Modify: src/NATS.Server/IO/AdaptiveReadBuffer.cs
  • Test: tests/NATS.Server.Tests/IO/AdaptiveReadBufferShortReadTests.cs (create)

Step 1-5: TDD cycle

Tests for: shrink only after 4 consecutive short reads, counter resets on full-buffer read.

git commit -m "feat: add consecutive short-read counter to prevent buffer oscillation (Gap 5.10)"
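The shrink rule can be sketched as follows. `ShortReadSizerSketch` is a hypothetical class, not `AdaptiveReadBuffer` itself. The sketch assumes a "short" read is one that fills less than half the buffer, that buffers double and halve, and that reads between half and full leave the counter unchanged; only the 4-read threshold and the reset on a full-buffer read come from the plan.

```csharp
using System;

// Hypothetical sketch of the sizing policy only; it does not own a real buffer.
public sealed class ShortReadSizerSketch
{
    private const int ShrinkThreshold = 4; // consecutive short reads before shrinking
    private readonly int _min;
    private readonly int _max;
    private int _consecutiveShortReads;

    public int Size { get; private set; }

    public ShortReadSizerSketch(int initial, int min, int max)
    {
        Size = initial;
        _min = min;
        _max = max;
    }

    public void OnRead(int bytesRead)
    {
        if (bytesRead >= Size)
        {
            // Full-buffer read: reset the counter and grow.
            _consecutiveShortReads = 0;
            Size = Math.Min(Size * 2, _max);
        }
        else if (bytesRead < Size / 2 && ++_consecutiveShortReads >= ShrinkThreshold)
        {
            // Fourth short read in a row: shrink once and start counting again.
            _consecutiveShortReads = 0;
            Size = Math.Max(Size / 2, _min);
        }
    }
}
```

Requiring several short reads before shrinking is what prevents the grow/shrink oscillation named in the commit message: a single small read between large ones no longer halves the buffer.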

Task 45: MQTT Will Message Delivery (Gap 6.2)

Add PublishWillMessage triggered on abnormal disconnection.

Files:

  • Modify: src/NATS.Server/Mqtt/MqttSessionStore.cs
  • Modify: src/NATS.Server/Mqtt/MqttConnection.cs (trigger on disconnect)
  • Test: tests/NATS.Server.Tests/Mqtt/MqttWillMessageTests.cs (create)

Step 1-5: TDD cycle

Tests for: will message published on abnormal disconnect, NOT published on clean DISCONNECT, will delay interval support.

git commit -m "feat: add MQTT will message delivery on abnormal disconnect (Gap 6.2)"

Task 46: MQTT QoS 1/2 Tracking (Gap 6.3)

Add MqttQoS1Tracker with JetStream-backed ack tracking.

Files:

  • Create: src/NATS.Server/Mqtt/MqttQoS1Tracker.cs
  • Modify: src/NATS.Server/Mqtt/MqttRetainedStore.cs (extend MqttQoS2StateMachine)
  • Test: tests/NATS.Server.Tests/Mqtt/MqttQoSTrackingTests.cs (create)

Step 1-5: TDD cycle

Tests for: QoS 1 outgoing messages stored in $MQTT_out, removed on PUBACK, redelivered on reconnect. QoS 2 PUBREL delivery stream.

git commit -m "feat: add JetStream-backed QoS 1/2 tracking (Gap 6.3)"

Task 47: MQTT MaxAckPending Enforcement (Gap 6.4)

Add MqttFlowController with per-subscription ack pending limits.

Files:

  • Create: src/NATS.Server/Mqtt/MqttFlowController.cs
  • Test: tests/NATS.Server.Tests/Mqtt/MqttFlowControllerTests.cs (create)

Step 1-5: TDD cycle

Tests for: SemaphoreSlim-based blocking at limit, release on PUBACK/PUBCOMP, config reload updates limits.

git commit -m "feat: add MQTT MaxAckPending flow control (Gap 6.4)"
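The SemaphoreSlim-based limit might look like the sketch below. `AckPendingGateSketch` is a hypothetical name; it covers a single subscription and leaves out config reload, which would need the semaphore to be resized or replaced.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical sketch: one gate per subscription, one slot per unacknowledged message.
public sealed class AckPendingGateSketch : IDisposable
{
    private readonly SemaphoreSlim _slots;

    public AckPendingGateSketch(int maxAckPending) =>
        _slots = new SemaphoreSlim(maxAckPending, maxAckPending);

    public int Available => _slots.CurrentCount;

    // Completes once a slot is free; call before delivering a QoS 1/2 message.
    public Task AcquireAsync(CancellationToken ct = default) => _slots.WaitAsync(ct);

    // Non-blocking variant: false means the limit is currently reached.
    public bool TryAcquire() => _slots.Wait(0);

    // Call on PUBACK / PUBCOMP. A duplicate ack would throw SemaphoreFullException
    // here, so a real implementation has to de-duplicate packet ids first.
    public void Release() => _slots.Release();

    public void Dispose() => _slots.Dispose();
}
```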

Task 48: MQTT Retained Message Delivery on Subscribe (Gap 6.5)

Deliver matching retained messages on SUBSCRIBE.

Files:

  • Modify: src/NATS.Server/Mqtt/MqttConnection.cs (integrate retained delivery on SUB)
  • Modify: src/NATS.Server/Mqtt/MqttRetainedStore.cs:80 (GetMatchingRetained)
  • Test: tests/NATS.Server.Tests/Mqtt/MqttRetainedDeliveryTests.cs (create)

Step 1-5: TDD cycle

Tests for: retained messages delivered on subscribe with Retain flag, wildcard subscriptions scan all retained topics.

git commit -m "feat: deliver retained messages on MQTT SUBSCRIBE (Gap 6.5)"

Task 49: MQTT Session Flapper Detection (Gap 6.6)

Complete flapper detection with exponential backoff.

Files:

  • Modify: src/NATS.Server/Mqtt/MqttSessionStore.cs:122 (complete TrackConnectDisconnect)
  • Test: tests/NATS.Server.Tests/Mqtt/MqttFlapperDetectionTests.cs (create)

Step 1-5: TDD cycle

Tests for: 3+ connect/disconnect cycles within 10s trigger flapper detection, exponential backoff applied to CONNACK, flapper state cleared after 60s of stable connection.

git commit -m "feat: complete MQTT session flapper detection (Gap 6.6)"
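A rough sketch of the detection window and backoff. `FlapperSketch` is hypothetical and does not mirror `TrackConnectDisconnect`. The 10s window, threshold of 3, and 60s stability period come from the tests listed for this task; the 1s base delay, the doubling, and the cap are assumptions of this sketch.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch for a single client id; time is passed in for deterministic tests.
public sealed class FlapperSketch
{
    private static readonly TimeSpan Window = TimeSpan.FromSeconds(10);
    private static readonly TimeSpan StableAfter = TimeSpan.FromSeconds(60);
    private const int Threshold = 3;
    private const int MaxLevel = 6; // assumed cap: 2^5 = 32s

    private readonly Queue<TimeSpan> _cycles = new();
    private TimeSpan _last;
    private int _backoffLevel;

    // Records one connect/disconnect cycle and returns the delay to apply before CONNACK.
    public TimeSpan RecordCycle(TimeSpan now)
    {
        if (_cycles.Count > 0 && now - _last >= StableAfter)
        {
            _cycles.Clear();   // stable long enough: forget the history
            _backoffLevel = 0;
        }
        _last = now;
        _cycles.Enqueue(now);
        // Drop cycles that fell out of the window; the entry just added always stays.
        while (now - _cycles.Peek() > Window) _cycles.Dequeue();

        if (_cycles.Count < Threshold) return TimeSpan.Zero;
        _backoffLevel = Math.Min(_backoffLevel + 1, MaxLevel);
        return TimeSpan.FromSeconds(Math.Pow(2, _backoffLevel - 1)); // 1s, 2s, 4s, ...
    }
}
```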

Phase 4 Exit Gate

# Targeted tests
dotnet test tests/NATS.Server.Tests --filter "FullyQualifiedName~RouteResultCache|FullyQualifiedName~SlowConsumer|FullyQualifiedName~DynamicBuffer|FullyQualifiedName~ClientTrace|FullyQualifiedName~SubPermissionCache|FullyQualifiedName~ClientKind|FullyQualifiedName~AdaptiveReadBuffer|FullyQualifiedName~MqttWill|FullyQualifiedName~MqttQoS|FullyQualifiedName~MqttFlowController|FullyQualifiedName~MqttRetained|FullyQualifiedName~MqttFlapper" -v normal

# FULL TEST SUITE CHECKPOINT (Phase 4 complete)
dotnet test

Update test parity DB for all Phase 4 tests.


Phase 5: Configuration Reload + Gateway (11 gaps)

Dependencies: Phase 4 (client protocol)

Exit gate: Auth changes propagated, TLS reloaded, cluster config hot-reloaded, gateways reconnect with backoff, account-specific routing works, queue groups propagated, reply mapping cached


Task 50: Auth Change Propagation (Gap 14.2)

Add PropagateAuthChanges to ConfigReloader.

Files:

  • Modify: src/NATS.Server/Configuration/ConfigReloader.cs
  • Modify: src/NATS.Server/NatsServer.cs (hook auth changes to connections)
  • Test: tests/NATS.Server.Tests/Configuration/AuthChangePropagationTests.cs (create)

git commit -m "feat: add auth change propagation to existing connections (Gap 14.2)"

Task 51: TLS Certificate Reload (Gap 14.3)

Add ReloadTlsCertificates for hot-swapping certificates.

Files:

  • Modify: src/NATS.Server/Configuration/ConfigReloader.cs:496 (extend ReloadTlsCertificate)
  • Test: tests/NATS.Server.Tests/Configuration/TlsReloadTests.cs (create)

git commit -m "feat: add TLS certificate hot-reload for new connections (Gap 14.3)"

Task 52: Cluster Config Hot Reload (Gap 14.4)

Add ApplyClusterConfigChanges for route/gateway/leaf URL changes.

Files:

  • Modify: src/NATS.Server/Configuration/ConfigReloader.cs
  • Test: tests/NATS.Server.Tests/Configuration/ClusterConfigReloadTests.cs (create)

git commit -m "feat: add cluster config hot reload (Gap 14.4)"

Task 53: Logging Level Changes (Gap 14.5)

Add ApplyLoggingChanges to update Serilog LoggingLevelSwitch.

Files:

  • Modify: src/NATS.Server/Configuration/ConfigReloader.cs
  • Test: tests/NATS.Server.Tests/Configuration/LoggingReloadTests.cs (create)

git commit -m "feat: add runtime logging level changes (Gap 14.5)"

Task 54: JetStream Config Changes (Gap 14.6)

Add ApplyJetStreamConfigChanges for MaxMemory/MaxStore/Domain.

Files:

  • Modify: src/NATS.Server/Configuration/ConfigReloader.cs
  • Test: tests/NATS.Server.Tests/Configuration/JetStreamConfigReloadTests.cs (create)

git commit -m "feat: add JetStream config change reload (Gap 14.6)"

Task 55: Gateway Reconnection with Backoff (Gap 11.2)

Add ReconnectGatewayAsync with exponential backoff and jitter.

Files:

  • Modify: src/NATS.Server/Gateways/GatewayManager.cs
  • Test: tests/NATS.Server.Tests/Gateways/GatewayReconnectionTests.cs (create)

git commit -m "feat: add gateway reconnection with exponential backoff (Gap 11.2)"
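Exponential backoff with jitter can be sketched as a pure delay calculation. `ReconnectBackoffSketch` is a hypothetical helper, and the doubling factor and the "up to +25%" jitter are assumptions of this sketch; the plan does not fix these values.

```csharp
using System;

// Hypothetical helper: computes the delay before reconnect attempt number 'attempt' (0-based).
public static class ReconnectBackoffSketch
{
    public static TimeSpan Delay(int attempt, TimeSpan baseDelay, TimeSpan maxDelay, Random rng = null)
    {
        rng ??= Random.Shared;
        int shift = Math.Clamp(attempt, 0, 30); // keep the shift within a safe range
        double exponential = baseDelay.TotalMilliseconds * (1L << shift);
        double capped = Math.Min(exponential, maxDelay.TotalMilliseconds);
        double jitter = capped * 0.25 * rng.NextDouble(); // assumed: up to +25%
        return TimeSpan.FromMilliseconds(capped + jitter);
    }
}
```

A reconnect loop would await `Task.Delay(ReconnectBackoffSketch.Delay(attempt, ...), cancellationToken)` between attempts and reset `attempt` to zero after a successful connection. The jitter spreads out reconnects from gateways that lost their connection at the same moment.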

Task 56: Account-Specific Gateway Routes (Gap 11.3)

Add per-account subscription sending to gateways.

Files:

  • Modify: src/NATS.Server/Gateways/GatewayManager.cs
  • Modify: src/NATS.Server/Gateways/GatewayConnection.cs
  • Test: tests/NATS.Server.Tests/Gateways/AccountGatewayRoutesTests.cs (create)

git commit -m "feat: add account-specific gateway routes (Gap 11.3)"

Task 57: Queue Group Propagation (Gap 11.4)

Add SendQueueSubsToGateway for queue group load balancing across gateways.

Files:

  • Modify: src/NATS.Server/Gateways/GatewayManager.cs
  • Modify: src/NATS.Server/Gateways/GatewayConnection.cs
  • Test: tests/NATS.Server.Tests/Gateways/QueueGroupPropagationTests.cs (create)

git commit -m "feat: add queue group propagation to gateways (Gap 11.4)"

Task 58: Reply Subject Mapping Cache (Gap 11.5)

Add ReplyMapCache with LRU and TTL expiration.

Files:

  • Modify: src/NATS.Server/Gateways/ReplyMapper.cs
  • Test: tests/NATS.Server.Tests/Gateways/ReplyMapCacheTests.cs (create)

git commit -m "feat: add reply subject mapping cache with TTL (Gap 11.5)"

Task 59: Gateway Command Protocol (Gap 11.6)

Add GatewayCommand enum with exact Go wire format byte sequences.

Files:

  • Modify: src/NATS.Server/Gateways/GatewayManager.cs
  • Modify: src/NATS.Server/Gateways/GatewayConnection.cs
  • Test: tests/NATS.Server.Tests/Gateways/GatewayCommandTests.cs (create)

git commit -m "feat: add gateway command protocol with Go-compatible wire format (Gap 11.6)"

Task 60: Gateway Connection Registration (Gap 11.7)

Add full gateway connection registry with state tracking.

Files:

  • Modify: src/NATS.Server/Gateways/GatewayManager.cs
  • Test: tests/NATS.Server.Tests/Gateways/GatewayRegistrationTests.cs (create)

git commit -m "feat: add gateway connection registration with state tracking (Gap 11.7)"

Phase 5 Exit Gate

dotnet test tests/NATS.Server.Tests --filter "FullyQualifiedName~AuthChangePropagation|FullyQualifiedName~TlsReload|FullyQualifiedName~ClusterConfigReload|FullyQualifiedName~LoggingReload|FullyQualifiedName~JetStreamConfigReload|FullyQualifiedName~GatewayReconnection|FullyQualifiedName~AccountGatewayRoutes|FullyQualifiedName~QueueGroupPropagation|FullyQualifiedName~ReplyMapCache|FullyQualifiedName~GatewayCommand|FullyQualifiedName~GatewayRegistration" -v normal

Update test parity DB for all Phase 5 tests.


Phase 6: Route Clustering + LeafNode (12 gaps)

Dependencies: Phase 5 (config reload, gateway infrastructure)

Exit gate: Account-specific routes work, pool sizes negotiated, routes stored by hash, cluster splits handled, leaf TLS hot-reloads, permissions synced, leaf connections validated, WebSocket transport works


Task 61: Account-Specific Dedicated Routes (Gap 13.2)

Add AccountRouteMap for per-account route connections.

Files:

  • Modify: src/NATS.Server/Routes/RouteManager.cs
  • Modify: src/NATS.Server/Routes/RouteConnection.cs
  • Test: tests/NATS.Server.Tests/Routes/AccountRouteTests.cs (create)

git commit -m "feat: add account-specific dedicated routes (Gap 13.2)"

Task 62: Route Pool Size Negotiation (Gap 13.3)

Add NegotiatePoolSize during route handshake.

Files:

  • Modify: src/NATS.Server/Routes/RouteManager.cs
  • Modify: src/NATS.Server/Routes/RouteConnection.cs
  • Test: tests/NATS.Server.Tests/Routes/PoolSizeNegotiationTests.cs (create)

git commit -m "feat: add route pool size negotiation (Gap 13.3)"

Task 63: Route Hash Storage (Gap 13.4)

Add ConcurrentDictionary<ulong, RouteConnection> for O(1) route lookup.

Files:

  • Modify: src/NATS.Server/Routes/RouteManager.cs
  • Test: tests/NATS.Server.Tests/Routes/RouteHashStorageTests.cs (create)

git commit -m "feat: add route hash storage for O(1) lookup (Gap 13.4)"

Task 64: Cluster Split Handling (Gap 13.5)

Add RemoveAllRoutesExcept and RemoveRoute for partition handling.

Files:

  • Modify: src/NATS.Server/Routes/RouteManager.cs
  • Test: tests/NATS.Server.Tests/Routes/ClusterSplitTests.cs (create)

git commit -m "feat: add cluster split handling (Gap 13.5)"

Task 65: No-Pool Route Fallback (Gap 13.6)

Add backward compatibility with pre-pool servers.

Files:

  • Modify: src/NATS.Server/Routes/RouteManager.cs
  • Modify: src/NATS.Server/Routes/RouteConnection.cs
  • Test: tests/NATS.Server.Tests/Routes/NoPoolFallbackTests.cs (create)

git commit -m "feat: add no-pool route fallback for backward compatibility (Gap 13.6)"

Task 66: Leaf Node TLS Certificate Hot-Reload (Gap 12.1)

Add UpdateTlsConfig for leaf node connections.

Files:

  • Modify: src/NATS.Server/LeafNodes/LeafNodeManager.cs
  • Test: tests/NATS.Server.Tests/LeafNodes/LeafTlsReloadTests.cs (create)

git commit -m "feat: add leaf node TLS certificate hot-reload (Gap 12.1)"

Task 67: Permission & Account Syncing (Gap 12.2)

Add SendPermsAndAccountInfo and InitLeafNodeSmapAndSendSubs.

Files:

  • Modify: src/NATS.Server/LeafNodes/LeafNodeManager.cs
  • Modify: src/NATS.Server/LeafNodes/LeafConnection.cs
  • Test: tests/NATS.Server.Tests/LeafNodes/LeafPermissionSyncTests.cs (create)

git commit -m "feat: add leaf node permission and account syncing (Gap 12.2)"

Task 68: Leaf Connection State Validation (Gap 12.3)

Add ValidateRemoteLeafNode on reconnect.

Files:

  • Modify: src/NATS.Server/LeafNodes/LeafNodeManager.cs
  • Test: tests/NATS.Server.Tests/LeafNodes/LeafValidationTests.cs (create)

git commit -m "feat: add leaf connection state validation on reconnect (Gap 12.3)"

Task 69: JetStream Migration Checks (Gap 12.4)

Add CheckJetStreamMigrate validation.

Files:

  • Modify: src/NATS.Server/LeafNodes/LeafNodeManager.cs
  • Test: tests/NATS.Server.Tests/LeafNodes/LeafJetStreamMigrationTests.cs (create)

git commit -m "feat: add leaf node JetStream migration checks (Gap 12.4)"

Task 70: Leaf Node WebSocket Support (Gap 12.5)

Add WebSocketStreamAdapter for message-framed WebSocket → stream conversion.

Files:

  • Create: src/NATS.Server/LeafNodes/WebSocketStreamAdapter.cs
  • Modify: src/NATS.Server/LeafNodes/LeafConnection.cs
  • Test: tests/NATS.Server.Tests/LeafNodes/LeafWebSocketTests.cs (create)

git commit -m "feat: add leaf node WebSocket support with stream adapter (Gap 12.5)"

Task 71: Leaf Cluster Registration (Gap 12.6)

Add RegisterLeafNodeCluster and HasLeafNodeCluster.

Files:

  • Modify: src/NATS.Server/LeafNodes/LeafNodeManager.cs
  • Test: tests/NATS.Server.Tests/LeafNodes/LeafClusterRegistrationTests.cs (create)

git commit -m "feat: add leaf cluster registration and topology tracking (Gap 12.6)"

Task 72: Leaf Connection Disable Flag (Gap 12.7)

Add IsLeafConnectDisabled with per-remote disable flag.

Files:

  • Modify: src/NATS.Server/LeafNodes/LeafNodeManager.cs
  • Test: tests/NATS.Server.Tests/LeafNodes/LeafDisableTests.cs (create)

git commit -m "feat: add leaf connection disable flag (Gap 12.7)"

Phase 6 Exit Gate

# Targeted tests
dotnet test tests/NATS.Server.Tests --filter "FullyQualifiedName~AccountRoute|FullyQualifiedName~PoolSizeNegotiation|FullyQualifiedName~RouteHash|FullyQualifiedName~ClusterSplit|FullyQualifiedName~NoPoolFallback|FullyQualifiedName~LeafTlsReload|FullyQualifiedName~LeafPermissionSync|FullyQualifiedName~LeafValidation|FullyQualifiedName~LeafJetStreamMigration|FullyQualifiedName~LeafWebSocket|FullyQualifiedName~LeafClusterRegistration|FullyQualifiedName~LeafDisable" -v normal

# FULL TEST SUITE CHECKPOINT (Phase 6 complete)
dotnet test

Update test parity DB for all Phase 6 tests.


Phase 7: Account Management & Multi-Tenancy (10 gaps)

Dependencies: Phase 4 (client protocol), Phase 5 (config reload), Phase 6 (leaf/route for claim propagation)

Exit gate: Service latency tracked, response thresholds enforced, import cycles detected, wildcard exports work, accounts expire, claims hot-reloaded, NKey revocation enforced


Task 73: Service Export Latency Tracking (Gap 9.1)

Add ServiceLatencyTracker with p50/p90/p99 histogram.

Files:

  • Create: src/NATS.Server/Auth/ServiceLatencyTracker.cs
  • Modify: src/NATS.Server/Auth/Account.cs
  • Test: tests/NATS.Server.Tests/Auth/ServiceLatencyTrackerTests.cs (create)

git commit -m "feat: add service export latency tracking with p50/p90/p99 (Gap 9.1)"
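To make the p50/p90/p99 targets concrete, here is a simplified sketch that keeps raw samples and uses the nearest-rank method. `LatencySamplesSketch` is hypothetical and is not a bucketed histogram like the planned `ServiceLatencyTracker`; a real tracker would bound memory with fixed buckets or a sliding window.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch: unbounded sample list, nearest-rank percentiles.
public sealed class LatencySamplesSketch
{
    private readonly List<double> _ms = new();

    public void Record(TimeSpan latency)
    {
        lock (_ms) _ms.Add(latency.TotalMilliseconds);
    }

    // p is a percentile in (0, 100], e.g. 50, 90, 99. Returns milliseconds, or 0 with no samples.
    public double Percentile(double p)
    {
        lock (_ms)
        {
            if (_ms.Count == 0) return 0;
            double[] sorted = _ms.ToArray();
            Array.Sort(sorted);
            // Nearest rank: the smallest sample with at least p% of samples at or below it.
            int rank = (int)Math.Ceiling(p * sorted.Length / 100.0);
            return sorted[Math.Clamp(rank - 1, 0, sorted.Length - 1)];
        }
    }
}
```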

Task 74: Service Export Response Threshold (Gap 9.2)

Add ResponseThreshold to service export configuration.

Files:

  • Modify: src/NATS.Server/Auth/Account.cs
  • Test: tests/NATS.Server.Tests/Auth/ResponseThresholdTests.cs (create)

git commit -m "feat: add service export response threshold enforcement (Gap 9.2)"

Task 75: Stream Import Cycle Detection (Gap 9.3)

Add StreamImportFormsCycle using DFS.

Files:

  • Modify: src/NATS.Server/Auth/Account.cs
  • Test: tests/NATS.Server.Tests/Auth/StreamImportCycleTests.cs (create)

git commit -m "feat: add stream import cycle detection (Gap 9.3)"
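The DFS check can be sketched over a plain account-name graph. `ImportCycleSketch` and its dictionary-based input are hypothetical; the real `StreamImportFormsCycle` would walk `Account` objects and their stream imports. The idea: adding an import from `importer` to `exporter` closes a cycle exactly when `importer` is already reachable from `exporter`.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch: 'imports' maps an account name to the accounts it imports streams from.
public static class ImportCycleSketch
{
    public static bool FormsCycle(
        IReadOnlyDictionary<string, HashSet<string>> imports,
        string importer,
        string exporter)
    {
        var visited = new HashSet<string>();
        var stack = new Stack<string>();
        stack.Push(exporter);

        // Iterative depth-first search starting at the exporter.
        while (stack.Count > 0)
        {
            string current = stack.Pop();
            if (current == importer) return true; // importer reachable: new edge closes a loop
            if (!visited.Add(current)) continue;  // already explored
            if (imports.TryGetValue(current, out var next))
                foreach (string account in next) stack.Push(account);
        }
        return false;
    }
}
```

An explicit stack avoids deep recursion on long import chains; the `visited` set keeps the walk linear in the number of import edges.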

Task 76: Wildcard Service Exports (Gap 9.4)

Add GetWildcardServiceExport using SubjectMatch.IsMatch.

Files:

  • Modify: src/NATS.Server/Auth/Account.cs
  • Test: tests/NATS.Server.Tests/Auth/WildcardExportTests.cs (create)

git commit -m "feat: add wildcard service export matching (Gap 9.4)"

Task 77: Account Expiration & TTL (Gap 9.5)

Add ExpiresAt, IsExpired, and SetExpirationTimer.

Files:

  • Modify: src/NATS.Server/Auth/Account.cs
  • Test: tests/NATS.Server.Tests/Auth/AccountExpirationTests.cs (create)

git commit -m "feat: add account expiration with TTL-based cleanup (Gap 9.5)"

Task 78: Account Claim Hot-Reload (Gap 9.6)

Add UpdateAccountClaims with diff-based update.

Files:

  • Modify: src/NATS.Server/Auth/Account.cs
  • Modify: src/NATS.Server/NatsServer.cs
  • Test: tests/NATS.Server.Tests/Auth/AccountClaimReloadTests.cs (create)

git commit -m "feat: add account claim hot-reload with diff-based update (Gap 9.6)"

Task 79: Service/Stream Activation Expiration (Gap 9.7)

Add CheckActivationExpiry for JWT activation claims.

Files:

  • Modify: src/NATS.Server/Auth/Account.cs
  • Test: tests/NATS.Server.Tests/Auth/ActivationExpirationTests.cs (create)

git commit -m "feat: add JWT activation claim expiration checking (Gap 9.7)"

Task 80: User NKey Revocation (Gap 9.8)

Wire _revokedUsers into active connection validation.

Files:

  • Modify: src/NATS.Server/Auth/Account.cs:38 (extend IsUserRevoked)
  • Modify: src/NATS.Server/NatsClient.cs (check revocation on operations)
  • Test: tests/NATS.Server.Tests/Auth/NKeyRevocationTests.cs (create)

git commit -m "feat: wire user NKey revocation into active connections (Gap 9.8)"

Task 81: Response Service Import (Gap 9.9)

Add AddReverseRespMapEntry and CheckForReverseEntries.

Files:

  • Modify: src/NATS.Server/Auth/Account.cs
  • Test: tests/NATS.Server.Tests/Auth/ReverseResponseMapTests.cs (create)

git commit -m "feat: add reverse response mapping for cross-account request-reply (Gap 9.9)"

Task 82: Service Import Shadowing Detection (Gap 9.10)

Add ServiceImportShadowed checking for local subscription overlap.

Files:

  • Modify: src/NATS.Server/Auth/Account.cs
  • Test: tests/NATS.Server.Tests/Auth/ImportShadowingTests.cs (create)

git commit -m "feat: add service import shadowing detection (Gap 9.10)"

Phase 7 Exit Gate

dotnet test tests/NATS.Server.Tests --filter "FullyQualifiedName~ServiceLatency|FullyQualifiedName~ResponseThreshold|FullyQualifiedName~StreamImportCycle|FullyQualifiedName~WildcardExport|FullyQualifiedName~AccountExpiration|FullyQualifiedName~AccountClaimReload|FullyQualifiedName~ActivationExpiration|FullyQualifiedName~NKeyRevocation|FullyQualifiedName~ReverseResponseMap|FullyQualifiedName~ImportShadowing" -v normal

Update test parity DB for all Phase 7 tests.


Phase 8: Monitoring, Events & WebSocket (11 gaps)

Dependencies: Phase 7 (account management)

Exit gate: Closed connections queryable, account filtering works, sort options work, auth events published, trace propagated, event payloads complete, WebSocket TLS configured


Task 83: Closed Connections Ring Buffer (Gap 10.1)

Add ClosedConnectionRingBuffer and wire into ConnzHandler.

Files:

  • Create: src/NATS.Server/Monitoring/ClosedConnectionRingBuffer.cs
  • Modify: src/NATS.Server/NatsServer.cs (record closed connections)
  • Modify: src/NATS.Server/Monitoring/ConnzHandler.cs:12 (support state=closed)
  • Test: tests/NATS.Server.Tests/Monitoring/ClosedConnectionRingBufferTests.cs (create)

git commit -m "feat: add closed connection ring buffer for /connz?state=closed (Gap 10.1)"
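A fixed-capacity ring buffer that overwrites the oldest entry could look like the sketch below. `ClosedRingSketch<T>` is a hypothetical generic stand-in; the planned `ClosedConnectionRingBuffer` would store closed-client records and feed `ConnzHandler`.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch: keeps the most recent 'capacity' items, oldest overwritten first.
public sealed class ClosedRingSketch<T>
{
    private readonly T[] _items;
    private readonly object _gate = new();
    private long _total; // number of items ever added

    public ClosedRingSketch(int capacity)
    {
        if (capacity <= 0) throw new ArgumentOutOfRangeException(nameof(capacity));
        _items = new T[capacity];
    }

    public void Add(T item)
    {
        lock (_gate)
        {
            _items[_total % _items.Length] = item; // wrap around and overwrite the oldest slot
            _total++;
        }
    }

    // Returns a copy ordered from oldest to newest.
    public IReadOnlyList<T> Snapshot()
    {
        lock (_gate)
        {
            int count = (int)Math.Min(_total, _items.Length);
            var result = new T[count];
            long start = _total - count;
            for (int i = 0; i < count; i++)
                result[i] = _items[(start + i) % _items.Length];
            return result;
        }
    }
}
```

Returning a copy from `Snapshot` lets a monitoring request sort and page the results without holding the lock while connections keep closing.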

Task 84: Account-Scoped Filtering (Gap 10.2)

Add acc query parameter to ConnzHandler.

Files:

  • Modify: src/NATS.Server/Monitoring/ConnzHandler.cs:239 (add acc param parsing)
  • Test: tests/NATS.Server.Tests/Monitoring/ConnzAccountFilterTests.cs (create)

git commit -m "feat: add account-scoped filtering to /connz (Gap 10.2)"

Task 85: Sort Options (Gap 10.3)

Add SortBy enum and sort query parameter to ConnzHandler.

Files:

  • Modify: src/NATS.Server/Monitoring/ConnzHandler.cs
  • Test: tests/NATS.Server.Tests/Monitoring/ConnzSortTests.cs (create)

git commit -m "feat: add sort options to /connz (Gap 10.3)"

Task 86: Message Trace Propagation (Gap 10.4)

Add trace context header propagation across servers.

Files:

  • Modify: src/NATS.Server/Internal/MessageTraceContext.cs
  • Modify: src/NATS.Server/Events/InternalEventSystem.cs
  • Test: tests/NATS.Server.Tests/Internal/TraceContextPropagationTests.cs (create)

git commit -m "feat: add message trace propagation across servers (Gap 10.4)"

Task 87: Auth Error Events (Gap 10.5)

Add SendAuthErrorEvent publishing to $SYS.SERVER.{id}.CLIENT.AUTH.ERR.

Files:

  • Modify: src/NATS.Server/NatsServer.cs
  • Modify: src/NATS.Server/Events/InternalEventSystem.cs
  • Test: tests/NATS.Server.Tests/Events/AuthErrorEventTests.cs (create)

git commit -m "feat: add auth error event publication (Gap 10.5)"

Task 88: Full System Event Payloads (Gap 10.6)

Audit and complete all event type fields.

Files:

  • Modify: src/NATS.Server/Events/EventTypes.cs
  • Modify: src/NATS.Server/Events/InternalEventSystem.cs
  • Test: tests/NATS.Server.Tests/Events/FullEventPayloadTests.cs (create)

git commit -m "feat: complete system event payload fields (Gap 10.6)"

Task 89: Closed Connection Reason Tracking (Gap 10.7)

Populate ClosedClient.Reason consistently across all disconnect paths.

Files:

  • Modify: src/NATS.Server/NatsClient.cs:902 (MarkClosed — ensure reason set)
  • Modify: src/NATS.Server/NatsServer.cs
  • Test: tests/NATS.Server.Tests/Monitoring/ClosedReasonTests.cs (create)

git commit -m "feat: consistently populate closed connection reasons (Gap 10.7)"

Task 90: Remote Server Events (Gap 10.8)

Add RemoteServerShutdown, RemoteServerUpdate, LeafNodeConnected events.

Files:

  • Modify: src/NATS.Server/NatsServer.cs
  • Modify: src/NATS.Server/Events/InternalEventSystem.cs
  • Modify: src/NATS.Server/Events/EventSubjects.cs
  • Test: tests/NATS.Server.Tests/Events/RemoteServerEventTests.cs (create)

git commit -m "feat: add remote server events for cluster visibility (Gap 10.8)"

Task 91: Event Compression (Gap 10.9)

Add S2 compression for system events when subscriber supports it.

Files:

  • Modify: src/NATS.Server/Events/InternalEventSystem.cs
  • Test: tests/NATS.Server.Tests/Events/EventCompressionTests.cs (create)

git commit -m "feat: add S2 compression for system events (Gap 10.9)"

Task 92: OCSP Peer Events (Gap 10.10)

Add OCSP peer reject and chain validation events.

Files:

  • Modify: src/NATS.Server/NatsServer.cs
  • Modify: src/NATS.Server/Events/InternalEventSystem.cs
  • Test: tests/NATS.Server.Tests/Events/OcspEventTests.cs (create)

git commit -m "feat: add OCSP peer reject and chain validation events (Gap 10.10)"

Task 93: WebSocket-Specific TLS (Gap 15.1)

Add separate TLS configuration for WebSocket listener.

Files:

  • Modify: src/NATS.Server/WebSocket/WsUpgrade.cs
  • Modify: src/NATS.Server/NatsServer.cs
  • Test: tests/NATS.Server.Tests/WebSocket/WebSocketTlsTests.cs (create)

git commit -m "feat: add WebSocket-specific TLS configuration (Gap 15.1)"

Phase 8 Exit Gate

# Targeted tests
dotnet test tests/NATS.Server.Tests --filter "FullyQualifiedName~ClosedConnectionRingBuffer|FullyQualifiedName~ConnzAccountFilter|FullyQualifiedName~ConnzSort|FullyQualifiedName~TraceContextPropagation|FullyQualifiedName~AuthErrorEvent|FullyQualifiedName~FullEventPayload|FullyQualifiedName~ClosedReason|FullyQualifiedName~RemoteServerEvent|FullyQualifiedName~EventCompression|FullyQualifiedName~OcspEvent|FullyQualifiedName~WebSocketTls" -v normal

# FULL TEST SUITE CHECKPOINT (Phase 8 complete — FINAL)
dotnet test

Update test parity DB for all Phase 8 tests.


Post-Implementation

After all 8 phases:

  1. Update docs/structuregaps.md — mark all 93 gaps as IMPLEMENTED with commit references
  2. Final test parity DB update — ensure all new tests are registered
  3. Commit final updates

git add docs/structuregaps.md docs/test_parity.db
git commit -m "docs: mark all 93 remaining gaps as IMPLEMENTED"