6-wave implementation plan covering RAFT consensus, FileStore block engine, internal data structures, JetStream clustering, and remaining subsystem test suites. Targets ~1,160 new tests for ~75% Go parity.
# Full Production Parity Design

> **For Claude:** REQUIRED SUB-SKILL: Use superpowers-extended-cc:executing-plans to implement this plan task-by-task.

**Goal:** Close all remaining gaps between the Go NATS server and the .NET port, in both implementation code and test coverage, achieving full production parity.

**Current state:** 1,081 tests passing; core pub/sub, JetStream basics, MQTT packet parsing, and JWT claims are ported. Three major implementation gaps remain: RAFT consensus, the FileStore block engine, and internal data structures (AVL, subject tree, GSL, time hash wheel).

**Approach:** 6-wave slice-by-slice TDD, ordered by dependency. Waves 2-5 each build on the prior wave's production code and tests; Wave 6 depends only on the Wave 1 scaffolding. Within each wave, independent subsystems are assigned to parallel subagents.

---

## Gap Analysis Summary

### Implementation Gaps

| Gap | Go Source | .NET Status | Impact |
|-----|-----------|-------------|--------|
| RAFT consensus | `server/raft.go` (5,800 lines) | Missing entirely | Blocks clustered JetStream |
| FileStore block engine | `server/filestore.go` (337KB) | Flat JSONL stub | Blocks persistent JetStream |
| Internal data structures | `server/avl/`, `server/stree/`, `server/gsl/`, `server/thw/` | Missing entirely | Blocks FileStore + RAFT |

### Test Coverage Gap

- Go server tests: ~2,937 test functions
- .NET tests: 1,081 (~36.8% of the Go count)
- Gap: ~1,856 tests across all subsystems

---

## Wave 1: Inventory + Scaffolding

**Purpose:** Establish project structure, create stub files, set up namespaces.

**Deliverables:**
- Namespace scaffolding: `NATS.Server.Internal.Avl`, `NATS.Server.Internal.SubjectTree`, `NATS.Server.Internal.Gsl`, `NATS.Server.Internal.TimeHashWheel`
- Stub interfaces for FileStore block engine
- Stub interfaces for RAFT node, log, transport
- Test project directory structure for all new subsystems

**Tests:** 0 (scaffolding only)

---

## Wave 2: Internal Data Structures

**Purpose:** Port Go's internal data structures that FileStore and RAFT depend on.

### AVL Tree (`server/avl/`)
- Sparse sequence set backed by AVL-balanced binary tree
- Used for JetStream ack tracking (consumer pending sets)
- Key operations: `Insert`, `Delete`, `Contains`, `Range`, `Size`
- Go reference: `server/avl/seqset.go`
- Port as `NATS.Server.Internal.Avl.SequenceSet`
- ~15 tests from Go's `TestSequenceSet*`
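
To make the expected behavior of the set concrete, here is a minimal sketch of an AVL-balanced sequence set. It is not the code in `server/avl/seqset.go`: the node layout (one sequence per node), the `kid` array, and the helper names are assumptions for illustration, and `Delete` is omitted to keep the example short.

```go
package main

import "fmt"

// node holds one sequence number. Hypothetical layout for illustration only.
type node struct {
	seq    uint64
	height int
	kid    [2]*node // 0 = left, 1 = right
}

func height(n *node) int {
	if n == nil {
		return 0
	}
	return n.height
}

// update recomputes a node's height from its children.
func update(n *node) {
	n.height = 1 + height(n.kid[0])
	if h := 1 + height(n.kid[1]); h > n.height {
		n.height = h
	}
}

// rotate lifts child kid[d] above n and returns the new subtree root.
func rotate(n *node, d int) *node {
	c := n.kid[d]
	n.kid[d], c.kid[1-d] = c.kid[1-d], n
	update(n)
	update(c)
	return c
}

// rebalance restores the AVL invariant (child heights differ by at most 1).
func rebalance(n *node) *node {
	update(n)
	for d := 0; d < 2; d++ {
		c := n.kid[d]
		if height(c)-height(n.kid[1-d]) > 1 {
			if height(c.kid[1-d]) > height(c.kid[d]) {
				n.kid[d] = rotate(c, 1-d) // zig-zag case needs a double rotation
			}
			return rotate(n, d)
		}
	}
	return n
}

// side picks the child to descend into for seq.
func side(seq uint64, n *node) int {
	if seq > n.seq {
		return 1
	}
	return 0
}

// SequenceSet is an ordered set of sequence numbers.
type SequenceSet struct {
	root *node
	size int
}

func (s *SequenceSet) Insert(seq uint64) { s.root = s.insert(s.root, seq) }

func (s *SequenceSet) insert(n *node, seq uint64) *node {
	if n == nil {
		s.size++
		return &node{seq: seq, height: 1}
	}
	if seq == n.seq {
		return n // already present
	}
	d := side(seq, n)
	n.kid[d] = s.insert(n.kid[d], seq)
	return rebalance(n)
}

func (s *SequenceSet) Contains(seq uint64) bool {
	n := s.root
	for n != nil && n.seq != seq {
		n = n.kid[side(seq, n)]
	}
	return n != nil
}

func (s *SequenceSet) Size() int { return s.size }

// Range visits sequences in ascending order until f returns false.
func (s *SequenceSet) Range(f func(uint64) bool) { walk(s.root, f) }

func walk(n *node, f func(uint64) bool) bool {
	return n == nil || (walk(n.kid[0], f) && f(n.seq) && walk(n.kid[1], f))
}

func main() {
	var s SequenceSet
	for _, seq := range []uint64{5, 1, 9, 3, 7, 5} {
		s.Insert(seq)
	}
	s.Range(func(seq uint64) bool { fmt.Println(seq); return true })
	fmt.Println("size:", s.Size())
}
```

A ported `SequenceSet` test could assert the same three properties the sketch exposes: duplicates do not change `Size`, `Range` yields ascending order, and the tree height stays logarithmic under sequential inserts.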

### Subject Tree (`server/stree/`)
- Trie for per-subject state in streams (sequence tracking, last-by-subject)
- Supports wildcard iteration (`*`, `>`)
- Go reference: `server/stree/stree.go`
- Port as `NATS.Server.Internal.SubjectTree.SubjectTree<T>`
- ~15 tests from Go's `TestSubjectTree*`
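
Wildcard iteration relies on NATS subject semantics: `*` matches exactly one token, and `>` matches one or more trailing tokens. The sketch below shows that matching rule applied to stored literal subjects. The function names are hypothetical, and a flat map with a linear scan stands in for the tree, so it illustrates the expected results rather than the trie itself.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// matchesFilter reports whether a literal subject matches a filter that may
// contain wildcards: "*" matches exactly one token, ">" matches one or more
// remaining tokens and is only valid as the last token of the filter.
func matchesFilter(subject, filter string) bool {
	st := strings.Split(subject, ".")
	ft := strings.Split(filter, ".")
	for i, f := range ft {
		if f == ">" {
			return i == len(ft)-1 && len(st) > i
		}
		if i >= len(st) {
			return false
		}
		if f != "*" && f != st[i] {
			return false
		}
	}
	return len(st) == len(ft)
}

// MatchingSubjects returns the stored subjects selected by filter, sorted.
// The map is a flat stand-in for the per-subject state held by the tree.
func MatchingSubjects(lastSeq map[string]uint64, filter string) []string {
	var out []string
	for subject := range lastSeq {
		if matchesFilter(subject, filter) {
			out = append(out, subject)
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	lastSeq := map[string]uint64{
		"orders.eu.new": 12,
		"orders.us.new": 15,
		"orders.eu":     9,
		"payments.eu":   4,
	}
	fmt.Println(MatchingSubjects(lastSeq, "orders.*.new"))
	fmt.Println(MatchingSubjects(lastSeq, "orders.>"))
}
```

Whatever internal layout the port uses, a wildcard iteration over the tree should visit exactly the subjects this rule selects.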

### Generic Subject List (`server/gsl/`)
- Optimized trie for subscription matching (alternative to SubList for specific paths)
- Go reference: `server/gsl/gsl.go`
- Port as `NATS.Server.Internal.Gsl.GenericSubjectList<T>`
- ~15 tests from Go's `TestGSL*`
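
Subscription matching runs in the opposite direction from subject-tree iteration: the trie stores patterns that may contain wildcards, and a lookup supplies a literal subject. A minimal sketch of that idea follows. The `List`, `Insert`, and `Match` names are assumptions rather than the API of `server/gsl/gsl.go`, and the sketch assumes lookup subjects never contain wildcard tokens.

```go
package main

import (
	"fmt"
	"strings"
)

// node is one trie level keyed by subject token. The wildcard tokens "*" and
// ">" are stored under their own keys like any other token.
type node[T any] struct {
	next map[string]*node[T]
	vals []T // values whose pattern ends at this node
}

// List maps subscription patterns to values (hypothetical type name).
type List[T any] struct{ root node[T] }

// Insert registers v under a pattern such as "orders.*" or "orders.>".
func (l *List[T]) Insert(pattern string, v T) {
	n := &l.root
	for _, tok := range strings.Split(pattern, ".") {
		if n.next == nil {
			n.next = map[string]*node[T]{}
		}
		child := n.next[tok]
		if child == nil {
			child = &node[T]{}
			n.next[tok] = child
		}
		n = child
	}
	n.vals = append(n.vals, v)
}

// Match returns the values of every pattern that matches a literal subject.
func (l *List[T]) Match(subject string) []T {
	var out []T
	match(&l.root, strings.Split(subject, "."), &out)
	return out
}

func match[T any](n *node[T], toks []string, out *[]T) {
	if len(toks) == 0 {
		*out = append(*out, n.vals...)
		return
	}
	if fwc := n.next[">"]; fwc != nil {
		*out = append(*out, fwc.vals...) // ">" consumes all remaining tokens
	}
	if pwc := n.next["*"]; pwc != nil {
		match(pwc, toks[1:], out) // "*" consumes exactly one token
	}
	if lit := n.next[toks[0]]; lit != nil {
		match(lit, toks[1:], out)
	}
}

func main() {
	var l List[string]
	l.Insert("orders.*", "one-level")
	l.Insert("orders.>", "everything-under-orders")
	l.Insert("orders.eu.new", "exact")
	fmt.Println(l.Match("orders.eu.new"))
	fmt.Println(l.Match("orders.eu"))
}
```

Lookup cost depends on the number of tokens in the subject and the wildcard branches encountered, not on the total number of stored patterns, which is the property that makes a trie attractive here.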

### Time Hash Wheel (`server/thw/`)
- Efficient TTL expiration using hash wheel (O(1) insert/cancel, O(bucket) tick)
- Used for message expiry in MemStore and FileStore
- Go reference: `server/thw/thw.go`
- Port as `NATS.Server.Internal.TimeHashWheel.TimeHashWheel<T>`
- ~15 tests from Go's `TestTimeHashWheel*`

**Total tests:** ~60

---

## Wave 3: FileStore Block Engine

**Purpose:** Replace the flat JSONL FileStore stub with block-based storage that is behaviorally equivalent to Go's.

### Design Decisions
- **Behavioral equivalence:** same 64MB block boundaries and semantics, not byte-level Go file compatibility
- **Block format:** Each block is a separate file containing sequential messages with headers
- **Compression:** S2 (Snappy variant) per-block, using IronSnappy or equivalent .NET library
- **Encryption:** AES-GCM per-block (matching Go's encryption support)
- **Recovery:** Block-level recovery on startup (scan for valid messages, rebuild index)

### Components
1. **Block Manager**: manages block files, rotation at 64MB, compaction
2. **Message Encoding**: per-message header (sequence, timestamp, subject, data length) + payload
3. **Index Layer**: in-memory index mapping sequence → block + offset
4. **Subject Index**: per-subject first/last sequence tracking using SubjectTree (Wave 2)
5. **Purge/Compact**: subject-based purge, sequence-based purge, compaction
6. **Recovery**: startup block scanning, index rebuild
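
The message encoding, index, and recovery components fit together as sketched below. Because the plan targets behavioral equivalence rather than byte-level compatibility, the record layout here is purely an assumption: field order, little-endian integers, and the trailing CRC32 are illustrative choices and not Go's on-disk format. The index covers a single block and maps sequence to offset.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
	"hash/crc32"
)

// Msg is one stored message.
type Msg struct {
	Seq     uint64
	TS      int64
	Subject string
	Data    []byte
}

// Hypothetical header: seq (8) + timestamp (8) + subject length (2) + data length (4).
const hdrLen = 8 + 8 + 2 + 4

var errCorrupt = errors.New("truncated or corrupt record")

// Encode appends one record to buf: header, subject, payload, CRC32.
func Encode(buf []byte, m Msg) []byte {
	start := len(buf)
	buf = binary.LittleEndian.AppendUint64(buf, m.Seq)
	buf = binary.LittleEndian.AppendUint64(buf, uint64(m.TS))
	buf = binary.LittleEndian.AppendUint16(buf, uint16(len(m.Subject)))
	buf = binary.LittleEndian.AppendUint32(buf, uint32(len(m.Data)))
	buf = append(buf, m.Subject...)
	buf = append(buf, m.Data...)
	return binary.LittleEndian.AppendUint32(buf, crc32.ChecksumIEEE(buf[start:]))
}

// Decode reads one record from the front of b and returns it with its encoded size.
func Decode(b []byte) (Msg, int, error) {
	if len(b) < hdrLen {
		return Msg{}, 0, errCorrupt
	}
	subjLen := int(binary.LittleEndian.Uint16(b[16:]))
	dataLen := int(binary.LittleEndian.Uint32(b[18:]))
	end := hdrLen + subjLen + dataLen
	if len(b) < end+4 || crc32.ChecksumIEEE(b[:end]) != binary.LittleEndian.Uint32(b[end:]) {
		return Msg{}, 0, errCorrupt
	}
	return Msg{
		Seq:     binary.LittleEndian.Uint64(b),
		TS:      int64(binary.LittleEndian.Uint64(b[8:])),
		Subject: string(b[hdrLen : hdrLen+subjLen]),
		Data:    b[hdrLen+subjLen : end],
	}, end + 4, nil
}

// Recover scans a block from the start and rebuilds the sequence -> offset
// index, stopping at the first record that fails validation.
func Recover(block []byte) map[uint64]int {
	index := map[uint64]int{}
	for off := 0; off < len(block); {
		m, n, err := Decode(block[off:])
		if err != nil {
			break
		}
		index[m.Seq] = off
		off += n
	}
	return index
}

func main() {
	var block []byte
	for seq := uint64(1); seq <= 3; seq++ {
		block = Encode(block, Msg{Seq: seq, TS: int64(seq) * 100, Subject: "orders.new", Data: []byte("payload")})
	}
	torn := block[:len(block)-3] // simulate a partially written final record
	fmt.Println("recovered records:", len(Recover(torn)))
}
```

A recovery test in the port can follow the same shape: write several messages, truncate the tail of the block, and assert that the rebuilt index contains only the intact records.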

### Go Reference Files
- `server/filestore.go`: main implementation
- `server/filestore_test.go`: test suite

**Total tests:** ~80 (store/load, block rotation, compression, encryption, purge, recovery, subject filtering)

---

## Wave 4: RAFT Consensus

**Purpose:** Faithful behavioral port of Go's RAFT implementation for clustered JetStream.

### Design Decisions
- **Faithful Go port:** not a third-party RAFT library; port Go's `raft.go` directly
- **Same state machine semantics:** leader election, log replication, snapshots, membership changes
- **Transport abstraction:** pluggable transport (in-process for tests, TCP for production)

### Components
1. **RAFT Node**: state machine (Follower → Candidate → Leader), term/vote tracking
2. **Log Storage**: append-only log with compaction, backed by FileStore blocks (Wave 3)
3. **Election**: randomized timeout, RequestVote RPC, majority quorum
4. **Log Replication**: AppendEntries RPC, leader → follower catch-up, conflict resolution
5. **Snapshots**: periodic state snapshots, snapshot transfer to lagging followers
6. **Membership Changes**: joint consensus for adding/removing nodes
7. **Transport**: RPC abstraction with in-process and TCP implementations
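
The election component can be illustrated with the generic vote rules of the Raft algorithm: reject stale terms, grant at most one vote per term, and only vote for a candidate whose log is at least as up to date. The sketch below is not a transcription of `server/raft.go`. All type and function names are hypothetical, timeouts are omitted, and direct method calls stand in for the in-process transport.

```go
package main

import "fmt"

type State int

const (
	Follower State = iota
	Candidate
	Leader
)

type VoteRequest struct {
	Term         uint64
	Candidate    string
	LastLogIndex uint64
	LastLogTerm  uint64
}

type Node struct {
	ID           string
	State        State
	Term         uint64
	VotedFor     string
	LastLogIndex uint64
	LastLogTerm  uint64
}

// HandleVoteRequest decides whether this node grants its vote.
func (n *Node) HandleVoteRequest(r VoteRequest) bool {
	if r.Term < n.Term {
		return false // stale candidate
	}
	if r.Term > n.Term {
		// Newer term: step down and forget any vote cast in the old term.
		n.Term, n.State, n.VotedFor = r.Term, Follower, ""
	}
	upToDate := r.LastLogTerm > n.LastLogTerm ||
		(r.LastLogTerm == n.LastLogTerm && r.LastLogIndex >= n.LastLogIndex)
	if (n.VotedFor == "" || n.VotedFor == r.Candidate) && upToDate {
		n.VotedFor = r.Candidate
		return true
	}
	return false
}

// Quorum returns the number of votes that form a majority.
func Quorum(clusterSize int) int { return clusterSize/2 + 1 }

// RunElection has c start a new term, vote for itself, and poll its peers.
func RunElection(c *Node, peers []*Node) bool {
	c.Term++
	c.State, c.VotedFor = Candidate, c.ID
	req := VoteRequest{Term: c.Term, Candidate: c.ID, LastLogIndex: c.LastLogIndex, LastLogTerm: c.LastLogTerm}
	votes := 1
	for _, p := range peers {
		if p.HandleVoteRequest(req) {
			votes++
		}
	}
	if votes >= Quorum(len(peers)+1) {
		c.State = Leader
		return true
	}
	c.State = Follower
	return false
}

func main() {
	a := &Node{ID: "a", Term: 1, LastLogIndex: 5, LastLogTerm: 1}
	b := &Node{ID: "b", Term: 1, LastLogIndex: 5, LastLogTerm: 1}
	c := &Node{ID: "c", Term: 1, LastLogIndex: 7, LastLogTerm: 1}
	fmt.Println("a elected:", RunElection(a, []*Node{b, c}))
}
```

In the example, node `c` has a longer log and refuses its vote, but `a` still reaches the two-of-three majority through its own vote and `b`'s. The split-brain and partition tests listed for this wave exercise the same rules under message loss.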

### Go Reference Files
- `server/raft.go`: main implementation (5,800 lines)
- `server/raft_test.go`: test suite

**Total tests:** ~70 (election, log replication, snapshots, membership, split-brain, network partition simulation)

---

## Wave 5: JetStream Clustering + Concurrency

**Purpose:** Wire RAFT into JetStream for clustered operation; add NORACE concurrency tests.

### Components
1. **Meta-Controller**: cluster-wide RAFT group for stream/consumer placement
   - Ports Go's `jetStreamCluster` struct
   - Routes `$JS.API.*` requests through meta-group leader
   - Tests from Go's `TestJetStreamClusterCreate`, `TestJetStreamClusterStreamLeaderStepDown`

2. **Per-Stream RAFT Groups**: each R>1 stream gets its own RAFT group
   - Leader accepts publishes, proposes entries, followers apply
   - Tests: create R3 stream, publish, verify all replicas, step down, verify new leader

3. **Per-Consumer RAFT Groups**: consumer ack state replicated via RAFT
   - Tests: ack on leader, verify ack floor propagation, consumer failover

4. **NORACE Concurrency Suite**: tests from Go's `norace_test.go` (built only when the race detector is off) ported to `Task.WhenAll` patterns
   - Concurrent pub/sub on same stream
   - Concurrent consumer creates
   - Concurrent stream purge during publish

### Go Reference Files
- `server/jetstream_cluster.go`, `server/jetstream_cluster_test.go`
- `server/norace_test.go`

**Total tests:** ~100

---

## Wave 6: Remaining Subsystem Test Suites

**Purpose:** Port remaining Go test functions across all subsystems not covered by Waves 2-5.

### Subsystems

| Subsystem | Go Tests | Existing .NET | Gap | Files |
|-----------|----------|---------------|-----|-------|
| Config reload | ~92 | 3 | ~89 | `Configuration/` |
| MQTT bridge | ~123 | 50 | ~73 | `Mqtt/` |
| Leaf nodes | ~110 | 2 | ~108 | `LeafNodes/` |
| Accounts/auth | ~64 | 15 | ~49 | `Accounts/` |
| Gateway | ~87 | 2 | ~85 | `Gateways/` |
| Routes | ~73 | 2 | ~71 | `Routes/` |
| Monitoring | ~45 | 7 | ~38 | `Monitoring/` |
| Client protocol | ~120 | 30 | ~90 | root test dir |
| JetStream API | ~200 | 20 | ~180 | `JetStream/` |

### Approach
- Each subsystem is an independent parallel subagent task
- Tests organized by .NET namespace matching existing conventions
- Each test file has header comment mapping to Go source test function names
- Self-contained test helpers duplicated per file (no shared TestHelpers)
- Gate verification between subsystem batches

**Total tests:** ~780-850

---

## Dependency Graph

```
Wave 1 (Scaffolding) ──┬──► Wave 2 (Data Structures) ──► Wave 3 (FileStore) ──► Wave 4 (RAFT) ──► Wave 5 (Clustering)
                       │
                       └──► Wave 6 (Subsystem Suites) [parallel, independent of Waves 2-5]
```

Wave 6 subsystems are mutually independent and can execute in parallel. Waves 2-5 are sequential.

---

## Estimated Totals

| Metric | Value |
|--------|-------|
| New implementation code | ~15,000-20,000 lines |
| New test code | ~12,000-15,000 lines |
| New tests | ~1,160 |
| Final test count | ~2,241 |
| Final Go parity | ~75% of Go test functions |
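
The test totals can be cross-checked against the per-wave estimates. The short program below simply re-adds the numbers stated in this plan; only the structure and identifier names are new. Wave 6 is the only wave given as a range, so the result is a range too.

```go
package main

import "fmt"

const (
	goTests      = 2937 // Go server test functions (from the gap analysis)
	currentTests = 1081 // .NET tests passing today
)

// wave holds the low and high test estimate for one wave.
type wave struct {
	name      string
	low, high int
}

var waves = []wave{
	{"Wave 1: scaffolding", 0, 0},
	{"Wave 2: data structures", 60, 60},
	{"Wave 3: FileStore", 80, 80},
	{"Wave 4: RAFT", 70, 70},
	{"Wave 5: clustering + concurrency", 100, 100},
	{"Wave 6: subsystem suites", 780, 850},
}

// Totals sums the low and high estimates across all waves.
func Totals() (low, high int) {
	for _, w := range waves {
		low += w.low
		high += w.high
	}
	return low, high
}

// Parity returns the share of Go test functions covered after adding newTests.
func Parity(newTests int) float64 {
	return 100 * float64(currentTests+newTests) / goTests
}

func main() {
	low, high := Totals()
	fmt.Printf("new tests:   %d-%d\n", low, high)
	fmt.Printf("final count: %d-%d\n", currentTests+low, currentTests+high)
	fmt.Printf("parity:      %.1f%%-%.1f%%\n", Parity(low), Parity(high))
}
```

The sums come to 1,090-1,160 new tests, a final count of 2,171-2,241, and 73.9%-76.3% parity, so the figures in the table correspond to the upper end of the Wave 6 estimate.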

## Key Conventions

- xUnit 3 + Shouldly assertions (never `Assert.*`)
- NSubstitute for mocking
- Go reference comments on each ported test: `// Go: TestFunctionName server/file.go:line`
- Self-contained helpers per test file
- C# 14 idioms: primary constructors, collection expressions, file-scoped namespaces
- TDD: write failing test first, then minimal implementation
- Gated commits between waves