diff --git a/docs/plans/2026-02-24-full-production-parity-design.md b/docs/plans/2026-02-24-full-production-parity-design.md
new file mode 100644
index 0000000..e929016
--- /dev/null
+++ b/docs/plans/2026-02-24-full-production-parity-design.md
@@ -0,0 +1,223 @@
+# Full Production Parity Design
+
+> **For Claude:** REQUIRED SUB-SKILL: Use superpowers-extended-cc:executing-plans to implement this plan task-by-task.
+
+**Goal:** Close all remaining gaps between the Go NATS server and the .NET port, in both implementation code and test coverage, to reach full production parity.
+
+**Current state:** 1,081 tests passing; core pub/sub, JetStream basics, MQTT packet parsing, and JWT claims are ported. Three major implementation gaps remain: RAFT consensus, FileStore block engine, and internal data structures (AVL, subject tree, GSL, time hash wheel).
+
+**Approach:** 6-wave slice-by-slice TDD, ordered by dependency. Waves 2-5 each build on the prior wave's production code and tests; Wave 6 depends only on Wave 1. Within each wave, independent subsystems are handled by parallel subagents.
+
+---
+
+## Gap Analysis Summary
+
+### Implementation Gaps
+
+| Gap | Go Source | .NET Status | Impact |
+|-----|-----------|-------------|--------|
+| RAFT consensus | `server/raft.go` (5,800 lines) | Missing entirely | Blocks clustered JetStream |
+| FileStore block engine | `server/filestore.go` (337KB) | Flat JSONL stub | Blocks persistent JetStream |
+| Internal data structures | `server/avl/`, `server/stree/`, `server/gsl/`, `server/thw/` | Missing entirely | Blocks FileStore + RAFT |
+
+### Test Coverage Gap
+
+- Go server tests: ~2,937 test functions
+- .NET tests: 1,081 (~36.8% of the Go test count)
+- Gap: ~1,856 tests across all subsystems
+
+---
+
+## Wave 1: Inventory + Scaffolding
+
+**Purpose:** Establish project structure, create stub files, set up namespaces.
+
+**Deliverables:**
+- Namespace scaffolding: `NATS.Server.Internal.Avl`, `NATS.Server.Internal.SubjectTree`, `NATS.Server.Internal.Gsl`, `NATS.Server.Internal.TimeHashWheel`
+- Stub interfaces for FileStore block engine
+- Stub interfaces for RAFT node, log, transport
+- Test project directory structure for all new subsystems
+
+**Tests:** 0 (scaffolding only)
+
+---
+
+## Wave 2: Internal Data Structures
+
+**Purpose:** Port Go's internal data structures that FileStore and RAFT depend on.
+
+### AVL Tree (`server/avl/`)
+- Sparse sequence set backed by AVL-balanced binary tree
+- Used for JetStream ack tracking (consumer pending sets)
+- Key operations: `Insert`, `Delete`, `Contains`, `Range`, `Size`
+- Go reference: `server/avl/seqset.go`
+- Port as `NATS.Server.Internal.Avl.SequenceSet`
+- ~15 tests from Go's `TestSequenceSet*`
+
+### Subject Tree (`server/stree/`)
+- Trie for per-subject state in streams (sequence tracking, last-by-subject)
+- Supports wildcard iteration (`*`, `>`)
+- Go reference: `server/stree/stree.go`
+- Port as `NATS.Server.Internal.SubjectTree.SubjectTree`
+- ~15 tests from Go's `TestSubjectTree*`
+
+### Generic Subject List (`server/gsl/`)
+- Optimized trie for subscription matching (alternative to SubList for specific paths)
+- Go reference: `server/gsl/gsl.go`
+- Port as `NATS.Server.Internal.Gsl.GenericSubjectList`
+- ~15 tests from Go's `TestGSL*`
+
+### Time Hash Wheel (`server/thw/`)
+- Efficient TTL expiration using hash wheel (O(1) insert/cancel, O(bucket) tick)
+- Used for message expiry in MemStore and FileStore
+- Go reference: `server/thw/thw.go`
+- Port as `NATS.Server.Internal.TimeHashWheel.TimeHashWheel`
+- ~15 tests from Go's `TestTimeHashWheel*`
+
+**Total tests:** ~60
+
+---
+
+## Wave 3: FileStore Block Engine
+
+**Purpose:** Replace the flat JSONL FileStore stub with Go-compatible block-based storage.
+
+### Design Decisions
+- **Behavioral equivalence:** same 64MB block boundaries and semantics, not byte-level Go file compatibility
+- **Block format:** Each block is a separate file containing sequential messages with headers
+- **Compression:** S2 (Snappy variant) per-block, using IronSnappy or equivalent .NET library
+- **Encryption:** AES-GCM per-block (matching Go's encryption support)
+- **Recovery:** Block-level recovery on startup (scan for valid messages, rebuild index)
+
+### Components
+1. **Block Manager**: manages block files, rotation at 64MB, compaction
+2. **Message Encoding**: per-message header (sequence, timestamp, subject, data length) + payload
+3. **Index Layer**: in-memory index mapping sequence → block + offset
+4. **Subject Index**: per-subject first/last sequence tracking using SubjectTree (Wave 2)
+5. **Purge/Compact**: subject-based purge, sequence-based purge, compaction
+6. **Recovery**: startup block scanning, index rebuild
+
+### Go Reference Files
+- `server/filestore.go`: main implementation
+- `server/filestore_test.go`: test suite
+
+**Total tests:** ~80 (store/load, block rotation, compression, encryption, purge, recovery, subject filtering)
+
+---
+
+## Wave 4: RAFT Consensus
+
+**Purpose:** Faithful behavioral port of Go's RAFT implementation for clustered JetStream.
+
+### Design Decisions
+- **Faithful Go port:** not a third-party RAFT library; port Go's `raft.go` directly
+- **Same state machine semantics:** leader election, log replication, snapshots, membership changes
+- **Transport abstraction:** pluggable transport (in-process for tests, TCP for production)
+
+### Components
+1. **RAFT Node**: state machine (Follower → Candidate → Leader), term/vote tracking
+2. **Log Storage**: append-only log with compaction, backed by FileStore blocks (Wave 3)
+3. **Election**: randomized timeout, RequestVote RPC, majority quorum
+4. **Log Replication**: AppendEntries RPC, leader → follower catch-up, conflict resolution
+5. **Snapshots**: periodic state snapshots, snapshot transfer to lagging followers
+6. **Membership Changes**: joint consensus for adding/removing nodes
+7. **Transport**: RPC abstraction with in-process and TCP implementations
+
+### Go Reference Files
+- `server/raft.go`: main implementation (5,800 lines)
+- `server/raft_test.go`: test suite
+
+**Total tests:** ~70 (election, log replication, snapshots, membership, split-brain, network partition simulation)
+
+---
+
+## Wave 5: JetStream Clustering + Concurrency
+
+**Purpose:** Wire RAFT into JetStream for clustered operation; add NORACE concurrency tests.
+
+### Components
+1. **Meta-Controller**: cluster-wide RAFT group for stream/consumer placement
+   - Ports Go's `jetStreamCluster` struct
+   - Routes `$JS.API.*` requests through meta-group leader
+   - Tests from Go's `TestJetStreamClusterCreate`, `TestJetStreamClusterStreamLeaderStepDown`
+
+2. **Per-Stream RAFT Groups**: each R>1 stream gets its own RAFT group
+   - Leader accepts publishes, proposes entries, followers apply
+   - Tests: create R3 stream, publish, verify all replicas, step down, verify new leader
+
+3. **Per-Consumer RAFT Groups**: consumer ack state replicated via RAFT
+   - Tests: ack on leader, verify ack floor propagation, consumer failover
+
+4. **NORACE Concurrency Suite**: Go's `norace_test.go` tests (the ones excluded from race-detector builds) ported to `Task.WhenAll` patterns
+   - Concurrent pub/sub on same stream
+   - Concurrent consumer creates
+   - Concurrent stream purge during publish
+
+### Go Reference Files
+- `server/jetstream_cluster.go`, `server/jetstream_cluster_test.go`
+- `server/norace_test.go`
+
+**Total tests:** ~100
+
+---
+
+## Wave 6: Remaining Subsystem Test Suites
+
+**Purpose:** Port remaining Go test functions across all subsystems not covered by Waves 2-5.
+
+### Subsystems
+
+| Subsystem | Go Tests | Existing .NET | Gap | Files |
+|-----------|----------|---------------|-----|-------|
+| Config reload | ~92 | 3 | ~89 | `Configuration/` |
+| MQTT bridge | ~123 | 50 | ~73 | `Mqtt/` |
+| Leaf nodes | ~110 | 2 | ~108 | `LeafNodes/` |
+| Accounts/auth | ~64 | 15 | ~49 | `Accounts/` |
+| Gateway | ~87 | 2 | ~85 | `Gateways/` |
+| Routes | ~73 | 2 | ~71 | `Routes/` |
+| Monitoring | ~45 | 7 | ~38 | `Monitoring/` |
+| Client protocol | ~120 | 30 | ~90 | root test dir |
+| JetStream API | ~200 | 20 | ~180 | `JetStream/` |
+
+### Approach
+- Each subsystem is an independent parallel subagent task
+- Tests organized by .NET namespace matching existing conventions
+- Each test file has header comment mapping to Go source test function names
+- Self-contained test helpers duplicated per file (no shared TestHelpers)
+- Gate verification between subsystem batches
+
+**Total tests:** ~780-850
+
+---
+
+## Dependency Graph
+
+```
+Wave 1 (Scaffolding) ──┬──► Wave 2 (Data Structures) ──► Wave 3 (FileStore) ──► Wave 4 (RAFT) ──► Wave 5 (Clustering)
+                       │
+                       └──► Wave 6 (Subsystem Suites) [parallel, independent of Waves 2-5]
+```
+
+Wave 6 subsystems are mutually independent and can execute in parallel. Waves 2-5 are sequential.
+
+---
+
+## Estimated Totals
+
+| Metric | Value |
+|--------|-------|
+| New implementation code | ~15,000-20,000 lines |
+| New test code | ~12,000-15,000 lines |
+| New tests | ~1,160 |
+| Final test count | ~2,241 |
+| Final Go parity | ~75% of Go test functions |
+
+## Key Conventions
+
+- xUnit 3 + Shouldly assertions (never `Assert.*`)
+- NSubstitute for mocking
+- Go reference comments on each ported test: `// Go: TestFunctionName server/file.go:line`
+- Self-contained helpers per test file
+- C# 14 idioms: primary constructors, collection expressions, file-scoped namespaces
+- TDD: write failing test first, then minimal implementation
+- Gated commits between waves
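+
+A minimal sketch of how the conventions above might look in a single ported test file. The test namespace, the test name, the `FakeSequenceSet` helper, and the Go reference in the comment are illustrative assumptions rather than existing code; the helper only stands in for the `SequenceSet` type planned for Wave 2.
+
+```csharp
+using System.Collections.Generic;
+using Shouldly;
+using Xunit;
+
+// File-scoped namespace (hypothetical test namespace).
+namespace NATS.Server.Tests.Internal.Avl;
+
+public class SequenceSetConventionTests
+{
+    // Go: TestFunctionName server/file.go:line  (placeholder, not a real reference)
+    [Fact]
+    public void Contains_reports_membership_for_seeded_sequences()
+    {
+        // Collection expression passed to the helper's primary constructor.
+        var set = new FakeSequenceSet([1, 5, 9]);
+
+        // Shouldly assertions only, never Assert.*.
+        set.Contains(5).ShouldBeTrue();
+        set.Contains(6).ShouldBeFalse();
+        set.Size.ShouldBe(3);
+    }
+
+    // Self-contained helper local to this file (no shared TestHelpers).
+    // Hypothetical stand-in for the real SequenceSet.
+    private sealed class FakeSequenceSet(IEnumerable<ulong> seed)
+    {
+        private readonly SortedSet<ulong> _items = [.. seed];
+
+        public int Size => _items.Count;
+
+        public bool Contains(ulong seq) => _items.Contains(seq);
+    }
+}
+```
+
+Under TDD, a test like this would be written against the real type first and fail until the minimal implementation lands; the stand-in here exists only so the sketch compiles on its own.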