docs: add full Go parity design
Bottom-up layered approach: implementation gaps first (RAFT transport, JetStream orchestration, FileStore S2/crypto), then test ports across 3 phases targeting ~445 new tests for full Go behavioral parity.
197
docs/plans/2026-02-24-full-go-parity-design.md
Normal file
@@ -0,0 +1,197 @@
# Full Go Parity Design

> **For Claude:** REQUIRED SUB-SKILL: Use superpowers-extended-cc:executing-plans to implement this plan task-by-task.

**Goal:** Close all remaining implementation and test gaps between the Go NATS server reference and the .NET port, achieving full behavioral parity.

**Architecture:** Bottom-up layered approach: fill implementation gaps first (RAFT transport, JetStream orchestration, FileStore crypto/compression), then port the remaining Go tests against working code. Independent subsystems are handled by parallel subagents.

**Tech Stack:** .NET 10 / C# 14, xUnit 3, Shouldly, IronSnappy (S2), System.Security.Cryptography (ChaCha20Poly1305, AesGcm)

---

## Current State

- **Go server tests:** ~3,051 test functions across 104 files (~227,000 lines)
- **.NET server tests:** ~2,606 test methods across 325 files
- **Test gap:** ~445 tests (the .NET suite covers ~85% of the Go test count)
- **Implementation gaps:** 4 critical (RAFT transport, JetStream orchestration, FileStore crypto, FileStore compression)

## Phase 1: Implementation Gap Closure

### 1A. RAFT Network Transport

**Files:** `src/NATS.Server/Raft/NatsRaftTransport.cs` (new), `src/NATS.Server/Raft/RaftTransport.cs` (keep InMemory)

Add `NatsRaftTransport` that routes RAFT RPCs over internal NATS subjects (`$NRG.<group>.*`), matching Go's approach exactly. Keep `InMemoryRaftTransport` for unit tests.

- `AppendEntries` -> publish to `$NRG.<group>.AE.<followerId>`
- `RequestVote` -> request/reply on `$NRG.<group>.RV.<voterId>`
- `InstallSnapshot` -> chunked publish on `$NRG.<group>.IS.<followerId>`
- Wire format: binary-encoded `RaftLogEntry` / `VoteRequest` / `VoteResponse`
- Transport registers internal subscriptions via the server's existing `InternalClient`

**Go reference:** `server/raft.go` lines 100-300 (RAFT group subject handling)

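The subject layout above can be sketched as a small builder. This is only an illustration under assumptions: `RaftSubjects` and its method names are hypothetical and do not come from the plan or from the Go server. It also shows one detail worth keeping in mind: in NATS subject matching `*` covers exactly one token, so a single subscription spanning every RPC subject of a group would need the `>` wildcard.

```csharp
using System;

// Hypothetical helper (not part of the plan): builds the internal RAFT
// subjects listed in section 1A from a group name and a node ID.
public static class RaftSubjects
{
    public const string Prefix = "$NRG";

    public static string AppendEntries(string group, string followerId) =>
        Build(group, "AE", followerId);

    public static string RequestVote(string group, string voterId) =>
        Build(group, "RV", voterId);

    public static string InstallSnapshot(string group, string followerId) =>
        Build(group, "IS", followerId);

    // Covers all RPC subjects of one group. `>` is used because `*`
    // matches a single token and the RPC subjects have two tokens
    // after the group name.
    public static string AllForGroup(string group) =>
        $"{Prefix}.{Token(group, nameof(group))}.>";

    private static string Build(string group, string op, string nodeId) =>
        $"{Prefix}.{Token(group, nameof(group))}.{op}.{Token(nodeId, nameof(nodeId))}";

    // Rejects values that would change the token structure of the subject.
    private static string Token(string value, string name)
    {
        if (string.IsNullOrEmpty(value) ||
            value.IndexOfAny(new[] { '.', '*', '>', ' ' }) >= 0)
        {
            throw new ArgumentException("Must be a single non-empty subject token.", name);
        }
        return value;
    }
}
```

For example, `RaftSubjects.AppendEntries("meta", "n2")` yields `$NRG.meta.AE.n2`.
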
### 1B. JetStreamService Orchestration

**Files:** `src/NATS.Server/JetStream/JetStreamService.cs` (rewrite from stub)

Upgrade the stub to a lifecycle orchestrator:

- **Startup:** validate config, initialize store directory, recover stream/consumer metadata from disk, wire internal JS client, register API subscriptions (`$JS.API.>`)
- **Limits enforcement:** account-level stream/consumer/storage limits on create/update/publish paths
- **API dispatch:** route `$JS.API.*` subjects to existing handler classes (`JetStreamApiRouter` already exists)
- **Shutdown:** unsubscribe API subjects, flush pending writes, stop consumers, dispose stores
- **Stats:** track API totals, errors, and account usage

**Go reference:** `server/jetstream.go` (2,866 lines) and `server/jetstream_api.go` (5,165 lines)

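The limits-enforcement bullet could take roughly the following shape on the stream-create path. This is a minimal sketch under assumptions: the `AccountLimits` and `AccountUsage` records, the `-1` means "unlimited" convention, and the error strings are all hypothetical and are not the Go server's types or error codes.

```csharp
using System;

// Hypothetical shapes for illustration only.
// Assumption: a limit of -1 means "unlimited".
public sealed record AccountLimits(int MaxStreams, int MaxConsumers, long MaxStorageBytes);
public sealed record AccountUsage(int Streams, int Consumers, long StorageBytes);

public static class AccountLimitChecks
{
    // Returns null when the stream may be created, otherwise a
    // human-readable reason (placeholder text, not a Go error code).
    public static string? CheckStreamCreate(
        AccountLimits limits, AccountUsage usage, long reservedBytes)
    {
        if (reservedBytes < 0)
            throw new ArgumentOutOfRangeException(nameof(reservedBytes));

        if (limits.MaxStreams >= 0 && usage.Streams + 1 > limits.MaxStreams)
            return "maximum number of streams reached";

        if (limits.MaxStorageBytes >= 0 &&
            usage.StorageBytes + reservedBytes > limits.MaxStorageBytes)
            return "insufficient storage resources available";

        return null;
    }
}
```

Returning a reason rather than throwing keeps the check usable from the create, update, and publish paths, each of which would map it to its own API error response.
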
### 1C. FileStore S2 Compression + Real Encryption

**Files:** `src/NATS.Server/JetStream/Storage/FileStore.cs` (modify), new codec files

Replace current Deflate + XOR with Go-parity algorithms:

- **Compression:** S2 codec via IronSnappy NuGet (or equivalent). IronSnappy implements the Snappy format, which S2 extends, so its ability to read S2 blocks written by the Go server must be verified before committing to it. Keep Deflate as legacy-read fallback.
- **Encryption:** `ChaCha20Poly1305` (primary, matches Go default) with `AesGcm` fallback. Use the .NET `System.Security.Cryptography` APIs with `IsSupported` runtime checks.
- **Envelope v2:** Extend current `FSV1` header to `FSV2`:
  - Bytes 0-3: magic (`FSV2`)
  - Byte 4: flags (compression algorithm ID + encryption algorithm ID)
  - Bytes 5-8: key hash (4 bytes, SHA256 truncated)
  - Bytes 9-20: nonce (12 bytes for ChaCha20/AesGcm)
  - Bytes 21-36: auth tag (16 bytes)
  - Bytes 37-44: payload hash (8 bytes, SHA256 truncated)
  - Bytes 45+: ciphertext
- **Legacy read:** `FSV1` envelopes are read with the old XOR/Deflate path. New writes always use `FSV2`.

**Go reference:** `server/filestore.go` lines 2000-2500 (encryption/compression envelope)

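To make the `FSV2` byte layout concrete, here is a minimal sketch of sealing and opening an envelope with the `System.Security.Cryptography` AEAD types. Several details are assumptions rather than part of the plan: the class name `Fsv2Envelope`, the flag packing (high nibble for the cipher ID, low nibble for the compression ID), the numeric IDs, and computing the payload hash over the plaintext. Compression is left out, so the compression ID is always 0 here. The key must be 32 bytes.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;
using System.Text;

// Hypothetical sketch of the FSV2 envelope from section 1C.
public static class Fsv2Envelope
{
    public const int HeaderSize = 45;
    private static readonly byte[] Magic = Encoding.ASCII.GetBytes("FSV2");

    // Assumed IDs and packing: high nibble = cipher, low nibble = compression.
    private const byte CompressionNone = 0x0;
    private const byte CipherChaCha = 0x1;
    private const byte CipherAesGcm = 0x2;

    public static byte[] Seal(byte[] key, byte[] payload)
    {
        var envelope = new byte[HeaderSize + payload.Length];
        Span<byte> span = envelope;

        Magic.AsSpan().CopyTo(span);                                       // bytes 0-3
        bool chacha = ChaCha20Poly1305.IsSupported;
        byte cipher = chacha ? CipherChaCha : CipherAesGcm;
        span[4] = (byte)((cipher << 4) | CompressionNone);                 // byte 4
        SHA256.HashData(key).AsSpan(0, 4).CopyTo(span.Slice(5, 4));        // bytes 5-8
        Span<byte> nonce = span.Slice(9, 12);                              // bytes 9-20
        RandomNumberGenerator.Fill(nonce);
        Span<byte> tag = span.Slice(21, 16);                               // bytes 21-36
        SHA256.HashData(payload).AsSpan(0, 8).CopyTo(span.Slice(37, 8));   // bytes 37-44
        Span<byte> ciphertext = span.Slice(HeaderSize);                    // bytes 45+

        if (chacha)
        {
            using var aead = new ChaCha20Poly1305(key);
            aead.Encrypt(nonce, payload, ciphertext, tag);
        }
        else
        {
            using var aead = new AesGcm(key, 16);
            aead.Encrypt(nonce, payload, ciphertext, tag);
        }
        return envelope;
    }

    public static byte[] Open(byte[] key, byte[] envelope)
    {
        Span<byte> span = envelope;
        if (span.Length < HeaderSize || !span.Slice(0, 4).SequenceEqual(Magic.AsSpan()))
            throw new InvalidDataException("Not an FSV2 envelope.");
        if (!span.Slice(5, 4).SequenceEqual(SHA256.HashData(key).AsSpan(0, 4)))
            throw new CryptographicException("Key hash mismatch.");

        Span<byte> nonce = span.Slice(9, 12);
        Span<byte> tag = span.Slice(21, 16);
        Span<byte> ciphertext = span.Slice(HeaderSize);
        var plaintext = new byte[ciphertext.Length];

        // Decrypt throws a CryptographicException if the tag does not verify.
        switch (span[4] >> 4)
        {
            case CipherChaCha:
                using (var aead = new ChaCha20Poly1305(key))
                    aead.Decrypt(nonce, ciphertext, tag, plaintext);
                break;
            case CipherAesGcm:
                using (var aead = new AesGcm(key, 16))
                    aead.Decrypt(nonce, ciphertext, tag, plaintext);
                break;
            default:
                throw new InvalidDataException("Unknown cipher ID.");
        }

        if (!SHA256.HashData(plaintext).AsSpan(0, 8).SequenceEqual(span.Slice(37, 8)))
            throw new InvalidDataException("Payload hash mismatch.");
        return plaintext;
    }
}
```

A round trip is `Fsv2Envelope.Open(key, Fsv2Envelope.Seal(key, payload))`. Checking the truncated key hash before decrypting lets a wrong key be reported separately from tampered ciphertext, which is the distinction the corruption-detection tests in 2C rely on.
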
### 1D. Dynamic Route Pooling (DEFERRED)

Per codex consultation, dynamic route pool resizing is deferred. A static pool is sufficient for all current test parity needs. The Go behavior (negotiating pool size during the route handshake) can be added in a future phase.

## Phase 2: Test Port, High Priority (~560 tests)

### 2A. JetStream Clustering Tests (~360 tests)

**Go sources:** `jetstream_cluster_1_test.go` through `jetstream_cluster_4_test.go` + `jetstream_cluster_long_test.go`

Port strategy: Focus on behavioral invariants using .NET-native patterns (xUnit `IAsyncLifetime`, multi-server fixtures). Group into 5 parallel subagent tasks:

| Subagent | Focus | Tests | Go Source |
|----------|-------|-------|-----------|
| 2A-1 | Leader election and failover | ~80 | cluster_1 + cluster_2 |
| 2A-2 | Stream replication and placement | ~100 | cluster_1 + cluster_3 |
| 2A-3 | Consumer replication and delivery | ~80 | cluster_2 + cluster_4 |
| 2A-4 | Meta-cluster governance | ~60 | cluster_3 + cluster_4 |
| 2A-5 | Advanced scenarios and long-running | ~40 | cluster_4 + long |

**Test infrastructure needed:** `JetStreamClusterFixture`, which spins up a 3-node NATS cluster with JetStream enabled, waits for meta-leader election, and provides the helpers `WaitOnStreamLeader()` and `WaitOnConsumerLeader()`.

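Helpers such as `WaitOnStreamLeader()` usually reduce to polling a condition until a deadline. The sketch below shows one way to write that shared core. `ClusterWait.UntilAsync` and its parameters are hypothetical names, and the 50 ms default poll interval is an arbitrary assumption.

```csharp
using System;
using System.Diagnostics;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical polling helper that fixture methods like
// WaitOnStreamLeader() could be built on.
public static class ClusterWait
{
    // Calls `probe` repeatedly until it returns a non-null value,
    // or throws TimeoutException once `timeout` has elapsed.
    public static async Task<T> UntilAsync<T>(
        Func<Task<T?>> probe,
        TimeSpan timeout,
        TimeSpan? pollInterval = null,
        CancellationToken ct = default) where T : class
    {
        var interval = pollInterval ?? TimeSpan.FromMilliseconds(50);
        var elapsed = Stopwatch.StartNew();

        while (true)
        {
            ct.ThrowIfCancellationRequested();

            var result = await probe().ConfigureAwait(false);
            if (result is not null)
                return result;

            if (elapsed.Elapsed >= timeout)
                throw new TimeoutException($"Condition not met within {timeout}.");

            await Task.Delay(interval, ct).ConfigureAwait(false);
        }
    }
}
```

A fixture method could then wrap a lookup that returns the current leader's name or `null`, so that each test states only the condition and the timeout it expects.
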
### 2B. JetStream Core Tests (~100 tests)

**Go source:** `jetstream_test.go` (312 tests, ~200 already ported)

Focus on the ~100 remaining:
- Stream lifecycle edge cases (max messages, max bytes, discard old/new policy)
- Consumer delivery semantics (ack wait timeout, max deliver attempts, exponential backoff)
- Publish precondition failures (expected stream, expected last sequence, expected last msg ID)
- Account limit enforcement (max streams, max consumers, max storage)
- API error codes and response shapes matching Go exactly

### 2C. FileStore Tests (~100 tests)

**Go source:** `filestore_test.go` (232 tests, ~130 already ported)

Focus on tests that exercise the new S2 + crypto implementations:
- 6-way permutation matrix: `{NoCipher, ChaCha20, AesGcm} x {NoCompression, S2}`
- Block rotation under size limits
- Recovery after simulated crash (truncated last block)
- Corruption detection (tampered ciphertext, wrong key)
- Large message handling (>1MB payloads)
- Subject-filtered state queries

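The 6-way permutation matrix can be generated rather than hand-listed. The following sketch assumes two hypothetical enums, `StoreCipher` and `StoreCompression`, named after the values in the list above. The real option types in the .NET port may differ.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical enums mirroring the matrix in section 2C.
public enum StoreCipher { NoCipher, ChaCha20, AesGcm }
public enum StoreCompression { NoCompression, S2 }

public static class FileStorePermutations
{
    // Cross product of every cipher with every compression mode,
    // shaped as object[] rows so a parameterized test can consume it.
    public static IEnumerable<object[]> All() =>
        from cipher in Enum.GetValues<StoreCipher>()
        from compression in Enum.GetValues<StoreCompression>()
        select new object[] { cipher, compression };
}
```

With xUnit, a `[Theory]` can reference this through `[MemberData(nameof(FileStorePermutations.All), MemberType = typeof(FileStorePermutations))]`, so each FileStore test body runs once per combination.
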
## Phase 3: Test Port, Medium and Low Priority (~135 tests)

### 3A. NoRace/Stress Tests (~50 tests)

**Go sources:** `norace_1_test.go`, `norace_2_test.go`

Port the most critical concurrent-operation tests. Mark with `[Trait("Category", "Stress")]` for optional execution:
- Concurrent pub/sub with 100+ clients
- Route/gateway reconnection under load
- JetStream publish during cluster failover
- Consumer redelivery timing under contention

### 3B. Accounts/Auth Tests (~30 tests)

**Go sources:** `accounts_test.go`, `auth_callout_test.go`

- Service import/export cross-account delivery
- Auth callout timeout/retry behavior
- Account connection/subscription limits
- User revocation checking

### 3C. Message Trace Tests (~20 tests)

**Go source:** `msgtrace_test.go`

- Trace header propagation across routes/gateways/leaf nodes
- Trace event publication on `$SYS.TRACE.>`
- Trace filtering by account/subject

### 3D. Config/Opts/Reload Tests (~20 tests)

**Go sources:** `opts_test.go`, `reload_test.go`

- CLI override precedence edge cases
- Config include file resolution
- Reload signal handling for TLS certificate rotation
- Account resolver reload semantics

### 3E. Events Tests (~15 tests)

**Go source:** `events_test.go`

- Server lifecycle events (`$SYS.SERVER.*.CONNECT`, `*.DISCONNECT`)
- Account stats events
- Advisory messages for slow consumers, auth failures

## Execution Strategy

### Parallelization

| Phase | Tasks | Parallel Strategy |
|-------|-------|-------------------|
| 1A + 1B | RAFT transport + JS orchestration | Sequential (1B depends on 1A for transport) |
| 1C | FileStore crypto/compression | Parallel with 1A/1B (independent) |
| 2A | JetStream cluster tests | 5 parallel subagents (depends on Phase 1) |
| 2B + 2C | JS core + FileStore tests | 2 parallel subagents (depends on Phase 1) |
| 3A-3E | Medium/low priority tests | 5 parallel subagents (depends on Phase 2) |

### Git Workflow

- Single worktree branch: `full-go-parity`
- Gated commits between phases (Phase 1 complete before Phase 2 starts)
- Merge to main with `--no-ff` after all phases pass

### Success Criteria

- All existing 2,606 tests continue to pass (no regressions)
- ~445+ new tests added (target: 3,100+ total)
- JetStreamService is a real lifecycle orchestrator
- RAFT transport works over NATS subjects
- FileStore uses S2 compression and ChaCha20/AesGcm encryption
- Zero unclassified Go JS API subjects (inventory test passes)

## Architectural Decisions

| Decision | Choice | Rationale |
|----------|--------|-----------|
| RAFT transport | NATS internal subjects | Matches Go `$NRG.*` exactly; reuses existing pub/sub |
| FileStore compression | S2 via IronSnappy | Go parity; better than Deflate for this workload |
| FileStore encryption | ChaCha20Poly1305 primary | Go parity; hardware-independent performance |
| Route pool sizing | Deferred | Not a blocker for RAFT/JetStream parity |
| Test porting style | Behavioral (not verbatim) | .NET-native patterns; same invariants, idiomatic C# |
| Stress tests | Separate category | Keeps regular CI fast; run as a nightly suite |