natsdotnet/docs/plans/2026-02-24-full-go-parity-design.md
Joseph Doherty 0b349f8ecf docs: add full Go parity design
Bottom-up layered approach: implementation gaps first (RAFT transport,
JetStream orchestration, FileStore S2/crypto), then test ports across
3 phases targeting ~445 new tests for full Go behavioral parity.
2026-02-24 05:35:13 -05:00

Full Go Parity Design

For Claude: REQUIRED SUB-SKILL: Use superpowers-extended-cc:executing-plans to implement this plan task-by-task.

Goal: Close all remaining implementation and test gaps between the Go NATS server reference and the .NET port, achieving full behavioral parity.

Architecture: Bottom-up layered approach — fill implementation gaps first (RAFT transport, JetStream orchestration, FileStore crypto/compression), then port remaining Go tests against working code. Parallel subagents for independent subsystems.

Tech Stack: .NET 10 / C# 14, xUnit 3, Shouldly, IronSnappy (S2), System.Security.Cryptography (ChaCha20Poly1305, AesGcm)


Current State

  • Go server tests: ~3,051 test functions across 104 files (~227,000 lines)
  • .NET server tests: ~2,606 test methods across 325 files
  • Test gap: ~445 tests (85% parity)
  • Implementation gaps: 4 critical (RAFT transport, JetStream orchestration, FileStore crypto, FileStore compression)

Phase 1: Implementation Gap Closure

1A. RAFT Network Transport

Files: src/NATS.Server/Raft/NatsRaftTransport.cs (new), src/NATS.Server/Raft/RaftTransport.cs (keep InMemory)

Add NatsRaftTransport that routes RAFT RPCs over internal NATS subjects ($NRG.<group>.*), matching Go's approach exactly. Keep InMemoryRaftTransport for unit tests.

  • AppendEntries -> publish to $NRG.<group>.AE.<followerId>
  • RequestVote -> request/reply on $NRG.<group>.RV.<voterId>
  • InstallSnapshot -> chunked publish on $NRG.<group>.IS.<followerId>
  • Wire format: binary-encoded RaftLogEntry / VoteRequest / VoteResponse
  • Transport registers internal subscriptions via the server's existing InternalClient

Go reference: server/raft.go lines 100-300 (RAFT group subject handling)
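The subject scheme listed above can be sketched as a small helper. This is only an illustration: the `RaftSubjects` class, its method names, and the token validation are hypothetical and not part of the existing code base; only the `$NRG.<group>.<op>.<peer>` shape comes from this plan.

```csharp
using System;

// Hypothetical helper that builds the $NRG.<group>.* subjects named in this plan.
public static class RaftSubjects
{
    private const string Prefix = "$NRG";
    private static readonly char[] Forbidden = { '.', '*', '>', ' ' };

    // $NRG.<group>.AE.<followerId>
    public static string AppendEntries(string group, string followerId) =>
        Build(group, "AE", followerId);

    // $NRG.<group>.RV.<voterId>
    public static string RequestVote(string group, string voterId) =>
        Build(group, "RV", voterId);

    // $NRG.<group>.IS.<followerId>
    public static string InstallSnapshot(string group, string followerId) =>
        Build(group, "IS", followerId);

    private static string Build(string group, string op, string peer) =>
        $"{Prefix}.{Token(group)}.{op}.{Token(peer)}";

    // Assumption: group and peer IDs must each be a single subject token,
    // so separators, wildcards, and spaces are rejected.
    private static string Token(string value)
    {
        if (string.IsNullOrEmpty(value) || value.IndexOfAny(Forbidden) >= 0)
            throw new ArgumentException($"'{value}' is not a valid subject token.");
        return value;
    }
}
```

Centralizing subject construction in one place would keep NatsRaftTransport and any test doubles from drifting apart on naming.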

1B. JetStreamService Orchestration

Files: src/NATS.Server/JetStream/JetStreamService.cs (rewrite from stub)

Upgrade the stub to a lifecycle orchestrator:

  • Startup: validate config, initialize store directory, recover stream/consumer metadata from disk, wire internal JS client, register API subscriptions ($JS.API.>)
  • Limits enforcement: account-level stream/consumer/storage limits on create/update/publish paths
  • API dispatch: route $JS.API.* subjects to existing handler classes (JetStreamApiRouter already exists)
  • Shutdown: unsubscribe API subjects, flush pending writes, stop consumers, dispose stores
  • Stats: track API totals, errors, and account usage

Go reference: server/jetstream.go (2,866 lines) and server/jetstream_api.go (5,165 lines)
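As a rough illustration of the limits-enforcement bullet, the checks on the create and publish paths could reduce to a few pure functions. The `AccountLimits`, `AccountUsage`, and `LimitChecks` types are hypothetical, and treating a negative limit as "unlimited" is an assumption of this sketch rather than something this plan specifies.

```csharp
using System;

// Hypothetical shapes; the real account types may differ.
public sealed record AccountLimits(int MaxStreams, int MaxConsumers, long MaxStorageBytes);
public sealed record AccountUsage(int Streams, int Consumers, long StorageBytes);

public static class LimitChecks
{
    // Assumption: a negative limit means "unlimited".
    private static bool Exceeds(long limit, long value) => limit >= 0 && value > limit;

    // Create path: would one more stream go over the account limit?
    public static bool CanCreateStream(AccountLimits limits, AccountUsage usage) =>
        !Exceeds(limits.MaxStreams, (long)usage.Streams + 1);

    // Create path: would one more consumer go over the account limit?
    public static bool CanCreateConsumer(AccountLimits limits, AccountUsage usage) =>
        !Exceeds(limits.MaxConsumers, (long)usage.Consumers + 1);

    // Publish path: would storing this message go over the storage limit?
    public static bool CanStore(AccountLimits limits, AccountUsage usage, long messageBytes) =>
        !Exceeds(limits.MaxStorageBytes, usage.StorageBytes + messageBytes);
}
```

Keeping the checks free of side effects would let the same functions serve both the API handlers and the ported account-limit tests.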

1C. FileStore S2 Compression + Real Encryption

Files: src/NATS.Server/JetStream/Storage/FileStore.cs (modify), new codec files

Replace the current Deflate + XOR scheme with Go-parity algorithms:

  • Compression: S2 codec via IronSnappy NuGet (or equivalent). Keep Deflate as legacy-read fallback.
  • Encryption: ChaCha20Poly1305 (primary, matches Go default) with AesGcm fallback. Use .NET System.Security.Cryptography APIs with IsSupported runtime checks.
  • Envelope v2: Extend current FSV1 header to FSV2:
    • Byte 0-3: magic (FSV2)
    • Byte 4: flags (compression algorithm ID + encryption algorithm ID)
    • Byte 5-8: key hash (SHA256 truncated)
    • Byte 9-20: nonce (12 bytes for ChaCha20/AesGcm)
    • Byte 21-36: auth tag (16 bytes)
    • Byte 37-44: payload hash (8 bytes, SHA256 truncated)
    • Byte 45+: ciphertext
  • Legacy read: FSV1 envelopes are still read through the old XOR/Deflate path. New writes always use FSV2.
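
The FSV2 byte layout above can be written down as offset constants plus a writer. This is a minimal sketch: the `Fsv2Envelope` class and its members are hypothetical names, the flags byte is treated as opaque because its bit layout is not specified here, and the caller is assumed to have already computed the hashes, nonce, tag, and ciphertext.

```csharp
using System;
using System.Text;

// Sketch of the FSV2 envelope layout; type and member names are hypothetical.
public static class Fsv2Envelope
{
    public const int FlagsOffset = 4;         // byte 4
    public const int KeyHashOffset = 5;       // bytes 5-8
    public const int NonceOffset = 9;         // bytes 9-20
    public const int TagOffset = 21;          // bytes 21-36
    public const int PayloadHashOffset = 37;  // bytes 37-44
    public const int CiphertextOffset = 45;   // bytes 45+

    private static readonly byte[] Magic = Encoding.ASCII.GetBytes("FSV2"); // bytes 0-3

    public static byte[] Write(
        byte flags,
        ReadOnlySpan<byte> keyHash,
        ReadOnlySpan<byte> nonce,
        ReadOnlySpan<byte> tag,
        ReadOnlySpan<byte> payloadHash,
        ReadOnlySpan<byte> ciphertext)
    {
        // Each field length is derived from the gap between adjacent offsets.
        Require(keyHash, NonceOffset - KeyHashOffset, nameof(keyHash));
        Require(nonce, TagOffset - NonceOffset, nameof(nonce));
        Require(tag, PayloadHashOffset - TagOffset, nameof(tag));
        Require(payloadHash, CiphertextOffset - PayloadHashOffset, nameof(payloadHash));

        var envelope = new byte[CiphertextOffset + ciphertext.Length];
        Magic.CopyTo(envelope, 0);
        envelope[FlagsOffset] = flags;
        keyHash.CopyTo(envelope.AsSpan(KeyHashOffset));
        nonce.CopyTo(envelope.AsSpan(NonceOffset));
        tag.CopyTo(envelope.AsSpan(TagOffset));
        payloadHash.CopyTo(envelope.AsSpan(PayloadHashOffset));
        ciphertext.CopyTo(envelope.AsSpan(CiphertextOffset));
        return envelope;
    }

    // True when the buffer holds a full header and starts with the FSV2 magic;
    // anything else would be handed to the legacy FSV1 read path.
    public static bool IsFsv2(ReadOnlySpan<byte> envelope) =>
        envelope.Length >= CiphertextOffset &&
        envelope.Slice(0, Magic.Length).SequenceEqual(new ReadOnlySpan<byte>(Magic));

    private static void Require(ReadOnlySpan<byte> field, int length, string name)
    {
        if (field.Length != length)
            throw new ArgumentException($"{name} must be {length} bytes.", name);
    }
}
```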

Go reference: server/filestore.go lines 2000-2500 (encryption/compression envelope)
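
The IsSupported runtime checks mentioned for encryption might look like the following. It is a sketch under stated assumptions: the `AeadSketch` class and its `Seal`/`Open` methods are hypothetical, a 32-byte key is assumed for both ciphers, and key derivation is out of scope. The `ChaCha20Poly1305` and `AesGcm` types and their `IsSupported` properties are the standard System.Security.Cryptography APIs.

```csharp
using System;
using System.Security.Cryptography;

// Hypothetical wrapper showing ChaCha20Poly1305 as primary with an AesGcm fallback.
public static class AeadSketch
{
    public const int NonceLength = 12; // matches the 12-byte nonce field
    public const int TagLength = 16;   // matches the 16-byte auth tag field

    public static (string Algorithm, byte[] Nonce, byte[] Tag, byte[] Ciphertext) Seal(
        byte[] key, byte[] plaintext)
    {
        var nonce = RandomNumberGenerator.GetBytes(NonceLength);
        var tag = new byte[TagLength];
        var ciphertext = new byte[plaintext.Length];

        if (ChaCha20Poly1305.IsSupported)
        {
            using var aead = new ChaCha20Poly1305(key);
            aead.Encrypt(nonce, plaintext, ciphertext, tag);
            return ("ChaCha20Poly1305", nonce, tag, ciphertext);
        }

        if (AesGcm.IsSupported)
        {
            using var aead = new AesGcm(key, TagLength);
            aead.Encrypt(nonce, plaintext, ciphertext, tag);
            return ("AesGcm", nonce, tag, ciphertext);
        }

        throw new PlatformNotSupportedException("No supported AEAD cipher on this platform.");
    }

    // Decrypts with the algorithm recorded at seal time; throws a
    // CryptographicException when the tag does not match (tampering or wrong key).
    public static byte[] Open(string algorithm, byte[] key, byte[] nonce, byte[] tag, byte[] ciphertext)
    {
        var plaintext = new byte[ciphertext.Length];
        if (algorithm == "ChaCha20Poly1305")
        {
            using var aead = new ChaCha20Poly1305(key);
            aead.Decrypt(nonce, ciphertext, tag, plaintext);
        }
        else
        {
            using var aead = new AesGcm(key, TagLength);
            aead.Decrypt(nonce, ciphertext, tag, plaintext);
        }
        return plaintext;
    }
}
```

In the envelope, the algorithm actually used would be recorded in the flags byte so that a reader on a platform with different cipher support picks the matching decrypt path.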

1D. Dynamic Route Pooling (DEFERRED)

Per codex consultation, dynamic route pool resizing is deferred. A static pool is sufficient for all current test parity needs. The Go behavior (negotiating pool size during the route handshake) can be added in a future phase.

Phase 2: Test Port — High Priority (~560 tests)

2A. JetStream Clustering Tests (~360 tests)

Go sources: jetstream_cluster_1_test.go through jetstream_cluster_4_test.go + jetstream_cluster_long_test.go

Port strategy: Focus on behavioral invariants using .NET-native patterns (xUnit IAsyncLifetime, multi-server fixtures). Group into 5 parallel subagent tasks:

| Subagent | Focus | Tests | Go Source |
|----------|-------|-------|-----------|
| 2A-1 | Leader election and failover | ~80 | cluster_1 + cluster_2 |
| 2A-2 | Stream replication and placement | ~100 | cluster_1 + cluster_3 |
| 2A-3 | Consumer replication and delivery | ~80 | cluster_2 + cluster_4 |
| 2A-4 | Meta-cluster governance | ~60 | cluster_3 + cluster_4 |
| 2A-5 | Advanced scenarios and long-running | ~40 | cluster_4 + long |

Test infrastructure needed: JetStreamClusterFixture, which spins up a 3-node NATS cluster with JetStream enabled, waits for meta-leader election, and provides helpers such as WaitOnStreamLeader() and WaitOnConsumerLeader().
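
Helpers like WaitOnStreamLeader() typically boil down to polling a condition with a deadline. The sketch below shows only that core; `ClusterWait.WaitUntilAsync`, the default poll interval, and the fixture member used in the usage note are hypothetical and not taken from the existing code.

```csharp
using System;
using System.Diagnostics;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical polling primitive that fixture helpers could build on.
public static class ClusterWait
{
    public static async Task WaitUntilAsync(
        Func<bool> condition,
        TimeSpan timeout,
        TimeSpan? pollInterval = null,
        CancellationToken cancellationToken = default)
    {
        var interval = pollInterval ?? TimeSpan.FromMilliseconds(25);
        var elapsed = Stopwatch.StartNew();

        // Re-check the condition until it holds or the deadline passes.
        while (!condition())
        {
            if (elapsed.Elapsed >= timeout)
                throw new TimeoutException($"Condition not met within {timeout}.");
            await Task.Delay(interval, cancellationToken);
        }
    }
}
```

A fixture method such as WaitOnStreamLeader() could then be a thin wrapper, for example awaiting `ClusterWait.WaitUntilAsync(() => HasStreamLeader(account, stream), TimeSpan.FromSeconds(10))`, where `HasStreamLeader` is an assumed fixture method that inspects the running nodes.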

2B. JetStream Core Tests (~100 tests)

Go source: jetstream_test.go (312 tests, ~200 already ported)

Focus on the ~100 remaining:

  • Stream lifecycle edge cases (max messages, max bytes, discard old/new policy)
  • Consumer delivery semantics (ack wait timeout, max deliver attempts, exponential backoff)
  • Publish precondition failures (expected stream, expected last sequence, expected last msg ID)
  • Account limit enforcement (max streams, max consumers, max storage)
  • API error codes and response shapes matching Go exactly

2C. FileStore Tests (~100 tests)

Go source: filestore_test.go (232 tests, ~130 already ported)

Focus on tests that exercise the new S2 + crypto implementations:

  • 6-way permutation matrix: {NoCipher, ChaCha20, AesGcm} x {NoCompression, S2}
  • Block rotation under size limits
  • Recovery after simulated crash (truncated last block)
  • Corruption detection (tampered ciphertext, wrong key)
  • Large message handling (>1MB payloads)
  • Subject-filtered state queries
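
The 6-way permutation matrix can be generated rather than hand-written. In this sketch the `StoreCipher` and `StoreCompression` enums and the `FileStorePermutations` class are hypothetical stand-ins for whatever option types FileStore ends up exposing.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical option enums mirroring the matrix in this plan.
public enum StoreCipher { NoCipher, ChaCha20, AesGcm }
public enum StoreCompression { NoCompression, S2 }

public static class FileStorePermutations
{
    // Cross product of every cipher with every compression mode (3 x 2 = 6 rows),
    // shaped as object[] rows so a theory data source can consume it.
    public static IEnumerable<object[]> All() =>
        from cipher in Enum.GetValues<StoreCipher>()
        from compression in Enum.GetValues<StoreCompression>()
        select new object[] { cipher, compression };
}
```

With xUnit, a theory could consume this through `[MemberData(nameof(FileStorePermutations.All), MemberType = typeof(FileStorePermutations))]`, so each FileStore test runs once per cipher and compression pairing.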

Phase 3: Test Port — Medium and Low Priority (~135 tests)

3A. NoRace/Stress Tests (~50 tests)

Go sources: norace_1_test.go, norace_2_test.go

Port the most critical concurrent-operation tests. Mark with [Trait("Category", "Stress")] for optional execution:

  • Concurrent pub/sub with 100+ clients
  • Route/gateway reconnection under load
  • JetStream publish during cluster failover
  • Consumer redelivery timing under contention

3B. Accounts/Auth Tests (~30 tests)

Go sources: accounts_test.go, auth_callout_test.go

  • Service import/export cross-account delivery
  • Auth callout timeout/retry behavior
  • Account connection/subscription limits
  • User revocation checking

3C. Message Trace Tests (~20 tests)

Go source: msgtrace_test.go

  • Trace header propagation across routes/gateways/leaf nodes
  • Trace event publication on $SYS.TRACE.>
  • Trace filtering by account/subject

3D. Config/Opts/Reload Tests (~20 tests)

Go sources: opts_test.go, reload_test.go

  • CLI override precedence edge cases
  • Config include file resolution
  • Reload signal handling for TLS certificate rotation
  • Account resolver reload semantics

3E. Events Tests (~15 tests)

Go source: events_test.go

  • Server lifecycle events ($SYS.SERVER.*.CONNECT, *.DISCONNECT)
  • Account stats events
  • Advisory messages for slow consumers, auth failures

Execution Strategy

Parallelization

| Phase | Tasks | Parallel Strategy |
|-------|-------|-------------------|
| 1A + 1B | RAFT transport + JS orchestration | Sequential (1B depends on 1A for transport) |
| 1C | FileStore crypto/compression | Parallel with 1A/1B (independent) |
| 2A | JetStream cluster tests | 5 parallel subagents (depends on Phase 1) |
| 2B + 2C | JS core + FileStore tests | 2 parallel subagents (depends on Phase 1) |
| 3A-3E | Medium/low priority tests | 5 parallel subagents (depends on Phase 2) |

Git Workflow

  • Single worktree branch: full-go-parity
  • Gated commits between phases (Phase 1 complete before Phase 2 starts)
  • Merge to main with --no-ff after all phases pass

Success Criteria

  • All existing 2,606 tests continue to pass (no regressions)
  • ~445+ new tests added (target: 3,100+ total)
  • JetStreamService is a real lifecycle orchestrator
  • RAFT transport works over NATS subjects
  • FileStore uses S2 compression and ChaCha20/AesGcm encryption
  • Zero unclassified Go JS API subjects (inventory test passes)

Architectural Decisions

| Decision | Choice | Rationale |
|----------|--------|-----------|
| RAFT transport | NATS internal subjects | Matches Go $NRG.* exactly; reuses existing pub/sub |
| FileStore compression | S2 via IronSnappy | Go parity; better than Deflate for this workload |
| FileStore encryption | ChaCha20Poly1305 primary | Go parity; hardware-independent performance |
| Route pool sizing | Deferred | Not a blocker for RAFT/JetStream parity |
| Test porting style | Behavioral (not verbatim) | .NET-native patterns; same invariants, idiomatic C# |
| Stress tests | Separate category | Don't slow down CI; run as nightly suite |