5-track parallel architecture (Storage, Consensus, Protocol, Networking, Services) covering all CRITICAL/HIGH/MEDIUM gaps identified in structuregaps.md. Feature-first approach with test_parity.db updates. Targets ~1,194 additional Go test mappings.
Full Go Parity: All 15 Structure Gaps — Design Document
For Claude: REQUIRED SUB-SKILL: Use superpowers-extended-cc:writing-plans to create the implementation plan from this design.
Goal: Port all missing functionality identified in docs/structuregaps.md (15 gaps, CRITICAL through MEDIUM) from the Go NATS server to the .NET port, achieving full behavioral parity. Update docs/test_parity.db as each Go test is ported.
Architecture: 5 parallel implementation tracks organized by dependency. Tracks A (Storage), D (Networking), and E (Services) are independent and start immediately. Track C (Protocol) depends on Track A. Track B (Consensus) depends on Tracks A + C. Each track builds features first, then ports corresponding Go tests.
Approach: Feature-first — build each missing feature, then port its Go tests as validation. Bottom-up dependency ordering ensures foundations are solid before integration.
Estimated impact: ~1,194 additional Go tests mapped (859 → ~2,053, from 29% to ~70%).
Track A: Storage (FileStore Block Management)
Gap #1 — CRITICAL, 20.8x gap factor
Go: filestore.go (12,593 lines) | NET: FileStore.cs (607 lines)
Dependencies: None (starts immediately)
Tests to port: ~159 from filestore_test.go
Features
- Message Blocks — 65KB+ blocks with per-block index files. Header: magic, version, first/last sequence, message count, byte count. New block on size limit.
- Block Index — Per-block index mapping sequence number → (offset, length). Enables O(1) lookups.
- S2 Compression Integration — Wire existing `S2Codec.cs` into block writes (compress on flush) and reads (decompress on load).
- AEAD Encryption Integration — Wire existing `AeadEncryptor.cs` into the block lifecycle. Per-block encryption keys with rotation on seal.
- Crash Recovery — Scan the block directory on startup, validate checksums, rebuild indexes from raw data.
- Tombstone/Deletion Tracking — Sparse sequence sets (using `SequenceSet.cs`) for deleted messages. Purge by subject and sequence range.
- Write Cache — In-memory buffer for the active (unsealed) block. Configurable max cached block count.
- Atomic File Operations — Write-to-temp + rename for crash-safe block sealing.
- TTL Scheduling Recovery — Reconstruct pending TTL expirations from blocks on restart, register with `HashWheel`.
Key Files
| Action | File | Notes |
|---|---|---|
| Rewrite | `src/NATS.Server/JetStream/Storage/FileStore.cs` | Block-based architecture |
| Create | `src/NATS.Server/JetStream/Storage/MessageBlock.cs` | Block abstraction |
| Create | `src/NATS.Server/JetStream/Storage/BlockIndex.cs` | Per-block index |
| Modify | `src/NATS.Server/JetStream/Storage/S2Codec.cs` | Wire into block lifecycle |
| Modify | `src/NATS.Server/JetStream/Storage/AeadEncryptor.cs` | Per-block key management |
| Tests | `tests/.../JetStream/Storage/FileStoreBlockTests.cs` | New + expanded |
Track B: Consensus (RAFT + JetStream Cluster)
Gap #8 — MEDIUM, 4.4x | Gap #2 — CRITICAL, 213x
Go: raft.go (5,037) + jetstream_cluster.go (10,887) | NET: 1,136 + 51 lines
Dependencies: Tracks A and C must merge first
Tests to port: ~85 raft + ~358 cluster + ~47 super-cluster = ~490
Phase B1: RAFT Enhancements
- InstallSnapshot — Chunked streaming snapshot transfer. Follower receives chunks, applies partial state, catches up from log.
- Membership Changes — `ProposeAddPeer`/`ProposeRemovePeer` with single-server changes (matching Go's approach).
- Pre-vote Protocol — Candidate must get pre-vote approval before incrementing its term. Prevents disruptive elections from partitioned nodes.
- Log Compaction — Truncate RAFT log after snapshot. Track last applied index.
- Healthy Node Classification — Current/catching-up/leaderless states.
- Campaign Timeout Management — Randomized election delays.
Phase B2: JetStream Cluster Coordination
- Assignment Tracking — `StreamAssignment` and `ConsumerAssignment` types via RAFT proposals. Records: stream config, replica group, placement constraints.
- RAFT Proposal Workflow — Leader validates → proposes to meta-group → on commit, all nodes apply → assigned nodes start the stream/consumer.
- Placement Engine — Unique nodes, tag matching, cluster affinity. Expands `AssetPlacementPlanner.cs`.
- Inflight Deduplication — Track pending proposals to prevent duplicates during leader transitions.
- Peer Remove & Stream Move — Data rebalancing when a peer is removed.
- Step-down & Leadership Transfer — Graceful leader handoff.
- Per-Stream RAFT Groups — Separate RAFT group per stream for message replication.
Key Files
| Action | File | Notes |
|---|---|---|
| Modify | `src/NATS.Server/Raft/RaftNode.cs` | Snapshot, membership, pre-vote, compaction |
| Create | `src/NATS.Server/Raft/RaftSnapshot.cs` | Streaming snapshot |
| Create | `src/NATS.Server/Raft/RaftMembership.cs` | Peer add/remove |
| Rewrite | `src/.../JetStream/Cluster/JetStreamMetaGroup.cs` | 51 → ~2,000+ lines |
| Create | `src/.../JetStream/Cluster/StreamAssignment.cs` | Assignment type |
| Create | `src/.../JetStream/Cluster/ConsumerAssignment.cs` | Assignment type |
| Create | `src/.../JetStream/Cluster/PlacementEngine.cs` | Topology-aware placement |
| Modify | `src/.../JetStream/Cluster/StreamReplicaGroup.cs` | Coordination logic |
Track C: Protocol (Client, Consumer, JetStream API, Mirrors/Sources)
Gaps #5, #4, #7, #3 — all HIGH
Dependencies: C1/C3 independent; C2 needs C3; C4 needs Track A
Tests to port: ~43 client + ~134 consumer + ~184 jetstream + ~30 mirror = ~391
C1: Client Protocol Handling (Gap #5, 7.3x)
- Adaptive read buffer tuning (512→65536 based on throughput)
- Write buffer pooling with flush coalescing
- Per-client trace level
- Full CLIENT/ROUTER/GATEWAY/LEAF/SYSTEM protocol dispatch
- Slow consumer detection and eviction
- Max control line enforcement (4096 bytes)
- Write timeout with partial flush recovery
C2: Consumer Delivery Engines (Gap #4, 13.3x)
- NAK and redelivery tracking with exponential backoff schedules
- Pending request queue for pull consumers with flow control
- Max-deliveries enforcement (drop/reject/dead-letter)
- Priority group pinning (sticky consumer assignment)
- Idle heartbeat generation
- Pause/resume state with advisory events
- Filter subject skip tracking
- Per-message redelivery delay arrays (backoff schedules)
C3: JetStream API Layer (Gap #7, 7.7x)
- Leader forwarding for non-leader API requests
- Stream/consumer info caching with generation invalidation
- Snapshot/restore API endpoints
- Purge with subject filter, keep-N, sequence-based
- Consumer pause/resume API
- Advisory event publication for API operations
- Account resource tracking (storage, streams, consumers)
C4: Stream Mirrors, Sources & Transforms (Gap #3, 16.3x)
- Mirror synchronization loop (continuous pull, apply locally)
- Source/mirror ephemeral consumer setup with position tracking
- Retry with exponential backoff and jitter
- Deduplication window (`Nats-Msg-Id` header tracking)
- Purge operations (subject filter, sequence-based, keep-N)
- Stream snapshot and restore
Key Files
| Action | File | Notes |
|---|---|---|
| Modify | `NatsClient.cs` | Adaptive buffers, slow consumer, trace |
| Modify | `PushConsumerEngine.cs` | Major expansion |
| Modify | `PullConsumerEngine.cs` | Major expansion |
| Create | `.../Consumers/RedeliveryTracker.cs` | NAK/redelivery state |
| Create | `.../Consumers/PriorityGroupManager.cs` | Priority pinning |
| Modify | `JetStreamApiRouter.cs` | Leader forwarding |
| Modify | `StreamApiHandlers.cs` | Purge, snapshot |
| Modify | `ConsumerApiHandlers.cs` | Pause/resume |
| Rewrite | `MirrorCoordinator.cs` | 22 → ~500+ lines |
| Rewrite | `SourceCoordinator.cs` | 36 → ~500+ lines |
Track D: Networking (Gateway, Leaf Node, Routes)
Gaps #11, #12, #13 — all MEDIUM
Dependencies: None (starts immediately)
Tests to port: ~61 gateway + ~59 leafnode + ~39 routes = ~159
D1: Gateway Bridging (Gap #11, 6.7x)
- Interest-only mode (flood → interest switch)
- Account-specific gateway routes
- Reply mapper expansion (`_GR_.` prefix)
- Outbound connection pooling (default 3)
- Gateway TLS mutual auth
- Message trace through gateways
- Reconnection with exponential backoff
D2: Leaf Node Connections (Gap #12, 6.7x)
- Solicited leaf connection management with retry/reconnect
- Hub-spoke subject filtering
- JetStream domain awareness
- Account-scoped leaf connections
- Leaf compression negotiation (S2)
- Dynamic subscription interest updates
- Loop detection refinement (`$LDS.` prefix)
D3: Route Clustering (Gap #13, 5.7x)
- Route pooling (configurable, default 3)
- Account-specific dedicated routes
- Route compression (wire `RouteCompressionCodec.cs` into connections)
- Solicited route connections with discovery
- Route permission enforcement
- Dynamic route add/remove without restart
- Gossip-based topology discovery
Key Files
| Action | File | Notes |
|---|---|---|
| Modify | `GatewayManager.cs`, `GatewayConnection.cs` | Interest-only, pooling |
| Create | `GatewayInterestTracker.cs` | Interest tracking |
| Modify | `ReplyMapper.cs` | Full `_GR_.` handling |
| Modify | `LeafNodeManager.cs`, `LeafConnection.cs` | Solicited, JetStream |
| Modify | `LeafLoopDetector.cs` | `$LDS.` refinement |
| Modify | `RouteManager.cs`, `RouteConnection.cs` | Pooling, permissions |
| Create | `RoutePool.cs` | Connection pool |
| Modify | `RouteCompressionCodec.cs` | Wire into connection |
Track E: Services (MQTT, Accounts, Config, WebSocket, Monitoring)
Gaps #6 (HIGH), #9, #10, #14, #15 (MEDIUM)
Dependencies: None (starts immediately)
Tests to port: ~59 mqtt + ~34 accounts + ~105 config + ~53 websocket + ~118 monitoring = ~369
E1: MQTT Protocol (Gap #6, 10.9x)
- Session persistence with JetStream-backed ClientID mapping
- Will message handling
- QoS 1/2 tracking with packet ID mapping and retry
- Retained messages (per-account JetStream stream)
- MQTT wildcard translation (`+` → `*`, `#` → `>`, `/` → `.`)
- Session flapper detection with backoff
- MaxAckPending enforcement
- CONNECT packet validation and version negotiation
E2: Account Management (Gap #9, 13x)
- Service/stream export whitelist enforcement
- Service import with weighted destination selection
- Cycle detection for import chains
- Response tracking (request-reply latency)
- Account-level JetStream limits
- Client tracking per account with eviction
- Weighted subject mappings for traffic shaping
- System account with `$SYS.>` handling
E3: Configuration & Hot Reload (Gap #14, 2.7x)
- SIGHUP signal handling (`PosixSignalRegistration`)
- Auth change propagation (disconnect invalidated clients)
- TLS certificate reloading for rotation
- JetStream config changes at runtime
- Logger reconfiguration without restart
- Account list updates with connection cleanup
E4: WebSocket Support (Gap #15, 1.3x)
- WebSocket-specific TLS configuration
- Origin checking refinement
- `permessage-deflate` compression negotiation
- JWT auth through WebSocket upgrade
E5: Monitoring & Events (Gap #10, 3.5x)
- Full system event payloads (connect/disconnect/auth)
- Message trace propagation through full pipeline
- Closed connection tracking (ring buffer for `/connz`)
- Account-scoped monitoring (`/connz?acc=ACCOUNT`)
- Sort options for monitoring endpoints
Key Files
| Action | File | Notes |
|---|---|---|
| Modify | All `Mqtt/` files | Major expansion |
| Modify | `Account.cs`, `AuthService.cs` | Import/export, limits |
| Create | `AccountImportExport.cs` | Import/export logic |
| Create | `AccountLimits.cs` | Per-account JetStream limits |
| Modify | `ConfigReloader.cs` | Signal handling, auth propagation |
| Modify | `WebSocket/` files | TLS, compression, JWT |
| Modify | Monitoring handlers | Events, trace, connz |
| Modify | `MessageTraceContext.cs` | 22 → ~200+ lines |
DB Update Protocol
For every Go test ported:
```sql
UPDATE go_tests
SET status='mapped',
    dotnet_test='<DotNetTestMethodName>',
    dotnet_file='<DotNetTestFile.cs>',
    notes='Ported from <GoFunctionName> in <go_file>:<line>'
WHERE go_file='<go_test_file>' AND go_test='<GoTestName>';
```
For Go tests that cannot be ported (e.g., signal_test.go on .NET):
```sql
UPDATE go_tests
SET status='not_applicable',
    notes='<reason: e.g., Unix signal handling not applicable to .NET>'
WHERE go_file='<go_test_file>' AND go_test='<GoTestName>';
```
Batch DB updates at the end of each sub-phase to avoid per-test overhead.
Execution Order
Week 1-2: Tracks A, D, E start in parallel (3 worktrees)
Week 2-3: Track C starts (after Track A merges for C4)
Week 3-4: Track B starts (after Tracks A + C merge)
Merge order: A → D → E → C → B → main
Task Dependencies
| Task | Track | Blocked By |
|---|---|---|
| #3 | A: Storage | (none) |
| #6 | D: Networking | (none) |
| #7 | E: Services | (none) |
| #5 | C: Protocol | #3 (for C4 mirrors) |
| #4 | B: Consensus | #3, #5 |
Success Criteria
- All 15 gaps from `structuregaps.md` addressed
- ~1,194 additional Go tests mapped in `test_parity.db`
- Mapped ratio: 29% → ~70%
- All new tests passing (`dotnet test` green)
- Feature-first: each feature validated by its corresponding Go tests