From f1e42f1b5f71930f3a5def4e278d1f6decf2932f Mon Sep 17 00:00:00 2001
From: Joseph Doherty
Date: Tue, 24 Feb 2026 11:57:15 -0500
Subject: [PATCH] docs: add full Go parity design for all 15 structure gaps

5-track parallel architecture (Storage, Consensus, Protocol, Networking,
Services) covering all CRITICAL/HIGH/MEDIUM gaps identified in
structuregaps.md. Feature-first approach with test_parity.db updates.
Targets ~1,194 additional Go test mappings.
---
 ...-02-24-structuregaps-full-parity-design.md | 321 ++++++++++++++++++
 1 file changed, 321 insertions(+)
 create mode 100644 docs/plans/2026-02-24-structuregaps-full-parity-design.md

diff --git a/docs/plans/2026-02-24-structuregaps-full-parity-design.md b/docs/plans/2026-02-24-structuregaps-full-parity-design.md
new file mode 100644
index 0000000..142d3cd
--- /dev/null
+++ b/docs/plans/2026-02-24-structuregaps-full-parity-design.md
@@ -0,0 +1,321 @@

# Full Go Parity: All 15 Structure Gaps — Design Document

> **For Claude:** REQUIRED SUB-SKILL: Use superpowers-extended-cc:writing-plans to create the implementation plan from this design.

**Goal:** Port all missing functionality identified in `docs/structuregaps.md` (15 gaps, CRITICAL through MEDIUM) from the Go NATS server to the .NET port, achieving full behavioral parity. Update `docs/test_parity.db` as each Go test is ported.

**Architecture:** 5 parallel implementation tracks organized by dependency. Tracks A (Storage), D (Networking), and E (Services) are independent and start immediately. Track C (Protocol) depends on Track A. Track B (Consensus) depends on Tracks A + C. Each track builds features first, then ports the corresponding Go tests.

**Approach:** Feature-first — build each missing feature, then port its Go tests as validation. Bottom-up dependency ordering ensures foundations are solid before integration.

**Estimated impact:** ~1,194 additional Go tests mapped (859 → ~2,053, from 29% to ~70%).
---

## Track A: Storage (FileStore Block Management)

**Gap #1 — CRITICAL, 20.8x gap factor**
**Go**: `filestore.go` (12,593 lines) | **NET**: `FileStore.cs` (607 lines)
**Dependencies**: None (starts immediately)
**Tests to port**: ~159 from `filestore_test.go`

### Features

1. **Message Blocks** — 65KB+ blocks with per-block index files. Header: magic, version, first/last sequence, message count, byte count. New block on size limit.
2. **Block Index** — Per-block index mapping sequence number → (offset, length). Enables O(1) lookups.
3. **S2 Compression Integration** — Wire existing `S2Codec.cs` into block writes (compress on flush) and reads (decompress on load).
4. **AEAD Encryption Integration** — Wire existing `AeadEncryptor.cs` into block lifecycle. Per-block encryption keys with rotation on seal.
5. **Crash Recovery** — Scan block directory on startup, validate checksums, rebuild indexes from raw data.
6. **Tombstone/Deletion Tracking** — Sparse sequence sets (using `SequenceSet.cs`) for deleted messages. Purge by subject and sequence range.
7. **Write Cache** — In-memory buffer for active (unsealed) block. Configurable max block count.
8. **Atomic File Operations** — Write-to-temp + rename for crash-safe block sealing.
9. **TTL Scheduling Recovery** — Reconstruct pending TTL expirations from blocks on restart, register with `HashWheel`.
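The block and index shapes in features 1–2 can be sketched as follows (Python for illustration; the header layout, field widths, and `BLK` magic are assumptions made for this sketch, not the Go on-disk format):

```python
import struct

# Hypothetical sealed-block header: magic, version, first/last sequence,
# message count, byte count. Field order and widths are illustrative only.
BLOCK_HEADER = struct.Struct("<4sBQQIQ")
MAGIC = b"BLK\x00"

class BlockIndex:
    """Per-block index: sequence number -> (offset, length), O(1) lookup."""
    def __init__(self, first_seq):
        self.first_seq = first_seq
        self.entries = []  # dense: entries[seq - first_seq] = (offset, length)

    def add(self, offset, length):
        self.entries.append((offset, length))

    def locate(self, seq):
        i = seq - self.first_seq
        return self.entries[i] if 0 <= i < len(self.entries) else None

idx = BlockIndex(first_seq=100)
idx.add(0, 42)    # seq 100 at offset 0, 42 bytes
idx.add(42, 17)   # seq 101 at offset 42, 17 bytes
header = BLOCK_HEADER.pack(MAGIC, 1, 100, 101, len(idx.entries), 42 + 17)
print(idx.locate(101))                   # (42, 17)
print(BLOCK_HEADER.unpack(header)[2:])   # (100, 101, 2, 59)
```

Sealing (feature 8) would then write the index and header alongside the block data via write-to-temp + rename, and crash recovery (feature 5) would rebuild `BlockIndex` from the raw block when the index file fails its checksum.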
### Key Files

| Action | File | Notes |
|--------|------|-------|
| Rewrite | `src/NATS.Server/JetStream/Storage/FileStore.cs` | Block-based architecture |
| Create | `src/NATS.Server/JetStream/Storage/MessageBlock.cs` | Block abstraction |
| Create | `src/NATS.Server/JetStream/Storage/BlockIndex.cs` | Per-block index |
| Modify | `src/NATS.Server/JetStream/Storage/S2Codec.cs` | Wire into block lifecycle |
| Modify | `src/NATS.Server/JetStream/Storage/AeadEncryptor.cs` | Per-block key management |
| Tests | `tests/.../JetStream/Storage/FileStoreBlockTests.cs` | New + expanded |

---

## Track B: Consensus (RAFT + JetStream Cluster)

**Gap #8 — MEDIUM, 4.4x | Gap #2 — CRITICAL, 213x**
**Go**: `raft.go` (5,037) + `jetstream_cluster.go` (10,887) | **NET**: 1,136 + 51 lines
**Dependencies**: Tracks A and C must merge first
**Tests to port**: ~85 raft + ~358 cluster + ~47 super-cluster = ~490

### Phase B1: RAFT Enhancements

1. **InstallSnapshot** — Chunked streaming snapshot transfer. Follower receives chunks, applies partial state, catches up from log.
2. **Membership Changes** — `ProposeAddPeer`/`ProposeRemovePeer` with single-server changes (matching Go's approach).
3. **Pre-vote Protocol** — Candidate must get pre-vote approval before incrementing term. Prevents disruptive elections from partitioned nodes.
4. **Log Compaction** — Truncate RAFT log after snapshot. Track last applied index.
5. **Healthy Node Classification** — Current/catching-up/leaderless states.
6. **Campaign Timeout Management** — Randomized election delays.

### Phase B2: JetStream Cluster Coordination

1. **Assignment Tracking** — `StreamAssignment` and `ConsumerAssignment` types via RAFT proposals. Records: stream config, replica group, placement constraints.
2. **RAFT Proposal Workflow** — Leader validates → proposes to meta-group → on commit, all nodes apply → assigned nodes start stream/consumer.
3. **Placement Engine** — Unique nodes, tag matching, cluster affinity. Expands `AssetPlacementPlanner.cs`.
4. **Inflight Deduplication** — Track pending proposals to prevent duplicates during leader transitions.
5. **Peer Remove & Stream Move** — Data rebalancing when a peer is removed.
6. **Step-down & Leadership Transfer** — Graceful leader handoff.
7. **Per-Stream RAFT Groups** — Separate RAFT group per stream for message replication.

### Key Files

| Action | File | Notes |
|--------|------|-------|
| Modify | `src/NATS.Server/Raft/RaftNode.cs` | Snapshot, membership, pre-vote, compaction |
| Create | `src/NATS.Server/Raft/RaftSnapshot.cs` | Streaming snapshot |
| Create | `src/NATS.Server/Raft/RaftMembership.cs` | Peer add/remove |
| Rewrite | `src/.../JetStream/Cluster/JetStreamMetaGroup.cs` | 51 → ~2,000+ lines |
| Create | `src/.../JetStream/Cluster/StreamAssignment.cs` | Assignment type |
| Create | `src/.../JetStream/Cluster/ConsumerAssignment.cs` | Assignment type |
| Create | `src/.../JetStream/Cluster/PlacementEngine.cs` | Topology-aware placement |
| Modify | `src/.../JetStream/Cluster/StreamReplicaGroup.cs` | Coordination logic |

---

## Track C: Protocol (Client, Consumer, JetStream API, Mirrors/Sources)

**Gaps #5, #4, #7, #3 — all HIGH**
**Dependencies**: C1/C3 independent; C2 needs C3; C4 needs Track A
**Tests to port**: ~43 client + ~134 consumer + ~184 jetstream + ~30 mirror = ~391

### C1: Client Protocol Handling (Gap #5, 7.3x)

1. Adaptive read buffer tuning (512→65536 based on throughput)
2. Write buffer pooling with flush coalescing
3. Per-client trace level
4. Full CLIENT/ROUTER/GATEWAY/LEAF/SYSTEM protocol dispatch
5. Slow consumer detection and eviction
6. Max control line enforcement (4096 bytes)
7. Write timeout with partial flush recovery

### C2: Consumer Delivery Engines (Gap #4, 13.3x)

1. NAK and redelivery tracking with exponential backoff schedules
2. Pending request queue for pull consumers with flow control
3. Max-deliveries enforcement (drop/reject/dead-letter)
4. Priority group pinning (sticky consumer assignment)
5. Idle heartbeat generation
6. Pause/resume state with advisory events
7. Filter subject skip tracking
8. Per-message redelivery delay arrays (backoff schedules)

### C3: JetStream API Layer (Gap #7, 7.7x)

1. Leader forwarding for non-leader API requests
2. Stream/consumer info caching with generation invalidation
3. Snapshot/restore API endpoints
4. Purge with subject filter, keep-N, sequence-based
5. Consumer pause/resume API
6. Advisory event publication for API operations
7. Account resource tracking (storage, streams, consumers)

### C4: Stream Mirrors, Sources & Transforms (Gap #3, 16.3x)

1. Mirror synchronization loop (continuous pull, apply locally)
2. Source/mirror ephemeral consumer setup with position tracking
3. Retry with exponential backoff and jitter
4. Deduplication window (`Nats-Msg-Id` header tracking)
5. Purge operations (subject filter, sequence-based, keep-N)
6. Stream snapshot and restore

### Key Files

| Action | File | Notes |
|--------|------|-------|
| Modify | `NatsClient.cs` | Adaptive buffers, slow consumer, trace |
| Modify | `PushConsumerEngine.cs` | Major expansion |
| Modify | `PullConsumerEngine.cs` | Major expansion |
| Create | `.../Consumers/RedeliveryTracker.cs` | NAK/redelivery state |
| Create | `.../Consumers/PriorityGroupManager.cs` | Priority pinning |
| Modify | `JetStreamApiRouter.cs` | Leader forwarding |
| Modify | `StreamApiHandlers.cs` | Purge, snapshot |
| Modify | `ConsumerApiHandlers.cs` | Pause/resume |
| Rewrite | `MirrorCoordinator.cs` | 22 → ~500+ lines |
| Rewrite | `SourceCoordinator.cs` | 36 → ~500+ lines |

---

## Track D: Networking (Gateway, Leaf Node, Routes)

**Gaps #11, #12, #13 — all MEDIUM**
**Dependencies**: None (starts immediately)
**Tests to port**: ~61 gateway + ~59 leafnode + ~39 routes = ~159

### D1: Gateway Bridging (Gap #11, 6.7x)

1. Interest-only mode (flood → interest switch)
2. Account-specific gateway routes
3. Reply mapper expansion (`_GR_.` prefix)
4. Outbound connection pooling (default 3)
5. Gateway TLS mutual auth
6. Message trace through gateways
7. Reconnection with exponential backoff

### D2: Leaf Node Connections (Gap #12, 6.7x)

1. Solicited leaf connection management with retry/reconnect
2. Hub-spoke subject filtering
3. JetStream domain awareness
4. Account-scoped leaf connections
5. Leaf compression negotiation (S2)
6. Dynamic subscription interest updates
7. Loop detection refinement (`$LDS.` prefix)

### D3: Route Clustering (Gap #13, 5.7x)

1. Route pooling (configurable, default 3)
2. Account-specific dedicated routes
3. Route compression (wire `RouteCompressionCodec.cs`)
4. Solicited route connections with discovery
5. Route permission enforcement
6. Dynamic route add/remove without restart
7. Gossip-based topology discovery

### Key Files

| Action | File | Notes |
|--------|------|-------|
| Modify | `GatewayManager.cs`, `GatewayConnection.cs` | Interest-only, pooling |
| Create | `GatewayInterestTracker.cs` | Interest tracking |
| Modify | `ReplyMapper.cs` | Full `_GR_.` handling |
| Modify | `LeafNodeManager.cs`, `LeafConnection.cs` | Solicited, JetStream |
| Modify | `LeafLoopDetector.cs` | `$LDS.` refinement |
| Modify | `RouteManager.cs`, `RouteConnection.cs` | Pooling, permissions |
| Create | `RoutePool.cs` | Connection pool |
| Modify | `RouteCompressionCodec.cs` | Wire into connection |

---

## Track E: Services (MQTT, Accounts, Config, WebSocket, Monitoring)

**Gaps #6 (HIGH), #9, #10, #14, #15 (MEDIUM)**
**Dependencies**: None (starts immediately)
**Tests to port**: ~59 mqtt + ~34 accounts + ~105 config + ~53 websocket + ~118 monitoring = ~369

### E1: MQTT Protocol (Gap #6, 10.9x)

1. Session persistence with JetStream-backed ClientID mapping
2. Will message handling
3. QoS 1/2 tracking with packet ID mapping and retry
4. Retained messages (per-account JetStream stream)
5. MQTT wildcard translation (`+`→`*`, `#`→`>`, `/`→`.`)
6. Session flapper detection with backoff
7. MaxAckPending enforcement
8. CONNECT packet validation and version negotiation

### E2: Account Management (Gap #9, 13x)

1. Service/stream export whitelist enforcement
2. Service import with weighted destination selection
3. Cycle detection for import chains
4. Response tracking (request-reply latency)
5. Account-level JetStream limits
6. Client tracking per account with eviction
7. Weighted subject mappings for traffic shaping
8. System account with `$SYS.>` handling

### E3: Configuration & Hot Reload (Gap #14, 2.7x)

1. SIGHUP signal handling (`PosixSignalRegistration`)
2. Auth change propagation (disconnect invalidated clients)
3. TLS certificate reloading for rotation
4. JetStream config changes at runtime
5. Logger reconfiguration without restart
6. Account list updates with connection cleanup

### E4: WebSocket Support (Gap #15, 1.3x)

1. WebSocket-specific TLS configuration
2. Origin checking refinement
3. `permessage-deflate` compression negotiation
4. JWT auth through WebSocket upgrade

### E5: Monitoring & Events (Gap #10, 3.5x)

1. Full system event payloads (connect/disconnect/auth)
2. Message trace propagation through full pipeline
3. Closed connection tracking (ring buffer for `/connz`)
4. Account-scoped monitoring (`/connz?acc=ACCOUNT`)
5. Sort options for monitoring endpoints

### Key Files

| Action | File | Notes |
|--------|------|-------|
| Modify | All `Mqtt/` files | Major expansion |
| Modify | `Account.cs`, `AuthService.cs` | Import/export, limits |
| Create | `AccountImportExport.cs` | Import/export logic |
| Create | `AccountLimits.cs` | Per-account JetStream limits |
| Modify | `ConfigReloader.cs` | Signal handling, auth propagation |
| Modify | `WebSocket/` files | TLS, compression, JWT |
| Modify | Monitoring handlers | Events, trace, connz |
| Modify | `MessageTraceContext.cs` | 22 → ~200+ lines |

---

## DB Update Protocol

For every Go test ported:

```sql
UPDATE go_tests
SET status='mapped',
    dotnet_test='<DotNetTestName>',
    dotnet_file='<DotNetTestFile>',
    notes='Ported from <go_file> in <track/phase>: <details>'
WHERE go_file='<go_file>' AND go_test='<GoTestName>';
```

For Go tests that cannot be ported (e.g., `signal_test.go` on .NET):

```sql
UPDATE go_tests
SET status='not_applicable',
    notes='<reason the test does not apply>'
WHERE go_file='<go_file>' AND go_test='<GoTestName>';
```

Batch DB updates at the end of each sub-phase to avoid per-test overhead.
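The batching recommendation amounts to one transaction per sub-phase. A minimal sketch (the `go_tests` schema and the test names below are assumptions for illustration; the real `test_parity.db` layout may differ):

```python
import sqlite3

# In-memory stand-in for test_parity.db with an assumed minimal schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE go_tests ("
    "go_file TEXT, go_test TEXT, status TEXT, "
    "dotnet_test TEXT, dotnet_file TEXT, notes TEXT)")
conn.execute(
    "INSERT INTO go_tests (go_file, go_test, status) VALUES (?, ?, 'unmapped')",
    ("filestore_test.go", "TestFileStoreBasics"))

# Rows accumulated while porting during the sub-phase, flushed once at the end.
ported = [
    ("mapped", "FileStoreBlockTests.Basics", "FileStoreBlockTests.cs",
     "Ported from filestore_test.go in Track A",
     "filestore_test.go", "TestFileStoreBasics"),
]
with conn:  # single transaction for the whole batch
    conn.executemany(
        "UPDATE go_tests SET status=?, dotnet_test=?, dotnet_file=?, notes=? "
        "WHERE go_file=? AND go_test=?",
        ported)
print(conn.execute("SELECT status, dotnet_test FROM go_tests").fetchone())
# ('mapped', 'FileStoreBlockTests.Basics')
```

Using `executemany` inside one `with conn:` block commits the whole batch atomically, which is what keeps per-test overhead off the critical path.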
---

## Execution Order

```
Week 1-2: Tracks A, D, E start in parallel (3 worktrees)
Week 2-3: Track C starts (after Track A merges for C4)
Week 3-4: Track B starts (after Tracks A + C merge)

Merge order: A → D → E → C → B → main
```

## Task Dependencies

| Task | Track | Blocked By |
|------|-------|------------|
| #3 | A: Storage | (none) |
| #6 | D: Networking | (none) |
| #7 | E: Services | (none) |
| #5 | C: Protocol | #3 (for C4 mirrors) |
| #4 | B: Consensus | #3, #5 |

## Success Criteria

- All 15 gaps from `structuregaps.md` addressed
- ~1,194 additional Go tests mapped in `test_parity.db`
- Mapped ratio: 29% → ~70%
- All new tests passing (`dotnet test` green)
- Feature-first: each feature validated by its corresponding Go tests
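A quick check of the parity arithmetic in the success criteria (the overall Go-test count of ~2,962 is derived here from 859 mapped at 29%; it is not stated above):

```python
# Derive the total Go test count from the current mapped ratio, then
# verify the projected post-plan ratio.
mapped_now, additional = 859, 1194
total = round(mapped_now / 0.29)           # ~2,962 Go tests overall (derived)
mapped_after = mapped_now + additional
print(mapped_after)                        # 2053
print(round(100 * mapped_after / total))   # 69, i.e. "~70%"
```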