Bottom-up dependency ordering: FileStore+RAFT → Cluster+API → Consumer+Stream → Client+MQTT → Config+Gateway → Route+LeafNode → Account → Monitoring+WebSocket. Codex-reviewed with fixes for async locks, token-based filtering, linearizable ReadIndex, and golden fixture tests for wire format compatibility.
596 lines
39 KiB
Markdown
596 lines
39 KiB
Markdown
# Design: Remaining Gap Closure (93 Gaps, 8 Phases)
|
||
|
||
**Date**: 2026-02-26
|
||
**Scope**: All 93 remaining gaps across FileStore, RAFT, JetStream Cluster, JetStream API, Consumer, Stream, Client, MQTT, Config, Gateway, Route, LeafNode, Account, Monitoring, WebSocket
|
||
**Approach**: Bottom-up by dependency (Phase 1→8)
|
||
**Codex review**: Verified 2026-02-26 — dependency ordering feedback incorporated
|
||
|
||
## Overview
|
||
|
||
This design covers ALL remaining gaps identified in `docs/structuregaps.md` after the production gaps plan (29 sub-gaps implemented in Tasks 1–27). The 93 remaining gaps span CRITICAL (1), HIGH (16), MEDIUM (58), and LOW (18) severity items.
|
||
|
||
### Codex Feedback (Incorporated)
|
||
|
||
**Round 1:**
|
||
1. Config auth/TLS reload (14.2–14.3) moved to Phase 5 before Leaf TLS/permission sync in Phase 6
|
||
2. API stubs (7.4 Snapshot, 7.5 Consumer Pause) placed in Phase 2 as endpoint wiring to existing state management
|
||
3. Networking split across Phase 5 (Config+Gateway) and Phase 6 (Route+LeafNode) per failure domain
|
||
4. Golden compatibility tests for binary formats (assignment encoding, snapshot, WAL)
|
||
5. Phase exit gates with targeted test verification per phase
|
||
|
||
**Round 2:**
|
||
6. 7.4 Snapshot API: Phase 2 is endpoint stub only; behavior completed with 4.7 in Phase 3
|
||
7. 3.10 Filter matching: use `SubjectMatch` token matcher, NOT compiled Regex (NATS wildcards are token-based)
|
||
8. 8.7 ReadIndex: requires leader quorum confirmation round for linearizable reads
|
||
9. 1.5 Checksum: use XxHash64 (already in .NET MessageRecord), not highway hash — .NET-internal format, not cross-cluster wire format
|
||
10. 2.10 Assignment codec: must match Go wire format byte-for-byte with golden fixture tests
|
||
11. 8.8 Election jitter: use `TotalMilliseconds`, not `Milliseconds` property
|
||
12. 1.6 Atomic writes: use `SemaphoreSlim` (async-compatible), not `ReaderWriterLockSlim` (sync-only deadlock risk)
|
||
13. 1.8 Write cache: bounded strong-reference cache with explicit size/TTL, not `WeakReference` (GC-nondeterministic)
|
||
14. 11.6 Gateway commands: specify exact command byte sequences matching Go wire format
|
||
15. 12.5 Leaf WebSocket: explicit adapter for message-framed WebSocket → stream conversion
|
||
16. Phase 7 depends on Phase 6 for account claim propagation paths
|
||
17. Run full test suite every 2 phases (after Phase 2, 4, 6, 8), not only at end
|
||
|
||
### Constraints
|
||
|
||
- Subagents use sonnet 4.6 model
|
||
- Only targeted unit tests during implementation (`dotnet test --filter`)
|
||
- Full test suite run only at end of all 8 phases
|
||
- Test parity DB (`docs/test_parity.db`) updated per phase
|
||
- `structuregaps.md` updated when complete
|
||
|
||
---
|
||
|
||
## Phase 1: FileStore Durability + RAFT Completion (11 gaps)
|
||
|
||
**Dependencies**: None — pure infrastructure
|
||
**Exit gate**: Checksums validated on read, atomic writes verified, tombstones persisted/recovered, cache expires correctly, filtered queries use SubjectTree, RAFT streams snapshots in chunks, transfers leadership, compacts with policies, checks quorum, jitters elections
|
||
|
||
### 1.5 Checksum Validation (HIGH)
|
||
|
||
**Files**: `MsgBlock.cs`
|
||
**Design**: Add `LastChecksum` byte array field to `MsgBlock`. Add `ValidateOnRead` flag. Wire existing `XxHash64` checksum computation into read path validation. Go uses highway hash but .NET already has XxHash64 in `MessageRecord` — extend to per-block last-checksum tracking.
|
||
|
||
### 1.6 Atomic File Overwrites (HIGH)
|
||
|
||
**Files**: New `AtomicFileWriter.cs` helper, `FileStore.cs`
|
||
**Design**: Static helper `WriteAtomicallyAsync(path, data)` that writes to `path.tmp`, calls `Flush(flushToDisk: true)`, then `File.Move(overwrite: true)`. Add `SemaphoreSlim _stateWriteLock` (async-compatible, avoids deadlock with async I/O) to `FileStore` for mutual exclusion during state writes. Use for index and metadata persistence.
|
||
|
||
### 1.7 Tombstone & Deletion Tracking (HIGH)
|
||
|
||
**Files**: New `SequenceSet.cs`, `MsgBlock.cs`
|
||
**Design**: Replace `HashSet<ulong> _deleted` with sparse `SequenceSet` (sorted list with range compression matching Go's AVL approach). Add `RemoveMsg(seq, secureErase)` that marks deletion in SequenceSet and optionally overwrites data with random bytes. Persist tombstones as special records in block file (flag byte `0x80` already exists). Recover tombstones during `RecoverBlocks()`.
|
||
|
||
### 1.8 Multi-Block Write Cache (MEDIUM)
|
||
|
||
**Files**: `FileStore.cs`, `MsgBlock.cs`
|
||
**Design**: Add `WriteCacheManager` inner class to `FileStore` with background flush via `PeriodicTimer`. Bounded strong-reference per-block cache with explicit size limit (default 64MB) and TTL eviction (default 2 seconds). Single `PeriodicTimer` worker checks cache age and size, evicts oldest entries when limits exceeded. `FlushPendingMsgsAsync()` background method.
|
||
|
||
### 1.10 Query/Filter Operations (MEDIUM)
|
||
|
||
**Files**: `FileStore.cs`
|
||
**Design**: Add `FilteredState(subject)` that uses existing `SubjectTree` for O(log n) lookups instead of LINQ scans. Add `LoadMsg(seq)` with block-aware binary search on block index ranges. Add `CheckSkipFirstBlock(filter, firstBlock)` optimization for range queries. Add `NumFiltered(filter)` with caching.
|
||
|
||
### 8.3 InstallSnapshot Streaming (HIGH)
|
||
|
||
**Files**: `RaftNode.cs`, `RaftWireFormat.cs`
|
||
**Design**: Add `SnapshotChunkEnumerator` yielding fixed-size chunks (64KB). Add `InstallSnapshotChunkRpc` message type. Receiver assembles into temp file, validates CRC32 over assembled content, then atomically replaces current snapshot.
|
||
|
||
### 8.4 Leadership Transfer (MEDIUM)
|
||
|
||
**Files**: `RaftNode.cs`
|
||
**Design**: Add `TransferLeadership(targetId)` that sends `TimeoutNowRpc` to target. Target immediately starts election. Leader stops accepting proposals during transfer. Timeout after 2x election timeout if transfer doesn't complete.
|
||
|
||
### 8.5 Log Compaction (MEDIUM)
|
||
|
||
**Files**: `RaftNode.cs`, new `CompactionPolicy.cs`
|
||
**Design**: Add `CompactionPolicy` enum (ByCount, BySize, ByAge). `CompactAsync(policy)` delegates to `RaftWal.CompactAsync` with computed cutoff index. Configurable via `RaftOptions.CompactionPolicy` and thresholds.
|
||
|
||
### 8.6 Quorum Check Before Proposing (MEDIUM)
|
||
|
||
**Files**: `RaftNode.cs`
|
||
**Design**: Add `HasQuorum()` that verifies majority of peers have heartbeat response within 2x election timeout. `ProposeAsync` returns `ProposalResult.NoQuorum` when check fails. Track last heartbeat time per peer in `RaftPeerState`.
|
||
|
||
### 8.7 Read-Only Query Optimization (LOW)
|
||
|
||
**Files**: `RaftNode.cs`
|
||
**Design**: Add `ReadIndex()` that confirms leadership via quorum heartbeat round (sends heartbeat to majority, waits for acknowledgment), then returns the current commit index. Callers serve reads from any node whose applied index >= returned commit index. Ensures linearizable reads — cannot serve stale data from a deposed leader. Avoids log growth for read-heavy workloads.
|
||
|
||
### 8.8 Election Timeout Jitter (LOW)
|
||
|
||
**Files**: `RaftNode.cs`
|
||
**Design**: Add `RandomizedElectionTimeout()` returning `baseTimeout + TimeSpan.FromMilliseconds(Random.Shared.Next(0, (int)baseTimeout.TotalMilliseconds))`. Use `TotalMilliseconds` (not `Milliseconds` property which is 0-999 component only). Apply in `ResetElectionTimer()`. Prevents synchronized elections after network partitions.
|
||
|
||
---
|
||
|
||
## Phase 2: JetStream Cluster Coordination + API (13 gaps)
|
||
|
||
**Dependencies**: Phase 1 (RAFT completion)
|
||
**Exit gate**: Peers added/removed, RAFT entries applied to state machine, assignments encoded/decoded, API requests forwarded to leader, clustered create/update works, rate limiting active, advisory events published
|
||
|
||
### 2.4 Peer Management & Stream Moves (HIGH)
|
||
|
||
**Files**: `JetStreamMetaGroup.cs`, `StreamReplicaGroup.cs`
|
||
**Design**: Add `ProcessAddPeer(peerId)` that triggers re-replication of under-replicated streams. `ProcessRemovePeer(peerId)` triggers stream reassignment away from removed peer. `RemovePeerFromStream(streamName, peerId)` removes specific peer from replica group. `RemapStreamAssignment(assignment)` reassigns stream to new peer set.
|
||
|
||
### 2.7 Entry Application Pipeline (HIGH)
|
||
|
||
**Files**: `JetStreamMetaGroup.cs`, `StreamReplicaGroup.cs`
|
||
**Design**: Add `ApplyMetaEntries(entries)` dispatcher that switches on entry type enum (StreamCreate, StreamUpdate, StreamDelete, ConsumerCreate, ConsumerDelete, PeerAdd, PeerRemove). `ApplyStreamEntries(entries)` processes per-stream RAFT entries. `ApplyStreamMsgOp(op)` handles message-level operations (store, remove, purge). `ApplyConsumerEntries(entries)` processes consumer state changes (ack, nak, deliver).
|
||
|
||
### 2.8 Topology-Aware Placement (MEDIUM)
|
||
|
||
**Files**: `PlacementEngine.cs`
|
||
**Design**: Extend `SelectPeerGroup()` with: `JetStreamUniqueTag` enforcement (no two replicas on same-tagged node), HA asset limits per peer (max streams/consumers), dynamic available storage from peer state, tag include/exclude with prefix matching, weighted selection by available resources. Add `TieredStreamAndReservationCount()` and `CreateGroupForConsumer()`.
|
||
|
||
### 2.9 RAFT Group Creation & Lifecycle (MEDIUM)
|
||
|
||
**Files**: New `RaftGroup.cs`
|
||
**Design**: `RaftGroup` record with `Name`, `Peers` list, `PreferredLeader`. Methods: `IsMember(peerId)`, `SetPreferred(peerId)`. Factory method `CreateRaftGroup(name, clusterSize, placement)` that uses PlacementEngine to select peers.
|
||
|
||
### 2.10 Assignment Encoding/Decoding (MEDIUM)
|
||
|
||
**Files**: New `AssignmentCodec.cs`
|
||
**Design**: Static methods for binary serialization matching Go wire format byte-for-byte. Stream assignments: `EncodeAddStreamAssignment()`, `EncodeUpdateStreamAssignment()`, `EncodeDeleteStreamAssignment()`, `DecodeStreamAssignment()`. Consumer assignments: `EncodeAddConsumerAssignment()`, `EncodeDeleteConsumerAssignment()`, `DecodeConsumerAssignment()`. S2 compression for configs > 1KB. **Golden fixture tests**: encode/decode round-trip tests with captured Go output bytes to verify format compatibility.
|
||
|
||
### 2.11 Unsupported Asset Handling (LOW)
|
||
|
||
**Files**: `JetStreamMetaGroup.cs`
|
||
**Design**: Add `UnsupportedStreamAssignment` and `UnsupportedConsumerAssignment` record types. When decoding an assignment with unknown version, create unsupported wrapper that logs warning and skips processing. Prevents crashes on mixed-version clusters.
|
||
|
||
### 2.12 Clustered API Handlers (MEDIUM)
|
||
|
||
**Files**: `StreamApiHandlers.cs`, `ConsumerApiHandlers.cs`
|
||
**Design**: Add `JsClusteredStreamRequest()` that proposes stream creation to meta RAFT group. `JsClusteredStreamUpdateRequest()` proposes update with validation. System-level subscriptions for result processing (`$JS.ACK.META.*`).
|
||
|
||
### 7.1 Leader Forwarding (HIGH)
|
||
|
||
**Files**: `JetStreamApiRouter.cs`
|
||
**Design**: Add `ForwardToLeader(request)` middleware check. If current node is not meta leader, forward request via internal NATS request to `$JS.API.{action}` on leader node. Return forwarded response to original client. Timeout after 5 seconds.
|
||
|
||
### 7.2 Clustered API Handlers (HIGH)
|
||
|
||
**Files**: `StreamApiHandlers.cs`, `ConsumerApiHandlers.cs`
|
||
**Design**: Cluster-aware create/update/delete that propose to RAFT instead of applying locally. Wait for proposal result (applied or error) before responding to client.
|
||
|
||
### 7.3 API Rate Limiting & Deduplication (MEDIUM)
|
||
|
||
**Files**: New `ApiRateLimiter.cs`
|
||
**Design**: `SemaphoreSlim`-based concurrency limiter (configurable max, default 256). Per-client request deduplication via `Nats-Request-Id` header stored in `ConcurrentDictionary<string, DateTime>` with 5-second TTL.
|
||
|
||
### 7.4 Snapshot & Restore API (MEDIUM)
|
||
|
||
**Files**: `StreamApiHandlers.cs`, `StreamSnapshotService.cs`
|
||
**Design**: Add `$JS.API.STREAM.SNAPSHOT` endpoint using `StreamSnapshotService` (will be fully implemented in Phase 3). Add `$JS.API.STREAM.RESTORE` endpoint for restore. Endpoints wire to existing snapshot infrastructure.
|
||
|
||
### 7.5 Consumer Pause/Resume API (MEDIUM)
|
||
|
||
**Files**: `ConsumerApiHandlers.cs`
|
||
**Design**: Add `$JS.API.CONSUMER.PAUSE` endpoint that calls existing `ConsumerManager.PauseConsumer(name, pauseUntil)`. Returns pause state in response.
|
||
|
||
### 7.6 Advisory Event Publication (LOW)
|
||
|
||
**Files**: Various API handlers
|
||
**Design**: Add `PublishAdvisory(eventType, payload)` calls in API handlers. Stream lifecycle: create, update, delete. Consumer lifecycle: create, delete, pause, resume. Uses `InternalEventSystem.PublishAsync()`.
|
||
|
||
---
|
||
|
||
## Phase 3: Consumer Engines + Stream Lifecycle (13 gaps)
|
||
|
||
**Dependencies**: Phase 2 (cluster coordination)
|
||
**Exit gate**: Consumer delivery loop dispatches messages, heartbeats sent, interest tracked, max deliveries enforced, filter skipping works, rate limiting via token bucket, source consumers fully configured, snapshot/restore operational, config validation complete
|
||
|
||
### 3.1 Core Message Delivery Loop (CRITICAL)
|
||
|
||
**Files**: `PushConsumerEngine.cs`
|
||
**Design**: Add `LoopAndGatherMsgs()` background task that: (1) Polls store via `LoadNextMsg(filter, lastDelivered)`, (2) Checks `RedeliveryTracker` for expired redeliveries, (3) Calculates `num_pending` from store state, (4) Checks delivery interest via `DeliveryInterestTracker`, (5) Dispatches message via client's write path with `MSG` frame. Uses `Channel<ConsumerSignal>` for wake-up on new messages, ack events, and configuration changes.
|
||
|
||
### 3.5 Idle Heartbeat & Flow Control (HIGH)
|
||
|
||
**Files**: `PushConsumerEngine.cs`
|
||
**Design**: Add `SendIdleHeartbeat()` that sends status message with `Nats-Pending-Messages`, `Nats-Pending-Bytes` headers when no messages delivered within `IdleHeartbeat` interval. Add `SendFlowControl()` for dedicated flow control replies with stall detection. Heartbeat timer using `PeriodicTimer`.
|
||
|
||
### 3.8 Delivery Interest Tracking (HIGH)
|
||
|
||
**Files**: New `DeliveryInterestTracker.cs`
|
||
**Design**: Monitors subscribe/unsubscribe events for the consumer's delivery subject. `HasInterest` property. Gateway interest checking via `GatewayInterestTracker.HasInterest(subject)`. `DeleteNotActive()` cleanup when no subscribers after configurable timeout (default 30 seconds). Integrates with `SubList` subscription change notifications.
|
||
|
||
### 3.9 Max Deliveries Enforcement (MEDIUM)
|
||
|
||
**Files**: `AckProcessor.cs`
|
||
**Design**: When delivery count exceeds `MaxDeliver`, generate `NotifyDeliveryExceeded` advisory via `InternalEventSystem`. Add `DeliveryExceededPolicy` enum (Drop, DeadLetter). `OnMaxDeliveriesExceeded` event for extensibility.
|
||
|
||
### 3.10 Filter Subject Skip Tracking (MEDIUM)
|
||
|
||
**Files**: New `FilterSkipTracker.cs`
|
||
**Design**: Use `SubjectMatch.IsMatch()` for token-based filter matching (NOT compiled Regex — NATS wildcards `*`/`>` are token-based, not regex). `SortedSet<ulong> _skippedSequences` for tracking unmatched sequences. `UpdateSkipped(seq)` adds to skip set. `ShouldSkip(subject)` returns whether message matches filter using existing `SubjectMatch` infrastructure.
|
||
|
||
### 3.11 Sample/Observe Mode (MEDIUM)
|
||
|
||
**Files**: `PushConsumerEngine.cs`
|
||
**Design**: Parse `SampleFrequency` string (e.g., `"1%"` → 0.01). `ShouldSample()` uses `Random.Shared.NextDouble() < frequency`. Latency measurement: record time from store to delivery. Latency advisory via `InternalEventSystem` with p50/p90/p99 buckets.
|
||
|
||
### 3.12 Reset to Sequence (MEDIUM)
|
||
|
||
**Files**: `ConsumerManager.cs`
|
||
**Design**: Add `ProcessResetRequest(targetSeq)` that resets `_deliveredSeq = targetSeq`, clears `_ackFloor`, clears pending acks from `AckProcessor`, resets `RedeliveryTracker`. Publishes advisory.
|
||
|
||
### 3.13 Rate Limiting (MEDIUM)
|
||
|
||
**Files**: New `TokenBucketRateLimiter.cs`, `PushConsumerEngine.cs`
|
||
**Design**: `TokenBucketRateLimiter` with configurable rate (bytes/sec) and burst size. `WaitForTokenAsync(byteCount)` blocks until tokens available. Dynamic rate updates via `UpdateRate(newRate)`. Integrate into delivery loop.
|
||
|
||
### 3.14 Cluster-Aware Pending Requests (LOW)
|
||
|
||
**Files**: `PullConsumerEngine.cs`
|
||
**Design**: Add `ProposeWaitingRequest(request)` that proposes pull requests through consumer RAFT group. Cluster-wide pending request tracking via replicated state.
|
||
|
||
### 4.3 Source Consumer Setup (HIGH)
|
||
|
||
**Files**: `SourceCoordinator.cs`
|
||
**Design**: Complete API request generation: build consumer create request with `FilterSubject`, `SubjectTransforms`, `OptStartSeq`/`OptStartTime`, `FlowControl` configuration. Account isolation verification via `Account.HasImportPermission()`. Starting sequence selection: use source stream's last sequence or configured start.
|
||
|
||
### 4.7 Stream Snapshot & Restore (MEDIUM)
|
||
|
||
**Files**: `StreamSnapshotService.cs`
|
||
**Design**: Implement TAR-based snapshot format: stream config, message blocks, consumer state files, all S2-compressed. Deadline enforcement via `CancellationTokenSource` with timeout. Consumer inclusion/exclusion by name list. Restore: validate TAR structure, decompress, write blocks, rebuild indices. Partial restore recovery: track progress, resume from last written block.
|
||
|
||
### 4.8 Stream Config Update Validation (MEDIUM)
|
||
|
||
**Files**: `StreamManager.cs`
|
||
**Design**: Add `ValidateConfigUpdate(oldConfig, newConfig)` that checks: subjects overlap with existing streams, mirror/source config immutability rules, retention policy change restrictions, MaxMsgs/MaxBytes/MaxAge monotonic decrease only, discard policy compatibility. Return `ConfigUpdateResult` with accepted changes and rejected reasons.
|
||
|
||
### 4.10 Source/Mirror Info Reporting (LOW)
|
||
|
||
**Files**: `MirrorCoordinator.cs`, `SourceCoordinator.cs`, monitoring handlers
|
||
**Design**: Add `GetMirrorInfo()` returning `MirrorInfoResponse` with lag, active status, error message. Add `GetSourceInfo()` returning `SourceInfoResponse[]`. Wire into `/jsz` monitoring endpoint and stream info API response.
|
||
|
||
---
|
||
|
||
## Phase 4: Client Protocol + MQTT (12 gaps)
|
||
|
||
**Dependencies**: Phase 3 (consumer engines)
|
||
**Exit gate**: Client caches route results (8192 max), slow consumers tracked per-kind, trace delivery works, SUB permission cached, will messages published on disconnect, QoS 1/2 durably tracked, retained delivered on subscribe
|
||
|
||
### 5.4 Per-Account Subscription Result Cache (HIGH)
|
||
|
||
**Files**: New `RouteResultCache.cs`, `NatsClient.cs`
|
||
**Design**: LRU cache (max 8192 entries) with per-account partitioning. Caches `SubListResult` from `SubList.Match()`. Atomic generation ID invalidation: each `SubList` modification increments generation, cache entries with stale generation are evicted. Integrate into route/gateway inbound message dispatch path.
|
||
|
||
### 5.5 Slow Consumer Stall Gate (MEDIUM)
|
||
|
||
**Files**: `NatsClient.cs`, `Account.cs`
|
||
**Design**: Extend `StallGate` with `SlowConsumerCount` per `ClientKind`. Account-level tracking: `Account.IncrementSlowConsumers(kind)`. Expose counters via `NatsClient.SlowConsumerStats` for monitoring integration. Fire `SlowConsumerEvent` when threshold crossed.
|
||
|
||
### 5.6 Dynamic Write Buffer Pooling (MEDIUM)
|
||
|
||
**Files**: `NatsClient.cs`
|
||
**Design**: Integrate `OutboundBufferPool` with flush coalescing. Add `PendingClientDrain` (`ConcurrentDictionary<long, NatsClient>`) for broadcast flush: when multiple clients have pending data, signal one write loop to drain all. Reduces syscall count for fan-out scenarios.
|
||
|
||
### 5.7 Per-Client Trace Level (MEDIUM)
|
||
|
||
**Files**: `NatsClient.cs`
|
||
**Design**: Add `TraceMsgDelivery(subject, destination, size)` method logging routed message details at Trace level. Add `EchoSupported` bool flag for per-client echo control on routed messages. Wire into `ProcessMessage()` path.
|
||
|
||
### 5.8 Subscribe Permission Caching (MEDIUM)
|
||
|
||
**Files**: `PermissionLruCache.cs`
|
||
**Design**: Extend cache with SUB permission entries alongside existing PUB entries. Add `GenerationId` field on `Account` that increments on permission changes. Cache entries include generation ID; evict on mismatch. `CheckSubPermission(subject)` mirrors existing `CheckPubPermission()`.
|
||
|
||
### 5.9 Internal Client Kinds (LOW)
|
||
|
||
**Files**: `NatsClient.cs` (ClientKind enum)
|
||
**Design**: Add `SYSTEM = 4`, `JETSTREAM = 5`, `ACCOUNT = 6` to `ClientKind` enum. Add `IsInternalClient()` extension method returning true for SYSTEM/JETSTREAM/ACCOUNT. Use in permission checks to bypass auth for internal clients.
|
||
|
||
### 5.10 Adaptive Read Buffer Short-Read Counter (LOW)
|
||
|
||
**Files**: `AdaptiveReadBuffer.cs`
|
||
**Design**: Add `_consecutiveShortReads` counter. Increment when `bytesRead < buffer.Length / 2`. Reset on full-buffer read. Shrink only after 4 consecutive short reads instead of immediately. Prevents oscillation on bursty traffic.
|
||
|
||
### 6.2 Will Message Delivery (HIGH)
|
||
|
||
**Files**: `MqttSessionStore.cs`, `MqttProtocolHandler.cs`
|
||
**Design**: Add `PublishWillMessage(session)` to `MqttSessionStore` that publishes stored will topic/payload/QoS/retain. Wire into `MqttProtocolHandler.OnDisconnect()` — trigger on abnormal disconnection (no DISCONNECT packet received). Will delay interval support (MQTT 5.0): schedule will delivery after delay.
|
||
|
||
### 6.3 QoS 1/2 Tracking (HIGH)
|
||
|
||
**Files**: New `MqttQoS1Tracker.cs`, `MqttQoS2StateMachine.cs`
|
||
**Design**: `MqttQoS1Tracker` with JetStream-backed ack tracking: store outgoing QoS 1 messages in `$MQTT_out` stream, remove on PUBACK. On reconnect, redeliver unacked messages. Extend `MqttQoS2StateMachine` with PUBREL delivery stream for phase 2. `$MQTT_qos2in` stream for incoming QoS 2 tracking.
|
||
|
||
### 6.4 MaxAckPending Enforcement (HIGH)
|
||
|
||
**Files**: New `MqttFlowController.cs`
|
||
**Design**: Per-subscription max-ack-pending limit (default from `MqttOptions.MaxAckPending`). `SemaphoreSlim`-based blocking when pending acks reach limit. Release on PUBACK/PUBCOMP. Config reload hook to update limits at runtime.
|
||
|
||
### 6.5 Retained Message Delivery on Subscribe (MEDIUM)
|
||
|
||
**Files**: `MqttProtocolHandler.cs`, `MqttRetainedStore.cs`
|
||
**Design**: On SUBSCRIBE, query `MqttRetainedStore` for matching retained messages using topic filter matching. Deliver matching retained messages with Retain flag set. For wildcard subscriptions, scan all retained topics via `GetAllRetainedAsync()` filtered by subscription filter.
|
||
|
||
### 6.6 Session Flapper Detection (LOW)
|
||
|
||
**Files**: `MqttSessionStore.cs`
|
||
**Design**: Track connect/disconnect timestamps per client ID. If 3+ connect/disconnect cycles within 10 seconds, mark as flapper. Apply exponential backoff (1s, 2s, 4s, 8s, 16s, max 30s) on CONNACK response delay. Clear flapper state after 60 seconds of stable connection.
|
||
|
||
---
|
||
|
||
## Phase 5: Configuration Reload + Gateway (11 gaps)
|
||
|
||
**Dependencies**: Phase 4 (client protocol for auth changes)
|
||
**Exit gate**: Auth changes propagated to connections, TLS reloaded for new connections, cluster config hot-reloaded, gateways reconnect with backoff, account-specific routing works, queue groups propagated, reply mapping cached
|
||
|
||
### 14.2 Auth Change Propagation (MEDIUM)
|
||
|
||
**Files**: `ConfigReloader.cs`, `NatsServer.cs`
|
||
**Design**: Add `PropagateAuthChanges(oldUsers, newUsers)` that diffs user sets, sends updated permissions to existing connections via `NatsClient.UpdatePermissions(newPerms)`. Connections with revoked users are gracefully disconnected. Add `ApplyPermissionChange` flag to distinguish reconnect-required vs in-place changes.
|
||
|
||
### 14.3 TLS Certificate Reload (MEDIUM)
|
||
|
||
**Files**: `ConfigReloader.cs`, `NatsServer.cs`
|
||
**Design**: Add `ReloadTlsCertificates(newCertPath, newKeyPath)` that loads new `X509Certificate2`, swaps the `SslServerAuthenticationOptions.ServerCertificate` for the listener. New connections use new cert; existing connections continue with old cert. Validation: check cert expiry, key match.
|
||
|
||
### 14.4 Cluster Config Hot Reload (MEDIUM)
|
||
|
||
**Files**: `ConfigReloader.cs`
|
||
**Design**: Add `ApplyClusterConfigChanges(diff)` that processes route URL additions/removals (connect to new routes, gracefully disconnect removed), gateway URL changes, leaf node URL changes. Fires `ClusterConfigChanged` event for subsystem notification.
|
||
|
||
### 14.5 Logging Level Changes (LOW)
|
||
|
||
**Files**: `ConfigReloader.cs`
|
||
**Design**: Add `ApplyLoggingChanges(newLevel)` that updates the Serilog `LoggingLevelSwitch` at runtime. Supports Debug, Trace, Information, Warning, Error changes. `LogTime` format changes applied immediately.
|
||
|
||
### 14.6 JetStream Config Changes (LOW)
|
||
|
||
**Files**: `ConfigReloader.cs`
|
||
**Design**: Add `ApplyJetStreamConfigChanges(diff)` for JetStream option changes: MaxMemory, MaxStore, Domain updates. StoreDir is non-reloadable (already detected). Fires `JetStreamConfigChanged` event.
|
||
|
||
### 11.2 Gateway Reconnection with Backoff (MEDIUM)
|
||
|
||
**Files**: `GatewayManager.cs`
|
||
**Design**: Add `ReconnectGatewayAsync(gateway)` with exponential backoff: initial=1s, max=60s, jitter=±25%. Background reconnection loop per disconnected gateway using `Task.Delay` with computed backoff. Clear backoff on successful connection.
|
||
|
||
### 11.3 Account-Specific Gateway Routes (MEDIUM)
|
||
|
||
**Files**: `GatewayManager.cs`, `GatewayConnection.cs`
|
||
**Design**: Add `SendAccountSubsToGateway(account, connection)` that sends only subscriptions belonging to the specified account. Per-account interest tracking in `GatewayInterestTracker`. Filter outbound messages by account before sending to gateway.
|
||
|
||
### 11.4 Queue Group Propagation (MEDIUM)
|
||
|
||
**Files**: `GatewayManager.cs`, `GatewayConnection.cs`
|
||
**Design**: Add `SendQueueSubsToGateway(connection)` that propagates queue group subscriptions separately from plain subscriptions. Include queue weight information for load balancing decisions across gateways.
|
||
|
||
### 11.5 Reply Subject Mapping Cache (MEDIUM)
|
||
|
||
**Files**: `ReplyMapper.cs`
|
||
**Design**: Add `ReplyMapCache` (LRU, max 4096 entries) with TTL-based expiration (default 5 minutes). `TrackGWReply(originalSubject, mappedSubject)` adds entry. Background timer prunes expired entries. Cache hit on reverse mapping avoids repeated `_GR_` prefix computation.
|
||
|
||
### 11.6 Gateway Command Protocol (LOW)
|
||
|
||
**Files**: `GatewayManager.cs`, `GatewayConnection.cs`
|
||
**Design**: Add `GatewayCommand` enum: Gossip, AllSubsStart, AllSubsComplete. Command wire format matches Go exactly: `"gw_gossip"`, `"all_subs_start"`, `"all_subs_complete"` byte sequences in control line. `ProcessGatewayCommand(cmd)` handler. Gossip command shares gateway topology. AllSubsStart/Complete bracket bulk subscription synchronization. Golden tests against Go command byte sequences.
|
||
|
||
### 11.7 Connection Registration (LOW)
|
||
|
||
**Files**: `GatewayManager.cs`
|
||
**Design**: Add `RegisterInboundGatewayConnection(conn)`, `RegisterOutboundGatewayConnection(conn)`, `GetRemoteGateway(name)` for full gateway connection registry. Track connection state (connecting, connected, disconnected) per remote gateway.
|
||
|
||
---
|
||
|
||
## Phase 6: Route Clustering + LeafNode (12 gaps)
|
||
|
||
**Dependencies**: Phase 5 (Config reload, Gateway infrastructure)
|
||
**Exit gate**: Account-specific routes work, pool sizes negotiated, routes stored by hash, cluster splits handled, leaf TLS hot-reloads, permissions synced to leaves, leaf connections validated, WebSocket transport works
|
||
|
||
### 13.2 Account-Specific Dedicated Routes (MEDIUM)
|
||
|
||
**Files**: `RouteManager.cs`, `RouteConnection.cs`
|
||
**Design**: Add `AccountRouteMap` class mapping account names to dedicated route connections. When a message targets a specific account, route via the dedicated connection instead of the pool. Configurable via `RouteOptions.AccountRoutes`.
|
||
|
||
### 13.3 Route Pool Size Negotiation (MEDIUM)
|
||
|
||
**Files**: `RouteManager.cs`, `RouteConnection.cs`
|
||
**Design**: Add `NegotiatePoolSize(remoteInfo)` during route handshake. Compare local `RoutePoolSize` with remote's advertised pool size. Use `min(local, remote)` for heterogeneous clusters. Log warning on mismatch.
|
||
|
||
### 13.4 Route Hash Storage (LOW)
|
||
|
||
**Files**: `RouteManager.cs`
|
||
**Design**: Add `ConcurrentDictionary<ulong, RouteConnection> _routesByHash`. `StoreRouteByHash(hash, route)` and `GetRouteByHash(hash)` for O(1) route lookup by connection hash. Used for deterministic routing of account-specific traffic.
|
||
|
||
### 13.5 Cluster Split Handling (LOW)
|
||
|
||
**Files**: `RouteManager.cs`
|
||
**Design**: Add `RemoveAllRoutesExcept(keepPeerIds)` for partition handling — disconnects all routes not in the keep set. `RemoveRoute(peerId)` removes a specific route. Fires `RouteRemoved` event for downstream notification.
|
||
|
||
### 13.6 No-Pool Route Fallback (LOW)
|
||
|
||
**Files**: `RouteManager.cs`, `RouteConnection.cs`
|
||
**Design**: During route handshake, detect if remote server doesn't advertise pool support. Fall back to single-connection mode (pool size = 1). Maintain backward compatibility with pre-pool-era servers.
|
||
|
||
### 12.1 TLS Certificate Hot-Reload (MEDIUM)
|
||
|
||
**Files**: `LeafNodeManager.cs`
|
||
**Design**: Add `UpdateTlsConfig(newCert)` that swaps `SslServerAuthenticationOptions.ServerCertificate` for new leaf node connections. Existing connections continue with old cert. Triggered by `ConfigReloader` TLS change event (Phase 5 dependency).
|
||
|
||
### 12.2 Permission & Account Syncing (MEDIUM)
|
||
|
||
**Files**: `LeafNodeManager.cs`, `LeafConnection.cs`
|
||
**Design**: Add `SendPermsAndAccountInfo(connection)` that sends current permission set and account info to leaf connection. `InitLeafNodeSmapAndSendSubs(connection)` initializes subscription map and sends current subscriptions. Triggered on reconnect and on auth config change.
|
||
|
||
### 12.3 Leaf Connection State Validation (MEDIUM)
|
||
|
||
**Files**: `LeafNodeManager.cs`
|
||
**Design**: Add `ValidateRemoteLeafNode(connection)` on reconnect that checks remote config hash against stored value. If config changed (different permissions, different account), re-initialize the connection. Prevents stale state after hub config changes.
|
||
|
||
### 12.4 JetStream Migration Checks (LOW)
|
||
|
||
**Files**: `LeafNodeManager.cs`
|
||
**Design**: Add `CheckJetStreamMigrate(leafConfig)` validation before leaf JetStream domain changes. Verifies no active streams/consumers would be orphaned. Returns migration feasibility result.
|
||
|
||
### 12.5 Leaf Node WebSocket Support (LOW)
|
||
|
||
**Files**: `LeafNodeManager.cs`, `LeafConnection.cs`
|
||
**Design**: Add WebSocket transport option using existing `WsUpgrade` infrastructure. Add `WebSocketStreamAdapter` that wraps the message-framed `WebSocket` as a `Stream`: reads accumulate received messages into a buffer, writes send as binary WebSocket messages, close propagates `WebSocketCloseStatus`. Handles fragmentation (reassemble multi-frame messages) and backpressure (block writes when send buffer full). `LeafConnection` uses either `NetworkStream` or `WebSocketStreamAdapter`. Configured via `LeafNodeOptions.WebSocket = true`.
|
||
|
||
### 12.6 Leaf Cluster Registration (LOW)
|
||
|
||
**Files**: `LeafNodeManager.cs`
|
||
**Design**: Add `RegisterLeafNodeCluster(clusterId, connections)` and `HasLeafNodeCluster(clusterId)` for tracking leaf cluster topology. Returns connected leaf clusters and their connection count.
|
||
|
||
### 12.7 Connection Disable Flag (LOW)
|
||
|
||
**Files**: `LeafNodeManager.cs`
|
||
**Design**: Add `IsLeafConnectDisabled(remoteUrl)` check in solicitation loop. Per-remote disable flag in `LeafNodeOptions.DisabledRemotes`. When disabled, skip solicitation attempt. Can be toggled at runtime via config reload.
|
||
|
||
---
|
||
|
||
## Phase 7: Account Management & Multi-Tenancy (10 gaps)
|
||
|
||
**Dependencies**: Phase 4 (client protocol), Phase 5 (config reload), Phase 6 (leaf/route propagation for account claim hot-reload paths)
|
||
**Exit gate**: Service export latency tracked with p50/p90/p99, response thresholds enforced, stream import cycles detected, wildcard exports work, accounts expire on TTL, claims hot-reloaded, NKey revocation enforced
|
||
|
||
### 9.1 Service Export Latency Tracking (MEDIUM)
|
||
|
||
**Files**: `Account.cs`, new `ServiceLatencyTracker.cs`
|
||
**Design**: `ServiceLatencyTracker` with histogram for p50/p90/p99 latency. `TrackServiceExport(exportSubject)` starts tracking. `RecordLatency(duration)` updates histogram. `UnTrackServiceExport(subject)` stops tracking. Advisory published via `InternalEventSystem` with latency buckets. Configurable sample rate.
|
||
|
||
### 9.2 Service Export Response Threshold (MEDIUM)
|
||
|
||
**Files**: `Account.cs`
|
||
**Design**: Add `ResponseThreshold` property to service export configuration. `CheckResponseThreshold(export, elapsed)` returns true if response time exceeds threshold. Triggers `sendBadRequestTrackingLatency()` advisory when threshold exceeded.
|
||
|
||
### 9.3 Stream Import Cycle Detection (MEDIUM)
|
||
|
||
**Files**: `Account.cs`
|
||
**Design**: Add `StreamImportFormsCycle(fromAccount, toAccount, subject)` using DFS on the import graph. Maintain import adjacency list in-memory. `CheckStreamImportsForCycles()` validates all stream imports. Called during import addition and config reload.
|
||
|
||
### 9.4 Wildcard Service Exports (MEDIUM)
|
||
|
||
**Files**: `Account.cs`
|
||
**Design**: Add `GetWildcardServiceExport(subject)` that matches against exports containing `*` or `>` wildcards. Uses existing `SubjectMatch.IsMatch()` for wildcard evaluation. Returns first matching export or null.
|
||
|
||
### 9.5 Account Expiration & TTL (MEDIUM)
|
||
|
||
**Files**: `Account.cs`
|
||
**Design**: Add `ExpiresAt` property from JWT claims. `IsExpired()` checks `DateTimeOffset.UtcNow >= ExpiresAt`. `SetExpirationTimer(ttl)` creates `Timer` that fires on expiry, calling `OnAccountExpired()` which disconnects all clients and removes the account from the account map.
|
||
|
||
### 9.6 Account Claim Hot-Reload (MEDIUM)
|
||
|
||
**Files**: `Account.cs`, `NatsServer.cs`
|
||
**Design**: Add `UpdateAccountClaims(newClaims)` that diffs old vs new claims: permission changes applied to existing clients, new exports/imports registered, removed exports/imports cleaned up. `UpdateAccountClaimsWithRefresh()` additionally forces client re-auth.
|
||
|
||
### 9.7 Service/Stream Activation Expiration (LOW)
|
||
|
||
**Files**: `Account.cs`
|
||
**Design**: Add `CheckActivationExpiry(export)` that validates JWT activation claim `exp` field. If expired, mark export as inactive. Background timer checks expiry periodically (every 60 seconds).
|
||
|
||
### 9.8 User NKey Revocation (LOW)
|
||
|
||
**Files**: `Account.cs`, `NatsClient.cs`
|
||
**Design**: Wire existing `_revokedUsers` dictionary into active connection validation. `CheckUserRevoked(nkey)` called on each client operation. Revoked users disconnected with `REVOKED_USER` reason. Revocation list updated on config reload.
|
||
|
||
### 9.9 Response Service Import (LOW)
|
||
|
||
**Files**: `Account.cs`
|
||
**Design**: Add `AddReverseRespMapEntry(replyPrefix, importAccount)` for cross-account request-reply response routing. `CheckForReverseEntries(subject)` checks if a reply subject needs remapping. Entries stored in `ConcurrentDictionary<string, Account>` with TTL cleanup.
|
||
|
||
### 9.10 Service Import Shadowing Detection (LOW)
|
||
|
||
**Files**: `Account.cs`
|
||
**Design**: Add `ServiceImportShadowed(import)` that checks if a more specific local subscription shadows an imported service. Returns true if any local subscription matches the import subject with higher specificity.
|
||
|
||
---
|
||
|
||
## Phase 8: Monitoring, Events & WebSocket (11 gaps)
|
||
|
||
**Dependencies**: Phase 7 (Account management for auth events)
|
||
**Exit gate**: Closed connections queryable via `/connz?state=closed`, account filtering works, sort options work, auth error events published, trace propagated across servers, event payloads complete, OCSP events published, WebSocket TLS configured
|
||
|
||
### 10.1 Closed Connections Ring Buffer (HIGH)
|
||
|
||
**Files**: New `ClosedConnectionRingBuffer.cs`, `NatsServer.cs`, `ConnzHandler.cs`
|
||
**Design**: Fixed-size ring buffer (default 1024, configurable via `ServerOptions.MaxClosedConnections`). `RecordClosedConnection(closedClient)` called on client disconnect. Stores `ClosedClient` records with disconnect reason, duration, bytes in/out. Wire into `ConnzHandler` to support `/connz?state=closed` query parameter.
|
||
|
||
### 10.2 Account-Scoped Filtering (MEDIUM)
|
||
|
||
**Files**: `ConnzHandler.cs`
|
||
**Design**: Add `acc` query parameter parsing. Filter `Connz.Connections` by `client.Account.Name == acc`. Apply before pagination. Handle special `$G` global account.
|
||
|
||
### 10.3 Sort Options (MEDIUM)
|
||
|
||
**Files**: `ConnzHandler.cs`
|
||
**Design**: Add `SortBy` enum: CID, Start, Subs, Pending, MsgsTo, MsgsFrom, BytesTo, BytesFrom, Last, Idle, Uptime, Stop, Reason. Parse `sort` query parameter. Apply `OrderBy` before pagination. Default sort: CID ascending.
|
||
|
||
### 10.4 Message Trace Propagation (MEDIUM)
|
||
|
||
**Files**: `MessageTraceContext.cs`, `InternalEventSystem.cs`
|
||
**Design**: Add trace context header (`Nats-Trace-Id`, `Nats-Trace-Hop`) propagation. Each server hop increments `Nats-Trace-Hop`. Trace events include hop count, server ID, timestamp. Wire into route/gateway/leaf message forwarding path.
|
||
|
||
### 10.5 Auth Error Events (MEDIUM)
|
||
|
||
**Files**: `NatsServer.cs`, `InternalEventSystem.cs`
|
||
**Design**: Add `SendAuthErrorEvent(client, reason)` that publishes to `$SYS.SERVER.{id}.CLIENT.AUTH.ERR`. Event payload: client info (IP, port, user), error reason, timestamp. `SendAccountAuthErrorEvent(account, client, reason)` for account-scoped events.
|
||
|
||
### 10.6 Full System Event Payloads (MEDIUM)
|
||
|
||
**Files**: `EventTypes.cs`, `InternalEventSystem.cs`
|
||
**Design**: Audit all event types and populate missing fields: complete `ServerInfo` (version, host, port, cluster, tags, JetStream), complete `ClientInfo` (account, user, name, lang, version, RTT, IP), complete `ConnectEventMsg` (all fields from Go's `ConnectEventMsgType`).
|
||
|
||
### 10.7 Closed Connection Reason Tracking (MEDIUM)
|
||
|
||
**Files**: `NatsClient.cs`, `NatsServer.cs`
|
||
**Design**: Populate `ClosedClient.Reason` consistently: set `CloseReason` enum before every `CloseAsync()` call path — timeout, protocol error, auth failure, slow consumer, server shutdown, max payload, etc. Map to Go's `ClosedState` values.
|
||
|
||
### 10.8 Remote Server Events (LOW)
|
||
|
||
**Files**: `NatsServer.cs`, `InternalEventSystem.cs`
|
||
**Design**: Add `RemoteServerShutdown(serverId)` event when a cluster peer disconnects. `RemoteServerUpdate(serverId, info)` when peer INFO changes. `LeafNodeConnected(leafId)` when leaf connects. Publish to `$SYS.SERVER.{id}.PEER.*` subjects.
|
||
|
||
### 10.9 Event Compression (LOW)
|
||
|
||
**Files**: `InternalEventSystem.cs`
|
||
**Design**: Add `GetAcceptEncoding(client)` that checks if the subscribing client supports S2 compression. If supported, compress system event payloads with `S2Codec` before publishing. Add `Nats-Encoding: s2` header.
|
||
|
||
### 10.10 OCSP Peer Events (LOW)
|
||
|
||
**Files**: `NatsServer.cs`, `InternalEventSystem.cs`
|
||
**Design**: Add `SendOCSPPeerRejectEvent(peer, reason)` when OCSP stapling check fails for a peer certificate. `SendOCSPPeerChainlinkInvalidEvent(peer, certIndex)` for chain validation failures. Publish to `$SYS.SERVER.{id}.OCSP.*`.
|
||
|
||
### 15.1 WebSocket-Specific TLS (LOW)
|
||
|
||
**Files**: `WsUpgrade.cs`, `NatsServer.cs`
|
||
**Design**: Add separate TLS configuration for WebSocket listener (`WebSocketTlsOptions`) with its own certificate, client cert requirements, and cipher suite preferences. Falls back to main TLS config if not specified.
|
||
|
||
---
|
||
|
||
## Cross-Cutting Concerns
|
||
|
||
### Test Parity Database Updates
|
||
|
||
Each phase updates `docs/test_parity.db`:
|
||
- New `.NET` test entries added to `dotnet_tests` table
|
||
- Matching Go test entries updated in `go_tests` table where applicable (status → mapped)
|
||
|
||
### structuregaps.md Updates
|
||
|
||
Updated after all 8 phases complete — all 93 remaining gaps marked as IMPLEMENTED.
|
||
|
||
### Exit Gate Protocol
|
||
|
||
Per phase:
|
||
1. Run targeted tests for all modified test files: `dotnet test --filter "FullyQualifiedName~TestClassName"`
|
||
2. Run existing tests for modified subsystems to verify no regressions
|
||
3. Golden fixture tests for binary formats (assignment codec, snapshot, WAL) where applicable
|
||
4. Commit all changes
|
||
5. Update test parity DB
|
||
|
||
Full test suite checkpoints (every 2 phases):
|
||
- After Phase 2: `dotnet test` — verify FileStore + RAFT + Cluster + API
|
||
- After Phase 4: `dotnet test` — verify Consumer + Stream + Client + MQTT
|
||
- After Phase 6: `dotnet test` — verify Config + Gateway + Route + LeafNode
|
||
- After Phase 8: `dotnet test` — final full verification
|
||
|
||
After all 8 phases:
|
||
1. Update `structuregaps.md` — mark all 93 gaps as IMPLEMENTED
|
||
2. Commit final updates
|