This captures the iterative CommentChecker cleanup plus updated snapshot/report outputs used to validate and benchmark the latest JetStream and transport work.

Add XML doc comments to public properties across EventTypes, Connz, Varz,
NatsOptions, StreamConfig, IStreamStore, FileStore, MqttListener,
MqttSessionStore, MessageTraceContext, and JetStreamApiResponse. Fix flaky
tests by increasing timing margins (ResponseTracker expiry 1ms→50ms,
sleep 50ms→200ms) and document known flaky test patterns in tests.md.

Three optimizations make the serial fan-out path cheaper (fan-out 0.63x→0.70x,
multi pub/sub 0.65x→0.69x):
1. Pre-format MSG prefix ("MSG subject ") and suffix (" [reply] sizes\r\n") once
per publish. New SendMessagePreformatted writes prefix+sid+suffix directly into
_directBuf — zero encoding, pure memory copies. Only SID varies per delivery.
2. Replace queue-group round-robin Interlocked.Increment/Decrement with non-atomic
uint QueueRoundRobin++ (safe: ProcessMessage runs single-threaded per connection).
3. Replace HashSet<INatsClient> pcd with ThreadStatic INatsClient[] + linear scan.
The scan is O(n), but n≤16, so it is faster than hashing for small fan-out counts.
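
The first optimization can be sketched in a language-neutral way. This is an illustrative Python sketch, not the server's C# code: the function names and the bytearray standing in for _directBuf are assumptions, and it covers only the plain MSG form (no headers), where the header line is `MSG <subject> <sid> [reply-to] <#bytes>\r\n`.

```python
from typing import Iterable, Optional, Tuple


def preformat_msg(subject: str, reply: Optional[str], payload_len: int) -> Tuple[bytes, bytes]:
    """Encode once the header parts that are identical for every subscriber."""
    prefix = f"MSG {subject} ".encode("utf-8")
    if reply:
        suffix = f" {reply} {payload_len}\r\n".encode("utf-8")
    else:
        suffix = f" {payload_len}\r\n".encode("utf-8")
    return prefix, suffix


def send_preformatted(buf: bytearray, prefix: bytes, sid: bytes,
                      suffix: bytes, payload: bytes) -> None:
    """Append one MSG frame using only byte copies; nothing is encoded here."""
    buf.extend(prefix)
    buf.extend(sid)
    buf.extend(suffix)
    buf.extend(payload)
    buf.extend(b"\r\n")


def fan_out(subject: str, reply: Optional[str], payload: bytes,
            subscribers: Iterable[Tuple[bytes, bytearray]]) -> None:
    """Deliver one publish to many (sid, outbound buffer) pairs."""
    prefix, suffix = preformat_msg(subject, reply, len(payload))  # once per publish
    for sid, buf in subscribers:
        send_preformatted(buf, prefix, sid, suffix, payload)      # only sid varies
```

The encoding cost is paid once per publish instead of once per delivery, which matters most when one message fans out to many subscribers.
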

Replace eager Dictionary<ulong, StoredMessage> with lightweight
Dictionary<ulong, MessageMeta> to eliminate ~200B StoredMessage
allocation per message on the write path.
- Add MessageMeta struct (BlockId, Subject, PayloadLength, HeaderLength,
TimestampNs) — ~40B vs ~200B for StoredMessage
- Add MaterializeMessage(seq) for on-demand reconstruction from blocks
- Update all ~60 _messages references to use _meta
- Methods needing full payload (LoadAsync, ListAsync, etc.) call
MaterializeMessage; metadata-only paths use _meta directly
- Fix MsgBlock.WriteAt to clear stale delete markers on re-write
- Add cached state properties (LastSeq, MessageCount, TotalBytes, FirstSeq)
to IStreamStore/FileStore/MemStore — eliminates GetStateAsync on publish path
- Add Capture(StreamHandle, ...) overload to StreamManager — eliminates
double FindBySubject lookup (once in JetStreamPublisher, once in Capture)
- Remove _messageIndexes dictionary from FileStore write path — all lookups
now use _messages directly, saving ~48B allocation per message
- Add JetStreamPubAckFormatter for hand-rolled UTF-8 success ack formatting —
avoids JsonSerializer overhead on the hot publish path
- Switch flush loop to exponential backoff (1→2→4→8ms) matching Go server
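
The metadata index with on-demand materialization from the list above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the FileStore implementation: the MessageMeta fields follow the list, while MetaIndexedStore, the in-memory `_blocks` stand-in for on-disk blocks, and the method names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class MessageMeta:
    """Small per-message index entry; fields follow the list above."""
    block_id: int
    subject: str
    payload_length: int
    header_length: int
    timestamp_ns: int


@dataclass(frozen=True)
class StoredMessage:
    """Full message, built only when a caller needs headers or payload."""
    sequence: int
    subject: str
    headers: bytes
    payload: bytes
    timestamp_ns: int


class MetaIndexedStore:
    def __init__(self) -> None:
        self._meta: Dict[int, MessageMeta] = {}
        # Hypothetical stand-in for on-disk blocks: block_id -> {seq: raw record}.
        self._blocks: Dict[int, Dict[int, bytes]] = {}
        self._last_seq = 0

    def append(self, subject: str, headers: bytes, payload: bytes,
               timestamp_ns: int, block_id: int = 0) -> int:
        """Write path: store the raw record and a small meta entry only."""
        self._last_seq += 1
        seq = self._last_seq
        self._blocks.setdefault(block_id, {})[seq] = headers + payload
        self._meta[seq] = MessageMeta(block_id, subject, len(payload),
                                      len(headers), timestamp_ns)
        return seq

    def subject_of(self, seq: int) -> str:
        """Metadata-only path: never touches the block data."""
        return self._meta[seq].subject

    def materialize(self, seq: int) -> StoredMessage:
        """Rebuild the full message from its block when it is actually needed."""
        meta = self._meta[seq]
        raw = self._blocks[meta.block_id][seq]
        headers = raw[:meta.header_length]
        payload = raw[meta.header_length:meta.header_length + meta.payload_length]
        return StoredMessage(seq, meta.subject, headers, payload, meta.timestamp_ns)
```

The write path allocates only the small index entry; the cost of building a full message moves to the readers that need it.
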
Add string.Create fast path in NatsToMqtt for subjects without _DOT_
escape sequences (common case), avoiding StringBuilder allocation.
Pre-warm the topic bytes cache when MQTT subscriptions are added to
eliminate cache miss on first message delivery.
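
A rough sketch of the fast path and cache pre-warm, in Python for illustration only. The mapping is an assumption made for the example ('.' becomes '/', and the `_DOT_` escape becomes a literal '.'); the real NatsToMqtt rules may differ. TopicBytesCache and its methods are hypothetical names, and the sketch pre-warms literal subjects only, since the source does not say how wildcard filters are handled.

```python
from typing import Dict

DOT_ESCAPE = "_DOT_"


def nats_to_mqtt(subject: str) -> str:
    """Translate a NATS subject to an MQTT topic (assumed mapping, see above)."""
    if DOT_ESCAPE not in subject:
        # Fast path (common case): a single separator swap, no escape handling.
        return subject.replace(".", "/")
    # Slow path: swap separators first, then restore the escaped dots.
    return subject.replace(".", "/").replace(DOT_ESCAPE, ".")


class TopicBytesCache:
    """Caches the encoded MQTT topic for each NATS subject."""

    def __init__(self) -> None:
        self._cache: Dict[str, bytes] = {}
        self.misses = 0

    def prewarm(self, subject: str) -> None:
        """Called when a subscription is added, before any message arrives."""
        if subject not in self._cache:
            self._cache[subject] = nats_to_mqtt(subject).encode("utf-8")

    def get(self, subject: str) -> bytes:
        """Delivery path: a pre-warmed subject never takes the miss branch."""
        cached = self._cache.get(subject)
        if cached is None:
            self.misses += 1
            cached = nats_to_mqtt(subject).encode("utf-8")
            self._cache[subject] = cached
        return cached
```
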
Replace per-message DeliverMessage/flush in DeliverPullFetchMessagesAsync
with SendMessageNoFlush + batch flush every 64 messages. Add signal-based
wakeup (StreamHandle.NotifyPublish/WaitForPublishAsync) to replace 5ms
Task.Delay polling in both DeliverPullFetchMessagesAsync and
PullConsumerEngine.WaitForMessageAsync. Publishers signal waiting
consumers immediately after store append.
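
The batching half of this change can be sketched as below. It is an illustrative Python sketch, not the consumer engine: FakeConnection and deliver_batch are hypothetical names, and only the batch size of 64 comes from the text above.

```python
from typing import Iterable, List

FLUSH_BATCH = 64  # batch size named in the text above


class FakeConnection:
    """Hypothetical connection that counts flushes so the effect is visible."""

    def __init__(self) -> None:
        self.pending: List[bytes] = []
        self.delivered: List[bytes] = []
        self.flushes = 0

    def send_no_flush(self, msg: bytes) -> None:
        self.pending.append(msg)

    def flush(self) -> None:
        self.delivered.extend(self.pending)
        self.pending.clear()
        self.flushes += 1


def deliver_batch(conn: FakeConnection, messages: Iterable[bytes],
                  batch_size: int = FLUSH_BATCH) -> None:
    """Queue each message without flushing; flush once per full batch."""
    unflushed = 0
    for msg in messages:
        conn.send_no_flush(msg)
        unflushed += 1
        if unflushed == batch_size:
            conn.flush()
            unflushed = 0
    if unflushed:
        conn.flush()  # final partial batch
```

With 150 messages this performs 3 flushes (after 64, after 128, and one for the remaining 22) instead of 150.
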
Implement Go's pcd (per-client deferred flush) pattern to reduce write-loop
wakeups during fan-out delivery, optimize ack reply string construction with
stack-based formatting, cache CompiledFilter on ConsumerHandle, and pool
fetch message lists. Durable consumer fetch improves from 0.60x to 0.74x Go.
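
A minimal sketch of the deferred-flush idea described above, in Python for illustration. Client and deliver_with_pcd are hypothetical names; the point is only that each destination is signalled once per fan-out pass rather than once per message.

```python
from typing import Iterable, List, Tuple


class Client:
    """Hypothetical destination with an outbound queue and a wakeup counter."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.queued: List[bytes] = []
        self.flush_signals = 0

    def enqueue(self, data: bytes) -> None:
        self.queued.append(data)

    def signal_flush(self) -> None:
        self.flush_signals += 1  # stands in for waking the client's write loop


def deliver_with_pcd(deliveries: Iterable[Tuple[Client, bytes]]) -> int:
    """Queue every delivery, then signal each touched client exactly once."""
    pcd: List[Client] = []  # clients with pending data; small, so scan linearly
    for client, data in deliveries:
        client.enqueue(data)
        if not any(c is client for c in pcd):
            pcd.append(client)
    for client in pcd:
        client.signal_flush()
    return len(pcd)
```
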
Pub/sub 1:1 (16B) improved from 0.18x to 0.50x, fan-out from 0.18x to 0.44x,
and JetStream durable fetch from 0.13x to 0.64x vs Go. Key changes: replace
.ToArray() copy in SendMessage with pooled buffer handoff, batch multiple small
writes into single WriteAsync via 64KB coalesce buffer in write loop, and remove
profiling Stopwatch instrumentation from ProcessMessage/StreamManager hot paths.
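
The write coalescing can be sketched as follows. This is a hedged Python illustration, not the actual write loop: drain_coalesced and write_loop are hypothetical names, the 64KB limit comes from the text above, and how the real loop treats a single chunk larger than the buffer is an assumption (here it is written on its own).

```python
from collections import deque
from typing import Callable, Deque

COALESCE_LIMIT = 64 * 1024  # 64KB coalesce buffer, per the text above


def drain_coalesced(queue: Deque[bytes], limit: int = COALESCE_LIMIT) -> bytes:
    """Pop queued chunks into one buffer until the next would exceed the limit."""
    buf = bytearray()
    while queue:
        chunk = queue[0]
        if buf and len(buf) + len(chunk) > limit:
            break  # leave the chunk for the next write
        buf.extend(queue.popleft())
    return bytes(buf)


def write_loop(queue: Deque[bytes], write: Callable[[bytes], None]) -> int:
    """Issue one write per coalesced buffer; returns the number of writes."""
    writes = 0
    while queue:
        write(drain_coalesced(queue))
        writes += 1
    return writes
```

For example, 1,000 queued chunks of 100 bytes each go out in 2 writes rather than 1,000.
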
Add detailed analysis of the 1,200x JetStream file publish gap, identifying
the outbound write path (not FileStore) as the bottleneck. Add tests.md
tracking skipped/failing test status across Core and JetStream suites.

Implement Go-parity background flush loop (coalesce 16KB/8ms) in MsgBlock/FileStore,
replace O(n) GetStateAsync with incremental counters, skip PruneExpired/LoadAsync/
PrunePerSubject when not needed, and bypass RAFT for single-replica streams. Fix counter
tracking bugs in RemoveMsg/EraseMsg/TTL expiry and ObjectDisposedException races in
flush loop disposal. FileStore optimizations verified with 3112/3112 JetStream tests
passing; async publish benchmark remains at ~174 msg/s because the bottleneck is in the end-to-end protocol path.
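
The incremental counters that replace the O(n) state scan can be sketched like this. It is an illustrative Python sketch with hypothetical names (StreamCounters, on_append, on_remove), and it assumes sequences are appended in ascending order and that an empty stream reports first_seq as last_seq + 1. The remove path is where the counter-tracking bugs mentioned above would appear: every removal must adjust count, bytes, and first sequence together.

```python
from typing import Dict


class StreamCounters:
    """Keeps stream state current on every mutation so reads are O(1)."""

    def __init__(self) -> None:
        self._sizes: Dict[int, int] = {}  # seq -> stored size, in append order
        self.first_seq = 0
        self.last_seq = 0
        self.message_count = 0
        self.total_bytes = 0

    def on_append(self, seq: int, size: int) -> None:
        self._sizes[seq] = size
        if self.message_count == 0:
            self.first_seq = seq
        self.last_seq = seq
        self.message_count += 1
        self.total_bytes += size

    def on_remove(self, seq: int) -> bool:
        """Shared by remove, erase, and expiry; returns False if seq is unknown."""
        size = self._sizes.pop(seq, None)
        if size is None:
            return False  # unknown or already removed: counters stay untouched
        self.message_count -= 1
        self.total_bytes -= size
        if self.message_count == 0:
            self.first_seq = self.last_seq + 1  # assumed empty-stream convention
        elif seq == self.first_seq:
            # dicts keep insertion order, so the first key is the lowest live seq
            self.first_seq = next(iter(self._sizes))
        return True
```
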
Add side-by-side performance benchmarks using NATS.Client.Core against both
servers on ephemeral ports. Includes core pub/sub, request/reply latency,
and JetStream throughput tests with comparison output and
benchmarks_comparison.md results. Also fixes timestamp flakiness in
StoreInterfaceTests by using explicit timestamps.