598 Commits

Author SHA1 Message Date
Joseph Doherty
46ead5ea9f Improve XML documentation coverage across src modules and sync generated analysis artifacts. 2026-03-14 03:56:58 -04:00
Joseph Doherty
ba0d65317a Improve source XML docs and refresh profiling artifacts
This captures the iterative CommentChecker cleanup plus updated snapshot/report outputs used to validate and benchmark the latest JetStream and transport work.
2026-03-14 03:13:17 -04:00
Joseph Doherty
56c773dc71 Improve XML documentation coverage across core server components and refresh checker reports. 2026-03-14 02:26:53 -04:00
Joseph Doherty
007baf3fa4 Document gateway manager behavior and checkpoint doc-fix scan outputs 2026-03-14 01:41:19 -04:00
Joseph Doherty
88a82ee860 docs: add XML doc comments to server types and fix flaky test timings
Add XML doc comments to public properties across EventTypes, Connz, Varz,
NatsOptions, StreamConfig, IStreamStore, FileStore, MqttListener,
MqttSessionStore, MessageTraceContext, and JetStreamApiResponse. Fix flaky
tests by increasing timing margins (ResponseTracker expiry 1ms→50ms,
sleep 50ms→200ms) and document known flaky test patterns in tests.md.
2026-03-13 18:47:48 -04:00
Joseph Doherty
1d4b87e5f9 docs: refresh benchmark comparison with increased message counts
Increase message counts across all 14 benchmark test files to reduce
run-to-run variance (e.g. PubSub 16B: 10K→50K, FanOut: 10K→15K,
SinglePub: 100K→500K, JS tests: 5K→25K). Rewrite benchmarks_comparison.md
with fresh numbers from two-batch runs. Key changes: multi 4x4 reached
parity (1.01x), fan-out improved to 0.84x, TLS pub/sub shows 4.70x .NET
advantage, previous small-count anomalies corrected.
2026-03-13 17:52:03 -04:00
Joseph Doherty
660a897234 Merge branch 'opt/round10-fanout-serial-path' 2026-03-13 16:23:33 -04:00
Joseph Doherty
0e5ce4ed9b perf: optimize fan-out serial path — pre-formatted MSG headers, non-atomic RR, linear pcd
Three optimizations making the serial fan-out path cheaper (fan-out 0.63x→0.70x,
multi pub/sub 0.65x→0.69x):

1. Pre-format MSG prefix ("MSG subject ") and suffix (" [reply] sizes\r\n") once
   per publish. New SendMessagePreformatted writes prefix+sid+suffix directly into
   _directBuf — zero encoding, pure memory copies. Only SID varies per delivery.

2. Replace queue-group round-robin Interlocked.Increment/Decrement with non-atomic
   uint QueueRoundRobin++ (safe: ProcessMessage runs single-threaded per connection).

3. Replace HashSet<INatsClient> pcd with ThreadStatic INatsClient[] + linear scan.
   O(n) but n≤16; faster than hash for small fan-out counts.
2026-03-13 16:23:18 -04:00
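
The first optimization in commit 0e5ce4ed9b can be illustrated with a minimal sketch. It assumes the standard NATS `MSG <subject> <sid> [reply-to] <#bytes>` framing. The function name `build_msg_frames` is hypothetical and is not taken from the repository, which writes into `_directBuf` rather than building Python byte strings. The point is only that the prefix and suffix are encoded once per publish, so each delivery encodes nothing but its SID.

```python
from typing import List, Optional


def build_msg_frames(subject: str, reply: Optional[str],
                     payload: bytes, sids: List[str]) -> List[bytes]:
    """Build one MSG frame per subscriber, encoding shared parts once."""
    # Encoded once per publish: everything before the SID.
    prefix = f"MSG {subject} ".encode("ascii")
    # Encoded once per publish: optional reply subject, payload size, CRLF.
    if reply:
        suffix = f" {reply} {len(payload)}\r\n".encode("ascii")
    else:
        suffix = f" {len(payload)}\r\n".encode("ascii")

    frames = []
    for sid in sids:
        # Only the SID differs between deliveries; the rest is byte copies.
        frames.append(prefix + sid.encode("ascii") + suffix + payload + b"\r\n")
    return frames
```

In this sketch a fan-out to N subscribers performs two shared encodings plus N short SID encodings, instead of formatting the full header N times.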
Joseph Doherty
23543b2ba8 Merge branch 'opt/js-async-file-publish'
JetStream async file publish optimization (~10% improvement):
- Cached state properties eliminate GetStateAsync on publish path
- Single stream lookup eliminates double FindBySubject
- Removed _messageIndexes dictionary from write path
- Hand-rolled UTF-8 pub-ack formatter for success path
- Exponential flush backoff matching Go server
- Lazy StoredMessage materialization (MessageMeta struct)

# Conflicts:
#	benchmarks_comparison.md
2026-03-13 15:37:11 -04:00
Joseph Doherty
82ab02a612 docs: refresh benchmark comparison after JS async publish optimization 2026-03-13 15:35:59 -04:00
Joseph Doherty
6e91fda7fd perf: Phase 2 lazy StoredMessage materialization in FileStore
Replace eager Dictionary<ulong, StoredMessage> with lightweight
Dictionary<ulong, MessageMeta> to eliminate ~200B StoredMessage
allocation per message on the write path.

- Add MessageMeta struct (BlockId, Subject, PayloadLength, HeaderLength,
  TimestampNs) — ~40B vs ~200B for StoredMessage
- Add MaterializeMessage(seq) for on-demand reconstruction from blocks
- Update all ~60 _messages references to use _meta
- Methods needing full payload (LoadAsync, ListAsync, etc.) call
  MaterializeMessage; metadata-only paths use _meta directly
- Fix MsgBlock.WriteAt to clear stale delete markers on re-write
2026-03-13 15:33:38 -04:00
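
A rough sketch of the lazy materialization idea in commit 6e91fda7fd follows. The `MessageMeta` fields mirror the ones listed in the commit message. Everything else is an assumption made for illustration and does not reflect the actual FileStore code: the `LazyStore` class, its in-memory `_blocks` layout, and the shape of `StoredMessage`.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class MessageMeta:
    # Small per-message record kept on the write path.
    block_id: int
    subject: str
    payload_length: int
    header_length: int
    timestamp_ns: int


@dataclass(frozen=True)
class StoredMessage:
    # Full message, built only when a caller needs the bytes.
    sequence: int
    subject: str
    headers: bytes
    payload: bytes
    timestamp_ns: int


class LazyStore:
    def __init__(self) -> None:
        self._meta: Dict[int, MessageMeta] = {}
        # Hypothetical stand-in for on-disk blocks: block_id -> {seq: raw bytes}.
        self._blocks: Dict[int, Dict[int, bytes]] = {}

    def append(self, seq: int, block_id: int, subject: str,
               headers: bytes, payload: bytes, timestamp_ns: int) -> None:
        # The write path stores raw bytes plus metadata, no StoredMessage.
        self._blocks.setdefault(block_id, {})[seq] = headers + payload
        self._meta[seq] = MessageMeta(block_id, subject, len(payload),
                                      len(headers), timestamp_ns)

    def total_bytes(self) -> int:
        # Metadata-only path: never touches the blocks.
        return sum(m.header_length + m.payload_length
                   for m in self._meta.values())

    def materialize(self, seq: int) -> StoredMessage:
        # On-demand reconstruction from the block the metadata points at.
        meta = self._meta[seq]
        raw = self._blocks[meta.block_id][seq]
        headers = raw[:meta.header_length]
        payload = raw[meta.header_length:meta.header_length + meta.payload_length]
        return StoredMessage(seq, meta.subject, headers, payload,
                             meta.timestamp_ns)
```

Here `total_bytes` stands for the metadata-only paths, and `materialize` stands for the methods that need the full payload.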
Joseph Doherty
e4ab48bca4 Merge branch 'feat/round9-hotpath-opt' 2026-03-13 15:30:05 -04:00
Joseph Doherty
a62a25dcdf perf: optimize fan-out hot path and switch benchmarks to Release build
Round 9 optimizations targeting per-delivery overhead:
- Switch benchmark harness from Debug to Release build (biggest impact:
  durable fetch 0.42x→0.92x, request-reply to parity)
- Batch server-wide stats after fan-out loop (2 Interlocked per delivery → 2 per publish)
- Guard auto-unsub tracking with MaxMessages > 0 (skip Interlocked in common case)
- Cache SID as ASCII bytes on Subscription (avoid per-delivery encoding)
- Pre-encode subject bytes once before fan-out loop (avoid N encodings)
- Add 1-element subject string cache in ProcessPub (avoid repeated alloc)
- Remove Interlocked from SubList.Match stats counters (approximate is fine)
- Extract WriteMessageToBuffer helper for both string and span overloads
2026-03-13 15:30:02 -04:00
Joseph Doherty
7404ecdb0e perf: Phase 1 JetStream async file publish optimizations
- Add cached state properties (LastSeq, MessageCount, TotalBytes, FirstSeq)
  to IStreamStore/FileStore/MemStore — eliminates GetStateAsync on publish path
- Add Capture(StreamHandle, ...) overload to StreamManager — eliminates
  double FindBySubject lookup (once in JetStreamPublisher, once in Capture)
- Remove _messageIndexes dictionary from FileStore write path — all lookups
  now use _messages directly, saving ~48B allocation per message
- Add JetStreamPubAckFormatter for hand-rolled UTF-8 success ack formatting —
  avoids JsonSerializer overhead on the hot publish path
- Switch flush loop to exponential backoff (1→2→4→8ms) matching Go server
2026-03-13 15:09:21 -04:00
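
The flush-loop backoff in commit 7404ecdb0e (1→2→4→8ms) can be sketched as below. The doubling sequence comes from the commit message. Holding the delay at 8 ms once reached, and resetting it through a `reset()` call, are assumptions, as is the `FlushBackoff` name itself.

```python
class FlushBackoff:
    """Doubling delay schedule for an idle flush loop (illustrative only)."""

    def __init__(self, start_ms: int = 1, cap_ms: int = 8) -> None:
        self._start_ms = start_ms
        self._cap_ms = cap_ms
        self._next_ms = start_ms

    def next_delay_ms(self) -> int:
        # Return the current delay, then double it up to the cap.
        delay = self._next_ms
        self._next_ms = min(self._next_ms * 2, self._cap_ms)
        return delay

    def reset(self) -> None:
        # Assumed policy: go back to the shortest delay after useful work.
        self._next_ms = self._start_ms
```

A flush loop using this would sleep for `next_delay_ms()` when it finds nothing to write, and call `reset()` after a flush that wrote data.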
Joseph Doherty
82cc3ec841 Merge branch 'feat/round7-ordered-consumer-perf' 2026-03-13 14:50:41 -04:00
Joseph Doherty
86fd971510 docs: refresh benchmark comparison after round 8
Ordered consumer: 0.57x (signal-based wakeup + batch flush).
Cross-protocol MQTT: 1.20x (string.Create fast path + topic cache pre-warm).
2026-03-13 14:49:32 -04:00
Joseph Doherty
f7a8d72a6d perf: optimize MQTT NatsToMqtt fast path and pre-warm topic cache
Add string.Create fast path in NatsToMqtt for subjects without _DOT_
escape sequences (common case), avoiding StringBuilder allocation.
Pre-warm the topic bytes cache when MQTT subscriptions are added to
eliminate cache miss on first message delivery.
2026-03-13 14:44:49 -04:00
Joseph Doherty
7b2def4da1 perf: batch flush + signal-based wakeup for JS pull consumers
Replace per-message DeliverMessage/flush in DeliverPullFetchMessagesAsync
with SendMessageNoFlush + batch flush every 64 messages. Add signal-based
wakeup (StreamHandle.NotifyPublish/WaitForPublishAsync) to replace 5ms
Task.Delay polling in both DeliverPullFetchMessagesAsync and
PullConsumerEngine.WaitForMessageAsync. Publishers signal waiting
consumers immediately after store append.
2026-03-13 14:44:02 -04:00
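
The batch-flush half of commit 7b2def4da1 can be sketched as follows. The batch size of 64 comes from the commit message. The `deliver_batched` function and its callback parameters are hypothetical names for illustration, and the signal-based wakeup is not modelled here.

```python
from typing import Callable, Iterable


def deliver_batched(messages: Iterable[bytes],
                    send_no_flush: Callable[[bytes], None],
                    flush: Callable[[], None],
                    batch_size: int = 64) -> None:
    """Queue each message without flushing, then flush once per batch."""
    pending = 0
    for message in messages:
        send_no_flush(message)
        pending += 1
        if pending == batch_size:
            flush()
            pending = 0
    # Flush whatever is left over from a partial final batch.
    if pending:
        flush()
```

For a fetch of 130 messages this issues three flushes (64, 64 and 2 messages) rather than 130.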
Joseph Doherty
11e01b9026 perf: optimize MQTT cross-protocol path (0.30x → 0.78x Go)
Replace per-message async fire-and-forget with direct-buffer write loop
mirroring NatsClient pattern: SpinLock-guarded buffer append, double-
buffer swap, single WriteAsync per batch.

- MqttConnection: add _directBuf/_writeBuf + RunMqttWriteLoopAsync
- MqttConnection: add EnqueuePublishNoFlush (zero-alloc PUBLISH format)
- MqttPacketWriter: add WritePublishTo(Span<byte>) + MeasurePublish
- MqttTopicMapper: add NatsToMqttBytes with bounded ConcurrentDictionary
- MqttNatsClientAdapter: synchronous SendMessageNoFlush + SignalFlush
- Skip FlushAsync on plain TCP sockets (TCP auto-flushes)
2026-03-13 14:25:13 -04:00
Joseph Doherty
699449da6a test: skip superseded MQTT e2e cases 2026-03-13 11:50:01 -04:00
Joseph Doherty
497aa227af docs: refresh benchmark comparison 2026-03-13 11:42:39 -04:00
Joseph Doherty
4b15f643f6 Merge branch 'main' into codex/filestore-main-integration 2026-03-13 11:40:36 -04:00
Joseph Doherty
a470e0bcdb docs: refresh benchmark comparison 2026-03-13 11:39:54 -04:00
Joseph Doherty
5a00708a79 Merge branch 'mqtt-e2e-wiring' 2026-03-13 11:38:55 -04:00
Joseph Doherty
a5592ed533 feat: wire MQTT end-to-end through NATS SubList for cross-protocol messaging
- MqttListener accepts IMessageRouter + delegates for client ID allocation
  and account resolution (Phase 1-2)
- MqttConnection creates MqttNatsClientAdapter on CONNECT, registers with
  SubList for cross-protocol delivery (Phase 2)
- PUBLISH routes through ProcessMessage() when router available, falls back
  to MQTT-only fan-out for test compatibility (Phase 3)
- SUBSCRIBE creates real SubList entries via adapter, enabling NATS→MQTT
  delivery with topic↔subject translation (Phase 4)
- PUBREL now delivers stored QoS 2 messages before ack (Phase 5)
- ConnzHandler includes MQTT adapters in /connz output (Phase 6)
- MQTTnet E2E tests: MQTT pub/sub, MQTT→NATS, NATS→MQTT, QoS 1 (Phase 7)
2026-03-13 11:38:52 -04:00
Joseph Doherty
20f45b2aaf Merge branch 'codex/filestore-payload-index-optimization' 2026-03-13 11:36:15 -04:00
Joseph Doherty
ca2d8019a1 docs: add FileStore benchmarks and storage notes 2026-03-13 11:34:19 -04:00
Joseph Doherty
f57edca5a8 perf: optimize FileStore payload and maintenance paths 2026-03-13 11:21:48 -04:00
Joseph Doherty
9ff5216495 perf: add compact FileStore index metadata 2026-03-13 10:34:31 -04:00
Joseph Doherty
5674853628 test: lock FileStore optimization boundaries 2026-03-13 10:29:10 -04:00
Joseph Doherty
655ca30e0b fix: stabilize pull consumer expires timeout fetch 2026-03-13 10:29:02 -04:00
Joseph Doherty
a1fc600d84 docs: add optimization planning documents 2026-03-13 10:19:56 -04:00
Joseph Doherty
fb0d31c615 docs: refresh benchmark comparison after SubList optimization 2026-03-13 10:18:52 -04:00
Joseph Doherty
900a4b0923 Merge branch 'codex/sublist-allocation-reduction' 2026-03-13 10:15:46 -04:00
Joseph Doherty
b2707a7493 Merge branch 'codex/parser-span-retention' 2026-03-13 10:11:42 -04:00
Joseph Doherty
845441b32c feat: implement full MQTT Go parity across 5 phases — binary protocol, auth/TLS, cross-protocol bridging, monitoring, and JetStream persistence
Phase 1: Binary MQTT 3.1.1 wire protocol with PipeReader-based parsing,
full packet type dispatch, and MQTT 3.1.1 compliance checks.

Phase 2: Auth pipeline routing MQTT CONNECT through AuthService,
TLS transport with SslStream wrapping, pinned cert validation.

Phase 3: IMessageRouter refactor (NatsClient → INatsClient),
MqttNatsClientAdapter for cross-protocol bridging, MqttTopicMapper
with full Go-parity topic/subject translation.

Phase 4: /connz mqtt_client field population, /varz actual MQTT port.

Phase 5: JetStream persistence — MqttStreamInitializer creates 5
internal streams, MqttConsumerManager for QoS 1/2 consumers,
subject-keyed session/retained lookups replacing linear scans.

All 503 MQTT tests and 1589 Core tests pass.
2026-03-13 10:09:40 -04:00
Joseph Doherty
d1f22255d7 docs: record SubList allocation strategy 2026-03-13 10:08:50 -04:00
Joseph Doherty
a3b34fb16d docs: record parser hot-path allocation strategy 2026-03-13 10:08:20 -04:00
Joseph Doherty
0126234fa6 perf: pool SubList match builders and cleanup scans 2026-03-13 10:06:24 -04:00
Joseph Doherty
6cf11969f5 perf: consume parser command views in client hot path 2026-03-13 10:02:15 -04:00
Joseph Doherty
9fa2ba97b9 perf: keep parser state in bytes until materialization 2026-03-13 10:02:07 -04:00
Joseph Doherty
ca7e12e753 feat: add byte-oriented parser view contract 2026-03-13 09:54:25 -04:00
Joseph Doherty
5876ad7dfa perf: reduce SubList token string churn 2026-03-13 09:53:37 -04:00
Joseph Doherty
98cbdbdeb8 test: lock parser span-retention behavior 2026-03-13 09:51:17 -04:00
Joseph Doherty
348bec36b2 perf: replace SubList routed-sub string keys 2026-03-13 09:51:11 -04:00
Joseph Doherty
08bd34c529 test: lock SubList remote-key and match behavior 2026-03-13 09:49:54 -04:00
Joseph Doherty
0be321fa53 perf: batch flush signaling and fetch path optimizations (Round 6)
Implement Go's pcd (per-client deferred flush) pattern to reduce write-loop
wakeups during fan-out delivery, optimize ack reply string construction with
stack-based formatting, cache CompiledFilter on ConsumerHandle, and pool
fetch message lists. Durable consumer fetch improves from 0.60x to 0.74x Go.
2026-03-13 09:35:57 -04:00
Joseph Doherty
0a4e7a822f perf: eliminate per-message allocations in pub/sub hot path and coalesce outbound writes
Pub/sub 1:1 (16B) improved from 0.18x to 0.50x, fan-out from 0.18x to 0.44x,
and JetStream durable fetch from 0.13x to 0.64x vs Go. Key changes: replace
.ToArray() copy in SendMessage with pooled buffer handoff, batch multiple small
writes into single WriteAsync via 64KB coalesce buffer in write loop, and remove
profiling Stopwatch instrumentation from ProcessMessage/StreamManager hot paths.
2026-03-13 05:09:36 -04:00
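
The write coalescing described in commit 0a4e7a822f can be approximated by the sketch below. The 64KB limit comes from the commit message. The `coalesce` function is a hypothetical illustration that returns the batches instead of calling WriteAsync, and passing an oversized chunk through as its own batch is an assumed policy.

```python
from typing import Iterable, List


def coalesce(chunks: Iterable[bytes], limit: int = 64 * 1024) -> List[bytes]:
    """Group small outbound chunks so each batch needs a single write."""
    batches: List[bytes] = []
    buf = bytearray()
    for chunk in chunks:
        # Emit the current batch if adding this chunk would exceed the limit.
        if buf and len(buf) + len(chunk) > limit:
            batches.append(bytes(buf))
            buf.clear()
        buf += chunk
    if buf:
        batches.append(bytes(buf))
    return batches
```

Each returned batch corresponds to one socket write, so many small messages cost one write between them.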
Joseph Doherty
9e0df9b3d7 docs: add JetStream perf investigation notes and test status tracking
Add detailed analysis of the 1,200x JetStream file publish gap identifying
the bottleneck in the outbound write path (not FileStore). Add tests.md
tracking skipped/failing test status across Core and JetStream suites.
2026-03-13 03:20:43 -04:00
Joseph Doherty
4de691c9c5 perf: add FileStore buffered writes, O(1) state tracking, and eliminate redundant per-publish work
Implement Go-parity background flush loop (coalesce 16KB/8ms) in MsgBlock/FileStore,
replace O(n) GetStateAsync with incremental counters, skip PruneExpired/LoadAsync/
PrunePerSubject when not needed, and bypass RAFT for single-replica streams. Fix counter
tracking bugs in RemoveMsg/EraseMsg/TTL expiry and ObjectDisposedException races in
flush loop disposal. FileStore optimizations verified with 3112/3112 JetStream tests
passing; async publish benchmark remains at ~174 msg/s due to E2E protocol path bottleneck.
2026-03-13 03:11:11 -04:00