Commit Graph

385 Commits

Author SHA1 Message Date
Joseph Doherty
0e5ce4ed9b perf: optimize fan-out serial path — pre-formatted MSG headers, non-atomic RR, linear pcd
Three optimizations making the serial fan-out path cheaper (fan-out 0.63x→0.70x,
multi pub/sub 0.65x→0.69x):

1. Pre-format MSG prefix ("MSG subject ") and suffix (" [reply] sizes\r\n") once
   per publish. New SendMessagePreformatted writes prefix+sid+suffix directly into
   _directBuf — zero encoding, pure memory copies. Only SID varies per delivery.

2. Replace queue-group round-robin Interlocked.Increment/Decrement with non-atomic
   uint QueueRoundRobin++ (safe: ProcessMessage runs single-threaded per connection).

3. Replace HashSet<INatsClient> pcd with ThreadStatic INatsClient[] + linear scan.
   O(n) but n≤16; faster than hash for small fan-out counts.
2026-03-13 16:23:18 -04:00
Joseph Doherty
23543b2ba8 Merge branch 'opt/js-async-file-publish'
JetStream async file publish optimization (~10% improvement):
- Cached state properties eliminate GetStateAsync on publish path
- Single stream lookup eliminates double FindBySubject
- Removed _messageIndexes dictionary from write path
- Hand-rolled UTF-8 pub-ack formatter for success path
- Exponential flush backoff matching Go server
- Lazy StoredMessage materialization (MessageMeta struct)

# Conflicts:
#	benchmarks_comparison.md
2026-03-13 15:37:11 -04:00
Joseph Doherty
6e91fda7fd perf: Phase 2 lazy StoredMessage materialization in FileStore
Replace eager Dictionary<ulong, StoredMessage> with lightweight
Dictionary<ulong, MessageMeta> to eliminate ~200B StoredMessage
allocation per message on the write path.

- Add MessageMeta struct (BlockId, Subject, PayloadLength, HeaderLength,
  TimestampNs) — ~40B vs ~200B for StoredMessage
- Add MaterializeMessage(seq) for on-demand reconstruction from blocks
- Update all ~60 _messages references to use _meta
- Methods needing full payload (LoadAsync, ListAsync, etc.) call
  MaterializeMessage; metadata-only paths use _meta directly
- Fix MsgBlock.WriteAt to clear stale delete markers on re-write
2026-03-13 15:33:38 -04:00
Joseph Doherty
a62a25dcdf perf: optimize fan-out hot path and switch benchmarks to Release build
Round 9 optimizations targeting per-delivery overhead:
- Switch benchmark harness from Debug to Release build (biggest impact:
  durable fetch 0.42x→0.92x, request-reply to parity)
- Batch server-wide stats after fan-out loop (2 Interlocked per delivery → 2 per publish)
- Guard auto-unsub tracking with MaxMessages > 0 (skip Interlocked in common case)
- Cache SID as ASCII bytes on Subscription (avoid per-delivery encoding)
- Pre-encode subject bytes once before fan-out loop (avoid N encodings)
- Add 1-element subject string cache in ProcessPub (avoid repeated alloc)
- Remove Interlocked from SubList.Match stats counters (approximate is fine)
- Extract WriteMessageToBuffer helper for both string and span overloads
2026-03-13 15:30:02 -04:00
Joseph Doherty
7404ecdb0e perf: Phase 1 JetStream async file publish optimizations
- Add cached state properties (LastSeq, MessageCount, TotalBytes, FirstSeq)
  to IStreamStore/FileStore/MemStore — eliminates GetStateAsync on publish path
- Add Capture(StreamHandle, ...) overload to StreamManager — eliminates
  double FindBySubject lookup (once in JetStreamPublisher, once in Capture)
- Remove _messageIndexes dictionary from FileStore write path — all lookups
  now use _messages directly, saving ~48B allocation per message
- Add JetStreamPubAckFormatter for hand-rolled UTF-8 success ack formatting —
  avoids JsonSerializer overhead on the hot publish path
- Switch flush loop to exponential backoff (1→2→4→8ms) matching Go server
2026-03-13 15:09:21 -04:00
Joseph Doherty
f7a8d72a6d perf: optimize MQTT NatsToMqtt fast path and pre-warm topic cache
Add string.Create fast path in NatsToMqtt for subjects without _DOT_
escape sequences (common case), avoiding StringBuilder allocation.
Pre-warm the topic bytes cache when MQTT subscriptions are added to
eliminate cache miss on first message delivery.
2026-03-13 14:44:49 -04:00
Joseph Doherty
7b2def4da1 perf: batch flush + signal-based wakeup for JS pull consumers
Replace per-message DeliverMessage/flush in DeliverPullFetchMessagesAsync
with SendMessageNoFlush + batch flush every 64 messages. Add signal-based
wakeup (StreamHandle.NotifyPublish/WaitForPublishAsync) to replace 5ms
Task.Delay polling in both DeliverPullFetchMessagesAsync and
PullConsumerEngine.WaitForMessageAsync. Publishers signal waiting
consumers immediately after store append.
2026-03-13 14:44:02 -04:00
Joseph Doherty
11e01b9026 perf: optimize MQTT cross-protocol path (0.30x → 0.78x Go)
Replace per-message async fire-and-forget with direct-buffer write loop
mirroring NatsClient pattern: SpinLock-guarded buffer append, double-
buffer swap, single WriteAsync per batch.

- MqttConnection: add _directBuf/_writeBuf + RunMqttWriteLoopAsync
- MqttConnection: add EnqueuePublishNoFlush (zero-alloc PUBLISH format)
- MqttPacketWriter: add WritePublishTo(Span<byte>) + MeasurePublish
- MqttTopicMapper: add NatsToMqttBytes with bounded ConcurrentDictionary
- MqttNatsClientAdapter: synchronous SendMessageNoFlush + SignalFlush
- Skip FlushAsync on plain TCP sockets (TCP auto-flushes)
2026-03-13 14:25:13 -04:00
Joseph Doherty
4b15f643f6 Merge branch 'main' into codex/filestore-main-integration 2026-03-13 11:40:36 -04:00
Joseph Doherty
a5592ed533 feat: wire MQTT end-to-end through NATS SubList for cross-protocol messaging
- MqttListener accepts IMessageRouter + delegates for client ID allocation
  and account resolution (Phase 1-2)
- MqttConnection creates MqttNatsClientAdapter on CONNECT, registers with
  SubList for cross-protocol delivery (Phase 2)
- PUBLISH routes through ProcessMessage() when router available, falls back
  to MQTT-only fan-out for test compatibility (Phase 3)
- SUBSCRIBE creates real SubList entries via adapter, enabling NATS→MQTT
  delivery with topic↔subject translation (Phase 4)
- PUBREL now delivers stored QoS 2 messages before ack (Phase 5)
- ConnzHandler includes MQTT adapters in /connz output (Phase 6)
- MQTTnet E2E tests: MQTT pub/sub, MQTT→NATS, NATS→MQTT, QoS 1 (Phase 7)
2026-03-13 11:38:52 -04:00
Joseph Doherty
20f45b2aaf Merge branch 'codex/filestore-payload-index-optimization' 2026-03-13 11:36:15 -04:00
Joseph Doherty
f57edca5a8 perf: optimize FileStore payload and maintenance paths 2026-03-13 11:21:48 -04:00
Joseph Doherty
9ff5216495 perf: add compact FileStore index metadata 2026-03-13 10:34:31 -04:00
Joseph Doherty
655ca30e0b fix: stabilize pull consumer expires timeout fetch 2026-03-13 10:29:02 -04:00
Joseph Doherty
900a4b0923 Merge branch 'codex/sublist-allocation-reduction' 2026-03-13 10:15:46 -04:00
Joseph Doherty
b2707a7493 Merge branch 'codex/parser-span-retention' 2026-03-13 10:11:42 -04:00
Joseph Doherty
845441b32c feat: implement full MQTT Go parity across 5 phases — binary protocol, auth/TLS, cross-protocol bridging, monitoring, and JetStream persistence
Phase 1: Binary MQTT 3.1.1 wire protocol with PipeReader-based parsing,
full packet type dispatch, and MQTT 3.1.1 compliance checks.

Phase 2: Auth pipeline routing MQTT CONNECT through AuthService,
TLS transport with SslStream wrapping, pinned cert validation.

Phase 3: IMessageRouter refactor (NatsClient → INatsClient),
MqttNatsClientAdapter for cross-protocol bridging, MqttTopicMapper
with full Go-parity topic/subject translation.

Phase 4: /connz mqtt_client field population, /varz actual MQTT port.

Phase 5: JetStream persistence — MqttStreamInitializer creates 5
internal streams, MqttConsumerManager for QoS 1/2 consumers,
subject-keyed session/retained lookups replacing linear scans.

All 503 MQTT tests and 1589 Core tests pass.
2026-03-13 10:09:40 -04:00
Joseph Doherty
a3b34fb16d docs: record parser hot-path allocation strategy 2026-03-13 10:08:20 -04:00
Joseph Doherty
0126234fa6 perf: pool SubList match builders and cleanup scans 2026-03-13 10:06:24 -04:00
Joseph Doherty
6cf11969f5 perf: consume parser command views in client hot path 2026-03-13 10:02:15 -04:00
Joseph Doherty
9fa2ba97b9 perf: keep parser state in bytes until materialization 2026-03-13 10:02:07 -04:00
Joseph Doherty
ca7e12e753 feat: add byte-oriented parser view contract 2026-03-13 09:54:25 -04:00
Joseph Doherty
5876ad7dfa perf: reduce SubList token string churn 2026-03-13 09:53:37 -04:00
Joseph Doherty
348bec36b2 perf: replace SubList routed-sub string keys 2026-03-13 09:51:11 -04:00
Joseph Doherty
0be321fa53 perf: batch flush signaling and fetch path optimizations (Round 6)
Implement Go's pcd (per-client deferred flush) pattern to reduce write-loop
wakeups during fan-out delivery, optimize ack reply string construction with
stack-based formatting, cache CompiledFilter on ConsumerHandle, and pool
fetch message lists. Durable consumer fetch improves from 0.60x to 0.74x Go.
2026-03-13 09:35:57 -04:00
Joseph Doherty
0a4e7a822f perf: eliminate per-message allocations in pub/sub hot path and coalesce outbound writes
Pub/sub 1:1 (16B) improved from 0.18x to 0.50x, fan-out from 0.18x to 0.44x,
and JetStream durable fetch from 0.13x to 0.64x vs Go. Key changes: replace
.ToArray() copy in SendMessage with pooled buffer handoff, batch multiple small
writes into single WriteAsync via 64KB coalesce buffer in write loop, and remove
profiling Stopwatch instrumentation from ProcessMessage/StreamManager hot paths.
2026-03-13 05:09:36 -04:00
Joseph Doherty
4de691c9c5 perf: add FileStore buffered writes, O(1) state tracking, and eliminate redundant per-publish work
Implement Go-parity background flush loop (coalesce 16KB/8ms) in MsgBlock/FileStore,
replace O(n) GetStateAsync with incremental counters, skip PruneExpired/LoadAsync/
PrunePerSubject when not needed, and bypass RAFT for single-replica streams. Fix counter
tracking bugs in RemoveMsg/EraseMsg/TTL expiry and ObjectDisposedException races in
flush loop disposal. FileStore optimizations verified with 3112/3112 JetStream tests
passing; async publish benchmark remains at ~174 msg/s due to E2E protocol path bottleneck.
2026-03-13 03:11:11 -04:00
Joseph Doherty
e9c86c51c3 fix: resolve 19 JetStream test failures across 5 root causes
- HandleList: populate StreamNames/ConsumerNames alongside info lists
- ValidateConfigUpdate: allow clearing mirror/sources, accept even replicas
- ToWireFormat: add AccountInfo branch for $JS.API.INFO responses
- UpdateStream fixture: preserve existing retention policy on update
- Integration test: fix assertion to match valid account info response
2026-03-13 01:14:21 -04:00
Joseph Doherty
3445a055eb feat: add JetStream cluster replication and leaf node solicited reconnect
Add JetStream stream/consumer config and data replication across cluster
peers via $JS.INTERNAL.* subjects with BroadcastRoutedMessageAsync (sends
to all peers, bypassing pool routing). Capture routed data messages into
local JetStream stores in DeliverRemoteMessage. Fix leaf node solicited
reconnect by re-launching the retry loop in WatchConnectionAsync after
disconnect.

Unskips 4 of 5 E2E cluster tests (LeaderDies_NewLeaderElected,
R3Stream_NodeDies_PublishContinues, Consumer_NodeDies_PullContinuesOnSurvivor,
Leaf_HubRestart_LeafReconnects). The 5th (LeaderRestart_RejoinsAsFollower)
requires RAFT log catchup which is a separate feature.
2026-03-13 01:02:00 -04:00
Joseph Doherty
ab805c883b fix: resolve 8 failing E2E cluster tests (FileStore path bug + missing RAFT replication)
Root cause: StreamManager.CreateStore() used a hardcoded temp path for
FileStore instead of the configured store_dir from JetStream config.
This caused stream data to accumulate across test runs in a shared
directory, producing wrong message counts (e.g., expected 5 but got 80).

Server fix:
- Pass storeDir from JetStream config through to StreamManager
- CreateStore() now uses the configured store_dir for FileStore paths

Test fixes for tests that now pass (3):
- R3Stream_CreateAndPublish_ReplicatedAcrossNodes: delete stream before
  test, verify only on publishing node (no cross-node replication yet)
- R3Stream_Purge_ReplicatedAcrossNodes: same pattern
- LogReplication_AllReplicasHaveData: same pattern

Tests skipped pending RAFT implementation (5):
- LeaderDies_NewLeaderElected: requires RAFT leader re-election
- LeaderRestart_RejoinsAsFollower: requires RAFT log catchup
- R3Stream_NodeDies_PublishContinues: requires cross-node replication
- Consumer_NodeDies_PullContinuesOnSurvivor: requires replicated state
- Leaf_HubRestart_LeafReconnects: leaf reconnection after hub restart
2026-03-13 00:03:37 -04:00
Joseph Doherty
95e9f0a92e feat: wire remaining E2E gaps — account imports, subject transforms, JWT auth, service latency
Close all 5 server-side wiring gaps so E2E tests pass without skips:
- System events: bridge user-defined system_account to internal $SYS
- Account imports/exports: config parsing + reverse response import for cross-account request-reply
- Subject transforms: parse mappings config block, apply in ProcessMessage
- JWT auth: parse trusted_keys, resolver MEMORY, resolver_preload in config
- Service latency: timestamp on request, publish ServiceLatencyMsg on response
2026-03-12 23:03:12 -04:00
Joseph Doherty
246fc7ad87 fix: route manager self-connection detection and per-peer deduplication
- Detect and reject self-route connections (node connecting to itself via
  routes list) in both inbound and outbound handshake paths
- Deduplicate RS+/RS-/RMSG forwarding by RemoteServerId to avoid sending
  duplicate messages when multiple connections exist to the same peer
- Fix ForwardRoutedMessageAsync to broadcast to all peers instead of
  selecting a single route
- Add pool_size: 1 to cluster fixture config
- Add -DV debug flags to cluster fixture servers
- Add WaitForCrossNodePropagationAsync probe pattern for reliable E2E
  cluster test timing
- Fix queue group test to use same-node subscribers (cross-node queue
  group routing not yet implemented)
2026-03-12 20:51:41 -04:00
Joseph Doherty
aeb60d3c43 test: add E2E leaf node tests (hub-to-leaf, leaf-to-hub, subject propagation)
Fix LeafNodeManager.StartAsync to launch solicited connections for RemoteLeaves
(config-file-parsed remotes) in addition to the programmatic Remotes list, and
update ParseEndpoint to handle nats-leaf:// scheme URLs. Add LeafNodeFixture
that polls /leafz for connection readiness and three E2E tests covering
hub→leaf delivery, leaf→hub delivery, and subject-scoped propagation.
2026-03-12 19:47:40 -04:00
Joseph Doherty
338f44b07b test: add E2E gateway tests (cross-gateway messaging, interest-only)
- GatewayFixture: polls /gatewayz monitoring endpoint until both servers
  report num_gateways >= 1, replacing Task.Delay with proper synchronization
- GatewayTests: two tests covering cross-gateway delivery and interest-only
  no-delivery behaviour, using double PingAsync() instead of Task.Delay
- LeafNodeTests: replace Task.Delay(500) with sub.PingAsync()+pub.PingAsync()
  to properly fence subscription propagation without timing dependencies
- Fix GatewayManager.StartAsync to read remotes from RemoteGateways (config-
  parsed) in addition to the legacy Remotes list, enabling config-file-driven
  outbound gateway connections
2026-03-12 19:45:54 -04:00
Joseph Doherty
5d9d1bebd5 test: add E2E JetStream push consumers, ACK policies, retention modes, ordered, mirror, source
Adds 8 new E2E tests to JetStreamTests.cs (tests 11-18) covering push
consumer config, AckNone/AckAll policies, Interest/WorkQueue retention,
ordered consumers, mirror streams, and source streams.

Fixes three server gaps exposed by the new tests: mirror JSON parsing
(deliver_subject and mirror object fields were silently ignored in stream
and consumer API handlers), and deliver_subject omitted from consumer
info wire format. Also fixes ShutdownDrainTests to use TaskCompletionSource
on ConnectionDisconnected instead of a Task.Delay poll loop.
2026-03-12 19:36:29 -04:00
Joseph Doherty
7fbffffd05 refactor: rename remaining tests to NATS.Server.Core.Tests
- Rename tests/NATS.Server.Tests -> tests/NATS.Server.Core.Tests
- Update solution file, InternalsVisibleTo, and csproj references
- Remove JETSTREAM_INTEGRATION_MATRIX and NATS.NKeys from csproj (moved to JetStream.Tests and Auth.Tests)
- Update all namespaces from NATS.Server.Tests.* to NATS.Server.Core.Tests.*
- Replace private GetFreePort/ReadUntilAsync helpers with TestUtilities calls
- Fix stale namespace in Transport.Tests/NetworkingGoParityTests.cs
2026-03-12 16:14:02 -04:00
Joseph Doherty
78b4bc2486 refactor: extract NATS.Server.JetStream.Tests project
Move 225 JetStream-related test files from NATS.Server.Tests into a
dedicated NATS.Server.JetStream.Tests project. This includes root-level
JetStream*.cs files, storage test files (FileStore, MemStore,
StreamStoreContract), and the full JetStream/ subfolder tree (Api,
Cluster, Consumers, MirrorSource, Snapshots, Storage, Streams).

Updated all namespaces, added InternalsVisibleTo, registered in the
solution file, and added the JETSTREAM_INTEGRATION_MATRIX define.
2026-03-12 15:58:10 -04:00
Joseph Doherty
36b9dfa654 refactor: extract NATS.Server.Auth.Tests project
Move 50 auth/accounts/permissions/JWT/NKey test files from
NATS.Server.Tests into a dedicated NATS.Server.Auth.Tests project.
Update namespaces, replace private GetFreePort/ReadUntilAsync helpers
with TestUtilities calls, replace Task.Delay with TaskCompletionSource
in test doubles, and add InternalsVisibleTo.

690 tests pass.
2026-03-12 15:54:07 -04:00
Joseph Doherty
0c086522a4 refactor: extract NATS.Server.Monitoring.Tests project
Move 39 monitoring, events, and system endpoint test files from
NATS.Server.Tests into a dedicated NATS.Server.Monitoring.Tests project.
Update namespaces, replace private GetFreePort/ReadUntilAsync with
TestUtilities shared helpers, add InternalsVisibleTo, and register
in the solution file. All 439 tests pass.
2026-03-12 15:44:12 -04:00
Joseph Doherty
edf9ed770e refactor: extract NATS.Server.Raft.Tests project
Move 43 Raft consensus test files (8 root-level + 35 in Raft/ subfolder)
from NATS.Server.Tests into a dedicated NATS.Server.Raft.Tests project.
Update namespaces, add InternalsVisibleTo, and fix timing/exception
handling issues in moved test files.
2026-03-12 15:36:02 -04:00
Joseph Doherty
615752cdc2 refactor: extract NATS.Server.Clustering.Tests project
Move 29 clustering/routing test files from NATS.Server.Tests to a
dedicated NATS.Server.Clustering.Tests project. Update namespaces,
replace private GetFreePort/ReadUntilAsync helpers with TestUtilities
calls, and extract TestServerFactory/ClusterTestServer to TestUtilities
to fix cross-project reference from JetStreamStartupTests.
2026-03-12 15:31:58 -04:00
Joseph Doherty
3f7d896a34 refactor: extract NATS.Server.LeafNodes.Tests project
Move 28 leaf node test files from NATS.Server.Tests into a dedicated
NATS.Server.LeafNodes.Tests project. Update namespaces, add
InternalsVisibleTo, register in solution file. Replace all Task.Delay
polling loops with PollHelper.WaitUntilAsync/YieldForAsync from
TestUtilities. Replace private ReadUntilAsync in LeafProtocolTests
with SocketTestHelper.ReadUntilAsync.

All 281 tests pass.
2026-03-12 15:23:33 -04:00
Joseph Doherty
9972b74bc3 refactor: extract NATS.Server.Gateways.Tests project
Move 25 gateway-related test files from NATS.Server.Tests into a
dedicated NATS.Server.Gateways.Tests project. Update namespaces,
replace private ReadUntilAsync with SocketTestHelper from TestUtilities,
inline TestServerFactory usage, add InternalsVisibleTo, and register
the project in the solution file. All 261 tests pass.
2026-03-12 15:10:50 -04:00
Joseph Doherty
a6be5e11ed refactor: extract NATS.Server.Mqtt.Tests project
Move 29 MQTT test files from NATS.Server.Tests into a dedicated
NATS.Server.Mqtt.Tests project. Update namespaces, add
InternalsVisibleTo, and replace Task.Delay calls with
PollHelper.WaitUntilAsync for proper synchronization.
2026-03-12 15:03:12 -04:00
Joseph Doherty
d2c04fcca5 refactor: extract NATS.Server.Transport.Tests project
Move TLS, OCSP, WebSocket, Networking, and IO test files from
NATS.Server.Tests into a dedicated NATS.Server.Transport.Tests
project. Update namespaces, replace private GetFreePort/ReadUntilAsync
with shared TestUtilities helpers, extract TestCertHelper to
TestUtilities, and replace Task.Delay polling loops with
PollHelper.WaitUntilAsync/YieldForAsync for proper synchronization.
2026-03-12 14:57:35 -04:00
Joseph Doherty
c30e67a69d Fix E2E test gaps and add comprehensive E2E + parity test suites
- Fix pull consumer fetch: send original stream subject in HMSG (not inbox)
  so NATS client distinguishes data messages from control messages
- Fix MaxAge expiry: add background timer in StreamManager for periodic pruning
- Fix JetStream wire format: Go-compatible anonymous objects with string enums,
  proper offset-based pagination for stream/consumer list APIs
- Add 42 E2E black-box tests (core messaging, auth, TLS, accounts, JetStream)
- Add ~1000 parity tests across all subsystems (gaps closure)
- Update gap inventory docs to reflect implementation status
2026-03-12 14:09:23 -04:00
Joseph Doherty
4ba87c4175 feat: add OCSP peer reject and chain validation events (Gap 10.10)
Add OcspChainValidationEvent DTO, OcspStatus enum, and OcspEventBuilder
helper with BuildPeerReject, BuildChainValidation, and ParseStatus methods.
Register OcspChainValidationEvent in EventJsonContext source-gen context.
Add OcspPeerReject and OcspChainValidation subject constants to EventSubjects.
10 new tests in OcspEventTests cover all DTOs, builder methods, status
parsing, and JSON round-trip.
2026-02-25 13:15:44 -05:00
Joseph Doherty
b314e3f510 feat: add S2 compression for system events (Gap 10.9)
Add EventCompressor static class with Snappy/S2 compress/decompress,
threshold-based ShouldCompress, CompressIfBeneficial with stats tracking,
and GetCompressionRatio helpers. Port 10 tests covering round-trip,
threshold logic, stats (TotalCompressed, BytesSaved, ResetStats).
Go reference: server/events.go:2082-2098 compressionType / snappyCompression.
2026-02-25 13:15:24 -05:00
Joseph Doherty
74473d81cf feat: add WebSocket-specific TLS configuration (Gap 15.1) 2026-02-25 13:14:47 -05:00
Joseph Doherty
4b9384dfcf feat: add auth error event publication (Gap 10.5)
Add SendAuthErrorEvent, SendConnectEvent, SendDisconnectEvent to
InternalEventSystem, plus AuthErrorEventCount counter and the three
companion detail record types (AuthErrorDetail, ConnectEventDetail,
DisconnectEventDetail). 10 new tests in AuthErrorEventTests all pass.
2026-02-25 13:12:52 -05:00