feat: add JetStream cluster replication and leaf node solicited reconnect

Add JetStream stream/consumer config and data replication across cluster
peers via $JS.INTERNAL.* subjects with BroadcastRoutedMessageAsync (sends
to all peers, bypassing pool routing). Capture routed data messages into
local JetStream stores in DeliverRemoteMessage. Fix leaf node solicited
reconnect by re-launching the retry loop in WatchConnectionAsync after
disconnect.
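
The solicited-reconnect fix amounts to a watch loop that goes back into the retry loop whenever a live connection drops, instead of exiting after the first disconnect. The sketch below simulates that pattern against a scripted hub. It is not the server's code. `SolicitWithRetryAsync`, `Dial` and the `dialResults` script are hypothetical stand-ins, and only the name `WatchConnectionAsync` comes from this commit.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical scripted hub: each entry answers one dial attempt.
// true = the dial succeeds, false = the hub is down and the dial fails.
var dialResults = new Queue<bool>(new[] { true, false, false, true });
var events = new List<string>();

// Hypothetical dial: succeeds or fails according to the script.
bool Dial() => dialResults.Count > 0 && dialResults.Dequeue();

// Retry loop: keep dialing until one attempt succeeds or the script runs out.
async Task<bool> SolicitWithRetryAsync(CancellationToken ct)
{
    while (dialResults.Count > 0)
    {
        ct.ThrowIfCancellationRequested();
        if (Dial()) { events.Add("connected"); return true; }
        events.Add("dial failed");
        await Task.Delay(1, ct); // placeholder for the real backoff
    }
    return false;
}

// Watcher sketch: after each disconnect, relaunch the retry loop instead of
// returning. In this simulation every session drops immediately.
async Task WatchConnectionAsync(CancellationToken ct)
{
    while (await SolicitWithRetryAsync(ct))
    {
        await Task.Yield(); // stand-in for "wait until the link closes"
        events.Add("disconnected");
    }
}

await WatchConnectionAsync(CancellationToken.None);
Console.WriteLine(string.Join(", ", events));
```

With this script the leaf connects, loses the hub, fails two dials while the hub is "restarting", and then reconnects. A watcher that returned after the first `disconnected` would never reach the second `connected`.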

Unskips 4 of 5 E2E cluster tests (LeaderDies_NewLeaderElected,
R3Stream_NodeDies_PublishContinues, Consumer_NodeDies_PullContinuesOnSurvivor,
Leaf_HubRestart_LeafReconnects). The 5th (LeaderRestart_RejoinsAsFollower)
requires RAFT log catch-up, which is a separate feature.
Joseph Doherty
2026-03-13 01:02:00 -04:00
parent ab805c883b
commit 3445a055eb
8 changed files with 164 additions and 5 deletions


@@ -52,7 +52,7 @@ public class JetStreamClusterTests(JetStreamClusterFixture fixture) : IClassFixt
/// then restores node 2 and waits for full mesh.
/// Go reference: server/jetstream_cluster_test.go TestJetStreamClusterNodeFailure
/// </summary>
-[Fact(Skip = "JetStream RAFT replication not yet implemented — node 1 cannot serve the stream after node 2 dies because stream data only lives on the publishing node")]
+[Fact]
[SlopwatchSuppress("SW001", "JetStream RAFT replication across cluster nodes is not yet implemented in the .NET server — this test requires cross-node stream availability after failover")]
public async Task R3Stream_NodeDies_PublishContinues()
{
@@ -107,7 +107,7 @@ public class JetStreamClusterTests(JetStreamClusterFixture fixture) : IClassFixt
/// Kills node 2 while a pull consumer exists and verifies the consumer is accessible on node 1.
/// Go reference: server/jetstream_cluster_test.go TestJetStreamClusterConsumerHardKill
/// </summary>
-[Fact(Skip = "JetStream RAFT replication not yet implemented — consumer and stream state are not replicated across nodes")]
+[Fact]
[SlopwatchSuppress("SW001", "JetStream RAFT replication across cluster nodes is not yet implemented in the .NET server — consumer state is local to the publishing node")]
public async Task Consumer_NodeDies_PullContinuesOnSurvivor()
{