Merge branch 'feature/core-lifecycle' into main

Reconcile close reason tracking: feature branch's MarkClosed() and
ShouldSkipFlush/FlushAndCloseAsync now use main's ClientClosedReason
enum. ClosedState enum retained for forward compatibility.
Joseph Doherty
2026-02-23 00:09:30 -05:00
12 changed files with 2745 additions and 18 deletions


@@ -0,0 +1,199 @@
# Section 2: Client/Connection Handling — Design
> Implements all in-scope gaps from differences.md Section 2.
## Scope
Eight features, all single-server and client-facing (no clustering, routes, gateways, or leaf nodes):
1. Close reason tracking (`ClientClosedReason` enum)
2. Connection state flags (bitfield replacing `_connectReceived`)
3. Channel-based write loop with batch flush
4. Slow consumer detection (pending bytes + write deadline)
5. Write deadline / timeout
6. Verbose mode (`+OK` responses)
7. No-responders validation and notification
8. Per-read-cycle stat batching
## A. Close Reasons
New `ClientClosedReason` enum with 16 values scoped to single-server:
```
public enum ClientClosedReason
{
    ClientClosed, AuthenticationTimeout, AuthenticationViolation, TLSHandshakeError,
    SlowConsumerPendingBytes, SlowConsumerWriteDeadline, WriteError, ReadError,
    ParseError, StaleConnection, ProtocolViolation, MaxPayloadExceeded,
    MaxSubscriptionsExceeded, ServerShutdown, MsgHeaderViolation, NoRespondersRequiresHeaders,
}
```
The Go server defines 37 values; the route, gateway, leaf node, JWT, and operator-mode reasons are excluded.
Each client records its reason in a `CloseReason` property before closing, and the value is exposed through monitoring (`/connz`).
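A minimal C# sketch of recording `CloseReason`, assuming the first close path should win so that a later `WriteError` or `ReadError` raised during teardown cannot overwrite the original cause. `ClientCloseState` and `TrySetCloseReason` are hypothetical names. Storing `(int)reason + 1` avoids adding an "unset" sentinel to the 16-value enum.

```csharp
using System.Threading;

public enum ClientClosedReason
{
    ClientClosed, AuthenticationTimeout, AuthenticationViolation, TLSHandshakeError,
    SlowConsumerPendingBytes, SlowConsumerWriteDeadline, WriteError, ReadError,
    ParseError, StaleConnection, ProtocolViolation, MaxPayloadExceeded,
    MaxSubscriptionsExceeded, ServerShutdown, MsgHeaderViolation, NoRespondersRequiresHeaders,
}

// Hypothetical holder for a client's close reason; the first close path wins.
public sealed class ClientCloseState
{
    // 0 means "not closed yet"; otherwise the value is (int)reason + 1.
    private int _reason;

    public ClientClosedReason? CloseReason
    {
        get
        {
            int raw = Volatile.Read(ref _reason);
            return raw == 0 ? null : (ClientClosedReason)(raw - 1);
        }
    }

    // Returns true only for the caller that actually recorded the reason.
    public bool TrySetCloseReason(ClientClosedReason reason) =>
        Interlocked.CompareExchange(ref _reason, (int)reason + 1, 0) == 0;
}
```

Under this scheme a close path would call `TrySetCloseReason` and continue with teardown only when it returns `true`. This also keeps one-time side effects such as incrementing `SlowConsumerCount` from running twice.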
## B. Connection State Flags
`ClientFlags` bitfield enum backed by `int`, manipulated via `Interlocked.Or`/`Interlocked.And`:
```
[Flags]
public enum ClientFlags : int
{
    ConnectReceived = 1,
    FirstPongSent = 2,
    HandshakeComplete = 4,
    CloseConnection = 8,
    WriteLoopStarted = 16,
    IsSlowConsumer = 32,
    ConnectProcessFinished = 64,
}
```
Replaces the current `_connectReceived` field (an `int` accessed with `Volatile.Read`/`Volatile.Write`).
Helper methods: `SetFlag(flag)`, `ClearFlag(flag)`, `HasFlag(flag)`.
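A sketch of the helper methods, assuming the flags live in a private `int` field. This is needed because `Interlocked.Or` and `Interlocked.And` operate on `int` rather than on enum types, and both require .NET 5 or later. `ClientFlagState` and the extra `TrySetFlag` method are hypothetical. `TrySetFlag` uses the original value returned by `Interlocked.Or` to tell the caller whether it was the one that set the bit.

```csharp
using System;
using System.Threading;

[Flags]
public enum ClientFlags : int
{
    ConnectReceived = 1,
    FirstPongSent = 2,
    HandshakeComplete = 4,
    CloseConnection = 8,
    WriteLoopStarted = 16,
    IsSlowConsumer = 32,
    ConnectProcessFinished = 64,
}

// Hypothetical host type; in the design these helpers live on NatsClient.
public sealed class ClientFlagState
{
    // Backing int so Interlocked.Or/And can update all flags atomically.
    private int _flags;

    public void SetFlag(ClientFlags flag) => Interlocked.Or(ref _flags, (int)flag);

    public void ClearFlag(ClientFlags flag) => Interlocked.And(ref _flags, ~(int)flag);

    public bool HasFlag(ClientFlags flag) => (Volatile.Read(ref _flags) & (int)flag) != 0;

    // Sets the flag; returns true only if it was not already set.
    // Interlocked.Or returns the value the field held before the OR.
    public bool TrySetFlag(ClientFlags flag) =>
        (Interlocked.Or(ref _flags, (int)flag) & (int)flag) == 0;
}
```

With this shape, `TrySetFlag(ClientFlags.ConnectReceived)` returning `false` would be one way to detect a second CONNECT in a single atomic step.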
## C. Channel-based Write Loop
### Architecture
Replace the inline `_writeLock` and direct stream writes with a channel drained by a single writer:
```
Producer threads → QueueOutbound(bytes) → Channel<ReadOnlyMemory<byte>> → WriteLoop → Stream
```
### Components
- `Channel<ReadOnlyMemory<byte>>` — bounded (capacity derived from `MaxPending` / average message size, or a default of 8192 items)
- `_pendingBytes` (long) — tracks queued but unflushed bytes via `Interlocked.Add`
- `RunWriteLoopAsync` — background task: `WaitToReadAsync` → drain all via `TryRead` → single `FlushAsync`
- `QueueOutbound(ReadOnlyMemory<byte>)` — enqueue, update pending bytes, check slow consumer
### Coalescing
The write loop drains all available items from the channel before flushing:
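```
while (await reader.WaitToReadAsync(ct))
{
    long batchBytes = 0;
    while (reader.TryRead(out var data))
    {
        await stream.WriteAsync(data, ct); // buffered write, no flush yet
        batchBytes += data.Length;
    }
    await stream.FlushAsync(ct);                     // single flush after the batch
    Interlocked.Add(ref _pendingBytes, -batchBytes); // flushed bytes are no longer pending
}
```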
### Migration
All existing write paths refactored:
- `SendMessageAsync` → serialize MSG/HMSG into a byte array → `QueueOutbound`
- `WriteAsync` → serialize the protocol message → `QueueOutbound`
- Remove `_writeLock` SemaphoreSlim
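The following is a self-contained sketch of the pipeline described above. It uses a bounded `Channel<ReadOnlyMemory<byte>>` with many writers and one reader, a non-blocking `QueueOutbound`, and a write loop that drains with `TryRead` and flushes once per batch. `OutboundQueue` and `Complete` are assumptions. So is the choice to report a full channel as `false` so the caller can treat it as a slow consumer, because the design does not say what happens when the bounded channel fills.

```csharp
using System;
using System.IO;
using System.Threading;
using System.Threading.Channels;
using System.Threading.Tasks;

// Hypothetical, stripped-down version of the NatsClient write path.
public sealed class OutboundQueue
{
    private readonly Channel<ReadOnlyMemory<byte>> _channel;
    private long _pendingBytes;

    public OutboundQueue(int capacity = 8192)
    {
        // Many producer threads, exactly one consumer (the write loop).
        _channel = Channel.CreateBounded<ReadOnlyMemory<byte>>(new BoundedChannelOptions(capacity)
        {
            SingleReader = true,
            SingleWriter = false,
        });
    }

    public long PendingBytes => Interlocked.Read(ref _pendingBytes);

    // Enqueues without blocking the producer. Returns false if the channel is full
    // or completed (assumed here to be handled by the caller as a slow consumer).
    public bool QueueOutbound(ReadOnlyMemory<byte> data)
    {
        // Count first so the write loop can never subtract bytes that were not yet added.
        Interlocked.Add(ref _pendingBytes, data.Length);
        if (_channel.Writer.TryWrite(data))
            return true;
        Interlocked.Add(ref _pendingBytes, -data.Length);
        return false;
    }

    // Lets the write loop exit once everything already queued has been written.
    public void Complete() => _channel.Writer.TryComplete();

    // Drains all available items, then flushes once per batch.
    public async Task RunWriteLoopAsync(Stream stream, CancellationToken ct)
    {
        var reader = _channel.Reader;
        while (await reader.WaitToReadAsync(ct))
        {
            long batchBytes = 0;
            while (reader.TryRead(out var data))
            {
                await stream.WriteAsync(data, ct);
                batchBytes += data.Length;
            }
            await stream.FlushAsync(ct);
            Interlocked.Add(ref _pendingBytes, -batchBytes);
        }
    }
}
```

Writes coalesce only if `stream` buffers them, for example a `BufferedStream` over the socket. A bare `NetworkStream` sends each `WriteAsync` immediately, and its `FlushAsync` does nothing.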
## D. Slow Consumer Detection
### Pending Bytes (Hard Limit)
In `QueueOutbound`, before writing to the channel:
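```
// Interlocked.Read keeps the 64-bit read atomic on 32-bit platforms.
if (Interlocked.Read(ref _pendingBytes) + data.Length > _maxPending)
{
    SetFlag(ClientFlags.IsSlowConsumer);
    CloseWithReason(ClientClosedReason.SlowConsumerPendingBytes);
    return;
}
```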
- `MaxPending` default: 64MB (matching Go's `MAX_PENDING_SIZE`)
- New option in `NatsOptions`
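The check above reads `_pendingBytes` and adds to it in separate steps. Two producers can therefore both pass the check and together exceed `MaxPending`. One alternative, sketched here as an assumption rather than as part of the design, is to reserve with `Interlocked.Add` first and roll back if the new total is over the limit. `PendingBytesLimiter`, `TryReserve`, and `Release` are hypothetical names.

```csharp
using System.Threading;

// Hypothetical helper; in the design this logic sits inside QueueOutbound.
public sealed class PendingBytesLimiter
{
    private readonly long _maxPending;
    private long _pendingBytes;

    public PendingBytesLimiter(long maxPending) => _maxPending = maxPending;

    public long PendingBytes => Interlocked.Read(ref _pendingBytes);

    // Atomically reserves `length` bytes. Returns false, with the reservation undone,
    // if the new total would exceed MaxPending (the slow-consumer case).
    public bool TryReserve(int length)
    {
        long total = Interlocked.Add(ref _pendingBytes, length);
        if (total <= _maxPending)
            return true;
        Interlocked.Add(ref _pendingBytes, -length);
        return false;
    }

    // Called by the write loop after a batch has been flushed.
    public void Release(long length) => Interlocked.Add(ref _pendingBytes, -length);
}
```

The rollback can briefly inflate the total, so a concurrent producer may be rejected slightly early. Because the outcome is closing a client that is already near the limit, that conservative error is probably acceptable.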
### Write Deadline (Timeout)
In the write loop, bound each flush by the write deadline:
```
using var cts = CancellationTokenSource.CreateLinkedTokenSource(ct);
cts.CancelAfter(_writeDeadline);
await stream.FlushAsync(cts.Token);
```
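On timeout, close the client with `SlowConsumerWriteDeadline`. Cancellation of the outer `ct` (for example, on server shutdown) is not a timeout and should not be reported as a slow consumer.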
- `WriteDeadline` default: 10 seconds
- New option in `NatsOptions`
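A sketch of the deadline wrapper. It assumes the flush is passed in as a delegate so that it can be exercised without a real socket. The wrapper separates the two ways the linked token can fire. If the deadline expires, it returns `false` and the caller closes with `SlowConsumerWriteDeadline`. If the outer token is cancelled, it rethrows. `DeadlineSketch.RunWithDeadlineAsync` is a hypothetical name.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class DeadlineSketch
{
    // Runs `operation` under a deadline. Returns true if it completed in time and
    // false if the deadline expired. If the outer token `ct` is cancelled (for
    // example, on shutdown), the OperationCanceledException propagates instead.
    public static async Task<bool> RunWithDeadlineAsync(
        Func<CancellationToken, Task> operation, TimeSpan deadline, CancellationToken ct)
    {
        using var cts = CancellationTokenSource.CreateLinkedTokenSource(ct);
        cts.CancelAfter(deadline);
        try
        {
            await operation(cts.Token);
            return true;
        }
        catch (OperationCanceledException) when (!ct.IsCancellationRequested)
        {
            // Only the deadline could have fired, so this is a write timeout.
            return false;
        }
    }
}
```

In the write loop this would be called as `await DeadlineSketch.RunWithDeadlineAsync(t => stream.FlushAsync(t), _writeDeadline, ct)`. The deadline helps only if the underlying stream honors cancellation while a write is blocked.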
### Monitoring
- `IsSlowConsumer` flag readable for `/connz`
- Server-level `SlowConsumerCount` stat incremented
## E. Verbose Mode
After successful command processing (CONNECT, SUB, UNSUB, PUB), check `ClientOpts?.Verbose`:
```
if (ClientOpts?.Verbose == true)
    QueueOutbound(OkBytes);
```
`OkBytes` = pre-encoded `+OK\r\n` static byte array in `NatsProtocol`.
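A sketch of the pre-encoded acknowledgement and the verbose check. `NatsProtocolSketch` stands in for `NatsProtocol`, and `AckIfVerbose` with a `queueOutbound` delegate is a hypothetical shape for the hook. `verbose` is a `bool?` because `ClientOpts` may still be null when the check runs.

```csharp
using System;
using System.Text;

public static class NatsProtocolSketch
{
    // Encoded once so verbose acknowledgements allocate nothing per command.
    public static readonly byte[] OkBytes = Encoding.ASCII.GetBytes("+OK\r\n");
}

// Hypothetical hook called after CONNECT, SUB, UNSUB or PUB succeeds.
public static class VerboseMode
{
    public static void AckIfVerbose(bool? verbose, Action<ReadOnlyMemory<byte>> queueOutbound)
    {
        // Only an explicit verbose:true triggers +OK; null and false do not.
        if (verbose == true)
            queueOutbound(NatsProtocolSketch.OkBytes);
    }
}
```

Sharing one static array across all clients is safe only because nothing mutates it. Passing it as `ReadOnlyMemory<byte>` keeps the write loop from writing into it.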
## F. No-Responders
### CONNECT Validation
```
if (clientOpts.NoResponders && !clientOpts.Headers)
{
    CloseWithReason(ClientClosedReason.NoRespondersRequiresHeaders);
    return;
}
```
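A sketch of the same rule applied to the raw CONNECT payload, assuming `System.Text.Json` for parsing. `headers` and `no_responders` are the JSON field names NATS clients send in CONNECT. `ConnectOptionsSketch` and `ConnectValidation.IsValid` are hypothetical. The rule exists because a no-responders notification is an HMSG carrying a 503 status header, which a client without header support cannot parse.

```csharp
using System.Text.Json;
using System.Text.Json.Serialization;

// Hypothetical subset of the CONNECT options; JSON names follow the NATS protocol.
public sealed class ConnectOptionsSketch
{
    [JsonPropertyName("verbose")] public bool Verbose { get; set; }
    [JsonPropertyName("headers")] public bool Headers { get; set; }
    [JsonPropertyName("no_responders")] public bool NoResponders { get; set; }
}

public static class ConnectValidation
{
    // Returns false when the client should be closed with NoRespondersRequiresHeaders.
    // Missing fields default to false; unknown fields are ignored.
    public static bool IsValid(string connectJson)
    {
        var opts = JsonSerializer.Deserialize<ConnectOptionsSketch>(connectJson)
                   ?? new ConnectOptionsSketch();
        return !(opts.NoResponders && !opts.Headers);
    }
}
```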
### Publish-time Notification
In `NatsServer` message delivery, after `Match()` returns zero subscribers:
```
if (!delivered && reply.Length > 0 && publisher.ClientOpts?.NoResponders == true)
{
    // Send an HMSG with a NATS/1.0 503 status header back to the publisher.
    // `sid` is the publisher's own subscription on `reply` (its request inbox).
    var header = $"NATS/1.0 503\r\nNats-Subject: {subject}\r\n\r\n";
    publisher.SendNoRespondersAsync(reply, sid, header);
}
```
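A sketch of the frame the no-responders path would write. It follows the NATS `HMSG` wire format: a `HMSG <subject> <sid> <#header bytes> <#total bytes>` control line, then the header block ending in a blank line, then the payload and a final CRLF. With an empty payload, the header and total byte counts are equal. The frame's subject is the reply subject, and `sid` is assumed to be the publisher's subscription on that inbox. `NoRespondersSketch.BuildFrame` is a hypothetical helper.

```csharp
using System.Text;

public static class NoRespondersSketch
{
    // Builds: HMSG <reply> <sid> <hdr_len> <total_len>\r\n<headers>\r\n
    // The header block already ends with \r\n\r\n. The payload is empty,
    // so total_len == hdr_len and only the trailing \r\n follows.
    public static byte[] BuildFrame(string reply, string sid, string subject)
    {
        string header = $"NATS/1.0 503\r\nNats-Subject: {subject}\r\n\r\n";
        int headerLen = Encoding.UTF8.GetByteCount(header);
        string frame = $"HMSG {reply} {sid} {headerLen} {headerLen}\r\n{header}\r\n";
        return Encoding.UTF8.GetBytes(frame);
    }
}
```

For `reply = "_INBOX.1"`, `sid = "1"` and `subject = "foo"`, the header block is 35 bytes, so the control line is `HMSG _INBOX.1 1 35 35`.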
## G. Stat Batching
In read loop, accumulate locally:
```
long localInMsgs = 0, localInBytes = 0;
// ... per message: localInMsgs++; localInBytes += size;
// End of read cycle:
Interlocked.Add(ref _inMsgs, localInMsgs);
Interlocked.Add(ref _inBytes, localInBytes);
// Same for server stats
```
Reduces atomic operations from per-message to per-read-cycle.
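A sketch of per-cycle batching. It assumes the message sizes for one read cycle are already known, whereas a real read loop would count while parsing. `StatCounters` and `ReadCycleStats` are hypothetical. The counters are public fields so that they can be passed to `Interlocked.Add` by reference.

```csharp
using System.Threading;

// Hypothetical counters; in the design they live on NatsClient and NatsServer.
public sealed class StatCounters
{
    public long InMsgs;
    public long InBytes;
}

public static class ReadCycleStats
{
    // Counts locally for one read cycle, then publishes the totals with two
    // Interlocked.Add calls per counter set instead of two per message.
    public static void ProcessReadCycle(int[] messageSizes, StatCounters client, StatCounters server)
    {
        long localInMsgs = 0, localInBytes = 0;
        foreach (int size in messageSizes)
        {
            // ... parse and route the message here ...
            localInMsgs++;
            localInBytes += size;
        }
        if (localInMsgs == 0)
            return; // nothing parsed this cycle; skip the atomics entirely
        Interlocked.Add(ref client.InMsgs, localInMsgs);
        Interlocked.Add(ref client.InBytes, localInBytes);
        Interlocked.Add(ref server.InMsgs, localInMsgs);
        Interlocked.Add(ref server.InBytes, localInBytes);
    }
}
```

The trade-off is that monitoring readers of these counters can lag by up to one read cycle.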
## Files
| File | Change | Size |
|------|--------|------|
| `ClientClosedReason.cs` | New | Small |
| `ClientFlags.cs` | New | Small |
| `NatsClient.cs` | Major rewrite of write path | Large |
| `NatsServer.cs` | No-responders, close reason | Medium |
| `NatsOptions.cs` | MaxPending, WriteDeadline | Small |
| `NatsProtocol.cs` | +OK bytes, NoResponders | Small |
| `ClientTests.cs` | Verbose, close reasons, flags | Medium |
| `ServerTests.cs` | No-responders, slow consumer | Medium |
## Test Plan
- **Verbose mode**: Connect with `verbose:true`, send SUB/PUB, verify `+OK` responses
- **Close reasons**: Trigger each close path, verify reason is set
- **State flags**: Set/clear/check flags concurrently
- **Slow consumer (pending bytes)**: Queue more than MaxPending, verify close
- **Slow consumer (write deadline)**: Use a slow/blocked stream, verify timeout close
- **No-responders**: Publish with a reply subject to a subject that has no subscribers, verify the 503 HMSG
- **Write coalescing**: Send multiple messages rapidly, verify batched flush
- **Stat batching**: Send N messages, verify stats match after read cycle

File diff suppressed because it is too large.


@@ -0,0 +1,19 @@
{
"planPath": "docs/plans/2026-02-22-section2-client-connection-handling-plan.md",
"tasks": [
{"id": 4, "subject": "Task 1: Add ClientClosedReason enum", "status": "pending"},
{"id": 5, "subject": "Task 2: Add ClientFlags bitfield", "status": "pending"},
{"id": 6, "subject": "Task 3: Add MaxPending and WriteDeadline to NatsOptions", "status": "pending"},
{"id": 7, "subject": "Task 4: Integrate ClientFlags into NatsClient", "status": "pending", "blockedBy": [4, 5, 6]},
{"id": 8, "subject": "Task 5: Implement channel-based write loop", "status": "pending", "blockedBy": [7]},
{"id": 9, "subject": "Task 6: Write tests for write loop and slow consumer", "status": "pending", "blockedBy": [8]},
{"id": 10, "subject": "Task 7: Update NatsServer for SendMessage + no-responders", "status": "pending", "blockedBy": [8]},
{"id": 11, "subject": "Task 8: Implement verbose mode", "status": "pending", "blockedBy": [10]},
{"id": 12, "subject": "Task 9: Implement no-responders CONNECT validation", "status": "pending", "blockedBy": [10]},
{"id": 13, "subject": "Task 10: Implement stat batching in read loop", "status": "pending", "blockedBy": [8]},
{"id": 14, "subject": "Task 11: Update ConnzHandler for close reason + pending bytes", "status": "pending", "blockedBy": [13]},
{"id": 15, "subject": "Task 12: Fix existing tests for new write model", "status": "pending", "blockedBy": [13]},
{"id": 16, "subject": "Task 13: Final verification and differences.md update", "status": "pending", "blockedBy": [14, 15]}
],
"lastUpdated": "2026-02-22T00:00:00Z"
}