docs: add design doc for hardening base server

Covers -ERR infrastructure, MaxConnections enforcement,
pedantic subject validation on PUB, and server-side PING keepalive.
This commit is contained in:
Joseph Doherty
2026-02-22 21:25:10 -05:00
parent 5dee4f5fa6
commit b8acca19dd

View File

@@ -0,0 +1,141 @@
# Harden Base Server — Design Document
**Date:** 2026-02-22
**Status:** Approved
**Scope:** Small — fills gaps in the base single-node pub/sub server
## Overview
The base NATS server port handles PUB/SUB, wildcards, queue groups, and integration with real NATS clients. However, it lacks several hardening features present in the Go reference implementation. This design covers four areas:
1. `-ERR` response infrastructure
2. MaxConnections enforcement
3. Subject validation on PUB (pedantic mode) + max payload validation
4. Server-side PING keepalive with stale connection detection
## 1. -ERR Response Infrastructure
**Go reference:** `client.go:93,2608-2617`
### Wire Format
All errors use: `-ERR '{message}'\r\n` (single quotes around the message).
### Constants (NatsProtocol.cs)
Add standard error message constants matching the Go server:
| Constant | Wire message | Connection closed? |
|---|---|---|
| `MaxConnectionsExceeded` | `maximum connections exceeded` | Yes |
| `StaleConnection` | `Stale Connection` | Yes |
| `MaxPayloadViolation` | `Maximum Payload Violation` | Yes |
| `InvalidPublishSubject` | `Invalid Publish Subject` | No |
| `InvalidSubject` | `Invalid Subject` | No |
### Client Methods (NatsClient.cs)
- `SendErrAsync(string message)` — writes `-ERR '{message}'\r\n` using the existing write-lock pattern. Connection stays open.
- `SendErrAndCloseAsync(string message)` — sends the `-ERR` then triggers client shutdown via cancellation token.
## 2. MaxConnections Enforcement
**Go reference:** `server.go:3378-3384`, `client.go:2428-2431`
### Design
In `NatsServer.StartAsync`, after `AcceptAsync` returns a new TCP connection:
1. Check `_clients.Count >= _options.MaxConnections`
2. If exceeded: write `-ERR 'maximum connections exceeded'\r\n` directly on the raw `NetworkStream`, close the `TcpClient`. No `NatsClient` is created.
3. Log at Warning level: `"Client connection rejected: maximum connections ({MaxConnections}) exceeded"`
The check uses `ConcurrentDictionary.Count` which is safe for this purpose — a slight race is acceptable since Go has the same pattern (check under lock, but the lock is released before the reject write).
Default `MaxConnections` remains `65536`.
## 3. Subject Validation on PUB
**Go reference:** `client.go:2869-2871`
### Pedantic Mode Subject Validation
In `ProcessPub` and `ProcessHPub`, after parsing the command:
1. If `ClientOptions.Pedantic == true`, call `SubjectMatch.IsValidPublishSubject(subject)`
2. If invalid: `SendErrAsync("Invalid Publish Subject")` — connection stays open, message is dropped (not routed)
3. Log at Debug level
This matches Go which only validates publish subjects in pedantic mode.
### Max Payload Validation (Always)
In `ProcessPub` and `ProcessHPub`:
1. If payload size > `_options.MaxPayload`, call `SendErrAndCloseAsync("Maximum Payload Violation")`
2. This is a hard close matching Go behavior
## 4. Server-Side PING Keepalive
**Go reference:** `client.go:5537-5654,2577-2584,2680-2682`
### New Client State
- `_pingsOut: int` — count of unanswered server-initiated PINGs (via `Interlocked`)
- `_lastIn: long``Environment.TickCount64` timestamp of last inbound data
### Timer Implementation
Use `PeriodicTimer` in a dedicated `RunPingTimerAsync` method, launched alongside `FillPipeAsync` and `ProcessCommandsAsync` in `RunAsync`. This avoids fire-and-forget async from sync timer callbacks.
### Lifecycle
1. **Start:** After CONNECT handshake completes, launch `RunPingTimerAsync` with period = `PingInterval`
2. **Each tick:**
- If `TickCount64 - _lastIn < PingInterval.TotalMilliseconds`: client was recently active, reset `_pingsOut = 0`, skip
- Else if `_pingsOut + 1 > MaxPingsOut`: send `-ERR 'Stale Connection'` and close
- Else: increment `_pingsOut`, send `PING\r\n`
3. **PONG received:** In `DispatchCommandAsync` for `CommandType.Pong`, set `_pingsOut = 0`
4. **Cleanup:** Cancel and dispose the timer in client shutdown
### Updating `_lastIn`
Set `_lastIn = Environment.TickCount64` at the top of `ProcessCommandsAsync` whenever a complete command is parsed from the pipe.
### Defaults
- `PingInterval = 2 minutes`
- `MaxPingsOut = 2`
- Stale detection: non-responding client disconnected after ~6 minutes (3 intervals)
## Testing Strategy
### -ERR Infrastructure
- Unit test: `SendErrAsync` writes correct wire format
- Unit test: `SendErrAndCloseAsync` sends error then disconnects
### MaxConnections
- Integration test: connect N clients up to max, verify N+1 receives `-ERR` and is disconnected
- Verify existing clients are unaffected
### Subject Validation
- Test with pedantic client: PUB to `foo.*` gets `-ERR`, connection stays open
- Test with non-pedantic client: PUB to `foo.*` is accepted (no validation)
- Test max payload: PUB exceeding limit gets `-ERR` and connection closes
### PING Keepalive
- Test with short intervals: verify server sends PING after inactivity
- Test PONG response resets the counter
- Test stale detection: non-responding client is disconnected
- Test active client: recent data suppresses PING
## Files Modified
| File | Changes |
|---|---|
| `NatsProtocol.cs` | Error message constants |
| `NatsClient.cs` | `SendErrAsync`, `SendErrAndCloseAsync`, ping timer, `_lastIn` tracking, pedantic validation, max payload check |
| `NatsServer.cs` | MaxConnections check in accept loop |
| `ClientTests.cs` | -ERR format tests, max payload tests |
| `ServerTests.cs` | MaxConnections test, PING keepalive tests |
| `IntegrationTests.cs` | End-to-end stale connection test |