docs: add design doc for hardening base server
Covers -ERR infrastructure, MaxConnections enforcement, pedantic subject validation on PUB, and server-side PING keepalive.
This commit is contained in:
141
docs/plans/2026-02-22-harden-base-server-design.md
Normal file
141
docs/plans/2026-02-22-harden-base-server-design.md
Normal file
@@ -0,0 +1,141 @@
|
|||||||
|
# Harden Base Server — Design Document
|
||||||
|
|
||||||
|
**Date:** 2026-02-22
|
||||||
|
**Status:** Approved
|
||||||
|
**Scope:** Small — fills gaps in the base single-node pub/sub server
|
||||||
|
|
||||||
|
## Overview
|
||||||
|
|
||||||
|
The base NATS server port handles PUB/SUB, wildcards, queue groups, and integration with real NATS clients. However, it lacks several hardening features present in the Go reference implementation. This design covers four areas:
|
||||||
|
|
||||||
|
1. `-ERR` response infrastructure
|
||||||
|
2. MaxConnections enforcement
|
||||||
|
3. Subject validation on PUB (pedantic mode) + max payload validation
|
||||||
|
4. Server-side PING keepalive with stale connection detection
|
||||||
|
|
||||||
|
## 1. -ERR Response Infrastructure
|
||||||
|
|
||||||
|
**Go reference:** `client.go:93,2608-2617`
|
||||||
|
|
||||||
|
### Wire Format
|
||||||
|
|
||||||
|
All errors use: `-ERR '{message}'\r\n` (single quotes around the message).
|
||||||
|
|
||||||
|
### Constants (NatsProtocol.cs)
|
||||||
|
|
||||||
|
Add standard error message constants matching the Go server:
|
||||||
|
|
||||||
|
| Constant | Wire message | Connection closed? |
|
||||||
|
|---|---|---|
|
||||||
|
| `MaxConnectionsExceeded` | `maximum connections exceeded` | Yes |
|
||||||
|
| `StaleConnection` | `Stale Connection` | Yes |
|
||||||
|
| `MaxPayloadViolation` | `Maximum Payload Violation` | Yes |
|
||||||
|
| `InvalidPublishSubject` | `Invalid Publish Subject` | No |
|
||||||
|
| `InvalidSubject` | `Invalid Subject` | No |
|
||||||
|
|
||||||
|
### Client Methods (NatsClient.cs)
|
||||||
|
|
||||||
|
- `SendErrAsync(string message)` — writes `-ERR '{message}'\r\n` using the existing write-lock pattern. Connection stays open.
|
||||||
|
- `SendErrAndCloseAsync(string message)` — sends the `-ERR` then triggers client shutdown via cancellation token.
|
||||||
|
|
||||||
|
## 2. MaxConnections Enforcement
|
||||||
|
|
||||||
|
**Go reference:** `server.go:3378-3384`, `client.go:2428-2431`
|
||||||
|
|
||||||
|
### Design
|
||||||
|
|
||||||
|
In `NatsServer.StartAsync`, after `AcceptAsync` returns a new TCP connection:
|
||||||
|
|
||||||
|
1. Check `_clients.Count >= _options.MaxConnections`
|
||||||
|
2. If exceeded: write `-ERR 'maximum connections exceeded'\r\n` directly on the raw `NetworkStream`, close the `TcpClient`. No `NatsClient` is created.
|
||||||
|
3. Log at Warning level: `"Client connection rejected: maximum connections ({MaxConnections}) exceeded"`
|
||||||
|
|
||||||
|
The check uses `ConcurrentDictionary.Count` which is safe for this purpose — a slight race is acceptable since Go has the same pattern (check under lock, but the lock is released before the reject write).
|
||||||
|
|
||||||
|
Default `MaxConnections` remains `65536`.
|
||||||
|
|
||||||
|
## 3. Subject Validation on PUB
|
||||||
|
|
||||||
|
**Go reference:** `client.go:2869-2871`
|
||||||
|
|
||||||
|
### Pedantic Mode Subject Validation
|
||||||
|
|
||||||
|
In `ProcessPub` and `ProcessHPub`, after parsing the command:
|
||||||
|
|
||||||
|
1. If `ClientOptions.Pedantic == true`, call `SubjectMatch.IsValidPublishSubject(subject)`
|
||||||
|
2. If invalid: `SendErrAsync("Invalid Publish Subject")` — connection stays open, message is dropped (not routed)
|
||||||
|
3. Log at Debug level
|
||||||
|
|
||||||
|
This matches Go which only validates publish subjects in pedantic mode.
|
||||||
|
|
||||||
|
### Max Payload Validation (Always)
|
||||||
|
|
||||||
|
In `ProcessPub` and `ProcessHPub`:
|
||||||
|
|
||||||
|
1. If payload size > `_options.MaxPayload`, call `SendErrAndCloseAsync("Maximum Payload Violation")`
|
||||||
|
2. This is a hard close matching Go behavior
|
||||||
|
|
||||||
|
## 4. Server-Side PING Keepalive
|
||||||
|
|
||||||
|
**Go reference:** `client.go:5537-5654,2577-2584,2680-2682`
|
||||||
|
|
||||||
|
### New Client State
|
||||||
|
|
||||||
|
- `_pingsOut: int` — count of unanswered server-initiated PINGs (via `Interlocked`)
|
||||||
|
- `_lastIn: long` — `Environment.TickCount64` timestamp of last inbound data
|
||||||
|
|
||||||
|
### Timer Implementation
|
||||||
|
|
||||||
|
Use `PeriodicTimer` in a dedicated `RunPingTimerAsync` method, launched alongside `FillPipeAsync` and `ProcessCommandsAsync` in `RunAsync`. This avoids fire-and-forget async from sync timer callbacks.
|
||||||
|
|
||||||
|
### Lifecycle
|
||||||
|
|
||||||
|
1. **Start:** After CONNECT handshake completes, launch `RunPingTimerAsync` with period = `PingInterval`
|
||||||
|
2. **Each tick:**
|
||||||
|
- If `TickCount64 - _lastIn < PingInterval.TotalMilliseconds`: client was recently active, reset `_pingsOut = 0`, skip
|
||||||
|
- Else if `_pingsOut + 1 > MaxPingsOut`: send `-ERR 'Stale Connection'` and close
|
||||||
|
- Else: increment `_pingsOut`, send `PING\r\n`
|
||||||
|
3. **PONG received:** In `DispatchCommandAsync` for `CommandType.Pong`, set `_pingsOut = 0`
|
||||||
|
4. **Cleanup:** Cancel and dispose the timer in client shutdown
|
||||||
|
|
||||||
|
### Updating `_lastIn`
|
||||||
|
|
||||||
|
Set `_lastIn = Environment.TickCount64` at the top of `ProcessCommandsAsync` whenever a complete command is parsed from the pipe.
|
||||||
|
|
||||||
|
### Defaults
|
||||||
|
|
||||||
|
- `PingInterval = 2 minutes`
|
||||||
|
- `MaxPingsOut = 2`
|
||||||
|
- Stale detection: non-responding client disconnected after ~6 minutes (3 intervals)
|
||||||
|
|
||||||
|
## Testing Strategy
|
||||||
|
|
||||||
|
### -ERR Infrastructure
|
||||||
|
- Unit test: `SendErrAsync` writes correct wire format
|
||||||
|
- Unit test: `SendErrAndCloseAsync` sends error then disconnects
|
||||||
|
|
||||||
|
### MaxConnections
|
||||||
|
- Integration test: connect N clients up to max, verify N+1 receives `-ERR` and is disconnected
|
||||||
|
- Verify existing clients are unaffected
|
||||||
|
|
||||||
|
### Subject Validation
|
||||||
|
- Test with pedantic client: PUB to `foo.*` gets `-ERR`, connection stays open
|
||||||
|
- Test with non-pedantic client: PUB to `foo.*` is accepted (no validation)
|
||||||
|
- Test max payload: PUB exceeding limit gets `-ERR` and connection closes
|
||||||
|
|
||||||
|
### PING Keepalive
|
||||||
|
- Test with short intervals: verify server sends PING after inactivity
|
||||||
|
- Test PONG response resets the counter
|
||||||
|
- Test stale detection: non-responding client is disconnected
|
||||||
|
- Test active client: recent data suppresses PING
|
||||||
|
|
||||||
|
## Files Modified
|
||||||
|
|
||||||
|
| File | Changes |
|
||||||
|
|---|---|
|
||||||
|
| `NatsProtocol.cs` | Error message constants |
|
||||||
|
| `NatsClient.cs` | `SendErrAsync`, `SendErrAndCloseAsync`, ping timer, `_lastIn` tracking, pedantic validation, max payload check |
|
||||||
|
| `NatsServer.cs` | MaxConnections check in accept loop |
|
||||||
|
| `ClientTests.cs` | -ERR format tests, max payload tests |
|
||||||
|
| `ServerTests.cs` | MaxConnections test, PING keepalive tests |
|
||||||
|
| `IntegrationTests.cs` | End-to-end stale connection test |
|
||||||
Reference in New Issue
Block a user