# Harden Base Server — Design Document **Date:** 2026-02-22 **Status:** Approved **Scope:** Small — fills gaps in the base single-node pub/sub server ## Overview The base NATS server port handles PUB/SUB, wildcards, queue groups, and integration with real NATS clients. However, it lacks several hardening features present in the Go reference implementation. This design covers four areas: 1. `-ERR` response infrastructure 2. MaxConnections enforcement 3. Subject validation on PUB (pedantic mode) + max payload validation 4. Server-side PING keepalive with stale connection detection ## 1. -ERR Response Infrastructure **Go reference:** `client.go:93,2608-2617` ### Wire Format All errors use: `-ERR '{message}'\r\n` (single quotes around the message). ### Constants (NatsProtocol.cs) Add standard error message constants matching the Go server: | Constant | Wire message | Connection closed? | |---|---|---| | `MaxConnectionsExceeded` | `maximum connections exceeded` | Yes | | `StaleConnection` | `Stale Connection` | Yes | | `MaxPayloadViolation` | `Maximum Payload Violation` | Yes | | `InvalidPublishSubject` | `Invalid Publish Subject` | No | | `InvalidSubject` | `Invalid Subject` | No | ### Client Methods (NatsClient.cs) - `SendErrAsync(string message)` — writes `-ERR '{message}'\r\n` using the existing write-lock pattern. Connection stays open. - `SendErrAndCloseAsync(string message)` — sends the `-ERR` then triggers client shutdown via cancellation token. ## 2. MaxConnections Enforcement **Go reference:** `server.go:3378-3384`, `client.go:2428-2431` ### Design In `NatsServer.StartAsync`, after `AcceptAsync` returns a new TCP connection: 1. Check `_clients.Count >= _options.MaxConnections` 2. If exceeded: write `-ERR 'maximum connections exceeded'\r\n` directly on the raw `NetworkStream`, close the `TcpClient`. No `NatsClient` is created. 3. Log at Warning level: `"Client connection rejected: maximum connections ({MaxConnections}) exceeded"` The check uses `ConcurrentDictionary.Count` which is safe for this purpose — a slight race is acceptable since Go has the same pattern (check under lock, but the lock is released before the reject write). Default `MaxConnections` remains `65536`. ## 3. Subject Validation on PUB **Go reference:** `client.go:2869-2871` ### Pedantic Mode Subject Validation In `ProcessPub` and `ProcessHPub`, after parsing the command: 1. If `ClientOptions.Pedantic == true`, call `SubjectMatch.IsValidPublishSubject(subject)` 2. If invalid: `SendErrAsync("Invalid Publish Subject")` — connection stays open, message is dropped (not routed) 3. Log at Debug level This matches Go which only validates publish subjects in pedantic mode. ### Max Payload Validation (Always) In `ProcessPub` and `ProcessHPub`: 1. If payload size > `_options.MaxPayload`, call `SendErrAndCloseAsync("Maximum Payload Violation")` 2. This is a hard close matching Go behavior ## 4. Server-Side PING Keepalive **Go reference:** `client.go:5537-5654,2577-2584,2680-2682` ### New Client State - `_pingsOut: int` — count of unanswered server-initiated PINGs (via `Interlocked`) - `_lastIn: long` — `Environment.TickCount64` timestamp of last inbound data ### Timer Implementation Use `PeriodicTimer` in a dedicated `RunPingTimerAsync` method, launched alongside `FillPipeAsync` and `ProcessCommandsAsync` in `RunAsync`. This avoids fire-and-forget async from sync timer callbacks. ### Lifecycle 1. **Start:** After CONNECT handshake completes, launch `RunPingTimerAsync` with period = `PingInterval` 2. **Each tick:** - If `TickCount64 - _lastIn < PingInterval.TotalMilliseconds`: client was recently active, reset `_pingsOut = 0`, skip - Else if `_pingsOut + 1 > MaxPingsOut`: send `-ERR 'Stale Connection'` and close - Else: increment `_pingsOut`, send `PING\r\n` 3. **PONG received:** In `DispatchCommandAsync` for `CommandType.Pong`, set `_pingsOut = 0` 4. **Cleanup:** Cancel and dispose the timer in client shutdown ### Updating `_lastIn` Set `_lastIn = Environment.TickCount64` at the top of `ProcessCommandsAsync` whenever a complete command is parsed from the pipe. ### Defaults - `PingInterval = 2 minutes` - `MaxPingsOut = 2` - Stale detection: non-responding client disconnected after ~6 minutes (3 intervals) ## Testing Strategy ### -ERR Infrastructure - Unit test: `SendErrAsync` writes correct wire format - Unit test: `SendErrAndCloseAsync` sends error then disconnects ### MaxConnections - Integration test: connect N clients up to max, verify N+1 receives `-ERR` and is disconnected - Verify existing clients are unaffected ### Subject Validation - Test with pedantic client: PUB to `foo.*` gets `-ERR`, connection stays open - Test with non-pedantic client: PUB to `foo.*` is accepted (no validation) - Test max payload: PUB exceeding limit gets `-ERR` and connection closes ### PING Keepalive - Test with short intervals: verify server sends PING after inactivity - Test PONG response resets the counter - Test stale detection: non-responding client is disconnected - Test active client: recent data suppresses PING ## Files Modified | File | Changes | |---|---| | `NatsProtocol.cs` | Error message constants | | `NatsClient.cs` | `SendErrAsync`, `SendErrAndCloseAsync`, ping timer, `_lastIn` tracking, pedantic validation, max payload check | | `NatsServer.cs` | MaxConnections check in accept loop | | `ClientTests.cs` | -ERR format tests, max payload tests | | `ServerTests.cs` | MaxConnections test, PING keepalive tests | | `IntegrationTests.cs` | End-to-end stale connection test |