From b8acca19dd98abccbdb3f93946ac82ab187aae46 Mon Sep 17 00:00:00 2001 From: Joseph Doherty Date: Sun, 22 Feb 2026 21:25:10 -0500 Subject: [PATCH] docs: add design doc for hardening base server Covers -ERR infrastructure, MaxConnections enforcement, pedantic subject validation on PUB, and server-side PING keepalive. --- .../2026-02-22-harden-base-server-design.md | 141 ++++++++++++++++++ 1 file changed, 141 insertions(+) create mode 100644 docs/plans/2026-02-22-harden-base-server-design.md diff --git a/docs/plans/2026-02-22-harden-base-server-design.md b/docs/plans/2026-02-22-harden-base-server-design.md new file mode 100644 index 0000000..5ebdee6 --- /dev/null +++ b/docs/plans/2026-02-22-harden-base-server-design.md @@ -0,0 +1,141 @@ +# Harden Base Server — Design Document + +**Date:** 2026-02-22 +**Status:** Approved +**Scope:** Small — fills gaps in the base single-node pub/sub server + +## Overview + +The base NATS server port handles PUB/SUB, wildcards, queue groups, and integration with real NATS clients. However, it lacks several hardening features present in the Go reference implementation. This design covers four areas: + +1. `-ERR` response infrastructure +2. MaxConnections enforcement +3. Subject validation on PUB (pedantic mode) + max payload validation +4. Server-side PING keepalive with stale connection detection + +## 1. -ERR Response Infrastructure + +**Go reference:** `client.go:93,2608-2617` + +### Wire Format + +All errors use: `-ERR '{message}'\r\n` (single quotes around the message). + +### Constants (NatsProtocol.cs) + +Add standard error message constants matching the Go server: + +| Constant | Wire message | Connection closed? | +|---|---|---| +| `MaxConnectionsExceeded` | `maximum connections exceeded` | Yes | +| `StaleConnection` | `Stale Connection` | Yes | +| `MaxPayloadViolation` | `Maximum Payload Violation` | Yes | +| `InvalidPublishSubject` | `Invalid Publish Subject` | No | +| `InvalidSubject` | `Invalid Subject` | No | + +### Client Methods (NatsClient.cs) + +- `SendErrAsync(string message)` — writes `-ERR '{message}'\r\n` using the existing write-lock pattern. Connection stays open. +- `SendErrAndCloseAsync(string message)` — sends the `-ERR` then triggers client shutdown via cancellation token. + +## 2. MaxConnections Enforcement + +**Go reference:** `server.go:3378-3384`, `client.go:2428-2431` + +### Design + +In `NatsServer.StartAsync`, after `AcceptAsync` returns a new TCP connection: + +1. Check `_clients.Count >= _options.MaxConnections` +2. If exceeded: write `-ERR 'maximum connections exceeded'\r\n` directly on the raw `NetworkStream`, close the `TcpClient`. No `NatsClient` is created. +3. Log at Warning level: `"Client connection rejected: maximum connections ({MaxConnections}) exceeded"` + +The check uses `ConcurrentDictionary.Count` which is safe for this purpose — a slight race is acceptable since Go has the same pattern (check under lock, but the lock is released before the reject write). + +Default `MaxConnections` remains `65536`. + +## 3. Subject Validation on PUB + +**Go reference:** `client.go:2869-2871` + +### Pedantic Mode Subject Validation + +In `ProcessPub` and `ProcessHPub`, after parsing the command: + +1. If `ClientOptions.Pedantic == true`, call `SubjectMatch.IsValidPublishSubject(subject)` +2. If invalid: `SendErrAsync("Invalid Publish Subject")` — connection stays open, message is dropped (not routed) +3. Log at Debug level + +This matches Go which only validates publish subjects in pedantic mode. + +### Max Payload Validation (Always) + +In `ProcessPub` and `ProcessHPub`: + +1. If payload size > `_options.MaxPayload`, call `SendErrAndCloseAsync("Maximum Payload Violation")` +2. This is a hard close matching Go behavior + +## 4. Server-Side PING Keepalive + +**Go reference:** `client.go:5537-5654,2577-2584,2680-2682` + +### New Client State + +- `_pingsOut: int` — count of unanswered server-initiated PINGs (via `Interlocked`) +- `_lastIn: long` — `Environment.TickCount64` timestamp of last inbound data + +### Timer Implementation + +Use `PeriodicTimer` in a dedicated `RunPingTimerAsync` method, launched alongside `FillPipeAsync` and `ProcessCommandsAsync` in `RunAsync`. This avoids fire-and-forget async from sync timer callbacks. + +### Lifecycle + +1. **Start:** After CONNECT handshake completes, launch `RunPingTimerAsync` with period = `PingInterval` +2. **Each tick:** + - If `TickCount64 - _lastIn < PingInterval.TotalMilliseconds`: client was recently active, reset `_pingsOut = 0`, skip + - Else if `_pingsOut + 1 > MaxPingsOut`: send `-ERR 'Stale Connection'` and close + - Else: increment `_pingsOut`, send `PING\r\n` +3. **PONG received:** In `DispatchCommandAsync` for `CommandType.Pong`, set `_pingsOut = 0` +4. **Cleanup:** Cancel and dispose the timer in client shutdown + +### Updating `_lastIn` + +Set `_lastIn = Environment.TickCount64` at the top of `ProcessCommandsAsync` whenever a complete command is parsed from the pipe. + +### Defaults + +- `PingInterval = 2 minutes` +- `MaxPingsOut = 2` +- Stale detection: non-responding client disconnected after ~6 minutes (3 intervals) + +## Testing Strategy + +### -ERR Infrastructure +- Unit test: `SendErrAsync` writes correct wire format +- Unit test: `SendErrAndCloseAsync` sends error then disconnects + +### MaxConnections +- Integration test: connect N clients up to max, verify N+1 receives `-ERR` and is disconnected +- Verify existing clients are unaffected + +### Subject Validation +- Test with pedantic client: PUB to `foo.*` gets `-ERR`, connection stays open +- Test with non-pedantic client: PUB to `foo.*` is accepted (no validation) +- Test max payload: PUB exceeding limit gets `-ERR` and connection closes + +### PING Keepalive +- Test with short intervals: verify server sends PING after inactivity +- Test PONG response resets the counter +- Test stale detection: non-responding client is disconnected +- Test active client: recent data suppresses PING + +## Files Modified + +| File | Changes | +|---|---| +| `NatsProtocol.cs` | Error message constants | +| `NatsClient.cs` | `SendErrAsync`, `SendErrAndCloseAsync`, ping timer, `_lastIn` tracking, pedantic validation, max payload check | +| `NatsServer.cs` | MaxConnections check in accept loop | +| `ClientTests.cs` | -ERR format tests, max payload tests | +| `ServerTests.cs` | MaxConnections test, PING keepalive tests | +| `IntegrationTests.cs` | End-to-end stale connection test |