Files
natsdotnet/docs/plans/2026-02-22-harden-base-server-design.md
Joseph Doherty b8acca19dd docs: add design doc for hardening base server
Covers -ERR infrastructure, MaxConnections enforcement,
pedantic subject validation on PUB, and server-side PING keepalive.
2026-02-22 21:25:10 -05:00

5.5 KiB

Harden Base Server — Design Document

Date: 2026-02-22 Status: Approved Scope: Small — fills gaps in the base single-node pub/sub server

Overview

The base NATS server port handles PUB/SUB, wildcards, queue groups, and integration with real NATS clients. However, it lacks several hardening features present in the Go reference implementation. This design covers four areas:

  1. -ERR response infrastructure
  2. MaxConnections enforcement
  3. Subject validation on PUB (pedantic mode) + max payload validation
  4. Server-side PING keepalive with stale connection detection

1. -ERR Response Infrastructure

Go reference: client.go:93,2608-2617

Wire Format

All errors use: -ERR '{message}'\r\n (single quotes around the message).

Constants (NatsProtocol.cs)

Add standard error message constants matching the Go server:

Constant Wire message Connection closed?
MaxConnectionsExceeded maximum connections exceeded Yes
StaleConnection Stale Connection Yes
MaxPayloadViolation Maximum Payload Violation Yes
InvalidPublishSubject Invalid Publish Subject No
InvalidSubject Invalid Subject No

Client Methods (NatsClient.cs)

  • SendErrAsync(string message) — writes -ERR '{message}'\r\n using the existing write-lock pattern. Connection stays open.
  • SendErrAndCloseAsync(string message) — sends the -ERR then triggers client shutdown via cancellation token.

2. MaxConnections Enforcement

Go reference: server.go:3378-3384, client.go:2428-2431

Design

In NatsServer.StartAsync, after AcceptAsync returns a new TCP connection:

  1. Check _clients.Count >= _options.MaxConnections
  2. If exceeded: write -ERR 'maximum connections exceeded'\r\n directly on the raw NetworkStream, close the TcpClient. No NatsClient is created.
  3. Log at Warning level: "Client connection rejected: maximum connections ({MaxConnections}) exceeded"

The check uses ConcurrentDictionary.Count which is safe for this purpose — a slight race is acceptable since Go has the same pattern (check under lock, but the lock is released before the reject write).

Default MaxConnections remains 65536.

3. Subject Validation on PUB

Go reference: client.go:2869-2871

Pedantic Mode Subject Validation

In ProcessPub and ProcessHPub, after parsing the command:

  1. If ClientOptions.Pedantic == true, call SubjectMatch.IsValidPublishSubject(subject)
  2. If invalid: SendErrAsync("Invalid Publish Subject") — connection stays open, message is dropped (not routed)
  3. Log at Debug level

This matches Go which only validates publish subjects in pedantic mode.

Max Payload Validation (Always)

In ProcessPub and ProcessHPub:

  1. If payload size > _options.MaxPayload, call SendErrAndCloseAsync("Maximum Payload Violation")
  2. This is a hard close matching Go behavior

4. Server-Side PING Keepalive

Go reference: client.go:5537-5654,2577-2584,2680-2682

New Client State

  • _pingsOut: int — count of unanswered server-initiated PINGs (via Interlocked)
  • _lastIn: longEnvironment.TickCount64 timestamp of last inbound data

Timer Implementation

Use PeriodicTimer in a dedicated RunPingTimerAsync method, launched alongside FillPipeAsync and ProcessCommandsAsync in RunAsync. This avoids fire-and-forget async from sync timer callbacks.

Lifecycle

  1. Start: After CONNECT handshake completes, launch RunPingTimerAsync with period = PingInterval
  2. Each tick:
    • If TickCount64 - _lastIn < PingInterval.TotalMilliseconds: client was recently active, reset _pingsOut = 0, skip
    • Else if _pingsOut + 1 > MaxPingsOut: send -ERR 'Stale Connection' and close
    • Else: increment _pingsOut, send PING\r\n
  3. PONG received: In DispatchCommandAsync for CommandType.Pong, set _pingsOut = 0
  4. Cleanup: Cancel and dispose the timer in client shutdown

Updating _lastIn

Set _lastIn = Environment.TickCount64 at the top of ProcessCommandsAsync whenever a complete command is parsed from the pipe.

Defaults

  • PingInterval = 2 minutes
  • MaxPingsOut = 2
  • Stale detection: non-responding client disconnected after ~6 minutes (3 intervals)

Testing Strategy

-ERR Infrastructure

  • Unit test: SendErrAsync writes correct wire format
  • Unit test: SendErrAndCloseAsync sends error then disconnects

MaxConnections

  • Integration test: connect N clients up to max, verify N+1 receives -ERR and is disconnected
  • Verify existing clients are unaffected

Subject Validation

  • Test with pedantic client: PUB to foo.* gets -ERR, connection stays open
  • Test with non-pedantic client: PUB to foo.* is accepted (no validation)
  • Test max payload: PUB exceeding limit gets -ERR and connection closes

PING Keepalive

  • Test with short intervals: verify server sends PING after inactivity
  • Test PONG response resets the counter
  • Test stale detection: non-responding client is disconnected
  • Test active client: recent data suppresses PING

Files Modified

File Changes
NatsProtocol.cs Error message constants
NatsClient.cs SendErrAsync, SendErrAndCloseAsync, ping timer, _lastIn tracking, pedantic validation, max payload check
NatsServer.cs MaxConnections check in accept loop
ClientTests.cs -ERR format tests, max payload tests
ServerTests.cs MaxConnections test, PING keepalive tests
IntegrationTests.cs End-to-end stale connection test