natsdotnet/docs/plans/2026-02-22-monitoring-tls-design.md
Joseph Doherty 8ee5a7f97b docs: add design for monitoring HTTP and TLS support
Covers /varz, /connz endpoints via Kestrel Minimal APIs,
full TLS support with four modes (none/required/first/mixed),
cert pinning, rate limiting, and testing strategy.
2026-02-22 21:33:24 -05:00

Monitoring HTTP & TLS Support Design

Date: 2026-02-22
Scope: Port monitoring endpoints (/varz, /connz) and full TLS support from Go NATS server
Go Reference: golang/nats-server/server/monitor.go, server.go (TLS), client.go (TLS), opts.go

Overview

Two features ported from Go NATS:

  1. Monitoring HTTP: Kestrel Minimal API embedded in NatsServer, serving /varz, /connz, /healthz and stub endpoints. JSON output follows the Go schema exactly for tooling compatibility.
  2. TLS Support: SslStream wrapping in four modes (no TLS, TLS required, TLS-first, mixed TLS/plaintext), plus certificate pinning, client cert verification, and handshake rate limiting.

1. Server-Level Stats Aggregation

New ServerStats class whose counters are updated atomically (via Interlocked), so /varz reads server-wide totals directly instead of summing across all clients on each request.

ServerStats Fields

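// src/NATS.Server/ServerStats.cs
using System.Collections.Concurrent;

// Server-wide counters. Fields (not properties) so callers can pass them
// by ref to Interlocked.Increment / Interlocked.Add.
public sealed class ServerStats
{
    public long InMsgs;
    public long OutMsgs;
    public long InBytes;
    public long OutBytes;
    public long TotalConnections;
    public long SlowConsumers;
    public long StaleConnections;
    public long Stalls;
    public long SlowConsumerClients;
    public long SlowConsumerRoutes;
    public long SlowConsumerLeafs;
    public long SlowConsumerGateways;
    // HTTP request counts for the monitoring endpoints, keyed by endpoint path.
    public readonly ConcurrentDictionary<string, long> HttpReqStats = new();
}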

Integration Points

  • NatsServer owns a ServerStats instance, passes it to each NatsClient
  • NatsClient.ProcessPub increments server-level InMsgs/InBytes alongside client-level counters
  • NatsClient.SendMessageAsync increments server-level OutMsgs/OutBytes
  • Accept loop increments TotalConnections
  • NatsServer.StartTime field added (set once at startup)
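
The counter updates described above can be sketched as follows. This is a minimal illustration, not the actual NatsClient code: ServerStatsSketch is a trimmed stand-in for ServerStats, and RecordInbound is a hypothetical helper standing in for whatever ProcessPub ends up calling.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Trimmed stand-in for ServerStats with only the fields this sketch touches.
public sealed class ServerStatsSketch
{
    public long InMsgs;
    public long InBytes;
}

public static class StatsDemo
{
    // Hypothetical helper: what a publish path might call once per message.
    public static void RecordInbound(ServerStatsSketch stats, int payloadBytes)
    {
        Interlocked.Increment(ref stats.InMsgs);
        Interlocked.Add(ref stats.InBytes, payloadBytes);
    }

    public static (long Msgs, long Bytes) Run()
    {
        var stats = new ServerStatsSketch();
        // Simulate 4 clients each publishing 1000 messages of 16 bytes concurrently.
        Parallel.For(0, 4, _ =>
        {
            for (var i = 0; i < 1000; i++) RecordInbound(stats, 16);
        });
        // Interlocked.Read gives an untorn 64-bit read, even on 32-bit runtimes.
        return (Interlocked.Read(ref stats.InMsgs), Interlocked.Read(ref stats.InBytes));
    }
}
```

Because every writer goes through Interlocked, a /varz handler can read the totals without taking a lock shared with the hot publish path.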

2. Monitoring HTTP Endpoints

HTTP Stack

Kestrel Minimal APIs via FrameworkReference to Microsoft.AspNetCore.App. No NuGet packages needed.

Endpoints

| Path | Handler | Description |
|------|---------|-------------|
| / | HandleRoot | Links to all endpoints |
| /varz | HandleVarz | Server stats and config |
| /connz | HandleConnz | Connection info (paginated) |
| /healthz | HandleHealthz | Health check (200 OK) |
| /routez | stub | Returns {} |
| /gatewayz | stub | Returns {} |
| /leafz | stub | Returns {} |
| /subz | stub | Returns {} |
| /accountz | stub | Returns {} |
| /jsz | stub | Returns {} |

All paths support optional base path prefix via MonitorBasePath config.
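
As a rough sketch of how the route table and base path prefix might be wired up with Minimal APIs: MonitorRoutesSketch and its Build method are hypothetical names, the real handlers are replaced by placeholders, and MapGroup assumes .NET 7 or later. The sketch expects a non-empty base path such as "/nats".

```csharp
using System;
using Microsoft.AspNetCore.Builder;
using Microsoft.AspNetCore.Http;

public static class MonitorRoutesSketch
{
    // Builds (but does not start) a Kestrel host serving the health check and stubs.
    // basePath is assumed non-empty here, e.g. "/nats".
    public static WebApplication Build(string host, int port, string basePath)
    {
        var builder = WebApplication.CreateBuilder();
        var app = builder.Build();
        app.Urls.Add($"http://{host}:{port}");

        // Every route registered on the group is served under the base path.
        var group = app.MapGroup(basePath);
        group.MapGet("/healthz", () => Results.Ok());

        // Stub endpoints return an empty JSON object until the subsystems are ported.
        foreach (var path in new[] { "/routez", "/gatewayz", "/leafz", "/subz", "/accountz", "/jsz" })
            group.MapGet(path, () => Results.Json(new { }));

        return app;
    }
}
```

A caller would start it with something like `await MonitorRoutesSketch.Build("0.0.0.0", 8222, "/nats").RunAsync();`, after which `/nats/healthz` answers 200 and `/nats/routez` returns `{}`.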

Configuration

// Added to NatsOptions
public int MonitorPort { get; set; }             // 0 = disabled, CLI: -m
public string MonitorHost { get; set; } = "0.0.0.0";
public string? MonitorBasePath { get; set; }
public int MonitorHttpsPort { get; set; }        // 0 = disabled

Varz Model

Exact Go JSON field names. All fields from Go's Varz struct including nested config structs (ClusterOptsVarz, GatewayOptsVarz, LeafNodeOptsVarz, MqttOptsVarz, WebsocketOptsVarz, JetStreamVarz). Nested structs return defaults/zeros until those subsystems are ported.

Key field categories: identification, network config, security/limits, timing/lifecycle, runtime metrics (mem, CPU, cores), connection stats, message stats, health counters, subsystem configs, HTTP request stats.

Connz Model

Paginated connection list with query parameter support:

  • sort — sort field (cid, bytes_to, msgs_to, etc.)
  • subs / subs=detail — include subscription lists
  • offset / limit — pagination (default limit 1024)
  • state — filter open/closed/all
  • auth — include usernames

ConnInfo includes all Go fields: cid, kind, ip, port, start, last_activity, rtt, uptime, idle, pending, msg/byte stats, subscription count, client name/lang/version, TLS version/cipher, account.
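
The sort, offset, and limit handling above could look roughly like this. ConnInfoSketch is a hypothetical three-field stand-in for the full ConnInfo, and sorting the non-cid keys in descending order is an assumption of this sketch rather than something the design states.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical, trimmed-down ConnInfo; the real model carries many more fields.
public sealed record ConnInfoSketch(ulong Cid, long OutBytes, long OutMsgs);

public static class ConnzPaging
{
    public const int DefaultLimit = 1024;

    // Returns one page plus the total number of connections before paging.
    public static (IReadOnlyList<ConnInfoSketch> Page, int Total) Query(
        IEnumerable<ConnInfoSketch> conns, string? sort, int offset, int limit)
    {
        var all = conns.ToList();           // snapshot so sorting never sees a moving target
        if (limit <= 0) limit = DefaultLimit;
        if (offset < 0) offset = 0;

        IEnumerable<ConnInfoSketch> ordered = sort switch
        {
            "bytes_to" => all.OrderByDescending(c => c.OutBytes), // assumed: largest first
            "msgs_to" => all.OrderByDescending(c => c.OutMsgs),   // assumed: largest first
            _ => all.OrderBy(c => c.Cid),                          // default: by connection id
        };

        return (ordered.Skip(offset).Take(limit).ToList(), all.Count);
    }
}
```

With five connections, `Query(conns, null, 0, 2)` yields a two-element page and a total of 5, which is the shape the pagination test in the testing strategy checks for.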

Concurrency

  • HandleVarz acquires a SemaphoreSlim(1,1) to serialize JSON building (matches Go's varzMu)
  • HandleConnz snapshots _clients.Values.ToArray() to avoid holding the dictionary during serialization
  • CPU percentage sampled via Process.TotalProcessorTime delta, cached for 1 second
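
A minimal sketch of the cached CPU sampling, assuming a hypothetical CpuSamplerSketch class. The percentage here is relative to a single core (100 means one core fully busy); whether the real implementation normalizes by core count is not specified above.

```csharp
using System;
using System.Diagnostics;

public sealed class CpuSamplerSketch
{
    private readonly object _gate = new();
    private readonly Process _process = Process.GetCurrentProcess();
    private TimeSpan _lastCpu;
    private long _lastTimestamp;
    private double _cached;   // 0 until the first full sampling interval has elapsed

    public CpuSamplerSketch()
    {
        _lastCpu = _process.TotalProcessorTime;
        _lastTimestamp = Stopwatch.GetTimestamp();
    }

    public double GetCpuPercent()
    {
        lock (_gate)
        {
            long now = Stopwatch.GetTimestamp();
            var elapsed = TimeSpan.FromSeconds((now - _lastTimestamp) / (double)Stopwatch.Frequency);

            // Serve the cached value until at least one second has passed.
            if (elapsed < TimeSpan.FromSeconds(1)) return _cached;

            _process.Refresh();   // drop the Process object's cached counters
            TimeSpan cpu = _process.TotalProcessorTime;
            _cached = (cpu - _lastCpu).TotalMilliseconds / elapsed.TotalMilliseconds * 100.0;
            _lastCpu = cpu;
            _lastTimestamp = now;
            return _cached;
        }
    }
}
```

Rapid /varz polls inside the same second all see the same value, so the process counters are queried at most once per second.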

NatsClient Additions for ConnInfo

public DateTime StartTime { get; }       // set in constructor
public DateTime LastActivity;            // updated on every command dispatch
public string? RemoteIp { get; }         // from socket.RemoteEndPoint
public int RemotePort { get; }           // from socket.RemoteEndPoint

3. TLS Support

Configuration

// Added to NatsOptions
public string? TlsCert { get; set; }
public string? TlsKey { get; set; }
public string? TlsCaCert { get; set; }
public bool TlsVerify { get; set; }
public bool TlsMap { get; set; }
public double TlsTimeout { get; set; } = 2.0;    // seconds
public bool TlsHandshakeFirst { get; set; }
public TimeSpan TlsHandshakeFirstFallback { get; set; } = TimeSpan.FromMilliseconds(50);
public bool AllowNonTls { get; set; }
public long TlsRateLimit { get; set; }           // TLS handshakes per second
public HashSet<string>? TlsPinnedCerts { get; set; }
public SslProtocols TlsMinVersion { get; set; } = SslProtocols.Tls12;

CLI args: --tls, --tlscert, --tlskey, --tlscacert, --tlsverify

INFO Message Changes

Three new fields on ServerInfo: tls_required, tls_verify, tls_available.

  • tls_required = (TlsConfig != null && !AllowNonTls)
  • tls_verify = (TlsConfig != null && TlsVerify)
  • tls_available = (TlsConfig != null && AllowNonTls)
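
The three rules above translate directly into code. In this sketch, TlsInfoFlags and TlsInfo.Compute are hypothetical names, and `tlsConfigured` stands for `TlsConfig != null`.

```csharp
using System;

// Hypothetical container for the three new INFO fields.
public readonly record struct TlsInfoFlags(bool TlsRequired, bool TlsVerify, bool TlsAvailable);

public static class TlsInfo
{
    // tlsConfigured corresponds to "TlsConfig != null" in the rules above.
    public static TlsInfoFlags Compute(bool tlsConfigured, bool allowNonTls, bool tlsVerify) =>
        new(
            TlsRequired: tlsConfigured && !allowNonTls,
            TlsVerify: tlsConfigured && tlsVerify,
            TlsAvailable: tlsConfigured && allowNonTls);
}
```

Note that tls_required and tls_available are mutually exclusive by construction: with TLS configured, exactly one of them is true.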

Four TLS Modes

Mode 1, No TLS: current behavior, unchanged.

Mode 2, TLS Required: send INFO with tls_required=true; the client then initiates TLS. The server detects the 0x16 byte (TLS handshake record), performs the SslStream handshake, validates pinned certs, and continues the protocol over the encrypted stream.

Mode 3, TLS First: do NOT send INFO; wait up to TlsHandshakeFirstFallback (default 50 ms) for data. If a 0x16 byte arrives, perform the TLS handshake and then send INFO over the encrypted stream. On timeout or a non-TLS byte, fall back to the Mode 2 flow.

Mode 4, Mixed: send INFO with tls_available=true, then peek the first byte. 0x16 → TLS handshake; any other byte → continue in plaintext.
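
The per-mode decision after peeking the first byte can be summarized as a small pure function. TlsMode, NextStep, and TlsModeSketch are hypothetical names for this sketch; a null `firstByte` models a peek that timed out, and rejecting plaintext in TLS Required mode follows the "plaintext client rejected" test case rather than an explicit rule in this section.

```csharp
using System;

public enum TlsMode { None, Required, First, Mixed }

public enum NextStep
{
    Plaintext,            // keep speaking the NATS protocol unencrypted
    TlsHandshake,         // INFO already sent; run the SslStream handshake now
    TlsHandshakeThenInfo, // TLS-first: handshake, then send INFO encrypted
    FallBackToRequired,   // TLS-first timed out or saw non-TLS data: use the Mode 2 flow
    Reject,               // plaintext client on a TLS-required listener
}

public static class TlsModeSketch
{
    // First byte of a TLS record carrying a handshake message (the ClientHello).
    public const byte TlsHandshakeRecord = 0x16;

    // firstByte is null when no data arrived before the peek timed out.
    public static NextStep Decide(TlsMode mode, byte? firstByte)
    {
        bool looksLikeTls = firstByte == TlsHandshakeRecord;
        return mode switch
        {
            TlsMode.None => NextStep.Plaintext,
            TlsMode.Required => looksLikeTls ? NextStep.TlsHandshake : NextStep.Reject,
            TlsMode.First => looksLikeTls ? NextStep.TlsHandshakeThenInfo : NextStep.FallBackToRequired,
            TlsMode.Mixed => looksLikeTls ? NextStep.TlsHandshake : NextStep.Plaintext,
            _ => throw new ArgumentOutOfRangeException(nameof(mode)),
        };
    }
}
```

Keeping the decision separate from the socket I/O makes each mode testable without opening a connection.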

Key Components

TlsHelper — static class for cert loading (X509Certificate2 from PEM/PFX), CA cert loading, building SslServerAuthenticationOptions, pinned cert validation (SHA256 of SubjectPublicKeyInfo).
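
A sketch of the pinned cert check, assuming .NET 6 or later for PublicKey.ExportSubjectPublicKeyInfo. PinnedCertSketch is a hypothetical name, and storing pins as lowercase hex strings is an assumption of this sketch.

```csharp
using System;
using System.Collections.Generic;
using System.Security.Cryptography;
using System.Security.Cryptography.X509Certificates;

public static class PinnedCertSketch
{
    // SHA-256 over the DER-encoded SubjectPublicKeyInfo, as lowercase hex.
    public static string SpkiFingerprint(X509Certificate2 cert)
    {
        byte[] spki = cert.PublicKey.ExportSubjectPublicKeyInfo();
        return Convert.ToHexString(SHA256.HashData(spki)).ToLowerInvariant();
    }

    // Assumes the pinned set holds lowercase hex fingerprints.
    public static bool IsPinned(X509Certificate2 cert, ISet<string> pinned) =>
        pinned.Contains(SpkiFingerprint(cert));
}
```

Hashing the public key rather than the whole certificate means a pin survives certificate renewal as long as the key pair is reused.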

TlsConnectionWrapper — per-connection negotiation state machine. Takes socket + options, returns (Stream stream, bool infoAlreadySent). Handles peek logic, timeout, handshake, cert validation.

PeekableStream — wraps NetworkStream, buffers peeked bytes, replays them on first ReadAsync. Required so SslStream.AuthenticateAsServerAsync sees the full TLS ClientHello including the peeked byte.
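
One possible shape for such a wrapper is sketched below. PeekableStreamSketch is a hypothetical name, and the sketch supports a single peek before the first read, which is all the mode detection needs.

```csharp
using System;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

public sealed class PeekableStreamSketch : Stream
{
    private readonly Stream _inner;
    private byte[] _peeked = Array.Empty<byte>();
    private int _peekedOffset;

    public PeekableStreamSketch(Stream inner) => _inner = inner;

    // Reads up to `count` bytes and remembers them so later reads replay them first.
    public async Task<byte[]> PeekAsync(int count, CancellationToken ct = default)
    {
        var buffer = new byte[count];
        int n = await _inner.ReadAsync(buffer.AsMemory(0, count), ct);
        _peeked = buffer[..n];
        _peekedOffset = 0;
        return (byte[])_peeked.Clone();
    }

    public override async ValueTask<int> ReadAsync(Memory<byte> destination, CancellationToken ct = default)
    {
        int remaining = _peeked.Length - _peekedOffset;
        if (remaining > 0)
        {
            // Replay buffered bytes before touching the underlying stream.
            int n = Math.Min(remaining, destination.Length);
            _peeked.AsMemory(_peekedOffset, n).CopyTo(destination);
            _peekedOffset += n;
            return n;
        }
        return await _inner.ReadAsync(destination, ct);
    }

    public override Task<int> ReadAsync(byte[] buffer, int offset, int count, CancellationToken ct) =>
        ReadAsync(buffer.AsMemory(offset, count), ct).AsTask();

    public override int Read(byte[] buffer, int offset, int count)
    {
        int remaining = _peeked.Length - _peekedOffset;
        if (remaining == 0) return _inner.Read(buffer, offset, count);
        int n = Math.Min(remaining, count);
        Array.Copy(_peeked, _peekedOffset, buffer, offset, n);
        _peekedOffset += n;
        return n;
    }

    // Writes and flushes pass straight through.
    public override void Write(byte[] buffer, int offset, int count) => _inner.Write(buffer, offset, count);
    public override ValueTask WriteAsync(ReadOnlyMemory<byte> buffer, CancellationToken ct = default) => _inner.WriteAsync(buffer, ct);
    public override void Flush() => _inner.Flush();
    public override Task FlushAsync(CancellationToken ct) => _inner.FlushAsync(ct);

    public override bool CanRead => _inner.CanRead;
    public override bool CanWrite => _inner.CanWrite;
    public override bool CanSeek => false;
    public override long Length => throw new NotSupportedException();
    public override long Position { get => throw new NotSupportedException(); set => throw new NotSupportedException(); }
    public override long Seek(long offset, SeekOrigin origin) => throw new NotSupportedException();
    public override void SetLength(long value) => throw new NotSupportedException();

    protected override void Dispose(bool disposing)
    {
        if (disposing) _inner.Dispose();
        base.Dispose(disposing);
    }
}
```

After `PeekAsync(1)` returns the 0x16 byte, the same instance can be handed to `new SslStream(peekable)`, and the handshake reads the ClientHello from its first byte.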

TlsRateLimiter — token-bucket rate limiter. Refills TlsRateLimit tokens per second. WaitAsync blocks if no tokens. Only applies to TLS handshakes, not plain connections.
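
A minimal token-bucket sketch along those lines, using a SemaphoreSlim as the bucket and a Timer for the refill. TlsRateLimiterSketch is a hypothetical name, and it takes an int rate because SemaphoreSlim counts are ints, whereas the TlsRateLimit option is declared as long.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public sealed class TlsRateLimiterSketch : IDisposable
{
    private readonly int _perSecond;
    private readonly SemaphoreSlim _tokens;
    private readonly Timer _refill;

    public TlsRateLimiterSketch(int perSecond)
    {
        _perSecond = perSecond;
        _tokens = new SemaphoreSlim(perSecond, perSecond);   // start with a full bucket
        _refill = new Timer(_ => Refill(), null, TimeSpan.FromSeconds(1), TimeSpan.FromSeconds(1));
    }

    // Tops the bucket back up to its capacity once per second.
    private void Refill()
    {
        try
        {
            int missing = _perSecond - _tokens.CurrentCount;
            if (missing > 0) _tokens.Release(missing);
        }
        catch (SemaphoreFullException) { /* bucket already full */ }
        catch (ObjectDisposedException) { /* timer fired during shutdown */ }
    }

    // Completes immediately while tokens remain; otherwise waits for the next refill.
    public Task WaitAsync(CancellationToken ct = default) => _tokens.WaitAsync(ct);

    public void Dispose()
    {
        _refill.Dispose();
        _tokens.Dispose();
    }
}
```

The accept loop would await `WaitAsync` before starting each handshake, which holds excess connections open until a token is available instead of rejecting them.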

TlsConnectionState — post-handshake record: TlsVersion, CipherSuite, PeerCert. Stored on NatsClient for /connz reporting.

NatsClient Changes

Constructor takes Stream instead of building NetworkStream internally. TLS negotiation happens before NatsClient is constructed. NatsClient receives the already-negotiated stream and TlsConnectionState.

Accept Loop Changes

Accept socket
  → Increment TotalConnections
  → Rate limit check (if TLS configured)
  → TlsConnectionWrapper.NegotiateAsync (returns stream + infoAlreadySent)
  → Extract TlsConnectionState from SslStream if applicable
  → Construct NatsClient with stream + tlsState
  → client.InfoAlreadySent flag set if TLS-first sent INFO during negotiation
  → RunClientAsync

4. File Layout

src/NATS.Server/
  ServerStats.cs
  Monitoring/
    MonitorServer.cs          # Kestrel host, route registration
    Varz.cs                   # Varz + nested config structs
    Connz.cs                  # Connz, ConnInfo, ConnzOptions, SubDetail
    VarzHandler.cs            # Snapshot logic, CPU/mem sampling
    ConnzHandler.cs           # Query param parsing, sort, pagination
  Tls/
    TlsHelper.cs              # Cert loading, auth options builder
    TlsConnectionWrapper.cs   # Per-connection TLS negotiation
    TlsConnectionState.cs     # Post-handshake state record
    TlsRateLimiter.cs         # Token-bucket rate limiter
    PeekableStream.cs         # Buffered-peek stream wrapper

Package Dependencies

  • FrameworkReference to Microsoft.AspNetCore.App in NATS.Server.csproj (for Kestrel)
  • No new NuGet packages: SslStream and SslServerAuthenticationOptions live in System.Net.Security, X509Certificate2 in System.Security.Cryptography.X509Certificates; both namespaces ship with .NET
  • Tests use HttpClient (built-in) and CertificateRequest (built-in) for self-signed test certs

5. Testing Strategy

Monitoring Tests (MonitorTests.cs)

  • /varz returns correct server identity, config limits, zero stats on fresh server
  • After pub/sub traffic: message/byte counters are accurate
  • /connz pagination: ?limit=2&offset=0 with 5 clients returns 2, total=5
  • /connz?sort=bytes_to ordering
  • /connz?subs=true includes subscription subjects
  • /healthz returns 200
  • HTTP request stats tracked in /varz response

TLS Tests (TlsTests.cs)

Self-signed certs generated in-memory via CertificateRequest + RSA.Create().
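
For illustration, in-memory generation might look like the sketch below. TestCerts and CreateSelfSigned are hypothetical names, and the "localhost" SAN entry and one-hour validity window are assumptions chosen for local tests.

```csharp
using System;
using System.Security.Cryptography;
using System.Security.Cryptography.X509Certificates;

public static class TestCerts
{
    // Creates a short-lived self-signed certificate with its private key attached.
    public static X509Certificate2 CreateSelfSigned(string commonName)
    {
        using var rsa = RSA.Create(2048);
        var request = new CertificateRequest(
            $"CN={commonName}", rsa, HashAlgorithmName.SHA256, RSASignaturePadding.Pkcs1);

        // Subject alternative name so clients validating against "localhost" can match.
        var san = new SubjectAlternativeNameBuilder();
        san.AddDnsName("localhost");
        request.CertificateExtensions.Add(san.Build());

        var now = DateTimeOffset.UtcNow;
        // Backdate slightly to tolerate clock skew between test processes.
        return request.CreateSelfSigned(now.AddMinutes(-5), now.AddHours(1));
    }
}
```

Test clients connecting with SslStream still need a validation callback (or a custom trust store) that accepts this certificate, since it does not chain to a trusted root.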

  • Basic TLS: server cert, client connects with SslStream, pub/sub works
  • TLS Required: plaintext client rejected
  • TLS Verify: valid client cert succeeds, wrong cert fails
  • Mixed mode: TLS and plaintext clients coexist
  • TLS First: immediate TLS handshake without reading INFO first
  • TLS First fallback: slow client gets INFO sent, normal negotiation
  • Certificate pinning: matching cert accepted, non-matching rejected
  • Rate limiting: rapid connections throttled
  • TLS timeout: incomplete handshake closed after configured timeout
  • Integration: NATS.Client.Core NuGet client works over TLS
  • Monitoring: /connz shows tls_version and tls_cipher_suite

6. Error Handling

  • TLS handshake failures are non-fatal: log warning, close socket, increment counter
  • Mixed mode byte detection: 0x16 → TLS, printable ASCII → plain, connection close → clean disconnect
  • Rate limiter: holds TCP connection open until token available (not rejected)
  • Monitoring concurrency: varzMu semaphore serializes /varz, client snapshot for /connz
  • CPU sampling: cached 1 second to avoid overhead on rapid polls
  • Graceful shutdown: MonitorServer.DisposeAsync() stops Kestrel, rate limiter disposes timer, in-flight handshakes cancelled via CancellationToken