# Monitoring HTTP & TLS Support Design

**Date:** 2026-02-22
**Scope:** Port monitoring endpoints (`/varz`, `/connz`) and full TLS support from Go NATS server
**Go Reference:** `golang/nats-server/server/monitor.go`, `server.go` (TLS), `client.go` (TLS), `opts.go`

## Overview

Two features ported from Go NATS:

1. **Monitoring HTTP**: Kestrel Minimal API embedded in `NatsServer`, serving `/varz`, `/connz`, `/healthz` and stub endpoints. Uses the exact Go JSON schema for tooling compatibility.
2. **TLS Support**: `SslStream` wrapping with four modes: no TLS, TLS required, TLS-first, and mixed TLS/plaintext. Includes certificate pinning, client cert verification, and rate limiting.

## 1. Server-Level Stats Aggregation

A new `ServerStats` class holds server-wide atomic counters, so a `/varz` request no longer has to sum across all clients.

### ServerStats Fields

```csharp
// src/NATS.Server/ServerStats.cs
using System.Collections.Concurrent;

public sealed class ServerStats
{
    // Plain fields (not properties) so they can be passed by ref to Interlocked.
    public long InMsgs;
    public long OutMsgs;
    public long InBytes;
    public long OutBytes;
    public long TotalConnections;
    public long SlowConsumers;
    public long StaleConnections;
    public long Stalls;
    public long SlowConsumerClients;
    public long SlowConsumerRoutes;
    public long SlowConsumerLeafs;
    public long SlowConsumerGateways;

    // Request count per monitoring path; type arguments assumed (path -> count).
    public readonly ConcurrentDictionary<string, long> HttpReqStats = new();
}
```

### Integration Points

- `NatsServer` owns a `ServerStats` instance and passes it to each `NatsClient`
- `NatsClient.ProcessPub` increments server-level `InMsgs`/`InBytes` alongside client-level counters
- `NatsClient.SendMessageAsync` increments server-level `OutMsgs`/`OutBytes`
- Accept loop increments `TotalConnections`
- `NatsServer.StartTime` field added (set once at startup)

## 2. Monitoring HTTP Endpoints

### HTTP Stack

Kestrel Minimal APIs via `FrameworkReference` to `Microsoft.AspNetCore.App`. No NuGet packages needed.

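
The hosting approach described above could be sketched roughly as follows. This is a minimal illustration under assumptions, not the actual `MonitorServer`: the names `MonitorHostSketch`, `Build`, and `NormalizeBasePath` are hypothetical, and only the endpoint paths and the host/port/base-path options come from this design.

```csharp
using Microsoft.AspNetCore.Builder;
using Microsoft.AspNetCore.Hosting;
using Microsoft.AspNetCore.Http;

// Hypothetical helper for illustration; the real MonitorServer may be structured differently.
public static class MonitorHostSketch
{
    // Turns null, "", "nats" or "/nats/" into "" or "/nats" so routes can be concatenated.
    public static string NormalizeBasePath(string? basePath)
    {
        var trimmed = (basePath ?? string.Empty).Trim('/');
        return trimmed.Length == 0 ? string.Empty : "/" + trimmed;
    }

    public static WebApplication Build(string host, int port, string? basePath)
    {
        var builder = WebApplication.CreateBuilder();
        builder.WebHost.UseUrls($"http://{host}:{port}");

        var app = builder.Build();
        var prefix = NormalizeBasePath(basePath);

        // Health check: a bare 200 OK.
        app.MapGet(prefix + "/healthz", () => Results.Ok());

        // A subsystem that is not ported yet returns an empty JSON object.
        app.MapGet(prefix + "/routez", () => Results.Json(new { }));

        return app;
    }
}
```

Under these assumptions, `await MonitorHostSketch.Build("0.0.0.0", 8222, null).StartAsync()` would start the listener. Port 8222 is only the conventional NATS monitoring port, not a value fixed by this design.
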
### Endpoints

| Path | Handler | Description |
|------|---------|-------------|
| `/` | `HandleRoot` | Links to all endpoints |
| `/varz` | `HandleVarz` | Server stats and config |
| `/connz` | `HandleConnz` | Connection info (paginated) |
| `/healthz` | `HandleHealthz` | Health check (200 OK) |
| `/routez` | stub | Returns `{}` |
| `/gatewayz` | stub | Returns `{}` |
| `/leafz` | stub | Returns `{}` |
| `/subz` | stub | Returns `{}` |
| `/accountz` | stub | Returns `{}` |
| `/jsz` | stub | Returns `{}` |

All paths support an optional base path prefix via the `MonitorBasePath` config.

### Configuration

```csharp
// Added to NatsOptions
public int MonitorPort { get; set; }                 // 0 = disabled, CLI: -m
public string MonitorHost { get; set; } = "0.0.0.0";
public string? MonitorBasePath { get; set; }
public int MonitorHttpsPort { get; set; }            // 0 = disabled
```

### Varz Model

Exact Go JSON field names. All fields from Go's `Varz` struct, including nested config structs (`ClusterOptsVarz`, `GatewayOptsVarz`, `LeafNodeOptsVarz`, `MqttOptsVarz`, `WebsocketOptsVarz`, `JetStreamVarz`). Nested structs return defaults/zeros until those subsystems are ported.

Key field categories: identification, network config, security/limits, timing/lifecycle, runtime metrics (mem, CPU, cores), connection stats, message stats, health counters, subsystem configs, HTTP request stats.

### Connz Model

Paginated connection list with query parameter support:

- `sort`: sort field (cid, bytes_to, msgs_to, etc.)
- `subs` / `subs=detail`: include subscription lists
- `offset` / `limit`: pagination (default limit 1024)
- `state`: filter open/closed/all
- `auth`: include usernames

`ConnInfo` includes all Go fields: cid, kind, ip, port, start, last_activity, rtt, uptime, idle, pending, msg/byte stats, subscription count, client name/lang/version, TLS version/cipher, account.

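
To make the `sort`, `offset`, and `limit` parameters concrete, here is a small sketch of how a handler might page over a snapshot of connections. `ConnRow` and `ConnzPagingSketch` are hypothetical names, `ConnRow` is a trimmed-down stand-in for `ConnInfo`, and the descending order for `bytes_to`/`msgs_to` is an assumption that should be checked against the Go reference.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical, trimmed-down stand-in for ConnInfo; the real model has many more fields.
public sealed record ConnRow(ulong Cid, long OutBytes, long OutMsgs);

public static class ConnzPagingSketch
{
    public const int DefaultLimit = 1024;

    // Returns the requested page plus the total number of connections before paging.
    public static (IReadOnlyList<ConnRow> Page, int Total) Apply(
        IReadOnlyCollection<ConnRow> snapshot, string? sort, int offset, int limit)
    {
        // Assumption: traffic-based sorts list the busiest connections first.
        IEnumerable<ConnRow> ordered = sort switch
        {
            "bytes_to" => snapshot.OrderByDescending(c => c.OutBytes),
            "msgs_to" => snapshot.OrderByDescending(c => c.OutMsgs),
            _ => snapshot.OrderBy(c => c.Cid),
        };

        // A missing or non-positive limit falls back to the default page size.
        if (limit <= 0) limit = DefaultLimit;

        var page = ordered.Skip(Math.Max(offset, 0)).Take(limit).ToList();
        return (page, snapshot.Count);
    }
}
```

With five connections, `Apply(rows, null, 0, 2)` yields a page of two rows and a total of five, which is the case the monitoring tests in this design exercise.
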
### Concurrency

- `HandleVarz` acquires a `SemaphoreSlim(1,1)` to serialize JSON building (matches Go's `varzMu`)
- `HandleConnz` snapshots `_clients.Values.ToArray()` to avoid holding the dictionary during serialization
- CPU percentage is sampled via the `Process.TotalProcessorTime` delta and cached for 1 second

### NatsClient Additions for ConnInfo

```csharp
public DateTime StartTime { get; }   // set in constructor
public DateTime LastActivity;        // updated on every command dispatch
public string? RemoteIp { get; }     // from socket.RemoteEndPoint
public int RemotePort { get; }       // from socket.RemoteEndPoint
```

## 3. TLS Support

### Configuration

```csharp
// Added to NatsOptions
public string? TlsCert { get; set; }
public string? TlsKey { get; set; }
public string? TlsCaCert { get; set; }
public bool TlsVerify { get; set; }
public bool TlsMap { get; set; }
public double TlsTimeout { get; set; } = 2.0;
public bool TlsHandshakeFirst { get; set; }
public TimeSpan TlsHandshakeFirstFallback { get; set; } = TimeSpan.FromMilliseconds(50);
public bool AllowNonTls { get; set; }
public long TlsRateLimit { get; set; }
public HashSet<string>? TlsPinnedCerts { get; set; }   // element type assumed: key fingerprints as strings
public SslProtocols TlsMinVersion { get; set; } = SslProtocols.Tls12;
```

CLI args: `--tls`, `--tlscert`, `--tlskey`, `--tlscacert`, `--tlsverify`

### INFO Message Changes

Three new fields on `ServerInfo`: `tls_required`, `tls_verify`, `tls_available`.

- `tls_required = (TlsConfig != null && !AllowNonTls)`
- `tls_verify = (TlsConfig != null && TlsVerify)`
- `tls_available = (TlsConfig != null && AllowNonTls)`

### Four TLS Modes

**Mode 1: No TLS.** Current behavior, unchanged.

**Mode 2: TLS Required.** Send INFO with `tls_required=true`; the client initiates TLS, the server detects the 0x16 byte, performs the `SslStream` handshake, validates pinned certs, and continues the protocol over the encrypted stream.

**Mode 3: TLS First.** Do NOT send INFO; wait up to `TlsHandshakeFirstFallback` (default 50 ms) for data. If a 0x16 byte arrives: TLS handshake, then send INFO over the encrypted stream.
If the wait times out or a non-TLS byte arrives: fall back to the Mode 2 flow.

**Mode 4: Mixed.** Send INFO with `tls_available=true`, then peek the first byte. 0x16 → TLS handshake. Other → continue plaintext.

### Key Components

**`TlsHelper`**: static class for cert loading (`X509Certificate2` from PEM/PFX), CA cert loading, building `SslServerAuthenticationOptions`, and pinned cert validation (SHA256 of SubjectPublicKeyInfo).

**`TlsConnectionWrapper`**: per-connection negotiation state machine. Takes socket + options, returns `(Stream stream, bool infoAlreadySent)`. Handles peek logic, timeout, handshake, and cert validation.

**`PeekableStream`**: wraps `NetworkStream`, buffers peeked bytes, and replays them on the first `ReadAsync`. Required so `SslStream.AuthenticateAsServerAsync` sees the full TLS ClientHello, including the peeked byte.

**`TlsRateLimiter`**: token-bucket rate limiter. Refills `TlsRateLimit` tokens per second. `WaitAsync` blocks if no tokens are available. Only applies to TLS handshakes, not plain connections.

**`TlsConnectionState`**: post-handshake record: `TlsVersion`, `CipherSuite`, `PeerCert`. Stored on `NatsClient` for `/connz` reporting.

### NatsClient Changes

The constructor takes a `Stream` instead of building a `NetworkStream` internally. TLS negotiation happens before `NatsClient` is constructed. `NatsClient` receives the already-negotiated stream and `TlsConnectionState`.

### Accept Loop Changes

```
Accept socket
  → Increment TotalConnections
  → Rate limit check (if TLS configured)
  → TlsConnectionWrapper.NegotiateAsync (returns stream + infoAlreadySent)
  → Extract TlsConnectionState from SslStream if applicable
  → Construct NatsClient with stream + tlsState
  → client.InfoAlreadySent flag set if TLS-first sent INFO during negotiation
  → RunClientAsync
```

## 4. File Layout

```
src/NATS.Server/
  ServerStats.cs
  Monitoring/
    MonitorServer.cs          # Kestrel host, route registration
    Varz.cs                   # Varz + nested config structs
    Connz.cs                  # Connz, ConnInfo, ConnzOptions, SubDetail
    VarzHandler.cs            # Snapshot logic, CPU/mem sampling
    ConnzHandler.cs           # Query param parsing, sort, pagination
  Tls/
    TlsHelper.cs              # Cert loading, auth options builder
    TlsConnectionWrapper.cs   # Per-connection TLS negotiation
    TlsConnectionState.cs     # Post-handshake state record
    TlsRateLimiter.cs         # Token-bucket rate limiter
    PeekableStream.cs         # Buffered-peek stream wrapper
```

### Package Dependencies

- `FrameworkReference` to `Microsoft.AspNetCore.App` in `NATS.Server.csproj` (for Kestrel)
- No new NuGet packages: `SslStream`, `X509Certificate2`, and `SslServerAuthenticationOptions` are all in `System.Net.Security`
- Tests use `HttpClient` (built-in) and `CertificateRequest` (built-in) for self-signed test certs

## 5. Testing Strategy

### Monitoring Tests (`MonitorTests.cs`)

- `/varz` returns correct server identity, config limits, zero stats on fresh server
- After pub/sub traffic: message/byte counters are accurate
- `/connz` pagination: `?limit=2&offset=0` with 5 clients returns 2, total=5
- `/connz?sort=bytes_to` ordering
- `/connz?subs=true` includes subscription subjects
- `/healthz` returns 200
- HTTP request stats tracked in `/varz` response

### TLS Tests (`TlsTests.cs`)

Self-signed certs generated in-memory via `CertificateRequest` + `RSA.Create()`.

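
As a rough illustration of that in-memory certificate generation, a test helper might look like the sketch below. `TestCertSketch` and `CreateSelfSigned` are hypothetical names, and the 2048-bit key size and one-day validity window are assumptions chosen for test use only.

```csharp
using System;
using System.Security.Cryptography;
using System.Security.Cryptography.X509Certificates;

// Hypothetical test helper; the actual TlsTests.cs may build its certs differently.
public static class TestCertSketch
{
    public static X509Certificate2 CreateSelfSigned(string dnsName)
    {
        using var rsa = RSA.Create(2048);
        var request = new CertificateRequest(
            $"CN={dnsName}", rsa, HashAlgorithmName.SHA256, RSASignaturePadding.Pkcs1);

        // Add a subject alternative name so host-name validation can match dnsName.
        var san = new SubjectAlternativeNameBuilder();
        san.AddDnsName(dnsName);
        request.CertificateExtensions.Add(san.Build());

        // Backdate slightly to tolerate clock skew; short lifetime since it is test-only.
        var now = DateTimeOffset.UtcNow;
        return request.CreateSelfSigned(now.AddMinutes(-5), now.AddDays(1));
    }
}
```

A test could then pass `TestCertSketch.CreateSelfSigned("localhost")` to the server options and trust the same certificate on the client side.
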
- Basic TLS: server cert, client connects with SslStream, pub/sub works
- TLS Required: plaintext client rejected
- TLS Verify: valid client cert succeeds, wrong cert fails
- Mixed mode: TLS and plaintext clients coexist
- TLS First: immediate TLS handshake without reading INFO first
- TLS First fallback: slow client gets INFO sent, normal negotiation
- Certificate pinning: matching cert accepted, non-matching rejected
- Rate limiting: rapid connections throttled
- TLS timeout: incomplete handshake closed after configured timeout
- Integration: NATS.Client.Core NuGet client works over TLS
- Monitoring: `/connz` shows `tls_version` and `tls_cipher_suite`

## 6. Error Handling

- **TLS handshake failures** are non-fatal: log warning, close socket, increment counter
- **Mixed mode byte detection**: 0x16 → TLS, printable ASCII → plain, connection close → clean disconnect
- **Rate limiter**: holds the TCP connection open until a token is available (not rejected)
- **Monitoring concurrency**: `varzMu` semaphore serializes `/varz`, client snapshot for `/connz`
- **CPU sampling**: cached for 1 second to avoid overhead on rapid polls
- **Graceful shutdown**: `MonitorServer.DisposeAsync()` stops Kestrel, rate limiter disposes its timer, in-flight handshakes are cancelled via `CancellationToken`
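
The mixed-mode byte detection listed above can be sketched as a small pure function. `FirstByteKind` and `MixedModeSketch` are hypothetical names; this version treats every byte other than 0x16 as plaintext, following the Mode 4 description, and leaves any stricter printable-ASCII check to the caller.

```csharp
// Hypothetical names for illustration; the real logic lives in TlsConnectionWrapper.
public enum FirstByteKind { Tls, Plaintext, Closed }

public static class MixedModeSketch
{
    // 0x16 is the TLS record content type for "handshake", the first byte of a ClientHello.
    public const byte TlsHandshakeRecord = 0x16;

    // bytesRead is the result of the peek read; 0 means the peer closed the connection.
    public static FirstByteKind Classify(int bytesRead, byte first)
    {
        if (bytesRead == 0) return FirstByteKind.Closed;
        return first == TlsHandshakeRecord ? FirstByteKind.Tls : FirstByteKind.Plaintext;
    }
}
```

Keeping the decision separate from the socket I/O makes it easy to unit test without opening a connection.
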