Sections 7-10 Gaps Design: Monitoring, TLS, Logging, Ping/Pong

Date: 2026-02-23
Scope: Implement remaining gaps in differences.md sections 7 (Monitoring), 8 (TLS), 9 (Logging), 10 (Ping/Pong)
Goal: Go parity for all features within scope


Section 7: Monitoring

7a. /subz Endpoint

Replace the empty stub with a full SubszHandler.

Models:

  • Subsz — response envelope: Id, Now, SublistStats, Total, Offset, Limit, Subs[]
  • SubszOptions: Offset, Limit, Subscriptions (bool for detail), Account (filter), Test (literal subject filter)
  • Reuse existing SubDetail from Connz

Algorithm:

  1. Iterate all accounts (or filter by Account param)
  2. Collect all subscriptions from each account's SubList
  3. If Test subject provided, filter using SubjectMatch.MatchLiteral() to only return subs that would receive that message
  4. Apply pagination (offset/limit)
  5. If Subscriptions is true, include SubDetail[] array
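
The steps above can be sketched roughly as follows. This is a minimal illustration, not the planned implementation: the trimmed-down SubDetail record, the Collect helper, and the default limit of 1024 are assumptions made for the example, and the local MatchLiteral only stands in for SubjectMatch.MatchLiteral(), whose real signature is not shown in this document.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the existing SubDetail model; only the shape matters here.
public sealed record SubDetail(string Account, string Subject, ulong Cid);

public static class SubszSketch
{
    // Assumed NATS wildcard semantics: '*' matches exactly one token,
    // '>' matches one or more trailing tokens. Stand-in for SubjectMatch.MatchLiteral().
    public static bool MatchLiteral(string literal, string pattern)
    {
        var lt = literal.Split('.');
        var pt = pattern.Split('.');
        for (var i = 0; i < pt.Length; i++)
        {
            if (pt[i] == ">") return i == pt.Length - 1 && lt.Length > i;
            if (i >= lt.Length) return false;
            if (pt[i] != "*" && pt[i] != lt[i]) return false;
        }
        return lt.Length == pt.Length;
    }

    // Steps 1-4: filter by account, filter by test subject, then paginate.
    // Total is the count before pagination, as in the Subsz envelope.
    public static (int Total, IReadOnlyList<SubDetail> Page) Collect(
        IEnumerable<SubDetail> allSubs, string? account, string? test, int offset, int limit)
    {
        var filtered = allSubs
            .Where(s => account is null || s.Account == account)
            .Where(s => test is null || MatchLiteral(test, s.Subject))
            .ToList();

        // Assumption: a non-positive limit falls back to 1024.
        var page = filtered
            .Skip(Math.Max(offset, 0))
            .Take(limit > 0 ? limit : 1024)
            .ToList();

        return (filtered.Count, page);
    }
}
```

Step 5 then only decides whether the page is serialized into Subs[] or left out of the response.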

SubList stats — add a Stats() method to SubList returning SublistStats (count, cache size, inserts, removes, matches, cache hits).

Files: New Monitoring/SubszHandler.cs, Monitoring/Subsz.cs. Modify MonitorServer.cs, SubList.cs.

7b. Connz ByStop / ByReason Sorting

Add two missing sort options for closed connection queries.

  • Add ByStop and ByReason to SortOpt enum
  • Parse sort=stop and sort=reason in query params
  • Validate: these sorts only work with state=closed — return error if used with open connections
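
As a rough sketch of the validation rule, the parser below rejects sort=stop and sort=reason unless the state is closed. The enum members other than ByStop and ByReason, the TryParseSort helper, and the error strings are hypothetical and exist only for this example.

```csharp
using System;

// Hypothetical, reduced enums; the real SortOpt has more members.
public enum SortOpt { ByCid, ByStop, ByReason }
public enum ConnState { Open, Closed, All }

public static class ConnzSortSketch
{
    public static bool TryParseSort(string? raw, ConnState state, out SortOpt sort, out string? error)
    {
        error = null;
        sort = SortOpt.ByCid;

        switch (raw?.ToLowerInvariant())
        {
            case null:
            case "":
            case "cid":
                sort = SortOpt.ByCid;
                break;
            case "stop":
                sort = SortOpt.ByStop;
                break;
            case "reason":
                sort = SortOpt.ByReason;
                break;
            default:
                error = $"invalid sort option: {raw}";
                return false;
        }

        // ByStop and ByReason are only meaningful for closed connections.
        if ((sort is SortOpt.ByStop or SortOpt.ByReason) && state != ConnState.Closed)
        {
            error = $"sort by {raw} is only valid on closed connections";
            return false;
        }

        return true;
    }
}
```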

7c. Connz State Filtering & Closed Connections

Track closed connections and support state-based filtering.

Closed connection tracking:

  • ClosedClient record: Cid, Ip, Port, Start, Stop, Reason, Name, Lang, Version, InMsgs, OutMsgs, InBytes, OutBytes, NumSubs, Rtt, TlsVersion, TlsCipherSuite
  • ConcurrentQueue<ClosedClient> on NatsServer (capped at 10,000 entries)
  • Populate in RemoveClient() from client state before disposal

State filter:

  • Parse state=open|closed|all query param
  • open (default): current live connections only
  • closed: only from closed connections list
  • all: merge both
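
A minimal sketch of the capped closed-connection list and the state selection might look like this. The ClosedClientLog wrapper, the reduced ClosedClient record, and throwing on an unknown state value are assumptions for illustration; the design only specifies a ConcurrentQueue capped at 10,000 entries and the three state values.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;

public enum ConnState { Open, Closed, All }

// Hypothetical, trimmed-down record; the real ClosedClient carries many more fields.
public sealed record ClosedClient(ulong Cid, string Reason);

public sealed class ClosedClientLog
{
    private readonly ConcurrentQueue<ClosedClient> _queue = new();
    private readonly int _capacity;

    public ClosedClientLog(int capacity = 10_000) => _capacity = capacity;

    public void Add(ClosedClient client)
    {
        _queue.Enqueue(client);
        // Drop the oldest entries once the cap is exceeded.
        while (_queue.Count > _capacity && _queue.TryDequeue(out _)) { }
    }

    public IReadOnlyList<ClosedClient> Snapshot() => _queue.ToArray();
}

public static class ConnzStateSketch
{
    // Missing or empty state defaults to open.
    public static ConnState ParseState(string? raw) => raw?.ToLowerInvariant() switch
    {
        null or "" or "open" => ConnState.Open,
        "closed" => ConnState.Closed,
        "all" => ConnState.All,
        _ => throw new ArgumentException($"unsupported state: {raw}"),
    };

    // Picks the connection ids to report for the requested state.
    public static IEnumerable<ulong> SelectCids(
        ConnState state, IEnumerable<ulong> openCids, ClosedClientLog closed) => state switch
    {
        ConnState.Open => openCids,
        ConnState.Closed => closed.Snapshot().Select(c => c.Cid),
        _ => openCids.Concat(closed.Snapshot().Select(c => c.Cid)),
    };
}
```

Because the count check and the dequeue are separate operations, concurrent writers can briefly push the queue past the cap; the loop trims it back on the next Add.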

Files: Modify NatsServer.cs, ConnzHandler.cs, new Monitoring/ClosedClient.cs.

7d. Varz Slow Consumer Stats

Already at parity. SlowConsumersStats is populated from ServerStats counters. No changes needed.


Section 8: TLS

8a. TLS Rate Limiting

Already implemented via TlsRateLimiter (semaphore + periodic refill timer) and wired into AcceptClientAsync. Only a unit test is needed.

8b. TLS Cert-to-User Mapping (TlsMap)

Full DN parsing using .NET built-in X500DistinguishedName.

New TlsMapAuthenticator:

  • Implements IAuthenticator
  • Receives the list of configured User objects
  • On Authenticate():
    1. Extract X509Certificate2 from auth context (passed from TlsConnectionState)
    2. Parse subject DN via cert.SubjectName (X500DistinguishedName)
    3. Build normalized DN string from RDN components
    4. Try exact DN match against user map (key = DN string)
    5. If no exact match, try CN-only match
    6. Return AuthResult with matched user's permissions
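
The lookup order in steps 4 and 5 can be sketched as below. This is a simplified illustration under stated assumptions: the User record and the Match helpers are hypothetical, the DN string produced by X500DistinguishedName.Name is used as-is in place of the RDN-based normalization from step 3, and the CN is taken from GetNameInfo with X509NameType.SimpleName, which can fall back to other name attributes when the subject has no CN.

```csharp
using System;
using System.Collections.Generic;
using System.Security.Cryptography.X509Certificates;

// Hypothetical stand-in for the configured User object.
public sealed record User(string Username);

public static class TlsMapSketch
{
    // Exact DN match first, then CN-only match; null means no user mapped.
    public static User? Match(
        IReadOnlyDictionary<string, User> usersByKey, string subjectDn, string? commonName)
    {
        if (usersByKey.TryGetValue(subjectDn, out var byDn))
            return byDn;

        if (!string.IsNullOrEmpty(commonName) && usersByKey.TryGetValue(commonName, out var byCn))
            return byCn;

        return null;
    }

    // Convenience overload that pulls both keys from the client certificate.
    public static User? Match(IReadOnlyDictionary<string, User> usersByKey, X509Certificate2 cert) =>
        Match(
            usersByKey,
            cert.SubjectName.Name,
            cert.GetNameInfo(X509NameType.SimpleName, forIssuer: false));
}
```

Whether keys compare case-sensitively is decided by the comparer the dictionary was built with, so the real authenticator would need to pick one deliberately.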

Auth context extension:

  • Add X509Certificate2? ClientCertificate to ClientAuthContext
  • Pass certificate from TlsConnectionState in ProcessConnectAsync

AuthService integration:

  • When options.TlsMap && options.TlsVerify, add TlsMapAuthenticator to authenticator chain
  • TlsMap auth runs before other authenticators (cert-based auth takes priority)

Files: New Auth/TlsMapAuthenticator.cs. Modify Auth/AuthService.cs, Auth/ClientAuthContext.cs, NatsClient.cs.


Section 9: Logging

9a. File Logging with Rotation

New options on NatsOptions:

  • LogFile (string?) — path to log file
  • LogSizeLimit (long) — file size in bytes before rotation (0 = unlimited)
  • LogMaxFiles (int) — max retained rotated files (0 = unlimited)

CLI flags: --log_file, --log_size_limit, --log_max_files

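Serilog config: Add WriteTo.File() with fileSizeLimitBytes, rollOnFileSizeLimit: true, and retainedFileCountLimit when LogFile is set. Without rollOnFileSizeLimit, the file sink stops writing once the size limit is reached instead of rotating. An option value of 0 (unlimited) maps to null for both limits.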
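
A possible shape for the file sink wiring, assuming the Serilog and Serilog.Sinks.File packages. The AddFileSink helper and its parameter names are hypothetical; the named arguments are those of the File sink.

```csharp
using Serilog;

public static class FileLoggingSketch
{
    // Maps the NatsOptions values (0 = unlimited) onto the File sink's nullable limits.
    public static LoggerConfiguration AddFileSink(
        LoggerConfiguration config, string logFile, long logSizeLimit, int logMaxFiles) =>
        config.WriteTo.File(
            logFile,
            // null disables the size cap; the sink's own default is 1 GB.
            fileSizeLimitBytes: logSizeLimit > 0 ? (long?)logSizeLimit : null,
            // Roll to a new file at the limit instead of dropping events.
            rollOnFileSizeLimit: logSizeLimit > 0,
            // null keeps every rolled file; the sink's own default is 31.
            retainedFileCountLimit: logMaxFiles > 0 ? (int?)logMaxFiles : null);
}
```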

9b. Debug/Trace Modes

New options on NatsOptions:

  • Debug (bool) — enable debug-level logging
  • Trace (bool) — enable trace/verbose-level logging

CLI flags: -D (debug), -V (trace), -DV (both). Go's nats-server assigns -T to --logtime, so -T is not used as a trace alias.

Serilog config:

  • Default: MinimumLevel.Information()
  • -D: MinimumLevel.Debug()
  • -V: MinimumLevel.Verbose()
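
The flag-to-level mapping could be expressed as a small helper like the one below, assuming the Serilog package. The LogLevelSketch class and its method names are hypothetical. Trace wins over debug because Verbose already includes Debug output.

```csharp
using Serilog;
using Serilog.Events;

public static class LogLevelSketch
{
    // Trace implies debug: Verbose is the lowest Serilog level.
    public static LogEventLevel Resolve(bool debug, bool trace) =>
        trace ? LogEventLevel.Verbose
        : debug ? LogEventLevel.Debug
        : LogEventLevel.Information;

    public static LoggerConfiguration Apply(LoggerConfiguration config, bool debug, bool trace) =>
        config.MinimumLevel.Is(Resolve(debug, trace));
}
```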

9c. Color Output

Auto-detect TTY via Console.IsOutputRedirected.

  • TTY: use Serilog.Sinks.Console with AnsiConsoleTheme.Code
  • Non-TTY: use ConsoleTheme.None

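This matches Go's behavior of disabling color when the output is not a terminal. Go checks stderr, so if the console sink is configured to write to stderr, Console.IsErrorRedirected is the equivalent check.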

9d. Timestamp Format Control

New options on NatsOptions:

  • Logtime (bool, default true) — include timestamps
  • LogtimeUTC (bool, default false) — use UTC format

CLI flags: --logtime (true/false), --logtime_utc

Output template adjustment:

  • With timestamps: [{Timestamp:yyyy/MM/dd HH:mm:ss.ffffff} {Level:u3}] {Message:lj}{NewLine}{Exception}
  • Without timestamps: [{Level:u3}] {Message:lj}{NewLine}{Exception}
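  • UTC: convert the timestamp to UTC before formatting. A Serilog format provider (culture) does not change the time zone, so the template needs a UTC timestamp value (for example, a property added by an enricher) in place of the local {Timestamp}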

9e. Log Reopening (SIGUSR1)

When file logging is configured:

  • SIGUSR1 handler calls ReOpenLogFile() on the server
  • ReOpenLogFile() flushes and closes current Serilog logger, creates new one with same config
  • This enables external log rotation tools (logrotate)
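
One way the signal hookup might look in .NET is sketched below. The PosixSignal enum has no named SIGUSR1 member, so this sketch casts the raw signal number, which the PosixSignal documentation describes as allowed on Unix. The numbers 10 (Linux) and 30 (macOS) are assumptions about the target platforms, and the Register helper and its reopen callback are hypothetical.

```csharp
using System;
using System.Runtime.InteropServices;

public static class LogReopenSketch
{
    // Returns null on Windows, where SIGUSR1 does not exist.
    // The caller must keep the returned registration alive and dispose it on shutdown.
    public static PosixSignalRegistration? Register(Action reopenLogFile)
    {
        if (OperatingSystem.IsWindows())
            return null;

        // Assumption: SIGUSR1 is 30 on macOS and 10 on Linux.
        int sigusr1 = OperatingSystem.IsMacOS() ? 30 : 10;

        return PosixSignalRegistration.Create(
            (PosixSignal)sigusr1,
            _ => reopenLogFile());
    }
}
```

The handler runs on a thread-pool thread, so ReOpenLogFile() would need to be safe to call while other threads are logging.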

Files: Modify NatsOptions.cs, Program.cs, NatsServer.cs.


Section 10: Ping/Pong

10a. RTT Tracking

New fields on NatsClient:

  • _rttStartTicks (long) — UTC ticks when PING sent
  • _rtt (long) — computed RTT in ticks
  • Rtt property (TimeSpan) — computed from _rtt

Logic:

  • In RunPingTimerAsync, before writing PING: _rttStartTicks = DateTime.UtcNow.Ticks
  • In DispatchCommandAsync PONG handler: compute _rtt = DateTime.UtcNow.Ticks - _rttStartTicks (min 1 tick)
  • computeRTT() helper ensures minimum 1 tick (handles clock granularity on Windows)

Monitoring exposure:

  • Populate ConnInfo.Rtt as formatted string (e.g., "1.234ms")
  • Add ByRtt sort option to Connz
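
A minimal sketch of the RTT arithmetic and the monitoring string follows. ComputeRttTicks mirrors the computeRTT() helper named above; FormatRtt is a hypothetical formatter that always reports milliseconds, which is a simplification compared with Go's duration formatting.

```csharp
using System;
using System.Globalization;

public static class RttSketch
{
    // Clamp to at least one tick so a PONG that arrives within the
    // clock's granularity still yields a non-zero RTT.
    public static long ComputeRttTicks(long startTicks, long nowTicks) =>
        Math.Max(nowTicks - startTicks, 1);

    // Formats with up to three decimals and an invariant decimal point.
    public static string FormatRtt(TimeSpan rtt) =>
        rtt.TotalMilliseconds.ToString("0.###", CultureInfo.InvariantCulture) + "ms";
}
```

For example, 12,340 ticks (one tick is 100 ns) is 1.234 ms, so FormatRtt(TimeSpan.FromTicks(12340)) yields "1.234ms".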

10b. RTT-Based First PING Delay

New state on NatsClient:

  • _firstPongSent flag in ClientFlags

Logic in RunPingTimerAsync:

  • Before first PING, check: _firstPongSent || timeSinceStart > 2 seconds
  • If neither condition met, skip this PING cycle
  • Set _firstPongSent when the server sends its first PONG after CONNECT, that is, while handling the client's first PING

This prevents the server from sending PING (for RTT) before the client has had a chance to respond to the initial INFO with CONNECT+PING.
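
The gate can be sketched as a pure function so it is easy to unit test. The FirstPingSketch class, the constant name, and passing the clock values as parameters are assumptions for illustration; the condition itself is the one listed above.

```csharp
using System;

public static class FirstPingSketch
{
    // Grace period before the server may PING a client it has not yet sent a PONG to.
    public static readonly TimeSpan MaxNoRttPingBeforeFirstPong = TimeSpan.FromSeconds(2);

    // True when the ping timer may send a PING in this cycle.
    public static bool ShouldSendPing(bool firstPongSent, DateTime startUtc, DateTime nowUtc) =>
        firstPongSent || nowUtc - startUtc > MaxNoRttPingBeforeFirstPong;
}
```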

10c. Stale Connection Stats

New model:

  • StaleConnectionStats: Clients, Routes, Gateways, Leafs (matching Go)

ServerStats extension:

  • Add StaleConnectionClients, StaleConnectionRoutes, etc. fields
  • Increment in MarkClosed(StaleConnection) based on connection kind

Varz exposure:

  • Add StaleConnectionStats field to Varz
  • Populate from ServerStats counters
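
The counter handling might look roughly like this. The ClientKind enum, the StaleStatsSketch class, and the Snapshot tuple are hypothetical names for the example; the design only fixes the four counters and the increment point in MarkClosed(StaleConnection).

```csharp
using System.Threading;

// Hypothetical connection kinds, mirroring the four stale counters.
public enum ClientKind { Client, Router, Gateway, Leaf }

public sealed class StaleStatsSketch
{
    private long _clients, _routes, _gateways, _leafs;

    // Called when a connection is closed for being stale.
    public void RecordStale(ClientKind kind)
    {
        switch (kind)
        {
            case ClientKind.Client: Interlocked.Increment(ref _clients); break;
            case ClientKind.Router: Interlocked.Increment(ref _routes); break;
            case ClientKind.Gateway: Interlocked.Increment(ref _gateways); break;
            case ClientKind.Leaf: Interlocked.Increment(ref _leafs); break;
        }
    }

    // Values copied into the Varz response.
    public (long Clients, long Routes, long Gateways, long Leafs) Snapshot() =>
        (Interlocked.Read(ref _clients), Interlocked.Read(ref _routes),
         Interlocked.Read(ref _gateways), Interlocked.Read(ref _leafs));
}
```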

Files: Modify NatsClient.cs, ServerStats.cs, Varz.cs, VarzHandler.cs, Connz.cs, ConnzHandler.cs.


Test Coverage

Each section includes unit tests:

| Feature | Test File | Tests |
|---|---|---|
| Subz endpoint | SubszHandlerTests.cs | Empty response, with subs, account filter, test subject filter, pagination |
| Connz closed state | ConnzHandlerTests.cs | State=closed, ByStop sort, ByReason sort, validation errors |
| TLS rate limiter | TlsRateLimiterTests.cs | Rate enforcement, refill behavior |
| TlsMap auth | TlsMapAuthenticatorTests.cs | DN matching, CN fallback, no match |
| File logging | LoggingTests.cs | File creation, rotation on size limit |
| RTT tracking | ClientTests.cs | RTT computed on PONG, exposed in connz, ByRtt sort |
| First PING delay | ClientTests.cs | PING delayed until first PONG or 2s |
| Stale stats | ServerTests.cs | Stale counters incremented, exposed in varz |

Parallelization Strategy

These work streams are independent and can be developed by parallel subagents:

  1. Monitoring stream (7a, 7b, 7c): SubszHandler + Connz closed connections + state filter
  2. TLS stream (8b): TlsMapAuthenticator
  3. Logging stream (9a-9e): All logging improvements
  4. Ping/Pong stream (10a-10c): RTT tracking + first PING delay + stale stats

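Streams 1-4 mostly touch different files. Per the file lists above, the shared touch points are NatsServer.cs (monitoring and logging), NatsClient.cs (TLS and ping/pong), and ConnzHandler.cs (monitoring and ping/pong); NatsOptions.cs gains the new logging options. Where a file is shared, one stream can land its changes first and the others build on it.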