From f4efbcf09ebe562f8e541330dc010a55a9ecffa0 Mon Sep 17 00:00:00 2001
From: Joseph Doherty
Date: Mon, 23 Feb 2026 00:17:35 -0500
Subject: [PATCH] docs: add design for sections 7-10 gaps implementation

---
 .../2026-02-23-sections-7-10-gaps-design.md   | 226 ++++++++++++++++++
 1 file changed, 226 insertions(+)
 create mode 100644 docs/plans/2026-02-23-sections-7-10-gaps-design.md

diff --git a/docs/plans/2026-02-23-sections-7-10-gaps-design.md b/docs/plans/2026-02-23-sections-7-10-gaps-design.md
new file mode 100644
index 0000000..094ff98
--- /dev/null
+++ b/docs/plans/2026-02-23-sections-7-10-gaps-design.md
@@ -0,0 +1,226 @@
+# Sections 7-10 Gaps Design: Monitoring, TLS, Logging, Ping/Pong
+
+**Date:** 2026-02-23
+**Scope:** Implement remaining gaps in differences.md sections 7 (Monitoring), 8 (TLS), 9 (Logging), 10 (Ping/Pong)
+**Goal:** Go parity for all features within scope
+
+---
+
+## Section 7: Monitoring
+
+### 7a. `/subz` Endpoint
+
+Replace the empty stub with a full `SubszHandler`.
+
+**Models:**
+- `Subsz` — response envelope: `Id`, `Now`, `SublistStats`, `Total`, `Offset`, `Limit`, `Subs[]`
+- `SubszOptions` — `Offset`, `Limit`, `Subscriptions` (bool for detail), `Account` (filter), `Test` (literal subject filter)
+- Reuse existing `SubDetail` from Connz
+
+**Algorithm:**
+1. Iterate all accounts (or filter by `Account` param)
+2. Collect all subscriptions from each account's SubList
+3. If a `Test` subject is provided, filter using `SubjectMatch.MatchLiteral()` to return only subs that would receive that message
+4. Apply pagination (offset/limit)
+5. If `Subscriptions` is true, include `SubDetail[]` array
+
+**SubList stats** — add a `Stats()` method to `SubList` returning `SublistStats` (count, cache size, inserts, removes, matches, cache hits).
+
+**Files:** New `Monitoring/SubszHandler.cs`, `Monitoring/Subsz.cs`. Modify `MonitorServer.cs`, `SubList.cs`.
+
+### 7b. Connz `ByStop` / `ByReason` Sorting
+
+Add two missing sort options for closed connection queries.
+
+- Add `ByStop` and `ByReason` to `SortOpt` enum
+- Parse `sort=stop` and `sort=reason` in query params
+- Validate: these sorts only work with `state=closed`; return an error if used with open connections
+
+### 7c. Connz State Filtering & Closed Connections
+
+Track closed connections and support state-based filtering.
+
+**Closed connection tracking:**
+- `ClosedClient` record: `Cid`, `Ip`, `Port`, `Start`, `Stop`, `Reason`, `Name`, `Lang`, `Version`, `InMsgs`, `OutMsgs`, `InBytes`, `OutBytes`, `NumSubs`, `Rtt`, `TlsVersion`, `TlsCipherSuite`
+- `ConcurrentQueue<ClosedClient>` on `NatsServer` (capped at 10,000 entries)
+- Populate in `RemoveClient()` from client state before disposal
+
+**State filter:**
+- Parse `state=open|closed|all` query param
+- `open` (default): current live connections only
+- `closed`: only from closed connections list
+- `all`: merge both
+
+**Files:** Modify `NatsServer.cs`, `ConnzHandler.cs`, new `Monitoring/ClosedClient.cs`.
+
+### 7d. Varz Slow Consumer Stats
+
+Already at parity. `SlowConsumersStats` is populated from `ServerStats` counters. No changes needed.
+
+---
+
+## Section 8: TLS
+
+### 8a. TLS Rate Limiting
+
+Already implemented via `TlsRateLimiter` (semaphore + periodic refill timer). Wired into `AcceptClientAsync`. Only a unit test is needed.
+
+### 8b. TLS Cert-to-User Mapping (TlsMap)
+
+Full DN parsing using .NET built-in `X500DistinguishedName`.
+
+**New `TlsMapAuthenticator`:**
+- Implements `IAuthenticator`
+- Receives the list of configured `User` objects
+- On `Authenticate()`:
+  1. Extract `X509Certificate2` from auth context (passed from `TlsConnectionState`)
+  2. Parse subject DN via `cert.SubjectName` (`X500DistinguishedName`)
+  3. Build normalized DN string from RDN components
+  4. Try exact DN match against user map (key = DN string)
+  5. If no exact match, try CN-only match
+  6. Return `AuthResult` with matched user's permissions
+
+**Auth context extension:**
+- Add `X509Certificate2? ClientCertificate` to `ClientAuthContext`
+- Pass certificate from `TlsConnectionState` in `ProcessConnectAsync`
+
+**AuthService integration:**
+- When `options.TlsMap && options.TlsVerify`, add `TlsMapAuthenticator` to authenticator chain
+- TlsMap auth runs before other authenticators (cert-based auth takes priority)
+
+**Files:** New `Auth/TlsMapAuthenticator.cs`. Modify `Auth/AuthService.cs`, `Auth/ClientAuthContext.cs`, `NatsClient.cs`.
+
+---
+
+## Section 9: Logging
+
+### 9a. File Logging with Rotation
+
+**New options on `NatsOptions`:**
+- `LogFile` (string?) — path to log file
+- `LogSizeLimit` (long) — file size in bytes before rotation (0 = unlimited)
+- `LogMaxFiles` (int) — max retained rotated files (0 = unlimited)
+
+**CLI flags:** `--log_file`, `--log_size_limit`, `--log_max_files`
+
+**Serilog config:** Add `WriteTo.File()` with `fileSizeLimitBytes`, `rollOnFileSizeLimit: true` (so the sink rolls to a new file at the limit instead of stopping writes), and `retainedFileCountLimit` when `LogFile` is set.
+
+### 9b. Debug/Trace Modes
+
+**New options on `NatsOptions`:**
+- `Debug` (bool) — enable debug-level logging
+- `Trace` (bool) — enable trace/verbose-level logging
+
+**CLI flags:** `-D` (debug), `-V` or `-T` (trace), `-DV` (both)
+
+**Serilog config:**
+- Default: `MinimumLevel.Information()`
+- `-D`: `MinimumLevel.Debug()`
+- `-V`/`-T`: `MinimumLevel.Verbose()`
+
+### 9c. Color Output
+
+Auto-detect TTY via `Console.IsOutputRedirected`.
+- TTY: use `Serilog.Sinks.Console` with `AnsiConsoleTheme.Code`
+- Non-TTY: use `ConsoleTheme.None`
+
+Matches Go's behavior of disabling color when stderr is not a terminal.
+
+### 9d. Timestamp Format Control
+
+**New options on `NatsOptions`:**
+- `Logtime` (bool, default true) — include timestamps
+- `LogtimeUTC` (bool, default false) — use UTC format
+
+**CLI flags:** `--logtime` (true/false), `--logtime_utc`
+
+**Output template adjustment:**
+- With timestamps: `[{Timestamp:yyyy/MM/dd HH:mm:ss.ffffff} {Level:u3}] {Message:lj}{NewLine}{Exception}`
+- Without timestamps: `[{Level:u3}] {Message:lj}{NewLine}{Exception}`
+- UTC: format the timestamp in UTC rather than local time
+
+### 9e. Log Reopening (SIGUSR1)
+
+When file logging is configured:
+- SIGUSR1 handler calls `ReOpenLogFile()` on the server
+- `ReOpenLogFile()` flushes and closes the current Serilog logger, then creates a new one with the same config
+- This enables external log rotation tools (logrotate)
+
+**Files:** Modify `NatsOptions.cs`, `Program.cs`, `NatsServer.cs`.
+
+---
+
+## Section 10: Ping/Pong
+
+### 10a. RTT Tracking
+
+**New fields on `NatsClient`:**
+- `_rttStartTicks` (long) — UTC ticks when PING sent
+- `_rtt` (long) — computed RTT in ticks
+- `Rtt` property (TimeSpan) — computed from `_rtt`
+
+**Logic:**
+- In `RunPingTimerAsync`, before writing PING: `_rttStartTicks = DateTime.UtcNow.Ticks`
+- In `DispatchCommandAsync` PONG handler: compute `_rtt = DateTime.UtcNow.Ticks - _rttStartTicks` (min 1 tick)
+- `computeRTT()` helper ensures minimum 1 tick (handles clock granularity on Windows)
+
+**Monitoring exposure:**
+- Populate `ConnInfo.Rtt` as formatted string (e.g., `"1.234ms"`)
+- Add `ByRtt` sort option to Connz
+
+### 10b. RTT-Based First PING Delay
+
+**New state on `NatsClient`:**
+- `_firstPongSent` flag in `ClientFlags`
+
+**Logic in `RunPingTimerAsync`:**
+- Before first PING, check: `_firstPongSent || timeSinceStart > 2 seconds`
+- If neither condition is met, skip this PING cycle
+- Set `_firstPongSent` on first PONG after CONNECT (in PONG handler)
+
+This prevents the server from sending PING (for RTT) before the client has had a chance to respond to the initial INFO with CONNECT+PING.
+
+### 10c. Stale Connection Stats
+
+**New model:**
+- `StaleConnectionStats` — `Clients`, `Routes`, `Gateways`, `Leafs` (matching Go)
+
+**ServerStats extension:**
+- Add `StaleConnectionClients`, `StaleConnectionRoutes`, etc. fields
+- Increment in `MarkClosed(StaleConnection)` based on connection kind
+
+**Varz exposure:**
+- Add `StaleConnectionStats` field to `Varz`
+- Populate from `ServerStats` counters
+
+**Files:** Modify `NatsClient.cs`, `ServerStats.cs`, `Varz.cs`, `VarzHandler.cs`, `Connz.cs`, `ConnzHandler.cs`.
+
+---
+
+## Test Coverage
+
+Each section includes unit tests:
+
+| Feature | Test File | Tests |
+|---------|-----------|-------|
+| Subz endpoint | SubszHandlerTests.cs | Empty response, with subs, account filter, test subject filter, pagination |
+| Connz closed state | ConnzHandlerTests.cs | State=closed, ByStop sort, ByReason sort, validation errors |
+| TLS rate limiter | TlsRateLimiterTests.cs | Rate enforcement, refill behavior |
+| TlsMap auth | TlsMapAuthenticatorTests.cs | DN matching, CN fallback, no match |
+| File logging | LoggingTests.cs | File creation, rotation on size limit |
+| RTT tracking | ClientTests.cs | RTT computed on PONG, exposed in connz, ByRtt sort |
+| First PING delay | ClientTests.cs | PING delayed until first PONG or 2s |
+| Stale stats | ServerTests.cs | Stale counters incremented, exposed in varz |
+
+---
+
+## Parallelization Strategy
+
+These work streams are independent and can be developed by parallel subagents:
+
+1. **Monitoring stream** (7a, 7b, 7c): SubszHandler + Connz closed connections + state filter
+2. **TLS stream** (8b): TlsMapAuthenticator
+3. **Logging stream** (9a-9e): All logging improvements
+4. **Ping/Pong stream** (10a-10c): RTT tracking + first PING delay + stale stats
+
+Streams 1-4 touch mostly different files. The main shared touch point is `NatsOptions.cs` (new options for logging and ping/pong); per the **Files** lists above, `NatsServer.cs`, `NatsClient.cs`, and `ConnzHandler.cs` are also each modified by two streams. For each shared file, one stream lands its change first and the others build on it.