natsdotnet/docs/plans/2026-02-22-core-lifecycle-design.md
Joseph Doherty c2dc503e2e docs: add core server lifecycle design for section 1 gaps
Covers ClosedState enum, accept loop backoff, ephemeral port,
graceful shutdown, lame duck mode, PID/ports files, signal
handling, and stub components.
2026-02-22 23:25:53 -05:00

# Core Server Lifecycle — Design
Implements all gaps from section 1 of `differences.md` (Core Server Lifecycle).
Reference: `golang/nats-server/server/server.go`, `client.go`, `signal.go`
## Components
### 1. ClosedState Enum & Close Reason Tracking
New file `src/NATS.Server/ClosedState.cs` — full Go enum (37 values from `client.go:188-228`).
- `NatsClient` gets `CloseReason` property, `MarkClosed(ClosedState)` method
- Close reason set in `RunAsync` finally blocks based on exception type
- Error-related reasons (e.g. ReadError, WriteError, TLSHandshakeError; the full list is in section 6) skip flush on close
- `NatsServer.RemoveClient` logs close reason via structured logging
### 2. Accept Loop Exponential Backoff
Port Go's `acceptError` pattern from `server.go:4607-4627`.
- Constants: `AcceptMinSleep = 10ms`, `AcceptMaxSleep = 1s`
- On `SocketException`: sleep `tmpDelay`, double it, cap at 1s
- On success: reset to 10ms
- During sleep: check `_quitCts` to abort if shutting down
- Non-temporary errors break the loop
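
The backoff arithmetic and the interruptible sleep can be sketched as below. This is a minimal illustration rather than the port itself: the `AcceptBackoff` class and its method names are hypothetical, and only the two constants come from the list above.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper; only the two constants are taken from the design.
public static class AcceptBackoff
{
    public static readonly TimeSpan AcceptMinSleep = TimeSpan.FromMilliseconds(10);
    public static readonly TimeSpan AcceptMaxSleep = TimeSpan.FromSeconds(1);

    // Delay to use after the next failure: doubled, capped at AcceptMaxSleep.
    public static TimeSpan NextDelay(TimeSpan current)
    {
        var doubled = TimeSpan.FromTicks(current.Ticks * 2);
        return doubled > AcceptMaxSleep ? AcceptMaxSleep : doubled;
    }

    // Sleeps for `delay`; returns false if the quit token fired during the sleep.
    public static async Task<bool> SleepOrQuitAsync(TimeSpan delay, CancellationToken quit)
    {
        try
        {
            await Task.Delay(delay, quit);
            return true;
        }
        catch (OperationCanceledException)
        {
            return false;
        }
    }
}
```

In the accept loop, a temporary `SocketException` would await `SleepOrQuitAsync(delay, _quitCts.Token)`, leave the loop if it returns `false`, and otherwise set `delay = NextDelay(delay)`. A successful accept resets `delay` to `AcceptMinSleep`.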
### 3. Ephemeral Port (port=0)
After `_listener.Bind()` + `Listen()`, resolve actual port:
```csharp
if (_options.Port == 0)
{
    // Port 0 asks the OS for a free port; read back what was actually bound.
    var actualPort = ((IPEndPoint)_listener.LocalEndPoint!).Port;
    _options.Port = actualPort;
    _serverInfo.Port = actualPort;
}
```
Add public `Port` property on `NatsServer` exposing the resolved port.
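
To see the resolution step in isolation, the following standalone sketch binds a loopback socket to port 0 and reads back the assigned port. The `EphemeralPortDemo` class is hypothetical and is not part of `NatsServer`.

```csharp
using System.Net;
using System.Net.Sockets;

// Hypothetical standalone demo of port=0 resolution.
public static class EphemeralPortDemo
{
    public static int BindAndResolve()
    {
        using var listener = new Socket(AddressFamily.InterNetwork, SocketType.Stream, ProtocolType.Tcp);
        listener.Bind(new IPEndPoint(IPAddress.Loopback, 0)); // 0 = let the OS choose
        listener.Listen(16);
        // After Bind, LocalEndPoint carries the port the OS actually assigned.
        return ((IPEndPoint)listener.LocalEndPoint!).Port;
    }
}
```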
### 4. Graceful Shutdown with WaitForShutdown
New fields on `NatsServer`:
- `_shutdown` (volatile bool)
- `_shutdownComplete` (TaskCompletionSource)
- `_quitCts` (CancellationTokenSource) — internal shutdown signal
`ShutdownAsync()` sequence:
1. Guard: if already shutting down, return
2. Set `_shutdown = true`, cancel `_quitCts`
3. Close `_listener` (stops accept loop)
4. Close all client connections with `ServerShutdown` reason
5. Wait for active client tasks to drain
6. Stop monitor server
7. Signal `_shutdownComplete`
`WaitForShutdown()`: blocks on `_shutdownComplete.Task`.
`Dispose()`: calls `ShutdownAsync` synchronously if not already shut down.
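
The guard in step 1 and the completion signal in step 7 can be sketched as follows. The `ShutdownGate` class and its member names are hypothetical. The sketch also swaps the `volatile bool` for an `int` updated with `Interlocked.Exchange`, on the assumption that two concurrent `ShutdownAsync` callers must not both pass the guard; a plain volatile check-then-set would not prevent that.

```csharp
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper showing the guard and completion signal only.
public sealed class ShutdownGate
{
    private int _shutdown; // 0 = running, 1 = shutting down
    private readonly CancellationTokenSource _quitCts = new();
    private readonly TaskCompletionSource _shutdownComplete =
        new(TaskCreationOptions.RunContinuationsAsynchronously);

    public CancellationToken Quit => _quitCts.Token;
    public bool IsShuttingDown => Volatile.Read(ref _shutdown) == 1;

    // Returns true for the first caller only; later callers should return early.
    public bool TryBegin()
    {
        if (Interlocked.Exchange(ref _shutdown, 1) == 1) return false;
        _quitCts.Cancel(); // wakes the accept loop and any backoff sleep
        return true;
    }

    // Called once all cleanup steps have finished.
    public void Complete() => _shutdownComplete.TrySetResult();

    // Awaitable counterpart of WaitForShutdown().
    public Task WaitForShutdownAsync() => _shutdownComplete.Task;
}
```

A blocking `WaitForShutdown()` could then be `WaitForShutdownAsync().GetAwaiter().GetResult()`. `RunContinuationsAsynchronously` keeps waiters from running inline on the thread that calls `Complete()`.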
### 5. Task Tracking
Track active client tasks for clean shutdown:
- `_activeClientCount` (int, Interlocked)
- `_allClientsExited` (TaskCompletionSource, signaled when count hits 0 during shutdown)
- Increment in `AcceptClientAsync`, decrement in `RunClientAsync` finally block
- `ShutdownAsync` waits on `_allClientsExited` with timeout
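
A minimal sketch of the counter and drain signal, assuming a hypothetical `ClientTaskTracker` wrapper and a `_draining` flag that is not in the field list above. The flag is set with `Interlocked.Exchange` so that the last client to exit and the start of the drain cannot miss each other.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical wrapper around the two fields named in the design.
public sealed class ClientTaskTracker
{
    private int _activeClientCount;
    private int _draining; // assumption: 1 once shutdown has started waiting
    private readonly TaskCompletionSource _allClientsExited =
        new(TaskCreationOptions.RunContinuationsAsynchronously);

    public int ActiveCount => Volatile.Read(ref _activeClientCount);

    // Call when a client is accepted.
    public void ClientStarted() => Interlocked.Increment(ref _activeClientCount);

    // Call from the finally block of the per-client task.
    public void ClientExited()
    {
        if (Interlocked.Decrement(ref _activeClientCount) == 0 &&
            Volatile.Read(ref _draining) == 1)
        {
            _allClientsExited.TrySetResult();
        }
    }

    // Returns true if every client exited before the timeout elapsed.
    public async Task<bool> DrainAsync(TimeSpan timeout)
    {
        Interlocked.Exchange(ref _draining, 1);
        if (Volatile.Read(ref _activeClientCount) == 0)
            _allClientsExited.TrySetResult();

        var finished = await Task.WhenAny(_allClientsExited.Task, Task.Delay(timeout));
        return finished == _allClientsExited.Task;
    }
}
```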
### 6. Flush Pending Data Before Close
`NatsClient.FlushAndCloseAsync(bool minimalFlush)`:
- If the close reason is not a skip-flush reason: flush the stream with a 100ms write deadline
- Close socket
`MarkClosed(ClosedState)` sets skip-flush flag for: ReadError, WriteError, SlowConsumerPendingBytes, SlowConsumerWriteDeadline, TLSHandshakeError.
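
The skip-flush decision reduces to a membership test on the close reason. The sketch below uses an abbreviated, illustrative subset of the enum (the real file carries the full Go list, and the numeric values here are not meant to match it); the `CloseReasons` helper is a hypothetical name.

```csharp
// Abbreviated subset for illustration; not the full ClosedState enum.
public enum ClosedState
{
    ClientClosed = 1,
    TLSHandshakeError,
    SlowConsumerPendingBytes,
    SlowConsumerWriteDeadline,
    WriteError,
    ReadError,
    ServerShutdown,
}

// Hypothetical helper holding the skip-flush rule.
public static class CloseReasons
{
    // True for reasons where the socket is presumed unusable,
    // so attempting a final flush would only delay the close.
    public static bool SkipsFlush(ClosedState reason) => reason switch
    {
        ClosedState.ReadError
            or ClosedState.WriteError
            or ClosedState.SlowConsumerPendingBytes
            or ClosedState.SlowConsumerWriteDeadline
            or ClosedState.TLSHandshakeError => true,
        _ => false,
    };
}
```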
### 7. Lame Duck Mode
New options: `LameDuckDuration` (default 2min), `LameDuckGracePeriod` (default 10s).
`LameDuckShutdownAsync()`:
1. Set `_lameDuckMode = true`
2. Close listener (stop new connections)
3. Wait `LameDuckGracePeriod` (10s default) for clients to drain naturally
4. Stagger-close remaining clients over `LameDuckDuration - GracePeriod`
   - Sleep interval = remaining duration / client count (min 1ms, max 1s)
   - Randomize slightly to avoid reconnect storms
5. Call `ShutdownAsync()` for final cleanup
Accept loop: on error, if `_lameDuckMode`, exit cleanly.
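
The interval arithmetic from step 4 can be sketched as below. The `LameDuck` class and method names are hypothetical. The jitter range (a random value between half the interval and the full interval) is an assumption, since the design only says "randomize slightly".

```csharp
using System;

// Hypothetical helper for the stagger-close timing.
public static class LameDuck
{
    private static readonly TimeSpan MinInterval = TimeSpan.FromMilliseconds(1);
    private static readonly TimeSpan MaxInterval = TimeSpan.FromSeconds(1);

    // (duration - gracePeriod) / clientCount, clamped to [1ms, 1s].
    public static TimeSpan CloseInterval(TimeSpan duration, TimeSpan gracePeriod, int clientCount)
    {
        if (clientCount <= 0) return MinInterval;

        var remaining = duration - gracePeriod;
        if (remaining < TimeSpan.Zero) remaining = TimeSpan.Zero;

        var interval = TimeSpan.FromTicks(remaining.Ticks / clientCount);
        if (interval < MinInterval) return MinInterval;
        if (interval > MaxInterval) return MaxInterval;
        return interval;
    }

    // Assumed jitter policy: uniform in [interval/2, interval].
    public static TimeSpan Jitter(TimeSpan interval, Random rng)
    {
        var half = interval.Ticks / 2;
        return TimeSpan.FromTicks(half + rng.NextInt64(half + 1));
    }
}
```

With the defaults (2min duration, 10s grace period), 1000 remaining clients give 110s / 1000 = 110ms between closes, while 10 clients would give 11s and are clamped to 1s.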
### 8. PID File & Ports File
New options: `PidFile` (string?), `PortsFileDir` (string?).
PID file: `File.WriteAllText(pidFile, Process.GetCurrentProcess().Id.ToString())`
Ports file: JSON with `{ "client": port, "monitor": monitorPort }` written to `{dir}/{exe}_{pid}.ports`
Written at startup, deleted at shutdown.
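
A minimal sketch of both writers, assuming a hypothetical `LifecycleFiles` helper. `Environment.ProcessId` returns the same value as `Process.GetCurrentProcess().Id`, and the `"nats-server"` fallback name is an assumption for the case where the executable path is unavailable.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text.Json;

// Hypothetical helper; file name layout follows {dir}/{exe}_{pid}.ports.
public static class LifecycleFiles
{
    public static void WritePidFile(string pidFile) =>
        File.WriteAllText(pidFile, Environment.ProcessId.ToString());

    // Writes the ports file and returns its path so shutdown can delete it.
    public static string WritePortsFile(string dir, int clientPort, int monitorPort)
    {
        // Assumed fallback when the process path cannot be determined.
        var exe = Path.GetFileNameWithoutExtension(Environment.ProcessPath) ?? "nats-server";
        var path = Path.Combine(dir, $"{exe}_{Environment.ProcessId}.ports");

        var ports = new Dictionary<string, int>
        {
            ["client"] = clientPort,
            ["monitor"] = monitorPort,
        };
        File.WriteAllText(path, JsonSerializer.Serialize(ports));
        return path;
    }

    // File.Delete is a no-op if the file is already gone.
    public static void DeleteIfSet(string? path)
    {
        if (!string.IsNullOrEmpty(path)) File.Delete(path);
    }
}
```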
### 9. Signal Handling
In `Program.cs`, use `PosixSignalRegistration` (.NET 6+):
- `SIGTERM` → `server.ShutdownAsync()` then exit
- `SIGUSR2` → `server.LameDuckShutdownAsync()`
- `SIGUSR1` → log "log reopen not yet supported"
- `SIGHUP` → log "config reload not yet supported"
Keep existing Ctrl+C handler (SIGINT).
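
One detail to plan for: the `PosixSignal` enum has named members for `SIGTERM` and `SIGHUP` but none for `SIGUSR1` or `SIGUSR2`, so those would be registered by casting the raw signal number. The sketch below assumes the usual numbers (10/12 on Linux for common architectures, 30/31 on macOS) and skips the user signals elsewhere. The `SignalSetup` class and its callback parameters are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.InteropServices;

// Hypothetical registration helper for Program.cs.
public static class SignalSetup
{
    // n is 1 (SIGUSR1) or 2 (SIGUSR2). Returns null where no raw number is assumed.
    public static PosixSignal? UserSignal(int n)
    {
        if (OperatingSystem.IsLinux()) return (PosixSignal)(n == 1 ? 10 : 12);
        if (OperatingSystem.IsMacOS()) return (PosixSignal)(n == 1 ? 30 : 31);
        return null;
    }

    // The caller must keep the returned registrations alive and dispose them at exit.
    public static List<PosixSignalRegistration> Register(
        Action onTerminate, Action onLameDuck, Action<string> log)
    {
        var regs = new List<PosixSignalRegistration>
        {
            // Cancel = true suppresses the default action so the handler decides what happens.
            PosixSignalRegistration.Create(PosixSignal.SIGTERM,
                ctx => { ctx.Cancel = true; onTerminate(); }),
            PosixSignalRegistration.Create(PosixSignal.SIGHUP,
                ctx => { ctx.Cancel = true; log("config reload not yet supported"); }),
        };

        if (UserSignal(1) is { } usr1)
            regs.Add(PosixSignalRegistration.Create(usr1,
                ctx => { ctx.Cancel = true; log("log reopen not yet supported"); }));

        if (UserSignal(2) is { } usr2)
            regs.Add(PosixSignalRegistration.Create(usr2,
                ctx => { ctx.Cancel = true; onLameDuck(); }));

        return regs;
    }
}
```

Handlers run on a thread-pool thread, so `onTerminate` would typically start `server.ShutdownAsync()` and let `Main` exit once `WaitForShutdown()` returns.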
### 10. Server Identity NKey (Stub)
Generate Ed25519 key pair at construction. Store as `ServerNKey` (public) and `_serverSeed` (private). Not used in protocol yet — placeholder for future cluster identity.
### 11. System Account (Stub)
Create `$SYS` account in `_accounts` at construction. Expose as `SystemAccount` property. No internal subscriptions yet.
### 12. Config File & Profiling (Stubs)
- `NatsOptions.ConfigFile` — if set, log warning "config file parsing not yet supported"
- `NatsOptions.ProfPort` — if set, log warning "profiling endpoint not yet supported"
- `Program.cs`: add `-c` CLI flag
## Testing
- Accept loop backoff: mock socket that throws N times, verify delays
- Ephemeral port: start server with port=0, verify resolved port > 0
- Graceful shutdown: start server, connect clients, call ShutdownAsync, verify all disconnected
- WaitForShutdown: verify it blocks until shutdown completes
- Close reason tracking: verify correct ClosedState for auth timeout, max connections, stale connection
- Lame duck mode: start server, connect clients, trigger lame duck, verify staggered closure
- PID file: start server with PidFile option, verify file contents, verify deleted on shutdown
- Ports file: start server with PortsFileDir, verify JSON contents
- Flush before close: verify data is flushed before socket close during shutdown
- System account: verify $SYS account exists after construction