Covers ClosedState enum, accept loop backoff, ephemeral port, graceful shutdown, lame duck mode, PID/ports files, signal handling, and stub components.
# Core Server Lifecycle: Design

Implements all gaps from section 1 of `differences.md` (Core Server Lifecycle).

Reference: `golang/nats-server/server/server.go`, `client.go`, `signal.go`

## Components

### 1. ClosedState Enum & Close Reason Tracking

New file `src/NATS.Server/ClosedState.cs`: a port of the full Go enum (37 values from `client.go:188-228`).

- `NatsClient` gets a `CloseReason` property and a `MarkClosed(ClosedState)` method
- The close reason is set in the `RunAsync` finally blocks, based on the exception type
- Error-related reasons (such as ReadError, WriteError, TLSHandshakeError; full list in section 6) skip the flush on close
- `NatsServer.RemoveClient` logs the close reason via structured logging

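The close-reason tracking above can be sketched roughly as follows. This is a minimal illustration, not the planned implementation: `ClientCloseTracker` is a hypothetical stand-in for the relevant part of `NatsClient`, the enum is a small subset whose numeric values do not match the Go source, and the first-caller-wins rule in `MarkClosed` is an assumption.

```csharp
using System;
using System.Threading;

// Illustrative subset only; the numeric values here are NOT the Go values.
public enum ClosedState
{
    ClientClosed = 1,
    WriteError,
    ReadError,
    TLSHandshakeError,
    ServerShutdown,
}

// Hypothetical helper standing in for the close-tracking part of NatsClient.
public sealed class ClientCloseTracker
{
    private int _closeReason; // 0 means "still open"

    public ClosedState? CloseReason
    {
        get
        {
            var value = Volatile.Read(ref _closeReason);
            return value == 0 ? (ClosedState?)null : (ClosedState)value;
        }
    }

    // Derived from the recorded reason (subset of the skip-flush list).
    public bool SkipFlushOnClose =>
        CloseReason is ClosedState.ReadError
            or ClosedState.WriteError
            or ClosedState.TLSHandshakeError;

    // Assumption: the first caller wins; later calls keep the original reason.
    public bool MarkClosed(ClosedState reason) =>
        Interlocked.CompareExchange(ref _closeReason, (int)reason, 0) == 0;
}
```

Recording the reason with a compare-and-swap means a later `ServerShutdown` close cannot overwrite an earlier `ReadError`, so the logged reason reflects what actually ended the connection.
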
### 2. Accept Loop Exponential Backoff

Port Go's `acceptError` pattern from `server.go:4607-4627`.

- Constants: `AcceptMinSleep = 10ms`, `AcceptMaxSleep = 1s`
- On `SocketException`: sleep for `tmpDelay`, then double it, capping at `AcceptMaxSleep` (1s)
- On success: reset `tmpDelay` to `AcceptMinSleep` (10ms)
- During sleep: observe `_quitCts` so the wait aborts if the server is shutting down
- Non-temporary errors break the loop

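The backoff rules listed above could look roughly like this. It is a sketch under assumptions: the `AcceptBackoff` class and its `Next` and `SleepAsync` helpers are hypothetical names, and only the two constants come from the design.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper; only the two constants come from the design.
public static class AcceptBackoff
{
    public static readonly TimeSpan AcceptMinSleep = TimeSpan.FromMilliseconds(10);
    public static readonly TimeSpan AcceptMaxSleep = TimeSpan.FromSeconds(1);

    // Doubles the delay after a failed accept, capped at AcceptMaxSleep.
    public static TimeSpan Next(TimeSpan current)
    {
        var doubled = current + current;
        return doubled > AcceptMaxSleep ? AcceptMaxSleep : doubled;
    }

    // Sleeps for the delay; returns false if shutdown was requested first.
    public static async Task<bool> SleepAsync(TimeSpan delay, CancellationToken quit)
    {
        try
        {
            await Task.Delay(delay, quit);
            return true;
        }
        catch (OperationCanceledException)
        {
            return false;
        }
    }
}
```

In the accept loop, a `SocketException` handler would await `SleepAsync(tmpDelay, _quitCts.Token)`, exit the loop on `false`, and otherwise set `tmpDelay = Next(tmpDelay)`; a successful accept resets `tmpDelay` to `AcceptMinSleep`.
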
### 3. Ephemeral Port (port=0)

After `_listener.Bind()` + `Listen()`, resolve the actual port:

```csharp
if (_options.Port == 0)
{
    // The OS picked the port; read it back from the bound socket.
    var actualPort = ((IPEndPoint)_listener.LocalEndPoint!).Port;
    _options.Port = actualPort;
    _serverInfo.Port = actualPort;
}
```

Add a public `Port` property on `NatsServer` exposing the resolved port.

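For reference, the same lookup can be tried in isolation with a plain `Socket`. This is a standalone sketch; `EphemeralPortDemo` is a hypothetical name and the backlog of 16 is arbitrary.

```csharp
using System.Net;
using System.Net.Sockets;

// Hypothetical standalone demo of resolving an OS-assigned port.
public static class EphemeralPortDemo
{
    // Binds a TCP socket to port 0 on loopback and returns the port the OS chose.
    public static int BindAndResolvePort()
    {
        using var listener = new Socket(AddressFamily.InterNetwork, SocketType.Stream, ProtocolType.Tcp);
        listener.Bind(new IPEndPoint(IPAddress.Loopback, 0));
        listener.Listen(16);
        return ((IPEndPoint)listener.LocalEndPoint!).Port;
    }
}
```
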
### 4. Graceful Shutdown with WaitForShutdown

New fields on `NatsServer`:
- `_shutdown` (volatile bool)
- `_shutdownComplete` (TaskCompletionSource)
- `_quitCts` (CancellationTokenSource): internal shutdown signal

`ShutdownAsync()` sequence:
1. Guard: if already shutting down, return. The check and the flag update in step 2 should happen as one atomic step, because concurrent callers (for example `Dispose()` and a signal handler) could otherwise both pass the guard
2. Set `_shutdown = true`, cancel `_quitCts`
3. Close `_listener` (stops accept loop)
4. Close all client connections with `ServerShutdown` reason
5. Wait for active client tasks to drain
6. Stop monitor server
7. Signal `_shutdownComplete`

`WaitForShutdown()`: blocks on `_shutdownComplete.Task`.

`Dispose()`: calls `ShutdownAsync` synchronously if not already shut down.

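A minimal sketch of the guard and completion signal follows. It makes several assumptions: `ShutdownCoordinator` is a hypothetical class, the flag is an `int` driven by `Interlocked.Exchange` rather than the volatile bool listed above, and the listener, client, and monitor steps are reduced to a placeholder counter.

```csharp
using System.Threading;
using System.Threading.Tasks;

// Hypothetical class showing only the guard, the quit signal, and completion.
public sealed class ShutdownCoordinator
{
    private int _shutdown; // 0 = running, 1 = shutting down (assumption: int, not bool)
    private readonly CancellationTokenSource _quitCts = new();
    private readonly TaskCompletionSource _shutdownComplete =
        new(TaskCreationOptions.RunContinuationsAsynchronously);

    public int CleanupRuns { get; private set; }
    public CancellationToken QuitToken => _quitCts.Token;

    public async Task ShutdownAsync()
    {
        // Atomic guard: only the first caller runs the shutdown sequence.
        if (Interlocked.Exchange(ref _shutdown, 1) == 1)
            return;

        _quitCts.Cancel();

        // Placeholder for closing the listener, clients, and monitor server.
        await Task.Yield();
        CleanupRuns++;

        _shutdownComplete.TrySetResult();
    }

    // Blocks the calling thread until shutdown has completed.
    public void WaitForShutdown() => _shutdownComplete.Task.GetAwaiter().GetResult();
}
```
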
### 5. Task Tracking

Track active client tasks for clean shutdown:
- `_activeClientCount` (int, updated via `Interlocked`)
- `_allClientsExited` (TaskCompletionSource, signaled when the count hits 0 during shutdown)
- Increment in `AcceptClientAsync`, decrement in the `RunClientAsync` finally block
- `ShutdownAsync` waits on `_allClientsExited` with a timeout

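One way the counter and the completion source might fit together is sketched below. `ClientTracker`, `ClientStarted`, `ClientExited`, and `DrainAsync` are hypothetical names, and the `_shuttingDown` flag is an assumption added so the signal only fires during shutdown.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical tracker; field names mirror the design, methods are illustrative.
public sealed class ClientTracker
{
    private int _activeClientCount;
    private int _shuttingDown; // assumption: set once draining begins
    private readonly TaskCompletionSource _allClientsExited =
        new(TaskCreationOptions.RunContinuationsAsynchronously);

    // Called when a client is accepted.
    public void ClientStarted() => Interlocked.Increment(ref _activeClientCount);

    // Called from the client's finally block.
    public void ClientExited()
    {
        if (Interlocked.Decrement(ref _activeClientCount) == 0 &&
            Volatile.Read(ref _shuttingDown) == 1)
        {
            _allClientsExited.TrySetResult();
        }
    }

    // Returns true if every client exited before the timeout elapsed.
    public async Task<bool> DrainAsync(TimeSpan timeout)
    {
        // Full fence so a concurrent ClientExited sees the flag or we see its decrement.
        Interlocked.Exchange(ref _shuttingDown, 1);
        if (Volatile.Read(ref _activeClientCount) == 0)
            _allClientsExited.TrySetResult();

        var finished = await Task.WhenAny(_allClientsExited.Task, Task.Delay(timeout));
        return finished == _allClientsExited.Task;
    }
}
```
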
### 6. Flush Pending Data Before Close

`NatsClient.FlushAndCloseAsync(bool minimalFlush)`:
- Unless the close reason is a skip-flush reason, flush the stream with a 100ms write deadline
- Close the socket

`MarkClosed(ClosedState)` sets the skip-flush flag for: ReadError, WriteError, SlowConsumerPendingBytes, SlowConsumerWriteDeadline, TLSHandshakeError.

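The flush-then-close step might be sketched as below. This is an illustration under assumptions: `FlushHelper` is a hypothetical name, the method takes a `skipFlush` flag rather than the `minimalFlush` parameter named above, and it assumes pending data sits in a buffered `Stream` whose `FlushAsync` honors cancellation.

```csharp
using System;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper; assumes a buffered stream wraps the socket.
public static class FlushHelper
{
    public static readonly TimeSpan FlushDeadline = TimeSpan.FromMilliseconds(100);

    // Flushes unless skipFlush is set, gives up after FlushDeadline,
    // then disposes the stream. Returns true if the flush completed.
    public static async Task<bool> FlushAndCloseAsync(Stream stream, bool skipFlush)
    {
        var flushed = false;
        try
        {
            if (!skipFlush)
            {
                using var cts = new CancellationTokenSource(FlushDeadline);
                await stream.FlushAsync(cts.Token);
                flushed = true;
            }
        }
        catch (OperationCanceledException)
        {
            // Deadline hit: close anyway.
        }
        catch (IOException)
        {
            // Peer already gone: close anyway.
        }
        finally
        {
            await stream.DisposeAsync();
        }
        return flushed;
    }
}
```
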
### 7. Lame Duck Mode

New options: `LameDuckDuration` (default 2min), `LameDuckGracePeriod` (default 10s).

`LameDuckShutdownAsync()`:
1. Set `_lameDuckMode = true`
2. Close listener (stop new connections)
3. Wait `LameDuckGracePeriod` (10s default) for clients to drain naturally
4. Stagger-close remaining clients over `LameDuckDuration - LameDuckGracePeriod`
   - Sleep interval = remaining duration / client count (min 1ms, max 1s)
   - Randomize slightly to avoid reconnect storms
5. Call `ShutdownAsync()` for final cleanup

Accept loop: on error, if `_lameDuckMode`, exit cleanly.

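The interval arithmetic in step 4 can be worked through with a small sketch. `LameDuckSchedule` is a hypothetical helper, and the jitter rule (a uniform pick between half the interval and the full interval) is an assumption, since the design only says to randomize slightly.

```csharp
using System;

// Hypothetical helper for the stagger-close timing.
public static class LameDuckSchedule
{
    private static readonly TimeSpan MinInterval = TimeSpan.FromMilliseconds(1);
    private static readonly TimeSpan MaxInterval = TimeSpan.FromSeconds(1);

    // Remaining time divided by client count, clamped to [1ms, 1s].
    public static TimeSpan BaseInterval(TimeSpan duration, TimeSpan gracePeriod, int clientCount)
    {
        if (clientCount <= 0) return TimeSpan.Zero;

        var remaining = duration - gracePeriod;
        if (remaining < TimeSpan.Zero) remaining = TimeSpan.Zero;

        var interval = TimeSpan.FromTicks(remaining.Ticks / clientCount);
        if (interval < MinInterval) return MinInterval;
        if (interval > MaxInterval) return MaxInterval;
        return interval;
    }

    // Assumption: jitter picks uniformly from [interval / 2, interval].
    public static TimeSpan Jitter(TimeSpan interval, Random rng)
    {
        var half = interval.Ticks / 2;
        return TimeSpan.FromTicks(half + rng.NextInt64(half + 1));
    }
}
```

With the defaults (2min duration, 10s grace period), 1000 remaining clients give 110s / 1000 = 110ms between closes, while 10 clients give 11s, which is clamped to the 1s maximum.
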
### 8. PID File & Ports File

New options: `PidFile` (string?), `PortsFileDir` (string?).

- PID file: `File.WriteAllText(pidFile, Process.GetCurrentProcess().Id.ToString())`
- Ports file: JSON with `{ "client": port, "monitor": monitorPort }` written to `{dir}/{exe}_{pid}.ports`

Both files are written at startup and deleted at shutdown.

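Writing and removing the two files might look like the sketch below. `ServerFiles` and its methods are hypothetical names, the `"nats-server"` fallback for the executable name is an assumption, and `Environment.ProcessId` is used as a lighter equivalent of `Process.GetCurrentProcess().Id`.

```csharp
using System;
using System.IO;
using System.Text.Json;

// Hypothetical helper for the PID and ports files.
public static class ServerFiles
{
    public static void WritePidFile(string pidFile) =>
        File.WriteAllText(pidFile, Environment.ProcessId.ToString());

    // Writes {dir}/{exe}_{pid}.ports and returns the full path.
    public static string WritePortsFile(string dir, int clientPort, int monitorPort)
    {
        // Assumption: fall back to a fixed name if the process path is unavailable.
        var exe = Path.GetFileNameWithoutExtension(Environment.ProcessPath) ?? "nats-server";
        var path = Path.Combine(dir, $"{exe}_{Environment.ProcessId}.ports");
        var json = JsonSerializer.Serialize(new { client = clientPort, monitor = monitorPort });
        File.WriteAllText(path, json);
        return path;
    }

    // Called at shutdown; File.Delete does not throw if the file is already gone.
    public static void DeleteIfExists(string? path)
    {
        if (!string.IsNullOrEmpty(path)) File.Delete(path);
    }
}
```
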
### 9. Signal Handling

In `Program.cs`, use `PosixSignalRegistration` (.NET 6+):

- `SIGTERM` → `server.ShutdownAsync()` then exit
- `SIGUSR2` → `server.LameDuckShutdownAsync()`
- `SIGUSR1` → log "log reopen not yet supported"
- `SIGHUP` → log "config reload not yet supported"

The `PosixSignal` enum has no named members for `SIGUSR1` or `SIGUSR2`; on Unix they can be registered by casting the platform's raw signal number to `PosixSignal`.

Keep existing Ctrl+C handler (SIGINT).

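A possible registration helper is sketched below, with several assumptions: `SignalSetup` and its callback parameters are hypothetical, the raw numbers 10 and 12 are the Linux x86/ARM values for `SIGUSR1` and `SIGUSR2` (other platforms differ, for example macOS uses 30 and 31), and every handler sets `Cancel = true` so the runtime does not apply the signal's default action afterwards.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.InteropServices;

// Hypothetical helper wiring the four signals to callbacks.
public static class SignalSetup
{
    // Assumption: raw signal numbers for Linux on x86/ARM.
    private const int LinuxSigUsr1 = 10;
    private const int LinuxSigUsr2 = 12;

    // Keep the returned registrations alive; disposing them unregisters the handlers.
    public static IReadOnlyList<IDisposable> Register(
        Action onTerminate, Action onLameDuck, Action<string> log)
    {
        var registrations = new List<IDisposable>
        {
            PosixSignalRegistration.Create(PosixSignal.SIGTERM, ctx =>
            {
                ctx.Cancel = true; // the callback decides when the process exits
                onTerminate();
            }),
            PosixSignalRegistration.Create(PosixSignal.SIGHUP, ctx =>
            {
                ctx.Cancel = true;
                log("config reload not yet supported");
            }),
        };

        // Raw signal values are only valid on Unix; the numbers below are Linux-specific.
        if (OperatingSystem.IsLinux())
        {
            registrations.Add(PosixSignalRegistration.Create((PosixSignal)LinuxSigUsr2, ctx =>
            {
                ctx.Cancel = true;
                onLameDuck();
            }));
            registrations.Add(PosixSignalRegistration.Create((PosixSignal)LinuxSigUsr1, ctx =>
            {
                ctx.Cancel = true;
                log("log reopen not yet supported");
            }));
        }

        return registrations;
    }
}
```

Handlers run on a thread-pool thread, so `onTerminate` and `onLameDuck` would typically start the async shutdown method and let `Program.cs` exit once `WaitForShutdown()` returns.
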
### 10. Server Identity NKey (Stub)

Generate an Ed25519 key pair at construction. Store it as `ServerNKey` (public) and `_serverSeed` (private). Not used in the protocol yet; it is a placeholder for future cluster identity.

### 11. System Account (Stub)

Create the `$SYS` account in `_accounts` at construction. Expose it as the `SystemAccount` property. No internal subscriptions yet.

### 12. Config File & Profiling (Stubs)

- `NatsOptions.ConfigFile`: if set, log warning "config file parsing not yet supported"
- `NatsOptions.ProfPort`: if set, log warning "profiling endpoint not yet supported"
- `Program.cs`: add `-c` CLI flag for the config file path

## Testing

- Accept loop backoff: mock socket that throws N times, verify delays
- Ephemeral port: start server with port=0, verify resolved port > 0
- Graceful shutdown: start server, connect clients, call `ShutdownAsync`, verify all disconnected
- WaitForShutdown: verify it blocks until shutdown completes
- Close reason tracking: verify correct `ClosedState` for auth timeout, max connections, stale connection
- Lame duck mode: start server, connect clients, trigger lame duck, verify staggered closure
- PID file: start server with `PidFile` option, verify file contents, verify deleted on shutdown
- Ports file: start server with `PortsFileDir`, verify JSON contents
- Flush before close: verify data is flushed before socket close during shutdown
- System account: verify `$SYS` account exists after construction