natsdotnet/docs/plans/2026-02-22-core-lifecycle-design.md
Joseph Doherty c2dc503e2e docs: add core server lifecycle design for section 1 gaps
Covers ClosedState enum, accept loop backoff, ephemeral port,
graceful shutdown, lame duck mode, PID/ports files, signal
handling, and stub components.
2026-02-22 23:25:53 -05:00

Core Server Lifecycle — Design

Closes all gaps listed in section 1 of differences.md (Core Server Lifecycle).

Reference: golang/nats-server/server/server.go, client.go, signal.go

Components

1. ClosedState Enum & Close Reason Tracking

New file src/NATS.Server/ClosedState.cs — full Go enum (37 values from client.go:188-228).

  • NatsClient gets CloseReason property, MarkClosed(ClosedState) method
  • Close reason set in RunAsync finally blocks based on exception type
  • Error-related reasons (e.g. ReadError, WriteError, TLSHandshakeError; full list in component 6) skip flush on close
  • NatsServer.RemoveClient logs close reason via structured logging

2. Accept Loop Exponential Backoff

Port Go's acceptError pattern from server.go:4607-4627.

  • Constants: AcceptMinSleep = 10ms, AcceptMaxSleep = 1s
  • On SocketException: sleep tmpDelay, double it, cap at 1s
  • On success: reset to 10ms
  • During sleep: check _quitCts to abort if shutting down
  • Non-temporary errors break the loop
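
The backoff rule above can be sketched as a small helper. This is a minimal illustration, not the ported code: the class name AcceptBackoff and the SleepAsync helper are hypothetical, and only the two constants and the doubling/cap behaviour come from this design.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper; only the constants and the doubling rule come from the design.
public static class AcceptBackoff
{
    public static readonly TimeSpan AcceptMinSleep = TimeSpan.FromMilliseconds(10);
    public static readonly TimeSpan AcceptMaxSleep = TimeSpan.FromSeconds(1);

    // Doubles the current delay and caps it at AcceptMaxSleep.
    public static TimeSpan Next(TimeSpan current)
    {
        var doubled = TimeSpan.FromTicks(current.Ticks * 2);
        return doubled > AcceptMaxSleep ? AcceptMaxSleep : doubled;
    }

    // Sleeps for the given delay; returns false if the quit token fired first,
    // which tells the accept loop to stop instead of retrying.
    public static async Task<bool> SleepAsync(TimeSpan delay, CancellationToken quit)
    {
        try
        {
            await Task.Delay(delay, quit);
            return true;
        }
        catch (OperationCanceledException)
        {
            return false;
        }
    }
}
```

In the accept loop, the delay would start at AcceptMinSleep, be passed through Next after each temporary failure, and be reset to AcceptMinSleep after a successful accept; _quitCts.Token would be the token handed to SleepAsync.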

3. Ephemeral Port (port=0)

After _listener.Bind() + Listen(), resolve actual port:

if (_options.Port == 0)
{
    var actualPort = ((IPEndPoint)_listener.LocalEndPoint!).Port;
    _options.Port = actualPort;
    _serverInfo.Port = actualPort;
}

Add public Port property on NatsServer exposing the resolved port.
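
For reference, a standalone sketch of the same resolution step outside NatsServer. The class and method names are hypothetical, and binding to loopback is an assumption made only to keep the example local.

```csharp
using System;
using System.Net;
using System.Net.Sockets;

// Hypothetical demo class; it only shows that LocalEndPoint carries the
// OS-assigned port once a socket has been bound to port 0.
public static class EphemeralPortDemo
{
    public static int BindAndResolvePort()
    {
        using var listener = new Socket(AddressFamily.InterNetwork, SocketType.Stream, ProtocolType.Tcp);
        listener.Bind(new IPEndPoint(IPAddress.Loopback, 0)); // port 0 = let the OS choose
        listener.Listen(16);

        if (listener.LocalEndPoint is IPEndPoint endpoint)
            return endpoint.Port;

        throw new InvalidOperationException("Listener has no IP endpoint after Bind.");
    }
}
```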

4. Graceful Shutdown with WaitForShutdown

New fields on NatsServer:

  • _shutdown (volatile bool)
  • _shutdownComplete (TaskCompletionSource)
  • _quitCts (CancellationTokenSource) — internal shutdown signal

ShutdownAsync() sequence:

  1. Guard: if already shutting down, return
  2. Set _shutdown = true, cancel _quitCts
  3. Close _listener (stops accept loop)
  4. Close all client connections with ServerShutdown reason
  5. Wait for active client tasks to drain
  6. Stop monitor server
  7. Signal _shutdownComplete

WaitForShutdown(): blocks on _shutdownComplete.Task.

Dispose(): calls ShutdownAsync synchronously if not already shut down.
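
A reduced sketch of the guard and completion signal, under stated assumptions: ShutdownSketch is a hypothetical stand-in for NatsServer, steps 3 to 6 are replaced by a placeholder, and the flag is an int driven by Interlocked.Exchange rather than the volatile bool listed above, so that two concurrent callers cannot both pass the guard. The wait method is shown as a Task-returning variant; a blocking WaitForShutdown() could simply wait on the same task.

```csharp
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for NatsServer, reduced to the shutdown fields.
public sealed class ShutdownSketch
{
    private int _shutdown; // 0 = running, 1 = shutting down (assumption: int instead of volatile bool)
    private readonly CancellationTokenSource _quitCts = new();
    private readonly TaskCompletionSource _shutdownComplete =
        new(TaskCreationOptions.RunContinuationsAsynchronously);

    // Counts how often the cleanup body ran; exists only for illustration.
    public int CleanupRuns { get; private set; }

    public bool QuitRequested => _quitCts.IsCancellationRequested;

    public async Task ShutdownAsync()
    {
        // Step 1: only the first caller proceeds.
        if (Interlocked.Exchange(ref _shutdown, 1) == 1) return;

        // Step 2: signal the accept loop and other background work.
        _quitCts.Cancel();

        // Placeholder for steps 3-6 (close listener, close clients, drain, stop monitor).
        await Task.Yield();
        CleanupRuns++;

        // Step 7: release anyone waiting for shutdown.
        _shutdownComplete.TrySetResult();
    }

    public Task WaitForShutdownAsync() => _shutdownComplete.Task;
}
```

Note that in this sketch a second caller returns immediately rather than waiting for the first shutdown to finish; callers that need to wait would await WaitForShutdownAsync().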

5. Task Tracking

Track active client tasks for clean shutdown:

  • _activeClientCount (int, Interlocked)
  • _allClientsExited (TaskCompletionSource, signaled when count hits 0 during shutdown)
  • Increment in AcceptClientAsync, decrement in RunClientAsync finally block
  • ShutdownAsync waits on _allClientsExited with timeout
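
One possible shape for the counter and drain signal, as a sketch only. ClientTracker and its method names are hypothetical; the field names follow the list above, and the timeout is implemented with Task.WhenAny, which is an assumption about how the wait would be bounded.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper that isolates the counting logic from NatsServer.
public sealed class ClientTracker
{
    private int _activeClientCount;
    private volatile bool _shutdown;
    private readonly TaskCompletionSource _allClientsExited =
        new(TaskCreationOptions.RunContinuationsAsynchronously);

    // Would be called from AcceptClientAsync.
    public void ClientStarted() => Interlocked.Increment(ref _activeClientCount);

    // Would be called from the finally block of RunClientAsync.
    public void ClientExited()
    {
        if (Interlocked.Decrement(ref _activeClientCount) == 0 && _shutdown)
            _allClientsExited.TrySetResult();
    }

    // Returns true if every client exited before the timeout elapsed.
    public async Task<bool> DrainAsync(TimeSpan timeout)
    {
        _shutdown = true;

        // Covers the case where the last client left before shutdown began.
        if (Volatile.Read(ref _activeClientCount) == 0)
            _allClientsExited.TrySetResult();

        var finished = await Task.WhenAny(_allClientsExited.Task, Task.Delay(timeout));
        return finished == _allClientsExited.Task;
    }
}
```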

6. Flush Pending Data Before Close

NatsClient.FlushAndCloseAsync(bool minimalFlush):

  • If the close reason is not a skip-flush reason: flush the stream with a 100ms write deadline
  • Close socket

MarkClosed(ClosedState) sets skip-flush flag for: ReadError, WriteError, SlowConsumerPendingBytes, SlowConsumerWriteDeadline, TLSHandshakeError.
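
A minimal sketch of how MarkClosed could record the reason and derive the skip-flush flag. Several things here are assumptions for illustration: the enum is a small subset rather than the full Go list, the None sentinel and the CloseTracker class are hypothetical, and "first reason wins" is one possible policy that this design does not spell out.

```csharp
using System.Threading;

// Illustrative subset only; the real enum mirrors the Go ClosedState list.
// None is a hypothetical sentinel meaning "not closed yet".
public enum ClosedState
{
    None,
    ClientClosed,
    ReadError,
    WriteError,
    SlowConsumerPendingBytes,
    SlowConsumerWriteDeadline,
    TLSHandshakeError,
    ServerShutdown
}

// Hypothetical holder for the close-reason state of one client.
public sealed class CloseTracker
{
    private int _reason; // stored as int so Interlocked can be used

    public ClosedState CloseReason => (ClosedState)Volatile.Read(ref _reason);
    public bool SkipFlushOnClose { get; private set; }

    // Records the first close reason; later calls are ignored (assumption).
    public bool MarkClosed(ClosedState reason)
    {
        var previous = Interlocked.CompareExchange(ref _reason, (int)reason, (int)ClosedState.None);
        if (previous != (int)ClosedState.None) return false;

        // The connection is already broken or too slow, so flushing would only stall.
        SkipFlushOnClose = reason is ClosedState.ReadError
            or ClosedState.WriteError
            or ClosedState.SlowConsumerPendingBytes
            or ClosedState.SlowConsumerWriteDeadline
            or ClosedState.TLSHandshakeError;
        return true;
    }
}
```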

7. Lame Duck Mode

New options: LameDuckDuration (default 2min), LameDuckGracePeriod (default 10s).

LameDuckShutdownAsync():

  1. Set _lameDuckMode = true
  2. Close listener (stop new connections)
  3. Wait LameDuckGracePeriod (10s default) for clients to drain naturally
  4. Stagger-close remaining clients over LameDuckDuration - LameDuckGracePeriod
    • Sleep interval = remaining duration / client count (min 1ms, max 1s)
    • Randomize slightly to avoid reconnect storms
  5. Call ShutdownAsync() for final cleanup

Accept loop: on error, if _lameDuckMode, exit cleanly.
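
The pacing in step 4 can be worked through with a small helper. This is a sketch under assumptions: LameDuckPacing is a hypothetical name, and the jitter rule (a uniform pick between half the interval and the full interval) is one plausible reading of "randomize slightly", not something this design or the Go reference is quoted as specifying. Random.NextInt64 requires .NET 6 or later.

```csharp
using System;

// Hypothetical helper for the stagger-close interval.
public static class LameDuckPacing
{
    private static readonly TimeSpan MinInterval = TimeSpan.FromMilliseconds(1);
    private static readonly TimeSpan MaxInterval = TimeSpan.FromSeconds(1);

    // Interval between client closes: (duration - grace) / clientCount, clamped to [1ms, 1s].
    public static TimeSpan CloseInterval(TimeSpan lameDuckDuration, TimeSpan gracePeriod, int clientCount)
    {
        if (clientCount <= 0) return TimeSpan.Zero;

        var remaining = lameDuckDuration - gracePeriod;
        if (remaining < TimeSpan.Zero) remaining = TimeSpan.Zero;

        var interval = TimeSpan.FromTicks(remaining.Ticks / clientCount);
        if (interval < MinInterval) return MinInterval;
        if (interval > MaxInterval) return MaxInterval;
        return interval;
    }

    // Assumed jitter rule: uniform in [interval / 2, interval].
    public static TimeSpan WithJitter(TimeSpan interval, Random rng)
    {
        long half = interval.Ticks / 2;
        return TimeSpan.FromTicks(half + rng.NextInt64(interval.Ticks - half + 1));
    }
}
```

With the defaults (2min duration, 10s grace period), 1,000 remaining clients give 110s / 1000 = 110ms between closes; 10 clients would give 11s, which is clamped to 1s.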

8. PID File & Ports File

New options: PidFile (string?), PortsFileDir (string?).

PID file: File.WriteAllText(pidFile, Process.GetCurrentProcess().Id.ToString())

Ports file: JSON with { "client": port, "monitor": monitorPort } written to {dir}/{exe}_{pid}.ports

Written at startup, deleted at shutdown.
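
A sketch of the write and delete steps, assuming a hypothetical LifecycleFiles helper. It uses Environment.ProcessId (.NET 5+) as a shorter equivalent of Process.GetCurrentProcess().Id, and takes the executable name as a parameter because this design does not say how {exe} is obtained.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text.Json;

// Hypothetical helper; file names and JSON keys follow the description above.
public static class LifecycleFiles
{
    public static void WritePidFile(string pidFile) =>
        File.WriteAllText(pidFile, Environment.ProcessId.ToString());

    // Writes {dir}/{exe}_{pid}.ports and returns the full path so it can be deleted later.
    public static string WritePortsFile(string dir, string exeName, int clientPort, int monitorPort)
    {
        var path = Path.Combine(dir, $"{exeName}_{Environment.ProcessId}.ports");
        var ports = new Dictionary<string, int>
        {
            ["client"] = clientPort,
            ["monitor"] = monitorPort
        };
        File.WriteAllText(path, JsonSerializer.Serialize(ports));
        return path;
    }

    // Called at shutdown; missing files are ignored.
    public static void DeleteIfPresent(params string[] paths)
    {
        foreach (var path in paths)
        {
            if (File.Exists(path)) File.Delete(path);
        }
    }
}
```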

9. Signal Handling

In Program.cs, use PosixSignalRegistration (.NET 6+):

  • SIGTERM → server.ShutdownAsync() then exit
  • SIGUSR2 → server.LameDuckShutdownAsync()
  • SIGUSR1 → log "log reopen not yet supported"
  • SIGHUP → log "config reload not yet supported"

Keep existing Ctrl+C handler (SIGINT).

10. Server Identity NKey (Stub)

Generate Ed25519 key pair at construction. Store as ServerNKey (public) and _serverSeed (private). Not used in protocol yet — placeholder for future cluster identity.

11. System Account (Stub)

Create $SYS account in _accounts at construction. Expose as SystemAccount property. No internal subscriptions yet.

12. Config File & Profiling (Stubs)

  • NatsOptions.ConfigFile — if set, log warning "config file parsing not yet supported"
  • NatsOptions.ProfPort — if set, log warning "profiling endpoint not yet supported"
  • Program.cs: add -c CLI flag

Testing

  • Accept loop backoff: mock socket that throws N times, verify delays
  • Ephemeral port: start server with port=0, verify resolved port > 0
  • Graceful shutdown: start server, connect clients, call ShutdownAsync, verify all disconnected
  • WaitForShutdown: verify it blocks until shutdown completes
  • Close reason tracking: verify correct ClosedState for auth timeout, max connections, stale connection
  • Lame duck mode: start server, connect clients, trigger lame duck, verify staggered closure
  • PID file: start server with PidFile option, verify file contents, verify deleted on shutdown
  • Ports file: start server with PortsFileDir, verify JSON contents
  • Flush before close: verify data is flushed before socket close during shutdown
  • System account: verify $SYS account exists after construction