feat: add structured logging, Shouldly assertions, CPM, and project documentation

- Add Microsoft.Extensions.Logging + Serilog to NatsServer and NatsClient
- Convert all test assertions from xUnit Assert to Shouldly
- Add NSubstitute package for future mocking needs
- Introduce Central Package Management via Directory.Packages.props
- Add documentation_rules.md with style guide, generation/update rules, component map
- Generate 10 documentation files across 5 component folders (GettingStarted, Protocol, Subscriptions, Server, Configuration/Operations)
- Update CLAUDE.md with logging, testing, porting, agent model, CPM, and documentation guidance
Joseph Doherty
2026-02-22 21:05:53 -05:00
parent b9f4dec523
commit 539b2b7588
25 changed files with 2734 additions and 110 deletions

@@ -0,0 +1,151 @@
# Protocol Overview
NATS uses a line-oriented, text-based protocol over TCP. All commands are terminated by `\r\n`. This simplicity makes it easy to debug with raw TCP tools and keeps parsing overhead low.
## Command Reference
Most commands travel in one direction only, either server to client (S→C) or client to server (C→S). PING and PONG travel in both directions as part of the keepalive mechanism.
| Command | Direction | Format |
|---------|-----------|--------|
| INFO | S→C | `INFO {json}\r\n` |
| CONNECT | C→S | `CONNECT {json}\r\n` |
| PUB | C→S | `PUB subject [reply] size\r\n[payload]\r\n` |
| HPUB | C→S | `HPUB subject [reply] hdr_size total_size\r\n[headers+payload]\r\n` |
| SUB | C→S | `SUB subject [queue] sid\r\n` |
| UNSUB | C→S | `UNSUB sid [max_msgs]\r\n` |
| MSG | S→C | `MSG subject sid [reply] size\r\n[payload]\r\n` |
| HMSG | S→C | `HMSG subject sid [reply] hdr_size total_size\r\n[headers+payload]\r\n` |
| PING/PONG | Both | `PING\r\n` / `PONG\r\n` |
| +OK/-ERR | S→C | `+OK\r\n` / `-ERR 'msg'\r\n` |
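Arguments in square brackets are optional, except that `[payload]` and `[headers+payload]` stand for the raw body bytes, which are always present (a body may be zero bytes long). `sid` is the subscription ID string assigned by the client. For HPUB and HMSG, `hdr_size` is the length of the header block and `total_size` is the length of headers plus payload, both in bytes. Commands with a payload body (PUB, HPUB, MSG, HMSG) use a two-line structure: a control line carrying the sizes, then the raw payload bytes followed by a terminating `\r\n`.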
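To make the framing concrete, the sketch below assembles a PUB command by hand. `PubFrame.Build` is a hypothetical helper written for this page, not part of the codebase. The point it illustrates is that the size field counts payload bytes, so it has to be computed from the encoded payload rather than from a string's character count.

```csharp
using System;
using System.Text;

// Hypothetical helper (not part of the codebase) that frames a PUB command.
public static class PubFrame
{
    public static byte[] Build(string subject, string? replyTo, ReadOnlySpan<byte> payload)
    {
        // Control line: the size field is the payload length in bytes, not characters.
        string controlLine = replyTo is null
            ? $"PUB {subject} {payload.Length}\r\n"
            : $"PUB {subject} {replyTo} {payload.Length}\r\n";
        byte[] head = Encoding.ASCII.GetBytes(controlLine);

        // Layout: control line, raw payload bytes, terminating CRLF.
        byte[] frame = new byte[head.Length + payload.Length + 2];
        head.CopyTo(frame, 0);
        payload.CopyTo(frame.AsSpan(head.Length));
        frame[^2] = (byte)'\r';
        frame[^1] = (byte)'\n';
        return frame;
    }
}
```

For example, the UTF-8 payload `héllo` is five characters but six bytes, so its control line must declare `6`.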
## Connection Handshake
The handshake is always server-initiated:
1. Client opens TCP connection.
2. Server immediately sends `INFO {json}\r\n` describing itself.
3. Client sends `CONNECT {json}\r\n` with its options.
4. Normal operation begins (PUB, SUB, MSG, PING/PONG, etc.).
If `verbose` is enabled in `ClientOptions`, the server sends `+OK` after each valid client command. If the server rejects the CONNECT (bad credentials, unsupported protocol version, etc.) it sends `-ERR 'message'\r\n` and closes the connection.
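The client's half of the handshake can be sketched as follows: strip the `INFO ` prefix and the trailing `\r\n`, read the server's JSON, and answer with a CONNECT line. `Handshake.BuildConnect` is hypothetical and sends only a few of the `ClientOptions` fields. Requesting headers only when the server advertises them is an illustrative choice, not something this codebase is known to do.

```csharp
using System;
using System.Text.Json;

// Hypothetical client-side handshake step: read the server's INFO line and
// answer with a CONNECT line. Only a few option fields are sent.
public static class Handshake
{
    public static string BuildConnect(string infoLine)
    {
        const string prefix = "INFO ";
        if (!infoLine.StartsWith(prefix, StringComparison.OrdinalIgnoreCase) || !infoLine.EndsWith("\r\n"))
            throw new FormatException("Expected INFO {json}\\r\\n");

        // The JSON object sits between "INFO " and the trailing CRLF.
        string json = infoLine[prefix.Length..^2];
        using JsonDocument info = JsonDocument.Parse(json);

        // Ask for headers only if the server says it supports HPUB/HMSG.
        bool serverHeaders = info.RootElement.TryGetProperty("headers", out JsonElement h)
            && h.ValueKind == JsonValueKind.True;

        var options = new { verbose = false, pedantic = false, echo = true, headers = serverHeaders };
        return "CONNECT " + JsonSerializer.Serialize(options) + "\r\n";
    }
}
```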
## ServerInfo
The `ServerInfo` JSON payload is sent in the initial INFO message. `ClientId` is left out of the JSON when it is `0` (its default) and `ClientIp` when it is `null`, as specified by their `JsonIgnore` conditions.
```csharp
using System.Text.Json.Serialization;

public sealed class ServerInfo
{
    [JsonPropertyName("server_id")]
    public required string ServerId { get; set; }

    [JsonPropertyName("server_name")]
    public required string ServerName { get; set; }

    [JsonPropertyName("version")]
    public required string Version { get; set; }

    [JsonPropertyName("proto")]
    public int Proto { get; set; } = NatsProtocol.ProtoVersion;

    [JsonPropertyName("host")]
    public required string Host { get; set; }

    [JsonPropertyName("port")]
    public int Port { get; set; }

    [JsonPropertyName("headers")]
    public bool Headers { get; set; } = true;

    [JsonPropertyName("max_payload")]
    public int MaxPayload { get; set; } = NatsProtocol.MaxPayloadSize;

    [JsonPropertyName("client_id")]
    [JsonIgnore(Condition = JsonIgnoreCondition.WhenWritingDefault)]
    public ulong ClientId { get; set; }

    [JsonPropertyName("client_ip")]
    [JsonIgnore(Condition = JsonIgnoreCondition.WhenWritingNull)]
    public string? ClientIp { get; set; }
}
```
`headers` signals that the server supports HPUB/HMSG. `max_payload` advertises the largest message body the server accepts (default 1 MB). `proto` is the protocol version integer; the current value is `1`.
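The effect of the two `JsonIgnore` conditions can be checked with a trimmed stand-in. `InfoSketch` and `InfoLine` are hypothetical names used only here. They keep just the fields needed to show that `client_id` and `client_ip` disappear from the INFO line while they hold their defaults.

```csharp
using System.Text.Json;
using System.Text.Json.Serialization;

// Hypothetical, trimmed stand-in for ServerInfo.
public sealed class InfoSketch
{
    [JsonPropertyName("server_id")]
    public string ServerId { get; set; } = "";

    // Omitted while it equals default(ulong), i.e. 0.
    [JsonPropertyName("client_id")]
    [JsonIgnore(Condition = JsonIgnoreCondition.WhenWritingDefault)]
    public ulong ClientId { get; set; }

    // Omitted while it is null.
    [JsonPropertyName("client_ip")]
    [JsonIgnore(Condition = JsonIgnoreCondition.WhenWritingNull)]
    public string? ClientIp { get; set; }
}

public static class InfoLine
{
    // Frames the serialized object as INFO {json}\r\n.
    public static string Build(InfoSketch info) =>
        "INFO " + JsonSerializer.Serialize(info) + "\r\n";
}
```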
## ClientOptions
The `ClientOptions` JSON payload is sent by the client in the CONNECT command.
```csharp
using System.Text.Json.Serialization;

public sealed class ClientOptions
{
    [JsonPropertyName("verbose")]
    public bool Verbose { get; set; }

    [JsonPropertyName("pedantic")]
    public bool Pedantic { get; set; }

    [JsonPropertyName("echo")]
    public bool Echo { get; set; } = true;

    [JsonPropertyName("name")]
    public string? Name { get; set; }

    [JsonPropertyName("lang")]
    public string? Lang { get; set; }

    [JsonPropertyName("version")]
    public string? Version { get; set; }

    [JsonPropertyName("protocol")]
    public int Protocol { get; set; }

    [JsonPropertyName("headers")]
    public bool Headers { get; set; }

    [JsonPropertyName("no_responders")]
    public bool NoResponders { get; set; }
}
```
`echo` defaults to `true`, meaning a client receives its own published messages if it has a matching subscription. Setting `echo` to `false` suppresses that. `no_responders` causes the server to send a status message when a request has no subscribers, rather than letting the client time out.
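Because `Echo` has a property initializer, a CONNECT payload that omits `echo` still leaves it `true` after deserialization with `System.Text.Json`. The sketch below demonstrates this with `ConnectSketch`, a hypothetical trimmed copy of `ClientOptions`. How the server actually deserializes CONNECT is not shown on this page, so treat the `ConnectParser` helper as an assumption.

```csharp
using System;
using System.Text.Json;
using System.Text.Json.Serialization;

// Hypothetical, trimmed copy of ClientOptions.
public sealed class ConnectSketch
{
    [JsonPropertyName("verbose")]
    public bool Verbose { get; set; }

    // The initializer runs before deserialization, so a missing field stays true.
    [JsonPropertyName("echo")]
    public bool Echo { get; set; } = true;

    [JsonPropertyName("no_responders")]
    public bool NoResponders { get; set; }
}

public static class ConnectParser
{
    // Accepts the raw JSON bytes that follow "CONNECT " on the control line.
    public static ConnectSketch Parse(ReadOnlySpan<byte> json) =>
        JsonSerializer.Deserialize<ConnectSketch>(json)
        ?? throw new FormatException("CONNECT payload was JSON null");
}
```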
## Protocol Constants
`NatsProtocol` centralises limits and pre-encoded byte arrays to avoid repeated allocations in the hot path.
```csharp
public static class NatsProtocol
{
    public const int MaxControlLineSize = 4096;
    public const int MaxPayloadSize = 1024 * 1024; // 1MB
    public const int DefaultPort = 4222;
    public const int ProtoVersion = 1; // advertised as "proto" in INFO

    // Pre-encoded protocol fragments
    public static readonly byte[] CrLf = "\r\n"u8.ToArray();
    public static readonly byte[] PingBytes = "PING\r\n"u8.ToArray();
    public static readonly byte[] PongBytes = "PONG\r\n"u8.ToArray();
    public static readonly byte[] OkBytes = "+OK\r\n"u8.ToArray();
    public static readonly byte[] InfoPrefix = "INFO "u8.ToArray();
    public static readonly byte[] MsgPrefix = "MSG "u8.ToArray();
    public static readonly byte[] HmsgPrefix = "HMSG "u8.ToArray();
    public static readonly byte[] ErrPrefix = "-ERR "u8.ToArray();
}
```
`MaxControlLineSize` (4096 bytes) is the maximum length of a command line before the payload. Any control line that exceeds this limit causes the parser to throw `ProtocolViolationException`. `MaxPayloadSize` (1 MB) is the default limit enforced by the parser; it is configurable per server instance.
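Here is a minimal sketch of where such a limit check can sit when scanning buffered bytes for the end of a control line. `ControlLine.Measure` is hypothetical. It assumes the limit applies to the line without its `\r\n` terminator (the codebase may count differently) and uses the BCL's `System.Net.ProtocolViolationException`, which may or may not be the exception type the parser throws. Checking before the terminator arrives means a peer cannot make the server buffer an unbounded control line.

```csharp
using System;
using System.Net;

// Hypothetical helper showing a control line length check.
public static class ControlLine
{
    public const int MaxControlLineSize = 4096; // mirrors NatsProtocol.MaxControlLineSize

    // Returns the control line length (excluding "\r\n"), or -1 if the
    // terminator has not arrived yet. Throws once the line is known to be too long.
    public static int Measure(ReadOnlySpan<byte> buffer)
    {
        int lf = buffer.IndexOf((byte)'\n');

        // Bytes belonging to the line so far: everything before '\n',
        // or the whole buffer if no '\n' has arrived.
        ReadOnlySpan<byte> line = lf < 0 ? buffer : buffer[..lf];

        // Assumption: a trailing '\r' belongs to the terminator, not the line.
        if (line.Length > 0 && line[^1] == (byte)'\r')
            line = line[..^1];

        if (line.Length > MaxControlLineSize)
            throw new ProtocolViolationException("Maximum control line exceeded");

        return lf < 0 ? -1 : line.Length;
    }
}
```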
## Go Reference
The Go implementation of protocol parsing is in `golang/nats-server/server/parser.go`. The .NET implementation enforces the same control line and payload size limits, although it identifies commands differently; see [Parser](Parser.md) for details.
## Related Documentation
- [Parser](Parser.md)
- [Server Overview](../Server/Overview.md)
- [Configuration Overview](../Configuration/Overview.md)
<!-- Last verified against codebase: 2026-02-22 -->

@@ -0,0 +1,271 @@
# Parser
`NatsParser` is a stateful byte-level parser that processes NATS protocol commands from a `ReadOnlySequence<byte>` provided by `System.IO.Pipelines`. It is called repeatedly in a read loop until no more complete commands are available in the buffer.
## Key Types
### CommandType
The `CommandType` enum identifies every command the parser can produce:
```csharp
public enum CommandType
{
    Ping,
    Pong,
    Connect,
    Info,
    Pub,
    HPub,
    Sub,
    Unsub,
    Ok,
    Err,
}
```
### ParsedCommand
`ParsedCommand` is a `readonly struct` that carries the result of a successful parse. Using a struct avoids a heap allocation per command on the fast path.
```csharp
public readonly struct ParsedCommand
{
    public CommandType Type { get; init; }
    public string? Subject { get; init; }
    public string? ReplyTo { get; init; }
    public string? Queue { get; init; }
    public string? Sid { get; init; }
    public int MaxMessages { get; init; }
    public int HeaderSize { get; init; }
    public ReadOnlyMemory<byte> Payload { get; init; }

    public static ParsedCommand Simple(CommandType type) => new() { Type = type, MaxMessages = -1 };
}
```
Fields that do not apply to a given command type keep their default values (`null` for strings, `0` for integers), except for two integer sentinels. `MaxMessages` uses `-1` to mean "unset"; it holds that value for an UNSUB with no max and for every command built by `Simple`. `HeaderSize` is set for HPUB, with `-1` indicating no headers. `Payload` carries the raw body bytes for PUB/HPUB, and the raw JSON bytes for CONNECT/INFO.
## TryParse
`TryParse` is the main entry point. It is called by the read loop after each `PipeReader.ReadAsync` completes.
```csharp
public bool TryParse(ref ReadOnlySequence<byte> buffer, out ParsedCommand command)
```
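The method returns `true` and advances `buffer` past the consumed bytes when a complete command is available. It returns `false` when more data is needed. In that case `buffer` starts at the first unconsumed byte: it is unchanged if the control line itself is incomplete, or positioned just past the control line if a PUB/HPUB payload is still arriving (see Two-Phase Parsing for PUB and HPUB). The caller must call `TryParse` in a loop until it returns `false`, then call `PipeReader.AdvanceTo` (typically `AdvanceTo(buffer.Start, buffer.End)`) to mark everything before `buffer.Start` as consumed and the remainder as examined.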
If the parser detects a malformed command it throws `ProtocolViolationException`, which the read loop catches to close the connection.
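The calling contract can be exercised without a socket. In this sketch `LineParser` is a hypothetical stand-in with the same `TryParse(ref ReadOnlySequence<byte>, out ...)` shape that recognises only bare lines such as PING and PONG. `ReadLoopSketch.Run` simulates successive `ReadAsync` results as byte arrays, and copying the leftover bytes plays the role of `PipeReader.AdvanceTo(buffer.Start, buffer.End)`.

```csharp
using System;
using System.Buffers;
using System.Collections.Generic;
using System.Text;

// Hypothetical stand-in with the same TryParse shape as NatsParser; it only
// understands bare control lines such as "PING" and "PONG".
public static class LineParser
{
    public static bool TryParse(ref ReadOnlySequence<byte> buffer, out string command)
    {
        var reader = new SequenceReader<byte>(buffer);
        if (!reader.TryReadTo(out ReadOnlySequence<byte> line, "\r\n"u8))
        {
            command = "";
            return false; // incomplete line: buffer is left untouched
        }

        command = Encoding.ASCII.GetString(line.ToArray());
        buffer = buffer.Slice(reader.Position); // consume the line and its CRLF
        return true;
    }
}

public static class ReadLoopSketch
{
    // Each array stands in for the data delivered by one PipeReader.ReadAsync.
    public static List<string> Run(IEnumerable<byte[]> reads)
    {
        var commands = new List<string>();
        byte[] unconsumed = Array.Empty<byte>();

        foreach (byte[] read in reads)
        {
            // The pipe hands back unconsumed bytes followed by the new data.
            byte[] data = new byte[unconsumed.Length + read.Length];
            unconsumed.CopyTo(data, 0);
            read.CopyTo(data, unconsumed.Length);
            var buffer = new ReadOnlySequence<byte>(data);

            // Drain every complete command before waiting for more data.
            while (LineParser.TryParse(ref buffer, out string command))
                commands.Add(command);

            // Stand-in for reader.AdvanceTo(buffer.Start, buffer.End).
            unconsumed = buffer.ToArray();
        }
        return commands;
    }
}
```

Feeding `PING\r\nPO` and then `NG\r\n` yields `PING` from the first pass; the `PO` bytes are carried into the second pass, where `PONG` completes.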
## Command Identification
After locating the `\r\n` control line terminator, the parser lowercase-normalises the first two bytes using a bitwise OR with `0x20` and dispatches on them. This single branch handles both upper- and lowercase input without a string comparison or allocation.
```csharp
byte b0 = (byte)(lineSpan[0] | 0x20); // lowercase
byte b1 = (byte)(lineSpan[1] | 0x20);

switch (b0)
{
    case (byte)'p':
        if (b1 == (byte)'i') // PING
        {
            command = ParsedCommand.Simple(CommandType.Ping);
            buffer = buffer.Slice(reader.Position);
            return true;
        }
        if (b1 == (byte)'o') // PONG
        {
            command = ParsedCommand.Simple(CommandType.Pong);
            buffer = buffer.Slice(reader.Position);
            return true;
        }
        if (b1 == (byte)'u') // PUB
        {
            return ParsePub(lineSpan, ref buffer, reader.Position, out command);
        }
        break;

    case (byte)'h':
        if (b1 == (byte)'p') // HPUB
        {
            return ParseHPub(lineSpan, ref buffer, reader.Position, out command);
        }
        break;

    case (byte)'s':
        if (b1 == (byte)'u') // SUB
        {
            command = ParseSub(lineSpan);
            buffer = buffer.Slice(reader.Position);
            return true;
        }
        break;

    case (byte)'u':
        if (b1 == (byte)'n') // UNSUB
        {
            command = ParseUnsub(lineSpan);
            buffer = buffer.Slice(reader.Position);
            return true;
        }
        break;

    case (byte)'c':
        if (b1 == (byte)'o') // CONNECT
        {
            command = ParseConnect(lineSpan);
            buffer = buffer.Slice(reader.Position);
            return true;
        }
        break;

    case (byte)'i':
        if (b1 == (byte)'n') // INFO
        {
            command = ParseInfo(lineSpan);
            buffer = buffer.Slice(reader.Position);
            return true;
        }
        break;

    case (byte)'+': // +OK
        command = ParsedCommand.Simple(CommandType.Ok);
        buffer = buffer.Slice(reader.Position);
        return true;

    case (byte)'-': // -ERR
        command = ParsedCommand.Simple(CommandType.Err);
        buffer = buffer.Slice(reader.Position);
        return true;
}

throw new ProtocolViolationException("Unknown protocol operation");
```
The two-character pairs are: `p+i` = PING, `p+o` = PONG, `p+u` = PUB, `h+p` = HPUB, `s+u` = SUB, `u+n` = UNSUB, `c+o` = CONNECT, `i+n` = INFO. `+` and `-` are matched on `b0` alone because no other command begins with either character.
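The dispatch above reduces to a lookup on the folded byte pair. The following self-contained sketch uses a hypothetical `Op` enum and a tuple `switch` over `char` values instead of nested `if`s. The length guard is an addition for the sketch. Note that `+` (0x2B) and `-` (0x2D) already have bit 0x20 set, so the OR leaves them unchanged.

```csharp
using System;
using System.Net;

// Hypothetical, reduced enum used only by this sketch.
public enum Op { Ping, Pong, Pub, HPub, Sub, Unsub, Connect, Info, Ok, Err }

public static class TwoByteDispatch
{
    public static Op Identify(ReadOnlySpan<byte> line)
    {
        if (line.Length < 2) // guard added for the sketch
            throw new ProtocolViolationException("Control line too short");

        // ASCII letters differ from their lowercase form only in bit 0x20,
        // so OR-ing it in folds 'P' (0x50) and 'p' (0x70) to the same value.
        char c0 = (char)(line[0] | 0x20);
        char c1 = (char)(line[1] | 0x20);

        return (c0, c1) switch
        {
            ('p', 'i') => Op.Ping,
            ('p', 'o') => Op.Pong,
            ('p', 'u') => Op.Pub,
            ('h', 'p') => Op.HPub,
            ('s', 'u') => Op.Sub,
            ('u', 'n') => Op.Unsub,
            ('c', 'o') => Op.Connect,
            ('i', 'n') => Op.Info,
            ('+', _) => Op.Ok,
            ('-', _) => Op.Err,
            _ => throw new ProtocolViolationException("Unknown protocol operation"),
        };
    }
}
```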
## Two-Phase Parsing for PUB and HPUB
PUB and HPUB carry a payload body after the control line. The parser handles split reads, where a TCP segment boundary falls inside the payload, through an `_awaitingPayload` state flag.
**Phase 1 (control line):** The parser reads the control line up to `\r\n`, extracts the subject, optional reply-to, and payload size(s), then stores these in private fields (`_pendingSubject`, `_pendingReplyTo`, `_expectedPayloadSize`, `_pendingHeaderSize`, `_pendingType`) and sets `_awaitingPayload = true`. It then immediately calls `TryReadPayload` to attempt phase 2.
**Phase 2 (payload read):** `TryReadPayload` checks whether `buffer.Length >= _expectedPayloadSize + 2` (the `+ 2` accounts for the trailing `\r\n`). If enough data is present, the payload bytes are copied to a new `byte[]`, the trailing `\r\n` is verified, the `ParsedCommand` is constructed, and `_awaitingPayload` is reset to `false`. If not enough data is present, `TryReadPayload` returns `false` and `_awaitingPayload` remains `true`.
On the next call to `TryParse`, the check at the top of the method routes straight to `TryReadPayload` without re-parsing the control line:
```csharp
if (_awaitingPayload)
    return TryReadPayload(ref buffer, out command);
```
This lets the parser handle payloads that arrive across multiple `PipeReader.ReadAsync` completions without keeping the control line in the buffer or parsing it a second time.
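The two phases can be seen in isolation in a PUB-only reduction. `PubParserSketch` is hypothetical. It reuses the field names described above, but it drops reply-to subjects, HPUB, and both size limits, and it parses the size with `int.TryParse` rather than `ParseSize`.

```csharp
using System;
using System.Buffers;
using System.Net;
using System.Text;

// Hypothetical, PUB-only reduction of the two-phase scheme.
public sealed class PubParserSketch
{
    private bool _awaitingPayload;
    private string _pendingSubject = "";
    private int _expectedPayloadSize;

    public bool TryParse(ref ReadOnlySequence<byte> buffer, out string subject, out byte[] payload)
    {
        // A control line was already consumed: go straight to phase 2.
        if (_awaitingPayload)
            return TryReadPayload(ref buffer, out subject, out payload);

        // Phase 1: control line "PUB <subject> <size>\r\n".
        var reader = new SequenceReader<byte>(buffer);
        if (!reader.TryReadTo(out ReadOnlySequence<byte> line, "\r\n"u8))
        {
            subject = "";
            payload = Array.Empty<byte>();
            return false;
        }

        string[] parts = Encoding.ASCII.GetString(line.ToArray())
            .Split(' ', StringSplitOptions.RemoveEmptyEntries);
        if (parts.Length != 3 || !parts[0].Equals("PUB", StringComparison.OrdinalIgnoreCase)
            || !int.TryParse(parts[2], out int size) || size < 0)
            throw new ProtocolViolationException("Expected PUB <subject> <size>");

        _pendingSubject = parts[1];
        _expectedPayloadSize = size;
        _awaitingPayload = true;
        buffer = buffer.Slice(reader.Position); // the control line is consumed now
        return TryReadPayload(ref buffer, out subject, out payload);
    }

    // Phase 2: wait until the payload and its trailing CRLF are all buffered.
    private bool TryReadPayload(ref ReadOnlySequence<byte> buffer, out string subject, out byte[] payload)
    {
        subject = "";
        payload = Array.Empty<byte>();
        if (buffer.Length < _expectedPayloadSize + 2)
            return false; // stay in phase 2; the control line is not re-parsed

        payload = buffer.Slice(0, _expectedPayloadSize).ToArray();
        byte[] trailer = buffer.Slice(_expectedPayloadSize, 2).ToArray();
        if (trailer[0] != (byte)'\r' || trailer[1] != (byte)'\n')
            throw new ProtocolViolationException("Missing CRLF after payload");

        subject = _pendingSubject;
        _awaitingPayload = false;
        buffer = buffer.Slice(_expectedPayloadSize + 2);
        return true;
    }
}
```

Given `PUB foo 5\r\nhel`, the first call consumes the control line and returns `false` with only `hel` left in the buffer. Once `lo\r\n` arrives, the next call goes straight to phase 2 and completes the message.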
## Zero-Allocation Argument Splitting
`SplitArgs` splits the argument portion of a control line into token ranges without allocating. The caller `stackalloc`s a `Span<Range>` sized to the maximum expected argument count for the command, then passes it to `SplitArgs`:
```csharp
internal static int SplitArgs(Span<byte> data, Span<Range> ranges)
{
    int count = 0;
    int start = -1;

    for (int i = 0; i < data.Length; i++)
    {
        byte b = data[i];
        if (b is (byte)' ' or (byte)'\t')
        {
            if (start >= 0)
            {
                if (count >= ranges.Length)
                    throw new ProtocolViolationException("Too many arguments");
                ranges[count++] = start..i;
                start = -1;
            }
        }
        else
        {
            if (start < 0)
                start = i;
        }
    }

    if (start >= 0)
    {
        if (count >= ranges.Length)
            throw new ProtocolViolationException("Too many arguments");
        ranges[count++] = start..data.Length;
    }

    return count;
}
```
The returned `int` is the number of populated entries in `ranges`. Callers index into the original span with those ranges (e.g. `argsSpan[ranges[0]]`) to get each token as a sub-span, then decode it to a `string` with `Encoding.ASCII.GetString`. Runs of spaces and tabs act as a single separator, and leading or trailing whitespace produces no empty tokens.
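As a usage sketch, SUB arguments (`subject [queue] sid`) need at most three ranges. `SubArgsSketch.ReadSubArgs` is a hypothetical caller, not the parser's actual `ParseSub`, and the block carries a copy of `SplitArgs` so it compiles on its own.

```csharp
using System;
using System.Net;
using System.Text;

public static class SubArgsSketch
{
    // Same algorithm as SplitArgs above.
    internal static int SplitArgs(Span<byte> data, Span<Range> ranges)
    {
        int count = 0;
        int start = -1;
        for (int i = 0; i < data.Length; i++)
        {
            if (data[i] is (byte)' ' or (byte)'\t')
            {
                if (start >= 0)
                {
                    if (count >= ranges.Length)
                        throw new ProtocolViolationException("Too many arguments");
                    ranges[count++] = start..i;
                    start = -1;
                }
            }
            else if (start < 0)
            {
                start = i;
            }
        }
        if (start >= 0)
        {
            if (count >= ranges.Length)
                throw new ProtocolViolationException("Too many arguments");
            ranges[count++] = start..data.Length;
        }
        return count;
    }

    // Hypothetical SUB argument handling: "subject [queue] sid".
    public static (string Subject, string? Queue, string Sid) ReadSubArgs(Span<byte> args)
    {
        // SUB takes at most three arguments, so three ranges live on the stack.
        Span<Range> ranges = stackalloc Range[3];
        int count = SplitArgs(args, ranges);

        if (count == 2)
            return (Ascii(args[ranges[0]]), null, Ascii(args[ranges[1]]));
        if (count == 3)
            return (Ascii(args[ranges[0]]), Ascii(args[ranges[1]]), Ascii(args[ranges[2]]));
        throw new ProtocolViolationException("SUB expects 2 or 3 arguments");
    }

    private static string Ascii(ReadOnlySpan<byte> token) => Encoding.ASCII.GetString(token);
}
```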
## Decimal Integer Parsing
`ParseSize` converts an ASCII decimal integer in a byte span to an `int`. It is used for payload sizes and UNSUB max-message counts.
```csharp
internal static int ParseSize(Span<byte> data)
{
    if (data.Length == 0 || data.Length > 9)
        return -1;

    int n = 0;
    foreach (byte b in data)
    {
        if (b < (byte)'0' || b > (byte)'9')
            return -1;
        n = n * 10 + (b - '0');
    }
    return n;
}
```
Capping the input at 9 digits bounds the result at 999 999 999, which is below `int.MaxValue` (2 147 483 647), so the loop cannot overflow and needs no `checked` arithmetic. A return value of `-1` signals a parse failure, which callers turn into a `ProtocolViolationException`.
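On the caller side, `SizeField.ReadPayloadSize` (hypothetical) turns the `-1` sentinel and oversized values into protocol violations, following the error rules for this parser. The copy of `ParseSize` here takes a `ReadOnlySpan<byte>` so that `u8` literals can be passed directly; the logic is unchanged.

```csharp
using System;
using System.Net;

public static class SizeField
{
    // ParseSize as shown above, taking ReadOnlySpan<byte> instead of Span<byte>.
    internal static int ParseSize(ReadOnlySpan<byte> data)
    {
        if (data.Length == 0 || data.Length > 9)
            return -1; // at most 999 999 999, safely below int.MaxValue
        int n = 0;
        foreach (byte b in data)
        {
            if (b < (byte)'0' || b > (byte)'9')
                return -1;
            n = n * 10 + (b - '0');
        }
        return n;
    }

    // Hypothetical caller: rejects unparsable and oversized payload sizes.
    public static int ReadPayloadSize(ReadOnlySpan<byte> token, int maxPayload)
    {
        int size = ParseSize(token);
        if (size < 0)
            throw new ProtocolViolationException("Invalid payload size");
        if (size > maxPayload)
            throw new ProtocolViolationException("Maximum payload exceeded");
        return size;
    }
}
```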
## Error Handling
`ProtocolViolationException` is thrown for all malformed input:
- Control line exceeds `MaxControlLineSize` (4096 bytes).
- Unknown command bytes.
- Wrong number of arguments for a command.
- Payload size is not a valid decimal integer (`ParseSize` returns `-1`) or exceeds `MaxPayloadSize`, or the trailing `\r\n` after the payload is absent.
- `SplitArgs` receives more tokens than the caller's `ranges` span can hold.
The read loop is responsible for catching `ProtocolViolationException`, sending `-ERR` to the client, and closing the connection.
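The error path can be sketched as a wrapper around a parse step. `ErrorPath.TryProcess` is hypothetical. Echoing the exception message inside the `-ERR '...'` line is an assumption about what the real read loop sends. Closing the connection is left to the caller, which is signalled by the `false` return.

```csharp
using System;
using System.IO;
using System.Net;
using System.Text;

public static class ErrorPath
{
    // Runs one parse step; on a protocol violation, writes -ERR '<message>'\r\n
    // and returns false so the caller knows to close the connection.
    public static bool TryProcess(Action parseStep, Stream output)
    {
        try
        {
            parseStep();
            return true;
        }
        catch (ProtocolViolationException ex)
        {
            byte[] err = Encoding.ASCII.GetBytes($"-ERR '{ex.Message}'\r\n");
            output.Write(err, 0, err.Length);
            output.Flush();
            return false;
        }
    }
}
```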
## Limits
| Limit | Value | Source |
|-------|-------|--------|
| Max control line | 4096 bytes | `NatsProtocol.MaxControlLineSize` |
| Max payload (default) | 1 048 576 bytes | `NatsProtocol.MaxPayloadSize` |
| Max size field digits | 9 | `ParseSize` length check |
The max payload is configurable: `NatsParser` accepts a `maxPayload` constructor argument, which `NatsClient` sets from `NatsOptions`.
## Go Reference
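The .NET parser is a port of `golang/nats-server/server/parser.go` and enforces the same limits, but it identifies commands differently. The Go parser is a byte-at-a-time state machine (with states such as `OP_PUB` and `MSG_PAYLOAD`) that advances on every incoming character, whereas the .NET parser waits for a complete control line and dispatches on its first two bytes. Both keep parser state between calls, so a PUB or HPUB payload split across reads is completed on a later call.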
## Related Documentation
- [Protocol Overview](Overview.md)
- [Server Overview](../Server/Overview.md)
<!-- Last verified against codebase: 2026-02-22 -->