Joseph Doherty 539b2b7588 feat: add structured logging, Shouldly assertions, CPM, and project documentation

- Add Microsoft.Extensions.Logging + Serilog to NatsServer and NatsClient
- Convert all test assertions from xUnit Assert to Shouldly
- Add NSubstitute package for future mocking needs
- Introduce Central Package Management via Directory.Packages.props
- Add documentation_rules.md with style guide, generation/update rules, component map
- Generate 10 documentation files across 5 component folders (GettingStarted, Protocol, Subscriptions, Server, Configuration/Operations)
- Update CLAUDE.md with logging, testing, porting, agent model, CPM, and documentation guidance

2026-02-22 21:05:53 -05:00

Parser

NatsParser is a stateful byte-level parser that processes NATS protocol commands from a ReadOnlySequence<byte> provided by System.IO.Pipelines. It is called repeatedly in a read loop until no more complete commands are available in the buffer.

Key Types

CommandType

The CommandType enum identifies every command the parser can produce:

public enum CommandType
{
    Ping,
    Pong,
    Connect,
    Info,
    Pub,
    HPub,
    Sub,
    Unsub,
    Ok,
    Err,
}

ParsedCommand

ParsedCommand is a readonly struct that carries the result of a successful parse. Using a struct avoids a heap allocation per command on the fast path.

public readonly struct ParsedCommand
{
    public CommandType Type { get; init; }
    public string? Subject { get; init; }
    public string? ReplyTo { get; init; }
    public string? Queue { get; init; }
    public string? Sid { get; init; }
    public int MaxMessages { get; init; }
    public int HeaderSize { get; init; }
    public ReadOnlyMemory<byte> Payload { get; init; }

    public static ParsedCommand Simple(CommandType type) => new() { Type = type, MaxMessages = -1 };
}

Fields that do not apply to a given command type are left at their default values (null for strings, 0 for integers), with one exception: MaxMessages uses -1 as a sentinel meaning "unset" (relevant for UNSUB with no max), which is why ParsedCommand.Simple sets it explicitly. HeaderSize is set for HPUB/HMSG; -1 indicates no headers. Payload carries the raw body bytes for PUB/HPUB, and the raw JSON bytes for CONNECT/INFO.
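The difference between a default-initialised value and one built through Simple can be checked with a small sketch. The struct below is a trimmed copy of the one shown above; ParsedCommandSketch and its Sub helper are hypothetical names introduced only for illustration and are not part of the project.

```csharp
#nullable enable
using System;

public enum CommandType { Ping, Pong, Connect, Info, Pub, HPub, Sub, Unsub, Ok, Err }

// Trimmed copy of the struct shown above (ReplyTo and Queue omitted for brevity).
public readonly struct ParsedCommand
{
    public CommandType Type { get; init; }
    public string? Subject { get; init; }
    public string? Sid { get; init; }
    public int MaxMessages { get; init; }
    public int HeaderSize { get; init; }
    public ReadOnlyMemory<byte> Payload { get; init; }

    public static ParsedCommand Simple(CommandType type) => new() { Type = type, MaxMessages = -1 };
}

public static class ParsedCommandSketch
{
    // Hypothetical helper: the kind of value a SUB parse might produce.
    // MaxMessages is set to the -1 sentinel by hand because the struct's
    // own default for an int is 0, not -1.
    public static ParsedCommand Sub(string subject, string sid) =>
        new() { Type = CommandType.Sub, Subject = subject, Sid = sid, MaxMessages = -1 };
}
```

With these definitions, ParsedCommand.Simple(CommandType.Ping) has MaxMessages equal to -1 and a null Subject, whereas default(ParsedCommand) has MaxMessages equal to 0. Any code that builds a ParsedCommand with an object initialiser therefore has to set the sentinel itself.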

TryParse

TryParse is the main entry point. It is called by the read loop after each PipeReader.ReadAsync completes.

public bool TryParse(ref ReadOnlySequence<byte> buffer, out ParsedCommand command)

The method returns true and advances buffer past the consumed bytes when a complete command is available. It returns false when more data is needed. For commands without a payload, buffer is left unchanged in that case; for PUB and HPUB, see Two-Phase Parsing for PUB and HPUB below. The caller must call TryParse in a loop until it returns false, then call PipeReader.AdvanceTo to signal how far the buffer was consumed.

If the parser detects a malformed command it throws ProtocolViolationException, which the read loop catches to close the connection.
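The calling contract (loop until false, then report what remains) can be illustrated without the real parser. The sketch below is not NatsParser: ReadLoopSketch, TryParseLine and Drain are hypothetical stand-ins that treat every CRLF-terminated line as one "command", using SequenceReader<byte> from System.Buffers.

```csharp
using System;
using System.Buffers;
using System.Collections.Generic;
using System.Text;

public static class ReadLoopSketch
{
    private static readonly byte[] Crlf = { (byte)'\r', (byte)'\n' };

    // Hypothetical stand-in for TryParse: yields one CRLF-terminated line per call.
    public static bool TryParseLine(ref ReadOnlySequence<byte> buffer, out string line)
    {
        var reader = new SequenceReader<byte>(buffer);
        if (!reader.TryReadTo(out ReadOnlySequence<byte> lineBytes, Crlf, advancePastDelimiter: true))
        {
            line = string.Empty;
            return false; // incomplete line: buffer is left untouched
        }

        line = Encoding.ASCII.GetString(lineBytes.ToArray());
        buffer = buffer.Slice(reader.Position); // advance past the consumed bytes
        return true;
    }

    // Mirrors the caller's side of the contract: keep parsing until false.
    public static List<string> Drain(ref ReadOnlySequence<byte> buffer)
    {
        var lines = new List<string>();
        while (TryParseLine(ref buffer, out string line))
            lines.Add(line);
        return lines;
    }
}
```

Given the bytes of "PING\r\nPONG\r\nPU", Drain returns the two complete lines and leaves the two trailing bytes in buffer. In a real read loop, the start of that remaining buffer would be passed to PipeReader.AdvanceTo as the consumed position, with the end of the buffer as the examined position, so the partial command is presented again once more data arrives.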

Command Identification

After locating the \r\n control line terminator, the parser lowercase-normalises the first two bytes using a bitwise OR with 0x20 and dispatches on them. This single branch handles both upper- and lowercase input without a string comparison or allocation.

byte b0 = (byte)(lineSpan[0] | 0x20); // lowercase
byte b1 = (byte)(lineSpan[1] | 0x20);

switch (b0)
{
    case (byte)'p':
        if (b1 == (byte)'i') // PING
        {
            command = ParsedCommand.Simple(CommandType.Ping);
            buffer = buffer.Slice(reader.Position);
            return true;
        }

        if (b1 == (byte)'o') // PONG
        {
            command = ParsedCommand.Simple(CommandType.Pong);
            buffer = buffer.Slice(reader.Position);
            return true;
        }

        if (b1 == (byte)'u') // PUB
        {
            return ParsePub(lineSpan, ref buffer, reader.Position, out command);
        }

        break;

    case (byte)'h':
        if (b1 == (byte)'p') // HPUB
        {
            return ParseHPub(lineSpan, ref buffer, reader.Position, out command);
        }

        break;

    case (byte)'s':
        if (b1 == (byte)'u') // SUB
        {
            command = ParseSub(lineSpan);
            buffer = buffer.Slice(reader.Position);
            return true;
        }

        break;

    case (byte)'u':
        if (b1 == (byte)'n') // UNSUB
        {
            command = ParseUnsub(lineSpan);
            buffer = buffer.Slice(reader.Position);
            return true;
        }

        break;

    case (byte)'c':
        if (b1 == (byte)'o') // CONNECT
        {
            command = ParseConnect(lineSpan);
            buffer = buffer.Slice(reader.Position);
            return true;
        }

        break;

    case (byte)'i':
        if (b1 == (byte)'n') // INFO
        {
            command = ParseInfo(lineSpan);
            buffer = buffer.Slice(reader.Position);
            return true;
        }

        break;

    case (byte)'+': // +OK
        command = ParsedCommand.Simple(CommandType.Ok);
        buffer = buffer.Slice(reader.Position);
        return true;

    case (byte)'-': // -ERR
        command = ParsedCommand.Simple(CommandType.Err);
        buffer = buffer.Slice(reader.Position);
        return true;
}

throw new ProtocolViolationException("Unknown protocol operation");

The two-character pairs are: p+i = PING, p+o = PONG, p+u = PUB, h+p = HPUB, s+u = SUB, u+n = UNSUB, c+o = CONNECT, i+n = INFO. + and - are matched on b0 alone because no other command starts with either byte, so +OK and -ERR need no second-byte check.
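The OR-with-0x20 trick works because ASCII upper- and lowercase letters differ only in bit 5. A minimal sketch of the same dispatch, returning a name instead of parsing, is shown below; CommandPrefixSketch and Identify are hypothetical names, and the "unknown" result stands in for the exception the real parser throws.

```csharp
using System;
using System.Text;

public static class CommandPrefixSketch
{
    // Setting bit 5 maps 'A'..'Z' to 'a'..'z' and leaves 'a'..'z' unchanged.
    // '+' (0x2B) and '-' (0x2D) already have bit 5 set, so they pass through as-is.
    // This is only meaningful for ASCII letters; it is not a general lowercase function.
    public static char Lower(byte b) => (char)(b | 0x20);

    public static string Identify(ReadOnlySpan<byte> line)
    {
        if (line.Length < 2)
            return "unknown";

        char c0 = Lower(line[0]);
        char c1 = Lower(line[1]);

        return (c0, c1) switch
        {
            ('p', 'i') => "PING",
            ('p', 'o') => "PONG",
            ('p', 'u') => "PUB",
            ('h', 'p') => "HPUB",
            ('s', 'u') => "SUB",
            ('u', 'n') => "UNSUB",
            ('c', 'o') => "CONNECT",
            ('i', 'n') => "INFO",
            ('+', _) => "+OK",
            ('-', _) => "-ERR",
            _ => "unknown",
        };
    }

    // Convenience overload for experimenting with string input.
    public static string Identify(string line) => Identify(Encoding.ASCII.GetBytes(line));
}
```

Identify("PING"), Identify("ping") and Identify("Ping") all yield "PING". Because only the first two bytes are inspected, any line that starts with those two bytes selects the same branch; the rest of the line is validated later by the per-command parse routine.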

Two-Phase Parsing for PUB and HPUB

PUB and HPUB require a payload body that follows the control line. The parser handles split reads — where the TCP segment boundary falls inside the payload — through an _awaitingPayload state flag.

Phase 1 — control line: The parser reads the control line up to \r\n, extracts the subject, optional reply-to, and payload size(s), then stores these in private fields (_pendingSubject, _pendingReplyTo, _expectedPayloadSize, _pendingHeaderSize, _pendingType) and sets _awaitingPayload = true. It then immediately calls TryReadPayload to attempt phase 2.

Phase 2 — payload read: TryReadPayload checks whether buffer.Length >= _expectedPayloadSize + 2 (the + 2 accounts for the trailing \r\n). If enough data is present, the payload bytes are copied to a new byte[], the trailing \r\n is verified, the ParsedCommand is constructed, and _awaitingPayload is reset to false. If not enough data is present, TryReadPayload returns false and _awaitingPayload remains true.

On the next call to TryParse, the check at the top of the method routes straight to TryReadPayload without re-parsing the control line:

if (_awaitingPayload)
    return TryReadPayload(ref buffer, out command);

This means the parser correctly handles payloads that arrive across multiple PipeReader.ReadAsync completions without buffering the control line a second time.
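The two phases can be sketched with a deliberately simplified wire format in which the control line is just "<size>\r\n". TwoPhaseSketch below is a hypothetical illustration, not the NatsParser source: it assumes the control line is consumed from buffer during phase 1 (which the description above implies), uses int.Parse in place of ParseSize, and throws FormatException in place of ProtocolViolationException.

```csharp
using System;
using System.Buffers;
using System.Text;

public sealed class TwoPhaseSketch
{
    private static readonly byte[] Crlf = { (byte)'\r', (byte)'\n' };

    private bool _awaitingPayload;
    private int _expectedPayloadSize;

    public bool TryParse(ref ReadOnlySequence<byte> buffer, out byte[] payload)
    {
        // A previous call already parsed the control line: go straight to phase 2.
        if (_awaitingPayload)
            return TryReadPayload(ref buffer, out payload);

        var reader = new SequenceReader<byte>(buffer);
        if (!reader.TryReadTo(out ReadOnlySequence<byte> line, Crlf, advancePastDelimiter: true))
        {
            payload = Array.Empty<byte>();
            return false; // control line incomplete: nothing consumed
        }

        // Phase 1: remember the size, consume the control line, then try phase 2.
        _expectedPayloadSize = int.Parse(Encoding.ASCII.GetString(line.ToArray()));
        _awaitingPayload = true;
        buffer = buffer.Slice(reader.Position);
        return TryReadPayload(ref buffer, out payload);
    }

    private bool TryReadPayload(ref ReadOnlySequence<byte> buffer, out byte[] payload)
    {
        // The + 2 accounts for the trailing \r\n after the payload.
        if (buffer.Length < _expectedPayloadSize + 2)
        {
            payload = Array.Empty<byte>();
            return false; // _awaitingPayload stays true for the next call
        }

        payload = buffer.Slice(0, _expectedPayloadSize).ToArray();
        byte[] trailer = buffer.Slice(_expectedPayloadSize, 2).ToArray();
        if (trailer[0] != (byte)'\r' || trailer[1] != (byte)'\n')
            throw new FormatException("Payload is not terminated by CRLF");

        buffer = buffer.Slice(_expectedPayloadSize + 2);
        _awaitingPayload = false;
        return true;
    }
}
```

Feeding the sketch "5\r\nhel" returns false and leaves the three payload bytes in buffer; a second call with "hello\r\n" (the unconsumed bytes plus the newly arrived ones, as a pipe would present them) returns true with the five-byte payload.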

Zero-Allocation Argument Splitting

SplitArgs splits the argument portion of a control line into token ranges without allocating. The caller stackallocs a Span<Range> sized to the maximum expected argument count for the command, then passes it to SplitArgs:

internal static int SplitArgs(Span<byte> data, Span<Range> ranges)
{
    int count = 0;
    int start = -1;

    for (int i = 0; i < data.Length; i++)
    {
        byte b = data[i];
        if (b is (byte)' ' or (byte)'\t')
        {
            if (start >= 0)
            {
                if (count >= ranges.Length)
                    throw new ProtocolViolationException("Too many arguments");
                ranges[count++] = start..i;
                start = -1;
            }
        }
        else
        {
            if (start < 0)
                start = i;
        }
    }

    if (start >= 0)
    {
        if (count >= ranges.Length)
            throw new ProtocolViolationException("Too many arguments");
        ranges[count++] = start..data.Length;
    }

    return count;
}

The returned int is the number of populated entries in ranges. Callers index into the original span using those ranges (e.g. argsSpan[ranges[0]]) to extract each token as a sub-span, then decode to string with Encoding.ASCII.GetString. Consecutive whitespace is collapsed: a new token only begins on a non-whitespace byte after one or more whitespace bytes.
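A call site following that description might look like the sketch below. SplitArgsSketch and Tokens are hypothetical names; the SplitArgs body is a condensed copy of the logic above, taking ReadOnlySpan<byte> and throwing FormatException in place of ProtocolViolationException so the example compiles on its own.

```csharp
using System;
using System.Text;

public static class SplitArgsSketch
{
    // Condensed copy of the splitting logic shown above.
    public static int SplitArgs(ReadOnlySpan<byte> data, Span<Range> ranges)
    {
        int count = 0;
        int start = -1;

        for (int i = 0; i <= data.Length; i++)
        {
            // Treat the end of the input like a separator so the last token is flushed.
            bool separator = i == data.Length || data[i] == (byte)' ' || data[i] == (byte)'\t';
            if (!separator)
            {
                if (start < 0)
                    start = i;
                continue;
            }

            if (start >= 0)
            {
                if (count >= ranges.Length)
                    throw new FormatException("Too many arguments");
                ranges[count++] = start..i;
                start = -1;
            }
        }

        return count;
    }

    // Hypothetical caller: splits an argument string into at most four tokens.
    public static string[] Tokens(string args)
    {
        ReadOnlySpan<byte> argsSpan = Encoding.ASCII.GetBytes(args);
        Span<Range> ranges = stackalloc Range[4]; // lives on the stack, no heap allocation
        int count = SplitArgs(argsSpan, ranges);

        var tokens = new string[count];
        for (int i = 0; i < count; i++)
            tokens[i] = Encoding.ASCII.GetString(argsSpan[ranges[i]]); // slice, then decode
        return tokens;
    }
}
```

Tokens("foo.bar  reply \t 11") yields the three tokens "foo.bar", "reply" and "11", with the runs of spaces and tabs collapsed. Passing five tokens to this four-slot caller throws, which corresponds to the "Too many arguments" case. The splitting itself allocates nothing; the strings and the result array in Tokens are allocated only because the example decodes the tokens for display.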

Decimal Integer Parsing

ParseSize converts an ASCII decimal integer in a byte span to an int. It is used for payload sizes and UNSUB max-message counts.

internal static int ParseSize(Span<byte> data)
{
    if (data.Length == 0 || data.Length > 9)
        return -1;
    int n = 0;
    foreach (byte b in data)
    {
        if (b < (byte)'0' || b > (byte)'9')
            return -1;
        n = n * 10 + (b - '0');
    }

    return n;
}

The 9-digit length cap bounds the result at 999,999,999, which is below int.MaxValue (2,147,483,647), so the accumulation cannot overflow and no checked arithmetic is needed. A return value of -1 signals a parse failure; callers respond by throwing ProtocolViolationException.
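The edge cases are easiest to see by running the function on a few inputs. The sketch below repeats the ParseSize body (taking ReadOnlySpan<byte>) and adds a hypothetical string helper, ParseSizeSketch.Parse, purely for experimentation.

```csharp
using System;
using System.Text;

public static class ParseSizeSketch
{
    // Same logic as ParseSize above: ASCII decimal digits only, at most 9 of them.
    public static int ParseSize(ReadOnlySpan<byte> data)
    {
        if (data.Length == 0 || data.Length > 9)
            return -1;

        int n = 0;
        foreach (byte b in data)
        {
            if (b < (byte)'0' || b > (byte)'9')
                return -1; // rejects signs, spaces, and any other non-digit byte
            n = n * 10 + (b - '0');
        }

        return n;
    }

    // Hypothetical convenience wrapper for trying inputs as strings.
    public static int Parse(string text) => ParseSize(Encoding.ASCII.GetBytes(text));
}
```

Parse("1048576") returns 1048576 and Parse("999999999") returns 999999999, the largest accepted value. Parse(""), Parse("12a"), Parse("-5") and the ten-digit Parse("1000000000") all return -1. Because a leading minus sign is rejected as a non-digit, -1 is the only negative value the function can produce, which keeps the sentinel unambiguous.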

Error Handling

ProtocolViolationException is thrown for all malformed input:

- Control line exceeds MaxControlLineSize (4096 bytes).
- Unknown command bytes.
- Wrong number of arguments for a command.
- Payload size is negative, exceeds MaxPayloadSize, or the trailing \r\n after the payload is absent.
- SplitArgs receives more tokens than the caller's ranges span can hold.

The read loop is responsible for catching ProtocolViolationException, sending -ERR to the client, and closing the connection.

Limits

Limit	Value	Source
Max control line	4096 bytes	`NatsProtocol.MaxControlLineSize`
Max payload (default)	1 048 576 bytes	`NatsProtocol.MaxPayloadSize`
Max size field digits	9	`ParseSize` length check

The max payload is configurable: NatsParser accepts a maxPayload constructor argument, which NatsClient sets from NatsOptions.

Go Reference

The .NET parser is a direct port of the state machine in golang/nats-server/server/parser.go. The Go implementation uses the same two-byte command identification technique and the same two-phase control-line/payload split for PUB and HPUB.
