docs: record parser hot-path allocation strategy

This commit is contained in:
Joseph Doherty
2026-03-13 10:08:20 -04:00
parent 6cf11969f5
commit a3b34fb16d
5 changed files with 235 additions and 8 deletions

View File

@@ -24,9 +24,30 @@ public enum CommandType
}
```
### ParsedCommandView
`ParsedCommandView` is the byte-first parser result used on the hot path. It keeps protocol fields in byte-oriented storage and exposes payload as a `ReadOnlySequence<byte>` so single-segment bodies can flow through without an unconditional copy.
```csharp
public readonly struct ParsedCommandView
{
public CommandType Type { get; init; }
public string? Operation { get; init; }
public ReadOnlyMemory<byte> Subject { get; init; }
public ReadOnlyMemory<byte> ReplyTo { get; init; }
public ReadOnlyMemory<byte> Queue { get; init; }
public ReadOnlyMemory<byte> Sid { get; init; }
public int MaxMessages { get; init; }
public int HeaderSize { get; init; }
public ReadOnlySequence<byte> Payload { get; init; }
}
```
`Subject`, `ReplyTo`, `Queue`, and `Sid` remain ASCII-encoded bytes until a caller explicitly materializes them. `Payload` stays sequence-backed until a caller asks for contiguous memory.
### ParsedCommand
`ParsedCommand` is a `readonly struct` that carries the result of a successful parse. Using a struct avoids a heap allocation per command on the fast path.
`ParsedCommand` remains the compatibility shape for existing consumers. `TryParse` now delegates through `TryParseView` and materializes strings and contiguous payload memory in one adapter step instead of during every parse branch.
```csharp
public readonly struct ParsedCommand
@@ -46,15 +67,21 @@ public readonly struct ParsedCommand
Fields that do not apply to a given command type are left at their default values (`null` for strings, `0` for integers). `MaxMessages` uses `-1` as a sentinel meaning "unset" (relevant for UNSUB with no max). `HeaderSize` is set for HPUB/HMSG; `-1` indicates no headers. `Payload` carries the raw body bytes for PUB/HPUB, and the raw JSON bytes for CONNECT/INFO.
## TryParse
## TryParseView and TryParse
`TryParse` is the main entry point. It is called by the read loop after each `PipeReader.ReadAsync` completes.
`TryParseView` is the byte-oriented parser entry point. It is called by hot-path consumers such as `NatsClient.ProcessCommandsAsync` when they want to defer materialization.
```csharp
internal bool TryParseView(ref ReadOnlySequence<byte> buffer, out ParsedCommandView command)
```
`TryParse` remains the compatibility entry point for existing call sites and tests:
```csharp
public bool TryParse(ref ReadOnlySequence<byte> buffer, out ParsedCommand command)
```
The method returns `true` and advances `buffer` past the consumed bytes when a complete command is available. It returns `false` — leaving `buffer` unchanged — when more data is needed. The caller must call `TryParse` in a loop until it returns `false`, then call `PipeReader.AdvanceTo` to signal how far the buffer was consumed.
Both methods return `true` and advance `buffer` past the consumed bytes when a complete command is available. They return `false` — leaving `buffer` unchanged — when more data is needed. The caller must call the parser in a loop until it returns `false`, then call `PipeReader.AdvanceTo` to signal how far the buffer was consumed.
If the parser detects a malformed command it throws `ProtocolViolationException`, which the read loop catches to close the connection.
@@ -158,9 +185,9 @@ The two-character pairs are: `p+i` = PING, `p+o` = PONG, `p+u` = PUB, `h+p` = HP
PUB and HPUB require a payload body that follows the control line. The parser handles split reads — where the TCP segment boundary falls inside the payload — through an `_awaitingPayload` state flag.
**Phase 1 — control line:** The parser reads the control line up to `\r\n`, extracts the subject, optional reply-to, and payload size(s), then stores these in private fields (`_pendingSubject`, `_pendingReplyTo`, `_expectedPayloadSize`, `_pendingHeaderSize`, `_pendingType`) and sets `_awaitingPayload = true`. It then immediately calls `TryReadPayload` to attempt phase 2.
**Phase 1 — control line:** The parser reads the control line up to `\r\n`, extracts the subject, optional reply-to, and payload size(s), then stores these in private fields (`_pendingSubject`, `_pendingReplyTo`, `_expectedPayloadSize`, `_pendingHeaderSize`, `_pendingType`) and sets `_awaitingPayload = true`. The pending subject and reply values are held as byte-oriented state, not strings. It then immediately calls `TryReadPayload` to attempt phase 2.
**Phase 2 — payload read:** `TryReadPayload` checks whether `buffer.Length >= _expectedPayloadSize + 2` (the `+ 2` accounts for the trailing `\r\n`). If enough data is present, the payload bytes are copied to a new `byte[]`, the trailing `\r\n` is verified, the `ParsedCommand` is constructed, and `_awaitingPayload` is reset to `false`. If not enough data is present, `TryReadPayload` returns `false` and `_awaitingPayload` remains `true`.
**Phase 2 — payload read:** `TryReadPayload` checks whether `buffer.Length >= _expectedPayloadSize + 2` (the `+ 2` accounts for the trailing `\r\n`). If enough data is present, the parser slices the payload as a `ReadOnlySequence<byte>`, verifies the trailing `\r\n`, constructs a `ParsedCommandView`, and resets `_awaitingPayload` to `false`. If not enough data is present, `TryReadPayload` returns `false` and `_awaitingPayload` remains `true`.
On the next call to `TryParse`, the check at the top of the method routes straight to `TryReadPayload` without re-parsing the control line:
@@ -171,6 +198,20 @@ if (_awaitingPayload)
This means the parser correctly handles payloads that arrive across multiple `PipeReader.ReadAsync` completions without buffering the control line a second time.
## Materialization Boundaries
The parser now has explicit materialization boundaries:
- `TryParseView` keeps payloads sequence-backed and leaves token fields as bytes.
- `ParsedCommandView.Materialize()` converts byte fields to strings and converts multi-segment payloads to a standalone `byte[]`.
- `NatsClient` consumes `ParsedCommandView` directly for the `PUB` and `HPUB` hot path, only decoding subject and reply strings at the routing and permission-check boundary.
- `CONNECT` and `INFO` now keep their JSON payload as a slice of the original control-line sequence until a consumer explicitly materializes it.
Payload copying is still intentional in two places:
- when a multi-segment payload must become contiguous for a consumer using `ReadOnlyMemory<byte>`
- when compatibility callers continue to use `TryParse` and require a materialized `ParsedCommand`
## Zero-Allocation Argument Splitting
`SplitArgs` splits the argument portion of a control line into token ranges without allocating. The caller `stackalloc`s a `Span<Range>` sized to the maximum expected argument count for the command, then passes it to `SplitArgs`: