docs: add FileStore benchmarks and storage notes

This commit is contained in:
Joseph Doherty
2026-03-13 11:34:19 -04:00
parent f57edca5a8
commit ca2d8019a1
4 changed files with 174 additions and 32 deletions

View File

@@ -286,43 +286,31 @@ public ValueTask<ulong> AppendAsync(string subject, ReadOnlyMemory<byte> payload
### FileStore
`FileStore` appends messages to a JSONL file (`messages.jsonl`) and keeps a full in-memory index (`Dictionary<ulong, StoredMessage>`) identical in structure to `MemStore`. It is not production-safe for several reasons:
- **No locking**: `AppendAsync`, `LoadAsync`, `GetStateAsync`, and `TrimToMaxMessages` are not synchronized. Concurrent access from `StreamManager.Capture` and `PullConsumerEngine.FetchAsync` is unsafe.
- **Per-write file I/O**: Each `AppendAsync` calls `File.AppendAllTextAsync`, issuing a separate file open/write/close per message.
- **Full rewrite on trim**: `TrimToMaxMessages` calls `RewriteDataFile()`, which rewrites the entire file from the in-memory index. This is O(n) in message count and blocking.
- **Full in-memory index**: The in-memory dictionary holds every undeleted message payload; there is no paging or streaming read path.
`FileStore` now persists messages into block files via `MsgBlock`, keeps a live in-memory message cache for load paths, and maintains a compact metadata index (`Dictionary<ulong, StoredMessageIndex>`) plus a per-subject last-sequence map for hot-path lookups such as `LoadLastBySubjectAsync`. Headers and payloads are stored separately and remain separate across snapshot, restore, block rewrite, and crash recovery. On startup, any legacy `messages.jsonl` file is migrated into block storage before recovery continues.
```csharp
// FileStore.cs
public void TrimToMaxMessages(ulong maxMessages)
private readonly Dictionary<ulong, StoredMessage> _messages = new();
private readonly Dictionary<ulong, StoredMessageIndex> _messageIndexes = new();
private readonly Dictionary<string, ulong> _lastSequenceBySubject = new(StringComparer.Ordinal);
public ValueTask<StoredMessage?> LoadLastBySubjectAsync(string subject, CancellationToken ct)
{
while ((ulong)_messages.Count > maxMessages)
if (_lastSequenceBySubject.TryGetValue(subject, out var sequence)
&& _messages.TryGetValue(sequence, out var match))
{
var first = _messages.Keys.Min();
_messages.Remove(first);
return ValueTask.FromResult<StoredMessage?>(match);
}
RewriteDataFile();
}
private void RewriteDataFile()
{
var lines = new List<string>(_messages.Count);
foreach (var message in _messages.OrderBy(kv => kv.Key).Select(kv => kv.Value))
{
lines.Add(JsonSerializer.Serialize(new FileRecord
{
Sequence = message.Sequence,
Subject = message.Subject,
PayloadBase64 = Convert.ToBase64String(message.Payload.ToArray()),
}));
}
File.WriteAllLines(_dataFilePath, lines);
return ValueTask.FromResult<StoredMessage?>(null);
}
```
The Go reference (`filestore.go`) uses block-based binary storage with S2 compression, per-block indexes, and memory-mapped I/O. This implementation shares none of those properties.
The current implementation is still materially simpler than Go `filestore.go`:
- **No synchronization**: `FileStore` still exposes unsynchronized mutation and read paths. It is safe only under the current test and single-process usage assumptions.
- **Payloads still stay resident**: the compact index removes duplicate payload ownership for metadata-heavy operations, but `_messages` still retains live payload bytes in memory for direct load paths.
- **No Go-equivalent block index stack**: there is no per-block subject tree, mmap-backed read path, or Go-style cache/compaction parity. Deletes and trims rely on tombstones plus later block maintenance rather than Go's full production filestore behavior.
---
@@ -445,7 +433,7 @@ The following features are present in the Go reference (`golang/nats-server/serv
- **Ephemeral consumers**: `ConsumerManager.CreateOrUpdate` requires a non-empty `DurableName`. There is no support for unnamed ephemeral consumers.
- **Push delivery over the NATS wire**: Push consumers enqueue `PushFrame` objects into an in-memory queue. No MSG is written to any connected NATS client's TCP socket.
- **Consumer filter subject enforcement**: `FilterSubject` is stored on `ConsumerConfig` but is never applied in `PullConsumerEngine.FetchAsync`. All messages in the stream are returned regardless of filter.
- **FileStore production safety**: No locking, per-write file I/O, full-rewrite-on-trim, and full in-memory index make `FileStore` unsuitable for production use.
- **FileStore production safety**: `FileStore` now uses block files and compact metadata indexes, but it still lacks synchronization and Go-level block indexing, so it remains unsuitable for production use.
- **RAFT persistence and networking**: `RaftNode` log entries are not persisted across restarts. Replication uses direct in-process method calls; there is no network transport for multi-server consensus.
- **Cross-server replication**: Mirror and source coordinators work only within one `StreamManager` in one process. Messages published on a remote server are not replicated.
- **Duplicate message window**: `PublishPreconditions` tracks message IDs for deduplication but there is no configurable `DuplicateWindow` TTL to expire old IDs.
@@ -460,4 +448,4 @@ The following features are present in the Go reference (`golang/nats-server/serv
- [Configuration Overview](../Configuration/Overview.md)
- [Protocol Overview](../Protocol/Overview.md)
<!-- Last verified against codebase: 2026-02-23 -->
<!-- Last verified against codebase: 2026-03-13 -->