docs: add design doc for SYSTEM and ACCOUNT connection types

Covers 6 implementation layers: ClientKind enum + INatsClient interface,
event infrastructure with Channel<T>, system event publishing, request-reply
monitoring services, import/export model with ACCOUNT client, and response
routing with latency tracking.
This commit is contained in:
Joseph Doherty
2026-02-23 05:03:17 -05:00
22 changed files with 5035 additions and 21 deletions

View File

@@ -0,0 +1,565 @@
# Design: SYSTEM and ACCOUNT Connection Types
**Date:** 2026-02-23
**Status:** Approved
**Approach:** Bottom-Up Layered Build (6 layers)
## Overview
Port the SYSTEM and ACCOUNT internal connection types from the Go NATS server to .NET. This includes:
- Client type differentiation (ClientKind enum)
- Internal client infrastructure (socketless clients with callback-based delivery)
- Full system event publishing ($SYS.ACCOUNT.*.CONNECT, DISCONNECT, STATSZ, etc.)
- System request-reply monitoring services ($SYS.REQ.SERVER.*.VARZ, CONNZ, etc.)
- Account service/stream imports and exports (cross-account message routing)
- Response routing for service imports with latency tracking
**Go reference files:**
- `golang/nats-server/server/client.go` — client type constants (lines 45-65), `isInternalClient()`, message delivery (lines 3789-3803)
- `golang/nats-server/server/server.go` — system account setup (lines 1822-1892), `createInternalClient()` (lines 1910-1936)
- `golang/nats-server/server/events.go``internal` struct (lines 124-147), event subjects (lines 41-97), send/receive loops (lines 474-668), event publishing, subscriptions (lines 1172-1495)
- `golang/nats-server/server/accounts.go``Account` struct (lines 52-119), import/export structs (lines 142-263), `addServiceImport()` (lines 1560-2112), `addServiceImportSub()` (lines 2156-2187), `internalClient()` (lines 2114-2122)
---
## Layer 1: ClientKind Enum + INatsClient Interface + InternalClient
### ClientKind Enum
**New file:** `src/NATS.Server/ClientKind.cs`
```csharp
public enum ClientKind
{
Client, // End user connection
Router, // Cluster peer (out of scope)
Gateway, // Inter-cluster bridge (out of scope)
Leaf, // Leaf node (out of scope)
System, // Internal system client
JetStream, // Internal JetStream client (out of scope)
Account, // Internal per-account client
}
public static class ClientKindExtensions
{
public static bool IsInternal(this ClientKind kind) =>
kind is ClientKind.System or ClientKind.JetStream or ClientKind.Account;
}
```
### INatsClient Interface
Extract from `NatsClient` the surface used by `Subscription`, `DeliverMessage`, `ProcessMessage`:
```csharp
public interface INatsClient
{
ulong Id { get; }
ClientKind Kind { get; }
bool IsInternal { get; }
Account? Account { get; }
ClientOptions? ClientOpts { get; }
ClientPermissions? Permissions { get; }
void SendMessage(string subject, string sid, string? replyTo,
ReadOnlyMemory<byte> headers, ReadOnlyMemory<byte> payload);
bool QueueOutbound(ReadOnlyMemory<byte> data);
}
```
### InternalClient Class
**New file:** `src/NATS.Server/InternalClient.cs`
Lightweight, socketless client for internal messaging:
- `ClientKind Kind` — System, Account, or JetStream
- `Account Account` — associated account
- `ulong Id` — unique client ID from server's ID counter
- Headers always enabled, echo always disabled
- `SendMessage` invokes internal callback delegate or pushes to Channel
- No socket, no read/write loops, no parser
- `QueueOutbound` is a no-op (internal clients don't write wire protocol)
### Subscription Change
`Subscription.Client` changes from `NatsClient?` to `INatsClient?`. This is the biggest refactoring step — all code referencing `sub.Client` as `NatsClient` needs updating.
`NatsClient` implements `INatsClient` with `Kind = ClientKind.Client`.
---
## Layer 2: System Event Infrastructure
### InternalEventSystem Class
**New file:** `src/NATS.Server/Events/InternalEventSystem.cs`
Core class managing the server's internal event system, mirroring Go's `internal` struct:
```csharp
public sealed class InternalEventSystem : IAsyncDisposable
{
// Core state
public Account SystemAccount { get; }
public InternalClient SystemClient { get; }
private ulong _sequence;
private int _subscriptionId;
private readonly string _serverHash;
private readonly string _inboxPrefix;
// Message queues (Channel<T>-based)
private readonly Channel<PublishMessage> _sendQueue;
private readonly Channel<InternalSystemMessage> _receiveQueue;
private readonly Channel<InternalSystemMessage> _receiveQueuePings;
// Background tasks
private Task? _sendLoop;
private Task? _receiveLoop;
private Task? _receiveLoopPings;
// Remote server tracking
private readonly ConcurrentDictionary<string, ServerUpdate> _remoteServers = new();
// Timers
private PeriodicTimer? _statszTimer; // 10s interval
private PeriodicTimer? _accountConnsTimer; // 30s interval
private PeriodicTimer? _orphanSweeper; // 90s interval
}
```
### Message Types
```csharp
public record PublishMessage(
InternalClient? Client, // Use specific client or default to system client
string Subject,
string? Reply,
ServerInfo? Info,
byte[]? Headers,
object? Body, // JSON-serializable
bool Echo = false,
bool IsLast = false);
public record InternalSystemMessage(
Subscription? Sub,
INatsClient? Client,
Account? Account,
string Subject,
string? Reply,
ReadOnlyMemory<byte> Headers,
ReadOnlyMemory<byte> Message,
Action<Subscription?, INatsClient?, Account?, string, string?, ReadOnlyMemory<byte>, ReadOnlyMemory<byte>> Callback);
```
### Lifecycle
- `StartAsync(NatsServer server)` — creates system client, starts 3 background Tasks
- `StopAsync()` — publishes shutdown event with `IsLast=true`, signals channels complete, awaits all tasks
### Send Loop
Consumes from `_sendQueue`:
1. Fills in ServerInfo metadata (name, host, ID, sequence, version, tags)
2. Serializes body to JSON using source-generated serializer
3. Calls `server.ProcessMessage()` on the system account to deliver locally
4. Handles compression if configured
### Receive Loop(s)
Two instances (general + pings) consuming from their respective channels:
- Pop messages, invoke callbacks
- Exit on cancellation
### APIs on NatsServer
```csharp
public void SendInternalMsg(string subject, string? reply, object? msg);
public void SendInternalAccountMsg(Account account, string subject, object? msg);
public Subscription SysSubscribe(string subject, SystemMessageHandler callback);
public Subscription SysSubscribeInternal(string subject, SystemMessageHandler callback);
```
### noInlineCallback Pattern
Wraps a `SystemMessageHandler` so that instead of executing inline during message delivery, it enqueues to `_receiveQueue` for async dispatch. This prevents system event handlers from blocking the publishing path.
---
## Layer 3: System Event Publishing
### Event Types (DTOs)
**New folder:** `src/NATS.Server/Events/`
All events embed a `TypedEvent` base:
```csharp
public record TypedEvent(string Type, string Id, DateTime Time);
```
| Event Class | Type String | Published On |
|-------------|-------------|-------------|
| `ConnectEventMsg` | `io.nats.server.advisory.v1.client_connect` | `$SYS.ACCOUNT.{acc}.CONNECT` |
| `DisconnectEventMsg` | `io.nats.server.advisory.v1.client_disconnect` | `$SYS.ACCOUNT.{acc}.DISCONNECT` |
| `AccountNumConns` | `io.nats.server.advisory.v1.account_connections` | `$SYS.ACCOUNT.{acc}.SERVER.CONNS` |
| `ServerStatsMsg` | (stats) | `$SYS.SERVER.{id}.STATSZ` |
| `ShutdownEventMsg` | (shutdown) | `$SYS.SERVER.{id}.SHUTDOWN` |
| `LameDuckEventMsg` | (lameduck) | `$SYS.SERVER.{id}.LAMEDUCK` |
| `AuthErrorEventMsg` | `io.nats.server.advisory.v1.client_auth` | `$SYS.SERVER.{id}.CLIENT.AUTH.ERR` |
### Integration Points
| Location | Event | Trigger |
|----------|-------|---------|
| `NatsServer.HandleClientAsync()` after auth | `ConnectEventMsg` | Client authenticated |
| `NatsServer.RemoveClient()` | `DisconnectEventMsg` | Client disconnected |
| `NatsServer.ShutdownAsync()` | `ShutdownEventMsg` | Server shutting down |
| `NatsServer.LameDuckShutdownAsync()` | `LameDuckEventMsg` | Lame duck mode |
| Auth failure in `NatsClient.ProcessConnect()` | `AuthErrorEventMsg` | Auth rejected |
| Periodic timer (10s) | `ServerStatsMsg` | Timer tick |
| Periodic timer (30s) | `AccountNumConns` | Timer tick, for each account with connections |
### JSON Serialization
`System.Text.Json` source generator context:
```csharp
[JsonSerializable(typeof(ConnectEventMsg))]
[JsonSerializable(typeof(DisconnectEventMsg))]
[JsonSerializable(typeof(ServerStatsMsg))]
// ... etc
internal partial class EventJsonContext : JsonSerializerContext { }
```
---
## Layer 4: System Request-Reply Services
### Subscriptions Created in initEventTracking()
Server-specific (only this server responds):
| Subject | Handler | Response |
|---------|---------|----------|
| `$SYS.REQ.SERVER.{id}.IDZ` | `IdzReq` | Server identity |
| `$SYS.REQ.SERVER.{id}.STATSZ` | `StatszReq` | Server stats (same as /varz stats) |
| `$SYS.REQ.SERVER.{id}.VARZ` | `VarzReq` | Same as /varz JSON |
| `$SYS.REQ.SERVER.{id}.CONNZ` | `ConnzReq` | Same as /connz JSON |
| `$SYS.REQ.SERVER.{id}.SUBSZ` | `SubszReq` | Same as /subz JSON |
| `$SYS.REQ.SERVER.{id}.HEALTHZ` | `HealthzReq` | Health status |
| `$SYS.REQ.SERVER.{id}.ACCOUNTZ` | `AccountzReq` | Account info |
Wildcard ping (all servers respond):
| Subject | Handler |
|---------|---------|
| `$SYS.REQ.SERVER.PING.STATSZ` | `StatszReq` |
| `$SYS.REQ.SERVER.PING.VARZ` | `VarzReq` |
| `$SYS.REQ.SERVER.PING.IDZ` | `IdzReq` |
| `$SYS.REQ.SERVER.PING.HEALTHZ` | `HealthzReq` |
Account-scoped:
| Subject | Handler |
|---------|---------|
| `$SYS.REQ.ACCOUNT.*.CONNZ` | `AccountConnzReq` |
| `$SYS.REQ.ACCOUNT.*.SUBSZ` | `AccountSubszReq` |
| `$SYS.REQ.ACCOUNT.*.INFO` | `AccountInfoReq` |
| `$SYS.REQ.ACCOUNT.*.STATZ` | `AccountStatzReq` |
### Implementation
Handlers reuse existing `MonitorServer` data builders. The request body (if present) is parsed for options (e.g., sort, limit for CONNZ). Response is serialized to JSON and published on the request's reply subject via `SendInternalMsg`.
---
## Layer 5: Import/Export Model + ACCOUNT Client
### Export Types
**New file:** `src/NATS.Server/Imports/StreamExport.cs`
```csharp
public sealed class StreamExport
{
public ExportAuth Auth { get; init; } = new();
}
```
**New file:** `src/NATS.Server/Imports/ServiceExport.cs`
```csharp
public sealed class ServiceExport
{
public ExportAuth Auth { get; init; } = new();
public Account? Account { get; init; }
public ServiceResponseType ResponseType { get; init; } = ServiceResponseType.Singleton;
public TimeSpan ResponseThreshold { get; init; } = TimeSpan.FromMinutes(2);
public ServiceLatency? Latency { get; init; }
public bool AllowTrace { get; init; }
}
```
**New file:** `src/NATS.Server/Imports/ExportAuth.cs`
```csharp
public sealed class ExportAuth
{
public bool TokenRequired { get; init; }
public uint AccountPosition { get; init; }
public HashSet<string>? ApprovedAccounts { get; init; }
public Dictionary<string, long>? RevokedAccounts { get; init; }
public bool IsAuthorized(Account account) { ... }
}
```
### Import Types
**New file:** `src/NATS.Server/Imports/StreamImport.cs`
```csharp
public sealed class StreamImport
{
public required Account SourceAccount { get; init; }
public required string From { get; init; }
public required string To { get; init; }
public SubjectTransform? Transform { get; init; }
public bool UsePub { get; init; }
public bool Invalid { get; set; }
}
```
**New file:** `src/NATS.Server/Imports/ServiceImport.cs`
```csharp
public sealed class ServiceImport
{
public required Account DestinationAccount { get; init; }
public required string From { get; init; }
public required string To { get; init; }
public SubjectTransform? Transform { get; init; }
public ServiceExport? Export { get; init; }
public ServiceResponseType ResponseType { get; init; }
public byte[]? Sid { get; set; }
public bool IsResponse { get; init; }
public bool UsePub { get; init; }
public bool Invalid { get; set; }
public bool Share { get; init; }
public bool Tracking { get; init; }
}
```
### Account Extensions
Add to `Account`:
```csharp
// Export/Import maps
public ExportMap Exports { get; } = new();
public ImportMap Imports { get; } = new();
// Internal ACCOUNT client (lazy)
private InternalClient? _internalClient;
public InternalClient GetOrCreateInternalClient(NatsServer server) { ... }
// Internal subscription management
private ulong _internalSubId;
public Subscription SubscribeInternal(string subject, SystemMessageHandler callback) { ... }
// Import/Export APIs
public void AddServiceExport(string subject, ServiceResponseType responseType, IEnumerable<Account>? approved);
public void AddStreamExport(string subject, IEnumerable<Account>? approved);
public ServiceImport AddServiceImport(Account destination, string from, string to);
public void AddStreamImport(Account source, string from, string to);
```
### ExportMap / ImportMap
```csharp
public sealed class ExportMap
{
public Dictionary<string, StreamExport> Streams { get; } = new(StringComparer.Ordinal);
public Dictionary<string, ServiceExport> Services { get; } = new(StringComparer.Ordinal);
public Dictionary<string, ServiceImport> Responses { get; } = new(StringComparer.Ordinal);
}
public sealed class ImportMap
{
public List<StreamImport> Streams { get; } = [];
public Dictionary<string, List<ServiceImport>> Services { get; } = new(StringComparer.Ordinal);
}
```
### Service Import Subscription Flow
1. `account.AddServiceImport(dest, "requests.>", "api.>")` called
2. Account creates its `InternalClient` (Kind=Account) if needed
3. Creates subscription on `"requests.>"` in account's SubList with `Client = internalClient`
4. Subscription carries a `ServiceImport` reference
5. When message matches, `DeliverMessage` detects internal client → invokes `ProcessServiceImport`
### ProcessServiceImport Callback
1. Transform subject if transform configured
2. Match against destination account's SubList
3. Deliver to destination subscribers (rewriting reply subject for response routing)
4. If reply present: set up response service import (see Layer 6)
### Stream Import Delivery
In `DeliverMessage`, before sending to subscriber:
- If subscription has `StreamImport` reference, apply subject transform
- Deliver with transformed subject
### Message Delivery Path Changes
`NatsServer.ProcessMessage` needs modification:
- After matching local account SubList, also check for service imports that might forward to other accounts
- For subscriptions with `sub.StreamImport != null`, transform subject before delivery
---
## Layer 6: Response Routing + Latency Tracking
### Service Reply Prefix
Generated per request: `_R_.{random10chars}.` — unique reply namespace in the exporting account.
### Response Service Import Creation
When `ProcessServiceImport` handles a request with a reply subject:
1. Generate new reply prefix: `_R_.{random}.`
2. Create response `ServiceImport` in the exporting account:
- `From = newReplyPrefix + ">"` (wildcard to catch all responses)
- `To = originalReply` (original reply subject in importing account)
- `IsResponse = true`
3. Subscribe to new prefix in exporting account
4. Rewrite reply in forwarded message to new prefix
5. Store in `ExportMap.Responses[newPrefix]`
### Response Delivery
When exporting account service responds on the rewritten reply:
1. Response matches the `_R_.{random}.>` subscription
2. Response service import callback fires
3. Transforms reply back to original subject
4. Delivers to original account's subscribers
### Cleanup
- **Singleton:** Remove response import after first response delivery
- **Streamed:** Track timestamp, clean up via timer after `ResponseThreshold` (default 2 min)
- **Chunked:** Same as Streamed
Timer runs periodically (every 30s), checks `ServiceImport.Timestamp` against threshold, removes stale entries.
### Latency Tracking
```csharp
public sealed class ServiceLatency
{
public int SamplingPercentage { get; init; } // 1-100
public string Subject { get; init; } = string.Empty; // where to publish metrics
}
public record ServiceLatencyMsg(
TypedEvent Event,
string Status,
string Requestor, // Account name
string Responder, // Account name
TimeSpan RequestStart,
TimeSpan ServiceLatency,
TimeSpan TotalLatency);
```
When tracking is enabled:
1. Record request timestamp when creating response import
2. On response delivery, calculate latency
3. Publish `ServiceLatencyMsg` to configured subject
4. Sampling: only track if `Random.Shared.Next(100) < SamplingPercentage`
---
## Testing Strategy
### Layer 1 Tests
- Verify `ClientKind.IsInternal()` for all kinds
- Create `InternalClient`, verify properties (Kind, Id, Account, IsInternal)
- Verify `INatsClient` interface on both `NatsClient` and `InternalClient`
### Layer 2 Tests
- Start/stop `InternalEventSystem` lifecycle
- `SysSubscribe` creates subscription in system account SubList
- `SendInternalMsg` delivers to system subscribers via send loop
- `noInlineCallback` queues to receive loop rather than executing inline
- Concurrent publish/subscribe stress test
### Layer 3 Tests
- Connect event published on `$SYS.ACCOUNT.{acc}.CONNECT` when client authenticates
- Disconnect event published when client closes
- Server stats published every 10s on `$SYS.SERVER.{id}.STATSZ`
- Account conns published every 30s for accounts with connections
- Shutdown event published during shutdown
- Auth error event published on auth failure
- Event JSON structure matches Go format
### Layer 4 Tests
- Subscribe to `$SYS.REQ.SERVER.{id}.VARZ`, send request, verify response matches /varz
- Subscribe to `$SYS.REQ.SERVER.{id}.CONNZ`, verify response
- Ping wildcard `$SYS.REQ.SERVER.PING.HEALTHZ` receives response
- Account-scoped requests work
### Layer 5 Tests
- `AddServiceExport` + `AddServiceImport` creates internal subscription
- Message published on import subject is forwarded to export account
- Wildcard imports with subject transforms
- Authorization: only approved accounts can import
- Stream import with subject transform
- Cycle detection in service imports
- Account internal client lazy creation
### Layer 6 Tests
- Service import request-reply: request forwarded with rewritten reply, response routed back
- Singleton response: import cleaned up after one response
- Streamed response: multiple responses, cleaned up after timeout
- Latency tracking: metrics published to configured subject
- Response threshold timer cleans up stale entries
---
## Files to Create/Modify
### New Files
- `src/NATS.Server/ClientKind.cs`
- `src/NATS.Server/INatsClient.cs`
- `src/NATS.Server/InternalClient.cs`
- `src/NATS.Server/Events/InternalEventSystem.cs`
- `src/NATS.Server/Events/EventTypes.cs` (all event DTOs)
- `src/NATS.Server/Events/EventJsonContext.cs` (source gen)
- `src/NATS.Server/Events/EventSubjects.cs` (subject constants)
- `src/NATS.Server/Imports/ServiceImport.cs`
- `src/NATS.Server/Imports/StreamImport.cs`
- `src/NATS.Server/Imports/ServiceExport.cs`
- `src/NATS.Server/Imports/StreamExport.cs`
- `src/NATS.Server/Imports/ExportAuth.cs`
- `src/NATS.Server/Imports/ExportMap.cs`
- `src/NATS.Server/Imports/ImportMap.cs`
- `src/NATS.Server/Imports/ServiceResponseType.cs`
- `src/NATS.Server/Imports/ServiceLatency.cs`
- `tests/NATS.Server.Tests/InternalClientTests.cs`
- `tests/NATS.Server.Tests/EventSystemTests.cs`
- `tests/NATS.Server.Tests/SystemEventsTests.cs`
- `tests/NATS.Server.Tests/SystemRequestReplyTests.cs`
- `tests/NATS.Server.Tests/ImportExportTests.cs`
- `tests/NATS.Server.Tests/ResponseRoutingTests.cs`
### Modified Files
- `src/NATS.Server/NatsClient.cs` — implement `INatsClient`, add `Kind` property
- `src/NATS.Server/NatsServer.cs` — integrate event system, add import/export message path, system event publishing
- `src/NATS.Server/Auth/Account.cs` — add exports/imports, internal client, subscription APIs
- `src/NATS.Server/Subscriptions/Subscription.cs``Client``INatsClient?`, add `ServiceImport?`, `StreamImport?`
- `src/NATS.Server/Subscriptions/SubList.cs` — work with `INatsClient` if needed
- `src/NATS.Server/Monitoring/MonitorServer.cs` — expose data builders for request-reply handlers
- `differences.md` — update SYSTEM, ACCOUNT, import/export status