- Resolve 14 conflicts from popping local stash on top of origin'seed1e88+8d3352fdoc-comment additions (11 mechanical, plus version.rs, DashboardAuthenticatorTests.cs, DashboardGalaxyProjector.cs) - Fix 4 test files that used AGENTS.md as the repo-root sentinel (now use CLAUDE.md, since AGENTS.md was removed in4731ab5) - Redirect 10 doc citations from AGENTS.md to the matching gateway.md sections (Value Model, Status Model, Security, STA Worker Thread Model, gRPC Layer rule, cancellation rule) Verified: solution build clean, x86 worker build clean, 266/266 gateway tests passing, 121/121 worker tests passing. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
15 KiB
Gateway gRPC Service Layer
The gRPC service layer is the public entry point for client traffic. It is intentionally thin: handlers validate the incoming request, look up or open a session, dispatch to the worker through the session manager, and translate worker replies and events back into public proto types.
Layer Responsibilities
The architecture rule (from gateway.md) is that the gRPC layer must "validate request, find session, call the session worker client, map worker replies to public replies, and stream events". Anything else — caching, retries, worker process lifetime, event ordering — lives behind ISessionManager and the worker client. Keeping the layer thin lets the same session/worker code be reused by future transports (for example, an in-process host or an alternate IPC) without having to re-derive validation or mapping rules.
The layer is composed of four collaborators:
| Type | Lifetime | Role |
|---|---|---|
MxAccessGatewayService |
scoped (gRPC) | Implements the four MxAccessGateway RPCs, performs exception mapping. |
MxAccessGrpcRequestValidator |
singleton | Rejects malformed requests before any session work runs. |
MxAccessGrpcMapper |
singleton | Converts public proto types to internal WorkerCommand/WorkerEvent types and back. |
IEventStreamService (EventStreamService) |
singleton | Owns the event stream pipeline, including bounded queue and backpressure handling. |
Registration happens in GatewayApplication:
builder.Services.AddSingleton<MxAccessGrpcMapper>();
builder.Services.AddSingleton<MxAccessGrpcRequestValidator>();
builder.Services.AddSingleton<IEventStreamService, EventStreamService>();
The service itself is mapped as a normal gRPC endpoint via endpoints.MapGrpcService<MxAccessGatewayService>().
A second gRPC service, GalaxyRepositoryGrpcService, is mapped alongside it. It exposes the read-only Galaxy Repository browse surface and is documented separately in Galaxy Repository Browse. It shares the authorization interceptor and authentication policy used by MxAccessGatewayService, but it does not go through the session manager or worker — it talks to SQL Server directly.
RPC Handlers
MxAccessGatewayService derives from the generated MxAccessGateway.MxAccessGatewayBase and implements every RPC declared in mxaccess_gateway.proto. The proto contract itself is documented in Contracts; this section covers only what the server-side handler does on top of that contract.
Public gRPC send and receive message sizes are configured from
MxGateway:Protocol:MaxGrpcMessageBytes (default 16 MiB). Official clients use
the same default so paged Galaxy browse replies and larger MXAccess payloads
fail consistently instead of depending on language-specific gRPC defaults.
OpenSession
OpenSession validates the request, asks ISessionManager to open a session under the caller's identity, and returns a reply that advertises both protocol versions and the capabilities the gateway supports. Capability strings are static because the gateway has a fixed feature set per build; clients use them as a forward-compatibility hint rather than runtime negotiation.
GatewaySession session = await sessionManager
.OpenSessionAsync(
SessionOpenRequest.FromContract(request),
ResolveClientIdentity(),
context.CancellationToken)
.ConfigureAwait(false);
OpenSessionReply reply = new()
{
SessionId = session.SessionId,
BackendName = session.BackendName,
WorkerProcessId = session.WorkerProcessId ?? 0,
WorkerProtocolVersion = GatewayContractInfo.WorkerProtocolVersion,
GatewayProtocolVersion = GatewayContractInfo.GatewayProtocolVersion,
DefaultCommandTimeout = Google.Protobuf.WellKnownTypes.Duration.FromTimeSpan(session.CommandTimeout),
ProtocolStatus = MxAccessGrpcMapper.Ok(),
};
ResolveClientIdentity() reads IGatewayRequestIdentityAccessor.Current and prefers DisplayName, falling back to KeyId. The accessor is populated by the authorization interceptor before the handler runs (see Authorization).
CloseSession
The handler delegates to sessionManager.CloseSessionAsync and converts the resulting SessionCloseResult into a CloseSessionReply. The handler distinguishes the two outcomes via the protocol status message — "Session was already closed." versus "Session closed." — so callers do not need to reason about idempotency from a status code alone.
Invoke
Invoke is the unary command path. It runs the validator (which enforces payload-vs-kind matching), uses the mapper to wrap the public MxCommand in a WorkerCommand with an enqueue timestamp, calls sessionManager.InvokeAsync, and unwraps the worker reply.
requestValidator.ValidateInvoke(request);
WorkerCommand workerCommand = mapper.MapCommand(request);
WorkerCommandReply workerReply = await sessionManager
.InvokeAsync(request.SessionId, workerCommand, context.CancellationToken)
.ConfigureAwait(false);
return mapper.MapCommandReply(workerReply);
Carrying the enqueue timestamp into the worker layer is what lets queue-wait time be measured separately from worker-side execution time when troubleshooting timeouts.
StreamEvents
StreamEvents is a server-streaming RPC. The handler delegates the full pipeline to IEventStreamService and just forwards each MxEvent onto the response stream. Keeping the channel and producer/consumer machinery out of the handler means cancellation, exception mapping, and metric bookkeeping live in one place.
Validation Rules
MxAccessGrpcRequestValidator rejects requests with StatusCode.InvalidArgument before any session work happens. The rules are intentionally narrow — anything that requires session state (for example, "session does not exist") is left for ISessionManager so the validator can stay synchronous and side-effect free.
| RPC | Rule | Status |
|---|---|---|
OpenSession |
command_timeout, when set, must be > 0. |
InvalidArgument |
CloseSession |
session_id must be non-empty. |
InvalidArgument |
StreamEvents |
session_id must be non-empty. |
InvalidArgument |
Invoke |
session_id non-empty, command present, kind not Unspecified, payload oneof must match kind. |
InvalidArgument |
The payload-vs-kind check matters because the MxCommand.payload oneof is non-discriminated on the wire — a misaligned client could send kind = Write with a Register payload and silently confuse the worker. The validator turns that into a clear client error:
private static void ValidateCommandPayload(MxCommand command)
{
MxCommand.PayloadOneofCase expectedPayload = ExpectedPayload(command.Kind);
if (command.PayloadCase != expectedPayload)
{
throw InvalidArgument(
$"Command kind {command.Kind} requires payload {expectedPayload} but received {command.PayloadCase}.");
}
}
ExpectedPayload enumerates every MxCommandKind that has a matching payload case; unknown kinds map to PayloadOneofCase.None, which forces a mismatch and therefore a rejection.
Mapping Rules
MxAccessGrpcMapper is the only place that translates between public proto types and internal worker types. Two design choices are worth calling out.
The mapper clones the inbound MxCommand rather than reusing the reference. This isolates the worker pipeline from any later mutation of the request graph by the gRPC framework or interceptors:
public WorkerCommand MapCommand(MxCommandRequest request)
{
ArgumentNullException.ThrowIfNull(request);
ArgumentNullException.ThrowIfNull(request.Command);
return new WorkerCommand
{
Command = request.Command.Clone(),
EnqueueTimestamp = Timestamp.FromDateTimeOffset(_timeProvider.GetUtcNow()),
};
}
When the worker reply or event payload is missing, the mapper returns a synthetic public message with ProtocolStatusCode.ProtocolViolation (for replies) or a sentinel MxEvent with MxEventFamily.Unspecified (for events). The gateway never relays a partial frame to clients — anything missing is reported as a protocol violation against the worker, not a transport error against the client.
The mapper also exposes static factory methods for every ProtocolStatusCode (Ok, InvalidRequest, SessionNotFound, SessionNotReady, WorkerUnavailable, Timeout, Canceled, ProtocolViolation) so that handlers and tests can produce status payloads without duplicating the enum-to-string mapping.
Exception to Status Mapping
Every handler wraps its body in try { ... } catch (Exception exception) when (exception is not RpcException) { throw MapException(exception); }. RpcException is allowed to propagate untouched (the validator already produces them with the right code). All other exceptions are translated by MapException, which knows three categories:
OperationCanceledExceptionbecomesStatusCode.Cancelled.SessionManagerExceptionis mapped by itsErrorCode.WorkerClientExceptionis mapped by itsErrorCode.
Anything else is logged at warning and surfaced as Unavailable with a generic message — clients see a retryable status, and the unexpected exception is captured in gateway logs rather than leaked over the wire.
StatusCode statusCode = exception.ErrorCode switch
{
SessionManagerErrorCode.SessionNotFound => StatusCode.NotFound,
SessionManagerErrorCode.SessionNotReady => StatusCode.FailedPrecondition,
SessionManagerErrorCode.EventSubscriberAlreadyActive => StatusCode.ResourceExhausted,
SessionManagerErrorCode.EventQueueOverflow => StatusCode.ResourceExhausted,
SessionManagerErrorCode.SessionLimitExceeded => StatusCode.ResourceExhausted,
SessionManagerErrorCode.OpenFailed => StatusCode.Unavailable,
SessionManagerErrorCode.CloseFailed => StatusCode.Unavailable,
_ => StatusCode.Unavailable,
};
WorkerClientException follows the same pattern: CommandTimeout becomes DeadlineExceeded, GatewayShutdown becomes Cancelled, InvalidState becomes FailedPrecondition, ProtocolViolation becomes Internal, and unmapped codes fall through to Unavailable.
Event Streaming Model
EventStreamService implements the StreamEvents pipeline. It exists as a separate service (rather than living inside MxAccessGatewayService) so that the channel, backpressure policy, and metric bookkeeping can be unit-tested without spinning up a Kestrel host.
The pipeline has three stages:
- The session is resolved via
sessionManager.TryGetSession. A miss raisesSessionManagerException(SessionNotFound)so the handler reportsStatusCode.NotFound. session.AttachEventSubscriber(...)enforces the single-subscriber-per-session rule (or allows multiple subscribers ifSessions:AllowMultipleEventSubscribersis enabled). The returnedIDisposableis released in thefinallyblock, ensuring the subscriber slot is freed even when the client cancels mid-stream.- A
Channel<MxEvent>decouples the worker-side producer from the gRPC writer. The channel is bounded byEvents:QueueCapacityand configured for a single reader and writer:
Channel<MxEvent> eventQueue = Channel.CreateBounded<MxEvent>(
new BoundedChannelOptions(options.Value.Events.QueueCapacity)
{
SingleReader = true,
SingleWriter = true,
FullMode = BoundedChannelFullMode.Wait,
AllowSynchronousContinuations = false,
});
The producer reads WorkerEvents from the session, maps them, and writes to the channel. Per-session ordering is preserved end-to-end: events arrive from the worker in WorkerSequence order, the producer is the single writer, and the consumer drains FIFO. The producer also honours the request's AfterWorkerSequence cursor by skipping any event whose WorkerSequence is at or before the requested cutoff, which lets clients resume after a disconnect without server-side replay state.
Backpressure
When TryWrite fails the queue is full. The handling depends on Events:BackpressurePolicy:
if (!writer.TryWrite(publicEvent))
{
string message = $"Session {session.SessionId} event stream queue overflowed.";
metrics.QueueOverflow("grpc-event-stream");
if (options.Value.Events.BackpressurePolicy == EventBackpressurePolicy.FailFast)
{
session.MarkFaulted(message);
metrics.Fault(SessionManagerErrorCode.EventQueueOverflow.ToString());
}
else
{
logger.LogDebug(
"Disconnecting event stream for session {SessionId} after queue overflow.",
session.SessionId);
}
writer.TryComplete(new SessionManagerException(
SessionManagerErrorCode.EventQueueOverflow,
message));
return;
}
Under FailFast the session is faulted so subsequent commands return FailedPrecondition; the client must reopen. Under the default policy only the stream is dropped and the session continues to accept commands, leaving recovery to the client (typically a fresh StreamEvents call with an updated AfterWorkerSequence). Either way, the consumer side observes StatusCode.ResourceExhausted via the EventQueueOverflow mapping above.
Cancellation and cleanup
The handler creates a linked cancellation token (streamCts) so that completing the consumer (client disconnect, error, or graceful end-of-stream) also cancels the producer. The finally block cancels the source, disposes the subscriber slot, awaits the producer (swallowing the expected cancellation), and emits StreamDisconnected("Detached") so dashboards see the disconnection regardless of cause.
WorkerClientException thrown by the producer marks the session as faulted before completing the channel — the worker is presumed gone, and any subsequent command on that session must observe the fault rather than silently retry.
Authorization Interceptor Integration
Authorization is applied as a gRPC interceptor, registered in GrpcAuthorizationServiceCollectionExtensions:
services.AddSingleton<GatewayGrpcAuthorizationInterceptor>();
services.AddGrpc(options => options.Interceptors.Add<GatewayGrpcAuthorizationInterceptor>());
Because the interceptor runs before any handler, MxAccessGatewayService can safely assume the call has been authorized and that IGatewayRequestIdentityAccessor.Current is populated. The handler's only responsibility is to read the identity for OpenSession so the session is owned by the authenticated principal; it does not perform any authorization checks of its own. See Authorization for the policy and identity model.