Re-reviewed every module/client against the 10-category checklist
(REVIEW-PROCESS.md) at commit 1cd51bb, filed 72 new findings, and
fixed them in three priority waves (3 High, 17 Medium, 52 Low).
Highs
- Server-017: enumerate AcknowledgeAlarm / QueryActiveAlarms in
GatewayGrpcScopeResolver so non-admin keys can use them; document
the mapping in docs/Authorization.md; add interceptor tests.
- Client.Java-013: add the five missing bulk-method stubs to the
CLI FakeSession so the test module compiles on a clean tree.
- Client.Rust-013: fix the clippy::doc_lazy_continuation regression
in generated tonic code by reformatting the ReadBulkCommand proto
comment and scoping a #![allow(...)] to the generated submodules.
Mediums (highlights)
- Server: unify GatewaySession state-lock discipline (-015) and
make DisposeAsync race-safe against in-flight CloseAsync (-016);
add constraint-enforcement test coverage for the bulk-plan path
(-021).
- Worker: introduce StaRuntimeShutdownException so RunAlarmPollLoop
can distinguish graceful shutdown from a real STA-affinity
violation (-016); have the watchdog skip StaHung while
CurrentCommandCorrelationId is non-empty so a legitimate slow
ReadBulk no longer self-faults (-017).
- Tests: add per-method round-trip + cancellation coverage for the
11 GatewaySession bulk methods (-013); replace the real TCP probe
in GalaxyHierarchyCacheTests with an IGalaxyRepository fake
(-016).
- IntegrationTests: drive the StreamEvents writer in the live Write
test and assert OnWriteComplete (-012); add live tests for
Unadvise/RemoveItem/Unregister ordering, WriteSecured, and
abnormal worker exit (-014).
- Worker.Tests: replace MxAccessSession reflection with an internal
CreateForTesting factory (-016); cover WorkerCancel and
unexpected-body envelope branches (-017).
- Client.Java: cancel MxEventStream when close() races
beforeStart() (-014); return a CancellingCompletableFuture that
actually forwards cancellation through .thenApply chains (-015).
- Client.Python: drop the silent localhost-plaintext downgrade in
the CLI; require explicit --plaintext (-013).
- Client.Rust: stop bench-read-bulk from polluting success-latency
histograms with failed-call durations (-015); add coverage for
the five MalformedReply paths, the bulk-write helpers, the
Error::Unavailable mapping, and the unary-fault path (-016).
- Contracts: extend docs/Contracts.md with the bulk read/write
command family (-009).
Lows (highlights)
- Server: cap GalaxyGlobMatcher.RegexCache; align
WorkerAlarmRpcDispatcher missing-session handling; drop the
duplicate dashboard @page routes; refresh IAlarmRpcDispatcher
XML doc.
- Worker: surface SetXmlAlarmQuery COM failures; remove dead
subscriptionExpression / ExecutingCommand arms; preserve
factory-supplied runtime sessions; split MxAlarmSnapshot.cs into
three files.
- Tests: dispose the WebApplication in seven test classes; rebuild
FakeWorkerProcess.WaitForExitAsync against a real TaskCompletion
source; switch the heartbeat-expires test to ManualTimeProvider;
add InvariantCulture to the remaining DateTimeOffset.Parse sites;
document GalaxyFilterInputSafetyTests in GatewayTesting.md.
- IntegrationTests: comment fixes, RecordingServerStreamWriter
IDisposable, class-level [Trait], single-source ZB default
connection string.
- Worker.Tests: replace silent-return gating with LiveMxAccessFact
so absent env vars SKIP not pass; PascalCase rename of probe
[Fact]s; deterministic deadline test; new frame-protocol error
tests; ComputeTransitions diff-coverage; relocate dev-rig probes
to Probes/.
- Contracts: add round-trip coverage and per-field redaction /
Galaxy-identifier comments to the protos.
- Client.Dotnet: introduce clients/dotnet/Directory.Build.props so
TreatWarningsAsErrors / analysers apply; document
DiscoverHierarchyOptions and IMxGatewayCliClient; require typed
bulk-read handles in CLI; surface AcknowledgeAlarm transport
faults through Translate().
- Client.Go: kill dead code in alarms_test / fakeGalaxyServer /
runWriteBulkVariant; document the six new subcommands in
writeUsage; drain galaxy-watch events on limit; switch io.EOF
comparisons to errors.Is.
- Client.Java: shared shutdown helpers + new shutdownTimeout
option; regex-based credential redaction; Long.toUnsignedString
for uint64 sequence; doc fixes.
- Client.Python: combine duplicate imports; add coverage for
_percentile / bench-read-bulk / MAX_AGGREGATE_EVENTS /
_api_key_from_env; populate pyproject metadata and ship py.typed.
- Client.Rust: expose next_correlation_id() so CLI ping/close
stop hard-coding correlation IDs; resync RustClientDesign.md
with the current Session / Error surface and CLI subcommand set.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Gateway gRPC Authorization
The authorization subsystem has two layers. The gRPC interceptor enforces the verb scope required by the RPC. Service-layer constraint checks then narrow what an authenticated API key can browse, read, or write inside the Galaxy.
Overview
Authorization runs as a single gRPC server interceptor registered for every call on the gateway. It pulls the authenticated identity for the current request, derives the scope that the request type requires, and either lets the call continue or fails the call with a gRPC status. The pipeline keeps service classes free of cross-cutting checks, which matches the gateway.md "thin gRPC layer" rule that service handlers translate between contracts and domain code without owning policy.
The participating types live under src/MxGateway.Server/Security/Authorization/:
- GatewayGrpcAuthorizationInterceptor runs the authenticate-then-authorize pipeline for unary and server-streaming calls.
- GatewayGrpcScopeResolver maps a request message (and, for MxCommandRequest, the inner MxCommandKind) to the scope string that must be present on the caller.
- GatewayScopes exposes the canonical scope constants used by the resolver and any downstream consumer.
- GatewayRequestIdentityAccessor and IGatewayRequestIdentityAccessor expose the verified identity to handlers and any service code that runs inside the call.
- IConstraintEnforcer applies optional API-key constraints against the cached Galaxy hierarchy from service bodies.
- GrpcAuthorizationServiceCollectionExtensions wires the components into the DI container and the gRPC pipeline.
The ApiKeyIdentity consumed here is produced by the authentication layer; see Authentication for how it is built and how scopes are persisted.
Why an Interceptor
Centralizing the policy in GatewayGrpcAuthorizationInterceptor produces three concrete benefits:
- Every RPC defined in MxAccessGatewayService is covered by construction. A new RPC inherits the check the moment its request type is added to GatewayGrpcScopeResolver, instead of relying on each service method to remember to call an authorization helper.
- Verb-scope policy stays centralized. Request-specific constraints still run in service bodies because they need command payloads, item handles, and Galaxy metadata that the interceptor should not inspect.
- Authentication and authorization happen in one place, so the gRPC Status mapping is consistent. A failed key check always returns Unauthenticated, and a missing scope always returns PermissionDenied with the offending scope name.
Interceptor Flow
GatewayGrpcAuthorizationInterceptor overrides both UnaryServerHandler and ServerStreamingServerHandler. Both call the same private AuthenticateAndAuthorizeAsync helper before invoking the continuation, then push the resolved identity onto the accessor for the duration of the call.
```csharp
public override async Task<TResponse> UnaryServerHandler<TRequest, TResponse>(
    TRequest request,
    ServerCallContext context,
    UnaryServerMethod<TRequest, TResponse> continuation)
{
    ApiKeyIdentity? identity = await AuthenticateAndAuthorizeAsync(request, context).ConfigureAwait(false);
    IDisposable? identityScope = identity is null ? null : identityAccessor.Push(identity);

    using (identityScope)
    {
        return await continuation(request, context).ConfigureAwait(false);
    }
}
```
The shared helper performs the actual decision:
```csharp
if (options.Value.Authentication.Mode == AuthenticationMode.Disabled)
{
    return null;
}

string? authorizationHeader = context.RequestHeaders.GetValue("authorization");
ApiKeyVerificationResult verificationResult = await apiKeyVerifier
    .VerifyAsync(authorizationHeader, context.CancellationToken)
    .ConfigureAwait(false);

if (!verificationResult.Succeeded || verificationResult.Identity is null)
{
    throw new RpcException(new Status(
        StatusCode.Unauthenticated,
        "Missing or invalid API key."));
}

string requiredScope = scopeResolver.ResolveRequiredScope(request);
if (!verificationResult.Identity.Scopes.Contains(requiredScope))
{
    throw new RpcException(new Status(
        StatusCode.PermissionDenied,
        $"API key is missing required scope '{requiredScope}'."));
}

return verificationResult.Identity;
```
The flow is:
- If GatewayOptions.Authentication.Mode is AuthenticationMode.Disabled, the helper returns null immediately. No identity is pushed onto the accessor and the continuation runs without scope enforcement. This matches the AuthenticationMode enum, which only defines ApiKey and Disabled.
- Otherwise, the authorization request header is read directly off ServerCallContext.RequestHeaders and handed to IApiKeyVerifier.VerifyAsync. A failed verification or a missing identity throws RpcException with StatusCode.Unauthenticated.
- GatewayGrpcScopeResolver.ResolveRequiredScope(request) produces the scope string. If the identity's Scopes set does not contain it, the helper throws RpcException with StatusCode.PermissionDenied and embeds the missing scope name in Status.Detail so callers can diagnose the failure.
- On success, the verified ApiKeyIdentity is returned and pushed onto IGatewayRequestIdentityAccessor for the lifetime of the call.
The status codes are deliberately distinct: Unauthenticated signals "we do not know who you are," and PermissionDenied signals "we know who you are, but you cannot do this." Treating the two as the same code would make troubleshooting harder for client implementations.
Scope Resolution
GatewayGrpcScopeResolver is a stateless singleton that switches on the runtime request type. Top-level RPC requests map directly:
```csharp
public string ResolveRequiredScope(object request)
{
    return request switch
    {
        OpenSessionRequest => GatewayScopes.SessionOpen,
        CloseSessionRequest => GatewayScopes.SessionClose,
        StreamEventsRequest => GatewayScopes.EventsRead,
        MxCommandRequest commandRequest => ResolveCommandScope(commandRequest.Command?.Kind ?? MxCommandKind.Unspecified),
        AcknowledgeAlarmRequest => GatewayScopes.InvokeWrite,
        QueryActiveAlarmsRequest => GatewayScopes.EventsRead,
        TestConnectionRequest or
        GetLastDeployTimeRequest or
        DiscoverHierarchyRequest or
        WatchDeployEventsRequest => GatewayScopes.MetadataRead,
        _ => GatewayScopes.Admin
    };
}
```
The _ => GatewayScopes.Admin fallback is intentional: any future request type that the resolver does not recognize fails closed, requiring the strongest scope until the resolver is updated. AcknowledgeAlarm is treated as a write because it mutates alarm state, mirroring MxCommandKind.Write*. QueryActiveAlarms shares the alarm/event surface with StreamEvents and MxCommandKind.DrainEvents, so it carries events:read.
MxCommandRequest is special because it multiplexes many MxAccess operations through a single RPC. The resolver inspects the embedded MxCommandKind so each operation gets its own scope:
```csharp
private static string ResolveCommandScope(MxCommandKind kind)
{
    return kind switch
    {
        MxCommandKind.Write or
        MxCommandKind.Write2 => GatewayScopes.InvokeWrite,
        MxCommandKind.WriteSecured or
        MxCommandKind.WriteSecured2 or
        MxCommandKind.AuthenticateUser => GatewayScopes.InvokeSecure,
        MxCommandKind.ArchestraUserToId or
        MxCommandKind.GetSessionState or
        MxCommandKind.GetWorkerInfo => GatewayScopes.MetadataRead,
        MxCommandKind.DrainEvents => GatewayScopes.EventsRead,
        MxCommandKind.ShutdownWorker => GatewayScopes.Admin,
        _ => GatewayScopes.InvokeRead
    };
}
```
Reads (Register, AddItem, Advise, and any other unspecified kind) fall through to InvokeRead, which keeps the matrix small while still separating reads from writes, secured writes, metadata lookups, event drains, and worker shutdown.
Constraint Enforcement
ApiKeyIdentity.Constraints is optional. Empty constraints preserve the
previous behavior: the key is authorized only by its verb scopes. Non-empty
constraints are stored as JSON in api_keys.constraints and are applied by
IConstraintEnforcer after the interceptor succeeds.
Supported constraints are:
| Constraint | Meaning |
|---|---|
| read_subtrees | Contained-path globs allowed for read/subscription commands. |
| write_subtrees | Contained-path globs allowed for write commands. |
| read_tag_globs | Tag-address globs allowed for read/subscription commands. |
| write_tag_globs | Tag-address globs allowed for write commands. |
| max_write_classification | Maximum Galaxy attribute security_classification a key may write. |
| browse_subtrees | Contained-path globs used to filter Galaxy browse results and deploy-event counts. |
| read_alarm_only | Read/subscription commands must target objects with alarm-bearing attributes. |
| read_historized_only | Read/subscription commands must target objects with historized attributes. |
Glob matching is anchored, case-insensitive, and supports * and ?.
Subtree and tag glob lists are alternatives: matching either list allows that
scope dimension. Empty lists mean unconstrained for that dimension.
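The matching rules above can be sketched as a glob-to-regex translation. This is only an illustration, not the gateway's own matcher: `GlobSketch`, `IsMatch`, and `IsAllowed` are hypothetical names, the sample paths in the usage note are made up, and the sketch assumes that `*` matches any run of characters (including none), that `?` matches exactly one character, and that a dimension is unconstrained only when both of its lists are empty.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; not the gateway's actual matcher.
public static class GlobSketch
{
    // Anchored, case-insensitive glob match.
    // Assumption: '*' matches any run of characters, '?' exactly one.
    public static bool IsMatch(string glob, string input)
    {
        // Escape every regex metacharacter, then turn the two escaped
        // wildcards back into their regex equivalents.
        string body = Regex.Escape(glob).Replace("\\*", ".*").Replace("\\?", ".");

        // \A and \z anchor the pattern to the whole input.
        string pattern = "\\A" + body + "\\z";
        return Regex.IsMatch(
            input,
            pattern,
            RegexOptions.IgnoreCase | RegexOptions.CultureInvariant);
    }

    // Subtree and tag lists are alternatives: a hit in either list allows
    // the target. Assumption: the dimension is unconstrained only when
    // both lists are empty.
    public static bool IsAllowed(
        IReadOnlyCollection<string> subtreeGlobs,
        IReadOnlyCollection<string> tagGlobs,
        string containedPath,
        string tagAddress)
    {
        if (subtreeGlobs.Count == 0 && tagGlobs.Count == 0)
        {
            return true;
        }

        return subtreeGlobs.Any(g => IsMatch(g, containedPath))
            || tagGlobs.Any(g => IsMatch(g, tagAddress));
    }
}
```

Under these assumptions, `GlobSketch.IsMatch("Area1.*", "area1.Pump01")` matches because the comparison ignores case, while `GlobSketch.IsMatch("Area1.*", "XArea1.Pump01")` does not because the pattern is anchored at both ends.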
Constraints are set when a key is created, either through the apikey
create-key flags (see Authentication) or through the create dialog on the
dashboard API Keys page (see Gateway Dashboard Design). The dashboard API
Keys page also renders each key's effective constraints.
The service checks read constraints for AddItem, AddItem2, AddItemBulk,
SubscribeBulk, and AdviseItemBulk. It checks write constraints for
Write, Write2, WriteSecured, and WriteSecured2. Successful item
registrations are tracked per session so later item-handle commands resolve
back to the original tag address. If a constrained key presents an unknown item
handle, the gateway fails closed.
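The handle tracking described above might look roughly like the following. This is a sketch under stated assumptions rather than the gateway's implementation: `ItemHandleTrackerSketch` and its members are hypothetical, item handles are assumed to be integers, and the constraint check is passed in as a delegate.

```csharp
using System;
using System.Collections.Concurrent;

// Hypothetical per-session tracker; names and the int handle type are assumptions.
public sealed class ItemHandleTrackerSketch
{
    private readonly ConcurrentDictionary<int, string> tagByHandle = new();

    // Record the tag address behind a handle after a successful registration.
    public void Track(int itemHandle, string tagAddress)
    {
        tagByHandle[itemHandle] = tagAddress;
    }

    // Resolve a handle back to its tag address and apply the constraint check.
    // A constrained key presenting an unknown handle is denied (fail closed).
    public bool IsAllowed(int itemHandle, bool keyIsConstrained, Func<string, bool> tagIsAllowed)
    {
        if (!keyIsConstrained)
        {
            return true;
        }

        return tagByHandle.TryGetValue(itemHandle, out string? tagAddress)
            && tagIsAllowed(tagAddress);
    }
}
```

Denying unknown handles means a constrained key cannot sidestep a tag glob by guessing a handle that another code path registered.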
Non-bulk constraint failures return gRPC PermissionDenied. Bulk read
commands preserve input order and return a failed SubscribeResult for each
denied item while still forwarding allowed items to the worker. Every denial
adds an api_key_audit entry with the key id, command kind, target, and
blocking constraint; secured values and raw credentials are never logged.
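The order-preserving behavior for bulk reads can be illustrated as follows. This is a minimal sketch, not the service code: `ItemResult`, `BulkReadSketch`, and the `forwardToWorker` delegate are hypothetical stand-ins, and the sketch assumes the worker returns one result per forwarded item in the order it received them.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a per-item result such as SubscribeResult.
public sealed record ItemResult(string Tag, bool Succeeded, string? Error);

public static class BulkReadSketch
{
    public static IReadOnlyList<ItemResult> Subscribe(
        IReadOnlyList<string> tags,
        Func<string, bool> isAllowed,
        Func<IReadOnlyList<string>, IReadOnlyList<ItemResult>> forwardToWorker)
    {
        var results = new ItemResult[tags.Count];
        var allowedTags = new List<string>();
        var allowedIndexes = new List<int>();

        for (int i = 0; i < tags.Count; i++)
        {
            if (isAllowed(tags[i]))
            {
                // Remember where this item sat in the original request.
                allowedTags.Add(tags[i]);
                allowedIndexes.Add(i);
            }
            else
            {
                // Denied items get a failed result in their original slot.
                results[i] = new ItemResult(tags[i], false, "Denied by API-key constraint.");
            }
        }

        if (allowedTags.Count > 0)
        {
            // Assumption: the worker answers in the order items were sent.
            IReadOnlyList<ItemResult> workerResults = forwardToWorker(allowedTags);
            for (int j = 0; j < allowedIndexes.Count; j++)
            {
                results[allowedIndexes[j]] = workerResults[j];
            }
        }

        return results;
    }
}
```

Because every input index is filled exactly once, the caller can pair each result with the request item at the same position, regardless of how many items were denied.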
Scope Catalog
GatewayScopes is the single source of truth for scope strings. Every entry is currently mapped by either the resolver or another security component:
| Constant | Value | Required For |
|---|---|---|
| SessionOpen | session:open | OpenSessionRequest |
| SessionClose | session:close | CloseSessionRequest |
| EventsRead | events:read | StreamEventsRequest, QueryActiveAlarmsRequest, MxCommandKind.DrainEvents |
| InvokeRead | invoke:read | MxCommandRequest for read-style command kinds (Register, AddItem, Advise, and any kind not otherwise mapped) |
| InvokeWrite | invoke:write | AcknowledgeAlarmRequest, MxCommandKind.Write, MxCommandKind.Write2, MxCommandKind.WriteBulk, MxCommandKind.Write2Bulk |
| InvokeSecure | invoke:secure | MxCommandKind.WriteSecured, MxCommandKind.WriteSecured2, MxCommandKind.WriteSecuredBulk, MxCommandKind.WriteSecured2Bulk, MxCommandKind.AuthenticateUser |
| MetadataRead | metadata:read | MxCommandKind.ArchestraUserToId, MxCommandKind.GetSessionState, MxCommandKind.GetWorkerInfo, GalaxyRepository.TestConnection, GalaxyRepository.GetLastDeployTime, GalaxyRepository.DiscoverHierarchy, GalaxyRepository.WatchDeployEvents |
| Admin | admin | MxCommandKind.ShutdownWorker, the default for any unrecognized request type, and the dashboard authorization policy |
The Admin constant is also referenced by DashboardAuthenticator and DashboardAuthorizationHandler so that the dashboard and the gRPC layer agree on what "admin" means.
Identity Access for Downstream Layers
Once authorization passes, GatewayGrpcAuthorizationInterceptor calls identityAccessor.Push(identity) and disposes the returned scope when the continuation completes. GatewayRequestIdentityAccessor stores the active identity in an AsyncLocal<ApiKeyIdentity?>, so the value flows across await boundaries and child tasks belonging to the same request.
```csharp
public sealed class GatewayRequestIdentityAccessor : IGatewayRequestIdentityAccessor
{
    private readonly AsyncLocal<ApiKeyIdentity?> currentIdentity = new();

    public ApiKeyIdentity? Current => currentIdentity.Value;

    public IDisposable Push(ApiKeyIdentity identity)
    {
        ArgumentNullException.ThrowIfNull(identity);
        ApiKeyIdentity? previousIdentity = currentIdentity.Value;
        currentIdentity.Value = identity;
        return new IdentityScope(this, previousIdentity);
    }
}
```
The returned IdentityScope restores the previous value on dispose rather than clearing it. This makes the accessor safe for nested pushes, even though the current interceptor only pushes once per call. Disposing twice is a no-op because of the disposed guard inside IdentityScope.
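The excerpt above does not show IdentityScope itself. A restore-on-dispose scope with a disposed guard could be sketched like this; `AmbientIdentitySketch` and its nested `Scope` class are hypothetical, and a plain string stands in for ApiKeyIdentity so the example is self-contained.

```csharp
using System;
using System.Threading;

// Hypothetical stand-in for the accessor; a string plays the role of ApiKeyIdentity.
public sealed class AmbientIdentitySketch
{
    private readonly AsyncLocal<string?> current = new();

    public string? Current => current.Value;

    public IDisposable Push(string identity)
    {
        ArgumentNullException.ThrowIfNull(identity);
        string? previous = current.Value;
        current.Value = identity;
        return new Scope(this, previous);
    }

    private sealed class Scope : IDisposable
    {
        private readonly AmbientIdentitySketch owner;
        private readonly string? previous;
        private bool disposed;

        public Scope(AmbientIdentitySketch owner, string? previous)
        {
            this.owner = owner;
            this.previous = previous;
        }

        public void Dispose()
        {
            // Guard: a second Dispose must not overwrite a newer value.
            if (disposed)
            {
                return;
            }

            disposed = true;

            // Restore the value that was active before Push, rather than
            // clearing it, so nested pushes unwind correctly.
            owner.current.Value = previous;
        }
    }
}
```

With nested pushes, disposing the inner scope brings the outer identity back, and disposing the inner scope a second time after the outer one has been released leaves `Current` at null instead of resurrecting the outer identity.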
Downstream code consumes the accessor through the IGatewayRequestIdentityAccessor interface:
```csharp
public interface IGatewayRequestIdentityAccessor
{
    ApiKeyIdentity? Current { get; }

    IDisposable Push(ApiKeyIdentity identity);
}
```
MxAccessGatewayService takes IGatewayRequestIdentityAccessor as a constructor dependency and reads Current whenever it needs to attach the calling identity to a domain operation, which keeps the service free of header parsing or scope checks.
When AuthenticationMode.Disabled is configured, no identity is pushed, so Current returns null. Downstream code must tolerate that, just as it tolerates the absence of a scope check.
Registration
GrpcAuthorizationServiceCollectionExtensions.AddGatewayGrpcAuthorization is the single entry point that registers every component and inserts the interceptor into the gRPC pipeline:
```csharp
public static IServiceCollection AddGatewayGrpcAuthorization(this IServiceCollection services)
{
    services.AddSingleton<GatewayGrpcScopeResolver>();
    services.AddSingleton<IGatewayRequestIdentityAccessor, GatewayRequestIdentityAccessor>();
    services.AddSingleton<GatewayGrpcAuthorizationInterceptor>();
    services.AddGrpc(options => options.Interceptors.Add<GatewayGrpcAuthorizationInterceptor>());
    return services;
}
```
Singleton lifetimes are appropriate because none of the three classes holds per-request state in its instance fields; the request-scoped value lives inside the AsyncLocal on GatewayRequestIdentityAccessor. GatewayApplication calls builder.Services.AddGatewayGrpcAuthorization() during startup, and the call also performs AddGrpc, so the gateway never registers gRPC without the interceptor attached.