1aafd6bde4
Second re-review pass at commit a020350 caught 48 new findings — including
one High-severity regression I introduced in the prior sweep — and fixed
them all in one parallel wave.
High (1)
- Client.Python-018: prior sweep set `license = "Proprietary"` in
pyproject.toml. setuptools >= 77 enforces PEP 639 and rejects the
string (it must be a valid SPDX expression), so `pip wheel .` and
`pip install -e .` both fail before any source compiles. Tests
still pass because pytest bypasses the build backend via
`pythonpath`. Dropped the invalid license string, kept the
`License :: Other/Proprietary License` classifier, and added
`tests/test_packaging.py` so a future regression of the same shape
is caught in CI.
Mediums (6)
- Worker-023: `HeartbeatStuckCeiling` (default 75s = 5x HeartbeatGrace)
on WorkerPipeSessionOptions bounds the in-flight-command watchdog
suppression so a truly stuck COM call still triggers StaHung
instead of permanently defeating the watchdog.
- Client.Rust-018: reverted Rust's `latencyMs` split so the
cross-language bench comparison is apples-to-apples again;
`failureLatencyMs` kept as Rust-only enrichment.
- Client.Java-021: applied Client.Java-002's terminal-state
serialisation pattern to DeployEventStream so close() arriving
after queue-overflow can't erase the overflow exception.
- IntegrationTests-017: teardown-parity test now uses a two-window
stability check after UnAdvise instead of strict equality against
the pre-UnAdvise count (which raced against in-flight events).
- IntegrationTests-019: new RecordingTestOutputHelper wraps every
log sink the WriteSecured live test owns (worker stdout/stderr,
gateway logs, direct WriteLine) so the credential is proven
absent from the full output buffer, not just the diagnostic
message.
- Tests-020: added MxAccessGatewayServiceConstraintTests coverage
for the previously-uncovered Write2Bulk and WriteSecured2Bulk
arms of WriteBulkConstraintPlan.SetPayload.
Lows (41 — highlights)
- Server: Galaxy glob cache eviction is race-free (Server-024);
GalaxyRepositoryGrpcService takes IGalaxyRepository (Server-025);
AlarmsOptions validated at startup (Server-026); Authorization.md
Constraint Enforcement snippet/prose enumerate the bulk write/read
family (Server-027); bulk-read-commands and bulk-write-commands
capability tokens added to OpenSession (Server-029);
NotWiredAlarmRpcDispatcher XML doc and missing scope-resolver and
state-machine tests cleaned up (023, 028).
- Worker: AlarmCommandHandler now invokes the same STA-affinity
guard the poll path uses, at every command entry (Worker-024);
RunAsync null-checks the runtime-session factory result
(Worker-025).
- Worker.Tests: shared LiveMxAccessOptInVariableName lives on
GatewayContractInfo (Worker.Tests-025); MxAccessSession.CreateForTesting
rejects production sinks (Worker.Tests-026); FakeRuntimeSession's
CancelCommandReturnValue serialised under lock (Worker.Tests-027);
Probes namespace lifted to MxGateway.Worker.Tests.Probes
(Worker.Tests-029); cancel-envelope sequence numbers monotonised
(Worker.Tests-030); docs/GatewayTesting.md gains a "Dev-rig Probes"
section (Worker.Tests-028).
- Tests: ManualTimeProvider consolidated into one TestSupport/ copy
(Tests-021); SessionManagerBulkTests adds a mid-flight cancellation
test backed by a TaskCompletionSource fake (Tests-022); companion
FakeWorkerProcess.WaitForExitAsync no longer fakes its exit signal
(Tests-023); constraint plan reply-count divergence pinned
(Tests-024).
- IntegrationTests: TryGetSession chain carries [MaybeNullWhen(false)]
end-to-end (IntegrationTests-018); abnormal-exit keyword set
tightened to pipe-disconnected/end-of-stream and the test now
asserts streamTask.IsFaulted (020, 021).
- Client.Dotnet: bench commands added to isLongRunning so the
default 30s wall-clock budget doesn't kill them (015);
BenchStreamEventsAsync observes the inner stream task on every
exit path (016).
- Client.Go: parseValue wraps strconv errors with flag context and
%w (017); bench loops honour ctx.Done() (018); galaxy-watch parses
RFC3339Nano with fractional seconds (019); runStreamEvents installs
signal.NotifyContext like runGalaxyWatch (020); five new CLI-level
table-driven tests cover the bulk/bench subcommands (021).
- Client.Java: toCompletable Javadoc rewritten to match the actual
cancellation contract Client.Java-015 established (022); stream-events
text path uses Long.toUnsignedString for worker_sequence (023);
bench-read-bulk no longer pollutes success-latency histogram with
failure durations (024); --shutdown-timeout CLI option propagates
through to ClientOptions (025); seven new MxGatewayCliTests cover
the bulk and bench commands (026).
- Client.Python: mxgateway_cli ships its own py.typed marker (019);
wheel-build smoke test added under tests/test_packaging.py (020);
README documents the Galaxy CLI parity gap explicitly (021).
- Client.Rust: RustClientDesign.md signatures match session.rs and
document the AsRef<str> read_bulk genericism (019);
next_correlation_id re-exported at the crate root, with a
property-style doc contract and an explicit disclaimer that the
literal textual format is not part of the contract (020).
- Contracts: BulkWriteResult comment names the actual
IConstraintEnforcer mechanism instead of "tag-allowlist filter"
(014); BulkReadResult gains explicit per-arm payload-population
documentation for the success vs failure cases (015).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
282 lines
16 KiB
Markdown
# Gateway gRPC Authorization

The authorization subsystem has two layers. The gRPC interceptor enforces the
verb scope required by the RPC. Service-layer constraint checks then narrow
what an authenticated API key can browse, read, or write inside the Galaxy.

## Overview

Authorization runs as a single gRPC server interceptor registered for every call on the gateway. It pulls the authenticated identity for the current request, derives the scope that the request type requires, and either lets the call continue or fails the call with a gRPC status. The pipeline keeps service classes free of cross-cutting checks, which matches the `gateway.md` "thin gRPC layer" rule that service handlers translate between contracts and domain code without owning policy.

The participating types live under `src/MxGateway.Server/Security/Authorization/`:

- `GatewayGrpcAuthorizationInterceptor` runs the authenticate-then-authorize pipeline for unary and server-streaming calls.
- `GatewayGrpcScopeResolver` maps a request message (and, for `MxCommandRequest`, the inner `MxCommandKind`) to the scope string that must be present on the caller.
- `GatewayScopes` exposes the canonical scope constants used by the resolver and any downstream consumer.
- `GatewayRequestIdentityAccessor` and `IGatewayRequestIdentityAccessor` expose the verified identity to handlers and any service code that runs inside the call.
- `IConstraintEnforcer` applies optional API-key constraints against the
  cached Galaxy hierarchy from service bodies.
- `GrpcAuthorizationServiceCollectionExtensions` wires the components into the DI container and the gRPC pipeline.

The `ApiKeyIdentity` consumed here is produced by the authentication layer; see [Authentication](./Authentication.md) for how it is built and how scopes are persisted.

## Why an Interceptor

Centralizing the policy in `GatewayGrpcAuthorizationInterceptor` produces three concrete benefits:

1. Every RPC defined in `MxAccessGatewayService` is covered by construction. A new RPC inherits the check the moment its request type is added to `GatewayGrpcScopeResolver`, instead of relying on each service method to remember to call an authorization helper.
2. Verb-scope policy stays centralized. Request-specific constraints still run
   in service bodies because they need command payloads, item handles, and
   Galaxy metadata that the interceptor should not inspect.
3. Authentication and authorization happen in one place, so the gRPC `Status` mapping is consistent. A failed key check always returns `Unauthenticated`, and a missing scope always returns `PermissionDenied` with the offending scope name.

## Interceptor Flow

`GatewayGrpcAuthorizationInterceptor` overrides both `UnaryServerHandler` and `ServerStreamingServerHandler`. Both call the same private `AuthenticateAndAuthorizeAsync` helper, push the resolved identity onto the accessor, and only then invoke the continuation, so the identity stays available for the duration of the call.

```csharp
public override async Task<TResponse> UnaryServerHandler<TRequest, TResponse>(
    TRequest request,
    ServerCallContext context,
    UnaryServerMethod<TRequest, TResponse> continuation)
{
    ApiKeyIdentity? identity = await AuthenticateAndAuthorizeAsync(request, context).ConfigureAwait(false);
    IDisposable? identityScope = identity is null ? null : identityAccessor.Push(identity);
    using (identityScope)
    {
        return await continuation(request, context).ConfigureAwait(false);
    }
}
```

The shared helper performs the actual decision:

```csharp
if (options.Value.Authentication.Mode == AuthenticationMode.Disabled)
{
    return null;
}

string? authorizationHeader = context.RequestHeaders.GetValue("authorization");
ApiKeyVerificationResult verificationResult = await apiKeyVerifier
    .VerifyAsync(authorizationHeader, context.CancellationToken)
    .ConfigureAwait(false);

if (!verificationResult.Succeeded || verificationResult.Identity is null)
{
    throw new RpcException(new Status(
        StatusCode.Unauthenticated,
        "Missing or invalid API key."));
}

string requiredScope = scopeResolver.ResolveRequiredScope(request);
if (!verificationResult.Identity.Scopes.Contains(requiredScope))
{
    throw new RpcException(new Status(
        StatusCode.PermissionDenied,
        $"API key is missing required scope '{requiredScope}'."));
}

return verificationResult.Identity;
```

The flow is:

1. If `GatewayOptions.Authentication.Mode` is `AuthenticationMode.Disabled`, the helper returns `null` immediately. No identity is pushed onto the accessor and the continuation runs without scope enforcement. This matches the `AuthenticationMode` enum, which only defines `ApiKey` and `Disabled`.
2. Otherwise, the `authorization` request header is read directly off `ServerCallContext.RequestHeaders` and handed to `IApiKeyVerifier.VerifyAsync`. A failed verification or a missing identity throws `RpcException` with `StatusCode.Unauthenticated`.
3. `GatewayGrpcScopeResolver.ResolveRequiredScope(request)` produces the scope string. If the identity's `Scopes` set does not contain it, the helper throws `RpcException` with `StatusCode.PermissionDenied` and embeds the missing scope name in `Status.Detail` so callers can diagnose the failure.
4. On success, the verified `ApiKeyIdentity` is returned and pushed onto `IGatewayRequestIdentityAccessor` for the lifetime of the call.

The status codes are deliberately distinct: `Unauthenticated` signals "we do not know who you are," and `PermissionDenied` signals "we know who you are, but you cannot do this." Treating the two as the same code would make troubleshooting harder for client implementations.

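The ordering of the three outcomes can be exercised without a gRPC host. The following is a minimal sketch, not the gateway's code: `AuthOutcome`, `AuthDecisionSketch`, and `Decide` are hypothetical names, and a `null` scope set stands in for a failed `IApiKeyVerifier.VerifyAsync` result.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical outcome type; the real interceptor throws RpcException instead.
public enum AuthOutcome
{
    Allowed,
    Unauthenticated,
    PermissionDenied
}

// Hypothetical, dependency-free model of the decision order described above.
public static class AuthDecisionSketch
{
    // verifiedScopes == null models a failed or missing API key verification.
    public static AuthOutcome Decide(
        bool authenticationDisabled,
        IReadOnlySet<string>? verifiedScopes,
        string requiredScope)
    {
        // Step 1: disabled mode skips both authentication and scope checks.
        if (authenticationDisabled)
        {
            return AuthOutcome.Allowed;
        }

        // Step 2: an unknown caller is rejected before any scope is considered.
        if (verifiedScopes is null)
        {
            return AuthOutcome.Unauthenticated;
        }

        // Step 3: a known caller without the required scope is denied.
        return verifiedScopes.Contains(requiredScope)
            ? AuthOutcome.Allowed
            : AuthOutcome.PermissionDenied;
    }
}
```

Under these assumptions, a caller holding only `invoke:read` that requests `invoke:write` maps to `PermissionDenied`, while a caller with no verifiable key maps to `Unauthenticated` regardless of the scope requested.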
## Scope Resolution

`GatewayGrpcScopeResolver` is a stateless singleton that switches on the runtime request type. Top-level RPC requests map directly:

```csharp
public string ResolveRequiredScope(object request)
{
    return request switch
    {
        OpenSessionRequest => GatewayScopes.SessionOpen,
        CloseSessionRequest => GatewayScopes.SessionClose,
        StreamEventsRequest => GatewayScopes.EventsRead,
        MxCommandRequest commandRequest => ResolveCommandScope(commandRequest.Command?.Kind ?? MxCommandKind.Unspecified),
        AcknowledgeAlarmRequest => GatewayScopes.InvokeWrite,
        QueryActiveAlarmsRequest => GatewayScopes.EventsRead,
        TestConnectionRequest or
        GetLastDeployTimeRequest or
        DiscoverHierarchyRequest or
        WatchDeployEventsRequest => GatewayScopes.MetadataRead,
        _ => GatewayScopes.Admin
    };
}
```

The `_ => GatewayScopes.Admin` fallback is intentional: any future request type that the resolver does not recognize fails closed, requiring the strongest scope until the resolver is updated. `AcknowledgeAlarm` is treated as a write because it mutates alarm state, mirroring `MxCommandKind.Write*`. `QueryActiveAlarms` shares the alarm/event surface with `StreamEvents` and `MxCommandKind.DrainEvents`, so it carries `events:read`.

`MxCommandRequest` is special because it multiplexes many MxAccess operations through a single RPC. The resolver inspects the embedded `MxCommandKind` so each operation gets its own scope:

```csharp
private static string ResolveCommandScope(MxCommandKind kind)
{
    return kind switch
    {
        MxCommandKind.Write or
        MxCommandKind.Write2 or
        MxCommandKind.WriteBulk or
        MxCommandKind.Write2Bulk => GatewayScopes.InvokeWrite,

        MxCommandKind.WriteSecured or
        MxCommandKind.WriteSecured2 or
        MxCommandKind.WriteSecuredBulk or
        MxCommandKind.WriteSecured2Bulk or
        MxCommandKind.AuthenticateUser => GatewayScopes.InvokeSecure,

        MxCommandKind.ArchestraUserToId or
        MxCommandKind.GetSessionState or
        MxCommandKind.GetWorkerInfo => GatewayScopes.MetadataRead,

        MxCommandKind.DrainEvents => GatewayScopes.EventsRead,
        MxCommandKind.ShutdownWorker => GatewayScopes.Admin,

        _ => GatewayScopes.InvokeRead
    };
}
```

Reads (`Register`, `AddItem`, `Advise`, `ReadBulk`, and any other unspecified kind) fall through to `InvokeRead`, which keeps the matrix small while still separating reads from writes, secured writes, metadata lookups, event drains, and worker shutdown. The four bulk-write families (`WriteBulk`, `Write2Bulk`, `WriteSecuredBulk`, `WriteSecured2Bulk`) are mapped explicitly so a missing arm cannot silently demote a bulk write to a read scope.

## Constraint Enforcement

`ApiKeyIdentity.Constraints` is optional. Empty constraints preserve the
previous behavior: the key is authorized only by its verb scopes. Non-empty
constraints are stored as JSON in `api_keys.constraints` and are applied by
`IConstraintEnforcer` after the interceptor succeeds.

Supported constraints are:

| Constraint | Meaning |
|------------|---------|
| `read_subtrees` | Contained-path globs allowed for read/subscription commands. |
| `write_subtrees` | Contained-path globs allowed for write commands. |
| `read_tag_globs` | Tag-address globs allowed for read/subscription commands. |
| `write_tag_globs` | Tag-address globs allowed for write commands. |
| `max_write_classification` | Maximum Galaxy attribute `security_classification` a key may write. |
| `browse_subtrees` | Contained-path globs used to filter Galaxy browse results and deploy-event counts. |
| `read_alarm_only` | Read/subscription commands must target objects with alarm-bearing attributes. |
| `read_historized_only` | Read/subscription commands must target objects with historized attributes. |

Glob matching is anchored, case-insensitive, and supports `*` and `?`.
Subtree and tag glob lists are alternatives: matching either list allows that
scope dimension. Empty lists mean unconstrained for that dimension.

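One way to picture "anchored, case-insensitive, `*` and `?`" is a translation into a regular expression. This is a minimal sketch under that assumption; `GlobSketch` is a hypothetical helper, and the gateway's actual matcher is not shown in this document and may be implemented differently.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper illustrating the matching rules, not the gateway's matcher.
public static class GlobSketch
{
    // Builds a regex that must match the whole input (\A ... \z),
    // where '*' matches any run of characters and '?' matches exactly one.
    public static Regex ToRegex(string glob)
    {
        var pattern = new StringBuilder("\\A");
        foreach (char c in glob)
        {
            pattern.Append(c switch
            {
                '*' => ".*",
                '?' => ".",
                _ => Regex.Escape(c.ToString())
            });
        }

        pattern.Append("\\z");
        return new Regex(
            pattern.ToString(),
            RegexOptions.IgnoreCase | RegexOptions.CultureInvariant | RegexOptions.Singleline);
    }

    public static bool IsMatch(string glob, string value) => ToRegex(glob).IsMatch(value);
}
```

With this sketch, `GlobSketch.IsMatch("Area1.*", "area1.Pump.PV")` is true because case is ignored, while `GlobSketch.IsMatch("Pump?", "Pump12")` is false because the pattern is anchored at both ends and `?` consumes exactly one character.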
Constraints are set when a key is created, either through the `apikey create-key`
flags (see [Authentication](./Authentication.md)) or the dashboard API Keys
page create dialog (see
[Gateway Dashboard Design](./GatewayDashboardDesign.md#api-keys-page)). The
dashboard API Keys page also renders each key's effective constraints.

The service checks read constraints for `AddItem`, `AddItem2`, `AddItemBulk`,
`SubscribeBulk`, `AdviseItemBulk`, and `ReadBulk`. It checks write constraints
for `Write`, `Write2`, `WriteSecured`, `WriteSecured2`, `WriteBulk`,
`Write2Bulk`, `WriteSecuredBulk`, and `WriteSecured2Bulk`. Bulk commands run
through `BulkConstraintPlan` (`ReadBulkConstraintPlan`,
`WriteBulkConstraintPlan`, `SubscribeBulkConstraintPlan`), which preserves the
caller's input order: each entry is evaluated against the constraint surface,
and `BulkConstraintPlan.MergeDeniedInto` re-merges denied entries back into
their original index positions so the reply slot at `entries[i]` always
corresponds to the request slot at `entries[i]`. Successful item registrations
are tracked per session so later item-handle commands resolve back to the
original tag address. If a constrained key presents an unknown item handle,
the gateway fails closed.

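The order-preserving merge can be sketched as follows. This is an illustration of the idea only: `BulkReply`, `BulkMergeSketch`, and the delegate parameters are hypothetical stand-ins, and the real `BulkConstraintPlan.MergeDeniedInto` signature is not shown in this document.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical reply shape; the real contract types are not shown here.
public sealed record BulkReply(string Tag, bool Succeeded, string? Error);

public static class BulkMergeSketch
{
    // Evaluates every entry, forwards only the allowed ones, and writes each
    // reply back to the index its request occupied.
    public static BulkReply[] Evaluate(
        IReadOnlyList<string> tags,
        Func<string, bool> isAllowed,
        Func<IReadOnlyList<string>, IReadOnlyList<BulkReply>> forwardToWorker)
    {
        var replies = new BulkReply[tags.Count];
        var allowedIndexes = new List<int>();
        var allowedTags = new List<string>();

        for (int i = 0; i < tags.Count; i++)
        {
            if (isAllowed(tags[i]))
            {
                allowedIndexes.Add(i);
                allowedTags.Add(tags[i]);
            }
            else
            {
                // Denied entries are answered locally and never reach the worker.
                replies[i] = new BulkReply(tags[i], false, "denied by constraint");
            }
        }

        IReadOnlyList<BulkReply> workerReplies = forwardToWorker(allowedTags);
        if (workerReplies.Count != allowedTags.Count)
        {
            throw new InvalidOperationException("Worker reply count does not match the forwarded entries.");
        }

        // Re-merge: worker reply j belongs to the original slot allowedIndexes[j].
        for (int j = 0; j < workerReplies.Count; j++)
        {
            replies[allowedIndexes[j]] = workerReplies[j];
        }

        return replies;
    }
}
```

Tracking the original index of each forwarded entry is what keeps `replies[i]` aligned with `tags[i]` even when denied entries are removed from the middle of the batch.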
Non-bulk constraint failures return gRPC `PermissionDenied`. Bulk read
commands preserve input order and return a failed `SubscribeResult` for each
denied item while still forwarding allowed items to the worker. Every denial
adds an `api_key_audit` entry with the key id, command kind, target, and
blocking constraint; secured values and raw credentials are never logged.

## Scope Catalog

`GatewayScopes` is the single source of truth for scope strings. Every entry is currently mapped by either the resolver or another security component:

| Constant | Value | Required For |
|----------|-------|--------------|
| `SessionOpen` | `session:open` | `OpenSessionRequest` |
| `SessionClose` | `session:close` | `CloseSessionRequest` |
| `EventsRead` | `events:read` | `StreamEventsRequest`, `QueryActiveAlarmsRequest`, `MxCommandKind.DrainEvents` |
| `InvokeRead` | `invoke:read` | `MxCommandRequest` for read-style command kinds (`Register`, `AddItem`, `Advise`, `ReadBulk`, and any kind not otherwise mapped) |
| `InvokeWrite` | `invoke:write` | `AcknowledgeAlarmRequest`, `MxCommandKind.Write`, `MxCommandKind.Write2`, `MxCommandKind.WriteBulk`, `MxCommandKind.Write2Bulk` |
| `InvokeSecure` | `invoke:secure` | `MxCommandKind.WriteSecured`, `MxCommandKind.WriteSecured2`, `MxCommandKind.WriteSecuredBulk`, `MxCommandKind.WriteSecured2Bulk`, `MxCommandKind.AuthenticateUser` |
| `MetadataRead` | `metadata:read` | `MxCommandKind.ArchestraUserToId`, `MxCommandKind.GetSessionState`, `MxCommandKind.GetWorkerInfo`, `GalaxyRepository.TestConnection`, `GalaxyRepository.GetLastDeployTime`, `GalaxyRepository.DiscoverHierarchy`, `GalaxyRepository.WatchDeployEvents` |
| `Admin` | `admin` | `MxCommandKind.ShutdownWorker`, the default for any unrecognized request type, and the dashboard authorization policy |

The `Admin` constant is also referenced by `DashboardAuthenticator` and `DashboardAuthorizationHandler` so that the dashboard and the gRPC layer agree on what "admin" means.

## Identity Access for Downstream Layers

Once authorization passes, `GatewayGrpcAuthorizationInterceptor` calls `identityAccessor.Push(identity)` and disposes the returned scope when the continuation completes. `GatewayRequestIdentityAccessor` stores the active identity in an `AsyncLocal<ApiKeyIdentity?>`, so the value flows across `await` boundaries and child tasks belonging to the same request.

```csharp
public sealed class GatewayRequestIdentityAccessor : IGatewayRequestIdentityAccessor
{
    private readonly AsyncLocal<ApiKeyIdentity?> currentIdentity = new();

    public ApiKeyIdentity? Current => currentIdentity.Value;

    public IDisposable Push(ApiKeyIdentity identity)
    {
        ArgumentNullException.ThrowIfNull(identity);

        ApiKeyIdentity? previousIdentity = currentIdentity.Value;
        currentIdentity.Value = identity;

        return new IdentityScope(this, previousIdentity);
    }
}
```

The returned `IdentityScope` restores the previous value on dispose rather than clearing it. This makes the accessor safe for nested pushes, even though the current interceptor only pushes once per call. Disposing twice is a no-op because of the `disposed` guard inside `IdentityScope`.

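The restore-on-dispose behaviour and the `disposed` guard can be sketched in isolation. The class below is a simplified stand-in, not the gateway's source: `IdentityAccessorSketch` is a hypothetical name, a plain `string` replaces `ApiKeyIdentity`, and the body of the nested `IdentityScope` is an assumption based on the description above.

```csharp
using System;
using System.Threading;

// Simplified stand-in for GatewayRequestIdentityAccessor; string replaces ApiKeyIdentity.
public sealed class IdentityAccessorSketch
{
    private readonly AsyncLocal<string?> current = new();

    public string? Current => current.Value;

    public IDisposable Push(string identity)
    {
        ArgumentNullException.ThrowIfNull(identity);

        string? previous = current.Value;
        current.Value = identity;

        return new IdentityScope(this, previous);
    }

    // Assumed shape of the scope: remembers the prior value and restores it once.
    private sealed class IdentityScope : IDisposable
    {
        private readonly IdentityAccessorSketch owner;
        private readonly string? previous;
        private bool disposed;

        public IdentityScope(IdentityAccessorSketch owner, string? previous)
        {
            this.owner = owner;
            this.previous = previous;
        }

        public void Dispose()
        {
            // The guard makes a second Dispose a no-op.
            if (disposed)
            {
                return;
            }

            disposed = true;

            // Restore rather than clear, so nested pushes unwind correctly.
            owner.current.Value = previous;
        }
    }
}
```

In this sketch, pushing `"outer"` and then `"inner"` and disposing the inner scope leaves `Current` at `"outer"`; disposing the inner scope a second time changes nothing.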
Downstream code consumes the accessor through the `IGatewayRequestIdentityAccessor` interface:

```csharp
public interface IGatewayRequestIdentityAccessor
{
    ApiKeyIdentity? Current { get; }

    IDisposable Push(ApiKeyIdentity identity);
}
```

`MxAccessGatewayService` takes `IGatewayRequestIdentityAccessor` as a constructor dependency and reads `Current` whenever it needs to attach the calling identity to a domain operation, which keeps the service free of header parsing or scope checks.

When `AuthenticationMode.Disabled` is configured, no identity is pushed, so `Current` returns `null`. Downstream code must tolerate that, just as it tolerates the absence of a scope check.

## Registration

`GrpcAuthorizationServiceCollectionExtensions.AddGatewayGrpcAuthorization` is the single entry point that registers every component and inserts the interceptor into the gRPC pipeline:

```csharp
public static IServiceCollection AddGatewayGrpcAuthorization(this IServiceCollection services)
{
    services.AddSingleton<GatewayGrpcScopeResolver>();
    services.AddSingleton<IGatewayRequestIdentityAccessor, GatewayRequestIdentityAccessor>();
    services.AddSingleton<GatewayGrpcAuthorizationInterceptor>();
    services.AddGrpc(options => options.Interceptors.Add<GatewayGrpcAuthorizationInterceptor>());

    return services;
}
```

Singleton lifetimes are appropriate because none of the three classes holds per-request state in instance fields; the request-scoped value lives inside the `AsyncLocal` on `GatewayRequestIdentityAccessor`. `GatewayApplication` calls `builder.Services.AddGatewayGrpcAuthorization()` during startup, and the call also performs `AddGrpc`, so the gateway never registers gRPC without the interceptor attached.

## Related Documentation

- [Authentication](./Authentication.md)
- [Gateway Dashboard Design](./GatewayDashboardDesign.md)
- [Grpc](./Grpc.md)
- [GatewayConfiguration](./GatewayConfiguration.md)
- [Galaxy Repository Browse](./GalaxyRepository.md)