Code Review — Client.Dotnet
| Field | Value |
|---|---|
| Module | clients/dotnet |
| Reviewer | Claude Code |
| Review date | 2026-05-18 |
| Commit reviewed | 3cc53a8 |
| Status | Reviewed |
| Open findings | 0 |
Checklist coverage
| # | Category | Result |
|---|---|---|
| 1 | Correctness & logic bugs | Minor: handle-selector fallback ?? reply.ReturnValue.Int32Value can mask a missing typed reply (Client.Dotnet-005); CLI redactor misses env-var keys (Client.Dotnet-008). |
| 2 | mxaccessgw conventions | Good — consumes the shared contracts project, no forked proto, authorization: Bearer metadata correct, parity preserved via split EnsureProtocolSuccess/EnsureMxAccessSuccess. |
| 3 | Concurrency & thread safety | Issue found: _disposed flags unsynchronized; MxGatewaySession.DisposeAsync can race a concurrent CloseAsync (Client.Dotnet-003). |
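| 4 | Error handling & resilience | Issues found: gRPC-to-native mapping collapses non-auth statuses into one untyped exception (Client.Dotnet-001); DefaultCallTimeout is a single budget shared across the initial attempt and all retries, and DeadlineExceeded is retried against it (Client.Dotnet-004). |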
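| 5 | Security | Good: API key never logged by the library, TLS custom-root validation correct. The CLI redacts keys passed via --api-key; the gap for env-var-sourced keys is tracked under Correctness (Client.Dotnet-008). |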
| 6 | Performance & resource management | No issues found — channels and streaming calls disposed correctly. |
| 7 | Design-document adherence | No issues found — matches ClientLibrariesDesign.md. |
| 8 | Code organization & conventions | Issue found: undocumented public members (Client.Dotnet-006). |
| 9 | Testing coverage | Issue found: the production retry path is never exercised (Client.Dotnet-002). |
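| 10 | Documentation & comments | Issues found: docs describe DefaultCallTimeout as a per-call timeout although it is a budget shared across retries (Client.Dotnet-004); the AcknowledgeAlarmAsync XML comment cites a non-existent invoke:alarm-ack scope (Client.Dotnet-007). |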
Findings
Client.Dotnet-001
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Error handling & resilience |
| Location | clients/dotnet/MxGateway.Client/GrpcMxGatewayClientTransport.cs:190-199, clients/dotnet/MxGateway.Client/GrpcGalaxyRepositoryClientTransport.cs:131-140 |
| Status | Resolved |
Description: MapRpcException only produces typed exceptions for Unauthenticated and PermissionDenied. Every other gRPC status — NotFound, InvalidArgument, ResourceExhausted, FailedPrecondition, Unavailable, Internal — collapses into the base MxGatewayException with no surfaced StatusCode. Callers cannot programmatically distinguish a transient outage from a permanent bad-argument error without reflecting into InnerException and downcasting to RpcException.
Recommendation: Carry the gRPC StatusCode on MxGatewayException (e.g. a StatusCode property) and/or add typed subclasses for at least NotFound, InvalidArgument, and Unavailable. Populate it from exception.StatusCode in MapRpcException.
Resolution: (2026-05-18) Confirmed against source: both transports had a duplicated private MapRpcException that only typed two statuses and discarded the gRPC code for the rest. Added a nullable StatusCode property (Grpc.Core.StatusCode?) to MxGatewayException plus constructors that carry it, threaded it through MxGatewayAuthenticationException/MxGatewayAuthorizationException, and extracted the two duplicated mappers into a single shared internal RpcExceptionMapper (RpcExceptionMapper.cs) that populates StatusCode from exception.StatusCode for every status. Callers can now distinguish transient from permanent failures without downcasting InnerException. Documented in clients/dotnet/README.md. Regression test: RpcExceptionMapperTests (8 cases incl. the [Theory] over NotFound/InvalidArgument/ResourceExhausted/FailedPrecondition/Unavailable/Internal).
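The shape of this fix can be illustrated with a minimal sketch. It is not the shipped code: the `StatusCode` enum and `RpcException` class below are cut-down stand-ins for the Grpc.Core types so the example compiles on its own, and the method name `Map` and the constructor signatures are assumptions.

```csharp
#nullable enable
using System;

// Demo: a non-auth status keeps its gRPC code on the mapped exception.
var mapped = RpcExceptionMapper.Map(new RpcException(StatusCode.Unavailable, "gateway down"));
Console.WriteLine($"{mapped.GetType().Name}: {mapped.StatusCode}"); // MxGatewayException: Unavailable

// Stand-ins for Grpc.Core.StatusCode / Grpc.Core.RpcException (only what the sketch needs).
public enum StatusCode { OK = 0, InvalidArgument = 3, NotFound = 5, PermissionDenied = 7, Unavailable = 14, Unauthenticated = 16 }

public sealed class RpcException : Exception
{
    public RpcException(StatusCode statusCode, string detail) : base(detail) => StatusCode = statusCode;
    public StatusCode StatusCode { get; }
}

public class MxGatewayException : Exception
{
    public MxGatewayException(string message, StatusCode? statusCode, Exception? innerException)
        : base(message, innerException) => StatusCode = statusCode;

    // Null when the failure did not originate from a gRPC status.
    public StatusCode? StatusCode { get; }
}

public sealed class MxGatewayAuthenticationException : MxGatewayException
{
    public MxGatewayAuthenticationException(string message, StatusCode? statusCode, Exception? innerException)
        : base(message, statusCode, innerException) { }
}

public sealed class MxGatewayAuthorizationException : MxGatewayException
{
    public MxGatewayAuthorizationException(string message, StatusCode? statusCode, Exception? innerException)
        : base(message, statusCode, innerException) { }
}

public static class RpcExceptionMapper
{
    // One shared mapper: auth statuses get typed exceptions, and every status keeps its code.
    public static MxGatewayException Map(RpcException exception) => exception.StatusCode switch
    {
        StatusCode.Unauthenticated => new MxGatewayAuthenticationException(exception.Message, exception.StatusCode, exception),
        StatusCode.PermissionDenied => new MxGatewayAuthorizationException(exception.Message, exception.StatusCode, exception),
        _ => new MxGatewayException(exception.Message, exception.StatusCode, exception),
    };
}
```

With the code carried on the exception, a caller can branch on `ex.StatusCode == StatusCode.Unavailable` instead of downcasting `InnerException`.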
Client.Dotnet-002
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Testing coverage |
| Location | clients/dotnet/MxGateway.Client.Tests/FakeGatewayTransport.cs:145-148, clients/dotnet/MxGateway.Client.Tests/MxGatewayClientSessionTests.cs:236-256 |
| Status | Resolved |
Description: The retry predicate MxGatewayClientRetryPolicy.IsTransientGrpcFailure handles two shapes: a raw RpcException and an MxGatewayException { InnerException: RpcException }. In production the transport always maps RpcException → MxGatewayException before it reaches the retry pipeline, so only the wrapped-MxGatewayException branch ever runs in production. But FakeGatewayTransport throws the raw RpcException and never maps it, so every retry test exercises only the raw-RpcException branch — the branch that never occurs in production. The production retry behaviour is effectively untested.
Recommendation: Add a fake/transport mode that maps RpcException to MxGatewayException the way GrpcMxGatewayClientTransport does (or add tests that enqueue a pre-wrapped MxGatewayException), so the actually-used predicate branch is covered.
Resolution: (2026-05-18) Confirmed against source: FakeGatewayTransport threw queued exceptions verbatim, so the existing retry tests only ever hit the raw-RpcException predicate branch. Added a MapTransportExceptions flag to FakeGatewayTransport that, when set, runs thrown RpcExceptions through the same shared RpcExceptionMapper the production gRPC transport uses, producing the wrapped MxGatewayException shape. Added regression test MxGatewayClientSessionTests.InvokeAsync_RetriesSafeDiagnosticCommand_WhenTransportMapsRpcException, which exercises the previously-untested production predicate branch. Verified red: removing the MxGatewayException { InnerException: RpcException } case from IsTransientGrpcFailure fails the new test while the pre-existing raw-RpcException test still passes.
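The two predicate branches discussed here can be sketched as follows. This is an illustration under assumptions: `RpcException` and `StatusCode` are stand-ins for the Grpc.Core types, the class name `RetryPredicateSketch` is hypothetical, and the transient set is assumed to contain only `Unavailable` because the review does not list its full contents.

```csharp
#nullable enable
using System;

var raw = new RpcException(StatusCode.Unavailable);   // shape the fake transport used to throw
var wrapped = new MxGatewayException("mapped", raw);  // shape the production transport produces

Console.WriteLine(RetryPredicateSketch.IsTransientGrpcFailure(raw));     // True
Console.WriteLine(RetryPredicateSketch.IsTransientGrpcFailure(wrapped)); // True
Console.WriteLine(RetryPredicateSketch.IsTransientGrpcFailure(
    new MxGatewayException("bad argument", new RpcException(StatusCode.InvalidArgument)))); // False

// Stand-ins for the Grpc.Core types.
public enum StatusCode { OK = 0, InvalidArgument = 3, Unavailable = 14 }

public sealed class RpcException : Exception
{
    public RpcException(StatusCode statusCode) : base(statusCode.ToString()) => StatusCode = statusCode;
    public StatusCode StatusCode { get; }
}

public class MxGatewayException : Exception
{
    public MxGatewayException(string message, Exception? innerException) : base(message, innerException) { }
}

public static class RetryPredicateSketch
{
    public static bool IsTransientGrpcFailure(Exception exception) => exception switch
    {
        // Raw branch: never reached in production, because the transport maps first.
        RpcException rpc => IsTransientStatus(rpc.StatusCode),
        // Wrapped branch: the one production traffic actually exercises.
        MxGatewayException { InnerException: RpcException inner } => IsTransientStatus(inner.StatusCode),
        _ => false,
    };

    // Assumption: only Unavailable is treated as transient in this sketch.
    private static bool IsTransientStatus(StatusCode code) => code == StatusCode.Unavailable;
}
```

A test suite that only enqueues the `raw` shape would stay green even if the wrapped arm were deleted, which is the coverage gap this finding describes.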
Client.Dotnet-003
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Concurrency & thread safety |
| Location | clients/dotnet/MxGateway.Client/MxGatewaySession.cs:659-663, clients/dotnet/MxGateway.Client/MxGatewayClient.cs:230-240 |
| Status | Resolved |
Description: DisposeAsync calls CloseAsync() (no token) and then unconditionally calls _closeLock.Dispose(). Another thread may legally be awaiting CloseAsync(token) at the same time, since the type exposes public async methods and documents no single-threaded contract; disposing the SemaphoreSlim while a WaitAsync is pending throws ObjectDisposedException into that caller. The _disposed flags in both clients are also plain unsynchronized bool reads/writes, so ThrowIfDisposed racing DisposeAsync can observe a stale value.
Recommendation: Either document MxGatewaySession/MxGatewayClient as not thread-safe for concurrent dispose, or guard _disposed with Interlocked/volatile and avoid disposing _closeLock until all in-flight CloseAsync calls complete.
Resolution: (2026-05-18) Confirmed against source: MxGatewaySession.DisposeAsync disposed _closeLock unconditionally, racing concurrent CloseAsync callers; MxGatewayClient._disposed was a plain bool. Fixed MxGatewaySession by tracking in-flight CloseAsync callers with an _activeCloseCount guarded by a dedicated _disposeGate lock and a _closeLockDisposed flag: CloseAsync registers under the gate (and throws ObjectDisposedException if disposal already won) before awaiting _closeLock.WaitAsync, and DisposeAsync drains _activeCloseCount to zero before disposing the semaphore, so the close lock provably outlives every pending WaitAsync. Fixed MxGatewayClient by changing _disposed to an int accessed via Interlocked.Exchange/Volatile.Read. Regression test MxGatewayClientSessionTests.DisposeAsync_DoesNotRaceConcurrentCloseAsync runs 100 iterations with one close holding the lock and one parked behind it while DisposeAsync runs concurrently; verified red against the original DisposeAsync (fails with ObjectDisposedException), green after the fix.
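The drain-before-dispose idea can be sketched as below. It is a simplified illustration, not the actual MxGatewaySession code: the class name `SessionSketch` and the `_drained` completion source are assumptions, the close RPC is omitted, and the flag is set as soon as disposal starts so that late callers are rejected.

```csharp
#nullable enable
using System;
using System.Threading;
using System.Threading.Tasks;

var session = new SessionSketch();
var first = session.CloseAsync();
var second = session.CloseAsync();
await session.DisposeAsync();          // waits for registered closers before disposing the lock
await Task.WhenAll(first, second);

try { await session.CloseAsync(); }
catch (ObjectDisposedException) { Console.WriteLine("close after dispose is rejected"); }

public sealed class SessionSketch : IAsyncDisposable
{
    private readonly SemaphoreSlim _closeLock = new(1, 1);
    private readonly object _disposeGate = new();
    private int _activeCloseCount;
    private bool _closeLockDisposed;
    private TaskCompletionSource? _drained;

    public async Task CloseAsync(CancellationToken cancellationToken = default)
    {
        lock (_disposeGate)
        {
            // Register before touching the semaphore; lose cleanly if disposal already won.
            if (_closeLockDisposed) throw new ObjectDisposedException(nameof(SessionSketch));
            _activeCloseCount++;
        }

        try
        {
            await _closeLock.WaitAsync(cancellationToken).ConfigureAwait(false);
            try { /* the real session would send the close RPC here */ }
            finally { _closeLock.Release(); }
        }
        finally
        {
            TaskCompletionSource? toSignal = null;
            lock (_disposeGate)
            {
                if (--_activeCloseCount == 0) toSignal = _drained;
            }
            toSignal?.TrySetResult(); // wake DisposeAsync once the last closer has left
        }
    }

    public async ValueTask DisposeAsync()
    {
        Task drained;
        lock (_disposeGate)
        {
            if (_closeLockDisposed) return;
            _closeLockDisposed = true; // no new closers can register from here on
            _drained = new TaskCompletionSource(TaskCreationOptions.RunContinuationsAsynchronously);
            if (_activeCloseCount == 0) _drained.TrySetResult();
            drained = _drained.Task;
        }

        await drained.ConfigureAwait(false);
        _closeLock.Dispose(); // safe: no WaitAsync or Release can still be pending
    }
}
```

Because registration and the disposed check happen under the same gate, a closer either registers before disposal starts (and is waited for) or is rejected before it reaches the semaphore.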
Client.Dotnet-004
| Field | Value |
|---|---|
| Severity | Low |
| Category | Error handling & resilience |
| Location | clients/dotnet/MxGateway.Client/MxGatewayClient.cs:283-294, clients/dotnet/MxGateway.Client/GalaxyRepositoryClient.cs:392-403 |
| Status | Resolved |
Description: ExecuteSafeUnaryAsync wraps the whole Polly retry pipeline in a single linked CTS cancelled after Options.DefaultCallTimeout, while CreateCallOptions also stamps each individual call with a DefaultCallTimeout gRPC deadline. The retry pipeline therefore shares one DefaultCallTimeout budget across the initial attempt plus all retries plus backoff delays. The README/XML docs describe DefaultCallTimeout as a per-call timeout, which misrepresents this. DeadlineExceeded is also classified as transient, so an attempt that exhausts the shared budget is retried only to immediately fail again.
Recommendation: Decide whether DefaultCallTimeout is per-attempt or per-operation and make code and docs consistent — e.g. a separate per-attempt deadline and a distinct overall-operation timeout. Reconsider retrying on DeadlineExceeded when the deadline was client-imposed.
Resolution: (2026-05-18) Confirmed against source: the shared linked-CTS budget plus per-call deadline both use DefaultCallTimeout, and IsTransientStatus listed DeadlineExceeded. Resolved as a per-operation budget (the simpler, non-breaking choice): the DefaultCallTimeout XML doc in MxGatewayClientOptions.cs now states it is both the per-attempt gRPC deadline and the overall budget shared across the initial attempt, every retry, and the backoff delays — an upper bound on total wall-clock time, not a fresh per-retry allowance. Removed DeadlineExceeded from MxGatewayClientRetryPolicy.IsTransientStatus: every unary deadline is client-imposed (CreateCallOptions stamps the shared budget), so a DeadlineExceeded means the budget is exhausted and an immediate retry can only fail again. Regression test MxGatewayClientSessionTests.InvokeAsync_DoesNotRetrySafeDiagnosticCommand_OnDeadlineExceeded asserts the safe diagnostic command (Ping) is attempted exactly once and the failure surfaces; verified red against the original transient set (the call retried and succeeded).
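The shared-budget behaviour can be demonstrated with a small sketch. A hand-rolled retry loop stands in for the Polly pipeline here, and `SharedBudgetSketch.ExecuteAsync` and its parameters are hypothetical names; the real client also stamps a per-attempt gRPC deadline, which this sketch leaves out.

```csharp
#nullable enable
using System;
using System.Threading;
using System.Threading.Tasks;

var attempts = 0;
try
{
    await SharedBudgetSketch.ExecuteAsync<int>(
        attempt: _ => { attempts++; throw new TimeoutException("transient failure"); },
        defaultCallTimeout: TimeSpan.FromMilliseconds(200),
        maxAttempts: 5,
        backoff: TimeSpan.FromMilliseconds(150),
        isTransient: ex => ex is TimeoutException);
}
catch (OperationCanceledException)
{
    // The budget ran out during a backoff delay, well before five attempts were made.
    Console.WriteLine($"budget exhausted after {attempts} attempt(s)");
}

public static class SharedBudgetSketch
{
    public static async Task<T> ExecuteAsync<T>(
        Func<CancellationToken, Task<T>> attempt,
        TimeSpan defaultCallTimeout,
        int maxAttempts,
        TimeSpan backoff,
        Func<Exception, bool> isTransient,
        CancellationToken cancellationToken = default)
    {
        // One linked CTS covers the first attempt, every retry, and every backoff delay.
        using var budget = CancellationTokenSource.CreateLinkedTokenSource(cancellationToken);
        budget.CancelAfter(defaultCallTimeout);

        for (var attemptNumber = 1; ; attemptNumber++)
        {
            try
            {
                return await attempt(budget.Token).ConfigureAwait(false);
            }
            catch (Exception ex) when (attemptNumber < maxAttempts && isTransient(ex))
            {
                // The delay spends the same budget, so it can end the whole operation.
                await Task.Delay(backoff, budget.Token).ConfigureAwait(false);
            }
        }
    }
}
```

The retry policy allows five attempts, but the 200 ms budget ends the operation first, which is why the timeout is an upper bound on total wall-clock time rather than a fresh allowance per retry.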
Client.Dotnet-005
| Field | Value |
|---|---|
| Severity | Low |
| Category | Correctness & logic bugs |
| Location | clients/dotnet/MxGateway.Client/MxGatewaySession.cs:82,124,175 |
| Status | Resolved |
Description: RegisterAsync/AddItemAsync/AddItem2Async return reply.<Typed>?.ServerHandle ?? reply.ReturnValue.Int32Value. After EnsureMxAccessSuccess() passes, a missing typed payload silently falls back to ReturnValue.Int32Value, which for a reply carrying no return value is 0. A caller then uses 0 as a ServerHandle/ItemHandle, producing a confusing downstream invalid-handle failure rather than a clear "gateway reply missing payload" error.
Recommendation: If the typed sub-message is the contract for these commands, treat its absence on an otherwise-successful reply as an error (throw a descriptive MxGatewayException) rather than falling through to ReturnValue.Int32Value.
Resolution: (2026-05-18) Confirmed against source and mxaccess_gateway.proto: register/add_item/add_item2 are members of the MxCommandReply.payload oneof, so the typed accessor is null whenever the worker did not set that case — and the fallback returned ReturnValue.Int32Value (0 for a reply with no return value). The typed sub-message is the contract for these handle-returning commands, so its absence on an otherwise-successful reply is now an error: RegisterAsync/AddItemAsync/AddItem2Async throw via a new private MxGatewaySession.CreateMissingPayloadException helper that builds a descriptive MxGatewayException naming the missing payload, kind, session, and correlation id. Regression tests MxGatewayClientSessionTests.RegisterAsync_Throws_WhenSuccessfulReplyMissingPayload and AddItemAsync_Throws_WhenSuccessfulReplyMissingPayload enqueue an Ok reply with no typed payload and assert the descriptive throw; verified red against the original fallback (returned 0 instead of throwing).
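The change from silent fallback to explicit failure can be sketched as follows. The reply classes are simplified stand-ins for the generated protobuf types, and `HandleSelectorSketch`, the `CorrelationId` property, and the message wording are assumptions made for illustration.

```csharp
#nullable enable
using System;

var ok = new MxCommandReply { Register = new RegisterReply { ServerHandle = 42 }, CorrelationId = "c-1" };
Console.WriteLine(HandleSelectorSketch.GetServerHandle(ok)); // 42

try
{
    // Successful reply, but the worker never set the typed payload.
    HandleSelectorSketch.GetServerHandle(new MxCommandReply { CorrelationId = "c-2" });
}
catch (MxGatewayException ex)
{
    Console.WriteLine(ex.Message);
}

// Simplified stand-ins for the generated protobuf messages.
public sealed class RegisterReply
{
    public int ServerHandle { get; init; }
}

public sealed class MxCommandReply
{
    public RegisterReply? Register { get; init; } // null when the oneof case was not set
    public string CorrelationId { get; init; } = "";
}

public sealed class MxGatewayException : Exception
{
    public MxGatewayException(string message) : base(message) { }
}

public static class HandleSelectorSketch
{
    // Old behaviour (for contrast): reply.Register?.ServerHandle ?? reply.ReturnValue.Int32Value,
    // which yields 0 when the payload is missing.
    public static int GetServerHandle(MxCommandReply reply) =>
        reply.Register?.ServerHandle
        ?? throw new MxGatewayException(
            $"Gateway reply is missing the 'register' payload (correlation id '{reply.CorrelationId}').");
}
```

A legitimate handle of 0 still passes through, because the null check is on the payload object and not on the handle value.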
Client.Dotnet-006
| Field | Value |
|---|---|
| Severity | Low |
| Category | Code organization & conventions |
| Location | clients/dotnet/MxGateway.Client/MxGatewayClientOptions.cs:50, clients/dotnet/MxGateway.Client/MxGatewayClientContractInfo.cs:10-14 |
| Status | Resolved |
Description: MxGatewayClientOptions.MaxGrpcMessageBytes and the two consts in MxGatewayClientContractInfo are public members with no XML doc comments. This is inconsistent with every other public member in the assembly and with the repo's C# style guidance, which calls for a documented public surface.
Recommendation: Add <summary> doc comments to MaxGrpcMessageBytes, GatewayProtocolVersion, and WorkerProtocolVersion.
Resolution: (2026-05-18) Confirmed: all three public members lacked XML docs while every other public member in the assembly is documented. Added <summary> comments to MxGatewayClientOptions.MaxGrpcMessageBytes (describing the 16 MiB default applied to both send and receive limits), and to MxGatewayClientContractInfo.GatewayProtocolVersion and WorkerProtocolVersion (describing their wire-compatibility / diagnostics purpose). Pure documentation change — no test needed; build remains warning-clean.
Client.Dotnet-007
| Field | Value |
|---|---|
| Severity | Low |
| Category | Documentation & comments |
| Location | clients/dotnet/MxGateway.Client/MxGatewayClient.cs:185-192 |
| Status | Resolved |
Description: The AcknowledgeAlarmAsync XML comment states that the gateway authorizes the call against an invoke:alarm-ack scope, but the scope set documented in CLAUDE.md contains no invoke:alarm-ack sub-scope. The comment may describe an intended finer-grained scope that does not exist, which would mislead integrators about which API key they need.
Recommendation: Reconcile the comment with the actual server-side scope check, or update the scope documentation if sub-scopes were genuinely added; keep client doc and gateway auth model in sync.
Resolution: (2026-05-18) Confirmed against the server-side authorization model: GatewayGrpcScopeResolver.ResolveRequiredScope has no arm for AcknowledgeAlarmRequest, so it falls to the _ => GatewayScopes.Admin default — the RPC actually requires the admin scope. No invoke:alarm-ack sub-scope exists anywhere in GatewayScopes. The client XML comment on AcknowledgeAlarmAsync was wrong, not the docs. Corrected the comment to state the gateway authorizes AcknowledgeAlarmRequest against the API key's admin scope and that there is no finer-grained alarm-ack sub-scope. Pure documentation change — no test needed.
Client.Dotnet-008
| Field | Value |
|---|---|
| Severity | Low |
| Category | Correctness & logic bugs |
| Location | clients/dotnet/MxGateway.Client.Cli/MxGatewayCliSecretRedactor.cs:9-17 |
| Status | Resolved |
Description: The CLI redactor only removes the API key string when it was supplied via --api-key; RunCoreAsync passes arguments.GetOptional("api-key") to Redact. When the key comes from an environment variable (--api-key-env, the documented default path), apiKey is null and no redaction occurs. If a gRPC/transport error message ever echoes the bearer token, it would be printed unredacted.
Recommendation: Resolve the effective API key (same logic as ResolveApiKey) before redacting, so the env-var-sourced key is also stripped from error output.
Resolution: (2026-05-18) Confirmed against source: MxGatewayClientCli.RunCoreAsync's catch block redacted only arguments.GetOptional("api-key"), so an env-var-sourced key (--api-key-env, default MXGATEWAY_API_KEY) was never stripped. Note MxGatewayCliSecretRedactor itself is correct — the defect was the caller passing the wrong value. Extracted a non-throwing TryResolveApiKey helper (used by both the existing ResolveApiKey and the catch block) that resolves --api-key then the --api-key-env environment variable; the catch block now redacts that effective key. Updated clients/dotnet/README.md (smoke paragraph) to state the CLI redacts the effective key whether from --api-key or --api-key-env. Regression test MxGatewayClientCliTests.RunAsync_ErrorOutput_RedactsApiKey_WhenSourcedFromEnvironmentVariable sets a test env var, forces a transport error echoing the key, and asserts the key is absent and [redacted] is present; verified red against the original GetOptional("api-key")-only redaction (key printed unredacted).
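The resolve-then-redact flow can be sketched as below. This is an illustration, not the CLI's code: parsed options are modelled as a plain dictionary, the class name `CliKeySketch` and the method signatures are assumptions, and `MXGATEWAY_DEMO_KEY` is a made-up variable name used only for the demo.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

Environment.SetEnvironmentVariable("MXGATEWAY_DEMO_KEY", "s3cret-key");
var options = new Dictionary<string, string> { ["api-key-env"] = "MXGATEWAY_DEMO_KEY" }; // no --api-key given

CliKeySketch.TryResolveApiKey(options, out var effectiveKey);
Console.WriteLine(CliKeySketch.Redact("Unauthenticated: bearer s3cret-key rejected", effectiveKey));
// Unauthenticated: bearer [redacted] rejected

public static class CliKeySketch
{
    // Non-throwing resolver: --api-key wins, then the variable named by --api-key-env.
    public static bool TryResolveApiKey(IReadOnlyDictionary<string, string> options, out string? apiKey)
    {
        if (options.TryGetValue("api-key", out var direct) && !string.IsNullOrEmpty(direct))
        {
            apiKey = direct;
            return true;
        }

        var variableName = options.TryGetValue("api-key-env", out var name) ? name : "MXGATEWAY_API_KEY";
        apiKey = Environment.GetEnvironmentVariable(variableName);
        return !string.IsNullOrEmpty(apiKey);
    }

    // Replace every occurrence of the effective key; leave the message alone if there is no key.
    public static string Redact(string message, string? apiKey) =>
        string.IsNullOrEmpty(apiKey) ? message : message.Replace(apiKey, "[redacted]");
}
```

Passing only the `--api-key` option value to `Redact`, as the original caller did, would hand it `null` in this scenario and print the message unchanged.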