Files
mxaccessgw/code-reviews/Client.Dotnet/findings.md
T
Joseph Doherty 8023eccfa6 Resolve Client.Dotnet-001, -002, -003 code-review findings
Client.Dotnet-001: MapRpcException typed only Unauthenticated and
PermissionDenied; every other gRPC status collapsed to an untyped exception
with the status code discarded. Added a nullable StatusCode to
MxGatewayException, extracted the duplicated mappers into a shared
RpcExceptionMapper that records the code for every status, and documented it.

Client.Dotnet-002: the production retry branch (MxGatewayException wrapping
RpcException) was never exercised. FakeGatewayTransport gained a
MapTransportExceptions mode that runs thrown RpcExceptions through
RpcExceptionMapper exactly as the production transport does.

Client.Dotnet-003: MxGatewaySession.DisposeAsync disposed _closeLock while a
concurrent CloseAsync could be parked in WaitAsync. DisposeAsync now drains
in-flight CloseAsync callers before disposing the semaphore; the client's
_disposed flag is accessed via Interlocked.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 21:31:33 -04:00

12 KiB

Code Review — Client.Dotnet

Field Value
Module clients/dotnet
Reviewer Claude Code
Review date 2026-05-18
Commit reviewed 3cc53a8
Status Reviewed
Open findings 5

Checklist coverage

# Category Result
1 Correctness & logic bugs Minor: handle-selector fallback ?? reply.ReturnValue.Int32Value can mask a missing typed reply (Client.Dotnet-005); CLI redactor misses env-var keys (Client.Dotnet-008).
2 mxaccessgw conventions Good — consumes the shared contracts project, no forked proto, authorization: Bearer metadata correct, parity preserved via split EnsureProtocolSuccess/EnsureMxAccessSuccess.
3 Concurrency & thread safety Issue found: _disposed flags unsynchronized; MxGatewaySession.DisposeAsync can race a concurrent CloseAsync (Client.Dotnet-003).
4 Error handling & resilience Issues found: gRPC-to-native mapping collapses non-auth statuses into one untyped exception (Client.Dotnet-001); shared retry/timeout budget (Client.Dotnet-004).
5 Security Good — API key never logged by the library, CLI redacts keys, TLS custom-root validation correct.
6 Performance & resource management No issues found — channels and streaming calls disposed correctly.
7 Design-document adherence No issues found — matches ClientLibrariesDesign.md.
8 Code organization & conventions Issue found: undocumented public members (Client.Dotnet-006).
9 Testing coverage Issue found: the production retry path is never exercised (Client.Dotnet-002).
10 Documentation & comments Issue found: doc misstates the unary timeout retry budget as per-call (Client.Dotnet-004, Client.Dotnet-007).

Findings

Client.Dotnet-001

Field Value
Severity Medium
Category Error handling & resilience
Location clients/dotnet/MxGateway.Client/GrpcMxGatewayClientTransport.cs:190-199, clients/dotnet/MxGateway.Client/GrpcGalaxyRepositoryClientTransport.cs:131-140
Status Resolved

Description: MapRpcException only produces typed exceptions for Unauthenticated and PermissionDenied. Every other gRPC status — NotFound, InvalidArgument, ResourceExhausted, FailedPrecondition, Unavailable, Internal — collapses into the base MxGatewayException with no surfaced StatusCode. Callers cannot programmatically distinguish a transient outage from a permanent bad-argument error without reflecting into InnerException and downcasting to RpcException.

Recommendation: Carry the gRPC StatusCode on MxGatewayException (e.g. a StatusCode property) and/or add typed subclasses for at least NotFound, InvalidArgument, and Unavailable. Populate it from exception.StatusCode in MapRpcException.

Resolution: (2026-05-18) Confirmed against source: both transports had a duplicated private MapRpcException that only typed two statuses and discarded the gRPC code for the rest. Added a nullable StatusCode property (Grpc.Core.StatusCode?) to MxGatewayException plus constructors that carry it, threaded it through MxGatewayAuthenticationException/MxGatewayAuthorizationException, and extracted the two duplicated mappers into a single shared internal RpcExceptionMapper (RpcExceptionMapper.cs) that populates StatusCode from exception.StatusCode for every status. Callers can now distinguish transient from permanent failures without downcasting InnerException. Documented in clients/dotnet/README.md. Regression test: RpcExceptionMapperTests (8 cases incl. the [Theory] over NotFound/InvalidArgument/ResourceExhausted/FailedPrecondition/Unavailable/Internal).

Client.Dotnet-002

Field Value
Severity Medium
Category Testing coverage
Location clients/dotnet/MxGateway.Client.Tests/FakeGatewayTransport.cs:145-148, clients/dotnet/MxGateway.Client.Tests/MxGatewayClientSessionTests.cs:236-256
Status Resolved

Description: The retry predicate MxGatewayClientRetryPolicy.IsTransientGrpcFailure handles two shapes: a raw RpcException and an MxGatewayException { InnerException: RpcException }. In production the transport always maps RpcExceptionMxGatewayException before it reaches the retry pipeline, so only the wrapped-MxGatewayException branch ever runs in production. But FakeGatewayTransport throws the raw RpcException and never maps it, so every retry test exercises only the raw-RpcException branch — the branch that never occurs in production. The production retry behaviour is effectively untested.

Recommendation: Add a fake/transport mode that maps RpcException to MxGatewayException the way GrpcMxGatewayClientTransport does (or add tests that enqueue a pre-wrapped MxGatewayException), so the actually-used predicate branch is covered.

Resolution: (2026-05-18) Confirmed against source: FakeGatewayTransport threw queued exceptions verbatim, so the existing retry tests only ever hit the raw-RpcException predicate branch. Added a MapTransportExceptions flag to FakeGatewayTransport that, when set, runs thrown RpcExceptions through the same shared RpcExceptionMapper the production gRPC transport uses, producing the wrapped MxGatewayException shape. Added regression test MxGatewayClientSessionTests.InvokeAsync_RetriesSafeDiagnosticCommand_WhenTransportMapsRpcException, which exercises the previously-untested production predicate branch. Verified red: removing the MxGatewayException { InnerException: RpcException } case from IsTransientGrpcFailure fails the new test while the pre-existing raw-RpcException test still passes.

Client.Dotnet-003

Field Value
Severity Medium
Category Concurrency & thread safety
Location clients/dotnet/MxGateway.Client/MxGatewaySession.cs:659-663, clients/dotnet/MxGateway.Client/MxGatewayClient.cs:230-240
Status Resolved

Description: DisposeAsync calls CloseAsync() (no token) then unconditionally _closeLock.Dispose(). If another thread is concurrently awaiting CloseAsync(token) — legal, since the type exposes public async methods and no single-threaded contract — disposing the SemaphoreSlim while a WaitAsync is pending throws ObjectDisposedException into that caller. The _disposed flags in both clients are also plain unsynchronised bool reads/writes; ThrowIfDisposed racing DisposeAsync can observe a stale value.

Recommendation: Either document MxGatewaySession/MxGatewayClient as not thread-safe for concurrent dispose, or guard _disposed with Interlocked/volatile and avoid disposing _closeLock until all in-flight CloseAsync calls complete.

Resolution: (2026-05-18) Confirmed against source: MxGatewaySession.DisposeAsync disposed _closeLock unconditionally, racing concurrent CloseAsync callers; MxGatewayClient._disposed was a plain bool. Fixed MxGatewaySession by tracking in-flight CloseAsync callers with an _activeCloseCount guarded by a dedicated _disposeGate lock and a _closeLockDisposed flag: CloseAsync registers under the gate (and throws ObjectDisposedException if disposal already won) before awaiting _closeLock.WaitAsync, and DisposeAsync drains _activeCloseCount to zero before disposing the semaphore, so the close lock provably outlives every pending WaitAsync. Fixed MxGatewayClient by changing _disposed to an int accessed via Interlocked.Exchange/Volatile.Read. Regression test MxGatewayClientSessionTests.DisposeAsync_DoesNotRaceConcurrentCloseAsync runs 100 iterations with one close holding the lock and one parked behind it while DisposeAsync runs concurrently; verified red against the original DisposeAsync (fails with ObjectDisposedException), green after the fix.

Client.Dotnet-004

Field Value
Severity Low
Category Error handling & resilience
Location clients/dotnet/MxGateway.Client/MxGatewayClient.cs:283-294, clients/dotnet/MxGateway.Client/GalaxyRepositoryClient.cs:392-403
Status Open

Description: ExecuteSafeUnaryAsync wraps the whole Polly retry pipeline in a single linked CTS cancelled after Options.DefaultCallTimeout, while CreateCallOptions also stamps each individual call with a DefaultCallTimeout gRPC deadline. The retry pipeline therefore shares one DefaultCallTimeout budget across the initial attempt plus all retries plus backoff delays. The README/XML docs describe DefaultCallTimeout as a per-call timeout, which misrepresents this. DeadlineExceeded is also classified as transient, so an attempt that exhausts the shared budget is retried only to immediately fail again.

Recommendation: Decide whether DefaultCallTimeout is per-attempt or per-operation and make code and docs consistent — e.g. a separate per-attempt deadline and a distinct overall-operation timeout. Reconsider retrying on DeadlineExceeded when the deadline was client-imposed.

Resolution: (open)

Client.Dotnet-005

Field Value
Severity Low
Category Correctness & logic bugs
Location clients/dotnet/MxGateway.Client/MxGatewaySession.cs:82,124,175
Status Open

Description: RegisterAsync/AddItemAsync/AddItem2Async return reply.<Typed>?.ServerHandle ?? reply.ReturnValue.Int32Value. After EnsureMxAccessSuccess() passes, a missing typed payload silently falls back to ReturnValue.Int32Value, which for a reply carrying no return value is 0. A caller then uses 0 as a ServerHandle/ItemHandle, producing a confusing downstream invalid-handle failure rather than a clear "gateway reply missing payload" error.

Recommendation: If the typed sub-message is the contract for these commands, treat its absence on an otherwise-successful reply as an error (throw a descriptive MxGatewayException) rather than falling through to ReturnValue.Int32Value.

Resolution: (open)

Client.Dotnet-006

Field Value
Severity Low
Category Code organization & conventions
Location clients/dotnet/MxGateway.Client/MxGatewayClientOptions.cs:50, clients/dotnet/MxGateway.Client/MxGatewayClientContractInfo.cs:10-14
Status Open

Description: MxGatewayClientOptions.MaxGrpcMessageBytes and the two consts in MxGatewayClientContractInfo are public members with no XML doc comments, inconsistent with every other public member in the assembly and with the repo's documented C# style emphasis on a documented public surface.

Recommendation: Add <summary> doc comments to MaxGrpcMessageBytes, GatewayProtocolVersion, and WorkerProtocolVersion.

Resolution: (open)

Client.Dotnet-007

Field Value
Severity Low
Category Documentation & comments
Location clients/dotnet/MxGateway.Client/MxGatewayClient.cs:185-192
Status Open

Description: The AcknowledgeAlarmAsync XML comment states the gateway authenticates against an invoke:alarm-ack scope, but CLAUDE.md documents the scope set without any invoke:alarm-ack sub-scope. The comment may describe an intended finer-grained scope that does not exist, misleading integrators about what API key they need.

Recommendation: Reconcile the comment with the actual server-side scope check, or update the scope documentation if sub-scopes were genuinely added; keep client doc and gateway auth model in sync.

Resolution: (open)

Client.Dotnet-008

Field Value
Severity Low
Category Correctness & logic bugs
Location clients/dotnet/MxGateway.Client.Cli/MxGatewayCliSecretRedactor.cs:9-17
Status Open

Description: The CLI redactor only removes the API key string when it was supplied via --api-key; RunCoreAsync passes arguments.GetOptional("api-key") to Redact. When the key comes from an environment variable (--api-key-env, the documented default path), apiKey is null and no redaction occurs. If a gRPC/transport error message ever echoes the bearer token, it would be printed unredacted.

Recommendation: Resolve the effective API key (same logic as ResolveApiKey) before redacting, so the env-var-sourced key is also stripped from error output.

Resolution: (open)