mxaccessgw/code-reviews/Client.Dotnet/findings.md
Joseph Doherty a0203503a7 Code-review 2026-05-20 sweep: re-review at 1cd51bb, resolve 72 findings across all 11 modules
Re-reviewed every module/client against the 10-category checklist
(REVIEW-PROCESS.md) at commit 1cd51bb, filed 72 new findings, and
fixed them in three priority waves (3 High, 17 Medium, 52 Low).

Highs
- Server-017: enumerate AcknowledgeAlarm / QueryActiveAlarms in
  GatewayGrpcScopeResolver so non-admin keys can use them; document
  the mapping in docs/Authorization.md; add interceptor tests.
- Client.Java-013: add the five missing bulk-method stubs to the
  CLI FakeSession so the test module compiles on a clean tree.
- Client.Rust-013: fix the clippy::doc_lazy_continuation regression
  in generated tonic code by reformatting the ReadBulkCommand proto
  comment and scoping a #![allow(...)] to the generated submodules.

Mediums (highlights)
- Server: unify GatewaySession state-lock discipline (-015) and
  make DisposeAsync race-safe against in-flight CloseAsync (-016);
  add constraint-enforcement test coverage for the bulk-plan path
  (-021).
- Worker: introduce StaRuntimeShutdownException so RunAlarmPollLoop
  can distinguish graceful shutdown from a real STA-affinity
  violation (-016); have the watchdog skip StaHung while
  CurrentCommandCorrelationId is non-empty so a legitimate slow
  ReadBulk no longer self-faults (-017).
- Tests: add per-method round-trip + cancellation coverage for the
  11 GatewaySession bulk methods (-013); replace the real TCP probe
  in GalaxyHierarchyCacheTests with an IGalaxyRepository fake
  (-016).
- IntegrationTests: drive the StreamEvents writer in the live Write
  test and assert OnWriteComplete (-012); add live tests for
  Unadvise/RemoveItem/Unregister ordering, WriteSecured, and
  abnormal worker exit (-014).
- Worker.Tests: replace MxAccessSession reflection with an internal
  CreateForTesting factory (-016); cover WorkerCancel and
  unexpected-body envelope branches (-017).
- Client.Java: cancel MxEventStream when close() races
  beforeStart() (-014); return a CancellingCompletableFuture that
  actually forwards cancellation through .thenApply chains (-015).
- Client.Python: drop the silent localhost-plaintext downgrade in
  the CLI; require explicit --plaintext (-013).
- Client.Rust: stop bench-read-bulk from polluting success-latency
  histograms with failed-call durations (-015); add coverage for
  the five MalformedReply paths, the bulk-write helpers, the
  Error::Unavailable mapping, and the unary-fault path (-016).
- Contracts: extend docs/Contracts.md with the bulk read/write
  command family (-009).

Lows (highlights)
- Server: cap GalaxyGlobMatcher.RegexCache; align
  WorkerAlarmRpcDispatcher missing-session handling; drop the
  duplicate dashboard @page routes; refresh IAlarmRpcDispatcher
  XML doc.
- Worker: surface SetXmlAlarmQuery COM failures; remove dead
  subscriptionExpression / ExecutingCommand arms; preserve
  factory-supplied runtime sessions; split MxAlarmSnapshot.cs into
  three files.
- Tests: dispose the WebApplication in seven test classes; rebuild
  FakeWorkerProcess.WaitForExitAsync against a real TaskCompletionSource;
  switch the heartbeat-expires test to ManualTimeProvider;
  add InvariantCulture to the remaining DateTimeOffset.Parse sites;
  document GalaxyFilterInputSafetyTests in GatewayTesting.md.
- IntegrationTests: comment fixes, RecordingServerStreamWriter
  IDisposable, class-level [Trait], single-source ZB default
  connection string.
- Worker.Tests: replace silent-return gating with LiveMxAccessFact
  so absent env vars SKIP not pass; PascalCase rename of probe
  [Fact]s; deterministic deadline test; new frame-protocol error
  tests; ComputeTransitions diff-coverage; relocate dev-rig probes
  to Probes/.
- Contracts: add round-trip coverage and per-field redaction /
  Galaxy-identifier comments to the protos.
- Client.Dotnet: introduce clients/dotnet/Directory.Build.props so
  TreatWarningsAsErrors / analysers apply; document
  DiscoverHierarchyOptions and IMxGatewayCliClient; require typed
  bulk-read handles in CLI; surface AcknowledgeAlarm transport
  faults through Translate().
- Client.Go: kill dead code in alarms_test / fakeGalaxyServer /
  runWriteBulkVariant; document the six new subcommands in
  writeUsage; drain galaxy-watch events on limit; switch io.EOF
  comparisons to errors.Is.
- Client.Java: shared shutdown helpers + new shutdownTimeout
  option; regex-based credential redaction; Long.toUnsignedString
  for uint64 sequence; doc fixes.
- Client.Python: combine duplicate imports; add coverage for
  _percentile / bench-read-bulk / MAX_AGGREGATE_EVENTS /
  _api_key_from_env; populate pyproject metadata and ship py.typed.
- Client.Rust: expose next_correlation_id() so CLI ping/close
  stop hard-coding correlation IDs; resync RustClientDesign.md
  with the current Session / Error surface and CLI subcommand set.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-20 09:46:47 -04:00

Code Review — Client.Dotnet

| Field | Value |
|---|---|
| Module | clients/dotnet |
| Reviewer | Claude Code |
| Review date | 2026-05-20 |
| Commit reviewed | 1cd51bb |
| Status | Reviewed |
| Open findings | 0 |

Checklist coverage

| # | Category | Result |
|---|---|---|
| 1 | Correctness & logic bugs | Issue found (this review): the Client.Dotnet-005 fix did not reach the CLI, where BenchReadBulkAsync, BenchStreamEventsAsync, and SmokeAsync still fall through to reply.ReturnValue.Int32Value for Register / AddItem handles (Client.Dotnet-010). |
| 2 | mxaccessgw conventions | Good. Consumes the shared contracts project with no forked proto; authorization: Bearer metadata is correct; parity is preserved via the split EnsureProtocolSuccess / EnsureMxAccessSuccess. |
| 3 | Concurrency & thread safety | Issues found (this review): GalaxyRepositoryClient._disposed is still a plain unsynchronized bool because the symmetric fix from Client.Dotnet-003 was applied only to MxGatewayClient (Client.Dotnet-009); the new bench-stream-events CLI command races firstSteadyEventUtc / lastSteadyEventUtc across parallel sessions (Client.Dotnet-011). |
| 4 | Error handling & resilience | No new issues found this review (Client.Dotnet-001 and Client.Dotnet-004 remain resolved). |
| 5 | Security | Good. The library never logs the API key, the CLI redacts keys (including env-var-sourced ones), TLS custom-root validation is correct, and secured-write payloads are never logged. |
| 6 | Performance & resource management | No issues found; channels and streaming calls are disposed correctly. |
| 7 | Design-document adherence | No issues found; matches DotnetClientDesign.md and ClientLibrariesDesign.md. |
| 8 | Code organization & conventions | Issues found (this review): the .NET client projects do not inherit src/Directory.Build.props, so TreatWarningsAsErrors / EnforceCodeStyleInBuild / AnalysisLevel=latest are silently absent (Client.Dotnet-012); DiscoverHierarchyOptions and the DiscoverHierarchyAsync(DiscoverHierarchyOptions, …) overload have no XML docs (Client.Dotnet-013). |
| 9 | Testing coverage | Issue found (this review): the SDK-level alarm tests pin the fake-transport raw-RpcException shape but never exercise the production gRPC-to-native mapping (GrpcMxGatewayClientTransport.AcknowledgeAlarmAsync). This is the same gap Client.Dotnet-002 closed for Invoke, still open for alarms (Client.Dotnet-014). |
| 10 | Documentation & comments | No new issues this review. |

Findings

Client.Dotnet-001

| Field | Value |
|---|---|
| Severity | Medium |
| Category | Error handling & resilience |
| Location | clients/dotnet/MxGateway.Client/GrpcMxGatewayClientTransport.cs:190-199, clients/dotnet/MxGateway.Client/GrpcGalaxyRepositoryClientTransport.cs:131-140 |
| Status | Resolved |

Description: MapRpcException only produces typed exceptions for Unauthenticated and PermissionDenied. Every other gRPC status — NotFound, InvalidArgument, ResourceExhausted, FailedPrecondition, Unavailable, Internal — collapses into the base MxGatewayException with no surfaced StatusCode. Callers cannot programmatically distinguish a transient outage from a permanent bad-argument error without reflecting into InnerException and downcasting to RpcException.

Recommendation: Carry the gRPC StatusCode on MxGatewayException (e.g. a StatusCode property) and/or add typed subclasses for at least NotFound, InvalidArgument, and Unavailable. Populate it from exception.StatusCode in MapRpcException.

Resolution: (2026-05-18) Confirmed against source: both transports had a duplicated private MapRpcException that only typed two statuses and discarded the gRPC code for the rest. Added a nullable StatusCode property (Grpc.Core.StatusCode?) to MxGatewayException plus constructors that carry it, threaded it through MxGatewayAuthenticationException/MxGatewayAuthorizationException, and extracted the two duplicated mappers into a single shared internal RpcExceptionMapper (RpcExceptionMapper.cs) that populates StatusCode from exception.StatusCode for every status. Callers can now distinguish transient from permanent failures without downcasting InnerException. Documented in clients/dotnet/README.md. Regression test: RpcExceptionMapperTests (8 cases incl. the [Theory] over NotFound/InvalidArgument/ResourceExhausted/FailedPrecondition/Unavailable/Internal).
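
The shape of the shared mapper can be pictured with a short C# sketch. It is not the client's actual code. To stay self-contained it defines minimal stand-ins for Grpc.Core.StatusCode and Grpc.Core.RpcException (only the members used here), and the exception messages and constructor shapes are assumptions. What it does illustrate is the structure the resolution describes: one mapper shared by both transports, two typed subclasses, and a nullable StatusCode preserved for every other status.

```csharp
using System;

// Simplified stand-ins for Grpc.Core.StatusCode and Grpc.Core.RpcException (subset only).
public enum StatusCode
{
    InvalidArgument, NotFound, PermissionDenied, ResourceExhausted,
    FailedPrecondition, Internal, Unavailable, Unauthenticated,
}

public sealed class RpcException : Exception
{
    public RpcException(StatusCode statusCode, string detail) : base(detail) => StatusCode = statusCode;
    public StatusCode StatusCode { get; }
}

public class MxGatewayException : Exception
{
    public MxGatewayException(string message, StatusCode? statusCode, Exception inner)
        : base(message, inner) => StatusCode = statusCode;

    // The originating gRPC status; null when the failure did not come from a gRPC status.
    public StatusCode? StatusCode { get; }
}

public sealed class MxGatewayAuthenticationException : MxGatewayException
{
    public MxGatewayAuthenticationException(string message, RpcException inner)
        : base(message, inner.StatusCode, inner) { }
}

public sealed class MxGatewayAuthorizationException : MxGatewayException
{
    public MxGatewayAuthorizationException(string message, RpcException inner)
        : base(message, inner.StatusCode, inner) { }
}

// A single mapper used by both transports instead of two private copies.
public static class RpcExceptionMapper
{
    public static MxGatewayException Map(RpcException exception) => exception.StatusCode switch
    {
        StatusCode.Unauthenticated => new MxGatewayAuthenticationException("Gateway rejected the API key.", exception),
        StatusCode.PermissionDenied => new MxGatewayAuthorizationException("API key lacks the required scope.", exception),
        // Every other status keeps its code, so callers can tell Unavailable from InvalidArgument.
        _ => new MxGatewayException($"Gateway call failed: {exception.Message}", exception.StatusCode, exception),
    };
}
```

With this shape a caller can branch on ex.StatusCode (for example, retrying only on Unavailable) without downcasting InnerException.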

Client.Dotnet-002

| Field | Value |
|---|---|
| Severity | Medium |
| Category | Testing coverage |
| Location | clients/dotnet/MxGateway.Client.Tests/FakeGatewayTransport.cs:145-148, clients/dotnet/MxGateway.Client.Tests/MxGatewayClientSessionTests.cs:236-256 |
| Status | Resolved |

Description: The retry predicate MxGatewayClientRetryPolicy.IsTransientGrpcFailure handles two shapes: a raw RpcException and an MxGatewayException { InnerException: RpcException }. In production the transport always maps RpcException to MxGatewayException before it reaches the retry pipeline, so only the wrapped-MxGatewayException branch ever runs in production. But FakeGatewayTransport throws the raw RpcException and never maps it, so every retry test exercises only the raw-RpcException branch, which is the branch that never occurs in production. The production retry behaviour is effectively untested.

Recommendation: Add a fake/transport mode that maps RpcException to MxGatewayException the way GrpcMxGatewayClientTransport does (or add tests that enqueue a pre-wrapped MxGatewayException), so the actually-used predicate branch is covered.

Resolution: (2026-05-18) Confirmed against source: FakeGatewayTransport threw queued exceptions verbatim, so the existing retry tests only ever hit the raw-RpcException predicate branch. Added a MapTransportExceptions flag to FakeGatewayTransport that, when set, runs thrown RpcExceptions through the same shared RpcExceptionMapper the production gRPC transport uses, producing the wrapped MxGatewayException shape. Added regression test MxGatewayClientSessionTests.InvokeAsync_RetriesSafeDiagnosticCommand_WhenTransportMapsRpcException, which exercises the previously-untested production predicate branch. Verified red: removing the MxGatewayException { InnerException: RpcException } case from IsTransientGrpcFailure fails the new test while the pre-existing raw-RpcException test still passes.
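
The two predicate shapes are easier to compare side by side. The sketch below is illustrative only: RpcException and StatusCode are local stand-ins for the Grpc.Core types, MxGatewayException is reduced to a bare wrapper, and the transient set (Unavailable only) is an assumption rather than the real MxGatewayClientRetryPolicy list.

```csharp
using System;

// Simplified stand-ins for the Grpc.Core types (subset only).
public enum StatusCode { InvalidArgument, Unavailable }

public sealed class RpcException : Exception
{
    public RpcException(StatusCode statusCode, string detail) : base(detail) => StatusCode = statusCode;
    public StatusCode StatusCode { get; }
}

public sealed class MxGatewayException : Exception
{
    public MxGatewayException(string message, Exception inner) : base(message, inner) { }
}

public static class RetryPredicateSketch
{
    // Assumed transient set, for illustration only.
    private static bool IsTransientStatus(StatusCode code) => code == StatusCode.Unavailable;

    public static bool IsTransientGrpcFailure(Exception exception) => exception switch
    {
        // Shape a verbatim fake transport throws; never reaches the pipeline in production.
        RpcException rpc => IsTransientStatus(rpc.StatusCode),
        // Shape production actually produces: the transport has already wrapped the RpcException.
        MxGatewayException { InnerException: RpcException inner } => IsTransientStatus(inner.StatusCode),
        _ => false,
    };
}
```

Deleting the second switch arm corresponds to the mutation the resolution used to show the new test goes red while the raw-RpcException test stays green.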

Client.Dotnet-003

| Field | Value |
|---|---|
| Severity | Medium |
| Category | Concurrency & thread safety |
| Location | clients/dotnet/MxGateway.Client/MxGatewaySession.cs:659-663, clients/dotnet/MxGateway.Client/MxGatewayClient.cs:230-240 |
| Status | Resolved |

Description: DisposeAsync calls CloseAsync() (no token) then unconditionally _closeLock.Dispose(). If another thread is concurrently awaiting CloseAsync(token) — legal, since the type exposes public async methods and no single-threaded contract — disposing the SemaphoreSlim while a WaitAsync is pending throws ObjectDisposedException into that caller. The _disposed flags in both clients are also plain unsynchronised bool reads/writes; ThrowIfDisposed racing DisposeAsync can observe a stale value.

Recommendation: Either document MxGatewaySession/MxGatewayClient as not thread-safe for concurrent dispose, or guard _disposed with Interlocked/volatile and avoid disposing _closeLock until all in-flight CloseAsync calls complete.

Resolution: (2026-05-18) Confirmed against source: MxGatewaySession.DisposeAsync disposed _closeLock unconditionally, racing concurrent CloseAsync callers; MxGatewayClient._disposed was a plain bool. Fixed MxGatewaySession by tracking in-flight CloseAsync callers with an _activeCloseCount guarded by a dedicated _disposeGate lock and a _closeLockDisposed flag: CloseAsync registers under the gate (and throws ObjectDisposedException if disposal already won) before awaiting _closeLock.WaitAsync, and DisposeAsync drains _activeCloseCount to zero before disposing the semaphore, so the close lock provably outlives every pending WaitAsync. Fixed MxGatewayClient by changing _disposed to an int accessed via Interlocked.Exchange/Volatile.Read. Regression test MxGatewayClientSessionTests.DisposeAsync_DoesNotRaceConcurrentCloseAsync runs 100 iterations with one close holding the lock and one parked behind it while DisposeAsync runs concurrently; verified red against the original DisposeAsync (fails with ObjectDisposedException), green after the fix.
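
A condensed C# sketch of the drain-before-dispose pattern follows. Field names follow the resolution, but SessionSketch is hypothetical and its close body is a placeholder for the real RPC. The resolution does not say how DisposeAsync waits for _activeCloseCount to reach zero, so the TaskCompletionSource used here to wake it is an assumption.

```csharp
#nullable enable
using System;
using System.Threading;
using System.Threading.Tasks;

// Illustrative sketch of the drain-before-dispose pattern; not the actual MxGatewaySession.
public sealed class SessionSketch : IAsyncDisposable
{
    private readonly SemaphoreSlim _closeLock = new(1, 1);
    private readonly object _disposeGate = new();
    private int _activeCloseCount;          // callers between registration and exit
    private bool _closeLockDisposed;        // set once disposal wins; later closes are rejected
    private TaskCompletionSource? _drained; // completed when the last in-flight close leaves
    private int _disposeStarted;
    private int _closeBodies;

    public int CloseBodies => Volatile.Read(ref _closeBodies);

    public async Task CloseAsync(CancellationToken cancellationToken = default)
    {
        // Register under the gate before touching the semaphore.
        lock (_disposeGate)
        {
            if (_closeLockDisposed)
            {
                throw new ObjectDisposedException(nameof(SessionSketch));
            }
            _activeCloseCount++;
        }

        try
        {
            await _closeLock.WaitAsync(cancellationToken).ConfigureAwait(false);
            try
            {
                Interlocked.Increment(ref _closeBodies);
                await Task.Yield(); // placeholder for the close RPC
            }
            finally
            {
                _closeLock.Release();
            }
        }
        finally
        {
            lock (_disposeGate)
            {
                // This caller is done with the semaphore; wake a waiting DisposeAsync if it was the last.
                if (--_activeCloseCount == 0)
                {
                    _drained?.TrySetResult();
                }
            }
        }
    }

    public async ValueTask DisposeAsync()
    {
        if (Interlocked.Exchange(ref _disposeStarted, 1) != 0)
        {
            return;
        }

        await CloseAsync().ConfigureAwait(false);

        Task drained;
        lock (_disposeGate)
        {
            _closeLockDisposed = true;
            if (_activeCloseCount == 0)
            {
                drained = Task.CompletedTask;
            }
            else
            {
                // RunContinuationsAsynchronously keeps the dispose continuation out of the gate lock.
                _drained = new TaskCompletionSource(TaskCreationOptions.RunContinuationsAsynchronously);
                drained = _drained.Task;
            }
        }

        await drained.ConfigureAwait(false);
        _closeLock.Dispose(); // no registered caller can still be parked in WaitAsync here
    }
}
```

The important ordering is that a caller increments _activeCloseCount before it can reach WaitAsync and decrements it only after Release, so the semaphore is never disposed while a caller is waiting on it.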

Client.Dotnet-004

| Field | Value |
|---|---|
| Severity | Low |
| Category | Error handling & resilience |
| Location | clients/dotnet/MxGateway.Client/MxGatewayClient.cs:283-294, clients/dotnet/MxGateway.Client/GalaxyRepositoryClient.cs:392-403 |
| Status | Resolved |

Description: ExecuteSafeUnaryAsync wraps the whole Polly retry pipeline in a single linked CTS cancelled after Options.DefaultCallTimeout, while CreateCallOptions also stamps each individual call with a DefaultCallTimeout gRPC deadline. The retry pipeline therefore shares one DefaultCallTimeout budget across the initial attempt plus all retries plus backoff delays. The README/XML docs describe DefaultCallTimeout as a per-call timeout, which misrepresents this. DeadlineExceeded is also classified as transient, so an attempt that exhausts the shared budget is retried only to immediately fail again.

Recommendation: Decide whether DefaultCallTimeout is per-attempt or per-operation and make code and docs consistent — e.g. a separate per-attempt deadline and a distinct overall-operation timeout. Reconsider retrying on DeadlineExceeded when the deadline was client-imposed.

Resolution: (2026-05-18) Confirmed against source: the shared linked-CTS budget plus per-call deadline both use DefaultCallTimeout, and IsTransientStatus listed DeadlineExceeded. Resolved as a per-operation budget (the simpler, non-breaking choice): the DefaultCallTimeout XML doc in MxGatewayClientOptions.cs now states it is both the per-attempt gRPC deadline and the overall budget shared across the initial attempt, every retry, and the backoff delays — an upper bound on total wall-clock time, not a fresh per-retry allowance. Removed DeadlineExceeded from MxGatewayClientRetryPolicy.IsTransientStatus: every unary deadline is client-imposed (CreateCallOptions stamps the shared budget), so a DeadlineExceeded means the budget is exhausted and an immediate retry can only fail again. Regression test MxGatewayClientSessionTests.InvokeAsync_DoesNotRetrySafeDiagnosticCommand_OnDeadlineExceeded asserts the safe diagnostic command (Ping) is attempted exactly once and the failure surfaces; verified red against the original transient set (the call retried and succeeded).
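
The per-operation budget semantics can be sketched without Polly. This plain retry loop is a swapped-in illustration, not the client's pipeline. The stand-in types, the Unavailable-only transient set, and the ExecuteAsync signature are all assumptions, and the attempt simply receives the shared token where the real client also stamps a gRPC deadline. It shows the two properties the resolution settles on: one CancellationTokenSource bounds the first attempt, every retry, and every backoff delay, and DeadlineExceeded is not treated as transient.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Simplified stand-ins for the Grpc.Core types (subset only).
public enum StatusCode { DeadlineExceeded, Unavailable }

public sealed class RpcException : Exception
{
    public RpcException(StatusCode statusCode) : base(statusCode.ToString()) => StatusCode = statusCode;
    public StatusCode StatusCode { get; }
}

public static class SharedBudgetSketch
{
    // DeadlineExceeded is deliberately absent: the deadline comes from the shared budget,
    // so hitting it means the budget is spent and a retry can only fail again.
    private static bool IsTransientStatus(StatusCode code) => code == StatusCode.Unavailable;

    public static async Task<T> ExecuteAsync<T>(
        Func<CancellationToken, Task<T>> attempt,
        TimeSpan defaultCallTimeout,
        int maxRetries,
        TimeSpan backoff,
        CancellationToken cancellationToken = default)
    {
        // One budget for the initial attempt, every retry, and every backoff delay.
        using var budget = CancellationTokenSource.CreateLinkedTokenSource(cancellationToken);
        budget.CancelAfter(defaultCallTimeout);

        for (var retry = 0; ; retry++)
        {
            try
            {
                return await attempt(budget.Token).ConfigureAwait(false);
            }
            catch (RpcException ex) when (IsTransientStatus(ex.StatusCode) && retry < maxRetries)
            {
                // The delay observes the same token, so backoff also spends the shared budget.
                await Task.Delay(backoff, budget.Token).ConfigureAwait(false);
            }
        }
    }
}
```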

Client.Dotnet-005

| Field | Value |
|---|---|
| Severity | Low |
| Category | Correctness & logic bugs |
| Location | clients/dotnet/MxGateway.Client/MxGatewaySession.cs:82,124,175 |
| Status | Resolved |

Description: RegisterAsync/AddItemAsync/AddItem2Async return reply.<Typed>?.ServerHandle ?? reply.ReturnValue.Int32Value. After EnsureMxAccessSuccess() passes, a missing typed payload silently falls back to ReturnValue.Int32Value, which for a reply carrying no return value is 0. A caller then uses 0 as a ServerHandle/ItemHandle, producing a confusing downstream invalid-handle failure rather than a clear "gateway reply missing payload" error.

Recommendation: If the typed sub-message is the contract for these commands, treat its absence on an otherwise-successful reply as an error (throw a descriptive MxGatewayException) rather than falling through to ReturnValue.Int32Value.

Resolution: (2026-05-18) Confirmed against source and mxaccess_gateway.proto: register/add_item/add_item2 are members of the MxCommandReply.payload oneof, so the typed accessor is null whenever the worker did not set that case — and the fallback returned ReturnValue.Int32Value (0 for a reply with no return value). The typed sub-message is the contract for these handle-returning commands, so its absence on an otherwise-successful reply is now an error: RegisterAsync/AddItemAsync/AddItem2Async throw via a new private MxGatewaySession.CreateMissingPayloadException helper that builds a descriptive MxGatewayException naming the missing payload, kind, session, and correlation id. Regression tests MxGatewayClientSessionTests.RegisterAsync_Throws_WhenSuccessfulReplyMissingPayload and AddItemAsync_Throws_WhenSuccessfulReplyMissingPayload enqueue an Ok reply with no typed payload and assert the descriptive throw; verified red against the original fallback (returned 0 instead of throwing).
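
The difference between the old fallback and the fixed behaviour fits in a few lines. The reply records below are hypothetical simplifications of the generated MxCommandReply, where an unset oneof case reads back as null. The helper only echoes the CreateMissingPayloadException name from the resolution; its parameters and message format are invented for illustration.

```csharp
#nullable enable
using System;

// Hypothetical, simplified reply shapes; the real MxCommandReply is protobuf-generated.
public sealed record RegisterReply(int ServerHandle);
public sealed record ReturnValue(int Int32Value);
public sealed record CommandReply(RegisterReply? Register, ReturnValue ReturnValue, string CorrelationId);

public sealed class MxGatewayException : Exception
{
    public MxGatewayException(string message) : base(message) { }
}

public static class HandleExtraction
{
    // Before: a missing typed payload silently became handle 0.
    public static int RegisterHandleWithFallback(CommandReply reply) =>
        reply.Register?.ServerHandle ?? reply.ReturnValue.Int32Value;

    // After: an Ok reply without the typed payload violates the contract, so fail loudly.
    public static int RequireRegisterServerHandle(CommandReply reply, string sessionId) =>
        reply.Register?.ServerHandle
        ?? throw CreateMissingPayloadException("register", sessionId, reply.CorrelationId);

    // Hypothetical helper modelled on the one described above.
    private static MxGatewayException CreateMissingPayloadException(string payload, string sessionId, string correlationId) =>
        new($"Gateway reply for session '{sessionId}' (correlation '{correlationId}') succeeded but is missing the '{payload}' payload.");
}
```

The same guard shape applies to the CLI call sites covered by Client.Dotnet-010.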

Client.Dotnet-006

| Field | Value |
|---|---|
| Severity | Low |
| Category | Code organization & conventions |
| Location | clients/dotnet/MxGateway.Client/MxGatewayClientOptions.cs:50, clients/dotnet/MxGateway.Client/MxGatewayClientContractInfo.cs:10-14 |
| Status | Resolved |

Description: MxGatewayClientOptions.MaxGrpcMessageBytes and the two consts in MxGatewayClientContractInfo are public members with no XML doc comments, inconsistent with every other public member in the assembly and with the repo's documented C# style emphasis on a documented public surface.

Recommendation: Add <summary> doc comments to MaxGrpcMessageBytes, GatewayProtocolVersion, and WorkerProtocolVersion.

Resolution: (2026-05-18) Confirmed: all three public members lacked XML docs while every other public member in the assembly is documented. Added <summary> comments to MxGatewayClientOptions.MaxGrpcMessageBytes (describing the 16 MiB default applied to both send and receive limits), and to MxGatewayClientContractInfo.GatewayProtocolVersion and WorkerProtocolVersion (describing their wire-compatibility / diagnostics purpose). Pure documentation change — no test needed; build remains warning-clean.

Client.Dotnet-007

| Field | Value |
|---|---|
| Severity | Low |
| Category | Documentation & comments |
| Location | clients/dotnet/MxGateway.Client/MxGatewayClient.cs:185-192 |
| Status | Resolved |

Description: The AcknowledgeAlarmAsync XML comment states the gateway authenticates against an invoke:alarm-ack scope, but CLAUDE.md documents the scope set without any invoke:alarm-ack sub-scope. The comment may describe an intended finer-grained scope that does not exist, misleading integrators about what API key they need.

Recommendation: Reconcile the comment with the actual server-side scope check, or update the scope documentation if sub-scopes were genuinely added; keep client doc and gateway auth model in sync.

Resolution: (2026-05-18) Confirmed against the server-side authorization model: GatewayGrpcScopeResolver.ResolveRequiredScope has no arm for AcknowledgeAlarmRequest, so it falls to the _ => GatewayScopes.Admin default — the RPC actually requires the admin scope. No invoke:alarm-ack sub-scope exists anywhere in GatewayScopes. The client XML comment on AcknowledgeAlarmAsync was wrong, not the docs. Corrected the comment to state the gateway authorizes AcknowledgeAlarmRequest against the API key's admin scope and that there is no finer-grained alarm-ack sub-scope. Pure documentation change — no test needed.

Client.Dotnet-008

| Field | Value |
|---|---|
| Severity | Low |
| Category | Correctness & logic bugs |
| Location | clients/dotnet/MxGateway.Client.Cli/MxGatewayCliSecretRedactor.cs:9-17 |
| Status | Resolved |

Description: The CLI redactor only removes the API key string when it was supplied via --api-key; RunCoreAsync passes arguments.GetOptional("api-key") to Redact. When the key comes from an environment variable (--api-key-env, the documented default path), apiKey is null and no redaction occurs. If a gRPC/transport error message ever echoes the bearer token, it would be printed unredacted.

Recommendation: Resolve the effective API key (same logic as ResolveApiKey) before redacting, so the env-var-sourced key is also stripped from error output.

Resolution: (2026-05-18) Confirmed against source: MxGatewayClientCli.RunCoreAsync's catch block redacted only arguments.GetOptional("api-key"), so an env-var-sourced key (--api-key-env, default MXGATEWAY_API_KEY) was never stripped. Note MxGatewayCliSecretRedactor itself is correct — the defect was the caller passing the wrong value. Extracted a non-throwing TryResolveApiKey helper (used by both the existing ResolveApiKey and the catch block) that resolves --api-key then the --api-key-env environment variable; the catch block now redacts that effective key. Updated clients/dotnet/README.md (smoke paragraph) to state the CLI redacts the effective key whether from --api-key or --api-key-env. Regression test MxGatewayClientCliTests.RunAsync_ErrorOutput_RedactsApiKey_WhenSourcedFromEnvironmentVariable sets a test env var, forces a transport error echoing the key, and asserts the key is absent and [redacted] is present; verified red against the original GetOptional("api-key")-only redaction (key printed unredacted).
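
The effective-key lookup can be sketched as follows. The dictionary of parsed options stands in for the CLI's real argument parser, and the method names only echo the resolution (TryResolveApiKey, Redact). The actual signatures are not given in this review. The [redacted] marker and the MXGATEWAY_API_KEY default come from the text above.

```csharp
using System;
using System.Collections.Generic;

public static class CliApiKeySketch
{
    // --api-key first, then the variable named by --api-key-env (default MXGATEWAY_API_KEY).
    // Returns false instead of throwing so the error path can call it safely.
    public static bool TryResolveApiKey(IReadOnlyDictionary<string, string> options, out string apiKey)
    {
        if (options.TryGetValue("api-key", out var explicitKey) && !string.IsNullOrEmpty(explicitKey))
        {
            apiKey = explicitKey;
            return true;
        }

        var envName = options.TryGetValue("api-key-env", out var name) ? name : "MXGATEWAY_API_KEY";
        var fromEnv = Environment.GetEnvironmentVariable(envName);
        apiKey = fromEnv ?? string.Empty;
        return !string.IsNullOrEmpty(fromEnv);
    }

    // Redacts the effective key, whichever source supplied it.
    public static string Redact(string message, IReadOnlyDictionary<string, string> options) =>
        TryResolveApiKey(options, out var key)
            ? message.Replace(key, "[redacted]", StringComparison.Ordinal)
            : message;
}
```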

Client.Dotnet-009

| Field | Value |
|---|---|
| Severity | Low |
| Category | Concurrency & thread safety |
| Location | clients/dotnet/MxGateway.Client/GalaxyRepositoryClient.cs:26,339-348,445-448 |
| Status | Resolved |

Description: Client.Dotnet-003 upgraded MxGatewayClient._disposed to an int accessed via Interlocked.Exchange / Volatile.Read so a concurrent ThrowIfDisposed cannot observe a stale value. The symmetric GalaxyRepositoryClient._disposed is still a plain unsynchronised bool: DisposeAsync reads if (_disposed) then writes _disposed = true without Interlocked or Volatile, and ThrowIfDisposed does an unsynchronised read. The Galaxy client is publicly IAsyncDisposable and exposes TestConnectionAsync / GetLastDeployTimeAsync / DiscoverHierarchyAsync / WatchDeployEventsAsync as legal-to-call-concurrently public APIs, so a concurrent dispose can produce the same torn-read race the gateway client fix prevented. The two clients also exhibit the same shape (gRPC channel + transport + retry pipeline), so the divergence is an accidental inconsistency.

Recommendation: Mirror Client.Dotnet-003 on GalaxyRepositoryClient: change _disposed to an int, use Interlocked.Exchange(ref _disposed, 1) != 0 in DisposeAsync, and Volatile.Read(ref _disposed) != 0 in ThrowIfDisposed. A duplicated MxGatewaySession-style close-lock drain is unnecessary because GalaxyRepositoryClient does not own a per-call SemaphoreSlim.

Resolution: 2026-05-20 — Changed GalaxyRepositoryClient._disposed from bool to int; DisposeAsync now uses Interlocked.Exchange(ref _disposed, 1) != 0 for the once-only guard and ThrowIfDisposed uses Volatile.Read(ref _disposed) != 0, mirroring the Client.Dotnet-003 fix on MxGatewayClient.
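
The once-only guard reduces to two calls. DisposeGuardSketch is hypothetical, and its release step is a counter standing in for disposing the gRPC channel and transport. Only the Interlocked.Exchange / Volatile.Read pattern mirrors the resolution.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public sealed class DisposeGuardSketch : IAsyncDisposable
{
    private int _disposed;     // 0 = live, 1 = disposed; an int so Interlocked can operate on it
    private int _releaseCount; // stand-in for releasing the channel and transport

    public int ReleaseCount => Volatile.Read(ref _releaseCount);

    public ValueTask DisposeAsync()
    {
        // Exactly one caller observes the 0 -> 1 transition, however many race here.
        if (Interlocked.Exchange(ref _disposed, 1) != 0)
        {
            return ValueTask.CompletedTask;
        }

        Interlocked.Increment(ref _releaseCount);
        return ValueTask.CompletedTask;
    }

    public void ThrowIfDisposed()
    {
        // Volatile.Read has acquire semantics, so the JIT cannot cache or hoist a stale value.
        if (Volatile.Read(ref _disposed) != 0)
        {
            throw new ObjectDisposedException(nameof(DisposeGuardSketch));
        }
    }
}
```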

Client.Dotnet-010

| Field | Value |
|---|---|
| Severity | Low |
| Category | Correctness & logic bugs |
| Location | clients/dotnet/MxGateway.Client.Cli/MxGatewayClientCli.cs:638,896,1261,1279 |
| Status | Resolved |

Description: Client.Dotnet-005 fixed the silent Register / AddItem / AddItem2 handle-fallback to reply.ReturnValue.Int32Value inside MxGatewaySession, but the same fallback pattern was left in the CLI and is now also present in two new bench commands shipped after that fix. BenchReadBulkAsync (line 638) and BenchStreamEventsAsync (line 896) both do int serverHandle = registerReply.Register?.ServerHandle ?? registerReply.ReturnValue.Int32Value; after a register call, and SmokeAsync (lines 1261 and 1279) passes reply => reply.Register?.ServerHandle ?? reply.ReturnValue.Int32Value and the equivalent AddItem?.ItemHandle selector to InvokeForHandleAsync. After EnsureProtocolSuccess + EnsureMxAccessSuccess pass but the worker did not set the typed register / add_item oneof case, all four call sites silently produce a zero handle and proceed to drive the rest of the smoke / bench against an invalid handle — exactly the failure mode the SDK-level fix prevents.

Recommendation: Either delegate to the SDK helpers (MxGatewaySession.RegisterAsync / AddItemAsync) which already throw the descriptive MxGatewayException via CreateMissingPayloadException, or replicate the same null-check explicitly in InvokeForHandleAsync and the two bench commands. A unit test that enqueues an Ok reply with no typed payload through FakeCliClient and asserts the smoke / bench commands fail loudly would prevent regression.

Resolution: 2026-05-20 — Added private CLI helpers RequireRegisterServerHandle and RequireAddItemItemHandle (with a shared CreateMissingPayloadException mirroring the SDK-level MxGatewaySession helper) that throw a descriptive MxGatewayException when the typed register / add_item payload is absent on an otherwise-successful reply. Replaced all four ?? reply.ReturnValue.Int32Value fallback sites — BenchReadBulkAsync (line 638), BenchStreamEventsAsync (line 896), and both SmokeAsync selectors (lines 1261, 1279) — with these helpers, so the CLI now fails loudly with the same shape as the SDK helpers rather than silently driving the rest of the command against a zero handle.

Client.Dotnet-011

| Field | Value |
|---|---|
| Severity | Low |
| Category | Concurrency & thread safety |
| Location | clients/dotnet/MxGateway.Client.Cli/MxGatewayClientCli.cs:857-858,922-963,1014-1015 |
| Status | Resolved |

Description: The new bench-stream-events command (added in commit 1cd51bb) supports --session-count > 1 and runs each session's StreamEvents reader in parallel via openedSessions.Select(RunStreamAsync).ToArray() then Task.WhenAll. Inside the per-session lambda the inner Task.Run-spawned event loop updates two shared DateTime? fields without synchronisation:

```csharp
if (firstSteadyEventUtc is null)
{
    firstSteadyEventUtc = nowUtc;
}
lastSteadyEventUtc = nowUtc;
```

The integer counters next to them (steadyEvents, steadyDataChangeEvents, warmupEvents) use Interlocked.Increment, and the latency list uses an explicit lock (latencyLock), so the rest of the loop is data-race-free, but these two DateTime? updates are not. With N parallel sessions the unsynchronised check-then-set on firstSteadyEventUtc can let one session overwrite a value another session has just written, producing a non-deterministic "first event time", and the final steadyElapsedSeconds = (lastSteadyEventUtc.Value - firstSteadyEventUtc.Value).TotalSeconds can compute a slightly wrong window. The user-visible impact is bench-only (skewed eventsPerSecond / dataChangeEventsPerSecond numbers), so this is Low. Note, however, that DateTime? is a 16-byte Nullable<DateTime>, so its reads and writes are not guaranteed to be atomic even on x64, and the pattern is inconsistent with the rest of the same loop.

Recommendation: Either guard the two DateTime? updates with the existing latencyLock (cheapest), use Interlocked.CompareExchange for firstSteadyEventUtc and Volatile.Write for lastSteadyEventUtc, or aggregate per-session in local variables and reduce after Task.WhenAll. The reduce-after approach also fixes a related issue: today a faster session can stomp firstSteadyEventUtc after a slower one already set it.

Resolution: 2026-05-20 — Guarded the firstSteadyEventUtc / lastSteadyEventUtc reads and writes inside the per-session event loop with the existing latencyLock. firstSteadyEventUtc now uses the null-coalescing assignment firstSteadyEventUtc ??= nowUtc; under the lock so a slower session can't stomp an earlier already-set value. The lock is already held by the latency-list append a few lines below, so the extra cost is one uncontended acquisition per event. The final read in the stats block runs after Task.WhenAll (happens-before applies) and stays lock-free.
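
A reduced model of the fixed loop follows. SteadyWindowSketch and its event counter are hypothetical, and Task.Yield stands in for awaiting the next streamed event. One choice here is specific to the sketch and not stated in the resolution: it reads the clock inside the lock, so the recorded window cannot invert when two sessions race.

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

public static class SteadyWindowSketch
{
    public static async Task<(DateTime First, DateTime Last, int Events)> RunAsync(int sessionCount, int eventsPerSession)
    {
        var latencyLock = new object();
        DateTime? firstSteadyEventUtc = null;
        DateTime? lastSteadyEventUtc = null;
        var steadyEvents = 0;

        async Task RunStreamAsync(int session)
        {
            for (var i = 0; i < eventsPerSession; i++)
            {
                await Task.Yield(); // stand-in for awaiting the next streamed event
                lock (latencyLock)
                {
                    var nowUtc = DateTime.UtcNow;
                    firstSteadyEventUtc ??= nowUtc; // first writer wins; later sessions cannot stomp it
                    lastSteadyEventUtc = nowUtc;
                    steadyEvents++;
                }
            }
        }

        await Task.WhenAll(Enumerable.Range(0, sessionCount).Select(RunStreamAsync));

        // Task.WhenAll orders these reads after every write, so no lock is needed here.
        return (firstSteadyEventUtc.Value, lastSteadyEventUtc.Value, steadyEvents);
    }
}
```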

Client.Dotnet-012

| Field | Value |
|---|---|
| Severity | Low |
| Category | Code organization & conventions |
| Location | clients/dotnet/MxGateway.Client/MxGateway.Client.csproj, clients/dotnet/MxGateway.Client.Cli/MxGateway.Client.Cli.csproj, clients/dotnet/MxGateway.Client.Tests/MxGateway.Client.Tests.csproj |
| Status | Resolved |

Description: src/Directory.Build.props enforces TreatWarningsAsErrors=true, EnforceCodeStyleInBuild=true, AnalysisLevel=latest, and Deterministic=true for every gateway / worker / contracts project, and CLAUDE.md calls this out as a baseline build property. The .NET client projects live under clients/dotnet/, and there is no Directory.Build.props at clients/ or clients/dotnet/, so none of those properties apply to MxGateway.Client, MxGateway.Client.Cli, or MxGateway.Client.Tests. New warnings in the client do not break the build, and code-style violations are not blocked at build time. The CSharpStyleGuide.md baseline ("Treat compiler warnings as actionable") and the CLAUDE.md table under "Source Update Workflow" both apply equally to the .NET client ("dotnet build clients/dotnet/MxGateway.Client.sln"), but the enforcement floor is missing.

Recommendation: Add clients/dotnet/Directory.Build.props carrying the same property set: TreatWarningsAsErrors=true, EnforceCodeStyleInBuild=true, AnalysisLevel=latest, Deterministic=true. (A broader clients/Directory.Build.props would not help, because the sibling non-.NET clients such as the Rust/Cargo one do not consume MSBuild properties.) Excluding generated code (which already lives under src/MxGateway.Contracts/Generated) is automatic because the client only references the contracts project. Build the client locally after adding the file to confirm that no warnings have already crept in.

Resolution: 2026-05-20 — Added clients/dotnet/Directory.Build.props mirroring src/Directory.Build.props: LangVersion=latest, Nullable=enable, ImplicitUsings=enable, TreatWarningsAsErrors=true, AnalysisLevel=latest, EnforceCodeStyleInBuild=true, Deterministic=true. The three client .csproj files inherit from it automatically. Re-ran dotnet build clients/dotnet/MxGateway.Client.sln and confirmed 0 warnings / 0 errors — no pre-existing warnings were silently being tolerated.

Client.Dotnet-013

| Field | Value |
|---|---|
| Severity | Low |
| Category | Code organization & conventions |
| Location | clients/dotnet/MxGateway.Client/DiscoverHierarchyOptions.cs:3-24, clients/dotnet/MxGateway.Client/GalaxyRepositoryClient.cs:185-187, clients/dotnet/MxGateway.Client.Cli/IMxGatewayCliClient.cs:6 |
| Status | Resolved |

Description: Client.Dotnet-006 fixed three undocumented public members. Three more remain undocumented in code paths the prior review didn't visit:

- DiscoverHierarchyOptions (the public record) has no <summary> on the type and no XML doc on any of its ten public properties (RootGobjectId, RootTagName, RootContainedPath, MaxDepth, CategoryIds, TemplateChainContains, TagNameGlob, IncludeAttributes, AlarmBearingOnly, HistorizedOnly).
- The second DiscoverHierarchyAsync(DiscoverHierarchyOptions, CancellationToken) overload on GalaxyRepositoryClient is public with no XML doc, while the parameterless overload one method above it carries a full <summary> / <param> block.
- IMxGatewayCliClient is a public interface in the CLI project with no <summary> on the type (the member docs are present).

This is the same convention-violation shape Client.Dotnet-006 closed; CLAUDE.md style guidance describes XML docs on the public surface as the baseline expectation.

Recommendation: Add <summary> docs to each undocumented member. For DiscoverHierarchyOptions, the property names map cleanly to the underlying DiscoverHierarchyRequest proto fields — a one-line summary per property and a type-level summary tying the record to the Galaxy hierarchy browse is enough. The CLI interface only needs a type-level summary; the members already document themselves.

Resolution: (2026-05-20) Added XML docs to all three locations: a type-level summary plus a one-line summary per property on DiscoverHierarchyOptions (ten properties, mapped to the underlying DiscoverHierarchyRequest proto fields and noting the root-precedence rule); a <summary>/<param>/<returns> block on the second DiscoverHierarchyAsync(DiscoverHierarchyOptions, CancellationToken) overload describing its filter semantics and transparent pagination; and a type-level <summary> on the public IMxGatewayCliClient interface explaining its CLI-only transport role and the production binding.

Client.Dotnet-014

| Field | Value |
|---|---|
| Severity | Low |
| Category | Testing coverage |
| Location | clients/dotnet/MxGateway.Client.Tests/MxGatewayClientAlarmsTests.cs:76-98, clients/dotnet/MxGateway.Client.Tests/FakeGatewayTransport.cs:212-231 |
| Status | Resolved |

Description: Client.Dotnet-002 closed a coverage gap where the production retry path (the RpcException-to-MxGatewayException mapping performed by RpcExceptionMapper.Map) was never exercised, by adding a MapTransportExceptions flag to FakeGatewayTransport and a regression test that runs through the wrapped-exception branch. That flag is wired through Translate(...) in OpenSessionAsync / CloseSessionAsync / InvokeAsync, but the new alarm test path is not: FakeGatewayTransport.AcknowledgeAlarmAsync throws the queued exception verbatim (line 219), bypassing Translate. The accompanying MxGatewayClientAlarmsTests.AcknowledgeAlarmAsync_MapsUnauthenticated_RpcException_ToTypedException test acknowledges this in a comment ("Note: the FakeGatewayTransport surfaces RpcException directly … the SDK-level test pins the pass-through shape so a future migration to direct mapping won't silently change observable behaviour") and asserts Assert.ThrowsAsync<RpcException>. The production path through GrpcMxGatewayClientTransport.AcknowledgeAlarmAsync (lines 120-134), however, already calls RpcExceptionMapper.Map, so production callers see MxGatewayAuthenticationException, not RpcException. The test name therefore advertises a mapping that the SDK-level harness does not exercise, and a regression in the alarm-ack mapping reached through MxGatewayClient.AcknowledgeAlarmAsync could go unnoticed by the SDK-level tests.

Recommendation: Either route FakeGatewayTransport.AcknowledgeAlarmAsync through the same Translate helper the other RPCs use and add a regression test that enables MapTransportExceptions = true and asserts MxGatewayAuthenticationException; or rename the existing test to make the pass-through shape explicit (e.g. …_SurfacesRpcExceptionFromFakeTransportVerbatim) and add a second test exercising the production mapping. Either fix closes the alarm-side equivalent of the gap Client.Dotnet-002 closed for Invoke.

Resolution: 2026-05-20 — Applied both halves of the recommendation. Routed FakeGatewayTransport.AcknowledgeAlarmAsync through the same Translate helper the other RPCs use, so when MapTransportExceptions = true thrown RpcExceptions now run through the production RpcExceptionMapper.Map. Renamed the existing pass-through test to AcknowledgeAlarmAsync_SurfacesRpcExceptionFromFakeTransportVerbatim_WhenMappingDisabled (with an updated comment pinning that this shape only applies when mapping is off), and added a new test AcknowledgeAlarmAsync_MapsUnauthenticated_RpcException_ToTypedException that enables mapping and asserts the production-parity MxGatewayAuthenticationException with StatusCode.Unauthenticated. Closes the alarm-side equivalent of the gap Client.Dotnet-002 closed for Invoke.
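
The routing change in the fake can be sketched like this. FakeTransportSketch, its failure queue, and the inline mapper are simplified stand-ins, and RpcException and StatusCode are local substitutes for the Grpc.Core types. Only the MapTransportExceptions flag and the Translate step mirror the resolution. With the flag off, the alarm call surfaces the queued RpcException verbatim. With it on, the caller sees the typed exception that production would throw.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Simplified stand-ins for the Grpc.Core types and the SDK exception hierarchy.
public enum StatusCode { Unavailable, Unauthenticated }

public sealed class RpcException : Exception
{
    public RpcException(StatusCode statusCode, string detail) : base(detail) => StatusCode = statusCode;
    public StatusCode StatusCode { get; }
}

public class MxGatewayException : Exception
{
    public MxGatewayException(string message, StatusCode? statusCode, Exception inner)
        : base(message, inner) => StatusCode = statusCode;
    public StatusCode? StatusCode { get; }
}

public sealed class MxGatewayAuthenticationException : MxGatewayException
{
    public MxGatewayAuthenticationException(string message, RpcException inner)
        : base(message, inner.StatusCode, inner) { }
}

// Hypothetical fake transport: every RPC, the alarm call included, fails through Translate.
public sealed class FakeTransportSketch
{
    private readonly Queue<Exception> _failures = new();

    public bool MapTransportExceptions { get; set; }

    public void EnqueueFailure(Exception exception) => _failures.Enqueue(exception);

    public Task InvokeAsync() => NextResult();

    public Task AcknowledgeAlarmAsync() => NextResult(); // previously bypassed Translate

    private Task NextResult() =>
        _failures.TryDequeue(out var failure) ? Task.FromException(Translate(failure)) : Task.CompletedTask;

    // Production parity when mapping is enabled; verbatim pass-through otherwise.
    private Exception Translate(Exception exception) =>
        MapTransportExceptions && exception is RpcException rpc ? Map(rpc) : exception;

    // Stand-in for the shared RpcExceptionMapper.Map used by the production transport.
    private static MxGatewayException Map(RpcException rpc) =>
        rpc.StatusCode == StatusCode.Unauthenticated
            ? new MxGatewayAuthenticationException("Gateway rejected the API key.", rpc)
            : new MxGatewayException($"Gateway call failed: {rpc.Message}", rpc.StatusCode, rpc);
}
```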