Client.Go-004: ran gofmt on alarms_test.go and galaxy_test.go; the tree is now gofmt-clean.

Client.Go-005/009/010: migrated Dial/DialGalaxy off the deprecated grpc.DialContext/WithBlock to grpc.NewClient via a shared dial helper, with a DialTimeout-bounded readiness probe to keep fail-fast semantics; shared callContext deadline arithmetic; updated the stale Dial doc comment. Test harnesses use passthrough:///bufnet for the NewClient default-scheme change.

Client.Go-006: added GatewayError.Code() and an IsTransient(err) helper so callers can classify transient gRPC failures.

Client.Go-007: newCorrelationID no longer returns an empty id when crypto/rand fails; it falls back to a non-empty time+counter id.

Client.Go-008: added coverage_test.go for transport-credential resolution, callContext deadline arithmetic, and native value/array edge kinds.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Code Review — Client.Go
| Field | Value |
|---|---|
| Module | clients/go |
| Reviewer | Claude Code |
| Review date | 2026-05-18 |
| Commit reviewed | 3cc53a8 |
| Status | Reviewed |
| Open findings | 0 |
Checklist coverage
| # | Category | Result |
|---|---|---|
| 1 | Correctness & logic bugs | Issues found: a typed-nil Unwrap/errors.As trap (Client.Go-001), a CLI panic on malformed input (Client.Go-003), empty-string correlation id on rand failure (Client.Go-007). |
| 2 | mxaccessgw conventions | Generally good; two test files fail gofmt, breaking the documented workflow (Client.Go-004). |
| 3 | Concurrency & thread safety | No issues found — stream goroutines and cancellation are sound. |
| 4 | Error handling & resilience | Issues found: the compatibility event path silently drops events (Client.Go-002); no transient/permanent classification (Client.Go-006). |
| 5 | Security | No issues found — TLS by default with a TLS 1.2 floor, API key redaction, no secret logging. |
| 6 | Performance & resource management | No issues found — connections/streams closed via deferred Close/cancel. |
| 7 | Design-document adherence | Issues found: deprecated grpc.DialContext+WithBlock usage and a missing error taxonomy (Client.Go-005, Client.Go-006). |
| 8 | Code organization & conventions | Issue found: duplication between Client and GalaxyClient (Client.Go-009). |
| 9 | Testing coverage | Issue found: TLS path, callContext deadline logic, and NativeValue/NativeArray edges untested (Client.Go-008). |
| 10 | Documentation & comments | Issue found: a stale WithBlock dial-cancellation claim (Client.Go-010). |
Findings
Client.Go-001
| Field | Value |
|---|---|
| Severity | High |
| Category | Correctness & logic bugs |
| Location | clients/go/mxgateway/errors.go:88-93, clients/go/mxgateway/errors.go:117-128 |
| Status | Resolved |
Description: MxAccessError.Unwrap returns e.Command directly. EnsureMxAccessSuccess constructs &MxAccessError{Reply: reply} with Command left nil (the HRESULT / failing-MxStatusProxy path). When Command is a nil *CommandError, Unwrap() returns a non-nil error interface wrapping a nil pointer. Consequently errors.As(err, &ce) for *CommandError returns true while setting ce to nil — a caller writing the idiomatic if errors.As(err, &commandErr) { use commandErr.Status } nil-dereferences and panics. Verified empirically; the existing test only exercises the populated-Command path.
Recommendation: Make Unwrap return an untyped nil when Command is nil: if e == nil || e.Command == nil { return nil }; return e.Command. Add a test for the HRESULT-only MxAccessError asserting errors.As(err, &ce) is false.
Resolution: Resolved 2026-05-18: MxAccessError.Unwrap now returns an untyped nil when Command is nil, so errors.As no longer binds a typed-nil *CommandError; added errors_test.go regression coverage for the HRESULT-only and populated-Command paths.
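The typed-nil trap and its fix can be reproduced in isolation. The following is a minimal sketch, assuming simplified stand-ins for CommandError and MxAccessError; the field names and the asCommandError helper are illustrative and not copied from errors.go.

```go
package main

import (
	"errors"
	"fmt"
)

// CommandError is a simplified stand-in for the client's command-level error.
type CommandError struct {
	Status string
}

func (e *CommandError) Error() string { return "command failed: " + e.Status }

// MxAccessError optionally wraps a *CommandError. Command stays nil on the
// HRESULT-only path.
type MxAccessError struct {
	Command *CommandError
}

func (e *MxAccessError) Error() string { return "mxaccess failure" }

// Unwrap returns an untyped nil when there is no command error. Returning
// e.Command directly would box a nil *CommandError inside a non-nil error
// interface, and errors.As would then match it and hand back a nil pointer.
func (e *MxAccessError) Unwrap() error {
	if e == nil || e.Command == nil {
		return nil
	}
	return e.Command
}

// asCommandError (hypothetical helper) reports whether err's chain holds a
// usable *CommandError.
func asCommandError(err error) (*CommandError, bool) {
	var ce *CommandError
	ok := errors.As(err, &ce)
	return ce, ok
}

func main() {
	_, ok := asCommandError(&MxAccessError{})
	fmt.Println("HRESULT-only matches *CommandError:", ok)

	ce, ok := asCommandError(&MxAccessError{Command: &CommandError{Status: "Rejected"}})
	fmt.Println("populated matches *CommandError:", ok, ce.Status)
}
```

With the guard in place, a caller can dereference the target of a successful errors.As without an extra nil check.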
Client.Go-002
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Error handling & resilience |
| Location | clients/go/mxgateway/session.go:440-516 |
| Status | Resolved |
Description: For the Events/EventsAfter compatibility API (cancelWhenResultBufferFull == true), when the 16-slot results channel is full, sendEventResult cancels the stream and returns false; the goroutine then returns and close(results) runs, so the consumer sees the channel close without any EventResult{Err: ...} being delivered. A slow consumer cannot distinguish "stream ended normally" from "events were silently dropped." This contradicts the design doc's "libraries should not reorder, coalesce, or drop events by default", and a test currently pins this lossy behaviour.
Recommendation: Before cancelling on a full buffer, deliver a terminal EventResult carrying an explicit error (e.g. ErrEventBufferOverflow). Document the behaviour on Session.Events; steer callers to SubscribeEvents (which blocks instead of dropping).
Resolution: Resolved 2026-05-18: confirmed against source: on a full bounded buffer the compatibility path cancelled and closed results with no terminal result. Added the exported sentinel ErrEventBufferOverflow (errors.go); sendEventResult now, on a full buffer, cancels the stream and then calls the new deliverTerminalResult helper, which evicts the oldest buffered event to make room and places EventResult{Err: ErrEventBufferOverflow} so it becomes the consumer's last item before the channel closes. The previously lossy regression test (TestEventsAfterCancelsStreamWhenCompatibilityChannelIsAbandoned) was re-pointed to assert the terminal ErrEventBufferOverflow result is delivered. clients/go/README.md now documents the bounded-buffer/overflow behaviour and steers no-loss callers to SubscribeEvents.
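The evict-then-deliver step can be illustrated with plain channels. This is a sketch under stated assumptions, not the code in session.go: EventResult is reduced to a sequence number, a single producer is assumed, the channel must be buffered, and runOverflowDemo is a hypothetical driver that never reads until the producer has stopped.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// ErrEventBufferOverflow marks the terminal result queued when the consumer
// fell behind and the bounded buffer filled up.
var ErrEventBufferOverflow = errors.New("event buffer overflow")

// EventResult is a simplified stand-in; Seq replaces the real event payload.
type EventResult struct {
	Seq int
	Err error
}

// sendEventResult attempts a non-blocking send. On a full buffer it cancels
// the stream, queues a terminal overflow result, and tells the producer to stop.
func sendEventResult(results chan EventResult, cancel context.CancelFunc, r EventResult) bool {
	select {
	case results <- r:
		return true
	default:
		cancel()
		deliverTerminalResult(results, EventResult{Err: ErrEventBufferOverflow})
		return false
	}
}

// deliverTerminalResult drops the oldest buffered item until the terminal
// result fits. It assumes one producer and a channel with capacity >= 1.
func deliverTerminalResult(results chan EventResult, terminal EventResult) {
	for {
		select {
		case results <- terminal:
			return
		default:
		}
		select {
		case <-results: // evict the oldest buffered event to make room
		default: // a consumer drained the buffer meanwhile; retry the send
		}
	}
}

// runOverflowDemo (hypothetical) produces n events into a buffer of the given
// capacity with no concurrent consumer, then drains what is left.
func runOverflowDemo(capacity, n int) []EventResult {
	_, cancel := context.WithCancel(context.Background())
	defer cancel()
	results := make(chan EventResult, capacity)
	for seq := 1; seq <= n; seq++ {
		if !sendEventResult(results, cancel, EventResult{Seq: seq}) {
			break
		}
	}
	close(results)
	var got []EventResult
	for r := range results {
		got = append(got, r)
	}
	return got
}

func main() {
	for _, r := range runOverflowDemo(4, 10) {
		fmt.Printf("seq=%d err=%v\n", r.Seq, r.Err)
	}
}
```

With a capacity of 4 and 10 events, the drained slice holds events 2 through 4 followed by the overflow result, so the consumer's last item is always the explicit error rather than a silent close.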
Client.Go-003
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Correctness & logic bugs |
| Location | clients/go/cmd/mxgw-go/main.go:517-532 |
| Status | Resolved |
Description: parseInt32List calls panic(err) when an item-handles token fails to parse as an int32. The CLI is a documented user-facing tool; a typo like -item-handles 1,foo crashes the process with an unrecovered panic and stack trace instead of returning a clean error and exit code 2 like every other validation path in main.go.
Recommendation: Change parseInt32List to return ([]int32, error) and have runUnsubscribeBulk propagate the error, matching parseValue's pattern.
Resolution: Resolved 2026-05-18: confirmed against source — parseInt32List called panic(err) on a malformed token. It now returns ([]int32, error), wrapping the bad token (invalid item handle %q: %w); runUnsubscribeBulk parses item handles before dialing and returns the error, so a typo flows through runWithIO to os.Exit(2) like other validation paths. Regression tests TestParseInt32ListParsesValidTokens and TestParseInt32ListReturnsErrorOnMalformedToken added to cmd/mxgw-go/main_test.go.
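A sketch of the error-returning shape described above, using the error format quoted in the resolution. Trimming whitespace and skipping empty tokens are assumptions made for this example; the actual parser in main.go may treat them differently.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseInt32List parses a comma-separated list of item handles. A malformed
// token yields a wrapped error instead of a panic, so the CLI can print the
// message and exit with its usual validation exit code.
func parseInt32List(raw string) ([]int32, error) {
	var out []int32
	for _, tok := range strings.Split(raw, ",") {
		tok = strings.TrimSpace(tok)
		if tok == "" {
			continue // assumption: tolerate stray commas
		}
		v, err := strconv.ParseInt(tok, 10, 32) // bitSize 32 rejects out-of-range values
		if err != nil {
			return nil, fmt.Errorf("invalid item handle %q: %w", tok, err)
		}
		out = append(out, int32(v))
	}
	return out, nil
}

func main() {
	handles, err := parseInt32List("1, 2,3")
	fmt.Println(handles, err)

	_, err = parseInt32List("1,foo")
	fmt.Println(err)
}
```

Because the underlying strconv error is wrapped with %w, callers can still inspect it with errors.Is or errors.As if they need to distinguish syntax errors from range errors.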
Client.Go-004
| Field | Value |
|---|---|
| Severity | Low |
| Category | mxaccessgw conventions |
| Location | clients/go/mxgateway/alarms_test.go:153-154, clients/go/mxgateway/galaxy_test.go:58-59 |
| Status | Resolved |
Description: gofmt -l flags alarms_test.go and galaxy_test.go for misaligned struct-literal field padding. The Go client README lists gofmt as part of the workflow and the repo enforces style; unformatted committed code breaks gofmt-gated checks and CI.
Recommendation: Run gofmt -w mxgateway/alarms_test.go mxgateway/galaxy_test.go.
Resolution: Resolved 2026-05-18: confirmed gofmt -l . flagged both files for misaligned struct-literal padding. Ran gofmt -w on mxgateway/alarms_test.go and mxgateway/galaxy_test.go; gofmt -l . is now clean for the whole module.
Client.Go-005
| Field | Value |
|---|---|
| Severity | Low |
| Category | Design-document adherence |
| Location | clients/go/mxgateway/client.go:64,68, clients/go/mxgateway/galaxy.go:83,87 |
| Status | Resolved |
Description: The client uses grpc.DialContext with grpc.WithBlock(). In current grpc-go both are deprecated in favour of grpc.NewClient (lazy connection). WithBlock also changes failure semantics: a transient gateway-unavailable condition at dial time becomes a hard Dial error rather than a connection that recovers when the gateway comes up, working against the design doc's resilience intent.
Recommendation: Migrate to grpc.NewClient; if a fail-fast connect probe is still wanted, do an explicit readiness wait bounded by DialTimeout, and update the doc comment.
Resolution: Resolved 2026-05-18: confirmed Dial/DialGalaxy used the deprecated grpc.DialContext + grpc.WithBlock pair. Migrated both to the shared dial(ctx, opts) helper, which now builds a lazy connection with grpc.NewClient and runs an explicit waitForReady readiness probe (Connect + WaitForStateChange until connectivity.Ready) bounded by DialTimeout — preserving fail-fast behavior while letting an otherwise lazy connection recover when the gateway is briefly down. Note: grpc.NewClient defaults the target scheme to dns, so the bufconn test harnesses (client_session_test.go, alarms_test.go, galaxy_test.go) were updated to use passthrough:///bufnet so the fake target reaches the context dialer. New tests TestDialFailsFastWhenGatewayUnreachable and TestDialReadinessProbeReachesReady cover the probe; go vet reports no deprecation. clients/go/README.md documents the lazy-connect + readiness-probe semantics.
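The readiness-probe loop can be sketched without a live gateway. A real *grpc.ClientConn exposes Connect, GetState, and WaitForStateChange; to keep the example dependency-free, those three methods are modelled behind a local interface here. The State type, stateConn, scriptedConn, probe, and demoTimeout are all stand-ins invented for this sketch and are not part of grpc-go or the client.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// State mirrors a subset of grpc's connectivity states (stand-in type).
type State int

const (
	Idle State = iota
	Connecting
	Ready
	TransientFailure
)

// stateConn lists the connection methods the probe relies on.
type stateConn interface {
	Connect()
	GetState() State
	WaitForStateChange(ctx context.Context, source State) bool
}

// waitForReady kicks the lazy connection out of idle and waits until it is
// Ready, bounded by dialTimeout or the caller's deadline, whichever is sooner.
func waitForReady(ctx context.Context, c stateConn, dialTimeout time.Duration) error {
	ctx, cancel := context.WithTimeout(ctx, dialTimeout)
	defer cancel()

	c.Connect()
	for {
		s := c.GetState()
		if s == Ready {
			return nil
		}
		// WaitForStateChange returns false once ctx expires or is cancelled.
		if !c.WaitForStateChange(ctx, s) {
			return fmt.Errorf("gateway not ready (last state %d): %w", s, ctx.Err())
		}
	}
}

// scriptedConn is a fake that walks through a fixed, non-empty list of states.
type scriptedConn struct {
	states []State
	pos    int
}

func (c *scriptedConn) Connect()        {}
func (c *scriptedConn) GetState() State { return c.states[c.pos] }
func (c *scriptedConn) WaitForStateChange(ctx context.Context, _ State) bool {
	if c.pos+1 < len(c.states) {
		c.pos++
		return true
	}
	<-ctx.Done() // no further transitions: block until the probe gives up
	return false
}

const demoTimeout = 50 * time.Millisecond

// probe runs the readiness check against a scripted sequence of states.
func probe(states ...State) error {
	return waitForReady(context.Background(), &scriptedConn{states: states}, demoTimeout)
}

func main() {
	fmt.Println("reachable:", probe(Idle, Connecting, Ready))
	fmt.Println("unreachable:", probe(Idle, Connecting, TransientFailure))
}
```

The design point is that the timeout bounds only the initial probe; the underlying connection stays lazy, so a caller who chooses to ignore a probe failure would still get reconnect behaviour from the channel itself.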
Client.Go-006
| Field | Value |
|---|---|
| Severity | Low |
| Category | Error handling & resilience |
| Location | clients/go/mxgateway/errors.go:9-130 |
| Status | Resolved |
Description: docs/ClientLibrariesDesign.md recommends a high-level error taxonomy (TransportError, AuthenticationError, TimeoutError, etc.). The Go client collapses all transport/gRPC failures into a single GatewayError with no way to classify transient (Unavailable, DeadlineExceeded) vs permanent (Unauthenticated, InvalidArgument) without manually unwrapping and calling status.Code.
Recommendation: Add a helper (e.g. IsTransient(err) bool) or expose the gRPC codes.Code on GatewayError, so retry/timeout/auth handling can be written without re-parsing the wrapped error.
Resolution: Resolved 2026-05-18: implemented the recommended classification surface in errors.go rather than a full parallel type hierarchy (the existing GatewayError/CommandError/MxAccessError chain already separates transport from protocol from MXAccess failures). Added GatewayError.Code() (returns the wrapped gRPC codes.Code, OK for nil, Unknown for a non-status error) and the free function IsTransient(err error) bool, which unwraps through *GatewayError and any gRPC-status chain and reports true for Unavailable, DeadlineExceeded, ResourceExhausted, and Aborted. Tests TestGatewayErrorCode and TestIsTransient cover the matrix; clients/go/README.md documents both for retry/timeout/auth handling.
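The classification surface can be sketched as follows. To stay dependency-free, the example defines a local Code type and statusError in place of google.golang.org/grpc/codes and the gRPC status error; both are stand-ins, as are the GatewayError fields and the newGatewayError helper. Only the set of transient codes is taken from the resolution above.

```go
package main

import (
	"errors"
	"fmt"
)

// Code is a local stand-in for the gRPC status code type.
type Code int

const (
	OK                Code = 0
	Unknown           Code = 2
	InvalidArgument   Code = 3
	DeadlineExceeded  Code = 4
	ResourceExhausted Code = 8
	Aborted           Code = 10
	Unavailable       Code = 14
	Unauthenticated   Code = 16
)

// statusError is a stand-in for an error carrying a gRPC status.
type statusError struct {
	code Code
	msg  string
}

func (e *statusError) Error() string { return fmt.Sprintf("rpc error: code=%d desc=%s", e.code, e.msg) }

// GatewayError wraps a transport failure for a named operation.
type GatewayError struct {
	Op  string
	Err error
}

func (e *GatewayError) Error() string { return e.Op + ": " + e.Err.Error() }
func (e *GatewayError) Unwrap() error { return e.Err }

// Code returns the wrapped status code: OK for nil, Unknown for a non-status error.
func (e *GatewayError) Code() Code {
	if e == nil || e.Err == nil {
		return OK
	}
	var se *statusError
	if errors.As(e.Err, &se) {
		return se.code
	}
	return Unknown
}

// IsTransient reports whether err's chain carries a status worth retrying.
func IsTransient(err error) bool {
	var se *statusError
	if !errors.As(err, &se) {
		return false
	}
	switch se.code {
	case Unavailable, DeadlineExceeded, ResourceExhausted, Aborted:
		return true
	}
	return false
}

// newGatewayError (hypothetical) builds a wrapped status error for the demo.
func newGatewayError(op string, code Code) error {
	return &GatewayError{Op: op, Err: &statusError{code: code, msg: "demo"}}
}

func main() {
	for _, c := range []Code{Unavailable, DeadlineExceeded, Unauthenticated, InvalidArgument} {
		fmt.Printf("code=%d transient=%v\n", c, IsTransient(newGatewayError("Invoke", c)))
	}
}
```

A retry loop can then branch on IsTransient(err) alone, and auth handling can switch on the value returned by Code() without re-parsing the wrapped error.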
Client.Go-007
| Field | Value |
|---|---|
| Severity | Low |
| Category | Correctness & logic bugs |
| Location | clients/go/mxgateway/session.go:526-532 |
| Status | Resolved |
Description: newCorrelationID returns an empty string when crypto/rand.Read fails, silently producing an MxCommandRequest with no correlation id. rand.Read failure is rare, but the failure mode (untraceable command, no error surfaced) is worse than failing loudly, and the empty-id path is untested.
Recommendation: Either propagate the error up through invokeCommand, or fall back to a time/counter-based id rather than an empty string.
Resolution: Resolved 2026-05-18: confirmed newCorrelationID returned "" on a rand.Read failure. It now falls back to a non-empty "fallback-<unixnano>-<counter>" id built from time.Now().UnixNano() and a process-wide atomic.Uint64 monotonic counter, so every command stays traceable even without entropy. The crypto/rand call was routed through a randRead package variable so the failure path is testable; TestNewCorrelationIDFallsBackOnRandFailure simulates a rand.Read failure and asserts the fallback id is non-empty, fallback- prefixed, and unique, and TestNewCorrelationIDUsesRandEntropy pins the happy path.
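A sketch of the fallback path, following the "fallback-<unixnano>-<counter>" shape and the randRead seam described above. The 16-byte length and hex encoding of the happy-path id are assumptions, and withFailingRand is a hypothetical test helper.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"errors"
	"fmt"
	"sync/atomic"
	"time"
)

// randRead is a package-level seam so tests can simulate an entropy failure.
var randRead = rand.Read

// fallbackCounter keeps fallback ids unique even within one clock tick.
var fallbackCounter atomic.Uint64

// newCorrelationID returns a random hex id, or a non-empty time+counter id
// when the random source fails, so every command stays traceable.
func newCorrelationID() string {
	var buf [16]byte // assumption: 16 random bytes, hex encoded
	if _, err := randRead(buf[:]); err != nil {
		return fmt.Sprintf("fallback-%d-%d", time.Now().UnixNano(), fallbackCounter.Add(1))
	}
	return hex.EncodeToString(buf[:])
}

// withFailingRand (hypothetical) runs fn with randRead forced to fail, then
// restores the original source.
func withFailingRand(fn func()) {
	orig := randRead
	randRead = func([]byte) (int, error) { return 0, errors.New("entropy unavailable") }
	defer func() { randRead = orig }()
	fn()
}

func main() {
	fmt.Println("normal:  ", newCorrelationID())
	withFailingRand(func() {
		fmt.Println("fallback:", newCorrelationID())
		fmt.Println("fallback:", newCorrelationID())
	})
}
```

The atomic counter is what guarantees uniqueness here; the timestamp alone could repeat when two commands are issued within the clock's resolution.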
Client.Go-008
| Field | Value |
|---|---|
| Severity | Low |
| Category | Testing coverage |
| Location | clients/go/mxgateway/ (test files) |
| Status | Resolved |
Description: Several critical paths are untested: TLS credential resolution in resolveTransportCredentials (only the Plaintext path is exercised); the callContext deadline-shortening logic (client.go:198-204) including the negative-timeout disable case; and NativeValue/NativeArray for the array, raw-bytes, null, and unsupported-kind branches.
Recommendation: Add unit tests for resolveTransportCredentials precedence, callContext deadline arithmetic, and NativeValue/NativeArray round-trips for every kind.
Resolution: Resolved 2026-05-18: added clients/go/mxgateway/coverage_test.go. TestResolveTransportCredentialsPrecedence exercises every branch (explicit TransportCredentials, Plaintext, missing CACertFile error, TLSConfig + ServerNameOverride, default TLS floor) and TestResolveTransportCredentialsDoesNotMutateTLSConfig confirms the supplied *tls.Config is cloned. TestCallContextDeadlineArithmetic covers zero/default, negative-disable, positive timeout, caller-deadline-sooner-kept, and caller-deadline-later-shortened. TestNativeValueEdgeKinds, TestNativeArrayEdgeKinds, and TestNativeValueUnsupportedKind cover the null, raw-bytes (including the no-alias copy), array, timestamp-with-nil, and unsupported-kind branches.
Client.Go-009
| Field | Value |
|---|---|
| Severity | Low |
| Category | Code organization & conventions |
| Location | clients/go/mxgateway/galaxy.go:60-93,241-256, clients/go/mxgateway/client.go:41-74,190-205 |
| Status | Resolved |
Description: DialGalaxy/Dial and GalaxyClient.callContext/Client.callContext are near-identical duplicates (dial-context setup, credential resolution, dial-option assembly, deadline arithmetic). A fix to one (e.g. the Client.Go-005 dial migration) must be applied twice and can drift.
Recommendation: Extract a shared unexported dial(ctx, opts) and a free callContext(opts, ctx) function, and have both client constructors call them.
Resolution: Resolved 2026-05-18: extracted the shared unexported dial(ctx, opts) (*grpc.ClientConn, error) (credential resolution, dial-option assembly, grpc.NewClient, readiness probe) and the free callContext(ctx, callTimeout) (context.Context, context.CancelFunc) into client.go. Dial/DialGalaxy and both (*Client).callContext/(*GalaxyClient).callContext methods now delegate to them; the duplicated dial and deadline code in galaxy.go was removed (its now-unused errors import dropped). This was done together with the Client.Go-005 migration so the grpc.NewClient change lives in exactly one place.
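The shared deadline arithmetic might look like the sketch below, which follows the callContext(ctx, callTimeout) signature given above and the cases listed under Client.Go-008. The 30-second default is a placeholder chosen for this example, and remainingBudget is a hypothetical helper used only to observe the result.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// defaultCallTimeout is a placeholder; the client's real default is not
// stated in this review.
const defaultCallTimeout = 30 * time.Second

// callContext derives the per-call context:
//   - negative timeout: deadline handling disabled, only cancellation is added
//   - zero timeout:     the default applies
//   - caller deadline sooner than now+timeout: the caller's deadline is kept
//   - otherwise:        the deadline is shortened to now+timeout
func callContext(ctx context.Context, callTimeout time.Duration) (context.Context, context.CancelFunc) {
	if callTimeout < 0 {
		return context.WithCancel(ctx)
	}
	if callTimeout == 0 {
		callTimeout = defaultCallTimeout
	}
	deadline := time.Now().Add(callTimeout)
	if existing, ok := ctx.Deadline(); ok && existing.Before(deadline) {
		return context.WithCancel(ctx)
	}
	return context.WithDeadline(ctx, deadline)
}

// remainingBudget (hypothetical) reports the time left on the derived context.
// A callerTimeout <= 0 means the caller supplied no deadline.
func remainingBudget(callerTimeout, callTimeout time.Duration) (time.Duration, bool) {
	parent := context.Background()
	if callerTimeout > 0 {
		var parentCancel context.CancelFunc
		parent, parentCancel = context.WithTimeout(parent, callerTimeout)
		defer parentCancel()
	}
	ctx, cancel := callContext(parent, callTimeout)
	defer cancel()
	deadline, ok := ctx.Deadline()
	if !ok {
		return 0, false
	}
	return time.Until(deadline), true
}

func main() {
	_, ok := remainingBudget(0, -1)
	fmt.Println("negative timeout sets a deadline:", ok)

	left, _ := remainingBudget(10*time.Second, 2*time.Second)
	fmt.Println("caller deadline shortened to at most 2s:", left <= 2*time.Second)
}
```

context.WithDeadline already keeps an earlier parent deadline on its own, so the explicit comparison is not strictly required; it is spelled out here to make the "caller deadline sooner" case visible.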
Client.Go-010
| Field | Value |
|---|---|
| Severity | Low |
| Category | Documentation & comments |
| Location | clients/go/mxgateway/client.go:39-40 |
| Status | Resolved |
Description: The Dial doc comment states it configures "blocking dial cancellation from ctx." This describes the deprecated WithBlock behaviour; once Client.Go-005 is addressed the comment is misleading about how connection establishment and cancellation work.
Recommendation: Reword to describe the actual connect/timeout semantics after resolving Client.Go-005, and clarify that DialTimeout bounds the initial connect attempt.
Resolution: Resolved 2026-05-18: alongside the Client.Go-005 migration, the Dial doc comment was rewritten to describe the lazy grpc.NewClient connection, the DialTimeout-bounded (default 10s, or ctx deadline when sooner) readiness probe, that a briefly-unavailable gateway recovers instead of producing a hard error, and that cancelling ctx aborts the probe. DialGalaxy and the new dial/waitForReady/callContext helpers carry matching doc comments.