mxaccessgw/code-reviews/Client.Go/findings.md
Joseph Doherty e967e85973 Resolve Client.Go-001 code-review finding
MxAccessError.Unwrap returned e.Command directly; on the HRESULT-only path
Command is a nil *CommandError, so Unwrap returned a non-nil error wrapping
a typed nil and errors.As bound a nil *CommandError. Unwrap now returns an
untyped nil when Command is nil. Added errors_test.go regression coverage
for the HRESULT-only and populated-Command paths.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 20:46:12 -04:00

Code Review — Client.Go

Field Value
Module clients/go
Reviewer Claude Code
Review date 2026-05-18
Commit reviewed 3cc53a8
Status Reviewed
Open findings 9

Checklist coverage

| # | Category | Result |
|---|----------|--------|
| 1 | Correctness & logic bugs | Issues found: a typed-nil Unwrap/errors.As trap (Client.Go-001), a CLI panic on malformed input (Client.Go-003), empty-string correlation id on rand failure (Client.Go-007). |
| 2 | mxaccessgw conventions | Generally good; two test files fail gofmt, breaking the documented workflow (Client.Go-004). |
| 3 | Concurrency & thread safety | No issues found; stream goroutines and cancellation are sound. |
| 4 | Error handling & resilience | Issues found: the compatibility event path silently drops events (Client.Go-002); no transient/permanent classification (Client.Go-006). |
| 5 | Security | No issues found; TLS by default with a TLS 1.2 floor, API key redaction, no secret logging. |
| 6 | Performance & resource management | No issues found; connections/streams closed via deferred Close/cancel. |
| 7 | Design-document adherence | Issues found: deprecated grpc.DialContext+WithBlock usage and a missing error taxonomy (Client.Go-005, Client.Go-006). |
| 8 | Code organization & conventions | Issue found: duplication between Client and GalaxyClient (Client.Go-009). |
| 9 | Testing coverage | Issue found: TLS path, callContext deadline logic, and NativeValue/NativeArray edges untested (Client.Go-008). |
| 10 | Documentation & comments | Issue found: a stale WithBlock dial-cancellation claim (Client.Go-010). |

Findings

Client.Go-001

Field Value
Severity High
Category Correctness & logic bugs
Location clients/go/mxgateway/errors.go:88-93, clients/go/mxgateway/errors.go:117-128
Status Resolved

Description: MxAccessError.Unwrap returns e.Command directly. EnsureMxAccessSuccess constructs &MxAccessError{Reply: reply} with Command left nil (the HRESULT / failing-MxStatusProxy path). When Command is a nil *CommandError, Unwrap() returns a non-nil error interface wrapping a nil pointer. Consequently errors.As(err, &ce) for *CommandError returns true while setting ce to nil — a caller writing the idiomatic if errors.As(err, &commandErr) { use commandErr.Status } nil-dereferences and panics. Verified empirically; the existing test only exercises the populated-Command path.

Recommendation: Make Unwrap return an untyped nil when Command is nil: if e == nil || e.Command == nil { return nil }; return e.Command. Add a test for the HRESULT-only MxAccessError asserting errors.As(err, &ce) is false.

Resolution: Resolved 2026-05-18: MxAccessError.Unwrap now returns an untyped nil when Command is nil, so errors.As no longer binds a typed-nil *CommandError; added errors_test.go regression coverage for the HRESULT-only and populated-Command paths.
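
The typed-nil trap and its fix can be reproduced in isolation. The sketch below is a minimal illustration, not the code from errors.go: CommandError and MxAccessError are cut-down stand-ins that model only the fields involved, and asCommandError is a hypothetical helper added for the demonstration.

```go
package main

import (
	"errors"
	"fmt"
)

// CommandError is a simplified stand-in for the real type; only the field
// used in this demonstration is modelled.
type CommandError struct {
	Status string
}

func (e *CommandError) Error() string { return "command failed: " + e.Status }

// MxAccessError is a simplified stand-in. Command stays nil on the
// HRESULT-only path.
type MxAccessError struct {
	Command *CommandError
}

func (e *MxAccessError) Error() string { return "mxaccess error" }

// Unwrap returns an untyped nil when there is no CommandError, so the error
// chain ends here instead of exposing a typed-nil *CommandError.
// Returning e.Command unconditionally would wrap the nil pointer in a
// non-nil error interface, which errors.As then matches.
func (e *MxAccessError) Unwrap() error {
	if e == nil || e.Command == nil {
		return nil
	}
	return e.Command
}

// asCommandError is a hypothetical helper that reports whether err's chain
// contains a *CommandError.
func asCommandError(err error) (*CommandError, bool) {
	var ce *CommandError
	ok := errors.As(err, &ce)
	return ce, ok
}

func main() {
	_, ok := asCommandError(&MxAccessError{})
	fmt.Println("HRESULT-only matches *CommandError:", ok)

	ce, ok := asCommandError(&MxAccessError{Command: &CommandError{Status: "denied"}})
	fmt.Println("populated matches *CommandError:", ok, ce.Status)
}
```

With the guard in place the HRESULT-only case reports false, so a caller that dereferences the bound pointer only does so when a real CommandError is present.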

Client.Go-002

Field Value
Severity Medium
Category Error handling & resilience
Location clients/go/mxgateway/session.go:440-516
Status Open

Description: For the Events/EventsAfter compatibility API (cancelWhenResultBufferFull == true), when the 16-slot results channel is full, sendEventResult cancels and returns false. The goroutine then returns and close(results) runs, so the consumer sees the channel close without ever receiving an EventResult{Err: ...}. A slow consumer therefore cannot distinguish "stream ended normally" from "events were silently dropped." This contradicts the design doc's "libraries should not reorder, coalesce, or drop events by default", and a test currently pins this lossy behaviour.

Recommendation: Before cancelling on a full buffer, deliver a terminal EventResult carrying an explicit error (e.g. ErrEventBufferOverflow). Document the behaviour on Session.Events; steer callers to SubscribeEvents (which blocks instead of dropping).

Resolution: (open)

Client.Go-003

Field Value
Severity Medium
Category Correctness & logic bugs
Location clients/go/cmd/mxgw-go/main.go:517-532
Status Open

Description: parseInt32List calls panic(err) when an item-handles token fails to parse as an int32. The CLI is a documented user-facing tool; a typo like -item-handles 1,foo crashes the process with an unrecovered panic and stack trace instead of returning a clean error and exit code 2 like every other validation path in main.go.

Recommendation: Change parseInt32List to return ([]int32, error) and have runUnsubscribeBulk propagate the error, matching parseValue's pattern.
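
A minimal sketch of the error-returning variant follows. The tokenisation details (comma splitting, whitespace trimming) and the error message are assumptions, since the current body of parseInt32List is not quoted in this review; the point is only the changed signature and the absence of panic.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseInt32List parses a comma-separated list of int32 values and returns
// an error for the first token that is malformed or out of range.
// Splitting on "," and trimming whitespace are assumptions about the
// expected input format.
func parseInt32List(raw string) ([]int32, error) {
	tokens := strings.Split(raw, ",")
	out := make([]int32, 0, len(tokens))
	for _, tok := range tokens {
		tok = strings.TrimSpace(tok)
		n, err := strconv.ParseInt(tok, 10, 32) // bitSize 32 rejects out-of-range values
		if err != nil {
			return nil, fmt.Errorf("invalid item handle %q: %w", tok, err)
		}
		out = append(out, int32(n))
	}
	return out, nil
}

func main() {
	for _, in := range []string{"1,2,3", "1,foo"} {
		handles, err := parseInt32List(in)
		if err != nil {
			// A CLI caller would print this and exit with code 2.
			fmt.Println("error:", err)
			continue
		}
		fmt.Println("handles:", handles)
	}
}
```

The caller, runUnsubscribeBulk in this case, can then return the error to main so the typo is reported like any other validation failure.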

Resolution: (open)

Client.Go-004

Field Value
Severity Low
Category mxaccessgw conventions
Location clients/go/mxgateway/alarms_test.go:153-154, clients/go/mxgateway/galaxy_test.go:58-59
Status Open

Description: gofmt -l flags alarms_test.go and galaxy_test.go for misaligned struct-literal field padding. The Go client README lists gofmt as part of the workflow and the repo enforces style; unformatted committed code breaks gofmt-gated checks and CI.

Recommendation: Run gofmt -w mxgateway/alarms_test.go mxgateway/galaxy_test.go.

Resolution: (open)

Client.Go-005

Field Value
Severity Low
Category Design-document adherence
Location clients/go/mxgateway/client.go:64,68, clients/go/mxgateway/galaxy.go:83,87
Status Open

Description: The client uses grpc.DialContext with grpc.WithBlock(). Both are deprecated in current grpc-go in favour of grpc.NewClient, which connects lazily. WithBlock also changes failure semantics: if the gateway is transiently unavailable at dial time, Dial fails hard instead of returning a connection that recovers once the gateway comes up, which works against the design doc's resilience intent.

Recommendation: Migrate to grpc.NewClient; if a fail-fast connect probe is still wanted, do an explicit readiness wait bounded by DialTimeout, and update the doc comment.

Resolution: (open)

Client.Go-006

Field Value
Severity Low
Category Error handling & resilience
Location clients/go/mxgateway/errors.go:9-130
Status Open

Description: docs/ClientLibrariesDesign.md recommends a high-level error taxonomy (TransportError, AuthenticationError, TimeoutError, etc.). The Go client collapses all transport/gRPC failures into a single GatewayError with no way to classify transient (Unavailable, DeadlineExceeded) vs permanent (Unauthenticated, InvalidArgument) without manually unwrapping and calling status.Code.

Recommendation: Add a helper (e.g. IsTransient(err) bool) or expose the gRPC codes.Code on GatewayError, so retry/timeout/auth handling can be written without re-parsing the wrapped error.

Resolution: (open)

Client.Go-007

Field Value
Severity Low
Category Correctness & logic bugs
Location clients/go/mxgateway/session.go:526-532
Status Open

Description: newCorrelationID returns an empty string when crypto/rand.Read fails, silently producing an MxCommandRequest with no correlation id. A rand.Read failure is rare, but the resulting failure mode (an untraceable command with no error surfaced) is worse than failing loudly, and the empty-id path is untested.

Recommendation: Either propagate the error up through invokeCommand, or fall back to a time/counter-based id rather than an empty string.

Resolution: (open)

Client.Go-008

Field Value
Severity Low
Category Testing coverage
Location clients/go/mxgateway/ (test files)
Status Open

Description: Several critical paths are untested: TLS credential resolution in resolveTransportCredentials (only the Plaintext path is exercised); the callContext deadline-shortening logic (client.go:198-204) including the negative-timeout disable case; and NativeValue/NativeArray for the array, raw-bytes, null, and unsupported-kind branches.

Recommendation: Add unit tests for resolveTransportCredentials precedence, callContext deadline arithmetic, and NativeValue/NativeArray round-trips for every kind.

Resolution: (open)

Client.Go-009

Field Value
Severity Low
Category Code organization & conventions
Location clients/go/mxgateway/galaxy.go:60-93,241-256, clients/go/mxgateway/client.go:41-74,190-205
Status Open

Description: DialGalaxy/Dial and GalaxyClient.callContext/Client.callContext are near-identical duplicates (dial-context setup, credential resolution, dial-option assembly, deadline arithmetic). A fix to one (e.g. the Client.Go-005 dial migration) must be applied twice and can drift.

Recommendation: Extract a shared unexported dial(ctx, opts) and a free callContext(ctx, opts) function, keeping ctx as the first parameter per Go convention, and have both client constructors call them.

Resolution: (open)

Client.Go-010

Field Value
Severity Low
Category Documentation & comments
Location clients/go/mxgateway/client.go:39-40
Status Open

Description: The Dial doc comment states it configures "blocking dial cancellation from ctx." This describes the deprecated WithBlock behaviour; once Client.Go-005 is addressed the comment is misleading about how connection establishment and cancellation work.

Recommendation: Reword to describe the actual connect/timeout semantics after resolving Client.Go-005, and clarify that DialTimeout bounds the initial connect attempt.

Resolution: (open)