# Code Review — Client.Go | Field | Value | |---|---| | Module | `clients/go` | | Reviewer | Claude Code | | Review date | 2026-05-24 | | Commit reviewed | `d692232` | | Status | Reviewed | | Open findings | 0 | ## Checklist coverage A re-review of commit `a020350` (which resolved Client.Go-011..016). `gofmt -l .`, `go vet ./...`, `go build ./...`, and `go test ./... -count=1` are all clean. | # | Category | Result | |---|---|---| | 1 | Correctness & logic bugs | Prior Client.Go-001/003/007/011 remain resolved. No new correctness bugs found. | | 2 | mxaccessgw conventions | `gofmt -l .` and `go vet ./...` clean; Client.Go-004 stays resolved. No new issues. | | 3 | Concurrency & thread safety | Client.Go-013 resolved. New issue: `runBenchReadBulk`'s warm-up + steady-state wall-clock loops ignore `ctx` cancellation, so a Ctrl+C or parent-cancel keeps spinning ReadBulk calls until the wall-clock deadline (Client.Go-018). | | 4 | Error handling & resilience | Client.Go-014 resolved. New issue: `parseValue` returns bare `strconv` errors with no `%w` wrap and no CLI-context, so a typo like `-type int32 -value foo` surfaces as `strconv.ParseInt: parsing "foo": invalid syntax` without naming the flag — out of line with the GoStyleGuide "wrap errors with useful context using `%w`" rule (Client.Go-017). | | 5 | Security | No issues found — TLS-by-default with TLS 1.2 floor, API-key redaction in CLI JSON output, no secret logging. | | 6 | Performance & resource management | No issues found — `defer client.Close()` / `defer subscription.Close()` applied consistently; bench-read-bulk preallocates the latency slice. | | 7 | Design-document adherence | No new issues. Lazy `grpc.NewClient` + readiness probe (Client.Go-005) and the shared `dial` helper (Client.Go-009) are applied uniformly across `Dial` and `DialGalaxy`. | | 8 | Code organization & conventions | Client.Go-015 resolved. New issue: `runStreamEvents` does not install a signal handler (Ctrl+C kills the process abruptly), while `runGalaxyWatch` does — the two long-running stream commands have divergent shutdown UX (Client.Go-020). | | 9 | Testing coverage | Client.Go-008/016 resolved. New issue: the six new bulk and bench subcommands (`read-bulk`, `write-bulk`, `write2-bulk`, `write-secured-bulk`, `write-secured2-bulk`, `bench-read-bulk`) have no CLI-level unit tests — in particular the Client.Go-015 secured-flag-gating fix has no regression test (Client.Go-021). | | 10 | Documentation & comments | Client.Go-010/012 resolved. New issue: `runGalaxyWatch` parses `-last-seen-deploy-time` with `time.RFC3339` (no fractional seconds), while `parseRfc3339Timestamp` for `-timestamp-value` accepts `time.RFC3339Nano` — the CLI advertises "RFC 3339" for both but quietly differs on sub-second support (Client.Go-019). | ### 2026-05-24 review (commit d692232) Re-review pass at `d692232`. Two commits touched `clients/go/`: `397d3c5` updated the proto root in `generate-proto.ps1` to `src/ZB.MOM.WW.MxGateway.Contracts/Protos`; `d692232` dropped stale `SessionId =` lines from `AcknowledgeAlarmRequest` / `AcknowledgeAlarmReply` fixtures in `alarms_test.go` after the proto retired the field, and substituted `CorrelationId: req.GetClientCorrelationId()` for the fake server's reply. Module path (`gitea.dohertylan.com/dohertj2/mxaccessgw/clients/go`) and subpackage `mxgateway` intentionally unchanged per Go convention. `go build ./...` and `go test ./...` are green at HEAD. | # | Category | Result | |---|---|---| | 1 | Correctness & logic bugs | No issues found in the a020350..d692232 diff. | | 2 | mxaccessgw conventions | No issues found — Go convention preserved (short lowercase package name, gitea-scoped module path). The generated `_pb.go` descriptor still carries `MxGateway.Contracts.Proto` in its csharp_namespace bytes; wire-level metadata not used by Go, intentionally not regenerated. | | 3 | Concurrency & thread safety | No issues found in this diff. | | 4 | Error handling & resilience | No issues found in this diff. | | 5 | Security | No issues found in this diff. | | 6 | Performance & resource management | No issues found in this diff. | | 7 | Design-document adherence | No issues found in this diff. | | 8 | Code organization & conventions | No issues found in this diff. | | 9 | Testing coverage | No issues found in this diff — `alarms_test.go` correctly drops retired `session_id` from `AcknowledgeAlarmRequest` and retains it on `QueryActiveAlarmsRequest`. | | 10 | Documentation & comments | No issues found in this diff. | ## Findings ### Client.Go-001 | Field | Value | |---|---| | Severity | High | | Category | Correctness & logic bugs | | Location | `clients/go/mxgateway/errors.go:88-93`, `clients/go/mxgateway/errors.go:117-128` | | Status | Resolved | **Description:** `MxAccessError.Unwrap` returns `e.Command` directly. `EnsureMxAccessSuccess` constructs `&MxAccessError{Reply: reply}` with `Command` left nil (the HRESULT / failing-`MxStatusProxy` path). When `Command` is a nil `*CommandError`, `Unwrap()` returns a non-nil `error` interface wrapping a nil pointer. Consequently `errors.As(err, &ce)` for `*CommandError` returns `true` while setting `ce` to nil — a caller writing the idiomatic `if errors.As(err, &commandErr) { use commandErr.Status }` nil-dereferences and panics. Verified empirically; the existing test only exercises the populated-`Command` path. **Recommendation:** Make `Unwrap` return an untyped nil when `Command` is nil: `if e == nil || e.Command == nil { return nil }; return e.Command`. Add a test for the HRESULT-only `MxAccessError` asserting `errors.As(err, &ce)` is `false`. **Resolution:** Resolved 2026-05-18: `MxAccessError.Unwrap` now returns an untyped nil when `Command` is nil, so `errors.As` no longer binds a typed-nil `*CommandError`; added `errors_test.go` regression coverage for the HRESULT-only and populated-`Command` paths. ### Client.Go-002 | Field | Value | |---|---| | Severity | Medium | | Category | Error handling & resilience | | Location | `clients/go/mxgateway/session.go:440-516` | | Status | Resolved | **Description:** For the `Events`/`EventsAfter` compatibility API (`cancelWhenResultBufferFull == true`), when the 16-slot `results` channel is full `sendEventResult` cancels and returns `false`; the goroutine returns and `close(results)` runs — the consumer sees the channel close with **no `EventResult{Err: ...}` ever delivered**. A slow consumer cannot distinguish "stream ended normally" from "events were silently dropped." This contradicts the design doc's "libraries should not reorder, coalesce, or drop events by default", and a test currently pins this lossy behaviour. **Recommendation:** Before cancelling on a full buffer, deliver a terminal `EventResult` carrying an explicit error (e.g. `ErrEventBufferOverflow`). Document the behaviour on `Session.Events`; steer callers to `SubscribeEvents` (which blocks instead of dropping). **Resolution:** Resolved 2026-05-18: confirmed against source — on a full bounded buffer the compatibility path cancelled and closed `results` with no terminal result. Added the exported sentinel `ErrEventBufferOverflow` (`errors.go`); `sendEventResult` now, on a full buffer, cancels the stream then calls the new `deliverTerminalResult` helper, which evicts one of the oldest buffered events to make room and places `EventResult{Err: ErrEventBufferOverflow}` so it becomes the consumer's last item before the channel closes. The previously lossy regression test (`TestEventsAfterCancelsStreamWhenCompatibilityChannelIsAbandoned`) was re-pointed to assert the terminal `ErrEventBufferOverflow` result is delivered. `clients/go/README.md` now documents the bounded-buffer/overflow behaviour and steers no-loss callers to `SubscribeEvents`. ### Client.Go-003 | Field | Value | |---|---| | Severity | Medium | | Category | Correctness & logic bugs | | Location | `clients/go/cmd/mxgw-go/main.go:517-532` | | Status | Resolved | **Description:** `parseInt32List` calls `panic(err)` when an `item-handles` token fails to parse as an int32. The CLI is a documented user-facing tool; a typo like `-item-handles 1,foo` crashes the process with an unrecovered panic and stack trace instead of returning a clean error and exit code 2 like every other validation path in `main.go`. **Recommendation:** Change `parseInt32List` to return `([]int32, error)` and have `runUnsubscribeBulk` propagate the error, matching `parseValue`'s pattern. **Resolution:** Resolved 2026-05-18: confirmed against source — `parseInt32List` called `panic(err)` on a malformed token. It now returns `([]int32, error)`, wrapping the bad token (`invalid item handle %q: %w`); `runUnsubscribeBulk` parses item handles before dialing and returns the error, so a typo flows through `runWithIO` to `os.Exit(2)` like other validation paths. Regression tests `TestParseInt32ListParsesValidTokens` and `TestParseInt32ListReturnsErrorOnMalformedToken` added to `cmd/mxgw-go/main_test.go`. ### Client.Go-004 | Field | Value | |---|---| | Severity | Low | | Category | mxaccessgw conventions | | Location | `clients/go/mxgateway/alarms_test.go:153-154`, `clients/go/mxgateway/galaxy_test.go:58-59` | | Status | Resolved | **Description:** `gofmt -l` flags `alarms_test.go` and `galaxy_test.go` for misaligned struct-literal field padding. The Go client README lists `gofmt` as part of the workflow and the repo enforces style; unformatted committed code breaks `gofmt`-gated checks and CI. **Recommendation:** Run `gofmt -w mxgateway/alarms_test.go mxgateway/galaxy_test.go`. **Resolution:** Resolved 2026-05-18: confirmed `gofmt -l .` flagged both files for misaligned struct-literal padding. Ran `gofmt -w` on `mxgateway/alarms_test.go` and `mxgateway/galaxy_test.go`; `gofmt -l .` is now clean for the whole module. ### Client.Go-005 | Field | Value | |---|---| | Severity | Low | | Category | Design-document adherence | | Location | `clients/go/mxgateway/client.go:64,68`, `clients/go/mxgateway/galaxy.go:83,87` | | Status | Resolved | **Description:** The client uses `grpc.DialContext` with `grpc.WithBlock()`. In current grpc-go both are deprecated in favour of `grpc.NewClient` (lazy connection). `WithBlock` also changes failure semantics: a transient gateway-unavailable at dial time becomes a hard `Dial` error rather than a connection that recovers when the gateway comes up, working against the design doc's resilience intent. **Recommendation:** Migrate to `grpc.NewClient`; if a fail-fast connect probe is still wanted, do an explicit readiness wait bounded by `DialTimeout`, and update the doc comment. **Resolution:** Resolved 2026-05-18: confirmed `Dial`/`DialGalaxy` used the deprecated `grpc.DialContext` + `grpc.WithBlock` pair. Migrated both to the shared `dial(ctx, opts)` helper, which now builds a lazy connection with `grpc.NewClient` and runs an explicit `waitForReady` readiness probe (`Connect` + `WaitForStateChange` until `connectivity.Ready`) bounded by `DialTimeout` — preserving fail-fast behavior while letting an otherwise lazy connection recover when the gateway is briefly down. Note: `grpc.NewClient` defaults the target scheme to `dns`, so the bufconn test harnesses (`client_session_test.go`, `alarms_test.go`, `galaxy_test.go`) were updated to use `passthrough:///bufnet` so the fake target reaches the context dialer. New tests `TestDialFailsFastWhenGatewayUnreachable` and `TestDialReadinessProbeReachesReady` cover the probe; `go vet` reports no deprecation. `clients/go/README.md` documents the lazy-connect + readiness-probe semantics. ### Client.Go-006 | Field | Value | |---|---| | Severity | Low | | Category | Error handling & resilience | | Location | `clients/go/mxgateway/errors.go:9-130` | | Status | Resolved | **Description:** `docs/ClientLibrariesDesign.md` recommends a high-level error taxonomy (`TransportError`, `AuthenticationError`, `TimeoutError`, etc.). The Go client collapses all transport/gRPC failures into a single `GatewayError` with no way to classify transient (`Unavailable`, `DeadlineExceeded`) vs permanent (`Unauthenticated`, `InvalidArgument`) without manually unwrapping and calling `status.Code`. **Recommendation:** Add a helper (e.g. `IsTransient(err) bool`) or expose the gRPC `codes.Code` on `GatewayError`, so retry/timeout/auth handling can be written without re-parsing the wrapped error. **Resolution:** Resolved 2026-05-18: implemented the recommended classification surface in `errors.go` rather than a full parallel type hierarchy (the existing `GatewayError`/`CommandError`/`MxAccessError` chain already separates transport from protocol from MXAccess failures). Added `GatewayError.Code()` (returns the wrapped gRPC `codes.Code`, `OK` for nil, `Unknown` for a non-status error) and the free function `IsTransient(err error) bool`, which unwraps through `*GatewayError` and any gRPC-status chain and reports `true` for `Unavailable`, `DeadlineExceeded`, `ResourceExhausted`, and `Aborted`. Tests `TestGatewayErrorCode` and `TestIsTransient` cover the matrix; `clients/go/README.md` documents both for retry/timeout/auth handling. ### Client.Go-007 | Field | Value | |---|---| | Severity | Low | | Category | Correctness & logic bugs | | Location | `clients/go/mxgateway/session.go:526-532` | | Status | Resolved | **Description:** `newCorrelationID` returns an empty string when `crypto/rand.Read` fails, silently producing an `MxCommandRequest` with no correlation id. `rand.Read` failure is rare, but the failure mode (untraceable command, no error surfaced) is worse than failing loud, and the empty-id path is untested. **Recommendation:** Either propagate the error up through `invokeCommand`, or fall back to a time/counter-based id rather than an empty string. **Resolution:** Resolved 2026-05-18: confirmed `newCorrelationID` returned `""` on a `rand.Read` failure. It now falls back to a non-empty `"fallback--"` id built from `time.Now().UnixNano()` and a process-wide `atomic.Uint64` monotonic counter, so every command stays traceable even without entropy. The `crypto/rand` call was routed through a `randRead` package variable so the failure path is testable; `TestNewCorrelationIDFallsBackOnRandFailure` simulates a `rand.Read` failure and asserts the fallback id is non-empty, `fallback-` prefixed, and unique, and `TestNewCorrelationIDUsesRandEntropy` pins the happy path. ### Client.Go-008 | Field | Value | |---|---| | Severity | Low | | Category | Testing coverage | | Location | `clients/go/mxgateway/` (test files) | | Status | Resolved | **Description:** Several critical paths are untested: TLS credential resolution in `resolveTransportCredentials` (only the `Plaintext` path is exercised); the `callContext` deadline-shortening logic (`client.go:198-204`) including the negative-timeout disable case; and `NativeValue`/`NativeArray` for the array, raw-bytes, null, and unsupported-kind branches. **Recommendation:** Add unit tests for `resolveTransportCredentials` precedence, `callContext` deadline arithmetic, and `NativeValue`/`NativeArray` round-trips for every kind. **Resolution:** Resolved 2026-05-18: added `clients/go/mxgateway/coverage_test.go`. `TestResolveTransportCredentialsPrecedence` exercises every branch (explicit `TransportCredentials`, `Plaintext`, missing `CACertFile` error, `TLSConfig` + `ServerNameOverride`, default TLS floor) and `TestResolveTransportCredentialsDoesNotMutateTLSConfig` confirms the supplied `*tls.Config` is cloned. `TestCallContextDeadlineArithmetic` covers zero/default, negative-disable, positive timeout, caller-deadline-sooner-kept, and caller-deadline-later-shortened. `TestNativeValueEdgeKinds`, `TestNativeArrayEdgeKinds`, and `TestNativeValueUnsupportedKind` cover the null, raw-bytes (including the no-alias copy), array, timestamp-with-nil, and unsupported-kind branches. ### Client.Go-009 | Field | Value | |---|---| | Severity | Low | | Category | Code organization & conventions | | Location | `clients/go/mxgateway/galaxy.go:60-93,241-256`, `clients/go/mxgateway/client.go:41-74,190-205` | | Status | Resolved | **Description:** `DialGalaxy`/`Dial` and `GalaxyClient.callContext`/`Client.callContext` are near-identical duplicates (dial-context setup, credential resolution, dial-option assembly, deadline arithmetic). A fix to one (e.g. the Client.Go-005 dial migration) must be applied twice and can drift. **Recommendation:** Extract a shared unexported `dial(ctx, opts)` and a free `callContext(opts, ctx)` function, and have both client constructors call them. **Resolution:** Resolved 2026-05-18: extracted the shared unexported `dial(ctx, opts) (*grpc.ClientConn, error)` (credential resolution, dial-option assembly, `grpc.NewClient`, readiness probe) and the free `callContext(ctx, callTimeout) (context.Context, context.CancelFunc)` into `client.go`. `Dial`/`DialGalaxy` and both `(*Client).callContext`/`(*GalaxyClient).callContext` methods now delegate to them; the duplicated dial and deadline code in `galaxy.go` was removed (its now-unused `errors` import dropped). This was done together with the Client.Go-005 migration so the `grpc.NewClient` change lives in exactly one place. ### Client.Go-010 | Field | Value | |---|---| | Severity | Low | | Category | Documentation & comments | | Location | `clients/go/mxgateway/client.go:39-40` | | Status | Resolved | **Description:** The `Dial` doc comment states it configures "blocking dial cancellation from ctx." This describes the deprecated `WithBlock` behaviour; once Client.Go-005 is addressed the comment is misleading about how connection establishment and cancellation work. **Recommendation:** Reword to describe the actual connect/timeout semantics after resolving Client.Go-005, and clarify that `DialTimeout` bounds the initial connect attempt. **Resolution:** Resolved 2026-05-18: alongside the Client.Go-005 migration, the `Dial` doc comment was rewritten to describe the lazy `grpc.NewClient` connection, the `DialTimeout`-bounded (default 10s, or ctx deadline when sooner) readiness probe, that a briefly-unavailable gateway recovers instead of producing a hard error, and that cancelling `ctx` aborts the probe. `DialGalaxy` and the new `dial`/`waitForReady`/`callContext` helpers carry matching doc comments. ### Client.Go-011 | Field | Value | |---|---| | Severity | Low | | Category | Correctness & logic bugs | | Location | `clients/go/mxgateway/alarms_test.go:66-73` | | Status | Resolved | **Description:** `TestAcknowledgeAlarmRejectsNilRequest` contains a no-op `if` with an empty body whose intent is documented in a comment ("Accept either: the helper returned the literal sentinel, or the generic transport error — both prove nil was rejected"). The condition ```go if err == nil || !errors.Is(err, errors.Unwrap(err)) && err.Error() != "mxgateway: acknowledge alarm request is required" { // ... } ``` evaluates expressions for side effects only and asserts nothing — Go's `&&` binds tighter than `||`, the body is empty, and the actual nil check happens on the very next `if err == nil`. The block is effectively dead code masquerading as a check. It also evaluates `errors.Unwrap(err)` regardless of `err`'s shape, and would call `err.Error()` even when err might be a wrapped status error whose message wording the gateway is free to change — making the apparent assertion brittle on top of being dead. **Recommendation:** Drop the empty-body `if` entirely (the subsequent `if err == nil { t.Fatalf(...) }` already enforces the contract), or, if the intent is to additionally pin the literal error message for the sentinel path, replace it with a real assertion (`if err.Error() != "mxgateway: acknowledge alarm request is required" { t.Fatalf(...) }`) and remove the spurious `errors.Is(err, errors.Unwrap(err))` clause. **Resolution:** 2026-05-20 — Removed the empty-body `if` in `TestAcknowledgeAlarmRejectsNilRequest`; the subsequent `if err == nil { t.Fatalf(...) }` already enforces the nil-rejection contract without the dead, brittle compound predicate. ### Client.Go-012 | Field | Value | |---|---| | Severity | Low | | Category | Documentation & comments | | Location | `clients/go/cmd/mxgw-go/main.go:1063-1065`, `clients/go/cmd/mxgw-go/main.go:88-104` | | Status | Resolved | **Description:** `writeUsage` lists the available subcommands as `version|open-session|close-session|register|add-item|advise|subscribe-bulk|unsubscribe-bulk|write|stream-events|smoke|galaxy-test-connection|galaxy-last-deploy|galaxy-discover|galaxy-watch`. Six subcommands wired into `run` are missing from this list: `read-bulk`, `write-bulk`, `write2-bulk`, `write-secured-bulk`, `write-secured2-bulk`, and `bench-read-bulk`. A user invoking `mxgw-go` with no args or an unknown command (the two paths that print this banner) sees an incomplete CLI surface and may believe the bulk-write / read-bulk families are not implemented. The README does document them, but the inline usage banner is the first source of truth a CLI user consults. **Recommendation:** Extend the usage string to include every command registered in the `switch args[0]` in `run`, or generate it from a single source-of-truth slice keyed on command name → handler so the two cannot drift again. **Resolution:** 2026-05-20 — `writeUsage` now lists the previously missing `read-bulk`, `write-bulk`, `write2-bulk`, `write-secured-bulk`, `write-secured2-bulk`, and `bench-read-bulk` subcommands alongside the original surface, so the no-args / unknown-command banner reflects every command wired into `run`. ### Client.Go-013 | Field | Value | |---|---| | Severity | Low | | Category | Concurrency & thread safety | | Location | `clients/go/cmd/mxgw-go/main.go:1246-1249`, `clients/go/cmd/mxgw-go/main.go:1257-1262` | | Status | Resolved | **Description:** In `runGalaxyWatch`, the signal-cancellation branch carefully drains the buffered `events` channel after `cancelStream()` so the `WatchDeployEvents` goroutine can exit (`for range events { }`). The limit-reached branch (`if *limit > 0 && count >= *limit { cancelStream(); return nil }`) skips that drain and returns immediately. After the function returns, `defer client.Close()` runs and tears down the gRPC connection; in the gap before the connection close propagates, the WatchDeployEvents goroutine may still be blocked on `case events <- event:` (the channel is buffered to 16 but a slow producer can refill it) — the goroutine then exits via `<-ctx.Done()` because `streamCtx` was cancelled, so it isn't a permanent leak, but the two cancellation paths behave inconsistently and the limit-reached path can briefly hold a goroutine plus the gRPC stream while the client tears down underneath it. **Recommendation:** Factor the drain into a helper and use it from both branches, e.g. after `cancelStream()` always `for range events { }` (and let the surrounding `select`/`for` re-evaluate `<-errs` if a terminal error was already buffered). Alternatively, drop the explicit drain in both branches and rely on `defer cancelStream()` plus `defer client.Close()` — but pick one model and apply it consistently. **Resolution:** 2026-05-20 — The limit-reached branch in `runGalaxyWatch` now drains the buffered `events` channel (`for range events { }`) after `cancelStream()`, matching the signal-cancel branch. Both cancellation paths now wait for the `WatchDeployEvents` goroutine to exit before `defer client.Close()` tears the gRPC connection down. ### Client.Go-014 | Field | Value | |---|---| | Severity | Low | | Category | Error handling & resilience | | Location | `clients/go/mxgateway/session.go:602`, `clients/go/mxgateway/galaxy.go:189` | | Status | Resolved | **Description:** Two stream Recv loops compare end-of-stream with `err == io.EOF` directly: - `session.go:602` — `if err == io.EOF || status.Code(err) == codes.Canceled || streamCtx.Err() != nil { return }` - `galaxy.go:189` — `if recvErr == io.EOF { return }` gRPC's generated `Recv()` does return the `io.EOF` sentinel directly today, so the comparisons work in practice. However, the Go idiom (and the project's `docs/style-guides/GoStyleGuide.md`) is to use `errors.Is(err, io.EOF)` so future wrapping (e.g. an interceptor decorating Recv errors) does not silently flip the loop from "stream finished normally" to "stream produced an error". The mxgateway client itself wraps non-EOF Recv errors in `*GatewayError`, which `errors.Is` already supports — using `errors.Is` keeps both paths consistent. **Recommendation:** Replace `recvErr == io.EOF` / `err == io.EOF` with `errors.Is(err, io.EOF)` (the `errors` package is already imported in both files). **Resolution:** 2026-05-20 — Both stream Recv loops now use `errors.Is(err, io.EOF)`: `session.go` already imported `errors`, and `galaxy.go` gained the missing `errors` import alongside the `recvErr == io.EOF` → `errors.Is(recvErr, io.EOF)` change, keeping EOF detection robust against any future Recv-error wrapping. ### Client.Go-015 | Field | Value | |---|---| | Severity | Low | | Category | Code organization & conventions | | Location | `clients/go/cmd/mxgw-go/main.go:410-512` | | Status | Resolved | **Description:** `runWriteBulkVariant(ctx, args, stdout, stderr, command, withTimestamp, secured bool)` accepts `secured` but never uses it — the routing is keyed on `command` (the string `"write-bulk"` / `"write2-bulk"` / `"write-secured-bulk"` / `"write-secured2-bulk"`). The function ends with `_ = secured // currently only used for routing above; reserved for future per-variant validation`, which is misleading because `secured` is not in fact used for routing. The four wrapper functions (`runWriteBulk`, `runWrite2Bulk`, `runWriteSecuredBulk`, `runWriteSecured2Bulk`) all pass a `secured` argument that has no effect. The four CLI options `-current-user-id`, `-verifier-user-id` are unconditionally registered on every variant, including the non-secured ones, so a `write-bulk` invocation that passes `-current-user-id 42` silently does nothing. Either remove `secured` and the dead `_ = secured` comment, or use it to gate the registration of secured-only flags so wrong combinations are rejected with a clean error. **Recommendation:** Drop the `secured` parameter (the `command` switch already distinguishes the four variants) and the misleading `_ = secured` line; or, if validation is the goal, branch flag registration on `secured` so secured-only flags are unavailable for the non-secured variants and emit a clean usage error if they appear. **Resolution:** 2026-05-20 — Dropped the unused `secured` parameter from `runWriteBulkVariant` (the `command` switch already distinguishes the four variants) and removed the misleading `_ = secured` line. The variant is now derived locally from `command` and used to gate flag registration: `-current-user-id` / `-verifier-user-id` are only registered for the secured variants and `-user-id` only for Write/Write2, so a wrong-variant flag now fails with a clean `flag provided but not defined` usage error instead of silently no-op'ing. The four `runWrite*Bulk` wrappers were updated to match the new signature. ### Client.Go-016 | Field | Value | |---|---| | Severity | Low | | Category | Testing coverage | | Location | `clients/go/mxgateway/galaxy_test.go:382-429` | | Status | Resolved | **Description:** `fakeGalaxyServer.watchSendInterval` is declared on the test fake and consulted inside `WatchDeployEvents` (`if s.watchSendInterval > 0 { ... }`) but no test in the package sets a non-zero value. The dead field plus its branch were presumably added to support a backpressure / pacing test that was never landed, and now the only effect is reader confusion ("which test uses this?") and a pointlessly larger fake. Backpressure on the bootstrap-plus-events sequence is also genuinely worth testing, given that `WatchDeployEvents` writes to a 16-deep buffered channel. **Recommendation:** Either delete the unused `watchSendInterval` field and its branch in `WatchDeployEvents`, or add the test it was added for — e.g. one that pumps more than 16 events with a small interval and asserts the consumer keeps up without losing or reordering events. Linking the field to a `// for TestX` comment if it stays would also help. **Resolution:** 2026-05-20 — Removed the unused `watchSendInterval` field from `fakeGalaxyServer` and the corresponding `if s.watchSendInterval > 0 { ... }` branch in `WatchDeployEvents`; no test set the field, so the dead code path is gone and the fake is leaner. `gofmt -w` reflowed the struct to drop the no-longer-needed field-name padding. ### Client.Go-017 | Field | Value | |---|---| | Severity | Low | | Category | Error handling & resilience | | Location | `clients/go/cmd/mxgw-go/main.go:954-991` | | Status | Resolved | **Description:** `parseValue` returns the raw `strconv.ParseBool` / `strconv.ParseInt` / `strconv.ParseFloat` error verbatim — no wrap with `%w` and no indication of which CLI flag was the source. A user running `mxgw-go write -type int32 -value foo` sees ``` strconv.ParseInt: parsing "foo": invalid syntax ``` with no mention of `-value`, `-type`, or which subcommand failed. The same pattern hits every typed branch (bool, int32, int64, float, double). Compare with the sibling helpers in the same file: `parseInt32List` wraps with `"invalid item handle %q: %w"` (Client.Go-003 resolution) and `parseRfc3339Timestamp` wraps with `"invalid RFC 3339 timestamp %q: %w"`. `parseValue` was missed and is inconsistent with those two. The GoStyleGuide (`docs/style-guides/GoStyleGuide.md`, "Errors" section) requires "Wrap errors with useful context using `%w`." **Recommendation:** Wrap each `strconv` error with the offending input and type, e.g. `return nil, fmt.Errorf("invalid %s value %q: %w", valueType, valueText, err)`. The wrapper handles all five typed branches uniformly without a per-branch change. **Resolution:** 2026-05-20 — Each typed branch of `parseValue` now wraps the bare `strconv` error with `%w` and names the offending flag and value (`"invalid -value for -type %s: %q: %w"`), so `mxgw-go write -type int32 -value foo` surfaces the source flag, the requested type, and the bad token while still letting `errors.Is/As` reach the underlying `strconv` sentinel. The new `TestParseValueWrapsStrconvErrorWithFlagContext` table-test pins all five typed branches (bool, int32, int64, float, double) to the new wrapper shape. ### Client.Go-018 | Field | Value | |---|---| | Severity | Low | | Category | Concurrency & thread safety | | Location | `clients/go/cmd/mxgw-go/main.go:593-623` | | Status | Resolved | **Description:** `runBenchReadBulk`'s warm-up and steady-state loops are wall-clock-only: ```go for time.Now().Before(warmupDeadline) { _, _ = session.ReadBulk(ctx, serverHandle, tags, timeout) } ... for time.Now().Before(steadyDeadline) { callStart := time.Now() results, err := session.ReadBulk(ctx, serverHandle, tags, timeout) ... } ``` Neither loop checks `ctx.Done()` / `ctx.Err()`. If the parent context is cancelled (e.g. the operator Ctrl+Cs the benchmark, or the cross-language bench driver `scripts/bench-read-bulk.ps1` times out and kills the child early), the loops keep iterating until their wall-clock deadlines elapse. Each `ReadBulk` call inside fails fast (the gRPC call inherits the cancelled context and returns `context.Canceled`), but the steady-state loop counts those as `failedCalls++` and keeps spinning — wasting CPU and inflating the `failedCalls` and `latencyMs.max` figures the PowerShell driver collates across all five clients. The .NET, Rust, Python, and Java bench drivers should be checked for the same shape, but the Go one is the only one being reviewed here. Note that `runBenchReadBulk` is the only Go CLI command that does NOT register its own signal handler (compare with `runGalaxyWatch` which does via `signal.NotifyContext`). **Recommendation:** Drop out of both loops as soon as `ctx.Err() != nil`. Concretely, change the loop conditions to `for time.Now().Before(warmupDeadline) && ctx.Err() == nil` (and the same on `steadyDeadline`), or use a `select { case <-ctx.Done(): break loop; default: }` guard at the top of each iteration. The cross-language bench shape (`durationMs`, `totalCalls`, `failedCalls`, `latencyMs`) stays the same — the bench just exits sooner and reports the truncated window faithfully. **Resolution:** 2026-05-20 — Both the warm-up and steady-state loops in `runBenchReadBulk` now carry an `&& ctx.Err() == nil` guard alongside the wall-clock check, so a cancelled parent context (Ctrl+C, or the cross-language bench driver killing the child early) breaks the loop instead of spinning failing `ReadBulk` calls until the deadline elapses. The cross-language bench JSON shape is unchanged — the truncated window is just reported faithfully via `durationMs` / `totalCalls`. ### Client.Go-019 | Field | Value | |---|---| | Severity | Low | | Category | Documentation & comments | | Location | `clients/go/cmd/mxgw-go/main.go:710-716`, `clients/go/cmd/mxgw-go/main.go:1204,1213` | | Status | Resolved | **Description:** The CLI advertises two timestamp flags as "RFC3339" but parses them with different layouts: - `-timestamp-value` (write2/write-secured2 bulk): `parseRfc3339Timestamp` uses `time.RFC3339Nano`, which accepts both `2026-04-28T10:00:00Z` and `2026-04-28T10:00:00.123456789Z`. - `-last-seen-deploy-time` (galaxy-watch): `time.Parse(time.RFC3339, ...)`, which rejects fractional seconds. A user copy-pasting an `ObservedAt` timestamp from `galaxy-watch -json` (which is emitted as `RFC3339Nano` by `formatDeployEvent`) directly into `-last-seen-deploy-time` will get a parse error if the source value carried a fractional component, even though both flag descriptions say "RFC3339". The flag help string at `main.go:1204` literally says "RFC3339 timestamp", and the README example uses `2026-04-28T10:00:00Z` (whole seconds only), so the issue is silent until a fractional timestamp comes from the gateway. **Recommendation:** Switch the `galaxy-watch` parse to `time.RFC3339Nano` to match `parseRfc3339Timestamp` (and the gateway's own emit format). One line change at `main.go:1213`. While there, update the flag help string and the README example to say "RFC 3339 (with optional fractional seconds)" so the two flags are documented uniformly. **Resolution:** 2026-05-20 — `runGalaxyWatch` now parses `-last-seen-deploy-time` with `time.RFC3339Nano`, matching `parseRfc3339Timestamp` and the gateway's own `formatDeployEvent` emit format; the layout is strictly broader than the previous `time.RFC3339` (whole-second values still parse). The flag help string changed to "RFC 3339 timestamp (with optional fractional seconds)" and the `clients/go/README.md` example was extended with an explicit fractional-seconds line so the two flags advertise the same surface. ### Client.Go-020 | Field | Value | |---|---| | Severity | Low | | Category | Code organization & conventions | | Location | `clients/go/cmd/mxgw-go/main.go:753-802`, `clients/go/cmd/mxgw-go/main.go:1199-1275` | | Status | Resolved | **Description:** The two long-running stream commands have divergent Ctrl+C UX: - `runGalaxyWatch` registers a signal handler: ```go signalCtx, stopSignals := signal.NotifyContext(ctx, os.Interrupt, syscall.SIGTERM) defer stopSignals() streamCtx, cancelStream := context.WithCancel(signalCtx) ``` so Ctrl+C drains buffered events and returns cleanly. - `runStreamEvents` does not register any signal handler — its parent context is `context.Background()` from `runWithIO`, so Ctrl+C abruptly kills the process. The deferred `subscription.Close()` and `client.Close()` never run, leaving the server-side stream to fault out on a torn TCP connection rather than a clean cancel. The two commands are otherwise structurally identical (subscribe + loop until limit or external stop) — the inconsistency is one half of a pair that was missed when `galaxy-watch` was added. Worth flagging because it directly affects what an integrator who Ctrl+Cs `stream-events` sees in the gateway's logs (a transport reset rather than a `codes.Canceled`). **Recommendation:** Mirror the `runGalaxyWatch` pattern in `runStreamEvents`: wrap `ctx` in `signal.NotifyContext(ctx, os.Interrupt, syscall.SIGTERM)`, derive `streamCtx` from it, and let `defer subscription.Close()` / `defer cancelStream()` tear the stream down on signal. The change is roughly six lines and brings the two stream commands into parity. Optionally factor a shared `withSignals(ctx) (context.Context, context.CancelFunc)` helper if a third stream command lands. **Resolution:** 2026-05-20 — `runStreamEvents` now installs `signal.NotifyContext(ctx, os.Interrupt, syscall.SIGTERM)` (with a deferred `stopSignals()`) and derives `streamCtx` from the resulting signal-aware context, mirroring `runGalaxyWatch`. Ctrl+C now cancels the gRPC stream cleanly — the gateway sees `codes.Canceled` instead of a torn TCP connection — and the deferred `subscription.Close()` / `client.Close()` actually run on signal. The two long-running stream commands now share the same shutdown UX. ### Client.Go-021 | Field | Value | |---|---| | Severity | Low | | Category | Testing coverage | | Location | `clients/go/cmd/mxgw-go/main_test.go`, `clients/go/cmd/mxgw-go/main.go:363-520,522-655` | | Status | Resolved | **Description:** The six bulk / bench subcommands wired into `run` (`read-bulk`, `write-bulk`, `write2-bulk`, `write-secured-bulk`, `write-secured2-bulk`, `bench-read-bulk`) have **no CLI-level unit tests** in `main_test.go`. In particular, the Client.Go-015 resolution claims: > `-current-user-id` / `-verifier-user-id` are only registered for the secured variants and `-user-id` only for Write/Write2, so a wrong-variant flag now fails with a clean `flag provided but not defined` usage error instead of silently no-op'ing. But there is no test asserting that, e.g., `mxgw-go write-bulk -current-user-id 1 ...` returns a "flag provided but not defined" error, or that `mxgw-go write-secured-bulk -user-id 1 ...` does the same. A future refactor of `runWriteBulkVariant` (notably one that re-introduced the `secured` parameter) could silently re-permit the wrong flags without breaking any test. The same gap applies to: parameter validation in `runReadBulk` (bulk size, empty session/items rejection), the value-count vs handle-count mismatch error in `runWriteBulkVariant:447`, and `runBenchReadBulk`'s `bulk-size`/`duration-seconds` positivity checks. `mxgateway/client_session_test.go` already covers the library-level happy paths (`TestWriteBulkBuildsOneBulkCommandAndReturnsPerEntryResults`, `TestReadBulkForwardsTimeoutAndUnpacksCachedFlag`, `TestSubscribeBulkBuildsOneBulkCommandAndReturnsResults`), so this finding is about CLI surface area only. **Recommendation:** Add table-driven tests in `cmd/mxgw-go/main_test.go` along the existing `TestParseInt32List*` and `TestParseValueBuildsTypedValue` style: - `TestRunWriteBulkVariantGatesSecuredFlags`: invoke `runWithIO` with `write-bulk -current-user-id 1 ...` and `write-secured-bulk -user-id 1 ...`, assert each returns an error matching `flag provided but not defined`. - `TestRunReadBulkRejectsMissingArgs`: invoke `runWithIO` with `read-bulk` (no `-session-id`), assert the documented "session-id and items are required" error. - `TestRunBenchReadBulkRejectsNonPositiveBulkSize` / `TestRunBenchReadBulkRejectsNonPositiveDuration`: pin the positivity checks at `main.go:544-549`. - `TestRunWriteBulkVariantRejectsMismatchedHandlesAndValues`: pin the `len(handles) != len(valueTexts)` error at `main.go:447`. Each is a few lines and routes through the existing `runWithIO` entry point, so it does not need a bufconn fake. **Resolution:** 2026-05-20 — Added CLI-level table-driven regression tests in `cmd/mxgw-go/main_test.go` routed through `runWithIO`, so they need no bufconn fake: `TestRunWriteBulkVariantGatesSecuredFlags` pins Client.Go-015 by asserting `write-bulk -current-user-id`, `write-bulk -verifier-user-id`, `write2-bulk -current-user-id`, `write-secured-bulk -user-id`, and `write-secured2-bulk -user-id` all surface `flag provided but not defined`; `TestRunReadBulkRejectsMissingArgs` pins the "session-id and items are required" check across no-flags / missing-items / missing-session-id; `TestRunBenchReadBulkRejectsNonPositiveBulkSize` and `TestRunBenchReadBulkRejectsNonPositiveDuration` pin the positivity checks; `TestRunWriteBulkVariantRejectsMismatchedHandlesAndValues` pins the explicit `item-handles count ... does not match values count ...` error. `go test ./...` passes.