Files

T

Joseph Doherty 82996aa8e6 Resolve Client.Go-022..027: bulk flags, bench cancel, batch loop

Client.Go-022  Re-applied Client.Go-015 shape — runWriteBulkVariant drops
               the unused secured param and gates -current-user-id /
               -verifier-user-id / -user-id behind the secured-only
               variants.
Client.Go-023  Re-applied Client.Go-018 shape — bench warm-up and steady-
               state loops respect ctx.Err().
Client.Go-024  Added SDK-level tests for WriteBulk / Write2Bulk /
               WriteSecuredBulk / WriteSecured2Bulk / ReadBulk and
               StreamAlarms via the existing bufconn fake gateway pattern.
Client.Go-025  Five bulk SDK methods short-circuit on empty input without
               an RPC round-trip and document the behavior.
Client.Go-026  runBatch widens scanner.Buffer to 16 MiB and emits an
               error-with-sentinel if a longer line still arrives, rather
               than aborting the session silently.
Client.Go-027  runBatch treats blank lines as skip-and-continue; only EOF
               ends the session.

All resolved at 2026-05-24; gofmt + go vet + go build + go test ./... all
green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-05-24 08:49:58 -04:00

61 KiB

Raw Blame History

Code Review — Client.Go

Field	Value
Module	`clients/go`
Reviewer	Claude Code
Review date	2026-05-24
Commit reviewed	`42b0037`
Status	Re-reviewed
Open findings	0

Checklist coverage

A re-review of commit a020350 (which resolved Client.Go-011..016). gofmt -l ., go vet ./..., go build ./..., and go test ./... -count=1 are all clean.

#	Category	Result
1	Correctness & logic bugs	Prior Client.Go-001/003/007/011 remain resolved. No new correctness bugs found.
2	mxaccessgw conventions	`gofmt -l .` and `go vet ./...` clean; Client.Go-004 stays resolved. No new issues.
3	Concurrency & thread safety	Client.Go-013 resolved. New issue: `runBenchReadBulk`'s warm-up + steady-state wall-clock loops ignore `ctx` cancellation, so a Ctrl+C or parent-cancel keeps spinning ReadBulk calls until the wall-clock deadline (Client.Go-018).
4	Error handling & resilience	Client.Go-014 resolved. New issue: `parseValue` returns bare `strconv` errors with no `%w` wrap and no CLI-context, so a typo like `-type int32 -value foo` surfaces as `strconv.ParseInt: parsing "foo": invalid syntax` without naming the flag — out of line with the GoStyleGuide "wrap errors with useful context using `%w`" rule (Client.Go-017).
5	Security	No issues found — TLS-by-default with TLS 1.2 floor, API-key redaction in CLI JSON output, no secret logging.
6	Performance & resource management	No issues found — `defer client.Close()` / `defer subscription.Close()` applied consistently; bench-read-bulk preallocates the latency slice.
7	Design-document adherence	No new issues. Lazy `grpc.NewClient` + readiness probe (Client.Go-005) and the shared `dial` helper (Client.Go-009) are applied uniformly across `Dial` and `DialGalaxy`.
8	Code organization & conventions	Client.Go-015 resolved. New issue: `runStreamEvents` does not install a signal handler (Ctrl+C kills the process abruptly), while `runGalaxyWatch` does — the two long-running stream commands have divergent shutdown UX (Client.Go-020).
9	Testing coverage	Client.Go-008/016 resolved. New issue: the six new bulk and bench subcommands (`read-bulk`, `write-bulk`, `write2-bulk`, `write-secured-bulk`, `write-secured2-bulk`, `bench-read-bulk`) have no CLI-level unit tests — in particular the Client.Go-015 secured-flag-gating fix has no regression test (Client.Go-021).
10	Documentation & comments	Client.Go-010/012 resolved. New issue: `runGalaxyWatch` parses `-last-seen-deploy-time` with `time.RFC3339` (no fractional seconds), while `parseRfc3339Timestamp` for `-timestamp-value` accepts `time.RFC3339Nano` — the CLI advertises "RFC 3339" for both but quietly differs on sub-second support (Client.Go-019).

2026-05-24 review (commit `d692232`)

Re-review pass at d692232. Two commits touched clients/go/: 397d3c5 updated the proto root in generate-proto.ps1 to src/ZB.MOM.WW.MxGateway.Contracts/Protos; d692232 dropped stale SessionId = lines from AcknowledgeAlarmRequest / AcknowledgeAlarmReply fixtures in alarms_test.go after the proto retired the field, and substituted CorrelationId: req.GetClientCorrelationId() for the fake server's reply. Module path (gitea.dohertylan.com/dohertj2/mxaccessgw/clients/go) and subpackage mxgateway intentionally unchanged per Go convention. go build ./... and go test ./... are green at HEAD.

#	Category	Result
1	Correctness & logic bugs	No issues found in the a020350..d692232 diff.
2	mxaccessgw conventions	No issues found — Go convention preserved (short lowercase package name, gitea-scoped module path). The generated `_pb.go` descriptor still carries `MxGateway.Contracts.Proto` in its csharp_namespace bytes; wire-level metadata not used by Go, intentionally not regenerated.
3	Concurrency & thread safety	No issues found in this diff.
4	Error handling & resilience	No issues found in this diff.
5	Security	No issues found in this diff.
6	Performance & resource management	No issues found in this diff.
7	Design-document adherence	No issues found in this diff.
8	Code organization & conventions	No issues found in this diff.
9	Testing coverage	No issues found in this diff — `alarms_test.go` correctly drops retired `session_id` from `AcknowledgeAlarmRequest` and retains it on `QueryActiveAlarmsRequest`.
10	Documentation & comments	No issues found in this diff.

2026-05-24 re-review (commit `42b0037`)

Re-review pass at 42b0037. Four commits touched clients/go/: 71d2c39 ported the long-lived batch subcommand; 8aaab82 added the bulk SDK methods (WriteBulk/Write2Bulk/WriteSecuredBulk/WriteSecured2Bulk/ReadBulk) and the six bulk/bench CLI subcommands (read-bulk, write-bulk, write2-bulk, write-secured-bulk, write-secured2-bulk, bench-read-bulk); 6f0d142 ported stream-alarms and acknowledge-alarm (CLI + SDK type aliases + Client.StreamAlarms); 8738735 updated the README to document the new SDK methods. gofmt -l ., go vet ./..., go build ./..., and go test ./... -count=1 are clean at HEAD.

This re-review surfaces two regressions of previously-Resolved findings and four genuinely-new issues introduced by the new bulk / batch / alarm surface. Client.Go-022 and Client.Go-023 are recorded as new IDs (not status flips on the original) because the relevant code was deleted and re-added in 8aaab82, so the prior Resolution remains historically accurate against that earlier commit.

#	Category	Result
1	Correctness & logic bugs	New issue: bulk SDK methods accept a non-nil but empty entry/tag slice and send a no-op bulk command (Client.Go-025).
2	mxaccessgw conventions	No issues found in this diff. `gofmt -l .` and `go vet ./...` clean.
3	Concurrency & thread safety	New issue: `runBenchReadBulk` warm-up + steady-state loops once again ignore `ctx.Err()` — a regression of Client.Go-018 (Client.Go-023).
4	Error handling & resilience	New issue: `runBatch` uses a default 64 KiB `bufio.Scanner` buffer; one long bulk-args line aborts the whole batch session with no EOR sentinel (Client.Go-026).
5	Security	No issues found — `runAcknowledgeAlarm` doesn't log the `-comment` / `-operator` values; the bulk-write SDK doc comments still flag credential-sensitive values.
6	Performance & resource management	No issues found in this diff — `defer client.Close()` / `defer session.Close(...)` / `defer session.UnsubscribeBulk(...)` applied consistently in `runBenchReadBulk`.
7	Design-document adherence	No issues found in this diff. `Client.StreamAlarms` correctly skips `c.callContext` (streams must not carry the unary call timeout); `AcknowledgeAlarm` correctly uses it.
8	Code organization & conventions	New issues: `runWriteBulkVariant` re-introduces the unused `secured` parameter, the dead `_ = secured` line, and unconditionally-registered secured/non-secured flags — regresses Client.Go-015 (Client.Go-022). `runBatch` treats a blank line in the middle of a script as end-of-session (Client.Go-027).
9	Testing coverage	New issue: the five new bulk SDK methods and `Client.StreamAlarms` have no unit tests in `mxgateway/` (Client.Go-024).
10	Documentation & comments	No issues found in this diff. README documents the new `StreamAlarms`/`AcknowledgeAlarm` SDK calls; `Session.ReadBulk` documents the cached-vs-snapshot semantics and `timeout=0` default; `WriteSecuredBulk` flags credential sensitivity.

Findings

Client.Go-001

Field	Value
Severity	High
Category	Correctness & logic bugs
Location	`clients/go/mxgateway/errors.go:88-93`, `clients/go/mxgateway/errors.go:117-128`
Status	Resolved

Description: MxAccessError.Unwrap returns e.Command directly. EnsureMxAccessSuccess constructs &MxAccessError{Reply: reply} with Command left nil (the HRESULT / failing-MxStatusProxy path). When Command is a nil *CommandError, Unwrap() returns a non-nil error interface wrapping a nil pointer. Consequently errors.As(err, &ce) for *CommandError returns true while setting ce to nil — a caller writing the idiomatic if errors.As(err, &commandErr) { use commandErr.Status } nil-dereferences and panics. Verified empirically; the existing test only exercises the populated-Command path.

Recommendation: Make Unwrap return an untyped nil when Command is nil: if e == nil || e.Command == nil { return nil }; return e.Command. Add a test for the HRESULT-only MxAccessError asserting errors.As(err, &ce) is false.

Resolution: Resolved 2026-05-18: MxAccessError.Unwrap now returns an untyped nil when Command is nil, so errors.As no longer binds a typed-nil *CommandError; added errors_test.go regression coverage for the HRESULT-only and populated-Command paths.

Client.Go-002

Field	Value
Severity	Medium
Category	Error handling & resilience
Location	`clients/go/mxgateway/session.go:440-516`
Status	Resolved

Description: For the Events/EventsAfter compatibility API (cancelWhenResultBufferFull == true), when the 16-slot results channel is full sendEventResult cancels and returns false; the goroutine returns and close(results) runs — the consumer sees the channel close with no EventResult{Err: ...} ever delivered. A slow consumer cannot distinguish "stream ended normally" from "events were silently dropped." This contradicts the design doc's "libraries should not reorder, coalesce, or drop events by default", and a test currently pins this lossy behaviour.

Recommendation: Before cancelling on a full buffer, deliver a terminal EventResult carrying an explicit error (e.g. ErrEventBufferOverflow). Document the behaviour on Session.Events; steer callers to SubscribeEvents (which blocks instead of dropping).

Resolution: Resolved 2026-05-18: confirmed against source — on a full bounded buffer the compatibility path cancelled and closed results with no terminal result. Added the exported sentinel ErrEventBufferOverflow (errors.go); sendEventResult now, on a full buffer, cancels the stream then calls the new deliverTerminalResult helper, which evicts one of the oldest buffered events to make room and places EventResult{Err: ErrEventBufferOverflow} so it becomes the consumer's last item before the channel closes. The previously lossy regression test (TestEventsAfterCancelsStreamWhenCompatibilityChannelIsAbandoned) was re-pointed to assert the terminal ErrEventBufferOverflow result is delivered. clients/go/README.md now documents the bounded-buffer/overflow behaviour and steers no-loss callers to SubscribeEvents.

Client.Go-003

Field	Value
Severity	Medium
Category	Correctness & logic bugs
Location	`clients/go/cmd/mxgw-go/main.go:517-532`
Status	Resolved

Description: parseInt32List calls panic(err) when an item-handles token fails to parse as an int32. The CLI is a documented user-facing tool; a typo like -item-handles 1,foo crashes the process with an unrecovered panic and stack trace instead of returning a clean error and exit code 2 like every other validation path in main.go.

Recommendation: Change parseInt32List to return ([]int32, error) and have runUnsubscribeBulk propagate the error, matching parseValue's pattern.

Resolution: Resolved 2026-05-18: confirmed against source — parseInt32List called panic(err) on a malformed token. It now returns ([]int32, error), wrapping the bad token (invalid item handle %q: %w); runUnsubscribeBulk parses item handles before dialing and returns the error, so a typo flows through runWithIO to os.Exit(2) like other validation paths. Regression tests TestParseInt32ListParsesValidTokens and TestParseInt32ListReturnsErrorOnMalformedToken added to cmd/mxgw-go/main_test.go.

Client.Go-004

Field	Value
Severity	Low
Category	mxaccessgw conventions
Location	`clients/go/mxgateway/alarms_test.go:153-154`, `clients/go/mxgateway/galaxy_test.go:58-59`
Status	Resolved

Description: gofmt -l flags alarms_test.go and galaxy_test.go for misaligned struct-literal field padding. The Go client README lists gofmt as part of the workflow and the repo enforces style; unformatted committed code breaks gofmt-gated checks and CI.

Recommendation: Run gofmt -w mxgateway/alarms_test.go mxgateway/galaxy_test.go.

Resolution: Resolved 2026-05-18: confirmed gofmt -l . flagged both files for misaligned struct-literal padding. Ran gofmt -w on mxgateway/alarms_test.go and mxgateway/galaxy_test.go; gofmt -l . is now clean for the whole module.

Client.Go-005

Field	Value
Severity	Low
Category	Design-document adherence
Location	`clients/go/mxgateway/client.go:64,68`, `clients/go/mxgateway/galaxy.go:83,87`
Status	Resolved

Description: The client uses grpc.DialContext with grpc.WithBlock(). In current grpc-go both are deprecated in favour of grpc.NewClient (lazy connection). WithBlock also changes failure semantics: a transient gateway-unavailable at dial time becomes a hard Dial error rather than a connection that recovers when the gateway comes up, working against the design doc's resilience intent.

Recommendation: Migrate to grpc.NewClient; if a fail-fast connect probe is still wanted, do an explicit readiness wait bounded by DialTimeout, and update the doc comment.

Resolution: Resolved 2026-05-18: confirmed Dial/DialGalaxy used the deprecated grpc.DialContext + grpc.WithBlock pair. Migrated both to the shared dial(ctx, opts) helper, which now builds a lazy connection with grpc.NewClient and runs an explicit waitForReady readiness probe (Connect + WaitForStateChange until connectivity.Ready) bounded by DialTimeout — preserving fail-fast behavior while letting an otherwise lazy connection recover when the gateway is briefly down. Note: grpc.NewClient defaults the target scheme to dns, so the bufconn test harnesses (client_session_test.go, alarms_test.go, galaxy_test.go) were updated to use passthrough:///bufnet so the fake target reaches the context dialer. New tests TestDialFailsFastWhenGatewayUnreachable and TestDialReadinessProbeReachesReady cover the probe; go vet reports no deprecation. clients/go/README.md documents the lazy-connect + readiness-probe semantics.

Client.Go-006

Field	Value
Severity	Low
Category	Error handling & resilience
Location	`clients/go/mxgateway/errors.go:9-130`
Status	Resolved

Description: docs/ClientLibrariesDesign.md recommends a high-level error taxonomy (TransportError, AuthenticationError, TimeoutError, etc.). The Go client collapses all transport/gRPC failures into a single GatewayError with no way to classify transient (Unavailable, DeadlineExceeded) vs permanent (Unauthenticated, InvalidArgument) without manually unwrapping and calling status.Code.

Recommendation: Add a helper (e.g. IsTransient(err) bool) or expose the gRPC codes.Code on GatewayError, so retry/timeout/auth handling can be written without re-parsing the wrapped error.

Resolution: Resolved 2026-05-18: implemented the recommended classification surface in errors.go rather than a full parallel type hierarchy (the existing GatewayError/CommandError/MxAccessError chain already separates transport from protocol from MXAccess failures). Added GatewayError.Code() (returns the wrapped gRPC codes.Code, OK for nil, Unknown for a non-status error) and the free function IsTransient(err error) bool, which unwraps through *GatewayError and any gRPC-status chain and reports true for Unavailable, DeadlineExceeded, ResourceExhausted, and Aborted. Tests TestGatewayErrorCode and TestIsTransient cover the matrix; clients/go/README.md documents both for retry/timeout/auth handling.

Client.Go-007

Field	Value
Severity	Low
Category	Correctness & logic bugs
Location	`clients/go/mxgateway/session.go:526-532`
Status	Resolved

Description: newCorrelationID returns an empty string when crypto/rand.Read fails, silently producing an MxCommandRequest with no correlation id. rand.Read failure is rare, but the failure mode (untraceable command, no error surfaced) is worse than failing loud, and the empty-id path is untested.

Recommendation: Either propagate the error up through invokeCommand, or fall back to a time/counter-based id rather than an empty string.

Resolution: Resolved 2026-05-18: confirmed newCorrelationID returned "" on a rand.Read failure. It now falls back to a non-empty "fallback-<unixnano>-<counter>" id built from time.Now().UnixNano() and a process-wide atomic.Uint64 monotonic counter, so every command stays traceable even without entropy. The crypto/rand call was routed through a randRead package variable so the failure path is testable; TestNewCorrelationIDFallsBackOnRandFailure simulates a rand.Read failure and asserts the fallback id is non-empty, fallback- prefixed, and unique, and TestNewCorrelationIDUsesRandEntropy pins the happy path.

Client.Go-008

Field	Value
Severity	Low
Category	Testing coverage
Location	`clients/go/mxgateway/` (test files)
Status	Resolved

Description: Several critical paths are untested: TLS credential resolution in resolveTransportCredentials (only the Plaintext path is exercised); the callContext deadline-shortening logic (client.go:198-204) including the negative-timeout disable case; and NativeValue/NativeArray for the array, raw-bytes, null, and unsupported-kind branches.

Recommendation: Add unit tests for resolveTransportCredentials precedence, callContext deadline arithmetic, and NativeValue/NativeArray round-trips for every kind.

Resolution: Resolved 2026-05-18: added clients/go/mxgateway/coverage_test.go. TestResolveTransportCredentialsPrecedence exercises every branch (explicit TransportCredentials, Plaintext, missing CACertFile error, TLSConfig + ServerNameOverride, default TLS floor) and TestResolveTransportCredentialsDoesNotMutateTLSConfig confirms the supplied *tls.Config is cloned. TestCallContextDeadlineArithmetic covers zero/default, negative-disable, positive timeout, caller-deadline-sooner-kept, and caller-deadline-later-shortened. TestNativeValueEdgeKinds, TestNativeArrayEdgeKinds, and TestNativeValueUnsupportedKind cover the null, raw-bytes (including the no-alias copy), array, timestamp-with-nil, and unsupported-kind branches.

Client.Go-009

Field	Value
Severity	Low
Category	Code organization & conventions
Location	`clients/go/mxgateway/galaxy.go:60-93,241-256`, `clients/go/mxgateway/client.go:41-74,190-205`
Status	Resolved

Description: DialGalaxy/Dial and GalaxyClient.callContext/Client.callContext are near-identical duplicates (dial-context setup, credential resolution, dial-option assembly, deadline arithmetic). A fix to one (e.g. the Client.Go-005 dial migration) must be applied twice and can drift.

Recommendation: Extract a shared unexported dial(ctx, opts) and a free callContext(opts, ctx) function, and have both client constructors call them.

Resolution: Resolved 2026-05-18: extracted the shared unexported dial(ctx, opts) (*grpc.ClientConn, error) (credential resolution, dial-option assembly, grpc.NewClient, readiness probe) and the free callContext(ctx, callTimeout) (context.Context, context.CancelFunc) into client.go. Dial/DialGalaxy and both (*Client).callContext/(*GalaxyClient).callContext methods now delegate to them; the duplicated dial and deadline code in galaxy.go was removed (its now-unused errors import dropped). This was done together with the Client.Go-005 migration so the grpc.NewClient change lives in exactly one place.

Client.Go-010

Field	Value
Severity	Low
Category	Documentation & comments
Location	`clients/go/mxgateway/client.go:39-40`
Status	Resolved

Description: The Dial doc comment states it configures "blocking dial cancellation from ctx." This describes the deprecated WithBlock behaviour; once Client.Go-005 is addressed the comment is misleading about how connection establishment and cancellation work.

Recommendation: Reword to describe the actual connect/timeout semantics after resolving Client.Go-005, and clarify that DialTimeout bounds the initial connect attempt.

Resolution: Resolved 2026-05-18: alongside the Client.Go-005 migration, the Dial doc comment was rewritten to describe the lazy grpc.NewClient connection, the DialTimeout-bounded (default 10s, or ctx deadline when sooner) readiness probe, that a briefly-unavailable gateway recovers instead of producing a hard error, and that cancelling ctx aborts the probe. DialGalaxy and the new dial/waitForReady/callContext helpers carry matching doc comments.

Client.Go-011

Field	Value
Severity	Low
Category	Correctness & logic bugs
Location	`clients/go/mxgateway/alarms_test.go:66-73`
Status	Resolved

Description: TestAcknowledgeAlarmRejectsNilRequest contains a no-op if with an empty body whose intent is documented in a comment ("Accept either: the helper returned the literal sentinel, or the generic transport error — both prove nil was rejected"). The condition

if err == nil || !errors.Is(err, errors.Unwrap(err)) && err.Error() != "mxgateway: acknowledge alarm request is required" {
    // ...
}

evaluates expressions for side effects only and asserts nothing — Go's && binds tighter than ||, the body is empty, and the actual nil check happens on the very next if err == nil. The block is effectively dead code masquerading as a check. It also evaluates errors.Unwrap(err) regardless of err's shape, and would call err.Error() even when err might be a wrapped status error whose message wording the gateway is free to change — making the apparent assertion brittle on top of being dead.

Recommendation: Drop the empty-body if entirely (the subsequent if err == nil { t.Fatalf(...) } already enforces the contract), or, if the intent is to additionally pin the literal error message for the sentinel path, replace it with a real assertion (if err.Error() != "mxgateway: acknowledge alarm request is required" { t.Fatalf(...) }) and remove the spurious errors.Is(err, errors.Unwrap(err)) clause.

Resolution: 2026-05-20 — Removed the empty-body if in TestAcknowledgeAlarmRejectsNilRequest; the subsequent if err == nil { t.Fatalf(...) } already enforces the nil-rejection contract without the dead, brittle compound predicate.

Client.Go-012

Field	Value
Severity	Low
Category	Documentation & comments
Location	`clients/go/cmd/mxgw-go/main.go:1063-1065`, `clients/go/cmd/mxgw-go/main.go:88-104`
Status	Resolved

Description: writeUsage lists the available subcommands as version|open-session|close-session|register|add-item|advise|subscribe-bulk|unsubscribe-bulk|write|stream-events|smoke|galaxy-test-connection|galaxy-last-deploy|galaxy-discover|galaxy-watch. Six subcommands wired into run are missing from this list: read-bulk, write-bulk, write2-bulk, write-secured-bulk, write-secured2-bulk, and bench-read-bulk. A user invoking mxgw-go with no args or an unknown command (the two paths that print this banner) sees an incomplete CLI surface and may believe the bulk-write / read-bulk families are not implemented. The README does document them, but the inline usage banner is the first source of truth a CLI user consults.

Recommendation: Extend the usage string to include every command registered in the switch args[0] in run, or generate it from a single source-of-truth slice keyed on command name → handler so the two cannot drift again.

Resolution: 2026-05-20 — writeUsage now lists the previously missing read-bulk, write-bulk, write2-bulk, write-secured-bulk, write-secured2-bulk, and bench-read-bulk subcommands alongside the original surface, so the no-args / unknown-command banner reflects every command wired into run.

Client.Go-013

Field	Value
Severity	Low
Category	Concurrency & thread safety
Location	`clients/go/cmd/mxgw-go/main.go:1246-1249`, `clients/go/cmd/mxgw-go/main.go:1257-1262`
Status	Resolved

Description: In runGalaxyWatch, the signal-cancellation branch carefully drains the buffered events channel after cancelStream() so the WatchDeployEvents goroutine can exit (for range events { }). The limit-reached branch (if *limit > 0 && count >= *limit { cancelStream(); return nil }) skips that drain and returns immediately. After the function returns, defer client.Close() runs and tears down the gRPC connection; in the gap before the connection close propagates, the WatchDeployEvents goroutine may still be blocked on case events <- event: (the channel is buffered to 16 but a slow producer can refill it) — the goroutine then exits via <-ctx.Done() because streamCtx was cancelled, so it isn't a permanent leak, but the two cancellation paths behave inconsistently and the limit-reached path can briefly hold a goroutine plus the gRPC stream while the client tears down underneath it.

Recommendation: Factor the drain into a helper and use it from both branches, e.g. after cancelStream() always for range events { } (and let the surrounding select/for re-evaluate <-errs if a terminal error was already buffered). Alternatively, drop the explicit drain in both branches and rely on defer cancelStream() plus defer client.Close() — but pick one model and apply it consistently.

Resolution: 2026-05-20 — The limit-reached branch in runGalaxyWatch now drains the buffered events channel (for range events { }) after cancelStream(), matching the signal-cancel branch. Both cancellation paths now wait for the WatchDeployEvents goroutine to exit before defer client.Close() tears the gRPC connection down.

Client.Go-014

Field	Value
Severity	Low
Category	Error handling & resilience
Location	`clients/go/mxgateway/session.go:602`, `clients/go/mxgateway/galaxy.go:189`
Status	Resolved

Description: Two stream Recv loops compare end-of-stream with err == io.EOF directly:

session.go:602 — if err == io.EOF || status.Code(err) == codes.Canceled || streamCtx.Err() != nil { return }
galaxy.go:189 — if recvErr == io.EOF { return }

gRPC's generated Recv() does return the io.EOF sentinel directly today, so the comparisons work in practice. However, the Go idiom (and the project's docs/style-guides/GoStyleGuide.md) is to use errors.Is(err, io.EOF) so future wrapping (e.g. an interceptor decorating Recv errors) does not silently flip the loop from "stream finished normally" to "stream produced an error". The mxgateway client itself wraps non-EOF Recv errors in *GatewayError, which errors.Is already supports — using errors.Is keeps both paths consistent.

Recommendation: Replace recvErr == io.EOF / err == io.EOF with errors.Is(err, io.EOF) (the errors package is already imported in both files).

Resolution: 2026-05-20 — Both stream Recv loops now use errors.Is(err, io.EOF): session.go already imported errors, and galaxy.go gained the missing errors import alongside the recvErr == io.EOF → errors.Is(recvErr, io.EOF) change, keeping EOF detection robust against any future Recv-error wrapping.

Client.Go-015

Field	Value
Severity	Low
Category	Code organization & conventions
Location	`clients/go/cmd/mxgw-go/main.go:410-512`
Status	Resolved

Description: runWriteBulkVariant(ctx, args, stdout, stderr, command, withTimestamp, secured bool) accepts secured but never uses it — the routing is keyed on command (the string "write-bulk" / "write2-bulk" / "write-secured-bulk" / "write-secured2-bulk"). The function ends with _ = secured // currently only used for routing above; reserved for future per-variant validation, which is misleading because secured is not in fact used for routing. The four wrapper functions (runWriteBulk, runWrite2Bulk, runWriteSecuredBulk, runWriteSecured2Bulk) all pass a secured argument that has no effect. The four CLI options -current-user-id, -verifier-user-id are unconditionally registered on every variant, including the non-secured ones, so a write-bulk invocation that passes -current-user-id 42 silently does nothing. Either remove secured and the dead _ = secured comment, or use it to gate the registration of secured-only flags so wrong combinations are rejected with a clean error.

Recommendation: Drop the secured parameter (the command switch already distinguishes the four variants) and the misleading _ = secured line; or, if validation is the goal, branch flag registration on secured so secured-only flags are unavailable for the non-secured variants and emit a clean usage error if they appear.

Resolution: 2026-05-20 — Dropped the unused secured parameter from runWriteBulkVariant (the command switch already distinguishes the four variants) and removed the misleading _ = secured line. The variant is now derived locally from command and used to gate flag registration: -current-user-id / -verifier-user-id are only registered for the secured variants and -user-id only for Write/Write2, so a wrong-variant flag now fails with a clean flag provided but not defined usage error instead of silently no-op'ing. The four runWrite*Bulk wrappers were updated to match the new signature.

Client.Go-016

Field	Value
Severity	Low
Category	Testing coverage
Location	`clients/go/mxgateway/galaxy_test.go:382-429`
Status	Resolved

Description: fakeGalaxyServer.watchSendInterval is declared on the test fake and consulted inside WatchDeployEvents (if s.watchSendInterval > 0 { ... }) but no test in the package sets a non-zero value. The dead field plus its branch were presumably added to support a backpressure / pacing test that was never landed, and now the only effect is reader confusion ("which test uses this?") and a pointlessly larger fake. Backpressure on the bootstrap-plus-events sequence is also genuinely worth testing, given that WatchDeployEvents writes to a 16-deep buffered channel.

Recommendation: Either delete the unused watchSendInterval field and its branch in WatchDeployEvents, or add the test it was added for — e.g. one that pumps more than 16 events with a small interval and asserts the consumer keeps up without losing or reordering events. Linking the field to a // for TestX comment if it stays would also help.

Resolution: 2026-05-20 — Removed the unused watchSendInterval field from fakeGalaxyServer and the corresponding if s.watchSendInterval > 0 { ... } branch in WatchDeployEvents; no test set the field, so the dead code path is gone and the fake is leaner. gofmt -w reflowed the struct to drop the no-longer-needed field-name padding.

Client.Go-017

Field	Value
Severity	Low
Category	Error handling & resilience
Location	`clients/go/cmd/mxgw-go/main.go:954-991`
Status	Resolved

Description: parseValue returns the raw strconv.ParseBool / strconv.ParseInt / strconv.ParseFloat error verbatim — no wrap with %w and no indication of which CLI flag was the source. A user running mxgw-go write -type int32 -value foo sees

strconv.ParseInt: parsing "foo": invalid syntax

with no mention of -value, -type, or which subcommand failed. The same pattern hits every typed branch (bool, int32, int64, float, double). Compare with the sibling helpers in the same file: parseInt32List wraps with "invalid item handle %q: %w" (Client.Go-003 resolution) and parseRfc3339Timestamp wraps with "invalid RFC 3339 timestamp %q: %w". parseValue was missed and is inconsistent with those two. The GoStyleGuide (docs/style-guides/GoStyleGuide.md, "Errors" section) requires "Wrap errors with useful context using %w."

Recommendation: Wrap each strconv error with the offending input and type, e.g. return nil, fmt.Errorf("invalid %s value %q: %w", valueType, valueText, err). The wrapper handles all five typed branches uniformly without a per-branch change.

Resolution: 2026-05-20 — Each typed branch of parseValue now wraps the bare strconv error with %w and names the offending flag and value ("invalid -value for -type %s: %q: %w"), so mxgw-go write -type int32 -value foo surfaces the source flag, the requested type, and the bad token while still letting errors.Is/As reach the underlying strconv sentinel. The new TestParseValueWrapsStrconvErrorWithFlagContext table-test pins all five typed branches (bool, int32, int64, float, double) to the new wrapper shape.

Client.Go-018

Field	Value
Severity	Low
Category	Concurrency & thread safety
Location	`clients/go/cmd/mxgw-go/main.go:593-623`
Status	Resolved

Description: runBenchReadBulk's warm-up and steady-state loops are wall-clock-only:

for time.Now().Before(warmupDeadline) {
    _, _ = session.ReadBulk(ctx, serverHandle, tags, timeout)
}
...
for time.Now().Before(steadyDeadline) {
    callStart := time.Now()
    results, err := session.ReadBulk(ctx, serverHandle, tags, timeout)
    ...
}

Neither loop checks ctx.Done() / ctx.Err(). If the parent context is cancelled (e.g. the operator Ctrl+Cs the benchmark, or the cross-language bench driver scripts/bench-read-bulk.ps1 times out and kills the child early), the loops keep iterating until their wall-clock deadlines elapse. Each ReadBulk call inside fails fast (the gRPC call inherits the cancelled context and returns context.Canceled), but the steady-state loop counts those as failedCalls++ and keeps spinning — wasting CPU and inflating the failedCalls and latencyMs.max figures the PowerShell driver collates across all five clients. The .NET, Rust, Python, and Java bench drivers should be checked for the same shape, but the Go one is the only one being reviewed here. Note that runBenchReadBulk is the only Go CLI command that does NOT register its own signal handler (compare with runGalaxyWatch which does via signal.NotifyContext).

Recommendation: Drop out of both loops as soon as ctx.Err() != nil. Concretely, change the loop conditions to for time.Now().Before(warmupDeadline) && ctx.Err() == nil (and the same on steadyDeadline), or use a select { case <-ctx.Done(): break loop; default: } guard at the top of each iteration. The cross-language bench shape (durationMs, totalCalls, failedCalls, latencyMs) stays the same — the bench just exits sooner and reports the truncated window faithfully.

Resolution: 2026-05-20 — Both the warm-up and steady-state loops in runBenchReadBulk now carry an && ctx.Err() == nil guard alongside the wall-clock check, so a cancelled parent context (Ctrl+C, or the cross-language bench driver killing the child early) breaks the loop instead of spinning failing ReadBulk calls until the deadline elapses. The cross-language bench JSON shape is unchanged — the truncated window is just reported faithfully via durationMs / totalCalls.

Client.Go-019

Field	Value
Severity	Low
Category	Documentation & comments
Location	`clients/go/cmd/mxgw-go/main.go:710-716`, `clients/go/cmd/mxgw-go/main.go:1204,1213`
Status	Resolved

Description: The CLI advertises two timestamp flags as "RFC3339" but parses them with different layouts:

-timestamp-value (write2/write-secured2 bulk): parseRfc3339Timestamp uses time.RFC3339Nano, which accepts both 2026-04-28T10:00:00Z and 2026-04-28T10:00:00.123456789Z.
-last-seen-deploy-time (galaxy-watch): time.Parse(time.RFC3339, ...), which rejects fractional seconds.

A user copy-pasting an ObservedAt timestamp from galaxy-watch -json (which is emitted as RFC3339Nano by formatDeployEvent) directly into -last-seen-deploy-time will get a parse error if the source value carried a fractional component, even though both flag descriptions say "RFC3339". The flag help string at main.go:1204 literally says "RFC3339 timestamp", and the README example uses 2026-04-28T10:00:00Z (whole seconds only), so the issue is silent until a fractional timestamp comes from the gateway.

Recommendation: Switch the galaxy-watch parse to time.RFC3339Nano to match parseRfc3339Timestamp (and the gateway's own emit format). One line change at main.go:1213. While there, update the flag help string and the README example to say "RFC 3339 (with optional fractional seconds)" so the two flags are documented uniformly.

Resolution: 2026-05-20 — runGalaxyWatch now parses -last-seen-deploy-time with time.RFC3339Nano, matching parseRfc3339Timestamp and the gateway's own formatDeployEvent emit format; the layout is strictly broader than the previous time.RFC3339 (whole-second values still parse). The flag help string changed to "RFC 3339 timestamp (with optional fractional seconds)" and the clients/go/README.md example was extended with an explicit fractional-seconds line so the two flags advertise the same surface.

Client.Go-020

Field	Value
Severity	Low
Category	Code organization & conventions
Location	`clients/go/cmd/mxgw-go/main.go:753-802`, `clients/go/cmd/mxgw-go/main.go:1199-1275`
Status	Resolved

Description: The two long-running stream commands have divergent Ctrl+C UX:

runGalaxyWatch registers a signal handler:

signalCtx, stopSignals := signal.NotifyContext(ctx, os.Interrupt, syscall.SIGTERM)
defer stopSignals()
streamCtx, cancelStream := context.WithCancel(signalCtx)

so Ctrl+C drains buffered events and returns cleanly.

runStreamEvents does not register any signal handler — its parent context is context.Background() from runWithIO, so Ctrl+C abruptly kills the process. The deferred subscription.Close() and client.Close() never run, leaving the server-side stream to fault out on a torn TCP connection rather than a clean cancel.

The two commands are otherwise structurally identical (subscribe + loop until limit or external stop) — the inconsistency is one half of a pair that was missed when galaxy-watch was added. Worth flagging because it directly affects what an integrator who Ctrl+Cs stream-events sees in the gateway's logs (a transport reset rather than a codes.Canceled).

Recommendation: Mirror the runGalaxyWatch pattern in runStreamEvents: wrap ctx in signal.NotifyContext(ctx, os.Interrupt, syscall.SIGTERM), derive streamCtx from it, and let defer subscription.Close() / defer cancelStream() tear the stream down on signal. The change is roughly six lines and brings the two stream commands into parity. Optionally factor a shared withSignals(ctx) (context.Context, context.CancelFunc) helper if a third stream command lands.

Resolution: 2026-05-20 — runStreamEvents now installs signal.NotifyContext(ctx, os.Interrupt, syscall.SIGTERM) (with a deferred stopSignals()) and derives streamCtx from the resulting signal-aware context, mirroring runGalaxyWatch. Ctrl+C now cancels the gRPC stream cleanly — the gateway sees codes.Canceled instead of a torn TCP connection — and the deferred subscription.Close() / client.Close() actually run on signal. The two long-running stream commands now share the same shutdown UX.

Client.Go-021

Field	Value
Severity	Low
Category	Testing coverage
Location	`clients/go/cmd/mxgw-go/main_test.go`, `clients/go/cmd/mxgw-go/main.go:363-520,522-655`
Status	Resolved

Description: The six bulk / bench subcommands wired into run (read-bulk, write-bulk, write2-bulk, write-secured-bulk, write-secured2-bulk, bench-read-bulk) have no CLI-level unit tests in main_test.go. In particular, the Client.Go-015 resolution claims:

-current-user-id / -verifier-user-id are only registered for the secured variants and -user-id only for Write/Write2, so a wrong-variant flag now fails with a clean flag provided but not defined usage error instead of silently no-op'ing.

But there is no test asserting that, e.g., mxgw-go write-bulk -current-user-id 1 ... returns a "flag provided but not defined" error, or that mxgw-go write-secured-bulk -user-id 1 ... does the same. A future refactor of runWriteBulkVariant (notably one that re-introduced the secured parameter) could silently re-permit the wrong flags without breaking any test. The same gap applies to: parameter validation in runReadBulk (bulk size, empty session/items rejection), the value-count vs handle-count mismatch error in runWriteBulkVariant:447, and runBenchReadBulk's bulk-size/duration-seconds positivity checks.

mxgateway/client_session_test.go already covers the library-level happy paths (TestWriteBulkBuildsOneBulkCommandAndReturnsPerEntryResults, TestReadBulkForwardsTimeoutAndUnpacksCachedFlag, TestSubscribeBulkBuildsOneBulkCommandAndReturnsResults), so this finding is about CLI surface area only.

Recommendation: Add table-driven tests in cmd/mxgw-go/main_test.go along the existing TestParseInt32List* and TestParseValueBuildsTypedValue style:

TestRunWriteBulkVariantGatesSecuredFlags: invoke runWithIO with write-bulk -current-user-id 1 ... and write-secured-bulk -user-id 1 ..., assert each returns an error matching flag provided but not defined.
TestRunReadBulkRejectsMissingArgs: invoke runWithIO with read-bulk (no -session-id), assert the documented "session-id and items are required" error.
TestRunBenchReadBulkRejectsNonPositiveBulkSize / TestRunBenchReadBulkRejectsNonPositiveDuration: pin the positivity checks at main.go:544-549.
TestRunWriteBulkVariantRejectsMismatchedHandlesAndValues: pin the len(handles) != len(valueTexts) error at main.go:447.

Each is a few lines and routes through the existing runWithIO entry point, so it does not need a bufconn fake.

Resolution: 2026-05-20 — Added CLI-level table-driven regression tests in cmd/mxgw-go/main_test.go routed through runWithIO, so they need no bufconn fake: TestRunWriteBulkVariantGatesSecuredFlags pins Client.Go-015 by asserting write-bulk -current-user-id, write-bulk -verifier-user-id, write2-bulk -current-user-id, write-secured-bulk -user-id, and write-secured2-bulk -user-id all surface flag provided but not defined; TestRunReadBulkRejectsMissingArgs pins the "session-id and items are required" check across no-flags / missing-items / missing-session-id; TestRunBenchReadBulkRejectsNonPositiveBulkSize and TestRunBenchReadBulkRejectsNonPositiveDuration pin the positivity checks; TestRunWriteBulkVariantRejectsMismatchedHandlesAndValues pins the explicit item-handles count ... does not match values count ... error. go test ./... passes.

Client.Go-022

Field	Value
Severity	Medium
Category	Code organization & conventions
Location	`clients/go/cmd/mxgw-go/main.go:398-412,417-519`
Status	Resolved

Description: Commit 8aaab82 ("Go client: port bulk read/write SDK methods + CLI subcommands") re-introduces every symptom that Client.Go-015 documented and was marked Resolved against an earlier commit:

runWriteBulkVariant(ctx, args, stdout, stderr, command, withTimestamp, secured bool) accepts a secured parameter that is never used for routing — the switch at main.go:478 keys on command ("write-bulk" / "write2-bulk" / "write-secured-bulk" / "write-secured2-bulk"). The function ends with the misleading _ = secured // currently only used for routing above; reserved for future per-variant validation at main.go:517, the exact line Client.Go-015 called out as both inaccurate (secured is not used for routing) and reserved-without-purpose.
All four wrappers (runWriteBulk, runWrite2Bulk, runWriteSecuredBulk, runWriteSecured2Bulk at main.go:398-412) pass a secured argument that has no effect.
-current-user-id, -verifier-user-id, and -user-id are all registered unconditionally for every variant (main.go:427-429), so mxgw-go write-bulk -current-user-id 5 -item-handles 1 -values foo silently no-ops the flag instead of failing with flag provided but not defined. The secured-only flag gating that Client.Go-015's resolution claimed is gone again.

Because the surrounding test file (main_test.go) lost the regression tests promised by Client.Go-021 between then and this commit, nothing caught the re-introduction.

Recommendation: Re-apply the Client.Go-015 fix on this re-added code. Drop the secured parameter and the _ = secured line (the command switch is the only routing key); derive the variant locally from command; register -current-user-id / -verifier-user-id only inside the secured branches and -user-id only inside Write/Write2 — so a wrong-variant flag fails with a clean flag provided but not defined usage error. Re-add the TestRunWriteBulkVariantGatesSecuredFlags table-test from the Client.Go-021 resolution so a future regression is caught by CI.

Resolution: 2026-05-24 — Re-applied the Client.Go-015 fix. Dropped the unused secured parameter from runWriteBulkVariant and the misleading _ = secured line; the variant is now derived locally from command and gates flag registration. -current-user-id / -verifier-user-id are only registered for the secured variants and -user-id only for Write/Write2, so a wrong-variant flag now fails with a clean flag provided but not defined usage error. The four runWrite*Bulk wrappers were updated to match the new signature. Regression test TestRunWriteBulkVariantGatesSecuredFlags in cmd/mxgw-go/main_test.go (table-driven across all five wrong-variant flag/command pairs) was re-added; it failed pre-fix on every case ("session-id, item-handles, and values are required" reached because the flag was silently accepted), and passes post-fix with the expected flag provided but not defined.

Client.Go-023

Field	Value
Severity	Medium
Category	Concurrency & thread safety
Location	`clients/go/cmd/mxgw-go/main.go:604-606,616-632`
Status	Resolved

Description: runBenchReadBulk's warm-up and steady-state loops are wall-clock-only again:

warmupDeadline := time.Now().Add(time.Duration(*warmupSeconds) * time.Second)
timeout := time.Duration(*timeoutMs) * time.Millisecond
for time.Now().Before(warmupDeadline) {
    _, _ = session.ReadBulk(ctx, serverHandle, tags, timeout)
}
...
for time.Now().Before(steadyDeadline) {
    callStart := time.Now()
    results, err := session.ReadBulk(ctx, serverHandle, tags, timeout)
    ...
}

Neither loop checks ctx.Done() / ctx.Err(). This is exactly the shape Client.Go-018 documented and was marked Resolved against an earlier commit — the && ctx.Err() == nil guard the Resolution added has been removed by commit 8aaab82 (which re-added runBenchReadBulk). If the cross-language driver scripts/bench-read-bulk.ps1 times out and kills the child early, or an operator Ctrl+Cs the bench, each ReadBulk call inside fails fast with context.Canceled but the steady-state loop counts those as failedCalls++ and keeps spinning for the full *durationSeconds window. That inflates failedCalls and latencyMs.max in the JSON the PowerShell driver collates across all five clients, and wastes CPU. Note that runBenchReadBulk is also the only Go CLI command that does not register its own signal handler — runStreamAlarms (added in the same diff range) and runGalaxyWatch both do.

Recommendation: Re-apply the Client.Go-018 fix: change both loop conditions to for time.Now().Before(warmupDeadline) && ctx.Err() == nil (and the same on steadyDeadline). The cross-language bench JSON shape is unchanged — the truncated window is just reported faithfully via durationMs / totalCalls. Optionally add the signal.NotifyContext pattern used by runStreamAlarms and runGalaxyWatch so direct Ctrl+C on the bench also short-circuits cleanly.

Resolution: 2026-05-24 — Re-applied the Client.Go-018 fix. Both the warm-up and steady-state loops in runBenchReadBulk now carry an && ctx.Err() == nil guard alongside the wall-clock check, so a cancelled parent context breaks the loops instead of spinning failing ReadBulk calls until the deadline elapses. The cross-language bench JSON shape (durationMs / totalCalls) is unchanged — the truncated window is just reported faithfully. Regression test TestRunBenchReadBulkRespectsContextCancellation in cmd/mxgw-go/main_test.go spins up a localhost TCP gRPC fake (benchFakeGateway) that answers OpenSession + Invoke for the register/subscribe/read/unsubscribe sequence, runs the bench with -warmup-seconds 5 -duration-seconds 5, cancels the ctx after 150ms, and asserts the bench returns in under 4s. Pre-fix the test ran for the full 10s (warmup+duration); post-fix it returns within ~250ms.

Client.Go-024

Field	Value
Severity	Low
Category	Testing coverage
Location	`clients/go/mxgateway/session.go:395-525`, `clients/go/mxgateway/alarms.go:65-76`
Status	Resolved

Description: The five new bulk SDK methods on Session and the new Client.StreamAlarms method have no unit tests in clients/go/mxgateway/:

Session.WriteBulk (session.go:395)
Session.Write2Bulk (session.go:418)
Session.WriteSecuredBulk (session.go:442)
Session.WriteSecured2Bulk (session.go:465)
Session.ReadBulk (session.go:495) — in particular the timeout > uint32(^uint32(0)) ms saturation branch at session.go:504-509 is interesting (a time.Duration is int64 ns; the conversion silently clamps to MaxUint32 ms ≈ 49 days) and is entirely untested.
Client.StreamAlarms (alarms.go:65) — no equivalent of TestQueryActiveAlarmsPassesFilterPrefix exists for StreamAlarms, even though they share the same fake-server pattern.

mxgateway/client_session_test.go and mxgateway/alarms_test.go cover the prior single-item and bulk methods (TestWriteBulkBuildsOneBulkCommandAndReturnsPerEntryResults, TestReadBulkForwardsTimeoutAndUnpacksCachedFlag, etc. were referenced by Client.Go-021 but Grep of the current tree confirms they no longer exist — the test file ends at commit 8aaab82 without any of them), so library-level happy paths and error paths for the new shape are entirely uncovered.

Recommendation: Add bufconn-fake-server tests in mxgateway/client_session_test.go and mxgateway/alarms_test.go:

TestWriteBulkBuildsOneBulkCommandAndReturnsPerEntryResults — confirm the protobuf payload is MX_COMMAND_KIND_WRITE_BULK carrying every entry, and the reply's BulkWriteResult list flows back. Mirror for Write2Bulk, WriteSecuredBulk, WriteSecured2Bulk.
TestReadBulkForwardsTimeoutAndUnpacksCachedFlag — pin timeoutMs arithmetic for 0, a normal time.Duration, and the > MaxUint32 ms saturation branch; assert WasCached propagates from each BulkReadResult.
TestStreamAlarmsPassesFilterPrefix — mirror TestQueryActiveAlarmsPassesFilterPrefix; assert request flows and stream Recv returns each fake AlarmFeedMessage (active-alarm snapshot, snapshot-complete sentinel, transition).

These plus nil / empty-slice rejection tests for each bulk method close out the new public surface.

Resolution: 2026-05-24 — Added SDK-level tests using the existing newBufconnClient / newBufconnClientWithAlarms fake-gateway pattern. In clients/go/mxgateway/client_session_test.go: TestWriteBulkBuildsOneBulkCommandAndReturnsPerEntryResults confirms the protobuf payload carries MX_COMMAND_KIND_WRITE_BULK with all entries and returns per-entry results; TestWriteBulkRejectsNilEntries pins the nil guard on all five new bulk methods (WriteBulk/Write2Bulk/WriteSecuredBulk/WriteSecured2Bulk/ReadBulk); TestReadBulkForwardsTimeoutAndUnpacksCachedFlag pins normal timeoutMs arithmetic and WasCached propagation; TestReadBulkSaturatesTimeoutAboveMaxUint32 pins the > MaxUint32 ms clamp at session.go:504-509. In clients/go/mxgateway/alarms_test.go: TestStreamAlarmsPassesFilterPrefixAndReceivesFeedMessages asserts request flows and stream Recv returns each fake AlarmFeedMessage (active-alarm snapshot, snapshot-complete sentinel) with auth metadata attached; TestStreamAlarmsRejectsNilRequest pins the nil guard. The fakeGatewayWithAlarms was extended with a StreamAlarms method.

Client.Go-025

Field	Value
Severity	Low
Category	Correctness & logic bugs
Location	`clients/go/mxgateway/session.go:395-485,495-525`
Status	Resolved

Description: The five new bulk methods (WriteBulk, Write2Bulk, WriteSecuredBulk, WriteSecured2Bulk, ReadBulk) each guard with if entries == nil { return error } and an upper-bound ensureBulkSize check, but accept a non-nil empty slice (e.g. []*WriteBulkEntry{} or []string{}). The call then sends an MX_COMMAND_KIND_WRITE_BULK (or peer) command with zero entries across the gRPC wire to the gateway, which forwards to the worker for a no-op round trip. This is the same shape Client.Go-015 / Client.Go-021 were written against (the CLI now also accepts mxgw-go write-bulk -item-handles , -values , which parseInt32List returns as empty without error). The pre-existing bulk methods (AddItemBulk, AdviseItemBulk, etc. at session.go:253-343) carry the identical pattern, so this is a long-standing convention — but it's still a real cost on the hot path. The Java / .NET / Rust / Python clients should be checked for parity if this is fixed.

Recommendation: Pick one model and apply it consistently to every bulk SDK method (new and pre-existing):

Reject empty slices early: if len(entries) == 0 { return nil, errors.New("mxgateway: write bulk entries are required") } — this is the spelling already used on the nil branch, and matches the GoStyleGuide's "Return errors instead of panicking" / "Wrap errors with useful context" voice.
Or document at the SDK doc-comment that an empty slice is a valid no-op (it produces []*BulkWriteResult{} and one wasted gRPC round trip), and have the CLI reject empty -item-handles / -values before dialing.

Option 1 is cheaper for callers (one less round trip and one clearer error message) and removes the empty-list footgun for cross-language drivers that may pass empty arrays from PowerShell , splits.

Resolution: 2026-05-24 — Audited the four other clients for parity: .NET (MxGatewaySession.cs:520), Rust (session.rs:408 via ensure_bulk_size), Python (session.py:350), and Java (MxGatewaySession.java:451) all accept empty slices and make the round-trip with zero entries. To preserve cross-language behaviour (no error on empty input) while removing the wasteful round trip on the Go hot path, all five new bulk methods (WriteBulk, Write2Bulk, WriteSecuredBulk, WriteSecured2Bulk, ReadBulk) now short-circuit on len(entries) == 0 and return an empty result slice without invoking the command. The nil guard is preserved (returns the "...are required" error) and the SDK doc comments now document the empty-slice no-op shape explicitly. Regression test TestBulkMethodsShortCircuitOnEmptySliceWithoutRoundTrip in client_session_test.go invokes each of the five methods with an empty slice and asserts (a) no error, (b) zero-length result, and (c) fake.invokeRequest == nil (no gRPC round trip). Pre-fix the test failed on the first assertion ("WriteBulk(empty) sent a round trip; expected short-circuit"); post-fix it passes.

Client.Go-026

Field	Value
Severity	Low
Category	Error handling & resilience
Location	`clients/go/cmd/mxgw-go/main.go:1196-1222`
Status	Resolved

Description: runBatch reads command lines with a default bufio.Scanner:

scanner := bufio.NewScanner(in)
for scanner.Scan() {
    ...
}
return scanner.Err()

The default bufio.Scanner token size is 64 KiB (bufio.MaxScanTokenSize). One long subcommand invocation — e.g. a subscribe-bulk -items tag001,tag002,...,tag5000 line with several thousand tag references, or a write-bulk driven by a Galaxy-sized handle list — crosses 64 KiB and the scanner errors out with bufio.ErrTooLong mid-session. Two follow-on effects:

The error is returned only at the outer return scanner.Err(), so the cross-language batch harness sees neither the EOR sentinel for the failing command nor any of the still-pending commands queued after it.
The fix is one line: scanner.Buffer(make([]byte, 0, 64*1024), 16*1024*1024) (or larger), which lifts the cap on one-line tokens without buffering the whole input.

A second weakness: strings.Fields(line) splits on whitespace and does no quote-aware tokenisation, so a literal -comment "with spaces" becomes four args (-comment, "with, spaces"). This isn't a Go-specific bug — every language's batch tokeniser has this surface — but it means certain CLI flags that take free-text values (acknowledge-alarm -comment, acknowledge-alarm -operator) can't carry spaces in batch mode. Worth documenting if not fixed.

Recommendation: Call scanner.Buffer(make([]byte, 0, 64*1024), 16*1024*1024) immediately after bufio.NewScanner so a long bulk-args line doesn't abort the session. If runBatch is intended to support free-text flag values (the acknowledge-alarm -comment shape is the obvious case), swap strings.Fields for a quote-aware tokeniser (mvdan.cc/sh/v3/syntax or a small inline state machine matching the .NET/Rust harness shape). Otherwise add a one-line comment to runBatch's doc-comment that batch-mode arguments must not contain whitespace.

Resolution: 2026-05-24 — runBatch now sets scanner.Buffer(make([]byte, 0, 64*1024), 16*1024*1024) immediately after bufio.NewScanner, lifting the per-line cap from 64 KiB to 16 MiB so a long bulk-args line (several thousand handles) no longer aborts the session. If a single line still exceeds the 16 MiB cap, the resulting scanner.Err() is now framed as a final error-with-sentinel (JSON payload + batchEOR) and returned, so the harness never sees an unframed bufio failure. Regression test TestRunBatchHandlesLongCommandLine in cmd/mxgw-go/main_test.go feeds an ~88 KiB subscribe-bulk line (above the old 64 KiB default) followed by version --json and asserts two EOR sentinels are emitted — pre-fix the test failed with "bufio.Scanner: token too long" returned from runBatch; post-fix both commands run and the session completes cleanly. Quote-aware tokenisation is out of scope for this finding (the recommendation accepts either fix); the strings.Fields shape is unchanged.

Client.Go-027

Field	Value
Severity	Low
Category	Code organization & conventions
Location	`clients/go/cmd/mxgw-go/main.go:1195-1206`
Status	Resolved

Description: runBatch's doc-comment says the loop "never terminates on command error; only stdin EOF (or an empty line) ends the session", and the implementation matches:

for scanner.Scan() {
    line := scanner.Text()
    if line == "" {
        break
    }
    ...
}

The cross-language batch harness sends \n-delimited command lines into the child's stdin and reads back EOR-framed results — there is no documented "empty line ends the session" framing in any other client (the .NET, Rust, Python, and Java drivers should be diffed if a fix lands). A stray blank line in a longer script (very easy in PowerShell here-strings, which prefer trailing newlines) terminates the entire Go batch session silently — the harness sees a clean exit and reads no EOR for the still-pending commands, which presents as "the next command never ran" with no error path.

The two cases the empty-line check seems to cover — (a) operator pressing Enter on an interactive mxgw-go batch prompt, and (b) \n\n framing between commands — would both be cleaner under a continue (skip blank lines, keep reading until real EOF) than a break. Option (a) isn't a real workflow (batch is a programmatic interface); option (b) is the actual bug.

Recommendation: Change if line == "" { break } to if line == "" { continue } (alongside the existing len(args) == 0 continue, which is then redundant — keep one, drop the other for clarity). Update the runBatch doc-comment to read "only stdin EOF ends the session" and drop the "or an empty line" clause. If the interactive ergonomic is genuinely wanted, gate it on isatty(stdin) so the batch-from-pipe case isn't affected.

Resolution: 2026-05-24 — runBatch no longer treats a blank line as end-of-session. The if line == "" { break } early-exit was removed; blank or whitespace-only lines now fall through the existing if len(args) == 0 { continue } guard (kept as the single blank-line skip rule for clarity), so only stdin EOF ends the session. The doc-comment was updated to read "Blank lines are skipped; only stdin EOF ends the session." Regression test TestRunBatchSkipsBlankLinesAndContinuesUntilEOF in cmd/mxgw-go/main_test.go feeds version --json\n\nversion --json\n (a stray blank line between two commands) and asserts two EOR sentinels are emitted — pre-fix the test failed with "EOR sentinel count = 1, want 2" because the blank line broke the loop and the second command never ran; post-fix both commands run.

61 KiB Raw Blame History