Second re-review pass at commit a020350 caught 48 new findings — including
one High-severity regression I introduced in the prior sweep — and fixed
them all in one parallel wave.
High (1)
- Client.Python-018: prior sweep set `license = "Proprietary"` in
pyproject.toml. setuptools >= 77 enforces PEP 639 and rejects the
string (it must be a valid SPDX expression), so `pip wheel .` and
`pip install -e .` both fail before any source compiles. Tests
still pass because pytest bypasses the build backend via
`pythonpath`. Dropped the invalid license string, kept the
`License :: Other/Proprietary License` classifier, and added
`tests/test_packaging.py` so a future regression of the same shape
is caught in CI.
Mediums (6)
- Worker-023: `HeartbeatStuckCeiling` (default 75s = 5x HeartbeatGrace)
on WorkerPipeSessionOptions bounds the in-flight-command watchdog
suppression so a truly stuck COM call still triggers StaHung
instead of permanently defeating the watchdog.
- Client.Rust-018: reverted Rust's `latencyMs` split so the
cross-language bench comparison is apples-to-apples again;
`failureLatencyMs` kept as Rust-only enrichment.
- Client.Java-021: applied Client.Java-002's terminal-state
serialisation pattern to DeployEventStream so close() arriving
after queue-overflow can't erase the overflow exception.
- IntegrationTests-017: teardown-parity test now uses a two-window
stability check after UnAdvise instead of strict equality against
the pre-UnAdvise count (which raced against in-flight events).
- IntegrationTests-019: new RecordingTestOutputHelper wraps every
log sink the WriteSecured live test owns (worker stdout/stderr,
gateway logs, direct WriteLine) so the credential is proven
absent from the full output buffer, not just the diagnostic
message.
- Tests-020: added MxAccessGatewayServiceConstraintTests coverage
for the previously-uncovered Write2Bulk and WriteSecured2Bulk
arms of WriteBulkConstraintPlan.SetPayload.
Lows (41 — highlights)
- Server: Galaxy glob cache eviction is race-free (Server-024);
GalaxyRepositoryGrpcService takes IGalaxyRepository (Server-025);
AlarmsOptions validated at startup (Server-026); Authorization.md
Constraint Enforcement snippet/prose enumerate the bulk write/read
family (Server-027); bulk-read-commands and bulk-write-commands
capability tokens added to OpenSession (Server-029);
NotWiredAlarmRpcDispatcher XML doc and missing scope-resolver and
state-machine tests cleaned up (023, 028).
- Worker: AlarmCommandHandler now invokes the same STA-affinity
guard the poll path uses, at every command entry (Worker-024);
RunAsync null-checks the runtime-session factory result
(Worker-025).
- Worker.Tests: shared LiveMxAccessOptInVariableName lives on
GatewayContractInfo (Worker.Tests-025); MxAccessSession.CreateForTesting
rejects production sinks (Worker.Tests-026); FakeRuntimeSession's
CancelCommandReturnValue serialised under lock (Worker.Tests-027);
Probes namespace lifted to MxGateway.Worker.Tests.Probes
(Worker.Tests-029); cancel-envelope sequence numbers monotonised
(Worker.Tests-030); docs/GatewayTesting.md gains a "Dev-rig Probes"
section (Worker.Tests-028).
- Tests: ManualTimeProvider consolidated into one TestSupport/ copy
(Tests-021); SessionManagerBulkTests adds a mid-flight cancellation
test backed by a TaskCompletionSource fake (Tests-022); companion
FakeWorkerProcess.WaitForExitAsync no longer fakes its exit signal
(Tests-023); constraint plan reply-count divergence pinned
(Tests-024).
- IntegrationTests: TryGetSession chain carries [MaybeNullWhen(false)]
end-to-end (IntegrationTests-018); abnormal-exit keyword set
tightened to pipe-disconnected/end-of-stream and the test now
asserts streamTask.IsFaulted (020, 021).
- Client.Dotnet: bench commands added to isLongRunning so the
default 30s wall-clock budget doesn't kill them (015);
BenchStreamEventsAsync observes the inner stream task on every
exit path (016).
- Client.Go: parseValue wraps strconv errors with flag context and
%w (017); bench loops honour ctx.Done() (018); galaxy-watch parses
RFC3339Nano with fractional seconds (019); runStreamEvents installs
signal.NotifyContext like runGalaxyWatch (020); five new CLI-level
table-driven tests cover the bulk/bench subcommands (021).
- Client.Java: toCompletable Javadoc rewritten to match the actual
cancellation contract Client.Java-015 established (022); stream-events
text path uses Long.toUnsignedString for worker_sequence (023);
bench-read-bulk no longer pollutes success-latency histogram with
failure durations (024); --shutdown-timeout CLI option propagates
through to ClientOptions (025); seven new MxGatewayCliTests cover
the bulk and bench commands (026).
- Client.Python: mxgateway_cli ships its own py.typed marker (019);
wheel-build smoke test added under tests/test_packaging.py (020);
README documents the Galaxy CLI parity gap explicitly (021).
- Client.Rust: RustClientDesign.md signatures match session.rs and
document the AsRef<str> read_bulk genericism (019);
next_correlation_id re-exported at the crate root, with a
property-style doc contract and an explicit disclaimer that the
literal textual format is not part of the contract (020).
- Contracts: BulkWriteResult comment names the actual
IConstraintEnforcer mechanism instead of "tag-allowlist filter"
(014); BulkReadResult gains explicit per-arm payload-population
documentation for the success vs failure cases (015).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
38 KiB
Code Review — Client.Go
| Field | Value |
|---|---|
| Module | clients/go |
| Reviewer | Claude Code |
| Review date | 2026-05-20 |
| Commit reviewed | a020350 |
| Status | Reviewed |
| Open findings | 0 |
Checklist coverage
A re-review of commit a020350 (which resolved Client.Go-011..016). gofmt -l .,
go vet ./..., go build ./..., and go test ./... -count=1 are all clean.
| # | Category | Result |
|---|---|---|
| 1 | Correctness & logic bugs | Prior Client.Go-001/003/007/011 remain resolved. No new correctness bugs found. |
| 2 | mxaccessgw conventions | gofmt -l . and go vet ./... clean; Client.Go-004 stays resolved. No new issues. |
| 3 | Concurrency & thread safety | Client.Go-013 resolved. New issue: runBenchReadBulk's warm-up + steady-state wall-clock loops ignore ctx cancellation, so a Ctrl+C or parent-cancel keeps spinning ReadBulk calls until the wall-clock deadline (Client.Go-018). |
| 4 | Error handling & resilience | Client.Go-014 resolved. New issue: parseValue returns bare strconv errors with no %w wrap and no CLI-context, so a typo like -type int32 -value foo surfaces as strconv.ParseInt: parsing "foo": invalid syntax without naming the flag — out of line with the GoStyleGuide "wrap errors with useful context using %w" rule (Client.Go-017). |
| 5 | Security | No issues found — TLS-by-default with TLS 1.2 floor, API-key redaction in CLI JSON output, no secret logging. |
| 6 | Performance & resource management | No issues found — defer client.Close() / defer subscription.Close() applied consistently; bench-read-bulk preallocates the latency slice. |
| 7 | Design-document adherence | No new issues. Lazy grpc.NewClient + readiness probe (Client.Go-005) and the shared dial helper (Client.Go-009) are applied uniformly across Dial and DialGalaxy. |
| 8 | Code organization & conventions | Client.Go-015 resolved. New issue: runStreamEvents does not install a signal handler (Ctrl+C kills the process abruptly), while runGalaxyWatch does — the two long-running stream commands have divergent shutdown UX (Client.Go-020). |
| 9 | Testing coverage | Client.Go-008/016 resolved. New issue: the six new bulk and bench subcommands (read-bulk, write-bulk, write2-bulk, write-secured-bulk, write-secured2-bulk, bench-read-bulk) have no CLI-level unit tests — in particular the Client.Go-015 secured-flag-gating fix has no regression test (Client.Go-021). |
| 10 | Documentation & comments | Client.Go-010/012 resolved. New issue: runGalaxyWatch parses -last-seen-deploy-time with time.RFC3339 (no fractional seconds), while parseRfc3339Timestamp for -timestamp-value accepts time.RFC3339Nano — the CLI advertises "RFC 3339" for both but quietly differs on sub-second support (Client.Go-019). |
Findings
Client.Go-001
| Field | Value |
|---|---|
| Severity | High |
| Category | Correctness & logic bugs |
| Location | clients/go/mxgateway/errors.go:88-93, clients/go/mxgateway/errors.go:117-128 |
| Status | Resolved |
Description: MxAccessError.Unwrap returns e.Command directly. EnsureMxAccessSuccess constructs &MxAccessError{Reply: reply} with Command left nil (the HRESULT / failing-MxStatusProxy path). When Command is a nil *CommandError, Unwrap() returns a non-nil error interface wrapping a nil pointer. Consequently errors.As(err, &ce) for *CommandError returns true while setting ce to nil — a caller writing the idiomatic if errors.As(err, &commandErr) { use commandErr.Status } nil-dereferences and panics. Verified empirically; the existing test only exercises the populated-Command path.
Recommendation: Make Unwrap return an untyped nil when Command is nil: if e == nil || e.Command == nil { return nil }; return e.Command. Add a test for the HRESULT-only MxAccessError asserting errors.As(err, &ce) is false.
Resolution: Resolved 2026-05-18: MxAccessError.Unwrap now returns an untyped nil when Command is nil, so errors.As no longer binds a typed-nil *CommandError; added errors_test.go regression coverage for the HRESULT-only and populated-Command paths.
Client.Go-002
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Error handling & resilience |
| Location | clients/go/mxgateway/session.go:440-516 |
| Status | Resolved |
Description: For the Events/EventsAfter compatibility API (cancelWhenResultBufferFull == true), when the 16-slot results channel is full sendEventResult cancels and returns false; the goroutine returns and close(results) runs — the consumer sees the channel close with no EventResult{Err: ...} ever delivered. A slow consumer cannot distinguish "stream ended normally" from "events were silently dropped." This contradicts the design doc's "libraries should not reorder, coalesce, or drop events by default", and a test currently pins this lossy behaviour.
Recommendation: Before cancelling on a full buffer, deliver a terminal EventResult carrying an explicit error (e.g. ErrEventBufferOverflow). Document the behaviour on Session.Events; steer callers to SubscribeEvents (which blocks instead of dropping).
Resolution: Resolved 2026-05-18: confirmed against source — on a full bounded buffer the compatibility path cancelled and closed results with no terminal result. Added the exported sentinel ErrEventBufferOverflow (errors.go); sendEventResult now, on a full buffer, cancels the stream then calls the new deliverTerminalResult helper, which evicts one of the oldest buffered events to make room and places EventResult{Err: ErrEventBufferOverflow} so it becomes the consumer's last item before the channel closes. The previously lossy regression test (TestEventsAfterCancelsStreamWhenCompatibilityChannelIsAbandoned) was re-pointed to assert the terminal ErrEventBufferOverflow result is delivered. clients/go/README.md now documents the bounded-buffer/overflow behaviour and steers no-loss callers to SubscribeEvents.
Client.Go-003
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Correctness & logic bugs |
| Location | clients/go/cmd/mxgw-go/main.go:517-532 |
| Status | Resolved |
Description: parseInt32List calls panic(err) when an item-handles token fails to parse as an int32. The CLI is a documented user-facing tool; a typo like -item-handles 1,foo crashes the process with an unrecovered panic and stack trace instead of returning a clean error and exit code 2 like every other validation path in main.go.
Recommendation: Change parseInt32List to return ([]int32, error) and have runUnsubscribeBulk propagate the error, matching parseValue's pattern.
Resolution: Resolved 2026-05-18: confirmed against source — parseInt32List called panic(err) on a malformed token. It now returns ([]int32, error), wrapping the bad token (invalid item handle %q: %w); runUnsubscribeBulk parses item handles before dialing and returns the error, so a typo flows through runWithIO to os.Exit(2) like other validation paths. Regression tests TestParseInt32ListParsesValidTokens and TestParseInt32ListReturnsErrorOnMalformedToken added to cmd/mxgw-go/main_test.go.
Client.Go-004
| Field | Value |
|---|---|
| Severity | Low |
| Category | mxaccessgw conventions |
| Location | clients/go/mxgateway/alarms_test.go:153-154, clients/go/mxgateway/galaxy_test.go:58-59 |
| Status | Resolved |
Description: gofmt -l flags alarms_test.go and galaxy_test.go for misaligned struct-literal field padding. The Go client README lists gofmt as part of the workflow and the repo enforces style; unformatted committed code breaks gofmt-gated checks and CI.
Recommendation: Run gofmt -w mxgateway/alarms_test.go mxgateway/galaxy_test.go.
Resolution: Resolved 2026-05-18: confirmed gofmt -l . flagged both files for misaligned struct-literal padding. Ran gofmt -w on mxgateway/alarms_test.go and mxgateway/galaxy_test.go; gofmt -l . is now clean for the whole module.
Client.Go-005
| Field | Value |
|---|---|
| Severity | Low |
| Category | Design-document adherence |
| Location | clients/go/mxgateway/client.go:64,68, clients/go/mxgateway/galaxy.go:83,87 |
| Status | Resolved |
Description: The client uses grpc.DialContext with grpc.WithBlock(). In current grpc-go both are deprecated in favour of grpc.NewClient (lazy connection). WithBlock also changes failure semantics: a transient gateway-unavailable at dial time becomes a hard Dial error rather than a connection that recovers when the gateway comes up, working against the design doc's resilience intent.
Recommendation: Migrate to grpc.NewClient; if a fail-fast connect probe is still wanted, do an explicit readiness wait bounded by DialTimeout, and update the doc comment.
Resolution: Resolved 2026-05-18: confirmed Dial/DialGalaxy used the deprecated grpc.DialContext + grpc.WithBlock pair. Migrated both to the shared dial(ctx, opts) helper, which now builds a lazy connection with grpc.NewClient and runs an explicit waitForReady readiness probe (Connect + WaitForStateChange until connectivity.Ready) bounded by DialTimeout — preserving fail-fast behavior while letting an otherwise lazy connection recover when the gateway is briefly down. Note: grpc.NewClient defaults the target scheme to dns, so the bufconn test harnesses (client_session_test.go, alarms_test.go, galaxy_test.go) were updated to use passthrough:///bufnet so the fake target reaches the context dialer. New tests TestDialFailsFastWhenGatewayUnreachable and TestDialReadinessProbeReachesReady cover the probe; go vet reports no deprecation. clients/go/README.md documents the lazy-connect + readiness-probe semantics.
Client.Go-006
| Field | Value |
|---|---|
| Severity | Low |
| Category | Error handling & resilience |
| Location | clients/go/mxgateway/errors.go:9-130 |
| Status | Resolved |
Description: docs/ClientLibrariesDesign.md recommends a high-level error taxonomy (TransportError, AuthenticationError, TimeoutError, etc.). The Go client collapses all transport/gRPC failures into a single GatewayError with no way to classify transient (Unavailable, DeadlineExceeded) vs permanent (Unauthenticated, InvalidArgument) without manually unwrapping and calling status.Code.
Recommendation: Add a helper (e.g. IsTransient(err) bool) or expose the gRPC codes.Code on GatewayError, so retry/timeout/auth handling can be written without re-parsing the wrapped error.
Resolution: Resolved 2026-05-18: implemented the recommended classification surface in errors.go rather than a full parallel type hierarchy (the existing GatewayError/CommandError/MxAccessError chain already separates transport from protocol from MXAccess failures). Added GatewayError.Code() (returns the wrapped gRPC codes.Code, OK for nil, Unknown for a non-status error) and the free function IsTransient(err error) bool, which unwraps through *GatewayError and any gRPC-status chain and reports true for Unavailable, DeadlineExceeded, ResourceExhausted, and Aborted. Tests TestGatewayErrorCode and TestIsTransient cover the matrix; clients/go/README.md documents both for retry/timeout/auth handling.
Client.Go-007
| Field | Value |
|---|---|
| Severity | Low |
| Category | Correctness & logic bugs |
| Location | clients/go/mxgateway/session.go:526-532 |
| Status | Resolved |
Description: newCorrelationID returns an empty string when crypto/rand.Read fails, silently producing an MxCommandRequest with no correlation id. rand.Read failure is rare, but the failure mode (untraceable command, no error surfaced) is worse than failing loud, and the empty-id path is untested.
Recommendation: Either propagate the error up through invokeCommand, or fall back to a time/counter-based id rather than an empty string.
Resolution: Resolved 2026-05-18: confirmed newCorrelationID returned "" on a rand.Read failure. It now falls back to a non-empty "fallback-<unixnano>-<counter>" id built from time.Now().UnixNano() and a process-wide atomic.Uint64 monotonic counter, so every command stays traceable even without entropy. The crypto/rand call was routed through a randRead package variable so the failure path is testable; TestNewCorrelationIDFallsBackOnRandFailure simulates a rand.Read failure and asserts the fallback id is non-empty, fallback- prefixed, and unique, and TestNewCorrelationIDUsesRandEntropy pins the happy path.
Client.Go-008
| Field | Value |
|---|---|
| Severity | Low |
| Category | Testing coverage |
| Location | clients/go/mxgateway/ (test files) |
| Status | Resolved |
Description: Several critical paths are untested: TLS credential resolution in resolveTransportCredentials (only the Plaintext path is exercised); the callContext deadline-shortening logic (client.go:198-204) including the negative-timeout disable case; and NativeValue/NativeArray for the array, raw-bytes, null, and unsupported-kind branches.
Recommendation: Add unit tests for resolveTransportCredentials precedence, callContext deadline arithmetic, and NativeValue/NativeArray round-trips for every kind.
Resolution: Resolved 2026-05-18: added clients/go/mxgateway/coverage_test.go. TestResolveTransportCredentialsPrecedence exercises every branch (explicit TransportCredentials, Plaintext, missing CACertFile error, TLSConfig + ServerNameOverride, default TLS floor) and TestResolveTransportCredentialsDoesNotMutateTLSConfig confirms the supplied *tls.Config is cloned. TestCallContextDeadlineArithmetic covers zero/default, negative-disable, positive timeout, caller-deadline-sooner-kept, and caller-deadline-later-shortened. TestNativeValueEdgeKinds, TestNativeArrayEdgeKinds, and TestNativeValueUnsupportedKind cover the null, raw-bytes (including the no-alias copy), array, timestamp-with-nil, and unsupported-kind branches.
Client.Go-009
| Field | Value |
|---|---|
| Severity | Low |
| Category | Code organization & conventions |
| Location | clients/go/mxgateway/galaxy.go:60-93,241-256, clients/go/mxgateway/client.go:41-74,190-205 |
| Status | Resolved |
Description: DialGalaxy/Dial and GalaxyClient.callContext/Client.callContext are near-identical duplicates (dial-context setup, credential resolution, dial-option assembly, deadline arithmetic). A fix to one (e.g. the Client.Go-005 dial migration) must be applied twice and can drift.
Recommendation: Extract a shared unexported dial(ctx, opts) and a free callContext(opts, ctx) function, and have both client constructors call them.
Resolution: Resolved 2026-05-18: extracted the shared unexported dial(ctx, opts) (*grpc.ClientConn, error) (credential resolution, dial-option assembly, grpc.NewClient, readiness probe) and the free callContext(ctx, callTimeout) (context.Context, context.CancelFunc) into client.go. Dial/DialGalaxy and both (*Client).callContext/(*GalaxyClient).callContext methods now delegate to them; the duplicated dial and deadline code in galaxy.go was removed (its now-unused errors import dropped). This was done together with the Client.Go-005 migration so the grpc.NewClient change lives in exactly one place.
Client.Go-010
| Field | Value |
|---|---|
| Severity | Low |
| Category | Documentation & comments |
| Location | clients/go/mxgateway/client.go:39-40 |
| Status | Resolved |
Description: The Dial doc comment states it configures "blocking dial cancellation from ctx." This describes the deprecated WithBlock behaviour; once Client.Go-005 is addressed the comment is misleading about how connection establishment and cancellation work.
Recommendation: Reword to describe the actual connect/timeout semantics after resolving Client.Go-005, and clarify that DialTimeout bounds the initial connect attempt.
Resolution: Resolved 2026-05-18: alongside the Client.Go-005 migration, the Dial doc comment was rewritten to describe the lazy grpc.NewClient connection, the DialTimeout-bounded (default 10s, or ctx deadline when sooner) readiness probe, that a briefly-unavailable gateway recovers instead of producing a hard error, and that cancelling ctx aborts the probe. DialGalaxy and the new dial/waitForReady/callContext helpers carry matching doc comments.
Client.Go-011
| Field | Value |
|---|---|
| Severity | Low |
| Category | Correctness & logic bugs |
| Location | clients/go/mxgateway/alarms_test.go:66-73 |
| Status | Resolved |
Description: TestAcknowledgeAlarmRejectsNilRequest contains a no-op if with an empty body whose intent is documented in a comment ("Accept either: the helper returned the literal sentinel, or the generic transport error — both prove nil was rejected"). The condition
if err == nil || !errors.Is(err, errors.Unwrap(err)) && err.Error() != "mxgateway: acknowledge alarm request is required" {
// ...
}
evaluates expressions for side effects only and asserts nothing — Go's && binds tighter than ||, the body is empty, and the actual nil check happens on the very next if err == nil. The block is effectively dead code masquerading as a check. It also evaluates errors.Unwrap(err) regardless of err's shape, and would call err.Error() even when err might be a wrapped status error whose message wording the gateway is free to change — making the apparent assertion brittle on top of being dead.
Recommendation: Drop the empty-body if entirely (the subsequent if err == nil { t.Fatalf(...) } already enforces the contract), or, if the intent is to additionally pin the literal error message for the sentinel path, replace it with a real assertion (if err.Error() != "mxgateway: acknowledge alarm request is required" { t.Fatalf(...) }) and remove the spurious errors.Is(err, errors.Unwrap(err)) clause.
Resolution: 2026-05-20 — Removed the empty-body if in TestAcknowledgeAlarmRejectsNilRequest; the subsequent if err == nil { t.Fatalf(...) } already enforces the nil-rejection contract without the dead, brittle compound predicate.
Client.Go-012
| Field | Value |
|---|---|
| Severity | Low |
| Category | Documentation & comments |
| Location | clients/go/cmd/mxgw-go/main.go:1063-1065, clients/go/cmd/mxgw-go/main.go:88-104 |
| Status | Resolved |
Description: writeUsage lists the available subcommands as version|open-session|close-session|register|add-item|advise|subscribe-bulk|unsubscribe-bulk|write|stream-events|smoke|galaxy-test-connection|galaxy-last-deploy|galaxy-discover|galaxy-watch. Six subcommands wired into run are missing from this list: read-bulk, write-bulk, write2-bulk, write-secured-bulk, write-secured2-bulk, and bench-read-bulk. A user invoking mxgw-go with no args or an unknown command (the two paths that print this banner) sees an incomplete CLI surface and may believe the bulk-write / read-bulk families are not implemented. The README does document them, but the inline usage banner is the first source of truth a CLI user consults.
Recommendation: Extend the usage string to include every command registered in the switch args[0] in run, or generate it from a single source-of-truth slice keyed on command name → handler so the two cannot drift again.
Resolution: 2026-05-20 — writeUsage now lists the previously missing read-bulk, write-bulk, write2-bulk, write-secured-bulk, write-secured2-bulk, and bench-read-bulk subcommands alongside the original surface, so the no-args / unknown-command banner reflects every command wired into run.
Client.Go-013
| Field | Value |
|---|---|
| Severity | Low |
| Category | Concurrency & thread safety |
| Location | clients/go/cmd/mxgw-go/main.go:1246-1249, clients/go/cmd/mxgw-go/main.go:1257-1262 |
| Status | Resolved |
Description: In runGalaxyWatch, the signal-cancellation branch carefully drains the buffered events channel after cancelStream() so the WatchDeployEvents goroutine can exit (for range events { }). The limit-reached branch (if *limit > 0 && count >= *limit { cancelStream(); return nil }) skips that drain and returns immediately. After the function returns, defer client.Close() runs and tears down the gRPC connection; in the gap before the connection close propagates, the WatchDeployEvents goroutine may still be blocked on case events <- event: (the channel is buffered to 16 but a slow producer can refill it) — the goroutine then exits via <-ctx.Done() because streamCtx was cancelled, so it isn't a permanent leak, but the two cancellation paths behave inconsistently and the limit-reached path can briefly hold a goroutine plus the gRPC stream while the client tears down underneath it.
Recommendation: Factor the drain into a helper and use it from both branches, e.g. after cancelStream() always for range events { } (and let the surrounding select/for re-evaluate <-errs if a terminal error was already buffered). Alternatively, drop the explicit drain in both branches and rely on defer cancelStream() plus defer client.Close() — but pick one model and apply it consistently.
Resolution: 2026-05-20 — The limit-reached branch in runGalaxyWatch now drains the buffered events channel (for range events { }) after cancelStream(), matching the signal-cancel branch. Both cancellation paths now wait for the WatchDeployEvents goroutine to exit before defer client.Close() tears the gRPC connection down.
Client.Go-014
| Field | Value |
|---|---|
| Severity | Low |
| Category | Error handling & resilience |
| Location | clients/go/mxgateway/session.go:602, clients/go/mxgateway/galaxy.go:189 |
| Status | Resolved |
Description: Two stream Recv loops compare end-of-stream with err == io.EOF directly:
session.go:602—if err == io.EOF || status.Code(err) == codes.Canceled || streamCtx.Err() != nil { return }galaxy.go:189—if recvErr == io.EOF { return }
gRPC's generated Recv() does return the io.EOF sentinel directly today, so the comparisons work in practice. However, the Go idiom (and the project's docs/style-guides/GoStyleGuide.md) is to use errors.Is(err, io.EOF) so future wrapping (e.g. an interceptor decorating Recv errors) does not silently flip the loop from "stream finished normally" to "stream produced an error". The mxgateway client itself wraps non-EOF Recv errors in *GatewayError, which errors.Is already supports — using errors.Is keeps both paths consistent.
Recommendation: Replace recvErr == io.EOF / err == io.EOF with errors.Is(err, io.EOF) (the errors package is already imported in both files).
Resolution: 2026-05-20 — Both stream Recv loops now use errors.Is(err, io.EOF): session.go already imported errors, and galaxy.go gained the missing errors import alongside the recvErr == io.EOF → errors.Is(recvErr, io.EOF) change, keeping EOF detection robust against any future Recv-error wrapping.
Client.Go-015
| Field | Value |
|---|---|
| Severity | Low |
| Category | Code organization & conventions |
| Location | clients/go/cmd/mxgw-go/main.go:410-512 |
| Status | Resolved |
Description: runWriteBulkVariant(ctx, args, stdout, stderr, command, withTimestamp, secured bool) accepts secured but never uses it — the routing is keyed on command (the string "write-bulk" / "write2-bulk" / "write-secured-bulk" / "write-secured2-bulk"). The function ends with _ = secured // currently only used for routing above; reserved for future per-variant validation, which is misleading because secured is not in fact used for routing. The four wrapper functions (runWriteBulk, runWrite2Bulk, runWriteSecuredBulk, runWriteSecured2Bulk) all pass a secured argument that has no effect. The four CLI options -current-user-id, -verifier-user-id are unconditionally registered on every variant, including the non-secured ones, so a write-bulk invocation that passes -current-user-id 42 silently does nothing. Either remove secured and the dead _ = secured comment, or use it to gate the registration of secured-only flags so wrong combinations are rejected with a clean error.
Recommendation: Drop the secured parameter (the command switch already distinguishes the four variants) and the misleading _ = secured line; or, if validation is the goal, branch flag registration on secured so secured-only flags are unavailable for the non-secured variants and emit a clean usage error if they appear.
Resolution: 2026-05-20 — Dropped the unused secured parameter from runWriteBulkVariant (the command switch already distinguishes the four variants) and removed the misleading _ = secured line. The variant is now derived locally from command and used to gate flag registration: -current-user-id / -verifier-user-id are only registered for the secured variants and -user-id only for Write/Write2, so a wrong-variant flag now fails with a clean flag provided but not defined usage error instead of silently no-op'ing. The four runWrite*Bulk wrappers were updated to match the new signature.
Client.Go-016
| Field | Value |
|---|---|
| Severity | Low |
| Category | Testing coverage |
| Location | clients/go/mxgateway/galaxy_test.go:382-429 |
| Status | Resolved |
Description: fakeGalaxyServer.watchSendInterval is declared on the test fake and consulted inside WatchDeployEvents (if s.watchSendInterval > 0 { ... }) but no test in the package sets a non-zero value. The dead field plus its branch were presumably added to support a backpressure / pacing test that was never landed, and now the only effect is reader confusion ("which test uses this?") and a pointlessly larger fake. Backpressure on the bootstrap-plus-events sequence is also genuinely worth testing, given that WatchDeployEvents writes to a 16-deep buffered channel.
Recommendation: Either delete the unused watchSendInterval field and its branch in WatchDeployEvents, or add the test it was added for — e.g. one that pumps more than 16 events with a small interval and asserts the consumer keeps up without losing or reordering events. Linking the field to a // for TestX comment if it stays would also help.
Resolution: 2026-05-20 — Removed the unused watchSendInterval field from fakeGalaxyServer and the corresponding if s.watchSendInterval > 0 { ... } branch in WatchDeployEvents; no test set the field, so the dead code path is gone and the fake is leaner. gofmt -w reflowed the struct to drop the no-longer-needed field-name padding.
Client.Go-017
| Field | Value |
|---|---|
| Severity | Low |
| Category | Error handling & resilience |
| Location | clients/go/cmd/mxgw-go/main.go:954-991 |
| Status | Resolved |
Description: parseValue returns the raw strconv.ParseBool / strconv.ParseInt / strconv.ParseFloat error verbatim — no wrap with %w and no indication of which CLI flag was the source. A user running mxgw-go write -type int32 -value foo sees
strconv.ParseInt: parsing "foo": invalid syntax
with no mention of -value, -type, or which subcommand failed. The same pattern hits every typed branch (bool, int32, int64, float, double). Compare with the sibling helpers in the same file: parseInt32List wraps with "invalid item handle %q: %w" (Client.Go-003 resolution) and parseRfc3339Timestamp wraps with "invalid RFC 3339 timestamp %q: %w". parseValue was missed and is inconsistent with those two. The GoStyleGuide (docs/style-guides/GoStyleGuide.md, "Errors" section) requires "Wrap errors with useful context using %w."
Recommendation: Wrap each strconv error with the offending input and type, e.g. return nil, fmt.Errorf("invalid %s value %q: %w", valueType, valueText, err). The wrapper handles all five typed branches uniformly without a per-branch change.
Resolution: 2026-05-20 — Each typed branch of parseValue now wraps the bare strconv error with %w and names the offending flag and value ("invalid -value for -type %s: %q: %w"), so mxgw-go write -type int32 -value foo surfaces the source flag, the requested type, and the bad token while still letting errors.Is/As reach the underlying strconv sentinel. The new TestParseValueWrapsStrconvErrorWithFlagContext table-test pins all five typed branches (bool, int32, int64, float, double) to the new wrapper shape.
Client.Go-018
| Field | Value |
|---|---|
| Severity | Low |
| Category | Concurrency & thread safety |
| Location | clients/go/cmd/mxgw-go/main.go:593-623 |
| Status | Resolved |
Description: runBenchReadBulk's warm-up and steady-state loops are wall-clock-only:
for time.Now().Before(warmupDeadline) {
_, _ = session.ReadBulk(ctx, serverHandle, tags, timeout)
}
...
for time.Now().Before(steadyDeadline) {
callStart := time.Now()
results, err := session.ReadBulk(ctx, serverHandle, tags, timeout)
...
}
Neither loop checks ctx.Done() / ctx.Err(). If the parent context is cancelled (e.g. the operator Ctrl+Cs the benchmark, or the cross-language bench driver scripts/bench-read-bulk.ps1 times out and kills the child early), the loops keep iterating until their wall-clock deadlines elapse. Each ReadBulk call inside fails fast (the gRPC call inherits the cancelled context and returns context.Canceled), but the steady-state loop counts those as failedCalls++ and keeps spinning — wasting CPU and inflating the failedCalls and latencyMs.max figures the PowerShell driver collates across all five clients. The .NET, Rust, Python, and Java bench drivers should be checked for the same shape, but the Go one is the only one being reviewed here. Note that runBenchReadBulk is the only Go CLI command that does NOT register its own signal handler (compare with runGalaxyWatch which does via signal.NotifyContext).
Recommendation: Drop out of both loops as soon as ctx.Err() != nil. Concretely, change the loop conditions to for time.Now().Before(warmupDeadline) && ctx.Err() == nil (and the same on steadyDeadline), or use a select { case <-ctx.Done(): break loop; default: } guard at the top of each iteration. The cross-language bench shape (durationMs, totalCalls, failedCalls, latencyMs) stays the same — the bench just exits sooner and reports the truncated window faithfully.
Resolution: 2026-05-20 — Both the warm-up and steady-state loops in runBenchReadBulk now carry an && ctx.Err() == nil guard alongside the wall-clock check, so a cancelled parent context (Ctrl+C, or the cross-language bench driver killing the child early) breaks the loop instead of spinning failing ReadBulk calls until the deadline elapses. The cross-language bench JSON shape is unchanged — the truncated window is just reported faithfully via durationMs / totalCalls.
Client.Go-019
| Field | Value |
|---|---|
| Severity | Low |
| Category | Documentation & comments |
| Location | clients/go/cmd/mxgw-go/main.go:710-716, clients/go/cmd/mxgw-go/main.go:1204,1213 |
| Status | Resolved |
Description: The CLI advertises two timestamp flags as "RFC3339" but parses them with different layouts:
-timestamp-value(write2/write-secured2 bulk):parseRfc3339Timestampusestime.RFC3339Nano, which accepts both2026-04-28T10:00:00Zand2026-04-28T10:00:00.123456789Z.-last-seen-deploy-time(galaxy-watch):time.Parse(time.RFC3339, ...), which rejects fractional seconds.
A user copy-pasting an ObservedAt timestamp from galaxy-watch -json (which is emitted as RFC3339Nano by formatDeployEvent) directly into -last-seen-deploy-time will get a parse error if the source value carried a fractional component, even though both flag descriptions say "RFC3339". The flag help string at main.go:1204 literally says "RFC3339 timestamp", and the README example uses 2026-04-28T10:00:00Z (whole seconds only), so the issue is silent until a fractional timestamp comes from the gateway.
Recommendation: Switch the galaxy-watch parse to time.RFC3339Nano to match parseRfc3339Timestamp (and the gateway's own emit format). One line change at main.go:1213. While there, update the flag help string and the README example to say "RFC 3339 (with optional fractional seconds)" so the two flags are documented uniformly.
Resolution: 2026-05-20 — runGalaxyWatch now parses -last-seen-deploy-time with time.RFC3339Nano, matching parseRfc3339Timestamp and the gateway's own formatDeployEvent emit format; the layout is strictly broader than the previous time.RFC3339 (whole-second values still parse). The flag help string changed to "RFC 3339 timestamp (with optional fractional seconds)" and the clients/go/README.md example was extended with an explicit fractional-seconds line so the two flags advertise the same surface.
Client.Go-020
| Field | Value |
|---|---|
| Severity | Low |
| Category | Code organization & conventions |
| Location | clients/go/cmd/mxgw-go/main.go:753-802, clients/go/cmd/mxgw-go/main.go:1199-1275 |
| Status | Resolved |
Description: The two long-running stream commands have divergent Ctrl+C UX:
-
runGalaxyWatchregisters a signal handler:signalCtx, stopSignals := signal.NotifyContext(ctx, os.Interrupt, syscall.SIGTERM) defer stopSignals() streamCtx, cancelStream := context.WithCancel(signalCtx)so Ctrl+C drains buffered events and returns cleanly.
-
runStreamEventsdoes not register any signal handler — its parent context iscontext.Background()fromrunWithIO, so Ctrl+C abruptly kills the process. The deferredsubscription.Close()andclient.Close()never run, leaving the server-side stream to fault out on a torn TCP connection rather than a clean cancel.
The two commands are otherwise structurally identical (subscribe + loop until limit or external stop) — the inconsistency is one half of a pair that was missed when galaxy-watch was added. Worth flagging because it directly affects what an integrator who Ctrl+Cs stream-events sees in the gateway's logs (a transport reset rather than a codes.Canceled).
Recommendation: Mirror the runGalaxyWatch pattern in runStreamEvents: wrap ctx in signal.NotifyContext(ctx, os.Interrupt, syscall.SIGTERM), derive streamCtx from it, and let defer subscription.Close() / defer cancelStream() tear the stream down on signal. The change is roughly six lines and brings the two stream commands into parity. Optionally factor a shared withSignals(ctx) (context.Context, context.CancelFunc) helper if a third stream command lands.
Resolution: 2026-05-20 — runStreamEvents now installs signal.NotifyContext(ctx, os.Interrupt, syscall.SIGTERM) (with a deferred stopSignals()) and derives streamCtx from the resulting signal-aware context, mirroring runGalaxyWatch. Ctrl+C now cancels the gRPC stream cleanly — the gateway sees codes.Canceled instead of a torn TCP connection — and the deferred subscription.Close() / client.Close() actually run on signal. The two long-running stream commands now share the same shutdown UX.
Client.Go-021
| Field | Value |
|---|---|
| Severity | Low |
| Category | Testing coverage |
| Location | clients/go/cmd/mxgw-go/main_test.go, clients/go/cmd/mxgw-go/main.go:363-520,522-655 |
| Status | Resolved |
Description: The six bulk / bench subcommands wired into run (read-bulk, write-bulk, write2-bulk, write-secured-bulk, write-secured2-bulk, bench-read-bulk) have no CLI-level unit tests in main_test.go. In particular, the Client.Go-015 resolution claims:
-current-user-id/-verifier-user-idare only registered for the secured variants and-user-idonly for Write/Write2, so a wrong-variant flag now fails with a cleanflag provided but not definedusage error instead of silently no-op'ing.
But there is no test asserting that, e.g., mxgw-go write-bulk -current-user-id 1 ... returns a "flag provided but not defined" error, or that mxgw-go write-secured-bulk -user-id 1 ... does the same. A future refactor of runWriteBulkVariant (notably one that re-introduced the secured parameter) could silently re-permit the wrong flags without breaking any test. The same gap applies to: parameter validation in runReadBulk (bulk size, empty session/items rejection), the value-count vs handle-count mismatch error in runWriteBulkVariant:447, and runBenchReadBulk's bulk-size/duration-seconds positivity checks.
mxgateway/client_session_test.go already covers the library-level happy paths (TestWriteBulkBuildsOneBulkCommandAndReturnsPerEntryResults, TestReadBulkForwardsTimeoutAndUnpacksCachedFlag, TestSubscribeBulkBuildsOneBulkCommandAndReturnsResults), so this finding is about CLI surface area only.
Recommendation: Add table-driven tests in cmd/mxgw-go/main_test.go along the existing TestParseInt32List* and TestParseValueBuildsTypedValue style:
TestRunWriteBulkVariantGatesSecuredFlags: invokerunWithIOwithwrite-bulk -current-user-id 1 ...andwrite-secured-bulk -user-id 1 ..., assert each returns an error matchingflag provided but not defined.TestRunReadBulkRejectsMissingArgs: invokerunWithIOwithread-bulk(no-session-id), assert the documented "session-id and items are required" error.TestRunBenchReadBulkRejectsNonPositiveBulkSize/TestRunBenchReadBulkRejectsNonPositiveDuration: pin the positivity checks atmain.go:544-549.TestRunWriteBulkVariantRejectsMismatchedHandlesAndValues: pin thelen(handles) != len(valueTexts)error atmain.go:447.
Each is a few lines and routes through the existing runWithIO entry point, so it does not need a bufconn fake.
Resolution: 2026-05-20 — Added CLI-level table-driven regression tests in cmd/mxgw-go/main_test.go routed through runWithIO, so they need no bufconn fake: TestRunWriteBulkVariantGatesSecuredFlags pins Client.Go-015 by asserting write-bulk -current-user-id, write-bulk -verifier-user-id, write2-bulk -current-user-id, write-secured-bulk -user-id, and write-secured2-bulk -user-id all surface flag provided but not defined; TestRunReadBulkRejectsMissingArgs pins the "session-id and items are required" check across no-flags / missing-items / missing-session-id; TestRunBenchReadBulkRejectsNonPositiveBulkSize and TestRunBenchReadBulkRejectsNonPositiveDuration pin the positivity checks; TestRunWriteBulkVariantRejectsMismatchedHandlesAndValues pins the explicit item-handles count ... does not match values count ... error. go test ./... passes.