82996aa8e6
Client.Go-022 Re-applied Client.Go-015 shape — runWriteBulkVariant drops
the unused secured param and gates -current-user-id /
-verifier-user-id / -user-id behind the secured-only
variants.
Client.Go-023 Re-applied Client.Go-018 shape — bench warm-up and steady-
state loops respect ctx.Err().
Client.Go-024 Added SDK-level tests for WriteBulk / Write2Bulk /
WriteSecuredBulk / WriteSecured2Bulk / ReadBulk and
StreamAlarms via the existing bufconn fake gateway pattern.
Client.Go-025 Five bulk SDK methods short-circuit on empty input without
an RPC round-trip and document the behavior.
Client.Go-026 runBatch widens scanner.Buffer to 16 MiB and emits an
error-with-sentinel if a longer line still arrives, rather
than aborting the session silently.
Client.Go-027 runBatch treats blank lines as skip-and-continue; only EOF
ends the session.
All resolved at 2026-05-24; gofmt + go vet + go build + go test ./... all
green.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
628 lines
61 KiB
Markdown
628 lines
61 KiB
Markdown
# Code Review — Client.Go
|
|
|
|
| Field | Value |
|
|
|---|---|
|
|
| Module | `clients/go` |
|
|
| Reviewer | Claude Code |
|
|
| Review date | 2026-05-24 |
|
|
| Commit reviewed | `42b0037` |
|
|
| Status | Re-reviewed |
|
|
| Open findings | 0 |
|
|
|
|
## Checklist coverage
|
|
|
|
A re-review of commit `a020350` (which resolved Client.Go-011..016). `gofmt -l .`,
|
|
`go vet ./...`, `go build ./...`, and `go test ./... -count=1` are all clean.
|
|
|
|
| # | Category | Result |
|
|
|---|---|---|
|
|
| 1 | Correctness & logic bugs | Prior Client.Go-001/003/007/011 remain resolved. No new correctness bugs found. |
|
|
| 2 | mxaccessgw conventions | `gofmt -l .` and `go vet ./...` clean; Client.Go-004 stays resolved. No new issues. |
|
|
| 3 | Concurrency & thread safety | Client.Go-013 resolved. New issue: `runBenchReadBulk`'s warm-up + steady-state wall-clock loops ignore `ctx` cancellation, so a Ctrl+C or parent-cancel keeps spinning ReadBulk calls until the wall-clock deadline (Client.Go-018). |
|
|
| 4 | Error handling & resilience | Client.Go-014 resolved. New issue: `parseValue` returns bare `strconv` errors with no `%w` wrap and no CLI-context, so a typo like `-type int32 -value foo` surfaces as `strconv.ParseInt: parsing "foo": invalid syntax` without naming the flag — out of line with the GoStyleGuide "wrap errors with useful context using `%w`" rule (Client.Go-017). |
|
|
| 5 | Security | No issues found — TLS-by-default with TLS 1.2 floor, API-key redaction in CLI JSON output, no secret logging. |
|
|
| 6 | Performance & resource management | No issues found — `defer client.Close()` / `defer subscription.Close()` applied consistently; bench-read-bulk preallocates the latency slice. |
|
|
| 7 | Design-document adherence | No new issues. Lazy `grpc.NewClient` + readiness probe (Client.Go-005) and the shared `dial` helper (Client.Go-009) are applied uniformly across `Dial` and `DialGalaxy`. |
|
|
| 8 | Code organization & conventions | Client.Go-015 resolved. New issue: `runStreamEvents` does not install a signal handler (Ctrl+C kills the process abruptly), while `runGalaxyWatch` does — the two long-running stream commands have divergent shutdown UX (Client.Go-020). |
|
|
| 9 | Testing coverage | Client.Go-008/016 resolved. New issue: the six new bulk and bench subcommands (`read-bulk`, `write-bulk`, `write2-bulk`, `write-secured-bulk`, `write-secured2-bulk`, `bench-read-bulk`) have no CLI-level unit tests — in particular the Client.Go-015 secured-flag-gating fix has no regression test (Client.Go-021). |
|
|
| 10 | Documentation & comments | Client.Go-010/012 resolved. New issue: `runGalaxyWatch` parses `-last-seen-deploy-time` with `time.RFC3339` (no fractional seconds), while `parseRfc3339Timestamp` for `-timestamp-value` accepts `time.RFC3339Nano` — the CLI advertises "RFC 3339" for both but quietly differs on sub-second support (Client.Go-019). |
|
|
|
|
### 2026-05-24 review (commit d692232)
|
|
|
|
Re-review pass at `d692232`. Two commits touched `clients/go/`: `397d3c5`
|
|
updated the proto root in `generate-proto.ps1` to `src/ZB.MOM.WW.MxGateway.Contracts/Protos`;
|
|
`d692232` dropped stale `SessionId =` lines from `AcknowledgeAlarmRequest` /
|
|
`AcknowledgeAlarmReply` fixtures in `alarms_test.go` after the proto retired
|
|
the field, and substituted `CorrelationId: req.GetClientCorrelationId()` for
|
|
the fake server's reply. Module path (`gitea.dohertylan.com/dohertj2/mxaccessgw/clients/go`)
|
|
and subpackage `mxgateway` intentionally unchanged per Go convention.
|
|
`go build ./...` and `go test ./...` are green at HEAD.
|
|
|
|
| # | Category | Result |
|
|
|---|---|---|
|
|
| 1 | Correctness & logic bugs | No issues found in the a020350..d692232 diff. |
|
|
| 2 | mxaccessgw conventions | No issues found — Go convention preserved (short lowercase package name, gitea-scoped module path). The generated `_pb.go` descriptor still carries `MxGateway.Contracts.Proto` in its csharp_namespace bytes; wire-level metadata not used by Go, intentionally not regenerated. |
|
|
| 3 | Concurrency & thread safety | No issues found in this diff. |
|
|
| 4 | Error handling & resilience | No issues found in this diff. |
|
|
| 5 | Security | No issues found in this diff. |
|
|
| 6 | Performance & resource management | No issues found in this diff. |
|
|
| 7 | Design-document adherence | No issues found in this diff. |
|
|
| 8 | Code organization & conventions | No issues found in this diff. |
|
|
| 9 | Testing coverage | No issues found in this diff — `alarms_test.go` correctly drops retired `session_id` from `AcknowledgeAlarmRequest` and retains it on `QueryActiveAlarmsRequest`. |
|
|
| 10 | Documentation & comments | No issues found in this diff. |
|
|
|
|
### 2026-05-24 re-review (commit 42b0037)
|
|
|
|
Re-review pass at `42b0037`. Four commits touched `clients/go/`:
|
|
`71d2c39` ported the long-lived `batch` subcommand; `8aaab82` added the bulk
|
|
SDK methods (`WriteBulk`/`Write2Bulk`/`WriteSecuredBulk`/`WriteSecured2Bulk`/`ReadBulk`)
|
|
and the six bulk/bench CLI subcommands (`read-bulk`, `write-bulk`, `write2-bulk`,
|
|
`write-secured-bulk`, `write-secured2-bulk`, `bench-read-bulk`); `6f0d142`
|
|
ported `stream-alarms` and `acknowledge-alarm` (CLI + SDK type aliases +
|
|
`Client.StreamAlarms`); `8738735` updated the README to document the new SDK
|
|
methods. `gofmt -l .`, `go vet ./...`, `go build ./...`, and
|
|
`go test ./... -count=1` are clean at HEAD.
|
|
|
|
This re-review surfaces two regressions of previously-Resolved findings and
|
|
four genuinely-new issues introduced by the new bulk / batch / alarm surface.
|
|
Client.Go-022 and Client.Go-023 are recorded as new IDs (not status flips on
|
|
the original) because the relevant code was deleted and re-added in
|
|
`8aaab82`, so the prior Resolution remains historically accurate against
|
|
that earlier commit.
|
|
|
|
| # | Category | Result |
|
|
|---|---|---|
|
|
| 1 | Correctness & logic bugs | New issue: bulk SDK methods accept a non-nil but empty entry/tag slice and send a no-op bulk command (Client.Go-025). |
|
|
| 2 | mxaccessgw conventions | No issues found in this diff. `gofmt -l .` and `go vet ./...` clean. |
|
|
| 3 | Concurrency & thread safety | New issue: `runBenchReadBulk` warm-up + steady-state loops once again ignore `ctx.Err()` — a regression of Client.Go-018 (Client.Go-023). |
|
|
| 4 | Error handling & resilience | New issue: `runBatch` uses a default 64 KiB `bufio.Scanner` buffer; one long bulk-args line aborts the whole batch session with no EOR sentinel (Client.Go-026). |
|
|
| 5 | Security | No issues found — `runAcknowledgeAlarm` doesn't log the `-comment` / `-operator` values; the bulk-write SDK doc comments still flag credential-sensitive values. |
|
|
| 6 | Performance & resource management | No issues found in this diff — `defer client.Close()` / `defer session.Close(...)` / `defer session.UnsubscribeBulk(...)` applied consistently in `runBenchReadBulk`. |
|
|
| 7 | Design-document adherence | No issues found in this diff. `Client.StreamAlarms` correctly skips `c.callContext` (streams must not carry the unary call timeout); `AcknowledgeAlarm` correctly uses it. |
|
|
| 8 | Code organization & conventions | New issues: `runWriteBulkVariant` re-introduces the unused `secured` parameter, the dead `_ = secured` line, and unconditionally-registered secured/non-secured flags — regresses Client.Go-015 (Client.Go-022). `runBatch` treats a blank line in the middle of a script as end-of-session (Client.Go-027). |
|
|
| 9 | Testing coverage | New issue: the five new bulk SDK methods and `Client.StreamAlarms` have no unit tests in `mxgateway/` (Client.Go-024). |
|
|
| 10 | Documentation & comments | No issues found in this diff. README documents the new `StreamAlarms`/`AcknowledgeAlarm` SDK calls; `Session.ReadBulk` documents the cached-vs-snapshot semantics and `timeout=0` default; `WriteSecuredBulk` flags credential sensitivity. |
|
|
|
|
## Findings
|
|
|
|
### Client.Go-001
|
|
|
|
| Field | Value |
|
|
|---|---|
|
|
| Severity | High |
|
|
| Category | Correctness & logic bugs |
|
|
| Location | `clients/go/mxgateway/errors.go:88-93`, `clients/go/mxgateway/errors.go:117-128` |
|
|
| Status | Resolved |
|
|
|
|
**Description:** `MxAccessError.Unwrap` returns `e.Command` directly. `EnsureMxAccessSuccess` constructs `&MxAccessError{Reply: reply}` with `Command` left nil (the HRESULT / failing-`MxStatusProxy` path). When `Command` is a nil `*CommandError`, `Unwrap()` returns a non-nil `error` interface wrapping a nil pointer. Consequently `errors.As(err, &ce)` for `*CommandError` returns `true` while setting `ce` to nil — a caller writing the idiomatic `if errors.As(err, &commandErr) { use commandErr.Status }` nil-dereferences and panics. Verified empirically; the existing test only exercises the populated-`Command` path.
|
|
|
|
**Recommendation:** Make `Unwrap` return an untyped nil when `Command` is nil: `if e == nil || e.Command == nil { return nil }; return e.Command`. Add a test for the HRESULT-only `MxAccessError` asserting `errors.As(err, &ce)` is `false`.
|
|
|
|
**Resolution:** Resolved 2026-05-18: `MxAccessError.Unwrap` now returns an untyped nil when `Command` is nil, so `errors.As` no longer binds a typed-nil `*CommandError`; added `errors_test.go` regression coverage for the HRESULT-only and populated-`Command` paths.
|
|
|
|
### Client.Go-002
|
|
|
|
| Field | Value |
|
|
|---|---|
|
|
| Severity | Medium |
|
|
| Category | Error handling & resilience |
|
|
| Location | `clients/go/mxgateway/session.go:440-516` |
|
|
| Status | Resolved |
|
|
|
|
**Description:** For the `Events`/`EventsAfter` compatibility API (`cancelWhenResultBufferFull == true`), when the 16-slot `results` channel is full `sendEventResult` cancels and returns `false`; the goroutine returns and `close(results)` runs — the consumer sees the channel close with **no `EventResult{Err: ...}` ever delivered**. A slow consumer cannot distinguish "stream ended normally" from "events were silently dropped." This contradicts the design doc's "libraries should not reorder, coalesce, or drop events by default", and a test currently pins this lossy behaviour.
|
|
|
|
**Recommendation:** Before cancelling on a full buffer, deliver a terminal `EventResult` carrying an explicit error (e.g. `ErrEventBufferOverflow`). Document the behaviour on `Session.Events`; steer callers to `SubscribeEvents` (which blocks instead of dropping).
|
|
|
|
**Resolution:** Resolved 2026-05-18: confirmed against source — on a full bounded buffer the compatibility path cancelled and closed `results` with no terminal result. Added the exported sentinel `ErrEventBufferOverflow` (`errors.go`); `sendEventResult` now, on a full buffer, cancels the stream then calls the new `deliverTerminalResult` helper, which evicts one of the oldest buffered events to make room and places `EventResult{Err: ErrEventBufferOverflow}` so it becomes the consumer's last item before the channel closes. The previously lossy regression test (`TestEventsAfterCancelsStreamWhenCompatibilityChannelIsAbandoned`) was re-pointed to assert the terminal `ErrEventBufferOverflow` result is delivered. `clients/go/README.md` now documents the bounded-buffer/overflow behaviour and steers no-loss callers to `SubscribeEvents`.
|
|
|
|
### Client.Go-003
|
|
|
|
| Field | Value |
|
|
|---|---|
|
|
| Severity | Medium |
|
|
| Category | Correctness & logic bugs |
|
|
| Location | `clients/go/cmd/mxgw-go/main.go:517-532` |
|
|
| Status | Resolved |
|
|
|
|
**Description:** `parseInt32List` calls `panic(err)` when an `item-handles` token fails to parse as an int32. The CLI is a documented user-facing tool; a typo like `-item-handles 1,foo` crashes the process with an unrecovered panic and stack trace instead of returning a clean error and exit code 2 like every other validation path in `main.go`.
|
|
|
|
**Recommendation:** Change `parseInt32List` to return `([]int32, error)` and have `runUnsubscribeBulk` propagate the error, matching `parseValue`'s pattern.
|
|
|
|
**Resolution:** Resolved 2026-05-18: confirmed against source — `parseInt32List` called `panic(err)` on a malformed token. It now returns `([]int32, error)`, wrapping the bad token (`invalid item handle %q: %w`); `runUnsubscribeBulk` parses item handles before dialing and returns the error, so a typo flows through `runWithIO` to `os.Exit(2)` like other validation paths. Regression tests `TestParseInt32ListParsesValidTokens` and `TestParseInt32ListReturnsErrorOnMalformedToken` added to `cmd/mxgw-go/main_test.go`.
|
|
|
|
### Client.Go-004
|
|
|
|
| Field | Value |
|
|
|---|---|
|
|
| Severity | Low |
|
|
| Category | mxaccessgw conventions |
|
|
| Location | `clients/go/mxgateway/alarms_test.go:153-154`, `clients/go/mxgateway/galaxy_test.go:58-59` |
|
|
| Status | Resolved |
|
|
|
|
**Description:** `gofmt -l` flags `alarms_test.go` and `galaxy_test.go` for misaligned struct-literal field padding. The Go client README lists `gofmt` as part of the workflow and the repo enforces style; unformatted committed code breaks `gofmt`-gated checks and CI.
|
|
|
|
**Recommendation:** Run `gofmt -w mxgateway/alarms_test.go mxgateway/galaxy_test.go`.
|
|
|
|
**Resolution:** Resolved 2026-05-18: confirmed `gofmt -l .` flagged both files for misaligned struct-literal padding. Ran `gofmt -w` on `mxgateway/alarms_test.go` and `mxgateway/galaxy_test.go`; `gofmt -l .` is now clean for the whole module.
|
|
|
|
### Client.Go-005
|
|
|
|
| Field | Value |
|
|
|---|---|
|
|
| Severity | Low |
|
|
| Category | Design-document adherence |
|
|
| Location | `clients/go/mxgateway/client.go:64,68`, `clients/go/mxgateway/galaxy.go:83,87` |
|
|
| Status | Resolved |
|
|
|
|
**Description:** The client uses `grpc.DialContext` with `grpc.WithBlock()`. In current grpc-go both are deprecated in favour of `grpc.NewClient` (lazy connection). `WithBlock` also changes failure semantics: a transient gateway-unavailable at dial time becomes a hard `Dial` error rather than a connection that recovers when the gateway comes up, working against the design doc's resilience intent.
|
|
|
|
**Recommendation:** Migrate to `grpc.NewClient`; if a fail-fast connect probe is still wanted, do an explicit readiness wait bounded by `DialTimeout`, and update the doc comment.
|
|
|
|
**Resolution:** Resolved 2026-05-18: confirmed `Dial`/`DialGalaxy` used the deprecated `grpc.DialContext` + `grpc.WithBlock` pair. Migrated both to the shared `dial(ctx, opts)` helper, which now builds a lazy connection with `grpc.NewClient` and runs an explicit `waitForReady` readiness probe (`Connect` + `WaitForStateChange` until `connectivity.Ready`) bounded by `DialTimeout` — preserving fail-fast behavior while letting an otherwise lazy connection recover when the gateway is briefly down. Note: `grpc.NewClient` defaults the target scheme to `dns`, so the bufconn test harnesses (`client_session_test.go`, `alarms_test.go`, `galaxy_test.go`) were updated to use `passthrough:///bufnet` so the fake target reaches the context dialer. New tests `TestDialFailsFastWhenGatewayUnreachable` and `TestDialReadinessProbeReachesReady` cover the probe; `go vet` reports no deprecation. `clients/go/README.md` documents the lazy-connect + readiness-probe semantics.
|
|
|
|
### Client.Go-006
|
|
|
|
| Field | Value |
|
|
|---|---|
|
|
| Severity | Low |
|
|
| Category | Error handling & resilience |
|
|
| Location | `clients/go/mxgateway/errors.go:9-130` |
|
|
| Status | Resolved |
|
|
|
|
**Description:** `docs/ClientLibrariesDesign.md` recommends a high-level error taxonomy (`TransportError`, `AuthenticationError`, `TimeoutError`, etc.). The Go client collapses all transport/gRPC failures into a single `GatewayError` with no way to classify transient (`Unavailable`, `DeadlineExceeded`) vs permanent (`Unauthenticated`, `InvalidArgument`) without manually unwrapping and calling `status.Code`.
|
|
|
|
**Recommendation:** Add a helper (e.g. `IsTransient(err) bool`) or expose the gRPC `codes.Code` on `GatewayError`, so retry/timeout/auth handling can be written without re-parsing the wrapped error.
|
|
|
|
**Resolution:** Resolved 2026-05-18: implemented the recommended classification surface in `errors.go` rather than a full parallel type hierarchy (the existing `GatewayError`/`CommandError`/`MxAccessError` chain already separates transport from protocol from MXAccess failures). Added `GatewayError.Code()` (returns the wrapped gRPC `codes.Code`, `OK` for nil, `Unknown` for a non-status error) and the free function `IsTransient(err error) bool`, which unwraps through `*GatewayError` and any gRPC-status chain and reports `true` for `Unavailable`, `DeadlineExceeded`, `ResourceExhausted`, and `Aborted`. Tests `TestGatewayErrorCode` and `TestIsTransient` cover the matrix; `clients/go/README.md` documents both for retry/timeout/auth handling.
|
|
|
|
### Client.Go-007
|
|
|
|
| Field | Value |
|
|
|---|---|
|
|
| Severity | Low |
|
|
| Category | Correctness & logic bugs |
|
|
| Location | `clients/go/mxgateway/session.go:526-532` |
|
|
| Status | Resolved |
|
|
|
|
**Description:** `newCorrelationID` returns an empty string when `crypto/rand.Read` fails, silently producing an `MxCommandRequest` with no correlation id. `rand.Read` failure is rare, but the failure mode (untraceable command, no error surfaced) is worse than failing loud, and the empty-id path is untested.
|
|
|
|
**Recommendation:** Either propagate the error up through `invokeCommand`, or fall back to a time/counter-based id rather than an empty string.
|
|
|
|
**Resolution:** Resolved 2026-05-18: confirmed `newCorrelationID` returned `""` on a `rand.Read` failure. It now falls back to a non-empty `"fallback-<unixnano>-<counter>"` id built from `time.Now().UnixNano()` and a process-wide `atomic.Uint64` monotonic counter, so every command stays traceable even without entropy. The `crypto/rand` call was routed through a `randRead` package variable so the failure path is testable; `TestNewCorrelationIDFallsBackOnRandFailure` simulates a `rand.Read` failure and asserts the fallback id is non-empty, `fallback-` prefixed, and unique, and `TestNewCorrelationIDUsesRandEntropy` pins the happy path.
|
|
|
|
### Client.Go-008
|
|
|
|
| Field | Value |
|
|
|---|---|
|
|
| Severity | Low |
|
|
| Category | Testing coverage |
|
|
| Location | `clients/go/mxgateway/` (test files) |
|
|
| Status | Resolved |
|
|
|
|
**Description:** Several critical paths are untested: TLS credential resolution in `resolveTransportCredentials` (only the `Plaintext` path is exercised); the `callContext` deadline-shortening logic (`client.go:198-204`) including the negative-timeout disable case; and `NativeValue`/`NativeArray` for the array, raw-bytes, null, and unsupported-kind branches.
|
|
|
|
**Recommendation:** Add unit tests for `resolveTransportCredentials` precedence, `callContext` deadline arithmetic, and `NativeValue`/`NativeArray` round-trips for every kind.
|
|
|
|
**Resolution:** Resolved 2026-05-18: added `clients/go/mxgateway/coverage_test.go`. `TestResolveTransportCredentialsPrecedence` exercises every branch (explicit `TransportCredentials`, `Plaintext`, missing `CACertFile` error, `TLSConfig` + `ServerNameOverride`, default TLS floor) and `TestResolveTransportCredentialsDoesNotMutateTLSConfig` confirms the supplied `*tls.Config` is cloned. `TestCallContextDeadlineArithmetic` covers zero/default, negative-disable, positive timeout, caller-deadline-sooner-kept, and caller-deadline-later-shortened. `TestNativeValueEdgeKinds`, `TestNativeArrayEdgeKinds`, and `TestNativeValueUnsupportedKind` cover the null, raw-bytes (including the no-alias copy), array, timestamp-with-nil, and unsupported-kind branches.
|
|
|
|
### Client.Go-009
|
|
|
|
| Field | Value |
|
|
|---|---|
|
|
| Severity | Low |
|
|
| Category | Code organization & conventions |
|
|
| Location | `clients/go/mxgateway/galaxy.go:60-93,241-256`, `clients/go/mxgateway/client.go:41-74,190-205` |
|
|
| Status | Resolved |
|
|
|
|
**Description:** `DialGalaxy`/`Dial` and `GalaxyClient.callContext`/`Client.callContext` are near-identical duplicates (dial-context setup, credential resolution, dial-option assembly, deadline arithmetic). A fix to one (e.g. the Client.Go-005 dial migration) must be applied twice and can drift.
|
|
|
|
**Recommendation:** Extract a shared unexported `dial(ctx, opts)` and a free `callContext(opts, ctx)` function, and have both client constructors call them.
|
|
|
|
**Resolution:** Resolved 2026-05-18: extracted the shared unexported `dial(ctx, opts) (*grpc.ClientConn, error)` (credential resolution, dial-option assembly, `grpc.NewClient`, readiness probe) and the free `callContext(ctx, callTimeout) (context.Context, context.CancelFunc)` into `client.go`. `Dial`/`DialGalaxy` and both `(*Client).callContext`/`(*GalaxyClient).callContext` methods now delegate to them; the duplicated dial and deadline code in `galaxy.go` was removed (its now-unused `errors` import dropped). This was done together with the Client.Go-005 migration so the `grpc.NewClient` change lives in exactly one place.
|
|
|
|
### Client.Go-010
|
|
|
|
| Field | Value |
|
|
|---|---|
|
|
| Severity | Low |
|
|
| Category | Documentation & comments |
|
|
| Location | `clients/go/mxgateway/client.go:39-40` |
|
|
| Status | Resolved |
|
|
|
|
**Description:** The `Dial` doc comment states it configures "blocking dial cancellation from ctx." This describes the deprecated `WithBlock` behaviour; once Client.Go-005 is addressed the comment is misleading about how connection establishment and cancellation work.
|
|
|
|
**Recommendation:** Reword to describe the actual connect/timeout semantics after resolving Client.Go-005, and clarify that `DialTimeout` bounds the initial connect attempt.
|
|
|
|
**Resolution:** Resolved 2026-05-18: alongside the Client.Go-005 migration, the `Dial` doc comment was rewritten to describe the lazy `grpc.NewClient` connection, the `DialTimeout`-bounded (default 10s, or ctx deadline when sooner) readiness probe, that a briefly-unavailable gateway recovers instead of producing a hard error, and that cancelling `ctx` aborts the probe. `DialGalaxy` and the new `dial`/`waitForReady`/`callContext` helpers carry matching doc comments.
|
|
|
|
### Client.Go-011
|
|
|
|
| Field | Value |
|
|
|---|---|
|
|
| Severity | Low |
|
|
| Category | Correctness & logic bugs |
|
|
| Location | `clients/go/mxgateway/alarms_test.go:66-73` |
|
|
| Status | Resolved |
|
|
|
|
**Description:** `TestAcknowledgeAlarmRejectsNilRequest` contains a no-op `if` with an empty body whose intent is documented in a comment ("Accept either: the helper returned the literal sentinel, or the generic transport error — both prove nil was rejected"). The condition
|
|
|
|
```go
|
|
if err == nil || !errors.Is(err, errors.Unwrap(err)) && err.Error() != "mxgateway: acknowledge alarm request is required" {
|
|
// ...
|
|
}
|
|
```
|
|
|
|
evaluates expressions for side effects only and asserts nothing — Go's `&&` binds tighter than `||`, the body is empty, and the actual nil check happens on the very next `if err == nil`. The block is effectively dead code masquerading as a check. It also evaluates `errors.Unwrap(err)` regardless of `err`'s shape, and would call `err.Error()` even when err might be a wrapped status error whose message wording the gateway is free to change — making the apparent assertion brittle on top of being dead.
|
|
|
|
**Recommendation:** Drop the empty-body `if` entirely (the subsequent `if err == nil { t.Fatalf(...) }` already enforces the contract), or, if the intent is to additionally pin the literal error message for the sentinel path, replace it with a real assertion (`if err.Error() != "mxgateway: acknowledge alarm request is required" { t.Fatalf(...) }`) and remove the spurious `errors.Is(err, errors.Unwrap(err))` clause.
|
|
|
|
**Resolution:** 2026-05-20 — Removed the empty-body `if` in `TestAcknowledgeAlarmRejectsNilRequest`; the subsequent `if err == nil { t.Fatalf(...) }` already enforces the nil-rejection contract without the dead, brittle compound predicate.
|
|
|
|
### Client.Go-012
|
|
|
|
| Field | Value |
|
|
|---|---|
|
|
| Severity | Low |
|
|
| Category | Documentation & comments |
|
|
| Location | `clients/go/cmd/mxgw-go/main.go:1063-1065`, `clients/go/cmd/mxgw-go/main.go:88-104` |
|
|
| Status | Resolved |
|
|
|
|
**Description:** `writeUsage` lists the available subcommands as `version|open-session|close-session|register|add-item|advise|subscribe-bulk|unsubscribe-bulk|write|stream-events|smoke|galaxy-test-connection|galaxy-last-deploy|galaxy-discover|galaxy-watch`. Six subcommands wired into `run` are missing from this list: `read-bulk`, `write-bulk`, `write2-bulk`, `write-secured-bulk`, `write-secured2-bulk`, and `bench-read-bulk`. A user invoking `mxgw-go` with no args or an unknown command (the two paths that print this banner) sees an incomplete CLI surface and may believe the bulk-write / read-bulk families are not implemented. The README does document them, but the inline usage banner is the first source of truth a CLI user consults.
|
|
|
|
**Recommendation:** Extend the usage string to include every command registered in the `switch args[0]` in `run`, or generate it from a single source-of-truth slice keyed on command name → handler so the two cannot drift again.
|
|
|
|
**Resolution:** 2026-05-20 — `writeUsage` now lists the previously missing `read-bulk`, `write-bulk`, `write2-bulk`, `write-secured-bulk`, `write-secured2-bulk`, and `bench-read-bulk` subcommands alongside the original surface, so the no-args / unknown-command banner reflects every command wired into `run`.
|
|
|
|
### Client.Go-013
|
|
|
|
| Field | Value |
|
|
|---|---|
|
|
| Severity | Low |
|
|
| Category | Concurrency & thread safety |
|
|
| Location | `clients/go/cmd/mxgw-go/main.go:1246-1249`, `clients/go/cmd/mxgw-go/main.go:1257-1262` |
|
|
| Status | Resolved |
|
|
|
|
**Description:** In `runGalaxyWatch`, the signal-cancellation branch carefully drains the buffered `events` channel after `cancelStream()` so the `WatchDeployEvents` goroutine can exit (`for range events { }`). The limit-reached branch (`if *limit > 0 && count >= *limit { cancelStream(); return nil }`) skips that drain and returns immediately. After the function returns, `defer client.Close()` runs and tears down the gRPC connection; in the gap before the connection close propagates, the WatchDeployEvents goroutine may still be blocked on `case events <- event:` (the channel is buffered to 16 but a slow producer can refill it) — the goroutine then exits via `<-ctx.Done()` because `streamCtx` was cancelled, so it isn't a permanent leak, but the two cancellation paths behave inconsistently and the limit-reached path can briefly hold a goroutine plus the gRPC stream while the client tears down underneath it.
|
|
|
|
**Recommendation:** Factor the drain into a helper and use it from both branches, e.g. after `cancelStream()` always `for range events { }` (and let the surrounding `select`/`for` re-evaluate `<-errs` if a terminal error was already buffered). Alternatively, drop the explicit drain in both branches and rely on `defer cancelStream()` plus `defer client.Close()` — but pick one model and apply it consistently.
|
|
|
|
**Resolution:** 2026-05-20 — The limit-reached branch in `runGalaxyWatch` now drains the buffered `events` channel (`for range events { }`) after `cancelStream()`, matching the signal-cancel branch. Both cancellation paths now wait for the `WatchDeployEvents` goroutine to exit before `defer client.Close()` tears the gRPC connection down.
|
|
|
|
### Client.Go-014
|
|
|
|
| Field | Value |
|
|
|---|---|
|
|
| Severity | Low |
|
|
| Category | Error handling & resilience |
|
|
| Location | `clients/go/mxgateway/session.go:602`, `clients/go/mxgateway/galaxy.go:189` |
|
|
| Status | Resolved |
|
|
|
|
**Description:** Two stream Recv loops compare end-of-stream with `err == io.EOF` directly:
|
|
|
|
- `session.go:602` — `if err == io.EOF || status.Code(err) == codes.Canceled || streamCtx.Err() != nil { return }`
|
|
- `galaxy.go:189` — `if recvErr == io.EOF { return }`
|
|
|
|
gRPC's generated `Recv()` does return the `io.EOF` sentinel directly today, so the comparisons work in practice. However, the Go idiom (and the project's `docs/style-guides/GoStyleGuide.md`) is to use `errors.Is(err, io.EOF)` so future wrapping (e.g. an interceptor decorating Recv errors) does not silently flip the loop from "stream finished normally" to "stream produced an error". The mxgateway client itself wraps non-EOF Recv errors in `*GatewayError`, which `errors.Is` already supports — using `errors.Is` keeps both paths consistent.
|
|
|
|
**Recommendation:** Replace `recvErr == io.EOF` / `err == io.EOF` with `errors.Is(err, io.EOF)` (the `errors` package is already imported in both files).
|
|
|
|
**Resolution:** 2026-05-20 — Both stream Recv loops now use `errors.Is(err, io.EOF)`: `session.go` already imported `errors`, and `galaxy.go` gained the missing `errors` import alongside the `recvErr == io.EOF` → `errors.Is(recvErr, io.EOF)` change, keeping EOF detection robust against any future Recv-error wrapping.
|
|
|
|
### Client.Go-015
|
|
|
|
| Field | Value |
|
|
|---|---|
|
|
| Severity | Low |
|
|
| Category | Code organization & conventions |
|
|
| Location | `clients/go/cmd/mxgw-go/main.go:410-512` |
|
|
| Status | Resolved |
|
|
|
|
**Description:** `runWriteBulkVariant(ctx, args, stdout, stderr, command, withTimestamp, secured bool)` accepts `secured` but never uses it — the routing is keyed on `command` (the string `"write-bulk"` / `"write2-bulk"` / `"write-secured-bulk"` / `"write-secured2-bulk"`). The function ends with `_ = secured // currently only used for routing above; reserved for future per-variant validation`, which is misleading because `secured` is not in fact used for routing. The four wrapper functions (`runWriteBulk`, `runWrite2Bulk`, `runWriteSecuredBulk`, `runWriteSecured2Bulk`) all pass a `secured` argument that has no effect. The four CLI options `-current-user-id`, `-verifier-user-id` are unconditionally registered on every variant, including the non-secured ones, so a `write-bulk` invocation that passes `-current-user-id 42` silently does nothing. Either remove `secured` and the dead `_ = secured` comment, or use it to gate the registration of secured-only flags so wrong combinations are rejected with a clean error.
|
|
|
|
**Recommendation:** Drop the `secured` parameter (the `command` switch already distinguishes the four variants) and the misleading `_ = secured` line; or, if validation is the goal, branch flag registration on `secured` so secured-only flags are unavailable for the non-secured variants and emit a clean usage error if they appear.
|
|
|
|
**Resolution:** 2026-05-20 — Dropped the unused `secured` parameter from `runWriteBulkVariant` (the `command` switch already distinguishes the four variants) and removed the misleading `_ = secured` line. The variant is now derived locally from `command` and used to gate flag registration: `-current-user-id` / `-verifier-user-id` are only registered for the secured variants and `-user-id` only for Write/Write2, so a wrong-variant flag now fails with a clean `flag provided but not defined` usage error instead of silently no-op'ing. The four `runWrite*Bulk` wrappers were updated to match the new signature.
|
|
|
|
### Client.Go-016
|
|
|
|
| Field | Value |
|
|
|---|---|
|
|
| Severity | Low |
|
|
| Category | Testing coverage |
|
|
| Location | `clients/go/mxgateway/galaxy_test.go:382-429` |
|
|
| Status | Resolved |
|
|
|
|
**Description:** `fakeGalaxyServer.watchSendInterval` is declared on the test fake and consulted inside `WatchDeployEvents` (`if s.watchSendInterval > 0 { ... }`) but no test in the package sets a non-zero value. The dead field plus its branch were presumably added to support a backpressure / pacing test that was never landed, and now the only effect is reader confusion ("which test uses this?") and a pointlessly larger fake. Backpressure on the bootstrap-plus-events sequence is also genuinely worth testing, given that `WatchDeployEvents` writes to a 16-deep buffered channel.
|
|
|
|
**Recommendation:** Either delete the unused `watchSendInterval` field and its branch in `WatchDeployEvents`, or add the test it was added for — e.g. one that pumps more than 16 events with a small interval and asserts the consumer keeps up without losing or reordering events. Linking the field to a `// for TestX` comment if it stays would also help.
|
|
|
|
**Resolution:** 2026-05-20 — Removed the unused `watchSendInterval` field from `fakeGalaxyServer` and the corresponding `if s.watchSendInterval > 0 { ... }` branch in `WatchDeployEvents`; no test set the field, so the dead code path is gone and the fake is leaner. `gofmt -w` reflowed the struct to drop the no-longer-needed field-name padding.
|
|
|
|
### Client.Go-017
|
|
|
|
| Field | Value |
|
|
|---|---|
|
|
| Severity | Low |
|
|
| Category | Error handling & resilience |
|
|
| Location | `clients/go/cmd/mxgw-go/main.go:954-991` |
|
|
| Status | Resolved |
|
|
|
|
**Description:** `parseValue` returns the raw `strconv.ParseBool` / `strconv.ParseInt` / `strconv.ParseFloat` error verbatim — no wrap with `%w` and no indication of which CLI flag was the source. A user running `mxgw-go write -type int32 -value foo` sees
|
|
|
|
```
|
|
strconv.ParseInt: parsing "foo": invalid syntax
|
|
```
|
|
|
|
with no mention of `-value`, `-type`, or which subcommand failed. The same pattern hits every typed branch (bool, int32, int64, float, double). Compare with the sibling helpers in the same file: `parseInt32List` wraps with `"invalid item handle %q: %w"` (Client.Go-003 resolution) and `parseRfc3339Timestamp` wraps with `"invalid RFC 3339 timestamp %q: %w"`. `parseValue` was missed and is inconsistent with those two. The GoStyleGuide (`docs/style-guides/GoStyleGuide.md`, "Errors" section) requires "Wrap errors with useful context using `%w`."
|
|
|
|
**Recommendation:** Wrap each `strconv` error with the offending input and type, e.g. `return nil, fmt.Errorf("invalid %s value %q: %w", valueType, valueText, err)`. The wrapper handles all five typed branches uniformly without a per-branch change.
|
|
|
|
**Resolution:** 2026-05-20 — Each typed branch of `parseValue` now wraps the bare `strconv` error with `%w` and names the offending flag and value (`"invalid -value for -type %s: %q: %w"`), so `mxgw-go write -type int32 -value foo` surfaces the source flag, the requested type, and the bad token while still letting `errors.Is/As` reach the underlying `strconv` sentinel. The new `TestParseValueWrapsStrconvErrorWithFlagContext` table-test pins all five typed branches (bool, int32, int64, float, double) to the new wrapper shape.
|
|
|
|
### Client.Go-018
|
|
|
|
| Field | Value |
|
|
|---|---|
|
|
| Severity | Low |
|
|
| Category | Concurrency & thread safety |
|
|
| Location | `clients/go/cmd/mxgw-go/main.go:593-623` |
|
|
| Status | Resolved |
|
|
|
|
**Description:** `runBenchReadBulk`'s warm-up and steady-state loops are wall-clock-only:
|
|
|
|
```go
|
|
for time.Now().Before(warmupDeadline) {
|
|
_, _ = session.ReadBulk(ctx, serverHandle, tags, timeout)
|
|
}
|
|
...
|
|
for time.Now().Before(steadyDeadline) {
|
|
callStart := time.Now()
|
|
results, err := session.ReadBulk(ctx, serverHandle, tags, timeout)
|
|
...
|
|
}
|
|
```
|
|
|
|
Neither loop checks `ctx.Done()` / `ctx.Err()`. If the parent context is cancelled (e.g. the operator Ctrl+Cs the benchmark, or the cross-language bench driver `scripts/bench-read-bulk.ps1` times out and kills the child early), the loops keep iterating until their wall-clock deadlines elapse. Each `ReadBulk` call inside fails fast (the gRPC call inherits the cancelled context and returns `context.Canceled`), but the steady-state loop counts those as `failedCalls++` and keeps spinning — wasting CPU and inflating the `failedCalls` and `latencyMs.max` figures the PowerShell driver collates across all five clients. The .NET, Rust, Python, and Java bench drivers should be checked for the same shape, but the Go one is the only one being reviewed here. Note that `runBenchReadBulk` is the only Go CLI command that does NOT register its own signal handler (compare with `runGalaxyWatch` which does via `signal.NotifyContext`).
|
|
|
|
**Recommendation:** Drop out of both loops as soon as `ctx.Err() != nil`. Concretely, change the loop conditions to `for time.Now().Before(warmupDeadline) && ctx.Err() == nil` (and the same on `steadyDeadline`), or use a `select { case <-ctx.Done(): break loop; default: }` guard at the top of each iteration. The cross-language bench shape (`durationMs`, `totalCalls`, `failedCalls`, `latencyMs`) stays the same — the bench just exits sooner and reports the truncated window faithfully.
|
|
|
|
**Resolution:** 2026-05-20 — Both the warm-up and steady-state loops in `runBenchReadBulk` now carry an `&& ctx.Err() == nil` guard alongside the wall-clock check, so a cancelled parent context (Ctrl+C, or the cross-language bench driver killing the child early) breaks the loop instead of spinning failing `ReadBulk` calls until the deadline elapses. The cross-language bench JSON shape is unchanged — the truncated window is just reported faithfully via `durationMs` / `totalCalls`.
|
|
|
|
### Client.Go-019
|
|
|
|
| Field | Value |
|
|
|---|---|
|
|
| Severity | Low |
|
|
| Category | Documentation & comments |
|
|
| Location | `clients/go/cmd/mxgw-go/main.go:710-716`, `clients/go/cmd/mxgw-go/main.go:1204,1213` |
|
|
| Status | Resolved |
|
|
|
|
**Description:** The CLI advertises two timestamp flags as "RFC3339" but parses them with different layouts:
|
|
|
|
- `-timestamp-value` (write2/write-secured2 bulk): `parseRfc3339Timestamp` uses `time.RFC3339Nano`, which accepts both `2026-04-28T10:00:00Z` and `2026-04-28T10:00:00.123456789Z`.
|
|
- `-last-seen-deploy-time` (galaxy-watch): `time.Parse(time.RFC3339, ...)`, which rejects fractional seconds.
|
|
|
|
A user copy-pasting an `ObservedAt` timestamp from `galaxy-watch -json` (which is emitted as `RFC3339Nano` by `formatDeployEvent`) directly into `-last-seen-deploy-time` will get a parse error if the source value carried a fractional component, even though both flag descriptions say "RFC3339". The flag help string at `main.go:1204` literally says "RFC3339 timestamp", and the README example uses `2026-04-28T10:00:00Z` (whole seconds only), so the issue is silent until a fractional timestamp comes from the gateway.
|
|
|
|
**Recommendation:** Switch the `galaxy-watch` parse to `time.RFC3339Nano` to match `parseRfc3339Timestamp` (and the gateway's own emit format). One line change at `main.go:1213`. While there, update the flag help string and the README example to say "RFC 3339 (with optional fractional seconds)" so the two flags are documented uniformly.
|
|
|
|
**Resolution:** 2026-05-20 — `runGalaxyWatch` now parses `-last-seen-deploy-time` with `time.RFC3339Nano`, matching `parseRfc3339Timestamp` and the gateway's own `formatDeployEvent` emit format; the layout is strictly broader than the previous `time.RFC3339` (whole-second values still parse). The flag help string changed to "RFC 3339 timestamp (with optional fractional seconds)" and the `clients/go/README.md` example was extended with an explicit fractional-seconds line so the two flags advertise the same surface.
|
|
|
|
### Client.Go-020
|
|
|
|
| Field | Value |
|
|
|---|---|
|
|
| Severity | Low |
|
|
| Category | Code organization & conventions |
|
|
| Location | `clients/go/cmd/mxgw-go/main.go:753-802`, `clients/go/cmd/mxgw-go/main.go:1199-1275` |
|
|
| Status | Resolved |
|
|
|
|
**Description:** The two long-running stream commands have divergent Ctrl+C UX:
|
|
|
|
- `runGalaxyWatch` registers a signal handler:
|
|
|
|
```go
|
|
signalCtx, stopSignals := signal.NotifyContext(ctx, os.Interrupt, syscall.SIGTERM)
|
|
defer stopSignals()
|
|
streamCtx, cancelStream := context.WithCancel(signalCtx)
|
|
```
|
|
|
|
so Ctrl+C drains buffered events and returns cleanly.
|
|
|
|
- `runStreamEvents` does not register any signal handler — its parent context is `context.Background()` from `runWithIO`, so Ctrl+C abruptly kills the process. The deferred `subscription.Close()` and `client.Close()` never run, leaving the server-side stream to fault out on a torn TCP connection rather than a clean cancel.
|
|
|
|
The two commands are otherwise structurally identical (subscribe + loop until limit or external stop) — the inconsistency is one half of a pair that was missed when `galaxy-watch` was added. Worth flagging because it directly affects what an integrator who Ctrl+Cs `stream-events` sees in the gateway's logs (a transport reset rather than a `codes.Canceled`).
|
|
|
|
**Recommendation:** Mirror the `runGalaxyWatch` pattern in `runStreamEvents`: wrap `ctx` in `signal.NotifyContext(ctx, os.Interrupt, syscall.SIGTERM)`, derive `streamCtx` from it, and let `defer subscription.Close()` / `defer cancelStream()` tear the stream down on signal. The change is roughly six lines and brings the two stream commands into parity. Optionally factor a shared `withSignals(ctx) (context.Context, context.CancelFunc)` helper if a third stream command lands.
|
|
|
|
**Resolution:** 2026-05-20 — `runStreamEvents` now installs `signal.NotifyContext(ctx, os.Interrupt, syscall.SIGTERM)` (with a deferred `stopSignals()`) and derives `streamCtx` from the resulting signal-aware context, mirroring `runGalaxyWatch`. Ctrl+C now cancels the gRPC stream cleanly — the gateway sees `codes.Canceled` instead of a torn TCP connection — and the deferred `subscription.Close()` / `client.Close()` actually run on signal. The two long-running stream commands now share the same shutdown UX.
|
|
|
|
### Client.Go-021
|
|
|
|
| Field | Value |
|
|
|---|---|
|
|
| Severity | Low |
|
|
| Category | Testing coverage |
|
|
| Location | `clients/go/cmd/mxgw-go/main_test.go`, `clients/go/cmd/mxgw-go/main.go:363-520,522-655` |
|
|
| Status | Resolved |
|
|
|
|
**Description:** The six bulk / bench subcommands wired into `run` (`read-bulk`, `write-bulk`, `write2-bulk`, `write-secured-bulk`, `write-secured2-bulk`, `bench-read-bulk`) have **no CLI-level unit tests** in `main_test.go`. In particular, the Client.Go-015 resolution claims:
|
|
|
|
> `-current-user-id` / `-verifier-user-id` are only registered for the secured variants and `-user-id` only for Write/Write2, so a wrong-variant flag now fails with a clean `flag provided but not defined` usage error instead of silently no-op'ing.
|
|
|
|
But there is no test asserting that, e.g., `mxgw-go write-bulk -current-user-id 1 ...` returns a "flag provided but not defined" error, or that `mxgw-go write-secured-bulk -user-id 1 ...` does the same. A future refactor of `runWriteBulkVariant` (notably one that re-introduced the `secured` parameter) could silently re-permit the wrong flags without breaking any test. The same gap applies to: parameter validation in `runReadBulk` (bulk size, empty session/items rejection), the value-count vs handle-count mismatch error in `runWriteBulkVariant:447`, and `runBenchReadBulk`'s `bulk-size`/`duration-seconds` positivity checks.
|
|
|
|
`mxgateway/client_session_test.go` already covers the library-level happy paths (`TestWriteBulkBuildsOneBulkCommandAndReturnsPerEntryResults`, `TestReadBulkForwardsTimeoutAndUnpacksCachedFlag`, `TestSubscribeBulkBuildsOneBulkCommandAndReturnsResults`), so this finding is about CLI surface area only.
|
|
|
|
**Recommendation:** Add table-driven tests in `cmd/mxgw-go/main_test.go` along the existing `TestParseInt32List*` and `TestParseValueBuildsTypedValue` style:
|
|
|
|
- `TestRunWriteBulkVariantGatesSecuredFlags`: invoke `runWithIO` with `write-bulk -current-user-id 1 ...` and `write-secured-bulk -user-id 1 ...`, assert each returns an error matching `flag provided but not defined`.
|
|
- `TestRunReadBulkRejectsMissingArgs`: invoke `runWithIO` with `read-bulk` (no `-session-id`), assert the documented "session-id and items are required" error.
|
|
- `TestRunBenchReadBulkRejectsNonPositiveBulkSize` / `TestRunBenchReadBulkRejectsNonPositiveDuration`: pin the positivity checks at `main.go:544-549`.
|
|
- `TestRunWriteBulkVariantRejectsMismatchedHandlesAndValues`: pin the `len(handles) != len(valueTexts)` error at `main.go:447`.
|
|
|
|
Each is a few lines and routes through the existing `runWithIO` entry point, so it does not need a bufconn fake.
|
|
|
|
**Resolution:** 2026-05-20 — Added CLI-level table-driven regression tests in `cmd/mxgw-go/main_test.go` routed through `runWithIO`, so they need no bufconn fake: `TestRunWriteBulkVariantGatesSecuredFlags` pins Client.Go-015 by asserting `write-bulk -current-user-id`, `write-bulk -verifier-user-id`, `write2-bulk -current-user-id`, `write-secured-bulk -user-id`, and `write-secured2-bulk -user-id` all surface `flag provided but not defined`; `TestRunReadBulkRejectsMissingArgs` pins the "session-id and items are required" check across no-flags / missing-items / missing-session-id; `TestRunBenchReadBulkRejectsNonPositiveBulkSize` and `TestRunBenchReadBulkRejectsNonPositiveDuration` pin the positivity checks; `TestRunWriteBulkVariantRejectsMismatchedHandlesAndValues` pins the explicit `item-handles count ... does not match values count ...` error. `go test ./...` passes.
|
|
|
|
### Client.Go-022
|
|
|
|
| Field | Value |
|
|
|---|---|
|
|
| Severity | Medium |
|
|
| Category | Code organization & conventions |
|
|
| Location | `clients/go/cmd/mxgw-go/main.go:398-412,417-519` |
|
|
| Status | Resolved |
|
|
|
|
**Description:** Commit `8aaab82` ("Go client: port bulk read/write SDK methods + CLI subcommands") re-introduces every symptom that Client.Go-015 documented and was marked Resolved against an earlier commit:
|
|
|
|
- `runWriteBulkVariant(ctx, args, stdout, stderr, command, withTimestamp, secured bool)` accepts a `secured` parameter that is never used for routing — the switch at `main.go:478` keys on `command` (`"write-bulk"` / `"write2-bulk"` / `"write-secured-bulk"` / `"write-secured2-bulk"`). The function ends with the misleading `_ = secured // currently only used for routing above; reserved for future per-variant validation` at `main.go:517`, the exact line Client.Go-015 called out as both inaccurate (`secured` is not used for routing) and reserved-without-purpose.
|
|
- All four wrappers (`runWriteBulk`, `runWrite2Bulk`, `runWriteSecuredBulk`, `runWriteSecured2Bulk` at `main.go:398-412`) pass a `secured` argument that has no effect.
|
|
- `-current-user-id`, `-verifier-user-id`, and `-user-id` are all registered unconditionally for every variant (`main.go:427-429`), so `mxgw-go write-bulk -current-user-id 5 -item-handles 1 -values foo` silently no-ops the flag instead of failing with `flag provided but not defined`. The secured-only flag gating that Client.Go-015's resolution claimed is gone again.
|
|
|
|
Because the surrounding test file (`main_test.go`) lost the regression tests promised by Client.Go-021 between then and this commit, nothing caught the re-introduction.
|
|
|
|
**Recommendation:** Re-apply the Client.Go-015 fix on this re-added code. Drop the `secured` parameter and the `_ = secured` line (the `command` switch is the only routing key); derive the variant locally from `command`; register `-current-user-id` / `-verifier-user-id` only inside the secured branches and `-user-id` only inside Write/Write2 — so a wrong-variant flag fails with a clean `flag provided but not defined` usage error. Re-add the `TestRunWriteBulkVariantGatesSecuredFlags` table-test from the Client.Go-021 resolution so a future regression is caught by CI.
|
|
|
|
**Resolution:** 2026-05-24 — Re-applied the Client.Go-015 fix. Dropped the unused `secured` parameter from `runWriteBulkVariant` and the misleading `_ = secured` line; the variant is now derived locally from `command` and gates flag registration. `-current-user-id` / `-verifier-user-id` are only registered for the secured variants and `-user-id` only for Write/Write2, so a wrong-variant flag now fails with a clean `flag provided but not defined` usage error. The four `runWrite*Bulk` wrappers were updated to match the new signature. Regression test `TestRunWriteBulkVariantGatesSecuredFlags` in `cmd/mxgw-go/main_test.go` (table-driven across all five wrong-variant flag/command pairs) was re-added; it failed pre-fix on every case ("session-id, item-handles, and values are required" reached because the flag was silently accepted), and passes post-fix with the expected `flag provided but not defined`.
|
|
|
|
### Client.Go-023
|
|
|
|
| Field | Value |
|
|
|---|---|
|
|
| Severity | Medium |
|
|
| Category | Concurrency & thread safety |
|
|
| Location | `clients/go/cmd/mxgw-go/main.go:604-606,616-632` |
|
|
| Status | Resolved |
|
|
|
|
**Description:** `runBenchReadBulk`'s warm-up and steady-state loops are wall-clock-only again:
|
|
|
|
```go
|
|
warmupDeadline := time.Now().Add(time.Duration(*warmupSeconds) * time.Second)
|
|
timeout := time.Duration(*timeoutMs) * time.Millisecond
|
|
for time.Now().Before(warmupDeadline) {
|
|
_, _ = session.ReadBulk(ctx, serverHandle, tags, timeout)
|
|
}
|
|
...
|
|
for time.Now().Before(steadyDeadline) {
|
|
callStart := time.Now()
|
|
results, err := session.ReadBulk(ctx, serverHandle, tags, timeout)
|
|
...
|
|
}
|
|
```
|
|
|
|
Neither loop checks `ctx.Done()` / `ctx.Err()`. This is exactly the shape Client.Go-018 documented and was marked Resolved against an earlier commit — the `&& ctx.Err() == nil` guard the Resolution added has been removed by commit `8aaab82` (which re-added `runBenchReadBulk`). If the cross-language driver `scripts/bench-read-bulk.ps1` times out and kills the child early, or an operator Ctrl+Cs the bench, each `ReadBulk` call inside fails fast with `context.Canceled` but the steady-state loop counts those as `failedCalls++` and keeps spinning for the full `*durationSeconds` window. That inflates `failedCalls` and `latencyMs.max` in the JSON the PowerShell driver collates across all five clients, and wastes CPU. Note that `runBenchReadBulk` is also the only Go CLI command that does not register its own signal handler — `runStreamAlarms` (added in the same diff range) and `runGalaxyWatch` both do.
|
|
|
|
**Recommendation:** Re-apply the Client.Go-018 fix: change both loop conditions to `for time.Now().Before(warmupDeadline) && ctx.Err() == nil` (and the same on `steadyDeadline`). The cross-language bench JSON shape is unchanged — the truncated window is just reported faithfully via `durationMs` / `totalCalls`. Optionally add the `signal.NotifyContext` pattern used by `runStreamAlarms` and `runGalaxyWatch` so direct Ctrl+C on the bench also short-circuits cleanly.
|
|
|
|
**Resolution:** 2026-05-24 — Re-applied the Client.Go-018 fix. Both the warm-up and steady-state loops in `runBenchReadBulk` now carry an `&& ctx.Err() == nil` guard alongside the wall-clock check, so a cancelled parent context breaks the loops instead of spinning failing `ReadBulk` calls until the deadline elapses. The cross-language bench JSON shape (`durationMs` / `totalCalls`) is unchanged — the truncated window is just reported faithfully. Regression test `TestRunBenchReadBulkRespectsContextCancellation` in `cmd/mxgw-go/main_test.go` spins up a localhost TCP gRPC fake (`benchFakeGateway`) that answers OpenSession + Invoke for the register/subscribe/read/unsubscribe sequence, runs the bench with `-warmup-seconds 5 -duration-seconds 5`, cancels the ctx after 150ms, and asserts the bench returns in under 4s. Pre-fix the test ran for the full 10s (warmup+duration); post-fix it returns within ~250ms.
|
|
|
|
### Client.Go-024
|
|
|
|
| Field | Value |
|
|
|---|---|
|
|
| Severity | Low |
|
|
| Category | Testing coverage |
|
|
| Location | `clients/go/mxgateway/session.go:395-525`, `clients/go/mxgateway/alarms.go:65-76` |
|
|
| Status | Resolved |
|
|
|
|
**Description:** The five new bulk SDK methods on `Session` and the new `Client.StreamAlarms` method have **no unit tests** in `clients/go/mxgateway/`:
|
|
|
|
- `Session.WriteBulk` (`session.go:395`)
|
|
- `Session.Write2Bulk` (`session.go:418`)
|
|
- `Session.WriteSecuredBulk` (`session.go:442`)
|
|
- `Session.WriteSecured2Bulk` (`session.go:465`)
|
|
- `Session.ReadBulk` (`session.go:495`) — in particular the `timeout > uint32(^uint32(0))` ms saturation branch at `session.go:504-509` is interesting (a `time.Duration` is `int64` ns; the conversion silently clamps to `MaxUint32` ms ≈ 49 days) and is entirely untested.
|
|
- `Client.StreamAlarms` (`alarms.go:65`) — no equivalent of `TestQueryActiveAlarmsPassesFilterPrefix` exists for `StreamAlarms`, even though they share the same fake-server pattern.
|
|
|
|
`mxgateway/client_session_test.go` and `mxgateway/alarms_test.go` cover the prior single-item and bulk methods (`TestWriteBulkBuildsOneBulkCommandAndReturnsPerEntryResults`, `TestReadBulkForwardsTimeoutAndUnpacksCachedFlag`, etc. were referenced by Client.Go-021 but `Grep` of the current tree confirms they no longer exist — the test file ends at commit `8aaab82` without any of them), so library-level happy paths and error paths for the new shape are entirely uncovered.
|
|
|
|
**Recommendation:** Add bufconn-fake-server tests in `mxgateway/client_session_test.go` and `mxgateway/alarms_test.go`:
|
|
|
|
- `TestWriteBulkBuildsOneBulkCommandAndReturnsPerEntryResults` — confirm the protobuf payload is `MX_COMMAND_KIND_WRITE_BULK` carrying every entry, and the reply's `BulkWriteResult` list flows back. Mirror for Write2Bulk, WriteSecuredBulk, WriteSecured2Bulk.
|
|
- `TestReadBulkForwardsTimeoutAndUnpacksCachedFlag` — pin `timeoutMs` arithmetic for `0`, a normal `time.Duration`, and the `> MaxUint32 ms` saturation branch; assert `WasCached` propagates from each `BulkReadResult`.
|
|
- `TestStreamAlarmsPassesFilterPrefix` — mirror `TestQueryActiveAlarmsPassesFilterPrefix`; assert request flows and stream Recv returns each fake `AlarmFeedMessage` (active-alarm snapshot, snapshot-complete sentinel, transition).
|
|
|
|
These plus `nil` / empty-slice rejection tests for each bulk method close out the new public surface.
|
|
|
|
**Resolution:** 2026-05-24 — Added SDK-level tests using the existing `newBufconnClient` / `newBufconnClientWithAlarms` fake-gateway pattern. In `clients/go/mxgateway/client_session_test.go`: `TestWriteBulkBuildsOneBulkCommandAndReturnsPerEntryResults` confirms the protobuf payload carries `MX_COMMAND_KIND_WRITE_BULK` with all entries and returns per-entry results; `TestWriteBulkRejectsNilEntries` pins the nil guard on all five new bulk methods (WriteBulk/Write2Bulk/WriteSecuredBulk/WriteSecured2Bulk/ReadBulk); `TestReadBulkForwardsTimeoutAndUnpacksCachedFlag` pins normal `timeoutMs` arithmetic and `WasCached` propagation; `TestReadBulkSaturatesTimeoutAboveMaxUint32` pins the `> MaxUint32 ms` clamp at `session.go:504-509`. In `clients/go/mxgateway/alarms_test.go`: `TestStreamAlarmsPassesFilterPrefixAndReceivesFeedMessages` asserts request flows and stream Recv returns each fake `AlarmFeedMessage` (active-alarm snapshot, snapshot-complete sentinel) with auth metadata attached; `TestStreamAlarmsRejectsNilRequest` pins the nil guard. The `fakeGatewayWithAlarms` was extended with a `StreamAlarms` method.
|
|
|
|
### Client.Go-025
|
|
|
|
| Field | Value |
|
|
|---|---|
|
|
| Severity | Low |
|
|
| Category | Correctness & logic bugs |
|
|
| Location | `clients/go/mxgateway/session.go:395-485,495-525` |
|
|
| Status | Resolved |
|
|
|
|
**Description:** The five new bulk methods (`WriteBulk`, `Write2Bulk`, `WriteSecuredBulk`, `WriteSecured2Bulk`, `ReadBulk`) each guard with `if entries == nil { return error }` and an upper-bound `ensureBulkSize` check, but accept a non-nil empty slice (e.g. `[]*WriteBulkEntry{}` or `[]string{}`). The call then sends an `MX_COMMAND_KIND_WRITE_BULK` (or peer) command with zero entries across the gRPC wire to the gateway, which forwards to the worker for a no-op round trip. This is the same shape Client.Go-015 / Client.Go-021 were written against (the CLI now also accepts `mxgw-go write-bulk -item-handles , -values ,` which `parseInt32List` returns as empty without error). The pre-existing bulk methods (`AddItemBulk`, `AdviseItemBulk`, etc. at `session.go:253-343`) carry the identical pattern, so this is a long-standing convention — but it's still a real cost on the hot path. The Java / .NET / Rust / Python clients should be checked for parity if this is fixed.
|
|
|
|
**Recommendation:** Pick one model and apply it consistently to every bulk SDK method (new and pre-existing):
|
|
|
|
1. Reject empty slices early: `if len(entries) == 0 { return nil, errors.New("mxgateway: write bulk entries are required") }` — this is the spelling already used on the nil branch, and matches the GoStyleGuide's "Return errors instead of panicking" / "Wrap errors with useful context" voice.
|
|
2. Or document at the SDK doc-comment that an empty slice is a valid no-op (it produces `[]*BulkWriteResult{}` and one wasted gRPC round trip), and have the CLI reject empty `-item-handles` / `-values` before dialing.
|
|
|
|
Option 1 is cheaper for callers (one less round trip and one clearer error message) and removes the empty-list footgun for cross-language drivers that may pass empty arrays from PowerShell `,` splits.
|
|
|
|
**Resolution:** 2026-05-24 — Audited the four other clients for parity: .NET (`MxGatewaySession.cs:520`), Rust (`session.rs:408` via `ensure_bulk_size`), Python (`session.py:350`), and Java (`MxGatewaySession.java:451`) all accept empty slices and make the round-trip with zero entries. To preserve cross-language behaviour (no error on empty input) while removing the wasteful round trip on the Go hot path, all five new bulk methods (`WriteBulk`, `Write2Bulk`, `WriteSecuredBulk`, `WriteSecured2Bulk`, `ReadBulk`) now short-circuit on `len(entries) == 0` and return an empty result slice without invoking the command. The nil guard is preserved (returns the "...are required" error) and the SDK doc comments now document the empty-slice no-op shape explicitly. Regression test `TestBulkMethodsShortCircuitOnEmptySliceWithoutRoundTrip` in `client_session_test.go` invokes each of the five methods with an empty slice and asserts (a) no error, (b) zero-length result, and (c) `fake.invokeRequest == nil` (no gRPC round trip). Pre-fix the test failed on the first assertion ("WriteBulk(empty) sent a round trip; expected short-circuit"); post-fix it passes.
|
|
|
|
### Client.Go-026
|
|
|
|
| Field | Value |
|
|
|---|---|
|
|
| Severity | Low |
|
|
| Category | Error handling & resilience |
|
|
| Location | `clients/go/cmd/mxgw-go/main.go:1196-1222` |
|
|
| Status | Resolved |
|
|
|
|
**Description:** `runBatch` reads command lines with a default `bufio.Scanner`:
|
|
|
|
```go
|
|
scanner := bufio.NewScanner(in)
|
|
for scanner.Scan() {
|
|
...
|
|
}
|
|
return scanner.Err()
|
|
```
|
|
|
|
The default `bufio.Scanner` token size is 64 KiB (`bufio.MaxScanTokenSize`). One long subcommand invocation — e.g. a `subscribe-bulk -items tag001,tag002,...,tag5000` line with several thousand tag references, or a `write-bulk` driven by a Galaxy-sized handle list — crosses 64 KiB and the scanner errors out with `bufio.ErrTooLong` mid-session. Two follow-on effects:
|
|
|
|
- The error is returned only at the outer `return scanner.Err()`, so the cross-language batch harness sees neither the EOR sentinel for the failing command nor any of the still-pending commands queued after it.
|
|
- The fix is one line: `scanner.Buffer(make([]byte, 0, 64*1024), 16*1024*1024)` (or larger), which lifts the cap on one-line tokens without buffering the whole input.
|
|
|
|
A second weakness: `strings.Fields(line)` splits on whitespace and does no quote-aware tokenisation, so a literal `-comment "with spaces"` becomes four args (`-comment`, `"with`, `spaces"`). This isn't a Go-specific bug — every language's batch tokeniser has this surface — but it means certain CLI flags that take free-text values (`acknowledge-alarm -comment`, `acknowledge-alarm -operator`) can't carry spaces in batch mode. Worth documenting if not fixed.
|
|
|
|
**Recommendation:** Call `scanner.Buffer(make([]byte, 0, 64*1024), 16*1024*1024)` immediately after `bufio.NewScanner` so a long bulk-args line doesn't abort the session. If `runBatch` is intended to support free-text flag values (the `acknowledge-alarm -comment` shape is the obvious case), swap `strings.Fields` for a quote-aware tokeniser (`mvdan.cc/sh/v3/syntax` or a small inline state machine matching the .NET/Rust harness shape). Otherwise add a one-line comment to `runBatch`'s doc-comment that batch-mode arguments must not contain whitespace.
|
|
|
|
**Resolution:** 2026-05-24 — `runBatch` now sets `scanner.Buffer(make([]byte, 0, 64*1024), 16*1024*1024)` immediately after `bufio.NewScanner`, lifting the per-line cap from 64 KiB to 16 MiB so a long bulk-args line (several thousand handles) no longer aborts the session. If a single line still exceeds the 16 MiB cap, the resulting `scanner.Err()` is now framed as a final error-with-sentinel (JSON payload + `batchEOR`) and returned, so the harness never sees an unframed bufio failure. Regression test `TestRunBatchHandlesLongCommandLine` in `cmd/mxgw-go/main_test.go` feeds an ~88 KiB `subscribe-bulk` line (above the old 64 KiB default) followed by `version --json` and asserts two EOR sentinels are emitted — pre-fix the test failed with "bufio.Scanner: token too long" returned from `runBatch`; post-fix both commands run and the session completes cleanly. Quote-aware tokenisation is out of scope for this finding (the recommendation accepts either fix); the `strings.Fields` shape is unchanged.
|
|
|
|
### Client.Go-027
|
|
|
|
| Field | Value |
|
|
|---|---|
|
|
| Severity | Low |
|
|
| Category | Code organization & conventions |
|
|
| Location | `clients/go/cmd/mxgw-go/main.go:1195-1206` |
|
|
| Status | Resolved |
|
|
|
|
**Description:** `runBatch`'s doc-comment says the loop "never terminates on command error; only stdin EOF (or an empty line) ends the session", and the implementation matches:
|
|
|
|
```go
|
|
for scanner.Scan() {
|
|
line := scanner.Text()
|
|
if line == "" {
|
|
break
|
|
}
|
|
...
|
|
}
|
|
```
|
|
|
|
The cross-language batch harness sends `\n`-delimited command lines into the child's stdin and reads back EOR-framed results — there is no documented "empty line ends the session" framing in any other client (the .NET, Rust, Python, and Java drivers should be diffed if a fix lands). A stray blank line in a longer script (very easy in PowerShell here-strings, which prefer trailing newlines) terminates the entire Go batch session silently — the harness sees a clean exit and reads no EOR for the still-pending commands, which presents as "the next command never ran" with no error path.
|
|
|
|
The two cases the empty-line check seems to cover — (a) operator pressing Enter on an interactive `mxgw-go batch` prompt, and (b) `\n\n` framing between commands — would both be cleaner under a `continue` (skip blank lines, keep reading until real EOF) than a `break`. Option (a) isn't a real workflow (batch is a programmatic interface); option (b) is the actual bug.
|
|
|
|
**Recommendation:** Change `if line == "" { break }` to `if line == "" { continue }` (alongside the existing `len(args) == 0` continue, which is then redundant — keep one, drop the other for clarity). Update the `runBatch` doc-comment to read "only stdin EOF ends the session" and drop the "or an empty line" clause. If the interactive ergonomic is genuinely wanted, gate it on `isatty(stdin)` so the batch-from-pipe case isn't affected.
|
|
|
|
**Resolution:** 2026-05-24 — `runBatch` no longer treats a blank line as end-of-session. The `if line == "" { break }` early-exit was removed; blank or whitespace-only lines now fall through the existing `if len(args) == 0 { continue }` guard (kept as the single blank-line skip rule for clarity), so only stdin EOF ends the session. The doc-comment was updated to read "Blank lines are skipped; only stdin EOF ends the session." Regression test `TestRunBatchSkipsBlankLinesAndContinuesUntilEOF` in `cmd/mxgw-go/main_test.go` feeds `version --json\n\nversion --json\n` (a stray blank line between two commands) and asserts two EOR sentinels are emitted — pre-fix the test failed with "EOR sentinel count = 1, want 2" because the blank line broke the loop and the second command never ran; post-fix both commands run.
|