code-reviews: re-review Client.Go at 42b0037

Append 6 new findings (Client.Go-022..027): regressions of Client.Go-015
(runWriteBulkVariant secured flag) and Client.Go-018 (bench loop
ignores ctx.Err()) reintroduced by the bulk port; no SDK tests for the
new WriteBulk/ReadBulk/StreamAlarms methods; bulk SDK accepts empty
slices; bufio.Scanner buffer + blank-line sentinel can abort the batch
session.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
Joseph Doherty
2026-05-24 08:28:41 -04:00
parent b9ef09d26e
commit afa82e0989
+196 -3
View File
@@ -5,9 +5,9 @@
| Module | `clients/go` |
| Reviewer | Claude Code |
| Review date | 2026-05-24 |
| Commit reviewed | `d692232` |
| Status | Reviewed |
| Open findings | 0 |
| Commit reviewed | `42b0037` |
| Status | Re-reviewed |
| Open findings | 6 |
## Checklist coverage
@@ -51,6 +51,38 @@ and subpackage `mxgateway` intentionally unchanged per Go convention.
| 9 | Testing coverage | No issues found in this diff — `alarms_test.go` correctly drops retired `session_id` from `AcknowledgeAlarmRequest` and retains it on `QueryActiveAlarmsRequest`. |
| 10 | Documentation & comments | No issues found in this diff. |
### 2026-05-24 re-review (commit 42b0037)
Re-review pass at `42b0037`. Four commits touched `clients/go/`:
`71d2c39` ported the long-lived `batch` subcommand; `8aaab82` added the bulk
SDK methods (`WriteBulk`/`Write2Bulk`/`WriteSecuredBulk`/`WriteSecured2Bulk`/`ReadBulk`)
and the six bulk/bench CLI subcommands (`read-bulk`, `write-bulk`, `write2-bulk`,
`write-secured-bulk`, `write-secured2-bulk`, `bench-read-bulk`); `6f0d142`
ported `stream-alarms` and `acknowledge-alarm` (CLI + SDK type aliases +
`Client.StreamAlarms`); `8738735` updated the README to document the new SDK
methods. `gofmt -l .`, `go vet ./...`, `go build ./...`, and
`go test ./... -count=1` are clean at HEAD.
This re-review surfaces two regressions of previously-Resolved findings and
four genuinely-new issues introduced by the new bulk / batch / alarm surface.
Client.Go-022 and Client.Go-023 are recorded as new IDs (not status flips on
the original) because the relevant code was deleted and re-added in
`8aaab82`, so the prior Resolution remains historically accurate against
that earlier commit.
| # | Category | Result |
|---|---|---|
| 1 | Correctness & logic bugs | New issue: bulk SDK methods accept a non-nil but empty entry/tag slice and send a no-op bulk command (Client.Go-025). |
| 2 | mxaccessgw conventions | No issues found in this diff. `gofmt -l .` and `go vet ./...` clean. |
| 3 | Concurrency & thread safety | New issue: `runBenchReadBulk` warm-up + steady-state loops once again ignore `ctx.Err()` — a regression of Client.Go-018 (Client.Go-023). |
| 4 | Error handling & resilience | New issue: `runBatch` uses a default 64 KiB `bufio.Scanner` buffer; one long bulk-args line aborts the whole batch session with no EOR sentinel (Client.Go-026). |
| 5 | Security | No issues found — `runAcknowledgeAlarm` doesn't log the `-comment` / `-operator` values; the bulk-write SDK doc comments still flag credential-sensitive values. |
| 6 | Performance & resource management | No issues found in this diff — `defer client.Close()` / `defer session.Close(...)` / `defer session.UnsubscribeBulk(...)` applied consistently in `runBenchReadBulk`. |
| 7 | Design-document adherence | No issues found in this diff. `Client.StreamAlarms` correctly skips `c.callContext` (streams must not carry the unary call timeout); `AcknowledgeAlarm` correctly uses it. |
| 8 | Code organization & conventions | New issues: `runWriteBulkVariant` re-introduces the unused `secured` parameter, the dead `_ = secured` line, and unconditionally-registered secured/non-secured flags — regresses Client.Go-015 (Client.Go-022). `runBatch` treats a blank line in the middle of a script as end-of-session (Client.Go-027). |
| 9 | Testing coverage | New issue: the five new bulk SDK methods and `Client.StreamAlarms` have no unit tests in `mxgateway/` (Client.Go-024). |
| 10 | Documentation & comments | No issues found in this diff. README documents the new `StreamAlarms`/`AcknowledgeAlarm` SDK calls; `Session.ReadBulk` documents the cached-vs-snapshot semantics and `timeout=0` default; `WriteSecuredBulk` flags credential sensitivity. |
## Findings
### Client.Go-001
@@ -432,3 +464,164 @@ But there is no test asserting that, e.g., `mxgw-go write-bulk -current-user-id
Each is a few lines and routes through the existing `runWithIO` entry point, so it does not need a bufconn fake.
**Resolution:** 2026-05-20 — Added CLI-level table-driven regression tests in `cmd/mxgw-go/main_test.go` routed through `runWithIO`, so they need no bufconn fake: `TestRunWriteBulkVariantGatesSecuredFlags` pins Client.Go-015 by asserting `write-bulk -current-user-id`, `write-bulk -verifier-user-id`, `write2-bulk -current-user-id`, `write-secured-bulk -user-id`, and `write-secured2-bulk -user-id` all surface `flag provided but not defined`; `TestRunReadBulkRejectsMissingArgs` pins the "session-id and items are required" check across no-flags / missing-items / missing-session-id; `TestRunBenchReadBulkRejectsNonPositiveBulkSize` and `TestRunBenchReadBulkRejectsNonPositiveDuration` pin the positivity checks; `TestRunWriteBulkVariantRejectsMismatchedHandlesAndValues` pins the explicit `item-handles count ... does not match values count ...` error. `go test ./...` passes.
### Client.Go-022
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Code organization & conventions |
| Location | `clients/go/cmd/mxgw-go/main.go:398-412,417-519` |
| Status | Open |
**Description:** Commit `8aaab82` ("Go client: port bulk read/write SDK methods + CLI subcommands") re-introduces every symptom that Client.Go-015 documented and was marked Resolved against an earlier commit:
- `runWriteBulkVariant(ctx, args, stdout, stderr, command, withTimestamp, secured bool)` accepts a `secured` parameter that is never used for routing — the switch at `main.go:478` keys on `command` (`"write-bulk"` / `"write2-bulk"` / `"write-secured-bulk"` / `"write-secured2-bulk"`). The function ends with the misleading `_ = secured // currently only used for routing above; reserved for future per-variant validation` at `main.go:517`, the exact line Client.Go-015 called out as both inaccurate (`secured` is not used for routing) and reserved-without-purpose.
- All four wrappers (`runWriteBulk`, `runWrite2Bulk`, `runWriteSecuredBulk`, `runWriteSecured2Bulk` at `main.go:398-412`) pass a `secured` argument that has no effect.
- `-current-user-id`, `-verifier-user-id`, and `-user-id` are all registered unconditionally for every variant (`main.go:427-429`), so `mxgw-go write-bulk -current-user-id 5 -item-handles 1 -values foo` silently no-ops the flag instead of failing with `flag provided but not defined`. The secured-only flag gating that Client.Go-015's resolution claimed is gone again.
Because the surrounding test file (`main_test.go`) lost the regression tests promised by Client.Go-021 between then and this commit, nothing caught the re-introduction.
**Recommendation:** Re-apply the Client.Go-015 fix on this re-added code. Drop the `secured` parameter and the `_ = secured` line (the `command` switch is the only routing key); derive the variant locally from `command`; register `-current-user-id` / `-verifier-user-id` only inside the secured branches and `-user-id` only inside Write/Write2 — so a wrong-variant flag fails with a clean `flag provided but not defined` usage error. Re-add the `TestRunWriteBulkVariantGatesSecuredFlags` table-test from the Client.Go-021 resolution so a future regression is caught by CI.
**Resolution:** Open.
### Client.Go-023
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Concurrency & thread safety |
| Location | `clients/go/cmd/mxgw-go/main.go:604-606,616-632` |
| Status | Open |
**Description:** `runBenchReadBulk`'s warm-up and steady-state loops are wall-clock-only again:
```go
warmupDeadline := time.Now().Add(time.Duration(*warmupSeconds) * time.Second)
timeout := time.Duration(*timeoutMs) * time.Millisecond
for time.Now().Before(warmupDeadline) {
_, _ = session.ReadBulk(ctx, serverHandle, tags, timeout)
}
...
for time.Now().Before(steadyDeadline) {
callStart := time.Now()
results, err := session.ReadBulk(ctx, serverHandle, tags, timeout)
...
}
```
Neither loop checks `ctx.Done()` / `ctx.Err()`. This is exactly the shape Client.Go-018 documented and was marked Resolved against an earlier commit — the `&& ctx.Err() == nil` guard the Resolution added has been removed by commit `8aaab82` (which re-added `runBenchReadBulk`). If the cross-language driver `scripts/bench-read-bulk.ps1` times out and kills the child early, or an operator Ctrl+Cs the bench, each `ReadBulk` call inside fails fast with `context.Canceled` but the steady-state loop counts those as `failedCalls++` and keeps spinning for the full `*durationSeconds` window. That inflates `failedCalls` and `latencyMs.max` in the JSON the PowerShell driver collates across all five clients, and wastes CPU. Note that `runBenchReadBulk` is also the only Go CLI command that does not register its own signal handler — `runStreamAlarms` (added in the same diff range) and `runGalaxyWatch` both do.
**Recommendation:** Re-apply the Client.Go-018 fix: change both loop conditions to `for time.Now().Before(warmupDeadline) && ctx.Err() == nil` (and the same on `steadyDeadline`). The cross-language bench JSON shape is unchanged — the truncated window is just reported faithfully via `durationMs` / `totalCalls`. Optionally add the `signal.NotifyContext` pattern used by `runStreamAlarms` and `runGalaxyWatch` so direct Ctrl+C on the bench also short-circuits cleanly.
**Resolution:** Open.
### Client.Go-024
| Field | Value |
|---|---|
| Severity | Low |
| Category | Testing coverage |
| Location | `clients/go/mxgateway/session.go:395-525`, `clients/go/mxgateway/alarms.go:65-76` |
| Status | Open |
**Description:** The five new bulk SDK methods on `Session` and the new `Client.StreamAlarms` method have **no unit tests** in `clients/go/mxgateway/`:
- `Session.WriteBulk` (`session.go:395`)
- `Session.Write2Bulk` (`session.go:418`)
- `Session.WriteSecuredBulk` (`session.go:442`)
- `Session.WriteSecured2Bulk` (`session.go:465`)
- `Session.ReadBulk` (`session.go:495`) — in particular the `timeout > uint32(^uint32(0))` ms saturation branch at `session.go:504-509` is interesting (a `time.Duration` is `int64` ns; the conversion silently clamps to `MaxUint32` ms ≈ 49 days) and is entirely untested.
- `Client.StreamAlarms` (`alarms.go:65`) — no equivalent of `TestQueryActiveAlarmsPassesFilterPrefix` exists for `StreamAlarms`, even though they share the same fake-server pattern.
`mxgateway/client_session_test.go` and `mxgateway/alarms_test.go` cover the prior single-item and bulk methods (`TestWriteBulkBuildsOneBulkCommandAndReturnsPerEntryResults`, `TestReadBulkForwardsTimeoutAndUnpacksCachedFlag`, etc. were referenced by Client.Go-021 but `Grep` of the current tree confirms they no longer exist — the test file ends at commit `8aaab82` without any of them), so library-level happy paths and error paths for the new shape are entirely uncovered.
**Recommendation:** Add bufconn-fake-server tests in `mxgateway/client_session_test.go` and `mxgateway/alarms_test.go`:
- `TestWriteBulkBuildsOneBulkCommandAndReturnsPerEntryResults` — confirm the protobuf payload is `MX_COMMAND_KIND_WRITE_BULK` carrying every entry, and the reply's `BulkWriteResult` list flows back. Mirror for Write2Bulk, WriteSecuredBulk, WriteSecured2Bulk.
- `TestReadBulkForwardsTimeoutAndUnpacksCachedFlag` — pin `timeoutMs` arithmetic for `0`, a normal `time.Duration`, and the `> MaxUint32 ms` saturation branch; assert `WasCached` propagates from each `BulkReadResult`.
- `TestStreamAlarmsPassesFilterPrefix` — mirror `TestQueryActiveAlarmsPassesFilterPrefix`; assert request flows and stream Recv returns each fake `AlarmFeedMessage` (active-alarm snapshot, snapshot-complete sentinel, transition).
These plus `nil` / empty-slice rejection tests for each bulk method close out the new public surface.
**Resolution:** Open.
### Client.Go-025
| Field | Value |
|---|---|
| Severity | Low |
| Category | Correctness & logic bugs |
| Location | `clients/go/mxgateway/session.go:395-485,495-525` |
| Status | Open |
**Description:** The five new bulk methods (`WriteBulk`, `Write2Bulk`, `WriteSecuredBulk`, `WriteSecured2Bulk`, `ReadBulk`) each guard with `if entries == nil { return error }` and an upper-bound `ensureBulkSize` check, but accept a non-nil empty slice (e.g. `[]*WriteBulkEntry{}` or `[]string{}`). The call then sends an `MX_COMMAND_KIND_WRITE_BULK` (or peer) command with zero entries across the gRPC wire to the gateway, which forwards to the worker for a no-op round trip. This is the same shape Client.Go-015 / Client.Go-021 were written against (the CLI now also accepts `mxgw-go write-bulk -item-handles , -values ,` which `parseInt32List` returns as empty without error). The pre-existing bulk methods (`AddItemBulk`, `AdviseItemBulk`, etc. at `session.go:253-343`) carry the identical pattern, so this is a long-standing convention — but it's still a real cost on the hot path. The Java / .NET / Rust / Python clients should be checked for parity if this is fixed.
**Recommendation:** Pick one model and apply it consistently to every bulk SDK method (new and pre-existing):
1. Reject empty slices early: `if len(entries) == 0 { return nil, errors.New("mxgateway: write bulk entries are required") }` — this is the spelling already used on the nil branch, and matches the GoStyleGuide's "Return errors instead of panicking" / "Wrap errors with useful context" voice.
2. Or document at the SDK doc-comment that an empty slice is a valid no-op (it produces `[]*BulkWriteResult{}` and one wasted gRPC round trip), and have the CLI reject empty `-item-handles` / `-values` before dialing.
Option 1 is cheaper for callers (one less round trip and one clearer error message) and removes the empty-list footgun for cross-language drivers that may pass empty arrays from PowerShell `,` splits.
**Resolution:** Open.
### Client.Go-026
| Field | Value |
|---|---|
| Severity | Low |
| Category | Error handling & resilience |
| Location | `clients/go/cmd/mxgw-go/main.go:1196-1222` |
| Status | Open |
**Description:** `runBatch` reads command lines with a default `bufio.Scanner`:
```go
scanner := bufio.NewScanner(in)
for scanner.Scan() {
...
}
return scanner.Err()
```
The default `bufio.Scanner` token size is 64 KiB (`bufio.MaxScanTokenSize`). One long subcommand invocation — e.g. a `subscribe-bulk -items tag001,tag002,...,tag5000` line with several thousand tag references, or a `write-bulk` driven by a Galaxy-sized handle list — crosses 64 KiB and the scanner errors out with `bufio.ErrTooLong` mid-session. Two follow-on effects:
- The error is returned only at the outer `return scanner.Err()`, so the cross-language batch harness sees neither the EOR sentinel for the failing command nor any of the still-pending commands queued after it.
- The fix is one line: `scanner.Buffer(make([]byte, 0, 64*1024), 16*1024*1024)` (or larger), which lifts the cap on one-line tokens without buffering the whole input.
A second weakness: `strings.Fields(line)` splits on whitespace and does no quote-aware tokenisation, so a literal `-comment "with spaces"` becomes four args (`-comment`, `"with`, `spaces"`). This isn't a Go-specific bug — every language's batch tokeniser has this surface — but it means certain CLI flags that take free-text values (`acknowledge-alarm -comment`, `acknowledge-alarm -operator`) can't carry spaces in batch mode. Worth documenting if not fixed.
**Recommendation:** Call `scanner.Buffer(make([]byte, 0, 64*1024), 16*1024*1024)` immediately after `bufio.NewScanner` so a long bulk-args line doesn't abort the session. If `runBatch` is intended to support free-text flag values (the `acknowledge-alarm -comment` shape is the obvious case), swap `strings.Fields` for a quote-aware tokeniser (`mvdan.cc/sh/v3/syntax` or a small inline state machine matching the .NET/Rust harness shape). Otherwise add a one-line comment to `runBatch`'s doc-comment that batch-mode arguments must not contain whitespace.
**Resolution:** Open.
### Client.Go-027
| Field | Value |
|---|---|
| Severity | Low |
| Category | Code organization & conventions |
| Location | `clients/go/cmd/mxgw-go/main.go:1195-1206` |
| Status | Open |
**Description:** `runBatch`'s doc-comment says the loop "never terminates on command error; only stdin EOF (or an empty line) ends the session", and the implementation matches:
```go
for scanner.Scan() {
line := scanner.Text()
if line == "" {
break
}
...
}
```
The cross-language batch harness sends `\n`-delimited command lines into the child's stdin and reads back EOR-framed results — there is no documented "empty line ends the session" framing in any other client (the .NET, Rust, Python, and Java drivers should be diffed if a fix lands). A stray blank line in a longer script (very easy in PowerShell here-strings, which prefer trailing newlines) terminates the entire Go batch session silently — the harness sees a clean exit and reads no EOR for the still-pending commands, which presents as "the next command never ran" with no error path.
The two cases the empty-line check seems to cover — (a) operator pressing Enter on an interactive `mxgw-go batch` prompt, and (b) `\n\n` framing between commands — would both be cleaner under a `continue` (skip blank lines, keep reading until real EOF) than a `break`. Option (a) isn't a real workflow (batch is a programmatic interface); option (b) is the actual bug.
**Recommendation:** Change `if line == "" { break }` to `if line == "" { continue }` (alongside the existing `len(args) == 0` continue, which is then redundant — keep one, drop the other for clarity). Update the `runBatch` doc-comment to read "only stdin EOF ends the session" and drop the "or an empty line" clause. If the interactive ergonomic is genuinely wanted, gate it on `isatty(stdin)` so the batch-from-pipe case isn't affected.
**Resolution:** Open.