diff --git a/code-reviews/Client.Go/findings.md b/code-reviews/Client.Go/findings.md index 4fcedb6..e15dab4 100644 --- a/code-reviews/Client.Go/findings.md +++ b/code-reviews/Client.Go/findings.md @@ -5,9 +5,9 @@ | Module | `clients/go` | | Reviewer | Claude Code | | Review date | 2026-05-24 | -| Commit reviewed | `d692232` | -| Status | Reviewed | -| Open findings | 0 | +| Commit reviewed | `42b0037` | +| Status | Re-reviewed | +| Open findings | 6 | ## Checklist coverage @@ -51,6 +51,38 @@ and subpackage `mxgateway` intentionally unchanged per Go convention. | 9 | Testing coverage | No issues found in this diff — `alarms_test.go` correctly drops retired `session_id` from `AcknowledgeAlarmRequest` and retains it on `QueryActiveAlarmsRequest`. | | 10 | Documentation & comments | No issues found in this diff. | +### 2026-05-24 re-review (commit 42b0037) + +Re-review pass at `42b0037`. Four commits touched `clients/go/`: +`71d2c39` ported the long-lived `batch` subcommand; `8aaab82` added the bulk +SDK methods (`WriteBulk`/`Write2Bulk`/`WriteSecuredBulk`/`WriteSecured2Bulk`/`ReadBulk`) +and the six bulk/bench CLI subcommands (`read-bulk`, `write-bulk`, `write2-bulk`, +`write-secured-bulk`, `write-secured2-bulk`, `bench-read-bulk`); `6f0d142` +ported `stream-alarms` and `acknowledge-alarm` (CLI + SDK type aliases + +`Client.StreamAlarms`); `8738735` updated the README to document the new SDK +methods. `gofmt -l .`, `go vet ./...`, `go build ./...`, and +`go test ./... -count=1` are clean at HEAD. + +This re-review surfaces two regressions of previously-Resolved findings and +four genuinely-new issues introduced by the new bulk / batch / alarm surface. +Client.Go-022 and Client.Go-023 are recorded as new IDs (not status flips on +the original) because the relevant code was deleted and re-added in +`8aaab82`, so the prior Resolution remains historically accurate against +that earlier commit. + +| # | Category | Result | +|---|---|---| +| 1 | Correctness & logic bugs | New issue: bulk SDK methods accept a non-nil but empty entry/tag slice and send a no-op bulk command (Client.Go-025). | +| 2 | mxaccessgw conventions | No issues found in this diff. `gofmt -l .` and `go vet ./...` clean. | +| 3 | Concurrency & thread safety | New issue: `runBenchReadBulk` warm-up + steady-state loops once again ignore `ctx.Err()` — a regression of Client.Go-018 (Client.Go-023). | +| 4 | Error handling & resilience | New issue: `runBatch` uses a default 64 KiB `bufio.Scanner` buffer; one long bulk-args line aborts the whole batch session with no EOR sentinel (Client.Go-026). | +| 5 | Security | No issues found — `runAcknowledgeAlarm` doesn't log the `-comment` / `-operator` values; the bulk-write SDK doc comments still flag credential-sensitive values. | +| 6 | Performance & resource management | No issues found in this diff — `defer client.Close()` / `defer session.Close(...)` / `defer session.UnsubscribeBulk(...)` applied consistently in `runBenchReadBulk`. | +| 7 | Design-document adherence | No issues found in this diff. `Client.StreamAlarms` correctly skips `c.callContext` (streams must not carry the unary call timeout); `AcknowledgeAlarm` correctly uses it. | +| 8 | Code organization & conventions | New issues: `runWriteBulkVariant` re-introduces the unused `secured` parameter, the dead `_ = secured` line, and unconditionally-registered secured/non-secured flags — regresses Client.Go-015 (Client.Go-022). `runBatch` treats a blank line in the middle of a script as end-of-session (Client.Go-027). | +| 9 | Testing coverage | New issue: the five new bulk SDK methods and `Client.StreamAlarms` have no unit tests in `mxgateway/` (Client.Go-024). | +| 10 | Documentation & comments | No issues found in this diff. README documents the new `StreamAlarms`/`AcknowledgeAlarm` SDK calls; `Session.ReadBulk` documents the cached-vs-snapshot semantics and `timeout=0` default; `WriteSecuredBulk` flags credential sensitivity. | + ## Findings ### Client.Go-001 @@ -432,3 +464,164 @@ But there is no test asserting that, e.g., `mxgw-go write-bulk -current-user-id Each is a few lines and routes through the existing `runWithIO` entry point, so it does not need a bufconn fake. **Resolution:** 2026-05-20 — Added CLI-level table-driven regression tests in `cmd/mxgw-go/main_test.go` routed through `runWithIO`, so they need no bufconn fake: `TestRunWriteBulkVariantGatesSecuredFlags` pins Client.Go-015 by asserting `write-bulk -current-user-id`, `write-bulk -verifier-user-id`, `write2-bulk -current-user-id`, `write-secured-bulk -user-id`, and `write-secured2-bulk -user-id` all surface `flag provided but not defined`; `TestRunReadBulkRejectsMissingArgs` pins the "session-id and items are required" check across no-flags / missing-items / missing-session-id; `TestRunBenchReadBulkRejectsNonPositiveBulkSize` and `TestRunBenchReadBulkRejectsNonPositiveDuration` pin the positivity checks; `TestRunWriteBulkVariantRejectsMismatchedHandlesAndValues` pins the explicit `item-handles count ... does not match values count ...` error. `go test ./...` passes. + +### Client.Go-022 + +| Field | Value | +|---|---| +| Severity | Medium | +| Category | Code organization & conventions | +| Location | `clients/go/cmd/mxgw-go/main.go:398-412,417-519` | +| Status | Open | + +**Description:** Commit `8aaab82` ("Go client: port bulk read/write SDK methods + CLI subcommands") re-introduces every symptom that Client.Go-015 documented and was marked Resolved against an earlier commit: + +- `runWriteBulkVariant(ctx, args, stdout, stderr, command, withTimestamp, secured bool)` accepts a `secured` parameter that is never used for routing — the switch at `main.go:478` keys on `command` (`"write-bulk"` / `"write2-bulk"` / `"write-secured-bulk"` / `"write-secured2-bulk"`). The function ends with the misleading `_ = secured // currently only used for routing above; reserved for future per-variant validation` at `main.go:517`, the exact line Client.Go-015 called out as both inaccurate (`secured` is not used for routing) and reserved-without-purpose. +- All four wrappers (`runWriteBulk`, `runWrite2Bulk`, `runWriteSecuredBulk`, `runWriteSecured2Bulk` at `main.go:398-412`) pass a `secured` argument that has no effect. +- `-current-user-id`, `-verifier-user-id`, and `-user-id` are all registered unconditionally for every variant (`main.go:427-429`), so `mxgw-go write-bulk -current-user-id 5 -item-handles 1 -values foo` silently no-ops the flag instead of failing with `flag provided but not defined`. The secured-only flag gating that Client.Go-015's resolution claimed is gone again. + +Because the surrounding test file (`main_test.go`) lost the regression tests promised by Client.Go-021 between then and this commit, nothing caught the re-introduction. + +**Recommendation:** Re-apply the Client.Go-015 fix on this re-added code. Drop the `secured` parameter and the `_ = secured` line (the `command` switch is the only routing key); derive the variant locally from `command`; register `-current-user-id` / `-verifier-user-id` only inside the secured branches and `-user-id` only inside Write/Write2 — so a wrong-variant flag fails with a clean `flag provided but not defined` usage error. Re-add the `TestRunWriteBulkVariantGatesSecuredFlags` table-test from the Client.Go-021 resolution so a future regression is caught by CI. + +**Resolution:** Open. + +### Client.Go-023 + +| Field | Value | +|---|---| +| Severity | Medium | +| Category | Concurrency & thread safety | +| Location | `clients/go/cmd/mxgw-go/main.go:604-606,616-632` | +| Status | Open | + +**Description:** `runBenchReadBulk`'s warm-up and steady-state loops are wall-clock-only again: + +```go +warmupDeadline := time.Now().Add(time.Duration(*warmupSeconds) * time.Second) +timeout := time.Duration(*timeoutMs) * time.Millisecond +for time.Now().Before(warmupDeadline) { + _, _ = session.ReadBulk(ctx, serverHandle, tags, timeout) +} +... +for time.Now().Before(steadyDeadline) { + callStart := time.Now() + results, err := session.ReadBulk(ctx, serverHandle, tags, timeout) + ... +} +``` + +Neither loop checks `ctx.Done()` / `ctx.Err()`. This is exactly the shape Client.Go-018 documented and was marked Resolved against an earlier commit — the `&& ctx.Err() == nil` guard the Resolution added has been removed by commit `8aaab82` (which re-added `runBenchReadBulk`). If the cross-language driver `scripts/bench-read-bulk.ps1` times out and kills the child early, or an operator Ctrl+Cs the bench, each `ReadBulk` call inside fails fast with `context.Canceled` but the steady-state loop counts those as `failedCalls++` and keeps spinning for the full `*durationSeconds` window. That inflates `failedCalls` and `latencyMs.max` in the JSON the PowerShell driver collates across all five clients, and wastes CPU. Note that `runBenchReadBulk` is also the only Go CLI command that does not register its own signal handler — `runStreamAlarms` (added in the same diff range) and `runGalaxyWatch` both do. + +**Recommendation:** Re-apply the Client.Go-018 fix: change both loop conditions to `for time.Now().Before(warmupDeadline) && ctx.Err() == nil` (and the same on `steadyDeadline`). The cross-language bench JSON shape is unchanged — the truncated window is just reported faithfully via `durationMs` / `totalCalls`. Optionally add the `signal.NotifyContext` pattern used by `runStreamAlarms` and `runGalaxyWatch` so direct Ctrl+C on the bench also short-circuits cleanly. + +**Resolution:** Open. + +### Client.Go-024 + +| Field | Value | +|---|---| +| Severity | Low | +| Category | Testing coverage | +| Location | `clients/go/mxgateway/session.go:395-525`, `clients/go/mxgateway/alarms.go:65-76` | +| Status | Open | + +**Description:** The five new bulk SDK methods on `Session` and the new `Client.StreamAlarms` method have **no unit tests** in `clients/go/mxgateway/`: + +- `Session.WriteBulk` (`session.go:395`) +- `Session.Write2Bulk` (`session.go:418`) +- `Session.WriteSecuredBulk` (`session.go:442`) +- `Session.WriteSecured2Bulk` (`session.go:465`) +- `Session.ReadBulk` (`session.go:495`) — in particular the `timeout > uint32(^uint32(0))` ms saturation branch at `session.go:504-509` is interesting (a `time.Duration` is `int64` ns; the conversion silently clamps to `MaxUint32` ms ≈ 49 days) and is entirely untested. +- `Client.StreamAlarms` (`alarms.go:65`) — no equivalent of `TestQueryActiveAlarmsPassesFilterPrefix` exists for `StreamAlarms`, even though they share the same fake-server pattern. + +`mxgateway/client_session_test.go` and `mxgateway/alarms_test.go` cover the prior single-item and bulk methods (`TestWriteBulkBuildsOneBulkCommandAndReturnsPerEntryResults`, `TestReadBulkForwardsTimeoutAndUnpacksCachedFlag`, etc. were referenced by Client.Go-021 but `Grep` of the current tree confirms they no longer exist — the test file ends at commit `8aaab82` without any of them), so library-level happy paths and error paths for the new shape are entirely uncovered. + +**Recommendation:** Add bufconn-fake-server tests in `mxgateway/client_session_test.go` and `mxgateway/alarms_test.go`: + +- `TestWriteBulkBuildsOneBulkCommandAndReturnsPerEntryResults` — confirm the protobuf payload is `MX_COMMAND_KIND_WRITE_BULK` carrying every entry, and the reply's `BulkWriteResult` list flows back. Mirror for Write2Bulk, WriteSecuredBulk, WriteSecured2Bulk. +- `TestReadBulkForwardsTimeoutAndUnpacksCachedFlag` — pin `timeoutMs` arithmetic for `0`, a normal `time.Duration`, and the `> MaxUint32 ms` saturation branch; assert `WasCached` propagates from each `BulkReadResult`. +- `TestStreamAlarmsPassesFilterPrefix` — mirror `TestQueryActiveAlarmsPassesFilterPrefix`; assert request flows and stream Recv returns each fake `AlarmFeedMessage` (active-alarm snapshot, snapshot-complete sentinel, transition). + +These plus `nil` / empty-slice rejection tests for each bulk method close out the new public surface. + +**Resolution:** Open. + +### Client.Go-025 + +| Field | Value | +|---|---| +| Severity | Low | +| Category | Correctness & logic bugs | +| Location | `clients/go/mxgateway/session.go:395-485,495-525` | +| Status | Open | + +**Description:** The five new bulk methods (`WriteBulk`, `Write2Bulk`, `WriteSecuredBulk`, `WriteSecured2Bulk`, `ReadBulk`) each guard with `if entries == nil { return error }` and an upper-bound `ensureBulkSize` check, but accept a non-nil empty slice (e.g. `[]*WriteBulkEntry{}` or `[]string{}`). The call then sends an `MX_COMMAND_KIND_WRITE_BULK` (or peer) command with zero entries across the gRPC wire to the gateway, which forwards to the worker for a no-op round trip. This is the same shape Client.Go-015 / Client.Go-021 were written against (the CLI now also accepts `mxgw-go write-bulk -item-handles , -values ,` which `parseInt32List` returns as empty without error). The pre-existing bulk methods (`AddItemBulk`, `AdviseItemBulk`, etc. at `session.go:253-343`) carry the identical pattern, so this is a long-standing convention — but it's still a real cost on the hot path. The Java / .NET / Rust / Python clients should be checked for parity if this is fixed. + +**Recommendation:** Pick one model and apply it consistently to every bulk SDK method (new and pre-existing): + +1. Reject empty slices early: `if len(entries) == 0 { return nil, errors.New("mxgateway: write bulk entries are required") }` — this is the spelling already used on the nil branch, and matches the GoStyleGuide's "Return errors instead of panicking" / "Wrap errors with useful context" voice. +2. Or document at the SDK doc-comment that an empty slice is a valid no-op (it produces `[]*BulkWriteResult{}` and one wasted gRPC round trip), and have the CLI reject empty `-item-handles` / `-values` before dialing. + +Option 1 is cheaper for callers (one less round trip and one clearer error message) and removes the empty-list footgun for cross-language drivers that may pass empty arrays from PowerShell `,` splits. + +**Resolution:** Open. + +### Client.Go-026 + +| Field | Value | +|---|---| +| Severity | Low | +| Category | Error handling & resilience | +| Location | `clients/go/cmd/mxgw-go/main.go:1196-1222` | +| Status | Open | + +**Description:** `runBatch` reads command lines with a default `bufio.Scanner`: + +```go +scanner := bufio.NewScanner(in) +for scanner.Scan() { + ... +} +return scanner.Err() +``` + +The default `bufio.Scanner` token size is 64 KiB (`bufio.MaxScanTokenSize`). One long subcommand invocation — e.g. a `subscribe-bulk -items tag001,tag002,...,tag5000` line with several thousand tag references, or a `write-bulk` driven by a Galaxy-sized handle list — crosses 64 KiB and the scanner errors out with `bufio.ErrTooLong` mid-session. Two follow-on effects: + +- The error is returned only at the outer `return scanner.Err()`, so the cross-language batch harness sees neither the EOR sentinel for the failing command nor any of the still-pending commands queued after it. +- The fix is one line: `scanner.Buffer(make([]byte, 0, 64*1024), 16*1024*1024)` (or larger), which lifts the cap on one-line tokens without buffering the whole input. + +A second weakness: `strings.Fields(line)` splits on whitespace and does no quote-aware tokenisation, so a literal `-comment "with spaces"` becomes four args (`-comment`, `"with`, `spaces"`). This isn't a Go-specific bug — every language's batch tokeniser has this surface — but it means certain CLI flags that take free-text values (`acknowledge-alarm -comment`, `acknowledge-alarm -operator`) can't carry spaces in batch mode. Worth documenting if not fixed. + +**Recommendation:** Call `scanner.Buffer(make([]byte, 0, 64*1024), 16*1024*1024)` immediately after `bufio.NewScanner` so a long bulk-args line doesn't abort the session. If `runBatch` is intended to support free-text flag values (the `acknowledge-alarm -comment` shape is the obvious case), swap `strings.Fields` for a quote-aware tokeniser (`mvdan.cc/sh/v3/syntax` or a small inline state machine matching the .NET/Rust harness shape). Otherwise add a one-line comment to `runBatch`'s doc-comment that batch-mode arguments must not contain whitespace. + +**Resolution:** Open. + +### Client.Go-027 + +| Field | Value | +|---|---| +| Severity | Low | +| Category | Code organization & conventions | +| Location | `clients/go/cmd/mxgw-go/main.go:1195-1206` | +| Status | Open | + +**Description:** `runBatch`'s doc-comment says the loop "never terminates on command error; only stdin EOF (or an empty line) ends the session", and the implementation matches: + +```go +for scanner.Scan() { + line := scanner.Text() + if line == "" { + break + } + ... +} +``` + +The cross-language batch harness sends `\n`-delimited command lines into the child's stdin and reads back EOR-framed results — there is no documented "empty line ends the session" framing in any other client (the .NET, Rust, Python, and Java drivers should be diffed if a fix lands). A stray blank line in a longer script (very easy in PowerShell here-strings, which prefer trailing newlines) terminates the entire Go batch session silently — the harness sees a clean exit and reads no EOR for the still-pending commands, which presents as "the next command never ran" with no error path. + +The two cases the empty-line check seems to cover — (a) operator pressing Enter on an interactive `mxgw-go batch` prompt, and (b) `\n\n` framing between commands — would both be cleaner under a `continue` (skip blank lines, keep reading until real EOF) than a `break`. Option (a) isn't a real workflow (batch is a programmatic interface); option (b) is the actual bug. + +**Recommendation:** Change `if line == "" { break }` to `if line == "" { continue }` (alongside the existing `len(args) == 0` continue, which is then redundant — keep one, drop the other for clarity). Update the `runBatch` doc-comment to read "only stdin EOF ends the session" and drop the "or an empty line" clause. If the interactive ergonomic is genuinely wanted, gate it on `isatty(stdin)` so the batch-from-pipe case isn't affected. + +**Resolution:** Open.