code-reviews: 2026-06-16 re-review of all 11 modules at 8df5ab3

Re-review of the 99-commit delta since the 410acc9 baseline (session-resilience
epic, dashboard disable-login, galaxy browse fixes, and stillpending §8).

44 new Open findings, no Critical/High:
- Server 2 (incl. Medium design-doc drift), Worker 0 (026/027/028 confirmed
  resolved), Contracts 3, Tests 3, Worker.Tests 3, IntegrationTests 4
- Client.Dotnet 4 (Medium env-var key redaction), Client.Go 5 (Medium watch
  drain), Client.Java 9 (Medium overflow race), Client.Python 5 (Medium README
  API), Client.Rust 6 (Medium --tls/--plaintext downgrade)

README regenerated; regen-readme.py --check passes.
This commit is contained in:
Joseph Doherty
2026-06-16 18:57:56 -04:00
parent 8df5ab381a
commit 25d04ec37e
12 changed files with 936 additions and 44 deletions
+57 -12
View File
@@ -10,23 +10,68 @@ Each module's `findings.md` is the source of truth; this file is generated from
| Module | Reviewer | Date | Commit | Status | Open | Total |
|---|---|---|---|---|---|---|
| [Client.Dotnet](Client.Dotnet/findings.md) | Claude Code | 2026-06-15 | `410acc9` | Re-reviewed | 0 | 25 |
| [Client.Go](Client.Go/findings.md) | Claude Code | 2026-06-15 | `410acc9` | Re-reviewed | 0 | 29 |
| [Client.Java](Client.Java/findings.md) | Claude Code | 2026-06-15 | `410acc9` | Re-reviewed | 0 | 39 |
| [Client.Python](Client.Python/findings.md) | Claude Code | 2026-06-15 | `410acc9` | Re-reviewed | 0 | 31 |
| [Client.Rust](Client.Rust/findings.md) | Claude Code | 2026-06-15 | `410acc9` | Re-reviewed | 0 | 32 |
| [Contracts](Contracts/findings.md) | Claude Code | 2026-06-15 | `410acc9` | Re-reviewed | 0 | 19 |
| [IntegrationTests](IntegrationTests/findings.md) | Claude Code | 2026-06-15 | `410acc9` | Re-reviewed | 0 | 29 |
| [Server](Server/findings.md) | Claude Code | 2026-06-15 | `410acc9` | Re-reviewed | 0 | 53 |
| [Tests](Tests/findings.md) | Claude Code | 2026-06-15 | `410acc9` | Re-reviewed | 0 | 35 |
| [Worker](Worker/findings.md) | Claude Code | 2026-06-15 | `410acc9` | Re-reviewed | 0 | 28 |
| [Worker.Tests](Worker.Tests/findings.md) | Claude Code | 2026-06-15 | `410acc9` | Re-reviewed | 0 | 33 |
| [Client.Dotnet](Client.Dotnet/findings.md) | Claude Code | 2026-06-16 | `8df5ab3` | Re-reviewed | 4 | 29 |
| [Client.Go](Client.Go/findings.md) | Claude Code | 2026-06-16 | `8df5ab3` | Re-reviewed | 5 | 34 |
| [Client.Java](Client.Java/findings.md) | Claude Code | 2026-06-16 | `8df5ab3` | Re-reviewed | 9 | 48 |
| [Client.Python](Client.Python/findings.md) | Claude Code | 2026-06-16 | `8df5ab3` | Re-reviewed | 5 | 36 |
| [Client.Rust](Client.Rust/findings.md) | Claude Code | 2026-06-16 | `8df5ab3` | Re-reviewed | 6 | 38 |
| [Contracts](Contracts/findings.md) | Claude Code | 2026-06-16 | `8df5ab3` | Re-reviewed | 3 | 22 |
| [IntegrationTests](IntegrationTests/findings.md) | Claude Code | 2026-06-16 | `8df5ab3` | Re-reviewed | 4 | 33 |
| [Server](Server/findings.md) | Claude Code | 2026-06-16 | `8df5ab3` | Re-reviewed | 2 | 55 |
| [Tests](Tests/findings.md) | Claude Code | 2026-06-16 | `8df5ab3` | Re-reviewed | 3 | 38 |
| [Worker](Worker/findings.md) | Claude Code | 2026-06-16 | `8df5ab3` | Re-reviewed | 0 | 28 |
| [Worker.Tests](Worker.Tests/findings.md) | Claude Code | 2026-06-16 | `8df5ab3` | Re-reviewed | 3 | 36 |
## Pending findings
Findings with status `Open` or `In Progress`, ordered by severity.
_No pending findings._
| ID | Severity | Category | Location | Description |
|---|---|---|---|---|
| Client.Dotnet-028 | Medium | Security | `clients/dotnet/.../MxGatewayClientCli.cs:156` | Client.Dotnet-008 was recorded resolved by adding a `TryResolveApiKey` helper resolving both `--api-key` and the `--api-key-env` env-var path, wired into the error-redaction catch block. At HEAD the catch block reads `arguments.GetOptional… |
| Client.Go-030 | Medium | Concurrency & thread safety | `clients/go/cmd/mxgw-go/main.go:1491-1494` | `runGalaxyWatch`'s limit-reached branch calls `cancelStream()` and returns WITHOUT draining the buffered `events` channel, unlike the signal-cancel branch which drains. This is the shape Client.Go-013's resolution claimed to have fixed ("n… |
| Client.Java-040 | Medium | Correctness & logic bugs | `clients/java/zb-mom-ww-mxgateway-cli/src/main/java/com/zb/mom/ww/mxgateway/cli/MxGatewayCli.java:1552-1561` | The `stream-alarms` overflow handler does `queue.clear()` then `offer(exception)` + `offer(ALARM_FEED_END)` non-atomically on an `ArrayBlockingQueue` shared with the gRPC delivery thread. In production gRPC (netty I/O thread), a concurrent… |
| Client.Python-036 | Medium | Documentation & comments | `clients/python/README.md:143-158` | The README "Browsing lazily" section's code example calls `galaxy.browse_children(...)`, a method that does not exist — the actual public low-level method is `browse_children_raw`. The example raises `AttributeError` at runtime. The README… |
| Client.Rust-033 | Medium | Correctness & logic bugs | `clients/rust/crates/mxgw-cli/src/main.rs:485` | `ConnectionArgs::options()` computes plaintext as `!self.tls \|\| self.plaintext`. With both `--tls` and `--plaintext` supplied, this is `true`, silently degrading to an unencrypted channel despite the explicit `--tls`. A security-sensitive… |
| Server-054 | Medium | Design-document adherence | `docs/DesignDecisions.md` (Session Reconnect / Event Subscribers / Later Revisit Items §470-471), `CLAUDE.md` (Repository-Specific Conventions) | The session-resilience epic shipped multi-subscriber fan-out (`SessionEventDistributor`), reconnectable sessions with replay (`AttachEventSubscriberWithReplay`/`ReplayGap`), and detach-grace retention — but `docs/DesignDecisions.md` still… |
| Client.Dotnet-026 | Low | Correctness & logic bugs | `clients/dotnet/.../MxGatewayClientCli.cs:306` (isLongRunning) | Client.Dotnet-015 extended `isLongRunning` to include the bench commands so they aren't silently cancelled by the default 30s CTS. The new `galaxy-browse` command is NOT in `isLongRunning`. A `galaxy-browse --depth N` tree walk on a large… |
| Client.Dotnet-027 | Low | Performance & resource management | `clients/dotnet/ZB.MOM.WW.MxGateway.Client/LazyBrowseNode.cs:15` | `LazyBrowseNode` allocates one `SemaphoreSlim _expandLock = new(1,1)` per node and never disposes it (the type is not IDisposable). For a large Galaxy browse tree (thousands of nodes), live SemaphoreSlim instances accumulate; OS handles ar… |
| Client.Dotnet-029 | Low | Code organization & conventions | `clients/dotnet/.../IMxGatewayCliClient.cs:6` | `IMxGatewayCliClient` is a public interface with no type-level `<summary>` XML doc. The Client.Dotnet-013 resolution recorded adding one; at HEAD it is absent. No CS1591 fires (GenerateDocumentationFile now scoped to the packable library o… |
| Client.Go-031 | Low | Correctness & logic bugs | `clients/go/cmd/mxgw-go/main.go:1037-1046` | `closeSmokeSession` registers `defer cancel()` twice on the same `cancel` variable across two `context.WithTimeout` calls when the deadline-shortening branch fires. Because `cancel` is reassigned, both defers end up calling the second cont… |
| Client.Go-032 | Low | Code organization & conventions | `clients/go/cmd/mxgw-go/main.go:839-841` | `runStreamEvents` does not install a `signal.NotifyContext` handler, while `runStreamAlarms` and `runGalaxyWatch` do. Client.Go-020's resolution claimed this was added. Without a signal-aware parent context, Ctrl+C kills the process withou… |
| Client.Go-033 | Low | Testing coverage | `clients/go/cmd/mxgw-go/main_test.go` | Gaps vs prior coverage: (1) `TestRunBenchReadBulkRejectsNonPositiveDuration` (named in Client.Go-021's resolution) is absent — the `-duration-seconds`-positive guard at main.go:619 is untested; (2) `runStreamEvents` has no CLI-level test (… |
| Client.Go-034 | Low | Documentation & comments | `clients/go/README.md:245-263` | The README CLI example table lists ~12 commands but the binary now exposes ~27 subcommands (per `writeUsage`). Absent: `ping`, `galaxy-browse`, `batch`, `read-bulk`, `write-bulk`, `write2-bulk`, `write-secured-bulk`, `write-secured2-bulk`,… |
| Client.Java-041 | Low | Correctness & logic bugs | `clients/java/zb-mom-ww-mxgateway-cli/src/main/java/com/zb/mom/ww/mxgateway/cli/MxGatewayCli.java:2187-2194` | `jsonString` escapes only `\`, `"`, `\r`, `\n` — not `\t`, `\b`, `\f`, or U+0000U+001F/U+007F. A tag address/message/reference containing a tab produces malformed JSON (RFC 8259). Affects the hand-rolled `jsonObject`/`jsonString`/`jsonVal… |
| Client.Java-042 | Low | Error handling & resilience | `clients/java/zb-mom-ww-mxgateway-cli/src/main/java/com/zb/mom/ww/mxgateway/cli/MxGatewayCli.java:1565-1567` | `StreamAlarmsCommand.onError` calls `queue.offer(error)` without checking the return value. If the queue is full when a transport error arrives, the error is dropped and the drain loop blocks forever on `queue.take()`. Same class as Client… |
| Client.Java-043 | Low | Code organization & conventions | `clients/java/zb-mom-ww-mxgateway-cli/src/test/java/com/zb/mom/ww/mxgateway/cli/MxGatewayCliTests.java:241-264` | `galaxyBrowseParentZeroEmitsWarningToStderr` calls `MxGatewayCli.execute(new FakeClientFactory(), ...)` for a galaxy-browse command, which wires the real `GrpcGalaxyClientFactory` and constructs a live Netty channel to localhost:5000 as a… |
| Client.Java-044 | Low | Code organization & conventions | `clients/java/zb-mom-ww-mxgateway-client/src/main/java/com/zb/mom/ww/mxgateway/client/MxGatewayClientVersion.java:12` | `CLIENT_VERSION = "0.1.0"` is out of sync with Gradle `version = '0.1.1'` (cross-ref `clients/java/build.gradle:6`). The `version` command advertises 0.1.0 while the published artifact is 0.1.1; consumers can't use the version string as a… |
| Client.Java-045 | Low | Testing coverage | `clients/java/zb-mom-ww-mxgateway-cli/src/main/java/com/zb/mom/ww/mxgateway/cli/InProcessGatewayHarness.java` | The harness implements only `streamEvents`/`closeSession` (gateway) and `discoverHierarchy`/`watchDeployEvents` (galaxy); all other RPCs return gRPC UNIMPLEMENTED. This is undocumented, so a future test exercising invoke/register through t… |
| Client.Java-046 | Low | Testing coverage | `clients/java/zb-mom-ww-mxgateway-cli/src/test/java/com/zb/mom/ww/mxgateway/cli/MxGatewayCliTests.java:680-696` | `streamAlarmsCommandFailsFastOnQueueOverflow` delivers all 2000 onNext synchronously from within `streamAlarms`, so `subscriptionRef` is still null when the overflow fires — the `sub.cancel()` branch is never exercised. The test also doesn… |
| Client.Java-047 | Low | Documentation & comments | `clients/java/README.md` | README advertises the `0.1.1` artifact coordinate (Gitea Maven section) while the `version` command reports `0.1.0` — the user-visible symptom of Client.Java-044. Cross-ref `MxGatewayClientVersion.java:12`. |
| Client.Java-048 | Low | Documentation & comments | `clients/java/zb-mom-ww-mxgateway-cli/src/main/java/com/zb/mom/ww/mxgateway/cli/MxGatewayCli.java:88-105` | The public `execute(PrintWriter, PrintWriter, String...)` Javadoc calls it "Test-friendly entry point", but it wires `GrpcMxGatewayCliClientFactory` with no injection — the actual test seam is the package-private `execute(MxGatewayCliClien… |
| Client.Python-032 | Low | Correctness & logic bugs | `clients/python/src/zb_mom_ww_mxgateway_cli/commands.py:1048,1065-1066` | `_smoke` reintroduces the dead `closed = False` / `if not closed:` guard that Client.Python-004's resolution claimed to have removed via `async with session:`. `closed` is never reassigned, so the guard is always true. Behavior is correct… |
| Client.Python-033 | Low | Correctness & logic bugs | `clients/python/src/zb_mom_ww_mxgateway_cli/commands.py:772,1490-1494` | `_parse_string_list` always emits `param_hint="--items"`, but it is also called from `_build_write_bulk_entries` with `kwargs["values"]`. An empty `--values ""` on the write-bulk commands yields `Error: Invalid value for '--items': ...`, p… |
| Client.Python-034 | Low | Correctness & logic bugs | `clients/python/src/zb_mom_ww_mxgateway_cli/commands.py:1497-1501` | `_parse_int_list` does `int(item)` with no error handling. A non-numeric token (e.g. `--item-handles "10,abc"`) raises a raw `ValueError`, surfacing as an unformatted traceback interactively (other input errors raise `click.BadParameter`). |
| Client.Python-035 | Low | Code organization & conventions | `clients/python/src/zb_mom_ww_mxgateway/__init__.py`, `.../options.py:63-77`, `.../galaxy.py:293` | Two new public types — `BrowseChildrenOptions` (options.py) and `LazyBrowseNode` (galaxy.py) — are absent from `__init__.py`/`__all__`, so callers can't `from zb_mom_ww_mxgateway import BrowseChildrenOptions`, breaking the package-root imp… |
| Client.Rust-034 | Low | Correctness & logic bugs | `clients/rust/crates/mxgw-cli/src/main.rs:48-51,548` | `Command::Version` carries a `jsonl: bool` field that is never read; the dispatch arm matches `{ json, .. }` and discards `jsonl`. `mxgw version --jsonl` silently behaves as plain text. |
| Client.Rust-035 | Low | Security | `clients/rust/crates/mxgw-cli/src/main.rs:489-495` | `--api-key-env` (default `MXGATEWAY_API_KEY`) names an env var read into an `ApiKey` Bearer token, but its clap help has no description of the expected value format. A user pointing it at another credential's env var would silently forward… |
| Client.Rust-036 | Low | Design-document adherence | `clients/rust/RustClientDesign.md:351` | The new `galaxy browse` subcommand (with its filter/depth/json flags) is not listed in the "Test CLI" command table in RustClientDesign.md, which still reads `galaxy {test-connection,last-deploy-time,discover-hierarchy,watch}`. |
| Client.Rust-037 | Low | Design-document adherence | `clients/rust/README.md:164-179` | The README "Browsing lazily" example calls `galaxy.browse_children(...).await?.into_inner()`, but the public API is `GalaxyClient::browse_children_raw` (the bare `browse_children` is the generated proto-client method, not public; and `brow… |
| Client.Rust-038 | Low | Testing coverage | `clients/rust/crates/mxgw-cli/src/main.rs:2336-2564` | Three CLI test gaps: (1) `ConnectionArgs::options()` `--tls`/`--plaintext` resolution (incl. the both-set path of Client.Rust-033) is untested; (2) `browse_children_one_level`'s repeated-page-token guard is untested; (3) `parse_rfc3339_tim… |
| Contracts-020 | Low | Design-document adherence | `gateway.md:1087,1101-1102` | gateway.md still lists "no reconnectable sessions" under "Resolved for v1" and lists "reconnectable sessions" / "multi-subscriber event fan-out" as post-v1 revisit items. The shipped `ReplayGap` reconnect-replay contract and multi-subscrib… |
| Contracts-021 | Low | Documentation & comments | `src/ZB.MOM.WW.MxGateway.Contracts/Protos/mxaccess_gateway.proto:731-733` | The `replay_gap` field comment ends with "(Reconnect/replay logic is Task 12; this is the contract surface only.)". That parenthetical is now stale — the reconnect/replay logic has shipped and is exercised by EventStreamServiceTests/Sessio… |
| Contracts-022 | Low | Testing coverage | `src/ZB.MOM.WW.MxGateway.Tests/Contracts/ProtobufContractRoundTripTests.cs` | No round-trip / descriptor pin exists for the new `ReplayGap` message or `MxEvent.replay_gap` (field 14). The field is exercised functionally end-to-end, but there is no contract-level pin to catch a future renumber/type-narrowing of `repl… |
| IntegrationTests-030 | Low | Documentation & comments | `docs/GatewayTesting.md:76`, `src/ZB.MOM.WW.MxGateway.IntegrationTests/WorkerLiveMxAccessSmokeTests.cs:576,728` | `docs/GatewayTesting.md` says "All six tests are gated by MXGATEWAY_RUN_LIVE_MXACCESS_TESTS=1" and enumerates five parity paths. This diff adds two new `[LiveMxAccessFact]` tests (B8 new COM commands: AuthenticateUser/ArchestrAUserToId/Sus… |
| IntegrationTests-031 | Low | Documentation & comments | `src/ZB.MOM.WW.MxGateway.IntegrationTests/WorkerLiveMxAccessSmokeTests.cs:672` | The inline comment at line 672 says "Suspend / Activate against the advised item", but no `Advise` call is made between `AddItem` (line 616) and `CreateSuspendRequest` (line 677) — the item is added but not advised. The comment mislabels t… |
| IntegrationTests-032 | Low | Testing coverage | `src/ZB.MOM.WW.MxGateway.IntegrationTests/WorkerLiveMxAccessSmokeTests.cs:823-865` | In the buffered-item test, when no sample-bearing `OnBufferedDataChange` batch arrives, the sample-predicate `TimeoutException` is caught and discarded (line 831) before asserting `bootstrapBufferedEvents > 0`. The final failure message ("… |
| IntegrationTests-033 | Low | Testing coverage | `src/ZB.MOM.WW.MxGateway.IntegrationTests/WorkerLiveMxAccessSmokeTests.cs:577-709` | The new-COM-commands live test covers AuthenticateUser/ArchestrAUserToId/Suspend/Activate but not `AddItem2`/`Write2` — the B8 extended commands with a second context parameter introduced in the same bundle. Only live COM tests can verify… |
| Server-055 | Low | Correctness & logic bugs | `src/ZB.MOM.WW.MxGateway.Server/Sessions/GatewaySession.cs:842-851,1841-1871` | When `AttachEventSubscriber`/`AttachEventSubscriberWithReplay` fails inside `StartDistributorAndRegister`, the catch calls `DetachEventSubscriber()`, which decrements the active count back to 0 and — because the session is still `Ready` an… |
| Tests-036 | Low | Testing coverage | `src/ZB.MOM.WW.MxGateway.Tests/Configuration/GatewayOptionsValidatorTests.cs` | Three new validator rules — `DetachGraceSeconds >= 0` (GatewayOptionsValidator.cs:185-186), `ReplayBufferCapacity >= 0` (:215-216), `ReplayRetentionSeconds >= 0` (:219-220) — have no tests, while the sibling new options (`MaxEventSubscribe… |
| Tests-037 | Low | Testing coverage | `src/ZB.MOM.WW.MxGateway.Tests/Contracts/ProtobufContractRoundTripTests.cs` | The reconnect/replay contract surface (`ReplayGap` message, `MxEvent.replay_gap = 14`, `StreamEventsRequest.after_worker_sequence`) has no protobuf serialize/parse round-trip test pinning the wire shape and the documented sentinel invarian… |
| Tests-038 | Low | Performance & resource management | `src/ZB.MOM.WW.MxGateway.Tests/Gateway/Sessions/SessionEventDistributorTests.cs:702-713` | `DrainUntilFaultAsync` relies on the channel completing WITH a fault so `WaitToReadAsync` re-throws. Correct for current callers, but if reused on a channel that completes gracefully, `WaitToReadAsync` returns false without throwing and th… |
| Worker.Tests-034 | Low | Code organization & conventions | `src/ZB.MOM.WW.MxGateway.Worker.Tests/MxAccess/MxAccessCommandExecutorTests.cs:2233`, `src/ZB.MOM.WW.MxGateway.Worker.Tests/TestSupport/NoopMxAccessServer.cs:97` | `FakeMxStatus` is defined twice — file-scope in `TestSupport/NoopMxAccessServer.cs:97` and nested in `MxAccessCommandExecutorTests.FakeMxAccessComObject:2233` — both exposing the same four public fields that `MxStatusProxyConverter` reflec… |
| Worker.Tests-035 | Low | Testing coverage | `src/ZB.MOM.WW.MxGateway.Worker.Tests/MxAccess/MxAccessCommandExecutorTests.cs`, `src/ZB.MOM.WW.MxGateway.Worker/MxAccess/MxAccessCommandExecutor.cs:99-136` | `MxAccessCommandExecutor.Execute` has a `_` discard arm returning `CreateInvalidRequestReply(... "Unsupported MXAccess command kind ...")` — the safety net for an unknown `MxCommandKind` (e.g. a future gateway enum value before the worker… |
| Worker.Tests-036 | Low | Concurrency & thread safety | `src/ZB.MOM.WW.MxGateway.Worker.Tests/Ipc/WorkerPipeSessionTests.cs:983-996` | `RunAsync_SendsFirstHeartbeatImmediatelyOnEnteringLoop` carries a redundant wall-clock assertion `Assert.True(elapsed < TimeSpan.FromSeconds(5), ...)`. The existing `heartbeatWait` CTS (cancel-after 5s) already enforces the same bound — th… |
## Closed findings