code-reviews: re-review Client.Python at 42b0037

Append 5 new findings (Client.Python-022..026): README flags for new
alarm subcommands do not exist; Client.Python-013 regression — the
silent localhost auto-plaintext branch is still present (the prior
Resolution did not survive the rename); production batch path uses
the click.testing.CliRunner helper; no behavioural tests for new SDK
+ CLI; bench cleanup swallows exceptions.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
Joseph Doherty
2026-05-24 08:28:55 -04:00
parent 15fceed536
commit bc28fee641
+268 -3
View File
@@ -5,9 +5,9 @@
| Module | `clients/python` |
| Reviewer | Claude Code |
| Review date | 2026-05-24 |
| Commit reviewed | `d692232` |
| Status | Reviewed |
| Open findings | 0 |
| Commit reviewed | `42b0037` |
| Status | Re-reviewed |
| Open findings | 5 |
## Checklist coverage
@@ -28,6 +28,38 @@ history. This section reflects categories evaluated in this pass.
| 9 | Testing coverage | Issue found: no test exercises the wheel-build / editable-install flow; the broken `pyproject.toml` (Client.Python-018) was not caught at commit time because the test suite runs from `src/` via `pytest pythonpath` (Client.Python-020). |
| 10 | Documentation & comments | Issue found: cross-client CLI parity gap — the Python CLI ships none of the Galaxy subcommands (`galaxy-test-connection`, `galaxy-last-deploy`, `galaxy-discover`, `galaxy-watch`) the .NET / Go / Rust / Java CLIs all expose, and lacks the new `.NET`-only `bench-stream-events`. README does not flag the gap (Client.Python-021). |
### 2026-05-24 re-review (commit 42b0037)
Re-review pass at `42b0037`. The diff against the previous review base
`d692232` is four commits affecting `clients/python`:
- `71d2c39` e2e: port `batch` subcommand to all five client CLIs
- `6add4b4` Python client: port bulk read/write SDK methods + CLI subcommands
- `828e3e6` Python client: port stream-alarms and acknowledge-alarm
- `8738735` clients: document StreamAlarms + AcknowledgeAlarm in each README
Surface area added: `Session.read_bulk` / `write_bulk` / `write2_bulk` /
`write_secured_bulk` / `write_secured2_bulk`; `GatewayClient.stream_alarms`
+ `_canceling_alarm_feed_iterator`; the corresponding CLI subcommands
`read-bulk`, `write-bulk`, `write2-bulk`, `write-secured-bulk`,
`write-secured2-bulk`, `bench-read-bulk`, `stream-alarms`,
`acknowledge-alarm`, and `batch`; new README CLI examples for the alarm
subcommands; new CLI tests for `stream-alarms` / `acknowledge-alarm`
registration and `batch` semantics.
| # | Category | Result |
|---|---|---|
| 1 | Correctness & logic bugs | Issue found: `bench-read-bulk` does a function-local `import time` and uses bare `except Exception: pass` in cleanup blocks (Client.Python-026). |
| 2 | mxaccessgw conventions | No new issues found — secured writes still redact, generated code untouched, MXAccess parity preserved. |
| 3 | Concurrency & thread safety | No new issues found — `_canceling_alarm_feed_iterator` follows the same shape as `_canceling_iterator` (Client.Python-007 helper). |
| 4 | Error handling & resilience | No new issues found in the new SDK methods; new RPC mapping `map_rpc_error("stream alarms", ...)` is correct. |
| 5 | Security | Issue found: `_use_plaintext` localhost auto-plaintext branch resolved under Client.Python-013 is back in the renamed CLI module and was carried forward through the new commit untouched (Client.Python-023). |
| 6 | Performance & resource management | No new issues found. |
| 7 | Design-document adherence | No new issues found in the new alarm / bulk surface — matches the cross-client parity matrix expectation. |
| 8 | Code organization & conventions | Issue found: the new `batch` subcommand uses `click.testing.CliRunner` from production code (Client.Python-024). |
| 9 | Testing coverage | Issue found: the new SDK methods and most of the new CLI subcommand bodies have no behavioural tests — only `--help` smoke tests for the alarm CLI (Client.Python-025). |
| 10 | Documentation & comments | Issue found: the README CLI examples for `stream-alarms` and `acknowledge-alarm` use flags that do not exist on the implemented commands (Client.Python-022). |
### 2026-05-24 review (commit d692232)
Re-review pass at `d692232`. Diff against `a020350` is commit `397d3c5`:
@@ -795,3 +827,236 @@ the cross-language benchmark matrix has a record of the asymmetry.
No CLI source change; the implementation of the four Galaxy
subcommands is deferred. Resolved as a doc note rather than a full
parity fix.
### Client.Python-022
| Field | Value |
|---|---|
| Severity | High |
| Category | Documentation & comments |
| Location | `clients/python/README.md:201-202`, `clients/python/src/zb_mom_ww_mxgateway_cli/commands.py:389-420` |
| Status | Open |
**Description:** The README CLI examples added by commit `8738735` for the
new alarm subcommands cite flags the CLI does not accept:
```
mxgw-py stream-alarms --session-id <id> --max-messages 1 --json
mxgw-py acknowledge-alarm --session-id <id> --alarm-reference "\\Galaxy\Area001.Pump001.PumpFault" --json
```
Both subcommands are session-less (the alarm feed is served by the gateway
itself, not a worker session — see the docstring on `acknowledge_alarm`,
"Acknowledge an active MXAccess alarm condition (session-less)"). Neither
`@main.command("stream-alarms")` nor `@main.command("acknowledge-alarm")`
declares a `--session-id` option, and `acknowledge-alarm` declares the
ack-target as `--reference`, **not** `--alarm-reference`. A user copy-pasting
either example gets `Error: no such option: --session-id` (or
`--alarm-reference`) and exits non-zero before any RPC is attempted.
This drift is invisible to the test suite because
`tests/test_cli.py::test_acknowledge_alarm_requires_reference` only asserts
that the missing-flag error mentions `--reference` — it does not validate
the README at all. The .NET / Go / Rust / Java alarm CLI examples in the
sibling READMEs are consistent with their actual flag names, so the Python
README is the only one out of step with its implementation.
**Recommendation:** Either fix the README examples to match the implementation
(remove `--session-id` from both lines, rename `--alarm-reference` to
`--reference`), or — if cross-client parity wants the longer flag name —
rename the CLI option to `--alarm-reference` and add a test that copy-pastes
the README examples through `CliRunner` to assert they parse. Option (1) is
the smaller change.
### Client.Python-023
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Security |
| Location | `clients/python/src/zb_mom_ww_mxgateway_cli/commands.py:901-906` |
| Status | Open |
**Description:** Client.Python-013 (severity Medium, Security) was marked
**Resolved** on 2026-05-20 with the explicit claim that the silent
`localhost:` / `127.0.0.1:` auto-plaintext branch had been removed from
`_use_plaintext`. The re-review at `d692232` re-asserted this in its
checklist ("`_use_plaintext` now requires explicit `--plaintext` opt-in
(Client.Python-013 resolution verified)").
The branch is still present in the reviewed source at HEAD `42b0037`:
```python
def _use_plaintext(kwargs: dict[str, Any]) -> bool:
if kwargs.get("use_tls"):
return False
if kwargs.get("plaintext"):
return True
return kwargs["endpoint"].startswith("localhost:") or kwargs["endpoint"].startswith("127.0.0.1:")
```
The same code is present in `git show d692232:clients/python/src/zb_mom_ww_mxgateway_cli/commands.py`, so the regression entered at or
before the rename commit `397d3c5` (which created
`src/zb_mom_ww_mxgateway_cli/commands.py` from scratch with the
pre-Client.Python-013 body) and was not noticed in the prior re-review.
The original security argument is unchanged: a user who runs the gateway
behind TLS on loopback for production-shaped local testing and passes
`--api-key mxgw_<secret>` against `localhost:5001` silently gets a plaintext
gRPC channel, with the bearer token attached to it. The other clients
(.NET https-prefix detection, Go / Java explicit `--plaintext`, Rust
opt-in) do not auto-downgrade. The Client.Python-013 resolution also added
six regression tests in `tests/test_cli.py` that asserted the explicit-flag
contract; those tests do not exist in the current `tests/test_cli.py`
either they were lost in the rename or never carried over.
**Recommendation:** Re-apply the Client.Python-013 fix on the current
source: drop the `endpoint.startswith("localhost:") or endpoint.startswith("127.0.0.1:")`
branch, make `--plaintext` and `--tls` mutually exclusive, default to TLS,
and add an assertion-time test that copy-pastes the Client.Python-013
regression-test fixture into `tests/test_cli.py`. Because Client.Python-013
is marked Resolved with a 2026-05-20 commit reference, do **not** silently
re-resolve this finding — keep it Open with a fresh ID so the regression
audit trail is preserved.
### Client.Python-024
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Code organization & conventions |
| Location | `clients/python/src/zb_mom_ww_mxgateway_cli/commands.py:13,48-119` |
| Status | Open |
**Description:** The new `batch` subcommand (commit `71d2c39`) implements
the cross-language batch protocol by importing `click.testing.CliRunner`
into production code and calling `runner.invoke(main, args, catch_exceptions=True)`
in a `for raw_line in sys.stdin:` loop. `CliRunner` is documented as a
**testing** helper:
* It replaces `sys.stdin` / `sys.stdout` / `sys.stderr` with `io.StringIO`
during each `invoke()`, swallowing any side-channel output the inner
command writes directly to the real streams (the existing CLI bodies do
not, but any future helper that calls `print()` mid-command will be
silently captured into `result.output` rather than reaching the
harness real-time).
* It captures the inner command into an `Exception` rather than letting
Click's normal exit code propagate, so `result.exit_code` is the
pseudo-exit of a `SystemExit` translation, not the real process exit.
* Click does not guarantee `CliRunner` is stable across versions —
click 9 has already deprecated `runner.invoke(..., mix_stderr=...)`,
and a future Click release could change the return-tuple shape.
* It is recursively re-entrant: `runner.invoke(main, ["batch"], ...)`
inside batch silently spawns a nested batch reading from the same
StringIO-replaced stdin (empty), so a stdin line of `batch` exits
cleanly with no error — almost certainly not the intended semantics.
The other client CLIs in the matrix (.NET, Go, Rust, Java) implement
`batch` by dispatching to their command parser directly, not by
re-invoking the test runner.
**Recommendation:** Replace `CliRunner` with a direct call into the Click
parser, e.g. `main.main(args, standalone_mode=False)` wrapped in a
`try/except click.ClickException` to convert Click exit conditions into
the `{"error": ..., "type": ...}` payload. Capture stdout via a per-line
context manager (e.g. `contextlib.redirect_stdout(io.StringIO())`) so the
batch loop can interleave inner-command output with the
`__MXGW_BATCH_EOR__` sentinel without depending on the testing API. Add
a regression test that drives `batch` with `batch\n` on stdin and asserts
recursive invocation is either rejected or correctly bounded.
### Client.Python-025
| Field | Value |
|---|---|
| Severity | Low |
| Category | Testing coverage |
| Location | `clients/python/tests/test_cli.py`, `clients/python/src/zb_mom_ww_mxgateway/{client.py,session.py}`, `clients/python/src/zb_mom_ww_mxgateway_cli/commands.py` |
| Status | Open |
**Description:** Commits `6add4b4` and `828e3e6` added five new SDK
methods (`Session.read_bulk`, `Session.write_bulk`, `Session.write2_bulk`,
`Session.write_secured_bulk`, `Session.write_secured2_bulk`),
`GatewayClient.stream_alarms`, the helper
`_canceling_alarm_feed_iterator`, and eight new CLI subcommands
(`read-bulk`, `write-bulk`, `write2-bulk`, `write-secured-bulk`,
`write-secured2-bulk`, `bench-read-bulk`, `stream-alarms`,
`acknowledge-alarm`). The only test coverage added in `tests/test_cli.py`
is:
* `test_stream_alarms_is_registered``--help` smoke only.
* `test_acknowledge_alarm_requires_reference` — verifies the missing-flag
Click error contains `--reference`; no happy-path test.
There is no test that:
1. Asserts `Session.read_bulk` / `write_bulk` / `write2_bulk` /
`write_secured_bulk` / `write_secured2_bulk` builds the expected
`MxCommand` shape (kind, sub-message, server_handle, entries) — the
prior Client.Python-009 coverage pattern (`test_add_item2_sends_*`,
`test_write2_sends_value_and_timestamp_value`) is not extended to the
bulk family even though they ship the same wire-shape risk.
2. Exercises `_canceling_alarm_feed_iterator` cancel-on-task-cancellation
(the Client.Python-007 helper test pattern is not extended).
3. Drives `stream-alarms` / `acknowledge-alarm` / `read-bulk` /
`write-bulk` / `write-secured-bulk` happy paths through `CliRunner`
with a fake stub — the existing
`test_cli_register_happy_path_emits_server_handle` pattern is not
extended.
4. Asserts `bench-read-bulk` emits the cross-language schema for the new
`read-bulk`-shaped fields (the Client.Python-015 pattern existed for
the previous bench command but no equivalent exists for this one).
A silent drift in any of the four bulk-write request shapes — or a
schema drift on `bench-read-bulk` — would not be caught.
**Recommendation:** Extend the Client.Python-009 / Client.Python-015 /
Client.Python-016 patterns: add request-shape tests for the four new
bulk methods, a CLI happy-path test for `read-bulk` / `write-bulk` /
`stream-alarms` / `acknowledge-alarm` against fake stubs, and a
cross-language schema test for `bench-read-bulk` mirroring
`test_bench_read_bulk_emits_cross_language_schema` (with `--read-bulk`
applied to the renamed bench). At minimum, add a request-shape test for
`write_secured_bulk` since the secured family is the highest-risk
parity surface.
### Client.Python-026
| Field | Value |
|---|---|
| Severity | Low |
| Category | Correctness & logic bugs |
| Location | `clients/python/src/zb_mom_ww_mxgateway_cli/commands.py:674-738` |
| Status | Open |
**Description:** Two minor quality issues in the new `_bench_read_bulk`
body (commit `6add4b4`):
1. `import time` is done inside the function body (line 676) rather than
at module top. `PythonStyleGuide.md` does not state this explicitly,
but every other helper in `commands.py` imports its dependencies at
module top, and `time` is already imported (transitively) elsewhere
in the package. The function-local import is a vestige of incremental
development and should be hoisted.
2. The `finally` cleanup block uses two consecutive bare
`except Exception: pass` blocks to swallow `unsubscribe_bulk` and
`session.close()` failures (lines 733-734 and 737-738). This silently
discards diagnostic information about cleanup failures — e.g. a
transient gateway crash mid-benchmark or a protocol error during
unsubscribe — and matches an anti-pattern the rest of the module
avoids (the `_bench_read_bulk` analogues in the other clients log the
cleanup failure before swallowing it).
Both are Low severity. The bench command is best-effort by design (the
benchmark output is what matters; cleanup failures are not user-facing),
but a single log line on cleanup failure would make a future regression
visible at the next benchmark run rather than silently corrupting the
worker's subscription bookkeeping until a session-level GC sweep.
**Recommendation:** Move `import time` to the module-level import block.
Replace each `except Exception: pass` with `except Exception as exc:
logger.warning("bench-read-bulk cleanup: %s", exc)` against a
module-level `logger = logging.getLogger(__name__)`. No behavioural
change in the happy path; failure path becomes diagnosable. No new test
required for the import hoist; the logger change is exercised by the
existing bench smoke test once `caplog` is added to the test signature.