From bc28fee641d74b38bde0d2efabbc570b6ecf5b4b Mon Sep 17 00:00:00 2001 From: Joseph Doherty Date: Sun, 24 May 2026 08:28:55 -0400 Subject: [PATCH] code-reviews: re-review Client.Python at 42b0037 MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Append 5 new findings (Client.Python-022..026): README flags for new alarm subcommands do not exist; Client.Python-013 regression — the silent localhost auto-plaintext branch is still present (the prior Resolution did not survive the rename); production batch path uses the click.testing.CliRunner helper; no behavioural tests for new SDK + CLI; bench cleanup swallows exceptions. Co-Authored-By: Claude Opus 4.7 (1M context) --- code-reviews/Client.Python/findings.md | 271 ++++++++++++++++++++++++- 1 file changed, 268 insertions(+), 3 deletions(-) diff --git a/code-reviews/Client.Python/findings.md b/code-reviews/Client.Python/findings.md index c48cbba..6920ce0 100644 --- a/code-reviews/Client.Python/findings.md +++ b/code-reviews/Client.Python/findings.md @@ -5,9 +5,9 @@ | Module | `clients/python` | | Reviewer | Claude Code | | Review date | 2026-05-24 | -| Commit reviewed | `d692232` | -| Status | Reviewed | -| Open findings | 0 | +| Commit reviewed | `42b0037` | +| Status | Re-reviewed | +| Open findings | 5 | ## Checklist coverage @@ -28,6 +28,38 @@ history. This section reflects categories evaluated in this pass. | 9 | Testing coverage | Issue found: no test exercises the wheel-build / editable-install flow; the broken `pyproject.toml` (Client.Python-018) was not caught at commit time because the test suite runs from `src/` via `pytest pythonpath` (Client.Python-020). | | 10 | Documentation & comments | Issue found: cross-client CLI parity gap — the Python CLI ships none of the Galaxy subcommands (`galaxy-test-connection`, `galaxy-last-deploy`, `galaxy-discover`, `galaxy-watch`) the .NET / Go / Rust / Java CLIs all expose, and lacks the new `.NET`-only `bench-stream-events`. README does not flag the gap (Client.Python-021). | +### 2026-05-24 re-review (commit 42b0037) + +Re-review pass at `42b0037`. The diff against the previous review base +`d692232` is four commits affecting `clients/python`: + +- `71d2c39` e2e: port `batch` subcommand to all five client CLIs +- `6add4b4` Python client: port bulk read/write SDK methods + CLI subcommands +- `828e3e6` Python client: port stream-alarms and acknowledge-alarm +- `8738735` clients: document StreamAlarms + AcknowledgeAlarm in each README + +Surface area added: `Session.read_bulk` / `write_bulk` / `write2_bulk` / +`write_secured_bulk` / `write_secured2_bulk`; `GatewayClient.stream_alarms` ++ `_canceling_alarm_feed_iterator`; the corresponding CLI subcommands +`read-bulk`, `write-bulk`, `write2-bulk`, `write-secured-bulk`, +`write-secured2-bulk`, `bench-read-bulk`, `stream-alarms`, +`acknowledge-alarm`, and `batch`; new README CLI examples for the alarm +subcommands; new CLI tests for `stream-alarms` / `acknowledge-alarm` +registration and `batch` semantics. + +| # | Category | Result | +|---|---|---| +| 1 | Correctness & logic bugs | Issue found: `bench-read-bulk` does a function-local `import time` and uses bare `except Exception: pass` in cleanup blocks (Client.Python-026). | +| 2 | mxaccessgw conventions | No new issues found — secured writes still redact, generated code untouched, MXAccess parity preserved. | +| 3 | Concurrency & thread safety | No new issues found — `_canceling_alarm_feed_iterator` follows the same shape as `_canceling_iterator` (Client.Python-007 helper). | +| 4 | Error handling & resilience | No new issues found in the new SDK methods; new RPC mapping `map_rpc_error("stream alarms", ...)` is correct. | +| 5 | Security | Issue found: `_use_plaintext` localhost auto-plaintext branch resolved under Client.Python-013 is back in the renamed CLI module and was carried forward through the new commit untouched (Client.Python-023). | +| 6 | Performance & resource management | No new issues found. | +| 7 | Design-document adherence | No new issues found in the new alarm / bulk surface — matches the cross-client parity matrix expectation. | +| 8 | Code organization & conventions | Issue found: the new `batch` subcommand uses `click.testing.CliRunner` from production code (Client.Python-024). | +| 9 | Testing coverage | Issue found: the new SDK methods and most of the new CLI subcommand bodies have no behavioural tests — only `--help` smoke tests for the alarm CLI (Client.Python-025). | +| 10 | Documentation & comments | Issue found: the README CLI examples for `stream-alarms` and `acknowledge-alarm` use flags that do not exist on the implemented commands (Client.Python-022). | + ### 2026-05-24 review (commit d692232) Re-review pass at `d692232`. Diff against `a020350` is commit `397d3c5`: @@ -795,3 +827,236 @@ the cross-language benchmark matrix has a record of the asymmetry. No CLI source change; the implementation of the four Galaxy subcommands is deferred. Resolved as a doc note rather than a full parity fix. + +### Client.Python-022 + +| Field | Value | +|---|---| +| Severity | High | +| Category | Documentation & comments | +| Location | `clients/python/README.md:201-202`, `clients/python/src/zb_mom_ww_mxgateway_cli/commands.py:389-420` | +| Status | Open | + +**Description:** The README CLI examples added by commit `8738735` for the +new alarm subcommands cite flags the CLI does not accept: + +``` +mxgw-py stream-alarms --session-id --max-messages 1 --json +mxgw-py acknowledge-alarm --session-id --alarm-reference "\\Galaxy\Area001.Pump001.PumpFault" --json +``` + +Both subcommands are session-less (the alarm feed is served by the gateway +itself, not a worker session — see the docstring on `acknowledge_alarm`, +"Acknowledge an active MXAccess alarm condition (session-less)"). Neither +`@main.command("stream-alarms")` nor `@main.command("acknowledge-alarm")` +declares a `--session-id` option, and `acknowledge-alarm` declares the +ack-target as `--reference`, **not** `--alarm-reference`. A user copy-pasting +either example gets `Error: no such option: --session-id` (or +`--alarm-reference`) and exits non-zero before any RPC is attempted. + +This drift is invisible to the test suite because +`tests/test_cli.py::test_acknowledge_alarm_requires_reference` only asserts +that the missing-flag error mentions `--reference` — it does not validate +the README at all. The .NET / Go / Rust / Java alarm CLI examples in the +sibling READMEs are consistent with their actual flag names, so the Python +README is the only one out of step with its implementation. + +**Recommendation:** Either fix the README examples to match the implementation +(remove `--session-id` from both lines, rename `--alarm-reference` to +`--reference`), or — if cross-client parity wants the longer flag name — +rename the CLI option to `--alarm-reference` and add a test that copy-pastes +the README examples through `CliRunner` to assert they parse. Option (1) is +the smaller change. + +### Client.Python-023 + +| Field | Value | +|---|---| +| Severity | Medium | +| Category | Security | +| Location | `clients/python/src/zb_mom_ww_mxgateway_cli/commands.py:901-906` | +| Status | Open | + +**Description:** Client.Python-013 (severity Medium, Security) was marked +**Resolved** on 2026-05-20 with the explicit claim that the silent +`localhost:` / `127.0.0.1:` auto-plaintext branch had been removed from +`_use_plaintext`. The re-review at `d692232` re-asserted this in its +checklist ("`_use_plaintext` now requires explicit `--plaintext` opt-in +(Client.Python-013 resolution verified)"). + +The branch is still present in the reviewed source at HEAD `42b0037`: + +```python +def _use_plaintext(kwargs: dict[str, Any]) -> bool: + if kwargs.get("use_tls"): + return False + if kwargs.get("plaintext"): + return True + return kwargs["endpoint"].startswith("localhost:") or kwargs["endpoint"].startswith("127.0.0.1:") +``` + +The same code is present in `git show d692232:clients/python/src/zb_mom_ww_mxgateway_cli/commands.py`, so the regression entered at or +before the rename commit `397d3c5` (which created +`src/zb_mom_ww_mxgateway_cli/commands.py` from scratch with the +pre-Client.Python-013 body) and was not noticed in the prior re-review. + +The original security argument is unchanged: a user who runs the gateway +behind TLS on loopback for production-shaped local testing and passes +`--api-key mxgw_` against `localhost:5001` silently gets a plaintext +gRPC channel, with the bearer token attached to it. The other clients +(.NET https-prefix detection, Go / Java explicit `--plaintext`, Rust +opt-in) do not auto-downgrade. The Client.Python-013 resolution also added +six regression tests in `tests/test_cli.py` that asserted the explicit-flag +contract; those tests do not exist in the current `tests/test_cli.py` — +either they were lost in the rename or never carried over. + +**Recommendation:** Re-apply the Client.Python-013 fix on the current +source: drop the `endpoint.startswith("localhost:") or endpoint.startswith("127.0.0.1:")` +branch, make `--plaintext` and `--tls` mutually exclusive, default to TLS, +and add an assertion-time test that copy-pastes the Client.Python-013 +regression-test fixture into `tests/test_cli.py`. Because Client.Python-013 +is marked Resolved with a 2026-05-20 commit reference, do **not** silently +re-resolve this finding — keep it Open with a fresh ID so the regression +audit trail is preserved. + +### Client.Python-024 + +| Field | Value | +|---|---| +| Severity | Medium | +| Category | Code organization & conventions | +| Location | `clients/python/src/zb_mom_ww_mxgateway_cli/commands.py:13,48-119` | +| Status | Open | + +**Description:** The new `batch` subcommand (commit `71d2c39`) implements +the cross-language batch protocol by importing `click.testing.CliRunner` +into production code and calling `runner.invoke(main, args, catch_exceptions=True)` +in a `for raw_line in sys.stdin:` loop. `CliRunner` is documented as a +**testing** helper: + +* It replaces `sys.stdin` / `sys.stdout` / `sys.stderr` with `io.StringIO` + during each `invoke()`, swallowing any side-channel output the inner + command writes directly to the real streams (the existing CLI bodies do + not, but any future helper that calls `print()` mid-command will be + silently captured into `result.output` rather than reaching the + harness real-time). +* It captures the inner command into an `Exception` rather than letting + Click's normal exit code propagate, so `result.exit_code` is the + pseudo-exit of a `SystemExit` translation, not the real process exit. +* Click does not guarantee `CliRunner` is stable across versions — + click 9 has already deprecated `runner.invoke(..., mix_stderr=...)`, + and a future Click release could change the return-tuple shape. +* It is recursively re-entrant: `runner.invoke(main, ["batch"], ...)` + inside batch silently spawns a nested batch reading from the same + StringIO-replaced stdin (empty), so a stdin line of `batch` exits + cleanly with no error — almost certainly not the intended semantics. + +The other client CLIs in the matrix (.NET, Go, Rust, Java) implement +`batch` by dispatching to their command parser directly, not by +re-invoking the test runner. + +**Recommendation:** Replace `CliRunner` with a direct call into the Click +parser, e.g. `main.main(args, standalone_mode=False)` wrapped in a +`try/except click.ClickException` to convert Click exit conditions into +the `{"error": ..., "type": ...}` payload. Capture stdout via a per-line +context manager (e.g. `contextlib.redirect_stdout(io.StringIO())`) so the +batch loop can interleave inner-command output with the +`__MXGW_BATCH_EOR__` sentinel without depending on the testing API. Add +a regression test that drives `batch` with `batch\n` on stdin and asserts +recursive invocation is either rejected or correctly bounded. + +### Client.Python-025 + +| Field | Value | +|---|---| +| Severity | Low | +| Category | Testing coverage | +| Location | `clients/python/tests/test_cli.py`, `clients/python/src/zb_mom_ww_mxgateway/{client.py,session.py}`, `clients/python/src/zb_mom_ww_mxgateway_cli/commands.py` | +| Status | Open | + +**Description:** Commits `6add4b4` and `828e3e6` added five new SDK +methods (`Session.read_bulk`, `Session.write_bulk`, `Session.write2_bulk`, +`Session.write_secured_bulk`, `Session.write_secured2_bulk`), +`GatewayClient.stream_alarms`, the helper +`_canceling_alarm_feed_iterator`, and eight new CLI subcommands +(`read-bulk`, `write-bulk`, `write2-bulk`, `write-secured-bulk`, +`write-secured2-bulk`, `bench-read-bulk`, `stream-alarms`, +`acknowledge-alarm`). The only test coverage added in `tests/test_cli.py` +is: + +* `test_stream_alarms_is_registered` — `--help` smoke only. +* `test_acknowledge_alarm_requires_reference` — verifies the missing-flag + Click error contains `--reference`; no happy-path test. + +There is no test that: + +1. Asserts `Session.read_bulk` / `write_bulk` / `write2_bulk` / + `write_secured_bulk` / `write_secured2_bulk` builds the expected + `MxCommand` shape (kind, sub-message, server_handle, entries) — the + prior Client.Python-009 coverage pattern (`test_add_item2_sends_*`, + `test_write2_sends_value_and_timestamp_value`) is not extended to the + bulk family even though they ship the same wire-shape risk. +2. Exercises `_canceling_alarm_feed_iterator` cancel-on-task-cancellation + (the Client.Python-007 helper test pattern is not extended). +3. Drives `stream-alarms` / `acknowledge-alarm` / `read-bulk` / + `write-bulk` / `write-secured-bulk` happy paths through `CliRunner` + with a fake stub — the existing + `test_cli_register_happy_path_emits_server_handle` pattern is not + extended. +4. Asserts `bench-read-bulk` emits the cross-language schema for the new + `read-bulk`-shaped fields (the Client.Python-015 pattern existed for + the previous bench command but no equivalent exists for this one). + +A silent drift in any of the four bulk-write request shapes — or a +schema drift on `bench-read-bulk` — would not be caught. + +**Recommendation:** Extend the Client.Python-009 / Client.Python-015 / +Client.Python-016 patterns: add request-shape tests for the four new +bulk methods, a CLI happy-path test for `read-bulk` / `write-bulk` / +`stream-alarms` / `acknowledge-alarm` against fake stubs, and a +cross-language schema test for `bench-read-bulk` mirroring +`test_bench_read_bulk_emits_cross_language_schema` (with `--read-bulk` +applied to the renamed bench). At minimum, add a request-shape test for +`write_secured_bulk` since the secured family is the highest-risk +parity surface. + +### Client.Python-026 + +| Field | Value | +|---|---| +| Severity | Low | +| Category | Correctness & logic bugs | +| Location | `clients/python/src/zb_mom_ww_mxgateway_cli/commands.py:674-738` | +| Status | Open | + +**Description:** Two minor quality issues in the new `_bench_read_bulk` +body (commit `6add4b4`): + +1. `import time` is done inside the function body (line 676) rather than + at module top. `PythonStyleGuide.md` does not state this explicitly, + but every other helper in `commands.py` imports its dependencies at + module top, and `time` is already imported (transitively) elsewhere + in the package. The function-local import is a vestige of incremental + development and should be hoisted. +2. The `finally` cleanup block uses two consecutive bare + `except Exception: pass` blocks to swallow `unsubscribe_bulk` and + `session.close()` failures (lines 733-734 and 737-738). This silently + discards diagnostic information about cleanup failures — e.g. a + transient gateway crash mid-benchmark or a protocol error during + unsubscribe — and matches an anti-pattern the rest of the module + avoids (the `_bench_read_bulk` analogues in the other clients log the + cleanup failure before swallowing it). + +Both are Low severity. The bench command is best-effort by design (the +benchmark output is what matters; cleanup failures are not user-facing), +but a single log line on cleanup failure would make a future regression +visible at the next benchmark run rather than silently corrupting the +worker's subscription bookkeeping until a session-level GC sweep. + +**Recommendation:** Move `import time` to the module-level import block. +Replace each `except Exception: pass` with `except Exception as exc: +logger.warning("bench-read-bulk cleanup: %s", exc)` against a +module-level `logger = logging.getLogger(__name__)`. No behavioural +change in the happy path; failure path becomes diagnosable. No new test +required for the import hoist; the logger change is exercised by the +existing bench smoke test once `caplog` is added to the test signature.