Files
mxaccessgw/code-reviews/Client.Python/findings.md
Joseph Doherty a7bf1ef95d Resolve Client.Python-001/002/004/006/007/008/010/011/012 findings
Client.Python-001: dropped "scaffold" from the stale pyproject description.
Client.Python-002 (re-triaged): stale finding — MxGatewayCommandError is
already exported and in __all__; no change needed.
Client.Python-004: removed the dead `closed` variable in _smoke; the CLI
smoke now uses `async with session`.
Client.Python-006: close() on both clients and Session had an unlocked
check-then-set race; `_closed` is now set before the await.
Client.Python-007: gateway stream iterators now share one helper that
explicitly catches CancelledError and cancels the call.
Client.Python-008: to_mx_value now rejects nan/inf; float/bytes mapping
documented.
Client.Python-010: removed the circular-import-workaround late imports in
favour of TYPE_CHECKING / module-scope imports.
Client.Python-011: ensure_mxaccess_success no longer treats a proto3-default
success==0 with an unset category as a failure.
Client.Python-012 (Won't Fix): invoke_raw deliberately skips MXAccess-failure
detection for parity tests; documented the contract instead.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 22:59:24 -04:00

21 KiB

Code Review — Client.Python

Field Value
Module clients/python
Reviewer Claude Code
Review date 2026-05-18
Commit reviewed 3cc53a8
Status Reviewed
Open findings 0

Checklist coverage

# Category Result
1 Correctness & logic bugs Issues found: dead closed variable (Client.Python-004); float/bytes value-mapping assumptions (Client.Python-008).
2 mxaccessgw conventions Largely adheres; one missing export and a *_raw MXAccess-failure documentation gap (Client.Python-002, Client.Python-012).
3 Concurrency & thread safety Issue found: close() idempotency claim does not hold under concurrent close (Client.Python-006).
4 Error handling & resilience Issues found: inconsistent timeout-kwarg fallback (Client.Python-003); success == 0 default-value hazard (Client.Python-011); inconsistent cancel helpers (Client.Python-007).
5 Security No issues found — API keys redacted in repr and CLI output, TLS supported, no secret logging.
6 Performance & resource management Issue found: discover_hierarchy buffers the whole hierarchy in memory (Client.Python-005).
7 Design-document adherence Matches the design docs closely; minor CLI doc drift (Client.Python-001).
8 Code organization & conventions Issues found: MxGatewayCommandError omitted from __all__ (Client.Python-002); fragile circular-import workaround (Client.Python-010).
9 Testing coverage Issue found: write2, add_item2, bulk-size limits, TLS ca_file, and CLI command bodies untested (Client.Python-009).
10 Documentation & comments Issue found: stale "scaffold" package description (Client.Python-001).

Findings

Client.Python-001

Field Value
Severity Low
Category Documentation & comments
Location clients/python/pyproject.toml:8,25, clients/python/src/mxgateway_cli/commands.py:25
Status Resolved

Description: The package description in pyproject.toml still says "Async Python client scaffold" even though the client is fully implemented. Stale "scaffold" wording misrepresents maturity to anyone reading PyPI metadata. (The mxgw-py console-script name is itself consistent between pyproject.toml and the README.)

Recommendation: Update the pyproject.toml description to drop "scaffold"; keep README CLI examples in sync with the actual mxgw-py entry point.

Resolution: 2026-05-18 — Confirmed: pyproject.toml:8 description read "Async Python client scaffold for MXAccess Gateway." Changed to "Async Python client for MXAccess Gateway." The mxgw-py console-script name was already consistent with the README, so no README change was needed. Pure metadata fix — no test required.

Client.Python-002

Field Value
Severity Low
Category Code organization & conventions
Location clients/python/src/mxgateway/__init__.py:27
Status Resolved

Description: MxGatewayCommandError is imported into __init__.py and is a documented public exception, but it is missing from __all__. It is the parent of MxAccessError and a meaningful catch target, so omitting it from the public surface is inconsistent — from mxgateway import * will not expose it and tooling that respects __all__ treats it as private.

Recommendation: Add "MxGatewayCommandError" to the __all__ list.

Resolution: 2026-05-18 — Re-triaged: this finding is stale against the reviewed source. clients/python/src/mxgateway/__init__.py already imports MxGatewayCommandError (line 16) and lists "MxGatewayCommandError" in __all__ (line 38). from mxgateway import * exposes it correctly. Verified at runtime ('MxGatewayCommandError' in mxgateway.__all__ is True). No source change required — the defect described no longer exists.

Client.Python-003

Field Value
Severity Medium
Category Error handling & resilience
Location clients/python/src/mxgateway/client.py:125-137,155-173
Status Resolved

Description: stream_events_raw and query_active_alarms call the stub directly with a timeout kwarg when stream_timeout is set, with no TypeError fallback. galaxy.py:watch_deploy_events and _unary do have a fallback that strips timeout if the callable rejects it. This asymmetry means a fake/older stub that does not accept timeout crashes for gateway streams but not Galaxy streams. It is only masked today because stream_timeout defaults to None.

Recommendation: Apply the same try/except TypeError timeout-fallback pattern to stream_events_raw and query_active_alarms, or remove the fallback everywhere and standardise on a single behaviour.

Resolution: 2026-05-18 — Confirmed: both stream methods in client.py called the stub with timeout unconditionally and had no TypeError fallback, unlike _unary and galaxy.watch_deploy_events. Added a shared _open_stream helper in client.py that opens a server-streaming call and strips the timeout kwarg when the stub raises TypeError: ... unexpected keyword argument 'timeout', then routed both stream_events_raw and query_active_alarms through it. Regression tests in tests/test_stream_timeout_fallback.py (test_stream_events_raw_falls_back_when_stub_rejects_timeout, test_query_active_alarms_falls_back_when_stub_rejects_timeout, test_stream_events_raw_still_passes_timeout_to_capable_stub) failed before the fix and pass after. No public behaviour change for real gRPC stubs, so no README update needed.

Client.Python-004

Field Value
Severity Low
Category Correctness & logic bugs
Location clients/python/src/mxgateway_cli/commands.py:386,402-404
Status Resolved

Description: In _smoke, the local variable closed is set to False and never reassigned; the finally block's if not closed: is therefore always true. This is dead/misleading code suggesting a removed early-close path.

Recommendation: Remove the closed variable and the if not closed: guard; call await session.close() directly in the finally block (or use async with session:).

Resolution: 2026-05-18 — Confirmed: closed = False was set and never reassigned, making if not closed: dead code. Replaced the try/finally with async with session: so the session is closed via the documented async context manager — Session already implements __aexit__close(). Behaviour is unchanged (the session is still closed on every exit path); no test needed for the dead-code removal — exercised by the existing CLI smoke test.

Client.Python-005

Field Value
Severity Medium
Category Performance & resource management
Location clients/python/src/mxgateway/galaxy.py:117-140
Status Resolved

Description: discover_hierarchy pages through the entire Galaxy object hierarchy and accumulates every GalaxyObject (each carrying its full attribute list) into a single in-memory list before returning. For a large Galaxy this is a very large allocation with no streaming alternative and no caller-side bound.

Recommendation: Offer an async-generator variant (e.g. iter_hierarchy()) that yields objects/pages as they arrive, keeping discover_hierarchy() as a convenience wrapper. At minimum document the memory characteristic.

Resolution: 2026-05-18 — Confirmed: discover_hierarchy buffered the entire hierarchy with no streaming alternative. Added GalaxyRepositoryClient.iter_hierarchy, an async generator that fetches one DiscoverHierarchyRequest page at a time and yields each GalaxyObject as it arrives, so peak memory is bounded by a single page (_DISCOVER_HIERARCHY_PAGE_SIZE). Pages are fetched lazily — the next page is only requested after the current page is fully consumed. discover_hierarchy is now a thin convenience wrapper ([obj async for obj in self.iter_hierarchy()]) that preserves its list[GalaxyObject] contract, including the repeated-page-token guard. Regression tests in tests/test_galaxy_iter_hierarchy.py (test_iter_hierarchy_yields_objects_across_pages, test_iter_hierarchy_is_lazy_and_does_not_prefetch_next_page, test_iter_hierarchy_rejects_repeated_page_token, test_discover_hierarchy_still_returns_full_list) failed before the fix and pass after. clients/python/README.md updated with the iter_hierarchy usage and memory guidance since this adds a new public method.

Client.Python-006

Field Value
Severity Low
Category Concurrency & thread safety
Location clients/python/src/mxgateway/client.py:74-82, clients/python/src/mxgateway/galaxy.py:85-93, clients/python/src/mxgateway/session.py:38-55
Status Resolved

Description: close() on the clients and Session.close() use a plain self._closed check-then-set with an await between, with no lock. If two coroutines call close() concurrently both can pass the guard before either sets it, causing a double channel.close() / double CloseSession RPC. Single-task usage is the documented contract, so impact is low, but the idempotency guarantee asserted in docstrings only holds for sequential calls.

Recommendation: Set self._closed = True before the await, or guard with an asyncio.Lock, so the idempotency claim holds under concurrent close.

Resolution: 2026-05-18 — Confirmed the check-then-set window. Fixed GatewayClient.close, GalaxyRepositoryClient.close, and Session.close to set self._closed = True before the await (channel close / CloseSession RPC). A second coroutine entering close() while the first is still awaiting now hits the early-return guard and does not issue a second channel.close() / CloseSession. Docstrings updated to state the idempotency holds under concurrent calls. TDD: regression tests in tests/test_low_severity_findings.py (test_gateway_client_concurrent_close_closes_channel_once, test_galaxy_client_concurrent_close_closes_channel_once, test_session_concurrent_close_sends_one_close_session_rpc) — each uses a fake channel/client that stalls inside close/close_session_raw so two concurrent close() calls interleave at the exact race window; they failed before the fix and pass after.

Client.Python-007

Field Value
Severity Low
Category Error handling & resilience
Location clients/python/src/mxgateway/client.py:204-213
Status Resolved

Description: _canceling_iterator (gateway event stream) does not catch asyncio.CancelledError to invoke call.cancel() explicitly — it relies on the finally block. galaxy.py:_canceling_iterator does explicitly catch CancelledError, cancel, and re-raise. The two are functionally equivalent today, but the inconsistency between near-identical helpers invites future divergence.

Recommendation: Make the two _canceling_iterator helpers identical, ideally by factoring a single shared helper.

Resolution: 2026-05-18 — Confirmed the divergence. Factored a single shared helper: client._canceling_iterator(call, operation) now takes the map_rpc_error operation string as a parameter, explicitly catches asyncio.CancelledError (cancels the call, re-raises) and grpc.RpcError, and repeats the cancel in finally. This replaces both the gateway _canceling_iterator and the gateway _canceling_active_alarms_iterator; galaxy.py now imports and delegates to the same helper instead of defining its own, so the gateway and Galaxy stream helpers are byte-for-byte identical. TDD: tests/test_low_severity_findings.py::test_gateway_stream_iterator_cancels_call_on_task_cancellation drives a cancellable fake stream and asserts the gateway iterator cancels the underlying call on task cancellation. All existing stream-cancellation tests still pass.

Client.Python-008

Field Value
Severity Low
Category Correctness & logic bugs
Location clients/python/src/mxgateway/values.py:62-67,83-88
Status Resolved

Description: to_mx_value maps any Python float to VT_R8/MX_DATA_TYPE_DOUBLE with no handling for nan/inf, which are serialised and forwarded to MXAccess which may reject or mis-handle them. bytes is mapped to VT_RECORD/MX_DATA_TYPE_UNKNOWN, a questionable default. The data_type keyword exists but Session.write never forwards it.

Recommendation: Document the float/bytes mapping assumptions, optionally validate finiteness, and consider plumbing the data_type keyword through Session.write/write2.

Resolution: 2026-05-18 — Confirmed the non-finite-float hazard. Added an _ensure_finite guard in values.py: to_mx_value now raises ValueError for nan/inf/-inf, both for a scalar float and for a non-finite element inside a float sequence — MXAccess has no defined wire representation for non-finite doubles, so rejecting client-side is the correct fail-fast. The float/bytes mapping assumptions (finite-only doubles; bytes as an opaque VT_RECORD pass-through) are now documented in the values.py module docstring and clients/python/README.md. Plumbing data_type through Session.write/write2 was deliberately not done: it is a larger public-API surface change the finding only marks as "consider", and the documented MXAccess-parity convention is type-by-Python-value; the data_type keyword stays available on to_mx_value for callers that build the MxValue directly. TDD: tests/test_low_severity_findings.py adds test_to_mx_value_rejects_nan, test_to_mx_value_rejects_positive_infinity, test_to_mx_value_rejects_negative_infinity, test_to_mx_value_rejects_non_finite_float_in_sequence, and test_to_mx_value_accepts_finite_float. README updated since to_mx_value (used by Session.write/write2) now rejects an input it previously accepted.

Client.Python-009

Field Value
Severity Medium
Category Testing coverage
Location clients/python/tests/
Status Resolved

Description: Several non-trivial public paths are untested: Session.write2/add_item2 request construction; the bulk-size limit _ensure_bulk_size/MAX_BULK_ITEMS guard; the None-argument TypeError guards in bulk methods; the TLS ca_file read path in create_channel; most CLI command bodies; and map_rpc_error's default (non-auth) branch.

Recommendation: Add tests for write2/add_item2 request shape, the bulk-size ValueError, the ca_file TLS branch, the generic map_rpc_error fallthrough, and at least one happy-path CLI command using a fake stub.

Resolution: 2026-05-18 — Confirmed coverage gap against the existing tests/ files. Added tests/test_coverage_gaps.py covering every path the finding lists: test_add_item2_sends_item_context_and_returns_handle and test_write2_sends_value_and_timestamp_value (request shape + MxValue oneof), test_subscribe_bulk_rejects_oversized_request and test_add_item_bulk_at_limit_is_allowed (the MAX_BULK_ITEMS _ensure_bulk_size boundary), test_advise_item_bulk_rejects_none_argument (the None-argument TypeError guard), test_create_channel_reads_ca_file and test_create_channel_missing_ca_file_raises (the TLS ca_file read path), test_map_rpc_error_generic_branch_returns_transport_error and test_map_rpc_error_handles_error_without_code (the non-auth map_rpc_error fallthrough and the no-code path), and test_cli_register_happy_path_emits_server_handle (a happy-path CLI command body driven end to end through CliRunner with a fake stub via a monkeypatched _connect). All 10 new tests pass. No source change required — this is a pure coverage finding.

Client.Python-010

Field Value
Severity Low
Category Code organization & conventions
Location clients/python/src/mxgateway/session.py:404, clients/python/src/mxgateway_cli/commands.py:422-425
Status Resolved

Description: session.py ends with a module-level late import from .client import GatewayClient # noqa: E402 purely to satisfy a string type hint, and commands.py:_session does a function-local import. Both work around a circular dependency that from __future__ import annotations (already in effect) makes unnecessary. _session also lacks a return type annotation.

Recommendation: Drop the runtime late import in session.py and use a TYPE_CHECKING-guarded import for the hint; add the -> Session return annotation to commands.py:_session.

Resolution: 2026-05-18 — Confirmed: with from __future__ import annotations in effect all annotations are strings, so the runtime late import was unnecessary. Removed the trailing from .client import GatewayClient # noqa: E402 in session.py and replaced it with a top-of-file if TYPE_CHECKING: import that satisfies the GatewayClient hint without a runtime dependency (no import cycle: client.py does not import session at module scope). In commands.py, hoisted the function-local from mxgateway.session import Session to a module-level import and added the -> Session return annotation to _session. Verified import mxgateway and import mxgateway_cli.commands succeed with no circular-import error. Pure refactor — covered by the existing import and CLI tests; no new test needed.

Client.Python-011

Field Value
Severity Low
Category Error handling & resilience
Location clients/python/src/mxgateway/errors.py:122-148
Status Resolved

Description: ensure_mxaccess_success raises MxAccessError if any mx_status.success == 0. This treats success == 0 as the failure sentinel, but 0 is also the proto3 scalar default for an unset MxStatusProxy. If the gateway ever returns a reply with an unpopulated status entry (e.g. a partially-filled bulk result), the client raises MxAccessError even though no real failure occurred.

Recommendation: Confirm against the proto/gateway contract whether success is guaranteed populated for every statuses entry; if not, key the failure decision on an explicit failure field rather than the success == 0 default.

Resolution: 2026-05-18 — Confirmed against the gateway contract: success is not guaranteed populated for every statuses entry. src/MxGateway.Worker/Conversion/MxStatusProxyConverter.cs::ConvertMany emits a placeholder MxStatusProxy for a null MXSTATUS_PROXY COM array entry, setting Category/DetectedBy to Unknown but leaving Success at its proto3 default of 0. A fully-default proto entry likewise has success == 0. Under the old client logic either placeholder would falsely raise MxAccessError. Fixed ensure_mxaccess_success to key the per-status failure decision on a new _is_mxaccess_status_failure helper that requires success == 0 and a populated, non-OK category — a status with category of MX_STATUS_CATEGORY_UNSPECIFIED (default proto) or MX_STATUS_CATEGORY_UNKNOWN (the null-entry placeholder) is treated as unpopulated and ignored. MX_STATUS_CATEGORY_OK is also excluded so a genuine success entry never raises. Real failures (categories WARNING and the error categories, raw value ≥ 2) still raise as before — the existing write.mxaccess-failure fixture (SECURITY_ERROR/OPERATIONAL_ERROR statuses) and the MXACCESS_FAILURE protocol-status path are unaffected. TDD: tests/test_low_severity_findings.py adds test_ensure_mxaccess_success_ignores_unpopulated_status_entry (default + null-placeholder entries, no raise), test_ensure_mxaccess_success_raises_on_populated_failure_status (populated COMMUNICATION_ERROR, raises), and test_ensure_mxaccess_success_passes_when_status_reports_success. No public-behaviour change for genuine replies, so no README update.

Client.Python-012

Field Value
Severity Low
Category mxaccessgw conventions
Location clients/python/src/mxgateway/client.py:84-108, clients/python/src/mxgateway/session.py:57-77
Status Won't Fix

Description: Session.invoke_raw does not run ensure_mxaccess_success while Session.invoke does, so a caller using invoke_raw for parity tests gets a reply where an MXAccess HRESULT failure is silently embedded with no exception. This is by design but under-documented — the README's "preserve raw replies" sentence does not state that *_raw methods skip MXAccess-failure detection entirely.

Recommendation: Document explicitly (README + docstring) that *_raw methods surface MXAccess HRESULT/status failures only inside the reply and do not raise MxAccessError, so parity-test callers know to inspect protocol_status/hresult/statuses themselves.

Resolution: 2026-05-18 — Won't Fix (no behaviour change). Confirmed this is intentional, correct parity behaviour: the *_raw methods exist precisely so parity-test callers can inspect an unmodified gateway reply, including embedded MXAccess HRESULT/status failures, without an exception masking them. Changing invoke_raw to raise MxAccessError would defeat its purpose and duplicate Session.invoke. The finding's only actionable point is the documentation gap, which has been addressed: clients/python/README.md now states explicitly that *_raw methods enforce gateway protocol success only and do not run MXAccess-failure detection, and the docstrings of GatewayClient.invoke_raw and Session.invoke_raw say the same and point callers to inspect protocol_status/hresult/statuses (and to Session.invoke for the checked variant). No code/test change — the runtime contract is unchanged and correct.