Files

T

Joseph Doherty 1aafd6bde4 Code-review 2026-05-20 sweep #2 : re-review at a020350, resolve 48 findings

Second re-review pass at commit a020350 caught 48 new findings — including
one High-severity regression I introduced in the prior sweep — and fixed
them all in one parallel wave.

High (1)
- Client.Python-018: prior sweep set `license = "Proprietary"` in
  pyproject.toml. setuptools >= 77 enforces PEP 639 and rejects the
  string (it must be a valid SPDX expression), so `pip wheel .` and
  `pip install -e .` both fail before any source compiles. Tests
  still pass because pytest bypasses the build backend via
  `pythonpath`. Dropped the invalid license string, kept the
  `License :: Other/Proprietary License` classifier, and added
  `tests/test_packaging.py` so a future regression of the same shape
  is caught in CI.

Mediums (6)
- Worker-023: `HeartbeatStuckCeiling` (default 75s = 5x HeartbeatGrace)
  on WorkerPipeSessionOptions bounds the in-flight-command watchdog
  suppression so a truly stuck COM call still triggers StaHung
  instead of permanently defeating the watchdog.
- Client.Rust-018: reverted Rust's `latencyMs` split so the
  cross-language bench comparison is apples-to-apples again;
  `failureLatencyMs` kept as Rust-only enrichment.
- Client.Java-021: applied Client.Java-002's terminal-state
  serialisation pattern to DeployEventStream so close() arriving
  after queue-overflow can't erase the overflow exception.
- IntegrationTests-017: teardown-parity test now uses a two-window
  stability check after UnAdvise instead of strict equality against
  the pre-UnAdvise count (which raced against in-flight events).
- IntegrationTests-019: new RecordingTestOutputHelper wraps every
  log sink the WriteSecured live test owns (worker stdout/stderr,
  gateway logs, direct WriteLine) so the credential is proven
  absent from the full output buffer, not just the diagnostic
  message.
- Tests-020: added MxAccessGatewayServiceConstraintTests coverage
  for the previously-uncovered Write2Bulk and WriteSecured2Bulk
  arms of WriteBulkConstraintPlan.SetPayload.

Lows (41 — highlights)
- Server: Galaxy glob cache eviction is race-free (Server-024);
  GalaxyRepositoryGrpcService takes IGalaxyRepository (Server-025);
  AlarmsOptions validated at startup (Server-026); Authorization.md
  Constraint Enforcement snippet/prose enumerate the bulk write/read
  family (Server-027); bulk-read-commands and bulk-write-commands
  capability tokens added to OpenSession (Server-029);
  NotWiredAlarmRpcDispatcher XML doc and missing scope-resolver and
  state-machine tests cleaned up (023, 028).
- Worker: AlarmCommandHandler now invokes the same STA-affinity
  guard the poll path uses, at every command entry (Worker-024);
  RunAsync null-checks the runtime-session factory result
  (Worker-025).
- Worker.Tests: shared LiveMxAccessOptInVariableName lives on
  GatewayContractInfo (Worker.Tests-025); MxAccessSession.CreateForTesting
  rejects production sinks (Worker.Tests-026); FakeRuntimeSession's
  CancelCommandReturnValue serialised under lock (Worker.Tests-027);
  Probes namespace lifted to MxGateway.Worker.Tests.Probes
  (Worker.Tests-029); cancel-envelope sequence numbers monotonised
  (Worker.Tests-030); docs/GatewayTesting.md gains a "Dev-rig Probes"
  section (Worker.Tests-028).
- Tests: ManualTimeProvider consolidated into one TestSupport/ copy
  (Tests-021); SessionManagerBulkTests adds a mid-flight cancellation
  test backed by a TaskCompletionSource fake (Tests-022); companion
  FakeWorkerProcess.WaitForExitAsync no longer fakes its exit signal
  (Tests-023); constraint plan reply-count divergence pinned
  (Tests-024).
- IntegrationTests: TryGetSession chain carries [MaybeNullWhen(false)]
  end-to-end (IntegrationTests-018); abnormal-exit keyword set
  tightened to pipe-disconnected/end-of-stream and the test now
  asserts streamTask.IsFaulted (020, 021).
- Client.Dotnet: bench commands added to isLongRunning so the
  default 30s wall-clock budget doesn't kill them (015);
  BenchStreamEventsAsync observes the inner stream task on every
  exit path (016).
- Client.Go: parseValue wraps strconv errors with flag context and
  %w (017); bench loops honour ctx.Done() (018); galaxy-watch parses
  RFC3339Nano with fractional seconds (019); runStreamEvents installs
  signal.NotifyContext like runGalaxyWatch (020); five new CLI-level
  table-driven tests cover the bulk/bench subcommands (021).
- Client.Java: toCompletable Javadoc rewritten to match the actual
  cancellation contract Client.Java-015 established (022); stream-events
  text path uses Long.toUnsignedString for worker_sequence (023);
  bench-read-bulk no longer pollutes success-latency histogram with
  failure durations (024); --shutdown-timeout CLI option propagates
  through to ClientOptions (025); seven new MxGatewayCliTests cover
  the bulk and bench commands (026).
- Client.Python: mxgateway_cli ships its own py.typed marker (019);
  wheel-build smoke test added under tests/test_packaging.py (020);
  README documents the Galaxy CLI parity gap explicitly (021).
- Client.Rust: RustClientDesign.md signatures match session.rs and
  document the AsRef<str> read_bulk genericism (019);
  next_correlation_id re-exported at the crate root, with a
  property-style doc contract and an explicit disclaimer that the
  literal textual format is not part of the contract (020).
- Contracts: BulkWriteResult comment names the actual
  IConstraintEnforcer mechanism instead of "tag-allowlist filter"
  (014); BulkReadResult gains explicit per-arm payload-population
  documentation for the success vs failure cases (015).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-05-20 10:28:54 -04:00

50 KiB

Raw Blame History

Code Review — Client.Python

Field	Value
Module	`clients/python`
Reviewer	Claude Code
Review date	2026-05-20
Commit reviewed	`a020350`
Status	Reviewed
Open findings	0

Checklist coverage

A re-review at commit a020350 over the same module. Prior findings (Client.Python-001 — Client.Python-017) remain closed and are kept as history. This section reflects categories evaluated in this pass.

#	Category	Result
1	Correctness & logic bugs	No new issues found — TLS-by-default fix in Client.Python-013 verified; no test fixture accidentally relies on plaintext defaults.
2	mxaccessgw conventions	No new issues found — secrets redacted, MXAccess parity preserved, generated code untouched.
3	Concurrency & thread safety	No new issues found — close-idempotency and shared cancel-on-cancel iterator still in place.
4	Error handling & resilience	No new issues found.
5	Security	No new issues found — `_use_plaintext` now requires explicit `--plaintext` opt-in (Client.Python-013 resolution verified). The `--api-key` flag is also still redacted from the option repr and CLI errors.
6	Performance & resource management	No new issues found.
7	Design-document adherence	No new issues found — `PythonClientDesign.md` is consistent with the implemented surface.
8	Code organization & conventions	Issue found: `mxgateway_cli` is shipped in the wheel but has no PEP 561 `py.typed` marker (Client.Python-019), so the CLI module's inline type hints are invisible to downstream `mypy` runs.
9	Testing coverage	Issue found: no test exercises the wheel-build / editable-install flow; the broken `pyproject.toml` (Client.Python-018) was not caught at commit time because the test suite runs from `src/` via `pytest pythonpath` (Client.Python-020).
10	Documentation & comments	Issue found: cross-client CLI parity gap — the Python CLI ships none of the Galaxy subcommands (`galaxy-test-connection`, `galaxy-last-deploy`, `galaxy-discover`, `galaxy-watch`) the .NET / Go / Rust / Java CLIs all expose, and lacks the new `.NET`-only `bench-stream-events`. README does not flag the gap (Client.Python-021).

Findings

Client.Python-001

Field	Value
Severity	Low
Category	Documentation & comments
Location	`clients/python/pyproject.toml:8,25`, `clients/python/src/mxgateway_cli/commands.py:25`
Status	Resolved

Description: The package description in pyproject.toml still says "Async Python client scaffold" even though the client is fully implemented. Stale "scaffold" wording misrepresents maturity to anyone reading PyPI metadata. (The mxgw-py console-script name is itself consistent between pyproject.toml and the README.)

Recommendation: Update the pyproject.toml description to drop "scaffold"; keep README CLI examples in sync with the actual mxgw-py entry point.

Resolution: 2026-05-18 — Confirmed: pyproject.toml:8 description read "Async Python client scaffold for MXAccess Gateway." Changed to "Async Python client for MXAccess Gateway." The mxgw-py console-script name was already consistent with the README, so no README change was needed. Pure metadata fix — no test required.

Client.Python-002

Field	Value
Severity	Low
Category	Code organization & conventions
Location	`clients/python/src/mxgateway/__init__.py:27`
Status	Resolved

Description: MxGatewayCommandError is imported into __init__.py and is a documented public exception, but it is missing from __all__. It is the parent of MxAccessError and a meaningful catch target, so omitting it from the public surface is inconsistent — from mxgateway import * will not expose it and tooling that respects __all__ treats it as private.

Recommendation: Add "MxGatewayCommandError" to the __all__ list.

Resolution: 2026-05-18 — Re-triaged: this finding is stale against the reviewed source. clients/python/src/mxgateway/__init__.py already imports MxGatewayCommandError (line 16) and lists "MxGatewayCommandError" in __all__ (line 38). from mxgateway import * exposes it correctly. Verified at runtime ('MxGatewayCommandError' in mxgateway.__all__ is True). No source change required — the defect described no longer exists.

Client.Python-003

Field	Value
Severity	Medium
Category	Error handling & resilience
Location	`clients/python/src/mxgateway/client.py:125-137,155-173`
Status	Resolved

Description: stream_events_raw and query_active_alarms call the stub directly with a timeout kwarg when stream_timeout is set, with no TypeError fallback. galaxy.py:watch_deploy_events and _unary do have a fallback that strips timeout if the callable rejects it. This asymmetry means a fake/older stub that does not accept timeout crashes for gateway streams but not Galaxy streams. It is only masked today because stream_timeout defaults to None.

Recommendation: Apply the same try/except TypeError timeout-fallback pattern to stream_events_raw and query_active_alarms, or remove the fallback everywhere and standardise on a single behaviour.

Resolution: 2026-05-18 — Confirmed: both stream methods in client.py called the stub with timeout unconditionally and had no TypeError fallback, unlike _unary and galaxy.watch_deploy_events. Added a shared _open_stream helper in client.py that opens a server-streaming call and strips the timeout kwarg when the stub raises TypeError: ... unexpected keyword argument 'timeout', then routed both stream_events_raw and query_active_alarms through it. Regression tests in tests/test_stream_timeout_fallback.py (test_stream_events_raw_falls_back_when_stub_rejects_timeout, test_query_active_alarms_falls_back_when_stub_rejects_timeout, test_stream_events_raw_still_passes_timeout_to_capable_stub) failed before the fix and pass after. No public behaviour change for real gRPC stubs, so no README update needed.

Client.Python-004

Field	Value
Severity	Low
Category	Correctness & logic bugs
Location	`clients/python/src/mxgateway_cli/commands.py:386,402-404`
Status	Resolved

Description: In _smoke, the local variable closed is set to False and never reassigned; the finally block's if not closed: is therefore always true. This is dead/misleading code suggesting a removed early-close path.

Recommendation: Remove the closed variable and the if not closed: guard; call await session.close() directly in the finally block (or use async with session:).

Resolution: 2026-05-18 — Confirmed: closed = False was set and never reassigned, making if not closed: dead code. Replaced the try/finally with async with session: so the session is closed via the documented async context manager — Session already implements __aexit__ → close(). Behaviour is unchanged (the session is still closed on every exit path); no test needed for the dead-code removal — exercised by the existing CLI smoke test.

Client.Python-005

Field	Value
Severity	Medium
Category	Performance & resource management
Location	`clients/python/src/mxgateway/galaxy.py:117-140`
Status	Resolved

Description: discover_hierarchy pages through the entire Galaxy object hierarchy and accumulates every GalaxyObject (each carrying its full attribute list) into a single in-memory list before returning. For a large Galaxy this is a very large allocation with no streaming alternative and no caller-side bound.

Recommendation: Offer an async-generator variant (e.g. iter_hierarchy()) that yields objects/pages as they arrive, keeping discover_hierarchy() as a convenience wrapper. At minimum document the memory characteristic.

Resolution: 2026-05-18 — Confirmed: discover_hierarchy buffered the entire hierarchy with no streaming alternative. Added GalaxyRepositoryClient.iter_hierarchy, an async generator that fetches one DiscoverHierarchyRequest page at a time and yields each GalaxyObject as it arrives, so peak memory is bounded by a single page (_DISCOVER_HIERARCHY_PAGE_SIZE). Pages are fetched lazily — the next page is only requested after the current page is fully consumed. discover_hierarchy is now a thin convenience wrapper ([obj async for obj in self.iter_hierarchy()]) that preserves its list[GalaxyObject] contract, including the repeated-page-token guard. Regression tests in tests/test_galaxy_iter_hierarchy.py (test_iter_hierarchy_yields_objects_across_pages, test_iter_hierarchy_is_lazy_and_does_not_prefetch_next_page, test_iter_hierarchy_rejects_repeated_page_token, test_discover_hierarchy_still_returns_full_list) failed before the fix and pass after. clients/python/README.md updated with the iter_hierarchy usage and memory guidance since this adds a new public method.

Client.Python-006

Field	Value
Severity	Low
Category	Concurrency & thread safety
Location	`clients/python/src/mxgateway/client.py:74-82`, `clients/python/src/mxgateway/galaxy.py:85-93`, `clients/python/src/mxgateway/session.py:38-55`
Status	Resolved

Description: close() on the clients and Session.close() use a plain self._closed check-then-set with an await between, with no lock. If two coroutines call close() concurrently both can pass the guard before either sets it, causing a double channel.close() / double CloseSession RPC. Single-task usage is the documented contract, so impact is low, but the idempotency guarantee asserted in docstrings only holds for sequential calls.

Recommendation: Set self._closed = True before the await, or guard with an asyncio.Lock, so the idempotency claim holds under concurrent close.

Resolution: 2026-05-18 — Confirmed the check-then-set window. Fixed GatewayClient.close, GalaxyRepositoryClient.close, and Session.close to set self._closed = True before the await (channel close / CloseSession RPC). A second coroutine entering close() while the first is still awaiting now hits the early-return guard and does not issue a second channel.close() / CloseSession. Docstrings updated to state the idempotency holds under concurrent calls. TDD: regression tests in tests/test_low_severity_findings.py (test_gateway_client_concurrent_close_closes_channel_once, test_galaxy_client_concurrent_close_closes_channel_once, test_session_concurrent_close_sends_one_close_session_rpc) — each uses a fake channel/client that stalls inside close/close_session_raw so two concurrent close() calls interleave at the exact race window; they failed before the fix and pass after.

Client.Python-007

Field	Value
Severity	Low
Category	Error handling & resilience
Location	`clients/python/src/mxgateway/client.py:204-213`
Status	Resolved

Description: _canceling_iterator (gateway event stream) does not catch asyncio.CancelledError to invoke call.cancel() explicitly — it relies on the finally block. galaxy.py:_canceling_iterator does explicitly catch CancelledError, cancel, and re-raise. The two are functionally equivalent today, but the inconsistency between near-identical helpers invites future divergence.

Recommendation: Make the two _canceling_iterator helpers identical, ideally by factoring a single shared helper.

Resolution: 2026-05-18 — Confirmed the divergence. Factored a single shared helper: client._canceling_iterator(call, operation) now takes the map_rpc_error operation string as a parameter, explicitly catches asyncio.CancelledError (cancels the call, re-raises) and grpc.RpcError, and repeats the cancel in finally. This replaces both the gateway _canceling_iterator and the gateway _canceling_active_alarms_iterator; galaxy.py now imports and delegates to the same helper instead of defining its own, so the gateway and Galaxy stream helpers are byte-for-byte identical. TDD: tests/test_low_severity_findings.py::test_gateway_stream_iterator_cancels_call_on_task_cancellation drives a cancellable fake stream and asserts the gateway iterator cancels the underlying call on task cancellation. All existing stream-cancellation tests still pass.

Client.Python-008

Field	Value
Severity	Low
Category	Correctness & logic bugs
Location	`clients/python/src/mxgateway/values.py:62-67,83-88`
Status	Resolved

Description: to_mx_value maps any Python float to VT_R8/MX_DATA_TYPE_DOUBLE with no handling for nan/inf, which are serialised and forwarded to MXAccess which may reject or mis-handle them. bytes is mapped to VT_RECORD/MX_DATA_TYPE_UNKNOWN, a questionable default. The data_type keyword exists but Session.write never forwards it.

Recommendation: Document the float/bytes mapping assumptions, optionally validate finiteness, and consider plumbing the data_type keyword through Session.write/write2.

Resolution: 2026-05-18 — Confirmed the non-finite-float hazard. Added an _ensure_finite guard in values.py: to_mx_value now raises ValueError for nan/inf/-inf, both for a scalar float and for a non-finite element inside a float sequence — MXAccess has no defined wire representation for non-finite doubles, so rejecting client-side is the correct fail-fast. The float/bytes mapping assumptions (finite-only doubles; bytes as an opaque VT_RECORD pass-through) are now documented in the values.py module docstring and clients/python/README.md. Plumbing data_type through Session.write/write2 was deliberately not done: it is a larger public-API surface change the finding only marks as "consider", and the documented MXAccess-parity convention is type-by-Python-value; the data_type keyword stays available on to_mx_value for callers that build the MxValue directly. TDD: tests/test_low_severity_findings.py adds test_to_mx_value_rejects_nan, test_to_mx_value_rejects_positive_infinity, test_to_mx_value_rejects_negative_infinity, test_to_mx_value_rejects_non_finite_float_in_sequence, and test_to_mx_value_accepts_finite_float. README updated since to_mx_value (used by Session.write/write2) now rejects an input it previously accepted.

Client.Python-009

Field	Value
Severity	Medium
Category	Testing coverage
Location	`clients/python/tests/`
Status	Resolved

Description: Several non-trivial public paths are untested: Session.write2/add_item2 request construction; the bulk-size limit _ensure_bulk_size/MAX_BULK_ITEMS guard; the None-argument TypeError guards in bulk methods; the TLS ca_file read path in create_channel; most CLI command bodies; and map_rpc_error's default (non-auth) branch.

Recommendation: Add tests for write2/add_item2 request shape, the bulk-size ValueError, the ca_file TLS branch, the generic map_rpc_error fallthrough, and at least one happy-path CLI command using a fake stub.

Resolution: 2026-05-18 — Confirmed coverage gap against the existing tests/ files. Added tests/test_coverage_gaps.py covering every path the finding lists: test_add_item2_sends_item_context_and_returns_handle and test_write2_sends_value_and_timestamp_value (request shape + MxValue oneof), test_subscribe_bulk_rejects_oversized_request and test_add_item_bulk_at_limit_is_allowed (the MAX_BULK_ITEMS _ensure_bulk_size boundary), test_advise_item_bulk_rejects_none_argument (the None-argument TypeError guard), test_create_channel_reads_ca_file and test_create_channel_missing_ca_file_raises (the TLS ca_file read path), test_map_rpc_error_generic_branch_returns_transport_error and test_map_rpc_error_handles_error_without_code (the non-auth map_rpc_error fallthrough and the no-code path), and test_cli_register_happy_path_emits_server_handle (a happy-path CLI command body driven end to end through CliRunner with a fake stub via a monkeypatched _connect). All 10 new tests pass. No source change required — this is a pure coverage finding.

Client.Python-010

Field	Value
Severity	Low
Category	Code organization & conventions
Location	`clients/python/src/mxgateway/session.py:404`, `clients/python/src/mxgateway_cli/commands.py:422-425`
Status	Resolved

Description: session.py ends with a module-level late import from .client import GatewayClient # noqa: E402 purely to satisfy a string type hint, and commands.py:_session does a function-local import. Both work around a circular dependency that from __future__ import annotations (already in effect) makes unnecessary. _session also lacks a return type annotation.

Recommendation: Drop the runtime late import in session.py and use a TYPE_CHECKING-guarded import for the hint; add the -> Session return annotation to commands.py:_session.

Resolution: 2026-05-18 — Confirmed: with from __future__ import annotations in effect all annotations are strings, so the runtime late import was unnecessary. Removed the trailing from .client import GatewayClient # noqa: E402 in session.py and replaced it with a top-of-file if TYPE_CHECKING: import that satisfies the GatewayClient hint without a runtime dependency (no import cycle: client.py does not import session at module scope). In commands.py, hoisted the function-local from mxgateway.session import Session to a module-level import and added the -> Session return annotation to _session. Verified import mxgateway and import mxgateway_cli.commands succeed with no circular-import error. Pure refactor — covered by the existing import and CLI tests; no new test needed.

Client.Python-011

Field	Value
Severity	Low
Category	Error handling & resilience
Location	`clients/python/src/mxgateway/errors.py:122-148`
Status	Resolved

Description: ensure_mxaccess_success raises MxAccessError if any mx_status.success == 0. This treats success == 0 as the failure sentinel, but 0 is also the proto3 scalar default for an unset MxStatusProxy. If the gateway ever returns a reply with an unpopulated status entry (e.g. a partially-filled bulk result), the client raises MxAccessError even though no real failure occurred.

Recommendation: Confirm against the proto/gateway contract whether success is guaranteed populated for every statuses entry; if not, key the failure decision on an explicit failure field rather than the success == 0 default.

Resolution: 2026-05-18 — Confirmed against the gateway contract: success is not guaranteed populated for every statuses entry. src/MxGateway.Worker/Conversion/MxStatusProxyConverter.cs::ConvertMany emits a placeholder MxStatusProxy for a null MXSTATUS_PROXY COM array entry, setting Category/DetectedBy to Unknown but leaving Success at its proto3 default of 0. A fully-default proto entry likewise has success == 0. Under the old client logic either placeholder would falsely raise MxAccessError. Fixed ensure_mxaccess_success to key the per-status failure decision on a new _is_mxaccess_status_failure helper that requires success == 0 and a populated, non-OK category — a status with category of MX_STATUS_CATEGORY_UNSPECIFIED (default proto) or MX_STATUS_CATEGORY_UNKNOWN (the null-entry placeholder) is treated as unpopulated and ignored. MX_STATUS_CATEGORY_OK is also excluded so a genuine success entry never raises. Real failures (categories WARNING and the error categories, raw value ≥ 2) still raise as before — the existing write.mxaccess-failure fixture (SECURITY_ERROR/OPERATIONAL_ERROR statuses) and the MXACCESS_FAILURE protocol-status path are unaffected. TDD: tests/test_low_severity_findings.py adds test_ensure_mxaccess_success_ignores_unpopulated_status_entry (default + null-placeholder entries, no raise), test_ensure_mxaccess_success_raises_on_populated_failure_status (populated COMMUNICATION_ERROR, raises), and test_ensure_mxaccess_success_passes_when_status_reports_success. No public-behaviour change for genuine replies, so no README update.

Client.Python-012

Field	Value
Severity	Low
Category	mxaccessgw conventions
Location	`clients/python/src/mxgateway/client.py:84-108`, `clients/python/src/mxgateway/session.py:57-77`
Status	Won't Fix

Description: Session.invoke_raw does not run ensure_mxaccess_success while Session.invoke does, so a caller using invoke_raw for parity tests gets a reply where an MXAccess HRESULT failure is silently embedded with no exception. This is by design but under-documented — the README's "preserve raw replies" sentence does not state that *_raw methods skip MXAccess-failure detection entirely.

Recommendation: Document explicitly (README + docstring) that *_raw methods surface MXAccess HRESULT/status failures only inside the reply and do not raise MxAccessError, so parity-test callers know to inspect protocol_status/hresult/statuses themselves.

Resolution: 2026-05-18 — Won't Fix (no behaviour change). Confirmed this is intentional, correct parity behaviour: the *_raw methods exist precisely so parity-test callers can inspect an unmodified gateway reply, including embedded MXAccess HRESULT/status failures, without an exception masking them. Changing invoke_raw to raise MxAccessError would defeat its purpose and duplicate Session.invoke. The finding's only actionable point is the documentation gap, which has been addressed: clients/python/README.md now states explicitly that *_raw methods enforce gateway protocol success only and do not run MXAccess-failure detection, and the docstrings of GatewayClient.invoke_raw and Session.invoke_raw say the same and point callers to inspect protocol_status/hresult/statuses (and to Session.invoke for the checked variant). No code/test change — the runtime contract is unchanged and correct.

Client.Python-013

Field	Value
Severity	Medium
Category	Security
Location	`clients/python/src/mxgateway_cli/commands.py:757-762`
Status	Resolved

Description: _use_plaintext silently returns True whenever the endpoint string starts with localhost: or 127.0.0.1:, even if neither --plaintext nor --tls is supplied on the command line. Any CLI subcommand (e.g. mxgw-py open-session --endpoint localhost:5001 --api-key mxgw_<secret>) then attaches the API key to a plaintext gRPC channel without warning. This is a silent security downgrade: a user who deliberately ran the gateway behind TLS on loopback (e.g. for testing a production-shaped TLS config locally) and who passes --api-key expecting the secret to be transport-protected gets a plaintext bearer token instead. The auto-downgrade is also undocumented — README.md and the CLI --help text both describe --plaintext and --tls as the controls, with no mention that endpoint-prefix matching can override either. The other client CLIs do not auto-downgrade: the .NET CLI uses https://-prefix detection on a URI scheme (an explicit signal), Go and Java require an explicit --plaintext/--tls choice, and Rust defaults to plaintext only when plaintext = true is set on the options struct.

Recommendation: Drop the localhost-prefix auto-plaintext branch and require the user to pass --plaintext or --tls (or default to TLS to match the rest of the matrix). If the implicit-localhost behaviour is kept for ergonomics, document it prominently in both README.md and --help, emit a stderr warning when --api-key is combined with the auto-downgrade path, and add a CLI test asserting the auto-downgrade is in fact active so it is not silently lost in a future refactor.

Resolution: 2026-05-20 — Removed the silent localhost: / 127.0.0.1: auto-plaintext branch from _use_plaintext. The new contract matches the Go and Java CLIs: TLS is the default, --plaintext is the only way to opt in to an unencrypted channel, and --tls is accepted as a redundant, explicit affirmation of the default (mutually exclusive with --plaintext, which now raises click.UsageError). The --plaintext / --tls --help text and clients/python/README.md both call out the new behaviour. Added six regression tests in clients/python/tests/test_cli.py covering: (a) a localhost: endpoint with no flags resolves to TLS, (b) a 127.0.0.1: endpoint with no flags resolves to TLS, (c) --plaintext opts in to plaintext, (d) --tls is accepted and idempotent with the default, (e) --plaintext combined with --tls is rejected, and (f) an end-to-end CliRunner test asserting ClientOptions.plaintext == False flows through to GatewayClient.connect when no flag is supplied against a localhost: endpoint. Behaviour change for callers: scripts that previously relied on mxgw-py … --endpoint localhost:5000 … selecting plaintext silently must now add an explicit --plaintext flag (or set up TLS on the gateway). Calling mxgw-py with an --api-key against a plaintext-only gateway without --plaintext will now fail to connect rather than silently leaking the bearer token.

Client.Python-014

Field	Value
Severity	Low
Category	Code organization & conventions
Location	`clients/python/src/mxgateway_cli/commands.py:22-23`
Status	Resolved

Description: commands.py has two consecutive from mxgateway.values import lines:

from mxgateway.values import to_mx_value
from mxgateway.values import MxValueInput

These import from the same module and should be combined into a single from mxgateway.values import MxValueInput, to_mx_value. The split form is inconsistent with the rest of the file (every other module is imported in a single statement) and would be flagged by ruff/isort if any linter were configured. Pure style, no behavioural impact.

Recommendation: Collapse the two imports into one statement, ordered to match the conventional alphabetical-within-module pattern: from mxgateway.values import MxValueInput, to_mx_value.

Resolution: 2026-05-20 — Collapsed the two consecutive from mxgateway.values import to_mx_value / from mxgateway.values import MxValueInput lines in clients/python/src/mxgateway_cli/commands.py into a single from mxgateway.values import MxValueInput, to_mx_value statement, matching the alphabetical-within-module pattern used elsewhere in the file. Pure style fix — no behavioural impact, covered by the existing CLI tests.

Client.Python-015

Field	Value
Severity	Low
Category	Testing coverage
Location	`clients/python/src/mxgateway_cli/commands.py:273-294,564-647`, `clients/python/tests/`
Status	Resolved

Description: _bench_read_bulk is a ~80-line CLI body that opens its own session, registers, subscribe_bulks, runs a warm-up loop, a measurement loop, collects per-call latencies, computes a percentile summary, and emits the shared cross-language JSON schema. It is the largest untested CLI command in the module — tests/ has no bench_read_bulk test, fake-stub-driven or otherwise. A drift in the schema field names (callsPerSecond, cachedReadResults, latencyMs.p50, …) would break the cross-language scripts/bench-read-bulk.ps1 aggregation silently. _percentile_summary and _percentile are also untested — the boundary cases (n == 0, n == 1, quantile interpolation) would benefit from a small unit test since the identical algorithm is duplicated in the .NET / Go / Rust / Java drivers and a divergence would corrupt cross-language comparisons.

Recommendation: Add a fake-stub-driven bench_read_bulk test that drives a short --duration-seconds 0 --warmup-seconds 0 run through CliRunner and asserts the JSON schema (language == "python", the full key set, latencyMs.p50/p95/p99/max/mean present). Add unit tests for _percentile covering n == 0, n == 1, and a known-good interpolated value at p95 so the implementation cannot silently drift from the other clients.

Resolution: 2026-05-20 — Added clients/python/tests/test_cli_bench_and_helpers.py with three layers of coverage. (1) _percentile unit tests pin the cross-language algorithm (rank = q * (n - 1), linear interpolation between adjacent ranks): empty sample returns 0.0, single element returns that element, exact-rank queries return the sample value (p50 of [10,20,30,40,50] is 30.0), and the interpolated p95/p99 values (48.0 / 49.6 for that same five-element sample) are locked down so any drift from the .NET / Go / Rust / Java drivers fails fast. (2) _percentile_summary tests assert the full {p50, p95, p99, max, mean} dict shape, the zero-sample placeholder, and the 3-decimal rounding contract. (3) A bench-read-bulk smoke test (test_bench_read_bulk_emits_cross_language_schema) drives the CLI through CliRunner with --duration-seconds 0 --warmup-seconds 0 against a fake stub that handles OpenSession, Register, SubscribeBulk, ReadBulk, and UnsubscribeBulk, then asserts the emitted JSON has exactly the 16 cross-language schema keys (language, command, endpoint, clientName, bulkSize, durationSeconds, warmupSeconds, durationMs, tags, totalCalls, successfulCalls, failedCalls, totalReadResults, cachedReadResults, callsPerSecond, latencyMs) and that latencyMs is a {p50, p95, p99, max, mean} sub-object — guarding against silent breakage of scripts/bench-read-bulk.ps1's cross-language aggregation. No source change — this is a pure coverage finding.

Client.Python-016

Field	Value
Severity	Low
Category	Testing coverage
Location	`clients/python/src/mxgateway_cli/commands.py:25,757-775,805-830`
Status	Resolved

Description: Three CLI helper paths are not covered by tests/:

_use_plaintext localhost auto-downgrade (line 762) — the endpoint.startswith("localhost:") or endpoint.startswith("127.0.0.1:") branch (see also Client.Python-013) is untested; no test asserts that an endpoint without --plaintext and without --tls resolves to plaintext.
_collect_events MAX_AGGREGATE_EVENTS guard (line 811-815) — passing --max-events greater than MAX_AGGREGATE_EVENTS raises click.BadParameter, but no test exercises the guard. A silent removal of the constant or the comparison would not be caught.
_api_key_from_env (line 765-768) — only the implicit path through _secrets is exercised; there is no test that verifies an env-var name resolves to a value and that an unset env var produces None.

These are all small, fake-stub-driven CLI behaviours rather than end-to-end paths. The previous coverage finding (Client.Python-009) closed without adding tests for these specific paths.

Recommendation: Add three small CliRunner / unit tests: one asserting the localhost auto-plaintext (or its replacement, if Client.Python-013 is fixed), one asserting --max-events 10001 exits non-zero with the MAX_AGGREGATE_EVENTS error message, and one asserting _api_key_from_env("MXGATEWAY_API_KEY") returns the env value and None for an unset variable.

Resolution: 2026-05-20 — Scope adjusted: Client.Python-013 has since removed the _use_plaintext localhost auto-plaintext branch, so item (1) is no longer a real code path — the test_use_plaintext_requires_explicit_flag_for_localhost_endpoint and test_cli_localhost_endpoint_defaults_to_tls_via_open_session regressions added under Client.Python-013 already pin the new TLS-by-default contract. The remaining two helpers are now covered in clients/python/tests/test_cli_bench_and_helpers.py. (2) MAX_AGGREGATE_EVENTS cap: test_collect_events_rejects_max_events_above_aggregate_cap drives stream-events with --max-events 10001 through CliRunner against stubbed _connect / _session fakes and asserts the CLI exits non-zero with the documented less than or equal to 10000 message; test_collect_events_accepts_max_events_at_aggregate_cap_boundary confirms --max-events 10000 is accepted at the boundary and returns an empty event list. (3) _api_key_from_env: test_api_key_from_env_resolves_value_when_variable_is_set (env-var populated → returned), test_api_key_from_env_returns_none_when_variable_is_unset (env-var unset → None), test_api_key_from_env_returns_none_when_name_is_none (the name is None early-return), and test_api_key_from_env_returns_none_when_name_is_empty_string (the if not name truthiness guard). No source change — pure coverage finding.

Client.Python-017

Field	Value
Severity	Low
Category	Documentation & comments
Location	`clients/python/pyproject.toml:5-25`, `clients/python/src/mxgateway/`
Status	Resolved

Description: The package metadata in pyproject.toml is minimal for a published wheel:

No authors field. PyPI / pip show will display no author.
No license field, no license-files field, and no LICENSE file is referenced from the project. The repo as a whole has no top-level LICENSE either, but other client packages (Java has a license entry, the .NET package has a license expression in the csproj) tend to set this.
No classifiers (no Programming Language :: Python :: 3.12, Operating System :: Microsoft :: Windows, Topic :: …, no development-status classifier). Without these the PyPI search facets are empty and tooling like pip cannot tell whether the package is alpha/beta/stable.
No keywords, no [project.urls] (no homepage / source / issue link pointing back to the repo).
The package ships no PEP 561 py.typed marker file in src/mxgateway/. Type hints are written throughout the module (from __future__ import annotations, full annotations on every public function), but downstream consumers running mypy on mxaccess-gateway-client will not see those hints — PEP 561 requires the marker file to opt the package into type-stub distribution.

Recommendation: Add authors, license = "<spdx>", keywords, and [project.urls] to pyproject.toml; add at least the standard classifiers trio (Development Status, Programming Language :: Python :: 3.12, Intended Audience); create an empty src/mxgateway/py.typed file and include it in the wheel via [tool.setuptools.package-data] so consumers running mypy against an installed wheel pick up the type information.

Resolution: 2026-05-20 — Filled out clients/python/pyproject.toml with the missing PyPI metadata: authors = [{ name = "MXAccess Gateway Authors" }], license = "Proprietary" (the repo has no top-level LICENSE file and no other client publishes under an OSS licence, so the SPDX Proprietary expression matches the de-facto status), the standard classifier set (Development Status :: 4 - Beta, Intended Audience :: Developers / Information Technology, Operating System :: Microsoft :: Windows and :: POSIX, Programming Language :: Python / Python :: 3 / Python :: 3.12, Topic :: Software Development :: Libraries :: Python Modules, Topic :: System :: Distributed Computing, and Typing :: Typed), a keywords list (mxaccess, archestra, gateway, grpc, industrial, scada), and [project.urls] with Homepage / Source / Issues pointing at the Gitea repo. Added the PEP 561 marker file clients/python/src/mxgateway/py.typed (empty, as the spec requires) and declared it in [tool.setuptools.package-data] mxgateway = ["py.typed"] so the wheel ships the marker and downstream mypy users see the inline type hints. Pure metadata / packaging change — python -m pytest -q still passes (91 tests).

Client.Python-018

Field	Value
Severity	High
Category	Code organization & conventions
Location	`clients/python/pyproject.toml:11`
Status	Resolved

Description: The Client.Python-017 resolution set license = "Proprietary" as a top-level string. Under PEP 639 (enforced by setuptools >= 77, and active in the installed setuptools 82.0.1), the project.license string form must be a valid SPDX expression. "Proprietary" is not a registered SPDX identifier, so the configured build backend (setuptools.build_meta) refuses the file outright. Both python -m pip wheel . --no-deps --wheel-dir … and python -m pip install -e . — the exact commands documented in clients/python/README.md ("Build And Test", "Packaging") and the "build wheel" instruction in docs/ClientPackaging.md — now fail before any source is compiled with:

ValueError: invalid pyproject.toml config: `project.license`.
configuration error: `project.license` must be valid exactly by one definition (0 matches found):
    - {type: string, format: 'SPDX'}
    - type: table keys: 'file': … required: ['file']
    - type: table keys: 'text': … required: ['text']

python -m pytest still runs because [tool.pytest.ini_options] pythonpath = ["src"] lets pytest import the package without an install — which masked the regression at commit time and explains how the Client.Python-017 resolution comment was able to assert "python -m pytest -q still passes (91 tests)" while shipping a wheel build that cannot start. The Client.Python-017 resolution comment that "the SPDX Proprietary expression matches the de-facto status" is incorrect: Proprietary is not a registered SPDX identifier; only entries on the SPDX licence list (e.g. MIT, Apache-2.0, BSD-3-Clause) or LicenseRef-* custom identifiers satisfy the { type: string, format: 'SPDX' } rule. PEP 639 added the LicenseRef-… escape hatch precisely for proprietary / unlisted licences.

This is a regression of the developer-onboarding workflow introduced by the very commit being reviewed. A fresh checkout cannot run python -m pip install -e ".[dev]" (the command in CLAUDE.md's "Clients" section) without first patching pyproject.toml.

Recommendation: Fix the license value so the build backend accepts it. Three concrete options, in order of preference:

Use a LicenseRef-* SPDX-compatible custom identifier: license = "LicenseRef-Proprietary". Requires no additional LICENSE file and is honoured by setuptools / pip / PyPI as a proprietary marker.
Add a top-level LICENSE file (or clients/python/LICENSE) and point at it via the table form: license = { file = "LICENSE" }. This also documents the proprietary terms.
Drop the license key entirely and convey the same intent via the classifier "License :: Other/Proprietary License" (already part of the classifier set), reverting the PEP-639 string field that the build backend now insists must be SPDX.

Add a CI / pre-commit check that runs python -m pip wheel . --no-deps (or python -m build) on clients/python so a future pyproject.toml regression is caught at commit time rather than at first install on a clean machine. See also Client.Python-020.

Resolution: 2026-05-20 — Dropped the invalid top-level license = "Proprietary" string from clients/python/pyproject.toml and added the existing License :: Other/Proprietary License trove classifier to convey the same intent without violating PEP 639's SPDX rule. No LICENSE file exists at the repo root or under clients/python/, so the license = { file = "LICENSE" } table form was not used; relying on the classifier is the option (3) variant called out in the recommendation. Verified by running python -m pip wheel . --no-deps -w ./.test-wheel-output from clients/python: the build now succeeds and emits mxaccess_gateway_client-0.1.0-py3-none-any.whl (47 KB) where previously it failed with the project.license must be valid exactly by one definition ValueError. The CI / pre-commit recommendation is addressed by Client.Python-020.

Client.Python-019

Field	Value
Severity	Low
Category	Code organization & conventions
Location	`clients/python/pyproject.toml:60-61`, `clients/python/src/mxgateway_cli/`
Status	Resolved

Description: Client.Python-017 added the PEP 561 marker file clients/python/src/mxgateway/py.typed and declared it in [tool.setuptools.package-data] mxgateway = ["py.typed"]. The wheel therefore advertises mxgateway as typed. However the same wheel also ships the mxgateway_cli package (setuptools.packages.find with where = ["src"] discovers both mxgateway and mxgateway_cli, confirmed via find_packages in this review), and mxgateway_cli:

is shipped in the wheel and is the package the mxgw-py console script entry point resolves into ([project.scripts] mxgw-py = "mxgateway_cli.commands:main"),
is fully type-annotated (every function in commands.py has full parameter and return annotations; from __future__ import annotations is in effect),
but has no py.typed file and is not listed in [tool.setuptools.package-data].

PEP 561 requires the marker file inside each importable package the distribution wants to expose to type checkers — the mxgateway marker does not transfer to mxgateway_cli. A downstream consumer that imports or composes against mxgateway_cli (e.g. wrapping it as a programmatic CLI library) will see all symbols as Untyped under mypy despite the hints being present in source.

This is a follow-up to Client.Python-017 — the fix is small and pure packaging.

Recommendation: Create clients/python/src/mxgateway_cli/py.typed (empty file, as PEP 561 requires) and extend the existing package-data declaration so the wheel ships it:

[tool.setuptools.package-data]
mxgateway = ["py.typed"]
mxgateway_cli = ["py.typed"]

No source change in either package; verify by building a wheel (once Client.Python-018 is fixed) and inspecting that both mxgateway/py.typed and mxgateway_cli/py.typed appear in the wheel contents.

Resolution: 2026-05-20 — Created the empty PEP 561 marker file clients/python/src/mxgateway_cli/py.typed and added mxgateway_cli = ["py.typed"] under [tool.setuptools.package-data] in clients/python/pyproject.toml alongside the existing mxgateway = ["py.typed"] line. Verified by inspecting the built wheel (mxaccess_gateway_client-0.1.0-py3-none-any.whl): the archive now contains both mxgateway/py.typed and mxgateway_cli/py.typed, so downstream mypy consumers see the inline type hints in both packages. Pure packaging change — no source modifications.

Client.Python-020

Field	Value
Severity	Low
Category	Testing coverage
Location	`clients/python/tests/`, `scripts/`
Status	Resolved

Description: Client.Python-018 is invisible to the existing test suite: python -m pytest passes because [tool.pytest.ini_options] pythonpath = ["src"] lets pytest import the package without going through setuptools.build_meta. None of the 91 tests build the wheel, do an editable install, or otherwise exercise the setuptools.build_meta configuration validator. As a result, a pyproject.toml regression that breaks pip install -e . / pip wheel . — the exact commands documented in the Python client README and CLAUDE.md — passes the test suite green. The other language clients have parallel coverage gaps (no CI-level "the package installs" smoke test for Python in scripts/run-client-e2e-tests.ps1, which only runs the live e2e matrix and assumes the editable install already worked), but Python is the only one whose published install command is currently broken.

Recommendation: Add a thin pytest module (e.g. tests/test_packaging.py) that runs

import subprocess, sys, pathlib
def test_pyproject_validates_against_setuptools_build_meta():
    here = pathlib.Path(__file__).resolve().parent.parent
    result = subprocess.run(
        [sys.executable, "-m", "pip", "wheel", ".",
         "--no-deps", "--no-build-isolation",
         "--wheel-dir", str(tmp_path)],
        cwd=here, capture_output=True, text=True,
    )
    assert result.returncode == 0, result.stderr

(or any equivalent that invokes setuptools.config.pyprojecttoml.read_configuration). Marker the test with @pytest.mark.slow if the wheel build is too heavy for the default suite, and document the test in the README. Alternatively add a CI step to scripts/run-client-e2e-tests.ps1 (or a new scripts/check-python-package.ps1) that fails the build when the wheel build fails. Either approach would have surfaced Client.Python-018 at commit time.

Resolution: 2026-05-20 — Added clients/python/tests/test_packaging.py::test_pip_wheel_build_succeeds. The test invokes python -m pip wheel . --no-deps --wheel-dir <tmp> against the package root via subprocess and asserts (a) exit code zero and (b) an mxaccess_gateway_client-*.whl file is produced in the temp directory, capturing stdout/stderr in the assertion message on failure so any future PEP 639 / SPDX violation or other setuptools.build_meta configuration error is reported with the build backend's own error text. Verified the test would have caught Client.Python-018: with the old license = "Proprietary" string in place the test fails with the project.license must be valid exactly by one definition ValueError. The pytest module is the simpler half of the recommendation; no PowerShell wrapper script was added since pytest already runs in the same python -m pytest invocation the README documents. Test suite is now 92 tests (was 91), all passing.

Client.Python-021

Field	Value
Severity	Low
Category	Documentation & comments
Location	`clients/python/src/mxgateway_cli/commands.py`, `clients/python/README.md:235-258`
Status	Resolved

Description: Cross-client CLI parity check (one of the things the review prompt asks for): the mxgw-py CLI subcommand set has drifted from every other client CLI in the matrix.

Subcommand inventory at this commit:

Subcommand	.NET (`mxgw`)	Go (`mxgw-go`)	Rust (`mxgw`)	Java (`mxgw-java`)	Python (`mxgw-py`)
`version`	yes	yes	yes	yes	yes
`ping`	yes	(no)	yes	(no)	yes
`open-session` / `close-session`	yes	yes	yes	yes	yes
`register` / `add-item` / `advise`	yes	yes	yes	yes	yes
`subscribe-bulk` / `unsubscribe-bulk` / `read-bulk`	yes	yes	yes	yes	yes
`write-bulk` / `write2-bulk` / `write-secured-bulk` / `write-secured2-bulk`	yes	yes	yes	yes	yes
`write` / `write2`	yes / (varies)	yes / (no)	yes / yes	yes / (no)	yes / yes
`stream-events`	yes	yes	yes	yes	yes
`smoke`	yes	yes	yes	yes	yes
`bench-read-bulk`	yes	yes	yes	yes	yes
`bench-stream-events`	yes	(no)	(no)	(no)	(no)
`galaxy-test-connection` (or alias)	yes	yes	yes	yes	(no)
`galaxy-last-deploy` / `galaxy-deploy-time`	yes	yes	yes	yes	(no)
`galaxy-discover`	yes	yes	yes	yes	(no)
`galaxy-watch`	yes	yes	yes	yes	(no)

Two parity gaps remain after Client.Python-013/017:

The Python CLI ships no Galaxy subcommands at all even though the GalaxyRepositoryClient library wrapper is fully implemented and exercised by tests/test_galaxy.py / tests/test_galaxy_iter_hierarchy.py. The README acknowledges the watch-deploy-events gap inline ("The CLI does not currently expose a streaming watch-deploy-events subcommand — use the library API directly when subscribing to deploy events from Python.") but does not call out that the other three Galaxy subcommands are also missing — and the .NET / Go / Rust / Java CLIs all expose them. A user running the cross-language smoke matrix who expects Python to behave like the other clients sees a silent "command not found" on mxgw-py galaxy-test-connection.
The new bench-stream-events subcommand (added to the .NET CLI in the previous commit 1cd51bb) is .NET-only today; the Python CLI is consistent with Go / Rust / Java on this point. Worth flagging as a forward-looking parity gap that will need filling if the cross-language benchmark matrix grows a stream-events driver in scripts/.

Severity is Low because the existing scripts/bench-read-bulk.ps1 matrix only invokes bench-read-bulk and does not break, and the Python GalaxyRepositoryClient library is fully functional — the gap is purely in the test CLI surface. But cross-client parity is an explicit review check and the gap is not documented.

Recommendation: Either (a) add galaxy-test-connection, galaxy-last-deploy, galaxy-discover, and galaxy-watch subcommands to mxgateway_cli/commands.py (each is a thin wrapper over GalaxyRepositoryClient, mirroring the existing four-language implementation), or (b) update clients/python/README.md's "CLI" section with an explicit "CLI parity gaps" subsection that lists the missing subcommands and recommends the library API. Option (a) is preferable for cross-language matrix testing. Also document the bench-stream-events gap symmetrically once a cross-language stream benchmark driver is added under scripts/.

Resolution: 2026-05-20 — Scoped this finding to a documentation-only fix; the full Galaxy CLI parity implementation (four new subcommands wired to GalaxyRepositoryClient) is a larger piece of work and will be tracked as a separate follow-up finding. Added a new "CLI Parity Gaps" subsection to clients/python/README.md immediately under the existing CLI section that explicitly enumerates the four missing mxgw-py Galaxy subcommands (galaxy-test-connection, galaxy-last-deploy, galaxy-discover, galaxy-watch), names the sibling CLIs that already expose them (.NET mxgw, Go mxgw-go, Rust mxgw, Java mxgw-java), points readers at the library API (GalaxyRepositoryClient, already documented under "Galaxy Repository Browse") as the supported Python entry point in the interim, and also flags the .NET-only bench-stream-events gap so the cross-language benchmark matrix has a record of the asymmetry. No CLI source change; the implementation of the four Galaxy subcommands is deferred. Resolved as a doc note rather than a full parity fix.

50 KiB Raw Blame History