Code-review 2026-05-20 sweep: re-review at 1cd51bb, resolve 72 findings across all 11 modules

Re-reviewed every module/client against the 10-category checklist (REVIEW-PROCESS.md) at commit 1cd51bb, filed 72 new findings, and fixed them in three priority waves (3 High, 17 Medium, 52 Low). Highs - Server-017: enumerate AcknowledgeAlarm / QueryActiveAlarms in GatewayGrpcScopeResolver so non-admin keys can use them; document the mapping in docs/Authorization.md; add interceptor tests. - Client.Java-013: add the five missing bulk-method stubs to the CLI FakeSession so the test module compiles on a clean tree. - Client.Rust-013: fix the clippy::doc_lazy_continuation regression in generated tonic code by reformatting the ReadBulkCommand proto comment and scoping a #![allow(...)] to the generated submodules. Mediums (highlights) - Server: unify GatewaySession state-lock discipline (-015) and make DisposeAsync race-safe against in-flight CloseAsync (-016); add constraint-enforcement test coverage for the bulk-plan path (-021). - Worker: introduce StaRuntimeShutdownException so RunAlarmPollLoop can distinguish graceful shutdown from a real STA-affinity violation (-016); have the watchdog skip StaHung while CurrentCommandCorrelationId is non-empty so a legitimate slow ReadBulk no longer self-faults (-017). - Tests: add per-method round-trip + cancellation coverage for the 11 GatewaySession bulk methods (-013); replace the real TCP probe in GalaxyHierarchyCacheTests with an IGalaxyRepository fake (-016). - IntegrationTests: drive the StreamEvents writer in the live Write test and assert OnWriteComplete (-012); add live tests for Unadvise/RemoveItem/Unregister ordering, WriteSecured, and abnormal worker exit (-014). - Worker.Tests: replace MxAccessSession reflection with an internal CreateForTesting factory (-016); cover WorkerCancel and unexpected-body envelope branches (-017). - Client.Java: cancel MxEventStream when close() races beforeStart() (-014); return a CancellingCompletableFuture that actually forwards cancellation through .thenApply chains (-015). - Client.Python: drop the silent localhost-plaintext downgrade in the CLI; require explicit --plaintext (-013). - Client.Rust: stop bench-read-bulk from polluting success-latency histograms with failed-call durations (-015); add coverage for the five MalformedReply paths, the bulk-write helpers, the Error::Unavailable mapping, and the unary-fault path (-016). - Contracts: extend docs/Contracts.md with the bulk read/write command family (-009). Lows (highlights) - Server: cap GalaxyGlobMatcher.RegexCache; align WorkerAlarmRpcDispatcher missing-session handling; drop the duplicate dashboard @page routes; refresh IAlarmRpcDispatcher XML doc. - Worker: surface SetXmlAlarmQuery COM failures; remove dead subscriptionExpression / ExecutingCommand arms; preserve factory-supplied runtime sessions; split MxAlarmSnapshot.cs into three files. - Tests: dispose the WebApplication in seven test classes; rebuild FakeWorkerProcess.WaitForExitAsync against a real TaskCompletion source; switch the heartbeat-expires test to ManualTimeProvider; add InvariantCulture to the remaining DateTimeOffset.Parse sites; document GalaxyFilterInputSafetyTests in GatewayTesting.md. - IntegrationTests: comment fixes, RecordingServerStreamWriter IDisposable, class-level [Trait], single-source ZB default connection string. - Worker.Tests: replace silent-return gating with LiveMxAccessFact so absent env vars SKIP not pass; PascalCase rename of probe [Fact]s; deterministic deadline test; new frame-protocol error tests; ComputeTransitions diff-coverage; relocate dev-rig probes to Probes/. - Contracts: add round-trip coverage and per-field redaction / Galaxy-identifier comments to the protos. - Client.Dotnet: introduce clients/dotnet/Directory.Build.props so TreatWarningsAsErrors / analysers apply; document DiscoverHierarchyOptions and IMxGatewayCliClient; require typed bulk-read handles in CLI; surface AcknowledgeAlarm transport faults through Translate(). - Client.Go: kill dead code in alarms_test / fakeGalaxyServer / runWriteBulkVariant; document the six new subcommands in writeUsage; drain galaxy-watch events on limit; switch io.EOF comparisons to errors.Is. - Client.Java: shared shutdown helpers + new shutdownTimeout option; regex-based credential redaction; Long.toUnsignedString for uint64 sequence; doc fixes. - Client.Python: combine duplicate imports; add coverage for _percentile / bench-read-bulk / MAX_AGGREGATE_EVENTS / _api_key_from_env; populate pyproject metadata and ship py.typed. - Client.Rust: expose next_correlation_id() so CLI ping/close stop hard-coding correlation IDs; resync RustClientDesign.md with the current Session / Error surface and CLI subcommand set. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-20 09:46:47 -04:00
parent 1cd51bbda3
commit a0203503a7
122 changed files with 8723 additions and 757 deletions
@@ -4,25 +4,29 @@
 |---|---|
 | Module | `clients/python` |
 | Reviewer | Claude Code |
-| Review date | 2026-05-18 |
-| Commit reviewed | `3cc53a8` |
+| Review date | 2026-05-20 |
+| Commit reviewed | `1cd51bb` |
 | Status | Reviewed |
 | Open findings | 0 |

 ## Checklist coverage

+A re-review at commit `1cd51bb` over the same module. Prior findings
+(Client.Python-001 — Client.Python-012) remain closed and are kept as
+history. This section reflects categories evaluated in this pass.
+
 | # | Category | Result |
 |---|---|---|
-| 1 | Correctness & logic bugs | Issues found: dead `closed` variable (Client.Python-004); float/bytes value-mapping assumptions (Client.Python-008). |
-| 2 | mxaccessgw conventions | Largely adheres; one missing export and a `*_raw` MXAccess-failure documentation gap (Client.Python-002, Client.Python-012). |
-| 3 | Concurrency & thread safety | Issue found: `close()` idempotency claim does not hold under concurrent close (Client.Python-006). |
-| 4 | Error handling & resilience | Issues found: inconsistent timeout-kwarg fallback (Client.Python-003); `success == 0` default-value hazard (Client.Python-011); inconsistent cancel helpers (Client.Python-007). |
-| 5 | Security | No issues found — API keys redacted in repr and CLI output, TLS supported, no secret logging. |
-| 6 | Performance & resource management | Issue found: `discover_hierarchy` buffers the whole hierarchy in memory (Client.Python-005). |
-| 7 | Design-document adherence | Matches the design docs closely; minor CLI doc drift (Client.Python-001). |
-| 8 | Code organization & conventions | Issues found: `MxGatewayCommandError` omitted from `__all__` (Client.Python-002); fragile circular-import workaround (Client.Python-010). |
-| 9 | Testing coverage | Issue found: `write2`, `add_item2`, bulk-size limits, TLS `ca_file`, and CLI command bodies untested (Client.Python-009). |
-| 10 | Documentation & comments | Issue found: stale "scaffold" package description (Client.Python-001). |
+| 1 | Correctness & logic bugs | Issue found: `_use_plaintext` silently downgrades any `localhost:` / `127.0.0.1:` endpoint to plaintext (Client.Python-013). |
+| 2 | mxaccessgw conventions | No new issues found — secrets redacted, MXAccess parity preserved, generated code untouched, no Blazor/COM violations apply (Python client). |
+| 3 | Concurrency & thread safety | No new issues found — close-idempotency hazard fixed in Client.Python-006, shared `_canceling_iterator` cancels on `CancelledError`. |
+| 4 | Error handling & resilience | No new issues found at this commit (prior 003, 007, 011 remain closed). |
+| 5 | Security | Issue found: implicit plaintext-on-localhost (Client.Python-013) means a user explicitly listing a TLS-fronted loopback endpoint with `--api-key` but without `--tls`/`--plaintext` silently transmits the bearer token in cleartext. |
+| 6 | Performance & resource management | No new issues found — `iter_hierarchy` streams pages lazily (Client.Python-005 resolution). |
+| 7 | Design-document adherence | No new issues found — `PythonClientDesign.md` matches the implemented surface. |
+| 8 | Code organization & conventions | Issue found: duplicate `from mxgateway.values import` lines in `commands.py:22-23` (Client.Python-014). |
+| 9 | Testing coverage | Issues found: `bench_read_bulk` CLI body, `MAX_AGGREGATE_EVENTS` event-cap, and `_use_plaintext` localhost-auto-plaintext path are untested (Client.Python-015, Client.Python-016). |
+| 10 | Documentation & comments | Issues found: `pyproject.toml` lacks PyPI metadata (`authors`, `license`, `classifiers`, `urls`) and no PEP 561 `py.typed` marker (Client.Python-017); auto-plaintext behaviour is undocumented (Client.Python-013). |

 ## Findings

@@ -205,3 +209,258 @@
 **Recommendation:** Document explicitly (README + docstring) that `*_raw` methods surface MXAccess HRESULT/status failures only inside the reply and do not raise `MxAccessError`, so parity-test callers know to inspect `protocol_status`/`hresult`/`statuses` themselves.

 **Resolution:** 2026-05-18 — Won't Fix (no behaviour change). Confirmed this is intentional, correct parity behaviour: the `*_raw` methods exist precisely so parity-test callers can inspect an unmodified gateway reply, including embedded MXAccess HRESULT/status failures, without an exception masking them. Changing `invoke_raw` to raise `MxAccessError` would defeat its purpose and duplicate `Session.invoke`. The finding's only actionable point is the documentation gap, which has been addressed: `clients/python/README.md` now states explicitly that `*_raw` methods enforce gateway protocol success only and do **not** run MXAccess-failure detection, and the docstrings of `GatewayClient.invoke_raw` and `Session.invoke_raw` say the same and point callers to inspect `protocol_status`/`hresult`/`statuses` (and to `Session.invoke` for the checked variant). No code/test change — the runtime contract is unchanged and correct.
+
+### Client.Python-013
+
+| Field | Value |
+|---|---|
+| Severity | Medium |
+| Category | Security |
+| Location | `clients/python/src/mxgateway_cli/commands.py:757-762` |
+| Status | Resolved |
+
+**Description:** `_use_plaintext` silently returns `True` whenever the endpoint
+string starts with `localhost:` or `127.0.0.1:`, even if neither `--plaintext`
+nor `--tls` is supplied on the command line. Any CLI subcommand (e.g.
+`mxgw-py open-session --endpoint localhost:5001 --api-key mxgw_<secret>`) then
+attaches the API key to a plaintext gRPC channel without warning. This is a
+silent security downgrade: a user who deliberately ran the gateway behind TLS
+on loopback (e.g. for testing a production-shaped TLS config locally) and who
+passes `--api-key` expecting the secret to be transport-protected gets a
+plaintext bearer token instead. The auto-downgrade is also undocumented —
+`README.md` and the CLI `--help` text both describe `--plaintext` and `--tls`
+as the controls, with no mention that endpoint-prefix matching can override
+either. The other client CLIs do not auto-downgrade: the .NET CLI uses
+`https://`-prefix detection on a URI scheme (an explicit signal), Go and Java
+require an explicit `--plaintext`/`--tls` choice, and Rust defaults to
+plaintext only when `plaintext = true` is set on the options struct.
+
+**Recommendation:** Drop the localhost-prefix auto-plaintext branch and
+require the user to pass `--plaintext` or `--tls` (or default to TLS to match
+the rest of the matrix). If the implicit-localhost behaviour is kept for
+ergonomics, document it prominently in both `README.md` and `--help`, emit a
+stderr warning when `--api-key` is combined with the auto-downgrade path, and
+add a CLI test asserting the auto-downgrade is in fact active so it is not
+silently lost in a future refactor.
+
+**Resolution:** 2026-05-20 — Removed the silent `localhost:` / `127.0.0.1:`
+auto-plaintext branch from `_use_plaintext`. The new contract matches the Go
+and Java CLIs: **TLS is the default**, `--plaintext` is the only way to opt
+in to an unencrypted channel, and `--tls` is accepted as a redundant, explicit
+affirmation of the default (mutually exclusive with `--plaintext`, which now
+raises `click.UsageError`). The `--plaintext` / `--tls` `--help` text and
+`clients/python/README.md` both call out the new behaviour. Added six
+regression tests in `clients/python/tests/test_cli.py` covering: (a) a
+`localhost:` endpoint with no flags resolves to TLS, (b) a `127.0.0.1:`
+endpoint with no flags resolves to TLS, (c) `--plaintext` opts in to plaintext,
+(d) `--tls` is accepted and idempotent with the default, (e) `--plaintext`
+combined with `--tls` is rejected, and (f) an end-to-end CliRunner test
+asserting `ClientOptions.plaintext == False` flows through to
+`GatewayClient.connect` when no flag is supplied against a `localhost:`
+endpoint. **Behaviour change for callers:** scripts that previously relied on
+`mxgw-py … --endpoint localhost:5000 …` selecting plaintext silently must now
+add an explicit `--plaintext` flag (or set up TLS on the gateway). Calling
+`mxgw-py` with an `--api-key` against a plaintext-only gateway without
+`--plaintext` will now fail to connect rather than silently leaking the bearer
+token.
+
+### Client.Python-014
+
+| Field | Value |
+|---|---|
+| Severity | Low |
+| Category | Code organization & conventions |
+| Location | `clients/python/src/mxgateway_cli/commands.py:22-23` |
+| Status | Resolved |
+
+**Description:** `commands.py` has two consecutive `from mxgateway.values
+import` lines:
+
+```python
+from mxgateway.values import to_mx_value
+from mxgateway.values import MxValueInput
+```
+
+These import from the same module and should be combined into a single
+`from mxgateway.values import MxValueInput, to_mx_value`. The split form is
+inconsistent with the rest of the file (every other module is imported in a
+single statement) and would be flagged by `ruff`/`isort` if any linter were
+configured. Pure style, no behavioural impact.
+
+**Recommendation:** Collapse the two imports into one statement, ordered to
+match the conventional alphabetical-within-module pattern:
+`from mxgateway.values import MxValueInput, to_mx_value`.
+
+**Resolution:** 2026-05-20 — Collapsed the two consecutive
+`from mxgateway.values import to_mx_value` / `from mxgateway.values import MxValueInput`
+lines in `clients/python/src/mxgateway_cli/commands.py` into a single
+`from mxgateway.values import MxValueInput, to_mx_value` statement, matching
+the alphabetical-within-module pattern used elsewhere in the file. Pure style
+fix — no behavioural impact, covered by the existing CLI tests.
+
+### Client.Python-015
+
+| Field | Value |
+|---|---|
+| Severity | Low |
+| Category | Testing coverage |
+| Location | `clients/python/src/mxgateway_cli/commands.py:273-294,564-647`, `clients/python/tests/` |
+| Status | Resolved |
+
+**Description:** `_bench_read_bulk` is a ~80-line CLI body that opens its own
+session, registers, subscribe_bulks, runs a warm-up loop, a measurement loop,
+collects per-call latencies, computes a percentile summary, and emits the
+shared cross-language JSON schema. It is the largest untested CLI command in
+the module — `tests/` has no `bench_read_bulk` test, fake-stub-driven or
+otherwise. A drift in the schema field names (`callsPerSecond`,
+`cachedReadResults`, `latencyMs.p50`, …) would break the cross-language
+`scripts/bench-read-bulk.ps1` aggregation silently. `_percentile_summary` and
+`_percentile` are also untested — the boundary cases (`n == 0`, `n == 1`,
+quantile interpolation) would benefit from a small unit test since the
+identical algorithm is duplicated in the .NET / Go / Rust / Java drivers and
+a divergence would corrupt cross-language comparisons.
+
+**Recommendation:** Add a fake-stub-driven `bench_read_bulk` test that drives
+a short `--duration-seconds 0 --warmup-seconds 0` run through `CliRunner` and
+asserts the JSON schema (`language == "python"`, the full key set,
+`latencyMs.p50/p95/p99/max/mean` present). Add unit tests for `_percentile`
+covering `n == 0`, `n == 1`, and a known-good interpolated value at p95 so
+the implementation cannot silently drift from the other clients.
+
+**Resolution:** 2026-05-20 — Added `clients/python/tests/test_cli_bench_and_helpers.py`
+with three layers of coverage. (1) `_percentile` unit tests pin the
+cross-language algorithm (`rank = q * (n - 1)`, linear interpolation between
+adjacent ranks): empty sample returns `0.0`, single element returns that
+element, exact-rank queries return the sample value (p50 of `[10,20,30,40,50]`
+is `30.0`), and the interpolated p95/p99 values (`48.0` / `49.6` for that same
+five-element sample) are locked down so any drift from the .NET / Go / Rust /
+Java drivers fails fast. (2) `_percentile_summary` tests assert the full
+`{p50, p95, p99, max, mean}` dict shape, the zero-sample placeholder, and the
+3-decimal rounding contract. (3) A `bench-read-bulk` smoke test
+(`test_bench_read_bulk_emits_cross_language_schema`) drives the CLI through
+`CliRunner` with `--duration-seconds 0 --warmup-seconds 0` against a fake stub
+that handles `OpenSession`, `Register`, `SubscribeBulk`, `ReadBulk`, and
+`UnsubscribeBulk`, then asserts the emitted JSON has exactly the 16
+cross-language schema keys (`language`, `command`, `endpoint`, `clientName`,
+`bulkSize`, `durationSeconds`, `warmupSeconds`, `durationMs`, `tags`,
+`totalCalls`, `successfulCalls`, `failedCalls`, `totalReadResults`,
+`cachedReadResults`, `callsPerSecond`, `latencyMs`) and that `latencyMs` is a
+`{p50, p95, p99, max, mean}` sub-object — guarding against silent breakage of
+`scripts/bench-read-bulk.ps1`'s cross-language aggregation. No source change —
+this is a pure coverage finding.
+
+### Client.Python-016
+
+| Field | Value |
+|---|---|
+| Severity | Low |
+| Category | Testing coverage |
+| Location | `clients/python/src/mxgateway_cli/commands.py:25,757-775,805-830` |
+| Status | Resolved |
+
+**Description:** Three CLI helper paths are not covered by `tests/`:
+
+1. `_use_plaintext` localhost auto-downgrade (line 762) — the
+   `endpoint.startswith("localhost:") or endpoint.startswith("127.0.0.1:")`
+   branch (see also Client.Python-013) is untested; no test asserts that an
+   endpoint without `--plaintext` and without `--tls` resolves to plaintext.
+2. `_collect_events` `MAX_AGGREGATE_EVENTS` guard (line 811-815) — passing
+   `--max-events` greater than `MAX_AGGREGATE_EVENTS` raises
+   `click.BadParameter`, but no test exercises the guard. A silent removal of
+   the constant or the comparison would not be caught.
+3. `_api_key_from_env` (line 765-768) — only the implicit path through
+   `_secrets` is exercised; there is no test that verifies an env-var name
+   resolves to a value and that an unset env var produces `None`.
+
+These are all small, fake-stub-driven CLI behaviours rather than end-to-end
+paths. The previous coverage finding (Client.Python-009) closed without
+adding tests for these specific paths.
+
+**Recommendation:** Add three small `CliRunner` / unit tests: one asserting
+the localhost auto-plaintext (or its replacement, if Client.Python-013 is
+fixed), one asserting `--max-events 10001` exits non-zero with the
+`MAX_AGGREGATE_EVENTS` error message, and one asserting
+`_api_key_from_env("MXGATEWAY_API_KEY")` returns the env value and `None` for
+an unset variable.
+
+**Resolution:** 2026-05-20 — Scope adjusted: Client.Python-013 has since
+removed the `_use_plaintext` localhost auto-plaintext branch, so item (1) is
+no longer a real code path — the
+`test_use_plaintext_requires_explicit_flag_for_localhost_endpoint` and
+`test_cli_localhost_endpoint_defaults_to_tls_via_open_session` regressions
+added under Client.Python-013 already pin the new TLS-by-default contract.
+The remaining two helpers are now covered in
+`clients/python/tests/test_cli_bench_and_helpers.py`. (2)
+`MAX_AGGREGATE_EVENTS` cap:
+`test_collect_events_rejects_max_events_above_aggregate_cap` drives
+`stream-events` with `--max-events 10001` through `CliRunner` against
+stubbed `_connect` / `_session` fakes and asserts the CLI exits non-zero with
+the documented `less than or equal to 10000` message;
+`test_collect_events_accepts_max_events_at_aggregate_cap_boundary` confirms
+`--max-events 10000` is accepted at the boundary and returns an empty event
+list. (3) `_api_key_from_env`:
+`test_api_key_from_env_resolves_value_when_variable_is_set` (env-var
+populated → returned),
+`test_api_key_from_env_returns_none_when_variable_is_unset` (env-var unset
+→ `None`), `test_api_key_from_env_returns_none_when_name_is_none` (the
+`name is None` early-return), and
+`test_api_key_from_env_returns_none_when_name_is_empty_string` (the
+`if not name` truthiness guard). No source change — pure coverage finding.
+
+### Client.Python-017
+
+| Field | Value |
+|---|---|
+| Severity | Low |
+| Category | Documentation & comments |
+| Location | `clients/python/pyproject.toml:5-25`, `clients/python/src/mxgateway/` |
+| Status | Resolved |
+
+**Description:** The package metadata in `pyproject.toml` is minimal for a
+published wheel:
+
+* No `authors` field. PyPI / `pip show` will display no author.
+* No `license` field, no `license-files` field, and no `LICENSE` file is
+  referenced from the project. The repo as a whole has no top-level
+  `LICENSE` either, but other client packages (Java has a license entry, the
+  .NET package has a license expression in the `csproj`) tend to set this.
+* No `classifiers` (no `Programming Language :: Python :: 3.12`,
+  `Operating System :: Microsoft :: Windows`, `Topic :: …`, no
+  development-status classifier). Without these the PyPI search facets are
+  empty and tooling like `pip` cannot tell whether the package is
+  alpha/beta/stable.
+* No `keywords`, no `[project.urls]` (no homepage / source / issue link
+  pointing back to the repo).
+* The package ships no PEP 561 `py.typed` marker file in
+  `src/mxgateway/`. Type hints are written throughout the module
+  (`from __future__ import annotations`, full annotations on every public
+  function), but downstream consumers running `mypy` on `mxaccess-gateway-client`
+  will not see those hints — PEP 561 requires the marker file to opt the
+  package into type-stub distribution.
+
+**Recommendation:** Add `authors`, `license = "<spdx>"`, `keywords`, and
+`[project.urls]` to `pyproject.toml`; add at least the standard `classifiers`
+trio (`Development Status`, `Programming Language :: Python :: 3.12`,
+`Intended Audience`); create an empty `src/mxgateway/py.typed` file and
+include it in the wheel via `[tool.setuptools.package-data]` so consumers
+running `mypy` against an installed wheel pick up the type information.
+
+**Resolution:** 2026-05-20 — Filled out `clients/python/pyproject.toml`
+with the missing PyPI metadata: `authors = [{ name = "MXAccess Gateway
+Authors" }]`, `license = "Proprietary"` (the repo has no top-level
+`LICENSE` file and no other client publishes under an OSS licence, so the
+SPDX `Proprietary` expression matches the de-facto status), the standard
+classifier set (`Development Status :: 4 - Beta`, `Intended Audience ::
+Developers` / `Information Technology`, `Operating System :: Microsoft ::
+Windows` and `:: POSIX`, `Programming Language :: Python` /
+`Python :: 3` / `Python :: 3.12`, `Topic :: Software Development ::
+Libraries :: Python Modules`, `Topic :: System :: Distributed Computing`,
+and `Typing :: Typed`), a `keywords` list
+(`mxaccess`, `archestra`, `gateway`, `grpc`, `industrial`, `scada`), and
+`[project.urls]` with `Homepage` / `Source` / `Issues` pointing at the
+Gitea repo. Added the PEP 561 marker file
+`clients/python/src/mxgateway/py.typed` (empty, as the spec requires) and
+declared it in `[tool.setuptools.package-data] mxgateway = ["py.typed"]`
+so the wheel ships the marker and downstream `mypy` users see the
+inline type hints. Pure metadata / packaging change — `python -m pytest -q`
+still passes (91 tests).