Review the clients/ language clients; mark Server-001/003 resolved

Adds per-module code reviews for the five language clients under
clients/ (Client.Dotnet, Client.Go, Client.Java, Client.Python,
Client.Rust) at commit 3cc53a8 — 53 findings (4 High, 15 Medium,
34 Low; all Open). Extends REVIEW-PROCESS.md so a "module" may also be
a language client under clients/, not only a src/ project.

Marks Server-001 (Critical) and Server-003 (High) Resolved — fixed in
a8aafdf — and regenerates code-reviews/README.md (now 11 modules).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
Joseph Doherty
2026-05-18 16:51:00 -04:00
parent a8aafdf974
commit f0a4af62b9
8 changed files with 1006 additions and 15 deletions
+8 -6
View File
@@ -3,15 +3,17 @@
This document describes how to perform a comprehensive, per-module code review of
the `mxaccessgw` codebase and how to track findings to resolution.
A **module** is one buildable project under `src/` (e.g. `src/MxGateway.Worker`).
Each module has its own folder under `code-reviews/` containing a single
`findings.md`.
A **module** is one buildable project under `src/` (e.g. `src/MxGateway.Worker`)
or one language client under `clients/` (e.g. `clients/rust`). Each module has
its own folder under `code-reviews/` containing a single `findings.md`.
## 1. Before you start
1. Pick the module to review. Its folder is `code-reviews/<Module>/` where
`<Module>` is the project name with the `MxGateway.` prefix stripped — so
`src/MxGateway.Server` is reviewed in `code-reviews/Server/`.
1. Pick the module to review. Its folder is `code-reviews/<Module>/`:
- For a `src/` project, `<Module>` is the project name with the `MxGateway.`
prefix stripped — `src/MxGateway.Server` is reviewed in `code-reviews/Server/`.
- For a language client, `<Module>` is `Client.<Lang>``clients/rust` is
reviewed in `code-reviews/Client.Rust/`.
2. Identify the design context for the module:
- `gateway.md` — top-level architecture, command/event surface, IPC envelope,
STA thread model, fault handling.
+147
View File
@@ -0,0 +1,147 @@
# Code Review — Client.Dotnet
| Field | Value |
|---|---|
| Module | `clients/dotnet` |
| Reviewer | Claude Code |
| Review date | 2026-05-18 |
| Commit reviewed | `3cc53a8` |
| Status | Reviewed |
| Open findings | 8 |
## Checklist coverage
| # | Category | Result |
|---|---|---|
| 1 | Correctness & logic bugs | Minor: handle-selector fallback `?? reply.ReturnValue.Int32Value` can mask a missing typed reply (Client.Dotnet-005); CLI redactor misses env-var keys (Client.Dotnet-008). |
| 2 | mxaccessgw conventions | Good — consumes the shared contracts project, no forked proto, `authorization: Bearer` metadata correct, parity preserved via split `EnsureProtocolSuccess`/`EnsureMxAccessSuccess`. |
| 3 | Concurrency & thread safety | Issue found: `_disposed` flags unsynchronized; `MxGatewaySession.DisposeAsync` can race a concurrent `CloseAsync` (Client.Dotnet-003). |
| 4 | Error handling & resilience | Issues found: gRPC-to-native mapping collapses non-auth statuses into one untyped exception (Client.Dotnet-001); shared retry/timeout budget (Client.Dotnet-004). |
| 5 | Security | Good — API key never logged by the library, CLI redacts keys, TLS custom-root validation correct. |
| 6 | Performance & resource management | No issues found — channels and streaming calls disposed correctly. |
| 7 | Design-document adherence | No issues found — matches `ClientLibrariesDesign.md`. |
| 8 | Code organization & conventions | Issue found: undocumented public members (Client.Dotnet-006). |
| 9 | Testing coverage | Issue found: the production retry path is never exercised (Client.Dotnet-002). |
| 10 | Documentation & comments | Issue found: doc misstates the unary timeout retry budget as per-call (Client.Dotnet-004, Client.Dotnet-007). |
## Findings
### Client.Dotnet-001
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Error handling & resilience |
| Location | `clients/dotnet/MxGateway.Client/GrpcMxGatewayClientTransport.cs:190-199`, `clients/dotnet/MxGateway.Client/GrpcGalaxyRepositoryClientTransport.cs:131-140` |
| Status | Open |
**Description:** `MapRpcException` only produces typed exceptions for `Unauthenticated` and `PermissionDenied`. Every other gRPC status — `NotFound`, `InvalidArgument`, `ResourceExhausted`, `FailedPrecondition`, `Unavailable`, `Internal` — collapses into the base `MxGatewayException` with no surfaced `StatusCode`. Callers cannot programmatically distinguish a transient outage from a permanent bad-argument error without reflecting into `InnerException` and downcasting to `RpcException`.
**Recommendation:** Carry the gRPC `StatusCode` on `MxGatewayException` (e.g. a `StatusCode` property) and/or add typed subclasses for at least `NotFound`, `InvalidArgument`, and `Unavailable`. Populate it from `exception.StatusCode` in `MapRpcException`.
**Resolution:** _(open)_
### Client.Dotnet-002
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Testing coverage |
| Location | `clients/dotnet/MxGateway.Client.Tests/FakeGatewayTransport.cs:145-148`, `clients/dotnet/MxGateway.Client.Tests/MxGatewayClientSessionTests.cs:236-256` |
| Status | Open |
**Description:** The retry predicate `MxGatewayClientRetryPolicy.IsTransientGrpcFailure` handles two shapes: a raw `RpcException` and an `MxGatewayException { InnerException: RpcException }`. In production the transport always maps `RpcException``MxGatewayException` before it reaches the retry pipeline, so only the wrapped-`MxGatewayException` branch ever runs in production. But `FakeGatewayTransport` throws the raw `RpcException` and never maps it, so every retry test exercises only the raw-`RpcException` branch — the branch that never occurs in production. The production retry behaviour is effectively untested.
**Recommendation:** Add a fake/transport mode that maps `RpcException` to `MxGatewayException` the way `GrpcMxGatewayClientTransport` does (or add tests that enqueue a pre-wrapped `MxGatewayException`), so the actually-used predicate branch is covered.
**Resolution:** _(open)_
### Client.Dotnet-003
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Concurrency & thread safety |
| Location | `clients/dotnet/MxGateway.Client/MxGatewaySession.cs:659-663`, `clients/dotnet/MxGateway.Client/MxGatewayClient.cs:230-240` |
| Status | Open |
**Description:** `DisposeAsync` calls `CloseAsync()` (no token) then unconditionally `_closeLock.Dispose()`. If another thread is concurrently awaiting `CloseAsync(token)` — legal, since the type exposes public async methods and no single-threaded contract — disposing the `SemaphoreSlim` while a `WaitAsync` is pending throws `ObjectDisposedException` into that caller. The `_disposed` flags in both clients are also plain unsynchronised `bool` reads/writes; `ThrowIfDisposed` racing `DisposeAsync` can observe a stale value.
**Recommendation:** Either document `MxGatewaySession`/`MxGatewayClient` as not thread-safe for concurrent dispose, or guard `_disposed` with `Interlocked`/`volatile` and avoid disposing `_closeLock` until all in-flight `CloseAsync` calls complete.
**Resolution:** _(open)_
### Client.Dotnet-004
| Field | Value |
|---|---|
| Severity | Low |
| Category | Error handling & resilience |
| Location | `clients/dotnet/MxGateway.Client/MxGatewayClient.cs:283-294`, `clients/dotnet/MxGateway.Client/GalaxyRepositoryClient.cs:392-403` |
| Status | Open |
**Description:** `ExecuteSafeUnaryAsync` wraps the whole Polly retry pipeline in a single linked CTS cancelled after `Options.DefaultCallTimeout`, while `CreateCallOptions` also stamps each individual call with a `DefaultCallTimeout` gRPC deadline. The retry pipeline therefore shares one `DefaultCallTimeout` budget across the initial attempt plus all retries plus backoff delays. The README/XML docs describe `DefaultCallTimeout` as a per-call timeout, which misrepresents this. `DeadlineExceeded` is also classified as transient, so an attempt that exhausts the shared budget is retried only to immediately fail again.
**Recommendation:** Decide whether `DefaultCallTimeout` is per-attempt or per-operation and make code and docs consistent — e.g. a separate per-attempt deadline and a distinct overall-operation timeout. Reconsider retrying on `DeadlineExceeded` when the deadline was client-imposed.
**Resolution:** _(open)_
### Client.Dotnet-005
| Field | Value |
|---|---|
| Severity | Low |
| Category | Correctness & logic bugs |
| Location | `clients/dotnet/MxGateway.Client/MxGatewaySession.cs:82,124,175` |
| Status | Open |
**Description:** `RegisterAsync`/`AddItemAsync`/`AddItem2Async` return `reply.<Typed>?.ServerHandle ?? reply.ReturnValue.Int32Value`. After `EnsureMxAccessSuccess()` passes, a missing typed payload silently falls back to `ReturnValue.Int32Value`, which for a reply carrying no return value is `0`. A caller then uses `0` as a `ServerHandle`/`ItemHandle`, producing a confusing downstream invalid-handle failure rather than a clear "gateway reply missing payload" error.
**Recommendation:** If the typed sub-message is the contract for these commands, treat its absence on an otherwise-successful reply as an error (throw a descriptive `MxGatewayException`) rather than falling through to `ReturnValue.Int32Value`.
**Resolution:** _(open)_
### Client.Dotnet-006
| Field | Value |
|---|---|
| Severity | Low |
| Category | Code organization & conventions |
| Location | `clients/dotnet/MxGateway.Client/MxGatewayClientOptions.cs:50`, `clients/dotnet/MxGateway.Client/MxGatewayClientContractInfo.cs:10-14` |
| Status | Open |
**Description:** `MxGatewayClientOptions.MaxGrpcMessageBytes` and the two `const`s in `MxGatewayClientContractInfo` are public members with no XML doc comments, inconsistent with every other public member in the assembly and with the repo's documented C# style emphasis on a documented public surface.
**Recommendation:** Add `<summary>` doc comments to `MaxGrpcMessageBytes`, `GatewayProtocolVersion`, and `WorkerProtocolVersion`.
**Resolution:** _(open)_
### Client.Dotnet-007
| Field | Value |
|---|---|
| Severity | Low |
| Category | Documentation & comments |
| Location | `clients/dotnet/MxGateway.Client/MxGatewayClient.cs:185-192` |
| Status | Open |
**Description:** The `AcknowledgeAlarmAsync` XML comment states the gateway authenticates against an `invoke:alarm-ack` scope, but `CLAUDE.md` documents the scope set without any `invoke:alarm-ack` sub-scope. The comment may describe an intended finer-grained scope that does not exist, misleading integrators about what API key they need.
**Recommendation:** Reconcile the comment with the actual server-side scope check, or update the scope documentation if sub-scopes were genuinely added; keep client doc and gateway auth model in sync.
**Resolution:** _(open)_
### Client.Dotnet-008
| Field | Value |
|---|---|
| Severity | Low |
| Category | Correctness & logic bugs |
| Location | `clients/dotnet/MxGateway.Client.Cli/MxGatewayCliSecretRedactor.cs:9-17` |
| Status | Open |
**Description:** The CLI redactor only removes the API key string when it was supplied via `--api-key`; `RunCoreAsync` passes `arguments.GetOptional("api-key")` to `Redact`. When the key comes from an environment variable (`--api-key-env`, the documented default path), `apiKey` is `null` and no redaction occurs. If a gRPC/transport error message ever echoes the bearer token, it would be printed unredacted.
**Recommendation:** Resolve the effective API key (same logic as `ResolveApiKey`) before redacting, so the env-var-sourced key is also stripped from error output.
**Resolution:** _(open)_
+177
View File
@@ -0,0 +1,177 @@
# Code Review — Client.Go
| Field | Value |
|---|---|
| Module | `clients/go` |
| Reviewer | Claude Code |
| Review date | 2026-05-18 |
| Commit reviewed | `3cc53a8` |
| Status | Reviewed |
| Open findings | 10 |
## Checklist coverage
| # | Category | Result |
|---|---|---|
| 1 | Correctness & logic bugs | Issues found: a typed-nil `Unwrap`/`errors.As` trap (Client.Go-001), a CLI `panic` on malformed input (Client.Go-003), empty-string correlation id on rand failure (Client.Go-007). |
| 2 | mxaccessgw conventions | Generally good; two test files fail `gofmt`, breaking the documented workflow (Client.Go-004). |
| 3 | Concurrency & thread safety | No issues found — stream goroutines and cancellation are sound. |
| 4 | Error handling & resilience | Issues found: the compatibility event path silently drops events (Client.Go-002); no transient/permanent classification (Client.Go-006). |
| 5 | Security | No issues found — TLS by default with a TLS 1.2 floor, API key redaction, no secret logging. |
| 6 | Performance & resource management | No issues found — connections/streams closed via deferred `Close`/`cancel`. |
| 7 | Design-document adherence | Issues found: deprecated `grpc.DialContext`+`WithBlock` usage and a missing error taxonomy (Client.Go-005, Client.Go-006). |
| 8 | Code organization & conventions | Issue found: duplication between `Client` and `GalaxyClient` (Client.Go-009). |
| 9 | Testing coverage | Issue found: TLS path, `callContext` deadline logic, and `NativeValue`/`NativeArray` edges untested (Client.Go-008). |
| 10 | Documentation & comments | Issue found: a stale `WithBlock` dial-cancellation claim (Client.Go-010). |
## Findings
### Client.Go-001
| Field | Value |
|---|---|
| Severity | High |
| Category | Correctness & logic bugs |
| Location | `clients/go/mxgateway/errors.go:88-93`, `clients/go/mxgateway/errors.go:117-128` |
| Status | Open |
**Description:** `MxAccessError.Unwrap` returns `e.Command` directly. `EnsureMxAccessSuccess` constructs `&MxAccessError{Reply: reply}` with `Command` left nil (the HRESULT / failing-`MxStatusProxy` path). When `Command` is a nil `*CommandError`, `Unwrap()` returns a non-nil `error` interface wrapping a nil pointer. Consequently `errors.As(err, &ce)` for `*CommandError` returns `true` while setting `ce` to nil — a caller writing the idiomatic `if errors.As(err, &commandErr) { use commandErr.Status }` nil-dereferences and panics. Verified empirically; the existing test only exercises the populated-`Command` path.
**Recommendation:** Make `Unwrap` return an untyped nil when `Command` is nil: `if e == nil || e.Command == nil { return nil }; return e.Command`. Add a test for the HRESULT-only `MxAccessError` asserting `errors.As(err, &ce)` is `false`.
**Resolution:** _(open)_
### Client.Go-002
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Error handling & resilience |
| Location | `clients/go/mxgateway/session.go:440-516` |
| Status | Open |
**Description:** For the `Events`/`EventsAfter` compatibility API (`cancelWhenResultBufferFull == true`), when the 16-slot `results` channel is full `sendEventResult` cancels and returns `false`; the goroutine returns and `close(results)` runs — the consumer sees the channel close with **no `EventResult{Err: ...}` ever delivered**. A slow consumer cannot distinguish "stream ended normally" from "events were silently dropped." This contradicts the design doc's "libraries should not reorder, coalesce, or drop events by default", and a test currently pins this lossy behaviour.
**Recommendation:** Before cancelling on a full buffer, deliver a terminal `EventResult` carrying an explicit error (e.g. `ErrEventBufferOverflow`). Document the behaviour on `Session.Events`; steer callers to `SubscribeEvents` (which blocks instead of dropping).
**Resolution:** _(open)_
### Client.Go-003
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Correctness & logic bugs |
| Location | `clients/go/cmd/mxgw-go/main.go:517-532` |
| Status | Open |
**Description:** `parseInt32List` calls `panic(err)` when an `item-handles` token fails to parse as an int32. The CLI is a documented user-facing tool; a typo like `-item-handles 1,foo` crashes the process with an unrecovered panic and stack trace instead of returning a clean error and exit code 2 like every other validation path in `main.go`.
**Recommendation:** Change `parseInt32List` to return `([]int32, error)` and have `runUnsubscribeBulk` propagate the error, matching `parseValue`'s pattern.
**Resolution:** _(open)_
### Client.Go-004
| Field | Value |
|---|---|
| Severity | Low |
| Category | mxaccessgw conventions |
| Location | `clients/go/mxgateway/alarms_test.go:153-154`, `clients/go/mxgateway/galaxy_test.go:58-59` |
| Status | Open |
**Description:** `gofmt -l` flags `alarms_test.go` and `galaxy_test.go` for misaligned struct-literal field padding. The Go client README lists `gofmt` as part of the workflow and the repo enforces style; unformatted committed code breaks `gofmt`-gated checks and CI.
**Recommendation:** Run `gofmt -w mxgateway/alarms_test.go mxgateway/galaxy_test.go`.
**Resolution:** _(open)_
### Client.Go-005
| Field | Value |
|---|---|
| Severity | Low |
| Category | Design-document adherence |
| Location | `clients/go/mxgateway/client.go:64,68`, `clients/go/mxgateway/galaxy.go:83,87` |
| Status | Open |
**Description:** The client uses `grpc.DialContext` with `grpc.WithBlock()`. In current grpc-go both are deprecated in favour of `grpc.NewClient` (lazy connection). `WithBlock` also changes failure semantics: a transient gateway-unavailable at dial time becomes a hard `Dial` error rather than a connection that recovers when the gateway comes up, working against the design doc's resilience intent.
**Recommendation:** Migrate to `grpc.NewClient`; if a fail-fast connect probe is still wanted, do an explicit readiness wait bounded by `DialTimeout`, and update the doc comment.
**Resolution:** _(open)_
### Client.Go-006
| Field | Value |
|---|---|
| Severity | Low |
| Category | Error handling & resilience |
| Location | `clients/go/mxgateway/errors.go:9-130` |
| Status | Open |
**Description:** `docs/ClientLibrariesDesign.md` recommends a high-level error taxonomy (`TransportError`, `AuthenticationError`, `TimeoutError`, etc.). The Go client collapses all transport/gRPC failures into a single `GatewayError` with no way to classify transient (`Unavailable`, `DeadlineExceeded`) vs permanent (`Unauthenticated`, `InvalidArgument`) without manually unwrapping and calling `status.Code`.
**Recommendation:** Add a helper (e.g. `IsTransient(err) bool`) or expose the gRPC `codes.Code` on `GatewayError`, so retry/timeout/auth handling can be written without re-parsing the wrapped error.
**Resolution:** _(open)_
### Client.Go-007
| Field | Value |
|---|---|
| Severity | Low |
| Category | Correctness & logic bugs |
| Location | `clients/go/mxgateway/session.go:526-532` |
| Status | Open |
**Description:** `newCorrelationID` returns an empty string when `crypto/rand.Read` fails, silently producing an `MxCommandRequest` with no correlation id. `rand.Read` failure is rare, but the failure mode (untraceable command, no error surfaced) is worse than failing loud, and the empty-id path is untested.
**Recommendation:** Either propagate the error up through `invokeCommand`, or fall back to a time/counter-based id rather than an empty string.
**Resolution:** _(open)_
### Client.Go-008
| Field | Value |
|---|---|
| Severity | Low |
| Category | Testing coverage |
| Location | `clients/go/mxgateway/` (test files) |
| Status | Open |
**Description:** Several critical paths are untested: TLS credential resolution in `resolveTransportCredentials` (only the `Plaintext` path is exercised); the `callContext` deadline-shortening logic (`client.go:198-204`) including the negative-timeout disable case; and `NativeValue`/`NativeArray` for the array, raw-bytes, null, and unsupported-kind branches.
**Recommendation:** Add unit tests for `resolveTransportCredentials` precedence, `callContext` deadline arithmetic, and `NativeValue`/`NativeArray` round-trips for every kind.
**Resolution:** _(open)_
### Client.Go-009
| Field | Value |
|---|---|
| Severity | Low |
| Category | Code organization & conventions |
| Location | `clients/go/mxgateway/galaxy.go:60-93,241-256`, `clients/go/mxgateway/client.go:41-74,190-205` |
| Status | Open |
**Description:** `DialGalaxy`/`Dial` and `GalaxyClient.callContext`/`Client.callContext` are near-identical duplicates (dial-context setup, credential resolution, dial-option assembly, deadline arithmetic). A fix to one (e.g. the Client.Go-005 dial migration) must be applied twice and can drift.
**Recommendation:** Extract a shared unexported `dial(ctx, opts)` and a free `callContext(opts, ctx)` function, and have both client constructors call them.
**Resolution:** _(open)_
### Client.Go-010
| Field | Value |
|---|---|
| Severity | Low |
| Category | Documentation & comments |
| Location | `clients/go/mxgateway/client.go:39-40` |
| Status | Open |
**Description:** The `Dial` doc comment states it configures "blocking dial cancellation from ctx." This describes the deprecated `WithBlock` behaviour; once Client.Go-005 is addressed the comment is misleading about how connection establishment and cancellation work.
**Recommendation:** Reword to describe the actual connect/timeout semantics after resolving Client.Go-005, and clarify that `DialTimeout` bounds the initial connect attempt.
**Resolution:** _(open)_
+207
View File
@@ -0,0 +1,207 @@
# Code Review — Client.Java
| Field | Value |
|---|---|
| Module | `clients/java` |
| Reviewer | Claude Code |
| Review date | 2026-05-18 |
| Commit reviewed | `3cc53a8` |
| Status | Reviewed |
| Open findings | 12 |
## Checklist coverage
| # | Category | Result |
|---|---|---|
| 1 | Correctness & logic bugs | Issues found: `register`/`addItem` silently fall back to `getReturnValue()` masking missing payloads (Client.Java-004); fragile `resolved()` mutation pattern (Client.Java-012). |
| 2 | mxaccessgw conventions | Largely adheres; the gateway protocol-version handshake is never verified despite the contract field existing (Client.Java-003). |
| 3 | Concurrency & thread safety | Issue found: `MxEventStream.next` is a plain field and terminal-state transitions race (Client.Java-002). |
| 4 | Error handling & resilience | Issues found: `close()` can mask the primary exception (Client.Java-005); async/sync error surfaces inconsistent (Client.Java-008). |
| 5 | Security | Issue found: API-key redaction leaks the trailing 4 secret characters (Client.Java-001). |
| 6 | Performance & resource management | Issues found: `close()` does not await termination (Client.Java-006); no stream flow control (Client.Java-011). |
| 7 | Design-document adherence | Matches `JavaClientDesign.md` closely; the protocol-version check is undocumented-missing (Client.Java-003). |
| 8 | Code organization & conventions | Issue found: ~80 duplicated lines across the two clients (Client.Java-009). |
| 9 | Testing coverage | Issue found: alarm RPCs, TLS setup, async streams, and queue overflow untested (Client.Java-007). |
| 10 | Documentation & comments | Issue found: README/Javadoc assert undocumented scope names (Client.Java-010). |
## Findings
### Client.Java-001
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Security |
| Location | `clients/java/mxgateway-client/src/main/java/com/dohertylan/mxgateway/client/MxGatewaySecrets.java:30-32` |
| Status | Open |
**Description:** `redactApiKey` preserves the leading and trailing four characters of the key. A gateway API key has the form `mxgw_<key-id>_<secret>`; the last four characters belong to the secret portion, so the "redacted" form leaks 4 characters of the actual secret into logs, CLI JSON output (`CommonOptions.redactedJsonMap`), and `MxGatewayClientOptions.toString()`. CLAUDE.md states API keys must never reach logs.
**Recommendation:** Redact the secret entirely. Show only a stable non-secret prefix (e.g. the `mxgw_<key-id>_` portion) and mask everything after it, or emit a fixed `mxgw_***` form. Do not echo any trailing characters of the secret.
**Resolution:** _(open)_
### Client.Java-002
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Concurrency & thread safety |
| Location | `clients/java/mxgateway-client/src/main/java/com/dohertylan/mxgateway/client/MxEventStream.java:31,66-92` |
| Status | Open |
**Description:** The `next` field is a plain (non-volatile) instance field, and `MxEventStream` exposes no thread-confinement guarantee. More concretely, a queue-overflow `offer()` and a `close()` `offer(END)` can interleave so the overflow exception is enqueued after `END` and never observed — the contract that "next() throws after overflow" is not guaranteed once `close()` has been called.
**Recommendation:** Document single-consumer-thread usage explicitly in the Javadoc, and serialise terminal state transitions (overflow vs END vs close) behind a single guarded flag so the first terminal condition wins deterministically.
**Resolution:** _(open)_
### Client.Java-003
| Field | Value |
|---|---|
| Severity | Medium |
| Category | mxaccessgw conventions |
| Location | `clients/java/mxgateway-client/src/main/java/com/dohertylan/mxgateway/client/MxGatewayClient.java:119-140` |
| Status | Open |
**Description:** `OpenSessionReply` carries `gateway_protocol_version` (proto field 8), and `MxGatewayClientVersion.GATEWAY_PROTOCOL_VERSION` exists so the client can reject incompatible generated-code inputs. The client never reads `reply.getGatewayProtocolVersion()` nor compares it against the compiled-in version. A client built against an older/newer contract issues commands blindly and fails with confusing downstream errors instead of a clear version-mismatch failure.
**Recommendation:** In `openSession`/`openSessionRaw`, compare `reply.getGatewayProtocolVersion()` with `MxGatewayClientVersion.gatewayProtocolVersion()` and throw a typed `MxGatewayException` on mismatch.
**Resolution:** _(open)_
### Client.Java-004
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Correctness & logic bugs |
| Location | `clients/java/mxgateway-client/src/main/java/com/dohertylan/mxgateway/client/MxGatewaySession.java:114-120,157-163,191-197` |
| Status | Open |
**Description:** `register`, `addItem`, and `addItem2` check `reply.hasRegister()`/`hasAddItem()` and otherwise fall back to `reply.getReturnValue().getInt32Value()`. If the gateway returns a reply with neither the typed payload nor a `return_value` set, the method silently returns `0` — indistinguishable from a legitimate handle of 0. This masks a contract violation rather than surfacing it.
**Recommendation:** If the expected typed payload is absent and no `return_value` is present, throw `MxGatewayException` (protocol violation) instead of returning `0`.
**Resolution:** _(open)_
### Client.Java-005
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Error handling & resilience |
| Location | `clients/java/mxgateway-client/src/main/java/com/dohertylan/mxgateway/client/MxGatewaySession.java:92-105` |
| Status | Open |
**Description:** `close()` delegates to `closeRaw()`, which performs a network RPC. When `MxGatewaySession` is used in try-with-resources and the body throws, a failure inside `closeSession` (e.g. `WORKER_UNAVAILABLE`) throws from `close()` and replaces the original exception as the propagated throwable (the body exception becomes a suppressed exception) — a known try-with-resources footgun for I/O-performing `close()`.
**Recommendation:** Either make `close()` swallow/log close-time failures (keeping `closeRaw()` for callers who want the result), or document clearly that `close()` performs a network call that can throw.
**Resolution:** _(open)_
### Client.Java-006
| Field | Value |
|---|---|
| Severity | Low |
| Category | Performance & resource management |
| Location | `clients/java/mxgateway-client/src/main/java/com/dohertylan/mxgateway/client/MxGatewayClient.java:323-328`, `clients/java/mxgateway-client/src/main/java/com/dohertylan/mxgateway/client/GalaxyRepositoryClient.java:279-284` |
| Status | Open |
**Description:** `close()` (the `AutoCloseable` method invoked by try-with-resources) calls only `ownedChannel.shutdown()` and returns immediately without awaiting termination. In-flight calls and Netty event-loop threads may still be running when the caller assumes the resource is released. `closeAndAwaitTermination()` does it correctly but is not the method try-with-resources uses, and the README examples all rely on try-with-resources.
**Recommendation:** Have `close()` await termination for a bounded time and `shutdownNow()` on timeout (the logic already in `closeAndAwaitTermination()`), or document that try-with-resources callers should call `closeAndAwaitTermination()`.
**Resolution:** _(open)_
### Client.Java-007
| Field | Value |
|---|---|
| Severity | Low |
| Category | Testing coverage |
| Location | `clients/java/mxgateway-client/src/test/java/com/dohertylan/mxgateway/client/` |
| Status | Open |
**Description:** The alarm surface — `acknowledgeAlarm`/`acknowledgeAlarmAsync`/`queryActiveAlarms` and `MxGatewayActiveAlarmsSubscription` — has zero test coverage. TLS channel construction, the async `streamEventsAsync` path, `MxGatewayEventSubscription` pre-start cancellation, and `MxEventStream` queue overflow are likewise untested. `JavaClientDesign.md` explicitly lists async stream-observer cancellation and status/error mapping as required tests.
**Recommendation:** Add in-process gRPC tests for the alarm RPCs, the async streaming/subscription cancellation paths, and at least one TLS-config construction test.
**Resolution:** _(open)_
### Client.Java-008
| Field | Value |
|---|---|
| Severity | Low |
| Category | Error handling & resilience |
| Location | `clients/java/mxgateway-client/src/main/java/com/dohertylan/mxgateway/client/MxGatewayClient.java:298-304` |
| Status | Open |
**Description:** `acknowledgeAlarmAsync` and `openSessionAsync` apply `ensureProtocolSuccess` inside `thenApply`. If that validator throws a non-`MxGatewayException` `RuntimeException` it is wrapped by `CompletionException` with no `fromGrpc` normalisation, unlike the synchronous paths which normalise via `try/catch`. The async and sync error surfaces are therefore inconsistent.
**Recommendation:** Wrap the `thenApply` body so any non-`MxGatewayException` is routed through `MxGatewayErrors.fromGrpc`, matching the synchronous methods.
**Resolution:** _(open)_
### Client.Java-009
| Field | Value |
|---|---|
| Severity | Low |
| Category | Code organization & conventions |
| Location | `clients/java/mxgateway-client/src/main/java/com/dohertylan/mxgateway/client/GalaxyRepositoryClient.java:310-391`, `clients/java/mxgateway-client/src/main/java/com/dohertylan/mxgateway/client/MxGatewayClient.java:346-413` |
| Status | Open |
**Description:** `createChannel`, `withDeadline`, `withStreamDeadline`, and `toCompletable` are duplicated nearly verbatim across `MxGatewayClient` and `GalaxyRepositoryClient` (~80 lines). A fix to one will not propagate to the other.
**Recommendation:** Extract the channel-builder and future-adaptor helpers into a shared package-private utility class.
**Resolution:** _(open)_
### Client.Java-010
| Field | Value |
|---|---|
| Severity | Low |
| Category | Documentation & comments |
| Location | `clients/java/mxgateway-client/src/main/java/com/dohertylan/mxgateway/client/MxGatewayClient.java:269-272`, `clients/java/README.md:76` |
| Status | Open |
**Description:** The `acknowledgeAlarm` Javadoc states the gateway authenticates against an `invoke:alarm-ack` scope, and the README states the Galaxy Repository requires a `metadata:read` scope. CLAUDE.md's documented scope set names neither — the Javadoc/README assert a scope contract the project's own auth documentation does not corroborate.
**Recommendation:** Reconcile the scope names with `src/MxGateway.Server/Security/` and CLAUDE.md; correct the Javadoc/README to the actual scope strings, or fix CLAUDE.md if sub-scopes were genuinely added.
**Resolution:** _(open)_
### Client.Java-011
| Field | Value |
|---|---|
| Severity | Low |
| Category | Performance & resource management |
| Location | `clients/java/mxgateway-client/src/main/java/com/dohertylan/mxgateway/client/MxEventStream.java:37-63` |
| Status | Open |
**Description:** The event stream relies on default gRPC auto-inbound flow control: the async stub auto-requests messages, so the server can push faster than the 16-element bounded queue drains. A momentarily slow consumer triggers queue overflow and an immediate stream-fault cancel. This is consistent with the documented fail-fast event-backpressure design, but the client never applies real flow control, so even brief consumer stalls kill the subscription.
**Recommendation:** Confirm fail-fast is intended (it appears to be); if so, document it on `MxEventStream` so callers know a slow consumer terminates the stream. Optionally expose the queue capacity or opt-in flow control.
**Resolution:** _(open)_
### Client.Java-012
| Field | Value |
|---|---|
| Severity | Low |
| Category | Correctness & logic bugs |
| Location | `clients/java/mxgateway-cli/src/main/java/com/dohertylan/mxgateway/cli/MxGatewayCli.java:667-674` |
| Status | Open |
**Description:** `CommonOptions.resolved()` mutates `this` (`resolvedApiKey`, `resolvedTimeout`) and returns `this`, but `toClientOptions()` and `redactedJsonMap()` read those mutated fields. If `redactedJsonMap()` is ever called before `resolved()`, it silently emits empty-string defaults. The "return this after mutating" pattern is fragile and surprising.
**Recommendation:** Make `resolved()` return an immutable resolved value object, or compute `resolvedApiKey`/`resolvedTimeout` lazily in their getters so call ordering cannot produce stale output.
**Resolution:** _(open)_
+207
View File
@@ -0,0 +1,207 @@
# Code Review — Client.Python
| Field | Value |
|---|---|
| Module | `clients/python` |
| Reviewer | Claude Code |
| Review date | 2026-05-18 |
| Commit reviewed | `3cc53a8` |
| Status | Reviewed |
| Open findings | 12 |
## Checklist coverage
| # | Category | Result |
|---|---|---|
| 1 | Correctness & logic bugs | Issues found: dead `closed` variable (Client.Python-004); float/bytes value-mapping assumptions (Client.Python-008). |
| 2 | mxaccessgw conventions | Largely adheres; one missing export and a `*_raw` MXAccess-failure documentation gap (Client.Python-002, Client.Python-012). |
| 3 | Concurrency & thread safety | Issue found: `close()` idempotency claim does not hold under concurrent close (Client.Python-006). |
| 4 | Error handling & resilience | Issues found: inconsistent timeout-kwarg fallback (Client.Python-003); `success == 0` default-value hazard (Client.Python-011); inconsistent cancel helpers (Client.Python-007). |
| 5 | Security | No issues found — API keys redacted in repr and CLI output, TLS supported, no secret logging. |
| 6 | Performance & resource management | Issue found: `discover_hierarchy` buffers the whole hierarchy in memory (Client.Python-005). |
| 7 | Design-document adherence | Matches the design docs closely; minor CLI doc drift (Client.Python-001). |
| 8 | Code organization & conventions | Issues found: `MxGatewayCommandError` omitted from `__all__` (Client.Python-002); fragile circular-import workaround (Client.Python-010). |
| 9 | Testing coverage | Issue found: `write2`, `add_item2`, bulk-size limits, TLS `ca_file`, and CLI command bodies untested (Client.Python-009). |
| 10 | Documentation & comments | Issue found: stale "scaffold" package description (Client.Python-001). |
## Findings
### Client.Python-001
| Field | Value |
|---|---|
| Severity | Low |
| Category | Documentation & comments |
| Location | `clients/python/pyproject.toml:8,25`, `clients/python/src/mxgateway_cli/commands.py:25` |
| Status | Open |
**Description:** The package `description` in `pyproject.toml` still says "Async Python client *scaffold*" even though the client is fully implemented. Stale "scaffold" wording misrepresents maturity to anyone reading PyPI metadata. (The `mxgw-py` console-script name is itself consistent between `pyproject.toml` and the README.)
**Recommendation:** Update the `pyproject.toml` description to drop "scaffold"; keep README CLI examples in sync with the actual `mxgw-py` entry point.
**Resolution:** _(open)_
### Client.Python-002
| Field | Value |
|---|---|
| Severity | Low |
| Category | Code organization & conventions |
| Location | `clients/python/src/mxgateway/__init__.py:27` |
| Status | Open |
**Description:** `MxGatewayCommandError` is imported into `__init__.py` and is a documented public exception, but it is missing from `__all__`. It is the parent of `MxAccessError` and a meaningful catch target, so omitting it from the public surface is inconsistent — `from mxgateway import *` will not expose it and tooling that respects `__all__` treats it as private.
**Recommendation:** Add `"MxGatewayCommandError"` to the `__all__` list.
**Resolution:** _(open)_
### Client.Python-003
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Error handling & resilience |
| Location | `clients/python/src/mxgateway/client.py:125-137,155-173` |
| Status | Open |
**Description:** `stream_events_raw` and `query_active_alarms` call the stub directly with a `timeout` kwarg when `stream_timeout` is set, with no `TypeError` fallback. `galaxy.py:watch_deploy_events` and `_unary` *do* have a fallback that strips `timeout` if the callable rejects it. This asymmetry means a fake/older stub that does not accept `timeout` crashes for gateway streams but not Galaxy streams. It is only masked today because `stream_timeout` defaults to `None`.
**Recommendation:** Apply the same `try/except TypeError` timeout-fallback pattern to `stream_events_raw` and `query_active_alarms`, or remove the fallback everywhere and standardise on a single behaviour.
**Resolution:** _(open)_
### Client.Python-004
| Field | Value |
|---|---|
| Severity | Low |
| Category | Correctness & logic bugs |
| Location | `clients/python/src/mxgateway_cli/commands.py:386,402-404` |
| Status | Open |
**Description:** In `_smoke`, the local variable `closed` is set to `False` and never reassigned; the `finally` block's `if not closed:` is therefore always true. This is dead/misleading code suggesting a removed early-close path.
**Recommendation:** Remove the `closed` variable and the `if not closed:` guard; call `await session.close()` directly in the `finally` block (or use `async with session:`).
**Resolution:** _(open)_
### Client.Python-005
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Performance & resource management |
| Location | `clients/python/src/mxgateway/galaxy.py:117-140` |
| Status | Open |
**Description:** `discover_hierarchy` pages through the entire Galaxy object hierarchy and accumulates every `GalaxyObject` (each carrying its full attribute list) into a single in-memory `list` before returning. For a large Galaxy this is a very large allocation with no streaming alternative and no caller-side bound.
**Recommendation:** Offer an async-generator variant (e.g. `iter_hierarchy()`) that yields objects/pages as they arrive, keeping `discover_hierarchy()` as a convenience wrapper. At minimum document the memory characteristic.
**Resolution:** _(open)_
### Client.Python-006
| Field | Value |
|---|---|
| Severity | Low |
| Category | Concurrency & thread safety |
| Location | `clients/python/src/mxgateway/client.py:74-82`, `clients/python/src/mxgateway/galaxy.py:85-93`, `clients/python/src/mxgateway/session.py:38-55` |
| Status | Open |
**Description:** `close()` on the clients and `Session.close()` use a plain `self._closed` check-then-set with an `await` between, with no lock. If two coroutines call `close()` concurrently both can pass the guard before either sets it, causing a double `channel.close()` / double `CloseSession` RPC. Single-task usage is the documented contract, so impact is low, but the idempotency guarantee asserted in docstrings only holds for sequential calls.
**Recommendation:** Set `self._closed = True` before the `await`, or guard with an `asyncio.Lock`, so the idempotency claim holds under concurrent close.
**Resolution:** _(open)_
### Client.Python-007
| Field | Value |
|---|---|
| Severity | Low |
| Category | Error handling & resilience |
| Location | `clients/python/src/mxgateway/client.py:204-213` |
| Status | Open |
**Description:** `_canceling_iterator` (gateway event stream) does not catch `asyncio.CancelledError` to invoke `call.cancel()` explicitly — it relies on the `finally` block. `galaxy.py:_canceling_iterator` *does* explicitly catch `CancelledError`, cancel, and re-raise. The two are functionally equivalent today, but the inconsistency between near-identical helpers invites future divergence.
**Recommendation:** Make the two `_canceling_iterator` helpers identical, ideally by factoring a single shared helper.
**Resolution:** _(open)_
### Client.Python-008
| Field | Value |
|---|---|
| Severity | Low |
| Category | Correctness & logic bugs |
| Location | `clients/python/src/mxgateway/values.py:62-67,83-88` |
| Status | Open |
**Description:** `to_mx_value` maps any Python `float` to `VT_R8`/`MX_DATA_TYPE_DOUBLE` with no handling for `nan`/`inf`, which are serialised and forwarded to MXAccess which may reject or mis-handle them. `bytes` is mapped to `VT_RECORD`/`MX_DATA_TYPE_UNKNOWN`, a questionable default. The `data_type` keyword exists but `Session.write` never forwards it.
**Recommendation:** Document the float/bytes mapping assumptions, optionally validate finiteness, and consider plumbing the `data_type` keyword through `Session.write`/`write2`.
**Resolution:** _(open)_
### Client.Python-009
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Testing coverage |
| Location | `clients/python/tests/` |
| Status | Open |
**Description:** Several non-trivial public paths are untested: `Session.write2`/`add_item2` request construction; the bulk-size limit `_ensure_bulk_size`/`MAX_BULK_ITEMS` guard; the `None`-argument `TypeError` guards in bulk methods; the TLS `ca_file` read path in `create_channel`; most CLI command bodies; and `map_rpc_error`'s default (non-auth) branch.
**Recommendation:** Add tests for `write2`/`add_item2` request shape, the bulk-size `ValueError`, the `ca_file` TLS branch, the generic `map_rpc_error` fallthrough, and at least one happy-path CLI command using a fake stub.
**Resolution:** _(open)_
### Client.Python-010
| Field | Value |
|---|---|
| Severity | Low |
| Category | Code organization & conventions |
| Location | `clients/python/src/mxgateway/session.py:404`, `clients/python/src/mxgateway_cli/commands.py:422-425` |
| Status | Open |
**Description:** `session.py` ends with a module-level late import `from .client import GatewayClient # noqa: E402` purely to satisfy a string type hint, and `commands.py:_session` does a function-local import. Both work around a circular dependency that `from __future__ import annotations` (already in effect) makes unnecessary. `_session` also lacks a return type annotation.
**Recommendation:** Drop the runtime late import in `session.py` and use a `TYPE_CHECKING`-guarded import for the hint; add the `-> Session` return annotation to `commands.py:_session`.
**Resolution:** _(open)_
### Client.Python-011
| Field | Value |
|---|---|
| Severity | Low |
| Category | Error handling & resilience |
| Location | `clients/python/src/mxgateway/errors.py:122-148` |
| Status | Open |
**Description:** `ensure_mxaccess_success` raises `MxAccessError` if any `mx_status.success == 0`. This treats `success == 0` as the failure sentinel, but `0` is also the proto3 scalar default for an unset `MxStatusProxy`. If the gateway ever returns a reply with an unpopulated status entry (e.g. a partially-filled bulk result), the client raises `MxAccessError` even though no real failure occurred.
**Recommendation:** Confirm against the proto/gateway contract whether `success` is guaranteed populated for every `statuses` entry; if not, key the failure decision on an explicit failure field rather than the `success == 0` default.
**Resolution:** _(open)_
### Client.Python-012
| Field | Value |
|---|---|
| Severity | Low |
| Category | mxaccessgw conventions |
| Location | `clients/python/src/mxgateway/client.py:84-108`, `clients/python/src/mxgateway/session.py:57-77` |
| Status | Open |
**Description:** `Session.invoke_raw` does not run `ensure_mxaccess_success` while `Session.invoke` does, so a caller using `invoke_raw` for parity tests gets a reply where an MXAccess HRESULT failure is silently embedded with no exception. This is by design but under-documented — the README's "preserve raw replies" sentence does not state that `*_raw` methods skip MXAccess-failure detection entirely.
**Recommendation:** Document explicitly (README + docstring) that `*_raw` methods surface MXAccess HRESULT/status failures only inside the reply and do not raise `MxAccessError`, so parity-test callers know to inspect `protocol_status`/`hresult`/`statuses` themselves.
**Resolution:** _(open)_
+192
View File
@@ -0,0 +1,192 @@
# Code Review — Client.Rust
| Field | Value |
|---|---|
| Module | `clients/rust` |
| Reviewer | Claude Code |
| Review date | 2026-05-18 |
| Commit reviewed | `3cc53a8` |
| Status | Reviewed |
| Open findings | 11 |
## Checklist coverage
| # | Category | Result |
|---|---|---|
| 1 | Correctness & logic bugs | Issues found: a stale unit test fails the suite (Client.Rust-003); handle extractors silently return 0 on a shapeless OK reply (Client.Rust-005). |
| 2 | mxaccessgw conventions | `cargo clippy --workspace --all-targets -- -D warnings` fails (Client.Rust-001, Client.Rust-002), violating a CLAUDE.md hard requirement; hard-coded correlation ids (Client.Rust-011). |
| 3 | Concurrency & thread safety | No issues found — clients are cheaply cloneable, streams are `Send`, drop-cancels-call is verified. |
| 4 | Error handling & resilience | Issues found: empty-vec on shapeless bulk reply (Client.Rust-006); no transient/permanent classification (Client.Rust-010). |
| 5 | Security | No issues found — API keys redacted in `Debug`/`Display`, status messages scrubbed, TLS handled correctly. |
| 6 | Performance & resource management | Issue found: value/array projections clone every element, doubling array memory (Client.Rust-008). |
| 7 | Design-document adherence | Issue found: `RustClientDesign.md` documents a stale crate layout and an unused `tracing` dependency (Client.Rust-007). |
| 8 | Code organization & conventions | Issue found: `BulkReplyKind` trips a clippy lint; undocumented public methods (Client.Rust-001, Client.Rust-002). |
| 9 | Testing coverage | Issue found: TLS setup, mid-stream fault propagation, and the bulk-size cap untested (Client.Rust-009). |
| 10 | Documentation & comments | Issue found: the version-constant doc comment is wrong (Client.Rust-004). |
## Findings
### Client.Rust-001
| Field | Value |
|---|---|
| Severity | High |
| Category | mxaccessgw conventions |
| Location | `clients/rust/src/options.rs:98,143` |
| Status | Open |
**Description:** `with_max_grpc_message_bytes` and `max_grpc_message_bytes` have no `///` doc comments. The crate sets `#![warn(missing_docs)]` and CLAUDE.md mandates that `cargo clippy --workspace --all-targets -- -D warnings` pass. Under `-D warnings` these become hard errors, so clippy fails to compile the crate — breaking the documented build/test workflow for the module.
**Recommendation:** Add doc comments to both methods, e.g. `/// Maximum encoded/decoded gRPC message size in bytes (default 16 MiB).`
**Resolution:** _(open)_
### Client.Rust-002
| Field | Value |
|---|---|
| Severity | High |
| Category | mxaccessgw conventions |
| Location | `clients/rust/src/session.rs:522` |
| Status | Open |
**Description:** The `BulkReplyKind` enum's variants (`AddItemBulk`, `AdviseItemBulk`, `RemoveItemBulk`, `UnAdviseItemBulk`, `SubscribeBulk`, `UnsubscribeBulk`) all share the `Bulk` suffix, tripping `clippy::enum_variant_names`. Under `-D warnings` this is a compile error, so `cargo clippy --workspace --all-targets -- -D warnings` fails — a violation of the CLAUDE.md requirement that clippy pass cleanly.
**Recommendation:** Rename the variants to drop the common suffix (e.g. `AddItem`, `AdviseItem`, …) or add a narrowly-scoped `#[allow(clippy::enum_variant_names)]` with a reason comment.
**Resolution:** _(open)_
### Client.Rust-003
| Field | Value |
|---|---|
| Severity | High |
| Category | Correctness & logic bugs |
| Location | `clients/rust/crates/mxgw-cli/src/main.rs:1051` |
| Status | Open |
**Description:** The unit test `version_json_output_has_protocol_versions` asserts `value["gatewayProtocolVersion"] == 2`, but `GATEWAY_PROTOCOL_VERSION` is `3` (version.rs:10), matching the authoritative server constant `GatewayContractInfo.GatewayProtocolVersion = 3`. The test fails, so `cargo test --workspace` (the documented test step) does not pass — the test was not updated when the protocol version was bumped.
**Recommendation:** Update the assertion to `3`, or better, assert against `GATEWAY_PROTOCOL_VERSION` so it cannot drift again.
**Resolution:** _(open)_
### Client.Rust-004
| Field | Value |
|---|---|
| Severity | Low |
| Category | Documentation & comments |
| Location | `clients/rust/src/version.rs:7` |
| Status | Open |
**Description:** `CLIENT_VERSION` is `"0.1.0-dev"` and its doc comment claims "Mirrors `Cargo.toml`", but `Cargo.toml` declares `version = "0.1.0"` (no `-dev` suffix). The comment is misleading and the value is not actually kept in sync with the manifest.
**Recommendation:** Either set `CLIENT_VERSION` from the build via `env!("CARGO_PKG_VERSION")`, or correct the constant to `"0.1.0"` and drop the "Mirrors Cargo.toml" claim.
**Resolution:** _(open)_
### Client.Rust-005
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Correctness & logic bugs |
| Location | `clients/rust/src/session.rs:489-520` |
| Status | Open |
**Description:** `register_server_handle`, `add_item_handle`, and `add_item2_handle` fall through to `reply.return_value … .unwrap_or_default()`, returning `0` when the reply carries neither the expected typed payload nor an `Int32` `return_value`. Because `Session::invoke` has already confirmed `protocol_status == Ok`, a malformed-but-OK reply silently yields handle `0`, which the caller then uses as a real handle against the worker.
**Recommendation:** Return `Err(Error::ProtocolStatus { … })` (or a dedicated `Error::MalformedReply`) when an OK reply lacks an extractable handle, instead of defaulting to `0`.
**Resolution:** _(open)_
### Client.Rust-006
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Error handling & resilience |
| Location | `clients/rust/src/session.rs:531-555` |
| Status | Open |
**Description:** `bulk_results` returns `Vec::new()` for any `(payload, kind)` combination that does not match the expected arm — including an OK reply carrying the wrong or no payload. A caller of `subscribe_bulk`/`add_item_bulk` then sees an empty result vector and cannot distinguish "zero items processed" from "gateway returned a shapeless reply".
**Recommendation:** Treat a missing/mismatched bulk payload on an OK reply as an error rather than an empty vector, or document the empty-vec fallback explicitly and log it.
**Resolution:** _(open)_
### Client.Rust-007
| Field | Value |
|---|---|
| Severity | Low |
| Category | Design-document adherence |
| Location | `clients/rust/RustClientDesign.md:14-55` |
| Status | Open |
**Description:** `RustClientDesign.md` is stale relative to the implemented code. It documents a nested `crates/mxgateway-client/` layout (the real crate root is `clients/rust/` with a flat `src/`), and lists `tracing` among "Expected dependencies", but `tracing` appears in no `Cargo.toml`. CLAUDE.md requires docs to change with the source.
**Recommendation:** Update `RustClientDesign.md` to the actual flat layout and remove `tracing` from the dependency list (or add `tracing` if structured logging is genuinely intended).
**Resolution:** _(open)_
### Client.Rust-008
| Field | Value |
|---|---|
| Severity | Low |
| Category | Performance & resource management |
| Location | `clients/rust/src/value.rs:161-261` |
| Status | Open |
**Description:** `MxValueProjection::from_proto` and `MxArrayProjection::from_proto` deep-clone every element out of the wire message while `MxValue`/`MxArrayValue` also retain the original `raw` message. Every `MxValue` therefore holds two copies of its payload, wasteful for large string arrays or raw blobs arriving on the event stream.
**Recommendation:** Compute the projection lazily on demand, or have the projection borrow from `raw`, so array/raw payloads are not duplicated for every wrapped value.
**Resolution:** _(open)_
### Client.Rust-009
| Field | Value |
|---|---|
| Severity | Low |
| Category | Testing coverage |
| Location | `clients/rust/tests/client_behavior.rs`, `clients/rust/src/galaxy.rs` |
| Status | Open |
**Description:** Several critical paths are untested: TLS channel setup (`with_plaintext(false)` / CA-file loading), mid-stream `tonic::Status` fault propagation through `EventStream`/`DeployEventStream` (tests only send `Ok` items), and the bulk-size cap (`ensure_bulk_size` rejecting >1000 items).
**Recommendation:** Add tests that (a) feed an `Err(Status)` into the event/deploy streams and assert it surfaces as the mapped `Error`, (b) assert `add_item_bulk` with 1001 items returns `Error::InvalidArgument`, and (c) exercise the CA-file/`InvalidEndpoint` error path.
**Resolution:** _(open)_
### Client.Rust-010
| Field | Value |
|---|---|
| Severity | Low |
| Category | Error handling & resilience |
| Location | `clients/rust/src/client.rs:255-268`, `clients/rust/src/galaxy.rs:204-216` |
| Status | Open |
**Description:** The client applies only a per-call deadline via `Request::set_timeout` and has no retry, reconnect, or transient-vs-permanent classification. A transient `Unavailable` (e.g. a gateway restart) maps to the catch-all `Error::Status` and is indistinguishable from a permanent failure. This is an acceptable v1 stance but is undocumented.
**Recommendation:** Either add a documented `Error::Unavailable` variant classifying `Code::Unavailable`/`Code::ResourceExhausted`, or explicitly document in the README that the client performs no retries and that transient failures arrive as `Error::Status`.
**Resolution:** _(open)_
### Client.Rust-011
| Field | Value |
|---|---|
| Severity | Low |
| Category | mxaccessgw conventions |
| Location | `clients/rust/src/session.rs:469` |
| Status | Open |
**Description:** `command_request` hard-codes `client_correlation_id` as `format!("rust-client-{}", kind.as_str_name())`. Every invocation of the same command kind on a session uses an identical correlation id, so the id cannot correlate a specific request/reply pair in gateway logs or among concurrent in-flight calls. MXAccess parity diagnostics rely on correlation ids being unique per call.
**Recommendation:** Append a per-call unique suffix (monotonic counter or UUID) to the correlation id, or expose a way for the caller to supply one.
**Resolution:** _(open)_
+63 -4
View File
@@ -10,9 +10,14 @@ Each module's `findings.md` is the source of truth; this file is generated from
| Module | Reviewer | Date | Commit | Status | Open | Total |
|---|---|---|---|---|---|---|
| [Client.Dotnet](Client.Dotnet/findings.md) | Claude Code | 2026-05-18 | `3cc53a8` | Reviewed | 8 | 8 |
| [Client.Go](Client.Go/findings.md) | Claude Code | 2026-05-18 | `3cc53a8` | Reviewed | 10 | 10 |
| [Client.Java](Client.Java/findings.md) | Claude Code | 2026-05-18 | `3cc53a8` | Reviewed | 12 | 12 |
| [Client.Python](Client.Python/findings.md) | Claude Code | 2026-05-18 | `3cc53a8` | Reviewed | 12 | 12 |
| [Client.Rust](Client.Rust/findings.md) | Claude Code | 2026-05-18 | `3cc53a8` | Reviewed | 11 | 11 |
| [Contracts](Contracts/findings.md) | Claude Code | 2026-05-18 | `6c64030` | Reviewed | 8 | 8 |
| [IntegrationTests](IntegrationTests/findings.md) | Claude Code | 2026-05-18 | `6c64030` | Reviewed | 10 | 10 |
| [Server](Server/findings.md) | Claude Code | 2026-05-18 | `6c64030` | Reviewed | 14 | 14 |
| [Server](Server/findings.md) | Claude Code | 2026-05-18 | `6c64030` | Reviewed | 12 | 14 |
| [Tests](Tests/findings.md) | Claude Code | 2026-05-18 | `6c64030` | Reviewed | 12 | 12 |
| [Worker](Worker/findings.md) | Claude Code | 2026-05-18 | `6c64030` | Reviewed | 15 | 15 |
| [Worker.Tests](Worker.Tests/findings.md) | Claude Code | 2026-05-18 | `6c64030` | Reviewed | 15 | 15 |
@@ -23,10 +28,12 @@ Findings with status `Open` or `In Progress`, ordered by severity.
| ID | Severity | Category | Location | Description |
|---|---|---|---|---|
| Server-001 | Critical | Security | `src/MxGateway.Server/GatewayApplication.cs:147-149`, `src/MxGateway.Server/Dashboard/DashboardEndpointRouteBuilderExtensions.cs:55-58`, `src/MxGateway.Server/Dashboard/Components/Routes.razor:1-15` | The dashboard authorization policy (`DashboardAuthenticationDefaults.AuthorizationPolicy`), `DashboardAuthorizationRequirement`, and `DashboardAuthorizationHandler` are registered in DI but never applied to any endpoint. `MapRazorComponent… |
| Client.Go-001 | High | Correctness & logic bugs | `clients/go/mxgateway/errors.go:88-93`, `clients/go/mxgateway/errors.go:117-128` | `MxAccessError.Unwrap` returns `e.Command` directly. `EnsureMxAccessSuccess` constructs `&MxAccessError{Reply: reply}` with `Command` left nil (the HRESULT / failing-`MxStatusProxy` path). When `Command` is a nil `*CommandError`, `Unwrap()… |
| Client.Rust-001 | High | mxaccessgw conventions | `clients/rust/src/options.rs:98,143` | `with_max_grpc_message_bytes` and `max_grpc_message_bytes` have no `///` doc comments. The crate sets `#![warn(missing_docs)]` and CLAUDE.md mandates that `cargo clippy --workspace --all-targets -- -D warnings` pass. Under `-D warnings` th… |
| Client.Rust-002 | High | mxaccessgw conventions | `clients/rust/src/session.rs:522` | The `BulkReplyKind` enum's variants (`AddItemBulk`, `AdviseItemBulk`, `RemoveItemBulk`, `UnAdviseItemBulk`, `SubscribeBulk`, `UnsubscribeBulk`) all share the `Bulk` suffix, tripping `clippy::enum_variant_names`. Under `-D warnings` this is… |
| Client.Rust-003 | High | Correctness & logic bugs | `clients/rust/crates/mxgw-cli/src/main.rs:1051` | The unit test `version_json_output_has_protocol_versions` asserts `value["gatewayProtocolVersion"] == 2`, but `GATEWAY_PROTOCOL_VERSION` is `3` (version.rs:10), matching the authoritative server constant `GatewayContractInfo.GatewayProtoco… |
| IntegrationTests-001 | High | Design-document adherence | `src/MxGateway.IntegrationTests/Galaxy/LiveGalaxyRepositoryFactAttribute.cs:7`, `src/MxGateway.IntegrationTests/Galaxy/GalaxyRepositoryLiveTests.cs` | The Galaxy Repository live test suite and its gating env var `MXGATEWAY_RUN_LIVE_GALAXY_TESTS` (plus connection-string override `MXGATEWAY_LIVE_GALAXY_CONN`) are completely absent from `docs/GatewayTesting.md`. CLAUDE.md mandates updating… |
| IntegrationTests-002 | High | Design-document adherence | `src/MxGateway.IntegrationTests/DashboardLdapLiveTests.cs:13`, `src/MxGateway.Server/Configuration/LdapOptions.cs:27` | `DashboardLdapLiveTests` builds the authenticator with `new GatewayOptions()`, so it relies on `LdapOptions.RequiredGroup` defaulting to `GwAdmin` and asserts the `admin` user is a member of a `GwAdmin` LDAP group. `glauth.md` does not lis… |
| Server-003 | High | Security | `src/MxGateway.Server/Dashboard/DashboardAuthorizationHandler.cs:39,54-59`, `src/MxGateway.Server/Dashboard/DashboardAuthenticator.cs:236-258` | When `Dashboard:RequireAdminScope` is true (the default) and the request is not loopback, `DashboardAuthorizationHandler` succeeds only if `HasAdminScope` finds a claim of type `"scope"` with value `"admin"`. But `DashboardAuthenticator.Cr… |
| Tests-001 | High | Testing coverage | `src/MxGateway.Tests/Gateway/Grpc/MxAccessGatewayServiceTests.cs:483-489` | `FakeSessionManager.TryGetSession` unconditionally returns `true` and synthesizes a session for any id. As a result, `Invoke_WhenSessionMissing_ThrowsNotFound` (line 52) only passes because `InvokeException` is pre-seeded — it does not ver… |
| Tests-002 | High | Security | `src/MxGateway.Tests/Gateway/Grpc/GalaxyRepositoryGrpcServiceTests.cs:198-210` | The Galaxy Repository RPCs browse a SQL Server database (`ZB`). Every test injects a `StubGalaxyHierarchyCache`, so actual SQL query construction, parameterization, and filter/glob translation are never exercised. No test demonstrates that… |
| Worker-001 | High | Concurrency & thread safety | `src/MxGateway.Worker/MxAccess/WnWrapAlarmConsumer.cs:204-207` | When constructed with `pollIntervalMilliseconds > 0`, `Subscribe` starts a `System.Threading.Timer` whose `OnPoll` callback runs `PollOnce()` — which calls `wwAlarmConsumerClass.GetXmlCurrentAlarms2` — on a thread-pool thread. The wnwrap C… |
@@ -34,6 +41,21 @@ Findings with status `Open` or `In Progress`, ordered by severity.
| Worker-003 | High | Correctness & logic bugs | `src/MxGateway.Worker/Ipc/WorkerPipeSession.cs:399-403`, `:416-419` | `ProcessCommandAsync` checks `_state` after `DispatchAsync` completes and silently `return`s without writing a `WorkerCommandReply` (or fault) when `_state` is not `Ready`/`ExecutingCommand`. `_state` is a plain field mutated from multiple… |
| Worker.Tests-001 | High | Testing coverage | `src/MxGateway.Worker.Tests/Sta/` (no `StaMessagePumpTests.cs`) | `StaMessagePump` — whose entire reason for existing is pumping Windows messages so MXAccess COM event sink calls deliver onto the STA — has no direct unit test. `WaitForWorkOrMessages` (timeout conversion, the `MsgWaitForMultipleObjectsEx`… |
| Worker.Tests-002 | High | Testing coverage | `src/MxGateway.Worker.Tests/MxAccess/MxAccessStaSessionTests.cs`, `src/MxGateway.Worker.Tests/MxAccess/MxAccessEventMapperTests.cs` | No test verifies that a COM event raised on the STA thread is converted to protobuf and lands in the `MxAccessEventQueue`. `MxAccessEventMapperTests` exercises the mapper directly with hand-built fakes, and `AlarmDispatcherTests` covers th… |
| Client.Dotnet-001 | Medium | Error handling & resilience | `clients/dotnet/MxGateway.Client/GrpcMxGatewayClientTransport.cs:190-199`, `clients/dotnet/MxGateway.Client/GrpcGalaxyRepositoryClientTransport.cs:131-140` | `MapRpcException` only produces typed exceptions for `Unauthenticated` and `PermissionDenied`. Every other gRPC status — `NotFound`, `InvalidArgument`, `ResourceExhausted`, `FailedPrecondition`, `Unavailable`, `Internal` — collapses into t… |
| Client.Dotnet-002 | Medium | Testing coverage | `clients/dotnet/MxGateway.Client.Tests/FakeGatewayTransport.cs:145-148`, `clients/dotnet/MxGateway.Client.Tests/MxGatewayClientSessionTests.cs:236-256` | The retry predicate `MxGatewayClientRetryPolicy.IsTransientGrpcFailure` handles two shapes: a raw `RpcException` and an `MxGatewayException { InnerException: RpcException }`. In production the transport always maps `RpcException` → `MxGate… |
| Client.Dotnet-003 | Medium | Concurrency & thread safety | `clients/dotnet/MxGateway.Client/MxGatewaySession.cs:659-663`, `clients/dotnet/MxGateway.Client/MxGatewayClient.cs:230-240` | `DisposeAsync` calls `CloseAsync()` (no token) then unconditionally `_closeLock.Dispose()`. If another thread is concurrently awaiting `CloseAsync(token)` — legal, since the type exposes public async methods and no single-threaded contract… |
| Client.Go-002 | Medium | Error handling & resilience | `clients/go/mxgateway/session.go:440-516` | For the `Events`/`EventsAfter` compatibility API (`cancelWhenResultBufferFull == true`), when the 16-slot `results` channel is full `sendEventResult` cancels and returns `false`; the goroutine returns and `close(results)` runs — the consum… |
| Client.Go-003 | Medium | Correctness & logic bugs | `clients/go/cmd/mxgw-go/main.go:517-532` | `parseInt32List` calls `panic(err)` when an `item-handles` token fails to parse as an int32. The CLI is a documented user-facing tool; a typo like `-item-handles 1,foo` crashes the process with an unrecovered panic and stack trace instead… |
| Client.Java-001 | Medium | Security | `clients/java/mxgateway-client/src/main/java/com/dohertylan/mxgateway/client/MxGatewaySecrets.java:30-32` | `redactApiKey` preserves the leading and trailing four characters of the key. A gateway API key has the form `mxgw_<key-id>_<secret>`; the last four characters belong to the secret portion, so the "redacted" form leaks 4 characters of the… |
| Client.Java-002 | Medium | Concurrency & thread safety | `clients/java/mxgateway-client/src/main/java/com/dohertylan/mxgateway/client/MxEventStream.java:31,66-92` | The `next` field is a plain (non-volatile) instance field, and `MxEventStream` exposes no thread-confinement guarantee. More concretely, a queue-overflow `offer()` and a `close()` `offer(END)` can interleave so the overflow exception is en… |
| Client.Java-003 | Medium | mxaccessgw conventions | `clients/java/mxgateway-client/src/main/java/com/dohertylan/mxgateway/client/MxGatewayClient.java:119-140` | `OpenSessionReply` carries `gateway_protocol_version` (proto field 8), and `MxGatewayClientVersion.GATEWAY_PROTOCOL_VERSION` exists so the client can reject incompatible generated-code inputs. The client never reads `reply.getGatewayProtoc… |
| Client.Java-004 | Medium | Correctness & logic bugs | `clients/java/mxgateway-client/src/main/java/com/dohertylan/mxgateway/client/MxGatewaySession.java:114-120,157-163,191-197` | `register`, `addItem`, and `addItem2` check `reply.hasRegister()`/`hasAddItem()` and otherwise fall back to `reply.getReturnValue().getInt32Value()`. If the gateway returns a reply with neither the typed payload nor a `return_value` set, t… |
| Client.Java-005 | Medium | Error handling & resilience | `clients/java/mxgateway-client/src/main/java/com/dohertylan/mxgateway/client/MxGatewaySession.java:92-105` | `close()` delegates to `closeRaw()`, which performs a network RPC. When `MxGatewaySession` is used in try-with-resources and the body throws, a failure inside `closeSession` (e.g. `WORKER_UNAVAILABLE`) throws from `close()` and replaces th… |
| Client.Python-003 | Medium | Error handling & resilience | `clients/python/src/mxgateway/client.py:125-137,155-173` | `stream_events_raw` and `query_active_alarms` call the stub directly with a `timeout` kwarg when `stream_timeout` is set, with no `TypeError` fallback. `galaxy.py:watch_deploy_events` and `_unary` *do* have a fallback that strips `timeout`… |
| Client.Python-005 | Medium | Performance & resource management | `clients/python/src/mxgateway/galaxy.py:117-140` | `discover_hierarchy` pages through the entire Galaxy object hierarchy and accumulates every `GalaxyObject` (each carrying its full attribute list) into a single in-memory `list` before returning. For a large Galaxy this is a very large all… |
| Client.Python-009 | Medium | Testing coverage | `clients/python/tests/` | Several non-trivial public paths are untested: `Session.write2`/`add_item2` request construction; the bulk-size limit `_ensure_bulk_size`/`MAX_BULK_ITEMS` guard; the `None`-argument `TypeError` guards in bulk methods; the TLS `ca_file` rea… |
| Client.Rust-005 | Medium | Correctness & logic bugs | `clients/rust/src/session.rs:489-520` | `register_server_handle`, `add_item_handle`, and `add_item2_handle` fall through to `reply.return_value … .unwrap_or_default()`, returning `0` when the reply carries neither the expected typed payload nor an `Int32` `return_value`. Because… |
| Client.Rust-006 | Medium | Error handling & resilience | `clients/rust/src/session.rs:531-555` | `bulk_results` returns `Vec::new()` for any `(payload, kind)` combination that does not match the expected arm — including an OK reply carrying the wrong or no payload. A caller of `subscribe_bulk`/`add_item_bulk` then sees an empty result… |
| Contracts-002 | Medium | Error handling & resilience | `src/MxGateway.Contracts/Protos/mxaccess_gateway.proto:384-385`, `:95` | `MxCommandKind` includes `MX_COMMAND_KIND_ACKNOWLEDGE_ALARM_BY_NAME = 29` and `MxCommand.payload` carries `AcknowledgeAlarmByNameCommand acknowledge_alarm_by_name_command = 38`, but `MxCommandReply.payload` has only `acknowledge_alarm = 34… |
| IntegrationTests-003 | Medium | Correctness & logic bugs | `src/MxGateway.IntegrationTests/WorkerLiveMxAccessSmokeTests.cs:89-97` | The test asserts only on the first `MxEvent` recorded by `RecordingServerStreamWriter`. A live MXAccess provider can deliver an initial state/quality event whose family or handles differ from the expected `OnDataChange` (e.g. a registratio… |
| IntegrationTests-004 | Medium | Error handling & resilience | `src/MxGateway.IntegrationTests/WorkerLiveMxAccessSmokeTests.cs:108-111` | In the `finally` block, after `CloseSessionAsync`, the test does `await streamTask.WaitAsync(StreamShutdownTimeout)`. If closing the session does not promptly complete the stream (or `StreamEvents` itself faults), this throws `TimeoutExcep… |
@@ -57,6 +79,40 @@ Findings with status `Open` or `In Progress`, ordered by severity.
| Worker.Tests-005 | Medium | Performance & resource management | `src/MxGateway.Worker.Tests/Ipc/WorkerFrameProtocolTests.cs:20-31,103-105`, `src/MxGateway.Worker.Tests/Ipc/WorkerPipeSessionTests.cs:28-31` | `MemoryStream` instances are created and never disposed across the frame-protocol and pipe-session tests (`MemoryStream stream = new();` with no `using`). Disposal is cheap so impact is low, but it is inconsistent with the rest of the suit… |
| Worker.Tests-006 | Medium | Performance & resource management | `src/MxGateway.Worker.Tests/MxAccess/MxAccessStaSessionTests.cs:282,305,315,323` | `Dispose_StopsAlarmPollLoop` constructs `MxAccessStaSession session` without `using` (unlike every sibling test) and relies on an explicit `session.Dispose()`. If an assertion between `StartAsync` and `Dispose()` throws, the session — its… |
| Worker.Tests-007 | Medium | Design-document adherence | `docs/WorkerFrameProtocol.md:38-49` | `docs/WorkerFrameProtocol.md` instructs running `dotnet test src/MxGateway.Tests/MxGateway.Tests.csproj --filter WorkerFrameProtocolTests` and states the frame protocol "is part of `MxGateway.Server`". The frame protocol actually lives in… |
| Client.Dotnet-004 | Low | Error handling & resilience | `clients/dotnet/MxGateway.Client/MxGatewayClient.cs:283-294`, `clients/dotnet/MxGateway.Client/GalaxyRepositoryClient.cs:392-403` | `ExecuteSafeUnaryAsync` wraps the whole Polly retry pipeline in a single linked CTS cancelled after `Options.DefaultCallTimeout`, while `CreateCallOptions` also stamps each individual call with a `DefaultCallTimeout` gRPC deadline. The ret… |
| Client.Dotnet-005 | Low | Correctness & logic bugs | `clients/dotnet/MxGateway.Client/MxGatewaySession.cs:82,124,175` | `RegisterAsync`/`AddItemAsync`/`AddItem2Async` return `reply.<Typed>?.ServerHandle ?? reply.ReturnValue.Int32Value`. After `EnsureMxAccessSuccess()` passes, a missing typed payload silently falls back to `ReturnValue.Int32Value`, which for… |
| Client.Dotnet-006 | Low | Code organization & conventions | `clients/dotnet/MxGateway.Client/MxGatewayClientOptions.cs:50`, `clients/dotnet/MxGateway.Client/MxGatewayClientContractInfo.cs:10-14` | `MxGatewayClientOptions.MaxGrpcMessageBytes` and the two `const`s in `MxGatewayClientContractInfo` are public members with no XML doc comments, inconsistent with every other public member in the assembly and with the repo's documented C# s… |
| Client.Dotnet-007 | Low | Documentation & comments | `clients/dotnet/MxGateway.Client/MxGatewayClient.cs:185-192` | The `AcknowledgeAlarmAsync` XML comment states the gateway authenticates against an `invoke:alarm-ack` scope, but `CLAUDE.md` documents the scope set without any `invoke:alarm-ack` sub-scope. The comment may describe an intended finer-grai… |
| Client.Dotnet-008 | Low | Correctness & logic bugs | `clients/dotnet/MxGateway.Client.Cli/MxGatewayCliSecretRedactor.cs:9-17` | The CLI redactor only removes the API key string when it was supplied via `--api-key`; `RunCoreAsync` passes `arguments.GetOptional("api-key")` to `Redact`. When the key comes from an environment variable (`--api-key-env`, the documented d… |
| Client.Go-004 | Low | mxaccessgw conventions | `clients/go/mxgateway/alarms_test.go:153-154`, `clients/go/mxgateway/galaxy_test.go:58-59` | `gofmt -l` flags `alarms_test.go` and `galaxy_test.go` for misaligned struct-literal field padding. The Go client README lists `gofmt` as part of the workflow and the repo enforces style; unformatted committed code breaks `gofmt`-gated che… |
| Client.Go-005 | Low | Design-document adherence | `clients/go/mxgateway/client.go:64,68`, `clients/go/mxgateway/galaxy.go:83,87` | The client uses `grpc.DialContext` with `grpc.WithBlock()`. In current grpc-go both are deprecated in favour of `grpc.NewClient` (lazy connection). `WithBlock` also changes failure semantics: a transient gateway-unavailable at dial time be… |
| Client.Go-006 | Low | Error handling & resilience | `clients/go/mxgateway/errors.go:9-130` | `docs/ClientLibrariesDesign.md` recommends a high-level error taxonomy (`TransportError`, `AuthenticationError`, `TimeoutError`, etc.). The Go client collapses all transport/gRPC failures into a single `GatewayError` with no way to classif… |
| Client.Go-007 | Low | Correctness & logic bugs | `clients/go/mxgateway/session.go:526-532` | `newCorrelationID` returns an empty string when `crypto/rand.Read` fails, silently producing an `MxCommandRequest` with no correlation id. `rand.Read` failure is rare, but the failure mode (untraceable command, no error surfaced) is worse… |
| Client.Go-008 | Low | Testing coverage | `clients/go/mxgateway/` (test files) | Several critical paths are untested: TLS credential resolution in `resolveTransportCredentials` (only the `Plaintext` path is exercised); the `callContext` deadline-shortening logic (`client.go:198-204`) including the negative-timeout disa… |
| Client.Go-009 | Low | Code organization & conventions | `clients/go/mxgateway/galaxy.go:60-93,241-256`, `clients/go/mxgateway/client.go:41-74,190-205` | `DialGalaxy`/`Dial` and `GalaxyClient.callContext`/`Client.callContext` are near-identical duplicates (dial-context setup, credential resolution, dial-option assembly, deadline arithmetic). A fix to one (e.g. the Client.Go-005 dial migrati… |
| Client.Go-010 | Low | Documentation & comments | `clients/go/mxgateway/client.go:39-40` | The `Dial` doc comment states it configures "blocking dial cancellation from ctx." This describes the deprecated `WithBlock` behaviour; once Client.Go-005 is addressed the comment is misleading about how connection establishment and cancel… |
| Client.Java-006 | Low | Performance & resource management | `clients/java/mxgateway-client/src/main/java/com/dohertylan/mxgateway/client/MxGatewayClient.java:323-328`, `clients/java/mxgateway-client/src/main/java/com/dohertylan/mxgateway/client/GalaxyRepositoryClient.java:279-284` | `close()` (the `AutoCloseable` method invoked by try-with-resources) calls only `ownedChannel.shutdown()` and returns immediately without awaiting termination. In-flight calls and Netty event-loop threads may still be running when the call… |
| Client.Java-007 | Low | Testing coverage | `clients/java/mxgateway-client/src/test/java/com/dohertylan/mxgateway/client/` | The alarm surface — `acknowledgeAlarm`/`acknowledgeAlarmAsync`/`queryActiveAlarms` and `MxGatewayActiveAlarmsSubscription` — has zero test coverage. TLS channel construction, the async `streamEventsAsync` path, `MxGatewayEventSubscription`… |
| Client.Java-008 | Low | Error handling & resilience | `clients/java/mxgateway-client/src/main/java/com/dohertylan/mxgateway/client/MxGatewayClient.java:298-304` | `acknowledgeAlarmAsync` and `openSessionAsync` apply `ensureProtocolSuccess` inside `thenApply`. If that validator throws a non-`MxGatewayException` `RuntimeException` it is wrapped by `CompletionException` with no `fromGrpc` normalisation… |
| Client.Java-009 | Low | Code organization & conventions | `clients/java/mxgateway-client/src/main/java/com/dohertylan/mxgateway/client/GalaxyRepositoryClient.java:310-391`, `clients/java/mxgateway-client/src/main/java/com/dohertylan/mxgateway/client/MxGatewayClient.java:346-413` | `createChannel`, `withDeadline`, `withStreamDeadline`, and `toCompletable` are duplicated nearly verbatim across `MxGatewayClient` and `GalaxyRepositoryClient` (~80 lines). A fix to one will not propagate to the other. |
| Client.Java-010 | Low | Documentation & comments | `clients/java/mxgateway-client/src/main/java/com/dohertylan/mxgateway/client/MxGatewayClient.java:269-272`, `clients/java/README.md:76` | The `acknowledgeAlarm` Javadoc states the gateway authenticates against an `invoke:alarm-ack` scope, and the README states the Galaxy Repository requires a `metadata:read` scope. CLAUDE.md's documented scope set names neither — the Javadoc… |
| Client.Java-011 | Low | Performance & resource management | `clients/java/mxgateway-client/src/main/java/com/dohertylan/mxgateway/client/MxEventStream.java:37-63` | The event stream relies on default gRPC auto-inbound flow control: the async stub auto-requests messages, so the server can push faster than the 16-element bounded queue drains. A momentarily slow consumer triggers queue overflow and an im… |
| Client.Java-012 | Low | Correctness & logic bugs | `clients/java/mxgateway-cli/src/main/java/com/dohertylan/mxgateway/cli/MxGatewayCli.java:667-674` | `CommonOptions.resolved()` mutates `this` (`resolvedApiKey`, `resolvedTimeout`) and returns `this`, but `toClientOptions()` and `redactedJsonMap()` read those mutated fields. If `redactedJsonMap()` is ever called before `resolved()`, it si… |
| Client.Python-001 | Low | Documentation & comments | `clients/python/pyproject.toml:8,25`, `clients/python/src/mxgateway_cli/commands.py:25` | The package `description` in `pyproject.toml` still says "Async Python client *scaffold*" even though the client is fully implemented. Stale "scaffold" wording misrepresents maturity to anyone reading PyPI metadata. (The `mxgw-py` console-… |
| Client.Python-002 | Low | Code organization & conventions | `clients/python/src/mxgateway/__init__.py:27` | `MxGatewayCommandError` is imported into `__init__.py` and is a documented public exception, but it is missing from `__all__`. It is the parent of `MxAccessError` and a meaningful catch target, so omitting it from the public surface is inc… |
| Client.Python-004 | Low | Correctness & logic bugs | `clients/python/src/mxgateway_cli/commands.py:386,402-404` | In `_smoke`, the local variable `closed` is set to `False` and never reassigned; the `finally` block's `if not closed:` is therefore always true. This is dead/misleading code suggesting a removed early-close path. |
| Client.Python-006 | Low | Concurrency & thread safety | `clients/python/src/mxgateway/client.py:74-82`, `clients/python/src/mxgateway/galaxy.py:85-93`, `clients/python/src/mxgateway/session.py:38-55` | `close()` on the clients and `Session.close()` use a plain `self._closed` check-then-set with an `await` between, with no lock. If two coroutines call `close()` concurrently both can pass the guard before either sets it, causing a double `… |
| Client.Python-007 | Low | Error handling & resilience | `clients/python/src/mxgateway/client.py:204-213` | `_canceling_iterator` (gateway event stream) does not catch `asyncio.CancelledError` to invoke `call.cancel()` explicitly — it relies on the `finally` block. `galaxy.py:_canceling_iterator` *does* explicitly catch `CancelledError`, cancel,… |
| Client.Python-008 | Low | Correctness & logic bugs | `clients/python/src/mxgateway/values.py:62-67,83-88` | `to_mx_value` maps any Python `float` to `VT_R8`/`MX_DATA_TYPE_DOUBLE` with no handling for `nan`/`inf`, which are serialised and forwarded to MXAccess which may reject or mis-handle them. `bytes` is mapped to `VT_RECORD`/`MX_DATA_TYPE_UNK… |
| Client.Python-010 | Low | Code organization & conventions | `clients/python/src/mxgateway/session.py:404`, `clients/python/src/mxgateway_cli/commands.py:422-425` | `session.py` ends with a module-level late import `from .client import GatewayClient # noqa: E402` purely to satisfy a string type hint, and `commands.py:_session` does a function-local import. Both work around a circular dependency that `… |
| Client.Python-011 | Low | Error handling & resilience | `clients/python/src/mxgateway/errors.py:122-148` | `ensure_mxaccess_success` raises `MxAccessError` if any `mx_status.success == 0`. This treats `success == 0` as the failure sentinel, but `0` is also the proto3 scalar default for an unset `MxStatusProxy`. If the gateway ever returns a rep… |
| Client.Python-012 | Low | mxaccessgw conventions | `clients/python/src/mxgateway/client.py:84-108`, `clients/python/src/mxgateway/session.py:57-77` | `Session.invoke_raw` does not run `ensure_mxaccess_success` while `Session.invoke` does, so a caller using `invoke_raw` for parity tests gets a reply where an MXAccess HRESULT failure is silently embedded with no exception. This is by desi… |
| Client.Rust-004 | Low | Documentation & comments | `clients/rust/src/version.rs:7` | `CLIENT_VERSION` is `"0.1.0-dev"` and its doc comment claims "Mirrors `Cargo.toml`", but `Cargo.toml` declares `version = "0.1.0"` (no `-dev` suffix). The comment is misleading and the value is not actually kept in sync with the manifest. |
| Client.Rust-007 | Low | Design-document adherence | `clients/rust/RustClientDesign.md:14-55` | `RustClientDesign.md` is stale relative to the implemented code. It documents a nested `crates/mxgateway-client/` layout (the real crate root is `clients/rust/` with a flat `src/`), and lists `tracing` among "Expected dependencies", but `t… |
| Client.Rust-008 | Low | Performance & resource management | `clients/rust/src/value.rs:161-261` | `MxValueProjection::from_proto` and `MxArrayProjection::from_proto` deep-clone every element out of the wire message while `MxValue`/`MxArrayValue` also retain the original `raw` message. Every `MxValue` therefore holds two copies of its p… |
| Client.Rust-009 | Low | Testing coverage | `clients/rust/tests/client_behavior.rs`, `clients/rust/src/galaxy.rs` | Several critical paths are untested: TLS channel setup (`with_plaintext(false)` / CA-file loading), mid-stream `tonic::Status` fault propagation through `EventStream`/`DeployEventStream` (tests only send `Ok` items), and the bulk-size cap… |
| Client.Rust-010 | Low | Error handling & resilience | `clients/rust/src/client.rs:255-268`, `clients/rust/src/galaxy.rs:204-216` | The client applies only a per-call deadline via `Request::set_timeout` and has no retry, reconnect, or transient-vs-permanent classification. A transient `Unavailable` (e.g. a gateway restart) maps to the catch-all `Error::Status` and is i… |
| Client.Rust-011 | Low | mxaccessgw conventions | `clients/rust/src/session.rs:469` | `command_request` hard-codes `client_correlation_id` as `format!("rust-client-{}", kind.as_str_name())`. Every invocation of the same command kind on a session uses an identical correlation id, so the id cannot correlate a specific request… |
| Contracts-001 | Low | Design-document adherence | `docs/Grpc.md:13` (and `:3`, `:32`, `:39`) | `mxaccess_gateway.proto` now declares six RPCs on `MxAccessGateway` (`OpenSession`, `CloseSession`, `Invoke`, `StreamEvents`, `AcknowledgeAlarm`, `QueryActiveAlarms`). `docs/Grpc.md` still describes "the four `MxAccessGateway` RPCs" in its… |
| Contracts-003 | Low | Code organization & conventions | `src/MxGateway.Contracts/MxGateway.Contracts.csproj:10` | The `<Protobuf>` item for `mxaccess_worker.proto` omits `ProtoRoot="Protos"`, while the items for `mxaccess_gateway.proto` (line 9) and `galaxy_repository.proto` (line 11) both set it. `mxaccess_worker.proto` does `import "mxaccess_gateway… |
| Contracts-004 | Low | Documentation & comments | `src/MxGateway.Contracts/GatewayContractInfo.cs:3-6` | The XML summary says the class exposes version metadata "before generated protobuf contracts are introduced." Generated protobuf contracts have long been introduced and are consumed across the solution. The comment is stale; the class now… |
@@ -102,4 +158,7 @@ Findings with status `Open` or `In Progress`, ordered by severity.
Findings with status `Resolved`, `Won't Fix`, or `Deferred`.
_No closed findings._
| ID | Severity | Status | Category | Location |
|---|---|---|---|---|
| Server-001 | Critical | Resolved | Security | `src/MxGateway.Server/GatewayApplication.cs:147-149`, `src/MxGateway.Server/Dashboard/DashboardEndpointRouteBuilderExtensions.cs:55-58`, `src/MxGateway.Server/Dashboard/Components/Routes.razor:1-15` |
| Server-003 | High | Resolved | Security | `src/MxGateway.Server/Dashboard/DashboardAuthorizationHandler.cs:39,54-59`, `src/MxGateway.Server/Dashboard/DashboardAuthenticator.cs:236-258` |
+5 -5
View File
@@ -7,7 +7,7 @@
| Review date | 2026-05-18 |
| Commit reviewed | `6c64030` |
| Status | Reviewed |
| Open findings | 14 |
| Open findings | 12 |
## Checklist coverage
@@ -33,13 +33,13 @@
| Severity | Critical |
| Category | Security |
| Location | `src/MxGateway.Server/GatewayApplication.cs:147-149`, `src/MxGateway.Server/Dashboard/DashboardEndpointRouteBuilderExtensions.cs:55-58`, `src/MxGateway.Server/Dashboard/Components/Routes.razor:1-15` |
| Status | Open |
| Status | Resolved |
**Description:** The dashboard authorization policy (`DashboardAuthenticationDefaults.AuthorizationPolicy`), `DashboardAuthorizationRequirement`, and `DashboardAuthorizationHandler` are registered in DI but never applied to any endpoint. `MapRazorComponents<App>()` has no `.RequireAuthorization(...)`, the `<Router>` in `Routes.razor` uses plain `RouteView` (not `AuthorizeRouteView`), and no dashboard page carries `[Authorize]` — a module-wide grep finds zero `RequireAuthorization`/`[Authorize]`/`AuthorizeRouteView` usages. Every dashboard page (Sessions, Workers, Events, Galaxy, Settings, and the API Keys list exposing key IDs, scopes, and constraints) is reachable by any unauthenticated remote client regardless of `Dashboard:AllowAnonymousLocalhost` or `Dashboard:RequireAdminScope`. Only the API-key mutation operations remain protected, via the separate `DashboardApiKeyManagementService.CanManage` check.
**Recommendation:** Apply the policy at the route level — `endpoints.MapRazorComponents<App>().AddInteractiveServerRenderMode().RequireAuthorization(DashboardAuthenticationDefaults.AuthorizationPolicy)` — and/or switch `Routes.razor` to `AuthorizeRouteView` with a `[Authorize]` fallback policy plus a `NotAuthorized` redirect to the login page. Add an integration test that GETs a dashboard page anonymously and asserts 302-to-login / 401.
**Resolution:** _(open)_
**Resolution:** Resolved in `a8aafdf` (2026-05-18): `MapRazorComponents<App>()` now calls `.RequireAuthorization(DashboardAuthenticationDefaults.AuthorizationPolicy)`, so an unauthenticated request to any dashboard component route is challenged by the cookie scheme and redirected to the login page. `GatewayApplicationTests` gained `ComponentRoutesRequireAuthorization` (component routes carry the policy) and `AuthEndpointsAllowAnonymousAccess`, replacing the prior test that asserted the insecure behavior.
### Server-002
@@ -63,13 +63,13 @@
| Severity | High |
| Category | Security |
| Location | `src/MxGateway.Server/Dashboard/DashboardAuthorizationHandler.cs:39,54-59`, `src/MxGateway.Server/Dashboard/DashboardAuthenticator.cs:236-258` |
| Status | Open |
| Status | Resolved |
**Description:** When `Dashboard:RequireAdminScope` is true (the default) and the request is not loopback, `DashboardAuthorizationHandler` succeeds only if `HasAdminScope` finds a claim of type `"scope"` with value `"admin"`. But `DashboardAuthenticator.CreatePrincipal` issues only `NameIdentifier`, `Name`, and `LdapGroupClaimType` claims — never a `scope`/`admin` claim. So a correctly LDAP-authenticated user who passed the required-group check is still denied dashboard access on any non-loopback connection. The bug is currently masked by the missing route-level enforcement (Server-001) and by `AllowAnonymousLocalhost`; fixing Server-001 would make the dashboard unusable for all real LDAP logins.
**Recommendation:** Either have `DashboardAuthenticator.CreatePrincipal` add a `scope=admin` claim when the user is in the required group, or change `DashboardAuthorizationHandler.HasAdminScope` to evaluate LDAP group membership (reuse `IsMemberOfRequiredGroup` against the `LdapGroupClaimType` claims, as `DashboardApiKeyAuthorization.CanManage` already does).
**Resolution:** _(open)_
**Resolution:** Resolved in `a8aafdf` (2026-05-18): `DashboardAuthenticator.CreatePrincipal` — reached only after the required-group check passes — now emits the `scope=admin` claim that `DashboardAuthorizationHandler` checks, so group-validated LDAP users pass `RequireAdminScope` once route-level authorization (Server-001) is enforced.
### Server-004