fix: resolve code-review findings (locally verified)

Server-054/055/056, Contracts-020/021/022, Tests-036/038/039,
IntegrationTests-030/031/032 (+033 deferred to live rig),
Client.Dotnet-026/028/029 (+027 won't-fix), Client.Go-030..034,
Client.Python-032..036, Client.Rust-033..038.

Key fix: SessionEventDistributor orphaned a subscriber that registered after
the pump completed but before disposal (Server-056) -> register paths now
complete late registrants under _lifecycleLock; regression test added. The
racy dashboard-mirror gRPC test made deterministic (Tests-039).

Verified green locally: gateway Tests targeted classes (GatewaySession,
SessionEventDistributor, GatewayOptionsValidator, ProtobufContractRoundTrip,
GatewaySessionDashboardMirror) + dotnet/go/python/rust client suites.
This commit is contained in:
Joseph Doherty
2026-06-17 05:23:14 -04:00
parent 25d04ec37e
commit 6b5fe6aa82
37 changed files with 1049 additions and 211 deletions
+9 -9
View File
@@ -7,7 +7,7 @@
| Review date | 2026-06-16 |
| Commit reviewed | `8df5ab3` |
| Status | Re-reviewed |
| Open findings | 4 |
| Open findings | 0 |
## Checklist coverage
@@ -628,13 +628,13 @@ Re-review of the .NET client delta: `LazyBrowseNode` lazy paging + tests, the ne
| Severity | Low |
| Category | Correctness & logic bugs |
| Location | `clients/dotnet/.../MxGatewayClientCli.cs:306` (isLongRunning) |
| Status | Open |
| Status | Resolved |
**Description:** Client.Dotnet-015 extended `isLongRunning` to include the bench commands so they aren't silently cancelled by the default 30s CTS. The new `galaxy-browse` command is NOT in `isLongRunning`. A `galaxy-browse --depth N` tree walk on a large Galaxy can exceed 30s (sequential paginated RPCs per node); the CTS fires and the OCE escapes as a non-zero exit with no output — the same silent failure the bench commands were exempted from.
**Recommendation:** Add `"galaxy-browse"` to the `isLongRunning` set alongside `galaxy-watch`/bench, so it defaults to unlimited wall-clock and only applies `CancelAfter` with an explicit `--timeout`.
**Resolution:** _(empty until closed)_
**Resolution:** 2026-06-16 — Confirmed against source: `CreateCancellation`'s `isLongRunning` expression at line 306 read `command is "galaxy-watch"` only — `galaxy-browse` was absent, so the default 30 s `CancelAfter` budget applied and a deep paginated tree walk that overran it would have the OCE escape as a non-zero exit with no output. (Note: at HEAD the bench commands the finding cites are also not in this set despite Client.Dotnet-015's recorded resolution, but per the task scope only `galaxy-browse` is added here.) Changed the expression to `command is "galaxy-watch" or "galaxy-browse"`, so `galaxy-browse` now runs to completion by default and only applies `CancelAfter` when the caller supplies an explicit `--timeout`. Pure correctness fix matching the existing `galaxy-watch` precedent.
### Client.Dotnet-027
@@ -643,13 +643,13 @@ Re-review of the .NET client delta: `LazyBrowseNode` lazy paging + tests, the ne
| Severity | Low |
| Category | Performance & resource management |
| Location | `clients/dotnet/ZB.MOM.WW.MxGateway.Client/LazyBrowseNode.cs:15` |
| Status | Open |
| Status | Won't Fix |
**Description:** `LazyBrowseNode` allocates one `SemaphoreSlim _expandLock = new(1,1)` per node and never disposes it (the type is not IDisposable). For a large Galaxy browse tree (thousands of nodes), live SemaphoreSlim instances accumulate; OS handles are released only on finalization. Negligible for small trees, meaningful for long-lived large trees.
**Recommendation:** Replace the once-only async gate with a non-disposable primitive (e.g. `Lazy<Task>`-based dedup) or make `LazyBrowseNode` IDisposable and dispose the semaphore. Document the chosen lifetime contract.
**Resolution:** _(empty until closed)_
**Resolution:** 2026-06-16 — **Won't Fix.** The finding's premise — that the undisposed semaphore leaks an OS handle until finalization — does not hold for this usage. `SemaphoreSlim` only allocates a kernel wait handle (`ManualResetEvent`) lazily, the first time its `AvailableWaitHandle` property is accessed; `LazyBrowseNode` uses the gate exclusively via `WaitAsync`/`Release` and never touches `AvailableWaitHandle` (verified by grep), so no unmanaged/OS handle is ever created. The semaphore is therefore pure managed memory whose lifetime is the node's and which is reclaimed by the GC with the node — `SemaphoreSlim.Dispose()` would have nothing to release. Making the type `IDisposable` (or restructuring to a `Lazy<Task>` gate) would change the public surface and push per-node disposal onto every tree consumer (thousands of nodes) for zero resource benefit, so it is not worth the over-engineering. Added an inline code comment at `LazyBrowseNode.cs:15` documenting this lifetime contract and the no-handle rationale so the design intent is explicit. No test added (no behavior change).
### Client.Dotnet-028
@@ -658,13 +658,13 @@ Re-review of the .NET client delta: `LazyBrowseNode` lazy paging + tests, the ne
| Severity | Medium |
| Category | Security |
| Location | `clients/dotnet/.../MxGatewayClientCli.cs:156` |
| Status | Open |
| Status | Resolved |
**Description:** Client.Dotnet-008 was recorded resolved by adding a `TryResolveApiKey` helper resolving both `--api-key` and the `--api-key-env` env-var path, wired into the error-redaction catch block. At HEAD the catch block reads `arguments.GetOptional("api-key")` only — the pre-008 code. When the key is sourced from the env var, `GetOptional("api-key")` returns null, `Redact(message, null)` is a no-op, and an exception message echoing the bearer key would print it raw to stderr. The existing regression test only covers the `--api-key` direct path, so it passes against the broken code. (Claimed regression — verify root cause before fixing.)
**Recommendation:** Restore the `TryResolveApiKey` pattern (resolve `--api-key` then the `--api-key-env`-named env var, default `MXGATEWAY_API_KEY`) in the catch block, and add a regression test that sources the key from the env var and asserts it is redacted in stderr.
**Resolution:** _(empty until closed)_
**Resolution:** 2026-06-16 — **Confirmed: real regression.** The `RunCoreAsync` catch block at line 156 resolved the redaction key via `arguments.GetOptional("api-key")` only, and no `TryResolveApiKey` helper existed anywhere in the CLI project (verified by grep) — the Client.Dotnet-008 helper had been lost from the history reaching HEAD, same as the 012/013/022/023 props/doc regressions. On the `--api-key-env` path `GetOptional("api-key")` is null, so `Redact(message, null)` was a no-op and a transport error echoing the bearer token would have reached stderr unredacted. Restored a non-throwing `TryResolveApiKey(CliArguments)` helper that resolves `--api-key` then the `--api-key-env`-named env var (default `MXGATEWAY_API_KEY`) and returns null when neither is set; refactored `ResolveApiKey` to call it (so the resolution order stays single-sourced) and changed the catch block to redact `TryResolveApiKey(arguments)` instead of `GetOptional("api-key")`. Regression test `MxGatewayClientCliTests.RunAsync_ErrorOutput_RedactsApiKey_WhenSourcedFromEnvironmentVariable` sets a dedicated env var (`MXGATEWAY_TEST_API_KEY_028`), runs `open-session --api-key-env <name>` (no `--api-key` flag) against a client factory that throws an `InvalidOperationException` whose message embeds the secret, and asserts exit 1, that the secret is absent from stderr, and that `[redacted]` is present. The pre-existing `--api-key`-path test (`RunAsync_ErrorOutput_RedactsApiKey`) is retained; the new test fails against the `GetOptional("api-key")`-only catch block (key printed raw) and passes after the fix.
### Client.Dotnet-029
@@ -673,10 +673,10 @@ Re-review of the .NET client delta: `LazyBrowseNode` lazy paging + tests, the ne
| Severity | Low |
| Category | Code organization & conventions |
| Location | `clients/dotnet/.../IMxGatewayCliClient.cs:6` |
| Status | Open |
| Status | Resolved |
**Description:** `IMxGatewayCliClient` is a public interface with no type-level `<summary>` XML doc. The Client.Dotnet-013 resolution recorded adding one; at HEAD it is absent. No CS1591 fires (GenerateDocumentationFile now scoped to the packable library only), but the public extension point should follow the public-surface doc convention.
**Recommendation:** Add a one-line `<summary>` describing the interface and noting `MxGatewayCliClientAdapter` is the production binding.
**Resolution:** _(empty until closed)_
**Resolution:** 2026-06-16 — Confirmed against source: the interface declaration at `IMxGatewayCliClient.cs:6` had no type-level `<summary>` (only the members were documented). Added a type-level `<summary>` describing the interface as the CLI's transport seam over the gateway and Galaxy Repository RPCs, naming `MxGatewayCliClientAdapter` (over a real `MxGatewayClient`) as the production binding and the in-memory fake as the test substitute. Pure documentation change — no test needed.
+11 -11
View File
@@ -7,7 +7,7 @@
| Review date | 2026-06-16 |
| Commit reviewed | `8df5ab3` |
| Status | Re-reviewed |
| Open findings | 5 |
| Open findings | 0 |
## Checklist coverage
@@ -731,13 +731,13 @@ if ($dirty) {
| Severity | Medium |
| Category | Concurrency & thread safety |
| Location | `clients/go/cmd/mxgw-go/main.go:1491-1494` |
| Status | Open |
| Status | Resolved |
**Description:** `runGalaxyWatch`'s limit-reached branch calls `cancelStream()` and returns WITHOUT draining the buffered `events` channel, unlike the signal-cancel branch which drains. This is the shape Client.Go-013's resolution claimed to have fixed ("now drains via for range events"). The WatchDeployEvents goroutine may still be blocked sending into the 16-deep channel; it exits via ctx cancellation (not a permanent leak) but remains alive until that propagates, racing `defer client.Close()`. (Claimed regression — verify root cause.)
**Recommendation:** After `cancelStream()` in the limit-reached branch, drain: `for range events {}`, mirroring the signal-cancel branch.
**Resolution:** _(empty until closed)_
**Resolution:** 2026-06-16 — Confirmed real: the limit-reached branch returned right after `cancelStream()` while the signal-cancel branch drained `events`, so the buffered (16-deep) `WatchDeployEvents` producer could remain blocked on a send while `defer client.Close()` tore the stream down. Added the `for range events {}` drain to the limit-reached branch, mirroring the signal-cancel branch. Behaviour exercised by the existing `runGalaxyWatch` flow; verified via `go vet`/`go build`/`go test ./...`.
### Client.Go-031
@@ -746,13 +746,13 @@ if ($dirty) {
| Severity | Low |
| Category | Correctness & logic bugs |
| Location | `clients/go/cmd/mxgw-go/main.go:1037-1046` |
| Status | Open |
| Status | Resolved |
**Description:** `closeSmokeSession` registers `defer cancel()` twice on the same `cancel` variable across two `context.WithTimeout` calls when the deadline-shortening branch fires. Because `cancel` is reassigned, both defers end up calling the second context's cancel (idempotent, harmless today), while the first context is released by an explicit `cancel()`. The double-defer-on-reassigned-variable is fragile: removing the explicit `cancel()` in a future refactor would leak the first context's timer goroutine.
**Recommendation:** Use a distinct variable for the second cancel, or compute the close timeout once before allocating a single context.
**Resolution:** _(empty until closed)_
**Resolution:** 2026-06-16 — Confirmed real. Rewrote `closeSmokeSession` to compute the close timeout once (default 5s, shortened to the caller's remaining deadline when sooner) and then allocate a single `context.WithTimeout` with a single `defer cancel()`, removing the reassigned-variable double-defer entirely.
### Client.Go-032
@@ -761,13 +761,13 @@ if ($dirty) {
| Severity | Low |
| Category | Code organization & conventions |
| Location | `clients/go/cmd/mxgw-go/main.go:839-841` |
| Status | Open |
| Status | Resolved |
**Description:** `runStreamEvents` does not install a `signal.NotifyContext` handler, while `runStreamAlarms` and `runGalaxyWatch` do. Client.Go-020's resolution claimed this was added. Without a signal-aware parent context, Ctrl+C kills the process without running `defer subscription.Close()`/`client.Close()`, so the gateway sees a torn connection rather than a clean `codes.Canceled`. (Claimed regression — verify root cause.)
**Recommendation:** Wrap `ctx` with `signal.NotifyContext(ctx, os.Interrupt, syscall.SIGTERM)` (defer the stop) before deriving `streamCtx`, matching the other two stream commands.
**Resolution:** _(empty until closed)_
**Resolution:** 2026-06-16 — Confirmed real: `runStreamEvents` derived `streamCtx` directly from `ctx` with no signal handler (and `runStreamAlarms` even carried a "Mirror runStreamEvents" comment that no longer matched). Added `signal.NotifyContext(ctx, os.Interrupt, syscall.SIGTERM)` (with `defer stopSignals()`) before deriving `streamCtx`, so Ctrl+C/SIGTERM cancels the stream cleanly (gateway sees `codes.Canceled`) and the deferred `subscription.Close()`/`client.Close()` run. Imports already present. CLI guard covered by `TestRunStreamEventsRequiresSessionID`.
### Client.Go-033
@@ -776,13 +776,13 @@ if ($dirty) {
| Severity | Low |
| Category | Testing coverage |
| Location | `clients/go/cmd/mxgw-go/main_test.go` |
| Status | Open |
| Status | Resolved |
**Description:** Gaps vs prior coverage: (1) `TestRunBenchReadBulkRejectsNonPositiveDuration` (named in Client.Go-021's resolution) is absent — the `-duration-seconds`-positive guard at main.go:619 is untested; (2) `runStreamEvents` has no CLI-level test (session-id-required and limit paths untested); (3) `TestRunWriteBulkVariantRejectsMismatchedHandlesAndValues` (Client.Go-021 deliverable) is absent — the len-mismatch guard at main.go:508-510 is untested.
**Recommendation:** Add the three missing tests; all run through `runWithIO` without a fake server (except the stream-events one which can reuse the ping test's fake-server pattern).
**Resolution:** _(empty until closed)_
**Resolution:** 2026-06-16 — Confirmed all three tests absent. Added them to `cmd/mxgw-go/main_test.go`, each driving `runWithIO` and asserting the guard error before any dial: `TestRunBenchReadBulkRejectsNonPositiveDuration` (`-duration-seconds 0` → "duration-seconds must be positive"), `TestRunStreamEventsRequiresSessionID` (no `-session-id` → "session-id is required"), and `TestRunWriteBulkVariantRejectsMismatchedHandlesAndValues` (2 handles / 1 value → "does not match values count"). All three pass under `go test ./...`.
### Client.Go-034
@@ -791,10 +791,10 @@ if ($dirty) {
| Severity | Low |
| Category | Documentation & comments |
| Location | `clients/go/README.md:245-263` |
| Status | Open |
| Status | Resolved |
**Description:** The README CLI example table lists ~12 commands but the binary now exposes ~27 subcommands (per `writeUsage`). Absent: `ping`, `galaxy-browse`, `batch`, `read-bulk`, `write-bulk`, `write2-bulk`, `write-secured-bulk`, `write-secured2-bulk`, `bench-read-bulk`, `stream-alarms`, `acknowledge-alarm`, and more. `batch` (the cross-language harness interface with an EOR sentinel + 16 MiB line cap) is undocumented entirely.
**Recommendation:** Add a complete subcommand reference, and document the `batch` EOR-sentinel protocol and line cap.
**Resolution:** _(empty until closed)_
**Resolution:** 2026-06-16 — Expanded the README CLI section with a "Subcommand reference" table covering all 27 subcommands wired into `run` (incl. `ping`, `galaxy-browse`, `read-bulk`, the four bulk-write variants, `bench-read-bulk`, `stream-alarms`, `acknowledge-alarm`, `batch`), refreshed the example block, and added a "`batch` mode" subsection documenting the `__MXGW_BATCH_EOR__` end-of-result sentinel, the JSON error framing, blank-line skipping, and the 16 MiB scanner line cap.
+11 -11
View File
@@ -7,7 +7,7 @@
| Review date | 2026-06-16 |
| Commit reviewed | `8df5ab3` |
| Status | Re-reviewed |
| Open findings | 5 |
| Open findings | 0 |
## Checklist coverage
@@ -1463,13 +1463,13 @@ passed, 1 skipped, 0 warnings; previously 1 warning). The `tls`-marked
| Severity | Low |
| Category | Correctness & logic bugs |
| Location | `clients/python/src/zb_mom_ww_mxgateway_cli/commands.py:1048,1065-1066` |
| Status | Open |
| Status | Resolved |
**Description:** `_smoke` reintroduces the dead `closed = False` / `if not closed:` guard that Client.Python-004's resolution claimed to have removed via `async with session:`. `closed` is never reassigned, so the guard is always true. Behavior is correct (session always closed) but the dead variable misleads readers into expecting an early-close path. (Claimed regression — verify root cause.)
**Recommendation:** Use `async with session:` or drop the `closed` variable and close unconditionally.
**Resolution:** _(empty until closed)_
**Resolution:** 2026-06-16 — Confirmed regression: the dead `closed = False` / `if not closed:` guard had returned. Replaced the `try/finally` with `async with session:` (Session implements the async context-manager protocol). Test: `test_smoke_does_not_carry_dead_closed_guard` in `tests/test_review_findings_032_to_036.py`.
### Client.Python-033
@@ -1478,13 +1478,13 @@ passed, 1 skipped, 0 warnings; previously 1 warning). The `tls`-marked
| Severity | Low |
| Category | Correctness & logic bugs |
| Location | `clients/python/src/zb_mom_ww_mxgateway_cli/commands.py:772,1490-1494` |
| Status | Open |
| Status | Resolved |
**Description:** `_parse_string_list` always emits `param_hint="--items"`, but it is also called from `_build_write_bulk_entries` with `kwargs["values"]`. An empty `--values ""` on the write-bulk commands yields `Error: Invalid value for '--items': ...`, pointing at a flag that doesn't exist on those commands.
**Recommendation:** Add an optional `param_hint` parameter (default `--items`) and pass `--values` from the write-bulk caller.
**Resolution:** _(empty until closed)_
**Resolution:** 2026-06-16 — Added `param_hint="--items"` default param to `_parse_string_list`; `_build_write_bulk_entries` now passes `param_hint="--values"`. Tests: `test_parse_string_list_default_param_hint_is_items`, `test_parse_string_list_accepts_caller_supplied_param_hint` in `tests/test_review_findings_032_to_036.py`.
### Client.Python-034
@@ -1493,13 +1493,13 @@ passed, 1 skipped, 0 warnings; previously 1 warning). The `tls`-marked
| Severity | Low |
| Category | Correctness & logic bugs |
| Location | `clients/python/src/zb_mom_ww_mxgateway_cli/commands.py:1497-1501` |
| Status | Open |
| Status | Resolved |
**Description:** `_parse_int_list` does `int(item)` with no error handling. A non-numeric token (e.g. `--item-handles "10,abc"`) raises a raw `ValueError`, surfacing as an unformatted traceback interactively (other input errors raise `click.BadParameter`).
**Recommendation:** Wrap the conversion and re-raise as `click.BadParameter(..., param_hint="--item-handles")`.
**Resolution:** _(empty until closed)_
**Resolution:** 2026-06-16 — Wrapped the `int()` comprehension in `try/except ValueError` and re-raise as `click.BadParameter(..., param_hint="--item-handles")`. Tests: `test_parse_int_list_non_numeric_raises_bad_parameter`, `test_parse_int_list_happy_path` in `tests/test_review_findings_032_to_036.py`.
### Client.Python-035
@@ -1508,13 +1508,13 @@ passed, 1 skipped, 0 warnings; previously 1 warning). The `tls`-marked
| Severity | Low |
| Category | Code organization & conventions |
| Location | `clients/python/src/zb_mom_ww_mxgateway/__init__.py`, `.../options.py:63-77`, `.../galaxy.py:293` |
| Status | Open |
| Status | Resolved |
**Description:** Two new public types — `BrowseChildrenOptions` (options.py) and `LazyBrowseNode` (galaxy.py) — are absent from `__init__.py`/`__all__`, so callers can't `from zb_mom_ww_mxgateway import BrowseChildrenOptions`, breaking the package-root import contract that `ClientOptions`/`GatewayClient`/etc. follow.
**Recommendation:** Re-export both from `__init__.py` and add them to `__all__`.
**Resolution:** _(empty until closed)_
**Resolution:** 2026-06-16 — Re-exported `BrowseChildrenOptions` (from `.options`) and `LazyBrowseNode` (from `.galaxy`) in `__init__.py` and added both to `__all__`. Tests: `test_browse_children_options_is_exported_from_package_root`, `test_lazy_browse_node_is_exported_from_package_root` in `tests/test_review_findings_032_to_036.py`.
### Client.Python-036
@@ -1523,10 +1523,10 @@ passed, 1 skipped, 0 warnings; previously 1 warning). The `tls`-marked
| Severity | Medium |
| Category | Documentation & comments |
| Location | `clients/python/README.md:143-158` |
| Status | Open |
| Status | Resolved |
**Description:** The README "Browsing lazily" section's code example calls `galaxy.browse_children(...)`, a method that does not exist — the actual public low-level method is `browse_children_raw`. The example raises `AttributeError` at runtime. The README-parse test only covers shell CLI invocations, not Python code fragments, so it doesn't catch this.
**Recommendation:** Update the example/prose to `browse_children_raw(...)` (and promote the high-level `browse()`/`LazyBrowseNode` path), or add a `browse_children` alias. Add a `hasattr` test to catch future renames.
**Resolution:** _(empty until closed)_
**Resolution:** 2026-06-16 — Updated the README "Browsing lazily" prose and example to `browse_children_raw(...)` and added a pointer to the higher-level `browse()`/`LazyBrowseNode` walker. Tests: `test_galaxy_client_exposes_browse_children_raw` (hasattr guard) and `test_readme_browse_example_uses_existing_method` (parses every `galaxy.<method>()` call in README against the client class) in `tests/test_review_findings_032_to_036.py`.
+13 -13
View File
@@ -7,7 +7,7 @@
| Review date | 2026-06-16 |
| Commit reviewed | `8df5ab3` |
| Status | Re-reviewed |
| Open findings | 6 |
| Open findings | 0 |
## Checklist coverage
@@ -787,13 +787,13 @@ This is masked by the tests: `tls_with_require_certificate_validation_does_not_s
| Severity | Medium |
| Category | Correctness & logic bugs |
| Location | `clients/rust/crates/mxgw-cli/src/main.rs:485` |
| Status | Open |
| Status | Resolved |
**Description:** `ConnectionArgs::options()` computes plaintext as `!self.tls || self.plaintext`. With both `--tls` and `--plaintext` supplied, this is `true`, silently degrading to an unencrypted channel despite the explicit `--tls`. A security-sensitive footgun (e.g. a script auto-appending `--plaintext`).
**Recommendation:** Add clap `conflicts_with = "tls"` on `--plaintext` (reject the combo), or prefer `--tls` and warn.
**Resolution:** _(empty until closed)_
**Resolution:** 2026-06-16 — Added `conflicts_with = "tls"` to the `--plaintext` arg so supplying both is rejected at parse time, removing the silent downgrade. Tests: `connection_rejects_tls_and_plaintext_together`, `connection_tls_flag_disables_plaintext`, `connection_defaults_to_plaintext`, `connection_plaintext_flag_selects_plaintext`.
### Client.Rust-034
@@ -802,13 +802,13 @@ This is masked by the tests: `tls_with_require_certificate_validation_does_not_s
| Severity | Low |
| Category | Correctness & logic bugs |
| Location | `clients/rust/crates/mxgw-cli/src/main.rs:48-51,548` |
| Status | Open |
| Status | Resolved |
**Description:** `Command::Version` carries a `jsonl: bool` field that is never read; the dispatch arm matches `{ json, .. }` and discards `jsonl`. `mxgw version --jsonl` silently behaves as plain text.
**Recommendation:** Handle `jsonl` in the Version arm (treat like `--json`) or remove the unused field.
**Resolution:** _(empty until closed)_
**Resolution:** 2026-06-16 — Removed the unused `jsonl` field from `Command::Version` (version output is a single record, not a stream); the dispatch arm now matches `{ json }` exhaustively, so `mxgw version --jsonl` errors as an unknown flag instead of silently being ignored. No test (CLI surface change verified by build).
### Client.Rust-035
@@ -817,13 +817,13 @@ This is masked by the tests: `tls_with_require_certificate_validation_does_not_s
| Severity | Low |
| Category | Security |
| Location | `clients/rust/crates/mxgw-cli/src/main.rs:489-495` |
| Status | Open |
| Status | Resolved |
**Description:** `--api-key-env` (default `MXGATEWAY_API_KEY`) names an env var read into an `ApiKey` Bearer token, but its clap help has no description of the expected value format. A user pointing it at another credential's env var would silently forward that credential to the gateway as a Bearer token. Low risk (redacted Debug; bounded to user's own shell) but an implicit-trust gap.
**Recommendation:** Add help text stating the variable must hold a value of the form `mxgw_<key-id>_<secret>`.
**Resolution:** _(empty until closed)_
**Resolution:** 2026-06-16 — Added clap doc-comment help to `--api-key-env` stating the variable's value must be a full gateway key of the form `mxgw_<key-id>_<secret>` and is forwarded verbatim as the Bearer token. Doc/help-only change, no test.
### Client.Rust-036
@@ -832,13 +832,13 @@ This is masked by the tests: `tls_with_require_certificate_validation_does_not_s
| Severity | Low |
| Category | Design-document adherence |
| Location | `clients/rust/RustClientDesign.md:351` |
| Status | Open |
| Status | Resolved |
**Description:** The new `galaxy browse` subcommand (with its filter/depth/json flags) is not listed in the "Test CLI" command table in RustClientDesign.md, which still reads `galaxy {test-connection,last-deploy-time,discover-hierarchy,watch}`.
**Recommendation:** Add `mxgw galaxy browse [...flags]` and note `--depth 0` = requested level only, `--depth N` eagerly expands, and `--parent-gobject-id` makes `--depth` a no-op.
**Resolution:** _(empty until closed)_
**Resolution:** 2026-06-16 — Added the `mxgw galaxy browse` line (with all flags) to the CLI table and a paragraph documenting that `--depth 0` prints only the requested level, `--depth N` eagerly expands N further levels, and `--parent-gobject-id` makes `--depth` a no-op. Doc-only change, no test.
### Client.Rust-037
@@ -847,13 +847,13 @@ This is masked by the tests: `tls_with_require_certificate_validation_does_not_s
| Severity | Low |
| Category | Design-document adherence |
| Location | `clients/rust/README.md:164-179` |
| Status | Open |
| Status | Resolved |
**Description:** The README "Browsing lazily" example calls `galaxy.browse_children(...).await?.into_inner()`, but the public API is `GalaxyClient::browse_children_raw` (the bare `browse_children` is the generated proto-client method, not public; and `browse_children_raw` returns the reply struct directly, no `.into_inner()`). The example would not compile.
**Recommendation:** Replace with `galaxy.browse_children_raw(BrowseChildrenRequest::default()).await?` (drop `.into_inner()`).
**Resolution:** _(empty until closed)_
**Resolution:** 2026-06-16 — Verified `browse_children_raw` is the public method (galaxy.rs:302) and returns `BrowseChildrenReply` directly. Updated the README prose and example to call `browse_children_raw(...).await?` without `.into_inner()`. Doc-only change, no test.
### Client.Rust-038
@@ -862,10 +862,10 @@ This is masked by the tests: `tls_with_require_certificate_validation_does_not_s
| Severity | Low |
| Category | Testing coverage |
| Location | `clients/rust/crates/mxgw-cli/src/main.rs:2336-2564` |
| Status | Open |
| Status | Resolved |
**Description:** Three CLI test gaps: (1) `ConnectionArgs::options()` `--tls`/`--plaintext` resolution (incl. the both-set path of Client.Rust-033) is untested; (2) `browse_children_one_level`'s repeated-page-token guard is untested; (3) `parse_rfc3339_timestamp` has no error-path tests (trailing chars, day=0, month 13, out-of-range day).
**Recommendation:** Add unit tests for all three (none need a network connection).
**Resolution:** _(empty until closed)_
**Resolution:** 2026-06-16 — Added all three test groups. (1) `--tls`/`--plaintext` resolution: `connection_defaults_to_plaintext`, `connection_tls_flag_disables_plaintext`, `connection_plaintext_flag_selects_plaintext`, `connection_rejects_tls_and_plaintext_together`. (2) Extracted the page-token dedup guard into pure `register_page_token` and covered it with `register_page_token_accepts_distinct_tokens_and_rejects_repeats`. (3) RFC3339 error paths: `rfc3339_parser_rejects_trailing_characters`, `rfc3339_parser_rejects_day_zero`, `rfc3339_parser_rejects_month_thirteen`, `rfc3339_parser_rejects_day_out_of_range_for_month`.
+7 -7
View File
@@ -7,7 +7,7 @@
| Review date | 2026-06-16 |
| Commit reviewed | `8df5ab3` |
| Status | Re-reviewed |
| Open findings | 3 |
| Open findings | 0 |
## Checklist coverage
@@ -433,13 +433,13 @@ Re-review of the proto delta (`git diff 410acc9..8df5ab3 -- .../Protos/`): the n
| Severity | Low |
| Category | Design-document adherence |
| Location | `gateway.md:1087,1101-1102` |
| Status | Open |
| Status | Resolved |
**Description:** gateway.md still lists "no reconnectable sessions" under "Resolved for v1" and lists "reconnectable sessions" / "multi-subscriber event fan-out" as post-v1 revisit items. The shipped `ReplayGap` reconnect-replay contract and multi-subscriber fan-out (documented in docs/Sessions.md) contradict this. docs/Sessions.md was updated; gateway.md's scope summary was left stale.
**Recommendation:** Update the gateway.md Resolved/Post-v1 lists to reflect that reconnectable sessions (via `after_worker_sequence` + `ReplayGap`) and multi-subscriber fan-out have shipped, cross-referencing docs/Sessions.md.
**Resolution:** _(empty until closed)_
**Resolution:** _(2026-06-16)_ Updated `gateway.md` "Resolved for v1" list: replaced "no reconnectable sessions" / "one active event subscriber" with bullet points describing the shipped reconnect-replay (`after_worker_sequence` + `ReplayGap` sentinel, cross-referencing `docs/Sessions.md`) and multi-subscriber fan-out (single-subscriber fail-fast vs. multi-subscriber per-consumer disconnect, cross-referencing `docs/Sessions.md`). Removed "reconnectable sessions" and "multi-subscriber event fan-out" from the Post-v1 revisit list. Updated the backpressure bullet to mention both modes.
### Contracts-021
@@ -448,13 +448,13 @@ Re-review of the proto delta (`git diff 410acc9..8df5ab3 -- .../Protos/`): the n
| Severity | Low |
| Category | Documentation & comments |
| Location | `src/ZB.MOM.WW.MxGateway.Contracts/Protos/mxaccess_gateway.proto:731-733` |
| Status | Open |
| Status | Resolved |
**Description:** The `replay_gap` field comment ends with "(Reconnect/replay logic is Task 12; this is the contract surface only.)". That parenthetical is now stale — the reconnect/replay logic has shipped and is exercised by EventStreamServiceTests/SessionEventDistributorTests. A reader is misled into thinking only the contract exists.
**Recommendation:** Drop the "Task 12 / contract surface only" parenthetical; the rest of the comment is accurate.
**Resolution:** _(empty until closed)_
**Resolution:** _(2026-06-16)_ Removed the stale "(Reconnect/replay logic is Task 12; this is the contract surface only.)" parenthetical from the `replay_gap` field comment in `mxaccess_gateway.proto`. The "Additive (proto3):" sentence before it is retained. Comment-only change; no wire-format or generated-type impact.
### Contracts-022
@@ -463,10 +463,10 @@ Re-review of the proto delta (`git diff 410acc9..8df5ab3 -- .../Protos/`): the n
| Severity | Low |
| Category | Testing coverage |
| Location | `src/ZB.MOM.WW.MxGateway.Tests/Contracts/ProtobufContractRoundTripTests.cs` |
| Status | Open |
| Status | Resolved |
**Description:** No round-trip / descriptor pin exists for the new `ReplayGap` message or `MxEvent.replay_gap` (field 14). The field is exercised functionally end-to-end, but there is no contract-level pin to catch a future renumber/type-narrowing of `replay_gap = 14` or the two `ReplayGap` sequence-field numbers — the same gap class as Contracts-007/010/018.
**Recommendation:** Add a round-trip test setting `MxEvent.ReplayGap` with both sequence fields, asserting `BodyCase == None`, plus a descriptor assertion pinning `ReplayGapFieldNumber == 14` and the `ReplayGap` field numbers (1, 2).
**Resolution:** _(empty until closed)_
**Resolution:** _(2026-06-16)_ Added `ProtobufContractRoundTripTests.MxEvent_RoundTripsReplayGapSentinelAndPinsFieldNumbers` to `ProtobufContractRoundTripTests.cs`. The test pins `MxEvent.ReplayGapFieldNumber == 14` via the generated constant, pins `ReplayGap.RequestedAfterSequenceFieldNumber == 1` and `ReplayGap.OldestAvailableSequenceFieldNumber == 2` via `ReplayGap.Descriptor.Fields` (asserting both the number and the field name), builds a sentinel `MxEvent` with both sequence fields populated and no body oneof set, serializes and parses it, then asserts both sequence values survive and `BodyCase == None` (confirming `replay_gap` is orthogonal to the body oneof).
+9 -9
View File
@@ -7,7 +7,7 @@
| Review date | 2026-06-16 |
| Commit reviewed | `8df5ab3` |
| Status | Re-reviewed |
| Open findings | 4 |
| Open findings | 0 |
## Checklist coverage
@@ -633,13 +633,13 @@ The prior `DashboardAuthenticator` ctor took `IOptions<GatewayOptions>`, so the
| Severity | Low |
| Category | Documentation & comments |
| Location | `docs/GatewayTesting.md:76`, `src/ZB.MOM.WW.MxGateway.IntegrationTests/WorkerLiveMxAccessSmokeTests.cs:576,728` |
| Status | Open |
| Status | Resolved |
**Description:** `docs/GatewayTesting.md` says "All six tests are gated by MXGATEWAY_RUN_LIVE_MXACCESS_TESTS=1" and enumerates five parity paths. This diff adds two new `[LiveMxAccessFact]` tests (B8 new COM commands: AuthenticateUser/ArchestrAUserToId/Suspend/Activate; and the buffered-data path: AddBufferedItem/SetBufferedUpdateInterval), bringing the total to eight. The doc still says "six" and omits the two new parity surfaces.
**Recommendation:** Update GatewayTesting.md to "eight" and add bullets for the B8 new-COM-commands and buffered-data parity surfaces.
**Resolution:** _(empty until closed)_
**Resolution:** 2026-06-16: Updated `docs/GatewayTesting.md` — changed "five parity paths" to "seven", "All six tests" to "All eight tests", and added bullets for the B8 new-COM-commands surface (AuthenticateUser/ArchestrAUserToId/Suspend/Activate against an added-but-not-advised item) and the buffered-data surface (AddBufferedItem/SetBufferedUpdateInterval/Advise round-trip with at least one OnBufferedDataChange event, residual noted for multi-sample conversion).
### IntegrationTests-031
@@ -648,13 +648,13 @@ The prior `DashboardAuthenticator` ctor took `IOptions<GatewayOptions>`, so the
| Severity | Low |
| Category | Documentation & comments |
| Location | `src/ZB.MOM.WW.MxGateway.IntegrationTests/WorkerLiveMxAccessSmokeTests.cs:672` |
| Status | Open |
| Status | Resolved |
**Description:** The inline comment at line 672 says "Suspend / Activate against the advised item", but no `Advise` call is made between `AddItem` (line 616) and `CreateSuspendRequest` (line 677) — the item is added but not advised. The comment mislabels the COM subscription state under test (the parity assertion only requires a real reply, not a successful one).
**Recommendation:** Change "against the advised item" to "against the added-but-not-advised item" (or remove "advised"), and note that Suspend/Activate is exercised without a prior Advise.
**Resolution:** _(empty until closed)_
**Resolution:** 2026-06-16: Rewrote the comment to "Suspend / Activate against the added-but-not-advised item (no Advise was issued between AddItem and this call)," making the COM subscription state explicit and noting that parity requires only a real reply, not a successful one.
### IntegrationTests-032
@@ -663,13 +663,13 @@ The prior `DashboardAuthenticator` ctor took `IOptions<GatewayOptions>`, so the
| Severity | Low |
| Category | Testing coverage |
| Location | `src/ZB.MOM.WW.MxGateway.IntegrationTests/WorkerLiveMxAccessSmokeTests.cs:823-865` |
| Status | Open |
| Status | Resolved |
**Description:** In the buffered-item test, when no sample-bearing `OnBufferedDataChange` batch arrives, the sample-predicate `TimeoutException` is caught and discarded (line 831) before asserting `bootstrapBufferedEvents > 0`. The final failure message ("No OnBufferedDataChange event arrived at all") conflates two failure modes (NoData bootstrap not delivered vs. delivered-but-no-sample), reducing residual diagnostic quality.
**Recommendation:** Before nulling the batch, log the caught timeout message (e.g. `output.WriteLine($"B8: sample-bearing batch predicate timed out: {ex.Message}")`) so the residual log distinguishes the two cases.
**Resolution:** _(empty until closed)_
**Resolution:** 2026-06-16: Added `output.WriteLine($"B8: sample-bearing batch predicate timed out: {ex.Message}")` inside the `catch (TimeoutException ex)` block before nulling `bufferedBatch`, so the residual log clearly records the timeout detail and distinguishes "predicate timed out" from "no OnBufferedDataChange arrived at all".
### IntegrationTests-033
@@ -678,10 +678,10 @@ The prior `DashboardAuthenticator` ctor took `IOptions<GatewayOptions>`, so the
| Severity | Low |
| Category | Testing coverage |
| Location | `src/ZB.MOM.WW.MxGateway.IntegrationTests/WorkerLiveMxAccessSmokeTests.cs:577-709` |
| Status | Open |
| Status | Deferred |
**Description:** The new-COM-commands live test covers AuthenticateUser/ArchestrAUserToId/Suspend/Activate but not `AddItem2`/`Write2` — the B8 extended commands with a second context parameter introduced in the same bundle. Only live COM tests can verify the COM call succeeds with the correct argument split; a parity regression short-circuiting AddItem2/Write2 to InvalidRequest would not be caught.
**Recommendation:** Add AddItem2/Write2 to the parity test (or a dedicated test) asserting each produces a real reply (not InvalidRequest) against a valid handle and item-definition split.
**Resolution:** _(empty until closed)_
**Resolution:** 2026-06-16: requires a live MXAccess rig + provider state not available on this dev box; add the AddItem2/Write2 parity assertions when running on the MXAccess host.
+42 -40
View File
@@ -10,15 +10,15 @@ Each module's `findings.md` is the source of truth; this file is generated from
| Module | Reviewer | Date | Commit | Status | Open | Total |
|---|---|---|---|---|---|---|
| [Client.Dotnet](Client.Dotnet/findings.md) | Claude Code | 2026-06-16 | `8df5ab3` | Re-reviewed | 4 | 29 |
| [Client.Go](Client.Go/findings.md) | Claude Code | 2026-06-16 | `8df5ab3` | Re-reviewed | 5 | 34 |
| [Client.Dotnet](Client.Dotnet/findings.md) | Claude Code | 2026-06-16 | `8df5ab3` | Re-reviewed | 0 | 29 |
| [Client.Go](Client.Go/findings.md) | Claude Code | 2026-06-16 | `8df5ab3` | Re-reviewed | 0 | 34 |
| [Client.Java](Client.Java/findings.md) | Claude Code | 2026-06-16 | `8df5ab3` | Re-reviewed | 9 | 48 |
| [Client.Python](Client.Python/findings.md) | Claude Code | 2026-06-16 | `8df5ab3` | Re-reviewed | 5 | 36 |
| [Client.Rust](Client.Rust/findings.md) | Claude Code | 2026-06-16 | `8df5ab3` | Re-reviewed | 6 | 38 |
| [Contracts](Contracts/findings.md) | Claude Code | 2026-06-16 | `8df5ab3` | Re-reviewed | 3 | 22 |
| [IntegrationTests](IntegrationTests/findings.md) | Claude Code | 2026-06-16 | `8df5ab3` | Re-reviewed | 4 | 33 |
| [Server](Server/findings.md) | Claude Code | 2026-06-16 | `8df5ab3` | Re-reviewed | 2 | 55 |
| [Tests](Tests/findings.md) | Claude Code | 2026-06-16 | `8df5ab3` | Re-reviewed | 3 | 38 |
| [Client.Python](Client.Python/findings.md) | Claude Code | 2026-06-16 | `8df5ab3` | Re-reviewed | 0 | 36 |
| [Client.Rust](Client.Rust/findings.md) | Claude Code | 2026-06-16 | `8df5ab3` | Re-reviewed | 0 | 38 |
| [Contracts](Contracts/findings.md) | Claude Code | 2026-06-16 | `8df5ab3` | Re-reviewed | 0 | 22 |
| [IntegrationTests](IntegrationTests/findings.md) | Claude Code | 2026-06-16 | `8df5ab3` | Re-reviewed | 0 | 33 |
| [Server](Server/findings.md) | Claude Code | 2026-06-16 | `8df5ab3` | Re-reviewed | 0 | 56 |
| [Tests](Tests/findings.md) | Claude Code | 2026-06-16 | `8df5ab3` | Re-reviewed | 0 | 39 |
| [Worker](Worker/findings.md) | Claude Code | 2026-06-16 | `8df5ab3` | Re-reviewed | 0 | 28 |
| [Worker.Tests](Worker.Tests/findings.md) | Claude Code | 2026-06-16 | `8df5ab3` | Re-reviewed | 3 | 36 |
@@ -28,19 +28,7 @@ Findings with status `Open` or `In Progress`, ordered by severity.
| ID | Severity | Category | Location | Description |
|---|---|---|---|---|
| Client.Dotnet-028 | Medium | Security | `clients/dotnet/.../MxGatewayClientCli.cs:156` | Client.Dotnet-008 was recorded resolved by adding a `TryResolveApiKey` helper resolving both `--api-key` and the `--api-key-env` env-var path, wired into the error-redaction catch block. At HEAD the catch block reads `arguments.GetOptional… |
| Client.Go-030 | Medium | Concurrency & thread safety | `clients/go/cmd/mxgw-go/main.go:1491-1494` | `runGalaxyWatch`'s limit-reached branch calls `cancelStream()` and returns WITHOUT draining the buffered `events` channel, unlike the signal-cancel branch which drains. This is the shape Client.Go-013's resolution claimed to have fixed ("n… |
| Client.Java-040 | Medium | Correctness & logic bugs | `clients/java/zb-mom-ww-mxgateway-cli/src/main/java/com/zb/mom/ww/mxgateway/cli/MxGatewayCli.java:1552-1561` | The `stream-alarms` overflow handler does `queue.clear()` then `offer(exception)` + `offer(ALARM_FEED_END)` non-atomically on an `ArrayBlockingQueue` shared with the gRPC delivery thread. In production gRPC (netty I/O thread), a concurrent… |
| Client.Python-036 | Medium | Documentation & comments | `clients/python/README.md:143-158` | The README "Browsing lazily" section's code example calls `galaxy.browse_children(...)`, a method that does not exist — the actual public low-level method is `browse_children_raw`. The example raises `AttributeError` at runtime. The README… |
| Client.Rust-033 | Medium | Correctness & logic bugs | `clients/rust/crates/mxgw-cli/src/main.rs:485` | `ConnectionArgs::options()` computes plaintext as `!self.tls \|\| self.plaintext`. With both `--tls` and `--plaintext` supplied, this is `true`, silently degrading to an unencrypted channel despite the explicit `--tls`. A security-sensitive… |
| Server-054 | Medium | Design-document adherence | `docs/DesignDecisions.md` (Session Reconnect / Event Subscribers / Later Revisit Items §470-471), `CLAUDE.md` (Repository-Specific Conventions) | The session-resilience epic shipped multi-subscriber fan-out (`SessionEventDistributor`), reconnectable sessions with replay (`AttachEventSubscriberWithReplay`/`ReplayGap`), and detach-grace retention — but `docs/DesignDecisions.md` still… |
| Client.Dotnet-026 | Low | Correctness & logic bugs | `clients/dotnet/.../MxGatewayClientCli.cs:306` (isLongRunning) | Client.Dotnet-015 extended `isLongRunning` to include the bench commands so they aren't silently cancelled by the default 30s CTS. The new `galaxy-browse` command is NOT in `isLongRunning`. A `galaxy-browse --depth N` tree walk on a large… |
| Client.Dotnet-027 | Low | Performance & resource management | `clients/dotnet/ZB.MOM.WW.MxGateway.Client/LazyBrowseNode.cs:15` | `LazyBrowseNode` allocates one `SemaphoreSlim _expandLock = new(1,1)` per node and never disposes it (the type is not IDisposable). For a large Galaxy browse tree (thousands of nodes), live SemaphoreSlim instances accumulate; OS handles ar… |
| Client.Dotnet-029 | Low | Code organization & conventions | `clients/dotnet/.../IMxGatewayCliClient.cs:6` | `IMxGatewayCliClient` is a public interface with no type-level `<summary>` XML doc. The Client.Dotnet-013 resolution recorded adding one; at HEAD it is absent. No CS1591 fires (GenerateDocumentationFile now scoped to the packable library o… |
| Client.Go-031 | Low | Correctness & logic bugs | `clients/go/cmd/mxgw-go/main.go:1037-1046` | `closeSmokeSession` registers `defer cancel()` twice on the same `cancel` variable across two `context.WithTimeout` calls when the deadline-shortening branch fires. Because `cancel` is reassigned, both defers end up calling the second cont… |
| Client.Go-032 | Low | Code organization & conventions | `clients/go/cmd/mxgw-go/main.go:839-841` | `runStreamEvents` does not install a `signal.NotifyContext` handler, while `runStreamAlarms` and `runGalaxyWatch` do. Client.Go-020's resolution claimed this was added. Without a signal-aware parent context, Ctrl+C kills the process withou… |
| Client.Go-033 | Low | Testing coverage | `clients/go/cmd/mxgw-go/main_test.go` | Gaps vs prior coverage: (1) `TestRunBenchReadBulkRejectsNonPositiveDuration` (named in Client.Go-021's resolution) is absent — the `-duration-seconds`-positive guard at main.go:619 is untested; (2) `runStreamEvents` has no CLI-level test (… |
| Client.Go-034 | Low | Documentation & comments | `clients/go/README.md:245-263` | The README CLI example table lists ~12 commands but the binary now exposes ~27 subcommands (per `writeUsage`). Absent: `ping`, `galaxy-browse`, `batch`, `read-bulk`, `write-bulk`, `write2-bulk`, `write-secured-bulk`, `write-secured2-bulk`,… |
| Client.Java-041 | Low | Correctness & logic bugs | `clients/java/zb-mom-ww-mxgateway-cli/src/main/java/com/zb/mom/ww/mxgateway/cli/MxGatewayCli.java:2187-2194` | `jsonString` escapes only `\`, `"`, `\r`, `\n` — not `\t`, `\b`, `\f`, or U+0000U+001F/U+007F. A tag address/message/reference containing a tab produces malformed JSON (RFC 8259). Affects the hand-rolled `jsonObject`/`jsonString`/`jsonVal… |
| Client.Java-042 | Low | Error handling & resilience | `clients/java/zb-mom-ww-mxgateway-cli/src/main/java/com/zb/mom/ww/mxgateway/cli/MxGatewayCli.java:1565-1567` | `StreamAlarmsCommand.onError` calls `queue.offer(error)` without checking the return value. If the queue is full when a transport error arrives, the error is dropped and the drain loop blocks forever on `queue.take()`. Same class as Client… |
| Client.Java-043 | Low | Code organization & conventions | `clients/java/zb-mom-ww-mxgateway-cli/src/test/java/com/zb/mom/ww/mxgateway/cli/MxGatewayCliTests.java:241-264` | `galaxyBrowseParentZeroEmitsWarningToStderr` calls `MxGatewayCli.execute(new FakeClientFactory(), ...)` for a galaxy-browse command, which wires the real `GrpcGalaxyClientFactory` and constructs a live Netty channel to localhost:5000 as a… |
@@ -49,26 +37,6 @@ Findings with status `Open` or `In Progress`, ordered by severity.
| Client.Java-046 | Low | Testing coverage | `clients/java/zb-mom-ww-mxgateway-cli/src/test/java/com/zb/mom/ww/mxgateway/cli/MxGatewayCliTests.java:680-696` | `streamAlarmsCommandFailsFastOnQueueOverflow` delivers all 2000 onNext synchronously from within `streamAlarms`, so `subscriptionRef` is still null when the overflow fires — the `sub.cancel()` branch is never exercised. The test also doesn… |
| Client.Java-047 | Low | Documentation & comments | `clients/java/README.md` | README advertises the `0.1.1` artifact coordinate (Gitea Maven section) while the `version` command reports `0.1.0` — the user-visible symptom of Client.Java-044. Cross-ref `MxGatewayClientVersion.java:12`. |
| Client.Java-048 | Low | Documentation & comments | `clients/java/zb-mom-ww-mxgateway-cli/src/main/java/com/zb/mom/ww/mxgateway/cli/MxGatewayCli.java:88-105` | The public `execute(PrintWriter, PrintWriter, String...)` Javadoc calls it "Test-friendly entry point", but it wires `GrpcMxGatewayCliClientFactory` with no injection — the actual test seam is the package-private `execute(MxGatewayCliClien… |
| Client.Python-032 | Low | Correctness & logic bugs | `clients/python/src/zb_mom_ww_mxgateway_cli/commands.py:1048,1065-1066` | `_smoke` reintroduces the dead `closed = False` / `if not closed:` guard that Client.Python-004's resolution claimed to have removed via `async with session:`. `closed` is never reassigned, so the guard is always true. Behavior is correct… |
| Client.Python-033 | Low | Correctness & logic bugs | `clients/python/src/zb_mom_ww_mxgateway_cli/commands.py:772,1490-1494` | `_parse_string_list` always emits `param_hint="--items"`, but it is also called from `_build_write_bulk_entries` with `kwargs["values"]`. An empty `--values ""` on the write-bulk commands yields `Error: Invalid value for '--items': ...`, p… |
| Client.Python-034 | Low | Correctness & logic bugs | `clients/python/src/zb_mom_ww_mxgateway_cli/commands.py:1497-1501` | `_parse_int_list` does `int(item)` with no error handling. A non-numeric token (e.g. `--item-handles "10,abc"`) raises a raw `ValueError`, surfacing as an unformatted traceback interactively (other input errors raise `click.BadParameter`). |
| Client.Python-035 | Low | Code organization & conventions | `clients/python/src/zb_mom_ww_mxgateway/__init__.py`, `.../options.py:63-77`, `.../galaxy.py:293` | Two new public types — `BrowseChildrenOptions` (options.py) and `LazyBrowseNode` (galaxy.py) — are absent from `__init__.py`/`__all__`, so callers can't `from zb_mom_ww_mxgateway import BrowseChildrenOptions`, breaking the package-root imp… |
| Client.Rust-034 | Low | Correctness & logic bugs | `clients/rust/crates/mxgw-cli/src/main.rs:48-51,548` | `Command::Version` carries a `jsonl: bool` field that is never read; the dispatch arm matches `{ json, .. }` and discards `jsonl`. `mxgw version --jsonl` silently behaves as plain text. |
| Client.Rust-035 | Low | Security | `clients/rust/crates/mxgw-cli/src/main.rs:489-495` | `--api-key-env` (default `MXGATEWAY_API_KEY`) names an env var read into an `ApiKey` Bearer token, but its clap help has no description of the expected value format. A user pointing it at another credential's env var would silently forward… |
| Client.Rust-036 | Low | Design-document adherence | `clients/rust/RustClientDesign.md:351` | The new `galaxy browse` subcommand (with its filter/depth/json flags) is not listed in the "Test CLI" command table in RustClientDesign.md, which still reads `galaxy {test-connection,last-deploy-time,discover-hierarchy,watch}`. |
| Client.Rust-037 | Low | Design-document adherence | `clients/rust/README.md:164-179` | The README "Browsing lazily" example calls `galaxy.browse_children(...).await?.into_inner()`, but the public API is `GalaxyClient::browse_children_raw` (the bare `browse_children` is the generated proto-client method, not public; and `brow… |
| Client.Rust-038 | Low | Testing coverage | `clients/rust/crates/mxgw-cli/src/main.rs:2336-2564` | Three CLI test gaps: (1) `ConnectionArgs::options()` `--tls`/`--plaintext` resolution (incl. the both-set path of Client.Rust-033) is untested; (2) `browse_children_one_level`'s repeated-page-token guard is untested; (3) `parse_rfc3339_tim… |
| Contracts-020 | Low | Design-document adherence | `gateway.md:1087,1101-1102` | gateway.md still lists "no reconnectable sessions" under "Resolved for v1" and lists "reconnectable sessions" / "multi-subscriber event fan-out" as post-v1 revisit items. The shipped `ReplayGap` reconnect-replay contract and multi-subscrib… |
| Contracts-021 | Low | Documentation & comments | `src/ZB.MOM.WW.MxGateway.Contracts/Protos/mxaccess_gateway.proto:731-733` | The `replay_gap` field comment ends with "(Reconnect/replay logic is Task 12; this is the contract surface only.)". That parenthetical is now stale — the reconnect/replay logic has shipped and is exercised by EventStreamServiceTests/Sessio… |
| Contracts-022 | Low | Testing coverage | `src/ZB.MOM.WW.MxGateway.Tests/Contracts/ProtobufContractRoundTripTests.cs` | No round-trip / descriptor pin exists for the new `ReplayGap` message or `MxEvent.replay_gap` (field 14). The field is exercised functionally end-to-end, but there is no contract-level pin to catch a future renumber/type-narrowing of `repl… |
| IntegrationTests-030 | Low | Documentation & comments | `docs/GatewayTesting.md:76`, `src/ZB.MOM.WW.MxGateway.IntegrationTests/WorkerLiveMxAccessSmokeTests.cs:576,728` | `docs/GatewayTesting.md` says "All six tests are gated by MXGATEWAY_RUN_LIVE_MXACCESS_TESTS=1" and enumerates five parity paths. This diff adds two new `[LiveMxAccessFact]` tests (B8 new COM commands: AuthenticateUser/ArchestrAUserToId/Sus… |
| IntegrationTests-031 | Low | Documentation & comments | `src/ZB.MOM.WW.MxGateway.IntegrationTests/WorkerLiveMxAccessSmokeTests.cs:672` | The inline comment at line 672 says "Suspend / Activate against the advised item", but no `Advise` call is made between `AddItem` (line 616) and `CreateSuspendRequest` (line 677) — the item is added but not advised. The comment mislabels t… |
| IntegrationTests-032 | Low | Testing coverage | `src/ZB.MOM.WW.MxGateway.IntegrationTests/WorkerLiveMxAccessSmokeTests.cs:823-865` | In the buffered-item test, when no sample-bearing `OnBufferedDataChange` batch arrives, the sample-predicate `TimeoutException` is caught and discarded (line 831) before asserting `bootstrapBufferedEvents > 0`. The final failure message ("… |
| IntegrationTests-033 | Low | Testing coverage | `src/ZB.MOM.WW.MxGateway.IntegrationTests/WorkerLiveMxAccessSmokeTests.cs:577-709` | The new-COM-commands live test covers AuthenticateUser/ArchestrAUserToId/Suspend/Activate but not `AddItem2`/`Write2` — the B8 extended commands with a second context parameter introduced in the same bundle. Only live COM tests can verify… |
| Server-055 | Low | Correctness & logic bugs | `src/ZB.MOM.WW.MxGateway.Server/Sessions/GatewaySession.cs:842-851,1841-1871` | When `AttachEventSubscriber`/`AttachEventSubscriberWithReplay` fails inside `StartDistributorAndRegister`, the catch calls `DetachEventSubscriber()`, which decrements the active count back to 0 and — because the session is still `Ready` an… |
| Tests-036 | Low | Testing coverage | `src/ZB.MOM.WW.MxGateway.Tests/Configuration/GatewayOptionsValidatorTests.cs` | Three new validator rules — `DetachGraceSeconds >= 0` (GatewayOptionsValidator.cs:185-186), `ReplayBufferCapacity >= 0` (:215-216), `ReplayRetentionSeconds >= 0` (:219-220) — have no tests, while the sibling new options (`MaxEventSubscribe… |
| Tests-037 | Low | Testing coverage | `src/ZB.MOM.WW.MxGateway.Tests/Contracts/ProtobufContractRoundTripTests.cs` | The reconnect/replay contract surface (`ReplayGap` message, `MxEvent.replay_gap = 14`, `StreamEventsRequest.after_worker_sequence`) has no protobuf serialize/parse round-trip test pinning the wire shape and the documented sentinel invarian… |
| Tests-038 | Low | Performance & resource management | `src/ZB.MOM.WW.MxGateway.Tests/Gateway/Sessions/SessionEventDistributorTests.cs:702-713` | `DrainUntilFaultAsync` relies on the channel completing WITH a fault so `WaitToReadAsync` re-throws. Correct for current callers, but if reused on a channel that completes gracefully, `WaitToReadAsync` returns false without throwing and th… |
| Worker.Tests-034 | Low | Code organization & conventions | `src/ZB.MOM.WW.MxGateway.Worker.Tests/MxAccess/MxAccessCommandExecutorTests.cs:2233`, `src/ZB.MOM.WW.MxGateway.Worker.Tests/TestSupport/NoopMxAccessServer.cs:97` | `FakeMxStatus` is defined twice — file-scope in `TestSupport/NoopMxAccessServer.cs:97` and nested in `MxAccessCommandExecutorTests.FakeMxAccessComObject:2233` — both exposing the same four public fields that `MxStatusProxyConverter` reflec… |
| Worker.Tests-035 | Low | Testing coverage | `src/ZB.MOM.WW.MxGateway.Worker.Tests/MxAccess/MxAccessCommandExecutorTests.cs`, `src/ZB.MOM.WW.MxGateway.Worker/MxAccess/MxAccessCommandExecutor.cs:99-136` | `MxAccessCommandExecutor.Execute` has a `_` discard arm returning `CreateInvalidRequestReply(... "Unsupported MXAccess command kind ...")` — the safety net for an unknown `MxCommandKind` (e.g. a future gateway enum value before the worker… |
| Worker.Tests-036 | Low | Concurrency & thread safety | `src/ZB.MOM.WW.MxGateway.Worker.Tests/Ipc/WorkerPipeSessionTests.cs:983-996` | `RunAsync_SendsFirstHeartbeatImmediatelyOnEnteringLoop` carries a redundant wall-clock assertion `Assert.True(elapsed < TimeSpan.FromSeconds(5), ...)`. The existing `heartbeatWait` CTS (cancel-after 5s) already enforces the same bound — th… |
@@ -111,11 +79,13 @@ Findings with status `Resolved`, `Won't Fix`, or `Deferred`.
| Client.Dotnet-003 | Medium | Resolved | Concurrency & thread safety | `clients/dotnet/MxGateway.Client/MxGatewaySession.cs:659-663`, `clients/dotnet/MxGateway.Client/MxGatewayClient.cs:230-240` |
| Client.Dotnet-018 | Medium | Resolved | Documentation & comments | `clients/dotnet/README.md:137-138` |
| Client.Dotnet-022 | Medium | Resolved | mxaccessgw conventions | `clients/dotnet/Directory.Build.props:1-21` |
| Client.Dotnet-028 | Medium | Resolved | Security | `clients/dotnet/.../MxGatewayClientCli.cs:156` |
| Client.Go-002 | Medium | Resolved | Error handling & resilience | `clients/go/mxgateway/session.go:440-516` |
| Client.Go-003 | Medium | Resolved | Correctness & logic bugs | `clients/go/cmd/mxgw-go/main.go:517-532` |
| Client.Go-022 | Medium | Resolved | Code organization & conventions | `clients/go/cmd/mxgw-go/main.go:398-412,417-519` |
| Client.Go-023 | Medium | Resolved | Concurrency & thread safety | `clients/go/cmd/mxgw-go/main.go:604-606,616-632` |
| Client.Go-028 | Medium | Resolved | Correctness & logic bugs | `scripts/tag-go-module.ps1:42-46` |
| Client.Go-030 | Medium | Resolved | Concurrency & thread safety | `clients/go/cmd/mxgw-go/main.go:1491-1494` |
| Client.Java-001 | Medium | Resolved | Security | `clients/java/mxgateway-client/src/main/java/com/dohertylan/mxgateway/client/MxGatewaySecrets.java:30-32` |
| Client.Java-002 | Medium | Resolved | Concurrency & thread safety | `clients/java/mxgateway-client/src/main/java/com/dohertylan/mxgateway/client/MxEventStream.java:31,66-92` |
| Client.Java-003 | Medium | Resolved | mxaccessgw conventions | `clients/java/mxgateway-client/src/main/java/com/dohertylan/mxgateway/client/MxGatewayClient.java:119-140` |
@@ -137,6 +107,7 @@ Findings with status `Resolved`, `Won't Fix`, or `Deferred`.
| Client.Python-024 | Medium | Resolved | Code organization & conventions | `clients/python/src/zb_mom_ww_mxgateway_cli/commands.py:13,48-119` |
| Client.Python-027 | Medium | Resolved | Security | `clients/python/src/zb_mom_ww_mxgateway/client.py:36-54`, `clients/python/src/zb_mom_ww_mxgateway/galaxy.py:47-66`, `clients/python/src/zb_mom_ww_mxgateway_cli/commands.py:165-172,918-930` |
| Client.Python-028 | Medium | Resolved | Error handling & resilience | `clients/python/src/zb_mom_ww_mxgateway/options.py:120-130`, `clients/python/src/zb_mom_ww_mxgateway/client.py:59`, `clients/python/src/zb_mom_ww_mxgateway/galaxy.py:71` |
| Client.Python-036 | Medium | Resolved | Documentation & comments | `clients/python/README.md:143-158` |
| Client.Rust-005 | Medium | Resolved | Correctness & logic bugs | `clients/rust/src/session.rs:489-520` |
| Client.Rust-006 | Medium | Resolved | Error handling & resilience | `clients/rust/src/session.rs:531-555` |
| Client.Rust-015 | Medium | Resolved | Error handling & resilience | `clients/rust/crates/mxgw-cli/src/main.rs:1053-1070` |
@@ -145,6 +116,7 @@ Findings with status `Resolved`, `Won't Fix`, or `Deferred`.
| Client.Rust-022 | Medium | Resolved | Correctness & logic bugs | `clients/rust/src/session.rs:369-391,403-420,427-444,452-469,476-493,631-696,706-724` |
| Client.Rust-024 | Medium | Resolved | Testing coverage | `clients/rust/tests/client_behavior.rs:405-415`; `clients/rust/src/session.rs:369-493`; `clients/rust/src/client.rs:265-291`; `clients/rust/crates/mxgw-cli/src/main.rs:1310-1505` |
| Client.Rust-031 | Medium | Resolved | Error handling & resilience | `clients/rust/src/options.rs:196-240` (`build_tls_config`); `clients/rust/Cargo.toml:40` (tonic features); docs: `clients/rust/src/options.rs:76-101`, `clients/rust/README.md` (TLS trust section), `clients/rust/crates/mxgw-cli/src/main.rs:429-431`, `clients/rust/RustClientDesign.md:202` |
| Client.Rust-033 | Medium | Resolved | Correctness & logic bugs | `clients/rust/crates/mxgw-cli/src/main.rs:485` |
| Contracts-002 | Medium | Resolved | Error handling & resilience | `src/MxGateway.Contracts/Protos/mxaccess_gateway.proto:384-385`, `:95` |
| Contracts-009 | Medium | Resolved | Design-document adherence | `docs/Contracts.md:13-24` |
| IntegrationTests-003 | Medium | Resolved | Correctness & logic bugs | `src/MxGateway.IntegrationTests/WorkerLiveMxAccessSmokeTests.cs:89-97` |
@@ -169,6 +141,8 @@ Findings with status `Resolved`, `Won't Fix`, or `Deferred`.
| Server-038 | Medium | Resolved | Security | `src/ZB.MOM.WW.MxGateway.Server/Dashboard/Hubs/EventsHub.cs:23-44` |
| Server-044 | Medium | Resolved | Correctness & logic bugs | `src/ZB.MOM.WW.MxGateway.Server/Sessions/SessionManager.cs:216-254` |
| Server-051 | Medium | Resolved | Error handling & resilience | `src/ZB.MOM.WW.MxGateway.Server/Alarms/AlarmWatchListResolver.cs:64-78` |
| Server-054 | Medium | Resolved | Design-document adherence | `docs/DesignDecisions.md` (Session Reconnect / Event Subscribers / Later Revisit Items §470-471), `CLAUDE.md` (Repository-Specific Conventions) |
| Server-056 | Medium | Resolved | Concurrency & thread safety | `src/ZB.MOM.WW.MxGateway.Server/Sessions/SessionEventDistributor.cs:296-310,449-453,629-635` |
| Tests-003 | Medium | Resolved | Performance & resource management | `src/MxGateway.Tests/Security/Authentication/SqliteAuthStoreTests.cs:170-176`, `src/MxGateway.Tests/Security/Authentication/ApiKeyAdminCliRunnerTests.cs:252-258` |
| Tests-004 | Medium | Resolved | Testing coverage | `src/MxGateway.Tests/Security/Authorization/GatewayGrpcAuthorizationInterceptorTests.cs` |
| Tests-005 | Medium | Resolved | Testing coverage | `src/MxGateway.Tests/Gateway/Grpc/EventStreamServiceTests.cs:239-261`, `src/MxGateway.Tests/Gateway/Sessions/SessionManagerTests.cs` |
@@ -217,6 +191,9 @@ Findings with status `Resolved`, `Won't Fix`, or `Deferred`.
| Client.Dotnet-023 | Low | Resolved | Code organization & conventions | `clients/dotnet/Directory.Build.props:17`, `clients/dotnet/ZB.MOM.WW.MxGateway.Client.Cli/IMxGatewayCliClient.cs:6`, `clients/dotnet/ZB.MOM.WW.MxGateway.Client.Tests/*.cs` |
| Client.Dotnet-024 | Low | Resolved | Code organization & conventions | `clients/dotnet/Directory.Build.props:12`, `clients/dotnet/ZB.MOM.WW.MxGateway.Client/ZB.MOM.WW.MxGateway.Client.csproj:19-24` |
| Client.Dotnet-025 | Low | Resolved | Concurrency & thread safety | `clients/dotnet/ZB.MOM.WW.MxGateway.Client/LazyBrowseNode.cs:38,41,54,82,94` |
| Client.Dotnet-026 | Low | Resolved | Correctness & logic bugs | `clients/dotnet/.../MxGatewayClientCli.cs:306` (isLongRunning) |
| Client.Dotnet-027 | Low | Won't Fix | Performance & resource management | `clients/dotnet/ZB.MOM.WW.MxGateway.Client/LazyBrowseNode.cs:15` |
| Client.Dotnet-029 | Low | Resolved | Code organization & conventions | `clients/dotnet/.../IMxGatewayCliClient.cs:6` |
| Client.Go-004 | Low | Resolved | mxaccessgw conventions | `clients/go/mxgateway/alarms_test.go:153-154`, `clients/go/mxgateway/galaxy_test.go:58-59` |
| Client.Go-005 | Low | Resolved | Design-document adherence | `clients/go/mxgateway/client.go:64,68`, `clients/go/mxgateway/galaxy.go:83,87` |
| Client.Go-006 | Low | Resolved | Error handling & resilience | `clients/go/mxgateway/errors.go:9-130` |
@@ -240,6 +217,10 @@ Findings with status `Resolved`, `Won't Fix`, or `Deferred`.
| Client.Go-026 | Low | Resolved | Error handling & resilience | `clients/go/cmd/mxgw-go/main.go:1196-1222` |
| Client.Go-027 | Low | Resolved | Code organization & conventions | `clients/go/cmd/mxgw-go/main.go:1195-1206` |
| Client.Go-029 | Low | Resolved | Documentation & comments | `clients/go/README.md:300-303` |
| Client.Go-031 | Low | Resolved | Correctness & logic bugs | `clients/go/cmd/mxgw-go/main.go:1037-1046` |
| Client.Go-032 | Low | Resolved | Code organization & conventions | `clients/go/cmd/mxgw-go/main.go:839-841` |
| Client.Go-033 | Low | Resolved | Testing coverage | `clients/go/cmd/mxgw-go/main_test.go` |
| Client.Go-034 | Low | Resolved | Documentation & comments | `clients/go/README.md:245-263` |
| Client.Java-006 | Low | Resolved | Performance & resource management | `clients/java/mxgateway-client/src/main/java/com/dohertylan/mxgateway/client/MxGatewayClient.java:323-328`, `clients/java/mxgateway-client/src/main/java/com/dohertylan/mxgateway/client/GalaxyRepositoryClient.java:279-284` |
| Client.Java-007 | Low | Resolved | Testing coverage | `clients/java/mxgateway-client/src/test/java/com/dohertylan/mxgateway/client/` |
| Client.Java-008 | Low | Resolved | Error handling & resilience | `clients/java/mxgateway-client/src/main/java/com/dohertylan/mxgateway/client/MxGatewayClient.java:298-304` |
@@ -284,6 +265,10 @@ Findings with status `Resolved`, `Won't Fix`, or `Deferred`.
| Client.Python-029 | Low | Resolved | Correctness & logic bugs | `clients/python/src/zb_mom_ww_mxgateway/options.py:78-90` |
| Client.Python-030 | Low | Resolved | Code organization & conventions | `clients/python/pyproject.toml:17` |
| Client.Python-031 | Low | Resolved | Testing coverage | `clients/python/tests/test_tls.py:34`, `clients/python/pyproject.toml:53-56` |
| Client.Python-032 | Low | Resolved | Correctness & logic bugs | `clients/python/src/zb_mom_ww_mxgateway_cli/commands.py:1048,1065-1066` |
| Client.Python-033 | Low | Resolved | Correctness & logic bugs | `clients/python/src/zb_mom_ww_mxgateway_cli/commands.py:772,1490-1494` |
| Client.Python-034 | Low | Resolved | Correctness & logic bugs | `clients/python/src/zb_mom_ww_mxgateway_cli/commands.py:1497-1501` |
| Client.Python-035 | Low | Resolved | Code organization & conventions | `clients/python/src/zb_mom_ww_mxgateway/__init__.py`, `.../options.py:63-77`, `.../galaxy.py:293` |
| Client.Rust-004 | Low | Resolved | Documentation & comments | `clients/rust/src/version.rs:7` |
| Client.Rust-007 | Low | Resolved | Design-document adherence | `clients/rust/RustClientDesign.md:14-55` |
| Client.Rust-008 | Low | Resolved | Performance & resource management | `clients/rust/src/value.rs:161-261` |
@@ -301,6 +286,11 @@ Findings with status `Resolved`, `Won't Fix`, or `Deferred`.
| Client.Rust-027 | Low | Resolved | Documentation & comments | `clients/rust/.cargo/config.toml:1-9` |
| Client.Rust-028 | Low | Resolved | mxaccessgw conventions | `clients/rust/crates/mxgw-cli/src/main.rs:1126-1166` |
| Client.Rust-032 | Low | Resolved | Design-document adherence | `clients/rust/RustClientDesign.md`; surface in `clients/rust/src/galaxy.rs:281-379` |
| Client.Rust-034 | Low | Resolved | Correctness & logic bugs | `clients/rust/crates/mxgw-cli/src/main.rs:48-51,548` |
| Client.Rust-035 | Low | Resolved | Security | `clients/rust/crates/mxgw-cli/src/main.rs:489-495` |
| Client.Rust-036 | Low | Resolved | Design-document adherence | `clients/rust/RustClientDesign.md:351` |
| Client.Rust-037 | Low | Resolved | Design-document adherence | `clients/rust/README.md:164-179` |
| Client.Rust-038 | Low | Resolved | Testing coverage | `clients/rust/crates/mxgw-cli/src/main.rs:2336-2564` |
| Contracts-001 | Low | Resolved | Design-document adherence | `docs/Grpc.md:13` (and `:3`, `:32`, `:39`) |
| Contracts-003 | Low | Won't Fix | Code organization & conventions | `src/MxGateway.Contracts/MxGateway.Contracts.csproj:10` |
| Contracts-004 | Low | Resolved | Documentation & comments | `src/MxGateway.Contracts/GatewayContractInfo.cs:3-6` |
@@ -318,6 +308,9 @@ Findings with status `Resolved`, `Won't Fix`, or `Deferred`.
| Contracts-017 | Low | Resolved | Documentation & comments | `src/ZB.MOM.WW.MxGateway.Contracts/Protos/mxaccess_gateway.proto:23-29` (the `rpc QueryActiveAlarms` block) |
| Contracts-018 | Low | Resolved | Testing coverage | `src/ZB.MOM.WW.MxGateway.Tests/Contracts/ProtobufContractRoundTripTests.cs:396` (`ActiveAlarmSnapshot_RoundTripsAllFields`) |
| Contracts-019 | Low | Resolved | Documentation & comments | `src/ZB.MOM.WW.MxGateway.Contracts/Protos/mxaccess_gateway.proto:850-851` (`ActiveAlarmSnapshot`), `:318-324` (`AlarmProviderMode`) |
| Contracts-020 | Low | Resolved | Design-document adherence | `gateway.md:1087,1101-1102` |
| Contracts-021 | Low | Resolved | Documentation & comments | `src/ZB.MOM.WW.MxGateway.Contracts/Protos/mxaccess_gateway.proto:731-733` |
| Contracts-022 | Low | Resolved | Testing coverage | `src/ZB.MOM.WW.MxGateway.Tests/Contracts/ProtobufContractRoundTripTests.cs` |
| IntegrationTests-007 | Low | Resolved | Concurrency & thread safety | `src/MxGateway.IntegrationTests/WorkerLiveMxAccessSmokeTests.cs:20`, `src/MxGateway.IntegrationTests/Galaxy/GalaxyRepositoryLiveTests.cs:5`, `src/MxGateway.IntegrationTests/DashboardLdapLiveTests.cs:9` |
| IntegrationTests-008 | Low | Resolved | Code organization & conventions | `src/MxGateway.IntegrationTests/LiveLdapFactAttribute.cs`, `src/MxGateway.IntegrationTests/Galaxy/LiveGalaxyRepositoryFactAttribute.cs`, `src/MxGateway.IntegrationTests/LiveMxAccessFactAttribute.cs` |
| IntegrationTests-009 | Low | Resolved | Documentation & comments | `src/MxGateway.IntegrationTests/WorkerLiveMxAccessSmokeTests.cs:372-375` |
@@ -336,6 +329,10 @@ Findings with status `Resolved`, `Won't Fix`, or `Deferred`.
| IntegrationTests-027 | Low | Resolved | Code organization & conventions | `src/ZB.MOM.WW.MxGateway.IntegrationTests/ZB.MOM.WW.MxGateway.IntegrationTests.csproj`, `src/ZB.MOM.WW.MxGateway.IntegrationTests/DashboardLdapLiveTests.cs:4-5,134` |
| IntegrationTests-028 | Low | Resolved | Design-document adherence | `src/ZB.MOM.WW.MxGateway.IntegrationTests/DashboardLdapLiveTests.cs:120-161`, `src/ZB.MOM.WW.MxGateway.Server/Dashboard/DashboardServiceCollectionExtensions.cs:35` |
| IntegrationTests-029 | Low | Resolved | Documentation & comments | `docs/GatewayTesting.md:218-224` |
| IntegrationTests-030 | Low | Resolved | Documentation & comments | `docs/GatewayTesting.md:76`, `src/ZB.MOM.WW.MxGateway.IntegrationTests/WorkerLiveMxAccessSmokeTests.cs:576,728` |
| IntegrationTests-031 | Low | Resolved | Documentation & comments | `src/ZB.MOM.WW.MxGateway.IntegrationTests/WorkerLiveMxAccessSmokeTests.cs:672` |
| IntegrationTests-032 | Low | Resolved | Testing coverage | `src/ZB.MOM.WW.MxGateway.IntegrationTests/WorkerLiveMxAccessSmokeTests.cs:823-865` |
| IntegrationTests-033 | Low | Deferred | Testing coverage | `src/ZB.MOM.WW.MxGateway.IntegrationTests/WorkerLiveMxAccessSmokeTests.cs:577-709` |
| Server-007 | Low | Resolved | Performance & resource management | `src/MxGateway.Server/Galaxy/GalaxyHierarchyProjector.cs:55-70` |
| Server-008 | Low | Resolved | Performance & resource management | `src/MxGateway.Server/Grpc/GalaxyRepositoryGrpcService.cs:111-134,160-189` |
| Server-009 | Low | Resolved | Error handling & resilience | `src/MxGateway.Server/Security/Authentication/AuthSqliteConnectionFactory.cs:15-32` |
@@ -372,6 +369,7 @@ Findings with status `Resolved`, `Won't Fix`, or `Deferred`.
| Server-050 | Low | Resolved | Error handling & resilience | `src/ZB.MOM.WW.MxGateway.Server/Dashboard/DashboardSessionAdminService.cs:42-75,92-125` |
| Server-052 | Low | Resolved | Documentation & comments | `src/ZB.MOM.WW.MxGateway.Server/Alarms/IAlarmWatchListResolver.cs:24-30`, `src/ZB.MOM.WW.MxGateway.Server/Alarms/AlarmWatchListResolver.cs:101-114`, `docs/GatewayConfiguration.md:247` |
| Server-053 | Low | Resolved | Testing coverage | `src/ZB.MOM.WW.MxGateway.Tests/Alarms/AlarmWatchListResolverTests.cs`, `src/ZB.MOM.WW.MxGateway.Tests/Alarms/GatewayAlarmMonitorProviderModeTests.cs` |
| Server-055 | Low | Resolved | Correctness & logic bugs | `src/ZB.MOM.WW.MxGateway.Server/Sessions/GatewaySession.cs:842-851,1841-1871` |
| Tests-007 | Low | Resolved | Code organization & conventions | `src/MxGateway.Tests/Gateway/Grpc/MxAccessGatewayServiceTests.cs:682`, `src/MxGateway.Tests/Gateway/Grpc/GalaxyRepositoryGrpcServiceTests.cs:324`, `src/MxGateway.Tests/Gateway/GatewayEndToEndFakeWorkerSmokeTests.cs:460`, `src/MxGateway.Tests/Security/Authorization/GatewayGrpcAuthorizationInterceptorTests.cs:233` |
| Tests-008 | Low | Resolved | mxaccessgw conventions | `src/MxGateway.Tests/Gateway/Sessions/WorkerAlarmRpcDispatcherTests.cs:1-9`, `src/MxGateway.Tests/Gateway/Sessions/NotWiredAlarmRpcDispatcherTests.cs:1-3`, `src/MxGateway.Tests/Gateway/Sessions/SessionManagerAlarmAutoSubscribeTests.cs:1` |
| Tests-009 | Low | Resolved | Documentation & comments | `src/MxGateway.Tests/Gateway/Sessions/SessionManagerTests.cs:36-37,99,365` |
@@ -395,6 +393,10 @@ Findings with status `Resolved`, `Won't Fix`, or `Deferred`.
| Tests-033 | Low | Resolved | Testing coverage | `src/ZB.MOM.WW.MxGateway.Server/Dashboard/DashboardAlarmProviderStatus.cs`, `src/ZB.MOM.WW.MxGateway.Tests/Gateway/Dashboard/DashboardBrowseAndAlarmModelTests.cs:140-195` |
| Tests-034 | Low | Resolved | mxaccessgw conventions | `src/ZB.MOM.WW.MxGateway.Tests/Diagnostics/GatewayLogRedactorSeamTests.cs:1-15` |
| Tests-035 | Low | Resolved | Concurrency & thread safety | `src/ZB.MOM.WW.MxGateway.Tests/Alarms/AlarmFailoverEndToEndTests.cs:315-329` |
| Tests-036 | Low | Resolved | Testing coverage | `src/ZB.MOM.WW.MxGateway.Tests/Configuration/GatewayOptionsValidatorTests.cs` |
| Tests-037 | Low | Won't Fix | Testing coverage | `src/ZB.MOM.WW.MxGateway.Tests/Contracts/ProtobufContractRoundTripTests.cs` |
| Tests-038 | Low | Resolved | Performance & resource management | `src/ZB.MOM.WW.MxGateway.Tests/Gateway/Sessions/SessionEventDistributorTests.cs:702-713` |
| Tests-039 | Low | Resolved | Concurrency & thread safety | `src/ZB.MOM.WW.MxGateway.Tests/Gateway/Sessions/GatewaySessionDashboardMirrorTests.cs` (`DashboardMirror_AndGrpcSubscriber_BothReceiveEvents`) |
| Worker-009 | Low | Resolved | Performance & resource management | `src/MxGateway.Worker/Ipc/WorkerFrameReader.cs:31,49`, `src/MxGateway.Worker/Ipc/WorkerFrameWriter.cs:57-58` |
| Worker-010 | Low | Resolved | Correctness & logic bugs | `src/MxGateway.Worker/Conversion/VariantConverter.cs:204-226` |
| Worker-011 | Low | Resolved | Correctness & logic bugs | `src/MxGateway.Worker/Ipc/WorkerPipeClient.cs:169-171` |
+20 -5
View File
@@ -7,7 +7,7 @@
| Review date | 2026-06-16 |
| Commit reviewed | `8df5ab3` |
| Status | Re-reviewed |
| Open findings | 2 |
| Open findings | 0 |
## Checklist coverage
@@ -1047,13 +1047,13 @@ Additionally, `GatewayAlarmMonitor.ApplyProviderModeChangeAsync` increments the
| Severity | Medium |
| Category | Design-document adherence |
| Location | `docs/DesignDecisions.md` (Session Reconnect / Event Subscribers / Later Revisit Items §470-471), `CLAUDE.md` (Repository-Specific Conventions) |
| Status | Open |
| Status | Resolved |
**Description:** The session-resilience epic shipped multi-subscriber fan-out (`SessionEventDistributor`), reconnectable sessions with replay (`AttachEventSubscriberWithReplay`/`ReplayGap`), and detach-grace retention — but `docs/DesignDecisions.md` still states "no reconnectable sessions for v1" and "one active StreamEvents subscriber per session for v1", and still files both as post-v1 "Later Revisit Items". `CLAUDE.md` likewise still says these are "explicitly out of scope". This is the stale-prose-vs-shipped-behavior drift the "update docs in the same change as the source" rule prohibits.
**Recommendation:** Update both `DesignDecisions.md` sections and the revisit list to describe the shipped behavior (gated by `AllowMultipleEventSubscribers`, `DetachGraceSeconds`, replay options), and amend the CLAUDE.md convention bullet.
**Resolution:** _(empty until closed)_
**Resolution:** 2026-06-16: updated `docs/DesignDecisions.md` (Session Reconnect section rewritten to describe the shipped detach-grace + replay-on-reconnect behavior with config references; Event Subscribers section rewritten to describe the config-gated multi-subscriber fan-out, mode-dependent `FailFast` semantics, and internal vs external subscriber distinction; Later Revisit Items list removes the two shipped items and records them as shipped with config cross-references) and the `CLAUDE.md` conventions bullet to describe the shipped config-gated multi-subscriber + reconnect-replay behavior while preserving the one-worker-per-session invariant.
### Server-055
@@ -1062,10 +1062,25 @@ Additionally, `GatewayAlarmMonitor.ApplyProviderModeChangeAsync` increments the
| Severity | Low |
| Category | Correctness & logic bugs |
| Location | `src/ZB.MOM.WW.MxGateway.Server/Sessions/GatewaySession.cs:842-851,1841-1871` |
| Status | Open |
| Status | Resolved |
**Description:** When `AttachEventSubscriber`/`AttachEventSubscriberWithReplay` fails inside `StartDistributorAndRegister`, the catch calls `DetachEventSubscriber()`, which decrements the active count back to 0 and — because the session is still `Ready` and detach-grace is enabled — stamps `_detachedAtUtc = now`. A freshly-`Ready` session that never had a successful subscriber thus enters the detach-grace window on a failed first attach, making it sweep-eligible after `DetachGraceSeconds` even though no client ever streamed. Impact is minor (the lease still protects it; a later successful attach clears the stamp) but the "last subscriber dropped" semantics are violated.
**Recommendation:** Only stamp `_detachedAtUtc` on a detach that mirrors a prior successful attach — roll the failure path back without entering grace, or guard the stamp on "a subscriber had previously been registered."
**Resolution:** _(empty until closed)_
**Resolution:** 2026-06-16: `GatewaySession` now tracks `_everHadEventSubscriber` (a `bool` field, set to `true` inside `MarkEventSubscriberAttached()` which is called only after `StartDistributorAndRegister` succeeds). `DetachEventSubscriber` gates the `_detachedAtUtc` stamp on `_everHadEventSubscriber`, so the catch-path rollback decrements the reserved slot but does not enter detach-grace. A regression `[Fact]` (`DetachGrace_FailedFirstAttach_DoesNotEnterGrace`) in `GatewaySessionTests.cs` verifies that after a failed first attach the session has `DetachedAtUtc == null`, `ActiveEventSubscriberCount == 0`, and `IsDetachGraceExpired` returns `false` regardless of clock advance.
### Server-056
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Concurrency & thread safety |
| Location | `src/ZB.MOM.WW.MxGateway.Server/Sessions/SessionEventDistributor.cs:296-310,449-453,629-635` |
| Status | Resolved |
**Description:** `SessionEventDistributor` orphaned any subscriber that registered in the window AFTER the pump ran its final `CompleteAllSubscribers` sweep (the event source completed or faulted and the pump exited) but BEFORE `DisposeAsync`. `RegisterSubscriber`/`RegisterWithReplay` guarded only against `_disposed`, not against the pump having already completed, so a subscriber added in that window got a channel the now-exited pump would never complete — its reader (`ReadAllAsync`) waited forever. In production this is the edge case of a client calling `StreamEvents` after the worker's event stream has ended but before the session is torn down. Discovered while diagnosing an order-dependent hang in `GatewaySessionDashboardMirrorTests`, where a gRPC subscriber attached after a fast-completing worker stream had already drained (its `await foreach` has no timeout, so the orphaned channel surfaced as an infinite hang rather than a clean failure).
**Recommendation:** Record terminal completion (a `_completed` flag plus the terminal error) under `_lifecycleLock` and have both register paths complete a late registrant's channel immediately with the same terminal state.
**Resolution:** 2026-06-17: added `_completed` + `_completionError`, set inside `CompleteAllSubscribers` under `_lifecycleLock` — the same lock the register paths take, so completion and registration serialize (a subscriber added before the sweep is completed by the loop; one racing in after sees `_completed` and self-completes). `Register` and `RegisterWithReplay` now `TryComplete` a late registrant's channel with `_completionError` when `_completed`; a late resume still receives its retained replay batch, then a cleanly-completed empty live channel. No lock-ordering risk — `CompleteAllSubscribers` takes only `_lifecycleLock`, and the subscriber channels use `AllowSynchronousContinuations=false` so `TryComplete` under the lock runs no continuation inline. New regression `[Fact]` `Register_AfterSourceCompletes_CompletesLateSubscriberInsteadOfHanging` (`SessionEventDistributorTests.cs`) registers a subscriber after the pump completes and asserts its channel completes (bounded read); verified it fails without the fix (5 s timeout) and passes with it (12 ms). The racy `GatewaySessionDashboardMirrorTests.DashboardMirror_AndGrpcSubscriber_BothReceiveEvents` that exposed it was also made deterministic — see Tests-039.
+22 -7
View File
@@ -7,7 +7,7 @@
| Review date | 2026-06-16 |
| Commit reviewed | `8df5ab3` |
| Status | Re-reviewed |
| Open findings | 3 |
| Open findings | 0 |
## Checklist coverage
@@ -672,13 +672,13 @@ The cancellation tests for `WorkerClient` in `WorkerClientTests` *do* exercise t
| Severity | Low |
| Category | Testing coverage |
| Location | `src/ZB.MOM.WW.MxGateway.Tests/Configuration/GatewayOptionsValidatorTests.cs` |
| Status | Open |
| Status | Resolved |
**Description:** Three new validator rules — `DetachGraceSeconds >= 0` (GatewayOptionsValidator.cs:185-186), `ReplayBufferCapacity >= 0` (:215-216), `ReplayRetentionSeconds >= 0` (:219-220) — have no tests, while the sibling new options (`MaxEventSubscribersPerSession`, `WorkerReadyWaitTimeoutMs`) do. A regression dropping/inverting any of the three guards would pass with no failing test.
**Recommendation:** Add boundary theories mirroring the `MaxEventSubscribersPerSession` pattern: a failing case (`-1`) asserting the message contains each config path, and a succeeding boundary case (`0`).
**Resolution:** _(empty until closed)_
**Resolution:** 2026-06-16 — Added six tests to `GatewayOptionsValidatorTests.cs` covering all three guards: `Validate_Fails_WhenDetachGraceSecondsIsNegative` / `Validate_Succeeds_WhenDetachGraceSecondsIsZero` (via `CloneWithSessions`); `Validate_Fails_WhenReplayBufferCapacityIsNegative` / `Validate_Succeeds_WhenReplayBufferCapacityIsZero` and `Validate_Fails_WhenReplayRetentionSecondsIsNegative` / `Validate_Succeeds_WhenReplayRetentionSecondsIsZero` (via a new `CloneWithEvents` helper). Each failing case asserts the failure message contains the config path; each boundary case asserts `Succeeded`. Mirrors the `MaxEventSubscribersPerSession` / `WorkerReadyWaitTimeoutMs` pattern.
### Tests-037
@@ -687,13 +687,13 @@ The cancellation tests for `WorkerClient` in `WorkerClientTests` *do* exercise t
| Severity | Low |
| Category | Testing coverage |
| Location | `src/ZB.MOM.WW.MxGateway.Tests/Contracts/ProtobufContractRoundTripTests.cs` |
| Status | Open |
| Status | Won't Fix |
**Description:** The reconnect/replay contract surface (`ReplayGap` message, `MxEvent.replay_gap = 14`, `StreamEventsRequest.after_worker_sequence`) has no protobuf serialize/parse round-trip test pinning the wire shape and the documented sentinel invariant (family UNSPECIFIED, body oneof and per-item fields unset). Behavior is exercised in EventStreamServiceTests; this is a wire-contract gap.
**Recommendation:** Add a round-trip test building an `MxEvent` with `ReplayGap` populated, asserting the two sequence fields survive and the sentinel invariants hold (field 14, `Family == Unspecified`, `BodyCase` unset).
**Resolution:** _(empty until closed)_
**Resolution:** 2026-06-16: covered by the ReplayGap round-trip + descriptor-pin test added under Contracts-022 in ProtobufContractRoundTripTests.cs; a duplicate here would be redundant.
### Tests-038
@@ -702,10 +702,25 @@ The cancellation tests for `WorkerClient` in `WorkerClientTests` *do* exercise t
| Severity | Low |
| Category | Performance & resource management |
| Location | `src/ZB.MOM.WW.MxGateway.Tests/Gateway/Sessions/SessionEventDistributorTests.cs:702-713` |
| Status | Open |
| Status | Resolved |
**Description:** `DrainUntilFaultAsync` relies on the channel completing WITH a fault so `WaitToReadAsync` re-throws. Correct for current callers, but if reused on a channel that completes gracefully, `WaitToReadAsync` returns false without throwing and the helper spins in a tight CPU loop with no escape (ReadTimeout bounds only the individual wait). A maintenance hazard, not a current bug.
**Recommendation:** When `WaitToReadAsync` returns false, await `reader.Completion` (surfaces the fault or completes cleanly) and `Assert.Fail` on graceful completion, so the helper fails fast instead of spinning.
**Resolution:** _(empty until closed)_
**Resolution:** 2026-06-16 — When `WaitToReadAsync` returns `false` (graceful completion), the helper now awaits `reader.Completion` (propagating any stored fault) and then calls `Assert.Fail` so the helper fails fast rather than spinning; the fault-path behavior (re-throw from `WaitToReadAsync`) is preserved unchanged.
### Tests-039
| Field | Value |
|---|---|
| Severity | Low |
| Category | Concurrency & thread safety |
| Location | `src/ZB.MOM.WW.MxGateway.Tests/Gateway/Sessions/GatewaySessionDashboardMirrorTests.cs` (`DashboardMirror_AndGrpcSubscriber_BothReceiveEvents`) |
| Status | Resolved |
**Description:** `DashboardMirror_AndGrpcSubscriber_BothReceiveEvents` attached its gRPC subscriber via `StreamEventsAsync` AFTER `MarkReady()` had already started the pump draining a fast-completing 3-event fake worker — a register-vs-pump race. It passed alone but hung the whole `GatewaySessionDashboardMirrorTests` class when another test ran first (warm JIT let the pump drain and complete before the gRPC subscriber registered). Its `await foreach` over the gRPC stream uses `CancellationToken.None` with no timeout, so the race surfaced as an indefinite hang rather than a clean failure (unlike the sibling tests' `WaitUntilAsync`, which self-times-out at 5s). This exposed the production race fixed under Server-056.
**Recommendation:** Make the test deterministic — hold the worker stream until both the dashboard mirror and the gRPC subscriber have attached, then release, so neither subscriber can miss an event regardless of scheduling.
**Resolution:** 2026-06-17 — added a release-gate to the test's `FakeWorkerClient` (`HoldEventsUntilReleased()` / `ReleaseEvents()`; `ReadEventsAsync` awaits the gate before yielding, ungated by default so other tests are unaffected). The test now holds the stream, starts the gRPC reader on a background task, waits for `session.ActiveEventSubscriberCount == 1` (the internal dashboard mirror is excluded from the count, so this confirms the gRPC subscriber attached), then releases — both subscribers deterministically receive all three events. With the Server-056 production fix in place, the full `GatewaySessionDashboardMirrorTests` class now passes (5/5) instead of hanging.