code-reviews: 2026-06-16 re-review of all 11 modules at 8df5ab3

Re-review of the 99-commit delta since the 410acc9 baseline (session-resilience
epic, dashboard disable-login, galaxy browse fixes, and stillpending §8).

44 new Open findings, no Critical/High:
- Server 2 (incl. Medium design-doc drift), Worker 0 (026/027/028 confirmed
  resolved), Contracts 3, Tests 3, Worker.Tests 3, IntegrationTests 4
- Client.Dotnet 4 (Medium env-var key redaction), Client.Go 5 (Medium watch
  drain), Client.Java 9 (Medium overflow race), Client.Python 5 (Medium README
  API), Client.Rust 6 (Medium --tls/--plaintext downgrade)

README regenerated; regen-readme.py --check passes.
Joseph Doherty
2026-06-16 18:57:56 -04:00
parent 8df5ab381a
commit 25d04ec37e
12 changed files with 936 additions and 44 deletions
+80 -3
@@ -4,10 +4,10 @@
|---|---|
| Module | `clients/dotnet` |
| Reviewer | Claude Code |
| Review date | 2026-06-15 |
| Commit reviewed | `410acc9` |
| Review date | 2026-06-16 |
| Commit reviewed | `8df5ab3` |
| Status | Re-reviewed |
| Open findings | 0 |
| Open findings | 4 |
## Checklist coverage
@@ -603,3 +603,80 @@ Net effect at HEAD: `dotnet build clients/dotnet/ZB.MOM.WW.MxGateway.Client.slnx
**Recommendation:** Either (a) tighten the documented contract to "ExpandAsync is safe to call concurrently, but Children/IsExpanded must only be read after the awaited ExpandAsync completes (no concurrent reader/expander)", or (b) make the publication safe: write `_isExpanded` via `Volatile.Write` and read via `Volatile.Read`, and return an immutable snapshot from `Children` (e.g. assign a completed `IReadOnlyList` under the lock and expose that field) so lock-free readers never observe a partially-populated list. Option (a) is the smallest change and matches the realistic usage (UI thread expands then renders).
**Resolution:** 2026-06-15 — Confirmed against source: `Children => _children` returned the live mutable backing `List<LazyBrowseNode>` and `IsExpanded => _isExpanded` read a plain `bool`, while `ExpandAsync` appended to that same list under `_expandLock` with no release/acquire barrier to lock-free readers — so a concurrent reader could enumerate a mid-append list and throw `InvalidOperationException` ("collection was modified"). Applied option (b) (safe publication): `ExpandAsync` now accumulates children into a method-local `List<LazyBrowseNode>` and, only when fully drained across all pages, publishes it via `Volatile.Write(ref _children, children)` (release) immediately before setting the now-`volatile bool _isExpanded = true`. The `_children` field is an `IReadOnlyList<LazyBrowseNode>` read via `Volatile.Read` from the `Children` getter (acquire), so a reader that observes `IsExpanded == true` always sees the fully-populated snapshot and never enumerates a partially-built list. Updated the `ExpandAsync` `<remarks>` to document the strengthened concurrent-read guarantee. Regression test `LazyBrowseNodeTests.Expand_ConcurrentReadOfChildren_NeverTearsAndPublishesAtomically` gates the child-page RPCs (via a new `FakeGalaxyRepositoryTransport.BrowseChildrenGate` hook) to hold the expand mid-flight while a background reader spins enumerating `Children` and reading `IsExpanded`, asserting no exception escapes and that once `IsExpanded` is true the published snapshot has all five children. Verified red against the pre-fix code (the reader threw `InvalidOperationException: Collection was modified` deterministically across three runs) and green after the fix.
#### 2026-06-16 re-review (commit 8df5ab3)
Re-review of the .NET client delta: `LazyBrowseNode` lazy paging + tests, the new `MxGatewayClientCli` galaxy-browse surface + tests, `GalaxyClientFactory`/adapter seam. Client.Dotnet-025 (LazyBrowseNode publish ordering) confirmed resolved. One Medium security regression.
| # | Category | Result |
|---|---|---|
| 1 | Correctness & logic bugs | Client.Dotnet-026 |
| 2 | mxaccessgw conventions | No issues found |
| 3 | Concurrency & thread safety | No issues found |
| 4 | Error handling & resilience | No issues found |
| 5 | Security | Client.Dotnet-028 |
| 6 | Performance & resource management | Client.Dotnet-027 |
| 7 | Design-document adherence | No issues found |
| 8 | Code organization & conventions | Client.Dotnet-029 |
| 9 | Testing coverage | No issues found |
| 10 | Documentation & comments | No issues found |
### Client.Dotnet-026
| Field | Value |
|---|---|
| Severity | Low |
| Category | Correctness & logic bugs |
| Location | `clients/dotnet/.../MxGatewayClientCli.cs:306` (isLongRunning) |
| Status | Open |
**Description:** Client.Dotnet-015 extended `isLongRunning` to include the bench commands so they are not silently cancelled by the default 30 s `CancellationTokenSource` (CTS). The new `galaxy-browse` command is NOT in `isLongRunning`. A `galaxy-browse --depth N` tree walk on a large Galaxy can exceed 30 s because it issues sequential paginated RPCs per node. When the CTS fires, the resulting `OperationCanceledException` escapes as a non-zero exit with no output. This is the same silent failure the bench commands were exempted from.
**Recommendation:** Add `"galaxy-browse"` to the `isLongRunning` set alongside `galaxy-watch`/bench, so it defaults to unlimited wall-clock and only applies `CancelAfter` with an explicit `--timeout`.
**Resolution:** _(empty until closed)_
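The recommended timeout policy is sketched below in Python as a language-neutral illustration, not the C# CLI code. `effective_timeout`, `LONG_RUNNING`, and `DEFAULT_TIMEOUT_SECONDS` are hypothetical names. The bench command names are left out because this finding does not spell them out.

```python
from typing import Optional

DEFAULT_TIMEOUT_SECONDS = 30.0  # the CLI's default wall-clock budget

# Commands that default to no deadline (hypothetical set; the bench commands
# also belong here, but their exact names are not listed in this finding).
LONG_RUNNING = frozenset({"galaxy-watch", "galaxy-browse"})


def effective_timeout(command: str, explicit_timeout: Optional[float]) -> Optional[float]:
    """Return the deadline to apply in seconds, or None for no deadline.

    An explicit --timeout always wins; otherwise long-running commands get
    no deadline and every other command gets the default.
    """
    if explicit_timeout is not None:
        return explicit_timeout
    if command in LONG_RUNNING:
        return None
    return DEFAULT_TIMEOUT_SECONDS
```

In the C# CLI, the `None` result corresponds to skipping `CancelAfter` entirely.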
### Client.Dotnet-027
| Field | Value |
|---|---|
| Severity | Low |
| Category | Performance & resource management |
| Location | `clients/dotnet/ZB.MOM.WW.MxGateway.Client/LazyBrowseNode.cs:15` |
| Status | Open |
**Description:** `LazyBrowseNode` allocates one `SemaphoreSlim _expandLock = new(1,1)` per node and never disposes it (the type is not IDisposable). In a large Galaxy browse tree (thousands of nodes), live `SemaphoreSlim` instances accumulate. Any kernel wait handle a semaphore has lazily allocated (e.g. via `AvailableWaitHandle`) is released only on finalization. The cost is negligible for small trees and meaningful for long-lived large trees.
**Recommendation:** Replace the once-only async gate with a non-disposable primitive (e.g. `Lazy<Task>`-based dedup) or make `LazyBrowseNode` IDisposable and dispose the semaphore. Document the chosen lifetime contract.
**Resolution:** _(empty until closed)_
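The `Lazy<Task>`-style dedup suggested above is sketched below as a Python asyncio analogue. The whole once-only gate is a single memoized task, so there is nothing to dispose. In .NET the equivalent would likely be a `Lazy<Task>` using the default `LazyThreadSafetyMode.ExecutionAndPublication`. Here a single event loop gives the same guarantee because nothing is awaited between the check and the assignment. `LazyNode`, `fetch_children`, and `expand` are hypothetical names.

```python
import asyncio
from typing import Awaitable, Callable, List, Optional


class LazyNode:
    """Once-only async expansion without a per-node semaphore."""

    def __init__(self, fetch_children: Callable[[], Awaitable[List[str]]]) -> None:
        self._fetch_children = fetch_children
        self._expand_task: Optional[asyncio.Task] = None
        self.children: List[str] = []

    async def expand(self) -> None:
        # Nothing is awaited between the check and the assignment, so on one
        # event loop every concurrent caller shares the same task (one fetch).
        if self._expand_task is None:
            self._expand_task = asyncio.ensure_future(self._run())
        await self._expand_task

    async def _run(self) -> None:
        # Publish the list only once the fetch has fully completed.
        self.children = await self._fetch_children()
```

One design consequence: a failed fetch stays cached and is re-raised to later callers, much as a faulted task cached in a `Lazy<Task>` would be. Retrying would require resetting the memoized task.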
### Client.Dotnet-028
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Security |
| Location | `clients/dotnet/.../MxGatewayClientCli.cs:156` |
| Status | Open |
**Description:** Client.Dotnet-008 was recorded as resolved by adding a `TryResolveApiKey` helper, wired into the error-redaction catch block, that resolves both `--api-key` and the `--api-key-env` env-var path. At HEAD the catch block reads `arguments.GetOptional("api-key")` only, which is the pre-008 code. When the key is sourced from the env var, `GetOptional("api-key")` returns null and `Redact(message, null)` is a no-op, so an exception message echoing the bearer key would print it raw to stderr. The existing regression test only covers the direct `--api-key` path, so it passes against the broken code. (Claimed regression; verify root cause before fixing.)
**Recommendation:** Restore the `TryResolveApiKey` pattern (resolve `--api-key` then the `--api-key-env`-named env var, default `MXGATEWAY_API_KEY`) in the catch block, and add a regression test that sources the key from the env var and asserts it is redacted in stderr.
**Resolution:** _(empty until closed)_
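The key point of the fix is that redaction must resolve the key the same way the connection path does. This is a minimal Python sketch, not the C# helper. `try_resolve_api_key`, `redact`, and the `***` placeholder are hypothetical. The resolution order (`--api-key`, then the env var named by `--api-key-env`, defaulting to `MXGATEWAY_API_KEY`) is taken from the recommendation.

```python
import os
from typing import Mapping, Optional

DEFAULT_API_KEY_ENV = "MXGATEWAY_API_KEY"
REDACTED = "***"  # hypothetical placeholder text


def try_resolve_api_key(options: Mapping[str, Optional[str]],
                        environ: Mapping[str, str] = os.environ) -> Optional[str]:
    """--api-key first, then the env var named by --api-key-env (or the default)."""
    direct = options.get("api-key")
    if direct:
        return direct
    env_name = options.get("api-key-env") or DEFAULT_API_KEY_ENV
    return environ.get(env_name) or None


def redact(message: str, secret: Optional[str]) -> str:
    """Replace every occurrence of secret; a no-op when no key was resolved."""
    if not secret:
        return message
    return message.replace(secret, REDACTED)
```

A regression test that sets only the environment variable, and never `--api-key`, is what catches the reintroduced bug.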
### Client.Dotnet-029
| Field | Value |
|---|---|
| Severity | Low |
| Category | Code organization & conventions |
| Location | `clients/dotnet/.../IMxGatewayCliClient.cs:6` |
| Status | Open |
**Description:** `IMxGatewayCliClient` is a public interface with no type-level `<summary>` XML doc. The Client.Dotnet-013 resolution recorded adding one; at HEAD it is absent. No CS1591 fires (GenerateDocumentationFile now scoped to the packable library only), but the public extension point should follow the public-surface doc convention.
**Recommendation:** Add a one-line `<summary>` describing the interface and noting `MxGatewayCliClientAdapter` is the production binding.
**Resolution:** _(empty until closed)_
+95 -3
@@ -4,10 +4,10 @@
|---|---|
| Module | `clients/go` |
| Reviewer | Claude Code |
| Review date | 2026-06-15 |
| Commit reviewed | `410acc9` |
| Review date | 2026-06-16 |
| Commit reviewed | `8df5ab3` |
| Status | Re-reviewed |
| Open findings | 0 |
| Open findings | 5 |
## Checklist coverage
@@ -116,6 +116,23 @@ justified — not a finding. The `LazyBrowseNode` concurrency model
| 9 | Testing coverage | No issues found — new walker, pagination, dup-token, filter-forwarding, and TLS-posture paths are all covered. |
| 10 | Documentation & comments | New issue: README "Installing the Go client" recommends the `GONOSUMCHECK` env var, which was removed from the Go toolchain in 1.13 and is a no-op on Go 1.26 (Client.Go-029). |
#### 2026-06-16 re-review (commit 8df5ab3)
Re-review of the Go client delta: new `ping`/`galaxy-browse` CLI commands, `Write2`/bulk additions, session.go. gofmt/vet/build clean. Two claimed regressions of prior resolutions (Go-013 drain, Go-020 signal handler) — verify root cause before fixing.
| # | Category | Result |
|---|---|---|
| 1 | Correctness & logic bugs | Client.Go-031 |
| 2 | mxaccessgw conventions | No issues found |
| 3 | Concurrency & thread safety | Client.Go-030 |
| 4 | Error handling & resilience | No issues found |
| 5 | Security | No issues found |
| 6 | Performance & resource management | No issues found |
| 7 | Design-document adherence | No issues found |
| 8 | Code organization & conventions | Client.Go-032 |
| 9 | Testing coverage | Client.Go-033 |
| 10 | Documentation & comments | Client.Go-034 |
## Findings
### Client.Go-001
@@ -706,3 +723,78 @@ if ($dirty) {
**Recommendation:** Drop `GONOSUMCHECK` and document the current knobs. Concretely: "set `GOPRIVATE=gitea.dohertylan.com/*` (this disables both the checksum database and the public module proxy for that path); add `GOINSECURE=gitea.dohertylan.com/*` if the host serves the module over plain HTTP." If only checksum-database verification needs to be skipped, `GONOSUMDB=gitea.dohertylan.com/*` is the narrower setting.
**Resolution:** 2026-06-15 — Dropped the dead `GONOSUMCHECK` advice from the "Installing the Go client" section of `clients/go/README.md`; it now documents `GOPRIVATE=gitea.dohertylan.com/*` (which bypasses both the public module proxy and checksum-database verification for that path) plus `GOINSECURE=gitea.dohertylan.com/*` for plain-HTTP hosts.
### Client.Go-030
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Concurrency & thread safety |
| Location | `clients/go/cmd/mxgw-go/main.go:1491-1494` |
| Status | Open |
**Description:** `runGalaxyWatch`'s limit-reached branch calls `cancelStream()` and returns WITHOUT draining the buffered `events` channel, unlike the signal-cancel branch which drains. This is the shape Client.Go-013's resolution claimed to have fixed ("now drains via for range events"). The WatchDeployEvents goroutine may still be blocked sending into the 16-deep channel; it exits via ctx cancellation (not a permanent leak) but remains alive until that propagates, racing `defer client.Close()`. (Claimed regression — verify root cause.)
**Recommendation:** After `cancelStream()` in the limit-reached branch, drain: `for range events {}`, mirroring the signal-cancel branch.
**Resolution:** _(empty until closed)_
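The sketch below is a Python threading analogue, not Go code. It shows why the limit-reached branch should drain after cancelling. The producer may be blocked putting into the bounded 16-slot buffer. Consuming until its end marker guarantees it has exited before teardown, which is what `for range events {}` after `cancelStream()` achieves in Go. The `_CLOSED` sentinel stands in for closing the channel, and all names are hypothetical.

```python
import queue
import threading
from typing import List, Tuple

_CLOSED = object()  # stands in for Go closing the events channel


def _produce(events: queue.Queue, stop: threading.Event) -> None:
    n = 0
    while not stop.is_set():
        events.put(n)  # blocks while the bounded buffer is full
        n += 1
    events.put(_CLOSED)  # the producer's last act, like close(events)


def watch_with_limit(limit: int, capacity: int = 16) -> Tuple[List[object], bool]:
    events: queue.Queue = queue.Queue(maxsize=capacity)
    stop = threading.Event()
    producer = threading.Thread(target=_produce, args=(events, stop))
    producer.start()
    received = [events.get() for _ in range(limit)]
    stop.set()  # analogue of cancelStream()
    # Drain until the end marker (analogue of `for range events {}`): each get()
    # frees a slot, so a producer blocked in put() can go on to observe `stop`.
    while events.get() is not _CLOSED:
        pass
    producer.join(timeout=5)
    return received, producer.is_alive()
```

Unlike the Go goroutine, this producer has no context to observe while blocked, so without the drain it would never exit. In Go, ctx cancellation eventually unblocks the sender, which is why the finding describes a race with `client.Close()` rather than a leak.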
### Client.Go-031
| Field | Value |
|---|---|
| Severity | Low |
| Category | Correctness & logic bugs |
| Location | `clients/go/cmd/mxgw-go/main.go:1037-1046` |
| Status | Open |
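**Description:** `closeSmokeSession` registers a deferred cancel twice on the same `cancel` variable across two `context.WithTimeout` calls when the deadline-shortening branch fires. The first context is also released by an explicit `cancel()`. What the defers call depends on their form:
- If the defers wrap the call in a closure (`defer func() { cancel() }()`), both end up calling the second context's cancel. That is idempotent and harmless today, but removing the explicit `cancel()` in a future refactor would leak the first context's timer.
- If they are plain `defer cancel()` statements, Go evaluates `cancel` when each `defer` executes, so each defer binds its own context's cancel.

Either way, the double defer on a reassigned variable is fragile and easy to misread. (Verify the exact form before fixing.)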
**Recommendation:** Use a distinct variable for the second cancel, or compute the close timeout once before allocating a single context.
**Resolution:** _(empty until closed)_
### Client.Go-032
| Field | Value |
|---|---|
| Severity | Low |
| Category | Code organization & conventions |
| Location | `clients/go/cmd/mxgw-go/main.go:839-841` |
| Status | Open |
**Description:** `runStreamEvents` does not install a `signal.NotifyContext` handler, while `runStreamAlarms` and `runGalaxyWatch` do. Client.Go-020's resolution claimed this was added. Without a signal-aware parent context, Ctrl+C kills the process without running `defer subscription.Close()`/`client.Close()`, so the gateway sees a torn connection rather than a clean `codes.Canceled`. (Claimed regression — verify root cause.)
**Recommendation:** Wrap `ctx` with `signal.NotifyContext(ctx, os.Interrupt, syscall.SIGTERM)` (defer the stop) before deriving `streamCtx`, matching the other two stream commands.
**Resolution:** _(empty until closed)_
### Client.Go-033
| Field | Value |
|---|---|
| Severity | Low |
| Category | Testing coverage |
| Location | `clients/go/cmd/mxgw-go/main_test.go` |
| Status | Open |
**Description:** Gaps vs prior coverage: (1) `TestRunBenchReadBulkRejectsNonPositiveDuration` (named in Client.Go-021's resolution) is absent — the `-duration-seconds`-positive guard at main.go:619 is untested; (2) `runStreamEvents` has no CLI-level test (session-id-required and limit paths untested); (3) `TestRunWriteBulkVariantRejectsMismatchedHandlesAndValues` (Client.Go-021 deliverable) is absent — the len-mismatch guard at main.go:508-510 is untested.
**Recommendation:** Add the three missing tests; all run through `runWithIO` without a fake server (except the stream-events one which can reuse the ping test's fake-server pattern).
**Resolution:** _(empty until closed)_
### Client.Go-034
| Field | Value |
|---|---|
| Severity | Low |
| Category | Documentation & comments |
| Location | `clients/go/README.md:245-263` |
| Status | Open |
**Description:** The README CLI example table lists ~12 commands but the binary now exposes ~27 subcommands (per `writeUsage`). Absent: `ping`, `galaxy-browse`, `batch`, `read-bulk`, `write-bulk`, `write2-bulk`, `write-secured-bulk`, `write-secured2-bulk`, `bench-read-bulk`, `stream-alarms`, `acknowledge-alarm`, and more. `batch` (the cross-language harness interface with an EOR sentinel + 16 MiB line cap) is undocumented entirely.
**Recommendation:** Add a complete subcommand reference, and document the `batch` EOR-sentinel protocol and line cap.
**Resolution:** _(empty until closed)_
+155 -3
@@ -4,10 +4,10 @@
|---|---|
| Module | `clients/java` |
| Reviewer | Claude Code |
| Review date | 2026-06-15 |
| Commit reviewed | `410acc9` |
| Review date | 2026-06-16 |
| Commit reviewed | `8df5ab3` |
| Status | Re-reviewed |
| Open findings | 0 |
| Open findings | 9 |
## Checklist coverage
@@ -106,6 +106,23 @@ Client.Java-001..036 are unchanged.
| 9 | Testing coverage | No issues found. The browse surface has thorough library tests in `GalaxyRepositoryClientTests` (roots, expand-populates, idempotent-single-RPC, unknown-parent not-found, multi-page gather, concurrent-callers-one-RPC, filter forwarding, repeated-page-token rejection); TLS lenient/strict paths are covered by `MxGatewayClientTlsTests` against a real in-process TLS server. |
| 10 | Documentation & comments | Issue found: the README "Browsing lazily" first code snippet calls `galaxy.browseChildren(BrowseChildrenRequest…)`, but no such method exists on `GalaxyRepositoryClient` — the raw single-RPC method is `browseChildrenRaw(BrowseChildrenRequest)`; the documented snippet does not compile (Client.Java-037). |
#### 2026-06-16 re-review (commit 8df5ab3)
Re-review of the Java client delta: the §8 `GalaxyClientFactory` seam, `InProcessGatewayHarness`, and the §8 CLI test coverage. Seam is behavior-preserving; harness channel lifecycle correct. One Medium concurrency item in the pre-existing stream-alarms overflow handler.
| # | Category | Result |
|---|---|---|
| 1 | Correctness & logic bugs | Client.Java-040, Client.Java-041 |
| 2 | mxaccessgw conventions | No issues found |
| 3 | Concurrency & thread safety | Client.Java-040 |
| 4 | Error handling & resilience | Client.Java-042 |
| 5 | Security | No issues found |
| 6 | Performance & resource management | No issues found |
| 7 | Design-document adherence | No issues found |
| 8 | Code organization & conventions | Client.Java-043, Client.Java-044 |
| 9 | Testing coverage | Client.Java-045, Client.Java-046 |
| 10 | Documentation & comments | Client.Java-047, Client.Java-048 |
## Findings
### Client.Java-001
@@ -728,6 +745,141 @@ BrowseChildrenReply reply = galaxy.browseChildren(
**Resolution:** 2026-06-15 — Confirmed against source: `MxGatewayClientOptions` (`zb-mom-ww-mxgateway-client/.../MxGatewayClientOptions.java:108,260`) exposes `requireCertificateValidation()` and a `Builder.requireCertificateValidation(boolean)`, but the CLI `CommonOptions` in `MxGatewayCli.java` declared no flag and `toClientOptions()` never set it, forcing the lenient default on every non-pinned TLS CLI connection. Added a bare-boolean `@Option(names = "--require-certificate-validation")` field to `CommonOptions` (defaults to `false`, preserving the lenient default; mirrors the existing `--plaintext` flag-style option), propagated it through `toClientOptions()` via `.requireCertificateValidation(requireCertificateValidation)`, and added it to `redactedJsonMap()` so `--json` output reflects the effective trust posture. Documented the new flag and the lenient-by-default trust posture in `clients/java/README.md`. Note: the Client.Java-025 precedent (`shutdownTimeout`) was applied to the pre-rename `mxgateway-cli` module and is not present in this renamed `zb-mom-ww-mxgateway-cli` `toClientOptions()`; I mirrored the live `--ca-file`/`--server-name-override` TLS-option plumbing pattern instead, which is the correct precedent here. Regression tests in `MxGatewayCliTests`: `requireCertificateValidationFlagPropagatesThroughToClientOptions` (drives `acknowledge-alarm --require-certificate-validation` through a new `CapturingClientFactory` that records `options.toClientOptions()` and asserts `MxGatewayClientOptions.requireCertificateValidation()` is `true`) and `requireCertificateValidationDefaultsToLenientWhenFlagAbsent` (asserts the flag defaults to `false`). The capturing factory exercises the real `toClientOptions()` propagation, stronger than a parse-only check.
### Client.Java-040
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Correctness & logic bugs |
| Location | `clients/java/zb-mom-ww-mxgateway-cli/src/main/java/com/zb/mom/ww/mxgateway/cli/MxGatewayCli.java:1552-1561` |
| Status | Open |
**Description:** The `stream-alarms` overflow handler does `queue.clear()` then `offer(exception)` + `offer(ALARM_FEED_END)` non-atomically on an `ArrayBlockingQueue` shared with the gRPC delivery thread. In production gRPC (netty I/O thread), a concurrent `onNext` between the clear and the offers can re-enqueue a normal message, displacing the overflow exception so the drain loop hits the normal message and may exit before reaching the exception — exiting 0 on a truncated feed. Same race class as Client.Java-002/033.
**Recommendation:** Guard the overflow transition with an `AtomicBoolean` (mirror `MxGatewayStreamSubscription.terminate()`'s terminated-flag + lock) instead of re-clearing the queue.
**Resolution:** _(empty until closed)_
### Client.Java-041
| Field | Value |
|---|---|
| Severity | Low |
| Category | Correctness & logic bugs |
| Location | `clients/java/zb-mom-ww-mxgateway-cli/src/main/java/com/zb/mom/ww/mxgateway/cli/MxGatewayCli.java:2187-2194` |
| Status | Open |
**Description:** `jsonString` escapes only `\`, `"`, `\r`, and `\n`. It does not escape `\t`, `\b`, `\f`, or the remaining control characters in U+0000 to U+001F, all of which RFC 8259 §7 requires to be escaped. U+007F may optionally be escaped too. A tag address, message, or reference containing a tab therefore produces malformed JSON. This affects the hand-rolled `jsonObject`/`jsonString`/`jsonValue` output paths; the protobuf `JsonFormat` path is spec-correct.
**Recommendation:** Add `\t`/`\b`/`\f` escapes and `\u00XX` for control chars, or route all JSON through a real JSON library.
**Resolution:** _(empty until closed)_
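The escaping the recommendation asks for is shown below in Python, as an illustration of the rule rather than the Java `jsonString` itself. `json_string` is a hypothetical name. It uses the short escapes RFC 8259 defines and falls back to `\u00XX` for every other character below U+0020.

```python
_SHORT_ESCAPES = {
    '"': '\\"',
    "\\": "\\\\",
    "\b": "\\b",
    "\f": "\\f",
    "\n": "\\n",
    "\r": "\\r",
    "\t": "\\t",
}


def json_string(value: str) -> str:
    """Quote value as a JSON string literal per RFC 8259 section 7."""
    out = ['"']
    for ch in value:
        if ch in _SHORT_ESCAPES:
            out.append(_SHORT_ESCAPES[ch])
        elif ord(ch) < 0x20:
            # Remaining control characters have no short form; use a u00XX escape.
            out.append(f"\\u{ord(ch):04x}")
        else:
            out.append(ch)
    out.append('"')
    return "".join(out)
```

Routing output through a real JSON library, the recommendation's alternative, avoids maintaining this table at all.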
### Client.Java-042
| Field | Value |
|---|---|
| Severity | Low |
| Category | Error handling & resilience |
| Location | `clients/java/zb-mom-ww-mxgateway-cli/src/main/java/com/zb/mom/ww/mxgateway/cli/MxGatewayCli.java:1565-1567` |
| Status | Open |
**Description:** `StreamAlarmsCommand.onError` calls `queue.offer(error)` without checking the return value. If the queue is full when a transport error arrives, the error is dropped and the drain loop blocks forever on `queue.take()`. Same class as Client.Java-033 on the error path.
**Recommendation:** Reserve a sentinel slot or use the `terminate(Throwable)` guard from `MxEventStream`; ensure the drain always sees a terminal item.
**Resolution:** _(empty until closed)_
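Client.Java-040 (overflow displaced by a racing `onNext`) and Client.Java-042 (error dropped by a full queue) point toward the same design. The terminal outcome is recorded once under a lock instead of being enqueued. The Python threading sketch below illustrates that idea. It is not the Java `MxGatewayStreamSubscription.terminate()` code, and `BoundedFeed`, `FeedOverflow`, and `_WAKE` are hypothetical names.

```python
import queue
import threading
from typing import Any, Optional

_WAKE = object()  # only used to unblock a consumer waiting on an empty queue


class FeedOverflow(Exception):
    """Raised to the consumer when deliveries outran the bounded buffer."""


class BoundedFeed:
    """Bounded hand-off from a delivery thread to a consumer thread.

    The terminal outcome (overflow, transport error, or normal end) is stored
    once under a lock rather than enqueued, so a racing on_next cannot
    displace it and a full queue cannot drop it.
    """

    def __init__(self, capacity: int) -> None:
        self._queue: queue.Queue = queue.Queue(maxsize=capacity)
        self._lock = threading.Lock()
        self._ended = False
        self._error: Optional[BaseException] = None

    def _terminate(self, error: Optional[BaseException]) -> None:
        with self._lock:
            if self._ended:
                return  # the first terminal outcome wins
            self._ended = True
            self._error = error
        try:
            self._queue.put_nowait(_WAKE)  # wake a consumer blocked on an empty queue
        except queue.Full:
            pass  # a full queue means the consumer is not blocked waiting

    def on_next(self, message: Any) -> None:
        with self._lock:
            if self._ended:
                return  # deliveries after termination are dropped
            try:
                self._queue.put_nowait(message)
                return
            except queue.Full:
                pass
        self._terminate(FeedOverflow("alarm feed overflowed"))

    def on_error(self, error: BaseException) -> None:
        self._terminate(error)

    def on_completed(self) -> None:
        self._terminate(None)

    def take(self) -> Optional[Any]:
        """Next message; raises the terminal error; returns None after a clean end."""
        while True:
            with self._lock:
                ended, error = self._ended, self._error
            if error is not None:
                raise error  # errors fail fast, ahead of any buffered messages
            if ended and self._queue.empty():
                return None
            item = self._queue.get()
            if item is not _WAKE:
                return item
```

Because the consumer consults the recorded outcome before every take, it always sees a terminal item and never exits 0 on a truncated feed.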
### Client.Java-043
| Field | Value |
|---|---|
| Severity | Low |
| Category | Code organization & conventions |
| Location | `clients/java/zb-mom-ww-mxgateway-cli/src/test/java/com/zb/mom/ww/mxgateway/cli/MxGatewayCliTests.java:241-264` |
| Status | Open |
**Description:** `galaxyBrowseParentZeroEmitsWarningToStderr` calls `MxGatewayCli.execute(new FakeClientFactory(), ...)` for a galaxy-browse command, which wires the real `GrpcGalaxyClientFactory` and constructs a live Netty channel to localhost:5000 as a side effect (asserting only the warning). Wasteful and non-deterministic if port 5000 is reachable.
**Recommendation:** Use `executeGalaxy(...)` with a `GalaxyClientFactory` stub that throws, so only the warning path runs.
**Resolution:** _(empty until closed)_
### Client.Java-044
| Field | Value |
|---|---|
| Severity | Low |
| Category | Code organization & conventions |
| Location | `clients/java/zb-mom-ww-mxgateway-client/src/main/java/com/zb/mom/ww/mxgateway/client/MxGatewayClientVersion.java:12` |
| Status | Open |
**Description:** `CLIENT_VERSION = "0.1.0"` is out of sync with Gradle `version = '0.1.1'` (cross-ref `clients/java/build.gradle:6`). The `version` command advertises 0.1.0 while the published artifact is 0.1.1; consumers can't use the version string as a reliable artifact check.
**Recommendation:** Bump `CLIENT_VERSION` to `0.1.1` (and the two test assertions), or source it from a Gradle-generated properties file.
**Resolution:** _(empty until closed)_
### Client.Java-045
| Field | Value |
|---|---|
| Severity | Low |
| Category | Testing coverage |
| Location | `clients/java/zb-mom-ww-mxgateway-cli/src/main/java/com/zb/mom/ww/mxgateway/cli/InProcessGatewayHarness.java` |
| Status | Open |
**Description:** The harness implements only `streamEvents`/`closeSession` (gateway) and `discoverHierarchy`/`watchDeployEvents` (galaxy); all other RPCs return gRPC UNIMPLEMENTED. This is undocumented, so a future test exercising invoke/register through the harness would silently get UNIMPLEMENTED.
**Recommendation:** Add a Javadoc note enumerating implemented RPCs and warning that others return UNIMPLEMENTED by design.
**Resolution:** _(empty until closed)_
### Client.Java-046
| Field | Value |
|---|---|
| Severity | Low |
| Category | Testing coverage |
| Location | `clients/java/zb-mom-ww-mxgateway-cli/src/test/java/com/zb/mom/ww/mxgateway/cli/MxGatewayCliTests.java:680-696` |
| Status | Open |
**Description:** `streamAlarmsCommandFailsFastOnQueueOverflow` delivers all 2000 onNext synchronously from within `streamAlarms`, so `subscriptionRef` is still null when the overflow fires — the `sub.cancel()` branch is never exercised. The test also doesn't assert the overflow message text. It passes for a reason that doesn't generalize to async gRPC delivery.
**Recommendation:** Deliver messages asynchronously so the cancel path runs, and assert the overflow error text appears in output.
**Resolution:** _(empty until closed)_
### Client.Java-047
| Field | Value |
|---|---|
| Severity | Low |
| Category | Documentation & comments |
| Location | `clients/java/README.md` |
| Status | Open |
**Description:** README advertises the `0.1.1` artifact coordinate (Gitea Maven section) while the `version` command reports `0.1.0` — the user-visible symptom of Client.Java-044. Cross-ref `MxGatewayClientVersion.java:12`.
**Recommendation:** Resolved by fixing Client.Java-044 (sync the compiled-in version).
**Resolution:** _(empty until closed)_
### Client.Java-048
| Field | Value |
|---|---|
| Severity | Low |
| Category | Documentation & comments |
| Location | `clients/java/zb-mom-ww-mxgateway-cli/src/main/java/com/zb/mom/ww/mxgateway/cli/MxGatewayCli.java:88-105` |
| Status | Open |
**Description:** The public `execute(PrintWriter, PrintWriter, String...)` Javadoc calls it "Test-friendly entry point", but it wires `GrpcMxGatewayCliClientFactory` with no injection — the actual test seam is the package-private `execute(MxGatewayCliClientFactory, ...)` / `commandLine(...)` overload. Misleading.
**Recommendation:** Clarify the Javadoc to direct readers to the injectable overload for testing.
**Resolution:** _(empty until closed)_
### Client.Java-039
+95 -3
@@ -4,13 +4,30 @@
|---|---|
| Module | `clients/python` |
| Reviewer | Claude Code |
| Review date | 2026-06-15 |
| Commit reviewed | `410acc9` |
| Review date | 2026-06-16 |
| Commit reviewed | `8df5ab3` |
| Status | Re-reviewed |
| Open findings | 0 |
| Open findings | 5 |
## Checklist coverage
### 2026-06-16 re-review (commit 8df5ab3)
Re-review of the Python client delta: new galaxy CLI commands, options.py TLS/auth, large test additions. Prior Client.Python-027..031 confirmed resolved. One claimed regression (Python-004 dead variable) and one Medium README/API mismatch.
| # | Category | Result |
|---|---|---|
| 1 | Correctness & logic bugs | Client.Python-032, Client.Python-033, Client.Python-034 |
| 2 | mxaccessgw conventions | No issues found |
| 3 | Concurrency & thread safety | No issues found |
| 4 | Error handling & resilience | No issues found |
| 5 | Security | No issues found |
| 6 | Performance & resource management | No issues found |
| 7 | Design-document adherence | No issues found |
| 8 | Code organization & conventions | Client.Python-035 |
| 9 | Testing coverage | Client.Python-036 |
| 10 | Documentation & comments | Client.Python-036 |
### 2026-06-15 re-review (commit 410acc9)
Re-review pass at `410acc9`. The diff against the previous review base
@@ -1438,3 +1455,78 @@ under `[tool.pytest.ini_options]` in `clients/python/pyproject.toml`.
`python -m pytest` now reports no `PytestUnknownMarkWarning` (full run: 91
passed, 1 skipped, 0 warnings; previously 1 warning). The `tls`-marked
`tests/test_tls.py` module is the guard — its run is now warning-free.
### Client.Python-032
| Field | Value |
|---|---|
| Severity | Low |
| Category | Correctness & logic bugs |
| Location | `clients/python/src/zb_mom_ww_mxgateway_cli/commands.py:1048,1065-1066` |
| Status | Open |
**Description:** `_smoke` reintroduces the dead `closed = False` / `if not closed:` guard that Client.Python-004's resolution claimed to have removed via `async with session:`. `closed` is never reassigned, so the guard is always true. Behavior is correct (session always closed) but the dead variable misleads readers into expecting an early-close path. (Claimed regression — verify root cause.)
**Recommendation:** Use `async with session:` or drop the `closed` variable and close unconditionally.
**Resolution:** _(empty until closed)_
### Client.Python-033
| Field | Value |
|---|---|
| Severity | Low |
| Category | Correctness & logic bugs |
| Location | `clients/python/src/zb_mom_ww_mxgateway_cli/commands.py:772,1490-1494` |
| Status | Open |
**Description:** `_parse_string_list` always emits `param_hint="--items"`, but it is also called from `_build_write_bulk_entries` with `kwargs["values"]`. An empty `--values ""` on the write-bulk commands yields `Error: Invalid value for '--items': ...`, pointing at a flag that doesn't exist on those commands.
**Recommendation:** Add an optional `param_hint` parameter (default `--items`) and pass `--values` from the write-bulk caller.
**Resolution:** _(empty until closed)_
### Client.Python-034
| Field | Value |
|---|---|
| Severity | Low |
| Category | Correctness & logic bugs |
| Location | `clients/python/src/zb_mom_ww_mxgateway_cli/commands.py:1497-1501` |
| Status | Open |
**Description:** `_parse_int_list` does `int(item)` with no error handling. A non-numeric token (e.g. `--item-handles "10,abc"`) raises a raw `ValueError`, surfacing as an unformatted traceback interactively (other input errors raise `click.BadParameter`).
**Recommendation:** Wrap the conversion and re-raise as `click.BadParameter(..., param_hint="--item-handles")`.
**Resolution:** _(empty until closed)_
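The sketch below covers Client.Python-033 and Client.Python-034 together: thread a `param_hint` through both parsers, and convert integer-parse failures into a parameter error. `BadParameter` here is a local stand-in for `click.BadParameter` so the block runs without click. The comma-splitting and empty-rejection behaviour is an assumption about what `_parse_string_list` does, not a copy of it.

```python
from typing import List


class BadParameter(ValueError):
    """Stand-in for click.BadParameter so this sketch runs without click."""

    def __init__(self, message: str, param_hint: str) -> None:
        super().__init__(f"Invalid value for '{param_hint}': {message}")
        self.param_hint = param_hint


def parse_string_list(raw: str, param_hint: str = "--items") -> List[str]:
    # Assumed behaviour: comma-separated, whitespace-stripped, empty rejected.
    items = [part.strip() for part in raw.split(",") if part.strip()]
    if not items:
        raise BadParameter("expected at least one comma-separated value", param_hint)
    return items


def parse_int_list(raw: str, param_hint: str = "--item-handles") -> List[int]:
    values = []
    for item in parse_string_list(raw, param_hint=param_hint):
        try:
            values.append(int(item))
        except ValueError:
            # Report the offending token against the flag the user actually typed.
            raise BadParameter(f"{item!r} is not an integer", param_hint) from None
    return values
```

The write-bulk caller would then pass `param_hint="--values"`, so an empty `--values ""` names the right flag.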
### Client.Python-035
| Field | Value |
|---|---|
| Severity | Low |
| Category | Code organization & conventions |
| Location | `clients/python/src/zb_mom_ww_mxgateway/__init__.py`, `.../options.py:63-77`, `.../galaxy.py:293` |
| Status | Open |
**Description:** Two new public types — `BrowseChildrenOptions` (options.py) and `LazyBrowseNode` (galaxy.py) — are absent from `__init__.py`/`__all__`, so callers can't `from zb_mom_ww_mxgateway import BrowseChildrenOptions`, breaking the package-root import contract that `ClientOptions`/`GatewayClient`/etc. follow.
**Recommendation:** Re-export both from `__init__.py` and add them to `__all__`.
**Resolution:** _(empty until closed)_
### Client.Python-036
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Documentation & comments |
| Location | `clients/python/README.md:143-158` |
| Status | Open |
**Description:** The README "Browsing lazily" section's code example calls `galaxy.browse_children(...)`, a method that does not exist — the actual public low-level method is `browse_children_raw`. The example raises `AttributeError` at runtime. The README-parse test only covers shell CLI invocations, not Python code fragments, so it doesn't catch this.
**Recommendation:** Update the example/prose to `browse_children_raw(...)` (and promote the high-level `browse()`/`LazyBrowseNode` path), or add a `browse_children` alias. Add a `hasattr` test to catch future renames.
**Resolution:** _(empty until closed)_
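The suggested rename guard could scan the README's Python fences rather than hard-coding method names. This is a minimal sketch. It assumes the examples use ```` ```python ```` fences and a variable named `galaxy`, as in the snippet this finding quotes. `missing_readme_methods` is hypothetical, and `client_cls` would be the real Galaxy client class.

```python
import re
from typing import List

TICKS = "`" * 3  # built indirectly so it cannot be mistaken for a fence here
PYTHON_FENCE = re.compile(TICKS + r"python\n(.*?)" + TICKS, re.DOTALL)
GALAXY_CALL = re.compile(r"\bgalaxy\.([A-Za-z_]\w*)\s*\(")


def missing_readme_methods(readme_text: str, client_cls: type) -> List[str]:
    """Names called as galaxy.<name>(...) in README python fences that client_cls lacks."""
    called = set()
    for block in PYTHON_FENCE.findall(readme_text):
        called.update(GALAXY_CALL.findall(block))
    return sorted(name for name in called if not hasattr(client_cls, name))
```

Asserting that the result is empty in the test suite would have flagged `browse_children` as soon as the method was renamed.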
+110 -3
@@ -4,10 +4,10 @@
|---|---|
| Module | `clients/rust` |
| Reviewer | Claude Code |
| Review date | 2026-06-15 |
| Commit reviewed | `410acc9` |
| Review date | 2026-06-16 |
| Commit reviewed | `8df5ab3` |
| Status | Re-reviewed |
| Open findings | 0 |
| Open findings | 6 |
## Checklist coverage
@@ -115,6 +115,23 @@ Re-review pass at `410acc9`. The diff against `42b0037` (`git diff 42b0037..HEAD
| 9 | Testing coverage | No issues found in the new surface — the walker has six unit tests (roots, expand, idempotency, NotFound, multi-page, filter-forwarding) and TLS has four. Gap noted: `tls_with_require_certificate_validation_does_not_short_circuit` connects to a dead address, so it only asserts the guard does not fire and never exercises a real handshake — which is why the no-trust-roots defect in Client.Rust-031 is not caught by a test. |
| 10 | Documentation & comments | Issue found: the `alarm_feed_message_summary` / `alarm_feed_message_to_json` doc comments still say "three `payload` oneof cases" (`main.rs:1729,1755`) although the proto now has four; folded into Client.Rust-030's fix. The TLS doc inaccuracy is Client.Rust-031. |
#### 2026-06-16 re-review (commit 8df5ab3)
Re-review of the Rust client delta: options.rs TLS trust decision, mxgw-cli galaxy browse, Cargo metadata. Prior Client.Rust-030/031/032 confirmed resolved. fmt/clippy/test clean. One Medium TLS-downgrade correctness item.
| # | Category | Result |
|---|---|---|
| 1 | Correctness & logic bugs | Client.Rust-033, Client.Rust-034 |
| 2 | mxaccessgw conventions | No issues found |
| 3 | Concurrency & thread safety | No issues found |
| 4 | Error handling & resilience | No issues found |
| 5 | Security | Client.Rust-035 |
| 6 | Performance & resource management | No issues found |
| 7 | Design-document adherence | Client.Rust-036, Client.Rust-037 |
| 8 | Code organization & conventions | No issues found |
| 9 | Testing coverage | Client.Rust-038 |
| 10 | Documentation & comments | No issues found |
## Findings
### Client.Rust-001
@@ -762,3 +779,93 @@ This is masked by the tests: `tls_with_require_certificate_validation_does_not_s
**Recommendation:** Add a "Lazy browse" subsection to the Galaxy section of `RustClientDesign.md` enumerating `browse`, `browse_children_raw`, `BrowseChildrenOptions` (its filter fields and AND semantics), and `LazyBrowseNode` (the `Arc`-shared clone semantics, the idempotent single-RPC `expand`, the `has_children_hint`, and the internal paged `BrowseChildren` loop with its repeated-page-token guard). Cross-reference `docs/GalaxyRepository.md#browsechildren` for the wire-level request/filter semantics the README already links.
**Resolution:** 2026-06-15 — Confirmed by inspection that `RustClientDesign.md` had no Galaxy library-API coverage at all. Added a new "Galaxy Repository" section documenting `browse`, `browse_children_raw`, the `BrowseChildrenOptions` filter struct (all six fields, AND combination semantics, `include_attributes` tri-state), and `LazyBrowseNode` (`Arc`-shared clone semantics, `has_children_hint`, the idempotent single-RPC `expand` under an async mutex with page size 500, and the repeated-page-token `Error::InvalidArgument` guard), cross-referencing `docs/GalaxyRepository.md#browsechildren`. Also noted the fourth alarm `provider_status` oneof case in the Alarms section while resolving Client.Rust-030. Doc-only change verified by inspection; design-doc anchor target confirmed present.
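The repeated-page-token guard described above can be sketched with the standard library alone. Everything in this sketch is hypothetical. `Page` stands in for the crate's reply type, `PagingError` for its `Error::InvalidArgument`, and `drain_pages` for the private loop inside `LazyBrowseNode::expand`, while the `fetch` closure replaces the `BrowseChildren` RPC. This version rejects any token it has already seen; the real guard may only compare against the previous token.

```rust
use std::collections::HashSet;

/// One page of a hypothetical paged listing: the items plus the token for
/// the next page (an empty token means "no more pages").
pub struct Page {
    pub items: Vec<String>,
    pub next_page_token: String,
}

/// Hypothetical stand-in for the crate's `Error::InvalidArgument`.
#[derive(Debug, PartialEq)]
pub enum PagingError {
    RepeatedPageToken(String),
}

/// Drains every page returned by `fetch`, concatenating items in order.
/// Fails instead of looping forever if the server hands back a page token
/// that was already returned earlier in the same walk.
pub fn drain_pages<F>(mut fetch: F) -> Result<Vec<String>, PagingError>
where
    F: FnMut(&str) -> Page,
{
    let mut items = Vec::new();
    let mut seen = HashSet::new();
    let mut token = String::new(); // empty token requests the first page
    loop {
        let page = fetch(&token);
        items.extend(page.items);
        if page.next_page_token.is_empty() {
            return Ok(items);
        }
        // `insert` returns false when the token is already in the set.
        if !seen.insert(page.next_page_token.clone()) {
            return Err(PagingError::RepeatedPageToken(page.next_page_token));
        }
        token = page.next_page_token;
    }
}
```

A server stuck returning the same token therefore surfaces as an error after the second page rather than hanging the caller.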
### Client.Rust-033
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Correctness & logic bugs |
| Location | `clients/rust/crates/mxgw-cli/src/main.rs:485` |
| Status | Open |
**Description:** `ConnectionArgs::options()` computes plaintext as `!self.tls || self.plaintext`. With both `--tls` and `--plaintext` supplied, this evaluates to `true` and silently degrades to an unencrypted channel despite the explicit `--tls`. This is a security-sensitive footgun: for example, a wrapper script that auto-appends `--plaintext` would override a caller's explicit `--tls` without any error.
**Recommendation:** Add clap `conflicts_with = "tls"` on `--plaintext` to reject the combination, or let `--tls` take precedence and emit a warning.
**Resolution:** _(empty until closed)_
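The following stdlib-only sketch contrasts the expression quoted in the description with the recommended rejection. `TransportFlags`, `plaintext_current`, and `plaintext_fixed` are hypothetical names. In the CLI itself, the rejection would more likely be declared through clap's `conflicts_with`, so that clap reports the error before `options()` runs. The same cases would also close gap (1) of Client.Rust-038.

```rust
/// Hypothetical mirror of the two CLI transport flags.
pub struct TransportFlags {
    pub tls: bool,
    pub plaintext: bool,
}

/// The expression from the finding: `--plaintext` wins even over an explicit `--tls`.
pub fn plaintext_current(f: &TransportFlags) -> bool {
    !f.tls || f.plaintext
}

/// Sketch of the recommended behavior: reject the combination outright and
/// otherwise keep the existing default (plaintext unless `--tls` is given).
pub fn plaintext_fixed(f: &TransportFlags) -> Result<bool, String> {
    if f.tls && f.plaintext {
        return Err("--tls and --plaintext cannot be used together".to_string());
    }
    Ok(!f.tls)
}
```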
### Client.Rust-034
| Field | Value |
|---|---|
| Severity | Low |
| Category | Correctness & logic bugs |
| Location | `clients/rust/crates/mxgw-cli/src/main.rs:48-51,548` |
| Status | Open |
**Description:** `Command::Version` carries a `jsonl: bool` field that is never read; the dispatch arm matches `{ json, .. }` and discards `jsonl`. `mxgw version --jsonl` silently behaves as plain text.
**Recommendation:** Handle `jsonl` in the Version arm (treat like `--json`) or remove the unused field.
**Resolution:** _(empty until closed)_
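The following is a minimal sketch of the first option, which treats `--jsonl` like `--json` instead of discarding it with `..`. It assumes a trimmed-down enum: `Command`, `OutputFormat`, and `version_format` are hypothetical names that omit the CLI's other variants and output paths.

```rust
/// Hypothetical, trimmed-down stand-in for the CLI's `Command` enum.
pub enum Command {
    Version { json: bool, jsonl: bool },
}

/// Output formats this sketch distinguishes.
#[derive(Debug, PartialEq)]
pub enum OutputFormat {
    Text,
    Json,
}

/// Binds both flags explicitly so `--jsonl` is no longer silently dropped;
/// the single-record `version` output is rendered as JSON for either flag.
pub fn version_format(cmd: &Command) -> OutputFormat {
    match cmd {
        Command::Version { json, jsonl } => {
            if *json || *jsonl {
                OutputFormat::Json
            } else {
                OutputFormat::Text
            }
        }
    }
}
```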
### Client.Rust-035
| Field | Value |
|---|---|
| Severity | Low |
| Category | Security |
| Location | `clients/rust/crates/mxgw-cli/src/main.rs:489-495` |
| Status | Open |
**Description:** `--api-key-env` (default `MXGATEWAY_API_KEY`) names an env var whose value is read into an `ApiKey` Bearer token, but its clap help text does not describe the expected value format. A user who points it at another credential's env var would silently forward that credential to the gateway as a Bearer token. The risk is low (the `Debug` output is redacted, and exposure is bounded to the user's own shell), but it is an implicit-trust gap.
**Recommendation:** Add help text stating the variable must hold a value of the form `mxgw_<key-id>_<secret>`.
**Resolution:** _(empty until closed)_
### Client.Rust-036
| Field | Value |
|---|---|
| Severity | Low |
| Category | Design-document adherence |
| Location | `clients/rust/RustClientDesign.md:351` |
| Status | Open |
**Description:** The new `galaxy browse` subcommand (with its filter/depth/json flags) is not listed in the "Test CLI" command table in RustClientDesign.md, which still reads `galaxy {test-connection,last-deploy-time,discover-hierarchy,watch}`.
**Recommendation:** Add `mxgw galaxy browse [...flags]` to the table and note that `--depth 0` lists only the requested level, `--depth N` expands eagerly, and `--parent-gobject-id` makes `--depth` a no-op.
**Resolution:** _(empty until closed)_
### Client.Rust-037
| Field | Value |
|---|---|
| Severity | Low |
| Category | Design-document adherence |
| Location | `clients/rust/README.md:164-179` |
| Status | Open |
**Description:** The README "Browsing lazily" example calls `galaxy.browse_children(...).await?.into_inner()`, but the public API is `GalaxyClient::browse_children_raw` (the bare `browse_children` is the generated proto-client method, not public; and `browse_children_raw` returns the reply struct directly, no `.into_inner()`). The example would not compile.
**Recommendation:** Replace with `galaxy.browse_children_raw(BrowseChildrenRequest::default()).await?` (drop `.into_inner()`).
**Resolution:** _(empty until closed)_
### Client.Rust-038
| Field | Value |
|---|---|
| Severity | Low |
| Category | Testing coverage |
| Location | `clients/rust/crates/mxgw-cli/src/main.rs:2336-2564` |
| Status | Open |
**Description:** Three CLI test gaps: (1) `ConnectionArgs::options()` `--tls`/`--plaintext` resolution (incl. the both-set path of Client.Rust-033) is untested; (2) `browse_children_one_level`'s repeated-page-token guard is untested; (3) `parse_rfc3339_timestamp` has no error-path tests (trailing chars, day=0, month 13, out-of-range day).
**Recommendation:** Add unit tests for all three (none need a network connection).
**Resolution:** _(empty until closed)_
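For gap (3), the error paths can be pinned without a network connection. The sketch below is a hypothetical, date-only stand-in (`parse_date`, `days_in_month`) for `parse_rfc3339_timestamp`. The real function's signature and its time and offset handling are not shown here. The sketch covers the four listed failure modes: trailing characters, day 0, month 13, and a day past the end of the month.

```rust
/// Days in `month` of `year` (proleptic Gregorian), or `None` for an invalid month.
pub fn days_in_month(year: i32, month: u32) -> Option<u32> {
    let leap = (year % 4 == 0 && year % 100 != 0) || year % 400 == 0;
    match month {
        1 | 3 | 5 | 7 | 8 | 10 | 12 => Some(31),
        4 | 6 | 9 | 11 => Some(30),
        2 => Some(if leap { 29 } else { 28 }),
        _ => None,
    }
}

/// Strictly parses a bare `YYYY-MM-DD` date. Any trailing characters fail the
/// length check, so this only models the date portion of an RFC 3339 timestamp.
pub fn parse_date(s: &str) -> Result<(i32, u32, u32), String> {
    let bytes = s.as_bytes();
    let shape_ok = bytes.len() == 10
        && bytes.iter().enumerate().all(|(i, c)| {
            if i == 4 || i == 7 { *c == b'-' } else { c.is_ascii_digit() }
        });
    if !shape_ok {
        return Err(format!("expected YYYY-MM-DD with nothing after it, got {s:?}"));
    }
    // The shape check guarantees these slices are all ASCII digits, so parsing cannot fail.
    let year: i32 = s[0..4].parse().unwrap();
    let month: u32 = s[5..7].parse().unwrap();
    let day: u32 = s[8..10].parse().unwrap();
    let max_day = days_in_month(year, month)
        .ok_or_else(|| format!("month {month} is out of range"))?;
    if day == 0 || day > max_day {
        return Err(format!("day {day} is out of range for {year:04}-{month:02}"));
    }
    Ok((year, month, day))
}
```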
+65 -3
@@ -4,10 +4,10 @@
|---|---|
| Module | `src/ZB.MOM.WW.MxGateway.Contracts` |
| Reviewer | Claude Code |
| Review date | 2026-06-15 |
| Commit reviewed | `410acc9` |
| Review date | 2026-06-16 |
| Commit reviewed | `8df5ab3` |
| Status | Re-reviewed |
| Open findings | 0 |
| Open findings | 3 |
## Checklist coverage
@@ -379,6 +379,23 @@ Re-review: no new findings. Open finding count remains 0. All seventeen
recorded Contracts findings (Contracts-001..017) remain closed
(Resolved / Won't Fix).
#### 2026-06-16 re-review (commit 8df5ab3)
Re-review of the proto delta (`git diff 410acc9..8df5ab3 -- .../Protos/`): the new `optional ReplayGap replay_gap = 14` on `MxEvent` plus the `ReplayGap` message for reconnect replay. Additive-only confirmed (field 14 is new; oneof body arms 20-25 and fields 1-13 unchanged); `Generated/MxaccessGateway.cs` is consistent (contains `ReplayGapFieldNumber = 14`).
| # | Category | Result |
|---|---|---|
| 1 | Correctness & logic bugs | No issues found |
| 2 | mxaccessgw conventions | No issues found (additive-only honoured) |
| 3 | Concurrency & thread safety | N/A — pure contract |
| 4 | Error handling & resilience | No issues found |
| 5 | Security | No issues found |
| 6 | Performance & resource management | No issues found |
| 7 | Design-document adherence | Contracts-020 |
| 8 | Code organization & conventions | No issues found |
| 9 | Testing coverage | Contracts-022 |
| 10 | Documentation & comments | Contracts-021 |
### Contracts-018
| Field | Value |
@@ -408,3 +425,48 @@ recorded Contracts findings (Contracts-001..017) remain closed
**Recommendation:** (1) Add comments to `ActiveAlarmSnapshot.degraded` / `source_provider` mirroring the wording already on `OnAlarmTransitionEvent` (or a one-line cross-reference). (2) Extend the `AlarmProviderMode` enum comment to note that as a `source_provider` / `mode` provenance value the field is always `ALARMMGR` or `SUBTAG` on the wire and `UNSPECIFIED` should be treated as "unknown / not yet determined", so the zero value is unambiguous at every use site. Comment-only changes; no wire-format impact.
**Resolution:** _(2026-06-15)_ Confirmed both gaps in `mxaccess_gateway.proto`: `ActiveAlarmSnapshot.degraded`/`source_provider` (14/15) were bare while the byte-identical `OnAlarmTransitionEvent` fields were documented, and the `AlarmProviderMode` enum comment only explained `UNSPECIFIED` for the `forced_mode` use. (1) Added comments to `ActiveAlarmSnapshot.degraded`/`source_provider` mirroring the `OnAlarmTransitionEvent` wording (subtag-fallback / reduced-fidelity, always ALARMMGR or SUBTAG, never UNSPECIFIED). (2) Extended the `AlarmProviderMode` enum comment to distinguish its two use sites: as `forced_mode`, `UNSPECIFIED` = auto; as a provenance value (`OnAlarmTransitionEvent.source_provider`, `ActiveAlarmSnapshot.source_provider`, `OnAlarmProviderModeChangedEvent.mode`, `AlarmProviderStatus.mode`) the worker always emits ALARMMGR/SUBTAG and `UNSPECIFIED` should be read as "unknown / not yet determined". Comment-only changes; no wire-format impact. NOTE: on this dev box the `csharp` protoc generator DOES emit proto leading comments into `Generated/MxaccessGateway.cs` `<summary>` XML doc (contrary to the brief's assumption), so the build regenerated `Generated/MxaccessGateway.cs` with the new doc comments only — diff is `///`-comment lines exclusively, zero code/wire/type changes. `dotnet build -f net10.0` succeeds with 0 warnings / 0 errors.
### Contracts-020
| Field | Value |
|---|---|
| Severity | Low |
| Category | Design-document adherence |
| Location | `gateway.md:1087,1101-1102` |
| Status | Open |
**Description:** gateway.md still lists "no reconnectable sessions" under "Resolved for v1" and lists "reconnectable sessions" / "multi-subscriber event fan-out" as post-v1 revisit items. The shipped `ReplayGap` reconnect-replay contract and multi-subscriber fan-out (documented in docs/Sessions.md) contradict this. docs/Sessions.md was updated; gateway.md's scope summary was left stale.
**Recommendation:** Update the gateway.md Resolved/Post-v1 lists to reflect that reconnectable sessions (via `after_worker_sequence` + `ReplayGap`) and multi-subscriber fan-out have shipped, cross-referencing docs/Sessions.md.
**Resolution:** _(empty until closed)_
### Contracts-021
| Field | Value |
|---|---|
| Severity | Low |
| Category | Documentation & comments |
| Location | `src/ZB.MOM.WW.MxGateway.Contracts/Protos/mxaccess_gateway.proto:731-733` |
| Status | Open |
**Description:** The `replay_gap` field comment ends with "(Reconnect/replay logic is Task 12; this is the contract surface only.)". That parenthetical is now stale — the reconnect/replay logic has shipped and is exercised by EventStreamServiceTests/SessionEventDistributorTests. A reader is misled into thinking only the contract exists.
**Recommendation:** Drop the "Task 12 / contract surface only" parenthetical; the rest of the comment is accurate.
**Resolution:** _(empty until closed)_
### Contracts-022
| Field | Value |
|---|---|
| Severity | Low |
| Category | Testing coverage |
| Location | `src/ZB.MOM.WW.MxGateway.Tests/Contracts/ProtobufContractRoundTripTests.cs` |
| Status | Open |
**Description:** No round-trip / descriptor pin exists for the new `ReplayGap` message or `MxEvent.replay_gap` (field 14). The field is exercised functionally end-to-end, but there is no contract-level pin to catch a future renumber/type-narrowing of `replay_gap = 14` or the two `ReplayGap` sequence-field numbers — the same gap class as Contracts-007/010/018.
**Recommendation:** Add a round-trip test setting `MxEvent.ReplayGap` with both sequence fields, asserting `BodyCase == None`, plus a descriptor assertion pinning `ReplayGapFieldNumber == 14` and the `ReplayGap` field numbers (1, 2).
**Resolution:** _(empty until closed)_
+80 -3
@@ -4,10 +4,10 @@
|---|---|
| Module | `src/ZB.MOM.WW.MxGateway.IntegrationTests` |
| Reviewer | Claude Code |
| Review date | 2026-06-15 |
| Commit reviewed | `410acc9` |
| Review date | 2026-06-16 |
| Commit reviewed | `8df5ab3` |
| Status | Re-reviewed |
| Open findings | 0 |
| Open findings | 4 |
## Checklist coverage
@@ -135,6 +135,23 @@ parameter (`d692232`).
| 9 | Testing coverage | Issues found: IntegrationTests-023 (`DashboardLdapLiveTests.AuthenticateAsync_AdminInGwAdminGroup_Succeeds` asserts the `ldap_group` claim but does not assert the emitted `Role: Admin` claim, leaving the role-mapping path untested). |
| 10 | Documentation & comments | No issues found. |
#### 2026-06-16 re-review (commit 8df5ab3)
Re-review of the live-test delta: two new `[LiveMxAccessFact]` smoke tests (B8 new COM commands; buffered-item path) plus `EmptyAlarmWatchListResolver`. The tests are correctly gated and serialized, and credential-redaction coverage is present. Only Low documentation and coverage items were found.
| # | Category | Result |
|---|---|---|
| 1 | Correctness & logic bugs | No issues found |
| 2 | mxaccessgw conventions | No issues found |
| 3 | Concurrency & thread safety | No issues found |
| 4 | Error handling & resilience | No issues found |
| 5 | Security | No issues found |
| 6 | Performance & resource management | No issues found |
| 7 | Design-document adherence | IntegrationTests-030 |
| 8 | Code organization & conventions | No issues found |
| 9 | Testing coverage | IntegrationTests-032, IntegrationTests-033 |
| 10 | Documentation & comments | IntegrationTests-030, IntegrationTests-031 |
## Findings
### IntegrationTests-001
@@ -608,3 +625,63 @@ The prior `DashboardAuthenticator` ctor took `IOptions<GatewayOptions>`, so the
**Recommendation:** Reword the `docs/GatewayTesting.md` "Live LDAP" failure-branch sentences to describe observable behavior without referencing the now-internal "candidate bind" mechanics (e.g. "a wrong password is rejected without leaking the password", "an unknown username fails authentication"), and note that bind/search is delegated to the shared `ZB.MOM.WW.Auth.Ldap` provider so the prose stays accurate after the cutover.
**Resolution:** Resolved 2026-06-15: Reworded the "Live LDAP" failure-branch prose to describe observable behavior ("fails authentication without leaking the password", "an unknown username fails authentication") instead of the now-internal "candidate bind" / "no candidate" mechanics, and added a sentence noting `DashboardAuthenticator` delegates the bind/search to the shared `ZB.MOM.WW.Auth.Ldap` provider (`LdapAuthService`) and only maps groups to roles — matching the in-source test-comment cutover. Verified by inspection.
### IntegrationTests-030
| Field | Value |
|---|---|
| Severity | Low |
| Category | Documentation & comments |
| Location | `docs/GatewayTesting.md:76`, `src/ZB.MOM.WW.MxGateway.IntegrationTests/WorkerLiveMxAccessSmokeTests.cs:576,728` |
| Status | Open |
**Description:** `docs/GatewayTesting.md` says "All six tests are gated by MXGATEWAY_RUN_LIVE_MXACCESS_TESTS=1" and enumerates five parity paths. This diff adds two new `[LiveMxAccessFact]` tests (B8 new COM commands: AuthenticateUser/ArchestrAUserToId/Suspend/Activate; and the buffered-data path: AddBufferedItem/SetBufferedUpdateInterval), bringing the total to eight. The doc still says "six" and omits the two new parity surfaces.
**Recommendation:** Update GatewayTesting.md to "eight" and add bullets for the B8 new-COM-commands and buffered-data parity surfaces.
**Resolution:** _(empty until closed)_
### IntegrationTests-031
| Field | Value |
|---|---|
| Severity | Low |
| Category | Documentation & comments |
| Location | `src/ZB.MOM.WW.MxGateway.IntegrationTests/WorkerLiveMxAccessSmokeTests.cs:672` |
| Status | Open |
**Description:** The inline comment at line 672 says "Suspend / Activate against the advised item", but no `Advise` call is made between `AddItem` (line 616) and `CreateSuspendRequest` (line 677) — the item is added but not advised. The comment mislabels the COM subscription state under test (the parity assertion only requires a real reply, not a successful one).
**Recommendation:** Change "against the advised item" to "against the added-but-not-advised item" (or remove "advised"), and note that Suspend/Activate is exercised without a prior Advise.
**Resolution:** _(empty until closed)_
### IntegrationTests-032
| Field | Value |
|---|---|
| Severity | Low |
| Category | Testing coverage |
| Location | `src/ZB.MOM.WW.MxGateway.IntegrationTests/WorkerLiveMxAccessSmokeTests.cs:823-865` |
| Status | Open |
**Description:** In the buffered-item test, when no sample-bearing `OnBufferedDataChange` batch arrives, the sample-predicate `TimeoutException` is caught and discarded (line 831) before asserting `bootstrapBufferedEvents > 0`. The final failure message ("No OnBufferedDataChange event arrived at all") conflates two failure modes (NoData bootstrap not delivered vs. delivered-but-no-sample), reducing residual diagnostic quality.
**Recommendation:** Before nulling the batch, log the caught timeout message (e.g. `output.WriteLine($"B8: sample-bearing batch predicate timed out: {ex.Message}")`) so the residual log distinguishes the two cases.
**Resolution:** _(empty until closed)_
### IntegrationTests-033
| Field | Value |
|---|---|
| Severity | Low |
| Category | Testing coverage |
| Location | `src/ZB.MOM.WW.MxGateway.IntegrationTests/WorkerLiveMxAccessSmokeTests.cs:577-709` |
| Status | Open |
**Description:** The new-COM-commands live test covers AuthenticateUser/ArchestrAUserToId/Suspend/Activate but not `AddItem2`/`Write2` — the B8 extended commands with a second context parameter introduced in the same bundle. Only live COM tests can verify the COM call succeeds with the correct argument split; a parity regression short-circuiting AddItem2/Write2 to InvalidRequest would not be caught.
**Recommendation:** Add AddItem2/Write2 to the parity test (or a dedicated test) asserting each produces a real reply (not InvalidRequest) against a valid handle and item-definition split.
**Resolution:** _(empty until closed)_
+57 -12
@@ -10,23 +10,68 @@ Each module's `findings.md` is the source of truth; this file is generated from
| Module | Reviewer | Date | Commit | Status | Open | Total |
|---|---|---|---|---|---|---|
| [Client.Dotnet](Client.Dotnet/findings.md) | Claude Code | 2026-06-15 | `410acc9` | Re-reviewed | 0 | 25 |
| [Client.Go](Client.Go/findings.md) | Claude Code | 2026-06-15 | `410acc9` | Re-reviewed | 0 | 29 |
| [Client.Java](Client.Java/findings.md) | Claude Code | 2026-06-15 | `410acc9` | Re-reviewed | 0 | 39 |
| [Client.Python](Client.Python/findings.md) | Claude Code | 2026-06-15 | `410acc9` | Re-reviewed | 0 | 31 |
| [Client.Rust](Client.Rust/findings.md) | Claude Code | 2026-06-15 | `410acc9` | Re-reviewed | 0 | 32 |
| [Contracts](Contracts/findings.md) | Claude Code | 2026-06-15 | `410acc9` | Re-reviewed | 0 | 19 |
| [IntegrationTests](IntegrationTests/findings.md) | Claude Code | 2026-06-15 | `410acc9` | Re-reviewed | 0 | 29 |
| [Server](Server/findings.md) | Claude Code | 2026-06-15 | `410acc9` | Re-reviewed | 0 | 53 |
| [Tests](Tests/findings.md) | Claude Code | 2026-06-15 | `410acc9` | Re-reviewed | 0 | 35 |
| [Worker](Worker/findings.md) | Claude Code | 2026-06-15 | `410acc9` | Re-reviewed | 0 | 28 |
| [Worker.Tests](Worker.Tests/findings.md) | Claude Code | 2026-06-15 | `410acc9` | Re-reviewed | 0 | 33 |
| [Client.Dotnet](Client.Dotnet/findings.md) | Claude Code | 2026-06-16 | `8df5ab3` | Re-reviewed | 4 | 29 |
| [Client.Go](Client.Go/findings.md) | Claude Code | 2026-06-16 | `8df5ab3` | Re-reviewed | 5 | 34 |
| [Client.Java](Client.Java/findings.md) | Claude Code | 2026-06-16 | `8df5ab3` | Re-reviewed | 9 | 48 |
| [Client.Python](Client.Python/findings.md) | Claude Code | 2026-06-16 | `8df5ab3` | Re-reviewed | 5 | 36 |
| [Client.Rust](Client.Rust/findings.md) | Claude Code | 2026-06-16 | `8df5ab3` | Re-reviewed | 6 | 38 |
| [Contracts](Contracts/findings.md) | Claude Code | 2026-06-16 | `8df5ab3` | Re-reviewed | 3 | 22 |
| [IntegrationTests](IntegrationTests/findings.md) | Claude Code | 2026-06-16 | `8df5ab3` | Re-reviewed | 4 | 33 |
| [Server](Server/findings.md) | Claude Code | 2026-06-16 | `8df5ab3` | Re-reviewed | 2 | 55 |
| [Tests](Tests/findings.md) | Claude Code | 2026-06-16 | `8df5ab3` | Re-reviewed | 3 | 38 |
| [Worker](Worker/findings.md) | Claude Code | 2026-06-16 | `8df5ab3` | Re-reviewed | 0 | 28 |
| [Worker.Tests](Worker.Tests/findings.md) | Claude Code | 2026-06-16 | `8df5ab3` | Re-reviewed | 3 | 36 |
## Pending findings
Findings with status `Open` or `In Progress`, ordered by severity.
_No pending findings._
| ID | Severity | Category | Location | Description |
|---|---|---|---|---|
| Client.Dotnet-028 | Medium | Security | `clients/dotnet/.../MxGatewayClientCli.cs:156` | Client.Dotnet-008 was recorded resolved by adding a `TryResolveApiKey` helper resolving both `--api-key` and the `--api-key-env` env-var path, wired into the error-redaction catch block. At HEAD the catch block reads `arguments.GetOptional… |
| Client.Go-030 | Medium | Concurrency & thread safety | `clients/go/cmd/mxgw-go/main.go:1491-1494` | `runGalaxyWatch`'s limit-reached branch calls `cancelStream()` and returns WITHOUT draining the buffered `events` channel, unlike the signal-cancel branch which drains. This is the shape Client.Go-013's resolution claimed to have fixed ("n… |
| Client.Java-040 | Medium | Correctness & logic bugs | `clients/java/zb-mom-ww-mxgateway-cli/src/main/java/com/zb/mom/ww/mxgateway/cli/MxGatewayCli.java:1552-1561` | The `stream-alarms` overflow handler does `queue.clear()` then `offer(exception)` + `offer(ALARM_FEED_END)` non-atomically on an `ArrayBlockingQueue` shared with the gRPC delivery thread. In production gRPC (netty I/O thread), a concurrent… |
| Client.Python-036 | Medium | Documentation & comments | `clients/python/README.md:143-158` | The README "Browsing lazily" section's code example calls `galaxy.browse_children(...)`, a method that does not exist — the actual public low-level method is `browse_children_raw`. The example raises `AttributeError` at runtime. The README… |
| Client.Rust-033 | Medium | Correctness & logic bugs | `clients/rust/crates/mxgw-cli/src/main.rs:485` | `ConnectionArgs::options()` computes plaintext as `!self.tls \|\| self.plaintext`. With both `--tls` and `--plaintext` supplied, this is `true`, silently degrading to an unencrypted channel despite the explicit `--tls`. A security-sensitive… |
| Server-054 | Medium | Design-document adherence | `docs/DesignDecisions.md` (Session Reconnect / Event Subscribers / Later Revisit Items §470-471), `CLAUDE.md` (Repository-Specific Conventions) | The session-resilience epic shipped multi-subscriber fan-out (`SessionEventDistributor`), reconnectable sessions with replay (`AttachEventSubscriberWithReplay`/`ReplayGap`), and detach-grace retention — but `docs/DesignDecisions.md` still… |
| Client.Dotnet-026 | Low | Correctness & logic bugs | `clients/dotnet/.../MxGatewayClientCli.cs:306` (isLongRunning) | Client.Dotnet-015 extended `isLongRunning` to include the bench commands so they aren't silently cancelled by the default 30s CTS. The new `galaxy-browse` command is NOT in `isLongRunning`. A `galaxy-browse --depth N` tree walk on a large… |
| Client.Dotnet-027 | Low | Performance & resource management | `clients/dotnet/ZB.MOM.WW.MxGateway.Client/LazyBrowseNode.cs:15` | `LazyBrowseNode` allocates one `SemaphoreSlim _expandLock = new(1,1)` per node and never disposes it (the type is not IDisposable). For a large Galaxy browse tree (thousands of nodes), live SemaphoreSlim instances accumulate; OS handles ar… |
| Client.Dotnet-029 | Low | Code organization & conventions | `clients/dotnet/.../IMxGatewayCliClient.cs:6` | `IMxGatewayCliClient` is a public interface with no type-level `<summary>` XML doc. The Client.Dotnet-013 resolution recorded adding one; at HEAD it is absent. No CS1591 fires (GenerateDocumentationFile now scoped to the packable library o… |
| Client.Go-031 | Low | Correctness & logic bugs | `clients/go/cmd/mxgw-go/main.go:1037-1046` | `closeSmokeSession` registers `defer cancel()` twice on the same `cancel` variable across two `context.WithTimeout` calls when the deadline-shortening branch fires. Because `cancel` is reassigned, both defers end up calling the second cont… |
| Client.Go-032 | Low | Code organization & conventions | `clients/go/cmd/mxgw-go/main.go:839-841` | `runStreamEvents` does not install a `signal.NotifyContext` handler, while `runStreamAlarms` and `runGalaxyWatch` do. Client.Go-020's resolution claimed this was added. Without a signal-aware parent context, Ctrl+C kills the process withou… |
| Client.Go-033 | Low | Testing coverage | `clients/go/cmd/mxgw-go/main_test.go` | Gaps vs prior coverage: (1) `TestRunBenchReadBulkRejectsNonPositiveDuration` (named in Client.Go-021's resolution) is absent — the `-duration-seconds`-positive guard at main.go:619 is untested; (2) `runStreamEvents` has no CLI-level test (… |
| Client.Go-034 | Low | Documentation & comments | `clients/go/README.md:245-263` | The README CLI example table lists ~12 commands but the binary now exposes ~27 subcommands (per `writeUsage`). Absent: `ping`, `galaxy-browse`, `batch`, `read-bulk`, `write-bulk`, `write2-bulk`, `write-secured-bulk`, `write-secured2-bulk`,… |
| Client.Java-041 | Low | Correctness & logic bugs | `clients/java/zb-mom-ww-mxgateway-cli/src/main/java/com/zb/mom/ww/mxgateway/cli/MxGatewayCli.java:2187-2194` | `jsonString` escapes only `\`, `"`, `\r`, `\n` — not `\t`, `\b`, `\f`, or U+0000-U+001F/U+007F. A tag address/message/reference containing a tab produces malformed JSON (RFC 8259). Affects the hand-rolled `jsonObject`/`jsonString`/`jsonVal… |
| Client.Java-042 | Low | Error handling & resilience | `clients/java/zb-mom-ww-mxgateway-cli/src/main/java/com/zb/mom/ww/mxgateway/cli/MxGatewayCli.java:1565-1567` | `StreamAlarmsCommand.onError` calls `queue.offer(error)` without checking the return value. If the queue is full when a transport error arrives, the error is dropped and the drain loop blocks forever on `queue.take()`. Same class as Client… |
| Client.Java-043 | Low | Code organization & conventions | `clients/java/zb-mom-ww-mxgateway-cli/src/test/java/com/zb/mom/ww/mxgateway/cli/MxGatewayCliTests.java:241-264` | `galaxyBrowseParentZeroEmitsWarningToStderr` calls `MxGatewayCli.execute(new FakeClientFactory(), ...)` for a galaxy-browse command, which wires the real `GrpcGalaxyClientFactory` and constructs a live Netty channel to localhost:5000 as a… |
| Client.Java-044 | Low | Code organization & conventions | `clients/java/zb-mom-ww-mxgateway-client/src/main/java/com/zb/mom/ww/mxgateway/client/MxGatewayClientVersion.java:12` | `CLIENT_VERSION = "0.1.0"` is out of sync with Gradle `version = '0.1.1'` (cross-ref `clients/java/build.gradle:6`). The `version` command advertises 0.1.0 while the published artifact is 0.1.1; consumers can't use the version string as a… |
| Client.Java-045 | Low | Testing coverage | `clients/java/zb-mom-ww-mxgateway-cli/src/main/java/com/zb/mom/ww/mxgateway/cli/InProcessGatewayHarness.java` | The harness implements only `streamEvents`/`closeSession` (gateway) and `discoverHierarchy`/`watchDeployEvents` (galaxy); all other RPCs return gRPC UNIMPLEMENTED. This is undocumented, so a future test exercising invoke/register through t… |
| Client.Java-046 | Low | Testing coverage | `clients/java/zb-mom-ww-mxgateway-cli/src/test/java/com/zb/mom/ww/mxgateway/cli/MxGatewayCliTests.java:680-696` | `streamAlarmsCommandFailsFastOnQueueOverflow` delivers all 2000 onNext synchronously from within `streamAlarms`, so `subscriptionRef` is still null when the overflow fires — the `sub.cancel()` branch is never exercised. The test also doesn… |
| Client.Java-047 | Low | Documentation & comments | `clients/java/README.md` | README advertises the `0.1.1` artifact coordinate (Gitea Maven section) while the `version` command reports `0.1.0` — the user-visible symptom of Client.Java-044. Cross-ref `MxGatewayClientVersion.java:12`. |
| Client.Java-048 | Low | Documentation & comments | `clients/java/zb-mom-ww-mxgateway-cli/src/main/java/com/zb/mom/ww/mxgateway/cli/MxGatewayCli.java:88-105` | The public `execute(PrintWriter, PrintWriter, String...)` Javadoc calls it "Test-friendly entry point", but it wires `GrpcMxGatewayCliClientFactory` with no injection — the actual test seam is the package-private `execute(MxGatewayCliClien… |
| Client.Python-032 | Low | Correctness & logic bugs | `clients/python/src/zb_mom_ww_mxgateway_cli/commands.py:1048,1065-1066` | `_smoke` reintroduces the dead `closed = False` / `if not closed:` guard that Client.Python-004's resolution claimed to have removed via `async with session:`. `closed` is never reassigned, so the guard is always true. Behavior is correct… |
| Client.Python-033 | Low | Correctness & logic bugs | `clients/python/src/zb_mom_ww_mxgateway_cli/commands.py:772,1490-1494` | `_parse_string_list` always emits `param_hint="--items"`, but it is also called from `_build_write_bulk_entries` with `kwargs["values"]`. An empty `--values ""` on the write-bulk commands yields `Error: Invalid value for '--items': ...`, p… |
| Client.Python-034 | Low | Correctness & logic bugs | `clients/python/src/zb_mom_ww_mxgateway_cli/commands.py:1497-1501` | `_parse_int_list` does `int(item)` with no error handling. A non-numeric token (e.g. `--item-handles "10,abc"`) raises a raw `ValueError`, surfacing as an unformatted traceback interactively (other input errors raise `click.BadParameter`). |
| Client.Python-035 | Low | Code organization & conventions | `clients/python/src/zb_mom_ww_mxgateway/__init__.py`, `.../options.py:63-77`, `.../galaxy.py:293` | Two new public types — `BrowseChildrenOptions` (options.py) and `LazyBrowseNode` (galaxy.py) — are absent from `__init__.py`/`__all__`, so callers can't `from zb_mom_ww_mxgateway import BrowseChildrenOptions`, breaking the package-root imp… |
| Client.Rust-034 | Low | Correctness & logic bugs | `clients/rust/crates/mxgw-cli/src/main.rs:48-51,548` | `Command::Version` carries a `jsonl: bool` field that is never read; the dispatch arm matches `{ json, .. }` and discards `jsonl`. `mxgw version --jsonl` silently behaves as plain text. |
| Client.Rust-035 | Low | Security | `clients/rust/crates/mxgw-cli/src/main.rs:489-495` | `--api-key-env` (default `MXGATEWAY_API_KEY`) names an env var read into an `ApiKey` Bearer token, but its clap help has no description of the expected value format. A user pointing it at another credential's env var would silently forward… |
| Client.Rust-036 | Low | Design-document adherence | `clients/rust/RustClientDesign.md:351` | The new `galaxy browse` subcommand (with its filter/depth/json flags) is not listed in the "Test CLI" command table in RustClientDesign.md, which still reads `galaxy {test-connection,last-deploy-time,discover-hierarchy,watch}`. |
| Client.Rust-037 | Low | Design-document adherence | `clients/rust/README.md:164-179` | The README "Browsing lazily" example calls `galaxy.browse_children(...).await?.into_inner()`, but the public API is `GalaxyClient::browse_children_raw` (the bare `browse_children` is the generated proto-client method, not public; and `brow… |
| Client.Rust-038 | Low | Testing coverage | `clients/rust/crates/mxgw-cli/src/main.rs:2336-2564` | Three CLI test gaps: (1) `ConnectionArgs::options()` `--tls`/`--plaintext` resolution (incl. the both-set path of Client.Rust-033) is untested; (2) `browse_children_one_level`'s repeated-page-token guard is untested; (3) `parse_rfc3339_tim… |
| Contracts-020 | Low | Design-document adherence | `gateway.md:1087,1101-1102` | gateway.md still lists "no reconnectable sessions" under "Resolved for v1" and lists "reconnectable sessions" / "multi-subscriber event fan-out" as post-v1 revisit items. The shipped `ReplayGap` reconnect-replay contract and multi-subscrib… |
| Contracts-021 | Low | Documentation & comments | `src/ZB.MOM.WW.MxGateway.Contracts/Protos/mxaccess_gateway.proto:731-733` | The `replay_gap` field comment ends with "(Reconnect/replay logic is Task 12; this is the contract surface only.)". That parenthetical is now stale — the reconnect/replay logic has shipped and is exercised by EventStreamServiceTests/Sessio… |
| Contracts-022 | Low | Testing coverage | `src/ZB.MOM.WW.MxGateway.Tests/Contracts/ProtobufContractRoundTripTests.cs` | No round-trip / descriptor pin exists for the new `ReplayGap` message or `MxEvent.replay_gap` (field 14). The field is exercised functionally end-to-end, but there is no contract-level pin to catch a future renumber/type-narrowing of `repl… |
| IntegrationTests-030 | Low | Documentation & comments | `docs/GatewayTesting.md:76`, `src/ZB.MOM.WW.MxGateway.IntegrationTests/WorkerLiveMxAccessSmokeTests.cs:576,728` | `docs/GatewayTesting.md` says "All six tests are gated by MXGATEWAY_RUN_LIVE_MXACCESS_TESTS=1" and enumerates five parity paths. This diff adds two new `[LiveMxAccessFact]` tests (B8 new COM commands: AuthenticateUser/ArchestrAUserToId/Sus… |
| IntegrationTests-031 | Low | Documentation & comments | `src/ZB.MOM.WW.MxGateway.IntegrationTests/WorkerLiveMxAccessSmokeTests.cs:672` | The inline comment at line 672 says "Suspend / Activate against the advised item", but no `Advise` call is made between `AddItem` (line 616) and `CreateSuspendRequest` (line 677) — the item is added but not advised. The comment mislabels t… |
| IntegrationTests-032 | Low | Testing coverage | `src/ZB.MOM.WW.MxGateway.IntegrationTests/WorkerLiveMxAccessSmokeTests.cs:823-865` | In the buffered-item test, when no sample-bearing `OnBufferedDataChange` batch arrives, the sample-predicate `TimeoutException` is caught and discarded (line 831) before asserting `bootstrapBufferedEvents > 0`. The final failure message ("… |
| IntegrationTests-033 | Low | Testing coverage | `src/ZB.MOM.WW.MxGateway.IntegrationTests/WorkerLiveMxAccessSmokeTests.cs:577-709` | The new-COM-commands live test covers AuthenticateUser/ArchestrAUserToId/Suspend/Activate but not `AddItem2`/`Write2` — the B8 extended commands with a second context parameter introduced in the same bundle. Only live COM tests can verify… |
| Server-055 | Low | Correctness & logic bugs | `src/ZB.MOM.WW.MxGateway.Server/Sessions/GatewaySession.cs:842-851,1841-1871` | When `AttachEventSubscriber`/`AttachEventSubscriberWithReplay` fails inside `StartDistributorAndRegister`, the catch calls `DetachEventSubscriber()`, which decrements the active count back to 0 and — because the session is still `Ready` an… |
| Tests-036 | Low | Testing coverage | `src/ZB.MOM.WW.MxGateway.Tests/Configuration/GatewayOptionsValidatorTests.cs` | Three new validator rules — `DetachGraceSeconds >= 0` (GatewayOptionsValidator.cs:185-186), `ReplayBufferCapacity >= 0` (:215-216), `ReplayRetentionSeconds >= 0` (:219-220) — have no tests, while the sibling new options (`MaxEventSubscribe… |
| Tests-037 | Low | Testing coverage | `src/ZB.MOM.WW.MxGateway.Tests/Contracts/ProtobufContractRoundTripTests.cs` | The reconnect/replay contract surface (`ReplayGap` message, `MxEvent.replay_gap = 14`, `StreamEventsRequest.after_worker_sequence`) has no protobuf serialize/parse round-trip test pinning the wire shape and the documented sentinel invarian… |
| Tests-038 | Low | Performance & resource management | `src/ZB.MOM.WW.MxGateway.Tests/Gateway/Sessions/SessionEventDistributorTests.cs:702-713` | `DrainUntilFaultAsync` relies on the channel completing WITH a fault so `WaitToReadAsync` re-throws. Correct for current callers, but if reused on a channel that completes gracefully, `WaitToReadAsync` returns false without throwing and th… |
| Worker.Tests-034 | Low | Code organization & conventions | `src/ZB.MOM.WW.MxGateway.Worker.Tests/MxAccess/MxAccessCommandExecutorTests.cs:2233`, `src/ZB.MOM.WW.MxGateway.Worker.Tests/TestSupport/NoopMxAccessServer.cs:97` | `FakeMxStatus` is defined twice — file-scope in `TestSupport/NoopMxAccessServer.cs:97` and nested in `MxAccessCommandExecutorTests.FakeMxAccessComObject:2233` — both exposing the same four public fields that `MxStatusProxyConverter` reflec… |
| Worker.Tests-035 | Low | Testing coverage | `src/ZB.MOM.WW.MxGateway.Worker.Tests/MxAccess/MxAccessCommandExecutorTests.cs`, `src/ZB.MOM.WW.MxGateway.Worker/MxAccess/MxAccessCommandExecutor.cs:99-136` | `MxAccessCommandExecutor.Execute` has a `_` discard arm returning `CreateInvalidRequestReply(... "Unsupported MXAccess command kind ...")` — the safety net for an unknown `MxCommandKind` (e.g. a future gateway enum value before the worker… |
| Worker.Tests-036 | Low | Concurrency & thread safety | `src/ZB.MOM.WW.MxGateway.Worker.Tests/Ipc/WorkerPipeSessionTests.cs:983-996` | `RunAsync_SendsFirstHeartbeatImmediatelyOnEnteringLoop` carries a redundant wall-clock assertion `Assert.True(elapsed < TimeSpan.FromSeconds(5), ...)`. The existing `heartbeatWait` CTS (cancel-after 5s) already enforces the same bound — th… |
## Closed findings
+50 -3
@@ -4,10 +4,10 @@
|---|---|
| Module | `src/ZB.MOM.WW.MxGateway.Server` |
| Reviewer | Claude Code |
| Review date | 2026-06-15 |
| Commit reviewed | `410acc9` |
| Review date | 2026-06-16 |
| Commit reviewed | `8df5ab3` |
| Status | Re-reviewed |
| Open findings | 0 |
| Open findings | 2 |
## Checklist coverage
@@ -69,6 +69,23 @@ findings (Server-001 through Server-032) are unchanged by this pass.
| 9 | Testing coverage | Issues found: Server-037 (no test for the corrupt-snapshot restore path or for `PersistSnapshot = false` at the cache level). |
| 10 | Documentation & comments | No issues found — XML docs match behavior; the `GalaxyRepository.md` "On-disk snapshot" section documents the Stale-on-restore lifecycle. |
### 2026-06-16 re-review (commit 8df5ab3)
Re-review of the session-resilience epic + §8 delta (`git diff 410acc9..8df5ab3`): `SessionEventDistributor` multi-subscriber fan-out, replay-on-reconnect, detach-grace retention, bounded worker-ready wait, dashboard auto-login.
| # | Category | Result |
|---|---|---|
| 1 | Correctness & logic bugs | Server-055 |
| 2 | mxaccessgw conventions | No issues found |
| 3 | Concurrency & thread safety | No issues found (replay/handoff atomicity, reconnect-vs-sweep, single-clock ready-wait all verified sound) |
| 4 | Error handling & resilience | No issues found |
| 5 | Security | No issues found (DisableLogin auto-login is intentional/config-gated/documented) |
| 6 | Performance & resource management | No issues found |
| 7 | Design-document adherence | Server-054 |
| 8 | Code organization & conventions | No issues found |
| 9 | Testing coverage | No issues found |
| 10 | Documentation & comments | No issues found |
### 2026-05-24 re-review (commit 42b0037)
Re-review pass at `42b0037` scoped to the dashboard destructive-action wave on
@@ -1022,3 +1039,33 @@ Additionally, `GatewayAlarmMonitor.ApplyProviderModeChangeAsync` increments the
**Recommendation:** Add resolver tests for (a) cancellation propagation and (b) an include that is also excluded; and a `GatewayAlarmMonitorProviderMode` test pinning the provider-switch counter behaviour for a same-mode repeat event (whichever semantics the team intends). These lock down the contracts the Server-051/052 findings expose.
**Resolution:** Resolved 2026-06-15. Added all three missing tests: (a) `AlarmWatchListResolverTests.ResolveAsync_RepositoryCancelled_PropagatesOperationCanceled` (cancellation propagation, also covers Server-051); (b) `AlarmWatchListResolverTests.ResolveAsync_ExcludeAlsoSuppressesMatchingExplicitInclude` (exclude-vs-include precedence, also Server-052 item 2); and (c) `GatewayAlarmMonitorProviderModeTests.ProviderModeChange_RepeatedSameMode_RecordsASwitchForEachEvent`, which pins the existing semantics — each worker-reported `OnAlarmProviderModeChanged` event records a `provider_switches` increment (and resets `_providerSince`) even when `toMode` equals the current mode, since the worker is the authority on when a mode change occurred and the gateway does not synthesize or suppress it.
### Server-054
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Design-document adherence |
| Location | `docs/DesignDecisions.md` (Session Reconnect / Event Subscribers / Later Revisit Items §470-471), `CLAUDE.md` (Repository-Specific Conventions) |
| Status | Open |
**Description:** The session-resilience epic shipped multi-subscriber fan-out (`SessionEventDistributor`), reconnectable sessions with replay (`AttachEventSubscriberWithReplay`/`ReplayGap`), and detach-grace retention — but `docs/DesignDecisions.md` still states "no reconnectable sessions for v1" and "one active StreamEvents subscriber per session for v1", and still files both as post-v1 "Later Revisit Items". `CLAUDE.md` likewise still says these are "explicitly out of scope". This is the stale-prose-vs-shipped-behavior drift the "update docs in the same change as the source" rule prohibits.
**Recommendation:** Update both `DesignDecisions.md` sections and the revisit list to describe the shipped behavior (gated by `AllowMultipleEventSubscribers`, `DetachGraceSeconds`, replay options), and amend the CLAUDE.md convention bullet.
**Resolution:** _(empty until closed)_
### Server-055
| Field | Value |
|---|---|
| Severity | Low |
| Category | Correctness & logic bugs |
| Location | `src/ZB.MOM.WW.MxGateway.Server/Sessions/GatewaySession.cs:842-851,1841-1871` |
| Status | Open |
**Description:** When `AttachEventSubscriber`/`AttachEventSubscriberWithReplay` fails inside `StartDistributorAndRegister`, the catch calls `DetachEventSubscriber()`, which decrements the active count back to 0 and — because the session is still `Ready` and detach-grace is enabled — stamps `_detachedAtUtc = now`. A freshly-`Ready` session that never had a successful subscriber thus enters the detach-grace window on a failed first attach, making it sweep-eligible after `DetachGraceSeconds` even though no client ever streamed. Impact is minor (the lease still protects it; a later successful attach clears the stamp) but the "last subscriber dropped" semantics are violated.
**Recommendation:** Only stamp `_detachedAtUtc` on a detach that mirrors a prior successful attach — roll the failure path back without entering grace, or guard the stamp on "a subscriber had previously been registered."
**Resolution:** _(empty until closed)_
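The recommended guard can be sketched as follows. This is a minimal Python model rather than the gateway's C#. `SessionModel`, its fields, and the `registration_succeeds` flag are hypothetical stand-ins for `GatewaySession` state. Only the rule itself comes from the recommendation above: roll a failed attach back directly, and stamp the grace timestamp only when a previously registered subscriber leaves.

```python
from datetime import datetime


class SessionModel:
    """Hypothetical model of GatewaySession's subscriber bookkeeping (not the real C# type)."""

    def __init__(self, detach_grace_enabled: bool = True) -> None:
        self.detach_grace_enabled = detach_grace_enabled
        self.state = "Ready"
        self.active_subscribers = 0
        self.detached_at_utc = None  # stands in for _detachedAtUtc

    def attach(self, registration_succeeds: bool) -> bool:
        """Models StartDistributorAndRegister; returns True on a successful attach."""
        self.active_subscribers += 1
        if not registration_succeeds:
            # Failure path: undo the increment directly instead of calling detach(),
            # so a subscriber that never registered cannot open the grace window.
            self.active_subscribers -= 1
            return False
        self.detached_at_utc = None  # a successful attach clears any grace stamp
        return True

    def detach(self, now: datetime) -> None:
        """Only reached for a subscriber that previously attached successfully."""
        self.active_subscribers -= 1
        if (self.active_subscribers == 0 and self.state == "Ready"
                and self.detach_grace_enabled):
            self.detached_at_utc = now  # last real subscriber dropped: start grace
```

With this shape, a failed first attach leaves `detached_at_utc` unset, so the session is never sweep-eligible unless a real subscriber has come and gone.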
+65 -3
@@ -4,10 +4,10 @@
|---|---|
| Module | `src/ZB.MOM.WW.MxGateway.Tests` |
| Reviewer | Claude Code |
| Review date | 2026-06-15 |
| Commit reviewed | `410acc9` |
| Review date | 2026-06-16 |
| Commit reviewed | `8df5ab3` |
| Status | Re-reviewed |
| Open findings | 0 |
| Open findings | 3 |
## Checklist coverage
@@ -111,6 +111,23 @@ fakes in two test files.
| 9 | Testing coverage | Issues found: Tests-026 (no test proves `EventStreamService` actually calls `IDashboardEventBroadcaster.Publish` for each event — the only consumers in tests are `Null` fakes). |
| 10 | Documentation & comments | No issues found in this diff. |
#### 2026-06-16 re-review (commit 8df5ab3)
Re-review of the gateway-test delta (session-resilience epic + §8). New tests are high quality (bounded async waits, FakeTimeProvider, deterministic gating, meaningful assertions). Verified the §8 FakeWorkerProcess consolidation did NOT drop the `entireProcessTree` kill assertion. The only findings are Low severity: two coverage gaps and one latent helper footgun.
| # | Category | Result |
|---|---|---|
| 1 | Correctness & logic bugs | No issues found |
| 2 | mxaccessgw conventions | No issues found |
| 3 | Concurrency & thread safety | No issues found |
| 4 | Error handling & resilience | No issues found |
| 5 | Security | No issues found |
| 6 | Performance & resource management | Tests-038 |
| 7 | Design-document adherence | No issues found |
| 8 | Code organization & conventions | No issues found |
| 9 | Testing coverage | Tests-036, Tests-037 |
| 10 | Documentation & comments | No issues found |
## Findings
### Tests-001
@@ -647,3 +664,48 @@ The cancellation tests for `WorkerClient` in `WorkerClientTests` *do* exercise t
**Recommendation:** Bound the second-subscriber drain with the same `WaitTimeout` used elsewhere — e.g. link `newStreamCts` to a `CancellationTokenSource.CreateLinkedTokenSource` plus `CancelAfter(WaitTimeout)`, or wrap the drain in a `Task` awaited via `WaitAsync(WaitTimeout)` — so a missing `SnapshotComplete` surfaces as a deterministic failure rather than a hang.
**Resolution:** 2026-06-15 — Confirmed the unbounded `await foreach` in `DegradedTransition_CachedThenReplayed_CarriesDegradedAndSourceProviderToNewSubscriber`. Bounded the second-subscriber drain with a `CancellationTokenSource.CreateLinkedTokenSource(newStreamCts.Token, drainTimeoutCts.Token)` where `drainTimeoutCts.CancelAfter(WaitTimeout)`, and wrapped the loop in a `try/catch (OperationCanceledException) when (drainTimeoutCts.IsCancellationRequested)` that rethrows a `TimeoutException`. A regression that never emits `SnapshotComplete` now fails cleanly instead of hanging. Test still passes.
### Tests-036
| Field | Value |
|---|---|
| Severity | Low |
| Category | Testing coverage |
| Location | `src/ZB.MOM.WW.MxGateway.Tests/Configuration/GatewayOptionsValidatorTests.cs` |
| Status | Open |
**Description:** Three new validator rules — `DetachGraceSeconds >= 0` (GatewayOptionsValidator.cs:185-186), `ReplayBufferCapacity >= 0` (:215-216), `ReplayRetentionSeconds >= 0` (:219-220) — have no tests, while the sibling new options (`MaxEventSubscribersPerSession`, `WorkerReadyWaitTimeoutMs`) do. A regression dropping/inverting any of the three guards would pass with no failing test.
**Recommendation:** Add boundary theories mirroring the `MaxEventSubscribersPerSession` pattern: a failing case (`-1`) asserting the message contains each config path, and a succeeding boundary case (`0`).
**Resolution:** _(empty until closed)_
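The boundary-theory pattern can be sketched in Python. Each option gets one failing row (`-1`, message must name the option) and one passing row (`0`), the same shape as an xUnit `[Theory]` with `[InlineData]` rows. `validate` is a hypothetical stand-in for the three `GatewayOptionsValidator` rules. The real messages carry full config paths, which are not shown here, so bare option names are used as placeholder paths.

```python
NON_NEGATIVE_OPTIONS = ("DetachGraceSeconds", "ReplayBufferCapacity", "ReplayRetentionSeconds")


def validate(options):
    """Hypothetical stand-in for the three new >= 0 validator rules."""
    failures = []
    for name in NON_NEGATIVE_OPTIONS:
        if options.get(name, 0) < 0:
            failures.append(f"{name} must be >= 0.")
    return failures


def boundary_cases():
    # One failing (-1) and one succeeding (0) row per option, like [InlineData] rows.
    for name in NON_NEGATIVE_OPTIONS:
        yield name, -1, False
        yield name, 0, True


def run_boundary_theory():
    for name, value, should_pass in boundary_cases():
        failures = validate({name: value})
        if should_pass:
            assert failures == [], (name, value, failures)
        else:
            # The failure message must mention the offending option (config path).
            assert any(name in f for f in failures), (name, value, failures)
    return True
```

Keeping the `0` row matters as much as the `-1` row. It catches a regression that tightens `>= 0` to `> 0` as well as one that drops the guard.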
### Tests-037
| Field | Value |
|---|---|
| Severity | Low |
| Category | Testing coverage |
| Location | `src/ZB.MOM.WW.MxGateway.Tests/Contracts/ProtobufContractRoundTripTests.cs` |
| Status | Open |
**Description:** The reconnect/replay contract surface (`ReplayGap` message, `MxEvent.replay_gap = 14`, `StreamEventsRequest.after_worker_sequence`) has no protobuf serialize/parse round-trip test pinning the wire shape and the documented sentinel invariant (family UNSPECIFIED, body oneof and per-item fields unset). Behavior is exercised in EventStreamServiceTests; this is a wire-contract gap.
**Recommendation:** Add a round-trip test building an `MxEvent` with `ReplayGap` populated, asserting the two sequence fields survive and the sentinel invariants hold (field 14, `Family == Unspecified`, `BodyCase` unset).
**Resolution:** _(empty until closed)_
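In the C# test suite this would use the generated `MxEvent` type with `ToByteArray()` and `MxEvent.Parser.ParseFrom(...)`. The sketch below illustrates what "pinning the wire shape" checks, using a hand-rolled proto3 encoder and parser. Field number 14 comes from the finding. The inner `ReplayGap` field numbers (1 and 2), the parameter names, and all helper names are hypothetical. The sketch also assumes, per proto3 convention, that `UNSPECIFIED` is the zero enum value. In that case an event with family unset and no body oneof serializes to the field-14 entry alone.

```python
def encode_varint(value):
    """Base-128 varint, least-significant group first."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(buf, pos):
    result = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def encode_field(number, wire_type, payload):
    # Field key = (field_number << 3) | wire_type, itself a varint.
    return encode_varint((number << 3) | wire_type) + payload


def encode_replay_gap(first_missing, last_missing):
    # Hypothetical inner layout: two varint (wire type 0) fields numbered 1 and 2.
    inner = (encode_field(1, 0, encode_varint(first_missing))
             + encode_field(2, 0, encode_varint(last_missing)))
    # MxEvent.replay_gap = 14, length-delimited (wire type 2).
    return encode_field(14, 2, encode_varint(len(inner)) + inner)


def parse_fields(buf):
    """Map field number -> (wire_type, value) for varint and length-delimited fields."""
    fields, pos = {}, 0
    while pos < len(buf):
        key, pos = decode_varint(buf, pos)
        number, wire_type = key >> 3, key & 0x7
        if wire_type == 0:
            value, pos = decode_varint(buf, pos)
        elif wire_type == 2:
            length, pos = decode_varint(buf, pos)
            value = bytes(buf[pos:pos + length])
            pos += length
        else:
            raise ValueError(f"unsupported wire type {wire_type}")
        fields[number] = (wire_type, value)
    return fields
```

The pin is the leading key byte `0x72` (`(14 << 3) | 2`), together with the absence of any other top-level field. A renumbered or retyped `replay_gap` changes that byte, and a populated family or body adds a second key.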
### Tests-038
| Field | Value |
|---|---|
| Severity | Low |
| Category | Performance & resource management |
| Location | `src/ZB.MOM.WW.MxGateway.Tests/Gateway/Sessions/SessionEventDistributorTests.cs:702-713` |
| Status | Open |
**Description:** `DrainUntilFaultAsync` relies on the channel completing WITH a fault so `WaitToReadAsync` re-throws. Correct for current callers, but if reused on a channel that completes gracefully, `WaitToReadAsync` returns false without throwing and the helper spins in a tight CPU loop with no escape (ReadTimeout bounds only the individual wait). A maintenance hazard, not a current bug.
**Recommendation:** When `WaitToReadAsync` returns false, await `reader.Completion` (surfaces the fault or completes cleanly) and `Assert.Fail` on graceful completion, so the helper fails fast instead of spinning.
**Resolution:** _(empty until closed)_
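The recommended fail-fast shape can be modelled with `asyncio`. `FakeChannelReader` is a hypothetical stand-in for `ChannelReader<T>`. It mirrors the two behaviours the finding relies on: `WaitToReadAsync` re-throws a completion fault, and it returns `false` on graceful completion. `drain_until_fault` corresponds to the test helper, with the `False` branch observing `Completion` and then failing instead of looping.

```python
import asyncio


class FakeChannelReader:
    """Hypothetical ChannelReader<T> stand-in: buffered items plus an optional completion fault."""

    def __init__(self, items, fault=None):
        self._items = list(items)
        self._fault = fault

    async def wait_to_read(self):
        # True while items remain; once drained, re-throw the fault, else report False.
        if self._items:
            return True
        if self._fault is not None:
            raise self._fault
        return False

    def try_read(self):
        return (True, self._items.pop(0)) if self._items else (False, None)

    async def completion(self):
        # Like reader.Completion: faults with the completion error, else completes cleanly.
        if self._fault is not None:
            raise self._fault


async def drain_until_fault(reader, read_timeout=1.0):
    """Drain until the channel faults; fail fast on graceful completion instead of spinning."""
    while True:
        try:
            has_items = await asyncio.wait_for(reader.wait_to_read(), read_timeout)
        except asyncio.TimeoutError:
            raise  # a hung channel is a test failure, not the expected fault
        except Exception as fault:  # completion fault re-thrown by wait_to_read
            return fault
        if not has_items:
            # Graceful completion: surface any fault via Completion, otherwise fail now
            # rather than calling wait_to_read() again and getting False forever.
            try:
                await reader.completion()
            except Exception as fault:
                return fault
            raise AssertionError("channel completed gracefully; expected a fault")
        while reader.try_read()[0]:
            pass  # discard buffered items
```

A faulted channel returns its fault as before. A gracefully completed one now raises immediately instead of pinning a CPU core.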
+65 -3
@@ -4,10 +4,10 @@
|---|---|
| Module | `src/ZB.MOM.WW.MxGateway.Worker.Tests` |
| Reviewer | Claude Code |
| Review date | 2026-06-15 |
| Commit reviewed | `410acc9` |
| Review date | 2026-06-16 |
| Commit reviewed | `8df5ab3` |
| Status | Re-reviewed |
| Open findings | 0 |
| Open findings | 3 |
## 2026-06-15 re-review (commit `410acc9`)
@@ -119,6 +119,23 @@ findings (Worker.Tests-001 through -030) are unaffected.
| 9 | Testing coverage | No issues found in this diff. |
| 10 | Documentation & comments | No issues found in this diff. |
#### 2026-06-16 re-review (commit 8df5ab3)
Re-review of the worker-test delta covering the new COM seam (`MxAccessCommandExecutorTests`, `MxAccessComServerTests`) and alarm work. The tests genuinely exercise STA dispatch and parity; the only findings are Low-severity organization, coverage, and timing-assertion items.
| # | Category | Result |
|---|---|---|
| 1 | Correctness & logic bugs | No issues found |
| 2 | mxaccessgw conventions | No issues found |
| 3 | Concurrency & thread safety | Worker.Tests-036 |
| 4 | Error handling & resilience | No issues found |
| 5 | Security | No issues found (password-no-leak test present) |
| 6 | Performance & resource management | No issues found |
| 7 | Design-document adherence | No issues found |
| 8 | Code organization & conventions | Worker.Tests-034 |
| 9 | Testing coverage | Worker.Tests-035 |
| 10 | Documentation & comments | No issues found |
## Findings
### Worker.Tests-001
@@ -615,3 +632,48 @@ findings (Worker.Tests-001 through -030) are unaffected.
**Recommendation:** Add (a) `AckedTrueWhileInactive_EmitsNothingAndDoesNotLatch` — apply `.acked=true` with no prior active raise, assert `Apply` returns empty, then raise active and clear and assert the clear emits `UnackRtn` (proving the stale ack did not latch); and (b) `PriorityChange_FlowsIntoEmittedRecord` — apply a priority value then an active raise and assert the emitted record's `Priority` equals the supplied value (and a `CoerceInt` string/garbage case falls back).
**Resolution:** 2026-06-15 — Added both tests to `SubtagAlarmStateMachineTests`. `AckedTrueWhileInactive_EmitsNothingAndDoesNotLatch` applies `.acked=true` with no preceding active raise (asserts `Apply` returns empty), then drives a fresh raise→clear episode and asserts the clear emits `UnackRtn` — proving the stale inactive ack did not latch `AckedDuringEpisode`. `PriorityChange_FlowsIntoEmittedRecord` (the target now includes a `PrioritySubtag`) applies an `int` priority `750` (asserts the priority change emits nothing), raises active and asserts the emitted record's `Priority == 750` (exercising `CoerceInt`'s `int` path and the priority assignment), then applies a non-numeric `"not-a-number"` priority and asserts the snapshot `Priority` is still `750` (the `CoerceInt` string fallback keeps the prior value, not zero).
### Worker.Tests-034
| Field | Value |
|---|---|
| Severity | Low |
| Category | Code organization & conventions |
| Location | `src/ZB.MOM.WW.MxGateway.Worker.Tests/MxAccess/MxAccessCommandExecutorTests.cs:2233`, `src/ZB.MOM.WW.MxGateway.Worker.Tests/TestSupport/NoopMxAccessServer.cs:97` |
| Status | Open |
**Description:** `FakeMxStatus` is defined twice — file-scope in `TestSupport/NoopMxAccessServer.cs:97` and nested in `MxAccessCommandExecutorTests.FakeMxAccessComObject:2233` — both exposing the same four public fields that `MxStatusProxyConverter` reflects over. The two copies must stay structurally identical; a future field change to the real COM struct requires updating two places, and the duplication is invisible to a reader consulting only one file.
**Recommendation:** Extract `FakeMxStatus` into its own `TestSupport/FakeMxStatus.cs` (or colocate both doubles) and have `MxAccessCommandExecutorTests` use the shared type instead of its nested copy.
**Resolution:** _(empty until closed)_
### Worker.Tests-035
| Field | Value |
|---|---|
| Severity | Low |
| Category | Testing coverage |
| Location | `src/ZB.MOM.WW.MxGateway.Worker.Tests/MxAccess/MxAccessCommandExecutorTests.cs`, `src/ZB.MOM.WW.MxGateway.Worker/MxAccess/MxAccessCommandExecutor.cs:99-136` |
| Status | Open |
**Description:** `MxAccessCommandExecutor.Execute` has a `_` discard arm returning `CreateInvalidRequestReply(... "Unsupported MXAccess command kind ...")` — the safety net for an unknown `MxCommandKind` (e.g. a future gateway enum value before the worker is updated). No test passes an unknown kind and asserts `InvalidRequest`. A regression changing the arm to `throw` would propagate an unhandled exception through `WorkerPipeSession` and no test would catch it.
**Recommendation:** Add a `[Fact]` constructing a `StaCommand` with an undefined `MxCommandKind` value and asserting the reply is `ProtocolStatusCode.InvalidRequest` with "Unsupported" in the diagnostic.
**Resolution:** _(empty until closed)_
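In C#, the test can obtain an undefined value by casting an out-of-range integer, e.g. `(MxCommandKind)9999`. The Python sketch below models only the dispatch shape being pinned. The enum members, handler table, and `(status, diagnostic)` reply tuple are hypothetical. The behaviour under test is that an unknown kind yields an `InvalidRequest` reply mentioning "Unsupported" rather than an exception.

```python
from enum import IntEnum


class MxCommandKind(IntEnum):
    # Hypothetical subset; the real enum lives in the gateway contracts.
    ADD_ITEM = 1
    ADVISE = 2


def execute(kind_value):
    """Hypothetical model of Execute's switch, including the `_` discard arm."""
    handlers = {
        MxCommandKind.ADD_ITEM: lambda: ("Ok", "item added"),
        MxCommandKind.ADVISE: lambda: ("Ok", "advised"),
    }
    # IntEnum members hash like their int values, so a raw int looks up correctly.
    handler = handlers.get(kind_value)
    if handler is None:
        # Discard arm: reply InvalidRequest instead of throwing through WorkerPipeSession.
        return ("InvalidRequest", f"Unsupported MXAccess command kind {kind_value}.")
    return handler()
```

Asserting on both the status and the "Unsupported" substring means the test catches a regression to `throw` as well as one that silently returns a success reply.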
### Worker.Tests-036
| Field | Value |
|---|---|
| Severity | Low |
| Category | Concurrency & thread safety |
| Location | `src/ZB.MOM.WW.MxGateway.Worker.Tests/Ipc/WorkerPipeSessionTests.cs:983-996` |
| Status | Open |
**Description:** `RunAsync_SendsFirstHeartbeatImmediatelyOnEnteringLoop` carries a redundant wall-clock assertion `Assert.True(elapsed < TimeSpan.FromSeconds(5), ...)`. The existing `heartbeatWait` CTS (cancel-after 5s) already enforces the same bound — the extra wall-clock check can only fire if the heartbeat arrived but took >5s to be received, which the CTS already prevents. It is the same coarse wall-clock pattern prior findings (Worker.Tests-003/004/013/020) corrected.
**Recommendation:** Remove the `start`/`elapsed`/`Assert.True(elapsed < ...)` check; the CTS timeout already pins the timing contract.
**Resolution:** _(empty until closed)_
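The point of the recommendation is that one bounded wait is the whole timing contract. Here is a minimal `asyncio` analogue, where `asyncio.wait_for` plays the role of the `heartbeatWait` CTS with `CancelAfter(5s)`. `first_heartbeat_within` and the queue-based heartbeat source are hypothetical.

```python
import asyncio


async def first_heartbeat_within(heartbeats, timeout_s=5.0):
    """Wait for the first heartbeat; the timeout alone enforces the bound."""
    # If nothing arrives in time, wait_for raises TimeoutError and the test fails.
    # On success the bound already held, so a separate start/elapsed wall-clock
    # assertion adds nothing but exposure to scheduler jitter.
    return await asyncio.wait_for(heartbeats.get(), timeout_s)


async def demo():
    heartbeats = asyncio.Queue()
    heartbeats.put_nowait("heartbeat#1")  # a loop that sends immediately on entry
    return await first_heartbeat_within(heartbeats)
```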
+19 -2
@@ -4,8 +4,8 @@
|---|---|
| Module | `src/ZB.MOM.WW.MxGateway.Worker` |
| Reviewer | Claude Code |
| Review date | 2026-06-15 |
| Commit reviewed | `410acc9` |
| Review date | 2026-06-16 |
| Commit reviewed | `8df5ab3` |
| Status | Re-reviewed |
| Open findings | 0 |
@@ -87,6 +87,23 @@ contention with the gateway-side watchdog (Server-031) is unchanged.
| 9 | Testing coverage | No issues found in this diff. |
| 10 | Documentation & comments | No issues found in this diff. |
#### 2026-06-16 re-review (commit 8df5ab3)
Re-review of the worker delta (`git diff 410acc9..8df5ab3`): the `IMxAccessServer`/`MxAccessComServer`/`MxAccessSession`/`MxAccessCommandExecutor` seam-extraction refactor plus alarm failover/subtag work. **No new findings.** Prior findings Worker-026/027/028 confirmed resolved at this commit. Every MXAccess COM call in the new seam is reachable only via `StaCommandDispatcher` → `staRuntime.InvokeAsync` (STA affinity preserved); MXAccess parity preserved (no synthesized events, HRESULTs surfaced); the single COM RCW is released exactly once; net48 idioms respected.
| # | Category | Result |
|---|---|---|
| 1 | Correctness & logic bugs | No issues found |
| 2 | mxaccessgw conventions | No issues found |
| 3 | Concurrency & thread safety | No issues found (STA affinity preserved across the new seam) |
| 4 | Error handling & resilience | No issues found |
| 5 | Security | No issues found (no secret/WriteSecured-payload logging) |
| 6 | Performance & resource management | No issues found (single FinalReleaseComObject) |
| 7 | Design-document adherence | No issues found |
| 8 | Code organization & conventions | No issues found |
| 9 | Testing coverage | No issues found |
| 10 | Documentation & comments | No issues found |
## Findings
### Worker-001