fix: resolve Client.Java + Worker.Tests findings (pending windev verification)
Client.Java-040..048, Worker.Tests-034/035/036. Edits applied on the Mac, which has no JRE and cannot build the x86+MXAccess worker tests; findings are marked In Progress pending gradle + x86 build verification on windev. Do not mark Resolved until verified there.
This commit is contained in:
@@ -752,13 +752,13 @@ BrowseChildrenReply reply = galaxy.browseChildren(
|
||||
| Severity | Medium |
|
||||
| Category | Correctness & logic bugs |
|
||||
| Location | `clients/java/zb-mom-ww-mxgateway-cli/src/main/java/com/zb/mom/ww/mxgateway/cli/MxGatewayCli.java:1552-1561` |
|
||||
| Status | Open |
|
||||
| Status | In Progress |
|
||||
|
||||
**Description:** The `stream-alarms` overflow handler does `queue.clear()` then `offer(exception)` + `offer(ALARM_FEED_END)` non-atomically on an `ArrayBlockingQueue` shared with the gRPC delivery thread. In production gRPC (netty I/O thread), a concurrent `onNext` between the clear and the offers can re-enqueue a normal message, displacing the overflow exception so the drain loop hits the normal message and may exit before reaching the exception — exiting 0 on a truncated feed. Same race class as Client.Java-002/033.
|
||||
|
||||
**Recommendation:** Guard the overflow transition with an `AtomicBoolean` (mirror `MxGatewayStreamSubscription.terminate()`'s terminated-flag + lock) instead of re-clearing the queue.
|
||||
|
||||
**Resolution:** _(empty until closed)_
|
||||
**Resolution:** 2026-06-16 — Confirmed root cause in `StreamAlarmsCommand.call()`: the overflow branch did `queue.clear()` then `offer(exception)` + `offer(ALARM_FEED_END)`, so a concurrent `onNext` between the clear and the offers could re-enqueue a normal message and displace the overflow signal. (Note: `MxGatewayStreamSubscription` has no `terminate()` method; the terminal-guard model lives in `MxEventStream`, which itself still uses the clear+offer shape — I implemented the atomic guard the finding asks for rather than copying the older pattern.) Replaced the clear+offer with a single `AtomicBoolean terminated` guard (`compareAndSet(false,true)` — first terminal wins) plus a dedicated `AtomicReference<Object> terminal` slot that holds the terminal item (overflow exception / transport error / `ALARM_FEED_END`) independently of the bounded queue. `onNext` no longer re-clears the queue; it just stops enqueueing once terminated. The drain loop now `poll(50ms)`s and, when the queue is empty, reads the terminal slot. No re-clear, and a concurrent `onNext` can no longer displace the terminal. Fix applied 2026-06-16, pending gradle verification on windev. Regression test: `MxGatewayCliTests.streamAlarmsCommandFailsFastOnQueueOverflow` (strengthened under Client.Java-046 to drive async delivery and assert the overflow text).
|
||||
|
||||
### Client.Java-041
|
||||
|
||||
@@ -767,13 +767,13 @@ BrowseChildrenReply reply = galaxy.browseChildren(
|
||||
| Severity | Low |
|
||||
| Category | Correctness & logic bugs |
|
||||
| Location | `clients/java/zb-mom-ww-mxgateway-cli/src/main/java/com/zb/mom/ww/mxgateway/cli/MxGatewayCli.java:2187-2194` |
|
||||
| Status | Open |
|
||||
| Status | In Progress |
|
||||
|
||||
**Description:** `jsonString` escapes only `\`, `"`, `\r`, `\n` — not `\t`, `\b`, `\f`, or U+0000–U+001F/U+007F. A tag address/message/reference containing a tab produces malformed JSON (RFC 8259). Affects the hand-rolled `jsonObject`/`jsonString`/`jsonValue` output paths (the protobuf `JsonFormat` path is spec-correct).
|
||||
|
||||
**Recommendation:** Add `\t`/`\b`/`\f` escapes and `\u00XX` for control chars, or route all JSON through a real JSON library.
|
||||
|
||||
**Resolution:** _(empty until closed)_
|
||||
**Resolution:** 2026-06-16 — Confirmed: `jsonString` escaped only `\\ \" \r \n`, so a tab/backspace/form-feed or any other U+0000–U+001F (or U+007F) char produced malformed JSON. Rewrote `jsonString` as a per-character builder that emits the two-character escapes for `\t \b \f \r \n \" \\` and `\u00XX` for the remaining `< 0x20` range plus DEL (`0x7f`), keeping ordinary printable characters verbatim. Widened `jsonString` from `private` to package-private (matching the Client.Java-032 `commandLine(...)` precedent) so the escaping can be unit-tested directly. Fix applied 2026-06-16, pending gradle verification on windev. Regression test: `MxGatewayCliTests.jsonStringEscapesControlCharacters`.
|
||||
|
||||
### Client.Java-042
|
||||
|
||||
@@ -782,13 +782,13 @@ BrowseChildrenReply reply = galaxy.browseChildren(
|
||||
| Severity | Low |
|
||||
| Category | Error handling & resilience |
|
||||
| Location | `clients/java/zb-mom-ww-mxgateway-cli/src/main/java/com/zb/mom/ww/mxgateway/cli/MxGatewayCli.java:1565-1567` |
|
||||
| Status | Open |
|
||||
| Status | In Progress |
|
||||
|
||||
**Description:** `StreamAlarmsCommand.onError` calls `queue.offer(error)` without checking the return value. If the queue is full when a transport error arrives, the error is dropped and the drain loop blocks forever on `queue.take()`. Same class as Client.Java-033 on the error path.
|
||||
|
||||
**Recommendation:** Reserve a sentinel slot or use the `terminate(Throwable)` guard from `MxEventStream`; ensure the drain always sees a terminal item.
|
||||
|
||||
**Resolution:** _(empty until closed)_
|
||||
**Resolution:** 2026-06-16 — Confirmed: `onError` did a bare `queue.offer(error)` that, on a full queue, dropped the error and stranded the drain on `queue.take()` forever. Fixed together with Client.Java-040: `onError` now routes through the shared `terminate(error)` consumer, which records the throwable in the dedicated `terminal` slot (guarded by the `AtomicBoolean`, never enqueued into the bounded `queue`). The drain loop reads that slot via the `poll(50ms)` + terminal-check path, so a transport error is always observed even when the queue is full, and the `take()`-forever deadlock is gone. Fix applied 2026-06-16, pending gradle verification on windev. Covered by the same `streamAlarmsCommandFailsFastOnQueueOverflow` terminal-slot plumbing; the error path shares the slot with the overflow path.
|
||||
|
||||
### Client.Java-043
|
||||
|
||||
@@ -797,13 +797,13 @@ BrowseChildrenReply reply = galaxy.browseChildren(
|
||||
| Severity | Low |
|
||||
| Category | Code organization & conventions |
|
||||
| Location | `clients/java/zb-mom-ww-mxgateway-cli/src/test/java/com/zb/mom/ww/mxgateway/cli/MxGatewayCliTests.java:241-264` |
|
||||
| Status | Open |
|
||||
| Status | In Progress |
|
||||
|
||||
**Description:** `galaxyBrowseParentZeroEmitsWarningToStderr` calls `MxGatewayCli.execute(new FakeClientFactory(), ...)` for a galaxy-browse command, which wires the real `GrpcGalaxyClientFactory` and constructs a live Netty channel to localhost:5000 as a side effect (asserting only the warning). Wasteful and non-deterministic if port 5000 is reachable.
|
||||
|
||||
**Recommendation:** Use `executeGalaxy(...)` with a `GalaxyClientFactory` stub that throws, so only the warning path runs.
|
||||
|
||||
**Resolution:** _(empty until closed)_
|
||||
**Resolution:** 2026-06-16 — Confirmed: the test called `MxGatewayCli.execute(new FakeClientFactory(), ...)`, which routes galaxy commands through the production `GrpcGalaxyClientFactory`; `GalaxyBrowseCommand.call()` prints the `--parent 0` warning then `connect()`s a live `GalaxyRepositoryClient` (Netty channel to localhost:5000) before failing — wasteful and non-deterministic. Rewrote the test to use the existing `executeGalaxy(...)` seam with a new `ThrowingGalaxyClientFactory` stub whose `connect()` throws; the warning is emitted before `connect()` is reached, so only the warning path runs and no live channel is constructed. Fix applied 2026-06-16, pending gradle verification on windev. Test: `MxGatewayCliTests.galaxyBrowseParentZeroEmitsWarningToStderr` (updated).
|
||||
|
||||
### Client.Java-044
|
||||
|
||||
@@ -812,13 +812,13 @@ BrowseChildrenReply reply = galaxy.browseChildren(
|
||||
| Severity | Low |
|
||||
| Category | Code organization & conventions |
|
||||
| Location | `clients/java/zb-mom-ww-mxgateway-client/src/main/java/com/zb/mom/ww/mxgateway/client/MxGatewayClientVersion.java:12` |
|
||||
| Status | Open |
|
||||
| Status | In Progress |
|
||||
|
||||
**Description:** `CLIENT_VERSION = "0.1.0"` is out of sync with Gradle `version = '0.1.1'` (cross-ref `clients/java/build.gradle:6`). The `version` command advertises 0.1.0 while the published artifact is 0.1.1; consumers can't use the version string as a reliable artifact check.
|
||||
|
||||
**Recommendation:** Bump `CLIENT_VERSION` to `0.1.1` (and the two test assertions), or source it from a Gradle-generated properties file.
|
||||
|
||||
**Resolution:** _(empty until closed)_
|
||||
**Resolution:** 2026-06-16 — Confirmed: `MxGatewayClientVersion.CLIENT_VERSION = "0.1.0"` while `clients/java/build.gradle:16` sets `version = '0.1.1'` and the README Maven coordinate is `:0.1.1`. Bumped `CLIENT_VERSION` to `"0.1.1"` and updated the two test assertions (`MxGatewayCliTests.versionCommandPrintsProtocolVersions` line asserting `"mxgateway-java 0.1.0"` and `versionCommandPrintsJson` asserting `"clientVersion":"0.1.0"`) to `0.1.1`. Left as a hardcoded constant (sourcing from a Gradle-generated properties file was the optional alternative, not required). Fix applied 2026-06-16, pending gradle verification on windev. Tests: `MxGatewayCliTests.versionCommandPrintsProtocolVersions`, `versionCommandPrintsJson`.
|
||||
|
||||
### Client.Java-045
|
||||
|
||||
@@ -827,13 +827,13 @@ BrowseChildrenReply reply = galaxy.browseChildren(
|
||||
| Severity | Low |
|
||||
| Category | Testing coverage |
|
||||
| Location | `clients/java/zb-mom-ww-mxgateway-cli/src/main/java/com/zb/mom/ww/mxgateway/cli/InProcessGatewayHarness.java` |
|
||||
| Status | Open |
|
||||
| Status | In Progress |
|
||||
|
||||
**Description:** The harness implements only `streamEvents`/`closeSession` (gateway) and `discoverHierarchy`/`watchDeployEvents` (galaxy); all other RPCs return gRPC UNIMPLEMENTED. This is undocumented, so a future test exercising invoke/register through the harness would silently get UNIMPLEMENTED.
|
||||
|
||||
**Recommendation:** Add a Javadoc note enumerating implemented RPCs and warning that others return UNIMPLEMENTED by design.
|
||||
|
||||
**Resolution:** _(empty until closed)_
|
||||
**Resolution:** 2026-06-16 — Confirmed against source (the file lives under `src/test/...`, not `src/main/...` as the finding location states): the scripted fakes override only `streamEvents`/`closeSession` (gateway) and `discoverHierarchy`/`watchDeployEvents` (galaxy); every other RPC inherits the generated `*ImplBase` default and returns gRPC `UNIMPLEMENTED`. Added a "Implemented RPCs" section to the `InProcessGatewayHarness` class Javadoc enumerating the four overridden RPCs and warning that all others (openSession, invoke, register, streamAlarms, queryActiveAlarms, browseChildren, …) return `UNIMPLEMENTED` by design, so a future test must add a scripted override first. Doc-only change. Fix applied 2026-06-16, pending gradle verification on windev. No test needed.
|
||||
|
||||
### Client.Java-046
|
||||
|
||||
@@ -842,13 +842,13 @@ BrowseChildrenReply reply = galaxy.browseChildren(
|
||||
| Severity | Low |
|
||||
| Category | Testing coverage |
|
||||
| Location | `clients/java/zb-mom-ww-mxgateway-cli/src/test/java/com/zb/mom/ww/mxgateway/cli/MxGatewayCliTests.java:680-696` |
|
||||
| Status | Open |
|
||||
| Status | In Progress |
|
||||
|
||||
**Description:** `streamAlarmsCommandFailsFastOnQueueOverflow` delivers all 2000 onNext synchronously from within `streamAlarms`, so `subscriptionRef` is still null when the overflow fires — the `sub.cancel()` branch is never exercised. The test also doesn't assert the overflow message text. It passes for a reason that doesn't generalize to async gRPC delivery.
|
||||
|
||||
**Recommendation:** Deliver messages asynchronously so the cancel path runs, and assert the overflow error text appears in output.
|
||||
|
||||
**Resolution:** _(empty until closed)_
|
||||
**Resolution:** 2026-06-16 — Confirmed: `OverflowingFakeClient.streamAlarms` pushed all 2000 `onNext` synchronously and returned the subscription only afterward, so `subscriptionRef` was still null when the overflow fired and the `sub.cancel()` branch never ran; the test also asserted only the exit code, not the overflow text. Reworked `OverflowingFakeClient.streamAlarms` to flood on a background daemon thread (mirroring a real netty I/O thread) and return the subscription first, so the overflow fires with a non-null published subscription and exercises the `terminate()` cancel path. Strengthened `streamAlarmsCommandFailsFastOnQueueOverflow` to additionally assert the overflow message text ("queue overflowed") surfaces in stderr/stdout. Fix applied 2026-06-16, pending gradle verification on windev. Test: `MxGatewayCliTests.streamAlarmsCommandFailsFastOnQueueOverflow` (updated; also validates the Client.Java-040 terminal-slot fix).
|
||||
|
||||
### Client.Java-047
|
||||
|
||||
@@ -857,13 +857,13 @@ BrowseChildrenReply reply = galaxy.browseChildren(
|
||||
| Severity | Low |
|
||||
| Category | Documentation & comments |
|
||||
| Location | `clients/java/README.md` |
|
||||
| Status | Open |
|
||||
| Status | In Progress |
|
||||
|
||||
**Description:** README advertises the `0.1.1` artifact coordinate (Gitea Maven section) while the `version` command reports `0.1.0` — the user-visible symptom of Client.Java-044. Cross-ref `MxGatewayClientVersion.java:12`.
|
||||
|
||||
**Recommendation:** Resolved by fixing Client.Java-044 (sync the compiled-in version).
|
||||
|
||||
**Resolution:** _(empty until closed)_
|
||||
**Resolution:** 2026-06-16 — Symptom of Client.Java-044, resolved together. The README's `0.1.1` Maven coordinate (`clients/java/README.md:336`) was already correct; the divergence was the compiled-in `CLIENT_VERSION = "0.1.0"`. Bumping `CLIENT_VERSION` to `0.1.1` (Client.Java-044) makes the `version` command report `0.1.1`, matching the README. No README edit needed. Fix applied 2026-06-16, pending gradle verification on windev.
|
||||
|
||||
### Client.Java-048
|
||||
|
||||
@@ -872,13 +872,13 @@ BrowseChildrenReply reply = galaxy.browseChildren(
|
||||
| Severity | Low |
|
||||
| Category | Documentation & comments |
|
||||
| Location | `clients/java/zb-mom-ww-mxgateway-cli/src/main/java/com/zb/mom/ww/mxgateway/cli/MxGatewayCli.java:88-105` |
|
||||
| Status | Open |
|
||||
| Status | In Progress |
|
||||
|
||||
**Description:** The public `execute(PrintWriter, PrintWriter, String...)` Javadoc calls it "Test-friendly entry point", but it wires `GrpcMxGatewayCliClientFactory` with no injection — the actual test seam is the package-private `execute(MxGatewayCliClientFactory, ...)` / `commandLine(...)` overload. Misleading.
|
||||
|
||||
**Recommendation:** Clarify the Javadoc to direct readers to the injectable overload for testing.
|
||||
|
||||
**Resolution:** _(empty until closed)_
|
||||
**Resolution:** 2026-06-16 — Confirmed: the public `execute(PrintWriter, PrintWriter, String...)` Javadoc called it the "Test-friendly entry point", but it wires the production `GrpcMxGatewayCliClientFactory` with no injection seam — unit tests actually use the package-private `execute(MxGatewayCliClientFactory, ...)` / `commandLine(...)` overloads. Rewrote the Javadoc to drop "test-friendly", explain it wires a real gRPC channel, and direct test authors to the injectable package-private overloads. Doc-only change. Fix applied 2026-06-16, pending gradle verification on windev. No test needed.
|
||||
|
||||
|
||||
|
||||
|
||||
@@ -640,13 +640,13 @@ Re-review of the worker-test delta covering the new COM seam (`MxAccessCommandEx
|
||||
| Severity | Low |
|
||||
| Category | Code organization & conventions |
|
||||
| Location | `src/ZB.MOM.WW.MxGateway.Worker.Tests/MxAccess/MxAccessCommandExecutorTests.cs:2233`, `src/ZB.MOM.WW.MxGateway.Worker.Tests/TestSupport/NoopMxAccessServer.cs:97` |
|
||||
| Status | Open |
|
||||
| Status | In Progress |
|
||||
|
||||
**Description:** `FakeMxStatus` is defined twice — file-scope in `TestSupport/NoopMxAccessServer.cs:97` and nested in `MxAccessCommandExecutorTests.FakeMxAccessComObject:2233` — both exposing the same four public fields that `MxStatusProxyConverter` reflects over. The two copies must stay structurally identical; a future field change to the real COM struct requires updating two places, and the duplication is invisible to a reader consulting only one file.
|
||||
|
||||
**Recommendation:** Extract `FakeMxStatus` into its own `TestSupport/FakeMxStatus.cs` (or colocate both doubles) and have `MxAccessCommandExecutorTests` use the shared type instead of its nested copy.
|
||||
|
||||
**Resolution:** _(empty until closed)_
|
||||
**Resolution:** 2026-06-16 — Removed the nested `FakeMxStatus` class from `MxAccessCommandExecutorTests.FakeMxAccessComObject`; the two `new FakeMxStatus { ... }` usages in `Suspend`/`Activate` now resolve to the shared `TestSupport.FakeMxStatus` via the pre-existing `using ZB.MOM.WW.MxGateway.Worker.Tests.TestSupport;` import. Updated the XML doc on `TestSupport/NoopMxAccessServer.cs:FakeMxStatus` to note the consolidation. Fix applied 2026-06-16, pending build verification on windev.
|
||||
|
||||
### Worker.Tests-035
|
||||
|
||||
@@ -655,13 +655,13 @@ Re-review of the worker-test delta covering the new COM seam (`MxAccessCommandEx
|
||||
| Severity | Low |
|
||||
| Category | Testing coverage |
|
||||
| Location | `src/ZB.MOM.WW.MxGateway.Worker.Tests/MxAccess/MxAccessCommandExecutorTests.cs`, `src/ZB.MOM.WW.MxGateway.Worker/MxAccess/MxAccessCommandExecutor.cs:99-136` |
|
||||
| Status | Open |
|
||||
| Status | In Progress |
|
||||
|
||||
**Description:** `MxAccessCommandExecutor.Execute` has a `_` discard arm returning `CreateInvalidRequestReply(... "Unsupported MXAccess command kind ...")` — the safety net for an unknown `MxCommandKind` (e.g. a future gateway enum value before the worker is updated). No test passes an unknown kind and asserts `InvalidRequest`. A regression changing the arm to `throw` would propagate an unhandled exception through `WorkerPipeSession` and no test would catch it.
|
||||
|
||||
**Recommendation:** Add a `[Fact]` constructing a `StaCommand` with an undefined `MxCommandKind` value and asserting the reply is `ProtocolStatusCode.InvalidRequest` with "Unsupported" in the diagnostic.
|
||||
|
||||
**Resolution:** _(empty until closed)_
|
||||
**Resolution:** 2026-06-16 — Added `DispatchAsync_WithUnknownCommandKind_ReturnsInvalidRequestWithUnsupportedDiagnostic` to `MxAccessCommandExecutorTests`. Casts `int.MaxValue` to `MxCommandKind` (an undefined value not present in the proto-generated enum), dispatches it through `MxAccessStaSession.DispatchAsync`, asserts `ProtocolStatusCode.InvalidRequest`, and asserts `reply.DiagnosticMessage` contains "Unsupported" (case-insensitive — matching `CreateInvalidRequestReply`'s `"Unsupported MXAccess command kind ..."` message). Fix applied 2026-06-16, pending build verification on windev.
|
||||
|
||||
### Worker.Tests-036
|
||||
|
||||
@@ -670,10 +670,10 @@ Re-review of the worker-test delta covering the new COM seam (`MxAccessCommandEx
|
||||
| Severity | Low |
|
||||
| Category | Concurrency & thread safety |
|
||||
| Location | `src/ZB.MOM.WW.MxGateway.Worker.Tests/Ipc/WorkerPipeSessionTests.cs:983-996` |
|
||||
| Status | Open |
|
||||
| Status | In Progress |
|
||||
|
||||
**Description:** `RunAsync_SendsFirstHeartbeatImmediatelyOnEnteringLoop` carries a redundant wall-clock assertion `Assert.True(elapsed < TimeSpan.FromSeconds(5), ...)`. The existing `heartbeatWait` CTS (cancel-after 5s) already enforces the same bound — the extra wall-clock check can only fire if the heartbeat arrived but took >5s to be received, which the CTS already prevents. It is the same coarse wall-clock pattern prior findings (Worker.Tests-003/004/013/020) corrected.
|
||||
|
||||
**Recommendation:** Remove the `start`/`elapsed`/`Assert.True(elapsed < ...)` check; the CTS timeout already pins the timing contract.
|
||||
|
||||
**Resolution:** _(empty until closed)_
|
||||
**Resolution:** 2026-06-16 — Removed the `DateTimeOffset start`, `TimeSpan elapsed`, and `Assert.True(elapsed < TimeSpan.FromSeconds(5), ...)` wall-clock assertions from `RunAsync_SendsFirstHeartbeatImmediatelyOnEnteringLoop`. The `heartbeatWait` CTS (cancel-after 5s) already enforces the same timing bound. Added an inline comment explaining why the wall-clock floor is omitted, consistent with the Worker.Tests-003/004/013/020 pattern. Fix applied 2026-06-16, pending build verification on windev.
|
||||
|
||||
Reference in New Issue
Block a user