Still-Pending §8 Completion — Design
Status: Approved 2026-06-16. Next step: superpowers-extended-cc:writing-plans.
Goal: Close the actionable items in stillpending.md §8 ("Deferred test-coverage
follow-ups, never filed as findings") — the only Bucket-A work that is neither
vendor-gated nor live-rig-gated and is not already covered by the session-resilience
epic plan.
Scope decision: Bucket A only (actionable code/test work). The session-resilience
epic (Tasks 13–28) is already planned in docs/plans/2026-06-15-session-resilience.md
and is explicitly out of scope here — resume it separately. Vendor-gated
(§1.4/§3.4/§3.5) and live-rig/capture-gated (§1.3/§3.x/§5/§6.1) items cannot be
completed from this dev box and are out of scope.
Approach: "C" — the complete option, including new in-process gRPC test infrastructure for the Java streaming/galaxy CLI commands and a full bounded ready-wait in the gateway session hot path.
Important correction (verified 2026-06-16)
The three §8 items cite findings marked Resolved in the review backlog, but those resolutions did not survive into the current tree:
- The Java bulk-family CLI tests that Client.Java-026 (resolved 2026-05-20) describes were written against the old com.dohertylan.mxgateway package. After the rename to com.zb.mom.ww, the current clients/java/zb-mom-ww-mxgateway-cli/.../MxGatewayCliTests.java has zero coverage for read-bulk, write-bulk, write2-bulk, write-secured-bulk, write-secured2-bulk, bench-read-bulk, stream-events, close-session, galaxy-discover, galaxy-watch. (galaxy-test-connection / galaxy-last-deploy / galaxy-browse / stream-alarms do have tests now.)
- Server-030 (both states in the not-ready diagnostic) is done, confirmed at src/ZB.MOM.WW.MxGateway.Server/Sessions/GatewaySession.cs:1676. The deferred follow-up (should the gateway briefly wait for worker-Ready before failing fast?) is genuinely unbuilt.
- Tests-023 extracted a canonical TestSupport/FakeWorkerProcess(int), yet three test files still define private nested copies.
So §8's gap is real and current.
Workstreams
Four workstreams, each landable on its own; the only ordering constraint is that B follows A.
| WS | Title | Files (language) | Classification | Depends on |
|---|---|---|---|---|
| A | Synchronous Java CLI tests (7 commands) | Java CLI test | small | — |
| B | In-process gRPC harness + streaming/galaxy CLI tests (3 commands) | Java CLI test + small CLI seam | standard | A (shares test file) |
| C | Worker-Ready bounded ready-wait | C# server session hot path | high-risk | — |
| D | FakeWorkerProcess consolidation | C# tests | small | — |
A, C, D are mutually independent (disjoint files/languages) and may be dispatched in
parallel. B follows A because both edit MxGatewayCliTests.java.
WS-A — Synchronous Java CLI tests
What: Round-trip CLI tests for the 7 commands testable through the existing
FakeSession/FakeClient seam (the same seam subscribe-bulk/write already use):
read-bulk, write-bulk, write2-bulk, write-secured-bulk, write-secured2-bulk,
bench-read-bulk, close-session.
How: Upgrade FakeSession (currently returns empty lists) to per-call recorders
that capture the parsed entries (timeout, typed values via the shared parseValue(type, text) switch, user-ids, timestamp) and synthesize one BulkReadResult/BulkWriteResult
per requested handle, so JSON-shape assertions exercise the
bulkReadResultMap/bulkWriteResultMap serializers. One @Test per command:
- read-bulk: --timeout-ms reaches session; JSON carries tagAddress/itemHandle/wasCached/quality.
- write-bulk: --type int32 --values 111,222 --user-id 5 parses through parseValue; entries built with the expected typed MxValue + userId.
- write2-bulk: --timestamp …Z reaches the entry as timestampValue (hasTimestampValue() true).
- write-secured-bulk: --current-user-id/--verifier-user-id both propagate.
- write-secured2-bulk: timestamp + both user-ids.
- bench-read-bulk: 1s steady / 0s warmup; assert cross-language schema keys (language=java, command=bench-read-bulk, totalCalls, successfulCalls, failedCalls, callsPerSecond, latencyMs.p50/p95/p99).
- close-session: CloseSessionReply round-trips through FakeClient.
Verify: gradle :zb-mom-ww-mxgateway-cli:test --tests *MxGatewayCliTests.
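The recorder idea in "How" can be pictured with a minimal sketch. Everything here is a hypothetical stand-in: ReadResult, RecordingFakeSession, and the readBulk signature are illustrative only, and the real FakeSession and BulkReadResult types in the CLI test module may look different. It shows only the two properties the tests rely on: each call is captured with its parsed arguments, and one result is synthesized per requested handle.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for BulkReadResult; field names follow the JSON keys listed above.
final class ReadResult {
    final String tagAddress;
    final int itemHandle;
    final boolean wasCached;
    final int quality;

    ReadResult(String tagAddress, int itemHandle, boolean wasCached, int quality) {
        this.tagAddress = tagAddress;
        this.itemHandle = itemHandle;
        this.wasCached = wasCached;
        this.quality = quality;
    }
}

// Hypothetical recording fake: captures every call and answers with one result per handle.
final class RecordingFakeSession {
    // One captured invocation of readBulk.
    static final class ReadCall {
        final List<Integer> handles;
        final long timeoutMs;

        ReadCall(List<Integer> handles, long timeoutMs) {
            this.handles = handles;
            this.timeoutMs = timeoutMs;
        }
    }

    final List<ReadCall> readCalls = new ArrayList<>();

    List<ReadResult> readBulk(List<Integer> handles, long timeoutMs) {
        // Record the parsed arguments so a test can assert that --timeout-ms reached the session.
        readCalls.add(new ReadCall(List.copyOf(handles), timeoutMs));

        // Synthesize one result per requested handle so the JSON serializer has rows to render.
        List<ReadResult> results = new ArrayList<>();
        for (int handle : handles) {
            results.add(new ReadResult("Tag" + handle, handle, false, 192)); // 192 is a placeholder quality
        }
        return results;
    }
}
```

A test built on such a fake would run the CLI command, then assert both on the recorded call (arguments in) and on the rendered JSON (results out).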
WS-B — In-process gRPC harness + streaming/galaxy CLI tests
Why infra is required: MxEventStream and DeployEventStream have package-private
constructors; GalaxyRepositoryClient is final with a static connect() and
GalaxyCommand has no injectable factory. None of stream-events/galaxy-watch/
galaxy-discover can be faked through the FakeSession seam.
What: A JUnit fixture that starts a gRPC InProcessServer hosting scripted
MxAccessGateway + GalaxyRepository service implementations and exposes an in-process
Channel. The real MxGatewayClient/GalaxyRepositoryClient connect to it, so the
real MxEventStream/DeployEventStream queue-draining and GalaxyRepositoryClient
paging are exercised end-to-end (highest fidelity; no reflection, no package hacks).
- Production change (CLI module only, not the library): add a GalaxyClientFactory seam to GalaxyCommand mirroring the existing MxGatewayCliClientFactory, so galaxy commands can target the in-process channel.
- stream-events: server streams a scripted MxEvent sequence → assert CLI render, including the unsigned-uint64 worker-sequence regression.
- galaxy-watch: server streams scripted deploy events → assert CLI feed output.
- galaxy-discover: server returns a paged GalaxyObject hierarchy → assert CLI JSON.
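The factory seam from the first bullet might take roughly the following shape. This is a sketch under assumptions, not the planned code: the GalaxyClient interface, the connect(String) signature, and GalaxyCommandSketch are hypothetical, and the real GalaxyRepositoryClient is a final class rather than an interface. The point is only that the command obtains its client through an injected factory, so a test can hand it one that targets the in-process channel.

```java
// Hypothetical minimal client surface; the real GalaxyRepositoryClient API is not shown in this document.
interface GalaxyClient extends AutoCloseable {
    String testConnection();

    @Override
    void close(); // narrowed to throw no checked exception
}

// Hypothetical seam: production wires a factory that calls the static connect();
// tests wire one that returns a client bound to the in-process channel.
interface GalaxyClientFactory {
    GalaxyClient connect(String endpoint);
}

// Hypothetical command shape: it never calls a static connect() itself.
final class GalaxyCommandSketch {
    private final GalaxyClientFactory factory;

    GalaxyCommandSketch(GalaxyClientFactory factory) {
        this.factory = factory;
    }

    String run(String endpoint) {
        // try-with-resources closes the client even if the call throws.
        try (GalaxyClient client = factory.connect(endpoint)) {
            return client.testConnection();
        }
    }
}
```

Because the seam lives in the CLI module, the library keeps its static connect() and final client class unchanged.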
The 7 synchronous commands stay on the lightweight FakeSession seam (YAGNI — no reason
to route them through a server).
Verify: gradle :zb-mom-ww-mxgateway-cli:test --tests *MxGatewayCliTests.
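The unsigned-uint64 regression named for stream-events comes from how Java carries protobuf uint64 fields: they arrive as a signed long, so values above Long.MAX_VALUE print as negative numbers unless rendered explicitly as unsigned. A minimal illustration follows; the UnsignedSequence class and render method are hypothetical names, and only Long.toUnsignedString and Long.parseUnsignedLong are real JDK calls.

```java
final class UnsignedSequence {
    /** Renders a uint64 value carried in a signed Java long as its unsigned decimal text. */
    static String render(long workerSequence) {
        return Long.toUnsignedString(workerSequence);
    }

    public static void main(String[] args) {
        // 2^64 - 1 does not fit in a signed long, so it is stored with the bit pattern of -1L.
        long wrapped = Long.parseUnsignedLong("18446744073709551615");

        System.out.println(Long.toString(wrapped)); // prints -1 (signed view, the regression)
        System.out.println(render(wrapped));        // prints 18446744073709551615 (unsigned view)
    }
}
```

A scripted MxEvent with a worker sequence above Long.MAX_VALUE is therefore enough to pin the CLI render in a test.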
WS-C — Worker-Ready bounded ready-wait
Problem: GetReadyWorkerClient (GatewaySession.cs:1665) fails fast when the session
is Ready but the worker client's WorkerClientState has diverged (Handshaking after a
heartbeat blip, etc.). The both-states diagnostic exists; a brief wait does not.
Constraint: the check runs inside the _syncRoot lock — we cannot sleep/poll there.
Design (pinned decisions):
- New GetReadyWorkerClientAsync: read state under _syncRoot; if session is Ready but worker is transient (Handshaking/Created), release the lock, poll at a short interval (e.g. 25 ms) until the worker reaches Ready or a bounded timeout elapses, then re-check under the lock.
- Terminal worker states (Faulted/Closing/Closed/null) fail fast immediately and never wait; retrying a faulted worker is pointless and would mask the fault.
- New config MxGateway:Sessions:WorkerReadyWaitTimeout on GatewaySessionOptions, default 0 = disabled (preserves today's exact fail-fast behavior unless opted in), validated >= 0 by the options validator. Document in docs/GatewayConfiguration.md.
- The both-states diagnostic is preserved for the final failure. Callers at GatewaySession.cs:918 and :1263 become await.
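The control flow pinned above can be sketched as follows. The real change is C# and would await a delay rather than block a thread; this version is written in Java only to match the other examples in this document, and every name in it (WorkerState, BoundedReadyWait, awaitReady) is hypothetical. It assumes the session itself is already Ready and models only the worker side: state is read under the lock, sleeping happens outside it, terminal states and a zero timeout fail on the first observation.

```java
import java.time.Duration;
import java.util.function.Supplier;

// Hypothetical mirror of the worker states named in the design.
enum WorkerState { CREATED, HANDSHAKING, READY, FAULTED, CLOSING, CLOSED }

final class BoundedReadyWait {
    private final Object syncRoot = new Object();
    private final Supplier<WorkerState> workerState; // may return null when there is no worker
    private final Duration timeout;                  // zero disables waiting
    private final Duration pollInterval;

    BoundedReadyWait(Supplier<WorkerState> workerState, Duration timeout, Duration pollInterval) {
        if (timeout.isNegative()) {
            throw new IllegalArgumentException("timeout must be >= 0"); // mirrors the validator rule
        }
        this.workerState = workerState;
        this.timeout = timeout;
        this.pollInterval = pollInterval;
    }

    /** Returns once the worker is READY; throws IllegalStateException naming both states otherwise. */
    void awaitReady() {
        final long start = System.nanoTime();
        while (true) {
            WorkerState observed;
            synchronized (syncRoot) { // state is only ever read under the lock
                observed = workerState.get();
            }
            if (observed == WorkerState.READY) {
                return;
            }
            boolean isTransient = observed == WorkerState.CREATED || observed == WorkerState.HANDSHAKING;
            // With a zero timeout this is true on the first pass, so behavior stays fail-fast.
            boolean expired = System.nanoTime() - start >= timeout.toNanos();
            if (!isTransient || expired) {
                throw new IllegalStateException("session=Ready, worker=" + observed);
            }
            sleepOutsideLock();
        }
    }

    private void sleepOutsideLock() {
        try {
            Thread.sleep(pollInterval.toMillis()); // the lock is not held here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for worker", e);
        }
    }
}
```

Keeping the terminal-state check ahead of the timeout check is what guarantees a faulted worker is reported immediately instead of after the full wait.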
Tests:
- Handshaking→Ready within the timeout succeeds (worker invoked once).
- Faulted fails fast with both states in the message, zero waiting.
- Timeout elapses → fails with both states.
- Default 0 → unchanged fail-fast (no wait, no behavior change).
Verify: dotnet test src/ZB.MOM.WW.MxGateway.Tests --filter "FullyQualifiedName~SessionManager"
(plus the options-validator test class).
WS-D — FakeWorkerProcess consolidation
What: Replace the private nested FakeWorkerProcess in
SessionWorkerClientFactoryFakeWorkerTests, WorkerProcessLauncherTests, and
WorkerClientTests with the canonical TestSupport/FakeWorkerProcess(int) (which already
has MarkExited/Kill/TCS-backed WaitForExitAsync). Where a nested copy carries extra
behavior the canonical lacks, fold that into the canonical first, then delete the copies.
Verify: dotnet test src/ZB.MOM.WW.MxGateway.Tests --filter "FullyQualifiedName~WorkerClient | FullyQualifiedName~WorkerProcessLauncher | FullyQualifiedName~SessionWorkerClientFactory".
Testing & sequencing
Per the targeted-test rule in CLAUDE.md (Source Update Workflow): each task runs only
its own filtered tests. Run the full gateway suite at most once, after WS-C + WS-D land.
Out-of-scope items remain recorded in stillpending.md (vendor/rig-gated) and the
session-resilience epic (oldtasks.md).