Still-Pending §8 Completion Implementation Plan
For Claude: REQUIRED SUB-SKILL: Use superpowers-extended-cc:executing-plans (or subagent-driven-development) to implement this plan task-by-task.
Goal: Close the three actionable stillpending.md §8 test-coverage follow-ups — Java CLI coverage for the 10 untested subcommands, the gateway Worker-Ready bounded ready-wait, and the FakeWorkerProcess de-duplication.
Architecture: Four independent workstreams. Java CLI tests split into a synchronous tier (existing FakeSession seam) and a streaming/galaxy tier (new in-process gRPC harness over the real client, using the public Channel constructors that already exist). C# work adds an opt-in bounded ready-wait in the session hot path (default off = no behavior change) and consolidates three duplicate test fakes onto the canonical TestSupport/FakeWorkerProcess.
Tech Stack: Java 21 + Gradle + picocli + grpc-java (grpc-inprocess, grpc-testing); .NET 10 + xUnit.
Design doc: docs/plans/2026-06-16-stillpending-section8-design.md. Branch: feat/stillpending-section8.
Key facts verified during planning:
- clients/java/.../cli/MxGatewayCli.java: GatewayCommand has a clientFactory (MxGatewayCliClientFactory) seam that tests already override; GalaxyCommand.connect() (line ~368) calls the static GalaxyRepositoryClient.connect(...) with no injectable seam.
- The FakeSession/FakeClient/FakeClientFactory test doubles live in MxGatewayCliTests.java (~lines 636–984).
- MxGatewayClient(Channel, MxGatewayClientOptions) and GalaxyRepositoryClient(Channel, MxGatewayClientOptions) are public constructors (line 67 of each); point them at an in-process channel, no library change needed.
- grpc-inprocess + grpc-testing are test deps in the client module only; the cli module's build.gradle needs them added.
- C#: option class is SessionOptions (src/ZB.MOM.WW.MxGateway.Server/Configuration/SessionOptions.cs), config section MxGateway:Sessions, { get; init; } style.
- GetReadyWorkerClient() is at GatewaySession.cs:1665; callers are InvokeAsync (:918, already async) and ReadEventsAsync (:1263, returns IAsyncEnumerable non-async, so it must become an async iterator).
Out of scope: Session-resilience epic (Tasks 13–28, see docs/plans/2026-06-15-session-resilience.md); vendor/rig-gated §1.3/§1.4/§3.x/§5/§6.1 items.
Testing rule (CLAUDE.md): Each task runs ONLY its own filtered tests. Full gateway suite at most once, after Tasks 8 + 9 land.
Workstream / dependency overview
| Task | WS | Title | Class | Files | blockedBy | ∥ with |
|---|---|---|---|---|---|---|
| 1 | A | FakeSession recorders + read/write/write2-bulk tests | small | cli test | — | 3,4,7,9 |
| 2 | A | secured/secured2/bench-bulk + close-session tests | small | cli test | 1 | 3,4,7,9 |
| 3 | B | GalaxyClientFactory seam + cli grpc test deps | small | cli main + build.gradle | — | 1,2,4,7,8,9 |
| 4 | B | In-process gRPC harness fixture | standard | new cli test file | — | 1,2,3,7,8,9 |
| 5 | B | stream-events test via harness | standard | cli test | 2,4 | 7,8,9 |
| 6 | B | galaxy-watch + galaxy-discover tests via harness | standard | cli test | 3,5 | 7,8,9 |
| 7 | C | WorkerReadyWaitTimeoutMs option + validator + doc | small | C# config + doc | — | all Java |
| 8 | C | Bounded ready-wait in GatewaySession + tests | high-risk | C# server + test | 7 | all Java, 9 |
| 9 | D | FakeWorkerProcess consolidation | standard | C# tests | — | all |
Task 1: FakeSession recorders + read-bulk / write-bulk / write2-bulk CLI tests
Classification: small
Estimated implement time: ~5 min
Parallelizable with: Task 3, 4, 7, 9
Files:
- Test: clients/java/zb-mom-ww-mxgateway-cli/src/test/java/com/zb/mom/ww/mxgateway/cli/MxGatewayCliTests.java
Context: FakeSession (a MxGatewayCli.MxGatewayCliSession impl, ~line 812) currently returns empty lists from readBulk/writeBulk/write2Bulk. Empty returns make CLI JSON-shape assertions vacuous. Mirror the existing subscribeBulkCommandPrintsResults test style (uses FakeClientFactory → execute(...) → asserts on captured stdout JSON).
Step 1: Upgrade FakeSession to record + synthesize. Add fields capturing the last call args (e.g. lastReadBulkTimeoutMs, lastReadBulkItems, lastWriteBulkEntries, lastWrite2BulkEntries) and change readBulk/writeBulk/write2Bulk to synthesize one result per requested handle: a BulkReadResult carrying tagAddress, itemHandle, wasCached, quality; a BulkWriteResult carrying the handle + an Ok status. Keep empty-list default only when no handles requested.
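The record-and-synthesize shape described in Step 1 can be sketched as follows. This is a minimal illustration under stated assumptions, not the real test double: BulkReadResultSketch and RecordingFakeSession are hypothetical stand-ins for the library's BulkReadResult and the FakeSession in MxGatewayCliTests.java, the readBulk signature is assumed, and the handle and quality values are arbitrary.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the client library's BulkReadResult type.
record BulkReadResultSketch(String tagAddress, int itemHandle, boolean wasCached, int quality) {}

// Hypothetical stand-in for FakeSession: records the last call and synthesizes rows.
final class RecordingFakeSession {
    // Recorder fields a test can assert on after running the CLI command.
    Integer lastReadBulkTimeoutMs;              // null until readBulk is called
    List<String> lastReadBulkItems = List.of();

    List<BulkReadResultSketch> readBulk(List<String> tagAddresses, int timeoutMs) {
        lastReadBulkTimeoutMs = timeoutMs;
        lastReadBulkItems = List.copyOf(tagAddresses);

        List<BulkReadResultSketch> results = new ArrayList<>();
        for (int i = 0; i < tagAddresses.size(); i++) {
            // One synthesized row per requested tag; handle = position + 1, quality is arbitrary.
            results.add(new BulkReadResultSketch(tagAddresses.get(i), i + 1, false, 192));
        }
        return results; // stays empty only when nothing was requested
    }
}
```

With non-empty rows coming back, a JSON-shape assertion on stdout has real fields to check, which is the point of the upgrade.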
Step 2: Write the three failing tests:
- readBulkCommandForwardsTimeoutAndPrintsResults: run read-bulk with --timeout-ms 750 + two tags; assert lastReadBulkTimeoutMs == 750 and the stdout JSON carries per-tag tagAddress/itemHandle/wasCached/quality.
- writeBulkCommandParsesTypedValuesAndPrintsResults: --type int32 --values 111,222 --user-id 5; assert entries parsed through the shared parseValue switch into typed MxValues with userId==5, and JSON shows the bulkWriteResultMap.
- write2BulkCommandForwardsTimestampAndPrintsResults: --timestamp 2026-05-20T00:00:00Z; assert the entry's hasTimestampValue() is true.
Step 3: Run them and confirm they fail (empty/zero before the recorder upgrade):
gradle :zb-mom-ww-mxgateway-cli:test --tests '*MxGatewayCliTests' (from clients/java).
Step 4: With the Step-1 recorder in place, run again — expect PASS.
Step 5: Commit.
git add clients/java/zb-mom-ww-mxgateway-cli/src/test/java/com/zb/mom/ww/mxgateway/cli/MxGatewayCliTests.java
git commit -m "test(java-cli): cover read-bulk/write-bulk/write2-bulk round trips"
Acceptance: 3 new green tests; FakeSession records args and returns one row per handle.
Task 2: secured / secured2 / bench-read-bulk + close-session CLI tests
Classification: small
Estimated implement time: ~5 min
Parallelizable with: Task 3, 4, 7, 9
blockedBy: Task 1 (same test file + shared FakeSession recorders)
Files:
- Test: clients/java/zb-mom-ww-mxgateway-cli/src/test/java/com/zb/mom/ww/mxgateway/cli/MxGatewayCliTests.java
Step 1: Extend FakeSession/FakeClient recorders for writeSecuredBulk/writeSecured2Bulk (capture currentUserId/verifierUserId) and add a CloseSessionReply recorder to FakeClient for closeSession.
Step 2: Write the four failing tests:
- writeSecuredBulkCommandForwardsUserIdsAndPrintsResults: --current-user-id 7 --verifier-user-id 8; assert both propagate.
- writeSecured2BulkCommandForwardsTimestampAndUserIdsAndPrintsResults: timestamp + both user-ids.
- benchReadBulkCommandEmitsJsonSchemaKeys: --duration-seconds 1 --warmup-seconds 0; assert the JSON contains language=java, command=bench-read-bulk, bulkSize, totalCalls, successfulCalls, failedCalls, callsPerSecond, latencyMs.p50/p95/p99, and the synthesized tags. Assert schema keys, NOT numeric values.
- closeSessionCommandPrintsReply: assert the CloseSessionReply round-trips to stdout.
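One way to pin schema keys without pinning timing-dependent numbers is a key-presence check over the captured stdout. The helper below is a hypothetical sketch (JsonKeyCheck and missingKeys are not names from the repository), and it uses a naive textual match rather than a JSON parser, so it finds a key at any nesting depth.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical test helper: reports expected keys that never appear as a quoted JSON key.
final class JsonKeyCheck {
    static List<String> missingKeys(String json, String... expectedKeys) {
        List<String> missing = new ArrayList<>();
        for (String key : expectedKeys) {
            // Matches "key" followed by optional whitespace and a colon, anywhere in the text.
            Pattern keyPattern = Pattern.compile("\"" + Pattern.quote(key) + "\"\\s*:");
            if (!keyPattern.matcher(json).find()) {
                missing.add(key);
            }
        }
        return missing;
    }
}
```

A bench test could then assert that missingKeys(stdout, "bulkSize", "totalCalls", "callsPerSecond", "p50", "p95", "p99") is empty, which stays stable no matter how many calls a one-second run completes.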
Step 3–4: Run failing → implement recorders → run PASS (same gradle command as Task 1, narrowed --tests '*MxGatewayCliTests').
Step 5: Commit test(java-cli): cover secured/secured2/bench bulk + close-session.
Acceptance: 4 new green tests; bench test pins schema keys only.
Task 3: GalaxyClientFactory seam + cli grpc test deps
Classification: small
Estimated implement time: ~4 min
Parallelizable with: Task 1, 2, 4, 7, 8, 9
Files:
- Modify: clients/java/zb-mom-ww-mxgateway-cli/src/main/java/com/zb/mom/ww/mxgateway/cli/MxGatewayCli.java (GalaxyCommand, ~lines 361–371; connect() ~line 368)
- Modify: clients/java/zb-mom-ww-mxgateway-cli/build.gradle
Context: GalaxyCommand.connect() hard-calls GalaxyRepositoryClient.connect(...). Mirror the existing MxGatewayCliClientFactory seam so tests can supply an in-process-backed client.
Step 1: Add the seam. Introduce an interface GalaxyClientFactory { GalaxyRepositoryClient connect(MxGatewayClientOptions options); }, give GalaxyCommand a final GalaxyClientFactory galaxyClientFactory field (constructor-injected, defaulting to GalaxyRepositoryClient::connect for production wiring), and change connect() to delegate to it. Thread the factory through the picocli command construction the same way clientFactory is threaded for gateway commands. Keep the default production path identical (no behavior change).
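The seam described in Step 1 follows a common constructor-injection pattern. The sketch below shows its shape with hypothetical stand-in types (OptionsSketch, GalaxyClientSketch, and the *Sketch names are illustrative only); the real change would use GalaxyRepositoryClient and MxGatewayClientOptions and be threaded through the picocli command construction.

```java
// Hypothetical stand-in for MxGatewayClientOptions.
final class OptionsSketch {
    final String endpoint;
    OptionsSketch(String endpoint) { this.endpoint = endpoint; }
}

// Hypothetical stand-in for GalaxyRepositoryClient.
final class GalaxyClientSketch {
    final String description;
    GalaxyClientSketch(String description) { this.description = description; }

    // Stands in for the static connect(...) used on the production path.
    static GalaxyClientSketch connect(OptionsSketch options) {
        return new GalaxyClientSketch("network:" + options.endpoint);
    }
}

// The seam: a single-method factory that tests can replace.
@FunctionalInterface
interface GalaxyClientFactorySketch {
    GalaxyClientSketch connect(OptionsSketch options);
}

final class GalaxyCommandSketch {
    private final GalaxyClientFactorySketch galaxyClientFactory;

    // Production wiring: defaults to the static connect method, so behavior is unchanged.
    GalaxyCommandSketch() { this(GalaxyClientSketch::connect); }

    // Test wiring: any factory can be injected, for example one backed by an in-process channel.
    GalaxyCommandSketch(GalaxyClientFactorySketch galaxyClientFactory) {
        this.galaxyClientFactory = galaxyClientFactory;
    }

    GalaxyClientSketch connect(OptionsSketch options) {
        return galaxyClientFactory.connect(options);
    }
}
```

Because the default constructor delegates to the method reference, the production path is the same call as before; only tests take the second constructor.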
Step 2: Add test deps to clients/java/zb-mom-ww-mxgateway-cli/build.gradle:
testImplementation "io.grpc:grpc-inprocess:${grpcVersion}"
testImplementation "io.grpc:grpc-testing:${grpcVersion}"
Step 3: Build to confirm wiring compiles + production galaxy commands still resolve:
gradle :zb-mom-ww-mxgateway-cli:compileJava :zb-mom-ww-mxgateway-cli:compileTestJava (from clients/java).
Step 4: Run the existing galaxy CLI tests to confirm no regression:
gradle :zb-mom-ww-mxgateway-cli:test --tests '*MxGatewayCliTests'.
Step 5: Commit feat(java-cli): inject GalaxyClientFactory seam; add grpc inprocess test deps.
Acceptance: Seam present, default wiring unchanged, existing galaxy tests green, in-process deps available to the test source set.
Task 4: In-process gRPC harness fixture
Classification: standard
Estimated implement time: ~5 min
Parallelizable with: Task 1, 2, 3, 7, 8, 9
Files:
- Create: clients/java/zb-mom-ww-mxgateway-cli/src/test/java/com/zb/mom/ww/mxgateway/cli/InProcessGatewayHarness.java
Context: Streaming/galaxy commands can't use FakeSession (real MxEventStream/DeployEventStream package-private ctors; GalaxyRepositoryClient final). Drive the real client over an in-process channel against scripted fake services. The public MxGatewayClient(Channel, options) / GalaxyRepositoryClient(Channel, options) ctors make this clean.
Step 1: Build the fixture — AutoCloseable, unique server name per instance:
- Start InProcessServerBuilder.forName(name).directExecutor().addService(fakeGateway).addService(fakeGalaxy).build().start().
- Expose ManagedChannel channel() via InProcessChannelBuilder.forName(name).directExecutor().build().
- fakeGateway extends MxAccessGatewayGrpc.MxAccessGatewayImplBase, overriding streamEvents (push a scripted List<MxEvent> to the StreamObserver, then onCompleted) and closeSession.
- fakeGalaxy extends GalaxyRepositoryGrpc.GalaxyRepositoryImplBase, overriding discoverHierarchy (return a small paged GalaxyObject set) and watchDeployEvents (stream scripted deploy events). Make the scripted payloads settable on the harness (constructor args or setters).
- Provide helpers: MxGatewayClient gatewayClient() → new MxGatewayClient(channel(), testOptions()); GalaxyRepositoryClient galaxyClient() → new GalaxyRepositoryClient(channel(), testOptions()), where testOptions() builds an MxGatewayClientOptions with a dummy api-key.
- close() shuts down channel + server.
Step 2: Smoke-verify the harness in isolation — add a temporary @Test (or a tiny self-test) that opens the harness, calls gatewayClient() and streams one scripted event, then delete it before commit. Build:
gradle :zb-mom-ww-mxgateway-cli:compileTestJava.
Step 3: Run the cli test set to confirm nothing breaks:
gradle :zb-mom-ww-mxgateway-cli:test --tests '*MxGatewayCliTests'.
Step 4: Commit test(java-cli): add in-process gRPC harness fixture.
Acceptance: Compiles; harness starts/stops cleanly; scripted services reachable through the real client types. (No assertions on CLI yet — that's Tasks 5–6.)
Task 5: stream-events CLI test via harness
Classification: standard
Estimated implement time: ~4 min
Parallelizable with: Task 7, 8, 9
blockedBy: Task 2 (same test file ordering), Task 4 (harness)
Files:
- Test: clients/java/zb-mom-ww-mxgateway-cli/src/test/java/com/zb/mom/ww/mxgateway/cli/MxGatewayCliTests.java
Step 1: Wire a harness-backed MxGatewayCliClientFactory in the test that builds the CLI client over harness.gatewayClient() (reuse the production adapter that wraps MxGatewayClient as MxGatewayCliClient; it is package-visible from the test's package). Script ≥2 MxEvents including one with a high uint64 worker sequence to cover the unsigned-format regression.
Step 2: Write the failing test streamEventsRendersScriptedEventsIncludingHighUint64Sequence — run stream-events, assert stdout contains the scripted event fields and the high sequence renders unsigned (not negative).
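The unsigned-format concern comes from protobuf uint64 fields arriving in Java as a signed long, so values above Long.MAX_VALUE print as negative unless rendered explicitly. A minimal sketch of the rendering the test should pin, with UnsignedSequenceSketch as a hypothetical helper name (the CLI's actual formatting code is not shown in this plan):

```java
// Hypothetical helper illustrating unsigned rendering of a uint64 carried in a Java long.
final class UnsignedSequenceSketch {
    static String render(long workerSequence) {
        // Long.toUnsignedString interprets the 64 bits as an unsigned value.
        return Long.toUnsignedString(workerSequence);
    }
}
```

For a scripted sequence built with Long.parseUnsignedLong("18446744073709551600"), String.valueOf yields "-16" while render yields "18446744073709551600", so the test can assert that the unsigned text is present and that no minus sign precedes it.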
Step 3: Run failing → (harness already supplies behavior) → PASS:
gradle :zb-mom-ww-mxgateway-cli:test --tests '*MxGatewayCliTests'.
Step 4: Commit test(java-cli): cover stream-events over in-process harness.
Acceptance: stream-events exercised through the real MxEventStream; unsigned-sequence rendering asserted.
Task 6: galaxy-watch + galaxy-discover CLI tests via harness
Classification: standard
Estimated implement time: ~5 min
Parallelizable with: Task 7, 8, 9
blockedBy: Task 3 (GalaxyClientFactory seam), Task 5 (same test file ordering)
Files:
- Test: clients/java/zb-mom-ww-mxgateway-cli/src/test/java/com/zb/mom/ww/mxgateway/cli/MxGatewayCliTests.java
Step 1: Wire a harness-backed GalaxyClientFactory (from Task 3) that returns harness.galaxyClient().
Step 2: Write the failing tests:
- galaxyDiscoverPrintsPagedHierarchyJson: assert the scripted GalaxyObject hierarchy renders in CLI JSON (object fields + counts).
- galaxyWatchRendersScriptedDeployEvents: assert the scripted deploy events render in the CLI feed; honor --limit if the command supports it.
Step 3: Run failing → PASS: gradle :zb-mom-ww-mxgateway-cli:test --tests '*MxGatewayCliTests'.
Step 4: Full cli module verification (this closes the §8 Java item): gradle :zb-mom-ww-mxgateway-cli:test.
Step 5: Commit test(java-cli): cover galaxy-discover/galaxy-watch over in-process harness.
Acceptance: All 10 previously-untested subcommands now have CLI coverage; MxGatewayCliTests green.
Task 7: WorkerReadyWaitTimeoutMs option + validator + doc
Classification: small
Estimated implement time: ~4 min
Parallelizable with: all Java tasks (1–6), Task 9
Files:
- Modify: src/ZB.MOM.WW.MxGateway.Server/Configuration/SessionOptions.cs
- Modify: src/ZB.MOM.WW.MxGateway.Server/Configuration/GatewayOptionsValidator.cs
- Modify: docs/GatewayConfiguration.md
- Test: src/ZB.MOM.WW.MxGateway.Tests/Gateway/Configuration/GatewayOptionsTests.cs (or the existing options-validator test class)
Step 1: Add the option to SessionOptions (match the { get; init; } + XML-doc style):
/// <summary>
/// Gets the bounded time, in milliseconds, the gateway will wait for a worker client
/// to reach <c>Ready</c> when the session itself is already <c>Ready</c> but the worker
/// state has transiently diverged (e.g. <c>Handshaking</c> after a heartbeat blip).
/// The wait applies only to transient worker states; terminal states
/// (<c>Faulted</c>/<c>Closing</c>/<c>Closed</c>/no worker) fail fast immediately.
/// A value of <c>0</c> (the default) disables the wait — the gateway keeps the original
/// fail-fast behavior. Must be greater than or equal to zero.
/// </summary>
public int WorkerReadyWaitTimeoutMs { get; init; }
Step 2: Validate >= 0 in GatewayOptionsValidator (mirror an existing numeric check; message e.g. MxGateway:Sessions:WorkerReadyWaitTimeoutMs must be greater than or equal to zero.).
Step 3: Document the new key in docs/GatewayConfiguration.md under the MxGateway:Sessions section (default 0 = disabled; transient-only; terminal fails fast).
Step 4: Write + run the failing test asserting default is 0 and a negative value fails validation:
dotnet test src/ZB.MOM.WW.MxGateway.Tests --filter "FullyQualifiedName~GatewayOptions" → expect FAIL pre-impl, PASS post.
Step 5: Commit feat(server): add MxGateway:Sessions:WorkerReadyWaitTimeoutMs (default off).
Acceptance: Option binds, default 0, negative rejected, doc updated.
Task 8: Bounded ready-wait in GatewaySession + tests
Classification: high-risk
Estimated implement time: ~5 min (split if it grows)
Parallelizable with: all Java tasks, Task 9
blockedBy: Task 7
Files:
- Modify: src/ZB.MOM.WW.MxGateway.Server/Sessions/GatewaySession.cs (GetReadyWorkerClient :1665; InvokeAsync :918; ReadEventsAsync :1263)
- Test: src/ZB.MOM.WW.MxGateway.Tests/Gateway/Sessions/SessionManagerTests.cs (or the test class covering GetReadyWorkerClient diagnostics)
Context & constraints: The not-ready check runs inside the _syncRoot lock — never sleep/poll inside the lock. Read state under the lock, release, await, re-check. The both-states diagnostic (Session state is {_state}; worker state is {workerState}.) MUST be preserved for the final failure. Default timeout 0 ⇒ behavior identical to today.
Step 1: Write failing tests first (TDD) in the session-manager test class, using a FakeWorkerClient whose State is settable:
- InvokeAsync_WhenWorkerHandshakingThenReadyWithinTimeout_Succeeds: option WorkerReadyWaitTimeoutMs=500; worker starts Handshaking, flips to Ready after ~50 ms (e.g. via a background Task or a TimeProvider-driven advance); assert the invoke succeeds and the worker is invoked once.
- InvokeAsync_WhenWorkerFaulted_FailsFastWithBothStates: worker Faulted, timeout 500; assert it throws immediately (no meaningful delay) and the message contains both Session state is Ready and worker state is Faulted.
- InvokeAsync_WhenTimeoutElapsesStillNotReady_FailsWithBothStates: worker stays Handshaking, timeout small (e.g. 100 ms); assert throw after ~timeout with both states.
- InvokeAsync_WhenTimeoutZero_FailsFastUnchanged: worker Handshaking, timeout 0; assert immediate fail-fast (pins the no-behavior-change default).
Step 2: Run tests → expect FAIL/compile-error (GetReadyWorkerClientAsync not present):
dotnet test src/ZB.MOM.WW.MxGateway.Tests --filter "FullyQualifiedName~SessionManager".
Step 3: Implement GetReadyWorkerClientAsync(CancellationToken):
- Under _syncRoot: capture _state and _workerClient?.State. If session is Ready and worker is Ready → return it (fast path, no await). If worker is terminal (Faulted/Closing/Closed) or null, or session not Ready → throw the both-states SessionManagerException now (fail fast). If worker is transient (Handshaking/Created) AND WorkerReadyWaitTimeoutMs > 0 → fall through to the wait. If worker is transient and the timeout is 0 → fail fast exactly as today.
- Wait loop OUTSIDE the lock: until a deadline (now + WorkerReadyWaitTimeoutMs), await Task.Delay(pollIntervalMs, ct) (const pollIntervalMs = 25), then re-acquire _syncRoot and re-evaluate: Ready → return; terminal/null/session-not-Ready → fail fast with both states; still transient → keep waiting. On deadline → throw both-states.
- Keep the existing synchronous GetReadyWorkerClient() for any non-async caller, or have it delegate to a zero-wait evaluation to avoid duplicated message logic (extract a private EvaluateReadyUnderLock(out string failureMessage) helper used by both).
Step 4: Update callers:
- InvokeAsync (:918): IWorkerClient workerClient = await GetReadyWorkerClientAsync(cancellationToken).ConfigureAwait(false);.
- ReadEventsAsync (:1263): convert to an async iterator: public async IAsyncEnumerable<WorkerEvent> ReadEventsAsync([EnumeratorCancellation] CancellationToken cancellationToken), await GetReadyWorkerClientAsync(...), TouchClientActivity(...), then await foreach (var e in workerClient.ReadEventsAsync(cancellationToken)) yield return e;. Verify no caller relied on eager (pre-enumeration) throw semantics; if one does, note it for the reviewer.
Step 5: Run the targeted tests → PASS (same filter). Confirm the 4 new tests + pre-existing GetReadyWorkerClient diagnostic test all pass.
Step 6: Commit feat(server): bounded worker-ready wait in GatewaySession (default off).
Acceptance: Transient states wait up to the timeout; terminal states fail fast with both states; default 0 is byte-for-byte the old behavior; no sleeping under _syncRoot.
Task 9: FakeWorkerProcess consolidation
Classification: standard
Estimated implement time: ~5 min
Parallelizable with: all tasks
Files:
- Modify: src/ZB.MOM.WW.MxGateway.Tests/TestSupport/FakeWorkerProcess.cs (canonical; extend if needed)
- Modify: src/ZB.MOM.WW.MxGateway.Tests/Gateway/Sessions/SessionWorkerClientFactoryFakeWorkerTests.cs (nested copy ~line 343)
- Modify: src/ZB.MOM.WW.MxGateway.Tests/Gateway/Workers/WorkerProcessLauncherTests.cs (nested copy ~line 244)
- Modify: src/ZB.MOM.WW.MxGateway.Tests/Gateway/Workers/WorkerClientTests.cs (nested copy ~line 767; already using ...TestSupport)
Context: Canonical TestSupport/FakeWorkerProcess(int) has MarkExited/Kill/TCS-backed WaitForExitAsync. Three test files still declare private nested FakeWorkerProcess. Consolidate.
Step 1: Diff each nested copy against the canonical. For each, list any members/behavior the canonical lacks (e.g. extra counters, scripted exit codes). Fold those into the canonical TestSupport/FakeWorkerProcess first (additively, so existing canonical users keep compiling).
Step 2: Delete each nested class and update references to the canonical type; add/confirm using ZB.MOM.WW.MxGateway.Tests.TestSupport;.
Step 3: Run the three affected test classes:
dotnet test src/ZB.MOM.WW.MxGateway.Tests --filter "FullyQualifiedName~WorkerClientTests|FullyQualifiedName~WorkerProcessLauncherTests|FullyQualifiedName~SessionWorkerClientFactoryFakeWorkerTests"
Expected: all pass (behavior preserved; KillCount/HasExited/ExitCode semantics intact).
Step 4: Commit refactor(tests): consolidate FakeWorkerProcess onto TestSupport canonical.
Acceptance: Exactly one FakeWorkerProcess definition (the canonical); three files import it; affected tests green.
Final verification (after Tasks 8 + 9)
Run the full gateway suite once to confirm no cross-cutting regression:
dotnet test src/ZB.MOM.WW.MxGateway.Tests/ZB.MOM.WW.MxGateway.Tests.csproj
Expected baseline: prior green count + the new Task 7/8/9 tests; the 3 known macOS-environmental failures (TLS temp-file, OrphanWorkerTerminator ×2) may persist — confirm no new failures.
Then finish via superpowers-extended-cc:finishing-a-development-branch.