mxaccessgw/docs/plans/2026-06-16-stillpending-section8.md

Still-Pending §8 Completion Implementation Plan

For Claude: REQUIRED SUB-SKILL: Use superpowers-extended-cc:executing-plans (or subagent-driven-development) to implement this plan task-by-task.

Goal: Close the three actionable stillpending.md §8 test-coverage follow-ups — Java CLI coverage for the 10 untested subcommands, the gateway Worker-Ready bounded ready-wait, and the FakeWorkerProcess de-duplication.

Architecture: Four independent workstreams. Java CLI tests split into a synchronous tier (existing FakeSession seam) and a streaming/galaxy tier (new in-process gRPC harness over the real client, using the public Channel constructors that already exist). C# work adds an opt-in bounded ready-wait in the session hot path (default off = no behavior change) and consolidates three duplicate test fakes onto the canonical TestSupport/FakeWorkerProcess.

Tech Stack: Java 21 + Gradle + picocli + grpc-java (grpc-inprocess, grpc-testing); .NET 10 + xUnit.

Design doc: docs/plans/2026-06-16-stillpending-section8-design.md. Branch: feat/stillpending-section8.

Key facts verified during planning:

  • clients/java/.../cli/MxGatewayCli.java: GatewayCommand has a clientFactory (MxGatewayCliClientFactory) seam tests already override; GalaxyCommand.connect() (line ~368) calls the static GalaxyRepositoryClient.connect(...) with no injectable seam. The FakeSession/FakeClient/FakeClientFactory test doubles live in MxGatewayCliTests.java (~lines 636-984).
  • MxGatewayClient(Channel, MxGatewayClientOptions) and GalaxyRepositoryClient(Channel, MxGatewayClientOptions) are public constructors (line 67 of each) — point them at an in-process channel, no library change needed.
  • grpc-inprocess + grpc-testing are test deps in the client module only; the cli module's build.gradle needs them added.
  • C#: option class is SessionOptions (src/ZB.MOM.WW.MxGateway.Server/Configuration/SessionOptions.cs), config section MxGateway:Sessions, { get; init; } style. GetReadyWorkerClient() is at GatewaySession.cs:1665; callers are InvokeAsync (:918, already async) and ReadEventsAsync (:1263, returns IAsyncEnumerable non-async — must become an async iterator).

Out of scope: Session-resilience epic (Tasks 13-28, see docs/plans/2026-06-15-session-resilience.md); vendor/rig-gated §1.3/§1.4/§3.x/§5/§6.1 items.

Testing rule (CLAUDE.md): Each task runs ONLY its own filtered tests. Full gateway suite at most once, after Tasks 8 + 9 land.


Workstream / dependency overview

| Task | WS | Title | Class | Files | blockedBy | ∥ with |
|------|----|-------|-------|-------|-----------|--------|
| 1 | A | FakeSession recorders + read/write/write2-bulk tests | small | cli test | - | 3,4,7,9 |
| 2 | A | secured/secured2/bench-bulk + close-session tests | small | cli test | 1 | 3,4,7,9 |
| 3 | B | GalaxyClientFactory seam + cli grpc test deps | small | cli main + build.gradle | - | 1,2,4,7,8,9 |
| 4 | B | In-process gRPC harness fixture | standard | new cli test file | - | 1,2,3,7,8,9 |
| 5 | B | stream-events test via harness | standard | cli test | 2,4 | 7,8,9 |
| 6 | B | galaxy-watch + galaxy-discover tests via harness | standard | cli test | 3,5 | 7,8,9 |
| 7 | C | WorkerReadyWaitTimeoutMs option + validator + doc | small | C# config + doc | - | all Java |
| 8 | C | Bounded ready-wait in GatewaySession + tests | high-risk | C# server + test | 7 | all Java, 9 |
| 9 | D | FakeWorkerProcess consolidation | standard | C# tests | - | all |

Task 1: FakeSession recorders + read-bulk / write-bulk / write2-bulk CLI tests

Classification: small Estimated implement time: ~5 min Parallelizable with: Task 3, 4, 7, 9

Files:

  • Test: clients/java/zb-mom-ww-mxgateway-cli/src/test/java/com/zb/mom/ww/mxgateway/cli/MxGatewayCliTests.java

Context: FakeSession (a MxGatewayCli.MxGatewayCliSession impl, ~line 812) currently returns empty lists from readBulk/writeBulk/write2Bulk. Empty returns make CLI JSON-shape assertions vacuous. Mirror the existing subscribeBulkCommandPrintsResults test style (uses FakeClientFactory → execute(...) → asserts on captured stdout JSON).

Step 1: Upgrade FakeSession to record + synthesize. Add fields capturing the last call args (e.g. lastReadBulkTimeoutMs, lastReadBulkItems, lastWriteBulkEntries, lastWrite2BulkEntries) and change readBulk/writeBulk/write2Bulk to synthesize one result per requested handle: a BulkReadResult carrying tagAddress, itemHandle, wasCached, quality; a BulkWriteResult carrying the handle + an Ok status. Keep empty-list default only when no handles requested.
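A minimal sketch of the record-and-synthesize shape Step 1 describes, reduced to readBulk. BulkReadSession, RecordingFakeSession and the record fields are hypothetical stand-ins for the real MxGatewayCliSession types, and the quality value 192 is an arbitrary placeholder; only the pattern is meant to carry over.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for the real result and session types; only the shape matters.
record BulkReadResult(String tagAddress, int itemHandle, boolean wasCached, int quality) {}

interface BulkReadSession {
    List<BulkReadResult> readBulk(List<String> tagAddresses, int timeoutMs);
}

/** Test double that records the last call and synthesizes one row per requested tag. */
final class RecordingFakeSession implements BulkReadSession {
    int lastReadBulkTimeoutMs = -1;              // -1 means "never called"
    List<String> lastReadBulkItems = List.of();

    @Override
    public List<BulkReadResult> readBulk(List<String> tagAddresses, int timeoutMs) {
        lastReadBulkTimeoutMs = timeoutMs;
        lastReadBulkItems = List.copyOf(tagAddresses);

        List<BulkReadResult> results = new ArrayList<>();
        for (int i = 0; i < tagAddresses.size(); i++) {
            // Handles start at 1 so a zero handle in the output signals a defaulted field.
            results.add(new BulkReadResult(tagAddresses.get(i), i + 1, false, 192));
        }
        return results;                          // empty only when nothing was requested
    }
}

class RecordingFakeSessionDemo {
    public static void main(String[] args) {
        RecordingFakeSession session = new RecordingFakeSession();
        List<BulkReadResult> rows = session.readBulk(List.of("Tank1.Level", "Tank1.Temp"), 750);
        System.out.println(session.lastReadBulkTimeoutMs + " ms, " + rows.size() + " rows");
    }
}
```

Non-zero synthesized handles make the JSON assertions meaningful: a test that only sees defaults (0, empty string, false) cannot tell a rendered field from a missing one.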

Step 2: Write the three failing tests:

  • readBulkCommandForwardsTimeoutAndPrintsResults: run read-bulk with --timeout-ms 750 + two tags; assert lastReadBulkTimeoutMs == 750 and the stdout JSON carries per-tag tagAddress/itemHandle/wasCached/quality.
  • writeBulkCommandParsesTypedValuesAndPrintsResults: --type int32 --values 111,222 --user-id 5; assert entries parsed through the shared parseValue switch into typed MxValues with userId==5, and JSON shows the bulkWriteResultMap.
  • write2BulkCommandForwardsTimestampAndPrintsResults: --timestamp 2026-05-20T00:00:00Z; assert the entry's hasTimestampValue() is true.

Step 3: Run them and confirm they fail (empty/zero before the recorder upgrade): gradle :zb-mom-ww-mxgateway-cli:test --tests '*MxGatewayCliTests' (from clients/java).

Step 4: With the Step-1 recorder in place, run again — expect PASS.

Step 5: Commit.

git add clients/java/zb-mom-ww-mxgateway-cli/src/test/java/com/zb/mom/ww/mxgateway/cli/MxGatewayCliTests.java
git commit -m "test(java-cli): cover read-bulk/write-bulk/write2-bulk round trips"

Acceptance: 3 new green tests; FakeSession records args and returns one row per handle.


Task 2: secured / secured2 / bench-read-bulk + close-session CLI tests

Classification: small Estimated implement time: ~5 min Parallelizable with: Task 3, 4, 7, 9 blockedBy: Task 1 (same test file + shared FakeSession recorders)

Files:

  • Test: clients/java/zb-mom-ww-mxgateway-cli/src/test/java/com/zb/mom/ww/mxgateway/cli/MxGatewayCliTests.java

Step 1: Extend FakeSession/FakeClient recorders for writeSecuredBulk/writeSecured2Bulk (capture currentUserId/verifierUserId) and add a CloseSessionReply recorder to FakeClient for closeSession.

Step 2: Write the four failing tests:

  • writeSecuredBulkCommandForwardsUserIdsAndPrintsResults: --current-user-id 7 --verifier-user-id 8; assert both propagate.
  • writeSecured2BulkCommandForwardsTimestampAndUserIdsAndPrintsResults: timestamp + both user-ids.
  • benchReadBulkCommandEmitsJsonSchemaKeys: --duration-seconds 1 --warmup-seconds 0; assert the JSON contains language=java, command=bench-read-bulk, bulkSize, totalCalls, successfulCalls, failedCalls, callsPerSecond, latencyMs.p50/p95/p99, and the synthesized tags. Assert schema keys, NOT numeric values.
  • closeSessionCommandPrintsReply: assert the CloseSessionReply round-trips to stdout.
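One way to pin schema keys without touching the numbers, sketched with the standard library only. JsonKeyCheck is a hypothetical helper, not something in the repo, and the sample stdout is made up; if the test module already has a JSON parser available, walking the parsed tree is the sturdier choice.

```java
import java.util.List;
import java.util.regex.Pattern;

/** Key-presence checks for CLI JSON output; deliberately ignores values. */
final class JsonKeyCheck {
    private JsonKeyCheck() {}

    /** True when the text contains "key" followed by a colon, at any nesting depth. */
    static boolean hasKey(String json, String key) {
        return Pattern.compile("\"" + Pattern.quote(key) + "\"\\s*:").matcher(json).find();
    }

    /** Returns the required keys that are absent, so a failure message can name them. */
    static List<String> missingKeys(String json, List<String> requiredKeys) {
        return requiredKeys.stream().filter(key -> !hasKey(json, key)).toList();
    }
}

class JsonKeyCheckDemo {
    // Key names come from the plan; the nested latency keys are checked by leaf name.
    static final List<String> BENCH_KEYS = List.of(
            "language", "command", "bulkSize", "totalCalls", "successfulCalls",
            "failedCalls", "callsPerSecond", "latencyMs", "p50", "p95", "p99");

    public static void main(String[] args) {
        // Made-up sample output; the numbers are placeholders and are never asserted.
        String stdout = "{\"language\":\"java\",\"command\":\"bench-read-bulk\",\"bulkSize\":2,"
                + "\"totalCalls\":17,\"successfulCalls\":17,\"failedCalls\":0,"
                + "\"callsPerSecond\":16.4,\"latencyMs\":{\"p50\":1.2,\"p95\":3.4,\"p99\":5.6}}";
        System.out.println("missing keys: " + JsonKeyCheck.missingKeys(stdout, BENCH_KEYS));
    }
}
```

Asserting that missingKeys is empty keeps the bench test stable across machines, since throughput and latency values vary from run to run while the key set does not.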

Step 3-4: Run failing → implement recorders → run PASS (same gradle command as Task 1, narrowed --tests '*MxGatewayCliTests').

Step 5: Commit test(java-cli): cover secured/secured2/bench bulk + close-session.

Acceptance: 4 new green tests; bench test pins schema keys only.


Task 3: GalaxyClientFactory seam + cli grpc test deps

Classification: small Estimated implement time: ~4 min Parallelizable with: Task 1, 2, 4, 7, 8, 9

Files:

  • Modify: clients/java/zb-mom-ww-mxgateway-cli/src/main/java/com/zb/mom/ww/mxgateway/cli/MxGatewayCli.java (GalaxyCommand, ~lines 361-371; connect() ~line 368)
  • Modify: clients/java/zb-mom-ww-mxgateway-cli/build.gradle

Context: GalaxyCommand.connect() hard-calls GalaxyRepositoryClient.connect(...). Mirror the existing MxGatewayCliClientFactory seam so tests can supply an in-process-backed client.

Step 1: Add the seam. Introduce an interface GalaxyClientFactory { GalaxyRepositoryClient connect(MxGatewayClientOptions options); }, give GalaxyCommand a final GalaxyClientFactory galaxyClientFactory field (constructor-injected, defaulting to GalaxyRepositoryClient::connect for production wiring), and change connect() to delegate to it. Thread the factory through the picocli command construction the same way clientFactory is threaded for gateway commands. Keep the default production path identical (no behavior change).
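The seam can be sketched as below. GalaxyRepositoryClient and MxGatewayClientOptions here are tiny hypothetical stand-ins so the sketch compiles on its own (the real types live in the client module and look different), and the picocli wiring is omitted; only the interface shape and the two constructors follow the step above.

```java
// Hypothetical stand-ins; the real classes are in the client module.
record MxGatewayClientOptions(String endpoint, String apiKey) {}

final class GalaxyRepositoryClient {
    final String origin;

    GalaxyRepositoryClient(String origin) {
        this.origin = origin;
    }

    // Stand-in for the static production entry point that the command hard-calls today.
    static GalaxyRepositoryClient connect(MxGatewayClientOptions options) {
        return new GalaxyRepositoryClient("network:" + options.endpoint());
    }
}

@FunctionalInterface
interface GalaxyClientFactory {
    GalaxyRepositoryClient connect(MxGatewayClientOptions options);
}

class GalaxyCommand {
    private final GalaxyClientFactory galaxyClientFactory;

    /** Production wiring: behaves exactly like the old hard-coded static call. */
    GalaxyCommand() {
        this(GalaxyRepositoryClient::connect);
    }

    /** Test wiring: callers supply a factory, for example one backed by an in-process channel. */
    GalaxyCommand(GalaxyClientFactory galaxyClientFactory) {
        this.galaxyClientFactory = galaxyClientFactory;
    }

    GalaxyRepositoryClient connect(MxGatewayClientOptions options) {
        return galaxyClientFactory.connect(options);
    }
}

class GalaxyCommandDemo {
    public static void main(String[] args) {
        MxGatewayClientOptions options = new MxGatewayClientOptions("localhost:5000", "dummy-key");
        System.out.println(new GalaxyCommand().connect(options).origin);
        System.out.println(new GalaxyCommand(o -> new GalaxyRepositoryClient("in-process")).connect(options).origin);
    }
}
```

Keeping the no-argument constructor as a delegate to the method reference is what makes the change a no-op for production: only tests ever call the injecting constructor.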

Step 2: Add test deps to clients/java/zb-mom-ww-mxgateway-cli/build.gradle:

testImplementation "io.grpc:grpc-inprocess:${grpcVersion}"
testImplementation "io.grpc:grpc-testing:${grpcVersion}"

Step 3: Build to confirm wiring compiles + production galaxy commands still resolve: gradle :zb-mom-ww-mxgateway-cli:compileJava :zb-mom-ww-mxgateway-cli:compileTestJava (from clients/java).

Step 4: Run the existing galaxy CLI tests to confirm no regression: gradle :zb-mom-ww-mxgateway-cli:test --tests '*MxGatewayCliTests'.

Step 5: Commit feat(java-cli): inject GalaxyClientFactory seam; add grpc inprocess test deps.

Acceptance: Seam present, default wiring unchanged, existing galaxy tests green, in-process deps available to the test source set.


Task 4: In-process gRPC harness fixture

Classification: standard Estimated implement time: ~5 min Parallelizable with: Task 1, 2, 3, 7, 8, 9

Files:

  • Create: clients/java/zb-mom-ww-mxgateway-cli/src/test/java/com/zb/mom/ww/mxgateway/cli/InProcessGatewayHarness.java

Context: Streaming/galaxy commands can't use FakeSession (real MxEventStream/DeployEventStream package-private ctors; GalaxyRepositoryClient final). Drive the real client over an in-process channel against scripted fake services. The public MxGatewayClient(Channel, options) / GalaxyRepositoryClient(Channel, options) ctors make this clean.

Step 1: Build the fixture as an AutoCloseable with a unique server name per instance:

  • Start InProcessServerBuilder.forName(name).directExecutor().addService(fakeGateway).addService(fakeGalaxy).build().start().
  • Expose ManagedChannel channel() via InProcessChannelBuilder.forName(name).directExecutor().build().
  • fakeGateway extends MxAccessGatewayGrpc.MxAccessGatewayImplBase, overriding streamEvents (push a scripted List<MxEvent> to the StreamObserver, then onCompleted) and closeSession. fakeGalaxy extends GalaxyRepositoryGrpc.GalaxyRepositoryImplBase, overriding discoverHierarchy (return a small paged GalaxyObject set) and watchDeployEvents (stream scripted deploy events). Make the scripted payloads settable on the harness (constructor args or setters).
  • Provide helpers: MxGatewayClient gatewayClient() → new MxGatewayClient(channel(), testOptions()); GalaxyRepositoryClient galaxyClient() → new GalaxyRepositoryClient(channel(), testOptions()), where testOptions() builds an MxGatewayClientOptions with a dummy api-key.
  • close() shuts down channel + server.

Step 2: Smoke-verify the harness in isolation — add a temporary @Test (or a tiny self-test) that opens the harness, calls gatewayClient() and streams one scripted event, then delete it before commit. Build: gradle :zb-mom-ww-mxgateway-cli:compileTestJava.

Step 3: Run the cli test set to confirm nothing breaks: gradle :zb-mom-ww-mxgateway-cli:test --tests '*MxGatewayCliTests'.

Step 4: Commit test(java-cli): add in-process gRPC harness fixture.

Acceptance: Compiles; harness starts/stops cleanly; scripted services reachable through the real client types. (No assertions on CLI yet; those come in Tasks 5-6.)


Task 5: stream-events CLI test via harness

Classification: standard Estimated implement time: ~4 min Parallelizable with: Task 7, 8, 9 blockedBy: Task 2 (same test file ordering), Task 4 (harness)

Files:

  • Test: clients/java/zb-mom-ww-mxgateway-cli/src/test/java/com/zb/mom/ww/mxgateway/cli/MxGatewayCliTests.java

Step 1: Wire a harness-backed MxGatewayCliClientFactory in the test that builds the CLI client over harness.gatewayClient() (reuse the production adapter that wraps MxGatewayClient as MxGatewayCliClient; it is package-visible from the test's package). Script ≥2 MxEvents including one with a high uint64 worker sequence to cover the unsigned-format regression.

Step 2: Write the failing test streamEventsRendersScriptedEventsIncludingHighUint64Sequence — run stream-events, assert stdout contains the scripted event fields and the high sequence renders unsigned (not negative).
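For reference, protobuf uint64 fields surface in Java as long, so any sequence at or above 2^63 prints negative unless it is formatted as unsigned. A small sketch of the difference the test should pin; the class and method names are hypothetical and the CLI's actual formatting code is not shown here.

```java
/** Shows why a high uint64 sequence needs unsigned formatting in Java. */
final class UnsignedSequenceFormat {
    private UnsignedSequenceFormat() {}

    /** Renders the 64 bits of the value as an unsigned decimal string. */
    static String render(long workerSequence) {
        return Long.toUnsignedString(workerSequence);
    }

    public static void main(String[] args) {
        // 2^64 - 1, the largest uint64; as a Java long the same bits are -1.
        long high = Long.parseUnsignedLong("18446744073709551615");

        System.out.println(Long.toString(high)); // signed view: -1
        System.out.println(render(high));        // unsigned view: 18446744073709551615
    }
}
```

Scripting the harness event with a value such as 2^63 or 2^64 - 1 and asserting that stdout contains the unsigned digits (and no leading minus sign) covers the regression.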

Step 3: Run failing → (harness already supplies behavior) → PASS: gradle :zb-mom-ww-mxgateway-cli:test --tests '*MxGatewayCliTests'.

Step 4: Commit test(java-cli): cover stream-events over in-process harness.

Acceptance: stream-events exercised through the real MxEventStream; unsigned-sequence rendering asserted.


Task 6: galaxy-watch + galaxy-discover CLI tests via harness

Classification: standard Estimated implement time: ~5 min Parallelizable with: Task 7, 8, 9 blockedBy: Task 3 (GalaxyClientFactory seam), Task 5 (same test file ordering)

Files:

  • Test: clients/java/zb-mom-ww-mxgateway-cli/src/test/java/com/zb/mom/ww/mxgateway/cli/MxGatewayCliTests.java

Step 1: Wire a harness-backed GalaxyClientFactory (from Task 3) that returns harness.galaxyClient().

Step 2: Write the failing tests:

  • galaxyDiscoverPrintsPagedHierarchyJson — assert the scripted GalaxyObject hierarchy renders in CLI JSON (object fields + counts).
  • galaxyWatchRendersScriptedDeployEvents — assert the scripted deploy events render in the CLI feed; honor --limit if the command supports it.

Step 3: Run failing → PASS: gradle :zb-mom-ww-mxgateway-cli:test --tests '*MxGatewayCliTests'.

Step 4: Full cli module verification (this closes the §8 Java item): gradle :zb-mom-ww-mxgateway-cli:test.

Step 5: Commit test(java-cli): cover galaxy-discover/galaxy-watch over in-process harness.

Acceptance: All 10 previously-untested subcommands now have CLI coverage; MxGatewayCliTests green.


Task 7: WorkerReadyWaitTimeoutMs option + validator + doc

Classification: small Estimated implement time: ~4 min Parallelizable with: all Java tasks (1-6), Task 9

Files:

  • Modify: src/ZB.MOM.WW.MxGateway.Server/Configuration/SessionOptions.cs
  • Modify: src/ZB.MOM.WW.MxGateway.Server/Configuration/GatewayOptionsValidator.cs
  • Modify: docs/GatewayConfiguration.md
  • Test: src/ZB.MOM.WW.MxGateway.Tests/Gateway/Configuration/GatewayOptionsTests.cs (or the existing options-validator test class)

Step 1: Add the option to SessionOptions (match the { get; init; } + XML-doc style):

/// <summary>
///     Gets the bounded time, in milliseconds, the gateway will wait for a worker client
///     to reach <c>Ready</c> when the session itself is already <c>Ready</c> but the worker
///     state has transiently diverged (e.g. <c>Handshaking</c> after a heartbeat blip).
///     The wait applies only to transient worker states; terminal states
///     (<c>Faulted</c>/<c>Closing</c>/<c>Closed</c>/no worker) fail fast immediately.
///     A value of <c>0</c> (the default) disables the wait — the gateway keeps the original
///     fail-fast behavior. Must be greater than or equal to zero.
/// </summary>
public int WorkerReadyWaitTimeoutMs { get; init; }

Step 2: Validate >= 0 in GatewayOptionsValidator (mirror an existing numeric check; message e.g. MxGateway:Sessions:WorkerReadyWaitTimeoutMs must be greater than or equal to zero.).

Step 3: Document the new key in docs/GatewayConfiguration.md under the MxGateway:Sessions section (default 0 = disabled; transient-only; terminal fails fast).

Step 4: Write + run the failing test asserting default is 0 and a negative value fails validation: dotnet test src/ZB.MOM.WW.MxGateway.Tests --filter "FullyQualifiedName~GatewayOptions" → expect FAIL pre-impl, PASS post.

Step 5: Commit feat(server): add MxGateway:Sessions:WorkerReadyWaitTimeoutMs (default off).

Acceptance: Option binds, default 0, negative rejected, doc updated.


Task 8: Bounded ready-wait in GatewaySession + tests

Classification: high-risk Estimated implement time: ~5 min (split if it grows) Parallelizable with: all Java tasks, Task 9 blockedBy: Task 7

Files:

  • Modify: src/ZB.MOM.WW.MxGateway.Server/Sessions/GatewaySession.cs (GetReadyWorkerClient :1665; InvokeAsync :918; ReadEventsAsync :1263)
  • Test: src/ZB.MOM.WW.MxGateway.Tests/Gateway/Sessions/SessionManagerTests.cs (or the test class covering GetReadyWorkerClient diagnostics)

Context & constraints: The not-ready check runs inside the _syncRoot lock — never sleep/poll inside the lock. Read state under the lock, release, await, re-check. The both-states diagnostic (Session state is {_state}; worker state is {workerState}.) MUST be preserved for the final failure. Default timeout 0 ⇒ behavior identical to today.

Step 1: Write failing tests first (TDD) in the session-manager test class, using a FakeWorkerClient whose State is settable:

  • InvokeAsync_WhenWorkerHandshakingThenReadyWithinTimeout_Succeeds — option WorkerReadyWaitTimeoutMs=500; worker starts Handshaking, flips to Ready after ~50 ms (e.g. via a background Task or a TimeProvider-driven advance); assert the invoke succeeds and the worker is invoked once.
  • InvokeAsync_WhenWorkerFaulted_FailsFastWithBothStates — worker Faulted, timeout 500; assert it throws immediately (no meaningful delay) and the message contains both Session state is Ready and worker state is Faulted.
  • InvokeAsync_WhenTimeoutElapsesStillNotReady_FailsWithBothStates — worker stays Handshaking, timeout small (e.g. 100 ms); assert throw after ~timeout with both states.
  • InvokeAsync_WhenTimeoutZero_FailsFastUnchanged — worker Handshaking, timeout 0; assert immediate fail-fast (pins the no-behavior-change default).

Step 2: Run tests → expect FAIL/compile-error (GetReadyWorkerClientAsync not present): dotnet test src/ZB.MOM.WW.MxGateway.Tests --filter "FullyQualifiedName~SessionManager".

Step 3: Implement GetReadyWorkerClientAsync(CancellationToken):

  • Under _syncRoot: capture _state and _workerClient?.State. If session is Ready and worker is Ready → return it (fast path, no await). If worker is terminal (Faulted/Closing/Closed) or null, or session not Ready → throw the both-states SessionManagerException now (fail fast). If worker is transient (Handshaking/Created) AND WorkerReadyWaitTimeoutMs > 0 → fall through to the wait. If worker is transient and WorkerReadyWaitTimeoutMs is 0 → throw the same both-states exception immediately (the unchanged default).
  • Wait loop OUTSIDE the lock: until a deadline (now + WorkerReadyWaitTimeoutMs), await Task.Delay(pollIntervalMs, ct) (const pollIntervalMs = 25), then re-acquire _syncRoot and re-evaluate: Ready → return; terminal/null/session-not-Ready → fail fast with both states; still transient → keep waiting. On deadline → throw both-states.
  • Keep the existing synchronous GetReadyWorkerClient() for any non-async caller, or have it delegate to a zero-wait evaluation to avoid duplicated message logic (extract a private EvaluateReadyUnderLock(out string failureMessage) helper used by both).
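The lock discipline in Step 3 is language-independent: evaluate under the lock, release, wait, re-evaluate. Below is a Java rendering of that shape only. It is not the C# implementation; ReadyGate, Verdict and the blocking Thread.sleep are hypothetical simplifications (the real method awaits Task.Delay and throws SessionManagerException), and the null-worker case is not modelled.

```java
enum WorkerState { CREATED, HANDSHAKING, READY, FAULTED, CLOSING, CLOSED }

final class ReadyGate {
    private static final long POLL_INTERVAL_MS = 25;

    private enum Verdict { READY, WAIT, FAIL }

    private final Object syncRoot = new Object();
    private boolean sessionReady = true;                        // guarded by syncRoot
    private WorkerState workerState = WorkerState.HANDSHAKING;  // guarded by syncRoot

    void setWorkerState(WorkerState next) {
        synchronized (syncRoot) {
            workerState = next;
        }
    }

    /** Caller must hold syncRoot. Classifies the current state without blocking. */
    private Verdict evaluateUnderLock() {
        if (!sessionReady) {
            return Verdict.FAIL;
        }
        return switch (workerState) {
            case READY -> Verdict.READY;
            case CREATED, HANDSHAKING -> Verdict.WAIT;       // transient: may still become ready
            case FAULTED, CLOSING, CLOSED -> Verdict.FAIL;   // terminal: never wait
        };
    }

    /** Returns once ready; throws with both states on terminal state, timeout, or timeoutMs <= 0. */
    void awaitReady(long timeoutMs) {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (true) {
            Verdict verdict;
            String diagnostic;
            synchronized (syncRoot) {
                verdict = evaluateUnderLock();
                diagnostic = "Session state is " + (sessionReady ? "READY" : "NOT_READY")
                        + "; worker state is " + workerState + ".";
            }
            // Everything below runs with the lock released.
            if (verdict == Verdict.READY) {
                return;
            }
            boolean expired = timeoutMs <= 0 || System.nanoTime() - deadline >= 0;
            if (verdict == Verdict.FAIL || expired) {
                throw new IllegalStateException(diagnostic);
            }
            try {
                Thread.sleep(POLL_INTERVAL_MS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(diagnostic, e);
            }
        }
    }
}

class ReadyGateDemo {
    public static void main(String[] args) {
        ReadyGate gate = new ReadyGate();
        gate.setWorkerState(WorkerState.READY);
        gate.awaitReady(0);   // fast path: ready, so no wait even with timeout 0
        System.out.println("ready");
    }
}
```

Building the diagnostic inside the same locked region as the verdict keeps the two reported states consistent with each other, which is the property the both-states message depends on.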

Step 4: Update callers:

  • InvokeAsync (:918): IWorkerClient workerClient = await GetReadyWorkerClientAsync(cancellationToken).ConfigureAwait(false);.
  • ReadEventsAsync (:1263): convert to an async iterator — public async IAsyncEnumerable<WorkerEvent> ReadEventsAsync([EnumeratorCancellation] CancellationToken cancellationToken), await GetReadyWorkerClientAsync(...), TouchClientActivity(...), then await foreach (var e in workerClient.ReadEventsAsync(cancellationToken)) yield return e;. Verify no caller relied on eager (pre-enumeration) throw semantics — if one does, note it for the reviewer.

Step 5: Run the targeted tests → PASS (same filter). Confirm the 4 new tests + pre-existing GetReadyWorkerClient diagnostic test all pass.

Step 6: Commit feat(server): bounded worker-ready wait in GatewaySession (default off).

Acceptance: Transient states wait up to the timeout; terminal states fail fast with both states; default 0 is byte-for-byte the old behavior; no sleeping under _syncRoot.


Task 9: FakeWorkerProcess consolidation

Classification: standard Estimated implement time: ~5 min Parallelizable with: all tasks

Files:

  • Modify: src/ZB.MOM.WW.MxGateway.Tests/TestSupport/FakeWorkerProcess.cs (canonical — extend if needed)
  • Modify: src/ZB.MOM.WW.MxGateway.Tests/Gateway/Sessions/SessionWorkerClientFactoryFakeWorkerTests.cs (nested copy ~line 343)
  • Modify: src/ZB.MOM.WW.MxGateway.Tests/Gateway/Workers/WorkerProcessLauncherTests.cs (nested copy ~line 244)
  • Modify: src/ZB.MOM.WW.MxGateway.Tests/Gateway/Workers/WorkerClientTests.cs (nested copy ~line 767; already using ...TestSupport)

Context: Canonical TestSupport/FakeWorkerProcess(int) has MarkExited/Kill/TCS-backed WaitForExitAsync. Three test files still declare private nested FakeWorkerProcess. Consolidate.

Step 1: Diff each nested copy against the canonical. For each, list any members/behavior the canonical lacks (e.g. extra counters, scripted exit codes). Fold those into the canonical TestSupport/FakeWorkerProcess first (additively, so existing canonical users keep compiling).

Step 2: Delete each nested class and update references to the canonical type; add/confirm using ZB.MOM.WW.MxGateway.Tests.TestSupport;.

Step 3: Run the three affected test classes:

dotnet test src/ZB.MOM.WW.MxGateway.Tests --filter "FullyQualifiedName~WorkerClientTests|FullyQualifiedName~WorkerProcessLauncherTests|FullyQualifiedName~SessionWorkerClientFactoryFakeWorkerTests"

Expected: all pass (behavior preserved; KillCount/HasExited/ExitCode semantics intact).

Step 4: Commit refactor(tests): consolidate FakeWorkerProcess onto TestSupport canonical.

Acceptance: Exactly one FakeWorkerProcess definition (the canonical); three files import it; affected tests green.


Final verification (after Tasks 8 + 9)

Run the full gateway suite once to confirm no cross-cutting regression:

dotnet test src/ZB.MOM.WW.MxGateway.Tests/ZB.MOM.WW.MxGateway.Tests.csproj

Expected baseline: prior green count + the new Task 7/8/9 tests; the 3 known macOS-environmental failures (TLS temp-file, OrphanWorkerTerminator ×2) may persist — confirm no new failures.

Then finish via superpowers-extended-cc:finishing-a-development-branch.