wwtools/mbproxy/tests/Mbproxy.Tests
Joseph Doherty 7a435957ee mbproxy: Wave 4 — fix issues introduced by the Wave-1/2 fixes
Closes the new findings from the post-remediation re-review
(codereviews/2026-05-14/ReReviewAfterRemediation.md):

NC1 — ProxyWorker.StopAsync drain loop is structurally always-zero
  Wave 1's W1.5 inherited the original ShutdownCoordinator bug it was
  meant to replace. Supervisor.StopAsync nulls the per-mux counter
  provider before the drain loop runs, so CountInFlight always returns 0
  and the drain budget is never spent on actual draining. Fix: snapshot
  the in-flight count BEFORE supervisor stop, drop the theatrical
  post-stop loop, and report InFlightAtCancel as the snapshot count
  (= the number of in-flight requests dropped by the stop). The
  supervisor stop IS the drain — there is nothing to drain that
  wouldn't be killed by the stop itself.
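
  The snapshot-then-stop ordering can be sketched as follows. This is a
  minimal illustration, not mbproxy's code: `FakeSupervisor` and
  `Shutdown.StopAndReportAsync` are hypothetical names, and the fake only
  assumes what is described above (stopping detaches the counter provider).

```csharp
using System;
using System.Threading.Tasks;

// Demo: three requests are in flight when shutdown begins.
var supervisor = new FakeSupervisor(inFlight: 3);
int dropped = await Shutdown.StopAndReportAsync(supervisor);
Console.WriteLine($"InFlightAtCancel = {dropped}"); // prints "InFlightAtCancel = 3"

// Hypothetical stand-in for the supervisor: stopping it detaches the
// counter provider, so CountInFlight reads 0 afterwards.
sealed class FakeSupervisor
{
    private Func<int>? _counterProvider;

    public FakeSupervisor(int inFlight) => _counterProvider = () => inFlight;

    public int CountInFlight() => _counterProvider?.Invoke() ?? 0;

    public Task StopAsync()
    {
        _counterProvider = null; // mirrors the behaviour described above
        return Task.CompletedTask;
    }
}

static class Shutdown
{
    // Snapshot first, then stop; the snapshot is the value that gets reported.
    public static async Task<int> StopAndReportAsync(FakeSupervisor supervisor)
    {
        int inFlightAtCancel = supervisor.CountInFlight();
        await supervisor.StopAsync();
        return inFlightAtCancel;
    }
}
```

  Reading the count after `StopAsync` in this sketch would always give 0,
  which is the structural bug NC1 describes.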

NM1 — TearDownBackendAsync._connectGate.WaitAsync uncancellable
  Without a token, a long Polly-wrapped EnsureBackendConnectedAsync
  against an unreachable host could hold the gate for the full
  BackendConnectTimeoutMs * MaxAttempts window, blocking DisposeAsync
  (and therefore ProxyWorker.StopAsync) for that duration. Fix: bound
  the wait with a 2 s teardown deadline; on timeout proceed
  best-effort without the gate. Worst-case consequence is one orphaned
  in-flight cycle on the dying backend, surfaced to upstream as
  exception 0x0B by the watchdog.
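
  A bounded gate wait of this kind might look like the sketch below. It
  relies on `SemaphoreSlim.WaitAsync(TimeSpan)`, which returns false on
  timeout; `Teardown.RunAsync` and the 50 ms demo deadline are illustrative
  assumptions (the commit uses 2 s).

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

var gate = new SemaphoreSlim(1, 1);
await gate.WaitAsync(); // simulate a connect attempt that is holding the gate
bool held = await Teardown.RunAsync(gate, TimeSpan.FromMilliseconds(50), () => { });
Console.WriteLine(held ? "tore down under the gate" : "tore down best-effort");
// prints "tore down best-effort"

static class Teardown
{
    // Returns true if teardown ran while holding the gate, false if the
    // deadline elapsed and teardown proceeded without it.
    public static async Task<bool> RunAsync(SemaphoreSlim gate, TimeSpan deadline, Action tearDown)
    {
        bool acquired = await gate.WaitAsync(deadline);
        try
        {
            tearDown();
            return acquired;
        }
        finally
        {
            // Only release what was actually acquired.
            if (acquired) gate.Release();
        }
    }
}
```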

NM2 — ReplaceContext non-atomic ctx + provider swap
  Snapshot path reads `_cacheStatsProvider` independently of `_ctx`. If
  `_ctx` was swapped first, a snapshot taken in the gap would still hold
  the OLD adapter wrapping the OLD cache — which the supervisor disposes
  immediately after we return. Fix: set the provider FIRST, then swap
  `_ctx`. Snapshots in the swap window now read either (old, old) or
  (new, new), never (old-after-disposed).
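
  The write ordering can be sketched as below. `Holder` and `Context` are
  hypothetical stand-ins, and the sketch shows only the order of the two
  writes; it makes no claim about how mbproxy's snapshot path is
  synchronised otherwise.

```csharp
using System;
using System.Threading;

var holder = new Holder(new Context("old"));
Context retired = holder.ReplaceContext(new Context("new"));
retired.Dispose(); // the caller disposes the old context after the swap returns
Console.WriteLine(holder.Snapshot()); // prints "stats from new"

// Hypothetical context owning a cache; reading stats after Dispose throws.
sealed class Context : IDisposable
{
    private readonly string _name;
    private bool _disposed;

    public Context(string name) => _name = name;

    public string ReadStats() =>
        _disposed ? throw new ObjectDisposedException(_name) : $"stats from {_name}";

    public void Dispose() => _disposed = true;
}

sealed class Holder
{
    private Context _ctx;
    private Func<string> _cacheStatsProvider;

    public Holder(Context initial)
    {
        _ctx = initial;
        _cacheStatsProvider = initial.ReadStats;
    }

    // Provider first, then context: by the time the old context is handed
    // back for disposal, new snapshots already see the new provider.
    public Context ReplaceContext(Context next)
    {
        Func<string> provider = next.ReadStats;
        Volatile.Write(ref _cacheStatsProvider, provider);
        return Interlocked.Exchange(ref _ctx, next);
    }

    public string Snapshot() => Volatile.Read(ref _cacheStatsProvider)();
}
```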

NM5 — Self-cascade ObjectDisposedException after dispose
  Writer/reader fault catches fired `_ = TearDownBackendAsync(...)`
  unconditionally. After DisposeAsync runs `_connectGate.Dispose()`, the
  fire-and-forget TearDown threw ObjectDisposedException on WaitAsync as
  an unobserved Task exception. Fix: skip self-cascade when
  `_disposeCts.IsCancellationRequested` — DisposeAsync runs an explicit
  TearDown anyway.
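
  The guard can be illustrated with the sketch below. `Backend` and
  `OnPipeFault` are hypothetical; only the `_disposeCts` check and the
  disposed `_connectGate` follow the description above.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

var backend = new Backend();
backend.OnPipeFault();        // live: fires a background teardown
await backend.DisposeAsync(); // explicit teardown, then disposes the gate
backend.OnPipeFault();        // after dispose: skipped, nothing touches the gate
Console.WriteLine($"teardowns = {backend.TearDownCount}"); // prints "teardowns = 2"

sealed class Backend : IAsyncDisposable
{
    private readonly SemaphoreSlim _connectGate = new(1, 1);
    private readonly CancellationTokenSource _disposeCts = new();
    private int _tearDownCount;

    public int TearDownCount => Volatile.Read(ref _tearDownCount);

    // Called from writer/reader fault handlers.
    public void OnPipeFault()
    {
        // Skip the self-cascade once disposal has started: the gate may
        // already be disposed, and DisposeAsync tears down explicitly.
        if (_disposeCts.IsCancellationRequested) return;
        _ = TearDownAsync();
    }

    private async Task TearDownAsync()
    {
        await _connectGate.WaitAsync();
        try { Interlocked.Increment(ref _tearDownCount); }
        finally { _connectGate.Release(); }
    }

    public async ValueTask DisposeAsync()
    {
        _disposeCts.Cancel();
        await TearDownAsync();
        _connectGate.Dispose();
    }
}
```

  Without the `IsCancellationRequested` check, the last `OnPipeFault` call
  would hit the disposed semaphore and fault a Task that nobody observes.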

Nm1 — Saturation cleanup uses await SendResponseAsync
  W1.2's per-attacher delivery loop awaited the blocking SendResponseAsync,
  which would serialise on a wedged late-attacher's full bounded channel
  and stall delivery to its peers — contradicting the W1.3 doctrine that
  the fan-out path must never await per-pipe writes. Fix: use
  TrySendResponse and increment ResponseDropForFullUpstream on drop.
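
  The non-blocking delivery pattern can be sketched with
  System.Threading.Channels. `FanOut.Deliver` is a hypothetical stand-in
  for the per-attacher loop, and `ChannelWriter<T>.TryWrite` plays the role
  that TrySendResponse has in the commit.

```csharp
using System;
using System.Threading;
using System.Threading.Channels;

var wedged = Channel.CreateBounded<byte[]>(1);
var healthy = Channel.CreateBounded<byte[]>(1);
wedged.Writer.TryWrite(new byte[] { 0 }); // fill the wedged attacher's channel

var fanOut = new FanOut();
fanOut.Deliver(new byte[] { 1, 2 }, wedged.Writer, healthy.Writer);
Console.WriteLine($"dropped = {fanOut.ResponseDropForFullUpstream}");      // prints "dropped = 1"
Console.WriteLine($"healthy received = {healthy.Reader.TryRead(out _)}"); // prints "healthy received = True"

sealed class FanOut
{
    private long _drops;

    public long ResponseDropForFullUpstream => Interlocked.Read(ref _drops);

    // Non-blocking delivery: a full channel costs one counter increment
    // instead of stalling the peers that follow it in the loop.
    public void Deliver(byte[] response, params ChannelWriter<byte[]>[] attachers)
    {
        foreach (var attacher in attachers)
        {
            if (!attacher.TryWrite(response))
                Interlocked.Increment(ref _drops);
        }
    }
}
```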

T2 — WatchdogVsResponse_Race seeded Random fragility
  Used `new Random(12345)` over [350, 450) ms with watchdog at 400 ms;
  the sequence a seeded Random produces is not guaranteed to stay the same
  across .NET major versions (unseeded Random already moved from the legacy
  algorithm to xoshiro in .NET 6), so a runtime upgrade could land all
  samples on one side of the deadline and break the "both branches must
  fire" assertion. Fix: deterministic counter-based alternation (15 fast
  + 15 slow across 30 iterations) — guaranteed by construction.
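
  A counter-based alternation of this kind could be as small as the sketch
  below. `RaceDelays.Build` and the concrete 350/449 ms values are
  illustrative assumptions; only the 400 ms watchdog, the [350, 450) ms
  range, and the 15/15 split over 30 iterations come from the description
  above.

```csharp
using System;
using System.Linq;

const int WatchdogMs = 400;
int[] delays = RaceDelays.Build(iterations: 30);
int fast = delays.Count(d => d < WatchdogMs);
int slow = delays.Count(d => d > WatchdogMs);
Console.WriteLine($"fast = {fast}, slow = {slow}"); // prints "fast = 15, slow = 15"

static class RaceDelays
{
    // Even iterations respond before the watchdog deadline, odd ones after it,
    // so both branches fire regardless of the runtime's Random implementation.
    public static int[] Build(int iterations) =>
        Enumerable.Range(0, iterations)
                  .Select(i => i % 2 == 0 ? 350 : 449)
                  .ToArray();
}
```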

Latent items NM3 (_supervisorCts leak on re-Start) and NM4 (TCS
single-shot semantics) are unfixed: no caller actually re-Starts a
supervisor today; both become real only if the reconciler ever changes
to re-Start instead of dispose-and-rebuild. Documented in the re-review.

Tests: 387 pass / 0 fail. Three back-to-back race-test runs in
isolation all green (T2 alternation is deterministic).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-14 06:52:33 -04:00