Closes the new findings from the post-remediation re-review
(codereviews/2026-05-14/ReReviewAfterRemediation.md):
NC1 — ProxyWorker.StopAsync drain loop is structurally always-zero
Wave 1's W1.5 inherited the bug from the ShutdownCoordinator it was
meant to replace. Supervisor.StopAsync nulls the per-mux counter
provider before the drain loop runs, so CountInFlight always returns 0
and the drain budget is never spent on actual draining. Fix: snapshot
the in-flight count BEFORE supervisor stop, drop the theatrical
post-stop loop, and report InFlightAtCancel as the snapshot count
(= the number of in-flight requests dropped by the stop). The
supervisor stop IS the drain — there is nothing to drain that
wouldn't be killed by the stop itself.
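
A minimal sketch of the snapshot-before-stop ordering described above.
FakeSupervisor and StopOrdering are hypothetical stand-ins, not the
project's types; the only assumption carried over is that stopping
clears the counter provider.

```csharp
#nullable enable
using System;
using System.Threading.Tasks;

// Hypothetical stand-in for the supervisor: StopAsync clears the counter
// provider, so any count read afterwards is always 0.
public sealed class FakeSupervisor
{
    private Func<int>? _inFlightProvider;

    public FakeSupervisor(Func<int> inFlightProvider) => _inFlightProvider = inFlightProvider;

    public int CountInFlight() => _inFlightProvider?.Invoke() ?? 0;

    public Task StopAsync()
    {
        _inFlightProvider = null; // in-flight requests are dropped here
        return Task.CompletedTask;
    }
}

public static class StopOrdering
{
    // Returns the value to report as InFlightAtCancel.
    public static async Task<int> StopAndReportAsync(FakeSupervisor supervisor)
    {
        int inFlightAtCancel = supervisor.CountInFlight(); // snapshot BEFORE stop
        await supervisor.StopAsync();                      // the stop is the drain
        return inFlightAtCancel;                           // no post-stop polling loop
    }
}
```

Reading CountInFlight after StopAsync in this sketch always yields 0,
which is the always-zero behaviour the finding describes.
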
NM1 — TearDownBackendAsync._connectGate.WaitAsync uncancellable
Without a token, a long Polly-wrapped EnsureBackendConnectedAsync
against an unreachable host could hold the gate for the full
BackendConnectTimeoutMs * MaxAttempts window, blocking DisposeAsync
(and therefore ProxyWorker.StopAsync) for that duration. Fix: bound
the wait with a 2 s teardown deadline; on timeout proceed
best-effort without the gate. Worst-case consequence is one orphaned
in-flight cycle on the dying backend, surfaced to upstream as
exception 0x0B by the watchdog.
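
A hedged sketch of a bounded gate wait, assuming the gate is a
SemaphoreSlim. BoundedTeardown, TearDownAsync and the tearDown delegate
are illustrative names; only the 2 s deadline comes from the fix above.

```csharp
#nullable enable
using System;
using System.Threading;
using System.Threading.Tasks;

public static class BoundedTeardown
{
    // Deadline taken from the 2 s figure in the message.
    private static readonly TimeSpan TeardownDeadline = TimeSpan.FromSeconds(2);

    // Returns true if teardown ran under the gate, false if it ran best-effort.
    public static async Task<bool> TearDownAsync(
        SemaphoreSlim connectGate, Func<Task> tearDown, TimeSpan? deadline = null)
    {
        // WaitAsync(TimeSpan) completes with false on timeout instead of throwing.
        bool entered = await connectGate.WaitAsync(deadline ?? TeardownDeadline);
        try
        {
            await tearDown(); // runs with or without the gate
        }
        finally
        {
            if (entered) connectGate.Release(); // release only what we acquired
        }
        return entered;
    }
}
```

Releasing only when `entered` is true matters: releasing a gate that was
never acquired would let a second connect attempt in alongside the one
still holding it.
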
NM2 — ReplaceContext non-atomic ctx + provider swap
Snapshot path reads `_cacheStatsProvider` independently of `_ctx`. If
`_ctx` was swapped first, a snapshot taken in the gap would still hold
the OLD adapter wrapping the OLD cache — which the supervisor disposes
immediately after we return. Fix: set the provider FIRST, then swap
`_ctx`. Snapshots in the swap window now read either (old, old) or
(new, new), never (old-after-disposed).
NM5 — Self-cascade ObjectDisposedException after dispose
Writer/reader fault catches fired `_ = TearDownBackendAsync(...)`
unconditionally. After DisposeAsync runs `_connectGate.Dispose()`, the
fire-and-forget TearDown threw ObjectDisposedException on WaitAsync as
an unobserved Task exception. Fix: skip self-cascade when
`_disposeCts.IsCancellationRequested` — DisposeAsync runs an explicit
TearDown anyway.
Nm1 — Saturation cleanup uses await SendResponseAsync
W1.2's per-attacher delivery loop awaited the blocking SendResponseAsync,
which would serialise on a wedged late-attacher's full bounded channel
and stall delivery to its peers — contradicting the W1.3 doctrine that
the fan-out path must never await per-pipe writes. Fix: use
TrySendResponse and increment ResponseDropForFullUpstream on drop.
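
A minimal sketch of non-blocking fan-out, assuming each attacher is
backed by a bounded System.Threading.Channels channel in its default
Wait mode. FanOut and Deliver are hypothetical names, and the static
field only stands in for the ResponseDropForFullUpstream counter.

```csharp
using System.Collections.Generic;
using System.Threading;
using System.Threading.Channels;

public static class FanOut
{
    // Stand-in for the ResponseDropForFullUpstream counter.
    public static long ResponseDropForFullUpstream;

    // Offers one response to every attacher without awaiting any of them.
    // Returns how many attachers accepted it.
    public static int Deliver(IEnumerable<ChannelWriter<byte[]>> attachers, byte[] response)
    {
        int delivered = 0;
        foreach (var writer in attachers)
        {
            // TryWrite returns false immediately when the bounded channel is full.
            if (writer.TryWrite(response)) delivered++;
            else Interlocked.Increment(ref ResponseDropForFullUpstream);
        }
        return delivered;
    }
}
```

A wedged attacher costs one failed TryWrite and a counter bump; its
peers still receive the response in the same pass.
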
T2 — WatchdogVsResponse_Race seeded Random fragility
Used `new Random(12345)` over [350, 450) ms with watchdog at 400 ms.
The sequence a seeded Random produces is an implementation detail that
.NET does not guarantee across major versions (the unseeded generator
already moved to xoshiro in .NET 6), so a runtime upgrade could land
all samples on one side of the deadline and break the "both branches
must fire" assertion. Fix: deterministic counter-based alternation
(15 fast + 15 slow across 30 iterations), guaranteed by construction.
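
A minimal sketch of counter-based alternation. RaceSchedule and the
concrete 350 ms / 450 ms delays are assumptions chosen to sit on either
side of the 400 ms watchdog; the actual test may use other values.

```csharp
using System;
using System.Linq;

public static class RaceSchedule
{
    // Assumed values: 350 ms lets the response win, 450 ms lets the
    // 400 ms watchdog win.
    public const int FastMs = 350;
    public const int SlowMs = 450;

    // Even iterations are fast, odd iterations are slow.
    public static int DelayForIteration(int i) => i % 2 == 0 ? FastMs : SlowMs;

    // Builds the full delay schedule for the given number of iterations.
    public static int[] Build(int iterations) =>
        Enumerable.Range(0, iterations).Select(i => DelayForIteration(i)).ToArray();
}
```

With 30 iterations the schedule contains exactly 15 fast and 15 slow
delays on any runtime, because no random source is involved.
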
Latent items NM3 (_supervisorCts leak on re-Start) and NM4 (TCS
single-shot semantics) remain unfixed: no caller re-Starts a supervisor
today, so both become real only if the reconciler ever changes to
re-Start instead of dispose-and-rebuild. Both are documented in the
re-review.
Tests: 387 pass / 0 fail. Three back-to-back race-test runs in
isolation all green (T2 alternation is deterministic).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Eight area-focused reviews (BCD rewriter, multiplexer, response cache,
supervisor + hot-reload, admin + diagnostics, hosting + options, test
suite) plus an Overview that prioritises findings across areas, and a
RemediationPlan that groups the work into three waves with per-item
file:line citations and regression-test sketches.
Findings call out: hot-reload tag-list/cache changes that don't reach the
running multiplexer, a coalescing factory leak that hangs late attachers,
backend-reader head-of-line blocking on a wedged upstream, stranded
outbound frames after cascade, and ShutdownCoordinator double-stop
ordering. They also flag the unconventional 32-bit BCD wire format (two
base-10000 digits in CDAB order, not standard binary), an unreachable
BcdValidationError.DuplicateAddress, an mbproxy.cache.flushed event that
is defined but never emitted, and missing test coverage for
Cache.AllowLongTtl.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>