mbproxy: close remaining 5 W3 test gaps from 2026-05-14 review

Closes the 5 "easily addable" W3 test gaps left after the prior W3 commit;
the 5 race-hard gaps remain documented as known omissions per the plan.
Tests: 382 pass / 0 fail (baseline 378 + 4 net new methods — the supervisor
runtime-fault test replaces the existing placeholder).

  #11 BcdCodecTests.Encode16_IntMinValue_Throws_OutOfRange_NoArithmeticSurprise
      Locks the (uint)value > Max16 boundary check against int.MinValue. The
      cast becomes 0x80000000 which is well above 9999, so the throw fires
      cleanly. Prevents regression to a two-sided int comparison that would
      underflow.

  #15 BcdPduPipelineTests.FC03_Request_QtyAbove128_AtNonBcdAddress_PassesThroughUnchanged
      DL205/DL260 caps FC03/FC04 at qty=128 (DL260/dl205.md). The proxy must
      NOT truncate the qty field — passing through unchanged lets the PLC's
      own validator return exception 03 to the client (transparent contract
      for FCs/addresses the rewriter doesn't own).

  #4 SupervisorTests.Supervisor_RuntimeFault_OnRunningListener_RecoversAndRebinds
      Replaces the previous placeholder. Genuinely faults the running listener
      mid-life by stopping its underlying TcpListener via reflection (the
      single externally-observable hook to force the accept loop's
      AcceptAsync to throw ObjectDisposedException). Asserts the supervisor
      transitions to Recovering, re-binds via the Polly pipeline, and bumps
      RecoveryAttempts.

  #10 HotReloadE2ETests.E2E_ReadCoalescingEnabled_FlipAtRuntime_PropagatesToOptionsMonitor
      Validates that flipping Mbproxy.Resilience.ReadCoalescing.Enabled at
      runtime via hot-reload propagates through the live IOptionsMonitor.
      The W2.1 fix wires the accessor through to add/restart supervisors;
      the multiplexer reads it per-PDU (unit-tested separately). Proving
      IOptionsMonitor sees the new value is sufficient for the contract.

  #16 ConfigReconcilerTests.Apply_ManyConcurrentReloads_With_PlcChurn_NoCorruption
      Stress-tests the W2.3 ConcurrentDictionary fix. 16 concurrent applies
      cycle through 8 distinct PLC rosters, driving Add+Remove churn against
      the live supervisor dict. Without W2.3 the inner Task.WhenAll
      continuations would corrupt Dictionary<,> and crash with
      KeyNotFoundException / ArgumentException. Asserts every apply succeeds,
      no orphan supervisors remain, and the reload counter equals 16.

The 5 deterministically-race-hard gaps (#5 TxId saturation propagation, #6
coalescing factory leak under saturation, #7 backend-reader head-of-line
block, #8 watchdog↔response race, #9 cascade↔new-accept race) remain open
by design — reproducing those races deterministically requires test seams in
production code or stress-style tests that flake on slow CI. The Wave-1
fixes are still verified at the unit-contract level
(UpstreamPipeTests.TrySendResponse_WhenChannelFull, etc.).

This closes everything actionable in codereviews/2026-05-14/RemediationPlan.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
Joseph Doherty
2026-05-14 06:17:03 -04:00
parent 7ead3581ab
commit 2545237973
5 changed files with 211 additions and 12 deletions
@@ -404,6 +404,27 @@ public sealed class BcdPduPipelineTests
ctx.Counters.Snapshot().RewrittenSlots.ShouldBe(0);
}
/// <summary>
/// Phase 12 (W3 test gap #15) — DL205/DL260 caps FC03/FC04 reads at qty=128 (above
/// Modbus spec's 125; documented in DL260/dl205.md). The proxy must NOT truncate the
/// qty field — a request with qty &gt; 128 at non-BCD addresses must pass through
/// unchanged so the PLC's own validator returns exception 03 to the client. This is
/// the transparent-pass-through contract for FCs and addresses the rewriter doesn't
/// own.
/// </summary>
[Fact]
public void FC03_Request_QtyAbove128_AtNonBcdAddress_PassesThroughUnchanged()
{
var ctx = MakeContext(BcdTag.Create(1024, 16)); // tag elsewhere; not in this read
var req = Fc03Request(address: 5000, qty: 200);
byte[] original = [..req];
Pipeline.Process(MbapDirection.RequestToBackend, ReadOnlySpan<byte>.Empty, req.AsSpan(), ctx);
req.ShouldBe(original, "qty must NOT be truncated; the PLC validates and returns ex03");
ctx.Counters.Snapshot().RewrittenSlots.ShouldBe(0);
}
/// <summary>
/// Phase 12 (W3 test gap) — symmetric inverse of the existing partial-overlap test:
/// the write range starts ON the high register of a 32-bit pair (low word is BEFORE
@@ -6,6 +6,7 @@ using Mbproxy.Proxy.Supervision;
using Microsoft.Extensions.Logging;
using Microsoft.Extensions.Logging.Abstractions;
using Polly;
using Shouldly;
using Xunit;
namespace Mbproxy.Tests.Proxy.Supervision;
@@ -174,28 +175,63 @@ public sealed class SupervisorTests
// ── Test 4: runtime fault triggers recovery ──────────────────────────────────────────
/// <summary>
/// Phase 12 (W3 test gap #4) — replaces the previous placeholder. Genuinely faults
/// the running listener mid-life by stopping its underlying <see cref="TcpListener"/>
/// via reflection (the only externally-observable hook to force the accept loop's
/// <see cref="Socket.AcceptAsync"/> to throw <see cref="ObjectDisposedException"/>).
/// The supervisor must observe the fault, transition to <see cref="SupervisorState.Recovering"/>,
/// and re-bind on the next Polly attempt — emitting one
/// <c>mbproxy.listener.recovered</c> event and bumping <c>RecoveryAttempts</c>.
/// </summary>
[Fact]
public async Task Supervisor_RuntimeFault_TriggersRecovery()
public async Task Supervisor_RuntimeFault_OnRunningListener_RecoversAndRebinds()
{
// This test verifies that a supervisor that starts successfully stays Bound
// and that recovery mechanics are wired. For a full runtime-fault scenario,
// see the E2E tests. Here we verify:
// 1. Supervisor reaches Bound.
// 2. After StopAsync, transitions to Stopped.
// 3. RecoveryAttempts is 0 when no fault occurred.
int port = PickFreePort();
// Fast retry so the test completes in a few hundred ms.
var pipeline = FastRecoveryPipeline(initialMs: 100, steadyMs: 100);
await using var supervisor = BuildSupervisor(port, pipeline);
using var cts = new CancellationTokenSource(TimeSpan.FromSeconds(10));
using var cts = new CancellationTokenSource(TimeSpan.FromSeconds(15));
await supervisor.StartAsync(cts.Token);
await supervisor.WaitForInitialBindAttemptAsync(cts.Token);
Assert.Equal(SupervisorState.Bound, supervisor.Snapshot().State);
var snap = supervisor.Snapshot();
Assert.Equal(SupervisorState.Bound, snap.State);
Assert.Equal(0, snap.RecoveryAttempts);
// Force the accept loop to fault by reaching into PlcListener._listener via
// reflection and calling Stop(). PlcListener.RunAsync's catch handler observes
// the resulting ObjectDisposedException, returns normally without cancellation,
// and the supervisor's "RunAsync returned without cancellation" branch fires —
// transitioning to Recovering and re-throwing into Polly.
var supervisorType = typeof(PlcListenerSupervisor);
var currentListenerField = supervisorType.GetField("_currentListener",
System.Reflection.BindingFlags.NonPublic | System.Reflection.BindingFlags.Instance)!;
var listener = currentListenerField.GetValue(supervisor);
listener.ShouldNotBeNull("_currentListener must be set after Bound");
var listenerType = listener.GetType();
var innerListenerField = listenerType.GetField("_listener",
System.Reflection.BindingFlags.NonPublic | System.Reflection.BindingFlags.Instance)!;
var inner = (TcpListener?)innerListenerField.GetValue(listener);
inner.ShouldNotBeNull("PlcListener._listener must be set after StartAsync");
inner!.Stop();
// Wait for re-bind: state must return to Bound and recoveryAttempts must be ≥ 1.
bool recovered = false;
for (int i = 0; i < 50; i++)
{
var snap = supervisor.Snapshot();
if (snap.State == SupervisorState.Bound && snap.RecoveryAttempts >= 1)
{
recovered = true;
break;
}
await Task.Delay(50, cts.Token);
}
recovered.ShouldBeTrue("supervisor must rebind after the underlying listener is faulted");
var final = supervisor.Snapshot();
final.State.ShouldBe(SupervisorState.Bound);
final.RecoveryAttempts.ShouldBeGreaterThanOrEqualTo(1);
await supervisor.StopAsync(cts.Token);
Assert.Equal(SupervisorState.Stopped, supervisor.Snapshot().State);