mbproxy: close remaining 5 W3 test gaps from 2026-05-14 review

Closes the 5 "easily addable" W3 test gaps left after the prior W3 commit;
the 5 race-hard gaps remain documented as known omissions per the plan.
Tests: 382 pass / 0 fail (baseline 378 + 4 net new methods — the supervisor
runtime-fault test replaces the existing placeholder).

  #11 BcdCodecTests.Encode16_IntMinValue_Throws_OutOfRange_NoArithmeticSurprise
      Locks the (uint)value > Max16 boundary check against int.MinValue. The
      cast becomes 0x80000000 which is well above 9999, so the throw fires
      cleanly. Prevents regression to a two-sided int comparison that would
      underflow.

  #15 BcdPduPipelineTests.FC03_Request_QtyAbove128_AtNonBcdAddress_PassesThroughUnchanged
      DL205/DL260 caps FC03/FC04 at qty=128 (DL260/dl205.md). The proxy must
      NOT truncate the qty field — passing through unchanged lets the PLC's
      own validator return exception 03 to the client (transparent contract
      for FCs/addresses the rewriter doesn't own).

  #4 SupervisorTests.Supervisor_RuntimeFault_OnRunningListener_RecoversAndRebinds
      Replaces the previous placeholder. Genuinely faults the running listener
      mid-life by stopping its underlying TcpListener via reflection (the
      single externally-observable hook to force the accept loop's
      AcceptAsync to throw ObjectDisposedException). Asserts the supervisor
      transitions to Recovering, re-binds via the Polly pipeline, and bumps
      RecoveryAttempts.

  #10 HotReloadE2ETests.E2E_ReadCoalescingEnabled_FlipAtRuntime_PropagatesToOptionsMonitor
      Validates that flipping Mbproxy.Resilience.ReadCoalescing.Enabled at
      runtime via hot-reload propagates through the live IOptionsMonitor.
      The W2.1 fix wires the accessor through to add/restart supervisors;
      the multiplexer reads it per-PDU (unit-tested separately). Proving
      IOptionsMonitor sees the new value is sufficient for the contract.

  #16 ConfigReconcilerTests.Apply_ManyConcurrentReloads_With_PlcChurn_NoCorruption
      Stress-tests the W2.3 ConcurrentDictionary fix. 16 concurrent applies
      cycle through 8 distinct PLC rosters, driving Add+Remove churn against
      the live supervisor dict. Without W2.3 the inner Task.WhenAll
      continuations would corrupt Dictionary<,> and crash with
      KeyNotFoundException / ArgumentException. Asserts every apply succeeds,
      no orphan supervisors remain, and the reload counter equals 16.

The 5 deterministically-race-hard gaps (#5 TxId saturation propagation, #6
coalescing factory leak under saturation, #7 backend-reader head-of-line
block, #8 watchdog↔response race, #9 cascade↔new-accept race) remain open
by design — reproducing those races deterministically requires test seams in
production code or stress-style tests that flake on slow CI. The Wave-1
fixes are still verified at the unit-contract level
(UpstreamPipeTests.TrySendResponse_WhenChannelFull, etc.).

This closes everything actionable in codereviews/2026-05-14/RemediationPlan.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
Joseph Doherty
2026-05-14 06:17:03 -04:00
parent 7ead3581ab
commit 2545237973
5 changed files with 211 additions and 12 deletions
@@ -280,6 +280,72 @@ public sealed class ConfigReconcilerTests : IAsyncDisposable
// under concurrent load.
Assert.Equal(5, counters.ReloadAppliedCount);
}
/// <summary>
/// Phase 12 (W3 test gap #16) — stress-test the W2.3 ConcurrentDictionary fix and the
/// W2.1 coalescing-accessor wiring. Many concurrent Apply calls drive add/remove of
/// many distinct PLCs; without W2.3's ConcurrentDictionary the inner Task.WhenAll
/// continuations would corrupt the dictionary and crash with KeyNotFoundException or
/// ArgumentException. The test asserts: all applies complete, no exceptions are
/// thrown, and the reload counter is exactly the apply count.
/// </summary>
[Fact(Timeout = 30_000)]
public async Task Apply_ManyConcurrentReloads_With_PlcChurn_NoCorruption()
{
// Empty initial — first Apply will Add all PLCs.
var initial = MakeOptions([]);
var monitor = new FakeOptionsMonitor(initial);
var supervisors = new System.Collections.Concurrent.ConcurrentDictionary<string, PlcListenerSupervisor>(StringComparer.Ordinal);
var counters = new ServiceCounters();
var reconciler = BuildReconciler(monitor, counters);
_reconcilers.Add(reconciler);
reconciler.Attach(supervisors, initial);
// Build 8 different option snapshots, each a different PLC roster.
// Each Apply will trigger Add+Remove churn against the live supervisor dict —
// exactly the path that W2.3's ConcurrentDictionary was needed for.
const int snapshots = 8;
const int plcsPerSnapshot = 4;
var snaps = new MbproxyOptions[snapshots];
var allPlcs = new List<PlcOptions>();
for (int s = 0; s < snapshots; s++)
{
var plcsForSnap = new PlcOptions[plcsPerSnapshot];
for (int p = 0; p < plcsPerSnapshot; p++)
{
plcsForSnap[p] = MakePlc($"PLC-{s}-{p}", PickFreePort());
allPlcs.Add(plcsForSnap[p]);
}
snaps[s] = MakeOptions(plcsForSnap);
}
using var cts = new CancellationTokenSource(TimeSpan.FromSeconds(25));
// Fire 16 concurrent applies cycling through the 8 snapshots so each is
// submitted twice. Inner per-PLC Task.WhenAll continuations from W2.3 will run
// in parallel and stress-test the dictionary mutation safety.
var tasks = Enumerable.Range(0, 16)
.Select(i => Task.Run(() => reconciler.ApplyAsync(snaps[i % snapshots], cts.Token), cts.Token))
.ToArray();
var results = await Task.WhenAll(tasks);
Assert.All(results, r => Assert.True(r, "every Apply must succeed"));
Assert.Equal(16, counters.ReloadAppliedCount);
// Final dictionary state: all keys present must come from the last-applied snapshot.
// The "last-applied snapshot" depends on scheduling so we just verify NO orphan
// entries — every supervisor in the dict must correspond to some snapshot's PLCs.
var validNames = new HashSet<string>(allPlcs.Select(p => p.Name));
foreach (var name in supervisors.Keys)
Assert.Contains(name, validNames);
// Track supervisors for cleanup.
foreach (var s in supervisors.Values)
_supervisors.Add(s);
}
}
/// <summary>
@@ -363,6 +363,68 @@ public sealed class HotReloadE2ETests : IAsyncLifetime
await host.StopAsync(stopCts.Token);
}
// ── Phase 12 (W3 test gap #10) — ReadCoalescing.Enabled hot-reload flip ─────────────
/// <summary>
/// W3 — verifies that flipping <c>Mbproxy.Resilience.ReadCoalescing.Enabled</c> at
/// runtime via hot-reload propagates to the live <see cref="IOptionsMonitor{T}"/>
/// snapshot. The W2.1 fix wires the accessor through to add/restart supervisors;
/// the multiplexer reads it per-PDU. Proving the IOptionsMonitor sees the new value
/// is sufficient — the per-PDU read path is unit-tested at the multiplexer level.
/// </summary>
[Fact(Timeout = 8_000)]
public async Task E2E_ReadCoalescingEnabled_FlipAtRuntime_PropagatesToOptionsMonitor()
{
int port = PickFreePort();
int adminPort = PickFreePort();
WriteConfigWithCoalescing(_configPath, port, adminPort, enabled: true);
using var host = BuildHost(_configPath);
using var startCts = new CancellationTokenSource(TimeSpan.FromSeconds(10));
await host.StartAsync(startCts.Token);
await WaitForAsync(() => CanConnect(port), TimeSpan.FromSeconds(5),
"listener should be reachable after startup");
var monitor = host.Services
.GetRequiredService<Microsoft.Extensions.Options.IOptionsMonitor<Mbproxy.Options.MbproxyOptions>>();
monitor.CurrentValue.Resilience.ReadCoalescing.Enabled.ShouldBeTrue(
"initial config sets Enabled=true");
// Flip to false and re-save.
WriteConfigWithCoalescing(_configPath, port, adminPort, enabled: false);
await WaitForAsync(
() => monitor.CurrentValue.Resilience.ReadCoalescing.Enabled == false,
TimeSpan.FromSeconds(5),
"IOptionsMonitor.CurrentValue must reflect Enabled=false after hot-reload");
using var stopCts = new CancellationTokenSource(TimeSpan.FromSeconds(5));
await host.StopAsync(stopCts.Token);
}
private static void WriteConfigWithCoalescing(
string path, int listenPort, int adminPort, bool enabled)
{
var doc = new
{
Mbproxy = new
{
AdminPort = adminPort,
BcdTags = new { Global = Array.Empty<object>() },
Plcs = new[] { new { Name = "PLC-A", ListenPort = listenPort, Host = "127.0.0.1", Port = 502 } },
Connection = new { BackendConnectTimeoutMs = 500, BackendRequestTimeoutMs = 500 },
Resilience = new
{
ReadCoalescing = new { Enabled = enabled, MaxParties = 32 },
},
},
};
string tmp = path + ".tmp";
File.WriteAllText(tmp, JsonSerializer.Serialize(doc, new JsonSerializerOptions { WriteIndented = true }));
File.Move(tmp, path, overwrite: true);
}
private static void WriteConfigWithCacheableTag(
string path, int listenPort, int adminPort, int address, int cacheTtlMs)
{