mbproxy: Wave 2 fixes from 2026-05-14 code review

Resolves the 21 Major findings catalogued in
codereviews/2026-05-14/RemediationPlan.md (Wave 2). Tests: 370 pass / 0 fail
(baseline 363 + 7 new W2 regression tests).

Multiplexer / concurrency:
  W2.1  ConfigReconciler.Attach now threads the live coalescingAccessor through
        to add/restart-built supervisors so a hot-reload of
        ReadCoalescing.{Enabled,MaxParties} propagates to PLCs added or
        restarted via reload.
  W2.2  PlcMultiplexer._disposed and UpstreamPipe._disposed are now volatile
        for ARM/portability defense.
  W2.3  ProxyWorker._supervisors / ConfigReconciler._supervisors switched from
        Dictionary to ConcurrentDictionary; reconciler uses TryRemove. The
        outer Apply is serialised by a semaphore but the inner Add/Remove/
        Restart Task.WhenAll continuations run in parallel.
  W2.4  Counter parity for cache miss + coalescing-saturation miss documented
        inline (per-design contract; behavior unchanged).
  W2.5  _disposeCts.Dispose() and _connectGate.Dispose() guarded against late
        watchdog ticks.
  W2.6  _connectGate disposed in DisposeAsync.
  W2.7  Inline doc clarifying the post-rewriter FC byte read.

Cache / hot-reload:
  W2.8  PlcListenerSupervisor.ReplaceContextAsync now calls Clear() to capture
        the entry count, emits mbproxy.cache.flushed, then disposes the old
        cache. Previously the event was defined but never emitted.
  W2.9  Inline doc explaining the implicit "skip cache invalidation while
        recovering" gating (no backend reader during recovery → no FC06/FC16
        response → no invalidation).
  W2.10 ReloadValidator now re-checks resolved per-tag CacheTtlMs against
        Cache.AllowLongTtl after BcdTagMapBuilder folds the per-PLC default.

BCD rewriter:
  W2.11 Duplicate addresses detected within Global itself and within the per-PLC
        Add list itself, BEFORE the working dictionary collapses keys. Cross-list
        collisions (Global vs Add) remain the documented width-override pattern.
        Previously the DuplicateAddress error was unreachable dead code.
  W2.12 OverlappingHighRegister reports each colliding pair exactly once
        (canonicalised low/high pair tracked in a HashSet).
  W2.13 FC16 32-bit write rejects clientLow > 9999 or clientHigh > 9999 BEFORE
        the high*10000+low reconstruction. Without this guard, (high=9999,
        low=9999) silently re-encoded as (high=9998, low=9999), losing 1 from
        the high word.
  W2.14 FC16 validates pdu.Length >= 6 + qty*2 upfront — no half-rewritten
        requests when a malformed client claims more registers than it ships.

Supervisor:
  W2.15 WaitForInitialBindAttemptAsync now backed by TaskCompletionSource
        instead of 10ms busy-poll. Resolves race against fast Stopped→Bound→
        Stopped transitions and hangs when the supervisor task throws.
  W2.16 StartAsync refuses re-entry on a non-Stopped supervisor (was leaking
        the previous _supervisorCts).
  W2.17 New TransitionTo helper writes _state, _lastBindError, and (optionally)
        _recoveryAttempts under one lock. Snapshot() reads under the same lock
        so the status page never reports an inconsistent triple. Truncate
        helper extracted (was copy-pasted across three sites).
  W2.18 MbproxyOptionsValidator + ReloadValidator reject Connection.{Backend
        ConnectTimeoutMs, BackendRequestTimeoutMs, GracefulShutdownTimeoutMs}
        <= 0. Misconfigured 0 produces immediate CancelAfter(0) failures.

Hosting / diagnostics:
  W2.20 ProxyWorker.StopAsync supervisor-stop deadline now reads from
        IOptionsMonitor.CurrentValue.Connection.GracefulShutdownTimeoutMs
        (was hard-coded 5s).
  W2.21 src/Mbproxy/appsettings.json deleted; the published file is now a Link
        to install/mbproxy.config.template.json so the binary ships with a
        usable, fully-commented example config instead of an empty stub. Tests
        strip the inherited file from their bin via an AfterTargets="Build"
        Target so they don't pick up the template's example PLCs.
  W2.22 invalidBcdWarnings (PlcPdusStatus) and codeOther (ExceptionCounts)
        added to StatusDto, plumbed through StatusSnapshotBuilder, surfaced
        in StatusHtmlRenderer table cells.
  W2.23 EventLogBridge caches EventLog.SourceExists at construction so Emit
        doesn't hit the registry on every Error+ log line.

New regression tests:
  ReloadValidatorTests:
    Validate_PerTagCacheTtl_Above60s_Without_AllowLongTtl_Fails
    Validate_PerTagCacheTtl_Above60s_With_AllowLongTtl_Passes
    Validate_ResolvedTtl_FromPerPlcDefault_AboveCap_Fails
    Validate_ZeroBackendConnectTimeoutMs_Fails
    Validate_NegativeGracefulShutdownTimeoutMs_Fails
  BcdPduPipelineTests:
    FC16_32Bit_ClientHighOrLowAbove9999_PassesThroughRaw_WithInvalidBcdWarning
    FC16_TruncatedRegisterData_PassesThroughRaw_NoPartialRewrite

Reworked tests in BcdTagMapBuilderTests for the W2.11 contract (Global dup,
Add dup, Add-overrides-Global accepted as width override).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
Joseph Doherty
2026-05-14 05:48:44 -04:00
parent ce32c5cee8
commit e66b17fe5f
21 changed files with 545 additions and 163 deletions
@@ -43,10 +43,10 @@ public sealed class StatusHtmlRendererTests
ListenPort: 5020,
Listener: new PlcListenerStatus(state, lastBindError, recoveryAttempts),
Clients: new PlcClientsStatus(clients?.Count ?? 0, clients ?? noClients),
Pdus: new PlcPdusStatus(100, new FcCounts(50, 10, 20, 15, 5), 30, 2),
Pdus: new PlcPdusStatus(100, new FcCounts(50, 10, 20, 15, 5), 30, 2, 0),
Backend: new PlcBackendStatus(
ConnectsSuccess: 0, ConnectsFailed: 0,
ExceptionsByCode: new ExceptionCounts(1, 0, 0, 0),
ExceptionsByCode: new ExceptionCounts(1, 0, 0, 0, 0),
LastRoundTripMs: 3.5,
InFlight: 0, MaxInFlight: 0, TxIdWraps: 0,
DisconnectCascades: 0, QueueDepth: 0,
@@ -98,62 +98,71 @@ public sealed class BcdTagMapBuilderTests
tag.Width.ShouldBe((byte)32);
}
/// <summary>
/// Phase 12 (W2.11) — duplicates within Global itself are now detected
/// pre-collapse and produce a DuplicateAddress error. (Before W2.11 the input
/// dictionary silently collapsed to last-write-wins, leaving the validator dead.)
/// </summary>
[Fact]
public void Build_DuplicateAddressInGlobal_ReturnsDuplicateAddressError()
{
// Two options with the same address in Global.
// The working dictionary collapses them (last-write-wins),
// so a true duplicate is one in Add that matches Global after step 3
// has already resolved — which the builder handles as "Add wins" (no error).
// This test instead validates the case where Global has a structural duplicate
// after the full resolution results in one address appearing twice, which can
// happen if the options list is constructed with the same address twice.
var global = new BcdTagListOptions
{
Global =
[
new BcdTagOptions { Address = 1072, Width = 16 },
new BcdTagOptions { Address = 1072, Width = 32 }, // same address, different width
new BcdTagOptions { Address = 1072, Width = 32 }, // duplicate within Global
]
};
// The dictionary collapses to one entry (last-write-wins in the dictionary).
// A real duplicate-detection scenario: two separately-identical entries through Add.
// Let's construct a true duplicate through the Add path overwriting Global
// and then adding the same address again.
// Actually: our builder uses Dictionary<ushort, BcdTagOptions> which deduplicates
// by key. The DuplicateAddress error fires when seenAddresses already contains addr,
// which can only happen if working has two entries with the same key — but Dictionary
// prevents that. The correct scenario is: two Add entries with the same address in
// the IReadOnlyList (list allows duplication even though dict collapses them).
// Since the builder iterates the list and adds to dict, duplicates in the list
// get silently resolved. The DuplicateAddress error is thus for a theoretical
// future path; let's verify the "Add with same address as existing" path instead.
var result = BcdTagMapBuilder.Build(global, perPlc: null);
// Should resolve cleanly (dict collapses to last write).
result.Errors.ShouldBeEmpty();
result.Map.Count.ShouldBe(1);
result.Errors.ShouldContain(e => e.Kind == BcdValidationError.DuplicateAddress
&& e.Address == 1072);
}
/// <summary>
/// Phase 12 (W2.11) — duplicates within the per-PLC Add list itself are now detected
/// pre-collapse. (Cross-list collisions Global vs Add remain the legitimate width-
/// override pattern and are NOT errors — see the next test.)
/// </summary>
[Fact]
public void Build_DuplicateAddress_Via_AddList_Produces_No_Error_LastWriteWins()
public void Build_DuplicateAddress_Within_AddList_ReturnsDuplicateAddressError()
{
// The Add list has two entries for the same address; builder sees the last one.
// This is intentional: it allows width overrides. No duplicate error expected.
var global = Global((1072, 16));
var perPlc = new PlcBcdOverrides
{
Add =
[
new BcdTagOptions { Address = 1072, Width = 16 },
new BcdTagOptions { Address = 1072, Width = 32 }, // override the first Add
new BcdTagOptions { Address = 1080, Width = 16 },
new BcdTagOptions { Address = 1080, Width = 32 }, // duplicate within Add
],
Remove = [],
};
var result = BcdTagMapBuilder.Build(global, perPlc);
result.Errors.ShouldContain(e => e.Kind == BcdValidationError.DuplicateAddress
&& e.Address == 1080);
}
/// <summary>
/// Phase 12 (W2.11) — same-address entries appearing in BOTH Global AND Add are
/// the documented width-override pattern (design.md "Hybrid tag resolution"). They
/// must NOT be flagged as duplicates; Add wins.
/// </summary>
[Fact]
public void Build_AddOverridesGlobalAtSameAddress_NoDuplicateError_AddWins()
{
var global = Global((1072, 16));
var perPlc = new PlcBcdOverrides
{
Add = [ new BcdTagOptions { Address = 1072, Width = 32 } ],
Remove = [],
};
var result = BcdTagMapBuilder.Build(global, perPlc);
result.Errors.ShouldBeEmpty();
result.Map.TryGet(1072, out var tag).ShouldBeTrue();
tag.Width.ShouldBe((byte)32);
@@ -109,7 +109,7 @@ public sealed class ConfigReconcilerTests : IAsyncDisposable
_supervisors.Add(supA);
await supA.StartAsync(CancellationToken.None);
var supervisors = new Dictionary<string, PlcListenerSupervisor>(StringComparer.Ordinal)
var supervisors = new System.Collections.Concurrent.ConcurrentDictionary<string, PlcListenerSupervisor>(StringComparer.Ordinal)
{
["A"] = supA,
};
@@ -149,7 +149,7 @@ public sealed class ConfigReconcilerTests : IAsyncDisposable
_supervisors.Add(supA);
await supA.StartAsync(CancellationToken.None);
var supervisors = new Dictionary<string, PlcListenerSupervisor>(StringComparer.Ordinal)
var supervisors = new System.Collections.Concurrent.ConcurrentDictionary<string, PlcListenerSupervisor>(StringComparer.Ordinal)
{
["A"] = supA,
};
@@ -207,7 +207,7 @@ public sealed class ConfigReconcilerTests : IAsyncDisposable
await supA.WaitForInitialBindAttemptAsync(waitCts.Token);
Assert.Equal(SupervisorState.Bound, supA.Snapshot().State);
var supervisors = new Dictionary<string, PlcListenerSupervisor>(StringComparer.Ordinal)
var supervisors = new System.Collections.Concurrent.ConcurrentDictionary<string, PlcListenerSupervisor>(StringComparer.Ordinal)
{
["A"] = supA,
};
@@ -245,7 +245,7 @@ public sealed class ConfigReconcilerTests : IAsyncDisposable
var counters = new ServiceCounters();
var reconciler = BuildReconciler(monitor, counters);
_reconcilers.Add(reconciler);
reconciler.Attach(new Dictionary<string, PlcListenerSupervisor>(StringComparer.Ordinal), initial);
reconciler.Attach(new System.Collections.Concurrent.ConcurrentDictionary<string, PlcListenerSupervisor>(StringComparer.Ordinal), initial);
// Fire 5 concurrent Apply calls — they must execute one-at-a-time.
var opts = MakeOptions([]);
@@ -155,4 +155,113 @@ public sealed class ReloadValidatorTests
Assert.False(valid);
Assert.Contains(errors, e => e.Contains("non-empty"));
}
// ── Phase 12 (W2.10) — Cache.AllowLongTtl gate ──────────────────────────────────────
/// <summary>
/// W2 — per-tag CacheTtlMs > 60_000 without Cache.AllowLongTtl is rejected.
/// </summary>
[Fact]
public void Validate_PerTagCacheTtl_Above60s_Without_AllowLongTtl_Fails()
{
var opts = new MbproxyOptions
{
Plcs = [MakePlc("PLC-A", 5020)],
BcdTags = new BcdTagListOptions
{
Global = [ new BcdTagOptions { Address = 1024, Width = 16, CacheTtlMs = 120_000 } ],
},
Cache = new CacheOptions { AllowLongTtl = false },
};
bool valid = ReloadValidator.Validate(opts, out var errors);
Assert.False(valid);
Assert.Contains(errors, e => e.Contains("AllowLongTtl") && e.Contains("60_000"));
}
/// <summary>
/// W2 — same value passes when AllowLongTtl is true (operator opt-in).
/// </summary>
[Fact]
public void Validate_PerTagCacheTtl_Above60s_With_AllowLongTtl_Passes()
{
var opts = new MbproxyOptions
{
Plcs = [MakePlc("PLC-A", 5020)],
BcdTags = new BcdTagListOptions
{
Global = [ new BcdTagOptions { Address = 1024, Width = 16, CacheTtlMs = 120_000 } ],
},
Cache = new CacheOptions { AllowLongTtl = true },
};
bool valid = ReloadValidator.Validate(opts, out var errors);
Assert.True(valid);
Assert.Empty(errors);
}
/// <summary>
/// W2 — per-PLC DefaultCacheTtlMs > 60_000 inherited by a tag with null CacheTtlMs is
/// caught by the resolved-value check even if the per-PLC default check itself passes
/// (it doesn't, but this validates the defensive resolved re-check from W2.10).
/// </summary>
[Fact]
public void Validate_ResolvedTtl_FromPerPlcDefault_AboveCap_Fails()
{
var opts = new MbproxyOptions
{
Plcs = [
new PlcOptions
{
Name = "PLC-A", ListenPort = 5020, Host = "127.0.0.1", Port = 502,
DefaultCacheTtlMs = 90_000,
},
],
BcdTags = new BcdTagListOptions
{
// Tag with no explicit CacheTtlMs — inherits the per-PLC 90_000.
Global = [ new BcdTagOptions { Address = 1024, Width = 16 } ],
},
Cache = new CacheOptions { AllowLongTtl = false },
};
bool valid = ReloadValidator.Validate(opts, out var errors);
Assert.False(valid);
Assert.Contains(errors, e => e.Contains("60_000"));
}
// ── Phase 12 (W2.18) — ConnectionOptions validation ─────────────────────────────────
[Fact]
public void Validate_ZeroBackendConnectTimeoutMs_Fails()
{
var opts = new MbproxyOptions
{
Plcs = [MakePlc("PLC-A", 5020)],
Connection = new ConnectionOptions { BackendConnectTimeoutMs = 0 },
};
bool valid = ReloadValidator.Validate(opts, out var errors);
Assert.False(valid);
Assert.Contains(errors, e => e.Contains("BackendConnectTimeoutMs"));
}
[Fact]
public void Validate_NegativeGracefulShutdownTimeoutMs_Fails()
{
var opts = new MbproxyOptions
{
Plcs = [MakePlc("PLC-A", 5020)],
Connection = new ConnectionOptions { GracefulShutdownTimeoutMs = -1 },
};
bool valid = ReloadValidator.Validate(opts, out var errors);
Assert.False(valid);
Assert.Contains(errors, e => e.Contains("GracefulShutdownTimeoutMs"));
}
}
@@ -31,4 +31,12 @@
<ProjectReference Include="..\..\src\Mbproxy\Mbproxy.csproj" />
</ItemGroup>
<!-- Phase 12 (W2.21) — the linked appsettings.json from Mbproxy.csproj propagates to
the test bin via the project reference. Tests build their own in-memory
configurations and must not pick up the shipped template's example PLCs. The
Target deletes the inherited file from the test bin after build. -->
<Target Name="RemoveInheritedAppsettings" AfterTargets="Build">
<Delete Files="$(OutputPath)appsettings.json" Condition="Exists('$(OutputPath)appsettings.json')" />
</Target>
</Project>
@@ -360,6 +360,50 @@ public sealed class BcdPduPipelineTests
ctx.Counters.Snapshot().RewrittenSlots.ShouldBe(2);
}
/// <summary>
/// Phase 12 (W2.13) — a client writing a 32-bit BCD value where either word exceeds
/// 9999 must NOT be silently mutated by the `high*10000+low` reconstruction. Validation
/// rejects the slot, increments invalidBcdWarnings, and passes the raw bytes through.
/// Without W2.13 the codec would accept e.g. (high=9999, low=9999) → 99_989_999 →
/// re-encode as (high=9998, low=9999), silently losing 1 from the high word.
/// </summary>
[Fact]
public void FC16_32Bit_ClientHighOrLowAbove9999_PassesThroughRaw_WithInvalidBcdWarning()
{
var ctx = MakeContext(BcdTag.Create(800, 32));
// qty=2, low at offset 0, high at offset 1; both at 0xFFFF (= 65535, > 9999).
var pdu = Fc16Request(800, 0xFFFF, 0xFFFF);
byte[] original = [..pdu];
Pipeline.Process(MbapDirection.RequestToBackend, ReadOnlySpan<byte>.Empty, pdu.AsSpan(), ctx);
pdu.ShouldBe(original, "OOR client values must pass through raw, not be silently mutated");
ctx.Counters.Snapshot().InvalidBcdWarnings.ShouldBe(1);
ctx.Counters.Snapshot().RewrittenSlots.ShouldBe(0);
}
/// <summary>
/// Phase 12 (W2.14) — a malformed FC16 request that claims qty=N but ships fewer than
/// 6+N*2 bytes must NOT be partially rewritten. Without W2.14 each individual slot's
/// per-slot bounds check would skip the OOB slot, leaving early slots rewritten and late
/// slots untouched (a half-rewritten request reaching the PLC).
/// </summary>
[Fact]
public void FC16_TruncatedRegisterData_PassesThroughRaw_NoPartialRewrite()
{
var ctx = MakeContext(BcdTag.Create(900, 16));
// Build a normal 1-register write, then trim 1 byte off the end so qty=1 but only
// 1 byte of register data remains.
var pdu = Fc16Request(900, 1234);
byte[] truncated = pdu.AsSpan(0, pdu.Length - 1).ToArray();
byte[] original = [..truncated];
Pipeline.Process(MbapDirection.RequestToBackend, ReadOnlySpan<byte>.Empty, truncated.AsSpan(), ctx);
truncated.ShouldBe(original, "truncated FC16 must pass through raw");
ctx.Counters.Snapshot().RewrittenSlots.ShouldBe(0);
}
[Fact]
public void FC16_WritePartiallyOverlapping32BitPair_PassesThroughRaw_WithPartialWarning()
{