mbproxy: Wave 2 fixes from 2026-05-14 code review

Resolves the 21 Major findings catalogued in
codereviews/2026-05-14/RemediationPlan.md (Wave 2). Tests: 370 pass / 0 fail
(baseline 363 + 7 new W2 regression tests).

Multiplexer / concurrency:
  W2.1  ConfigReconciler.Attach now threads the live coalescingAccessor through
        to add/restart-built supervisors so a hot-reload of
        ReadCoalescing.{Enabled,MaxParties} propagates to PLCs added or
        restarted via reload.
  W2.2  PlcMultiplexer._disposed and UpstreamPipe._disposed are now volatile
        as a defensive measure for ARM hosts and future portability.
  W2.3  ProxyWorker._supervisors / ConfigReconciler._supervisors switched from
        Dictionary to ConcurrentDictionary; reconciler uses TryRemove. The
        outer Apply is serialised by a semaphore, but the inner Add/Remove/
        Restart Task.WhenAll continuations run in parallel, so the map itself
        must be thread-safe.
  W2.4  Counter parity for cache miss + coalescing-saturation miss documented
        inline (per-design contract; behavior unchanged).
  W2.5  _disposeCts.Dispose() and _connectGate.Dispose() guarded against late
        watchdog ticks.
  W2.6  _connectGate disposed in DisposeAsync.
  W2.7  Inline doc clarifying the post-rewriter FC byte read.
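
The W2.3 change can be illustrated with a minimal sketch. It assumes nothing about the real reconciler: the class name, the method name, and the string stand-in for the supervisor type are all hypothetical. It only shows that parallel Task.WhenAll work can add to and TryRemove from a ConcurrentDictionary safely, which a plain Dictionary does not guarantee.

```csharp
using System.Collections.Concurrent;
using System.Linq;
using System.Threading.Tasks;

public static class SupervisorMapSketch
{
    // Hypothetical stand-in for the reconciler's supervisor map; a string
    // value replaces the real supervisor type.
    public static async Task<int> AddThenRemoveInParallelAsync(int count)
    {
        var supervisors = new ConcurrentDictionary<string, string>();

        // Parallel adds: safe on ConcurrentDictionary, a data race on Dictionary.
        await Task.WhenAll(Enumerable.Range(0, count).Select(i =>
            Task.Run(() => supervisors.TryAdd($"plc-{i}", "supervisor"))));

        // Parallel removal of the even-numbered entries via TryRemove.
        await Task.WhenAll(Enumerable.Range(0, count)
            .Where(i => i % 2 == 0)
            .Select(i => Task.Run(() => supervisors.TryRemove($"plc-{i}", out _))));

        return supervisors.Count;
    }
}
```

With count = 100, the 50 even-numbered keys are removed and 50 entries remain, regardless of how the parallel tasks interleave.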

Cache / hot-reload:
  W2.8  PlcListenerSupervisor.ReplaceContextAsync now calls Clear() to capture
        the entry count, emits mbproxy.cache.flushed, then disposes the old
        cache. Previously the event was defined but never emitted.
  W2.9  Inline doc explaining the implicit "skip cache invalidation while
        recovering" gating (no backend reader during recovery → no FC06/FC16
        response → no invalidation).
  W2.10 ReloadValidator now re-checks resolved per-tag CacheTtlMs against
        Cache.AllowLongTtl after BcdTagMapBuilder folds the per-PLC default.

BCD rewriter:
  W2.11 Duplicate addresses detected within Global itself and within the per-PLC
        Add list itself, BEFORE the working dictionary collapses keys. Cross-list
        collisions (Global vs Add) remain the documented width-override pattern.
        Previously the DuplicateAddress error was unreachable dead code.
  W2.12 OverlappingHighRegister reports each colliding pair exactly once
        (canonicalised low/high pair tracked in a HashSet).
  W2.13 FC16 32-bit write rejects clientLow > 9999 or clientHigh > 9999 BEFORE
        the high*10000+low reconstruction. Without this guard, (high=9999,
        low=9999) silently re-encoded as (high=9998, low=9999), losing 1 from
        the high word.
  W2.14 FC16 validates pdu.Length >= 6 + qty*2 upfront, so a malformed client
        that claims more registers than it ships can no longer cause a
        half-rewritten request.
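
The two FC16 guards above (W2.13, W2.14) can be sketched as follows. This is an illustration under stated assumptions, not the proxy's actual rewriter: the class and method names are hypothetical. The length bound follows the standard FC16 request layout: function code (1) + start address (2) + quantity (2) + byte count (1) + register data (qty * 2).

```csharp
using System;

public static class Fc16GuardSketch
{
    // Largest value a 4-digit decimal (BCD) word can hold.
    public const int MaxWord = 9999;

    // True only when the PDU actually carries all the register data it claims.
    public static bool HasFullRegisterData(ReadOnlySpan<byte> pdu, int qty)
        => pdu.Length >= 6 + qty * 2;

    // Rejects out-of-range words BEFORE combining, so an oversized word can
    // never carry into (or borrow from) its neighbour.
    public static bool TryCombine(ushort clientHigh, ushort clientLow, out uint value)
    {
        value = 0;
        if (clientHigh > MaxWord || clientLow > MaxWord) return false;
        value = (uint)clientHigh * 10000u + clientLow;
        return true;
    }

    // Splits a combined value back into (high, low) decimal words.
    public static (ushort High, ushort Low) Split(uint value)
        => ((ushort)(value / 10000u), (ushort)(value % 10000u));
}
```

Why the range check has to come first: an unguarded (high=5, low=10000) combines to 5*10000 + 10000 = 60000, which splits back as (high=6, low=0). The client's words are silently changed instead of being rejected.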

Supervisor:
  W2.15 WaitForInitialBindAttemptAsync now backed by TaskCompletionSource
        instead of 10ms busy-poll. Resolves race against fast Stopped→Bound→
        Stopped transitions and hangs when the supervisor task throws.
  W2.16 StartAsync refuses re-entry on a non-Stopped supervisor (was leaking
        the previous _supervisorCts).
  W2.17 New TransitionTo helper writes _state, _lastBindError, and (optionally)
        _recoveryAttempts under one lock. Snapshot() reads under the same lock
        so the status page never reports an inconsistent triple. Truncate
        helper extracted (was copy-pasted across three sites).
  W2.18 MbproxyOptionsValidator + ReloadValidator reject
        Connection.{BackendConnectTimeoutMs, BackendRequestTimeoutMs,
        GracefulShutdownTimeoutMs} <= 0. A misconfigured 0 produces immediate
        CancelAfter(0) failures.
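
The W2.15 idea (signal instead of poll) can be sketched as below. This assumes nothing about the real supervisor: the class name and the two Report* methods are hypothetical, and only WaitForInitialBindAttemptAsync is a name taken from the commit message. The sketch shows why a TaskCompletionSource avoids both failure modes: the first outcome is latched, so a fast state flip cannot be missed, and a fault completes the task instead of leaving waiters hanging.

```csharp
using System;
using System.Threading.Tasks;

public sealed class InitialBindSignalSketch
{
    // RunContinuationsAsynchronously keeps awaiters from running inline on
    // the thread that completes the source.
    private readonly TaskCompletionSource<bool> _firstAttempt =
        new TaskCompletionSource<bool>(TaskCreationOptions.RunContinuationsAsynchronously);

    // Completes when the first bind attempt has been reported (true = bound).
    public Task<bool> WaitForInitialBindAttemptAsync() => _firstAttempt.Task;

    // Hypothetical hook: called after each bind attempt. Only the first call
    // has an effect; TrySetResult is a no-op once the task is completed.
    public void ReportBindAttempt(bool bound) => _firstAttempt.TrySetResult(bound);

    // Hypothetical hook: called if the supervisor loop throws before any
    // attempt was reported, so waiters observe the fault.
    public void ReportFault(Exception error) => _firstAttempt.TrySetException(error);
}
```

A polling waiter samples the state every 10 ms and can miss a Stopped→Bound→Stopped sequence that completes between samples. The latched task records the first reported outcome no matter how quickly later transitions follow.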

Hosting / diagnostics:
  W2.20 ProxyWorker.StopAsync supervisor-stop deadline now reads from
        IOptionsMonitor.CurrentValue.Connection.GracefulShutdownTimeoutMs
        (was hard-coded 5s).
  W2.21 src/Mbproxy/appsettings.json deleted; the published file is now a Link
        to install/mbproxy.config.template.json so the binary ships with a
        usable, fully-commented example config instead of an empty stub. Tests
        strip the inherited file from their bin via an AfterTargets="Build"
        Target so they don't pick up the template's example PLCs.
  W2.22 invalidBcdWarnings (PlcPdusStatus) and codeOther (ExceptionCounts)
        added to StatusDto, plumbed through StatusSnapshotBuilder, surfaced
        in StatusHtmlRenderer table cells.
  W2.23 EventLogBridge caches EventLog.SourceExists at construction so Emit
        doesn't hit the registry on every Error+ log line.

New regression tests:
  ReloadValidatorTests:
    Validate_PerTagCacheTtl_Above60s_Without_AllowLongTtl_Fails
    Validate_PerTagCacheTtl_Above60s_With_AllowLongTtl_Passes
    Validate_ResolvedTtl_FromPerPlcDefault_AboveCap_Fails
    Validate_ZeroBackendConnectTimeoutMs_Fails
    Validate_NegativeGracefulShutdownTimeoutMs_Fails
  BcdPduPipelineTests:
    FC16_32Bit_ClientHighOrLowAbove9999_PassesThroughRaw_WithInvalidBcdWarning
    FC16_TruncatedRegisterData_PassesThroughRaw_NoPartialRewrite

Reworked tests in BcdTagMapBuilderTests for the W2.11 contract (Global dup,
Add dup, Add-overrides-Global accepted as width override).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

@@ -88,7 +88,11 @@ internal sealed class PlcMultiplexer : IAsyncDisposable, IMultiplexCountersProvi
 private Task? _backendReaderTask;
 
 private readonly CancellationTokenSource _disposeCts = new();
-private bool _disposed;
+// Phase 12 (W2.2) — volatile so the disposing thread's write is observed by every
+// hot-path reader (OnUpstreamFrameAsync, ReplaceContext, Attach, etc.) without a
+// separate fence. On x86/x64 plain reads happen to give acquire-release semantics, so
+// this is defense for ARM hosts and future portability.
+private volatile bool _disposed;
 private Task? _watchdogTask;
 
 public PlcMultiplexer(
@@ -240,7 +244,11 @@ internal sealed class PlcMultiplexer : IAsyncDisposable, IMultiplexCountersProvi
     }
     _pipes.Clear();
 
-    _disposeCts.Dispose();
+    // Phase 12 (W2.5, W2.6) — guard the CTS dispose against a watchdog tick that
+    // raced past the WaitAsync above (e.g. a slow Task.Delay completion observing
+    // cancellation late). Also dispose the connect-gate semaphore.
+    try { _disposeCts.Dispose(); } catch (ObjectDisposedException) { /* already disposed */ }
+    try { _connectGate.Dispose(); } catch (ObjectDisposedException) { /* already disposed */ }
 }
 
 // ── Backend connect / teardown ────────────────────────────────────────────
@@ -522,9 +530,14 @@ internal sealed class PlcMultiplexer : IAsyncDisposable, IMultiplexCountersProvi
 // cache-eligible (resolvedTtlMs > 0).
 // * FC06/FC16 successful responses invalidate every cached entry whose
 // address range overlaps the write.
+//
+// Phase 12 (W2.7) — exception bit comes from the post-rewriter buffer
+// (the rewriter never touches the FC byte today, but reading from
+// inFlight.Fc would lose the exception bit). The base FC for routing
+// decisions uses inFlight.Fc — the request side knows what was sent.
 if (_ctx.Cache is { } postCache)
 {
-    byte fcInResponse = frame[MbapFrame.HeaderSize]; // post-rewriter, but the FC byte is never rewritten
+    byte fcInResponse = frame[MbapFrame.HeaderSize];
     bool isException = (fcInResponse & 0x80) != 0;
 
     if (!isException)
@@ -555,6 +568,16 @@ internal sealed class PlcMultiplexer : IAsyncDisposable, IMultiplexCountersProvi
 }
 else if (inFlight.Fc is 0x06 or 0x10)
 {
+    // Phase 12 (W2.9) — the design contract "invalidations during a
+    // recovering listener state are skipped" (design.md:203) is
+    // upheld IMPLICITLY here: invalidation only fires inside the
+    // backend reader task when a non-exception FC06/FC16 response
+    // arrives. A `Recovering` listener has no backend reader (the
+    // multiplexer is torn down between recovery attempts), so no
+    // response can land here, so no invalidation. The gating is
+    // structural, not conditional. If a future change ever produces
+    // a write response off the live backend, an explicit recovering-
+    // state check would need to be added.
     int invalidated = postCache.Invalidate(
         inFlight.UnitId, inFlight.StartAddress, inFlight.Qty);
     if (invalidated > 0)
@@ -692,6 +715,12 @@ internal sealed class PlcMultiplexer : IAsyncDisposable, IMultiplexCountersProvi
         return;
     }
 
+    // Per design contract: "miss" = "fell through to coalescing/backend".
+    // When two upstream peers issue the same cache-eligible read, both increment
+    // CacheMiss; only one then opens a backend round-trip (the second coalesces
+    // onto the first via the InFlightByKey path below). So `CacheMiss` does NOT
+    // equal "produced a backend round-trip" — it equals "did not find a fresh
+    // cache entry". The identity `Hit + Miss = cache-eligible requests` holds.
     _ctx.Counters.IncrementCacheMiss();
     CacheLogEvents.Miss(_logger, _plc.Name, unitId, fcByte, startAddr, qty);
 }
@@ -786,7 +815,11 @@ internal sealed class PlcMultiplexer : IAsyncDisposable, IMultiplexCountersProvi
     return;
 }
 
-// Coalesce miss: we just opened a fresh in-flight entry.
+// Coalesce miss: this request did not attach to an in-flight peer. Per the
+// design contract `coalescedHitCount + coalescedMissCount = total FC03/FC04`,
+// so even saturation-failure paths (factory below returns null inFlightForSend)
+// count as a miss — every FC03/FC04 entered the coalescing path exactly once.
+// "Miss" here means "did not coalesce", NOT "produced a backend round-trip".
 _ctx.Counters.IncrementCoalescedMiss();
 CoalescingLogEvents.Miss(_logger, _plc.Name, unitId, fcByte, startAddr, qty);
 

@@ -49,7 +49,9 @@ internal sealed partial class UpstreamPipe : IAsyncDisposable
 // Internal CTS lets the multiplexer signal "drop this pipe now" without waiting for
 // the upstream socket to close cleanly.
 private readonly CancellationTokenSource _cts = new();
-private bool _disposed;
+// Phase 12 (W2.2) — volatile so writes from DisposeAsync are observed by IsAlive /
+// TrySendResponse on other threads without a fence.
+private volatile bool _disposed;
 
 // Phase 9: per-pipe forwarded-PDU counter (replaces the per-pair counter from the
 // 1:1 model). Read by the status page.