Comments described the *history* of how the code arrived (phase numbers,
wave IDs, review IDs, dated TODOs) instead of what it does today. That
scaffolding rotted as the codebase evolved. Cleaned 60 source files +
.gitignore; behaviour unchanged (387/387 tests still pass).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Resolves the four critical correctness defects + the ShutdownCoordinator
double-stop ordering bug called out in codereviews/2026-05-14/Overview.md.
Tests: 362 pass / 0 fail (baseline 358 + 4 new W1 regression tests).
W1.1 — Context swap on running multiplexer.
PlcMultiplexer._ctx is now volatile, and a new ReplaceContext() method
re-registers the cache stats provider on the (preserved) counters.
PlcListener exposes its multiplexer; PlcListenerSupervisor.ReplaceContextAsync
swaps the running mux first, then disposes the old cache. Hot-reload
tag-list changes and the cache-flush-on-reload contract now actually take
effect on the next PDU instead of waiting for the next listener fault.
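A minimal sketch of the swap pattern described above, not the actual
PlcMultiplexer code. MuxContext, MultiplexerSketch and HandlePdu are
hypothetical stand-ins, and the sketch assumes a single reloading thread:

```csharp
using System;

// Hypothetical stand-in for the per-PLC context (tag map plus cache).
public sealed class MuxContext
{
    public MuxContext(string tagMapVersion) => TagMapVersion = tagMapVersion;
    public string TagMapVersion { get; }
}

public sealed class MultiplexerSketch
{
    // volatile: every request re-reads the field instead of reusing a
    // stale copy, so a swap is picked up by the next PDU.
    private volatile MuxContext _ctx;

    public MultiplexerSketch(MuxContext initial) => _ctx = initial;

    // Publishes the new context and returns the old one, so the caller
    // can dispose the old cache only after the swap has happened.
    // Assumption: only one thread ever calls this at a time.
    public MuxContext ReplaceContext(MuxContext next)
    {
        MuxContext previous = _ctx;
        _ctx = next;
        return previous;
    }

    // Each PDU takes one snapshot and uses it for the whole request.
    public string HandlePdu()
    {
        MuxContext ctx = _ctx;
        return ctx.TagMapVersion;
    }
}
```

Returning the previous context keeps the "swap first, dispose second"
ordering in the caller's hands.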
W1.2 — Coalescing factory leak.
When the InFlightByKey factory soft-fails (allocator saturation or duplicate
TxId), the cleanup path now calls TryRemove on the stub and walks every party
attached to it (including late attachers) to deliver Modbus exception 0x04.
Previously only the leader got the exception; late attachers waited forever
for a response that no backend round-trip would ever produce.
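The cleanup path can be sketched roughly as follows. This is an
illustration under assumptions, not the real implementation: InFlightStub,
TryAttach, FailAll and CleanUpAfterFactoryFailure are hypothetical names,
and each waiter is modelled as a task that yields the exception code:

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical in-flight stub: one backend request shared by several parties.
public sealed class InFlightStub
{
    private readonly object _gate = new object();
    private readonly List<TaskCompletionSource<byte>> _parties =
        new List<TaskCompletionSource<byte>>();
    private bool _closed;

    // Attaches a waiter. Returns false once the stub has been failed,
    // so nobody can attach to a dead entry and wait forever.
    public bool TryAttach(out Task<byte> waiter)
    {
        var tcs = new TaskCompletionSource<byte>(
            TaskCreationOptions.RunContinuationsAsynchronously);
        lock (_gate)
        {
            if (_closed)
            {
                waiter = Task.FromResult((byte)0);
                return false;
            }
            _parties.Add(tcs);
        }
        waiter = tcs.Task;
        return true;
    }

    // Closes the stub and completes every attached party (leader and late
    // attachers alike) with the given Modbus exception code.
    public int FailAll(byte exceptionCode)
    {
        TaskCompletionSource<byte>[] snapshot;
        lock (_gate)
        {
            _closed = true;
            snapshot = _parties.ToArray();
            _parties.Clear();
        }
        foreach (var tcs in snapshot)
        {
            tcs.TrySetResult(exceptionCode);
        }
        return snapshot.Length;
    }
}

public static class CoalescingSketch
{
    // Modbus exception code 0x04 (server/slave device failure).
    public const byte DeviceFailure = 0x04;

    // Removes the stub for the key and fails everyone waiting on it.
    // Returns the number of parties that were notified.
    public static int CleanUpAfterFactoryFailure(
        ConcurrentDictionary<string, InFlightStub> inFlightByKey, string key)
    {
        return inFlightByKey.TryRemove(key, out var stub)
            ? stub.FailAll(DeviceFailure)
            : 0;
    }
}
```

Closing the stub under the same lock that guards attachment is what keeps
a late attacher from slipping in between the removal and the walk.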
W1.3 — Backend-reader head-of-line block.
UpstreamPipe gains TrySendResponse for non-blocking enqueue. The per-PLC
backend reader's fan-out loop uses it instead of awaiting SendResponseAsync,
so a wedged upstream's full bounded response channel can no longer stall
the single backend reader and starve every other client on that PLC. New
responseDropForFullUpstream counter on ProxyCounters / CounterSnapshot
records the drops.
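A rough sketch of the non-blocking fan-out, assuming the response queue is
a bounded System.Threading.Channels channel in Wait mode (the commit does
not say which full-mode is used). UpstreamPipeSketch and FanOutSketch are
hypothetical names:

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Channels;

public sealed class UpstreamPipeSketch : IDisposable
{
    private readonly Channel<byte[]> _responses;

    public UpstreamPipeSketch(int capacity)
    {
        // In Wait mode, TryWrite returns false when the channel is full
        // instead of silently dropping an item.
        _responses = Channel.CreateBounded<byte[]>(
            new BoundedChannelOptions(capacity)
            {
                FullMode = BoundedChannelFullMode.Wait
            });
    }

    // Non-blocking enqueue: false means the channel is full or completed.
    public bool TrySendResponse(byte[] frame) => _responses.Writer.TryWrite(frame);

    // After completion every TryWrite returns false.
    public void Dispose() => _responses.Writer.TryComplete();
}

public static class FanOutSketch
{
    // Delivers the frame to every pipe without awaiting. A full pipe is
    // counted as a drop and skipped, so it cannot stall the other pipes.
    public static int FanOut(
        IReadOnlyList<UpstreamPipeSketch> pipes,
        byte[] frame,
        ref long responseDropForFullUpstream)
    {
        int delivered = 0;
        foreach (var pipe in pipes)
        {
            if (pipe.TrySendResponse(frame))
            {
                delivered++;
            }
            else
            {
                Interlocked.Increment(ref responseDropForFullUpstream);
            }
        }
        return delivered;
    }
}
```

The trade-off is explicit: a wedged upstream loses responses (and the
counter makes that visible) rather than delaying every other client.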
W1.4 — Stranded outbound frames after cascade.
TearDownBackendAsync acquires _connectGate and drains any frames left in
_outboundChannel after the writer task faulted/cancelled, releasing their
proxy TxIds back to the allocator. Without this, a fresh
EnsureBackendConnectedAsync racing the cascade would send stranded frames
with old TxIds onto the new backend socket; the responses would arrive
with no correlation entry, and the upstream peers would hang until the
watchdog fired at BackendRequestTimeoutMs.
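The drain can be sketched like this. It is an illustration only: it
assumes _connectGate is a SemaphoreSlim and models the allocator as a
release callback; OutboundFrame, BackendSketch and TryEnqueue are
hypothetical names:

```csharp
using System;
using System.Threading;
using System.Threading.Channels;
using System.Threading.Tasks;

// Hypothetical outbound frame carrying the proxy-side transaction id.
public readonly struct OutboundFrame
{
    public OutboundFrame(ushort proxyTxId) => ProxyTxId = proxyTxId;
    public ushort ProxyTxId { get; }
}

public sealed class BackendSketch
{
    // Assumption: the same gate serialises connect and teardown.
    private readonly SemaphoreSlim _connectGate = new SemaphoreSlim(1, 1);
    private readonly Channel<OutboundFrame> _outboundChannel =
        Channel.CreateUnbounded<OutboundFrame>();
    private readonly Action<ushort> _releaseTxId;

    public BackendSketch(Action<ushort> releaseTxId) => _releaseTxId = releaseTxId;

    public bool TryEnqueue(OutboundFrame frame) =>
        _outboundChannel.Writer.TryWrite(frame);

    // Drains frames the writer task never sent and hands their TxIds back
    // to the allocator. Holding the gate means a concurrent reconnect
    // cannot start writing until the queue is empty.
    public async Task<int> TearDownBackendAsync()
    {
        await _connectGate.WaitAsync().ConfigureAwait(false);
        try
        {
            int drained = 0;
            while (_outboundChannel.Reader.TryRead(out OutboundFrame frame))
            {
                _releaseTxId(frame.ProxyTxId);
                drained++;
            }
            return drained;
        }
        finally
        {
            _connectGate.Release();
        }
    }
}
```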
W1.5 — Delete ShutdownCoordinator (Option B).
Drain logic moved into ProxyWorker.StopAsync. AdminEndpointHost is no
longer registered as IHostedService; ProxyWorker drives its lifecycle
directly so admin starts after listeners are bound and stops AFTER the
in-flight drain (the design's documented contract). Admin is resolved
lazily in ExecuteAsync to break the circular DI graph
(Admin -> StatusSnapshotBuilder -> ProxyWorker). GracefulShutdownTimeoutMs
is now read fresh from IOptionsMonitor.CurrentValue at stop time, so a
hot-reloaded value is honoured. Removes ShutdownCoordinator and its tests.
New tests:
PlcMultiplexerTests.ReplaceContext_NewTagMap_VisibleOnNextPdu
PlcMultiplexerTests.ReplaceContext_NewCache_NextReadGoesToBackend_NotOldCache
UpstreamPipeTests.TrySendResponse_WhenChannelFull_ReturnsFalse_WithoutBlocking
UpstreamPipeTests.TrySendResponse_AfterDispose_ReturnsFalse
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>