ce32c5cee8
Resolves the four critical correctness defects + the ShutdownCoordinator double-stop ordering bug called out in codereviews/2026-05-14/Overview.md.

Tests: 362 pass / 0 fail (baseline 358 + 4 new W1 regression tests).

W1.1: Context swap on running multiplexer. PlcMultiplexer._ctx becomes volatile with a new ReplaceContext() method that re-registers the cache stats provider on the (preserved) counters. PlcListener exposes its multiplexer; PlcListenerSupervisor.ReplaceContextAsync swaps the running mux first, then disposes the old cache. Hot-reload tag-list changes and the cache-flush-on-reload contract now actually take effect on the next PDU instead of waiting for the next listener fault.

W1.2: Coalescing factory leak. When the InFlightByKey factory soft-fails (allocator saturation or duplicate TxId), the cleanup path now TryRemoves the stub and walks every party on it (including late attachers) to deliver Modbus exception 0x04. Previously only the leader got the exception; late attachers waited forever for a response that no backend round-trip would ever fire.

W1.3: Backend-reader head-of-line block. UpstreamPipe gains TrySendResponse for non-blocking enqueue. The per-PLC backend reader's fan-out loop uses it instead of awaiting SendResponseAsync, so a wedged upstream's full bounded response channel can no longer stall the single backend reader and starve every other client on that PLC. New responseDropForFullUpstream counter on ProxyCounters / CounterSnapshot records the drops.

W1.4: Stranded outbound frames after cascade. TearDownBackendAsync acquires _connectGate and drains any frames left in _outboundChannel after the writer task faulted/cancelled, releasing their proxy TxIds back to the allocator. Without this, a fresh EnsureBackendConnectedAsync racing the cascade would send stranded frames with old TxIds onto the new backend socket; the responses would arrive with no correlation entry and the upstream peers would hang on the watchdog until BackendRequestTimeoutMs.

W1.5: Delete ShutdownCoordinator (Option B). Drain logic moved into ProxyWorker.StopAsync. AdminEndpointHost is no longer registered as IHostedService; ProxyWorker drives its lifecycle directly so admin starts after listeners are bound and stops AFTER the in-flight drain (the design's documented contract). Admin is resolved lazily in ExecuteAsync to break the circular DI graph (Admin -> StatusSnapshotBuilder -> ProxyWorker). GracefulShutdownTimeoutMs is now read fresh from IOptionsMonitor.CurrentValue at stop time, so a hot-reloaded value is honoured. Removes ShutdownCoordinator + tests.

New tests:
  PlcMultiplexerTests.ReplaceContext_NewTagMap_VisibleOnNextPdu
  PlcMultiplexerTests.ReplaceContext_NewCache_NextReadGoesToBackend_NotOldCache
  UpstreamPipeTests.TrySendResponse_WhenChannelFull_ReturnsFalse_WithoutBlocking
  UpstreamPipeTests.TrySendResponse_AfterDispose_ReturnsFalse

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
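The W1.3 change can be illustrated with a bounded `System.Threading.Channels` channel. This is a minimal sketch under stated assumptions, not the repository's `UpstreamPipe`: the `byte[]` frame type, the capacities, and the `NewPipe` and `FanOut` helpers are hypothetical; only the `TrySendResponse` name and the idea of counting drops come from the commit message.

```csharp
using System;
using System.Threading.Channels;

// Hypothetical helper: a bounded response queue. With FullMode.Wait,
// TryWrite returns false (rather than blocking) when the queue is full.
Channel<byte[]> NewPipe(int capacity) =>
    Channel.CreateBounded<byte[]>(new BoundedChannelOptions(capacity)
    {
        FullMode = BoundedChannelFullMode.Wait,
    });

var wedged = NewPipe(1);   // stands in for an upstream that never reads
var healthy = NewPipe(8);  // stands in for an upstream that keeps up
long dropsForFullUpstream = 0;

// Non-blocking enqueue: returns false instead of waiting for space.
bool TrySendResponse(Channel<byte[]> pipe, byte[] frame) => pipe.Writer.TryWrite(frame);

// Fan-out as a single backend reader might do it: a full pipe costs one
// dropped response and a counter bump, never a stalled reader.
void FanOut(byte[] frame)
{
    foreach (var pipe in new[] { wedged, healthy })
    {
        if (!TrySendResponse(pipe, frame))
        {
            dropsForFullUpstream++;
        }
    }
}

FanOut(new byte[] { 0x01 });
FanOut(new byte[] { 0x02 });

// Prints: drops=1, healthy queued=2
Console.WriteLine($"drops={dropsForFullUpstream}, healthy queued={healthy.Reader.Count}");
```

The second frame is dropped for the wedged pipe only; the healthy pipe still receives both frames, which is the head-of-line property the commit describes.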
234 lines
9.2 KiB
C#
using System.Text.Json;
using Microsoft.AspNetCore.Builder;
using Microsoft.AspNetCore.Hosting;
using Microsoft.AspNetCore.Http;
using Microsoft.Extensions.Options;
using Mbproxy.Options;

namespace Mbproxy.Admin;

/// <summary>
/// Owns the Kestrel-backed admin HTTP endpoint. Driven by <see cref="Proxy.ProxyWorker"/>
/// (which calls <see cref="StartAsync"/> after listeners are up and <see cref="StopAsync"/>
/// at the END of graceful shutdown: supervisors stop and drain first, admin stops last).
///
/// <para>Lifecycle:</para>
/// <list type="bullet">
///   <item><see cref="StartAsync"/> builds a <see cref="WebApplication"/> bound to
///   <c>Mbproxy.AdminPort</c> and starts it non-blocking.</item>
///   <item>If the bind fails (port in use, etc.), logs <c>mbproxy.admin.bind.failed</c>
///   at Error and continues; the proxy listeners are unaffected.</item>
///   <item>If <c>AdminPort</c> changes via hot-reload, the current app is stopped and a
///   new one is started on the new port. Other config changes are ignored here.</item>
///   <item><see cref="StopAsync"/> shuts down the current Kestrel app with a 2 s deadline.</item>
/// </list>
///
/// <para>Routes (exactly two): <c>GET /</c> (HTML) and <c>GET /status.json</c> (JSON).</para>
///
/// <para><b>Phase 12 (W1.5)</b>: was previously also registered as <see cref="IHostedService"/>,
/// but the host's automatic stop ordering (reverse of registration) ran admin.StopAsync
/// BEFORE ProxyWorker.StopAsync, which broke the design's "drain THEN stop admin" guarantee
/// and caused a double-stop with the now-deleted <c>ShutdownCoordinator</c>. Now a plain
/// singleton with explicit lifecycle calls from ProxyWorker.</para>
/// </summary>
internal sealed partial class AdminEndpointHost : IAsyncDisposable
{
    private readonly IOptionsMonitor<MbproxyOptions> _optionsMonitor;
    private readonly StatusSnapshotBuilder _builder;
    private readonly ILoggerFactory _loggerFactory;
    private readonly ILogger<AdminEndpointHost> _logger;

    // The currently-running Kestrel app; null when stopped or when bind failed.
    private WebApplication? _app;

    // Protects concurrent Start/Stop calls (hot-reload + StopAsync racing).
    private readonly SemaphoreSlim _lock = new(1, 1);

    // Current configured port; used to detect changes on hot-reload.
    private int _currentPort;

    // Subscription token for IOptionsMonitor.OnChange.
    private IDisposable? _optionsChangeRegistration;

    public AdminEndpointHost(
        IOptionsMonitor<MbproxyOptions> optionsMonitor,
        StatusSnapshotBuilder builder,
        ILoggerFactory loggerFactory)
    {
        _optionsMonitor = optionsMonitor;
        _builder = builder;
        _loggerFactory = loggerFactory;
        _logger = loggerFactory.CreateLogger<AdminEndpointHost>();
    }

    public async Task StartAsync(CancellationToken cancellationToken)
    {
        _currentPort = _optionsMonitor.CurrentValue.AdminPort;

        await StartAppAsync(_currentPort, cancellationToken).ConfigureAwait(false);

        // Subscribe to config changes: if AdminPort changes, re-bind.
        _optionsChangeRegistration = _optionsMonitor.OnChange(opts =>
        {
            int newPort = opts.AdminPort;
            if (newPort == _currentPort) return; // Only care about AdminPort changes.

            // Fire-and-forget: re-bind is async; we can't await in OnChange.
            _ = Task.Run(async () =>
            {
                await _lock.WaitAsync().ConfigureAwait(false);
                try
                {
                    if (newPort == _currentPort) return; // double-check under lock

                    // Stop the old app.
                    await StopCurrentAppAsync().ConfigureAwait(false);

                    _currentPort = newPort;

                    // Start on the new port.
                    await StartAppAsync(newPort, CancellationToken.None).ConfigureAwait(false);
                }
                finally
                {
                    _lock.Release();
                }
            });
        });
    }

    public async Task StopAsync(CancellationToken cancellationToken)
    {
        _optionsChangeRegistration?.Dispose();
        _optionsChangeRegistration = null;

        await _lock.WaitAsync(cancellationToken).ConfigureAwait(false);
        try
        {
            await StopCurrentAppAsync().ConfigureAwait(false);
        }
        finally
        {
            _lock.Release();
        }
    }

    // ── Internal helpers ─────────────────────────────────────────────────────

    /// <summary>
    /// Builds and starts a Kestrel <see cref="WebApplication"/> on <paramref name="port"/>.
    /// On bind failure, logs the error and sets <c>_app = null</c>; does NOT throw.
    /// Caller must hold <c>_lock</c> or be in a single-threaded context (StartAsync).
    /// </summary>
    private async Task StartAppAsync(int port, CancellationToken ct)
    {
        // Tracked outside the try so a failed start can release the half-built app.
        WebApplication? app = null;
        try
        {
            // Use CreateSlimBuilder with explicit (empty) args so the admin app does not
            // inherit the process's command-line arguments (e.g. --urls). ASPNETCORE_URLS
            // is handled by the explicit Listen below.
            var builder = WebApplication.CreateSlimBuilder(new WebApplicationOptions
            {
                Args = [],
            });

            // Suppress Kestrel/ASP.NET Core built-in logging; forward to the outer host's
            // logger factory so that admin-endpoint errors appear in the proxy's log stream.
            builder.Logging.ClearProviders();
            builder.Logging.AddProvider(new ForwardingLoggerProvider(_loggerFactory));

            // Explicit Kestrel listen: overrides any ASPNETCORE_URLS that leaked in.
            builder.WebHost.UseKestrel(k =>
            {
                k.Listen(System.Net.IPAddress.Any, port);
            });

            app = builder.Build();

            // ── Routes ───────────────────────────────────────────────────────
            app.MapGet("/", (HttpContext ctx) =>
            {
                var snapshot = _builder.Build();
                string html = StatusHtmlRenderer.Render(snapshot);
                return Results.Content(html, "text/html; charset=utf-8");
            });

            app.MapGet("/status.json", (HttpContext ctx) =>
            {
                var snapshot = _builder.Build();
                string json = JsonSerializer.Serialize(snapshot, StatusJsonContext.Default.StatusResponse);
                return Results.Content(json, "application/json");
            });

            await app.StartAsync(ct).ConfigureAwait(false);
            _app = app;

            LogAdminStarted(_logger, port);
        }
        catch (Exception ex) when (ex is not OperationCanceledException)
        {
            // Bind failed: log and continue. Proxy listeners are unaffected.
            LogAdminBindFailed(_logger, port, ex.Message);
            _app = null;

            // Dispose the app that failed to start so repeated bind failures
            // (e.g. across hot-reloads) do not leak its service provider.
            if (app is not null)
            {
                try
                {
                    await app.DisposeAsync().ConfigureAwait(false);
                }
                catch
                {
                    // Best-effort.
                }
            }
        }
    }

    /// <summary>
    /// Stops the current <see cref="WebApplication"/> with a 2 s deadline, then disposes it.
    /// </summary>
    private async Task StopCurrentAppAsync()
    {
        if (_app is null) return;

        var app = _app;
        _app = null;

        try
        {
            using var stopCts = new CancellationTokenSource(TimeSpan.FromSeconds(2));
            await app.StopAsync(stopCts.Token).ConfigureAwait(false);
        }
        catch
        {
            // Best-effort.
        }

        await app.DisposeAsync().ConfigureAwait(false);
    }

    // ── IAsyncDisposable ─────────────────────────────────────────────────────

    public async ValueTask DisposeAsync()
    {
        _optionsChangeRegistration?.Dispose();
        _lock.Dispose();

        if (_app is { } app)
        {
            _app = null;
            await app.DisposeAsync().ConfigureAwait(false);
        }
    }

    // ── Logging ──────────────────────────────────────────────────────────────

    [LoggerMessage(EventId = 70, EventName = "mbproxy.admin.started",
        Level = LogLevel.Information,
        Message = "Admin endpoint started on port {Port}")]
    private static partial void LogAdminStarted(ILogger logger, int port);

    [LoggerMessage(EventId = 71, EventName = "mbproxy.admin.bind.failed",
        Level = LogLevel.Error,
        Message = "Admin endpoint bind failed — admin page will be unavailable: Port={Port} Reason={Reason}")]
    private static partial void LogAdminBindFailed(ILogger logger, int port, string reason);

    // ── Inner logger provider (forwards Kestrel/ASP.NET logs to the proxy's factory) ────

    private sealed class ForwardingLoggerProvider : ILoggerProvider
    {
        private readonly ILoggerFactory _factory;
        public ForwardingLoggerProvider(ILoggerFactory factory) => _factory = factory;
        public ILogger CreateLogger(string categoryName) => _factory.CreateLogger(categoryName);
        public void Dispose() { }
    }
}
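
The "drain THEN stop admin" ordering that this class relies on can be sketched as follows. This is an illustration under assumptions, not the actual `ProxyWorker.StopAsync`: the three stage functions, the `readTimeoutMs` delegate (standing in for reading `GracefulShutdownTimeoutMs` at stop time), and the recorded step names are all hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

var order = new List<string>();

// Hypothetical stand-ins for the real shutdown stages.
Task StopAcceptingAsync() { order.Add("listeners stopped"); return Task.CompletedTask; }

async Task DrainInFlightAsync(CancellationToken ct)
{
    await Task.Delay(10, ct); // pretend some requests are still in flight
    order.Add("in-flight drained");
}

Task StopAdminAsync() { order.Add("admin stopped"); return Task.CompletedTask; }

// The timeout is read through a delegate at stop time, so a value changed
// after startup is the one that applies.
async Task StopAsync(Func<int> readTimeoutMs)
{
    await StopAcceptingAsync();

    using var drainCts = new CancellationTokenSource(TimeSpan.FromMilliseconds(readTimeoutMs()));
    try
    {
        await DrainInFlightAsync(drainCts.Token);
    }
    catch (OperationCanceledException)
    {
        order.Add("drain timed out");
    }

    // Admin goes last so the status page stays reachable during the drain.
    await StopAdminAsync();
}

await StopAsync(() => 5000);

// Prints: listeners stopped -> in-flight drained -> admin stopped
Console.WriteLine(string.Join(" -> ", order));
```

Because the admin stop is an explicit call at the end of the sequence, its position does not depend on host registration order, which is the property the W1.5 note in the class summary describes.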