Phase 07 — Status page
Stand up the read-only Kestrel-hosted admin endpoint on Mbproxy.AdminPort. Two routes — GET / (self-contained HTML, meta-refresh 5 s) and GET /status.json (the same data as JSON). No admin actions, no auth.
Depends on: Phase 05 (supervisor snapshots), Phase 06 (config reload counters).
Parallel-safe with: nothing (touches DI registration + needs counters from both 05 and 06).
Goal
A single port that an operator can open in a browser and see, at a glance:
- Service uptime, version, last-reload timestamp + counts.
- Every configured PLC's listener state (bound/recovering/stopped), last bind error, currently connected clients and their per-client PDU counts, PDU counts by function code, BCD slots rewritten, partial-overlap warnings, backend exception counts by code, last round-trip ms, bytes upstream/downstream.
The same data is exposed as /status.json for scraping (Prometheus textfile, custom Nagios check, etc.).
Outputs
src/Mbproxy/Admin/AdminEndpointHost.cs # owns the Kestrel server lifecycle
src/Mbproxy/Admin/StatusSnapshotBuilder.cs # composes per-PLC + service-wide snapshots
src/Mbproxy/Admin/StatusDto.cs # the wire DTOs for /status.json
src/Mbproxy/Admin/StatusHtmlRenderer.cs # builds the single-page HTML
src/Mbproxy/Admin/AssemblyVersionAccessor.cs # cached version string
tests/Mbproxy.Tests/Admin/StatusSnapshotBuilderTests.cs
tests/Mbproxy.Tests/Admin/AdminEndpointTests.cs # HTTP-level; live Kestrel + HttpClient
Modifications:
- src/Mbproxy/Mbproxy.csproj: add the Microsoft.AspNetCore.App framework reference (the Worker SDK doesn't include ASP.NET Core by default).
- src/Mbproxy/Program.cs: register AdminEndpointHost as a hosted service; wire it through DI alongside the proxy worker. AdminPort comes from IOptionsMonitor<MbproxyOptions>.
- src/Mbproxy/Proxy/ProxyCounters.cs: extend with per-client counters; IReadOnlyList<ClientCounterSnapshot> Snapshot() includes connected clients with Remote, ConnectedAtUtc, PdusForwarded, LastRoundTripMs.
- src/Mbproxy/Proxy/PlcConnectionPair.cs: record connect time, expose RemoteEndpoint, track round-trip time per request (EWMA via LastRoundTripMs field).
- Service-wide counters introduced here: ServiceCounters with UptimeStartedAtUtc, LastReloadUtc, ReloadCount, ReloadRejectedCount. Wired into ConfigReconciler (bump on apply / reject) and the service start path (set started-at).
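The service-wide counters described above could look roughly like the sketch below. It is an illustration only: the method names come from the Tasks section, while the snapshot record name (ServiceCountersSnapshot), the tick-based storage, and the "0 means never reloaded" convention are assumptions, not part of the design.

```csharp
using System;
using System.Threading;

// Hypothetical snapshot record; the design only says "Snapshot returns an immutable record".
public sealed record ServiceCountersSnapshot(
    DateTimeOffset UptimeStartedAtUtc,
    DateTimeOffset? LastReloadUtc,
    int ReloadCount,
    int ReloadRejectedCount);

public sealed class ServiceCounters
{
    // Timestamps are kept as UTC ticks so each one can be read and written atomically.
    private long _startedAtTicks;
    private long _lastReloadTicks; // assumption: 0 means "no reload applied yet"
    private int _reloadCount;
    private int _reloadRejectedCount;

    public void Initialize(DateTimeOffset startedAtUtc) =>
        Interlocked.Exchange(ref _startedAtTicks, startedAtUtc.UtcTicks);

    public void RecordReloadApplied(DateTimeOffset appliedAtUtc)
    {
        Interlocked.Exchange(ref _lastReloadTicks, appliedAtUtc.UtcTicks);
        Interlocked.Increment(ref _reloadCount);
    }

    public void RecordReloadRejected() =>
        Interlocked.Increment(ref _reloadRejectedCount);

    // Allocates one record; each field is read atomically, but the set of fields
    // is not captured as a single atomic unit.
    public ServiceCountersSnapshot Snapshot()
    {
        long last = Interlocked.Read(ref _lastReloadTicks);
        return new ServiceCountersSnapshot(
            new DateTimeOffset(Interlocked.Read(ref _startedAtTicks), TimeSpan.Zero),
            last == 0 ? (DateTimeOffset?)null : new DateTimeOffset(last, TimeSpan.Zero),
            Volatile.Read(ref _reloadCount),
            Volatile.Read(ref _reloadRejectedCount));
    }
}
```

In this sketch ConfigReconciler would call RecordReloadApplied or RecordReloadRejected, and the service start path would call Initialize once.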
Tasks
1. StatusDto.cs: record types matching the design's per-PLC + service-wide field tables verbatim. Use System.Text.Json source generation (JsonSerializerContext) to keep the response allocation-light:

       [JsonSerializable(typeof(StatusResponse))]
       internal partial class StatusJsonContext : JsonSerializerContext;

2. StatusSnapshotBuilder.cs: pulls from the injected ProxyWorker (or a slim view of it), ConfigReconciler, ServiceCounters, and each PlcListenerSupervisor. Builds a StatusResponse record. Pure logic; no I/O. The builder is sealed and constructed once via DI; calling Build() is the only operation.
3. StatusHtmlRenderer.cs: pure function string Render(StatusResponse status). Produces a single HTML document with:
   - <meta http-equiv="refresh" content="5"> for auto-refresh.
   - A header line with service version + uptime + last-reload info.
   - A table with one row per PLC. Columns match the per-PLC field set; listener.state is colour-coded inline (CSS in a <style> block, no external assets).
   - Total page weight under 50 KB for typical fleets; the design's 54-PLC count puts the table at ~54 rows.
4. AssemblyVersionAccessor.cs: reads AssemblyInformationalVersionAttribute once at startup, caches it as a string. Used for the service.version field.
5. AdminEndpointHost.cs: IHostedService that:
   - On start: builds a WebApplication (Kestrel) configured to listen on AdminPort. Maps GET / to a handler that calls StatusSnapshotBuilder.Build() then StatusHtmlRenderer.Render(), returning text/html. Maps GET /status.json to a handler returning JsonSerializer.Serialize(snapshot, StatusJsonContext.Default.StatusResponse). NO other routes.
   - If AdminPort is in use at startup: log mbproxy.admin.bind.failed (new event) at Error, do not throw. The proxy listeners continue to run; only the admin endpoint is missing. Operators see this in logs.
   - On hot-reload of AdminPort: stop and restart the Kestrel server bound to the new port.
   - On stop: stop the Kestrel app gracefully (StopAsync) with a 2 s deadline.
6. ServiceCounters.cs (under src/Mbproxy/): a singleton DI service holding the service-wide counters. Initialize(DateTimeOffset startedAtUtc); RecordReloadApplied(DateTimeOffset); RecordReloadRejected(). Snapshot returns an immutable record.
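The AdminEndpointHost task can be sketched as follows. This is a minimal illustration under stated assumptions, not the final implementation: the class name AdminEndpointHostSketch, the two Func<string> delegates (standing in for StatusSnapshotBuilder.Build() plus rendering or serialization), the 0.0.0.0 bind address, and the log message template are all hypothetical. Hot-reload of AdminPort is left out.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.AspNetCore.Builder;
using Microsoft.AspNetCore.Http;
using Microsoft.Extensions.Hosting;
using Microsoft.Extensions.Logging;

internal sealed class AdminEndpointHostSketch : IHostedService
{
    private readonly int _adminPort;
    private readonly Func<string> _renderHtml; // stand-in for Build() + Render()
    private readonly Func<string> _renderJson; // stand-in for Build() + Serialize()
    private readonly ILogger _log;
    private WebApplication? _app;

    public AdminEndpointHostSketch(
        int adminPort, Func<string> renderHtml, Func<string> renderJson, ILogger log)
    {
        _adminPort = adminPort;
        _renderHtml = renderHtml;
        _renderJson = renderJson;
        _log = log;
    }

    public async Task StartAsync(CancellationToken cancellationToken)
    {
        var app = WebApplication.CreateBuilder().Build();
        app.Urls.Add($"http://0.0.0.0:{_adminPort}"); // bind address is an assumption

        // Exactly two routes, both read-only.
        app.MapGet("/", () => Results.Content(_renderHtml(), "text/html"));
        app.MapGet("/status.json", () => Results.Content(_renderJson(), "application/json"));

        try
        {
            await app.StartAsync(cancellationToken);
            _app = app;
        }
        catch (Exception ex) when (ex is not OperationCanceledException)
        {
            // A bind failure must not take the proxy down: log at Error and carry on.
            _log.LogError(ex, "mbproxy.admin.bind.failed port={Port}", _adminPort);
            await app.DisposeAsync();
        }
    }

    public async Task StopAsync(CancellationToken cancellationToken)
    {
        if (_app is null) return; // the endpoint never came up

        // Graceful stop with a 2 s deadline.
        using var deadline = CancellationTokenSource.CreateLinkedTokenSource(cancellationToken);
        deadline.CancelAfter(TimeSpan.FromSeconds(2));
        await _app.StopAsync(deadline.Token);
        await _app.DisposeAsync();
    }
}
```

Catching the failure inside StartAsync keeps the exception from propagating to the generic host, which would otherwise abort startup of the other hosted services.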
Public surface declared in this phase
namespace Mbproxy.Admin;
internal sealed class AdminEndpointHost : IHostedService { /* ... */ }
public sealed record StatusResponse(
ServiceFields Service,
ListenersAggregate Listeners,
IReadOnlyList<PlcStatus> Plcs);
public sealed record ServiceFields(
long UptimeSeconds, string Version,
DateTimeOffset? ConfigLastReloadUtc, int ConfigReloadCount, int ConfigReloadRejectedCount);
public sealed record ListenersAggregate(int Bound, int Configured);
public sealed record PlcStatus(
string Name, string Host, int ListenPort,
PlcListenerStatus Listener,
PlcClientsStatus Clients,
PlcPdusStatus Pdus,
PlcBackendStatus Backend,
PlcBytesStatus Bytes);
public sealed record PlcListenerStatus(string State, string? LastBindError, int RecoveryAttempts);
public sealed record PlcClientsStatus(int Connected, IReadOnlyList<ClientSnapshot> RemoteEndpoints);
public sealed record ClientSnapshot(string Remote, DateTimeOffset ConnectedAtUtc, long PdusForwarded);
public sealed record PlcPdusStatus(long Forwarded, FcCounts ByFc, long RewrittenSlots, long PartialBcdWarnings);
public sealed record FcCounts(long Fc03, long Fc04, long Fc06, long Fc16, long Other);
public sealed record PlcBackendStatus(long ConnectsSuccess, long ConnectsFailed, ExceptionCounts ExceptionsByCode, double LastRoundTripMs);
public sealed record ExceptionCounts(long Code01, long Code02, long Code03, long Code04);
public sealed record PlcBytesStatus(long UpstreamIn, long UpstreamOut);
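To illustrate how records like these pair with System.Text.Json source generation, here is a small self-contained sketch. It uses a trimmed, hypothetical response type (MiniStatusResponse) rather than the full StatusResponse, and the camelCase naming policy is an assumption; the authoritative casing is whatever the field tables in the design specify.

```csharp
using System.Collections.Generic;
using System.Text.Json;
using System.Text.Json.Serialization;

public sealed record ListenersAggregate(int Bound, int Configured);

// Hypothetical, trimmed stand-in for StatusResponse.
public sealed record MiniStatusResponse(
    ListenersAggregate Listeners,
    IReadOnlyList<string> PlcNames);

// Only the root type is registered; nested types are discovered from it.
// CamelCase is an assumption made for this example.
[JsonSourceGenerationOptions(PropertyNamingPolicy = JsonKnownNamingPolicy.CamelCase)]
[JsonSerializable(typeof(MiniStatusResponse))]
internal partial class MiniStatusJsonContext : JsonSerializerContext
{
}

public static class MiniStatusJson
{
    // Serializes through the generated metadata instead of runtime reflection.
    public static string Serialize(MiniStatusResponse status) =>
        JsonSerializer.Serialize(status, MiniStatusJsonContext.Default.MiniStatusResponse);
}
```

Passing the generated type info explicitly, as in the last line, is what keeps the call off the reflection-based path.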
Tests required
Unit (Category = Unit)
StatusSnapshotBuilderTests (≥ 6 tests):
1. Build_NoPlcsConfigured_ReturnsEmptyPlcList
2. Build_OnePlcBound_PopulatesListenerState_Bound
3. Build_PlcRecovering_PopulatesLastBindError_AndAttempts
4. Build_AggregatesListenersBoundAndConfigured
5. Build_PerClientSnapshot_Includes_RemoteAndConnectedAt_AndPduCount
6. Build_ServiceFields_IncludeUptime_Version_AndLastReload
StatusHtmlRendererTests (≥ 3 tests):
1. Render_OnePlc_ProducesValidHtml_WithMetaRefresh
2. Render_RecoveringPlc_HighlightsState
3. Render_PageWeightUnder50KB_For54Plcs: assert character length.
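The three renderer properties under test (meta-refresh, highlighted state, bounded page weight) can be seen in a reduced sketch like the one below. The PlcRow record, the three-column table, and the CSS colours are hypothetical simplifications; the real renderer takes a StatusResponse and covers the full per-PLC field set.

```csharp
using System.Collections.Generic;
using System.Net;
using System.Text;

// Hypothetical, trimmed stand-in for PlcStatus.
public sealed record PlcRow(string Name, string ListenerState, long PdusForwarded);

public static class StatusHtmlRendererSketch
{
    public static string Render(string version, IReadOnlyList<PlcRow> plcs)
    {
        var sb = new StringBuilder();
        sb.Append("<!DOCTYPE html><html><head><meta charset=\"utf-8\">");
        sb.Append("<meta http-equiv=\"refresh\" content=\"5\">");
        // Inline CSS only; the listener state doubles as the CSS class name.
        sb.Append("<style>.bound{color:green}.recovering{color:orange}.stopped{color:red}</style>");
        sb.Append("</head><body><p>mbproxy ").Append(WebUtility.HtmlEncode(version)).Append("</p>");
        sb.Append("<table><tr><th>PLC</th><th>Listener</th><th>PDUs</th></tr>");
        foreach (var plc in plcs)
        {
            // Encode everything that originates from config or the network.
            var state = WebUtility.HtmlEncode(plc.ListenerState);
            sb.Append("<tr><td>").Append(WebUtility.HtmlEncode(plc.Name)).Append("</td>")
              .Append("<td class=\"").Append(state).Append("\">").Append(state).Append("</td>")
              .Append("<td>").Append(plc.PdusForwarded).Append("</td></tr>");
        }
        sb.Append("</table></body></html>");
        return sb.ToString();
    }
}
```

A page-weight test in this style would render 54 rows and assert that the returned string's Length stays below 50 * 1024.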
E2E (Category = E2E)
AdminEndpointTests (≥ 5 tests, against a live in-process Kestrel + simulator):
1. Get_StatusJson_ReturnsValidShape
2. Get_StatusJson_AfterReadFC03_ShowsPduCountIncreased
3. Get_StatusJson_AfterPartialBcdWrite_ShowsPartialBcdWarning
4. Get_Root_ReturnsHtml_WithMetaRefresh
5. AdminPort_BindFailure_ServiceStaysUp_AndLogsBindFailed: pre-bind the AdminPort, start the service, assert proxy listeners come up and the admin endpoint logs the failure.
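For the bind-failure test, "pre-bind the AdminPort" can be done with a plain TcpListener. The helper below is a sketch with hypothetical names (PortSquatter, Occupy, CanBind); it binds on loopback, so a real test would need to occupy whichever address the admin endpoint actually binds.

```csharp
using System.Net;
using System.Net.Sockets;

public static class PortSquatter
{
    // Binds an ephemeral loopback port and keeps it occupied until the
    // returned listener is stopped. The chosen port is reported via 'port'.
    public static TcpListener Occupy(out int port)
    {
        var listener = new TcpListener(IPAddress.Loopback, 0);
        listener.Start();
        port = ((IPEndPoint)listener.LocalEndpoint).Port;
        return listener;
    }

    // Probes whether a second listener could bind the same loopback port.
    public static bool CanBind(int port)
    {
        var probe = new TcpListener(IPAddress.Loopback, port);
        try
        {
            probe.Start();
            return true;
        }
        catch (SocketException)
        {
            return false;
        }
        finally
        {
            probe.Stop();
        }
    }
}
```

The test would call Occupy, configure the returned port as AdminPort, start the service, and then assert on the proxy listeners and the captured mbproxy.admin.bind.failed log event before stopping the squatter.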
Phase gate
- Zero-warnings build.
- All phase 00–06 tests still green.
- All new unit + e2e tests green.
- /status.json shape matches the field tables in ../design.md → "Status page" exactly (field names, casing, nesting).
- Counters on the read path (PdusForwarded, etc.) remain allocation-free; Snapshot() is the only allocating call and it's on the cold path.
- AdminPort collision is logged but does NOT take down the proxy.
- Hot-reload of AdminPort works (verified by adding a test in this phase or extending one of phase 06's e2e tests).
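The allocation-free counter requirement in the gate can be pictured with a sketch like this one. The class and record names (PduCountersSketch, PduCountersSnapshot) are hypothetical; the function-code buckets follow the FcCounts record declared earlier.

```csharp
using System.Threading;

// Hypothetical snapshot record mirroring FcCounts plus a total.
public sealed record PduCountersSnapshot(
    long Forwarded, long Fc03, long Fc04, long Fc06, long Fc16, long Other);

public sealed class PduCountersSketch
{
    private long _forwarded, _fc03, _fc04, _fc06, _fc16, _other;

    // Hot path: only interlocked increments on fields, no allocation.
    public void RecordForwarded(byte functionCode)
    {
        Interlocked.Increment(ref _forwarded);
        switch (functionCode)
        {
            case 0x03: Interlocked.Increment(ref _fc03); break;
            case 0x04: Interlocked.Increment(ref _fc04); break;
            case 0x06: Interlocked.Increment(ref _fc06); break;
            case 0x10: Interlocked.Increment(ref _fc16); break; // FC16 is 0x10
            default: Interlocked.Increment(ref _other); break;
        }
    }

    // Cold path: the single allocation is the returned record.
    public PduCountersSnapshot Snapshot() => new PduCountersSnapshot(
        Interlocked.Read(ref _forwarded),
        Interlocked.Read(ref _fc03),
        Interlocked.Read(ref _fc04),
        Interlocked.Read(ref _fc06),
        Interlocked.Read(ref _fc16),
        Interlocked.Read(ref _other));
}
```

Because the status page refreshes every 5 s, the snapshot allocation happens at most a few times per refresh interval per viewer, far from the per-PDU path.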
Out of scope
- Authentication / authorisation on the admin port. Design explicitly defers to network-layer trust.
- Prometheus exposition format. The /status.json shape is the contract; downstream tools can transform.
- WebSocket push of counters. Meta-refresh is good enough at 54 PLCs.
- Historical counter retention (rolling windows, time series). Counters are cumulative since process start; restart resets.
- Per-tag-level telemetry (which BCD addresses got rewritten how often). The per-PLC RewrittenSlots total is enough; finer granularity goes in a future phase if needed.
Notes for the subagent
- Use the minimal-API style for the two endpoints; no controllers. The whole admin endpoint is ~50 lines of map / handler code.
- System.Text.Json source generation needs [JsonSerializable] on the JsonSerializerContext for the root of the DTO chain. Don't use reflection-based serialization in this codebase: it is not AOT-safe and is slower for the simple shape.
- For the HTML page, embed CSS in a <style> block. Do not link external stylesheets; the admin endpoint must work over a firewalled network with no internet egress.
- Test 3 of AdminEndpointTests requires triggering a partial-BCD warning, which means configuring a 32-bit BCD tag and reading only one half of it through the proxy. This is the same scenario phase 04's e2e test 5 exercised; reuse the setup.
- The admin port collision test is important: an operator misconfiguration must not take down the proxy itself. Log Error, continue running.