56eee3c563
Adds the mbproxy service end-to-end.

Phases 00-08 implement the production-ready single-listener / 1:1-backend transparent Modbus TCP proxy with bidirectional BCD rewriting for the ~54-PLC DL205/DL260 fleet. Phase 9 replaces the connection layer with a single backend socket per PLC plus MBAP TxId rewriting, lifting the H2-ECOM100's 4-concurrent-client cap as an operational ceiling.

Phase 9 additions of note:

- PlcMultiplexer + UpstreamPipe + TxIdAllocator + CorrelationMap
- InFlightRequest with `IReadOnlyList<InterestedParty>` (load-bearing for Phase 10 read coalescing; do not collapse to a single field)
- Per-request watchdog: surfaces Modbus exception 0x0B to upstream on BackendRequestTimeoutMs, defending against lost responses, dead-PLC paths, and pymodbus 3.13.0's concurrent-multiplexed-request bug (its ServerRequestHandler.last_pdu state race)
- Status DTO + HTML gain inFlight / maxInFlight / txIdWraps / disconnectCascades / queueDepth (Tier 1.6 in docs/kpi.md)

Tests: 263 unit + 38 E2E. Multiplexer correctness under truly concurrent backend traffic is proved against a stub backend in PlcMultiplexerTests; MultiplexerE2ETests paces requests so pymodbus 3.13's single-PDU framer stays in known-good mode.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
148 lines
9.5 KiB
Markdown
# Phase 07 — Status page

Stand up the read-only Kestrel-hosted admin endpoint on `Mbproxy.AdminPort`. Two routes: `GET /` (self-contained HTML, meta-refresh 5 s) and `GET /status.json` (the same data as JSON). No admin actions, no auth.

**Depends on:** Phase 05 (supervisor snapshots), Phase 06 (config reload counters).

**Parallel-safe with:** nothing (touches DI registration + needs counters from both 05 and 06).

## Goal

A single port that an operator can open in a browser and see, at a glance:

- Service uptime, version, last-reload timestamp + counts.
- Every configured PLC's listener state (`bound` / `recovering` / `stopped`), last bind error, currently connected clients and their per-client PDU counts, PDU counts by function code, BCD slots rewritten, partial-overlap warnings, backend exception counts by code, last round-trip ms, bytes upstream/downstream.

Same data is exposed as `/status.json` for scraping (Prometheus textfile, custom Nagios check, etc.).

## Outputs

```
src/Mbproxy/Admin/AdminEndpointHost.cs        # owns the Kestrel server lifecycle
src/Mbproxy/Admin/StatusSnapshotBuilder.cs    # composes per-PLC + service-wide snapshots
src/Mbproxy/Admin/StatusDto.cs                # the wire DTOs for /status.json
src/Mbproxy/Admin/StatusHtmlRenderer.cs       # builds the single-page HTML
src/Mbproxy/Admin/AssemblyVersionAccessor.cs  # cached version string

tests/Mbproxy.Tests/Admin/StatusSnapshotBuilderTests.cs
tests/Mbproxy.Tests/Admin/AdminEndpointTests.cs   # HTTP-level; live Kestrel + HttpClient
```

Modifications:

- `src/Mbproxy/Mbproxy.csproj`: add `Microsoft.AspNetCore.App` framework reference (the Worker SDK doesn't include ASP.NET Core by default).
- `src/Mbproxy/Program.cs`: register `AdminEndpointHost` as a hosted service; wire it through DI alongside the proxy worker. `AdminPort` comes from `IOptionsMonitor<MbproxyOptions>`.
- `src/Mbproxy/Proxy/ProxyCounters.cs`: extend with per-client counters: `IReadOnlyList<ClientCounterSnapshot> Snapshot()` includes connected clients with `Remote`, `ConnectedAtUtc`, `PdusForwarded`, `LastRoundTripMs`.
- `src/Mbproxy/Proxy/PlcConnectionPair.cs`: record connect time, expose `RemoteEndpoint`, track round-trip time per request (EWMA via `LastRoundTripMs` field).
- Service-wide counters introduced here: `ServiceCounters` with `UptimeStartedAtUtc`, `LastReloadUtc`, `ReloadCount`, `ReloadRejectedCount`. Wired into `ConfigReconciler` (bump on apply / reject) and the service start path (set started-at).

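The round-trip tracking in the `PlcConnectionPair` item could be kept allocation-free roughly as follows. This is only a sketch under assumptions: the `RoundTripTracker` type and the smoothing factor of 0.2 are hypothetical and do not come from the design; only the `PdusForwarded` and `LastRoundTripMs` names are taken from the list above.

```csharp
using System;
using System.Threading;

// Hypothetical helper, not part of the mbproxy codebase.
public sealed class RoundTripTracker
{
    private const double Alpha = 0.2; // assumed smoothing factor

    private long _pdusForwarded;
    // The EWMA is stored as raw bits so it can be updated with a 64-bit CAS.
    private long _ewmaBits = BitConverter.DoubleToInt64Bits(double.NaN);

    public long PdusForwarded => Interlocked.Read(ref _pdusForwarded);

    // Returns 0 until the first sample has been recorded.
    public double LastRoundTripMs
    {
        get
        {
            double value = BitConverter.Int64BitsToDouble(Interlocked.Read(ref _ewmaBits));
            return double.IsNaN(value) ? 0.0 : value;
        }
    }

    // Hot path: no allocation, no lock; retries only if another thread raced the update.
    public void RecordRoundTrip(double sampleMs)
    {
        Interlocked.Increment(ref _pdusForwarded);
        long seenBits, nextBits;
        do
        {
            seenBits = Interlocked.Read(ref _ewmaBits);
            double previous = BitConverter.Int64BitsToDouble(seenBits);
            double next = double.IsNaN(previous)
                ? sampleMs
                : Alpha * sampleMs + (1 - Alpha) * previous;
            nextBits = BitConverter.DoubleToInt64Bits(next);
        } while (Interlocked.CompareExchange(ref _ewmaBits, nextBits, seenBits) != seenBits);
    }
}
```

With the assumed factor, a first sample of 10 ms followed by a sample of 20 ms leaves the average at about 12 ms. A smaller factor would smooth more aggressively at the cost of reacting more slowly to a degrading PLC link.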
## Tasks

1. **`StatusDto.cs`**: record types matching the design's per-PLC + service-wide field tables verbatim. Use `System.Text.Json` source generation (`JsonSerializerContext`) to keep the response allocation-light:

   ```csharp
   [JsonSerializable(typeof(StatusResponse))]
   internal partial class StatusJsonContext : JsonSerializerContext;
   ```

2. **`StatusSnapshotBuilder.cs`**: pulls from injected `ProxyWorker` (or a slim view of it), `ConfigReconciler`, `ServiceCounters`, and each `PlcListenerSupervisor`. Builds a `StatusResponse` record. Pure logic; no I/O. The builder is a `sealed` class constructed once via DI; `Build()` is its only operation.
3. **`StatusHtmlRenderer.cs`**: pure function `string Render(StatusResponse status)`. Produces a single HTML document with:
   - `<meta http-equiv="refresh" content="5">` for auto-refresh.
   - A header line with service version + uptime + last-reload info.
   - A table per PLC. Columns match the per-PLC field set; `listener.state` is colour-coded inline (CSS in a `<style>` block, no external assets).
   - Total page weight under 50 KB for typical fleets; the design's 54-PLC count puts the table at ~54 rows.
4. **`AssemblyVersionAccessor.cs`**: reads `AssemblyInformationalVersionAttribute` once at startup, caches it as a string. Used for the `service.version` field.
5. **`AdminEndpointHost.cs`**: `IHostedService` that:
   - On start: builds a `WebApplication` (Kestrel) configured to listen on `AdminPort`. Maps `GET /` to a handler that calls `StatusSnapshotBuilder.Build()` then `StatusHtmlRenderer.Render()`, returning `text/html`. Maps `GET /status.json` to a handler returning `JsonSerializer.Serialize(snapshot, StatusJsonContext.Default.StatusResponse)`. NO other routes.
   - If `AdminPort` is in use at startup: log `mbproxy.admin.bind.failed` (new event) at Error, do not throw. The proxy listeners continue to run; only the admin endpoint is missing. Operators see this in logs.
   - On hot-reload of `AdminPort`: stop and restart the Kestrel server bound to the new port.
   - On stop: call `StopAsync()` on the Kestrel app with a 2 s graceful-shutdown deadline.
6. **`ServiceCounters.cs`** (under `src/Mbproxy/`): a singleton DI service holding the service-wide counters. `Initialize(DateTimeOffset startedAtUtc)`; `RecordReloadApplied(DateTimeOffset)`; `RecordReloadRejected()`. `Snapshot()` returns an immutable record.

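Task 6 can be sketched as below. The method and counter names come from the task list; the `ServiceCountersSnapshot` record name and the choice of a plain lock are assumptions made for illustration (reloads are rare, so contention should not matter here).

```csharp
using System;

#nullable enable

// Assumed name for the immutable snapshot record; the design only says "an immutable record".
public sealed record ServiceCountersSnapshot(
    DateTimeOffset UptimeStartedAtUtc,
    DateTimeOffset? LastReloadUtc,
    int ReloadCount,
    int ReloadRejectedCount);

public sealed class ServiceCounters
{
    private readonly object _gate = new();
    private DateTimeOffset _startedAtUtc;
    private DateTimeOffset? _lastReloadUtc;
    private int _reloadCount;
    private int _reloadRejectedCount;

    // Called once from the service start path.
    public void Initialize(DateTimeOffset startedAtUtc)
    {
        lock (_gate) { _startedAtUtc = startedAtUtc; }
    }

    // Called by the reconciler when a reload is applied.
    public void RecordReloadApplied(DateTimeOffset appliedAtUtc)
    {
        lock (_gate) { _lastReloadUtc = appliedAtUtc; _reloadCount++; }
    }

    // Called by the reconciler when a reload is rejected.
    public void RecordReloadRejected()
    {
        lock (_gate) { _reloadRejectedCount++; }
    }

    // Cold path: copies all fields under the lock so the snapshot is internally consistent.
    public ServiceCountersSnapshot Snapshot()
    {
        lock (_gate)
        {
            return new ServiceCountersSnapshot(
                _startedAtUtc, _lastReloadUtc, _reloadCount, _reloadRejectedCount);
        }
    }
}
```

Taking the lock in `Snapshot()` means `LastReloadUtc` and `ReloadCount` are always read together, so a status page would never show a reload count without its timestamp.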
## Public surface declared in this phase

```csharp
namespace Mbproxy.Admin;

internal sealed class AdminEndpointHost : IHostedService { /* ... */ }

public sealed record StatusResponse(
    ServiceFields Service,
    ListenersAggregate Listeners,
    IReadOnlyList<PlcStatus> Plcs);

public sealed record ServiceFields(
    long UptimeSeconds, string Version,
    DateTimeOffset? ConfigLastReloadUtc, int ConfigReloadCount, int ConfigReloadRejectedCount);

public sealed record ListenersAggregate(int Bound, int Configured);

public sealed record PlcStatus(
    string Name, string Host, int ListenPort,
    PlcListenerStatus Listener,
    PlcClientsStatus Clients,
    PlcPdusStatus Pdus,
    PlcBackendStatus Backend,
    PlcBytesStatus Bytes);

public sealed record PlcListenerStatus(string State, string? LastBindError, int RecoveryAttempts);
public sealed record PlcClientsStatus(int Connected, IReadOnlyList<ClientSnapshot> RemoteEndpoints);
public sealed record ClientSnapshot(string Remote, DateTimeOffset ConnectedAtUtc, long PdusForwarded);
public sealed record PlcPdusStatus(long Forwarded, FcCounts ByFc, long RewrittenSlots, long PartialBcdWarnings);
public sealed record FcCounts(long Fc03, long Fc04, long Fc06, long Fc16, long Other);
public sealed record PlcBackendStatus(long ConnectsSuccess, long ConnectsFailed, ExceptionCounts ExceptionsByCode, double LastRoundTripMs);
public sealed record ExceptionCounts(long Code01, long Code02, long Code03, long Code04);
public sealed record PlcBytesStatus(long UpstreamIn, long UpstreamOut);
```

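As an illustration of how the service-wide `ListenersAggregate` might be derived from the per-PLC `PlcListenerStatus` records declared above, here is a small sketch. The two records are repeated so the snippet stands alone; the `ListenerAggregation` helper is hypothetical, and it assumes the state string for a healthy listener is exactly `bound`, as listed in the Goal section.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

#nullable enable

// Repeated from the public surface so the snippet is self-contained.
public sealed record PlcListenerStatus(string State, string? LastBindError, int RecoveryAttempts);
public sealed record ListenersAggregate(int Bound, int Configured);

// Hypothetical helper; the real builder may compute this differently.
public static class ListenerAggregation
{
    // Configured = every PLC in the config; Bound = those whose listener state is "bound".
    public static ListenersAggregate Aggregate(IReadOnlyCollection<PlcListenerStatus> listeners) =>
        new ListenersAggregate(
            Bound: listeners.Count(l => string.Equals(l.State, "bound", StringComparison.Ordinal)),
            Configured: listeners.Count);
}
```

Because the DTOs are records, a test can compare the result against an expected `ListenersAggregate` value directly instead of asserting field by field.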
## Tests required

### Unit (`Category = Unit`)

`StatusSnapshotBuilderTests` (≥ 6 tests):

1. `Build_NoPlcsConfigured_ReturnsEmptyPlcList`
2. `Build_OnePlcBound_PopulatesListenerState_Bound`
3. `Build_PlcRecovering_PopulatesLastBindError_AndAttempts`
4. `Build_AggregatesListenersBoundAndConfigured`
5. `Build_PerClientSnapshot_Includes_RemoteAndConnectedAt_AndPduCount`
6. `Build_ServiceFields_IncludeUptime_Version_AndLastReload`

`StatusHtmlRendererTests` (≥ 3 tests):

1. `Render_OnePlc_ProducesValidHtml_WithMetaRefresh`
2. `Render_RecoveringPlc_HighlightsState`
3. `Render_PageWeightUnder50KB_For54Plcs`: assert on the rendered document's UTF-8 byte length (equal to its character length while the page stays ASCII).

### E2E (`Category = E2E`)

`AdminEndpointTests` (≥ 5 tests, against a live in-process Kestrel + simulator):

1. `Get_StatusJson_ReturnsValidShape`
2. `Get_StatusJson_AfterReadFC03_ShowsPduCountIncreased`
3. `Get_StatusJson_AfterPartialBcdWrite_ShowsPartialBcdWarning`
4. `Get_Root_ReturnsHtml_WithMetaRefresh`
5. `AdminPort_BindFailure_ServiceStaysUp_AndLogsBindFailed`: pre-bind `AdminPort`, start the service, assert proxy listeners come up and the admin endpoint logs the failure.

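The pre-bind step in E2E test 5 could be set up with a plain `TcpListener`, along these lines. This is a sketch only: the `PortCollision` helper and its method names are hypothetical, and it uses the loopback address with an OS-assigned port rather than the configured `AdminPort`.

```csharp
using System.Net;
using System.Net.Sockets;

// Hypothetical test helper, not part of the mbproxy test suite.
public static class PortCollision
{
    // Binds an OS-assigned loopback port and keeps it listening until the caller stops it.
    public static TcpListener Occupy()
    {
        var listener = new TcpListener(IPAddress.Loopback, 0);
        listener.Start();
        return listener;
    }

    // Reads back the port the OS actually assigned.
    public static int PortOf(TcpListener listener) =>
        ((IPEndPoint)listener.LocalEndpoint).Port;

    // Tries to bind the port; a SocketException is treated as "already taken".
    public static bool IsPortFree(int port)
    {
        var probe = new TcpListener(IPAddress.Loopback, port);
        try
        {
            probe.Start();
            return true;
        }
        catch (SocketException)
        {
            return false;
        }
        finally
        {
            probe.Stop();
        }
    }
}
```

A test would call `Occupy()`, pass `PortOf(...)` as the admin port in the service configuration, start the service, and then assert on the proxy listeners and the logged `mbproxy.admin.bind.failed` event before stopping the occupying listener.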
## Phase gate

- [ ] Zero-warnings build.
- [ ] All phase 00–06 tests still green.
- [ ] All new unit + e2e tests green.
- [ ] `/status.json` shape matches the field tables in [`../design.md`](../design.md) → "Status page" exactly (field names, casing, nesting).
- [ ] Counters on the read path (`PdusForwarded`, etc.) remain allocation-free; `Snapshot()` is the only allocating call and it's on the cold path.
- [ ] AdminPort collision is logged but does NOT take down the proxy.
- [ ] Hot-reload of `AdminPort` works (verified by adding a test in this phase or extending one of phase 06's e2e tests).

## Out of scope

- Authentication / authorisation on the admin port. Design explicitly defers to network-layer trust.
- Prometheus exposition format. The `/status.json` shape is the contract; downstream tools can transform.
- WebSocket push of counters. Meta-refresh is good enough at 54 PLCs.
- Historical counter retention (rolling windows, time series). Counters are cumulative since process start; restart resets.
- Per-tag-level telemetry (which BCD addresses got rewritten how often). The per-PLC `RewrittenSlots` total is enough; finer granularity goes in a future phase if needed.

## Notes for the subagent

- Use the minimal-API style for the two endpoints; no controllers. The whole admin endpoint is ~50 lines of map / handler code.
- `System.Text.Json` source generation needs a `[JsonSerializable]` attribute for the root DTO on the `JsonSerializerContext` class; the generator follows the property graph to the nested records from there. Don't use reflection-based serialization in this codebase: it is not trim/AOT-safe and is slower for the simple shape.
- For the HTML page, embed CSS in a `<style>` block. Do not link external stylesheets: the admin endpoint must work over a firewalled network with no internet egress.
- Test 3 of `AdminEndpointTests` requires triggering a partial-BCD warning, which means configuring a 32-bit BCD tag and reading only one half of it through the proxy. This is the same scenario phase 04's e2e test 5 exercised; reuse the setup.
- The admin port collision test is important: an operator misconfiguration must not take down the proxy itself. Log Error, continue running.
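
To make the self-contained-page note concrete, the following sketch renders a page with the meta-refresh tag and an inline `<style>` block. It is deliberately simplified and rests on assumptions: it takes a hypothetical list of `(Name, State)` pairs instead of the full `StatusResponse`, uses one row per PLC, and the colours are placeholders.

```csharp
using System.Collections.Generic;
using System.Net;
using System.Text;

// Simplified stand-in for StatusHtmlRenderer; the input shape is hypothetical.
public static class StatusHtmlSketch
{
    public static string Render(string version, IReadOnlyList<(string Name, string State)> plcs)
    {
        var sb = new StringBuilder();
        sb.Append("<!DOCTYPE html><html><head><meta charset=\"utf-8\">");
        // Auto-refresh every 5 s, as the phase requires.
        sb.Append("<meta http-equiv=\"refresh\" content=\"5\">");
        // All styling is inline so the page needs no external assets.
        sb.Append("<style>.bound{color:green}.recovering{color:orange}.stopped{color:red}</style>");
        sb.Append("</head><body>");
        sb.Append("<h1>mbproxy ").Append(WebUtility.HtmlEncode(version)).Append("</h1>");
        sb.Append("<table><tr><th>PLC</th><th>Listener</th></tr>");
        foreach (var (name, state) in plcs)
        {
            // Encode every value: PLC names come from operator-edited config.
            string safeState = WebUtility.HtmlEncode(state);
            sb.Append("<tr><td>").Append(WebUtility.HtmlEncode(name)).Append("</td>")
              .Append("<td class=\"").Append(safeState).Append("\">")
              .Append(safeState).Append("</td></tr>");
        }
        sb.Append("</table></body></html>");
        return sb.ToString();
    }
}
```

A page-weight test in the spirit of `Render_PageWeightUnder50KB_For54Plcs` could render 54 entries and compare `Encoding.UTF8.GetByteCount(html)` against `50 * 1024`.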