Status Page
The status page is the operator-facing view of the running service: an auto-refreshing HTML dashboard at GET / and a JSON twin at GET /status.json that monitoring scrapers consume. This document describes the endpoint surface, every wire-level field, and how counters map back to architecture decisions.
Endpoint Surface
The admin endpoint is owned by AdminEndpointHost (see src/Mbproxy/Admin/AdminEndpointHost.cs). It exposes exactly two routes:
- `GET /`: a single self-contained HTML document with a `<meta http-equiv="refresh" content="5">` tag. The page refreshes every five seconds by reload, not by JavaScript polling. There is no JS bundle, no external CSS, no remote fonts, and no favicon fetch.
- `GET /status.json`: the same in-memory snapshot serialized as JSON via the source-generated `StatusJsonContext` (camelCase property names).
The endpoint is read-only. There are no admin actions exposed — no kick-client, no force-reload, no listener restart, no log download. Reload happens automatically via IOptionsMonitor; listener recovery is owned by the supervisor. Authentication lives at the network layer: the service binds to IPAddress.Any on the admin port and assumes the deployment runs in a trusted internal segment behind a firewall.
Both routes call StatusSnapshotBuilder.Build() for every request. The builder reads atomic counters directly from the supervisor map and per-PLC ProxyCounters; it holds no locks and performs no I/O.
Port and Configuration
The listen port is read from Mbproxy.AdminPort and defaults to 8080. Configuration semantics for this key live in ./Configuration.md.
If Kestrel cannot bind the configured port at startup (port already in use, missing permissions on a reserved range, etc.) the host logs mbproxy.admin.bind.failed at Error level with the underlying reason. The host then sets _app = null and returns — the rest of the service keeps running. The Modbus listener supervisors are completely independent of the admin endpoint, so a bind failure here is non-fatal for proxying. See ../Reference/LogEvents.md for the event-id catalogue.
If Mbproxy.AdminPort changes via hot-reload, the currently-running Kestrel app is stopped (2 s deadline) and a new one is started on the new port. Other config changes do not touch the admin endpoint.
Service-Wide Fields
Top-level fields come from ServiceFields and ListenersAggregate in src/Mbproxy/Admin/StatusDto.cs.
| JSON path | Type | Source | Meaning |
|---|---|---|---|
| `service.uptimeSeconds` | `long` | `ServiceFields.UptimeSeconds` | Seconds since process start, computed as `now - ServiceCounters.StartedAtUtc` at snapshot time. |
| `service.version` | `string` | `ServiceFields.Version` via `AssemblyVersionAccessor` | AssemblyInformationalVersion of the running assembly. Useful for confirming a deployment took effect. |
| `service.configLastReloadUtc` | `DateTimeOffset?` | `ServiceCounters.LastReloadUtc` | Wall-clock time of the most recent accepted hot-reload. `null` if no reload has occurred since process start. See ../Features/HotReload.md. |
| `service.configReloadCount` | `int` | `ServiceCounters.ReloadAppliedCount` | Number of appsettings.json reloads that validated and applied since process start. |
| `service.configReloadRejectedCount` | `int` | `ServiceCounters.ReloadRejectedCount` | Number of reload attempts rejected by validation. A non-zero value here paired with a stale `configLastReloadUtc` indicates the operator's last edit was malformed and the service is still running the previous config. |
| `listeners.bound` | `int` | `boundCount` accumulated while iterating `opts.Plcs` | Count of PLC entries whose supervisor currently reports `SupervisorState.Bound`. |
| `listeners.configured` | `int` | `opts.Plcs.Count` | Total number of PLC entries in the active configuration. |
Operator triggers:
- `listeners.bound < listeners.configured` for more than one refresh cycle indicates one or more listeners are stuck recovering. Drill into the per-PLC `listener.state` and `listener.lastBindError` fields below.
- `configReloadRejectedCount` rising means edits are reaching the watcher but failing validation; check the live log for `mbproxy.config.reload.rejected`.
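As an illustration of the two triggers above, a scraper could compare consecutive `/status.json` snapshots. This is a minimal sketch, not part of the service: the function name `check_triggers` and the alert wording are hypothetical, and it assumes both arguments are already-parsed JSON documents taken one refresh cycle apart.

```python
def check_triggers(prev: dict, curr: dict) -> list:
    """Return alert strings for the two service-wide operator triggers."""
    alerts = []

    def short(snapshot: dict) -> bool:
        listeners = snapshot["listeners"]
        return listeners["bound"] < listeners["configured"]

    # Trigger 1: fewer listeners bound than configured in BOTH snapshots,
    # i.e. the shortfall has lasted more than one refresh cycle.
    if short(prev) and short(curr):
        for plc in curr["plcs"]:
            listener = plc["listener"]
            if listener["state"] != "bound":
                alerts.append(
                    "%s: %s (%s, attempt %d)" % (
                        plc["name"], listener["state"],
                        listener["lastBindError"], listener["recoveryAttempts"],
                    )
                )

    # Trigger 2: the rejected-reload counter rose between the two scrapes.
    before = prev["service"]["configReloadRejectedCount"]
    after = curr["service"]["configReloadRejectedCount"]
    if after > before:
        alerts.append("config reload rejected; check log for mbproxy.config.reload.rejected")

    return alerts
```

Comparing two snapshots rather than one avoids alerting on a listener that is merely mid-restart at the instant of a single scrape.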
Per-PLC Fields
Each entry in plcs[] is a PlcStatus (see src/Mbproxy/Admin/StatusDto.cs). The builder iterates opts.Plcs in configured order, looks up the matching supervisor in ProxyWorker.Supervisors, and projects the supervisor's CurrentCounters.Snapshot() into wire fields.
Identity
| JSON path | Type | Source | Meaning |
|---|---|---|---|
| `name` | `string` | `PlcOptions.Name` | Stable identifier from appsettings.json. Used as the dictionary key for supervisor lookup. |
| `host` | `string` | `PlcOptions.Host` | Backend PLC host (IP or DNS name) the proxy connects out to. |
| `listenPort` | `int` | `PlcOptions.ListenPort` | Local TCP port the proxy binds for upstream clients connecting to the proxy. |
Listener state
| JSON path | Type | Source | Meaning |
|---|---|---|---|
| `listener.state` | `string` | `SupervisorSnapshot.State` mapped to `"bound"` / `"recovering"` / `"stopped"` | Current supervisor state. `bound` = TCP listener is accepting connections; `recovering` = Polly retry loop is trying to re-bind after a fault; `stopped` = no supervisor entry (typically a PLC that was just added and not yet started). |
| `listener.lastBindError` | `string?` | `SupervisorSnapshot.LastBindError` | Message from the last bind exception. Populated whenever `state == "recovering"`. Common values: "Address already in use", "Permission denied". |
| `listener.recoveryAttempts` | `int` | `SupervisorSnapshot.RecoveryAttempts` | Number of bind retries since the supervisor entered recovery. Resets on a successful bind. A monotonically rising value indicates the underlying problem is persistent. |
Client tracking
| JSON path | Type | Source | Meaning |
|---|---|---|---|
| `clients.connected` | `int` | `clientSnapshots.Count` | Number of currently-connected upstream clients. Capped by the H2-ECOM100 four-client ceiling; values at 4 imply additional upstream connect attempts will be refused by the PLC. |
| `clients.remoteEndpoints[].remote` | `string` | `UpstreamPipe.RemoteEp` | Upstream TCP endpoint as `ip:port`. |
| `clients.remoteEndpoints[].connectedAtUtc` | `DateTimeOffset` | `UpstreamPipe.ConnectedAtUtc` | Wall-clock time the upstream socket was accepted. Useful for spotting zombie sockets that survived a network outage. |
| `clients.remoteEndpoints[].pdusForwarded` | `long` | `UpstreamPipe.PdusForwardedCount` | PDUs forwarded on this specific upstream pipe since it connected. Lets you see which client is responsible for what fraction of fleet traffic. |
PDU traffic
| JSON path | Type | Source | Meaning |
|---|---|---|---|
| `pdus.forwarded` | `long` | `CounterSnapshot.PdusForwarded` | Total PDUs (requests + responses) that traversed the proxy for this PLC since start. Increments once per PDU handed to the rewriter. |
| `pdus.byFc.fc03` | `long` | `CounterSnapshot.Fc03` | Count of FC03 (read holding registers) requests seen. |
| `pdus.byFc.fc04` | `long` | `CounterSnapshot.Fc04` | Count of FC04 (read input registers) requests seen. |
| `pdus.byFc.fc06` | `long` | `CounterSnapshot.Fc06` | Count of FC06 (write single register) requests seen. |
| `pdus.byFc.fc16` | `long` | `CounterSnapshot.Fc16` | Count of FC16 (write multiple registers) requests seen. |
| `pdus.byFc.other` | `long` | `CounterSnapshot.FcOther` | All other function codes (FC01/02/05/15, diagnostic codes, etc.) seen. The proxy forwards these untouched. |
| `pdus.rewrittenSlots` | `long` | `CounterSnapshot.RewrittenSlots` | Number of register slots the BCD rewriter touched, counting reads and writes. Indicates how much of the traffic actually hits BCD-configured addresses. See ../Features/BcdRewriting.md. |
| `pdus.partialBcdWarnings` | `long` | `CounterSnapshot.PartialBcdWarnings` | Count of requests whose `[start, start + qty)` register range partially overlapped a 32-bit BCD tag without fully covering its CDAB word pair. A rising value here is an operator signal: an upstream client is requesting partial-overlap reads, which the proxy cannot rewrite safely. Review the tag-list addresses or fix the client's request shape. |
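The partial-overlap condition behind `pdus.partialBcdWarnings` can be pictured with a small check. This is only a sketch under stated assumptions: it treats a 32-bit BCD tag as the two consecutive registers `tag_addr` and `tag_addr + 1`, treats the request range as half-open, and uses a hypothetical function name. The proxy's actual matching logic lives in the rewriter and may differ.

```python
def partially_overlaps(start: int, qty: int, tag_addr: int) -> bool:
    """True when the request covers exactly one word of a two-word tag.

    Assumes the request spans registers [start, start + qty) and the
    32-bit tag occupies registers tag_addr and tag_addr + 1.
    """
    end = start + qty  # exclusive upper bound of the request
    covers_first_word = start <= tag_addr < end
    covers_second_word = start <= tag_addr + 1 < end
    # Covering both words is a full hit; covering neither is a miss.
    # Only the "exactly one" case is a partial overlap.
    return covers_first_word != covers_second_word
```

For a tag at register 100, a read of one register at 100 or two registers at 99 is a partial overlap, while a read of two registers at 100 covers the pair completely.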
Backend health
| JSON path | Type | Source | Meaning |
|---|---|---|---|
| `backend.connectsSuccess` | `long` | `CounterSnapshot.ConnectsSuccess` | Successful backend TCP connects since start. Increments once per accepted upstream client (the proxy opens one backend socket per upstream client). |
| `backend.connectsFailed` | `long` | `CounterSnapshot.ConnectsFailed` | Failed backend TCP connects after the Polly retry budget is exhausted (3 attempts at 100/500/2000 ms). A rising counter means the backend host is unreachable or the PLC is at its connection cap. |
| `backend.exceptionsByCode.code01` | `long` | `CounterSnapshot.BackendException01` | Count of Modbus exception responses with code 01 (Illegal Function) received from the PLC. Typically indicates a client is sending function codes the PLC does not support. |
| `backend.exceptionsByCode.code02` | `long` | `CounterSnapshot.BackendException02` | Code 02 (Illegal Data Address): the requested register range is outside the PLC's V-memory map. |
| `backend.exceptionsByCode.code03` | `long` | `CounterSnapshot.BackendException03` | Code 03 (Illegal Data Value): quantity exceeds the PLC's per-FC cap (FC03/04 = 128 registers, FC16 = 100). |
| `backend.exceptionsByCode.code04` | `long` | `CounterSnapshot.BackendException04` | Code 04 (Server Device Failure): internal PLC fault, often correlated with the PLC entering STOP mode. |
| `backend.lastRoundTripMs` | `double` | `CounterSnapshot.LastRoundTripMs` | Exponentially-weighted moving average of recent successful request → response round-trip times in milliseconds. Tracks PLC responsiveness; sustained values above the historical baseline indicate backend latency degradation. |
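For readers unfamiliar with the smoothing behind `backend.lastRoundTripMs`, the following sketch shows the general shape of an exponentially-weighted moving average. The smoothing factor `alpha = 0.2` and the choice to seed the average with the first sample are assumptions made for illustration; this document does not state the values the service uses.

```python
from typing import Optional


def ewma_update(previous: Optional[float], sample_ms: float, alpha: float = 0.2) -> float:
    """Fold one round-trip sample into a running EWMA.

    alpha is a hypothetical smoothing factor: higher values react faster
    to new samples, lower values smooth out spikes more aggressively.
    """
    if previous is None:
        # Assumed seeding rule: the first sample becomes the average.
        return sample_ms
    return alpha * sample_ms + (1.0 - alpha) * previous
```

With these assumed parameters, a running average of 10 ms followed by a 20 ms sample moves to 12 ms, which is why a single slow response nudges the field rather than spiking it.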
Multiplexer state
These five fields describe the per-PLC backend multiplexer. See ../Architecture/ConnectionModel.md for the design rationale and how transaction-id (TxId) reuse and queueing work.
| JSON path | Type | Source | Meaning |
|---|---|---|---|
| `backend.inFlight` | `long` | `CounterSnapshot.InFlightCount` | Number of MBAP transactions currently in flight on the backend socket (request sent, response pending). |
| `backend.maxInFlight` | `long` | `CounterSnapshot.MaxInFlight` | High-water mark of `inFlight` since start. Used to size the queue and to verify the multiplexer is in fact pipelining requests. |
| `backend.txIdWraps` | `long` | `CounterSnapshot.TxIdWraps` | Times the 16-bit MBAP transaction-id allocator has wrapped through 0xFFFF. A rising rate quantifies sustained request volume. |
| `backend.disconnectCascades` | `long` | `CounterSnapshot.BackendDisconnectCascades` | Times a backend disconnect cascaded into closing all upstream pipes that were waiting on in-flight TxIds. Each cascade aborts every queued request bound for that PLC. |
| `backend.queueDepth` | `long` | `CounterSnapshot.BackendQueueDepth` | Current count of requests queued behind the multiplexer's TxId allocator and write semaphore. A sustained non-zero queue means the multiplexer is the bottleneck (backend slower than upstream demand). |
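To make the meaning of `backend.txIdWraps` concrete, here is a toy 16-bit allocator that counts wraps. The class name and structure are hypothetical, and the sketch deliberately ignores details a real multiplexer has to handle, such as skipping ids that are still in flight; see ../Architecture/ConnectionModel.md for the actual design.

```python
class TxIdAllocator:
    """Toy sequential allocator for 16-bit MBAP transaction ids."""

    def __init__(self) -> None:
        self._next = 0
        self.wraps = 0  # plays the role of the txIdWraps counter

    def allocate(self) -> int:
        tx_id = self._next
        # Advance within 16 bits; 0xFFFF is followed by 0x0000.
        self._next = (self._next + 1) & 0xFFFF
        if self._next == 0:
            self.wraps += 1
        return tx_id
```

Under this model each wrap corresponds to 65,536 allocated transaction ids, which is why the wrap rate is a coarse proxy for sustained backend request volume.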
Coalescing counters
These fields describe duplicate-read coalescing on FC03/FC04. See ../Architecture/ReadCoalescing.md for the matching criteria and lifecycle.
| JSON path | Type | Source | Meaning |
|---|---|---|---|
| `backend.coalescedHitCount` | `long` | `CounterSnapshot.CoalescedHitCount` | Reads that attached to an already-in-flight identical read instead of issuing a new backend request. |
| `backend.coalescedMissCount` | `long` | `CounterSnapshot.CoalescedMissCount` | Reads that did not find a matching in-flight request and issued their own. The dashboard-side ratio is `hit / (hit + miss)`; the wire format intentionally does not carry the derived ratio (consumers compute it). |
| `backend.coalescedResponseToDeadUpstream` | `long` | `CounterSnapshot.CoalescedResponseToDeadUpstream` | Coalesced responses that arrived after their attached upstream pipe had closed. Normal in bursty traffic; sustained growth indicates upstream clients are aborting too quickly. |
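Because the derived ratio is left to consumers, a scraper needs a small helper along these lines. The function name is hypothetical; the formula is the `hit / (hit + miss)` definition given in the table, with the zero-traffic case returned as `None` rather than a misleading 0%.

```python
from typing import Optional


def hit_ratio(hits: int, misses: int) -> Optional[float]:
    """Return hits / (hits + misses), or None when nothing was counted."""
    total = hits + misses
    if total == 0:
        return None
    return hits / total


# Example with coalescedHitCount = 41892 and coalescedMissCount = 177012:
ratio = hit_ratio(41892, 177012)
print(round(100 * ratio, 1))  # prints 19.1
```

The same helper applies unchanged to any other hit/miss counter pair in the snapshot.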
Cache counters
These fields describe the short-TTL response cache for FC03/FC04. See ../Architecture/ResponseCache.md.
| JSON path | Type | Source | Meaning |
|---|---|---|---|
| `backend.cacheHitCount` | `long` | `CounterSnapshot.CacheHitCount` | Reads served from the cache without touching the backend at all. |
| `backend.cacheMissCount` | `long` | `CounterSnapshot.CacheMissCount` | Cache-eligible reads that fell through to the backend. The derived `cacheHitRatio` is `hit / (hit + miss)`; like coalescing, it is not carried on the wire. |
| `backend.cacheInvalidations` | `long` | `CounterSnapshot.CacheInvalidations` | Times a write (FC06/FC16) invalidated overlapping cache entries on this PLC. A high invalidation rate relative to writes means write coverage is broad and the cache is doing less work. |
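One way to read "invalidation rate relative to writes" is to divide `cacheInvalidations` by the FC06 and FC16 request counts from the same PLC entry. The sketch below assumes that interpretation and uses a hypothetical function name; it takes one parsed element of `plcs[]` as input.

```python
from typing import Optional


def invalidations_per_write(plc: dict) -> Optional[float]:
    """Ratio of cache invalidations to write requests for one PLC entry."""
    by_fc = plc["pdus"]["byFc"]
    writes = by_fc["fc06"] + by_fc["fc16"]
    if writes == 0:
        return None  # no writes seen, so the ratio is undefined
    return plc["backend"]["cacheInvalidations"] / writes
```

A value near 1.0 would suggest that almost every write lands on cached register ranges; a value near 0 would suggest writes and cached reads rarely overlap.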
Cache memory-watch
These two fields are Tier-2 KPIs intended for memory-budget alerts. The cache is per-PLC; the dashboard aggregates these across the fleet.
| JSON path | Type | Source | Meaning |
|---|---|---|---|
| `backend.cacheEntryCount` | `long` | `CounterSnapshot.CacheEntryCount` | Current number of cached response entries for this PLC. |
| `backend.cacheBytes` | `long` | `CounterSnapshot.CacheBytes` | Approximate byte cost of the cache entries (response payloads plus key overhead). Used to detect runaway growth from a chatty client. |
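Since the cache is per-PLC, a fleet-wide memory alert has to sum these fields across `plcs[]`. The following is a minimal sketch of that aggregation; the function name and the returned keys are hypothetical, and any alert threshold is left to the caller because this document does not define a budget.

```python
def fleet_cache_totals(snapshot: dict) -> dict:
    """Sum the per-PLC cache memory-watch fields across the whole fleet."""
    entries = 0
    size_bytes = 0
    for plc in snapshot["plcs"]:
        backend = plc["backend"]
        entries += backend["cacheEntryCount"]
        size_bytes += backend["cacheBytes"]
    return {"entries": entries, "bytes": size_bytes}
```

Both inputs are gauges (current values), so the totals can be compared directly against a memory budget without computing deltas between scrapes.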
Bytes
| JSON path | Type | Source | Meaning |
|---|---|---|---|
| `bytes.upstreamIn` | `long` | `CounterSnapshot.BytesUpstreamIn` | Total bytes read from upstream client sockets bound to this PLC since start. |
| `bytes.upstreamOut` | `long` | `CounterSnapshot.BytesUpstreamOut` | Total bytes written back to upstream client sockets bound to this PLC since start. |
Counter Atomicity
All counters are long fields updated through System.Threading.Interlocked. Each read in StatusSnapshotBuilder.Build() is atomic per field; no locks are held across the snapshot build, and the build itself does no I/O.
The practical consequence: a single /status.json request returns a coherent value for any one counter, but the assembled response is not a globally consistent snapshot — different per-PLC counters may straddle increments by microseconds. For example, pdus.forwarded for PLC A and pdus.forwarded for PLC B are not guaranteed to reflect the same instant. This is acceptable for dashboards and rate calculations; do not use these counters for fine-grained accounting.
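A rate calculation over two scrapes might look like the sketch below. It is an illustration only: the function names are hypothetical, elapsed time is taken from the difference in `service.uptimeSeconds` (whole seconds, so short scrape intervals give coarse results), and a counter that goes backwards is treated as a restart rather than a negative rate.

```python
from typing import Optional


def _forwarded(snapshot: dict, plc_name: str) -> Optional[int]:
    """Look up pdus.forwarded for one PLC, or None if it is not listed."""
    for plc in snapshot["plcs"]:
        if plc["name"] == plc_name:
            return plc["pdus"]["forwarded"]
    return None


def pdu_rate(prev: dict, curr: dict, plc_name: str) -> Optional[float]:
    """PDUs per second for one PLC between two /status.json snapshots."""
    elapsed = curr["service"]["uptimeSeconds"] - prev["service"]["uptimeSeconds"]
    before = _forwarded(prev, plc_name)
    after = _forwarded(curr, plc_name)
    if elapsed <= 0 or before is None or after is None or after < before:
        # Service restart, same-second scrape, unknown PLC, or counter reset.
        return None
    return (after - before) / elapsed
```

Microsecond-level skew between counters is negligible against a multi-second scrape interval, which is why rates are a safe use of these values while exact accounting is not.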
Example JSON Response
A representative two-PLC deployment, ~2 hours into a run:
```json
{
  "service": {
    "uptimeSeconds": 7234,
    "version": "1.0.0",
    "configLastReloadUtc": "2026-05-13T14:02:11+00:00",
    "configReloadCount": 2,
    "configReloadRejectedCount": 0
  },
  "listeners": {
    "bound": 2,
    "configured": 2
  },
  "plcs": [
    {
      "name": "line1-press",
      "host": "10.20.30.41",
      "listenPort": 5021,
      "listener": {
        "state": "bound",
        "lastBindError": null,
        "recoveryAttempts": 0
      },
      "clients": {
        "connected": 2,
        "remoteEndpoints": [
          {
            "remote": "10.20.40.10:51223",
            "connectedAtUtc": "2026-05-13T12:01:55+00:00",
            "pdusForwarded": 184213
          },
          {
            "remote": "10.20.40.11:53901",
            "connectedAtUtc": "2026-05-13T13:30:02+00:00",
            "pdusForwarded": 41008
          }
        ]
      },
      "pdus": {
        "forwarded": 225221,
        "byFc": {
          "fc03": 218904,
          "fc04": 0,
          "fc06": 12,
          "fc16": 6203,
          "other": 102
        },
        "rewrittenSlots": 1318622,
        "partialBcdWarnings": 0
      },
      "backend": {
        "connectsSuccess": 2,
        "connectsFailed": 0,
        "exceptionsByCode": {
          "code01": 0,
          "code02": 14,
          "code03": 0,
          "code04": 0
        },
        "lastRoundTripMs": 12.4,
        "inFlight": 1,
        "maxInFlight": 4,
        "txIdWraps": 3,
        "disconnectCascades": 0,
        "queueDepth": 0,
        "coalescedHitCount": 41892,
        "coalescedMissCount": 177012,
        "coalescedResponseToDeadUpstream": 7,
        "cacheHitCount": 88321,
        "cacheMissCount": 88691,
        "cacheInvalidations": 6203,
        "cacheEntryCount": 47,
        "cacheBytes": 18512
      },
      "bytes": {
        "upstreamIn": 4108290,
        "upstreamOut": 12993021
      }
    },
    {
      "name": "line2-oven",
      "host": "10.20.30.42",
      "listenPort": 5022,
      "listener": {
        "state": "recovering",
        "lastBindError": "Address already in use",
        "recoveryAttempts": 12
      },
      "clients": {
        "connected": 0,
        "remoteEndpoints": []
      },
      "pdus": {
        "forwarded": 0,
        "byFc": { "fc03": 0, "fc04": 0, "fc06": 0, "fc16": 0, "other": 0 },
        "rewrittenSlots": 0,
        "partialBcdWarnings": 0
      },
      "backend": {
        "connectsSuccess": 0,
        "connectsFailed": 0,
        "exceptionsByCode": { "code01": 0, "code02": 0, "code03": 0, "code04": 0 },
        "lastRoundTripMs": 0.0,
        "inFlight": 0,
        "maxInFlight": 0,
        "txIdWraps": 0,
        "disconnectCascades": 0,
        "queueDepth": 0,
        "coalescedHitCount": 0,
        "coalescedMissCount": 0,
        "coalescedResponseToDeadUpstream": 0,
        "cacheHitCount": 0,
        "cacheMissCount": 0,
        "cacheInvalidations": 0,
        "cacheEntryCount": 0,
        "cacheBytes": 0
      },
      "bytes": { "upstreamIn": 0, "upstreamOut": 0 }
    }
  ]
}
```
HTML Page Layout
The HTML renderer is StatusHtmlRenderer.Render(StatusResponse) in src/Mbproxy/Admin/StatusHtmlRenderer.cs. The page is a single document with inline CSS in a `<style>` block and no external resources of any kind, so operators can serve it behind a corporate firewall without whitelisting a CDN.
Structure:
- Header summary: version, formatted uptime (`Nh MMm SSs`), `bound`/`configured` listener tally, last reload timestamp, reload count with a `(N rejected)` suffix when applicable.
- PLC table: one row per configured PLC. Columns: Name, Host, Port, State (colour-coded: `bound` = green, `recovering` = orange, `stopped` = grey), Clients (count plus a comma-separated list of `remote (N PDUs)`), PDUs forwarded, FC03/FC04/FC06/FC16/FC? counts, BCD slots, Partial BCD, exception codes 01/02/03/04, RTT (ms), bytes in/out, multiplexer columns (in-flight, max in-flight, TxId wraps, cascades, queue), coalescing ratio cell, cache ratio cell.
- State cell error detail: when `state == "recovering"`, the cell also shows `lastBindError` and `(attempt N)` in a small red span.
The coalescing and cache cells each render as `<pct>% (<hits>)`. When a feature has not been exercised (`hit + miss == 0`), its cell renders an em-dash to keep the column narrow. Page weight is bounded by the design budget (≤ 50 KB for a 54-PLC fleet).
The page does not depend on JavaScript. Refresh is driven entirely by the `<meta http-equiv="refresh" content="5">` tag, so any browser, including text-mode browsers, sees the same view.
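The two formatting rules described in this section, the `Nh MMm SSs` uptime and the ratio cell with its em-dash fallback, can be mirrored on the consumer side. The sketch below is not the renderer's code: the function names are hypothetical, and the one-decimal percentage is an assumption, since this document does not state the precision the page uses.

```python
def format_uptime(seconds: int) -> str:
    """Format whole seconds as 'Nh MMm SSs' with zero-padded minutes and seconds."""
    hours, remainder = divmod(seconds, 3600)
    minutes, secs = divmod(remainder, 60)
    return "%dh %02dm %02ds" % (hours, minutes, secs)


def ratio_cell(hits: int, misses: int) -> str:
    """Render '<pct>% (<hits>)', or an em-dash when there was no traffic."""
    total = hits + misses
    if total == 0:
        return "\u2014"  # em-dash placeholder keeps the column narrow
    # One decimal place is an assumed precision, not taken from the renderer.
    return "%.1f%% (%d)" % (100.0 * hits / total, hits)


print(format_uptime(7234))       # prints 2h 00m 34s
print(ratio_cell(88321, 88691))  # prints 49.9% (88321)
```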
How to Scrape It
The JSON twin is plain HTTP. Any monitoring system that can curl an endpoint can scrape it.
PowerShell, pulling the cache hit ratio for the first PLC into a variable:
```powershell
$snap = Invoke-WebRequest -Uri "http://mbproxy-host:8080/status.json" -UseBasicParsing |
    Select-Object -ExpandProperty Content |
    ConvertFrom-Json
$plc   = $snap.plcs[0]
$hits  = $plc.backend.cacheHitCount
$total = $hits + $plc.backend.cacheMissCount
$ratio = if ($total -gt 0) { [math]::Round(100.0 * $hits / $total, 1) } else { 0.0 }
"PLC $($plc.name): cache hit ratio = $ratio% over $total reads"
```
Bash with curl and jq, fanning out across the fleet:
```bash
curl -s http://mbproxy-host:8080/status.json |
  jq -r '.plcs[] | "\(.name)\t\(.listener.state)\t\(.backend.lastRoundTripMs)"'
```
Prometheus-style scrapers should poll /status.json directly and translate fields into their own metric names; the service does not expose Prometheus exposition format.
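As a rough idea of what such a translation layer could look like, the sketch below turns a parsed snapshot into Prometheus text exposition lines. Everything about the naming is an assumption: the `mbproxy_*` metric names, the `plc` label, and the selection of fields are invented for illustration and are not defined by the service.

```python
def to_prometheus(snapshot: dict) -> str:
    """Translate a parsed /status.json snapshot into exposition-format text."""
    lines = ["mbproxy_uptime_seconds %d" % snapshot["service"]["uptimeSeconds"]]
    for plc in snapshot["plcs"]:
        # Escape backslash, double quote, and newline so the PLC name
        # is a valid label value.
        name = (
            plc["name"]
            .replace("\\", "\\\\")
            .replace('"', '\\"')
            .replace("\n", "\\n")
        )
        labels = '{plc="%s"}' % name
        bound = 1 if plc["listener"]["state"] == "bound" else 0
        lines.append("mbproxy_listener_bound%s %d" % (labels, bound))
        lines.append(
            "mbproxy_pdus_forwarded_total%s %d" % (labels, plc["pdus"]["forwarded"])
        )
        lines.append(
            "mbproxy_backend_rtt_ms%s %s" % (labels, plc["backend"]["lastRoundTripMs"])
        )
    return "\n".join(lines) + "\n"
```

Monotonic fields such as `pdus.forwarded` map naturally to counters and current-value fields such as `lastRoundTripMs` to gauges; the translation layer, not the service, owns that mapping.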
Scope of This Document
This document covers the endpoint surface: what is on the wire and how each field is computed. When a new counter is added, list it here.
Related Documentation
- `../Architecture/ConnectionModel.md`: multiplexer counter meanings (`inFlight`, `maxInFlight`, `txIdWraps`, `queueDepth`, `disconnectCascades`).
- `../Architecture/ReadCoalescing.md`: coalescing counter meanings and matching criteria.
- `../Architecture/ResponseCache.md`: cache counter meanings, TTL, invalidation rules.
- `../Features/BcdRewriting.md`: what increments `rewrittenSlots` and `partialBcdWarnings`.
- `../Features/HotReload.md`: what increments `configReloadCount` vs. `configReloadRejectedCount`.
- `./Configuration.md`: `Mbproxy.AdminPort` and other option keys.
- `./Troubleshooting.md`: using these counters to diagnose specific failure modes.
- `../Reference/LogEvents.md`: event-id catalogue including `mbproxy.admin.bind.failed`.