# Status Page

The status page is the operator-facing view of the running service: an auto-refreshing HTML dashboard at `GET /` and a JSON twin at `GET /status.json` that monitoring scrapers consume. This document describes the endpoint surface, every wire-level field, and how counters map back to architecture decisions.

## Endpoint Surface

The admin endpoint is owned by `AdminEndpointHost` (see `src/Mbproxy/Admin/AdminEndpointHost.cs`). It exposes exactly two routes:

- `GET /` — a single self-contained HTML document with a `<meta http-equiv="refresh" content="5">` tag. The page refreshes every five seconds by reload, not by JavaScript polling. There is no JS bundle, no external CSS, no remote fonts, and no favicon fetch.
- `GET /status.json` — the same in-memory snapshot serialized as JSON via the source-generated `StatusJsonContext` (camelCase property names).

The endpoint is **read-only**. There are no admin actions exposed — no kick-client, no force-reload, no listener restart, no log download. Reload happens automatically via `IOptionsMonitor`; listener recovery is owned by the supervisor. The endpoint performs no authentication of its own; access control is delegated to the network layer: the service binds to `IPAddress.Any` on the admin port and assumes the deployment runs in a trusted internal segment behind a firewall.

Both routes call `StatusSnapshotBuilder.Build()` for every request. The builder reads atomic counters directly from the supervisor map and per-PLC `ProxyCounters`; it holds no locks and performs no I/O.

## Port and Configuration

The listen port is read from `Mbproxy.AdminPort` and defaults to `8080`. Configuration semantics for this key live in [`./Configuration.md`](./Configuration.md).

If Kestrel cannot bind the configured port at startup (port already in use, missing permissions on a reserved range, etc.) the host logs `mbproxy.admin.bind.failed` at `Error` level with the underlying reason. The host then sets `_app = null` and returns — the rest of the service keeps running. The Modbus listener supervisors are completely independent of the admin endpoint, so a bind failure here is non-fatal for proxying. See [`../Reference/LogEvents.md`](../Reference/LogEvents.md) for the event-id catalogue.

If `Mbproxy.AdminPort` changes via hot-reload, the currently-running Kestrel app is stopped (2 s deadline) and a new one is started on the new port. Other config changes do not touch the admin endpoint.

## Service-Wide Fields

Top-level fields come from `ServiceFields` and `ListenersAggregate` in `src/Mbproxy/Admin/StatusDto.cs`.

| JSON path | Type | Source | Meaning |
|---|---|---|---|
| `service.uptimeSeconds` | `long` | `ServiceFields.UptimeSeconds` | Seconds since process start, computed as `now - ServiceCounters.StartedAtUtc` at snapshot time. |
| `service.version` | `string` | `ServiceFields.Version` via `AssemblyVersionAccessor` | `AssemblyInformationalVersion` of the running assembly. Useful for confirming a deployment took effect. |
| `service.configLastReloadUtc` | `DateTimeOffset?` | `ServiceCounters.LastReloadUtc` | Wall-clock time of the most recent **accepted** hot-reload. `null` if no reload has occurred since process start. See [`../Features/HotReload.md`](../Features/HotReload.md). |
| `service.configReloadCount` | `int` | `ServiceCounters.ReloadAppliedCount` | Number of `appsettings.json` reloads that validated and applied since process start. |
| `service.configReloadRejectedCount` | `int` | `ServiceCounters.ReloadRejectedCount` | Number of reload attempts rejected by validation. A non-zero value here paired with a stale `configLastReloadUtc` indicates the operator's last edit was malformed and the service is still running the previous config. |
| `listeners.bound` | `int` | `boundCount` accumulated while iterating `opts.Plcs` | Count of PLC entries whose supervisor currently reports `SupervisorState.Bound`. |
| `listeners.configured` | `int` | `opts.Plcs.Count` | Total number of PLC entries in the active configuration. |

Operator triggers:

- `listeners.bound < listeners.configured` for more than one refresh cycle indicates one or more listeners are stuck recovering. Drill into the per-PLC `listener.state` and `listener.lastBindError` fields below.
- `configReloadRejectedCount` rising means edits are reaching the watcher but failing validation — check the live log for `mbproxy.config.reload.rejected`.
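
Both operator triggers can be evaluated mechanically from two successive scrapes of `/status.json`. The following is a minimal Python sketch, not part of the service: the function name `check_triggers` and the alert strings are illustrative assumptions, and only the JSON field names are taken from the tables in this document.

```python
from typing import Any


def check_triggers(prev: dict[str, Any], curr: dict[str, Any]) -> list[str]:
    """Compare two successive /status.json snapshots and return alert strings.

    Hypothetical helper: the name and the message wording are assumptions.
    """
    alerts: list[str] = []

    def shortfall(snap: dict[str, Any]) -> int:
        # Number of configured listeners that are not currently bound.
        return snap["listeners"]["configured"] - snap["listeners"]["bound"]

    # Trigger 1: the shortfall is present in BOTH snapshots, i.e. it has
    # persisted for more than one refresh cycle.
    if shortfall(prev) > 0 and shortfall(curr) > 0:
        stuck = [p["name"] for p in curr["plcs"]
                 if p["listener"]["state"] != "bound"]
        alerts.append(f"listeners not bound: {', '.join(stuck)}")

    # Trigger 2: the rejected-reload counter rose between the two scrapes.
    rejected_delta = (curr["service"]["configReloadRejectedCount"]
                      - prev["service"]["configReloadRejectedCount"])
    if rejected_delta > 0:
        alerts.append(f"{rejected_delta} config reload(s) rejected since last scrape")

    return alerts
```

Requiring the shortfall in both snapshots avoids alerting on a listener that is briefly re-binding when a single scrape happens to land.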

## Per-PLC Fields

Each entry in `plcs[]` is a `PlcStatus` (see `src/Mbproxy/Admin/StatusDto.cs`). The builder iterates `opts.Plcs` in configured order, looks up the matching supervisor in `ProxyWorker.Supervisors`, and projects the supervisor's `CurrentCounters.Snapshot()` into wire fields.

### Identity

| JSON path | Type | Source | Meaning |
|---|---|---|---|
| `name` | `string` | `PlcOptions.Name` | Stable identifier from `appsettings.json`. Used as the dictionary key for supervisor lookup. |
| `host` | `string` | `PlcOptions.Host` | Backend PLC host (IP or DNS name) the proxy connects out to. |
| `listenPort` | `int` | `PlcOptions.ListenPort` | Local TCP port the proxy binds for upstream clients connecting *to* the proxy. |

### Listener state

| JSON path | Type | Source | Meaning |
|---|---|---|---|
| `listener.state` | `string` | `SupervisorSnapshot.State` mapped to `"bound"` / `"recovering"` / `"stopped"` | Current supervisor state. `bound` = TCP listener is accepting connections; `recovering` = Polly retry loop is trying to re-bind after a fault; `stopped` = no supervisor entry (typically a PLC that was just added and not yet started). |
| `listener.lastBindError` | `string?` | `SupervisorSnapshot.LastBindError` | Message from the last bind exception. Populated whenever `state == "recovering"`. Common values: `"Address already in use"`, `"Permission denied"`. |
| `listener.recoveryAttempts` | `int` | `SupervisorSnapshot.RecoveryAttempts` | Number of bind retries since the supervisor entered recovery. Resets on a successful bind. A monotonically rising value indicates the underlying problem is persistent. |

### Client tracking

| JSON path | Type | Source | Meaning |
|---|---|---|---|
| `clients.connected` | `int` | `clientSnapshots.Count` | Number of currently-connected upstream clients. Capped by the H2-ECOM100 four-client ceiling; a value of 4 implies that additional upstream connect attempts will be refused by the PLC. |
| `clients.remoteEndpoints[].remote` | `string` | `UpstreamPipe.RemoteEp` | Upstream TCP endpoint as `ip:port`. |
| `clients.remoteEndpoints[].connectedAtUtc` | `DateTimeOffset` | `UpstreamPipe.ConnectedAtUtc` | Wall-clock time the upstream socket was accepted. Useful for spotting zombie sockets that survived a network outage. |
| `clients.remoteEndpoints[].pdusForwarded` | `long` | `UpstreamPipe.PdusForwardedCount` | PDUs forwarded on this specific upstream pipe since it connected. Lets you see which client is responsible for what fraction of fleet traffic. |

### PDU traffic

| JSON path | Type | Source | Meaning |
|---|---|---|---|
| `pdus.forwarded` | `long` | `CounterSnapshot.PdusForwarded` | Total PDUs (requests + responses) that traversed the proxy for this PLC since start. Increments once per PDU handed to the rewriter. |
| `pdus.byFc.fc03` | `long` | `CounterSnapshot.Fc03` | Count of FC03 (read holding registers) requests seen. |
| `pdus.byFc.fc04` | `long` | `CounterSnapshot.Fc04` | Count of FC04 (read input registers) requests seen. |
| `pdus.byFc.fc06` | `long` | `CounterSnapshot.Fc06` | Count of FC06 (write single register) requests seen. |
| `pdus.byFc.fc16` | `long` | `CounterSnapshot.Fc16` | Count of FC16 (write multiple registers) requests seen. |
| `pdus.byFc.other` | `long` | `CounterSnapshot.FcOther` | All other function codes (FC01/02/05/15, diagnostic codes, etc.) seen. The proxy forwards these untouched. |
| `pdus.rewrittenSlots` | `long` | `CounterSnapshot.RewrittenSlots` | Number of register slots the BCD rewriter touched, counting reads and writes. Indicates how much of the traffic actually hits BCD-configured addresses. See [`../Features/BcdRewriting.md`](../Features/BcdRewriting.md). |
| `pdus.partialBcdWarnings` | `long` | `CounterSnapshot.PartialBcdWarnings` | Count of requests whose `[start, start + qty)` register range partially overlapped a 32-bit BCD tag without fully covering its CDAB word pair. A rising value here is an operator signal: an upstream client is requesting partial-overlap reads, which the proxy cannot rewrite safely. Review the tag-list addresses or fix the client's request shape. |
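
The partial-overlap condition behind `pdus.partialBcdWarnings` reduces to interval arithmetic. The sketch below is illustrative Python, not the proxy's rewriter: it assumes a 32-bit BCD tag occupies the two consecutive registers `[tag_addr, tag_addr + 2)` and that a request covers `start` through `start + qty - 1`; the function name is hypothetical.

```python
def overlaps_partially(start: int, qty: int, tag_addr: int) -> bool:
    """Return True when a request covers exactly one word of a 32-bit tag.

    Assumption: the tag spans registers tag_addr and tag_addr + 1, and the
    request spans start .. start + qty - 1 (half-open: [start, start + qty)).
    """
    lo = max(start, tag_addr)                 # first register in both ranges
    hi = min(start + qty, tag_addr + 2)       # one past the last shared register
    covered_words = max(0, hi - lo)           # 0, 1, or 2 words of the tag
    return covered_words == 1
```

Under these assumptions a read of registers 99..100 against a tag at 100..101 is a partial overlap, while a read of 100..101 (full cover) or 102..105 (no overlap) is not.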

### Backend health

| JSON path | Type | Source | Meaning |
|---|---|---|---|
| `backend.connectsSuccess` | `long` | `CounterSnapshot.ConnectsSuccess` | Successful backend TCP connects since start. Increments once per accepted upstream client (the proxy opens one backend socket per upstream client). |
| `backend.connectsFailed` | `long` | `CounterSnapshot.ConnectsFailed` | Failed backend TCP connects after the Polly retry budget is exhausted (3 attempts at 100/500/2000 ms). A rising counter means the backend host is unreachable or the PLC is at its connection cap. |
| `backend.exceptionsByCode.code01` | `long` | `CounterSnapshot.BackendException01` | Count of Modbus exception responses with code 01 (Illegal Function) received from the PLC. Typically indicates a client is sending function codes the PLC does not support. |
| `backend.exceptionsByCode.code02` | `long` | `CounterSnapshot.BackendException02` | Code 02 (Illegal Data Address) — the requested register range is out of the PLC's V-memory map. |
| `backend.exceptionsByCode.code03` | `long` | `CounterSnapshot.BackendException03` | Code 03 (Illegal Data Value) — quantity exceeds the PLC's per-FC cap (FC03/04 = 128 registers, FC16 = 100). |
| `backend.exceptionsByCode.code04` | `long` | `CounterSnapshot.BackendException04` | Code 04 (Server Device Failure) — internal PLC fault, often correlated with the PLC entering STOP mode. |
| `backend.lastRoundTripMs` | `double` | `CounterSnapshot.LastRoundTripMs` | Exponentially-weighted moving average of recent successful request → response round-trip times in milliseconds. Tracks PLC responsiveness; sustained values above the historical baseline indicate backend latency degradation. |
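
For readers unfamiliar with the averaging behind `backend.lastRoundTripMs`, the standard exponentially-weighted moving average update is `ewma = alpha * sample + (1 - alpha) * ewma`. The Python sketch below illustrates that formula only: the class name, the smoothing factor `alpha = 0.2`, and seeding with the first sample are assumptions, since this document does not state the values the proxy uses.

```python
class RoundTripEwma:
    """Illustrative EWMA of round-trip times in milliseconds.

    alpha (weight of the newest sample) is an assumed value, not the proxy's.
    """

    def __init__(self, alpha: float = 0.2) -> None:
        if not 0.0 < alpha <= 1.0:
            raise ValueError("alpha must be in (0, 1]")
        self.alpha = alpha
        self.value = 0.0        # reported before any sample has been observed
        self._seeded = False

    def observe(self, sample_ms: float) -> float:
        """Fold one successful round-trip time into the average."""
        if not self._seeded:
            # Assumption: the first sample seeds the average directly.
            self.value = sample_ms
            self._seeded = True
        else:
            self.value = self.alpha * sample_ms + (1.0 - self.alpha) * self.value
        return self.value
```

A larger `alpha` makes the field track the latest request more closely; a smaller one smooths out single slow responses.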

### Multiplexer state

These five fields describe the per-PLC backend multiplexer. See [`../Architecture/ConnectionModel.md`](../Architecture/ConnectionModel.md) for the design rationale and how transaction-id (TxId) reuse and queueing work.

| JSON path | Type | Source | Meaning |
|---|---|---|---|
| `backend.inFlight` | `long` | `CounterSnapshot.InFlightCount` | Number of MBAP transactions currently in flight on the backend socket (request sent, response pending). |
| `backend.maxInFlight` | `long` | `CounterSnapshot.MaxInFlight` | High-water mark of `inFlight` since start. Used to size the queue and to verify the multiplexer is in fact pipelining requests. |
| `backend.txIdWraps` | `long` | `CounterSnapshot.TxIdWraps` | Times the 16-bit MBAP transaction-id allocator has wrapped through `0xFFFF`. A rising rate quantifies sustained request volume. |
| `backend.disconnectCascades` | `long` | `CounterSnapshot.BackendDisconnectCascades` | Times a backend disconnect cascaded into closing all upstream pipes that were waiting on in-flight TxIds. Each cascade aborts every queued request bound for that PLC. |
| `backend.queueDepth` | `long` | `CounterSnapshot.BackendQueueDepth` | Current count of requests queued behind the multiplexer's TxId allocator and write semaphore. A sustained non-zero queue means the multiplexer is the bottleneck (backend slower than upstream demand). |
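
To make `backend.txIdWraps` concrete, the sketch below shows a 16-bit transaction-id counter that records each wrap through `0xFFFF`. It is a simplified Python illustration under stated assumptions: ids start at 0, are handed out sequentially, and ids still in flight are not skipped. The real allocator is described in `ConnectionModel.md` and may differ; the class name is hypothetical.

```python
class TxIdAllocator:
    """Sequential 16-bit MBAP transaction-id counter with a wrap tally."""

    def __init__(self) -> None:
        self._next = 0      # assumption: allocation starts at 0
        self.wraps = 0      # analogous to the txIdWraps counter

    def allocate(self) -> int:
        """Return the next transaction id in 0..0xFFFF."""
        tx_id = self._next
        self._next = (self._next + 1) & 0xFFFF   # stay within 16 bits
        if self._next == 0:
            # 0xFFFF was just handed out; the next id restarts at 0.
            self.wraps += 1
        return tx_id
```

With a sequential allocator like this one, each wrap corresponds to 65,536 allocated ids, which is why the wrap rate is a coarse proxy for sustained request volume.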

### Coalescing counters

These fields describe duplicate-read coalescing on FC03/FC04. See [`../Architecture/ReadCoalescing.md`](../Architecture/ReadCoalescing.md) for the matching criteria and lifecycle.

| JSON path | Type | Source | Meaning |
|---|---|---|---|
| `backend.coalescedHitCount` | `long` | `CounterSnapshot.CoalescedHitCount` | Reads that attached to an already-in-flight identical read instead of issuing a new backend request. |
| `backend.coalescedMissCount` | `long` | `CounterSnapshot.CoalescedMissCount` | Reads that did not find a matching in-flight request and issued their own. The dashboard-side ratio is `hit / (hit + miss)`; the wire format intentionally does **not** carry the derived ratio (consumers compute it). |
| `backend.coalescedResponseToDeadUpstream` | `long` | `CounterSnapshot.CoalescedResponseToDeadUpstream` | Coalesced responses that arrived after their attached upstream pipe had closed. Normal in bursty traffic; sustained growth indicates upstream clients are aborting too quickly. |

### Cache counters

These fields describe the short-TTL response cache for FC03/FC04. See [`../Architecture/ResponseCache.md`](../Architecture/ResponseCache.md).

| JSON path | Type | Source | Meaning |
|---|---|---|---|
| `backend.cacheHitCount` | `long` | `CounterSnapshot.CacheHitCount` | Reads served from the cache without touching the backend at all. |
| `backend.cacheMissCount` | `long` | `CounterSnapshot.CacheMissCount` | Cache-eligible reads that fell through to the backend. The derived `cacheHitRatio` is `hit / (hit + miss)`; like coalescing, it is **not** carried on the wire. |
| `backend.cacheInvalidations` | `long` | `CounterSnapshot.CacheInvalidations` | Times a write (FC06/FC16) invalidated overlapping cache entries on this PLC. A high invalidation rate relative to writes means write coverage is broad and the cache is doing less work. |

### Cache memory-watch

These two fields are Tier-2 KPIs intended for memory-budget alerts. The cache is per-PLC; the dashboard aggregates these across the fleet.

| JSON path | Type | Source | Meaning |
|---|---|---|---|
| `backend.cacheEntryCount` | `long` | `CounterSnapshot.CacheEntryCount` | Current number of cached response entries for this PLC. |
| `backend.cacheBytes` | `long` | `CounterSnapshot.CacheBytes` | Approximate byte cost of the cache entries (response payloads plus key overhead). Used to detect runaway growth from a chatty client. |
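
Because the two memory-watch fields are reported per PLC, a fleet-wide figure has to be summed by the consumer. A minimal Python sketch of that aggregation over a parsed `/status.json` document follows; the function name and the shape of the returned dictionary are assumptions made for illustration.

```python
def fleet_cache_totals(snapshot: dict) -> dict:
    """Sum the per-PLC cache memory-watch fields across all PLCs.

    `snapshot` is the parsed /status.json document; the helper itself is
    hypothetical and not part of the service.
    """
    entries = 0
    size_bytes = 0
    for plc in snapshot.get("plcs", []):
        backend = plc["backend"]
        entries += backend["cacheEntryCount"]
        size_bytes += backend["cacheBytes"]
    return {"cacheEntryCount": entries, "cacheBytes": size_bytes}
```

A memory-budget alert can then compare the summed `cacheBytes` against whatever ceiling the deployment has chosen.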

### Keepalive counters

These fields describe the backend keepalive heartbeat. See [`../Architecture/Keepalive.md`](../Architecture/Keepalive.md).

| JSON path | Type | Source | Meaning |
|---|---|---|---|
| `backend.backendHeartbeatsSent` | `long` | `CounterSnapshot.BackendHeartbeatsSent` | Synthetic FC03 heartbeat probes issued on this PLC's idle backend socket. |
| `backend.backendHeartbeatsFailed` | `long` | `CounterSnapshot.BackendHeartbeatsFailed` | Heartbeat probes not answered within `BackendRequestTimeoutMs`. Each failure tears the backend down. |
| `backend.backendIdleDisconnects` | `long` | `CounterSnapshot.BackendIdleDisconnects` | Backend teardowns triggered by a failed heartbeat — an event count, distinct from `disconnectCascades` (which counts cascaded pipes). Sustained growth means a PLC is repeatedly going dark while idle. |
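
A heartbeat probe is an ordinary FC03 (read holding registers) request, so on the wire it is a 7-byte MBAP header followed by a 5-byte PDU, as defined by standard Modbus TCP framing. The Python sketch below builds such a frame for illustration. The register address, the quantity of 1, and the unit id are assumptions: this document does not say which register the proxy probes, and the function name is hypothetical.

```python
import struct


def build_fc03_request(tx_id: int, unit_id: int = 1,
                       address: int = 0, quantity: int = 1) -> bytes:
    """Build a Modbus TCP FC03 request frame (MBAP header + PDU).

    Defaults for unit_id, address and quantity are illustrative assumptions.
    """
    # PDU: function code (1 byte), start address (2), register quantity (2).
    pdu = struct.pack(">BHH", 0x03, address, quantity)
    # MBAP: transaction id (2), protocol id = 0 (2),
    # length = unit id byte + PDU bytes (2), unit id (1).
    mbap = struct.pack(">HHHB", tx_id, 0, len(pdu) + 1, unit_id)
    return mbap + pdu
```

With the defaults, `build_fc03_request(1)` yields the 12 bytes `00 01 00 00 00 06 01 03 00 00 00 01`.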

### Bytes

| JSON path | Type | Source | Meaning |
|---|---|---|---|
| `bytes.upstreamIn` | `long` | `CounterSnapshot.BytesUpstreamIn` | Total bytes read from upstream client sockets bound to this PLC since start. |
| `bytes.upstreamOut` | `long` | `CounterSnapshot.BytesUpstreamOut` | Total bytes written back to upstream client sockets bound to this PLC since start. |

## Counter Atomicity

All counters are `System.Threading.Interlocked` longs. Each read in `StatusSnapshotBuilder.Build()` is atomic per field; no locks are held across the snapshot build, and the build itself does no I/O.

The practical consequence: a single `/status.json` request returns a coherent value for any **one** counter, but the assembled response is **not** a globally consistent snapshot — different per-PLC counters may straddle increments by microseconds. For example, `pdus.forwarded` for PLC A and `pdus.forwarded` for PLC B are not guaranteed to reflect the same instant. This is acceptable for dashboards and rate calculations; do not use these counters for fine-grained accounting.
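
As an example of the rate calculations these counters are suited to, the Python sketch below derives a per-PLC forwarded-PDU rate from two scrapes. It is an illustration, not service code: the function name is hypothetical, and using `service.uptimeSeconds` as the clock is an assumption that only makes sense for scrape intervals of several seconds, because that field has one-second resolution.

```python
def pdu_rates(prev: dict, curr: dict) -> dict[str, float]:
    """Per-PLC forwarded-PDU rate (PDUs per second) between two snapshots."""
    elapsed = curr["service"]["uptimeSeconds"] - prev["service"]["uptimeSeconds"]
    if elapsed <= 0:
        # Uptime did not advance: the process restarted or both scrapes fell
        # in the same second, so the counters are not comparable.
        return {}

    before = {p["name"]: p["pdus"]["forwarded"] for p in prev["plcs"]}
    rates: dict[str, float] = {}
    for plc in curr["plcs"]:
        name = plc["name"]
        if name not in before:
            continue  # PLC added by a hot-reload between the scrapes
        delta = plc["pdus"]["forwarded"] - before[name]
        if delta >= 0:
            rates[name] = delta / elapsed
    return rates
```

Small skew between counters in one snapshot shifts such a rate by at most a few PDUs per interval, which is why the lack of global consistency is harmless here.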

## Example JSON Response

A representative two-PLC deployment, ~2 hours into a run:

```json
{
  "service": {
    "uptimeSeconds": 7234,
    "version": "1.0.0",
    "configLastReloadUtc": "2026-05-13T14:02:11+00:00",
    "configReloadCount": 2,
    "configReloadRejectedCount": 0
  },
  "listeners": {
    "bound": 2,
    "configured": 2
  },
  "plcs": [
    {
      "name": "line1-press",
      "host": "10.20.30.41",
      "listenPort": 5021,
      "listener": {
        "state": "bound",
        "lastBindError": null,
        "recoveryAttempts": 0
      },
      "clients": {
        "connected": 2,
        "remoteEndpoints": [
          {
            "remote": "10.20.40.10:51223",
            "connectedAtUtc": "2026-05-13T12:01:55+00:00",
            "pdusForwarded": 184213
          },
          {
            "remote": "10.20.40.11:53901",
            "connectedAtUtc": "2026-05-13T13:30:02+00:00",
            "pdusForwarded": 41008
          }
        ]
      },
      "pdus": {
        "forwarded": 225221,
        "byFc": {
          "fc03": 218904,
          "fc04": 0,
          "fc06": 12,
          "fc16": 6203,
          "other": 102
        },
        "rewrittenSlots": 1318622,
        "partialBcdWarnings": 0
      },
      "backend": {
        "connectsSuccess": 2,
        "connectsFailed": 0,
        "exceptionsByCode": {
          "code01": 0,
          "code02": 14,
          "code03": 0,
          "code04": 0
        },
        "lastRoundTripMs": 12.4,
        "inFlight": 1,
        "maxInFlight": 4,
        "txIdWraps": 3,
        "disconnectCascades": 0,
        "queueDepth": 0,
        "coalescedHitCount": 41892,
        "coalescedMissCount": 177012,
        "coalescedResponseToDeadUpstream": 7,
        "cacheHitCount": 88321,
        "cacheMissCount": 88691,
        "cacheInvalidations": 6203,
        "cacheEntryCount": 47,
        "cacheBytes": 18512,
        "backendHeartbeatsSent": 412,
        "backendHeartbeatsFailed": 0,
        "backendIdleDisconnects": 0
      },
      "bytes": {
        "upstreamIn": 4108290,
        "upstreamOut": 12993021
      }
    },
    {
      "name": "line2-oven",
      "host": "10.20.30.42",
      "listenPort": 5022,
      "listener": {
        "state": "recovering",
        "lastBindError": "Address already in use",
        "recoveryAttempts": 12
      },
      "clients": {
        "connected": 0,
        "remoteEndpoints": []
      },
      "pdus": {
        "forwarded": 0,
        "byFc": { "fc03": 0, "fc04": 0, "fc06": 0, "fc16": 0, "other": 0 },
        "rewrittenSlots": 0,
        "partialBcdWarnings": 0
      },
      "backend": {
        "connectsSuccess": 0,
        "connectsFailed": 0,
        "exceptionsByCode": { "code01": 0, "code02": 0, "code03": 0, "code04": 0 },
        "lastRoundTripMs": 0.0,
        "inFlight": 0,
        "maxInFlight": 0,
        "txIdWraps": 0,
        "disconnectCascades": 0,
        "queueDepth": 0,
        "coalescedHitCount": 0,
        "coalescedMissCount": 0,
        "coalescedResponseToDeadUpstream": 0,
        "cacheHitCount": 0,
        "cacheMissCount": 0,
        "cacheInvalidations": 0,
        "cacheEntryCount": 0,
        "cacheBytes": 0,
        "backendHeartbeatsSent": 0,
        "backendHeartbeatsFailed": 0,
        "backendIdleDisconnects": 0
      },
      "bytes": { "upstreamIn": 0, "upstreamOut": 0 }
    }
  ]
}
```

## HTML Page Layout

The HTML renderer is `StatusHtmlRenderer.Render(StatusResponse)` in `src/Mbproxy/Admin/StatusHtmlRenderer.cs`. The page is a single document with inline CSS in a `<style>` block and no external resources of any kind, so operators can serve it behind a corporate firewall without whitelisting a CDN.

Structure:

1. **Header summary** — version, formatted uptime (`Nh MMm SSs`), `bound/configured` listener tally, last reload timestamp, reload count with a `(N rejected)` suffix when applicable.
2. **PLC table** — one row per configured PLC. Columns: Name, Host, Port, State (colour-coded — `bound` = green, `recovering` = orange, `stopped` = grey), Clients (count plus a comma-separated list of `remote (N PDUs)`), PDUs forwarded, FC03/FC04/FC06/FC16/FC? counts, BCD slots, Partial BCD, exception codes 01/02/03/04, RTT (ms), bytes in/out, multiplexer columns (in-flight, max in-flight, TxId wraps, cascades, queue), coalescing ratio cell, cache ratio cell, keepalive cell.
3. **State cell error detail** — when `state == "recovering"`, the cell also shows `lastBindError` and `(attempt N)` in a small red span.
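
The `Nh MMm SSs` uptime format in the header summary can be reproduced from `service.uptimeSeconds` with two integer divisions. This Python sketch is illustrative only; it assumes hours are unpadded and never rolled over into days, which the format string suggests but this document does not spell out.

```python
def format_uptime(seconds: int) -> str:
    """Format a second count as 'Nh MMm SSs' (hours unpadded, by assumption)."""
    hours, remainder = divmod(seconds, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours}h {minutes:02d}m {secs:02d}s"
```

For the example response in this document, `format_uptime(7234)` gives `2h 00m 34s`.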

The coalescing and cache cells each render as `<pct>% (<hits>)`. When a cell's counters have not been exercised (`hit + miss == 0`), that cell renders an em-dash to keep the column narrow. The keepalive cell shows the heartbeat-sent count, with `(fail N, idle-disc N)` appended only when either is non-zero. Page weight is bounded by the design budget (≤ 50 KB for a 54-PLC fleet).
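
A consumer of `/status.json` that wants to mirror the ratio cells can derive them from the hit and miss counters. The Python sketch below follows the `<pct>% (<hits>)` shape described above; the one-decimal precision and the function name are assumptions, since the renderer's exact rounding is not documented here.

```python
def ratio_cell(hits: int, misses: int) -> str:
    """Render a hit-ratio cell as '<pct>% (<hits>)', or an em-dash when idle."""
    total = hits + misses
    if total == 0:
        return "\u2014"  # U+2014 em-dash: nothing has been measured yet
    # One decimal place is an assumed precision.
    return f"{100.0 * hits / total:.1f}% ({hits})"
```

The zero-total guard matters for freshly started or idle PLCs, where dividing `hit / (hit + miss)` directly would fail.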

The page does not depend on JavaScript. Refresh is driven entirely by the `<meta http-equiv="refresh" content="5">` tag, so any browser — including text-mode browsers — sees the same view.

## How to Scrape It

The JSON twin is plain HTTP. Any monitoring system that can curl an endpoint can scrape it.

PowerShell, pulling the cache hit ratio for the first PLC into a variable:

```powershell
$snap = Invoke-WebRequest -Uri "http://mbproxy-host:8080/status.json" -UseBasicParsing |
    Select-Object -ExpandProperty Content |
    ConvertFrom-Json

$plc = $snap.plcs[0]
$hits = $plc.backend.cacheHitCount
$total = $hits + $plc.backend.cacheMissCount
$ratio = if ($total -gt 0) { [math]::Round(100.0 * $hits / $total, 1) } else { 0.0 }

"PLC $($plc.name): cache hit ratio = $ratio% over $total reads"
```

Bash with `curl` and `jq`, fanning out across the fleet:

```bash
curl -s http://mbproxy-host:8080/status.json |
  jq -r '.plcs[] | "\(.name)\t\(.listener.state)\t\(.backend.lastRoundTripMs)"'
```

Prometheus-style scrapers should poll `/status.json` directly and translate fields into their own metric names; the service does not expose Prometheus exposition format.
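
One way to do that translation is a small adapter that flattens the parsed JSON into Prometheus text exposition lines. The Python sketch below is a starting point under stated assumptions: every metric name (`mbproxy_...`) and the `plc` label are invented for illustration, only a handful of fields are mapped, and label escaping covers backslashes and double quotes but not newlines.

```python
def to_prometheus(snapshot: dict) -> str:
    """Flatten a parsed /status.json document into Prometheus text lines.

    Metric names and the `plc` label are hypothetical, not defined by the service.
    """
    lines = [f'mbproxy_uptime_seconds {snapshot["service"]["uptimeSeconds"]}']
    for plc in snapshot["plcs"]:
        # Escape backslashes and double quotes inside the label value.
        name = plc["name"].replace("\\", "\\\\").replace('"', '\\"')
        label = '{plc="%s"}' % name
        bound = 1 if plc["listener"]["state"] == "bound" else 0
        lines.append(f"mbproxy_listener_bound{label} {bound}")
        lines.append(
            f"mbproxy_pdus_forwarded_total{label} {plc['pdus']['forwarded']}")
        lines.append(
            f"mbproxy_backend_connects_failed_total{label} "
            f"{plc['backend']['connectsFailed']}")
    return "\n".join(lines) + "\n"
```

Serving the returned string from a sidecar HTTP endpoint, or writing it to a file for a textfile-style collector, keeps the proxy itself unchanged.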

## Scope of This Document

This document covers the **endpoint surface**: what is on the wire and how each field is computed. When a new counter is added, list it here.

## Related Documentation

- [`../Architecture/ConnectionModel.md`](../Architecture/ConnectionModel.md) — multiplexer counter meanings (`inFlight`, `maxInFlight`, `txIdWraps`, `queueDepth`, `disconnectCascades`).
- [`../Architecture/ReadCoalescing.md`](../Architecture/ReadCoalescing.md) — coalescing counter meanings and matching criteria.
- [`../Architecture/ResponseCache.md`](../Architecture/ResponseCache.md) — cache counter meanings, TTL, invalidation rules.
- [`../Features/BcdRewriting.md`](../Features/BcdRewriting.md) — what increments `rewrittenSlots` and `partialBcdWarnings`.
- [`../Features/HotReload.md`](../Features/HotReload.md) — what increments `configReloadCount` vs. `configReloadRejectedCount`.
- [`./Configuration.md`](./Configuration.md) — `Mbproxy.AdminPort` and other option keys.
- [`./Troubleshooting.md`](./Troubleshooting.md) — using these counters to diagnose specific failure modes.
- [`../Reference/LogEvents.md`](../Reference/LogEvents.md) — event-id catalogue including `mbproxy.admin.bind.failed`.