wwtools/mbproxy/docs/Operations/StatusPage.md
Joseph Doherty 374eecd205 mbproxy: fix the dashboard's C2/M-series review findings
Closes the on-demand-capture leak cluster from the code review. The capture's armed state was driven off SignalR's ConnectionId, which changes on every transport reconnect, so a reconnect-during-view leaked a subscriber and left the capture armed forever with no viewer. PlcSubscriptionTracker now keys on a stable per-page-load tabId, and StatusBroadcaster reconciles capture arm state from the live viewer set each push cycle — making arming single-threaded and reconnect-safe. Also fixes the TagValueCapture disarm-vs-record race, the bind-failure broadcaster/listener leak, removes dead JSON-context code, and reworks the frontend cold-start retry plus an unknown-PLC watchdog. Adds tracker / broadcaster-loop / race / wire-shape test coverage.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 16:12:43 -04:00


# Status Page
The status page is the operator-facing view of the running service: a live web dashboard backed by SignalR, plus a JSON twin at `GET /status.json` that monitoring scrapers consume. This document describes the endpoint surface, every wire-level field, and how counters map back to architecture decisions.
## Endpoint Surface
The admin endpoint is owned by `AdminEndpointHost` (see `src/Mbproxy/Admin/AdminEndpointHost.cs`). It exposes:
- `GET /` — the **fleet dashboard** SPA shell: aggregate fleet health cards and a filterable/sortable per-PLC KPI table.
- `GET /plc/{name}` — the **connection-detail** SPA shell for one PLC: every per-PLC counter grouped into readable cards, the connected-client list, and a real-time debug view (per-tag PLC-side raw BCD vs. client-side decoded value).
- `GET /assets/{path}` — embedded static assets: Bootstrap 5, the SignalR JS client, two vendored IBM Plex woff2 fonts, and the dashboard's own HTML/CSS/JS. Everything is embedded in the binary; nothing is fetched from a CDN, so the UI works on a firewalled network. Served with a long immutable cache header.
- `GET /status.json` — the same in-memory snapshot serialized as JSON via the source-generated `StatusJsonContext` (camelCase property names).
- `/hub/status` — the SignalR hub. The two SPA shells open a hub connection and subscribe: the dashboard to the `fleet` group, a detail page to its `plc:{name}` group. A `StatusBroadcaster` loop pushes a fresh snapshot every `Mbproxy.AdminPushIntervalMs` (default 1000 ms).
The endpoint is **read-only**. It exposes no admin actions: no kick-client, no force-reload, no listener restart, no log download. The detail-page debug view is the one feature with a runtime side effect, and that effect is benign: a PLC's tag-value capture is *armed* (begins recording last-seen values) only while at least one detail page is subscribed to it, and *disarmed* when the last viewer leaves. Reload happens automatically via `IOptionsMonitor`; listener recovery is owned by the supervisor. The endpoint performs no authentication of its own; access control lives at the network layer. The service binds to `IPAddress.Any` on the admin port and assumes the deployment runs in a trusted internal segment behind a firewall.
`GET /status.json` and every SignalR push call `StatusSnapshotBuilder.Build()`. The builder reads atomic counters directly from the supervisor map and per-PLC `ProxyCounters`; it holds no locks and performs no I/O.
## Port and Configuration
The listen port is read from `Mbproxy.AdminPort` and defaults to `8080`. Configuration semantics for this key live in [`./Configuration.md`](./Configuration.md).
If Kestrel cannot bind the configured port at startup (port already in use, missing permissions on a reserved range, etc.) the host logs `mbproxy.admin.bind.failed` at `Error` level with the underlying reason. The host then sets `_app = null` and returns — the rest of the service keeps running. The Modbus listener supervisors are completely independent of the admin endpoint, so a bind failure here is non-fatal for proxying. See [`../Reference/LogEvents.md`](../Reference/LogEvents.md) for the event-id catalogue.
If `Mbproxy.AdminPort` changes via hot-reload, the currently-running Kestrel app is stopped (2 s deadline) and a new one is started on the new port. Other config changes do not touch the admin endpoint.
## Service-Wide Fields
Top-level fields come from `ServiceFields` and `ListenersAggregate` in `src/Mbproxy/Admin/StatusDto.cs`.
| JSON path | Type | Source | Meaning |
|---|---|---|---|
| `service.uptimeSeconds` | `long` | `ServiceFields.UptimeSeconds` | Seconds since process start, computed as `now - ServiceCounters.StartedAtUtc` at snapshot time. |
| `service.version` | `string` | `ServiceFields.Version` via `AssemblyVersionAccessor` | `AssemblyInformationalVersion` of the running assembly. Useful for confirming a deployment took effect. |
| `service.configLastReloadUtc` | `DateTimeOffset?` | `ServiceCounters.LastReloadUtc` | Wall-clock time of the most recent **accepted** hot-reload. `null` if no reload has occurred since process start. See [`../Features/HotReload.md`](../Features/HotReload.md). |
| `service.configReloadCount` | `int` | `ServiceCounters.ReloadAppliedCount` | Number of `appsettings.json` reloads that validated and applied since process start. |
| `service.configReloadRejectedCount` | `int` | `ServiceCounters.ReloadRejectedCount` | Number of reload attempts rejected by validation. A non-zero value here paired with a stale `configLastReloadUtc` indicates the operator's last edit was malformed and the service is still running the previous config. |
| `listeners.bound` | `int` | `boundCount` accumulated while iterating `opts.Plcs` | Count of PLC entries whose supervisor currently reports `SupervisorState.Bound`. |
| `listeners.configured` | `int` | `opts.Plcs.Count` | Total number of PLC entries in the active configuration. |
Operator triggers:
- `listeners.bound < listeners.configured` for more than one refresh cycle indicates one or more listeners are stuck recovering. Drill into the per-PLC `listener.state` and `listener.lastBindError` fields below.
- `configReloadRejectedCount` rising means edits are reaching the watcher but failing validation — check the live log for `mbproxy.config.reload.rejected`.
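
The two operator triggers above can be evaluated mechanically by a scraper that keeps the last few snapshots. The following is a minimal sketch, assuming snapshots are the parsed `/status.json` documents; the function names and the "two or more consecutive snapshots" threshold are illustrative choices, not part of the service.

```python
from typing import Any, Dict, List

Snapshot = Dict[str, Any]


def listeners_stuck(recent: List[Snapshot]) -> bool:
    """True when bound < configured in each of two or more consecutive snapshots."""
    if len(recent) < 2:
        return False  # a single cycle below target is not yet a signal
    return all(s["listeners"]["bound"] < s["listeners"]["configured"] for s in recent)


def reload_rejections_rising(prev: Snapshot, curr: Snapshot) -> bool:
    """True when configReloadRejectedCount grew between two snapshots."""
    before = prev["service"]["configReloadRejectedCount"]
    after = curr["service"]["configReloadRejectedCount"]
    return after > before
```

A scraper would call `listeners_stuck` with its most recent snapshots and, on a positive result, inspect each PLC's `listener.state` and `listener.lastBindError`.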
## Per-PLC Fields
Each entry in `plcs[]` is a `PlcStatus` (see `src/Mbproxy/Admin/StatusDto.cs`). The builder iterates `opts.Plcs` in configured order, looks up the matching supervisor in `ProxyWorker.Supervisors`, and projects the supervisor's `CurrentCounters.Snapshot()` into wire fields.
### Identity
| JSON path | Type | Source | Meaning |
|---|---|---|---|
| `name` | `string` | `PlcOptions.Name` | Stable identifier from `appsettings.json`. Used as the dictionary key for supervisor lookup. |
| `host` | `string` | `PlcOptions.Host` | Backend PLC host (IP or DNS name) the proxy connects out to. |
| `listenPort` | `int` | `PlcOptions.ListenPort` | Local TCP port the proxy binds for upstream clients connecting *to* the proxy. |
### Listener state
| JSON path | Type | Source | Meaning |
|---|---|---|---|
| `listener.state` | `string` | `SupervisorSnapshot.State` mapped to `"bound"` / `"recovering"` / `"stopped"` | Current supervisor state. `bound` = TCP listener is accepting connections; `recovering` = Polly retry loop is trying to re-bind after a fault; `stopped` = no supervisor entry (typically a PLC that was just added and not yet started). |
| `listener.lastBindError` | `string?` | `SupervisorSnapshot.LastBindError` | Message from the last bind exception. Populated whenever `state == "recovering"`. Common values: `"Address already in use"`, `"Permission denied"`. |
| `listener.recoveryAttempts` | `int` | `SupervisorSnapshot.RecoveryAttempts` | Number of bind retries since the supervisor entered recovery. Resets on a successful bind. A monotonically rising value indicates the underlying problem is persistent. |
### Client tracking
| JSON path | Type | Source | Meaning |
|---|---|---|---|
| `clients.connected` | `int` | `clientSnapshots.Count` | Number of currently-connected upstream clients. Capped by the H2-ECOM100 four-client ceiling; values at 4 imply additional upstream connect attempts will be refused by the PLC. |
| `clients.remoteEndpoints[].remote` | `string` | `UpstreamPipe.RemoteEp` | Upstream TCP endpoint as `ip:port`. |
| `clients.remoteEndpoints[].connectedAtUtc` | `DateTimeOffset` | `UpstreamPipe.ConnectedAtUtc` | Wall-clock time the upstream socket was accepted. Useful for spotting zombie sockets that survived a network outage. |
| `clients.remoteEndpoints[].pdusForwarded` | `long` | `UpstreamPipe.PdusForwardedCount` | PDUs forwarded on this specific upstream pipe since it connected. Lets you see which client is responsible for what fraction of fleet traffic. |
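
As a rough illustration of the per-client traffic split, the sketch below computes each connected client's share from one `plcs[]` entry. The function name is hypothetical. Note the assumption: shares are relative to the clients connected right now, because counts for clients that already disconnected are not present in `remoteEndpoints`.

```python
from typing import Any, Dict


def client_traffic_shares(plc: Dict[str, Any]) -> Dict[str, float]:
    """Map each connected client's remote endpoint to its fraction of forwarded PDUs."""
    endpoints = plc["clients"]["remoteEndpoints"]
    total = sum(ep["pdusForwarded"] for ep in endpoints)
    if total == 0:
        # No traffic yet (or no clients): report zero shares rather than divide by zero.
        return {ep["remote"]: 0.0 for ep in endpoints}
    return {ep["remote"]: ep["pdusForwarded"] / total for ep in endpoints}
```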
### PDU traffic
| JSON path | Type | Source | Meaning |
|---|---|---|---|
| `pdus.forwarded` | `long` | `CounterSnapshot.PdusForwarded` | Total PDUs (requests + responses) that traversed the proxy for this PLC since start. Increments once per PDU handed to the rewriter. |
| `pdus.byFc.fc03` | `long` | `CounterSnapshot.Fc03` | Count of FC03 (read holding registers) requests seen. |
| `pdus.byFc.fc04` | `long` | `CounterSnapshot.Fc04` | Count of FC04 (read input registers) requests seen. |
| `pdus.byFc.fc06` | `long` | `CounterSnapshot.Fc06` | Count of FC06 (write single register) requests seen. |
| `pdus.byFc.fc16` | `long` | `CounterSnapshot.Fc16` | Count of FC16 (write multiple registers) requests seen. |
| `pdus.byFc.other` | `long` | `CounterSnapshot.FcOther` | All other function codes (FC01/02/05/15, diagnostic codes, etc.) seen. The proxy forwards these untouched. |
| `pdus.rewrittenSlots` | `long` | `CounterSnapshot.RewrittenSlots` | Number of register slots the BCD rewriter touched, counting reads and writes. Indicates how much of the traffic actually hits BCD-configured addresses. See [`../Features/BcdRewriting.md`](../Features/BcdRewriting.md). |
| `pdus.partialBcdWarnings` | `long` | `CounterSnapshot.PartialBcdWarnings` | Count of requests whose `[start, start + qty)` register range partially overlapped a 32-bit BCD tag without fully covering its CDAB word pair. A rising value is an operator signal: an upstream client is requesting partial-overlap reads, which the proxy cannot rewrite safely. Review the tag-list addresses or fix the client's request shape. |
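
To make the partial-overlap condition concrete, here is a small sketch that classifies a request range against one BCD tag. It assumes a 32-bit tag occupies two consecutive registers starting at its configured address and a 16-bit tag occupies one; the function name is hypothetical and the proxy's actual detection code is not shown in this document.

```python
def overlap_kind(start: int, qty: int, tag_addr: int, tag_width: int) -> str:
    """Classify how the request range [start, start + qty) overlaps a BCD tag.

    Returns "none", "full" or "partial". Only "partial" corresponds to the
    condition counted by partialBcdWarnings.
    """
    words = tag_width // 16  # 1 register for 16-bit tags, 2 for 32-bit tags
    req_lo, req_hi = start, start + qty
    tag_lo, tag_hi = tag_addr, tag_addr + words
    covered = max(0, min(req_hi, tag_hi) - max(req_lo, tag_lo))
    if covered == 0:
        return "none"
    if covered == words:
        return "full"
    return "partial"
```

For example, a one-register read at the first word of a 32-bit tag is classified as `"partial"`, while a two-register read starting at the same address is `"full"`.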
### Backend health
| JSON path | Type | Source | Meaning |
|---|---|---|---|
| `backend.connectsSuccess` | `long` | `CounterSnapshot.ConnectsSuccess` | Successful backend TCP connects since start. Increments once per accepted upstream client (the proxy opens one backend socket per upstream client). |
| `backend.connectsFailed` | `long` | `CounterSnapshot.ConnectsFailed` | Failed backend TCP connects after the Polly retry budget is exhausted (3 attempts at 100/500/2000 ms). A rising counter means the backend host is unreachable or the PLC is at its connection cap. |
| `backend.exceptionsByCode.code01` | `long` | `CounterSnapshot.BackendException01` | Count of Modbus exception responses with code 01 (Illegal Function) received from the PLC. Typically indicates a client is sending function codes the PLC does not support. |
| `backend.exceptionsByCode.code02` | `long` | `CounterSnapshot.BackendException02` | Code 02 (Illegal Data Address) — the requested register range is out of the PLC's V-memory map. |
| `backend.exceptionsByCode.code03` | `long` | `CounterSnapshot.BackendException03` | Code 03 (Illegal Data Value) — quantity exceeds the PLC's per-FC cap (FC03/04 = 128 registers, FC16 = 100). |
| `backend.exceptionsByCode.code04` | `long` | `CounterSnapshot.BackendException04` | Code 04 (Server Device Failure) — internal PLC fault, often correlated with the PLC entering STOP mode. |
| `backend.lastRoundTripMs` | `double` | `CounterSnapshot.LastRoundTripMs` | Exponentially-weighted moving average of recent successful request → response round-trip times in milliseconds. Tracks PLC responsiveness; sustained values above the historical baseline indicate backend latency degradation. |
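
The table describes `backend.lastRoundTripMs` as an exponentially-weighted moving average. A minimal sketch of that update rule follows. The smoothing factor `alpha = 0.2` and the choice to seed the average with the first sample are assumptions made for illustration; the document does not state the values the service uses.

```python
from typing import Optional


def ewma_update(previous: Optional[float], sample_ms: float, alpha: float = 0.2) -> float:
    """Fold one round-trip sample into an exponentially-weighted moving average.

    alpha (assumed) weights the newest sample; 1 - alpha weights the history.
    """
    if previous is None:
        return sample_ms  # first observation seeds the average (assumed behaviour)
    return alpha * sample_ms + (1.0 - alpha) * previous
```

A larger `alpha` tracks recent latency more closely; a smaller one smooths out single slow responses.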
### Multiplexer state
These five fields describe the per-PLC backend multiplexer. See [`../Architecture/ConnectionModel.md`](../Architecture/ConnectionModel.md) for the design rationale and how transaction-id (TxId) reuse and queueing work.
| JSON path | Type | Source | Meaning |
|---|---|---|---|
| `backend.inFlight` | `long` | `CounterSnapshot.InFlightCount` | Number of MBAP transactions currently in flight on the backend socket (request sent, response pending). |
| `backend.maxInFlight` | `long` | `CounterSnapshot.MaxInFlight` | High-water mark of `inFlight` since start. Used to size the queue and to verify the multiplexer is in fact pipelining requests. |
| `backend.txIdWraps` | `long` | `CounterSnapshot.TxIdWraps` | Times the 16-bit MBAP transaction-id allocator has wrapped through `0xFFFF`. A rising rate quantifies sustained request volume. |
| `backend.disconnectCascades` | `long` | `CounterSnapshot.BackendDisconnectCascades` | Times a backend disconnect cascaded into closing all upstream pipes that were waiting on in-flight TxIds. Each cascade aborts every queued request bound for that PLC. |
| `backend.queueDepth` | `long` | `CounterSnapshot.BackendQueueDepth` | Current count of requests queued behind the multiplexer's TxId allocator and write semaphore. A sustained non-zero queue means the multiplexer is the bottleneck (backend slower than upstream demand). |
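
The relationship between the 16-bit transaction-id space and `backend.txIdWraps` can be sketched as follows. This is a toy allocator, not the proxy's: the class name, the starting id of 0, and the absence of any check for ids still in flight are all simplifying assumptions.

```python
class TxIdAllocator:
    """Hands out 16-bit MBAP transaction ids and counts wraps past 0xFFFF."""

    def __init__(self) -> None:
        self._next = 0   # assumed starting id
        self.wraps = 0   # analogous to the txIdWraps counter

    def allocate(self) -> int:
        tx_id = self._next
        if tx_id == 0xFFFF:
            # The id space is exhausted: restart at 0 and record the wrap.
            self._next = 0
            self.wraps += 1
        else:
            self._next = tx_id + 1
        return tx_id
```

Under these assumptions one wrap corresponds to 65,536 allocated ids, which is why the wrap rate serves as a coarse measure of sustained request volume.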
### Coalescing counters
These fields describe duplicate-read coalescing on FC03/FC04. See [`../Architecture/ReadCoalescing.md`](../Architecture/ReadCoalescing.md) for the matching criteria and lifecycle.
| JSON path | Type | Source | Meaning |
|---|---|---|---|
| `backend.coalescedHitCount` | `long` | `CounterSnapshot.CoalescedHitCount` | Reads that attached to an already-in-flight identical read instead of issuing a new backend request. |
| `backend.coalescedMissCount` | `long` | `CounterSnapshot.CoalescedMissCount` | Reads that did not find a matching in-flight request and issued their own. The dashboard-side ratio is `hit / (hit + miss)`; the wire format intentionally does **not** carry the derived ratio (consumers compute it). |
| `backend.coalescedResponseToDeadUpstream` | `long` | `CounterSnapshot.CoalescedResponseToDeadUpstream` | Coalesced responses that arrived after their attached upstream pipe had closed. Normal in bursty traffic; sustained growth indicates upstream clients are aborting too quickly. |
### Cache counters
These fields describe the short-TTL response cache for FC03/FC04. See [`../Architecture/ResponseCache.md`](../Architecture/ResponseCache.md).
| JSON path | Type | Source | Meaning |
|---|---|---|---|
| `backend.cacheHitCount` | `long` | `CounterSnapshot.CacheHitCount` | Reads served from the cache without touching the backend at all. |
| `backend.cacheMissCount` | `long` | `CounterSnapshot.CacheMissCount` | Cache-eligible reads that fell through to the backend. The derived `cacheHitRatio` is `hit / (hit + miss)`; like coalescing, it is **not** carried on the wire. |
| `backend.cacheInvalidations` | `long` | `CounterSnapshot.CacheInvalidations` | Times a write (FC06/FC16) invalidated overlapping cache entries on this PLC. A high invalidation rate relative to writes means write coverage is broad and the cache is doing less work. |
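
Because neither derived ratio is carried on the wire, consumers compute `hit / (hit + miss)` themselves. The helper below is one way to do that; the name `hit_ratio` and the choice to return `None` when there have been no lookups are illustrative.

```python
from typing import Optional


def hit_ratio(hits: int, misses: int) -> Optional[float]:
    """Return hits / (hits + misses), or None when no lookups have happened yet."""
    total = hits + misses
    if total == 0:
        return None  # avoid reporting a misleading 0% before any traffic
    return hits / total


# Counters taken from the line1-press entry of the example response in this document.
coalesce_ratio = hit_ratio(41892, 177012)
cache_ratio = hit_ratio(88321, 88691)
```

The same helper applies to both counter pairs: `coalescedHitCount` / `coalescedMissCount` and `cacheHitCount` / `cacheMissCount`.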
### Cache memory-watch
These two fields are Tier-2 KPIs intended for memory-budget alerts. The cache is per-PLC; the dashboard aggregates these across the fleet.
| JSON path | Type | Source | Meaning |
|---|---|---|---|
| `backend.cacheEntryCount` | `long` | `CounterSnapshot.CacheEntryCount` | Current number of cached response entries for this PLC. |
| `backend.cacheBytes` | `long` | `CounterSnapshot.CacheBytes` | Approximate byte cost of the cache entries (response payloads plus key overhead). Used to detect runaway growth from a chatty client. |
### Keepalive counters
These fields describe the backend keepalive heartbeat. See [`../Architecture/Keepalive.md`](../Architecture/Keepalive.md).
| JSON path | Type | Source | Meaning |
|---|---|---|---|
| `backend.backendHeartbeatsSent` | `long` | `CounterSnapshot.BackendHeartbeatsSent` | Synthetic FC03 heartbeat probes issued on this PLC's idle backend socket. |
| `backend.backendHeartbeatsFailed` | `long` | `CounterSnapshot.BackendHeartbeatsFailed` | Heartbeat probes not answered within `BackendRequestTimeoutMs`. Each failure tears the backend down. |
| `backend.backendIdleDisconnects` | `long` | `CounterSnapshot.BackendIdleDisconnects` | Backend teardowns triggered by a failed heartbeat — an event count, distinct from `disconnectCascades` (which counts cascaded pipes). Sustained growth means a PLC is repeatedly going dark while idle. |
### Bytes
| JSON path | Type | Source | Meaning |
|---|---|---|---|
| `bytes.upstreamIn` | `long` | `CounterSnapshot.BytesUpstreamIn` | Total bytes read from upstream client sockets bound to this PLC since start. |
| `bytes.upstreamOut` | `long` | `CounterSnapshot.BytesUpstreamOut` | Total bytes written back to upstream client sockets bound to this PLC since start. |
## Counter Atomicity
All counters are `long` values updated through `System.Threading.Interlocked` operations. Each read in `StatusSnapshotBuilder.Build()` is atomic per field; no locks are held across the snapshot build, and the build itself does no I/O.
The practical consequence: a single `/status.json` request returns a coherent value for any **one** counter, but the assembled response is **not** a globally consistent snapshot — different per-PLC counters may straddle increments by microseconds. For example, `pdus.forwarded` for PLC A and `pdus.forwarded` for PLC B are not guaranteed to reflect the same instant. This is acceptable for dashboards and rate calculations; do not use these counters for fine-grained accounting.
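
For the rate calculations mentioned above, a scraper typically differences one counter across two snapshots. The sketch below shows that, under the assumption that the caller records its own scrape timestamps. Treating a negative delta as a process restart is an inference from the counters being "since start", not documented behaviour.

```python
from typing import Optional


def counter_rate(prev_value: int, curr_value: int, elapsed_seconds: float) -> Optional[float]:
    """Per-second rate of a monotonically increasing counter between two scrapes.

    Returns None when the rate cannot be computed: a non-positive interval, or a
    counter that went backwards (assumed to mean the service restarted).
    """
    if elapsed_seconds <= 0:
        return None
    delta = curr_value - prev_value
    if delta < 0:
        return None
    return delta / elapsed_seconds
```

Because each counter is read atomically but independently, rates for different PLCs computed from the same pair of snapshots may cover very slightly different windows.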
## Example JSON Response
A representative two-PLC deployment, ~2 hours into a run:
```json
{
  "service": {
    "uptimeSeconds": 7234,
    "version": "1.0.0",
    "configLastReloadUtc": "2026-05-13T14:02:11+00:00",
    "configReloadCount": 2,
    "configReloadRejectedCount": 0
  },
  "listeners": {
    "bound": 1,
    "configured": 2
  },
  "plcs": [
    {
      "name": "line1-press",
      "host": "10.20.30.41",
      "listenPort": 5021,
      "listener": {
        "state": "bound",
        "lastBindError": null,
        "recoveryAttempts": 0
      },
      "clients": {
        "connected": 2,
        "remoteEndpoints": [
          {
            "remote": "10.20.40.10:51223",
            "connectedAtUtc": "2026-05-13T12:01:55+00:00",
            "pdusForwarded": 184213
          },
          {
            "remote": "10.20.40.11:53901",
            "connectedAtUtc": "2026-05-13T13:30:02+00:00",
            "pdusForwarded": 41008
          }
        ]
      },
      "pdus": {
        "forwarded": 225221,
        "byFc": {
          "fc03": 218904,
          "fc04": 0,
          "fc06": 12,
          "fc16": 6203,
          "other": 102
        },
        "rewrittenSlots": 1318622,
        "partialBcdWarnings": 0
      },
      "backend": {
        "connectsSuccess": 2,
        "connectsFailed": 0,
        "exceptionsByCode": {
          "code01": 0,
          "code02": 14,
          "code03": 0,
          "code04": 0
        },
        "lastRoundTripMs": 12.4,
        "inFlight": 1,
        "maxInFlight": 4,
        "txIdWraps": 3,
        "disconnectCascades": 0,
        "queueDepth": 0,
        "coalescedHitCount": 41892,
        "coalescedMissCount": 177012,
        "coalescedResponseToDeadUpstream": 7,
        "cacheHitCount": 88321,
        "cacheMissCount": 88691,
        "cacheInvalidations": 6203,
        "cacheEntryCount": 47,
        "cacheBytes": 18512,
        "backendHeartbeatsSent": 412,
        "backendHeartbeatsFailed": 0,
        "backendIdleDisconnects": 0
      },
      "bytes": {
        "upstreamIn": 4108290,
        "upstreamOut": 12993021
      }
    },
    {
      "name": "line2-oven",
      "host": "10.20.30.42",
      "listenPort": 5022,
      "listener": {
        "state": "recovering",
        "lastBindError": "Address already in use",
        "recoveryAttempts": 12
      },
      "clients": {
        "connected": 0,
        "remoteEndpoints": []
      },
      "pdus": {
        "forwarded": 0,
        "byFc": { "fc03": 0, "fc04": 0, "fc06": 0, "fc16": 0, "other": 0 },
        "rewrittenSlots": 0,
        "partialBcdWarnings": 0
      },
      "backend": {
        "connectsSuccess": 0,
        "connectsFailed": 0,
        "exceptionsByCode": { "code01": 0, "code02": 0, "code03": 0, "code04": 0 },
        "lastRoundTripMs": 0.0,
        "inFlight": 0,
        "maxInFlight": 0,
        "txIdWraps": 0,
        "disconnectCascades": 0,
        "queueDepth": 0,
        "coalescedHitCount": 0,
        "coalescedMissCount": 0,
        "coalescedResponseToDeadUpstream": 0,
        "cacheHitCount": 0,
        "cacheMissCount": 0,
        "cacheInvalidations": 0,
        "cacheEntryCount": 0,
        "cacheBytes": 0,
        "backendHeartbeatsSent": 0,
        "backendHeartbeatsFailed": 0,
        "backendIdleDisconnects": 0
      },
      "bytes": { "upstreamIn": 0, "upstreamOut": 0 }
    }
  ]
}
```
## Web Dashboard
The UI is a Bootstrap 5 single-page app served from embedded assets under `src/Mbproxy/Admin/wwwroot/` (`index.html` / `plc.html` shells, `theme.css` + per-view CSS/JS, vendored Bootstrap / SignalR client / IBM Plex fonts). It is built as vanilla JS — no framework, no build step. Updates arrive over the SignalR `/hub/status` feed (`StatusBroadcaster`, ~1 s cadence); there is no page reload and no JavaScript polling.
### Fleet dashboard (`GET /`)
1. **App bar** — service version, formatted uptime, accepted-reload count, and a live SignalR connection-state pill.
2. **Aggregate strip** — six cards: listeners bound/configured, total connected clients, fleet PDU/s (rate derived client-side from successive snapshots), PLCs in `recovering`, total backend exceptions, fleet cache hit ratio. The recovering / exceptions cards highlight when non-zero.
3. **KPI table** — one row per configured PLC, Tier-1 columns only: PLC name, backend `host:listenPort`, state chip (`bound` green / `recovering` amber / `stopped` grey), clients, PDU/s, RTT ms, exception total, coalesce %, cache %, keepalive. The table is client-side filterable (name/host search, state, "problems only") and sortable. Clicking a row opens that PLC's detail page in a new tab.
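
The aggregate strip can be approximated from a single snapshot. The sketch below derives five of the six cards; fleet PDU/s is left out because it needs two successive snapshots. The function and key names are hypothetical, and pooling hits and misses across PLCs for the fleet cache hit ratio is an assumption about how the dashboard aggregates.

```python
from typing import Any, Dict


def fleet_aggregate(snapshot: Dict[str, Any]) -> Dict[str, Any]:
    """Derive fleet-level card values from one parsed /status.json snapshot."""
    plcs = snapshot["plcs"]
    hits = sum(p["backend"]["cacheHitCount"] for p in plcs)
    misses = sum(p["backend"]["cacheMissCount"] for p in plcs)
    return {
        "listeners": (snapshot["listeners"]["bound"], snapshot["listeners"]["configured"]),
        "clients": sum(p["clients"]["connected"] for p in plcs),
        "recovering": sum(1 for p in plcs if p["listener"]["state"] == "recovering"),
        "exceptions": sum(sum(p["backend"]["exceptionsByCode"].values()) for p in plcs),
        # Pooled ratio (assumed); None until at least one cache lookup has happened.
        "cacheHitRatio": hits / (hits + misses) if (hits + misses) else None,
    }
```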
### Connection detail (`GET /plc/{name}`)
1. **Identity header** — PLC name, `host:listenPort`, state chip. If the PLC was removed by a hot-reload, a "no longer configured" notice replaces the counter cards.
2. **Grouped counter cards** — every per-PLC counter from the JSON schema above, regrouped for readability: Listener, Clients (with the per-connection list), PDU traffic, Backend health, Multiplexer, Read coalescing, Response cache, Keepalive, Bytes.
3. **Debug view** — a per-tag table showing, for each configured BCD tag, the last raw PLC-side value (BCD nibbles in hex), the decoded client-side value, the direction (read/write), and the age of the observation. The header carries a capture-armed indicator. See *Debug View Data* below.
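
To show how the raw BCD nibbles in the debug view relate to the decoded value, here is a small decoder sketch. It assumes the hex string is already ordered most-significant nibble first, as the `debug.tags[].rawHex` format suggests; any CDAB word reordering is done by the proxy and is not modelled here. The function name is hypothetical.

```python
def decode_bcd(raw_hex: str) -> int:
    """Decode a BCD value written as hex nibbles, e.g. "0x1234" -> 1234."""
    digits = raw_hex[2:] if raw_hex.lower().startswith("0x") else raw_hex
    value = 0
    for ch in digits:
        nibble = int(ch, 16)
        if nibble > 9:
            # Nibbles A-F are not valid BCD digits.
            raise ValueError("invalid BCD nibble %r in %r" % (ch, raw_hex))
        value = value * 10 + nibble
    return value
```

Each hex nibble carries one decimal digit, so a 16-bit tag spans 0 to 9999 and a 32-bit tag spans 0 to 99999999.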
## Debug View Data
The detail page's debug view is fed by an **on-demand per-tag value capture** (`Proxy/TagValueCapture.cs`, one per PLC, held in `Proxy/TagCaptureRegistry.cs`). The `BcdPduPipeline` records the last raw/decoded value for each configured BCD tag, but only while the capture is *armed*.

`StatusBroadcaster` reconciles arm state every push cycle from `PlcSubscriptionTracker`: a PLC's capture is armed exactly while at least one detail-page browser tab is open for that PLC, and disarmed (clearing all slots) otherwise, so the hot path carries zero cost when nobody is watching. The tracker keys on a stable per-page-load tab id, not the SignalR `ConnectionId`, so a transport reconnect cannot leak an armed capture.

> When the response cache is enabled, an FC03/FC04 **cache hit** bypasses the pipeline. To keep the debug view live for cached tags, each cache entry carries the tag observations captured when it was stored (only when the capture was armed at that time); a hit replays them into the capture, re-stamped to the hit time. The debug view therefore reflects the value the client actually receives, cache-served reads included, and not only backend round-trips.

The per-PLC payload is `PlcDetailResponse` (`src/Mbproxy/Admin/DebugDto.cs`):
| JSON path | Type | Meaning |
|---|---|---|
| `plc` | `PlcStatus?` | The standard per-PLC status row, or `null` if the PLC was removed by a hot-reload. |
| `debug.captureArmed` | `bool` | Whether a detail page currently has the capture armed. |
| `debug.tags[].address` | `int` | BCD tag PDU address. |
| `debug.tags[].width` | `int` | 16 or 32. |
| `debug.tags[].name` | `string?` | Optional human-friendly tag label from config (`BcdTags[].Name`); `null` when unset. Shown as the debug-row heading, with the PDU address as sub-text. |
| `debug.tags[].hasValue` | `bool` | `false` until the first observation since the capture was armed. |
| `debug.tags[].direction` | `string` | `"read"` (FC03/FC04) or `"write"` (FC06/FC16). |
| `debug.tags[].rawHex` | `string` | Raw PLC-side value as BCD nibbles — `0xLLLL` (16-bit) or `0xHHHHLLLL` (32-bit). |
| `debug.tags[].decodedValue` | `long` | Decoded binary integer the client read or wrote. |
| `debug.tags[].updatedAtUtc` | `string?` | ISO-8601 time of the observation; `null` when no traffic yet. |
| `debug.tags[].ageSeconds` | `double?` | Seconds since the observation; `null` when no traffic yet. |
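
The arm/disarm reconciliation described at the start of this section can be sketched as a set difference between the PLCs that currently have viewers and the PLCs whose capture is armed. The class and function names below are hypothetical stand-ins, not the real `PlcSubscriptionTracker` or `StatusBroadcaster` API, and the sketch does not cover how a tab that vanishes without unsubscribing is detected.

```python
from typing import Dict, Set, Tuple


class ViewerTracker:
    """Tracks which PLC each open detail tab is viewing, keyed by a stable tab id."""

    def __init__(self) -> None:
        self._plc_by_tab: Dict[str, str] = {}

    def subscribe(self, tab_id: str, plc_name: str) -> None:
        # Re-subscribing after a transport reconnect overwrites the same key,
        # so a reconnect never adds a second viewer for the same tab.
        self._plc_by_tab[tab_id] = plc_name

    def unsubscribe(self, tab_id: str) -> None:
        self._plc_by_tab.pop(tab_id, None)

    def watched(self) -> Set[str]:
        return set(self._plc_by_tab.values())


def reconcile(armed: Set[str], tracker: ViewerTracker) -> Tuple[Set[str], Set[str]]:
    """Return (to_arm, to_disarm) so that the armed set equals the watched set."""
    wanted = tracker.watched()
    return wanted - armed, armed - wanted
```

Running the reconciliation from one loop, once per push cycle, keeps arming decisions on a single thread; keying on the tab id rather than a per-connection id is what makes a reconnect idempotent in this sketch.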
## How to Scrape It
The JSON twin is plain HTTP. Any monitoring system that can curl an endpoint can scrape it.
PowerShell, pulling the cache hit ratio for the first PLC into a variable:
```powershell
$snap = Invoke-WebRequest -Uri "http://mbproxy-host:8080/status.json" -UseBasicParsing |
    Select-Object -ExpandProperty Content |
    ConvertFrom-Json
$plc = $snap.plcs[0]
$hits = $plc.backend.cacheHitCount
$total = $hits + $plc.backend.cacheMissCount
$ratio = if ($total -gt 0) { [math]::Round(100.0 * $hits / $total, 1) } else { 0.0 }
"PLC $($plc.name): cache hit ratio = $ratio% over $total reads"
```
Bash with `curl` and `jq`, fanning out across the fleet:
```bash
curl -s http://mbproxy-host:8080/status.json |
  jq -r '.plcs[] | "\(.name)\t\(.listener.state)\t\(.backend.lastRoundTripMs)"'
```
The service does not expose the Prometheus exposition format, so Prometheus cannot ingest `/status.json` directly. Prometheus-style pipelines need an adapter that polls `/status.json` and translates fields into their own metric names.
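
The field-to-metric translation could look like the sketch below, which renders a parsed snapshot as Prometheus text-format lines. The metric names (`mbproxy_listener_bound` and so on) and the selection of fields are invented for illustration; the service defines no metric names of its own.

```python
from typing import Any, Dict, List


def escape_label(value: str) -> str:
    """Escape backslash, double quote and newline for a text-format label value."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def to_exposition(snapshot: Dict[str, Any]) -> str:
    """Render a few per-PLC fields as text-format samples, one PLC label per line."""
    lines: List[str] = []
    for plc in snapshot["plcs"]:
        labels = '{plc="' + escape_label(plc["name"]) + '"}'
        bound = 1 if plc["listener"]["state"] == "bound" else 0
        # Hypothetical metric names; choose whatever fits the local naming scheme.
        lines.append("mbproxy_listener_bound" + labels + " " + str(bound))
        lines.append("mbproxy_pdus_forwarded_total" + labels + " " + str(plc["pdus"]["forwarded"]))
        lines.append("mbproxy_backend_round_trip_ms" + labels + " " + str(plc["backend"]["lastRoundTripMs"]))
    return "\n".join(lines) + "\n"
```

An adapter would fetch `/status.json` on its own schedule, pass the parsed document to `to_exposition`, and serve the result on an endpoint of its choosing.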
## Scope of This Document
This document covers the **endpoint surface**: what is on the wire and how each field is computed. When a new counter is added, list it here.
## Related Documentation
- [`../Architecture/ConnectionModel.md`](../Architecture/ConnectionModel.md) — multiplexer counter meanings (`inFlight`, `maxInFlight`, `txIdWraps`, `queueDepth`, `disconnectCascades`).
- [`../Architecture/ReadCoalescing.md`](../Architecture/ReadCoalescing.md) — coalescing counter meanings and matching criteria.
- [`../Architecture/ResponseCache.md`](../Architecture/ResponseCache.md) — cache counter meanings, TTL, invalidation rules.
- [`../Features/BcdRewriting.md`](../Features/BcdRewriting.md) — what increments `rewrittenSlots` and `partialBcdWarnings`.
- [`../Features/HotReload.md`](../Features/HotReload.md) — what increments `configReloadCount` vs. `configReloadRejectedCount`.
- [`./Configuration.md`](./Configuration.md) — `Mbproxy.AdminPort` and other option keys.
- [`./Troubleshooting.md`](./Troubleshooting.md) — using these counters to diagnose specific failure modes.
- [`../Reference/LogEvents.md`](../Reference/LogEvents.md) — event-id catalogue including `mbproxy.admin.bind.failed`.