Files

T

Joseph Doherty 554b05d28c mbproxy: fix dashboard review findings, add named BCD tags + fleet config

Reviewed the new SignalR dashboard and fixed its two top findings: a stored XSS on the connection-detail page (unescaped tag name / direction / timestamp rendered into innerHTML) and FC03/FC04 cache hits bypassing the debug-view capture, which left cached tags frozen while their age climbed. Also adds an optional human-friendly Name to BCD tags surfaced on the debug view, and loads the real fleet config from tags.txt (12 named BCD tags, PLC Z28061) so the published appsettings.json is deploy-ready.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-05-16 03:39:39 -04:00

17 KiB

Raw Permalink Blame History

Frontend / Live Web Dashboard — Code Review

Scope: commit e719dd5 ("replace status page with a live SignalR web dashboard"). Files reviewed — all under src/Mbproxy/Admin/wwwroot/: index.html, plc.html, dashboard.js, detail.js, theme.css, dashboard.css, detail.css. Vendored assets (bootstrap.min.css, bootstrap.bundle.min.js, signalr.min.js, *.woff2) explicitly out of scope. Cross-checked against docs/Operations/StatusPage.md (wire contract), CLAUDE.md (mbproxy), and the server side StatusHub.cs / StatusBroadcaster.cs for the SignalR contract.

Summary

The dashboard is well-built vanilla JS: thoughtful escaping helpers, clean rate computation, sensible reconnection wiring, and a correct subscribe-on-reconnect pattern that keeps the on-demand tag capture armed/disarmed correctly.
The single most important finding is a real, exploitable XSS hole on the detail page: detail.js renders t.rawHex, t.address, t.width, t.direction, c.remote, and plc.host straight into innerHTML without escaping. escapeHtml exists in the file but is applied inconsistently — several attacker/PLC-influenceable fields bypass it. The fleet table (dashboard.js) is escaped correctly throughout; the detail page is not. This is a Critical finding.
A second Major issue: on a cold SignalR failure the page enters a setTimeout(start, 3000) retry loop that builds a brand-new HubConnection on every retry — no, it reuses the same connection object, but it never tears down handlers, and combined with withAutomaticReconnect the failure/retry semantics are muddled. There is also no upper bound and no UX past "retrying".
DOM updates are full-innerHTML re-renders of the whole table body every ~1 s. Correct and flicker-free for 54 rows, but it blows away focus/selection and is wasteful; acceptable at this scale, flagged.

Critical findings

C1. Stored/reflected XSS on the connection-detail page — multiple unescaped fields rendered via innerHTML. detail.js defines escapeHtml (line 23) and uses it for some fields but omits it for several others, all of which are interpolated into template strings later assigned to .innerHTML:

detail.js:194 and :203 — the debug-row builder:
```
<td>${t.address} <span class="ratio-sub">${hex4(t.address)}</span></td>
<td>${t.width}-bit</td>
...
<td><span class="dir-tag ${dirCls}">${t.direction}</span></td>
```
t.direction is a server string ("read"/"write" per the schema) interpolated raw into both an attribute-ish context and element text. t.address/t.width are typed int in the documented DTO, so they are lower risk — but the code does not coerce them (Number(...)) or escape them, so it relies entirely on the server DTO type. t.direction is a free string on the wire and is not escaped.
detail.js:86 — the client list:
```
`<div class="client-line">${escapeHtml(c.remote)}` +
`<span class="pdu"> · ${num(c.pdusForwarded)} PDUs · since ${shortTime(c.connectedAtUtc)}</span></div>`
```
c.remote is escaped (good), but shortTime(c.connectedAtUtc) is not — and shortTime (line 36) has a catch { return iso; } branch that returns the raw, unescaped connectedAtUtc string verbatim when new Date(iso) parsing throws or the value is non-ISO. A malformed/attacker-controlled timestamp string therefore lands unescaped inside innerHTML.
detail.js:203 — escapeHtml(t.rawHex) is applied (good), and num(t.decodedValue) is numeric-safe. So rawHex is the one debug field that is handled correctly. (Correcting the brief: rawHex is escaped; direction and the timestamp path are the live holes.)
detail.js:65 — $('plc-sub').textContent = ${plc.host}:${plc.listenPort}; is safe (textContent). But detail.js:14–16 does the same for plcName via textContent — also safe. So the identity header is fine; the holes are specifically the innerHTML card/table builders.

Severity rationale: per docs/Operations/StatusPage.md the admin endpoint binds IPAddress.Any with no authentication ("Authentication lives at the network layer"). PLC name/host and backend-derived strings come off the Modbus wire / appsettings.json and are operator- or device-influenceable. A PLC named or a backend that returns a crafted string can inject <img src=x onerror=...> into any operator's browser session on the trusted segment. Read-only UI does not mean low impact: the injected script runs with the operator's origin.

Concrete fix: route every dynamic value through escapeHtml before it enters an innerHTML template, with no exceptions:

detail.js:194 / :203: wrap t.direction in escapeHtml(...); coerce t.address/t.width with Number(...) (or escape) rather than trusting the DTO type.
detail.js:86: wrap the shortTime(...) result in escapeHtml(...), and/or change shortTime's fallback to escapeHtml(iso) / return '—'.
Add a single escapeAttr helper (as dashboard.js has at line 179) and use it for the dirCls/class interpolation at :197 if dirCls ever becomes data-derived (currently it is a literal, so low priority — but card()'s cls parameter at detail.js:45 is interpolated into class="v ${cls||''}" and is caller-supplied; today all callers pass literals, so it is safe now but fragile).
Better still: stop hand-building HTML. Build rows with document.createElement + textContent/dataset, which makes escaping structural rather than a discipline that one missed call defeats.

Major findings

M1. Cold-start retry loop layers a manual setTimeout retry on top of withAutomaticReconnect, with no bound and a misleading pill. dashboard.js:252–263 and the identical detail.js:236–245:

async function start() {
  try { ...; await connection.start(); await connection.invoke('SubscribeFleet'); ... }
  catch { setConn('disconnected','retrying'); setTimeout(start, 3000); }
}

withAutomaticReconnect only covers a connection that was established and then dropped — it does not retry the initial start(). So the manual loop is needed. But: (a) it retries forever every 3 s with no backoff and no cap — if the hub is permanently gone the browser hammers it indefinitely; (b) if connection.start() succeeds but invoke('SubscribeFleet') throws, the catch calls start() again on an already-started connection, which will throw "cannot start a connection that is not in the Disconnected state" and wedge the retry loop; (c) the pill shows disconnected/retrying which is reasonable, but there is no terminal "giving up" state and no way for an operator to know the difference between "server down" and "server slow". Fix: separate "start the connection" from "subscribe" so a subscribe failure doesn't restart the socket; add capped exponential backoff; guard start() against re-entry when connection.state !== 'Disconnected'.

M2. Detail page never handles "PLC name not in fleet" vs. "hub never delivers". detail.js subscribes with SubscribePlc(plcName) where plcName is taken from location.pathname. If the name is wrong (typo, stale bookmark, or a name that never existed), StatusHub.SubscribePlc happily adds the connection to a plc:{bogus} group and _captureRegistry.Arm is a documented no-op — so the server never sends a plc message and onDetail never fires. The page sits forever on "Waiting for first snapshot…" with a green connected pill. There is no timeout, no "unknown PLC" state. The renderMissing() path (detail.js:168) only triggers when the server does push a payload with detail.plc === null (PLC removed by hot-reload) — it cannot fire for a name that was never configured because nothing is pushed at all. Fix: after connected, start a watchdog (e.g. 3× the push interval); if no plc message arrives, show an "unknown or unreachable PLC" notice.

M3. Fleet PDU/s rate is wrong for the first snapshot after any reconnect, and silently leaks prevPdu/rateByName entries for removed PLCs. dashboard.js:65–77 updateRates keys prevPdu/rateByName by plc.name and never removes entries. On hot-reload removal of a PLC, its stale rate stays in rateByName forever and is still summed into the fleet rate? — no, renderAggregates only iterates s.plcs, so a removed PLC drops out of the sum; but the Map still grows unboundedly across the process lifetime if PLCs churn. More importantly, after a SignalR reconnect the counters are cumulative since service start, so cur - prev.forwarded across a multi-second reconnect gap produces a correct (if coarse) rate — that part is fine. The real bug: performance.now() is monotonic per-page, so that's fine too. Net: low-grade Map leak on PLC churn; prune prevPdu/rateByName to the current snapshot.plcs name set each cycle.

M4. Full-table-body innerHTML re-render on every ~1 s push destroys transient UI state. dashboard.js:155–173 rebuilds the entire <tbody> string and assigns tbody.innerHTML. Consequences at the 1 s cadence: any text selection inside the table is lost every second; :hover is re-evaluated (minor); the browser re-parses ~54 rows of HTML each tick. It is not a flicker source (synchronous replace, no intermediate paint) and 54 rows is cheap, so this is acceptable — but it is the kind of thing that becomes a problem if columns/rows grow. A keyed diff (update existing <tr> cells in place, add/remove only on set change) would also fix the selection-loss annoyance. Flagged as Major because of the selection-loss UX regression on a screen operators stare at; the brief explicitly asked about this.

Minor findings

N1. detail.js has no escapeAttr and dashboard.js's is duplicated. Both files define their own escapeHtml (dashboard.js:176, detail.js:23) — identical code, copy-pasted. escapeAttr exists only in dashboard.js:179. Since both files are separately served and there is no shared module, duplication is the pragmatic choice for a no-build-step project, but the inconsistency (detail.js lacking escapeAttr) is exactly what enabled C1. Recommend a tiny shared util.js served from /assets/, or at minimum make the two escapeHtml definitions and an escapeAttr identical and present in both.

N2. Accessibility gaps.

The sortable <th> elements (index.html:72–81) are clickable via a JS click handler but are not keyboard-focusable and carry no role="button"/tabindex="0"/aria-sort. A keyboard-only operator cannot sort the table. Add tabindex="0", aria-sort reflecting sorted-asc/sorted-desc, and a keydown (Enter/Space) handler.
Table rows are clickable (dashboard.js:232) to open a detail page but are <tr> with cursor:pointer only — not keyboard-reachable and not announced as interactive. Consider making the PLC-name cell an actual <a href="/plc/..."> so it is focusable, middle-click/ctrl-click works natively, and screen readers announce it. This would also remove the need for the JS window.open handler.
The connection-state pill (#conn) updates visually but has no aria-live region, so a screen-reader user is never told the hub dropped. Add aria-live="polite" to #conn or to #conn-text.
<input type="search" id="f-search"> has a placeholder but no associated <label> (visible or aria-label). Same for #f-state. Add aria-label.

N3. Row click always window.open(..., '_blank') — no modifier-key respect, popup-blocker exposure. dashboard.js:235 unconditionally calls window.open. A plain click should arguably open in the same tab or respect the user's intent; programmatic window.open not in direct response to a trusted click on an anchor can be caught by popup blockers in some configs. Tied to N2's suggestion: render the PLC name as a real <a target="_blank" rel="noopener"> and delete the handler. (rel="noopener" also matters — window.open without it leaves window.opener live; here the opened page is same-origin so impact is low, but it is still best practice.)

N4. detail.js plcName parsing is brittle. detail.js:10: decodeURIComponent(location.pathname.replace(/^\/plc\//, '')). If the route is ever served under a path prefix, or the name itself contains an encoded /, this misparses. Also if decodeURIComponent throws (malformed % sequence in the URL) the whole script aborts at line 10 before connect() is ever reached — the page is then blank with no error. Wrap in try/catch and fall back to a visible error state.

N5. dashboard.js:101 and :106 — fleet PDU/s shows — until the second snapshot. Expected (rate needs two samples) and correctly handled, but the aggregate card shows — for the first ~1 s while the table already shows per-row — rates. Cosmetic; no fix required, noted for completeness.

N6. No console.log, no hardcoded absolute URLs, no obvious dead code. All asset/hub URLs are root-relative (/assets/..., /hub/status, /plc/...) — correct, survives any host/port. card()'s extra parameter and the cls third element of card rows are lightly-used but not dead. Clean on this axis.

N7. formatUptime / formatAge silently misrender negative or NaN input. dashboard.js:199 and detail.js:28: if uptimeSeconds/ageSeconds ever arrive negative (clock skew) or non-numeric, Math.floor yields NaN and the card shows NaN. Low risk given server types; a Number.isFinite guard returning '—' is cheap insurance.

What looks good

Reconnect → re-subscribe is correct. dashboard.js:249 and detail.js:233 both re-invoke('SubscribeFleet'/'SubscribePlc') inside onreconnected. This is essential and easy to forget: SignalR auto-reconnect gives a new ConnectionId, server-side group membership does not survive, and the detail page's tag capture is armed per-connection — without the re-subscribe the capture would silently disarm on every transient drop. Verified against StatusHub.SubscribePlc/OnDisconnectedAsync and StatusBroadcaster group targeting — the lifecycle is sound.
onclose correctly does not re-subscribe — it just sets the pill; withAutomaticReconnect's own loop owns recovery. No double-retry on the warm path.
Single HubConnection per page, closed implicitly on navigation. Opening a detail page in a new tab (window.open) means each tab owns exactly one hub connection; closing the tab fires OnDisconnectedAsync server-side which disarms the capture. No connection leak across navigation.
The fleet table escapes correctly. dashboard.js:161–163 routes plc.name through escapeAttr for the data-name attribute and escapeHtml for cell text; plc.host through escapeHtml; plc.listener.lastBindError through escapeAttr for the title= attribute (line 160). This is the right discipline — it is only detail.js that fails to match it (C1).
escapeHtml is a correct minimal implementation — &, <, > cover element-text contexts; escapeAttr adds ". Order matters (& first) and is correct.
Rate computation is robust. dashboard.js:70 guards now > prev.t, Math.max(0, ...) clamps a counter reset/reconnect to a non-negative rate, and performance.now() (monotonic) is the right clock for deltas — not Date.now().
Filter is correct and cheap. visiblePlcs (dashboard.js:130) filters a .slice() copy (never mutates the snapshot), search is case-folded once, and the sort has a stable localeCompare tiebreaker by name. At 54 rows the filter+sort+render per keystroke is sub-millisecond. It correctly survives live updates because render() always re-derives from latest + current filter state.
renderMissing() hot-reload path. The detail page genuinely handles a PLC disappearing mid-session (detail.js:168) — notice shown, cards cleared and hidden. Good defensive UX for the hot-reload scenario (the gap is the never-existed name, see M2).
No CDN dependencies — every <script>/<link> is /assets/..., consistent with the firewalled-network design goal in StatusPage.md.
CSS is clean: design tokens via custom properties, font-display: swap on all @font-face, responsive agg-grid breakpoints, prefers-free but no egregious issues. tr.stale / tr.no-traffic styling gives the debug view real legibility. No !important abuse beyond two justified .empty-row overrides.

Open questions

C1 exploitability depends on whether t.direction and connectedAtUtc can actually carry attacker bytes. direction is server-derived from the FC, so today it is effectively a closed set — but it is typed string on the wire and detail.js trusts it. connectedAtUtc is a DateTimeOffset serialized server-side, so it too is well-formed today. The finding stands because the frontend must not depend on server-side type discipline for its own XSS safety — but if the threat model formally excludes a compromised service, C1 could be re-rated Major. Recommend fixing regardless: the cost is three escapeHtml calls.
Does StatusBroadcaster ever push a plc payload for a PLC that has zero configured BCD tags? detail.js:183 handles debug.tags empty → "No BCD tags configured". Confirmed handled; noted only to flag that the empty-state is covered.
Should the detail page cap how long it waits before declaring the PLC unknown (M2)? The server has no "unknown PLC" rejection in SubscribePlc — it silently accepts any name. A client-side watchdog is the only place this can be surfaced without a server change.

17 KiB Raw Permalink Blame History Unescape Escape