Files
wwtools/mbproxy/codereviews/2026-05-16/Frontend.md
Joseph Doherty b222362ce0 mbproxy: remediate the 2026-05-16 code-review findings
Fixes every finding from the codereviews/2026-05-16 multi-agent review
(2 Critical, 20 Major, 38 Minor) and adds that review to the repo.

Highlights: dashboard XSS escape; response cache invalidated on the
write request (not just the response); ReloadValidator now runs at
startup so port collisions / duplicate names / malformed Resilience
profiles fail fast; AdminPort 0 genuinely disables the admin endpoint;
PlcListener accept-loop faults propagate to the supervisor's faulted
path; reconciler Restart builds before removing; Resilience pipelines
are restart-only from a frozen snapshot; multiplexer connect-race leak,
watchdog party-list snapshot, backend-response and FC16 framing
validation; frontend reconnect retry and util.js load guard; plus the
log-event/doc drift sweep and test-port hygiene.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 18:08:06 -04:00

161 lines
14 KiB
Markdown
Raw Permalink Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# Frontend / Live Web Dashboard — Code Review
Scope: commit `0308490` ("close out the dashboard code-review minor findings"). Files reviewed under `src/Mbproxy/Admin/wwwroot/`: `index.html`, `plc.html`, `dashboard.js`, `detail.js`, `util.js`, `theme.css`, `dashboard.css`, `detail.css`. Vendored assets out of scope. Cross-checked against `docs/Operations/StatusPage.md`, `src/Mbproxy/Admin/StatusDto.cs`, `src/Mbproxy/Admin/DebugDto.cs`, `src/Mbproxy/Admin/StatusBroadcaster.cs`, and `src/Mbproxy/Admin/StatusHub.cs`. Prior findings from `codereviews/2026-05-15/Frontend.md` were verified: the XSS on `detail.js` (C1 in that review) and the `shortTime`/`escapeHtml` gap are confirmed fixed. The cold-start retry loop, URL-decode guard, NaN guards, accessibility attributes, and `util.js` extraction are all verified resolved. This review covers the current code fresh.
**Finding counts: 1 Critical, 1 Major, 4 Minor.**
---
## Critical Findings
### C1 — `dashboard.js` `stateChip` renders `listener.state` into `innerHTML` without escaping
`src/Mbproxy/Admin/wwwroot/dashboard.js:48`
**What is wrong.** `stateChip(state)` in `dashboard.js` returns:
```js
return `<span class="chip ${cls}">${state}</span>`; // line 48
```
The `state` value is a server-supplied string from `plc.listener.state`, interpolated raw into an `innerHTML`-bound template string with no `escapeHtml` call. It flows into `innerHTML` at line 166:
```js
<td>${stateChip(plc.listener.state)}</td>
```
**Inconsistency with `detail.js`.** The identical `stateChip` function in `detail.js` was fixed by the prior review and now correctly uses `escapeHtml(state)` at line 76:
```js
return `<span class="chip ${cls}">${escapeHtml(state)}</span>`;
```
The fix was applied only to `detail.js` and was not propagated back to `dashboard.js`.
**Impact.** `listener.state` is a `string` on the SignalR wire. Its values in the current implementation are the closed set `"bound"` / `"recovering"` / `"stopped"`, produced by the supervisor state machine — so the immediate exploitability is limited. However, the frontend must not rely on server-side type discipline for its own XSS safety (same rationale as the prior C1 finding). Under the same threat model as the prior review — no authentication on the admin port, PLC names and other state fields originate from operator-editable `appsettings.json` — any path that injects an unexpected string value into `listener.state` (serialization change, a future hot-reload edge case, a compromised service) would execute arbitrary script in every open fleet-dashboard tab. The fleet dashboard is more exposed than the detail page because it is the persistent long-lived view.
**Fix.** One character change: replace `${state}` with `${escapeHtml(state)}` at `dashboard.js:48`. `escapeHtml` is already imported from `window.mbproxyUtil` at line 18.
---
## Major Findings
### M1 — `onreconnected` silently swallows subscribe failure; `connected` pill is misleading when the subscribe invoke is rejected
`src/Mbproxy/Admin/wwwroot/dashboard.js:254`, `src/Mbproxy/Admin/wwwroot/detail.js:285`
**What is wrong.** Both pages' `onreconnected` callbacks fire after `withAutomaticReconnect` restores the transport. They call the hub method (`SubscribeFleet` / `SubscribePlc`) and swallow any error with `.catch(() => {})`:
```js
// dashboard.js:252-255
connection.onreconnected(() => {
setConn('connected');
connection.invoke('SubscribeFleet').catch(() => {});
});
```
If the hub method throws (hub overloaded, exception in `OnConnectedAsync`, network jitter between the reconnect and the invoke), the connection is live but the subscription is silently dropped. The pill shows `connected` (green) while the page receives zero further updates. There is no retry, no error indication to the operator, and no watchdog that would notice the dead feed. The cold-start path (`connect()`) retries with backoff and guards against re-entering on a live socket; the warm-reconnect path has no equivalent safety net.
**Impact.** On a typical LAN deployment with stable infrastructure this will be rare, but it is the kind of silent failure that is hard to diagnose. An operator watching a page that appears connected but shows stale counter values has no indication anything is wrong. The detail page additionally does not re-arm `armSnapshotWatchdog()` on the reconnect path, so the 6-second "no data" notice will not fire either if the subscribe silently fails.
**Fix.** Add a minimal retry for the `onreconnected` subscribe. The simplest form: invoke, and on rejection set the pill to `'disconnected'`/`'retrying'` and call `connect()` (the cold-start function already guards against re-starting a non-Disconnected connection). Alternatively, factor the subscribe into a shared helper used by both the cold-start and warm-reconnect paths, so the retry discipline is written once.
---
## Minor Findings
### N1 — `detail.js` does not destructure `escapeAttr` from `window.mbproxyUtil`; the attribute-escaping helper is silently unavailable
`src/Mbproxy/Admin/wwwroot/detail.js:23`
**What is wrong.** `detail.js` imports only `escapeHtml`:
```js
const { escapeHtml } = window.mbproxyUtil; // line 23 — escapeAttr not imported
```
`escapeAttr` is defined in `util.js` (line 15) and is available on `window.mbproxyUtil`, but `detail.js` never references it. All current attribute-context class interpolations in `detail.js` (the `cls` slots at lines 65, 76; `dirCls` at line 222; `stale.trim()` at line 224) happen to be computed from literal strings, so there is no current XSS. The risk is latent: any future change that passes a server-derived value into a class or other attribute slot in `detail.js` will produce an unescaped-attribute injection with no indication at the call site that the escaping tool is absent. The inconsistency was directly implicated in the prior review's C1 finding (the fix added `escapeHtml` to `detail.js` but did not import `escapeAttr`).
**Recommendation.** Change line 23 to:
```js
const { escapeHtml, escapeAttr } = window.mbproxyUtil;
```
No other changes needed — the helper just needs to be in scope so future reviewers and the linter can verify attribute contexts are correctly escaped.
---
### N2 — `prevPdu` and `rateByName` Maps are never pruned; stale entries accumulate across hot-reload PLC removal
`src/Mbproxy/Admin/wwwroot/dashboard.js:1112, 6678`
**What is wrong.** `updateRates(snapshot)` (lines 6678) only writes entries for PLCs present in the current snapshot; it never deletes entries for PLCs removed by a hot-reload. Over a long session with PLC churn, `prevPdu` and `rateByName` grow without bound. Concretely: `rateByName.size` (line 102) is used to decide whether the fleet PDU/s card shows a rate or `—`; after a PLC is removed its stale entry keeps `rateByName.size > 0`, so the card shows `0` instead of the arguably more correct `—` for a fleet that has shrunk to zero active PLCs. `renderAggregates` already iterates only `s.plcs` (line 91) so the stale rate values are never summed — this is correct — but the Map still grows.
**Impact.** Negligible in the current 54-PLC deployment with low PLC churn rate. Becomes visible if PLCs are repeatedly added and removed via hot-reload over a long-lived browser session. The `rateByName.size` false-positive is the most user-visible symptom.
**Recommendation.** At the end of `updateRates`, prune both Maps to the current snapshot's PLC name set:
```js
const currentNames = new Set(snapshot.plcs.map(p => p.name));
for (const k of prevPdu.keys()) { if (!currentNames.has(k)) prevPdu.delete(k); }
for (const k of rateByName.keys()) { if (!currentNames.has(k)) rateByName.delete(k); }
```
---
### N3 — Empty `class=""` attribute emitted on non-stale debug rows; `stale.trim()` call is a no-op
`src/Mbproxy/Admin/wwwroot/detail.js:223224`
**What is wrong.** The stale/non-stale class is computed as:
```js
const stale = (t.ageSeconds || 0) > 30 ? ' stale' : ''; // line 223
return `<tr class="${stale.trim()}"> // line 224
```
When the row is not stale, `stale` is `''`, `stale.trim()` is `''`, and the emitted markup is `<tr class="">`. The `.trim()` call is defensive against the leading space in `' stale'`, but a simpler and cleaner expression that also avoids the empty-attribute emission is:
```js
const staleCls = (t.ageSeconds || 0) > 30 ? 'stale' : '';
return `<tr${staleCls ? ` class="${staleCls}"` : ''}>
```
Or simply omit `.trim()` and use `class="${stale}"` — the leading space is harmless in CSS class matching. This is cosmetic.
**Impact.** Zero functional impact. The empty `class=""` attribute is parsed and ignored by every browser. Flagged as a code-cleanliness observation.
---
### N4 — `window.mbproxyUtil` is destructured at script-top-level; a util.js load failure aborts the entire page silently
`src/Mbproxy/Admin/wwwroot/dashboard.js:18`, `src/Mbproxy/Admin/wwwroot/detail.js:23`
**What is wrong.** Both scripts destructure `window.mbproxyUtil` at the top of their IIFE, before any DOM or error-handling setup:
```js
const { escapeHtml, escapeAttr } = window.mbproxyUtil; // dashboard.js:18
const { escapeHtml } = window.mbproxyUtil; // detail.js:23
```
If `util.js` fails to load (404 after a publish error, embedded-asset routing bug, script tag typo), `window.mbproxyUtil` is `undefined`, the destructuring throws a `TypeError` at the first `const`, and the remaining script — including `DOMContentLoaded`, error notices, and the SignalR connection — never runs. The page renders its static HTML indefinitely with no visible error. The HTML `<noscript>` fallback is absent.
**Impact.** Unlikely in production given the assets are embedded in the binary and verified by `EmbeddedAssetsTests`. Worth noting because the failure mode is silent — no browser console error surfaces to the operator, and the page appears partially rendered rather than broken.
**Recommendation.** Add a null-guard:
```js
if (!window.mbproxyUtil) {
document.body.innerHTML = '<p style="padding:2rem;color:red">Admin UI failed to load — missing util.js. Check the browser console.</p>';
throw new Error('window.mbproxyUtil not defined');
}
const { escapeHtml, escapeAttr } = window.mbproxyUtil;
```
Or equivalently move the destructuring inside `DOMContentLoaded` after a check. The `throw` intentionally aborts the script — at least with a visible error — rather than letting it limp along with broken escaping.
---
## What Looks Good
- **All prior C1 and M-series findings from `codereviews/2026-05-15/Frontend.md` are confirmed fixed.** Specifically: `t.direction`, `t.rawHex`, `t.name`, `shortTime(c.connectedAtUtc)`, and `l.lastBindError` in `detail.js` are all now correctly wrapped in `escapeHtml`. The `shortTime` fallback now goes through `escapeHtml` at the call site (line 105). The `plcNameError` guard prevents the script from silently failing on a malformed URL. The `util.js` shared helper was introduced and is loaded before both page scripts. The `armSnapshotWatchdog` addresses the "unknown PLC sits forever on waiting" scenario. Keyboard/Enter/Space sort handlers and `aria-sort` on `<th>` elements are in place. `aria-live="polite"` on the connection pill and `aria-label` on the toolbar inputs are present. The PLC name is a real `<a>` link with `rel="noopener"` — no more `window.open`.
- **`detail.js` `stateChip` is correctly escaped.** The fix from the prior review was applied to `detail.js:76` (`escapeHtml(state)`). Only `dashboard.js` has the regression (C1 above).
- **`detail.js` `card()` `v` slot is safe through consistent caller discipline.** All values passed as the second element of card rows are either `stateChip()` output (internally escaped), `num()` output (numeric-only), `ratioText()` output (`n%` or `—`), or explicitly `escapeHtml`-wrapped. No server string lands unescaped in the `v` slot.
- **SignalR lifecycle is correct.** `onreconnected` re-invokes `SubscribeFleet`/`SubscribePlc` — essential for group re-subscription after a transport reconnect. `onclose` does not re-subscribe (correct — `withAutomaticReconnect` owns the warm path). `tabId` is stable per page-load (not per ConnectionId) so transport reconnects do not leak armed captures server-side. `gotSnapshot()` correctly clears the watchdog timer on the first data arrival.
- **Cold-start retry is well-structured.** The `connect()` function guards `connection.state === Disconnected` before calling `start()`, preventing a subscribe-failure from re-starting a live socket. Capped exponential backoff (`retryMs = Math.min(retryMs * 2, 30000)`) prevents infinite tight retries.
- **All DTO field names are correct.** `detail.js` and `dashboard.js` read `coalescedHitCount`, `cacheHitCount`, `backendHeartbeatsSent`, `backendHeartbeatsFailed`, `backendIdleDisconnects`, `disconnectCascades`, `queueDepth`, `txIdWraps`, `inFlight`, `maxInFlight`, `invalidBcdWarnings`, `exceptionsByCode.codeOther`, etc. — all matching the camelCase JSON policy applied by `StatusJsonContext`/`JsonKnownNamingPolicy.CamelCase`. No field name typos found.
- **`escapeHtml` and `escapeAttr` in `util.js` are correct.** `&` is replaced before `<`/`>` (correct order), and `escapeAttr` adds `"` on top of `escapeHtml`. Both functions are pure and side-effect-free.
- **No CDN dependencies.** All `<script>`/`<link>` tags reference `/assets/…` — consistent with the firewalled-network design intent documented in `StatusPage.md`.
- **`formatUptime` and `formatAge` both guard `Number.isFinite` and `< 0`.** NaN/negative clock skew returns `—` rather than `NaN` text.
- **`tagCell(t)` correctly coerces `t.address` with `Number()`.** The address is rendered as a pure decimal or hex numeric, never as a raw server string.
- **`debug-rows` empty-state and `colspan` counts are consistent with the HTML `<thead>`.** The debug table has 6 columns; all static empty-row `<td colspan="6">` and the `no-traffic` span layout (1+1+3+1) sum to 6. The KPI table has 10 columns; `colspan="10"` matches. No cross-file DOM-id inconsistencies found.
- **CSS is clean.** `focus-visible` outlines on sortable `<th>` elements and the PLC-name `<a>` provide keyboard focus indicators. No `!important` abuse. Design tokens are consistent across the three CSS files.