b222362ce0
Fixes every finding from the codereviews/2026-05-16 multi-agent review (2 Critical, 20 Major, 38 Minor) and adds that review to the repo. Highlights: dashboard XSS escape; response cache invalidated on the write request (not just the response); ReloadValidator now runs at startup so port collisions / duplicate names / malformed Resilience profiles fail fast; AdminPort 0 genuinely disables the admin endpoint; PlcListener accept-loop faults propagate to the supervisor's faulted path; reconciler Restart builds before removing; Resilience pipelines are restart-only from a frozen snapshot; multiplexer connect-race leak, watchdog party-list snapshot, backend-response and FC16 framing validation; frontend reconnect retry and util.js load guard; plus the log-event/doc drift sweep and test-port hygiene. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
161 lines
14 KiB
Markdown
161 lines
14 KiB
Markdown
# Frontend / Live Web Dashboard — Code Review
|
||
|
||
Scope: commit `0308490` ("close out the dashboard code-review minor findings"). Files reviewed under `src/Mbproxy/Admin/wwwroot/`: `index.html`, `plc.html`, `dashboard.js`, `detail.js`, `util.js`, `theme.css`, `dashboard.css`, `detail.css`. Vendored assets out of scope. Cross-checked against `docs/Operations/StatusPage.md`, `src/Mbproxy/Admin/StatusDto.cs`, `src/Mbproxy/Admin/DebugDto.cs`, `src/Mbproxy/Admin/StatusBroadcaster.cs`, and `src/Mbproxy/Admin/StatusHub.cs`. Prior findings from `codereviews/2026-05-15/Frontend.md` were verified: the XSS on `detail.js` (C1 in that review) and the `shortTime`/`escapeHtml` gap are confirmed fixed. The cold-start retry loop, URL-decode guard, NaN guards, accessibility attributes, and `util.js` extraction are all verified resolved. This review covers the current code fresh.
|
||
|
||
**Finding counts: 1 Critical, 1 Major, 4 Minor.**
|
||
|
||
---
|
||
|
||
## Critical Findings
|
||
|
||
### C1 — `dashboard.js` `stateChip` renders `listener.state` into `innerHTML` without escaping
|
||
`src/Mbproxy/Admin/wwwroot/dashboard.js:48`
|
||
|
||
**What is wrong.** `stateChip(state)` in `dashboard.js` returns:
|
||
```js
|
||
return `<span class="chip ${cls}">${state}</span>`; // line 48
|
||
```
|
||
The `state` value is a server-supplied string from `plc.listener.state`, interpolated raw into an `innerHTML`-bound template string with no `escapeHtml` call. It flows into `innerHTML` at line 166:
|
||
```js
|
||
<td>${stateChip(plc.listener.state)}</td>
|
||
```
|
||
|
||
**Inconsistency with `detail.js`.** The identical `stateChip` function in `detail.js` was fixed by the prior review and now correctly uses `escapeHtml(state)` at line 76:
|
||
```js
|
||
return `<span class="chip ${cls}">${escapeHtml(state)}</span>`;
|
||
```
|
||
The fix was applied only to `detail.js` and was not propagated back to `dashboard.js`.
|
||
|
||
**Impact.** `listener.state` is a `string` on the SignalR wire. Its values in the current implementation are the closed set `"bound"` / `"recovering"` / `"stopped"`, produced by the supervisor state machine — so the immediate exploitability is limited. However, the frontend must not rely on server-side type discipline for its own XSS safety (same rationale as the prior C1 finding). Under the same threat model as the prior review — no authentication on the admin port, PLC names and other state fields originate from operator-editable `appsettings.json` — any path that injects an unexpected string value into `listener.state` (serialization change, a future hot-reload edge case, a compromised service) would execute arbitrary script in every open fleet-dashboard tab. The fleet dashboard is more exposed than the detail page because it is the persistent long-lived view.
|
||
|
||
**Fix.** One character change: replace `${state}` with `${escapeHtml(state)}` at `dashboard.js:48`. `escapeHtml` is already imported from `window.mbproxyUtil` at line 18.
|
||
|
||
---
|
||
|
||
## Major Findings
|
||
|
||
### M1 — `onreconnected` silently swallows subscribe failure; `connected` pill is misleading when the subscribe invoke is rejected
|
||
`src/Mbproxy/Admin/wwwroot/dashboard.js:254`, `src/Mbproxy/Admin/wwwroot/detail.js:285`
|
||
|
||
**What is wrong.** Both pages' `onreconnected` callbacks fire after `withAutomaticReconnect` restores the transport. They call the hub method (`SubscribeFleet` / `SubscribePlc`) and swallow any error with `.catch(() => {})`:
|
||
|
||
```js
|
||
// dashboard.js:252-255
|
||
connection.onreconnected(() => {
|
||
setConn('connected');
|
||
connection.invoke('SubscribeFleet').catch(() => {});
|
||
});
|
||
```
|
||
|
||
If the hub method throws (hub overloaded, exception in `OnConnectedAsync`, network jitter between the reconnect and the invoke), the connection is live but the subscription is silently dropped. The pill shows `connected` (green) while the page receives zero further updates. There is no retry, no error indication to the operator, and no watchdog that would notice the dead feed. The cold-start path (`connect()`) retries with backoff and guards against re-entering on a live socket; the warm-reconnect path has no equivalent safety net.
|
||
|
||
**Impact.** On a typical LAN deployment with stable infrastructure this will be rare, but it is the kind of silent failure that is hard to diagnose. An operator watching a page that appears connected but shows stale counter values has no indication anything is wrong. The detail page additionally does not re-arm `armSnapshotWatchdog()` on the reconnect path, so the 6-second "no data" notice will not fire either if the subscribe silently fails.
|
||
|
||
**Fix.** Add a minimal retry for the `onreconnected` subscribe. The simplest form: invoke, and on rejection set the pill to `'disconnected'`/`'retrying'` and call `connect()` (the cold-start function already guards against re-starting a non-Disconnected connection). Alternatively, factor the subscribe into a shared helper used by both the cold-start and warm-reconnect paths, so the retry discipline is written once.
|
||
|
||
---
|
||
|
||
## Minor Findings
|
||
|
||
### N1 — `detail.js` does not destructure `escapeAttr` from `window.mbproxyUtil`; the attribute-escaping helper is silently unavailable
|
||
`src/Mbproxy/Admin/wwwroot/detail.js:23`
|
||
|
||
**What is wrong.** `detail.js` imports only `escapeHtml`:
|
||
```js
|
||
const { escapeHtml } = window.mbproxyUtil; // line 23 — escapeAttr not imported
|
||
```
|
||
`escapeAttr` is defined in `util.js` (line 15) and is available on `window.mbproxyUtil`, but `detail.js` never references it. All current attribute-context class interpolations in `detail.js` (the `cls` slots at lines 65, 76; `dirCls` at line 222; `stale.trim()` at line 224) happen to be computed from literal strings, so there is no current XSS. The risk is latent: any future change that passes a server-derived value into a class or other attribute slot in `detail.js` will produce an unescaped-attribute injection with no indication at the call site that the escaping tool is absent. The inconsistency was directly implicated in the prior review's C1 finding (the fix added `escapeHtml` to `detail.js` but did not import `escapeAttr`).
|
||
|
||
**Recommendation.** Change line 23 to:
|
||
```js
|
||
const { escapeHtml, escapeAttr } = window.mbproxyUtil;
|
||
```
|
||
No other changes needed — the helper just needs to be in scope so future reviewers and the linter can verify attribute contexts are correctly escaped.
|
||
|
||
---
|
||
|
||
### N2 — `prevPdu` and `rateByName` Maps are never pruned; stale entries accumulate across hot-reload PLC removal
|
||
`src/Mbproxy/Admin/wwwroot/dashboard.js:11–12, 66–78`
|
||
|
||
**What is wrong.** `updateRates(snapshot)` (lines 66–78) only writes entries for PLCs present in the current snapshot; it never deletes entries for PLCs removed by a hot-reload. Over a long session with PLC churn, `prevPdu` and `rateByName` grow without bound. Concretely: `rateByName.size` (line 102) is used to decide whether the fleet PDU/s card shows a rate or `—`; after a PLC is removed its stale entry keeps `rateByName.size > 0`, so the card shows `0` instead of the arguably more correct `—` for a fleet that has shrunk to zero active PLCs. `renderAggregates` already iterates only `s.plcs` (line 91) so the stale rate values are never summed — this is correct — but the Map still grows.
|
||
|
||
**Impact.** Negligible in the current 54-PLC deployment with low PLC churn rate. Becomes visible if PLCs are repeatedly added and removed via hot-reload over a long-lived browser session. The `rateByName.size` false-positive is the most user-visible symptom.
|
||
|
||
**Recommendation.** At the end of `updateRates`, prune both Maps to the current snapshot's PLC name set:
|
||
```js
|
||
const currentNames = new Set(snapshot.plcs.map(p => p.name));
|
||
for (const k of prevPdu.keys()) { if (!currentNames.has(k)) prevPdu.delete(k); }
|
||
for (const k of rateByName.keys()) { if (!currentNames.has(k)) rateByName.delete(k); }
|
||
```
|
||
|
||
---
|
||
|
||
### N3 — Empty `class=""` attribute emitted on non-stale debug rows; `stale.trim()` call is a no-op
|
||
`src/Mbproxy/Admin/wwwroot/detail.js:223–224`
|
||
|
||
**What is wrong.** The stale/non-stale class is computed as:
|
||
```js
|
||
const stale = (t.ageSeconds || 0) > 30 ? ' stale' : ''; // line 223
|
||
return `<tr class="${stale.trim()}"> // line 224
|
||
```
|
||
When the row is not stale, `stale` is `''`, `stale.trim()` is `''`, and the emitted markup is `<tr class="">`. The `.trim()` call is defensive against the leading space in `' stale'`, but a simpler and cleaner expression that also avoids the empty-attribute emission is:
|
||
```js
|
||
const staleCls = (t.ageSeconds || 0) > 30 ? 'stale' : '';
|
||
return `<tr${staleCls ? ` class="${staleCls}"` : ''}>
|
||
```
|
||
Or simply omit `.trim()` and use `class="${stale}"` — the leading space is harmless in CSS class matching. This is cosmetic.
|
||
|
||
**Impact.** Zero functional impact. The empty `class=""` attribute is parsed and ignored by every browser. Flagged as a code-cleanliness observation.
|
||
|
||
---
|
||
|
||
### N4 — `window.mbproxyUtil` is destructured at script-top-level; a util.js load failure aborts the entire page silently
|
||
`src/Mbproxy/Admin/wwwroot/dashboard.js:18`, `src/Mbproxy/Admin/wwwroot/detail.js:23`
|
||
|
||
**What is wrong.** Both scripts destructure `window.mbproxyUtil` at the top of their IIFE, before any DOM or error-handling setup:
|
||
```js
|
||
const { escapeHtml, escapeAttr } = window.mbproxyUtil; // dashboard.js:18
|
||
const { escapeHtml } = window.mbproxyUtil; // detail.js:23
|
||
```
|
||
If `util.js` fails to load (404 after a publish error, embedded-asset routing bug, script tag typo), `window.mbproxyUtil` is `undefined`, the destructuring throws a `TypeError` at the first `const`, and the remaining script — including `DOMContentLoaded`, error notices, and the SignalR connection — never runs. The page renders its static HTML indefinitely with no visible error. The HTML `<noscript>` fallback is absent.
|
||
|
||
**Impact.** Unlikely in production given the assets are embedded in the binary and verified by `EmbeddedAssetsTests`. Worth noting because the failure mode is silent — no browser console error surfaces to the operator, and the page appears partially rendered rather than broken.
|
||
|
||
**Recommendation.** Add a null-guard:
|
||
```js
|
||
if (!window.mbproxyUtil) {
|
||
document.body.innerHTML = '<p style="padding:2rem;color:red">Admin UI failed to load — missing util.js. Check the browser console.</p>';
|
||
throw new Error('window.mbproxyUtil not defined');
|
||
}
|
||
const { escapeHtml, escapeAttr } = window.mbproxyUtil;
|
||
```
|
||
Or equivalently move the destructuring inside `DOMContentLoaded` after a check. The `throw` intentionally aborts the script — at least with a visible error — rather than letting it limp along with broken escaping.
|
||
|
||
---
|
||
|
||
## What Looks Good
|
||
|
||
- **All prior C1 and M-series findings from `codereviews/2026-05-15/Frontend.md` are confirmed fixed.** Specifically: `t.direction`, `t.rawHex`, `t.name`, `shortTime(c.connectedAtUtc)`, and `l.lastBindError` in `detail.js` are all now correctly wrapped in `escapeHtml`. The `shortTime` fallback now goes through `escapeHtml` at the call site (line 105). The `plcNameError` guard prevents the script from silently failing on a malformed URL. The `util.js` shared helper was introduced and is loaded before both page scripts. The `armSnapshotWatchdog` addresses the "unknown PLC sits forever on waiting" scenario. Keyboard/Enter/Space sort handlers and `aria-sort` on `<th>` elements are in place. `aria-live="polite"` on the connection pill and `aria-label` on the toolbar inputs are present. The PLC name is a real `<a>` link with `rel="noopener"` — no more `window.open`.
|
||
|
||
- **`detail.js` `stateChip` is correctly escaped.** The fix from the prior review was applied to `detail.js:76` (`escapeHtml(state)`). Only `dashboard.js` has the regression (C1 above).
|
||
|
||
- **`detail.js` `card()` `v` slot is safe through consistent caller discipline.** All values passed as the second element of card rows are either `stateChip()` output (internally escaped), `num()` output (numeric-only), `ratioText()` output (`n%` or `—`), or explicitly `escapeHtml`-wrapped. No server string lands unescaped in the `v` slot.
|
||
|
||
- **SignalR lifecycle is correct.** `onreconnected` re-invokes `SubscribeFleet`/`SubscribePlc` — essential for group re-subscription after a transport reconnect. `onclose` does not re-subscribe (correct — `withAutomaticReconnect` owns the warm path). `tabId` is stable per page-load (not per ConnectionId) so transport reconnects do not leak armed captures server-side. `gotSnapshot()` correctly clears the watchdog timer on the first data arrival.
|
||
|
||
- **Cold-start retry is well-structured.** The `connect()` function guards `connection.state === Disconnected` before calling `start()`, preventing a subscribe-failure from re-starting a live socket. Capped exponential backoff (`retryMs = Math.min(retryMs * 2, 30000)`) prevents infinite tight retries.
|
||
|
||
- **All DTO field names are correct.** `detail.js` and `dashboard.js` read `coalescedHitCount`, `cacheHitCount`, `backendHeartbeatsSent`, `backendHeartbeatsFailed`, `backendIdleDisconnects`, `disconnectCascades`, `queueDepth`, `txIdWraps`, `inFlight`, `maxInFlight`, `invalidBcdWarnings`, `exceptionsByCode.codeOther`, etc. — all matching the camelCase JSON policy applied by `StatusJsonContext`/`JsonKnownNamingPolicy.CamelCase`. No field name typos found.
|
||
|
||
- **`escapeHtml` and `escapeAttr` in `util.js` are correct.** `&` is replaced before `<`/`>` (correct order), and `escapeAttr` adds `"` on top of `escapeHtml`. Both functions are pure and side-effect-free.
|
||
|
||
- **No CDN dependencies.** All `<script>`/`<link>` tags reference `/assets/…` — consistent with the firewalled-network design intent documented in `StatusPage.md`.
|
||
|
||
- **`formatUptime` and `formatAge` both guard `Number.isFinite` and `< 0`.** NaN/negative clock skew returns `—` rather than `NaN` text.
|
||
|
||
- **`tagCell(t)` correctly coerces `t.address` with `Number()`.** The address is rendered as a pure decimal or hex numeric, never as a raw server string.
|
||
|
||
- **`debug-rows` empty-state and `colspan` counts are consistent with the HTML `<thead>`.** The debug table has 6 columns; all static empty-row `<td colspan="6">` and the `no-traffic` span layout (1+1+3+1) sum to 6. The KPI table has 10 columns; `colspan="10"` matches. No cross-file DOM-id inconsistencies found.
|
||
|
||
- **CSS is clean.** `focus-visible` outlines on sortable `<th>` elements and the PLC-name `<a>` provide keyboard focus indicators. No `!important` abuse. Design tokens are consistent across the three CSS files.
|