554b05d28c
Reviewed the new SignalR dashboard and fixed its two top findings: a stored XSS on the connection-detail page (unescaped tag name / direction / timestamp rendered into innerHTML) and FC03/FC04 cache hits bypassing the debug-view capture, which left cached tags frozen while their age climbed. Also adds an optional human-friendly Name to BCD tags surfaced on the debug view, and loads the real fleet config from tags.txt (12 named BCD tags, PLC Z28061) so the published appsettings.json is deploy-ready. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
53 lines
7.4 KiB
Markdown
53 lines
7.4 KiB
Markdown
# mbproxy SignalR Web Dashboard — Code Review Overview
|
|
|
|
Scope: commit `e719dd5` ("replace status page with a live SignalR web dashboard") — ~3,500 lines across 49 files. Reviewed in four subsystem passes:
|
|
|
|
- [`AdminSignalR.md`](AdminSignalR.md) — `src/Mbproxy/Admin/*` (host, hub, broadcaster, push sink, subscription tracker, snapshot builder, DTOs).
|
|
- [`TagCapture.md`](TagCapture.md) — `src/Mbproxy/Proxy/{TagValueCapture,TagCaptureRegistry}.cs` + pipeline/reconciler/worker integration.
|
|
- [`Frontend.md`](Frontend.md) — `src/Mbproxy/Admin/wwwroot/*` (hand-written HTML/CSS/JS only; vendored assets excluded).
|
|
- [`TestsAndConfig.md`](TestsAndConfig.md) — new/changed tests, `MbproxyOptions`/`ReloadValidator`, csproj `EmbeddedResource`, smoke config.
|
|
|
|
## Verdict
|
|
|
|
The dashboard is well-architected at the macro level — the `IStatusPushSink` testability seam, the lock-free `TagValueCapture` data structure, the broadcaster's lifecycle binding to the Kestrel app, and the embedded-asset model are all sound. But the review surfaced **one security bug and a cluster of concurrency bugs that share a single root cause**, plus a feature-correctness gap. None of these should ship to operators as-is.
|
|
|
|
The standout pattern: **two independent reviewers (`AdminSignalR.md`, `TagCapture.md`) converged on the same root cause** — capture arm/disarm state is not authoritatively owned. `TagValueCapture.IsArmed` is carried on the transient capture instance and counted by `PlcSubscriptionTracker` keyed on SignalR `ConnectionId`. That single design choice produces C2, C3, and M3 below. Fix it once and three findings collapse.
|
|
|
|
## Cross-cutting critical findings
|
|
|
|
**C1 — Stored XSS on the connection-detail page.** (`Frontend.md` C1) `detail.js` interpolates `t.direction` raw into `.innerHTML` (`detail.js:194/203`), and the client-list builder escapes `c.remote` but not the `shortTime(c.connectedAtUtc)` result — whose `catch` branch returns the raw timestamp string verbatim (`detail.js:86`). The admin endpoint binds `IPAddress.Any` with **no authentication**, and the injected strings are device-/config-influenceable. A crafted value executes script in any operator's browser on the trusted segment. Fleet table (`dashboard.js`) escapes correctly — only the detail page breaks discipline. **Fix:** three `escapeHtml` calls, or switch the row builders to `createElement`/`textContent`.
|
|
|
|
**C2 — Capture armed forever after a SignalR reconnect.** (`AdminSignalR.md` C1) `PlcSubscriptionTracker` keys subscriber counts on `ConnectionId`, which changes on every transport reconnect. `OnDisconnectedAsync` for the old connection is unordered relative to the new connection's `SubscribePlc`, so a reconnect-during-view leaks the count and leaves a PLC's `TagValueCapture` **armed with no viewer for the life of the process**. The documented invariant "zero hot-path cost when nobody is watching" becomes false after any reconnect — every backend read and FC06/FC16 write then pays a `FrozenDictionary` lookup + `TagValueObservation` allocation, fleet-wide. `DisarmAll()` only fires on admin shutdown/port hot-reload, so the leak is never bounded in steady state.
|
|
|
|
**C3 — Cache hits never reach `Record()`.** (`TagCapture.md` C1) The `ctx.Capture?.Record(...)` calls live only in `BcdPduPipeline.ProcessResponse`, but the Phase-11 response-cache hit path (`PlcMultiplexer.cs:823-828`) returns cached post-rewrite bytes without invoking the pipeline. For any BCD tag with `CacheTtlMs > 0`, once the cache is warm the debug view **freezes at the last cache-miss observation while `AgeSeconds` climbs** — actively misleading an operator into thinking a live tag is dead. Feature-correctness, not a crash; caching is OFF by default so the default deployment is unaffected.
|
|
|
|
## Major findings (consolidated)
|
|
|
|
- **M1 — Non-atomic `SubscribePlc`.** (`AdminSignalR.md` C2) `AddToGroupAsync` then `_tracker.Add` span two awaits; a same-connection `OnDisconnectedAsync` can interleave and arm a capture on an already-gone connection. Same root cause as C2.
|
|
- **M2 — `GetOrCreate` lost-update race.** (`TagCapture.md` M1) The `AddOrUpdate` delegate reads `existing.IsArmed`, but a concurrent detail-page open can land its `Arm` on the about-to-be-discarded instance — publishing the rebuilt capture disarmed under an open page, or leaking it armed with no viewer.
|
|
- **M3 — Recommended fix for C2/M1/M2:** make `PlcSubscriptionTracker`'s subscriber count the *single authority* for arm state — key it on a stable per-tab identifier (or count distinct viewers, not connections), and derive `IsArmed` from the count rather than carrying it on the transient capture.
|
|
- **M4 — `StatusBroadcaster.LoopAsync` has zero coverage.** (`TestsAndConfig.md`) All four broadcaster tests call `PushOnceAsync` directly; the production push loop, its interval hot-reload re-read, the `Math.Max(100,…)` floor, and cancellation are unverified. The `/hub/status` endpoint is never exercised end-to-end.
|
|
- **M5 — `DebugJsonContext` is dead code; SignalR serializes via reflection `System.Text.Json`.** (`AdminSignalR.md`) A latent AOT trap; the camelCase wire-shape guarantee has no test.
|
|
- **M6 — Bind failure after `_broadcaster.Start()` leaks the broadcaster loop and a bound listener** — the catch block's cleanup is incomplete. (`AdminSignalR.md`)
|
|
- **M7 — Untested arm/disarm race in `TagValueCapture`.** (`TestsAndConfig.md`) `Disarm()` flips `_armed=false` then clears slots; a `Record()` that wins the `_armed` check before `Disarm` runs leaves a stale observation on a disarmed capture. The torn-read test only races `Record` vs `Snapshot`, never vs `Disarm`.
|
|
- **M8 — Frontend cold-start retry loop** layers an unbounded `setTimeout` retry over `withAutomaticReconnect` and can wedge if `start()` succeeds but `invoke()` fails. (`Frontend.md` M1)
|
|
- **M9 — Detail page never handles an unknown PLC name** — sits forever on "Waiting for first snapshot…" with a green pill. (`Frontend.md` M2)
|
|
|
|
## Recommended remediation order
|
|
|
|
1. **C1 (XSS)** — smallest fix, highest severity, ships in any operator-facing build. Do first.
|
|
2. **M3** — re-root capture arm/disarm authority in `PlcSubscriptionTracker`; closes C2, M1, M2, M7 together. Add a concurrency test for the tracker (currently has none).
|
|
3. **C3** — add a `Record()` call on the cache-hit path in `PlcMultiplexer`, or document the debug view as cache-blind. Decide explicitly.
|
|
4. **M4** — add an end-to-end `/hub/status` test (real `HubConnection`, assert a `fleet`/`plc` message and its camelCase shape — also closes the M5 gap) and a `LoopAsync` interval/cancellation test.
|
|
5. **M6, M8, M9** and the Minor findings in each subsystem file.
|
|
|
|
## What looks good
|
|
|
|
- `IStatusPushSink` is a genuine, well-placed testability seam.
|
|
- `TagValueCapture` itself — lock-free, torn-read-safe via immutable records + `Volatile.Write`/`Read`, `FrozenDictionary` address map — is correct. The weakness is *who arms it*, not the structure.
|
|
- Broadcaster per-cycle error isolation and lifecycle binding to the Kestrel app (no leak across port hot-reloads).
|
|
- Fleet table (`dashboard.js`) escapes all dynamic content; reconnect→re-subscribe is correctly wired in both JS files; no CDN deps, no stray `console.log`.
|
|
- `StatusHtmlRenderer` removed cleanly — no dangling source or test references.
|
|
- csproj `EmbeddedResource` glob is correct (Worker SDK has no competing web default globs).
|
|
- `AdminPushIntervalMs` validation matches house style across both validators.
|