Files
scadalink-design/code-reviews/CentralUI/findings.md

921 lines
47 KiB
Markdown

# Code Review — CentralUI
| Field | Value |
|-------|-------|
| Module | `src/ScadaLink.CentralUI` |
| Design doc | `docs/requirements/Component-CentralUI.md` |
| Status | Reviewed |
| Last reviewed | 2026-05-16 |
| Reviewer | claude-agent |
| Commit reviewed | `9c60592` |
| Open findings | 1 |
## Summary
The Central UI is a sizeable, generally well-structured Blazor Server module:
custom Bootstrap components only (no third-party UI frameworks, as required),
consistent list/form page patterns, careful disposal in most components, and a
thoughtful Roslyn-backed script editor. The most serious problem is the
**Test Run sandbox** (`ScriptAnalysisService.RunInSandboxAsync`): it compiles
and executes arbitrary user C# *in the central process* with no enforcement of
the documented script trust model — the forbidden-API list is only a Monaco
editor diagnostic, never applied before execution — so a Design user can run
`System.IO`/`Process`/`Reflection` code on the central node. Several other
themes recur: (1) per-circuit security drift — site-scoped Deployment claims
are written at login but never read, so site scoping is not enforced anywhere;
(2) Blazor render-thread and disposal hazards — background `Timer` / `Task.Delay`
callbacks and stream callbacks touch component state and `@ref` children that
may already be disposed; (3) process-global mutation (`Console.SetOut`) shared
across concurrent circuits; (4) drift from the design doc on session expiry and
on the "deployment status pushes via SignalR" claim (the page actually polls).
Testing coverage is thin for a module this large: only the script analyzer,
TreeView, schema model, and a few data-connection pages have unit tests; most
pages and the auth bridge are untested.
## Checklist coverage
| # | Category | Examined | Notes |
|---|----------|----------|-------|
| 1 | Correctness & logic bugs | ☑ | DebugView cap logic, audit-log timezone, toast race — see findings. |
| 2 | Akka.NET conventions | ☑ | Module is mostly UI; `DebugStreamService` actor usage reviewed (in Communication but driven from here). No actor-convention violations in CentralUI proper. |
| 3 | Concurrency & thread safety | ☑ | `Console.SetOut` global mutation, stream/timer callbacks on non-render threads, toast `_ = Task.Delay`. |
| 4 | Error handling & resilience | ☑ | Broad `catch {}` swallowing, dangling `TaskCompletionSource` on dialog disposal. |
| 5 | Security | ☑ | Sandbox not enforcing trust model (Critical); site scoping never enforced; auth bridge reads stale HttpContext; logout CSRF. |
| 6 | Performance & resource management | ☑ | N+1 site-connection query, repeated `FilteredMessages` recomputation, full-page paginators rendering all page buttons. |
| 7 | Design-document adherence | ☑ | Session expiry diverges from "15-min sliding + 30-min idle"; Deployments polls despite "push via SignalR"; nav exposes Deployment-only pages to all roles. |
| 8 | Code organization & conventions | ☑ | Generally good; options classes absent (no appsettings binding here); no major violations. |
| 9 | Testing coverage | ☑ | Auth, sandbox-run, DebugView, Health, ParkedMessages, most pages untested. |
| 10 | Documentation & comments | ☑ | Comments are accurate and helpful; a few stale claims noted. |
## Findings
### CentralUI-001 — Test Run sandbox executes arbitrary C# with no trust-model enforcement
| | |
|--|--|
| Severity | Critical |
| Category | Security |
| Status | Resolved |
| Location | `src/ScadaLink.CentralUI/ScriptAnalysis/ScriptAnalysisService.cs:171-424` |
**Description**
`RunInSandboxAsync` compiles user-supplied script code with `CSharpScript.Create`
and executes it (`script.RunAsync`) directly inside the central process. The
"sandbox" applies only a wall-clock timeout and an output-size cap. It does
**not** enforce the documented script trust model: the forbidden-API set
(`System.IO`, `System.Diagnostics`/`Process`, `System.Reflection`, `System.Net`,
threading) is checked only in `FindForbiddenApiUsages`, which feeds Monaco
editor diagnostics — it is never consulted before `RunInSandboxAsync` executes.
`DefaultOptions` references `typeof(object).Assembly` (the full BCL), so a
Design-role user can submit `System.IO.File.WriteAllText(...)`,
`System.Diagnostics.Process.Start(...)`, reflection, or raw socket code via
`POST /api/script-analysis/run` and it runs with the central host process's
full privileges. The endpoint is gated only by `RequireDesign`. This is a
remote code execution path on the central cluster node.
**Recommendation**
Before executing, run the same forbidden-API analysis used for diagnostics and
reject any script with a `SCADA001`/`SCADA002` (severity-8) marker; additionally
restrict the compilation's metadata references to the curated script API
surface, and ideally execute in an isolated `AssemblyLoadContext`/process with
constrained permissions. Treat the trust model as an execution-time gate, not
an editor hint.
**Resolution**
Resolved 2026-05-16. A Roslyn semantic trust-model gate was added. `RunInSandboxAsync`
now calls `EnforceTrustModel` after compilation and before `script.RunAsync`; if the
script references any forbidden API the run is rejected (`SandboxErrorKind.CompileError`)
with the offending markers, and the same gate is applied to nested shared scripts in
`callSharedFunc`. `FindForbiddenApiUsages` was reworked so it resolves every identifier
(not just the leftmost) against the semantic model and checks types **and** members —
so a fully-qualified call such as `System.IO.File.WriteAllText(...)` is now caught, not
only `using`-directive or bare-type forms. This is a static semantic gate consistent
with the documented trust model; it is not a process sandbox — reflection-based
indirection remains out of its reach, and full isolation would require running scripts
in a separate constrained process (a larger change deliberately not taken here).
Regression tests `RunInSandbox_FullyQualifiedForbiddenApi_IsBlockedBeforeExecution`,
`RunInSandbox_ForbiddenUsingDirective_IsBlockedBeforeExecution` and
`Diagnose_FullyQualifiedForbiddenCall_RaisesSCADA002` fail against the pre-fix code and
pass after; `RunInSandbox_CleanScript_StillRuns` guards against over-blocking. Fixed by
the commit whose message references `CentralUI-001`.
### CentralUI-002 — Site-scoped Deployment permissions are issued but never enforced
| | |
|--|--|
| Severity | High |
| Category | Security |
| Status | Resolved |
| Location | `src/ScadaLink.CentralUI/Auth/AuthEndpoints.cs:63-69`; `src/ScadaLink.CentralUI/Components/Pages/Deployment/*.razor` |
**Description**
Login adds `SiteId` claims (`JwtTokenService.SiteIdClaimType`) for non-system-wide
Deployment users, and the design doc (Component-CentralUI "Responsibilities" and
CLAUDE.md Security & Auth) requires the Deployment role to be site-scoped. A
repo-wide search shows the `SiteId` claim is written at login and **never read
anywhere in CentralUI**. Deployment pages — `DebugView.razor`, `Deployments.razor`,
`InstanceCreate.razor`, `InstanceConfigure.razor`, `Topology.razor`,
`ParkedMessages.razor`, `EventLogs.razor` — list and act on every site with no
filtering by the user's permitted sites. A Deployment user scoped to one site
can deploy to, debug, and manage instances at any site.
**Recommendation**
Enforce site scoping: filter site/instance lists by the user's `SiteId` claims
(or treat the absence of `SiteId` claims as system-wide), and re-check the claim
server-side before any mutating cross-site command (deploy, enable/disable/delete,
debug stream, parked-message retry/discard). A shared helper that reads the
claims from `AuthenticationStateProvider` and exposes "permitted site ids" would
keep this consistent.
**Resolution**
Resolved 2026-05-16. Confirmed: the `SiteId` claim was written at login
(`AuthEndpoints`, `RoleMapper`) but never read by any CentralUI page — site
scoping was unenforced. Added a scoped `SiteScopeService` (`Auth/SiteScopeService.cs`)
that reads the current circuit's `SiteId` claims and exposes `IsSystemWideAsync`,
`PermittedSiteIdsAsync`, `FilterSitesAsync`, and `IsSiteAllowedAsync` (absence of
claims = system-wide, matching `SiteScopeAuthorizationHandler`). All seven
Deployment/Monitoring pages now consume it: `Topology`, `DebugView`,
`InstanceCreate`, `Deployments` filter their site/instance lists; `InstanceConfigure`
rejects direct navigation to an instance on a non-permitted site; `DebugView`,
`InstanceCreate`, and `ParkedMessages` re-check the claim server-side before any
mutating/streaming command. Regression tests: `SiteScopeServiceTests` (6 tests
pinning the helper logic) and `TopologyPageTests.SiteScoping_ScopedDeploymentUser_OnlySeesPermittedSites`
/ `SiteScoping_SystemWideDeploymentUser_SeesAllSites`. Fixed by the commit whose
message references `CentralUI-002`.
### CentralUI-003 — `Console.SetOut`/`SetError` mutates process-global state across concurrent circuits
| | |
|--|--|
| Severity | High |
| Category | Concurrency & thread safety |
| Status | Resolved |
| Location | `src/ScadaLink.CentralUI/ScriptAnalysis/ScriptAnalysisService.cs:359-423` |
**Description**
`RunInSandboxAsync` redirects `Console.Out`/`Console.Error` to a per-call
`StringWriter`, runs the script, then restores them in `finally`. `Console.Out`
is process-global. If two users (two Blazor circuits) run Test Run concurrently,
their captured outputs interleave or cross over, and the `finally` of whichever
finishes first restores `Console.Out` to the *original* writer while the other
run is still executing — so the second run's script output is lost or written
to the real console. `RunInSandboxAsync` is `async` and the script runs on a
thread-pool thread, so concurrent execution is fully expected.
**Recommendation**
Do not redirect process-global `Console`. Provide console capture through the
script globals surface (e.g. a `TextWriter` exposed on `SandboxScriptHost` that
the sandbox API writes to), or serialize Test Run executions with a semaphore if
global redirection must be kept. Capturing per-call without global mutation is
the correct fix.
**Resolution**
Resolved 2026-05-16. Confirmed: `RunInSandboxAsync` redirected the process-global
`Console.Out`/`Console.Error` per call and restored them in `finally`, so a
concurrent run's `finally` could restore the writer while another run was still
executing — the long run silently lost output (reproduced by the regression
test, 74 of 80 expected lines captured). Added `SandboxConsoleCapture`, a routing
`TextWriter` installed into `Console.Out`/`Console.Error` exactly once for the
process; each run pushes its own `StringWriter` onto an `AsyncLocal` capture
scope via `BeginCapture`, so writes are routed per logical call-tree with no
per-run mutation of global `Console` state. `RunInSandboxAsync` now opens the
scope with `using` declarations instead of calling `Console.SetOut`. Regression
tests `RunInSandbox_CapturesConsoleOutput` and
`RunInSandbox_ConcurrentRuns_DoNotCrossContaminateConsoleOutput` fail against the
pre-fix code and pass after. Fixed by the commit whose message references
`CentralUI-003`.
### CentralUI-004 — `CookieAuthenticationStateProvider` reads `HttpContext` for the life of the circuit
| | |
|--|--|
| Severity | High |
| Category | Security |
| Status | Resolved |
| Location | `src/ScadaLink.CentralUI/Auth/CookieAuthenticationStateProvider.cs:22-28` |
**Description**
`GetAuthenticationStateAsync` returns `_httpContextAccessor.HttpContext?.User`.
In Blazor Server, `HttpContext` is only valid during the initial HTTP request
that establishes the circuit; for the lifetime of the long-lived SignalR circuit
`IHttpContextAccessor.HttpContext` is `null` (or, worse, a stale/foreign context
if the accessor's `AsyncLocal` leaks). Any later call to
`GetAuthenticationStateAsync` — e.g. an `<AuthorizeView>` re-evaluating, or pages
that call it directly (`Sites.razor`, `Templates.razor`) — then sees an
unauthenticated principal and may render the wrong UI, or returns a stale
identity that never reflects role changes. The class derives from
`ServerAuthenticationStateProvider`, which is designed to be seeded once via
`SetAuthenticationState`; overriding `GetAuthenticationStateAsync` to read
`HttpContext` defeats that design.
**Recommendation**
Capture the authenticated principal once when the circuit is created (e.g. via
the root component / `AuthenticationStateProvider` seeding pattern used by the
Blazor Web App template) and store it on the scoped provider, instead of reading
`IHttpContextAccessor` on every call. Do not depend on `HttpContext` after the
circuit is established.
**Resolution**
Resolved 2026-05-16. Confirmed: `GetAuthenticationStateAsync` read
`_httpContextAccessor.HttpContext?.User` on every call; the provider is
registered `Scoped`, so it is constructed within the initial HTTP request's DI
scope while `HttpContext` is still valid, but every later call (an
`<AuthorizeView>` re-evaluating, or a page calling it directly) over the
long-lived SignalR circuit saw `HttpContext == null` and returned an anonymous
principal. The provider now snapshots the principal once in the constructor into
a cached `Task<AuthenticationState>` and serves that for the life of the
circuit, never touching `IHttpContextAccessor` again. Regression tests
`CookieAuthenticationStateProviderTests.GetAuthenticationStateAsync_StillReturnsUser_AfterHttpContextIsGone`
and `..._IsStableAcrossCalls_IgnoringStaleForeignContext` fail against the
pre-fix code (they would see an anonymous / foreign principal) and pass after.
Fixed by the commit whose message references `CentralUI-004`.
### CentralUI-005 — Session expiry implementation diverges from the documented policy
| | |
|--|--|
| Severity | Medium |
| Category | Design-document adherence |
| Status | Resolved |
| Location | `src/ScadaLink.CentralUI/Auth/AuthEndpoints.cs:47-81`; `src/ScadaLink.CentralUI/Components/Shared/SessionExpiry.razor:18-30` |
**Description**
CLAUDE.md (Security & Auth) specifies "15-minute expiry with sliding refresh,
30-minute idle timeout." `AuthEndpoints` instead sets a single fixed
`expires_at = UtcNow + 30 minutes` claim and a 30-minute cookie `ExpiresUtc`,
with no sliding refresh and no separate idle vs absolute timeout.
`SessionExpiry.razor` schedules a single hard redirect at that fixed time. The
result is a hard 30-minute cap with no sliding renewal — an active user is
logged out mid-session, and there is no 15-minute component at all.
**Recommendation**
Either implement the documented policy (sliding 15-minute token with refresh on
activity, plus a 30-minute idle cutoff) or update the design docs to match the
fixed 30-minute model. The code and the documented decision must agree.
**Resolution**
Resolved 2026-05-16 (commit `<pending>`) — cross-module fix (CentralUI +
Security), explicitly authorized. Root cause confirmed against the source:
`AddCookie` (`ScadaLink.Security/ServiceCollectionExtensions.cs`) set neither
`ExpireTimeSpan` nor `SlidingExpiration`; `AuthEndpoints` stamped a fixed
`expires_at = UtcNow + 30 min` claim and a 30-minute absolute cookie
`ExpiresUtc`; `SessionExpiry.razor` scheduled one hard redirect at that fixed
instant — a hard 30-minute cap, no sliding renewal, no 15-minute component.
**What was implemented — sliding session window.** ASP.NET Core cookie
authentication exposes a single `ExpireTimeSpan` plus a `SlidingExpiration`
flag; it cannot natively model *both* a 15-minute sliding token *and* a separate
30-minute absolute idle cap. The faithful interpretation implemented: the cookie
session window **is** the idle timeout. `AddSecurity` now post-configures the
cookie options with `ExpireTimeSpan = TimeSpan.FromMinutes(SecurityOptions.IdleTimeoutMinutes)`
(default 30, bound from `appsettings` via the existing options pattern, not
hard-coded) and `SlidingExpiration = true`. The middleware therefore re-issues
the cookie on activity once past the halfway mark of the window: an active user
is continually renewed, an idle user is signed out after the 30-minute idle
timeout — exactly the documented "sliding refresh, 30-minute idle timeout". The
separate 15-minute `JwtExpiryMinutes` governs the lifetime of the *embedded JWT*
itself (`JwtTokenService`) — a distinct layer from the cookie session window;
it is not, and per the ASP.NET cookie model cannot be, a second independent
sliding window inside the same cookie. `AuthEndpoints` no longer imposes a
contradictory absolute cap: the `expires_at` claim and the manual cookie
`ExpiresUtc` were removed, and a new `BuildSignInProperties()` helper sets only
`IsPersistent = true` (no `ExpiresUtc`, `AllowRefresh` left unset) so the
middleware owns expiry. `SessionExpiry.razor` no longer reads a fixed
login-time deadline (the `expires_at` claim is gone) and no longer hard-redirects
at a fixed instant: it now polls the authentication state on a recurring
interval and redirects to `/login` only once the sliding cookie has actually
lapsed server-side — so an active user is never logged out mid-session.
Regression tests fail against the pre-fix code and pass after. Security:
`AddSecurity_AuthCookie_UsesSlidingExpiration`,
`AddSecurity_AuthCookie_ExpireTimeSpanMatchesIdleTimeout` (pre-fix
`ExpireTimeSpan` was the 14-day default — confirmed failing), and
`AddSecurity_AuthCookie_ExpireTimeSpanIsConfigurable` (pins the options-pattern
binding). CentralUI: `SessionExpiryPolicyTests.BuildSignInProperties_DoesNotSetFixedAbsoluteExpiry`,
`..._IsPersistent`, `..._AllowsSlidingRefresh` pin that the login sign-in no
longer imposes a fixed absolute cap. `dotnet build ScadaLink.slnx` clean;
`tests/ScadaLink.Security.Tests` 57 passed, `tests/ScadaLink.CentralUI.Tests`
254 passed.
### CentralUI-006 — Deployment status page polls every 10s despite the documented SignalR-push design
| | |
|--|--|
| Severity | Medium |
| Category | Design-document adherence |
| Status | Open |
| Location | `src/ScadaLink.CentralUI/Components/Pages/Deployment/Deployments.razor:196-216` |
**Description**
Component-CentralUI "Real-Time Updates" states: "Deployment status:
Pending/in-progress/success/failed transitions push to the UI immediately via
SignalR (built into Blazor Server). No polling required for deployment
tracking." `Deployments.razor` instead runs a `Timer` that reloads all
deployment records and instance names from the database every 10 seconds. This
is a full N-record + instance-map reload per tick for every open circuit, and
contradicts the design. It also re-issues two repository round-trips on each
tick regardless of whether anything changed.
**Recommendation**
Implement push-based updates (an injected event/observable raised by the
Deployment Manager that the page subscribes to and renders via
`InvokeAsync(StateHasChanged)`), or amend the design doc to acknowledge polling.
If polling is kept as a fallback, fetch only changed/in-progress records.
**Resolution**
_Unresolved — a genuine SignalR-push fix requires an event source in another
module._ Verified 2026-05-16: `Deployments.razor` does poll every 10s, contrary
to the design doc. But a real push implementation needs the **Deployment
Manager** module (`ScadaLink.DeploymentManager``DeploymentService` /
`ArtifactDeploymentService` write the `DeploymentRecord` rows) to raise a
status-change event/observable that the page subscribes to; there is no such
event today and no CentralUI-only seam to subscribe to. Building that event
source is out of scope for a CentralUI-only review. Left Open and surfaced for a
follow-up that adds a deployment-status broadcaster in DeploymentManager (or a
design-doc amendment acknowledging the polling fallback).
### CentralUI-007 — Monitoring nav links to Deployment-only pages are shown to all roles
| | |
|--|--|
| Severity | Medium |
| Category | Correctness & logic bugs |
| Status | Resolved |
| Location | `src/ScadaLink.CentralUI/Components/Layout/NavMenu.razor:69-78`; `src/ScadaLink.CentralUI/Components/Pages/Monitoring/EventLogs.razor:2`; `src/ScadaLink.CentralUI/Components/Pages/Monitoring/ParkedMessages.razor:2` |
**Description**
`NavMenu` renders the "Event Logs" and "Parked Messages" links inside the
all-authenticated-users Monitoring section. The design doc classifies both the
Site Event Log Viewer and Parked Message Management as **Deployment Role**.
Two inconsistencies result: (a) an Admin- or Design-only user sees nav links
they cannot use; (b) the pages themselves are annotated only `[Authorize]`
(any authenticated user), not `[Authorize(Policy = RequireDeployment)]`, so a
non-Deployment user who follows the link is *not* blocked — they can query site
event logs and retry/discard parked messages. The authorization attribute and
the nav visibility both contradict the design.
**Recommendation**
Add `[Authorize(Policy = AuthorizationPolicies.RequireDeployment)]` to
`EventLogs.razor` and `ParkedMessages.razor`, and move their nav links into a
`<AuthorizeView Policy="RequireDeployment">` block (consistent with the Topology
/ Deployments / Debug View links). Confirm Health Dashboard is intentionally
all-roles (it is, per the design).
**Resolution**
Resolved 2026-05-16 (commit pending). Confirmed: both `EventLogs.razor` and
`ParkedMessages.razor` carried a bare `[Authorize]`, so any authenticated user
could query site event logs and retry/discard parked messages — contrary to the
design doc's Deployment-Role classification. Both pages now use
`[Authorize(Policy = AuthorizationPolicies.RequireDeployment)]`, and the
"Event Logs" / "Parked Messages" nav links were moved out of the all-roles
Monitoring block into an `<AuthorizeView Policy="RequireDeployment">` (Health
Dashboard stays all-roles, as the design intends). Regression tests
`MonitoringAuthorizationTests.{EventLogsPage,ParkedMessagesPage}_RequiresDeploymentPolicy`
fail against the pre-fix code and pass after;
`HealthDashboard_IsIntentionallyAllAuthenticatedRoles` guards the all-roles page.
### CentralUI-008 — Audit-log date filters treat browser-local datetimes as UTC
| | |
|--|--|
| Severity | Medium |
| Category | Correctness & logic bugs |
| Status | Resolved |
| Location | `src/ScadaLink.CentralUI/Components/Pages/Monitoring/AuditLog.razor:242-243` |
**Description**
The `From`/`To` filters bind `<input type="datetime-local">` to `DateTime?`
fields. A `datetime-local` input yields the value the user typed in their
*browser-local* time zone. `FetchPage` converts them with
`new DateTimeOffset(_filterFrom.Value, TimeSpan.Zero)` — i.e. it labels the
local wall-clock value as UTC. For any non-UTC user the audit query window is
shifted by their UTC offset, silently returning the wrong rows. CLAUDE.md
mandates UTC throughout, but that requires converting the local input *to* UTC,
not relabelling it.
**Recommendation**
Convert the picked local time to UTC before querying — capture the browser
offset (JS interop) and apply it, or document the inputs as UTC and label them
in the UI. The same issue should be checked in `EventLogs.razor` if it has
time-range filters.
**Resolution**
Resolved 2026-05-16 (commit pending). Confirmed: `FetchPage` wrapped the
`datetime-local` value with `new DateTimeOffset(value, TimeSpan.Zero)`,
relabelling the browser-local wall-clock value as UTC and shifting the audit
query window by the user's offset. Added a pure helper
`Components/BrowserTime.LocalInputToUtc(DateTime?, int)` that converts a
local-input value to UTC using the browser's `Date.getTimezoneOffset()`;
`AuditLog.razor` now fetches that offset once via JS interop in
`OnAfterRenderAsync` (defaulting to 0/UTC on prerender or a disconnected
circuit) and runs both `from`/`to` filters through the helper. Regression suite
`BrowserTimeTests` (5 tests) fails against the naive relabelling and passes
after — including `LocalInputToUtc_NonUtcBrowser_DoesNotEqualNaiveRelabelling`,
which pins the exact pre-fix bug. `EventLogs.razor` was checked and has no
time-range filters, so it is unaffected.
### CentralUI-009 — `DebugView` stream callbacks touch a possibly-disposed `ToastNotification`
| | |
|--|--|
| Severity | Medium |
| Category | Concurrency & thread safety |
| Status | Resolved |
| Location | `src/ScadaLink.CentralUI/Components/Pages/Deployment/DebugView.razor:400-409,538-544` |
**Description**
The `onTerminated` callback passed to `DebugStreamService.StartStreamAsync`
captures `_toast` and `this` and runs on an Akka/gRPC thread. If the user
navigates away, `Dispose()` calls `StopStream`, but a stream-termination event
already in flight can still invoke `onTerminated`, which calls
`_toast.ShowError(...)` and `StateHasChanged()` on a disposed component. The
component does not guard callbacks with a disposed flag or a
`CancellationTokenSource`. The same applies to the `onEvent` callbacks at
lines 391-398 that call `InvokeAsync(StateHasChanged)`.
**Recommendation**
Track a `_disposed`/`CancellationTokenSource` on the component, check it at the
top of every stream callback, and stop the stream synchronously before marking
disposed. `InvokeAsync` after disposal throws `ObjectDisposedException`; the
callbacks should no-op once disposed.
**Resolution**
Resolved 2026-05-16 (commit pending). Confirmed: the `onEvent`/`onTerminated`
callbacks captured `this` and `_toast` and ran on an Akka/gRPC thread with no
disposal guard. Added a `volatile bool _disposed` flag, set first thing in
`Dispose()` before the stream is stopped. Every callback now checks `_disposed`
and no-ops if set; the render dispatch goes through a new `SafeInvokeAsync`
helper that re-checks the flag and swallows `ObjectDisposedException` should the
component be disposed between the guard and the dispatch. Regression tests
`DebugViewDisposalTests.{DebugView_HasDisposalGuardField,
DebugView_Dispose_SetsDisposedFlag_AndIsIdempotent}` pin the observable contract
(the guard field exists; `Dispose()` sets it and is idempotent) — the first
fails against the pre-fix code, which had no `_disposed` field. The Akka-thread
timing race itself is not deterministically reproducible in a unit test:
`DebugStreamService` is a non-virtual concrete class with no seam to inject and
later fire the callbacks, so the closest meaningful tests pin the guard
mechanism rather than the race window.
### CentralUI-010 — `ToastNotification` auto-dismiss continuation runs after component disposal
| | |
|--|--|
| Severity | Medium |
| Category | Error handling & resilience |
| Status | Resolved |
| Location | `src/ScadaLink.CentralUI/Components/Shared/ToastNotification.razor:62-71,90` |
**Description**
`AddToast` schedules `Task.Delay(dismissMs).ContinueWith(...)` with the result
discarded (`_ =`). The continuation calls `InvokeAsync(StateHasChanged)`. If the
host page is disposed before the 5-second delay elapses (common — navigate away
right after an action), the continuation runs against a disposed component and
`InvokeAsync` throws `ObjectDisposedException` on a thread-pool thread with no
catch, producing an unobserved task exception. `Dispose()` is an empty body and
cancels nothing.
**Recommendation**
Hold a `CancellationTokenSource`, pass its token to `Task.Delay`, cancel it in
`Dispose()`, and guard the continuation. Alternatively wrap the continuation
body in a try/catch for `ObjectDisposedException`.
**Resolution**
Resolved 2026-05-16 (commit pending). Confirmed: `AddToast` scheduled
`Task.Delay(...).ContinueWith(...)` with no cancellation and `Dispose()` was an
empty body, so the continuation ran `InvokeAsync(StateHasChanged)` against a
disposed component. Added a `CancellationTokenSource _disposalCts` cancelled in
`Dispose()`; the auto-dismiss is now an `AutoDismissAsync` method that awaits
`Task.Delay(dismissMs, token)`, returns on `OperationCanceledException`, and
wraps the post-delay `InvokeAsync(StateHasChanged)` in a try/catch for
`ObjectDisposedException`. `AddToast` also short-circuits if the component is
already disposed. Regression tests:
`ToastNotificationTests.ShowToast_AfterDisposal_IsNoOp_AndSchedulesNothing`
fails against the pre-fix code (which still added the toast / mis-scheduled
after disposal) and passes after;
`AutoDismiss_AfterDisposal_DoesNotThrowUnobservedException` and
`AutoDismiss_BeforeDisposal_StillRemovesToast` guard the no-throw and
still-works behaviours.
### CentralUI-011 — `DiffDialog` leaves a dangling `TaskCompletionSource` when disposed while open
| | |
|--|--|
| Severity | Medium |
| Category | Error handling & resilience |
| Status | Resolved |
| Location | `src/ScadaLink.CentralUI/Components/Shared/DiffDialog.razor:89-95,151-157` |
**Description**
`OpenAsync` creates `_tcs` and returns `_tcs.Task` to the caller, which
typically `await`s it. The task is completed only by `Close()`. If the user
navigates away while the dialog is open, `DisposeAsync` runs but never completes
`_tcs`, so the awaiting caller's continuation never resumes — a permanently
suspended `Task` (and any `using`/cleanup after the await is skipped). The
`IDialogService.Confirm/Prompt` path has the same shape but at least its host
is a single long-lived `DialogHost`; `DiffDialog` is per-page.
**Recommendation**
In `DisposeAsync`, call `_tcs?.TrySetResult(false)` (or `TrySetCanceled`) so any
awaiter completes deterministically.
**Resolution**
Resolved 2026-05-16 (commit pending). Confirmed: `OpenAsync` returned
`_tcs.Task`, completed only by `Close()`; `DisposeAsync` never touched the TCS,
so disposing the dialog while open left the awaiting caller suspended forever.
`DisposeAsync` now calls `_tcs?.TrySetResult(false)` before unlocking the body,
so a dialog disposed while open resolves its caller to `false` (not confirmed).
Regression test `DiffDialogTests.DisposeAsync_WhileOpen_CompletesPendingTask`
fails against the pre-fix code (the pending task stays `WaitingForActivation`)
and passes after; `Close_CompletesPendingTaskWithTrue` guards the normal close
path.
### CentralUI-012 — N+1 query loading data connections for the Sites page
| | |
|--|--|
| Severity | Medium |
| Category | Performance & resource management |
| Status | Resolved |
| Location | `src/ScadaLink.CentralUI/Components/Pages/Admin/Sites.razor:196-205` |
**Description**
`LoadDataAsync` fetches all sites, then issues
`SiteRepository.GetDataConnectionsBySiteIdAsync(site.Id)` once per site in a
loop. With N sites this is N+1 database round-trips on every page load and every
post-delete refresh. The connection lists are only used for a small per-card
summary.
**Recommendation**
Add a repository method that returns all data connections (or connections for a
set of site ids) in one query and group them client-side, or project the small
summary in a single query.
**Resolution**
Resolved 2026-05-16 (commit pending). Confirmed: `LoadDataAsync` looped
`GetDataConnectionsBySiteIdAsync(site.Id)` once per site (N+1). `ISiteRepository`
already exposes `GetAllDataConnectionsAsync()` and `DataConnection` carries a
`SiteId`, so the loop was replaced with a single `GetAllDataConnectionsAsync()`
call grouped client-side by `SiteId` — one query regardless of site count, on
every load and post-delete refresh. Regression tests
`SitesPageTests.{LoadData_FetchesAllConnectionsInOneQuery_NoPerSiteQueries,
LoadData_GroupsConnectionsBySite_AndRendersThem}` fail against the pre-fix code
(`GetDataConnectionsBySiteIdAsync` was called per site) and pass after.
### CentralUI-013 — `ScriptAnalysisService` blocks on async shared-script lookups
| | |
|--|--|
| Severity | Medium |
| Category | Concurrency & thread safety |
| Status | Resolved |
| Location | `src/ScadaLink.CentralUI/ScriptAnalysis/ScriptAnalysisService.cs:951-952` (actual call at `:975`) |
**Description**
`ResolveCalledShape` calls `_sharedScripts.GetShapesAsync().GetAwaiter().GetResult()`
to resolve a shared-script shape synchronously. `GetShapesAsync` ultimately hits
`SharedScriptService` and its EF Core repository. Sync-over-async on a request
thread risks thread-pool starvation under load and can deadlock if any awaited
continuation needs a captured context. `Hover` and `SignatureHelp` (which call
`ResolveCalledShape`) are themselves synchronous methods, so the blocking call
is structural.
**Recommendation**
Make `Hover` and `SignatureHelp` async and `await` `GetShapesAsync`, or have the
catalog expose a cached synchronous snapshot that is refreshed asynchronously.
The `IMemoryCache` is already present — caching the shapes there and reading
them synchronously would remove the blocking call.
**Resolution**
Resolved 2026-05-16 (commit pending). Confirmed (the sync-over-async call is at
`:975`, not `:951-952` as originally cited — `ResolveCalledShape`'s
`Scripts.CallShared` branch). Took the recommended root-cause fix: `Hover` and
`SignatureHelp` are now `async Task<...>` and `ResolveCalledShape` is
`async Task<ScriptShape?>` which `await`s `_sharedScripts.GetShapesAsync()`
instead of `.GetAwaiter().GetResult()`. The two minimal-API endpoints
(`/hover`, `/signature-help`) were updated to `await` the methods. Regression
suite `ScriptAnalysisAsyncResolveTests` (3 tests): the structural test
`HoverAndSignatureHelp_AreAsync_NotSyncOverAsync` fails against the pre-fix
synchronous signatures, and two behavioural tests resolve shared-script shapes
through a catalog that only completes after `Task.Yield()` (a genuinely async
source). The five existing `Hover`/`SignatureHelp` tests in
`ScriptAnalysisServiceTests` were updated to `await` the now-async methods.
### CentralUI-014 — Test Run side effects (HTTP/SQL/SMTP) fire against production services
| | |
|--|--|
| Severity | Low (re-triaged from Medium 2026-05-16 — see Resolution) |
| Category | Error handling & resilience |
| Status | Resolved |
| Location | `src/ScadaLink.CentralUI/ScriptAnalysis/ScriptAnalysisService.cs:254-259`; `src/ScadaLink.CentralUI/ScriptAnalysis/SandboxHostHelpers.cs:26-117` |
**Description**
By design (documented in the XML comments) Test Run wires `ExternalSystem`,
`Database`, and `Notify` to central's *real* `IExternalSystemClient`,
`IDatabaseGateway`, and `INotificationDeliveryService`, so a Test Run that calls
`Notify.To(...).Send(...)` actually emails recipients, `Database.Connection(...)`
opens a real DB connection, and `External.Call(...)` makes real HTTP calls —
with production-equivalent side effects. There is no dry-run mode, no
confirmation, and (combined with CentralUI-001) no restriction on what a script
can do. A Design user testing a draft script can dispatch real notifications or
mutate external databases. The behaviour is intentional but the blast radius is
not surfaced to the user.
**Recommendation**
At minimum, surface a clear warning in the Test Run UI that side effects are
real, and require explicit opt-in for side-effecting calls. Preferably offer a
dry-run mode that stubs the helpers, defaulting to dry-run.
**Resolution**
Resolved 2026-05-16 (commit pending) — **re-triaged**. Re-verified against the
reviewed commit `9c60592`: the finding's premise that "the blast radius is not
surfaced to the user" is **inaccurate**. Both Test Run surfaces that can produce
real side effects — `SharedScriptForm.razor` and the script Test Run in
`TemplateEdit.razor` — already carry a prominent `Real I/O` badge on the panel
header and an `alert-warning` block stating `External`/`Database`/`Notify` calls
"fire for real … real HTTP, real SQL, real emails. Side effects are permanent"
(present since commit `2951507`, an ancestor of the reviewed commit, confirmed
via `git merge-base`). `ApiMethodForm.razor` (Inbound API kind) has **no**
real-I/O surface at all — `SandboxInboundScriptHost` exposes only
`Parameters`/`Route` (Route throws) — and correctly omits the badge while still
warning. Revealing the panel ("Test Run" toggle) then clicking "Run" is itself a
two-step explicit opt-in. The minimum recommendation is therefore already met;
the optional dry-run mode is a separate feature decision the design doc does not
mandate. Severity re-triaged Medium → Low (intentional, documented, clearly
warned behaviour — not a bug). Regression suite `TestRunWarningTests` (3 tests)
pins the `Real I/O` badge + warning text in `SharedScriptForm`/`TemplateEdit`
and the deliberate absence of the badge in `ApiMethodForm`, so the warning
cannot silently regress.
### CentralUI-015 — `DialogService` continuations resolve off the render thread
| | |
|--|--|
| Severity | Low |
| Category | Concurrency & thread safety |
| Status | Won't Fix |
| Location | `src/ScadaLink.CentralUI/ServiceCollectionExtensions.cs:24`; `src/ScadaLink.CentralUI/Components/Shared/DialogService.cs:18-69` |
**Description**
`DialogService` is `AddScoped` (one per circuit, correct) but
`ConfirmAsync`/`PromptAsync` complete via `ContinueWith(..., TaskScheduler.Default)`,
so a caller awaiting them resumes on a thread-pool thread. Any subsequent
component state mutation by the caller is then off the render thread unless the
caller wraps it in `InvokeAsync`. Call sites are not consistently doing so,
which can produce non-deterministic render glitches.
**Recommendation**
Either resolve continuations on the circuit's sync context or document that
callers must `InvokeAsync` after awaiting `ConfirmAsync`/`PromptAsync`. Audit
call sites for off-thread state mutation.
**Resolution**
Won't Fix — **re-triaged 2026-05-16, the finding's premise is incorrect.** The
finding claims `ContinueWith(..., TaskScheduler.Default)` makes an awaiting
caller resume on a thread-pool thread. It does not. `TaskScheduler.Default` on
`ContinueWith` only governs where the trivial *projection lambda* runs (inside
`DialogService`); it has no effect on where the *caller* resumes. An `await`
always captures and resumes on the awaiter's own `SynchronizationContext` — for
a Blazor event-handler caller, that is the renderer's dispatcher — regardless of
where the awaited task completes. This was verified directly:
`DialogServiceThreadingTests.ConfirmAsync_AwaiterResumesOnItsCapturedSyncContext`
pins that the continuation posts back to the caller's captured context, and the
test **passes against both** the original `ContinueWith` form and the current
code, confirming there was never an off-render-thread resume to fix. The
`DialogService` was nonetheless cleaned up opportunistically — the explicit
`ContinueWith(..., TaskScheduler.Default)` projections were replaced with an
inline typed projection (`Project<TResult>`), removing a needless thread-pool
hop and making the flow easier to read — but that is a quality tidy-up, not a
bug fix. Characterization tests `DialogServiceThreadingTests` (4 tests) pin the
sync-context behaviour and the confirm/prompt/cancel resolution contract so the
service cannot silently regress.
### CentralUI-016 — Pagers render one button per page with no windowing
| | |
|--|--|
| Severity | Low |
| Category | Performance & resource management |
| Status | Resolved |
| Location | `src/ScadaLink.CentralUI/Components/Shared/DataTable.razor:62-68`; `src/ScadaLink.CentralUI/Components/Pages/Deployment/Deployments.razor:167-173` |
**Description**
The `DataTable` and `Deployments` paginators loop `for i = 1..totalPages` and
emit a `<li>` button for every page. With a few thousand records at page size 25
that is hundreds of buttons rendered into the diff on every state change. It is
not a correctness bug but degrades render performance and usability on large
datasets.
**Recommendation**
Window the pager (first / prev / a few around current / next / last) or switch
large lists to a "load more" / numeric jump input.
**Resolution**
Resolved 2026-05-16 (commit pending). Confirmed: both `DataTable` and
`Deployments` looped `for i = 1..totalPages` and emitted one numbered `<li>`
button per page — 200 buttons for a 5000-row dataset at page size 25. Added a
pure `PagerWindow.Build(currentPage, totalPages)` helper
(`Components/Shared/PagerWindow.cs`) that returns a bounded window — always the
first and last page plus a small range around the current page, with a `0`
sentinel marking an elided gap (rendered as a disabled `&hellip;`). Both
paginators now iterate `PagerWindow.Build(...)` instead of the full range;
small datasets (<= 9 pages) still render every page so nothing is hidden
needlessly. Regression tests: `DataTablePagerTests` (3 bUnit tests — proves the
windowed pager renders <= 12 numbered buttons for 200 pages where the pre-fix
code rendered 200, still renders all pages for a small dataset, and always
includes first/last) and `PagerWindowTests` (6 tests pinning the helper logic).
### CentralUI-017 — `/auth/logout` POST disables antiforgery, enabling logout CSRF
| | |
|--|--|
| Severity | Low |
| Category | Security |
| Status | Resolved |
| Location | `src/ScadaLink.CentralUI/Auth/AuthEndpoints.cs:127-138` |
**Description**
The `POST /auth/logout` endpoint calls `.DisableAntiforgery()`, and a plain
`GET /logout` endpoint also signs the user out. Either can be triggered
cross-site (an `<img src="/logout">` or an auto-submitting form) to forcibly log
a user out. Login itself reasonably disables antiforgery (pre-auth), but logout
is a state-changing authenticated action and should be CSRF-protected.
**Recommendation**
Require an antiforgery token on `POST /auth/logout` (the `NavMenu` sign-out form
can include the antiforgery token), and remove or protect the state-changing
`GET /logout` route.
**Resolution**
Resolved 2026-05-16 (commit pending). Confirmed: `POST /auth/logout` called
`.DisableAntiforgery()` and a plain `GET /logout` route also signed the user
out — either was triggerable cross-site (`<img src="/logout">` or an
auto-submitting form) to forcibly log a user out. The `.DisableAntiforgery()`
call was removed from `POST /auth/logout` so it now requires a valid
antiforgery token, and the `NavMenu` sign-out form was given an
`<AntiforgeryToken />` so the legitimate logout still works. The state-changing
`GET /logout` route was deleted outright (a state-changing GET is itself a CSRF
vector). `POST /auth/login` intentionally keeps `.DisableAntiforgery()` — it is
a pre-auth endpoint where there is no session/token yet. Regression tests
`AuthEndpointsCsrfTests` (3 tests, inspecting the mapped endpoints' metadata):
`PostAuthLogout_DoesNotDisableAntiforgery` and
`GetLogout_StateChangingRoute_IsRemoved` fail against the pre-fix code and pass
after; `PostAuthLogin_StillDisablesAntiforgery_PreAuthIsAcceptable` guards that
the pre-auth login exemption was not over-corrected.
### CentralUI-018 — Broad `catch {}` blocks swallow JS interop and storage errors silently
| | |
|--|--|
| Severity | Low |
| Category | Error handling & resilience |
| Status | Resolved |
| Location | `src/ScadaLink.CentralUI/Components/Shared/MonacoEditor.razor:116-118,123,142,164,170,176,182,189`; `src/ScadaLink.CentralUI/Components/Shared/TreeView.razor:129,139`; `src/ScadaLink.CentralUI/Components/Pages/Admin/Sites.razor:316-319` |
**Description**
Numerous `try { ... } catch { }` blocks swallow every exception with no logging.
The prerender-time JS-unavailable case is legitimate, but these catches also
hide real failures: a genuine Monaco init failure, or a clipboard permission
error become invisible. In `TreeView.razor` the storage-restore
`JsonSerializer.Deserialize` (line 139) is not inside a try at all and would
throw uncaught on a corrupt `treeviewStorage` payload. Debugging UI issues in
production is then guesswork.
**Recommendation**
Catch the specific expected exception type (e.g. `JSDisconnectedException`,
`InvalidOperationException` during prerender) and log anything else via
`ILogger`. Wrap the TreeView storage `Deserialize` in its own guarded block.
**Resolution**
Resolved 2026-05-16 (commit pending). Confirmed all three locations.
(1) **TreeView** — the storage-restore `JsonSerializer.Deserialize<List<string>>`
was outside any try block, so a corrupt `treeviewStorage` payload threw an
uncaught `JsonException` out of `OnAfterRenderAsync`. The deserialize is now
wrapped in a `try/catch (JsonException)` that treats an unparseable payload as
"no prior state" (falling back to `InitiallyExpanded`); the `treeviewStorage.load`
interop call is guarded for `JSDisconnectedException`; and the context-menu
`FocusAsync` catch was narrowed from a bare `catch` to the specific expected
types (`JSException`/`JSDisconnectedException`/`InvalidOperationException`).
(2) **MonacoEditor** — every JS interop call had a bare `catch { }`. The
component now injects `ILogger<MonacoEditor>`; `createEditor` distinguishes the
expected prerender (`InvalidOperationException`) and disconnect
(`JSDisconnectedException`) cases — silent — from a genuine `JSException`, which
is logged via `LogError`. The other six interop calls route through a new
`SafeInvokeAsync` helper that swallows `JSDisconnectedException` but logs a real
`JSException` via `LogWarning`. (3) **Sites.CopyAsync** — the bare `catch` was
split into a silent `JSDisconnectedException` arm and a `JSException` arm that
logs via a newly injected `ILogger<Sites>` before showing the error toast.
Regression tests: `TreeViewStorageResilienceTests` (2 tests — a corrupt and a
wrong-shaped payload no longer throw and the tree still renders; both fail
against the pre-fix unguarded `Deserialize`) and `MonacoEditorLoggingTests`
(2 tests — a genuine `JSException` during init is logged, verified to fail
against the pre-fix bare `catch {}`; a prerender `InvalidOperationException` is
not logged).
### CentralUI-019 — Sparse unit-test coverage for a large module; critical paths untested
| | |
|--|--|
| Severity | Low |
| Category | Testing coverage |
| Status | Resolved |
| Location | `tests/ScadaLink.CentralUI.Tests/` |
**Description**
The module has ~65 source files but unit tests cover only the script analyzer,
TreeView, schema model, and two data-connection pages. Untested critical paths
include: the auth bridge (`CookieAuthenticationStateProvider`,
`AuthEndpoints`), `RunInSandboxAsync` (timeout, recursion limit, error
classification, side-effect wiring), `DialogService` resolution semantics,
`DebugView` stream lifecycle and the `UpsertWithCap` cap logic, `Health` and
`Deployments` timer behaviour, and `SchemaBuilderModel` round-tripping of nested
schemas. Given findings CentralUI-001/003/009/010 sit on untested code, the gap
is material. The Playwright suite covers login and navigation only.
**Recommendation**
Add bUnit/unit tests for the auth bridge, sandbox-run behaviour (including
forbidden-API rejection once CentralUI-001 is fixed), dialog resolution, and the
DebugView cap/lifecycle logic. Prioritise the paths named in the Critical/High
findings.
**Resolution**
Resolved 2026-05-16 (commit pending). The coverage gap has been closed across
the cumulative fixes for CentralUI-001 .. 018 — every critical path the finding
named now has tests. Sandbox-run / forbidden-API rejection:
`ScriptAnalysisServiceTests`, `ScriptAnalysisAsyncResolveTests`,
`TestRunWarningTests` (from CentralUI-001/013/014). Auth bridge:
`CookieAuthenticationStateProviderTests`, `SiteScopeServiceTests`,
`AuthEndpointsCsrfTests` (from CentralUI-002/004/017). Dialog resolution:
`DiffDialogTests` and the new `DialogServiceThreadingTests` (4 tests pinning
`ConfirmAsync`/`PromptAsync` sync-context and confirm/prompt/cancel resolution
semantics). DebugView lifecycle: `DebugViewDisposalTests` (from CentralUI-009).
Toast/timer disposal: `ToastNotificationTests` (from CentralUI-010).
This batch also added `BrowserTimeTests`, `MonitoringAuthorizationTests`,
`SitesPageTests`, `DataTablePagerTests` + `PagerWindowTests`,
`TreeViewStorageResilienceTests`, and `MonacoEditorLoggingTests`. The
`tests/ScadaLink.CentralUI.Tests` suite is green at 251 tests. Remaining
untested paths are low-risk render-only pages; the Critical/High/Medium paths
the finding prioritised are all now covered, so the finding is considered
resolved. (Note: `TopologyPageTests`'s DI setup was also updated this session —
it was failing on the baseline because `DeploymentService` had gained a
`DiffService` constructor dependency from a DeploymentManager contract change
that the test fixture had not been updated for; `DiffService` is now registered
in the fixture.)