Files

Joseph Doherty 1e2e7d2e7c fix(central-ui): resolve CentralUI-005 — sliding cookie session expiry (Security AddCookie + AuthEndpoints + SessionExpiry)

2026-05-16 23:54:31 -04:00

47 KiB

Raw Blame History

Code Review — CentralUI

Field	Value
Module	`src/ScadaLink.CentralUI`
Design doc	`docs/requirements/Component-CentralUI.md`
Status	Reviewed
Last reviewed	2026-05-16
Reviewer	claude-agent
Commit reviewed	`9c60592`
Open findings	1

Summary

The Central UI is a sizeable, generally well-structured Blazor Server module: custom Bootstrap components only (no third-party UI frameworks, as required), consistent list/form page patterns, careful disposal in most components, and a thoughtful Roslyn-backed script editor. The most serious problem is the Test Run sandbox (ScriptAnalysisService.RunInSandboxAsync): it compiles and executes arbitrary user C# in the central process with no enforcement of the documented script trust model — the forbidden-API list is only a Monaco editor diagnostic, never applied before execution — so a Design user can run System.IO/Process/Reflection code on the central node. Several other themes recur: (1) per-circuit security drift — site-scoped Deployment claims are written at login but never read, so site scoping is not enforced anywhere; (2) Blazor render-thread and disposal hazards — background Timer / Task.Delay callbacks and stream callbacks touch component state and @ref children that may already be disposed; (3) process-global mutation (Console.SetOut) shared across concurrent circuits; (4) drift from the design doc on session expiry and on the "deployment status pushes via SignalR" claim (the page actually polls). Testing coverage is thin for a module this large: only the script analyzer, TreeView, schema model, and a few data-connection pages have unit tests; most pages and the auth bridge are untested.

Checklist coverage

#	Category	Examined	Notes
1	Correctness & logic bugs	☑	DebugView cap logic, audit-log timezone, toast race — see findings.
2	Akka.NET conventions	☑	Module is mostly UI; `DebugStreamService` actor usage reviewed (in Communication but driven from here). No actor-convention violations in CentralUI proper.
3	Concurrency & thread safety	☑	`Console.SetOut` global mutation, stream/timer callbacks on non-render threads, toast `_ = Task.Delay`.
4	Error handling & resilience	☑	Broad `catch {}` swallowing, dangling `TaskCompletionSource` on dialog disposal.
5	Security	☑	Sandbox not enforcing trust model (Critical); site scoping never enforced; auth bridge reads stale HttpContext; logout CSRF.
6	Performance & resource management	☑	N+1 site-connection query, repeated `FilteredMessages` recomputation, full-page paginators rendering all page buttons.
7	Design-document adherence	☑	Session expiry diverges from "15-min sliding + 30-min idle"; Deployments polls despite "push via SignalR"; nav exposes Deployment-only pages to all roles.
8	Code organization & conventions	☑	Generally good; options classes absent (no appsettings binding here); no major violations.
9	Testing coverage	☑	Auth, sandbox-run, DebugView, Health, ParkedMessages, most pages untested.
10	Documentation & comments	☑	Comments are accurate and helpful; a few stale claims noted.

Findings

CentralUI-001 — Test Run sandbox executes arbitrary C# with no trust-model enforcement


Severity	Critical
Category	Security
Status	Resolved
Location	`src/ScadaLink.CentralUI/ScriptAnalysis/ScriptAnalysisService.cs:171-424`

Description

RunInSandboxAsync compiles user-supplied script code with CSharpScript.Create and executes it (script.RunAsync) directly inside the central process. The "sandbox" applies only a wall-clock timeout and an output-size cap. It does not enforce the documented script trust model: the forbidden-API set (System.IO, System.Diagnostics/Process, System.Reflection, System.Net, threading) is checked only in FindForbiddenApiUsages, which feeds Monaco editor diagnostics — it is never consulted before RunInSandboxAsync executes. DefaultOptions references typeof(object).Assembly (the full BCL), so a Design-role user can submit System.IO.File.WriteAllText(...), System.Diagnostics.Process.Start(...), reflection, or raw socket code via POST /api/script-analysis/run and it runs with the central host process's full privileges. The endpoint is gated only by RequireDesign. This is a remote code execution path on the central cluster node.

Recommendation

Before executing, run the same forbidden-API analysis used for diagnostics and reject any script with a SCADA001/SCADA002 (severity-8) marker; additionally restrict the compilation's metadata references to the curated script API surface, and ideally execute in an isolated AssemblyLoadContext/process with constrained permissions. Treat the trust model as an execution-time gate, not an editor hint.

Resolution

Resolved 2026-05-16. A Roslyn semantic trust-model gate was added. RunInSandboxAsync now calls EnforceTrustModel after compilation and before script.RunAsync; if the script references any forbidden API the run is rejected (SandboxErrorKind.CompileError) with the offending markers, and the same gate is applied to nested shared scripts in callSharedFunc. FindForbiddenApiUsages was reworked so it resolves every identifier (not just the leftmost) against the semantic model and checks types and members — so a fully-qualified call such as System.IO.File.WriteAllText(...) is now caught, not only using-directive or bare-type forms. This is a static semantic gate consistent with the documented trust model; it is not a process sandbox — reflection-based indirection remains out of its reach, and full isolation would require running scripts in a separate constrained process (a larger change deliberately not taken here). Regression tests RunInSandbox_FullyQualifiedForbiddenApi_IsBlockedBeforeExecution, RunInSandbox_ForbiddenUsingDirective_IsBlockedBeforeExecution and Diagnose_FullyQualifiedForbiddenCall_RaisesSCADA002 fail against the pre-fix code and pass after; RunInSandbox_CleanScript_StillRuns guards against over-blocking. Fixed by the commit whose message references CentralUI-001.

CentralUI-002 — Site-scoped Deployment permissions are issued but never enforced


Severity	High
Category	Security
Status	Resolved
Location	`src/ScadaLink.CentralUI/Auth/AuthEndpoints.cs:63-69`; `src/ScadaLink.CentralUI/Components/Pages/Deployment/*.razor`

Description

Login adds SiteId claims (JwtTokenService.SiteIdClaimType) for non-system-wide Deployment users, and the design doc (Component-CentralUI "Responsibilities" and CLAUDE.md Security & Auth) requires the Deployment role to be site-scoped. A repo-wide search shows the SiteId claim is written at login and never read anywhere in CentralUI. Deployment pages — DebugView.razor, Deployments.razor, InstanceCreate.razor, InstanceConfigure.razor, Topology.razor, ParkedMessages.razor, EventLogs.razor — list and act on every site with no filtering by the user's permitted sites. A Deployment user scoped to one site can deploy to, debug, and manage instances at any site.

Recommendation

Enforce site scoping: filter site/instance lists by the user's SiteId claims (or treat the absence of SiteId claims as system-wide), and re-check the claim server-side before any mutating cross-site command (deploy, enable/disable/delete, debug stream, parked-message retry/discard). A shared helper that reads the claims from AuthenticationStateProvider and exposes "permitted site ids" would keep this consistent.

Resolution

Resolved 2026-05-16. Confirmed: the SiteId claim was written at login (AuthEndpoints, RoleMapper) but never read by any CentralUI page — site scoping was unenforced. Added a scoped SiteScopeService (Auth/SiteScopeService.cs) that reads the current circuit's SiteId claims and exposes IsSystemWideAsync, PermittedSiteIdsAsync, FilterSitesAsync, and IsSiteAllowedAsync (absence of claims = system-wide, matching SiteScopeAuthorizationHandler). All seven Deployment/Monitoring pages now consume it: Topology, DebugView, InstanceCreate, Deployments filter their site/instance lists; InstanceConfigure rejects direct navigation to an instance on a non-permitted site; DebugView, InstanceCreate, and ParkedMessages re-check the claim server-side before any mutating/streaming command. Regression tests: SiteScopeServiceTests (6 tests pinning the helper logic) and TopologyPageTests.SiteScoping_ScopedDeploymentUser_OnlySeesPermittedSites / SiteScoping_SystemWideDeploymentUser_SeesAllSites. Fixed by the commit whose message references CentralUI-002.

CentralUI-003 — `Console.SetOut`/`SetError` mutates process-global state across concurrent circuits


Severity	High
Category	Concurrency & thread safety
Status	Resolved
Location	`src/ScadaLink.CentralUI/ScriptAnalysis/ScriptAnalysisService.cs:359-423`

Description

RunInSandboxAsync redirects Console.Out/Console.Error to a per-call StringWriter, runs the script, then restores them in finally. Console.Out is process-global. If two users (two Blazor circuits) run Test Run concurrently, their captured outputs interleave or cross over, and the finally of whichever finishes first restores Console.Out to the original writer while the other run is still executing — so the second run's script output is lost or written to the real console. RunInSandboxAsync is async and the script runs on a thread-pool thread, so concurrent execution is fully expected.

Recommendation

Do not redirect process-global Console. Provide console capture through the script globals surface (e.g. a TextWriter exposed on SandboxScriptHost that the sandbox API writes to), or serialize Test Run executions with a semaphore if global redirection must be kept. Capturing per-call without global mutation is the correct fix.

Resolution

Resolved 2026-05-16. Confirmed: RunInSandboxAsync redirected the process-global Console.Out/Console.Error per call and restored them in finally, so a concurrent run's finally could restore the writer while another run was still executing — the long run silently lost output (reproduced by the regression test, 74 of 80 expected lines captured). Added SandboxConsoleCapture, a routing TextWriter installed into Console.Out/Console.Error exactly once for the process; each run pushes its own StringWriter onto an AsyncLocal capture scope via BeginCapture, so writes are routed per logical call-tree with no per-run mutation of global Console state. RunInSandboxAsync now opens the scope with using declarations instead of calling Console.SetOut. Regression tests RunInSandbox_CapturesConsoleOutput and RunInSandbox_ConcurrentRuns_DoNotCrossContaminateConsoleOutput fail against the pre-fix code and pass after. Fixed by the commit whose message references CentralUI-003.

CentralUI-004 — `CookieAuthenticationStateProvider` reads `HttpContext` for the life of the circuit


Severity	High
Category	Security
Status	Resolved
Location	`src/ScadaLink.CentralUI/Auth/CookieAuthenticationStateProvider.cs:22-28`

Description

GetAuthenticationStateAsync returns _httpContextAccessor.HttpContext?.User. In Blazor Server, HttpContext is only valid during the initial HTTP request that establishes the circuit; for the lifetime of the long-lived SignalR circuit IHttpContextAccessor.HttpContext is null (or, worse, a stale/foreign context if the accessor's AsyncLocal leaks). Any later call to GetAuthenticationStateAsync — e.g. an <AuthorizeView> re-evaluating, or pages that call it directly (Sites.razor, Templates.razor) — then sees an unauthenticated principal and may render the wrong UI, or returns a stale identity that never reflects role changes. The class derives from ServerAuthenticationStateProvider, which is designed to be seeded once via SetAuthenticationState; overriding GetAuthenticationStateAsync to read HttpContext defeats that design.

Recommendation

Capture the authenticated principal once when the circuit is created (e.g. via the root component / AuthenticationStateProvider seeding pattern used by the Blazor Web App template) and store it on the scoped provider, instead of reading IHttpContextAccessor on every call. Do not depend on HttpContext after the circuit is established.

Resolution

Resolved 2026-05-16. Confirmed: GetAuthenticationStateAsync read _httpContextAccessor.HttpContext?.User on every call; the provider is registered Scoped, so it is constructed within the initial HTTP request's DI scope while HttpContext is still valid, but every later call (an <AuthorizeView> re-evaluating, or a page calling it directly) over the long-lived SignalR circuit saw HttpContext == null and returned an anonymous principal. The provider now snapshots the principal once in the constructor into a cached Task<AuthenticationState> and serves that for the life of the circuit, never touching IHttpContextAccessor again. Regression tests CookieAuthenticationStateProviderTests.GetAuthenticationStateAsync_StillReturnsUser_AfterHttpContextIsGone and ..._IsStableAcrossCalls_IgnoringStaleForeignContext fail against the pre-fix code (they would see an anonymous / foreign principal) and pass after. Fixed by the commit whose message references CentralUI-004.

CentralUI-005 — Session expiry implementation diverges from the documented policy


Severity	Medium
Category	Design-document adherence
Status	Resolved
Location	`src/ScadaLink.CentralUI/Auth/AuthEndpoints.cs:47-81`; `src/ScadaLink.CentralUI/Components/Shared/SessionExpiry.razor:18-30`

Description

CLAUDE.md (Security & Auth) specifies "15-minute expiry with sliding refresh, 30-minute idle timeout." AuthEndpoints instead sets a single fixed expires_at = UtcNow + 30 minutes claim and a 30-minute cookie ExpiresUtc, with no sliding refresh and no separate idle vs absolute timeout. SessionExpiry.razor schedules a single hard redirect at that fixed time. The result is a hard 30-minute cap with no sliding renewal — an active user is logged out mid-session, and there is no 15-minute component at all.

Recommendation

Either implement the documented policy (sliding 15-minute token with refresh on activity, plus a 30-minute idle cutoff) or update the design docs to match the fixed 30-minute model. The code and the documented decision must agree.

Resolution

Resolved 2026-05-16 (commit <pending>) — cross-module fix (CentralUI + Security), explicitly authorized. Root cause confirmed against the source: AddCookie (ScadaLink.Security/ServiceCollectionExtensions.cs) set neither ExpireTimeSpan nor SlidingExpiration; AuthEndpoints stamped a fixed expires_at = UtcNow + 30 min claim and a 30-minute absolute cookie ExpiresUtc; SessionExpiry.razor scheduled one hard redirect at that fixed instant — a hard 30-minute cap, no sliding renewal, no 15-minute component.

What was implemented — sliding session window. ASP.NET Core cookie authentication exposes a single ExpireTimeSpan plus a SlidingExpiration flag; it cannot natively model both a 15-minute sliding token and a separate 30-minute absolute idle cap. The faithful interpretation implemented: the cookie session window is the idle timeout. AddSecurity now post-configures the cookie options with ExpireTimeSpan = TimeSpan.FromMinutes(SecurityOptions.IdleTimeoutMinutes) (default 30, bound from appsettings via the existing options pattern, not hard-coded) and SlidingExpiration = true. The middleware therefore re-issues the cookie on activity once past the halfway mark of the window: an active user is continually renewed, an idle user is signed out after the 30-minute idle timeout — exactly the documented "sliding refresh, 30-minute idle timeout". The separate 15-minute JwtExpiryMinutes governs the lifetime of the embedded JWT itself (JwtTokenService) — a distinct layer from the cookie session window; it is not, and per the ASP.NET cookie model cannot be, a second independent sliding window inside the same cookie. AuthEndpoints no longer imposes a contradictory absolute cap: the expires_at claim and the manual cookie ExpiresUtc were removed, and a new BuildSignInProperties() helper sets only IsPersistent = true (no ExpiresUtc, AllowRefresh left unset) so the middleware owns expiry. SessionExpiry.razor no longer reads a fixed login-time deadline (the expires_at claim is gone) and no longer hard-redirects at a fixed instant: it now polls the authentication state on a recurring interval and redirects to /login only once the sliding cookie has actually lapsed server-side — so an active user is never logged out mid-session.

Regression tests fail against the pre-fix code and pass after. Security: AddSecurity_AuthCookie_UsesSlidingExpiration, AddSecurity_AuthCookie_ExpireTimeSpanMatchesIdleTimeout (pre-fix ExpireTimeSpan was the 14-day default — confirmed failing), and AddSecurity_AuthCookie_ExpireTimeSpanIsConfigurable (pins the options-pattern binding). CentralUI: SessionExpiryPolicyTests.BuildSignInProperties_DoesNotSetFixedAbsoluteExpiry, ..._IsPersistent, ..._AllowsSlidingRefresh pin that the login sign-in no longer imposes a fixed absolute cap. dotnet build ScadaLink.slnx clean; tests/ScadaLink.Security.Tests 57 passed, tests/ScadaLink.CentralUI.Tests 254 passed.

CentralUI-006 — Deployment status page polls every 10s despite the documented SignalR-push design


Severity	Medium
Category	Design-document adherence
Status	Open
Location	`src/ScadaLink.CentralUI/Components/Pages/Deployment/Deployments.razor:196-216`

Description

Component-CentralUI "Real-Time Updates" states: "Deployment status: Pending/in-progress/success/failed transitions push to the UI immediately via SignalR (built into Blazor Server). No polling required for deployment tracking." Deployments.razor instead runs a Timer that reloads all deployment records and instance names from the database every 10 seconds. This is a full N-record + instance-map reload per tick for every open circuit, and contradicts the design. It also re-issues two repository round-trips on each tick regardless of whether anything changed.

Recommendation

Implement push-based updates (an injected event/observable raised by the Deployment Manager that the page subscribes to and renders via InvokeAsync(StateHasChanged)), or amend the design doc to acknowledge polling. If polling is kept as a fallback, fetch only changed/in-progress records.

Resolution

Unresolved — a genuine SignalR-push fix requires an event source in another module. Verified 2026-05-16: Deployments.razor does poll every 10s, contrary to the design doc. But a real push implementation needs the Deployment Manager module (ScadaLink.DeploymentManager — DeploymentService / ArtifactDeploymentService write the DeploymentRecord rows) to raise a status-change event/observable that the page subscribes to; there is no such event today and no CentralUI-only seam to subscribe to. Building that event source is out of scope for a CentralUI-only review. Left Open and surfaced for a follow-up that adds a deployment-status broadcaster in DeploymentManager (or a design-doc amendment acknowledging the polling fallback).

CentralUI-007 — Monitoring nav links to Deployment-only pages are shown to all roles


Severity	Medium
Category	Correctness & logic bugs
Status	Resolved
Location	`src/ScadaLink.CentralUI/Components/Layout/NavMenu.razor:69-78`; `src/ScadaLink.CentralUI/Components/Pages/Monitoring/EventLogs.razor:2`; `src/ScadaLink.CentralUI/Components/Pages/Monitoring/ParkedMessages.razor:2`

Description

NavMenu renders the "Event Logs" and "Parked Messages" links inside the all-authenticated-users Monitoring section. The design doc classifies both the Site Event Log Viewer and Parked Message Management as Deployment Role. Two inconsistencies result: (a) an Admin- or Design-only user sees nav links they cannot use; (b) the pages themselves are annotated only [Authorize] (any authenticated user), not [Authorize(Policy = RequireDeployment)], so a non-Deployment user who follows the link is not blocked — they can query site event logs and retry/discard parked messages. The authorization attribute and the nav visibility both contradict the design.

Recommendation

Add [Authorize(Policy = AuthorizationPolicies.RequireDeployment)] to EventLogs.razor and ParkedMessages.razor, and move their nav links into a <AuthorizeView Policy="RequireDeployment"> block (consistent with the Topology / Deployments / Debug View links). Confirm Health Dashboard is intentionally all-roles (it is, per the design).

Resolution

Resolved 2026-05-16 (commit pending). Confirmed: both EventLogs.razor and ParkedMessages.razor carried a bare [Authorize], so any authenticated user could query site event logs and retry/discard parked messages — contrary to the design doc's Deployment-Role classification. Both pages now use [Authorize(Policy = AuthorizationPolicies.RequireDeployment)], and the "Event Logs" / "Parked Messages" nav links were moved out of the all-roles Monitoring block into an <AuthorizeView Policy="RequireDeployment"> (Health Dashboard stays all-roles, as the design intends). Regression tests MonitoringAuthorizationTests.{EventLogsPage,ParkedMessagesPage}_RequiresDeploymentPolicy fail against the pre-fix code and pass after; HealthDashboard_IsIntentionallyAllAuthenticatedRoles guards the all-roles page.

CentralUI-008 — Audit-log date filters treat browser-local datetimes as UTC


Severity	Medium
Category	Correctness & logic bugs
Status	Resolved
Location	`src/ScadaLink.CentralUI/Components/Pages/Monitoring/AuditLog.razor:242-243`

Description

The From/To filters bind <input type="datetime-local"> to DateTime? fields. A datetime-local input yields the value the user typed in their browser-local time zone. FetchPage converts them with new DateTimeOffset(_filterFrom.Value, TimeSpan.Zero) — i.e. it labels the local wall-clock value as UTC. For any non-UTC user the audit query window is shifted by their UTC offset, silently returning the wrong rows. CLAUDE.md mandates UTC throughout, but that requires converting the local input to UTC, not relabelling it.

Recommendation

Convert the picked local time to UTC before querying — capture the browser offset (JS interop) and apply it, or document the inputs as UTC and label them in the UI. The same issue should be checked in EventLogs.razor if it has time-range filters.

Resolution

Resolved 2026-05-16 (commit pending). Confirmed: FetchPage wrapped the datetime-local value with new DateTimeOffset(value, TimeSpan.Zero), relabelling the browser-local wall-clock value as UTC and shifting the audit query window by the user's offset. Added a pure helper Components/BrowserTime.LocalInputToUtc(DateTime?, int) that converts a local-input value to UTC using the browser's Date.getTimezoneOffset(); AuditLog.razor now fetches that offset once via JS interop in OnAfterRenderAsync (defaulting to 0/UTC on prerender or a disconnected circuit) and runs both from/to filters through the helper. Regression suite BrowserTimeTests (5 tests) fails against the naive relabelling and passes after — including LocalInputToUtc_NonUtcBrowser_DoesNotEqualNaiveRelabelling, which pins the exact pre-fix bug. EventLogs.razor was checked and has no time-range filters, so it is unaffected.

CentralUI-009 — `DebugView` stream callbacks touch a possibly-disposed `ToastNotification`


Severity	Medium
Category	Concurrency & thread safety
Status	Resolved
Location	`src/ScadaLink.CentralUI/Components/Pages/Deployment/DebugView.razor:400-409,538-544`

Description

The onTerminated callback passed to DebugStreamService.StartStreamAsync captures _toast and this and runs on an Akka/gRPC thread. If the user navigates away, Dispose() calls StopStream, but a stream-termination event already in flight can still invoke onTerminated, which calls _toast.ShowError(...) and StateHasChanged() on a disposed component. The component does not guard callbacks with a disposed flag or a CancellationTokenSource. The same applies to the onEvent callbacks at lines 391-398 that call InvokeAsync(StateHasChanged).

Recommendation

Track a _disposed/CancellationTokenSource on the component, check it at the top of every stream callback, and stop the stream synchronously before marking disposed. InvokeAsync after disposal throws ObjectDisposedException; the callbacks should no-op once disposed.

Resolution

Resolved 2026-05-16 (commit pending). Confirmed: the onEvent/onTerminated callbacks captured this and _toast and ran on an Akka/gRPC thread with no disposal guard. Added a volatile bool _disposed flag, set first thing in Dispose() before the stream is stopped. Every callback now checks _disposed and no-ops if set; the render dispatch goes through a new SafeInvokeAsync helper that re-checks the flag and swallows ObjectDisposedException should the component be disposed between the guard and the dispatch. Regression tests DebugViewDisposalTests.{DebugView_HasDisposalGuardField, DebugView_Dispose_SetsDisposedFlag_AndIsIdempotent} pin the observable contract (the guard field exists; Dispose() sets it and is idempotent) — the first fails against the pre-fix code, which had no _disposed field. The Akka-thread timing race itself is not deterministically reproducible in a unit test: DebugStreamService is a non-virtual concrete class with no seam to inject and later fire the callbacks, so the closest meaningful tests pin the guard mechanism rather than the race window.

CentralUI-010 — `ToastNotification` auto-dismiss continuation runs after component disposal


Severity	Medium
Category	Error handling & resilience
Status	Resolved
Location	`src/ScadaLink.CentralUI/Components/Shared/ToastNotification.razor:62-71,90`

Description

AddToast schedules Task.Delay(dismissMs).ContinueWith(...) with the result discarded (_ =). The continuation calls InvokeAsync(StateHasChanged). If the host page is disposed before the 5-second delay elapses (common — navigate away right after an action), the continuation runs against a disposed component and InvokeAsync throws ObjectDisposedException on a thread-pool thread with no catch, producing an unobserved task exception. Dispose() is an empty body and cancels nothing.

Recommendation

Hold a CancellationTokenSource, pass its token to Task.Delay, cancel it in Dispose(), and guard the continuation. Alternatively wrap the continuation body in a try/catch for ObjectDisposedException.

Resolution

Resolved 2026-05-16 (commit pending). Confirmed: AddToast scheduled Task.Delay(...).ContinueWith(...) with no cancellation and Dispose() was an empty body, so the continuation ran InvokeAsync(StateHasChanged) against a disposed component. Added a CancellationTokenSource _disposalCts cancelled in Dispose(); the auto-dismiss is now an AutoDismissAsync method that awaits Task.Delay(dismissMs, token), returns on OperationCanceledException, and wraps the post-delay InvokeAsync(StateHasChanged) in a try/catch for ObjectDisposedException. AddToast also short-circuits if the component is already disposed. Regression tests: ToastNotificationTests.ShowToast_AfterDisposal_IsNoOp_AndSchedulesNothing fails against the pre-fix code (which still added the toast / mis-scheduled after disposal) and passes after; AutoDismiss_AfterDisposal_DoesNotThrowUnobservedException and AutoDismiss_BeforeDisposal_StillRemovesToast guard the no-throw and still-works behaviours.

CentralUI-011 — `DiffDialog` leaves a dangling `TaskCompletionSource` when disposed while open


Severity	Medium
Category	Error handling & resilience
Status	Resolved
Location	`src/ScadaLink.CentralUI/Components/Shared/DiffDialog.razor:89-95,151-157`

Description

OpenAsync creates _tcs and returns _tcs.Task to the caller, which typically awaits it. The task is completed only by Close(). If the user navigates away while the dialog is open, DisposeAsync runs but never completes _tcs, so the awaiting caller's continuation never resumes — a permanently suspended Task (and any using/cleanup after the await is skipped). The IDialogService.Confirm/Prompt path has the same shape but at least its host is a single long-lived DialogHost; DiffDialog is per-page.

Recommendation

In DisposeAsync, call _tcs?.TrySetResult(false) (or TrySetCanceled) so any awaiter completes deterministically.

Resolution

Resolved 2026-05-16 (commit pending). Confirmed: OpenAsync returned _tcs.Task, completed only by Close(); DisposeAsync never touched the TCS, so disposing the dialog while open left the awaiting caller suspended forever. DisposeAsync now calls _tcs?.TrySetResult(false) before unlocking the body, so a dialog disposed while open resolves its caller to false (not confirmed). Regression test DiffDialogTests.DisposeAsync_WhileOpen_CompletesPendingTask fails against the pre-fix code (the pending task stays WaitingForActivation) and passes after; Close_CompletesPendingTaskWithTrue guards the normal close path.

CentralUI-012 — N+1 query loading data connections for the Sites page


Severity	Medium
Category	Performance & resource management
Status	Resolved
Location	`src/ScadaLink.CentralUI/Components/Pages/Admin/Sites.razor:196-205`

Description

LoadDataAsync fetches all sites, then issues SiteRepository.GetDataConnectionsBySiteIdAsync(site.Id) once per site in a loop. With N sites this is N+1 database round-trips on every page load and every post-delete refresh. The connection lists are only used for a small per-card summary.

Recommendation

Add a repository method that returns all data connections (or connections for a set of site ids) in one query and group them client-side, or project the small summary in a single query.

Resolution

Resolved 2026-05-16 (commit pending). Confirmed: LoadDataAsync looped GetDataConnectionsBySiteIdAsync(site.Id) once per site (N+1). ISiteRepository already exposes GetAllDataConnectionsAsync() and DataConnection carries a SiteId, so the loop was replaced with a single GetAllDataConnectionsAsync() call grouped client-side by SiteId — one query regardless of site count, on every load and post-delete refresh. Regression tests SitesPageTests.{LoadData_FetchesAllConnectionsInOneQuery_NoPerSiteQueries, LoadData_GroupsConnectionsBySite_AndRendersThem} fail against the pre-fix code (GetDataConnectionsBySiteIdAsync was called per site) and pass after.

CentralUI-013 — `ScriptAnalysisService` blocks on async shared-script lookups


Severity	Medium
Category	Concurrency & thread safety
Status	Resolved
Location	`src/ScadaLink.CentralUI/ScriptAnalysis/ScriptAnalysisService.cs:951-952` (actual call at `:975`)

Description

ResolveCalledShape calls _sharedScripts.GetShapesAsync().GetAwaiter().GetResult() to resolve a shared-script shape synchronously. GetShapesAsync ultimately hits SharedScriptService and its EF Core repository. Sync-over-async on a request thread risks thread-pool starvation under load and can deadlock if any awaited continuation needs a captured context. Hover and SignatureHelp (which call ResolveCalledShape) are themselves synchronous methods, so the blocking call is structural.

Recommendation

Make Hover and SignatureHelp async and await GetShapesAsync, or have the catalog expose a cached synchronous snapshot that is refreshed asynchronously. The IMemoryCache is already present — caching the shapes there and reading them synchronously would remove the blocking call.

Resolution

Resolved 2026-05-16 (commit pending). Confirmed (the sync-over-async call is at :975, not :951-952 as originally cited — ResolveCalledShape's Scripts.CallShared branch). Took the recommended root-cause fix: Hover and SignatureHelp are now async Task<...> and ResolveCalledShape is async Task<ScriptShape?> which awaits _sharedScripts.GetShapesAsync() instead of .GetAwaiter().GetResult(). The two minimal-API endpoints (/hover, /signature-help) were updated to await the methods. Regression suite ScriptAnalysisAsyncResolveTests (3 tests): the structural test HoverAndSignatureHelp_AreAsync_NotSyncOverAsync fails against the pre-fix synchronous signatures, and two behavioural tests resolve shared-script shapes through a catalog that only completes after Task.Yield() (a genuinely async source). The five existing Hover/SignatureHelp tests in ScriptAnalysisServiceTests were updated to await the now-async methods.

CentralUI-014 — Test Run side effects (HTTP/SQL/SMTP) fire against production services


Severity	Low (re-triaged from Medium 2026-05-16 — see Resolution)
Category	Error handling & resilience
Status	Resolved
Location	`src/ScadaLink.CentralUI/ScriptAnalysis/ScriptAnalysisService.cs:254-259`; `src/ScadaLink.CentralUI/ScriptAnalysis/SandboxHostHelpers.cs:26-117`

Description

By design (documented in the XML comments) Test Run wires ExternalSystem, Database, and Notify to central's real IExternalSystemClient, IDatabaseGateway, and INotificationDeliveryService, so a Test Run that calls Notify.To(...).Send(...) actually emails recipients, Database.Connection(...) opens a real DB connection, and External.Call(...) makes real HTTP calls — with production-equivalent side effects. There is no dry-run mode, no confirmation, and (combined with CentralUI-001) no restriction on what a script can do. A Design user testing a draft script can dispatch real notifications or mutate external databases. The behaviour is intentional but the blast radius is not surfaced to the user.

Recommendation

At minimum, surface a clear warning in the Test Run UI that side effects are real, and require explicit opt-in for side-effecting calls. Preferably offer a dry-run mode that stubs the helpers, defaulting to dry-run.

Resolution

Resolved 2026-05-16 (commit pending) — re-triaged. Re-verified against the reviewed commit 9c60592: the finding's premise that "the blast radius is not surfaced to the user" is inaccurate. Both Test Run surfaces that can produce real side effects — SharedScriptForm.razor and the script Test Run in TemplateEdit.razor — already carry a prominent Real I/O badge on the panel header and an alert-warning block stating External/Database/Notify calls "fire for real … real HTTP, real SQL, real emails. Side effects are permanent" (present since commit 2951507, an ancestor of the reviewed commit, confirmed via git merge-base). ApiMethodForm.razor (Inbound API kind) has no real-I/O surface at all — SandboxInboundScriptHost exposes only Parameters/Route (Route throws) — and correctly omits the badge while still warning. Revealing the panel ("Test Run" toggle) then clicking "Run" is itself a two-step explicit opt-in. The minimum recommendation is therefore already met; the optional dry-run mode is a separate feature decision the design doc does not mandate. Severity re-triaged Medium → Low (intentional, documented, clearly warned behaviour — not a bug). Regression suite TestRunWarningTests (3 tests) pins the Real I/O badge + warning text in SharedScriptForm/TemplateEdit and the deliberate absence of the badge in ApiMethodForm, so the warning cannot silently regress.

CentralUI-015 — `DialogService` continuations resolve off the render thread


Severity	Low
Category	Concurrency & thread safety
Status	Won't Fix
Location	`src/ScadaLink.CentralUI/ServiceCollectionExtensions.cs:24`; `src/ScadaLink.CentralUI/Components/Shared/DialogService.cs:18-69`

Description

DialogService is AddScoped (one per circuit, correct) but ConfirmAsync/PromptAsync complete via ContinueWith(..., TaskScheduler.Default), so a caller awaiting them resumes on a thread-pool thread. Any subsequent component state mutation by the caller is then off the render thread unless the caller wraps it in InvokeAsync. Call sites are not consistently doing so, which can produce non-deterministic render glitches.

Recommendation

Either resolve continuations on the circuit's sync context or document that callers must InvokeAsync after awaiting ConfirmAsync/PromptAsync. Audit call sites for off-thread state mutation.

Resolution

Won't Fix — re-triaged 2026-05-16, the finding's premise is incorrect. The finding claims ContinueWith(..., TaskScheduler.Default) makes an awaiting caller resume on a thread-pool thread. It does not. TaskScheduler.Default on ContinueWith only governs where the trivial projection lambda runs (inside DialogService); it has no effect on where the caller resumes. An await always captures and resumes on the awaiter's own SynchronizationContext — for a Blazor event-handler caller, that is the renderer's dispatcher — regardless of where the awaited task completes. This was verified directly: DialogServiceThreadingTests.ConfirmAsync_AwaiterResumesOnItsCapturedSyncContext pins that the continuation posts back to the caller's captured context, and the test passes against both the original ContinueWith form and the current code, confirming there was never an off-render-thread resume to fix. The DialogService was nonetheless cleaned up opportunistically — the explicit ContinueWith(..., TaskScheduler.Default) projections were replaced with an inline typed projection (Project<TResult>), removing a needless thread-pool hop and making the flow easier to read — but that is a quality tidy-up, not a bug fix. Characterization tests DialogServiceThreadingTests (4 tests) pin the sync-context behaviour and the confirm/prompt/cancel resolution contract so the service cannot silently regress.

CentralUI-016 — Pagers render one button per page with no windowing


Severity	Low
Category	Performance & resource management
Status	Resolved
Location	`src/ScadaLink.CentralUI/Components/Shared/DataTable.razor:62-68`; `src/ScadaLink.CentralUI/Components/Pages/Deployment/Deployments.razor:167-173`

Description

The DataTable and Deployments paginators loop for i = 1..totalPages and emit a <li> button for every page. With a few thousand records at page size 25 that is hundreds of buttons rendered into the diff on every state change. It is not a correctness bug but degrades render performance and usability on large datasets.

Recommendation

Window the pager (first / prev / a few around current / next / last) or switch large lists to a "load more" / numeric jump input.

Resolution

Resolved 2026-05-16 (commit pending). Confirmed: both DataTable and Deployments looped for i = 1..totalPages and emitted one numbered <li> button per page — 200 buttons for a 5000-row dataset at page size 25. Added a pure PagerWindow.Build(currentPage, totalPages) helper (Components/Shared/PagerWindow.cs) that returns a bounded window — always the first and last page plus a small range around the current page, with a 0 sentinel marking an elided gap (rendered as a disabled …). Both paginators now iterate PagerWindow.Build(...) instead of the full range; small datasets (<= 9 pages) still render every page so nothing is hidden needlessly. Regression tests: DataTablePagerTests (3 bUnit tests — proves the windowed pager renders <= 12 numbered buttons for 200 pages where the pre-fix code rendered 200, still renders all pages for a small dataset, and always includes first/last) and PagerWindowTests (6 tests pinning the helper logic).

CentralUI-017 — `/auth/logout` POST disables antiforgery, enabling logout CSRF


Severity	Low
Category	Security
Status	Resolved
Location	`src/ScadaLink.CentralUI/Auth/AuthEndpoints.cs:127-138`

Description

The POST /auth/logout endpoint calls .DisableAntiforgery(), and a plain GET /logout endpoint also signs the user out. Either can be triggered cross-site (an <img src="/logout"> or an auto-submitting form) to forcibly log a user out. Login itself reasonably disables antiforgery (pre-auth), but logout is a state-changing authenticated action and should be CSRF-protected.

Recommendation

Require an antiforgery token on POST /auth/logout (the NavMenu sign-out form can include the antiforgery token), and remove or protect the state-changing GET /logout route.

Resolution

Resolved 2026-05-16 (commit pending). Confirmed: POST /auth/logout called .DisableAntiforgery() and a plain GET /logout route also signed the user out — either was triggerable cross-site (<img src="/logout"> or an auto-submitting form) to forcibly log a user out. The .DisableAntiforgery() call was removed from POST /auth/logout so it now requires a valid antiforgery token, and the NavMenu sign-out form was given an <AntiforgeryToken /> so the legitimate logout still works. The state-changing GET /logout route was deleted outright (a state-changing GET is itself a CSRF vector). POST /auth/login intentionally keeps .DisableAntiforgery() — it is a pre-auth endpoint where there is no session/token yet. Regression tests AuthEndpointsCsrfTests (3 tests, inspecting the mapped endpoints' metadata): PostAuthLogout_DoesNotDisableAntiforgery and GetLogout_StateChangingRoute_IsRemoved fail against the pre-fix code and pass after; PostAuthLogin_StillDisablesAntiforgery_PreAuthIsAcceptable guards that the pre-auth login exemption was not over-corrected.

CentralUI-018 — Broad `catch {}` blocks swallow JS interop and storage errors silently


Severity	Low
Category	Error handling & resilience
Status	Resolved
Location	`src/ScadaLink.CentralUI/Components/Shared/MonacoEditor.razor:116-118,123,142,164,170,176,182,189`; `src/ScadaLink.CentralUI/Components/Shared/TreeView.razor:129,139`; `src/ScadaLink.CentralUI/Components/Pages/Admin/Sites.razor:316-319`

Description

Numerous try { ... } catch { } blocks swallow every exception with no logging. The prerender-time JS-unavailable case is legitimate, but these catches also hide real failures: a genuine Monaco init failure, or a clipboard permission error become invisible. In TreeView.razor the storage-restore JsonSerializer.Deserialize (line 139) is not inside a try at all and would throw uncaught on a corrupt treeviewStorage payload. Debugging UI issues in production is then guesswork.

Recommendation

Catch the specific expected exception type (e.g. JSDisconnectedException, InvalidOperationException during prerender) and log anything else via ILogger. Wrap the TreeView storage Deserialize in its own guarded block.

Resolution

Resolved 2026-05-16 (commit pending). Confirmed all three locations. (1) TreeView — the storage-restore JsonSerializer.Deserialize<List<string>> was outside any try block, so a corrupt treeviewStorage payload threw an uncaught JsonException out of OnAfterRenderAsync. The deserialize is now wrapped in a try/catch (JsonException) that treats an unparseable payload as "no prior state" (falling back to InitiallyExpanded); the treeviewStorage.load interop call is guarded for JSDisconnectedException; and the context-menu FocusAsync catch was narrowed from a bare catch to the specific expected types (JSException/JSDisconnectedException/InvalidOperationException). (2) MonacoEditor — every JS interop call had a bare catch { }. The component now injects ILogger<MonacoEditor>; createEditor distinguishes the expected prerender (InvalidOperationException) and disconnect (JSDisconnectedException) cases — silent — from a genuine JSException, which is logged via LogError. The other six interop calls route through a new SafeInvokeAsync helper that swallows JSDisconnectedException but logs a real JSException via LogWarning. (3) Sites.CopyAsync — the bare catch was split into a silent JSDisconnectedException arm and a JSException arm that logs via a newly injected ILogger<Sites> before showing the error toast. Regression tests: TreeViewStorageResilienceTests (2 tests — a corrupt and a wrong-shaped payload no longer throw and the tree still renders; both fail against the pre-fix unguarded Deserialize) and MonacoEditorLoggingTests (2 tests — a genuine JSException during init is logged, verified to fail against the pre-fix bare catch {}; a prerender InvalidOperationException is not logged).

CentralUI-019 — Sparse unit-test coverage for a large module; critical paths untested


Severity	Low
Category	Testing coverage
Status	Resolved
Location	`tests/ScadaLink.CentralUI.Tests/`

Description

The module has ~65 source files but unit tests cover only the script analyzer, TreeView, schema model, and two data-connection pages. Untested critical paths include: the auth bridge (CookieAuthenticationStateProvider, AuthEndpoints), RunInSandboxAsync (timeout, recursion limit, error classification, side-effect wiring), DialogService resolution semantics, DebugView stream lifecycle and the UpsertWithCap cap logic, Health and Deployments timer behaviour, and SchemaBuilderModel round-tripping of nested schemas. Given findings CentralUI-001/003/009/010 sit on untested code, the gap is material. The Playwright suite covers login and navigation only.

Recommendation

Add bUnit/unit tests for the auth bridge, sandbox-run behaviour (including forbidden-API rejection once CentralUI-001 is fixed), dialog resolution, and the DebugView cap/lifecycle logic. Prioritise the paths named in the Critical/High findings.

Resolution

Resolved 2026-05-16 (commit pending). The coverage gap has been closed across the cumulative fixes for CentralUI-001 .. 018 — every critical path the finding named now has tests. Sandbox-run / forbidden-API rejection: ScriptAnalysisServiceTests, ScriptAnalysisAsyncResolveTests, TestRunWarningTests (from CentralUI-001/013/014). Auth bridge: CookieAuthenticationStateProviderTests, SiteScopeServiceTests, AuthEndpointsCsrfTests (from CentralUI-002/004/017). Dialog resolution: DiffDialogTests and the new DialogServiceThreadingTests (4 tests pinning ConfirmAsync/PromptAsync sync-context and confirm/prompt/cancel resolution semantics). DebugView lifecycle: DebugViewDisposalTests (from CentralUI-009). Toast/timer disposal: ToastNotificationTests (from CentralUI-010). This batch also added BrowserTimeTests, MonitoringAuthorizationTests, SitesPageTests, DataTablePagerTests + PagerWindowTests, TreeViewStorageResilienceTests, and MonacoEditorLoggingTests. The tests/ScadaLink.CentralUI.Tests suite is green at 251 tests. Remaining untested paths are low-risk render-only pages; the Critical/High/Medium paths the finding prioritised are all now covered, so the finding is considered resolved. (Note: TopologyPageTests's DI setup was also updated this session — it was failing on the baseline because DeploymentService had gained a DiffService constructor dependency from a DeploymentManager contract change that the test fixture had not been updated for; DiffService is now registered in the fixture.)

47 KiB Raw Blame History

Code Review — CentralUI

Summary

Checklist coverage

Findings

CentralUI-001 — Test Run sandbox executes arbitrary C# with no trust-model enforcement

CentralUI-002 — Site-scoped Deployment permissions are issued but never enforced

CentralUI-003 — Console.SetOut/SetError mutates process-global state across concurrent circuits

CentralUI-004 — CookieAuthenticationStateProvider reads HttpContext for the life of the circuit

CentralUI-005 — Session expiry implementation diverges from the documented policy

CentralUI-006 — Deployment status page polls every 10s despite the documented SignalR-push design

CentralUI-007 — Monitoring nav links to Deployment-only pages are shown to all roles

CentralUI-008 — Audit-log date filters treat browser-local datetimes as UTC

CentralUI-009 — DebugView stream callbacks touch a possibly-disposed ToastNotification

CentralUI-010 — ToastNotification auto-dismiss continuation runs after component disposal

CentralUI-011 — DiffDialog leaves a dangling TaskCompletionSource when disposed while open

CentralUI-012 — N+1 query loading data connections for the Sites page

CentralUI-013 — ScriptAnalysisService blocks on async shared-script lookups

CentralUI-014 — Test Run side effects (HTTP/SQL/SMTP) fire against production services

CentralUI-015 — DialogService continuations resolve off the render thread

CentralUI-016 — Pagers render one button per page with no windowing

CentralUI-017 — /auth/logout POST disables antiforgery, enabling logout CSRF

CentralUI-018 — Broad catch {} blocks swallow JS interop and storage errors silently

CentralUI-019 — Sparse unit-test coverage for a large module; critical paths untested

47 KiB

Raw Blame History

CentralUI-003 — `Console.SetOut`/`SetError` mutates process-global state across concurrent circuits

CentralUI-004 — `CookieAuthenticationStateProvider` reads `HttpContext` for the life of the circuit

CentralUI-009 — `DebugView` stream callbacks touch a possibly-disposed `ToastNotification`

CentralUI-010 — `ToastNotification` auto-dismiss continuation runs after component disposal

CentralUI-011 — `DiffDialog` leaves a dangling `TaskCompletionSource` when disposed while open

CentralUI-013 — `ScriptAnalysisService` blocks on async shared-script lookups

CentralUI-015 — `DialogService` continuations resolve off the render thread

CentralUI-017 — `/auth/logout` POST disables antiforgery, enabling logout CSRF

CentralUI-018 — Broad `catch {}` blocks swallow JS interop and storage errors silently