47 KiB
Code Review — CentralUI
| Field | Value |
|---|---|
| Module | src/ScadaLink.CentralUI |
| Design doc | docs/requirements/Component-CentralUI.md |
| Status | Reviewed |
| Last reviewed | 2026-05-16 |
| Reviewer | claude-agent |
| Commit reviewed | 9c60592 |
| Open findings | 1 |
Summary
The Central UI is a sizeable, generally well-structured Blazor Server module:
custom Bootstrap components only (no third-party UI frameworks, as required),
consistent list/form page patterns, careful disposal in most components, and a
thoughtful Roslyn-backed script editor. The most serious problem is the
Test Run sandbox (ScriptAnalysisService.RunInSandboxAsync): it compiles
and executes arbitrary user C# in the central process with no enforcement of
the documented script trust model — the forbidden-API list is only a Monaco
editor diagnostic, never applied before execution — so a Design user can run
System.IO/Process/Reflection code on the central node. Several other
themes recur: (1) per-circuit security drift — site-scoped Deployment claims
are written at login but never read, so site scoping is not enforced anywhere;
(2) Blazor render-thread and disposal hazards — background Timer / Task.Delay
callbacks and stream callbacks touch component state and @ref children that
may already be disposed; (3) process-global mutation (Console.SetOut) shared
across concurrent circuits; (4) drift from the design doc on session expiry and
on the "deployment status pushes via SignalR" claim (the page actually polls).
Testing coverage is thin for a module this large: only the script analyzer,
TreeView, schema model, and a few data-connection pages have unit tests; most
pages and the auth bridge are untested.
Checklist coverage
| # | Category | Examined | Notes |
|---|---|---|---|
| 1 | Correctness & logic bugs | ☑ | DebugView cap logic, audit-log timezone, toast race — see findings. |
| 2 | Akka.NET conventions | ☑ | Module is mostly UI; DebugStreamService actor usage reviewed (in Communication but driven from here). No actor-convention violations in CentralUI proper. |
| 3 | Concurrency & thread safety | ☑ | Console.SetOut global mutation, stream/timer callbacks on non-render threads, toast _ = Task.Delay. |
| 4 | Error handling & resilience | ☑ | Broad catch {} swallowing, dangling TaskCompletionSource on dialog disposal. |
| 5 | Security | ☑ | Sandbox not enforcing trust model (Critical); site scoping never enforced; auth bridge reads stale HttpContext; logout CSRF. |
| 6 | Performance & resource management | ☑ | N+1 site-connection query, repeated FilteredMessages recomputation, full-page paginators rendering all page buttons. |
| 7 | Design-document adherence | ☑ | Session expiry diverges from "15-min sliding + 30-min idle"; Deployments polls despite "push via SignalR"; nav exposes Deployment-only pages to all roles. |
| 8 | Code organization & conventions | ☑ | Generally good; options classes absent (no appsettings binding here); no major violations. |
| 9 | Testing coverage | ☑ | Auth, sandbox-run, DebugView, Health, ParkedMessages, most pages untested. |
| 10 | Documentation & comments | ☑ | Comments are accurate and helpful; a few stale claims noted. |
Findings
CentralUI-001 — Test Run sandbox executes arbitrary C# with no trust-model enforcement
| Severity | Critical |
| Category | Security |
| Status | Resolved |
| Location | src/ScadaLink.CentralUI/ScriptAnalysis/ScriptAnalysisService.cs:171-424 |
Description
RunInSandboxAsync compiles user-supplied script code with CSharpScript.Create
and executes it (script.RunAsync) directly inside the central process. The
"sandbox" applies only a wall-clock timeout and an output-size cap. It does
not enforce the documented script trust model: the forbidden-API set
(System.IO, System.Diagnostics/Process, System.Reflection, System.Net,
threading) is checked only in FindForbiddenApiUsages, which feeds Monaco
editor diagnostics — it is never consulted before RunInSandboxAsync executes.
DefaultOptions references typeof(object).Assembly (the full BCL), so a
Design-role user can submit System.IO.File.WriteAllText(...),
System.Diagnostics.Process.Start(...), reflection, or raw socket code via
POST /api/script-analysis/run and it runs with the central host process's
full privileges. The endpoint is gated only by RequireDesign. This is a
remote code execution path on the central cluster node.
Recommendation
Before executing, run the same forbidden-API analysis used for diagnostics and
reject any script with a SCADA001/SCADA002 (severity-8) marker; additionally
restrict the compilation's metadata references to the curated script API
surface, and ideally execute in an isolated AssemblyLoadContext/process with
constrained permissions. Treat the trust model as an execution-time gate, not
an editor hint.
Resolution
Resolved 2026-05-16. A Roslyn semantic trust-model gate was added. RunInSandboxAsync
now calls EnforceTrustModel after compilation and before script.RunAsync; if the
script references any forbidden API the run is rejected (SandboxErrorKind.CompileError)
with the offending markers, and the same gate is applied to nested shared scripts in
callSharedFunc. FindForbiddenApiUsages was reworked so it resolves every identifier
(not just the leftmost) against the semantic model and checks types and members —
so a fully-qualified call such as System.IO.File.WriteAllText(...) is now caught, not
only using-directive or bare-type forms. This is a static semantic gate consistent
with the documented trust model; it is not a process sandbox — reflection-based
indirection remains out of its reach, and full isolation would require running scripts
in a separate constrained process (a larger change deliberately not taken here).
Regression tests RunInSandbox_FullyQualifiedForbiddenApi_IsBlockedBeforeExecution,
RunInSandbox_ForbiddenUsingDirective_IsBlockedBeforeExecution and
Diagnose_FullyQualifiedForbiddenCall_RaisesSCADA002 fail against the pre-fix code and
pass after; RunInSandbox_CleanScript_StillRuns guards against over-blocking. Fixed by
the commit whose message references CentralUI-001.
CentralUI-002 — Site-scoped Deployment permissions are issued but never enforced
| Severity | High |
| Category | Security |
| Status | Resolved |
| Location | src/ScadaLink.CentralUI/Auth/AuthEndpoints.cs:63-69; src/ScadaLink.CentralUI/Components/Pages/Deployment/*.razor |
Description
Login adds SiteId claims (JwtTokenService.SiteIdClaimType) for non-system-wide
Deployment users, and the design doc (Component-CentralUI "Responsibilities" and
CLAUDE.md Security & Auth) requires the Deployment role to be site-scoped. A
repo-wide search shows the SiteId claim is written at login and never read
anywhere in CentralUI. Deployment pages — DebugView.razor, Deployments.razor,
InstanceCreate.razor, InstanceConfigure.razor, Topology.razor,
ParkedMessages.razor, EventLogs.razor — list and act on every site with no
filtering by the user's permitted sites. A Deployment user scoped to one site
can deploy to, debug, and manage instances at any site.
Recommendation
Enforce site scoping: filter site/instance lists by the user's SiteId claims
(or treat the absence of SiteId claims as system-wide), and re-check the claim
server-side before any mutating cross-site command (deploy, enable/disable/delete,
debug stream, parked-message retry/discard). A shared helper that reads the
claims from AuthenticationStateProvider and exposes "permitted site ids" would
keep this consistent.
Resolution
Resolved 2026-05-16. Confirmed: the SiteId claim was written at login
(AuthEndpoints, RoleMapper) but never read by any CentralUI page — site
scoping was unenforced. Added a scoped SiteScopeService (Auth/SiteScopeService.cs)
that reads the current circuit's SiteId claims and exposes IsSystemWideAsync,
PermittedSiteIdsAsync, FilterSitesAsync, and IsSiteAllowedAsync (absence of
claims = system-wide, matching SiteScopeAuthorizationHandler). All seven
Deployment/Monitoring pages now consume it: Topology, DebugView,
InstanceCreate, Deployments filter their site/instance lists; InstanceConfigure
rejects direct navigation to an instance on a non-permitted site; DebugView,
InstanceCreate, and ParkedMessages re-check the claim server-side before any
mutating/streaming command. Regression tests: SiteScopeServiceTests (6 tests
pinning the helper logic) and TopologyPageTests.SiteScoping_ScopedDeploymentUser_OnlySeesPermittedSites
/ SiteScoping_SystemWideDeploymentUser_SeesAllSites. Fixed by the commit whose
message references CentralUI-002.
CentralUI-003 — Console.SetOut/SetError mutates process-global state across concurrent circuits
| Severity | High |
| Category | Concurrency & thread safety |
| Status | Resolved |
| Location | src/ScadaLink.CentralUI/ScriptAnalysis/ScriptAnalysisService.cs:359-423 |
Description
RunInSandboxAsync redirects Console.Out/Console.Error to a per-call
StringWriter, runs the script, then restores them in finally. Console.Out
is process-global. If two users (two Blazor circuits) run Test Run concurrently,
their captured outputs interleave or cross over, and the finally of whichever
finishes first restores Console.Out to the original writer while the other
run is still executing — so the second run's script output is lost or written
to the real console. RunInSandboxAsync is async and the script runs on a
thread-pool thread, so concurrent execution is fully expected.
Recommendation
Do not redirect process-global Console. Provide console capture through the
script globals surface (e.g. a TextWriter exposed on SandboxScriptHost that
the sandbox API writes to), or serialize Test Run executions with a semaphore if
global redirection must be kept. Capturing per-call without global mutation is
the correct fix.
Resolution
Resolved 2026-05-16. Confirmed: RunInSandboxAsync redirected the process-global
Console.Out/Console.Error per call and restored them in finally, so a
concurrent run's finally could restore the writer while another run was still
executing — the long run silently lost output (reproduced by the regression
test, 74 of 80 expected lines captured). Added SandboxConsoleCapture, a routing
TextWriter installed into Console.Out/Console.Error exactly once for the
process; each run pushes its own StringWriter onto an AsyncLocal capture
scope via BeginCapture, so writes are routed per logical call-tree with no
per-run mutation of global Console state. RunInSandboxAsync now opens the
scope with using declarations instead of calling Console.SetOut. Regression
tests RunInSandbox_CapturesConsoleOutput and
RunInSandbox_ConcurrentRuns_DoNotCrossContaminateConsoleOutput fail against the
pre-fix code and pass after. Fixed by the commit whose message references
CentralUI-003.
CentralUI-004 — CookieAuthenticationStateProvider reads HttpContext for the life of the circuit
| Severity | High |
| Category | Security |
| Status | Resolved |
| Location | src/ScadaLink.CentralUI/Auth/CookieAuthenticationStateProvider.cs:22-28 |
Description
GetAuthenticationStateAsync returns _httpContextAccessor.HttpContext?.User.
In Blazor Server, HttpContext is only valid during the initial HTTP request
that establishes the circuit; for the lifetime of the long-lived SignalR circuit
IHttpContextAccessor.HttpContext is null (or, worse, a stale/foreign context
if the accessor's AsyncLocal leaks). Any later call to
GetAuthenticationStateAsync — e.g. an <AuthorizeView> re-evaluating, or pages
that call it directly (Sites.razor, Templates.razor) — then sees an
unauthenticated principal and may render the wrong UI, or returns a stale
identity that never reflects role changes. The class derives from
ServerAuthenticationStateProvider, which is designed to be seeded once via
SetAuthenticationState; overriding GetAuthenticationStateAsync to read
HttpContext defeats that design.
Recommendation
Capture the authenticated principal once when the circuit is created (e.g. via
the root component / AuthenticationStateProvider seeding pattern used by the
Blazor Web App template) and store it on the scoped provider, instead of reading
IHttpContextAccessor on every call. Do not depend on HttpContext after the
circuit is established.
Resolution
Resolved 2026-05-16. Confirmed: GetAuthenticationStateAsync read
_httpContextAccessor.HttpContext?.User on every call; the provider is
registered Scoped, so it is constructed within the initial HTTP request's DI
scope while HttpContext is still valid, but every later call (an
<AuthorizeView> re-evaluating, or a page calling it directly) over the
long-lived SignalR circuit saw HttpContext == null and returned an anonymous
principal. The provider now snapshots the principal once in the constructor into
a cached Task<AuthenticationState> and serves that for the life of the
circuit, never touching IHttpContextAccessor again. Regression tests
CookieAuthenticationStateProviderTests.GetAuthenticationStateAsync_StillReturnsUser_AfterHttpContextIsGone
and ..._IsStableAcrossCalls_IgnoringStaleForeignContext fail against the
pre-fix code (they would see an anonymous / foreign principal) and pass after.
Fixed by the commit whose message references CentralUI-004.
CentralUI-005 — Session expiry implementation diverges from the documented policy
| Severity | Medium |
| Category | Design-document adherence |
| Status | Resolved |
| Location | src/ScadaLink.CentralUI/Auth/AuthEndpoints.cs:47-81; src/ScadaLink.CentralUI/Components/Shared/SessionExpiry.razor:18-30 |
Description
CLAUDE.md (Security & Auth) specifies "15-minute expiry with sliding refresh,
30-minute idle timeout." AuthEndpoints instead sets a single fixed
expires_at = UtcNow + 30 minutes claim and a 30-minute cookie ExpiresUtc,
with no sliding refresh and no separate idle vs absolute timeout.
SessionExpiry.razor schedules a single hard redirect at that fixed time. The
result is a hard 30-minute cap with no sliding renewal — an active user is
logged out mid-session, and there is no 15-minute component at all.
Recommendation
Either implement the documented policy (sliding 15-minute token with refresh on activity, plus a 30-minute idle cutoff) or update the design docs to match the fixed 30-minute model. The code and the documented decision must agree.
Resolution
Resolved 2026-05-16 (commit <pending>) — cross-module fix (CentralUI +
Security), explicitly authorized. Root cause confirmed against the source:
AddCookie (ScadaLink.Security/ServiceCollectionExtensions.cs) set neither
ExpireTimeSpan nor SlidingExpiration; AuthEndpoints stamped a fixed
expires_at = UtcNow + 30 min claim and a 30-minute absolute cookie
ExpiresUtc; SessionExpiry.razor scheduled one hard redirect at that fixed
instant — a hard 30-minute cap, no sliding renewal, no 15-minute component.
What was implemented — sliding session window. ASP.NET Core cookie
authentication exposes a single ExpireTimeSpan plus a SlidingExpiration
flag; it cannot natively model both a 15-minute sliding token and a separate
30-minute absolute idle cap. The faithful interpretation implemented: the cookie
session window is the idle timeout. AddSecurity now post-configures the
cookie options with ExpireTimeSpan = TimeSpan.FromMinutes(SecurityOptions.IdleTimeoutMinutes)
(default 30, bound from appsettings via the existing options pattern, not
hard-coded) and SlidingExpiration = true. The middleware therefore re-issues
the cookie on activity once past the halfway mark of the window: an active user
is continually renewed, an idle user is signed out after the 30-minute idle
timeout — exactly the documented "sliding refresh, 30-minute idle timeout". The
separate 15-minute JwtExpiryMinutes governs the lifetime of the embedded JWT
itself (JwtTokenService) — a distinct layer from the cookie session window;
it is not, and per the ASP.NET cookie model cannot be, a second independent
sliding window inside the same cookie. AuthEndpoints no longer imposes a
contradictory absolute cap: the expires_at claim and the manual cookie
ExpiresUtc were removed, and a new BuildSignInProperties() helper sets only
IsPersistent = true (no ExpiresUtc, AllowRefresh left unset) so the
middleware owns expiry. SessionExpiry.razor no longer reads a fixed
login-time deadline (the expires_at claim is gone) and no longer hard-redirects
at a fixed instant: it now polls the authentication state on a recurring
interval and redirects to /login only once the sliding cookie has actually
lapsed server-side — so an active user is never logged out mid-session.
Regression tests fail against the pre-fix code and pass after. Security:
AddSecurity_AuthCookie_UsesSlidingExpiration,
AddSecurity_AuthCookie_ExpireTimeSpanMatchesIdleTimeout (pre-fix
ExpireTimeSpan was the 14-day default — confirmed failing), and
AddSecurity_AuthCookie_ExpireTimeSpanIsConfigurable (pins the options-pattern
binding). CentralUI: SessionExpiryPolicyTests.BuildSignInProperties_DoesNotSetFixedAbsoluteExpiry,
..._IsPersistent, ..._AllowsSlidingRefresh pin that the login sign-in no
longer imposes a fixed absolute cap. dotnet build ScadaLink.slnx clean;
tests/ScadaLink.Security.Tests 57 passed, tests/ScadaLink.CentralUI.Tests
254 passed.
CentralUI-006 — Deployment status page polls every 10s despite the documented SignalR-push design
| Severity | Medium |
| Category | Design-document adherence |
| Status | Open |
| Location | src/ScadaLink.CentralUI/Components/Pages/Deployment/Deployments.razor:196-216 |
Description
Component-CentralUI "Real-Time Updates" states: "Deployment status:
Pending/in-progress/success/failed transitions push to the UI immediately via
SignalR (built into Blazor Server). No polling required for deployment
tracking." Deployments.razor instead runs a Timer that reloads all
deployment records and instance names from the database every 10 seconds. This
is a full N-record + instance-map reload per tick for every open circuit, and
contradicts the design. It also re-issues two repository round-trips on each
tick regardless of whether anything changed.
Recommendation
Implement push-based updates (an injected event/observable raised by the
Deployment Manager that the page subscribes to and renders via
InvokeAsync(StateHasChanged)), or amend the design doc to acknowledge polling.
If polling is kept as a fallback, fetch only changed/in-progress records.
Resolution
Unresolved — a genuine SignalR-push fix requires an event source in another
module. Verified 2026-05-16: Deployments.razor does poll every 10s, contrary
to the design doc. But a real push implementation needs the Deployment
Manager module (ScadaLink.DeploymentManager — DeploymentService /
ArtifactDeploymentService write the DeploymentRecord rows) to raise a
status-change event/observable that the page subscribes to; there is no such
event today and no CentralUI-only seam to subscribe to. Building that event
source is out of scope for a CentralUI-only review. Left Open and surfaced for a
follow-up that adds a deployment-status broadcaster in DeploymentManager (or a
design-doc amendment acknowledging the polling fallback).
CentralUI-007 — Monitoring nav links to Deployment-only pages are shown to all roles
| Severity | Medium |
| Category | Correctness & logic bugs |
| Status | Resolved |
| Location | src/ScadaLink.CentralUI/Components/Layout/NavMenu.razor:69-78; src/ScadaLink.CentralUI/Components/Pages/Monitoring/EventLogs.razor:2; src/ScadaLink.CentralUI/Components/Pages/Monitoring/ParkedMessages.razor:2 |
Description
NavMenu renders the "Event Logs" and "Parked Messages" links inside the
all-authenticated-users Monitoring section. The design doc classifies both the
Site Event Log Viewer and Parked Message Management as Deployment Role.
Two inconsistencies result: (a) an Admin- or Design-only user sees nav links
they cannot use; (b) the pages themselves are annotated only [Authorize]
(any authenticated user), not [Authorize(Policy = RequireDeployment)], so a
non-Deployment user who follows the link is not blocked — they can query site
event logs and retry/discard parked messages. The authorization attribute and
the nav visibility both contradict the design.
Recommendation
Add [Authorize(Policy = AuthorizationPolicies.RequireDeployment)] to
EventLogs.razor and ParkedMessages.razor, and move their nav links into a
<AuthorizeView Policy="RequireDeployment"> block (consistent with the Topology
/ Deployments / Debug View links). Confirm Health Dashboard is intentionally
all-roles (it is, per the design).
Resolution
Resolved 2026-05-16 (commit pending). Confirmed: both EventLogs.razor and
ParkedMessages.razor carried a bare [Authorize], so any authenticated user
could query site event logs and retry/discard parked messages — contrary to the
design doc's Deployment-Role classification. Both pages now use
[Authorize(Policy = AuthorizationPolicies.RequireDeployment)], and the
"Event Logs" / "Parked Messages" nav links were moved out of the all-roles
Monitoring block into an <AuthorizeView Policy="RequireDeployment"> (Health
Dashboard stays all-roles, as the design intends). Regression tests
MonitoringAuthorizationTests.{EventLogsPage,ParkedMessagesPage}_RequiresDeploymentPolicy
fail against the pre-fix code and pass after;
HealthDashboard_IsIntentionallyAllAuthenticatedRoles guards the all-roles page.
CentralUI-008 — Audit-log date filters treat browser-local datetimes as UTC
| Severity | Medium |
| Category | Correctness & logic bugs |
| Status | Resolved |
| Location | src/ScadaLink.CentralUI/Components/Pages/Monitoring/AuditLog.razor:242-243 |
Description
The From/To filters bind <input type="datetime-local"> to DateTime?
fields. A datetime-local input yields the value the user typed in their
browser-local time zone. FetchPage converts them with
new DateTimeOffset(_filterFrom.Value, TimeSpan.Zero) — i.e. it labels the
local wall-clock value as UTC. For any non-UTC user the audit query window is
shifted by their UTC offset, silently returning the wrong rows. CLAUDE.md
mandates UTC throughout, but that requires converting the local input to UTC,
not relabelling it.
Recommendation
Convert the picked local time to UTC before querying — capture the browser
offset (JS interop) and apply it, or document the inputs as UTC and label them
in the UI. The same issue should be checked in EventLogs.razor if it has
time-range filters.
Resolution
Resolved 2026-05-16 (commit pending). Confirmed: FetchPage wrapped the
datetime-local value with new DateTimeOffset(value, TimeSpan.Zero),
relabelling the browser-local wall-clock value as UTC and shifting the audit
query window by the user's offset. Added a pure helper
Components/BrowserTime.LocalInputToUtc(DateTime?, int) that converts a
local-input value to UTC using the browser's Date.getTimezoneOffset();
AuditLog.razor now fetches that offset once via JS interop in
OnAfterRenderAsync (defaulting to 0/UTC on prerender or a disconnected
circuit) and runs both from/to filters through the helper. Regression suite
BrowserTimeTests (5 tests) fails against the naive relabelling and passes
after — including LocalInputToUtc_NonUtcBrowser_DoesNotEqualNaiveRelabelling,
which pins the exact pre-fix bug. EventLogs.razor was checked and has no
time-range filters, so it is unaffected.
CentralUI-009 — DebugView stream callbacks touch a possibly-disposed ToastNotification
| Severity | Medium |
| Category | Concurrency & thread safety |
| Status | Resolved |
| Location | src/ScadaLink.CentralUI/Components/Pages/Deployment/DebugView.razor:400-409,538-544 |
Description
The onTerminated callback passed to DebugStreamService.StartStreamAsync
captures _toast and this and runs on an Akka/gRPC thread. If the user
navigates away, Dispose() calls StopStream, but a stream-termination event
already in flight can still invoke onTerminated, which calls
_toast.ShowError(...) and StateHasChanged() on a disposed component. The
component does not guard callbacks with a disposed flag or a
CancellationTokenSource. The same applies to the onEvent callbacks at
lines 391-398 that call InvokeAsync(StateHasChanged).
Recommendation
Track a _disposed/CancellationTokenSource on the component, check it at the
top of every stream callback, and stop the stream synchronously before marking
disposed. InvokeAsync after disposal throws ObjectDisposedException; the
callbacks should no-op once disposed.
Resolution
Resolved 2026-05-16 (commit pending). Confirmed: the onEvent/onTerminated
callbacks captured this and _toast and ran on an Akka/gRPC thread with no
disposal guard. Added a volatile bool _disposed flag, set first thing in
Dispose() before the stream is stopped. Every callback now checks _disposed
and no-ops if set; the render dispatch goes through a new SafeInvokeAsync
helper that re-checks the flag and swallows ObjectDisposedException should the
component be disposed between the guard and the dispatch. Regression tests
DebugViewDisposalTests.{DebugView_HasDisposalGuardField, DebugView_Dispose_SetsDisposedFlag_AndIsIdempotent} pin the observable contract
(the guard field exists; Dispose() sets it and is idempotent) — the first
fails against the pre-fix code, which had no _disposed field. The Akka-thread
timing race itself is not deterministically reproducible in a unit test:
DebugStreamService is a non-virtual concrete class with no seam to inject and
later fire the callbacks, so the closest meaningful tests pin the guard
mechanism rather than the race window.
CentralUI-010 — ToastNotification auto-dismiss continuation runs after component disposal
| Severity | Medium |
| Category | Error handling & resilience |
| Status | Resolved |
| Location | src/ScadaLink.CentralUI/Components/Shared/ToastNotification.razor:62-71,90 |
Description
AddToast schedules Task.Delay(dismissMs).ContinueWith(...) with the result
discarded (_ =). The continuation calls InvokeAsync(StateHasChanged). If the
host page is disposed before the 5-second delay elapses (common — navigate away
right after an action), the continuation runs against a disposed component and
InvokeAsync throws ObjectDisposedException on a thread-pool thread with no
catch, producing an unobserved task exception. Dispose() is an empty body and
cancels nothing.
Recommendation
Hold a CancellationTokenSource, pass its token to Task.Delay, cancel it in
Dispose(), and guard the continuation. Alternatively wrap the continuation
body in a try/catch for ObjectDisposedException.
Resolution
Resolved 2026-05-16 (commit pending). Confirmed: AddToast scheduled
Task.Delay(...).ContinueWith(...) with no cancellation and Dispose() was an
empty body, so the continuation ran InvokeAsync(StateHasChanged) against a
disposed component. Added a CancellationTokenSource _disposalCts cancelled in
Dispose(); the auto-dismiss is now an AutoDismissAsync method that awaits
Task.Delay(dismissMs, token), returns on OperationCanceledException, and
wraps the post-delay InvokeAsync(StateHasChanged) in a try/catch for
ObjectDisposedException. AddToast also short-circuits if the component is
already disposed. Regression tests:
ToastNotificationTests.ShowToast_AfterDisposal_IsNoOp_AndSchedulesNothing
fails against the pre-fix code (which still added the toast / mis-scheduled
after disposal) and passes after;
AutoDismiss_AfterDisposal_DoesNotThrowUnobservedException and
AutoDismiss_BeforeDisposal_StillRemovesToast guard the no-throw and
still-works behaviours.
CentralUI-011 — DiffDialog leaves a dangling TaskCompletionSource when disposed while open
| Severity | Medium |
| Category | Error handling & resilience |
| Status | Resolved |
| Location | src/ScadaLink.CentralUI/Components/Shared/DiffDialog.razor:89-95,151-157 |
Description
OpenAsync creates _tcs and returns _tcs.Task to the caller, which
typically awaits it. The task is completed only by Close(). If the user
navigates away while the dialog is open, DisposeAsync runs but never completes
_tcs, so the awaiting caller's continuation never resumes — a permanently
suspended Task (and any using/cleanup after the await is skipped). The
IDialogService.Confirm/Prompt path has the same shape but at least its host
is a single long-lived DialogHost; DiffDialog is per-page.
Recommendation
In DisposeAsync, call _tcs?.TrySetResult(false) (or TrySetCanceled) so any
awaiter completes deterministically.
Resolution
Resolved 2026-05-16 (commit pending). Confirmed: OpenAsync returned
_tcs.Task, completed only by Close(); DisposeAsync never touched the TCS,
so disposing the dialog while open left the awaiting caller suspended forever.
DisposeAsync now calls _tcs?.TrySetResult(false) before unlocking the body,
so a dialog disposed while open resolves its caller to false (not confirmed).
Regression test DiffDialogTests.DisposeAsync_WhileOpen_CompletesPendingTask
fails against the pre-fix code (the pending task stays WaitingForActivation)
and passes after; Close_CompletesPendingTaskWithTrue guards the normal close
path.
CentralUI-012 — N+1 query loading data connections for the Sites page
| Severity | Medium |
| Category | Performance & resource management |
| Status | Resolved |
| Location | src/ScadaLink.CentralUI/Components/Pages/Admin/Sites.razor:196-205 |
Description
LoadDataAsync fetches all sites, then issues
SiteRepository.GetDataConnectionsBySiteIdAsync(site.Id) once per site in a
loop. With N sites this is N+1 database round-trips on every page load and every
post-delete refresh. The connection lists are only used for a small per-card
summary.
Recommendation
Add a repository method that returns all data connections (or connections for a set of site ids) in one query and group them client-side, or project the small summary in a single query.
Resolution
Resolved 2026-05-16 (commit pending). Confirmed: LoadDataAsync looped
GetDataConnectionsBySiteIdAsync(site.Id) once per site (N+1). ISiteRepository
already exposes GetAllDataConnectionsAsync() and DataConnection carries a
SiteId, so the loop was replaced with a single GetAllDataConnectionsAsync()
call grouped client-side by SiteId — one query regardless of site count, on
every load and post-delete refresh. Regression tests
SitesPageTests.{LoadData_FetchesAllConnectionsInOneQuery_NoPerSiteQueries, LoadData_GroupsConnectionsBySite_AndRendersThem} fail against the pre-fix code
(GetDataConnectionsBySiteIdAsync was called per site) and pass after.
CentralUI-013 — ScriptAnalysisService blocks on async shared-script lookups
| Severity | Medium |
| Category | Concurrency & thread safety |
| Status | Resolved |
| Location | src/ScadaLink.CentralUI/ScriptAnalysis/ScriptAnalysisService.cs:951-952 (actual call at :975) |
Description
ResolveCalledShape calls _sharedScripts.GetShapesAsync().GetAwaiter().GetResult()
to resolve a shared-script shape synchronously. GetShapesAsync ultimately hits
SharedScriptService and its EF Core repository. Sync-over-async on a request
thread risks thread-pool starvation under load and can deadlock if any awaited
continuation needs a captured context. Hover and SignatureHelp (which call
ResolveCalledShape) are themselves synchronous methods, so the blocking call
is structural.
Recommendation
Make Hover and SignatureHelp async and await GetShapesAsync, or have the
catalog expose a cached synchronous snapshot that is refreshed asynchronously.
The IMemoryCache is already present — caching the shapes there and reading
them synchronously would remove the blocking call.
Resolution
Resolved 2026-05-16 (commit pending). Confirmed (the sync-over-async call is at
:975, not :951-952 as originally cited — ResolveCalledShape's
Scripts.CallShared branch). Took the recommended root-cause fix: Hover and
SignatureHelp are now async Task<...> and ResolveCalledShape is
async Task<ScriptShape?> which awaits _sharedScripts.GetShapesAsync()
instead of .GetAwaiter().GetResult(). The two minimal-API endpoints
(/hover, /signature-help) were updated to await the methods. Regression
suite ScriptAnalysisAsyncResolveTests (3 tests): the structural test
HoverAndSignatureHelp_AreAsync_NotSyncOverAsync fails against the pre-fix
synchronous signatures, and two behavioural tests resolve shared-script shapes
through a catalog that only completes after Task.Yield() (a genuinely async
source). The five existing Hover/SignatureHelp tests in
ScriptAnalysisServiceTests were updated to await the now-async methods.
CentralUI-014 — Test Run side effects (HTTP/SQL/SMTP) fire against production services
| Severity | Low (re-triaged from Medium 2026-05-16 — see Resolution) |
| Category | Error handling & resilience |
| Status | Resolved |
| Location | src/ScadaLink.CentralUI/ScriptAnalysis/ScriptAnalysisService.cs:254-259; src/ScadaLink.CentralUI/ScriptAnalysis/SandboxHostHelpers.cs:26-117 |
Description
By design (documented in the XML comments) Test Run wires ExternalSystem,
Database, and Notify to central's real IExternalSystemClient,
IDatabaseGateway, and INotificationDeliveryService, so a Test Run that calls
Notify.To(...).Send(...) actually emails recipients, Database.Connection(...)
opens a real DB connection, and External.Call(...) makes real HTTP calls —
with production-equivalent side effects. There is no dry-run mode, no
confirmation, and (combined with CentralUI-001) no restriction on what a script
can do. A Design user testing a draft script can dispatch real notifications or
mutate external databases. The behaviour is intentional but the blast radius is
not surfaced to the user.
Recommendation
At minimum, surface a clear warning in the Test Run UI that side effects are real, and require explicit opt-in for side-effecting calls. Preferably offer a dry-run mode that stubs the helpers, defaulting to dry-run.
Resolution
Resolved 2026-05-16 (commit pending) — re-triaged. Re-verified against the
reviewed commit 9c60592: the finding's premise that "the blast radius is not
surfaced to the user" is inaccurate. Both Test Run surfaces that can produce
real side effects — SharedScriptForm.razor and the script Test Run in
TemplateEdit.razor — already carry a prominent Real I/O badge on the panel
header and an alert-warning block stating External/Database/Notify calls
"fire for real … real HTTP, real SQL, real emails. Side effects are permanent"
(present since commit 2951507, an ancestor of the reviewed commit, confirmed
via git merge-base). ApiMethodForm.razor (Inbound API kind) has no
real-I/O surface at all — SandboxInboundScriptHost exposes only
Parameters/Route (Route throws) — and correctly omits the badge while still
warning. Revealing the panel ("Test Run" toggle) then clicking "Run" is itself a
two-step explicit opt-in. The minimum recommendation is therefore already met;
the optional dry-run mode is a separate feature decision the design doc does not
mandate. Severity re-triaged Medium → Low (intentional, documented, clearly
warned behaviour — not a bug). Regression suite TestRunWarningTests (3 tests)
pins the Real I/O badge + warning text in SharedScriptForm/TemplateEdit
and the deliberate absence of the badge in ApiMethodForm, so the warning
cannot silently regress.
CentralUI-015 — DialogService continuations resolve off the render thread
| Severity | Low |
| Category | Concurrency & thread safety |
| Status | Won't Fix |
| Location | src/ScadaLink.CentralUI/ServiceCollectionExtensions.cs:24; src/ScadaLink.CentralUI/Components/Shared/DialogService.cs:18-69 |
Description
DialogService is AddScoped (one per circuit, correct) but
ConfirmAsync/PromptAsync complete via ContinueWith(..., TaskScheduler.Default),
so a caller awaiting them resumes on a thread-pool thread. Any subsequent
component state mutation by the caller is then off the render thread unless the
caller wraps it in InvokeAsync. Call sites are not consistently doing so,
which can produce non-deterministic render glitches.
Recommendation
Either resolve continuations on the circuit's sync context or document that
callers must InvokeAsync after awaiting ConfirmAsync/PromptAsync. Audit
call sites for off-thread state mutation.
Resolution
Won't Fix — re-triaged 2026-05-16, the finding's premise is incorrect. The
finding claims ContinueWith(..., TaskScheduler.Default) makes an awaiting
caller resume on a thread-pool thread. It does not. TaskScheduler.Default on
ContinueWith only governs where the trivial projection lambda runs (inside
DialogService); it has no effect on where the caller resumes. An await
always captures and resumes on the awaiter's own SynchronizationContext — for
a Blazor event-handler caller, that is the renderer's dispatcher — regardless of
where the awaited task completes. This was verified directly:
DialogServiceThreadingTests.ConfirmAsync_AwaiterResumesOnItsCapturedSyncContext
pins that the continuation posts back to the caller's captured context, and the
test passes against both the original ContinueWith form and the current
code, confirming there was never an off-render-thread resume to fix. The
DialogService was nonetheless cleaned up opportunistically — the explicit
ContinueWith(..., TaskScheduler.Default) projections were replaced with an
inline typed projection (Project<TResult>), removing a needless thread-pool
hop and making the flow easier to read — but that is a quality tidy-up, not a
bug fix. Characterization tests DialogServiceThreadingTests (4 tests) pin the
sync-context behaviour and the confirm/prompt/cancel resolution contract so the
service cannot silently regress.
CentralUI-016 — Pagers render one button per page with no windowing
| Severity | Low |
| Category | Performance & resource management |
| Status | Resolved |
| Location | src/ScadaLink.CentralUI/Components/Shared/DataTable.razor:62-68; src/ScadaLink.CentralUI/Components/Pages/Deployment/Deployments.razor:167-173 |
Description
The DataTable and Deployments paginators loop for i = 1..totalPages and
emit a <li> button for every page. With a few thousand records at page size 25
that is hundreds of buttons rendered into the diff on every state change. It is
not a correctness bug but degrades render performance and usability on large
datasets.
Recommendation
Window the pager (first / prev / a few around current / next / last) or switch large lists to a "load more" / numeric jump input.
Resolution
Resolved 2026-05-16 (commit pending). Confirmed: both DataTable and
Deployments looped for i = 1..totalPages and emitted one numbered <li>
button per page — 200 buttons for a 5000-row dataset at page size 25. Added a
pure PagerWindow.Build(currentPage, totalPages) helper
(Components/Shared/PagerWindow.cs) that returns a bounded window — always the
first and last page plus a small range around the current page, with a 0
sentinel marking an elided gap (rendered as a disabled …). Both
paginators now iterate PagerWindow.Build(...) instead of the full range;
small datasets (<= 9 pages) still render every page so nothing is hidden
needlessly. Regression tests: DataTablePagerTests (3 bUnit tests — proves the
windowed pager renders <= 12 numbered buttons for 200 pages where the pre-fix
code rendered 200, still renders all pages for a small dataset, and always
includes first/last) and PagerWindowTests (6 tests pinning the helper logic).
CentralUI-017 — /auth/logout POST disables antiforgery, enabling logout CSRF
| Severity | Low |
| Category | Security |
| Status | Resolved |
| Location | src/ScadaLink.CentralUI/Auth/AuthEndpoints.cs:127-138 |
Description
The POST /auth/logout endpoint calls .DisableAntiforgery(), and a plain
GET /logout endpoint also signs the user out. Either can be triggered
cross-site (an <img src="/logout"> or an auto-submitting form) to forcibly log
a user out. Login itself reasonably disables antiforgery (pre-auth), but logout
is a state-changing authenticated action and should be CSRF-protected.
Recommendation
Require an antiforgery token on POST /auth/logout (the NavMenu sign-out form
can include the antiforgery token), and remove or protect the state-changing
GET /logout route.
Resolution
Resolved 2026-05-16 (commit pending). Confirmed: POST /auth/logout called
.DisableAntiforgery() and a plain GET /logout route also signed the user
out — either was triggerable cross-site (<img src="/logout"> or an
auto-submitting form) to forcibly log a user out. The .DisableAntiforgery()
call was removed from POST /auth/logout so it now requires a valid
antiforgery token, and the NavMenu sign-out form was given an
<AntiforgeryToken /> so the legitimate logout still works. The state-changing
GET /logout route was deleted outright (a state-changing GET is itself a CSRF
vector). POST /auth/login intentionally keeps .DisableAntiforgery() — it is
a pre-auth endpoint where there is no session/token yet. Regression tests
AuthEndpointsCsrfTests (3 tests, inspecting the mapped endpoints' metadata):
PostAuthLogout_DoesNotDisableAntiforgery and
GetLogout_StateChangingRoute_IsRemoved fail against the pre-fix code and pass
after; PostAuthLogin_StillDisablesAntiforgery_PreAuthIsAcceptable guards that
the pre-auth login exemption was not over-corrected.
CentralUI-018 — Broad catch {} blocks swallow JS interop and storage errors silently
| Severity | Low |
| Category | Error handling & resilience |
| Status | Resolved |
| Location | src/ScadaLink.CentralUI/Components/Shared/MonacoEditor.razor:116-118,123,142,164,170,176,182,189; src/ScadaLink.CentralUI/Components/Shared/TreeView.razor:129,139; src/ScadaLink.CentralUI/Components/Pages/Admin/Sites.razor:316-319 |
Description
Numerous try { ... } catch { } blocks swallow every exception with no logging.
The prerender-time JS-unavailable case is legitimate, but these catches also
hide real failures: a genuine Monaco init failure, or a clipboard permission
error become invisible. In TreeView.razor the storage-restore
JsonSerializer.Deserialize (line 139) is not inside a try at all and would
throw uncaught on a corrupt treeviewStorage payload. Debugging UI issues in
production is then guesswork.
Recommendation
Catch the specific expected exception type (e.g. JSDisconnectedException,
InvalidOperationException during prerender) and log anything else via
ILogger. Wrap the TreeView storage Deserialize in its own guarded block.
Resolution
Resolved 2026-05-16 (commit pending). Confirmed all three locations.
(1) TreeView — the storage-restore JsonSerializer.Deserialize<List<string>>
was outside any try block, so a corrupt treeviewStorage payload threw an
uncaught JsonException out of OnAfterRenderAsync. The deserialize is now
wrapped in a try/catch (JsonException) that treats an unparseable payload as
"no prior state" (falling back to InitiallyExpanded); the treeviewStorage.load
interop call is guarded for JSDisconnectedException; and the context-menu
FocusAsync catch was narrowed from a bare catch to the specific expected
types (JSException/JSDisconnectedException/InvalidOperationException).
(2) MonacoEditor — every JS interop call had a bare catch { }. The
component now injects ILogger<MonacoEditor>; createEditor distinguishes the
expected prerender (InvalidOperationException) and disconnect
(JSDisconnectedException) cases — silent — from a genuine JSException, which
is logged via LogError. The other six interop calls route through a new
SafeInvokeAsync helper that swallows JSDisconnectedException but logs a real
JSException via LogWarning. (3) Sites.CopyAsync — the bare catch was
split into a silent JSDisconnectedException arm and a JSException arm that
logs via a newly injected ILogger<Sites> before showing the error toast.
Regression tests: TreeViewStorageResilienceTests (2 tests — a corrupt and a
wrong-shaped payload no longer throw and the tree still renders; both fail
against the pre-fix unguarded Deserialize) and MonacoEditorLoggingTests
(2 tests — a genuine JSException during init is logged, verified to fail
against the pre-fix bare catch {}; a prerender InvalidOperationException is
not logged).
CentralUI-019 — Sparse unit-test coverage for a large module; critical paths untested
| Severity | Low |
| Category | Testing coverage |
| Status | Resolved |
| Location | tests/ScadaLink.CentralUI.Tests/ |
Description
The module has ~65 source files but unit tests cover only the script analyzer,
TreeView, schema model, and two data-connection pages. Untested critical paths
include: the auth bridge (CookieAuthenticationStateProvider,
AuthEndpoints), RunInSandboxAsync (timeout, recursion limit, error
classification, side-effect wiring), DialogService resolution semantics,
DebugView stream lifecycle and the UpsertWithCap cap logic, Health and
Deployments timer behaviour, and SchemaBuilderModel round-tripping of nested
schemas. Given findings CentralUI-001/003/009/010 sit on untested code, the gap
is material. The Playwright suite covers login and navigation only.
Recommendation
Add bUnit/unit tests for the auth bridge, sandbox-run behaviour (including forbidden-API rejection once CentralUI-001 is fixed), dialog resolution, and the DebugView cap/lifecycle logic. Prioritise the paths named in the Critical/High findings.
Resolution
Resolved 2026-05-16 (commit pending). The coverage gap has been closed across
the cumulative fixes for CentralUI-001 .. 018 — every critical path the finding
named now has tests. Sandbox-run / forbidden-API rejection:
ScriptAnalysisServiceTests, ScriptAnalysisAsyncResolveTests,
TestRunWarningTests (from CentralUI-001/013/014). Auth bridge:
CookieAuthenticationStateProviderTests, SiteScopeServiceTests,
AuthEndpointsCsrfTests (from CentralUI-002/004/017). Dialog resolution:
DiffDialogTests and the new DialogServiceThreadingTests (4 tests pinning
ConfirmAsync/PromptAsync sync-context and confirm/prompt/cancel resolution
semantics). DebugView lifecycle: DebugViewDisposalTests (from CentralUI-009).
Toast/timer disposal: ToastNotificationTests (from CentralUI-010).
This batch also added BrowserTimeTests, MonitoringAuthorizationTests,
SitesPageTests, DataTablePagerTests + PagerWindowTests,
TreeViewStorageResilienceTests, and MonacoEditorLoggingTests. The
tests/ScadaLink.CentralUI.Tests suite is green at 251 tests. Remaining
untested paths are low-risk render-only pages; the Critical/High/Medium paths
the finding prioritised are all now covered, so the finding is considered
resolved. (Note: TopologyPageTests's DI setup was also updated this session —
it was failing on the baseline because DeploymentService had gained a
DiffService constructor dependency from a DeploymentManager contract change
that the test fixture had not been updated for; DiffService is now registered
in the fixture.)