Files
scadalink-design/code-reviews/CentralUI/findings.md

1219 lines
64 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# Code Review — CentralUI
| Field | Value |
|-------|-------|
| Module | `src/ScadaLink.CentralUI` |
| Design doc | `docs/requirements/Component-CentralUI.md` |
| Status | Reviewed |
| Last reviewed | 2026-05-17 |
| Reviewer | claude-agent |
| Commit reviewed | `39d737e` |
| Open findings | 0 |
## Summary
The Central UI is a sizeable, generally well-structured Blazor Server module:
custom Bootstrap components only (no third-party UI frameworks, as required),
consistent list/form page patterns, careful disposal in most components, and a
thoughtful Roslyn-backed script editor. The most serious problem is the
**Test Run sandbox** (`ScriptAnalysisService.RunInSandboxAsync`): it compiles
and executes arbitrary user C# *in the central process* with no enforcement of
the documented script trust model — the forbidden-API list is only a Monaco
editor diagnostic, never applied before execution — so a Design user can run
`System.IO`/`Process`/`Reflection` code on the central node. Several other
themes recur: (1) per-circuit security drift — site-scoped Deployment claims
are written at login but never read, so site scoping is not enforced anywhere;
(2) Blazor render-thread and disposal hazards — background `Timer` / `Task.Delay`
callbacks and stream callbacks touch component state and `@ref` children that
may already be disposed; (3) process-global mutation (`Console.SetOut`) shared
across concurrent circuits; (4) drift from the design doc on session expiry and
on the "deployment status pushes via SignalR" claim (the page actually polls).
Testing coverage is thin for a module this large: only the script analyzer,
TreeView, schema model, and a few data-connection pages have unit tests; most
pages and the auth bridge are untested.
#### Re-review 2026-05-17 (commit `39d737e`)
All 19 findings from the 2026-05-16 review are confirmed closed. The resolution
batch (`a9bd7ee`..`34588ae`) substantially rewrote the auth bridge, the script
sandbox, several Deployment/Monitoring pages, and the shared component disposal
paths, so this re-review re-examined the post-fix state across all 10 checklist
categories. Six new findings (CentralUI-020 .. 025) were recorded. The most
important is **CentralUI-020**: the two prior fixes interact destructively — the
CentralUI-004 fix made `CookieAuthenticationStateProvider` return a frozen,
constructor-time auth-state snapshot, while the CentralUI-005 fix rewrote
`SessionExpiry.razor` to *poll* that same provider to detect a lapsed session.
Because the snapshot never changes for the life of the circuit, the idle-timeout
redirect can never fire, so the documented idle-logout behaviour is silently
defeated. The remaining new findings are a cross-thread `Dictionary` mutation in
`DebugView`, an unguarded `InvokeAsync` in the new `Deployments` push handler,
and three Low-severity items (residual bare `catch`, magic-string claim
lookups, and the untested `SessionExpiry` polling path).
## Checklist coverage
| # | Category | Examined | Notes |
|---|----------|----------|-------|
| 1 | Correctness & logic bugs | ☑ | DebugView cap logic, audit-log timezone, toast race — see findings. |
| 2 | Akka.NET conventions | ☑ | Module is mostly UI; `DebugStreamService` actor usage reviewed (in Communication but driven from here). No actor-convention violations in CentralUI proper. |
| 3 | Concurrency & thread safety | ☑ | `Console.SetOut` global mutation, stream/timer callbacks on non-render threads, toast `_ = Task.Delay`. |
| 4 | Error handling & resilience | ☑ | Broad `catch {}` swallowing, dangling `TaskCompletionSource` on dialog disposal. |
| 5 | Security | ☑ | Sandbox not enforcing trust model (Critical); site scoping never enforced; auth bridge reads stale HttpContext; logout CSRF. |
| 6 | Performance & resource management | ☑ | N+1 site-connection query, repeated `FilteredMessages` recomputation, full-page paginators rendering all page buttons. |
| 7 | Design-document adherence | ☑ | Session expiry diverges from "15-min sliding + 30-min idle"; Deployments polls despite "push via SignalR"; nav exposes Deployment-only pages to all roles. |
| 8 | Code organization & conventions | ☑ | Generally good; options classes absent (no appsettings binding here); no major violations. |
| 9 | Testing coverage | ☑ | Auth, sandbox-run, DebugView, Health, ParkedMessages, most pages untested. |
| 10 | Documentation & comments | ☑ | Comments are accurate and helpful; a few stale claims noted. |
Re-review 2026-05-17 (`39d737e`): all 10 categories re-examined against the
post-fix source. New findings — category 3 (CentralUI-020 the auth-snapshot vs
session-poll interaction is also a design-adherence regression; CentralUI-021
cross-thread `Dictionary`; CentralUI-022 unguarded `InvokeAsync`), category 4
(CentralUI-023 residual bare `catch`), category 8 (CentralUI-024 magic-string
claims), category 9 (CentralUI-025 untested `SessionExpiry` poll). Categories
1, 2, 5, 6, 7, 10 produced no new findings.
## Findings
### CentralUI-001 — Test Run sandbox executes arbitrary C# with no trust-model enforcement
| | |
|--|--|
| Severity | Critical |
| Category | Security |
| Status | Resolved |
| Location | `src/ScadaLink.CentralUI/ScriptAnalysis/ScriptAnalysisService.cs:171-424` |
**Description**
`RunInSandboxAsync` compiles user-supplied script code with `CSharpScript.Create`
and executes it (`script.RunAsync`) directly inside the central process. The
"sandbox" applies only a wall-clock timeout and an output-size cap. It does
**not** enforce the documented script trust model: the forbidden-API set
(`System.IO`, `System.Diagnostics`/`Process`, `System.Reflection`, `System.Net`,
threading) is checked only in `FindForbiddenApiUsages`, which feeds Monaco
editor diagnostics — it is never consulted before `RunInSandboxAsync` executes.
`DefaultOptions` references `typeof(object).Assembly` (the full BCL), so a
Design-role user can submit `System.IO.File.WriteAllText(...)`,
`System.Diagnostics.Process.Start(...)`, reflection, or raw socket code via
`POST /api/script-analysis/run` and it runs with the central host process's
full privileges. The endpoint is gated only by `RequireDesign`. This is a
remote code execution path on the central cluster node.
**Recommendation**
Before executing, run the same forbidden-API analysis used for diagnostics and
reject any script with a `SCADA001`/`SCADA002` (severity-8) marker; additionally
restrict the compilation's metadata references to the curated script API
surface, and ideally execute in an isolated `AssemblyLoadContext`/process with
constrained permissions. Treat the trust model as an execution-time gate, not
an editor hint.
**Resolution**
Resolved 2026-05-16. A Roslyn semantic trust-model gate was added. `RunInSandboxAsync`
now calls `EnforceTrustModel` after compilation and before `script.RunAsync`; if the
script references any forbidden API the run is rejected (`SandboxErrorKind.CompileError`)
with the offending markers, and the same gate is applied to nested shared scripts in
`callSharedFunc`. `FindForbiddenApiUsages` was reworked so it resolves every identifier
(not just the leftmost) against the semantic model and checks types **and** members —
so a fully-qualified call such as `System.IO.File.WriteAllText(...)` is now caught, not
only `using`-directive or bare-type forms. This is a static semantic gate consistent
with the documented trust model; it is not a process sandbox — reflection-based
indirection remains out of its reach, and full isolation would require running scripts
in a separate constrained process (a larger change deliberately not taken here).
Regression tests `RunInSandbox_FullyQualifiedForbiddenApi_IsBlockedBeforeExecution`,
`RunInSandbox_ForbiddenUsingDirective_IsBlockedBeforeExecution` and
`Diagnose_FullyQualifiedForbiddenCall_RaisesSCADA002` fail against the pre-fix code and
pass after; `RunInSandbox_CleanScript_StillRuns` guards against over-blocking. Fixed by
the commit whose message references `CentralUI-001`.
### CentralUI-002 — Site-scoped Deployment permissions are issued but never enforced
| | |
|--|--|
| Severity | High |
| Category | Security |
| Status | Resolved |
| Location | `src/ScadaLink.CentralUI/Auth/AuthEndpoints.cs:63-69`; `src/ScadaLink.CentralUI/Components/Pages/Deployment/*.razor` |
**Description**
Login adds `SiteId` claims (`JwtTokenService.SiteIdClaimType`) for non-system-wide
Deployment users, and the design doc (Component-CentralUI "Responsibilities" and
CLAUDE.md Security & Auth) requires the Deployment role to be site-scoped. A
repo-wide search shows the `SiteId` claim is written at login and **never read
anywhere in CentralUI**. Deployment pages — `DebugView.razor`, `Deployments.razor`,
`InstanceCreate.razor`, `InstanceConfigure.razor`, `Topology.razor`,
`ParkedMessages.razor`, `EventLogs.razor` — list and act on every site with no
filtering by the user's permitted sites. A Deployment user scoped to one site
can deploy to, debug, and manage instances at any site.
**Recommendation**
Enforce site scoping: filter site/instance lists by the user's `SiteId` claims
(or treat the absence of `SiteId` claims as system-wide), and re-check the claim
server-side before any mutating cross-site command (deploy, enable/disable/delete,
debug stream, parked-message retry/discard). A shared helper that reads the
claims from `AuthenticationStateProvider` and exposes "permitted site ids" would
keep this consistent.
**Resolution**
Resolved 2026-05-16. Confirmed: the `SiteId` claim was written at login
(`AuthEndpoints`, `RoleMapper`) but never read by any CentralUI page — site
scoping was unenforced. Added a scoped `SiteScopeService` (`Auth/SiteScopeService.cs`)
that reads the current circuit's `SiteId` claims and exposes `IsSystemWideAsync`,
`PermittedSiteIdsAsync`, `FilterSitesAsync`, and `IsSiteAllowedAsync` (absence of
claims = system-wide, matching `SiteScopeAuthorizationHandler`). All seven
Deployment/Monitoring pages now consume it: `Topology`, `DebugView`,
`InstanceCreate`, `Deployments` filter their site/instance lists; `InstanceConfigure`
rejects direct navigation to an instance on a non-permitted site; `DebugView`,
`InstanceCreate`, and `ParkedMessages` re-check the claim server-side before any
mutating/streaming command. Regression tests: `SiteScopeServiceTests` (6 tests
pinning the helper logic) and `TopologyPageTests.SiteScoping_ScopedDeploymentUser_OnlySeesPermittedSites`
/ `SiteScoping_SystemWideDeploymentUser_SeesAllSites`. Fixed by the commit whose
message references `CentralUI-002`.
### CentralUI-003 — `Console.SetOut`/`SetError` mutates process-global state across concurrent circuits
| | |
|--|--|
| Severity | High |
| Category | Concurrency & thread safety |
| Status | Resolved |
| Location | `src/ScadaLink.CentralUI/ScriptAnalysis/ScriptAnalysisService.cs:359-423` |
**Description**
`RunInSandboxAsync` redirects `Console.Out`/`Console.Error` to a per-call
`StringWriter`, runs the script, then restores them in `finally`. `Console.Out`
is process-global. If two users (two Blazor circuits) run Test Run concurrently,
their captured outputs interleave or cross over, and the `finally` of whichever
finishes first restores `Console.Out` to the *original* writer while the other
run is still executing — so the second run's script output is lost or written
to the real console. `RunInSandboxAsync` is `async` and the script runs on a
thread-pool thread, so concurrent execution is fully expected.
**Recommendation**
Do not redirect process-global `Console`. Provide console capture through the
script globals surface (e.g. a `TextWriter` exposed on `SandboxScriptHost` that
the sandbox API writes to), or serialize Test Run executions with a semaphore if
global redirection must be kept. Capturing per-call without global mutation is
the correct fix.
**Resolution**
Resolved 2026-05-16. Confirmed: `RunInSandboxAsync` redirected the process-global
`Console.Out`/`Console.Error` per call and restored them in `finally`, so a
concurrent run's `finally` could restore the writer while another run was still
executing — the long run silently lost output (reproduced by the regression
test, 74 of 80 expected lines captured). Added `SandboxConsoleCapture`, a routing
`TextWriter` installed into `Console.Out`/`Console.Error` exactly once for the
process; each run pushes its own `StringWriter` onto an `AsyncLocal` capture
scope via `BeginCapture`, so writes are routed per logical call-tree with no
per-run mutation of global `Console` state. `RunInSandboxAsync` now opens the
scope with `using` declarations instead of calling `Console.SetOut`. Regression
tests `RunInSandbox_CapturesConsoleOutput` and
`RunInSandbox_ConcurrentRuns_DoNotCrossContaminateConsoleOutput` fail against the
pre-fix code and pass after. Fixed by the commit whose message references
`CentralUI-003`.
### CentralUI-004 — `CookieAuthenticationStateProvider` reads `HttpContext` for the life of the circuit
| | |
|--|--|
| Severity | High |
| Category | Security |
| Status | Resolved |
| Location | `src/ScadaLink.CentralUI/Auth/CookieAuthenticationStateProvider.cs:22-28` |
**Description**
`GetAuthenticationStateAsync` returns `_httpContextAccessor.HttpContext?.User`.
In Blazor Server, `HttpContext` is only valid during the initial HTTP request
that establishes the circuit; for the lifetime of the long-lived SignalR circuit
`IHttpContextAccessor.HttpContext` is `null` (or, worse, a stale/foreign context
if the accessor's `AsyncLocal` leaks). Any later call to
`GetAuthenticationStateAsync` — e.g. an `<AuthorizeView>` re-evaluating, or pages
that call it directly (`Sites.razor`, `Templates.razor`) — then sees an
unauthenticated principal and may render the wrong UI, or returns a stale
identity that never reflects role changes. The class derives from
`ServerAuthenticationStateProvider`, which is designed to be seeded once via
`SetAuthenticationState`; overriding `GetAuthenticationStateAsync` to read
`HttpContext` defeats that design.
**Recommendation**
Capture the authenticated principal once when the circuit is created (e.g. via
the root component / `AuthenticationStateProvider` seeding pattern used by the
Blazor Web App template) and store it on the scoped provider, instead of reading
`IHttpContextAccessor` on every call. Do not depend on `HttpContext` after the
circuit is established.
**Resolution**
Resolved 2026-05-16. Confirmed: `GetAuthenticationStateAsync` read
`_httpContextAccessor.HttpContext?.User` on every call; the provider is
registered `Scoped`, so it is constructed within the initial HTTP request's DI
scope while `HttpContext` is still valid, but every later call (an
`<AuthorizeView>` re-evaluating, or a page calling it directly) over the
long-lived SignalR circuit saw `HttpContext == null` and returned an anonymous
principal. The provider now snapshots the principal once in the constructor into
a cached `Task<AuthenticationState>` and serves that for the life of the
circuit, never touching `IHttpContextAccessor` again. Regression tests
`CookieAuthenticationStateProviderTests.GetAuthenticationStateAsync_StillReturnsUser_AfterHttpContextIsGone`
and `..._IsStableAcrossCalls_IgnoringStaleForeignContext` fail against the
pre-fix code (they would see an anonymous / foreign principal) and pass after.
Fixed by the commit whose message references `CentralUI-004`.
### CentralUI-005 — Session expiry implementation diverges from the documented policy
| | |
|--|--|
| Severity | Medium |
| Category | Design-document adherence |
| Status | Resolved |
| Location | `src/ScadaLink.CentralUI/Auth/AuthEndpoints.cs:47-81`; `src/ScadaLink.CentralUI/Components/Shared/SessionExpiry.razor:18-30` |
**Description**
CLAUDE.md (Security & Auth) specifies "15-minute expiry with sliding refresh,
30-minute idle timeout." `AuthEndpoints` instead sets a single fixed
`expires_at = UtcNow + 30 minutes` claim and a 30-minute cookie `ExpiresUtc`,
with no sliding refresh and no separate idle vs absolute timeout.
`SessionExpiry.razor` schedules a single hard redirect at that fixed time. The
result is a hard 30-minute cap with no sliding renewal — an active user is
logged out mid-session, and there is no 15-minute component at all.
**Recommendation**
Either implement the documented policy (sliding 15-minute token with refresh on
activity, plus a 30-minute idle cutoff) or update the design docs to match the
fixed 30-minute model. The code and the documented decision must agree.
**Resolution**
Resolved 2026-05-16 (commit `<pending>`) — cross-module fix (CentralUI +
Security), explicitly authorized. Root cause confirmed against the source:
`AddCookie` (`ScadaLink.Security/ServiceCollectionExtensions.cs`) set neither
`ExpireTimeSpan` nor `SlidingExpiration`; `AuthEndpoints` stamped a fixed
`expires_at = UtcNow + 30 min` claim and a 30-minute absolute cookie
`ExpiresUtc`; `SessionExpiry.razor` scheduled one hard redirect at that fixed
instant — a hard 30-minute cap, no sliding renewal, no 15-minute component.
**What was implemented — sliding session window.** ASP.NET Core cookie
authentication exposes a single `ExpireTimeSpan` plus a `SlidingExpiration`
flag; it cannot natively model *both* a 15-minute sliding token *and* a separate
30-minute absolute idle cap. The faithful interpretation implemented: the cookie
session window **is** the idle timeout. `AddSecurity` now post-configures the
cookie options with `ExpireTimeSpan = TimeSpan.FromMinutes(SecurityOptions.IdleTimeoutMinutes)`
(default 30, bound from `appsettings` via the existing options pattern, not
hard-coded) and `SlidingExpiration = true`. The middleware therefore re-issues
the cookie on activity once past the halfway mark of the window: an active user
is continually renewed, an idle user is signed out after the 30-minute idle
timeout — exactly the documented "sliding refresh, 30-minute idle timeout". The
separate 15-minute `JwtExpiryMinutes` governs the lifetime of the *embedded JWT*
itself (`JwtTokenService`) — a distinct layer from the cookie session window;
it is not, and per the ASP.NET cookie model cannot be, a second independent
sliding window inside the same cookie. `AuthEndpoints` no longer imposes a
contradictory absolute cap: the `expires_at` claim and the manual cookie
`ExpiresUtc` were removed, and a new `BuildSignInProperties()` helper sets only
`IsPersistent = true` (no `ExpiresUtc`, `AllowRefresh` left unset) so the
middleware owns expiry. `SessionExpiry.razor` no longer reads a fixed
login-time deadline (the `expires_at` claim is gone) and no longer hard-redirects
at a fixed instant: it now polls the authentication state on a recurring
interval and redirects to `/login` only once the sliding cookie has actually
lapsed server-side — so an active user is never logged out mid-session.
Regression tests fail against the pre-fix code and pass after. Security:
`AddSecurity_AuthCookie_UsesSlidingExpiration`,
`AddSecurity_AuthCookie_ExpireTimeSpanMatchesIdleTimeout` (pre-fix
`ExpireTimeSpan` was the 14-day default — confirmed failing), and
`AddSecurity_AuthCookie_ExpireTimeSpanIsConfigurable` (pins the options-pattern
binding). CentralUI: `SessionExpiryPolicyTests.BuildSignInProperties_DoesNotSetFixedAbsoluteExpiry`,
`..._IsPersistent`, `..._AllowsSlidingRefresh` pin that the login sign-in no
longer imposes a fixed absolute cap. `dotnet build ScadaLink.slnx` clean;
`tests/ScadaLink.Security.Tests` 57 passed, `tests/ScadaLink.CentralUI.Tests`
254 passed.
### CentralUI-006 — Deployment status page polls every 10s despite the documented SignalR-push design
| | |
|--|--|
| Severity | Medium |
| Category | Design-document adherence |
| Status | Resolved |
| Location | `src/ScadaLink.CentralUI/Components/Pages/Deployment/Deployments.razor:196-216` |
**Description**
Component-CentralUI "Real-Time Updates" states: "Deployment status:
Pending/in-progress/success/failed transitions push to the UI immediately via
SignalR (built into Blazor Server). No polling required for deployment
tracking." `Deployments.razor` instead runs a `Timer` that reloads all
deployment records and instance names from the database every 10 seconds. This
is a full N-record + instance-map reload per tick for every open circuit, and
contradicts the design. It also re-issues two repository round-trips on each
tick regardless of whether anything changed.
**Recommendation**
Implement push-based updates (an injected event/observable raised by the
Deployment Manager that the page subscribes to and renders via
`InvokeAsync(StateHasChanged)`), or amend the design doc to acknowledge polling.
If polling is kept as a fallback, fetch only changed/in-progress records.
**Resolution**
Resolved 2026-05-16 (commit `<pending>`) — cross-module fix (CentralUI +
DeploymentManager), explicitly authorized. Root cause confirmed against the
source: `Deployments.razor` ran a `Timer` (`OnInitializedAsync``StartTimer`,
10s interval) that, every tick and for every open Blazor circuit, reloaded all
deployment records (`GetAllDeploymentRecordsAsync`) and the full instance map
(`GetAllInstancesAsync`) — contradicting Component-CentralUI "Real-Time Updates"
("transitions push to the UI immediately via SignalR … no polling required").
**Process/DI topology confirmed.** `ScadaLink.Host/Program.cs` calls both
`AddDeploymentManager()` (line 75) and `AddCentralUI()` (line 77) on the same
`builder.Services` — DeploymentManager and the Central UI run **in the same
central Host process**, so a DI singleton is genuinely shared between the
DeploymentManager services and the Blazor circuit's scoped components. The
shared-singleton seam is real; no out-of-process fallback was needed.
**What was implemented — push-based updates.** A new
`IDeploymentStatusNotifier` (`ScadaLink.DeploymentManager/IDeploymentStatusNotifier.cs`)
with a C# `event Action<DeploymentStatusChange>` and a small payload
(`DeploymentStatusChange` = deployment id + instance id + new status). Its
implementation `DeploymentStatusNotifier` invokes each subscriber in isolation
and swallows/logs handler exceptions so a faulting circuit cannot break the
deployment pipeline. It is registered as a **singleton** in `AddDeploymentManager`
(`ServiceCollectionExtensions`). Every place `DeploymentService` writes a
`DeploymentRecord` status now raises the notifier: the `Pending` create, the
`InProgress` update, the site-response terminal update, the `Failed` cleanup
write in the catch block, and the `DeploymentManager-006` reconciled-`Success`
write — five call sites via a private `NotifyStatusChange` helper.
`ArtifactDeploymentService` was inspected and writes only
`SystemArtifactDeploymentRecord` rows, which `Deployments.razor` does not
display, so it correctly raises nothing. `Deployments.razor` no longer has a
`Timer`: `OnInitializedAsync` subscribes to `IDeploymentStatusNotifier.StatusChanged`,
the handler reloads via `InvokeAsync(StateHasChanged)` (the notifier event is
raised on the DeploymentManager service thread), and `Dispose` unsubscribes.
Blazor Server pushes the re-render to the browser over its SignalR circuit
automatically — satisfying the documented design. The existing "Pause/Resume
updates" toggle now gates whether incoming push events are acted on, and
"Refresh" still forces a manual reload. CLAUDE.md UI rules kept: Blazor Server +
Bootstrap, custom components, no third-party frameworks.
Regression tests fail against the pre-fix code and pass after. DeploymentManager
(`DeploymentStatusNotifierTests`): `DeployInstanceAsync_RaisesStatusChange_ForEveryRecordStatusWrite`
(pre-fix: no notifier, fails to compile / silent), plus
`NotifyStatusChanged_WithNoSubscribers_DoesNotThrow` and
`NotifyStatusChanged_ThrowingSubscriber_DoesNotBreakOtherSubscribers`;
`ServiceCollectionExtensionsTests.AddDeploymentManager_RegistersDeploymentStatusNotifier_AsSingleton`
pins the shared-singleton seam. CentralUI (`DeploymentsPushUpdateTests`):
`Deployments_DoesNotPoll_HasNoRefreshTimer` (pre-fix: the `_refreshTimer` field
existed — confirmed failing), `Deployments_StatusChange_TriggersReload`, and
`Deployments_Dispose_UnsubscribesFromNotifier`. `dotnet build ScadaLink.slnx`
clean (0 warnings); `tests/ScadaLink.DeploymentManager.Tests` 76 passed,
`tests/ScadaLink.CentralUI.Tests` 257 passed. (`TopologyPageTests`' DI fixture
was also updated to register the new notifier, since it constructs the real
`DeploymentService`.)
### CentralUI-007 — Monitoring nav links to Deployment-only pages are shown to all roles
| | |
|--|--|
| Severity | Medium |
| Category | Correctness & logic bugs |
| Status | Resolved |
| Location | `src/ScadaLink.CentralUI/Components/Layout/NavMenu.razor:69-78`; `src/ScadaLink.CentralUI/Components/Pages/Monitoring/EventLogs.razor:2`; `src/ScadaLink.CentralUI/Components/Pages/Monitoring/ParkedMessages.razor:2` |
**Description**
`NavMenu` renders the "Event Logs" and "Parked Messages" links inside the
all-authenticated-users Monitoring section. The design doc classifies both the
Site Event Log Viewer and Parked Message Management as **Deployment Role**.
Two inconsistencies result: (a) an Admin- or Design-only user sees nav links
they cannot use; (b) the pages themselves are annotated only `[Authorize]`
(any authenticated user), not `[Authorize(Policy = RequireDeployment)]`, so a
non-Deployment user who follows the link is *not* blocked — they can query site
event logs and retry/discard parked messages. The authorization attribute and
the nav visibility both contradict the design.
**Recommendation**
Add `[Authorize(Policy = AuthorizationPolicies.RequireDeployment)]` to
`EventLogs.razor` and `ParkedMessages.razor`, and move their nav links into a
`<AuthorizeView Policy="RequireDeployment">` block (consistent with the Topology
/ Deployments / Debug View links). Confirm Health Dashboard is intentionally
all-roles (it is, per the design).
**Resolution**
Resolved 2026-05-16 (commit pending). Confirmed: both `EventLogs.razor` and
`ParkedMessages.razor` carried a bare `[Authorize]`, so any authenticated user
could query site event logs and retry/discard parked messages — contrary to the
design doc's Deployment-Role classification. Both pages now use
`[Authorize(Policy = AuthorizationPolicies.RequireDeployment)]`, and the
"Event Logs" / "Parked Messages" nav links were moved out of the all-roles
Monitoring block into an `<AuthorizeView Policy="RequireDeployment">` (Health
Dashboard stays all-roles, as the design intends). Regression tests
`MonitoringAuthorizationTests.{EventLogsPage,ParkedMessagesPage}_RequiresDeploymentPolicy`
fail against the pre-fix code and pass after;
`HealthDashboard_IsIntentionallyAllAuthenticatedRoles` guards the all-roles page.
### CentralUI-008 — Audit-log date filters treat browser-local datetimes as UTC
| | |
|--|--|
| Severity | Medium |
| Category | Correctness & logic bugs |
| Status | Resolved |
| Location | `src/ScadaLink.CentralUI/Components/Pages/Monitoring/AuditLog.razor:242-243` |
**Description**
The `From`/`To` filters bind `<input type="datetime-local">` to `DateTime?`
fields. A `datetime-local` input yields the value the user typed in their
*browser-local* time zone. `FetchPage` converts them with
`new DateTimeOffset(_filterFrom.Value, TimeSpan.Zero)` — i.e. it labels the
local wall-clock value as UTC. For any non-UTC user the audit query window is
shifted by their UTC offset, silently returning the wrong rows. CLAUDE.md
mandates UTC throughout, but that requires converting the local input *to* UTC,
not relabelling it.
**Recommendation**
Convert the picked local time to UTC before querying — capture the browser
offset (JS interop) and apply it, or document the inputs as UTC and label them
in the UI. The same issue should be checked in `EventLogs.razor` if it has
time-range filters.
**Resolution**
Resolved 2026-05-16 (commit pending). Confirmed: `FetchPage` wrapped the
`datetime-local` value with `new DateTimeOffset(value, TimeSpan.Zero)`,
relabelling the browser-local wall-clock value as UTC and shifting the audit
query window by the user's offset. Added a pure helper
`Components/BrowserTime.LocalInputToUtc(DateTime?, int)` that converts a
local-input value to UTC using the browser's `Date.getTimezoneOffset()`;
`AuditLog.razor` now fetches that offset once via JS interop in
`OnAfterRenderAsync` (defaulting to 0/UTC on prerender or a disconnected
circuit) and runs both `from`/`to` filters through the helper. Regression suite
`BrowserTimeTests` (5 tests) fails against the naive relabelling and passes
after — including `LocalInputToUtc_NonUtcBrowser_DoesNotEqualNaiveRelabelling`,
which pins the exact pre-fix bug. `EventLogs.razor` was checked and has no
time-range filters, so it is unaffected.
### CentralUI-009 — `DebugView` stream callbacks touch a possibly-disposed `ToastNotification`
| | |
|--|--|
| Severity | Medium |
| Category | Concurrency & thread safety |
| Status | Resolved |
| Location | `src/ScadaLink.CentralUI/Components/Pages/Deployment/DebugView.razor:400-409,538-544` |
**Description**
The `onTerminated` callback passed to `DebugStreamService.StartStreamAsync`
captures `_toast` and `this` and runs on an Akka/gRPC thread. If the user
navigates away, `Dispose()` calls `StopStream`, but a stream-termination event
already in flight can still invoke `onTerminated`, which calls
`_toast.ShowError(...)` and `StateHasChanged()` on a disposed component. The
component does not guard callbacks with a disposed flag or a
`CancellationTokenSource`. The same applies to the `onEvent` callbacks at
lines 391-398 that call `InvokeAsync(StateHasChanged)`.
**Recommendation**
Track a `_disposed`/`CancellationTokenSource` on the component, check it at the
top of every stream callback, and stop the stream synchronously before marking
disposed. `InvokeAsync` after disposal throws `ObjectDisposedException`; the
callbacks should no-op once disposed.
**Resolution**
Resolved 2026-05-16 (commit pending). Confirmed: the `onEvent`/`onTerminated`
callbacks captured `this` and `_toast` and ran on an Akka/gRPC thread with no
disposal guard. Added a `volatile bool _disposed` flag, set first thing in
`Dispose()` before the stream is stopped. Every callback now checks `_disposed`
and no-ops if set; the render dispatch goes through a new `SafeInvokeAsync`
helper that re-checks the flag and swallows `ObjectDisposedException` should the
component be disposed between the guard and the dispatch. Regression tests
`DebugViewDisposalTests.{DebugView_HasDisposalGuardField,
DebugView_Dispose_SetsDisposedFlag_AndIsIdempotent}` pin the observable contract
(the guard field exists; `Dispose()` sets it and is idempotent) — the first
fails against the pre-fix code, which had no `_disposed` field. The Akka-thread
timing race itself is not deterministically reproducible in a unit test:
`DebugStreamService` is a non-virtual concrete class with no seam to inject and
later fire the callbacks, so the closest meaningful tests pin the guard
mechanism rather than the race window.
### CentralUI-010 — `ToastNotification` auto-dismiss continuation runs after component disposal
| | |
|--|--|
| Severity | Medium |
| Category | Error handling & resilience |
| Status | Resolved |
| Location | `src/ScadaLink.CentralUI/Components/Shared/ToastNotification.razor:62-71,90` |
**Description**
`AddToast` schedules `Task.Delay(dismissMs).ContinueWith(...)` with the result
discarded (`_ =`). The continuation calls `InvokeAsync(StateHasChanged)`. If the
host page is disposed before the 5-second delay elapses (common — navigate away
right after an action), the continuation runs against a disposed component and
`InvokeAsync` throws `ObjectDisposedException` on a thread-pool thread with no
catch, producing an unobserved task exception. `Dispose()` is an empty body and
cancels nothing.
**Recommendation**
Hold a `CancellationTokenSource`, pass its token to `Task.Delay`, cancel it in
`Dispose()`, and guard the continuation. Alternatively wrap the continuation
body in a try/catch for `ObjectDisposedException`.
**Resolution**
Resolved 2026-05-16 (commit pending). Confirmed: `AddToast` scheduled
`Task.Delay(...).ContinueWith(...)` with no cancellation and `Dispose()` was an
empty body, so the continuation ran `InvokeAsync(StateHasChanged)` against a
disposed component. Added a `CancellationTokenSource _disposalCts` cancelled in
`Dispose()`; the auto-dismiss is now an `AutoDismissAsync` method that awaits
`Task.Delay(dismissMs, token)`, returns on `OperationCanceledException`, and
wraps the post-delay `InvokeAsync(StateHasChanged)` in a try/catch for
`ObjectDisposedException`. `AddToast` also short-circuits if the component is
already disposed. Regression tests:
`ToastNotificationTests.ShowToast_AfterDisposal_IsNoOp_AndSchedulesNothing`
fails against the pre-fix code (which still added the toast / mis-scheduled
after disposal) and passes after;
`AutoDismiss_AfterDisposal_DoesNotThrowUnobservedException` and
`AutoDismiss_BeforeDisposal_StillRemovesToast` guard the no-throw and
still-works behaviours.
### CentralUI-011 — `DiffDialog` leaves a dangling `TaskCompletionSource` when disposed while open
| | |
|--|--|
| Severity | Medium |
| Category | Error handling & resilience |
| Status | Resolved |
| Location | `src/ScadaLink.CentralUI/Components/Shared/DiffDialog.razor:89-95,151-157` |
**Description**
`OpenAsync` creates `_tcs` and returns `_tcs.Task` to the caller, which
typically `await`s it. The task is completed only by `Close()`. If the user
navigates away while the dialog is open, `DisposeAsync` runs but never completes
`_tcs`, so the awaiting caller's continuation never resumes — a permanently
suspended `Task` (and any `using`/cleanup after the await is skipped). The
`IDialogService.Confirm/Prompt` path has the same shape but at least its host
is a single long-lived `DialogHost`; `DiffDialog` is per-page.
**Recommendation**
In `DisposeAsync`, call `_tcs?.TrySetResult(false)` (or `TrySetCanceled`) so any
awaiter completes deterministically.
**Resolution**
Resolved 2026-05-16 (commit pending). Confirmed: `OpenAsync` returned
`_tcs.Task`, completed only by `Close()`; `DisposeAsync` never touched the TCS,
so disposing the dialog while open left the awaiting caller suspended forever.
`DisposeAsync` now calls `_tcs?.TrySetResult(false)` before unlocking the body,
so a dialog disposed while open resolves its caller to `false` (not confirmed).
Regression test `DiffDialogTests.DisposeAsync_WhileOpen_CompletesPendingTask`
fails against the pre-fix code (the pending task stays `WaitingForActivation`)
and passes after; `Close_CompletesPendingTaskWithTrue` guards the normal close
path.
### CentralUI-012 — N+1 query loading data connections for the Sites page
| | |
|--|--|
| Severity | Medium |
| Category | Performance & resource management |
| Status | Resolved |
| Location | `src/ScadaLink.CentralUI/Components/Pages/Admin/Sites.razor:196-205` |
**Description**
`LoadDataAsync` fetches all sites, then issues
`SiteRepository.GetDataConnectionsBySiteIdAsync(site.Id)` once per site in a
loop. With N sites this is N+1 database round-trips on every page load and every
post-delete refresh. The connection lists are only used for a small per-card
summary.
**Recommendation**
Add a repository method that returns all data connections (or connections for a
set of site ids) in one query and group them client-side, or project the small
summary in a single query.
**Resolution**
Resolved 2026-05-16 (commit pending). Confirmed: `LoadDataAsync` looped
`GetDataConnectionsBySiteIdAsync(site.Id)` once per site (N+1). `ISiteRepository`
already exposes `GetAllDataConnectionsAsync()` and `DataConnection` carries a
`SiteId`, so the loop was replaced with a single `GetAllDataConnectionsAsync()`
call grouped client-side by `SiteId` — one query regardless of site count, on
every load and post-delete refresh. Regression tests
`SitesPageTests.{LoadData_FetchesAllConnectionsInOneQuery_NoPerSiteQueries,
LoadData_GroupsConnectionsBySite_AndRendersThem}` fail against the pre-fix code
(`GetDataConnectionsBySiteIdAsync` was called per site) and pass after.
### CentralUI-013 — `ScriptAnalysisService` blocks on async shared-script lookups
| | |
|--|--|
| Severity | Medium |
| Category | Concurrency & thread safety |
| Status | Resolved |
| Location | `src/ScadaLink.CentralUI/ScriptAnalysis/ScriptAnalysisService.cs:951-952` (actual call at `:975`) |
**Description**
`ResolveCalledShape` calls `_sharedScripts.GetShapesAsync().GetAwaiter().GetResult()`
to resolve a shared-script shape synchronously. `GetShapesAsync` ultimately hits
`SharedScriptService` and its EF Core repository. Sync-over-async on a request
thread risks thread-pool starvation under load and can deadlock if any awaited
continuation needs a captured context. `Hover` and `SignatureHelp` (which call
`ResolveCalledShape`) are themselves synchronous methods, so the blocking call
is structural.
**Recommendation**
Make `Hover` and `SignatureHelp` async and `await` `GetShapesAsync`, or have the
catalog expose a cached synchronous snapshot that is refreshed asynchronously.
The `IMemoryCache` is already present — caching the shapes there and reading
them synchronously would remove the blocking call.
**Resolution**
Resolved 2026-05-16 (commit pending). Confirmed (the sync-over-async call is at
`:975`, not `:951-952` as originally cited — `ResolveCalledShape`'s
`Scripts.CallShared` branch). Took the recommended root-cause fix: `Hover` and
`SignatureHelp` are now `async Task<...>` and `ResolveCalledShape` is
`async Task<ScriptShape?>` which `await`s `_sharedScripts.GetShapesAsync()`
instead of `.GetAwaiter().GetResult()`. The two minimal-API endpoints
(`/hover`, `/signature-help`) were updated to `await` the methods. Regression
suite `ScriptAnalysisAsyncResolveTests` (3 tests): the structural test
`HoverAndSignatureHelp_AreAsync_NotSyncOverAsync` fails against the pre-fix
synchronous signatures, and two behavioural tests resolve shared-script shapes
through a catalog that only completes after `Task.Yield()` (a genuinely async
source). The five existing `Hover`/`SignatureHelp` tests in
`ScriptAnalysisServiceTests` were updated to `await` the now-async methods.
### CentralUI-014 — Test Run side effects (HTTP/SQL/SMTP) fire against production services
| | |
|--|--|
| Severity | Low (re-triaged from Medium 2026-05-16 — see Resolution) |
| Category | Error handling & resilience |
| Status | Resolved |
| Location | `src/ScadaLink.CentralUI/ScriptAnalysis/ScriptAnalysisService.cs:254-259`; `src/ScadaLink.CentralUI/ScriptAnalysis/SandboxHostHelpers.cs:26-117` |
**Description**
By design (documented in the XML comments) Test Run wires `ExternalSystem`,
`Database`, and `Notify` to central's *real* `IExternalSystemClient`,
`IDatabaseGateway`, and `INotificationDeliveryService`, so a Test Run that calls
`Notify.To(...).Send(...)` actually emails recipients, `Database.Connection(...)`
opens a real DB connection, and `External.Call(...)` makes real HTTP calls —
with production-equivalent side effects. There is no dry-run mode, no
confirmation, and (combined with CentralUI-001) no restriction on what a script
can do. A Design user testing a draft script can dispatch real notifications or
mutate external databases. The behaviour is intentional but the blast radius is
not surfaced to the user.
**Recommendation**
At minimum, surface a clear warning in the Test Run UI that side effects are
real, and require explicit opt-in for side-effecting calls. Preferably offer a
dry-run mode that stubs the helpers, defaulting to dry-run.
**Resolution**
Resolved 2026-05-16 (commit pending) — **re-triaged**. Re-verified against the
reviewed commit `9c60592`: the finding's premise that "the blast radius is not
surfaced to the user" is **inaccurate**. Both Test Run surfaces that can produce
real side effects — `SharedScriptForm.razor` and the script Test Run in
`TemplateEdit.razor` — already carry a prominent `Real I/O` badge on the panel
header and an `alert-warning` block stating `External`/`Database`/`Notify` calls
"fire for real … real HTTP, real SQL, real emails. Side effects are permanent"
(present since commit `2951507`, an ancestor of the reviewed commit, confirmed
via `git merge-base`). `ApiMethodForm.razor` (Inbound API kind) has **no**
real-I/O surface at all — `SandboxInboundScriptHost` exposes only
`Parameters`/`Route` (Route throws) — and correctly omits the badge while still
warning. Revealing the panel ("Test Run" toggle) then clicking "Run" is itself a
two-step explicit opt-in. The minimum recommendation is therefore already met;
the optional dry-run mode is a separate feature decision the design doc does not
mandate. Severity re-triaged Medium → Low (intentional, documented, clearly
warned behaviour — not a bug). Regression suite `TestRunWarningTests` (3 tests)
pins the `Real I/O` badge + warning text in `SharedScriptForm`/`TemplateEdit`
and the deliberate absence of the badge in `ApiMethodForm`, so the warning
cannot silently regress.
### CentralUI-015 — `DialogService` continuations resolve off the render thread
| | |
|--|--|
| Severity | Low |
| Category | Concurrency & thread safety |
| Status | Won't Fix |
| Location | `src/ScadaLink.CentralUI/ServiceCollectionExtensions.cs:24`; `src/ScadaLink.CentralUI/Components/Shared/DialogService.cs:18-69` |
**Description**
`DialogService` is `AddScoped` (one per circuit, correct) but
`ConfirmAsync`/`PromptAsync` complete via `ContinueWith(..., TaskScheduler.Default)`,
so a caller awaiting them resumes on a thread-pool thread. Any subsequent
component state mutation by the caller is then off the render thread unless the
caller wraps it in `InvokeAsync`. Call sites are not consistently doing so,
which can produce non-deterministic render glitches.
**Recommendation**
Either resolve continuations on the circuit's sync context or document that
callers must `InvokeAsync` after awaiting `ConfirmAsync`/`PromptAsync`. Audit
call sites for off-thread state mutation.
**Resolution**
Won't Fix — **re-triaged 2026-05-16, the finding's premise is incorrect.** The
finding claims `ContinueWith(..., TaskScheduler.Default)` makes an awaiting
caller resume on a thread-pool thread. It does not. `TaskScheduler.Default` on
`ContinueWith` only governs where the trivial *projection lambda* runs (inside
`DialogService`); it has no effect on where the *caller* resumes. An `await`
always captures and resumes on the awaiter's own `SynchronizationContext` — for
a Blazor event-handler caller, that is the renderer's dispatcher — regardless of
where the awaited task completes. This was verified directly:
`DialogServiceThreadingTests.ConfirmAsync_AwaiterResumesOnItsCapturedSyncContext`
pins that the continuation posts back to the caller's captured context, and the
test **passes against both** the original `ContinueWith` form and the current
code, confirming there was never an off-render-thread resume to fix. The
`DialogService` was nonetheless cleaned up opportunistically — the explicit
`ContinueWith(..., TaskScheduler.Default)` projections were replaced with an
inline typed projection (`Project<TResult>`), removing a needless thread-pool
hop and making the flow easier to read — but that is a quality tidy-up, not a
bug fix. Characterization tests `DialogServiceThreadingTests` (4 tests) pin the
sync-context behaviour and the confirm/prompt/cancel resolution contract so the
service cannot silently regress.
### CentralUI-016 — Pagers render one button per page with no windowing
| | |
|--|--|
| Severity | Low |
| Category | Performance & resource management |
| Status | Resolved |
| Location | `src/ScadaLink.CentralUI/Components/Shared/DataTable.razor:62-68`; `src/ScadaLink.CentralUI/Components/Pages/Deployment/Deployments.razor:167-173` |
**Description**
The `DataTable` and `Deployments` paginators loop `for i = 1..totalPages` and
emit a `<li>` button for every page. With a few thousand records at page size 25
that is hundreds of buttons rendered into the diff on every state change. It is
not a correctness bug but degrades render performance and usability on large
datasets.
**Recommendation**
Window the pager (first / prev / a few around current / next / last) or switch
large lists to a "load more" / numeric jump input.
**Resolution**
Resolved 2026-05-16 (commit pending). Confirmed: both `DataTable` and
`Deployments` looped `for i = 1..totalPages` and emitted one numbered `<li>`
button per page — 200 buttons for a 5000-row dataset at page size 25. Added a
pure `PagerWindow.Build(currentPage, totalPages)` helper
(`Components/Shared/PagerWindow.cs`) that returns a bounded window — always the
first and last page plus a small range around the current page, with a `0`
sentinel marking an elided gap (rendered as a disabled `&hellip;`). Both
paginators now iterate `PagerWindow.Build(...)` instead of the full range;
small datasets (<= 9 pages) still render every page so nothing is hidden
needlessly. Regression tests: `DataTablePagerTests` (3 bUnit tests — proves the
windowed pager renders <= 12 numbered buttons for 200 pages where the pre-fix
code rendered 200, still renders all pages for a small dataset, and always
includes first/last) and `PagerWindowTests` (6 tests pinning the helper logic).
### CentralUI-017 — `/auth/logout` POST disables antiforgery, enabling logout CSRF
| | |
|--|--|
| Severity | Low |
| Category | Security |
| Status | Resolved |
| Location | `src/ScadaLink.CentralUI/Auth/AuthEndpoints.cs:127-138` |
**Description**
The `POST /auth/logout` endpoint calls `.DisableAntiforgery()`, and a plain
`GET /logout` endpoint also signs the user out. Either can be triggered
cross-site (an `<img src="/logout">` or an auto-submitting form) to forcibly log
a user out. Login itself reasonably disables antiforgery (pre-auth), but logout
is a state-changing authenticated action and should be CSRF-protected.
**Recommendation**
Require an antiforgery token on `POST /auth/logout` (the `NavMenu` sign-out form
can include the antiforgery token), and remove or protect the state-changing
`GET /logout` route.
**Resolution**
Resolved 2026-05-16 (commit pending). Confirmed: `POST /auth/logout` called
`.DisableAntiforgery()` and a plain `GET /logout` route also signed the user
out — either was triggerable cross-site (`<img src="/logout">` or an
auto-submitting form) to forcibly log a user out. The `.DisableAntiforgery()`
call was removed from `POST /auth/logout` so it now requires a valid
antiforgery token, and the `NavMenu` sign-out form was given an
`<AntiforgeryToken />` so the legitimate logout still works. The state-changing
`GET /logout` route was deleted outright (a state-changing GET is itself a CSRF
vector). `POST /auth/login` intentionally keeps `.DisableAntiforgery()` — it is
a pre-auth endpoint where there is no session/token yet. Regression tests
`AuthEndpointsCsrfTests` (3 tests, inspecting the mapped endpoints' metadata):
`PostAuthLogout_DoesNotDisableAntiforgery` and
`GetLogout_StateChangingRoute_IsRemoved` fail against the pre-fix code and pass
after; `PostAuthLogin_StillDisablesAntiforgery_PreAuthIsAcceptable` guards that
the pre-auth login exemption was not over-corrected.
### CentralUI-018 — Broad `catch {}` blocks swallow JS interop and storage errors silently
| | |
|--|--|
| Severity | Low |
| Category | Error handling & resilience |
| Status | Resolved |
| Location | `src/ScadaLink.CentralUI/Components/Shared/MonacoEditor.razor:116-118,123,142,164,170,176,182,189`; `src/ScadaLink.CentralUI/Components/Shared/TreeView.razor:129,139`; `src/ScadaLink.CentralUI/Components/Pages/Admin/Sites.razor:316-319` |
**Description**
Numerous `try { ... } catch { }` blocks swallow every exception with no logging.
The prerender-time JS-unavailable case is legitimate, but these catches also
hide real failures: a genuine Monaco init failure, or a clipboard permission
error become invisible. In `TreeView.razor` the storage-restore
`JsonSerializer.Deserialize` (line 139) is not inside a try at all and would
throw uncaught on a corrupt `treeviewStorage` payload. Debugging UI issues in
production is then guesswork.
**Recommendation**
Catch the specific expected exception type (e.g. `JSDisconnectedException`,
`InvalidOperationException` during prerender) and log anything else via
`ILogger`. Wrap the TreeView storage `Deserialize` in its own guarded block.
**Resolution**
Resolved 2026-05-16 (commit pending). Confirmed all three locations.
(1) **TreeView** — the storage-restore `JsonSerializer.Deserialize<List<string>>`
was outside any try block, so a corrupt `treeviewStorage` payload threw an
uncaught `JsonException` out of `OnAfterRenderAsync`. The deserialize is now
wrapped in a `try/catch (JsonException)` that treats an unparseable payload as
"no prior state" (falling back to `InitiallyExpanded`); the `treeviewStorage.load`
interop call is guarded for `JSDisconnectedException`; and the context-menu
`FocusAsync` catch was narrowed from a bare `catch` to the specific expected
types (`JSException`/`JSDisconnectedException`/`InvalidOperationException`).
(2) **MonacoEditor** — every JS interop call had a bare `catch { }`. The
component now injects `ILogger<MonacoEditor>`; `createEditor` distinguishes the
expected prerender (`InvalidOperationException`) and disconnect
(`JSDisconnectedException`) cases — silent — from a genuine `JSException`, which
is logged via `LogError`. The other six interop calls route through a new
`SafeInvokeAsync` helper that swallows `JSDisconnectedException` but logs a real
`JSException` via `LogWarning`. (3) **Sites.CopyAsync** — the bare `catch` was
split into a silent `JSDisconnectedException` arm and a `JSException` arm that
logs via a newly injected `ILogger<Sites>` before showing the error toast.
Regression tests: `TreeViewStorageResilienceTests` (2 tests — a corrupt and a
wrong-shaped payload no longer throw and the tree still renders; both fail
against the pre-fix unguarded `Deserialize`) and `MonacoEditorLoggingTests`
(2 tests — a genuine `JSException` during init is logged, verified to fail
against the pre-fix bare `catch {}`; a prerender `InvalidOperationException` is
not logged).
### CentralUI-019 — Sparse unit-test coverage for a large module; critical paths untested
| | |
|--|--|
| Severity | Low |
| Category | Testing coverage |
| Status | Resolved |
| Location | `tests/ScadaLink.CentralUI.Tests/` |
**Description**
The module has ~65 source files but unit tests cover only the script analyzer,
TreeView, schema model, and two data-connection pages. Untested critical paths
include: the auth bridge (`CookieAuthenticationStateProvider`,
`AuthEndpoints`), `RunInSandboxAsync` (timeout, recursion limit, error
classification, side-effect wiring), `DialogService` resolution semantics,
`DebugView` stream lifecycle and the `UpsertWithCap` cap logic, `Health` and
`Deployments` timer behaviour, and `SchemaBuilderModel` round-tripping of nested
schemas. Given findings CentralUI-001/003/009/010 sit on untested code, the gap
is material. The Playwright suite covers login and navigation only.
**Recommendation**
Add bUnit/unit tests for the auth bridge, sandbox-run behaviour (including
forbidden-API rejection once CentralUI-001 is fixed), dialog resolution, and the
DebugView cap/lifecycle logic. Prioritise the paths named in the Critical/High
findings.
**Resolution**
Resolved 2026-05-16 (commit pending). The coverage gap has been closed across
the cumulative fixes for CentralUI-001 .. 018 — every critical path the finding
named now has tests. Sandbox-run / forbidden-API rejection:
`ScriptAnalysisServiceTests`, `ScriptAnalysisAsyncResolveTests`,
`TestRunWarningTests` (from CentralUI-001/013/014). Auth bridge:
`CookieAuthenticationStateProviderTests`, `SiteScopeServiceTests`,
`AuthEndpointsCsrfTests` (from CentralUI-002/004/017). Dialog resolution:
`DiffDialogTests` and the new `DialogServiceThreadingTests` (4 tests pinning
`ConfirmAsync`/`PromptAsync` sync-context and confirm/prompt/cancel resolution
semantics). DebugView lifecycle: `DebugViewDisposalTests` (from CentralUI-009).
Toast/timer disposal: `ToastNotificationTests` (from CentralUI-010).
This batch also added `BrowserTimeTests`, `MonitoringAuthorizationTests`,
`SitesPageTests`, `DataTablePagerTests` + `PagerWindowTests`,
`TreeViewStorageResilienceTests`, and `MonacoEditorLoggingTests`. The
`tests/ScadaLink.CentralUI.Tests` suite is green at 251 tests. Remaining
untested paths are low-risk render-only pages; the Critical/High/Medium paths
the finding prioritised are all now covered, so the finding is considered
resolved. (Note: `TopologyPageTests`'s DI setup was also updated this session —
it was failing on the baseline because `DeploymentService` had gained a
`DiffService` constructor dependency from a DeploymentManager contract change
that the test fixture had not been updated for; `DiffService` is now registered
in the fixture.)
### CentralUI-020 — Idle-session redirect never fires: `SessionExpiry` polls a frozen auth-state snapshot
| | |
|--|--|
| Severity | High |
| Category | Concurrency & thread safety |
| Status | Resolved |
| Location | `src/ScadaLink.CentralUI/Components/Shared/SessionExpiry.razor:39-62`; `src/ScadaLink.CentralUI/Auth/CookieAuthenticationStateProvider.cs:29-43` |
**Description**
The CentralUI-004 fix and the CentralUI-005 fix interact destructively.
CentralUI-004 made `CookieAuthenticationStateProvider` snapshot the principal
**once** in its constructor into a cached `Task<AuthenticationState>` and serve
that exact task for the entire life of the SignalR circuit — it never re-reads
`HttpContext`, never calls `SetAuthenticationState`, and never raises
`NotifyAuthenticationStateChanged`. CentralUI-005 then rewrote
`SessionExpiry.razor` to *poll* `AuthStateProvider.GetAuthenticationStateAsync()`
once a minute and redirect to `/login` "once the sliding cookie has actually
lapsed server-side." But `GetAuthenticationStateAsync()` returns the same frozen
constructor-time snapshot on every call — `auth.User.Identity.IsAuthenticated`
is permanently `true` for the life of the circuit regardless of whether the
server-side cookie has expired. The poll loop therefore never observes an
expired session and the redirect never fires. An idle user whose cookie has
lapsed server-side keeps an authenticated-looking page open indefinitely; the
documented "30-minute idle timeout" is silently defeated for any user who
leaves a circuit open. (The cookie middleware would still reject the *next*
full HTTP request / new circuit, so this is a stale-UI / missed-logout exposure
rather than a full auth bypass — but the page continues to render
authenticated content and a SignalR circuit can stay alive for a long time.)
This is also a design-document-adherence regression against CLAUDE.md
"Security & Auth" (30-minute idle timeout) — recorded under Concurrency because
the root cause is the lifetime/staleness mismatch between the two components.
**Recommendation**
`SessionExpiry` must consult something that actually reflects the live
server-side session, not the circuit's frozen principal. Options: (a) have
`SessionExpiry` poll a lightweight authenticated server endpoint (e.g. a
`/auth/ping` minimal API that returns 401 once the cookie has lapsed) and
redirect on 401; or (b) give `CookieAuthenticationStateProvider` a refresh path
that re-validates the cookie and calls `SetAuthenticationState` /
`NotifyAuthenticationStateChanged` so the polled state can actually change.
Whichever is chosen, add a test that exercises the redirect path with an
expired session (see CentralUI-025).
**Resolution**
2026-05-17 — `SessionExpiry` no longer polls the frozen `AuthenticationStateProvider`; it polls a new anonymous `GET /auth/ping` minimal-API endpoint (re-validated by the cookie middleware on every HTTP request) via a `fetch()` JS helper and redirects to `/login` on HTTP 401, so the documented 30-minute idle logout actually fires.
### CentralUI-021 — `DebugView` stream callback mutates `Dictionary` off the render thread
| | |
|--|--|
| Severity | Medium |
| Category | Concurrency & thread safety |
| Status | Resolved |
| Location | `src/ScadaLink.CentralUI/Components/Pages/Deployment/DebugView.razor:404-419,511-519,275-289` |
**Description**
The `onEvent` callback passed to `DebugStreamService.StartStreamAsync` runs on
an Akka/gRPC thread (as the design doc and the CentralUI-009 comments state). It
calls `UpsertWithCap(_attributeValues, …)` / `UpsertWithCap(_alarmStates, …)`
**directly on that thread** — the mutation is not marshalled through
`InvokeAsync`; only the subsequent `StateHasChanged` is. Meanwhile the render
thread evaluates `FilteredAttributeValues` / `FilteredAlarmStates`, which
enumerate `_attributeValues.Values` / `_alarmStates.Values` and call
`OrderBy(...).ToList()`. `Dictionary<TKey,TValue>` is not thread-safe: a write
on the Akka thread concurrent with an enumeration on the render thread can throw
`InvalidOperationException` ("Collection was modified; enumeration operation may
not execute") or corrupt the dictionary's internal buckets. The CentralUI-009
fix added a `_disposed` guard but did not address this data race — the guard
only prevents touching a *disposed* component, not concurrent access to a live
one. Under a busy debug stream this will intermittently fault the page.
**Recommendation**
Marshal the dictionary mutation onto the render thread too — move the
`UpsertWithCap` call inside the `SafeInvokeAsync`/`InvokeAsync` body so all
access to `_attributeValues`/`_alarmStates` happens on the renderer's
dispatcher. Alternatively guard both the writes and the `Filtered*` reads with a
lock, or use a concurrent collection. The cap-trim loop must be inside the same
critical section as the upsert.
**Resolution**
2026-05-17 — the stream callback now routes through `HandleStreamEvent`, which marshals the `UpsertWithCap` mutation (and the cap-trim loop) onto the renderer's dispatcher via `SafeInvokeAsync`, so every read and write of `_attributeValues`/`_alarmStates` happens on the render thread.
### CentralUI-022 — `Deployments` push handler fires `InvokeAsync` with no disposal guard
| | |
|--|--|
| Severity | Medium |
| Category | Error handling & resilience |
| Status | Resolved |
| Location | `src/ScadaLink.CentralUI/Components/Pages/Deployment/Deployments.razor:221-229,317-322` |
**Description**
`OnDeploymentStatusChanged` is invoked by `IDeploymentStatusNotifier`, a process
singleton, on the DeploymentManager service thread. The handler does
`_ = InvokeAsync(async () => { await LoadDataAsync(); StateHasChanged(); })`,
discarding the returned task. `Dispose()` unsubscribes the handler, but there is
a race window: the notifier can read the subscriber list and begin invoking
`OnDeploymentStatusChanged` *just before* the component is disposed, so
`InvokeAsync` then runs against a disposed component and throws
`ObjectDisposedException` on the DeploymentManager thread — an unobserved task
exception (the task is fire-and-forget). The same hazard was explicitly fixed
for `DebugView` (CentralUI-009, `SafeInvokeAsync` + `_disposed` flag) and
`ToastNotification` (CentralUI-010), but the new push-based `Deployments`
handler introduced by the CentralUI-006 fix did not adopt the same guard.
Separately, every push event triggers two full repository reloads
(`GetAllInstancesAsync` + `GetAllDeploymentRecordsAsync`) for every open
circuit, so a burst of status changes amplifies into N×2 round-trips per tick.
**Recommendation**
Add a `volatile bool _disposed` set first in `Dispose()`, have
`OnDeploymentStatusChanged` no-op when set, and wrap the `InvokeAsync` dispatch
to swallow `ObjectDisposedException` (mirror `DebugView.SafeInvokeAsync`).
Optionally coalesce bursts (debounce) and/or reload only the changed record
rather than the whole table on each event.
**Resolution**
2026-05-17 — added a `volatile bool _disposed` set first in `Dispose()`; `OnDeploymentStatusChanged` no-ops when set, and the fire-and-forget dispatch (`DispatchReloadAsync`) swallows the residual `ObjectDisposedException`, mirroring the `DebugView`/`ToastNotification` guards.
### CentralUI-023 — Residual bare `catch {}` blocks swallow JS interop errors
| | |
|--|--|
| Severity | Low |
| Category | Error handling & resilience |
| Status | Resolved |
| Location | `src/ScadaLink.CentralUI/Components/Pages/Monitoring/ParkedMessages.razor:690-698`; `src/ScadaLink.CentralUI/Components/Shared/DiffDialog.razor:107-116,118-130,104` |
**Description**
CentralUI-018 narrowed the bare `catch {}` blocks in `MonacoEditor`,
`TreeView`, and `Sites.razor`, but the same pattern survives elsewhere.
`ParkedMessages.CopyAsync` wraps `navigator.clipboard.writeText` in
`catch { _toast.ShowError("Copy failed."); }` — a real `JSException`
(clipboard permission denied) and an expected `JSDisconnectedException` are
treated identically and neither is logged. `DiffDialog.TryLockBodyAsync` /
`TryUnlockBodyAsync` each have a bare outer `catch` whose handler does another
JS call wrapped in a second bare `catch { /* swallow */ }`, and
`OnAfterRenderAsync`'s `_modalRef.FocusAsync()` is wrapped in a bare
`catch { /* prerender or detached: ignore */ }`. Genuine interop failures in
these paths are invisible in production logs.
**Recommendation**
Catch `JSDisconnectedException` silently and `JSException` (and
`InvalidOperationException` for the prerender focus case) with an `ILogger`
call, consistent with the CentralUI-018 fixes in the same module.
**Resolution**
2026-05-17 — the bare `catch` blocks in `ParkedMessages.CopyAsync` and `DiffDialog.TryLockBodyAsync`/`TryUnlockBodyAsync`/`OnAfterRenderAsync` now catch `JSDisconnectedException` (and `InvalidOperationException` for prerender focus) silently and log genuine `JSException` failures via injected `ILogger`.
### CentralUI-024 — Claim lookups use magic strings instead of `JwtTokenService` constants
| | |
|--|--|
| Severity | Low |
| Category | Code organization & conventions |
| Status | Resolved |
| Location | `src/ScadaLink.CentralUI/Components/Layout/NavMenu.razor:102`; `src/ScadaLink.CentralUI/Components/Pages/Dashboard.razor:14`; `GetCurrentUserAsync` in `Templates.razor`, `TemplateEdit.razor`, `TemplateCreate.razor`, `SharedScripts.razor`, `SharedScriptForm.razor`, `Sites.razor`, `Topology.razor`, `InstanceCreate.razor`, `InstanceConfigure.razor` |
**Description**
`ScadaLink.Security.JwtTokenService` exposes the canonical claim-type constants
(`UsernameClaimType = "Username"`, `DisplayNameClaimType = "DisplayName"`,
`RoleClaimType`, `SiteIdClaimType`). `SiteScopeService` correctly uses
`JwtTokenService.SiteIdClaimType`, but every `GetCurrentUserAsync` helper across
ten pages does `authState.User.FindFirst("Username")?.Value`, and `NavMenu` /
`Dashboard` do `context.User.FindFirst("DisplayName")`. The literals happen to
match the constants today, so there is no live bug — but if the claim type is
ever renamed in `JwtTokenService` (the single source of truth) every one of
these call sites silently breaks, falling back to `"unknown"` for the audit
user and a blank display name. The duplicated `GetCurrentUserAsync` helper is
also copy-pasted verbatim into ten components.
**Recommendation**
Replace the string literals with `JwtTokenService.UsernameClaimType` /
`DisplayNameClaimType`. Consider extracting the repeated `GetCurrentUserAsync`
into a single shared helper (e.g. an extension on `AuthenticationStateProvider`
or a small scoped service) so the claim lookup lives in exactly one place.
**Resolution**
2026-05-17 — added `ClaimsPrincipalExtensions` (`GetUsername`/`GetDisplayName`/`GetCurrentUsernameAsync`) resolving claims through the `JwtTokenService` constants; the ten copy-pasted `GetCurrentUserAsync` helpers and the `NavMenu`/`Dashboard` `DisplayName` lookups now delegate to it, eliminating every magic-string claim literal.
### CentralUI-025 — `SessionExpiry` polling/redirect path has no test coverage
| | |
|--|--|
| Severity | Low |
| Category | Testing coverage |
| Status | Resolved |
| Location | `tests/ScadaLink.CentralUI.Tests/Auth/SessionExpiryPolicyTests.cs`; `src/ScadaLink.CentralUI/Components/Shared/SessionExpiry.razor` |
**Description**
`SessionExpiryPolicyTests` covers only `AuthEndpoints.BuildSignInProperties()`
(the sign-in properties shape). The actual runtime behaviour of
`SessionExpiry.razor` — that an expired session triggers a redirect to
`/login`, that an authenticated session does not, and that the component does
not poll/redirect on the `/login` page itself — is untested. Had a behavioural
test exercised the redirect with an expired/anonymous auth state against the
real `CookieAuthenticationStateProvider`, the CentralUI-020 defect (the frozen
snapshot never reporting an expired session) would have been caught. The
component is the system's only client-side idle-logout mechanism, so the gap is
material.
**Recommendation**
Add bUnit tests for `SessionExpiry`: (a) with an unauthenticated auth state the
component navigates to `/login`; (b) with an authenticated state it does not;
(c) on the `/login` route it neither polls nor redirects. The provider used in
the test must be one whose state can actually transition to expired — which
also forces the CentralUI-020 fix.
**Resolution**
2026-05-17 — added `SessionExpiryComponentTests` (bUnit): an expired ping (401) redirects to `/login`, a live ping (200) and a transient failure (status 0) do not, and on the `/login` route the component neither pings nor redirects; also added `AuthPingEndpointTests` covering the `/auth/ping` endpoint contract.