lmxopcua/code-reviews/Admin/findings.md
Joseph Doherty 2b33b64a58 fix(admin): resolve Low code-review findings (Admin-010,011,012)
- Admin-010: vendor Bootstrap 5.3.3 (CSS + JS bundle + maps + provenance
  README) under wwwroot/lib/bootstrap and reference local paths from
  App.razor — Admin no longer pulls Bootstrap from jsDelivr.
- Admin-011: swap FleetStatusPoller's three plain dictionaries for
  ConcurrentDictionary so ResetCache can't race a poll tick.
- Admin-012: drop the EquipmentId column from EquipmentCsvImporter (per
  admin-ui.md — equipment id is system-derived from EquipmentUuid);
  EquipmentImportBatchService and the textarea placeholder updated to
  match.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-23 07:24:07 -04:00


Code Review — Admin

| Field | Value |
|---|---|
| Module | src/Server/ZB.MOM.WW.OtOpcUa.Admin |
| Reviewer | Claude Code |
| Review date | 2026-05-22 |
| Commit reviewed | 76d35d1 |
| Status | Reviewed |
| Open findings | 0 |

Checklist coverage

| # | Category | Result |
|---|---|---|
| 1 | Correctness & logic bugs | Admin-005 |
| 2 | OtOpcUa conventions | Admin-010 |
| 3 | Concurrency & thread safety | Admin-011 |
| 4 | Error handling & resilience | Admin-008, Admin-013 |
| 5 | Security | Admin-001, Admin-002, Admin-003, Admin-004, Admin-006 |
| 6 | Performance & resource management | No issues found |
| 7 | Design-document adherence | Admin-007, Admin-012 |
| 8 | Code organization & conventions | No issues found |
| 9 | Testing coverage | Admin-009 |
| 10 | Documentation & comments | Admin-012 |

Findings

Admin-001

| Field | Value |
|---|---|
| Severity | Critical |
| Category | Security |
| Location | Components/Routes.razor:4-11, Program.cs:150 |
| Status | Resolved |

Description: The router uses a plain RouteView (not AuthorizeRouteView), and MapRazorComponents<App>() is registered without .RequireAuthorization(). A page-level [Authorize] attribute on a routable Razor component is only enforced when the router is AuthorizeRouteView — with RouteView the attribute is inert. Consequently every page in the app, including those that carry @attribute [Authorize] (ClusterDetail, DraftEditor, Reservations, RoleGrants, Certificates, VirtualTags, ScriptedAlarms, ScriptLog, DiffViewer, ImportEquipment, Account), is reachable by a fully unauthenticated user. There is no authentication gate anywhere in the pipeline. An anonymous browser can read the full fleet configuration, audit log, certificates and ACLs, and exercise mutating pages (see Admin-002).

Recommendation: Replace RouteView with AuthorizeRouteView in Routes.razor (with a <NotAuthorized> slot that redirects to /login), or call .RequireAuthorization() on the MapRazorComponents endpoint with /login and /auth/* explicitly allowed anonymous. Add a fallback policy (AddAuthorizationBuilder().SetFallbackPolicy(...)) so new pages are secure-by-default. Re-verify every page after the gate is in place.

Resolution: Resolved 2026-05-22 — Routes.razor switched to AuthorizeRouteView with a NotAuthorized slot routing unauthenticated callers to /login via a new RedirectToLogin component; AddAuthorizationBuilder().SetFallbackPolicy(RequireAuthenticatedUser()) makes pages secure-by-default; Login.razor opts out with [AllowAnonymous] so the login page and static assets stay anonymous. Covered by PageAuthorizationTests (verified failing pre-fix, passing post-fix).
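
The secure-by-default gate described in this resolution can be sketched as a minimal Program.cs. This is an illustration only, assuming a Microsoft.NET.Sdk.Web project on .NET 8 with implicit usings and plain cookie authentication; the two mapped endpoints are hypothetical placeholders rather than the Admin app's real routes.

```csharp
using Microsoft.AspNetCore.Authentication.Cookies;
using Microsoft.AspNetCore.Authorization;

var builder = WebApplication.CreateBuilder(args);

// Assumption: cookie authentication that challenges to /login.
builder.Services
    .AddAuthentication(CookieAuthenticationDefaults.AuthenticationScheme)
    .AddCookie(options => options.LoginPath = "/login");

// Fallback policy: any endpoint that carries no authorization metadata
// of its own requires an authenticated user.
builder.Services.AddAuthorizationBuilder()
    .SetFallbackPolicy(new AuthorizationPolicyBuilder()
        .RequireAuthenticatedUser()
        .Build());

var app = builder.Build();

app.UseAuthentication();
app.UseAuthorization();

// Hypothetical endpoints, for illustration only.
app.MapGet("/login", () => "login form").AllowAnonymous(); // opts out of the fallback
app.MapGet("/", () => "protected home");                   // covered by the fallback

app.Run();
```

With this shape, a page or endpoint added later without any attribute is still gated, which is the property the per-page attributes alone could not guarantee.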

Admin-002

| Field | Value |
|---|---|
| Severity | Critical |
| Category | Security |
| Location | Components/Pages/Clusters/NewCluster.razor:1-7, Home.razor, Fleet.razor, Hosts.razor, AlarmsHistorian.razor, Clusters/ClustersList.razor, Clusters/Generations.razor, Drivers/FocasDetail.razor |
| Status | Resolved |

Description: Several routable pages carry no authorization attribute at all. Most critically NewCluster (/clusters/new) is a mutating page — its CreateAsync writes a new ServerCluster row and a draft generation. Combined with Admin-001 (the router does not enforce [Authorize] either), an unauthenticated user can create clusters and seed config-DB rows. Home, Fleet, Hosts, AlarmsHistorian, ClustersList, Generations and FocasDetail likewise expose fleet topology, host status, historian diagnostics and generation history to anonymous callers.

Recommendation: Add @attribute [Authorize(...)] to every routable page with the role/policy appropriate to its function (NewCluster and other write surfaces -> CanPublish/CanEdit; read pages -> an authenticated-user policy). A solution-wide fallback policy (see Admin-001) is the durable fix; per-page attributes remain the explicit declaration of intent.

Resolution: Resolved 2026-05-22 — @attribute [Authorize] added to every unprotected routable page (Home, Fleet, Hosts, AlarmsHistorian, ClustersList, FocasDetail, ModbusAddressPreview, ModbusDiagnostics); NewCluster gated with [Authorize(Policy = "CanPublish")] per the admin-ui.md FleetAdmin cluster-create flow. Re-triage note: Clusters/Generations.razor carries no @page directive — it is a child component of ClusterDetail, not a routable page, so it needs no attribute (it inherits the parent route's gate). The Admin-001 fallback policy is the durable secure-by-default backstop; the per-page attributes are the explicit declaration of intent. Covered by PageAuthorizationTests.

Admin-003

| Field | Value |
|---|---|
| Severity | High |
| Category | Security |
| Location | Program.cs:137-139, Hubs/FleetStatusHub.cs:11, Hubs/AlertHub.cs:10, Hubs/ScriptLogHub.cs:30 |
| Status | Resolved |

Description: All three SignalR hubs (/hubs/fleet, /hubs/alerts, /hubs/script-log) are mapped with no [Authorize] attribute and no .RequireAuthorization() on the MapHub call. Any unauthenticated client can open a hub connection: FleetStatusHub.SubscribeFleet() streams every node generation/role/resilience state, AlertHub pushes all fleet alerts (including failure detail text), and ScriptLogHub.TailLogAsync streams the contents of the server scripts-*.log files. This is an unauthenticated information-disclosure channel that bypasses the (already broken — see Admin-001) page auth entirely.

Recommendation: Add [Authorize] to each Hub class, or chain .RequireAuthorization() onto each MapHub(...) call in Program.cs. The hub SubscribeCluster/TailLogAsync methods should additionally validate that the caller's claims permit the requested cluster/script scope.

Resolution: Resolved 2026-05-22 — [Authorize] added to FleetStatusHub, AlertHub and ScriptLogHub, and .RequireAuthorization() chained onto all three MapHub(...) calls in Program.cs as a belt-and-braces backstop, so an anonymous client can no longer open any hub connection. Covered by AuthEndpointsTests.Anonymous_hub_negotiate_is_rejected.
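
The two layers named in this resolution (class-level attribute plus endpoint-level convention) might look like the following. This is a sketch under the same assumptions as a default .NET 8 web project with cookie authentication; SketchStatusHub, its route, and its method are hypothetical and stand in for the real hubs.

```csharp
using Microsoft.AspNetCore.Authentication.Cookies;
using Microsoft.AspNetCore.Authorization;
using Microsoft.AspNetCore.SignalR;

var builder = WebApplication.CreateBuilder(args);

builder.Services.AddSignalR();
builder.Services
    .AddAuthentication(CookieAuthenticationDefaults.AuthenticationScheme)
    .AddCookie();
builder.Services.AddAuthorization();

var app = builder.Build();

app.UseAuthentication();
app.UseAuthorization();

// Endpoint-level gate: anonymous callers are turned away at negotiate.
app.MapHub<SketchStatusHub>("/hubs/sketch").RequireAuthorization();

app.Run();

// Class-level gate: still applies if a future MapHub call
// forgets RequireAuthorization().
[Authorize]
public sealed class SketchStatusHub : Hub
{
    // Hypothetical hub method; Context.User is authenticated when it runs.
    public string WhoAmI() => Context.User?.Identity?.Name ?? "unknown";
}
```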

Admin-004

| Field | Value |
|---|---|
| Severity | High |
| Category | Security |
| Location | appsettings.json:3,13-14 |
| Status | Resolved |

Description: The checked-in appsettings.json contains live-looking secrets in plaintext: the ConfigDb connection string with User Id=sa;Password=OtOpcUaDev_2026! and the LDAP ServiceAccountPassword: "serviceaccount123". It also sets Encrypt=False and AllowInsecureLdap: true, so the SQL and LDAP credentials travel unencrypted on the wire. Committing the sa account password and a service-account password to source control is a credential-exposure risk; sa additionally grants full server control, conflicting with the ClusterService doc comment that production should connect with a least-privilege grant.

Recommendation: Move all secrets out of the committed file — use user-secrets for dev and environment variables / a secret store for production; leave only non-secret placeholders in appsettings.json. Use a least-privilege SQL login rather than sa. Enable TLS for both SQL (Encrypt=True) and LDAP (UseTls=true, AllowInsecureLdap=false) for any non-loopback deployment, and document the dev-only exception.

Resolution: Resolved 2026-05-22 — the sa connection string and the LDAP ServiceAccountPassword were replaced with empty placeholders in appsettings.json; a _secrets note documents that they are supplied via user-secrets (dev) or the ConnectionStrings__ConfigDb / Authentication__Ldap__ServiceAccountPassword environment variables (prod), and that the connection string must use Encrypt=True and a least-privilege SQL login. A UserSecretsId was added to the Admin csproj, and Program.cs now fails fast with a clear message when ConfigDb is empty/missing. Covered by AppSettingsSecretHygieneTests.

Admin-005

| Field | Value |
|---|---|
| Severity | High |
| Category | Correctness & logic bugs |
| Location | Components/Pages/Login.razor:15,107-110 |
| Status | Resolved |

Description: Login.razor is an interactive component (the project default render mode is interactive server; the page declares no @rendermode but uses EditForm/InputText interactive binding and runs SignInAsync from an event handler). It calls HttpContext.SignInAsync(...) followed by ctx.Response.Redirect("/") from within a SignalR circuit callback. Writing auth cookies and HTTP redirect headers requires a live, unstarted HTTP response; in an interactive circuit the original HTTP response has long completed, so the cookie is typically not emitted and the redirect is ineffective (or throws "response has already started"). admin-ui.md section "Operator authentication" explicitly specifies the login as a static server-rendered HTML form POSTing to a /auth/login minimal-API endpoint with data-enhance="false" — that endpoint is not implemented and is not mapped in Program.cs.

Recommendation: Implement the login as designed: a static-rendered form (no @rendermode directive, data-enhance="false") posting to a MapPost("/auth/login", ...) minimal-API handler that does the LDAP bind, grant resolution, SignInAsync and redirect while the HTTP response is still owned by the endpoint. Do not perform SignInAsync from an interactive circuit.

Resolution: Resolved 2026-05-22 — Login.razor rewritten as a static-rendered plain HTML <form method="post" action="/auth/login" data-enhance="false"> (no @rendermode, no EditForm/SignInAsync in a circuit); the LDAP bind, grant resolution, cookie SignInAsync and redirect now run in a new AuthEndpoints.MapAuthEndpoints() minimal-API handler (/auth/login, /auth/logout) while the endpoint still owns the HTTP response. The handler is AllowAnonymous, carries an open-redirect guard on returnUrl, and surfaces bind errors back to the login page via a query-string. Covered by AuthEndpointsTests (valid login issues the cookie, invalid login redirects with error, open-redirect rejected, logout clears the cookie).
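
The open-redirect guard on returnUrl mentioned in this resolution is not shown in the review. A minimal sketch of one common form of such a guard follows; ReturnUrlGuard and Sanitize are hypothetical names, and the rules (site-rooted path only, fall back to "/") are an assumption about what the real handler enforces.

```csharp
#nullable enable

public static class ReturnUrlGuard
{
    // Returns returnUrl only when it is a site-local path; otherwise "/".
    public static string Sanitize(string? returnUrl)
    {
        const string fallback = "/";

        if (string.IsNullOrEmpty(returnUrl)) return fallback;

        // Must be rooted at this site, e.g. "/clusters/x".
        if (returnUrl[0] != '/') return fallback;

        // "//host" and "/\host" are treated by browsers as scheme-relative URLs.
        if (returnUrl.Length > 1 && (returnUrl[1] == '/' || returnUrl[1] == '\\'))
            return fallback;

        // Control characters (CR, LF, tab) can be used to smuggle another target.
        foreach (char c in returnUrl)
        {
            if (char.IsControl(c)) return fallback;
        }

        return returnUrl;
    }
}
```

For example, `ReturnUrlGuard.Sanitize("/clusters/cluster-dev")` returns the path unchanged, while `ReturnUrlGuard.Sanitize("https://evil.example/")` and `ReturnUrlGuard.Sanitize("//evil.example")` both return "/".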

Admin-006

| Field | Value |
|---|---|
| Severity | Medium |
| Category | Security |
| Location | Components/Layout/MainLayout.razor:47-49, Program.cs:129,131-135 |
| Status | Resolved |

Description: app.UseAntiforgery() is enabled, but the Sign-out form (<form method="post" action="/auth/logout">) renders no antiforgery token, and the MapPost("/auth/logout", ...) endpoint does not call .DisableAntiforgery() or otherwise opt out. Depending on framework version this either makes logout fail with a 400 for legitimate users, or — if the endpoint is treated as exempt — leaves logout as an unprotected state-changing POST (CSRF logout). The same concern applies to the login form once Admin-005 is addressed.

Recommendation: Emit an antiforgery token in the logout form and let UseAntiforgery() validate it; or explicitly and deliberately mark the endpoint .DisableAntiforgery() if a tokenless logout is intended. Verify login/logout round-trips after the change.

Resolution: Resolved 2026-05-22 — <AntiforgeryToken /> added to the sign-out form in MainLayout.razor and .DisableAntiforgery() removed from the /auth/logout endpoint so UseAntiforgery() validates the token; a tokenless POST now returns 400, preventing CSRF-logout. The login endpoint retains .DisableAntiforgery() (login is not a state-changing operation CSRF can abuse). AuthEndpointsTests.Logout_without_antiforgery_token_is_rejected regression-guards this.

Admin-007

| Field | Value |
|---|---|
| Severity | Medium |
| Category | Design-document adherence |
| Location | Components/Pages/Clusters/NewCluster.razor:91,95-96 |
| Status | Resolved |

Description: NewCluster.CreateAsync hardcodes CreatedBy = "admin-ui" (both on the ServerCluster row and the draft generation) instead of the signed-in operator principal name. admin-ui.md section "Audit" requires "the operator principal" be recorded on every write. The audit trail therefore cannot attribute cluster creation to a person. The same literal would apply to any anonymous creation that Admin-001/002 currently permit.

Recommendation: Pass the authenticated user identity (ClaimTypes.Name / NameIdentifier from the cascaded AuthenticationState) as createdBy. Apply the same pattern to every other Admin write path that records a CreatedBy/PublishedBy/ReleasedBy field.

Resolution: Resolved 2026-05-22 — NewCluster.razor and ClusterDetail.razor (the two pages that call ClusterService.CreateAsync / GenerationService.CreateDraftAsync with a hardcoded literal) now resolve ClaimTypes.Name / ClaimTypes.NameIdentifier from the cascaded AuthenticationState and pass the operator principal name as createdBy; the fallback is "unknown" (defensive, should never occur on an [Authorize]-gated page).
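
The claim-resolution order described here (Name, then NameIdentifier, then the "unknown" fallback) can be sketched as a small helper. OperatorPrincipal and ResolveName are hypothetical names; the pages in question may inline this logic differently.

```csharp
#nullable enable
using System.Security.Claims;

public static class OperatorPrincipal
{
    // Picks the value to record as createdBy / releasedBy for an audit row.
    public static string ResolveName(ClaimsPrincipal? user)
    {
        // Preferred: the human-readable account name.
        string? name = user?.FindFirst(ClaimTypes.Name)?.Value;
        if (!string.IsNullOrWhiteSpace(name)) return name;

        // Second choice: the stable identifier claim.
        string? id = user?.FindFirst(ClaimTypes.NameIdentifier)?.Value;
        if (!string.IsNullOrWhiteSpace(id)) return id;

        // Defensive fallback; should not occur on an [Authorize]-gated page.
        return "unknown";
    }
}
```

In a Blazor component the `ClaimsPrincipal` would come from the cascaded `AuthenticationState` (its `User` property), as the resolution states.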

Admin-008

| Field | Value |
|---|---|
| Severity | Medium |
| Category | Error handling & resilience |
| Location | Services/ReservationService.cs:28-37 |
| Status | Resolved |

Description: ReservationService.ReleaseAsync calls sp_ReleaseExternalIdReservation with only @Kind, @Value, @ReleaseReason. admin-ui.md section "Release an external-ID reservation" specifies the proc sets ReleasedBy to the FleetAdmin who performed the release, and the action is the only path that allows ZTag/SAPID reuse and "requires explicit FleetAdmin action with a documented reason." The service does not capture or pass the operator principal, so the compliance audit trail for a release records no actor (unless the proc derives it from the DB session login, which would be the shared service account, not the operator).

Recommendation: Add an operator-principal parameter to ReleaseAsync, pass it to the stored proc as @ReleasedBy, and have callers supply the signed-in user. Confirm the proc signature accepts it.

Resolution: Resolved 2026-05-22 — a new EF migration (20260522000001_AddReleasedByToReleaseExternalIdReservation) adds @ReleasedBy nvarchar(128) to sp_ReleaseExternalIdReservation and uses it for both ExternalIdReservation.ReleasedBy and ConfigAuditLog.Principal (replacing SUSER_SNAME()); ReservationService.ReleaseAsync gains a releasedBy parameter with a guard; Reservations.razor resolves ClaimTypes.Name / ClaimTypes.NameIdentifier from the cascaded AuthenticationState and passes the operator principal to the service.

Admin-009

| Field | Value |
|---|---|
| Severity | Medium |
| Category | Testing coverage |
| Location | src/Server/ZB.MOM.WW.OtOpcUa.Admin (whole module) |
| Status | Resolved |

Description: The module's most security-critical behaviours have no enforced test coverage at the boundary that matters. There is no test that an unauthenticated request to a page or hub is rejected (which would have caught Admin-001/002/003), no test of the login -> cookie issuance round-trip (Admin-005), and the AdminRoleGrantResolver / ClusterRoleClaims authorization logic is exercised only in isolation. InternalsVisibleTo points at ZB.MOM.WW.OtOpcUa.Admin.Tests, but the auth pipeline itself is not asserted end-to-end. Per REVIEW-PROCESS.md category 9 these are untested critical paths.

Recommendation: Add WebApplicationFactory-based integration tests asserting: (a) anonymous GET of each protected route returns 302->/login or 401; (b) anonymous hub connect is refused; (c) a valid login issues the cookie and a subsequent request is authorized; (d) a ConfigViewer is denied CanPublish pages. Wire the check into the *.Admin.Tests suite.

Resolution: Resolved 2026-05-22 — (a) covered by existing PageAuthorizationTests; (b) covered by existing AuthEndpointsTests.Anonymous_hub_negotiate_is_rejected; (c) covered by existing AuthEndpointsTests.Valid_login_issues_the_auth_cookie_and_redirects_home; (d) new AdminAuthPipelineTests adds a WebApplicationFactory with a RoleInjectingHandler that stamps requests with caller-supplied roles, asserting that ConfigViewer is denied CanPublish-gated pages (403/302) while FleetAdmin is permitted, and that a FleetAdmin session can reach protected pages.

Admin-010

| Field | Value |
|---|---|
| Severity | Low |
| Category | OtOpcUa conventions |
| Location | Components/App.razor:9,16 |
| Status | Resolved |

Description: App.razor loads Bootstrap CSS and JS from the cdn.jsdelivr.net CDN. admin-ui.md section "Tech Stack" specifies "Bootstrap 5 vendored under wwwroot/lib/bootstrap/" precisely so the Admin app has no third-party runtime dependency. A CDN reference makes the UI fail in air-gapped / locked-down fleet deployments (a stated deployment target), introduces an uncontrolled third-party origin, and is not covered by a Subresource Integrity hash.

Recommendation: Vendor Bootstrap under wwwroot/lib/bootstrap/ and reference the local copies, as the design doc requires. If a CDN is retained for any asset, add integrity + crossorigin SRI attributes.

Resolution: Resolved 2026-05-23 — Bootstrap 5.3.3 (CSS + JS bundle, plus their source maps) vendored under src/Server/ZB.MOM.WW.OtOpcUa.Admin/wwwroot/lib/bootstrap/{css,js}/; App.razor now references the local copies (lib/bootstrap/css/bootstrap.min.css, lib/bootstrap/js/bootstrap.bundle.min.js); a README under the vendor directory records provenance + upgrade steps. Covered by BootstrapVendoringTests (asserts no cdn.jsdelivr.net/cdnjs/unpkg references in App.razor, that the vendored files exist with non-trivial sizes, and that App.razor references the vendored paths) — verified failing pre-fix, passing post-fix.

Admin-011

| Field | Value |
|---|---|
| Severity | Low |
| Category | Concurrency & thread safety |
| Location | Hubs/FleetStatusPoller.cs:24-26,98-103 |
| Status | Resolved |

Description: FleetStatusPoller keeps three plain Dictionary<> fields (_last, _lastRole, _lastResilience) mutated from PollOnceAsync. The poller's ExecuteAsync loop is single-threaded so the steady-state poll path is safe, but ResetCache() (exposed as internal for tests) clears those same dictionaries with no synchronization. If a test (or any caller) invokes ResetCache() while a poll tick is mid-iteration, the Dictionary enumeration/mutation race can throw InvalidOperationException or corrupt state.

Recommendation: Either document ResetCache() as "only safe when the poller is stopped" and have tests stop the service first, or guard the three dictionaries with a lock / swap them atomically. Using ConcurrentDictionary (as the sibling ResilientLdapGroupRoleMappingService does) would make the intent explicit.

Resolution: Resolved 2026-05-23 — _last, _lastRole, and _lastResilience swapped from plain Dictionary<,> to ConcurrentDictionary<,> so concurrent ResetCache() / poll-tick mutations are safe by construction (the recommendation's "explicit intent" form). Covered by FleetStatusPollerConcurrencyTests — one test guards the structural choice via reflection so a future refactor cannot silently revert; the other stress-runs concurrent mutate + ResetCache() via reflection, verifying the race throws no exception (verified failing pre-fix with Dictionary<,>).
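
The property this resolution relies on can be demonstrated in isolation: ConcurrentDictionary tolerates writes, clears and enumeration running at the same time. The sketch below is not the FleetStatusPollerConcurrencyTests code; ResetRaceSketch and the three tasks are hypothetical stand-ins for the poll tick, ResetCache(), and a reader.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

public static class ResetRaceSketch
{
    // Returns true when concurrent writes, clears and reads all complete.
    // If any task throws, Task.WaitAll rethrows it as an AggregateException.
    public static bool Run(int iterations = 100_000)
    {
        var cache = new ConcurrentDictionary<string, int>();

        // Stand-in for the poll tick: upserts per-node state.
        Task writer = Task.Run(() =>
        {
            for (int i = 0; i < iterations; i++)
                cache[$"node-{i % 64}"] = i;
        });

        // Stand-in for ResetCache(): clears while the writer is running.
        Task resetter = Task.Run(() =>
        {
            for (int i = 0; i < iterations / 100; i++)
                cache.Clear();
        });

        // Stand-in for a reader enumerating mid-mutation.
        Task reader = Task.Run(() =>
        {
            for (int i = 0; i < iterations / 100; i++)
                foreach (var pair in cache) { _ = pair.Value; }
        });

        Task.WaitAll(writer, resetter, reader);
        return true;
    }
}
```

Note that this makes each individual operation safe; it does not make a multi-step read-then-write sequence atomic, so any logic that needs a consistent view across the three dictionaries would still need its own coordination.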

Admin-012

| Field | Value |
|---|---|
| Severity | Low |
| Category | Design-document adherence |
| Location | Services/EquipmentCsvImporter.cs:18-19,33-37,229,232 |
| Status | Resolved |

Description: EquipmentCsvImporter declares EquipmentId as a required CSV column and parses it into a required field. admin-ui.md section "Equipment CSV import" (revised after adversarial review finding #4) is explicit: "No EquipmentId column — operator-supplied EquipmentId would mint duplicate equipment identity on typos ... never accepted from CSV imports." EquipmentId is system-derived (EQ- plus first 12 hex chars of EquipmentUuid). Accepting it from CSV either contradicts the design or silently lets an import set an identity field the doc says is un-settable. The XML doc on the class also cites the column as required per "decision #117", so either the code or the design doc is stale. EquipmentImportBatchService.StageRowsAsync propagates row.EquipmentId into the staging row, so any change must cover the finalize path.

Recommendation: Reconcile with the design: drop EquipmentId from RequiredColumns and the EquipmentCsvRow shape (deriving it from EquipmentUuid at finalize time), or — if accepting it is a deliberate reversal — update admin-ui.md and the decision log so the two agree.

Resolution: Resolved 2026-05-23 — code reconciled with the design: EquipmentId dropped from EquipmentCsvImporter.RequiredColumns, BuildRow, GetCell, and the EquipmentCsvRow shape; the class XML doc now records the admin-ui.md "No EquipmentId column" rule. The finalize path is covered: EquipmentImportBatchService.StageRowsAsync now derives the staging-row's EquipmentId via DraftValidator.DeriveEquipmentId(equipmentUuid), and FinaliseBatchAsync re-derives it from the UUID that actually lands in the Equipment row (so a blank/invalid staged UUID that gets replaced by Guid.NewGuid() no longer leaves EquipmentId and EquipmentUuid out of sync). ImportEquipment.razor's textarea placeholder updated to the new header shape. Covered by EquipmentCsvNoEquipmentIdColumnTests (five tests guarding RequiredColumns/OptionalColumns/EquipmentCsvRow shape and asserting CSVs with an EquipmentId column are rejected as unknown while CSVs without are accepted) — verified failing pre-fix, passing post-fix. The existing EquipmentCsvImporterTests + EquipmentImportBatchServiceTests were updated to the new header shape and pass green (DB-backed suite ran against 10.100.0.35,14330).
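
The derivation rule quoted in the description (EQ- plus the first 12 hex characters of EquipmentUuid) can be sketched as follows. This is not the source of DraftValidator.DeriveEquipmentId, which the review does not show; EquipmentIdSketch is a hypothetical name, and the lowercase, hyphen-free hex form is an assumption.

```csharp
using System;

public static class EquipmentIdSketch
{
    // Derives a display identifier from the equipment UUID.
    public static string Derive(Guid equipmentUuid)
    {
        // "N" format: 32 lowercase hex digits, no hyphens or braces.
        string hex = equipmentUuid.ToString("N");

        // Assumption: the first 12 hex digits, taken as-is.
        return "EQ-" + hex.Substring(0, 12);
    }
}
```

Because the identifier is a pure function of the UUID, re-deriving it at finalize time (as the resolution describes) keeps the two fields in step even when the staged UUID is replaced.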

Admin-013

| Field | Value |
|---|---|
| Severity | High |
| Category | Error handling & resilience |
| Location | Components/Pages/Clusters/ClusterDetail.razor:180-197, Components/Pages/Clusters/AclsTab.razor, Components/Pages/Clusters/RedundancyTab.razor, Components/Pages/RoleGrants.razor, Components/Pages/Hosts.razor, Components/Pages/ScriptLog.razor, Program.cs:157-159 |
| Status | Resolved |

Description: The Admin-003 fix gated all three SignalR hubs with [Authorize] plus .RequireAuthorization(), but the six pages that open a client HubConnection to those hubs were never updated to authenticate. A server-side Blazor HubConnection runs inside the interactive circuit and has no access to the browser's HttpOnly OtOpcUa.Admin auth cookie, so the hub negotiate request returns 401. Four pages (ClusterDetail, AclsTab, RedundancyTab, RoleGrants) called HubConnection.StartAsync() with no try/catch, so the 401 surfaced as an unhandled exception — a full HTTP 500 page for the prerendered /clusters/{ClusterId} route (the core cluster-config surface) and a faulted circuit for the others. Hosts and ScriptLog already wrapped the connect in try/catch, so they did not crash, but the SignalR live-update feature was non-functional Admin-wide regardless. The Admin-003 hardening was therefore incomplete: it secured the hub server side without giving the in-process clients any way to present credentials. Discovered during a post-review browser smoke test of /clusters/cluster-dev.

Recommendation: Two parts. (1) Stop the crash: guard every HubConnection.StartAsync() in try/catch, matching the best-effort pattern already documented in Hosts.razor — a hub hiccup must degrade live updates, not fault the page. (2) Restore the feature: give the hub clients a real credential. Cookie forwarding is not viable (the HttpOnly cookie is unreachable from the interactive circuit and persisting it into page state would leak it), so add a token scheme — mint a short-lived token for the circuit's authenticated user and supply it via HttpConnectionOptions.AccessTokenProvider, with a matching server-side authentication handler on the hub endpoints.

Resolution: Resolved 2026-05-22 — (1) StartAsync/SendAsync wrapped in try/catch on ClusterDetail, AclsTab, RedundancyTab and RoleGrants so a hub failure degrades gracefully. (2) Added a bearer-token auth path: HubTokenService mints/validates short-lived tokens using ASP.NET Core Data Protection (no signing-key management, no new packages); HubTokenAuthenticationHandler is a custom HubToken scheme reading the token from the Authorization: Bearer header (negotiate) or the access_token query parameter (WebSocket upgrade); the HubClients authorization policy runs both the cookie and HubToken schemes and is applied via RequireAuthorization("HubClients") on all three MapHub calls; AdminHubConnectionFactory builds connections with an AccessTokenProvider that re-mints a token for the circuit's authenticated user on every (re)connect, and all six hub-consuming pages resolve their connections through it. Verified end-to-end in the browser: hub negotiate returns 200 and the WebSocket upgrades (101) where it previously 401'd.
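
The client side of the two-part fix (token supplied through AccessTokenProvider, best-effort StartAsync) might be shaped like the sketch below. It assumes the Microsoft.AspNetCore.SignalR.Client package; HubConnectSketch and the mintToken delegate are hypothetical and do not reproduce AdminHubConnectionFactory or HubTokenService.

```csharp
#nullable enable
using System;
using System.Threading.Tasks;
using Microsoft.AspNetCore.SignalR.Client;

public static class HubConnectSketch
{
    // mintToken is a hypothetical stand-in for the token-minting service.
    public static HubConnection Build(Uri hubUrl, Func<Task<string?>> mintToken)
    {
        return new HubConnectionBuilder()
            .WithUrl(hubUrl, options =>
            {
                // The client calls this before each HTTP request it makes
                // to the hub endpoint, so tokens can be short-lived.
                options.AccessTokenProvider = mintToken;
            })
            .WithAutomaticReconnect()
            .Build();
    }

    // Best-effort start: a failure degrades live updates
    // instead of faulting the page or circuit.
    public static async Task<bool> TryStartAsync(HubConnection connection)
    {
        try
        {
            await connection.StartAsync();
            return true;
        }
        catch (Exception)
        {
            // A real page would log here and keep rendering without live updates.
            return false;
        }
    }
}
```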