Files
mxaccessgw/code-reviews/Server/findings.md
T
Joseph Doherty f0a4af62b9 Review the clients/ language clients; mark Server-001/003 resolved
Adds per-module code reviews for the five language clients under
clients/ (Client.Dotnet, Client.Go, Client.Java, Client.Python,
Client.Rust) at commit 3cc53a8 — 53 findings (4 High, 15 Medium,
34 Low; all Open). Extends REVIEW-PROCESS.md so a "module" may also be
a language client under clients/, not only a src/ project.

Marks Server-001 (Critical) and Server-003 (High) Resolved — fixed in
a8aafdf — and regenerates code-reviews/README.md (now 11 modules).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 16:51:00 -04:00

17 KiB

Code Review — Server

Field Value
Module src/MxGateway.Server
Reviewer Claude Code
Review date 2026-05-18
Commit reviewed 6c64030
Status Reviewed
Open findings 12

Checklist coverage

# Category Result
1 Correctness & logic bugs Issues found: Server-006 (metrics open-session leak on alarm auto-subscribe failure), Server-010 (rotate reactivates revoked keys).
2 mxaccessgw conventions Issues found: Server-002 (orphan-worker termination on startup not implemented), Server-011 (style deviation in WorkerAlarmRpcDispatcher).
3 Concurrency & thread safety No issues found — locking is correct; inconsistent-but-safe discipline in GatewayMetrics noted only.
4 Error handling & resilience Issues found: Server-005 (Galaxy first-load can fault the host BackgroundService), Server-009 (SQLite has no busy-timeout/WAL under concurrent writes).
5 Security Issues found: Server-001 (Critical: dashboard authorization never enforced on any route), Server-003 (LDAP dashboard users denied for lack of a scope claim), Server-010.
6 Performance & resource management Issues found: Server-007 (DiscoverHierarchy paging is O(total) per page), Server-008 (WatchDeployEvents re-projects whole hierarchy per event).
7 Design-document adherence Issues found: Server-002 (orphan workers), Server-012 (CLAUDE.md scope names stale vs code/docs).
8 Code organization & conventions Issues found: Server-011 (style), Server-004 (CLI accepts unvalidated scope strings).
9 Testing coverage Issues found: Server-013 (no dashboard route-level authorization test; WorkerExecutableValidator, GalaxyGlobMatcher, projector paging untested).
10 Documentation & comments Issues found: Server-014 (stale "not yet wired" alarm comments), Server-012.

Findings

Server-001

Field Value
Severity Critical
Category Security
Location src/MxGateway.Server/GatewayApplication.cs:147-149, src/MxGateway.Server/Dashboard/DashboardEndpointRouteBuilderExtensions.cs:55-58, src/MxGateway.Server/Dashboard/Components/Routes.razor:1-15
Status Resolved

Description: The dashboard authorization policy (DashboardAuthenticationDefaults.AuthorizationPolicy), DashboardAuthorizationRequirement, and DashboardAuthorizationHandler are registered in DI but never applied to any endpoint. MapRazorComponents<App>() has no .RequireAuthorization(...), the <Router> in Routes.razor uses plain RouteView (not AuthorizeRouteView), and no dashboard page carries [Authorize] — a module-wide grep finds zero RequireAuthorization/[Authorize]/AuthorizeRouteView usages. Every dashboard page (Sessions, Workers, Events, Galaxy, Settings, and the API Keys list exposing key IDs, scopes, and constraints) is reachable by any unauthenticated remote client regardless of Dashboard:AllowAnonymousLocalhost or Dashboard:RequireAdminScope. Only the API-key mutation operations remain protected, via the separate DashboardApiKeyManagementService.CanManage check.

Recommendation: Apply the policy at the route level — endpoints.MapRazorComponents<App>().AddInteractiveServerRenderMode().RequireAuthorization(DashboardAuthenticationDefaults.AuthorizationPolicy) — and/or switch Routes.razor to AuthorizeRouteView with a [Authorize] fallback policy plus a NotAuthorized redirect to the login page. Add an integration test that GETs a dashboard page anonymously and asserts 302-to-login / 401.

Resolution: Resolved in a8aafdf (2026-05-18): MapRazorComponents<App>() now calls .RequireAuthorization(DashboardAuthenticationDefaults.AuthorizationPolicy), so an unauthenticated request to any dashboard component route is challenged by the cookie scheme and redirected to the login page. GatewayApplicationTests gained ComponentRoutesRequireAuthorization (component routes carry the policy) and AuthEndpointsAllowAnonymousAccess, replacing the prior test that asserted the insecure behavior.

Server-002

Field Value
Severity Medium
Category Design-document adherence
Location src/MxGateway.Server/Program.cs:24, src/MxGateway.Server/GatewayApplication.cs
Status Open

Description: gateway.md:583 and CLAUDE.md state the first version "terminates orphaned workers on startup." No code in MxGateway.Server enumerates or kills leftover MxGateway.Worker.exe processes at startup — a grep for orphan/reattach/terminate finds nothing. After an unclean gateway crash, x86 worker processes (each holding an MXAccess COM instance) leak and survive indefinitely, and a restarted gateway does not reclaim or kill them.

Recommendation: Add a startup hosted service that finds and kills stale worker processes (by executable path / a well-known argument or environment marker) before the server accepts sessions, or update the design docs if reattachment/cleanup is deliberately deferred.

Resolution: (open)

Server-003

Field Value
Severity High
Category Security
Location src/MxGateway.Server/Dashboard/DashboardAuthorizationHandler.cs:39,54-59, src/MxGateway.Server/Dashboard/DashboardAuthenticator.cs:236-258
Status Resolved

Description: When Dashboard:RequireAdminScope is true (the default) and the request is not loopback, DashboardAuthorizationHandler succeeds only if HasAdminScope finds a claim of type "scope" with value "admin". But DashboardAuthenticator.CreatePrincipal issues only NameIdentifier, Name, and LdapGroupClaimType claims — never a scope/admin claim. So a correctly LDAP-authenticated user who passed the required-group check is still denied dashboard access on any non-loopback connection. The bug is currently masked by the missing route-level enforcement (Server-001) and by AllowAnonymousLocalhost; fixing Server-001 would make the dashboard unusable for all real LDAP logins.

Recommendation: Either have DashboardAuthenticator.CreatePrincipal add a scope=admin claim when the user is in the required group, or change DashboardAuthorizationHandler.HasAdminScope to evaluate LDAP group membership (reuse IsMemberOfRequiredGroup against the LdapGroupClaimType claims, as DashboardApiKeyAuthorization.CanManage already does).

Resolution: Resolved in a8aafdf (2026-05-18): DashboardAuthenticator.CreatePrincipal — reached only after the required-group check passes — now emits the scope=admin claim that DashboardAuthorizationHandler checks, so group-validated LDAP users pass RequireAdminScope once route-level authorization (Server-001) is enforced.

Server-004

Field Value
Severity Medium
Category Code organization & conventions
Location src/MxGateway.Server/Security/Authentication/ApiKeyAdminCommandLineParser.cs:227-233, src/MxGateway.Server/Security/Authentication/ApiKeyAdminCliRunner.cs:53-77, src/MxGateway.Server/Dashboard/DashboardApiKeyManagementService.cs:21-67
Status Open

Description: ParseScopes accepts any comma-separated strings and CreateKeyAsync persists them verbatim; neither the CLI nor the dashboard create path validates scopes against GatewayScopes. A typo or non-canonical name (e.g. CLAUDE.md's example --scopes session,invoke,event,metadata,admin, which does not match the resolver's session:open/invoke:read/etc.) silently creates a key whose scope strings the authorization resolver never checks for — the key is unusable for those RPCs with no error at creation time.

Recommendation: Validate every requested scope against the GatewayScopes catalog at create time in both the CLI parser/runner and DashboardApiKeyManagementService.ValidateCreateRequest, rejecting unknown scope strings.

Resolution: (open)

Server-005

Field Value
Severity Medium
Category Error handling & resilience
Location src/MxGateway.Server/Galaxy/GalaxyHierarchyRefreshService.cs:22-28, src/MxGateway.Server/Galaxy/GalaxyHierarchyCache.cs:184
Status Open

Description: GalaxyHierarchyCache.RefreshCoreAsync only catches SqlException and InvalidOperationException. The initial cache.RefreshAsync call in GalaxyHierarchyRefreshService.ExecuteAsync is wrapped only for OperationCanceledException. A transient non-SqlException failure on the first refresh (e.g. a Win32Exception/TimeoutException from connection establishment, or another DbException subtype) escapes both layers, faults the BackgroundService, and — with default host behavior — stops the whole gateway. The periodic-tick loop does catch general exceptions, so only the first load is exposed.

Recommendation: Broaden the catch in RefreshCoreAsync to all non-cancellation exceptions (record Unavailable/Stale and still complete _firstLoad), or wrap the initial RefreshAsync in GalaxyHierarchyRefreshService with the same general catch the tick loop uses.

Resolution: (open)

Server-006

Field Value
Severity Medium
Category Correctness & logic bugs
Location src/MxGateway.Server/Sessions/SessionManager.cs:84-114
Status Open

Description: In OpenSessionAsync, _metrics.SessionOpened() (line 89) increments the _openSessions gauge before TryAutoSubscribeAlarmsAsync runs. If auto-subscribe throws (which it does when Alarms.RequireSubscribeOnOpen is true and the worker rejects the subscription), the catch block disposes and removes the session and records _metrics.Fault(...) but never calls SessionClosed/SessionRemoved. The mxgateway.sessions.open gauge permanently over-counts by one for every such failed open.

Recommendation: In the catch block, when the session had reached the point where SessionOpened() was recorded, also call _metrics.SessionRemoved() — or move the SessionOpened() call to after auto-subscribe succeeds.

Resolution: (open)

Server-007

Field Value
Severity Low
Category Performance & resource management
Location src/MxGateway.Server/Galaxy/GalaxyHierarchyProjector.cs:55-70
Status Open

Description: Project always iterates the full entry.Index.ObjectViews collection and re-applies all filters to skip offset matched items before collecting a page. Paging through a large Galaxy hierarchy is therefore O(total) per page and O(total²/pageSize) end-to-end. The cache is in-memory so impact is bounded, but for large galaxies repeated DiscoverHierarchy pagination wastes CPU.

Recommendation: Precompute and cache the filtered, ordered view list per (filterSignature, sequence) so subsequent pages are an O(pageSize) slice; the existing filter signature already keys page tokens.

Resolution: (open)

Server-008

Field Value
Severity Low
Category Performance & resource management
Location src/MxGateway.Server/Grpc/GalaxyRepositoryGrpcService.cs:111-134,160-189
Status Open

Description: WatchDeployEvents calls ResolveBrowseSubtrees() on every streamed event, and MapDeployEvent re-runs GalaxyHierarchyProjector.Project over the entire cached hierarchy (and Sums attribute counts) for every event of every constrained subscriber. GalaxyGlobMatcher.IsMatch also rebuilds the glob regex on each call. With many constrained subscribers and frequent deploys this is avoidable work.

Recommendation: Hoist ResolveBrowseSubtrees() out of the loop; compute scoped object/attribute counts once per deploy sequence and cache by (sequence, browseSubtrees); cache compiled glob Regex instances in GalaxyGlobMatcher.

Resolution: (open)

Server-009

Field Value
Severity Low
Category Error handling & resilience
Location src/MxGateway.Server/Security/Authentication/AuthSqliteConnectionFactory.cs:15-32
Status Open

Description: Each auth-store operation opens a fresh SqliteConnection with no busy timeout, no WAL journal mode, and default journaling. MarkKeyUsedAsync runs on every authenticated request and SqliteApiKeyAuditStore appends on every denial; under concurrent load these writers can collide and surface SQLITE_BUSY as a hard failure on the request path.

Recommendation: Set Pooling, a non-zero DefaultTimeout/busy_timeout, and enable WAL (PRAGMA journal_mode=WAL) once at startup so concurrent readers/writers degrade gracefully.

Resolution: (open)

Server-010

Field Value
Severity Low
Category Security
Location src/MxGateway.Server/Security/Authentication/SqliteApiKeyAdminStore.cs:91-114, src/MxGateway.Server/Dashboard/Components/Pages/ApiKeysPage.razor:168-172
Status Open

Description: RotateAsync sets revoked_utc = NULL, so rotating a previously revoked key silently reactivates it. This is documented intentional behavior in docs/Authentication.md:167, but the dashboard renders the "Rotate" button unconditionally — including for keys whose status badge says "Revoked" — so an operator can un-revoke a deliberately disabled key without an explicit warning.

Recommendation: Either hide/disable the Rotate action for revoked keys in ApiKeysPage.razor, require an explicit confirmation, or have RotateAsync preserve revoked_utc and add a separate explicit "reactivate" operation.

Resolution: (open)

Server-011

Field Value
Severity Low
Category Code organization & conventions
Location src/MxGateway.Server/Sessions/WorkerAlarmRpcDispatcher.cs:1-46
Status Open

Description: WorkerAlarmRpcDispatcher deviates from the module's conventions: it fully-qualifies System.Guid, System.ArgumentNullException, and System.Threading types inline instead of relying on using directives, and uses an explicit constructor with this.-qualified field assignment while the rest of the module (e.g. ConstraintEnforcer, MxAccessGatewayService, GalaxyRepositoryGrpcService) uses primary constructors. docs/style-guides/CSharpStyleGuide.md is authoritative for gateway code.

Recommendation: Add the needed using directives, drop the inline fully-qualified names, and convert to a primary constructor for consistency.

Resolution: (open)

Server-012

Field Value
Severity Low
Category Documentation & comments
Location CLAUDE.md (Authentication section and apikey create example)
Status Open

Description: CLAUDE.md describes scopes as session, invoke, event, metadata, admin and shows apikey create --scopes session,invoke,event,metadata,admin. The actual canonical scope strings (used by GatewayScopes, GatewayGrpcScopeResolver, and docs/Authorization.md) are session:open, session:close, invoke:read, invoke:write, invoke:secure, events:read, metadata:read, admin. A key created per the CLAUDE.md example carries scopes the resolver never matches.

Recommendation: Update CLAUDE.md's scope list and the apikey example to the canonical *:* scope strings, per CLAUDE.md's own rule that docs change with the code.

Resolution: (open)

Server-013

Field Value
Severity Low
Category Testing coverage
Location src/MxGateway.Tests/Gateway/Dashboard/DashboardAuthorizationHandlerTests.cs, src/MxGateway.Tests/Gateway/GatewayApplicationTests.cs
Status Open

Description: DashboardAuthorizationHandler is unit-tested in isolation, but no test exercises the dashboard routes end-to-end to confirm the policy is actually enforced — which is why Server-001 (policy registered but never wired) went uncaught. There are also no tests for WorkerExecutableValidator (PE-header architecture parsing), GalaxyGlobMatcher (anchoring/escaping/empty-glob fail-open), or GalaxyHierarchyProjector pagination/page-token behavior.

Recommendation: Add a WebApplicationFactory integration test that requests a dashboard page unauthenticated and asserts the redirect/401, plus unit tests for WorkerExecutableValidator, GalaxyGlobMatcher, and projector paging.

Resolution: (open)

Server-014

Field Value
Severity Low
Category Documentation & comments
Location src/MxGateway.Server/Grpc/MxAccessGatewayService.cs:162-171,191-198,206-214,229-237
Status Open

Description: The XML <remarks> and inline comments on AcknowledgeAlarm and QueryActiveAlarms describe the alarm path as not yet wired and say NotWiredAlarmRpcDispatcher is the default ("Clients calling this method today receive an OK reply with a 'worker alarm path not yet wired' diagnostic", "an empty stream until PR A.2"). In fact SessionServiceCollectionExtensions.AddGatewaySessions registers WorkerAlarmRpcDispatcher as IAlarmRpcDispatcher, so DI always injects the production dispatcher; NotWiredAlarmRpcDispatcher is only the null fallback. The comments are stale and misleading.

Recommendation: Update the AcknowledgeAlarm/QueryActiveAlarms remarks to reflect that WorkerAlarmRpcDispatcher is the wired default, and describe its actual GUID-vs-Provider!Group.Tag handling.

Resolution: (open)