mxaccessgw/code-reviews/Server/findings.md
Joseph Doherty a0203503a7 Code-review 2026-05-20 sweep: re-review at 1cd51bb, resolve 72 findings across all 11 modules
Re-reviewed every module/client against the 10-category checklist
(REVIEW-PROCESS.md) at commit 1cd51bb, filed 72 new findings, and
fixed them in three priority waves (3 High, 17 Medium, 52 Low).

Highs
- Server-017: enumerate AcknowledgeAlarm / QueryActiveAlarms in
  GatewayGrpcScopeResolver so non-admin keys can use them; document
  the mapping in docs/Authorization.md; add interceptor tests.
- Client.Java-013: add the five missing bulk-method stubs to the
  CLI FakeSession so the test module compiles on a clean tree.
- Client.Rust-013: fix the clippy::doc_lazy_continuation regression
  in generated tonic code by reformatting the ReadBulkCommand proto
  comment and scoping a #![allow(...)] to the generated submodules.

Mediums (highlights)
- Server: unify GatewaySession state-lock discipline (-015) and
  make DisposeAsync race-safe against in-flight CloseAsync (-016);
  add constraint-enforcement test coverage for the bulk-plan path
  (-021).
- Worker: introduce StaRuntimeShutdownException so RunAlarmPollLoop
  can distinguish graceful shutdown from a real STA-affinity
  violation (-016); have the watchdog skip StaHung while
  CurrentCommandCorrelationId is non-empty so a legitimate slow
  ReadBulk no longer self-faults (-017).
- Tests: add per-method round-trip + cancellation coverage for the
  11 GatewaySession bulk methods (-013); replace the real TCP probe
  in GalaxyHierarchyCacheTests with an IGalaxyRepository fake
  (-016).
- IntegrationTests: drive the StreamEvents writer in the live Write
  test and assert OnWriteComplete (-012); add live tests for
  Unadvise/RemoveItem/Unregister ordering, WriteSecured, and
  abnormal worker exit (-014).
- Worker.Tests: replace MxAccessSession reflection with an internal
  CreateForTesting factory (-016); cover WorkerCancel and
  unexpected-body envelope branches (-017).
- Client.Java: cancel MxEventStream when close() races
  beforeStart() (-014); return a CancellingCompletableFuture that
  actually forwards cancellation through .thenApply chains (-015).
- Client.Python: drop the silent localhost-plaintext downgrade in
  the CLI; require explicit --plaintext (-013).
- Client.Rust: stop bench-read-bulk from polluting success-latency
  histograms with failed-call durations (-015); add coverage for
  the five MalformedReply paths, the bulk-write helpers, the
  Error::Unavailable mapping, and the unary-fault path (-016).
- Contracts: extend docs/Contracts.md with the bulk read/write
  command family (-009).

Lows (highlights)
- Server: cap GalaxyGlobMatcher.RegexCache; align
  WorkerAlarmRpcDispatcher missing-session handling; drop the
  duplicate dashboard @page routes; refresh IAlarmRpcDispatcher
  XML doc.
- Worker: surface SetXmlAlarmQuery COM failures; remove dead
  subscriptionExpression / ExecutingCommand arms; preserve
  factory-supplied runtime sessions; split MxAlarmSnapshot.cs into
  three files.
- Tests: dispose the WebApplication in seven test classes; rebuild
  FakeWorkerProcess.WaitForExitAsync against a real TaskCompletion
  source; switch the heartbeat-expires test to ManualTimeProvider;
  add InvariantCulture to the remaining DateTimeOffset.Parse sites;
  document GalaxyFilterInputSafetyTests in GatewayTesting.md.
- IntegrationTests: comment fixes, RecordingServerStreamWriter
  IDisposable, class-level [Trait], single-source ZB default
  connection string.
- Worker.Tests: replace silent-return gating with LiveMxAccessFact
  so absent env vars SKIP not pass; PascalCase rename of probe
  [Fact]s; deterministic deadline test; new frame-protocol error
  tests; ComputeTransitions diff-coverage; relocate dev-rig probes
  to Probes/.
- Contracts: add round-trip coverage and per-field redaction /
  Galaxy-identifier comments to the protos.
- Client.Dotnet: introduce clients/dotnet/Directory.Build.props so
  TreatWarningsAsErrors / analysers apply; document
  DiscoverHierarchyOptions and IMxGatewayCliClient; require typed
  bulk-read handles in CLI; surface AcknowledgeAlarm transport
  faults through Translate().
- Client.Go: kill dead code in alarms_test / fakeGalaxyServer /
  runWriteBulkVariant; document the six new subcommands in
  writeUsage; drain galaxy-watch events on limit; switch io.EOF
  comparisons to errors.Is.
- Client.Java: shared shutdown helpers + new shutdownTimeout
  option; regex-based credential redaction; Long.toUnsignedString
  for uint64 sequence; doc fixes.
- Client.Python: combine duplicate imports; add coverage for
  _percentile / bench-read-bulk / MAX_AGGREGATE_EVENTS /
  _api_key_from_env; populate pyproject metadata and ship py.typed.
- Client.Rust: expose next_correlation_id() so CLI ping/close
  stop hard-coding correlation IDs; resync RustClientDesign.md
  with the current Session / Error surface and CLI subcommand set.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-20 09:46:47 -04:00


Code Review — Server

Field Value
Module src/MxGateway.Server
Reviewer Claude Code
Review date 2026-05-20
Commit reviewed 1cd51bb
Status Reviewed
Open findings 0

Checklist coverage

This table summarizes the 2026-05-20 review pass at commit 1cd51bb. Findings from prior passes (Server-001 through Server-014) are all closed and remain below as audit history.

| # | Category | Result |
|---|---|---|
| 1 | Correctness & logic bugs | Issues found: Server-019 (WorkerAlarmRpcDispatcher.QueryActiveAlarmsAsync yields silently when session is missing). |
| 2 | mxaccessgw conventions | No issues found; convention drift previously called out is resolved and no new gaps were observed. |
| 3 | Concurrency & thread safety | Issues found: Server-015 (GatewaySession._state is written under _closeLock but read/written elsewhere under _syncRoot). |
| 4 | Error handling & resilience | Issues found: Server-016 (GatewaySession.DisposeAsync disposes the close-lock semaphore while it may be held). |
| 5 | Security | Issues found: Server-017 (AcknowledgeAlarm / QueryActiveAlarms fall through to admin-only scope because the resolver was not updated for the new alarm RPCs). |
| 6 | Performance & resource management | Issues found: Server-018 (GalaxyGlobMatcher regex cache is unbounded; currently low-risk but uncapped). |
| 7 | Design-document adherence | No issues found at this pass. |
| 8 | Code organization & conventions | Issues found: Server-020 (dashboard pages each declare two @page directives, @page "/X" AND @page "/dashboard/X", producing duplicate routes under the /dashboard group prefix). |
| 9 | Testing coverage | Issues found: Server-021 (MxAccessGatewayService.ApplyConstraintsAsync and the new BulkConstraintPlan / ReadBulkConstraintPlan / WriteBulkConstraintPlan / SubscribeBulkConstraintPlan merge logic is entirely untested). |
| 10 | Documentation & comments | Issues found: Server-022 (IAlarmRpcDispatcher XML doc still describes the dispatcher as "ships a not-yet-wired default"; stale after Server-014). |

Findings

Server-001

Field Value
Severity Critical
Category Security
Location src/MxGateway.Server/GatewayApplication.cs:147-149, src/MxGateway.Server/Dashboard/DashboardEndpointRouteBuilderExtensions.cs:55-58, src/MxGateway.Server/Dashboard/Components/Routes.razor:1-15
Status Resolved

Description: The dashboard authorization policy (DashboardAuthenticationDefaults.AuthorizationPolicy), DashboardAuthorizationRequirement, and DashboardAuthorizationHandler are registered in DI but never applied to any endpoint. MapRazorComponents<App>() has no .RequireAuthorization(...), the <Router> in Routes.razor uses plain RouteView (not AuthorizeRouteView), and no dashboard page carries [Authorize] — a module-wide grep finds zero RequireAuthorization/[Authorize]/AuthorizeRouteView usages. Every dashboard page (Sessions, Workers, Events, Galaxy, Settings, and the API Keys list exposing key IDs, scopes, and constraints) is reachable by any unauthenticated remote client regardless of Dashboard:AllowAnonymousLocalhost or Dashboard:RequireAdminScope. Only the API-key mutation operations remain protected, via the separate DashboardApiKeyManagementService.CanManage check.

Recommendation: Apply the policy at the route level with endpoints.MapRazorComponents<App>().AddInteractiveServerRenderMode().RequireAuthorization(DashboardAuthenticationDefaults.AuthorizationPolicy), and/or switch Routes.razor to AuthorizeRouteView with an [Authorize] fallback policy plus a NotAuthorized redirect to the login page. Add an integration test that GETs a dashboard page anonymously and asserts a 302 redirect to login or a 401.

Resolution: Resolved in a8aafdf (2026-05-18): MapRazorComponents<App>() now calls .RequireAuthorization(DashboardAuthenticationDefaults.AuthorizationPolicy), so an unauthenticated request to any dashboard component route is challenged by the cookie scheme and redirected to the login page. GatewayApplicationTests gained ComponentRoutesRequireAuthorization (component routes carry the policy) and AuthEndpointsAllowAnonymousAccess, replacing the prior test that asserted the insecure behavior.

Server-002

Field Value
Severity Medium
Category Design-document adherence
Location src/MxGateway.Server/Program.cs:24, src/MxGateway.Server/GatewayApplication.cs
Status Resolved

Description: gateway.md:583 and CLAUDE.md state the first version "terminates orphaned workers on startup." No code in MxGateway.Server enumerates or kills leftover MxGateway.Worker.exe processes at startup — a grep for orphan/reattach/terminate finds nothing. After an unclean gateway crash, x86 worker processes (each holding an MXAccess COM instance) leak and survive indefinitely, and a restarted gateway does not reclaim or kill them.

Recommendation: Add a startup hosted service that finds and kills stale worker processes (by executable path / a well-known argument or environment marker) before the server accepts sessions, or update the design docs if reattachment/cleanup is deliberately deferred.

Resolution: Resolved 2026-05-18. Confirmed against source: no code path enumerated or killed leftover workers. Added IRunningProcessInspector / SystemRunningProcessInspector (a testable seam over Process.GetProcessesByName/Kill), OrphanWorkerTerminator (kills processes matched by the configured worker executable path, or by image name when the x64 gateway cannot introspect the x86 worker's MainModule, skipping the current process and tolerating per-process kill failures), and OrphanWorkerCleanupHostedService (best-effort IHostedService). The hosted service is registered in AddWorkerProcessLauncher ahead of AddGatewaySessions so cleanup runs before the server accepts sessions. gateway.md updated to describe the implemented behavior. Regression tests: OrphanWorkerTerminatorTests (KillsWorkerProcessesMatchingConfiguredExecutablePath, KillsImageNameMatchWhenExecutablePathUnreadable, DoesNotKillUnrelatedProcessSharingImageName, DoesNotKillCurrentProcess, ContinuesWhenOneKillThrows).
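The matching rules described in this resolution can be sketched against a fake process inspector. This is an illustration only: the real IRunningProcessInspector and OrphanWorkerTerminator signatures are not shown in this review, so every type and member name below is hypothetical, and the path comparison (case-insensitive) is an assumption.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical process snapshot; ExecutablePath is null when it cannot be read.
public sealed record ProcessInfoSketch(int Id, string? ExecutablePath);

// Hypothetical seam standing in for IRunningProcessInspector.
public interface IProcessInspectorSketch
{
    IReadOnlyList<ProcessInfoSketch> FindByImageName(string imageName);
    void Kill(int processId);
}

public static class OrphanTerminatorSketch
{
    // Returns the number of processes that were killed.
    public static int TerminateOrphans(
        IProcessInspectorSketch inspector, string workerExecutablePath, int currentProcessId)
    {
        var imageName = Path.GetFileNameWithoutExtension(workerExecutablePath);
        var killed = 0;
        foreach (var process in inspector.FindByImageName(imageName))
        {
            if (process.Id == currentProcessId) continue; // never kill ourselves

            // A readable path must match the configured worker; an unreadable
            // path (null) falls back to the image-name match already applied.
            if (process.ExecutablePath is not null &&
                !string.Equals(process.ExecutablePath, workerExecutablePath,
                    StringComparison.OrdinalIgnoreCase))
            {
                continue;
            }

            try { inspector.Kill(process.Id); killed++; }
            catch (Exception) { /* best effort: one failed kill must not stop cleanup */ }
        }
        return killed;
    }
}

// In-memory fake so the rules can be exercised without real processes.
public sealed class FakeInspectorSketch : IProcessInspectorSketch
{
    private readonly List<ProcessInfoSketch> _processes;
    public FakeInspectorSketch(params ProcessInfoSketch[] processes) => _processes = new(processes);
    public HashSet<int> ThrowOnKill { get; } = new();
    public List<int> Killed { get; } = new();
    public IReadOnlyList<ProcessInfoSketch> FindByImageName(string imageName) => _processes;
    public void Kill(int processId)
    {
        if (ThrowOnKill.Contains(processId)) throw new InvalidOperationException("access denied");
        Killed.Add(processId);
    }
}
```

Keeping process enumeration behind an interface is what lets the five regression tests named above run without spawning real workers.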

Server-003

Field Value
Severity High
Category Security
Location src/MxGateway.Server/Dashboard/DashboardAuthorizationHandler.cs:39,54-59, src/MxGateway.Server/Dashboard/DashboardAuthenticator.cs:236-258
Status Resolved

Description: When Dashboard:RequireAdminScope is true (the default) and the request is not loopback, DashboardAuthorizationHandler succeeds only if HasAdminScope finds a claim of type "scope" with value "admin". But DashboardAuthenticator.CreatePrincipal issues only NameIdentifier, Name, and LdapGroupClaimType claims — never a scope/admin claim. So a correctly LDAP-authenticated user who passed the required-group check is still denied dashboard access on any non-loopback connection. The bug is currently masked by the missing route-level enforcement (Server-001) and by AllowAnonymousLocalhost; fixing Server-001 would make the dashboard unusable for all real LDAP logins.

Recommendation: Either have DashboardAuthenticator.CreatePrincipal add a scope=admin claim when the user is in the required group, or change DashboardAuthorizationHandler.HasAdminScope to evaluate LDAP group membership (reuse IsMemberOfRequiredGroup against the LdapGroupClaimType claims, as DashboardApiKeyAuthorization.CanManage already does).

Resolution: Resolved in a8aafdf (2026-05-18): DashboardAuthenticator.CreatePrincipal — reached only after the required-group check passes — now emits the scope=admin claim that DashboardAuthorizationHandler checks, so group-validated LDAP users pass RequireAdminScope once route-level authorization (Server-001) is enforced.
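A minimal sketch of the claim mismatch and its fix, using only System.Security.Claims. The "scope"/"admin" pair comes from the description above; the "ldap_group" claim type and the class and method names are placeholders, not the gateway's actual identifiers.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Security.Claims;

public static class DashboardPrincipalSketch
{
    // Assumed to be called only after the required-group check has passed.
    public static ClaimsPrincipal CreatePrincipal(string userName, IEnumerable<string> groups)
    {
        var claims = new List<Claim>
        {
            new(ClaimTypes.NameIdentifier, userName),
            new(ClaimTypes.Name, userName),
            // The claim the authorization handler looks for; omitting it is
            // the bug described in Server-003.
            new("scope", "admin"),
        };
        // "ldap_group" is a placeholder for LdapGroupClaimType.
        claims.AddRange(groups.Select(group => new Claim("ldap_group", group)));
        return new ClaimsPrincipal(new ClaimsIdentity(claims, authenticationType: "DashboardSketch"));
    }

    // Mirrors the handler-side check: a claim of type "scope" with value "admin".
    public static bool HasAdminScope(ClaimsPrincipal user) => user.HasClaim("scope", "admin");
}
```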

Server-004

Field Value
Severity Medium
Category Code organization & conventions
Location src/MxGateway.Server/Security/Authentication/ApiKeyAdminCommandLineParser.cs:227-233, src/MxGateway.Server/Security/Authentication/ApiKeyAdminCliRunner.cs:53-77, src/MxGateway.Server/Dashboard/DashboardApiKeyManagementService.cs:21-67
Status Resolved

Description: ParseScopes accepts any comma-separated strings and CreateKeyAsync persists them verbatim; neither the CLI nor the dashboard create path validates scopes against GatewayScopes. A typo or non-canonical name (e.g. CLAUDE.md's example --scopes session,invoke,event,metadata,admin, which does not match the resolver's session:open/invoke:read/etc.) silently creates a key whose scope strings the authorization resolver never checks for — the key is unusable for those RPCs with no error at creation time.

Recommendation: Validate every requested scope against the GatewayScopes catalog at create time in both the CLI parser/runner and DashboardApiKeyManagementService.ValidateCreateRequest, rejecting unknown scope strings.

Resolution: Resolved 2026-05-18. Confirmed against source: ParseScopes split unvalidated strings into the create command and ValidateCreateRequest checked only key id and display name. Added GatewayScopes.All (the canonical scope catalog) and GatewayScopes.IsKnown(string). ApiKeyAdminCommandLineParser.Parse now runs ValidateScopes for create-key commands and fails the parse listing the unknown scope(s) and valid set; DashboardApiKeyManagementService.ValidateCreateRequest rejects requests carrying any non-canonical scope. Revoke/rotate paths are unaffected (no scope input). Regression tests: ApiKeyAdminCommandLineParserTests.Parse_CreateKeyCommand_RejectsUnknownScope, Parse_CreateKeyCommand_AcceptsAllCanonicalScopes, and DashboardApiKeyManagementServiceTests.CreateAsync_UnknownScope_DoesNotCallStore.
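As an illustration of create-time validation, the sketch below checks a comma-separated scope list against the eight canonical strings listed in Server-012. The class and method names are hypothetical stand-ins for GatewayScopes.All / IsKnown, and the ordinal (case-sensitive) comparison is an assumption.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the GatewayScopes catalog.
public static class ScopeCatalogSketch
{
    public static readonly IReadOnlySet<string> All = new HashSet<string>(StringComparer.Ordinal)
    {
        "session:open", "session:close",
        "invoke:read", "invoke:write", "invoke:secure",
        "events:read", "metadata:read", "admin",
    };

    public static bool IsKnown(string scope) => All.Contains(scope);

    // Splits a --scopes style value and returns the entries that are not canonical.
    public static IReadOnlyList<string> FindUnknown(string commaSeparated) =>
        commaSeparated
            .Split(',', StringSplitOptions.RemoveEmptyEntries | StringSplitOptions.TrimEntries)
            .Where(scope => !IsKnown(scope))
            .ToList();
}
```

With this shape, the old CLAUDE.md example `session,invoke,event,metadata,admin` yields four unknown entries, so a parser can fail fast and list them alongside the valid set.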

Server-005

Field Value
Severity Medium
Category Error handling & resilience
Location src/MxGateway.Server/Galaxy/GalaxyHierarchyRefreshService.cs:22-28, src/MxGateway.Server/Galaxy/GalaxyHierarchyCache.cs:184
Status Resolved

Description: GalaxyHierarchyCache.RefreshCoreAsync only catches SqlException and InvalidOperationException. The initial cache.RefreshAsync call in GalaxyHierarchyRefreshService.ExecuteAsync is wrapped only for OperationCanceledException. A transient non-SqlException failure on the first refresh (e.g. a Win32Exception/TimeoutException from connection establishment, or another DbException subtype) escapes both layers, faults the BackgroundService, and — with default host behavior — stops the whole gateway. The periodic-tick loop does catch general exceptions, so only the first load is exposed.

Recommendation: Broaden the catch in RefreshCoreAsync to all non-cancellation exceptions (record Unavailable/Stale and still complete _firstLoad), or wrap the initial RefreshAsync in GalaxyHierarchyRefreshService with the same general catch the tick loop uses.

Resolution: Resolved 2026-05-18. Confirmed against source: the initial RefreshAsync in ExecuteAsync was guarded only for OperationCanceledException, and RefreshCoreAsync filtered its catch to SqlException or InvalidOperationException. Both recommended layers applied: GalaxyHierarchyRefreshService.ExecuteAsync now catches every non-cancellation exception on the initial load (logs a warning; the periodic tick retries), and GalaxyHierarchyCache.RefreshCoreAsync broadens its catch to all non-cancellation exceptions so the cache still records Stale/Unavailable and completes _firstLoad. The now-unused Microsoft.Data.SqlClient using was removed. Regression test: GalaxyHierarchyRefreshServiceTests.ExecuteAsync_WhenFirstRefreshThrowsNonCancellationException_DoesNotFaultBackgroundService.
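The "catch every non-cancellation exception" pattern can be expressed with a C# exception filter. The helper below is a hedged sketch, not the service's code: the method name and the console logging are placeholders for whatever GalaxyHierarchyRefreshService actually does.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class FirstLoadGuardSketch
{
    // Runs the initial refresh. Returns true on success and false when a
    // non-cancellation failure was swallowed so a later tick can retry.
    // Cancellation is deliberately allowed to propagate.
    public static async Task<bool> TryInitialRefreshAsync(
        Func<CancellationToken, Task> refresh, CancellationToken cancellationToken)
    {
        try
        {
            await refresh(cancellationToken).ConfigureAwait(false);
            return true;
        }
        catch (Exception ex) when (ex is not OperationCanceledException)
        {
            // Placeholder for a logged warning.
            Console.Error.WriteLine($"Initial refresh failed ({ex.GetType().Name}); retrying on next tick.");
            return false;
        }
    }
}
```

The filter keeps shutdown semantics intact: OperationCanceledException and its subclasses bypass the handler, while a TimeoutException or Win32Exception no longer faults the background service.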

Server-006

Field Value
Severity Medium
Category Correctness & logic bugs
Location src/MxGateway.Server/Sessions/SessionManager.cs:84-114
Status Resolved

Description: In OpenSessionAsync, _metrics.SessionOpened() (line 89) increments the _openSessions gauge before TryAutoSubscribeAlarmsAsync runs. If auto-subscribe throws (which it does when Alarms.RequireSubscribeOnOpen is true and the worker rejects the subscription), the catch block disposes and removes the session and records _metrics.Fault(...) but never calls SessionClosed/SessionRemoved. The mxgateway.sessions.open gauge permanently over-counts by one for every such failed open.

Recommendation: In the catch block, when the session had reached the point where SessionOpened() was recorded, also call _metrics.SessionRemoved() — or move the SessionOpened() call to after auto-subscribe succeeds.

Resolution: Resolved 2026-05-18. Confirmed against source: the catch block in OpenSessionAsync recorded Fault(...) and removed the session but never decremented the open-session gauge after SessionOpened() had run. Added a sessionOpenedRecorded flag set immediately after _metrics.SessionOpened(); the catch block now calls _metrics.SessionRemoved() when that flag is set, restoring the gauge for a post-SessionOpened() failure (e.g. an auto-subscribe rejection with RequireSubscribeOnOpen=true). Regression test: SessionManagerAlarmAutoSubscribeTests.OpenSessionAsync_DoesNotLeakOpenSessionGauge_WhenAutoSubscribeFailsWithRequireOn.
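The flag-based compensation can be shown in isolation. In this sketch plain counters stand in for the metrics calls and an Action stands in for TryAutoSubscribeAlarmsAsync; only the sessionOpenedRecorded idea is taken from the resolution, the rest is illustrative.

```csharp
using System;

public sealed class OpenSessionGaugeSketch
{
    public int OpenSessions { get; private set; }
    public int Faults { get; private set; }

    // Returns true when the open succeeded.
    public bool Open(Action autoSubscribe)
    {
        var sessionOpenedRecorded = false;
        try
        {
            OpenSessions++;               // stands in for _metrics.SessionOpened()
            sessionOpenedRecorded = true;
            autoSubscribe();              // may throw when the worker rejects the subscription
            return true;
        }
        catch (Exception)
        {
            Faults++;                     // stands in for _metrics.Fault(...)
            if (sessionOpenedRecorded)
            {
                OpenSessions--;           // stands in for _metrics.SessionRemoved()
            }
            return false;
        }
    }
}
```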

Server-007

Field Value
Severity Low
Category Performance & resource management
Location src/MxGateway.Server/Galaxy/GalaxyHierarchyProjector.cs:55-70
Status Resolved

Description: Project always iterates the full entry.Index.ObjectViews collection and re-applies all filters to skip offset matched items before collecting a page. Paging through a large Galaxy hierarchy is therefore O(total) per page and O(total²/pageSize) end-to-end. The cache is in-memory so impact is bounded, but for large galaxies repeated DiscoverHierarchy pagination wastes CPU.

Recommendation: Precompute and cache the filtered, ordered view list per (filterSignature, sequence) so subsequent pages are an O(pageSize) slice; the existing filter signature already keys page tokens.

Resolution: Resolved 2026-05-18. Confirmed against source: Project re-scanned and re-filtered the whole ObjectViews list on every page. Added a ConditionalWeakTable<GalaxyHierarchyCacheEntry, ConcurrentDictionary<string, IReadOnlyList<GalaxyObjectView>>> memo in GalaxyHierarchyProjector: the first projection of a given filter signature builds the filtered, ordered view list; subsequent pages take an O(pageSize) slice via index arithmetic. The memo is keyed on the immutable cache-entry instance, so when the cache publishes a new entry the stale memo becomes unreachable and is reclaimed with it — no explicit invalidation. ResolveRoot still runs before the memo lookup so a missing root surfaces NotFound consistently. Regression tests: GalaxyHierarchyProjectorTests (Project_PagedAcrossEntireHierarchy_ReturnsEveryObjectExactlyOnce, Project_DistinctFiltersOnSameEntry_DoNotShareMemoizedViewList, Project_SameFilterRepeated_ReturnsIdenticalTotals, Project_DistinctCacheEntries_ProjectAgainstTheirOwnData); existing GalaxyRepositoryGrpcServiceTests paging tests continue to pass unchanged.
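The memoization shape described here (a ConditionalWeakTable keyed on the cache entry, holding a per-filter-signature dictionary) can be sketched as follows. Strings stand in for GalaxyObjectView, the entry type and all names are hypothetical, and the scan counter exists only to make the effect observable.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Runtime.CompilerServices;
using System.Threading;

// Hypothetical stand-in for the immutable cache entry.
public sealed class EntrySketch
{
    public EntrySketch(IReadOnlyList<string> objects) => Objects = objects;
    public IReadOnlyList<string> Objects { get; }
}

public static class ProjectorMemoSketch
{
    // Values live only as long as their EntrySketch key is reachable,
    // so a replaced cache entry takes its memo with it.
    private static readonly ConditionalWeakTable<EntrySketch,
        ConcurrentDictionary<string, IReadOnlyList<string>>> Memo = new();

    public static int FullScans; // illustration only: counts filter passes

    // Assumes offset and pageSize are non-negative.
    public static IReadOnlyList<string> Page(
        EntrySketch entry, string filterSignature, Func<string, bool> filter, int offset, int pageSize)
    {
        var perEntry = Memo.GetValue(entry,
            _ => new ConcurrentDictionary<string, IReadOnlyList<string>>(StringComparer.Ordinal));

        // GetOrAdd may invoke the factory more than once under contention;
        // that is harmless here because the result is deterministic.
        var filtered = perEntry.GetOrAdd(filterSignature, _ =>
        {
            Interlocked.Increment(ref FullScans);
            return entry.Objects.Where(filter).OrderBy(name => name, StringComparer.Ordinal).ToList();
        });

        // O(pageSize) slice by index arithmetic.
        var end = Math.Min(filtered.Count, offset + pageSize);
        var page = new List<string>(pageSize);
        for (var i = offset; i < end; i++) page.Add(filtered[i]);
        return page;
    }
}
```

The sketch trusts the caller to pass a signature that uniquely identifies the filter; reusing a signature with a different predicate would return the first predicate's result.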

Server-008

Field Value
Severity Low
Category Performance & resource management
Location src/MxGateway.Server/Grpc/GalaxyRepositoryGrpcService.cs:111-134,160-189
Status Resolved

Description: WatchDeployEvents calls ResolveBrowseSubtrees() on every streamed event, and MapDeployEvent re-runs GalaxyHierarchyProjector.Project over the entire cached hierarchy (and sums attribute counts) for every event of every constrained subscriber. GalaxyGlobMatcher.IsMatch also rebuilds the glob regex on each call. With many constrained subscribers and frequent deploys this is avoidable work.

Recommendation: Hoist ResolveBrowseSubtrees() out of the loop; compute scoped object/attribute counts once per deploy sequence and cache by (sequence, browseSubtrees); cache compiled glob Regex instances in GalaxyGlobMatcher.

Resolution: Resolved 2026-05-18. Confirmed against source. Three changes: (1) WatchDeployEvents now resolves ResolveBrowseSubtrees() once before the streaming loop — the caller's identity and constraints are fixed for the stream lifetime, so per-event resolution was pure waste. (2) GalaxyGlobMatcher now caches compiled Regex instances in a ConcurrentDictionary keyed by glob pattern (with RegexOptions.Compiled), so the same handful of globs are translated once instead of on every IsMatch call. (3) The per-event MapDeployEvent re-projection is no longer a separate hot path: with finding Server-007 resolved, GalaxyHierarchyProjector.Project memoizes the filtered view list per (cache entry, filter signature), so the scoped-count projection in MapDeployEvent for a constrained subscriber is O(matched-slice) after the first event of a given deploy sequence rather than a full re-scan — this subsumes the recommendation's (sequence, browseSubtrees) cache (the memo is keyed on the per-sequence cache-entry instance and the browse-subtree-bearing filter signature). Regression tests: GalaxyFilterInputSafetyTests.GlobMatcher_RepeatedAndInterleavedPatterns_StayCorrect (glob cache correctness); existing WatchDeployEvents and GalaxyFilterInputSafetyTests coverage continues to pass.
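Change (2) can be illustrated with a small cache around glob-to-regex translation. The glob dialect assumed here ('*' for any run, '?' for one character, everything else literal, case-sensitive) is a guess for the sake of the example; GalaxyGlobMatcher's actual rules are not reproduced in this review.

```csharp
using System;
using System.Collections.Concurrent;
using System.Text;
using System.Text.RegularExpressions;

public static class GlobCacheSketch
{
    // Compiled regexes keyed by glob pattern. Unbounded, which is the
    // follow-up concern recorded as Server-018.
    private static readonly ConcurrentDictionary<string, Regex> Cache = new(StringComparer.Ordinal);

    public static bool IsMatch(string glob, string input) =>
        Cache.GetOrAdd(glob, Translate).IsMatch(input);

    private static Regex Translate(string glob)
    {
        var pattern = new StringBuilder("^");
        foreach (var ch in glob)
        {
            pattern.Append(ch switch
            {
                '*' => ".*",
                '?' => ".",
                _ => Regex.Escape(ch.ToString()), // regex and SQL metacharacters stay literal
            });
        }
        pattern.Append("\\z"); // anchor at the true end of input
        return new Regex(
            pattern.ToString(),
            RegexOptions.Compiled | RegexOptions.CultureInvariant | RegexOptions.Singleline);
    }
}
```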

Server-009

Field Value
Severity Low
Category Error handling & resilience
Location src/MxGateway.Server/Security/Authentication/AuthSqliteConnectionFactory.cs:15-32
Status Resolved

Description: Each auth-store operation opens a fresh SqliteConnection with no busy timeout, no WAL journal mode, and default journaling. MarkKeyUsedAsync runs on every authenticated request and SqliteApiKeyAuditStore appends on every denial; under concurrent load these writers can collide and surface SQLITE_BUSY as a hard failure on the request path.

Recommendation: Set Pooling, a non-zero DefaultTimeout/busy_timeout, and enable WAL (PRAGMA journal_mode=WAL) once at startup so concurrent readers/writers degrade gracefully.

Resolution: Resolved 2026-05-18. Confirmed against source: the connection string set only DataSource and Mode. AuthSqliteConnectionFactory.CreateConnection now also sets Pooling = true and a non-zero DefaultTimeout. A new OpenConnectionAsync(CancellationToken) opens the connection and applies PRAGMA journal_mode=WAL and PRAGMA busy_timeout (5 s); WAL is a persistent database-level setting so re-applying it per connection is a cheap no-op, while busy_timeout is per-connection state. All nine auth-store call sites (SqliteApiKeyAdminStore, SqliteApiKeyAuditStore, SqliteApiKeyStore, SqliteAuthStoreMigrator) were switched from CreateConnection() + OpenAsync() to OpenConnectionAsync(). docs/Authentication.md updated to describe the WAL/busy-timeout behavior. Regression test: SqliteAuthStoreTests.OpenConnectionAsync_EnablesWalJournalModeAndBusyTimeout.

Server-010

Field Value
Severity Low
Category Security
Location src/MxGateway.Server/Security/Authentication/SqliteApiKeyAdminStore.cs:91-114, src/MxGateway.Server/Dashboard/Components/Pages/ApiKeysPage.razor:168-172
Status Resolved

Description: RotateAsync sets revoked_utc = NULL, so rotating a previously revoked key silently reactivates it. This is documented intentional behavior in docs/Authentication.md:167, but the dashboard renders the "Rotate" button unconditionally — including for keys whose status badge says "Revoked" — so an operator can un-revoke a deliberately disabled key without an explicit warning.

Recommendation: Either hide/disable the Rotate action for revoked keys in ApiKeysPage.razor, require an explicit confirmation, or have RotateAsync preserve revoked_utc and add a separate explicit "reactivate" operation.

Resolution: Resolved 2026-05-18. Confirmed against source: ApiKeysPage.razor rendered the Rotate button unconditionally while Revoke was already gated on key.RevokedUtc is null. Took the lowest-risk recommended option — the dashboard now renders the Rotate (and Revoke) actions only for keys whose status is Active; a revoked key shows a "No actions" placeholder, so an operator cannot un-revoke a deliberately disabled key as a side effect of a rotation. RotateAsync's store-level behavior is unchanged (rotation by key_id still clears revoked_utc, which the CLI relies on); docs/Authentication.md updated to document both the store behavior and the dashboard restriction. No automated test added: the change is pure conditional Razor rendering and the test project has no bUnit component-rendering harness; the underlying DashboardApiKeyManagementService is already unit-tested.

Server-011

Field Value
Severity Low
Category Code organization & conventions
Location src/MxGateway.Server/Sessions/WorkerAlarmRpcDispatcher.cs:1-46
Status Resolved

Description: WorkerAlarmRpcDispatcher deviates from the module's conventions: it fully-qualifies System.Guid, System.ArgumentNullException, and System.Threading types inline instead of relying on using directives, and uses an explicit constructor with this.-qualified field assignment while the rest of the module (e.g. ConstraintEnforcer, MxAccessGatewayService, GalaxyRepositoryGrpcService) uses primary constructors. docs/style-guides/CSharpStyleGuide.md is authoritative for gateway code.

Recommendation: Add the needed using directives, drop the inline fully-qualified names, and convert to a primary constructor for consistency.

Resolution: Resolved 2026-05-18. Confirmed against source. Converted WorkerAlarmRpcDispatcher to a primary constructor with the standard ?? throw new ArgumentNullException(...) field-initializer guard; dropped the inline System.Guid / System.ArgumentNullException qualifications (using implicit using System;); removed redundant using System.Collections.Generic; / System.Threading / System.Threading.Tasks; directives (covered by ImplicitUsings); replaced the two if (... is null) throw new System.ArgumentNullException(...) checks with ArgumentNullException.ThrowIfNull. The stale class-level <summary>/<remarks> ("Replaces NotWiredAlarmRpcDispatcher once ... wired in", "partially wired", "returns an Unimplemented diagnostic") were corrected to describe the actual GUID-vs-Provider!Group.Tag handling — overlapping with Server-014. No behavior change, so no new test; existing WorkerAlarmRpcDispatcherTests continue to pass and the project builds warning-free under TreatWarningsAsErrors.

Server-012

Field Value
Severity Low
Category Documentation & comments
Location CLAUDE.md (Authentication section and apikey create example)
Status Resolved

Description: CLAUDE.md describes scopes as session, invoke, event, metadata, admin and shows apikey create --scopes session,invoke,event,metadata,admin. The actual canonical scope strings (used by GatewayScopes, GatewayGrpcScopeResolver, and docs/Authorization.md) are session:open, session:close, invoke:read, invoke:write, invoke:secure, events:read, metadata:read, admin. A key created per the CLAUDE.md example carries scopes the resolver never matches.

Recommendation: Update CLAUDE.md's scope list and the apikey example to the canonical *:* scope strings, per CLAUDE.md's own rule that docs change with the code.

Resolution: Resolved 2026-05-18. Confirmed against GatewayScopes (session:open, session:close, invoke:read, invoke:write, invoke:secure, events:read, metadata:read, admin). CLAUDE.md's Build/Test/Run apikey create example and the Authentication-section scope list were both updated to the canonical *:* strings. (Note: since finding Server-004 was resolved, the old example would now be actively rejected at create time rather than silently creating an unusable key, making the doc correction load-bearing.) Pure documentation change; no test.

Server-013

Field Value
Severity Low
Category Testing coverage
Location src/MxGateway.Tests/Gateway/Dashboard/DashboardAuthorizationHandlerTests.cs, src/MxGateway.Tests/Gateway/GatewayApplicationTests.cs
Status Resolved

Description: DashboardAuthorizationHandler is unit-tested in isolation, but no test exercises the dashboard routes end-to-end to confirm the policy is actually enforced — which is why Server-001 (policy registered but never wired) went uncaught. There are also no tests for WorkerExecutableValidator (PE-header architecture parsing), GalaxyGlobMatcher (anchoring/escaping/empty-glob fail-open), or GalaxyHierarchyProjector pagination/page-token behavior.

Recommendation: Add a WebApplicationFactory integration test that requests a dashboard page unauthenticated and asserts the redirect/401, plus unit tests for WorkerExecutableValidator, GalaxyGlobMatcher, and projector paging.

Resolution: Resolved 2026-05-18. Re-triaged against the current test suite: three of the four named gaps were already closed. (1) The dashboard route-level enforcement test exists — GatewayApplicationTests.Build_WhenDashboardEnabled_ComponentRoutesRequireAuthorization (and ..._AuthEndpointsAllowAnonymousAccess), added when Server-001 was fixed. (2) GalaxyGlobMatcher anchoring/escaping/empty-glob behavior is covered by GalaxyFilterInputSafetyTests (GlobMatcher_TreatsSqlMetacharactersAsLiterals, GlobMatcher_DoesNotTreatLikeWildcardsAsWildcards, GlobMatcher_WithPathologicalInput_DoesNotHang), now extended with GlobMatcher_RepeatedAndInterleavedPatterns_StayCorrect. (3) Projector pagination/page-token behavior is covered end-to-end by GalaxyRepositoryGrpcServiceTests and now directly by the new GalaxyHierarchyProjectorTests. The one genuine remaining gap — WorkerExecutableValidator PE-header parsing — was closed with the new WorkerExecutableValidatorTests (7 cases: matching/mismatched x86 and x64, missing MZ header, file too small, missing PE signature), exercising the validator against synthesized minimal PE fixtures.
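For context on what "synthesized minimal PE fixtures" involves, the sketch below reads the COFF Machine field using the standard PE layout (the "MZ" magic, the e_lfanew offset at 0x3C, the "PE\0\0" signature, then a 16-bit Machine value) and builds a matching fixture. It is not WorkerExecutableValidator itself; names and error strings are illustrative.

```csharp
#nullable enable
using System;
using System.Buffers.Binary;

public static class PeMachineSketch
{
    public const ushort MachineI386 = 0x014C;  // x86
    public const ushort MachineAmd64 = 0x8664; // x64

    // Returns the Machine field, or null with a reason when the image is not a valid PE.
    public static ushort? TryReadMachine(ReadOnlySpan<byte> image, out string? error)
    {
        if (image.Length < 0x40) { error = "file too small"; return null; }
        if (image[0] != (byte)'M' || image[1] != (byte)'Z') { error = "missing MZ header"; return null; }

        // e_lfanew: offset of the PE signature, stored at 0x3C.
        var peOffset = BinaryPrimitives.ReadInt32LittleEndian(image.Slice(0x3C, 4));
        if (peOffset < 0 || peOffset > image.Length - 6) { error = "file too small"; return null; }

        if (image[peOffset] != (byte)'P' || image[peOffset + 1] != (byte)'E' ||
            image[peOffset + 2] != 0 || image[peOffset + 3] != 0)
        {
            error = "missing PE signature";
            return null;
        }

        error = null;
        return BinaryPrimitives.ReadUInt16LittleEndian(image.Slice(peOffset + 4, 2));
    }

    // Builds the smallest buffer the reader above accepts.
    public static byte[] BuildFixture(ushort machine)
    {
        var bytes = new byte[0x80];
        bytes[0] = (byte)'M';
        bytes[1] = (byte)'Z';
        BinaryPrimitives.WriteInt32LittleEndian(bytes.AsSpan(0x3C, 4), 0x40);
        bytes[0x40] = (byte)'P';
        bytes[0x41] = (byte)'E';
        BinaryPrimitives.WriteUInt16LittleEndian(bytes.AsSpan(0x44, 2), machine);
        return bytes;
    }
}
```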

Server-014

Field Value
Severity Low
Category Documentation & comments
Location src/MxGateway.Server/Grpc/MxAccessGatewayService.cs:162-171,191-198,206-214,229-237
Status Resolved

Description: The XML <remarks> and inline comments on AcknowledgeAlarm and QueryActiveAlarms describe the alarm path as not yet wired and say NotWiredAlarmRpcDispatcher is the default ("Clients calling this method today receive an OK reply with a 'worker alarm path not yet wired' diagnostic", "an empty stream until PR A.2"). In fact SessionServiceCollectionExtensions.AddGatewaySessions registers WorkerAlarmRpcDispatcher as IAlarmRpcDispatcher, so DI always injects the production dispatcher; NotWiredAlarmRpcDispatcher is only the null fallback. The comments are stale and misleading.

Recommendation: Update the AcknowledgeAlarm/QueryActiveAlarms remarks to reflect that WorkerAlarmRpcDispatcher is the wired default, and describe its actual GUID-vs-Provider!Group.Tag handling.

Resolution: Resolved 2026-05-18. Confirmed against source: SessionServiceCollectionExtensions registers WorkerAlarmRpcDispatcher as IAlarmRpcDispatcher, so the "not yet wired" / "empty stream until PR A.2" / "PR A.6/A.7 follow-up" prose in the AcknowledgeAlarm and QueryActiveAlarms <remarks> and inline comments was stale. Rewrote both <remarks> blocks and both inline comments to state that DI binds the production WorkerAlarmRpcDispatcher, that it routes over the worker pipe IPC, and that AcknowledgeAlarm handles a canonical-GUID reference (→ AcknowledgeAlarmCommand) and a Provider!Group.Tag reference (→ AcknowledgeAlarmByNameCommand), with NotWiredAlarmRpcDispatcher being only the null fallback. The matching stale WorkerAlarmRpcDispatcher class-level XML doc was corrected as part of Server-011. Pure documentation/comment change; no test.
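The GUID-versus-name routing can be pictured as a small classifier. This is a guess at the shape, not the dispatcher's code: treating "canonical GUID" as the hyphenated "D" format, and a name reference as "non-empty provider, '!', non-empty remainder", are both assumptions, as are all identifiers.

```csharp
using System;

public enum AlarmReferenceKindSketch
{
    CanonicalGuid,    // would route to AcknowledgeAlarmCommand
    ProviderGroupTag, // would route to AcknowledgeAlarmByNameCommand
    Unrecognized,
}

public static class AlarmReferenceSketch
{
    public static AlarmReferenceKindSketch Classify(string reference)
    {
        // Assumption: "canonical" means the 36-character hyphenated form.
        if (Guid.TryParseExact(reference, "D", out _))
        {
            return AlarmReferenceKindSketch.CanonicalGuid;
        }

        // Assumption: Provider!Group.Tag needs text on both sides of '!'.
        var bang = reference.IndexOf('!');
        if (bang > 0 && bang < reference.Length - 1)
        {
            return AlarmReferenceKindSketch.ProviderGroupTag;
        }

        return AlarmReferenceKindSketch.Unrecognized;
    }
}
```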

Server-015

| Field | Value |
|---|---|
| Severity | Medium |
| Category | Concurrency & thread safety |
| Location | src/MxGateway.Server/Sessions/GatewaySession.cs:8-15,266-308,720-775 |
| Status | Resolved |

Description: GatewaySession guards its mutable state with two different sync primitives. TransitionTo, MarkFaulted, TouchClientActivity, the State/LastClientActivityAt/LeaseExpiresAt/FinalFault/ActiveEventSubscriberCount getters, AttachWorkerClient, and IsLeaseExpired all read/write _state, _finalFault, _lastClientActivityAt, _leaseExpiresAt, _workerClient, and _activeEventSubscriberCount under _syncRoot. CloseAsync (lines 720-775), however, reads _state at line 729 and writes _state at lines 736 (SessionState.Closing) and 761 (SessionState.Closed) while holding only the _closeLock SemaphoreSlim; _syncRoot is never acquired. A concurrent TransitionTo or MarkFaulted from another thread sees _state outside the lock that protects it, and the State getter is not guaranteed to observe the Closing/Closed writes promptly. SemaphoreSlim.WaitAsync/Release do happen to provide memory barriers in practice, but the locking discipline is split across two primitives, which is fragile and defeats the audit value of "all _state access is guarded by _syncRoot". Concretely, the race between CloseAsync setting _state = Closing and a concurrent TransitionTo(Ready) is unordered, and TransitionTo will happily overwrite Closing back to Ready because its only guard is "do not overwrite Closed/Faulted".

Recommendation: Make CloseAsync mutate _state through the existing TransitionTo(...) helper (or acquire _syncRoot around the reads/writes) so all _state access uses the same lock. Either extend TransitionTo to accept the Closing and Closed transitions (it already handles Faulted/Closed precedence) or refactor CloseAsync to call a private TrySetClosing() / MarkClosed() that locks _syncRoot. Add a regression test that forces a TransitionTo(Ready) after CloseAsync has set Closing and asserts the session does not flip back to Ready.

Resolution: 2026-05-20 — Unified the close path on _syncRoot. GatewaySession.CloseAsync (src/MxGateway.Server/Sessions/GatewaySession.cs) now mutates _state only through two private _syncRoot-locked helpers — TryBeginClose (writes Closing, returns the prior _closeStarted) and MarkClosed (writes Closed) — so every _state read/write in the session uses the same lock; _closeLock keeps its role of serializing concurrent close attempts. TransitionTo was tightened to refuse a transition out of Closing to anything other than Closed/Faulted so a late lifecycle callback cannot walk a closing session back to Ready. docs/Sessions.md updated to describe the unified lock discipline and the extended terminal precedence. Regression tests in src/MxGateway.Tests/Gateway/Sessions/GatewaySessionTests.cs: TransitionTo_AfterCloseStarted_DoesNotOverwriteClosing (the named scenario — BlockingShutdownWorkerClient parks the close inside worker.ShutdownAsync so the test can call TransitionTo(Ready) between the Closing and Closed writes and assert the state stays Closing) and MarkFaulted_AfterCloseCompletes_DoesNotResurrectSession.
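The single-lock discipline described in this resolution can be sketched as below. This is a minimal illustration under stated assumptions, not the actual GatewaySession: SessionStateMachine is a hypothetical name, only the four states named in this finding are modeled, and the precedence between Faulted and Closed inside TryBeginClose and MarkClosed is an assumption.

```csharp
using System;

public enum SessionState { Ready, Closing, Closed, Faulted }

// Hypothetical, simplified stand-in for GatewaySession: only the state machine is modeled.
public sealed class SessionStateMachine
{
    private readonly object _syncRoot = new();
    private SessionState _state = SessionState.Ready;
    private bool _closeStarted;

    // Every read of _state goes through the same lock as every write.
    public SessionState State
    {
        get { lock (_syncRoot) { return _state; } }
    }

    // Refuses to leave Closed/Faulted, and refuses to leave Closing
    // for anything other than Closed or Faulted.
    public bool TransitionTo(SessionState next)
    {
        lock (_syncRoot)
        {
            if (_state is SessionState.Closed or SessionState.Faulted)
                return false;
            if (_state == SessionState.Closing
                && next is not (SessionState.Closed or SessionState.Faulted))
                return false;
            _state = next;
            return true;
        }
    }

    // Writes Closing under the lock and returns whether a close had already started.
    public bool TryBeginClose()
    {
        lock (_syncRoot)
        {
            bool alreadyStarted = _closeStarted;
            _closeStarted = true;
            // Assumption: a terminal state is not walked back to Closing.
            if (_state is not (SessionState.Closed or SessionState.Faulted))
                _state = SessionState.Closing;
            return alreadyStarted;
        }
    }

    // Writes Closed under the lock.
    public void MarkClosed()
    {
        lock (_syncRoot)
        {
            // Assumption: Faulted wins over Closed.
            if (_state != SessionState.Faulted)
                _state = SessionState.Closed;
        }
    }
}
```

With this shape, a late TransitionTo(SessionState.Ready) that arrives between TryBeginClose and MarkClosed returns false and leaves the state at Closing, which is the scenario the named regression test pins.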

Server-016

| Field | Value |
|---|---|
| Severity | Medium |
| Category | Error handling & resilience |
| Location | src/MxGateway.Server/Sessions/GatewaySession.cs:790-797, src/MxGateway.Server/Sessions/SessionManager.cs:237-258 |
| Status | Resolved |

Description: GatewaySession.DisposeAsync synchronously calls _closeLock.Dispose() (line 792) without first acquiring the lock and without checking whether a CloseAsync is still in flight. The normal call path is SessionManager.CloseSessionCoreAsync → session.CloseAsync(...) → RemoveSessionAsync → DisposeAsync, where DisposeAsync runs strictly after CloseAsync completes. But the ShutdownAsync path (SessionManager.cs:237-258) and any future caller that disposes a session while another thread is still inside CloseAsync will trip ObjectDisposedException when the in-flight CloseAsync releases the semaphore. The race is narrow today because all Close/Dispose choreography goes through SessionManager, but the class-level contract is broken: nothing on GatewaySession documents or enforces "DisposeAsync must not be called concurrently with CloseAsync".

Recommendation: In DisposeAsync, either (a) take and release _closeLock once before disposing it, so the dispose is sequenced after any in-flight close, or (b) replace _closeLock disposal with a guard flag and let the semaphore be reclaimed by the finalizer. Document the invariant on the public method. Add a regression test that disposes a session whose CloseAsync has not yet completed and asserts no ObjectDisposedException.

Resolution: 2026-05-20 — Took recommendation (a): GatewaySession.DisposeAsync (src/MxGateway.Server/Sessions/GatewaySession.cs) now acquires _closeLock once before disposing the semaphore so an in-flight CloseAsync finishes (its _closeLock.Release()) before the dispose tears the semaphore down. The wait is non-cancellable (CancellationToken.None) and ObjectDisposedException is swallowed at both the wait and the dispose site so double-dispose still completes cleanly. The method's XML doc was extended with a <remarks> block stating the invariant. Regression tests in src/MxGateway.Tests/Gateway/Sessions/GatewaySessionTests.cs: DisposeAsync_WhileCloseInFlight_WaitsForCloseAndDoesNotThrow (parks CloseAsync inside the worker shutdown, calls DisposeAsync concurrently, releases shutdown, asserts both complete without ObjectDisposedException and the worker is disposed exactly once) and DisposeAsync_CalledTwice_DoesNotThrow.
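A minimal sketch of the sequencing in recommendation (a), assuming a hypothetical CloseThenDisposeSession class whose CloseAsync takes a delegate in place of the real worker shutdown. It only illustrates why waiting on _closeLock before disposing it keeps an in-flight close from releasing a disposed semaphore; it is not the actual GatewaySession code.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical minimal session: only the close/dispose sequencing is modeled.
public sealed class CloseThenDisposeSession : IAsyncDisposable
{
    private readonly SemaphoreSlim _closeLock = new(1, 1);

    // 'shutdown' stands in for the worker shutdown the real CloseAsync awaits.
    public async Task CloseAsync(Func<Task> shutdown)
    {
        await _closeLock.WaitAsync().ConfigureAwait(false);
        try
        {
            await shutdown().ConfigureAwait(false);
        }
        finally
        {
            // Safe: DisposeAsync cannot dispose the semaphore until this release happens.
            _closeLock.Release();
        }
    }

    public async ValueTask DisposeAsync()
    {
        try
        {
            // Non-cancellable wait: orders the dispose after any in-flight close.
            await _closeLock.WaitAsync(CancellationToken.None).ConfigureAwait(false);
        }
        catch (ObjectDisposedException)
        {
            return; // a previous DisposeAsync already tore the semaphore down
        }

        try
        {
            _closeLock.Dispose();
        }
        catch (ObjectDisposedException)
        {
            // Swallowed so a repeated dispose still completes cleanly.
        }
    }
}
```

The sketch covers a close racing a dispose and a sequential second dispose. Two DisposeAsync calls that overlap in time are outside what it demonstrates.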

Server-017

| Field | Value |
|---|---|
| Severity | High |
| Category | Security |
| Location | src/MxGateway.Server/Security/Authorization/GatewayGrpcScopeResolver.cs:13-27, src/MxGateway.Server/Grpc/MxAccessGatewayService.cs:173-247, docs/Authorization.md:108-110 |
| Status | Resolved |

Description: The two new top-level RPCs added to MxAccessGateway, AcknowledgeAlarm(AcknowledgeAlarmRequest) and QueryActiveAlarms(QueryActiveAlarmsRequest) (proto lines 23-24), are not enumerated by GatewayGrpcScopeResolver.ResolveRequiredScope. The resolver's request switch covers OpenSessionRequest, CloseSessionRequest, StreamEventsRequest, MxCommandRequest, and the four Galaxy-repository requests; everything else falls through to _ => GatewayScopes.Admin. The interceptor (GatewayGrpcAuthorizationInterceptor.AuthenticateAndAuthorizeAsync) then rejects any non-admin caller with PermissionDenied. This is technically fail-closed (and docs/Authorization.md:108-110 documents the "unrecognized → admin" intent), but in practice it means: (1) only API keys with the admin scope can acknowledge alarms or query active alarms, even though acknowledging is naturally an invoke:write-shaped operation and querying is naturally an invoke:read- or metadata:read-shaped operation; (2) the alarm RPCs ship in a state where any client that successfully opened a session and subscribed to alarm events still cannot perform the operational acks the contract advertises; (3) the test matrix GatewayGrpcScopeResolverTests does not even cover these two request types, so the gap was not caught at unit-test time.

Recommendation: Add explicit arms to ResolveRequiredScope: map AcknowledgeAlarmRequest to GatewayScopes.InvokeWrite (parity with other write actions; ack changes alarm state) and QueryActiveAlarmsRequest to GatewayScopes.MetadataRead or GatewayScopes.InvokeRead. Update docs/Authorization.md to list both. Extend GatewayGrpcScopeResolverTests with the new mappings and an assertion that every request type defined by mxaccess_gateway.proto is named in the resolver (the test can enumerate the assembly's request types so a future RPC cannot quietly add itself only via the admin fallback).

Resolution: 2026-05-20 — Added explicit AcknowledgeAlarmRequest => GatewayScopes.InvokeWrite and QueryActiveAlarmsRequest => GatewayScopes.EventsRead arms to GatewayGrpcScopeResolver.ResolveRequiredScope (src/MxGateway.Server/Security/Authorization/GatewayGrpcScopeResolver.cs:21-22). InvokeWrite matches the existing MxCommandKind.Write* mapping because ack mutates alarm state; EventsRead matches StreamEventsRequest and MxCommandKind.DrainEvents because querying active alarms reads the same alarm/event surface. Extended GatewayGrpcScopeResolverTests with two new InlineData rows covering both request types (src/MxGateway.Tests/Security/Authorization/GatewayGrpcScopeResolverTests.cs:16-17) and added four interceptor-level cases in GatewayGrpcAuthorizationInterceptorTests (UnaryServerHandler_AcknowledgeAlarmMissingScope_ReturnsPermissionDenied, UnaryServerHandler_AcknowledgeAlarmWithScope_RunsHandler, ServerStreamingServerHandler_QueryActiveAlarmsMissingScope_ReturnsPermissionDenied, ServerStreamingServerHandler_QueryActiveAlarmsWithScope_RunsHandler) proving each new RPC denies callers lacking the chosen scope and runs the handler when the scope is held. Updated docs/Authorization.md (resolver snippet and Scope Catalog table) to list both RPCs against their scopes. dotnet test ... --filter FullyQualifiedName~GatewayGrpcAuthorizationInterceptorTests → 14 passed, 0 failed; resolver tests 28 passed, 0 failed.
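The shape of the fix can be sketched as a type switch with a fail-closed default. The request classes below are empty hypothetical stand-ins for the generated proto types, SomeFutureRequest is invented to exercise the fallback, and the scope string values are assumptions; only the arms named in this finding are shown.

```csharp
using System;

// Assumed string values; the finding only names the GatewayScopes members.
public static class GatewayScopes
{
    public const string Admin = "admin";
    public const string InvokeWrite = "invoke:write";
    public const string EventsRead = "events:read";
}

// Hypothetical stand-ins for the generated proto request types.
public sealed class StreamEventsRequest { }
public sealed class AcknowledgeAlarmRequest { }
public sealed class QueryActiveAlarmsRequest { }
public sealed class SomeFutureRequest { }

public static class ScopeResolverSketch
{
    public static string ResolveRequiredScope(object request) => request switch
    {
        StreamEventsRequest => GatewayScopes.EventsRead,
        // Ack mutates alarm state, so it sits with the write scope.
        AcknowledgeAlarmRequest => GatewayScopes.InvokeWrite,
        // Querying active alarms reads the same alarm/event surface as StreamEvents.
        QueryActiveAlarmsRequest => GatewayScopes.EventsRead,
        // Fail closed: anything not enumerated requires admin.
        _ => GatewayScopes.Admin,
    };
}
```

The fallback arm is what made the original gap invisible: an RPC that is never enumerated still resolves to a scope, just one that ordinary keys do not hold.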

Server-018

| Field | Value |
|---|---|
| Severity | Low |
| Category | Performance & resource management |
| Location | src/MxGateway.Server/Galaxy/GalaxyGlobMatcher.cs:15 |
| Status | Resolved |

Description: GalaxyGlobMatcher.RegexCache is a ConcurrentDictionary<string, Regex> keyed by glob pattern, with no eviction. The fix for Server-008 added this cache deliberately to avoid recompiling the same handful of patterns, but the cache key is the raw glob string. The patterns currently come from two sources: DiscoverHierarchyRequest.TagNameGlob (client-supplied) and ApiKeyConstraints.BrowseSubtrees / ReadSubtrees / WriteSubtrees / ReadTagGlobs / WriteTagGlobs (admin-configured). BuildRegex runs each glob through Regex.Escape, so an attacker cannot craft a ReDoS payload; the exposure is unbounded memory growth rather than regex backtracking. The cache size is bounded only by "how many distinct globs a client can submit over the process lifetime", which is in the millions for TagNameGlob if a client iterates through generated names. Each compiled Regex also holds a JIT'd assembly that is non-trivial to reclaim.

Recommendation: Cap the cache at a small bound (e.g. 256 patterns) using a simple LRU or a MemoryCache with sliding expiration, or restrict the cache to globs that originate from API-key constraints (admin-controlled, naturally bounded) and pay the compile cost for client-supplied globs. Add a test that fills the cache with thousands of distinct globs and asserts the cache size stays bounded.

Resolution: 2026-05-20 — Capped GalaxyGlobMatcher's compiled-regex cache at RegexCacheCapacity = 256 entries with FIFO-by-insertion eviction (src/MxGateway.Server/Galaxy/GalaxyGlobMatcher.cs). A ConcurrentQueue<string> tracks insertion order; when the cache grows past the cap, EvictIfOverCapacity takes a small lock and dequeues + removes the oldest entries until the count is back within bound. Reads stay lock-free (the lock guards only the eviction path). Internal CurrentCacheSize / RegexCacheCapacity accessors are surfaced through the existing InternalsVisibleTo("MxGateway.Tests") so tests can assert the bound. Regression test: GalaxyFilterInputSafetyTests.GlobMatcher_WithManyDistinctPatterns_CacheStaysBounded submits RegexCacheCapacity * 4 distinct globs and asserts CurrentCacheSize stays in [0, RegexCacheCapacity]. Existing glob correctness tests (GlobMatcher_RepeatedAndInterleavedPatterns_StayCorrect, the adversarial-input theories) continue to pass, confirming eviction does not corrupt lookups.
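A minimal sketch of FIFO-by-insertion eviction along the lines described above. BoundedGlobRegexCache and GetOrAdd are hypothetical names, and the glob translation (escape everything, then re-enable `*`) is an assumption about what BuildRegex does; only the capacity of 256, the insertion-order queue, and the lock-on-eviction-only shape come from the resolution text.

```csharp
using System;
using System.Collections.Concurrent;
using System.Text.RegularExpressions;

public static class BoundedGlobRegexCache
{
    public const int Capacity = 256;

    private static readonly ConcurrentDictionary<string, Regex> Cache = new(StringComparer.Ordinal);
    private static readonly ConcurrentQueue<string> InsertionOrder = new();
    private static readonly object EvictionLock = new();

    public static int CurrentSize => Cache.Count;

    public static Regex GetOrAdd(string glob)
    {
        // Fast path: lock-free read.
        if (Cache.TryGetValue(glob, out var cached))
            return cached;

        Regex built = BuildRegex(glob);
        if (Cache.TryAdd(glob, built))
        {
            InsertionOrder.Enqueue(glob);
            EvictIfOverCapacity();
        }
        // If another thread won the TryAdd race, 'built' is still a correct regex.
        return built;
    }

    private static void EvictIfOverCapacity()
    {
        if (Cache.Count <= Capacity)
            return;

        // The lock guards only eviction; readers never take it.
        lock (EvictionLock)
        {
            while (Cache.Count > Capacity && InsertionOrder.TryDequeue(out var oldest))
                Cache.TryRemove(oldest, out _);
        }
    }

    // Assumed translation: escape all regex metacharacters, then treat '*' as "any run".
    private static Regex BuildRegex(string glob)
    {
        string pattern = "^" + Regex.Escape(glob).Replace("\\*", ".*") + "$";
        return new Regex(pattern, RegexOptions.CultureInvariant);
    }
}
```

FIFO is cruder than LRU because a hot pattern can be evicted simply for being old, but it needs no bookkeeping on reads, which is what keeps the read path lock-free.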

Server-019

| Field | Value |
|---|---|
| Severity | Low |
| Category | Correctness & logic bugs |
| Location | src/MxGateway.Server/Sessions/WorkerAlarmRpcDispatcher.cs:183-221 |
| Status | Resolved |

Description: WorkerAlarmRpcDispatcher.QueryActiveAlarmsAsync exits via yield break (line 191) when sessionRegistry.TryGet(request.SessionId, ...) fails, so it silently produces an empty stream with no diagnostic. The peer AcknowledgeAsync instead returns an AcknowledgeAlarmReply with ProtocolStatus.Code = SessionNotFound (lines 81-89), so the two methods have inconsistent missing-session handling. In production this branch is unreachable because MxAccessGatewayService.QueryActiveAlarms calls ResolveSession(...) first and throws NotFound from the gRPC layer (MxAccessGatewayService.cs:228), but: (a) the dispatcher is the seam other code paths might reach in the future, and (b) any unit test that instantiates the dispatcher directly with a missing session id sees an empty stream rather than a clear error, which is a footgun.

Recommendation: Either throw a SessionManagerException(SessionManagerErrorCode.SessionNotFound, ...) (matching the gRPC service's own resolver) or yield a single ActiveAlarmSnapshot with a diagnostic field set, and add a WorkerAlarmRpcDispatcherTests case that asserts whichever shape is chosen. Aligning with AcknowledgeAsync's SessionNotFound protocol-status pattern is preferred, but QueryActiveAlarms is a server-streaming RPC so a thrown SessionManagerException propagated by the gateway is the cleaner fit.

Resolution: 2026-05-20 — Took the preferred option: WorkerAlarmRpcDispatcher.QueryActiveAlarmsAsync (src/MxGateway.Server/Sessions/WorkerAlarmRpcDispatcher.cs) now throws SessionManagerException(SessionManagerErrorCode.SessionNotFound, ...) instead of yield break-ing when the session is missing. MxAccessGatewayService.MapException already maps that error code to gRPC NotFound, so production callers see a consistent missing-session response and a direct unit-test caller now gets a clear error instead of an empty success. The unary peer AcknowledgeAsync continues to surface the same condition as an in-band ProtocolStatus.Code = SessionNotFound, which is correct for a unary RPC. Regression test: WorkerAlarmRpcDispatcherTests.QueryActiveAlarmsAsync_WhenSessionMissing_ThrowsSessionNotFound replaces the prior _YieldsEmpty assertion — it asserts the new exception shape and also exercises AcknowledgeAsync with the same missing session id to pin the peer-method parity.
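The difference between the two shapes can be illustrated with a small async iterator. Everything here is hypothetical scaffolding: SessionNotFoundException stands in for SessionManagerException with the SessionNotFound code, and the in-memory session set replaces the real session registry.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical stand-in for SessionManagerException(SessionManagerErrorCode.SessionNotFound, ...).
public sealed class SessionNotFoundException : Exception
{
    public SessionNotFoundException(string sessionId)
        : base($"Session '{sessionId}' was not found.") { }
}

public static class AlarmQuerySketch
{
    // Stand-in for the session registry lookup.
    private static readonly HashSet<string> KnownSessions = new() { "s-1" };

    public static async IAsyncEnumerable<string> QueryActiveAlarmsAsync(string sessionId)
    {
        // Throwing instead of 'yield break' turns a missing session into a visible error
        // rather than an empty stream that looks like "no active alarms".
        if (!KnownSessions.Contains(sessionId))
            throw new SessionNotFoundException(sessionId);

        await Task.Yield();
        yield return "Provider!Group.Tag";
    }
}
```

Because an async iterator body does not run until the first MoveNextAsync, the exception surfaces when the caller starts enumerating, not when QueryActiveAlarmsAsync is called. A test therefore has to enumerate the stream to observe it.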

Server-020

| Field | Value |
|---|---|
| Severity | Low |
| Category | Code organization & conventions |
| Location | src/MxGateway.Server/Dashboard/Components/Pages/DashboardHome.razor:1-2, …/GalaxyPage.razor:1-2, …/ApiKeysPage.razor:1-2, …/EventsPage.razor:1-2, …/SessionsPage.razor:1-2, …/WorkersPage.razor:1-2, …/SettingsPage.razor:1-2, …/SessionDetailsPage.razor:1-2 |
| Status | Resolved |

Description: Every dashboard page declares two @page directives — @page "/X" AND @page "/dashboard/X" — even though DashboardEndpointRouteBuilderExtensions.MapGatewayDashboard mounts the Razor components under a RouteGroupBuilder with pathBase = "/dashboard". The group prefix is prepended to each @page route, so the actual endpoints become /dashboard/X (from @page "/X") and /dashboard/dashboard/X (from @page "/dashboard/X"). The pages are reachable at two URLs each, and the deeper one (/dashboard/dashboard/sessions etc.) is almost certainly accidental — it leaks the path-base name into the URL and creates duplicate authorize/render work per route. GatewayApplicationTests.Build_WhenDashboardEnabled_ComponentRoutesRequireAuthorization only checks the /dashboard/X shape, so the duplicate route slipped through without an assertion.

Recommendation: Drop the @page "/dashboard/X" directive from each page; rely on the MapGroup("/dashboard") to provide the prefix. Or, if the team genuinely wants both URL shapes, document the choice in the file header and extend the route-enumeration test to assert that both are present (and both carry the authorization policy). Either way, the current setup is non-obvious.

Resolution: 2026-05-20 — Took the recommended drop: removed the redundant @page "/dashboard/X" directive from every dashboard Razor page (DashboardHome.razor, SessionsPage.razor, WorkersPage.razor, EventsPage.razor, GalaxyPage.razor, SettingsPage.razor, ApiKeysPage.razor, SessionDetailsPage.razor). Each page now declares only its bare route (e.g. @page "/sessions"); DashboardEndpointRouteBuilderExtensions.MapGatewayDashboard continues to prepend /dashboard via MapGroup, so each page is reachable at exactly one URL (/dashboard/X). Regression test: GatewayApplicationTests.Build_WhenDashboardEnabled_DoesNotRegisterDoubledDashboardPrefixRoutes enumerates the eight previously-doubled routes (/dashboard/dashboard/, /dashboard/dashboard/sessions, ... /dashboard/dashboard/sessions/{SessionId}) and asserts none of them are mapped. The existing ..._MapsBlazorDashboardAndAuthEndpoints / ..._ComponentRoutesRequireAuthorization tests continue to verify the desired /dashboard/X shapes are still present and policy-gated. No public URL contract changed (the doubled shape was accidental); no doc update needed — gateway.md and docs/GatewayDashboardDesign.md never referenced the doubled routes.

Server-021

| Field | Value |
|---|---|
| Severity | Medium |
| Category | Testing coverage |
| Location | src/MxGateway.Server/Grpc/MxAccessGatewayService.cs:266-664, src/MxGateway.Tests/Gateway/Grpc/MxAccessGatewayServiceTests.cs |
| Status | Resolved |

Description: The 1cd51bb commit history (the bulk read/write series, f220908/5e375f6/758aca2) added 473 lines of constraint-filtering and reply-merging logic to MxAccessGatewayService: ApplyConstraintsAsync (line 266), EnforceReadTagAsync / EnforceWriteHandleAsync, FilterTagBulkAsync / FilterReadBulkAsync / FilterWriteBulkAsync / FilterHandleBulkAsync, the ReplaceWriteBulkEntries switch, and three concrete BulkConstraintPlan records (SubscribeBulkConstraintPlan, WriteBulkConstraintPlan, ReadBulkConstraintPlan) that splice denied entries back into the worker's allowed-only reply in original-index order. None of this is covered by MxAccessGatewayServiceTests — its FakeSessionManager is wired with an AllowAllConstraintEnforcer (line 430) that never denies anything, so every constraint-related code path is dead at test time. A subtle off-by-one in BuildMerged, a wrong PayloadOneofCase in GetPayload / SetPayload, or a missing case in ReplaceWriteBulkEntries would all ship without a test failure.

Recommendation: Add MxAccessGatewayServiceTests cases that inject a deny-on-glob IConstraintEnforcer and exercise: (1) AddItemBulk / SubscribeBulk / AdviseItemBulk with a mix of allowed and denied tags, asserting BulkSubscribeReply.Results interleaves denied and worker-allowed entries in original-index order; (2) the same for ReadBulk and each of the four bulk-write commands; (3) HasAllowedItems == false so CreateDeniedReply is exercised (no worker call); (4) the unary Write/Write2/WriteSecured/WriteSecured2 paths through EnforceWriteHandleAsync. The fixtures can reuse the existing FakeSessionManager by replacing the constraint enforcer; no live worker is needed.

Resolution: 2026-05-20 — Added a configurable PredicateConstraintEnforcer test double (src/MxGateway.Tests/TestSupport/PredicateConstraintEnforcer.cs) that denies on per-tag and per-handle predicates and records denials. Added 11 new tests in src/MxGateway.Tests/Gateway/Grpc/MxAccessGatewayServiceConstraintTests.cs covering: (1) AddItemBulk with mixed denials — asserts the worker is called once with only the allowed subset and the merged reply interleaves denied and worker-allowed SubscribeResults at their original indices; (2) SubscribeBulk with every tag denied — asserts HasAllowedItems short-circuits CreateDeniedReply and the session manager is never invoked; (3) AdviseItemBulk (handle-keyed denial via CheckReadHandleAsync); (4) SubscribeBulk with the allow-all enforcer — pass-through regression guard; (5) ReadBulk partial denial — asserts the BulkReadConstraintPlan produces a BulkReadReply (not a BulkSubscribeReply) with denied entries spliced in at their original indices; (6) ReadBulk all-denied short-circuit; (7) WriteBulk partial denial — asserts denied entries are dropped from the forwarded Entries and the merged reply preserves original-index order; (8) WriteSecuredBulk all-denied — proves the second ReplaceWriteBulkEntries switch arm is reachable; (9) unary Write with denied handle → PermissionDenied, no worker call, denial recorded; (10) unary WriteSecured with denied handle → PermissionDenied; (11) unary AddItem with denied tag → PermissionDenied (EnforceReadTagAsync). MxAccessGatewayServiceTests.CreateService updated to accept an IConstraintEnforcer so future tests can opt into the deny enforcer without duplicating the wiring. All 11 new tests pass; full suite (dotnet test src/MxGateway.Tests/MxGateway.Tests.csproj) is green at 458 passing.
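The behavior these tests pin (forward only the allowed subset, then splice denied entries back at their original indices) can be sketched independently of the gateway types. BulkResult, BulkMergeSketch, and the delegate-based worker are hypothetical; the real constraint plans operate on protobuf replies, which this sketch does not model.

```csharp
using System;
using System.Collections.Generic;

public sealed record BulkResult(string Tag, bool Allowed, string Detail);

public static class BulkMergeSketch
{
    // Filters tags through isAllowed, calls the worker once with the allowed subset,
    // and merges worker replies with denied placeholders in original-index order.
    public static IReadOnlyList<BulkResult> Execute(
        IReadOnlyList<string> tags,
        Func<string, bool> isAllowed,
        Func<IReadOnlyList<string>, IReadOnlyList<string>> worker)
    {
        var allowed = new List<string>();
        var allowedFlags = new bool[tags.Count];
        for (int i = 0; i < tags.Count; i++)
        {
            allowedFlags[i] = isAllowed(tags[i]);
            if (allowedFlags[i])
                allowed.Add(tags[i]);
        }

        // No allowed items: skip the worker call entirely.
        IReadOnlyList<string> workerReplies =
            allowed.Count == 0 ? Array.Empty<string>() : worker(allowed);
        if (workerReplies.Count != allowed.Count)
            throw new InvalidOperationException("Worker reply count does not match the allowed subset.");

        var merged = new List<BulkResult>(tags.Count);
        int nextReply = 0; // cursor into the allowed-only worker reply
        for (int i = 0; i < tags.Count; i++)
        {
            merged.Add(allowedFlags[i]
                ? new BulkResult(tags[i], true, workerReplies[nextReply++])
                : new BulkResult(tags[i], false, "PermissionDenied"));
        }
        return merged;
    }
}
```

The cursor into the worker reply advances only on allowed entries; that is the spot where an off-by-one would silently shift every later result, which is why the partial-denial tests assert positions rather than just counts.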

Server-022

| Field | Value |
|---|---|
| Severity | Low |
| Category | Documentation & comments |
| Location | src/MxGateway.Server/Sessions/IAlarmRpcDispatcher.cs:8-29 |
| Status | Resolved |

Description: Server-014's resolution noted that the stale "PR A.6 / A.7" / "not yet wired" language was rewritten on MxAccessGatewayService.AcknowledgeAlarm / QueryActiveAlarms and on the WorkerAlarmRpcDispatcher class doc. The corresponding XML doc on the interface IAlarmRpcDispatcher (lines 8-29) still says it is "PR A.6 / A.7 — gateway-side dispatcher" and that "Production implementations live in WorkerAlarmRpcDispatcher (this PR ships a not-yet-wired default that returns a clear worker-pending diagnostic)". That second clause directly contradicts the now-correct comments on the concrete implementations and on the gRPC service: WorkerAlarmRpcDispatcher is the wired default, not a not-yet-wired one. A reader who finds the interface first will believe the dispatcher is non-functional.

Recommendation: Rewrite the IAlarmRpcDispatcher <remarks> block to match the language now used on WorkerAlarmRpcDispatcher and on the gRPC service: DI binds WorkerAlarmRpcDispatcher by default; NotWiredAlarmRpcDispatcher is only the null fallback for tests/DI omission. Drop the "PR A.6 / A.7" prefix from the <summary> — the interface is now the public alarm-RPC seam.

Resolution: 2026-05-20 — Rewrote IAlarmRpcDispatcher's <summary> and <remarks> (src/MxGateway.Server/Sessions/IAlarmRpcDispatcher.cs) to match the language now used on WorkerAlarmRpcDispatcher and on MxAccessGatewayService.AcknowledgeAlarm / QueryActiveAlarms: dropped the stale "PR A.6 / A.7" prefix from the summary, and replaced the "this PR ships a not-yet-wired default that returns a clear worker-pending diagnostic" clause with the correct statement that DI binds the production WorkerAlarmRpcDispatcher by default and NotWiredAlarmRpcDispatcher is only the null fallback for DI omission / standalone tests. Pure documentation change; no test.