mxaccessgw/code-reviews/Tests/findings.md
Joseph Doherty 6bae5ea3a3 Resolve Tests-027..031: flake root cause + coverage gaps
Tests-027  GatewayMetrics exposes its internal Meter; the
           StreamEvents_WhenEventIsWritten_RecordsSendDuration listener
           now filters by ReferenceEquals(instrument.Meter, metrics.Meter)
           instead of Meter.Name, so parallel tests with their own
           GatewayMetrics no longer cross-contaminate the families list.
Tests-028  FakeWorkerClient.Kill now captures LastKillReason;
           SessionManager.KillWorkerAsync tests pin the reason
           propagation end-to-end and cover the blank/null guard. The
           DashboardSessionAdminService kill test pins the literal
           dashboard-admin-kill reason.
Tests-029  Added CloseSessionAsync_BlankSessionId_ReturnsFailure to mirror
           the existing KillWorkerAsync blank-id coverage.
Tests-030  DeleteAsync_WhenStoreRefuses_ReportsFriendlyError renamed and
           extended to assert the dashboard-delete-key audit row with
           Details = not-found-or-active. Added
           DeleteAsync_BlankKeyId_ReturnsFailure.
Tests-031  DashboardSnapshotPublisher reconnect test now measures the
           gap from the first throw inside the fake (firstThrowAt) to
           secondSubscribeAt, isolating Task.Delay from StartAsync /
           scheduling overhead.

All resolved at 2026-05-24; 512/512 gateway tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-24 09:28:54 -04:00


Code Review — Tests

Field Value
Module src/ZB.MOM.WW.MxGateway.Tests
Reviewer Claude Code
Review date 2026-05-24
Commit reviewed 42b0037
Status Re-reviewed
Open findings 0

Checklist coverage

This pass (commit a020350) re-reviews the module after the Tests-013..019 batch was resolved alongside Server-017, Server-021, and Contracts-010.

# Category Result
1 Correctness & logic bugs Issue found: Tests-023 (the companion FakeWorkerProcess.WaitForExitAsync in SessionWorkerClientFactoryFakeWorkerTests.cs still uses the Tests-015 cheating pattern — HasExited = true; ExitCode = 0; regardless of whether the worker actually exited — and is a latent regression vector if any future exit assertion is added to that file). Tests-015 was only applied to the smoke-test copy.
2 mxaccessgw conventions No new issues. Style/convention drift previously filed (Tests-008) remains resolved at a020350.
3 Concurrency & thread safety No new issues. The remaining wall-clock dependencies (InvokeAsync_WhenSessionReady_RefreshesLease uses UtcNow at both ends of a ~1 hour delta, dwarfing clock resolution; CloseExpiredLeasesAsync_* reads UtcNow once and uses it consistently for both sides) are intrinsic to the production paths and not flake sources. The Tests-017 fix is in place at WorkerClientTests.cs:354.
4 Error handling & resilience No new issues. Tests-013 closed the bulk-method coverage gap end-to-end (per-entry failure surfaces, protocol-status failures, and cancellation propagation are all exercised). Pipe-disconnect / worker-fault / kill paths all covered.
5 Security No new issues. Adversarial-input safety (Tests-002), anonymous-localhost negatives (Tests-010), interceptor-service composition (Tests-004), constraint partial-denial merging (Server-021 — PredicateConstraintEnforcer + MxAccessGatewayServiceConstraintTests), and unmapped-RPC fail-closed (Server-017) all covered.
6 Performance & resource management No new issues. Tests-014 (await using WebApplication) is applied to all seven GatewayApplication.Build(...) sites. Tests-003 (TempDatabaseDirectory) cleanup is in place.
7 Design-document adherence Tests match docs/GatewayTesting.md; the new "Galaxy Filter Safety" subsection added under Tests-019 names GalaxyFilterInputSafetyTests. No drift found.
8 Code organization & conventions Issue found: Tests-021 (ManualTimeProvider is duplicated as a private sealed class in four test files — WorkerClientTests, FakeWorkerHarnessTests, SessionManagerTests, GalaxyHierarchyCacheTests — and should follow the Tests-007 TestSupport/ consolidation pattern).
9 Testing coverage Issues found: Tests-020 (MxAccessGatewayServiceConstraintTests covers only 2 of 4 WriteBulkConstraintPlan switch arms — Write2Bulk/WriteSecured2Bulk GetPayload/SetPayload would silently break with no failing test), Tests-022 (the eleven SessionManagerBulkTests.*_PropagatesCancellation tests pre-cancel the token, so the fake's first-line ThrowIfCancellationRequested handles it before InvokeBulkInternalAsync even runs — they do not exercise mid-flight cancellation), Tests-024 (BulkConstraintPlan.MergeDeniedInto silently drops or under-fills if the worker reply count diverges from the allowed-count — no test pins this protocol-mismatch edge case).
10 Documentation & comments No new issues. Tests-019's docs/GatewayTesting.md addition is in place; new test files (SessionManagerBulkTests, MxAccessGatewayServiceConstraintTests, PredicateConstraintEnforcer) all have orienting class-level summaries.

2026-05-24 re-review (commit 42b0037)

Re-review pass at 42b0037 after the dashboard admin-actions wave: the new DashboardSessionAdminServiceTests covers Close/Kill viewer/admin gating, the extended SessionManagerTests.KillWorkerAsync_* tests cover the SessionManager-side kill path, DashboardApiKeyManagementServiceTests gains DeleteAsync coverage, and EventStreamServiceTests adds a recording broadcaster fixture and a throwing-broadcaster fixture. Four FakeSessionManager impls were extended with KillWorkerAsync stubs to keep the interface compiling. Also new: DashboardSnapshotPublisherTests (Server-042 reconnect loop), HubTokenServiceTests (Server-039 null-identity rejection), and a large ProtobufContractRoundTripTests expansion for the new bulk write/read payload oneof cases.

# Category Result
1 Correctness & logic bugs No issues found in this diff.
2 mxaccessgw conventions No issues found in this diff.
3 Concurrency & thread safety Issue found: Tests-027 (the known-flake MxAccessGatewayServiceTests.StreamEvents_WhenEventIsWritten_RecordsSendDuration cross-talks via the shared MxGateway.Server Meter when parallel tests record mxgateway.events.stream_send.duration).
4 Error handling & resilience Issue found: Tests-029 (DashboardSessionAdminService.KillWorkerAsync has a SessionNotFound and a general SessionManagerException catch branch; neither is tested. Also CloseSessionAsync has no blank-id test even though Kill does — symmetric guard, asymmetric coverage).
5 Security No issues found in this diff.
6 Performance & resource management No issues found in this diff.
7 Design-document adherence No issues found in this diff.
8 Code organization & conventions No issues found in this diff — the NullDashboardEventBroadcaster consolidation (Tests-025) is in place.
9 Testing coverage Issues found: Tests-028 (KillWorkerAsync_KillsWorkerAndRemovesSession does not pin the reason argument propagating through SessionManager.KillWorkerAsync → session.KillWorker(reason) → IWorkerClient.Kill(reason); the fake's Kill(string reason) discards the argument and there is no KillWorkerAsync_WithBlankReason_ThrowsArgument test), Tests-030 (DashboardApiKeyManagementService.DeleteAsync_WhenStoreRefuses_ReportsFriendlyError does not inject an audit store or assert the "not-found-or-active" audit entry the product code emits on the failure path, and there is no DeleteAsync_BlankKeyId_ReturnsFailure for the ValidateKeyId guard).
10 Documentation & comments No issues found in this diff.
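The root cause of Tests-027 is a listener keyed on a meter name that several GatewayMetrics instances share. A small Python model can illustrate it. Meter, GatewayMetrics, and the process-wide PUBLISHED list are hypothetical stand-ins for the .NET Meter/MeterListener machinery. The model shows only one thing: a name filter sees every instance's measurements, while an identity filter (the ReferenceEquals fix) sees one.

```python
class Meter:
    """Stand-in for a .NET Meter: compared by identity, labelled by name."""

    def __init__(self, name: str) -> None:
        self.name = name


# Everything any Meter publishes lands here, the way a MeterListener
# can observe instruments from every Meter in the process.
PUBLISHED = []  # list of (meter, instrument_name, value)


class GatewayMetrics:
    """Hypothetical: each instance owns its own Meter, but all share one name."""

    def __init__(self) -> None:
        self.meter = Meter("MxGateway.Server")

    def record_send_duration(self, milliseconds: float) -> None:
        PUBLISHED.append((self.meter, "mxgateway.events.stream_send.duration", milliseconds))


def values_by_name(metrics: GatewayMetrics):
    # Old filter: matches every GatewayMetrics instance in the process.
    return [v for meter, _, v in PUBLISHED if meter.name == metrics.meter.name]


def values_by_identity(metrics: GatewayMetrics):
    # Fixed filter (ReferenceEquals analogue): matches only this instance's Meter.
    return [v for meter, _, v in PUBLISHED if meter is metrics.meter]


mine, parallel_test = GatewayMetrics(), GatewayMetrics()
mine.record_send_duration(1.5)
parallel_test.record_send_duration(9.0)
# values_by_name(mine) == [1.5, 9.0]; values_by_identity(mine) == [1.5]
```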

Flake-risk note: Tests-031 (DashboardSnapshotPublisherTests.ExecuteAsync_WhenSnapshotServiceThrowsOnce_ReconnectsAfterDelay measures the reconnect gap from startedAt to secondSubscribeAt with only 10 ms slack against a 50 ms reconnectDelay; Windows Task.Delay timer granularity is ~15 ms and the gap baseline includes scheduling overhead, so the lower-bound assertion can spuriously fail on a slow CI agent).
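The following is a Python asyncio sketch of the Tests-031 fix. It assumes the publisher simply retries after a fixed delay, and FlakyOnceSnapshotService, publisher_loop, and measure_reconnect_gap are hypothetical names. The fake stamps its equivalent of firstThrowAt immediately before throwing, so the measured gap covers only the retry delay and its wake-up jitter.

```python
import asyncio
import time
from typing import Optional


class FlakyOnceSnapshotService:
    """Hypothetical fake: the first subscribe throws, later subscribes succeed."""

    def __init__(self) -> None:
        self.calls = 0
        self.first_throw_at: Optional[float] = None
        self.second_subscribe_at: Optional[float] = None

    async def subscribe(self) -> None:
        self.calls += 1
        now = time.monotonic()  # the same clock asyncio's default event loop uses
        if self.calls == 1:
            self.first_throw_at = now  # stamped inside the fake, just before the throw
            raise ConnectionError("simulated snapshot failure")
        self.second_subscribe_at = now


async def publisher_loop(service: FlakyOnceSnapshotService, reconnect_delay: float) -> None:
    # Retry until a subscribe succeeds, waiting reconnect_delay after each failure.
    while True:
        try:
            await service.subscribe()
            return
        except ConnectionError:
            await asyncio.sleep(reconnect_delay)


async def measure_reconnect_gap(reconnect_delay: float = 0.05) -> float:
    service = FlakyOnceSnapshotService()
    await publisher_loop(service, reconnect_delay)
    # Only the retry delay and its wake-up jitter lie between these two stamps;
    # start-up and first-scheduling latency happen before first_throw_at.
    return service.second_subscribe_at - service.first_throw_at
```

A caller would then assert something like gap >= reconnect_delay - slack. The slack should be sized for timer granularity (about 15 ms on Windows), not for start-up cost.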

2026-05-24 review (commit d692232)

Re-review pass at d692232 scoped to the test-side fixture churn from the dashboard refactor wave: the rename touched every namespace declaration and using; the dashboard auth refactor rewrote three dashboard test files (DashboardApiKeyAuthorizationTests, DashboardAuthorizationHandlerTests, DashboardAuthenticatorTests); GatewayApplicationTests was updated for root-mounted routes and the new ViewerPolicy; DashboardCookieOptionsTests expects root-relative login/logout; a new DashboardHubsRegistrationTests pins the three hub /negotiate endpoints and the DI shape; and the EventStreamService ctor expansion drove inline NullDashboardEventBroadcaster fakes in two test files.

# Category Result
1 Correctness & logic bugs No issues found in the a020350..d692232 diff.
2 mxaccessgw conventions No issues found — namespaces updated cleanly, the fixture-helper consolidation pattern (TestSupport/) is intact.
3 Concurrency & thread safety No issues found in this diff.
4 Error handling & resilience No issues found in this diff.
5 Security No issues found — DashboardAuthorizationHandlerTests covers both Viewer and Admin role paths and the loopback bypass.
6 Performance & resource management No issues found in this diff.
7 Design-document adherence No issues found in this diff.
8 Code organization & conventions Issues found: Tests-025 (duplicate NullDashboardEventBroadcaster private classes in EventStreamServiceTests and GatewayEndToEndFakeWorkerSmokeTests; follow Tests-007 / Tests-021 consolidation pattern).
9 Testing coverage Issues found: Tests-026 (no test proves EventStreamService actually calls IDashboardEventBroadcaster.Publish for each event — the only consumers in tests are Null fakes).
10 Documentation & comments No issues found in this diff.
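Tests-026 needs a recording fake rather than a Null one. The minimal Python sketch below assumes the service forwards each event to the client stream and then to the broadcaster. RecordingBroadcaster, EventStreamService.stream, and GatewayEvent are hypothetical simplifications, not the real IDashboardEventBroadcaster surface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GatewayEvent:
    session_id: str
    tag: str
    value: float


class RecordingBroadcaster:
    """Hypothetical recording fake: keeps every published event, in order."""

    def __init__(self) -> None:
        self.published = []

    def publish(self, event: GatewayEvent) -> None:
        self.published.append(event)


class EventStreamService:
    """Hypothetical stand-in: writes each event to the client, then broadcasts it."""

    def __init__(self, broadcaster) -> None:
        self._broadcaster = broadcaster

    def stream(self, events, write_to_client) -> None:
        for event in events:
            write_to_client(event)
            self._broadcaster.publish(event)


events = [GatewayEvent("s1", f"Pump{i:03}.PV", float(i)) for i in range(3)]
broadcaster, client_writes = RecordingBroadcaster(), []
EventStreamService(broadcaster).stream(events, client_writes.append)
# broadcaster.published now equals events, one publish per event;
# a Null broadcaster fake could never demonstrate this.
```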

Findings

Tests-001

Field Value
Severity High
Category Testing coverage
Location src/MxGateway.Tests/Gateway/Grpc/MxAccessGatewayServiceTests.cs:483-489
Status Resolved

Description: FakeSessionManager.TryGetSession unconditionally returns true and synthesizes a session for any id. As a result, Invoke_WhenSessionMissing_ThrowsNotFound (line 52) only passes because InvokeException is pre-seeded — it does not verify that the gateway service maps a genuinely missing session to NotFound. No test exercises the real gateway path where TryGetSession returns false (for StreamEvents, CloseSession, or the alarm RPCs). A regression dropping the missing-session check would not be caught.

Recommendation: Make FakeSessionManager.TryGetSession return false for unknown ids (return only seeded sessions), then assert NotFound/InvalidArgument is produced by the service's own lookup logic rather than an injected exception.

Resolution: Resolved 2026-05-18: confirmed root cause — added ResolveOnlySeededSessions/SeedSession to FakeSessionManager so TryGetSession returns false for unseeded ids, rewrote Invoke_WhenSessionMissing_ThrowsNotFound to drop the injected InvokeException and exercise the service's own ResolveSession lookup (asserts InvokeCount == 0), and added Invoke_WhenSessionSeeded_ResolvesAndInvokes, AcknowledgeAlarm_WhenSessionMissing_ThrowsNotFound, and QueryActiveAlarms_WhenSessionMissing_ThrowsNotFound.
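The seeded-only fake from this resolution can be sketched as follows. The sketch is in Python, and FakeSessionManager, seed_session, try_get_session, and gateway_invoke are hypothetical simplifications of the C# fake and the service's ResolveSession path. Because the fake no longer synthesizes sessions, a NotFound result can only come from the service's own lookup.

```python
class NotFoundError(Exception):
    """Stands in for an RpcException carrying StatusCode.NotFound."""


class FakeSessionManager:
    """Hypothetical fake that resolves only sessions the test seeded."""

    def __init__(self) -> None:
        self._sessions = {}
        self.invoke_count = 0

    def seed_session(self, session_id: str) -> None:
        self._sessions[session_id] = object()

    def try_get_session(self, session_id: str):
        # (found, session): never synthesizes a session for an unknown id.
        session = self._sessions.get(session_id)
        return session is not None, session

    def invoke(self, session) -> str:
        self.invoke_count += 1
        return "ok"


def gateway_invoke(manager: FakeSessionManager, session_id: str) -> str:
    """The service-side lookup under test: a missing session maps to NotFound."""
    found, session = manager.try_get_session(session_id)
    if not found:
        raise NotFoundError(f"session {session_id!r} not found")
    return manager.invoke(session)
```

Asserting invoke_count == 0 after the NotFound case mirrors the InvokeCount == 0 check described above. It proves the failure happened before any worker call.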

Tests-002

Field Value
Severity High
Category Security
Location src/MxGateway.Tests/Gateway/Grpc/GalaxyRepositoryGrpcServiceTests.cs:198-210
Status Resolved

Description: The Galaxy Repository RPCs browse a SQL Server database (ZB). Every test injects a StubGalaxyHierarchyCache, so actual SQL query construction, parameterization, and filter/glob translation are never exercised. No test demonstrates that TagNameGlob, RootTagName, AlarmFilterPrefix, etc. are passed as parameters rather than concatenated into SQL. SQL-injection resistance of the Galaxy layer has zero coverage.

Recommendation: Add tests for the GalaxyRepository query-building layer (against SQLite or an in-memory abstraction, or by asserting parameter objects), covering glob/prefix inputs containing ', %, _, and ;. At minimum add a unit test over the SQL LIKE-pattern escaping helper.

Re-triage note: The finding's premise is partly misframed. GalaxyRepository issues only four constant SQL statements (HierarchySql, AttributesSql, SELECT 1, SELECT time_of_last_deploy FROM galaxy) — no DiscoverHierarchyRequest field is ever concatenated into SQL, so there is no dynamic SQL-injection surface and no LIKE-escaping helper to test. AlarmFilterPrefix belongs to the worker alarm path, not the Galaxy SQL layer. All filters (TagNameGlob, RootTagName, template-chain, category, contained-path) are applied in memory by GalaxyHierarchyProjector/GalaxyGlobMatcher against the cached snapshot. The genuine, testable concern — that adversarial filter strings are treated as opaque literals (no wildcard behaviour, no ReDoS, no exceptions) — remains valid and was previously uncovered. Severity left at High: an unsafe in-memory filter would still be a real security gap.

Resolution: Resolved 2026-05-18: added src/MxGateway.Tests/Galaxy/GalaxyFilterInputSafetyTests.cs (10 test methods, mostly [Theory] over the adversarial inputs "'", "' OR '1'='1", "'; DROP TABLE gobject;--", "%", "_", "100%_off", "[abc]", and "Pump'001") covering GalaxyGlobMatcher literal-treatment / LIKE-wildcard / pathological-input (ReDoS) behaviour and GalaxyHierarchyProjector + DiscoverHierarchy RPC handling of adversarial TagNameGlob, RootTagName, and TemplateChainContains. No product bug found — the in-memory filter layer treats all metacharacters as literals; the passing tests resolve the coverage gap.
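The following sketches literal-safe glob matching in the spirit of this resolution. It assumes a grammar in which only * and ? are wildcards. The actual GalaxyGlobMatcher syntax is not specified in this review, and glob_match is a hypothetical name. Every other character, including SQL LIKE metacharacters and regex brackets, is compared literally. No regex engine is involved: the two-pointer backtracking loop runs in time proportional to the product of the pattern and text lengths, so pathological patterns cannot trigger catastrophic backtracking.

```python
def glob_match(pattern: str, text: str) -> bool:
    """Hypothetical literal-safe glob: '*' = any run, '?' = any one char, all else literal."""
    p = t = 0
    star_p, star_t = -1, 0  # last '*' seen in pattern, and where in text it began matching
    while t < len(text):
        if p < len(pattern) and pattern[p] == "*":
            star_p, star_t = p, t     # let '*' match the empty string first
            p += 1
        elif p < len(pattern) and pattern[p] in ("?", text[t]):
            p += 1
            t += 1
        elif star_p != -1:
            star_t += 1               # widen the last '*' by one char and retry
            p, t = star_p + 1, star_t
        else:
            return False
    while p < len(pattern) and pattern[p] == "*":
        p += 1                        # trailing stars match the empty remainder
    return p == len(pattern)


# glob_match("100%_off", "100X_off") is False: '%' and '_' are not wildcards here.
# glob_match("*a" * 20 + "b", "a" * 5000) returns False quickly, with no backtracking blow-up.
```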

Tests-003

Field Value
Severity Medium
Category Performance & resource management
Location src/MxGateway.Tests/Security/Authentication/SqliteAuthStoreTests.cs:170-176, src/MxGateway.Tests/Security/Authentication/ApiKeyAdminCliRunnerTests.cs:252-258
Status Resolved

Description: CreateTempDatabasePath creates a fresh directory under %TEMP%\mxgateway-auth-tests\<guid> (and ...-cli-tests) for every test but nothing ever deletes it. WorkerProcessLauncherTests.TestDirectory correctly implements IDisposable and cleans up; these two do not. SQLite connection pooling can also keep the .db handle open after the test. Over many CI runs this leaks temp files and open handles.

Recommendation: Wrap the temp directory in an IDisposable/IAsyncDisposable helper (as WorkerProcessLauncherTests does) and call SqliteConnection.ClearAllPools() before deletion, or use Microsoft.Data.Sqlite in-memory mode where a real file is not needed.

Resolution: Resolved 2026-05-18: confirmed root cause — both CreateTempDatabasePath helpers created %TEMP% directories with no cleanup, and Microsoft.Data.Sqlite pools connections by default so the .db handle outlives the test. Added a shared TempDatabaseDirectory (src/MxGateway.Tests/Security/Authentication/TempDatabaseDirectory.cs) IDisposable helper that calls SqliteConnection.ClearAllPools() and recursively deletes its directory. SqliteAuthStoreTests and ApiKeyAdminCliRunnerTests now implement IDisposable, track every directory created via CreateTempDatabasePath, and dispose them after each test. All affected tests still pass.
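Below is a Python analogue of the TempDatabaseDirectory helper, offered only as a sketch. The class and method names are illustrative, not the C# helper's API. Python's sqlite3 module does not pool connections, so explicitly closing each connection stands in for the SqliteConnection.ClearAllPools() call made before the directory is deleted.

```python
import os
import shutil
import sqlite3
import tempfile
from contextlib import closing


class TempDatabaseDirectory:
    """Hypothetical per-test directory that is removed when the test finishes."""

    def __init__(self, prefix: str = "mxgateway-auth-tests-") -> None:
        self.path = tempfile.mkdtemp(prefix=prefix)

    def database_path(self, name: str = "auth.db") -> str:
        return os.path.join(self.path, name)

    def __enter__(self) -> "TempDatabaseDirectory":
        return self

    def __exit__(self, exc_type, exc, tb) -> None:
        # Every connection must already be closed: an open handle would block
        # deletion on Windows, which ClearAllPools() guards against in .NET.
        shutil.rmtree(self.path)


with TempDatabaseDirectory() as temp:
    db_file = temp.database_path()
    with closing(sqlite3.connect(db_file)) as connection:  # closing() really closes
        connection.execute("CREATE TABLE api_keys (id TEXT PRIMARY KEY)")
        connection.commit()
    existed_during_test = os.path.exists(db_file)
    directory = temp.path
```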

Tests-004

Field Value
Severity Medium
Category Testing coverage
Location src/MxGateway.Tests/Security/Authorization/GatewayGrpcAuthorizationInterceptorTests.cs
Status Resolved

Description: The authorization interceptor and MxAccessGatewayService are each tested in isolation, but no test composes the interceptor in front of the real service to confirm scope enforcement gates real RPCs end-to-end. A wiring mistake — interceptor not registered, or a new RPC added without a scope mapping in GatewayGrpcScopeResolver — would pass every existing test. GatewayGrpcScopeResolverTests also only checks an enumerated allow-list; it never asserts an unmapped request type fails closed.

Recommendation: Add an end-to-end test that runs OpenSession/Invoke through the interceptor+service composition with insufficient scope and asserts PermissionDenied; add a GatewayGrpcScopeResolver test asserting an unknown/unmapped request type throws or denies rather than returning a permissive default.

Resolution: Resolved 2026-05-18: confirmed the coverage gap. Added three interceptor+service composition tests to GatewayGrpcAuthorizationInterceptorTests that run the real GatewayGrpcAuthorizationInterceptor continuation into a real MxAccessGatewayService: InterceptorComposedWithService_OpenSessionMissingScope_DeniesBeforeServiceRuns (asserts PermissionDenied and OpenSessionCount == 0), InterceptorComposedWithService_OpenSessionWithScope_RunsServiceWithIdentity (service runs and observes the interceptor-pushed identity), and InterceptorComposedWithService_InvokeWriteCommandWithReadScope_DeniesBeforeServiceRuns (a Write command with only invoke:read is denied). Added two GatewayGrpcScopeResolverTests: ResolveRequiredScope_UnmappedRequestType_FailsClosedToAdminScope confirms an unmapped request type resolves to the most-restrictive Admin scope (the resolver's _ => GatewayScopes.Admin default already fails closed — no product bug), and ResolveRequiredScope_UnknownInvokeCommandKind_ReturnsInvokeReadScope confirms an unknown command kind does not silently grant write/admin access.
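The sketch below illustrates two properties: the fail-closed default, and the ordering that puts the interceptor before the service. The scope strings other than invoke:read, the request-type keys, and every function name are hypothetical. The real resolver's mapping table is not reproduced, and scope hierarchy (for example, Admin implying lesser scopes) is deliberately not modelled.

```python
from enum import Enum


class Scope(Enum):
    INVOKE_READ = "invoke:read"
    INVOKE_WRITE = "invoke:write"   # hypothetical value
    SESSION = "session"             # hypothetical value
    ADMIN = "admin"                 # hypothetical value


# Hypothetical mapping; the real GatewayGrpcScopeResolver table is not reproduced here.
SCOPE_MAP = {
    "OpenSessionRequest": Scope.SESSION,
    "ReadCommand": Scope.INVOKE_READ,
    "WriteCommand": Scope.INVOKE_WRITE,
}


def resolve_required_scope(request_type: str) -> Scope:
    # Fail closed: an unmapped request demands the most restrictive scope.
    return SCOPE_MAP.get(request_type, Scope.ADMIN)


class PermissionDenied(Exception):
    pass


class RecordingService:
    def __init__(self) -> None:
        self.calls = 0

    def handle(self, request_type: str) -> str:
        self.calls += 1
        return f"handled {request_type}"


def intercept(granted: set, request_type: str, service: RecordingService) -> str:
    # Authorization runs first; the service continuation runs only if it passes.
    if resolve_required_scope(request_type) not in granted:
        raise PermissionDenied(request_type)
    return service.handle(request_type)
```

Counting service calls is how the composition tests above distinguish "denied before the service ran" from "denied by the service itself".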

Tests-005

Field Value
Severity Medium
Category Testing coverage
Location src/MxGateway.Tests/Gateway/Grpc/EventStreamServiceTests.cs:239-261, src/MxGateway.Tests/Gateway/Sessions/SessionManagerTests.cs
Status Resolved

Description: Worker-crash handling is only tested as a clean terminal exception from ReadEventsAsync or a pre-set ShutdownException. There is no test for a worker that faults mid-command — an InvokeAsync in flight when the pipe/worker dies — which is a core fault-handling path of the two-process design. WorkerClientTests covers pipe-disconnect faulting the read loop, but not the interaction where a pending InvokeAsync task observes the fault and surfaces a meaningful error code.

Recommendation: Add a WorkerClient/SessionManager test that disposes the worker pipe (or emits a WorkerFault) while an InvokeAsync is pending, and assert the invoke task fails with a WorkerClientException/SessionManagerException carrying the worker-faulted error code.

Resolution: Resolved 2026-05-18: confirmed the coverage gap and confirmed the product path already handles it correctly (WorkerClient.ReadLoopAsync → SetFaulted → CompletePendingCommands(fault) fails every pending command with the fault exception). Added two WorkerClientTests: InvokeAsync_WhenPipeDisconnectsMidCommand_FailsPendingInvokeWithPipeDisconnected (worker reads the command then disposes its pipe side; the pending invoke task fails with WorkerClientErrorCode.PipeDisconnected) and InvokeAsync_WhenWorkerFaultsMidCommand_FailsPendingInvokeWithWorkerFaulted (worker emits a WorkerFault envelope while the invoke is pending; the task fails with WorkerClientErrorCode.WorkerFaulted). Both also assert the client transitions to Faulted. No product change needed.
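Here is a minimal Python asyncio sketch of that fault path. It assumes one future per in-flight command, keyed by correlation id. WorkerClient, ErrorCode, and set_faulted are hypothetical simplifications of the C# types. The sketch shows why a pending invoke observes the fault code instead of hanging.

```python
import asyncio
from enum import Enum


class ErrorCode(Enum):
    PIPE_DISCONNECTED = "PipeDisconnected"
    WORKER_FAULTED = "WorkerFaulted"


class WorkerClientError(Exception):
    def __init__(self, code: ErrorCode) -> None:
        super().__init__(code.value)
        self.code = code


class WorkerClient:
    """Hypothetical sketch: one pending future per in-flight command."""

    def __init__(self) -> None:
        self.state = "Ready"
        self._pending = {}
        self._next_id = 0

    def invoke(self) -> asyncio.Future:
        # Register the command; a reply (or a fault) completes the future later.
        self._next_id += 1
        future = asyncio.get_running_loop().create_future()
        self._pending[self._next_id] = future
        return future

    def set_faulted(self, code: ErrorCode) -> None:
        # Publish the terminal state, then fail every command still waiting.
        self.state = "Faulted"
        pending, self._pending = self._pending, {}
        for future in pending.values():
            if not future.done():
                future.set_exception(WorkerClientError(code))


async def pipe_drops_mid_command():
    client = WorkerClient()
    in_flight = client.invoke()
    client.set_faulted(ErrorCode.PIPE_DISCONNECTED)  # what the read loop does on pipe close
    try:
        await in_flight
    except WorkerClientError as error:
        return error.code, client.state
    return None, client.state
```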

Tests-006

Field Value
Severity Medium
Category Concurrency & thread safety
Location src/MxGateway.Tests/Gateway/Workers/WorkerClientTests.cs:76, src/MxGateway.Tests/Gateway/Workers/FakeWorkerHarnessTests.cs:122
Status Resolved

Description: Several tests rely on fixed Task.Delay values: WorkerClientTests.InvokeAsync_WithLateReply… waits a hard-coded 50 ms after writing a late reply before issuing the second command, and the heartbeat tests use a 20 ms delay to make timestamps strictly increase. On a slow CI agent the 50 ms delay can be insufficient, and DateTimeOffset.UtcNow resolution can make the 20 ms heartbeat-advance assertion flaky.

Recommendation: Replace fixed delays with the existing WaitUntilAsync condition polling, and inject a controllable TimeProvider for heartbeat-timestamp comparisons instead of relying on wall-clock advance.

Re-triage note: The brief flagged ReadLoop_WhenClientFaults_KillsOwnedWorkerProcess as "a real WorkerClient fault→kill bug". On inspection it is not a product bug — it is a test race. WorkerClient.SetFaulted publishes the Faulted state under lock before calling KillOwnedProcess, so the old test's WaitUntilAsync(() => client.State == Faulted) could return between those two statements and observe process.KillCount == 0. The kill itself always runs synchronously inside SetFaulted, and ShutdownAsync/DisposeAsync re-issue an idempotent kill, so no real consumer relies on "state==Faulted implies process dead". The fix is therefore a test-quality fix (correctly Medium / Concurrency), not a product fix.

Resolution: Resolved 2026-05-18: (1) Made ReadLoop_WhenClientFaults_KillsOwnedWorkerProcess deterministic — it now awaits FakeWorkerProcess.WaitForExitAsync (the TaskCompletionSource completed inside Kill()), which completes exactly when the kill runs, eliminating the state-polling race; verified by running it five times in isolation (5/5 pass). (2) Removed the fixed 50 ms Task.Delay from InvokeAsync_WithLateReply_IgnoresLateReplyAndKeepsClientReady — the stale reply and the second reply are now sent in pipe (FIFO) order, so the read loop discards the stale reply before the second reply with no timing window. (3) Replaced the 20 ms Task.Delay heartbeat-advance hacks in WorkerClientTests.ReadLoop_WhenHeartbeatArrives_UpdatesLastHeartbeatAndWorkerProcess and FakeWorkerHarnessTests.SendHeartbeatAsync_UpdatesClientHeartbeatState with an injected ManualTimeProvider advanced by a fixed TimeSpan; both tests now assert the exact post-advance timestamp instead of > against wall-clock drift.
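The ManualTimeProvider pattern reduces to a clock that moves only when the test advances it. The Python sketch below models it with hypothetical ManualClock and HeartbeatTracker classes. .NET's TimeProvider offers a richer surface than the single utc_now method modelled here. With such a clock, the assertion can be an exact equality rather than a > comparison against wall-clock drift.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


class ManualClock:
    """Hypothetical injected clock: time advances only when the test says so."""

    def __init__(self, start: datetime) -> None:
        self._now = start

    def utc_now(self) -> datetime:
        return self._now

    def advance(self, delta: timedelta) -> None:
        self._now += delta


class HeartbeatTracker:
    """Hypothetical consumer that stamps heartbeats from the injected clock."""

    def __init__(self, clock: ManualClock) -> None:
        self._clock = clock
        self.last_heartbeat: Optional[datetime] = None

    def on_heartbeat(self) -> None:
        self.last_heartbeat = self._clock.utc_now()


clock = ManualClock(datetime(2026, 5, 18, tzinfo=timezone.utc))
tracker = HeartbeatTracker(clock)
tracker.on_heartbeat()
first = tracker.last_heartbeat
clock.advance(timedelta(seconds=5))
tracker.on_heartbeat()
# tracker.last_heartbeat == first + timedelta(seconds=5), exactly
```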

Tests-007

Field Value
Severity Low
Category Code organization & conventions
Location src/MxGateway.Tests/Gateway/Grpc/MxAccessGatewayServiceTests.cs:682, src/MxGateway.Tests/Gateway/Grpc/GalaxyRepositoryGrpcServiceTests.cs:324, src/MxGateway.Tests/Gateway/GatewayEndToEndFakeWorkerSmokeTests.cs:460, src/MxGateway.Tests/Security/Authorization/GatewayGrpcAuthorizationInterceptorTests.cs:233
Status Resolved

Description: A near-identical TestServerCallContext implementation is copy-pasted into at least four test files (and AllowAllConstraintEnforcer / TestServerStreamWriter / RecordingStreamWriter into several). Duplication risks the copies drifting and bloats each file.

Recommendation: Extract a shared TestServerCallContext, RecordingServerStreamWriter<T>, and AllowAllConstraintEnforcer into a common test-support folder/namespace.

Resolution: Resolved 2026-05-18: confirmed five duplicated copies (the brief's four plus a fifth in Galaxy/GalaxyFilterInputSafetyTests.cs). Added a shared MxGateway.Tests.TestSupport namespace under src/MxGateway.Tests/TestSupport/: TestServerCallContext.cs (single class with an optional Metadata? requestHeaders constructor parameter that subsumes both the no-arg and headers-bearing variants), RecordingServerStreamWriter.cs (thread-safe writer with Messages and WaitForFirstMessageAsync, replacing TestServerStreamWriter/RecordingStreamWriter/RecordingServerStreamWriter), and AllowAllConstraintEnforcer.cs. Deleted all five TestServerCallContext copies, both AllowAllConstraintEnforcer copies, and the three stream-writer copies; updated the five test files to using MxGateway.Tests.TestSupport; and renamed .Items call sites to .Messages. Removed the now-unused Grpc.Core using from GatewayEndToEndFakeWorkerSmokeTests.cs. Build clean (0 warnings) and suite green.

Tests-008

Field Value
Severity Low
Category mxaccessgw conventions
Location src/MxGateway.Tests/Gateway/Sessions/WorkerAlarmRpcDispatcherTests.cs:1-9, src/MxGateway.Tests/Gateway/Sessions/NotWiredAlarmRpcDispatcherTests.cs:1-3, src/MxGateway.Tests/Gateway/Sessions/SessionManagerAlarmAutoSubscribeTests.cs:1
Status Resolved

Description: The alarm test files diverge from the project's C# style and the rest of the suite: snake_case test method names instead of the PascalCase Method_Condition_Result pattern; redundant explicit using System;/System.Threading; imports despite implicit global usings; and explicit-type new instead of target-typed new() used elsewhere. There is also a typo in fixture data ("wnwrap subscribe failed").

Recommendation: Rename the alarm tests to the house Method_Condition_Result convention, drop redundant System.* usings, align new usage, and fix the wnwrap typo.

Re-triage note: Two of the finding's claims are incorrect. (1) "wnwrap subscribe failed" is not a typo: WnWrap is the real name of the worker's WnWrapAlarmConsumer MXAccess component (src/MxGateway.Worker/MxAccess/WnWrapAlarmConsumer.cs); the fixture string deliberately references it, so it was left unchanged. (2) SessionManagerAlarmAutoSubscribeTests.cs already uses PascalCase Method_Condition_Result names and target-typed new(), and its lone using System.Runtime.CompilerServices; is required for [EnumeratorCancellation] (not a global using) — it is not redundant. That file needed no change. The genuine style drift was confined to WorkerAlarmRpcDispatcherTests.cs and NotWiredAlarmRpcDispatcherTests.cs.

Resolution: Resolved 2026-05-18: renamed all ten WorkerAlarmRpcDispatcherTests methods and both NotWiredAlarmRpcDispatcherTests methods from snake_case to the house Method_Condition_Result PascalCase convention; dropped the redundant System/System.Collections.Generic/System.Linq/System.Threading/System.Threading.Tasks usings from WorkerAlarmRpcDispatcherTests.cs and System.Threading/System.Threading.Tasks from NotWiredAlarmRpcDispatcherTests.cs (all are implicit global usings), keeping the required System.Runtime.CompilerServices; converted explicit-type new SessionRegistry()/new WorkerAlarmRpcDispatcher(...)/new FakeAlarmWorkerClient/new List<...>()/new GatewaySession(...) to target-typed new(); and replaced the fully-qualified System.StringComparison with StringComparison. See the re-triage note for the two claims not actioned. Suite green.

Tests-009

Field Value
Severity Low
Category Documentation & comments
Location src/MxGateway.Tests/Gateway/Sessions/SessionManagerTests.cs:36-37,99,365
Status Resolved

Description: Several XML <summary> comments are copy-paste mismatches: the comment above OpenSessionAsync_SetsInitialDefaultLease describes correlation-ID generation; the comment above GatewaySessionSubscribeBulkAsync_ForwardsOneBulkCommand… describes lease refresh; the comment above CloseExpiredLeasesAsync_DoesNotCloseActiveEventSubscriber describes shutdown closing all sessions. Misleading test docs hinder triage.

Recommendation: Correct the <summary> text to match each test's actual behavior, or remove the redundant comments since the test names already describe the behavior.

Resolution: Resolved 2026-05-18: confirmed three copy-paste <summary> mismatches. Each mislabelled comment was the summary of the test that follows it, left attached to the wrong method (so the test below each had no summary). Corrected all three: OpenSessionAsync_SetsInitialDefaultLease now describes setting the initial lease expiry; the comment above InvokeAsync_WhenSessionReady_RefreshesLease (the finding mis-cited the method name as GatewaySessionSubscribeBulkAsync_…) now describes lease refresh on invoke; and CloseExpiredLeasesAsync_DoesNotCloseActiveEventSubscriber now describes the expired-lease sweep leaving an active-event-subscriber session open. No behavior change.

Tests-010

Field Value
Severity Low
Category Security
Location src/MxGateway.Tests/Gateway/Dashboard/DashboardAuthorizationHandlerTests.cs:26-36
Status Resolved

Description: The anonymous-localhost bypass is tested only for the success case (allowAnonymousLocalhost: true + loopback succeeds) and the remote-unauthenticated denial. There is no test for the security-critical negatives: anonymous + loopback when AllowAnonymousLocalhost is false must be denied, and anonymous + non-loopback when the flag is true must still be denied (the bypass is scoped strictly to loopback). Those are the misconfiguration cases that would expose the dashboard.

Recommendation: Add tests: anonymous + loopback + allowAnonymousLocalhost: false → not succeeded; anonymous + non-loopback + allowAnonymousLocalhost: true → not succeeded.

Resolution: Resolved 2026-05-18: confirmed the coverage gap and confirmed DashboardAuthorizationHandler already gates the bypass correctly on AllowAnonymousLocalhost && IsLoopbackRequest() (no product bug). Added two DashboardAuthorizationHandlerTests: HandleAsync_AnonymousLocalhostDisallowed_DoesNotSucceed (anonymous + loopback + allowAnonymousLocalhost: false → not succeeded) and HandleAsync_AnonymousLocalhostAllowedFromRemoteAddress_DoesNotSucceed (anonymous + non-loopback + allowAnonymousLocalhost: true → not succeeded, proving the bypass stays scoped to loopback). Both pass.
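The gate these two tests pin can be sketched as a single predicate. This Python illustration assumes the anonymous decision depends only on the flag and the remote address. anonymous_allowed is a hypothetical name, and the real handler's IsLoopbackRequest() may inspect more than the IP address.

```python
import ipaddress


def anonymous_allowed(allow_anonymous_localhost: bool, remote_ip: str) -> bool:
    """Hypothetical predicate: the bypass needs the flag AND a loopback caller."""
    return allow_anonymous_localhost and ipaddress.ip_address(remote_ip).is_loopback


# (True,  "127.0.0.1")   -> True   bypass enabled, loopback caller
# (True,  "::1")         -> True   IPv6 loopback counts too
# (False, "127.0.0.1")   -> False  flag off: HandleAsync_AnonymousLocalhostDisallowed_...
# (True,  "10.20.30.40") -> False  remote caller: ..._FromRemoteAddress_DoesNotSucceed
```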

Tests-011

Field Value
Severity Low
Category Correctness & logic bugs
Location src/MxGateway.Tests/Gateway/GatewayEndToEndFakeWorkerSmokeTests.cs:233-301
Status Resolved

Description: GatewayEndToEndFakeWorkerSmokeTests correctly stores and awaits launcher.WorkerTask, but SessionWorkerClientFactoryFakeWorkerTests uses _ = RunWorkerAsync(...) with no stored task (lines 152, 184, 220). An unhandled exception in the scripted worker becomes an unobserved Task exception that can surface as a process-level failure in an unrelated later test rather than failing the owning test.

Recommendation: Store the worker task and either await it during disposal or attach a continuation that fails the test on fault, mirroring GatewayEndToEndFakeWorkerSmokeTests.

Resolution: Resolved 2026-05-18: confirmed all three scripted launchers in SessionWorkerClientFactoryFakeWorkerTests discarded the worker task. Added an IWorkerTaskLauncher interface (each launcher now stores its scripted task in a WorkerTask property and exposes ObserveWorkerTaskAsync); the test class now implements IAsyncDisposable, tracks every launcher it creates via a Track helper, and in DisposeAsync awaits each WorkerTask (within TestTimeout) so a scripted-worker fault fails the owning test instead of leaking as an unobserved TaskScheduler.UnobservedTaskException. OperationCanceledException and IOException — the expected outcomes of the worker client tearing the pipe down — are swallowed; anything else rethrows. NeverReadyWorkerProcessLauncher (which parks on an infinite Task.Delay) was given its own CancellationTokenSource so disposal can cancel and observe the parked task. Suite green.
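The Python asyncio sketch below applies the same discipline. Every scripted-worker task is tracked, and disposal awaits each one within a timeout, swallowing only the expected teardown exceptions. WorkerTaskTracker is a hypothetical name. CancelledError and OSError play the roles of OperationCanceledException and IOException.

```python
import asyncio

# Analogues of OperationCanceledException / IOException from pipe teardown.
EXPECTED_TEARDOWN = (asyncio.CancelledError, OSError)


class WorkerTaskTracker:
    """Hypothetical helper: no scripted-worker task is ever fire-and-forget."""

    def __init__(self, timeout: float = 5.0) -> None:
        self._tasks = []
        self._timeout = timeout

    def track(self, coro) -> asyncio.Task:
        task = asyncio.ensure_future(coro)
        self._tasks.append(task)
        return task

    async def dispose(self) -> None:
        for task in self._tasks:
            try:
                await asyncio.wait_for(task, self._timeout)
            except EXPECTED_TEARDOWN:
                pass  # the client tearing the pipe down is an expected outcome
            # Anything else (including a timeout) propagates and fails the owning test.


async def scenario():
    tracker = WorkerTaskTracker(timeout=1.0)

    async def pipe_torn_down():
        raise BrokenPipeError("client closed the pipe")  # an OSError subclass

    async def scripted_worker_bug():
        raise ValueError("unexpected frame")

    tracker.track(pipe_torn_down())
    parked = tracker.track(asyncio.sleep(3600))  # like a worker parked on a long delay
    parked.cancel()                              # disposal-time cancellation, then observed
    tracker.track(scripted_worker_bug())
    try:
        await tracker.dispose()
    except ValueError as error:
        return str(error)
    return None
```

This simplification has one caveat: catching CancelledError during disposal would also mask a cancellation aimed at the disposing task itself.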

Tests-012

Field Value
Severity Low
Category Concurrency & thread safety
Location src/MxGateway.Tests/Gateway/Workers/Fakes/FakeWorkerHarness.cs:62, src/MxGateway.Tests/Gateway/Workers/WorkerClientTests.cs:472
Status Resolved

Description: Pipe names are uniquified per test with a GUID (good), but xUnit runs test classes in parallel by default and there is no xunit.runner.json or collection configuration. Tests that build a full WebApplication bind ephemeral ports (--urls=http://127.0.0.1:0, fine) but spin up DI containers and hosted services concurrently. Currently safe, but a future test binding a fixed port would silently collide.

Recommendation: Add an xunit.runner.json or a collection grouping the WebApplication-building tests, and keep the :0 ephemeral-port convention explicit so future tests do not introduce a fixed-port collision.

Resolution: Resolved 2026-05-18: added src/MxGateway.Tests/xunit.runner.json making the parallelism policy explicit (parallelizeTestCollections: true, maxParallelThreads: -1, parallelizeAssembly: false, longRunningTestSeconds: 30) and wired it into MxGateway.Tests.csproj as <None Update="xunit.runner.json" CopyToOutputDirectory="PreserveNewest" /> so the runner picks it up (confirmed present in bin/Debug/net10.0/). Added a comment at the only WebApplication-building call site (GatewayApplicationTests.cs, --urls=http://127.0.0.1:0) documenting that the ephemeral-port (:0) convention is mandatory because test collections run in parallel. No fixed-port binding exists today; this is a preventative guardrail as the finding recommends.
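The ephemeral-port convention relies on the operating system assigning a free port whenever a socket binds port 0. The minimal Python illustration below uses the standard socket module, and bind_ephemeral is a hypothetical helper name. It shows two concurrent bindings receiving distinct ports, which is why parallel test collections cannot collide under the :0 rule.

```python
import socket


def bind_ephemeral(host: str = "127.0.0.1") -> socket.socket:
    """Hypothetical helper: port 0 asks the OS for any currently free port."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.bind((host, 0))
    sock.listen()
    return sock


first, second = bind_ephemeral(), bind_ephemeral()
# Both sockets are live at the same time, so the OS must hand out two different ports.
ports = (first.getsockname()[1], second.getsockname()[1])
first.close()
second.close()
```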

Tests-013

Field Value
Severity Medium
Category Testing coverage
Location src/MxGateway.Server/Sessions/GatewaySession.cs:449-679, src/MxGateway.Tests/Gateway/Sessions/SessionManagerTests.cs
Status Resolved

Description: GatewaySession exposes eleven bulk methods (AddItemBulkAsync, AdviseItemBulkAsync, RemoveItemBulkAsync, UnAdviseItemBulkAsync, SubscribeBulkAsync, UnsubscribeBulkAsync, WriteBulkAsync, Write2BulkAsync, WriteSecuredBulkAsync, WriteSecured2BulkAsync, ReadBulkAsync) but only three (SubscribeBulkAsync, WriteBulkAsync, ReadBulkAsync) are exercised in SessionManagerTests. A grep across src/MxGateway.Tests for the other eight method names returns zero matches. The recent commit eaa7093 ("register the five new bulk subcommands in IsKnownGatewayCommand") explicitly added bulk surface to the gateway, and 1cd51bb added stress benchmarks for it, but the gateway-side tests do not pin the command-kind, payload-shape, or WriteSecured*Bulk credential-redaction behaviour for any of the new bulk variants. A future regression in WriteSecuredBulkAsync body construction would not be caught by the gateway unit suite.

Recommendation: Mirror the existing SubscribeBulkAsync / WriteBulkAsync / ReadBulkAsync test pattern for the eight missing methods: each test should OpenSessionAsync, invoke the bulk API, assert the worker received exactly one WorkerCommand of the matching MxCommandKind, and (for the secured variants) confirm the credential payload survives the round-trip without being log-redacted from the over-the-wire command shape.

Resolution: Resolved 2026-05-20: added src/MxGateway.Tests/Gateway/Sessions/SessionManagerBulkTests.cs with per-method coverage for all eleven bulk entry points. Each method now has a round-trip test that pins (a) the exact MxCommandKind sent to the worker, (b) the payload shape (server handle, item handles / tag addresses / entries, timeout for ReadBulk), and (c) per-entry failure surfacing where the reply contains a mix of WasSuccessful = true/false results with an ErrorMessage. Each method also has a *_PropagatesCancellation test that pre-cancels the token and asserts OperationCanceledException flows out. The secured variants additionally pin that CurrentUserId / VerifierUserId survive the over-the-wire command shape unchanged (the gateway's redaction rules apply only to logs, not to the command body the worker receives). New tests use a local FakeBulkWorkerClient keyed by MxCommand.Kind-specific replies; no production-code change. All 54 SessionManager/GalaxyHierarchyCache tests pass with dotnet test --filter "FullyQualifiedName~SessionManager|FullyQualifiedName~GalaxyHierarchyCache".
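The per-method pattern described here can be sketched in Python. Each test expects exactly one command of the expected kind, an intact payload shape, secured credentials passed through verbatim, and per-entry failures surfaced in the results. FakeBulkWorkerClient only mirrors the role of the C# fake. WorkerCommand, write_secured_bulk, and the payload field names are hypothetical and do not reflect the real GatewaySession API.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class WorkerCommand:
    kind: str
    payload: dict


@dataclass
class FakeBulkWorkerClient:
    """Returns canned replies keyed by command kind and records every command."""

    replies: dict
    received: list = field(default_factory=list)

    def invoke(self, command: WorkerCommand) -> list:
        self.received.append(command)
        return self.replies[command.kind]


def write_secured_bulk(client, server_handle, entries, current_user_id, verifier_user_id):
    """Hypothetical session method: one bulk command, credentials passed through verbatim."""
    command = WorkerCommand(
        kind="WriteSecuredBulk",
        payload={
            "server_handle": server_handle,
            "entries": list(entries),
            # Log redaction would not apply here: this is the wire payload.
            "current_user_id": current_user_id,
            "verifier_user_id": verifier_user_id,
        },
    )
    return client.invoke(command)
```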

Tests-014

Field Value
Severity Low
Category Performance & resource management
Location src/MxGateway.Tests/Gateway/GatewayApplicationTests.cs:18,33,44,62,81,105, src/MxGateway.Tests/Gateway/Dashboard/DashboardCookieOptionsTests.cs:17
Status Resolved

Description: Seven [Fact] methods build a real WebApplication via GatewayApplication.Build([]) and never dispose it. WebApplication is IAsyncDisposable. Constructing one stands up a full DI container, an OpenTelemetry meter (GatewayMetrics), Kestrel server objects, hosted services, and logging providers. Every undisposed instance keeps its meter, loggers, and hosted services alive until the test process exits. Because the suite runs test collections in parallel (per the new xunit.runner.json from Tests-012), live Meter instances accumulate across concurrently running tests and silently extend the memory and handle footprint of an xUnit run. Only the two tests that actually call app.StartAsync() (GatewayApplicationTests.StartAsync_InvalidGatewayConfiguration_FailsStartup and SqliteAuthStoreTests.StartAsync_NewerSchemaVersion_BlocksStartup) currently use await using.

Recommendation: Promote each WebApplication app = GatewayApplication.Build(...) to await using WebApplication app = ... and make the containing test method async Task. The endpoint-listing assertions do not need await, but the await using will ensure the DI container, meter, and hosted services are torn down per-test.

Resolution: 2026-05-20 — Promoted all seven WebApplication-building tests (six in GatewayApplicationTests plus the one in DashboardCookieOptionsTests) to async Task with await using WebApplication app = GatewayApplication.Build(...). The DI container, GatewayMetrics meter, hosted services, and Kestrel objects are now torn down per-test rather than leaking until process exit. StartAsync_InvalidGatewayConfiguration_FailsStartup, which already used await using, was left unchanged. Full suite green.

Tests-015

Field Value
Severity Low
Category Correctness & logic bugs
Location src/MxGateway.Tests/Gateway/GatewayEndToEndFakeWorkerSmokeTests.cs:374-379,87
Status Resolved

Description: The nested FakeWorkerProcess.WaitForExitAsync implementation unconditionally sets HasExited = true and ExitCode ??= 0 when called, regardless of whether the scripted worker actually completed the shutdown handshake. The smoke-test assertion Assert.True(launcher.Process.HasExited) therefore cannot distinguish "the scripted worker received WorkerShutdown, sent WorkerShutdownAck, and called MarkExited(0)" from "the gateway code path simply awaited WaitForExitAsync somewhere during teardown". The scripted worker happens to call MarkExited(0) after receiving the shutdown frame, but a regression that bypassed the shutdown-ack path entirely would still pass this assertion. The companion launcher in SessionWorkerClientFactoryFakeWorkerTests.FakeWorkerProcess.WaitForExitAsync (lines 351-356) has the same shape — fine there because no exit assertion is made — but the smoke test relies on this signal.

Recommendation: Make WaitForExitAsync await an internal TaskCompletionSource that is only completed by Kill() or MarkExited() (the same pattern WorkerClientTests.FakeWorkerProcess already uses for _exited), so HasExited reflects actual exit and the smoke test's assertion is meaningful.

Resolution: 2026-05-20 — Rewrote the smoke-test FakeWorkerProcess to back WaitForExitAsync with a TaskCompletionSource _exited that is only completed inside MarkExited (called by the scripted worker after sending WorkerShutdownAck) or Kill (which calls MarkExited(-1)), removing the "set HasExited = true and return immediately" cheat. The smoke test now also asserts Assert.Equal(0, launcher.Process.ExitCode)MarkExited(0) is reachable only via the shutdown-ack branch, so a regression that bypassed the ack path would produce a non-zero (or null) exit code and fail the assertion deterministically. WorkerClient.ShutdownAsync calls WaitForProcessExitAsync, which now genuinely awaits the scripted worker's ack.
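Below is a Python sketch of the exit-signal pattern from this resolution, using asyncio.Event in place of the C# TaskCompletionSource. FakeWorkerProcess and its members are a hypothetical analogue, not the test's actual type. Keeping the first exit code when both mark_exited and kill fire is an assumption of the sketch.

```python
from __future__ import annotations

import asyncio


class FakeWorkerProcess:
    """has_exited/exit_code change only via mark_exited or kill, never by waiting."""

    def __init__(self) -> None:
        # Create inside the coroutine that uses it so the Event belongs to that loop.
        self.has_exited = False
        self.exit_code: int | None = None
        self.kill_count = 0
        self._exited = asyncio.Event()

    def mark_exited(self, exit_code: int) -> None:
        if self._exited.is_set():
            return  # first exit wins (assumption of this sketch)
        self.has_exited = True
        self.exit_code = exit_code
        self._exited.set()

    def kill(self) -> None:
        self.kill_count += 1
        self.mark_exited(-1)

    async def wait_for_exit(self, timeout: float) -> None:
        # Await the real signal; the old cheat set has_exited here instead.
        await asyncio.wait_for(self._exited.wait(), timeout)


async def shutdown_demo() -> tuple:
    process = FakeWorkerProcess()
    try:
        await process.wait_for_exit(0.05)  # nothing has exited yet
    except asyncio.TimeoutError:
        pass
    before = (process.has_exited, process.exit_code)
    process.mark_exited(0)  # scripted worker, after sending the shutdown ack
    await process.wait_for_exit(1.0)
    return before + (process.has_exited, process.exit_code)


async def kill_demo() -> tuple:
    process = FakeWorkerProcess()
    process.kill()
    await process.wait_for_exit(1.0)
    return process.has_exited, process.exit_code, process.kill_count
```

If a regression never reaches the shutdown-ack branch, exit_code stays None, or becomes -1 after a kill. The ExitCode == 0 assertion detects exactly that case.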

Tests-016

Field Value
Severity Medium
Category Testing coverage
Location src/MxGateway.Tests/Galaxy/GalaxyHierarchyCacheTests.cs:29-41,115-124
Status Resolved

Description: RefreshAsync_WhenSqlIsUnreachable_MarksUnavailableAndDoesNotPublish is in the unit-test project but exercises a real GalaxyHierarchyCache/GalaxyRepository against a hard-coded TCP socket 127.0.0.1:65500 with a one-second connect timeout. Per docs/GatewayTesting.md, live Galaxy coverage belongs in MxGateway.IntegrationTests and is gated by MXGATEWAY_RUN_LIVE_GALAXY_TESTS=1; this test is neither gated nor uses a stub repository. On most boxes the connect fails closed (the test passes), but the outcome depends on OS-level "connection refused" vs "no route to host" behaviour and is sensitive to environments where 127.0.0.1:65500 happens to be bound — a real flakiness source. It also breaks the gateway-without-MXAccess invariant in spirit (the gateway code path under test does I/O the unit project should not need).

Recommendation: Either (a) replace the real repository with an in-test fake that throws a SqlException/TimeoutException from GetHierarchyAsync, exercising GalaxyHierarchyCache.RefreshAsync's exception path directly; or (b) move the test to MxGateway.IntegrationTests and gate it behind a "no-live-DB-required" variant of the live-Galaxy attribute. (a) is preferred because the production path being tested is the cache's reaction to a repository exception, not socket behaviour.

Resolution: Resolved 2026-05-20: applied option (a). Introduced src/MxGateway.Server/Galaxy/IGalaxyRepository.cs with the four methods the cache consumes (TestConnectionAsync, GetLastDeployTimeAsync, GetHierarchyAsync, GetAttributesAsync); made GalaxyRepository implement it; changed GalaxyHierarchyCache's constructor to depend on IGalaxyRepository rather than the concrete type; and registered the interface against the existing concrete singleton in GalaxyRepositoryServiceCollectionExtensions.AddGalaxyRepository. Rewrote the test as RefreshAsync_WhenRepositoryThrows_MarksUnavailableAndDoesNotPublish using a local ThrowingGalaxyRepository : IGalaxyRepository that throws an InvalidOperationException from GetLastDeployTimeAsync (the first call the cache makes against the repository). The test now exercises the cache's exception branch directly — no TCP I/O — and additionally asserts that GetHierarchyAsync/GetAttributesAsync are NOT invoked once the deploy-time probe has failed. Current_BeforeAnyRefresh_ReturnsEmpty was migrated to the same fake. The unreachable CreateCache helper that built a real GalaxyRepository against 127.0.0.1:65500 was removed. The Galaxy SQL surface itself stays covered by MxGateway.IntegrationTests.Galaxy.GalaxyRepositoryLiveTests (gated by MXGATEWAY_RUN_LIVE_GALAXY_REPOSITORY_TESTS=1).
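The repository seam introduced here can be illustrated with a Python Protocol standing in for IGalaxyRepository. The method names, the HierarchyCache class, and its probe-then-fetch refresh logic are simplified, hypothetical analogues. RuntimeError plays the role of the InvalidOperationException thrown by the C# fake.

```python
from __future__ import annotations

import asyncio
from typing import Protocol


class GalaxyRepository(Protocol):
    """Hypothetical Python shape of the repository seam the cache consumes."""

    async def get_last_deploy_time(self) -> str: ...
    async def get_hierarchy(self) -> list: ...
    async def get_attributes(self) -> list: ...


class ThrowingGalaxyRepository:
    """Fails the deploy-time probe and counts any later calls."""

    def __init__(self) -> None:
        self.hierarchy_calls = 0
        self.attribute_calls = 0

    async def get_last_deploy_time(self) -> str:
        raise RuntimeError("simulated repository failure")

    async def get_hierarchy(self) -> list:
        self.hierarchy_calls += 1
        return []

    async def get_attributes(self) -> list:
        self.attribute_calls += 1
        return []


class HierarchyCache:
    """Heavily simplified, hypothetical cache: probe first, then fetch."""

    def __init__(self, repository: GalaxyRepository) -> None:
        self._repository = repository
        self.available: bool | None = None  # unknown until the first refresh
        self.current: tuple = ()            # empty before any refresh

    async def refresh(self) -> None:
        try:
            await self._repository.get_last_deploy_time()
        except Exception:
            self.available = False
            return  # publish nothing; keep the previous snapshot
        hierarchy = await self._repository.get_hierarchy()
        attributes = await self._repository.get_attributes()
        self.current = (tuple(hierarchy), tuple(attributes))
        self.available = True


async def refresh_with_throwing_repository() -> tuple:
    repository = ThrowingGalaxyRepository()
    cache = HierarchyCache(repository)
    await cache.refresh()
    return (cache.available, cache.current,
            repository.hierarchy_calls, repository.attribute_calls)
```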

Tests-017

Field Value
Severity Low
Category Concurrency & thread safety
Location src/MxGateway.Tests/Gateway/Workers/WorkerClientTests.cs:346-364
Status Resolved

Description: HeartbeatMonitor_WhenHeartbeatExpires_FaultsClient configures HeartbeatGrace = 80 ms and HeartbeatCheckInterval = 20 ms, then asserts the client faults within the 5-second TestTimeout. The test compares against the real wall clock — the heartbeat monitor reads TimeProvider.System for the grace check. After Tests-006 migrated the other heartbeat tests to an injected ManualTimeProvider for determinism, this one is now the only WorkerClientTests heartbeat case that still rides the wall clock. The 5-second outer bound makes a false failure unlikely, but the test cannot fail fast when the heartbeat-monitor logic regresses — it just waits the full 5 seconds.

Recommendation: Inject the same ManualTimeProvider used by ReadLoop_WhenHeartbeatArrives_UpdatesLastHeartbeatAndWorkerProcess, then clock.Advance(TimeSpan.FromSeconds(2)) past the grace and assert the fault deterministically. The HeartbeatCheckInterval (20 ms) timer fire can stay on the real clock; what needs to be deterministic is the grace comparison.

Resolution: 2026-05-20 — HeartbeatMonitor_WhenHeartbeatExpires_FaultsClient now constructs a ManualTimeProvider seeded at "2026-05-20T12:00:00Z", passes it to CreateClient via the existing timeProvider parameter, and calls clock.Advance(TimeSpan.FromSeconds(2)) after the handshake. WorkerClient.MarkReady records _lastHeartbeatAt from the manual clock, so the next 20 ms HeartbeatCheckInterval tick observes now - lastHeartbeat = 2s > 80ms grace and faults deterministically. The check-interval timer stays on the real clock as the finding recommended; only the grace comparison is deterministic.
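The deterministic part of this fix is that the grace comparison reads an injected clock, while the tick timer can stay real. That reduces to the following Python sketch. ManualTimeProvider and HeartbeatMonitor are illustrative analogues of the C# types. The monitor's logic is an assumption based on the resolution's description: the last heartbeat is recorded at ready time, and the client faults when now minus the last heartbeat exceeds the grace period.

```python
from datetime import datetime, timedelta, timezone


class ManualTimeProvider:
    """Clock that only moves when the test advances it."""

    def __init__(self, start: datetime) -> None:
        self._now = start

    def get_utc_now(self) -> datetime:
        return self._now

    def advance(self, delta: timedelta) -> None:
        self._now += delta


class HeartbeatMonitor:
    """Hypothetical grace check; check() runs on each check-interval tick."""

    def __init__(self, clock: ManualTimeProvider, grace: timedelta) -> None:
        self._clock = clock
        self._grace = grace
        self._last_heartbeat = clock.get_utc_now()  # recorded at ready time
        self.faulted = False

    def record_heartbeat(self) -> None:
        self._last_heartbeat = self._clock.get_utc_now()

    def check(self) -> bool:
        elapsed = self._clock.get_utc_now() - self._last_heartbeat
        if elapsed > self._grace:
            self.faulted = True
        return self.faulted
```

In the C# test, the real 20 ms timer plays the role of calling check() repeatedly. The elapsed time comes from the manual clock, so the first tick after advancing two seconds faults.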

Tests-018

Field Value
Severity Low
Category Code organization & conventions
Location src/MxGateway.Tests/Galaxy/GalaxyHierarchyCacheTests.cs:32, src/MxGateway.Tests/Gateway/Dashboard/DashboardSnapshotServiceTests.cs:45,51,57,105,134,163,167,202-209,284,317,523, src/MxGateway.Tests/Gateway/Sessions/SessionManagerTests.cs:40
Status Resolved

Description: Several tests parse ISO-8601 literals with DateTimeOffset.Parse("2026-04-26T10:00:00Z") without an explicit CultureInfo.InvariantCulture. Directory.Build.props enables TreatWarningsAsErrors, but CA1305 (specify IFormatProvider) is not currently raised because the tests don't trigger it; nevertheless, DateTimeOffset.Parse without a culture takes CurrentCulture, and on a locale whose DateTimeFormatInfo rejects the Z suffix or uses non-Gregorian calendar conventions, these parses can throw at test time. WorkerClientTests.cs:327 and FakeWorkerHarnessTests.cs:121 already added System.Globalization.CultureInfo.InvariantCulture in the Tests-006 fix; the other ~15 call sites did not get the same treatment.

Recommendation: Add CultureInfo.InvariantCulture to every DateTimeOffset.Parse(...) call in MxGateway.Tests, or replace with DateTimeOffset.ParseExact against the literal "O" round-trip format. A single-line using System.Globalization; per file keeps the call sites concise.

Resolution: 2026-05-20 — Added CultureInfo.InvariantCulture to every DateTimeOffset.Parse site in MxGateway.Tests that lacked it: 16 call sites in DashboardSnapshotServiceTests.cs (a new using System.Globalization; was added so the call sites stay concise) and one in SessionManagerTests.cs (using the fully-qualified System.Globalization.CultureInfo.InvariantCulture to match the in-file style of the existing ManualTimeProvider parse sites). GalaxyHierarchyCacheTests.cs:36 was already correct from the Tests-016 rewrite. A final grep confirms every DateTimeOffset.Parse/DateTime.Parse call in src/MxGateway.Tests now passes CultureInfo.InvariantCulture.

Tests-019

Field Value
Severity Low
Category Documentation & comments
Location docs/GatewayTesting.md, code-reviews/Tests/findings.md (Tests-002 re-triage)
Status Resolved

Description: The Tests-002 re-triage (2026-05-18) confirmed there is no SQL-injection surface in GalaxyRepository because filters are applied in memory by GalaxyHierarchyProjector/GalaxyGlobMatcher against the cached snapshot, and added 10 adversarial-input tests in src/MxGateway.Tests/Galaxy/GalaxyFilterInputSafetyTests.cs. That explanation lives only in the findings file; docs/GatewayTesting.md does not mention GalaxyFilterInputSafetyTests, the in-memory filter model, or the adversarial-input matrix. A future reader of the test docs will not know which tests pin the literal-filter behaviour or why the Galaxy SQL layer is not unit-tested for parameterisation. Per CLAUDE.md ("Update docs in the same change as the source. When public APIs, contracts, configuration, build steps, security behavior, event shapes, value conversion, status mapping, or lifecycle rules change, the affected docs must change in the same commit"), the Galaxy security-behaviour decision warrants a paragraph in GatewayTesting.md.

Recommendation: Add a short subsection to docs/GatewayTesting.md (probably under "Focused Commands" or a new "Galaxy Filter Safety" section) that names GalaxyFilterInputSafetyTests, explains that Galaxy filtering happens in memory against the cached hierarchy (so the SQL surface is constant), and lists the adversarial-input invariants the suite pins (%, _, ', ;, [abc] are literals; the glob regex has a 100 ms timeout against pathological input).

Resolution: 2026-05-20 — Added a "Galaxy Filter Safety" section to docs/GatewayTesting.md (immediately after "Live Galaxy Repository", before "Live LDAP") that names GalaxyFilterInputSafetyTests, re-frames the Tests-002 finding (the Galaxy SQL surface is constant — HierarchySql, AttributesSql, SELECT 1, SELECT time_of_last_deploy FROM galaxy), explains that all filters are applied in memory by GalaxyHierarchyProjector / GalaxyGlobMatcher, lists the adversarial-input matrix (', ' OR '1'='1, '; DROP TABLE gobject;--, %, _, 100%_off, [abc], Pump'001), and enumerates the invariants the suite pins (SQL metacharacters are opaque literals, only */? are glob wildcards, the matcher has a 100 ms regex timeout against pathological input, the projector returns zero matches / NotFound rather than the whole hierarchy, and the DiscoverHierarchy RPC end-to-end returns zero matches for adversarial globs).
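The invariant that only * and ? are wildcards can be sketched in Python by escaping every other character before compiling the pattern. glob_to_regex and filter_names are hypothetical helpers, not GalaxyGlobMatcher's real code. Case-sensitive matching is assumed. Python's re module has no per-match timeout, so the 100 ms guard has no equivalent here.

```python
from __future__ import annotations

import re


def glob_to_regex(pattern: str) -> re.Pattern[str]:
    """Translate a glob where only '*' and '?' are wildcards."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")
        elif ch == "?":
            parts.append(".")
        else:
            # %, _, ', ;, [ and ] are escaped so they match only themselves.
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


def filter_names(names: list[str], pattern: str) -> list[str]:
    """Filter an in-memory snapshot; an unmatched pattern yields [], never everything."""
    regex = glob_to_regex(pattern)
    return [name for name in names if regex.fullmatch(name)]
```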

Tests-020

Field Value
Severity Medium
Category Testing coverage
Location src/MxGateway.Tests/Gateway/Grpc/MxAccessGatewayServiceConstraintTests.cs:275-347, src/MxGateway.Server/Grpc/MxAccessGatewayService.cs:803-829
Status Resolved

Description: Server-021 added MxAccessGatewayServiceConstraintTests to exercise BulkConstraintPlan.MergeDeniedInto / CreateDeniedReply against a non-allow-all enforcer. The WriteBulkConstraintPlan has a four-arm GetPayload/SetPayload switch covering WriteBulk, Write2Bulk, WriteSecuredBulk, and WriteSecured2Bulk. The new fixtures cover only two of those four arms: Invoke_WriteBulk_WithDeniedHandle_DropsEntryFromWorkerCallAndMergesDenialIntoReply (the WriteBulk arm) and Invoke_WriteSecuredBulk_WhenAllHandlesDenied_ShortCircuitsWithDeniedOnlyReply (the WriteSecuredBulk arm). The other two arms (Write2Bulk and WriteSecured2Bulk) have no constraint-test coverage.

The same gap applies to the parallel SubscribeBulkConstraintPlan cases RemoveItemBulk, UnAdviseItemBulk, and UnsubscribeBulk. The subscribe-bulk plan's SetPayload switch (service code lines 742-753) covers only three kinds: AddItemBulk, AdviseItemBulk, and SubscribeBulk. The constraint tests exercise all three. The unsubscribe-shaped bulk routes, however, are also dispatched into denial paths through FilterHandleBulkAsync, and no constraint test covers them.

A regression that wires a new bulk kind to the wrong reply slot, or drops a case arm during a refactor, would compile cleanly and pass every existing test. The comment in Invoke_WriteSecuredBulk_WhenAllHandlesDenied_… ("The merge logic is shared, so a full denial here is enough to prove the secured-bulk routing") explicitly concedes the gap. The routing itself (the per-kind SetPayload switch) is exactly what is not shared, and it is not exercised for Write2Bulk / WriteSecured2Bulk.

Recommendation: Add two short fixtures: Invoke_Write2Bulk_WhenAllHandlesDenied_ShortCircuitsWithDeniedOnlyReply and Invoke_WriteSecured2Bulk_WhenAllHandlesDenied_ShortCircuitsWithDeniedOnlyReply, mirroring the existing WriteSecuredBulk denial test but asserting reply.Write2Bulk / reply.WriteSecured2Bulk is populated (proving the SetPayload arm fires). The all-denied path is enough; the merge-with-allowed path is genuinely shared. Optionally also add denied-tag tests for RemoveItemBulk / UnsubscribeBulk to cover the handle-input variants of the SubscribeBulkConstraintPlan switch.

Resolution: 2026-05-20 — Added Invoke_Write2Bulk_WhenAllHandlesDenied_ShortCircuitsWithDeniedOnlyReply and Invoke_WriteSecured2Bulk_WhenAllHandlesDenied_ShortCircuitsWithDeniedOnlyReply to MxAccessGatewayServiceConstraintTests, plus matching CreateWrite2BulkRequest/CreateWriteSecured2BulkRequest helpers. Each new fixture asserts the worker is never called (InvokeCount == 0), reply.Kind matches the requested kind, the matching reply.{Write2Bulk,WriteSecured2Bulk}.Results slot is populated with denied entries, and the three sibling reply slots remain empty — pinning that the SetPayload switch fired for the correct arm and not for one of the other three Write*Bulk kinds. This closes the Write2Bulk/WriteSecured2Bulk arms of the four-arm GetPayload/SetPayload switch in WriteBulkConstraintPlan (MxAccessGatewayService.cs:803-829).
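The property these two fixtures pin can be sketched in Python: a denial for kind K lands in K's reply slot and leaves the other three slots empty. BulkReply, set_payload, create_denied_reply, and populated_slots are hypothetical stand-ins for the protobuf reply and the WriteBulkConstraintPlan switch. The explicit if/elif chain mirrors a per-kind switch, so a dropped or miswired arm would show up in the check.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class BulkReply:
    kind: str
    write_bulk: list = field(default_factory=list)
    write2_bulk: list = field(default_factory=list)
    write_secured_bulk: list = field(default_factory=list)
    write_secured2_bulk: list = field(default_factory=list)


def set_payload(reply: BulkReply, kind: str, results: list) -> None:
    # One arm per kind, like the C# switch; each arm must target its own slot.
    if kind == "WriteBulk":
        reply.write_bulk = results
    elif kind == "Write2Bulk":
        reply.write2_bulk = results
    elif kind == "WriteSecuredBulk":
        reply.write_secured_bulk = results
    elif kind == "WriteSecured2Bulk":
        reply.write_secured2_bulk = results
    else:
        raise ValueError(f"unsupported bulk kind: {kind}")


def create_denied_reply(kind: str, handles: list[int]) -> BulkReply:
    reply = BulkReply(kind=kind)
    denied = [{"handle": h, "was_successful": False} for h in handles]
    set_payload(reply, kind, denied)
    return reply


def populated_slots(reply: BulkReply) -> list[str]:
    slots = ("write_bulk", "write2_bulk", "write_secured_bulk", "write_secured2_bulk")
    return [name for name in slots if getattr(reply, name)]
```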

Tests-021

Field Value
Severity Low
Category Code organization & conventions
Location src/MxGateway.Tests/Galaxy/GalaxyHierarchyCacheTests.cs:159-171, src/MxGateway.Tests/Gateway/Workers/FakeWorkerHarnessTests.cs:226-236, src/MxGateway.Tests/Gateway/Workers/WorkerClientTests.cs:620-630, src/MxGateway.Tests/Gateway/Sessions/SessionManagerTests.cs:766-…
Status Resolved

Description: Tests-006 / Tests-017 / Tests-018 introduced an injectable ManualTimeProvider to make heartbeat-timestamp / lease / cache tests deterministic. The class is now duplicated as a private sealed class ManualTimeProvider(DateTimeOffset start...) : TimeProvider in four test files (GalaxyHierarchyCacheTests.cs, FakeWorkerHarnessTests.cs, WorkerClientTests.cs, SessionManagerTests.cs). Each copy has the same three-line implementation (_now field, GetUtcNow() override, Advance(TimeSpan) method). One copy (GalaxyHierarchyCacheTests.cs:159) accepts a default DateTimeOffset and seeds with UtcNow; the other three require an explicit start — a small but real semantic divergence. Tests-007 consolidated the same kind of duplication for TestServerCallContext / RecordingServerStreamWriter / AllowAllConstraintEnforcer into src/MxGateway.Tests/TestSupport/; this is the same drift pattern.

Recommendation: Add src/MxGateway.Tests/TestSupport/ManualTimeProvider.cs with a single implementation, including the Advance helper. Give it a default-arg DateTimeOffset start = default that resolves to a seed: either the deterministic DateTimeOffset.UnixEpoch, or UtcNow to match the existing GalaxyHierarchyCacheTests behaviour. Then delete the four nested copies in favour of using MxGateway.Tests.TestSupport;. Same pattern as the Tests-007 resolution.

Resolution: 2026-05-20 — Added src/MxGateway.Tests/TestSupport/ManualTimeProvider.cs with the unified signature ManualTimeProvider(DateTimeOffset start = default) (a default start seeds from DateTimeOffset.UtcNow for the GalaxyHierarchyCacheTests call site that previously relied on that behaviour) plus the Advance(TimeSpan) helper. Deleted the four duplicated private sealed class ManualTimeProvider definitions from GalaxyHierarchyCacheTests.cs, FakeWorkerHarnessTests.cs, WorkerClientTests.cs, and SessionManagerTests.cs; each file now imports MxGateway.Tests.TestSupport. The SessionManagerTests copy previously lacked Advance — folding it onto the shared type does not regress because that file never called Advance. Same consolidation pattern as Tests-007.

Tests-022

Field Value
Severity Low
Category Testing coverage
Location src/MxGateway.Tests/Gateway/Sessions/SessionManagerBulkTests.cs:52-61,90-99,126-135,163-172,202-211,238-247,282-294,339-360,413-434,484-506,553-567,663-688
Status Resolved

Description: Tests-013 added eleven *_PropagatesCancellation tests that pre-cancel the token (cts.CancelAsync() before calling session.*BulkAsync(..., cts.Token)) and assert OperationCanceledException. FakeBulkWorkerClient.InvokeAsync calls cancellationToken.ThrowIfCancellationRequested() as its first statement. The exception is therefore thrown synchronously inside the fake before any of GatewaySession.InvokeBulkInternalAsync → InvokeAsync → bulk-result projection runs. This verifies that the token reaches the worker client: a regression that swapped in CancellationToken.None between layers would fail the test. It does not exercise mid-flight cancellation, where the token becomes cancelled while the call is suspended awaiting the worker's reply. Mid-flight cancellation is the more interesting path (it is what a real client closing its stream looks like), and it is not pinned for any of the eleven bulk methods.

The cancellation tests for WorkerClient in WorkerClientTests do exercise the mid-flight path (the FakeWorkerClient returns Task.FromCanceled style via real pipe disconnection); only the gateway-side bulk tests are shallow.

Recommendation: For at least one representative bulk method (e.g. WriteSecuredBulkAsync, the highest-value gateway path), replace the pre-cancellation pattern with a fake whose InvokeAsync returns a TaskCompletionSource-backed task that does not complete until cancelled. Call cts.CancelAsync() only once session.WriteSecuredBulkAsync(...) has started and is suspended awaiting the worker reply, so that a continuation is registered. Assert that the resulting OperationCanceledException's CancellationToken matches cts.Token. The existing pre-cancel pattern is a reasonable cheap-coverage default for the other ten methods.

Resolution: 2026-05-20 — Added WriteSecuredBulkAsync_WhenCancelledMidFlight_ThrowsOperationCanceledForRequestToken to SessionManagerBulkTests backed by a new MidFlightBulkWorkerClient fake whose InvokeAsync registers a cancellation continuation on the caller's token, signals InvokeStarted, and parks on a TaskCompletionSource<WorkerCommandReply> that completes only when the token fires (or shutdown / kill / dispose tears it down). The test awaits InvokeStarted.Task, asserts the write task is still incomplete (proving the cancellation lands on an in-flight await rather than the synchronous fast-path), then calls cts.CancelAsync() and asserts the resulting OperationCanceledException.CancellationToken == cts.Token and InvokeCount == 1. The other ten *_PropagatesCancellation tests remain on the cheaper pre-cancel pattern per the finding's recommendation.
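The mid-flight shape from this resolution can be sketched in Python with asyncio. CancellationSignal and OperationCanceled are hypothetical stand-ins for CancellationToken and OperationCanceledException, chosen so the exception can carry the token that fired. MidFlightWorkerClient and write_secured_bulk only mirror the roles of the C# fake and session method.

```python
from __future__ import annotations

import asyncio


class OperationCanceled(Exception):
    """Analogue of OperationCanceledException, carrying the token that fired."""

    def __init__(self, token: CancellationSignal) -> None:
        super().__init__("operation was canceled")
        self.token = token


class CancellationSignal:
    """Minimal, hypothetical stand-in for a CancellationToken."""

    def __init__(self) -> None:
        self._event = asyncio.Event()

    def cancel(self) -> None:
        self._event.set()

    async def wait(self) -> None:
        await self._event.wait()


class MidFlightWorkerClient:
    """Signals when invoke starts, then parks until the caller's token fires."""

    def __init__(self) -> None:
        self.invoke_started = asyncio.Event()
        self.invoke_count = 0

    async def invoke(self, command: dict, token: CancellationSignal) -> dict:
        self.invoke_count += 1
        self.invoke_started.set()
        await token.wait()  # never replies on its own
        raise OperationCanceled(token)


async def write_secured_bulk(client, entries, token):
    # Hypothetical session method: forwards one command and awaits the reply.
    return await client.invoke({"kind": "WriteSecuredBulk", "entries": entries}, token)


async def mid_flight_cancellation() -> tuple:
    client = MidFlightWorkerClient()
    token = CancellationSignal()
    write = asyncio.create_task(write_secured_bulk(client, [(1, 10.0)], token))
    await client.invoke_started.wait()
    in_flight = not write.done()  # cancellation must land on a suspended await
    token.cancel()
    try:
        await write
    except OperationCanceled as exc:
        return in_flight, exc.token is token, client.invoke_count
    raise AssertionError("expected OperationCanceled")
```

The in_flight check is what separates this from the pre-cancel tests: a synchronous fast-path throw would satisfy the other two assertions but not this one.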

Tests-023

Field Value
Severity Low
Category Correctness & logic bugs
Location src/MxGateway.Tests/Gateway/Sessions/SessionWorkerClientFactoryFakeWorkerTests.cs:334-374
Status Resolved

Description: Tests-015 corrected the smoke-test FakeWorkerProcess.WaitForExitAsync (in GatewayEndToEndFakeWorkerSmokeTests.cs) so it now awaits a TaskCompletionSource only completed by Kill/MarkExited, removing the "set HasExited = true and return immediately" cheat. The companion FakeWorkerProcess in SessionWorkerClientFactoryFakeWorkerTests.cs:351-356 was not updated and still has the same cheat: WaitForExitAsync unconditionally sets HasExited = true; ExitCode = 0; return ValueTask.CompletedTask;. The original Tests-006 re-triage noted this companion was "fine there because no exit assertion is made"; the file at a020350 does not yet assert HasExited or ExitCode, so this is not a current bug — but it is a latent regression vector: a future test in the same file that asserts Assert.True(launcher.Process.HasExited) after triggering shutdown would pass spuriously, exactly the failure mode Tests-015 just closed in the smoke-test copy. Two near-identical fakes in the same project with diverging semantics is brittle.

Recommendation: Apply the same TaskCompletionSource _exited pattern to SessionWorkerClientFactoryFakeWorkerTests.FakeWorkerProcess: WaitForExitAsync awaits _exited.Task, Kill calls MarkExited(-1), and add a MarkExited(int) helper that completes the TCS. The scripted launchers in this file already call Kill() through the disposal path Tests-011 added, so the change is mechanical and preserves all current behaviour.

Resolution: 2026-05-20 — Brought the companion FakeWorkerProcess in SessionWorkerClientFactoryFakeWorkerTests.cs into parity with the Tests-015 smoke-test fake. WaitForExitAsync now awaits a TaskCompletionSource _exited (wrapped in WaitAsync(cancellationToken) for cooperative cancel) instead of unconditionally setting HasExited = true; ExitCode = 0. Kill(bool) increments KillCount and delegates to a new MarkExited(int exitCode) helper that sets HasExited, ExitCode, and completes the TCS. KillCount is still observable and pre-existing tests that assert KillCount > 0 continue to pass. The latent regression vector — that a future Assert.True(launcher.Process.HasExited) in this file would pass spuriously — is closed.

Tests-024

Field Value
Severity Low
Category Testing coverage
Location src/MxGateway.Server/Grpc/MxAccessGatewayService.cs:713-730,784-801,859-876, src/MxGateway.Tests/Gateway/Grpc/MxAccessGatewayServiceConstraintTests.cs
Status Resolved

Description: Every BulkConstraintPlan.MergeDeniedInto implementation builds its merged reply by walking OriginalCount indices and dequeueing from the worker's allowedResults queue at each non-denied slot. TryDequeue silently returns false when the queue is empty, so if the worker returns fewer allowed results than the gateway forwarded (because of a protocol mismatch, a worker bug truncating the bulk reply, or a future change to per-entry result reporting), the merged reply will be shorter than OriginalCount — the gap is not filled with a synthetic failure result. Conversely, if the worker returns more allowed results than requested, the extras are silently dropped. Neither case is covered by MxAccessGatewayServiceConstraintTests: every fixture's sessionManager.InvokeReply returns exactly the same count as the number of allowed entries forwarded. A regression in worker bulk-reply construction or a contract drift could produce a silently-truncated public reply (clients observing fewer results than entries submitted, with no error) and no gateway-side test would fail.

Recommendation: Add two fixtures to MxAccessGatewayServiceConstraintTests: Invoke_WriteBulk_WhenWorkerReturnsFewerResultsThanAllowed_ProducesPartialReplyOrSyntheticFailure (worker reply has N-1 results for N allowed entries; assert either the merged reply has OriginalCount entries with a synthetic-failure tail, or — if the gateway's current policy is "truncate" — pin that behaviour explicitly and document the expectation in a comment), and Invoke_WriteBulk_WhenWorkerReturnsExtraResults_IgnoresExtras (worker returns N+2 for N allowed; assert merged reply has exactly OriginalCount). Whichever current behaviour is correct should be made explicit by the test — the goal is preventing a silent change.

Resolution: 2026-05-20 — Pinned the current BulkConstraintPlan.MergeDeniedInto behaviour for worker reply-count divergence. Added two fixtures to MxAccessGatewayServiceConstraintTests: Invoke_WriteBulk_WhenWorkerReturnsFewerResultsThanAllowed_MergedReplyIsTruncated (gateway forwards 2 allowed handles, worker returns 1 result; merged reply has 2 entries total — the worker result at the first non-denied slot and the denied entry at its original index — and the trailing under-supplied slot is silently dropped via Queue.TryDequeue returning false) and Invoke_WriteBulk_WhenWorkerReturnsExtraResults_IgnoresExtras (gateway forwards 2 allowed handles, worker returns 4; merged reply has exactly OriginalCount == 3 entries; the two extras are bounded out by the for index < OriginalCount loop). The fixtures explicitly pin "truncate / discard extras" as the current contract — a future change to synthesise failure tails or surface extras must update the test, preventing a silent behavioural change.
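The "truncate / discard extras" contract pinned by these fixtures can be reproduced with a small Python model of the merge loop. merge_denied_into is an illustrative analogue, not the C# implementation. It assumes denied results are keyed by their original index and allowed results arrive in forwarded order.

```python
from collections import deque


def merge_denied_into(original_count: int, denied: dict, worker_results: list) -> list:
    """Rebuild a reply in original order from denied entries and worker results.

    denied maps original index -> synthetic denied result; worker_results holds
    the worker's replies for the allowed entries, in forwarded order.
    """
    allowed = deque(worker_results)
    merged = []
    for index in range(original_count):  # results beyond this bound are ignored
        if index in denied:
            merged.append(denied[index])
        elif allowed:
            merged.append(allowed.popleft())
        # else: queue exhausted, like TryDequeue returning false; slot dropped
    return merged
```

In the shortfall case the missing result simply disappears. A client would see the drift only by comparing the result count with the entry count.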

Tests-025

Field Value
Severity Low
Category Code organization & conventions
Location src/ZB.MOM.WW.MxGateway.Tests/Gateway/Grpc/EventStreamServiceTests.cs:285-289, src/ZB.MOM.WW.MxGateway.Tests/Gateway/GatewayEndToEndFakeWorkerSmokeTests.cs:417-421
Status Resolved

Description: Commit d692232 widened the EventStreamService constructor with an IDashboardEventBroadcaster parameter. Two test files now carry an identical private sealed class NullDashboardEventBroadcaster : IDashboardEventBroadcaster with a singleton Instance field and a no-op Publish. This mirrors the duplication pattern Tests-007 and Tests-021 already consolidated for TestServerCallContext / RecordingServerStreamWriter / AllowAllConstraintEnforcer / ManualTimeProvider into src/MxGateway.Tests/TestSupport/; the same pattern should apply here.

Recommendation: Extract NullDashboardEventBroadcaster to src/ZB.MOM.WW.MxGateway.Tests/TestSupport/NullDashboardEventBroadcaster.cs (or a single DashboardTestDoubles.cs), delete both nested copies, and update the two files that used them to import the TestSupport namespace.

Resolution: 2026-05-24 — Extracted the shared no-op broadcaster to src/ZB.MOM.WW.MxGateway.Tests/TestSupport/NullDashboardEventBroadcaster.cs (single public sealed class with the singleton Instance field and a private constructor — matches the Tests-007 / Tests-021 consolidation pattern). Deleted both nested duplicates (EventStreamServiceTests.cs:319-323 and GatewayEndToEndFakeWorkerSmokeTests.cs:417-421); the latter already imported ZB.MOM.WW.MxGateway.Tests.TestSupport so NullDashboardEventBroadcaster.Instance resolves against the shared type. EventStreamServiceTests.cs gained a using ZB.MOM.WW.MxGateway.Tests.TestSupport;. The integration-tests copy in src/ZB.MOM.WW.MxGateway.IntegrationTests/WorkerLiveMxAccessSmokeTests.cs was left alone (different module, per scope). Server-041's ThrowingDashboardEventBroadcaster remains nested in EventStreamServiceTests (single-file usage, no consolidation needed). Build clean (0 warnings), full Tests suite green (486/486).

Tests-026

Field Value
Severity Medium
Category Testing coverage
Location src/ZB.MOM.WW.MxGateway.Tests/Gateway/Grpc/EventStreamServiceTests.cs, src/ZB.MOM.WW.MxGateway.Server/Grpc/EventStreamService.cs:123-126
Status Resolved

Description: The new IDashboardEventBroadcaster is wired into EventStreamService at line 123 (commit d692232) and the broadcaster's Publish is the only path that mirrors per-session events into the dashboard EventsHub. The unit tests inject NullDashboardEventBroadcaster.Instance, so the broadcaster invocation is never observed — a regression that silently dropped the Publish call (e.g. an if accidentally added around it, or removing the broadcaster ctor parameter) would not be caught by any test in this module. The hub-registration tests (DashboardHubsRegistrationTests) verify the endpoints exist but not the producer-side hook.

Recommendation: Add a fixture to EventStreamServiceTests named e.g. StreamEventsAsync_PublishesEachEventToDashboardBroadcaster: inject a recording fake that captures (sessionId, mxEvent) calls, push two events through the fake session, and assert the broadcaster received both with the correct session id and matching WorkerSequence. This pins the broadcast hook and proves the dashboard event mirror is not a no-op.

Resolution: 2026-05-24 — Added src/ZB.MOM.WW.MxGateway.Tests/TestSupport/RecordingDashboardEventBroadcaster.cs — a thread-safe IDashboardEventBroadcaster test double backed by a ConcurrentQueue<DashboardEventCapture> that captures every (sessionId, mxEvent) invocation. Added StreamEventsAsync_PublishesEachEventToDashboardBroadcaster to EventStreamServiceTests: pushes two events (WorkerSequence 7 / OnDataChange and 8 / OnWriteComplete) through the fake session, drains the stream, and asserts the recording broadcaster captured exactly two publishes with the matching SessionId, WorkerSequence, and Family for each. TDD red/green confirmed: a deliberate "expected 3 captures" failed (Expected: 3, Actual: 2) before flipping to the correct count; the green run passes deterministically. The fixture would have caught a regression that drops or wraps dashboardEventBroadcaster.Publish at EventStreamService.cs:133. Build clean (0 warnings); full Tests suite green (486/486).
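The recording double can be sketched like this. As in the real code, a ConcurrentQueue keeps Publish safe while the stream pump and the test thread run concurrently, and ToArray() returns a snapshot in enqueue order. Several details are assumptions: the Publish(string, MxEvent) signature, the MxEvent record (whose Family is a string here, not an enum), and the members of DashboardEventCapture, which borrows its name from the resolution.

```csharp
using System.Collections.Concurrent;
using System.Collections.Generic;

// Hypothetical stand-ins for the real contract types (shapes assumed).
public sealed record MxEvent(long WorkerSequence, string Family);

public interface IDashboardEventBroadcaster
{
    void Publish(string sessionId, MxEvent mxEvent);
}

// One captured Publish call.
public sealed record DashboardEventCapture(string SessionId, MxEvent Event);

public sealed class RecordingDashboardEventBroadcaster : IDashboardEventBroadcaster
{
    private readonly ConcurrentQueue<DashboardEventCapture> _captures = new();

    // Snapshot in publish order; safe to read while other threads are still enqueuing.
    public IReadOnlyList<DashboardEventCapture> Captures => _captures.ToArray();

    public void Publish(string sessionId, MxEvent mxEvent) =>
        _captures.Enqueue(new DashboardEventCapture(sessionId, mxEvent));
}
```

In the real fixture, the checks would go through xUnit, for example Assert.Collection over Captures with one inspector per entry that checks SessionId, WorkerSequence, and Family.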

Tests-027

| Field | Value |
|---|---|
| Severity | Medium |
| Category | Concurrency & thread safety |
| Location | src/ZB.MOM.WW.MxGateway.Tests/Gateway/Grpc/MxAccessGatewayServiceTests.cs:199-240, src/ZB.MOM.WW.MxGateway.Server/Metrics/GatewayMetrics.cs:8,73,246-251 |
| Status | Resolved |

Description: The review brief explicitly flagged MxAccessGatewayServiceTests.StreamEvents_WhenEventIsWritten_RecordsSendDuration as a known flake that "passed solo on rerun". The root cause is the MeterListener subscribes by instrument.Meter.Name == GatewayMetrics.MeterName (a process-shared constant "MxGateway.Server"), not by the specific GatewayMetrics instance constructed in the test. Tests-012 made the xUnit parallelism policy explicit (parallelizeTestCollections: true, maxParallelThreads: -1), and every other test that builds its own GatewayMetrics() and exercises MxAccessGatewayService.StreamEvents or EventStreamService.StreamEventsAsync (e.g. the new StreamEventsAsync_* family added by Tests-026 and Server-041, plus the pre-existing StreamEventsAsync_YieldsEventsInWorkerOrder etc.) routes through GatewayMetrics.RecordEventStreamSend → the same histogram name mxgateway.events.stream_send.duration. When two such tests run concurrently in the same xUnit process, the MeterListener in this test sees measurements from both meters and families.Count grows to >1, breaking Assert.Equal([MxEventFamily.OnDataChange.ToString()], families). Solo reruns pass because no other producer is alive. This is exactly the cross-test mutable-state pattern Tests-012 set the guardrail comment against.

Recommendation: Either (a) filter the MeterListener callback by the specific Meter instance — capture metrics._meter (or expose GatewayMetrics.Meter) and compare with ReferenceEquals(instrument.Meter, expectedMeter) instead of comparing Meter.Name; or (b) place this test in a single-threaded [Collection("GatewayMetrics-Listener")] so no other RecordEventStreamSend producer runs concurrently. Option (a) is preferred because it removes the cross-talk vector permanently and lets the test stay parallelisable.

Resolution: 2026-05-24 — Applied option (a). Added an internal Meter Meter => _meter; accessor on GatewayMetrics, which the Tests project can see through the existing InternalsVisibleTo. In StreamEvents_WhenEventIsWritten_RecordsSendDuration, both the InstrumentPublished filter and the SetMeasurementEventCallback<double> filter changed from instrument.Meter.Name == GatewayMetrics.MeterName to ReferenceEquals(instrument.Meter, metrics.Meter). Also added a companion regression test, StreamEvents_RecordSendDurationListener_IgnoresMeasurementsFromOtherMetersWithSameName. It constructs a second GatewayMetrics, records an OnWriteComplete measurement on it before the instance under test publishes, and asserts that the listener captures only the OnDataChange family from the instance under test. Temporarily reverting to the Meter.Name-only filter confirmed that the regression test catches the original bug: it got ["OnWriteComplete", "OnDataChange"] where ["OnDataChange"] was expected. The ReferenceEquals filter was then restored. The full suite was green on 3 of 3 runs (512/512), and the two Tests-027 tests passed 5 of 5 solo runs. The cross-talk vector is permanently closed without giving up parallelism.
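The core of option (a) can be demonstrated with the BCL's System.Diagnostics.Metrics types alone. This is not GatewayMetrics. The test below creates two Meter instances that share the name "MxGateway.Server", each with a histogram named mxgateway.events.stream_send.duration. The "family" tag key is a hypothetical choice. Because the listener compares instrument.Meter by reference, it ignores measurements recorded on the second meter even though the name matches.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Diagnostics.Metrics;

public static class MeterIsolationDemo
{
    // Collects the "family" tag of every double measurement published by expectedMeter only.
    public static List<string> CaptureFamilies(Meter expectedMeter, Action produce)
    {
        var families = new List<string>();
        using var listener = new MeterListener();

        listener.InstrumentPublished = (instrument, meterListener) =>
        {
            // Filter by instance, not by instrument.Meter.Name: a parallel test's
            // meter with the same name would otherwise be enabled too.
            if (object.ReferenceEquals(instrument.Meter, expectedMeter))
            {
                meterListener.EnableMeasurementEvents(instrument);
            }
        };

        listener.SetMeasurementEventCallback<double>((instrument, value, tags, state) =>
        {
            // Redundant once InstrumentPublished filters, but mirrors the resolution,
            // which tightened both filters.
            if (!object.ReferenceEquals(instrument.Meter, expectedMeter))
            {
                return;
            }

            foreach (var tag in tags)
            {
                if (tag.Key == "family")
                {
                    families.Add(tag.Value?.ToString() ?? string.Empty);
                }
            }
        });

        listener.Start();
        produce();
        return families;
    }
}
```

In this sketch, a listener that matched on the shared name would capture both families. That is the ["OnWriteComplete", "OnDataChange"] symptom the companion regression test reproduces.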

Tests-028

| Field | Value |
|---|---|
| Severity | Low |
| Category | Testing coverage |
| Location | src/ZB.MOM.WW.MxGateway.Tests/Gateway/Sessions/SessionManagerTests.cs:466-496,802-807, src/ZB.MOM.WW.MxGateway.Server/Sessions/SessionManager.cs:216-253 |
| Status | Resolved |

Description: The new KillWorkerAsync_KillsWorkerAndRemovesSession (line 466) and KillWorkerAsync_WhenSessionMissing_ThrowsSessionNotFound (line 486) pin the new kill-path entry, but they do not pin the reason argument as it propagates through the chain. SessionManager.KillWorkerAsync(sessionId, reason, ct) validates reason with ArgumentException.ThrowIfNullOrWhiteSpace(reason) (line 221), calls session.KillWorker(reason) (line 229), and logs reason={Reason} (line 251). However, FakeWorkerClient.Kill(string reason) discards the argument (lines 803-807), and the only assertion is Assert.Equal(1, workerClient.KillCount). Each of the following regressions would pass the current tests:

(a) hard-coding an internal "unspecified" reason between SessionManager and GatewaySession;
(b) swapping to a different overload that drops the reason;
(c) deleting the ThrowIfNullOrWhiteSpace guard.

The dashboard caller (DashboardSessionAdminService.KillWorkerAsync) passes a hard-coded "dashboard-admin-kill" reason. The only test that observes it (KillWorkerAsync_AdminKillsWorker) asserts !string.IsNullOrWhiteSpace(LastKillReason) instead of pinning the value, so the same class of drift on the dashboard side is also untested.

Recommendation: (1) Capture LastKillReason on FakeWorkerClient.Kill and assert KillWorkerAsync_KillsWorkerAndRemovesSession propagates the test-supplied "test-kill" string end-to-end. (2) Add KillWorkerAsync_WithBlankReason_ThrowsArgumentException (parameterised over null, "", " ") to pin the ArgumentException.ThrowIfNullOrWhiteSpace guard. (3) Tighten DashboardSessionAdminServiceTests.KillWorkerAsync_AdminKillsWorker to Assert.Equal("dashboard-admin-kill", sessionManager.LastKillReason) so a future reason-string change is a deliberate test update.

Resolution: 2026-05-24 — Added LastKillReason to FakeWorkerClient in SessionManagerTests.cs and set it inside Kill(string reason). Tightened KillWorkerAsync_KillsWorkerAndRemovesSession to assert workerClient.LastKillReason == "test-kill", pinning the end-to-end propagation from SessionManager.KillWorkerAsyncsession.KillWorker(reason)IWorkerClient.Kill(reason). Added KillWorkerAsync_WithBlankReason_ThrowsArgumentException as a [Theory] over "", " ", "\t" plus a separate KillWorkerAsync_WithNullReason_ThrowsArgumentNullException fact (xUnit InlineData cannot carry null for a non-nullable string, and ArgumentException.ThrowIfNullOrWhiteSpace throws ArgumentNullException for null). Both new tests confirm KillCount == 0 and the session remains registered, proving the guard fires before any lookup or worker call. Tightened DashboardSessionAdminServiceTests.KillWorkerAsync_AdminKillsWorker to Assert.Equal("dashboard-admin-kill", sessionManager.LastKillReason). All affected tests pass; suite green.
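The two pieces this resolution pins can be sketched together: a fake that captures the kill reason, and a guard that runs before any lookup or worker call. IWorkerClient is reduced to a single Kill(string) member. MiniSessionManager is a hypothetical, synchronous stand-in for SessionManager.KillWorkerAsync, and it throws KeyNotFoundException where the real code raises its own SessionNotFound error. ArgumentException.ThrowIfNullOrWhiteSpace is the real .NET 8+ API. It throws ArgumentNullException for null and ArgumentException for empty or whitespace input, which is why the resolution pairs a [Theory] over blank strings with a separate null [Fact].

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Hypothetical reduced interface; the real IWorkerClient has more members.
public interface IWorkerClient
{
    void Kill(string reason);
}

public sealed class FakeWorkerClient : IWorkerClient
{
    public int KillCount { get; private set; }

    // Captured so tests can pin end-to-end propagation of the caller's reason.
    public string? LastKillReason { get; private set; }

    public void Kill(string reason)
    {
        KillCount++;
        LastKillReason = reason;
    }
}

// Hypothetical synchronous stand-in for SessionManager's kill path.
public sealed class MiniSessionManager
{
    private readonly Dictionary<string, IWorkerClient> _sessions = new();

    public void Register(string sessionId, IWorkerClient worker) => _sessions[sessionId] = worker;

    public bool Contains(string sessionId) => _sessions.ContainsKey(sessionId);

    public void KillWorker(string sessionId, string reason)
    {
        // Guard first: neither the session lookup nor the worker call runs on bad input.
        ArgumentException.ThrowIfNullOrWhiteSpace(reason);

        if (!_sessions.Remove(sessionId, out var worker))
        {
            throw new KeyNotFoundException($"Session {sessionId} was not found.");
        }

        worker.Kill(reason);
    }
}
```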

Tests-029

| Field | Value |
|---|---|
| Severity | Low |
| Category | Error handling & resilience |
| Location | src/ZB.MOM.WW.MxGateway.Tests/Gateway/Dashboard/DashboardSessionAdminServiceTests.cs:61-106,139-222, src/ZB.MOM.WW.MxGateway.Server/Dashboard/DashboardSessionAdminService.cs:77-125 |
| Status | Resolved |

Description: The new DashboardSessionAdminServiceTests covers the happy path and the viewer-denial path for both CloseSessionAsync and KillWorkerAsync. It also has CloseSessionAsync_WhenSessionMissing_ReportsFriendlyError for the close-side SessionNotFound catch. The kill-side error branches, however, are not tested. The product code's KillWorkerAsync has the same SessionNotFound catch (lines 111-114), returning "Session {id} was not found.", and a generic SessionManagerException catch (lines 115-124), returning "Kill failed: {message}". Neither branch is exercised. The fake's KillWorkerAsync (lines 200-209) can only succeed: it has no KillThrowsNotFound / KillThrowsGeneric option to match the existing CloseThrowsNotFound. There is a gap in the other direction too. CloseSessionAsync has an IsNullOrWhiteSpace(sessionId) guard (lines 37-40) like the kill path's, but no blank-id test, even though KillWorkerAsync_BlankSessionId_ReturnsFailure covers the kill-side guard. A guard-removal regression on close would therefore slip through.

Recommendation: Mirror the existing close-side fixtures onto the kill side: add KillThrowsNotFound / KillThrowsGeneric init-flags to the FakeSessionManager, then KillWorkerAsync_WhenSessionMissing_ReportsFriendlyError, KillWorkerAsync_WhenSessionManagerThrows_ReportsKillFailedMessage, and CloseSessionAsync_BlankSessionId_ReturnsFailure. These are mechanical copies of the existing patterns and bring close/kill coverage into symmetry.

Re-triage note: The Server batch already added CloseSessionAsync_WhenManagerThrowsUnexpected_ReturnsFriendlyFail and KillWorkerAsync_WhenManagerThrowsUnexpected_ReturnsFriendlyFail (the Server-050 regressions visible at HEAD lines 125-162 of the test file), so the kill-side SessionManagerException general-catch branch and the close-side parallel are both covered there in a generic-exception shape. The only remaining asymmetry was the blank-session-id guard, per the prompt scope.

Resolution: 2026-05-24 — Added CloseSessionAsync_BlankSessionId_ReturnsFailure to DashboardSessionAdminServiceTests. The new test invokes service.CloseSessionAsync(adminUser, " ", ct) and asserts Succeeded == false and sessionManager.CloseCount == 0, pinning the string.IsNullOrWhiteSpace(sessionId) guard at DashboardSessionAdminService.cs:52-55. This brings close/kill blank-id coverage into symmetry with the existing KillWorkerAsync_BlankSessionId_ReturnsFailure. The KillThrowsNotFound / KillThrowsGeneric extensions from the original recommendation are not needed because the unexpected-throw branches are already covered by the Server-050 regressions noted above. All tests pass; suite green.

Tests-030

| Field | Value |
|---|---|
| Severity | Low |
| Category | Testing coverage |
| Location | src/ZB.MOM.WW.MxGateway.Tests/Gateway/Dashboard/DashboardApiKeyManagementServiceTests.cs:115-163, src/ZB.MOM.WW.MxGateway.Server/Dashboard/DashboardApiKeyManagementService.cs:146-177 |
| Status | Resolved |

Description: The three new DeleteAsync_* fixtures cover the unauthorised user, the success path with audit, and the store-refuses-with-friendly-error path. They do not exercise two production behaviours.

(1) DeleteAsync_WhenStoreRefuses_ReportsFriendlyError (lines 151-163) does not construct or inject a FakeApiKeyAuditStore. It therefore never observes that the product code still emits an audit entry with EventType = "dashboard-delete-key" and Details = "not-found-or-active" on the failure branch (AppendAuditAsync runs unconditionally at lines 167-172). A regression that moved the AppendAuditAsync call inside the if (deleted) branch would silently drop the audit trail for refused deletes, which is a real audit-completeness gap.

(2) There is no DeleteAsync_BlankKeyId_ReturnsFailure or DeleteAsync_InvalidKeyId_ReturnsFailure test, even though ValidateKeyId(keyId) (lines 156-160) guards on the same conditions as Create/Revoke/Rotate. The file's earlier tests give the Revoke/Rotate paths equivalent fixtures; only Delete is missing them.

Recommendation: (1) Add a FakeApiKeyAuditStore to DeleteAsync_WhenStoreRefuses_ReportsFriendlyError and assert it contains exactly one entry with EventType == "dashboard-delete-key" and Details == "not-found-or-active". (2) Add DeleteAsync_BlankKeyId_ReturnsFailure (parameterised over null, "", " ") and DeleteAsync_InvalidKeyId_ReturnsFailure (a keyId with characters the ValidateKeyId rules reject) to pin the validation branch end-to-end.

Resolution: 2026-05-24 — Renamed DeleteAsync_WhenStoreRefuses_ReportsFriendlyError to DeleteAsync_WhenStoreRefuses_ReportsFriendlyErrorAndAudits and extended it to inject a FakeApiKeyAuditStore. The test now asserts that the single audit entry has EventType == "dashboard-delete-key", KeyId == "operator01", and Details == "not-found-or-active". This pins the unconditional-audit invariant at DashboardApiKeyManagementService.cs:167-172: a regression that moved the AppendAuditAsync call inside if (deleted) would now fail the test. Also added DeleteAsync_BlankKeyId_ReturnsFailure as a [Theory] over "", " ", "\t". It asserts Succeeded == false, adminStore.DeleteCount == 0, and an empty auditStore.Entries, which pins that the ValidateKeyId guard at lines 156-160 fires before any store or audit work. All tests pass; suite green.
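The invariant that the renamed test pins can be illustrated with a small synchronous stand-in for the delete path. DeleteFlowSketch, AuditEntry, and OperationResult are hypothetical. The real service is async and writes through AppendAuditAsync, and the "deleted" Details value on the success branch is an assumption. The sketch shows the ordering that matters. Validation returns early, before any store call or audit write, and the audit write runs on both outcomes of the store call.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

public sealed record AuditEntry(string EventType, string KeyId, string Details);

public sealed record OperationResult(bool Succeeded, string? Message);

// Hypothetical stand-in for the delete path in DashboardApiKeyManagementService.
public sealed class DeleteFlowSketch
{
    private readonly Func<string, bool> _tryDelete; // simulates the admin store's delete

    public DeleteFlowSketch(Func<string, bool> tryDelete) => _tryDelete = tryDelete;

    public List<AuditEntry> Audit { get; } = new();

    public int DeleteCount { get; private set; }

    public OperationResult Delete(string keyId)
    {
        // Validation runs before any store or audit work.
        if (string.IsNullOrWhiteSpace(keyId))
        {
            return new OperationResult(false, "A key id is required.");
        }

        DeleteCount++;
        bool deleted = _tryDelete(keyId);

        // Written on both outcomes; placing this under `if (deleted)` would lose refused deletes.
        Audit.Add(new AuditEntry("dashboard-delete-key", keyId, deleted ? "deleted" : "not-found-or-active"));

        return deleted
            ? new OperationResult(true, null)
            : new OperationResult(false, $"Key {keyId} was not found or is still active.");
    }
}
```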

Tests-031

| Field | Value |
|---|---|
| Severity | Low |
| Category | Concurrency & thread safety |
| Location | src/ZB.MOM.WW.MxGateway.Tests/Gateway/Dashboard/DashboardSnapshotPublisherTests.cs:22-61 |
| Status | Resolved |

Description: ExecuteAsync_WhenSnapshotServiceThrowsOnce_ReconnectsAfterDelay records startedAt = DateTimeOffset.UtcNow before calling publisher.StartAsync(...), then asserts secondSubscribeAt - startedAt >= reconnectDelay - 10ms (line 59). The measured gap is not the reconnect delay in isolation — it is (StartAsync scheduling) + (first WatchSnapshotsAsync setup + Task.Yield) + (throw) + reconnect delay + (second WatchSnapshotsAsync setup). On a slow/contended CI agent the first three terms easily dominate (favouring the assertion); but on a fast machine, Windows Task.Delay(50ms) rounds up to the next ~15.6 ms tick boundary and may return at ~46-50 ms relative to schedule, while the first three terms can be sub-millisecond — so the gap measurement can land within 1-2 ms of the lower bound, and the 10 ms slack may not absorb a single missed quantum. This is a latent flake of the same flavour as Tests-006 (heartbeat timing) but on a wall-clock dependency the test cannot inject around because DashboardSnapshotPublisher uses Task.Delay(_reconnectDelay) directly. Tests-006 / Tests-017 moved heartbeat tests onto ManualTimeProvider; this test cannot do that without a product change to use a TimeProvider-aware delay.

Recommendation: (a) The cheap fix: have ThrowOnceThenYieldSnapshotService record _firstThrowAt = DateTimeOffset.UtcNow immediately before the throw, and change the assertion to secondSubscribeAt - firstThrowAt >= reconnectDelay - 10ms — the gap then measures only the reconnect delay, eliminating the variable scheduling baseline. (b) The deeper fix: extend DashboardSnapshotPublisher to accept an ITimeProvider-style delay seam (or a virtual DelayAsync hook) so a ManualTimeProvider could advance time deterministically. (a) is preferred for now; (b) belongs as a follow-up if more reconnect-loop tests are added.

Resolution: 2026-05-24 — Applied option (a). Added FirstThrowAt to ThrowOnceThenYieldSnapshotService and set it via FirstThrowAt = DateTimeOffset.UtcNow; immediately before the first-call throw. Removed the pre-StartAsync startedAt baseline; the assertion now reads gap = secondSubscribeAt - firstThrowAt (both timestamps captured inside the fake), and the 10 ms slack absorbs the Windows Task.Delay quantum without the variable StartAsync / scheduling overhead in the baseline. This is the same flake-isolation pattern Tests-006 / Tests-017 used (measuring only the production delay, not test-side setup). Suite green; the test passes deterministically across repeated runs.
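The measurement change can be sketched with a fake that timestamps itself. ThrowOnceThenSucceedService and ReconnectLoopSketch are hypothetical simplifications of ThrowOnceThenYieldSnapshotService and of DashboardSnapshotPublisher's loop. The sketch assumes that the publisher catches the failure and waits Task.Delay(reconnectDelay) before it subscribes again. Both timestamps are taken inside the fake, so the gap excludes StartAsync and first-call setup.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical fake: fails on the first subscribe, records when the second one happens.
public sealed class ThrowOnceThenSucceedService
{
    private int _calls;

    public DateTimeOffset FirstThrowAt { get; private set; }

    public TaskCompletionSource<DateTimeOffset> SecondSubscribe { get; } =
        new(TaskCreationOptions.RunContinuationsAsynchronously);

    public Task WatchAsync(CancellationToken ct)
    {
        if (Interlocked.Increment(ref _calls) == 1)
        {
            FirstThrowAt = DateTimeOffset.UtcNow; // taken immediately before the throw
            throw new InvalidOperationException("simulated transient failure");
        }

        SecondSubscribe.TrySetResult(DateTimeOffset.UtcNow);
        return Task.Delay(Timeout.Infinite, ct); // stay subscribed until cancelled
    }
}

// Hypothetical simplification of the publisher's reconnect loop.
public static class ReconnectLoopSketch
{
    public static async Task RunAsync(
        ThrowOnceThenSucceedService service, TimeSpan reconnectDelay, CancellationToken ct)
    {
        while (!ct.IsCancellationRequested)
        {
            try
            {
                await service.WatchAsync(ct);
            }
            catch (OperationCanceledException) when (ct.IsCancellationRequested)
            {
                return; // normal shutdown
            }
            catch (Exception)
            {
                await Task.Delay(reconnectDelay, ct); // the only term the gap should measure
            }
        }
    }
}
```

A test would start RunAsync, await SecondSubscribe.Task with a timeout, and assert gap >= reconnectDelay - 10 ms, as the resolution describes.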