Files
mxaccessgw/code-reviews/IntegrationTests/findings.md
T
Joseph Doherty b4f5e8eb48 Resolve IntegrationTests-007..010 code-review findings
IntegrationTests-007: the three live test classes contend for shared
singletons (one MXAccess COM, one ZB SQL DB, one GLAuth). Added
LiveResourcesCollection with DisableParallelization and applied it to all
three so they no longer run concurrently.

IntegrationTests-008: the three live fact attributes each re-implemented the
env-var check. Added IntegrationTestEnvironment.IsEnabled and all three now
delegate to it.

IntegrationTests-009: reworded the misleading "Mock server call context" XML
doc — it is a hand-written stub with no verification behavior.

IntegrationTests-010: WaitForMessageAsync ignored cancellation. It now takes
an optional CancellationToken linked with the timeout; the smoke test shares
one cancellation source with the StreamEvents call context.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 22:59:18 -04:00

17 KiB

Code Review — IntegrationTests

Field Value
Module src/MxGateway.IntegrationTests
Reviewer Claude Code
Review date 2026-05-18
Commit reviewed 6c64030
Status Reviewed
Open findings 0

Checklist coverage

# Category Result
1 Correctness & logic bugs Issues found: IntegrationTests-003 (asserts only on first event), IntegrationTests-010 (WaitForMessageAsync ignores cancellation).
2 mxaccessgw conventions Live tests correctly gated and skip (not fail) when prerequisites are absent; LiveGalaxyRepositoryFactAttribute undocumented in the opt-in matrix.
3 Concurrency & thread safety Issue found: IntegrationTests-007 (no [Collection]/parallelism guard for shared MXAccess/ZB/GLAuth).
4 Error handling & resilience Issue found: IntegrationTests-004 (cleanup WaitAsync can mask the original failure).
5 Security No production secrets; only documented dev GLAuth creds and a localhost ZB connection string, all env-overridable. No issues found.
6 Performance & resource management Worker process disposed transitively via session disposal; no leaked pipes/COM/processes. No issues found.
7 Design-document adherence Issues found: IntegrationTests-001 (Galaxy live suite absent from the opt-in matrix), IntegrationTests-002 (GwAdmin LDAP prerequisite undocumented).
8 Code organization & conventions Issue found: IntegrationTests-008 (three near-identical fact attributes).
9 Testing coverage Issues found: IntegrationTests-005 (thin MXAccess parity coverage), IntegrationTests-006 (thin LDAP failure-path coverage).
10 Documentation & comments Issue found: IntegrationTests-009 (TestServerCallContext mislabelled "Mock").

Findings

IntegrationTests-001

Field Value
Severity High
Category Design-document adherence
Location src/MxGateway.IntegrationTests/Galaxy/LiveGalaxyRepositoryFactAttribute.cs:7, src/MxGateway.IntegrationTests/Galaxy/GalaxyRepositoryLiveTests.cs
Status Resolved

Description: The Galaxy Repository live test suite and its gating env var MXGATEWAY_RUN_LIVE_GALAXY_TESTS (plus connection-string override MXGATEWAY_LIVE_GALAXY_CONN) are completely absent from docs/GatewayTesting.md. CLAUDE.md mandates updating docs in the same change as the source. The opt-in matrix documents only the MXAccess and LDAP env vars, so an operator running the documented matrix has no way to know these tests exist or how to enable them.

Recommendation: Add a "Live Galaxy Repository" section to docs/GatewayTesting.md documenting MXGATEWAY_RUN_LIVE_GALAXY_TESTS=1, MXGATEWAY_LIVE_GALAXY_CONN, the ZB database prerequisite, and the covered RPCs, mirroring the existing "Live MXAccess Smoke" section.

Resolution: Resolved 2026-05-18: Added a "Live Galaxy Repository" section to docs/GatewayTesting.md documenting MXGATEWAY_RUN_LIVE_GALAXY_TESTS, MXGATEWAY_LIVE_GALAXY_CONN, the deployed-ZB prerequisite, and the covered GalaxyRepository RPCs.

IntegrationTests-002

Field Value
Severity High
Category Design-document adherence
Location src/MxGateway.IntegrationTests/DashboardLdapLiveTests.cs:13, src/MxGateway.Server/Configuration/LdapOptions.cs:27
Status Resolved

Description: DashboardLdapLiveTests builds the authenticator with new GatewayOptions(), so it relies on LdapOptions.RequiredGroup defaulting to GwAdmin and asserts the admin user is a member of a GwAdmin LDAP group. glauth.md does not list GwAdmin as a provisioned group — it lists admin only in the five role groups and describes GwAdmin as a group to add "when reuse isn't enough." If GLAuth has only the documented baseline groups, AuthenticateAsync_AdminInGwAdminGroup_Succeeds fails (not skips) on any box where the env var is set. This is an undocumented hard prerequisite beyond "LDAP is up."

Recommendation: Either document the required GwAdmin GLAuth provisioning step in glauth.md and GatewayTesting.md, or have the test set RequiredGroup to a baseline group glauth.md guarantees admin belongs to (e.g. WriteOperate).

Resolution: Resolved 2026-05-18: Took the documentation fix — promoted the glauth.md "Adding a gw-specific group" section into a concrete "Provisioning the GwAdmin group" step that grants GwAdmin to admin, cross-referenced it from the groups/verification sections, and added a "Live LDAP" section to docs/GatewayTesting.md calling out GwAdmin as a hard prerequisite. Alternative considered: weaken the test to a baseline group (WriteOperate) — rejected because GwAdmin is the real default LdapOptions.RequiredGroup and the test should exercise it.

IntegrationTests-003

Field Value
Severity Medium
Category Correctness & logic bugs
Location src/MxGateway.IntegrationTests/WorkerLiveMxAccessSmokeTests.cs:89-97
Status Resolved

Description: The test asserts only on the first MxEvent recorded by RecordingServerStreamWriter. A live MXAccess provider can deliver an initial state/quality event whose family or handles differ from the expected OnDataChange (e.g. a registration-state or bad-quality bootstrap event). Because WaitForFirstMessageAsync returns whatever arrives first, a genuine ordering/family defect could fail spuriously or leave later wrong events unverified.

Recommendation: Filter for the first event with Family == OnDataChange (with a bounded retry/poll) or assert the full recorded sequence, so the test verifies the event the worker is supposed to emit.

Resolution: Resolved 2026-05-18: Confirmed against source — WaitForFirstMessageAsync completed a TaskCompletionSource on the very first WriteAsync. Replaced it with RecordingServerStreamWriter.WaitForMessageAsync(predicate, timeout), which scans recorded messages, skips earlier non-matching events, and blocks on a SemaphoreSlim until a matching one arrives or the timeout elapses (throwing a TimeoutException that reports the scanned count). GatewaySession_WithLiveWorker_RegistersAdvisesStreamsDataAndCloses now waits for the first Family == OnDataChange event. Live execution was not possible in this environment (no MXAccess COM); verified by build.

IntegrationTests-004

Field Value
Severity Medium
Category Error handling & resilience
Location src/MxGateway.IntegrationTests/WorkerLiveMxAccessSmokeTests.cs:108-111
Status Resolved

Description: In the finally block, after CloseSessionAsync, the test does await streamTask.WaitAsync(StreamShutdownTimeout). If closing the session does not promptly complete the stream (or StreamEvents itself faults), this throws TimeoutException from inside finally, which replaces/masks any original assertion failure from the try block. The diagnostic value of the real failure is lost.

Recommendation: Wrap the streamTask.WaitAsync (and ideally WaitForProcessesAsync) in a try/catch that logs the cleanup exception via output.WriteLine instead of letting it propagate.

Resolution: Resolved 2026-05-18: Confirmed — the finally block awaited streamTask.WaitAsync and WaitForProcessesAsync with no exception handling. Extracted a shared ShutDownAsync helper that wraps the session-close + stream-drain in one try/catch and the worker-process wait in a second try/catch, logging each cleanup exception via output.WriteLine instead of throwing. All three live tests now route shutdown through it, so a cleanup timeout can no longer mask an assertion failure. Live execution was not possible in this environment; verified by build.

IntegrationTests-005

Field Value
Severity Medium
Category Testing coverage
Location src/MxGateway.IntegrationTests/WorkerLiveMxAccessSmokeTests.cs
Status Resolved

Description: The only live MXAccess test covers the Register→AddItem→Advise→one-OnDataChange→Close happy path. CLAUDE.md stresses that MXAccess parity is the contract and calls out non-obvious behaviors (WriteSecured ordering, OperationComplete semantics, invalid-handle exceptions). None of Write, WriteSecured, Unadvise, RemoveItem, Unregister, OperationComplete, an invalid-handle command, or a worker-fault path is exercised against live COM — exactly the paths fake-worker tests cannot validate.

Recommendation: Add live coverage for at least a Write round-trip and an invalid-handle command, plus a worker-fault/abnormal-exit scenario, even if behind additional opt-in env vars.

Resolution: Resolved 2026-05-18: Added two [LiveMxAccessFact]-gated tests to WorkerLiveMxAccessSmokeTests. GatewaySession_WithLiveWorker_WritesValueToAdvisedItem registers/adds/advises then issues a Write of an integer value, asserting the command round-trips with ProtocolStatusCode.Ok and MxCommandKind.Write. GatewaySession_WithLiveWorker_InvalidHandleCommand_SurfacesFailureWithoutTransportFault issues AddItem against int.MaxValue as the server handle (never issued by MXAccess) and asserts the failure surfaces in the command reply without a usable item handle. Both reuse the existing opt-in env var and the ShutDownAsync cleanup helper. A worker-fault/abnormal-exit case was deliberately scoped out — it needs a controlled COM crash injection beyond what the existing harness supports; the two added cases cover the Write round-trip and invalid-handle paths the recommendation calls out. Live execution was not possible in this environment; verified by build.

IntegrationTests-006

Field Value
Severity Medium
Category Testing coverage
Location src/MxGateway.IntegrationTests/DashboardLdapLiveTests.cs
Status Resolved

Description: LDAP live coverage is two cases: admin succeeds, readonly is denied for missing group. There is no coverage of a wrong password for a valid user, an unknown username, or the LDAP-server-unreachable path — all of which DashboardAuthenticator has distinct branches for (the LdapException catch, the candidate is null branch). The negative test only proves group-membership denial, not credential rejection.

Recommendation: Add a live test for admin with a wrong password asserting Succeeded == false and that the password is not leaked into FailureMessage, and a test for an unknown username.

Resolution: Resolved 2026-05-18: Added three [LiveLdapFact]-gated tests to DashboardLdapLiveTests. AuthenticateAsync_AdminWithWrongPassword_FailsWithoutLeakingPassword exercises the LdapException catch via a rejected candidate bind and asserts the wrong password never reaches FailureMessage. AuthenticateAsync_UnknownUsername_Fails exercises the candidate is null branch. AuthenticateAsync_ServerUnreachable_FailsWithoutThrowing builds the authenticator with LdapOptions.Port = 1 (a reserved port no LDAP server listens on) and asserts the connect failure is absorbed into a failed result rather than thrown — covering the generic catch (Exception) branch. All three are gated by the existing MXGATEWAY_RUN_LIVE_LDAP_TESTS opt-in so they stay opt-in. Live execution was not possible in this environment (no live LDAP); verified by build.

IntegrationTests-007

Field Value
Severity Low
Category Concurrency & thread safety
Location src/MxGateway.IntegrationTests/WorkerLiveMxAccessSmokeTests.cs:20, src/MxGateway.IntegrationTests/Galaxy/GalaxyRepositoryLiveTests.cs:5, src/MxGateway.IntegrationTests/DashboardLdapLiveTests.cs:9
Status Resolved

Description: The live test classes contend for genuinely shared singletons — one MXAccess COM provider, one ZB SQL database, one GLAuth instance with a 3-fail/10-minute per-IP lockout. No [Collection] annotation or DisableTestParallelization is declared, so xUnit's default cross-class parallelism could run the Galaxy tests concurrently or interleave an LDAP failure burst that trips the GLAuth lockout.

Recommendation: Place the live test classes in a shared [Collection], or set [assembly: CollectionBehavior(DisableTestParallelization = true)] for this opt-in project, so live external resources are accessed serially.

Resolution: Resolved 2026-05-18: Confirmed — no [Collection] or assembly-level CollectionBehavior existed. Added LiveResourcesCollection.cs with a [CollectionDefinition(Name, DisableParallelization = true)] and applied [Collection(LiveResourcesCollection.Name)] to WorkerLiveMxAccessSmokeTests, GalaxyRepositoryLiveTests, and DashboardLdapLiveTests. A named collection (rather than an assembly-wide DisableTestParallelization) was chosen so the live classes serialize against each other and within themselves while non-live tests (IntegrationTestEnvironmentTests) keep parallelizing. Verified by build; live tests not executed (no MXAccess COM / live LDAP in this environment).

IntegrationTests-008

Field Value
Severity Low
Category Code organization & conventions
Location src/MxGateway.IntegrationTests/LiveLdapFactAttribute.cs, src/MxGateway.IntegrationTests/Galaxy/LiveGalaxyRepositoryFactAttribute.cs, src/MxGateway.IntegrationTests/LiveMxAccessFactAttribute.cs
Status Resolved

Description: Three near-identical fact attributes each re-implement the same "compare env var to 1 with StringComparison.Ordinal, set Skip otherwise" pattern. LiveMxAccessFactAttribute delegates to IntegrationTestEnvironment while the other two inline the logic, so the project has two divergent styles for the same concern.

Recommendation: Extract a shared helper (e.g. IntegrationTestEnvironment.IsEnabled(string variableName)) and have all three attributes call it.

Resolution: Resolved 2026-05-18: Confirmed — LiveLdapFactAttribute.Enabled and LiveGalaxyRepositoryFactAttribute.Enabled each inlined the ordinal == "1" comparison while LiveMxAccessFactAttribute delegated to IntegrationTestEnvironment. Added IntegrationTestEnvironment.IsEnabled(string variableName) as the single implementation; LiveMxAccessTestsEnabled, LiveLdapFactAttribute.Enabled, and LiveGalaxyRepositoryFactAttribute.Enabled now all call it. Verified by build.

IntegrationTests-009

Field Value
Severity Low
Category Documentation & comments
Location src/MxGateway.IntegrationTests/WorkerLiveMxAccessSmokeTests.cs:372-375
Status Resolved

Description: TestServerCallContext is XML-documented as a "Mock server call context," but it is a hand-written stub/fake with no mocking framework and no verification behavior. Per the style guides (accurate naming; explain why not what), calling it a mock misleads readers who may expect verifiable interactions.

Recommendation: Reword the summary to "test stub" / "minimal ServerCallContext implementation for in-process gRPC calls."

Resolution: Resolved 2026-05-18: Confirmed — the summary read "Mock server call context for testing gRPC calls." Reworded to "Minimal ServerCallContext stub for invoking the gRPC service in-process," noting it is a hand-written fake with no verification behavior. No mocking framework is involved; this is a documentation-only fix. Verified by build.

IntegrationTests-010

Field Value
Severity Low
Category Correctness & logic bugs
Location src/MxGateway.IntegrationTests/WorkerLiveMxAccessSmokeTests.cs:366-369
Status Resolved

Description: WaitForFirstMessageAsync accepts only a timeout and never observes a CancellationToken. There is no per-test cancellation propagation, so if the gateway/worker hangs without writing an event the test relies solely on the 15s WaitAsync timeout and gives no contextual diagnostics. Combined with IntegrationTests-004, a hung live worker produces a bare TimeoutException.

Recommendation: Accept a CancellationToken (linked to TestServerCallContext's token), pass it to firstMessage.Task.WaitAsync(timeout, token), and on timeout emit the recorded Messages count via output.WriteLine before throwing.

Re-triage: The named method WaitForFirstMessageAsync no longer exists — IntegrationTests-003's resolution renamed/replaced it with RecordingServerStreamWriter.WaitForMessageAsync(predicate, timeout), which scans recorded messages and blocks on a SemaphoreSlim. The underlying defect still held: that replacement method also took only a timeout and never observed a CancellationToken. The finding remains valid (Low, Correctness) against the renamed method; the recommendation's firstMessage.Task.WaitAsync detail is stale but the intent (thread a token, surface a count on timeout) is unchanged.

Resolution: Resolved 2026-05-18: Added an optional CancellationToken parameter to WaitForMessageAsync, linked with the existing timeout source via CancellationTokenSource.CreateLinkedTokenSource, so a per-test cancellation aborts the wait promptly. GatewaySession_WithLiveWorker_RegistersAdvisesStreamsDataAndCloses now creates a CancellationTokenSource, passes its token into the StreamEvents TestServerCallContext and into WaitForMessageAsync, so the stream call and the wait share one cancellation source. On timeout the method already throws a TimeoutException whose message includes the scanned message count, satisfying the "emit recorded count" intent (the count surfaces in the test failure rather than via a separate output.WriteLine). Verified by build; live tests not executed.