Append 1 new finding (IntegrationTests-025): the ResolveRepositoryRoot_NoMarkers test walks up from Path.GetTempPath() through unisolated ancestors; a redirected TMP or co-located checkout at C:\src silently breaks the assertion. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
66 KiB
Code Review — IntegrationTests
| Field | Value |
|---|---|
| Module | src/ZB.MOM.WW.MxGateway.IntegrationTests |
| Reviewer | Claude Code |
| Review date | 2026-05-24 |
| Commit reviewed | 42b0037 |
| Status | Re-reviewed |
| Open findings | 1 |
Checklist coverage
A comprehensive review completes every category, recording "No issues found" where a category produced nothing rather than leaving it blank.
2026-05-20 re-review (commit a020350)
| # | Category | Result |
|---|---|---|
| 1 | Correctness & logic bugs | Issues found: IntegrationTests-017 (teardown-parity test's "no further OnDataChange after UnAdvise" assertion races against in-flight events the provider already published); IntegrationTests-020 (abnormal-exit test's fault-classification keyword list accepts the substring "worker", which matches almost any plausible fault message and dilutes the check to "the description is non-empty"). |
| 2 | mxaccessgw conventions | No issues found. The five new tests honor live opt-in gating, [Collection] serialization, "no synthesized events", and the credential-redaction contract for the assertions they make. |
| 3 | Concurrency & thread safety | No issues found. GatewaySession.State/FinalFault access in the abnormal-exit poll loop goes through _syncRoot; RecordingServerStreamWriter.Messages returns a locked snapshot copy. |
| 4 | Error handling & resilience | No issues found. ShutDownAsync's opt-in propagateStreamFaults correctly threads silent stream-task faults into the Write parity test without re-masking the IntegrationTests-004 path. |
| 5 | Security | Issue found: IntegrationTests-019 (WriteSecured live test asserts the password is absent from DiagnosticMessage only; it does not assert the credential is absent from the accumulated test output, where the worker stderr/stdout and the gateway log are echoed). |
| 6 | Performance & resource management | No issues found. All six RecordingServerStreamWriter<MxEvent> instantiations use using declarations; using CancellationTokenSource is the consistent pattern. |
| 7 | Design-document adherence | No issues found. docs/GatewayTesting.md documents all five new parity surfaces and the two new env-var defaults (MXGATEWAY_LIVE_MXACCESS_WRITE_SECURED_USER/_PASSWORD). |
| 8 | Code organization & conventions | Issue found: IntegrationTests-018 (GatewayServiceFixture.TryGetSession declares out GatewaySession session non-nullable while the caller binds it as out GatewaySession? session; the null-forgiving operator inside SessionRegistry.TryGet propagates a misleading non-null annotation). |
| 9 | Testing coverage | Issue found: IntegrationTests-021 (abnormal-exit test does not assert the active StreamEvents task observed the worker fault; relies entirely on the session-state poll and would silently pass if MarkFaulted were ever moved off the stream-consumption path). |
| 10 | Documentation & comments | No issues found. Test XML comments now match what each assertion verifies (the IntegrationTests-011 fix is intact across both the Write and invalid-handle cases). |
2026-05-20 review (commit 1cd51bb)
| # | Category | Result |
|---|---|---|
| 1 | Correctness & logic bugs | Issue found: IntegrationTests-012 (Write test starts a StreamEvents task and never observes it — silent event-stream coverage gap and an unobserved fault path). |
| 2 | mxaccessgw conventions | Live opt-ins, [Collection] serialization, and the "don't synthesize events" rule are honored. No issues found. |
| 3 | Concurrency & thread safety | LiveResourcesCollection serializes all three live classes; RecordingServerStreamWriter locks correctly and the semaphore wait is linked to both timeout and external cancellation. No issues found. |
| 4 | Error handling & resilience | ShutDownAsync already isolates cleanup exceptions per category. No issues found. |
| 5 | Security | The only embedded strings are documented dev GLAuth creds and a localhost ZB connection string, all env-overridable. The wrong-password and unreachable-server tests assert no password leakage. No issues found. |
| 6 | Performance & resource management | Issue found: IntegrationTests-013 (RecordingServerStreamWriter.messageArrived SemaphoreSlim is never disposed; the type owns an IDisposable field but is not itself disposable). |
| 7 | Design-document adherence | No issues found. docs/GatewayTesting.md now documents the Live LDAP, Live Galaxy, and Write/invalid-handle MXAccess opt-ins added by the prior round of resolutions. |
| 8 | Code organization & conventions | Issues found: IntegrationTests-015 ([Trait("Category", ...)] repeated on every test method instead of declared once at class level); IntegrationTests-016 (the Galaxy default connection string is duplicated between LiveGalaxyRepositoryFactAttribute and GalaxyRepositoryOptions). |
| 9 | Testing coverage | Issue found: IntegrationTests-014 (Unadvise, RemoveItem, Unregister, WriteSecured ordering, and worker-fault parity still uncovered — IntegrationTests-005's resolution scoped these out). |
| 10 | Documentation & comments | Issue found: IntegrationTests-011 (the invalid-handle and write test comments describe a non-Ok MXAccess failure as ProtocolStatusCode.Ok, contradicting both the assertion and HResultConverter). |
2026-05-18 review (commit 6c64030)
| # | Category | Result |
|---|---|---|
| 1 | Correctness & logic bugs | Issues found: IntegrationTests-003 (asserts only on first event), IntegrationTests-010 (WaitForMessageAsync ignores cancellation). |
| 2 | mxaccessgw conventions | Live tests correctly gated and skip (not fail) when prerequisites are absent; LiveGalaxyRepositoryFactAttribute undocumented in the opt-in matrix. |
| 3 | Concurrency & thread safety | Issue found: IntegrationTests-007 (no [Collection]/parallelism guard for shared MXAccess/ZB/GLAuth). |
| 4 | Error handling & resilience | Issue found: IntegrationTests-004 (cleanup WaitAsync can mask the original failure). |
| 5 | Security | No production secrets; only documented dev GLAuth creds and a localhost ZB connection string, all env-overridable. No issues found. |
| 6 | Performance & resource management | Worker process disposed transitively via session disposal; no leaked pipes/COM/processes. No issues found. |
| 7 | Design-document adherence | Issues found: IntegrationTests-001 (Galaxy live suite absent from the opt-in matrix), IntegrationTests-002 (GwAdmin LDAP prerequisite undocumented). |
| 8 | Code organization & conventions | Issue found: IntegrationTests-008 (three near-identical fact attributes). |
| 9 | Testing coverage | Issues found: IntegrationTests-005 (thin MXAccess parity coverage), IntegrationTests-006 (thin LDAP failure-path coverage). |
| 10 | Documentation & comments | Issue found: IntegrationTests-009 (TestServerCallContext mislabelled "Mock"). |
2026-05-24 re-review (commit 42b0037)
Re-review pass at 42b0037 scoped to commit 865c22a (the only commit
touching src/ZB.MOM.WW.MxGateway.IntegrationTests between d692232 and
42b0037 — all later commits in the range affect dashboard/clients only).
865c22a resolved IntegrationTests-022..024: the
ResolveRepositoryRoot throw replaces the silent
Directory.GetCurrentDirectory() fallback and ships a regression test;
DashboardLdapLiveTests adds the Role: Admin claim assertion; and
NullDashboardEventBroadcaster is extracted into
TestSupport/NullDashboardEventBroadcaster.cs.
| # | Category | Result |
|---|---|---|
| 1 | Correctness & logic bugs | Issue found: IntegrationTests-025 (the new ResolveRepositoryRoot_NoMarkers_… test seeds an "isolated" directory under Path.GetTempPath() and walks parents, implicitly assuming no ancestor of the temp dir contains src/ plus .git/*.sln/*.slnx; the assumption is environment-dependent and can flap on machines whose TMP is redirected). |
| 2 | mxaccessgw conventions | No issues found — file-scoped namespaces, sealed types, accurate XML docs, env-var names preserved. |
| 3 | Concurrency & thread safety | No issues found in this diff. |
| 4 | Error handling & resilience | No issues found. The new InvalidOperationException only fires from the live-MXAccess worker-exe resolution path, which is gated by MXGATEWAY_RUN_LIVE_MXACCESS_TESTS; non-live tests are unaffected. |
| 5 | Security | No issues found — the new live-LDAP claim assertion adds no new credential surface. |
| 6 | Performance & resource management | No issues found. |
| 7 | Design-document adherence | No issues found. The diff is internal behavior hardening + a private-claim assertion; no public API / env-var / opt-in surface changes, so the CLAUDE.md "update docs in the same change" rule does not require a docs/GatewayTesting.md edit. |
| 8 | Code organization & conventions | No issues found. NullDashboardEventBroadcaster's placement under TestSupport/ matches the unit-test project layout; the deliberate duplication is documented in IntegrationTests-024's resolution. |
| 9 | Testing coverage | No issues found — the new role-claim assertion in DashboardLdapLiveTests and the new exception-path test in IntegrationTestEnvironmentTests both add coverage; no further gaps surface from this diff. |
| 10 | Documentation & comments | No issues found. XML docs on ResolveRepositoryRoot accurately describe the new throw, and the inline IntegrationTests-023 comment correctly explains why the role claim is asserted. |
2026-05-24 review (commit d692232)
Re-review pass at d692232 scoped to: the rename (dc9c0c9) of the
namespaces, csproj, and the IntegrationTestEnvironment.ResolveRepositoryRoot
walker; the DashboardLdapLiveTests update to inject a GroupToRole
mapping in place of the dropped LdapOptions.RequiredGroup default
(27ed651); and the inline NullDashboardEventBroadcaster fake added
to WorkerLiveMxAccessSmokeTests for the new EventStreamService ctor
parameter (d692232).
| # | Category | Result |
|---|---|---|
| 1 | Correctness & logic bugs | No issues found in the a020350..d692232 diff. |
| 2 | mxaccessgw conventions | No issues found — namespaces, env-var names, and live-test attribute discipline preserved. |
| 3 | Concurrency & thread safety | No issues found in this diff. |
| 4 | Error handling & resilience | Confirmed: the RpcException(StatusCode.Cancelled) sibling catch added in WorkerLiveMxAccessSmokeTests.ShutDownAsync covers the only site where a gRPC-mapped OperationCanceledException can surface — no other call sites need widening. |
| 5 | Security | No issues found. |
| 6 | Performance & resource management | No issues found. |
| 7 | Design-document adherence | No issues found — docs/GatewayTesting.md was updated in the same wave to describe the GroupToRole-fixture pattern. |
| 8 | Code organization & conventions | Issues found: IntegrationTests-022 (ResolveRepositoryRoot silently falls back to Directory.GetCurrentDirectory() on exhausted walk, which can hide misconfiguration), IntegrationTests-024 (inline NullDashboardEventBroadcaster fake is the only such use today but worth watching for duplication once a second consumer arrives). |
| 9 | Testing coverage | Issues found: IntegrationTests-023 (DashboardLdapLiveTests.AuthenticateAsync_AdminInGwAdminGroup_Succeeds asserts the ldap_group claim but does not assert the emitted Role: Admin claim, leaving the role-mapping path untested). |
| 10 | Documentation & comments | No issues found. |
Findings
IntegrationTests-001
| Field | Value |
|---|---|
| Severity | High |
| Category | Design-document adherence |
| Location | src/MxGateway.IntegrationTests/Galaxy/LiveGalaxyRepositoryFactAttribute.cs:7, src/MxGateway.IntegrationTests/Galaxy/GalaxyRepositoryLiveTests.cs |
| Status | Resolved |
Description: The Galaxy Repository live test suite and its gating env var MXGATEWAY_RUN_LIVE_GALAXY_TESTS (plus connection-string override MXGATEWAY_LIVE_GALAXY_CONN) are completely absent from docs/GatewayTesting.md. CLAUDE.md mandates updating docs in the same change as the source. The opt-in matrix documents only the MXAccess and LDAP env vars, so an operator running the documented matrix has no way to know these tests exist or how to enable them.
Recommendation: Add a "Live Galaxy Repository" section to docs/GatewayTesting.md documenting MXGATEWAY_RUN_LIVE_GALAXY_TESTS=1, MXGATEWAY_LIVE_GALAXY_CONN, the ZB database prerequisite, and the covered RPCs, mirroring the existing "Live MXAccess Smoke" section.
Resolution: Resolved 2026-05-18: Added a "Live Galaxy Repository" section to docs/GatewayTesting.md documenting MXGATEWAY_RUN_LIVE_GALAXY_TESTS, MXGATEWAY_LIVE_GALAXY_CONN, the deployed-ZB prerequisite, and the covered GalaxyRepository RPCs.
IntegrationTests-002
| Field | Value |
|---|---|
| Severity | High |
| Category | Design-document adherence |
| Location | src/MxGateway.IntegrationTests/DashboardLdapLiveTests.cs:13, src/MxGateway.Server/Configuration/LdapOptions.cs:27 |
| Status | Resolved |
Description: DashboardLdapLiveTests builds the authenticator with new GatewayOptions(), so it relies on LdapOptions.RequiredGroup defaulting to GwAdmin and asserts the admin user is a member of a GwAdmin LDAP group. glauth.md does not list GwAdmin as a provisioned group — it lists admin only in the five role groups and describes GwAdmin as a group to add "when reuse isn't enough." If GLAuth has only the documented baseline groups, AuthenticateAsync_AdminInGwAdminGroup_Succeeds fails (not skips) on any box where the env var is set. This is an undocumented hard prerequisite beyond "LDAP is up."
Recommendation: Either document the required GwAdmin GLAuth provisioning step in glauth.md and GatewayTesting.md, or have the test set RequiredGroup to a baseline group glauth.md guarantees admin belongs to (e.g. WriteOperate).
Resolution: Resolved 2026-05-18: Took the documentation fix — promoted the glauth.md "Adding a gw-specific group" section into a concrete "Provisioning the GwAdmin group" step that grants GwAdmin to admin, cross-referenced it from the groups/verification sections, and added a "Live LDAP" section to docs/GatewayTesting.md calling out GwAdmin as a hard prerequisite. Alternative considered: weaken the test to a baseline group (WriteOperate) — rejected because GwAdmin is the real default LdapOptions.RequiredGroup and the test should exercise it.
IntegrationTests-003
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Correctness & logic bugs |
| Location | src/MxGateway.IntegrationTests/WorkerLiveMxAccessSmokeTests.cs:89-97 |
| Status | Resolved |
Description: The test asserts only on the first MxEvent recorded by RecordingServerStreamWriter. A live MXAccess provider can deliver an initial state/quality event whose family or handles differ from the expected OnDataChange (e.g. a registration-state or bad-quality bootstrap event). Because WaitForFirstMessageAsync returns whatever arrives first, a genuine ordering/family defect could fail spuriously or leave later wrong events unverified.
Recommendation: Filter for the first event with Family == OnDataChange (with a bounded retry/poll) or assert the full recorded sequence, so the test verifies the event the worker is supposed to emit.
Resolution: Resolved 2026-05-18: Confirmed against source — WaitForFirstMessageAsync completed a TaskCompletionSource on the very first WriteAsync. Replaced it with RecordingServerStreamWriter.WaitForMessageAsync(predicate, timeout), which scans recorded messages, skips earlier non-matching events, and blocks on a SemaphoreSlim until a matching one arrives or the timeout elapses (throwing a TimeoutException that reports the scanned count). GatewaySession_WithLiveWorker_RegistersAdvisesStreamsDataAndCloses now waits for the first Family == OnDataChange event. Live execution was not possible in this environment (no MXAccess COM); verified by build.
IntegrationTests-004
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Error handling & resilience |
| Location | src/MxGateway.IntegrationTests/WorkerLiveMxAccessSmokeTests.cs:108-111 |
| Status | Resolved |
Description: In the finally block, after CloseSessionAsync, the test does await streamTask.WaitAsync(StreamShutdownTimeout). If closing the session does not promptly complete the stream (or StreamEvents itself faults), this throws TimeoutException from inside finally, which replaces/masks any original assertion failure from the try block. The diagnostic value of the real failure is lost.
Recommendation: Wrap the streamTask.WaitAsync (and ideally WaitForProcessesAsync) in a try/catch that logs the cleanup exception via output.WriteLine instead of letting it propagate.
Resolution: Resolved 2026-05-18: Confirmed — the finally block awaited streamTask.WaitAsync and WaitForProcessesAsync with no exception handling. Extracted a shared ShutDownAsync helper that wraps the session-close + stream-drain in one try/catch and the worker-process wait in a second try/catch, logging each cleanup exception via output.WriteLine instead of throwing. All three live tests now route shutdown through it, so a cleanup timeout can no longer mask an assertion failure. Live execution was not possible in this environment; verified by build.
IntegrationTests-005
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Testing coverage |
| Location | src/MxGateway.IntegrationTests/WorkerLiveMxAccessSmokeTests.cs |
| Status | Resolved |
Description: The only live MXAccess test covers the Register→AddItem→Advise→one-OnDataChange→Close happy path. CLAUDE.md stresses that MXAccess parity is the contract and calls out non-obvious behaviors (WriteSecured ordering, OperationComplete semantics, invalid-handle exceptions). None of Write, WriteSecured, Unadvise, RemoveItem, Unregister, OperationComplete, an invalid-handle command, or a worker-fault path is exercised against live COM — exactly the paths fake-worker tests cannot validate.
Recommendation: Add live coverage for at least a Write round-trip and an invalid-handle command, plus a worker-fault/abnormal-exit scenario, even if behind additional opt-in env vars.
Resolution: Resolved 2026-05-18: Added two [LiveMxAccessFact]-gated tests to WorkerLiveMxAccessSmokeTests. GatewaySession_WithLiveWorker_WritesValueToAdvisedItem registers/adds/advises then issues a Write of an integer value, asserting the command round-trips with ProtocolStatusCode.Ok and MxCommandKind.Write. GatewaySession_WithLiveWorker_InvalidHandleCommand_SurfacesFailureWithoutTransportFault issues AddItem against int.MaxValue as the server handle (never issued by MXAccess) and asserts the failure surfaces in the command reply without a usable item handle. Both reuse the existing opt-in env var and the ShutDownAsync cleanup helper. A worker-fault/abnormal-exit case was deliberately scoped out — it needs a controlled COM crash injection beyond what the existing harness supports; the two added cases cover the Write round-trip and invalid-handle paths the recommendation calls out. Live execution was not possible in this environment; verified by build.
IntegrationTests-006
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Testing coverage |
| Location | src/MxGateway.IntegrationTests/DashboardLdapLiveTests.cs |
| Status | Resolved |
Description: LDAP live coverage is two cases: admin succeeds, readonly is denied for missing group. There is no coverage of a wrong password for a valid user, an unknown username, or the LDAP-server-unreachable path — all of which DashboardAuthenticator has distinct branches for (the LdapException catch, the candidate is null branch). The negative test only proves group-membership denial, not credential rejection.
Recommendation: Add a live test for admin with a wrong password asserting Succeeded == false and that the password is not leaked into FailureMessage, and a test for an unknown username.
Resolution: Resolved 2026-05-18: Added three [LiveLdapFact]-gated tests to DashboardLdapLiveTests. AuthenticateAsync_AdminWithWrongPassword_FailsWithoutLeakingPassword exercises the LdapException catch via a rejected candidate bind and asserts the wrong password never reaches FailureMessage. AuthenticateAsync_UnknownUsername_Fails exercises the candidate is null branch. AuthenticateAsync_ServerUnreachable_FailsWithoutThrowing builds the authenticator with LdapOptions.Port = 1 (a reserved port no LDAP server listens on) and asserts the connect failure is absorbed into a failed result rather than thrown — covering the generic catch (Exception) branch. All three are gated by the existing MXGATEWAY_RUN_LIVE_LDAP_TESTS opt-in so they stay opt-in. Live execution was not possible in this environment (no live LDAP); verified by build.
IntegrationTests-007
| Field | Value |
|---|---|
| Severity | Low |
| Category | Concurrency & thread safety |
| Location | src/MxGateway.IntegrationTests/WorkerLiveMxAccessSmokeTests.cs:20, src/MxGateway.IntegrationTests/Galaxy/GalaxyRepositoryLiveTests.cs:5, src/MxGateway.IntegrationTests/DashboardLdapLiveTests.cs:9 |
| Status | Resolved |
Description: The live test classes contend for genuinely shared singletons — one MXAccess COM provider, one ZB SQL database, one GLAuth instance with a 3-fail/10-minute per-IP lockout. No [Collection] annotation or DisableTestParallelization is declared, so xUnit's default cross-class parallelism could run the Galaxy tests concurrently or interleave an LDAP failure burst that trips the GLAuth lockout.
Recommendation: Place the live test classes in a shared [Collection], or set [assembly: CollectionBehavior(DisableTestParallelization = true)] for this opt-in project, so live external resources are accessed serially.
Resolution: Resolved 2026-05-18: Confirmed — no [Collection] or assembly-level CollectionBehavior existed. Added LiveResourcesCollection.cs with a [CollectionDefinition(Name, DisableParallelization = true)] and applied [Collection(LiveResourcesCollection.Name)] to WorkerLiveMxAccessSmokeTests, GalaxyRepositoryLiveTests, and DashboardLdapLiveTests. A named collection (rather than an assembly-wide DisableTestParallelization) was chosen so the live classes serialize against each other and within themselves while non-live tests (IntegrationTestEnvironmentTests) keep parallelizing. Verified by build; live tests not executed (no MXAccess COM / live LDAP in this environment).
IntegrationTests-008
| Field | Value |
|---|---|
| Severity | Low |
| Category | Code organization & conventions |
| Location | src/MxGateway.IntegrationTests/LiveLdapFactAttribute.cs, src/MxGateway.IntegrationTests/Galaxy/LiveGalaxyRepositoryFactAttribute.cs, src/MxGateway.IntegrationTests/LiveMxAccessFactAttribute.cs |
| Status | Resolved |
Description: Three near-identical fact attributes each re-implement the same "compare env var to 1 with StringComparison.Ordinal, set Skip otherwise" pattern. LiveMxAccessFactAttribute delegates to IntegrationTestEnvironment while the other two inline the logic, so the project has two divergent styles for the same concern.
Recommendation: Extract a shared helper (e.g. IntegrationTestEnvironment.IsEnabled(string variableName)) and have all three attributes call it.
Resolution: Resolved 2026-05-18: Confirmed — LiveLdapFactAttribute.Enabled and LiveGalaxyRepositoryFactAttribute.Enabled each inlined the ordinal == "1" comparison while LiveMxAccessFactAttribute delegated to IntegrationTestEnvironment. Added IntegrationTestEnvironment.IsEnabled(string variableName) as the single implementation; LiveMxAccessTestsEnabled, LiveLdapFactAttribute.Enabled, and LiveGalaxyRepositoryFactAttribute.Enabled now all call it. Verified by build.
IntegrationTests-009
| Field | Value |
|---|---|
| Severity | Low |
| Category | Documentation & comments |
| Location | src/MxGateway.IntegrationTests/WorkerLiveMxAccessSmokeTests.cs:372-375 |
| Status | Resolved |
Description: TestServerCallContext is XML-documented as a "Mock server call context," but it is a hand-written stub/fake with no mocking framework and no verification behavior. Per the style guides (accurate naming; explain why not what), calling it a mock misleads readers who may expect verifiable interactions.
Recommendation: Reword the summary to "test stub" / "minimal ServerCallContext implementation for in-process gRPC calls."
Resolution: Resolved 2026-05-18: Confirmed — the summary read "Mock server call context for testing gRPC calls." Reworded to "Minimal ServerCallContext stub for invoking the gRPC service in-process," noting it is a hand-written fake with no verification behavior. No mocking framework is involved; this is a documentation-only fix. Verified by build.
IntegrationTests-010
| Field | Value |
|---|---|
| Severity | Low |
| Category | Correctness & logic bugs |
| Location | src/MxGateway.IntegrationTests/WorkerLiveMxAccessSmokeTests.cs:366-369 |
| Status | Resolved |
Description: WaitForFirstMessageAsync accepts only a timeout and never observes a CancellationToken. There is no per-test cancellation propagation, so if the gateway/worker hangs without writing an event the test relies solely on the 15s WaitAsync timeout and gives no contextual diagnostics. Combined with IntegrationTests-004, a hung live worker produces a bare TimeoutException.
Recommendation: Accept a CancellationToken (linked to TestServerCallContext's token), pass it to firstMessage.Task.WaitAsync(timeout, token), and on timeout emit the recorded Messages count via output.WriteLine before throwing.
Re-triage: The named method WaitForFirstMessageAsync no longer exists — IntegrationTests-003's resolution renamed/replaced it with RecordingServerStreamWriter.WaitForMessageAsync(predicate, timeout), which scans recorded messages and blocks on a SemaphoreSlim. The underlying defect still held: that replacement method also took only a timeout and never observed a CancellationToken. The finding remains valid (Low, Correctness) against the renamed method; the recommendation's firstMessage.Task.WaitAsync detail is stale but the intent (thread a token, surface a count on timeout) is unchanged.
Resolution: Resolved 2026-05-18: Added an optional CancellationToken parameter to WaitForMessageAsync, linked with the existing timeout source via CancellationTokenSource.CreateLinkedTokenSource, so a per-test cancellation aborts the wait promptly. GatewaySession_WithLiveWorker_RegistersAdvisesStreamsDataAndCloses now creates a CancellationTokenSource, passes its token into the StreamEvents TestServerCallContext and into WaitForMessageAsync, so the stream call and the wait share one cancellation source. On timeout the method already throws a TimeoutException whose message includes the scanned message count, satisfying the "emit recorded count" intent (the count surfaces in the test failure rather than via a separate output.WriteLine). Verified by build; live tests not executed.
IntegrationTests-011
| Field | Value |
|---|---|
| Severity | Low |
| Category | Documentation & comments |
| Location | src/MxGateway.IntegrationTests/WorkerLiveMxAccessSmokeTests.cs:236-240, src/MxGateway.IntegrationTests/WorkerLiveMxAccessSmokeTests.cs:183-187 |
| Status | Resolved |
Description: The XML/inline comments on the two new MXAccess parity tests misdescribe how the gateway surfaces an MXAccess failure. The invalid-handle test reads "the gateway protocol status is Ok and the failure shows up in hresult / the status proxies — it must not be reported as a transport fault", then asserts Assert.NotEqual(ProtocolStatusCode.Ok, addItemReply.ProtocolStatus.Code). HResultConverter.CreateProtocolStatus (src/MxGateway.Worker/Conversion/HResultConverter.cs:39) actually sets Code = ProtocolStatusCode.MxaccessFailure whenever the COM call throws (HRESULT ≠ 0), so the assertion is correct but the comment is wrong — the protocol status is not Ok on an MXAccess failure. The write-round-trip test carries the same misleading framing on lines 183-187 ("MXAccess parity details … belong in hresult / statuses, not in a transport failure") immediately before asserting Ok. A reader can reasonably conclude the gateway always reports Ok for round-tripped commands and tweak code accordingly. The intended distinction is "this is not a gRPC transport fault" (the RPC reply still arrives) — the protocol status code carries the MXAccess outcome.
Recommendation: Reword the invalid-handle comment to "the gateway must reply with ProtocolStatusCode.MxaccessFailure and a non-zero Hresult carrying the COM failure, not a gRPC transport fault." Reword the write-round-trip comment to clarify it is asserting the happy-path Ok and that an MXAccess rejection would surface as MxaccessFailure (per HResultConverter), not as a RpcException.
Resolution: 2026-05-20 — Reworded the invalid-handle test comment to say the gateway must reply with ProtocolStatusCode.MxaccessFailure and a non-zero hresult carrying the COM failure (per HResultConverter), and reworded the write-round-trip comment to make explicit it is asserting the happy-path Ok while an MXAccess rejection would surface as MxaccessFailure, never as an RpcException.
IntegrationTests-012
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Correctness & logic bugs |
| Location | src/MxGateway.IntegrationTests/WorkerLiveMxAccessSmokeTests.cs:147-151 |
| Status | Resolved |
Description: GatewaySession_WithLiveWorker_WritesValueToAdvisedItem constructs a RecordingServerStreamWriter<MxEvent> and starts a StreamEvents task, then never reads from it and never asserts anything about the recorded messages. The test verifies only that the Write command round-trips at the protocol level — it does not verify that the worker actually emits any event after the write (for example an OnWriteComplete, which is the proof of round-trip used by the cross-language client e2e runner). Because the stream task is started with new TestServerCallContext() (no cancellation source), any fault raised by the stream task (an exception from EventStreamService, a session-not-found, a backpressure overflow) is swallowed — streamTask is later awaited in ShutDownAsync only inside a broad catch (Exception ex), which logs and continues. The Write test therefore cannot fail on stream-task faults. Two consequences: (a) the live Write parity coverage promised in IntegrationTests-005 is weaker than it appears, and (b) the fixture (eventWriter) is dead code in this test that suggests an assertion was intended.
Recommendation: Either remove the unused eventWriter/StreamEvents plumbing from the Write test so the test scope matches its assertions, or — preferred — extend the test to wait for an OnWriteComplete event for the written item via eventWriter.WaitForMessageAsync(candidate => candidate.Family == MxEventFamily.OnWriteComplete && candidate.ItemHandle == itemHandle, ...), matching the round-trip proof used by scripts/run-client-e2e-tests.ps1 -VerifyWrite.
Resolution: Resolved 2026-05-20: Rewrote GatewaySession_WithLiveWorker_WritesValueToAdvisedItem so the previously-dead eventWriter/StreamEvents plumbing actually drives an assertion. The test now waits for an OnWriteComplete event matching the Write's (server, item) handle pair via eventWriter.WaitForMessageAsync (using IntegrationTestEnvironment.LiveMxAccessEventTimeout), and asserts the recorded event's family, session id, and handles — the same round-trip proof the cross-language client e2e runner uses. The stream call is now bound to a CancellationTokenSource and the test asserts streamTask.IsFaulted == false before cleanup. ShutDownAsync gained an opt-in propagateStreamFaults flag so a faulted StreamEvents task is rethrown into the test rather than silently swallowed by the broad cleanup catch; the cancellation token is also signalled before the drain so StreamEvents observes a clean shutdown instead of a forced timeout. Verified by build and by confirming the test skips cleanly when MXGATEWAY_RUN_LIVE_MXACCESS_TESTS is unset.
IntegrationTests-013
| Field | Value |
|---|---|
| Severity | Low |
| Category | Performance & resource management |
| Location | src/MxGateway.IntegrationTests/WorkerLiveMxAccessSmokeTests.cs:519-609 |
| Status | Resolved |
Description: RecordingServerStreamWriter<T> owns a SemaphoreSlim messageArrived (IDisposable) but does not itself implement IDisposable, so the semaphore's wait handle is never released back to the OS. Each live test allocates one such writer and discards it at scope exit. Live tests run on opt-in only, so the cumulative leak is bounded, but the type holds an IDisposable field — the standard hygiene under Directory.Build.props's TreatWarningsAsErrors=true is to either dispose the field or document why not. CA2213 does not fire because the owner is not itself IDisposable; an analyzer-driven warning is the only reason this is not a build break, not an indication that the leak is acceptable.
Recommendation: Make RecordingServerStreamWriter<T> implement IDisposable, dispose messageArrived in Dispose, and wrap each instantiation in a using block (using RecordingServerStreamWriter<MxEvent> eventWriter = new();).
Resolution: 2026-05-20 — RecordingServerStreamWriter<T> now implements IDisposable and its Dispose releases the messageArrived semaphore. All six live tests in WorkerLiveMxAccessSmokeTests now allocate the writer with a top-of-method using declaration so the semaphore's wait handle is released on scope exit even when the test body throws.
IntegrationTests-014
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Testing coverage |
| Location | src/MxGateway.IntegrationTests/WorkerLiveMxAccessSmokeTests.cs |
| Status | Resolved |
Description: IntegrationTests-005 was resolved by adding live coverage for Write and an invalid-handle AddItem, but its resolution explicitly scoped out the worker-fault/abnormal-exit case and silently dropped Unadvise, RemoveItem, Unregister, OperationComplete, and WriteSecured ordering. CLAUDE.md singles out WriteSecured ("WriteSecured failing before a value-bearing NMX body") and OperationComplete semantics as parity surprises the gateway must not "fix" — exactly the paths fake-worker tests cannot validate. After this commit the live MXAccess smoke still doesn't exercise any teardown command, the secured-write ordering rule, or a deliberately faulted worker. A regression in any of these would only be caught by manual testing.
Recommendation: Add live MXAccess coverage for the teardown chain (Unadvise then RemoveItem then Unregister, asserting each replies with ProtocolStatusCode.Ok and the next operation no longer references the freed handle), and at minimum one WriteSecured parity case asserting the documented ordering. A worker-fault test can be deferred to a separate finding once a deterministic COM-crash injection harness exists.
Resolution: Resolved 2026-05-20: Added three new [LiveMxAccessFact]-gated tests to WorkerLiveMxAccessSmokeTests, all reusing the existing opt-in env var and ShutDownAsync cleanup helper. (1) GatewaySession_WithLiveWorker_UnadviseRemoveItemUnregister_TeardownOrderingParity runs Register → AddItem → Advise → wait for one OnDataChange → UnAdvise → RemoveItem → Unregister, asserting each step replies Ok with the matching MxCommandKind, that no further OnDataChange events for the un-advised (server, item) pair arrive after a 500 ms settle window, and that a second RemoveItem against the freed handle returns a non-Ok MXAccess failure (so a regression that left a stale subscription or accepted a stale handle would surface). (2) GatewaySession_WithLiveWorker_WriteSecured_AuthenticatedRoundTripParity resolves an ArchestrA user id via AuthenticateUser (credentials env-overridable through MXGATEWAY_LIVE_MXACCESS_WRITE_SECURED_USER / ..._PASSWORD, defaulting to the admin/admin123 GLAuth user from glauth.md), issues WriteSecured against an advised item, and asserts the reply carries MxCommandKind.WriteSecured, the protocol status is one of the documented parity outcomes (Ok for an unprotected provider, MxaccessFailure when the item is not WriteSecured-eligible — never a transport fault), and the credential never leaks into the diagnostic message. (3) GatewaySession_WithLiveWorker_AbnormalWorkerExit_MarksSessionFaulted opens a session, kills the worker process tree (via a new TestWorkerProcessFactory.KillAllAndDetach helper) without going through CloseSession, and polls the session via a new GatewayServiceFixture.TryGetSession accessor until it transitions to SessionState.Faulted within the live event timeout; asserts the final state is Faulted, that FinalFault is non-empty, and that the fault description carries a known worker-client classification (pipe disconnected / worker faulted / heartbeat expired / end-of-stream). docs/GatewayTesting.md was updated to list all five parity surfaces and the two new env-var defaults. Verified by build and confirmed all six live tests skip cleanly when MXGATEWAY_RUN_LIVE_MXACCESS_TESTS is unset.
IntegrationTests-015
| Field | Value |
|---|---|
| Severity | Low |
| Category | Code organization & conventions |
| Location | src/MxGateway.IntegrationTests/WorkerLiveMxAccessSmokeTests.cs:30,119,201, src/MxGateway.IntegrationTests/DashboardLdapLiveTests.cs:13,32,48,67,84, src/MxGateway.IntegrationTests/Galaxy/GalaxyRepositoryLiveTests.cs:10,22,34,52 |
| Status | Resolved |
Description: Every live-test method in the three live classes carries an identical [Trait("Category", "LiveMxAccess")] (or LiveLdap / LiveGalaxy) attribute. The trait is uniform within each class and is exactly the information the [Collection(LiveResourcesCollection.Name)] class-level attribute also implies. xUnit's [Trait] is inheritable from the class to its methods, so the same metadata can be declared once at class scope. The current shape adds maintenance burden — adding a new test in any of these classes requires remembering to add the trait, and the existing pattern's LiveLdap includes five copies of the same line.
Recommendation: Move each [Trait("Category", ...)] to the class declaration alongside the existing [Collection(...)], and remove the per-method copies. Verify the trait still surfaces in --filter Trait=Category=LiveLdap after the change.
Resolution: 2026-05-20 — Lifted [Trait("Category", "LiveMxAccess")], [Trait("Category", "LiveLdap")], and [Trait("Category", "LiveGalaxy")] to the class declarations of WorkerLiveMxAccessSmokeTests, DashboardLdapLiveTests, and GalaxyRepositoryLiveTests respectively (alongside the existing [Collection(LiveResourcesCollection.Name)]), and removed all per-method duplicates. xUnit propagates class-level traits to every method, so --filter Category=Live* filters still match.
IntegrationTests-016
| Field | Value |
|---|---|
| Severity | Low |
| Category | Code organization & conventions |
| Location | src/MxGateway.IntegrationTests/Galaxy/LiveGalaxyRepositoryFactAttribute.cs:26, src/MxGateway.Server/Galaxy/GalaxyRepositoryOptions.cs:13 |
| Status | Resolved |
Description: The default Galaxy Repository connection string "Server=localhost;Database=ZB;Integrated Security=True;TrustServerCertificate=True;Encrypt=False;" is duplicated verbatim between the production GalaxyRepositoryOptions.ConnectionString initializer and the test-side LiveGalaxyRepositoryFactAttribute.ConnectionString fallback. The docs (docs/GatewayTesting.md) document the value once and reference it from both places. If the production default changes (e.g. tightening to a named instance, or switching to a SQL-auth template), the test default silently keeps the old string and the live Galaxy tests connect to the wrong server. The drift is invisible to the build.
Recommendation: Expose the production default through a public const string on GalaxyRepositoryOptions (e.g. DefaultConnectionString) and have LiveGalaxyRepositoryFactAttribute.ConnectionString read Environment.GetEnvironmentVariable(ConnectionStringVariableName) ?? GalaxyRepositoryOptions.DefaultConnectionString. Single source of truth, build-time guarantee they cannot drift.
Resolution: 2026-05-20 — Added public const string GalaxyRepositoryOptions.DefaultConnectionString carrying the production default, set the ConnectionString initializer to reference it, and changed LiveGalaxyRepositoryFactAttribute.ConnectionString to fall back to GalaxyRepositoryOptions.DefaultConnectionString. The literal now lives in exactly one place and any future change to the production default propagates to the live-test fallback at compile time.
IntegrationTests-017
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Correctness & logic bugs |
| Location | src/MxGateway.IntegrationTests/WorkerLiveMxAccessSmokeTests.cs:350-407 |
| Status | Resolved |
Description: GatewaySession_WithLiveWorker_UnadviseRemoveItemUnregister_TeardownOrderingParity proves the subscription is live by waiting for one matching OnDataChange, snapshots dataChangeCountBeforeUnadvise, then sends UnAdvise, waits 500 ms, snapshots dataChangeCountAfterTeardown, and asserts strict equality. The assertion races against the natural cadence of the live MXAccess provider:
- After
WaitForMessageAsyncreturns the first match, any additionalOnDataChangefor the same(serverHandle, itemHandle)published by the provider before the worker processesUnAdviseis delivered into the recording writer. - The snapshot at line 362 is taken immediately before the
UnAdvisecommand is sent (line 370). Events that arrive in the window between that snapshot and the worker processingUnAdvise(network round-trip + STA dispatch + worker pipe send + gateway channel write) are racing in — they are not "after UnAdvise" but they will be in the post-teardown snapshot. MXAccessproviders can publishOnDataChangeat sub-second cadence; the strict-equality assertion has no slack for in-flight events.
The test passes today only because the chosen test item (TestChildObject.TestInt) likely changes value rarely. Against a more active item — or on a slower machine where the round-trip widens — the assertion would flap. The intent ("no further events after the worker stops the subscription") would be better expressed by capturing the snapshot after UnAdvise returns Ok rather than before it is issued.
Recommendation: Move the "before" snapshot to immediately after UnAdvise returns Ok (the point past which the parity rule applies), or weaken the assertion to "no events with a WorkerSequence strictly greater than the last sequence observed within dataChangeCountBeforeUnadvise + N events arrived in the post-teardown drain" where N accounts for the documented in-flight window. Either change moves the test from racing on provider cadence to verifying the actual parity rule.
Resolution: 2026-05-20 — Removed the dataChangeCountBeforeUnadvise snapshot taken just before the UnAdvise command (the source of the race) and replaced the strict-equality assertion against a pre-teardown count with a two-window stability check taken after the teardown chain completes. The test now waits one 500 ms settle window for in-flight OnDataChange events (which the provider already published before the worker acknowledged UnAdvise) to drain, captures dataChangeCountAfterFirstSettle, waits another 500 ms, and asserts the count is unchanged in dataChangeCountAfterSecondSettle. The parity rule under test ("no further OnDataChange after the worker stops the subscription") is now expressed as steadiness across the post-teardown window rather than equality with a count snapshotted during the round-trip race, so a slower machine or a more active test item no longer flaps the assertion while a genuine regression (a stale subscription continuing to fire) still surfaces as a count drift between the two settles.
IntegrationTests-018
| Field | Value |
|---|---|
| Severity | Low |
| Category | Code organization & conventions |
| Location | src/MxGateway.IntegrationTests/WorkerLiveMxAccessSmokeTests.cs:1037, src/MxGateway.IntegrationTests/WorkerLiveMxAccessSmokeTests.cs:595 |
| Status | Resolved |
Description: GatewayServiceFixture.TryGetSession(string sessionId, out GatewaySession session) declares session as a non-nullable GatewaySession, but its implementation is _registry.TryGet(sessionId, out session), which (in SessionRegistry.cs:43) uses session! to silence the nullability warning on Dictionary.TryGetValue. On a false return the out parameter is null, contradicting the non-nullable annotation. The caller at line 595 binds it as out GatewaySession? session, which compiles only because non-nullable-to-nullable variance is permitted — but no callsite tooling will warn that a false return yields a null value through what the fixture's contract describes as non-nullable. The repo enforces Nullable=enable;TreatWarningsAsErrors=true (src/Directory.Build.props), so the convention is for TryX patterns to either annotate the out as T? or to mirror BCL Dictionary.TryGetValue<TKey, TValue> (which uses [MaybeNullWhen(false)] out TValue).
Recommendation: Change the signature to public bool TryGetSession(string sessionId, [MaybeNullWhen(false)] out GatewaySession session) (mirroring BCL TryGetValue) and propagate the same annotation down through ISessionRegistry.TryGet / SessionRegistry.TryGet so the session! fudge can be removed. The call sites already treat the parameter as nullable; aligning the declaration removes the silent contract gap.
Resolution: 2026-05-20 — Propagated [MaybeNullWhen(false)] through the entire TryGet* chain. GatewayServiceFixture.TryGetSession, ISessionManager.TryGetSession / SessionManager.TryGetSession, and ISessionRegistry.TryGet/TryRemove plus their SessionRegistry implementations now carry the BCL Dictionary.TryGetValue-style annotation, and the session! null-forgiving operator inside SessionRegistry.TryGet / TryRemove was removed because the annotation makes it redundant. Existing in-tree callers (SessionManagerTests.cs line 28) were updated to out GatewaySession? to match. The compiler now warns at callsites that read session without checking the boolean return, closing the silent contract gap. Verified by dotnet build src/MxGateway.IntegrationTests/... (0 warnings), dotnet build src/MxGateway.Tests/... (0 warnings), and dotnet test src/MxGateway.Tests/... (479 passed).
IntegrationTests-019
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Security |
| Location | src/MxGateway.IntegrationTests/WorkerLiveMxAccessSmokeTests.cs:497-534 |
| Status | Resolved |
Description: GatewaySession_WithLiveWorker_WriteSecured_AuthenticatedRoundTripParity resolves a credential pair via ResolveLiveMxAccessSecuredCredentials, passes the password into AuthenticateUser and WriteSecured, and asserts the password is absent from writeSecuredReply.DiagnosticMessage. CLAUDE.md's secret-handling rule is broader: "API keys, passwords, WriteSecured payloads, and AuthenticateUser credentials must never reach logs." The test's assertion covers only one of the surfaces the rule protects:
- The
TestOutputLoggerProviderwrites every gateway-sideILoggerentry toITestOutputHelper. A regression that logged the request body (or theWorkerCommandRequestenvelope) would put the password into test output without failing this test. WriteWorkerOutputechoes workerstdout/stderrlines toITestOutputHelper. A worker-side regression that printed the credential (e.g. a debug log added toMxAccessCommandExecutor) would land in test output without failing this test.output.WriteLine(...)calls in the test body (AuthenticateUser status=... user_id=...andLogReply("WriteSecured", ...)) currently don't include the request body, but a future maintenance change that printedcommand.WriteSecured.Valueor a similar struct dump would silently leak the credential past the existing assertion.
Because ITestOutputHelper doesn't expose its accumulated text to the test, the assertion can only be made by buffering output through a recording sink the test owns.
Recommendation: Replace the bare ITestOutputHelper injection (just for the WriteSecured test, or for all live MXAccess tests) with a recording wrapper that mirrors writes both to the xUnit output and to a StringBuilder. At the end of the test, assert the buffer does not contain verifyPassword. This makes the credential-redaction contract a property of the entire test run, not just the one explicit field. Alternative: route the test through GatewayLogRedactor (src/MxGateway.Server/Diagnostics/GatewayLogRedactor.cs) so the credential-bearing commands are redacted at the logger sink the test sees.
Resolution: 2026-05-20 — Added a RecordingTestOutputHelper private class that implements ITestOutputHelper, mirrors every line to the wrapped xUnit sink, and accumulates the same text into a StringBuilder exposed via a Captured property. The WriteSecured parity test now constructs this wrapper, passes it to both TestWorkerProcessFactory (so worker stdout/stderr lines flow through it) and GatewayServiceFixture (so the TestOutputLoggerProvider's gateway-ILogger entries flow through it), and uses it for every direct WriteLine. A new LogReplyTo(ITestOutputHelper sink, …) static helper underpins the existing LogReply instance method so the test body can route reply logging through the recording wrapper. After the cleanup finally block completes, the test asserts recordedOutput.Captured does not contain the verify password. The credential-redaction contract is now enforced across the gateway-logger sink, worker stdout/stderr echo, and every test-body WriteLine — a future regression that dumped the request body, the WorkerCommandRequest envelope, or the WriteSecured payload would land in the buffer and fail the assertion.
IntegrationTests-020
| Field | Value |
|---|---|
| Severity | Low |
| Category | Correctness & logic bugs |
| Location | src/MxGateway.IntegrationTests/WorkerLiveMxAccessSmokeTests.cs:616-622 |
| Status | Resolved |
Description: GatewaySession_WithLiveWorker_AbnormalWorkerExit_MarksSessionFaulted asserts the FinalFault description contains at least one of these substrings (case-insensitive): disconnect, pipe, heartbeat, worker, end of stream. The intent (per the IntegrationTests-014 resolution prose) is to verify the gateway surfaces "a known worker-client classification". The "worker" substring defeats that intent — the gateway routes through SetFaulted with messages like "Worker pipe disconnected.", "Worker shutdown timed out.", "Worker was killed by the gateway: …", "Worker heartbeat expired. …", "Worker event channel rejected an event.", "Worker pipe write failed.", "Worker read loop failed.", "Worker sent unexpected envelope body …" — every classification message begins with the word "Worker". A regression that introduced an entirely new fault path with a generic message containing the word Worker would still pass this test.
CLAUDE.md singles out "abnormal exit" as one of the parity surfaces (SessionState.Faulted with an actionable cause), so the test's documented value is verifying which of the WorkerClient error codes drove the transition. Today the assertion is effectively Assert.NotEmpty(observedFault).
Recommendation: Tighten the keyword set to the specific classifications the abnormal-exit (kill-the-process) path actually drives — PipeDisconnected ("pipe", "disconnect") and EndOfStream ("end of stream"). Drop the broad "worker" term, and drop "heartbeat" unless the test deliberately covers the heartbeat path too (it does not — HeartbeatGraceSeconds = 15 and the poll deadline is StreamShutdownTimeout = 10 seconds, so a heartbeat-expired transition is impossible inside the wait window). If a more exhaustive matrix is wanted, assert FinalFault.StartsWith("Worker pipe disconnected") against the message constant in WorkerClient.cs:380 so a rename surfaces as a compile-time / test-time failure.
Resolution: 2026-05-20 — Tightened the keyword set to the specific classifications the kill-the-process path actually drives. The assertion now requires the FinalFault description to contain "pipe disconnected" (matching the WorkerClient.cs:378-381 WorkerFrameProtocolErrorCode.EndOfStream → WorkerClientErrorCode.PipeDisconnected → "Worker pipe disconnected." message) or "end of stream", dropping the broad "worker" term that previously matched every WorkerClient fault message (all of which begin with "Worker"), and dropping "heartbeat" because the test's StreamShutdownTimeout (10 s) is below HeartbeatGraceSeconds (15 s) so a heartbeat-expired transition cannot occur inside the poll window. A regression that routed an unrelated fault classification through the abnormal-exit path would now fail loudly instead of silently passing.
IntegrationTests-021
| Field | Value |
|---|---|
| Severity | Low |
| Category | Testing coverage |
| Location | src/MxGateway.IntegrationTests/WorkerLiveMxAccessSmokeTests.cs:579-622 |
| Status | Resolved |
Description: The abnormal-exit test only polls session.State and session.FinalFault. It does not assert anything about streamTask after the kill. The chain that puts the session into Faulted is: the read loop hits EOS → SetFaulted(PipeDisconnected, …) → _events.Writer.TryComplete(fault) → ReadEventsAsync propagates the WorkerClientException → EventStreamService.ProduceEventsAsync's catch (Exception exception) when exception is WorkerClientException calls session.MarkFaulted(exception.Message). The test verifies the end state of that chain but not that the StreamEvents call is what produced the transition. If a future change moved the MarkFaulted call somewhere else (a session-manager background watcher, for example), the test would still pass — but the stream task could now silently swallow the fault. A direct assertion that streamTask.IsFaulted (or that awaiting it throws a WorkerClientException) would protect that contract.
The Write parity test (IntegrationTests-012's resolution) added exactly this assertion (Assert.False(streamTask.IsFaulted, …)). The abnormal-exit test should add the inverse: the stream task must be faulted (or at least completed with a WorkerClientException) after the kill.
Recommendation: After the session-state poll succeeds, assert streamTask.IsCompleted (the channel has terminated) and inspect streamTask.Exception?.InnerException for a WorkerClientException (or assert streamTask.IsFaulted and await with ShouldThrowAsync). This couples the test to the actual fault-propagation path and prevents a future refactor that bypasses the stream from quietly weakening the coverage. Compare to the existing Assert.False(streamTask.IsFaulted, …) on line 217 — the abnormal-exit case wants the opposite assertion.
Resolution: 2026-05-20 — After the session-state poll loop confirms SessionState.Faulted, the test now awaits streamTask.WaitAsync(StreamShutdownTimeout) (with a try/catch that logs the surfaced exception type/message), then asserts streamTask.IsCompleted and streamTask.IsFaulted. This couples the test to the actual fault-propagation chain — read loop hits EndOfStream → WorkerClient.SetFaulted(PipeDisconnected, …) → ReadEventsAsync propagates the fault → EventStreamService calls session.MarkFaulted → MxAccessGatewayService.StreamEvents re-throws the mapped RpcException. A future refactor that moved MarkFaulted off the stream-consumption path would leave streamTask completing cleanly, which the new IsFaulted assertion would now catch (inverse of the existing Assert.False(streamTask.IsFaulted, …) in the Write parity test on line 217). The inner-exception type assertion was deliberately omitted because the gateway maps WorkerClientException to RpcException at the public boundary (MxAccessGatewayService.MapWorkerClientException); asserting on the surface type alone would be brittle, while the IsFaulted check directly tests the contract the recommendation is protecting.
IntegrationTests-022
| Field | Value |
|---|---|
| Severity | Low |
| Category | Code organization & conventions |
| Location | src/ZB.MOM.WW.MxGateway.IntegrationTests/IntegrationTestEnvironment.cs:103-138 (ResolveRepositoryRoot / IsRepositoryRoot) |
| Status | Resolved |
Description: The walker introduced in dc9c0c9 searches parents for a directory containing src/ plus either .git or a *.slnx/*.sln file, falling back to Directory.GetCurrentDirectory() when nothing matches. The fallback masks misconfiguration: a test run from a subdirectory of an unpacked tree without a .git or .slnx marker silently uses the wrong root and then the live MXGATEWAY_LIVE_MXACCESS_WORKER_EXE lookup produces a misleading "worker exe not found" error pointing at a fabricated path. The current bin/ layout makes this unlikely in practice (the test host sets a stable working directory), but the failure mode would be confusing if it triggered.
Recommendation: Throw a clear InvalidOperationException from ResolveRepositoryRoot when the walk exhausts without finding a root, naming the directories searched and the markers expected. The opt-in env var (MXGATEWAY_LIVE_MXACCESS_WORKER_EXE) remains the escape hatch for unusual deployments.
Resolution: Resolved 2026-05-24 — Replaced the silent Directory.GetCurrentDirectory() fallback in IntegrationTestEnvironment.ResolveRepositoryRoot with an InvalidOperationException whose message names the start directory, the expected markers (src/, .git, *.sln, *.slnx), and the MXGATEWAY_LIVE_MXACCESS_WORKER_EXE escape hatch. Added regression test IntegrationTestEnvironmentTests.ResolveRepositoryRoot_NoMarkers_ThrowsInvalidOperationExceptionNamingStartAndMarkers that creates an isolated temp directory under Path.GetTempPath() containing no markers, asserts the throw, and asserts the message carries the start directory, .git, .sln, and the worker-exe env-var name so a future refactor that dropped any of these from the diagnostic would fail loudly. TDD red/green confirmed: the new test fails against the pre-fix walker (Assert.Throws() Failure: No exception was thrown) and passes after the throw is added. The existing ResolveRepositoryRoot_AcceptsGitWorktreeFile test still passes, so the happy path is unaffected.
IntegrationTests-023
| Field | Value |
|---|---|
| Severity | Low |
| Category | Testing coverage |
| Location | src/ZB.MOM.WW.MxGateway.IntegrationTests/DashboardLdapLiveTests.cs:14-29 |
| Status | Resolved |
Description: AuthenticateAsync_AdminInGwAdminGroup_Succeeds asserts the principal carries an LdapGroupClaimType claim containing GwAdmin (line 26-28). After the 27ed651 refactor, DashboardAuthenticator.CreatePrincipal also emits a ClaimTypes.Role claim with the mapped role (Admin for the seeded GwAdmin → Admin mapping). The test asserts the LDAP-group claim but not the role claim, so a regression that stopped emitting Role: Admin (e.g. a future refactor of MapGroupsToRoles that returned an empty list) would silently pass.
Recommendation: Add an Assert.Contains asserting the principal holds a ClaimTypes.Role claim with value DashboardRoles.Admin (or "Admin"). Mirror the style of the existing LdapGroupClaimType assertion.
Resolution: Resolved 2026-05-24 — Added an Assert.Contains to AuthenticateAsync_AdminInGwAdminGroup_Succeeds that mirrors the existing LdapGroupClaimType assertion style, asserting the principal carries a ClaimTypes.Role claim with value DashboardRoles.Admin. Couples the live test to the GwAdmin → Admin mapping wired through CreateAuthenticator's GroupToRole dictionary, so a future regression to MapGroupsToRoles (returning an empty list, dropping the RDN-fallback lookup, breaking the case-insensitive comparer) would now surface as a failed live test rather than a silent pass. The test is gated by MXGATEWAY_RUN_LIVE_LDAP_TESTS; build is green and the test continues to skip cleanly when the env var is unset.
IntegrationTests-024
| Field | Value |
|---|---|
| Severity | Low |
| Category | Code organization & conventions |
| Location | src/ZB.MOM.WW.MxGateway.IntegrationTests/WorkerLiveMxAccessSmokeTests.cs (NullDashboardEventBroadcaster private class at end of file) |
| Status | Resolved |
Description: The inline NullDashboardEventBroadcaster private class is the third copy in the repository (the other two live in EventStreamServiceTests and GatewayEndToEndFakeWorkerSmokeTests under the Tests module — see Tests-025). Each carries a singleton Instance field and a no-op Publish. While the integration-tests project doesn't directly share test-support code with the Tests project, a duplicate-everywhere pattern reads as a code smell.
Recommendation: When Tests-025 lands the shared TestSupport/NullDashboardEventBroadcaster.cs, either reference the same shared helper from this project (a project reference if practical) or accept the duplication as a deliberate isolation between the unit-test and integration-test trees. Either choice is fine; the current state is the only one that should not persist.
Resolution: Resolved 2026-05-24 — Chose option (b) (extract within IntegrationTests) so the unit-test and integration-test projects stay independently buildable without a shared test-helpers project. Moved the inline private class out of WorkerLiveMxAccessSmokeTests.cs into a new public type at src/ZB.MOM.WW.MxGateway.IntegrationTests/TestSupport/NullDashboardEventBroadcaster.cs (namespace ZB.MOM.WW.MxGateway.IntegrationTests.TestSupport), mirroring the layout the unit-test Tests project established for Tests-007/Tests-021. The fixture wires through the same NullDashboardEventBroadcaster.Instance singleton, but the constructor is now private so future integration tests can rely on a single shared no-op. A new using ZB.MOM.WW.MxGateway.IntegrationTests.TestSupport; in WorkerLiveMxAccessSmokeTests.cs is the only change to its EventStreamService wiring. Build is green (0 warnings, 0 errors); no regression in the existing tests (4 passed / 15 skipped, identical to the pre-change baseline).
IntegrationTests-025
| Field | Value |
|---|---|
| Severity | Low |
| Category | Correctness & logic bugs |
| Location | src/ZB.MOM.WW.MxGateway.IntegrationTests/IntegrationTestEnvironmentTests.cs:57-84 (ResolveRepositoryRoot_NoMarkers_ThrowsInvalidOperationExceptionNamingStartAndMarkers) |
| Status | Open |
Description: The new regression test for IntegrationTests-022 builds an "isolated" start directory under Path.GetTempPath() (e.g. C:\Users\<user>\AppData\Local\Temp\<random>\nested on Windows) and calls ResolveRepositoryRoot(isolatedStart), asserting an InvalidOperationException is thrown. The walker walks every parent — <random>, Temp, Local, AppData, <user>, Users, C:\ — and stops only when it either finds a repository root marker or runs out of parents. The test silently assumes none of those ancestor directories satisfies IsRepositoryRoot (a src/ subdirectory next to .git / *.sln / *.slnx). The assumption is environment-dependent:
Path.GetTempPath()honors theTMP/TEMPenvironment variable. A developer or CI runner that redirectsTMPto a path under a checkout (e.g.D:\work\repo\tmp) — or simply has another git checkout atC:\(C:\srcwith a.git, common on Windows build agents) — will see the walker stop early at the real repo root and return that path instead of throwing. TheAssert.Throws<InvalidOperationException>then fails with a misleading "No exception was thrown" message that does not name the path the walker actually returned.- The sibling test
ResolveRepositoryRoot_AcceptsGitWorktreeFile(line 25) is unaffected because it walks from a directory it controls upward into an enclosing directory it also controls — the worktree file marker is encountered before any ancestor ofPath.GetTempPath()is reached. The new test has the opposite shape: it walks past every directory the test owns into directories it does not control.
The current dev box layout (C:\Users\dohertj2\Desktop\mxaccessgw) is safe because Temp is at C:\Users\dohertj2\AppData\Local\Temp and the walker exits at C:\ without ever encountering src/. The fragility is invisible on this machine and only surfaces if the test ever runs in CI / on a contributor box with a less hermetic file-system layout.
Recommendation: Isolate the walker from any ambient ancestor by either (a) constructing an isolatedRoot directly under a drive root and pointing the walker at a chain entirely under it (e.g. create <isolatedRoot>\level1\level2\level3 and start the walk at level3, then assert the throw — the walker stops at the drive root regardless of what is on it), (b) refactoring ResolveRepositoryRoot to accept an injectable stopBoundary parameter for tests and pass isolatedRoot as the boundary, or (c) replacing the Assert.Throws shape with an explicit upward-walk check that the test owns. Option (a) is the smallest change: prepend a sentinel — e.g. create a dummy <isolatedRoot>\sentinel-no-markers and assert nothing about Temp ancestors — and pass the test only when the walker reaches that sentinel without finding a marker. The current shape is acceptable on the documented dev box but should not be the sole regression coverage for IntegrationTests-022.