Files
mxaccessgw/code-reviews/IntegrationTests/findings.md
T
Joseph Doherty a68f0cf222 code-review: regen README, all 21 open findings resolved
Closes the 2026-05-24 resolution sweep at HEAD `83eba4b`. All 21 open
findings from the d692232 re-review are now Resolved:

  Server          327e9c5  Server-031, -032, -038, -039, -040, -041, -042, -043
  Contracts       bd1d1f1  Contracts-016, -017
  Tests           d48099f  Tests-025, -026
  IntegrationTests 865c22a IntegrationTests-022, -023, -024
  Client.Java     10bd0c0  Client.Java-027, -028, -029, -030, -031
  Client.Rust     83eba4b  Client.Rust-021

Also fixes stale "Open findings" header counts in Client.Java and
IntegrationTests findings.md that survived the resolve passes
(the agents updated each finding's Status but missed the header
sum). `regen-readme.py --check` is now green.

Module status: 11 / 11 reviewed, 0 / 276 total open.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-24 03:21:39 -04:00

60 KiB

Code Review — IntegrationTests

Field Value
Module src/ZB.MOM.WW.MxGateway.IntegrationTests
Reviewer Claude Code
Review date 2026-05-24
Commit reviewed d692232
Status Reviewed
Open findings 0

Checklist coverage

A comprehensive review completes every category, recording "No issues found" where a category produced nothing rather than leaving it blank.

2026-05-20 re-review (commit a020350)

# Category Result
1 Correctness & logic bugs Issues found: IntegrationTests-017 (teardown-parity test's "no further OnDataChange after UnAdvise" assertion races against in-flight events the provider already published); IntegrationTests-020 (abnormal-exit test's fault-classification keyword list accepts the substring "worker", which matches almost any plausible fault message and dilutes the check to "the description is non-empty").
2 mxaccessgw conventions No issues found. The five new tests honor live opt-in gating, [Collection] serialization, "no synthesized events", and the credential-redaction contract for the assertions they make.
3 Concurrency & thread safety No issues found. GatewaySession.State/FinalFault access in the abnormal-exit poll loop goes through _syncRoot; RecordingServerStreamWriter.Messages returns a locked snapshot copy.
4 Error handling & resilience No issues found. ShutDownAsync's opt-in propagateStreamFaults correctly threads silent stream-task faults into the Write parity test without re-masking the IntegrationTests-004 path.
5 Security Issue found: IntegrationTests-019 (WriteSecured live test asserts the password is absent from DiagnosticMessage only; it does not assert the credential is absent from the accumulated test output, where the worker stderr/stdout and the gateway log are echoed).
6 Performance & resource management No issues found. All six RecordingServerStreamWriter<MxEvent> instantiations use using declarations; using CancellationTokenSource is the consistent pattern.
7 Design-document adherence No issues found. docs/GatewayTesting.md documents all five new parity surfaces and the two new env-var defaults (MXGATEWAY_LIVE_MXACCESS_WRITE_SECURED_USER/_PASSWORD).
8 Code organization & conventions Issue found: IntegrationTests-018 (GatewayServiceFixture.TryGetSession declares out GatewaySession session non-nullable while the caller binds it as out GatewaySession? session; the null-forgiving operator inside SessionRegistry.TryGet propagates a misleading non-null annotation).
9 Testing coverage Issue found: IntegrationTests-021 (abnormal-exit test does not assert the active StreamEvents task observed the worker fault; relies entirely on the session-state poll and would silently pass if MarkFaulted were ever moved off the stream-consumption path).
10 Documentation & comments No issues found. Test XML comments now match what each assertion verifies (the IntegrationTests-011 fix is intact across both the Write and invalid-handle cases).

2026-05-20 review (commit 1cd51bb)

# Category Result
1 Correctness & logic bugs Issue found: IntegrationTests-012 (Write test starts a StreamEvents task and never observes it — silent event-stream coverage gap and an unobserved fault path).
2 mxaccessgw conventions Live opt-ins, [Collection] serialization, and the "don't synthesize events" rule are honored. No issues found.
3 Concurrency & thread safety LiveResourcesCollection serializes all three live classes; RecordingServerStreamWriter locks correctly and the semaphore wait is linked to both timeout and external cancellation. No issues found.
4 Error handling & resilience ShutDownAsync already isolates cleanup exceptions per category. No issues found.
5 Security The only embedded strings are documented dev GLAuth creds and a localhost ZB connection string, all env-overridable. The wrong-password and unreachable-server tests assert no password leakage. No issues found.
6 Performance & resource management Issue found: IntegrationTests-013 (RecordingServerStreamWriter.messageArrived SemaphoreSlim is never disposed; the type owns an IDisposable field but is not itself disposable).
7 Design-document adherence No issues found. docs/GatewayTesting.md now documents the Live LDAP, Live Galaxy, and Write/invalid-handle MXAccess opt-ins added by the prior round of resolutions.
8 Code organization & conventions Issues found: IntegrationTests-015 ([Trait("Category", ...)] repeated on every test method instead of declared once at class level); IntegrationTests-016 (the Galaxy default connection string is duplicated between LiveGalaxyRepositoryFactAttribute and GalaxyRepositoryOptions).
9 Testing coverage Issue found: IntegrationTests-014 (Unadvise, RemoveItem, Unregister, WriteSecured ordering, and worker-fault parity still uncovered — IntegrationTests-005's resolution scoped these out).
10 Documentation & comments Issue found: IntegrationTests-011 (the invalid-handle and write test comments describe a non-Ok MXAccess failure as ProtocolStatusCode.Ok, contradicting both the assertion and HResultConverter).

2026-05-18 review (commit 6c64030)

# Category Result
1 Correctness & logic bugs Issues found: IntegrationTests-003 (asserts only on first event), IntegrationTests-010 (WaitForMessageAsync ignores cancellation).
2 mxaccessgw conventions Live tests correctly gated and skip (not fail) when prerequisites are absent; LiveGalaxyRepositoryFactAttribute undocumented in the opt-in matrix.
3 Concurrency & thread safety Issue found: IntegrationTests-007 (no [Collection]/parallelism guard for shared MXAccess/ZB/GLAuth).
4 Error handling & resilience Issue found: IntegrationTests-004 (cleanup WaitAsync can mask the original failure).
5 Security No production secrets; only documented dev GLAuth creds and a localhost ZB connection string, all env-overridable. No issues found.
6 Performance & resource management Worker process disposed transitively via session disposal; no leaked pipes/COM/processes. No issues found.
7 Design-document adherence Issues found: IntegrationTests-001 (Galaxy live suite absent from the opt-in matrix), IntegrationTests-002 (GwAdmin LDAP prerequisite undocumented).
8 Code organization & conventions Issue found: IntegrationTests-008 (three near-identical fact attributes).
9 Testing coverage Issues found: IntegrationTests-005 (thin MXAccess parity coverage), IntegrationTests-006 (thin LDAP failure-path coverage).
10 Documentation & comments Issue found: IntegrationTests-009 (TestServerCallContext mislabelled "Mock").

2026-05-24 review (commit d692232)

Re-review pass at d692232 scoped to: the rename (dc9c0c9) of the namespaces, csproj, and the IntegrationTestEnvironment.ResolveRepositoryRoot walker; the DashboardLdapLiveTests update to inject a GroupToRole mapping in place of the dropped LdapOptions.RequiredGroup default (27ed651); and the inline NullDashboardEventBroadcaster fake added to WorkerLiveMxAccessSmokeTests for the new EventStreamService ctor parameter (d692232).

# Category Result
1 Correctness & logic bugs No issues found in the a020350..d692232 diff.
2 mxaccessgw conventions No issues found — namespaces, env-var names, and live-test attribute discipline preserved.
3 Concurrency & thread safety No issues found in this diff.
4 Error handling & resilience Confirmed: the RpcException(StatusCode.Cancelled) sibling catch added in WorkerLiveMxAccessSmokeTests.ShutDownAsync covers the only site where a gRPC-mapped OperationCanceledException can surface — no other call sites need widening.
5 Security No issues found.
6 Performance & resource management No issues found.
7 Design-document adherence No issues found — docs/GatewayTesting.md was updated in the same wave to describe the GroupToRole-fixture pattern.
8 Code organization & conventions Issues found: IntegrationTests-022 (ResolveRepositoryRoot silently falls back to Directory.GetCurrentDirectory() on exhausted walk, which can hide misconfiguration), IntegrationTests-024 (inline NullDashboardEventBroadcaster fake is the only such use today but worth watching for duplication once a second consumer arrives).
9 Testing coverage Issues found: IntegrationTests-023 (DashboardLdapLiveTests.AuthenticateAsync_AdminInGwAdminGroup_Succeeds asserts the ldap_group claim but does not assert the emitted Role: Admin claim, leaving the role-mapping path untested).
10 Documentation & comments No issues found.

Findings

IntegrationTests-001

Field Value
Severity High
Category Design-document adherence
Location src/MxGateway.IntegrationTests/Galaxy/LiveGalaxyRepositoryFactAttribute.cs:7, src/MxGateway.IntegrationTests/Galaxy/GalaxyRepositoryLiveTests.cs
Status Resolved

Description: The Galaxy Repository live test suite and its gating env var MXGATEWAY_RUN_LIVE_GALAXY_TESTS (plus connection-string override MXGATEWAY_LIVE_GALAXY_CONN) are completely absent from docs/GatewayTesting.md. CLAUDE.md mandates updating docs in the same change as the source. The opt-in matrix documents only the MXAccess and LDAP env vars, so an operator running the documented matrix has no way to know these tests exist or how to enable them.

Recommendation: Add a "Live Galaxy Repository" section to docs/GatewayTesting.md documenting MXGATEWAY_RUN_LIVE_GALAXY_TESTS=1, MXGATEWAY_LIVE_GALAXY_CONN, the ZB database prerequisite, and the covered RPCs, mirroring the existing "Live MXAccess Smoke" section.

Resolution: Resolved 2026-05-18: Added a "Live Galaxy Repository" section to docs/GatewayTesting.md documenting MXGATEWAY_RUN_LIVE_GALAXY_TESTS, MXGATEWAY_LIVE_GALAXY_CONN, the deployed-ZB prerequisite, and the covered GalaxyRepository RPCs.

IntegrationTests-002

Field Value
Severity High
Category Design-document adherence
Location src/MxGateway.IntegrationTests/DashboardLdapLiveTests.cs:13, src/MxGateway.Server/Configuration/LdapOptions.cs:27
Status Resolved

Description: DashboardLdapLiveTests builds the authenticator with new GatewayOptions(), so it relies on LdapOptions.RequiredGroup defaulting to GwAdmin and asserts the admin user is a member of a GwAdmin LDAP group. glauth.md does not list GwAdmin as a provisioned group — it lists admin only in the five role groups and describes GwAdmin as a group to add "when reuse isn't enough." If GLAuth has only the documented baseline groups, AuthenticateAsync_AdminInGwAdminGroup_Succeeds fails (not skips) on any box where the env var is set. This is an undocumented hard prerequisite beyond "LDAP is up."

Recommendation: Either document the required GwAdmin GLAuth provisioning step in glauth.md and GatewayTesting.md, or have the test set RequiredGroup to a baseline group glauth.md guarantees admin belongs to (e.g. WriteOperate).

Resolution: Resolved 2026-05-18: Took the documentation fix — promoted the glauth.md "Adding a gw-specific group" section into a concrete "Provisioning the GwAdmin group" step that grants GwAdmin to admin, cross-referenced it from the groups/verification sections, and added a "Live LDAP" section to docs/GatewayTesting.md calling out GwAdmin as a hard prerequisite. Alternative considered: weaken the test to a baseline group (WriteOperate) — rejected because GwAdmin is the real default LdapOptions.RequiredGroup and the test should exercise it.

IntegrationTests-003

Field Value
Severity Medium
Category Correctness & logic bugs
Location src/MxGateway.IntegrationTests/WorkerLiveMxAccessSmokeTests.cs:89-97
Status Resolved

Description: The test asserts only on the first MxEvent recorded by RecordingServerStreamWriter. A live MXAccess provider can deliver an initial state/quality event whose family or handles differ from the expected OnDataChange (e.g. a registration-state or bad-quality bootstrap event). Because WaitForFirstMessageAsync returns whatever arrives first, a genuine ordering/family defect could fail spuriously or leave later wrong events unverified.

Recommendation: Filter for the first event with Family == OnDataChange (with a bounded retry/poll) or assert the full recorded sequence, so the test verifies the event the worker is supposed to emit.

Resolution: Resolved 2026-05-18: Confirmed against source — WaitForFirstMessageAsync completed a TaskCompletionSource on the very first WriteAsync. Replaced it with RecordingServerStreamWriter.WaitForMessageAsync(predicate, timeout), which scans recorded messages, skips earlier non-matching events, and blocks on a SemaphoreSlim until a matching one arrives or the timeout elapses (throwing a TimeoutException that reports the scanned count). GatewaySession_WithLiveWorker_RegistersAdvisesStreamsDataAndCloses now waits for the first Family == OnDataChange event. Live execution was not possible in this environment (no MXAccess COM); verified by build.

IntegrationTests-004

Field Value
Severity Medium
Category Error handling & resilience
Location src/MxGateway.IntegrationTests/WorkerLiveMxAccessSmokeTests.cs:108-111
Status Resolved

Description: In the finally block, after CloseSessionAsync, the test does await streamTask.WaitAsync(StreamShutdownTimeout). If closing the session does not promptly complete the stream (or StreamEvents itself faults), this throws TimeoutException from inside finally, which replaces/masks any original assertion failure from the try block. The diagnostic value of the real failure is lost.

Recommendation: Wrap the streamTask.WaitAsync (and ideally WaitForProcessesAsync) in a try/catch that logs the cleanup exception via output.WriteLine instead of letting it propagate.

Resolution: Resolved 2026-05-18: Confirmed — the finally block awaited streamTask.WaitAsync and WaitForProcessesAsync with no exception handling. Extracted a shared ShutDownAsync helper that wraps the session-close + stream-drain in one try/catch and the worker-process wait in a second try/catch, logging each cleanup exception via output.WriteLine instead of throwing. All three live tests now route shutdown through it, so a cleanup timeout can no longer mask an assertion failure. Live execution was not possible in this environment; verified by build.

IntegrationTests-005

Field Value
Severity Medium
Category Testing coverage
Location src/MxGateway.IntegrationTests/WorkerLiveMxAccessSmokeTests.cs
Status Resolved

Description: The only live MXAccess test covers the Register→AddItem→Advise→one-OnDataChange→Close happy path. CLAUDE.md stresses that MXAccess parity is the contract and calls out non-obvious behaviors (WriteSecured ordering, OperationComplete semantics, invalid-handle exceptions). None of Write, WriteSecured, Unadvise, RemoveItem, Unregister, OperationComplete, an invalid-handle command, or a worker-fault path is exercised against live COM — exactly the paths fake-worker tests cannot validate.

Recommendation: Add live coverage for at least a Write round-trip and an invalid-handle command, plus a worker-fault/abnormal-exit scenario, even if behind additional opt-in env vars.

Resolution: Resolved 2026-05-18: Added two [LiveMxAccessFact]-gated tests to WorkerLiveMxAccessSmokeTests. GatewaySession_WithLiveWorker_WritesValueToAdvisedItem registers/adds/advises then issues a Write of an integer value, asserting the command round-trips with ProtocolStatusCode.Ok and MxCommandKind.Write. GatewaySession_WithLiveWorker_InvalidHandleCommand_SurfacesFailureWithoutTransportFault issues AddItem against int.MaxValue as the server handle (never issued by MXAccess) and asserts the failure surfaces in the command reply without a usable item handle. Both reuse the existing opt-in env var and the ShutDownAsync cleanup helper. A worker-fault/abnormal-exit case was deliberately scoped out — it needs a controlled COM crash injection beyond what the existing harness supports; the two added cases cover the Write round-trip and invalid-handle paths the recommendation calls out. Live execution was not possible in this environment; verified by build.

IntegrationTests-006

Field Value
Severity Medium
Category Testing coverage
Location src/MxGateway.IntegrationTests/DashboardLdapLiveTests.cs
Status Resolved

Description: LDAP live coverage is two cases: admin succeeds, readonly is denied for missing group. There is no coverage of a wrong password for a valid user, an unknown username, or the LDAP-server-unreachable path — all of which DashboardAuthenticator has distinct branches for (the LdapException catch, the candidate is null branch). The negative test only proves group-membership denial, not credential rejection.

Recommendation: Add a live test for admin with a wrong password asserting Succeeded == false and that the password is not leaked into FailureMessage, and a test for an unknown username.

Resolution: Resolved 2026-05-18: Added three [LiveLdapFact]-gated tests to DashboardLdapLiveTests. AuthenticateAsync_AdminWithWrongPassword_FailsWithoutLeakingPassword exercises the LdapException catch via a rejected candidate bind and asserts the wrong password never reaches FailureMessage. AuthenticateAsync_UnknownUsername_Fails exercises the candidate is null branch. AuthenticateAsync_ServerUnreachable_FailsWithoutThrowing builds the authenticator with LdapOptions.Port = 1 (a reserved port no LDAP server listens on) and asserts the connect failure is absorbed into a failed result rather than thrown — covering the generic catch (Exception) branch. All three are gated by the existing MXGATEWAY_RUN_LIVE_LDAP_TESTS opt-in so they stay opt-in. Live execution was not possible in this environment (no live LDAP); verified by build.

IntegrationTests-007

Field Value
Severity Low
Category Concurrency & thread safety
Location src/MxGateway.IntegrationTests/WorkerLiveMxAccessSmokeTests.cs:20, src/MxGateway.IntegrationTests/Galaxy/GalaxyRepositoryLiveTests.cs:5, src/MxGateway.IntegrationTests/DashboardLdapLiveTests.cs:9
Status Resolved

Description: The live test classes contend for genuinely shared singletons — one MXAccess COM provider, one ZB SQL database, one GLAuth instance with a 3-fail/10-minute per-IP lockout. No [Collection] annotation or DisableTestParallelization is declared, so xUnit's default cross-class parallelism could run the Galaxy tests concurrently or interleave an LDAP failure burst that trips the GLAuth lockout.

Recommendation: Place the live test classes in a shared [Collection], or set [assembly: CollectionBehavior(DisableTestParallelization = true)] for this opt-in project, so live external resources are accessed serially.

Resolution: Resolved 2026-05-18: Confirmed — no [Collection] or assembly-level CollectionBehavior existed. Added LiveResourcesCollection.cs with a [CollectionDefinition(Name, DisableParallelization = true)] and applied [Collection(LiveResourcesCollection.Name)] to WorkerLiveMxAccessSmokeTests, GalaxyRepositoryLiveTests, and DashboardLdapLiveTests. A named collection (rather than an assembly-wide DisableTestParallelization) was chosen so the live classes serialize against each other and within themselves while non-live tests (IntegrationTestEnvironmentTests) keep parallelizing. Verified by build; live tests not executed (no MXAccess COM / live LDAP in this environment).

IntegrationTests-008

Field Value
Severity Low
Category Code organization & conventions
Location src/MxGateway.IntegrationTests/LiveLdapFactAttribute.cs, src/MxGateway.IntegrationTests/Galaxy/LiveGalaxyRepositoryFactAttribute.cs, src/MxGateway.IntegrationTests/LiveMxAccessFactAttribute.cs
Status Resolved

Description: Three near-identical fact attributes each re-implement the same "compare env var to 1 with StringComparison.Ordinal, set Skip otherwise" pattern. LiveMxAccessFactAttribute delegates to IntegrationTestEnvironment while the other two inline the logic, so the project has two divergent styles for the same concern.

Recommendation: Extract a shared helper (e.g. IntegrationTestEnvironment.IsEnabled(string variableName)) and have all three attributes call it.

Resolution: Resolved 2026-05-18: Confirmed — LiveLdapFactAttribute.Enabled and LiveGalaxyRepositoryFactAttribute.Enabled each inlined the ordinal == "1" comparison while LiveMxAccessFactAttribute delegated to IntegrationTestEnvironment. Added IntegrationTestEnvironment.IsEnabled(string variableName) as the single implementation; LiveMxAccessTestsEnabled, LiveLdapFactAttribute.Enabled, and LiveGalaxyRepositoryFactAttribute.Enabled now all call it. Verified by build.

IntegrationTests-009

Field Value
Severity Low
Category Documentation & comments
Location src/MxGateway.IntegrationTests/WorkerLiveMxAccessSmokeTests.cs:372-375
Status Resolved

Description: TestServerCallContext is XML-documented as a "Mock server call context," but it is a hand-written stub/fake with no mocking framework and no verification behavior. Per the style guides (accurate naming; explain why not what), calling it a mock misleads readers who may expect verifiable interactions.

Recommendation: Reword the summary to "test stub" / "minimal ServerCallContext implementation for in-process gRPC calls."

Resolution: Resolved 2026-05-18: Confirmed — the summary read "Mock server call context for testing gRPC calls." Reworded to "Minimal ServerCallContext stub for invoking the gRPC service in-process," noting it is a hand-written fake with no verification behavior. No mocking framework is involved; this is a documentation-only fix. Verified by build.

IntegrationTests-010

Field Value
Severity Low
Category Correctness & logic bugs
Location src/MxGateway.IntegrationTests/WorkerLiveMxAccessSmokeTests.cs:366-369
Status Resolved

Description: WaitForFirstMessageAsync accepts only a timeout and never observes a CancellationToken. There is no per-test cancellation propagation, so if the gateway/worker hangs without writing an event the test relies solely on the 15s WaitAsync timeout and gives no contextual diagnostics. Combined with IntegrationTests-004, a hung live worker produces a bare TimeoutException.

Recommendation: Accept a CancellationToken (linked to TestServerCallContext's token), pass it to firstMessage.Task.WaitAsync(timeout, token), and on timeout emit the recorded Messages count via output.WriteLine before throwing.

Re-triage: The named method WaitForFirstMessageAsync no longer exists — IntegrationTests-003's resolution renamed/replaced it with RecordingServerStreamWriter.WaitForMessageAsync(predicate, timeout), which scans recorded messages and blocks on a SemaphoreSlim. The underlying defect still held: that replacement method also took only a timeout and never observed a CancellationToken. The finding remains valid (Low, Correctness) against the renamed method; the recommendation's firstMessage.Task.WaitAsync detail is stale but the intent (thread a token, surface a count on timeout) is unchanged.

Resolution: Resolved 2026-05-18: Added an optional CancellationToken parameter to WaitForMessageAsync, linked with the existing timeout source via CancellationTokenSource.CreateLinkedTokenSource, so a per-test cancellation aborts the wait promptly. GatewaySession_WithLiveWorker_RegistersAdvisesStreamsDataAndCloses now creates a CancellationTokenSource, passes its token into the StreamEvents TestServerCallContext and into WaitForMessageAsync, so the stream call and the wait share one cancellation source. On timeout the method already throws a TimeoutException whose message includes the scanned message count, satisfying the "emit recorded count" intent (the count surfaces in the test failure rather than via a separate output.WriteLine). Verified by build; live tests not executed.

IntegrationTests-011

Field Value
Severity Low
Category Documentation & comments
Location src/MxGateway.IntegrationTests/WorkerLiveMxAccessSmokeTests.cs:236-240, src/MxGateway.IntegrationTests/WorkerLiveMxAccessSmokeTests.cs:183-187
Status Resolved

Description: The XML/inline comments on the two new MXAccess parity tests misdescribe how the gateway surfaces an MXAccess failure. The invalid-handle test reads "the gateway protocol status is Ok and the failure shows up in hresult / the status proxies — it must not be reported as a transport fault", then asserts Assert.NotEqual(ProtocolStatusCode.Ok, addItemReply.ProtocolStatus.Code). HResultConverter.CreateProtocolStatus (src/MxGateway.Worker/Conversion/HResultConverter.cs:39) actually sets Code = ProtocolStatusCode.MxaccessFailure whenever the COM call throws (HRESULT ≠ 0), so the assertion is correct but the comment is wrong — the protocol status is not Ok on an MXAccess failure. The write-round-trip test carries the same misleading framing on lines 183-187 ("MXAccess parity details … belong in hresult / statuses, not in a transport failure") immediately before asserting Ok. A reader can reasonably conclude the gateway always reports Ok for round-tripped commands and tweak code accordingly. The intended distinction is "this is not a gRPC transport fault" (the RPC reply still arrives) — the protocol status code carries the MXAccess outcome.

Recommendation: Reword the invalid-handle comment to "the gateway must reply with ProtocolStatusCode.MxaccessFailure and a non-zero Hresult carrying the COM failure, not a gRPC transport fault." Reword the write-round-trip comment to clarify it is asserting the happy-path Ok and that an MXAccess rejection would surface as MxaccessFailure (per HResultConverter), not as a RpcException.

Resolution: 2026-05-20 — Reworded the invalid-handle test comment to say the gateway must reply with ProtocolStatusCode.MxaccessFailure and a non-zero hresult carrying the COM failure (per HResultConverter), and reworded the write-round-trip comment to make explicit it is asserting the happy-path Ok while an MXAccess rejection would surface as MxaccessFailure, never as an RpcException.

IntegrationTests-012

Field Value
Severity Medium
Category Correctness & logic bugs
Location src/MxGateway.IntegrationTests/WorkerLiveMxAccessSmokeTests.cs:147-151
Status Resolved

Description: GatewaySession_WithLiveWorker_WritesValueToAdvisedItem constructs a RecordingServerStreamWriter<MxEvent> and starts a StreamEvents task, then never reads from it and never asserts anything about the recorded messages. The test verifies only that the Write command round-trips at the protocol level — it does not verify that the worker actually emits any event after the write (for example an OnWriteComplete, which is the proof of round-trip used by the cross-language client e2e runner). Because the stream task is started with new TestServerCallContext() (no cancellation source), any fault raised by the stream task (an exception from EventStreamService, a session-not-found, a backpressure overflow) is swallowed — streamTask is later awaited in ShutDownAsync only inside a broad catch (Exception ex), which logs and continues. The Write test therefore cannot fail on stream-task faults. Two consequences: (a) the live Write parity coverage promised in IntegrationTests-005 is weaker than it appears, and (b) the fixture (eventWriter) is dead code in this test that suggests an assertion was intended.

Recommendation: Either remove the unused eventWriter/StreamEvents plumbing from the Write test so the test scope matches its assertions, or — preferred — extend the test to wait for an OnWriteComplete event for the written item via eventWriter.WaitForMessageAsync(candidate => candidate.Family == MxEventFamily.OnWriteComplete && candidate.ItemHandle == itemHandle, ...), matching the round-trip proof used by scripts/run-client-e2e-tests.ps1 -VerifyWrite.

Resolution: Resolved 2026-05-20: Rewrote GatewaySession_WithLiveWorker_WritesValueToAdvisedItem so the previously-dead eventWriter/StreamEvents plumbing actually drives an assertion. The test now waits for an OnWriteComplete event matching the Write's (server, item) handle pair via eventWriter.WaitForMessageAsync (using IntegrationTestEnvironment.LiveMxAccessEventTimeout), and asserts the recorded event's family, session id, and handles — the same round-trip proof the cross-language client e2e runner uses. The stream call is now bound to a CancellationTokenSource and the test asserts streamTask.IsFaulted == false before cleanup. ShutDownAsync gained an opt-in propagateStreamFaults flag so a faulted StreamEvents task is rethrown into the test rather than silently swallowed by the broad cleanup catch; the cancellation token is also signalled before the drain so StreamEvents observes a clean shutdown instead of a forced timeout. Verified by build and by confirming the test skips cleanly when MXGATEWAY_RUN_LIVE_MXACCESS_TESTS is unset.

IntegrationTests-013

Field Value
Severity Low
Category Performance & resource management
Location src/MxGateway.IntegrationTests/WorkerLiveMxAccessSmokeTests.cs:519-609
Status Resolved

Description: RecordingServerStreamWriter<T> owns a SemaphoreSlim messageArrived (IDisposable) but does not itself implement IDisposable, so the semaphore's wait handle is never released back to the OS. Each live test allocates one such writer and discards it at scope exit. Live tests run on opt-in only, so the cumulative leak is bounded, but the type holds an IDisposable field — the standard hygiene under Directory.Build.props's TreatWarningsAsErrors=true is to either dispose the field or document why not. CA2213 does not fire because the owner is not itself IDisposable; an analyzer-driven warning is the only reason this is not a build break, not an indication that the leak is acceptable.

Recommendation: Make RecordingServerStreamWriter<T> implement IDisposable, dispose messageArrived in Dispose, and wrap each instantiation in a using block (using RecordingServerStreamWriter<MxEvent> eventWriter = new();).

Resolution: 2026-05-20 — RecordingServerStreamWriter<T> now implements IDisposable and its Dispose releases the messageArrived semaphore. All six live tests in WorkerLiveMxAccessSmokeTests now allocate the writer with a top-of-method using declaration so the semaphore's wait handle is released on scope exit even when the test body throws.

IntegrationTests-014

Field Value
Severity Medium
Category Testing coverage
Location src/MxGateway.IntegrationTests/WorkerLiveMxAccessSmokeTests.cs
Status Resolved

Description: IntegrationTests-005 was resolved by adding live coverage for Write and an invalid-handle AddItem, but its resolution explicitly scoped out the worker-fault/abnormal-exit case and silently dropped Unadvise, RemoveItem, Unregister, OperationComplete, and WriteSecured ordering. CLAUDE.md singles out WriteSecured ("WriteSecured failing before a value-bearing NMX body") and OperationComplete semantics as parity surprises the gateway must not "fix" — exactly the paths fake-worker tests cannot validate. After this commit the live MXAccess smoke still doesn't exercise any teardown command, the secured-write ordering rule, or a deliberately faulted worker. A regression in any of these would only be caught by manual testing.

Recommendation: Add live MXAccess coverage for the teardown chain (Unadvise then RemoveItem then Unregister, asserting each replies with ProtocolStatusCode.Ok and the next operation no longer references the freed handle), and at minimum one WriteSecured parity case asserting the documented ordering. A worker-fault test can be deferred to a separate finding once a deterministic COM-crash injection harness exists.

Resolution: Resolved 2026-05-20: Added three new [LiveMxAccessFact]-gated tests to WorkerLiveMxAccessSmokeTests, all reusing the existing opt-in env var and ShutDownAsync cleanup helper. (1) GatewaySession_WithLiveWorker_UnadviseRemoveItemUnregister_TeardownOrderingParity runs Register → AddItem → Advise → wait for one OnDataChange → UnAdvise → RemoveItem → Unregister, asserting each step replies Ok with the matching MxCommandKind, that no further OnDataChange events for the un-advised (server, item) pair arrive after a 500 ms settle window, and that a second RemoveItem against the freed handle returns a non-Ok MXAccess failure (so a regression that left a stale subscription or accepted a stale handle would surface). (2) GatewaySession_WithLiveWorker_WriteSecured_AuthenticatedRoundTripParity resolves an ArchestrA user id via AuthenticateUser (credentials env-overridable through MXGATEWAY_LIVE_MXACCESS_WRITE_SECURED_USER / ..._PASSWORD, defaulting to the admin/admin123 GLAuth user from glauth.md), issues WriteSecured against an advised item, and asserts the reply carries MxCommandKind.WriteSecured, the protocol status is one of the documented parity outcomes (Ok for an unprotected provider, MxaccessFailure when the item is not WriteSecured-eligible — never a transport fault), and the credential never leaks into the diagnostic message. (3) GatewaySession_WithLiveWorker_AbnormalWorkerExit_MarksSessionFaulted opens a session, kills the worker process tree (via a new TestWorkerProcessFactory.KillAllAndDetach helper) without going through CloseSession, and polls the session via a new GatewayServiceFixture.TryGetSession accessor until it transitions to SessionState.Faulted within the live event timeout; asserts the final state is Faulted, that FinalFault is non-empty, and that the fault description carries a known worker-client classification (pipe disconnected / worker faulted / heartbeat expired / end-of-stream). docs/GatewayTesting.md was updated to list all five parity surfaces and the two new env-var defaults. Verified by build and confirmed all six live tests skip cleanly when MXGATEWAY_RUN_LIVE_MXACCESS_TESTS is unset.

IntegrationTests-015

Field Value
Severity Low
Category Code organization & conventions
Location src/MxGateway.IntegrationTests/WorkerLiveMxAccessSmokeTests.cs:30,119,201, src/MxGateway.IntegrationTests/DashboardLdapLiveTests.cs:13,32,48,67,84, src/MxGateway.IntegrationTests/Galaxy/GalaxyRepositoryLiveTests.cs:10,22,34,52
Status Resolved

Description: Every live-test method in the three live classes carries an identical [Trait("Category", "LiveMxAccess")] (or LiveLdap / LiveGalaxy) attribute. The trait is uniform within each class and is exactly the information the [Collection(LiveResourcesCollection.Name)] class-level attribute also implies. xUnit's [Trait] is inheritable from the class to its methods, so the same metadata can be declared once at class scope. The current shape adds maintenance burden — adding a new test in any of these classes requires remembering to add the trait, and the existing pattern's LiveLdap includes five copies of the same line.

Recommendation: Move each [Trait("Category", ...)] to the class declaration alongside the existing [Collection(...)], and remove the per-method copies. Verify the trait still surfaces in --filter Trait=Category=LiveLdap after the change.

Resolution: 2026-05-20 — Lifted [Trait("Category", "LiveMxAccess")], [Trait("Category", "LiveLdap")], and [Trait("Category", "LiveGalaxy")] to the class declarations of WorkerLiveMxAccessSmokeTests, DashboardLdapLiveTests, and GalaxyRepositoryLiveTests respectively (alongside the existing [Collection(LiveResourcesCollection.Name)]), and removed all per-method duplicates. xUnit propagates class-level traits to every method, so --filter Category=Live* filters still match.

IntegrationTests-016

Field Value
Severity Low
Category Code organization & conventions
Location src/MxGateway.IntegrationTests/Galaxy/LiveGalaxyRepositoryFactAttribute.cs:26, src/MxGateway.Server/Galaxy/GalaxyRepositoryOptions.cs:13
Status Resolved

Description: The default Galaxy Repository connection string "Server=localhost;Database=ZB;Integrated Security=True;TrustServerCertificate=True;Encrypt=False;" is duplicated verbatim between the production GalaxyRepositoryOptions.ConnectionString initializer and the test-side LiveGalaxyRepositoryFactAttribute.ConnectionString fallback. The docs (docs/GatewayTesting.md) document the value once and reference it from both places. If the production default changes (e.g. tightening to a named instance, or switching to a SQL-auth template), the test default silently keeps the old string and the live Galaxy tests connect to the wrong server. The drift is invisible to the build.

Recommendation: Expose the production default through a public const string on GalaxyRepositoryOptions (e.g. DefaultConnectionString) and have LiveGalaxyRepositoryFactAttribute.ConnectionString read Environment.GetEnvironmentVariable(ConnectionStringVariableName) ?? GalaxyRepositoryOptions.DefaultConnectionString. Single source of truth, build-time guarantee they cannot drift.

Resolution: 2026-05-20 — Added public const string GalaxyRepositoryOptions.DefaultConnectionString carrying the production default, set the ConnectionString initializer to reference it, and changed LiveGalaxyRepositoryFactAttribute.ConnectionString to fall back to GalaxyRepositoryOptions.DefaultConnectionString. The literal now lives in exactly one place and any future change to the production default propagates to the live-test fallback at compile time.

IntegrationTests-017

Field Value
Severity Medium
Category Correctness & logic bugs
Location src/MxGateway.IntegrationTests/WorkerLiveMxAccessSmokeTests.cs:350-407
Status Resolved

Description: GatewaySession_WithLiveWorker_UnadviseRemoveItemUnregister_TeardownOrderingParity proves the subscription is live by waiting for one matching OnDataChange, snapshots dataChangeCountBeforeUnadvise, then sends UnAdvise, waits 500 ms, snapshots dataChangeCountAfterTeardown, and asserts strict equality. The assertion races against the natural cadence of the live MXAccess provider:

  1. After WaitForMessageAsync returns the first match, any additional OnDataChange for the same (serverHandle, itemHandle) published by the provider before the worker processes UnAdvise is delivered into the recording writer.
  2. The snapshot at line 362 is taken immediately before the UnAdvise command is sent (line 370). Events that arrive in the window between that snapshot and the worker processing UnAdvise (network round-trip + STA dispatch + worker pipe send + gateway channel write) are racing in — they are not "after UnAdvise" but they will be in the post-teardown snapshot.
  3. MXAccess providers can publish OnDataChange at sub-second cadence; the strict-equality assertion has no slack for in-flight events.

The test passes today only because the chosen test item (TestChildObject.TestInt) likely changes value rarely. Against a more active item — or on a slower machine where the round-trip widens — the assertion would flap. The intent ("no further events after the worker stops the subscription") would be better expressed by capturing the snapshot after UnAdvise returns Ok rather than before it is issued.

Recommendation: Move the "before" snapshot to immediately after UnAdvise returns Ok (the point past which the parity rule applies), or weaken the assertion to "no events with a WorkerSequence strictly greater than the last sequence observed within dataChangeCountBeforeUnadvise + N events arrived in the post-teardown drain" where N accounts for the documented in-flight window. Either change moves the test from racing on provider cadence to verifying the actual parity rule.

Resolution: 2026-05-20 — Removed the dataChangeCountBeforeUnadvise snapshot taken just before the UnAdvise command (the source of the race) and replaced the strict-equality assertion against a pre-teardown count with a two-window stability check taken after the teardown chain completes. The test now waits one 500 ms settle window for in-flight OnDataChange events (which the provider already published before the worker acknowledged UnAdvise) to drain, captures dataChangeCountAfterFirstSettle, waits another 500 ms, and asserts the count is unchanged in dataChangeCountAfterSecondSettle. The parity rule under test ("no further OnDataChange after the worker stops the subscription") is now expressed as steadiness across the post-teardown window rather than equality with a count snapshotted during the round-trip race, so a slower machine or a more active test item no longer flaps the assertion while a genuine regression (a stale subscription continuing to fire) still surfaces as a count drift between the two settles.

IntegrationTests-018

Field Value
Severity Low
Category Code organization & conventions
Location src/MxGateway.IntegrationTests/WorkerLiveMxAccessSmokeTests.cs:1037, src/MxGateway.IntegrationTests/WorkerLiveMxAccessSmokeTests.cs:595
Status Resolved

Description: GatewayServiceFixture.TryGetSession(string sessionId, out GatewaySession session) declares session as a non-nullable GatewaySession, but its implementation is _registry.TryGet(sessionId, out session), which (in SessionRegistry.cs:43) uses session! to silence the nullability warning on Dictionary.TryGetValue. On a false return the out parameter is null, contradicting the non-nullable annotation. The caller at line 595 binds it as out GatewaySession? session, which compiles only because non-nullable-to-nullable variance is permitted — but no callsite tooling will warn that a false return yields a null value through what the fixture's contract describes as non-nullable. The repo enforces Nullable=enable;TreatWarningsAsErrors=true (src/Directory.Build.props), so the convention is for TryX patterns to either annotate the out as T? or to mirror BCL Dictionary.TryGetValue<TKey, TValue> (which uses [MaybeNullWhen(false)] out TValue).

Recommendation: Change the signature to public bool TryGetSession(string sessionId, [MaybeNullWhen(false)] out GatewaySession session) (mirroring BCL TryGetValue) and propagate the same annotation down through ISessionRegistry.TryGet / SessionRegistry.TryGet so the session! fudge can be removed. The call sites already treat the parameter as nullable; aligning the declaration removes the silent contract gap.

Resolution: 2026-05-20 — Propagated [MaybeNullWhen(false)] through the entire TryGet* chain. GatewayServiceFixture.TryGetSession, ISessionManager.TryGetSession / SessionManager.TryGetSession, and ISessionRegistry.TryGet/TryRemove plus their SessionRegistry implementations now carry the BCL Dictionary.TryGetValue-style annotation, and the session! null-forgiving operator inside SessionRegistry.TryGet / TryRemove was removed because the annotation makes it redundant. Existing in-tree callers (SessionManagerTests.cs line 28) were updated to out GatewaySession? to match. The compiler now warns at callsites that read session without checking the boolean return, closing the silent contract gap. Verified by dotnet build src/MxGateway.IntegrationTests/... (0 warnings), dotnet build src/MxGateway.Tests/... (0 warnings), and dotnet test src/MxGateway.Tests/... (479 passed).

IntegrationTests-019

Field Value
Severity Medium
Category Security
Location src/MxGateway.IntegrationTests/WorkerLiveMxAccessSmokeTests.cs:497-534
Status Resolved

Description: GatewaySession_WithLiveWorker_WriteSecured_AuthenticatedRoundTripParity resolves a credential pair via ResolveLiveMxAccessSecuredCredentials, passes the password into AuthenticateUser and WriteSecured, and asserts the password is absent from writeSecuredReply.DiagnosticMessage. CLAUDE.md's secret-handling rule is broader: "API keys, passwords, WriteSecured payloads, and AuthenticateUser credentials must never reach logs." The test's assertion covers only one of the surfaces the rule protects:

  • The TestOutputLoggerProvider writes every gateway-side ILogger entry to ITestOutputHelper. A regression that logged the request body (or the WorkerCommandRequest envelope) would put the password into test output without failing this test.
  • WriteWorkerOutput echoes worker stdout/stderr lines to ITestOutputHelper. A worker-side regression that printed the credential (e.g. a debug log added to MxAccessCommandExecutor) would land in test output without failing this test.
  • output.WriteLine(...) calls in the test body (AuthenticateUser status=... user_id=... and LogReply("WriteSecured", ...)) currently don't include the request body, but a future maintenance change that printed command.WriteSecured.Value or a similar struct dump would silently leak the credential past the existing assertion.

Because ITestOutputHelper doesn't expose its accumulated text to the test, the assertion can only be made by buffering output through a recording sink the test owns.

Recommendation: Replace the bare ITestOutputHelper injection (just for the WriteSecured test, or for all live MXAccess tests) with a recording wrapper that mirrors writes both to the xUnit output and to a StringBuilder. At the end of the test, assert the buffer does not contain verifyPassword. This makes the credential-redaction contract a property of the entire test run, not just the one explicit field. Alternative: route the test through GatewayLogRedactor (src/MxGateway.Server/Diagnostics/GatewayLogRedactor.cs) so the credential-bearing commands are redacted at the logger sink the test sees.

Resolution: 2026-05-20 — Added a RecordingTestOutputHelper private class that implements ITestOutputHelper, mirrors every line to the wrapped xUnit sink, and accumulates the same text into a StringBuilder exposed via a Captured property. The WriteSecured parity test now constructs this wrapper, passes it to both TestWorkerProcessFactory (so worker stdout/stderr lines flow through it) and GatewayServiceFixture (so the TestOutputLoggerProvider's gateway-ILogger entries flow through it), and uses it for every direct WriteLine. A new LogReplyTo(ITestOutputHelper sink, …) static helper underpins the existing LogReply instance method so the test body can route reply logging through the recording wrapper. After the cleanup finally block completes, the test asserts recordedOutput.Captured does not contain the verify password. The credential-redaction contract is now enforced across the gateway-logger sink, worker stdout/stderr echo, and every test-body WriteLine — a future regression that dumped the request body, the WorkerCommandRequest envelope, or the WriteSecured payload would land in the buffer and fail the assertion.

IntegrationTests-020

Field Value
Severity Low
Category Correctness & logic bugs
Location src/MxGateway.IntegrationTests/WorkerLiveMxAccessSmokeTests.cs:616-622
Status Resolved

Description: GatewaySession_WithLiveWorker_AbnormalWorkerExit_MarksSessionFaulted asserts the FinalFault description contains at least one of these substrings (case-insensitive): disconnect, pipe, heartbeat, worker, end of stream. The intent (per the IntegrationTests-014 resolution prose) is to verify the gateway surfaces "a known worker-client classification". The "worker" substring defeats that intent — the gateway routes through SetFaulted with messages like "Worker pipe disconnected.", "Worker shutdown timed out.", "Worker was killed by the gateway: …", "Worker heartbeat expired. …", "Worker event channel rejected an event.", "Worker pipe write failed.", "Worker read loop failed.", "Worker sent unexpected envelope body …" — every classification message begins with the word "Worker". A regression that introduced an entirely new fault path with a generic message containing the word Worker would still pass this test.

CLAUDE.md singles out "abnormal exit" as one of the parity surfaces (SessionState.Faulted with an actionable cause), so the test's documented value is verifying which of the WorkerClient error codes drove the transition. Today the assertion is effectively Assert.NotEmpty(observedFault).

Recommendation: Tighten the keyword set to the specific classifications the abnormal-exit (kill-the-process) path actually drives — PipeDisconnected ("pipe", "disconnect") and EndOfStream ("end of stream"). Drop the broad "worker" term, and drop "heartbeat" unless the test deliberately covers the heartbeat path too (it does not — HeartbeatGraceSeconds = 15 and the poll deadline is StreamShutdownTimeout = 10 seconds, so a heartbeat-expired transition is impossible inside the wait window). If a more exhaustive matrix is wanted, assert FinalFault.StartsWith("Worker pipe disconnected") against the message constant in WorkerClient.cs:380 so a rename surfaces as a compile-time / test-time failure.

Resolution: 2026-05-20 — Tightened the keyword set to the specific classifications the kill-the-process path actually drives. The assertion now requires the FinalFault description to contain "pipe disconnected" (matching the WorkerClient.cs:378-381 WorkerFrameProtocolErrorCode.EndOfStreamWorkerClientErrorCode.PipeDisconnected"Worker pipe disconnected." message) or "end of stream", dropping the broad "worker" term that previously matched every WorkerClient fault message (all of which begin with "Worker"), and dropping "heartbeat" because the test's StreamShutdownTimeout (10 s) is below HeartbeatGraceSeconds (15 s) so a heartbeat-expired transition cannot occur inside the poll window. A regression that routed an unrelated fault classification through the abnormal-exit path would now fail loudly instead of silently passing.

IntegrationTests-021

Field Value
Severity Low
Category Testing coverage
Location src/MxGateway.IntegrationTests/WorkerLiveMxAccessSmokeTests.cs:579-622
Status Resolved

Description: The abnormal-exit test only polls session.State and session.FinalFault. It does not assert anything about streamTask after the kill. The chain that puts the session into Faulted is: the read loop hits EOS → SetFaulted(PipeDisconnected, …)_events.Writer.TryComplete(fault)ReadEventsAsync propagates the WorkerClientExceptionEventStreamService.ProduceEventsAsync's catch (Exception exception) when exception is WorkerClientException calls session.MarkFaulted(exception.Message). The test verifies the end state of that chain but not that the StreamEvents call is what produced the transition. If a future change moved the MarkFaulted call somewhere else (a session-manager background watcher, for example), the test would still pass — but the stream task could now silently swallow the fault. A direct assertion that streamTask.IsFaulted (or that awaiting it throws a WorkerClientException) would protect that contract.

The Write parity test (IntegrationTests-012's resolution) added exactly this assertion (Assert.False(streamTask.IsFaulted, …)). The abnormal-exit test should add the inverse: the stream task must be faulted (or at least completed with a WorkerClientException) after the kill.

Recommendation: After the session-state poll succeeds, assert streamTask.IsCompleted (the channel has terminated) and inspect streamTask.Exception?.InnerException for a WorkerClientException (or assert streamTask.IsFaulted and await with ShouldThrowAsync). This couples the test to the actual fault-propagation path and prevents a future refactor that bypasses the stream from quietly weakening the coverage. Compare to the existing Assert.False(streamTask.IsFaulted, …) on line 217 — the abnormal-exit case wants the opposite assertion.

Resolution: 2026-05-20 — After the session-state poll loop confirms SessionState.Faulted, the test now awaits streamTask.WaitAsync(StreamShutdownTimeout) (with a try/catch that logs the surfaced exception type/message), then asserts streamTask.IsCompleted and streamTask.IsFaulted. This couples the test to the actual fault-propagation chain — read loop hits EndOfStream → WorkerClient.SetFaulted(PipeDisconnected, …)ReadEventsAsync propagates the fault → EventStreamService calls session.MarkFaultedMxAccessGatewayService.StreamEvents re-throws the mapped RpcException. A future refactor that moved MarkFaulted off the stream-consumption path would leave streamTask completing cleanly, which the new IsFaulted assertion would now catch (inverse of the existing Assert.False(streamTask.IsFaulted, …) in the Write parity test on line 217). The inner-exception type assertion was deliberately omitted because the gateway maps WorkerClientException to RpcException at the public boundary (MxAccessGatewayService.MapWorkerClientException); asserting on the surface type alone would be brittle, while the IsFaulted check directly tests the contract the recommendation is protecting.

IntegrationTests-022

Field Value
Severity Low
Category Code organization & conventions
Location src/ZB.MOM.WW.MxGateway.IntegrationTests/IntegrationTestEnvironment.cs:103-138 (ResolveRepositoryRoot / IsRepositoryRoot)
Status Resolved

Description: The walker introduced in dc9c0c9 searches parents for a directory containing src/ plus either .git or a *.slnx/*.sln file, falling back to Directory.GetCurrentDirectory() when nothing matches. The fallback masks misconfiguration: a test run from a subdirectory of an unpacked tree without a .git or .slnx marker silently uses the wrong root and then the live MXGATEWAY_LIVE_MXACCESS_WORKER_EXE lookup produces a misleading "worker exe not found" error pointing at a fabricated path. The current bin/ layout makes this unlikely in practice (the test host sets a stable working directory), but the failure mode would be confusing if it triggered.

Recommendation: Throw a clear InvalidOperationException from ResolveRepositoryRoot when the walk exhausts without finding a root, naming the directories searched and the markers expected. The opt-in env var (MXGATEWAY_LIVE_MXACCESS_WORKER_EXE) remains the escape hatch for unusual deployments.

Resolution: Resolved 2026-05-24 — Replaced the silent Directory.GetCurrentDirectory() fallback in IntegrationTestEnvironment.ResolveRepositoryRoot with an InvalidOperationException whose message names the start directory, the expected markers (src/, .git, *.sln, *.slnx), and the MXGATEWAY_LIVE_MXACCESS_WORKER_EXE escape hatch. Added regression test IntegrationTestEnvironmentTests.ResolveRepositoryRoot_NoMarkers_ThrowsInvalidOperationExceptionNamingStartAndMarkers that creates an isolated temp directory under Path.GetTempPath() containing no markers, asserts the throw, and asserts the message carries the start directory, .git, .sln, and the worker-exe env-var name so a future refactor that dropped any of these from the diagnostic would fail loudly. TDD red/green confirmed: the new test fails against the pre-fix walker (Assert.Throws() Failure: No exception was thrown) and passes after the throw is added. The existing ResolveRepositoryRoot_AcceptsGitWorktreeFile test still passes, so the happy path is unaffected.

IntegrationTests-023

Field Value
Severity Low
Category Testing coverage
Location src/ZB.MOM.WW.MxGateway.IntegrationTests/DashboardLdapLiveTests.cs:14-29
Status Resolved

Description: AuthenticateAsync_AdminInGwAdminGroup_Succeeds asserts the principal carries an LdapGroupClaimType claim containing GwAdmin (line 26-28). After the 27ed651 refactor, DashboardAuthenticator.CreatePrincipal also emits a ClaimTypes.Role claim with the mapped role (Admin for the seeded GwAdminAdmin mapping). The test asserts the LDAP-group claim but not the role claim, so a regression that stopped emitting Role: Admin (e.g. a future refactor of MapGroupsToRoles that returned an empty list) would silently pass.

Recommendation: Add an Assert.Contains asserting the principal holds a ClaimTypes.Role claim with value DashboardRoles.Admin (or "Admin"). Mirror the style of the existing LdapGroupClaimType assertion.

Resolution: Resolved 2026-05-24 — Added an Assert.Contains to AuthenticateAsync_AdminInGwAdminGroup_Succeeds that mirrors the existing LdapGroupClaimType assertion style, asserting the principal carries a ClaimTypes.Role claim with value DashboardRoles.Admin. Couples the live test to the GwAdminAdmin mapping wired through CreateAuthenticator's GroupToRole dictionary, so a future regression to MapGroupsToRoles (returning an empty list, dropping the RDN-fallback lookup, breaking the case-insensitive comparer) would now surface as a failed live test rather than a silent pass. The test is gated by MXGATEWAY_RUN_LIVE_LDAP_TESTS; build is green and the test continues to skip cleanly when the env var is unset.

IntegrationTests-024

Field Value
Severity Low
Category Code organization & conventions
Location src/ZB.MOM.WW.MxGateway.IntegrationTests/WorkerLiveMxAccessSmokeTests.cs (NullDashboardEventBroadcaster private class at end of file)
Status Resolved

Description: The inline NullDashboardEventBroadcaster private class is the third copy in the repository (the other two live in EventStreamServiceTests and GatewayEndToEndFakeWorkerSmokeTests under the Tests module — see Tests-025). Each carries a singleton Instance field and a no-op Publish. While the integration-tests project doesn't directly share test-support code with the Tests project, a duplicate-everywhere pattern reads as a code smell.

Recommendation: When Tests-025 lands the shared TestSupport/NullDashboardEventBroadcaster.cs, either reference the same shared helper from this project (a project reference if practical) or accept the duplication as a deliberate isolation between the unit-test and integration-test trees. Either choice is fine; the current state is the only one that should not persist.

Resolution: Resolved 2026-05-24 — Chose option (b) (extract within IntegrationTests) so the unit-test and integration-test projects stay independently buildable without a shared test-helpers project. Moved the inline private class out of WorkerLiveMxAccessSmokeTests.cs into a new public type at src/ZB.MOM.WW.MxGateway.IntegrationTests/TestSupport/NullDashboardEventBroadcaster.cs (namespace ZB.MOM.WW.MxGateway.IntegrationTests.TestSupport), mirroring the layout the unit-test Tests project established for Tests-007/Tests-021. The fixture wires through the same NullDashboardEventBroadcaster.Instance singleton, but the constructor is now private so future integration tests can rely on a single shared no-op. A new using ZB.MOM.WW.MxGateway.IntegrationTests.TestSupport; in WorkerLiveMxAccessSmokeTests.cs is the only change to its EventStreamService wiring. Build is green (0 warnings, 0 errors); no regression in the existing tests (4 passed / 15 skipped, identical to the pre-change baseline).