Files
mxaccessgw/code-reviews/IntegrationTests/findings.md
T
Joseph Doherty a0203503a7 Code-review 2026-05-20 sweep: re-review at 1cd51bb, resolve 72 findings across all 11 modules
Re-reviewed every module/client against the 10-category checklist
(REVIEW-PROCESS.md) at commit 1cd51bb, filed 72 new findings, and
fixed them in three priority waves (3 High, 17 Medium, 52 Low).

Highs
- Server-017: enumerate AcknowledgeAlarm / QueryActiveAlarms in
  GatewayGrpcScopeResolver so non-admin keys can use them; document
  the mapping in docs/Authorization.md; add interceptor tests.
- Client.Java-013: add the five missing bulk-method stubs to the
  CLI FakeSession so the test module compiles on a clean tree.
- Client.Rust-013: fix the clippy::doc_lazy_continuation regression
  in generated tonic code by reformatting the ReadBulkCommand proto
  comment and scoping a #![allow(...)] to the generated submodules.

Mediums (highlights)
- Server: unify GatewaySession state-lock discipline (-015) and
  make DisposeAsync race-safe against in-flight CloseAsync (-016);
  add constraint-enforcement test coverage for the bulk-plan path
  (-021).
- Worker: introduce StaRuntimeShutdownException so RunAlarmPollLoop
  can distinguish graceful shutdown from a real STA-affinity
  violation (-016); have the watchdog skip StaHung while
  CurrentCommandCorrelationId is non-empty so a legitimate slow
  ReadBulk no longer self-faults (-017).
- Tests: add per-method round-trip + cancellation coverage for the
  11 GatewaySession bulk methods (-013); replace the real TCP probe
  in GalaxyHierarchyCacheTests with an IGalaxyRepository fake
  (-016).
- IntegrationTests: drive the StreamEvents writer in the live Write
  test and assert OnWriteComplete (-012); add live tests for
  Unadvise/RemoveItem/Unregister ordering, WriteSecured, and
  abnormal worker exit (-014).
- Worker.Tests: replace MxAccessSession reflection with an internal
  CreateForTesting factory (-016); cover WorkerCancel and
  unexpected-body envelope branches (-017).
- Client.Java: cancel MxEventStream when close() races
  beforeStart() (-014); return a CancellingCompletableFuture that
  actually forwards cancellation through .thenApply chains (-015).
- Client.Python: drop the silent localhost-plaintext downgrade in
  the CLI; require explicit --plaintext (-013).
- Client.Rust: stop bench-read-bulk from polluting success-latency
  histograms with failed-call durations (-015); add coverage for
  the five MalformedReply paths, the bulk-write helpers, the
  Error::Unavailable mapping, and the unary-fault path (-016).
- Contracts: extend docs/Contracts.md with the bulk read/write
  command family (-009).

Lows (highlights)
- Server: cap GalaxyGlobMatcher.RegexCache; align
  WorkerAlarmRpcDispatcher missing-session handling; drop the
  duplicate dashboard @page routes; refresh IAlarmRpcDispatcher
  XML doc.
- Worker: surface SetXmlAlarmQuery COM failures; remove dead
  subscriptionExpression / ExecutingCommand arms; preserve
  factory-supplied runtime sessions; split MxAlarmSnapshot.cs into
  three files.
- Tests: dispose the WebApplication in seven test classes; rebuild
  FakeWorkerProcess.WaitForExitAsync against a real TaskCompletion
  source; switch the heartbeat-expires test to ManualTimeProvider;
  add InvariantCulture to the remaining DateTimeOffset.Parse sites;
  document GalaxyFilterInputSafetyTests in GatewayTesting.md.
- IntegrationTests: comment fixes, RecordingServerStreamWriter
  IDisposable, class-level [Trait], single-source ZB default
  connection string.
- Worker.Tests: replace silent-return gating with LiveMxAccessFact
  so absent env vars SKIP not pass; PascalCase rename of probe
  [Fact]s; deterministic deadline test; new frame-protocol error
  tests; ComputeTransitions diff-coverage; relocate dev-rig probes
  to Probes/.
- Contracts: add round-trip coverage and per-field redaction /
  Galaxy-identifier comments to the protos.
- Client.Dotnet: introduce clients/dotnet/Directory.Build.props so
  TreatWarningsAsErrors / analysers apply; document
  DiscoverHierarchyOptions and IMxGatewayCliClient; require typed
  bulk-read handles in CLI; surface AcknowledgeAlarm transport
  faults through Translate().
- Client.Go: kill dead code in alarms_test / fakeGalaxyServer /
  runWriteBulkVariant; document the six new subcommands in
  writeUsage; drain galaxy-watch events on limit; switch io.EOF
  comparisons to errors.Is.
- Client.Java: shared shutdown helpers + new shutdownTimeout
  option; regex-based credential redaction; Long.toUnsignedString
  for uint64 sequence; doc fixes.
- Client.Python: combine duplicate imports; add coverage for
  _percentile / bench-read-bulk / MAX_AGGREGATE_EVENTS /
  _api_key_from_env; populate pyproject metadata and ship py.typed.
- Client.Rust: expose next_correlation_id() so CLI ping/close
  stop hard-coding correlation IDs; resync RustClientDesign.md
  with the current Session / Error surface and CLI subcommand set.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-20 09:46:47 -04:00

34 KiB

Code Review — IntegrationTests

Field Value
Module src/MxGateway.IntegrationTests
Reviewer Claude Code
Review date 2026-05-20
Commit reviewed 1cd51bb
Status Reviewed
Open findings 0

Checklist coverage

A comprehensive review completes every category, recording "No issues found" where a category produced nothing rather than leaving it blank.

2026-05-20 review (commit 1cd51bb)

# Category Result
1 Correctness & logic bugs Issue found: IntegrationTests-012 (Write test starts a StreamEvents task and never observes it — silent event-stream coverage gap and an unobserved fault path).
2 mxaccessgw conventions Live opt-ins, [Collection] serialization, and the "don't synthesize events" rule are honored. No issues found.
3 Concurrency & thread safety LiveResourcesCollection serializes all three live classes; RecordingServerStreamWriter locks correctly and the semaphore wait is linked to both timeout and external cancellation. No issues found.
4 Error handling & resilience ShutDownAsync already isolates cleanup exceptions per category. No issues found.
5 Security The only embedded strings are documented dev GLAuth creds and a localhost ZB connection string, all env-overridable. The wrong-password and unreachable-server tests assert no password leakage. No issues found.
6 Performance & resource management Issue found: IntegrationTests-013 (RecordingServerStreamWriter.messageArrived SemaphoreSlim is never disposed; the type owns an IDisposable field but is not itself disposable).
7 Design-document adherence No issues found. docs/GatewayTesting.md now documents the Live LDAP, Live Galaxy, and Write/invalid-handle MXAccess opt-ins added by the prior round of resolutions.
8 Code organization & conventions Issues found: IntegrationTests-015 ([Trait("Category", ...)] repeated on every test method instead of declared once at class level); IntegrationTests-016 (the Galaxy default connection string is duplicated between LiveGalaxyRepositoryFactAttribute and GalaxyRepositoryOptions).
9 Testing coverage Issue found: IntegrationTests-014 (Unadvise, RemoveItem, Unregister, WriteSecured ordering, and worker-fault parity still uncovered — IntegrationTests-005's resolution scoped these out).
10 Documentation & comments Issue found: IntegrationTests-011 (the invalid-handle and write test comments describe a non-Ok MXAccess failure as ProtocolStatusCode.Ok, contradicting both the assertion and HResultConverter).

2026-05-18 review (commit 6c64030)

# Category Result
1 Correctness & logic bugs Issues found: IntegrationTests-003 (asserts only on first event), IntegrationTests-010 (WaitForMessageAsync ignores cancellation).
2 mxaccessgw conventions Live tests correctly gated and skip (not fail) when prerequisites are absent; LiveGalaxyRepositoryFactAttribute undocumented in the opt-in matrix.
3 Concurrency & thread safety Issue found: IntegrationTests-007 (no [Collection]/parallelism guard for shared MXAccess/ZB/GLAuth).
4 Error handling & resilience Issue found: IntegrationTests-004 (cleanup WaitAsync can mask the original failure).
5 Security No production secrets; only documented dev GLAuth creds and a localhost ZB connection string, all env-overridable. No issues found.
6 Performance & resource management Worker process disposed transitively via session disposal; no leaked pipes/COM/processes. No issues found.
7 Design-document adherence Issues found: IntegrationTests-001 (Galaxy live suite absent from the opt-in matrix), IntegrationTests-002 (GwAdmin LDAP prerequisite undocumented).
8 Code organization & conventions Issue found: IntegrationTests-008 (three near-identical fact attributes).
9 Testing coverage Issues found: IntegrationTests-005 (thin MXAccess parity coverage), IntegrationTests-006 (thin LDAP failure-path coverage).
10 Documentation & comments Issue found: IntegrationTests-009 (TestServerCallContext mislabelled "Mock").

Findings

IntegrationTests-001

Field Value
Severity High
Category Design-document adherence
Location src/MxGateway.IntegrationTests/Galaxy/LiveGalaxyRepositoryFactAttribute.cs:7, src/MxGateway.IntegrationTests/Galaxy/GalaxyRepositoryLiveTests.cs
Status Resolved

Description: The Galaxy Repository live test suite and its gating env var MXGATEWAY_RUN_LIVE_GALAXY_TESTS (plus connection-string override MXGATEWAY_LIVE_GALAXY_CONN) are completely absent from docs/GatewayTesting.md. CLAUDE.md mandates updating docs in the same change as the source. The opt-in matrix documents only the MXAccess and LDAP env vars, so an operator running the documented matrix has no way to know these tests exist or how to enable them.

Recommendation: Add a "Live Galaxy Repository" section to docs/GatewayTesting.md documenting MXGATEWAY_RUN_LIVE_GALAXY_TESTS=1, MXGATEWAY_LIVE_GALAXY_CONN, the ZB database prerequisite, and the covered RPCs, mirroring the existing "Live MXAccess Smoke" section.

Resolution: Resolved 2026-05-18: Added a "Live Galaxy Repository" section to docs/GatewayTesting.md documenting MXGATEWAY_RUN_LIVE_GALAXY_TESTS, MXGATEWAY_LIVE_GALAXY_CONN, the deployed-ZB prerequisite, and the covered GalaxyRepository RPCs.

IntegrationTests-002

Field Value
Severity High
Category Design-document adherence
Location src/MxGateway.IntegrationTests/DashboardLdapLiveTests.cs:13, src/MxGateway.Server/Configuration/LdapOptions.cs:27
Status Resolved

Description: DashboardLdapLiveTests builds the authenticator with new GatewayOptions(), so it relies on LdapOptions.RequiredGroup defaulting to GwAdmin and asserts the admin user is a member of a GwAdmin LDAP group. glauth.md does not list GwAdmin as a provisioned group — it lists admin only in the five role groups and describes GwAdmin as a group to add "when reuse isn't enough." If GLAuth has only the documented baseline groups, AuthenticateAsync_AdminInGwAdminGroup_Succeeds fails (not skips) on any box where the env var is set. This is an undocumented hard prerequisite beyond "LDAP is up."

Recommendation: Either document the required GwAdmin GLAuth provisioning step in glauth.md and GatewayTesting.md, or have the test set RequiredGroup to a baseline group glauth.md guarantees admin belongs to (e.g. WriteOperate).

Resolution: Resolved 2026-05-18: Took the documentation fix — promoted the glauth.md "Adding a gw-specific group" section into a concrete "Provisioning the GwAdmin group" step that grants GwAdmin to admin, cross-referenced it from the groups/verification sections, and added a "Live LDAP" section to docs/GatewayTesting.md calling out GwAdmin as a hard prerequisite. Alternative considered: weaken the test to a baseline group (WriteOperate) — rejected because GwAdmin is the real default LdapOptions.RequiredGroup and the test should exercise it.

IntegrationTests-003

Field Value
Severity Medium
Category Correctness & logic bugs
Location src/MxGateway.IntegrationTests/WorkerLiveMxAccessSmokeTests.cs:89-97
Status Resolved

Description: The test asserts only on the first MxEvent recorded by RecordingServerStreamWriter. A live MXAccess provider can deliver an initial state/quality event whose family or handles differ from the expected OnDataChange (e.g. a registration-state or bad-quality bootstrap event). Because WaitForFirstMessageAsync returns whatever arrives first, a genuine ordering/family defect could fail spuriously or leave later wrong events unverified.

Recommendation: Filter for the first event with Family == OnDataChange (with a bounded retry/poll) or assert the full recorded sequence, so the test verifies the event the worker is supposed to emit.

Resolution: Resolved 2026-05-18: Confirmed against source — WaitForFirstMessageAsync completed a TaskCompletionSource on the very first WriteAsync. Replaced it with RecordingServerStreamWriter.WaitForMessageAsync(predicate, timeout), which scans recorded messages, skips earlier non-matching events, and blocks on a SemaphoreSlim until a matching one arrives or the timeout elapses (throwing a TimeoutException that reports the scanned count). GatewaySession_WithLiveWorker_RegistersAdvisesStreamsDataAndCloses now waits for the first Family == OnDataChange event. Live execution was not possible in this environment (no MXAccess COM); verified by build.

IntegrationTests-004

Field Value
Severity Medium
Category Error handling & resilience
Location src/MxGateway.IntegrationTests/WorkerLiveMxAccessSmokeTests.cs:108-111
Status Resolved

Description: In the finally block, after CloseSessionAsync, the test does await streamTask.WaitAsync(StreamShutdownTimeout). If closing the session does not promptly complete the stream (or StreamEvents itself faults), this throws TimeoutException from inside finally, which replaces/masks any original assertion failure from the try block. The diagnostic value of the real failure is lost.

Recommendation: Wrap the streamTask.WaitAsync (and ideally WaitForProcessesAsync) in a try/catch that logs the cleanup exception via output.WriteLine instead of letting it propagate.

Resolution: Resolved 2026-05-18: Confirmed — the finally block awaited streamTask.WaitAsync and WaitForProcessesAsync with no exception handling. Extracted a shared ShutDownAsync helper that wraps the session-close + stream-drain in one try/catch and the worker-process wait in a second try/catch, logging each cleanup exception via output.WriteLine instead of throwing. All three live tests now route shutdown through it, so a cleanup timeout can no longer mask an assertion failure. Live execution was not possible in this environment; verified by build.

IntegrationTests-005

Field Value
Severity Medium
Category Testing coverage
Location src/MxGateway.IntegrationTests/WorkerLiveMxAccessSmokeTests.cs
Status Resolved

Description: The only live MXAccess test covers the Register→AddItem→Advise→one-OnDataChange→Close happy path. CLAUDE.md stresses that MXAccess parity is the contract and calls out non-obvious behaviors (WriteSecured ordering, OperationComplete semantics, invalid-handle exceptions). None of Write, WriteSecured, Unadvise, RemoveItem, Unregister, OperationComplete, an invalid-handle command, or a worker-fault path is exercised against live COM — exactly the paths fake-worker tests cannot validate.

Recommendation: Add live coverage for at least a Write round-trip and an invalid-handle command, plus a worker-fault/abnormal-exit scenario, even if behind additional opt-in env vars.

Resolution: Resolved 2026-05-18: Added two [LiveMxAccessFact]-gated tests to WorkerLiveMxAccessSmokeTests. GatewaySession_WithLiveWorker_WritesValueToAdvisedItem registers/adds/advises then issues a Write of an integer value, asserting the command round-trips with ProtocolStatusCode.Ok and MxCommandKind.Write. GatewaySession_WithLiveWorker_InvalidHandleCommand_SurfacesFailureWithoutTransportFault issues AddItem against int.MaxValue as the server handle (never issued by MXAccess) and asserts the failure surfaces in the command reply without a usable item handle. Both reuse the existing opt-in env var and the ShutDownAsync cleanup helper. A worker-fault/abnormal-exit case was deliberately scoped out — it needs a controlled COM crash injection beyond what the existing harness supports; the two added cases cover the Write round-trip and invalid-handle paths the recommendation calls out. Live execution was not possible in this environment; verified by build.

IntegrationTests-006

Field Value
Severity Medium
Category Testing coverage
Location src/MxGateway.IntegrationTests/DashboardLdapLiveTests.cs
Status Resolved

Description: LDAP live coverage is two cases: admin succeeds, readonly is denied for missing group. There is no coverage of a wrong password for a valid user, an unknown username, or the LDAP-server-unreachable path — all of which DashboardAuthenticator has distinct branches for (the LdapException catch, the candidate is null branch). The negative test only proves group-membership denial, not credential rejection.

Recommendation: Add a live test for admin with a wrong password asserting Succeeded == false and that the password is not leaked into FailureMessage, and a test for an unknown username.

Resolution: Resolved 2026-05-18: Added three [LiveLdapFact]-gated tests to DashboardLdapLiveTests. AuthenticateAsync_AdminWithWrongPassword_FailsWithoutLeakingPassword exercises the LdapException catch via a rejected candidate bind and asserts the wrong password never reaches FailureMessage. AuthenticateAsync_UnknownUsername_Fails exercises the candidate is null branch. AuthenticateAsync_ServerUnreachable_FailsWithoutThrowing builds the authenticator with LdapOptions.Port = 1 (a reserved port no LDAP server listens on) and asserts the connect failure is absorbed into a failed result rather than thrown — covering the generic catch (Exception) branch. All three are gated by the existing MXGATEWAY_RUN_LIVE_LDAP_TESTS opt-in so they stay opt-in. Live execution was not possible in this environment (no live LDAP); verified by build.

IntegrationTests-007

Field Value
Severity Low
Category Concurrency & thread safety
Location src/MxGateway.IntegrationTests/WorkerLiveMxAccessSmokeTests.cs:20, src/MxGateway.IntegrationTests/Galaxy/GalaxyRepositoryLiveTests.cs:5, src/MxGateway.IntegrationTests/DashboardLdapLiveTests.cs:9
Status Resolved

Description: The live test classes contend for genuinely shared singletons — one MXAccess COM provider, one ZB SQL database, one GLAuth instance with a 3-fail/10-minute per-IP lockout. No [Collection] annotation or DisableTestParallelization is declared, so xUnit's default cross-class parallelism could run the Galaxy tests concurrently or interleave an LDAP failure burst that trips the GLAuth lockout.

Recommendation: Place the live test classes in a shared [Collection], or set [assembly: CollectionBehavior(DisableTestParallelization = true)] for this opt-in project, so live external resources are accessed serially.

Resolution: Resolved 2026-05-18: Confirmed — no [Collection] or assembly-level CollectionBehavior existed. Added LiveResourcesCollection.cs with a [CollectionDefinition(Name, DisableParallelization = true)] and applied [Collection(LiveResourcesCollection.Name)] to WorkerLiveMxAccessSmokeTests, GalaxyRepositoryLiveTests, and DashboardLdapLiveTests. A named collection (rather than an assembly-wide DisableTestParallelization) was chosen so the live classes serialize against each other and within themselves while non-live tests (IntegrationTestEnvironmentTests) keep parallelizing. Verified by build; live tests not executed (no MXAccess COM / live LDAP in this environment).

IntegrationTests-008

Field Value
Severity Low
Category Code organization & conventions
Location src/MxGateway.IntegrationTests/LiveLdapFactAttribute.cs, src/MxGateway.IntegrationTests/Galaxy/LiveGalaxyRepositoryFactAttribute.cs, src/MxGateway.IntegrationTests/LiveMxAccessFactAttribute.cs
Status Resolved

Description: Three near-identical fact attributes each re-implement the same "compare env var to 1 with StringComparison.Ordinal, set Skip otherwise" pattern. LiveMxAccessFactAttribute delegates to IntegrationTestEnvironment while the other two inline the logic, so the project has two divergent styles for the same concern.

Recommendation: Extract a shared helper (e.g. IntegrationTestEnvironment.IsEnabled(string variableName)) and have all three attributes call it.

Resolution: Resolved 2026-05-18: Confirmed — LiveLdapFactAttribute.Enabled and LiveGalaxyRepositoryFactAttribute.Enabled each inlined the ordinal == "1" comparison while LiveMxAccessFactAttribute delegated to IntegrationTestEnvironment. Added IntegrationTestEnvironment.IsEnabled(string variableName) as the single implementation; LiveMxAccessTestsEnabled, LiveLdapFactAttribute.Enabled, and LiveGalaxyRepositoryFactAttribute.Enabled now all call it. Verified by build.

IntegrationTests-009

Field Value
Severity Low
Category Documentation & comments
Location src/MxGateway.IntegrationTests/WorkerLiveMxAccessSmokeTests.cs:372-375
Status Resolved

Description: TestServerCallContext is XML-documented as a "Mock server call context," but it is a hand-written stub/fake with no mocking framework and no verification behavior. Per the style guides (accurate naming; explain why not what), calling it a mock misleads readers who may expect verifiable interactions.

Recommendation: Reword the summary to "test stub" / "minimal ServerCallContext implementation for in-process gRPC calls."

Resolution: Resolved 2026-05-18: Confirmed — the summary read "Mock server call context for testing gRPC calls." Reworded to "Minimal ServerCallContext stub for invoking the gRPC service in-process," noting it is a hand-written fake with no verification behavior. No mocking framework is involved; this is a documentation-only fix. Verified by build.

IntegrationTests-010

Field Value
Severity Low
Category Correctness & logic bugs
Location src/MxGateway.IntegrationTests/WorkerLiveMxAccessSmokeTests.cs:366-369
Status Resolved

Description: WaitForFirstMessageAsync accepts only a timeout and never observes a CancellationToken. There is no per-test cancellation propagation, so if the gateway/worker hangs without writing an event the test relies solely on the 15s WaitAsync timeout and gives no contextual diagnostics. Combined with IntegrationTests-004, a hung live worker produces a bare TimeoutException.

Recommendation: Accept a CancellationToken (linked to TestServerCallContext's token), pass it to firstMessage.Task.WaitAsync(timeout, token), and on timeout emit the recorded Messages count via output.WriteLine before throwing.

Re-triage: The named method WaitForFirstMessageAsync no longer exists — IntegrationTests-003's resolution renamed/replaced it with RecordingServerStreamWriter.WaitForMessageAsync(predicate, timeout), which scans recorded messages and blocks on a SemaphoreSlim. The underlying defect still held: that replacement method also took only a timeout and never observed a CancellationToken. The finding remains valid (Low, Correctness) against the renamed method; the recommendation's firstMessage.Task.WaitAsync detail is stale but the intent (thread a token, surface a count on timeout) is unchanged.

Resolution: Resolved 2026-05-18: Added an optional CancellationToken parameter to WaitForMessageAsync, linked with the existing timeout source via CancellationTokenSource.CreateLinkedTokenSource, so a per-test cancellation aborts the wait promptly. GatewaySession_WithLiveWorker_RegistersAdvisesStreamsDataAndCloses now creates a CancellationTokenSource, passes its token into the StreamEvents TestServerCallContext and into WaitForMessageAsync, so the stream call and the wait share one cancellation source. On timeout the method already throws a TimeoutException whose message includes the scanned message count, satisfying the "emit recorded count" intent (the count surfaces in the test failure rather than via a separate output.WriteLine). Verified by build; live tests not executed.

IntegrationTests-011

Field Value
Severity Low
Category Documentation & comments
Location src/MxGateway.IntegrationTests/WorkerLiveMxAccessSmokeTests.cs:236-240, src/MxGateway.IntegrationTests/WorkerLiveMxAccessSmokeTests.cs:183-187
Status Resolved

Description: The XML/inline comments on the two new MXAccess parity tests misdescribe how the gateway surfaces an MXAccess failure. The invalid-handle test reads "the gateway protocol status is Ok and the failure shows up in hresult / the status proxies — it must not be reported as a transport fault", then asserts Assert.NotEqual(ProtocolStatusCode.Ok, addItemReply.ProtocolStatus.Code). HResultConverter.CreateProtocolStatus (src/MxGateway.Worker/Conversion/HResultConverter.cs:39) actually sets Code = ProtocolStatusCode.MxaccessFailure whenever the COM call throws (HRESULT ≠ 0), so the assertion is correct but the comment is wrong — the protocol status is not Ok on an MXAccess failure. The write-round-trip test carries the same misleading framing on lines 183-187 ("MXAccess parity details … belong in hresult / statuses, not in a transport failure") immediately before asserting Ok. A reader can reasonably conclude the gateway always reports Ok for round-tripped commands and tweak code accordingly. The intended distinction is "this is not a gRPC transport fault" (the RPC reply still arrives) — the protocol status code carries the MXAccess outcome.

Recommendation: Reword the invalid-handle comment to "the gateway must reply with ProtocolStatusCode.MxaccessFailure and a non-zero Hresult carrying the COM failure, not a gRPC transport fault." Reword the write-round-trip comment to clarify it is asserting the happy-path Ok and that an MXAccess rejection would surface as MxaccessFailure (per HResultConverter), not as a RpcException.

Resolution: 2026-05-20 — Reworded the invalid-handle test comment to say the gateway must reply with ProtocolStatusCode.MxaccessFailure and a non-zero hresult carrying the COM failure (per HResultConverter), and reworded the write-round-trip comment to make explicit it is asserting the happy-path Ok while an MXAccess rejection would surface as MxaccessFailure, never as an RpcException.

IntegrationTests-012

Field Value
Severity Medium
Category Correctness & logic bugs
Location src/MxGateway.IntegrationTests/WorkerLiveMxAccessSmokeTests.cs:147-151
Status Resolved

Description: GatewaySession_WithLiveWorker_WritesValueToAdvisedItem constructs a RecordingServerStreamWriter<MxEvent> and starts a StreamEvents task, then never reads from it and never asserts anything about the recorded messages. The test verifies only that the Write command round-trips at the protocol level — it does not verify that the worker actually emits any event after the write (for example an OnWriteComplete, which is the proof of round-trip used by the cross-language client e2e runner). Because the stream task is started with new TestServerCallContext() (no cancellation source), any fault raised by the stream task (an exception from EventStreamService, a session-not-found, a backpressure overflow) is swallowed — streamTask is later awaited in ShutDownAsync only inside a broad catch (Exception ex), which logs and continues. The Write test therefore cannot fail on stream-task faults. Two consequences: (a) the live Write parity coverage promised in IntegrationTests-005 is weaker than it appears, and (b) the fixture (eventWriter) is dead code in this test that suggests an assertion was intended.

Recommendation: Either remove the unused eventWriter/StreamEvents plumbing from the Write test so the test scope matches its assertions, or — preferred — extend the test to wait for an OnWriteComplete event for the written item via eventWriter.WaitForMessageAsync(candidate => candidate.Family == MxEventFamily.OnWriteComplete && candidate.ItemHandle == itemHandle, ...), matching the round-trip proof used by scripts/run-client-e2e-tests.ps1 -VerifyWrite.

Resolution: Resolved 2026-05-20: Rewrote GatewaySession_WithLiveWorker_WritesValueToAdvisedItem so the previously-dead eventWriter/StreamEvents plumbing actually drives an assertion. The test now waits for an OnWriteComplete event matching the Write's (server, item) handle pair via eventWriter.WaitForMessageAsync (using IntegrationTestEnvironment.LiveMxAccessEventTimeout), and asserts the recorded event's family, session id, and handles — the same round-trip proof the cross-language client e2e runner uses. The stream call is now bound to a CancellationTokenSource and the test asserts streamTask.IsFaulted == false before cleanup. ShutDownAsync gained an opt-in propagateStreamFaults flag so a faulted StreamEvents task is rethrown into the test rather than silently swallowed by the broad cleanup catch; the cancellation token is also signalled before the drain so StreamEvents observes a clean shutdown instead of a forced timeout. Verified by build and by confirming the test skips cleanly when MXGATEWAY_RUN_LIVE_MXACCESS_TESTS is unset.

IntegrationTests-013

Field Value
Severity Low
Category Performance & resource management
Location src/MxGateway.IntegrationTests/WorkerLiveMxAccessSmokeTests.cs:519-609
Status Resolved

Description: RecordingServerStreamWriter<T> owns a SemaphoreSlim messageArrived (IDisposable) but does not itself implement IDisposable, so the semaphore's wait handle is never released back to the OS. Each live test allocates one such writer and discards it at scope exit. Live tests run on opt-in only, so the cumulative leak is bounded, but the type holds an IDisposable field — the standard hygiene under Directory.Build.props's TreatWarningsAsErrors=true is to either dispose the field or document why not. CA2213 does not fire because the owner is not itself IDisposable; an analyzer-driven warning is the only reason this is not a build break, not an indication that the leak is acceptable.

Recommendation: Make RecordingServerStreamWriter<T> implement IDisposable, dispose messageArrived in Dispose, and wrap each instantiation in a using block (using RecordingServerStreamWriter<MxEvent> eventWriter = new();).

Resolution: 2026-05-20 — RecordingServerStreamWriter<T> now implements IDisposable and its Dispose releases the messageArrived semaphore. All six live tests in WorkerLiveMxAccessSmokeTests now allocate the writer with a top-of-method using declaration so the semaphore's wait handle is released on scope exit even when the test body throws.

IntegrationTests-014

Field Value
Severity Medium
Category Testing coverage
Location src/MxGateway.IntegrationTests/WorkerLiveMxAccessSmokeTests.cs
Status Resolved

Description: IntegrationTests-005 was resolved by adding live coverage for Write and an invalid-handle AddItem, but its resolution explicitly scoped out the worker-fault/abnormal-exit case and silently dropped Unadvise, RemoveItem, Unregister, OperationComplete, and WriteSecured ordering. CLAUDE.md singles out WriteSecured ("WriteSecured failing before a value-bearing NMX body") and OperationComplete semantics as parity surprises the gateway must not "fix" — exactly the paths fake-worker tests cannot validate. After this commit the live MXAccess smoke still doesn't exercise any teardown command, the secured-write ordering rule, or a deliberately faulted worker. A regression in any of these would only be caught by manual testing.

Recommendation: Add live MXAccess coverage for the teardown chain (Unadvise then RemoveItem then Unregister, asserting each replies with ProtocolStatusCode.Ok and the next operation no longer references the freed handle), and at minimum one WriteSecured parity case asserting the documented ordering. A worker-fault test can be deferred to a separate finding once a deterministic COM-crash injection harness exists.

Resolution: Resolved 2026-05-20: Added three new [LiveMxAccessFact]-gated tests to WorkerLiveMxAccessSmokeTests, all reusing the existing opt-in env var and ShutDownAsync cleanup helper. (1) GatewaySession_WithLiveWorker_UnadviseRemoveItemUnregister_TeardownOrderingParity runs Register → AddItem → Advise → wait for one OnDataChange → UnAdvise → RemoveItem → Unregister, asserting each step replies Ok with the matching MxCommandKind, that no further OnDataChange events for the un-advised (server, item) pair arrive after a 500 ms settle window, and that a second RemoveItem against the freed handle returns a non-Ok MXAccess failure (so a regression that left a stale subscription or accepted a stale handle would surface). (2) GatewaySession_WithLiveWorker_WriteSecured_AuthenticatedRoundTripParity resolves an ArchestrA user id via AuthenticateUser (credentials env-overridable through MXGATEWAY_LIVE_MXACCESS_WRITE_SECURED_USER / ..._PASSWORD, defaulting to the admin/admin123 GLAuth user from glauth.md), issues WriteSecured against an advised item, and asserts the reply carries MxCommandKind.WriteSecured, the protocol status is one of the documented parity outcomes (Ok for an unprotected provider, MxaccessFailure when the item is not WriteSecured-eligible — never a transport fault), and the credential never leaks into the diagnostic message. (3) GatewaySession_WithLiveWorker_AbnormalWorkerExit_MarksSessionFaulted opens a session, kills the worker process tree (via a new TestWorkerProcessFactory.KillAllAndDetach helper) without going through CloseSession, and polls the session via a new GatewayServiceFixture.TryGetSession accessor until it transitions to SessionState.Faulted within the live event timeout; asserts the final state is Faulted, that FinalFault is non-empty, and that the fault description carries a known worker-client classification (pipe disconnected / worker faulted / heartbeat expired / end-of-stream). docs/GatewayTesting.md was updated to list all five parity surfaces and the two new env-var defaults. Verified by build and confirmed all six live tests skip cleanly when MXGATEWAY_RUN_LIVE_MXACCESS_TESTS is unset.

IntegrationTests-015

Field Value
Severity Low
Category Code organization & conventions
Location src/MxGateway.IntegrationTests/WorkerLiveMxAccessSmokeTests.cs:30,119,201, src/MxGateway.IntegrationTests/DashboardLdapLiveTests.cs:13,32,48,67,84, src/MxGateway.IntegrationTests/Galaxy/GalaxyRepositoryLiveTests.cs:10,22,34,52
Status Resolved

Description: Every live-test method in the three live classes carries an identical [Trait("Category", "LiveMxAccess")] (or LiveLdap / LiveGalaxy) attribute. The trait is uniform within each class and is exactly the information the [Collection(LiveResourcesCollection.Name)] class-level attribute also implies. xUnit's [Trait] is inheritable from the class to its methods, so the same metadata can be declared once at class scope. The current shape adds maintenance burden — adding a new test in any of these classes requires remembering to add the trait, and the existing pattern's LiveLdap includes five copies of the same line.

Recommendation: Move each [Trait("Category", ...)] to the class declaration alongside the existing [Collection(...)], and remove the per-method copies. Verify the trait still surfaces in --filter Trait=Category=LiveLdap after the change.

Resolution: 2026-05-20 — Lifted [Trait("Category", "LiveMxAccess")], [Trait("Category", "LiveLdap")], and [Trait("Category", "LiveGalaxy")] to the class declarations of WorkerLiveMxAccessSmokeTests, DashboardLdapLiveTests, and GalaxyRepositoryLiveTests respectively (alongside the existing [Collection(LiveResourcesCollection.Name)]), and removed all per-method duplicates. xUnit propagates class-level traits to every method, so --filter Category=Live* filters still match.

IntegrationTests-016

Field Value
Severity Low
Category Code organization & conventions
Location src/MxGateway.IntegrationTests/Galaxy/LiveGalaxyRepositoryFactAttribute.cs:26, src/MxGateway.Server/Galaxy/GalaxyRepositoryOptions.cs:13
Status Resolved

Description: The default Galaxy Repository connection string "Server=localhost;Database=ZB;Integrated Security=True;TrustServerCertificate=True;Encrypt=False;" is duplicated verbatim between the production GalaxyRepositoryOptions.ConnectionString initializer and the test-side LiveGalaxyRepositoryFactAttribute.ConnectionString fallback. The docs (docs/GatewayTesting.md) document the value once and reference it from both places. If the production default changes (e.g. tightening to a named instance, or switching to a SQL-auth template), the test default silently keeps the old string and the live Galaxy tests connect to the wrong server. The drift is invisible to the build.

Recommendation: Expose the production default through a public const string on GalaxyRepositoryOptions (e.g. DefaultConnectionString) and have LiveGalaxyRepositoryFactAttribute.ConnectionString read Environment.GetEnvironmentVariable(ConnectionStringVariableName) ?? GalaxyRepositoryOptions.DefaultConnectionString. Single source of truth, build-time guarantee they cannot drift.

Resolution: 2026-05-20 — Added public const string GalaxyRepositoryOptions.DefaultConnectionString carrying the production default, set the ConnectionString initializer to reference it, and changed LiveGalaxyRepositoryFactAttribute.ConnectionString to fall back to GalaxyRepositoryOptions.DefaultConnectionString. The literal now lives in exactly one place and any future change to the production default propagates to the live-test fallback at compile time.