Second re-review pass at commit a020350 caught 48 new findings — including
one High-severity regression I introduced in the prior sweep — and fixed
them all in one parallel wave.
High (1)
- Client.Python-018: prior sweep set `license = "Proprietary"` in
pyproject.toml. setuptools >= 77 enforces PEP 639 and rejects the
string (it must be a valid SPDX expression), so `pip wheel .` and
`pip install -e .` both fail before any source compiles. Tests
still pass because pytest bypasses the build backend via
`pythonpath`. Dropped the invalid license string, kept the
`License :: Other/Proprietary License` classifier, and added
`tests/test_packaging.py` so a future regression of the same shape
is caught in CI.
Mediums (6)
- Worker-023: `HeartbeatStuckCeiling` (default 75s = 5x HeartbeatGrace)
on WorkerPipeSessionOptions bounds the in-flight-command watchdog
suppression so a truly stuck COM call still triggers StaHung
instead of permanently defeating the watchdog.
- Client.Rust-018: reverted Rust's `latencyMs` split so the
cross-language bench comparison is apples-to-apples again;
`failureLatencyMs` kept as Rust-only enrichment.
- Client.Java-021: applied Client.Java-002's terminal-state
serialisation pattern to DeployEventStream so close() arriving
after queue-overflow can't erase the overflow exception.
- IntegrationTests-017: teardown-parity test now uses a two-window
stability check after UnAdvise instead of strict equality against
the pre-UnAdvise count (which raced against in-flight events).
- IntegrationTests-019: new RecordingTestOutputHelper wraps every
log sink the WriteSecured live test owns (worker stdout/stderr,
gateway logs, direct WriteLine) so the credential is proven
absent from the full output buffer, not just the diagnostic
message.
- Tests-020: added MxAccessGatewayServiceConstraintTests coverage
for the previously-uncovered Write2Bulk and WriteSecured2Bulk
arms of WriteBulkConstraintPlan.SetPayload.
Lows (41 — highlights)
- Server: Galaxy glob cache eviction is race-free (Server-024);
GalaxyRepositoryGrpcService takes IGalaxyRepository (Server-025);
AlarmsOptions validated at startup (Server-026); Authorization.md
Constraint Enforcement snippet/prose enumerate the bulk write/read
family (Server-027); bulk-read-commands and bulk-write-commands
capability tokens added to OpenSession (Server-029);
NotWiredAlarmRpcDispatcher XML doc and missing scope-resolver and
state-machine tests cleaned up (023, 028).
- Worker: AlarmCommandHandler now invokes the same STA-affinity
guard the poll path uses, at every command entry (Worker-024);
RunAsync null-checks the runtime-session factory result
(Worker-025).
- Worker.Tests: shared LiveMxAccessOptInVariableName lives on
GatewayContractInfo (Worker.Tests-025); MxAccessSession.CreateForTesting
rejects production sinks (Worker.Tests-026); FakeRuntimeSession's
CancelCommandReturnValue serialised under lock (Worker.Tests-027);
Probes namespace lifted to MxGateway.Worker.Tests.Probes
(Worker.Tests-029); cancel-envelope sequence numbers monotonised
(Worker.Tests-030); docs/GatewayTesting.md gains a "Dev-rig Probes"
section (Worker.Tests-028).
- Tests: ManualTimeProvider consolidated into one TestSupport/ copy
(Tests-021); SessionManagerBulkTests adds a mid-flight cancellation
test backed by a TaskCompletionSource fake (Tests-022); companion
FakeWorkerProcess.WaitForExitAsync no longer fakes its exit signal
(Tests-023); constraint plan reply-count divergence pinned
(Tests-024).
- IntegrationTests: TryGetSession chain carries [MaybeNullWhen(false)]
end-to-end (IntegrationTests-018); abnormal-exit keyword set
tightened to pipe-disconnected/end-of-stream and the test now
asserts streamTask.IsFaulted (020, 021).
- Client.Dotnet: bench commands added to isLongRunning so the
default 30s wall-clock budget doesn't kill them (015);
BenchStreamEventsAsync observes the inner stream task on every
exit path (016).
- Client.Go: parseValue wraps strconv errors with flag context and
%w (017); bench loops honour ctx.Done() (018); galaxy-watch parses
RFC3339Nano with fractional seconds (019); runStreamEvents installs
signal.NotifyContext like runGalaxyWatch (020); five new CLI-level
table-driven tests cover the bulk/bench subcommands (021).
- Client.Java: toCompletable Javadoc rewritten to match the actual
cancellation contract Client.Java-015 established (022); stream-events
text path uses Long.toUnsignedString for worker_sequence (023);
bench-read-bulk no longer pollutes success-latency histogram with
failure durations (024); --shutdown-timeout CLI option propagates
through to ClientOptions (025); seven new MxGatewayCliTests cover
the bulk and bench commands (026).
- Client.Python: mxgateway_cli ships its own py.typed marker (019);
wheel-build smoke test added under tests/test_packaging.py (020);
README documents the Galaxy CLI parity gap explicitly (021).
- Client.Rust: RustClientDesign.md signatures match session.rs and
document the AsRef<str> read_bulk genericism (019);
next_correlation_id re-exported at the crate root, with a
property-style doc contract and an explicit disclaimer that the
literal textual format is not part of the contract (020).
- Contracts: BulkWriteResult comment names the actual
IConstraintEnforcer mechanism instead of "tag-allowlist filter"
(014); BulkReadResult gains explicit per-arm payload-population
documentation for the success vs failure cases (015).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
52 KiB
Code Review — IntegrationTests
| Field | Value |
|---|---|
| Module | src/MxGateway.IntegrationTests |
| Reviewer | Claude Code |
| Review date | 2026-05-20 |
| Commit reviewed | a020350 |
| Status | Reviewed |
| Open findings | 0 |
Checklist coverage
A comprehensive review completes every category, recording "No issues found" where a category produced nothing rather than leaving it blank.
2026-05-20 re-review (commit a020350)
| # | Category | Result |
|---|---|---|
| 1 | Correctness & logic bugs | Issues found: IntegrationTests-017 (teardown-parity test's "no further OnDataChange after UnAdvise" assertion races against in-flight events the provider already published); IntegrationTests-020 (abnormal-exit test's fault-classification keyword list accepts the substring "worker", which matches almost any plausible fault message and dilutes the check to "the description is non-empty"). |
| 2 | mxaccessgw conventions | No issues found. The five new tests honor live opt-in gating, [Collection] serialization, "no synthesized events", and the credential-redaction contract for the assertions they make. |
| 3 | Concurrency & thread safety | No issues found. GatewaySession.State/FinalFault access in the abnormal-exit poll loop goes through _syncRoot; RecordingServerStreamWriter.Messages returns a locked snapshot copy. |
| 4 | Error handling & resilience | No issues found. ShutDownAsync's opt-in propagateStreamFaults correctly threads silent stream-task faults into the Write parity test without re-masking the IntegrationTests-004 path. |
| 5 | Security | Issue found: IntegrationTests-019 (WriteSecured live test asserts the password is absent from DiagnosticMessage only; it does not assert the credential is absent from the accumulated test output, where the worker stderr/stdout and the gateway log are echoed). |
| 6 | Performance & resource management | No issues found. All six RecordingServerStreamWriter<MxEvent> instantiations use using declarations; using CancellationTokenSource is the consistent pattern. |
| 7 | Design-document adherence | No issues found. docs/GatewayTesting.md documents all five new parity surfaces and the two new env-var defaults (MXGATEWAY_LIVE_MXACCESS_WRITE_SECURED_USER/_PASSWORD). |
| 8 | Code organization & conventions | Issue found: IntegrationTests-018 (GatewayServiceFixture.TryGetSession declares out GatewaySession session non-nullable while the caller binds it as out GatewaySession? session; the null-forgiving operator inside SessionRegistry.TryGet propagates a misleading non-null annotation). |
| 9 | Testing coverage | Issue found: IntegrationTests-021 (abnormal-exit test does not assert the active StreamEvents task observed the worker fault; relies entirely on the session-state poll and would silently pass if MarkFaulted were ever moved off the stream-consumption path). |
| 10 | Documentation & comments | No issues found. Test XML comments now match what each assertion verifies (the IntegrationTests-011 fix is intact across both the Write and invalid-handle cases). |
2026-05-20 review (commit 1cd51bb)
| # | Category | Result |
|---|---|---|
| 1 | Correctness & logic bugs | Issue found: IntegrationTests-012 (Write test starts a StreamEvents task and never observes it — silent event-stream coverage gap and an unobserved fault path). |
| 2 | mxaccessgw conventions | Live opt-ins, [Collection] serialization, and the "don't synthesize events" rule are honored. No issues found. |
| 3 | Concurrency & thread safety | LiveResourcesCollection serializes all three live classes; RecordingServerStreamWriter locks correctly and the semaphore wait is linked to both timeout and external cancellation. No issues found. |
| 4 | Error handling & resilience | ShutDownAsync already isolates cleanup exceptions per category. No issues found. |
| 5 | Security | The only embedded strings are documented dev GLAuth creds and a localhost ZB connection string, all env-overridable. The wrong-password and unreachable-server tests assert no password leakage. No issues found. |
| 6 | Performance & resource management | Issue found: IntegrationTests-013 (RecordingServerStreamWriter.messageArrived SemaphoreSlim is never disposed; the type owns an IDisposable field but is not itself disposable). |
| 7 | Design-document adherence | No issues found. docs/GatewayTesting.md now documents the Live LDAP, Live Galaxy, and Write/invalid-handle MXAccess opt-ins added by the prior round of resolutions. |
| 8 | Code organization & conventions | Issues found: IntegrationTests-015 ([Trait("Category", ...)] repeated on every test method instead of declared once at class level); IntegrationTests-016 (the Galaxy default connection string is duplicated between LiveGalaxyRepositoryFactAttribute and GalaxyRepositoryOptions). |
| 9 | Testing coverage | Issue found: IntegrationTests-014 (Unadvise, RemoveItem, Unregister, WriteSecured ordering, and worker-fault parity still uncovered — IntegrationTests-005's resolution scoped these out). |
| 10 | Documentation & comments | Issue found: IntegrationTests-011 (the invalid-handle and write test comments describe a non-Ok MXAccess failure as ProtocolStatusCode.Ok, contradicting both the assertion and HResultConverter). |
2026-05-18 review (commit 6c64030)
| # | Category | Result |
|---|---|---|
| 1 | Correctness & logic bugs | Issues found: IntegrationTests-003 (asserts only on first event), IntegrationTests-010 (WaitForMessageAsync ignores cancellation). |
| 2 | mxaccessgw conventions | Live tests correctly gated and skip (not fail) when prerequisites are absent; LiveGalaxyRepositoryFactAttribute undocumented in the opt-in matrix. |
| 3 | Concurrency & thread safety | Issue found: IntegrationTests-007 (no [Collection]/parallelism guard for shared MXAccess/ZB/GLAuth). |
| 4 | Error handling & resilience | Issue found: IntegrationTests-004 (cleanup WaitAsync can mask the original failure). |
| 5 | Security | No production secrets; only documented dev GLAuth creds and a localhost ZB connection string, all env-overridable. No issues found. |
| 6 | Performance & resource management | Worker process disposed transitively via session disposal; no leaked pipes/COM/processes. No issues found. |
| 7 | Design-document adherence | Issues found: IntegrationTests-001 (Galaxy live suite absent from the opt-in matrix), IntegrationTests-002 (GwAdmin LDAP prerequisite undocumented). |
| 8 | Code organization & conventions | Issue found: IntegrationTests-008 (three near-identical fact attributes). |
| 9 | Testing coverage | Issues found: IntegrationTests-005 (thin MXAccess parity coverage), IntegrationTests-006 (thin LDAP failure-path coverage). |
| 10 | Documentation & comments | Issue found: IntegrationTests-009 (TestServerCallContext mislabelled "Mock"). |
Findings
IntegrationTests-001
| Field | Value |
|---|---|
| Severity | High |
| Category | Design-document adherence |
| Location | src/MxGateway.IntegrationTests/Galaxy/LiveGalaxyRepositoryFactAttribute.cs:7, src/MxGateway.IntegrationTests/Galaxy/GalaxyRepositoryLiveTests.cs |
| Status | Resolved |
Description: The Galaxy Repository live test suite and its gating env var MXGATEWAY_RUN_LIVE_GALAXY_TESTS (plus connection-string override MXGATEWAY_LIVE_GALAXY_CONN) are completely absent from docs/GatewayTesting.md. CLAUDE.md mandates updating docs in the same change as the source. The opt-in matrix documents only the MXAccess and LDAP env vars, so an operator running the documented matrix has no way to know these tests exist or how to enable them.
Recommendation: Add a "Live Galaxy Repository" section to docs/GatewayTesting.md documenting MXGATEWAY_RUN_LIVE_GALAXY_TESTS=1, MXGATEWAY_LIVE_GALAXY_CONN, the ZB database prerequisite, and the covered RPCs, mirroring the existing "Live MXAccess Smoke" section.
Resolution: Resolved 2026-05-18: Added a "Live Galaxy Repository" section to docs/GatewayTesting.md documenting MXGATEWAY_RUN_LIVE_GALAXY_TESTS, MXGATEWAY_LIVE_GALAXY_CONN, the deployed-ZB prerequisite, and the covered GalaxyRepository RPCs.
IntegrationTests-002
| Field | Value |
|---|---|
| Severity | High |
| Category | Design-document adherence |
| Location | src/MxGateway.IntegrationTests/DashboardLdapLiveTests.cs:13, src/MxGateway.Server/Configuration/LdapOptions.cs:27 |
| Status | Resolved |
Description: DashboardLdapLiveTests builds the authenticator with new GatewayOptions(), so it relies on LdapOptions.RequiredGroup defaulting to GwAdmin and asserts the admin user is a member of a GwAdmin LDAP group. glauth.md does not list GwAdmin as a provisioned group — it lists admin only in the five role groups and describes GwAdmin as a group to add "when reuse isn't enough." If GLAuth has only the documented baseline groups, AuthenticateAsync_AdminInGwAdminGroup_Succeeds fails (not skips) on any box where the env var is set. This is an undocumented hard prerequisite beyond "LDAP is up."
Recommendation: Either document the required GwAdmin GLAuth provisioning step in glauth.md and GatewayTesting.md, or have the test set RequiredGroup to a baseline group glauth.md guarantees admin belongs to (e.g. WriteOperate).
Resolution: Resolved 2026-05-18: Took the documentation fix — promoted the glauth.md "Adding a gw-specific group" section into a concrete "Provisioning the GwAdmin group" step that grants GwAdmin to admin, cross-referenced it from the groups/verification sections, and added a "Live LDAP" section to docs/GatewayTesting.md calling out GwAdmin as a hard prerequisite. Alternative considered: weaken the test to a baseline group (WriteOperate) — rejected because GwAdmin is the real default LdapOptions.RequiredGroup and the test should exercise it.
IntegrationTests-003
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Correctness & logic bugs |
| Location | src/MxGateway.IntegrationTests/WorkerLiveMxAccessSmokeTests.cs:89-97 |
| Status | Resolved |
Description: The test asserts only on the first MxEvent recorded by RecordingServerStreamWriter. A live MXAccess provider can deliver an initial state/quality event whose family or handles differ from the expected OnDataChange (e.g. a registration-state or bad-quality bootstrap event). Because WaitForFirstMessageAsync returns whatever arrives first, a genuine ordering/family defect could fail spuriously or leave later wrong events unverified.
Recommendation: Filter for the first event with Family == OnDataChange (with a bounded retry/poll) or assert the full recorded sequence, so the test verifies the event the worker is supposed to emit.
Resolution: Resolved 2026-05-18: Confirmed against source — WaitForFirstMessageAsync completed a TaskCompletionSource on the very first WriteAsync. Replaced it with RecordingServerStreamWriter.WaitForMessageAsync(predicate, timeout), which scans recorded messages, skips earlier non-matching events, and blocks on a SemaphoreSlim until a matching one arrives or the timeout elapses (throwing a TimeoutException that reports the scanned count). GatewaySession_WithLiveWorker_RegistersAdvisesStreamsDataAndCloses now waits for the first Family == OnDataChange event. Live execution was not possible in this environment (no MXAccess COM); verified by build.
IntegrationTests-004
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Error handling & resilience |
| Location | src/MxGateway.IntegrationTests/WorkerLiveMxAccessSmokeTests.cs:108-111 |
| Status | Resolved |
Description: In the finally block, after CloseSessionAsync, the test does await streamTask.WaitAsync(StreamShutdownTimeout). If closing the session does not promptly complete the stream (or StreamEvents itself faults), this throws TimeoutException from inside finally, which replaces/masks any original assertion failure from the try block. The diagnostic value of the real failure is lost.
Recommendation: Wrap the streamTask.WaitAsync (and ideally WaitForProcessesAsync) in a try/catch that logs the cleanup exception via output.WriteLine instead of letting it propagate.
Resolution: Resolved 2026-05-18: Confirmed — the finally block awaited streamTask.WaitAsync and WaitForProcessesAsync with no exception handling. Extracted a shared ShutDownAsync helper that wraps the session-close + stream-drain in one try/catch and the worker-process wait in a second try/catch, logging each cleanup exception via output.WriteLine instead of throwing. All three live tests now route shutdown through it, so a cleanup timeout can no longer mask an assertion failure. Live execution was not possible in this environment; verified by build.
IntegrationTests-005
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Testing coverage |
| Location | src/MxGateway.IntegrationTests/WorkerLiveMxAccessSmokeTests.cs |
| Status | Resolved |
Description: The only live MXAccess test covers the Register→AddItem→Advise→one-OnDataChange→Close happy path. CLAUDE.md stresses that MXAccess parity is the contract and calls out non-obvious behaviors (WriteSecured ordering, OperationComplete semantics, invalid-handle exceptions). None of Write, WriteSecured, Unadvise, RemoveItem, Unregister, OperationComplete, an invalid-handle command, or a worker-fault path is exercised against live COM — exactly the paths fake-worker tests cannot validate.
Recommendation: Add live coverage for at least a Write round-trip and an invalid-handle command, plus a worker-fault/abnormal-exit scenario, even if behind additional opt-in env vars.
Resolution: Resolved 2026-05-18: Added two [LiveMxAccessFact]-gated tests to WorkerLiveMxAccessSmokeTests. GatewaySession_WithLiveWorker_WritesValueToAdvisedItem registers/adds/advises then issues a Write of an integer value, asserting the command round-trips with ProtocolStatusCode.Ok and MxCommandKind.Write. GatewaySession_WithLiveWorker_InvalidHandleCommand_SurfacesFailureWithoutTransportFault issues AddItem against int.MaxValue as the server handle (never issued by MXAccess) and asserts the failure surfaces in the command reply without a usable item handle. Both reuse the existing opt-in env var and the ShutDownAsync cleanup helper. A worker-fault/abnormal-exit case was deliberately scoped out — it needs a controlled COM crash injection beyond what the existing harness supports; the two added cases cover the Write round-trip and invalid-handle paths the recommendation calls out. Live execution was not possible in this environment; verified by build.
IntegrationTests-006
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Testing coverage |
| Location | src/MxGateway.IntegrationTests/DashboardLdapLiveTests.cs |
| Status | Resolved |
Description: LDAP live coverage is two cases: admin succeeds, readonly is denied for missing group. There is no coverage of a wrong password for a valid user, an unknown username, or the LDAP-server-unreachable path — all of which DashboardAuthenticator has distinct branches for (the LdapException catch, the candidate is null branch). The negative test only proves group-membership denial, not credential rejection.
Recommendation: Add a live test for admin with a wrong password asserting Succeeded == false and that the password is not leaked into FailureMessage, and a test for an unknown username.
Resolution: Resolved 2026-05-18: Added three [LiveLdapFact]-gated tests to DashboardLdapLiveTests. AuthenticateAsync_AdminWithWrongPassword_FailsWithoutLeakingPassword exercises the LdapException catch via a rejected candidate bind and asserts the wrong password never reaches FailureMessage. AuthenticateAsync_UnknownUsername_Fails exercises the candidate is null branch. AuthenticateAsync_ServerUnreachable_FailsWithoutThrowing builds the authenticator with LdapOptions.Port = 1 (a reserved port no LDAP server listens on) and asserts the connect failure is absorbed into a failed result rather than thrown — covering the generic catch (Exception) branch. All three are gated by the existing MXGATEWAY_RUN_LIVE_LDAP_TESTS opt-in so they stay opt-in. Live execution was not possible in this environment (no live LDAP); verified by build.
IntegrationTests-007
| Field | Value |
|---|---|
| Severity | Low |
| Category | Concurrency & thread safety |
| Location | src/MxGateway.IntegrationTests/WorkerLiveMxAccessSmokeTests.cs:20, src/MxGateway.IntegrationTests/Galaxy/GalaxyRepositoryLiveTests.cs:5, src/MxGateway.IntegrationTests/DashboardLdapLiveTests.cs:9 |
| Status | Resolved |
Description: The live test classes contend for genuinely shared singletons — one MXAccess COM provider, one ZB SQL database, one GLAuth instance with a 3-fail/10-minute per-IP lockout. No [Collection] annotation or DisableTestParallelization is declared, so xUnit's default cross-class parallelism could run the Galaxy tests concurrently or interleave an LDAP failure burst that trips the GLAuth lockout.
Recommendation: Place the live test classes in a shared [Collection], or set [assembly: CollectionBehavior(DisableTestParallelization = true)] for this opt-in project, so live external resources are accessed serially.
Resolution: Resolved 2026-05-18: Confirmed — no [Collection] or assembly-level CollectionBehavior existed. Added LiveResourcesCollection.cs with a [CollectionDefinition(Name, DisableParallelization = true)] and applied [Collection(LiveResourcesCollection.Name)] to WorkerLiveMxAccessSmokeTests, GalaxyRepositoryLiveTests, and DashboardLdapLiveTests. A named collection (rather than an assembly-wide DisableTestParallelization) was chosen so the live classes serialize against each other and within themselves while non-live tests (IntegrationTestEnvironmentTests) keep parallelizing. Verified by build; live tests not executed (no MXAccess COM / live LDAP in this environment).
IntegrationTests-008
| Field | Value |
|---|---|
| Severity | Low |
| Category | Code organization & conventions |
| Location | src/MxGateway.IntegrationTests/LiveLdapFactAttribute.cs, src/MxGateway.IntegrationTests/Galaxy/LiveGalaxyRepositoryFactAttribute.cs, src/MxGateway.IntegrationTests/LiveMxAccessFactAttribute.cs |
| Status | Resolved |
Description: Three near-identical fact attributes each re-implement the same "compare env var to 1 with StringComparison.Ordinal, set Skip otherwise" pattern. LiveMxAccessFactAttribute delegates to IntegrationTestEnvironment while the other two inline the logic, so the project has two divergent styles for the same concern.
Recommendation: Extract a shared helper (e.g. IntegrationTestEnvironment.IsEnabled(string variableName)) and have all three attributes call it.
Resolution: Resolved 2026-05-18: Confirmed — LiveLdapFactAttribute.Enabled and LiveGalaxyRepositoryFactAttribute.Enabled each inlined the ordinal == "1" comparison while LiveMxAccessFactAttribute delegated to IntegrationTestEnvironment. Added IntegrationTestEnvironment.IsEnabled(string variableName) as the single implementation; LiveMxAccessTestsEnabled, LiveLdapFactAttribute.Enabled, and LiveGalaxyRepositoryFactAttribute.Enabled now all call it. Verified by build.
IntegrationTests-009
| Field | Value |
|---|---|
| Severity | Low |
| Category | Documentation & comments |
| Location | src/MxGateway.IntegrationTests/WorkerLiveMxAccessSmokeTests.cs:372-375 |
| Status | Resolved |
Description: TestServerCallContext is XML-documented as a "Mock server call context," but it is a hand-written stub/fake with no mocking framework and no verification behavior. Per the style guides (accurate naming; explain why not what), calling it a mock misleads readers who may expect verifiable interactions.
Recommendation: Reword the summary to "test stub" / "minimal ServerCallContext implementation for in-process gRPC calls."
Resolution: Resolved 2026-05-18: Confirmed — the summary read "Mock server call context for testing gRPC calls." Reworded to "Minimal ServerCallContext stub for invoking the gRPC service in-process," noting it is a hand-written fake with no verification behavior. No mocking framework is involved; this is a documentation-only fix. Verified by build.
IntegrationTests-010
| Field | Value |
|---|---|
| Severity | Low |
| Category | Correctness & logic bugs |
| Location | src/MxGateway.IntegrationTests/WorkerLiveMxAccessSmokeTests.cs:366-369 |
| Status | Resolved |
Description: WaitForFirstMessageAsync accepts only a timeout and never observes a CancellationToken. There is no per-test cancellation propagation, so if the gateway/worker hangs without writing an event the test relies solely on the 15s WaitAsync timeout and gives no contextual diagnostics. Combined with IntegrationTests-004, a hung live worker produces a bare TimeoutException.
Recommendation: Accept a CancellationToken (linked to TestServerCallContext's token), pass it to firstMessage.Task.WaitAsync(timeout, token), and on timeout emit the recorded Messages count via output.WriteLine before throwing.
Re-triage: The named method WaitForFirstMessageAsync no longer exists — IntegrationTests-003's resolution renamed/replaced it with RecordingServerStreamWriter.WaitForMessageAsync(predicate, timeout), which scans recorded messages and blocks on a SemaphoreSlim. The underlying defect still held: that replacement method also took only a timeout and never observed a CancellationToken. The finding remains valid (Low, Correctness) against the renamed method; the recommendation's firstMessage.Task.WaitAsync detail is stale but the intent (thread a token, surface a count on timeout) is unchanged.
Resolution: Resolved 2026-05-18: Added an optional CancellationToken parameter to WaitForMessageAsync, linked with the existing timeout source via CancellationTokenSource.CreateLinkedTokenSource, so a per-test cancellation aborts the wait promptly. GatewaySession_WithLiveWorker_RegistersAdvisesStreamsDataAndCloses now creates a CancellationTokenSource, passes its token into the StreamEvents TestServerCallContext and into WaitForMessageAsync, so the stream call and the wait share one cancellation source. On timeout the method already throws a TimeoutException whose message includes the scanned message count, satisfying the "emit recorded count" intent (the count surfaces in the test failure rather than via a separate output.WriteLine). Verified by build; live tests not executed.
IntegrationTests-011
| Field | Value |
|---|---|
| Severity | Low |
| Category | Documentation & comments |
| Location | src/MxGateway.IntegrationTests/WorkerLiveMxAccessSmokeTests.cs:236-240, src/MxGateway.IntegrationTests/WorkerLiveMxAccessSmokeTests.cs:183-187 |
| Status | Resolved |
Description: The XML/inline comments on the two new MXAccess parity tests misdescribe how the gateway surfaces an MXAccess failure. The invalid-handle test reads "the gateway protocol status is Ok and the failure shows up in hresult / the status proxies — it must not be reported as a transport fault", then asserts Assert.NotEqual(ProtocolStatusCode.Ok, addItemReply.ProtocolStatus.Code). HResultConverter.CreateProtocolStatus (src/MxGateway.Worker/Conversion/HResultConverter.cs:39) actually sets Code = ProtocolStatusCode.MxaccessFailure whenever the COM call throws (HRESULT ≠ 0), so the assertion is correct but the comment is wrong — the protocol status is not Ok on an MXAccess failure. The write-round-trip test carries the same misleading framing on lines 183-187 ("MXAccess parity details … belong in hresult / statuses, not in a transport failure") immediately before asserting Ok. A reader can reasonably conclude the gateway always reports Ok for round-tripped commands and tweak code accordingly. The intended distinction is "this is not a gRPC transport fault" (the RPC reply still arrives) — the protocol status code carries the MXAccess outcome.
Recommendation: Reword the invalid-handle comment to "the gateway must reply with ProtocolStatusCode.MxaccessFailure and a non-zero Hresult carrying the COM failure, not a gRPC transport fault." Reword the write-round-trip comment to clarify it is asserting the happy-path Ok and that an MXAccess rejection would surface as MxaccessFailure (per HResultConverter), not as a RpcException.
Resolution: 2026-05-20 — Reworded the invalid-handle test comment to say the gateway must reply with ProtocolStatusCode.MxaccessFailure and a non-zero hresult carrying the COM failure (per HResultConverter), and reworded the write-round-trip comment to make explicit it is asserting the happy-path Ok while an MXAccess rejection would surface as MxaccessFailure, never as an RpcException.
IntegrationTests-012
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Correctness & logic bugs |
| Location | src/MxGateway.IntegrationTests/WorkerLiveMxAccessSmokeTests.cs:147-151 |
| Status | Resolved |
Description: GatewaySession_WithLiveWorker_WritesValueToAdvisedItem constructs a RecordingServerStreamWriter<MxEvent> and starts a StreamEvents task, then never reads from it and never asserts anything about the recorded messages. The test verifies only that the Write command round-trips at the protocol level — it does not verify that the worker actually emits any event after the write (for example an OnWriteComplete, which is the proof of round-trip used by the cross-language client e2e runner). Because the stream task is started with new TestServerCallContext() (no cancellation source), any fault raised by the stream task (an exception from EventStreamService, a session-not-found, a backpressure overflow) is swallowed — streamTask is later awaited in ShutDownAsync only inside a broad catch (Exception ex), which logs and continues. The Write test therefore cannot fail on stream-task faults. Two consequences: (a) the live Write parity coverage promised in IntegrationTests-005 is weaker than it appears, and (b) the fixture (eventWriter) is dead code in this test that suggests an assertion was intended.
Recommendation: Either remove the unused eventWriter/StreamEvents plumbing from the Write test so the test scope matches its assertions, or — preferred — extend the test to wait for an OnWriteComplete event for the written item via eventWriter.WaitForMessageAsync(candidate => candidate.Family == MxEventFamily.OnWriteComplete && candidate.ItemHandle == itemHandle, ...), matching the round-trip proof used by scripts/run-client-e2e-tests.ps1 -VerifyWrite.
Resolution: Resolved 2026-05-20: Rewrote GatewaySession_WithLiveWorker_WritesValueToAdvisedItem so the previously-dead eventWriter/StreamEvents plumbing actually drives an assertion. The test now waits for an OnWriteComplete event matching the Write's (server, item) handle pair via eventWriter.WaitForMessageAsync (using IntegrationTestEnvironment.LiveMxAccessEventTimeout), and asserts the recorded event's family, session id, and handles — the same round-trip proof the cross-language client e2e runner uses. The stream call is now bound to a CancellationTokenSource and the test asserts streamTask.IsFaulted == false before cleanup. ShutDownAsync gained an opt-in propagateStreamFaults flag so a faulted StreamEvents task is rethrown into the test rather than silently swallowed by the broad cleanup catch; the cancellation token is also signalled before the drain so StreamEvents observes a clean shutdown instead of a forced timeout. Verified by build and by confirming the test skips cleanly when MXGATEWAY_RUN_LIVE_MXACCESS_TESTS is unset.
IntegrationTests-013
| Field | Value |
|---|---|
| Severity | Low |
| Category | Performance & resource management |
| Location | src/MxGateway.IntegrationTests/WorkerLiveMxAccessSmokeTests.cs:519-609 |
| Status | Resolved |
Description: RecordingServerStreamWriter<T> owns a SemaphoreSlim messageArrived (IDisposable) but does not itself implement IDisposable, so the semaphore's wait handle is never released back to the OS. Each live test allocates one such writer and discards it at scope exit. Live tests run on opt-in only, so the cumulative leak is bounded, but the type holds an IDisposable field — the standard hygiene under Directory.Build.props's TreatWarningsAsErrors=true is to either dispose the field or document why not. CA2213 does not fire because the owner is not itself IDisposable; an analyzer-driven warning is the only reason this is not a build break, not an indication that the leak is acceptable.
Recommendation: Make RecordingServerStreamWriter<T> implement IDisposable, dispose messageArrived in Dispose, and wrap each instantiation in a using block (using RecordingServerStreamWriter<MxEvent> eventWriter = new();).
Resolution: 2026-05-20 — RecordingServerStreamWriter<T> now implements IDisposable and its Dispose releases the messageArrived semaphore. All six live tests in WorkerLiveMxAccessSmokeTests now allocate the writer with a top-of-method using declaration so the semaphore's wait handle is released on scope exit even when the test body throws.
IntegrationTests-014
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Testing coverage |
| Location | src/MxGateway.IntegrationTests/WorkerLiveMxAccessSmokeTests.cs |
| Status | Resolved |
Description: IntegrationTests-005 was resolved by adding live coverage for Write and an invalid-handle AddItem, but its resolution explicitly scoped out the worker-fault/abnormal-exit case and silently dropped Unadvise, RemoveItem, Unregister, OperationComplete, and WriteSecured ordering. CLAUDE.md singles out WriteSecured ("WriteSecured failing before a value-bearing NMX body") and OperationComplete semantics as parity surprises the gateway must not "fix" — exactly the paths fake-worker tests cannot validate. After this commit the live MXAccess smoke still doesn't exercise any teardown command, the secured-write ordering rule, or a deliberately faulted worker. A regression in any of these would only be caught by manual testing.
Recommendation: Add live MXAccess coverage for the teardown chain (Unadvise then RemoveItem then Unregister, asserting each replies with ProtocolStatusCode.Ok and the next operation no longer references the freed handle), and at minimum one WriteSecured parity case asserting the documented ordering. A worker-fault test can be deferred to a separate finding once a deterministic COM-crash injection harness exists.
Resolution: Resolved 2026-05-20: Added three new [LiveMxAccessFact]-gated tests to WorkerLiveMxAccessSmokeTests, all reusing the existing opt-in env var and ShutDownAsync cleanup helper. (1) GatewaySession_WithLiveWorker_UnadviseRemoveItemUnregister_TeardownOrderingParity runs Register → AddItem → Advise → wait for one OnDataChange → UnAdvise → RemoveItem → Unregister, asserting each step replies Ok with the matching MxCommandKind, that no further OnDataChange events for the un-advised (server, item) pair arrive after a 500 ms settle window, and that a second RemoveItem against the freed handle returns a non-Ok MXAccess failure (so a regression that left a stale subscription or accepted a stale handle would surface). (2) GatewaySession_WithLiveWorker_WriteSecured_AuthenticatedRoundTripParity resolves an ArchestrA user id via AuthenticateUser (credentials env-overridable through MXGATEWAY_LIVE_MXACCESS_WRITE_SECURED_USER / ..._PASSWORD, defaulting to the admin/admin123 GLAuth user from glauth.md), issues WriteSecured against an advised item, and asserts the reply carries MxCommandKind.WriteSecured, the protocol status is one of the documented parity outcomes (Ok for an unprotected provider, MxaccessFailure when the item is not WriteSecured-eligible — never a transport fault), and the credential never leaks into the diagnostic message. (3) GatewaySession_WithLiveWorker_AbnormalWorkerExit_MarksSessionFaulted opens a session, kills the worker process tree (via a new TestWorkerProcessFactory.KillAllAndDetach helper) without going through CloseSession, and polls the session via a new GatewayServiceFixture.TryGetSession accessor until it transitions to SessionState.Faulted within the live event timeout; asserts the final state is Faulted, that FinalFault is non-empty, and that the fault description carries a known worker-client classification (pipe disconnected / worker faulted / heartbeat expired / end-of-stream). docs/GatewayTesting.md was updated to list all five parity surfaces and the two new env-var defaults. Verified by build and confirmed all six live tests skip cleanly when MXGATEWAY_RUN_LIVE_MXACCESS_TESTS is unset.
IntegrationTests-015
| Field | Value |
|---|---|
| Severity | Low |
| Category | Code organization & conventions |
| Location | src/MxGateway.IntegrationTests/WorkerLiveMxAccessSmokeTests.cs:30,119,201, src/MxGateway.IntegrationTests/DashboardLdapLiveTests.cs:13,32,48,67,84, src/MxGateway.IntegrationTests/Galaxy/GalaxyRepositoryLiveTests.cs:10,22,34,52 |
| Status | Resolved |
Description: Every live-test method in the three live classes carries an identical [Trait("Category", "LiveMxAccess")] (or LiveLdap / LiveGalaxy) attribute. The trait is uniform within each class and is exactly the information the [Collection(LiveResourcesCollection.Name)] class-level attribute also implies. xUnit's [Trait] is inheritable from the class to its methods, so the same metadata can be declared once at class scope. The current shape adds maintenance burden — adding a new test in any of these classes requires remembering to add the trait, and the existing pattern's LiveLdap includes five copies of the same line.
Recommendation: Move each [Trait("Category", ...)] to the class declaration alongside the existing [Collection(...)], and remove the per-method copies. Verify the trait still surfaces in --filter Trait=Category=LiveLdap after the change.
Resolution: 2026-05-20 — Lifted [Trait("Category", "LiveMxAccess")], [Trait("Category", "LiveLdap")], and [Trait("Category", "LiveGalaxy")] to the class declarations of WorkerLiveMxAccessSmokeTests, DashboardLdapLiveTests, and GalaxyRepositoryLiveTests respectively (alongside the existing [Collection(LiveResourcesCollection.Name)]), and removed all per-method duplicates. xUnit propagates class-level traits to every method, so --filter Category=Live* filters still match.
IntegrationTests-016
| Field | Value |
|---|---|
| Severity | Low |
| Category | Code organization & conventions |
| Location | src/MxGateway.IntegrationTests/Galaxy/LiveGalaxyRepositoryFactAttribute.cs:26, src/MxGateway.Server/Galaxy/GalaxyRepositoryOptions.cs:13 |
| Status | Resolved |
Description: The default Galaxy Repository connection string "Server=localhost;Database=ZB;Integrated Security=True;TrustServerCertificate=True;Encrypt=False;" is duplicated verbatim between the production GalaxyRepositoryOptions.ConnectionString initializer and the test-side LiveGalaxyRepositoryFactAttribute.ConnectionString fallback. The docs (docs/GatewayTesting.md) document the value once and reference it from both places. If the production default changes (e.g. tightening to a named instance, or switching to a SQL-auth template), the test default silently keeps the old string and the live Galaxy tests connect to the wrong server. The drift is invisible to the build.
Recommendation: Expose the production default through a public const string on GalaxyRepositoryOptions (e.g. DefaultConnectionString) and have LiveGalaxyRepositoryFactAttribute.ConnectionString read Environment.GetEnvironmentVariable(ConnectionStringVariableName) ?? GalaxyRepositoryOptions.DefaultConnectionString. Single source of truth, build-time guarantee they cannot drift.
Resolution: 2026-05-20 — Added public const string GalaxyRepositoryOptions.DefaultConnectionString carrying the production default, set the ConnectionString initializer to reference it, and changed LiveGalaxyRepositoryFactAttribute.ConnectionString to fall back to GalaxyRepositoryOptions.DefaultConnectionString. The literal now lives in exactly one place and any future change to the production default propagates to the live-test fallback at compile time.
IntegrationTests-017
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Correctness & logic bugs |
| Location | src/MxGateway.IntegrationTests/WorkerLiveMxAccessSmokeTests.cs:350-407 |
| Status | Resolved |
Description: GatewaySession_WithLiveWorker_UnadviseRemoveItemUnregister_TeardownOrderingParity proves the subscription is live by waiting for one matching OnDataChange, snapshots dataChangeCountBeforeUnadvise, then sends UnAdvise, waits 500 ms, snapshots dataChangeCountAfterTeardown, and asserts strict equality. The assertion races against the natural cadence of the live MXAccess provider:
- After
WaitForMessageAsyncreturns the first match, any additionalOnDataChangefor the same(serverHandle, itemHandle)published by the provider before the worker processesUnAdviseis delivered into the recording writer. - The snapshot at line 362 is taken immediately before the
UnAdvisecommand is sent (line 370). Events that arrive in the window between that snapshot and the worker processingUnAdvise(network round-trip + STA dispatch + worker pipe send + gateway channel write) are racing in — they are not "after UnAdvise" but they will be in the post-teardown snapshot. MXAccessproviders can publishOnDataChangeat sub-second cadence; the strict-equality assertion has no slack for in-flight events.
The test passes today only because the chosen test item (TestChildObject.TestInt) likely changes value rarely. Against a more active item — or on a slower machine where the round-trip widens — the assertion would flap. The intent ("no further events after the worker stops the subscription") would be better expressed by capturing the snapshot after UnAdvise returns Ok rather than before it is issued.
Recommendation: Move the "before" snapshot to immediately after UnAdvise returns Ok (the point past which the parity rule applies), or weaken the assertion to "no events with a WorkerSequence strictly greater than the last sequence observed within dataChangeCountBeforeUnadvise + N events arrived in the post-teardown drain" where N accounts for the documented in-flight window. Either change moves the test from racing on provider cadence to verifying the actual parity rule.
Resolution: 2026-05-20 — Removed the dataChangeCountBeforeUnadvise snapshot taken just before the UnAdvise command (the source of the race) and replaced the strict-equality assertion against a pre-teardown count with a two-window stability check taken after the teardown chain completes. The test now waits one 500 ms settle window for in-flight OnDataChange events (which the provider already published before the worker acknowledged UnAdvise) to drain, captures dataChangeCountAfterFirstSettle, waits another 500 ms, and asserts the count is unchanged in dataChangeCountAfterSecondSettle. The parity rule under test ("no further OnDataChange after the worker stops the subscription") is now expressed as steadiness across the post-teardown window rather than equality with a count snapshotted during the round-trip race, so a slower machine or a more active test item no longer flaps the assertion while a genuine regression (a stale subscription continuing to fire) still surfaces as a count drift between the two settles.
IntegrationTests-018
| Field | Value |
|---|---|
| Severity | Low |
| Category | Code organization & conventions |
| Location | src/MxGateway.IntegrationTests/WorkerLiveMxAccessSmokeTests.cs:1037, src/MxGateway.IntegrationTests/WorkerLiveMxAccessSmokeTests.cs:595 |
| Status | Resolved |
Description: GatewayServiceFixture.TryGetSession(string sessionId, out GatewaySession session) declares session as a non-nullable GatewaySession, but its implementation is _registry.TryGet(sessionId, out session), which (in SessionRegistry.cs:43) uses session! to silence the nullability warning on Dictionary.TryGetValue. On a false return the out parameter is null, contradicting the non-nullable annotation. The caller at line 595 binds it as out GatewaySession? session, which compiles only because non-nullable-to-nullable variance is permitted — but no callsite tooling will warn that a false return yields a null value through what the fixture's contract describes as non-nullable. The repo enforces Nullable=enable;TreatWarningsAsErrors=true (src/Directory.Build.props), so the convention is for TryX patterns to either annotate the out as T? or to mirror BCL Dictionary.TryGetValue<TKey, TValue> (which uses [MaybeNullWhen(false)] out TValue).
Recommendation: Change the signature to public bool TryGetSession(string sessionId, [MaybeNullWhen(false)] out GatewaySession session) (mirroring BCL TryGetValue) and propagate the same annotation down through ISessionRegistry.TryGet / SessionRegistry.TryGet so the session! fudge can be removed. The call sites already treat the parameter as nullable; aligning the declaration removes the silent contract gap.
Resolution: 2026-05-20 — Propagated [MaybeNullWhen(false)] through the entire TryGet* chain. GatewayServiceFixture.TryGetSession, ISessionManager.TryGetSession / SessionManager.TryGetSession, and ISessionRegistry.TryGet/TryRemove plus their SessionRegistry implementations now carry the BCL Dictionary.TryGetValue-style annotation, and the session! null-forgiving operator inside SessionRegistry.TryGet / TryRemove was removed because the annotation makes it redundant. Existing in-tree callers (SessionManagerTests.cs line 28) were updated to out GatewaySession? to match. The compiler now warns at callsites that read session without checking the boolean return, closing the silent contract gap. Verified by dotnet build src/MxGateway.IntegrationTests/... (0 warnings), dotnet build src/MxGateway.Tests/... (0 warnings), and dotnet test src/MxGateway.Tests/... (479 passed).
IntegrationTests-019
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Security |
| Location | src/MxGateway.IntegrationTests/WorkerLiveMxAccessSmokeTests.cs:497-534 |
| Status | Resolved |
Description: GatewaySession_WithLiveWorker_WriteSecured_AuthenticatedRoundTripParity resolves a credential pair via ResolveLiveMxAccessSecuredCredentials, passes the password into AuthenticateUser and WriteSecured, and asserts the password is absent from writeSecuredReply.DiagnosticMessage. CLAUDE.md's secret-handling rule is broader: "API keys, passwords, WriteSecured payloads, and AuthenticateUser credentials must never reach logs." The test's assertion covers only one of the surfaces the rule protects:
- The
TestOutputLoggerProviderwrites every gateway-sideILoggerentry toITestOutputHelper. A regression that logged the request body (or theWorkerCommandRequestenvelope) would put the password into test output without failing this test. WriteWorkerOutputechoes workerstdout/stderrlines toITestOutputHelper. A worker-side regression that printed the credential (e.g. a debug log added toMxAccessCommandExecutor) would land in test output without failing this test.output.WriteLine(...)calls in the test body (AuthenticateUser status=... user_id=...andLogReply("WriteSecured", ...)) currently don't include the request body, but a future maintenance change that printedcommand.WriteSecured.Valueor a similar struct dump would silently leak the credential past the existing assertion.
Because ITestOutputHelper doesn't expose its accumulated text to the test, the assertion can only be made by buffering output through a recording sink the test owns.
Recommendation: Replace the bare ITestOutputHelper injection (just for the WriteSecured test, or for all live MXAccess tests) with a recording wrapper that mirrors writes both to the xUnit output and to a StringBuilder. At the end of the test, assert the buffer does not contain verifyPassword. This makes the credential-redaction contract a property of the entire test run, not just the one explicit field. Alternative: route the test through GatewayLogRedactor (src/MxGateway.Server/Diagnostics/GatewayLogRedactor.cs) so the credential-bearing commands are redacted at the logger sink the test sees.
Resolution: 2026-05-20 — Added a RecordingTestOutputHelper private class that implements ITestOutputHelper, mirrors every line to the wrapped xUnit sink, and accumulates the same text into a StringBuilder exposed via a Captured property. The WriteSecured parity test now constructs this wrapper, passes it to both TestWorkerProcessFactory (so worker stdout/stderr lines flow through it) and GatewayServiceFixture (so the TestOutputLoggerProvider's gateway-ILogger entries flow through it), and uses it for every direct WriteLine. A new LogReplyTo(ITestOutputHelper sink, …) static helper underpins the existing LogReply instance method so the test body can route reply logging through the recording wrapper. After the cleanup finally block completes, the test asserts recordedOutput.Captured does not contain the verify password. The credential-redaction contract is now enforced across the gateway-logger sink, worker stdout/stderr echo, and every test-body WriteLine — a future regression that dumped the request body, the WorkerCommandRequest envelope, or the WriteSecured payload would land in the buffer and fail the assertion.
IntegrationTests-020
| Field | Value |
|---|---|
| Severity | Low |
| Category | Correctness & logic bugs |
| Location | src/MxGateway.IntegrationTests/WorkerLiveMxAccessSmokeTests.cs:616-622 |
| Status | Resolved |
Description: GatewaySession_WithLiveWorker_AbnormalWorkerExit_MarksSessionFaulted asserts the FinalFault description contains at least one of these substrings (case-insensitive): disconnect, pipe, heartbeat, worker, end of stream. The intent (per the IntegrationTests-014 resolution prose) is to verify the gateway surfaces "a known worker-client classification". The "worker" substring defeats that intent — the gateway routes through SetFaulted with messages like "Worker pipe disconnected.", "Worker shutdown timed out.", "Worker was killed by the gateway: …", "Worker heartbeat expired. …", "Worker event channel rejected an event.", "Worker pipe write failed.", "Worker read loop failed.", "Worker sent unexpected envelope body …" — every classification message begins with the word "Worker". A regression that introduced an entirely new fault path with a generic message containing the word Worker would still pass this test.
CLAUDE.md singles out "abnormal exit" as one of the parity surfaces (SessionState.Faulted with an actionable cause), so the test's documented value is verifying which of the WorkerClient error codes drove the transition. Today the assertion is effectively Assert.NotEmpty(observedFault).
Recommendation: Tighten the keyword set to the specific classifications the abnormal-exit (kill-the-process) path actually drives — PipeDisconnected ("pipe", "disconnect") and EndOfStream ("end of stream"). Drop the broad "worker" term, and drop "heartbeat" unless the test deliberately covers the heartbeat path too (it does not — HeartbeatGraceSeconds = 15 and the poll deadline is StreamShutdownTimeout = 10 seconds, so a heartbeat-expired transition is impossible inside the wait window). If a more exhaustive matrix is wanted, assert FinalFault.StartsWith("Worker pipe disconnected") against the message constant in WorkerClient.cs:380 so a rename surfaces as a compile-time / test-time failure.
Resolution: 2026-05-20 — Tightened the keyword set to the specific classifications the kill-the-process path actually drives. The assertion now requires the FinalFault description to contain "pipe disconnected" (matching the WorkerClient.cs:378-381 WorkerFrameProtocolErrorCode.EndOfStream → WorkerClientErrorCode.PipeDisconnected → "Worker pipe disconnected." message) or "end of stream", dropping the broad "worker" term that previously matched every WorkerClient fault message (all of which begin with "Worker"), and dropping "heartbeat" because the test's StreamShutdownTimeout (10 s) is below HeartbeatGraceSeconds (15 s) so a heartbeat-expired transition cannot occur inside the poll window. A regression that routed an unrelated fault classification through the abnormal-exit path would now fail loudly instead of silently passing.
IntegrationTests-021
| Field | Value |
|---|---|
| Severity | Low |
| Category | Testing coverage |
| Location | src/MxGateway.IntegrationTests/WorkerLiveMxAccessSmokeTests.cs:579-622 |
| Status | Resolved |
Description: The abnormal-exit test only polls session.State and session.FinalFault. It does not assert anything about streamTask after the kill. The chain that puts the session into Faulted is: the read loop hits EOS → SetFaulted(PipeDisconnected, …) → _events.Writer.TryComplete(fault) → ReadEventsAsync propagates the WorkerClientException → EventStreamService.ProduceEventsAsync's catch (Exception exception) when exception is WorkerClientException calls session.MarkFaulted(exception.Message). The test verifies the end state of that chain but not that the StreamEvents call is what produced the transition. If a future change moved the MarkFaulted call somewhere else (a session-manager background watcher, for example), the test would still pass — but the stream task could now silently swallow the fault. A direct assertion that streamTask.IsFaulted (or that awaiting it throws a WorkerClientException) would protect that contract.
The Write parity test (IntegrationTests-012's resolution) added exactly this assertion (Assert.False(streamTask.IsFaulted, …)). The abnormal-exit test should add the inverse: the stream task must be faulted (or at least completed with a WorkerClientException) after the kill.
Recommendation: After the session-state poll succeeds, assert streamTask.IsCompleted (the channel has terminated) and inspect streamTask.Exception?.InnerException for a WorkerClientException (or assert streamTask.IsFaulted and await with ShouldThrowAsync). This couples the test to the actual fault-propagation path and prevents a future refactor that bypasses the stream from quietly weakening the coverage. Compare to the existing Assert.False(streamTask.IsFaulted, …) on line 217 — the abnormal-exit case wants the opposite assertion.
Resolution: 2026-05-20 — After the session-state poll loop confirms SessionState.Faulted, the test now awaits streamTask.WaitAsync(StreamShutdownTimeout) (with a try/catch that logs the surfaced exception type/message), then asserts streamTask.IsCompleted and streamTask.IsFaulted. This couples the test to the actual fault-propagation chain — read loop hits EndOfStream → WorkerClient.SetFaulted(PipeDisconnected, …) → ReadEventsAsync propagates the fault → EventStreamService calls session.MarkFaulted → MxAccessGatewayService.StreamEvents re-throws the mapped RpcException. A future refactor that moved MarkFaulted off the stream-consumption path would leave streamTask completing cleanly, which the new IsFaulted assertion would now catch (inverse of the existing Assert.False(streamTask.IsFaulted, …) in the Write parity test on line 217). The inner-exception type assertion was deliberately omitted because the gateway maps WorkerClientException to RpcException at the public boundary (MxAccessGatewayService.MapWorkerClientException); asserting on the surface type alone would be brittle, while the IsFaulted check directly tests the contract the recommendation is protecting.